Calculus · 2015-02-26 · calculus is to the highly sophisticated theory of Gamma function, which...

Calculus

Sungpyo Hong

Pohang University of Science and Technology

POSTECH

2006

Preface

Calculus is the most fundamental course in the study of science and engineering, and its main contents are the differentiation and the integration of functions. Current cutting-edge technology would not be possible without the notions of differentiation and integration.

Most universities in Korea use calculus textbooks from the US, which begin with the basics of differentiation and integration and some applications. However, science and engineering students in Korea have already covered this material in high school. The faculty members at POSTECH felt that it was necessary to challenge the students with more advanced content, and they replaced the basics of differentiation and integration with their applications to ordinary differential equations.

In the first part of the course the students find the material somewhat difficult, but in the later part they find it more challenging and become better motivated. We are confident that the students at POSTECH are able to absorb this much.

At this point, we felt that we needed a calculus textbook in our own style. These lecture notes grew out of the lectures delivered to freshmen at POSTECH over many years with this in mind.

This book includes the techniques of integration as an appendix. Students who feel uncomfortable with integration may refer to it as needed. We also think that one of the most prominent applications of calculus is to the highly sophisticated theory of the Gamma function, which plays an important role not only in science and engineering but also in mathematical theories. Hence, we include the theory of the Gamma function in an appendix, so that readers can refer to it whenever they need.

We are very grateful to many colleagues in the mathematics department at Pohang University of Science and Technology (POSTECH). We would also like to acknowledge the invaluable assistance we have received from the teaching assistants who helped to prepare the problems and exercises in this book.


We have endeavored to eliminate typos and mistakes, but some may still remain. We would very much appreciate it if readers pointed out any such mistakes or misprints.

Sungpyo Hong, & Young Sun [email protected] & [email protected]

July 2006, in Pohang, Korea

Contents

Preface

1 Preliminaries
1.1 Functions
1.2 Limits of Function Values
1.2.1 One-sided Limits and Limits at Infinity
1.2.2 Exercises
1.3 Continuity of Functions
1.3.1 Exercises
1.4 Derivatives
1.4.1 Exercises
1.5 Differentials
1.5.1 Parametric equations
1.5.2 Implicit differentiation
1.5.3 Applications of derivatives
1.5.4 Exercises
1.6 Indeterminate Forms
1.6.1 Exercises
1.7 Antiderivatives
1.7.1 Definite integrals
1.7.2 Integrals by substitution
1.7.3 Exercises

2 Transcendental Functions
2.1 Inverse Functions
2.2 Trigonometric Functions
2.2.1 Inverse trigonometric functions
2.2.2 Derivatives of the inverse trig functions
2.2.3 Exercises
2.3 The Natural Logarithm
2.4 The Exponential Function $e^x$
2.5 The Functions $a^x$
2.6 The Functions $y = \log_a x$
2.7 Order and Oh-Notation
2.8 Applications
2.9 Hyperbolic Functions
2.9.1 The inverse hyperbolic functions
2.10 Techniques of integration
2.10.1 Basic integration formulas
2.10.2 Integration by parts
2.10.3 Integration by partial fractions
2.10.4 Trigonometric substitutions
2.10.5 Improper integrals

3 Infinite Sequences and Series
3.1 Sequences
3.2 Infinite Series
3.3 Tests for Convergence of Series
3.3.1 The integral test
3.3.2 The comparison test
3.3.3 The ratio and root tests
3.3.4 The alternating series test
3.4 Power Series
3.4.1 Term-by-term differentiation and integration
3.4.2 Multiplication of power series
3.5 Taylor and Maclaurin Series
3.6 Applications of Power Series
3.7 Fourier Series
3.7.1 Convergence of Fourier series

4 First Order Differential Equations
4.1 First order Linear Differential Equations
4.2 Separable Equations
4.2.1 Population models
4.2.2 Brachistochrone problem
4.3 Exact Equations
4.4 Existence and Uniqueness Theorem

5 Second Order Differential Equations
5.1 Second Order Linear Differential Equations
5.2 H2O-LDE with Constant Coefficients
5.3 Nonhomogeneous Equations
5.3.1 Variation of parameters
5.3.2 Method of undetermined coefficients
5.4 Applications to Mechanical Vibrations
5.5 Series Solutions
5.5.1 Singular points
5.5.2 Regular singular points, method of Frobenius
5.6 Laplace Transforms
5.6.1 Properties of Laplace transforms
5.6.2 Discontinuous non-homogeneous functions
5.6.3 The Dirac delta function
5.7 The Convolution Integral

6 Vectors in the Space
6.1 Cartesian Coordinates
6.2 The Cross Product
6.3 Lines and Planes in Space
6.4 Cylinders and Quadric Surfaces

7 Vector-valued functions
7.1 Vector Functions
7.2 Projectile Motion
7.3 Arc Length
7.4 Kepler’s Laws

8 Functions of several variables
8.1 Limits and Continuity
8.2 Partial Derivatives
8.3 Differentiability
8.4 Directional Derivatives
8.5 Derivatives and Chain Rule
8.6 Taylor’s Polynomial
8.7 Extreme Values
8.8 Lagrange Multipliers

9 Multiple Integrals
9.1 Double Integrals
9.2 Center of Mass
9.3 Double Integrals in Polar Form
9.4 Triple Integrals
9.5 Triple Integrals in Cylinder and Spherical Forms
9.6 Coordinate Transforms

10 Vector Fields
10.1 Line Integrals
10.2 Vector Fields
10.3 Potential Functions
10.4 Green’s Theorem
10.5 Surfaces in the Space
10.6 Stokes’ Theorem
10.7 The Divergence Theorem

11 Appendix A
11.1 The Gamma Function
11.1.1 The definition of the gamma function
11.1.2 Uniqueness of the gamma function
11.1.3 Differentiability of the gamma function
11.1.4 The Euler’s first integral
11.1.5 $\Gamma(x)$ for large $x$
11.1.6 Stirling’s formula
11.1.7 The connection with $\sin x$
11.1.8 Applications to definite integrals
11.2 Convex Functions

Index

Chapter 1

Preliminaries

1.1 Functions

Functions are the major objects we deal with in mathematics because they relate the inputs and the outputs in most phenomena of the real world.

Definition 1.1.1 A function from a set $D$ to a set $R$ is a rule $f$ that assigns a unique element $y \in R$ to each element $x \in D$.

In this case, we use the following notation: we say "$f$ is a function from $D$ into $R$", denoted by $f : D \to R$, if

$$\forall\, x \in D,\ \exists!\, y \in R \ \text{such that}\ y = f(x).$$

Note that a function is denoted by $f$, while the functional value $y$ of $f$ at an element $x$ is denoted by $f(x)$. The set $D$ of all possible inputs $x$, called the independent variable, is called the domain of $f$, and the set of all values $y = f(x)$, the dependent variable, is called the range, or image, of $f$. The range need not be the whole set $R$.

When a functional value $y = f(x)$ is given by a formula of real numbers without mentioning the domain, the domain is assumed to be the largest set of $x$-values for which the formula gives real $y$-values. For instance, for $y = x^2$, the domain is $\mathbb{R}$ and the range is $\mathbb{R}^+ = \{y \in \mathbb{R} \mid y \ge 0\}$. For $y = \frac{1}{x}$, the domain is $\mathbb{R} - \{0\}$.

Definition 1.1.2 A function $f : D \to R$ is one-to-one (or $1-1$, or injective) on the domain $D$ if $f(x_1) \ne f(x_2)$ whenever $x_1 \ne x_2$ in $D$. A function $f : D \to R$ is onto (or surjective to) the range $R$ if, for any $y \in R$, there is an element $x \in D$ such that $y = f(x)$.


In mathematical notation: $f$ is $1-1$ if $f(x_1) = f(x_2)$ implies $x_1 = x_2$ in $D$, and $f$ is onto if

$$\forall\, y \in R,\ \exists\, x \in D \ \text{such that}\ y = f(x).$$

Definition 1.1.3 For a function $f : D \to R$, the inverse of $f$ is a function $g : R \to D$ defined by $g(y) = x$ for $y = f(x)$.

It is quite easy to see that if such a function $g$ exists, then $f$ has to be $1-1$ and onto, and conversely. Moreover, such a function $g$ is unique and so is denoted by $g = f^{-1}$. A function $f$ is said to be invertible if the inverse exists. The following is an easy consequence of the definition:

Theorem 1.1.1 A function $f : D \to R$ is invertible if and only if (abbreviated "iff") there is a function $g : R \to D$ such that $f \circ g = \mathrm{Id}_R$ and $g \circ f = \mathrm{Id}_D$, iff $f$ is $1-1$ and onto.
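For functions between finite sets, the conditions in Theorem 1.1.1 can be checked by brute force. The following is a minimal Python sketch, not part of the original notes; the helper names `is_one_to_one`, `is_onto`, and `inverse_table`, and the sample sets `D` and `R`, are chosen here purely for illustration.

```python
def is_one_to_one(f, domain):
    """True if distinct inputs always give distinct outputs."""
    values = [f(x) for x in domain]
    return len(set(values)) == len(values)


def is_onto(f, domain, codomain):
    """True if the image of the domain is exactly the codomain."""
    return {f(x) for x in domain} == set(codomain)


def inverse_table(f, domain, codomain):
    """Return the inverse as a dict y -> x, or None if f is not invertible."""
    if not (is_one_to_one(f, domain) and is_onto(f, domain, codomain)):
        return None
    return {f(x): x for x in domain}


# Illustrative finite sets (assumptions, not taken from the text).
D = [0, 1, 2, 3]
R = [1, 3, 5, 7]
g = inverse_table(lambda x: 2 * x + 1, D, R)
print(g)  # table of the inverse of x -> 2x + 1 on D

# x -> x^2 on {-1, 0, 1} is onto {0, 1} but not 1-1, so no inverse exists.
print(inverse_table(lambda x: x * x, [-1, 0, 1], [0, 1]))
```

When the table `g` exists, looking up `g[f(x)]` returns `x` for every `x` in `D`, which is the identity $g \circ f = \mathrm{Id}$ of the theorem.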

1.2 Limits of Function Values

Let $f(x)$ be a function defined on an open interval about $x_0$, except possibly at $x_0$. A usual intuitive definition of the limit of $f$ reads as follows: a number $L$ is the limit of $f$ as $x$ approaches $x_0$ if $f(x)$ gets arbitrarily close to $L$ for all $x$ sufficiently close to $x_0$. However, this definition is "informal" because the meaning of "arbitrarily close" and "sufficiently close" depends on the context. To a machinist manufacturing a clock, close may mean within a few thousandths of a centimeter, while to an astronomer observing distant galaxies, close may mean within a few thousand light-years. Hence, one has to make the meaning of "arbitrarily close" and "sufficiently close" more precise:

Definition 1.2.1 Let $f(x)$ be a function defined on an open interval about $x_0$, except possibly at $x_0$. We say that $f$ has the limit $L$ as $x$ approaches $x_0$, and write

$$\lim_{x \to x_0} f(x) = L,$$

if, for every number $\varepsilon > 0$, there exists a corresponding number $\delta > 0$ such that

$$|f(x) - L| < \varepsilon \quad (f(x) \text{ is arbitrarily close to } L)$$

for all $x$ that satisfy $0 < |x - x_0| < \delta$ ($x$ is sufficiently close to $x_0$).


Example 1.2.1 Show that $\lim_{x \to 1}(5x - 3) = 2$.

Solution: Given any $\varepsilon > 0$, we have to find a $\delta > 0$ such that $|(5x-3)-2| < \varepsilon$ whenever $0 < |x-1| < \delta$.

1. For $x \ne 1$, solve $|(5x-3)-2| < \varepsilon$ about $x_0 = 1$: a simple computation shows

$$|(5x-3)-2| = |5x-5| = 5|x-1| < \varepsilon \iff |x-1| < \frac{\varepsilon}{5}.$$

2. Thus, if we choose the value $\delta = \frac{\varepsilon}{5} > 0$, then, for $x$ with $0 < |x-1| < \delta = \frac{\varepsilon}{5}$, we have

$$|(5x-3)-2| = |5x-5| = 5|x-1| < 5 \cdot \frac{\varepsilon}{5} = \varepsilon,$$

which shows that $\lim_{x \to 1}(5x-3) = 2$. In fact, any number $\delta > 0$ such that $\delta \le \frac{\varepsilon}{5}$ will do the job. That is, any sufficiently small $\delta$ will work. □
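The choice $\delta = \varepsilon/5$ can also be spot-checked numerically. The sketch below is an illustration added here, not part of the original notes: it samples points with $0 < |x - 1| < \delta$ and tests the inequality. The function names and the sampling scheme are assumptions, and a sampled check is of course not a proof.

```python
def delta_for(eps):
    """The choice made in Example 1.2.1: delta = eps / 5."""
    return eps / 5.0


def check(eps, samples=1000):
    """Sample x with 0 < |x - 1| < delta and test |(5x - 3) - 2| < eps."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples  # strictly between 0 and delta
        for x in (1.0 - offset, 1.0 + offset):
            if not abs((5 * x - 3) - 2) < eps:
                return False
    return True


for eps in (1.0, 0.1, 1e-3, 1e-6):
    print(eps, delta_for(eps), check(eps))
```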

Example 1.2.2 For the limit $\lim_{x \to 5}\sqrt{x-1} = 2$ and the given $\varepsilon = 1$, find a $\delta > 0$ such that the inequality $|\sqrt{x-1} - 2| < 1$ holds for all $x$ with $0 < |x - 5| < \delta$.

Solution: We solve the inequality $|\sqrt{x-1} - 2| < 1$:

$$|\sqrt{x-1} - 2| < 1 \iff -1 < \sqrt{x-1} - 2 < 1 \iff 1 < \sqrt{x-1} < 3 \iff 1 < x - 1 < 9 \iff 2 < x < 10,$$

which means that the inequality holds for all $x$ in $(2, 10)$.

The distance from the given point $x_0 = 5$ to the nearer endpoint of $(2, 10)$ is $5 - 2 = 3 < 10 - 5 = 5$. Thus, if we take $0 < \delta \le 3$, then $2 \le 5 - \delta$ and $5 + \delta \le 8$, so the interval $(5 - \delta, 5 + \delta)$ lies in $(2, 10)$. That is, if $x$ satisfies $0 < |x - 5| < \delta$, then clearly $2 < x < 10$, so that $|\sqrt{x-1} - 2| < 1$ holds. □

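Example 1.2.3 For $f(x) = x^2$, show that $\lim_{x \to 2} f(x) = 4$.

Solution: Given any $\varepsilon > 0$, we want to find a $\delta > 0$ such that $|f(x) - 4| < \varepsilon$ whenever $0 < |x - 2| < \delta$. We may assume $\varepsilon < 4$, since a $\delta$ that works for a smaller $\varepsilon$ also works for a larger one.

1. Solve $|f(x) - 4| < \varepsilon$ about $x_0 = 2$: For $x \ne 2$,

$$|x^2 - 4| < \varepsilon \iff -\varepsilon < x^2 - 4 < \varepsilon \iff 4 - \varepsilon < x^2 < 4 + \varepsilon \iff \sqrt{4 - \varepsilon} < |x| < \sqrt{4 + \varepsilon},$$

and, for $x > 0$ (as is the case near $x_0 = 2$), this is equivalent to

$$\sqrt{4 - \varepsilon} < x < \sqrt{4 + \varepsilon}.$$

2. Choose a value $\delta > 0$ such that $(2 - \delta, 2 + \delta) \subseteq (\sqrt{4 - \varepsilon}, \sqrt{4 + \varepsilon})$. For this, one can take $\delta \le \min\{2 - \sqrt{4 - \varepsilon},\ \sqrt{4 + \varepsilon} - 2\}$. Then clearly $0 < |x - 2| < \delta$ implies that $\sqrt{4 - \varepsilon} < x < \sqrt{4 + \varepsilon}$, or $|f(x) - 4| < \varepsilon$. □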
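To see how the $\delta$ of Example 1.2.3 depends on $\varepsilon$, one can evaluate the formula $\min\{2 - \sqrt{4 - \varepsilon},\ \sqrt{4 + \varepsilon} - 2\}$ directly. The following Python sketch is an added illustration under the assumption $0 < \varepsilon < 4$; the function names are hypothetical, and the sampling test is only a sanity check.

```python
import math


def delta_for(eps):
    """delta = min{2 - sqrt(4 - eps), sqrt(4 + eps) - 2}, for 0 < eps < 4."""
    if not 0.0 < eps < 4.0:
        raise ValueError("this formula assumes 0 < eps < 4")
    return min(2.0 - math.sqrt(4.0 - eps), math.sqrt(4.0 + eps) - 2.0)


def holds_on_samples(eps, samples=1000):
    """Test |x^2 - 4| < eps at sample points with 0 < |x - 2| < delta."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples  # strictly between 0 and delta
        if abs((2.0 - offset) ** 2 - 4.0) >= eps:
            return False
        if abs((2.0 + offset) ** 2 - 4.0) >= eps:
            return False
    return True


for eps in (1.0, 0.1, 0.001):
    print(eps, delta_for(eps), holds_on_samples(eps))
```

Because the square root is concave, the second term $\sqrt{4 + \varepsilon} - 2$ is always the smaller of the two, so the minimum is attained on the right-hand side of $x_0 = 2$.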

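The next theorem tells how to calculate the limits of functions that are arithmetic combinations of functions whose limits are known.

Theorem 1.2.1 Suppose that $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} g(x) = M$. Then

(1) $\lim_{x \to c}(f(x) \pm g(x)) = L \pm M$.

(2) $\lim_{x \to c}(f(x) \cdot g(x)) = L \cdot M$.

(3) $\lim_{x \to c}(k f(x)) = kL$ for any constant $k$.

(4) $\lim_{x \to c}\frac{f(x)}{g(x)} = \frac{L}{M}$, provided $M \ne 0$.

(5) $\lim_{x \to c}(f(x))^{r/s} = L^{r/s}$ for integers $r$ and $s \ne 0$, provided $L^{r/s}$ is a real number.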

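Proof: We prove (1), and leave the others to the reader. For a given $\varepsilon > 0$, we want to find a $\delta > 0$ such that $|f(x) + g(x) - (L + M)| < \varepsilon$ whenever $0 < |x - c| < \delta$. By the triangle inequality,

$$|f(x) + g(x) - (L + M)| = |(f(x) - L) + (g(x) - M)| \le |f(x) - L| + |g(x) - M|.$$

Since $\lim_{x \to c} f(x) = L$, there exists $\delta_1 > 0$ such that $|f(x) - L| < \frac{\varepsilon}{2}$ whenever $0 < |x - c| < \delta_1$. Similarly, $\lim_{x \to c} g(x) = M$ implies that there exists $\delta_2 > 0$ such that $|g(x) - M| < \frac{\varepsilon}{2}$ whenever $0 < |x - c| < \delta_2$. Choose $\delta = \min\{\delta_1, \delta_2\} > 0$. Then $0 < |x - c| < \delta$ implies that $|f(x) - L| < \frac{\varepsilon}{2}$ and $|g(x) - M| < \frac{\varepsilon}{2}$. Thus,

$$|f(x) + g(x) - (L + M)| \le |f(x) - L| + |g(x) - M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \qquad \square$$

The following is a direct consequence of Theorem 1.2.1.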

Theorem 1.2.2 If $P(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$ and $Q(x)$ are polynomials, then

(1) $\lim_{x \to c} P(x) = P(c)$.

(2) $\lim_{x \to c} \frac{P(x)}{Q(x)} = \frac{P(c)}{Q(c)}$, provided $Q(c) \ne 0$.


Theorem 1.2.3 If $f(x) \le g(x)$ for all $x$ in an open interval containing $c$, except possibly at $x = c$, and the limits of both $f$ and $g$ exist as $x$ approaches $c$, then $\lim_{x \to c} f(x) \le \lim_{x \to c} g(x)$.

Proof: Let $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} g(x) = M$. Suppose, to the contrary, that $L > M$. From (1) in Theorem 1.2.1, we have $\lim_{x \to c}(g(x) - f(x)) = M - L$. Thus, for any $\varepsilon > 0$, there is a $\delta > 0$ such that

$$|(g(x) - f(x)) - (M - L)| < \varepsilon$$

whenever $0 < |x - c| < \delta$. Since $L - M > 0$, take $\varepsilon = L - M$. Then for this $\varepsilon$, there exists a $\delta > 0$ such that, for $x$ with $0 < |x - c| < \delta$,

$$(g(x) - f(x)) - (M - L) < \varepsilon = L - M$$

holds. This implies $g(x) < f(x)$ for $x$ close to $c$, which is a contradiction. □

Corollary 1.2.4 (Sandwich Theorem) Suppose that $g(x) \le f(x) \le h(x)$ on some open interval about $c$, except possibly at $c$ itself. If

$$\lim_{x \to c} g(x) = \lim_{x \to c} h(x) = L,$$

then $\lim_{x \to c} f(x) = L$.

1.2.1 One-sided Limits and Limits at Infinity

(1) If $f(x)$ is defined on an interval $(c, b)$ with $c < b$, then the right-hand limit $L$ of $f$ at $c$ is defined as

$$\lim_{x \to c^+} f(x) = L$$

if $\forall\, \varepsilon > 0\ \exists\, \delta > 0$ such that $|f(x) - L| < \varepsilon$ if $0 < x - c < \delta$.

(2) If $f(x)$ is defined on an interval $(a, c)$ with $a < c$, then the left-hand limit $M$ of $f$ at $c$ is defined as

$$\lim_{x \to c^-} f(x) = M$$

if $\forall\, \varepsilon > 0\ \exists\, \delta > 0$ such that $|f(x) - M| < \varepsilon$ if $0 < c - x < \delta$.

(3) If $f(x)$ is defined on an interval $(a, b)$ with $c \in (a, b)$, then $f$ has a limit at $c$ iff it has both left-hand and right-hand limits and these one-sided limits are equal.

(4) By $\lim_{x \to \infty} f(x) = L$ we mean that $\forall\, \varepsilon > 0\ \exists\, N > 0$ such that $|f(x) - L| < \varepsilon$ if $x > N$.

(5) By $\lim_{x \to -\infty} f(x) = L$ we mean that $\forall\, \varepsilon > 0\ \exists\, N > 0$ such that $|f(x) - L| < \varepsilon$ if $x < -N$.

In cases (4) and (5), the line $y = L$ is called a horizontal asymptote. For these limits, rules similar to those of Theorem 1.2.1 hold.

The meaning of the following is suggested by the notations themselves.

(6) By $\lim_{x \to c} f(x) = \infty$ we mean that $\forall\, M > 0\ \exists\, \delta > 0$ such that $f(x) > M$ if $0 < |x - c| < \delta$.

(7) By $\lim_{x \to c} f(x) = -\infty$ we mean that $\forall\, M > 0\ \exists\, \delta > 0$ such that $f(x) < -M$ if $0 < |x - c| < \delta$.

In cases (6) and (7), the vertical line $x = c$ is called a vertical asymptote of $f$.

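Example 1.2.4 Show that

$$(1)\ \lim_{\theta \to 0} \sin\theta = 0, \qquad (2)\ \lim_{\theta \to 0} \cos\theta = 1, \qquad (3)\ \lim_{\theta \to 0} \frac{\sin\theta}{\theta} = 1.$$

Solution: (1) From the definition of $\sin\theta$, we have $-|\theta| \le \sin\theta \le |\theta|$ for all $\theta$. Since $\lim_{\theta \to 0}(-|\theta|) = \lim_{\theta \to 0}(|\theta|) = 0$, by Corollary 1.2.4, we have $\lim_{\theta \to 0} \sin\theta = 0$.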

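[Figure: the unit circle with the angle $\theta$ and the segments $\cos\theta$, $\sin\theta$, and $\tan\theta$ (points O, P, Q, A, T); the graph of $y = \sin\theta$ between $y = |\theta|$ and $y = -|\theta|$; and the graph of $y = 1 - \cos\theta$ together with $y = |\theta|$.]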


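(2) From the definition of $\cos\theta$, we have $0 \le 1 - \cos\theta \le |\theta|$ for all $\theta$. Since $\lim_{\theta \to 0}(|\theta|) = 0$, again by Corollary 1.2.4, we have $\lim_{\theta \to 0} \cos\theta = 1$.

(3) We prove the right-hand limit first. Assume that $0 < \theta < \frac{\pi}{2}$. Then from the figure above, we have $\sin\theta < \theta < \tan\theta$. Thus

$$1 < \frac{\theta}{\sin\theta} < \frac{1}{\cos\theta}, \quad \text{or} \quad 1 > \frac{\sin\theta}{\theta} > \cos\theta.$$

Since $\lim_{\theta \to 0} \cos\theta = 1$, by Corollary 1.2.4 again, $\lim_{\theta \to 0^+} \frac{\sin\theta}{\theta} = 1$. Since $\frac{\sin\theta}{\theta}$ is an even function, we also have $\lim_{\theta \to 0^-} \frac{\sin\theta}{\theta} = 1$. □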
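The squeeze $\cos\theta < \frac{\sin\theta}{\theta} < 1$ used in part (3) can be observed numerically. The short Python sketch below is an added illustration (the helper name `squeeze_row` is hypothetical); it tabulates the three quantities for a few small angles in radians.

```python
import math


def squeeze_row(theta):
    """Return (cos(theta), sin(theta)/theta, 1) for a nonzero angle in radians."""
    return math.cos(theta), math.sin(theta) / theta, 1.0


for k in range(1, 6):
    theta = 10.0 ** (-k)
    lower, middle, upper = squeeze_row(theta)
    print(f"theta={theta:.0e}  cos={lower:.12f}  sin/theta={middle:.12f}")
```

As $\theta$ shrinks, both the lower bound and the middle column approach 1, in line with Corollary 1.2.4.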

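Example 1.2.5 Prove that $\lim_{x \to 0^+} \sqrt{x} = 0$.

Solution: Let $\varepsilon > 0$ be given. Here we have $x_0 = 0$ and $L = 0$. For $x > 0$,

$$|\sqrt{x} - 0| = |\sqrt{x}| < \varepsilon \iff 0 < \sqrt{x} < \varepsilon \iff 0 < x < \varepsilon^2.$$

Hence, if we choose $0 < \delta \le \varepsilon^2$, then for $0 < x < \delta \le \varepsilon^2$ we have

$$0 < x < \varepsilon^2 \ \Rightarrow\ 0 < \sqrt{x} < \varepsilon. \qquad \square$$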

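Example 1.2.6 Prove that $\lim_{x \to \infty} \frac{1}{x} = 0$.

Solution: Let $\varepsilon > 0$ be given. We need an $M > 0$ such that $\left|\frac{1}{x} - 0\right| = \left|\frac{1}{x}\right| < \varepsilon$ for all $x$ with $x > M$. It suffices to take $M \ge \frac{1}{\varepsilon}$: then $x > M$ implies $0 < \frac{1}{x} < \frac{1}{M} \le \varepsilon$, so that $\lim_{x \to \infty} \frac{1}{x} = 0$. This also shows that the $x$-axis is a horizontal asymptote of $f(x) = \frac{1}{x}$ on the right. □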

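Example 1.2.7 Prove that $\lim_{x \to 0} \frac{1}{x^2} = \infty$.

Solution: Let $B > 0$ be given. We want to find a $\delta > 0$ such that $\frac{1}{x^2} > B$ for all $x$ with $0 < |x - 0| < \delta$. Note that, for $x \ne 0$,

$$\frac{1}{x^2} > B \iff x^2 < \frac{1}{B} \iff |x| < \frac{1}{\sqrt{B}}.$$

Thus, we simply take $\delta \le \frac{1}{\sqrt{B}}$. Then for $x$ with $0 < |x| < \delta$, we have $\frac{1}{x^2} > \frac{1}{\delta^2} \ge B$. This also shows that the $y$-axis is a vertical asymptote of $f(x) = \frac{1}{x^2}$ at $x = 0$. □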


1.2.2 Exercises

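1. For the functions $f(x)$ graphed here, find the left-hand limits, the right-hand limits, and the limits, or explain why they do not exist, at the points $x_0 = 1, 2, 3$:

[Figure: two graphs of $f(x)$, each on axes with ticks at $x = 1, 2, 3$ and $y = 1$; the second also marks $y = -1$.]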

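2. If $\lim_{x \to 1} f(x) = 5$, can we say anything about the value of $f$ at $x = 1$? Explain.

3. Prove that $\lim_{x \to a} c = c$ and $\lim_{x \to a} x = a$. Use this to find the following limits:

$$(1)\ \lim_{x \to -1} \frac{x^3 + 4x^2 - 3}{x^2 + 5}, \qquad (2)\ \lim_{x \to -2} \sqrt{4x^2 - 3}, \qquad (3)\ \lim_{x \to 1} \frac{x^2 + x - 2}{x^2 - x}.$$

4. Find the following limits:

$$(1)\ \lim_{x \to 0} \frac{\sqrt{x^2 + 100} - 10}{x^2}, \qquad (2)\ \lim_{x \to 4} \frac{4 - x}{\sqrt{x^2 + 9}}.$$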

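5. For the given $\varepsilon > 0$ and $x_0$, find $\delta > 0$ such that $|f(x) - L| < \varepsilon$ for all $x$ with $0 < |x - x_0| < \delta$.

(1) $f(x) = 2x - 2$, $L = 2$, $x_0 = 2$, $\varepsilon = 0.02$.

(2) $f(x) = \sqrt{x - 5}$, $L = 2$, $x_0 = 9$, $\varepsilon = 1$.

(3) $f(x) = \frac{1}{x}$, $L = \frac{1}{4}$, $x_0 = 4$, $\varepsilon = 0.05$.

(4) $f(x) = x^2 - 5$, $L = 11$, $x_0 = 4$, $\varepsilon = 0.5$.

6. Prove the following limit statements:

(1) $\lim_{x \to 0} \sqrt{4 - x} = 2$.

(2) $\lim_{x \to -2} f(x) = 4$ for $f(x) = \begin{cases} x^2, & x \ne -2 \\ 1, & x = -2. \end{cases}$

(3) $\lim_{x \to 0} x \sin\frac{1}{x} = 0$.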

7. Prove or disprove the following statements:

(1) The number $L$ is the limit of $f(x)$ as $x$ approaches $x_0$ if $f(x)$ gets closer to $L$ as $x$ approaches $x_0$.

(2) The number $L$ is the limit of $f(x)$ as $x$ approaches $x_0$ if, given any $\varepsilon > 0$, there exists a value of $x$ for which $|f(x) - L| < \varepsilon$.

8. Find the limit and the asymptote of the following functions:

$$(1)\ \lim_{x \to \infty} \frac{5x^2 + 8x - 3}{3x^2 + 2}, \qquad (2)\ \lim_{x \to 3} \frac{1}{(x - 3)^2}.$$

9. Find the asymptotes of the following functions:

$$(1)\ y = 2 + \frac{\sin x}{x}, \qquad (2)\ y = \frac{x + 3}{x + 2}, \qquad (3)\ y = -\frac{8}{x^2 - 4}, \qquad (4)\ y = \frac{x^2 - 3}{2x - 4}.$$

.

1.3 Continuity of Functions

Definition 1.3.1 A function $f : (a, b) \to \mathbb{R}$ defined on an interval $(a, b)$ is continuous at a point $c \in (a, b)$ if

$$\lim_{x \to c} f(x) = f(c),$$

i.e., if (i) $\lim_{x \to c} f(x)$ exists, (ii) $f(c)$ is defined, (iii) $\lim_{x \to c} f(x) = f(c)$.

Continuity of $f$ at the endpoints $a$ and $b$ is defined by the right-hand limit of $f$ at $a$ and the left-hand limit of $f$ at $b$.

A function is continuous on an interval if it is continuous at every point of the interval.

Theorem 1.3.1 Let $f$ and $g$ be continuous functions at $c$. Then the following algebraic combinations of them are also continuous at $c$:

$$f \pm g, \quad f \cdot g, \quad kf, \quad f^{r/s}, \quad \frac{f}{g} \ \text{ provided } g(c) \ne 0.$$

This theorem can be proved from the limit rules in Theorem 1.2.1. For instance, the first part can be seen by:

$$\lim_{x \to c}(f \pm g)(x) = \lim_{x \to c}(f(x) \pm g(x)) = \lim_{x \to c} f(x) \pm \lim_{x \to c} g(x) = f(c) \pm g(c) = (f \pm g)(c).$$

Thus the polynomials and rational functions are all continuous wherever they are defined.

The functions $y = \sin x$ and $y = \cos x$ are continuous at $x = 0$ by Example 1.2.4. In fact, they are continuous everywhere (see problem 5 in Exercises 1.3.1). Hence, by Theorem 1.3.1, all six trigonometric functions are continuous where they are defined.


Example 1.3.1 Prove that the function $f(x) = |x|$ is continuous everywhere in $\mathbb{R}$.

Solution: Note that

$$f(x) = |x| = \begin{cases} x, & x > 0 \\ -x, & x < 0, \end{cases}$$

which consists of two polynomials on $\mathbb{R} - \{0\}$. At the origin, $\lim_{x \to 0} |x| = 0 = |0|$. Thus the function is continuous on $\mathbb{R}$. □

Theorem 1.3.2 If $f$ is continuous at $c$ and $g$ is continuous at $f(c)$, then the composition $g \circ f$ is continuous at $c$.

If a function $f$ is not defined at $c$, so that $f(c)$ does not exist, but $\lim_{x \to c} f(x) = L$ exists, then one can define a new function $F(x)$ by the rule:

$$F(x) = \begin{cases} f(x), & \text{if } x \text{ is in the domain of } f \\ L, & \text{if } x = c. \end{cases}$$

The function $F$ is continuous at $x = c$, and is called the continuous extension of $f$ to $x = c$.

Example 1.3.2 The domain of the function $f(x) = \frac{\sin x}{x}$ is $\mathbb{R} - \{0\}$, on which it is continuous everywhere. However, since $\lim_{x \to 0} f(x) = 1$, we can extend the domain of $f$ to the whole real line $\mathbb{R}$ by

$$F(x) = \begin{cases} \frac{\sin x}{x}, & x \ne 0 \\ 1, & x = 0, \end{cases}$$

which is continuous on $\mathbb{R}$. □
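The continuous extension in Example 1.3.2 translates directly into code: the formula is used away from $0$, and the limit value fills the gap at $0$. This is a minimal Python sketch added for illustration, with the function name `F` mirroring the text.

```python
import math


def F(x):
    """Continuous extension of sin(x)/x: the value at 0 is set to the limit 1."""
    if x == 0.0:
        return 1.0
    return math.sin(x) / x


# Values near 0 stay close to F(0) = 1, as continuity at 0 requires.
for x in (0.1, 0.01, 0.001, 0.0):
    print(x, F(x))
```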

Example 1.3.3 The domain of the function $f(x) = \frac{x^2 + x - 6}{x^2 - 4}$ is $\mathbb{R} - \{\pm 2\}$, on which it is continuous everywhere. Can $f(x)$ be extended to a bigger set?

Solution: Note that, for $x \ne \pm 2$,

$$f(x) = \frac{x^2 + x - 6}{x^2 - 4} = \frac{(x + 3)(x - 2)}{(x + 2)(x - 2)} = \frac{x + 3}{x + 2},$$

so that $\lim_{x \to 2} f(x) = \frac{5}{4}$. Thus $f(x)$ can be extended to $\mathbb{R} - \{-2\}$ by

$$F(x) = \begin{cases} \frac{x^2 + x - 6}{x^2 - 4}, & x \ne \pm 2 \\ \frac{5}{4}, & x = 2. \end{cases}$$

However, since $\lim_{x \to -2^{\pm}} f(x) = \pm\infty$, it cannot be extended continuously to $\mathbb{R}$. □

Continuous functions have a very useful property, called the "intermediate value property", which says that whenever such a function takes on two values in its range, it takes on all the values in between:

Theorem 1.3.3 If $f : [a, b] \to \mathbb{R}$ is continuous on the closed interval $[a, b]$, then $f$ takes its maximum and minimum at some points $c, d \in [a, b]$.

Theorem 1.3.4 (Intermediate Value Theorem) Let $f : [a, b] \to \mathbb{R}$ be a continuous function on $[a, b]$. Then, for any value $y_0$ between $f(a)$ and $f(b)$, there is a point $c \in [a, b]$ such that $f(c) = y_0$.

The proof of this theorem depends on the completeness property of $\mathbb{R}$, and may be found in an advanced calculus text.
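The Intermediate Value Theorem only asserts that a point $c$ exists, but it also suggests a way to approximate one: repeatedly halve the interval and keep the half on which $y_0$ still lies between the endpoint values. The following Python sketch of this bisection idea is an addition for illustration, not a method given in the notes; the function name `bisect` and the parameters `tol` and `max_iter` are assumptions.

```python
def bisect(f, a, b, y0=0.0, tol=1e-12, max_iter=200):
    """Approximate a point c in [a, b] with f(c) = y0.

    Assumes f is continuous on [a, b] and y0 lies between f(a) and f(b),
    which is exactly the hypothesis of the Intermediate Value Theorem.
    """
    fa, fb = f(a) - y0, f(b) - y0
    if fa == 0.0:
        return a
    if fb == 0.0:
        return b
    if fa * fb > 0:
        raise ValueError("y0 is not between f(a) and f(b)")
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        fc = f(c) - y0
        if fc == 0.0 or (b - a) < tol:
            return c
        if fa * fc < 0:   # sign change on [a, c]: keep the left half
            b, fb = c, fc
        else:             # otherwise the sign change is on [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)


# f(x) = x^2 is continuous on [0, 2] with f(0) = 0 < 2 < 4 = f(2).
c = bisect(lambda x: x * x, 0.0, 2.0, y0=2.0)
print(c, c * c)
```

Each step halves the interval, so the enclosing interval shrinks to a point where the theorem guarantees the value $y_0$ is attained.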

1.3.1 Exercises

1. Find the points where the following functions fail to be continuous. At which points, if any, are the discontinuities removable? Not removable? Give reasons for your answers.

$$(1)\ y = \frac{1}{(x - 2)^2} + 4. \qquad (2)\ y = \frac{x + 1}{x^2 - 4x + 3}.$$

$$(3)\ y = \frac{\cos x}{x}. \qquad (4)\ y = \frac{\sqrt{x^4 + 1}}{1 + \sin^2 x}.$$

2. For what value of $a$ is

$$f(x) = \begin{cases} x^2 - 1, & x < 3 \\ 2ax, & x \ge 3 \end{cases}$$

continuous at every $x$?

3. Suppose that a function $f$ is continuous on the closed interval $[0, 1]$, and $0 \le f(x) \le 1$ for every $x \in [0, 1]$. Show that there must exist a number $c$ in $[0, 1]$ such that $f(c) = c$ ($c$ is called a fixed point of $f$).

4. Suppose that a function $f$ defined on an interval $(a, b)$ is continuous at some $c \in (a, b)$ and $f(c) > 0$. Prove that there is an interval $(c - \delta, c + \delta)$ about $c$ such that $f(x) > 0$ for all $x \in (c - \delta, c + \delta)$.

5. Prove that $f$ is continuous at $a$ if and only if $\lim_{h \to 0} f(a + h) = f(a)$. Use this fact to prove that $y = \sin x$ and $y = \cos x$ are continuous at every point $x = a$.


1.4 Derivatives

A newly designed car by Hyundai Motors has been tested, and its fuel economy (kilometers per liter) as a function of the speed is shown in the following diagram:

[Figure: fuel economy (km/liter) plotted against speed (km/hr), with the speeds A, B, and C marked on the horizontal axis.]

At the speed A the fuel economy is still improving, and at the speed C it is getting worse as the speed increases, since the curve is rising at A and falling at C. At the speed B, however, it is instantaneously standing still, which means that the curve has reached its peak. Thus the speed B is the most economical speed of the car. Now the problem is: how can we find the point B? In the figure, we draw lines tangent to the graph at several points and compare their slopes, and we pick the point B because the tangent line there is horizontal; that is, the curve is instantaneously neither increasing nor decreasing there. Then how can we find the lines tangent to the curve, and how can we compute their slopes?

To find the tangent line at a point P of the curve, we pick a point Q on the curve near P, draw the secant line PQ joining P and Q, and then move the point Q toward P along the curve.

[Figure: a curve with a point P and nearby points Q on either side, showing the secant lines PQ.]

We do this on both sides of P: from the left and from the right. If the two limit lines exist as Q approaches P, and they coincide, the common line is called the tangent line to the curve at P.

Now, the slope of the tangent line will be the limit of the slopes of the secant lines PQ. Suppose the points are denoted by $P = (x_0, f(x_0))$ and $Q = (x, f(x))$. Then the slope of the secant line PQ is $\frac{f(x) - f(x_0)}{x - x_0}$. Thus the slope of the tangent line is

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0},$$

provided the limit exists.

Historically, the slope of the secant line was called the difference quotient of $f$ at $x_0$, and the limit, as $x$ approaches $x_0$, of the difference quotient was called the derivative of $f$ at $x_0$, and denoted by

$$f'(x_0) = \frac{df(x_0)}{dx} = \frac{dy}{dx}(x_0) = y'(x_0).$$

In this case, $f$ is said to be differentiable, or to have a derivative, at $x_0$. In practical terms, it represents the rate of change of $f(x)$ as $x$ moves near $x_0$.

Now, the point $x_0$ can be any point in the domain of $f$, considered as a variable in the domain, and so we replace it by $x$:

Definition 1.4.1 The derivative of $f$ is the function $f'$ whose value at $x$ is

$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$

If $f'$ exists at every point in the domain $I$ of $f$, we call $f$ differentiable on $I$. There are many notations for the same derivative of $f$:

$$f'(x) = f' = y' = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) = D(f)(x).$$

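Example 1.4.1 Find the derivative of $y = \sqrt{x}$, and the tangent line at $x = 4$.

Solution: A direct computation from the definition goes as follows:

$$f'(x) = \lim_{z \to x} \frac{f(z) - f(x)}{z - x} = \lim_{z \to x} \frac{\sqrt{z} - \sqrt{x}}{z - x} = \lim_{z \to x} \frac{\sqrt{z} - \sqrt{x}}{(\sqrt{z} - \sqrt{x})(\sqrt{z} + \sqrt{x})} = \lim_{z \to x} \frac{1}{\sqrt{z} + \sqrt{x}} = \frac{1}{2\sqrt{x}}.$$

At $x = 4$, $f'(4) = \frac{1}{2\sqrt{4}} = \frac{1}{4}$. Thus the tangent line is

$$y - 2 = \frac{1}{4}(x - 4), \quad \text{or} \quad y = \frac{1}{4}x + 1. \qquad \square$$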
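The limit computed in Example 1.4.1 can be watched numerically by evaluating the difference quotient of $\sqrt{x}$ at $x = 4$ for shrinking steps $h$. The Python sketch below is an added illustration; the helper names `difference_quotient` and `tangent_line` are hypothetical.

```python
import math


def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h


def tangent_line(x):
    """Tangent line to y = sqrt(x) at x = 4, as found in Example 1.4.1."""
    return 0.25 * x + 1.0


# The secant slopes approach f'(4) = 1/4 as h shrinks.
for k in range(1, 7):
    h = 10.0 ** (-k)
    print(h, difference_quotient(math.sqrt, 4.0, h))
```

For very small `h` the subtraction `f(x + h) - f(x)` loses precision in floating point, so the steps are kept moderate here.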

Example 1.4.2 Find the derivative of $y = |x|$ at $x = 0$.

Solution: The right-hand and left-hand derivatives of $|x|$ at $x = 0$ are

$$f'(0^+) = \lim_{h \to 0^+} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^+} \frac{|h|}{h} = \lim_{h \to 0^+} \frac{h}{h} = 1,$$

$$f'(0^-) = \lim_{h \to 0^-} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^-} \frac{|h|}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1,$$

which are different. Thus $|x|$ is not differentiable at $x = 0$. □
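The disagreement between the two one-sided derivatives is easy to see numerically. The following Python sketch is an added illustration, with the hypothetical helper `one_sided_quotients`; it computes the left and right difference quotients of a function at a point.

```python
def one_sided_quotients(f, x0, h):
    """Return (left, right) difference quotients of f at x0 for a step h > 0."""
    right = (f(x0 + h) - f(x0)) / h
    left = (f(x0 - h) - f(x0)) / (-h)
    return left, right


# For |x| at 0 the two quotients stay at -1 and +1 however small h is.
for h in (1e-1, 1e-3, 1e-6):
    print(h, one_sided_quotients(abs, 0.0, h))

# For comparison, x^2 at 0: both quotients tend to the same value 0.
print(one_sided_quotients(lambda x: x * x, 0.0, 1e-3))
```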

For the example given at the outset of this section, we are looking for a point B where $f'(x) = 0$.

Theorem 1.4.1 (First derivative test) If a differentiable function $f$ has a local extremum at an interior point $c$ of its domain, then $f'(c) = 0$.

Theorem 1.4.2 If $f$ has a derivative at $x = c$, then $f$ is continuous at $x = c$.

Proof: We need to show that $\lim_{x \to c} f(x) = f(c)$, or $\lim_{h \to 0} f(c + h) = f(c)$. Since $f'(c)$ exists,

$$\lim_{h \to 0} f(c + h) = \lim_{h \to 0} \left[f(c) + (f(c + h) - f(c))\right] = \lim_{h \to 0} f(c) + \lim_{h \to 0} \frac{f(c + h) - f(c)}{h} \cdot h = f(c) + f'(c) \cdot 0 = f(c). \qquad \square$$

Theorem 1.4.3 (Intermediate value theorem of derivatives) If $f$ is differentiable on an interval $I = [a, b]$, then for any value $z$ between $f'(a)$ and $f'(b)$, there is a point $c \in [a, b]$ such that $f'(c) = z$.

Proof: Suppose that $f'(a) < z < f'(b)$; if $z$ equals $f'(a)$ or $f'(b)$ there is nothing to prove, and the case $f'(a) > z > f'(b)$ is similar. Consider the function $g(x) = f(x) - zx$, which is continuous on $I$. Then $g'(a^+) = f'(a^+) - z < 0$ implies $g(a) > g(x_1)$ at some $x_1 \in (a, b)$. Similarly, $g'(b^-) = f'(b^-) - z > 0$ implies $g(x_2) < g(b)$ at some $x_2 \in (a, b)$. Thus the minimum of $g(x)$ on $[a, b]$, which exists by Theorem 1.3.3, is attained at some point $c$ in $(a, b)$. Since $g$ is differentiable at $c \in (a, b)$, by Theorem 1.4.1 we must have

$$0 = g'(c) = f'(c) - z, \quad \text{or} \quad f'(c) = z. \qquad \square$$

Theorem 1.4.4 (Differentiation rules) Let $f$ and $g$ be differentiable functions on an interval, and $k$ a constant.

(1) $(kf + g)' = kf' + g'$.

(2) $(f \cdot g)' = f' \cdot g + f \cdot g'$.

(3) $\left(\frac{f}{g}\right)' = \frac{f' \cdot g - f \cdot g'}{g^2}$.

(4) $(x^n)' = n x^{n-1}$, for any integer $n$ ($x \ne 0$ if $n < 0$).

Proof: (1) is clear and left for the reader. (2) (Product rule)

$$\frac{d}{dx}(f(x)g(x)) = \lim_{h \to 0} \frac{f(x + h)g(x + h) - f(x + h)g(x) + f(x + h)g(x) - f(x)g(x)}{h}$$

$$= \lim_{h \to 0} \left[ f(x + h)\,\frac{g(x + h) - g(x)}{h} + g(x)\,\frac{f(x + h) - f(x)}{h} \right]$$

$$= \lim_{h \to 0} f(x + h) \cdot \lim_{h \to 0} \frac{g(x + h) - g(x)}{h} + g(x) \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

$$= f(x)g'(x) + g(x)f'(x).$$

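(3) (Quotient rule)

$$\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \lim_{h \to 0} \frac{\frac{f(x + h)}{g(x + h)} - \frac{f(x)}{g(x)}}{h} = \lim_{h \to 0} \frac{g(x)f(x + h) - f(x)g(x + h)}{h\,g(x + h)g(x)}$$

$$= \lim_{h \to 0} \frac{f(x + h)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(x + h)}{h\,g(x + h)g(x)}$$

$$= \lim_{h \to 0} \frac{g(x)\,\frac{f(x + h) - f(x)}{h} - f(x)\,\frac{g(x + h) - g(x)}{h}}{g(x + h)g(x)}$$

$$= \frac{g(x)f'(x) - f(x)g'(x)}{g(x)^2}.$$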


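(4) For a positive integer $n$, use induction on $n$: For $n = 1$, $x^1 = x$ and

$$\frac{d}{dx}(x) = \lim_{h \to 0} \frac{(x + h) - x}{h} = \lim_{h \to 0} \frac{h}{h} = 1 = 1 \cdot x^0 = 1 \cdot x^{1-1}.$$

Assume the formula holds for $n - 1$. Then, by the product rule,

$$\frac{d}{dx}(x^n) = \frac{d}{dx}(x \cdot x^{n-1}) = 1 \cdot x^{n-1} + x \cdot (n - 1)x^{n-2} = n x^{n-1}.$$

For a negative integer $n = -m$, $f(x) = x^n = \frac{1}{x^m}$. Thus, by the quotient rule,

$$\frac{d}{dx}(x^n) = \frac{d}{dx}\left(\frac{1}{x^m}\right) = \frac{x^m \cdot 0 - 1 \cdot (m x^{m-1})}{x^{2m}} = -m x^{-m-1} = n x^{n-1}.$$

□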

If $f'$ is differentiable, then its derivative is the second derivative of $f$, denoted by

\[ \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{dy'}{dx} = y'' = f''(x) = \frac{d^2 y}{dx^2} = D^2(f)(x). \]

The higher order derivatives are also defined in the same way.

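Example 1.4.3 Show that $\frac{d}{dx}(\sin x) = \cos x$ and $\frac{d}{dx}(\cos x) = -\sin x$.

Solution: From the usual identities of the trigonometric functions,

\[
\begin{aligned}
\frac{d}{dx}(\sin x)
&= \lim_{h\to 0} \frac{\sin(x+h) - \sin x}{h}
 = \lim_{h\to 0} \frac{(\sin x \cos h + \cos x \sin h) - \sin x}{h} \\
&= \lim_{h\to 0} \frac{\sin x(\cos h - 1) + \cos x \sin h}{h} \\
&= \sin x \lim_{h\to 0} \frac{\cos h - 1}{h} + \cos x \lim_{h\to 0} \frac{\sin h}{h}
 = \sin x \cdot 0 + \cos x \cdot 1 = \cos x.
\end{aligned}
\]

Similarly, one can find the derivative of $\cos x$. □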
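The limit in Example 1.4.3 can also be observed numerically. The following is only an illustrative sketch: the helper name `forward_quotient` and the sample point `x0 = 1.0` are choices made here, not part of the text.

```python
import math

def forward_quotient(f, x, h):
    """Difference quotient (f(x + h) - f(x)) / h from the definition of the derivative."""
    return (f(x + h) - f(x)) / h

# Shrinking h should drive the quotient for sin toward cos(x0).
x0 = 1.0
for h in (1e-1, 1e-3, 1e-5):
    q = forward_quotient(math.sin, x0, h)
    print(f"h = {h:g}: quotient = {q:.8f}, cos(x0) = {math.cos(x0):.8f}")
```

The same helper applied to `math.cos` should approach `-math.sin(x0)`.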


1.4.1 Exercises

1. Prove the following formulas:

\[ (\tan x)' = \sec^2 x, \quad (\sec x)' = \sec x \tan x, \quad (\cot x)' = -\csc^2 x, \quad (\csc x)' = -\csc x \cot x. \]

2. Find the first and second derivatives of the functions:

(1) $y = 3x^{-2} - \dfrac{1}{x}$. (2) $y = (x^2 + 1)\left(x + 5 + \dfrac{1}{x}\right)$.

(3) $y = \sqrt{x} - \dfrac{1}{\sqrt{x}}$. (4) $y = \left(\dfrac{1 + 3x}{3x}\right)(3 - x)$.

1.5 Differentials

Let $y = f(x)$ be a differentiable function on an interval $I$, and $a \in I$. When $y_0 = f(a)$ is known, $f(x)$, for $x \in I$ close to $a$, can be approximated by the value on the tangent line to the graph at $a$ as follows:

The tangent line to $y = f(x)$ at $x = a$ is

\[ L(x) = f(a) + f'(a)(x - a). \]

Thus the increment $\Delta y = f(x) - f(a)$ of $f$ at $x \in I$ can be approximated by the increment $\Delta L = L(x) - f(a) = f'(a)(x - a)$ of the tangent line at $x$: i.e.,

\[ \Delta y = f(x) - f(a) \approx L(x) - f(a) = f'(a)\Delta x, \quad \Delta x = x - a, \]

or

\[ f(x) \approx L(x) = f(a) + f'(a)(x - a), \]

which is called the linear approximation, or linearization, of $f$ at $a$.

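[Figure: the graph of $y = f(x)$ and its tangent line $L(x)$ at $x = a$, showing $f(a)$, $f(x)$, $L(x)$ and the increments $\Delta y$ and $\Delta L$ between $a$ and $x$.]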

Originally, the notation $\frac{dy}{dx}$ for the derivative of $y$ with respect to $x$ does not represent the ratio of $dy$ and $dx$, but simply represents the slope of the graph of $y = f(x)$. However, the formula of the linear approximation shows that the new variables $dy$ and $dx$ can be separated and the derivative $\frac{dy}{dx}$ is a true ratio of them: From the first equation above, by setting $dx = \Delta x$ and $x = a$, the differential $dy$ of $f$ is defined as

\[ dy = f'(a)\,dx. \]

Geometrically, the differential $dy$ is the change $\Delta L = L(x) - L(a)$ in the linearization of $f$, given as $f'(a)\,dx$, when $x = a$ changes by an amount $dx = \Delta x$. If $dx \neq 0$, then the quotient of the differential $dy$ by the differential $dx$ is equal to the derivative $f'(x)$ because

\[ dy \div dx = \frac{f'(x)\,dx}{dx} = f'(x) = \frac{dy}{dx}. \]

Sometimes, we write $df = f'(x)\,dx$ in place of $dy = f'(x)\,dx$, calling it the differential of $f$.

Example 1.5.1 Find the linear approximation of $f(x) = \sqrt{1 + x}$ at $x = 3.2$.

Solution: Let $a = 3$. Then $f(3) = 2$, $f'(3) = \frac{1}{2}(1 + x)^{-1/2}\big|_{x=3} = \frac{1}{4}$, and so the linear approximation is

\[ L(3.2) = 2 + \frac{1}{4}(x - 3)\Big|_{x=3.2} = \frac{5}{4} + \frac{3.2}{4} = 2.050 \approx \sqrt{4.2} \approx 2.04939. \] □
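As a quick numerical companion to Example 1.5.1, the sketch below builds the linearization $L(x) = f(a) + f'(a)(x - a)$ and compares it with the true value. The helper name `linearization` is an assumption made for illustration.

```python
import math

def linearization(f, fprime, a):
    """Return the function L(x) = f(a) + f'(a)(x - a)."""
    fa, slope = f(a), fprime(a)
    return lambda x: fa + slope * (x - a)

f = lambda x: math.sqrt(1 + x)
fprime = lambda x: 0.5 / math.sqrt(1 + x)

L = linearization(f, fprime, 3.0)
estimate = L(3.2)      # value on the tangent line at a = 3
exact = f(3.2)         # true value sqrt(4.2)
print(estimate, exact, estimate - exact)
```

The printed difference is the approximation error $\Delta y - dy$ discussed next in the text.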

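The true change $\Delta y = f(x) - f(a)$ of $y = f(x)$ is approximated by the differential estimate $dy = f'(a)\,dx$ when $x$ changes by $dx = \Delta x = x - a$, with an error

\[
\begin{aligned}
\Delta y - dy &= \Delta y - f'(a)\,dx \\
&= f(x) - f(a) - f'(a)\Delta x \\
&= \left( \frac{f(a + \Delta x) - f(a)}{\Delta x} - f'(a) \right)\Delta x \\
&= \varepsilon \cdot \Delta x.
\end{aligned}
\]

Note that

\[ \varepsilon = \frac{f(a + \Delta x) - f(a)}{\Delta x} - f'(a) \to 0 \quad \text{as } \Delta x \to 0. \]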


Hence, the approximation error $\varepsilon\Delta x$ is small when $\Delta x$ is small. Therefore, we have

\[ \Delta y = f'(a)\Delta x + \varepsilon\Delta x, \]

where $\varepsilon \to 0$ as $\Delta x \to 0$. Note that this equation is sometimes taken as the definition of the differentiability of $f$ at $x = a$.

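Theorem 1.5.1 (Chain rule) Suppose that $y = g(x)$ is differentiable at $x$, and $z = f(y)$ is also differentiable at $y = g(x)$. Then the composition $z = f \circ g(x) = f(g(x)) = f(y)$ is differentiable at $x$ and

\[ (f \circ g)'(x) = f'(g(x)) \cdot g'(x), \quad \text{or} \quad \frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}. \]

Proof: Let $\Delta x$ be an increment in $x$ and let $\Delta y$ and $\Delta z$ be the corresponding increments in $y$ and $z$. Then, from the increment formula $\Delta y = f'(a)\Delta x + \varepsilon\Delta x$ above,

\[
\begin{aligned}
\Delta y &= g'(x)\Delta x + \varepsilon_1 \Delta x, \quad \varepsilon_1 \to 0 \text{ as } \Delta x \to 0, \\
\Delta z &= f'(y)\Delta y + \varepsilon_2 \Delta y, \quad \varepsilon_2 \to 0 \text{ as } \Delta y \to 0 \\
&= (f'(y) + \varepsilon_2)(g'(x) + \varepsilon_1)\Delta x.
\end{aligned}
\]

\[ \therefore\ \frac{\Delta z}{\Delta x} = f'(y)g'(x) + g'(x)\varepsilon_2 + f'(y)\varepsilon_1 + \varepsilon_1 \varepsilon_2. \]

Since $\Delta y \to 0$ as $\Delta x \to 0$, both $\varepsilon_1$ and $\varepsilon_2 \to 0$ as $\Delta x \to 0$. Thus

\[ \frac{dz}{dx} = \lim_{\Delta x \to 0} \frac{\Delta z}{\Delta x} = f'(y)g'(x) = f'(g(x)) \cdot g'(x). \]

□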

If $y = f(x)$ is a differentiable function of $x$ and $x = g(t)$ is a differentiable function of $t$, then

\[ \frac{dy}{dt} = \frac{dy}{dx}\,\frac{dx}{dt}, \quad \text{or} \quad \frac{d}{dt} f(x) = f'(x)\,\frac{dx}{dt}. \]

For example, if $x$ is a differentiable function of $t$, then the power chain rule says that

\[ \frac{d}{dt} x^n = n x^{n-1}\,\frac{dx}{dt}. \]


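Example 1.5.2

\[
\begin{aligned}
\frac{d}{dt}\tan(5 - \sin 2t) &= \sec^2(5 - \sin 2t)\,\frac{d}{dt}(5 - \sin 2t) \\
&= -2\cos 2t\,\sec^2(5 - \sin 2t). \\
\frac{d}{dx}\left(\frac{1}{3x - 2}\right) &= \frac{d}{dx}(3x - 2)^{-1} \\
&= -1(3x - 2)^{-2}\,\frac{d}{dx}(3x - 2) = -\frac{3}{(3x - 2)^2}.
\end{aligned}
\]

□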
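The first derivative in Example 1.5.2 can be sanity-checked against a central difference quotient. This is only a sketch; the sample point `t0 = 0.3` and the helper names are arbitrary choices made here.

```python
import math

def z(t):
    """z(t) = tan(5 - sin 2t), the composite function of Example 1.5.2."""
    return math.tan(5 - math.sin(2 * t))

def dz_dt(t):
    """Chain-rule result: -2 cos(2t) sec^2(5 - sin 2t), with sec = 1 / cos."""
    return -2 * math.cos(2 * t) / math.cos(5 - math.sin(2 * t)) ** 2

def central_difference(f, t, h=1e-6):
    """Symmetric difference quotient (f(t + h) - f(t - h)) / (2h)."""
    return (f(t + h) - f(t - h)) / (2 * h)

t0 = 0.3  # arbitrary sample point at which tan(5 - sin 2t) is defined
print(dz_dt(t0), central_difference(z, t0))
```

The two printed numbers should agree to several decimal places.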

Example 1.5.3 Note that, when we take the derivatives of the trigonometric functions, the variable $x$ is assumed to be measured in radians, not degrees: $180^\circ = \pi$ radians, or $x^\circ = \frac{\pi}{180}x$ radians. The chain rule says that

\[ \frac{d}{dx}\sin(x^\circ) = \frac{d}{dx}\sin\left(\frac{\pi x}{180}\right) = \frac{\pi}{180}\cos\frac{\pi x}{180} = \frac{\pi}{180}\cos(x^\circ). \] □

1.5.1 Parametric equations

If a car is moving on the $xy$-plane, the position $(x, y)$ of the car at time $t$ may be represented as $(x = f(t), y = g(t))$ over a time interval $t \in [a, b]$. In this case, the path along which the car moves is called a parametric curve, and the equations are called parametric equations for the curve in the parameter $t$. Thus the position of the car may be written as a vector valued function in $t$:

\[ \alpha(t) = (f(t), g(t)) \in \mathbb{R}^2, \quad t \in [a, b]. \]

For example, the parametric curve

\[ \alpha(t) = (\cos t, \sin t), \quad t \in [0, 2\pi] \]

represents the unit circle in the $xy$-plane, tracing the entire circle once counterclockwise.

If the two functions $f(t)$ and $g(t)$ are differentiable, we say the parametric curve $\alpha$ is differentiable and its derivative is denoted by $\alpha'(t) = (f'(t), g'(t))$.

Since, in

\[ \alpha'(t) = \frac{d\alpha}{dt} = \lim_{\Delta t \to 0} \frac{\alpha(t + \Delta t) - \alpha(t)}{\Delta t}, \]

the numerator is a vector from the position $\alpha(t)$ to $\alpha(t + \Delta t)$ and the denominator $\Delta t$ represents the time interval, $\alpha'(t)$ geometrically represents the infinitesimal direction of motion, called the velocity vector of the car, and the magnitude $\|\alpha'(t)\|$, the limiting ratio $\frac{\text{distance}}{\text{time}}$, is called the speed of the car.


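[Figure: a parametric curve in the $xy$-plane with the points $\alpha(t)$ and $\alpha(t + \Delta t)$ and the velocity vector $\alpha'(t)$.]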

If $y$ is considered as a differentiable function of $x$ in the parametric curve, then the derivative $\frac{dy}{dx}$ can be obtained by the chain rule:

\[ \frac{dy}{dt} = \frac{dy}{dx} \cdot \frac{dx}{dt}, \quad \text{or} \quad \frac{dy}{dx} = \frac{dy/dt}{dx/dt}, \]

provided that $\frac{dx}{dt} \neq 0$.

If the curve $\alpha(t) = (f(t), g(t)) \in \mathbb{R}^2$, $t \in [a, b]$, defines $y$ as a twice differentiable function of $x$, then for $\frac{dy}{dx} = y'$

\[ \frac{d^2 y}{dx^2} = \frac{d}{dx} y' = \frac{dy'/dt}{dx/dt}. \]

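Example 1.5.4 The ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ can be parametrized as $\alpha(t) = (a\cos t, b\sin t) \in \mathbb{R}^2$, $t \in [0, 2\pi]$. Find the line tangent to the curve at the point $\left(\frac{a}{\sqrt{2}}, \frac{b}{\sqrt{2}}\right)$, where $t = \frac{\pi}{4}$.

Solution: The slope of the tangent line at $t = \frac{\pi}{4}$ is

\[ \frac{dy}{dx}\bigg|_{t=\frac{\pi}{4}} = \frac{dy/dt}{dx/dt}\bigg|_{t=\frac{\pi}{4}} = \frac{b\cos t}{-a\sin t}\bigg|_{t=\frac{\pi}{4}} = \frac{b/\sqrt{2}}{-a/\sqrt{2}} = -\frac{b}{a}. \]

The tangent line is now

\[ y = \frac{b}{\sqrt{2}} - \frac{b}{a}\left(x - \frac{a}{\sqrt{2}}\right) = -\frac{b}{a}x + \sqrt{2}\,b. \] □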

Example 1.5.5 If an aircraft releases a bomb toward a ground target, the bomb moves along the path

\[ \alpha(t) = (x(t), y(t)) = (120t, -16t^2 + 500), \quad t \geq 0. \]

When does the bomb hit the target? How far did it fly in horizontal distance? Find the Cartesian equation of the path of the bomb, and the rate of descent relative to its forward motion when it hits the target.

Solution: The bomb hits the target when $y = 0$. Thus

\[ -16t^2 + 500 = 0 \iff t = \sqrt{\frac{500}{16}} = \frac{5\sqrt{5}}{2} > 0. \]

The horizontal distance it flew is

\[ x\left(\frac{5\sqrt{5}}{2}\right) = 120 \cdot \frac{5\sqrt{5}}{2} = 300\sqrt{5} \text{ meters}. \]

The Cartesian equation is obtained by eliminating $t$: Plug $t = \frac{x}{120}$ into

\[ y = -16t^2 + 500 = -16\left(\frac{x}{120}\right)^2 + 500 = -\frac{1}{900}x^2 + 500. \]

The rate of descent relative to its forward motion when $t = \frac{5\sqrt{5}}{2}$ is

\[ \frac{dy}{dx}\bigg|_{t=\frac{5\sqrt{5}}{2}} = \frac{dy/dt}{dx/dt}\bigg|_{t=\frac{5\sqrt{5}}{2}} = \frac{-32t}{120}\bigg|_{t=\frac{5\sqrt{5}}{2}} = -\frac{2\sqrt{5}}{3} \approx -1.49. \]

Thus it falls about 1.5 m per meter of forward motion when it hits the target. □
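The arithmetic of Example 1.5.5 is easy to reproduce. The sketch below simply evaluates the formulas of the solution; the names `x_pos`, `y_pos`, `t_hit` and `slope_at_hit` are introduced here for illustration.

```python
import math

def x_pos(t):
    return 120 * t

def y_pos(t):
    return -16 * t ** 2 + 500

# Impact time: the positive root of -16 t^2 + 500 = 0.
t_hit = math.sqrt(500 / 16)

# Horizontal distance covered up to impact.
distance = x_pos(t_hit)

# dy/dx = (dy/dt) / (dx/dt) = -32 t / 120, evaluated at impact.
slope_at_hit = (-32 * t_hit) / 120

print(t_hit, distance, slope_at_hit)
```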

1.5.2 Implicit differentiation

Sometimes, a function $y = f(x)$ may be given by an implicit equation $F(x, y) = 0$, e.g., $x^2 + y^2 - 16 = 0$. In some cases we may be able to solve such an equation for $y$ as an explicit function $y = f(x)$ of $x$; in other cases this is not easy. In those cases we may find $\frac{dy}{dx}$ by implicit differentiation: differentiate both sides of the equation with respect to $x$, and then solve the resulting equation for $y'$.

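Example 1.5.6 Find $\frac{dy}{dx}$ for $y^2 = x^2 + \sin xy$.

Solution:

\[
\begin{aligned}
\frac{d}{dx}(y^2) &= \frac{d}{dx}(x^2) + \frac{d}{dx}(\sin xy) \\
2y\,\frac{dy}{dx} &= 2x + \cos xy\left(y + x\,\frac{dy}{dx}\right) \\
(2y - x\cos xy)\,\frac{dy}{dx} &= 2x + y\cos xy \\
\frac{dy}{dx} &= \frac{2x + y\cos xy}{2y - x\cos xy}.
\end{aligned}
\]

□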


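Theorem 1.5.2 (Power rule) For a rational number $\frac{p}{q}$,

\[ \frac{d}{dx} x^{\frac{p}{q}} = \frac{p}{q}\,x^{\frac{p}{q} - 1}. \]

Proof: For $y = x^{\frac{p}{q}}$, $y^q = x^p$. Thus by implicit differentiation, $q y^{q-1} y' = p x^{p-1}$, and so

\[ y' = \frac{p}{q}\,x^{p-1} y^{-(q-1)} = \frac{p}{q}\,x^{p-1} x^{-\frac{p}{q}(q-1)} = \frac{p}{q}\,x^{p-1} x^{-p + \frac{p}{q}} = \frac{p}{q}\,x^{\frac{p}{q} - 1}. \] □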

1.5.3 Applications of derivatives

Many important theorems in mathematics require the following Extreme Value Theorem, whose proof requires a detailed knowledge of the real number system. Hence we just state the theorem without proof.

Theorem 1.5.3 (Extreme Value Theorem) If $f$ is a continuous function on a closed interval $[a, b]$, then there are numbers $x_1$ and $x_2 \in [a, b]$ such that $f(x_1) \leq f(x) \leq f(x_2)$ for all $x \in [a, b]$. The numbers $f(x_1) = m$ and $f(x_2) = M$ are called the absolute minimum and absolute maximum of $f$.

A function $f$ has a local maximum (or local minimum) at a point $c$ in its domain if $f(x) \leq f(c)$ (or $f(x) \geq f(c)$, respectively) for all $x$ in some open interval containing $c$ when $c$ is an interior point of the domain, or in some half-open interval containing $c$ when $c$ is an end point of the domain.

Theorem 1.5.4 (First Derivative Test) If $f$ has a local maximum or local minimum at an interior point $c$ of its domain and $f'(c)$ exists, then $f'(c) = 0$.

Proof: Suppose that $f$ has a local maximum at $x = c$, so that $f(x) - f(c) \leq 0$ for all $x$ near $c$. Then

\[ 0 \leq \lim_{x \to c^-} \frac{f(x) - f(c)}{x - c} = f'(c) = \lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} \leq 0, \]

which implies $f'(c) = 0$. Similarly, the same holds at a local minimum. □


Example 1.5.7 Find the absolute maximum and minimum values of $y = f(x) = x^{2/3}$ on $[-2, 3]$.

Solution: The first derivative of $f$ is

\[ f'(x) = \frac{2}{3}x^{-1/3} = \frac{2}{3\sqrt[3]{x}} \ \begin{cases} < 0, & \text{if } x < 0, \\ > 0, & \text{if } x > 0, \end{cases} \]

so $f'(x) \neq 0$ for $x \neq 0$, and $f'(0)$ is not defined. Since $f(0) = 0$, $f(-2) = \sqrt[3]{4}$, and $f(3) = \sqrt[3]{9}$, the absolute minimum is $f(0) = 0$ and the absolute maximum is $f(3) = \sqrt[3]{9}$. □
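The comparison used in Example 1.5.7 (evaluate $f$ at the end points and at the points where $f'$ vanishes or fails to exist, then pick the smallest and largest values) can be sketched as follows. The candidate list is written out by hand from the solution, and the dictionary-based bookkeeping is just one possible way to organize it.

```python
def f(x):
    # x^(2/3) computed as (x^2)^(1/3) so that negative x stays in real arithmetic
    return (x * x) ** (1.0 / 3.0)

# End points of [-2, 3] together with x = 0, where f' is not defined.
candidates = [-2.0, 0.0, 3.0]
values = {x: f(x) for x in candidates}

x_min = min(values, key=values.get)   # candidate with the smallest value of f
x_max = max(values, key=values.get)   # candidate with the largest value of f
print(x_min, values[x_min], x_max, values[x_max])
```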

Theorem 1.5.5 (Rolle's Theorem) Suppose that $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$. If $f(a) = f(b)$, then there is at least one point $c \in (a, b)$ such that

\[ f'(c) = 0. \]

Proof: If $f$ is a constant function with $f(x) = f(a) = f(b)$ for all $x \in [a, b]$, then $f'(x) = 0$ for all $x \in (a, b)$. If $f$ is not a constant function, then by Theorem 1.5.3 it takes an extreme value different from $f(a) = f(b)$ at some point $c$, which must therefore lie in $(a, b)$. Since $f$ is differentiable at $c$, Theorem 1.5.4 gives $f'(c) = 0$. □

Theorem 1.5.6 (The Mean Value Theorem) Suppose that $f : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Then there is at least one point $c \in (a, b)$ such that

\[ \frac{f(b) - f(a)}{b - a} = f'(c). \]

Proof: The equation of the line through $(a, f(a))$ and $(b, f(b))$ is

\[ g(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a). \]

The vertical distance between the graphs of $f$ and $g$ is

\[ h(x) = f(x) - g(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a), \]

which satisfies the hypotheses of Rolle's Theorem on $[a, b]$: it is continuous on $[a, b]$ and differentiable on $(a, b)$, and $h(a) = h(b) = 0$. Hence $h'(c) = 0$ at some point $c \in (a, b)$. But

\[ 0 = h'(c) = f'(c) - \frac{f(b) - f(a)}{b - a}. \] □
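The Mean Value Theorem only asserts that a point $c$ exists, but for a concrete function one can locate it numerically. The sketch below does this for the hypothetical example $f(x) = x^3$ on $[0, 2]$ (chosen here, not taken from the text), using bisection on $f'(x)$ minus the secant slope.

```python
def bisect(func, lo, hi, tol=1e-12):
    """Find a root of func in [lo, hi], assuming func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid   # root lies in the right half
        else:
            hi = mid                # root lies in the left half
    return 0.5 * (lo + hi)

f = lambda x: x ** 3               # hypothetical example function
fprime = lambda x: 3 * x ** 2      # its derivative
a, b = 0.0, 2.0

secant_slope = (f(b) - f(a)) / (b - a)
c = bisect(lambda x: fprime(x) - secant_slope, a, b)
print(c, fprime(c), secant_slope)
```

For this example the exact point is $c = 2/\sqrt{3}$, since $3c^2 = 4$.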


Corollary 1.5.7 If $f'(x) = 0$ at each point $x$ in an open interval $(a, b)$, then $f$ is a constant function on the interval: i.e., $f(x) = C$ for all $x \in (a, b)$.

Proof: For any two points $x_1 < x_2$ in $(a, b)$, by Theorem 1.5.6, there is a point $c \in (x_1, x_2)$ such that

\[ \frac{f(x_2) - f(x_1)}{x_2 - x_1} = f'(c) = 0. \]

Thus, $f(x_2) - f(x_1) = 0$, or $f(x_2) = f(x_1)$. □

Corollary 1.5.8 If $f'(x) = g'(x)$ at each point $x$ in an open interval $(a, b)$, then $f - g$ is a constant function on the interval: i.e., $f(x) = g(x) + C$ for all $x \in (a, b)$.

Proof: For $h(x) = f(x) - g(x)$, $h'(x) = f'(x) - g'(x) = 0$ for all $x \in (a, b)$. Thus $h(x) = f(x) - g(x) = C$ on $(a, b)$ by Corollary 1.5.7. □

1.5.4 Exercises

1. Find the differential $dy$ for $y = x^5 + 37x$. At $x = 1$ what is the value of $dy$ for $dx = 0.2$?

2. Find the linearization $L(x)$ of $f(x)$ at $x = a$.
(1) $f(x) = x + \frac{1}{x}$, $a = 1$.
(2) $f(x) = \cos x$, $a = \frac{\pi}{2}$.
(3) $f(x) = (1 + x)^k$, $a = 0$.
(4) $f(x) = \frac{1}{\sqrt{1+x}}$, $a = 0$.

3. Estimate $(1.0002)^{50}$ and $\sqrt[3]{1.009}$ by using the linearization.

4. Estimate the change in the volume $V = \frac{4}{3}\pi r^3$ of the sphere when the radius changes from 2 to 2.02.

5. Estimate the change in the surface area $S = 6x^2$ of a cube when the edge length changes from 2 to 2.02.

6. Find the derivatives $\frac{dy}{dx}$ of the following functions.

(1) $y = \dfrac{1}{\sin^2 x} - \dfrac{2}{\sin x}$. (2) $y = \sin(x + \sqrt{x + 1})$.

(3) $x^2 y^2 = 1$. (4) $y^2 = \sqrt{\dfrac{1 + x}{1 - x}}$.

7. (1) Express the lateral surface area $S$ of the right circular cone in terms of the base radius $r$ and the height $h$.

(2) Find $\frac{dS}{dt}$ in terms of $\frac{dr}{dt}$ when $h$ is constant.

(3) Find $\frac{dS}{dt}$ in terms of $\frac{dh}{dt}$ when $r$ is constant.

(4) Find $\frac{dS}{dt}$ in terms of $\frac{dr}{dt}$ and $\frac{dh}{dt}$ when neither $r$ nor $h$ is constant.

8. A particle moves along the curve $y = x^{3/2}$ in the first quadrant in such a way that its distance from the origin increases at the rate of 11 units per second. Find $\frac{dx}{dt}$ when $x = 3$.

1.6 Indeterminate Forms

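Let $f(x)$ and $g(x)$ both be continuous functions such that $f(a) = 0 = g(a)$. Then the limit

\[ \lim_{x\to a} \frac{f(x)}{g(x)}, \quad \text{formally } \frac{f(a)}{g(a)} = \frac{0}{0}, \]

is called an indeterminate form. However, from the definition of the derivative, when $f'(a)$ and $g'(a)$ exist and $g'(a) \neq 0$:

\[ \frac{f'(a)}{g'(a)} = \frac{\lim_{x\to a} \frac{f(x) - f(a)}{x - a}}{\lim_{x\to a} \frac{g(x) - g(a)}{x - a}} = \lim_{x\to a} \frac{f(x) - f(a)}{g(x) - g(a)} = \lim_{x\to a} \frac{f(x)}{g(x)}. \]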

Theorem 1.6.1 (L'Hôpital's Rule) Suppose that $f(a) = 0 = g(a)$, $f'(a)$ and $g'(a)$ exist, and that $g'(a) \neq 0$. Then

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}. \]

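Example 1.6.1 (1) $\displaystyle \lim_{x\to 0} \frac{3x - \sin x}{x} = \frac{3 - \cos x}{1}\bigg|_{x=0} = 2$.

(2) $\displaystyle \lim_{x\to 0} \frac{\sqrt{1+x} - 1}{x} = \frac{\frac{1}{2\sqrt{1+x}}}{1}\bigg|_{x=0} = \frac{1}{2}$. □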

Theorem 1.6.2 (Stronger form of L'Hôpital's Rule) Suppose that $f(a) = 0 = g(a)$, $f$ and $g$ are both differentiable on an open interval $I$ containing $a$, and that $g'(x) \neq 0$ on $I$ if $x \neq a$. Then

\[ \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}, \]

provided the limit on the right side exists.


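Example 1.6.2

(1) $\displaystyle \lim_{x\to 0} \frac{x - \sin x}{x^3} = \lim_{x\to 0} \frac{1 - \cos x}{3x^2} = \lim_{x\to 0} \frac{\sin x}{6x} = \lim_{x\to 0} \frac{\cos x}{6} = \frac{1}{6}$.

(2) $\displaystyle \lim_{x\to 0} \frac{\sqrt{1+x} - 1 - x/2}{x^2} = \lim_{x\to 0} \frac{\frac{1}{2}(1+x)^{-1/2} - \frac{1}{2}}{2x} = \lim_{x\to 0} \frac{-\frac{1}{4}(1+x)^{-3/2}}{2} = -\frac{1}{8}$. □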
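The two limits of Example 1.6.2 can be made plausible by evaluating the quotients at small values of $x$. This is only a numerical sketch (the function names `ratio1` and `ratio2` are chosen here), and it does not replace the argument by L'Hôpital's rule; for much smaller $x$ floating-point cancellation would spoil the values.

```python
import math

def ratio1(x):
    """(x - sin x) / x^3, whose limit as x -> 0 is 1/6."""
    return (x - math.sin(x)) / x ** 3

def ratio2(x):
    """(sqrt(1 + x) - 1 - x/2) / x^2, whose limit as x -> 0 is -1/8."""
    return (math.sqrt(1 + x) - 1 - x / 2) / x ** 2

for x in (1e-1, 1e-2, 1e-3):
    print(f"x = {x:g}: ratio1 = {ratio1(x):.6f}, ratio2 = {ratio2(x):.6f}")
```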

Theorem 1.6.2 is based on the following Cauchy's Mean Value Theorem:

Theorem 1.6.3 (Cauchy's Mean Value Theorem) Let $f(x)$ and $g(x)$ both be continuous functions on $[a, b]$ and differentiable on $(a, b)$, with $g'(x) \neq 0$ on $(a, b)$. Then there is a number $c \in (a, b)$ such that

\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]

Proof: Since $g'(x) \neq 0$ on $(a, b)$, Rolle's Theorem shows that $g(b) \neq g(a)$. Consider the function

\[ F(x) = f(x) - f(a) - \frac{f(b) - f(a)}{g(b) - g(a)}\,(g(x) - g(a)), \]

which is continuous and differentiable where $f$ and $g$ are, and satisfies $F(b) = F(a) = 0$. Thus, by Rolle's Theorem, there is $c \in (a, b)$ such that $F'(c) = 0$: i.e.,

\[ 0 = F'(c) = f'(c) - \frac{f(b) - f(a)}{g(b) - g(a)}\,g'(c), \quad \text{or} \quad \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \] □

Notice that the Mean Value Theorem 1.5.6 is a special case of Cauchy's Mean Value Theorem with $g(x) = x$. Moreover, a geometric interpretation of Theorem 1.6.3 can be seen as follows: For a curve $C$ defined by the parametric equations $\alpha(t) = (x, y) = (g(t), f(t))$, $t \in [a, b]$, the slope of the curve at $t$ is given by

\[ \frac{dy/dt}{dx/dt} = \frac{f'(t)}{g'(t)}. \]

Thus, $\frac{f'(c)}{g'(c)}$ is the slope of the tangent to the curve $C$ at $t = c$. The secant line joining the two points $(g(a), f(a))$ and $(g(b), f(b))$ has the slope

\[ \frac{f(b) - f(a)}{g(b) - g(a)}. \]


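[Figure: the curve $C$ through the points $(g(a), f(a))$ and $(g(b), f(b))$, with the tangent line at $(g(c), f(c))$ parallel to the secant line.]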

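Proof: [Proof for Theorem 1.6.2] For $x \in I$ with $a < x$, we have $g'(t) \neq 0$ for $t \in (a, x)$. Setting $b = x$ in Theorem 1.6.3, there exists $c \in (a, x)$ such that

\[ \frac{f'(c)}{g'(c)} = \frac{f(x) - f(a)}{g(x) - g(a)} = \frac{f(x)}{g(x)}, \]

since $f(a) = 0 = g(a)$. Since $c \to a^+$ as $x \to a^+$,

\[ \lim_{x\to a^+} \frac{f(x)}{g(x)} = \lim_{c\to a^+} \frac{f'(c)}{g'(c)} = \lim_{x\to a^+} \frac{f'(x)}{g'(x)}. \]

The same holds as $x \to a^-$. □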

The following examples show that Theorem 1.6.2 can also be used to deal with cases like $\frac{\infty}{\infty}$, $\infty \cdot 0$, $\infty - \infty$, as $x \to a$.

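Example 1.6.3

(1) $\displaystyle \lim_{x\to(\pi/2)^-} \frac{\sec x}{1 + \tan x} = \lim_{x\to(\pi/2)^-} \frac{\sec x \tan x}{\sec^2 x} = \lim_{x\to(\pi/2)^-} \sin x = 1$.

(2) $\displaystyle \lim_{x\to\infty} \frac{x - 2x^2}{3x^2 + 5x} = \lim_{x\to\infty} \frac{1 - 4x}{6x + 5} = \lim_{x\to\infty} \frac{-4}{6} = -\frac{2}{3}$.

(3) $\displaystyle \lim_{x\to\infty} \left(x \sin\frac{1}{x}\right) = \lim_{h\to 0^+} \frac{1}{h}\sin h = \lim_{h\to 0^+} \frac{\cos h}{1} = 1$.

(4) $\displaystyle \lim_{x\to 0} \left(\frac{1}{\sin x} - \frac{1}{x}\right) = \lim_{x\to 0} \frac{x - \sin x}{x \sin x} = \lim_{x\to 0} \frac{1 - \cos x}{\sin x + x\cos x} = \lim_{x\to 0} \frac{\sin x}{2\cos x - x \sin x} = 0$.

□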


1.6.1 Exercises

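1. Evaluate the limits.

(1) $\displaystyle \lim_{x\to 0} \frac{1 - \cos x}{x^2}$. (2) $\displaystyle \lim_{h\to 0} \frac{\sin(a + h) - \sin a}{h}$.

(3) $\displaystyle \lim_{x\to\infty} x \tan\frac{1}{x}$. (4) $\displaystyle \lim_{x\to\infty} \left(x - \sqrt{x^2 + x}\right)$.

(5) $\displaystyle \lim_{x\to 0^+} \frac{\sqrt{x}}{\sqrt{\sin x}}$. (6) $\displaystyle \lim_{x\to(\pi/2)^-} \frac{\sec x}{\tan x}$.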

1.7 Antiderivatives

Many real problems require recovering a function $F$ from its known derivative $f$. If such a function $F$ exists, it is called an antiderivative of $f$.

Definition 1.7.1 A function $F$ is an antiderivative of $f$ on an interval $I$ if $F'(x) = f(x)$ for all $x \in I$.

Corollary 1.5.8 says that if $F(x)$ and $G(x)$ are both antiderivatives of $f(x)$, then $G(x) = F(x) + C$ for some constant $C$. Thus, $F(x) + C$ represents all the antiderivatives of $f(x)$.

Definition 1.7.2 The set of all antiderivatives of $f$ is the indefinite integral of $f$ with respect to $x$, denoted by

\[ \int f(x)\,dx = F(x) + C, \quad \text{or} \quad \int F'(x)\,dx = F(x) + C, \]

where $F(x)$ is an antiderivative of $f(x)$, $\int$ is called an integral sign, $f$ is the integrand of the integral, and $x$ is the variable of integration.

For example, $\int (2x + \cos x)\,dx = x^2 + \sin x + C$, and

     Function $f(x)$      Antiderivative $F(x) = \int f(x)\,dx$
  1  $x^n$                $\frac{x^{n+1}}{n+1} + C$, $n \neq -1$, $n$ rational
  2  $\sin kx$            $-\frac{\cos kx}{k} + C$, $k$ a constant, $k \neq 0$
  3  $\cos kx$            $\frac{\sin kx}{k} + C$, $k$ a constant, $k \neq 0$
  4  $\sec^2 x$           $\tan x + C$
  5  $\csc^2 x$           $-\cot x + C$
  6  $\sec x \tan x$      $\sec x + C$
  7  $\csc x \cot x$      $-\csc x + C$


Finding an antiderivative for a function $f(x)$ is the same problem as finding a function $y(x)$ that satisfies the equation

\[ \frac{dy}{dx} = f(x). \]

In general, an equation involving derivatives of a function $y = f(x)$ is called a differential equation. Solving a differential equation means finding a function $y = f(x)$ which satisfies the given differential equation.

1.7.1 Definite integrals

Let $y = f(x)$ be an arbitrary function defined on a closed interval $[a, b]$. Partition $[a, b]$ into $n$ subintervals by choosing $n - 1$ points between $a$ and $b$:

\[ P = \{a = x_0 < x_1 < \cdots < x_n = b\}. \]

Let $\Delta x_k = x_k - x_{k-1}$ denote the width of the subinterval $[x_{k-1}, x_k]$. This is the change of $x$ from $x_{k-1}$ to $x_k$, and it is approximated by the differential $dx$. Let $c_k$ be any point in $[x_{k-1}, x_k]$. Then

\[ f(c_k) \cdot \Delta x_k \]

is $\pm$ the area of the vertical rectangle over the subinterval $[x_{k-1}, x_k]$ with height $|f(c_k)|$, the sign $\pm$ being the sign of $f(c_k)$. A Riemann sum for $f$ on $[a, b]$ is the sum

\[ S_P = \sum_{k=1}^{n} f(c_k)\Delta x_k. \]

We can now make various partitions of the interval $[a, b]$, and many choices of the points $c_k$. Let $\|P\| = \max\{\Delta x_k \mid 1 \leq k \leq n\}$, called the norm of $P$. The limit of $S_P$ as $\|P\| \to 0$, if it exists, is called the definite integral of $f$ over $[a, b]$:

Definition 1.7.3 For a function $f(x)$ on $[a, b]$, the definite integral of $f$ over $[a, b]$ is denoted by

\[ \lim_{\|P\|\to 0} \sum_{k=1}^{n} f(c_k)\Delta x_k = I = \int_a^b f(x)\,dx, \]

provided the limit exists: i.e., for any $\varepsilon > 0$ there is a number $\delta > 0$ such that for every partition $P$ of $[a, b]$ with $\|P\| < \delta$ and any choice of $c_k$ in $[x_{k-1}, x_k]$,

\[ \left| \sum_{k=1}^{n} f(c_k)\Delta x_k - I \right| < \varepsilon. \]

In this case $f$ is said to be integrable over $[a, b]$.
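To make the definition concrete, here is a small sketch of a Riemann sum on a uniform partition. The function name `riemann_sum`, the `choose` parameter for selecting the sample point $c_k$, and the test function $x^2$ on $[0, 1]$ are all choices made for this illustration.

```python
def riemann_sum(f, a, b, n, choose=lambda left, right: 0.5 * (left + right)):
    """Riemann sum of f over [a, b] for the uniform partition with n subintervals.

    `choose` picks the sample point c_k in [x_{k-1}, x_k]; the midpoint is the default.
    """
    dx = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        left, right = a + (k - 1) * dx, a + k * dx
        total += f(choose(left, right)) * dx
    return total

square = lambda x: x * x
for n in (10, 100, 1000):
    # Midpoint samples and left-end-point samples approach the same limit.
    print(n, riemann_sum(square, 0.0, 1.0, n),
          riemann_sum(square, 0.0, 1.0, n, choose=lambda left, right: left))
```

As the norm of the partition shrinks, both columns approach $\int_0^1 x^2\,dx = \frac{1}{3}$, whichever sample points are used.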

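Theorem 1.7.1 A continuous function on a closed interval $[a, b]$ is always integrable over $[a, b]$.

Example 1.7.1 The function

\[ f(x) = \begin{cases} 1, & \text{if } x \text{ is rational}, \\ 0, & \text{if } x \text{ is irrational} \end{cases} \]

is not integrable over $[0, 1]$. □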

Theorem 1.7.2 Let $f$ and $g$ be integrable functions over $[a, b]$. The following rules hold:

(1) $\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx$.

(2) $\int_a^a f(x)\,dx = 0$.

(3) $\int_a^b k f(x)\,dx = k\int_a^b f(x)\,dx$, $\int_a^b -f(x)\,dx = -\int_a^b f(x)\,dx$.

(4) $\int_a^b (f(x) \pm g(x))\,dx = \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx$.

(5) $\int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx$.

(6) If $M = \max f(x)$ and $m = \min f(x)$ over $[a, b]$, then

\[ m(b - a) \leq \int_a^b f(x)\,dx \leq M(b - a). \]

(7) If $f(x) \geq 0$ on $[a, b]$, then $\int_a^b f(x)\,dx \geq 0$.

If $f(x) \geq g(x)$ on $[a, b]$, then $\int_a^b f(x)\,dx \geq \int_a^b g(x)\,dx$.

Definition 1.7.4 If $f$ is an integrable function over $[a, b]$, the average value of $f$ on $[a, b]$, also called its mean value, is

\[ \mathrm{av}(f) = \frac{1}{b - a}\int_a^b f(x)\,dx. \]


Theorem 1.7.3 (The Mean Value Theorem) If $f$ is continuous on $[a, b]$, then there is a point $c \in [a, b]$ such that

\[ f(c) = \frac{1}{b - a}\int_a^b f(x)\,dx. \]

Proof: By the Extreme Value Theorem 1.5.3, $f$ assumes its absolute maximum and minimum on $[a, b]$. Thus,

\[ \min f \leq \frac{1}{b - a}\int_a^b f(x)\,dx \leq \max f. \]

Since $f$ is continuous, by the Intermediate Value Theorem for continuous functions, there exists a number $c \in [a, b]$ such that $f(c) = \frac{1}{b-a}\int_a^b f(x)\,dx$. □

Example 1.7.2 Suppose that $f$ is continuous on $[a, b]$, $a \neq b$, and

\[ \int_a^b f(x)\,dx = 0. \]

Then $f(c) = 0$ at some $c \in [a, b]$, since

\[ \min f \leq \mathrm{av}(f) = \frac{1}{b - a}\int_a^b f(x)\,dx = \frac{1}{b - a} \cdot 0 = 0 \leq \max f, \]

and so one can use the intermediate value theorem for continuous functions to obtain the existence of $c$ in $[a, b]$. □

Let $f$ be a continuous function over $I = [a, b]$. For $x \in I$, define a new function

\[ F(x) = \int_a^x f(t)\,dt. \]

If $f$ is nonnegative on $[a, b]$, $F(x)$ is the area under the graph of $f$ from $a$ to $x$. Let $M = \max |f|$ on $[a, b]$; if $M = 0$ then $F$ is constant, so assume $M > 0$. For any $\varepsilon > 0$, take $0 < \delta \leq \frac{\varepsilon}{M}$. Then, for any $|h| < \delta$, $|(x + h) - x| = |h| < \delta$. By Theorem 1.7.3, there is a $c$ between $x$ and $x + h$ such that

\[ |F(x + h) - F(x)| = \left| \int_x^{x+h} f(t)\,dt \right| = |h f(c)| < M\delta \leq M\,\frac{\varepsilon}{M} = \varepsilon, \]

which shows that $F$ is also continuous at each point $x$ in $[a, b]$.


Theorem 1.7.4 (The First Fundamental Theorem of Calculus) If $f$ is continuous on $[a, b]$, then $F(x) = \int_a^x f(t)\,dt$ is continuous on $[a, b]$ and differentiable on $(a, b)$, and its derivative is $f(x)$:

\[ F'(x) = \frac{d}{dx}\int_a^x f(t)\,dt = f(x). \]

Proof: The continuity of $F$ is already shown. Note that

\[ \frac{F(x + h) - F(x)}{h} = \frac{1}{h}\int_x^{x+h} f(t)\,dt = f(c), \]

where the last equality holds by the mean value theorem (Theorem 1.7.3) for some $c$ between $x$ and $x + h$. As $h \to 0$, $x + h \to x$, and so $c \to x$. Since $f$ is continuous at $x$, $f(c) \to f(x)$. Thus

\[ F'(x) = \lim_{h\to 0} \frac{F(x + h) - F(x)}{h} = \lim_{h\to 0} f(c) = f(x). \] □

Theorem 1.7.5 (The Second Fundamental Theorem of Calculus) If $f$ is continuous on $[a, b]$ and $G$ is an antiderivative of $f$ on $[a, b]$, then

\[ \int_a^b f(x)\,dx = G(b) - G(a). \]

Proof: Note that, by the first fundamental theorem, $F(x) = \int_a^x f(t)\,dt$ is an antiderivative of $f$. If $G(x)$ is any other antiderivative of $f$, then $G(x) = F(x) + C$ for some constant $C$. Then

\[ G(b) - G(a) = F(b) - F(a) = \int_a^b f(x)\,dx - \int_a^a f(x)\,dx = \int_a^b f(x)\,dx. \]

□

This theorem has a significant meaning in mathematics: the left side computes the area under the graph, which seems to have nothing to do with differentiation. But it turns out that it is computed from an antiderivative evaluated just at the two end points, regardless of its values at points inside the interval.
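The statement of the Second Fundamental Theorem can be observed numerically by comparing a Riemann sum with the difference of an antiderivative at the end points. The sketch below uses the example $f(x) = \sin x$ on $[0, \pi]$ with antiderivative $-\cos x$; the example and the helper name `midpoint_sum` are choices made here.

```python
import math

def midpoint_sum(f, a, b, n=10000):
    """Riemann sum of f over [a, b] with n equal subintervals and midpoint samples."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

f = math.sin
F = lambda x: -math.cos(x)   # an antiderivative of sin
a, b = 0.0, math.pi

area = midpoint_sum(f, a, b)            # approximates the definite integral
endpoint_difference = F(b) - F(a)       # value predicted by the theorem
print(area, endpoint_difference)
```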


1.7.2 Integrals by substitution

Note that a definite integral of a continuous function is a number, obtained as the limit of the Riemann sums, and the fundamental theorem says that it can be computed easily if we can find an antiderivative of the function. Finding antiderivatives is generally more difficult than finding derivatives. Here we present some techniques for computing them.

Note that, for a differentiable function $z = F(u)$, $F(u) + C = \int F'(u)\,du$. Suppose also that $u = g(x)$ is a differentiable function. Then, by the Chain rule,

\[ \frac{d}{dx} F(g(x)) = F'(g(x))g'(x) = f(g(x))g'(x), \]

where $F'(u) = f(u)$. Hence,

\[
\begin{aligned}
\int f(g(x))g'(x)\,dx &= \int F'(g(x))g'(x)\,dx = \int \frac{d}{dx} F(g(x))\,dx = F(g(x)) + C \\
&= F(u) + C = \int F'(u)\,du = \int f(u)\,du.
\end{aligned}
\]

Theorem 1.7.6 (The Substitution Rule) If $f$ is a continuous function in $u$ and $u = g(x)$ is a differentiable function, then

\[ \int f(u)\,du = \int f(g(x))g'(x)\,dx. \]

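Example 1.7.3 With $u = 4x - 1$,

\[
\begin{aligned}
\int \sqrt{4x - 1}\,dx &= \int \frac{1}{4}\sqrt{4x - 1} \cdot 4\,dx = \frac{1}{4}\int \sqrt{u}\,\frac{du}{dx}\,dx = \frac{1}{4}\int u^{1/2}\,du \\
&= \frac{1}{4}\,\frac{u^{1 + 1/2}}{1/2 + 1} + C = \frac{1}{6}u^{3/2} + C = \frac{1}{6}(4x - 1)^{3/2} + C.
\end{aligned}
\]

Example 1.7.4 With $u = x^2 + 1$,

\[ \int \frac{2x}{\sqrt[3]{x^2 + 1}}\,dx = \int u^{-1/3}\,du = \frac{u^{2/3}}{2/3} + C = \frac{3}{2}u^{2/3} + C = \frac{3}{2}(x^2 + 1)^{2/3} + C. \] □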

If $u = g(x)$ has a continuous derivative on $[a, b]$ and $z = f(u)$ is continuous on the range of $g$, then, for an antiderivative $F(u)$ of $f$,

\[
\begin{aligned}
\int_a^b f(g(x))g'(x)\,dx &= F(g(x))\Big|_{x=a}^{x=b} = F(g(b)) - F(g(a)) \\
&= F(u)\Big|_{u=g(a)}^{u=g(b)} = \int_{g(a)}^{g(b)} f(u)\,du.
\end{aligned}
\]


Example 1.7.5 With $u = x^3 + 1$,

\[ \int_{-1}^{1} 3x^2\sqrt{x^3 + 1}\,dx = \int_0^2 \sqrt{u}\,du = \frac{2}{3}u^{3/2}\bigg|_0^2 = \frac{2}{3}\,2^{3/2} = \frac{4\sqrt{2}}{3}. \] □
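The change of limits in Example 1.7.5 can be checked numerically: the integral in $x$ over $[-1, 1]$ and the integral in $u$ over $[0, 2]$ should agree with each other and with $\frac{4\sqrt{2}}{3}$. The following is a rough sketch using midpoint Riemann sums; the helper `midpoint_sum` and the number of subintervals are assumptions of this illustration.

```python
import math

def midpoint_sum(f, a, b, n=200000):
    """Riemann sum of f over [a, b] with n equal subintervals and midpoint samples."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

# Left side: the integral in the original variable x.
in_x = midpoint_sum(lambda x: 3 * x ** 2 * math.sqrt(x ** 3 + 1), -1.0, 1.0)

# Right side: after u = x^3 + 1 the limits -1, 1 become 0, 2.
in_u = midpoint_sum(lambda u: math.sqrt(u), 0.0, 2.0)

exact = 4 * math.sqrt(2) / 3
print(in_x, in_u, exact)
```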

1.7.3 Exercises

1. Find the indefinite integrals.

(1)∫

6dx

(x−√2). (2)

∫sec2 s

5ds.

(3)∫

cos2θ

5dθ. (4)

∫1 + cos 4t

2dt.

(5)∫

cscθcscθ − sin θ

dθ. (6)∫

sin√

t√t cos3

√tdt.

2. Evaluate the following definite integrals.

(1)∫ 4

9

1−√x√u

dx. (2)∫ π/3

0

2 sec2 sds.

(3)∫ π/2

0

cos2θ

5dθ. (4)

∫ 0

π/2

1 + cos 2t

2dt.

(5)∫ π

0

12(cos θ + | cos θ|)dθ. (6)

∫ √2

1

t2 +√

t

t2dt.

3. Find the limit:

\[ \lim_{x\to 0} \frac{1}{x^3}\int_0^x \frac{t^2}{t^4 + 1}\,dt. \]

4. Suppose that $z = f(u)$ is a continuous function on $[a, b]$, and $u(x)$, $v(x)$ are differentiable functions of $x$ whose values lie in $[a, b]$. Prove that

\[ \frac{d}{dx}\int_{v(x)}^{u(x)} f(t)\,dt = f(u(x))\,\frac{du}{dx} - f(v(x))\,\frac{dv}{dx}. \]

5. Find the linearization of

\[ f(x) = 2 - \int_0^{x+1} \frac{9}{1 + t}\,dt, \quad \text{at } x = 1. \]

Chapter 2

Transcendental Functions

The set of real functions may be divided into two classes: algebraic functions and transcendental functions. A function $y = f(x)$ is an algebraic function of $x$ if it satisfies an equation of the form

\[ P_n(x)y^n + P_{n-1}(x)y^{n-1} + \cdots + P_1(x)y + P_0(x) = 0, \]

in which the coefficients $P_0(x), \ldots, P_n(x)$ are polynomials in $x$.

Thus polynomials, root functions, rational functions, and sums, products, and quotients of these functions are all algebraic. For instance, the function $y = \frac{1}{(x^2+1)^{2/3}}$ is algebraic since it satisfies the equation

\[ (x^2 + 1)^2 y^3 - 1 = 0. \]

Note that if $y = f(x)$ is an algebraic function, then for any given $x$ we can find the values $y$ by algebraic operations. However, there are functions whose values may not be calculated algebraically for some given values of $x$. Functions that are not algebraic are said to be transcendental. Examples include the trigonometric functions, their inverses, logarithmic functions, exponential functions, and hyperbolic functions. These functions are the main subjects of the present chapter.

Of course, we already know what logarithmic and exponential functions are. However, we define them in terms of an integral and an inverse function, so that we can give a definite meaning to the power of a positive number by any real number.

Transcendental functions are important in solving problems in engineering and physics, in addition to being important in mathematics itself.


2.1 Inverse Functions

Recall that a function $y = f(x)$ is invertible on an interval $I$ (i.e., it has an inverse function $g$, denoted by $f^{-1}$, such that $f \circ g(y) \equiv f(g(y)) = y$ for $y$ in the domain of $g$ and $g \circ f(x) \equiv g(f(x)) = x$ for $x$ in the domain of $f$) if and only if $f$ is one to one on its domain $I$ and onto its range (bijective). This happens if $f$ is either monotonically increasing, for instance when $f'(x) > 0$ on $I$, or monotonically decreasing, for instance when $f'(x) < 0$ on $I$.

Let $y = f(x)$ be a differentiable function on an interval $I$. If it is also invertible with $f'(x) \neq 0$ on $I$, with its inverse $f^{-1}(y) = x$, then

\[ f^{-1} \circ f(x) = x, \quad f \circ f^{-1}(y) = y \]

implies, by the chain rule, that

\[ \frac{d}{dx}\bigl(f^{-1} \circ f(x)\bigr) = \frac{d}{dy} f^{-1}(y) \cdot \frac{d}{dx} f(x) = \frac{dx}{dx} = 1. \]

Theorem 2.1.1 Suppose that $y = f(x)$ is a differentiable function on an interval $I$, and also invertible with $f'(x) \neq 0$ there. Then $f^{-1}$ is also differentiable on the range $f(I)$, and

\[ (f^{-1})'(y) = \frac{1}{f'(x)}, \quad x = f^{-1}(y). \]
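A numerical sketch may help to see Theorem 2.1.1 at work. The function $f(x) = x^3 + x$ below is a hypothetical example chosen here because $f'(x) = 3x^2 + 1 > 0$, so it is invertible; its inverse is computed by bisection rather than by a formula, and the bracket $[-10, 10]$ is an assumption that limits which $y$ can be inverted.

```python
def f(x):
    return x ** 3 + x          # hypothetical example; increasing since f'(x) = 3x^2 + 1 > 0

def fprime(x):
    return 3 * x ** 2 + 1

def f_inverse(y, lo=-10.0, hi=10.0):
    """Solve f(x) = y by bisection; valid for f(lo) <= y <= f(hi) because f is increasing."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y0 = 2.0
x0 = f_inverse(y0)             # the point with f(x0) = y0

# Slope of the inverse at y0 by a central difference, versus 1 / f'(x0).
h = 1e-5
numeric = (f_inverse(y0 + h) - f_inverse(y0 - h)) / (2 * h)
formula = 1.0 / fprime(x0)
print(x0, numeric, formula)
```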

2.2 Trigonometric Functions

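The radian measure of the angle $\angle AOB$ is the arc length of the unit circle cut by the angle. Thus if a circle of radius $r$ is cut by the angle $\angle AOB$ with the radian measure $\theta$, then the arc length $s$ of the circle is

\[ s = r\theta. \]

Hence, $90^\circ = \frac{\pi}{2}$ radians, $180^\circ = \pi$ radians, $1^\circ = \frac{\pi}{180}$ radian, $1$ radian $= \left(\frac{180}{\pi}\right)^\circ$, etc. Throughout calculus, we use radian measure for angles.

Out of a right triangle, one can define the trigonometric functions as:

\[
\begin{aligned}
\sin\theta &= \frac{y}{r}, & \csc\theta &= \frac{r}{y}, \\
\cos\theta &= \frac{x}{r}, & \sec\theta &= \frac{r}{x}, \\
\tan\theta &= \frac{y}{x}, & \cot\theta &= \frac{x}{y}, \\
\tan\theta &= \frac{\sin\theta}{\cos\theta}, & \cot\theta &= \frac{\cos\theta}{\sin\theta}, \\
\sec\theta &= \frac{1}{\cos\theta}, & \csc\theta &= \frac{1}{\sin\theta}.
\end{aligned}
\]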

[Figure: the angle $\theta = \angle AOB$ at the origin $O$ in a circle of radius $r$, cutting the arc of length $s = r\theta$ between $A$ and $B$.]

The derivatives of the trigonometric functions are given as follows:

(1) $(\sin x)' = \cos x$, $(\csc x)' = -\csc x \cot x$.

(2) $(\cos x)' = -\sin x$, $(\sec x)' = \sec x \tan x$.

(3) $(\tan x)' = \sec^2 x$, $(\cot x)' = -\csc^2 x$.

2.2.1 Inverse trigonometric functions

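The trigonometric functions are periodic, and so are not invertible on $\mathbb{R}$. However, if we restrict their domains to the following intervals, they are one to one and onto their ranges:

  Function ($f$)   Domain of $f$ (= Range of $g$)                       Range of $f$ (= Domain of $g$)       Inverse ($g$)
  $\sin x$         $[-\frac{\pi}{2}, \frac{\pi}{2}]$                    $[-1, 1]$                            $\sin^{-1} x = \arcsin x$
  $\cos x$         $[0, \pi]$                                           $[-1, 1]$                            $\cos^{-1} x = \arccos x$
  $\tan x$         $(-\frac{\pi}{2}, \frac{\pi}{2})$                    $(-\infty, \infty)$                  $\tan^{-1} x = \arctan x$
  $\cot x$         $(0, \pi)$                                           $(-\infty, \infty)$                  $\cot^{-1} x = \operatorname{arccot} x$
  $\sec x$         $[0, \frac{\pi}{2}) \cup (\frac{\pi}{2}, \pi]$       $(-\infty, -1] \cup [1, \infty)$     $\sec^{-1} x = \operatorname{arcsec} x$
  $\csc x$         $[-\frac{\pi}{2}, 0) \cup (0, \frac{\pi}{2}]$        $(-\infty, -1] \cup [1, \infty)$     $\csc^{-1} x = \operatorname{arccsc} x$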

For example, the function $y = f(x) = \sin x$ is not one to one on $\mathbb{R}$. However, if the domain of $\sin$ is restricted to $D_f = [-\frac{\pi}{2}, \frac{\pi}{2}]$, then it is one to one onto the range $R_f = [-1, 1]$, and is invertible. Of course, there are some other choices of $D_f$ for the function to be invertible. But traditionally, we choose $[-\frac{\pi}{2}, \frac{\pi}{2}]$ for $D_f$.

Hence, on this interval $D_f = [-\frac{\pi}{2}, \frac{\pi}{2}]$, the function $y = \sin x$ has its inverse function, denoted by $y = g(x) = f^{-1}(x) \equiv \sin^{-1}(x)$, which is called "arc-sine of $x$", with its domain $D_g = [-1, 1]$ and its range $R_g = [-\frac{\pi}{2}, \frac{\pi}{2}]$.

Example 2.2.1 $\sin^{-1}\left(\frac{1}{2}\right) = \frac{\pi}{6}$, and $\sin^{-1}\left(-\frac{\sqrt{3}}{2}\right) = -\frac{\pi}{3}$. □

Note that $\sin^{-1} x \neq (\sin x)^{-1} = \frac{1}{\sin x} = \csc x$.

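Some properties of $y = \sin^{-1} x$.

(1) $\sin^{-1}(\sin x) = x$ for $x \in [-\frac{\pi}{2}, \frac{\pi}{2}]$.

(2) $\sin(\sin^{-1} x) = x$ for $x \in [-1, 1]$. Note that $\sin^{-1}(\sin\frac{3\pi}{2}) = \sin^{-1}(-1) = -\frac{\pi}{2} \neq \frac{3\pi}{2}$, since $\frac{3\pi}{2} \notin [-\frac{\pi}{2}, \frac{\pi}{2}]$.

(3) It is an odd function: i.e., $\sin^{-1}(-x) = -\sin^{-1} x$, since

\[
\begin{aligned}
y = \sin^{-1}(-x) &\iff -x = \sin y \\
&\iff x = -\sin y \\
&\iff x = \sin(-y), \text{ since sine is an odd function,} \\
&\iff -y = \sin^{-1} x \\
&\iff y = -\sin^{-1} x = \sin^{-1}(-x).
\end{aligned}
\]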

Similarly, the other trigonometric functions $\cos$, $\tan$, $\cot$, $\sec$, and $\csc$ also have their inverse functions if we restrict their domains as in the table.

Remark: For the sake of the derivative formula of $\sec^{-1}$, sometimes we may choose the domain $D'_f$ of $\sec$ to be $[-\pi, -\frac{\pi}{2}) \cup [0, \frac{\pi}{2})$. This will be made clear in the next sections.

For the function $\csc^{-1} x$, we also have another choice of $D_f$: $D'_f = (0, \frac{\pi}{2}] \cup (\pi, \frac{3\pi}{2}]$.

The following relations of the inverse trigonometric functions may be obtained from the figure below:

(1) $\cos^{-1} x + \cos^{-1}(-x) = \pi$.

(2) $\sec^{-1}(-x) + \sec^{-1} x = \pi$.

(3) $\cos^{-1} x + \sin^{-1} x = \frac{\pi}{2}$ (or $\sin^{-1} x = \frac{\pi}{2} - \cos^{-1} x$).

(4) $\cot^{-1} x + \tan^{-1} x = \frac{\pi}{2}$.

(5) $\csc^{-1} x + \sec^{-1} x = \frac{\pi}{2}$.

(6) $\tan^{-1}(-x) = -\tan^{-1} x$: i.e., $\tan^{-1}$ is an odd function.

(7) $\sec^{-1} x = \cos^{-1}\left(\frac{1}{x}\right)$.

(8) $\csc^{-1} x = \sin^{-1}\left(\frac{1}{x}\right)$.

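[Figure: the unit circle with the points $x$ and $-x$ marked on the horizontal axis, showing the angles $\cos^{-1} x$, $\sin^{-1} x$, and $\cos^{-1}(-x)$.]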

2.2.2 Derivatives of the inverse trig functions

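(1) For $y = \sin^{-1} x$ on $(-1, 1)$, $x = \sin y$ for $y \in (-\frac{\pi}{2}, \frac{\pi}{2})$. Hence by the chain rule,

\[ 1 = \cos y\,\frac{dy}{dx} \ \Rightarrow\ \frac{dy}{dx} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}}. \]

Note that $\cos y > 0$ for $y \in (-\frac{\pi}{2}, \frac{\pi}{2})$. Furthermore, if $u$ is a differentiable function of $x$, then

\[ y = \sin^{-1} u \ \Rightarrow\ \frac{dy}{dx} = \frac{d}{du}(\sin^{-1} u)\,\frac{du}{dx} = \frac{1}{\sqrt{1 - u^2}}\,\frac{du}{dx}, \quad |u| < 1. \]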

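(2) Similarly for $\cos^{-1}$, $\tan^{-1}$, $\cot^{-1}$:

\[
\begin{aligned}
\frac{d}{dx}(\cos^{-1} u) &= \frac{-1}{\sqrt{1 - u^2}}\,\frac{du}{dx}, \quad |u| < 1, \\
\frac{d}{dx}(\tan^{-1} u) &= \frac{1}{1 + u^2}\,\frac{du}{dx}, \\
\frac{d}{dx}(\cot^{-1} u) &= \frac{-1}{1 + u^2}\,\frac{du}{dx}.
\end{aligned}
\]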

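(3) For $y = \sec^{-1} x$,

\[ x = \sec y \ \Rightarrow\ 1 = \sec y \tan y\,\frac{dy}{dx} \ \Rightarrow\ \frac{dy}{dx} = \frac{1}{\sec y \tan y}. \]

Note that $\tan y = \pm\sqrt{\sec^2 y - 1} = \pm\sqrt{x^2 - 1}$, where the sign is $+$ if $y \in [0, \frac{\pi}{2})$ and $-$ if $y \in (\frac{\pi}{2}, \pi]$. Hence,

\[ \frac{dy}{dx} = \frac{1}{|x|\sqrt{x^2 - 1}} > 0 \quad \text{for all } x \in D_g \text{ with } |x| > 1. \]

\[
\begin{aligned}
\therefore\ \frac{d}{dx}(\sec^{-1} u) &= \frac{1}{|u|\sqrt{u^2 - 1}}\,\frac{du}{dx}, \quad |u| > 1, \\
\frac{d}{dx}(\csc^{-1} u) &= \frac{-1}{|u|\sqrt{u^2 - 1}}\,\frac{du}{dx}, \quad |u| > 1.
\end{aligned}
\]

However, if the alternative domain $D'_f$ is used and $y \in [-\pi, -\frac{\pi}{2})$, so that $x < 0$, then $\tan y > 0$. Hence, $\tan y = \sqrt{x^2 - 1}$ for $x \in D'_g$, and

\[ \frac{dy}{dx} = \frac{1}{x\sqrt{x^2 - 1}}. \]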

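(4) Integration: For $y = \sin^{-1} x$, $dy = \frac{1}{\sqrt{1 - x^2}}\,dx$. Thus

\[ \int \frac{1}{\sqrt{1 - x^2}}\,dx = \int dy = y + c = \sin^{-1} x + c. \]

Similarly, for $a > 0$,

\[
\begin{aligned}
-\int \frac{1}{\sqrt{a^2 - x^2}}\,dx &= \cos^{-1}\frac{x}{a} + c, \quad x^2 < a^2, \\
\int \frac{1}{a^2 + x^2}\,dx &= \frac{1}{a}\tan^{-1}\frac{x}{a} + c, \\
\int \frac{1}{x\sqrt{x^2 - a^2}}\,dx &= \frac{1}{a}\sec^{-1}\left|\frac{x}{a}\right| + c, \quad |x| > a > 0.
\end{aligned}
\]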


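Example 2.2.2 Evaluate $\displaystyle \int \frac{dx}{\sqrt{4x - x^2}}$.

Solution: Note that $4x - x^2 = 4 - (4 - 4x + x^2) = 2^2 - (x - 2)^2$. Take the substitution $u = x - 2$, so that $du = dx$ and

\[ \int \frac{dx}{\sqrt{4x - x^2}} = \int \frac{du}{\sqrt{2^2 - u^2}} = \sin^{-1}\frac{u}{2} + C = \sin^{-1}\left(\frac{x - 2}{2}\right) + C. \] □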
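An antiderivative can always be checked by differentiating it. The sketch below compares a central difference quotient of $\sin^{-1}\left(\frac{x-2}{2}\right)$ with the integrand of Example 2.2.2 at a few sample points of $(0, 4)$; the sample points and function names are chosen here for illustration.

```python
import math

def antiderivative(x):
    """Candidate from Example 2.2.2: arcsin((x - 2) / 2), defined for 0 <= x <= 4."""
    return math.asin((x - 2) / 2)

def integrand(x):
    """1 / sqrt(4x - x^2), defined for 0 < x < 4."""
    return 1 / math.sqrt(4 * x - x * x)

h = 1e-6
for x in (0.5, 1.0, 2.0, 3.5):
    slope = (antiderivative(x + h) - antiderivative(x - h)) / (2 * h)
    print(f"x = {x}: slope = {slope:.6f}, integrand = {integrand(x):.6f}")
```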

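Example 2.2.3 Evaluate $\int \frac{dx}{\sqrt{e^{2x} - 6}}$.

Solution: Take the substitution $u = e^x$, so that $du = e^x\,dx$, or $dx = \frac{1}{u}\,du$. Then

$$\int \frac{dx}{\sqrt{e^{2x} - 6}} = \int \frac{du/u}{\sqrt{u^2 - (\sqrt{6})^2}} = \int \frac{du}{u\sqrt{u^2 - (\sqrt{6})^2}} = \frac{1}{\sqrt{6}}\sec^{-1}\left|\frac{e^x}{\sqrt{6}}\right| + C. \quad \square$$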
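An antiderivative of this kind can be spot-checked numerically by differentiating it and comparing with the integrand. The following is a minimal sketch, not part of the original notes: the helper names and sample points are illustrative, and $\sec^{-1} z$ is computed as $\cos^{-1}(1/z)$ for $z \ge 1$.

```python
import math

def integrand(x):
    # 1 / sqrt(e^{2x} - 6); requires e^{2x} > 6
    return 1.0 / math.sqrt(math.exp(2 * x) - 6)

def antiderivative(x):
    # (1/sqrt(6)) * arcsec(e^x / sqrt(6)), with arcsec(z) = arccos(1/z) for z >= 1
    z = math.exp(x) / math.sqrt(6)
    return math.acos(1.0 / z) / math.sqrt(6)

def central_difference(f, x, h=1e-6):
    # Symmetric difference quotient approximating f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (1.0, 1.5, 2.0):
    # The two printed columns should agree to several decimal places.
    print(x, central_difference(antiderivative, x), integrand(x))
```

Agreement at a few points is only a sanity check, not a proof; the substitution argument above is what justifies the formula.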

2.2.3 Exercises

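1. Evaluate the following integrals.

(1) $\int \frac{dx}{17 + x^2}$.

(2) $\int_0^1 \frac{4\,ds}{4 - s^2}$.

(3) $\int_{-1}^{-\sqrt{2}/2} \frac{dy}{y\sqrt{4y^2 - 1}}$.

(4) $\int \frac{6\,dr}{\sqrt{4 - (r + 1)^2}}$.

(5) $\int_1^{e^{\pi/4}} \frac{4\,dt}{t(1 + \ln^2 t)}$.

(6) $\int \frac{\sec^2 x\,dx}{\sqrt{1 - \tan^2 x}}$.

(7) $\int_{-1}^{0} \frac{6\,dy}{\sqrt{3 - 2y - y^2}}$.

(8) $\int \frac{dx}{(x - 2)\sqrt{x^2 - 4x + 3}}$.

(9) $\int \frac{\sqrt{\tan^{-1} x}\,dx}{1 + x^2}$.

(10) $\int_{2/\sqrt{3}}^{2} \frac{\cos(\sec^{-1} x)\,dx}{x\sqrt{x^2 - 1}}$.

2. Find the volume of the solid of revolution of the curve $y = \frac{1}{\sqrt{1 + x^2}}$ on $[-\frac{\sqrt{3}}{3}, \sqrt{3}]$ about the $x$-axis.

3. Find the length of the curve $y = \sqrt{1 + x^2}$ on $[-\frac{1}{2}, \frac{1}{2}]$ about the $x$-axis.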


2.3 The Natural Logarithm

Recall that $\frac{d}{dx}x^n = nx^{n-1}$ gives us the formula

$$\int x^n\,dx = \frac{1}{n + 1}x^{n+1} + c \quad \text{for } n \neq -1.$$

For $n = -1$, this formula does not apply to $y = x^{-1} = \frac{1}{x}$. However, since $\frac{1}{x}$ is continuous on $(0, \infty)$, for any $x \in (0, \infty)$ the integral

$$\int_1^x \frac{1}{t}\,dt$$

always exists by Theorem 1.7.1, and it is a function of $x$ taking its values in $\mathbb{R}$.

Definition 2.3.1 We define the function $\int_1^x \frac{1}{t}\,dt$ on $(0, \infty)$ by

$$\ln x \equiv \int_1^x \frac{1}{t}\,dt,$$

and call it the natural logarithm, with domain $(0, \infty)$.

The graph of $y = \ln x$ (properties of $\ln$):

[Figure: graphs of $y = 1/x$ and $y = \ln x$, with $1$ and $e$ marked on the $x$-axis and $1$ on the $y$-axis.]

By the First Fundamental Theorem of Calculus,

$$\frac{d}{dx}(\ln x) = \frac{d}{dx}\int_1^x \frac{1}{t}\,dt = \frac{1}{x}.$$


In the case of the indefinite integral $\int \frac{1}{x}\,dx$, $x$ can be either positive or negative. Since $\ln x$ is defined only for $x > 0$, we take

$$\int \frac{1}{x}\,dx = \ln |x| + c.$$

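(1) For $x = 1$, $\ln 1 = 0$ since $\int_1^1 \frac{1}{t}\,dt = 0$.

(2) $\ln x$ is increasing, since $(\ln x)' = \frac{1}{x} > 0$ for $x > 0$.

(3) $\ln x$ is concave down, since $(\ln x)'' = -\frac{1}{x^2} < 0$ for $x > 0$.

(4) $\ln xy = \ln x + \ln y$. Indeed,

$$\ln xy = \int_1^{xy} \frac{1}{t}\,dt = \int_1^x \frac{1}{t}\,dt + \int_x^{xy} \frac{1}{t}\,dt = \ln x + \int_x^{xy} \frac{1}{t}\,dt.$$

In the last integral, set $t = xu$; then $dt = x\,du$ and

$$\int_x^{xy} \frac{1}{t}\,dt = \int_1^y \frac{1}{xu}\,x\,du = \int_1^y \frac{1}{u}\,du = \ln y.$$

(5) $\ln x^r = r \ln x$ for any rational number $r$. Indeed,

$$\frac{d}{dx}(\ln x^r) = \frac{1}{x^r}\frac{d}{dx}(x^r) = \frac{r x^{r-1}}{x^r} = \frac{r}{x} = \frac{d}{dx}(r \ln x).$$

Therefore, $\ln x^r = r \ln x + c$. Setting $x = 1$ gives $0 = 0 + c$, so $c = 0$. Hence,

$$\ln x^r = r \ln x.$$

(6) $\ln\left(\frac{x}{y}\right) = \ln x - \ln y$.

(7) The range of $\ln$ is $\mathbb{R}$: $\ln : (0, \infty) \to \mathbb{R}$. In fact, by (5), $\ln 2^n = n \ln 2 \to \infty$ as $n \to \infty$, since $\ln 2 > 0$. Similarly, $\ln 2^{-n} = -n \ln 2 \to -\infty$ as $n \to \infty$. Since $\ln$ is continuous, it takes every value in between.

(8) In general, for a differentiable function $y = f(x)$ with $y' = f'(x)$ on $I$ and $f(x) \neq 0$,

$$\int \frac{f'(x)}{f(x)}\,dx = \ln |f(x)| + c.$$

For example,

$$\int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx = -\int \frac{-\sin x}{\cos x}\,dx = -\ln |\cos x| + c.$$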

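Example 2.3.1 From $\ln x = \int_1^x \frac{1}{t}\,dt$, we get

$$\int_x^{x+1} \frac{1}{t}\,dt = \ln(x + 1) - \ln x = \ln\left(1 + \frac{1}{x}\right).$$

[Figure: graph of $y = \frac{1}{x}$ over the interval from $x$ to $x + 1$, with the heights $1/x$ and $1/(x + 1)$ marked.]

This integral is the area of the region under $y = \frac{1}{x}$ from $x$ to $x + 1$, which lies between the areas $\frac{1}{x+1}$ and $\frac{1}{x}$ of the inscribed and circumscribed rectangles of width 1; i.e.,

$$\frac{1}{x + 1} < \ln\left(1 + \frac{1}{x}\right) < \frac{1}{x}$$

holds for all $x > 0$. □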
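The double inequality of Example 2.3.1 is easy to probe numerically. The sketch below is illustrative only (the function name and sample points are my own choices); it uses `math.log1p`, which computes $\ln(1 + t)$ accurately for small $t$.

```python
import math

def bounds_hold(x):
    # Check 1/(x+1) < ln(1 + 1/x) < 1/x for a single x > 0.
    lower = 1.0 / (x + 1.0)
    middle = math.log1p(1.0 / x)
    upper = 1.0 / x
    return lower < middle < upper

samples = [0.01, 0.5, 1.0, 10.0, 1000.0]
for x in samples:
    print(x, 1.0 / (x + 1.0), math.log1p(1.0 / x), 1.0 / x, bounds_hold(x))
```

As $x$ grows, the three quantities squeeze together, which is the fact used later to bound $e$.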

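Example 2.3.2 Find $\frac{dy}{dx}$ for

$$y = \ln \frac{x\sqrt{x + 5}}{(x - 1)^3}.$$

Solution:

$$y = \ln\left(x\sqrt{x + 5}\right) - \ln (x - 1)^3 = \ln x + \frac{1}{2}\ln(x + 5) - 3\ln(x - 1).$$

Thus

$$y' = \frac{1}{x} + \frac{1}{2(x + 5)} - \frac{3}{x - 1}. \quad \square$$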

Example 2.3.3 (Logarithmic Differentiation) Find $\frac{dy}{dx}$ for

$$y^{2/3} = \frac{(x^2 + 1)(3x + 4)^{1/2}}{\sqrt{(2x - 3)(x^2 - 4)}} \quad \text{for } x > 2.$$

Solution: By taking the logarithm of the equation, we get

$$\frac{2}{3}\ln y = \ln(x^2 + 1) + \frac{1}{2}\ln(3x + 4) - \frac{1}{2}\left[\ln(2x - 3) + \ln(x^2 - 4)\right].$$

By using implicit differentiation on the left,

$$\frac{2}{3y}\,y' = \frac{2x}{x^2 + 1} + \frac{3}{2(3x + 4)} - \frac{1}{2x - 3} - \frac{x}{x^2 - 4}.$$

Hence,

$$y' = \frac{3}{2}\,y\left(\frac{2x}{x^2 + 1} + \frac{3}{2(3x + 4)} - \frac{1}{2x - 3} - \frac{x}{x^2 - 4}\right). \quad \square$$
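Sign slips are common in logarithmic differentiation, so a numerical cross-check is worthwhile. In the sketch below (names and sample points are illustrative assumptions), differentiating $\frac{2}{3}\ln y$ term by term gives $\frac{y'}{y} = \frac{3}{2}\left(\frac{2x}{x^2+1} + \frac{3}{2(3x+4)} - \frac{1}{2x-3} - \frac{x}{x^2-4}\right)$, and this is compared with a difference quotient of $y$ itself.

```python
import math

def y(x):
    # Solve y^(2/3) = (x^2+1) sqrt(3x+4) / sqrt((2x-3)(x^2-4)) for y, valid for x > 2.
    rhs = (x**2 + 1) * math.sqrt(3 * x + 4) / math.sqrt((2 * x - 3) * (x**2 - 4))
    return rhs ** 1.5

def y_prime(x):
    # Derivative obtained by logarithmic differentiation.
    bracket = (2 * x / (x**2 + 1) + 3 / (2 * (3 * x + 4))
               - 1 / (2 * x - 3) - x / (x**2 - 4))
    return 1.5 * y(x) * bracket

def central_difference(f, x, h=1e-6):
    # Symmetric difference quotient approximating f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (3.0, 5.0, 10.0):
    print(x, y_prime(x), central_difference(y, x))
```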

2.4 The Exponential Function $e^x$

Recall that $\ln : (0, \infty) \to \mathbb{R}$ is a monotonically increasing function, since its derivative is $(\ln x)' = \frac{1}{x} > 0$ for all $x > 0$. Hence it is invertible, and we denote its inverse by $\exp : \mathbb{R} \to (0, \infty)$, with $y = \exp x \equiv \ln^{-1} x$ for $x \in \mathbb{R}$, and call it the exponential function. Hence we have the following:

(1) $(\exp \circ \ln)(y) = \exp(\ln y) = y$ for $y > 0$.

(2) $(\ln \circ \exp)(x) = \ln(\exp x) = x$ for $x \in \mathbb{R}$.

(3) $y = \exp x \iff x = \ln y$.

Properties of $\exp : \mathbb{R} \to \mathbb{R}^+$:

(1) $\exp 0 = 1$, since $\ln 1 = 0$.

(2) There exists a unique real number $e > 0$ such that $\exp 1 = e$, or $\ln e = 1$, since the range of $\ln$ is $\mathbb{R}$ and $\ln$ is one-to-one. Traditionally, this number $e$ is called Euler's number, or the base of the natural logarithm, and it plays a very important role in science.

(3) Now the question is: what is its numerical value? Note that $2 < e < 4$, since

$$\int_1^2 \frac{1}{x}\,dx = \ln 2 < 1 = \ln e < \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = \frac{13}{12} < \int_1^4 \frac{1}{x}\,dx = \ln 4.$$


[Figure: graphs of $y = 1/x$ and $y = \ln x$ with the points $1$, $2$, $e$, $3$, $4$ marked on the $x$-axis.]

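In fact, its precise value may be found by the following method. Let $f(x) = \ln x$. Then $f'(x) = \frac{1}{x}$, so that $f'(1) = 1$. But,

$$f'(1) = \lim_{x \to 0}\frac{\ln(1 + x) - \ln 1}{x} = \lim_{x \to 0}\frac{1}{x}\ln(1 + x) = \lim_{x \to 0}\ln(1 + x)^{1/x} = \ln\left(\lim_{x \to 0}(1 + x)^{1/x}\right) = 1,$$

where the limit may be moved inside $\ln$ because $\ln$ is continuous. Therefore,

$$\lim_{x \to 0}(1 + x)^{1/x} = e.$$

Or, by setting $y = \frac{1}{x}$, $e = \lim_{y \to \infty}\left(1 + \frac{1}{y}\right)^y$. We will show later, by approximation, that

$$e = 2.718281828459045\cdots.$$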

(4) $\exp$ is a differentiable function on $\mathbb{R}$, with $\exp' x = \exp x$ and $\int \exp x\,dx = \exp x + c$. Indeed, for $y = \exp x$,

$$x = \ln y \;\Longrightarrow\; 1 = \frac{1}{y}\frac{dy}{dx} \;\Longrightarrow\; \frac{dy}{dx} = y = \exp x.$$

(5) For any rational number $r$, $\exp r = e^r$. Indeed, set $y = \exp r$. Then $\ln y = r = r \cdot 1 = r \ln e = \ln e^r$, or $y = e^r$. Thus, we can compute $\exp r = e^r$ for any rational number $r$.

(6) For any irrational number $x$, $e^x$ may be defined as the value $\exp x$, which is well defined. Therefore, for any real number $x$, rational or irrational, we have $\exp x = e^x$. In this way, numbers such as $e^{\sqrt{2}}$ or $e^{\pi}$ make good sense.

The following is the graph of the function $\exp$ and its inverse function $\ln$:

[Figure: graphs of $y = \exp x$, $y = \ln x$, $y = 1/x$, and the line $y = x$, with $1$ and $e$ marked on both axes.]

(7) For any $x_1, x_2 \in \mathbb{R}$, $\exp(x_1 + x_2) = \exp x_1 \exp x_2$. Indeed, set $y = \exp(x_1 + x_2)$; then $\ln y = x_1 + x_2$. Since $x_1$ and $x_2$ are in the range of $\ln$, which is $\mathbb{R}$, there exist unique positive numbers $u_1$ and $u_2$ such that $\ln u_1 = x_1$ and $\ln u_2 = x_2$; then $\ln y = \ln u_1 + \ln u_2 = \ln u_1 u_2$. Since $\ln$ is one-to-one, $y = u_1 u_2$ with $u_1 = \exp x_1$ and $u_2 = \exp x_2$. Thus $\exp(x_1 + x_2) = y = \exp x_1 \exp x_2$, or

$$e^{x_1 + x_2} = e^{x_1} \cdot e^{x_2}.$$

(8) $\exp(-x) = e^{-x} = \frac{1}{e^x}$. Indeed, set $y = e^{-x}$. Then

$$\ln y = -x \iff x = -\ln y = \ln y^{-1} = \ln\frac{1}{y} \iff e^x = \frac{1}{y} \iff y = \frac{1}{e^x}.$$

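Example 2.4.1 For any positive integer $n$, the inequalities $\frac{1}{n+1} < \ln\left(1 + \frac{1}{n}\right) < \frac{1}{n}$, obtained in Example 2.3.1, imply that

$$\ln\left(1 + \frac{1}{n}\right)^n < 1 < \ln\left(1 + \frac{1}{n}\right)^{n+1},$$

or, since $\exp$ is monotonically increasing,

$$\left(1 + \frac{1}{n}\right)^n < e < \left(1 + \frac{1}{n}\right)^{n+1}. \quad \square$$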
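The two-sided bound on $e$ can be tabulated directly. This is a small illustrative sketch (the function name `e_bounds` is not from the notes); it shows how slowly the bracket closes as $n$ grows.

```python
import math

def e_bounds(n):
    # Lower and upper bounds (1 + 1/n)^n < e < (1 + 1/n)^(n+1) for a positive integer n.
    base = 1.0 + 1.0 / n
    return base**n, base**(n + 1)

for n in (1, 10, 100, 1000):
    lower, upper = e_bounds(n)
    print(n, lower, upper, upper - lower)
```

For $n = 1$ the bounds are exactly 2 and 4, matching the crude estimate $2 < e < 4$ given earlier; the gap shrinks roughly like $e/n$.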


Example 2.4.2 (1) $y = e^{\ln 2 + 3\ln x} = e^{\ln 2}\,e^{\ln x^3} = 2x^3$.

(2) For $y = e^{\tan^{-1} x}$, since $\ln y = \tan^{-1} x$,

$$\frac{1}{y}\,y' = \frac{1}{1 + x^2} \iff y' = \frac{e^{\tan^{-1} x}}{1 + x^2}. \quad \square$$

2.5 The Functions $a^x$

So far the exponential power is defined only for the special base $e$ (i.e., $e^x$ makes sense for any real number $x$). For an arbitrary positive number $a$, we define

$$a^x \equiv e^{x \ln a}$$

for any real $x$. By taking the natural logarithm of both sides we get $\ln a^x = x \ln a$.

Properties of $a^x$:

(1) $a^{xy} = (a^x)^y = (a^y)^x$.

(2) For $y = a^x \equiv e^{x \ln a}$,

$$\frac{d}{dx}(a^x) = (\ln a)\,e^{x \ln a} = (\ln a)\,a^x.$$

Hence,

$$\int a^x\,dx = \frac{1}{\ln a}\,a^x + c, \quad \text{for } a \neq 1,\ a > 0.$$

(3) For any real number $u$ and any $x > 0$,

$$\frac{d}{dx}(x^u) = u x^{u-1}.$$

Indeed, set $y = x^u$. Then $\ln y = u \ln x$, and by the chain rule

$$\frac{1}{y}\,y' = \frac{u}{x} \;\Rightarrow\; y' = \frac{u}{x}\,x^u = u x^{u-1}.$$

Example 2.5.1 (1) For $y = x^{\sqrt{2}}$, $y' = \sqrt{2}\,x^{\sqrt{2} - 1}$.

(2) For $y = x^x$, $\ln y = x \ln x$, so

$$\frac{1}{y}\,y' = \ln x + \frac{x}{x} = \ln x + 1 \;\Rightarrow\; y' = x^x(\ln x + 1). \quad \square$$
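The formula $y' = x^x(\ln x + 1)$ can be compared against a difference quotient. The following is a rough sketch with illustrative names; it is a sanity check at a few sample points, not a derivation.

```python
import math

def f(x):
    # x^x, defined here for x > 0 via e^{x ln x}
    return math.exp(x * math.log(x))

def f_prime(x):
    # Derivative from logarithmic differentiation: x^x (ln x + 1)
    return f(x) * (math.log(x) + 1.0)

def central_difference(g, x, h=1e-6):
    # Symmetric difference quotient approximating g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.5):
    print(x, f_prime(x), central_difference(f, x))
```

Note that $f'(x) = 0$ exactly when $\ln x = -1$, i.e., at $x = 1/e$, where $x^x$ attains its minimum on $(0, \infty)$.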


2.6 The Functions $y = \log_a x$

For $a > 0$ ($a \neq 1$), we know that $y = a^x$ is differentiable and one-to-one. Hence, its inverse exists, which we call the logarithm of $x$ to the base $a$ and denote by

$$y = \log_a x.$$

Therefore

$$\log_a a^x = x \text{ for } x \in \mathbb{R}, \quad \text{and} \quad a^{\log_a x} = x \text{ for } x > 0.$$

Properties of $\log_a$:

(1) $\log_a x = \frac{\ln x}{\ln a}$. Indeed,

$$y = \log_a x \iff a^y = x \iff \ln x = y \ln a \iff y = \frac{\ln x}{\ln a}.$$

(2) $\log_a(xy) = \log_a x + \log_a y$. Indeed,

$$\frac{\ln(xy)}{\ln a} = \frac{\ln x}{\ln a} + \frac{\ln y}{\ln a} = \log_a x + \log_a y.$$

(3) $\log_a x^n = n \log_a x$.

A function $f(x)$ is said to grow faster than $g(x)$ as $x \to \infty$ if $\lim_{x \to \infty}\frac{g(x)}{f(x)} = 0$. In this case, $g(x)$ is said to grow slower than $f(x)$ as $x \to \infty$.

(4) For any $n > 0$,

$$\lim_{x \to \infty}\frac{x^n}{e^x} = 0, \qquad \lim_{x \to \infty}\frac{\ln x}{x} = 0.$$

Hence, $e^x$ grows faster than any power $x^n$ (and so faster than any polynomial), and $\ln x$ grows slower than $x$.

2.7 Order and Oh-Notation

Definition 2.7.1 [Little-oh] A function $f$ is of smaller order than $g$ as $x \to \infty$ if

$$\lim_{x \to \infty}\frac{f(x)}{g(x)} = 0,$$

denoted by $f = o(g)$ and read "$f$ is little-oh of $g$".

That is, $f$ grows slower than $g$ as $x \to \infty$. For example, (1) $\ln x = o(x)$ and $x^n = o(e^x)$ as $x \to \infty$, by (4) in Section 2.6. (2) $x^2 = o(x^3 + 1)$ as $x \to \infty$, since $\lim_{x \to \infty}\frac{x^2}{x^3 + 1} = 0$.

Definition 2.7.2 [Big-oh] Let functions $f$ and $g$ be positive for $x$ sufficiently large. Then $f$ is of at most the order of $g$ as $x \to \infty$ if there is a positive number $M$ such that

$$\frac{f(x)}{g(x)} \le M$$

for sufficiently large $x$, denoted by $f = O(g)$ and read "$f$ is big-oh of $g$".

In particular, if $f$ and $g$ grow at the same rate as $x \to \infty$, then $f = O(g)$ and $g = O(f)$. For example, (1) $x + \sin x = O(x)$ as $x \to \infty$, since $\frac{x + \sin x}{x} \le 2$ for sufficiently large $x$. (2) $e^x + x^2 = O(e^x)$ as $x \to \infty$, since $\frac{e^x + x^2}{e^x} \to 1$ as $x \to \infty$. (3) $x = o(e^x)$ as $x \to \infty$, since $\frac{x}{e^x} \to 0$ as $x \to \infty$.

Clearly, $f = o(g)$ implies $f = O(g)$.

2.8 Applications

In modelling many real-world situations, a quantity $y$ increases or decreases at a rate proportional to its size at a given time $t$. We will soon see several examples of this kind below. Such a situation can be described as follows: the fact that the rate of change of $y$ in time $t$ is proportional to $y(t)$ means that

$$\frac{dy}{dt} = ky, \quad \text{for some constant } k,$$

and the amount present at time $t = 0$ is given as $y(0) = y_0$.

If $y(t)$ is positive and increasing, then $k > 0$, and if $y(t)$ is positive and decreasing, then $k < 0$. The above equation can be solved as follows:

$$\frac{1}{y}\,dy = k\,dt,$$

$$\int \frac{1}{y}\,dy = \int k\,dt,$$

$$\ln |y(t)| = kt + c,$$

$$y(t) = \pm e^{kt + c} = \pm e^c e^{kt} = Ce^{kt},$$

where $C = y(0) = y_0$. Thus, the quantity $y(t)$ changes exponentially: exponential growth if $k > 0$, and exponential decay if $k < 0$. Such quantities are said to change according to the law of exponential change.

Example 2.8.1 Newton's law of cooling says that the rate at which the temperature of a body in a surrounding room is changing at any given time is roughly proportional to the difference between the body's temperature and that of its surroundings. Thus, if $T(t)$ is the temperature of the body at time $t$, with initial temperature $T(0) = T_0$ at time $t = 0$, and $T_s$ is the constant surrounding temperature, then we have

$$\frac{dT}{dt}(t) = k(T(t) - T_s).$$

By the above result, the solution is

$$T(t) - T_s = (T_0 - T_s)e^{kt}.$$

For example, suppose that a hard-boiled egg at $98^\circ$C is put in a sink of $18^\circ$C water to cool, and after 5 minutes the egg's temperature is found to be $38^\circ$C. Assume that the water has not warmed appreciably. Then the time taken for the egg to reach $20^\circ$C can be found as follows:

Since $T_s = 18^\circ$C, $T_0 = T(0) = 98^\circ$C and $T(5) = 38^\circ$C, we have

$$T(t) - 18 = (98 - 18)e^{kt} = 80e^{kt}.$$

But at $t = 5$, $T(5) - 18 = 38 - 18 = 20 = 80e^{5k}$. Hence, $e^{5k} = \frac{1}{4}$, or $k = -\frac{1}{5}\ln 4$. To have $T(t) = 20$, we solve $20 - 18 = 2 = 80e^{kt}$, to get

$$kt = -\ln 40 \;\Longrightarrow\; t = 5\,\frac{\ln 40}{\ln 4} = 5\left(1 + \frac{\ln 10}{\ln 4}\right) \approx 13 \text{ min}. \quad \square$$
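The arithmetic in the egg example can be reproduced in a few lines. This is a minimal sketch using the numbers from the example; the variable and function names are illustrative.

```python
import math

T_s = 18.0   # surrounding (water) temperature, degrees C
T_0 = 98.0   # initial egg temperature, degrees C

# Fit k from the observation T(5) = 38: (38 - T_s) = (T_0 - T_s) e^{5k}
k = math.log((38.0 - T_s) / (T_0 - T_s)) / 5.0   # equals -(ln 4)/5

def temperature(t):
    # Newton's law of cooling: T(t) = T_s + (T_0 - T_s) e^{kt}
    return T_s + (T_0 - T_s) * math.exp(k * t)

# Time at which the egg reaches 20 degrees C
t_20 = math.log((20.0 - T_s) / (T_0 - T_s)) / k

print(k, t_20, temperature(t_20))
```

The computed time is a little over 13 minutes, consistent with the rounded value above.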

Example 2.8.2 [Half-life of a radioactive element] The decay rate of a radioactive element is proportional to the number of radioactive nuclei present. The half-life of a radioactive element is the time required for half of the radioactive nuclei originally present in any sample to decay. Prove that the half-life of a radioactive element is a constant that does not depend on the number of radioactive nuclei initially present in the sample.

Solution: From $y(t) = y_0 e^{kt}$, if $y(t) = \frac{1}{2}y_0$, then

$$\frac{1}{2}y_0 = y_0 e^{kt} \;\Longrightarrow\; kt = -\ln 2 \;\Longrightarrow\; t = -\frac{\ln 2}{k},$$

which is a constant (and positive, since $k < 0$ for decay). □


Example 2.8.3 [Compound interest] Suppose that, for money on deposit in a savings account, a bank pays annual interest at the rate $r$ (expressed as a decimal fraction) and compounds interest $k$ times a year. An amount of $A_0$ won at the beginning of a compounding period receives an additional

$$\frac{r}{k}A_0.$$

This interest is added to the original amount $A_0$, and the amount at the end of the first period is

$$A_1 \equiv A_0 + \frac{r}{k}A_0 = A_0\left(1 + \frac{r}{k}\right).$$

Similarly, after the second period,

$$A_2 = A_1\left(1 + \frac{r}{k}\right) = A_0\left(1 + \frac{r}{k}\right)^2.$$

Hence, after the $n$-th period, the amount is

$$A_n = A_0\left(1 + \frac{r}{k}\right)^n.$$

In particular, at the end of $t$ years, the number of compounding periods is $n = tk$, and the amount is

$$A_t = A_0\left(1 + \frac{r}{k}\right)^{kt}.$$

Now if the bank increases $k$ without bound (i.e., increases the number of compounding times per year), then, after $t$ years, the amount becomes

$$\lim_{k \to \infty} A_0\left(1 + \frac{r}{k}\right)^{kt} = A_0\lim_{k \to \infty}\left(\left(1 + \frac{r}{k}\right)^{k/r}\right)^{rt} = A_0 e^{rt}.$$
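To see the limit $A_0 e^{rt}$ emerge, one can compare a few compounding frequencies. The sketch below assumes $r$ is given as a decimal fraction (0.05 for five percent); the function names and sample values are illustrative, not from the notes.

```python
import math

def compound(A0, r, k, t):
    # Amount after t years with interest rate r compounded k times a year.
    return A0 * (1.0 + r / k) ** (k * t)

def continuous(A0, r, t):
    # Limiting amount A0 * e^{rt} as k -> infinity.
    return A0 * math.exp(r * t)

A0, r, t = 1_000_000.0, 0.05, 10   # hypothetical deposit, rate, and horizon
for k in (1, 4, 12, 365):
    print(k, compound(A0, r, k, t))
print("continuous", continuous(A0, r, t))
```

The amounts increase with $k$ but stay below the continuous-compounding value, so frequent compounding helps less and less.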

2.9 Hyperbolic Functions

Another kind of function that plays an important role in applications is the hyperbolic functions. For instance, they are used in problems such as computing the tension in a cable hung between two poles, like an electric transmission line. They also appear as important antiderivatives.

| Name | Definition | Derivative |
|---|---|---|
| Hyperbolic sine of $x$ | $\sinh x = \frac{e^x - e^{-x}}{2}$ | $\cosh x$ |
| Hyperbolic cosine of $x$ | $\cosh x = \frac{e^x + e^{-x}}{2}$ | $\sinh x$ |
| Hyperbolic tangent of $x$ | $\tanh x = \frac{\sinh x}{\cosh x} = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ | $\operatorname{sech}^2 x$ |
| Hyperbolic cotangent of $x$ | $\coth x = \frac{\cosh x}{\sinh x} = \frac{e^x + e^{-x}}{e^x - e^{-x}}$ | $-\operatorname{csch}^2 x$ |
| Hyperbolic secant of $x$ | $\operatorname{sech} x = \frac{1}{\cosh x} = \frac{2}{e^x + e^{-x}}$ | $-\operatorname{sech} x \tanh x$ |
| Hyperbolic cosecant of $x$ | $\operatorname{csch} x = \frac{1}{\sinh x} = \frac{2}{e^x - e^{-x}}$ | $-\operatorname{csch} x \coth x$ |

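The following are the graphs of the hyperbolic functions.

[Figure: three panels. First: $y = \sinh x$ and $y = \cosh x$ together with $y = \frac{e^x}{2}$, $y = \frac{e^{-x}}{2}$, and $y = -\frac{e^{-x}}{2}$. Second: $y = \tanh x$ and $y = \coth x$ with the levels $1$ and $-1$ marked. Third: $y = \operatorname{sech} x$ and $y = \operatorname{csch} x$.]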

Hyperbolic functions bear a number of similarities to the trigonometric functions after which they are named. Some of the properties of hyperbolic functions similar to those of trigonometric functions are the following (an easy exercise):

$$\cosh^2 x - \sinh^2 x = 1,$$

$$\sinh 2x = 2\sinh x \cosh x,$$

$$\cosh 2x = \cosh^2 x + \sinh^2 x,$$

$$\sinh(x + y) = \sinh x \cosh y + \cosh x \sinh y,$$

$$\cosh(x + y) = \cosh x \cosh y + \sinh x \sinh y.$$

Note that

$$\frac{d}{dx}(\sinh x) = \frac{d}{dx}\left(\frac{e^x - e^{-x}}{2}\right) = \frac{e^x + e^{-x}}{2} = \cosh x.$$

The rest of the derivative formulas may be obtained similarly.

2.9.1 The inverse hyperbolic functions

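The hyperbolic functions are also invertible on suitably restricted domains. Let $y = \sinh^{-1} x$. Then

$$x = \sinh y = \frac{e^y - e^{-y}}{2}.$$

Thus $e^{2y} - 2xe^y - 1 = 0$, and so $e^y = x + \sqrt{x^2 + 1}$ (the other root, $x - \sqrt{x^2 + 1}$, is negative and cannot equal $e^y$), or

$$y = \ln\left(x + \sqrt{x^2 + 1}\right), \quad x \in \mathbb{R}.$$

Moreover, by implicit differentiation,

$$y' = \frac{1}{\cosh y} = \frac{1}{\sqrt{1 + \sinh^2 y}} = \frac{1}{\sqrt{1 + x^2}}.$$

The other formulas in the following tables may be obtained similarly.

| Function $f$ | Logarithm | Domain |
|---|---|---|
| $\sinh^{-1} x$ | $\ln\left(x + \sqrt{x^2 + 1}\right)$ | $x \in \mathbb{R}$ |
| $\cosh^{-1} x$ | $\ln\left(x + \sqrt{x^2 - 1}\right)$ | $x \ge 1$ |
| $\tanh^{-1} x$ | $\frac{1}{2}\ln\frac{1 + x}{1 - x}$ | $\lvert x \rvert < 1$ |
| $\coth^{-1} x$ | $\frac{1}{2}\ln\frac{x + 1}{x - 1}$ | $\lvert x \rvert > 1$ |
| $\operatorname{sech}^{-1} x$ | $\ln\left(\frac{1 + \sqrt{1 - x^2}}{x}\right)$ | $0 < x \le 1$ |
| $\operatorname{csch}^{-1} x$ | $\ln\left(\frac{1}{x} + \frac{\sqrt{1 + x^2}}{\lvert x \rvert}\right)$ | $x \neq 0$ |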

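| Function $y$ | Differential $dy$ | Domain |
|---|---|---|
| $\sinh^{-1}\left(\frac{u}{a}\right) + C$ | $\frac{du}{\sqrt{a^2 + u^2}}$ | $a > 0$ |
| $\cosh^{-1}\left(\frac{u}{a}\right) + C$ | $\frac{du}{\sqrt{u^2 - a^2}}$ | $u > a > 0$ |
| $\tanh^{-1}\left(\frac{u}{a}\right) + C$ | $\frac{a\,du}{a^2 - u^2}$ | $u^2 < a^2$ |
| $\coth^{-1}\left(\frac{u}{a}\right) + C$ | $\frac{a\,du}{a^2 - u^2}$ | $u^2 > a^2$ |
| $\operatorname{sech}^{-1}\left(\frac{u}{a}\right) + C$ | $\frac{-a\,du}{u\sqrt{a^2 - u^2}}$ | $a > u > 0$ |
| $\operatorname{csch}^{-1}\left(\frac{u}{a}\right) + C$ | $\frac{-a\,du}{\lvert u \rvert\sqrt{a^2 + u^2}}$ | $u \neq 0,\ a > 0$ |

The following are also useful formulas:

$$\operatorname{sech}^{-1} x = \cosh^{-1}\frac{1}{x},$$

$$\operatorname{csch}^{-1} x = \sinh^{-1}\frac{1}{x},$$

$$\coth^{-1} x = \tanh^{-1}\frac{1}{x},$$

which can be easily derived from the definitions.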
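The logarithmic expressions for the inverse hyperbolic functions can be compared with the implementations in Python's standard `math` module (`math.asinh`, `math.acosh`, `math.atanh`). This is a small sketch with illustrative helper names and sample points.

```python
import math

def asinh_log(x):
    # sinh^{-1} x = ln(x + sqrt(x^2 + 1)), for all real x
    return math.log(x + math.sqrt(x * x + 1.0))

def acosh_log(x):
    # cosh^{-1} x = ln(x + sqrt(x^2 - 1)), for x >= 1
    return math.log(x + math.sqrt(x * x - 1.0))

def atanh_log(x):
    # tanh^{-1} x = (1/2) ln((1 + x) / (1 - x)), for |x| < 1
    return 0.5 * math.log((1.0 + x) / (1.0 - x))

for x in (-2.0, 0.0, 0.5, 3.0):
    print("asinh", x, asinh_log(x), math.asinh(x))
for x in (1.0, 1.5, 3.0):
    print("acosh", x, acosh_log(x), math.acosh(x))
for x in (-0.9, 0.0, 0.5):
    print("atanh", x, atanh_log(x), math.atanh(x))
```

For large negative $x$ the expression $x + \sqrt{x^2 + 1}$ suffers from cancellation in floating point, which is one reason library routines do not evaluate the logarithm formula literally.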

Problem 2.9.1 Evaluate $\int_0^1 \sinh^2 x\,dx$. Use $\sinh^2 x = \frac{1}{2}(\cosh 2x - 1)$.

Example 2.9.1 [Applications to Hanging Cables] Suppose that a cable of weight $w$ kg/m hangs freely between two poles $A$ and $B$ and is in equilibrium of forces. If the cable is held at $A$ so that it does not slip, and is then cut off at the point $S$ where it crosses the $x$-axis (it may then be released at $S$), then the weight of the section $AS$ of the cable is in balance with the section $AQ$. Choose a coordinate system for the plane of the cable so that the $x$-axis is horizontal, the force of gravity points straight down along the $y$-axis, and the lowest point $Q$ of the cable lies at $y = H/w$, where $H$ is the magnitude of the horizontal tension at the lowest point $Q$. Let $P(x, y)$ denote an arbitrary point of the cable, $T$ the magnitude of the tangential tension at $P$, $\varphi$ the angle between the tangent at $P$ and the horizontal, and $W = ws$ the total weight of the section $PQ$ of the cable, which has length $s$. Then

$$H = T\cos\varphi, \qquad W = ws = T\sin\varphi.$$

Now

$$\frac{W}{H} = \frac{T\sin\varphi}{T\cos\varphi} = \tan\varphi = \frac{dy}{dx} = \frac{ws}{H}.$$

Hence $\frac{d^2y}{dx^2} = \frac{w}{H}\frac{ds}{dx}$. But, since $\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2}$,

$$\frac{d^2y}{dx^2} = \frac{w}{H}\sqrt{1 + \left(\frac{dy}{dx}\right)^2};$$

i.e., the condition that the hanging cable is in equilibrium of forces is expressed by the above differential equation.

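[Figure 2.1: Hanging cable: the catenary. The sketch shows the poles $A$ and $B$, the lowest point $Q(0, \frac{H}{w})$, a point $P(x, y)$ with tension $T$ at angle $\varphi$, the horizontal tension $H$, the weight $W = ws$, and the point $S$ on the $x$-axis.]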

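If we set $y' = \frac{dy}{dx} = u$, this equation can be rewritten as

$$\frac{1}{\sqrt{1 + u^2}}\frac{du}{dx} = \frac{w}{H},$$

whose solution is

$$\sinh^{-1} u = \frac{w}{H}x + c, \quad \text{or} \quad \sinh\left(\frac{w}{H}x + c\right) = u(x) = \frac{dy}{dx} = \tan\varphi.$$

Since $\tan\varphi = \frac{dy}{dx} = 0$ when $x = 0$, $c = 0$. Thus,

$$\tan\varphi = \frac{dy}{dx} = \sinh\left(\frac{w}{H}x\right).$$

But then $y(x) = \frac{H}{w}\cosh\left(\frac{w}{H}x\right) + d$. Since $y(0) = \frac{H}{w}$, $d = 0$. Thus

$$y(x) = \frac{H}{w}\cosh\left(\frac{w}{H}x\right).$$

On the other hand, since

$$\frac{ds}{dx} = \sqrt{1 + \left(\frac{dy}{dx}\right)^2} = \sqrt{1 + \sinh^2\left(\frac{w}{H}x\right)} = \cosh\left(\frac{w}{H}x\right)$$

and $s = 0$ at $x = 0$,

$$s = \frac{H}{w}\sinh\left(\frac{w}{H}x\right).$$

Therefore, the coordinates of $P(x, y)$ may be expressed as

$$x = \frac{H}{w}\sinh^{-1}\left(\frac{w}{H}s\right), \qquad y = \sqrt{s^2 + \left(\frac{H}{w}\right)^2}.$$

The curve $y = a\cosh\frac{x}{a}$ is called a catenary, from the Latin word catena, meaning chain.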
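The relations between $x$, $y$, and the arc length $s$ on the catenary can be checked numerically. The sketch below uses hypothetical values of $H$ and $w$ chosen only for illustration, and writes $a = H/w$.

```python
import math

def catenary(x, H, w):
    # Height y and arc length s (measured from the lowest point) at horizontal position x.
    a = H / w
    y = a * math.cosh(x / a)
    s = a * math.sinh(x / a)
    return y, s

H, w = 100.0, 2.0          # hypothetical horizontal tension and weight per unit length
a = H / w
for x in (-30.0, 0.0, 10.0, 40.0):
    y, s = catenary(x, H, w)
    # Both identities from the text: y = sqrt(s^2 + a^2) and x = a * asinh(s / a)
    print(x, y, math.hypot(s, a), a * math.asinh(s / a))
```

The identity $y = \sqrt{s^2 + a^2}$ is just $\cosh^2 - \sinh^2 = 1$ scaled by $a$.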

2.10 Techniques of integration

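In this section, we introduce some techniques for finding indefinite integrals of more complicated functions than those seen before.

2.10.1 Basic integration formulas

$\int u^n\,du = \frac{u^{n+1}}{n + 1} + C \quad (n \neq -1)$

$\int \frac{du}{u} = \ln |u| + C$

$\int e^u\,du = e^u + C$

$\int a^u\,du = \frac{a^u}{\ln a} + C \quad (a > 0,\ a \neq 1)$

$\int \sec^2 u\,du = \tan u + C$

$\int \sec u \tan u\,du = \sec u + C$

$\int \tan u\,du = -\ln |\cos u| + C$

$\int \sinh u\,du = \cosh u + C$

$\int \frac{du}{\sqrt{a^2 - u^2}} = \sin^{-1}\frac{u}{a} + C$

$\int \frac{du}{a^2 + u^2} = \frac{1}{a}\tan^{-1}\frac{u}{a} + C$

$\int \frac{du}{u\sqrt{u^2 - a^2}} = \frac{1}{a}\sec^{-1}\left|\frac{u}{a}\right| + C$

$\int \frac{du}{\sqrt{a^2 + u^2}} = \sinh^{-1}\frac{u}{a} + C \quad (a > 0)$

$\int \frac{du}{\sqrt{u^2 - a^2}} = \cosh^{-1}\frac{u}{a} + C \quad (u > a > 0)$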

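Example 2.10.1 Note that $1 + \cos 2x = 2\cos^2 x$, and that $\cos 2x \ge 0$ on $[0, \frac{\pi}{4}]$. Hence

$$\int_0^{\pi/4}\sqrt{1 + \cos 4x}\,dx = \int_0^{\pi/4}\sqrt{2\cos^2 2x}\,dx = \sqrt{2}\int_0^{\pi/4}|\cos 2x|\,dx = \sqrt{2}\left.\frac{\sin 2x}{2}\right|_0^{\pi/4} = \frac{\sqrt{2}}{2}. \quad \square$$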


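Example 2.10.2 With $u = \sec x + \tan x$, so that $du = (\sec x \tan x + \sec^2 x)\,dx$,

$$\int \sec x\,dx = \int \frac{\sec x(\sec x + \tan x)}{\sec x + \tan x}\,dx = \int \frac{\sec^2 x + \sec x \tan x}{\sec x + \tan x}\,dx = \int \frac{du}{u} = \ln |u| + C = \ln |\sec x + \tan x| + C. \quad \square$$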

2.10.2 Integration by parts

If $u = f(x)$ and $v = g(x)$ have continuous derivatives on $[a, b]$, the product rule says that

$$\frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x).$$

Thus

$$f(x)g(x) = \int \frac{d}{dx}[f(x)g(x)]\,dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx,$$

or

$$\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx,$$

or

$$\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f'(x)g(x)\,dx,$$

or

$$\int u\,dv = uv - \int v\,du.$$

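Example 2.10.3

$$\int x\cos x\,dx = x\sin x - \int \sin x\,dx = x\sin x + \cos x + C.$$

$$\int \ln x\,dx = x\ln x - \int x \cdot \frac{1}{x}\,dx = x\ln x - x + C. \quad \square$$

Example 2.10.4 Integrating by parts twice,

$$\int e^x\cos x\,dx = e^x\sin x - \int e^x\sin x\,dx = e^x\sin x - \left(-e^x\cos x - \int e^x(-\cos x)\,dx\right) = e^x\sin x + e^x\cos x - \int e^x\cos x\,dx.$$

Solving for the integral,

$$\int e^x\cos x\,dx = \frac{e^x\sin x + e^x\cos x}{2} + C. \quad \square$$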
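A result obtained by repeated integration by parts can be cross-checked against numerical quadrature. The sketch below implements composite Simpson's rule by hand (an illustrative choice; the function names and the interval $[0, 1]$ are assumptions) and compares it with the antiderivative $\frac{1}{2}e^x(\sin x + \cos x)$.

```python
import math

def simpson(f, a, b, n=1000):
    # Composite Simpson's rule on [a, b]; n must be even.
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def integrand(x):
    return math.exp(x) * math.cos(x)

def antiderivative(x):
    # (e^x sin x + e^x cos x) / 2
    return math.exp(x) * (math.sin(x) + math.cos(x)) / 2.0

numeric = simpson(integrand, 0.0, 1.0)
exact = antiderivative(1.0) - antiderivative(0.0)
print(numeric, exact)
```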


2.10.3 Integration by partial fractions

Suppose that the integrand is a rational function of the form $\frac{f(x)}{g(x)}$ with $\deg f < \deg g$.

1. Let $(x - a)^m$ be a factor of $g(x)$, but $(x - a)^{m+1} \nmid g(x)$. To this factor, assign the sum of the $m$ partial fractions:

$$\frac{A_1}{x - a} + \frac{A_2}{(x - a)^2} + \cdots + \frac{A_m}{(x - a)^m}.$$

Do this for each distinct linear factor of $g(x)$.

2. Let $(x^2 + px + q)^n$ be a factor of $g(x)$, but $(x^2 + px + q)^{n+1} \nmid g(x)$. To this factor, assign the sum of the $n$ partial fractions:

$$\frac{B_1x + C_1}{x^2 + px + q} + \frac{B_2x + C_2}{(x^2 + px + q)^2} + \cdots + \frac{B_nx + C_n}{(x^2 + px + q)^n}.$$

Do this for each distinct quadratic factor of $g(x)$ that cannot be factored into linear factors with real coefficients.

3. Set the integrand $\frac{f(x)}{g(x)}$ equal to the sum of all these partial fractions, and determine the coefficients.

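Example 2.10.5 Evaluate $\int \frac{-2x + 4}{(x^2 + 1)(x - 1)^2}\,dx$.

Solution: Write the integrand as

$$\frac{-2x + 4}{(x^2 + 1)(x - 1)^2} = \frac{Ax + B}{x^2 + 1} + \frac{C}{x - 1} + \frac{D}{(x - 1)^2}.$$

The numerators can be written as

$$-2x + 4 = (Ax + B)(x - 1)^2 + C(x - 1)(x^2 + 1) + D(x^2 + 1)$$
$$= (A + C)x^3 + (-2A + B - C + D)x^2 + (A - 2B + C)x + (B - C + D).$$

$$\therefore\; 0 = A + C, \quad 0 = -2A + B - C + D, \quad -2 = A - 2B + C, \quad 4 = B - C + D.$$

$$\therefore\; A = 2, \quad C = -2, \quad B = 1, \quad D = 1.$$

$$\therefore\; \int \frac{-2x + 4}{(x^2 + 1)(x - 1)^2}\,dx = \int \left(\frac{2x + 1}{x^2 + 1} - \frac{2}{x - 1} + \frac{1}{(x - 1)^2}\right)dx$$
$$= \int \left(\frac{2x}{x^2 + 1} + \frac{1}{x^2 + 1} - \frac{2}{x - 1} + \frac{1}{(x - 1)^2}\right)dx$$
$$= \ln(x^2 + 1) + \tan^{-1} x - 2\ln |x - 1| - \frac{1}{x - 1} + C. \quad \square$$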
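A partial fraction decomposition can be verified exactly with rational arithmetic. The sketch below evaluates both sides at a handful of rational points using `fractions.Fraction` from the standard library; the function names and sample points are illustrative.

```python
from fractions import Fraction

def original(x):
    # (-2x + 4) / ((x^2 + 1)(x - 1)^2)
    return (-2 * x + 4) / ((x**2 + 1) * (x - 1) ** 2)

def decomposed(x):
    # (2x + 1)/(x^2 + 1) - 2/(x - 1) + 1/(x - 1)^2, i.e. A = 2, B = 1, C = -2, D = 1
    return (2 * x + 1) / (x**2 + 1) - 2 / (x - 1) + 1 / (x - 1) ** 2

# Sample points avoid the pole at x = 1.
samples = [Fraction(n, 3) for n in (-7, -2, 0, 1, 5, 11)]
print(all(original(x) == decomposed(x) for x in samples))
```

Because clearing denominators leaves a polynomial identity of degree at most 3, exact agreement at more than three points confirms the coefficients.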


2.10.4 Trigonometric substitutions

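Trigonometric substitutions can be very useful in transforming integrals involving $\sqrt{a^2 - x^2}$, $\sqrt{a^2 + x^2}$, and $\sqrt{x^2 - a^2}$ into integrals we can evaluate directly.

Consider the following right triangles.

[Figure: three right triangles with angle $\theta$, illustrating $x = a\tan\theta$ (hypotenuse $\sqrt{a^2 + x^2}$), $x = a\sin\theta$ (adjacent side $\sqrt{a^2 - x^2}$), and $x = a\sec\theta$ (opposite side $\sqrt{x^2 - a^2}$).]

The most common substitutions are the following:

With $x = a\tan\theta$, $a^2 + x^2 = a^2(1 + \tan^2\theta) = a^2\sec^2\theta$.

With $x = a\sin\theta$, $a^2 - x^2 = a^2(1 - \sin^2\theta) = a^2\cos^2\theta$.

With $x = a\sec\theta$, $x^2 - a^2 = a^2(\sec^2\theta - 1) = a^2\tan^2\theta$.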

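Example 2.10.6 Find the area enclosed by the ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$.

Solution: Write the equation as

$$\frac{y^2}{b^2} = 1 - \frac{x^2}{a^2} = \frac{a^2 - x^2}{a^2}, \quad \text{or} \quad y = \frac{b}{a}\sqrt{a^2 - x^2}$$

for the upper half of the ellipse. By symmetry, the area of the ellipse is four times the area in the first quadrant. With the substitution $x = a\sin\theta$, $dx = a\cos\theta\,d\theta$,

$$A = 4\int_0^a \frac{b}{a}\sqrt{a^2 - x^2}\,dx = 4\,\frac{b}{a}\int_0^{\pi/2} a\cos\theta \cdot a\cos\theta\,d\theta$$
$$= 4ab\int_0^{\pi/2}\cos^2\theta\,d\theta = 4ab\int_0^{\pi/2}\frac{1 + \cos 2\theta}{2}\,d\theta = 2ab\left[\theta + \frac{\sin 2\theta}{2}\right]_0^{\pi/2} = \pi ab. \quad \square$$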


2.10.5 Improper integrals

Definite integrals require, first, that the domain of integration $[a, b]$ be finite and, second, that the range of the integrand be finite over the domain. In practice, we may encounter problems that fail to meet one or both of these conditions: for example, the integral of $\frac{\ln x}{x^2}$ over $[1, \infty)$, or the integral of $\frac{1}{\sqrt{x}}$ over $(0, 1]$. Such integrals are said to be improper.

Definition 2.10.1 Let $f(x)$ be a continuous function on an interval $I$. The following are called improper integrals, and are defined as limits:

(1) For $I = [a, \infty)$,
$$\int_a^\infty f(x)\,dx = \lim_{b \to \infty}\int_a^b f(x)\,dx.$$

(2) For $I = (-\infty, b]$,
$$\int_{-\infty}^b f(x)\,dx = \lim_{a \to -\infty}\int_a^b f(x)\,dx.$$

(3) For $I = (-\infty, \infty)$, choose any real number $c$, and set
$$\int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^c f(x)\,dx + \int_c^\infty f(x)\,dx.$$

(4) Suppose that $f(x) \to \infty$ as $x \to c^+$ on $I = (c, b]$. Then
$$\int_c^b f(x)\,dx = \lim_{a \to c^+}\int_a^b f(x)\,dx.$$

(5) Suppose that $f(x) \to \infty$ as $x \to c^-$ on $I = [a, c)$. Then
$$\int_a^c f(x)\,dx = \lim_{b \to c^-}\int_a^b f(x)\,dx.$$

(6) Suppose that $f(x) \to \infty$ as $x \to c$ on $I = [a, b]$ with $a < c < b$. Then
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$

In each case, if the limit is finite, we say the improper integral converges. If the limit does not exist, the integral diverges.

It is easy to show that the choice of $c$ in (3) is immaterial.

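Example 2.10.7 Evaluate $\int_1^\infty \frac{\ln x}{x^2}\,dx$.

Solution: Integrating by parts,

$$\int_1^b \frac{\ln x}{x^2}\,dx = \left[\ln x\left(-\frac{1}{x}\right)\right]_1^b - \int_1^b\left(-\frac{1}{x^2}\right)dx = -\frac{\ln b}{b} - \left[\frac{1}{x}\right]_1^b = -\frac{\ln b}{b} - \frac{1}{b} + 1.$$

$$\therefore\; \int_1^\infty \frac{\ln x}{x^2}\,dx = \lim_{b \to \infty}\int_1^b \frac{\ln x}{x^2}\,dx = \lim_{b \to \infty}\left(-\frac{\ln b}{b} - \frac{1}{b} + 1\right) = 1. \quad \square$$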
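The definition of an improper integral as a limit of proper integrals can be made concrete. In the sketch below (illustrative names; the midpoint rule is my choice of quadrature, not the notes'), the closed form from the example is checked against a numerical integral for one finite $b$ and then evaluated for growing $b$.

```python
import math

def closed_form(b):
    # Integral of ln(x)/x^2 over [1, b], from integration by parts: 1 - (ln b + 1)/b
    return 1.0 - (math.log(b) + 1.0) / b

def midpoint_rule(f, a, b, n=100_000):
    # Composite midpoint rule with n subintervals.
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def integrand(x):
    return math.log(x) / x**2

print(closed_form(10.0), midpoint_rule(integrand, 1.0, 10.0))
for b in (1e1, 1e3, 1e6, 1e9):
    print(b, closed_form(b))   # approaches 1 as b grows
```

The convergence is slow, of order $(\ln b)/b$, which a direct numerical integration over a huge interval would struggle to show.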

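Example 2.10.8 Evaluate $\int_{-\infty}^\infty \frac{dx}{1 + x^2}$.

Solution:

$$\int_{-\infty}^0 \frac{dx}{1 + x^2} = \lim_{a \to -\infty}\int_a^0 \frac{dx}{1 + x^2} = \lim_{a \to -\infty}\left[\tan^{-1} x\right]_a^0 = \lim_{a \to -\infty}\left(-\tan^{-1} a\right) = \frac{\pi}{2}.$$

Similarly, one can easily show that $\int_0^\infty \frac{dx}{1 + x^2} = \frac{\pi}{2}$. Thus

$$\int_{-\infty}^\infty \frac{dx}{1 + x^2} = \int_{-\infty}^0 \frac{dx}{1 + x^2} + \int_0^\infty \frac{dx}{1 + x^2} = \pi. \quad \square$$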

Example 2.10.9 Evaluate $\int_0^1 \frac{dx}{1 - x}$.

Solution:

$$\int_0^1 \frac{dx}{1 - x} = \lim_{b \to 1^-}\int_0^b \frac{dx}{1 - x} = \lim_{b \to 1^-}\left[-\ln |1 - x|\right]_0^b = \lim_{b \to 1^-}\left(-\ln |1 - b|\right) = \infty.$$

Thus the integral diverges. □

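Example 2.10.10 Evaluate $\int_0^3 \frac{dx}{(x - 1)^{2/3}}$.

Solution: Since the integrand goes to infinity at $x = 1$, we first evaluate:

$$\int_0^1 \frac{dx}{(x - 1)^{2/3}} = \lim_{b \to 1^-}\int_0^b \frac{dx}{(x - 1)^{2/3}} = \lim_{b \to 1^-}\left[3(x - 1)^{1/3}\right]_0^b = \lim_{b \to 1^-}\left[3(b - 1)^{1/3} + 3\right] = 3.$$

Similarly,

$$\int_1^3 \frac{dx}{(x - 1)^{2/3}} = \lim_{c \to 1^+}\int_c^3 \frac{dx}{(x - 1)^{2/3}} = \lim_{c \to 1^+}\left[3(x - 1)^{1/3}\right]_c^3 = \lim_{c \to 1^+}\left[3(3 - 1)^{1/3} - 3(c - 1)^{1/3}\right] = 3\sqrt[3]{2}.$$

Thus

$$\int_0^3 \frac{dx}{(x - 1)^{2/3}} = \int_0^1 \frac{dx}{(x - 1)^{2/3}} + \int_1^3 \frac{dx}{(x - 1)^{2/3}} = 3 + 3\sqrt[3]{2}. \quad \square$$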

Theorem 2.10.1 (Comparison Test for convergence) Let $f$ and $g$ be continuous on $[a, \infty)$ with $0 \le f(x) \le g(x)$ for all $x \ge a$. Then

(1) $\int_a^\infty f(x)\,dx$ converges if $\int_a^\infty g(x)\,dx$ converges.

(2) $\int_a^\infty g(x)\,dx$ diverges if $\int_a^\infty f(x)\,dx$ diverges.

If

$$\lim_{x \to \infty}\frac{f(x)}{g(x)} = L, \quad 0 < L < \infty,$$

then $\int_a^\infty f(x)\,dx$ and $\int_a^\infty g(x)\,dx$ both converge or both diverge.

The reasoning behind the first two parts is illustrated by the following example:

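Example 2.10.11 Investigate the convergence of $\int_1^\infty e^{-x^2}\,dx$.

Solution: By definition, $\int_1^\infty e^{-x^2}\,dx = \lim_{b \to \infty}\int_1^b e^{-x^2}\,dx$. Since $e^{-x^2} > 0$ on $[1, b]$, $\int_1^b e^{-x^2}\,dx$ is increasing as $b \to \infty$. Moreover, on a finite interval $[1, b]$, $e^{-x^2} \le e^{-x}$. Thus

$$\int_1^b e^{-x^2}\,dx \le \int_1^b e^{-x}\,dx = -e^{-b} + e^{-1} < e^{-1}$$

for all $b > 1$. Hence, being increasing and bounded above, the limit

$$\lim_{b \to \infty}\int_1^b e^{-x^2}\,dx \le e^{-1}$$

exists. □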

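Example 2.10.12 Investigate the convergence of $\int_1^\infty \frac{3}{e^x - 2}\,dx$.

Solution: It is easy to see that $\int_1^\infty \frac{1}{e^x}\,dx$ converges. Moreover,

$$\lim_{x \to \infty}\frac{1/e^x}{3/(e^x - 2)} = \lim_{x \to \infty}\left(\frac{1}{3} - \frac{2}{3e^x}\right) = \frac{1}{3}.$$

Thus, $\int_1^\infty \frac{3}{e^x - 2}\,dx$ also converges. □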

Chapter 3

Infinite Sequences and Series

3.1 Sequences

Definition 3.1.1 An infinite sequence of numbers is a function from the set $\mathbb{Z}^+$ of positive integers into a set $R$.

The set $R$ can be any set, but in our course it is usually the set $\mathbb{R}$ of real numbers. Thus a sequence is usually denoted by writing down all the numbers in the range, with the numbers in the domain as the indices:

$$\{a_1, \ldots, a_n, \ldots\} = \{a_n \in \mathbb{R} \mid n \in \mathbb{Z}^+\} = \{a_n\}.$$

For example, $a_n = (-1)^{n+1}\frac{1}{n}$ means

$$\left\{1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \ldots, (-1)^{n+1}\frac{1}{n}, \ldots\right\}.$$

Definition 3.1.2 A sequence $\{a_n\}$ converges to a number $L$ if for every positive number $\varepsilon$ there is an integer $N$ such that for all $n > N$,

$$|a_n - L| < \varepsilon.$$

In this case, we write

$$\lim_{n \to \infty} a_n = L, \quad \text{or} \quad a_n \to L,$$

and call $L$ the limit of the sequence. If no such number $L$ exists, we say $\{a_n\}$ diverges.

Definition 3.1.3 A sequence $\{a_n\}$ diverges to infinity if for every number $M$ there is an integer $N$ such that for all $n > N$,

$$a_n > M.$$

In this case, we write

$$\lim_{n \to \infty} a_n = \infty, \quad \text{or} \quad a_n \to \infty.$$

Similarly, if for every number $m$ there is an integer $N$ such that for all $n > N$,

$$a_n < m,$$

then we say $\{a_n\}$ diverges to negative infinity and write

$$\lim_{n \to \infty} a_n = -\infty, \quad \text{or} \quad a_n \to -\infty.$$

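Theorem 3.1.1 Let $\{a_n\}$ and $\{b_n\}$ be sequences converging to $A$ and $B$, respectively.

(1) $\lim_{n \to \infty}(a_n \pm b_n) = A \pm B$.

(2) $\lim_{n \to \infty}(a_n \cdot b_n) = A \cdot B$.

(3) $\lim_{n \to \infty}(k \cdot b_n) = k \cdot B$.

(4) $\lim_{n \to \infty}\frac{a_n}{b_n} = \frac{A}{B}$, provided $B \neq 0$.

Example 3.1.1 (1) Show $\lim_{n \to \infty}\frac{1}{n} = 0$.

For any given $\varepsilon > 0$, take $N$ such that $N > \frac{1}{\varepsilon}$. Then for $n > N$,

$$\left|\frac{1}{n} - 0\right| < \frac{1}{N} < \varepsilon.$$

(2) $\lim_{n \to \infty}\frac{n - 1}{n} = \lim_{n \to \infty}\left(1 - \frac{1}{n}\right) = 1 - 0 = 1$. □

Theorem 3.1.2 (Sandwich Theorem) Let $\{a_n\}$, $\{b_n\}$ and $\{c_n\}$ be sequences of real numbers. If $a_n \le b_n \le c_n$ for all $n$ except for a finite number, and if $\lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n = L$, then $\lim_{n \to \infty} b_n = L$.

Example 3.1.2 (1) $\lim_{n \to \infty}\frac{\cos n}{n} = 0$, since $-\frac{1}{n} \le \frac{\cos n}{n} \le \frac{1}{n}$.

(2) $\lim_{n \to \infty}\frac{1}{2^n} = 0$, since $0 \le \frac{1}{2^n} \le \frac{1}{n}$. □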

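Theorem 3.1.3 Let $\{a_n\}$ be a sequence of real numbers. If $a_n \to L$ and if $f$ is a function that is continuous at $L$ and defined at all $a_n$, then $f(a_n) \to f(L)$.

Example 3.1.3 (1) $\sqrt{(n + 1)/n} \to 1$: since $\frac{n + 1}{n} \to 1$, taking $f(x) = \sqrt{x}$ and $L = 1$ gives $\sqrt{(n + 1)/n} \to \sqrt{1} = 1$.

(2) $\lim_{n \to \infty} 2^{1/n} = 2^0 = 1$: since $\lim_{n \to \infty}\frac{1}{n} = 0$, take $a_n = \frac{1}{n}$, $f(x) = 2^x$ and $L = 0$. □

Theorem 3.1.4 Let $\{a_n\}$ be a sequence of real numbers and $f(x)$ be a function defined for all $x \ge n_0$ such that $f(n) = a_n$ for $n \ge n_0$. Then

$$\lim_{x \to \infty} f(x) = L \;\Rightarrow\; \lim_{n \to \infty} a_n = L.$$

Proof: If $\lim_{x \to \infty} f(x) = L$, then for any $\varepsilon > 0$ there is $M$ such that $|f(x) - L| < \varepsilon$ for any $x > M$. Choose an integer $N > \max\{M, n_0\}$, so that $|a_n - L| = |f(n) - L| < \varepsilon$ for all $n > N$. □

Example 3.1.4 (1) By l'Hôpital's rule, $\lim_{n \to \infty}\frac{\ln n}{n} = \lim_{x \to \infty}\frac{\ln x}{x} = \lim_{x \to \infty}\frac{1/x}{1} = 0$.

(2) $\lim_{n \to \infty}\frac{2^n}{5n} = \lim_{x \to \infty}\frac{2^x \ln 2}{5} = \infty$.

(3) For $a_n = \left(\frac{n + 1}{n - 1}\right)^n$, take the natural logarithm: $\ln a_n = n\ln\left(\frac{n + 1}{n - 1}\right)$, and then

$$\lim_{n \to \infty}\ln a_n = \lim_{n \to \infty} n\ln\left(\frac{n + 1}{n - 1}\right) = \lim_{n \to \infty}\frac{\ln\left(\frac{n + 1}{n - 1}\right)}{1/n} = \lim_{n \to \infty}\frac{-2/(n^2 - 1)}{-1/n^2} = \lim_{n \to \infty}\frac{2n^2}{n^2 - 1} = 2.$$

Thus $a_n = e^{\ln a_n} \to e^2$. □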

Theorem 3.1.5 (1) $\lim_{n \to \infty}\frac{\ln n}{n} = 0$.

(2) $\lim_{n \to \infty}\sqrt[n]{n} = 1$.

(3) $\lim_{n \to \infty} x^{1/n} = 1$, for $x > 0$.

(4) $\lim_{n \to \infty} x^n = 0$, for $|x| < 1$.

(5) $\lim_{n \to \infty}\left(1 + \frac{x}{n}\right)^n = e^x$, for any $x$.

(6) $\lim_{n \to \infty}\frac{x^n}{n!} = 0$, for any $x$.

Sometimes, a sequence may be given by a recursive formula: $a_n$ is determined by the values of the preceding terms, together with the values of the initial terms.

Example 3.1.5 (1) $a_n = a_{n-1} + 1$ with the initial value $a_1 = 1$ defines the natural numbers $a_n = n$.

(2) $a_n = ra_{n-1}$ with the initial value $a_0 = 1$ defines the geometric sequence $a_n = r^n$.

(3) $a_n = na_{n-1}$ with $a_1 = 1$ defines the factorial $a_n = n!$.

(4) $a_{n+1} = a_n + a_{n-1}$ with $a_1 = 1$ and $a_2 = 1$ defines the Fibonacci numbers. □

Definition 3.1.4 A sequence $\{a_n\}$ with the property that $a_n \le a_{n+1}$ for all $n$ is called a nondecreasing sequence.

Definition 3.1.5 A sequence $\{a_n\}$ is bounded from above if there is a number $M$ such that $a_n \le M$ for all $n$. The number $M$ is called an upper bound for $\{a_n\}$. If no number less than $M$ is an upper bound for $\{a_n\}$, then $M$ is called the least upper bound for $\{a_n\}$.

The completeness of the real numbers means that any nonempty set bounded above has a least upper bound. Thus, a nondecreasing sequence bounded above has a least upper bound $L$. In fact, (i) $a_n \le L$ for all $n$, and (ii) for any $\varepsilon > 0$, $L - \varepsilon$ cannot be an upper bound of the sequence, since $L$ is the least upper bound. Thus, there exists $n_0$ such that $a_{n_0} > L - \varepsilon$. Since the sequence is nondecreasing, for all $n > n_0$,

$$L \ge a_n \ge a_{n_0} > L - \varepsilon.$$

This means $L$ is the limit of the sequence; i.e., $a_n \to L$.

Theorem 3.1.6 A nondecreasing sequence of real numbers converges if and only if it is bounded from above. If a nondecreasing sequence converges, it converges to its least upper bound.

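Example 3.1.6 For $C_n = \frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n$, show that the limit

$$C = \lim_{n \to \infty} C_n = \lim_{n \to \infty}\left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right)$$

exists.

Solution: Let $D_n = C_n - \frac{1}{n}$. Then $D_n < C_n$, and from the inequality

$$\frac{1}{n + 1} < \ln\left(1 + \frac{1}{n}\right) < \frac{1}{n}$$

obtained in Example 2.3.1, we get

$$C_{n+1} - C_n = \frac{1}{n + 1} - \ln\left(1 + \frac{1}{n}\right) < 0,$$

$$D_{n+1} - D_n = C_{n+1} - C_n - \frac{1}{n + 1} + \frac{1}{n} = \frac{1}{n} - \ln\left(1 + \frac{1}{n}\right) > 0.$$

That is, $C_n$ is decreasing, while $D_n$ is increasing. Hence, $C_n$ converges, since it is decreasing and bounded below: $0 = D_1 \le D_n < C_n$. The limit $C$ is called Euler's constant. For an application of this number, see the last part of Section 11.1.2. □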
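The squeeze between $D_n$ and $C_n$ gives a practical way to bracket Euler's constant. The sketch below is illustrative (function names are mine); the reference value $0.5772156649$ quoted in the comment is the well-known decimal expansion of Euler's constant, not something derived in these notes.

```python
import math

def C(n):
    # C_n = 1 + 1/2 + ... + 1/n - ln n  (decreasing in n)
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)

def D(n):
    # D_n = C_n - 1/n  (increasing in n)
    return C(n) - 1.0 / n

for n in (1, 10, 100, 1000):
    # Euler's constant (about 0.5772156649) lies between D_n and C_n.
    print(n, D(n), C(n))
```

The bracket has width exactly $1/n$, so this is a slow way to compute the constant, but it makes the monotone convergence argument visible.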

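Example 3.1.7 (Logarithmic Limits) Find the limit of

$$a_n = \left(1 + \frac{1}{np}\right)\left(1 + \frac{2}{np}\right)\cdots\left(1 + \frac{p}{np}\right),$$

where $p$ is a positive integer.

Solution: Note that $1 \le a_n$. By taking the logarithm of $a_n$, we get

$$0 \le \ln a_n = \ln\left(1 + \frac{1}{np}\right) + \ln\left(1 + \frac{2}{np}\right) + \cdots + \ln\left(1 + \frac{p}{np}\right)$$
$$= \frac{1}{np}\left[\ln\left(1 + \frac{1}{np}\right)^{np} + 2\ln\left(1 + \frac{2}{np}\right)^{\frac{np}{2}} + \cdots + p\ln\left(1 + \frac{p}{np}\right)^{\frac{np}{p}}\right]$$
$$\le \frac{1}{np}\left(\ln e + 2\ln e + \cdots + p\ln e\right) = \frac{1}{np}(1 + 2 + \cdots + p) = \frac{p + 1}{2n} \to 0, \quad \text{as } n \to \infty,$$

where the last inequality follows from Example 2.4.1. Therefore,

$$\lim_{n \to \infty} a_n = \lim_{n \to \infty} e^{\ln a_n} = e^{\lim_{n \to \infty}\ln a_n} = e^0 = 1. \quad \square$$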

3.2 Infinite Series

An infinite series is the sum of an infinite sequence of numbers:

$$a_1 + a_2 + \cdots + a_n + \cdots$$

How can we find this sum of infinitely many numbers in a finite lifetime? For this, we look at the sequence of sums of finitely many terms, called the sequence of partial sums:

$$S_1 = a_1, \quad S_2 = a_1 + a_2, \quad S_3 = a_1 + a_2 + a_3, \quad \ldots, \quad S_n = \sum_{k=1}^n a_k, \quad \ldots.$$

Definition 3.2.1 If the sequence $\{S_n\}$ converges to a number $L$, we say that the series converges to the sum $L$, written as

$$a_1 + a_2 + \cdots + a_n + \cdots = \sum_{n=1}^\infty a_n = L.$$

If the sequence of partial sums does not converge, we say that the series diverges.

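Example 3.2.1 [Geometric Series] A geometric series is of the form

$$a + ar + ar^2 + \cdots + ar^{n-1} + \cdots = \sum_{n=1}^\infty ar^{n-1},$$

in which $a \neq 0$ and $r$ are fixed real numbers. As is well known, the $n$-th partial sum is

$$S_n = \frac{a(1 - r^n)}{1 - r}, \quad r \neq 1.$$

Note that if $r = 1$, then $S_n = na$, and if $r = -1$, then $S_n$ alternates between $a$ and $0$. Thus the series diverges if $|r| = 1$. Since

$$|r^n| \to \begin{cases} 0 & \text{if } |r| < 1, \\ \infty & \text{if } |r| > 1 \end{cases} \quad \text{as } n \to \infty,$$

we have

$$\sum_{n=1}^\infty ar^{n-1} = \begin{cases} \frac{a}{1 - r} & \text{if } |r| < 1, \\ \text{diverges} & \text{if } |r| \ge 1. \end{cases} \quad \square$$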
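The behaviour of geometric partial sums in the three regimes can be observed directly. This is a minimal sketch with illustrative function names and parameter values.

```python
def partial_sum(a, r, n):
    # S_n = a + ar + ... + ar^(n-1), accumulated term by term.
    total, term = 0.0, a
    for _ in range(n):
        total += term
        term *= r
    return total

def closed_form(a, r, n):
    # S_n = a(1 - r^n)/(1 - r), valid for r != 1.
    return a * (1 - r**n) / (1 - r)

# |r| < 1: partial sums approach a/(1 - r)
for n in (5, 20, 60):
    print(n, partial_sum(3.0, 0.5, n), closed_form(3.0, 0.5, n))

# r = -1: partial sums alternate between a and 0, so the series diverges
print([partial_sum(1.0, -1.0, n) for n in range(1, 7)])
```

With $a = 3$ and $r = 0.5$ the limit is $a/(1 - r) = 6$.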

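Example 3.2.2 [Telescoping Series] (1) For a series of the form

$$\sum_{n=1}^\infty a_n = \sum_{n=1}^\infty \frac{1}{n(n + 1)},$$

$a_n = \frac{1}{n(n + 1)} = \frac{1}{n} - \frac{1}{n + 1}$, and so

$$S_n = \left(\frac{1}{1} - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{n + 1}\right) = 1 - \frac{1}{n + 1} \to 1, \quad \text{as } n \to \infty. \quad \square$$

Consider a convergent series $\sum_{n=1}^\infty a_n = S$. Then

$$a_n = S_n - S_{n-1} = (S - S_{n-1}) - (S - S_n) \to 0$$

as $n \to \infty$. This proves that "if $\sum_{n=1}^\infty a_n$ converges, then $a_n \to 0$."

Theorem 3.2.1 ($n$-th term test for divergence) If $\lim_{n \to \infty} a_n \neq 0$ or fails to exist, then $\sum_{n=1}^\infty a_n$ diverges.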

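Example 3.2.3 (1) $\sum_{n=1}^\infty \frac{n + 1}{n}$ diverges, since $\frac{n + 1}{n} \to 1 \neq 0$.

(2) $\sum_{n=1}^\infty (-1)^{n+1}$ diverges, since $\lim_{n \to \infty}(-1)^{n+1}$ does not exist.

(3) The harmonic series

$$\sum_{n=1}^\infty \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots + \frac{1}{n} + \cdots$$

diverges even though $a_n \to 0$. In fact,

$$1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \left(\frac{1}{9} + \cdots + \frac{1}{16}\right) + \cdots$$
$$> 1 + \frac{1}{2} + \left(\frac{1}{2}\right) + \left(\frac{1}{2}\right) + \left(\frac{1}{2}\right) + \cdots \to \infty. \quad \square$$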
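The grouping argument says that the partial sum up to $2^m$ is at least $1 + \frac{m}{2}$. The sketch below (illustrative names; the range of $m$ is arbitrary) tabulates this lower bound next to the actual partial sums.

```python
def harmonic(n):
    # n-th partial sum of the harmonic series: 1 + 1/2 + ... + 1/n
    return sum(1.0 / k for k in range(1, n + 1))

for m in range(0, 15):
    n = 2**m
    # Grouping bound from the text: H_{2^m} >= 1 + m/2
    print(m, n, harmonic(n), 1.0 + m / 2.0)
```

The partial sums grow without bound, but only logarithmically: doubling the number of terms adds at least $\frac{1}{2}$.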

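Theorem 3.2.2 Suppose $\sum_{n=1}^{\infty} a_n = A$ and $\sum_{n=1}^{\infty} b_n = B$ are convergent series. Then

(1) $\sum_{n=1}^{\infty} (a_n \pm b_n) = A \pm B$.

(2) $\sum_{n=1}^{\infty} k a_n = k \sum_{n=1}^{\infty} a_n = kA$ for any constant $k$.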

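If $\sum_{n=1}^{\infty} a_n$ diverges and $\sum_{n=1}^{\infty} b_n$ converges, then $\sum_{n=1}^{\infty} (a_n \pm b_n)$ diverges, and $\sum_{n=1}^{\infty} k a_n$ diverges for every $k \neq 0$. The sum of two divergent series, however, can be convergent: for example, if $a_n = 1$ and $b_n = -1$ for all $n$, then $a_n + b_n = 0$ for all $n$.

Adding or deleting a finite number of terms does not alter the convergence or the divergence of a series.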


3.3 Tests for Convergence of Series

We first assume that the terms $a_n$ of a series $\sum_{n=1}^{\infty} a_n$ are all positive. Then $S_{n+1} = S_n + a_{n+1}$ means that the partial sums form a nondecreasing sequence. Thus, to establish the convergence of such a series, all we need to show is that the partial sums are bounded from above.

Corollary 3.3.1 A series $\sum_{n=1}^{\infty} a_n$ of nonnegative terms converges if and only if its partial sums are bounded from above.

3.3.1 The integral test

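Example 3.3.1 For the series

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots + \frac{1}{n^2} + \cdots, \]

consider the integral $\int_1^{\infty} f(x)\,dx = \int_1^{\infty} \frac{1}{x^2}\,dx$. The series may be regarded as the total area of rectangular columns with base 1 and height $\frac{1}{n^2}$, while the integral is the area under the graph of $f$ from 1 to $\infty$, as the following figure shows:

[Figure: graph of $y = f(x) = \frac{1}{x^2}$ with rectangles of base 1 and heights $\frac{1}{1^2}, \frac{1}{2^2}, \ldots, \frac{1}{n^2}$ on the intervals ending at $x = 1, 2, \ldots, n$.]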

Then, for any $n$,

\[ S_n = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots + \frac{1}{n^2} \leq 1 + \int_1^n \frac{1}{x^2}\,dx = 1 + \left[-\frac{1}{x}\right]_1^n = 2 - \frac{1}{n} < 2. \]

Thus the series converges. □

Theorem 3.3.2 (The integral test) Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms. Suppose that $a_n = f(n)$ for some continuous, positive, decreasing function $f$ of $x \geq N$. Then the series $\sum_{n=1}^{\infty} a_n$ and the integral $\int_N^{\infty} f(x)\,dx$ both converge or both diverge.

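Proof: Take $N = 1$ for simplicity. Since $f$ is decreasing and $f(n) = a_n$, from the figure below,

[Figure: graph of $y = f(x)$ with two sets of rectangles of base 1 and heights $a_1, a_2, \ldots, a_n$, one set lying above the curve on $[1, n+1]$ and the other lying below the curve on $[1, n]$.]

we obtain the following relations:

\[ \int_1^{n+1} f(x)\,dx \leq a_1 + a_2 + a_3 + \cdots + a_n \leq a_1 + \int_1^n f(x)\,dx. \]

These inequalities hold for all $n$. Hence the series and the integral are either both finite or both infinite. □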
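The two-sided bound in the proof can be observed numerically for $f(x) = 1/x^2$, whose integral is known exactly. This is only a sketch for that one function; the helper names are illustrative.

```python
def partial_sum(n):
    """S_n = sum of 1/k^2 for k = 1..n."""
    return sum(1.0 / k**2 for k in range(1, n + 1))


def integral_of_f(a, b):
    """Exact integral of 1/x^2 from a to b, using the antiderivative -1/x."""
    return 1.0 / a - 1.0 / b


for n in (1, 10, 100, 1000):
    lower = integral_of_f(1, n + 1)      # integral from 1 to n+1
    upper = 1.0 + integral_of_f(1, n)    # a_1 + integral from 1 to n
    print(f"n={n:5d}  {lower:.6f} <= {partial_sum(n):.6f} <= {upper:.6f}")
```

Both bounds stay below 2 for every $n$, which is the boundedness needed in Example 3.3.1.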

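Example 3.3.2 [The p-Series test] For a real constant $p$, the series

\[ \sum_{n=1}^{\infty} \frac{1}{n^p} = 1 + \frac{1}{2^p} + \frac{1}{3^p} + \frac{1}{4^p} + \cdots + \frac{1}{n^p} + \cdots \]

is called a p-series. Let $f(x) = \frac{1}{x^p}$. If $p > 1$, then $f(x) > 0$ is a decreasing function of $x$, and

\[ \int_1^{\infty} \frac{1}{x^p}\,dx = \int_1^{\infty} x^{-p}\,dx = \lim_{b\to\infty} \left[\frac{x^{-p+1}}{-p+1}\right]_1^b = \frac{1}{1-p} \lim_{b\to\infty} \left(\frac{1}{b^{p-1}} - 1\right) = \frac{1}{1-p}(0 - 1) = \frac{1}{p-1}. \]

Thus the series converges.

If $0 < p < 1$, then $1 - p > 0$ and

\[ \int_1^{\infty} \frac{1}{x^p}\,dx = \frac{1}{1-p} \lim_{b\to\infty} \left(b^{1-p} - 1\right) = \infty. \]

Thus the series diverges by the integral test. If $p \leq 0$, the terms $\frac{1}{n^p}$ do not tend to 0, so the series diverges by the n-th term test.

If $p = 1$, we have the divergent harmonic series. Hence, the p-series converges for $p > 1$, and diverges for every other value of $p$. □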

3.3.2 The comparison test

Theorem 3.3.3 (The Comparison Test) Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms.

(1) $\sum_{n=1}^{\infty} a_n$ converges if there is a convergent series $\sum_{n=1}^{\infty} c_n$ with $a_n \leq c_n$ for $n > N$.

(2) $\sum_{n=1}^{\infty} a_n$ diverges if there is a divergent series $\sum_{n=1}^{\infty} d_n$ of nonnegative terms with $a_n \geq d_n$ for $n > N$.

Proof: (1) The partial sums of $\sum_{n=1}^{\infty} a_n$ are bounded from above by

\[ M = a_1 + \cdots + a_N + \sum_{n=N+1}^{\infty} c_n, \]

so that they form a nondecreasing sequence with a limit $L \leq M$.

(2) The partial sums of $\sum_{n=1}^{\infty} a_n$ cannot be bounded from above; otherwise the partial sums of the divergent series $\sum_{n=1}^{\infty} d_n$ would be bounded from above, and that series would converge by Corollary 3.3.1. □

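Example 3.3.3 (1) $\sum_{n=1}^{\infty} \frac{5}{5n-1}$ diverges, since $\frac{5}{5n-1} = \frac{1}{n - \frac15} > \frac{1}{n}$.

(2) $\sum_{n=0}^{\infty} \frac{1}{n!}$ converges, since $\frac{1}{n!} \leq \frac{1}{2^n}$ for $n \geq 4$. □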

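Theorem 3.3.4 (Limit Comparison Test) Suppose that $a_n > 0$ and $b_n > 0$ for all $n > N$.

(1) If $\lim_{n\to\infty} \frac{a_n}{b_n} = c > 0$, then $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ both converge or both diverge.

(2) If $\lim_{n\to\infty} \frac{a_n}{b_n} = 0$ and $\sum_{n=1}^{\infty} b_n$ converges, then $\sum_{n=1}^{\infty} a_n$ converges.

(3) If $\lim_{n\to\infty} \frac{a_n}{b_n} = \infty$ and $\sum_{n=1}^{\infty} b_n$ diverges, then $\sum_{n=1}^{\infty} a_n$ diverges.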

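Proof: (1) For $\frac{c}{2} > 0$, there exists $N$ such that for all $n > N$,

\[ \left| \frac{a_n}{b_n} - c \right| < \frac{c}{2}. \]

Thus

\[ \frac{c}{2} < \frac{a_n}{b_n} < \frac{3c}{2}, \qquad \frac{c}{2}\, b_n < a_n < \frac{3c}{2}\, b_n, \]

and so the claim follows from Theorem 3.3.3.

(2) For $\varepsilon > 0$, there exists $N$ such that for all $n > N$, $\frac{a_n}{b_n} < \varepsilon$. Thus

\[ a_n < \varepsilon b_n, \]

and so the claim follows from Theorem 3.3.3(1).

(3) For $M > 0$, there exists $N$ such that for all $n > N$, $\frac{a_n}{b_n} > M$. Thus

\[ M b_n < a_n, \]

and so the claim follows from Theorem 3.3.3(2). □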

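Example 3.3.4 (1) For $a_n = \frac{2n+1}{n^2+2n+1}$, $\sum_{n=1}^{\infty} a_n$ diverges: compare it with $b_n = \frac{1}{n}$. Then

\[ \lim_{n\to\infty} \frac{a_n}{b_n} = \lim_{n\to\infty} \frac{2n^2 + n}{n^2 + 2n + 1} = 2. \]

(2) For $a_n = \frac{1 + n \ln n}{n^2+5}$, $\sum_{n=1}^{\infty} a_n$ diverges: since $a_n \approx \frac{n \ln n}{n^2} = \frac{\ln n}{n} > \frac{1}{n}$ for large $n$, we can take $b_n = \frac{1}{n}$. Then

\[ \lim_{n\to\infty} \frac{a_n}{b_n} = \lim_{n\to\infty} \frac{n + n^2 \ln n}{n^2 + 5} = \infty. \]

(3) If $a_n = \frac{\ln n}{n^{3/2}}$, then $\frac{\ln n}{n^{3/2}} < \frac{n^{1/4}}{n^{3/2}} = \frac{1}{n^{5/4}}$ for large $n$. Thus take $b_n = \frac{1}{n^{5/4}}$, and then

\[ \lim_{n\to\infty} \frac{a_n}{b_n} = \lim_{n\to\infty} \frac{\ln n}{n^{1/4}} = \lim_{n\to\infty} \frac{\frac{1}{n}}{\frac14 n^{-3/4}} = \lim_{n\to\infty} \frac{4}{n^{1/4}} = 0 \]

shows that $\sum_{n=1}^{\infty} a_n$ converges. □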


3.3.3 The ratio and root tests

Theorem 3.3.5 (The Ratio Test) Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms. Suppose that

\[ \lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \rho. \]

Then

(1) the series converges if $\rho < 1$,

(2) the series diverges if $\rho > 1$,

(3) the test fails if $\rho = 1$.

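Proof: (1) Choose an $r$ such that $\rho < r < 1$, so that $\varepsilon = r - \rho > 0$. Since $\frac{a_{n+1}}{a_n} \to \rho$, there exists $N$ such that for all $n \geq N$,

\[ \frac{a_{n+1}}{a_n} < \rho + \varepsilon = r. \]

Thus

\[ a_{N+1} < r a_N, \quad a_{N+2} < r a_{N+1} < r^2 a_N, \quad \ldots, \quad a_{N+m} < r a_{N+m-1} < r^m a_N. \]

Therefore,

\[ \sum_{n=1}^{\infty} a_n = \sum_{n=1}^{N-1} a_n + \sum_{n=N}^{\infty} a_n < \sum_{n=1}^{N-1} a_n + a_N \sum_{n=0}^{\infty} r^n = \sum_{n=1}^{N-1} a_n + a_N \frac{1}{1-r}, \]

so the series converges by the comparison test.

(2) Since $\frac{a_{n+1}}{a_n} \to \rho > 1$, there exists $N$ such that for all $n \geq N$,

\[ \frac{a_{n+1}}{a_n} > 1, \quad \text{or} \quad a_N < a_{N+1} < a_{N+2} < \cdots, \]

so $a_n \not\to 0$ and the series diverges by the n-th term test.

(3) Consider $a_n = \frac{1}{n}$ and $b_n = \frac{1}{n^2}$. In both cases the ratio tends to 1: $\frac{a_{n+1}}{a_n} = \frac{n}{n+1} \to 1$ and $\frac{b_{n+1}}{b_n} = \left(\frac{n}{n+1}\right)^2 \to 1$. But $\sum_{n=1}^{\infty} a_n$ diverges, while $\sum_{n=1}^{\infty} b_n$ converges. □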


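Example 3.3.5 (1) $\sum_{n=1}^{\infty} \frac{2^n+5}{3^n}$ converges, since

\[ \frac{a_{n+1}}{a_n} = \frac{1}{3} \left( \frac{2 + \frac{5}{2^n}}{1 + \frac{5}{2^n}} \right) \to \frac{2}{3} < 1. \]

(2) $\sum_{n=1}^{\infty} \frac{(2n)!}{n!\,n!}$ diverges, since

\[ \frac{a_{n+1}}{a_n} = \frac{n!\,n!\,(2n+2)(2n+1)(2n)!}{(n+1)!\,(n+1)!\,(2n)!} = \frac{(2n+2)(2n+1)}{(n+1)(n+1)} = \frac{4n+2}{n+1} \to 4 > 1. \]

(3) For $\sum_{n=1}^{\infty} \frac{4^n\, n!\, n!}{(2n)!}$,

\[ \frac{a_{n+1}}{a_n} = \frac{4(n+1)^2}{(2n+2)(2n+1)} = \frac{2(n+1)}{2n+1} \to 1. \]

Thus the test fails. However, $\frac{a_{n+1}}{a_n} = \frac{2n+2}{2n+1} > 1$ means $a_1 = 2 \leq a_n$, so $a_n \not\to 0$ and the series diverges. □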
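The three ratios in Example 3.3.5 can be tabulated numerically to see the limits $\frac23$, $4$, and $1$ emerge. This is a minimal sketch; the dictionary keys and the helper `ratio` are illustrative names, not part of the text.

```python
from math import factorial


def ratio(a, n):
    """Return a(n+1)/a(n) for a sequence given as a function a(n)."""
    return a(n + 1) / a(n)


# The three sequences of Example 3.3.5.
examples = {
    "(2^n + 5)/3^n": lambda n: (2**n + 5) / 3**n,
    "(2n)!/(n!)^2": lambda n: factorial(2 * n) / factorial(n) ** 2,
    "4^n (n!)^2/(2n)!": lambda n: 4**n * factorial(n) ** 2 / factorial(2 * n),
}

for name, a in examples.items():
    print(name, [round(ratio(a, n), 4) for n in (5, 20, 80)])
```

In the third case the ratios approach 1 from above, which is the numerical counterpart of the observation that the terms never decrease.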

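Example 3.3.6 A series $\sum_{n=1}^{\infty} a_n$ with

\[ a_n = \begin{cases} \dfrac{n}{2^n} & \text{for } n \text{ odd}, \\[2mm] \dfrac{1}{2^n} & \text{for } n \text{ even} \end{cases} \]

is not a geometric series. Since $\lim a_n = 0$, the n-th term test for divergence is inconclusive. The integral test does not look promising, nor does the ratio test, since

\[ \frac{a_{n+1}}{a_n} = \begin{cases} \dfrac{1}{2n} & \text{for } n \text{ odd}, \\[2mm] \dfrac{n+1}{2} & \text{for } n \text{ even}, \end{cases} \]

which oscillates and has no limit. □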

Theorem 3.3.6 (The Root Test) Let $\sum_{n=1}^{\infty} a_n$ be a series of positive terms. Suppose that

\[ \lim_{n\to\infty} \sqrt[n]{a_n} = \rho. \]

Then

(1) the series converges if $\rho < 1$,

(2) the series diverges if $\rho > 1$,

(3) the test fails if $\rho = 1$.

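Proof: (1) Choose an $r$ such that $\rho < r < 1$, so that $\varepsilon = r - \rho > 0$. Since $\sqrt[n]{a_n} \to \rho$, there exists $N$ such that for all $n \geq N$,

\[ \sqrt[n]{a_n} < \rho + \varepsilon = r, \quad \text{or} \quad a_n < r^n. \]

Then,

\[ \sum_{n=1}^{\infty} a_n = \sum_{n=1}^{N-1} a_n + \sum_{n=N}^{\infty} a_n < \sum_{n=1}^{N-1} a_n + r^N \sum_{n=0}^{\infty} r^n = \sum_{n=1}^{N-1} a_n + \frac{r^N}{1-r}. \]

Thus, by the comparison test, the series converges.

(2) Since $\sqrt[n]{a_n} \to \rho > 1$, there exists $N$ such that for all $n \geq N$,

\[ \sqrt[n]{a_n} > 1, \quad \text{or} \quad a_n > 1, \]

so $a_n \not\to 0$ and the series diverges.

(3) Consider $a_n = \frac{1}{n}$ and $b_n = \frac{1}{n^2}$. Then $\sqrt[n]{a_n} = \frac{1}{\sqrt[n]{n}} \to 1$ and $\sqrt[n]{b_n} = \frac{1}{(\sqrt[n]{n})^2} \to 1$. But $\sum_{n=1}^{\infty} a_n$ diverges, while $\sum_{n=1}^{\infty} b_n$ converges. □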

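Example 3.3.7 (1) For the series of Example 3.3.6,

\[ \sqrt[n]{a_n} = \begin{cases} \dfrac{\sqrt[n]{n}}{2} & \text{for } n \text{ odd}, \\[2mm] \dfrac{1}{2} & \text{for } n \text{ even} \end{cases} \ \to \frac12 < 1 \]

shows that the series converges.

(2) $\sum_{n=0}^{\infty} \left(\frac{1}{1+n}\right)^n$ converges, since $\sqrt[n]{\left(\frac{1}{1+n}\right)^n} = \frac{1}{1+n} \to 0 < 1$.

(3) $\sum_{n=1}^{\infty} \frac{2^n}{n^2}$ diverges, since $\sqrt[n]{\frac{2^n}{n^2}} = \frac{2}{(\sqrt[n]{n})^2} \to 2 > 1$. □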
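For the series of Example 3.3.6 the contrast between the two tests is easy to see numerically: the ratios jump between very small and very large values, while the n-th roots settle near $\frac12$. A small sketch (the function names are illustrative):

```python
def a(n):
    """Terms of Example 3.3.6: n/2^n for odd n, 1/2^n for even n."""
    return n / 2**n if n % 2 == 1 else 1 / 2**n


def nth_root(n):
    """The quantity a_n^(1/n) used by the root test."""
    return a(n) ** (1.0 / n)


def ratio(n):
    """The quantity a_(n+1)/a_n used by the ratio test."""
    return a(n + 1) / a(n)


for n in (10, 11, 100, 101, 400, 401):
    print(f"n={n:3d}  ratio={ratio(n):10.5f}  nth root={nth_root(n):.5f}")
```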

3.3.4 The alternating series test

A series in which the terms are alternately positive and negative is called an alternating series.

Theorem 3.3.7 (The Alternating Series Test) An alternating series

\[ \sum_{n=1}^{\infty} (-1)^{n+1} a_n = a_1 - a_2 + a_3 - \cdots \]

converges if

(1) $a_n > 0$ for all $n$,

(2) $a_n \geq a_{n+1}$ for all $n \geq N$,

(3) $a_n \to 0$.


Proof: Take $N = 1$ for simplicity. If $n = 2k$ is an even number, then

\[ \begin{aligned} S_{2k} &= (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2k-1} - a_{2k}) \\ &= a_1 - (a_2 - a_3) - (a_4 - a_5) - \cdots - (a_{2k-2} - a_{2k-1}) - a_{2k}. \end{aligned} \]

Since $a_j - a_{j+1} \geq 0$ for every $j$, the first equality shows $S_{2k+2} \geq S_{2k}$, which means $S_{2k}$ is nondecreasing. The second equality shows $S_{2k} \leq a_1$, which means $S_{2k}$ is bounded from above. Thus it has a limit: $\lim_{k\to\infty} S_{2k} = L$.

If $n = 2k + 1$ is an odd number, then

\[ S_{2k+1} = S_{2k} + a_{2k+1} \to L + 0 = L \]

as $k \to \infty$, since $\lim_{k\to\infty} a_{2k+1} = 0$. Therefore, $\lim_{n\to\infty} S_n = L$. □

Example 3.3.8 The alternating harmonic series

\[ \sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n} = 1 - \frac12 + \frac13 - \frac14 + \cdots \]

converges. □

Theorem 3.3.8 (The Alternating Series Estimation) If an alternating series

\[ \sum_{n=1}^{\infty} (-1)^{n+1} a_n = a_1 - a_2 + a_3 - \cdots \]

satisfies the three conditions in Theorem 3.3.7, then for $n \geq N$

\[ |L - S_n| < a_{n+1}, \]

where $L$ is the sum of the series.

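Example 3.3.9 The series $\sum_{n=0}^{\infty} (-1)^n \frac{1}{2^n}$ converges to $\frac{1}{1 - (-1/2)} = \frac23$. If we take the sum up to the 8-th term, the error is $\sum_{n=8}^{\infty} (-1)^n \frac{1}{2^n} = \frac{1}{256} - \frac{1}{512} + \cdots$. Thus

\[ |L - S_7| < \frac{1}{256}. \qquad \square \]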
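The estimate of Example 3.3.9 can be confirmed directly, since the exact sum $\frac23$ is known. A minimal sketch, with `partial_sum` as an illustrative helper (here $S_7$ means the sum of the terms with $n = 0, \ldots, 7$, as in the example):

```python
def partial_sum(terms, m):
    """Sum of (-1)^n * terms(n) for n = 0..m."""
    return sum((-1) ** n * terms(n) for n in range(m + 1))


L = 2.0 / 3.0                                  # exact sum of the series
S7 = partial_sum(lambda n: 1.0 / 2**n, 7)      # first eight terms
print("S_7 =", S7)
print("actual error =", abs(L - S7), " bound 1/256 =", 1 / 256)
```

The actual error is smaller than the first omitted term, as the theorem predicts.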

Definition 3.3.1 (1) A series $\sum_{n=1}^{\infty} a_n$ converges absolutely if $\sum_{n=1}^{\infty} |a_n|$ converges.

(2) A series that converges but does not converge absolutely is said to converge conditionally.


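Example 3.3.10 (1) The series $\sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n^2}$ converges absolutely, since $\sum_{n=1}^{\infty} \left| (-1)^{n+1} \frac{1}{n^2} \right| = \sum_{n=1}^{\infty} \frac{1}{n^2}$ converges.

(2) The alternating harmonic series $\sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n}$ converges conditionally. □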

Theorem 3.3.9 (The Absolute Convergence Test) If $\sum_{n=1}^{\infty} |a_n|$ converges, then $\sum_{n=1}^{\infty} a_n$ converges.

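Proof: Note that for any $n$,

\[ -|a_n| \leq a_n \leq |a_n|, \quad \text{or} \quad 0 \leq a_n + |a_n| \leq 2|a_n|. \]

Thus if $\sum |a_n|$ converges, so does $\sum (a_n + |a_n|)$ by the comparison test. Since $a_n = (a_n + |a_n|) - |a_n|$, the series $\sum_{n=1}^{\infty} a_n$ is the difference of two convergent series:

\[ \sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty} (a_n + |a_n|) - \sum_{n=1}^{\infty} |a_n|. \]

Thus the series converges. □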

Example 3.3.11 (1) The series $\sum_{n=1}^{\infty} \frac{\sin n}{n^2}$ converges absolutely, since

\[ \sum_{n=1}^{\infty} \left| \frac{\sin n}{n^2} \right| \leq \sum_{n=1}^{\infty} \frac{1}{n^2}. \]

Thus it also converges.

(2) For $p > 0$, the alternating p-series $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n^p}$ converges: for $p > 1$ the series converges absolutely, and for $0 < p \leq 1$ the series converges conditionally by the alternating series test. □

Note that the convergence of a series $\sum_{n=1}^{\infty} a_n$ does not, in general, imply the convergence of a series obtained by rearranging its terms.

Theorem 3.3.10 (Rearranged Series) If $\sum_{n=1}^{\infty} a_n$ converges absolutely, then any rearrangement of its terms gives rise to an absolutely convergent series.


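Example 3.3.12 The alternating harmonic series $\sum_{n=1}^{\infty} (-1)^{n+1} \frac{1}{n}$ converges conditionally. This series can be rearranged to diverge or to reach any preassigned sum. Indeed, its positive and negative parts both diverge:

\[ \sum (-1)^{n+1} \frac{1}{n} = \sum \frac{1}{2n-1} - \sum \frac{1}{2n} \to \infty - \infty. \]

For example, taking positive terms until the partial sum reaches 1 and then one negative term, repeatedly, gives the rearrangement

\[ \frac11 - \frac12 + \frac13 + \frac15 - \frac14 + \frac17 + \frac19 - \frac16 + \frac1{11} + \frac1{13} - \frac18 + \frac1{15} + \frac1{17} - \frac1{10} + \frac1{19} + \frac1{21} - \frac1{12} + \frac1{23} + \frac1{25} - \frac1{14} + \frac1{27} - \frac1{16} + \cdots, \]

which converges to 1. □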

3.4 Power Series

In the previous sections, we have studied the convergence of series whose terms are fixed numbers. Now the terms of a series will be functions of a variable, say $x$, varying in some interval.

Definition 3.4.1 A power series about $x = 0$ is a series of the form

\[ \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + \cdots. \]

A power series about $x = a$ is a series of the form

\[ \sum_{n=0}^{\infty} a_n (x-a)^n = a_0 + a_1 (x-a) + a_2 (x-a)^2 + \cdots + a_n (x-a)^n + \cdots. \]

We are concerned with the values of $x$ for which the series converges.

Example 3.4.1 [Geometric Series] The series

\[ \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + \cdots + x^n + \cdots \]

is called a geometric series. It converges to $\frac{1}{1-x}$ for those $x$ with $|x| < 1$. The partial sums of the series are polynomials that approximate the limit.


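The power series

\[ 1 - \frac12 (x-2) + \frac14 (x-2)^2 + \cdots + \left(-\frac12\right)^n (x-2)^n + \cdots \]

is also a geometric series, with ratio $-\frac{x-2}{2}$. It converges for $x$ with $\left| -\frac{x-2}{2} \right| < 1$, that is, $0 < x < 4$, to $\frac{1}{1 + \frac{x-2}{2}} = \frac{2}{x}$. Thus,

\[ \frac{2}{x} = 1 - \frac12 (x-2) + \frac14 (x-2)^2 + \cdots + \left(-\frac12\right)^n (x-2)^n + \cdots, \qquad 0 < x < 4. \]

The sum can be approximated by the following polynomials for values of $x$ near 2:

\[ \begin{aligned} P_0(x) &= 1, \\ P_1(x) &= 1 - \frac12(x-2) = 2 - \frac{x}{2}, \\ P_2(x) &= 1 - \frac12 (x-2) + \frac14 (x-2)^2 = 3 - \frac{3x}{2} + \frac{x^2}{4}. \end{aligned} \qquad \square \]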
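To see how the partial sums of the series about $x = 2$ approximate $\frac{2}{x}$, one can evaluate them at a few points. The following sketch uses an illustrative helper `P(n, x)` for the partial sum $P_n(x)$.

```python
def P(n, x):
    """Partial sum P_n(x) = sum_{k=0}^{n} (-1/2)^k (x-2)^k of the series for 2/x."""
    return sum((-0.5 * (x - 2.0)) ** k for k in range(n + 1))


for x in (2.1, 3.0, 3.9):
    approximations = [round(P(n, x), 6) for n in (0, 1, 2, 10, 60)]
    print(f"x={x}  2/x={2 / x:.6f}  P_n(x) for n=0,1,2,10,60: {approximations}")
```

Near $x = 2$ a few terms already suffice, while close to the endpoints of $(0, 4)$ many more terms are needed.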

Example 3.4.2 Find all the values of $x$ for which the series converges:

(1) For $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^n}{n} = x - \frac{x^2}{2} + \cdots + (-1)^{n-1} \frac{x^n}{n} + \cdots$, since $\left|\frac{a_{n+1}}{a_n}\right| = \frac{n}{n+1} |x| \to |x|$, by the ratio test, the series converges absolutely for $|x| < 1$ and diverges for $|x| > 1$. For $x = 1$, the series is the alternating harmonic series, which converges. For $x = -1$, the series is the negative of the harmonic series, which diverges. Thus the interval in which the series converges is $(-1, 1]$.

(2) For $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^{2n-1}}{2n-1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots + (-1)^{n-1} \frac{x^{2n-1}}{2n-1} + \cdots$, since $\left|\frac{a_{n+1}}{a_n}\right| = \frac{2n-1}{2n+1} x^2 \to x^2$, by the ratio test, the series converges absolutely for $x^2 < 1$ and diverges for $x^2 > 1$. For $x = 1$, the series is a convergent alternating series. For $x = -1$, the series is the negative of the same alternating series, which again converges. Thus the interval in which the series converges is $[-1, 1]$.

(3) For $\sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots$, since $\left|\frac{a_{n+1}}{a_n}\right| = \frac{|x|}{n+1} \to 0$, by the ratio test, the series converges absolutely for all $x \in \mathbb{R}$. Thus the interval in which the series converges is $\mathbb{R}$.

(4) For $\sum_{n=0}^{\infty} n!\,x^n = 1 + x + 2!\,x^2 + \cdots + n!\,x^n + \cdots$, since $\left|\frac{a_{n+1}}{a_n}\right| = (n+1)|x| \to \infty$ for $x \neq 0$, by the ratio test, the series diverges for all $x \in \mathbb{R}$ except $x = 0$. □

Theorem 3.4.1 If a power series

\[ \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + \cdots \]

converges for $x = c \neq 0$, then it converges absolutely for all $x$ with $|x| < |c|$. If it diverges for $x = d$, then it diverges for all $x$ with $|x| > |d|$.

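Proof: Suppose that $\sum_{n=0}^{\infty} a_n c^n$ converges. Then $\lim_{n\to\infty} a_n c^n = 0$. Thus there exists $N$ such that, for all $n \geq N$, $|a_n c^n| < 1$, or $|a_n| < \frac{1}{|c|^n}$. Now for any $x$ with $|x| < |c|$,

\[ \begin{aligned} \sum_{n=0}^{\infty} |a_n x^n| &= \sum_{n=0}^{N-1} |a_n x^n| + \sum_{n=N}^{\infty} |a_n x^n| \\ &\leq \sum_{n=0}^{N-1} |a_n x^n| + \sum_{n=N}^{\infty} \left|\frac{x}{c}\right|^n \\ &= \sum_{n=0}^{N-1} |a_n x^n| + \left|\frac{x}{c}\right|^N \frac{1}{1 - \left|\frac{x}{c}\right|}. \end{aligned} \]

If the series diverges for $x = d$ and converges at some $c$ with $|c| > |d|$, then, by the first part, the series converges for all $x$ with $|x| < |c|$, in particular at $x = d$, which is a contradiction. □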

For a series of the form $\sum_{n=0}^{\infty} a_n (x-a)^n$, replace $x - a$ by $x'$ and apply Theorem 3.4.1 to $\sum_{n=0}^{\infty} a_n (x')^n$.

Corollary 3.4.2 A power series $\sum_{n=0}^{\infty} a_n (x-a)^n$ converges in one of the following three possible ways:

(1) There is a positive number $R$ such that the series diverges for $x$ with $|x - a| > R$ and converges absolutely for $x$ with $|x - a| < R$. At either of the end points $x = a - R$ and $x = a + R$, the series may or may not converge ($0 < R < \infty$).

(2) The series converges absolutely for every $x \in \mathbb{R}$ ($R = \infty$).

(3) The series converges at $x = a$ and diverges elsewhere ($R = 0$).

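Proof: Cases (2) and (3) are trivial. Assume $a = 0$, and suppose that the series is in neither case (2) nor case (3). Then there is a $d \neq 0$ such that $\sum_{n=0}^{\infty} a_n d^n$ diverges, and there is a $p \neq 0$ such that $\sum_{n=0}^{\infty} a_n p^n$ converges. Let

\[ S = \left\{ x \in \mathbb{R} \;\middle|\; \sum_{n=0}^{\infty} a_n x^n \text{ converges} \right\}. \]

Then $|x| \leq |d|$ for all $x \in S$ by Theorem 3.4.1, so that $S$ is bounded from above. Thus it has a least upper bound $R$, and $R \geq |p| > 0$. If $|x| > R$, then $\sum_{n=0}^{\infty} a_n x^n$ diverges: otherwise, by Theorem 3.4.1, the series would converge at some point $y$ with $R < y < |x|$, contradicting the fact that $R$ is an upper bound of $S$. If $|x| < R$, then $|x|$ is not an upper bound for $S$, so that there is $b \in S$ such that $|x| < b$. Since $\sum_{n=0}^{\infty} a_n b^n$ converges, $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely by Theorem 3.4.1.

If $a \neq 0$, set $x' = x - a$. □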

$R$ is called the radius of convergence of the series, and the interval $(a - R, a + R)$ is called the interval of convergence.

3.4.1 Term-by-term differentiation and integration

Theorem 3.4.3 If a power series $\sum_{n=0}^{\infty} a_n (x-a)^n$ converges on an interval of convergence $I = (a - R, a + R)$ for some $R > 0$, it defines a function $f(x) = \sum_{n=0}^{\infty} a_n (x-a)^n$ on $I$, which is infinitely differentiable on $I$, with

\[ f'(x) = \sum_{n=1}^{\infty} n a_n (x-a)^{n-1}, \qquad f''(x) = \sum_{n=2}^{\infty} n(n-1) a_n (x-a)^{n-2}, \]

and so on. Each of these derived series converges on $I$.

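Example 3.4.3 For $f(x) = \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + \cdots = \frac{1}{1-x}$ on $(-1, 1)$,

\[ \begin{aligned} f'(x) &= \sum_{n=1}^{\infty} n x^{n-1} = 1 + 2x + 3x^2 + 4x^3 + \cdots = \frac{1}{(1-x)^2}, \\ f''(x) &= \sum_{n=2}^{\infty} n(n-1) x^{n-2} = 2 + 6x + 12x^2 + \cdots = \frac{2}{(1-x)^3}. \end{aligned} \qquad \square \]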

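Remark: The series $f(x) = \sum_{n=1}^{\infty} \frac{\sin(n!\,x)}{n^2}$ converges on $\mathbb{R}$, but its term-by-term derivative

\[ \sum_{n=1}^{\infty} \frac{n! \cos(n!\,x)}{n^2} \]

diverges for all $x$. This does not contradict Theorem 3.4.3, because the series is not a power series: it is not a sum of nonnegative integer powers of $x$.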


Theorem 3.4.4 Suppose that a power series $f(x) = \sum_{n=0}^{\infty} a_n (x-a)^n$ converges on an interval of convergence $I = (a - R, a + R)$ for some $R > 0$. Then its term-by-term integral

\[ \int f(x)\,dx = \sum_{n=0}^{\infty} a_n \frac{(x-a)^{n+1}}{n+1} + C \]

converges on $I$.

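Example 3.4.4 For $f(x) = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots$ on $[-1, 1]$,

\[ f'(x) = \sum_{n=1}^{\infty} (-1)^{n-1} (x^2)^{n-1} = 1 - x^2 + x^4 - x^6 + \cdots = \frac{1}{1+x^2}, \qquad -1 < x < 1, \]

\[ f(x) = \int f'(x)\,dx = \int \frac{1}{1+x^2}\,dx = \tan^{-1} x + C. \]

Since $f(0) = 0$, $C = 0$. Thus

\[ f(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \cdots = \tan^{-1} x, \qquad -1 < x < 1. \]

The series, in fact, also converges to $\tan^{-1} x$ at $x = \pm 1$. □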

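Example 3.4.5 For $f(t) = \sum_{n=0}^{\infty} (-1)^n t^n = 1 - t + t^2 - t^3 + \cdots = \frac{1}{1+t}$, convergent on $(-1, 1)$,

\[ \ln(1+x) = \int_0^x \frac{1}{1+t}\,dt = \left[ t - \frac{t^2}{2} + \frac{t^3}{3} - \frac{t^4}{4} + \cdots \right]_0^x = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \qquad -1 < x < 1. \]

The series, in fact, also converges to $\ln 2$ at $x = 1$. □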
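The series for $\ln(1+x)$ obtained by term-by-term integration can be compared with the library logarithm. This is a small sketch; `log1p_series` is an illustrative name.

```python
import math


def log1p_series(x, n_terms):
    """Partial sum x - x^2/2 + x^3/3 - ... with n_terms terms."""
    return sum((-1) ** (k - 1) * x**k / k for k in range(1, n_terms + 1))


for x, n_terms in ((0.1, 10), (0.5, 60), (1.0, 1000)):
    approx = log1p_series(x, n_terms)
    print(f"x={x}  terms={n_terms}  series={approx:.10f}  ln(1+x)={math.log(1 + x):.10f}")
```

Inside the interval the convergence is fast, whereas at the endpoint $x = 1$ even 1000 terms give only about three correct decimals.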

3.4.2 Multiplication of power series

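Theorem 3.4.5 Suppose that $f(x) = \sum_{n=0}^{\infty} a_n x^n$ and $g(x) = \sum_{n=0}^{\infty} b_n x^n$ converge absolutely for $|x| < R$. Their product is defined as

\[ f(x) \cdot g(x) = \left( \sum_{n=0}^{\infty} a_n x^n \right) \cdot \left( \sum_{n=0}^{\infty} b_n x^n \right) = \sum_{n=0}^{\infty} c_n x^n, \]

where

\[ c_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0 = \sum_{k=0}^{n} a_k b_{n-k}. \]

Then the series $\sum_{n=0}^{\infty} c_n x^n$ converges to $h(x) = f(x) \cdot g(x)$ for $|x| < R$.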

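Example 3.4.6 For the series

\[ \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + \cdots + x^n + \cdots = \frac{1}{1-x} = f(x) = g(x), \qquad |x| < 1, \]

\[ f(x) \cdot g(x) = \frac{1}{(1-x)^2} = \sum_{n=0}^{\infty} c_n x^n = \sum_{n=0}^{\infty} (n+1) x^n, \qquad |x| < 1, \]

since $c_n = 1 + 1 + \cdots + 1 = n + 1$. Note that

\[ \frac{d}{dx} \left( \frac{1}{1-x} \right) = \frac{1}{(1-x)^2}. \qquad \square \]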
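The coefficient formula $c_n = \sum_{k=0}^{n} a_k b_{n-k}$ is straightforward to compute for truncated coefficient lists. The sketch below (with the illustrative name `cauchy_product`) reproduces $c_n = n + 1$ for the geometric series.

```python
def cauchy_product(a, b):
    """Coefficients c_n = sum_{k=0}^{n} a_k * b_{n-k}, for n below the shorter length."""
    n_max = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_max)]


ones = [1] * 8                      # first eight coefficients of 1/(1-x)
print(cauchy_product(ones, ones))   # coefficients of 1/(1-x)^2
```

Only the first `min(len(a), len(b))` coefficients are returned, because later ones would need coefficients that the truncated lists do not contain.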

3.5 Taylor and Maclaurin Series

In the previous section, we have seen that some power series can be expressed in terms of elementary functions on the interval of convergence by using term-by-term differentiation and integration. The sum of a power series is a continuous function with derivatives of all orders within its interval of convergence.

Just as the number of functions whose antiderivatives can be expressed explicitly in terms of elementary functions is very small compared to the number of integrable functions, the number of power series that can be expressed in terms of elementary functions is small compared to the number of power series that converge on some interval.

In this section, we will see when a function with derivatives of all orders on an interval $I$ can be expressed by a convergent power series on $I$, so that its value can be approximated from the series by summing finitely many terms.

Suppose that the function is the sum of a power series

\[ f(x) = \sum_{n=0}^{\infty} a_n (x-a)^n = a_0 + a_1 (x-a) + a_2 (x-a)^2 + \cdots + a_n (x-a)^n + \cdots \]

with positive radius of convergence $R > 0$. By repeated term-by-term differentiation on $I = (a - R, a + R)$:

\[ \begin{aligned}
f'(x) &= a_1 + 2a_2(x-a) + 3a_3(x-a)^2 + \cdots + n a_n (x-a)^{n-1} + \cdots, \\
f''(x) &= 1\cdot 2\, a_2 + 2 \cdot 3\, a_3 (x-a) + 3 \cdot 4\, a_4 (x-a)^2 + \cdots + (n-1) \cdot n\, a_n (x-a)^{n-2} + \cdots, \\
f'''(x) &= 1 \cdot 2 \cdot 3\, a_3 + 2\cdot3\cdot4\, a_4 (x-a) + 3\cdot4\cdot5\, a_5 (x-a)^2 + \cdots, \\
&\ \ \vdots \\
f^{(n)}(x) &= n!\, a_n + \text{a sum of terms with } (x-a) \text{ as a factor}.
\end{aligned} \]

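Thus

\[ \begin{aligned}
f(a) &= a_0, \\
f'(a) &= 1\, a_1, \\
f''(a) &= 2!\, a_2, \quad \text{or} \quad a_2 = \frac{f''(a)}{2!}, \\
f'''(a) &= 3!\, a_3, \quad \text{or} \quad a_3 = \frac{f'''(a)}{3!}, \\
&\ \ \vdots \\
f^{(n)}(a) &= n!\, a_n, \quad \text{or} \quad a_n = \frac{f^{(n)}(a)}{n!}, \\
&\ \ \vdots
\end{aligned} \]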

Thus, for a function with derivatives of all orders, if it has a power series representation $f(x) = \sum_{n=0}^{\infty} a_n (x-a)^n$ that converges to the value $f(x)$ for any $x$ in an interval $I$ centered at $a$, then there is only one such series, with $n$-th coefficient $a_n = \frac{f^{(n)}(a)}{n!}$, so that it must be

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \cdots. \]

In general, if $f(x)$ is a function with derivatives of all orders on some interval $I$ containing $x = a$, then one can form the power series

\[ P(x, a) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \cdots, \]

which is called the Taylor series of $f$ at $x = a$. When $a = 0$, the series is called the Maclaurin series of $f$:

\[ P(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots. \]

Now, the question is whether this Taylor series converges to $f(x)$ at each point $x$ in $I$. For some functions it does, but for others it does not.

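Example 3.5.1 Find the Taylor series of $f(x) = \frac{1}{x}$ at $a = 2$, and the interval of convergence.

Solution: Since

\[ \begin{aligned}
f(x) &= \frac1x, & f(2) &= \frac12, \\
f'(x) &= -\frac{1}{x^2}, & f'(2) &= -\frac{1}{2^2}, \\
f''(x) &= \frac{2!}{x^3}, & \frac{f''(2)}{2!} &= \frac{1}{2^3}, \\
&\ \ \vdots \\
f^{(n)}(x) &= (-1)^n \frac{n!}{x^{n+1}}, & \frac{f^{(n)}(2)}{n!} &= \frac{(-1)^n}{2^{n+1}},
\end{aligned} \]

the Taylor series is

\[ f(2) + f'(2)(x-2) + \frac{f''(2)}{2!}(x-2)^2 + \cdots + \frac{f^{(n)}(2)}{n!}(x-2)^n + \cdots = \frac12 - \frac{(x-2)}{2^2} + \frac{(x-2)^2}{2^3} - \cdots + (-1)^n \frac{(x-2)^n}{2^{n+1}} + \cdots. \]

Since this is a geometric series with first term $a_0 = \frac12$ and ratio $r = -\frac{x-2}{2}$, it converges absolutely to

\[ \frac{\frac12}{1 + \frac{x-2}{2}} = \frac{1}{2 + (x-2)} = \frac1x \]

for $|x - 2| < 2$. Thus the Taylor series of $f(x) = \frac1x$ at $a = 2$ converges to $f(x)$ for $|x - 2| < 2$, that is, $0 < x < 4$. □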


Example 3.5.2 Consider the following function:

\[ f(x) = \begin{cases} 0 & \text{if } x = 0, \\ e^{-1/x^2} & \text{if } x \neq 0. \end{cases} \]

One can show that this function has derivatives of all orders at $x = 0$, with $f^{(n)}(0) = 0$ for all $n$. This means that the Taylor series of $f$ at $x = 0$ is

\[ f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \cdots + \frac{f^{(n)}(0)}{n!}x^n + \cdots = 0 + 0 \cdot x + 0 \cdot x^2 + \cdots + 0\cdot x^n + \cdots = 0. \]

The series converges to $0 \neq f(x)$ for $x \neq 0$. □

Theorem 3.5.1 (Taylor's Theorem) Suppose that $f$ and its first $n$ derivatives $f', f'', \ldots, f^{(n)}$ are continuous on the closed interval between $a$ and $b$, and $f^{(n)}$ is differentiable on the open interval between $a$ and $b$. Then there is a number $c$ between $a$ and $b$ such that

\[ f(b) = f(a) + f'(a)(b-a) + \frac{f''(a)}{2!}(b-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(b-a)^n + \frac{f^{(n+1)}(c)}{(n+1)!}(b-a)^{n+1}. \]

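Proof: Assume $a < b$, and assume in addition that $f^{(n+1)}$ is continuous on $[a, b]$, which the argument below requires. We have

\[ f(b) - f(a) = \int_a^b f'(t)\,dt. \]

Set $u(t) = f'(t)$ and $v(t) = t - b$, so that $du = f''(t)\,dt$ and $dv = dt$. Then, by integration by parts,

\[ \begin{aligned}
f(b) - f(a) &= \int_a^b u\,dv = uv\Big|_a^b - \int_a^b v\,du \\
&= -\Big[(b-t) f'(t)\Big]_a^b + \int_a^b (b-t) f''(t)\,dt \\
&= f'(a)(b-a) + \int_a^b f''(t)(b-t)\,dt.
\end{aligned} \]

Next set $u = f''(t)$ and $dv = (b-t)\,dt$, so that $v = -\frac{(b-t)^2}{2!}$. Then

\[ \begin{aligned}
f(b) - f(a) &= f'(a)(b-a) - \left[\frac{(b-t)^2}{2!} f''(t)\right]_a^b + \int_a^b \frac{(b-t)^2}{2!} f'''(t)\,dt \\
&= f'(a)(b-a) + \frac{f''(a)}{2!}(b-a)^2 + \int_a^b f'''(t) \frac{(b-t)^2}{2!}\,dt.
\end{aligned} \]

By induction,

\[ f(b) - f(a) = f'(a)(b-a) + \frac{f''(a)}{2!}(b-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(b-a)^n + \int_a^b f^{(n+1)}(t) \frac{(b-t)^n}{n!}\,dt. \]

Since $f^{(n+1)}$ is continuous on $[a, b]$, it attains its maximum and minimum at some $u, v \in [a, b]$, so that $f^{(n+1)}(v) \leq f^{(n+1)}(t) \leq f^{(n+1)}(u)$ for all $t \in [a, b]$. Since $(b-t)^n \geq 0$ on $[a, b]$,

\[ f^{(n+1)}(v) \int_a^b \frac{(b-t)^n}{n!}\,dt \leq \int_a^b f^{(n+1)}(t) \frac{(b-t)^n}{n!}\,dt \leq f^{(n+1)}(u) \int_a^b \frac{(b-t)^n}{n!}\,dt. \]

However,

\[ \int_a^b \frac{(b-t)^n}{n!}\,dt = -\left[\frac{(b-t)^{n+1}}{(n+1)!}\right]_a^b = \frac{(b-a)^{n+1}}{(n+1)!}, \]

and so

\[ f^{(n+1)}(v) \leq \frac{\displaystyle\int_a^b f^{(n+1)}(t) \frac{(b-t)^n}{n!}\,dt}{\dfrac{(b-a)^{n+1}}{(n+1)!}} \leq f^{(n+1)}(u). \]

By the intermediate value theorem (Theorem 1.4.3), there is a number $c \in [a, b]$ such that

\[ \int_a^b f^{(n+1)}(t) \frac{(b-t)^n}{n!}\,dt = f^{(n+1)}(c) \frac{(b-a)^{n+1}}{(n+1)!} = R_n(b, a). \]

When $a > b$, almost the same proof holds. □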

For a fixed $a$, $b$ can now be treated as an independent variable, and replaced by $x$. Hence, if $f$ has derivatives of all orders in an interval $I$ containing $a$, then for each positive integer $n$ and $x \in I$,

\[ f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1} \]

for some $c$ between $a$ and $x$, where

\[ P_n(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n \]

is called the Taylor polynomial of order $n$ of $f$ at $x = a$, and

\[ f(x) - P_n(x) = R_n(x, a) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1} \]

is called the remainder of order $n$ or the error term.

Theorem 3.5.2 The Taylor series $P(x, a) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n$ of $f$ converges to $f(x)$ on $I$ if $R_n(x, a) \to 0$ as $n \to \infty$ for all $x \in I$. In this case we write

\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n. \]

Since we do not know what $c$ is in the remainder term, we cannot compute the precise value of the error term. However, we can estimate its possible maximum on $I$: if there is a positive number $M$ such that $|f^{(n+1)}(t)| \leq M$ for all $t$ between $a$ and $x$, then

\[ |R_n(x, a)| \leq M \frac{|x-a|^{n+1}}{(n+1)!}. \]

Thus, the Taylor polynomial $P_n(x)$ approximates $f(x)$ within this much error, and the Taylor series converges to $f(x)$ on $I$ if this bound tends to 0 as $n \to \infty$ for every $x \in I$.

Example 3.5.3 Find the Taylor formula of each of the following functions:

(1) $f(x) = e^x$ at $x = 0$.

Since $f^{(n)}(x) = e^x$ for all $n$, $f^{(n)}(0) = 1$ for all $n$. Thus

\[ f(x) = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + e^c \frac{x^{n+1}}{(n+1)!} = P_n(x, 0) + R_n(x, 0), \]

for some $c$ between 0 and $x$. Since $|R_n(x, 0)| \leq e^{|x|} \frac{|x|^{n+1}}{(n+1)!} \to 0$ as $n \to \infty$ for any fixed $x \in \mathbb{R}$, the Taylor series converges to $f(x)$:

\[ e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!} = \lim_{n\to\infty} P_n(x, 0). \]

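(2) $f(x) = \sin x$ at $x = 0$.

The derivatives are:

\[ \begin{aligned} f(x) &= \sin x, & f'(x) &= \cos x, \\ f''(x) &= -\sin x, & f'''(x) &= -\cos x, \\ &\ \ \vdots \\ f^{(2n)}(x) &= (-1)^n \sin x, & f^{(2n+1)}(x) &= (-1)^n \cos x. \end{aligned} \]

Thus $f^{(2n)}(0) = 0$ and $f^{(2n+1)}(0) = (-1)^n$, and

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{(-1)^n x^{2n+1}}{(2n+1)!} + \frac{(-1)^{n+1} x^{2n+3} \cos c}{(2n+3)!} = P_{2n+2}(x) + R_{2n+2}(x), \]

for some $c$ between 0 and $x$. Since $|R_{2n+2}(x)| \leq \frac{|x|^{2n+3}}{(2n+3)!} \to 0$ as $n \to \infty$ for any fixed $x \in \mathbb{R}$, the Taylor series converges to $f(x)$:

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{(-1)^n x^{2n+1}}{(2n+1)!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}. \]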

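(3) $f(x) = \cos x$ at $x = 0$.

The derivatives are:

\[ \begin{aligned} f(x) &= \cos x, & f'(x) &= -\sin x, \\ f''(x) &= -\cos x, & f'''(x) &= \sin x, \\ &\ \ \vdots \\ f^{(2n)}(x) &= (-1)^n \cos x, & f^{(2n+1)}(x) &= (-1)^{n+1} \sin x. \end{aligned} \]

Thus $f^{(2n)}(0) = (-1)^n$ and $f^{(2n+1)}(0) = 0$, and

\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + \frac{(-1)^n x^{2n}}{(2n)!} + \frac{(-1)^{n+1} x^{2n+1} \sin c}{(2n+1)!} = P_{2n}(x) + R_{2n}(x), \]

for some $c$ between 0 and $x$. Since $|R_{2n}(x)| \leq \frac{|x|^{2n+1}}{(2n+1)!} \to 0$ as $n \to \infty$ for any fixed $x \in \mathbb{R}$, the Taylor series converges to $f(x)$:

\[ \cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + \frac{(-1)^n x^{2n}}{(2n)!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}. \]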

(4) $f(x) = \cos 2x$ at $x = 0$.

Replacing $x$ by $2x$ in the series of $\cos x$,

\[ \cos 2x = 1 - \frac{(2x)^2}{2!} + \frac{(2x)^4}{4!} - \cdots + \frac{(-1)^n (2x)^{2n}}{(2n)!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n (2x)^{2n}}{(2n)!}, \]

which converges for all $x$ with $-\infty < 2x < \infty$. By the uniqueness of the Taylor series, this must be the Taylor series of $\cos 2x$.

(5) $f(x) = x \sin x$ at $x = 0$.

\[ x \sin x = x \left( x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + \frac{(-1)^n x^{2n+1}}{(2n+1)!} + \cdots \right) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+2}}{(2n+1)!}, \]

which converges for all $x$ with $-\infty < x < \infty$. By the uniqueness of the Taylor series, this must be the Taylor series of $x \sin x$. □

Indeed, the uniqueness of the Taylor series says that it does not matter how we obtain the expression of $f$ by a power series. As long as the function is expressed by a convergent power series, that series is the Taylor series by uniqueness.

Most of the elementary functions can be expressed by their Taylor series, and computing the series is a practical method of evaluating the functional value $f(x)$ for a given input number $x$.

However, the series has infinitely many terms and we cannot add infinitely many terms; all we can do is take the sum of a finite number of terms. Thus we use a Taylor polynomial to approximate the sum. Then the question is: how many terms do we need to take to make the error sufficiently small?

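Example 3.5.4 (1) Calculate $e$ with an error of less than $10^{-6}$.

For $x = 1$ in the Taylor formula of $e^x$, we get $e = 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{n!} + R_n(1)$, with

\[ R_n(1) = e^c \frac{1}{(n+1)!}, \qquad 0 < c < 1. \]

Knowing $e < 3$, we have $\frac{1}{(n+1)!} < R_n(1) < \frac{3}{(n+1)!}$. Since $\frac{3}{10!} < \frac{1}{10^6} < \frac{1}{9!}$, we choose $n + 1 = 10$, or $n = 9$. Thus

\[ e \approx 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{9!} \approx 2.718282. \]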

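(2) Calculate $\sin 1$ with an error of less than $3 \cdot 10^{-4}$.

The Taylor formula is

\[ \begin{aligned} \sin x &= 0 + x - 0\cdot x^2 - \frac{x^3}{3!} + 0 \cdot x^4 + \frac{x^5}{5!} + R_5(x) \\ &= 0 + x - 0 \cdot x^2 - \frac{x^3}{3!} + 0\cdot x^4 + \frac{x^5}{5!} - 0 \cdot x^6 + R_6(x). \end{aligned} \]

Since the even-order terms vanish, $P_{2n-1} = P_{2n}$, and we may use the smaller error term $R_{2n}(x) = (-1)^n \cos c\, \frac{x^{2n+1}}{(2n+1)!}$.

For $x = 1$, $R_{2n}(1) = (-1)^n \cos c\, \frac{1}{(2n+1)!}$ with $c \in (0, 1)$. Thus

\[ |R_{2n}(1)| \leq \frac{1}{(2n+1)!} < \frac{3}{10^4} \]

for $2n + 1 = 7$, since $\frac{1}{7!} < \frac{2}{10^4} < \frac{3}{10^4}$. Hence, we get

\[ \sin 1 \approx 1 - \frac{1}{3!} + \frac{1}{5!}, \]

with

\[ |R_6(1)| \leq \frac{1}{7!} < \frac{3}{10^4}. \qquad \square \]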
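Both estimates of Example 3.5.4 can be checked against the library values of $e$ and $\sin 1$. The sketch below is only a numerical confirmation; `e_approx` and `sin_approx` are illustrative names.

```python
import math


def e_approx(n):
    """Taylor polynomial of e^x evaluated at x = 1: sum of 1/k! for k = 0..n."""
    return sum(1.0 / math.factorial(k) for k in range(n + 1))


def sin_approx(x, n):
    """Taylor polynomial of sin x up to and including the x^(2n+1) term."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n + 1))


# n = 9 for e, and the terms up to x^5 (n = 2) for sin 1, as in the example.
print("e     ~", e_approx(9), " error:", abs(math.e - e_approx(9)))
print("sin 1 ~", sin_approx(1.0, 2), " error:", abs(math.sin(1.0) - sin_approx(1.0, 2)))
```

The observed errors fall within the bounds $10^{-6}$ and $3 \cdot 10^{-4}$ derived from the remainder term.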

3.6 Applications of Power Series

For an integer $m \geq 0$, the binomial expansion is the well-known expansion of the function

\[ f(x) = (1+x)^m = 1 + mx + \binom{m}{2} x^2 + \cdots + \binom{m}{m-1} x^{m-1} + x^m. \]

For a general number $m$, one still has a similar expansion. Let $f(x) = (1+x)^m$. Then

\[ \begin{aligned} f'(x) &= m(1+x)^{m-1}, \\ f''(x) &= (m-1)m(1+x)^{m-2}, \\ f'''(x) &= (m-2)(m-1)m (1+x)^{m-3}, \\ &\ \ \vdots \\ f^{(k)}(x) &= (m-k+1)\cdots(m-1)m(1+x)^{m-k}. \end{aligned} \]


Thus the Taylor series of $f$ at $x = 0$ is

\[ P(x) = 1 + mx + \frac{(m-1)m}{2!} x^2 + \frac{(m-2)(m-1)m}{3!} x^3 + \cdots + \frac{(m-k+1)\cdots(m-1)m}{k!} x^k + \cdots, \]

which is called the binomial series. If $m$ is an integer $\geq 0$, the series stops after $m + 1$ terms because the coefficients from $k = m + 1$ on are zero. If $m$ is not a nonnegative integer, then the series is infinite and converges for $|x| < 1$. Indeed, by the ratio test,

\[ \left| \frac{a_{k+1}}{a_k} \right| = \left| \frac{m-k}{k+1}\, x \right| \to |x| \]

as $k \to \infty$. One can also show that the remainder term approaches 0 as $k \to \infty$, so that this Taylor series converges to $f(x) = (1+x)^m$. As a notational convention, let

\[ \binom{m}{1} = m, \qquad \binom{m}{2} = \frac{(m-1)m}{2!}, \qquad \binom{m}{k} = \frac{(m-k+1)\cdots(m-1)m}{k!}. \]

Then

\[ (1+x)^m = 1 + \sum_{k=1}^{\infty} \binom{m}{k} x^k. \]

Example 3.6.1 If $m = -1$, then $\binom{-1}{1} = -1$, $\binom{-1}{2} = \frac{-1(-2)}{2!} = 1$, and

\[ \binom{-1}{k} = \frac{-1(-2)(-3)\cdots(-1-k+1)}{k!} = (-1)^k \frac{k!}{k!} = (-1)^k. \]

Thus

\[ (1+x)^{-1} = 1 - x + x^2 - x^3 + \cdots + (-1)^k x^k + \cdots. \qquad \square \]

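Example 3.6.2 If $m = \frac12$, then $f(x) = \sqrt{1+x}$, and $\binom{1/2}{1} = \frac12$, $\binom{1/2}{2} = \frac{\frac12\left(\frac12 - 1\right)}{2!} = -\frac18$, and

\[ \binom{1/2}{3} = \frac{\left(\frac12\right)\left(-\frac12\right)\left(-\frac32\right)}{3!} = \frac1{16}, \qquad \binom{1/2}{4} = \frac{\left(\frac12\right)\left(-\frac12\right)\left(-\frac32\right)\left(-\frac52\right)}{4!} = -\frac{5}{128}. \]

Hence,

\[ \begin{aligned} \sqrt{1+x} &= 1 + \frac x2 - \frac{x^2}{8} + \frac{x^3}{16} - \frac{5x^4}{128} + \cdots \qquad \text{for } |x| \text{ small} \\ &= 1 + \frac12 \left( x + \sum_{n=2}^{\infty} (-1)^{n-1} \frac{(2n-3)!}{2^{2n-3}(n-2)!\,n!}\, x^n \right). \end{aligned} \qquad \square \]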
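The generalized binomial coefficients can be computed exactly with rational arithmetic, which makes sign slips easy to catch. The following is a minimal sketch; `binom` and `sqrt1p` are illustrative names.

```python
from fractions import Fraction


def binom(m, k):
    """Generalized binomial coefficient m(m-1)...(m-k+1)/k! for rational m."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(m) - j   # multiply by (m - j)
        result /= j + 1             # divide by (j + 1)
    return result


def sqrt1p(x, n):
    """Partial sum of the binomial series of sqrt(1+x) up to the x^n term."""
    half = Fraction(1, 2)
    return sum(float(binom(half, k)) * x**k for k in range(n + 1))


print([str(binom(Fraction(1, 2), k)) for k in range(5)])
print([str(binom(-1, k)) for k in range(5)])
print(sqrt1p(0.2, 10), (1 + 0.2) ** 0.5)
```

For $m = \frac12$ the coefficients alternate in sign from $k = 1$ on, in agreement with the expansion of $\sqrt{1+x}$ above.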


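Example 3.6.3 Estimate $\int_0^1 \sin x^2\,dx$ within an error of less than 0.001.

Note that

\[ \sin x^2 = x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \cdots + \frac{(-1)^n x^{4n+2}}{(2n+1)!} + \cdots, \]

\[ \int \sin x^2\,dx = C + \frac{x^3}{3} - \frac{x^7}{7\cdot3!} + \frac{x^{11}}{11 \cdot 5!} - \cdots + \frac{(-1)^n x^{4n+3}}{(4n+3)(2n+1)!} + \cdots. \]

Thus,

\[ \int_0^1 \sin x^2\,dx = \frac13 - \frac{1}{7\cdot3!} + \frac{1}{11\cdot5!} - \cdots + \frac{(-1)^n}{(4n+3)(2n+1)!} + \cdots. \]

This is an alternating series and $\frac{1}{11\cdot5!} \approx 0.00076 < 0.001$, so by the alternating series estimation (Theorem 3.3.8),

\[ \int_0^1 \sin x^2\,dx \approx \frac13 - \frac1{7\cdot3!} \approx 0.310 \]

within the error required. □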
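As a sanity check, the two-term series estimate can be compared with a direct numerical integration. The sketch below uses composite Simpson's rule, which is not part of the text and serves only as an independent reference; the function names are illustrative.

```python
import math


def series_estimate(n_terms):
    """Partial sum of sum_{n>=0} (-1)^n / ((4n+3)(2n+1)!)."""
    return sum((-1) ** n / ((4 * n + 3) * math.factorial(2 * n + 1))
               for n in range(n_terms))


def simpson(f, a, b, m=1000):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


reference = simpson(lambda x: math.sin(x * x), 0.0, 1.0)
print("two-term estimate:", series_estimate(2))
print("Simpson reference:", reference)
print("difference:", abs(series_estimate(2) - reference))
```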

Example 3.6.4 Recall that, in Example 3.4.4, from

\[ \frac{d}{dx}\tan^{-1}x = \frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + \cdots, \]

by term-by-term integration we get

\[ \tan^{-1} x = \int \frac{1}{1+x^2}\,dx = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots. \]

Since the term-by-term integration theorem was not proven, we derive this series expression of $\tan^{-1} x$ from an integral of a finite sum:

\[ \frac{1}{1+t^2} = 1 - t^2 + t^4 - t^6 + \cdots + (-1)^n t^{2n} + \frac{(-1)^{n+1} t^{2n+2}}{1+t^2}. \]

By integrating both sides from $t = 0$ to $t = x$, we get

\[ \tan^{-1}x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots + (-1)^n \frac{x^{2n+1}}{2n+1} + R_n(x), \]

where

\[ R_n(x) = \int_0^x \frac{(-1)^{n+1}t^{2n+2}}{1+t^2}\,dt. \]


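Since the denominator of the integrand is greater than or equal to 1,

\[ |R_n(x)| \leq \int_0^{|x|} t^{2n+2}\,dt = \frac{|x|^{2n+3}}{2n+3} \to 0 \]

as $n \to \infty$ for $|x| \leq 1$. Thus, we get

\[ \tan^{-1} x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n+1} = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots, \qquad |x| \leq 1. \]

For $x = 1$,

\[ \frac{\pi}{4} = 1 - \frac13 + \frac15 - \frac17 + \cdots + \frac{(-1)^n}{2n+1} + \cdots. \]

This series converges very slowly. However, for $x$ close to 0, the series converges rapidly, and so one can use various trigonometric formulas: let $\alpha = \tan^{-1} \frac12$ and $\beta = \tan^{-1} \frac13$. Then

\[ \tan(\alpha + \beta) = \frac{\tan\alpha + \tan\beta}{1 - \tan\alpha \tan\beta} = \frac{\frac12 + \frac13}{1 - \frac16} = 1 = \tan\frac{\pi}{4}, \]

so that $\frac{\pi}{4} = \tan^{-1}\frac12 + \tan^{-1}\frac13$, and both series on the right converge rapidly. □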
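The difference in convergence speed between $x = 1$ and the two-angle formula with $x = \frac12$ and $x = \frac13$ shows up clearly with only ten terms of each series. A minimal sketch (`arctan_series` is an illustrative name):

```python
import math


def arctan_series(x, n_terms):
    """Partial sum of x - x^3/3 + x^5/5 - ... with n_terms terms."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(n_terms))


leibniz = 4 * arctan_series(1.0, 10)
two_angle = 4 * (arctan_series(0.5, 10) + arctan_series(1 / 3, 10))

print("pi              =", math.pi)
print("4*arctan(1)     ~", leibniz, " error:", abs(math.pi - leibniz))
print("two-angle value ~", two_angle, " error:", abs(math.pi - two_angle))
```

With the same number of terms, the two-angle formula is accurate to about seven decimals, while the series at $x = 1$ is still off in the first decimal.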

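Example 3.6.5 Series can be used to evaluate limits of functions. Find the limit

\[ \lim_{x\to0} \left( \frac{1}{\sin x} - \frac1x \right). \]

We have

\[ \begin{aligned} \frac{1}{\sin x} - \frac{1}{x} = \frac{x - \sin x}{x \sin x} &= \frac{x - \left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right)}{x\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right)} \\ &= \frac{x^3 \left(\frac{1}{3!} - \frac{x^2}{5!} + \cdots\right)}{x^2\left(1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots\right)} = \frac{x \left(\frac{1}{3!} - \frac{x^2}{5!} + \cdots\right)}{1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots}. \end{aligned} \]

Hence,

\[ \lim_{x\to0} \left( \frac{1}{\sin x} - \frac1x \right) = \lim_{x\to0} \frac{x \left(\frac{1}{3!} - \frac{x^2}{5!} + \cdots\right)}{1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots} = 0. \]

On the other hand, for $|x|$ small,

\[ \frac{1}{\sin x} - \frac1x \approx x \cdot \frac{1}{3!} = \frac{x}{6}, \quad \text{or} \quad \csc x \approx \frac1x + \frac x6. \qquad \square \]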


3.7 Fourier Series

The Taylor polynomial gives a good approximation of functional values for $x$ near a particular point $x = a$ where the functional value is given. However, the error in the approximation can be large at points that are far away. There is another method, called the Fourier series, that often gives a good approximation on wide intervals, and often works with discontinuous functions for which the Taylor polynomial fails. The Fourier series approximates functions with sums of sine and cosine functions. It is well suited for analyzing periodic functions, such as radio signals and alternating currents, for solving heat transfer problems, and for many other problems in science and engineering.

Let f(x) be a function on the interval [0, 2π]. We wish to approximatef by a function of the form:

fn(x) = a0 + (a1 cosx + b1 sinx) + (a2 cos 2x + b2 sin 2x) + · · ·+(an cosnx + bn sinnx).

= a0 +n∑

k=1

(ak cos kx + bk sin kx).

We choose $a_k$ and $b_k$ so that $f_n(x)$ is the best possible approximation to f(x) in the following sense:

(1) $f_n(x)$ and f(x) give the same value when integrated from 0 to 2π:

$$\int_0^{2\pi} f_n(x)\,dx = \int_0^{2\pi} f(x)\,dx.$$

(2) $f_n(x)\cos kx$ and $f(x)\cos kx$ give the same value when integrated from 0 to 2π, for k = 1, 2, 3, ..., n:

$$\int_0^{2\pi} f_n(x)\cos kx\,dx = \int_0^{2\pi} f(x)\cos kx\,dx.$$

(3) $f_n(x)\sin kx$ and $f(x)\sin kx$ give the same value when integrated from 0 to 2π, for k = 1, 2, 3, ..., n:

$$\int_0^{2\pi} f_n(x)\sin kx\,dx = \int_0^{2\pi} f(x)\sin kx\,dx.$$


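From the expression of $f_n(x)$ defined above, the left side integrals are:

$$\int_0^{2\pi} f_n(x)\,dx = 2\pi a_0, \qquad \int_0^{2\pi} f_n(x)\cos kx\,dx = \pi a_k, \qquad \int_0^{2\pi} f_n(x)\sin kx\,dx = \pi b_k.$$

These follow from the fact that, for positive integers p and q,

$$\int_0^{2\pi}\cos px\cos qx\,dx = \begin{cases}\pi & \text{if } p = q,\\ 0 & \text{if } p \neq q,\end{cases}$$

$$\int_0^{2\pi}\sin px\sin qx\,dx = \begin{cases}\pi & \text{if } p = q,\\ 0 & \text{if } p \neq q,\end{cases}$$

$$\int_0^{2\pi}\sin px\cos qx\,dx = 0.$$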

Therefore, we choose the coefficients of $f_n(x)$ so that the integrals on the left remain the same when $f_n(x)$ is replaced by f(x):

$$a_0 = \frac{1}{2\pi}\int_0^{2\pi} f(x)\,dx,$$

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx, \quad k = 1, \ldots, n,$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx, \quad k = 1, \ldots, n.$$

The Fourier series for f(x) is obtained by letting n → ∞:

$$a_0 + \sum_{k=1}^{\infty}(a_k\cos kx + b_k\sin kx).$$

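Example 3.7.1 Let

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le \pi,\\ 2 & \text{if } \pi < x \le 2\pi.\end{cases}$$

The coefficients of the Fourier series of f are:

$$a_0 = \frac{1}{2\pi}\int_0^{2\pi} f(x)\,dx = \frac{1}{2\pi}\left(\int_0^{\pi} 1\,dx + \int_{\pi}^{2\pi} 2\,dx\right) = \frac{3}{2},$$

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx = \frac{1}{\pi}\left(\int_0^{\pi}\cos kx\,dx + \int_{\pi}^{2\pi} 2\cos kx\,dx\right) = \frac{1}{\pi}\left(\left.\frac{\sin kx}{k}\right|_0^{\pi} + \left.\frac{2\sin kx}{k}\right|_{\pi}^{2\pi}\right) = 0, \quad k \ge 1,$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx = \frac{1}{\pi}\left(\int_0^{\pi}\sin kx\,dx + \int_{\pi}^{2\pi} 2\sin kx\,dx\right) = \frac{1}{\pi}\left(\left.-\frac{\cos kx}{k}\right|_0^{\pi} + \left.-\frac{2\cos kx}{k}\right|_{\pi}^{2\pi}\right) = \frac{\cos k\pi - 1}{k\pi} = \frac{(-1)^k - 1}{k\pi}, \quad k \ge 1.$$

Thus, $a_0 = \frac{3}{2}$, $a_1 = a_2 = \cdots = 0$,

$$b_1 = -\frac{2}{\pi},\quad b_2 = 0,\quad b_3 = -\frac{2}{3\pi},\quad b_4 = 0,\quad b_5 = -\frac{2}{5\pi},\quad b_6 = 0,\ \cdots,$$

and the Fourier series is

$$\frac{3}{2} - \frac{2}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \cdots\right). \qquad □$$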
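The coefficients of this example can also be approximated by numerical integration. The following Python sketch uses a composite midpoint rule; the helper names `integrate` and `fourier_coefficients` and the number of subintervals are assumptions made for this illustration only.

```python
import math

def f(x):
    """Step function of Example 3.7.1 on [0, 2*pi]."""
    return 1.0 if x <= math.pi else 2.0

def integrate(g, a, b, n=20000):
    """Composite midpoint rule with n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def fourier_coefficients(func, n_max):
    """Return a0 and the lists [a1..a_nmax], [b1..b_nmax] on [0, 2*pi]."""
    two_pi = 2 * math.pi
    a0 = integrate(func, 0.0, two_pi) / two_pi
    a = [integrate(lambda x, k=k: func(x) * math.cos(k * x), 0.0, two_pi) / math.pi
         for k in range(1, n_max + 1)]
    b = [integrate(lambda x, k=k: func(x) * math.sin(k * x), 0.0, two_pi) / math.pi
         for k in range(1, n_max + 1)]
    return a0, a, b

a0, a, b = fourier_coefficients(f, 5)
print("a0 =", a0)
for k in range(1, 6):
    # Compare the numerical b_k with the formula ((-1)^k - 1) / (k*pi).
    print(k, a[k - 1], b[k - 1], ((-1) ** k - 1) / (k * math.pi))
```

The even number of subintervals is chosen so that the jump of f at x = π falls on a cell boundary, which keeps the midpoint rule accurate.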

3.7.1 Convergence of Fourier series

Taylor series are computed from the values of a function and its derivatives at a single point x = a, and cannot reflect the behavior of a discontinuous function, such as f in Example 3.7.1, past a discontinuity. The reason that a Fourier series can be used to represent such functions is that the Fourier series of a function depends on the existence of certain integrals, whereas the Taylor series depends on derivatives of a function near a single point. A function can be fairly rough, even discontinuous, and still be integrable.

The coefficients used to construct Fourier series are precisely those one should choose to minimize the integral of the square of the distance between f and $f_n$: the integral

$$\int_0^{2\pi}(f(x) - f_n(x))^2\,dx$$

is minimized by choosing $a_0, a_1, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ as we did. While Taylor series are useful for approximating a function and its derivatives near a point, Fourier series minimize an error which is distributed over an interval.

Theorem 3.7.1 Let f(x) be a function such that f and f′ are piecewise continuous on the interval [0, 2π]. Then f is equal to its Fourier series at all points where f is continuous. At a point c where f has a discontinuity, the Fourier series converges to

$$\frac{f(c^+) + f(c^-)}{2},$$

where $f(c^+)$ and $f(c^-)$ are the right and left limits of f at c.

Chapter 4

First Order Differential Equations

4.1 First order Linear Differential Equations

A differential equation is an equation involving one or more derivatives of unknown functions. For instance, the position of a particle moving on a straight line is a function u(t) of time t governed by Newton's law of motion:

$$m\,u''(t) = f(t, u(t), u'(t)),$$

where m is the mass of the particle and f represents the force acting on the particle, which depends on the time t ∈ (a, b), the position u(t) and the velocity u′(t).

A differential equation for an unknown function of one variable is called an ordinary differential equation (abbreviated O.D.E.); one for a function of more than one variable is called a partial differential equation (abbreviated P.D.E.).

Differential equations appear naturally in diverse areas of science and the humanities, in such problems as the detection of art forgeries, the diagnosis of diabetes, and population dynamics.

The order of a D.E. is the order of the highest derivative of the function u(t) that appears in the equation. The above equation is of order 2. In general, an n-th order differential equation is of the form:

$$F(t, u(t), u'(t), u''(t), \ldots, u^{(n)}(t)) = 0.$$

A solution of a differential equation

$$y^{(n)}(t) = f(t, y, y', \ldots, y^{(n-1)})$$

on an interval (a, b) is a real function y = φ(t), with derivatives $\varphi', \ldots, \varphi^{(n)}$, that satisfies the equation: i.e.,

$$\varphi^{(n)}(t) = f(t, \varphi, \varphi', \ldots, \varphi^{(n-1)})$$

for all t ∈ (a, b).

Basic problems in D.E. are the following:

(1) (Existence) Does a D.E. have a solution?
(2) (Uniqueness) If yes, how many solutions are there?
(3) (Practicality) Determine a solution, and find all of them.

Even if we know that a solution exists, it may not be possible to find one in terms of elementary functions such as polynomial, trigonometric, exponential, logarithmic, or hyperbolic functions. In this case, one can use trial and error methods, or approximate the solution numerically using computers.

Definition 4.1.1 An O.D.E. $F(t, y, y', \ldots, y^{(n)}) = 0$ is said to be linear if F is a linear function of the unknown function y and its derivatives. The general linear differential equation of order n is of the form:

$$a_n(t)y^{(n)} + a_{n-1}(t)y^{(n-1)} + \cdots + a_1(t)y' + a_0(t)y = g(t),$$

where the $a_i(t)$ are functions of t on the interval I = (a, b). When g(t) ≡ 0 for all t, the equation is called a homogeneous linear D.E.

Thus a first order D.E. is of the form

$$\frac{dy}{dt} = f(t, y),$$

which prescribes the slope of the graph of a solution y = φ(t) at each point (t, y); i.e., at each point (t, y) in the domain of f, one can draw a short line segment with slope f(t, y). The collection of these segments is called the direction field of the D.E. By plotting as many of them as possible, one can see what the graph of a solution looks like, as the following example shows.

For a first order linear D.E., f(t, y) = −p(t)y + g(t) with continuous functions p(t) and g(t) on (a, b), so we can rewrite it as

$$\frac{dy}{dt} + p(t)y = g(t).$$

The equation is homogeneous if g(t) = 0.


Example 4.1.1 Consider $\frac{dy}{dt} = -\frac{1}{2}y + \frac{3}{2} = \frac{3-y}{2}$.

On the line y = 3, $\frac{dy}{dt} = 0$ means that the slope of the solutions is 0; on the line y = 2, $\frac{dy}{dt} = \frac{1}{2}$ means that the slope is $\frac{1}{2}$; on the line y = 1, $\frac{dy}{dt} = 1$ means that the slope is 1; and on the line y = 4, $\frac{dy}{dt} = -\frac{1}{2}$ means that the slope is $-\frac{1}{2}$, etc. These horizontal lines are called isoclines of the solutions. By drawing curves tangent to those slope segments, one can see the solutions, called integral curves:

[Figure: direction field with horizontal isoclines and the integral curves.]

Now, the equation can be rewritten as

$$\frac{1}{y-3}\,dy = -\frac{1}{2}\,dt, \quad \text{for } y \neq 3.$$

By integrating both sides, we get:

$$\ln|y - 3| = -\tfrac{1}{2}t + c, \quad c \text{ a constant},$$

$$|y - 3| = e^{c}e^{-\frac{1}{2}t} = Ce^{-\frac{1}{2}t},$$

$$y = 3 + Ce^{-\frac{1}{2}t},$$

where in the last line C is allowed to have either sign. If y = 3, it is already a solution with $\frac{dy}{dt} = 0$, which is contained in the last expression for y with C = 0. In particular, the curve that passes through the initial point (0, 2) is $y = 3 - e^{-\frac{1}{2}t}$. □
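Following the slope segments step by step is what Euler's method does. As a rough sketch (Euler's method is not introduced in the text; the function names and step count below are our own choices), one can follow the direction field from (0, 2) and compare with $3 - e^{-t/2}$:

```python
import math

def slope(t, y):
    """Right-hand side f(t, y) = (3 - y)/2 of Example 4.1.1."""
    return (3.0 - y) / 2.0

def euler(f, t0, y0, t_end, steps):
    """Follow the direction field from (t0, y0) with equal Euler steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)   # move along the short line segment at (t, y)
        t += h
    return y

approx = euler(slope, 0.0, 2.0, 4.0, 4000)
exact = 3.0 - math.exp(-4.0 / 2.0)
print(approx, exact)
```

The two values agree to several decimal places, and the agreement improves as the number of steps grows.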

Note that the solution in this example can be rewritten as

$$ye^{\frac{1}{2}t} - 3e^{\frac{1}{2}t} = C.$$

This is the direct integral of its derivative:

$$\left(y' + \tfrac{1}{2}y\right)e^{\frac{1}{2}t} - \tfrac{3}{2}e^{\frac{1}{2}t} = 0, \quad\text{or}\quad \left(y' + \tfrac{1}{2}y - \tfrac{3}{2}\right)e^{\frac{1}{2}t} = 0.$$

This shows that the original D.E. can be integrated directly after multiplying by the function $e^{\frac{1}{2}t}$. This is called an integrating factor. How can we find such an integrating factor?

We first consider a homogeneous first order linear D.E. y′ + p(t)y = 0. This equation may be rewritten as $\frac{y'}{y} = -p(t)$, whose left side is the derivative of ln|y(t)|, i.e.,

$$\frac{d}{dt}\ln|y(t)| = -p(t), \quad\text{or}\quad \ln|y(t)| = -\int p(t)\,dt + c,$$

$$\text{or}\quad |y(t)| = C\exp\left(-\int p(t)\,dt\right), \quad\text{or}\quad \left|y(t)\exp\left(\int p(t)\,dt\right)\right| = C.$$

Note that if the absolute value of a continuous function is a constant, then so is the function itself. Thus,

$$y(t)\exp\left(\int p(t)\,dt\right) = C, \quad\text{or}\quad y(t) = C\exp\left(-\int p(t)\,dt\right),$$

which is the integral of

$$(y' + p(t)y)\exp\left(\int p(t)\,dt\right) = 0,$$

and hence the integrating factor is $\exp\left(\int p(t)\,dt\right)$. This solution is called the general solution of the D.E., since every solution of the D.E. must be of this form. Usually, we are looking for the specific solution y(t) of the D.E. y′(t) + p(t)y(t) = 0 which takes the value $y_0$ at some initial time $t_0$: $y(t_0) = y_0$. For this solution, we integrate both sides of the D.E. from $t_0$ to t:

$$\int_{t_0}^{t}\frac{d}{ds}\ln|y(s)|\,ds = -\int_{t_0}^{t} p(s)\,ds,$$

$$\text{and so}\quad \ln|y(t)| - \ln|y(t_0)| = \ln\left|\frac{y(t)}{y(t_0)}\right| = -\int_{t_0}^{t} p(s)\,ds,$$

$$\text{or}\quad \left|\frac{y(t)}{y(t_0)}\exp\left(\int_{t_0}^{t} p(s)\,ds\right)\right| = 1,$$

$$\text{or}\quad \frac{y(t)}{y(t_0)}\exp\left(\int_{t_0}^{t} p(s)\,ds\right) = \pm 1.$$

But at $t = t_0$ the value is 1, and so the constant is 1. Therefore,

$$y(t) = y(t_0)\exp\left(-\int_{t_0}^{t} p(s)\,ds\right) = y_0\exp\left(-\int_{t_0}^{t} p(s)\,ds\right).$$

This is called the particular solution.


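Example 4.1.2 Find the solution of the initial value problem:

$$\frac{dy}{dt} + (\sin t)\,y = 0, \quad y(0) = \frac{3}{2}.$$

The general solution is $y(t) = C\exp\left(-\int\sin s\,ds\right) = Ce^{\cos t}$, and the particular solution is

$$y(t) = \frac{3}{2}\exp\left(-\int_0^{t}\sin s\,ds\right) = \frac{3}{2}e^{\cos t - 1}. \qquad □$$

Example 4.1.3 Find the solution of the initial value problem:

$$\frac{dy}{dt} + e^{t^2}y = 0, \quad y(1) = 2.$$

The particular solution is

$$y(t) = 2\exp\left(-\int_1^{t} e^{s^2}\,ds\right). \qquad □$$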

Remarks: The solution to Example 4.1.2 is given explicitly, while the integral in the solution to Example 4.1.3 cannot be evaluated in terms of elementary functions. However, they are both equally valid and equally useful, for two reasons. First, there are very simple numerical schemes to evaluate the integral to any degree of accuracy with the aid of a computer. Second, even though the solution to Example 4.1.2 is given explicitly, we still cannot evaluate it at a given time t without some sort of calculating aid such as a digital computer.
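As an illustration of such a numerical scheme, the sketch below evaluates the solution of Example 4.1.3 with the composite Simpson rule. The choice of Simpson's rule, the function names and the number of subintervals are ours and are not prescribed by the text.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def y_example_4_1_3(t):
    """y(t) = 2 exp(-int_1^t e^(s^2) ds), the solution with y(1) = 2."""
    return 2.0 * math.exp(-simpson(lambda s: math.exp(s * s), 1.0, t))

def y_example_4_1_2(t):
    """Explicit solution (3/2) e^(cos t - 1) of Example 4.1.2."""
    return 1.5 * math.exp(math.cos(t) - 1.0)

for t in (1.0, 1.2, 1.5):
    print(t, y_example_4_1_3(t), y_example_4_1_2(t))
```

Either way a computer is needed to produce the decimal values, which is the point of the remark.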

Now we consider the nonhomogeneous equation y′(t) + p(t)y = q(t). Just as for a homogeneous equation, we multiply both sides by an integrating factor µ(t), giving y′µ(t) + p(t)µ(t)y = q(t)µ(t), and we want the left side to be the derivative of yµ(t).

In general, suppose that µ(t) is an integrating factor for a first order linear D.E. y′ + p(t)y = q(t). Then

$$\frac{d}{dt}(y\mu(t)) = y'\mu(t) + \mu'(t)y = y'\mu(t) + p(t)\mu(t)y = q(t)\mu(t),$$

and so we must have µ′(t) = p(t)µ(t), or $\frac{\mu'(t)}{\mu(t)} = p(t)$, whose solution is

$$\mu(t) = C\exp\left(\int p(s)\,ds\right) = \exp\left(\int p(s)\,ds\right),$$

where we took C = 1 since we need only one such solution. Thus the original equation becomes

$$\frac{d}{dt}(y\mu(t)) = \frac{d}{dt}\left[y\exp\left(\int p(s)\,ds\right)\right] = q(t)\mu(t).$$

By integration,

$$\exp\left(\int p(t)\,dt\right)y = \int q(t)\mu(t)\,dt + C,$$

$$y(t) = \exp\left(-\int p(t)\,dt\right)\left(\int q(t)\mu(t)\,dt + C\right),$$

which is the general solution.

If an initial condition $y(t_0) = y_0$ is given, then we take the definite integral of $\frac{d}{dt}(y\mu(t)) = q(t)\mu(t)$ from $t_0$ to t to get:

$$\mu(t)y - \mu(t_0)y_0 = \int_{t_0}^{t} q(s)\mu(s)\,ds,$$

$$\text{or}\quad y(t) = \frac{1}{\mu(t)}\left(\mu(t_0)y_0 + \int_{t_0}^{t} q(s)\mu(s)\,ds\right).$$
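The last formula lends itself to direct numerical evaluation. The following Python sketch is one possible implementation under our own assumptions: the integrals are approximated by Simpson's rule, µ is normalized so that $\mu(t_0) = 1$, and the names `simpson` and `solve_linear_ivp` are hypothetical. As a test case it uses y′ − 2ty = t, y(1) = 2, whose solution $-\frac{1}{2} + \frac{5}{2}e^{t^2-1}$ can be verified by substitution.

```python
import math

def simpson(g, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def solve_linear_ivp(p, q, t0, y0, t):
    """Evaluate y(t) = (mu(t0) y0 + int_{t0}^t q mu ds) / mu(t) for y' + p y = q."""
    def mu(s):
        # Integrating factor exp(int_{t0}^s p), so that mu(t0) = 1.
        return math.exp(simpson(p, t0, s))
    return (y0 + simpson(lambda s: q(s) * mu(s), t0, t)) / mu(t)

# Test case: y' - 2 t y = t, y(1) = 2, i.e. p(t) = -2t and q(t) = t.
value = solve_linear_ivp(lambda t: -2.0 * t, lambda t: t, 1.0, 2.0, 1.5)
closed_form = -0.5 + 2.5 * math.exp(1.5 ** 2 - 1.0)
print(value, closed_form)
```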


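Example 4.1.4 Find the solution of the initial value problem:

$$\frac{dy}{dt} - 2ty = t, \quad y(1) = 2.$$

The integrating factor is $\mu(t) = \exp\left(\int(-2t)\,dt\right) = e^{-t^2}$, and so the equation becomes:

$$e^{-t^2}\left(\frac{dy}{dt} - 2ty\right) = \frac{d}{dt}\left(e^{-t^2}y\right) = te^{-t^2}.$$

Thus the general solution is

$$e^{-t^2}y = \int te^{-t^2}\,dt + C = -\frac{e^{-t^2}}{2} + C, \quad\text{or}\quad y(t) = -\frac{1}{2} + Ce^{t^2}.$$

From the initial value y(1) = 2, the constant C is determined by

$$2 = y(1) = -\frac{1}{2} + Ce^{1^2}, \quad\text{or}\quad C = \frac{5}{2}e^{-1}.$$

Thus the particular solution is:

$$y(t) = -\frac{1}{2} + \frac{5}{2}e^{t^2-1}. \qquad □$$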

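Example 4.1.5 Find the solution of the initial value problem:

$$\frac{dy}{dt} + y = \frac{1}{1+t^2}, \quad y(2) = 3.$$

The integrating factor is $\mu(t) = \exp\left(\int 1\,dt\right) = e^{t}$, and so the equation becomes:

$$e^{t}\left(\frac{dy}{dt} + y\right) = \frac{d}{dt}\left(e^{t}y\right) = \frac{e^{t}}{1+t^2}.$$

Hence, from the initial condition y(2) = 3,

$$\int_2^{t}\frac{d}{ds}\left(e^{s}y(s)\right)ds = \int_2^{t}\frac{e^{s}}{1+s^2}\,ds,$$

$$e^{t}y(t) - 3e^{2} = \int_2^{t}\frac{e^{s}}{1+s^2}\,ds,$$

$$\text{or}\quad y(t) = e^{-t}\left(3e^{2} + \int_2^{t}\frac{e^{s}}{1+s^2}\,ds\right). \qquad □$$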

4.2 Separable Equations

A more general D.E. than the homogeneous first order linear D.E. is one of the following form: $f(y)\frac{dy}{dt} = g(t)$, which is said to be separable. If F(y) is an antiderivative of f(y), then the left side is the derivative of F(y(t)) with respect to t, so that

$$\frac{d}{dt}F(y(t)) = f(y)\frac{dy}{dt} = g(t).$$

Thus, the general solution is of the form:

$$F(y(t)) = \int g(t)\,dt + c, \quad\text{or}\quad \int f(y)\,dy = \int g(t)\,dt + c.$$

If an initial condition $y(t_0) = y_0$ is given, the particular solution can be found by determining the integration constant c from the general solution, or by integrating the equation $\frac{d}{dt}F(y(t)) = g(t)$ from $t_0$ to t:

$$F(y(t)) - F(y_0) = \int_{t_0}^{t} g(s)\,ds, \quad\text{or}\quad \int_{y_0}^{y} f(r)\,dr = \int_{t_0}^{t} g(s)\,ds.$$



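Example 4.2.1 Find the solution of the initial value problem:

$$e^{y}\frac{dy}{dt} - (t + t^3) = 0, \quad y(1) = 1.$$

Rewriting this as $e^{y}dy = (t + t^3)dt$ and integrating both sides, we get $e^{y(t)} = \frac{t^2}{2} + \frac{t^4}{4} + c$. Taking logarithms,

$$y(t) = \ln\left(\frac{t^2}{2} + \frac{t^4}{4} + c\right),$$

which is the general solution.

(i) By setting t = 1 and $y_0 = 1$ in the general solution, we get

$$1 = \ln\left(\frac{3}{4} + c\right), \quad\text{or}\quad c = e - \frac{3}{4}.$$

Thus,

$$y(t) = \ln\left(e - \frac{3}{4} + \frac{t^2}{2} + \frac{t^4}{4}\right).$$

(ii) From $\int_1^{y} e^{r}\,dr = \int_1^{t}(s + s^3)\,ds$, we get

$$e^{y} - e = \frac{t^2}{2} + \frac{t^4}{4} - \frac{1}{2} - \frac{1}{4}, \quad\text{or}\quad y(t) = \ln\left(e - \frac{3}{4} + \frac{t^2}{2} + \frac{t^4}{4}\right). \qquad □$$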


Example 4.2.2 Find the solution of the initial value problem:

$$\frac{dy}{dt} = 1 + y^2, \quad y(0) = 0.$$

Rewriting this as $\frac{1}{1+y^2}dy = dt$ and integrating both sides, we get

$$\int_0^{y}\frac{1}{1+r^2}\,dr = \int_0^{t} ds, \quad\text{or}\quad \tan^{-1}y = t, \quad\text{or}\quad y = \tan t.$$

Note that this solution y = tan t goes to ±∞ at the finite times $t = \pm\frac{\pi}{2}$, which is not expected from the original nice equation. Thus solutions usually exist only on a finite open interval (a, b), rather than for all time. Moreover, different solutions of the same differential equation usually go to infinity at different times. For example, if the initial condition is given by y(0) = 1 in this problem, then

$$\int_1^{y}\frac{1}{1+r^2}\,dr = \int_0^{t} ds,$$

$$\tan^{-1}y - \tan^{-1}1 = t,$$

$$y = \tan\left(t + \frac{\pi}{4}\right),$$

whose domain is $\left(-\frac{3\pi}{4}, \frac{\pi}{4}\right)$. □



Example 4.2.3 Find the solution of the initial value problem:

$$y\frac{dy}{dt} + (1 + y^2)\sin t = 0, \quad y(0) = 1.$$

Rewriting this as $\frac{y}{1+y^2}dy = -\sin t\,dt$ and integrating both sides, we get

$$\int_1^{y}\frac{r}{1+r^2}\,dr = \int_0^{t}-\sin s\,ds, \quad\text{or}\quad \tfrac{1}{2}\ln(1 + y^2) - \tfrac{1}{2}\ln 2 = \cos t - 1.$$

Solving this for y,

$$y(t) = \pm\left(2e^{-4\sin^2\frac{t}{2}} - 1\right)^{1/2}.$$

Since y(0) > 0, we take $y(t) = \left(2e^{-4\sin^2\frac{t}{2}} - 1\right)^{1/2}$, provided $2e^{-4\sin^2\frac{t}{2}} \ge 1$, or $e^{4\sin^2\frac{t}{2}} \le 2$. Since the logarithm is monotonically increasing, we have

$$\left|\frac{t}{2}\right| \le \sin^{-1}\frac{\sqrt{\ln 2}}{2}.$$

Therefore, the solution exists only on the open interval

$$\left(-2\sin^{-1}\frac{\sqrt{\ln 2}}{2},\ 2\sin^{-1}\frac{\sqrt{\ln 2}}{2}\right).$$

This means that y(t) just disappears at $t = \pm 2\sin^{-1}\frac{\sqrt{\ln 2}}{2}$, without going to infinity.

However, this difficulty can be anticipated from the original equation: if we rewrite the equation as

$$\frac{dy}{dt} = -\frac{(1 + y^2)\sin t}{y},$$

then the differential equation is not defined when y(t) = 0. Thus if a solution y(t) reaches 0 at some time t = a, then we cannot expect it to be defined for t > a. This is exactly what happened here, since y(±a) = 0 for $a = 2\sin^{-1}\frac{\sqrt{\ln 2}}{2}$. □


Example 4.2.4 Find the solution of the initial value problem:

$$\frac{dy}{dt} = (1 + y)t, \quad y(0) = -1.$$

In this case, one cannot write $\frac{1}{1+y}dy = t\,dt$, since y(0) = −1 so that 1 + y(0) = 0. However, y(t) = −1 is already a solution of this initial value problem, and it turns out to be the only solution.

In general, for an initial value problem $\frac{dy}{dt} = f(y)g(t)$ with $y(t_0) = y_0$, if $f(y_0) = 0$, then we will show later that $y(t) = y_0$ is the only solution of this initial value problem, provided that $\frac{df}{dy}$ exists and is continuous. □



Example 4.2.5 Find the solution of the initial value problem:

$$(1 + e^{y})\frac{dy}{dt} = \cos t, \quad y\left(\frac{\pi}{2}\right) = 3.$$

The particular solution can be found from

$$\int_3^{y}(1 + e^{r})\,dr = \int_{\pi/2}^{t}\cos s\,ds,$$

$$y + e^{y} = 2 + e^{3} + \sin t.$$

This equation cannot be solved explicitly for y as a function of t. Thus, the solution to this kind of initial value problem is an implicit solution. However, one can always find y(t) numerically by using a digital computer. □
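One simple way to do this is bisection, since $y + e^{y}$ is increasing in y. The sketch below is only an illustration; the function name `y_implicit`, the bracketing interval [0, 4] and the tolerance are our own choices.

```python
import math

def y_implicit(t, tol=1e-12):
    """Solve y + exp(y) = 2 + exp(3) + sin(t) for y by bisection."""
    target = 2.0 + math.exp(3.0) + math.sin(t)
    # y + e^y equals 1 at y = 0 and 4 + e^4 at y = 4, which brackets the target.
    lo, hi = 0.0, 4.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + math.exp(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for t in (0.0, math.pi / 2, math.pi):
    print(t, y_implicit(t))
```

At t = π/2 the computed value reproduces the initial condition y = 3.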

Example 4.2.6 Find all solutions of the D.E.:

$$\frac{dy}{dt} = -\frac{t}{y}.$$

The general solution can be found by integrating y dy = −t dt to get

$$y^2 + t^2 = c^2.$$

The solutions are closed curves, and we cannot solve for y as a single-valued function of t. The D.E. is not defined when y = 0; nevertheless, the circles $t^2 + y^2 = c^2$ are perfectly well-defined even when y = 0. Hence, we will call the circles solution curves of the D.E. □

4.2.1 Population models

It seems impossible to model the growth of a species by a differential equation, since the population of any species always changes by integer amounts and so the population can never be a differentiable function of time. However, if a given population is very large, then a change of the population by one is very small compared to the population itself. Thus, we make the approximation that large populations change continuously, and even differentiably, with time.

Let p(t) denote the population of a given species at time t, and let r(t, p) denote the growth rate, which is the difference between its birth rate and death rate. If this population is isolated, that is, there is no net immigration or emigration, then the rate of change of the population is proportional to the current population: i.e., $\frac{dp(t)}{dt} = r(t, p)p(t)$. In the simplest model, we assume that the growth rate r = a is a constant. Thus, the differential equation governing the population growth becomes:

$$\frac{dp(t)}{dt} = ap(t), \quad a \text{ a constant},$$

which is linear and is known as the Malthusian law of population growth. For an initial value $p(t_0) = p_0$, the particular solution is an exponential growth:

$$p(t) = p_0e^{a(t-t_0)}.$$

Usually, this exponential growth fits very well for small populations over short time periods, but not for large populations over long time periods. This is due to neglecting the competition among individual members for the limited living space, natural resources and food available as the population gets very large. Therefore, the growth rate r(t, p) should be a function of time t and the current population p(t). We want to choose it so that r ≈ a when p is small, r decreases as p grows larger, and r < 0 when p is sufficiently large. The simplest function with these properties is r(t, p) = a − bp(t) for some positive constant b.

Therefore, the modified equation is

$$\frac{dp(t)}{dt} = (a - bp(t))p(t) = a\left(1 - \frac{p(t)}{K}\right)p(t),$$

where $K = \frac{a}{b}$. This equation is known as the Verhulst equation or the logistic equation, and the numbers a and b are called the vital coefficients of the population. It was first introduced in 1837 by the Belgian mathematical biologist Verhulst.

Before deriving the solution, we first look at the main features of the solution that can be discovered directly from the differential equation itself by geometric reasoning, without solving it. This is important because the same methods can often be used on more complicated equations whose solutions are more difficult to obtain.

The graph of the right side of the equation, which is a parabola in p, is given in the following figure:

[Figure: left panel, $\frac{dp(t)}{dt}$ versus p(t), a parabola vanishing at p = 0 and p = K with maximum value $\frac{aK}{4}$ at $p = \frac{K}{2}$; right panel, solutions p versus t, with the lines p = 0, $p = \frac{K}{2}$ and p = K marked.]

For 0 < p < K, $\frac{dp(t)}{dt} > 0$ means p is increasing; for p > K, $\frac{dp(t)}{dt} < 0$ means p is decreasing; and for p = 0 or p = K, $\frac{dp(t)}{dt} = 0$ means p does not change. Thus the constant solutions p(t) = 0 and p(t) = K are called equilibrium solutions. The points p = 0 and p = K on the p axis are called equilibrium points or critical points. Based on these observations, we can sketch the graph of the solution p(t) versus t depending on the initial value $p(0) = p_0$, which is depicted in the right side of the above figure.

The fundamental theorem of O.D.E., which will be proven later, guarantees that two different solutions never pass through the same point. Thus, while solutions approach the equilibrium solution p(t) = K as t → ∞, they do not attain this value at any finite time. We refer to p = K as the saturation level, or as the environmental carrying capacity for the given species.

In many situations it is good enough to have the qualitative information about the solution p(t) shown in the figure above. However, if we wish to have a more detailed description of the logistic growth, then we have to solve the equation.

The logistic equation is separable, and so, for an initial condition $p(t_0) = p_0$, the solution is given by

$$\int_{p_0}^{p}\frac{dr}{ar - br^2} = \int_{t_0}^{t} ds = t - t_0.$$

Note that, in $\frac{1}{ar - br^2} = \frac{1}{r(a - br)} = \frac{A}{r} + \frac{B}{a - br}$, the constants A and B are found to be $A = \frac{1}{a}$ and $B = \frac{b}{a}$. Thus,

$$\int_{p_0}^{p}\frac{dr}{ar - br^2} = \frac{1}{a}\int_{p_0}^{p}\left(\frac{1}{r} + \frac{b}{a - br}\right)dr = \frac{1}{a}\left(\ln\frac{p}{p_0} + \ln\left|\frac{a - bp_0}{a - bp}\right|\right) = \frac{1}{a}\ln\left(\frac{p}{p_0}\left|\frac{a - bp_0}{a - bp}\right|\right) = t - t_0.$$
Since the right side of this equation is positive, one can easily show thata−bp0

a−bp is always positive for t0 < t < ∞. Thus,

a(t− t0) = lnp

p0

a− bp0

a− bp,

or ea(t−t0) =p

p0

a− bp0

a− bp,

or p(a− bp0) = p0(a− bp)ea(t−t0),

or(a− bp0 + bp0e

a(t−t0))

p(t) = ap0ea(t−t0),

or p(t) =ap0e

a(t−t0)

a− bp0 + bp0ea(t−t0)

=ap0

bp0 + (a− bp0)e−a(t−t0).

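Observe first that as t → ∞,

$$p(t) \to \frac{ap_0}{bp_0} = \frac{a}{b} = K,$$

which means that the population always approaches the limiting value $\frac{a}{b}$ regardless of its positive initial value.

Secondly, p(t) is a monotonically increasing function of time if $0 < p_0 < \frac{a}{b}$. Moreover, since

$$\frac{d^2p}{dt^2} = a\frac{dp}{dt} - 2bp\frac{dp}{dt} = (a - 2bp)p(a - bp),$$

$\frac{dp}{dt}$ is increasing if $p(t) < \frac{a}{2b}$, and decreasing if $p(t) > \frac{a}{2b}$. Hence the graph of p(t) must be of the following form:

[Figure: graph of p(t) versus t, starting from $p(t_0)$ at $t = t_0$, rising through the level $\frac{a}{2b}$ and levelling off towards $\frac{a}{b}$.]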

Such a curve is called a logistic or S-shaped curve. These predictions were confirmed by an experiment on the protozoan Paramecium caudatum performed by the mathematical biologist G. F. Gause. Starting with five Paramecium placed in a small test tube containing 0.5 cm³ of nutritive medium, the number of individuals in the tube was counted daily for six days. The population increased at a rate of 230.9% per day when the numbers were low. The number of individuals increased rapidly at first, and then more slowly, until towards the fourth day it attained a maximum level of 375, saturating the test tube. From these data, if Paramecium caudatum grows according to the logistic equation $\frac{dp}{dt} = ap - bp^2$, then a = 2.309 and $b = \frac{2.309}{375}$. With $p(t_0) = p(0) = 5$, the logistic law predicts that

$$p(t) = \frac{375}{1 + 74e^{-2.309t}}.$$

The comparison of this prediction with the actual measurements was remarkably good.
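The predicted daily counts can be tabulated from the solution formula of the logistic equation. The sketch below is our own illustration (the function name `logistic` is hypothetical); it produces the predicted values only, since the measured data are not reproduced in these notes.

```python
import math

def logistic(t, p0, a, b, t0=0.0):
    """p(t) = a p0 / (b p0 + (a - b p0) exp(-a (t - t0)))."""
    return a * p0 / (b * p0 + (a - b * p0) * math.exp(-a * (t - t0)))

a = 2.309            # growth rate per day at low population
b = 2.309 / 375      # chosen so that the saturation level a/b is 375

for day in range(7):
    print(day, round(logistic(day, 5.0, a, b), 1))
```

The values rise quickly during the first days and are already close to the saturation level 375 by the fourth day.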

Remark: Let p(t) denote the human population of the earth at time t. It was estimated that the earth's human population was increasing at an average rate of 2% per year during the period 1960-1970. On January 1, 1965, the earth's population was estimated to be 3.34 billion people.

The exponential growth predicted by the linear differential equation is

$$p(t) = (3.34)10^{9}e^{0.02(t-1965)}.$$

According to this prediction, the population of the earth doubles every T years, where

$$e^{0.02T} = 2.$$

Solving this for T gives T = 50 ln 2 ≈ 34.6 years. This is in excellent agreement with the observed value. However, in the distant future, the earth's population would be 200,000 billion in the year 2515, 1,800,000 billion in the year 2625 and 3,600,000 billion in the year 2660. These are astronomical numbers whose significance is difficult to imagine. The total surface area of the earth is approximately 1,860,000 billion square feet, 80% of which is covered by water. Assuming we are willing to live on boats as well as on land, by the year 2515 each person would have 9.3 square feet, by the year 2625 only one square foot per person, etc. Thus, this model seems unreasonable.

In the logistic law of population growth, some ecologists have estimated that the natural value of a is 0.029. Moreover, since the human population was increasing at the rate of 2% per year when $p_0 = (3.34)10^{9}$ in 1965, from $\frac{1}{p}\frac{dp}{dt} = a - bp$, we see that

$$0.02 = 0.029 - b(3.34)10^{9}, \quad\text{or}\quad b = (2.695)10^{-12}.$$

Therefore, according to the logistic law of population growth, the human population of the earth will tend to the limiting value

$$\frac{a}{b} = \frac{0.029}{(2.695)10^{-12}} = 10.76 \text{ billion people}.$$

For the population of the earth in the year 2000,

$$p(2000) = \frac{(0.029)(3.34)10^{9}}{0.009 + (0.02)e^{-(0.029)35}} = \frac{(29)(3.34)}{9 + 20e^{-1.015}}10^{9} = 5.96 \text{ billion people},$$

which is in excellent agreement with reality!
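The two models can be put side by side with a few lines of Python. This is only a sketch using the figures quoted in the remark; the constant and function names are our own.

```python
import math

A = 0.029             # vital coefficient a quoted in the text
P0 = 3.34e9           # population on January 1, 1965
B = (A - 0.02) / P0   # from 0.02 = a - b * p0

def malthus(year):
    """Exponential prediction with growth rate 2% per year."""
    return P0 * math.exp(0.02 * (year - 1965))

def logistic_earth(year):
    """Logistic prediction a p0 / (b p0 + (a - b p0) e^(-a (t - 1965)))."""
    return A * P0 / (B * P0 + (A - B * P0) * math.exp(-A * (year - 1965)))

for year in (2000, 2515):
    # Populations in billions under the two models.
    print(year, malthus(year) / 1e9, logistic_earth(year) / 1e9)
print("limiting value in billions:", A / B / 1e9)
```

The exponential model runs away to the astronomical figures mentioned above, while the logistic model stays below its limiting value a/b.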

4.2.2 Brachistochrone problem

One of the famous problems in the history of mathematics is the brachistochrone problem, posed by Johann Bernoulli in 1696 to challenge his contemporary mathematicians, especially his elder brother Jacob Bernoulli. Correct solutions were found by the two Bernoullis, I. Newton, G. Leibniz, and M. L'Hospital.

"Among all the curves joining the peak A of a hill and the base B of the hill, find the curve which gives the shortest possible time of descent from A to B without friction."

This problem is important in the development of mathematics as one of the forerunners of the calculus of variations.

In solving this problem, we begin with the fundamental principle of optics discovered by Heron, the Alexandrian scientist of the first century A.D.: a light ray travels along the path that takes the shortest time.

It follows that light reflected by a mirror takes the direction making the angle of reflection equal to the angle of incidence; see the following figure:

[Figure: reflection of a light ray at a mirror, with labelled points P, Q, R, R′ and Q′.]

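It also leads to another well-known principle, due to Snell, for the refraction of light: light going into water from the air is bent. Suppose that the speed of light in the first medium is $v_1$ and in the second medium is $v_2$, and that the angle of incidence is $\alpha_1$ and the angle of refraction is $\alpha_2$, as in the following figure:

[Figure: a light ray from P to Q crossing the interface between the two media at X; the labelled lengths are a, b, x, c − x and c, the speeds are $v_1$ and $v_2$, and the angles are $\alpha_1$ and $\alpha_2$.]

The total time from P to Q is

$$T(x) = \frac{\sqrt{a^2 + x^2}}{v_1} + \frac{\sqrt{b^2 + (c - x)^2}}{v_2}.$$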

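The minimum of T(x) occurs at x where

$$\frac{dT}{dx} = \frac{x}{v_1\sqrt{a^2 + x^2}} - \frac{c - x}{v_2\sqrt{b^2 + (c - x)^2}} = 0,$$

$$\therefore\ \frac{\sin\alpha_1}{v_1} = \frac{\sin\alpha_2}{v_2},$$

which is called Snell's law of refraction.

Now suppose that the light travels through several layered media with velocities $v_1, v_2, v_3, \ldots, v_n$ and angles of incidence $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_n$, as in the following figure:

[Figure: a light ray crossing layers with speeds $v_1, v_2, v_3, \ldots, v_n$ and angles $\alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_n$; a second sketch of a smooth path with labels α, β, v, x and y.]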

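By Snell's law, we have

$$\frac{\sin\alpha_1}{v_1} = \frac{\sin\alpha_2}{v_2} = \frac{\sin\alpha_3}{v_3} = \cdots = \frac{\sin\alpha_n}{v_n} = c, \quad\text{a constant}.$$

As the layers get thinner and thinner, so that eventually the speed changes continuously, the path of the light becomes a smooth curve such that at each point of the curve the speed v and the angle α between the tangent line and the vertical satisfy

$$\frac{\sin\alpha}{v} = c, \quad\text{a constant}.$$

We now go back to our problem. By Newton's equation of motion, we get

$$v = \sqrt{2gy},$$

where $g = \frac{dv}{dt}$ is the gravitational acceleration, $v = gt = \frac{dy}{dt}$ is the speed, and $y = \frac{1}{2}gt^2$ is the distance fallen. Since $\frac{dy}{dx} = \tan\beta$ and

$$\sin\alpha = \cos\beta = \frac{1}{\sec\beta} = \frac{1}{\sqrt{1 + \tan^2\beta}} = \frac{1}{\sqrt{1 + \left(\frac{dy}{dx}\right)^2}},$$

we have

$$c = \frac{\sin\alpha}{v} = \frac{1}{\sqrt{2gy}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}},$$

$$y\left(1 + \left(\frac{dy}{dx}\right)^2\right) = k, \quad\text{with } k = \frac{1}{2gc^2},$$

$$\frac{y}{k - y}\left(\frac{dy}{dx}\right)^2 = 1,$$

$$\left(\frac{y}{k - y}\right)^{1/2}dy = dx.$$

Set $\left(\frac{y}{k - y}\right)^{1/2} = \tan\varphi$. Then we get

$$y = k\sin^2\varphi.$$

The differential of this is $dy = 2k\sin\varphi\cos\varphi\,d\varphi$. From the last equality,

$$dx = \tan\varphi\,dy = 2k\sin^2\varphi\,d\varphi = k(1 - \cos 2\varphi)\,d\varphi.$$

Finally, by integrating this with x = 0 at φ = 0, we get

$$x = \frac{k}{2}(2\varphi - \sin 2\varphi) = a(\theta - \sin\theta), \quad\text{with } a = \frac{k}{2},\ \theta = 2\varphi,$$

$$y = k\sin^2\varphi = \frac{k}{2}(1 - \cos 2\varphi) = a(1 - \cos\theta),$$

which describe the cycloid.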
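As a sanity check, one can verify numerically that the cycloid satisfies the relation $y\left(1 + \left(\frac{dy}{dx}\right)^2\right) = k = 2a$ obtained above. In this Python sketch the slope is estimated by central differences in θ; the function names and the sample value a = 1.5 are our own choices.

```python
import math

def cycloid(theta, a):
    """Point (x, y) = (a (theta - sin theta), a (1 - cos theta))."""
    return a * (theta - math.sin(theta)), a * (1.0 - math.cos(theta))

def invariant(theta, a, d=1e-6):
    """y (1 + (dy/dx)^2), with dy/dx estimated by central differences."""
    x1, y1 = cycloid(theta - d, a)
    x2, y2 = cycloid(theta + d, a)
    slope = (y2 - y1) / (x2 - x1)
    _, y = cycloid(theta, a)
    return y * (1.0 + slope * slope)

a = 1.5
for theta in (0.5, 1.0, 2.0, 3.0):
    # Each value should be close to k = 2a = 3.
    print(theta, invariant(theta, a))
```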

4.3 Exact Equations

Consider a function y = y(t) satisfying an equation φ(t, y) = y + sin(t + y) = c. By implicit differentiation, we have

$$\frac{d}{dt}\varphi(t, y) = \cos(t + y) + (1 + \cos(t + y))\frac{dy}{dt} = 0.$$

Conversely, if we are given a differential equation

$$\cos(t + y) + (1 + \cos(t + y))\frac{dy}{dt} = 0,$$

which is of the form

$$\frac{d}{dt}\varphi(t, y) = 0, \quad y = y(t),$$

then we can easily find the solution

$$\varphi(t, y) = y + \sin(t + y) = c.$$

In general, the most general first order differential equations that we can solve are of the following form:

$$\frac{d}{dt}\varphi(t, y) = 0, \quad y = y(t),$$

whose solution is φ(t, y) = c.

Now the question is, for a given differential equation, how we can recognize when it can be put in this form. Note that the function φ(t, y) in the equation has two variables t and y. In general, the derivative of a function of more than one variable will be discussed in Calculus II later.

In this section, we briefly introduce the derivatives of functions of more than one variable. They involve partial derivatives: let z = φ(x, y) be a function of two variables x and y on a domain U in R². The partial derivative of φ with respect to x is the usual derivative of φ with the other variable y held constant:

$$\frac{\partial\varphi}{\partial x}(x, y) = \varphi_x(x, y) = \lim_{h\to 0}\frac{\varphi(x + h, y) - \varphi(x, y)}{h},$$

provided the limit exists. Then the differential of z is defined as

$$dz = \varphi_x(x, y)\,dx + \varphi_y(x, y)\,dy.$$

It is also known that if the second order partial derivatives are continuous on U, then they satisfy $\varphi_{xy}(x, y) = \varphi_{yx}(x, y)$ on U.

For a given function φ(t, y), where y = y(t), the derivative of φ with respect to t is

$$\frac{d}{dt}\varphi(t, y(t)) = \frac{\partial\varphi}{\partial t} + \frac{\partial\varphi}{\partial y}\frac{dy}{dt}.$$

Theorem 4.3.1 The differential equation $M(t, y) + N(t, y)\frac{dy}{dt} = 0$ can be written as $\frac{d}{dt}\varphi(t, y) = 0$ if and only if there exists a function φ(t, y) such that $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$.

The next question is: for two given functions M(t, y) and N(t, y), how do we know whether there is a function φ(t, y) such that $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$?


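Theorem 4.3.2 Let M(t, y) and N(t, y) be continuous and have continuous partial derivatives on R = (a, b) × (c, d). Then there exists a function φ(t, y) such that $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$ if and only if $\frac{\partial M}{\partial y} = \frac{\partial N}{\partial t}$ in R.

Proof: Suppose that $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$. Then, as may be proved in an advanced calculus course,

$$\frac{\partial M}{\partial y} = \frac{\partial^2\varphi}{\partial y\,\partial t} = \frac{\partial^2\varphi}{\partial t\,\partial y} = \frac{\partial N}{\partial t}.$$

For the converse, we are looking for a function φ(t, y) such that $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$. Define φ by

$$\varphi(t, y) = \int M(t, y)\,dt + h(y),$$

for some function h of y which is to be found according to

$$N(t, y) = \frac{\partial\varphi}{\partial y} = \int\frac{\partial M(t, y)}{\partial y}\,dt + h'(y), \quad\text{or}\quad h'(y) = N(t, y) - \int\frac{\partial M(t, y)}{\partial y}\,dt.$$

The left hand side h′(y) is a function of y alone, while the right hand side is a function of t and y, which is possible only when

$$\frac{\partial}{\partial t}\left(N(t, y) - \int\frac{\partial M(t, y)}{\partial y}\,dt\right) = 0.$$

However, since

$$\frac{\partial}{\partial t}\left(N(t, y) - \int\frac{\partial M(t, y)}{\partial y}\,dt\right) = \frac{\partial N(t, y)}{\partial t} - \frac{\partial M(t, y)}{\partial y},$$

this holds if and only if $\frac{\partial N(t, y)}{\partial t} = \frac{\partial M(t, y)}{\partial y}$. In particular, if $\frac{\partial N}{\partial t} \neq \frac{\partial M}{\partial y}$, then there is no such function φ. On the other hand, if $\frac{\partial N}{\partial t} = \frac{\partial M}{\partial y}$, then we can find

$$h(y) = \int\left(N(t, y) - \int\frac{\partial M(t, y)}{\partial y}\,dt\right)dy.$$

Consequently, $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$ for

$$\varphi(t, y) = \int M(t, y)\,dt + \int\left(N(t, y) - \int\frac{\partial M(t, y)}{\partial y}\,dt\right)dy. \qquad □$$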


Definition 4.3.1 The differential equation $M(t, y) + N(t, y)\frac{dy}{dt} = 0$ is said to be exact if $\frac{\partial N}{\partial t} = \frac{\partial M}{\partial y}$.

Remarks: (1) The domain discussed in Theorem 4.3.2 can be any region in R² which contains no holes.

(2) When we say the solution of an exact differential equation is given by φ(t, y) = c, what we really mean is that the equation φ(t, y) = c can be solved for y as a function of t and c. In most cases, the equation cannot be solved explicitly for y as a function of t. However, a computer may be used to compute y(t) to any desired accuracy.

(3) Practically, from the equations $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$, we have:

$$\varphi(t, y) = \int M(t, y)\,dt + h(y), \qquad \varphi(t, y) = \int N(t, y)\,dy + k(t).$$

Usually h(y) or k(t) can be determined by inspection.

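Example 4.3.1 Find the general solution of

$$3y + e^{t} + (3t + \cos y)\frac{dy}{dt} = 0.$$

Solution: Since $\frac{\partial M}{\partial y} = 3 = \frac{\partial N}{\partial t}$ for $M(t, y) = 3y + e^{t}$ and $N(t, y) = 3t + \cos y$, the equation is exact.

Method 1: From $M(t, y) = \frac{\partial\varphi}{\partial t}$,

$$\varphi(t, y) = \int(3y + e^{t})\,dt + h(y) = e^{t} + 3ty + h(y),$$

$$\Longrightarrow\ N(t, y) = \frac{\partial\varphi}{\partial y} = 3t + h'(y) = 3t + \cos y,$$

$$\Longrightarrow\ h'(y) = \cos y, \ \Longrightarrow\ h(y) = \sin y + c,$$

$$\therefore\ \varphi(t, y) = e^{t} + 3ty + \sin y = c.$$

Method 2: From $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$, we get

$$\varphi(t, y) = e^{t} + 3ty + h(y), \qquad \varphi(t, y) = 3ty + \sin y + k(t).$$

By comparison, we see that h(y) = sin y and $k(t) = e^{t}$, so that $\varphi(t, y) = e^{t} + 3ty + \sin y$. □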
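The exactness test and the resulting φ can be checked numerically with central differences. The sketch below is an illustration only; the helper `partial` and the sample point are our own choices.

```python
import math

def M(t, y):
    return 3.0 * y + math.exp(t)

def N(t, y):
    return 3.0 * t + math.cos(y)

def phi(t, y):
    """Candidate potential e^t + 3ty + sin y found in Example 4.3.1."""
    return math.exp(t) + 3.0 * t * y + math.sin(y)

def partial(g, t, y, wrt, d=1e-5):
    """Central-difference estimate of the partial derivative of g in t or y."""
    if wrt == "t":
        return (g(t + d, y) - g(t - d, y)) / (2 * d)
    return (g(t, y + d) - g(t, y - d)) / (2 * d)

t, y = 0.7, -1.3
print("dM/dy, dN/dt:", partial(M, t, y, "y"), partial(N, t, y, "t"))
print("dphi/dt vs M:", partial(phi, t, y, "t"), M(t, y))
print("dphi/dy vs N:", partial(phi, t, y, "y"), N(t, y))
```

Each printed pair should agree up to the accuracy of the difference quotient.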



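Example 4.3.2 Find the solution of the initial value problem:

$$3t^2y + 8ty^2 + (t^3 + 8t^2y + 12y^2)\frac{dy}{dt} = 0, \quad y(2) = 1.$$

Solution: Since $\frac{\partial M}{\partial y} = 3t^2 + 16ty = \frac{\partial N}{\partial t}$ for $M(t, y) = 3t^2y + 8ty^2$ and $N(t, y) = t^3 + 8t^2y + 12y^2$, the equation is exact.

From $M(t, y) = \frac{\partial\varphi}{\partial t}$ and $N(t, y) = \frac{\partial\varphi}{\partial y}$, we get

$$\varphi(t, y) = t^3y + 4t^2y^2 + h(y), \qquad \varphi(t, y) = t^3y + 4t^2y^2 + 4y^3 + k(t).$$

By comparison, we see that $h(y) = 4y^3$ and k(t) = 0, so that $\varphi(t, y) = t^3y + 4t^2y^2 + 4y^3 = c$. By setting t = 2 and y = 1, we get c = 28. □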


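Example 4.3.3 Solve the initial value problem

$$4t^3e^{t+y} + t^4e^{t+y} + 2t + (t^4e^{t+y} + 2y)\frac{dy}{dt} = 0, \qquad y(0) = 1.$$

Solution: Since $\partial M/\partial y = (t^4 + 4t^3)e^{t+y} = \partial N/\partial t$ for $M(t,y) = 4t^3e^{t+y} + t^4e^{t+y} + 2t$ and $N(t,y) = t^4e^{t+y} + 2y$, the equation is exact. From $N(t,y) = \partial\phi/\partial y$,

$$\phi(t,y) = \int (t^4e^{t+y} + 2y)\,dy + k(t) = t^4e^{t+y} + y^2 + k(t),$$
$$\Longrightarrow\quad M(t,y) = \frac{\partial\phi}{\partial t} = (4t^3 + t^4)e^{t+y} + k'(t) = 4t^3e^{t+y} + t^4e^{t+y} + 2t,$$
$$\Longrightarrow\quad k'(t) = 2t \quad\Longrightarrow\quad k(t) = t^2 + c,$$
$$\therefore\quad \phi(t,y) = t^4e^{t+y} + y^2 + t^2 = c.$$

By setting $t = 0$ and $y = 1$, we get $c = 1$. $\square$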

Sometimes a given equation $M(t,y) + N(t,y)\,\frac{dy}{dt} = 0$ is not exact, but multiplying it by a suitable function $\mu(t,y)$, called an integrating factor, yields an exact equation: i.e., the equation

$$\mu(t,y)M(t,y) + \mu(t,y)N(t,y)\frac{dy}{dt} = 0$$

can be exact. By Theorem 4.3.2, this is exact if and only if

$$\frac{\partial}{\partial y}\big(\mu(t,y)M(t,y)\big) = \frac{\partial}{\partial t}\big(\mu(t,y)N(t,y)\big),$$

or

$$M\frac{\partial\mu}{\partial y} + \mu\frac{\partial M}{\partial y} = N\frac{\partial\mu}{\partial t} + \mu\frac{\partial N}{\partial t}.$$

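There are only two special cases where we can find an explicit solution of this equation: when $\mu$ is a function of $t$ alone, or a function of $y$ alone. If $\mu$ is a function of $t$ alone, then $\partial\mu/\partial y = 0$ and the above equation reduces to

$$N\frac{d\mu}{dt} = \mu\left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial t}\right), \quad\text{or}\quad \frac{1}{\mu}\frac{d\mu}{dt} = \frac{1}{N}\left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial t}\right),$$

which is meaningful only when

$$\frac{1}{N}\left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial t}\right) = R(t)$$

is a function of $t$ alone. In this case, $\mu(t) = \exp\left(\int R(t)\,dt\right)$ is an integrating factor. A similar situation occurs if $\mu$ is a function of $y$ alone.

In general, the function

$$\frac{1}{N}\left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial t}\right)$$

is almost always a function of both $t$ and $y$. Only for very special pairs of functions $M$ and $N$ is it a function of $t$, or $y$, alone. This is the reason why we cannot solve very many differential equations.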
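For completeness, the case in which $\mu$ depends on $y$ alone can be worked out in the same way. The derivation below is a sketch that follows the same exactness condition; the name $Q(y)$ is introduced here only for illustration and does not appear elsewhere in the text.

$$\frac{\partial\mu}{\partial t} = 0 \quad\Longrightarrow\quad M\frac{d\mu}{dy} + \mu\frac{\partial M}{\partial y} = \mu\frac{\partial N}{\partial t} \quad\Longrightarrow\quad \frac{1}{\mu}\frac{d\mu}{dy} = \frac{1}{M}\left(\frac{\partial N}{\partial t} - \frac{\partial M}{\partial y}\right).$$

This is meaningful only when the right-hand side is a function $Q(y)$ of $y$ alone, and in that case

$$\mu(y) = \exp\left(\int Q(y)\,dy\right)$$

is an integrating factor. Note that the sign of the bracket is opposite to the one in $R(t)$, and that the division is by $M$ rather than by $N$.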


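Example 4.3.4 Find the general solution of

$$\frac{y^2}{2} + 2ye^t + (y + e^t)\frac{dy}{dt} = 0.$$

Solution: Since

$$\frac{1}{N}\left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial t}\right) = \frac{1}{y + e^t}\big((y + 2e^t) - e^t\big) = \frac{y + e^t}{y + e^t} = 1 \neq 0,$$

the equation is not exact, but has an integrating factor $\mu(t) = \exp\left(\int 1\,dt\right) = e^t$. Thus, the equivalent equation

$$e^t\frac{y^2}{2} + 2ye^{2t} + e^t(y + e^t)\frac{dy}{dt} = 0$$

is exact. Then, as in the previous examples,

$$\phi(t,y) = e^t\frac{y^2}{2} + ye^{2t} + h(y), \qquad \phi(t,y) = e^t\frac{y^2}{2} + ye^{2t} + k(t).$$

By comparison, we see that $h(y) = 0 = k(t)$, so that $\phi(t,y) = e^t\frac{y^2}{2} + ye^{2t} = c$. Since this is a quadratic equation in $y$, we can solve it for $y$ as a function of $t$ to get

$$y(t) = -e^t \pm \sqrt{e^{2t} + 2ce^{-t}}. \qquad \square$$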
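The explicit solution of the preceding example can be checked numerically. The sketch below substitutes the branch with the plus sign into the original (non-exact) equation $\frac{y^2}{2} + 2ye^t + (y + e^t)\frac{dy}{dt} = 0$, approximating $dy/dt$ by a central difference. The constant $c = 1$, the sample points and the step size are arbitrary choices made for illustration.

```python
import math

def y_exact(t, c):
    """Branch y = -e^t + sqrt(e^{2t} + 2 c e^{-t}) of the implicit solution."""
    return -math.exp(t) + math.sqrt(math.exp(2 * t) + 2 * c * math.exp(-t))

def residual(t, c, h=1e-5):
    """Left-hand side M + N * dy/dt, with dy/dt from a central difference."""
    y = y_exact(t, c)
    dy = (y_exact(t + h, c) - y_exact(t - h, c)) / (2 * h)
    M = y * y / 2 + 2 * y * math.exp(t)
    N = y + math.exp(t)
    return M + N * dy

# Largest residual over a few sample points; it should be close to zero.
worst = max(abs(residual(t, 1.0)) for t in (0.0, 0.5, 1.0))
print(worst)
```

A small residual only indicates consistency at the sampled points; it is not a proof.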

Example 4.3.5 Use the methods of this section to find the general solution of

$$\frac{dy}{dt} + p(t)y = q(t).$$

Solution: For $M(t,y) = p(t)y - q(t)$ and $N(t,y) = 1$, we have

$$\frac{1}{N}\left(\frac{\partial M}{\partial y} - \frac{\partial N}{\partial t}\right) = \frac{1}{1}\big(p(t) - 0\big) = p(t) \neq 0,$$

so the equation is not exact, but has an integrating factor $\mu(t) = \exp\left(\int p(t)\,dt\right)$. Thus, the equivalent equation

$$\mu(t)\big(p(t)y - q(t)\big) + \mu(t)\frac{dy}{dt} = 0$$

is exact. Then, as in the previous examples, from $\partial\phi/\partial y = \mu(t)$ we get $\phi(t,y) = \mu(t)y + k(t)$. Since $\partial\phi/\partial t = \mu(t)M(t,y)$, we get

$$\mu'(t)y + k'(t) = \mu(t)\big(p(t)y - q(t)\big).$$

Since $\mu'(t) = \mu(t)p(t)$, we get $k'(t) = -\mu(t)q(t)$ and so

$$\phi(t,y) = \mu(t)y + k(t) = \mu(t)y - \int \mu(t)q(t)\,dt = c,$$

which was obtained in Section 4.1. $\square$


4.4 Existence and Uniqueness Theorem

Given an initial value problem $\frac{dy}{dt} = f(t,y)$ with $y(t_0) = y_0$, how do we know whether there is a solution? If we know it has a solution, how can we find one explicitly? If we have found one, are there any other solutions? If so, how many are there? These questions arise naturally in mathematics.

In some cases (like linear differential equations), as we have seen in the earlier sections, the existence of a solution of the initial value problem can be established directly by actually solving the problem and exhibiting a formula for the solution. However, there is no general method of solving the equation that applies in all cases, and for almost all differential equations finding an explicit solution is practically impossible even if we know that a solution exists.

Therefore, for the general case it is necessary to adopt an indirect approach that establishes the existence of a solution of the initial value problem, but usually does not provide a practical means of finding an explicit solution.

Note, however, that in actual applications it is usually more than sufficient to approximate the solution $y(t)$ of the equation to four decimal places, and this can be done quite easily by using computers.

The following existence and uniqueness theorem guarantees the validity of these computations.

Theorem 4.4.1 [Fundamental Theorem of Ordinary Differential Equations I] Suppose that the differential equation $\frac{dy}{dt} = f(t,y)$ is defined on the rectangle $R = [t_0, t_0 + a] \times [y_0 - b, y_0 + b]$, and that $f$ and $\frac{\partial f}{\partial y}$ are continuous in $R$. Let

$$M = \max_{(t,y)\in R} |f(t,y)|, \qquad \alpha = \min\left(a, \frac{b}{M}\right).$$

Then the initial value problem

$$\frac{dy}{dt} = f(t, y(t)), \quad\text{with } y(t_0) = y_0, \quad (t_0, y_0) \in R, \qquad (4.1)$$

has a unique solution $y(t)$ on the interval $[t_0, t_0 + \alpha]$. A similar result holds for $t < t_0$.

The heart of the proof is the construction of a sequence of functions that converges to a limit function satisfying the initial value problem. The individual members of the sequence need not satisfy the differential equation, and it is usually impossible to compute more than a few of them explicitly, so the limit function cannot be found explicitly except in very rare cases. Nevertheless, it is possible to show that the sequence in question converges and that the limit function has the desired properties. The argument is fairly intricate and depends, in part, on techniques and results that are usually encountered in a course of advanced calculus. Thus, at the first reading of this book, the reader may skip this part.

The strategy for the proof consists of the following three steps:

(1) Construct a sequence of functions $y_n(t)$ which come closer and closer to the solution of (4.1),

(2) Show that the sequence of functions $y_n(t)$ has a limit $y(t)$ on a suitable domain $[t_0, t_0 + \alpha]$,

(3) Prove that the limit $y(t)$ is a solution of (4.1) on the interval $[t_0, t_0 + \alpha]$.

Proof: (1) Construction of a sequence of functions $y_n(t)$: By integrating both sides of the equation (4.1), we get

$$y(t) = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds. \qquad (4.2)$$

Thus, $y(t)$ is a solution of (4.1) if and only if it is a continuous solution of (4.2).

Let us guess a first solution of (4.2) to be the constant function $y_0(t) = y_0$. Then, we define

$$y_1(t) = y_0 + \int_{t_0}^{t} f(s, y_0(s))\,ds.$$

If $y_1(t) = y_0$, then $y(t) = y_0$ is indeed a solution of (4.2). If not, we define

$$y_2(t) = y_0 + \int_{t_0}^{t} f(s, y_1(s))\,ds,$$

and so on. In this manner, we define

$$y_{n+1}(t) = y_0 + \int_{t_0}^{t} f(s, y_n(s))\,ds,$$

to obtain a sequence $\{y_n(t)\}$ of functions, called Picard iterates. It turns out that the Picard iterates always converge on a suitable interval to a solution $y(t)$ of (4.2).
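The recursion for the Picard iterates can be imitated numerically. The following is a minimal sketch, not part of the proof: each iterate is stored as a list of values on a uniform grid of $[t_0, t_0 + \alpha]$, and the integral is replaced by the cumulative trapezoid rule. The function name, the grid size and the test problem $y' = y$, $y(0) = 1$ are assumptions made for illustration.

```python
import math

def picard_iterates(f, t0, y0, alpha, n_iter, n_grid=1000):
    """Approximate the n_iter-th Picard iterate on [t0, t0 + alpha].

    The iterate is represented by its values on a uniform grid, and the
    integral of f(s, y_n(s)) is approximated by the trapezoid rule.
    """
    h = alpha / n_grid
    ts = [t0 + i * h for i in range(n_grid + 1)]
    y = [y0] * (n_grid + 1)                  # y_0(t) = y0 (constant guess)
    for _ in range(n_iter):
        g = [f(t, v) for t, v in zip(ts, y)]  # integrand f(s, y_n(s)) on the grid
        new = [y0]
        for i in range(1, n_grid + 1):
            new.append(new[-1] + 0.5 * h * (g[i - 1] + g[i]))
        y = new                              # y_{n+1}(t) = y0 + integral
    return ts, y

# Test problem (an assumption for illustration): y' = y, y(0) = 1 on [0, 0.5].
ts, ys = picard_iterates(lambda t, y: y, 0.0, 1.0, 0.5, n_iter=20)
print(ys[-1], math.exp(0.5))
```

The two printed numbers should agree to several digits; the remaining gap comes mostly from the trapezoid rule, not from the iteration.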


(2) Convergence of the Picard iterates: We cannot expect the Picard iterates to converge for all $t$. Thus, we first try to find an interval on which all the iterates $y_n(t)$ are uniformly bounded (that is, $|y_n(t)| \le K$ for all $n$ and all $t$ in the interval, for some fixed constant $K$).

Lemma 4.4.2 Choose any two positive numbers $a$ and $b$, and let $R$ be the rectangle $[t_0, t_0 + a] \times [y_0 - b, y_0 + b]$. Let

$$M = \max_{(t,y)\in R} |f(t,y)|, \qquad \alpha = \min\left(a, \frac{b}{M}\right).$$

Then

$$|y_n(t) - y_0| \le M(t - t_0), \quad\text{for } t_0 \le t \le t_0 + \alpha.$$

This lemma claims that the graph of $y_n(t)$ is sandwiched between the lines $y = y_0 + M(t - t_0)$ and $y = y_0 - M(t - t_0)$, for $t_0 \le t \le t_0 + \alpha$:

[Figure: two panels showing the rectangle R with the levels y0 − b, y0 and y0 + b, one for the case t0 + α = t0 + a and one for the case t0 + α = t0 + b/M.]

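In fact, by the construction of $\alpha$ we have $M(t - t_0) \le M\alpha \le b$, so the graph of $y_n(t)$ over $[t_0, t_0 + \alpha]$ is contained in $R = [t_0, t_0 + a] \times [y_0 - b, y_0 + b]$.

Proof: Use induction on $n$. The lemma is trivially true for $n = 0$ since $y_0(t) = y_0$. Suppose it is true for $n$, so that $|y_n(t) - y_0| \le M(t - t_0) \le b$. Then $(s, y_n(s)) \in R$, hence $|f(s, y_n(s))| \le M$, and

$$|y_{n+1}(t) - y_0| = \left|\int_{t_0}^{t} f(s, y_n(s))\,ds\right| \le \int_{t_0}^{t} |f(s, y_n(s))|\,ds \le M(t - t_0),$$

for $t_0 \le t \le t_0 + \alpha$. $\square$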


We can rewrite $y_n(t)$ as

$$y_n(t) = y_0(t) + (y_1(t) - y_0(t)) + (y_2(t) - y_1(t)) + \cdots + (y_n(t) - y_{n-1}(t)).$$

Thus $y_n(t)$ converges if and only if the series

$$(y_1(t) - y_0(t)) + (y_2(t) - y_1(t)) + \cdots + (y_n(t) - y_{n-1}(t)) + \cdots$$

converges, and for this it suffices to show that it converges absolutely: i.e., $\sum_{n=1}^{\infty} |y_n(t) - y_{n-1}(t)| < \infty$. Observe that

$$|y_n(t) - y_{n-1}(t)| = \left|\int_{t_0}^{t} \big[f(s, y_{n-1}(s)) - f(s, y_{n-2}(s))\big]\,ds\right|$$
$$\le \int_{t_0}^{t} \big|f(s, y_{n-1}(s)) - f(s, y_{n-2}(s))\big|\,ds$$
$$= \int_{t_0}^{t} \left|\frac{\partial f}{\partial y}(s, \xi(s))\right| \,|y_{n-1}(s) - y_{n-2}(s)|\,ds,$$

where $\xi(s)$ lies between $y_{n-1}(s)$ and $y_{n-2}(s)$, by the mean value theorem. By Lemma 4.4.2, the points $(s, \xi(s))$ all lie in the rectangle $R$ for $s \le t_0 + \alpha$, and so

$$|y_n(t) - y_{n-1}(t)| \le L\int_{t_0}^{t} |y_{n-1}(s) - y_{n-2}(s)|\,ds, \qquad t_0 \le t \le t_0 + \alpha,$$

where

$$L = \max_{(t,y)\in R} \left|\frac{\partial f(t,y)}{\partial y}\right|.$$

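For $n = 2$ and $3$, this inequality becomes:

$$|y_2(t) - y_1(t)| \le L\int_{t_0}^{t} |y_1(s) - y_0|\,ds \le L\int_{t_0}^{t} M(s - t_0)\,ds = \frac{LM(t - t_0)^2}{2},$$

$$|y_3(t) - y_2(t)| \le L\int_{t_0}^{t} |y_2(s) - y_1(s)|\,ds \le L^2M\int_{t_0}^{t} \frac{(s - t_0)^2}{2}\,ds = \frac{L^2M(t - t_0)^3}{3!},$$

and, by induction,

$$|y_n(t) - y_{n-1}(t)| \le \frac{L^{n-1}M(t - t_0)^n}{n!}, \qquad t_0 \le t \le t_0 + \alpha.$$

Therefore,

$$\sum_{n=1}^{\infty} |y_n(t) - y_{n-1}(t)| \le \sum_{n=1}^{\infty} \frac{L^{n-1}M(t - t_0)^n}{n!} \le \sum_{n=1}^{\infty} \frac{L^{n-1}M\alpha^n}{n!} = \frac{M}{L}\left[\sum_{n=1}^{\infty} \frac{(L\alpha)^n}{n!}\right] = \frac{M}{L}\left(e^{L\alpha} - 1\right) < \infty.$$

Thus, the Picard iterates $y_n(t)$ converge for all $t \in [t_0, t_0 + \alpha]$ to some limit function $y(t)$. $\square$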
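The closed form of the bounding series can be confirmed numerically. The sketch below sums the terms $L^{n-1}M\alpha^n/n!$ and compares the result with $\frac{M}{L}(e^{L\alpha} - 1)$. The values $M = 2$, $L = 3$, $\alpha = 0.5$ and the function name are arbitrary choices for illustration, not values taken from the text.

```python
import math

def picard_series_bound(M, L, alpha, terms=60):
    """Partial sum of sum_{n>=1} L**(n-1) * M * alpha**n / n!."""
    total = 0.0
    term = M * alpha                    # n = 1 term: L**0 * M * alpha / 1!
    for n in range(1, terms + 1):
        total += term
        term *= L * alpha / (n + 1)     # ratio of the (n+1)-th term to the n-th
    return total

M, L, alpha = 2.0, 3.0, 0.5             # illustrative values only
partial = picard_series_bound(M, L, alpha)
closed = (M / L) * (math.exp(L * alpha) - 1.0)
print(partial, closed)
```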

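(3) $y(t)$ satisfies the initial value problem: We show that $y(t)$ is continuous and satisfies

$$y(t) = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds.$$

Since

$$y_{n+1}(t) = y_0 + \int_{t_0}^{t} f(s, y_n(s))\,ds,$$

by taking the limits of both sides we get:

$$y(t) = \lim_{n\to\infty} y_{n+1}(t) = y_0 + \lim_{n\to\infty} \int_{t_0}^{t} f(s, y_n(s))\,ds \overset{?}{=} y_0 + \int_{t_0}^{t} f\big(s, \lim_{n\to\infty} y_n(s)\big)\,ds = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds.$$

We want to justify the equality marked with a question mark. For this, we show that

$$\left|\int_{t_0}^{t} f(s, y(s))\,ds - \int_{t_0}^{t} f(s, y_n(s))\,ds\right| \to 0, \quad\text{as } n \to \infty.$$

Note that the graph of $y(t)$ lies in $R$ on $[t_0, t_0 + \alpha]$, since the graphs of all the $y_n(t)$ do. Hence, by the same mean value argument as before,

$$\left|\int_{t_0}^{t} \big(f(s, y(s)) - f(s, y_n(s))\big)\,ds\right| \le \int_{t_0}^{t} \big|f(s, y(s)) - f(s, y_n(s))\big|\,ds \le L\int_{t_0}^{t} |y(s) - y_n(s)|\,ds.$$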

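Moreover, by the construction of $y_n(t)$ and $y(t)$, we have

$$y(s) - y_n(s) = \sum_{k=n+1}^{\infty} \big(y_k(s) - y_{k-1}(s)\big),$$

$$|y(s) - y_n(s)| \le M\sum_{k=n+1}^{\infty} \frac{L^{k-1}(s - t_0)^k}{k!} \le M\sum_{k=n+1}^{\infty} \frac{L^{k-1}\alpha^k}{k!} = \frac{M}{L}\sum_{k=n+1}^{\infty} \frac{(L\alpha)^k}{k!},$$

$$\therefore\quad \left|\int_{t_0}^{t} \big\{f(s, y(s)) - f(s, y_n(s))\big\}\,ds\right| \le M\sum_{k=n+1}^{\infty} \frac{(L\alpha)^k}{k!} \int_{t_0}^{t} ds \le M\alpha\sum_{k=n+1}^{\infty} \frac{(L\alpha)^k}{k!} \to 0,$$

as $n \to \infty$, since the last summation is the tail of the convergent Taylor series of $e^{L\alpha}$.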

To show that $y(t)$ is continuous, we prove that for any $\varepsilon > 0$ we can find $\delta > 0$ such that

$$|y(t + h) - y(t)| < \varepsilon, \quad\text{if } |h| < \delta.$$

Since we do not know the explicit form of $y(t)$, we cannot compare $y(t + h)$ and $y(t)$ directly. However, note that

$$|y(t + h) - y(t)| \le |y(t + h) - y_N(t + h)| + |y_N(t + h) - y_N(t)| + |y_N(t) - y(t)|.$$

Since $y_n(t) \to y(t)$ as $n \to \infty$ for $t \in [t_0, t_0 + \alpha]$, we can take $N$ so large that

$$\frac{M}{L}\sum_{k=N+1}^{\infty} \frac{(L\alpha)^k}{k!} < \frac{\varepsilon}{3}.$$

Then

$$|y(t + h) - y_N(t + h)| < \frac{\varepsilon}{3} \quad\text{and}\quad |y_N(t) - y(t)| < \frac{\varepsilon}{3},$$

for $h$ sufficiently small so that $t + h \le t_0 + \alpha$. Moreover, since $y_N(t)$ is obtained by $N$ repeated integrations of continuous functions, it is continuous, and so one can take $\delta > 0$ so small that

$$|y_N(t + h) - y_N(t)| < \frac{\varepsilon}{3} \quad\text{for } |h| < \delta.$$

Consequently, for $|h| < \delta$,

$$|y(t + h) - y(t)| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$


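(4) Uniqueness of $y(t)$: Suppose that $z(t)$ is another solution. Then

$$y(t) = y_0 + \int_{t_0}^{t} f(s, y(s))\,ds, \quad\text{and}\quad z(t) = y_0 + \int_{t_0}^{t} f(s, z(s))\,ds.$$

Thus, as long as the graphs of $y$ and $z$ lie in $R$,

$$|y(t) - z(t)| = \left|\int_{t_0}^{t} \big(f(s, y(s)) - f(s, z(s))\big)\,ds\right| \le \int_{t_0}^{t} \big|f(s, y(s)) - f(s, z(s))\big|\,ds \le L\int_{t_0}^{t} |y(s) - z(s)|\,ds.$$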

Lemma 4.4.3 Let $w(t)$ be a nonnegative function such that

$$w(t) \le L\int_{t_0}^{t} w(s)\,ds.$$

Then $w(t)$ is identically zero.

Proof: If differentiation preserved inequalities, so that $w'(t) \le Lw(t)$ followed from the condition, we would have

$$0 \le e^{-L(t - t_0)}w(t) \le w(t_0) \le L\int_{t_0}^{t_0} w(s)\,ds = 0,$$

and so $w(t) = 0$, and the proof would be done. However, differentiation does not preserve inequalities, while integration does. Thus we use the trick of setting $u(t) = \int_{t_0}^{t} w(s)\,ds$. Then

$$u'(t) = w(t) \le L\int_{t_0}^{t} w(s)\,ds = Lu(t).$$

Hence $\big(e^{-L(t - t_0)}u(t)\big)' = e^{-L(t - t_0)}\big(u'(t) - Lu(t)\big) \le 0$, which implies

$$0 \le e^{-L(t - t_0)}u(t) \le u(t_0) = \int_{t_0}^{t_0} w(s)\,ds = 0, \quad\text{for } t \ge t_0,$$

and so $u(t) = 0$ and $0 \le w(t) \le L\int_{t_0}^{t} w(s)\,ds = Lu(t) = 0$. $\square$

Lemma 4.4.3, applied to $w(t) = |y(t) - z(t)|$, implies $|y(t) - z(t)| = 0$, or $y(t) = z(t)$, for all $t \in [t_0, t_0 + \alpha]$. This completes the proof of Theorem 4.4.1. $\square$


Example 4.4.1 Compute the Picard iterates for the initial value problem:

$$y' = 1 + y^3, \qquad y(1) = 1.$$

$$y_0(t) = 1,$$
$$y_1(t) = 1 + \int_{1}^{t} (1 + 1)\,ds = 1 + 2(t - 1),$$
$$y_2(t) = 1 + \int_{1}^{t} \big\{1 + [1 + 2(s - 1)]^3\big\}\,ds = 1 + 2(t - 1) + 3(t - 1)^2 + 4(t - 1)^3 + 2(t - 1)^4. \qquad \square$$

Example 4.4.2 Compute the Picard iterates for the initial value problem $y' = y$, $y(0) = 1$, and show that they converge to $y(t) = e^t$.

$$y_0(t) = 1,$$
$$y_1(t) = 1 + \int_{0}^{t} 1\,ds = 1 + t,$$
$$y_2(t) = 1 + \int_{0}^{t} (1 + s)\,ds = 1 + t + \frac{t^2}{2!},$$
$$y_n(t) = 1 + \int_{0}^{t} \left(1 + s + \cdots + \frac{s^{n-1}}{(n-1)!}\right) ds = 1 + t + \frac{t^2}{2!} + \cdots + \frac{t^n}{n!},$$

which converges to $e^t$. $\square$
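For the problem $y' = y$, $y(0) = 1$, the iterates are polynomials, so they can be computed exactly. The following sketch stores a polynomial as its list of coefficients (constant term first) and uses exact rational arithmetic; the function names are assumptions made for illustration.

```python
from fractions import Fraction

def integrate_from_zero(coeffs):
    """Coefficients of the antiderivative (vanishing at 0) of sum coeffs[k] * t**k."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]

def picard_exponential(n):
    """Coefficients of the n-th Picard iterate for y' = y, y(0) = 1."""
    y = [Fraction(1)]                      # y_0(t) = 1
    for _ in range(n):
        integral = integrate_from_zero(y)  # integral of f(s, y_k(s)) = y_k(s) from 0 to t
        integral[0] += 1                   # add the initial value y(0) = 1
        y = integral
    return y

y4 = picard_exponential(4)
print(y4)   # coefficients of 1 + t + t^2/2! + t^3/3! + t^4/4!
```

Each step appends exactly one new Taylor coefficient of $e^t$, which is the pattern claimed in the example.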

Example 4.4.3 Consider the initial value problem:

$$\frac{dy}{dt} = (\sin 2t)\,y^{1/3}, \qquad y(0) = 0.$$

One solution is $y(t) = 0$. If we ignore the initial condition $y(0) = 0$ for the moment and rewrite the equation as

$$\frac{1}{y^{1/3}}\frac{dy}{dt} = \sin 2t,$$

we get, by integration,

$$\frac{3y^{2/3}}{2} = \int_{0}^{t} \sin 2s\,ds = \frac{1 - \cos 2t}{2} = \sin^2 t,$$

or

$$y(t) = \pm\sqrt{8/27}\,\sin^3 t,$$

which are two other solutions, both satisfying $y(0) = 0$. This non-uniqueness of the solution is due to the fact that the right hand side of the equation does not have a partial derivative with respect to $y$ at $y = 0$. $\square$



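Example 4.4.4 Consider the initial value problem:

$$\frac{dy}{dt} = f(t,y) = t^2 + e^{-y^2}, \qquad y(0) = 0.$$

Choose $a = \frac{1}{2}$ and $b = 1$. Then on the rectangle $R = [0, \frac{1}{2}] \times [-1, 1]$,

$$M = \max_{(t,y)\in R} \left(t^2 + e^{-y^2}\right) = 1 + \left(\frac{1}{2}\right)^2 = \frac{5}{4}.$$

Thus, for $\alpha = \min(1/2, 4/5) = 1/2$, the solution $y(t)$ exists for $t \in [0, \alpha] = [0, 1/2]$, by Theorem 4.4.1, and $|y(t)| \le 1$. $\square$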


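Example 4.4.5 Consider the initial value problem:

$$\frac{dy}{dt} = f(t,y) = y^3 + e^{-t^2}, \qquad y(0) = 1.$$

Choose $a = \frac{1}{9}$ and $b = 1$. Then on the rectangle $R = [0, \frac{1}{9}] \times [0, 2]$,

$$M = \max_{(t,y)\in R} \left(y^3 + e^{-t^2}\right) = 1 + 2^3 = 9.$$

Thus, for $\alpha = \min(1/9, 1/9) = 1/9$, the solution $y(t)$ exists for $t \in [0, \alpha] = [0, 1/9]$, by Theorem 4.4.1, and $0 \le y(t) \le 2$. $\square$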


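Example 4.4.6 Consider the initial value problem:

$$\frac{dy}{dt} = f(t,y) = 1 + y^2, \qquad y(0) = 0.$$

On the rectangle $R = [0, a] \times [-b, b]$,

$$M = \max_{(t,y)\in R} (1 + y^2) = 1 + b^2.$$

Thus, for $\alpha = \min\left(a, \frac{b}{1 + b^2}\right)$, the solution $y(t)$ exists for $t \in [0, \alpha]$. The largest $\alpha$ that we can achieve is therefore the maximum value of $\frac{b}{1 + b^2}$, which is $\frac{1}{2}$ (attained at $b = 1$). Thus, Theorem 4.4.1 predicts that $y(t)$ exists for $0 \le t \le \frac{1}{2}$. However, the actual solution $y(t) = \tan t$ exists on $[0, \frac{\pi}{2})$, so the interval guaranteed by Theorem 4.4.1 is not the largest possible one. $\square$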
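The optimization over $b$ in the example with $f(t,y) = 1 + y^2$ can be reproduced with a simple grid search. This is only a sketch: the grid of values of $b$, the large value $a = 10$ and the function name are assumptions made for illustration.

```python
import math

def guaranteed_length(a, b):
    """alpha = min(a, b / M) with M = 1 + b**2, as for f(t, y) = 1 + y**2."""
    return min(a, b / (1.0 + b * b))

# Scan b over (0, 5] with a taken large enough that it never limits alpha.
bs = [k / 1000.0 for k in range(1, 5001)]
best_b = max(bs, key=lambda b: guaranteed_length(10.0, b))
best_alpha = guaranteed_length(10.0, best_b)

print(best_b, best_alpha)            # maximizer of b / (1 + b^2) and the maximum
print(math.pi / 2 - best_alpha)      # gap to the true blow-up time of tan t
```

The gap printed on the last line shows how far the guaranteed interval falls short of the interval $[0, \frac{\pi}{2})$ on which $\tan t$ actually exists.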


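Example 4.4.7 Consider the initial value problem:

$$\frac{dy}{dt} = f(t,y), \qquad y(t_0) = y_0.$$

Suppose that $|f(t,y)| \le K$ on $[t_0, \infty) \times \mathbb{R}$. Then, on the rectangle $R = [t_0, t_0 + a] \times [y_0 - b, y_0 + b]$,

$$M = \max_{(t,y)\in R} |f(t,y)| \le K,$$

for any $a > 0$ and $b > 0$. Since $\frac{b}{M} \ge \frac{b}{K}$, the solution $y(t)$ exists for $t \in [t_0, t_0 + \alpha]$ with $\alpha = \min\left(a, \frac{b}{K}\right)$. Now we can make $\alpha = \min\left(a, \frac{b}{K}\right)$ as large as desired by choosing $a$ and $b$ sufficiently large. Hence, $y(t)$ exists for all $t \ge t_0$. $\square$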

Chapter 5

Second Order Differential Equations

5.1 Second Order Linear Differential Equations

A second order differential equation is an equation of the form

$$\frac{d^2y}{dt^2} = f\left(t, y, \frac{dy}{dt}\right).$$

In general, second order differential equations arise quite often in applications, but they are extremely difficult to solve. We only succeed in solving a special kind of equation, the second order linear differential equation, together with initial conditions:

$$\frac{d^2y}{dt^2} + p(t)\frac{dy}{dt} + q(t)y = g(t), \qquad y(t_0) = y_0, \quad y'(t_0) = y'_0, \qquad (5.1)$$

where $p(t)$, $q(t)$ and $g(t)$ are continuous functions of $t$, $y(t_0) = y_0$ denotes the initial position and $y'(t_0) = y'_0$ denotes the initial velocity.

The equation is homogeneous if $g(t) = 0$. Fortunately, many of the second order equations that appear in applications are of this form.

Theorem 5.1.1 [Fundamental Theorem of Ordinary Differential Equations II] Let the functions $p(t)$ and $q(t)$ be continuous in $(\alpha, \beta)$. Then there exists a unique solution $y = \phi(t)$ of the initial value problem of the homogeneous equation:

$$\frac{d^2y}{dt^2} + p(t)\frac{dy}{dt} + q(t)y = 0, \qquad y(t_0) = y_0, \quad y'(t_0) = y'_0,$$

on the entire interval $(\alpha, \beta)$. In particular, a solution $y = \phi(t)$ satisfying $y(t_0) = 0$ and $y'(t_0) = 0$ at some time $t = t_0$ must be identically zero.

From this theorem, one can now try to find the unique solution, and all possible solutions of the homogeneous equation, depending on the various initial conditions.

For notational convenience, we introduce differential operators:

$$D = \frac{d}{dt}, \qquad D^2 = \frac{d^2}{dt^2}, \quad\text{etc.},$$

which make sense only when they are applied to a function $y$:

$$Dy = \frac{d}{dt}(y) = \frac{dy}{dt} = y', \qquad D^2y = \frac{d^2}{dt^2}(y) = \frac{d^2y}{dt^2} = y'', \quad\text{etc.}$$

Let $L = D^2 + pD + q$, where $p$, $q$ are continuous functions on $(\alpha, \beta)$. Then the given equation can be written as

$$L[y](t) = (D^2 + pD + q)[y](t) = D^2(y(t)) + p(t)D(y(t)) + q(t)y(t) = y''(t) + p(t)y'(t) + q(t)y(t).$$

The following rule is easy to derive:

$$L[c_1y_1 + c_2y_2] = c_1L[y_1] + c_2L[y_2], \qquad c_1, c_2 \text{ constants}.$$

An operator satisfying this property is called a linear operator. It follows from this property of $L$ that, if $y_1(t)$ and $y_2(t)$ are solutions of $L[y](t) = 0$, so is $c_1y_1 + c_2y_2$ for any constants $c_1$ and $c_2$.

Example 5.1.1 If $L = D^2 + 1$, then the equation is $L[y](t) = y''(t) + y(t) = 0$. One can easily verify that $y_1(t) = \cos t$ and $y_2(t) = \sin t$ are two distinct solutions, and so is $y(t) = c_1\cos t + c_2\sin t$, where $c_1$, $c_2$ are constants. In fact, every solution $y(t)$ is of this form: suppose that the initial values of $y$ are $y(0) = y_0$ and $y'(0) = y'_0$. Then the function $\phi(t) = y_0\cos t + y'_0\sin t$ is a solution satisfying the same initial conditions, and so, by the uniqueness, $y(t) = \phi(t)$. $\square$

Two such solutions $\cos t$ and $\sin t$ of $L[y](t) = y''(t) + y(t) = 0$ are said to be linearly independent. In general, we have:


Definition 5.1.1 Two functions $y_1(t)$ and $y_2(t)$ are said to be linearly dependent on $(\alpha, \beta)$ if one of them is a constant multiple of the other on $(\alpha, \beta)$: that is, if there are some constants $(c_1, c_2) \neq (0, 0)$, or $c$, such that

$$c_1y_1(t) + c_2y_2(t) = 0 \text{ for all } t \in (\alpha, \beta), \quad\text{or}\quad y_2(t) = cy_1(t) \text{ for all } t \in (\alpha, \beta).$$

Otherwise, they are said to be linearly independent.

Note that two linearly independent functions may be constant multiples of each other on some subinterval of $(\alpha, \beta)$, but not on the whole interval. That is, for two functions to be linearly independent, it is enough that they fail to be constant multiples of each other on some (small) subinterval of $(\alpha, \beta)$.

The following is one of the criteria for linear dependence of two functions:

Definition 5.1.2 For two functions $y_1(t)$ and $y_2(t)$ on $(\alpha, \beta)$, the function

$$W(t) = W[y_1, y_2](t) = \det\begin{bmatrix} y_1(t) & y_2(t) \\ y'_1(t) & y'_2(t) \end{bmatrix} = y_1(t)y'_2(t) - y'_1(t)y_2(t)$$

is called the Wronskian of $y_1(t)$ and $y_2(t)$.

It is easy to see that, if $y_2(t) = cy_1(t)$ on $(\alpha, \beta)$, then $W[y_1, y_2](t) = 0$ for all $t \in (\alpha, \beta)$. In general, the converse is not true: i.e., $W[y_1, y_2](t) = 0$ for all $t \in (\alpha, \beta)$ does not imply the linear dependence of $y_1$ and $y_2$, since the constant $c$ in $y_2(t) = cy_1(t)$ may be different depending on $t$. However, if the two functions are solutions of a second order linear differential equation $L[y](t) = 0$, then the converse is also true by the following:

Lemma 5.1.2 Let $L = D^2 + pD + q$, where $p$, $q$ are continuous functions on $(\alpha, \beta)$, and let $y_1(t)$ and $y_2(t)$ be two solutions of $L[y](t) = 0$ on $(\alpha, \beta)$. Then the Wronskian $W(t) = W[y_1, y_2](t)$ of $y_1(t)$ and $y_2(t)$ satisfies

$$W'(t) + p(t)W(t) = 0.$$

In particular, $W[y_1, y_2]$ is either identically zero or never zero on $(\alpha, \beta)$.

Proof: Note that

$$W'(t) = y_1(t)y''_2(t) + y'_1(t)y'_2(t) - y'_1(t)y'_2(t) - y''_1(t)y_2(t)$$
$$= y_1(t)y''_2(t) - y''_1(t)y_2(t)$$
$$= -p(t)\big(y_1(t)y'_2(t) - y'_1(t)y_2(t)\big) = -p(t)W(t),$$

since $y''_2(t) = -p(t)y'_2(t) - q(t)y_2(t)$ and $y''_1(t) = -p(t)y'_1(t) - q(t)y_1(t)$. Thus, for any fixed $t_0 \in (\alpha, \beta)$,

$$W(t) = W(t_0)\exp\left(-\int_{t_0}^{t} p(s)\,ds\right),$$

which is identically zero on $(\alpha, \beta)$ if $W(t_0) = 0$, and nonzero for all $t \in (\alpha, \beta)$ if $W(t_0) \neq 0$. $\square$
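The formula $W(t) = W(t_0)\exp\left(-\int_{t_0}^{t} p(s)\,ds\right)$ can be checked on a concrete equation. The sketch below uses $y'' + 3y' + 2y = 0$, which is an assumed illustrative example not taken from the text, with solutions $y_1 = e^{-t}$ and $y_2 = e^{-2t}$ and constant $p(t) = 3$.

```python
import math

def wronskian(y1, dy1, y2, dy2, t):
    """W[y1, y2](t) = y1(t) * y2'(t) - y1'(t) * y2(t)."""
    return y1(t) * dy2(t) - dy1(t) * y2(t)

# Assumed illustrative equation: y'' + 3y' + 2y = 0, so p(t) = 3.
y1 = lambda t: math.exp(-t)
dy1 = lambda t: -math.exp(-t)
y2 = lambda t: math.exp(-2 * t)
dy2 = lambda t: -2 * math.exp(-2 * t)

def abel(t, t0=0.0, p=3.0):
    """W(t0) * exp(-integral of p from t0 to t), for a constant coefficient p."""
    return wronskian(y1, dy1, y2, dy2, t0) * math.exp(-p * (t - t0))

for t in (0.0, 0.5, 2.0):
    print(t, wronskian(y1, dy1, y2, dy2, t), abel(t))
```

Both columns equal $-e^{-3t}$, which is never zero, in agreement with the dichotomy stated in the lemma.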

Theorem 5.1.3 Let $y_1(t)$ and $y_2(t)$ be two solutions of $L[y](t) = 0$ on $(\alpha, \beta)$. If $W(t_0) = W[y_1, y_2](t_0) = 0$ for some $t_0 \in (\alpha, \beta)$, then $y_2(t) = cy_1(t)$ for all $t \in (\alpha, \beta)$.

Proof: Suppose $W(t_0) = W[y_1, y_2](t_0) = 0$ for some $t_0 \in (\alpha, \beta)$. We want to show that there are some constants $(c_1, c_2) \neq (0, 0)$ such that

$$c_1y_1(t) + c_2y_2(t) = 0, \quad\text{for all } t \in (\alpha, \beta).$$

We know that the system

$$\begin{cases} c_1y_1(t_0) + c_2y_2(t_0) = 0 \\ c_1y'_1(t_0) + c_2y'_2(t_0) = 0 \end{cases}$$

has a nontrivial solution $(c_1, c_2) \neq (0, 0)$ if and only if $W(t_0) = W[y_1, y_2](t_0) = 0$. For this nontrivial solution $(c_1, c_2)$, let $y(t) = c_1y_1(t) + c_2y_2(t)$ for $t \in (\alpha, \beta)$. Then $y(t)$ is a solution of $L[y](t) = 0$ with the initial conditions $y(t_0) = 0$ and $y'(t_0) = 0$. Since the zero function is also a solution of this initial value problem, by Theorem 5.1.1, $y(t) = c_1y_1(t) + c_2y_2(t) \equiv 0$ on $(\alpha, \beta)$: i.e., $y_1(t)$ and $y_2(t)$ are linearly dependent on $(\alpha, \beta)$. Note that, by Lemma 5.1.2, $W(t) = 0$ for all $t \in (\alpha, \beta)$. $\square$

Corollary 5.1.4 Let $y_1(t)$ and $y_2(t)$ be two solutions of $L[y](t) = 0$ on $(\alpha, \beta)$. Then they are linearly independent on $(\alpha, \beta)$ if and only if

$$W(t_0) = W[y_1, y_2](t_0) \neq 0$$

for some $t_0 \in (\alpha, \beta)$ (and so $W(t) = W[y_1, y_2](t) \neq 0$ for all $t \in (\alpha, \beta)$), and linearly dependent on $(\alpha, \beta)$ if and only if $W(t_0) = W[y_1, y_2](t_0) = 0$ for some $t_0 \in (\alpha, \beta)$ (and so $W(t) = W[y_1, y_2](t) = 0$ for all $t \in (\alpha, \beta)$).

Theorem 5.1.5 Suppose that $y_1(t)$ and $y_2(t)$ are solutions of $L[y](t) = 0$ on $(\alpha, \beta)$. If $y_1(t)$ and $y_2(t)$ are linearly independent on $(\alpha, \beta)$, then

$$y(t) = c_1y_1(t) + c_2y_2(t)$$

is the general solution of $L[y](t) = 0$, that is, any solution is of this form.


Proof: Let $y(t)$ be a solution of $L[y](t) = 0$ on $(\alpha, \beta)$. We want to find some constants $(c_1, c_2)$ such that $y(t) = c_1y_1(t) + c_2y_2(t)$. Fix $t_0 \in (\alpha, \beta)$, write $y_0 = y(t_0)$ and $y'_0 = y'(t_0)$, and consider the system

$$c_1y_1(t_0) + c_2y_2(t_0) = y_0,$$
$$c_1y'_1(t_0) + c_2y'_2(t_0) = y'_0.$$

Since the Wronskian $W(t_0) = y_1(t_0)y'_2(t_0) - y'_1(t_0)y_2(t_0)$ is nonzero by Corollary 5.1.4, the system has the solution

$$c_1 = \frac{y_0y'_2(t_0) - y'_0y_2(t_0)}{y_1(t_0)y'_2(t_0) - y'_1(t_0)y_2(t_0)}, \qquad c_2 = \frac{y'_0y_1(t_0) - y_0y'_1(t_0)}{y_1(t_0)y'_2(t_0) - y'_1(t_0)y_2(t_0)},$$

which gives the solution $\phi(t) = c_1y_1(t) + c_2y_2(t)$ of $L[y](t) = 0$ with $\phi(t_0) = y_0$ and $\phi'(t_0) = y'_0$. By the uniqueness of the solution, $y(t) = \phi(t)$. $\square$

Definition 5.1.3 A set of linearly independent solutions $\{y_1(t), y_2(t)\}$ is called a fundamental set of solutions.

Hence, to find all the solutions of a homogeneous second order linear differential equation (H2O-LDE), it is enough to find two linearly independent solutions.

5.2 H2O-LDE with Constant Coefficients

We now restrict our attention to the homogeneous second order linear differential equations (H2O-LDE) with constant coefficients, of the form:

$$L[y](t) = (aD^2 + bD + c)(y(t)) = ay''(t) + by'(t) + cy(t) = 0,$$

where $a \neq 0$, $b$, $c$ are constants. By a direct inspection of the equation, we can easily recognize that if $y$ is a solution, then $y$, $y'$ and $y''$ must be of the same type, since they must cancel each other. We know that the exponential function has such a property. Thus we guess that a solution is of the form $y(t) = e^{rt}$ for some constant $r$. Then we have

$$L[y](t) = ay''(t) + by'(t) + cy(t) = ar^2e^{rt} + bre^{rt} + ce^{rt} = (ar^2 + br + c)e^{rt} = 0,$$

or $ar^2 + br + c = 0$, since $e^{rt} \neq 0$.


The last equation is called the characteristic equation of the differential equation. Its roots are

$$r_1 = \frac{-b + \sqrt{b^2 - 4ac}}{2a}, \qquad r_2 = \frac{-b - \sqrt{b^2 - 4ac}}{2a}.$$

Case 1: $r_1$ and $r_2$ are distinct real numbers. Then $y_1(t) = e^{r_1t}$ and $y_2(t) = e^{r_2t}$ are two distinct solutions with

$$W(t) = \det\begin{bmatrix} e^{r_1t} & e^{r_2t} \\ r_1e^{r_1t} & r_2e^{r_2t} \end{bmatrix} = (r_2 - r_1)e^{(r_1 + r_2)t} \neq 0.$$

Thus they are linearly independent, so that the general solution is

$$y(t) = c_1e^{r_1t} + c_2e^{r_2t}.$$

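Example 5.2.1 Find the solution $y(t)$ of the initial value problem

$$y'' + 4y' - 2y = 0, \qquad y(0) = 1, \quad y'(0) = 2.$$

Solution: The characteristic equation $r^2 + 4r - 2 = 0$ has two solutions:

$$r_1 = \frac{-4 + \sqrt{16 + 8}}{2} = -2 + \sqrt{6}, \qquad r_2 = \frac{-4 - \sqrt{16 + 8}}{2} = -2 - \sqrt{6}.$$

Thus the general solution is

$$y(t) = c_1e^{(-2 + \sqrt{6})t} + c_2e^{(-2 - \sqrt{6})t}.$$

The constants $c_1$ and $c_2$ are to be determined from the initial conditions

$$y(0) = c_1 + c_2 = 1, \qquad y'(0) = (-2 + \sqrt{6})c_1 + (-2 - \sqrt{6})c_2 = 2.$$

By solving these equations for $c_1$ and $c_2$, we get

$$c_1 = \frac{2}{\sqrt{6}} + \frac{1}{2}, \qquad c_2 = \frac{1}{2} - \frac{2}{\sqrt{6}},$$

so that the particular solution is

$$y(t) = \left(\frac{2}{\sqrt{6}} + \frac{1}{2}\right)e^{(-2 + \sqrt{6})t} + \left(\frac{1}{2} - \frac{2}{\sqrt{6}}\right)e^{(-2 - \sqrt{6})t}. \qquad \square$$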


Case 2: $r_1$ and $r_2$ are complex numbers. They are conjugate to each other:

$$r_1 = \frac{-b + i\sqrt{4ac - b^2}}{2a}, \qquad r_2 = \frac{-b - i\sqrt{4ac - b^2}}{2a}.$$

In this case, we encounter two difficulties: on the one hand, the function $e^{rt}$ has not yet been defined for a complex number $r$, and on the other hand, we still need two real valued solutions.

(1) If we write $r_1 = \lambda + i\mu$, then $r_2 = \lambda - i\mu$. Then by the law of exponents

$$e^{(\lambda + i\mu)t} = e^{\lambda t}e^{i\mu t}.$$

Thus we first need to define $e^{i\mu t}$ for $\mu$ real. For this, recall that

$$e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!} + \cdots,$$

which makes sense even for $x$ complex. Thus

$$e^{i\mu t} = 1 + (i\mu t) + \frac{(i\mu t)^2}{2!} + \cdots + \frac{(i\mu t)^n}{n!} + \cdots$$
$$= \left(1 - \frac{(\mu t)^2}{2!} + \cdots + \frac{(-1)^k(\mu t)^{2k}}{(2k)!} + \cdots\right) + i\left((\mu t) - \frac{(\mu t)^3}{3!} + \cdots + \frac{(-1)^k(\mu t)^{2k+1}}{(2k+1)!} + \cdots\right)$$
$$= \cos\mu t + i\sin\mu t.$$

Therefore,

$$e^{(\lambda + i\mu)t} = e^{\lambda t}e^{i\mu t} = e^{\lambda t}(\cos\mu t + i\sin\mu t).$$

(2) We now have two complex valued functions:

$$y_1(t) = e^{(\lambda + i\mu)t} = e^{\lambda t}e^{i\mu t} = e^{\lambda t}(\cos\mu t + i\sin\mu t),$$
$$y_2(t) = e^{(\lambda - i\mu)t} = e^{\lambda t}e^{-i\mu t} = e^{\lambda t}(\cos\mu t - i\sin\mu t),$$

which are possible solutions, but they are not real valued functions. However, if we write such a complex valued solution as $y(t) = u(t) + iv(t)$ for real valued functions $u(t)$ and $v(t)$, then

$$L[y](t) = a[u''(t) + iv''(t)] + b[u'(t) + iv'(t)] + c[u(t) + iv(t)] = L[u](t) + iL[v](t) = 0.$$

Thus $L[u](t) = 0 = L[v](t)$, so that $u(t)$ and $v(t)$ are also solutions, and these are the two real valued solutions that we are looking for:

$$u(t) = e^{\lambda t}\cos\mu t, \qquad v(t) = e^{\lambda t}\sin\mu t.$$

From a direct computation, we get

$$W[u, v](t) = \mu e^{2\lambda t} \neq 0, \quad\text{if } \mu \neq 0.$$

Therefore, the general solution is

$$y(t) = c_1u(t) + c_2v(t) = e^{\lambda t}(c_1\cos\mu t + c_2\sin\mu t),$$

where $\lambda = \frac{-b}{2a}$ and $\mu = \frac{\sqrt{4ac - b^2}}{2a} \neq 0$.


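Example 5.2.2 Find the solution $y(t)$ of the initial value problem

$$y'' + 2y' + 4y = 0, \qquad y(0) = 1, \quad y'(0) = 1.$$

Solution: The characteristic equation $r^2 + 2r + 4 = 0$ has two solutions:

$$r_1 = -1 + i\sqrt{3}, \qquad r_2 = -1 - i\sqrt{3}.$$

Thus

$$u(t) = e^{-t}\cos\sqrt{3}t, \qquad v(t) = e^{-t}\sin\sqrt{3}t$$

are two real valued solutions, so that the general solution is

$$y(t) = e^{-t}(c_1\cos\sqrt{3}t + c_2\sin\sqrt{3}t).$$

The constants $c_1$ and $c_2$ are to be determined from the initial conditions

$$y(0) = c_1 = 1, \qquad y'(0) = -c_1 + \sqrt{3}c_2 = 1.$$

By solving these equations for $c_1$ and $c_2$, we get

$$c_1 = 1, \qquad c_2 = \frac{2}{\sqrt{3}},$$

so that the particular solution is

$$y(t) = e^{-t}\left(\cos\sqrt{3}t + \frac{2}{\sqrt{3}}\sin\sqrt{3}t\right). \qquad \square$$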


Case 3: $r_1 = r_2$: repeated roots. This is the case in which $b^2 - 4ac = 0$, or $\mu = 0$ in Case 2, so that $r_1 = r_2 = -\frac{b}{2a} = \lambda$, and we have only one solution

$$y_1(t) = e^{\lambda t} = e^{-\frac{b}{2a}t}.$$

To find the general solution, we need a second solution that is not a constant multiple of $y_1(t)$, since $cy_1(t)$ is a linearly dependent solution. In the eighteenth century J. D'Alembert replaced $c$ by a function $v(t)$ and then tried to find $v(t)$ so that $y_2(t) = v(t)y_1(t)$ becomes a solution:

$$y'_2 = vy'_1 + v'y_1, \qquad y''_2 = vy''_1 + 2v'y'_1 + v''y_1,$$
$$\Rightarrow\quad L[y_2] = a(vy''_1 + 2v'y'_1 + v''y_1) + b(vy'_1 + v'y_1) + cvy_1$$
$$= ay_1v'' + (2ay'_1 + by_1)v' + (ay''_1 + by'_1 + cy_1)v$$
$$= ay_1v'' = 0,$$

since $y_1$ is a solution of $ay''(t) + by'(t) + cy(t) = 0$, and $y'_1(t) = -\frac{b}{2a}e^{-\frac{b}{2a}t}$ gives $2ay'_1 + by_1 = 0$. Thus $v'' = 0$, so we may take $v(t) = t$, and $y_2(t) = ty_1(t)$ is a second solution. Moreover,

$$W[y_1, y_2](t) = \det\begin{bmatrix} e^{\lambda t} & te^{\lambda t} \\ \lambda e^{\lambda t} & (1 + \lambda t)e^{\lambda t} \end{bmatrix} = e^{2\lambda t} \neq 0$$

shows that they form a fundamental set of solutions. Therefore, the general solution is

$$y(t) = c_1y_1(t) + c_2ty_1(t) = (c_1 + c_2t)y_1(t).$$


Example 5.2.3 Find the solution $y(t)$ of the initial value problem

$$y'' + 4y' + 4y = 0, \qquad y(0) = 1, \quad y'(0) = 3.$$

Solution: The characteristic equation $r^2 + 4r + 4 = 0$ has two equal solutions: $r_1 = -2 = r_2$. Thus

$$y(t) = e^{-2t}(c_1 + c_2t)$$

is the general solution. From the initial conditions

$$1 = y(0) = c_1, \qquad 3 = y'(0) = -2c_1 + c_2,$$

we get $c_1 = 1$, $c_2 = 5$, so that the particular solution is

$$y(t) = e^{-2t}(1 + 5t). \qquad \square$$
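The three cases of this section can be collected into one routine. The sketch below returns the solution of $ay'' + by' + cy = 0$ with $y(0) = y_0$, $y'(0) = v_0$ as a Python function; the function name and the choice of initial time $t_0 = 0$ are assumptions made for illustration. The constants are obtained by solving the two initial conditions by hand in each case, and the three initial value problems worked out in this section serve as a check.

```python
import math

def solve_constant_coefficient_ivp(a, b, c, y0, v0):
    """Return y(t) solving a*y'' + b*y' + c*y = 0 with y(0) = y0, y'(0) = v0."""
    disc = b * b - 4 * a * c
    lam = -b / (2 * a)
    if disc > 0:                                   # Case 1: distinct real roots
        root = math.sqrt(disc) / (2 * a)
        r1, r2 = lam + root, lam - root
        c1 = (v0 - r2 * y0) / (r1 - r2)
        c2 = (r1 * y0 - v0) / (r1 - r2)
        return lambda t: c1 * math.exp(r1 * t) + c2 * math.exp(r2 * t)
    if disc < 0:                                   # Case 2: complex conjugate roots
        mu = math.sqrt(-disc) / (2 * a)
        c1, c2 = y0, (v0 - lam * y0) / mu
        return lambda t: math.exp(lam * t) * (c1 * math.cos(mu * t) + c2 * math.sin(mu * t))
    c1, c2 = y0, v0 - lam * y0                     # Case 3: repeated root
    return lambda t: math.exp(lam * t) * (c1 + c2 * t)

t = 0.7
s6, s3 = math.sqrt(6.0), math.sqrt(3.0)
print(solve_constant_coefficient_ivp(1, 4, -2, 1, 2)(t),
      (2 / s6 + 0.5) * math.exp((-2 + s6) * t) + (0.5 - 2 / s6) * math.exp((-2 - s6) * t))
print(solve_constant_coefficient_ivp(1, 2, 4, 1, 1)(t),
      math.exp(-t) * (math.cos(s3 * t) + 2 / s3 * math.sin(s3 * t)))
print(solve_constant_coefficient_ivp(1, 4, 4, 1, 3)(t),
      math.exp(-2 * t) * (1 + 5 * t))
```

The test `disc > 0` and `disc < 0` on floating-point input is fragile near a repeated root; with integer coefficients, as here, the discriminant is computed exactly.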


In general, if we have only one solution $y_1(t)$ of an H2O-LDE

$$L[y](t) = (D^2 + p(t)D + q(t))y(t) = y''(t) + p(t)y'(t) + q(t)y(t) = 0,$$

a second solution can be sought in the form $y_2(t) = y_1(t)v(t)$. Then

$$y'_2 = vy'_1 + v'y_1, \qquad y''_2 = vy''_1 + 2v'y'_1 + v''y_1,$$
$$\Rightarrow\quad L[y_2] = vy''_1 + 2v'y'_1 + v''y_1 + p(vy'_1 + v'y_1) + qvy_1$$
$$= y_1v'' + (2y'_1 + py_1)v' + (y''_1 + py'_1 + qy_1)v$$
$$= y_1v'' + (2y'_1 + py_1)v' = 0,$$

since $y_1$ is a solution. By setting $v'(t) = u(t)$, this becomes

$$y_1u' + (2y'_1 + py_1)u = 0,$$

which is now a first order equation. We know what the solution of this equation is:

$$v'(t) = u(t) = \exp\left(-2\int \frac{y'_1}{y_1}\,dt - \int p(t)\,dt\right) = \frac{1}{y_1^2}\exp\left(-\int p(t)\,dt\right).$$

Then,

$$v(t) = \int \frac{\exp\left(-\int p(t)\,dt\right)}{y_1(t)^2}\,dt.$$

It is easy to see that $W[y_1, y_2] \neq 0$, so that $\{y_1(t), y_2(t)\}$ is a fundamental set of solutions, where

$$y_2(t) = y_1(t)v(t) = y_1(t)\int \frac{\exp\left(-\int p(t)\,dt\right)}{y_1(t)^2}\,dt.$$

This is called the method of reduction of order, since the problem is reduced to solving a first order equation.

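Example 5.2.4 Find the general solution $y(t)$ of

$$2t^2y'' + 3ty' - y = 0, \qquad t > 0.$$

Solution: One can easily verify that $y_1(t) = \frac{1}{t}$ is a solution. Set $y_2(t) = \frac{1}{t}v(t)$. Dividing the equation by $2t^2$ gives $p(t) = \frac{3}{2t}$, so that $\exp\left(-\int p(t)\,dt\right) = \exp\left(-\frac{3}{2}\int \frac{1}{t}\,dt\right) = t^{-3/2}$, and

$$v(t) = \int \frac{t^{-3/2}}{y_1(t)^2}\,dt = \int t^2t^{-3/2}\,dt = \int t^{1/2}\,dt = \frac{2}{3}t^{3/2}.$$

Hence, $y_2(t) = \frac{1}{t}\cdot\frac{2}{3}t^{3/2} = \frac{2}{3}t^{1/2}$ and the general solution is

$$y(t) = c_1y_1 + c_2y_2 = \frac{c_1}{t} + c_2t^{1/2},$$

where the factor $\frac{2}{3}$ is absorbed in $c_2$. $\square$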
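The reduction of order formula can also be evaluated numerically when the integrals have no convenient closed form. The sketch below does this for Example 5.2.4 with composite Simpson quadrature. The helper names, the number of subintervals and the choice of lower limit $t = 1$ in both integrals are assumptions made for illustration; a different lower limit only adds a multiple of $y_1$ to $y_2$ and rescales it.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson rule for the integral of g over [a, b], n even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def second_solution(y1, p, t, t_ref=1.0):
    """y2(t) = y1(t) * v(t), v(t) = int_{t_ref}^t exp(-int_{t_ref}^s p) / y1(s)**2 ds."""
    def integrand(s):
        return math.exp(-simpson(p, t_ref, s)) / y1(s) ** 2
    return y1(t) * simpson(integrand, t_ref, t)

# Example 5.2.4 in normalized form: y1 = 1/t and p(t) = 3/(2t).
y1 = lambda t: 1.0 / t
p = lambda t: 1.5 / t

numeric = second_solution(y1, p, 2.0)
# With both lower limits equal to 1, v(t) = (2/3)(t^{3/2} - 1), hence
# y2(t) = (2/3) t^{1/2} - (2/3)/t, a combination of t^{1/2} and 1/t.
exact = (2.0 / 3.0) * (math.sqrt(2.0) - 0.5)
print(numeric, exact)
```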



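Example 5.2.5 Find the solution $y(t)$ of the initial value problem

$$(1 - t^2)y'' + 2ty' - 2y = 0, \qquad y(0) = 3, \quad y'(0) = -4.$$

Solution: One can easily verify that $y_1(t) = t$ is a solution. Now this equation is equivalent to

$$y'' + \frac{2t}{1 - t^2}y' - \frac{2}{1 - t^2}y = 0.$$

Then

$$v(t) = \int \frac{\exp\left(-\int \frac{2t}{1 - t^2}\,dt\right)}{y_1^2(t)}\,dt = \int \frac{\exp(\ln(1 - t^2))}{t^2}\,dt = \int \frac{1 - t^2}{t^2}\,dt = -\left(\frac{1}{t} + t\right),$$

and so $y_2(t) = -t\left(\frac{1}{t} + t\right) = -(1 + t^2)$. Hence $y(t) = c_1t - c_2(1 + t^2)$. Since

$$3 = y(0) = -c_2, \qquad -4 = y'(0) = c_1,$$

we get $c_1 = -4$, $c_2 = -3$, so that the particular solution is

$$y(t) = -4t + 3(1 + t^2). \qquad \square$$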

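Example 5.2.6 [Legendre Equation] Find the general solution $y(t)$ of

$$(1 - t^2)y'' - 2ty' + 2y = 0.$$

Solution: One can easily verify that $y_1(t) = t$ is a solution. Set $y_2(t) = tv(t)$. Then, with $p(t) = -\frac{2t}{1 - t^2}$,

$$\exp\left(-\int p(t)\,dt\right) = \exp\left(\int \frac{2t}{1 - t^2}\,dt\right) = \exp(-\ln(1 - t^2)) = \frac{1}{1 - t^2},$$

and

$$v(t) = \int \frac{\exp\left(-\int p(t)\,dt\right)}{y_1(t)^2}\,dt = \int \frac{1}{(1 - t^2)t^2}\,dt = \int \left(\frac{1}{t^2} + \frac{1}{1 - t^2}\right) dt$$
$$= \int \left(\frac{1}{t^2} + \frac{1}{2}\left(\frac{1}{1 + t} + \frac{1}{1 - t}\right)\right) dt = -\frac{1}{t} + \frac{1}{2}\ln\frac{1 + t}{1 - t}.$$

Hence $y_2(t) = tv(t) = -1 + \frac{t}{2}\ln\frac{1 + t}{1 - t}$, and, absorbing the sign into $c_2$, the general solution is

$$y(t) = c_1t + c_2\left(1 - \frac{t}{2}\ln\frac{1 + t}{1 - t}\right). \qquad \square$$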

5.3 Nonhomogeneous Equations

Nonhomogeneous second order linear differential equations (NH2O-LDE) are of the form:

$$L[y](t) = (D^2 + pD + q)(y(t)) = y''(t) + p(t)y'(t) + q(t)y(t) = g(t),$$

where $p$, $q$, $g$ are continuous functions on an interval $I = (\alpha, \beta)$.

Theorem 5.3.1 Let $Y_1(t)$ and $Y_2(t)$ be any two solutions of $L[y](t) = g(t)$, and let $y_1(t)$ and $y_2(t)$ be linearly independent solutions of $L[y](t) = 0$. Then

$$Y_2(t) = c_1y_1(t) + c_2y_2(t) + Y_1(t),$$

for some constants $c_1$ and $c_2$.

Proof: By a direct computation:

$$L[Y_2 - Y_1](t) = L[Y_2](t) - L[Y_1](t) = g(t) - g(t) = 0.$$

Thus $Y_2(t) - Y_1(t)$ is a solution of $L[y](t) = 0$, that is, by Theorem 5.1.5,

$$Y_2(t) = c_1y_1(t) + c_2y_2(t) + Y_1(t),$$

for some constants $c_1$ and $c_2$. $\square$


Example 5.3.1 Find the general solution of

$$L[y](t) = y'' + y = t.$$

Solution: The characteristic roots are $r = \pm i$, so that a fundamental set of solutions of the homogeneous equation $L[y](t) = y'' + y = 0$ is $\{\cos t, \sin t\}$. It is now easy to verify that $Y_1(t) = t$ is a particular solution of the given nonhomogeneous equation. Thus, the general solution is

$$y(t) = c_1\cos t + c_2\sin t + t. \qquad \square$$


Example 5.3.2 Three particular solutions of a certain NH2O-LDE are known as

$$Y_1(t) = t, \qquad Y_2(t) = t + e^t, \qquad Y_3(t) = 1 + t + e^t.$$

Find the general solution of the NH2O-LDE.

Solution: By Theorem 5.3.1,

$$Y_2(t) - Y_1(t) = e^t, \qquad Y_3(t) - Y_2(t) = 1$$

are solutions of the corresponding homogeneous equation, and they are also linearly independent. Thus the general solution is

$$y(t) = c_1e^t + c_2 + t. \qquad \square$$

5.3.1 Variation of parameters

To find a particular solution of an NH2O-LDE

$$L[y](t) = y''(t) + p(t)y'(t) + q(t)y(t) = g(t), \qquad (5.2)$$

one can use the general solution of the corresponding H2O-LDE:

$$L[y](t) = y''(t) + p(t)y'(t) + q(t)y(t) = 0. \qquad (5.3)$$

Let $y_1(t)$ and $y_2(t)$ be two linearly independent solutions of Equation (5.3). Then the general solution of (5.3) is

$$y_c(t) = c_1y_1(t) + c_2y_2(t).$$

The basic idea in the method of variation of parameters is to replace $c_1$ and $c_2$ by functions $u_1(t)$ and $u_2(t)$, and then determine these functions so that

$$y(t) = u_1(t)y_1(t) + u_2(t)y_2(t)$$

becomes a solution of Equation (5.2).

It seems that we are making the problem more complicated, since we are replacing the problem of finding one unknown function $y(t)$ by a harder problem of finding two unknown functions $u_1(t)$ and $u_2(t)$. However, we will see that they are found as the solutions of two very simple first order equations. The main advantage of the method of variation of parameters is that it is a general method: in principle at least, it can be applied to any equation, and it requires no detailed assumption about the form of the solution. On the other hand, this method eventually requires that we evaluate certain integrals involving the nonhomogeneous term $g(t)$, which may present difficulties.

For $y = u_1 y_1 + u_2 y_2$, compute $L[y]$:

$$\begin{aligned}
y' &= (u_1 y_1' + u_2 y_2') + (u_1' y_1 + u_2' y_2) \\
   &= u_1 y_1' + u_2 y_2', \quad \text{if we impose } u_1' y_1 + u_2' y_2 = 0, \\
y'' &= u_1 y_1'' + u_1' y_1' + u_2 y_2'' + u_2' y_2', \\
\therefore\ L[y] &= (u_1' y_1' + u_2' y_2') + u_1 (y_1'' + p y_1' + q y_1) + u_2 (y_2'' + p y_2' + q y_2) \\
   &= u_1' y_1' + u_2' y_2' = g(t),
\end{aligned}$$

since $y_1(t)$ and $y_2(t)$ are solutions of (5.3). Thus, $y = u_1 y_1 + u_2 y_2$ is a solution of (5.2) if $u_1(t)$ and $u_2(t)$ satisfy

$$\begin{aligned}
u_1' y_1 + u_2' y_2 &= 0, \\
u_1' y_1' + u_2' y_2' &= g(t).
\end{aligned}$$

Since

$$W[y_1, y_2](t) = \det\begin{bmatrix} y_1 & y_2 \\ y_1' & y_2' \end{bmatrix}(t) = y_1(t) y_2'(t) - y_1'(t) y_2(t) \neq 0,$$

one can solve the system of equations for $u_1'(t)$ and $u_2'(t)$:

$$u_1'(t) = -\frac{g(t) y_2(t)}{W[y_1, y_2](t)}, \quad u_2'(t) = \frac{g(t) y_1(t)}{W[y_1, y_2](t)},$$

so that

$$u_1(t) = -\int \frac{g(t) y_2(t)}{W[y_1, y_2](t)}\, dt, \quad u_2(t) = \int \frac{g(t) y_1(t)}{W[y_1, y_2](t)}\, dt.$$

These are the most general forms of the solutions. However, these integrals are not easy to evaluate in general.


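Example 5.3.3 Find the general solution of

$$L[y](t) = t^2 y'' + t y' - y = t \ln t, \quad \text{on } (0, \infty).$$

Solution: One can easily verify that $y_1(t) = t$ and $y_2(t) = \frac{1}{t}$ are linearly independent solutions of the homogeneous equation $L[y](t) = t^2 y'' + t y' - y = 0$, with

$$W[y_1, y_2](t) = y_1(t) y_2'(t) - y_1'(t) y_2(t) = t\left(-\frac{1}{t^2}\right) - \frac{1}{t} = -\frac{2}{t} \neq 0.$$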


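Thus, after dividing the equation by $t^2$ to bring it to the standard form (5.2) with $g(t) = \frac{\ln t}{t}$, we have

$$u_1'(t) = -\frac{g(t) y_2(t)}{W[y_1, y_2](t)} = -\frac{\frac{\ln t}{t} \cdot \frac{1}{t}}{-\frac{2}{t}} = \frac{\ln t}{2t},$$

$$u_2'(t) = \frac{g(t) y_1(t)}{W[y_1, y_2](t)} = \frac{\frac{\ln t}{t} \cdot t}{-\frac{2}{t}} = -\frac{t \ln t}{2},$$

so that

$$u_1(t) = \int \frac{\ln t}{2t}\, dt = \frac{(\ln t)^2}{4},$$

$$u_2(t) = \int -\frac{t \ln t}{2}\, dt = -\frac{t^2 (2 \ln t - 1)}{8}.$$

Consequently, the general solution is

$$y(t) = c_1 t + c_2 \frac{1}{t} + \frac{(\ln t)^2}{4} t - \frac{1}{4} t \ln t,$$

where the term $\frac{t}{8}$ coming from $u_2(t) \cdot \frac{1}{t}$ is absorbed in $c_1 t$. □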

Example 5.3.4 Find a particular solution $y(t)$ of

$$L[y](t) = y'' + y = \tan t, \quad \text{with } y(0) = 1,\ y'(0) = 1, \quad \text{on } \left(-\frac{\pi}{2}, \frac{\pi}{2}\right).$$

Solution: The characteristic roots are $r = \pm i$, so that a fundamental set of solutions of the homogeneous equation $L[y](t) = y'' + y = 0$ is $\{\cos t, \sin t\}$ with

$$W[y_1, y_2](t) = y_1(t) y_2'(t) - y_1'(t) y_2(t) = (\cos t) \cos t - (-\sin t) \sin t = 1 \neq 0.$$

Thus we have

$$u_1'(t) = -\frac{g(t) y_2(t)}{W[y_1, y_2](t)} = -\tan t \sin t = -\frac{\sin^2 t}{\cos t} = \frac{\cos^2 t - 1}{\cos t},$$

$$u_2'(t) = \frac{g(t) y_1(t)}{W[y_1, y_2](t)} = \tan t \cos t = \sin t,$$

so that

$$u_1(t) = -\int \frac{\sin^2 t}{\cos t}\, dt = \int (\cos t - \sec t)\, dt = \sin t - \ln(\sec t + \tan t),$$

$$u_2(t) = \int \sin t\, dt = -\cos t.$$


Consequently, a particular solution is

$$Y(t) = \cos t\, (\sin t - \ln(\sec t + \tan t)) + \sin t\, (-\cos t) = -\cos t \ln(\sec t + \tan t),$$

on $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. Since the general solution is

$$y(t) = c_1 \cos t + c_2 \sin t - \cos t \ln(\sec t + \tan t),$$

for the initial condition,

$$1 = y(0) = c_1, \quad 1 = y'(0) = c_2 - 1, \ \text{or } c_2 = 2.$$

Thus, the solution we are looking for is

$$y(t) = \cos t + 2 \sin t - \cos t \ln(\sec t + \tan t). \qquad \square$$
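The formulas for $u_1$ and $u_2$ can also be evaluated numerically when the integrals are awkward. The following is a minimal sketch under our own assumptions: the helper names `simpson` and `variation_of_parameters` are hypothetical, Simpson's rule stands in for the exact integration, and both integrals are started at $t_0 = 0$. With that choice of integration constants the numerical result differs from $Y(t)$ above by $\sin t$, which is a solution of the homogeneous equation, so both are particular solutions.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def variation_of_parameters(g, y1, y2, wronskian, t, t0=0.0):
    """Return u1(t)*y1(t) + u2(t)*y2(t), where u1(t0) = u2(t0) = 0."""
    u1 = simpson(lambda s: -g(s) * y2(s) / wronskian(s), t0, t)
    u2 = simpson(lambda s: g(s) * y1(s) / wronskian(s), t0, t)
    return u1 * y1(t) + u2 * y2(t)

def closed_form(t):
    """Y(t) = -cos t * ln(sec t + tan t) from Example 5.3.4."""
    return -math.cos(t) * math.log(1.0 / math.cos(t) + math.tan(t))

t = 1.0
numeric = variation_of_parameters(math.tan, math.cos, math.sin,
                                  lambda s: 1.0, t)
# The two particular solutions differ by the homogeneous solution sin t.
difference = numeric - closed_form(t)
print(difference, math.sin(t))
```

The two printed numbers should agree to several digits; the accuracy depends on the number of Simpson subintervals.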

5.3.2 Method of undetermined coefficients

As mentioned earlier, a serious disadvantage of the method of variation of parameters is that the integrations required are often quite difficult. In certain cases it is usually much simpler to guess a particular solution. In this section we will establish a systematic method for guessing solutions of N-H2O LDE with constant coefficients:

$$L[y](t) = a y''(t) + b y'(t) + c y(t) = g(t), \quad a, b, c \text{ constants},$$

whose solutions can be guessed depending on the form of the nonhomogeneous term $g(t)$ as follows:

Case 1: $g(t) = d_0 + d_1 t + \cdots + d_n t^n$, a polynomial in $t$ of degree $n$:

One can easily guess that a solution must also be a polynomial:

$$y(t) = A_0 + A_1 t + \cdots + A_n t^n$$

of the same degree $n$. Then

$$\begin{aligned}
L[y](t) &= a(2A_2 + \cdots + n(n-1) A_n t^{n-2}) + b(A_1 + \cdots + n A_n t^{n-1}) + c(A_0 + A_1 t + \cdots + A_n t^n) \\
&= (2aA_2 + bA_1 + cA_0) + \cdots + (nbA_n + cA_{n-1}) t^{n-1} + cA_n t^n \\
&= d_0 + d_1 t + \cdots + d_n t^n.
\end{aligned}$$


Therefore, for $c \neq 0$,

$$\begin{aligned}
cA_n &= d_n, & A_n &= \frac{d_n}{c}, \\
cA_{n-1} + nbA_n &= d_{n-1}, & A_{n-1} &= \frac{d_{n-1}}{c} - nb\,\frac{d_n}{c^2}, \\
&\ \ \vdots \\
2aA_2 + bA_1 + cA_0 &= d_0, & A_0 &= \frac{1}{c}(d_0 - bA_1 - 2aA_2).
\end{aligned}$$

If $c = 0$ and $b \neq 0$, then for a polynomial $y$ of degree $n$, $L[y](t) = ay'' + by'$ is a polynomial of degree $n - 1$ while $g(t)$ is of degree $n$. Thus the solution can be taken of the form

$$y(t) = t(A_0 + A_1 t + \cdots + A_n t^n),$$

and the coefficients are determined by the same process. Note that we omitted the constant term in this solution since 'y = a constant' is a solution of the homogeneous equation $L[y](t) = ay'' + by' = 0$ and so is contained in the general solution of the homogeneous equation.

If $b = c = 0$, then

$$L[y](t) = a y''(t) = g(t), \quad a \text{ a constant},$$

can be integrated directly to yield a solution:

$$y(t) = \frac{t^2}{a}\left(\frac{d_0}{1 \cdot 2} + \frac{d_1}{2 \cdot 3}\, t + \cdots + \frac{d_n}{(n+1)(n+2)}\, t^n\right).$$


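Example 5.3.5 Find a particular solution of

$$L[y](t) = y'' + y' + y = t^2.$$

Solution: (Method of undetermined coefficients:) Set

$$y(t) = A_0 + A_1 t + A_2 t^2$$

and compute

$$\begin{aligned}
L[y](t) &= y'' + y' + y \\
&= 2A_2 + (A_1 + 2A_2 t) + (A_0 + A_1 t + A_2 t^2) \\
&= (2A_2 + A_1 + A_0) + (A_1 + 2A_2) t + A_2 t^2 \\
&= t^2
\end{aligned}$$

to yield

$$A_2 = 1, \quad A_1 + 2A_2 = 0, \quad 2A_2 + A_1 + A_0 = 0,$$

$$\therefore\ A_1 = -2, \quad A_0 = 0,$$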

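and so $y(t) = -2t + t^2$ is a particular solution.

(Method of variation of parameters:) The characteristic roots are $r = -\frac{1}{2} \pm i\frac{\sqrt{3}}{2}$, so that a fundamental set of solutions of the homogeneous equation $L[y](t) = 0$ is

$$\left\{ y_1(t) = e^{-t/2} \cos\frac{\sqrt{3}}{2} t,\ \ y_2(t) = e^{-t/2} \sin\frac{\sqrt{3}}{2} t \right\}$$

with

$$W[y_1, y_2](t) = y_1(t) y_2'(t) - y_1'(t) y_2(t) = e^{-t}\, \frac{\sqrt{3}}{2} \neq 0.$$

Thus we have

$$u_1(t) = -\int \frac{t^2 e^{-t/2} \sin\frac{\sqrt{3} t}{2}}{e^{-t}\, \frac{\sqrt{3}}{2}}\, dt = -\frac{2}{\sqrt{3}} \int t^2 e^{t/2} \sin\frac{\sqrt{3} t}{2}\, dt,$$

$$u_2(t) = \int \frac{t^2 e^{-t/2} \cos\frac{\sqrt{3} t}{2}}{e^{-t}\, \frac{\sqrt{3}}{2}}\, dt = \frac{2}{\sqrt{3}} \int t^2 e^{t/2} \cos\frac{\sqrt{3} t}{2}\, dt.$$

These integrations are extremely difficult to evaluate. □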
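The back-substitution for the coefficients $A_n, A_{n-1}, \ldots, A_0$ in Case 1 (with $c \neq 0$) is mechanical, so it can be sketched in a few lines. The function name `polynomial_particular` is our own, and the general row used in the loop, $cA_k + b(k+1)A_{k+1} + a(k+1)(k+2)A_{k+2} = d_k$, is obtained by collecting the coefficient of $t^k$ in $L[y](t)$; the text above displays only the first and last rows explicitly.

```python
from fractions import Fraction

def polynomial_particular(a, b, c, d):
    """Coefficients [A_0, ..., A_n] of a polynomial solution of
    a*y'' + b*y' + c*y = d[0] + d[1]*t + ... + d[n]*t**n, for c != 0."""
    if c == 0:
        raise ValueError("this sketch only covers the case c != 0")
    n = len(d) - 1
    # Two extra zero entries stand for A_{n+1} = A_{n+2} = 0.
    A = [Fraction(0)] * (n + 3)
    for k in range(n, -1, -1):
        # Coefficient of t^k: c*A_k + b*(k+1)*A_{k+1} + a*(k+1)*(k+2)*A_{k+2} = d_k
        A[k] = (Fraction(d[k])
                - b * (k + 1) * A[k + 1]
                - a * (k + 1) * (k + 2) * A[k + 2]) / c
    return A[: n + 1]

# Example 5.3.5: y'' + y' + y = t^2
coeffs = polynomial_particular(1, 1, 1, [0, 0, 1])
print(coeffs)
```

For Example 5.3.5 this reproduces $A_0 = 0$, $A_1 = -2$, $A_2 = 1$. Exact rational arithmetic is used so that no rounding enters the comparison.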

Case 2: $g(t) = (d_0 + d_1 t + \cdots + d_n t^n) e^{\alpha t}$:

Set $y(t) = e^{\alpha t} v(t)$. Then

$$y' = e^{\alpha t}(v' + \alpha v), \quad y'' = e^{\alpha t}(v'' + 2\alpha v' + \alpha^2 v),$$

so that

$$L[y](t) = e^{\alpha t}\left(a v'' + (2a\alpha + b) v' + (a\alpha^2 + b\alpha + c) v\right).$$

Consequently, $y(t) = e^{\alpha t} v(t)$ is a solution if and only if

$$a v'' + (2a\alpha + b) v' + (a\alpha^2 + b\alpha + c) v = d_0 + d_1 t + \cdots + d_n t^n,$$

whose solutions are of the form in case 1 according to the following three cases:


(1) If $a\alpha^2 + b\alpha + c \neq 0$: i.e., $\alpha$ is not a root of the characteristic equation $ar^2 + br + c = 0$ so that $e^{\alpha t}$ is not a solution of $L[y](t) = 0$, then

$$y(t) = (A_0 + A_1 t + \cdots + A_n t^n) e^{\alpha t}.$$

(2) If $a\alpha^2 + b\alpha + c = 0$ but $2a\alpha + b \neq 0$: i.e., $\alpha$ is a root of the characteristic equation $ar^2 + br + c = 0$ so that $e^{\alpha t}$ is a solution of $L[y](t) = 0$, but $t e^{\alpha t}$ is not, then

$$y(t) = t(A_0 + A_1 t + \cdots + A_n t^n) e^{\alpha t}.$$

(3) If $a\alpha^2 + b\alpha + c = 0$ and $2a\alpha + b = 0$: i.e., $\alpha$ is a double root of the characteristic equation $ar^2 + br + c = 0$ so that $e^{\alpha t}$ and $t e^{\alpha t}$ are solutions of $L[y](t) = 0$, then

$$y(t) = t^2 (A_0 + A_1 t + \cdots + A_n t^n) e^{\alpha t}.$$


Example 5.3.6 Find the general solution of

$$L[y](t) = y'' - 4y' + 4y = (1 + t + \cdots + t^{27}) e^{2t}.$$

Solution: The characteristic roots are $r_1 = r_2 = 2$, so that this problem belongs to case (3) above, and a fundamental set of solutions of the homogeneous equation $L[y](t) = 0$ is

$$\{ y_1(t) = e^{2t},\ y_2(t) = t e^{2t} \}.$$

A particular solution can be taken as $y(t) = e^{2t} v(t)$, where $v(t)$ satisfies

$$v''(t) = 1 + t + \cdots + t^{27},$$

so that

$$v(t) = \frac{t^2}{1 \cdot 2} + \frac{t^3}{2 \cdot 3} + \cdots + \frac{t^{29}}{28 \cdot 29}.$$

Hence the general solution is

$$y(t) = e^{2t}\left(c_1 + c_2 t + \frac{t^2}{1 \cdot 2} + \frac{t^3}{2 \cdot 3} + \cdots + \frac{t^{29}}{28 \cdot 29}\right).$$

It would be a terrible waste of paper and time if one plugged the expression

$$y(t) = t^2 (A_0 + A_1 t + \cdots + A_{27} t^{27}) e^{2t}$$

into the given equation and tried to find the $A_i$'s. □



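Example 5.3.7 Find a particular solution of

$$L[y](t) = y'' - 3y' + 2y = (1 + t) e^{3t}.$$

Solution: The characteristic roots are $r_1 = 1$ and $r_2 = 2$, so that $e^{3t}$ is not a solution of the homogeneous equation $L[y](t) = 0$, and this problem belongs to case (1) above. Set $y(t) = (A_0 + A_1 t) e^{3t}$ and compute

$$\begin{aligned}
L[y](t) &= y'' - 3y' + 2y \quad (= ay'' + by' + cy) \\
&= e^{3t}\left[A_0 (ar^2 + br + c) + A_1 (2ar + b) + A_1 (ar^2 + br + c)\, t\right] \\
&= e^{3t}(2A_0 + 3A_1 + 2A_1 t), \quad \text{for } r = 3,\ a = 1,\ b = -3,\ c = 2, \\
&= e^{3t}(1 + t).
\end{aligned}$$

Hence, $1 + t = 2A_0 + 3A_1 + 2A_1 t$, and so $A_1 = \frac{1}{2}$, $A_0 = -\frac{1}{4}$. Therefore,

$$y(t) = \left(-\frac{1}{4} + \frac{1}{2} t\right) e^{3t}. \qquad \square$$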

Case 3: $g(t) = (d_0 + d_1 t + \cdots + d_n t^n) \cos \omega t$ or $g(t) = (d_0 + d_1 t + \cdots + d_n t^n) \sin \omega t$:

Lemma 5.3.2 Let $y(t) = u(t) + i v(t)$ be a complex valued solution of

$$L[y](t) = a y''(t) + b y'(t) + c y(t) = g(t) = g_1(t) + i g_2(t), \quad a, b, c \text{ real constants}.$$

Then

$$L[u](t) = g_1(t), \quad L[v](t) = g_2(t).$$

This is quite clear, since

$$L[y](t) = L[u + iv](t) = L[u](t) + i L[v](t) = g_1(t) + i g_2(t).$$

Let $y(t) = u(t) + i v(t)$ be a particular solution of

$$\begin{aligned}
L[y](t) &= (d_0 + d_1 t + \cdots + d_n t^n) e^{i\omega t} \\
&= (d_0 + d_1 t + \cdots + d_n t^n) \cos \omega t + i (d_0 + d_1 t + \cdots + d_n t^n) \sin \omega t.
\end{aligned}$$

Then $y(t) = u(t)$ is a solution of

$$L[y](t) = a y''(t) + b y'(t) + c y(t) = (d_0 + d_1 t + \cdots + d_n t^n) \cos \omega t,$$

and $y(t) = v(t)$ is a solution of

$$L[y](t) = a y''(t) + b y'(t) + c y(t) = (d_0 + d_1 t + \cdots + d_n t^n) \sin \omega t.$$



Example 5.3.8 Find a particular solution of

$$L[y](t) = y'' + 4y = \sin 2t.$$

Solution: The solution is the imaginary part of the solution of

$$L[y](t) = y'' + 4y = e^{i2t}.$$

The characteristic roots are $r = \pm i2$, so that the equation has a particular solution of the form $y(t) = A_0 t e^{i2t}$. Since $y'' = A_0 (i4 - 4t) e^{i2t}$,

$$L[y](t) = y'' + 4y = 4A_0 (i - t) e^{i2t} + 4A_0 t e^{i2t} = i4A_0 e^{i2t}.$$

Thus, $A_0 = \frac{1}{4i} = -\frac{i}{4}$ and

$$y(t) = -\frac{it}{4} e^{i2t} = -\frac{it}{4}(\cos 2t + i \sin 2t) = \frac{t}{4} \sin 2t - i\, \frac{t}{4} \cos 2t.$$

Therefore, $v(t) = -\frac{t}{4} \cos 2t$ is a particular solution.

As a byproduct, we also obtained a particular solution $u(t) = \frac{t}{4} \sin 2t$ of

$$L[y](t) = y'' + 4y = \cos 2t. \qquad \square$$


Example 5.3.9 Find a particular solution of

$$L[y](t) = y'' + 2y' + y = t e^t \cos t.$$

Solution: The solution is the real part of the solution of

$$L[y](t) = y'' + 2y' + y = t e^{(1+i)t}.$$

Note that $1 + i$ is not a characteristic root, since the characteristic roots are $r_1 = r_2 = -1$. Thus the equation has a particular solution of the form $y(t) = (A_0 + A_1 t) e^{(1+i)t}$, and so

$$\begin{aligned}
L[y](t) &= y'' + 2y' + y = t e^{(1+i)t} \\
&= e^{(1+i)t}\left(A_0 ((1+i) + 1)^2 + A_1 (2(1+i) + 2) + A_1 ((1+i) + 1)^2 t\right) \\
&= e^{(1+i)t}\left(A_0 (2+i)^2 + 2A_1 (2+i) + A_1 (2+i)^2 t\right).
\end{aligned}$$

Thus, $A_1 = \frac{1}{(2+i)^2}$ and $A_0 = -\frac{2}{(2+i)^3}$ so that

$$\begin{aligned}
y(t) &= \left(-\frac{2}{(2+i)^3} + \frac{t}{(2+i)^2}\right) e^{(1+i)t} \\
&= \frac{e^t}{125}\left\{[(15t - 4) \cos t + (20t - 22) \sin t] + i\,[(22 - 20t) \cos t + (15t - 4) \sin t]\right\}.
\end{aligned}$$

Therefore, $u(t) = \frac{e^t}{125}[(15t - 4) \cos t + (20t - 22) \sin t]$ is a particular solution.

As a byproduct, we also obtained a particular solution $v(t) = \frac{e^t}{125}[(22 - 20t) \cos t + (15t - 4) \sin t]$ of

$$L[y](t) = y'' + 2y' + y = t e^t \sin t. \qquad \square$$
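The complex arithmetic in this example is easy to get wrong by hand, so it can be useful to let a computer check it. The sketch below uses Python's built-in complex numbers; the variable names are our own, and `F` and `dF` denote $a\alpha^2 + b\alpha + c$ and $2a\alpha + b$ for $a = 1$, $b = 2$, $c = 1$, $\alpha = 1 + i$.

```python
import cmath
import math

alpha = 1 + 1j
F = alpha**2 + 2 * alpha + 1      # a*alpha^2 + b*alpha + c = (2 + i)^2
dF = 2 * alpha + 2                # 2*a*alpha + b = 2(2 + i)

# Matching coefficients in F*A0 + dF*A1 + F*A1*t = t:
A1 = 1 / F
A0 = -dF * A1 / F

def complex_solution(t):
    """y(t) = (A0 + A1*t) * exp((1+i) t)."""
    return (A0 + A1 * t) * cmath.exp(alpha * t)

def u_closed(t):
    """Real part as written in the example."""
    return math.exp(t) / 125 * ((15 * t - 4) * math.cos(t)
                                + (20 * t - 22) * math.sin(t))

t = 0.7
real_part = complex_solution(t).real
print(125 * A1, 125 * A0)
print(real_part, u_closed(t))
```

The first line printed shows $125A_1$ and $125A_0$ in the form $x + yi$, from which the real and imaginary parts of $y(t)$ can be read off after multiplying by $e^t(\cos t + i \sin t)$.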

Case 4: $g(t) = \sum_{j=1}^{k} p_j(t) e^{\alpha_j t}$, where $p_j(t)$ are polynomials:

If $y_j(t)$ is a particular solution of $L[y](t) = p_j(t) e^{\alpha_j t}$, for $j = 1, \ldots, k$, then $y(t) = \sum_{j=1}^{k} y_j(t)$ is a solution of the given equation since

$$L[y](t) = L\left[\sum_{j=1}^{k} y_j\right](t) = \sum_{j=1}^{k} L[y_j](t) = \sum_{j=1}^{k} p_j(t) e^{\alpha_j t}.$$

Thus, a particular solution of

$$L[y](t) = y'' + y' + y = e^t + t \sin t$$

is the sum of the solutions $y_1(t)$ and $y_2(t)$ of

$$L[y](t) = e^t, \quad \text{and} \quad L[y](t) = t \sin t,$$

respectively.

5.4 Applications to Mechanical Vibrations

The problem in this section will illustrate how the coefficients $a$, $b$, $c$ and the nonhomogeneous part $g(t)$ in the N-H2O LDE

$$L[y](t) = a y''(t) + b y'(t) + c y(t) = g(t)$$

affect the solution.

An object of mass $m$ is hanging on an elastic spring of length $\ell$, which is suspended vertically from a ceiling. Hooke's law says that if the spring is stretched or compressed a distance $\Delta\ell$, which is small compared to its length $\ell$, then it exerts a restoring force $F_s$ that is proportional to $\Delta\ell$: i.e.,

$$F_s = k \Delta\ell,$$

for some constant $k$, called the spring constant. In addition, the mass and spring may be immersed in a medium such as oil which impedes the motion of an object through it. This impedance is called a damping force.

If the restoring force, $F_s = k \Delta\ell$, of the spring is exactly balanced by the weight $mg$ of the mass, so that the mass hangs at rest without any external force acting upon it, we say the mass is in the equilibrium position. Thus, in the equilibrium position, the spring has been stretched a distance $\Delta\ell$ so that $k \Delta\ell = mg$; at this position we set $y = 0$.

Let $y(t)$ denote the displacement of the mass from the equilibrium position at time $t$. The total force acting on the mass $m$ is the sum of four separate forces: $W$, $R$, $D$, and $F$:

(1) $W = mg$ is the weight of the mass $m$ pulling it downward. This force is positive since we choose the downward direction as the positive $y$ direction.

(2) $R$ is the restoring force of the spring, which is negatively proportional to the elongation or compression, $\Delta\ell + y$, i.e., $R = -k(\Delta\ell + y)$.

(3) $D$ is the damping (or resistance) force of the medium. This force always acts in the direction opposite to the direction of motion, and is directly proportional to the magnitude of the velocity $\frac{dy}{dt}$, i.e., $D = -c \frac{dy}{dt}$.

(4) $F$ is the external force acting on the mass. This force in general will depend explicitly on time.

Newton's second law of motion is written as

$$\begin{aligned}
m y''(t) &= W + R + D + F = mg - k(\Delta\ell + y) - c \frac{dy}{dt} + F(t) \\
&= -k y(t) - c y'(t) + F(t),
\end{aligned}$$

since $mg = k \Delta\ell$. Thus, we obtain a second-order linear differential equation:

$$m y''(t) + c y'(t) + k y(t) = F(t),$$

where $m$, $c$, $k$ are nonnegative constants.

I. Undamped free vibrations: No damping force and no external force are present: $c = 0$, and $F(t) = 0$,

$$m y''(t) + k y(t) = 0, \quad \text{or} \quad y'' + \omega_0^2 y = 0, \quad \omega_0^2 = \frac{k}{m}.$$


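Since the characteristic roots are $r = \pm i\omega_0$, the general solution is

$$\begin{aligned}
y(t) &= a \cos \omega_0 t + b \sin \omega_0 t \\
&= R \cos\delta \cos \omega_0 t + R \sin\delta \sin \omega_0 t, && \text{with } a = R \cos\delta,\ b = R \sin\delta, \\
&= R \cos(\omega_0 t - \delta), && \text{with } R = \sqrt{a^2 + b^2},\ \delta = \tan^{-1} \frac{b}{a},
\end{aligned}$$

where $R$ is called the amplitude, $\omega_0$ the natural frequency, and $\delta$ the phase angle. This solution is called simple harmonic motion.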
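The conversion from $(a, b)$ to $(R, \delta)$ can be sketched as follows. The function name `amplitude_phase` is our own. One design choice is worth flagging: the sketch uses the two-argument arctangent `math.atan2(b, a)` rather than $\tan^{-1}\frac{b}{a}$, because the latter only returns angles in $(-\frac{\pi}{2}, \frac{\pi}{2})$ and so misses the correct $\delta$ when $a < 0$, whereas `atan2` satisfies $a = R\cos\delta$, $b = R\sin\delta$ in every quadrant.

```python
import math

def amplitude_phase(a, b):
    """Return (R, delta) with a = R*cos(delta) and b = R*sin(delta)."""
    R = math.hypot(a, b)
    delta = math.atan2(b, a)
    return R, delta

# A case with a < 0, where tan^{-1}(b/a) alone would give the wrong quadrant.
a, b, omega0 = -3.0, 4.0, 2.0
R, delta = amplitude_phase(a, b)

t = 0.37
lhs = a * math.cos(omega0 * t) + b * math.sin(omega0 * t)
rhs = R * math.cos(omega0 * t - delta)
print(R, delta, lhs, rhs)
```

The last two printed values should coincide up to rounding, confirming that both expressions describe the same motion.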

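[Figure: graph of $y = R \cos(\omega_0 t - \delta)$ against $t$, oscillating between $-R$ and $R$ with period $T = \frac{2\pi}{\omega_0}$ and initial value $R \cos\delta$; alongside, a right triangle with legs $a$ and $b$, hypotenuse $R = \sqrt{a^2 + b^2}$, and angle $\delta$.]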

II. Damped free vibrations: $c \neq 0$, but no external force is present, $F(t) = 0$,

$$m y''(t) + c y'(t) + k y(t) = 0,$$

which is a H2O LDE. The characteristic roots are

$$r_1 = \frac{-c + \sqrt{c^2 - 4km}}{2m}, \quad r_2 = \frac{-c - \sqrt{c^2 - 4km}}{2m}.$$

Three general solutions are possible depending on the discriminant $c^2 - 4km$. Note that since $m, c, k \geq 0$, $c^2 - 4km \leq c^2$.

(1) If $c^2 - 4km > 0$, then $y(t) = a e^{r_1 t} + b e^{r_2 t}$ with $r_i < 0$. This motion is called overdamped.

(2) If $c^2 - 4km = 0$, then $y(t) = e^{\lambda t}(a + bt)$ with $\lambda = -\frac{c}{2m}$. This motion is called critically damped.

(3) If $c^2 - 4km < 0$, then, by writing $r_j = \lambda \pm i\mu$ with $\lambda = -\frac{c}{2m} < 0$, $\mu = \frac{\sqrt{4km - c^2}}{2m}$,

$$y(t) = e^{\lambda t}(a \cos \mu t + b \sin \mu t) = R e^{\lambda t} \cos(\mu t - \delta).$$

This motion is called underdamped, which occurs quite often in mechanical systems and represents a damped vibration.

Note that in any case $r_i, \lambda < 0$, so that $e^{r_i t}, e^{\lambda t} \to 0$ as $t \to \infty$. In particular, in the third case, the displacement $y(t)$ oscillates between the decreasing amplitude curves $\pm R e^{\lambda t} = \pm R e^{-\frac{c}{2m} t}$, and dies out as $t$ increases.

III. Damped and forced vibrations: A damping force and a periodic external force are present: $c \neq 0$ and $F(t) = F_0 \cos \omega t$,

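$$m y''(t) + c y'(t) + k y(t) = F_0 \cos \omega t,$$

which is a N-H2O LDE. Using the method of undetermined coefficients, a particular solution is the real part of the solution $y(t) = A e^{i\omega t}$ of the equation with right-hand side $F_0 e^{i\omega t}$, with $A = \frac{F_0}{(k - m\omega^2) + ic\omega}$:

$$\begin{aligned}
\psi(t) &= \frac{F_0}{(k - m\omega^2)^2 + (c\omega)^2}\left[(k - m\omega^2) \cos \omega t + c\omega \sin \omega t\right] \\
&= \frac{F_0}{(k - m\omega^2)^2 + (c\omega)^2}\left[(k - m\omega^2)^2 + (c\omega)^2\right]^{\frac{1}{2}} \cos(\omega t - \delta) \\
&= \frac{F_0 \cos(\omega t - \delta)}{\left[(k - m\omega^2)^2 + (c\omega)^2\right]^{\frac{1}{2}}},
\end{aligned}$$

where $\tan\delta = \frac{c\omega}{k - m\omega^2}$. Hence, the general solution of the N-H2O LDE is of the form

$$y(t) = \phi(t) + \psi(t) = \phi(t) + \frac{F_0 \cos(\omega t - \delta)}{\left[(k - m\omega^2)^2 + (c\omega)^2\right]^{\frac{1}{2}}},$$

where $\phi(t)$ is the general solution of the homogeneous equation described in case II. Since in any case $\phi(t) \to 0$ as $t \to \infty$, for large $t$, $y(t) \approx \psi(t)$ describes very accurately the position of the mass $m$ regardless of its initial position and velocity. For this reason, $\psi(t)$ is called the steady state part, while $\phi(t)$ is called the transient part of the solution.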
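The amplitude and phase of the steady state part can be read off directly from the complex number $A$. The following is a small sketch with a hypothetical helper `steady_state` and sample values of $m$, $c$, $k$, $F_0$, $\omega$ chosen by us for illustration; it relies on the fact that $\mathrm{Re}(A e^{i\omega t}) = |A| \cos(\omega t + \arg A)$, so that $\delta = -\arg A$.

```python
import cmath
import math

def steady_state(m, c, k, F0, omega):
    """Amplitude and phase lag delta of psi(t) = amplitude*cos(omega*t - delta)."""
    A = F0 / complex(k - m * omega**2, c * omega)
    return abs(A), -cmath.phase(A)

# Sample values chosen for illustration only.
m, c, k, F0, omega = 1.0, 2.0, 5.0, 10.0, 1.0
amplitude, delta = steady_state(m, c, k, F0, omega)

# Compare with the closed-form amplitude from the text.
expected = F0 / math.hypot(k - m * omega**2, c * omega)

# Compare psi(t) written both ways at one sample time.
t = 0.9
denom = (k - m * omega**2) ** 2 + (c * omega) ** 2
psi_sum = F0 / denom * ((k - m * omega**2) * math.cos(omega * t)
                        + c * omega * math.sin(omega * t))
psi_phase = amplitude * math.cos(omega * t - delta)
print(amplitude, expected, delta, psi_sum, psi_phase)
```

Using the complex phase avoids the quadrant ambiguity in $\tan\delta = \frac{c\omega}{k - m\omega^2}$ when $k - m\omega^2 < 0$.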

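IV. Undamped, but forced vibrations: No damping force, $c = 0$, and a periodic external force is present: $F(t) = F_0 \cos \omega t$,

$$m y''(t) + k y(t) = F_0 \cos \omega t, \quad \text{or} \quad y'' + \omega_0^2 y = \frac{F_0}{m} \cos \omega t, \quad \omega_0^2 = \frac{k}{m}.$$

If $\omega_0 \neq \omega$, then the general solution is

$$y(t) = a \cos \omega_0 t + b \sin \omega_0 t + \frac{F_0}{m(\omega_0^2 - \omega^2)} \cos \omega t,$$

which is the sum of two periodic functions of different frequencies and amplitudes. Suppose that the mass $m$ is initially at rest so that $y(0) = 0$ and $y'(0) = 0$. Then $a = -\frac{F_0}{m(\omega_0^2 - \omega^2)}$ and $b = 0$. Thus

$$\begin{aligned}
y(t) &= \frac{F_0}{m(\omega_0^2 - \omega^2)}(\cos \omega t - \cos \omega_0 t) \\
&= \left[\frac{2F_0}{m(\omega_0^2 - \omega^2)} \sin \frac{(\omega_0 - \omega) t}{2}\right] \sin \frac{(\omega_0 + \omega) t}{2}.
\end{aligned}$$

If $|\omega_0 - \omega|$ is small, then $\omega_0 + \omega > |\omega_0 - \omega|$, and so $\sin \frac{(\omega_0 + \omega) t}{2}$ is a rapidly oscillating function compared to $\sin \frac{(\omega_0 - \omega) t}{2}$. Thus the motion is a rapid oscillation with frequency $\frac{\omega_0 + \omega}{2}$, but with a slowly varying sinusoidal amplitude:

$$\frac{2F_0}{m(\omega_0^2 - \omega^2)} \sin \frac{(\omega_0 - \omega) t}{2}.$$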

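[Figure: graph of $y$ against $t$ showing a beat: a rapid oscillation $\sin \frac{(\omega_0 + \omega) t}{2}$ inside the slowly varying envelope $\pm M \sin \frac{(\omega_0 - \omega) t}{2}$.]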

This type of motion, with a periodic variation of amplitude, is called a beat, which occurs frequently in acoustics when two tuning forks of nearly equal frequency are sounded simultaneously. In electronics the variation of the amplitude with time is called amplitude modulation.

The interesting case is when $\omega_0 = \omega$: that is, when the frequency $\omega$ of the external force equals the natural frequency of the system. In the equation

$$y'' + \omega_0^2 y = \frac{F_0}{m} \cos \omega_0 t,$$

the nonhomogeneous term $\frac{F_0}{m} \cos \omega_0 t$ is a solution of the homogeneous equation. A particular solution $u(t)$ is the real part of a solution $\phi(t) = A t e^{i\omega_0 t}$ of

$$y'' + \omega_0^2 y = \frac{F_0}{m} e^{i\omega_0 t},$$

where $e^{i\omega_0 t}$ is a solution of the homogeneous equation $y'' + \omega_0^2 y = 0$. With a little work one finds $A = -\frac{iF_0}{2m\omega_0}$, and

$$\phi(t) = A t e^{i\omega_0 t} = \frac{F_0 t}{2m\omega_0} \sin \omega_0 t - i\, \frac{F_0 t}{2m\omega_0} \cos \omega_0 t,$$

so that

$$u(t) = \frac{F_0 t}{2m\omega_0} \sin \omega_0 t.$$

Thus, the general solution of the given equation is

$$y(t) = c_1 \cos \omega_0 t + c_2 \sin \omega_0 t + \frac{F_0}{2m\omega_0}\, t \sin \omega_0 t.$$

[Figure: graph of the resonant solution, oscillating like $\sin \omega_0 t$ between the lines $y = \pm M t$.]

Note that $u(t)$ is unbounded as $t \to \infty$ regardless of the $c_i$'s, since its amplitude grows along the lines $y = \pm \frac{F_0}{2m\omega_0} t$. This motion is known as resonance. In actual practice, the spring would probably break down.

The collapse of the Tacoma Narrows Bridge in Washington State in 1940 is often cited as the most famous example of this case. When soldiers cross a bridge, they traditionally break step to eliminate the periodic force of their marching that could resonate with a natural frequency of the bridge.

5.5 Series Solutions

So far, we have shown a systematic procedure for constructing a fundamental set of solutions only for equations with constant coefficients. The principal tool to deal with the much larger class of equations with variable coefficients is to represent given functions by power series.

Consider the general H2O LDE:

$$L[y](t) = P(t) y''(t) + Q(t) y'(t) + R(t) y(t) = 0. \qquad (5.4)$$

A wide class of problems in mathematical physics leads to equations of this form having polynomial coefficients. Assume that $P$, $Q$, $R$ are polynomials in $t$. It turns out that the solution of the equation (5.4) in an interval containing $t_0$ is closely associated with the behavior of $P$ in that interval. A point $t_0$ such that $P(t_0) \neq 0$ is called an ordinary point. Since $P$ is continuous, there is an interval $I = (\alpha, \beta)$ about $t_0$ in which $P(t) \neq 0$. In that interval we can divide the equation by $P(t)$ to get the standard form:

$$y''(t) + p(t) y'(t) + q(t) y(t) = 0, \quad p(t) = \frac{Q(t)}{P(t)}, \quad q(t) = \frac{R(t)}{P(t)}. \qquad (5.5)$$

In this case, one can guess that the solution is also a polynomial with unknown degree. Thus we propose that the solution can be a power series of the form $\sum_{n=0}^{\infty} a_n (t - t_0)^n$ with a certain interval of convergence $(t_0 - \rho, t_0 + \rho)$, $\rho > 0$. At first sight, it seems quite unattractive to seek a solution in this way. But this is actually a convenient and useful form for a solution. Indeed, even if we can obtain a solution in terms of elementary functions, such as exponential or trigonometric functions, we are likely to need a power series if we want to evaluate them numerically or to plot their graphs.


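Example 5.5.1 Find the general solution of

$$L[y](t) = y''(t) + y(t) = 0, \quad \text{on } \mathbb{R}.$$

Solution: From the previous sections, we know that $y(t) = c_1 \cos t + c_2 \sin t$ is the general solution, which is not a polynomial. However, since $P(t) = 1 = R(t)$ and $Q(t) = 0$ are polynomials, $t = 0$ is an ordinary point. We guess the solutions are also polynomials, but do not know what the degree is. Thus, we expect the solution to be a power series:

$$y(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n + \cdots = \sum_{n=0}^{\infty} a_n t^n.$$

$$\begin{aligned}
\therefore\ y''(t) + y(t) &= \sum_{n=2}^{\infty} (n-1) n a_n t^{n-2} + \sum_{n=0}^{\infty} a_n t^n \\
&= \sum_{n=0}^{\infty} [(n+1)(n+2) a_{n+2} + a_n] t^n = 0.
\end{aligned}$$

Thus $(n+1)(n+2) a_{n+2} + a_n = 0$ for all $n$, and so

$$\begin{aligned}
n = 0:&\quad 1 \cdot 2\, a_2 + a_0 = 0 \ \Rightarrow\ a_2 = -\frac{a_0}{1 \cdot 2}, \\
n = 1:&\quad 2 \cdot 3\, a_3 + a_1 = 0 \ \Rightarrow\ a_3 = -\frac{a_1}{1 \cdot 2 \cdot 3}, \\
n = 2:&\quad 3 \cdot 4\, a_4 + a_2 = 0 \ \Rightarrow\ a_4 = -\frac{a_2}{3 \cdot 4} = \frac{a_0}{4!}, \\
n = 3:&\quad 4 \cdot 5\, a_5 + a_3 = 0 \ \Rightarrow\ a_5 = -\frac{a_3}{4 \cdot 5} = \frac{a_1}{5!}, \\
&\quad \vdots \\
n = 2k:&\quad a_{2k} = (-1)^k \frac{a_0}{(2k)!}, \\
n = 2k+1:&\quad a_{2k+1} = (-1)^k \frac{a_1}{(2k+1)!}.
\end{aligned}$$

Therefore,

$$\begin{aligned}
y(t) &= a_0 \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n}}{(2n)!} + a_1 \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n+1}}{(2n+1)!} \\
&= a_0 \cos t + a_1 \sin t. \qquad \square
\end{aligned}$$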
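The recurrence $(n+1)(n+2) a_{n+2} + a_n = 0$ can be run numerically to see the partial sums approach $\cos t$ and $\sin t$. This is a minimal sketch; the function names `series_coefficients` and `evaluate` and the truncation at 30 terms are our own choices.

```python
import math

def series_coefficients(a0, a1, count):
    """First `count` coefficients from a_{n+2} = -a_n / ((n+1)(n+2))."""
    a = [a0, a1]
    for n in range(count - 2):
        a.append(-a[n] / ((n + 1) * (n + 2)))
    return a

def evaluate(coeffs, t):
    """Partial sum of sum_n a_n t^n."""
    return sum(c * t**n for n, c in enumerate(coeffs))

t = 1.2
cos_approx = evaluate(series_coefficients(1.0, 0.0, 30), t)  # a0 = 1, a1 = 0
sin_approx = evaluate(series_coefficients(0.0, 1.0, 30), t)  # a0 = 0, a1 = 1
print(cos_approx, math.cos(t))
print(sin_approx, math.sin(t))
```

Choosing $(a_0, a_1) = (1, 0)$ and $(0, 1)$ picks out the two fundamental solutions, which is the same device used for the linear independence arguments in the examples of this section.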


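Example 5.5.2 Find the general solution of

$$L[y](t) = y''(t) - 2t y'(t) - 2y(t) = 0, \quad \text{on } \mathbb{R}.$$

Solution: Since $P(t) = 1$, $Q(t) = -2t$ and $R(t) = -2$ are polynomials, we guess the solutions are also polynomials, but do not know what the degree is. So we set

$$y(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n + \cdots = \sum_{n=0}^{\infty} a_n t^n.$$

Then

$$\begin{aligned}
L[y](t) &= \sum_{n=2}^{\infty} (n-1) n a_n t^{n-2} - 2t \sum_{n=1}^{\infty} n a_n t^{n-1} - 2 \sum_{n=0}^{\infty} a_n t^n \\
&= \sum_{n=0}^{\infty} (n+1)(n+2) a_{n+2} t^n - \sum_{n=0}^{\infty} 2n a_n t^n - \sum_{n=0}^{\infty} 2 a_n t^n \\
&= \sum_{n=0}^{\infty} [(n+1)(n+2) a_{n+2} - 2n a_n - 2 a_n] t^n = 0.
\end{aligned}$$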

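Thus

$$(n+1)(n+2) a_{n+2} - 2n a_n - 2 a_n = (n+1)(n+2) a_{n+2} - 2(n+1) a_n = 0, \quad \forall\, n,$$

or $a_{n+2} = \frac{2}{n+2} a_n$:

$$a_{2n} = \frac{2}{2n}\, a_{2(n-1)} = \frac{1}{n} \cdot \frac{1}{n-1}\, a_{2(n-2)} = \cdots = \frac{1}{n!}\, a_0,$$

$$a_{2n+1} = \frac{2}{2n+1}\, a_{2n-1} = \frac{2}{2n+1} \cdot \frac{2}{2n-1}\, a_{2n-3} = \cdots = \frac{2^n}{3 \cdot 5 \cdots (2n+1)}\, a_1.$$

Therefore, the solution is

$$\begin{aligned}
y(t) &= a_0 + a_1 t + a_0 t^2 + \frac{2}{3} a_1 t^3 + \frac{1}{2} a_0 t^4 + \frac{2^2}{3 \cdot 5} a_1 t^5 + \cdots \\
&= a_0 \left(1 + t^2 + \frac{1}{2!} t^4 + \frac{1}{3!} t^6 + \cdots + \frac{1}{n!} t^{2n} + \cdots\right) \\
&\quad + a_1 \left(t + \frac{2}{3} t^3 + \frac{2^2}{3 \cdot 5} t^5 + \cdots + \frac{2^n}{3 \cdot 5 \cdots (2n+1)} t^{2n+1} + \cdots\right) \\
&= a_0 e^{t^2} + a_1 \left(\sum_{n=0}^{\infty} \frac{2^n}{3 \cdot 5 \cdots (2n+1)} t^{2n+1}\right) \\
&= a_0 y_0(t) + a_1 y_1(t).
\end{aligned}$$

Note that $y_0(t)$ is the solution when $a_0 = y(0) = 1$ and $a_1 = y'(0) = 0$, while $y_1(t)$ is the solution when $a_0 = y(0) = 0$ and $a_1 = y'(0) = 1$. Thus $W[y_0, y_1](0) = 1 \neq 0$ shows that they are linearly independent. □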

In the general H2O LDE (5.4), it really wasn't necessary to assume that the functions $P(t)$, $Q(t)$ and $R(t)$ are polynomials. We only need that they can be expressed as power series about $t = t_0$ (such functions are said to be analytic at $t_0$). Of course, we would expect in this case the algebra to be much more cumbersome.

Theorem 5.5.1 Suppose that $p(t) = \frac{Q(t)}{P(t)}$ and $q(t) = \frac{R(t)}{P(t)}$ are analytic (i.e., they have convergent Taylor series expansions) at $t = t_0$ with radius of convergence $\rho > 0$. Then every solution of (5.4) is also analytic at $t = t_0$ with radius of convergence at least $\rho$, and is of the form

$$y(t) = \sum_{n=0}^{\infty} a_n (t - t_0)^n.$$

Note that the radius of convergence of $y(t)$ is determined by that of $p(t) = \frac{Q}{P}$ and $q(t) = \frac{R}{P}$, rather than by that of $P$, $Q$, and $R$, since the standard form of (5.4) is

$$L[y](t) = y''(t) + p(t) y'(t) + q(t) y(t) = 0.$$

Example 5.5.3 Find the particular solution of

$$L[y](t) = y''(t) + \frac{3t}{1 + t^2}\, y'(t) + \frac{1}{1 + t^2}\, y(t) = 0, \quad \text{with } y(0) = 2,\ y'(0) = 3.$$

Solution: Since $p(t)$ and $q(t)$ are not polynomials, we change the equation into the form

$$(1 + t^2) y''(t) + 3t y'(t) + y(t) = 0.$$

We set $y(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n + \cdots = \sum_{n=0}^{\infty} a_n t^n$. Then

$$\begin{aligned}
L[y](t) &= (1 + t^2) \sum_{n=2}^{\infty} (n-1) n a_n t^{n-2} + 3t \sum_{n=1}^{\infty} n a_n t^{n-1} + \sum_{n=0}^{\infty} a_n t^n \\
&= \sum_{n=0}^{\infty} (n+1)(n+2) a_{n+2} t^n + \sum_{n=0}^{\infty} [n(n-1) + 3n + 1] a_n t^n \\
&= \sum_{n=0}^{\infty} [(n+1)(n+2) a_{n+2} + (n+1)^2 a_n] t^n = 0.
\end{aligned}$$

Thus $(n+1)(n+2) a_{n+2} + (n+1)^2 a_n = 0$, $\forall\, n$, or $a_{n+2} = -\frac{n+1}{n+2}\, a_n$, and so

$$a_{2n} = (-1)^n\, \frac{1 \cdot 3 \cdots (2n-1)}{2 \cdot 4 \cdots (2n)}\, a_0 = (-1)^n\, \frac{(2n)!}{2^{2n} (n!)^2}\, a_0,$$

$$a_{2n+1} = (-1)^n\, \frac{2 \cdot 4 \cdots (2n)}{3 \cdot 5 \cdots (2n+1)}\, a_1 = (-1)^n\, \frac{2^{2n} (n!)^2}{(2n+1)!}\, a_1.$$


Therefore,

$$\begin{aligned}
y(t) &= a_0 \sum_{n=0}^{\infty} (-1)^n\, \frac{(2n)!}{(2^n n!)^2}\, t^{2n} + a_1 \sum_{n=0}^{\infty} (-1)^n\, \frac{(2^n n!)^2}{(2n+1)!}\, t^{2n+1} \\
&= a_0 y_0(t) + a_1 y_1(t).
\end{aligned}$$

Note that $y_0(t)$ is the solution when $a_0 = y(0) = 1$ and $a_1 = y'(0) = 0$, while $y_1(t)$ is the solution when $a_0 = y(0) = 0$ and $a_1 = y'(0) = 1$. Thus $W[y_0, y_1](0) = 1 \neq 0$ shows that they are linearly independent. They both converge absolutely for $t$ with $|t| < 1$. In fact, the power series of $p(t) = \frac{3t}{1 + t^2}$ and $q(t) = \frac{1}{1 + t^2}$ about $t = 0$ converge absolutely for $t$ with $|t| < 1$. Moreover, for $a_0 = y(0) = 2$ and $a_1 = y'(0) = 3$, the particular solution is

$$y(t) = 2 y_0(t) + 3 y_1(t). \qquad \square$$

Remark: If $P$, $Q$, $R$ are polynomials, then it is known that $\frac{Q}{P}$ has a convergent power series about $t_0$ for $P(t_0) \neq 0$, and the radius of convergence of $p = \frac{Q}{P}$ is precisely the distance from $t_0$ to the nearest zero of $P$ in $\mathbb{C}$. In the above example, $P(t) = 1 + t^2 = 0$ if and only if $t = \pm i$. Since $t_0 = 0$, $\rho = |i| = 1$.

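Example 5.5.4 Solve the initial value problem:

$$L[y](t) = y''(t) + t^2 y'(t) + 2t y(t) = 0, \quad \text{with } y(0) = 1,\ y'(0) = 0.$$

Solution: We set $y(t) = \sum_{n=0}^{\infty} a_n t^n$. Then

$$\begin{aligned}
L[y](t) &= \sum_{n=2}^{\infty} (n-1) n a_n t^{n-2} + t^2 \sum_{n=1}^{\infty} n a_n t^{n-1} + 2t \sum_{n=0}^{\infty} a_n t^n \\
&= \sum_{n=-1}^{\infty} (n+2)(n+3) a_{n+3} t^{n+1} + \sum_{n=0}^{\infty} n a_n t^{n+1} + \sum_{n=0}^{\infty} 2 a_n t^{n+1} \\
&= 2a_2 + \sum_{n=0}^{\infty} [(n+2)(n+3) a_{n+3} + (n+2) a_n] t^{n+1} = 0.
\end{aligned}$$

Thus $a_2 = 0$ and $(n+2)(n+3) a_{n+3} + (n+2) a_n = 0$, $\forall\, n \geq 0$, or $a_{n+3} = -\frac{1}{n+3}\, a_n$. Since $a_0 = y(0) = 1$, $a_1 = y'(0) = 0$, and $a_2 = 0$,

$$\begin{aligned}
a_4 &= a_7 = a_{10} = \cdots = 0, \\
a_5 &= a_8 = a_{11} = \cdots = 0, \\
a_3 &= -\frac{a_0}{3} = -\frac{1}{3}, \\
a_6 &= -\frac{1}{6}\, a_3 = \frac{1}{3 \cdot 6}, \\
a_9 &= -\frac{1}{9}\, a_6 = -\frac{1}{3 \cdot 6 \cdot 9}, \\
&\ \ \vdots \\
a_{3n} &= \frac{(-1)^n}{3 \cdot 6 \cdots (3n)} = \frac{(-1)^n}{3^n n!}.
\end{aligned}$$

Therefore,

$$y(t) = \sum_{n=0}^{\infty} \frac{(-1)^n}{3^n n!}\, t^{3n} = 1 - \frac{t^3}{3} + \frac{t^6}{3 \cdot 6} - \frac{t^9}{3 \cdot 6 \cdot 9} + \cdots,$$

which converges for all $t$, since the power series of $p(t) = t^2$ and $q(t) = 2t$ converge for all $t \in \mathbb{R}$. □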


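Example 5.5.5 Solve the initial value problem:

$$L[y](t) = (t^2 - 2t) y''(t) + 5(t - 1) y'(t) + 3 y(t) = 0, \quad \text{with } y(1) = 7,\ y'(1) = 3.$$

Solution: Since $t_0 = 1$, we set $y(t) = \sum_{n=0}^{\infty} a_n (t - 1)^n$, and

$$P(t) = t^2 - 2t = t^2 - 2t + 1 - 1 = (t - 1)^2 - 1.$$

Then

$$\begin{aligned}
L[y](t) &= [(t-1)^2 - 1] \sum_{n=2}^{\infty} (n-1) n a_n (t-1)^{n-2} + 5(t-1) \sum_{n=1}^{\infty} n a_n (t-1)^{n-1} + 3 \sum_{n=0}^{\infty} a_n (t-1)^n \\
&= -\sum_{n=0}^{\infty} (n+1)(n+2) a_{n+2} (t-1)^n + \sum_{n=2}^{\infty} (n-1) n a_n (t-1)^n + \sum_{n=1}^{\infty} 5n a_n (t-1)^n + 3 \sum_{n=0}^{\infty} a_n (t-1)^n \\
&= -\sum_{n=0}^{\infty} (n+1)(n+2) a_{n+2} (t-1)^n + \sum_{n=0}^{\infty} (n^2 + 4n + 3) a_n (t-1)^n = 0.
\end{aligned}$$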

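Thus $-(n+1)(n+2) a_{n+2} + (n^2 + 4n + 3) a_n = 0$, $\forall\, n \geq 0$, or $a_{n+2} = \frac{n+3}{n+2}\, a_n$. Since $a_0 = y(1) = 7$ and $a_1 = y'(1) = 3$,

$$a_2 = \frac{3}{2}\, a_0 = \frac{3}{2} \cdot 7, \quad a_4 = \frac{5}{4}\, a_2 = \frac{5}{4} \cdot \frac{3}{2} \cdot 7, \quad a_6 = \frac{7}{6}\, a_4 = \frac{7}{6} \cdot \frac{5}{4} \cdot \frac{3}{2} \cdot 7, \ \cdots$$

$$a_3 = \frac{4}{3}\, a_1 = \frac{4}{3} \cdot 3, \quad a_5 = \frac{6}{5}\, a_3 = \frac{6}{5} \cdot \frac{4}{3} \cdot 3, \quad a_7 = \frac{8}{7}\, a_5 = \frac{8}{7} \cdot \frac{6}{5} \cdot \frac{4}{3} \cdot 3, \ \cdots$$

$$a_{2n} = \frac{3 \cdot 5 \cdots (2n+1)}{2 \cdot 4 \cdots (2n)} \cdot 7, \quad a_{2n+1} = \frac{4 \cdot 6 \cdots (2n+2)}{3 \cdot 5 \cdots (2n+1)} \cdot 3.$$

Therefore,

$$y(t) = 7 \left(\sum_{n=0}^{\infty} \frac{3 \cdot 5 \cdots (2n+1)}{2^n n!}\, (t-1)^{2n}\right) + 3 \left(\sum_{n=0}^{\infty} \frac{2^n (n+1)!}{3 \cdot 5 \cdots (2n+1)}\, (t-1)^{2n+1}\right),$$

which converges on $(0, 2)$, since $P(t) = t(t - 2) = 0$ for $t = 0$ and $t = 2$. □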


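Example 5.5.6 Solve the initial value problem:

$$L[y](t) = (1 - t) y''(t) + y'(t) + (1 - t) y(t) = 0, \quad \text{with } y(0) = 1,\ y'(0) = 1.$$

Solution: We set $y(t) = \sum_{n=0}^{\infty} a_n t^n$. Then

$$\begin{aligned}
L[y](t) &= (1 - t) \sum_{n=2}^{\infty} (n-1) n a_n t^{n-2} + \sum_{n=1}^{\infty} n a_n t^{n-1} + (1 - t) \sum_{n=0}^{\infty} a_n t^n \\
&= \sum_{n=0}^{\infty} (n+1)(n+2) a_{n+2} t^n - \sum_{n=1}^{\infty} n(n+1) a_{n+1} t^n + \sum_{n=0}^{\infty} (n+1) a_{n+1} t^n + \sum_{n=0}^{\infty} a_n t^n - \sum_{n=1}^{\infty} a_{n-1} t^n \\
&= 2a_2 + a_1 + a_0 + \sum_{n=1}^{\infty} [(n+1)(n+2) a_{n+2} - (n-1)(n+1) a_{n+1} + a_n - a_{n-1}] t^n = 0.
\end{aligned}$$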

Thus $a_2 = -\frac{a_1 + a_0}{2}$ and, $\forall\, n \geq 1$,

$$a_{n+2} = \frac{(n-1)(n+1) a_{n+1} - a_n + a_{n-1}}{(n+1)(n+2)}.$$

Since $a_0 = y(0) = 1$, $a_1 = y'(0) = 1$,

$$\begin{aligned}
a_2 &= -1, \\
a_3 &= \frac{-a_1 + a_0}{6} = 0, \\
a_4 &= \frac{3a_3 - a_2 + a_1}{12} = \frac{1}{6}, \\
a_5 &= \frac{8a_4 - a_3 + a_2}{20} = \frac{1}{60}, \\
a_6 &= \frac{15a_5 - a_4 + a_3}{30} = \frac{1}{360}, \\
&\ \ \vdots
\end{aligned}$$

Thus, it is not easy to find a formula for the general term $a_n$. However, the form of $a_{n+2}$ given above is a linear recursive formula in $a_{n+1}$, $a_n$, and $a_{n-1}$, which can be easily computed by using computers. □
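As an illustration of the last remark, the three-term recurrence can be programmed directly. The sketch below is one possible way to do it; the function name `series_coefficients` is our own, and exact fractions are used so that the output can be compared with the hand computation above.

```python
from fractions import Fraction

def series_coefficients(a0, a1, count):
    """First `count` coefficients a_0, a_1, ... of the series solution of
    (1 - t) y'' + y' + (1 - t) y = 0 with y(0) = a0, y'(0) = a1."""
    a0, a1 = Fraction(a0), Fraction(a1)
    a = [a0, a1, -(a0 + a1) / 2]          # 2*a_2 + a_1 + a_0 = 0
    for n in range(1, count - 2):
        numerator = (n - 1) * (n + 1) * a[n + 1] - a[n] + a[n - 1]
        a.append(numerator / ((n + 1) * (n + 2)))
    return a[:count]

coeffs = series_coefficients(1, 1, 7)
print(coeffs)
```

With $a_0 = a_1 = 1$ the first seven coefficients come out as $1, 1, -1, 0, \frac{1}{6}, \frac{1}{60}, \frac{1}{360}$, in agreement with the values found by hand.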

5.5.1 Singular points

The general H2O LDE:

L[y](t) = P (t)y′′(t) + Q(t)y′(t) + R(t)y(t) = 0, (5.6)

is said to be singular at $t = t_0$ if $P(t_0) = 0$. In this case, the preceding method of power series solution may fail in general, since a solution need not be analytic at $t_0$, and so may not be representable by a Taylor series at $t_0$.

Since the singular points are usually few in number, one might want to ignore them. However, it turns out that the solution $y(t)$ of (5.6) frequently becomes very large, or oscillates very rapidly, in a neighborhood of such a singular point $t_0$, and the singular points determine the principal features of the solution. Thus the behavior of a physical system modelled by such a differential equation is frequently most interesting in the neighborhood of a singular point. Hence, one has to study the solution near those singular points most carefully.

Definition 5.5.1 Euler's equation is of the form:

$$L[y](t) = t^2 y''(t) + \alpha t y'(t) + \beta y(t) = 0, \qquad (5.7)$$

where $\alpha$ and $\beta$ are constants.

I. $P(t) = t^2 = 0$ at $t = 0$. Thus $t = 0$ is a singular point. We first assume that $t > 0$. By a simple inspection, we try $y(t) = t^r$ as a solution of (5.7):

$$\begin{aligned}
L[y](t) &= t^2 y''(t) + \alpha t y'(t) + \beta y(t) \\
&= r(r-1) t^r + \alpha r t^r + \beta t^r \\
&= [r(r-1) + \alpha r + \beta] t^r = F(r) t^r = 0,
\end{aligned}$$

where

$$F(r) = r(r-1) + \alpha r + \beta = r^2 + (\alpha - 1) r + \beta = 0.$$

The solutions are

$$r_1 = -\frac{1}{2}\left((\alpha - 1) + \sqrt{(\alpha - 1)^2 - 4\beta}\right), \quad r_2 = -\frac{1}{2}\left((\alpha - 1) - \sqrt{(\alpha - 1)^2 - 4\beta}\right).$$


Case 1: $(\alpha - 1)^2 - 4\beta > 0$: Then (5.7) has two distinct solutions $y_1(t) = t^{r_1}$ and $y_2(t) = t^{r_2}$, which are linearly independent since $W[y_1, y_2](t) = (r_2 - r_1) t^{r_1 + r_2 - 1} \neq 0$. Thus the general solution is

$$y(t) = c_1 t^{r_1} + c_2 t^{r_2}.$$

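Case 2: $(\alpha - 1)^2 - 4\beta = 0$: Then $r_1 = r_2 = \frac{1 - \alpha}{2}$, so that we get only one solution $y_1(t) = t^{r_1}$. For a second solution, one can use the method of reduction of order. However, we present here an alternative method: Note that, in this case, for $y = t^r$ the equation (5.7) reduces to $L[y](t) = F(r) t^r = (r - r_1)^2 t^r$. We then take partial derivatives of this equation with respect to $r$ to get

$$\begin{aligned}
\frac{\partial}{\partial r} L[y](t) &= \frac{\partial}{\partial r}\left[t^2 \frac{\partial^2}{\partial t^2} y(t) + \alpha t \frac{\partial}{\partial t} y(t) + \beta y(t)\right] \\
&= t^2 \frac{\partial^2}{\partial t^2} \frac{\partial}{\partial r} y(t) + \alpha t \frac{\partial}{\partial t} \frac{\partial}{\partial r} y(t) + \beta \frac{\partial}{\partial r} y(t) \\
&= L\left[\frac{\partial}{\partial r} y\right](t) = L\left[\frac{\partial}{\partial r} t^r\right] = L[t^r \ln t].
\end{aligned}$$

$$\therefore\ \frac{\partial}{\partial r}\left((r - r_1)^2 t^r\right) = (r - r_1)^2 t^r \ln t + 2(r - r_1) t^r = L[t^r \ln t].$$

The left side of the last equation vanishes when $r = r_1$. Thus $L[t^{r_1} \ln t] = 0$ so that $y_2(t) = t^{r_1} \ln t$ is a second solution. Since $W[y_1, y_2](t) = t^{2r_1 - 1} \neq 0$, the general solution is

$$y(t) = (c_1 + c_2 \ln t)\, t^{r_1}.$$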

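Case 3: $(\alpha - 1)^2 - 4\beta < 0$: Then we have two complex roots

$$r_1 = \lambda + i\mu, \quad \text{and} \quad r_2 = \lambda - i\mu,$$

where $\lambda = \frac{1 - \alpha}{2}$ and $\mu = \frac{1}{2}\sqrt{4\beta - (\alpha - 1)^2} \neq 0$. Hence

$$\begin{aligned}
\phi(t) &= t^{\lambda + i\mu} = t^{\lambda} (e^{\ln t})^{i\mu} = t^{\lambda} e^{i\mu \ln t} \\
&= t^{\lambda} [\cos(\mu \ln t) + i \sin(\mu \ln t)]
\end{aligned}$$

is a complex valued solution of (5.7). Thus

$$y_1(t) = t^{\lambda} \cos(\mu \ln t), \quad y_2(t) = t^{\lambda} \sin(\mu \ln t)$$

are two linearly independent real solutions of (5.7), and so the general solution is

$$y(t) = t^{\lambda} [c_1 \cos(\mu \ln t) + c_2 \sin(\mu \ln t)].$$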


II. We now assume that $t < 0$: In this case, $t^r$ and $\ln t$ may not be well-defined. So, we set $t = -x$ with $x > 0$. Then for $y(t)$,

$$\frac{dy}{dt} = \frac{dy}{dx} \frac{dx}{dt} = -\frac{dy}{dx},$$

$$\frac{d^2 y}{dt^2} = \frac{d}{dt}\left(-\frac{dy}{dx}\right) = \frac{d}{dx}\left(-\frac{dy}{dx}\right) \frac{dx}{dt} = (-1)\, \frac{d^2 y}{dx^2}\, (-1) = \frac{d^2 y}{dx^2}.$$

Thus, (5.7) becomes

$$(-x)^2 \frac{d^2 y}{dx^2} + \alpha (-x)\left(-\frac{dy}{dx}\right) + \beta y = x^2 \frac{d^2 y}{dx^2} + \alpha x \frac{dy}{dx} + \beta y = L[y](x) = 0,$$

with $x > 0$. Hence, case I applies with $x = -t = |t|$.

In summary, an Euler's equation has one of the following three types of general solutions: for $t \neq 0$,

$$y(t) = \begin{cases}
c_1 |t|^{r_1} + c_2 |t|^{r_2}, & \text{if } (\alpha - 1)^2 - 4\beta > 0, \\
|t|^{r_1} (c_1 + c_2 \ln |t|), & \text{if } (\alpha - 1)^2 - 4\beta = 0, \\
|t|^{\lambda} [c_1 \cos(\mu \ln |t|) + c_2 \sin(\mu \ln |t|)], & \text{if } (\alpha - 1)^2 - 4\beta < 0.
\end{cases}$$

The general form of Euler's equation is:

$$L[y](t) = (t - t_0)^2 y''(t) + \alpha (t - t_0) y'(t) + \beta y(t) = 0,$$

with a singular point at $t_0$. Then the solutions will be of the form $y(t) = (t - t_0)^r$.

Example 5.5.7 Find the general solutions of the following equations:

(1) $L[y](t) = t^2 y''(t) + 4t y'(t) + 2 y(t) = 0$.

(2) $L[y](t) = t^2 y''(t) - 5t y'(t) + 9 y(t) = 0$.

(3) $L[y](t) = t^2 y''(t) - 5t y'(t) + 25 y(t) = 0$.

Solution: (1) We set $y(t) = t^r$. Then

$$F(r) = r^2 + (4 - 1) r + 2 = r^2 + 3r + 2 = (r + 1)(r + 2) = 0$$

has solutions $r_1 = -1$ and $r_2 = -2$ so that the general solution is $y(t) = c_1 \frac{1}{|t|} + c_2 \frac{1}{|t|^2}$.

(2) We set $y(t) = t^r$. Then

$$F(r) = r^2 - 6r + 9 = (r - 3)^2 = 0$$

has solutions $r_1 = r_2 = 3$ so that the general solution is $y(t) = |t|^3 (c_1 + c_2 \ln |t|)$.

(3) We set $y(t) = t^r$. Then

$$F(r) = r^2 - 6r + 25 = 0$$

has solutions $r_j = 3 \pm i4$, so that the general solution is $y(t) = |t|^3 (c_1 \cos(4 \ln |t|) + c_2 \sin(4 \ln |t|))$. □
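The three cases of Euler's equation are decided entirely by the sign of $(\alpha - 1)^2 - 4\beta$, so the classification can be sketched as a small function. The name `euler_indicial_roots` and the tuple it returns are our own conventions, not notation from the text: for distinct or repeated real roots it returns the two roots, and for complex roots it returns $\lambda$ and $\mu$.

```python
import math

def euler_indicial_roots(alpha, beta):
    """Classify t^2 y'' + alpha*t*y' + beta*y = 0 via F(r) = r^2 + (alpha-1) r + beta."""
    disc = (alpha - 1) ** 2 - 4 * beta
    lam = (1 - alpha) / 2
    if disc > 0:
        s = math.sqrt(disc) / 2
        return ("distinct", lam + s, lam - s)   # y = c1 |t|^r + c2 |t|^r'
    if disc == 0:
        return ("repeated", lam, lam)           # y = |t|^r (c1 + c2 ln|t|)
    mu = math.sqrt(-disc) / 2
    return ("complex", lam, mu)                 # y = |t|^lam [c1 cos(mu ln|t|) + c2 sin(mu ln|t|)]

# The three equations of Example 5.5.7
print(euler_indicial_roots(4, 2))
print(euler_indicial_roots(-5, 9))
print(euler_indicial_roots(-5, 25))
```

Testing `disc == 0` exactly is adequate for integer coefficients such as those in the example; for floating-point data a tolerance would be the safer choice.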

5.5.2 Regular singular points, method of Frobenius

To develop a reasonably simple mathematical theory for solving equation (5.6) in a neighborhood of a singular point $t_0$, the singularities in the functions $p(t) = \frac{Q(t)}{P(t)}$ and $q(t) = \frac{R(t)}{P(t)}$ have to be not too severe: that is, the limits

$$\lim_{t \to t_0} (t - t_0) \frac{Q(t)}{P(t)}, \quad \text{and} \quad \lim_{t \to t_0} (t - t_0)^2 \frac{R(t)}{P(t)}$$

are finite. This means that the singularity in $\frac{Q}{P}$ can be no worse than $(t - t_0)^{-1}$, and the singularity in $\frac{R}{P}$ can be no worse than $(t - t_0)^{-2}$.

Definition 5.5.2 The equation,

$$L[y](t) = P(t) y''(t) + Q(t) y'(t) + R(t) y(t) = 0, \qquad (5.8)$$

is said to have a regular singular point at $t = t_0$ if $P(t_0) = 0$, and $(t - t_0) \frac{Q(t)}{P(t)}$ and $(t - t_0)^2 \frac{R(t)}{P(t)}$ are analytic at $t = t_0$.

If $t_0 = 0$, then the standard form:

$$L[y](t) = y''(t) + p(t) y'(t) + q(t) y(t) = 0, \qquad (5.9)$$

is said to have a regular singular point at $t = 0$ if $t p(t)$ and $t^2 q(t)$ are analytic at $t = 0$.

A singular point of (5.8) that is not regular is called irregular.

For example, the Euler equation can be rewritten as

$$y''(t) + \frac{\alpha}{t}\, y'(t) + \frac{\beta}{t^2}\, y(t) = 0,$$

where $t p(t) = \alpha$ and $t^2 q(t) = \beta$ are analytic so that $t = 0$ is a regular singular point. Thus, $p$ and $q$ in the equation (5.9) are generalized forms of these:

$$p(t) = \frac{p_0}{t} + p_1 + p_2 t + p_3 t^2 + \cdots,$$

$$q(t) = \frac{q_0}{t^2} + \frac{q_1}{t} + q_2 + q_3 t + q_4 t^2 + \cdots.$$

Example 5.5.8 [Bessel's Equation] Classify the singular points of Bessel's equation of order $\nu$:
\[ L[y](t) = t^2y''(t) + ty'(t) + (t^2 - \nu^2)y(t) = 0, \tag{5.10} \]
where $\nu$ is a constant.

Solution: $P(t) = t^2 = 0$ at $t = 0$. Hence $t = 0$ is the only singular point. The equation can be rewritten as
\[ y''(t) + \frac{1}{t}y'(t) + \Big(1 - \frac{\nu^2}{t^2}\Big)y(t) = 0. \]
Since $tp(t) = 1$ and $t^2q(t) = t^2 - \nu^2$ are both analytic at $t = 0$, we see that Bessel's equation of order $\nu$ has a regular singular point at $t = 0$. □

Example 5.5.9 [Legendre Equation] Classify the singular points of the Legendre equation:
\[ L[y](t) = (1-t^2)y''(t) - 2ty'(t) + \alpha(\alpha+1)y(t) = 0, \tag{5.11} \]
where $\alpha$ is a constant.

Solution: $P(t) = 1 - t^2 = 0$ at $t = \pm 1$. Hence $t = \pm 1$ are the singular points. The equation can be rewritten as
\[ y''(t) - \frac{2t}{1-t^2}y'(t) + \alpha(\alpha+1)\frac{1}{1-t^2}y(t) = 0. \]
Since $(t-1)p(t) = \frac{2t}{1+t}$ and $(t-1)^2q(t) = \alpha(\alpha+1)\frac{1-t}{1+t}$ are both analytic at $t = 1$, and $(t+1)p(t) = -\frac{2t}{1-t}$ and $(t+1)^2q(t) = \alpha(\alpha+1)\frac{1+t}{1-t}$ are both analytic at $t = -1$, the points $t = \pm 1$ are both regular singular points of the Legendre equation. □


Example 5.5.10 Show that $t = 0$ is not a regular singular point of
\[ L[y](t) = t^2y''(t) + 3y'(t) + ty(t) = 0. \]

Solution: $P(t) = t^2 = 0$ at $t = 0$. Hence $t = 0$ is the only singular point. The equation can be rewritten as
\[ y''(t) + \frac{3}{t^2}y'(t) + \frac{1}{t}y(t) = 0. \]
Since $tp(t) = \frac{3}{t}$ is not analytic at $t = 0$, the point $t = 0$ is an irregular singular point of the equation. □

Suppose that $t = 0$ is a regular singular point of equation (5.8). For simplicity, we assume $t > 0$. The case $t < 0$ can be treated in the same way as for the Euler equation. Multiplying equation (5.9) by $t^2$, we get
\[ t^2y''(t) + t\,(tp(t))\,y'(t) + t^2q(t)\,y(t) = 0. \tag{5.12} \]
Since $tp(t)$ and $t^2q(t)$ are analytic at $t = 0$, we have
\[ tp(t) = \sum_{n=0}^{\infty} p_n t^n = p_0 + p_1 t + p_2 t^2 + p_3 t^3 + \cdots, \tag{5.13} \]
\[ t^2q(t) = \sum_{n=0}^{\infty} q_n t^n = q_0 + q_1 t + q_2 t^2 + q_3 t^3 + \cdots, \tag{5.14} \]
for $|t| < \rho$. If $p_n = q_n = 0$ for $n \ge 1$, then (5.12) reduces to the Euler equation:
\[ t^2y''(t) + tp_0y'(t) + q_0y(t) = 0. \]

In general, of course, some of the $p_n$ and $q_n$ are nonzero. However, the essential character of the solutions of equation (5.12) is identical to that of the solutions of the Euler equation. The presence of $p_1t + p_2t^2 + p_3t^3 + \cdots$ and $q_1t + q_2t^2 + q_3t^3 + \cdots$ merely complicates the calculations.

Since the coefficients of equation (5.12) are Euler coefficients times power series, it is natural to expect the solution of equation (5.12) to be an Euler solution times a power series: when $t_0 = 0$, for $t > 0$,
\[ y(t) = t^r\sum_{n=0}^{\infty} a_n t^n = \sum_{n=0}^{\infty} a_n t^{n+r}, \quad a_0 \ne 0. \]
As usual, we need to determine $r$, a recurrence relation for the $a_n$, and the radius of convergence of the solution $y(t)$.


Example 5.5.11 Find the general solution of the following equation:
\[ L[y](t) = 2ty''(t) + y'(t) + ty(t) = 0, \quad 0 < t < \infty. \]

Solution: Note that $P(t) = 2t = 0$ at $t = 0$, and $tp(t) = \frac{1}{2}$ and $t^2q(t) = \frac{t^2}{2}$ are both analytic at $t = 0$. Thus $t = 0$ is a regular singular point. Set $y(t) = \sum_{n=0}^{\infty} a_n t^{r+n}$, $a_0 \ne 0$, for $t > 0$. Then

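\[
\begin{aligned}
L[y](t) &= 2ty''(t) + y'(t) + ty(t) \\
&= \sum_{n=0}^{\infty} 2(r+n)(r+n-1)a_n t^{r+n-1} + \sum_{n=0}^{\infty} (r+n)a_n t^{r+n-1} + \sum_{n=0}^{\infty} a_n t^{r+n+1} \\
&= t^r\Big( \sum_{n=0}^{\infty} 2(r+n)(r+n-1)a_n t^{n-1} + \sum_{n=0}^{\infty} (r+n)a_n t^{n-1} + \sum_{n=2}^{\infty} a_{n-2} t^{n-1} \Big) \\
&= [2r(r-1) + r]a_0 t^{r-1} + [2(r+1)r + (r+1)]a_1 t^{r} \\
&\quad + \sum_{n=2}^{\infty} \big\{ [2(r+n)(r+n-1) + (r+n)]a_n + a_{n-2} \big\} t^{r+n-1} = 0.
\end{aligned}
\]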

Thus,
\[
\begin{aligned}
(2r(r-1) + r)a_0 = r(2r-1)a_0 &= 0, && \text{for } n = 0, \\
(r+1)(2r+1)a_1 &= 0, && \text{for } n = 1, \\
(r+n)(2(r+n)-1)a_n &= -a_{n-2}, && \text{for } n \ge 2.
\end{aligned}
\]
The first equation determines $r$: $r = 0$ or $r = \frac{1}{2}$, since $a_0 \ne 0$. The second equation then forces $a_1 = 0$. The third equation determines $a_n$ for $n \ge 2$.

(1) If $r = 0$, then $a_n = -\frac{a_{n-2}}{n(2n-1)}$ for $n \ge 2$. Since $a_1 = 0$, we get $a_{2k-1} = 0$ for all $k \ge 1$, and

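\[ a_{2k} = -\frac{a_{2(k-1)}}{2k(4k-1)} = \frac{(-1)^k}{(2^k k!)\,3\cdot 7\cdots(4k-1)}\,a_0, \quad k \ge 1. \]
Thus
\[ y_1(t) = t^0\Big(1 - \frac{1}{2\cdot 3}t^2 + \frac{1}{(2\cdot 4)(3\cdot 7)}t^4 - \cdots\Big) = 1 + \sum_{n=1}^{\infty} \frac{(-1)^n}{(2^n n!)\,3\cdot 7\cdots(4n-1)}\,t^{2n}, \]
which converges for all $t$ by the Ratio test.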


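(2) If $r = \frac{1}{2}$, then
\[ a_n = -\frac{a_{n-2}}{(n+\frac{1}{2})(2(n+\frac{1}{2})-1)} = -\frac{a_{n-2}}{n(2n+1)} \quad \text{for } n \ge 2. \]
Since $a_1 = 0$ again, $a_{2k-1} = 0$ for all $k \ge 1$, and
\[ a_{2k} = -\frac{a_{2(k-1)}}{2k(4k+1)} = \frac{(-1)^k}{(2^k k!)\,5\cdot 9\cdots(4k+1)}\,a_0, \quad k \ge 1. \]
Thus
\[ y_2(t) = t^{1/2}\Big(1 - \frac{1}{2\cdot 5}t^2 + \frac{1}{(2\cdot 4)(5\cdot 9)}t^4 - \cdots\Big) = t^{1/2}\Big(1 + \sum_{n=1}^{\infty} \frac{(-1)^n}{(2^n n!)\,5\cdot 9\cdots(4n+1)}\,t^{2n}\Big), \]
which converges for all $t$ by the Ratio test. □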
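The recurrence of Example 5.5.11 can be run exactly with rational arithmetic and compared against the closed forms for $a_{2k}$. This is only a sketch under the normalization $a_0 = 1$; the helper names `frobenius_coeffs` and `closed_form` are illustrative and not part of the text.

```python
from fractions import Fraction

def frobenius_coeffs(r, n_max):
    """a_0..a_{n_max} for 2t y'' + y' + t y = 0 with exponent r, taking a_0 = 1."""
    a = [Fraction(1), Fraction(0)]  # a_0 = 1 and a_1 = 0 (forced by the n = 1 equation)
    for n in range(2, n_max + 1):
        # (r + n)(2(r + n) - 1) a_n = -a_{n-2}
        a.append(-a[n - 2] / ((r + n) * (2 * (r + n) - 1)))
    return a

def closed_form(k, start):
    """(-1)^k / [(2^k k!) * start * (start + 4) * ... * (start + 4(k - 1))]."""
    prod = Fraction(1)
    for j in range(1, k + 1):
        prod *= 2 * j * (start + 4 * (j - 1))
    return Fraction((-1) ** k) / prod

for r, start in [(Fraction(0), 3), (Fraction(1, 2), 5)]:
    a = frobenius_coeffs(r, 8)
    print(r, [str(a[2 * k]) for k in range(5)],
          all(a[2 * k] == closed_form(k, start) for k in range(5)))
```

Here `start = 3` corresponds to the product $3\cdot 7\cdots(4k-1)$ for $r = 0$, and `start = 5` to $5\cdot 9\cdots(4k+1)$ for $r = \frac12$.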

This is known as the method of Frobenius. We want to see whether this method works in general for equation (5.12).

Now, using the series expansions (5.13) and (5.14) of $tp(t)$ and $t^2q(t)$, and supposing that the solution is of the form $y(t) = \sum_{n=0}^{\infty} a_n t^{r+n}$, $a_0 \ne 0$, equation (5.12) is written as

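\[
\begin{aligned}
L[y](t) &= t^2y''(t) + t\Big(\sum_{n=0}^{\infty} p_n t^n\Big)y'(t) + \Big(\sum_{n=0}^{\infty} q_n t^n\Big)y(t) \\
&= \sum_{n=0}^{\infty} (r+n)(r+n-1)a_n t^{r+n} + \Big(\sum_{m=0}^{\infty} p_m t^m\Big)\Big(\sum_{n=0}^{\infty} (r+n)a_n t^{r+n}\Big) + \Big(\sum_{m=0}^{\infty} q_m t^m\Big)\Big(\sum_{n=0}^{\infty} a_n t^{r+n}\Big) \\
&= t^r\Big[ \sum_{n=0}^{\infty} (r+n)(r+n-1)a_n t^n \\
&\qquad + \Big( p_0ra_0 + [p_1ra_0 + p_0(r+1)a_1]t + \sum_{n=2}^{\infty}\sum_{m=0}^{n} [p_{n-m}(r+m)a_m]t^n \Big) \\
&\qquad + \Big( q_0a_0 + (q_1a_0 + q_0a_1)t + \sum_{n=2}^{\infty}\sum_{m=0}^{n} (q_{n-m}a_m)t^n \Big) \Big] \\
&= [r(r-1) + rp_0 + q_0]a_0t^r + \big\{ [(r+1)r + p_0(r+1) + q_0]a_1 + (p_1r + q_1)a_0 \big\}t^{r+1} + \cdots \\
&\quad + \Big\{ [(r+n)(r+n-1) + p_0(r+n) + q_0]a_n + \sum_{m=0}^{n-1} [(r+m)p_{n-m} + q_{n-m}]a_m \Big\}t^{r+n} + \cdots \\
&= F(r)a_0t^r + [F(r+1)a_1 + (rp_1 + q_1)a_0]t^{r+1} + \cdots \\
&\quad + \Big[ F(r+n)a_n + \sum_{m=0}^{n-1} \big((r+m)p_{n-m} + q_{n-m}\big)a_m \Big]t^{r+n} + \cdots = 0,
\end{aligned}
\]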

where
\[ F(r) = r(r-1) + p_0r + q_0 = r^2 + (p_0-1)r + q_0, \tag{5.15} \]
and $F(r) = 0$ is called the indicial equation of (5.12). Therefore,
\[ F(r)a_0 = 0, \tag{5.16} \]
\[ F(r+n)a_n = -\sum_{m=0}^{n-1} [(r+m)p_{n-m} + q_{n-m}]a_m, \quad \text{for } n \ge 1. \tag{5.17} \]
Since $a_0 \ne 0$, equation (5.16) forces $F(r) = 0$; the roots of the indicial equation determine the two possible values $r_1$ and $r_2$ of $r$.
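The recurrence (5.17) translates directly into a short routine. The sketch below assumes the power-series coefficients $p_n$ and $q_n$ of $tp(t)$ and $t^2q(t)$ are given as finite lists (higher coefficients are taken to be zero), normalizes $a_0 = 1$, and assumes $F(r+n) \ne 0$ for every $n$ that is computed; the names `indicial` and `frobenius_general` are illustrative.

```python
from fractions import Fraction

def indicial(r, p0, q0):
    """F(r) = r(r - 1) + p0 r + q0, as in (5.15)."""
    return r * (r - 1) + p0 * r + q0

def frobenius_general(r, p, q, n_max):
    """a_0..a_{n_max} from recurrence (5.17) with a_0 = 1.

    p[k], q[k] are the coefficients p_k, q_k; entries beyond the lists count as 0.
    Raises ZeroDivisionError if F(r + n) = 0 for some n (the exceptional cases).
    """
    def coef(c, k):
        return c[k] if k < len(c) else 0

    a = [Fraction(1)]
    for n in range(1, n_max + 1):
        rhs = sum(((r + m) * coef(p, n - m) + coef(q, n - m)) * a[m] for m in range(n))
        a.append(-rhs / indicial(r + n, p[0], q[0]))
    return a

# Example 5.5.11: 2t y'' + y' + t y = 0, so t p(t) = 1/2 and t^2 q(t) = t^2 / 2.
p = [Fraction(1, 2)]
q = [Fraction(0), Fraction(0), Fraction(1, 2)]
print([str(c) for c in frobenius_general(Fraction(0), p, q, 4)])
```

For the equation of Example 5.5.11 with $r = 0$ this reproduces $a_1 = a_3 = 0$, $a_2 = -\frac{1}{6}$ and $a_4 = \frac{1}{168}$.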

I. Suppose that $r_1 > r_2$ are two distinct real roots: For $r = r_1$, the terms $a_n$ are all determined recursively by equation (5.17), depending on $r$ and on all the previous coefficients $a_0, \ldots, a_{n-1}$, provided that $F(r_1+n) \ne 0$ for all $n \ge 1$. However, $F(r_1+n) \ne 0$ for all $n \ge 1$, since $r_1 + n > r_1 > r_2$ for $n \ge 1$. Thus, by setting $a_0 = 1$, one can always obtain a first solution
\[ y_1(t) = t^{r_1}\Big[1 + \sum_{n=1}^{\infty} a_n(r_1)t^n\Big] \]
of equation (5.12), which converges whenever the series for $tp(t)$ and $t^2q(t)$ both converge. A second solution can also be obtained with $r = r_2$, provided that $F(r_2+n) \ne 0$, that is, $r_2 + n \ne r_1$, for all $n \ge 1$:
\[ y_2(t) = t^{r_2}\Big[1 + \sum_{n=1}^{\infty} a_n(r_2)t^n\Big], \]
which also converges whenever the series for $tp(t)$ and $t^2q(t)$ both converge.

Remark: Since the power series parts of the solutions are analytic at $t = 0$ within the radii of convergence, the singular behavior, if there is any, of the solutions $y_1$ and $y_2$ is determined by $t^{r_1}$ and $t^{r_2}$.

For $t < 0$, by the substitution $t = -x$ with $x > 0$, we only need to replace $t^{r_1}$ and $t^{r_2}$ by $|t|^{r_1}$ and $|t|^{r_2}$ in the expressions of $y_1$ and $y_2$.


Example 5.5.12 Find the general solution of the following equation:
\[ L[y](t) = 4ty''(t) + 3y'(t) + 3y(t) = 0, \quad 0 < t < \infty. \]

Solution: $P(t) = 4t = 0$ at $t = 0$. Moreover, since $tp(t) = \frac{3}{4}$ and $t^2q(t) = \frac{3t}{4}$ are both analytic at $t = 0$, we see that $t = 0$ is a regular singular point. Set $y(t) = \sum_{n=0}^{\infty} a_n t^{r+n}$, $a_0 \ne 0$. Then

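\[
\begin{aligned}
L[y](t) &= 4ty''(t) + 3y'(t) + 3y(t) \\
&= \sum_{n=0}^{\infty} 4(r+n)(r+n-1)a_n t^{r+n-1} + 3\sum_{n=0}^{\infty} (r+n)a_n t^{r+n-1} + 3\sum_{n=0}^{\infty} a_n t^{r+n} \\
&= t^r\Big( \sum_{n=0}^{\infty} 4(r+n)(r+n-1)a_n t^{n-1} + \sum_{n=0}^{\infty} 3(r+n)a_n t^{n-1} + \sum_{n=1}^{\infty} 3a_{n-1} t^{n-1} \Big) \\
&= [4r(r-1) + 3r]a_0 t^{r-1} + \sum_{n=1}^{\infty} \big\{ [4(r+n)(r+n-1) + 3(r+n)]a_n + 3a_{n-1} \big\} t^{r+n-1} = 0.
\end{aligned}
\]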

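Thus,
\[
\begin{aligned}
4r(r-1) + 3r = r(4r-1) &= 0, && n = 0, \\
[4(r+n)(r+n-1) + 3(r+n)]a_n = (r+n)[4(r+n)-1]a_n &= -3a_{n-1}, && n \ge 1.
\end{aligned}
\]
From the first (indicial) equation, we get $r_1 = \frac{1}{4}$ and $r_2 = 0$, since $a_0 \ne 0$. The second equation determines $a_n$ recursively for $n \ge 1$, depending on $r$.

(1) Let $r = r_1 = \frac{1}{4}$. Then $a_n = \frac{-3a_{n-1}}{n(4n+1)}$ for $n \ge 1$:
\[
\begin{aligned}
n = 1: \quad a_1 &= \frac{-3}{5}a_0, \\
n = 2: \quad a_2 &= \frac{-3a_1}{2\cdot 9} = \frac{3^2}{2\cdot 5\cdot 9}a_0, \\
n = 3: \quad a_3 &= \frac{-3a_2}{3\cdot 13} = \frac{-3^3}{(2\cdot 3)(5\cdot 9\cdot 13)}a_0, \\
&\ \ \vdots \\
a_n &= \frac{(-1)^n 3^n}{n!\cdot 5\cdot 9\cdots(4n+1)}a_0.
\end{aligned}
\]
Taking $a_0 = 1$,
\[ y_1(t) = t^{1/4}\sum_{n=0}^{\infty} \frac{(-1)^n 3^n t^n}{n!\cdot 5\cdot 9\cdots(4n+1)}, \]
where the product $5\cdot 9\cdots(4n+1)$ is taken to be 1 for $n = 0$. This series converges for all $t > 0$ by the Ratio test.

(2) Let $r = r_2 = 0$. Then $a_n = \frac{-3a_{n-1}}{n(4n-1)}$ for $n \ge 1$:
\[
\begin{aligned}
n = 1: \quad a_1 &= \frac{-3a_0}{3} = -a_0, \\
n = 2: \quad a_2 &= \frac{-3a_1}{2\cdot 7} = \frac{3}{2\cdot 7}a_0, \\
n = 3: \quad a_3 &= \frac{-3a_2}{3\cdot 11} = \frac{-3^2}{(2\cdot 3)(7\cdot 11)}a_0, \\
&\ \ \vdots \\
a_n &= \frac{(-1)^n 3^{n-1}}{n!\cdot 7\cdot 11\cdots(4n-1)}a_0, \quad n \ge 1,
\end{aligned}
\]
where the product $7\cdot 11\cdots(4n-1)$ is taken to be 1 for $n = 1$. Taking $a_0 = 1$,
\[ y_2(t) = 1 + \sum_{n=1}^{\infty} \frac{(-1)^n 3^{n-1} t^n}{n!\cdot 7\cdot 11\cdots(4n-1)}, \]
which also converges for all $t > 0$ by the Ratio test. □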

II. If $r_1$ and $r_2$ are complex numbers, they are complex conjugates of each other and $r_1 \ne r_2 + n$ for any $n \ge 1$. Thus one can compute two series solutions of the form $y(t) = t^r\sum_{n=0}^{\infty} a_n t^n$, which are complex valued functions of $t$. The real valued solutions are the real and imaginary parts of this complex valued solution: for $r = \lambda \pm i\mu$,
\[ t^r = t^{\lambda + i\mu} = t^{\lambda}e^{i\mu\ln t} = t^{\lambda}\big(\cos(\mu\ln t) + i\sin(\mu\ln t)\big). \]
Thus, two real valued solutions are given by the real and imaginary parts of
\[ y(t) = t^{\lambda}\sum_{n=0}^{\infty} [\cos(\mu\ln t) + i\sin(\mu\ln t)]\,a_n(r)t^n. \]

III. Suppose that $r_1 = r_2$, so that $F(r) = (r - r_1)^2$: In this case, $F(r_1+n) \ne 0$ for all $n \ge 1$. One can always obtain a solution of equation (5.8) of the form
\[ y(t) = y(r,t) = t^r\sum_{n=0}^{\infty} a_n(r)t^n, \]
which emphasizes that the solution depends also on $r$ and that the coefficients $a_n$ are functions of $r$. Then
\[ L[y](t) = a_0F(r)t^r + \sum_{n=1}^{\infty} \Big\{ a_n(r)F(r+n) + \sum_{k=0}^{n-1} [(r+k)p_{n-k} + q_{n-k}]a_k \Big\}t^{r+n}. \]
By requiring that the coefficients of $t^{r+n}$ be zero for $n \ge 1$, we get
\[ a_n(r) = \frac{-\sum_{k=0}^{n-1} [(r+k)p_{n-k} + q_{n-k}]a_k}{F(r+n)}. \tag{5.18} \]
With this choice of $a_n$ for $n \ge 1$, we get
\[ L[y](r,t) = a_0F(r)t^r = a_0(r-r_1)^2t^r. \]
Thus, for $r = r_1$, $L[y](r_1,t) = 0$, which means that, setting $a_0 = 1$,
\[ y_1(t) = t^{r_1}\Big[1 + \sum_{n=1}^{\infty} a_n(r_1)t^n\Big] \]
is a solution. Now
\[ \frac{\partial}{\partial r}L[y](r,t) = \frac{\partial}{\partial r}\big[a_0(r-r_1)^2t^r\big] = 2a_0(r-r_1)t^r + a_0(r-r_1)^2t^r\ln t, \]
which vanishes at $r = r_1$. By a simple computation, we also have
\[ \frac{\partial}{\partial r}L[y](r,t) = L\Big[\frac{\partial y}{\partial r}\Big](r,t). \]
That is, $\frac{\partial y}{\partial r}(r_1,t)$ is also a solution. Thus
\[
\begin{aligned}
y_2(t) = \frac{\partial}{\partial r}y(r,t)\Big|_{r=r_1} &= \frac{\partial}{\partial r}\Big[\sum_{n=0}^{\infty} a_n(r)t^{r+n}\Big]_{r=r_1} \\
&= \sum_{n=0}^{\infty} a_n(r_1)t^{r_1+n}\ln t + \sum_{n=0}^{\infty} a_n'(r_1)t^{r_1+n} \\
&= y_1(t)\ln t + \sum_{n=0}^{\infty} a_n'(r_1)t^{r_1+n}
\end{aligned}
\]
is a second solution.


Remark: There are three ways of finding a second solution when $r_1 = r_2$. First, we can compute $b_n(r_1) = a_n'(r_1)$ by substituting the above expression for $y_2(t)$ into equation (5.8). Second, we can compute $a_n'(r_1)$ by first determining $a_n(r)$ and then calculating $a_n'(r_1)$. Note that when we found $y_1(t)$ it was necessary to find $a_n(r_1)$. Thus it is better to find the general expression for $a_n(r)$ at that time, from which we can calculate both $a_n(r_1)$ and $a_n'(r_1)$. The first method may be simpler if only a few terms of $y_2(t)$ are needed, or if $a_n(r)$ is very complicated or difficult to obtain. The third method is to use reduction of order.

Example 5.5.13 Consider Bessel's equation of order $\lambda = 0$ (see Example 5.6.4)
\[ L[y](t) = t^2y''(t) + ty'(t) + t^2y(t) = 0, \quad 0 < t < \infty. \]
The point $t = 0$ is a regular singular point. Find the solutions.

Solution: For a solution of the form $y(t) = \sum_{n=0}^{\infty} a_n t^{r+n}$, compute
\[
\begin{aligned}
L[y](t) &= t^2y''(t) + ty'(t) + t^2y(t) \\
&= \sum_{n=0}^{\infty} (r+n)(r+n-1)a_n t^{r+n} + \sum_{n=0}^{\infty} (r+n)a_n t^{r+n} + \sum_{n=0}^{\infty} a_n t^{r+n+2} \\
&= t^r\Big( \sum_{n=0}^{\infty} (r+n)^2a_n t^n + \sum_{n=2}^{\infty} a_{n-2}t^n \Big) \\
&= r^2a_0t^r + (r+1)^2a_1t^{r+1} + \sum_{n=2}^{\infty} [(r+n)^2a_n + a_{n-2}]t^{r+n} = 0.
\end{aligned}
\]
Thus,
\[
\begin{aligned}
F(r)a_0 &= r^2a_0 = 0, \\
F(r+1)a_1 &= (r+1)^2a_1 = 0, \\
F(r+n)a_n &= (r+n)^2a_n = -a_{n-2}, \quad n \ge 2.
\end{aligned}
\]
The first (indicial) equation determines $r$: $r_1 = r_2 = 0$. The second equation then forces $a_1 = 0$. The third equation determines $a_n = \frac{-a_{n-2}}{(r+n)^2}$ recursively for $n \ge 2$.

(1) Since $a_1 = 0$, $a_{2n+1} = 0$ for all $n \ge 0$, and
\[ a_{2n}(r) = \frac{-a_{2(n-1)}}{(r+2n)^2} = \frac{(-1)^na_0}{(r+2n)^2(r+2(n-1))^2\cdots(r+2)^2}, \quad n \ge 1. \]
Thus,
\[ y_1(r=0,t) = a_0t^0\sum_{n=0}^{\infty} \frac{(-1)^nt^{2n}}{(2n)^2(2(n-1))^2\cdots(2)^2} = a_0\sum_{n=0}^{\infty} \frac{(-1)^n}{2^{2n}(n!)^2}t^{2n}. \]


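With $a_0 = 1$, this solution is often referred to as the Bessel function of the first kind of order zero, denoted by $J_0(t)$.

(2) For a second solution, we set
\[ y_2(t) = y_1(t)\ln t + \sum_{n=0}^{\infty} a_{2n}'(0)t^{2n}. \]
To find $a_{2n}'(0)$, note that, for $n \ge 1$,
\[
\begin{aligned}
\frac{a_{2n}'(r)}{a_{2n}(r)} &= \frac{d}{dr}\ln|a_{2n}(r)| \\
&= \frac{d}{dr}\ln\big[(r+2n)^{-2}(r+2(n-1))^{-2}\cdots(r+2)^{-2}\big] \\
&= -2\frac{d}{dr}\big[\ln(r+2n) + \ln(r+2(n-1)) + \cdots + \ln(r+2)\big] \\
&= -2\Big(\frac{1}{r+2n} + \frac{1}{r+2(n-1)} + \cdots + \frac{1}{r+2}\Big).
\end{aligned}
\]
Thus
\[
\begin{aligned}
a_{2n}'(0) &= -2\Big(\frac{1}{2n} + \frac{1}{2(n-1)} + \cdots + \frac{1}{4} + \frac{1}{2}\Big)a_{2n}(0) \\
&= -\Big(\frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{2} + \frac{1}{1}\Big)a_{2n}(0) \\
&= \frac{-H_n(-1)^n}{2^{2n}(n!)^2} = \frac{(-1)^{n+1}H_n}{2^{2n}(n!)^2},
\end{aligned}
\]
where
\[ H_n = \frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{2} + 1, \]
and $H_0 = 0$, since $a_0$ does not depend on $r$. Therefore, a second solution is
\[ y_2(t) = y_1(t)\ln t + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}H_n}{2^{2n}(n!)^2}t^{2n}. \quad \square \]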
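The formula $a_{2n}'(0) = -H_n\,a_{2n}(0)$ can be sanity-checked by differentiating $a_{2n}(r)$ numerically. The following is a minimal sketch, assuming $a_0 = 1$ and using a central difference with an arbitrarily chosen step `h`; the function names are illustrative.

```python
def a2n(r, n):
    """a_{2n}(r) = (-1)^n / [(r+2)^2 (r+4)^2 ... (r+2n)^2], with a_0 = 1."""
    prod = 1.0
    for j in range(1, n + 1):
        prod *= (r + 2 * j) ** 2
    return (-1) ** n / prod

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, with H_0 = 0."""
    return sum(1.0 / k for k in range(1, n + 1))

def a2n_prime_at_zero(n):
    """Closed form a'_{2n}(0) = -H_n * a_{2n}(0)."""
    return -harmonic(n) * a2n(0.0, n)

h = 1e-6  # step for the central difference (an arbitrary small value)
for n in range(1, 6):
    numeric = (a2n(h, n) - a2n(-h, n)) / (2 * h)
    print(n, numeric, a2n_prime_at_zero(n))
```

The two printed columns should agree to many digits, which supports the logarithmic-derivative computation above without replacing it.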

IV. Suppose that $r_1 - r_2 = N$ is a positive integer. Then $y_1(t) = t^{r_1}\sum_{n=0}^{\infty} a_n(r_1)t^n$ is a solution, and we need a second solution. When $n = N$, $F(r_2+N) = 0$, thus equation (5.17) becomes
\[ a_N\cdot 0 = -\sum_{k=0}^{N-1} [(r_2+k)p_{N-k} + q_{N-k}]a_k. \]
If $-\sum_{k=0}^{N-1} [(r_2+k)p_{N-k} + q_{N-k}]a_k = 0$, then $a_N$ can be arbitrary, and we may set $a_N = 0$. For $n \ne N$, we again use equation (5.17) to find $a_n(r_2)$, and so a second solution is of the form $y_2(t) = t^{r_2}\sum_{n=0}^{\infty} a_n(r_2)t^n$. Example 5.5.14 below illustrates how to find the $a_n(r_2)$ in this case.

If
\[ -\sum_{k=0}^{N-1} [(r_2+k)p_{N-k} + q_{N-k}]a_k \ne 0, \]
then equation (5.17) is not satisfied for any choice of $a_N$. In this case, to determine $a_N$ we may proceed as follows. Note that $F(r) = (r-r_1)(r-r_2)$ implies
\[ F(r+N) = (r+N-r_1)(r+N-r_2) = (r-r_2)(r+N-r_2), \]
which vanishes for $r = r_2$. Since we can choose $a_0$ arbitrarily, we take $a_0 = r - r_2$. Then each term $a_k$ in the numerator of the expression of $a_N$ will contain $(r-r_2)$ as a factor, which cancels the same factor in the denominator when $n = N$. Now, following the same analysis as that for the case $r_1 = r_2$, a second solution takes the form
\[ y_2(t) = a\,y_1(t)\ln t + t^{r_2}\Big[1 + \sum_{n=1}^{\infty} c_n(r_2)t^n\Big]. \]
In fact, let
\[ y(r,t) = t^r\sum_{n=0}^{\infty} a_n(r)t^n \]
with $a_0 = a_0(r) = r - r_2$. Then
\[
\begin{aligned}
y_2(t) = \frac{\partial}{\partial r}y(r,t)\Big|_{r=r_2} &= \frac{\partial}{\partial r}\Big[\sum_{n=0}^{\infty} a_n(r)t^{r+n}\Big]_{r=r_2} \\
&= \sum_{n=0}^{\infty} a_n(r_2)t^{r_2+n}\ln t + \sum_{n=0}^{\infty} a_n'(r_2)t^{r_2+n} \\
&= a\,y_1(t)\ln t + \sum_{n=0}^{\infty} a_n'(r_2)t^{r_2+n},
\end{aligned}
\]
where $a$ is a constant given by
\[ a = \lim_{r\to r_2} (r-r_2)a_N(r). \]
If $a_N(r_2)$ is finite, then $a = 0$ and there is no logarithmic term in $y_2$. The proof of this result is beyond our scope here.


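Example 5.5.14 Bessel's equation of order $\lambda = \frac{1}{2}$:
\[ L[y](t) = t^2y''(t) + ty'(t) + \big[t^2 - (\tfrac{1}{2})^2\big]y(t) = 0, \quad 0 < t < \infty, \]
has $t = 0$ as a regular singular point. Find the solutions.

Solution: For a solution of the form $y(t) = \sum_{n=0}^{\infty} a_n t^{r+n}$, compute
\[
\begin{aligned}
L[y](t) &= t^2y''(t) + ty'(t) + \big[t^2 - (\tfrac{1}{2})^2\big]y(t) \\
&= \sum_{n=0}^{\infty} (r+n)(r+n-1)a_n t^{r+n} + \sum_{n=0}^{\infty} (r+n)a_n t^{r+n} + \sum_{n=0}^{\infty} a_n t^{r+n+2} - \frac{1}{4}\sum_{n=0}^{\infty} a_n t^{r+n} \\
&= t^r\Big( \sum_{n=0}^{\infty} \big[(r+n)(r+n-1) + (r+n) - \tfrac{1}{4}\big]a_n t^n + \sum_{n=2}^{\infty} a_{n-2}t^n \Big) \\
&= \big[r(r-1) + r - \tfrac{1}{4}\big]a_0t^r + \big[(r+1)r + (r+1) - \tfrac{1}{4}\big]a_1t^{r+1} \\
&\quad + \sum_{n=2}^{\infty} \Big\{ \big[(r+n)(r+n-1) + (r+n) - \tfrac{1}{4}\big]a_n + a_{n-2} \Big\}t^{r+n} = 0.
\end{aligned}
\]
Thus,
\[
\begin{aligned}
F(r)a_0 &= \big[r(r-1) + r - \tfrac{1}{4}\big]a_0 = \big(r^2 - \tfrac{1}{4}\big)a_0 = 0, \\
F(r+1)a_1 &= \big[(r+1)r + (r+1) - \tfrac{1}{4}\big]a_1 = \big[(r+1)^2 - \tfrac{1}{4}\big]a_1 = 0, \\
F(r+n)a_n &= \big[(r+n)^2 - \tfrac{1}{4}\big]a_n = -a_{n-2}, \quad n \ge 2.
\end{aligned}
\]
From the first (indicial) equation, we get $r_1 = \frac{1}{2}$ and $r_2 = -\frac{1}{2}$.

(1) For $r_1 = \frac{1}{2}$, the second equation forces $a_1 = 0$, and the third equation determines $a_n = \frac{-a_{n-2}}{(n+1)n}$ recursively for $n \ge 2$. Thus, $a_{2k-1} = 0$ for $k \ge 1$, and
\[ a_{2k} = \frac{-a_{2(k-1)}}{(2k+1)2k} = \frac{(-1)^k}{(2k+1)!}a_0, \quad k \ge 1. \]
Thus, with $a_0 = 1$,
\[ y_1(t) = t^{1/2}\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}t^{2n} = \frac{t^{1/2}}{t}\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}t^{2n+1} = t^{-1/2}\sin t. \]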


The Bessel function of the first kind of order one-half, $J_{1/2}$, is defined as $(2/\pi)^{1/2}y_1$. Thus
\[ J_{1/2}(t) = \sqrt{\frac{2}{\pi t}}\,\sin t, \quad t > 0. \]

(2) For $r_2 = -\frac{1}{2}$, $F(r_2+1) = F(\frac{1}{2}) = 0$. Thus the second equation, $F(r_2+1)\cdot a_1 = 0\cdot a_1 = 0$, is satisfied automatically for arbitrary $a_1$. We choose $a_1 = 0$. Note that, for $n \ge 2$, $F(r_2+n) = n(n-1) \ne 0$. Then, from the third equation, $a_n = \frac{-a_{n-2}}{n(n-1)}$ for $n \ge 2$. Thus $a_{2k+1} = 0$ for $k \ge 0$, while
\[ a_{2k} = \frac{-a_{2(k-1)}}{2k(2k-1)} = \frac{(-1)^k}{(2k)!}a_0, \quad k \ge 1. \]
Thus, with $a_0 = 1$,
\[ y_2(t) = t^{-1/2}\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!}t^{2n} = t^{-1/2}\cos t. \]
The second Bessel function of the first kind of order one-half, $J_{-1/2}$, is defined as $(2/\pi)^{1/2}y_2$. Thus
\[ J_{-1/2}(t) = \sqrt{\frac{2}{\pi t}}\,\cos t, \quad t > 0. \quad \square \]
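As a quick numerical check of Example 5.5.14, the partial sums of the two Frobenius series can be compared with $t^{-1/2}\sin t$ and $t^{-1/2}\cos t$. This is a sketch assuming $a_0 = 1$ and a fixed number of terms; `series_y` is an illustrative name.

```python
import math

def series_y(t, r, n_terms=30):
    """Partial sum of the Frobenius series of Example 5.5.14 for r = 1/2 or r = -1/2."""
    total = 0.0
    for n in range(n_terms):
        if r > 0:   # r = 1/2:  a_{2n} = (-1)^n / (2n + 1)!
            coeff = (-1) ** n / math.factorial(2 * n + 1)
        else:       # r = -1/2: a_{2n} = (-1)^n / (2n)!
            coeff = (-1) ** n / math.factorial(2 * n)
        total += coeff * t ** (2 * n)
    return t ** r * total

for t in (0.5, 2.0, 5.0):
    print(t, series_y(t, 0.5), math.sin(t) / math.sqrt(t),
          series_y(t, -0.5), math.cos(t) / math.sqrt(t))
```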

5.6 Laplace Transforms

Some other very useful tools in solving linear differential equations are integral transforms of the form
\[ F(s) = \int_a^b K(s,t)f(t)\,dt. \]
We say that $f$ is transformed into $F$ by means of the integral, and $K$ is called the kernel of the transform. By making a suitable choice of $K$ and of the integration limits $a$ and $b$, it is often possible to simplify a problem involving a linear differential equation. When $K(s,t) = e^{-st}$ over $[0,\infty)$, it becomes the Laplace transform. It is especially useful in two cases which arise quite often in applications such as circuit analysis: the first case is when $f(t)$ is a discontinuous function of time, and the second case is when $f(t)$ has an impulsive nature, that is, its values are zero except for a very short time interval in which it is very large.


Definition 5.6.1 For a function $f(t)$ defined on $0 \le t < \infty$, the Laplace transform of $f$ is defined by the formula
\[ F(s) = L[f(t)](s) = \int_0^{\infty} e^{-st}f(t)\,dt \equiv \lim_{\beta\to\infty}\int_0^{\beta} e^{-st}f(t)\,dt. \]

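Example 5.6.1 For the constant function $f(t) = 1$,
\[ F(s) = L[1](s) = \int_0^{\infty} e^{-st}\,dt = \lim_{\beta\to\infty}\frac{1 - e^{-s\beta}}{s} = \begin{cases} \frac{1}{s}, & \text{if } s > 0, \\ \infty, & \text{if } s \le 0. \end{cases} \quad \square \]

Example 5.6.2 For the function $f(t) = e^{at}$,
\[ F(s) = L[e^{at}](s) = \int_0^{\infty} e^{-st}e^{at}\,dt = \int_0^{\infty} e^{(a-s)t}\,dt = \lim_{\beta\to\infty}\frac{e^{(a-s)\beta} - 1}{a - s} = \begin{cases} \frac{1}{s-a}, & \text{if } s > a, \\ \infty, & \text{if } s \le a. \end{cases} \quad \square \]

Example 5.6.3 For the function $f(t) = e^{i\omega t} = \cos\omega t + i\sin\omega t$,
\[ F(s) = L[e^{i\omega t}](s) = \int_0^{\infty} e^{-st}e^{i\omega t}\,dt = \int_0^{\infty} e^{(-s+i\omega)t}\,dt = \lim_{\beta\to\infty}\frac{e^{(-s+i\omega)\beta} - 1}{-s + i\omega} = \begin{cases} \frac{1}{s - i\omega} = \frac{s + i\omega}{s^2 + \omega^2}, & \text{if } s > 0, \\ \text{undefined}, & \text{if } s \le 0. \end{cases} \quad \square \]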

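Note that, for two functions $f(t)$ and $g(t)$ and constants $a$ and $b$,
\[
\begin{aligned}
L[af(t) + bg(t)](s) &= \int_0^{\infty} e^{-st}\big(af(t) + bg(t)\big)\,dt \\
&= a\int_0^{\infty} e^{-st}f(t)\,dt + b\int_0^{\infty} e^{-st}g(t)\,dt \\
&= aL[f(t)](s) + bL[g(t)](s).
\end{aligned}
\]
Thus, in Example 5.6.3,
\[
\begin{aligned}
L[e^{i\omega t}](s) &= L[\cos\omega t + i\sin\omega t](s) = L[\cos\omega t](s) + iL[\sin\omega t](s) \\
&= \begin{cases} \frac{s}{s^2+\omega^2} + i\frac{\omega}{s^2+\omega^2}, & \text{if } s > 0, \\ \text{undefined}, & \text{if } s \le 0, \end{cases}
\end{aligned}
\]
or
\[ L[\cos\omega t](s) = \begin{cases} \frac{s}{s^2+\omega^2}, & \text{if } s > 0, \\ \text{undefined}, & \text{if } s \le 0, \end{cases} \qquad L[\sin\omega t](s) = \begin{cases} \frac{\omega}{s^2+\omega^2}, & \text{if } s > 0, \\ \text{undefined}, & \text{if } s \le 0. \end{cases} \]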
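The transforms computed in Examples 5.6.1 to 5.6.3 can be checked numerically by truncating the improper integral at a large upper limit. The sketch below uses the composite Simpson rule; the cutoff `upper` and the number of subintervals `steps` are assumptions chosen so that the neglected tail is negligible for the test values of $s$ used here.

```python
import math

def laplace_numeric(f, s, upper=60.0, steps=60000):
    """Approximate int_0^upper e^{-s t} f(t) dt by the composite Simpson rule.

    `steps` must be even; the tail beyond `upper` is ignored, so this is only
    meaningful when e^{-s t} f(t) has decayed to (nearly) zero by t = upper.
    """
    h = upper / steps
    total = f(0.0) + math.exp(-s * upper) * f(upper)
    for k in range(1, steps):
        t = k * h
        weight = 4 if k % 2 else 2
        total += weight * math.exp(-s * t) * f(t)
    return total * h / 3

s, a, w = 2.0, 1.0, 3.0
print(laplace_numeric(lambda t: 1.0, s), 1 / s)
print(laplace_numeric(lambda t: math.exp(a * t), s), 1 / (s - a))
print(laplace_numeric(lambda t: math.cos(w * t), s), s / (s * s + w * w))
print(laplace_numeric(lambda t: math.sin(w * t), s), w / (s * s + w * w))
```

For $s \le a$ in the second case the truncated integral grows with `upper`, which mirrors the divergence noted in Example 5.6.2.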


Remark: (1) The domains of $f(t)$ and $L[f(t)](s)$ are different: for example, the domain of $F(s) = L[e^{2t}](s)$ is $(2,\infty)$, while that of $f(t) = e^{2t}$ is $\mathbb{R}$.

(2) The integral $\int_0^{\infty} e^{-st}f(t)\,dt$ may fail to exist for every $s$: for example, $f(t) = e^{t^2}$.

Therefore, we need to impose some conditions on $f(t)$:

(1) $f(t)$ is piecewise continuous with only a finite number of jump discontinuities in any finite interval.

(2) $f(t)$ is of exponential order, that is, there exist constants $M$ and $c$ such that
\[ |f(t)| \le Me^{ct}, \quad 0 \le t < \infty. \]

Lemma 5.6.1 If $f(t)$ satisfies the two conditions above, then its Laplace transform $F(s) = L[f(t)](s)$ exists for all $s$ sufficiently large.

Proof: Since $f(t)$ is piecewise continuous, the integral $\int_0^A e^{-st}f(t)\,dt$ exists for all $A$, and
\[ \int_0^A e^{-st}f(t)\,dt \le \Big|\int_0^A e^{-st}f(t)\,dt\Big| \le \int_0^A e^{-st}|f(t)|\,dt \le M\int_0^A e^{-st}e^{ct}\,dt < \frac{M}{s-c}, \quad \text{if } s > c. \quad \square \]

The usefulness of the Laplace transform comes from the following theorem, which says that the operation of differentiation in $t$ is replaced by the operation of multiplication in $s$.

Theorem 5.6.2 If $f(t)$ and $f'(t)$, defined on $[0,\infty)$, satisfy the two conditions above, then
\[ L[f'(t)](s) = sL[f(t)](s) - f(0). \]

Proof: Integrating by parts,
\[
\begin{aligned}
L[f'(t)](s) &= \lim_{\beta\to\infty}\int_0^{\beta} e^{-st}f'(t)\,dt \\
&= \lim_{\beta\to\infty}\big[e^{-st}f(t)\big]_0^{\beta} + \lim_{\beta\to\infty} s\int_0^{\beta} e^{-st}f(t)\,dt \\
&= -f(0) + sL[f(t)](s). \quad \square
\end{aligned}
\]


In general, if $f, f', \ldots, f^{(n-1)}$ satisfy conditions (1) and (2), and $f^{(n)}$ satisfies (1), then
\[ L[f''(t)](s) = sL[f'(t)](s) - f'(0) = s^2L[f(t)](s) - sf(0) - f'(0), \]
\[ L[f^{(n)}(t)](s) = s^nL[f(t)](s) - s^{n-1}f(0) - s^{n-2}f'(0) - \cdots - f^{(n-1)}(0). \]

We now go back to our initial value problem for a linear differential equation:
\[ ay''(t) + by'(t) + cy(t) = f(t), \quad y(0) = y_0, \quad y'(0) = y_0'. \]
The Laplace transform of this equation becomes
\[
\begin{aligned}
L[f(t)](s) &= L[ay''(t) + by'(t) + cy(t)](s) \\
&= a\big[s^2L[y(t)](s) - sy(0) - y'(0)\big] + b\big[sL[y(t)](s) - y(0)\big] + cL[y(t)](s) \\
&= (as^2 + bs + c)L[y(t)](s) - (as + b)y(0) - ay'(0).
\end{aligned}
\]
Hence, a differential equation is transformed into an algebraic equation:
\[ L[y(t)](s) = \frac{as+b}{as^2+bs+c}\,y_0 + \frac{a}{as^2+bs+c}\,y_0' + \frac{1}{as^2+bs+c}\,L[f(t)](s). \]
That is, the right side is a function $Y(s)$ of $s$: $L[y(t)](s) = Y(s)$. Then $y(t) = L^{-1}[Y(s)](t)$, provided $L^{-1}$ is meaningful. In practice, just as in finding an antiderivative, one can find $y(t)$ from $Y(s) = L[y(t)](s)$ by inspection or from a table.

Remark: (1) The Laplace transform $L[y(t)](s)$ of a solution $y(t)$ of $L[y](t) = f(t)$ is expressed by an algebraic equation in $s$ which takes care of the initial conditions automatically, and also of the nonhomogeneous part $f(t)$. Thus we do not need to find the solutions of the homogeneous equation first in order to get the general solution.

(2) Higher order differential equations can be handled in the same way.


Example 5.6.4 Solve the initial value problem:
\[ y''(t) - 3y'(t) + 2y(t) = e^{3t}, \quad y(0) = 1, \quad y'(0) = 0. \]


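Solution: Take the Laplace transform of both sides, with $a = 1$, $b = -3$, $c = 2$, $y_0 = 1$ and $y_0' = 0$:
\[
\begin{aligned}
L[y(t)](s) &= \frac{as+b}{as^2+bs+c}\,y_0 + \frac{a}{as^2+bs+c}\,y_0' + \frac{1}{as^2+bs+c}\cdot\frac{1}{s-3} \\
&= \frac{s-3}{s^2-3s+2} + \frac{1}{(s^2-3s+2)(s-3)} \\
&= \frac{s-3}{(s-1)(s-2)} + \frac{1}{(s-1)(s-2)(s-3)} \\
&= \frac{2}{s-1} - \frac{1}{s-2} + \frac{1/2}{s-1} - \frac{1}{s-2} + \frac{1/2}{s-3} \\
&= \frac{5/2}{s-1} - \frac{2}{s-2} + \frac{1/2}{s-3} = Y(s) \\
&= L\big[\tfrac{5}{2}e^{t}\big](s) - L\big[2e^{2t}\big](s) + L\big[\tfrac{1}{2}e^{3t}\big](s) = L\big[\tfrac{5}{2}e^{t} - 2e^{2t} + \tfrac{1}{2}e^{3t}\big](s).
\end{aligned}
\]
Thus, $y(t) = \frac{5}{2}e^{t} - 2e^{2t} + \frac{1}{2}e^{3t}$. □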
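A solution obtained from a table of transforms is easy to verify by direct substitution. The sketch below hard-codes $y$, $y'$ and $y''$ for the solution found above (differentiated by hand) and evaluates the residual of the equation; the function names are illustrative.

```python
import math

def y(t):
    return 2.5 * math.exp(t) - 2 * math.exp(2 * t) + 0.5 * math.exp(3 * t)

def dy(t):
    return 2.5 * math.exp(t) - 4 * math.exp(2 * t) + 1.5 * math.exp(3 * t)

def d2y(t):
    return 2.5 * math.exp(t) - 8 * math.exp(2 * t) + 4.5 * math.exp(3 * t)

def residual(t):
    """y'' - 3y' + 2y - e^{3t}; should vanish up to rounding error."""
    return d2y(t) - 3 * dy(t) + 2 * y(t) - math.exp(3 * t)

print(y(0.0), dy(0.0))                      # initial values y(0), y'(0)
print([residual(t) for t in (0.0, 0.5, 1.0)])
```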

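Remark: The solution $y(t)$ found in Example 5.6.4 is the only continuous function whose Laplace transform is $Y(s)$. There are many other discontinuous functions with the same transform, like
\[ z(t) = \begin{cases} \frac{5}{2}e^{t} - 2e^{2t} + \frac{1}{2}e^{3t}, & \text{if } t \ne 1, 2, 3, \\ 0, & \text{if } t = 1, 2, 3, \end{cases} \]
whose Laplace transform is also $Y(s)$, since $z(t)$ differs from $y(t)$ at only three points.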

5.6.1 Properties of Laplace transforms

Theorem 5.6.3 Let $L[f(t)](s) = F(s)$. Then

(1) $L[-tf(t)](s) = \frac{d}{ds}F(s)$, and so $L[(-t)^nf(t)](s) = \frac{d^n}{ds^n}F(s)$.

(2) $L[e^{at}f(t)](s) = F(s-a)$.

Proof: (1) Since $F(s) = \int_0^{\infty} e^{-st}f(t)\,dt$,
\[ \frac{d}{ds}F(s) = \int_0^{\infty} \frac{d}{ds}(e^{-st})f(t)\,dt = \int_0^{\infty} -t\,e^{-st}f(t)\,dt = L[-tf(t)](s). \]

(2) $L[e^{at}f(t)](s) = \int_0^{\infty} e^{(a-s)t}f(t)\,dt = \int_0^{\infty} e^{-(s-a)t}f(t)\,dt = F(s-a)$. □


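Example 5.6.5 The following are easy consequences of the definition:

(1) $L[f(ct)](s) = \frac{1}{c}F\big(\frac{s}{c}\big)$, for $c > 0$.

(2) $L[1](s) = \int_0^{\infty} e^{-st}\,dt = \frac{1}{s}$.

(3) $L[t](s) = -\frac{d}{ds}L[1](s) = \frac{(-1)^2}{s^2} = \frac{1}{s^2}$.

(4) $L[t^n](s) = -\frac{d}{ds}L[t^{n-1}](s) = (-1)\frac{d}{ds}\Big[\frac{(-1)^{2(n-1)}(n-1)!}{s^n}\Big] = (-1)^{2n}\frac{n!}{s^{n+1}} = \frac{n!}{s^{n+1}}$.

(5) $L[e^{at}](s) = L[1](s-a) = \frac{1}{s-a}$.

(6) $L[e^{at}t](s) = L[t](s-a) = \frac{1}{(s-a)^2}$.

(7) $L[e^{at}t^n](s) = L[t^n](s-a) = \frac{n!}{(s-a)^{n+1}}$.

(8) $L[e^{at}\cos\omega t](s) = L[\cos\omega t](s-a) = \frac{s-a}{(s-a)^2+\omega^2}$.

(9) $L[e^{at}\sin\omega t](s) = L[\sin\omega t](s-a) = \frac{\omega}{(s-a)^2+\omega^2}$.

(10) $L[\cosh\omega t](s) = L\Big[\frac{e^{\omega t} + e^{-\omega t}}{2}\Big](s) = \frac{1}{2}\big(L[e^{\omega t}](s) + L[e^{-\omega t}](s)\big) = \frac{1}{2}\Big(\frac{1}{s-\omega} + \frac{1}{s+\omega}\Big) = \frac{s}{s^2-\omega^2}$.

(11) $L[\sinh\omega t](s) = L\Big[\frac{e^{\omega t} - e^{-\omega t}}{2}\Big](s) = \frac{1}{2}\big(L[e^{\omega t}](s) - L[e^{-\omega t}](s)\big) = \frac{1}{2}\Big(\frac{1}{s-\omega} - \frac{1}{s+\omega}\Big) = \frac{\omega}{s^2-\omega^2}$.

(12) $L[e^{at}\cosh\omega t](s) = L[\cosh\omega t](s-a) = \frac{s-a}{(s-a)^2-\omega^2}$.

(13) $L[e^{at}\sinh\omega t](s) = L[\sinh\omega t](s-a) = \frac{\omega}{(s-a)^2-\omega^2}$.

(14) $L[e^{i\omega t}t](s) = L[t](s-i\omega) = \frac{1}{(s-i\omega)^2} = \frac{s^2-\omega^2}{(s^2+\omega^2)^2} + i\,\frac{2s\omega}{(s^2+\omega^2)^2}$. □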

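Example 5.6.6 Let $f(t) = t^p$ for $p > -1$. Then, with the substitution $t = \frac{x}{s}$, so that $dt = \frac{1}{s}dx$, for $s > 0$,
\[ L[t^p](s) = \int_0^{\infty} e^{-st}t^p\,dt = \frac{1}{s^{p+1}}\int_0^{\infty} e^{-x}x^p\,dx = \frac{1}{s^{p+1}}\Gamma(p+1) = \frac{p}{s^{p+1}}\Gamma(p), \]
where $\Gamma(x)$ is the gamma function discussed in Chapter 11 (the last equality requires $p > 0$). Thus if $p$ is a positive integer $n$, then $L[t^n](s) = \frac{1}{s^{n+1}}\Gamma(n+1) = \frac{n!}{s^{n+1}}$ for $s > 0$. Using the formula $\Gamma(\frac{1}{2}) = \sqrt{\pi}$ on page 327, we get, for $s > 0$,
\[ L[t^{1/2}](s) = L[\sqrt{t}](s) = \frac{1/2}{s^{3/2}}\Gamma\big(\tfrac{1}{2}\big) = \frac{\sqrt{\pi}}{2s^{3/2}}, \]
\[ L[t^{-1/2}](s) = L\Big[\frac{1}{\sqrt{t}}\Big](s) = \frac{1}{\sqrt{s}}\Gamma\big(\tfrac{1}{2}\big) = \sqrt{\frac{\pi}{s}} \quad \Big( = \frac{2}{\sqrt{s}}\int_0^{\infty} e^{-x^2}\,dx \Big). \quad \square \]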

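Example 5.6.7 Find $f(t)$ whose Laplace transform is given as

(1) $L[f(t)](s) = \frac{-1}{(s-2)^2}$. (2) $L[f(t)](s) = \frac{-4s}{(s^2+4)^2}$. (3) $L[f(t)](s) = \frac{1}{(s-4)^3}$.

(4) $L[f(t)](s) = \frac{s-7}{25+(s-7)^2}$. (5) $L[f(t)](s) = \frac{1}{s^2-4s+9}$. (6) $L[f(t)](s) = \frac{s}{s^2-4s+9}$.

Solution: (1) Observe that $L[e^{2t}](s) = \frac{1}{s-2}$ and
\[ -\frac{1}{(s-2)^2} = \frac{d}{ds}\frac{1}{s-2}. \quad \therefore\ \frac{-1}{(s-2)^2} = L[-te^{2t}](s). \]

(2) Observe that $L[\sin 2t](s) = \frac{2}{s^2+4}$ and
\[ \frac{-4s}{(s^2+4)^2} = \frac{d}{ds}\frac{2}{s^2+4}. \quad \therefore\ \frac{-4s}{(s^2+4)^2} = L[-t\sin 2t](s). \]

(3) Observe that
\[ \frac{1}{(s-4)^3} = \frac{d^2}{ds^2}\frac{1}{2(s-4)}. \quad \therefore\ \frac{1}{(s-4)^3} = L\big[\tfrac{1}{2}t^2e^{4t}\big](s). \]

(4) Observe that $L[\cos 5t](s) = \frac{s}{s^2+5^2}$ and $\frac{s-7}{25+(s-7)^2} = L[\cos 5t](s-7)$. Thus
\[ \frac{s-7}{25+(s-7)^2} = L[e^{7t}\cos 5t](s). \]

(5) Observe that $L\big[\frac{1}{\sqrt{5}}\sin\sqrt{5}t\big](s) = \frac{1}{s^2+5}$ and $\frac{1}{s^2-4s+9} = \frac{1}{(s-2)^2+5}$. Thus
\[ \frac{1}{s^2-4s+9} = L\Big[\frac{1}{\sqrt{5}}e^{2t}\sin\sqrt{5}t\Big](s). \]

(6) Observe that $\frac{s}{s^2-4s+9} = \frac{s-2}{(s-2)^2+5} + \frac{2}{(s-2)^2+5}$, $L[\cos\sqrt{5}t](s) = \frac{s}{s^2+5}$, and $L[e^{2t}\cos\sqrt{5}t](s) = \frac{s-2}{(s-2)^2+5}$. Thus
\[ \frac{s}{s^2-4s+9} = L\Big[e^{2t}\cos\sqrt{5}t + \frac{2}{\sqrt{5}}e^{2t}\sin\sqrt{5}t\Big](s). \quad \square \]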


5.6.2 Discontinuous non-homogeneous functions

In many cases the non-homogeneous part $f(t)$ of the equation
\[ ay''(t) + by'(t) + cy(t) = f(t) \tag{5.19} \]
has points of discontinuity. The method of Laplace transforms is quite useful in such cases.

A simple example of such a function with a single jump discontinuity is the step function
\[ H_c(t) = \begin{cases} 0, & 0 \le t < c, \\ 1, & c \le t, \end{cases} \]
called the unit step function, or Heaviside function. Its Laplace transform is
\[ L[H_c(t)](s) = \int_0^{\infty} e^{-st}H_c(t)\,dt = \int_c^{\infty} e^{-st}\,dt = \lim_{\beta\to\infty}\int_c^{\beta} e^{-st}\,dt = \lim_{\beta\to\infty}\frac{e^{-cs} - e^{-\beta s}}{s} = \frac{e^{-cs}}{s}, \quad s > 0. \]

For a function $f(t)$ defined on $[0,\infty)$, let $g(t)$ be the translation of $f(t)$ by $c$ along the $t$-axis:
\[ g(t) = \begin{cases} 0, & 0 \le t < c, \\ f(t-c), & c \le t, \end{cases} \quad = H_c(t)f(t-c). \]

Theorem 5.6.4 Let $L[f(t)](s) = F(s)$. Then
\[ L[g(t)](s) = L[H_c(t)f(t-c)](s) = e^{-cs}F(s). \]

Proof: With the substitution $\xi = t - c$,
\[
\begin{aligned}
L[H_c(t)f(t-c)](s) &= \int_0^{\infty} e^{-st}H_c(t)f(t-c)\,dt = \int_c^{\infty} e^{-st}f(t-c)\,dt \\
&= \int_0^{\infty} e^{-s(\xi+c)}f(\xi)\,d\xi \\
&= e^{-cs}\int_0^{\infty} e^{-s\xi}f(\xi)\,d\xi = e^{-cs}F(s). \quad \square
\end{aligned}
\]

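Example 5.6.8 Find $f(t)$ whose Laplace transform is given as

(1) $L[f(t)](s) = \frac{e^{-s}}{s^2}$. (2) $L[f(t)](s) = \frac{e^{-3s}}{s^2-2s-3}$.

Solution: (1) Since $L[t](s) = \frac{1}{s^2}$,
\[ \frac{e^{-s}}{s^2} = L[H_1(t)(t-1)](s). \]

(2) Note that $\frac{1}{s^2-2s-3} = \frac{1}{(s-1)^2-2^2}$ and $L[\frac{1}{2}\sinh 2t](s) = \frac{1}{s^2-2^2}$. Thus
\[ \frac{1}{(s-1)^2-2^2} = L\big[\tfrac{1}{2}e^{t}\sinh 2t\big](s), \]
\[ \frac{e^{-3s}}{(s-1)^2-2^2} = L\big[\tfrac{1}{2}H_3(t)e^{t-3}\sinh 2(t-3)\big](s). \quad \square \]

Example 5.6.9 Let $f(t) = \begin{cases} t, & 0 \le t < 1, \\ 0, & 1 \le t. \end{cases}$ Find $L[f(t)](s)$.

Solution: Observe that $f(t)$ can be written as
\[ f(t) = t\big(H_0(t) - H_1(t)\big) = t - tH_1(t). \]
Hence,
\[ L[f(t)](s) = L[t](s) - L[tH_1(t)](s) = \frac{1}{s^2} + \frac{d}{ds}\frac{e^{-s}}{s} = \frac{1}{s^2} - \frac{e^{-s}}{s} - \frac{e^{-s}}{s^2}. \quad \square \]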


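Example 5.6.10 Solve the initial value problem:
\[ y''(t) - 3y'(t) + 2y(t) = f(t) = \begin{cases} 1, & 0 \le t < 1,\ 2 \le t < 3,\ 4 \le t < 5, \\ 0, & 1 \le t < 2,\ 3 \le t < 4,\ 5 \le t < \infty, \end{cases} \]
with $y(0) = 0$, $y'(0) = 0$.

Solution: By taking the Laplace transform of the equation, we get
\[ (s^2-3s+2)L[y(t)](s) = L[f(t)](s), \quad \text{or} \quad L[y(t)](s) = \frac{L[f(t)](s)}{s^2-3s+2} = \frac{L[f(t)](s)}{(s-1)(s-2)}. \]
Since $f(t) = [H_0(t) - H_1(t)] + [H_2(t) - H_3(t)] + [H_4(t) - H_5(t)]$,
\[ L[f(t)](s) = \frac{1}{s} - \frac{e^{-s}}{s} + \frac{e^{-2s}}{s} - \frac{e^{-3s}}{s} + \frac{e^{-4s}}{s} - \frac{e^{-5s}}{s}. \]
Thus,
\[ L[y(t)](s) = \frac{L[f(t)](s)}{(s-1)(s-2)} = \frac{1 - e^{-s} + e^{-2s} - e^{-3s} + e^{-4s} - e^{-5s}}{s(s-1)(s-2)}. \]
Note that
\[ \frac{1}{s(s-1)(s-2)} = \frac{1}{2}\cdot\frac{1}{s} - \frac{1}{s-1} + \frac{1}{2}\cdot\frac{1}{s-2} = L\Big[\frac{1}{2} - e^{t} + \frac{1}{2}e^{2t}\Big](s). \]
Therefore, by Theorem 5.6.4,
\[
\begin{aligned}
y(t) &= \Big[\tfrac{1}{2} - e^{t} + \tfrac{1}{2}e^{2t}\Big] - H_1(t)\Big[\tfrac{1}{2} - e^{t-1} + \tfrac{1}{2}e^{2(t-1)}\Big] \\
&\quad + H_2(t)\Big[\tfrac{1}{2} - e^{t-2} + \tfrac{1}{2}e^{2(t-2)}\Big] - H_3(t)\Big[\tfrac{1}{2} - e^{t-3} + \tfrac{1}{2}e^{2(t-3)}\Big] \\
&\quad + H_4(t)\Big[\tfrac{1}{2} - e^{t-4} + \tfrac{1}{2}e^{2(t-4)}\Big] - H_5(t)\Big[\tfrac{1}{2} - e^{t-5} + \tfrac{1}{2}e^{2(t-5)}\Big]. \quad \square
\end{aligned}
\]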
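The solution above is a signed sum of shifted copies of $g(t) = \frac12 - e^t + \frac12e^{2t}$, and since $g'' - 3g' + 2g = 1$ with $g(0) = g'(0) = 0$, it can be verified piece by piece. The sketch below evaluates the residual of the equation away from the jump points; the helper names `H`, `combine`, `f` and `residual` are illustrative, and the derivatives of $g$ are entered by hand.

```python
import math

def H(c, t):
    """Unit step H_c(t)."""
    return 1.0 if t >= c else 0.0

def g(t):
    return 0.5 - math.exp(t) + 0.5 * math.exp(2 * t)

def dg(t):
    return -math.exp(t) + math.exp(2 * t)

def d2g(t):
    return -math.exp(t) + 2 * math.exp(2 * t)

def combine(func, t):
    """sum_{c=0}^{5} (-1)^c H_c(t) func(t - c), the pattern of the solution y(t)."""
    return sum((-1) ** c * H(c, t) * func(t - c) for c in range(6))

def f(t):
    """The square-wave forcing term: 1 on [0,1), [2,3), [4,5) and 0 elsewhere."""
    return combine(lambda _: 1.0, t)

def residual(t):
    """y'' - 3y' + 2y - f at a point t that is not a jump point."""
    return combine(d2g, t) - 3 * combine(dg, t) + 2 * combine(g, t) - f(t)

for t in (0.5, 1.5, 2.5, 3.5, 4.5, 5.5):
    print(t, f(t), combine(g, t), residual(t))
```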

5.6.3 The Dirac delta function

In many physical and biological applications, the function $f(t)$ of the non-homogeneous part in the differential equation (5.19) describes phenomena of an impulsive nature, such as voltages or forces of large magnitude that act over very short time intervals. In these situations, the only information we have about $f(t)$ is that it is identically zero except on a very short time interval $(t_0-\tau, t_0+\tau)$, and that its integral over the time interval is a given number
\[ I_\tau(f) = \int_{-\infty}^{\infty} f(t)\,dt = I_0(f) \ne 0. \]
Such a function is called an impulsive function. For instance, set $t_0 = 0$, and set
\[ \delta_\tau(t) = \begin{cases} \frac{1}{2\tau}, & -\tau \le t \le \tau, \\ 0, & \tau < |t|. \end{cases} \]
Then $I_\tau(\delta_\tau) = 1$ for all $\tau > 0$. This kind of function can be idealized by prescribing it to act over shorter and shorter time intervals, that is, we let $\tau \to 0$. As a limiting case, we would like to have a function $\delta(t)$ such that
\[ \delta(t) = 0 \ \text{ for } t \ne 0, \quad \text{and} \quad I(\delta) = \int_{-\infty}^{\infty} \delta(t)\,dt = 1. \]
Of course, no ordinary function of the kind studied in elementary calculus satisfies both equations above. This kind of unit impulse function is known as a generalized function, and is usually called the Dirac delta function. In the example,
\[ \delta(t) = \lim_{\tau\to 0}\delta_\tau(t) = 0, \quad \text{if } t \ne 0, \]
\[ I(\delta) = \lim_{\tau\to 0} I_\tau(\delta_\tau) = 1, \quad \text{since } I_\tau(\delta_\tau) = 1 \text{ for all } \tau \ne 0. \]
The Dirac delta function at an arbitrary point $t_0$ is now $\delta(t-t_0)$, with $\int_{-\infty}^{\infty} \delta(t-t_0)\,dt = 1$ and
\[ \delta(t-t_0) = \begin{cases} \infty, & t = t_0, \\ 0, & t \ne t_0. \end{cases} \]

Suppose that $f(t)$ is an impulse function which is positive on $[a,b]$, zero elsewhere, and $\int_a^b f(t)\,dt = 1$. Then, for any continuous function $g(t)$ with $m \le g(t) \le M$ for all $t \in [a,b]$,
\[ mf(t) \le g(t)f(t) \le Mf(t), \]
\[ m\int_a^b f(t)\,dt \le \int_a^b g(t)f(t)\,dt \le M\int_a^b f(t)\,dt, \]
\[ m \le \int_a^b g(t)f(t)\,dt \le M. \]
Hence, as $b \to a$ with $a \le t_0 \le b$,
\[ \int_a^b g(t)f(t)\,dt \to g(t_0). \]
Therefore, for any continuous function $g(t)$ and $f(t) = \delta(t-t_0)$,
\[ \int_a^b g(t)\delta(t-t_0)\,dt = \begin{cases} g(t_0), & a \le t_0 \le b, \\ 0, & \text{otherwise}. \end{cases} \]
The Laplace transform of $\delta(t-t_0)$ is
\[ L[\delta(t-t_0)](s) = \int_0^{\infty} e^{-st}\delta(t-t_0)\,dt = e^{-st_0}, \quad \text{for } t_0 \ge 0. \]
If $t_0 = 0$, then
\[ L[\delta(t)](s) = \lim_{t_0\to 0} L[\delta(t-t_0)](s) = \lim_{t_0\to 0} e^{-st_0} = 1. \]


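Example 5.6.11 Solve the initial value problem:
\[ y''(t) + 2y'(t) + 2y(t) = \delta(t-\pi), \quad \text{with } y(0) = 0,\ y'(0) = 0. \]

Solution: The Laplace transform of the equation is
\[ (s^2+2s+2)Y(s) = e^{-\pi s}, \quad \text{or} \quad Y(s) = \frac{e^{-\pi s}}{s^2+2s+2} = \frac{e^{-\pi s}}{(s+1)^2+1}. \]
Since $\frac{1}{(s+1)^2+1} = L[e^{-t}\sin t](s)$,
\[ L[y(t)](s) = \frac{e^{-\pi s}}{(s+1)^2+1} = L\big[H_\pi(t)e^{-(t-\pi)}\sin(t-\pi)\big](s). \]
Thus $y(t) = H_\pi(t)e^{-(t-\pi)}\sin(t-\pi)$. □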


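Example 5.6.12 Solve the initial value problem:
\[ y''(t) - 4y'(t) + 4y(t) = 3\delta(t-1) + \delta(t-2), \quad \text{with } y(0) = 1,\ y'(0) = 1. \]

Solution: (1) The Laplace transform of the equation is
\[ (s^2-4s+4)Y(s) = s - 3 + 3e^{-s} + e^{-2s}, \quad \text{or} \quad Y(s) = \frac{s-3}{(s-2)^2} + \frac{3e^{-s}}{(s-2)^2} + \frac{e^{-2s}}{(s-2)^2}. \]
Since $\frac{1}{(s-2)^2} = L[te^{2t}](s)$,
\[ \frac{3e^{-s}}{(s-2)^2} + \frac{e^{-2s}}{(s-2)^2} = L\big[3H_1(t)(t-1)e^{2(t-1)} + H_2(t)(t-2)e^{2(t-2)}\big](s), \]
\[ \frac{s-3}{(s-2)^2} = \frac{s-2}{(s-2)^2} - \frac{1}{(s-2)^2} = L[(1-t)e^{2t}](s). \]
Thus,
\[ y(t) = (1-t)e^{2t} + 3H_1(t)(t-1)e^{2(t-1)} + H_2(t)(t-2)e^{2(t-2)}. \]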

(2) We now solve the same problem without the Laplace transform.

(i) For $0 \le t < 1$, the problem becomes
\[ y''(t) - 4y'(t) + 4y(t) = 0, \quad \text{with } y(0) = 1,\ y'(0) = 1. \]
Since the characteristic roots are $r_1 = r_2 = 2$, $y(t) = (a_1 + a_2t)e^{2t}$. By the initial values, $1 = y(0) = a_1$ and $1 = y'(0) = 2a_1 + a_2$, or $a_2 = -1$. Hence
\[ y(t) = (1-t)e^{2t}, \quad \text{on } [0,1). \]

(ii) Now, $y(1) = 0$ and $y'(1) = -e^2$. But at $t = 1$, $y'(t)$ is suddenly increased by 3, so that $y'(1) = 3 - e^2$. Thus, on $[1,2)$, the problem becomes
\[ y''(t) - 4y'(t) + 4y(t) = 0, \quad \text{with } y(1) = 0,\ y'(1) = 3 - e^2. \]
Since $t_0 = 1$, $y(t) = (b_1 + b_2(t-1))e^{2(t-1)}$. By the initial values, $0 = y(1) = b_1$ and $3 - e^2 = y'(1) = 2b_1 + b_2$, or $b_2 = 3 - e^2$. Hence
\[ y(t) = (3-e^2)(t-1)e^{2(t-1)}, \quad \text{on } [1,2). \]

(iii) Now, $y(2) = (3-e^2)e^2$ and $y'(2) = 3(3-e^2)e^2$. But at $t = 2$, $y'(t)$ is suddenly increased by 1. Thus, on $[2,\infty)$, the problem becomes
\[ y''(t) - 4y'(t) + 4y(t) = 0, \quad \text{with } y(2) = (3-e^2)e^2,\ y'(2) = 1 + 3(3-e^2)e^2. \]
Since $t_0 = 2$, $y(t) = (c_1 + c_2(t-2))e^{2(t-2)}$. By the initial values, $(3-e^2)e^2 = y(2) = c_1$ and $1 + 3(3-e^2)e^2 = y'(2) = 2c_1 + c_2$, or $c_2 = 1 + e^2(3-e^2)$. Hence
\[ y(t) = \big[e^2(3-e^2) + (1 + e^2(3-e^2))(t-2)\big]e^{2(t-2)}, \quad \text{on } [2,\infty). \quad \square \]
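The two approaches of Example 5.6.12 should describe the same function. The following sketch evaluates the Laplace-transform solution from part (1) and the piecewise solution from part (2) at a few sample points and compares them; the names `y_laplace` and `y_piecewise` are illustrative.

```python
import math

E2 = math.exp(2)

def H(c, t):
    """Unit step H_c(t)."""
    return 1.0 if t >= c else 0.0

def y_laplace(t):
    """Solution obtained with the Laplace transform, part (1)."""
    return ((1 - t) * math.exp(2 * t)
            + 3 * H(1, t) * (t - 1) * math.exp(2 * (t - 1))
            + H(2, t) * (t - 2) * math.exp(2 * (t - 2)))

def y_piecewise(t):
    """Solution obtained interval by interval, part (2)."""
    if t < 1:
        return (1 - t) * math.exp(2 * t)
    if t < 2:
        return (3 - E2) * (t - 1) * math.exp(2 * (t - 1))
    c1 = E2 * (3 - E2)
    c2 = 1 + E2 * (3 - E2)
    return (c1 + c2 * (t - 2)) * math.exp(2 * (t - 2))

for t in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(t, y_laplace(t), y_piecewise(t))
```

Agreement at the sample points reflects the fact that each delta impulse only changes $y'$ by a jump, which is exactly how part (2) restarts the problem at $t = 1$ and $t = 2$.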

Example 5.6.13 A mass 1 is attached to a vertical spring whose stiffness constant is $k = 1$ N/ft. The drag force exerted on the particle is $2y'(t)$. At $t = 0$, when the particle is at rest, an external force $e^{-t}$ is applied. At $t = 1$, an additional force $f(t)$ of very short duration is applied to the particle. This force imparts an impulse of 3 N·s to the particle. Find the position of the particle for $t > 1$.

Solution: The distance $y(t)$ of the particle from the equilibrium position satisfies the initial value problem
\[ y''(t) + 2y'(t) + y(t) = e^{-t} + 3\delta(t-1), \quad \text{with } y(0) = 0,\ y'(0) = 0. \]
The Laplace transform of the equation is
\[ (s^2+2s+1)Y(s) = \frac{1}{s+1} + 3e^{-s}, \quad \text{or} \quad Y(s) = \frac{1}{(s+1)^3} + \frac{3e^{-s}}{(s+1)^2}. \]
Since
\[ \frac{1}{(s+1)^3} = L\big[\tfrac{1}{2}t^2e^{-t}\big](s), \qquad \frac{3e^{-s}}{(s+1)^2} = L\big[3H_1(t)(t-1)e^{-(t-1)}\big](s), \]
we get
\[ y(t) = \frac{1}{2}t^2e^{-t} + 3H_1(t)(t-1)e^{-(t-1)}. \quad \square \]

5.7 The Convolution Integral

Let $L[f(t)](s) = F(s)$ and $L[g(t)](s) = G(s)$. Since the Laplace transform is linear,
\[ L[af(t) + bg(t)](s) = aL[f(t)](s) + bL[g(t)](s), \]
the inverse of a linear combination of the Laplace transforms $F(s)$ and $G(s)$ is easily found to be the same linear combination of $f(t)$ and $g(t)$. However, sometimes we need to find the inverse of the product of $F(s)$ and $G(s)$, and in this case the inverse is not the product of $f(t)$ and $g(t)$: in general,
\[ L[f(t)\cdot g(t)](s) \ne L[f(t)](s)\cdot L[g(t)](s). \]
Fortunately, there is an extremely interesting way of combining two functions $f$ and $g$ which resembles multiplication, and whose Laplace transform is the product of the individual transforms.

Definition 5.7.1 The convolution $(f * g)(t)$ of $f$ and $g$ is defined by the equation

\[ (f * g)(t) = \int_0^t f(t-u)g(u)\,du = \int_0^t f(v)g(t-v)\,dv = (g * f)(t). \]

Theorem 5.7.1 (1) f ∗ (g ∗ h) = (f ∗ g) ∗ h.

(2) f ∗ (g + h) = f ∗ g + f ∗ h.

(3) f ∗ 0 = 0 ∗ f = 0.


(4) L[(f ∗ g)(t)](s) = L[f(t)](s) · L[g(t)](s).

Note that $(f * 1)(t) = \int_0^t f(u)\,du \neq f(t)$ and $f * f \neq f^2$ in general. For instance, if $f(t) = \cos t$ and $g(t) = 1$, then

\[ (f * 1)(t) = \int_0^t \cos u\,du = \sin t \neq f(t), \]

\[ (f * f)(t) = \int_0^t \cos u \cos(t-u)\,du = \frac{t\cos t + \sin t}{2} \neq f(t)^2. \]

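Proof: The proofs of (1) - (3) are easy exercises. We prove (4) here.

\[
\begin{aligned}
L[f(t)](s) \cdot L[g(t)](s) &= \int_0^\infty e^{-su}f(u)\,du \int_0^\infty e^{-sv}g(v)\,dv \\
&= \int_0^\infty \int_0^\infty g(v)e^{-s(v+u)}f(u)\,du\,dv, \quad \text{set } u + v = t, \\
&= \int_0^\infty g(v) \int_v^\infty e^{-st}f(t-v)\,dt\,dv \\
&= \int_0^\infty e^{-st} \int_0^t g(v)f(t-v)\,dv\,dt \\
&= \int_0^\infty e^{-st} \left( \int_0^t f(t-v)g(v)\,dv \right) dt \\
&= \int_0^\infty e^{-st}(f * g)(t)\,dt \\
&= L[(f * g)(t)](s).
\end{aligned}
\]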

[Figure: the region $R = \{(t, v) : 0 \le v \le t\}$ in the $tv$-plane, bounded by the line $v = t$.]

The third integral is over the region $R$ in the $tv$-plane, and the order of integration is as follows: for a fixed $v$ varying from 0 to $\infty$, $t$ varies from $v$ to $\infty$, since $u$ varies from 0 to $\infty$. The fourth equality is obtained by changing the order of integration: for a fixed $t$ varying from 0 to $\infty$, $v$ varies from 0 to $t$. □
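The convolution identities noted after Theorem 5.7.1 can be checked numerically. The sketch below is illustrative only: the function name `convolve` and the use of the composite Simpson rule are assumptions of this example, not part of the text. It approximates $(f * g)(t) = \int_0^t f(t-u)g(u)\,du$ and compares the result with the closed forms for $\cos * 1$ and $\cos * \cos$, and also checks commutativity on one sample pair.

```python
import math

def convolve(f, g, t, n=2000):
    """Approximate (f * g)(t) = integral_0^t f(t - u) g(u) du by Simpson's rule (n even)."""
    if t == 0:
        return 0.0
    h = t / n
    total = f(t) * g(0.0) + f(0.0) * g(t)        # endpoints u = 0 and u = t
    for k in range(1, n):
        u = k * h
        total += (4 if k % 2 else 2) * f(t - u) * g(u)
    return total * h / 3

one = lambda u: 1.0
t = 1.3
cos_one = convolve(math.cos, one, t)             # should be close to sin t
cos_cos = convolve(math.cos, math.cos, t)        # should be close to (t cos t + sin t) / 2
print(cos_one, math.sin(t))
print(cos_cos, (t * math.cos(t) + math.sin(t)) / 2)
print(convolve(math.cos, math.sin, t), convolve(math.sin, math.cos, t))
```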

Example 5.7.1 Find the inverse Laplace transform of the functions:

\[ (1)\; F(s) = \frac{a}{s^2(s^2 + a^2)}, \qquad (2)\; F(s) = \frac{1}{s(s^2 + 2s + 2)}. \]

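Solution: (1) Note that $\frac{1}{s^2} = L[t](s)$ and $\frac{a}{s^2+a^2} = L[\sin at](s)$. Hence

\[
\begin{aligned}
L^{-1}\left[\frac{a}{s^2(s^2 + a^2)}\right](t) &= (t * \sin at)(t) \\
&= \int_0^t (t-u)\sin au\,du \\
&= \left[-\frac{t-u}{a}\cos au\right]_0^t - \frac{1}{a}\int_0^t \cos au\,du \\
&= \frac{t}{a} - \frac{1}{a^2}\sin at = \frac{at - \sin at}{a^2}.
\end{aligned}
\]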

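(2) Note that $\frac{1}{s} = L[1](s)$ and $\frac{1}{s^2+2s+2} = \frac{1}{(s+1)^2+1} = L[e^{-t}\sin t](s)$. Hence

\[
L^{-1}\left[\frac{1}{s(s^2 + 2s + 2)}\right](t) = \left(1 * e^{-t}\sin t\right)(t) = \int_0^t e^{-u}\sin u\,du = \frac12\left[1 - e^{-t}(\cos t + \sin t)\right]. \quad \square
\]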

Remark: Consider a general linear differential equation with constant coefficients:

\[ ay''(t) + by'(t) + cy(t) = f(t), \quad \text{with } y(0) = y_0, \; y'(0) = y'_0. \]

Taking the Laplace transform of this equation and solving for $Y(s)$ gives

\[ L[y(t)](s) = Y(s) = \frac{as + b}{as^2 + bs + c}\,y_0 + \frac{a}{as^2 + bs + c}\,y'_0 + \frac{1}{as^2 + bs + c} \cdot F(s) \equiv Y_1(s) + Y_2(s) + \Psi(s). \]

This shows that $y_1(t) = L^{-1}[Y_1(s)](t)$ is the solution of the homogeneous equation ($f(t) = 0$) with $y_0 = 1$, $y'_0 = 0$, that $y_2(t) = L^{-1}[Y_2(s)](t)$ is the solution of the homogeneous equation with $y_0 = 0$, $y'_0 = 1$, and that $\varphi(t) = L^{-1}[\Psi(s)](t)$ is the particular solution of the non-homogeneous equation ($f(t) \neq 0$) with $y_0 = 0 = y'_0$. Moreover, since $\frac{1}{as^2+bs+c} = \frac{1}{a}Y_2(s)$ when $y'_0 = 1$,

\[ \varphi(t) = L^{-1}\left[\frac{1}{as^2 + bs + c} \cdot F(s)\right](t) = \left(\frac{y_2}{a} * f\right)(t). \]

This method is much simpler than the variation of parameters formula discussed in Section 5.3.1.


Example 5.7.2 Solve the initial value problem:

\[ y''(t) + 4y(t) = f(t), \quad \text{with } y(0) = 3, \; y'(0) = -1. \]

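Solution: The Laplace transform of the equation is

\[ (s^2 + 4)Y(s) - 3s + 1 = F(s), \quad \text{or} \quad Y(s) = \frac{3s - 1}{s^2 + 4} + \frac{F(s)}{s^2 + 4}. \]

Thus

\[
\begin{aligned}
y(t) &= L^{-1}\left[\frac{3s-1}{s^2+4}\right](t) + L^{-1}\left[\frac{F(s)}{s^2+4}\right](t) \\
&= 3L^{-1}\left[\frac{s}{s^2+4}\right](t) - \frac12 L^{-1}\left[\frac{2}{s^2+4}\right](t) + \frac12 L^{-1}\left[F(s)\cdot\frac{2}{s^2+4}\right](t) \\
&= 3\cos 2t - \frac12 \sin 2t + \frac12 \int_0^t \sin 2(t-u)\,f(u)\,du.
\end{aligned}
\]

If $f(t) = \delta(t)$, then $L[\delta(t)](s) = F(s) = 1$. Thus $y(t) = L^{-1}\left[\frac{1}{s^2+4}\right](t) = \frac12 \sin 2t$ is the solution of $y''(t) + 4y(t) = \delta(t)$ with $y_0 = 0 = y'_0$. □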
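The convolution formula for the solution of $y'' + 4y = f$, $y(0) = 3$, $y'(0) = -1$ can be tested on a concrete forcing term. The sketch below is an illustration under stated assumptions: the forcing $f(t) = e^{-t}$ is chosen only as a sample, the names `forced_response`, `y` and `residual` are hypothetical, and the integral and derivatives are approximated numerically (Simpson's rule and finite differences).

```python
import math

def forced_response(f, t, n=2000):
    """(1/2) * integral_0^t sin(2(t - u)) f(u) du by the composite Simpson rule (n even)."""
    if t == 0:
        return 0.0
    h = t / n
    total = math.sin(2 * t) * f(0.0)             # the u = t endpoint contributes sin(0) = 0
    for k in range(1, n):
        u = k * h
        total += (4 if k % 2 else 2) * math.sin(2 * (t - u)) * f(u)
    return 0.5 * total * h / 3

def f(u):
    return math.exp(-u)                          # sample forcing term (assumption)

def y(t):
    return 3 * math.cos(2 * t) - 0.5 * math.sin(2 * t) + forced_response(f, t)

def residual(t, h=1e-3):
    """y'' + 4y - f by a central difference for y''."""
    d2 = (y(t + h) - 2 * y(t) + y(t - h)) / (h * h)
    return d2 + 4 * y(t) - f(t)

initial_slope = (y(1e-6) - y(0.0)) / 1e-6        # forward difference for y'(0)
print(y(0.0), initial_slope, residual(1.0))
```

The first two terms of `y` carry the initial values, while the convolution term starts from rest, which is why the initial conditions are unaffected by the choice of `f`.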

Example 5.7.3 Another interesting property of the cycloid, which was the solution of the brachistochrone problem (see Section 4.2.2), is that it is also the solution of the tautochrone problem: Find the curve down which a particle will slide freely under gravity alone, reaching the bottom in the same time regardless of its starting point on the curve. This problem arose in the construction of a clock pendulum whose period is independent of the amplitude of its motion. The tautochrone was found by Christian Huygens (1629-1695) in 1673 by geometrical methods, and later by Leibniz and Jakob Bernoulli using analytic arguments. Bernoulli's solution in 1690 was one of the first occasions in which a differential equation was explicitly solved.


Solution: The geometric configuration is as follows. The starting point $A(a, b)$ is joined to the terminal point 0 (the origin) by the curve $C$. The arc length $s$ is measured from the origin 0. The particle $P$ at $(x, y)$ sliding down from $A$ along $C$ satisfies, by the conservation of energy,

\[ \frac12 mv^2 = mg(b - y), \]

where $m$ is the mass of the particle, $g$ is the gravitational acceleration, and $v = \frac{ds}{dt}$ is the speed of the particle.

[Figure: the curve $C$ in the $xy$-plane joining $A(a, b)$ to the origin 0, with the particle $P(x, y)$ at height $y$, the drop $b - y$, and the arc length $s$ measured from 0.]

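Thus,

\[ dt = \frac{1}{\sqrt{2g(b-y)}}\,ds. \]

Since

\[ ds = \sqrt{dx^2 + dy^2} = \sqrt{1 + \left(\frac{dx}{dy}\right)^2}\,dy \equiv f(y)\,dy, \]

the time $T(b)$ required for the particle to slide down from $A$ to the origin 0 is

\[ T(b) = \frac{1}{\sqrt{2g}}\int_{y=0}^{y=b}\frac{1}{\sqrt{b-y}}\,ds = \frac{1}{\sqrt{2g}}\int_{y=0}^{y=b}\frac{f(y)}{\sqrt{b-y}}\,dy = \frac{1}{\sqrt{2g}}\,(g * f)(b), \]

where $g(y) = \frac{1}{\sqrt{y}}$. (Inside the convolution and the transforms below, $g$ denotes this function; in the factor $\sqrt{2g}$ it remains the gravitational acceleration.) Assume that $T(b) = T_0$ is a constant, for any $b$. Take the Laplace transform of the equation:

\[ L[g * f](s) = L[g](s) \cdot L[f](s) = \sqrt{2g}\,T_0\,L[1](s) = \sqrt{2g}\,T_0\,\frac{1}{s}. \]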


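However, by Example 5.6.6, we have

\[ L[g](s) = L\left[\frac{1}{\sqrt{y}}\right](s) = \sqrt{\frac{\pi}{s}}, \quad s > 0. \]

Hence,

\[ L[f](s) = \frac{L[g * f](s)}{L[g](s)} = \frac{\sqrt{2g}\,T_0}{\sqrt{\pi}}\,\frac{1}{\sqrt{s}} = \frac{\sqrt{2g}\,T_0}{\pi}\,\frac{\sqrt{\pi}}{\sqrt{s}}. \]

Therefore,

\[ \sqrt{1 + \left(\frac{dx}{dy}\right)^2} = f(y) = \frac{\sqrt{2g}\,T_0}{\pi}\,L^{-1}\left[\frac{\sqrt{\pi}}{\sqrt{s}}\right](y) = \frac{\sqrt{2g}\,T_0}{\pi}\,\frac{1}{\sqrt{y}}, \]

and so

\[ \frac{dx}{dy} = \sqrt{\frac{2\alpha - y}{y}}, \quad \alpha = \frac{gT_0^2}{\pi^2}, \]

whose solution is the same one obtained in Section 4.2.2, the cycloid. Note that, in this case, the roles of the $x$ and $y$ axes are exchanged from the case in Section 4.2.2. □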

Remark: For linear systems of differential equations, students are recommended to take the course Math. 300 Linear Algebra.

Chapter 6

Vectors in the Space

6.1 Cartesian Coordinates

The points, lines and planes in the 3-space can be described in terms of three mutually orthogonal real lines, called the Cartesian coordinate frame, crossing at a point, which is called the origin.

In naming the three coordinate axes, we choose a particular orientation, called right-handed, as follows: Hold one axis, say the z-axis, with your right hand so that the thumb points along the positive z-axis. Then we name the remaining two axes the x- and y-axes in such a way that the fingers curl from the positive x-axis toward the positive y-axis in the counterclockwise direction. With this coordinate frame, a point P in the space can be expressed by a triple of real numbers P = (x, y, z), called the Cartesian coordinates of P, and the numbers x, y and z are called the components of P. The origin is written as O = (0, 0, 0) ≡ 0, and the 3-space is written as

\[ \mathbb{R}^3 = \{(x, y, z) \mid x, y, z \in \mathbb{R}\}. \]

Among the points in $\mathbb{R}^3$, one can define addition and scalar multiplication componentwise: for $P = (x, y, z)$ and $Q = (x', y', z') \in \mathbb{R}^3$, $r \in \mathbb{R}$,

\[ P + Q = (x, y, z) + (x', y', z') = (x + x', y + y', z + z'), \qquad rP = r(x, y, z) = (rx, ry, rz). \]

By considering a point $P = (x, y, z)$ as an arrow from the origin to P, these rules are simply the parallelogram rule. The distance (or length) from the origin to a point $P = (x, y, z)$ is defined by the Pythagorean rule and denoted by

\[ |OP| = \sqrt{x^2 + y^2 + z^2}. \]


Thus, an arrow has not only a direction, but also a magnitude: Such an arrow is called a vector in the 3-space. The distance between two points $P = (x, y, z)$ and $Q = (x', y', z') \in \mathbb{R}^3$ is defined as usual:

\[ |PQ| = \sqrt{(x' - x)^2 + (y' - y)^2 + (z' - z)^2}, \]

which is the length of the vector $\overrightarrow{PQ} = (x' - x, y' - y, z' - z)$. From the definition of the operations, addition and scalar multiplication, one can easily derive the following rules:

Theorem 6.1.1 Let u, v, w be vectors in R3 and a, b ∈ R be scalars. Then

(1) u + v = v + u,

(2) (u + v) + w = u + (v + w),

(3) u + 0 = u,

(4) u + (−u) = 0,

(5) 0u = 0, 1u = u,

(6) a(bu) = (ab)u,

(7) a(u + v) = au + av,

(8) (a + b)u = au + bu.

A vector u of length 1 is called a unit vector. If $v \neq 0$, then $u = \frac{v}{|v|}$ is a unit vector, called the direction of v. Thus $v = |v|u$. The standard unit vectors are

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).

Any vector v = (v1, v2, v3) ∈ R3 can be written as

v = (v1, v2, v3) = v1i + v2j + v3k.

Definition 6.1.1 The dot product of two vectors $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3) \in \mathbb{R}^3$ is defined by

\[ u \cdot v = u_1v_1 + u_2v_2 + u_3v_3. \]

Thus, $|v| = \sqrt{v_1^2 + v_2^2 + v_3^2} = \sqrt{v \cdot v} \geq 0$.

The following is an easy consequence of the definition.

Theorem 6.1.2 Let u, v, w be vectors in $\mathbb{R}^3$ and $\lambda \in \mathbb{R}$ be a scalar. Then

(1) $u \cdot v = v \cdot u$,

(2) $(u + v) \cdot w = u \cdot w + v \cdot w$,

(3) $(\lambda u) \cdot v = \lambda(u \cdot v)$,

(4) $u \cdot u \geq 0$, and equality holds if and only if $u = 0$.

By (1) and (2), $u \cdot (v + w) = u \cdot v + u \cdot w$, and by (1) and (3), $u \cdot (\lambda v) = \lambda(u \cdot v)$.

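Theorem 6.1.3 (Cauchy-Schwarz inequality) For vectors u, v in $\mathbb{R}^3$,

\[ |u \cdot v| \leq |u||v|, \]

where, for $v \neq 0$, equality holds if and only if $u = \lambda v$ for some $\lambda \in \mathbb{R}$.

Hence, for nonzero u and v, $-1 \leq \frac{u \cdot v}{|u||v|} \leq 1$, and so there exists a unique number $\theta \in [0, \pi]$ such that

\[ \cos\theta = \frac{u \cdot v}{|u||v|}. \]

Definition 6.1.2 The number $\theta$ is called the angle between u and v. u and v are said to be orthogonal (or perpendicular) if $u \cdot v = 0$, denoted by $u \perp v$.

Thus, for nonzero u and v, $u \perp v$ if and only if $\cos\theta = 0$, or $\theta = \frac{\pi}{2}$.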

The plane $\mathbb{R}^2$ can be naturally situated in $\mathbb{R}^3$ as the xy-plane, and a unit vector u in $\mathbb{R}^2$ can be written as

\[ u = \cos\theta\,i + \sin\theta\,j, \quad \text{for some } \theta \in [0, 2\pi). \]

Let u be a unit vector and v be any vector. Then

\[ u \cdot v = |u||v|\cos\theta = |v|\cos\theta, \]

where $\theta$ is the angle between u and v. This is the signed length of the orthogonal projection $\mathrm{proj}_u v$ of v onto the line determined by u, or the scalar component of v in the u direction. Thus

\[ \mathrm{proj}_u v = (u \cdot v)u = (|v|\cos\theta)u, \]

which is the vector component of v in the u direction.

In general, for any two vectors $u \neq 0$ and v in $\mathbb{R}^3$, the (orthogonal) projection of v in the u direction is

\[ \mathrm{proj}_u v = \left(\frac{u \cdot v}{|u|}\right)\frac{u}{|u|} = \left(\frac{u \cdot v}{|u|^2}\right)u = \left(\frac{|v|\cos\theta}{|u|}\right)u. \]


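Note that this is the orthogonal projection of v in the u direction, since

\[ (v - \mathrm{proj}_u v) \cdot \mathrm{proj}_u v = \left(v - \frac{u \cdot v}{|u|^2}u\right) \cdot \frac{u \cdot v}{|u|^2}u = \frac{(u \cdot v)^2}{|u|^2} - \frac{(u \cdot v)^2}{|u|^2} = 0. \]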

Thus, any vector v can be written as the sum of two orthogonal vectors:

v = projuv + (v − projuv).
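The decomposition of v into its projection onto u and an orthogonal remainder can be checked on a concrete pair of vectors. The following sketch is illustrative: the helper names `dot`, `norm`, `angle` and `proj` and the sample vectors are assumptions of this example, not notation from the text.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle in [0, pi] between two nonzero vectors."""
    return math.acos(dot(u, v) / (norm(u) * norm(v)))

def proj(u, v):
    """Orthogonal projection of v onto the line spanned by the nonzero vector u."""
    c = dot(u, v) / dot(u, u)
    return tuple(c * a for a in u)

u, v = (1.0, 2.0, 2.0), (3.0, 0.0, 4.0)
p = proj(u, v)                                   # component of v along u
r = tuple(b - a for a, b in zip(p, v))           # remainder v - proj_u v
print(p, r, dot(p, r), angle(u, v))
```

Here `dot(p, r)` should vanish up to rounding, and `abs(dot(u, v)) <= norm(u) * norm(v)` illustrates the Cauchy-Schwarz inequality.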

This orthogonal projection of a vector onto another vector can be used to measure the work done by a force F in moving an object through a distance d from a point P to another point Q, that is, the work is the scalar component of F in the direction $\overrightarrow{PQ}$ times the length of $\overrightarrow{PQ}$:

\[ \text{Work} = (|F|\cos\theta)\,|\overrightarrow{PQ}| = F \cdot \overrightarrow{PQ}. \]

6.2 The Cross Product

The dot product on $\mathbb{R}^3$ results in a scalar value, not a vector. There is another operation on $\mathbb{R}^3$ that results in a vector, called the cross product, which is widely used to describe the effects of forces in studies of electricity, magnetism, fluid flows, orbital mechanics, etc.

Recall that the determinant of a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is defined as

\[ \det A = |A| = ad - bc. \]

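Then it is quite easy to verify the following four basic properties:

(1) $\begin{vmatrix} ka & kb \\ c & d \end{vmatrix} = k\begin{vmatrix} a & b \\ c & d \end{vmatrix}$, for all $k \in \mathbb{R}$,

(2) $\begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}$,

(3) $\begin{vmatrix} c & d \\ a & b \end{vmatrix} = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}$,

(4) $\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1$.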


By (3), (1), and (2), the linearity properties (1) and (2) also hold for the second row. Since, by definition, $|A| = |A^T|$ holds, the above properties also hold with respect to columnwise operations: for example, $\begin{vmatrix} ka & b \\ kc & d \end{vmatrix} = k\begin{vmatrix} a & b \\ c & d \end{vmatrix}$ for all $k \in \mathbb{R}$. By (3) again, the determinant of any matrix with equal rows or equal columns is zero. Moreover, one can easily verify the following:

\[ \begin{vmatrix} a & b \\ c + ka & d + kb \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}, \qquad |AB| = |A||B|. \]

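For a $3 \times 3$ matrix $A = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}$, the determinant of A is defined by

\[ \det A = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = a_1\begin{vmatrix} b_2 & b_3 \\ c_2 & c_3 \end{vmatrix} - a_2\begin{vmatrix} b_1 & b_3 \\ c_1 & c_3 \end{vmatrix} + a_3\begin{vmatrix} b_1 & b_2 \\ c_1 & c_2 \end{vmatrix}. \]

Then one can easily verify that all the same properties of the determinant for $2 \times 2$ matrices also hold for $3 \times 3$ matrices.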

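Let $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ be two nonzero vectors in $\mathbb{R}^3$. The cross product of these two vectors is defined formally as

\[ u \times v = \begin{vmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix} i - \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix} j + \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} k. \]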

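The following properties of the cross product follow directly from the properties of the determinant:

Theorem 6.2.1 Let u, v and w $\in \mathbb{R}^3$, and $k, \ell \in \mathbb{R}$. Then

(1) $(ku) \times (\ell v) = k\ell(u \times v)$,

(2) $u \times (v + w) = u \times v + u \times w$,

(3) $(u + v) \times w = u \times w + v \times w$,

(4) $v \times u = -u \times v$,

(5) $i \times j = k$, $j \times k = i$, $k \times i = j$,

(6) $u \times u = 0$,

(7) $u \cdot (v \times w) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}$,

(8) $|u \times v|^2 = |u|^2|v|^2\sin^2\theta$, where $\theta = \angle(u, v)$.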

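Properties (1), (2) and (3) mean that the cross product is linear in both variables u and v. Property (7) can be seen by a direct expansion of the definition:

\[
\begin{aligned}
u \cdot (v \times w) &= (u_1, u_2, u_3) \cdot \left( \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix},\; -\begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix},\; \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} \right) \\
&= u_1\begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} - u_2\begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix} + u_3\begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} \\
&= \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}.
\end{aligned}
\]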

Hence, $u \cdot (u \times v) = 0 = v \cdot (u \times v)$ and so $(u \times v) \perp u, v$. Hence, if we set

\[ n = \frac{u \times v}{|u \times v|}, \]

then n is the unit vector orthogonal to both u and v, and right-handed with u and v: that is, n is in the direction of the thumb when the fingers of the right hand wind from u to v.

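Now (8) is proven by

\[
\begin{aligned}
|u \times v|^2 &= \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix}^2 + \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix}^2 + \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}^2 \\
&= (u_2v_3 - u_3v_2)^2 + (u_1v_3 - u_3v_1)^2 + (u_1v_2 - u_2v_1)^2 \\
&= (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1v_1 + u_2v_2 + u_3v_3)^2 \\
&= |u|^2|v|^2 - (u \cdot v)^2 \\
&= |u|^2|v|^2 - |u|^2|v|^2\cos^2\theta \\
&= |u|^2|v|^2\sin^2\theta.
\end{aligned}
\]

Note that $|u \times v| = |u||v|\sin\theta$ is nothing but the area of the parallelogram spanned by u and v.

Therefore, $u \times v = |u \times v|\,n = (|u||v|\sin\theta)\,n$.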


For any three vectors u, v and w $\in \mathbb{R}^3$,

\[ (u \times v) \cdot w = |u \times v|(n \cdot w) = |u \times v||w|\cos\varphi = (|u||v|\sin\theta)(|w|\cos\varphi) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \]

is, up to sign, the volume of the parallelepiped spanned by the three vectors, since $|u||v|\sin\theta$ is the area of the parallelogram spanned by u and v, and $|w|\cos\varphi$ is the signed height (projection) of w in the n direction, where $\varphi$ is the angle between n and w. Thus, the absolute value of the determinant of a $3 \times 3$ matrix is the volume of the parallelepiped spanned by its three row (or column) vectors.
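The identities of Theorem 6.2.1 are easy to test on integer vectors, where the arithmetic is exact. The sketch below is illustrative: the helper names `dot`, `cross` and `det3` and the sample vectors are assumptions of this example. It checks orthogonality of $u \times v$ to u and v, the identity $|u \times v|^2 = |u|^2|v|^2 - (u \cdot v)^2$ used in the proof of (8), and the triple product formula.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product from the cofactor expansion along the row (i, j, k)."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c (expansion along the first row)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

u, v, w = (1, 2, 3), (0, 1, 4), (5, 6, 0)
n = cross(u, v)
print(n, dot(n, u), dot(n, v))                   # n is orthogonal to u and v
print(dot(n, n), dot(u, u) * dot(v, v) - dot(u, v) ** 2)
print(dot(n, w), det3(u, v, w))                  # signed volume of the parallelepiped
```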

6.3 Lines and Planes in Space

(1) In space, a line is determined either by two given points on the line or by a point on the line and the direction of the line. Thus, the line passing through a point $P_0 = (x_0, y_0, z_0)$ parallel to a nonzero vector $v = (v_1, v_2, v_3)$ is the set of points $P = (x, y, z)$ such that $\overrightarrow{P_0P} = tv$ with $t \in \mathbb{R}$. Thus

\[ (x, y, z) = (x_0, y_0, z_0) + t(v_1, v_2, v_3) = (x_0 + tv_1, y_0 + tv_2, z_0 + tv_3), \]

and so the position vector is

\[ \gamma(t) = (x, y, z) = P_0 + tv, \quad t \in \mathbb{R}. \]

For the line passing through two given points $P_0 = (x_0, y_0, z_0)$ and $P_1 = (x_1, y_1, z_1)$, the direction is given by $\overrightarrow{P_0P_1} = (x_1 - x_0, y_1 - y_0, z_1 - z_0)$, and so the above equation can still be used with $v = \overrightarrow{P_0P_1}$. Thus

\[ \mathbf{x} = (x, y, z) = (x_0, y_0, z_0) + t(x_1 - x_0, y_1 - y_0, z_1 - z_0) = (1 - t)P_0 + tP_1, \quad t \in \mathbb{R}. \]

In particular, the line segment joining $P_0 = (x_0, y_0, z_0)$ to $P_1 = (x_1, y_1, z_1)$ is given by

\[ \mathbf{x} = P_0 + t\,\overrightarrow{P_0P_1} = (1 - t)P_0 + tP_1, \quad t \in [0, 1]. \]

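(2) The distance d from a point $Q = (q_1, q_2, q_3)$ to a line through $P = (p_1, p_2, p_3)$ in the direction $v = (v_1, v_2, v_3)$ can be computed by

\[ d = \frac{|\overrightarrow{PQ} \times v|}{|v|}, \]

since $d = |\overrightarrow{PQ}|\sin\theta$ for $\theta = \angle(\overrightarrow{PQ}, v)$, and $|\overrightarrow{PQ} \times v| = |\overrightarrow{PQ}||v|\sin\theta$.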

(3) A plane in space is determined by a point on it and a vector that is perpendicular, or normal, to the plane: Let $P_0 = (x_0, y_0, z_0)$ be a point on the plane and $n = (a, b, c)$ a vector normal to the plane. Then for any point $P = (x, y, z)$ on the plane, $\overrightarrow{P_0P}$ is orthogonal to n. Hence,

\[ 0 = n \cdot \overrightarrow{P_0P} = a(x - x_0) + b(y - y_0) + c(z - z_0) = ax + by + cz + d, \quad \text{where } d = -(ax_0 + by_0 + cz_0). \]

Two planes are parallel if their normal vectors are parallel.

A plane is also determined by three non-collinear points P, Q, and R: In this case, $\overrightarrow{PQ} \times \overrightarrow{PR} = n$ is a normal vector to the plane and so the above equation is still valid.

(4) The intersection of two non-parallel planes with normal vectors $n_1$ and $n_2$ is a line parallel to $n_1 \times n_2$. Thus, by finding a common point $P_0 = (x_0, y_0, z_0)$ of the two planes, the equation of the line is determined by the same formula as in (1) above.

(5) The distance from a point $Q = (q_1, q_2, q_3)$ to a plane $ax + by + cz + d = 0$: Choose any point $P_0 = (x_0, y_0, z_0)$ on the plane; then the distance from Q to the plane is just the absolute value of the scalar component of $\overrightarrow{P_0Q}$ in the $n = (a, b, c)$ direction. Thus the distance is

\[ \left|\overrightarrow{P_0Q} \cdot \frac{n}{|n|}\right| = \frac{|a(q_1 - x_0) + b(q_2 - y_0) + c(q_3 - z_0)|}{\sqrt{a^2 + b^2 + c^2}} = \frac{|aq_1 + bq_2 + cq_3 + d|}{\sqrt{a^2 + b^2 + c^2}}. \]

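(6) The angle $\theta$ between two planes $ax + by + cz + d = 0$ and $a'x + b'y + c'z + d' = 0$, with normal vectors $n_1 = (a, b, c)$ and $n_2 = (a', b', c')$, satisfies

\[ \cos\theta = \frac{n_1 \cdot n_2}{|n_1||n_2|} = \frac{aa' + bb' + cc'}{\sqrt{a^2 + b^2 + c^2}\,\sqrt{a'^2 + b'^2 + c'^2}}. \]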
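The distance formulas in (2) and (5) translate directly into code. The following sketch is illustrative: the function names `dist_point_line` and `dist_point_plane` and the sample data are assumptions of this example.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(sum(a * a for a in u))

def dist_point_line(Q, P, v):
    """Distance from Q to the line through P with nonzero direction v: |PQ x v| / |v|."""
    PQ = tuple(q - p for p, q in zip(P, Q))
    return norm(cross(PQ, v)) / norm(v)

def dist_point_plane(Q, a, b, c, d):
    """Distance from Q to the plane ax + by + cz + d = 0."""
    return abs(a * Q[0] + b * Q[1] + c * Q[2] + d) / math.sqrt(a * a + b * b + c * c)

Q = (1.0, 2.0, 3.0)
print(dist_point_line(Q, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))   # distance to the z-axis
print(dist_point_plane(Q, 1.0, 2.0, 2.0, -3.0))               # plane x + 2y + 2z - 3 = 0
```

For the z-axis the result is $\sqrt{1^2 + 2^2} = \sqrt{5}$, and for the sample plane it is $|1 + 4 + 6 - 3|/3 = 8/3$.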

6.4 Cylinders and Quadric Surfaces

A cylinder is a surface generated by moving a straight line along a given planar curve while holding the line parallel to a fixed line. The curve is called the generating curve for the cylinder. Any curve $f(x, y) = c$ in the xy-plane can be a generating curve for a cylinder parallel to the z-axis.

For example, $y = x^2$ represents a parabolic cylinder (parallel to the z-axis), and $y^2 - z^2 = 1$ represents a hyperbolic cylinder (parallel to the x-axis).

A quadric surface is the graph in space of a second degree equation in x, y, z. The general form is:

\[ Ax^2 + By^2 + Cz^2 + Dxy + Eyz + Fxz + Gx + Hy + Jz + K = 0. \]


This equation can be simplified by translation and rotation. The cylinders above are also examples of quadric surfaces. The basic quadric surfaces are the following:

(1) Ellipsoid: $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$,

(2) Elliptic paraboloid: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{z}{c}$,

(3) Hyperbolic paraboloid: $\frac{x^2}{a^2} - \frac{y^2}{b^2} = \frac{z}{c}$,

(4) Elliptic cone: $\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{z^2}{c^2}$,

(5) Hyperboloid of one sheet: $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$,

(6) Hyperboloid of two sheets: $\frac{x^2}{a^2} - \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$.

Chapter 7

Vector-valued functions

7.1 Vector Functions

The position of a moving particle in space can be represented in vector form by three parametric equations as:

\[ \gamma(t) = (x(t), y(t), z(t)), \quad t \in I = [a, b], \]

called a vector function, or a vector-valued function on I. The parametric functions x(t), y(t) and z(t) are called the component functions of the position vector $\gamma(t)$. The trace, or the image, of this particle $\gamma$ is called a curve in $\mathbb{R}^3$, and $\gamma(t)$ itself is called the particle's path, or a parametrization of the curve.

Definition 7.1.1 Let $\gamma(t) = (x(t), y(t), z(t))$, $t \in I$, be a path, and L a vector in $\mathbb{R}^3$. We say that L is the limit of $\gamma(t)$ as t approaches $t_0$, and write

\[ \lim_{t \to t_0} \gamma(t) = L, \]

if, for every number $\varepsilon > 0$, there exists a corresponding number $\delta > 0$ such that

\[ |\gamma(t) - L| < \varepsilon, \quad \text{whenever } 0 < |t - t_0| < \delta. \]

$\gamma(t)$ is continuous at $t_0 \in I$ if $\lim_{t \to t_0} \gamma(t) = \gamma(t_0)$. $\gamma(t)$ is continuous on I if it is continuous at every point of I.

If $L = (\ell_1, \ell_2, \ell_3)$, then $\lim_{t \to t_0} \gamma(t) = L$ if and only if

\[ \lim_{t \to t_0} x(t) = \ell_1, \quad \lim_{t \to t_0} y(t) = \ell_2, \quad \lim_{t \to t_0} z(t) = \ell_3. \]


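Thus, $\lim_{t \to t_0}(x(t), y(t), z(t)) = \left(\lim_{t \to t_0} x(t), \lim_{t \to t_0} y(t), \lim_{t \to t_0} z(t)\right)$.

The derivative of a vector-valued function $\gamma(t) = (x(t), y(t), z(t))$, $t \in I$: The difference vector between the particle's positions at times $t$ and $t + \Delta t$ is defined as

\[
\begin{aligned}
\Delta\gamma(t) &= \gamma(t + \Delta t) - \gamma(t) \\
&= (x(t + \Delta t), y(t + \Delta t), z(t + \Delta t)) - (x(t), y(t), z(t)) \\
&= (x(t + \Delta t) - x(t),\; y(t + \Delta t) - y(t),\; z(t + \Delta t) - z(t)).
\end{aligned}
\]

Hence, the derivative of $\gamma(t)$ at t is defined as

\[ \gamma'(t) = \frac{d\gamma}{dt}(t) = \lim_{\Delta t \to 0} \frac{\Delta\gamma(t)}{\Delta t} = \left.\left(\frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right)\right|_t. \]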

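[Figure: the position vectors $\gamma(t)$ and $\gamma(t + \Delta t)$, the difference vector $\Delta\gamma(t)$, and the tangent vector $\gamma'(t)$.]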

$\gamma(t)$ is called a regular curve, or smooth curve, if $\gamma'(t) \neq 0$ for all t. In this case $\gamma'(t)$ is the vector tangent to the curve at $P = \gamma(t)$, and the tangent line to the curve at P is the line through the point P parallel to $\gamma'(t)$, given as $L(\lambda) = \gamma(t) + \lambda\gamma'(t)$, $\lambda \in \mathbb{R}$.

A continuous curve that is made up of a finite number of smooth curves, i.e., pieces on which $\gamma'(t) \neq 0$, is called piecewise smooth.

Definition 7.1.2 $\gamma'(t) = v(t)$ is called the velocity vector; it points in the direction of motion, and $|v(t)|$ is called the speed of the particle. $a(t) = v'(t) = \gamma''(t)$ is the acceleration vector of the particle. $T(t) = \frac{v(t)}{|v(t)|}$ is the unit vector in the direction of motion.

The proof of the following theorem is an exercise:

Theorem 7.1.1 Let $\alpha(t)$ and $\beta(t)$ be differentiable vector functions on I, $c(t) = c$ a constant vector function, $f(t)$ a differentiable function on I, and $k \in \mathbb{R}$. Then

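(1) $c'(t) = 0$,

(2) $(k\alpha)'(t) = k\alpha'(t)$,

(3) $(f\alpha)'(t) = f'(t)\alpha(t) + f(t)\alpha'(t)$,

(4) $(\alpha \pm \beta)'(t) = \alpha'(t) \pm \beta'(t)$,

(5) $(\alpha \cdot \beta)'(t) = \alpha'(t) \cdot \beta(t) + \alpha(t) \cdot \beta'(t)$,

(6) $(\alpha \times \beta)'(t) = \alpha'(t) \times \beta(t) + \alpha(t) \times \beta'(t)$,

(7) $(\alpha \circ f)'(t) = f'(t)\alpha'(f(t))$.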

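A curve on a sphere centered at the origin has a position vector of constant length, so that

\[ \gamma(t) \cdot \gamma(t) = c, \quad \forall\, t \in I. \]

Thus by rule (5) above,

\[ 0 = \frac{d}{dt}\left(\gamma(t) \cdot \gamma(t)\right) = 2\gamma(t) \cdot \gamma'(t), \]

which means that $\gamma(t) \perp \gamma'(t)$. That is, if $|\gamma(t)|$ is a constant, then $\gamma'(t)$ is always orthogonal to $\gamma(t)$.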

Definition 7.1.3 For a continuous vector function $\gamma(t) = (x(t), y(t), z(t))$, $t \in I = [a, b]$, the indefinite integral of $\gamma$ is the set of all antiderivatives, denoted by

\[ \int \gamma(t)\,dt = R(t) + C, \quad \text{where } R'(t) = \gamma(t). \]

The definite integral of $\gamma$ over I is

\[ \int_a^b \gamma(t)\,dt = \left(\int_a^b x(t)\,dt,\; \int_a^b y(t)\,dt,\; \int_a^b z(t)\,dt\right). \]

7.2 Projectile Motion

The motion of a projectile is governed by Newton's second law of motion: the force acting on a projectile equals the mass m of the projectile times its acceleration, that is,

\[ F = m\frac{d^2\gamma}{dt^2}, \]

where $\gamma(t)$ is the position vector of the projectile at time $t \in I$.


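In an idealized situation, we assume that the only force acting on the particle is the constant force of gravity pointing straight down, in the $-j$ direction, with gravitational acceleration g. Thus, we assume $F = -mg\,j$, and so

\[ m\frac{d^2\gamma}{dt^2} = -mg\,j, \quad \text{or} \quad \frac{d^2\gamma}{dt^2} = -g\,j, \]

with the initial conditions $\gamma(0) = 0$ and $\frac{d\gamma}{dt}(0) = \mathbf{v}_0 = (v_0\cos\theta, v_0\sin\theta)$, where $v_0 = |\mathbf{v}_0|$.

The first and the second integrals of Newton's equation are:

\[
\begin{aligned}
\frac{d\gamma}{dt}(t) &= v(t) = -gt\,j + \mathbf{v}_0, \\
\gamma(t) &= -\tfrac12 gt^2\,j + \mathbf{v}_0 t + \gamma(0) \\
&= -\tfrac12 gt^2\,j + (v_0\cos\theta)t\,i + (v_0\sin\theta)t\,j + 0 \\
&= (v_0\cos\theta)t\,i + \left[(v_0\sin\theta)t - \tfrac12 gt^2\right]j = (x(t), y(t)).
\end{aligned}
\]

Hence, the components of the projectile's position vector $\gamma(t)$ are

\[ x(t) = (v_0\cos\theta)t, \qquad y(t) = (v_0\sin\theta)t - \tfrac12 gt^2, \]

or $y = (\tan\theta)x - \left(\frac{g}{2v_0^2}\sec^2\theta\right)x^2$, which is a parabola.

The projectile reaches its highest point when its vertical velocity component is zero:

\[ \frac{dy}{dt}(t) = v_0\sin\theta - gt = 0, \quad \text{thus } t_1 = \frac{v_0\sin\theta}{g}. \]

At $t = t_1$, it reaches the maximum height $y(t_1) = \frac{(v_0\sin\theta)^2}{2g}$. The times at which the projectile is on the ground are $t_0 = 0$ and $t_2 = 2t_1 = \frac{2v_0\sin\theta}{g}$, and so the range of the projectile is $x(t_2) = \frac{v_0^2}{g}\sin 2\theta$.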
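The formulas for the flight time, maximum height and range can be cross-checked against the position functions $x(t)$ and $y(t)$. The sketch below is illustrative: the function names, the value $g = 9.8$ and the launch data $v_0 = 20$, $\theta = \pi/4$ are assumptions of this example.

```python
import math

def position(v0, theta, t, g=9.8):
    """(x(t), y(t)) for launch speed v0 and launch angle theta from the origin."""
    return (v0 * math.cos(theta) * t,
            v0 * math.sin(theta) * t - 0.5 * g * t * t)

def flight_summary(v0, theta, g=9.8):
    """Time of the highest point, maximum height, landing time and range."""
    t1 = v0 * math.sin(theta) / g
    return {"t_top": t1,
            "height": (v0 * math.sin(theta)) ** 2 / (2 * g),
            "t_land": 2 * t1,
            "range": v0 ** 2 * math.sin(2 * theta) / g}

v0, theta = 20.0, math.pi / 4
info = flight_summary(v0, theta)
x_land, y_land = position(v0, theta, info["t_land"])
x_top, y_top = position(v0, theta, info["t_top"])
print(info)
print(x_land, y_land, y_top)
```

At the landing time the height returned by `position` should be zero up to rounding, and the horizontal coordinate should agree with the range formula.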

7.3 Arc Length

In order to describe the shape of a curve in the space, one may imagine flying a plane along the curve and describing the position of the plane at time t. However, this description of the curve depends on the speed of the plane. If the plane flies at a constant speed, the curve can be described more reliably. In this section, we discuss how to do this.


Let $\gamma(t) = (x(t), y(t), z(t))$, $t \in I = [a, b]$, be a smooth curve on I. From the difference quotient formula, we see that $|\Delta\gamma(t)| = |\gamma(t + \Delta t) - \gamma(t)|$ approximates the arc length $\Delta s$ of the curve segment:

\[ \Delta s \approx |\gamma(t + \Delta t) - \gamma(t)| \approx |\gamma'(t)|\,\Delta t \]

for small $\Delta t > 0$, by the mean value theorem applied to each component. Thus, as $\Delta t \to 0$, we get

\[ ds = |\gamma'(t)|\,dt, \]

which is called the length density of the curve $\gamma$. The total length of the curve is defined by the integration of this:

Definition 7.3.1 Let $\gamma(t) = (x(t), y(t), z(t))$, $t \in I = [a, b]$, be a smooth curve such that $\gamma'(t) \neq 0$ for all t. The arc length of $\gamma$ is defined by

\[ L(\gamma) = \int_\gamma ds = \int_a^b |\gamma'(t)|\,dt = \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt. \]

It is an easy exercise to show that this arc length of the curve is independent of the parametrization of the curve: that is, independent of the speed.

From a fixed base point $P = \gamma(t_0)$ on the curve, the distance of any other point $Q = \gamma(t)$ on the curve along the curve is measured by

\[ s = g(t) = \int_{t_0}^t |\gamma'(\tau)|\,d\tau, \]

which is called the arc length function of the curve from P to Q. It is a directed distance since $g(t) > 0$ if $t > t_0$, and $g(t) < 0$ if $t < t_0$. Moreover, $g : [a, b] \to [c, d]$, $s = g(t)$, is a differentiable function of t, since, by the fundamental theorem of Calculus,

\[ \frac{ds}{dt} = \frac{dg(t)}{dt} = |\gamma'(t)| > 0, \quad \forall\, t \in I. \]

This also shows that $s = g(t)$ is an increasing differentiable function of t. Hence, there is an inverse function $h : [c, d] \to [a, b]$, $t = h(s)$, such that $h \circ g(t) = t$, $g \circ h(s) = s$. Now,

\[ 1 = \frac{d(g \circ h(s))}{ds} = \frac{dg(t)}{dt}\frac{dh(s)}{ds} = \frac{ds}{dt}\frac{dt}{ds}. \]

Thus

\[ \frac{dh(s)}{ds} = \frac{dt}{ds} = \frac{1}{ds/dt} = \frac{1}{|\gamma'(t)|} > 0. \]


(1) Arc length reparametrization:

Definition 7.3.2 The reparametrization of $\gamma$ by the arc length is $\beta : [c, d] \to \mathbb{R}^3$ given by

\[ \beta(s) = \gamma \circ h(s) = \gamma(t), \quad s \in [c, d], \; t = h(s) \in [a, b]. \]

Note that

\[ \beta'(s) = \frac{d\beta(s)}{ds} = \frac{d\gamma(t)}{dt}\frac{dh(s)}{ds} = \gamma'(t)\frac{dt}{ds} = \frac{1}{|\gamma'(t)|}\gamma'(t), \]

so that $|\beta'(s)| = 1$. That is, $\beta(s)$ has unit speed for all $s \in [c, d]$. Such a curve is called a unit speed curve. Note also that $\gamma(t)$ and $\beta(s)$ have the same image and

\[ L(\beta) = \int_c^d |\beta'(s)|\,ds = \int_c^d |\gamma'(h(s))|\,\frac{dt}{ds}\,ds = \int_a^b |\gamma'(t)|\,dt = L(\gamma). \]

$\beta'(s) = \frac{1}{|\gamma'(t)|}\gamma'(t) = T(s) = T(g(t)) \equiv T(t)$ is the unit tangent vector to the curve.

Hence, theoretically, any regular curve in the space can be reparametrized by the arc length so that it becomes a unit speed curve. However, there are some practical difficulties in doing this. First of all, the computation of $s = g(t)$ in closed form may be impossible, and secondly, finding the inverse $t = h(s)$ of g explicitly may be impossible, as the following examples show:

Example 7.3.1 1. If $\gamma(t) = (2\cos t, \sin t, 0)$, then $\gamma'(t) = (-2\sin t, \cos t, 0)$ and $|\gamma'(t)| = \sqrt{4\sin^2 t + \cos^2 t} = \sqrt{4 - 3\cos^2 t}$. Thus

\[ s(t) = \int_0^t \sqrt{4 - 3\cos^2\tau}\,d\tau \]

cannot be evaluated by elementary methods.

2. If $\gamma(t) = (t, \frac{t^2}{2}, 0)$, then $\gamma'(t) = (1, t, 0)$ and $|\gamma'(t)| = \sqrt{1 + t^2}$. Thus

\[ s(t) = \int_0^t \sqrt{1 + \tau^2}\,d\tau = \frac12\left(t\sqrt{1 + t^2} + \ln\left(t + \sqrt{1 + t^2}\right)\right). \]

Now $t = h(s) = \,?$ There is no elementary formula for the inverse. □
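Although the arc length function of Example 7.3.1 (2) cannot be inverted in closed form, both $s = g(t)$ and $t = h(s)$ are easy to approximate numerically. The following sketch is illustrative: the function names are hypothetical, Simpson's rule is used for the integral, and bisection is used for the inverse (any root-finding method would do).

```python
import math

def arc_length(speed, a, b, n=2000):
    """Integral of speed(t) over [a, b] by the composite Simpson rule (n even)."""
    h = (b - a) / n
    total = speed(a) + speed(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * speed(a + k * h)
    return total * h / 3

def s_closed(t):
    """Closed form of the arc length of (t, t^2/2, 0) measured from t = 0."""
    r = math.sqrt(1 + t * t)
    return 0.5 * (t * r + math.log(t + r))

def h_of_s(s, tol=1e-12):
    """Invert s = s_closed(t) for s >= 0 by bisection; 0 <= t <= s since the speed is >= 1."""
    lo, hi = 0.0, s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if s_closed(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

speed = lambda t: math.sqrt(1 + t * t)
s_num = arc_length(speed, 0.0, 1.5)
print(s_num, s_closed(1.5), h_of_s(s_closed(1.5)))
```

The bracket $[0, s]$ works because $|\gamma'(t)| \geq 1$ implies $s = g(t) \geq t$ for $t \geq 0$.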

Nevertheless, the arc length reparametrization simplifies the study of space curves.


(2) The curvature: Let $\gamma(t) = (x(t), y(t), z(t))$ be a curve in $\mathbb{R}^3$, and $\beta(s) = \gamma \circ h(s)$ its arc length reparametrization. Then $T(s) = \beta'(s)$ is the unit tangent vector along the curve. The rate at which T turns per unit arc length along the curve is called the curvature of the curve:

\[ \kappa(s) = \left|\frac{dT}{ds}\right| \geq 0. \]

Geometrically, $\frac{dT}{ds} = T'(s)$ is the vector pointing in the direction of the infinitesimal turning of $T(s)$ per unit arc length, and so in the direction in which $\beta$ is bending, and $\kappa(s) = \left|\frac{dT}{ds}\right|$ is the magnitude of the rate at which the curve is bending: the curvature.

[Figure: the unit tangent vectors $T(s)$ and $T(s + \Delta s)$ at the points $\beta(s)$ and $\beta(s + \Delta s)$, and their difference $\Delta T(s)$.]

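In practice, for $s = g(t)$ and $T(t) = \frac{\gamma'(t)}{|\gamma'(t)|}$,

\[ \kappa(s(t)) = \left|\frac{dT(s)}{ds}\right| = \left|\frac{dT(t)}{dt}\frac{dt}{ds}\right| = \frac{1}{|\gamma'(t)|}\left|\frac{dT(t)}{dt}\right|, \quad \text{or} \quad |T'(t)| = \kappa(t)|\gamma'(t)|. \]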

Example 7.3.2 [Straight line] Let $\gamma(t) = v + tu$, with $v, u \in \mathbb{R}^3$ and $u \neq 0$, be a straight line in space. Then

\[ \gamma'(t) = u \;\Rightarrow\; |\gamma'(t)| = |u|, \qquad T(t) = \frac{u}{|u|}, \text{ and so } T'(t) = 0, \qquad \kappa(t) = \frac{1}{|u|}|T'(t)| = 0. \quad \square \]

Furthermore, if $\kappa(s) = |T'(s)| = 0$ on $J \subseteq I$, then $T'(s) = \beta''(s) = 0$ so that $\beta(s) = v + su$ for $s \in J$: a part of a straight line. Thus in the following we assume that $\kappa(s) \neq 0$ on I.

(3) Principal unit normal vector: Since $1 = |T(s)|^2 = T(s) \cdot T(s)$, we have $T(s) \cdot T'(s) = 0$. That is, the direction in which $T(s)$ turns as s increases is orthogonal to $T(s)$: i.e., $T(s) \perp T'(s)$. If $\kappa(s) = |T'(s)| \neq 0$,

\[ N(s) = \frac{1}{\kappa(s)}T'(s), \quad \text{or} \quad T'(s) = \kappa(s)N(s), \]

is a unit vector orthogonal to $T(s)$, called the principal unit normal vector of $\gamma$, which always points toward the direction in which the curve $\beta(s)$ is bending. Hence, the curve lies infinitesimally in the plane determined by $T(s)$ and $N(s)$.

In practice,

\[ N(s(t)) = \frac{dT(s)/ds}{|dT(s)/ds|} = \frac{(dT(s(t))/dt)(dt/ds)}{|dT(s(t))/dt|\,|dt/ds|} = \frac{dT(t)/dt}{|dT(t)/dt|}, \]

\[ T'(t) = \frac{dT(t)}{dt} = \kappa(t)|\gamma'(t)|N(t). \]

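Example 7.3.3 [Involute] If the string wound on the unit circle is unwound, then the trace of the end point of the string is called the involute curve. This curve can be parameterized by the angle $\theta > 0$ of the point of contact of the circle and the string as follows:

\[
\begin{aligned}
\gamma(\theta) &= (\cos\theta, \sin\theta) + \theta(\sin\theta, -\cos\theta) = (\cos\theta + \theta\sin\theta,\; \sin\theta - \theta\cos\theta), \\
v(\theta) &= \gamma'(\theta) = \theta(\cos\theta, \sin\theta) \;\Longrightarrow\; |v(\theta)| = |\gamma'(\theta)| = \theta, \\
T(\theta) &= \frac{v(\theta)}{|v(\theta)|} = \frac{\gamma'(\theta)}{\theta} = (\cos\theta, \sin\theta), \\
\kappa(\theta) &= \frac{1}{|\gamma'(\theta)|}|T'(\theta)| = \frac{1}{\theta}|(-\sin\theta, \cos\theta)| = \frac{1}{\theta}, \\
N(\theta) &= \frac{dT(\theta)/d\theta}{|dT(\theta)/d\theta|} = (-\sin\theta, \cos\theta).
\end{aligned}
\]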

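[Figure: the unit circle with the contact point $(\cos\theta, \sin\theta)$, the unwound string of length $\theta$ in the direction $(\sin\theta, -\cos\theta)$, and the involute point $\gamma(\theta) = (\cos\theta + \theta\sin\theta, \sin\theta - \theta\cos\theta)$.]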


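On the other hand, $s(\theta) = \int_0^\theta \tau\,d\tau = \frac12\theta^2$, and so $\theta = h(s) = \sqrt{2s}$. Thus

\[
\begin{aligned}
\beta(s) &= \gamma \circ h(s) = \gamma(\sqrt{2s}) = \left(\cos\sqrt{2s} + \sqrt{2s}\sin\sqrt{2s},\; \sin\sqrt{2s} - \sqrt{2s}\cos\sqrt{2s}\right), \\
\beta'(s) &= T(s) = \left(\cos\sqrt{2s}, \sin\sqrt{2s}\right), \\
\kappa(s) &= |T'(s)| = \frac{1}{\sqrt{2s}}, \\
N(s) &= \frac{dT(s)/ds}{|dT(s)/ds|} = \left(-\sin\sqrt{2s}, \cos\sqrt{2s}\right). \quad \square
\end{aligned}
\]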

Example 7.3.4 [Circle] Let $\gamma(t) = (a\cos 2t, a\sin 2t)$ be the circle of radius a with center at the origin in the plane. Then

\[
\begin{aligned}
v(t) &= \gamma'(t) = (-2a\sin 2t, 2a\cos 2t) \;\Rightarrow\; |v(t)| = 2a, \\
T(t) &= \frac{v(t)}{|v(t)|} = (-\sin 2t, \cos 2t), \\
T'(t) &= -2(\cos 2t, \sin 2t), \\
\kappa(t) &= \frac{1}{|v(t)|}|T'(t)| = \frac{1}{a}, \\
N(t) &= \frac{dT(t)/dt}{|dT(t)/dt|} = -(\cos 2t, \sin 2t) \perp T(t). \quad \square
\end{aligned}
\]

Let $\gamma(t) = (x(t), y(t), z(t))$ be a curve in $\mathbb{R}^3$. If $\kappa(t) \neq 0$, then $T(t)$ and $N(t)$ determine a plane, called the osculating plane of the curve at $\gamma(t)$. Since $T(t) \perp N(t)$, by Example 7.3.4 the circle of radius $\rho(t) = \frac{1}{\kappa(t)}$ centered at $C(t) = \gamma(t) + \frac{1}{\kappa(t)}N(t)$ and lying in the osculating plane is tangent to the curve with the same curvature $\kappa(t)$, and so approximates the curve at $\gamma(t)$. This circle is called the osculating circle, or the circle of curvature. $\rho(t) = \frac{1}{\kappa(t)}$ is called the radius of curvature, and $C(t)$ is called the center of curvature of the curve.

(4) Binormal vector: There is one more direction in the space other than $T(s)$ and $N(s)$: $B(s) = T(s) \times N(s)$ is a unit vector orthogonal to both $T(s)$ and $N(s)$, since

\[ |B(s)| = |T(s)||N(s)|\sin\theta = 1, \quad \text{where } |T(s)| = 1 = |N(s)| \text{ and } T(s) \perp N(s). \]

$B(s)$ is called the binormal vector, so that $\{T(s), N(s), B(s)\}$ now forms a moving frame along $\gamma$, called the Frenet frame, or TNB-frame, of the curve $\gamma$.


Since this frame moves along the curve, the instantaneous directions of motion of its vectors, that is, their derivatives, describe how the curve is bending as s increases: that is, the local shape of the curve $\gamma$.

As we have seen earlier, $T'(s) = \kappa(s)N(s)$ always points in the direction in which the curve $\gamma$ is bending, and $\kappa(s) = |T'(s)|$ is the instantaneous rate of bending. Thus, if $\gamma$ stays in the osculating plane determined by $T(s)$ and $N(s)$, then $B(s)$ will remain a constant vector orthogonal to the osculating plane so that $B'(s) = 0$. However, if $\gamma$ leaves the osculating plane, then $N(s)$ will rotate about $T(s)$ in the direction in which $\gamma(s)$ is bending away from the osculating plane, and so will $B(s)$, but in the opposite direction of $N(s)$. Hence, $B'(s) \neq 0$ will measure this twisting motion of the curve: In fact, since $T(s) \times N(s) = B(s)$, we have $B(s) \perp T(s), N(s)$: that is, $B(s) \cdot T(s) = 0 = B(s) \cdot N(s)$ and $B(s) \cdot B(s) = 1$. Thus

\[
\begin{aligned}
0 &= B'(s) \cdot T(s) + B(s) \cdot T'(s) = B'(s) \cdot T(s) + B(s) \cdot (\kappa(s)N(s)) = B'(s) \cdot T(s), \\
0 &= 2B'(s) \cdot B(s).
\end{aligned}
\]

That is, $B'(s) \perp T(s), B(s)$, and so $B'(s) \parallel N(s)$. Hence we can write $B'(s) = (B'(s) \cdot N(s))N(s)$.

Definition 7.3.3 $\tau(s) = -B'(s) \cdot N(s)$ is called the torsion of $\gamma$ at s, so that $B'(s) = -\tau(s)N(s)$.

As an airplane $\gamma(s)$ takes off from the osculating plane (i.e., moves in the $B(s)$ direction), $N(s)$ also rotates upward (in the $B(s)$ direction) and the head of $B(s)$ twists in the $-N(s)$ direction. Thus, $B'(s) \cdot N(s) < 0$. Hence the "$-$" sign in the definition of $\tau(s)$ corrects the twisting direction of $\gamma(s)$: That is, if $\tau(s) > 0$, $\gamma(s)$ leaves the osculating plane in the $B(s)$ direction, and if $\tau(s) < 0$, $\gamma(s)$ leaves the osculating plane in the downward direction (the $-B(s)$ direction). Moreover, $|\tau(s)|$ measures the rate at which the curve leaves the osculating plane.

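Similarly, we can express $N'(s)$ in terms of T, N, and B: $T(s) \times N(s) = B(s)$ implies $N(s) = B(s) \times T(s)$ and $T(s) = N(s) \times B(s)$. Thus

\[
\begin{aligned}
N'(s) &= B'(s) \times T(s) + B(s) \times T'(s) \\
&= -\tau(s)N(s) \times T(s) + \kappa(s)B(s) \times N(s) \\
&= -\kappa(s)T(s) + \tau(s)B(s).
\end{aligned}
\]

The three equations

\[
\begin{aligned}
T'(s) &= \kappa(s)N(s), \\
N'(s) &= -\kappa(s)T(s) + \tau(s)B(s), \\
B'(s) &= -\tau(s)N(s)
\end{aligned}
\]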


are called the Frenet equations.

(5) Acceleration vectors: When a moving particle is affected by someexternal or internal forces, it is accelerated or decelerated and so the velocitychanges. We want to understand how much of the acceleration acts in thedirection of motion: Let γ(t) = (x(t), y(t), z(t)) be a curve in R3. Then

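\[
\begin{aligned}
v &= \gamma'(t) = \frac{d\gamma}{ds}\frac{ds}{dt} = \frac{ds}{dt}\,T(s(t)),\\
a &= \frac{dv(t)}{dt} = \frac{d}{dt}\left(T(t)\frac{ds}{dt}\right)
   = \frac{d^2s}{dt^2}\,T(s(t)) + \frac{ds}{dt}\frac{dT(t)}{dt}\\
  &= \frac{d^2s}{dt^2}\,T(s(t)) + \frac{ds}{dt}\left(\frac{dT(s)}{ds}\frac{ds}{dt}\right)\\
  &= \frac{d^2s}{dt^2}\,T(t) + \frac{ds}{dt}\left(\kappa(t)N(t)\frac{ds}{dt}\right)\\
  &= \frac{d^2s}{dt^2}\,T(t) + \kappa(t)\left(\frac{ds}{dt}\right)^2 N(t)\\
  &= a_T\,T(t) + a_N\,N(t),
\end{aligned}
\]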

where a_T = d²s/dt² = d|v|/dt is the tangential component of the acceleration, which measures the rate of change of the speed |v|, and a_N = κ(t)(ds/dt)² is the normal component of the acceleration, which measures the rate of change of the direction of v. This equation shows that the acceleration a always lies in the osculating plane determined by T and N. Note that since T and N are orthonormal,

\[
|a|^2 = a_T^2 + a_N^2 .
\]

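Example 7.3.5 Consider the involute: γ(θ) = (cos θ + θ sin θ, sin θ − θ cos θ), θ > 0. Note that v(θ) = γ′(θ) = θ(cos θ, sin θ), and so |v(θ)| = |γ′(θ)| = θ. Thus

\[
\begin{aligned}
a &= v'(\theta) = (\cos\theta, \sin\theta) + \theta(-\sin\theta, \cos\theta)
   = (\cos\theta - \theta\sin\theta,\ \sin\theta + \theta\cos\theta),\\
|a|^2 &= \theta^2 + 1,\\
a_T &= \frac{d^2s}{d\theta^2} = \frac{d}{d\theta}|v| = 1,\\
a_N &= \sqrt{|a|^2 - a_T^2} = \sqrt{(\theta^2 + 1) - 1} = \theta,\\
\therefore\ a(\theta) &= T(\theta) + \theta\,N(\theta). \qquad\square
\end{aligned}
\]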


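(6) Practical computation of the curvatures: Note that

\[
\begin{aligned}
v &= \gamma'(t) = \frac{d\gamma}{ds}\frac{ds}{dt} = \frac{ds}{dt}\,T(s(t)),\\
a &= \frac{dv(t)}{dt} = \gamma''(t) = \frac{d^2s}{dt^2}\,T(t) + \kappa(t)\left(\frac{ds}{dt}\right)^2 N(t),\\
(v\times a)(t) &= \left(\frac{ds}{dt}\right)T(t)\times\left[\frac{d^2s}{dt^2}\,T(t) + \kappa(t)\left(\frac{ds}{dt}\right)^2 N(t)\right]
 = \kappa(t)\left(\frac{ds}{dt}\right)^3 B(t),\\
|(v\times a)(t)| &= \kappa(t)\left|\frac{ds}{dt}\right|^3 = \kappa(t)\,|v(t)|^3,
\end{aligned}
\]

or

\[
\kappa(t) = \frac{|(v\times a)(t)|}{|v(t)|^3}.
\]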

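For the computation of the torsion, recall a Frenet equation: N′(s) = −κ(s)T(s) + τ(s)B(s). Thus, writing v(t) = |v(t)| for the speed,

\[
\frac{dN}{dt}(t) = \frac{dN(s)}{ds}\frac{ds(t)}{dt} = [-\kappa v\,T + \tau v\,B](t).
\]

Hence, by a direct computation,

\[
\begin{aligned}
\gamma'''(t) &= \left[\frac{d^3s}{dt^3}\,T + \frac{d^2s}{dt^2}\frac{ds}{dt}\,\kappa N + \frac{d}{dt}(\kappa v^2)\,N + \kappa v^3(-\kappa T + \tau B)\right](t)\\
&= \left[\left(\frac{d^3s}{dt^3} - \kappa^2v^3\right)T + \left(\kappa\frac{d^2s}{dt^2}\frac{ds}{dt} + \frac{d}{dt}(\kappa v^2)\right)N + \kappa v^3\tau\,B\right](t).
\end{aligned}
\]

Therefore,

\[
\begin{aligned}
\gamma'(t)\times\gamma''(t) &= (\kappa v^3 B)(t) \implies |\gamma'(t)\times\gamma''(t)|^2 = (\kappa v^3)^2(t),\\
(\gamma'(t)\times\gamma''(t))\cdot\gamma'''(t) &=
\begin{vmatrix} x' & y' & z'\\ x'' & y'' & z''\\ x''' & y''' & z''' \end{vmatrix}(t) = [\tau(\kappa v^3)^2](t),\\
\therefore\ \tau(t) &= \frac{(\gamma'(t)\times\gamma''(t))\cdot\gamma'''(t)}{|\gamma'(t)\times\gamma''(t)|^2}.
\end{aligned}
\]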


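Example 7.3.6 Consider the helix: γ(t) = (a cos t, a sin t, bt), a, b ≥ 0.

\[
\begin{aligned}
v(t) &= (-a\sin t,\ a\cos t,\ b),\\
a(t) &= (-a\cos t,\ -a\sin t,\ 0),\\
a'(t) &= (a\sin t,\ -a\cos t,\ 0),\\
(v\times a)(t) &= \begin{vmatrix} i & j & k\\ -a\sin t & a\cos t & b\\ -a\cos t & -a\sin t & 0 \end{vmatrix}
 = (ab\sin t,\ -ab\cos t,\ a^2),\\
\kappa(t) &= \frac{|(v\times a)(t)|}{|v(t)|^3} = \frac{a\sqrt{a^2+b^2}}{(a^2+b^2)^{3/2}} = \frac{a}{a^2+b^2},\\
\tau(t) &= \frac{(\gamma'(t)\times\gamma''(t))\cdot\gamma'''(t)}{|\gamma'(t)\times\gamma''(t)|^2}
 = \frac{\begin{vmatrix} -a\sin t & a\cos t & b\\ -a\cos t & -a\sin t & 0\\ a\sin t & -a\cos t & 0 \end{vmatrix}}{(a\sqrt{a^2+b^2})^2}
 = \frac{b(a^2\cos^2 t + a^2\sin^2 t)}{a^2(a^2+b^2)} = \frac{b}{a^2+b^2}. \qquad\square
\end{aligned}
\]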
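The two formulas κ = |v × a|/|v|³ and τ = ((γ′ × γ″) · γ‴)/|γ′ × γ″|² can be checked numerically on the helix. The following is only a minimal sketch: the helper names (cross, dot, norm, curvature_torsion, helix_derivatives) and the sample values a = 2, b = 1, t = 0.7 are illustrative choices, not part of the text.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def curvature_torsion(d1, d2, d3):
    """Curvature and torsion from the first three derivatives of a curve."""
    c = cross(d1, d2)
    kappa = norm(c) / norm(d1) ** 3
    tau = dot(c, d3) / dot(c, c)
    return kappa, tau

def helix_derivatives(a, b, t):
    """First, second and third derivatives of (a cos t, a sin t, b t)."""
    d1 = (-a * math.sin(t), a * math.cos(t), b)
    d2 = (-a * math.cos(t), -a * math.sin(t), 0.0)
    d3 = (a * math.sin(t), -a * math.cos(t), 0.0)
    return d1, d2, d3

# Illustrative sample values (assumptions, not from the text).
a, b, t = 2.0, 1.0, 0.7
kappa, tau = curvature_torsion(*helix_derivatives(a, b, t))
print(kappa, a / (a * a + b * b))  # both should agree with a / (a^2 + b^2)
print(tau, b / (a * a + b * b))    # both should agree with b / (a^2 + b^2)
```

The same curvature_torsion helper applies to any curve for which the first three derivatives are available in closed form.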

7.4 Kepler’s Laws

One of the best applications of calculus is the derivation of Kepler's laws of planetary motion from Newton's laws of motion.

We restrict our attention to the planetary motion of two bodies only: the sun and the earth. Let γ(t) denote the position vector of the earth with respect to the sun, which is placed at the origin. By Newton's laws of motion, we have

\[
F(t) = m\gamma''(t), \quad\text{where}\quad F(t) = -\frac{GmM}{r(t)^2}\,\frac{\gamma(t)}{r(t)}, \qquad r(t) = |\gamma(t)|.
\]

Thus

\[
\gamma''(t) = -\frac{GM}{r(t)^3}\,\gamma(t), \quad\text{or}\quad \gamma''(t) \parallel \gamma(t),
\]

and so

\[
\frac{d}{dt}\bigl(\gamma(t)\times\gamma'(t)\bigr) = \gamma'(t)\times\gamma'(t) + \gamma(t)\times\gamma''(t) = 0.
\]

This means that γ(t) × γ′(t) = c is a constant vector, and so γ(t) is a planar motion: it stays in the plane through the origin orthogonal to c.


Hence, γ(t) can be expressed in terms of the polar equations in the xy-plane:

\[
\gamma(t) = (r(t)\cos\theta(t),\ r(t)\sin\theta(t)), \qquad t \in \mathbb{R},
\]

where r(t) = |γ(t)| and θ(t) represent the radial distance of the earth and the angle of the radial line through the origin and the earth from a fixed axis, say the x-axis. This can be rewritten in terms of the moving orthonormal frame {u_r(t), u_θ(t)}, which are the radial unit direction and the angular unit direction vectors given by

\[
u_r(t) = (\cos\theta, \sin\theta)(t), \qquad u_\theta(t) = (-\sin\theta, \cos\theta)(t).
\]

[Figure: the position vector γ(t) at angle θ from the x-axis, with the moving frame vectors u_r and u_θ.]

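Thus γ(t) = r(t)u_r(t), and we want to find r(t). The motion of this frame can be expressed as

\[
\begin{aligned}
\frac{d}{dt}u_r(t) &= \frac{d}{d\theta}u_r(t)\,\frac{d\theta}{dt}(t) = \theta'(t)(-\sin\theta, \cos\theta)(t) = \theta'(t)\,u_\theta(t),\\
\frac{d}{dt}u_\theta(t) &= \frac{d}{d\theta}u_\theta(t)\,\frac{d\theta}{dt}(t) = \theta'(t)(-\cos\theta, -\sin\theta)(t) = -\theta'(t)\,u_r(t).
\end{aligned}
\]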

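Now, the velocity and the acceleration of the earth are computed as follows:

\[
\begin{aligned}
v &= \gamma' = (r u_r)' = r'u_r + r u_r' = r'u_r + r\theta' u_\theta,\\
a &= \gamma'' = v' = (r'u_r + r\theta' u_\theta)'\\
  &= r''u_r + r'\theta' u_\theta + r'\theta' u_\theta + r\theta'' u_\theta - r\theta'^2 u_r\\
  &= (r'' - r\theta'^2)\,u_r + (r\theta'' + 2r'\theta')\,u_\theta.
\end{aligned}
\]

Let r(0) = r_0 be the minimum of r, attained at t = 0. Then, at t = 0,

\[
r'(0) = r'_0 = 0, \qquad \theta(0) = 0,
\]

\[
\gamma'(t) = v(t) = r'(t)\,u_r(t) + r(t)\theta'(t)\,u_\theta(t).
\]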


Thus, v_0 ≡ |v(0)| = r_0θ′_0. Since

\[
c = \gamma(t)\times\gamma'(t) = \gamma(t)\times v(t) = (r u_r)\times(r'u_r + r\theta' u_\theta) = r^2\theta'\,k,
\]

r²(t)θ′(t) is a constant. Since, at t = 0, r²(0)θ′(0) = r_0²θ′_0 = r_0v_0, we have r²(t)θ′(t) = r_0v_0 for all t. The area swept by the earth around the sun in an infinitesimal angle is

\[
dA = \tfrac12 r^2\,d\theta = \tfrac12 r^2\theta'\,dt = \tfrac12 r_0v_0\,dt.
\]

Hence, over any time interval [t_0, t_1] with α = θ(t_0) and β = θ(t_1), the area swept by the earth around the sun is

\[
A\Big|_{t_0}^{t_1} = \int_\alpha^\beta \tfrac12 r^2\,d\theta = \tfrac12 r_0v_0\int_{t_0}^{t_1} dt = \tfrac12 r_0v_0\,(t_1 - t_0).
\]

This is called Kepler's second law: γ(t) sweeps out equal areas in equal time intervals.

Kepler's first law says that the earth's path is an ellipse with the sun at one focus. To derive this, we look at Newton's equation again:

\[
-\frac{GM}{r(t)^3}\,\gamma(t) = -\frac{GM}{r(t)^2}\,u_r = \gamma''(t) = (r'' - r\theta'^2)\,u_r + (r\theta'' + 2r'\theta')\,u_\theta.
\]

Thus,

\[
-\frac{GM}{r(t)^2} = r''(t) - r(t)\theta'^2(t), \quad\text{and}\quad r(t)\theta''(t) + 2r'(t)\theta'(t) = 0.
\]

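Since θ′(t) = r_0v_0/r(t)², we get r″(t) = r_0²v_0²/r(t)³ − GM/r(t)². Set p(t) = r′(t). Then

\[
r''(t) = p'(t) = \frac{dp(r)}{dr}\frac{dr}{dt} = p(t)\frac{dp(r)}{dr} = \frac{r_0^2v_0^2}{r(t)^3} - \frac{GM}{r(t)^2}.
\]

Hence,

\[
\begin{aligned}
\int 2p\,\frac{dp(r)}{dr}\,dr &= \int 2\left(\frac{r_0^2v_0^2}{r^3} - \frac{GM}{r^2}\right)dr,\\
p^2(t) &= -\frac{r_0^2v_0^2}{r(t)^2} + \frac{2GM}{r(t)} + c.
\end{aligned}
\]

At t = 0, p(0) = r′(0) = 0 gives c = v_0² − 2GM/r_0. Thus, by using r(t)²θ′(t) = r_0v_0,

\[
\begin{aligned}
p^2(t) = r'^2(t) &= v_0^2\left(1 - \frac{r_0^2}{r(t)^2}\right) + 2GM\left(\frac{1}{r(t)} - \frac{1}{r_0}\right),\\
\frac{r'^2(t)}{(r(t)^2\theta'(t))^2} = \frac{1}{r(t)^4}\left(\frac{dr}{d\theta}\right)^2 &= \frac{1}{r_0^2} - \frac{1}{r(t)^2} + 2h\left(\frac{1}{r(t)} - \frac{1}{r_0}\right), \qquad h = \frac{GM}{r_0^2v_0^2}.
\end{aligned}
\]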


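Set u = 1/r and u_0 = 1/r_0. Then du/dθ = −(1/r²)(dr/dθ), and so

\[
\begin{aligned}
\frac{1}{r(t)^4}\left(\frac{dr}{d\theta}\right)^2 = \left(\frac{du}{d\theta}\right)^2 &= u_0^2 - u^2 + 2hu - 2hu_0 = (u_0 - h)^2 - (u - h)^2,\\
\frac{du}{d\theta} &= -\sqrt{(u_0 - h)^2 - (u - h)^2},
\end{aligned}
\]

where the “−” sign is due to the fact that du/dθ = −(1/r²)(dr/dθ) and dr/dθ = r′(t)/θ′(t) ≥ 0 near t = 0 (for t ≥ 0), since r(0) is a local minimum. Hence

\[
\frac{-1}{\sqrt{(u_0 - h)^2 - (u - h)^2}}\,\frac{du}{d\theta} = 1,
\qquad
\cos^{-1}\left(\frac{u - h}{u_0 - h}\right) = \theta + d.
\]

But d = 0 since u = u_0 at t = 0 (where θ = 0) and cos⁻¹ 1 = 0. Thus u = h + (u_0 − h) cos θ. Since u = 1/r,

\[
r = \frac{(1 + e)\,r_0}{1 + e\cos\theta}, \qquad e = \frac{r_0v_0^2}{GM} - 1,
\]

which, when 0 ≤ e < 1, is a polar equation of an ellipse with a focus F at 0.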
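The two conclusions derived so far, that r²θ′ = r_0v_0 stays constant and that the orbit satisfies r = (1 + e)r_0/(1 + e cos θ), can be illustrated by integrating Newton's equation γ″ = −GMγ/r³ numerically. The sketch below is only an illustration under stated assumptions: it uses a leapfrog (kick-drift-kick) time stepper, which is a choice made here and not a method from the text, and the units GM = 1, r_0 = 1, v_0 = 1.2 as well as the function name simulate are hypothetical.

```python
import math

def simulate(r0, v0, gm, dt, steps):
    """Integrate gamma'' = -GM gamma / r^3 from the point of minimum radius.

    Starts at position (r0, 0) with velocity (0, v0), so that theta(0) = 0
    and r'(0) = 0. Returns a list of (x, y, x*vy - y*vx) samples; the third
    entry is r^2 * theta', the k-component of gamma x gamma'.
    """
    x, y, vx, vy = r0, 0.0, 0.0, v0
    samples = []
    for _ in range(steps):
        # Half kick: the acceleration is parallel to the position vector.
        r3 = (x * x + y * y) ** 1.5
        vx -= 0.5 * dt * gm * x / r3
        vy -= 0.5 * dt * gm * y / r3
        # Drift: move along the current velocity.
        x += dt * vx
        y += dt * vy
        # Second half kick at the new position.
        r3 = (x * x + y * y) ** 1.5
        vx -= 0.5 * dt * gm * x / r3
        vy -= 0.5 * dt * gm * y / r3
        samples.append((x, y, x * vy - y * vx))
    return samples

# Hypothetical units chosen only for illustration.
r0, v0, gm = 1.0, 1.2, 1.0
samples = simulate(r0, v0, gm, dt=1e-3, steps=20000)

e = r0 * v0 ** 2 / gm - 1.0  # eccentricity from the formula in the text

# Deviation of r^2 * theta' from its initial value r0 * v0.
max_area_rate_error = max(abs(l - r0 * v0) for _, _, l in samples)

# Deviation of the computed radius from (1 + e) r0 / (1 + e cos(theta)).
max_orbit_error = max(
    abs(math.hypot(x, y) - (1.0 + e) * r0 / (1.0 + e * x / math.hypot(x, y)))
    for x, y, _ in samples
)
print(max_area_rate_error, max_orbit_error)
```

Both printed deviations should be small; the second one reflects the discretization error of the time stepper rather than any defect of the formula.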

Problem 7.4.1 Derive the polar equation of an ellipse with a focus at the origin.

Kepler's third law says that if T is the period of the earth around the sun and a is the semi-major axis of the earth's orbital ellipse, then

\[
\frac{T^2}{a^3} = \frac{4\pi^2}{GM}.
\]

Chapter 8

Functions of several variables

8.1 Limits and Continuity

Real-valued functions of several independent real variables are defined in much the same way as in the single-variable case. The domains are sets of ordered n-tuples of real numbers. Let

\[
\mathbb{R}^n = \{(x_1, \ldots, x_n) \mid x_i \in \mathbb{R},\ i = 1, \ldots, n\}.
\]

We will mostly work in the spaces with n = 2 or 3.

Definition 8.1.1 A real-valued function of n variables is a function f from a subset D ⊆ ℝⁿ into ℝ:

\[
f : D \longrightarrow \mathbb{R}, \qquad f(x_1, \ldots, x_n) = w.
\]

The set D is called the domain of f, w is the dependent variable of f, and x_1, ..., x_n are called the independent variables.

Let D ⊆ ℝⁿ be a subset of ℝⁿ. A point P = (p_1, ..., p_n) is an interior point of D if there is a positive number ρ > 0 such that the set

\[
B_\rho(P) = \{X = (x_1, \ldots, x_n) \in \mathbb{R}^n \mid \sqrt{(p_1 - x_1)^2 + \cdots + (p_n - x_n)^2} < \rho\},
\]

which is called an open ball of radius ρ centered at P, is contained in D. The set of all interior points of D is denoted by Int(D).

A point P = (p_1, ..., p_n) is a boundary point of D if every ball centered at P contains points that lie in D as well as points that lie outside of D. The set of all boundary points of D is denoted by ∂(D).


Then it is clear that Int(D) ∩ ∂(D) = ∅. A region D is open if D = Int(D), and closed if D = Int(D) ∪ ∂(D).

A region D is bounded if it is contained in a ball of a fixed radius, and unbounded if it is not bounded.

Let f : D → ℝ be a function on D ⊆ ℝⁿ. The graph of f is the set

\[
\{(x_1, \ldots, x_n, w) \in \mathbb{R}^{n+1} \mid w = f(x_1, \ldots, x_n),\ (x_1, \ldots, x_n) \in D\}.
\]

When n = 2, the graph is also called a surface, z = f(x, y). The set of all points in D at which f takes a constant value c,

\[
\{(x_1, \ldots, x_n) \in D \mid f(x_1, \ldots, x_n) = c\},
\]

is called a level set of f. When n = 2, it is called a level curve, and when n = 3, it is called a level surface. For example, if z = f(x, y) is the height of a mountain, the level curves c = f(x, y) are the contour curves in the domain D.

The following theory holds for arbitrary dimension n with a proper adjustment of the number of variables. However, for convenience, we work only with n = 2 or 3.

Definition 8.1.2 Let z = f(x, y) be a function on a domain D ⊆ ℝ².

(1) A number L is called the limit of f at X_0 = (x_0, y_0) ∈ ℝ², which is not necessarily in D, denoted by

\[
\lim_{X \to X_0} f(X) = L,
\]

if, for every number ε > 0, there exists a corresponding number δ > 0 such that

\[
|f(X) - L| < \varepsilon \quad\text{for all points } X = (x, y) \in D \text{ with } 0 < |X - X_0| < \delta.
\]

(2) z = f(x, y) is said to be continuous at X_0 if

\[
\lim_{X \to X_0} f(X) = f(X_0):
\]

that is, f is defined at X_0 ∈ D and it has the limit f(X_0) at X_0.

(3) z = f(x, y) is said to be continuous on D if it is continuous at every point in D.


Example 8.1.1 Consider f(x, y) = 4xy²/(x² + y²) on D = ℝ² − {(0, 0)}. For X = (0, y) or X = (x, 0) ∈ D, f(X) = 0. Thus we choose L = 0. To show that lim_{X→X_0} f(X) = 0 at X_0 = (0, 0): for ε > 0 given, we want to find a δ > 0 such that

\[
\left|\frac{4xy^2}{x^2 + y^2} - 0\right| = \frac{4|x|y^2}{x^2 + y^2} < \varepsilon \quad\text{whenever } 0 < \sqrt{x^2 + y^2} < \delta.
\]

However, since y² ≤ x² + y²,

\[
\frac{4|x|y^2}{x^2 + y^2} \le 4|x| = 4\sqrt{x^2} \le 4\sqrt{x^2 + y^2}.
\]

Thus, if we choose δ = ε/4, then, for any (x, y) with 0 < √(x² + y²) < δ,

\[
\left|\frac{4xy^2}{x^2 + y^2}\right| = \frac{4|x|y^2}{x^2 + y^2} \le 4\sqrt{x^2 + y^2} < 4\delta = 4\cdot\frac{\varepsilon}{4} = \varepsilon.
\]

Hence, if we define f(0, 0) = 0, then this function becomes a continuous function on ℝ². □
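The choice δ = ε/4 can be spot-checked numerically by sampling points in the punctured disk of radius δ and confirming that |f| stays below ε. This is a sanity check under assumptions, not a proof: the tolerance ε = 10⁻³, the sample count, and the helper names f and delta_for are illustrative.

```python
import math
import random

def f(x, y):
    """The function of the example, defined away from the origin."""
    return 4 * x * y * y / (x * x + y * y)

def delta_for(eps):
    """The delta chosen in the example for a given epsilon."""
    return eps / 4

random.seed(0)          # fixed seed so the check is reproducible
eps = 1e-3              # illustrative tolerance (an assumption)
delta = delta_for(eps)

worst = 0.0
for _ in range(10000):
    # Random point with 0 < sqrt(x^2 + y^2) < delta.
    radius = delta * random.uniform(1e-6, 0.999)
    angle = random.uniform(0.0, 2.0 * math.pi)
    x, y = radius * math.cos(angle), radius * math.sin(angle)
    worst = max(worst, abs(f(x, y)))

print(worst < eps)  # the largest sampled |f| should stay below eps
```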

Example 8.1.2 Consider f(x, y) = 2xy/(x² + y²) on D = ℝ² − {(0, 0)}. For X_0 = (0, 0) ∉ D, if we choose X = (x, y) with y = mx, then

\[
f(x, y) = \frac{2mx^2}{x^2(1 + m^2)} = \frac{2m}{1 + m^2},
\]

which depends on the value m. That is, f has no limit at X_0 = (0, 0), and so there is no way to define f(0, 0) to make it continuous at X_0. □
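The dependence on the slope m can be seen numerically by evaluating f along several lines y = mx at points approaching the origin. In this small sketch the helper name along_line and the sample slopes are illustrative assumptions.

```python
def f(x, y):
    """The function of the example, defined away from the origin."""
    return 2 * x * y / (x * x + y * y)

def along_line(m, x):
    """Value of f at the point (x, m x) on the line y = m x."""
    return f(x, m * x)

for m in (0.0, 1.0, 2.0, -1.0):
    # Approach the origin along y = m x; the value does not change with x.
    values = [along_line(m, 10.0 ** (-k)) for k in (1, 4, 8)]
    print(m, values, 2 * m / (1 + m * m))
```

Each line gives its own constant value 2m/(1 + m²), so no single number can serve as the limit at the origin.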

Theorem 8.1.1 The general rules of the arithmetic of continuous functions hold: that is, if f and g are continuous at X_0 = (x_0, y_0) and k ∈ ℝ, then so are kf, f ± g, fg, and f/g provided g(X_0) ≠ 0. Moreover, if w = h(z) is a continuous function at z_0 = f(x_0, y_0), then so is w = (h ∘ f)(x, y) at X_0.

8.2 Partial Derivatives

Let z = f(x, y) be a function on D ⊆ ℝ², and X_0 = (x_0, y_0) ∈ D. For the fixed value y = y_0, z = f(x, y_0) is a function of x only, whose graph is the intersection of the graph z = f(x, y) and the vertical plane y = y_0.

The derivative of this function of the single variable x is called the partial derivative of f with respect to x at x = x_0:

\[
\frac{d}{dx}f(x, y_0)\Big|_{x = x_0} = \lim_{h \to 0}\frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}
\equiv \frac{\partial f}{\partial x}\Big|_{(x_0, y_0)} \equiv \frac{\partial z}{\partial x}\Big|_{(x_0, y_0)} \equiv f_x(x_0, y_0) = z_x.
\]

[Figure: the graph z = f(x, y) with the curves z = f(x, y_0) and z = f(x_0, y) through (x_0, y_0), and their tangent lines with slopes f_x(X_0) and f_y(X_0).]

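Similarly for y:

\[
\frac{\partial f}{\partial y}\Big|_{(x_0, y_0)} \equiv \frac{\partial z}{\partial y}\Big|_{(x_0, y_0)} \equiv f_y(x_0, y_0) = z_y
\equiv \frac{d}{dy}f(x_0, y)\Big|_{y = y_0} = \lim_{h \to 0}\frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}.
\]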

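Example 8.2.1 For f(x, y) = 2y/(y + cos x), find f_x and f_y.

\[
\begin{aligned}
f_x(x, y) &= \frac{(y + \cos x)\frac{\partial}{\partial x}(2y) - 2y\frac{\partial}{\partial x}(y + \cos x)}{(y + \cos x)^2} = \frac{2y\sin x}{(y + \cos x)^2},\\
f_y(x, y) &= \frac{(y + \cos x)\frac{\partial}{\partial y}(2y) - 2y\frac{\partial}{\partial y}(y + \cos x)}{(y + \cos x)^2} = \frac{2\cos x}{(y + \cos x)^2}. \qquad\square
\end{aligned}
\]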

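Example 8.2.2 Let z = f(x, y) satisfy yz − ln z = x + y. Find z_x and z_y.

\[
\begin{aligned}
\frac{\partial}{\partial x}(yz) - \frac{\partial}{\partial x}(\ln z) &= \frac{\partial x}{\partial x} + \frac{\partial y}{\partial x},\\
y\frac{\partial z}{\partial x} - \frac{1}{z}\frac{\partial z}{\partial x} &= 1 + 0,\\
\frac{\partial z}{\partial x} &= \frac{z}{yz - 1}.
\end{aligned}
\]

Similarly for ∂z/∂y. □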


Example 8.2.3 Let

\[
z = f(x, y) = \begin{cases} 0, & \text{if } xy \ne 0,\\ 1, & \text{if } xy = 0. \end{cases}
\]

Then f_x(0, 0) and f_y(0, 0) exist, but f is not continuous at (0, 0). □

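Higher order partial derivatives are defined as usual:

\[
\frac{\partial^2 f}{\partial x^2} = f_{xx}, \qquad
\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = (f_y)_x = f_{yx}, \quad\text{etc.}
\]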

Theorem 8.2.1 Suppose that z = f(x, y) has continuous second partial derivatives on an open domain D ⊆ ℝ². Then the mixed partial derivatives are equal:

\[
\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}.
\]

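Proof: For a fixed point X_0 = (x_0, y_0) ∈ D, consider

\[
F(\Delta x, \Delta y) = f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0) - f(x_0, y_0 + \Delta y) + f(x_0, y_0).
\]

For fixed y_0 and Δy, define g(x) = f(x, y_0 + Δy) − f(x, y_0). Then F(Δx, Δy) = g(x_0 + Δx) − g(x_0). By the mean value theorem for functions of one variable, there is a number c ∈ [x_0, x_0 + Δx] such that

\[
F(\Delta x, \Delta y) = g(x_0 + \Delta x) - g(x_0) = g'(c)\,\Delta x = [f_x(c, y_0 + \Delta y) - f_x(c, y_0)]\,\Delta x.
\]

By the mean value theorem again,

\[
F(\Delta x, \Delta y) = \frac{\partial^2 f}{\partial y\,\partial x}(c, d)\,\Delta x\,\Delta y, \qquad d \in [y_0, y_0 + \Delta y].
\]

Since ∂²f/∂y∂x is continuous, we have

\[
\frac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0) = \lim_{(\Delta x, \Delta y) \to (0, 0)} \frac{1}{\Delta x\,\Delta y}F(\Delta x, \Delta y) = \frac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0),
\]

where the second equality is obtained similarly from h(y) = f(x_0 + Δx, y) − f(x_0, y) and F(Δx, Δy) = h(y_0 + Δy) − h(y_0). □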

Example 8.2.4 For f(x, y) = xe^y + yx², one can easily show that

\[
\frac{\partial^2 f}{\partial y\,\partial x} = e^y + 2x = \frac{\partial^2 f}{\partial x\,\partial y}. \qquad\square
\]


8.3 Differentiability

Recall that, for a differentiable function y = f(x) of a single variable x ∈ I = [a, b], the difference quotient Δy/Δx as x changes from x_0 ∈ I to x_0 + Δx satisfies

\[
\frac{\Delta y}{\Delta x} = f'(x_0) + \varepsilon \quad (= f'(c) \text{ for some } c \in [x_0, x_0 + \Delta x]),
\]

where ε → 0 as Δx → 0, or

\[
\Delta y = f'(x_0)\,\Delta x + \varepsilon\,\Delta x.
\]

If f′(x) is continuous at x_0, then this equation becomes

\[
dy = f'(x_0)\,dx \quad\text{as } \Delta x \to 0,
\]

which is called the total differential of f at x_0. For Δx small, Δy ≈ f′(x_0)Δx is called the linear approximation of f: that is,

\[
f(x) \approx f(x_0) + f'(x_0)(x - x_0) = L(x),
\]

the right side of which is the equation of the tangent line to the graph of f through (x_0, f(x_0)). The error term εΔx was computed from the Taylor polynomial of f.

[Figure: the graph of f(x) near x_0 with the tangent line L(x) = f(x_0) + f′(x_0)(x − x_0), showing Δx, Δy, f′(x_0)Δx and the error εΔx.]

The above formula can be used to define the differentiability of functions of more than one variable, f : ℝⁿ → ℝ. However, it is good enough to work with functions of two variables:

Definition 8.3.1 A function z = f(x, y) on D is said to be differentiable at X_0 = (x_0, y_0) ∈ D if f_x and f_y exist at X_0, and

\[
\Delta z = f(X) - f(X_0) = f_x(X_0)\,\Delta x + f_y(X_0)\,\Delta y + \varepsilon_1\,\Delta x + \varepsilon_2\,\Delta y,
\]

where ε_1, ε_2 → 0 as Δx, Δy → 0. f is said to be differentiable on D if it is differentiable at every point of D.


In fact, for functions of more than one variable, we have the following theorem:

Theorem 8.3.1 Let z = f(x, y) be a function on an open domain D ⊆ ℝ². Suppose that f_x and f_y are defined on D and continuous at X_0 = (x_0, y_0) ∈ D. Then the increment Δz = f(X) − f(X_0) of f from X_0 to X = (x, y) = (x_0 + Δx, y_0 + Δy) ∈ D is given by

\[
\Delta z = f(X) - f(X_0) = f_x(X_0)\,\Delta x + f_y(X_0)\,\Delta y + \varepsilon_1\,\Delta x + \varepsilon_2\,\Delta y,
\]

where ε_1, ε_2 → 0 as Δx, Δy → 0.

Proof: We assume that Δx and Δy are small enough so that a rectangle T centered at X_0 and containing X is contained in D. Then Δz = Δz_1 + Δz_2, where

\[
\begin{aligned}
\Delta z_1 &= f(x_0 + \Delta x, y_0) - f(x_0, y_0),\\
\Delta z_2 &= f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0).
\end{aligned}
\]

[Figure: decomposition of the increment Δz = Δz_1 + Δz_2 over the rectangle with corners X_0 = (x_0, y_0) and X = (x_0 + Δx, y_0 + Δy), showing the terms f_x(X_0)Δx, ε_1Δx, f_y(X_0)Δy, ε_2Δy and the differential dz.]

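From the case of single variable functions, we have

\[
\begin{aligned}
\Delta z_1 &= f_x(x_0, y_0)\,\Delta x + \varepsilon_1\,\Delta x,\\
\Delta z_2 &= f_y(x_0 + \Delta x, y_0)\,\Delta y + \varepsilon_2'\,\Delta y\\
&= f_y(x_0, y_0)\,\Delta y + \varepsilon_2\,\Delta y, \quad\text{by the continuity of } f_x \text{ and } f_y,
\end{aligned}
\]

which gives the result. □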


Theorem 8.3.2 If the partial derivatives f_x and f_y are continuous on D, then z = f(x, y) is differentiable at every point of D.

Corollary 8.3.3 If z = f(x, y) is differentiable on D, then it is continuous.

Definition 8.3.2 If z = f(x, y) is differentiable on D, then the limit of Δz as Δx, Δy → 0 is denoted by

\[
dz = f_x(X)\,dx + f_y(X)\,dy,
\]

which is called the total differential of f.

Corollary 8.3.4 Let z = f(x, y) be a function with continuous f_x and f_y on an open domain D ⊆ ℝ². Then the linear approximation of f(X), X = (x, y) = (x_0 + Δx, y_0 + Δy), at X_0 = (x_0, y_0) is given by

\[
f(X) \approx f(X_0) + f_x(X_0)\,\Delta x + f_y(X_0)\,\Delta y.
\]

Note that the right side of the above equation is the equation of the tangent plane of the graph of f at X_0. In fact, the vector (1, 0, f_x(X_0)) is tangent to the curve z = f(x, y_0) at X_0, and the vector (0, 1, f_y(X_0)) is tangent to the curve z = f(x_0, y) at X_0. Thus, the normal vector n(X_0) to the graph of f at (x_0, y_0, f(X_0)) is

\[
n(X_0) = \begin{vmatrix} i & j & k\\ 1 & 0 & f_x(X_0)\\ 0 & 1 & f_y(X_0) \end{vmatrix} = (-f_x(X_0),\ -f_y(X_0),\ 1).
\]

Thus, if (x, y, w) is a point on the tangent plane spanned by the two vectors, then, with z_0 = f(X_0), the difference (x − x_0, y − y_0, w − z_0) = (Δx, Δy, Δw) satisfies

\[
0 = (\Delta x, \Delta y, \Delta w)\cdot(-f_x(X_0),\ -f_y(X_0),\ 1) = (w - z_0) - f_x(X_0)\,\Delta x - f_y(X_0)\,\Delta y,
\]

or

\[
w = f(X_0) + f_x(X_0)(x - x_0) + f_y(X_0)(y - y_0).
\]

The error terms in the linear approximation of z = f(x, y) by the value w on the tangent plane will be given later in Section 8.6.

Example 8.3.1 Compute the value of z = f(x, y) = x² − xy + ½y² + 3 at X = (3.01, 2.02).

By a direct computation,

\[
f(3.01, 2.02) = (3.01)^2 - (3.01)(2.02) + \tfrac12(2.02)^2 + 3 = 8.0201.
\]


Instead, we may take the linear approximation: choose X_0 = (3, 2). Then f(X_0) = 8, (Δx, Δy) = (0.01, 0.02), and

\[
\begin{aligned}
f_x(3, 2) &= (2x - y)\big|_{(3, 2)} = 4,\\
f_y(3, 2) &= (-x + y)\big|_{(3, 2)} = -1,\\
\therefore\ z = f(3.01, 2.02) \approx w &= f(3, 2) + f_x(3, 2)\,\Delta x + f_y(3, 2)\,\Delta y\\
&= 8 + 4(0.01) + (-1)(0.02) = 8.02. \qquad\square
\end{aligned}
\]
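The comparison between the exact value and the tangent-plane value can be reproduced in a few lines. This is a minimal sketch; the function names fx, fy and linear_approx are illustrative, and the partial derivatives are entered by hand from the example.

```python
def f(x, y):
    """The function of the example."""
    return x * x - x * y + 0.5 * y * y + 3

def fx(x, y):
    """Partial derivative of f with respect to x."""
    return 2 * x - y

def fy(x, y):
    """Partial derivative of f with respect to y."""
    return -x + y

def linear_approx(x0, y0, dx, dy):
    """Tangent-plane value f(X0) + fx(X0) dx + fy(X0) dy."""
    return f(x0, y0) + fx(x0, y0) * dx + fy(x0, y0) * dy

exact = f(3.01, 2.02)
approx = linear_approx(3, 2, 0.01, 0.02)
print(exact, approx, exact - approx)
```

The difference exact − approx is about 10⁻⁴, which is of second order in (Δx, Δy), in line with the error term discussed in Section 8.6.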

Analogous results hold for functions z = f(x_1, ..., x_n) of more than two variables. The total differential represents the linear approximation:

\[
df = f_{x_1}\,dx_1 + \cdots + f_{x_n}\,dx_n,
\quad\text{or}\quad
\Delta z \approx f_{x_1}(X_0)\,\Delta x_1 + \cdots + f_{x_n}(X_0)\,\Delta x_n.
\]

8.4 Directional Derivatives

Note that the partial derivatives f_x(X_0), f_y(X_0), etc., of a function of more than one variable are the rates of change of f when X moves along the lines through X_0 parallel to the coordinate axes. What about the rate of change of f when X moves in other directions at X_0?

Let z = f(x, y) be a differentiable function on a region R ⊆ ℝ² and X_0 = (x_0, y_0) ∈ R. Let α(t) = (x(t), y(t)), t ∈ I = [a, b], be a differentiable curve in R through X_0 = α(t_0) with α′(t_0) = (x′(t_0), y′(t_0)) a unit vector. Then z = f ∘ α(t) = f(x(t), y(t)) is a function of t, which is the restriction of f to the curve α(t).

Theorem 8.4.1 The composite z = f ∘ α(t) = f(x(t), y(t)) is differentiable at t_0, and

\[
\frac{dz}{dt}(t_0) = f_x(X_0)\frac{dx(t_0)}{dt} + f_y(X_0)\frac{dy(t_0)}{dt}.
\]

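Proof: By the differentiability of f, we have

\[
\begin{aligned}
\Delta z &= f_x(X_0)\,\Delta x + f_y(X_0)\,\Delta y + \varepsilon_1\,\Delta x + \varepsilon_2\,\Delta y,\\
\therefore\ \frac{\Delta z}{\Delta t} &= f_x(X_0)\frac{\Delta x}{\Delta t} + f_y(X_0)\frac{\Delta y}{\Delta t} + \varepsilon_1\frac{\Delta x}{\Delta t} + \varepsilon_2\frac{\Delta y}{\Delta t},
\end{aligned}
\]

where ε_1, ε_2 → 0 as Δx, Δy → 0. Since α(t) is differentiable, Δx, Δy → 0, and so ε_1, ε_2 → 0, as Δt → 0. Hence

\[
\frac{dz}{dt}(t_0) = \lim_{\Delta t \to 0}\frac{\Delta z}{\Delta t} = \frac{\partial f}{\partial x}\Big|_{X_0}\frac{dx(t_0)}{dt} + \frac{\partial f}{\partial y}\Big|_{X_0}\frac{dy(t_0)}{dt}. \qquad\square
\]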

Recall that α′(t_0) = (x′(t_0), y′(t_0)) ≡ (u_1, u_2) = u ∈ ℝ²_{X_0} is a vector tangent to α(t) at X_0 = α(t_0) ∈ R, and the derivative of f ∘ α at t_0 depends only on u = α′(t_0), not on the curve α(t) itself: that is, for any curve β(t) in R such that α(t_0) = β(t_0) and α′(t_0) = u = β′(t_0),

\[
\frac{dz}{dt}(t_0) = \frac{\partial f}{\partial x}\Big|_{X_0}u_1 + \frac{\partial f}{\partial y}\Big|_{X_0}u_2.
\]

This has several notations:

\[
\begin{aligned}
\frac{dz}{dt}(t_0) &= f_x(X_0)\,u_1 + f_y(X_0)\,u_2\\
&= (f_x(X_0),\ f_y(X_0))\cdot(u_1, u_2) = \nabla f(X_0)\cdot u, \quad\text{in vector notation},\\
&= \begin{bmatrix} f_x(X_0) & f_y(X_0) \end{bmatrix}\begin{bmatrix} u_1\\ u_2 \end{bmatrix} = Df(X_0)\,\alpha'(t_0), \quad\text{in matrix notation},
\end{aligned}
\]

which is called the directional derivative of f at X_0 in the direction u, also denoted by dz/dt(t_0) = Df_u(X_0) = ∇f(X_0) · u.

This can be done at every point X ∈ R, and the first equality is the total differential:

\[
dz = f_x(X)\,x'(t)\,dt + f_y(X)\,y'(t)\,dt = f_x(X)\,dx + f_y(X)\,dy.
\]

In the vector notation,

\[
\nabla f(X) = (f_x(X),\ f_y(X))
\]

is called the gradient vector of f at X in ℝ²_X, and in the matrix notation the 1 × 2 matrix

\[
Df(X) = \begin{bmatrix} f_x(X) & f_y(X) \end{bmatrix}
\]

is called the derivative of f at X, and the above equation represents the chain rule for the derivatives of functions of several variables.

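By definition, the directional derivative of f is the rate of change of f when X moves in the direction u, and it is given, for any unit vector u, by

\[
Df_u(X_0) = \nabla f(X_0)\cdot u = |\nabla f(X_0)|\cos\theta, \qquad \theta = \angle(\nabla f(X_0), u),
\]

\[
= \begin{cases}
|\nabla f(X_0)|, & \text{when } \cos\theta = 1\ (\theta = 0), \text{ or } u \parallel \nabla f(X_0),\\
0, & \text{when } \cos\theta = 0\ (\theta = \tfrac{\pi}{2}), \text{ or } u \perp \nabla f(X_0),\\
-|\nabla f(X_0)|, & \text{when } \cos\theta = -1\ (\theta = \pi), \text{ or } u \parallel -\nabla f(X_0).
\end{cases}
\]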


To see the geometrical meaning of this computation, consider a level curve C given by an equation f(x, y) = C. If we parameterize this contour curve as α(t) = (x(t), y(t)), t ∈ I, then C = f ∘ α(t) = f(x(t), y(t)), and so, by differentiating both sides, we get

\[
0 = \frac{dz(t)}{dt} = Df_{\alpha'(t)}(\alpha(t)) = \nabla f(\alpha(t))\cdot\alpha'(t).
\]

Since α′(t) is tangent to the contour curve C, this means that ∇f(α(t)) is always perpendicular (or normal) to C. That is, the function f increases most rapidly in the ∇f(X) direction at X ∈ R at the rate |∇f(X)|, stays constant (rate 0) in the direction of C (perpendicular to ∇f(X)), and decreases most rapidly in the −∇f(X) direction at X ∈ R at the rate |∇f(X)|.

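[Figure: the level curve C = f(x, y) through X = α(t), with the tangent vector α′(t), the gradient ∇f(X), and a unit vector u making angle θ with ∇f(X).]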

Example 8.4.1 Consider the function z = f(x, y) = xe^y + cos(xy). The directional derivative of f in the direction of v = (3, 4) at X_0 = (2, 0) is computed as follows. The direction of v is the unit vector u = (3/5, 4/5), and the partial derivatives are

\[
f_x(2, 0) = (e^y - y\sin(xy))\big|_{(2, 0)} = 1, \qquad f_y(2, 0) = (xe^y - x\sin(xy))\big|_{(2, 0)} = 2.
\]

Thus the gradient vector of f at X_0 is ∇f(2, 0) = (1, 2) and the directional derivative is

\[
Df_u(2, 0) = \nabla f(2, 0)\cdot u = (1, 2)\cdot\left(\tfrac35, \tfrac45\right) = \tfrac{11}{5}.
\]

f increases most rapidly in the (1, 2) direction, and stays constant in the (2, −1) direction. The tangent line to the level curve f(x, y) = 3 at X_0 is orthogonal to ∇f(2, 0) = (1, 2). Thus its equation is

\[
0 = \nabla f(2, 0)\cdot(x - 2, y) = (1, 2)\cdot(x - 2, y) = x + 2y - 2. \qquad\square
\]
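The value of ∇f(2, 0) · u for u = (3/5, 4/5) can be cross-checked against a central difference quotient of f along u. In this sketch the helper names grad_f and directional_derivative and the step size 10⁻⁶ are illustrative assumptions.

```python
import math

def f(x, y):
    """The function of the example."""
    return x * math.exp(y) + math.cos(x * y)

def grad_f(x, y):
    """Gradient (fx, fy) of f, entered by hand."""
    return (math.exp(y) - y * math.sin(x * y),
            x * math.exp(y) - x * math.sin(x * y))

def directional_derivative(x, y, v):
    """grad f(x, y) . u, where u is the unit vector in the direction of v."""
    length = math.hypot(v[0], v[1])
    u = (v[0] / length, v[1] / length)
    g = grad_f(x, y)
    return g[0] * u[0] + g[1] * u[1]

d = directional_derivative(2.0, 0.0, (3.0, 4.0))

# Central difference of f along u = (0.6, 0.8) as an independent check.
step = 1e-6
finite_difference = (f(2.0 + 0.6 * step, 0.8 * step)
                     - f(2.0 - 0.6 * step, -0.8 * step)) / (2 * step)
print(d, finite_difference)  # both should be close to 11/5
```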


The following theorem is an easy consequence of the definition.

Theorem 8.4.2 Let f and g be differentiable functions on R, and k ∈ ℝ, X ∈ R. Let u, v ∈ ℝ²_X. As a notational convention, we write D(f)_u(X) ≡ D(f)(u).

(1) D(kf + g)(u) = ∇(kf + g) · u = (k∇f + ∇g) · u = (kD(f) + D(g))(u),

(2) D(f)(ku + v) = kD(f)(u) + D(f)(v),

(3) D(fg)(u) = ∇(fg) · u = (f∇g + g∇f) · u = (fD(g) + gD(f))(u),

(4) D(f/g)(u) = ∇(f/g) · u = ((g∇f − f∇g)/g²) · u = ((gD(f) − fD(g))/g²)(u).

For functions of more than two variables, f : ℝⁿ → ℝ, similar definitions hold, with more terms according to the number of variables: for example, if w = f(x, y, z) is a differentiable function of three variables, then

\[
\begin{aligned}
dw &= f_x\,dx + f_y\,dy + f_z\,dz,\\
\nabla f &= (f_x, f_y, f_z),\\
D(f)(u) &= \nabla f\cdot u = f_xu_1 + f_yu_2 + f_zu_3.
\end{aligned}
\]

8.5 Derivatives and Chain Rule

In this section, we consider vector-valued functions of more than one variable: that is, functions of the form F : ℝⁿ → ℝᵐ denoted by

\[
F(x_1, \ldots, x_n) = (f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)), \qquad (x_1, \ldots, x_n) \in R \subseteq \mathbb{R}^n,
\]

where R is an open domain in ℝⁿ. F is differentiable if f_1, ..., f_m are differentiable functions on R: that is, the f_j's have continuous partial derivatives on R, and its derivative is defined to be the m × n matrix of the partial derivatives:

\[
DF(X) = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}(X).
\]

As usual, one can easily see that if F is differentiable on R, then it is continuous on R.

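For simplicity, we consider a function F : ℝ² → ℝ³, denoted by

\[
F(u, v) = (x(u, v), y(u, v), z(u, v)) \in U \subseteq \mathbb{R}^3, \qquad (u, v) \in R \subseteq \mathbb{R}^2,
\]

where R is an open domain in ℝ² and x, y, z are differentiable functions on R, and a differentiable curve α(t) = (u(t), v(t)), t ∈ I, in R ⊆ ℝ². Then

\[
\begin{aligned}
\beta(t) = (F\circ\alpha)(t) = F(\alpha(t)) = F(u(t), v(t)) &= (x(\alpha(t)), y(\alpha(t)), z(\alpha(t)))\\
&= (x(u(t), v(t)),\ y(u(t), v(t)),\ z(u(t), v(t)))
\end{aligned}
\]

is a curve in U. The derivatives of the component functions x, y, and z restricted to the curve α(t) are given as

\[
\begin{aligned}
\frac{d(x\circ\alpha)(t)}{dt} &= \frac{\partial x}{\partial u}\frac{du(t)}{dt} + \frac{\partial x}{\partial v}\frac{dv(t)}{dt},\\
\frac{d(y\circ\alpha)(t)}{dt} &= \frac{\partial y}{\partial u}\frac{du(t)}{dt} + \frac{\partial y}{\partial v}\frac{dv(t)}{dt},\\
\frac{d(z\circ\alpha)(t)}{dt} &= \frac{\partial z}{\partial u}\frac{du(t)}{dt} + \frac{\partial z}{\partial v}\frac{dv(t)}{dt}.
\end{aligned}
\]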

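These three equations together can be written as the chain rule

\[
\beta'(t) = D(F\circ\alpha)_t =
\begin{bmatrix} \frac{d(x\circ\alpha)}{dt}\\[2pt] \frac{d(y\circ\alpha)}{dt}\\[2pt] \frac{d(z\circ\alpha)}{dt} \end{bmatrix}_t
= \begin{bmatrix}
\frac{\partial x}{\partial u} & \frac{\partial x}{\partial v}\\[2pt]
\frac{\partial y}{\partial u} & \frac{\partial y}{\partial v}\\[2pt]
\frac{\partial z}{\partial u} & \frac{\partial z}{\partial v}
\end{bmatrix}_{\alpha(t)}
\begin{bmatrix} \frac{du}{dt}\\[2pt] \frac{dv}{dt} \end{bmatrix}_t
= DF_{\alpha(t)}\,D\alpha_t.
\]

This shows that DF_{α(t)} transforms the tangent vector α′(t) at α(t) to the vector β′(t) tangent to β(t).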

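Example 8.5.1 Consider the spherical coordinates F(θ, φ) = (x(θ, φ), y(θ, φ), z(θ, φ)), given by, for (θ, φ) ∈ (0, 2π) × (0, π),

\[
x(\theta, \varphi) = \rho\sin\varphi\cos\theta, \qquad y(\theta, \varphi) = \rho\sin\varphi\sin\theta, \qquad z(\theta, \varphi) = \rho\cos\varphi.
\]

Let α(t) = (θ(t), φ(t)) = (t, φ_0) be a curve, where φ_0 is a constant. Then the tangent vector to β(t) = (F ∘ α)(t) is

\[
\beta'(t) = DF_{\alpha(t)}\,\alpha'(t) =
\begin{bmatrix}
\frac{\partial x}{\partial\theta} & \frac{\partial x}{\partial\varphi}\\[2pt]
\frac{\partial y}{\partial\theta} & \frac{\partial y}{\partial\varphi}\\[2pt]
\frac{\partial z}{\partial\theta} & \frac{\partial z}{\partial\varphi}
\end{bmatrix}_{\alpha(t)}
\begin{bmatrix} \frac{d\theta}{dt}\\[2pt] \frac{d\varphi}{dt} \end{bmatrix}_t
= \begin{bmatrix}
-\rho\sin\varphi_0\sin t & \rho\cos\varphi_0\cos t\\
\rho\sin\varphi_0\cos t & \rho\cos\varphi_0\sin t\\
0 & -\rho\sin\varphi_0
\end{bmatrix}
\begin{bmatrix} 1\\ 0 \end{bmatrix}
= \begin{bmatrix} -\rho\sin\varphi_0\sin t\\ \rho\sin\varphi_0\cos t\\ 0 \end{bmatrix}. \qquad\square
\]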


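Let f : U → ℝ be a differentiable function on U. Then for each fixed v_0, (f ∘ F)(u, v_0) is a function of u only, so the derivative of f ∘ F in u is just the partial derivative, and so, by the chain rule of Theorem 8.4.1, we have

\[
\frac{\partial(f\circ F)}{\partial u}\Big|_{(u_0, v_0)} = \left[\frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial u}\right]_{(u_0, v_0)}.
\]

Similarly, for fixed u_0, the partial derivative of f ∘ F in v is:

\[
\frac{\partial(f\circ F)}{\partial v}\Big|_{(u_0, v_0)} = \left[\frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial v}\right]_{(u_0, v_0)}.
\]

These two equations together can be written as the chain rule

\[
D(f\circ F)_X = \begin{bmatrix} \frac{\partial(f\circ F)}{\partial u} & \frac{\partial(f\circ F)}{\partial v} \end{bmatrix}_X
= \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} & \frac{\partial f}{\partial z} \end{bmatrix}_{F(X)}
\begin{bmatrix}
\frac{\partial x}{\partial u} & \frac{\partial x}{\partial v}\\[2pt]
\frac{\partial y}{\partial u} & \frac{\partial y}{\partial v}\\[2pt]
\frac{\partial z}{\partial u} & \frac{\partial z}{\partial v}
\end{bmatrix}_X
= Df_{F(X)}\,DF_X.
\]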

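Example 8.5.2 Let w = f(x, y, z) = x + 2y + z² and F(u, v) = (x(u, v), y(u, v), z(u, v)) = (u/v, u² + ln v, 2u). Then

\[
\begin{aligned}
D(f\circ F)_{(u, v)} = Df_{F(u, v)}\,DF_{(u, v)} = \begin{bmatrix} \frac{\partial w}{\partial u} & \frac{\partial w}{\partial v} \end{bmatrix}
&= \begin{bmatrix} f_x & f_y & f_z \end{bmatrix}
\begin{bmatrix}
\frac{\partial x}{\partial u} & \frac{\partial x}{\partial v}\\[2pt]
\frac{\partial y}{\partial u} & \frac{\partial y}{\partial v}\\[2pt]
\frac{\partial z}{\partial u} & \frac{\partial z}{\partial v}
\end{bmatrix}\\
&= \begin{bmatrix} 1 & 2 & 4u \end{bmatrix}
\begin{bmatrix} \frac{1}{v} & -\frac{u}{v^2}\\[2pt] 2u & \frac{1}{v}\\[2pt] 2 & 0 \end{bmatrix}
= \begin{bmatrix} \frac{1}{v} + 12u & -\frac{u}{v^2} + \frac{2}{v} \end{bmatrix}. \qquad\square
\end{aligned}
\]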
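The matrix product in this example can be checked by multiplying the 1 × 3 row Df by the 3 × 2 matrix DF directly and comparing with difference quotients of the composite. The sketch below makes some illustrative assumptions: the sample point (u, v) = (1.5, 2), the step size, and the helper names composite and chain_rule_gradient.

```python
import math

def F(u, v):
    """The map F(u, v) = (u / v, u^2 + ln v, 2u) of the example."""
    return (u / v, u * u + math.log(v), 2 * u)

def f(x, y, z):
    """The function f(x, y, z) = x + 2y + z^2 of the example."""
    return x + 2 * y + z * z

def composite(u, v):
    """The composite w = (f o F)(u, v)."""
    return f(*F(u, v))

def chain_rule_gradient(u, v):
    """Row vector Df at F(u, v) times the 3 x 2 matrix DF at (u, v)."""
    x, y, z = F(u, v)
    df = (1.0, 2.0, 2 * z)                # [fx fy fz] evaluated at F(u, v)
    dF = ((1 / v, -u / v ** 2),           # [x_u x_v]
          (2 * u, 1 / v),                 # [y_u y_v]
          (2.0, 0.0))                     # [z_u z_v]
    return tuple(sum(df[i] * dF[i][j] for i in range(3)) for j in range(2))

# Illustrative sample point (an assumption, not from the text).
u, v = 1.5, 2.0
g = chain_rule_gradient(u, v)
closed_form = (1 / v + 12 * u, -u / v ** 2 + 2 / v)

# Central differences of the composite as an independent check.
step = 1e-6
fd = ((composite(u + step, v) - composite(u - step, v)) / (2 * step),
      (composite(u, v + step) - composite(u, v - step)) / (2 * step))
print(g, closed_form, fd)
```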

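In general, if F : ℝⁿ → ℝᵐ and G : ℝᵐ → ℝᵖ are differentiable functions, then the chain rule holds as:

\[
D(G\circ F)_X = DG_{F(X)}\,DF_X =
\begin{bmatrix}
\frac{\partial g_1}{\partial y_1} & \cdots & \frac{\partial g_1}{\partial y_m}\\
\vdots & \ddots & \vdots\\
\frac{\partial g_p}{\partial y_1} & \cdots & \frac{\partial g_p}{\partial y_m}
\end{bmatrix}
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}.
\]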

Suppose that z = F(x, y) is a differentiable function and the level curve C = F(x, y) defines a differentiable function y = g(x). Then C = F(x, y) = F(x, g(x)) is a function of x. By differentiating both sides,

\[
0 = F_x\frac{dx}{dx} + F_y\frac{dy}{dx} = F_x + F_y\frac{dg(x)}{dx}.
\]

Thus, if F_y ≠ 0, we have

\[
\frac{dy}{dx} = -\frac{F_x}{F_y},
\]

which is called implicit differentiation.

Example 8.5.3 The equation F(x, y) = x² + y² − r² = 0 defines two functions y = √(r² − x²) and y = −√(r² − x²). In both cases,

\[
2x + 2y\frac{dy}{dx} = 0 \implies \frac{dy}{dx} = -\frac{x}{y},
\]

in which the ± sign is included in the sign of the y value. □

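Let w = f(x, y, z) be a differentiable function, in which z = g(x, y) is also a differentiable function. Then w = f(x, y, g(x, y)) = F(x, y), which is the composite of f with φ(x, y) = (x, y, g(x, y)). Thus

\[
DF = Df\cdot D\varphi,
\quad\text{or}\quad
\begin{bmatrix} F_x & F_y \end{bmatrix} = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix}
\begin{bmatrix} 1 & 0\\ 0 & 1\\ g_x & g_y \end{bmatrix}
= \begin{bmatrix} f_x + f_zg_x & f_y + f_zg_y \end{bmatrix}.
\]

Such a function z = g(x, y) is called a constraint of f.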

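Example 8.5.4 Let w = f(x, y, z) = x² + y² + z², and let z = g(x, y) satisfy z³ − xy + yz + y³ = 1. Find ∂w/∂x and ∂w/∂y.

Set φ(x, y) = (x, y, g(x, y)). Then, by implicit differentiation,

\[
\begin{aligned}
3z^2\frac{\partial z}{\partial x} - y + y\frac{\partial z}{\partial x} = 0 &\implies \frac{\partial z}{\partial x} = \frac{y}{3z^2 + y},\\
3z^2\frac{\partial z}{\partial y} - x + z + y\frac{\partial z}{\partial y} + 3y^2 = 0 &\implies \frac{\partial z}{\partial y} = \frac{x - z - 3y^2}{3z^2 + y}.
\end{aligned}
\]

Hence,

\[
\begin{aligned}
\begin{bmatrix} w_x & w_y \end{bmatrix} &= \begin{bmatrix} f_x & f_y & f_z \end{bmatrix}
\begin{bmatrix} 1 & 0\\ 0 & 1\\ g_x & g_y \end{bmatrix}
= \begin{bmatrix} 2x & 2y & 2z \end{bmatrix}
\begin{bmatrix} 1 & 0\\ 0 & 1\\ \frac{y}{3z^2 + y} & \frac{x - z - 3y^2}{3z^2 + y} \end{bmatrix}\\
&= \begin{bmatrix} 2x + \frac{2yz}{3z^2 + y} & 2y + \frac{2z(x - z - 3y^2)}{3z^2 + y} \end{bmatrix}. \qquad\square
\end{aligned}
\]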


8.6 Taylor’s Polynomial

Let z = f(x, y) have continuous partial derivatives on an open domain R ⊆ ℝ². Let P = (a, b) ∈ D ⊆ R and Q = (x, y) = (a + h, b + k) ∈ D, where h = Δx and k = Δy. Let α(t) = (a + th, b + tk), 0 ≤ t ≤ 1, be the line segment joining P to Q with α′(t) = (h, k). Then the derivatives of the function F(t) = f ∘ α(t) = f(a + th, b + tk) are given, by the chain rule, as

\[
\begin{aligned}
F'(0) &= \left[f_x\frac{dx}{dt} + f_y\frac{dy}{dt}\right]_{(a, b)} = [hf_x + kf_y]_{(a, b)},\\
F''(0) &= \left[f_{xx}\frac{dx}{dt} + f_{xy}\frac{dy}{dt}\right]_{(a, b)}h + \left[f_{yx}\frac{dx}{dt} + f_{yy}\frac{dy}{dt}\right]_{(a, b)}k\\
&= \left[f_{xx}h^2 + 2f_{xy}kh + f_{yy}k^2\right]_{(a, b)}.
\end{aligned}
\]

By Taylor's formula for functions of one variable,

\[
F(1) = F(0) + F'(0)(1 - 0) + \frac{F''(c)}{2!}(1 - 0)^2, \quad\text{for some } c \in [0, 1],
\]

or

\[
f(x, y) = f(a, b) + (f_x(a, b)\,\Delta x + f_y(a, b)\,\Delta y) + \frac{1}{2!}\left[f_{xx}\,\Delta x^2 + 2f_{xy}\,\Delta x\,\Delta y + f_{yy}\,\Delta y^2\right]_{(a + ch,\ b + ck)}.
\]

The last term is the error term for the linear approximation of f(x, y) by the tangent plane, discussed in Corollary 8.3.4: if

\[
M = \max\{|f_{xx}(x, y)|,\ |f_{xy}(x, y)|,\ |f_{yy}(x, y)| \mid (x, y) \in D\},
\]

where D is a rectangle in R centered at (a, b), then

\[
\mathrm{Error}_2(Q, P) \le \frac{M}{2}(|\Delta x| + |\Delta y|)^2.
\]

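In general, for fixed h = x − a = Δx and k = y − b = Δy, define a differential operator D = h ∂/∂x + k ∂/∂y so that for a differentiable function f,

\[
\begin{aligned}
D(f) &= \left[h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right](f) = hf_x + kf_y,\\
D^2(f) &= \left[h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right]^2(f) = h^2f_{xx} + 2hkf_{xy} + k^2f_{yy},\\
D^3(f) &= \left[h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right]^3(f) = h^3f_{xxx} + 3h^2kf_{xxy} + 3hk^2f_{xyy} + k^3f_{yyy},\\
&\ \ \vdots
\end{aligned}
\]

Now, if f is smooth enough on R,

\[
F^{(n)}(0) = \frac{d^n}{dt^n}F(0) = D^n(f)_{(a, b)} = \left[h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right]^n(f)_{(a, b)}.
\]

The Taylor polynomial is given as:

\[
F(1) = F(0) + F'(0)(1 - 0) + \frac{F''(0)}{2!}(1 - 0)^2 + \cdots + \frac{F^{(n)}(0)}{n!}(1 - 0)^n + \frac{F^{(n+1)}(c)}{(n + 1)!}(1 - 0)^{n+1}, \quad\text{for some } c \in [0, 1],
\]

or

\[
f(x, y) = f(a, b) + [f_xh + f_yk]_{(a, b)} + \frac{1}{2!}[h^2f_{xx} + 2hkf_{xy} + k^2f_{yy}]_{(a, b)} + \cdots + \frac{1}{n!}D^n(f)_{(a, b)} + \frac{1}{(n + 1)!}D^{n+1}(f)_{(a + ch,\ b + ck)}.
\]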

8.7 Extreme Values

Most optimization problems in applications are concerned with the maximization or minimization of certain functions of several variables. When the function is smooth on the domain, the extrema occur either at boundary points of the domain or at points where the tangent plane is horizontal, that is, where the first partial derivatives are zero; not every such point is an extremum, however. A point where the tangent plane is horizontal, but which is not a local extremum, is called a saddle point.

Definition 8.7.1 Let z = f(x, y) be a function on a domain R ⊆ ℝ². For (a, b) ∈ R, f(a, b) is a local maximum (or a local minimum) if f(a, b) ≥ (or ≤) f(x, y) for all (x, y) ∈ R in an open disk centered at (a, b).

Theorem 8.7.1 (First derivative test) If z = f(x, y) has a local extremum at an interior point (a, b) ∈ R, and if the first partial derivatives exist there, then f_x(a, b) = 0 = f_y(a, b).

Proof: If z = f(x, y) has a local extremum at (a, b) ∈ R, then g(x) = f(x, b) also has a local extremum at x = a. Thus g′(a) = f_x(a, b) = 0. Similarly, f_y(a, b) = 0. □


Definition 8.7.2 An interior point (a, b) of R is called a critical point of f(x, y) if either f_x or f_y does not exist there, or f_x(a, b) = 0 = f_y(a, b). For a critical point (a, b) in R, the point (a, b, f(a, b)) on the graph of z = f(x, y) is called a saddle point if in every open disk centered at (a, b) there are points (x, y) where f(a, b) > f(x, y), and points where f(a, b) < f(x, y).

Let (a, b) ∈ R be a critical point of f such that f_x(a, b) = 0 = f_y(a, b), or D(f) = [hf_x + kf_y]_{(a, b)} = 0. Thus, from Taylor's formula,

\[
f(x, y) = f(a, b) + \frac{1}{2!}[h^2f_{xx} + 2hkf_{xy} + k^2f_{yy}]_{(a, b)} + \mathrm{Error}_3(Q, P),
\]

where |Error_3(Q, P)| → 0 as (h, k) = (x − a, y − b) → (0, 0). Hence, for sufficiently small (h, k),

\[
f(x, y) - f(a, b) \approx \frac{1}{2!}[h^2f_{xx} + 2hkf_{xy} + k^2f_{yy}]_{(a, b)},
\]

and so both sides have the same sign. Thus,

f(a, b) is a maximum if f(x, y) − f(a, b) < 0 for all (x, y) ∈ D,
f(a, b) is a minimum if f(x, y) − f(a, b) > 0 for all (x, y) ∈ D.

Multiplying both sides by f_xx (evaluated at (a, b)) and completing the square, we get

\[
\begin{aligned}
f_{xx}[f(x, y) - f(a, b)] &\approx \frac{1}{2!}\left[h^2f_{xx}^2 + 2hkf_{xx}f_{xy} + k^2f_{xx}f_{yy}\right]\\
&= \frac{1}{2!}\left[(hf_{xx} + kf_{xy})^2 + k^2(f_{xx}f_{yy} - f_{xy}^2)\right].
\end{aligned}
\]

Theorem 8.7.2 (Second derivative test)

(1) f(x, y) − f(a, b) > 0, or f(a, b) is a minimum, if f_xx f_yy − f_xy² > 0 and f_xx > 0 at (a, b).

(2) f(x, y) − f(a, b) < 0, or f(a, b) is a maximum, if f_xx f_yy − f_xy² > 0 and f_xx < 0 at (a, b).

(3) (a, b, f(a, b)) is a saddle point if f_xx f_yy − f_xy² < 0 at (a, b), since f(x, y) − f(a, b) takes both signs depending on the values of h and k.

(4) The test fails if f_xx f_yy − f_xy² = 0 at (a, b), since there is a possibility that f(x, y) − f(a, b) = 0.

The expression [f_xx f_yy − f_xy²]_{(a, b)} is called the Hessian of f, which can be written as the determinant:

\[
H_f(a, b) = \begin{vmatrix} f_{xx} & f_{xy}\\ f_{xy} & f_{yy} \end{vmatrix}_{(a, b)}.
\]


Example 8.7.1 Find the local extrema of $f(x, y) = xy - x^2 - y^2 - 2x - 2y + 4$.

\[ f_x = y - 2x - 2 = 0, \quad f_y = x - 2y - 2 = 0 \implies x = -2 = y. \]

Thus, $(-2, -2)$ is the only critical point of $f$. Since

\[ f_{xx} = -2 < 0, \quad f_{yy} = -2, \quad f_{xy} = 1, \]

the Hessian of $f$ at $(a, b) = (-2, -2)$ is $(-2)(-2) - 1^2 = 3 > 0$. Thus, $f(-2, -2) = 8$ is the local maximum. □
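The second derivative test can be sketched as a small routine. The function name `classify_critical_point` and its string labels are illustrative choices, not notation from the text; the second partials of Example 8.7.1 are entered by hand.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Apply the second derivative test to the second partials at a critical point."""
    hessian = fxx * fyy - fxy ** 2
    if hessian > 0:
        # hessian > 0 forces fxx != 0, so the sign of fxx decides the type.
        return "local minimum" if fxx > 0 else "local maximum"
    if hessian < 0:
        return "saddle point"
    return "inconclusive"


def f(x, y):
    return x * y - x**2 - y**2 - 2 * x - 2 * y + 4


# Example 8.7.1: fxx = -2, fyy = -2, fxy = 1 at the critical point (-2, -2).
result = classify_critical_point(-2, -2, 1)
value = f(-2, -2)
print(result, value)
```

The routine only inspects the sign pattern of the Hessian and of $f_{xx}$, so it applies to any critical point whose second partials are known.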

Example 8.7.2 Find the absolute extrema of $f(x, y) = 2 + 2x + 2y - x^2 - y^2$ on the triangular region $R = \{(x, y) \mid x \ge 0,\ y \ge 0,\ y \le 9 - x\}$.

(1) On the interior: By solving $f_x = 2 - 2x = 0$ and $f_y = 2 - 2y = 0$, $(a, b) = (1, 1)$ is the only critical point in $R$, with value $f(1, 1) = 4$.

(2) On the boundary:

(i) At the corners: $P_1 = (0, 0)$ with $f(0, 0) = 2$, $P_2 = (0, 9)$ with $f(0, 9) = -61$, and $P_3 = (9, 0)$ with $f(9, 0) = -61$.

(ii) On $y = 0$, $f(x, 0) = 2 + 2x - x^2$ with $x \in [0, 9]$. Thus $\frac{d}{dx} f(x, 0) = 2 - 2x = 0$ means $x = 1$, and $f(1, 0) = 3$.

(iii) On $x = 0$, $f(0, y) = 2 + 2y - y^2$ with $y \in [0, 9]$. Thus $\frac{d}{dy} f(0, y) = 2 - 2y = 0$ means $y = 1$, and $f(0, 1) = 3$.

(iv) On $y = 9 - x$, $f(x, 9 - x) = 2 + 2x + 2(9 - x) - x^2 - (9 - x)^2 = -61 + 18x - 2x^2$ with $x \in [0, 9]$. Thus $\frac{d}{dx} f(x, 9 - x) = 18 - 4x = 0$ means $x = \frac{9}{2}$, and so $y = \frac{9}{2}$. Thus $f(\frac{9}{2}, \frac{9}{2}) = -\frac{41}{2}$.

Thus, the maximum of $f$ is $f(1, 1) = 4$, and the minimum $-61$ occurs at the corners: $f(0, 9) = -61 = f(9, 0)$. □
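As a quick check of the bookkeeping in Example 8.7.2, the candidate points (interior critical point, corners, and edge critical points) can be compared directly. This is only a sketch; the list `candidates` is typed in by hand from the example.

```python
def f(x, y):
    return 2 + 2 * x + 2 * y - x**2 - y**2


# Candidates collected in Example 8.7.2: interior critical point,
# the three corners, and the critical point on each edge.
candidates = [(1, 1), (0, 0), (0, 9), (9, 0), (1, 0), (0, 1), (4.5, 4.5)]

values = {p: f(*p) for p in candidates}
max_point = max(values, key=values.get)
min_value = min(values.values())
min_points = [p for p, v in values.items() if v == min_value]

print(max_point, values[max_point], min_points, min_value)
```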

8.8 Lagrange Multipliers

The method of Lagrange multipliers is useful for finding the extreme values of $w = f(x, y, z)$ with its domain restricted by some constraint. The following examples show how to find such extreme values.

Example 8.8.1 Find the point $P = (x, y, z)$ on the plane $2x + y - z - 5 = 0$ closest to the origin.


Solution: To minimize the distance from the origin to a point $P$, it is enough to minimize $w = f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $2x + y - z - 5 = 0$. Solving this for $z$, we get $z = g(x, y) = 2x + y - 5$. Thus our problem is reduced to minimizing

\[ w = F(x, y) = (f \circ \phi)(x, y) = f(x, y, 2x + y - 5) = x^2 + y^2 + (2x + y - 5)^2, \]

where $\phi(x, y) = (x, y, 2x + y - 5)$. Now solve

\[ F_x = 2x + 4(2x + y - 5) = 10x + 4y - 20 = 0, \qquad F_y = 2y + 2(2x + y - 5) = 4x + 4y - 10 = 0, \]

to get $x = \frac{5}{3}$, $y = \frac{5}{6}$. By the second derivative test, $F(\frac{5}{3}, \frac{5}{6})$ is the minimum, since $H_F(\frac{5}{3}, \frac{5}{6}) = 10 \cdot 4 - 4^2 = 24 > 0$ and $F_{xx} = 10 > 0$. Since $z = 2x + y - 5 = -\frac{5}{6}$, $P = (\frac{5}{3}, \frac{5}{6}, -\frac{5}{6})$. □

Remark: In Example 8.8.1, the constraint represents a level surface in $\mathbb{R}^3$, which can be expressed in various ways. The rate at which the value $w = f(x, y, z)$ changes therefore usually depends on which variables we choose as dependent, so one has to specify what the dependent variables are, as the following example shows. □

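Example 8.8.2 Find $\frac{\partial w}{\partial x}$ for $w = f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $x^2 + y^2 - z = 0$.

Solution: (1) If we choose $x, y$ as independent variables and $z$ as a dependent variable, then $z = g(x, y) = x^2 + y^2$, and so $w = (f \circ \phi)(x, y) = f(x, y, g(x, y))$ with $\phi(x, y) = (x, y, g(x, y))$. Thus

\[ \begin{bmatrix} \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{bmatrix} = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ g_x & g_y \end{bmatrix} = \begin{bmatrix} 2x & 2y & 2z \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2x & 2y \end{bmatrix} = \begin{bmatrix} 2x + 4x(x^2 + y^2) & 2y + 4y(x^2 + y^2) \end{bmatrix}. \]

(2) If we choose $x, z$ as independent variables and $y$ as a dependent variable, then (on the branch $y \ge 0$) $y = h(x, z) = \sqrt{z - x^2}$, and so $w = (f \circ \psi)(x, z) = f(x, h(x, z), z) = x^2 + (z - x^2) + z^2 = z + z^2$. Thus

\[ \frac{\partial w}{\partial x} = 0. \]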

(3) A geometrical interpretation: The level surface $x^2 + y^2 - z = 0$ is a paraboloid. When the $x$-coordinate of a point $P$ on the paraboloid varies while the independent variable $y$ is held fixed (say $y = 0$), $P$ moves along a parabola ($z = x^2$ when $y = 0$). Thus $w$, the squared distance from the origin to $P$, changes, so that $\frac{\partial w}{\partial x} = 2x + 4x^3 + 4xy^2 \neq 0$ for $x \neq 0$.

[Figure: the paraboloid $z = x^2 + y^2$ with a point $P$, the parabola $z = x^2$, and the circle $c = x^2 + y^2$.]

If we take $y$ as a dependent variable, and if the $x$-coordinate of a point $P$ on the paraboloid varies while the independent variable $z$ is held fixed at $z = c$, then $P$ moves along the circle $c = x^2 + y^2$. Hence the distance from the origin to $P$ remains constant, and so $\frac{\partial w}{\partial x} = 0$. □

Example 8.8.3 Find the point $P = (x, y, z)$ on the hyperbolic cylinder $x^2 - z^2 - 1 = 0$ closest to the origin.

Solution: Again, to minimize the distance from the origin to a point $P$, it is enough to minimize $w = f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $g(x, y, z) = x^2 - z^2 - 1 = 0$.

Note that the constraint $g(x, y, z) = 0$ describes a level surface $S$ in $\mathbb{R}^3$. On the other hand, $f(x, y, z) = c^2$ is also a level surface, namely the sphere of radius $c$ centered at the origin, so that the points on this sphere are at the same distance from the origin. As the radius increases from zero, the first point at which the sphere touches the hyperbolic cylinder is the closest point on the cylinder to the origin. At this contact point, the two surfaces are tangent to each other, and so their normal vectors are parallel to each other. But those normal vectors are just the gradient vectors, since the surfaces are level surfaces and gradients are normal to level surfaces. Hence, at the contact point $P$, we can write

\[ \nabla f(P) = \lambda \nabla g(P), \quad \text{for some } \lambda \in \mathbb{R}. \]

In our problem,

\[ \nabla f(P) = (2x, 2y, 2z) = \lambda (2x, 0, -2z) = \lambda \nabla g(P), \]

or $2x = 2\lambda x$, $2y = 0$, $2z = -2\lambda z$.

Since $P \in S$, we have $x^2 = 1 + z^2 \ge 1$, so $x \neq 0$. Thus from the first equation, $\lambda = 1$. Then $2z = -2z$ shows $z = 0$. Hence $P = (x, 0, 0)$, and since $P \in S$, $1 = x^2 - z^2 = x^2$, or $x = \pm 1$. Thus, $P = (\pm 1, 0, 0)$. □

This method holds in general as follows:

Theorem 8.8.1 Let $w = f(x, y, z)$ be a differentiable function on $U \subseteq \mathbb{R}^3$, and let $C \subseteq U$ be a smooth curve in $U$ parametrized by $\alpha(t) = (x(t), y(t), z(t))$. If $f$ has a local extremum at $P_0 = \alpha(t_0) \in C$ relative to its values on $C$, then $\nabla f(P_0) \perp C$, that is, $\nabla f(P_0) \cdot \alpha'(t_0) = 0$.

Proof: If $F(t) = (f \circ \alpha)(t) = f(x(t), y(t), z(t))$ has a local extremum at $t_0$, then by the chain rule

\[ \left.\frac{dF}{dt}\right|_{t_0} = \nabla f(P_0) \cdot \alpha'(t_0) = 0. \quad □ \]

Suppose that $w = f(x, y, z)$ and $g(x, y, z)$ are differentiable functions on $U$, and suppose that $w = f(x, y, z)$ has a local extremum at $P_0$ on the level surface $g(x, y, z) = 0$ relative to its values on the surface. Then $f$ takes on a local extremum at $P_0$ relative to its values on every differentiable curve through $P_0$ on the surface $g(x, y, z) = 0$. Therefore, $\nabla f(P_0)$ is orthogonal to the velocity vector of every curve on the surface through $P_0$. Since $\nabla g$ is orthogonal to the level surface $g(x, y, z) = 0$, we must have (provided $\nabla g(P_0) \neq 0$)

\[ \nabla f(P_0) = \lambda \nabla g(P_0), \quad \text{for some } \lambda \in \mathbb{R}. \]

The number $\lambda$ is called a Lagrange multiplier, and this is called the method of Lagrange multipliers.

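Example 8.8.4 Find the extreme values of $z = f(x, y) = xy$ subject to $C : \frac{x^2}{8} + \frac{y^2}{2} = 1$.

Solution: With $g(x, y) = \frac{x^2}{8} + \frac{y^2}{2} - 1$, we have $\nabla f(x, y) = (y, x)$ and $\nabla g(x, y) = (\frac{x}{4}, y)$, and $\nabla f(x, y) = \lambda \nabla g(x, y)$ gives

\[ y = \frac{1}{4}\lambda x, \quad x = \lambda y \implies y = \frac{1}{4}\lambda^2 y. \]

Thus, $y = 0$ or $\lambda = \pm 2$. However, $y \neq 0$, since otherwise $x = \lambda \cdot 0 = 0$, but $(0, 0) \notin C$. Thus $\lambda = \pm 2$, which means $x = \pm 2y$. Then from the constraint:

\[ 0 = \frac{x^2}{8} + \frac{y^2}{2} - 1 = y^2 - 1 \implies y = \pm 1, \ x = \pm 2. \]

In fact, $f(2, 1) = f(-2, -1) = 2$ is the maximum and $f(2, -1) = f(-2, 1) = -2$ is the minimum. □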
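The result of Example 8.8.4 can be cross-checked numerically. The sketch below assumes the parametrization $x = 2\sqrt{2}\cos t$, $y = \sqrt{2}\sin t$ of the ellipse (a choice made here, not taken from the text), samples $f$ along it, and also verifies that $\nabla f$ and $\nabla g$ are parallel at $(2, 1)$.

```python
import math


def f(x, y):
    return x * y


def point_on_ellipse(t):
    # Illustrative parametrization of x^2/8 + y^2/2 = 1.
    return 2 * math.sqrt(2) * math.cos(t), math.sqrt(2) * math.sin(t)


n = 80_000  # multiple of 8, so t = pi/4, 3*pi/4, ... are sampled
values = [f(*point_on_ellipse(2 * math.pi * k / n)) for k in range(n)]
f_max, f_min = max(values), min(values)

# Lagrange condition at (2, 1): grad f = (y, x), grad g = (x/4, y).
x, y = 2.0, 1.0
grad_f = (y, x)
grad_g = (x / 4, y)
cross = grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]  # zero iff parallel
lam = grad_f[0] / grad_g[0]

print(f_max, f_min, cross, lam)
```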

Sometimes one is asked to find the extreme values of $w = f(x, y, z)$ subject to two constraints $g_1(x, y, z) = 0$ and $g_2(x, y, z) = 0$ with $\nabla g_1 \nparallel \nabla g_2$. In this case, they can also be found by introducing two Lagrange multipliers $\lambda_1$ and $\lambda_2$: find the values $x, y, z, \lambda_1$, and $\lambda_2$ from the following equations

\[ \nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2, \quad g_1(x, y, z) = 0, \quad g_2(x, y, z) = 0. \]

Geometrically, the two level surfaces of the constraints intersect in a smooth curve $C$, and we are looking for points on the curve $C$ where $f$ takes local extrema relative to its values on the curve. These are the points where $\nabla f$ is orthogonal to $C$. Since $\nabla g_1$ and $\nabla g_2$ are both orthogonal to $C$, $\nabla f$ lies in the plane spanned by $\nabla g_1$ and $\nabla g_2$: i.e., $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2$ for some $\lambda_1$ and $\lambda_2$.

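[Figure: the surfaces $g_1 = 0$ and $g_2 = 0$ intersecting in the curve $C$, with $\nabla g_1$, $\nabla g_2$, and $\nabla f$ drawn at the point $P_0$.]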

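Example 8.8.5 Find the points closest to the origin subject to the two constraints $x + y + z = 1$ and $x^2 + y^2 = 1$, that is, $g_1(x, y, z) = x + y + z - 1 = 0$ and $g_2(x, y, z) = x^2 + y^2 - 1 = 0$.

[Figure: the plane $x + y + z = 1$ and the cylinder $x^2 + y^2 = 1$ intersecting in the ellipse $C$ through $(1, 0, 0)$ and $(0, 1, 0)$.]

Solution: The intersection $C$ of the two constraints is an ellipse. Since we want to find the extreme values of $f(x, y, z) = x^2 + y^2 + z^2$ on the ellipse, we solve

\[ \nabla f(x, y, z) = \lambda_1 \nabla g_1(x, y, z) + \lambda_2 \nabla g_2(x, y, z), \quad \text{or} \quad (2x, 2y, 2z) = \lambda_1 (1, 1, 1) + \lambda_2 (2x, 2y, 0), \]

or $2x = \lambda_1 + 2\lambda_2 x$, $2y = \lambda_1 + 2\lambda_2 y$, $2z = \lambda_1$.

Thus, $(1 - \lambda_2)x = z$ and $(1 - \lambda_2)y = z$, and so either $\lambda_2 = 1$ and $z = 0$, or $\lambda_2 \neq 1$ and $x = y = \frac{z}{1 - \lambda_2}$.

If $z = 0$, then $\lambda_1 = 0$, and from the constraints, $x + y = 1$ and $x^2 + y^2 = 1$. Thus, $0 = x^2 + (1 - x)^2 - 1 = 2x(x - 1)$ shows $x = 0$ and $y = 1$, or $x = 1$ and $y = 0$. Hence, at $(0, 1, 0)$ and $(1, 0, 0)$, $f = 1$ is the minimum.

If $x = y$, then, from the constraints, $2x + z = 1$ or $z = 1 - 2x$, and $x^2 + y^2 = 2x^2 = 1$ or $x = \pm\frac{1}{\sqrt{2}}$, so that $z = 1 \mp \sqrt{2}$. Hence at $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1 - \sqrt{2})$ and $(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 1 + \sqrt{2})$, $f$ takes local maximum values $f = 1 + z^2 = 4 \mp 2\sqrt{2}$. Thus the closest points on the ellipse are $(0, 1, 0)$ and $(1, 0, 0)$. □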
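A numerical cross-check of Example 8.8.5: the sketch below assumes the parametrization $x = \cos t$, $y = \sin t$, $z = 1 - \cos t - \sin t$ of the intersection curve (chosen here for illustration) and scans $f$ along it.

```python
import math


def f_on_curve(t):
    # Points of the cylinder x^2 + y^2 = 1 that also lie on x + y + z = 1.
    x, y = math.cos(t), math.sin(t)
    z = 1 - x - y
    return x * x + y * y + z * z


n = 3600  # multiple of 8, so t = 0, pi/2 and 5*pi/4 are sampled
values = [f_on_curve(2 * math.pi * k / n) for k in range(n)]
f_min, f_max = min(values), max(values)

print(f_min, f_max)
```

The minimum $1$ is attained where $z = 0$, and the largest value $4 + 2\sqrt{2}$ at $t = 5\pi/4$, in agreement with the two cases of the solution.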

Chapter 9

Multiple Integrals

9.1 Double Integrals

Let $z = f(x, y)$ be a continuous function on a rectangular region $R = [a, b] \times [c, d]$. Take a partition $P$ of $R$ by a network of lines parallel to the coordinate axes into small rectangles $R_j$, $j = 1, \ldots, n$, with sides $\Delta x_j$ and $\Delta y_j$. Let $\Delta A_j = \Delta x_j \Delta y_j$ be the area of $R_j$. The Riemann sum of $f$ over $R$ is

\[ S_n = \sum_{j=1}^{n} f(x_j, y_j)\,\Delta A_j, \quad (x_j, y_j) \in R_j. \]

The norm $|P|$ of a partition $P$ of $R$ is the largest width or height of any rectangle in the partition. The double integral of $f$ over $R$ is the limit of these sums as $|P| \to 0$, provided it exists:

\[ \lim_{|P| \to 0} \sum_{j=1}^{n} f(x_j, y_j)\,\Delta A_j = \iint_R f\,dA = \iint_R f(x, y)\,dx\,dy. \]

In advanced calculus, the following can be proven:

(1) If the limit exists, it is independent of the order in which the regions $R_j$ are numbered.

(2) If the limit exists, it is independent of the choice of $(x_j, y_j) \in R_j$.

(3) If $f$ is continuous on $R$, then the limit exists and is unique.

The following theorem is a direct consequence of the definition of the limit.


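Theorem 9.1.1 Suppose $f$ and $g$ are continuous on $R$, and let $c \in \mathbb{R}$ be a constant. Then

(1) $\iint_R c f\,dA = c \iint_R f\,dA$.

(2) $\iint_R (f \pm g)\,dA = \iint_R f\,dA \pm \iint_R g\,dA$.

(3) $\iint_R f\,dA \ge 0$ if $f(x, y) \ge 0$ in $R$.

(3′) $\iint_R f\,dA \ge \iint_R g\,dA$ if $f(x, y) \ge g(x, y)$ in $R$.

(4) If $R = R_1 \cup R_2$, where $R_1$ and $R_2$ do not overlap except possibly along their boundaries, then $\iint_R f\,dA = \iint_{R_1} f\,dA + \iint_{R_2} f\,dA$.

Note that if $f(x, y) \ge 0$ on $R$, then $f(x_j, y_j)\,\Delta A_j$ is the volume of the vertical rectangular box approximating the volume of the solid under the graph of $f$ over $R_j$. Thus $\iint_R f\,dA$ is the volume of the solid under the graph of $f$ over the region $R$.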

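Example 9.1.1 Let $R = [0, 2] \times [0, 1]$, and let $z = f(x, y) = 4 - x - y$ be defined over $R$. Consider the solid under the graph over $R$.

[Figure: the solid under the plane $z = 4 - x - y$ over $R$, with the cross section of area $F(x)$ at a fixed $x$.]

For each $x \in [0, 2]$, let $F(x)$ denote the area of the cross section cut by the plane through $x$ parallel to the $yz$-plane. Then $\int_0^2 F(x)\,dx$ will be the volume of the solid. Since $F(x)$ is the area under the graph $f(x, y) = 4 - x - y$ over the interval $0 \le y \le 1$, $F(x) = \int_0^1 f(x, y)\,dy = \int_0^1 (4 - x - y)\,dy$. Thus

\[ \begin{aligned} \text{Volume} &= \int_0^2 F(x)\,dx = \int_0^2 \left( \int_0^1 (4 - x - y)\,dy \right) dx \\ &= \int_0^2 \left[ 4y - xy - \frac{y^2}{2} \right]_{y=0}^{1} dx = \int_0^2 \left( \frac{7}{2} - x \right) dx \\ &= \left[ \frac{7}{2}x - \frac{x^2}{2} \right]_{x=0}^{2} = 5. \end{aligned} \]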


On the other hand, for each $y \in [0, 1]$, let $G(y)$ denote the area of the cross section cut by the plane through $y$ parallel to the $xz$-plane. Then $\int_0^1 G(y)\,dy$ will also be the volume of the solid. Since $G(y)$ is the area under the graph $f(x, y) = 4 - x - y$ over the interval $0 \le x \le 2$, $G(y) = \int_0^2 f(x, y)\,dx = \int_0^2 (4 - x - y)\,dx$. Thus

\[ \begin{aligned} \text{Volume} &= \int_0^1 G(y)\,dy = \int_0^1 \left( \int_0^2 (4 - x - y)\,dx \right) dy \\ &= \int_0^1 \left[ 4x - \frac{x^2}{2} - xy \right]_{x=0}^{2} dy = \int_0^1 (6 - 2y)\,dy \\ &= \left[ 6y - y^2 \right]_{y=0}^{1} = 5. \end{aligned} \]

These two integrals are called iterated, or repeated, integrals. The two computations show that, in this example, the iterated integral is independent of the order of integration. □
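The definition of the double integral as a limit of Riemann sums can be tried out directly on Example 9.1.1. The following is a minimal sketch using midpoints as sample points $(x_j, y_j)$; the helper name `riemann_sum` and the grid sizes are arbitrary choices made here.

```python
def riemann_sum(f, a, b, c, d, nx, ny):
    """Midpoint Riemann sum of f over the rectangle [a, b] x [c, d]."""
    dx = (b - a) / nx
    dy = (d - c) / ny
    total = 0.0
    for i in range(nx):
        x = a + (i + 0.5) * dx
        for j in range(ny):
            y = c + (j + 0.5) * dy
            total += f(x, y) * dx * dy  # f(x_j, y_j) * area of R_j
    return total


volume = riemann_sum(lambda x, y: 4 - x - y, 0.0, 2.0, 0.0, 1.0, 200, 100)
print(volume)
```

Because the integrand is linear, the midpoint sum already agrees with the exact value 5 up to rounding; for a general continuous $f$ it only converges as the partition is refined.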

The following Fubini Theorem is intuitively clear, and one may skip the proof on a first reading. It says that the double integral of any continuous function over a rectangular region in $\mathbb{R}^2$ can be evaluated by an iterated integral in either order.

Theorem 9.1.2 (Fubini's Theorem I) If $z = f(x, y)$ is continuous over a rectangular region $R = [a, b] \times [c, d]$, then

\[ \iint_R f(x, y)\,dA = \int_c^d \int_a^b f(x, y)\,dx\,dy = \int_a^b \int_c^d f(x, y)\,dy\,dx. \]
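Proof: Let $c = y_0 < y_1 < \cdots < y_n = d$ be a partition of $[c, d]$ into $n$ subintervals. For $x \in [a, b]$, define

\[ F(x) = \int_c^d f(x, y)\,dy = \sum_{k=0}^{n-1} \int_{y_k}^{y_{k+1}} f(x, y)\,dy = \sum_{k=0}^{n-1} f(x, d_k(x))\,\Delta y_k, \]

where $\Delta y_k = y_{k+1} - y_k$, and $d_k(x) \in [y_k, y_{k+1}]$ is given by the mean value theorem for integrals. Thus

\[ \int_a^b \left[ \int_c^d f(x, y)\,dy \right] dx = \int_a^b F(x)\,dx = \lim_{m \to \infty} \sum_{j=0}^{m-1} F(c_j)\,\Delta x_j, \]

where $a = x_0 < x_1 < \cdots < x_m = b$ is a partition of $[a, b]$ into $m$ subintervals whose norm tends to $0$, $c_j \in [x_j, x_{j+1}]$ and $\Delta x_j = x_{j+1} - x_j$. Since $F(c_j) = \sum_{k=0}^{n-1} f(c_j, d_k(c_j))\,\Delta y_k$,

\[ \lim_{m \to \infty} \sum_{j=0}^{m-1} F(c_j)\,\Delta x_j = \lim_{n, m \to \infty} \sum_{j=0}^{m-1} \sum_{k=0}^{n-1} f(c_j, d_k(c_j))\,\Delta y_k\,\Delta x_j = \iint_R f(x, y)\,dA. \]

By the same reasoning, we have

\[ \int_c^d \int_a^b f(x, y)\,dx\,dy = \iint_R f(x, y)\,dA. \quad □ \]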

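Fubini's Theorem may be generalized to the case where $f$ is not necessarily continuous.

Example 9.1.2 For $f(x, y) = 1 - 6x^2 y$ on $R = [0, 2] \times [-1, 1]$,

\[ \begin{aligned} \iint_R f(x, y)\,dA &= \int_{-1}^{1} \int_0^2 (1 - 6x^2 y)\,dx\,dy = \int_{-1}^{1} \left[ x - 2x^3 y \right]_{x=0}^{2} dy \\ &= \int_{-1}^{1} (2 - 16y)\,dy = \left[ 2y - 8y^2 \right]_{-1}^{1} = 4 \\ &= \int_0^2 \int_{-1}^{1} (1 - 6x^2 y)\,dy\,dx = \int_0^2 \left[ y - 3x^2 y^2 \right]_{y=-1}^{1} dx \\ &= \int_0^2 \left[ (1 - 3x^2) - (-1 - 3x^2) \right] dx = \int_0^2 2\,dx = 4. \quad □ \end{aligned} \]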

Let $z = f(x, y)$ be a continuous function over a bounded, but non-rectangular, region $R$. For the double integral of $f$ over $R$, we take a partition $P$ of $R$ as in the case above and consider only those small rectangles contained completely in $R$. As the norm $|P| \to 0$, those small rectangles fill up the region $R$. Thus, as in the case of a rectangular region, we define the double integral of $f$ by

\[ \iint_R f(x, y)\,dA = \lim_{|P| \to 0} S_n = \lim_{|P| \to 0} \sum_{j=1}^{n} f(x_j, y_j)\,\Delta A_j, \]

provided the limit exists. This integral also has the same algebraic properties as that over a rectangular region, as in Theorem 9.1.1.

[Figure: a non-rectangular region $R$ covered by small rectangles, with a sample point $(x_j, y_j)$.]

In particular, if the boundary of $R$ consists of two graphs $y = g_1(x)$ and $y = g_2(x)$ with $g_1(x) \le g_2(x)$ over the interval $a \le x \le b$, together with the lines $x = a$ and $x = b$, then Fubini's Theorem also says that the double integral is the iterated integral:

[Figure: left, a region between the graphs $g_1(x)$ and $g_2(x)$ over $a \le x \le b$; right, a region between $h_1(y)$ and $h_2(y)$ over $c \le y \le d$.]

Theorem 9.1.3 (Fubini's Theorem II) Let $f(x, y)$ be a bounded function on a region $R$, possibly with discontinuities on a finite union of graphs of continuous functions.

(1) If $R = \{(x, y) \mid a \le x \le b,\ g_1(x) \le y \le g_2(x)\}$ where the $g_i$ are continuous functions on $[a, b]$, then

\[ \iint_R f(x, y)\,dA = \int_a^b \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy\,dx. \]

(2) If $R = \{(x, y) \mid c \le y \le d,\ h_1(y) \le x \le h_2(y)\}$ where the $h_i$ are continuous functions on $[c, d]$, then

\[ \iint_R f(x, y)\,dA = \int_c^d \int_{h_1(y)}^{h_2(y)} f(x, y)\,dx\,dy. \]

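Example 9.1.3 Let $f(x, y) = 3 - x - y$ over $R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le x\}$. The volume under the graph of $f$ over $R$ is

\[ \begin{aligned} V &= \int_0^1 \int_0^x (3 - x - y)\,dy\,dx = \int_0^1 \left( 3x - \frac{3x^2}{2} \right) dx = 1 \\ &= \int_0^1 \int_y^1 (3 - x - y)\,dx\,dy = \int_0^1 \left( \frac{5}{2} - 4y + \frac{3y^2}{2} \right) dy = 1. \quad □ \end{aligned} \]

[Figure: the triangle $R$ bounded by $y = 0$, $x = 1$, and $y = x$, sliced vertically (fixed $x$) and horizontally (fixed $y$).]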


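Example 9.1.4 Let $f(x, y) = \frac{\sin x}{x}$ over $R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le x\}$. The volume under the graph of $f$ over $R$ is

\[ V = \int_0^1 \int_0^x \frac{\sin x}{x}\,dy\,dx = \int_0^1 \left[ \frac{\sin x}{x}\,y \right]_{y=0}^{x} dx = \int_0^1 \sin x\,dx = -\cos(1) + 1 \approx 0.46. \]

However, in the other order,

\[ V = \int_0^1 \int_y^1 \frac{\sin x}{x}\,dx\,dy \]

cannot be evaluated in closed form, since $\frac{\sin x}{x}$ has no elementary antiderivative. □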
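Although the second order of integration in Example 9.1.4 has no closed-form inner integral, it still defines the same number, and this can be checked numerically. The sketch below uses the midpoint rule for both the inner and outer integrals; the helper names and the panel count are arbitrary choices made here.

```python
import math


def integrand(x):
    # sin(x)/x; the midpoints used below never hit x = 0.
    return math.sin(x) / x


def inner(y, n=400):
    """Midpoint approximation of the inner integral over x in [y, 1]."""
    h = (1.0 - y) / n
    return sum(integrand(y + (i + 0.5) * h) for i in range(n)) * h


def outer(n=400):
    """Midpoint approximation of the outer integral over y in [0, 1]."""
    h = 1.0 / n
    return sum(inner((j + 0.5) * h) for j in range(n)) * h


approx = outer()
exact = 1 - math.cos(1)  # value obtained from the dy dx order
print(approx, exact)
```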

There is no general rule for predicting which order of integration will be easier, as Example 9.1.4 shows. If one order does not work, try the other order.

For a complicated region, divide the region into subregions on which the procedure works. When the region $R$ can be expressed in both ways of Fubini's Theorem 9.1.3, as in the above example, the limits of integration should be set up properly:

Example 9.1.5 Let $R = \{(x, y) \mid x + y \ge 1,\ x^2 + y^2 \le 1\}$, which is the part of the unit disk lying on or above the line $x + y = 1$.

(1) For a fixed $y \in [0, 1]$, $x$ varies from $x = 1 - y$ to $\sqrt{1 - y^2}$. Thus the iterated integral can be written as $\int_0^1 \int_{1-y}^{\sqrt{1-y^2}} f(x, y)\,dx\,dy$.

(2) For a fixed $x \in [0, 1]$, $y$ varies from $y = 1 - x$ to $\sqrt{1 - x^2}$. Thus the iterated integral can be written as $\int_0^1 \int_{1-x}^{\sqrt{1-x^2}} f(x, y)\,dy\,dx$. □

9.2 Center of Mass

When $f(x, y) = 1$ on $R$, the double integral

\[ \mathrm{Area}(R) = \iint_R dA \]

is the area of $R$, and the average value of an integrable function $f(x, y)$ on $R$ is defined by

\[ \mathrm{Ave}(f) = \frac{1}{\mathrm{Area}(R)} \iint_R f\,dA. \]

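Example 9.2.1 Find the area enclosed by $y = x^2$ and $y = x + 2$.

Solution: They cross at $(-1, 1)$ and $(2, 4)$. Thus the area is

\[ \begin{aligned} \mathrm{Area}(R) = \iint_R dA &= \int_{-1}^{2} \int_{x^2}^{x+2} dy\,dx = \int_{-1}^{2} (x + 2 - x^2)\,dx = \frac{9}{2} \\ &= \int_0^1 \int_{-\sqrt{y}}^{\sqrt{y}} dx\,dy + \int_1^4 \int_{y-2}^{\sqrt{y}} dx\,dy. \quad □ \end{aligned} \]

Example 9.2.2 Find the average value of $f(x, y) = x \cos xy$ on the region $R = [0, \pi] \times [0, 1]$.

Solution: $\mathrm{Area}(R) = \pi$, and

\[ \int_0^{\pi} \int_0^1 x \cos xy\,dy\,dx = \int_0^{\pi} \sin x\,dx = [-\cos x]_0^{\pi} = 2, \qquad \therefore\ \mathrm{Ave}(f) = \frac{2}{\pi}. \quad □ \]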

When $f(x, y) = \delta(x, y)$ is the mass density of $R$,

\[ M = \iint_R \delta(x, y)\,dA \]

is the mass of the region $R$. Since $\Delta m = \delta(x_j, y_j)\,\Delta A_j$ is the mass of the small rectangle $R_j$, $x_j\,\delta(x_j, y_j)\,\Delta A_j$ is the moment of $R_j$ about the $y$-axis. Hence,

\[ M_y = \iint_R x\,dm = \iint_R x\,\delta(x, y)\,dA, \qquad M_x = \iint_R y\,dm = \iint_R y\,\delta(x, y)\,dA \]

are the first moments of $R$ about the axes. Then the center of mass $(\bar{x}, \bar{y})$ is given by

\[ \bar{x} = \frac{M_y}{M}, \qquad \bar{y} = \frac{M_x}{M}. \]

Example 9.2.3 Find the centroid of the region bounded by $y = x$ and $y = x^2$.


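Solution: We set the mass density $\delta(x, y) = 1$:

\[ \begin{aligned} M &= \int_0^1 \int_{x^2}^{x} dy\,dx = \int_0^1 (x - x^2)\,dx = \frac{1}{6}, \\ M_x &= \int_0^1 \int_{x^2}^{x} y\,dy\,dx = \int_0^1 \left( \frac{x^2}{2} - \frac{x^4}{2} \right) dx = \frac{1}{15}, \\ M_y &= \int_0^1 \int_{x^2}^{x} x\,dy\,dx = \int_0^1 (x^2 - x^3)\,dx = \frac{1}{12}. \end{aligned} \]

\[ \therefore\ \bar{x} = \frac{M_y}{M} = \frac{1/12}{1/6} = \frac{1}{2}, \qquad \bar{y} = \frac{M_x}{M} = \frac{1/15}{1/6} = \frac{2}{5}. \quad □ \]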
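The centroid of Example 9.2.3 can be checked numerically. In this sketch the inner $y$-integrals are done by hand (they are polynomials in $x$), and the remaining $x$-integrals are approximated by the midpoint rule; `midpoint_integral` is a helper name introduced here.

```python
def midpoint_integral(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


# Region between y = x^2 and y = x for 0 <= x <= 1, density 1.
mass = midpoint_integral(lambda x: x - x**2, 0.0, 1.0)
moment_x = midpoint_integral(lambda x: (x**2 - x**4) / 2, 0.0, 1.0)  # integral of y dA
moment_y = midpoint_integral(lambda x: x * (x - x**2), 0.0, 1.0)     # integral of x dA

x_bar = moment_y / mass  # moment about the y-axis divided by the mass
y_bar = moment_x / mass  # moment about the x-axis divided by the mass
print(x_bar, y_bar)
```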

For a rotating shaft, the moment of inertia measures how much energy is needed to accelerate or stop the shaft. Consider a thin cross section of the shaft, partition the section into small blocks of mass $\Delta m$, and let $r$ denote the distance from a block's center of mass to the axis of rotation.

[Figure: a cross section of the shaft with a small block of mass $\Delta m$ at distance $r$ from the axis, spanning $\Delta r$ and $\Delta\theta$.]

If the angular velocity of the shaft is $\omega = \frac{d\theta}{dt}$, the linear speed of the block's center of mass will be

\[ v = \frac{d}{dt}(r\theta) = r\frac{d\theta}{dt} = r\omega. \]

The block's kinetic energy is approximately

\[ \Delta KE = \frac{1}{2}\,\Delta m\,v^2 = \frac{1}{2}(r\omega)^2\,\Delta m. \]

Hence the total kinetic energy is

\[ KE = \iint_R \frac{1}{2}\omega^2 r^2\,dm = \frac{1}{2}\omega^2 \iint_R r^2\,dm = \frac{1}{2} I \omega^2, \]

where the factor

\[ I = \iint_R r^2\,dm = \iint_R (x^2 + y^2)\,\delta(x, y)\,dx\,dy \]

is called the moment of inertia of the shaft about the axis of rotation.

Recall that to start or to stop a locomotive with mass $m$ moving at a linear velocity $v$, we need to provide a kinetic energy of

\[ KE = \frac{1}{2} m v^2. \]

To rotate a shaft with moment of inertia $I$ at an angular velocity $\omega$, we need to provide a kinetic energy of

\[ KE = \frac{1}{2} I \omega^2, \]

and so the shaft's moment of inertia is analogous to the locomotive's mass: what makes the locomotive hard to start or stop is its mass, and what makes the shaft hard to start or stop is its moment of inertia.

The moment of inertia also plays a role in determining how much a horizontal metal beam will bend under a load. The stiffness of the beam is a constant times the moment of inertia $I$ of a cross-section of the beam about the beam's longitudinal axis:

Moments of Inertia

About the $x$-axis: $I_x = \iint y^2\,\delta(x, y)\,dA$

About the $y$-axis: $I_y = \iint x^2\,\delta(x, y)\,dA$

About a line $L$: $I_L = \iint r^2\,\delta(x, y)\,dA$, where $r$ is the distance to $L$

About the origin: $I_0 = \iint (x^2 + y^2)\,\delta(x, y)\,dA = I_x + I_y$

The radii of gyration are defined by

\[ R_x^2 = \frac{I_x}{M}, \qquad R_y^2 = \frac{I_y}{M}, \qquad R_0^2 = \frac{I_0}{M}. \]

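Example 9.2.4 Find the moments of inertia about the $x$-axis of the following I-beam A and steel bar B with rectangular cross section:

[Figure: cross sections of the I-beam A and the rectangular steel bar B, with dimensions as used in the integration limits below.]

Solution: We set the mass density $\delta(x, y) = 1$. For the I-beam A:

\[ \begin{aligned} I_x^{(A)} &= \int_{-1.5}^{1.5} \int_1^2 y^2\,dy\,dx + \int_{-0.5}^{0.5} \int_{-1}^{1} y^2\,dy\,dx + \int_{-1.5}^{1.5} \int_{-2}^{-1} y^2\,dy\,dx \\ &= \int_{-1.5}^{1.5} \frac{7}{3}\,dx + \int_{-0.5}^{0.5} \frac{2}{3}\,dx + \int_{-1.5}^{1.5} \frac{7}{3}\,dx \\ &= \frac{7}{3} \cdot 3 + \frac{2}{3} + \frac{7}{3} \cdot 3 = 14\tfrac{2}{3}. \end{aligned} \]

For the steel bar B:

\[ I_x^{(B)} = \int_{-1}^{1} \int_{-2}^{2} y^2\,dy\,dx = \frac{16}{3} \int_{-1}^{1} dx = \frac{32}{3} = 10\tfrac{2}{3}. \quad □ \]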

9.3 Double Integrals in Polar Form

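Suppose that $z = f(r, \theta)$ is defined over a region $R$ which is expressed in polar coordinates $r$ and $\theta$:

\[ R = \{(r, \theta) \mid 0 \le g_1(\theta) \le r \le g_2(\theta),\ \alpha \le \theta \le \beta\}. \]

[Figure: the region $R$ between $r = g_1(\theta)$ and $r = g_2(\theta)$ for $\alpha \le \theta \le \beta$, partitioned into polar rectangles of size $\Delta r$ by $\Delta\theta$ with sample points $(r_j, \theta_j)$.]

Take a partition of $R$ into polar rectangles by arcs of circles $r = \text{constant}$ and rays $\theta = \text{constant}$, as in the picture. One can easily find the area of each polar rectangle to be approximately

\[ \Delta A_j = r_j\,\Delta r\,\Delta\theta. \]

Hence, the double integral of $f$ over $R$ in polar coordinates is

\[ \iint_R f(r, \theta)\,dA = \lim \sum_{j=1}^{n} f(r_j, \theta_j)\,\Delta A_j = \lim \sum_{j=1}^{n} f(r_j, \theta_j)\,r_j\,\Delta r\,\Delta\theta = \iint_R f(r, \theta)\,r\,dr\,d\theta. \]

By Fubini's Theorem, this double integral is equal to the iterated integral:

\[ \iint_R f(r, \theta)\,dA = \iint_R f(r, \theta)\,r\,dr\,d\theta = \int_{\alpha}^{\beta} \int_{g_1(\theta)}^{g_2(\theta)} f(r, \theta)\,r\,dr\,d\theta. \]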

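Example 9.3.1 Find the area of the region $R$ enclosed by the lemniscate $r^2 = 4\cos 2\theta$.

Solution: Since $r^2 = 4\cos 2\theta \ge 0$ only on $[-\frac{\pi}{4}, \frac{\pi}{4}] \cup [\frac{3\pi}{4}, \frac{5\pi}{4}]$, and the curve is symmetric about both the $x$- and $y$-axes,

\[ \mathrm{Area}(R) = 4 \int_0^{\pi/4} \int_0^{\sqrt{4\cos 2\theta}} r\,dr\,d\theta = 4 \int_0^{\pi/4} 2\cos 2\theta\,d\theta = 4 \sin 2\theta \Big|_0^{\pi/4} = 4. \quad □ \]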

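Example 9.3.2 Evaluate the integral $I = \int_0^{\infty} e^{-x^2}\,dx$.

Solution: Consider the following integral:

\[ \begin{aligned} I^2 &= \left( \int_0^{\infty} e^{-x^2}\,dx \right) \left( \int_0^{\infty} e^{-y^2}\,dy \right) = \int_0^{\infty} \int_0^{\infty} e^{-(x^2 + y^2)}\,dx\,dy \\ &= \int_0^{\pi/2} \int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = \int_0^{\pi/2} \left[ -\frac{1}{2} e^{-r^2} \right]_0^{\infty} d\theta = \frac{1}{2} \int_0^{\pi/2} d\theta = \frac{\pi}{4}, \end{aligned} \]

so that $I = \frac{\sqrt{\pi}}{2}$. □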
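The value $\sqrt{\pi}/2$ can be compared with a direct numerical approximation. The sketch below truncates the half-line at an assumed cutoff of 10, beyond which $e^{-x^2}$ is negligible, and applies the midpoint rule; the cutoff and panel count are choices made here.

```python
import math


def gaussian_half_line(upper=10.0, n=20000):
    """Midpoint-rule approximation of the integral of exp(-x^2) over [0, upper]."""
    h = upper / n
    return sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(n)) * h


approx = gaussian_half_line()
exact = math.sqrt(math.pi) / 2
print(approx, exact)
```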


9.4 Triple Integrals

The same procedure can be used to define the triple integral of a function $w = f(x, y, z)$ defined over a region $R$ in $\mathbb{R}^3$:

\[ \iiint_R f(x, y, z)\,dV = \lim_{|P| \to 0} S_n = \lim_{|P| \to 0} \sum_{j=1}^{n} f(x_j, y_j, z_j)\,\Delta V_j, \]

provided the limit exists. This integral also has the same algebraic properties as the double integral, as in Theorem 9.1.1. Moreover, Fubini's Theorem also holds, and so this triple integral can be evaluated by iterated integrals. Suppose that the region $R$ is described as follows: for $x \in [a, b]$, $y \in [g_1(x), g_2(x)]$, and for those $(x, y)$, $z \in [h_1(x, y), h_2(x, y)]$. Then the triple integral can be evaluated by

\[ \iiint_R f(x, y, z)\,dV = \int_a^b \int_{g_1(x)}^{g_2(x)} \int_{h_1(x,y)}^{h_2(x,y)} f(x, y, z)\,dz\,dy\,dx. \]

If $f(x, y, z) = 1$, then the triple integral is the volume of $R$.

Example 9.4.1 Find the volume of the region $R$ enclosed by the surfaces $z = h_1(x, y) = x^2 + 3y^2$ and $z = h_2(x, y) = 8 - x^2 - y^2$.

Solution: The two surfaces intersect on the elliptical cylinder:

\[ x^2 + 3y^2 = 8 - x^2 - y^2, \quad \text{or} \quad x^2 + 2y^2 = 4, \quad z \ge 0. \]

Thus for $y \in [-\sqrt{2}, \sqrt{2}]$, $x \in [-\sqrt{4 - 2y^2}, \sqrt{4 - 2y^2}]$. For those $(x, y)$, $z \in [x^2 + 3y^2, 8 - x^2 - y^2]$. Thus, with $f(x, y, z) = 1$,

\[ \begin{aligned} V = \iiint_R dV &= \int_{-\sqrt{2}}^{\sqrt{2}} \int_{-\sqrt{4-2y^2}}^{\sqrt{4-2y^2}} \int_{x^2+3y^2}^{8-x^2-y^2} dz\,dx\,dy \\ &= \int_{-\sqrt{2}}^{\sqrt{2}} \int_{-\sqrt{4-2y^2}}^{\sqrt{4-2y^2}} (8 - 2x^2 - 4y^2)\,dx\,dy \\ &= \int_{-\sqrt{2}}^{\sqrt{2}} \left[ 8x - \frac{2}{3}x^3 - 4y^2 x \right]_{x=-\sqrt{4-2y^2}}^{\sqrt{4-2y^2}} dy \\ &= \int_{-\sqrt{2}}^{\sqrt{2}} \left[ 4(4 - 2y^2)^{3/2} - \frac{4}{3}(4 - 2y^2)^{3/2} \right] dy \\ &= \frac{8}{3} \int_{-\sqrt{2}}^{\sqrt{2}} (4 - 2y^2)^{3/2}\,dy = 8\pi\sqrt{2}, \end{aligned} \]

where the last equality is obtained by the substitution $y = \sqrt{2}\sin\theta$. □

The average value of $f$ over $R$ is defined by

\[ \mathrm{Ave}(f) = \frac{1}{\mathrm{Vol}(R)} \iiint_R f\,dV. \]

The mass, the first moments, the center of mass, and the moments of inertia are all defined as in the 2-dimensional case.

9.5 Triple Integrals in Cylinder and Spherical Forms

Cylindrical coordinates for space are obtained by using polar coordinates in the $xy$-plane while keeping the $z$-axis. Thus a point in cylindrical coordinates is given by $(x, y, z) = \varphi(r, \theta, z)$ with

\[ x = r\cos\theta, \quad y = r\sin\theta, \quad z = z, \quad r = \sqrt{x^2 + y^2}. \]

Thus, $r = r_0$ is the cylinder of radius $r_0$ whose axis is the $z$-axis, and $\theta = \theta_0$ is the half plane through the $z$-axis making an angle $\theta_0$ with the positive $x$-axis.

The volume element is $dV = r\,dr\,d\theta\,dz$. The iterated integral of the triple integral of a function $w = f(x, y, z)$ is

\[ \iiint_R f(x, y, z)\,dV = \int_{\theta_1}^{\theta_2} \int_{r_1(\theta)}^{r_2(\theta)} \int_{z_1(r,\theta)}^{z_2(r,\theta)} f(r, \theta, z)\,dz\,r\,dr\,d\theta. \]

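Example 9.5.1 Find the centroid of the solid $R$ bounded by the cylinder $x^2 + y^2 = 4$, $z = 0$ and $z = x^2 + y^2$.

Solution: Since $R$ is symmetric about the $z$-axis, $\bar{x} = \bar{y} = 0$ and it is enough to find $\bar{z}$. In cylindrical coordinates the solid is described by $0 \le r \le 2$ and $0 \le z \le r^2$.

\[ \begin{aligned} M_{xy} &= \iiint_R z\,dV = \int_0^{2\pi} \int_0^2 \int_0^{r^2} z\,dz\,r\,dr\,d\theta \\ &= \int_0^{2\pi} \int_0^2 \frac{r^5}{2}\,dr\,d\theta = \int_0^{2\pi} \left[ \frac{r^6}{12} \right]_0^2 d\theta = \int_0^{2\pi} \frac{16}{3}\,d\theta = \frac{32\pi}{3}. \end{aligned} \]

\[ \begin{aligned} M &= \iiint_R dV = \int_0^{2\pi} \int_0^2 \int_0^{r^2} dz\,r\,dr\,d\theta \\ &= \int_0^{2\pi} \int_0^2 r^3\,dr\,d\theta = \int_0^{2\pi} \left[ \frac{r^4}{4} \right]_0^2 d\theta = \int_0^{2\pi} 4\,d\theta = 8\pi. \end{aligned} \]

\[ \therefore\ \bar{z} = \frac{M_{xy}}{M} = \frac{4}{3}. \quad □ \]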

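The spherical coordinates of points in space are given by $(x, y, z) = \psi(\rho, \phi, \theta)$ where

\[ \begin{aligned} x &= r\cos\theta = \rho\sin\phi\cos\theta, \\ y &= r\sin\theta = \rho\sin\phi\sin\theta, \\ z &= \rho\cos\phi, \\ r &= \rho\sin\phi, \\ \rho &= \sqrt{x^2 + y^2 + z^2}. \end{aligned} \]

Thus, $\rho = \rho_0$ is the sphere of radius $\rho_0$ centered at the origin, $\phi = \phi_0$ is the cone with vertex at the origin, and $\theta = \theta_0$ is the half plane through the $z$-axis making an angle $\theta_0$ with the positive $x$-axis.

[Figure: spherical coordinates $\rho$, $\phi$, $\theta$ of a point, with $r = \rho\sin\phi$, and a small solid with sides $\Delta\rho$, $\rho\,\Delta\phi$, and $\rho\sin\phi\,\Delta\theta$.]

If we take a partition of a region $R$ in $\mathbb{R}^3$ by these spherical coordinates with $\rho = \text{constant}$, $\phi = \text{constant}$, and $\theta = \text{constant}$, then the volume of a small solid with sides $\Delta\rho$, $\rho\,\Delta\phi$, $\rho\sin\phi\,\Delta\theta$ is approximately

\[ \Delta V = \rho^2 \sin\phi\,\Delta\rho\,\Delta\phi\,\Delta\theta, \]

so that the volume element is $dV = \rho^2 \sin\phi\,d\rho\,d\phi\,d\theta$. Hence, the triple integral of a function $f(\rho, \phi, \theta)$ over a region $R$ is evaluated iteratively as

\[ \iiint_R f(\rho, \phi, \theta)\,dV = \int_{\theta_1}^{\theta_2} \int_{\phi_1(\theta)}^{\phi_2(\theta)} \int_{\rho_1(\phi,\theta)}^{\rho_2(\phi,\theta)} f(\rho, \phi, \theta)\,\rho^2 \sin\phi\,d\rho\,d\phi\,d\theta. \]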

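Example 9.5.2 Let $R$ be the ice cream cone $\phi \le \frac{\pi}{3}$ and $\rho \le 1$. Find the volume and the moment of inertia of $R$ about the $z$-axis.

Solution: Note that $x^2 + y^2 = r^2 = \rho^2 \sin^2\phi$.

\[ \begin{aligned} V &= \int_0^{2\pi} \int_0^{\pi/3} \int_0^1 \rho^2 \sin\phi\,d\rho\,d\phi\,d\theta \\ &= \frac{1}{3} \int_0^{2\pi} \int_0^{\pi/3} \sin\phi\,d\phi\,d\theta = \frac{1}{3} \int_0^{2\pi} [-\cos\phi]_0^{\pi/3}\,d\theta = \frac{1}{6}(2\pi) = \frac{\pi}{3}. \end{aligned} \]

\[ \begin{aligned} I_z &= \int_0^{2\pi} \int_0^{\pi/3} \int_0^1 \rho^4 \sin^3\phi\,d\rho\,d\phi\,d\theta \\ &= \frac{1}{5} \int_0^{2\pi} \int_0^{\pi/3} \sin^3\phi\,d\phi\,d\theta = \frac{1}{5} \int_0^{2\pi} \int_0^{\pi/3} (1 - \cos^2\phi)\sin\phi\,d\phi\,d\theta \\ &= \frac{1}{5} \int_0^{2\pi} \left[ -\cos\phi + \frac{\cos^3\phi}{3} \right]_0^{\pi/3} d\theta = \frac{1}{24}(2\pi) = \frac{\pi}{12}. \quad □ \end{aligned} \]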
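Both numbers in Example 9.5.2 can be checked with a midpoint rule in $(\rho, \phi)$, using the spherical volume element $\rho^2 \sin\phi$. In this sketch the $\theta$-integration is replaced by a factor $2\pi$ because neither integrand depends on $\theta$; the helper name `cone_integral` is introduced here.

```python
import math


def cone_integral(integrand, n=400):
    """Integrate integrand(rho, phi) over rho <= 1, phi <= pi/3, all theta."""
    d_rho = 1.0 / n
    d_phi = (math.pi / 3) / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for j in range(n):
            phi = (j + 0.5) * d_phi
            # rho^2 sin(phi) is the spherical volume element.
            total += integrand(rho, phi) * rho**2 * math.sin(phi)
    return 2 * math.pi * total * d_rho * d_phi


volume = cone_integral(lambda rho, phi: 1.0)
inertia_z = cone_integral(lambda rho, phi: (rho * math.sin(phi)) ** 2)
print(volume, inertia_z)
```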

9.6 Coordinate Transforms

A geometrical object in space may be expressed in various ways: by using rectangular coordinates, cylindrical coordinates, spherical coordinates, or some other coordinate system. The usual integration by substitution in one or several variables is simply a coordinate transformation into one of the above types, chosen for the convenience of the expression. For example, the sphere has a simpler expression $\rho = \rho_0$ in spherical coordinates than in rectangular coordinates, $x^2 + y^2 + z^2 = \rho_0^2$. When we take any coordinate transform, the volume, or area, element changes.

We illustrate this in the 2-dimensional case. Higher dimensional cases simply add more variables. Those coordinate transformations are expressed by some differentiable mappings $\varphi : D \subseteq \mathbb{R}^2 \to U \subseteq \mathbb{R}^2$ of the form

\[ \varphi(u, v) = (x(u, v), y(u, v)) = (x, y) \in U, \quad \text{for } (u, v) \in D. \]

Consider a small rectangular area $R$ in $D$ and its image $\varphi(R)$ in $U$ under $\varphi$.

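[Figure: the rectangle $R$ with corner $P_0 = (u_0, v_0)$ and sides $\Delta u\,\mathbf{i}$, $\Delta v\,\mathbf{j}$ in the $uv$-plane, and its image $\varphi(R)$ in the $xy$-plane with sides $D\varphi(P_0)(\Delta u\,\mathbf{i})$ and $D\varphi(P_0)(\Delta v\,\mathbf{j})$ at $\varphi(P_0)$.]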

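Consider the coordinate curves

\[ \alpha_1(t) = (u_0 + t\,\Delta u,\ v_0), \quad \alpha_2(t) = (u_0,\ v_0 + t\,\Delta v), \quad t \in [0, 1], \]

so that two boundary sides of the image $\varphi(R)$ are given by the curves

\[ \beta_1(t) = \varphi \circ \alpha_1(t) = \varphi(u_0 + t\,\Delta u, v_0), \quad \beta_2(t) = \varphi \circ \alpha_2(t) = \varphi(u_0, v_0 + t\,\Delta v), \quad t \in [0, 1]. \]

Then the sides of $R$ are given by

\[ \begin{aligned} \Delta\alpha_1 &= \alpha_1(1) - \alpha_1(0) = \alpha_1'(c)(1 - 0) = (\Delta u, 0), \quad \text{for some } c \in [0, 1], \\ \Delta\alpha_2 &= \alpha_2(1) - \alpha_2(0) = \alpha_2'(d)(1 - 0) = (0, \Delta v), \quad \text{for some } d \in [0, 1], \end{aligned} \]

where $c$ and $d$ are obtained by the mean value theorem. The sides of the image $\varphi(R)$ are approximated by

\[ \Delta\beta_1 = \beta_1(1) - \beta_1(0) \approx \beta_1'(c)(1 - 0) = D\varphi_{\alpha_1(c)} \cdot \alpha_1'(c) = \begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix}_{\alpha_1(c)} \begin{bmatrix} \Delta u \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial u} \\ \frac{\partial y}{\partial u} \end{bmatrix}_{\alpha_1(c)} \Delta u, \]

and

\[ \Delta\beta_2 = \beta_2(1) - \beta_2(0) \approx \beta_2'(d)(1 - 0) = D\varphi_{\alpha_2(d)} \cdot \alpha_2'(d) = \begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix}_{\alpha_2(d)} \begin{bmatrix} 0 \\ \Delta v \end{bmatrix} = \begin{bmatrix} \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial v} \end{bmatrix}_{\alpha_2(d)} \Delta v. \]

The area of $\varphi(R)$ is

\[ \Delta A \approx \left| \det \begin{bmatrix} \Delta\beta_1 & \Delta\beta_2 \end{bmatrix} \right| = \left| \det \begin{bmatrix} \frac{\partial x}{\partial u}\big|_{\alpha_1(c)} \Delta u & \frac{\partial x}{\partial v}\big|_{\alpha_2(d)} \Delta v \\ \frac{\partial y}{\partial u}\big|_{\alpha_1(c)} \Delta u & \frac{\partial y}{\partial v}\big|_{\alpha_2(d)} \Delta v \end{bmatrix} \right| = \left| \det \begin{bmatrix} \frac{\partial x}{\partial u}\big|_{\alpha_1(c)} & \frac{\partial x}{\partial v}\big|_{\alpha_2(d)} \\ \frac{\partial y}{\partial u}\big|_{\alpha_1(c)} & \frac{\partial y}{\partial v}\big|_{\alpha_2(d)} \end{bmatrix} \right| \Delta u\,\Delta v, \]

where $\Delta u\,\Delta v$ is the area of $R$. Now as $\Delta u, \Delta v \to 0$, the points $\alpha_1(c)$ and $\alpha_2(d)$ tend to $(u_0, v_0)$, and we have

\[ dA = \left| \det \begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix}_{(u_0, v_0)} \right| du\,dv, \]

where $dA$ on the left is the area element in the $xy$-plane and $du\,dv$ is the area element in the $uv$-plane.

The Jacobian of the coordinate transform $(x, y) = \varphi(u, v)$ is defined as the determinant:

\[ J(u, v) = \frac{\partial(x, y)}{\partial(u, v)}\bigg|_{(u,v)} = \det \begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix}_{(u,v)} \equiv \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{vmatrix}_{(u,v)}. \]

Since the area takes positive values, we have

\[ dA = |J(u, v)|\,du\,dv = \left| \frac{\partial(x, y)}{\partial(u, v)} \right| du\,dv. \]

For a continuous function $z = f(x, y)$ on $U \subseteq \mathbb{R}^2$, if we take a substitution, or a coordinate transform, $(x, y) = \varphi(u, v)$ with $\varphi : D \to U$ (assumed one-to-one), then

\[ \iint_U f(x, y)\,dx\,dy = \iint_U f(x, y)\,dA = \iint_D f(x(u, v), y(u, v))\,|J(u, v)|\,du\,dv. \]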

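In the 3-dimensional case, there is just an extra coordinate: if $\varphi(u, v, w) = (x, y, z)$ is a coordinate transform of an open set $D \subseteq \mathbb{R}^3$ onto another set $U \subseteq \mathbb{R}^3$, then the Jacobian is the determinant

\[ J(u, v, w) = \begin{vmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} & \frac{\partial x}{\partial w} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} & \frac{\partial y}{\partial w} \\ \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} & \frac{\partial z}{\partial w} \end{vmatrix} = \frac{\partial(x, y, z)}{\partial(u, v, w)}, \]

and the triple integral by substitution is written as

\[ \iiint_U f(x, y, z)\,dx\,dy\,dz = \iiint_D f(x(u, v, w), y(u, v, w), z(u, v, w))\,|J(u, v, w)|\,du\,dv\,dw. \]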

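Example 9.6.1 The polar coordinate system takes the transform

\[ (x, y) = \varphi(r, \theta) = (r\cos\theta, r\sin\theta), \quad \text{for } (r, \theta) \in (0, \infty) \times (0, 2\pi). \]

Thus the Jacobian is

\[ J(r, \theta) = \frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r, \]

\[ \therefore\ \iint_U f(x, y)\,dx\,dy = \iint_D f(x(r, \theta), y(r, \theta))\,|r|\,dr\,d\theta. \]

Similarly for the cylindrical coordinates in $\mathbb{R}^3$. □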

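Example 9.6.2 The spherical coordinate system takes the transform

\[ (x, y, z) = \varphi(\rho, \phi, \theta) = (\rho\sin\phi\cos\theta,\ \rho\sin\phi\sin\theta,\ \rho\cos\phi). \]

The Jacobian is

\[ J(\rho, \phi, \theta) = \frac{\partial(x, y, z)}{\partial(\rho, \phi, \theta)} = \begin{vmatrix} \frac{\partial x}{\partial \rho} & \frac{\partial x}{\partial \phi} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial \rho} & \frac{\partial y}{\partial \phi} & \frac{\partial y}{\partial \theta} \\ \frac{\partial z}{\partial \rho} & \frac{\partial z}{\partial \phi} & \frac{\partial z}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \\ \cos\phi & -\rho\sin\phi & 0 \end{vmatrix} = \rho^2 \sin\phi, \]

\[ \therefore\ \iiint_U f(x, y, z)\,dx\,dy\,dz = \iiint_D f(\rho, \phi, \theta)\,\rho^2\,|\sin\phi|\,d\rho\,d\phi\,d\theta. \quad □ \]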
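The spherical Jacobian can be sanity-checked numerically at a sample point. The sketch below approximates the partial derivatives by central differences and expands the $3 \times 3$ determinant by hand; the function names, the step size `h`, and the sample point are choices made here for illustration.

```python
import math


def spherical(rho, phi, theta):
    return (
        rho * math.sin(phi) * math.cos(theta),
        rho * math.sin(phi) * math.sin(theta),
        rho * math.cos(phi),
    )


def jacobian_det(transform, point, h=1e-6):
    """Determinant of the 3x3 Jacobian of transform at point (central differences)."""
    rows = []
    for i in range(3):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        fp, fm = transform(*plus), transform(*minus)
        # rows[i][k] approximates d x_k / d u_i, i.e. the transposed Jacobian,
        # which has the same determinant as the Jacobian itself.
        rows.append([(fp[k] - fm[k]) / (2 * h) for k in range(3)])
    a, b, c = rows
    return (
        a[0] * (b[1] * c[2] - b[2] * c[1])
        - a[1] * (b[0] * c[2] - b[2] * c[0])
        + a[2] * (b[0] * c[1] - b[1] * c[0])
    )


rho, phi, theta = 2.0, 0.7, 1.1
numeric = jacobian_det(spherical, (rho, phi, theta))
formula = rho**2 * math.sin(phi)
print(numeric, formula)
```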

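Example 9.6.3 Evaluate

\[ \int_0^4 \int_{y/2}^{y/2 + 1} \frac{2x - y}{2}\,dx\,dy \]

by taking the substitution $u = \frac{2x - y}{2}$ and $v = \frac{y}{2}$.

Solution: The region $U$ in the $xy$-plane is $0 \le y \le 4$ and $\frac{y}{2} \le x \le \frac{y}{2} + 1$. The corresponding region $D$ in the $uv$-plane is determined as

(1) for $y = 0$, $v = \frac{y}{2} = 0$,

(2) for $y = 4$, $v = \frac{y}{2} = 2$,

(3) for $x = \frac{y}{2}$, $u = \frac{2x - y}{2} = 0$,

(4) for $x = \frac{y}{2} + 1$, $u = \frac{2x - y}{2} = 1$.

Thus the transformation is obtained by solving for $x$ and $y$:

\[ y = 2v, \quad 2x = 2u + y = 2u + 2v, \quad \text{or} \quad x = u + v. \]

[Figure: the parallelogram $U$ in the $xy$-plane bounded by $y = 0$, $y = 4$, $y = 2x$, and $y = 2x - 2$, and the rectangle $D = [0, 1] \times [0, 2]$ in the $uv$-plane bounded by $u = 0$, $u = 1$, $v = 0$, $v = 2$, related by $\varphi$.]

Hence, the Jacobian is

\[ |D\varphi| = |J(u, v)| = \begin{vmatrix} 1 & 1 \\ 0 & 2 \end{vmatrix} = 2, \]

and

\[ \int_0^4 \int_{y/2}^{y/2 + 1} \frac{2x - y}{2}\,dx\,dy = \int_0^2 \int_0^1 u \cdot 2\,du\,dv = \int_0^2 [u^2]_0^1\,dv = 2. \quad □ \]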
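The change of variables in Example 9.6.3 can be checked by evaluating both sides numerically. The sketch below uses midpoint sums on each region, with the Jacobian factor $|J| = 2$ inserted by hand on the $uv$ side; the function names and grid size are choices made here.

```python
def xy_integral(n=200):
    """Midpoint sum of (2x - y)/2 over 0 <= y <= 4, y/2 <= x <= y/2 + 1."""
    dy = 4.0 / n
    dx = 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * dy
        for i in range(n):
            x = y / 2 + (i + 0.5) * dx
            total += (2 * x - y) / 2 * dx * dy
    return total


def uv_integral(n=200):
    """Midpoint sum of u * |J| over 0 <= u <= 1, 0 <= v <= 2, with |J| = 2."""
    du = 1.0 / n
    dv = 2.0 / n
    total = 0.0
    for j in range(n):
        for i in range(n):
            u = (i + 0.5) * du
            total += u * 2.0 * du * dv
    return total


lhs, rhs = xy_integral(), uv_integral()
print(lhs, rhs)
```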

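Example 9.6.4 Evaluate

\[ \int_0^1 \int_0^{1-x} \sqrt{x + y}\,(y - 2x)^2\,dy\,dx. \]

Solution: The region $U$ in the $xy$-plane is $0 \le x \le 1$ and $0 \le y \le 1 - x$. Take the substitution $u = x + y$ and $v = y - 2x$. Then the corresponding region $D$ in the $uv$-plane is determined as:

(1) for $x = 0$, $u = y$ and $v = y$, and so $v = u$,

(2) for $y = 0$, $u = x$ and $v = -2x$, and so $v = -2u$,

(3) for $x + y = 1$, $u = 1$.

Solving for $x$ and $y$ gives

\[ x = \frac{1}{3}(u - v), \quad y = \frac{1}{3}(2u + v). \]

[Figure: the triangle $U$ in the $xy$-plane bounded by $x = 0$, $y = 0$, and $x + y = 1$, and the triangle $D$ in the $uv$-plane bounded by $v = u$, $v = -2u$, and $u = 1$, related by $\varphi$.]

\[ |D\varphi| = |J(u, v)| = \begin{vmatrix} \frac{1}{3} & -\frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{vmatrix} = \frac{1}{3}. \]

\[ \begin{aligned} \int_0^1 \int_0^{1-x} \sqrt{x + y}\,(y - 2x)^2\,dy\,dx &= \int_0^1 \int_{-2u}^{u} u^{1/2} v^2\,\frac{1}{3}\,dv\,du = \frac{1}{3} \int_0^1 u^{1/2} \left[ \frac{v^3}{3} \right]_{-2u}^{u} du \\ &= \int_0^1 u^{1/2} u^3\,du = \left[ \frac{2}{9} u^{9/2} \right]_0^1 = \frac{2}{9}. \quad □ \end{aligned} \]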
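The value $\frac{2}{9}$ in Example 9.6.4 can be compared with a direct numerical evaluation of the original $xy$ integral, without any substitution. This is a rough midpoint-rule sketch; the helper names and grid size are choices made here.

```python
import math


def integrand(x, y):
    return math.sqrt(x + y) * (y - 2 * x) ** 2


def triangle_integral(n=400):
    """Midpoint sum over the triangle 0 <= x <= 1, 0 <= y <= 1 - x."""
    dx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        dy = (1.0 - x) / n  # the y-range shrinks as x grows
        for j in range(n):
            y = (j + 0.5) * dy
            total += integrand(x, y) * dy
    return total * dx


approx = triangle_integral()
exact = 2 / 9
print(approx, exact)
```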

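Example 9.6.5 Evaluate

\[ \iint_U \left( \sqrt{\frac{y}{x}} + \sqrt{xy} \right) dA \]

over the region $U = \{(x, y) \mid 1 \le xy \le 9,\ x \le y \le 4x\}$.

[Figure: the region $U$ in the $xy$-plane bounded by $y = x$, $y = 4x$, $xy = 1$, and $xy = 9$, and the rectangle $D = [1, 3] \times [1, 2]$ in the $uv$-plane.]

Solution: Take the substitution $u = \sqrt{xy}$ and $v = \sqrt{\frac{y}{x}}$. Then the corresponding region $D$ in the $uv$-plane is determined as:

(1) for $xy = 1$, $u = 1$,

(2) for $xy = 9$, $u = 3$,

(3) for $y = x$, $v = \sqrt{\frac{y}{x}} = 1$,

(4) for $y = 4x$, $v = \sqrt{\frac{y}{x}} = 2$.

Solving for $x$ and $y$ gives

\[ x = \frac{u}{v}, \quad y = uv. \]

\[ |D\varphi| = |J(u, v)| = \begin{vmatrix} \frac{1}{v} & -\frac{u}{v^2} \\ v & u \end{vmatrix} = \frac{2u}{v}. \]

\[ \iint_U \left( \sqrt{\frac{y}{x}} + \sqrt{xy} \right) dA = \int_1^3 \int_1^2 (u + v)\,\frac{2u}{v}\,dv\,du = \int_1^3 \left( 2u^2 \ln 2 + 2u \right) du = \frac{52}{3}\ln 2 + 8. \quad □ \]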

Chapter 10

Vector Fields

10.1 Line Integrals

Let w = f(x, y, z) be a real-valued function on a domain U ⊆ R³, and C a curve in U joining A and B ∈ U. Take a partition P of the curve C into small arcs Δs_j, and let (x_j, y_j, z_j) be a point on Δs_j, j = 1, . . . , n. The limit of the Riemann sum

\[ \lim_{|\Delta P| \to 0} \sum_{j=1}^{n} f(x_j, y_j, z_j)\,\Delta s_j = \int_C f(x, y, z)\,ds \]

is called the line integral of f over the curve C, provided the limit exists.

To evaluate this integral, we take a parametrization of C by γ : I = [a, b] → C ⊆ U given as

γ(t) = (x(t), y(t), z(t)), γ(a) = A, γ(b) = B, t ∈ [a, b].

Then the points (x_j, y_j, z_j) on Δs_j ⊆ C can be written as (x_j, y_j, z_j) = γ(t_j), t_j ∈ I, and the values of f along the curve C are given by the composition (f ◦ γ)(t) = f(γ(t)) = f(x(t), y(t), z(t)). Moreover, Δs_j ≈ |γ′(t_j)|Δt_j, or

ds = |γ′(t)|dt.

Therefore, the line integral becomes

\[ \int_C f(x, y, z)\,ds = \int_a^b f(\gamma(t))\,|\gamma'(t)|\,dt. \]

Let β(u) = γ(h(u)) be a reparameterization of γ by a differentiable function h : [c, d] → [a, b] with t = h(u) and dt = h′(u)du. Note that

h(c) = a and h(d) = b, if h′(u) > 0,

h(c) = b and h(d) = a, if h′(u) < 0,

and |β′(u)| = |γ′(h(u))||h′(u)|. Thus

\[
\begin{aligned}
\int_c^d f(\beta(u))\,|\beta'(u)|\,du
&= \int_c^d f(\gamma(h(u)))\,|\gamma'(h(u))|\,|h'(u)|\,du \\
&= \begin{cases}
\displaystyle\int_c^d f(\gamma(h(u)))\,|\gamma'(h(u))|\,h'(u)\,du = \int_a^b f(\gamma(t))\,|\gamma'(t)|\,dt & \text{if } h'(u) > 0, \\[2ex]
\displaystyle\int_c^d f(\gamma(h(u)))\,|\gamma'(h(u))|\,(-h'(u))\,du = -\int_b^a f(\gamma(t))\,|\gamma'(t)|\,dt & \text{if } h'(u) < 0
\end{cases} \\
&= \int_a^b f(\gamma(t))\,|\gamma'(t)|\,dt,
\end{aligned}
\]

which shows that the line integral is independent of the choice of parametrization γ or β of C and of its direction: either γ(a) = A or γ(a) = B. If C is piecewise differentiable so that C = C1 + · · · + Ck = Σ_{j=1}^{k} Cj, then

\[ \int_C f\,ds = \int_{\Sigma C_j} f\,ds = \sum_{j=1}^{k} \int_{C_j} f\,ds. \]

Example 10.1.1 Integrate f(x, y, z) = x − 3y² + z over the line segment C3 joining (0, 0, 0) to (1, 1, 1), and over C1 + C2, where C1 is the line segment joining (0, 0, 0) to (1, 1, 0) and C2 is the line segment joining (1, 1, 0) to (1, 1, 1).

Solution: Parametrize the Ci's as

γ1(t) = (t, t, 0), so that γ′1(t) = (1, 1, 0), |γ′1(t)| = √2, t ∈ [0, 1],
γ2(t) = (1, 1, t), so that γ′2(t) = (0, 0, 1), |γ′2(t)| = 1, t ∈ [0, 1],
γ3(t) = (t, t, t), so that γ′3(t) = (1, 1, 1), |γ′3(t)| = √3, t ∈ [0, 1].

\[
\begin{aligned}
\therefore \int_{C_1 + C_2} f\,ds &= \int_{C_1} f\,ds + \int_{C_2} f\,ds
= \int_0^1 f(\gamma_1(t))\sqrt{2}\,dt + \int_0^1 f(\gamma_2(t))\cdot 1\,dt \\
&= \sqrt{2}\int_0^1 (t - 3t^2)\,dt + \int_0^1 (t - 2)\,dt = -\frac{\sqrt{2}}{2} - \frac{3}{2}.
\end{aligned}
\]

\[ \int_{C_3} f\,ds = \int_0^1 f(\gamma_3(t))\sqrt{3}\,dt = \sqrt{3}\int_0^1 (2t - 3t^2)\,dt = 0. \qquad \square \]
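The formula ∫_C f ds = ∫ f(γ(t))|γ′(t)| dt translates directly into a numerical routine. The sketch below is an illustration under stated assumptions: it uses a midpoint rule, and the helper name `line_integral_scalar` is hypothetical. It recomputes both integrals of Example 10.1.1 and also traverses C3 backwards to illustrate that the scalar line integral does not depend on direction.

```python
import math

def line_integral_scalar(f, gamma, dgamma, a, b, n=2000):
    """Midpoint approximation of int_a^b f(gamma(t)) * |gamma'(t)| dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        speed = math.sqrt(sum(c * c for c in dgamma(t)))
        total += f(*gamma(t)) * speed
    return total * h

f = lambda x, y, z: x - 3 * y ** 2 + z

c1 = line_integral_scalar(f, lambda t: (t, t, 0.0), lambda t: (1.0, 1.0, 0.0), 0.0, 1.0)
c2 = line_integral_scalar(f, lambda t: (1.0, 1.0, t), lambda t: (0.0, 0.0, 1.0), 0.0, 1.0)
c3 = line_integral_scalar(f, lambda t: (t, t, t), lambda t: (1.0, 1.0, 1.0), 0.0, 1.0)

# C3 traversed from (1, 1, 1) back to (0, 0, 0): the value should not change.
c3_reversed = line_integral_scalar(
    f, lambda t: (1 - t, 1 - t, 1 - t), lambda t: (-1.0, -1.0, -1.0), 0.0, 1.0)

print(c1 + c2, -math.sqrt(2) / 2 - 1.5)
print(c3, c3_reversed)
```

Note that the two paths from (0, 0, 0) to (1, 1, 1) give different values, so this integral depends on the path even though it does not depend on the parametrization.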

Example 10.1.2 Find the mass, the center of mass, and the moment of inertia about the z-axis of the coil spring C with density δ = 1 parameterized as:

γ(t) = (cos 4t, sin 4t, t), t ∈ [0, 2π].

Solution: By symmetry, it is clear that the center of mass is at (0, 0, π). Note that γ′(t) = (−4 sin 4t, 4 cos 4t, 1), and so |γ′(t)| = √17.

\[ M = \int_C \delta\,ds = \int_0^{2\pi} \sqrt{17}\,dt = 2\pi\sqrt{17}. \]

\[ I_z = \int_C (x^2 + y^2)\,\delta\,ds = \int_0^{2\pi} (\cos^2 4t + \sin^2 4t)\sqrt{17}\,dt = 2\pi\sqrt{17}. \qquad \square \]

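Example 10.1.3 Find the center of mass of the metal arch C: y² + z² = 1, z ≥ 0, parameterized as:

γ(t) = (0, cos t, sin t), t ∈ [0, π]

with δ(x, y, z) = 2 − z.

Solution: Note that γ′(t) = (0, −sin t, cos t), and so |γ′(t)| = 1. By symmetry, the center of mass lies on the z-axis, so it is enough to find z̄.

\[ M = \int_C \delta\,ds = \int_0^{\pi} (2 - \sin t)\,dt = 2\pi - 2. \]

\[ M_{xy} = \int_C z\,\delta\,ds = \int_0^{\pi} (\sin t)(2 - \sin t)\,dt = \int_0^{\pi} (2\sin t - \sin^2 t)\,dt = \frac{8 - \pi}{2}. \]

\[ \bar{z} = \frac{M_{xy}}{M} = \frac{8 - \pi}{4\pi - 4}. \qquad \square \]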
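The mass and moment computations of Example 10.1.3 follow the same pattern and can be sketched numerically. This is an illustration only; it assumes a midpoint rule, and the helper name `arc_integral` is hypothetical.

```python
import math

def arc_integral(g, gamma, dgamma, a, b, n=2000):
    """Midpoint approximation of int_a^b g(gamma(t)) * |gamma'(t)| dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        speed = math.sqrt(sum(c * c for c in dgamma(t)))
        total += g(*gamma(t)) * speed
    return total * h

delta = lambda x, y, z: 2.0 - z                      # density of the arch
gamma = lambda t: (0.0, math.cos(t), math.sin(t))    # the arch y^2 + z^2 = 1, z >= 0
dgamma = lambda t: (0.0, -math.sin(t), math.cos(t))

mass = arc_integral(delta, gamma, dgamma, 0.0, math.pi)
m_xy = arc_integral(lambda x, y, z: z * delta(x, y, z), gamma, dgamma, 0.0, math.pi)
m_xz = arc_integral(lambda x, y, z: y * delta(x, y, z), gamma, dgamma, 0.0, math.pi)

z_bar = m_xy / mass
y_bar = m_xz / mass   # expected to vanish by symmetry
print(mass, z_bar, y_bar)
```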

10.2 Vector Fields

The velocity vectors of the flowing water molecules in a river may vary from position to position, and we can think of a velocity vector as being attached to each point of the fluid. Such a fluid flow is modelled by a vector field on a domain in R³.

Definition 10.2.1 A vector field on a domain U ⊆ R3 is a function F :U → R3 of the form:

F(x, y, z) = (M(x, y, z), N(x, y, z), P (x, y, z)) ∈ R3, (x, y, z) ∈ U .

F is continuous if M , N and P are continuous, and F is differentiable if M ,N and P are differentiable.


For example, if w = f(x, y, z) is a differentiable function on U, then

\[ \nabla f = \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z}\right) \]

is called the gradient vector field of f on U. The gravitational field is given as

\[ \mathbf{F}(x, y, z) = -\frac{GM\,(x, y, z)}{(x^2 + y^2 + z^2)^{3/2}} = -\frac{GM\,\gamma(x, y, z)}{r^3}, \]

where G, M are constants, γ(x, y, z) = (x, y, z) is the position vector field, and r = |γ(x, y, z)|.

Suppose that a vector field F(x, y, z) = (M(x, y, z), N(x, y, z), P(x, y, z)) represents a force throughout U, like the gravitational force or an electromagnetic force. Let C be a curve in U with a unit tangent vector T at each point of C. Then the work done by F along C is given by

\[ W = \int_C \mathbf{F} \cdot T\,ds. \]

For the evaluation of this integral, we take a regular parametrization γ(t) = (x(t), y(t), z(t)), t ∈ [a, b], of C. Then the unit tangent vector is given by T(t) = γ′(t)/|γ′(t)|, and the arc length element is ds = |γ′(t)|dt, so that T ds = (γ′(t)/|γ′(t)|)|γ′(t)|dt = γ′(t)dt. Therefore,

\[ \int_C \mathbf{F} \cdot T\,ds = \int_a^b \mathbf{F}(\gamma(t)) \cdot \gamma'(t)\,dt. \]

Now, let β(u) = γ(h(u)) be a reparameterization of C by a differentiable function h : [c, d] → [a, b] with t = h(u) and dt = h′(u)du. Note that

h(c) = a and h(d) = b, if h′(u) > 0,

h(c) = b and h(d) = a, if h′(u) < 0.

In the first case, the particles tracing β and γ move in the same direction and so h is orientation-preserving. In the second case, the particles tracing β and γ move in opposite directions and so h is orientation-reversing. Hence

\[
\begin{aligned}
\int_c^d \mathbf{F}(\beta(u)) \cdot \beta'(u)\,du
&= \int_c^d \mathbf{F}(\gamma(h(u))) \cdot \gamma'(h(u))\,h'(u)\,du \\
&= \begin{cases}
\displaystyle\int_a^b \mathbf{F}(\gamma(t)) \cdot \gamma'(t)\,dt & \text{if } h'(u) > 0, \\[2ex]
\displaystyle\int_b^a \mathbf{F}(\gamma(t)) \cdot \gamma'(t)\,dt & \text{if } h'(u) < 0
\end{cases} \\
&= \pm\int_a^b \mathbf{F}(\gamma(t)) \cdot \gamma'(t)\,dt.
\end{aligned}
\]

Therefore, unlike the line integral of a function f on U over C, the integral W = ∫_C F · T ds depends on the parametrization γ(t) of C, but only through its orientation. Thus, if C has a specified orientation and −C denotes the same curve as C with the opposite orientation, then

\[ \int_{-C} \mathbf{F} \cdot d\mathbf{s} = -\int_C \mathbf{F} \cdot d\mathbf{s}. \]

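As a further notational convention,

\[
\begin{aligned}
\int_a^b \mathbf{F}(\gamma(t)) \cdot \gamma'(t)\,dt
&= \int_a^b \mathbf{F}(\gamma(t)) \cdot d\mathbf{s}, \qquad \text{where } d\mathbf{s} = \gamma'(t)\,dt, \\
&= \int_a^b \big(M(\gamma(t)),\ N(\gamma(t)),\ P(\gamma(t))\big) \cdot \left(\frac{dx}{dt},\ \frac{dy}{dt},\ \frac{dz}{dt}\right) dt \\
&= \int_C M\,dx + N\,dy + P\,dz.
\end{aligned}
\]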

Example 10.2.1 Let γ(t) = (sin t, cos t, t), t ∈ [0, 2π], be the helix. Find the work done by F(x, y, z) = (x, y, z) along γ.

Solution: Note F(γ(t)) = F(sin t, cos t, t) = (sin t, cos t, t) and γ′(t) = (cos t, −sin t, 1), so that

F(γ(t)) · ds = F(γ(t)) · γ′(t)dt = (sin t cos t − cos t sin t + t)dt = t dt.

\[ \int_\gamma \mathbf{F}(\gamma(t)) \cdot d\mathbf{s} = \int_0^{2\pi} t\,dt = 2\pi^2. \qquad \square \]

Example 10.2.2 Evaluate ∫_γ x²dx + xy dy + dz, where γ(t) = (t, t², 1) on t ∈ [0, 1].

Solution: Note that x′(t) = 1, y′(t) = 2t and z′(t) = 0. Hence

\[ \int_\gamma x^2\,dx + xy\,dy + dz = \int_0^1 (t^2 + 2t^4)\,dt = \frac{11}{15}. \qquad \square \]
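The work integral ∫ F(γ(t)) · γ′(t) dt can be approximated in the same way as the scalar line integral. The sketch below assumes a midpoint rule and uses the hypothetical helper name `work_integral`; it reproduces the values of Examples 10.2.1 and 10.2.2.

```python
import math

def work_integral(F, gamma, dgamma, a, b, n=2000):
    """Midpoint approximation of int_a^b F(gamma(t)) . gamma'(t) dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        field = F(*gamma(t))
        total += sum(p * q for p, q in zip(field, dgamma(t)))
    return total * h

# Example 10.2.1: F = (x, y, z) along the helix (sin t, cos t, t).
helix = work_integral(
    lambda x, y, z: (x, y, z),
    lambda t: (math.sin(t), math.cos(t), t),
    lambda t: (math.cos(t), -math.sin(t), 1.0),
    0.0, 2 * math.pi)

# Example 10.2.2: x^2 dx + xy dy + dz corresponds to F = (x^2, xy, 1).
parabola = work_integral(
    lambda x, y, z: (x * x, x * y, 1.0),
    lambda t: (t, t * t, 1.0),
    lambda t: (1.0, 2 * t, 0.0),
    0.0, 1.0)

print(helix, 2 * math.pi ** 2)
print(parabola, 11 / 15)
```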


Example 10.2.3 Let

h : [a, b] → [a, b], h(u) = a + b − u = t, h(a) = b, h(b) = a, h′(u) = −1 < 0,

g : [0, 1] → [a, b], g(u) = a + (b − a)u = t, g(0) = a, g(1) = b, g′(u) = (b − a) > 0.

Then h reverses the orientation, and g preserves the orientation. For γ(t) = (t, t², t³), t ∈ [1, 2], we have h(u) = 3 − u = t on [1, 2], g(u) = 1 + u = t on [0, 1], and

α(u) = γ ◦ h(u) = (3 − u, (3 − u)², (3 − u)³), with α′(u) = −(1, 2(3 − u), 3(3 − u)²),

β(u) = γ ◦ g(u) = (1 + u, (1 + u)², (1 + u)³), with β′(u) = (1, 2(1 + u), 3(1 + u)²).

If F(x, y, z) = (x, z, y), then

F(α(u)) · α′(u) = (3 − u, (3 − u)³, (3 − u)²) · (−1, −2(3 − u), −3(3 − u)²) = −(3 − u) − 5(3 − u)⁴,

F(β(u)) · β′(u) = (1 + u, (1 + u)³, (1 + u)²) · (1, 2(1 + u), 3(1 + u)²) = (1 + u) + 5(1 + u)⁴.

\[ \int_\alpha \mathbf{F}(\alpha(u)) \cdot \alpha'(u)\,du = \int_1^2 \left[-(3 - u) - 5(3 - u)^4\right] du = \left[\frac{(3 - u)^2}{2} + (3 - u)^5\right]_1^2 = -\left(\frac{1}{2} + 2^5\right). \]

\[ \int_\beta \mathbf{F}(\beta(u)) \cdot \beta'(u)\,du = \int_0^1 \left[(1 + u) + 5(1 + u)^4\right] du = \left[\frac{(1 + u)^2}{2} + (1 + u)^5\right]_0^1 = \frac{1}{2} + 2^5. \qquad \square \]
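The sign change under an orientation-reversing reparametrization can be observed numerically. The sketch below assumes a midpoint rule and the hypothetical helper `work_integral`; it builds α = γ ◦ h and β = γ ◦ g from Example 10.2.3 with the chain rule and compares the three work integrals.

```python
def work_integral(F, curve, dcurve, a, b, n=2000):
    """Midpoint approximation of int_a^b F(curve(u)) . curve'(u) du."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * h
        total += sum(p * q for p, q in zip(F(*curve(u)), dcurve(u)))
    return total * h

F = lambda x, y, z: (x, z, y)
gamma = lambda t: (t, t ** 2, t ** 3)
dgamma = lambda t: (1.0, 2 * t, 3 * t ** 2)

# alpha = gamma o h with h(u) = 3 - u, so alpha'(u) = -gamma'(3 - u).
alpha = lambda u: gamma(3 - u)
dalpha = lambda u: tuple(-c for c in dgamma(3 - u))

# beta = gamma o g with g(u) = 1 + u, so beta'(u) = gamma'(1 + u).
beta = lambda u: gamma(1 + u)
dbeta = lambda u: dgamma(1 + u)

w_gamma = work_integral(F, gamma, dgamma, 1.0, 2.0)
w_alpha = work_integral(F, alpha, dalpha, 1.0, 2.0)
w_beta = work_integral(F, beta, dbeta, 0.0, 1.0)
print(w_gamma, w_alpha, w_beta)
```

The orientation-preserving β reproduces the value for γ, while the orientation-reversing α gives its negative.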

If F = (M, N, P) represents the velocity field of a fluid flowing through a region U in space, then F(γ(t)) · T(t) is the tangential component of F along γ, and its integral

\[ \int_\gamma \mathbf{F} \cdot T\,ds = \int_\gamma \mathbf{F} \cdot d\mathbf{s} \]

is the fluid's flow along the curve γ. If the curve is a closed loop, we call it the circulation, or the flow, of F around the curve.

For a closed curve C in the plane R², we usually choose the positive orientation of C to be the counterclockwise direction: walking along C with the inside of C on the left side. In this case, we usually put a directed circle on the integral sign ∫ and write ∮.

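Consider a vector field F(x, y) = (M(x, y), N(x, y)) in the plane R² and a curve C in the domain. If γ(t) = (x(t), y(t)) is a parametrization of C in the positive direction, then γ′(t) = (dx/dt, dy/dt) is the tangential direction, and

\[ \mathbf{n}(\gamma(t)) = T \times \mathbf{k} = \frac{1}{|\gamma'(t)|}\left(\frac{dy}{dt},\ -\frac{dx}{dt}\right) \]

is the outward unit normal vector along γ. The integral of the normal component F · n of F in the outward normal direction n is called the flux of F across C:

\[
\begin{aligned}
\oint_C \mathbf{F} \cdot \mathbf{n}\,ds
&= \oint_\gamma \mathbf{F}(\gamma(t)) \cdot \mathbf{n}(\gamma(t))\,ds \\
&= \oint_\gamma \big(M(\gamma(t)),\ N(\gamma(t))\big) \cdot \frac{1}{|\gamma'(t)|}\left(\frac{dy}{dt},\ -\frac{dx}{dt}\right) |\gamma'(t)|\,dt \\
&= \oint_\gamma M(\gamma(t))\,dy - N(\gamma(t))\,dx.
\end{aligned}
\]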

Example 10.2.4 Let F(x, y) = (x − y, x), and γ(t) = (cos t, sin t), t ∈ [0, 2π]. Find the circulation of F around γ and the flux of F across γ.

Solution: Note that γ′(t) = (−sin t, cos t), n = (cos t, sin t), and F(γ(t)) = (cos t − sin t, cos t).

\[ \oint_\gamma \mathbf{F} \cdot d\mathbf{s} = \int_0^{2\pi} (-\sin t\cos t + \sin^2 t + \cos^2 t)\,dt = 2\pi. \]

\[ \oint_\gamma \mathbf{F} \cdot \mathbf{n}\,ds = \oint_\gamma M\,dy - N\,dx = \int_0^{2\pi} (\cos^2 t - \sin t\cos t + \sin t\cos t)\,dt = \int_0^{2\pi} \cos^2 t\,dt = \pi. \qquad \square \]
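Circulation (∮ M dx + N dy) and flux (∮ M dy − N dx) of a planar field can be computed in one pass over the same parametrization. The following sketch assumes a midpoint rule and a hypothetical helper name `circulation_and_flux`; it reproduces Example 10.2.4.

```python
import math

def circulation_and_flux(F, gamma, dgamma, a, b, n=2000):
    """Midpoint approximations of the closed integrals of M dx + N dy and M dy - N dx."""
    h = (b - a) / n
    circ = 0.0
    flux = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y = gamma(t)
        dx, dy = dgamma(t)
        M, N = F(x, y)
        circ += M * dx + N * dy
        flux += M * dy - N * dx
    return circ * h, flux * h

F = lambda x, y: (x - y, x)
unit_circle = lambda t: (math.cos(t), math.sin(t))
d_unit_circle = lambda t: (-math.sin(t), math.cos(t))

circ, flux = circulation_and_flux(F, unit_circle, d_unit_circle, 0.0, 2 * math.pi)
print(circ, 2 * math.pi)
print(flux, math.pi)
```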

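Let F(x, y) = (M, N)(x, y) and G(x, y) = (N, −M)(x, y) on U ⊆ R². Then F ⊥ G. Let C be a curve in U with tangent vector ds = (dx, dy). Then n ds = (dy, −dx). Thus

\[ \mathrm{Flow}(\mathbf{F}) = \oint_C \mathbf{F} \cdot d\mathbf{s} = \oint_C M\,dx + N\,dy = \oint_C \mathbf{G} \cdot \mathbf{n}\,ds = \mathrm{Flux}(\mathbf{G}), \]

\[ \mathrm{Flux}(\mathbf{F}) = \oint_C \mathbf{F} \cdot \mathbf{n}\,ds = \oint_C M\,dy - N\,dx = -\oint_C \mathbf{G} \cdot d\mathbf{s} = -\mathrm{Flow}(\mathbf{G}). \]

This means that the flow and the flux are essentially the same kind of integral.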


10.3 Potential Functions

Let F(x, y, z) be the gradient vector field ∇f(x, y, z) of a differentiable function f : U → R on U. Let C be a piecewise smooth curve in U with a parametrization γ(t) on [a, b]. Then

\[
\int_C \mathbf{F} \cdot d\mathbf{s} = \int_a^b \nabla f(\gamma(t)) \cdot \gamma'(t)\,dt = \int_a^b \frac{d(f \circ \gamma)}{dt}(t)\,dt
= \big[(f \circ \gamma)(t)\big]_a^b = f(\gamma(b)) - f(\gamma(a)).
\]

Lemma 10.3.1 Suppose that F(x, y, z) = ∇f(x, y, z) for a differentiable function f : U → R on U, and γ(t), t ∈ [a, b], is any piecewise smooth curve C joining P = γ(a) to Q = γ(b) in U. Then

\[ \int_C \mathbf{F} \cdot d\mathbf{s} = \int_\gamma \nabla f \cdot d\mathbf{s} = f(Q) - f(P). \]

A function f : U → R such that F = ∇f is called a potential function for F.

Definition 10.3.1 A vector field F is said to be conservative on U ⊆ R³ if ∫_C F · ds is path independent: that is, for any two points P and Q in U, and for any two curves C1 and C2 in U joining P to Q,

\[ \int_{C_1} \mathbf{F} \cdot d\mathbf{s} = \int_{C_2} \mathbf{F} \cdot d\mathbf{s} \equiv \int_P^Q \mathbf{F} \cdot d\mathbf{s}. \]

Lemma 10.3.1 shows that the gradient field of a differentiable function is conservative. In particular, if the curve γ is closed, then ∫_γ ∇f · ds = 0.

In general, we have the following theorem.

Theorem 10.3.2 Let F(x, y, z) = (M, N, P)(x, y, z) be a continuous vector field in U ⊆ R³. Then the following are equivalent:

(1) F is conservative in U.

(2) ∮_C F · ds = 0 for any closed curve C in U.

(3) There is a differentiable function f : U → R such that F = ∇f.

Proof: (1) ⇒ (2): Suppose that F is conservative, and C is a closed loop in U. Choose any two points A and B on C and divide C into two curves, C1 joining A to B and C2 joining B to A, so that C = C1 + C2. Thus

\[ \oint_C \mathbf{F} \cdot d\mathbf{s} = \int_{C_1} \mathbf{F} \cdot d\mathbf{s} + \int_{C_2} \mathbf{F} \cdot d\mathbf{s}. \]

But −C2 joins A to B and so

\[ \int_{C_1} \mathbf{F} \cdot d\mathbf{s} = \int_{-C_2} \mathbf{F} \cdot d\mathbf{s} = -\int_{C_2} \mathbf{F} \cdot d\mathbf{s} \implies \int_{C_1} \mathbf{F} \cdot d\mathbf{s} + \int_{C_2} \mathbf{F} \cdot d\mathbf{s} = 0. \]

(2) ⇒ (1): Suppose ∮_C F · ds = 0 for any closed curve C in U, and C1 and C2 are two curves joining A to B in U. Then C = C1 + (−C2) is a loop in U, and so

\[ 0 = \oint_C \mathbf{F} \cdot d\mathbf{s} = \int_{C_1} \mathbf{F} \cdot d\mathbf{s} + \int_{-C_2} \mathbf{F} \cdot d\mathbf{s} = \int_{C_1} \mathbf{F} \cdot d\mathbf{s} - \int_{C_2} \mathbf{F} \cdot d\mathbf{s}. \]

(1) ⇒ (3): Suppose that F is conservative in U. We want to define a function f : U → R such that F = ∇f. Fix any point X ∈ U, say X = (0, 0, 0). For any point Y = (x, y, z) ∈ U, take any oriented simple curve C from X to Y. Define

\[ f(x, y, z) = \int_C \mathbf{F} \cdot d\mathbf{s}. \]

Then f is well-defined since the integral is path independent. We now need to show that F = ∇f. Choose a curve C = C1 + C2 + C3, where

C1 : γ1(t) = (t, 0, 0), t ∈ [0, x], joining (0, 0, 0) to (x, 0, 0), ⇒ γ′1(t) = i,

C2 : γ2(t) = (x, t, 0), t ∈ [0, y], joining (x, 0, 0) to (x, y, 0), ⇒ γ′2(t) = j,

C3 : γ3(t) = (x, y, t), t ∈ [0, z], joining (x, y, 0) to (x, y, z), ⇒ γ′3(t) = k.

\[
\begin{aligned}
f(x, y, z) = \int_C \mathbf{F} \cdot d\mathbf{s}
&= \int_{C_1} \mathbf{F} \cdot \gamma_1'(t)\,dt + \int_{C_2} \mathbf{F} \cdot \gamma_2'(t)\,dt + \int_{C_3} \mathbf{F} \cdot \gamma_3'(t)\,dt \\
&= \int_0^x M(\gamma_1(t))\,dt + \int_0^y N(\gamma_2(t))\,dt + \int_0^z P(\gamma_3(t))\,dt.
\end{aligned}
\]

Only the last integral depends on z. Hence, ∂f/∂z (x, y, z) = ∂/∂z ∫_0^z P(x, y, t)dt = P(x, y, z). Similarly, by permuting z with x, or y, in the setting of the curves above, we get

\[ \frac{\partial f}{\partial y}(x, y, z) = N(x, y, z), \qquad \frac{\partial f}{\partial x}(x, y, z) = M(x, y, z). \]

For a different choice of the base point X, the values of f differ only by a constant, since the line integrals depend only on the end points. Thus their gradients are the same.

(3) ⇒ (1): Done by Lemma 10.3.1. □


Theorem 10.3.2 shows that gradient fields are the only conservative vector fields. Now, a question is "how do we know whether a given vector field is conservative?" Suppose that F = ∇f for some function f. Then M = ∂f/∂x, N = ∂f/∂y and P = ∂f/∂z. Then, by Theorem 8.2.1, we have

My = fxy = fyx = Nx,

Mz = fxz = fzx = Px,

Nz = fyz = fzy = Py.

These equations are necessary conditions for a conservative field: that is, if these equations do not hold, then F is not conservative. We will see later in Theorem 10.6.3 that these equations are also sufficient conditions. The next question is, if F is known to be conservative, "how can we find a potential function f such that F = ∇f?" Example 10.3.2 below answers this.

Example 10.3.1 If F = (M, N, P) = (2x − 3, −z, cos z), then My = 0 = Nx, but Nz = −1 ≠ Py = 0. Thus F is not conservative. □

Example 10.3.2 Let F = (M, N, P) = (e^x cos y + yz, xz − e^x sin y, xy + z). Then My = −e^x sin y + z = Nx, Nz = x = Py, Mz = y = Px. Thus F is conservative, and so fx = M = e^x cos y + yz. By integrating this with respect to x, we get

f(x, y, z) = e^x cos y + xyz + g(y, z), for some function g in y and z.

Now fy = −e^x sin y + xz + gy = N = xz − e^x sin y implies gy = 0. Thus g(y, z) = h(z). Since fz = xy + h′(z) = xy + z, we get h′(z) = z, and so h(z) = z²/2 + c. Hence

\[ f(x, y, z) = e^x\cos y + xyz + g(y, z) = e^x\cos y + xyz + \frac{z^2}{2} + c. \qquad \square \]
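A candidate potential function can be checked by differentiating it numerically and comparing with F. The sketch below is an illustration, assuming central differences with a hand-picked step size and a few arbitrary sample points; the helper name `numerical_gradient` is hypothetical. It tests the potential found in Example 10.3.2 with c = 0.

```python
import math

def F(x, y, z):
    """The vector field of Example 10.3.2."""
    return (math.exp(x) * math.cos(y) + y * z,
            x * z - math.exp(x) * math.sin(y),
            x * y + z)

def f(x, y, z):
    """Candidate potential from the example, with the constant c set to 0."""
    return math.exp(x) * math.cos(y) + x * y * z + z * z / 2

def numerical_gradient(func, p, eps=1e-5):
    """Central-difference approximation of the gradient of func at the point p."""
    grad = []
    for k in range(3):
        plus, minus = list(p), list(p)
        plus[k] += eps
        minus[k] -= eps
        grad.append((func(*plus) - func(*minus)) / (2 * eps))
    return tuple(grad)

# Arbitrary sample points chosen for illustration.
points = [(0.3, -1.2, 0.7), (1.0, 2.0, -0.5), (-0.8, 0.4, 1.5)]
max_error = max(
    abs(g - v)
    for p in points
    for g, v in zip(numerical_gradient(f, p), F(*p)))
print(max_error)
```

A small `max_error` at every sample point is consistent with F = ∇f, although a finite set of points is of course not a proof.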

Recall that ds in ∫_C F · ds can be written as

\[ d\mathbf{s} = \gamma'(t)\,dt = \left(\frac{dx}{dt},\ \frac{dy}{dt},\ \frac{dz}{dt}\right) dt = (dx, dy, dz). \]

Thus F · ds = (M, N, P) · (dx, dy, dz) = M dx + N dy + P dz, which is free from the parametrization of the curve C. Such an expression is called a differential form. When F = ∇f, it becomes

∇f · ds = ∇f · γ′(t)dt = fx dx + fy dy + fz dz = df,

which is the total differential of f, called an exact differential form. Hence,

\[ \int_C \nabla f \cdot d\mathbf{s} = \int_C df = f(B) - f(A). \]

On a simply connected domain, a differential form M dx + N dy + P dz is exact if and only if My = Nx, Mz = Px and Nz = Py.

10.4 Green’s Theorem

We have seen that the line integral is easy to evaluate if the field is conservative, i.e., a gradient field, by finding a potential function. Green's theorem helps to evaluate the integral when the field is not conservative.

Let F = (M, N) be a vector field on a domain U ⊆ R² which is open and simply connected, i.e., every closed curve in U can be shrunk continuously to a point within U, as in a disk without any holes inside. For a point P = (x, y) in U, consider a small rectangle R with corners (x, y), (x + Δx, y), (x, y + Δy), and (x + Δx, y + Δy). Let C1, . . ., C4 be the sides of R with counterclockwise orientation. We compute the flux of F out of R:

[Figure: the rectangle R with sides C1 (bottom), C2 (right), C3 (top) and C4 (left), traversed counterclockwise, with outward normals −j, i, j, −i respectively, and a field vector F.]

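on C1, n = −j = (0, −1), Δs = Δx, F · n Δs ≈ −N(x, y)Δx,

on C2, n = i = (1, 0), Δs = Δy, F · n Δs ≈ M(x + Δx, y)Δy,

on C3, n = j = (0, 1), Δs = Δx, F · n Δs ≈ N(x, y + Δy)Δx,

on C4, n = −i = (−1, 0), Δs = Δy, F · n Δs ≈ −M(x, y)Δy.

On C1 + C3, by the mean value theorem, for some c between 0 and 1,

\[ \mathbf{F} \cdot \mathbf{n}\,\Delta s \approx \frac{N(x, y + \Delta y) - N(x, y)}{\Delta y}\,\Delta y\,\Delta x = \frac{\partial N(x, y + c\,\Delta y)}{\partial y}\,\Delta y\,\Delta x \longrightarrow \frac{\partial N}{\partial y}(x, y)\,dy\,dx, \quad \text{as } \Delta y, \Delta x \to 0. \]

On C2 + C4, for some d between 0 and 1,

\[ \mathbf{F} \cdot \mathbf{n}\,\Delta s \approx \frac{M(x + \Delta x, y) - M(x, y)}{\Delta x}\,\Delta x\,\Delta y = \frac{\partial M(x + d\,\Delta x, y)}{\partial x}\,\Delta x\,\Delta y \longrightarrow \frac{\partial M}{\partial x}(x, y)\,dx\,dy, \quad \text{as } \Delta y, \Delta x \to 0. \]

Hence, on C = C1 + C2 + C3 + C4, the flux density of F is defined by

\[ \frac{\sum \mathbf{F} \cdot \mathbf{n}\,\Delta s}{\mathrm{Area}(R)} \longrightarrow \left(\frac{\partial M}{\partial x} + \frac{\partial N}{\partial y}\right)(x, y) \equiv \operatorname{div}\mathbf{F}, \]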

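which is called the divergence of F.

Now, consider the circulation density of F around R. On C1, ds = T Δs = (1, 0)Δx, so F · ds ≈ M(x, y)Δx. On C2, F · ds ≈ N(x + Δx, y)Δy. On C3, F · ds ≈ −M(x, y + Δy)Δx. On C4, F · ds ≈ −N(x, y)Δy. Thus, on C1 + C3,

\[ \mathbf{F} \cdot d\mathbf{s} \approx -\frac{M(x, y + \Delta y) - M(x, y)}{\Delta y}\,\Delta y\,\Delta x = -\frac{\partial M(x, y + c\,\Delta y)}{\partial y}\,\Delta y\,\Delta x \longrightarrow -\frac{\partial M}{\partial y}(x, y)\,dy\,dx, \quad \text{as } \Delta y, \Delta x \to 0, \]

and on C2 + C4,

\[ \mathbf{F} \cdot d\mathbf{s} \approx \frac{N(x + \Delta x, y) - N(x, y)}{\Delta x}\,\Delta x\,\Delta y = \frac{\partial N(x + d\,\Delta x, y)}{\partial x}\,\Delta x\,\Delta y \longrightarrow \frac{\partial N}{\partial x}(x, y)\,dx\,dy, \quad \text{as } \Delta y, \Delta x \to 0. \]

Hence, on C = C1 + C2 + C3 + C4, the circulation density of F is defined by

\[ \frac{\sum \mathbf{F} \cdot d\mathbf{s}}{\mathrm{Area}(R)} \longrightarrow \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)(x, y) \equiv \operatorname{curl}\mathbf{F}, \]

which is called the curl of F.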

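Theorem 10.4.1 (Green's Theorem) Let C be a simple closed piecewise smooth curve in an open domain U ⊆ R² bounding a region R ⊆ U, and F = (M, N) a vector field on U ⊇ R with continuous partial derivatives. Then the flux and the circulation of F along C are given by:

\[ \oint_C \mathbf{F} \cdot \mathbf{n}\,ds = \oint_C M\,dy - N\,dx = \iint_R \left(\frac{\partial M}{\partial x} + \frac{\partial N}{\partial y}\right) dA = \iint_R \operatorname{div}\mathbf{F}\,dA. \]

\[ \oint_C \mathbf{F} \cdot d\mathbf{s} = \oint_C M\,dx + N\,dy = \iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dA = \iint_R \operatorname{curl}\mathbf{F}\,dA. \]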

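Proof: If we take G = (N, −M), then div G = curl F. Thus it is enough to prove one of them. We prove the circulation form.

(1) We first assume that C bounds a convex region, i.e., the lines parallel to the axes cut C in no more than two points, and that C = C1 + C2 is made up of two parts: C1: y = f1(x), a ≤ x ≤ b, and C2: y = f2(x), b ≥ x ≥ a. Then

\[ \int_{f_1(x)}^{f_2(x)} \frac{\partial M}{\partial y}\,dy = M(x, y)\Big|_{f_1(x)}^{f_2(x)} = M(x, f_2(x)) - M(x, f_1(x)), \]

\[
\begin{aligned}
\int_a^b\!\!\int_{f_1(x)}^{f_2(x)} \frac{\partial M}{\partial y}\,dy\,dx
&= \int_a^b M(x, f_2(x))\,dx - \int_a^b M(x, f_1(x))\,dx \\
&= -\int_b^a M(x, f_2(x))\,dx - \int_a^b M(x, f_1(x))\,dx \\
&= -\left(\int_{C_2} M(x, f_2(x))\,dx + \int_{C_1} M(x, f_1(x))\,dx\right),
\end{aligned}
\]

\[ \therefore \iint_R \frac{\partial M}{\partial y}\,dA = -\oint_C M\,dx. \]

[Figure: a convex region over a ≤ x ≤ b and c ≤ y ≤ d, with lower boundary C1: y = f1(x), upper boundary C2: y = f2(x), right boundary D1: x = g1(y) and left boundary D2: x = g2(y).]

Similarly, write C = D1 + D2 with D1: x = g1(y), c ≤ y ≤ d, and D2: x = g2(y), d ≥ y ≥ c. Then

\[ \int_{g_2(y)}^{g_1(y)} \frac{\partial N}{\partial x}\,dx = N(x, y)\Big|_{g_2(y)}^{g_1(y)} = N(g_1(y), y) - N(g_2(y), y), \]

\[
\begin{aligned}
\int_c^d\!\!\int_{g_2(y)}^{g_1(y)} \frac{\partial N}{\partial x}\,dx\,dy
&= \int_c^d N(g_1(y), y)\,dy - \int_c^d N(g_2(y), y)\,dy \\
&= \int_c^d N(g_1(y), y)\,dy + \int_d^c N(g_2(y), y)\,dy \\
&= \int_{D_1} N(g_1(y), y)\,dy + \int_{D_2} N(g_2(y), y)\,dy,
\end{aligned}
\]

\[ \therefore \iint_R \frac{\partial N}{\partial x}\,dA = \oint_C N\,dy, \]

so that

\[ \oint_C M\,dx + N\,dy = \iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dx\,dy. \]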

(2) If R is a rectangular region, divide C into 4 directed line segments: C1: y = c, a ≤ x ≤ b, on which dy = 0; C2: x = b, c ≤ y ≤ d, on which dx = 0; C3: y = d, b ≥ x ≥ a, on which dy = 0; C4: x = a, d ≥ y ≥ c, on which dx = 0. Thus

\[
\begin{aligned}
\iint_R \frac{\partial N}{\partial x}\,dA
&= \int_c^d\!\!\int_a^b \frac{\partial N}{\partial x}\,dx\,dy
 = \int_c^d \big(N(b, y) - N(a, y)\big)\,dy
 = \int_c^d N(b, y)\,dy + \int_d^c N(a, y)\,dy \\
&= \int_{C_2} N\,dy + \int_{C_4} N\,dy
 = \int_{C_1} N\,dy + \int_{C_2} N\,dy + \int_{C_3} N\,dy + \int_{C_4} N\,dy = \oint_C N\,dy,
\end{aligned}
\]

where the second to last equality comes from the fact that dy = 0 on C1 and C3. Similarly, we get ∬_R ∂M/∂y dA = −∮_C M dx, so that

\[ \oint_C M\,dx + N\,dy = \iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dx\,dy. \]

(3) The theorem applies to more general regions like those in the following pictures:

[Figure: a region R cut into smaller regions; the cuts are traversed once in each direction.]

Cut the region into smaller regions on each of which the theorem applies. When the results are added up, the line integrals along the edges contained in the interior of R cancel, while the double integrals add up to the integral over R. □

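Example 10.4.1 Let F(x, y) = (y, −x), and C: γ(t) = (cos t, sin t), t ∈ [0, 2π], the unit circle, with γ′(t) = (−sin t, cos t). Then curl F = Nx − My = −1 − 1 = −2. Thus

\[ \iint_R \operatorname{curl}\mathbf{F}\,dA = -2\pi = \oint_C \mathbf{F} \cdot d\mathbf{s} = \int_0^{2\pi} -(\sin^2 t + \cos^2 t)\,dt. \qquad \square \]

Example 10.4.2 Let F(x, y) = (y, −x)/(x² + y²), and C: γ(t) = (cos t, sin t), t ∈ [0, 2π], the unit circle with γ′(t) = (−sin t, cos t). Then

\[ \operatorname{curl}\mathbf{F} = N_x - M_y = \frac{x^2 - y^2}{(x^2 + y^2)^2} - \frac{x^2 - y^2}{(x^2 + y^2)^2} = 0. \]

Thus

\[ \iint_R \operatorname{curl}\mathbf{F}\,dA = 0 \ne -2\pi = \oint_C \mathbf{F} \cdot d\mathbf{s} = \int_0^{2\pi} -(\sin^2 t + \cos^2 t)\,dt. \]

This does not contradict Green's theorem, because F is not defined, and hence not continuous, at (0, 0) ∈ R.

However, if the region R is an annulus bounded by two concentric circles Ch: x² + y² = h² and C1: x² + y² = 1 with 0 < h < 1, then the positive orientations of the circles are parameterized as γh(t) = (h cos t, −h sin t) for Ch and γ1(t) = (cos t, sin t) for C1, where in each case t ∈ [0, 2π]. Since γ′h(t) = h(−sin t, −cos t) and γ′1(t) = (−sin t, cos t),

\[ \oint_C \mathbf{F} \cdot d\mathbf{s} = \oint_{C_h} \mathbf{F} \cdot d\mathbf{s} + \oint_{C_1} \mathbf{F} \cdot d\mathbf{s} = 2\pi - 2\pi = 0 = \iint_R \operatorname{curl}\mathbf{F}\,dA. \]

This also shows that for any positively oriented simple closed curve C enclosing Ch, with Ch oriented clockwise as above,

\[ \oint_C \mathbf{F} \cdot d\mathbf{s} + \oint_{C_h} \mathbf{F} \cdot d\mathbf{s} = 0, \quad \text{or} \quad \oint_C \mathbf{F} \cdot d\mathbf{s} = -\oint_{C_h} \mathbf{F} \cdot d\mathbf{s} = -2\pi. \]

A vector field F is said to be irrotational if curl F = 0. □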
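The behaviour of F = (y, −x)/(x² + y²) in Example 10.4.2 can be explored numerically. The sketch below assumes a midpoint rule on counterclockwise circles; the helper names `circulation` and `circle` and the particular centers and radii are choices made here for illustration. Circles enclosing the origin should give about −2π regardless of size or center, while a circle that does not enclose the origin should give about 0.

```python
import math

def circulation(F, gamma, dgamma, n=4000):
    """Midpoint approximation of the closed integral of M dx + N dy over t in [0, 2*pi]."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        M, N = F(*gamma(t))
        dx, dy = dgamma(t)
        total += M * dx + N * dy
    return total * h

def circle(cx, cy, r):
    """Counterclockwise circle of radius r centered at (cx, cy), with its derivative."""
    gamma = lambda t: (cx + r * math.cos(t), cy + r * math.sin(t))
    dgamma = lambda t: (-r * math.sin(t), r * math.cos(t))
    return gamma, dgamma

def F(x, y):
    r2 = x * x + y * y
    return (y / r2, -x / r2)

small = circulation(F, *circle(0.0, 0.0, 0.1))    # encloses the origin
offset = circulation(F, *circle(0.5, 0.0, 2.0))   # encloses the origin, off-center
away = circulation(F, *circle(3.0, 0.0, 1.0))     # does not enclose the origin
print(small, offset, away)
```

The third value is what Green's theorem predicts, since curl F = 0 on a disk where F is smooth.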

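Theorem 10.4.2 If C is a simple closed curve bounding a region R to which Green's theorem applies, then the area of R is

\[ \mathrm{Area}(R) = \frac{1}{2}\oint_C x\,dy - y\,dx = \oint_C x\,dy. \]

Proof: Let F = (x, y). Then div F = Mx + Ny = 1 + 1 = 2. Thus

\[ \iint_R \operatorname{div}\mathbf{F}\,dA = 2\iint_R dA = 2\,\mathrm{Area}(R) = \oint_C M\,dy - N\,dx = \oint_C x\,dy - y\,dx = 2\oint_C x\,dy, \]

since, by integration by parts around the closed curve starting and ending at a point P,

\[ \oint_C x\,dy = [xy]_P^P - \oint_C y\,dx = -\oint_C y\,dx. \qquad \square \]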


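Example 10.4.3 Find the area of the region R enclosed by the curve C: x^{2/3} + y^{2/3} = a^{2/3}.

Solution: C can be parameterized as γ(t) = (a cos³ t, a sin³ t), t ∈ [0, 2π].

\[
\begin{aligned}
\mathrm{Area}(R) &= \frac{1}{2}\oint_C x\,dy - y\,dx \\
&= \frac{1}{2}\int_0^{2\pi} \left[(a\cos^3 t)(3a\sin^2 t\cos t) + (a\sin^3 t)(3a\cos^2 t\sin t)\right] dt \\
&= \frac{3a^2}{2}\int_0^{2\pi} \sin^2 t\cos^2 t\,dt = \frac{3a^2}{8}\int_0^{2\pi} \sin^2 2t\,dt = \frac{3a^2}{8}\pi. \qquad \square
\end{aligned}
\]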
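The area formula of Theorem 10.4.2 is easy to evaluate numerically for any parametrized closed curve. The sketch below assumes a midpoint rule; the helper name `area_by_green` and the value a = 2 are choices made here for illustration. It checks the astroid of Example 10.4.3 against 3πa²/8 and the unit circle against π.

```python
import math

def area_by_green(gamma, dgamma, a, b, n=4000):
    """Midpoint approximation of (1/2) * closed integral of x dy - y dx."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y = gamma(t)
        dx, dy = dgamma(t)
        total += x * dy - y * dx
    return 0.5 * total * h

A = 2.0  # the parameter a of the astroid, chosen for illustration
astroid = lambda t: (A * math.cos(t) ** 3, A * math.sin(t) ** 3)
d_astroid = lambda t: (-3 * A * math.cos(t) ** 2 * math.sin(t),
                       3 * A * math.sin(t) ** 2 * math.cos(t))

astroid_area = area_by_green(astroid, d_astroid, 0.0, 2 * math.pi)
circle_area = area_by_green(
    lambda t: (math.cos(t), math.sin(t)),
    lambda t: (-math.sin(t), math.cos(t)),
    0.0, 2 * math.pi)

print(astroid_area, 3 * math.pi * A ** 2 / 8)
print(circle_area, math.pi)
```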

10.5 Surfaces in the Space

There are several ways to express surfaces in space. Level surfaces include the quadric surfaces defined by quadratic equations like f(x, y, z) = x² + y² + z² − r² = 0. The graphs of two-variable functions z = f(x, y) are particular cases of level surfaces, as F(x, y, z) = z − f(x, y) = 0, or of parameterized surfaces, as ϕ(x, y) = (x, y, f(x, y)).

A general method is to use a parametrization, like the spherical coordinates of the spheres.

Definition 10.5.1 A patch, or a parametrization, of a surface S ⊆ R³ is a 1-1 continuous mapping defined on a domain U ⊆ R² written as

ϕ(u, v) = (x(u, v), y(u, v), z(u, v)) ∈ S.

If ϕ is differentiable of class C¹ (i.e., x, y, z are continuously differentiable), then we say S is a differentiable surface.

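As we have seen in Section 8.5, the derivative Dϕ of ϕ is given by a matrix Dϕ = [ϕu ϕv], which transforms a tangent vector of a curve in U to a tangent vector of a curve in S: If α(t) = (u(t), v(t)), t ∈ I, is a differentiable curve in U, then β(t) = (ϕ ◦ α)(t) is a differentiable curve in S and the tangent vector β′(t) is given by:

\[
\beta'(t) = D\varphi \cdot \alpha'(t) = \begin{bmatrix} \varphi_u & \varphi_v \end{bmatrix}\begin{bmatrix} \frac{du}{dt} \\[0.5ex] \frac{dv}{dt} \end{bmatrix}
= \begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\[0.5ex] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \\[0.5ex] \frac{\partial z}{\partial u} & \frac{\partial z}{\partial v} \end{bmatrix}_{\alpha(t)} \begin{bmatrix} \frac{du}{dt} \\[0.5ex] \frac{dv}{dt} \end{bmatrix}_{t}.
\]

In particular, ϕu and ϕv are the tangent vectors to the coordinate curves ϕ(u, v0) and ϕ(u0, v) on S, which form a basis for the tangent plane to the surface. Consequently, ϕu × ϕv is normal to the surface, and so n = (ϕu × ϕv)/|ϕu × ϕv| is the unit normal vector of the surface S. Moreover,

ΔS ≈ |ϕu × ϕv|ΔuΔv

represents the area of the parallelogram in the tangent plane determined by ϕuΔu and ϕvΔv, which are the images of the sides Δu i and Δv j of the rectangle in U.

[Figure: a rectangle ΔA in the uv-plane with sides Δu i and Δv j, mapped by ϕ to the patch ΔS = ϕ(ΔA) on the surface, with tangent vectors ϕuΔu = Dϕ(Δu i) and ϕvΔv = Dϕ(Δv j) at ϕ(u, v); a curve with tangent α′(t) is mapped to a curve with tangent β′(t).]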

As Δu, Δv → 0, it approaches

dS = |ϕu × ϕv|dudv

which is called the element of surface area of S. Hence the area of the surface S is defined by

\[ \mathrm{Area}(S) = \iint_S dS = \iint_U |\varphi_u \times \varphi_v|\,du\,dv. \]

If f : S → R is a continuous function on the surface S, then the surface integral of f over S is defined by

\[ \iint_S f\,dS = \iint_U f(\varphi(u, v))\,|\varphi_u \times \varphi_v|\,du\,dv. \]

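As usual, the right side is independent of the parametrization ϕ of S. In fact, let ψ(s, t) = ϕ ◦ h(s, t) be another parametrization of S by a 1-1 differentiable function h : V → U given by h(s, t) = (u, v). Then by the chain rule, we have

\[
D\psi = \begin{bmatrix} \psi_s & \psi_t \end{bmatrix} = D\varphi \cdot Dh = \begin{bmatrix} \varphi_u & \varphi_v \end{bmatrix}\begin{bmatrix} \frac{\partial u}{\partial s} & \frac{\partial u}{\partial t} \\[0.5ex] \frac{\partial v}{\partial s} & \frac{\partial v}{\partial t} \end{bmatrix}
= \begin{bmatrix} \frac{\partial u}{\partial s}\varphi_u + \frac{\partial v}{\partial s}\varphi_v & \frac{\partial u}{\partial t}\varphi_u + \frac{\partial v}{\partial t}\varphi_v \end{bmatrix},
\]

and so

\[ \psi_s \times \psi_t = \left(\frac{\partial u}{\partial s}\frac{\partial v}{\partial t} - \frac{\partial v}{\partial s}\frac{\partial u}{\partial t}\right)(\varphi_u \times \varphi_v) = \begin{vmatrix} \frac{\partial u}{\partial s} & \frac{\partial u}{\partial t} \\[0.5ex] \frac{\partial v}{\partial s} & \frac{\partial v}{\partial t} \end{vmatrix}(\varphi_u \times \varphi_v). \]

\[ \therefore\ |\psi_s \times \psi_t|\,ds\,dt = \left|\frac{\partial(u, v)}{\partial(s, t)}\right| |\varphi_u \times \varphi_v|\,ds\,dt = |\varphi_u \times \varphi_v|\,du\,dv, \]

since

\[ du\,dv = \left|\frac{\partial(u, v)}{\partial(s, t)}\right| ds\,dt \]

from the argument in Section 9.6. Thus

\[ \iint_V f(\psi(s, t))\,|\psi_s \times \psi_t|\,ds\,dt = \iint_U f(\varphi(u, v))\,|\varphi_u \times \varphi_v|\,du\,dv. \]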

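Example 10.5.1 Find the surface area of the sphere of radius a: x² + y² + z² = a².

Solution: Take spherical coordinates:

ϕ(φ, θ) = (a sin φ cos θ, a sin φ sin θ, a cos φ), φ ∈ [0, π], θ ∈ [0, 2π].

\[
\varphi_\phi \times \varphi_\theta = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a\cos\phi\cos\theta & a\cos\phi\sin\theta & -a\sin\phi \\ -a\sin\phi\sin\theta & a\sin\phi\cos\theta & 0 \end{vmatrix}
= (a^2\sin^2\phi\cos\theta,\ a^2\sin^2\phi\sin\theta,\ a^2\sin\phi\cos\phi),
\]

\[ \therefore\ |\varphi_\phi \times \varphi_\theta| = a^2\sin\phi. \]

\[ \therefore\ \mathrm{Area}(S^2) = \int_0^{2\pi}\!\!\int_0^{\pi} a^2\sin\phi\,d\phi\,d\theta = \int_0^{2\pi} \left[-a^2\cos\phi\right]_0^{\pi} d\theta = 4\pi a^2. \qquad \square \]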

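Example 10.5.2 Compute the surface integral of f(x, y, z) = x² over the cone S: z = √(x² + y²), 0 ≤ z ≤ 1.

Solution: Parameterize S by

ϕ(r, θ) = (r cos θ, r sin θ, r), r ∈ [0, 1], θ ∈ [0, 2π].

\[
\varphi_r \times \varphi_\theta = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \cos\theta & \sin\theta & 1 \\ -r\sin\theta & r\cos\theta & 0 \end{vmatrix} = (-r\cos\theta,\ -r\sin\theta,\ r),
\qquad \therefore\ |\varphi_r \times \varphi_\theta| = \sqrt{2}\,r.
\]

\[ \iint_S x^2\,dS = \int_0^{2\pi}\!\!\int_0^1 (r^2\cos^2\theta)\sqrt{2}\,r\,dr\,d\theta = \frac{\sqrt{2}}{4}\int_0^{2\pi} \cos^2\theta\,d\theta = \frac{\sqrt{2}\,\pi}{4}. \qquad \square \]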
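The definition ∬_S f dS = ∬_U f(ϕ(u, v)) |ϕu × ϕv| du dv can be turned into a small numerical routine once ϕ, ϕu and ϕv are supplied. The sketch below assumes a midpoint rule on a rectangular parameter domain; the helper names `cross` and `surface_integral` and the radius 1.5 are choices made here for illustration. It recomputes the cone integral above and the area of a sphere.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_integral(f, phi, phi_u, phi_v, u0, u1, v0, v1, n=300):
    """Midpoint approximation of the integral of f(phi) * |phi_u x phi_v| du dv."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            normal = cross(phi_u(u, v), phi_v(u, v))
            total += f(*phi(u, v)) * math.sqrt(sum(c * c for c in normal))
    return total * hu * hv

# Cone z = sqrt(x^2 + y^2), 0 <= z <= 1, with parameters (r, theta).
cone_value = surface_integral(
    lambda x, y, z: x * x,
    lambda r, th: (r * math.cos(th), r * math.sin(th), r),
    lambda r, th: (math.cos(th), math.sin(th), 1.0),
    lambda r, th: (-r * math.sin(th), r * math.cos(th), 0.0),
    0.0, 1.0, 0.0, 2 * math.pi)

# Sphere of radius a with parameters (phi, theta); f = 1 gives the area.
a = 1.5
sphere_area = surface_integral(
    lambda x, y, z: 1.0,
    lambda p, th: (a * math.sin(p) * math.cos(th), a * math.sin(p) * math.sin(th), a * math.cos(p)),
    lambda p, th: (a * math.cos(p) * math.cos(th), a * math.cos(p) * math.sin(th), -a * math.sin(p)),
    lambda p, th: (-a * math.sin(p) * math.sin(th), a * math.sin(p) * math.cos(th), 0.0),
    0.0, math.pi, 0.0, 2 * math.pi)

print(cone_value, math.sqrt(2) * math.pi / 4)
print(sphere_area, 4 * math.pi * a ** 2)
```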


Suppose that the surface S is the graph of a differentiable function z = g(x, y) over a domain U ⊆ R². Then ϕ(x, y) = (x, y, g(x, y)) is a parametrization of S, and so the area element is dS = |ϕx × ϕy|dxdy. But

\[ |\varphi_x \times \varphi_y| = \left|\,\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & g_x \\ 0 & 1 & g_y \end{vmatrix}\,\right| = |(-g_x,\ -g_y,\ 1)| = \sqrt{1 + g_x^2 + g_y^2}. \]

Thus, if f : S → R is a function on S, the surface integral of f over S is

\[ \iint_S f\,dS = \iint_U f(\varphi(x, y))\sqrt{1 + g_x^2 + g_y^2}\,dx\,dy. \]

Note that

\[ \mathbf{n} = \frac{(-g_x,\ -g_y,\ 1)}{\sqrt{1 + g_x^2 + g_y^2}} \]

is the unit normal vector to the graph of g.

More generally, suppose that S is a level surface given by f(x, y, z) = 0. Assume that the projection of S onto a plane, say the xy-plane, is 1-1. Over a small rectangular region R in the xy-plane of area ΔA, the region ΔP in the tangent plane to the surface approximates the part ΔS of the surface over R, as in the following picture:

[Figure: a rectangle of area ΔA in the plane with sides u′ and v′ and unit normal p; above it the parallelogram ΔP in the tangent plane with sides u and v, approximating the surface patch ΔS; the surface normal n = ∇f/|∇f| makes the angle θ with p.]

Let u′ and v′ be the side vectors of R, p a unit vector normal to R, and u and v the sides of ΔP. Then we can write them as u = u′ + ap, v = v′ + bp, for some a, b ∈ R, and

\[ (\mathbf{u} \times \mathbf{v}) \parallel \mathbf{n} = \frac{\nabla f}{|\nabla f|}. \]

Note that ΔS ≈ ΔP = |u × v|, and

u × v = u′ × v′ + a p × v′ + b u′ × p.

Since p × v′ and u′ × p are orthogonal to p,

|(u × v) · p| = |(u′ × v′) · p| = |u′ × v′| = ΔA,

while also

|(u × v) · p| = |cos θ||u × v| = |cos θ|ΔP ≈ |cos θ|ΔS,

where |cos θ| = |p · n| = |p · ∇f|/|∇f|. Therefore, as ΔA → 0,

\[ dS = \frac{1}{|\cos\theta|}\,dA = \frac{|\nabla f|}{|\mathbf{p} \cdot \nabla f|}\,dA, \]

and so the surface integral of a function h over S is

\[ \iint_S h\,dS = \iint_U h(x, y, z)\,\frac{1}{|\cos\theta|}\,dA = \iint_U h(x, y, z)\,\frac{|\nabla f|}{|\mathbf{p} \cdot \nabla f|}\,dA. \]

Example 10.5.3 Find the surface area of the paraboloid f(x, y, z) = x² + y² − z = 0 between z = 0 and z = 4.

Solution: The surface lies over R, the disk of radius 2 in the xy-plane. Thus p = k. Now ∇f = (2x, 2y, −1), |∇f| = √(4x² + 4y² + 1), and |∇f · p| = 1. Thus the surface area is

\[ \iint_R \frac{|\nabla f|}{|\mathbf{p} \cdot \nabla f|}\,dA = \int_0^{2\pi}\!\!\int_0^2 \sqrt{4r^2 + 1}\,r\,dr\,d\theta = \frac{17^{3/2} - 1}{12}\int_0^{2\pi} d\theta = \frac{(17^{3/2} - 1)\pi}{6}. \qquad \square \]

Example 10.5.4 Find the surface area of the cap cut from the sphere f(x, y, z) = x² + y² + z² = 2 over the disk x² + y² ≤ 1, z ≥ 0.

Solution: The surface lies over R, the disk of radius 1 in the xy-plane. Thus p = k. Now ∇f = (2x, 2y, 2z), |∇f| = √(4x² + 4y² + 4z²) = 2√2, and |∇f · k| = 2z. Thus the surface area is

\[ \iint_R \frac{|\nabla f|}{|\mathbf{k} \cdot \nabla f|}\,dA = \iint_R \frac{\sqrt{2}}{z}\,dA = \int_0^{2\pi}\!\!\int_0^1 \frac{\sqrt{2}\,r}{\sqrt{2 - r^2}}\,dr\,d\theta = 2\pi(2 - \sqrt{2}). \qquad \square \]


Example 10.5.5 Find the center of mass of a thin hemispherical shell of radius a: f(x, y, z) = x² + y² + z² = a², z ≥ 0, with constant density δ = 1.

Solution: The surface lies over R, the disk of radius a in the xy-plane. By symmetry, it is enough to find z̄. The mass of the shell is

\[ M = \iint_S \delta\,dS = \mathrm{Area}(S) = 2\pi a^2. \]

Now p = k, |∇f| = √(4x² + 4y² + 4z²) = 2a, and |∇f · k| = 2z. Thus

\[ M_{xy} = \iint_S z\,\delta\,dS = \iint_R z\,\frac{|\nabla f|}{|\mathbf{k} \cdot \nabla f|}\,dA = \iint_R a\,dA = \pi a^3. \]

Thus z̄ = Mxy/M = a/2. □

A surface S is said to be orientable if a continuous unit normal vector field is well-defined over S. The spheres, the torus and the quadric surfaces are examples of orientable surfaces. The Möbius band is non-orientable.

Let F = (M, N, P) be a vector field on U ⊆ R³, and S an oriented surface in U with outward unit normal vector field n on S. The flux of F across S is defined by

\[ \iint_S \mathbf{F} \cdot \mathbf{n}\,dS = \iint_S \mathbf{F} \cdot d\mathbf{S}, \qquad d\mathbf{S} \equiv \mathbf{n}\,dS. \]

Note that the evaluation of this integral can be done through a parametrization of the surface. But the integral depends on the orientation of the surface, and so the unit normal vector obtained from a parametrization has to coincide with the orientation of the surface; otherwise the integral changes its sign.

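(1) If the surface S is parameterized by

ϕ(u, v) = (x(u, v), y(u, v), z(u, v)), (u, v) ∈ D ⊆ R²,

then n = (ϕu × ϕv)/|ϕu × ϕv|, dS = |ϕu × ϕv|dudv, and

\[
\begin{aligned}
d\mathbf{S} = \mathbf{n}\,dS &= \frac{\varphi_u \times \varphi_v}{|\varphi_u \times \varphi_v|}\,|\varphi_u \times \varphi_v|\,du\,dv = (\varphi_u \times \varphi_v)\,du\,dv \\
&= (y_u z_v - y_v z_u,\ z_u x_v - z_v x_u,\ x_u y_v - x_v y_u)\,du\,dv \\
&= \left(\frac{\partial(y, z)}{\partial(u, v)}\,du\,dv,\ \frac{\partial(z, x)}{\partial(u, v)}\,du\,dv,\ \frac{\partial(x, y)}{\partial(u, v)}\,du\,dv\right) \\
&= (dy\,dz,\ dz\,dx,\ dx\,dy).
\end{aligned}
\]

\[ \therefore \iint_S \mathbf{F} \cdot \mathbf{n}\,dS = \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_D \mathbf{F}(\varphi(u, v)) \cdot (\varphi_u \times \varphi_v)\,du\,dv = \iint_S M\,dy\,dz + N\,dz\,dx + P\,dx\,dy. \]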

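(2) If the surface S is given as the graph of z = g(x, y), (x, y) ∈ D ⊆ R², then

\[ \mathbf{n} = \frac{(-g_x,\ -g_y,\ 1)}{\sqrt{1 + g_x^2 + g_y^2}}, \qquad d\mathbf{S} = \mathbf{n}\,dS = \frac{(-g_x,\ -g_y,\ 1)}{\sqrt{1 + g_x^2 + g_y^2}}\sqrt{1 + g_x^2 + g_y^2}\,dx\,dy = (-g_x,\ -g_y,\ 1)\,dx\,dy. \]

\[ \therefore \iint_S \mathbf{F} \cdot \mathbf{n}\,dS = \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_D \mathbf{F}(x, y, g(x, y)) \cdot (-g_x,\ -g_y,\ 1)\,dx\,dy. \]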

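(3) If the surface S is given as the level surface f(x, y, z) = c which projects onto a region R in a plane with a unit normal vector p, then

\[ \mathbf{n} = \frac{\nabla f}{|\nabla f|}, \qquad d\mathbf{S} = \mathbf{n}\,dS = \frac{\nabla f}{|\nabla f|}\,\frac{|\nabla f|}{|\mathbf{p} \cdot \nabla f|}\,dA = \frac{\nabla f}{|\mathbf{p} \cdot \nabla f|}\,dA. \]

\[ \therefore \iint_S \mathbf{F} \cdot \mathbf{n}\,dS = \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_R \mathbf{F}(x, y, z) \cdot \frac{\nabla f}{|\mathbf{p} \cdot \nabla f|}\,dA. \]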

Example 10.5.6 Find the flux of the vector field F(x, y, z) = (0, yz, z²) across the level surface S given as y² + z² = 1, z ≥ 0, between x = 0 and x = 1.

Solution: The surface projects onto the region R = [0, 1] × [−1, 1] in the xy-plane. Thus p = k and, with f(x, y, z) = y² + z², n = ∇f/|∇f| = (0, 2y, 2z)/√(4(y² + z²)) = (0, y, z).

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \iint_R (0,\ yz,\ z^2) \cdot \frac{(0,\ 2y,\ 2z)}{|2z|}\,dA = \iint_R \frac{z(y^2 + z^2)}{z}\,dA = \iint_R dA = 2. \qquad \square \]

Example 10.5.7 Find the flux of the vector field F(x, y, z) = (yz, x, −z²) across the surface S given as y = x², 0 ≤ z ≤ 4, and 0 ≤ x ≤ 1.

Solution: Parametrize the surface by ϕ(x, z) = (x, x², z), 0 ≤ z ≤ 4, and 0 ≤ x ≤ 1. Then

ϕx = (1, 2x, 0), ϕz = (0, 0, 1) =⇒ ϕx × ϕz = (2x, −1, 0).

\[ \iint_S \mathbf{F} \cdot d\mathbf{S} = \int_0^1\!\!\int_0^4 (x^2 z,\ x,\ -z^2) \cdot (2x,\ -1,\ 0)\,dz\,dx = \int_0^1\!\!\int_0^4 (2x^3 z - x)\,dz\,dx = 2. \qquad \square \]
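The parametrized flux formula ∬_D F(ϕ(u, v)) · (ϕu × ϕv) du dv can be sketched numerically as follows. The code assumes a midpoint rule; the helper names `cross` and `flux_integral` are hypothetical, and the cylinder parametrization used for Example 10.5.6 is an alternative to the projection method in the text, with the parameter order (θ, x) chosen here so that ϕθ × ϕx points outward.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def flux_integral(F, phi, phi_u, phi_v, u0, u1, v0, v1, n=300):
    """Midpoint approximation of the integral of F(phi) . (phi_u x phi_v) du dv."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * hu
        for j in range(n):
            v = v0 + (j + 0.5) * hv
            normal = cross(phi_u(u, v), phi_v(u, v))
            total += sum(p * q for p, q in zip(F(*phi(u, v)), normal))
    return total * hu * hv

# Example 10.5.7: phi(x, z) = (x, x^2, z), 0 <= x <= 1, 0 <= z <= 4.
flux_parabolic = flux_integral(
    lambda x, y, z: (y * z, x, -z * z),
    lambda x, z: (x, x * x, z),
    lambda x, z: (1.0, 2 * x, 0.0),
    lambda x, z: (0.0, 0.0, 1.0),
    0.0, 1.0, 0.0, 4.0)

# Example 10.5.6: phi(theta, x) = (x, cos theta, sin theta), outward normal (0, cos, sin).
flux_cylinder = flux_integral(
    lambda x, y, z: (0.0, y * z, z * z),
    lambda th, x: (x, math.cos(th), math.sin(th)),
    lambda th, x: (0.0, -math.sin(th), math.cos(th)),
    lambda th, x: (1.0, 0.0, 0.0),
    0.0, math.pi, 0.0, 1.0)

print(flux_parabolic, flux_cylinder)
```

Swapping the order of the two parameters flips the sign of ϕu × ϕv and hence of the computed flux, which is the orientation dependence described above.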

10.6 Stokes’ Theorem

One can say that Green's theorem is an extension of the first fundamental theorem of calculus to the 2-dimensional case: a double integral over a domain equals a line integral around the boundary of the domain. Stokes' theorem is a surface version of Green's theorem on the plane. For this, we first introduce a symbolic notation:

\[ \nabla = \left(\frac{\partial}{\partial x},\ \frac{\partial}{\partial y},\ \frac{\partial}{\partial z}\right), \]

called the del operator. This operator makes sense only when it is applied to a differentiable function f : U ⊆ R³ → R:

\[ \nabla(f) = \nabla f = \left(\frac{\partial}{\partial x},\ \frac{\partial}{\partial y},\ \frac{\partial}{\partial z}\right)(f) = \left(\frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z}\right), \]

which is the gradient field of f.

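Definition 10.6.1 Let F = (M, N, P) be a vector field in R³.

(1) The curl of F is defined as

\[ \operatorname{curl}\mathbf{F} = \nabla \times \mathbf{F} \equiv \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ M & N & P \end{vmatrix} = \left(\frac{\partial P}{\partial y} - \frac{\partial N}{\partial z},\ \frac{\partial M}{\partial z} - \frac{\partial P}{\partial x},\ \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right). \]

(2) The divergence of F is defined as

\[ \operatorname{div}\mathbf{F} = \nabla \cdot \mathbf{F} = \frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} + \frac{\partial P}{\partial z}. \]

In R², ∇ = (∂/∂x, ∂/∂y) and so, for F = (M, N),

\[ \operatorname{curl}\mathbf{F} = \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right)\mathbf{k} \equiv \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}, \]

which we had earlier.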

Example 10.6.1 For F = (x² − y, 4z, x²),

\[ \operatorname{curl}\mathbf{F} = \nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ x^2 - y & 4z & x^2 \end{vmatrix} = (-4,\ -2x,\ 1), \]

div F = ∇ · F = 2x. □

The following theorem is a direct consequence of the definition.

Theorem 10.6.1 Let f, g be differentiable functions and c a constant. Let F and G be differentiable vector fields in R³. Then

(1) ∇(cf ± g) = c∇f ± ∇g, ∇(fg) = f∇g + g∇f, ∇(f/g) = (g∇f − f∇g)/g².

(2) ∇ · (cF ± G) = c∇ · F ± ∇ · G, ∇ × (cF ± G) = c∇ × F ± ∇ × G.

(3) ∇ · (fF) = f(∇ · F) + ∇f · F, ∇ × (fF) = f(∇ × F) + ∇f × F.

For example, if F = ∇f, then

curl ∇f = ∇ × ∇f = 0

since mixed second partials are equal.

Theorem 10.6.2 (Stokes' Theorem) Let F = (M, N, P) be a differentiable vector field on an open domain U ⊆ R³, and S ⊆ U an oriented surface with outward unit normal vector n. Let C = ∂S be the boundary of S with the orientation determined by n. Then

\[ \iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S} = \iint_S \nabla \times \mathbf{F} \cdot \mathbf{n}\,dS = \oint_C \mathbf{F} \cdot d\mathbf{s}, \]

where dS = n dS and ds = T ds.

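Proof: Suppose that S is represented as the graph of z = f(x, y) over a region D ⊆ R², whose boundary ∂D is the projection of C. Then

\[ d\mathbf{S} = \mathbf{n}\,dS = \frac{(-f_x,\ -f_y,\ 1)}{\sqrt{f_x^2 + f_y^2 + 1}}\sqrt{f_x^2 + f_y^2 + 1}\,dx\,dy = (-f_x,\ -f_y,\ 1)\,dx\,dy, \]

and ds = T ds = (dx, dy, dz) where dz = fx dx + fy dy. Thus

\[
\begin{aligned}
\oint_C \mathbf{F} \cdot d\mathbf{s} &= \oint_C M\,dx + N\,dy + P\,dz = \oint_{\partial D} (M + P f_x)\,dx + (N + P f_y)\,dy \\
&= \iint_D \left(\frac{\partial(N + P f_y)}{\partial x} - \frac{\partial(M + P f_x)}{\partial y}\right) dx\,dy,
\end{aligned}
\]

where the last equality comes from Green's Theorem 10.4.1, and, since M, N, P are evaluated at (x, y, f(x, y)),

\[ \frac{\partial N}{\partial x} = N_x\frac{\partial x}{\partial x} + N_y\frac{\partial y}{\partial x} + N_z\frac{\partial z}{\partial x} = N_x + N_z f_x, \]

\[ \frac{\partial M}{\partial y} = M_x\frac{\partial x}{\partial y} + M_y\frac{\partial y}{\partial y} + M_z\frac{\partial z}{\partial y} = M_y + M_z f_y, \]

\[ \frac{\partial(P f_y)}{\partial x} = (P_x + P_z f_x) f_y + P f_{yx}, \qquad \frac{\partial(P f_x)}{\partial y} = (P_y + P_z f_y) f_x + P f_{xy}. \]

Thus,

\[
\begin{aligned}
\iint_D \left(\frac{\partial(N + P f_y)}{\partial x} - \frac{\partial(M + P f_x)}{\partial y}\right) dx\,dy
&= \iint_D \left[-f_x(P_y - N_z) - f_y(M_z - P_x) + (N_x - M_y)\right] dx\,dy \\
&= \iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S}.
\end{aligned}
\]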

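Suppose that S is represented by a parametrization ϕ(u, v), (u, v) ∈ D. Then

\[
\begin{aligned}
\oint_C \mathbf{F} \cdot d\mathbf{s} &= \oint_C M\,dx + N\,dy + P\,dz \\
&= \oint_{\partial D} (M x_u + N y_u + P z_u)\,du + (M x_v + N y_v + P z_v)\,dv \\
&= \iint_D \left(\frac{\partial}{\partial u}(M x_v + N y_v + P z_v) - \frac{\partial}{\partial v}(M x_u + N y_u + P z_u)\right) du\,dv \\
&= \iint_D \big[M_y(y_u x_v - y_v x_u) + M_z(z_u x_v - z_v x_u) + N_x(x_u y_v - x_v y_u) \\
&\qquad\quad + N_z(z_u y_v - z_v y_u) + P_x(x_u z_v - x_v z_u) + P_y(y_u z_v - y_v z_u)\big]\,du\,dv \\
&= \iint_D \big[(N_x - M_y)(x_u y_v - x_v y_u) + (M_z - P_x)(z_u x_v - z_v x_u) \\
&\qquad\quad + (P_y - N_z)(y_u z_v - y_v z_u)\big]\,du\,dv \\
&= \iint_D (\nabla \times \mathbf{F}) \cdot (\varphi_u \times \varphi_v)\,du\,dv = \iint_S \operatorname{curl}\mathbf{F} \cdot d\mathbf{S}. \qquad \square
\end{aligned}
\]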

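If $S$ is a region in the $xy$-plane bounded by a curve $C = \partial S$, then $d\mathbf{S} = \mathbf{n}\,dS = \mathbf{k}\,dx\,dy$ and

$$(\nabla\times\mathbf{F})\cdot\mathbf{n} = (\nabla\times\mathbf{F})\cdot\mathbf{k} = N_x - M_y.$$

Thus Stokes' Theorem becomes

$$\oint_C \mathbf{F}\cdot d\mathbf{s} = \iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = \iint_S (N_x - M_y)\,dx\,dy,$$

which is the flow form of Green's Theorem.

Suppose that two different surfaces $S_1$ and $S_2$ have the same boundary curve $C$ so that they constitute one closed surface $S$, just like the two hemispheres constituting a sphere. Suppose also that the orientations $\mathbf{n}_1$ and $\mathbf{n}_2$ of $S_1$ and $S_2$ coincide along $C$. Then, by Stokes' Theorem 10.6.2,

$$\iint_{S_1} \operatorname{curl}\mathbf{F}\cdot d\mathbf{S}_1 = \oint_C \mathbf{F}\cdot d\mathbf{s} = \iint_{S_2} \operatorname{curl}\mathbf{F}\cdot d\mathbf{S}_2.$$

If the orientations $\mathbf{n}_1$ and $\mathbf{n}_2$ of $S_1$ and $S_2$ induce opposite directions on $C$, then

$$\iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = \iint_{S_1} \operatorname{curl}\mathbf{F}\cdot d\mathbf{S}_1 + \iint_{S_2} \operatorname{curl}\mathbf{F}\cdot d\mathbf{S}_2 = \oint_C \mathbf{F}\cdot d\mathbf{s} - \oint_C \mathbf{F}\cdot d\mathbf{s} = 0.$$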

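[Figure: two surfaces $S_1$ and $S_2$ with common boundary curve $C$ and unit normals $\mathbf{n}_1$, $\mathbf{n}_2$, drawn for the two orientation cases.]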


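Example 10.6.2 Let $\mathbf{F} = (y, -x, 0)$, and let $C$ be the circle $x^2 + y^2 = 9$, $z = 0$, bounding the northern hemisphere $S$: $x^2 + y^2 + z^2 = 9$, $z \ge 0$. Then $C$ can be parameterized as $\gamma(t) = (3\cos t, 3\sin t, 0)$, $t \in [0, 2\pi]$, and

$$\oint_C \mathbf{F}\cdot d\mathbf{s} = \int_0^{2\pi} (-9\sin^2 t - 9\cos^2 t)\,dt = \int_0^{2\pi} -9\,dt = -18\pi.$$

By a simple computation,

$$\operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ y & -x & 0 \end{vmatrix} = (0, 0, -2).$$

Since $d\mathbf{S} = \dfrac{\nabla f}{|\mathbf{k}\cdot\nabla f|}\,dA = \dfrac{(2x, 2y, 2z)}{2z}\,dA = \dfrac{(x, y, z)}{z}\,dA$ for $f(x, y, z) = x^2 + y^2 + z^2 - 9$,

$$\iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = \iint_S (0, 0, -2)\cdot\frac{(x, y, z)}{z}\,dA = \iint_{x^2 + y^2 \le 9} -2\,dA = -18\pi. \qquad \square$$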
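Both sides of Stokes' Theorem in Example 10.6.2 can also be checked numerically. The following Python sketch approximates the line integral by a midpoint rule; the helper name `line_integral` and the number of subintervals are illustrative choices, not part of the text.

```python
import math

def line_integral(field, curve, velocity, a, b, n=20000):
    """Midpoint-rule approximation of the line integral of `field` along `curve`."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        fx, fy, fz = field(*curve(t))
        vx, vy, vz = velocity(t)
        total += (fx * vx + fy * vy + fz * vz) * h
    return total

# Example 10.6.2: F = (y, -x, 0) on the circle of radius 3 in the xy-plane.
F = lambda x, y, z: (y, -x, 0.0)
gamma = lambda t: (3 * math.cos(t), 3 * math.sin(t), 0.0)
dgamma = lambda t: (-3 * math.sin(t), 3 * math.cos(t), 0.0)

circulation = line_integral(F, gamma, dgamma, 0.0, 2 * math.pi)
# The flux of curl F = (0, 0, -2) through S reduces to -2 times the area of the disk.
flux = -2.0 * math.pi * 3 ** 2
print(circulation, flux)
```

The two printed numbers should agree up to rounding, since the integrand is the constant $-9$ along the circle.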

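Example 10.6.3 Let $\mathbf{F} = (ye^z, xe^z, xye^z)$, and let $C$ be an oriented simple closed curve that is the boundary of a surface $S$. Then

$$\oint_C \mathbf{F}\cdot d\mathbf{s} = \iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = 0,$$

since $\operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F} = \mathbf{0}$. $\square$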

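Example 10.6.4 Evaluate $\oint_C -y^3\,dx + x^3\,dy - z^3\,dz$, where $C$ is the intersection of the cylinder $x^2 + y^2 = 1$ and the plane $x + y + z = 1$ with the counterclockwise orientation.

Solution: $C$ bounds the surface $S$ defined by $g(x, y, z) = x + y + z - 1 = 0$ over the unit disk $x^2 + y^2 \le 1$ in the $xy$-plane. Let $\mathbf{F} = (-y^3, x^3, -z^3)$. Then $\operatorname{curl}\mathbf{F} = (0, 0, 3x^2 + 3y^2)$ and $\nabla g = (1, 1, 1)$, and so $d\mathbf{S} = \dfrac{\nabla g}{|\mathbf{k}\cdot\nabla g|}\,dA = (1, 1, 1)\,dA$. Thus

$$\oint_C \mathbf{F}\cdot d\mathbf{s} = \iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = 3\int_0^{2\pi}\!\!\int_0^1 r^3\,dr\,d\theta = \frac{3}{4}\cdot 2\pi = \frac{3}{2}\pi.$$

One can compute the line integral directly, but with great effort. (Try!) $\square$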
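The direct line integral of Example 10.6.4 is tedious by hand but easy to approximate. Here is a minimal Python sketch, assuming the parametrization $(\cos t, \sin t, 1 - \cos t - \sin t)$ of $C$, which is counterclockwise when viewed from above; the step count `n` is an arbitrary choice.

```python
import math

def integrand(t):
    # Point on C: the cylinder x^2 + y^2 = 1 cut by the plane x + y + z = 1.
    x, y = math.cos(t), math.sin(t)
    z = 1.0 - x - y
    # Derivatives of the parametrization with respect to t.
    dx, dy = -math.sin(t), math.cos(t)
    dz = -dx - dy
    return -y ** 3 * dx + x ** 3 * dy - z ** 3 * dz

n = 20000
h = 2 * math.pi / n
direct = sum(integrand((i + 0.5) * h) for i in range(n)) * h
print(direct, 1.5 * math.pi)
```

The midpoint sum should reproduce the value $\frac{3}{2}\pi$ obtained from Stokes' Theorem.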

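Example 10.6.5 Evaluate $\iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S}$ for $\mathbf{F} = (y, -x, e^{xz})$ over the surface $S$ given in the following picture with the indicated orientation:

[Figure: the surface $S$ with normal $\mathbf{n}$ in $xyz$-space, bounded by the circle $C$: $x^2 + y^2 = 1$.]

where $C = \partial S$ is the unit circle.

Solution: Since $\iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = \oint_C \mathbf{F}\cdot d\mathbf{s}$, we just parameterize $C$ as $\gamma(t) = (\cos t, \sin t, 0)$, $t \in [0, 2\pi]$. Thus, $d\mathbf{s} = (dx, dy, dz) = \gamma'(t)\,dt = (-\sin t, \cos t, 0)\,dt$, and so

$$\oint_C \mathbf{F}\cdot d\mathbf{s} = \oint_C y\,dx - x\,dy = \int_0^{2\pi} (-\sin^2 t - \cos^2 t)\,dt = -2\pi. \qquad \square$$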

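Let $\mathbf{F}$ be a $C^1$ vector field on $U \subseteq \mathbb{R}^3$, $Q \in U$, and $\mathbf{u}$ a unit vector at $Q$. Let $S_\rho$ be the disk of radius $\rho$ centered at $Q$ and perpendicular to $\mathbf{u}$, with $C_\rho = \partial S_\rho$. Then

$$\oint_{C_\rho} \mathbf{F}\cdot d\mathbf{s} = \iint_{S_\rho} (\nabla\times\mathbf{F})\cdot\mathbf{u}\,dS = (\operatorname{curl}\mathbf{F}(P)\cdot\mathbf{u})\,\operatorname{Area}(S_\rho), \quad \text{for some } P \in S_\rho,$$

by the mean value theorem for integrals. Note that $S_\rho \to Q$ as $\rho \to 0$. Thus

$$\operatorname{curl}\mathbf{F}(Q)\cdot\mathbf{u} = \lim_{\rho\to 0} \frac{1}{\pi\rho^2} \iint_{S_\rho} (\nabla\times\mathbf{F})\cdot\mathbf{u}\,dS = \lim_{\rho\to 0} \frac{1}{\pi\rho^2} \oint_{C_\rho} \mathbf{F}\cdot d\mathbf{s}.$$

This means that the component of the curl of $\mathbf{F}$ in a given direction $\mathbf{u}$ is the density of the circulation of $\mathbf{F}$ around $\mathbf{u}$. The circulation density is maximum when $\mathbf{u} \parallel \operatorname{curl}\mathbf{F}(Q)$.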

[Figure: two flows $\mathbf{F}$ along a closed curve $C$, one with circulation of $\mathbf{F} = 0$ and one with circulation of $\mathbf{F} \neq 0$.]


When $\mathbf{F}$ is the velocity vector field of a fluid flow, $\operatorname{curl}\mathbf{F}$ is called the vorticity vector.

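Example 10.6.6 The velocity of a fluid flow rotating around the $z$-axis is given by $\mathbf{F} = (-\omega y, \omega x, 0)$, where the constant $\omega$ is the angular velocity of the rotation. Then by a simple computation, we get $\nabla\times\mathbf{F} = (0, 0, 2\omega)$. If $C_\rho$ is a circle of radius $\rho$ bounding a disk $S_\rho$ in the $xy$-plane, then

$$\oint_{C_\rho} \mathbf{F}\cdot d\mathbf{s} = \iint_{S_\rho} (\nabla\times\mathbf{F})\cdot\mathbf{k}\,dS = \iint_{S_\rho} 2\omega\,dA = 2\omega(\pi\rho^2).$$

Thus

$$(\nabla\times\mathbf{F})\cdot\mathbf{k} = 2\omega = \frac{1}{\pi\rho^2} \oint_{C_\rho} \mathbf{F}\cdot d\mathbf{s}. \qquad \square$$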

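Recall that a vector field $\mathbf{F}$ is conservative if its line integral is independent of the paths joining any two points, or the integral along any closed curve is zero (see Theorem 10.3.2 in Section 10.3), and the gradient field is the only conservative field. It is clear that if $\mathbf{F} = \nabla f$ for some potential function $f$ on a domain $U \subseteq \mathbb{R}^3$, then $\nabla\times\nabla f = \mathbf{0}$ as we have seen earlier. Conversely, suppose that $\nabla\times\mathbf{F} = \mathbf{0}$. Then Stokes' Theorem implies that, for any oriented simple closed curve $C$ bounding a surface $S$,

$$\oint_C \mathbf{F}\cdot d\mathbf{s} = \iint_S (\nabla\times\mathbf{F})\cdot d\mathbf{S} = 0,$$

which means $\mathbf{F}$ is the gradient vector field $\nabla f$ of some differentiable function $f$ by Theorem 10.3.2. Note that

$$\operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F} = \left(\frac{\partial P}{\partial y} - \frac{\partial N}{\partial z},\ \frac{\partial M}{\partial z} - \frac{\partial P}{\partial x},\ \frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) = \mathbf{0}$$

$$\iff \frac{\partial P}{\partial y} = \frac{\partial N}{\partial z}, \quad \frac{\partial M}{\partial z} = \frac{\partial P}{\partial x}, \quad \frac{\partial N}{\partial x} = \frac{\partial M}{\partial y}.$$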

Combining this with Theorem 10.3.2, we get the following theorem.

Theorem 10.6.3 Let $\mathbf{F}(x, y, z) = (M, N, P)(x, y, z)$ be a continuous vector field in $U \subseteq \mathbb{R}^3$ except possibly for a finite number of points. Then the following are equivalent:

(1) $\mathbf{F}$ is conservative in $U$.

(2) $\oint_C \mathbf{F}\cdot d\mathbf{s} = 0$ for any closed curve $C$ in $U$.

(3) There is a differentiable function $f : U \to \mathbb{R}$ such that $\mathbf{F} = \nabla f$.

(4) $\operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F} = \mathbf{0}$ in $U$, or $\mathbf{F}$ is irrotational.

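Example 10.6.7 The gravitational force exerted by a mass $M$ at the origin on a mass $m$ at $\vec{\gamma} = (x, y, z)$ is given by

$$-\frac{GmM}{r^2}\,\frac{\vec{\gamma}}{r} = -GmM\,\mathbf{F}, \quad \mathbf{F} = \frac{\vec{\gamma}}{r^3}, \quad r = |\vec{\gamma}| = \sqrt{x^2 + y^2 + z^2},$$

where $G$ is the gravitational constant. This field is not defined at the origin. However, at any point other than the origin,

$$\nabla\times\mathbf{F} = \nabla\!\left(\frac{1}{r^3}\right)\times\vec{\gamma} + \frac{1}{r^3}\,\nabla\times\vec{\gamma}.$$

But $\nabla\!\left(\frac{1}{r^3}\right) = -\frac{3}{r^5}\vec{\gamma}$ implies that the first term vanishes, since $\vec{\gamma}\times\vec{\gamma} = \mathbf{0}$. Since the second term also trivially vanishes, we have $\nabla\times\mathbf{F} = \mathbf{0}$. Now, $\mathbf{F} = \nabla f$ for some $f$. One can easily see that $f(x, y, z) = -\frac{1}{r}$, since $\nabla(r^n) = nr^{n-2}\vec{\gamma}$. $(GmM)f$ is called the gravitational potential energy. $\square$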

10.7 The Divergence Theorem

The 3-dimensional version of the fundamental theorem of calculus, Green's theorem, or Stokes' theorem is the divergence theorem, which we discuss in this section.

Theorem 10.7.1 (Divergence Theorem) Let $U \subseteq \mathbb{R}^3$ be a region in space enclosed by an oriented closed surface $S = \partial U$ with outward normal vector $\mathbf{n}$. Let $\mathbf{F}$ be a smooth vector field on $U$. Then

$$\iint_S \mathbf{F}\cdot d\mathbf{S} = \iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \iiint_U \nabla\cdot\mathbf{F}\,dV = \iiint_U \operatorname{div}\mathbf{F}\,dV.$$

Note that, for a closed surface $S$, Stokes' Theorem reads

$$\iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = \oint_{\emptyset} \mathbf{F}\cdot d\mathbf{s} = 0,$$

since the boundary $C$ of a closed surface $S$ is empty.

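Proof: We assume that $U$ is convex with no holes or bubbles inside, such as a solid ball, cube, or ellipsoid. For the unit outward normal vector $\mathbf{n} = (n_1, n_2, n_3)$, we may write its components as

$$n_1 = \mathbf{n}\cdot\mathbf{i} = \cos\alpha, \quad n_2 = \mathbf{n}\cdot\mathbf{j} = \cos\beta, \quad n_3 = \mathbf{n}\cdot\mathbf{k} = \cos\gamma,$$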


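and so $\mathbf{n} = (\cos\alpha, \cos\beta, \cos\gamma)$. Then

$$\iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \iint_S (M\cos\alpha + N\cos\beta + P\cos\gamma)\,dS,$$

$$\iiint_U \nabla\cdot\mathbf{F}\,dV = \iiint_U \left(\frac{\partial M}{\partial x} + \frac{\partial N}{\partial y} + \frac{\partial P}{\partial z}\right) dx\,dy\,dz.$$

Hence, we need to prove

$$\iint_S M\cos\alpha\,dS = \iiint_U \frac{\partial M}{\partial x}\,dx\,dy\,dz,$$

$$\iint_S N\cos\beta\,dS = \iiint_U \frac{\partial N}{\partial y}\,dx\,dy\,dz,$$

$$\iint_S P\cos\gamma\,dS = \iiint_U \frac{\partial P}{\partial z}\,dx\,dy\,dz.$$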

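We prove the third one: Suppose $S$ is a piecewise smooth surface that is the union of two surfaces

$$S_1 : z = f_1(x, y), \quad \text{for } (x, y) \in D_{xy},$$

$$S_2 : z = f_2(x, y), \quad \text{for } (x, y) \in D_{xy},$$

with $f_1 \le f_2$, where $D_{xy}$ is the projection of $U$ on the $xy$-plane. On $S_2$, we have $\mathbf{p} = \mathbf{k}$ and so $dS = \dfrac{1}{|\cos\gamma|}\,dA = \dfrac{dx\,dy}{\cos\gamma}$. Thus, $\cos\gamma\,dS = dx\,dy$. On $S_1$, we have $\cos\gamma\,dS = -dx\,dy$. Hence,

$$\iint_S P\cos\gamma\,dS = \iint_{S_1} P\cos\gamma\,dS + \iint_{S_2} P\cos\gamma\,dS$$

$$= \iint_{D_{xy}} P(x, y, f_2(x, y))\,dx\,dy - \iint_{D_{xy}} P(x, y, f_1(x, y))\,dx\,dy$$

$$= \iint_{D_{xy}} \left[P(x, y, f_2(x, y)) - P(x, y, f_1(x, y))\right] dx\,dy$$

$$= \iint_{D_{xy}} \left[\int_{f_1(x, y)}^{f_2(x, y)} \frac{\partial P}{\partial z}\,dz\right] dx\,dy = \iiint_U \frac{\partial P}{\partial z}\,dx\,dy\,dz.$$

The proof of the other two equalities follows the same computation. $\square$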

The divergence theorem can be extended to regions that can be partitioned into a finite number of simple regions of the type just discussed. For example, consider the region between two closed surfaces, one inside the other. The boundary of such a region consists of two pieces with oppositely oriented normals. Divide the region by a plane and apply the divergence theorem to each half separately. The surface integrals over the faces cut by the plane cancel, since the outward normals there are in opposite directions. Adding the two equations, we arrive at the required result.

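[Figure: unit normals $\mathbf{n}$, $\mathbf{n}_\rho$, $\mathbf{u}$, $-\mathbf{u}$ on the surfaces bounding the region.]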

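Example 10.7.1 Let $\mathbf{F} = (x, y, z)$, and let $S$ be the sphere $x^2 + y^2 + z^2 = a^2$ bounding the ball $U$ of radius $a$. Then $S$ is the level surface $f(x, y, z) = x^2 + y^2 + z^2 - a^2 = 0$, and so $\mathbf{n} = \dfrac{\nabla f}{|\nabla f|} = \dfrac{1}{a}(x, y, z)$. Thus

$$\iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \iint_S a\,dS = a\iint_S dS = 4\pi a^3.$$

On the other hand, $\operatorname{div}\mathbf{F} = \nabla\cdot\mathbf{F} = 3$ and so

$$\iiint_U \nabla\cdot\mathbf{F}\,dV = 3\iiint_U dV = 3\cdot\frac{4}{3}\pi a^3 = 4\pi a^3. \qquad \square$$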

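Example 10.7.2 Let $S$ be the unit sphere $x^2 + y^2 + z^2 = 1$ bounding the solid unit ball $U$. Evaluate

$$\iint_S (x^2 + y + z)\,dS.$$

Solution: Note that the outward unit normal vector on $S$ is $\mathbf{n} = (x, y, z)$. Thus if we set $\mathbf{F} = (x, 1, 1)$, then $\mathbf{F}\cdot\mathbf{n} = x^2 + y + z$ and $\nabla\cdot\mathbf{F} = 1$. Hence,

$$\iint_S (x^2 + y + z)\,dS = \iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \iiint_U \nabla\cdot\mathbf{F}\,dV = \iiint_U dV = \frac{4}{3}\pi. \qquad \square$$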
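The surface integral in Example 10.7.2 can be approximated directly in spherical coordinates, which gives an independent check of the value $\frac{4}{3}\pi$. The sketch below is only an illustration; the function name `sphere_integral` and the grid sizes are assumptions.

```python
import math

def sphere_integral(f, n_theta=400, n_phi=800):
    """Midpoint rule for the integral of f over the unit sphere, using
    x = sin(th)cos(ph), y = sin(th)sin(ph), z = cos(th), dS = sin(th) dth dph."""
    dth = math.pi / n_theta
    dph = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        s, c = math.sin(th), math.cos(th)
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            total += f(s * math.cos(ph), s * math.sin(ph), c) * s
    return total * dth * dph

surface = sphere_integral(lambda x, y, z: x * x + y + z)
volume = 4.0 * math.pi / 3.0   # volume of the unit ball, the right side of the theorem
print(surface, volume)
```

The terms $y$ and $z$ integrate to zero by symmetry, so only $x^2$ contributes to the surface integral.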

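Note that it is easy to see that $\operatorname{div}(\operatorname{curl}\mathbf{F}) = \nabla\cdot(\nabla\times\mathbf{F}) = 0$ for any $C^2$ vector field $\mathbf{F}$, since again the mixed second partials are equal. Thus for a closed surface $S$ bounding a region $R$ in space, Stokes' theorem and the divergence theorem are written as

$$\iiint_R \operatorname{div}\operatorname{curl}\mathbf{F}\,dV = \iint_S \operatorname{curl}\mathbf{F}\cdot d\mathbf{S} = \oint_{\emptyset} \mathbf{F}\cdot d\mathbf{s} = 0.$$

The following theorem shows that the converse is also true: if $\operatorname{div}\mathbf{F} = 0$, then $\mathbf{F} = \operatorname{curl}\mathbf{G}$ for some $\mathbf{G}$.

Theorem 10.7.2 If $\mathbf{F} = (M, N, P)$ is a $C^1$ vector field on $\mathbb{R}^3$ with $\operatorname{div}\mathbf{F} = 0$, then there exists a $C^1$ vector field $\mathbf{G}$ with $\mathbf{F} = \operatorname{curl}\mathbf{G}$.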

Proof: Define $\mathbf{G} = (G_1, G_2, G_3)$ by

$$G_1(x, y, z) = \int_0^z N(x, y, t)\,dt - \int_0^y P(x, t, 0)\,dt,$$

$$G_2(x, y, z) = -\int_0^z M(x, y, t)\,dt,$$

$$G_3(x, y, z) = 0.$$

Then, by a direct computation, one can show that $\mathbf{F} = \operatorname{curl}\mathbf{G}$. $\square$
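The direct computation can be spot-checked numerically. The following Python sketch builds $\mathbf{G}$ from the formulas in the proof by midpoint quadrature and compares a central-difference curl of $\mathbf{G}$ with $\mathbf{F}$ at one point. The sample field, the test point, and the helper names are all assumptions made for illustration.

```python
def integrate(f, a, b, n=200):
    """Midpoint rule on [a, b] (also valid when b < a)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def F(x, y, z):
    # Sample field chosen for illustration; its divergence y + z - (y + z) is 0.
    return (x * y, y * z, -y * z - z * z / 2)

def G(x, y, z):
    # Vector potential from the proof of Theorem 10.7.2.
    g1 = (integrate(lambda t: F(x, y, t)[1], 0.0, z)
          - integrate(lambda t: F(x, t, 0.0)[2], 0.0, y))
    g2 = -integrate(lambda t: F(x, y, t)[0], 0.0, z)
    return (g1, g2, 0.0)

def curl(V, x, y, z, h=1e-4):
    """Central-difference approximation of curl V at (x, y, z)."""
    def d(i, axis):
        # Partial derivative of component i of V along the given axis.
        p = [x, y, z]
        q = [x, y, z]
        p[axis] += h
        q[axis] -= h
        return (V(*p)[i] - V(*q)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

point = (0.7, -0.4, 1.3)
residual = max(abs(c - f) for c, f in zip(curl(G, *point), F(*point)))
print(residual)
```

A small residual at the test point is consistent with $\mathbf{F} = \operatorname{curl}\mathbf{G}$; it is a check at one point, not a proof.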

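We should warn the reader that, unlike the $\mathbf{F}$ in Theorem 10.6.3, the field $\mathbf{F}$ in Theorem 10.7.2 is not allowed to have an exceptional point. For example, the field $\mathbf{F} = \dfrac{\vec{\gamma}}{r^3}$ in the gravitational force field of Example 10.6.7 has $\operatorname{div}\mathbf{F} = 0$ (see below), and yet there is no $\mathbf{G}$ such that $\mathbf{F} = \operatorname{curl}\mathbf{G}$ (check this).

In fact, the divergence of $\mathbf{F}$ is computed as follows:

$$\frac{\partial r}{\partial x} = \frac{x}{r}, \qquad \frac{\partial M}{\partial x} = \frac{\partial (xr^{-3})}{\partial x} = \frac{1}{r^3} - \frac{3x^2}{r^5}.$$

Similarly,

$$\frac{\partial N}{\partial y} = \frac{1}{r^3} - \frac{3y^2}{r^5}, \qquad \frac{\partial P}{\partial z} = \frac{1}{r^3} - \frac{3z^2}{r^5}.$$

Thus

$$\operatorname{div}\mathbf{F} = \frac{3}{r^3} - \frac{3}{r^5}(x^2 + y^2 + z^2) = \frac{3}{r^3} - \frac{3}{r^5}\,r^2 = 0.$$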

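In electromagnetic theory, the electric field created by a point charge $q$ located at the origin is

$$\mathbf{E}(x, y, z) = \frac{q}{4\pi\varepsilon_0}\,\frac{\vec{\gamma}}{r^3} = \frac{q}{4\pi\varepsilon_0}\,\mathbf{F}, \quad \mathbf{F} = \frac{\vec{\gamma}}{r^3} = \frac{(x, y, z)}{r^3}, \quad r = |\vec{\gamma}| = \sqrt{x^2 + y^2 + z^2},$$

where $\varepsilon_0$ is a physical constant.

Suppose that $S_\rho$ is the sphere of radius $\rho$ centered at the origin with the outward unit normal $\mathbf{n}_\rho = \dfrac{(x, y, z)}{\rho}$. Then

$$\iint_{S_\rho} \mathbf{F}\cdot\mathbf{n}_\rho\,dS = \iint_{S_\rho} \frac{(x, y, z)}{\rho^3}\cdot\frac{(x, y, z)}{\rho}\,dS = \iint_{S_\rho} \frac{x^2 + y^2 + z^2}{\rho^4}\,dS = \iint_{S_\rho} \frac{1}{\rho^2}\,dS = \frac{1}{\rho^2}(4\pi\rho^2) = 4\pi.$$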

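Suppose that $S$ is any closed surface enclosing a region $U \subseteq \mathbb{R}^3$, and $(0, 0, 0) \in U$. Consider the sphere $S_\rho$ of radius $\rho$ centered at the origin, with $\rho$ small enough that $S_\rho$ lies inside $U$. Then $\mathbf{F}$ is differentiable on the region $R$ between the surfaces $S$ and $S_\rho$. Let $\mathbf{n}$ and $\mathbf{u} = -\mathbf{n}_\rho$ be the unit outward normal vectors of $R$ on $S$ and $S_\rho$, respectively. By the divergence theorem,

$$\iint_S \mathbf{F}\cdot\mathbf{n}\,dS + \iint_{S_\rho} \mathbf{F}\cdot\mathbf{u}\,dS = \iiint_R \nabla\cdot\mathbf{F}\,dV = 0.$$

But

$$\iint_S \mathbf{F}\cdot\mathbf{n}\,dS + \iint_{S_\rho} \mathbf{F}\cdot\mathbf{u}\,dS = \iint_S \mathbf{F}\cdot\mathbf{n}\,dS - \iint_{S_\rho} \mathbf{F}\cdot\mathbf{n}_\rho\,dS = 0,$$

and so

$$\iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \iint_{S_\rho} \mathbf{F}\cdot\mathbf{n}_\rho\,dS = 4\pi.$$

This is called Gauss' Law: for any closed surface $S$ containing the origin,

$$\iint_S \mathbf{E}\cdot\mathbf{n}\,dS = \frac{q}{4\pi\varepsilon_0} \iint_S \mathbf{F}\cdot\mathbf{n}\,dS = \frac{q}{\varepsilon_0}.$$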

Recall from Example 10.6.7 that the potential of $\mathbf{F} = \dfrac{(x, y, z)}{r^3}$ is given by

$$f(x, y, z) = -\frac{1}{r}, \quad r = \sqrt{x^2 + y^2 + z^2},$$

and the corresponding electric field due to a point charge $q$ at the origin is

$$\mathbf{E} = \frac{q}{4\pi\varepsilon_0}\,\nabla f = \frac{q}{4\pi\varepsilon_0}\,\frac{\vec{\gamma}}{r^3}.$$

Thus Gauss' law says that the total electric flux $\iint_S \mathbf{E}\cdot\mathbf{n}\,dS$ across a closed surface $S$ is $q/\varepsilon_0$ if the charge lies inside $S$ and zero otherwise.


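Let $B_\rho$ be the ball of radius $\rho$ centered at $P$ in $\mathbb{R}^3$, with $S_\rho = \partial B_\rho$. Then there is a point $Q \in B_\rho$ such that

$$\iint_{S_\rho} \mathbf{F}\cdot\mathbf{n}\,dS = \iiint_{B_\rho} \operatorname{div}\mathbf{F}\,dV = \operatorname{div}\mathbf{F}(Q)\cdot\operatorname{Vol}(B_\rho),$$

or

$$\operatorname{div}\mathbf{F}(P) = \lim_{\rho\to 0} \operatorname{div}\mathbf{F}(Q) = \lim_{\rho\to 0} \frac{1}{\operatorname{Vol}(B_\rho)} \iint_{S_\rho} \mathbf{F}\cdot\mathbf{n}\,dS,$$

which means that $\operatorname{div}\mathbf{F}(P)$ is the density of flux: the rate of net outward flux at $P$ per unit volume. If $\operatorname{div}\mathbf{F}(P) > 0$, $P$ is a source, and if $\operatorname{div}\mathbf{F}(P) < 0$, $P$ is a sink for $\mathbf{F}$. When $\operatorname{div}\mathbf{F} = 0$, $\mathbf{F}$ is called incompressible.

Remark: Green's theorem, Stokes' theorem, and the divergence theorem are among the most basic tools in the study of fluid mechanics and electromagnetism. For further study in this direction, we recommend the book by T. Hughes and J. Marsden, "A Short Course in Fluid Mechanics", Publish or Perish, Inc., 1976, and the references given in that book.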

— Have fun with this topic.

— Cheers for your glorious mathematical life !!!

— The End. —

Chapter 11

Appendix A

11.1 The Gamma Function

There is an important function in many different parts of mathematics, called the gamma function. The theory of the gamma function was developed in connection with the problem of generalizing the factorial function of the natural numbers, that is, the problem of finding an expression for $n!$ for positive integers $n$ and then extending it to $x!$ for arbitrary real numbers $x$. It was introduced in 1729 by L. Euler in a letter to Ch. Goldbach.

Despite the importance of the gamma function, it is often treated in a very sketchy and complicated fashion, while it can be thought of as one of the elementary functions and all of its basic properties can be established using elementary methods of calculus. This is one of the most prominent applications of elementary calculus to a highly sophisticated theory of functions. The exposition here follows the excellent book "The Gamma Function" by Emil Artin, Holt, Rinehart and Winston, 1964. We include some parts of it here since this book treats the gamma function most completely at the calculus level, and it is, new or used, unavailable in the current book markets.

11.1.1 The definition of the gamma function

Note that, for any positive integer $n$, integration by parts gives

$$\int_0^\infty e^{-t}t^n\,dt = n\int_0^\infty e^{-t}t^{n-1}\,dt = n!.$$


This suggests that, to define $x!$ for an arbitrary real number $x$, we may simply replace the integer $n$ by $x$ in the integral. Provided that the integral still converges, we may define $x!$ as the value of the integral on the left side. Rather than doing that, we introduce a function, called the gamma function, that has the value $(n-1)!$ for positive integer $n$:

Definition 11.1.1 The gamma function is defined as

$$\Gamma(x) = \int_0^\infty e^{-t}t^{x-1}\,dt,$$

which is called Euler's second integral.

(1) This improper integral exists for all $x > 0$: Note that the integrand $e^{-t}t^{x-1}$ is smaller than $t^{x-1}$ for all $t > 0$ since $e^{-t} < 1$. Thus

$$\int_\varepsilon^1 e^{-t}t^{x-1}\,dt < \int_\varepsilon^1 t^{x-1}\,dt = \frac{1}{x} - \frac{\varepsilon^x}{x} < \frac{1}{x},$$

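for $x > 0$. Holding $x$ fixed, the value of the integral increases monotonically as $\varepsilon \to 0$. That is,

$$\int_0^1 e^{-t}t^{x-1}\,dt = \lim_{\varepsilon\to 0} \int_\varepsilon^1 e^{-t}t^{x-1}\,dt$$

exists for any $x > 0$.

When $t > 0$, $e^t = \sum_{n=0}^\infty \frac{t^n}{n!} > \frac{t^n}{n!}$ for every integer $n \ge 0$. Thus, $e^{-t} < \frac{n!}{t^n}$, and so $e^{-t}t^{x-1} < \frac{n!}{t^{n+1-x}}$. Hence, holding $x$ fixed and choosing $n > x + 1$, for any $\delta > 1$ the following integral has an upper bound:

$$\int_1^\delta e^{-t}t^{x-1}\,dt < \int_1^\delta \frac{n!}{t^{n+1-x}}\,dt = n!\left[\frac{-1}{n-x}\,\frac{1}{t^{n-x}}\right]_1^\delta < \frac{n!}{n-x}.$$

However, the value of this integral increases as $\delta$ increases, and so

$$\int_1^\infty e^{-t}t^{x-1}\,dt = \lim_{\delta\to\infty} \int_1^\delta e^{-t}t^{x-1}\,dt$$

exists. Therefore, the definition of $\Gamma(x)$ is meaningful for all $x > 0$.

(2) If we replace $x$ by $x + 1$,

$$\int_\varepsilon^\delta e^{-t}t^x\,dt = -e^{-t}t^x\Big|_\varepsilon^\delta + x\int_\varepsilon^\delta e^{-t}t^{x-1}\,dt = e^{-\varepsilon}\varepsilon^x - e^{-\delta}\delta^x + x\int_\varepsilon^\delta e^{-t}t^{x-1}\,dt.$$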


The first two terms approach zero as $\varepsilon \to 0$ and $\delta \to \infty$. Thus,

$$\Gamma(x + 1) = x\Gamma(x). \tag{11.1}$$

This is the most basic relation for the development of the rest of the theory. Clearly

$$\Gamma(1) = \int_0^\infty e^{-t}\,dt = 1,$$

$$\Gamma(n + 1) = \int_0^\infty e^{-t}t^n\,dt = n\int_0^\infty e^{-t}t^{n-1}\,dt = n\Gamma(n) = n!,$$

for any positive integer $n$. Hence, Equation (11.1) represents a generalization of $n! = n(n-1)!$ to $x!$ for non-integer values of $x$.

(3) Suppose the value of the gamma function is known on the interval $0 < x \le 1$. Then by Equation (11.1), its value on the interval $1 < x \le 2$ can be easily computed, and then on the next interval of length 1, and so on:

$$\Gamma(x + n) = (x + n - 1)(x + n - 2)\cdots(x + 1)\,x\,\Gamma(x), \quad \forall\, n > 0.$$

(4) For $x < 0$ with $-n < x < -n + 1$, or $0 < x + n < 1$, we define

$$\Gamma(x) = \frac{1}{x(x + 1)\cdots(x + n - 1)}\,\Gamma(x + n).$$

For $x$ a negative integer or 0, this expression is meaningless, and so $\Gamma(x)$ is undefined at those particular numbers. Otherwise, $\Gamma(x)$ is well-defined since $0 < x + n < 1$.
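The extension in (4) is easy to carry out in code. The following Python sketch shifts a negative non-integer $x$ into $(0, 1)$ and divides by the product $x(x+1)\cdots(x+n-1)$; the function name `gamma_extended` is hypothetical, and the standard library function `math.gamma` is used both for positive arguments and as an independent comparison.

```python
import math

def gamma_extended(x):
    """Gamma(x) for non-integer x < 0 via item (4): shift x into (0, 1) and divide."""
    if x > 0:
        return math.gamma(x)
    if x == math.floor(x):
        raise ValueError("Gamma is undefined at 0 and the negative integers")
    n = math.floor(-x) + 1          # smallest n with 0 < x + n < 1
    denom = 1.0
    for k in range(n):              # x (x + 1) ... (x + n - 1)
        denom *= x + k
    return math.gamma(x + n) / denom

for x in (-0.5, -1.5, -2.3):
    print(x, gamma_extended(x), math.gamma(x))
```

The two columns should agree, and the signs alternate from one interval $(-n, -n+1)$ to the next because each extra negative factor in the product flips the sign.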

11.1.2 Uniqueness of the gamma function

For an arbitrary function $f(x)$ defined on $(0, 1]$, define, for every integer $n > 0$,

$$f(x + n) = (x + n - 1)(x + n - 2)\cdots(x + 1)\,x\,f(x), \quad 0 < x \le 1,$$

$$f(x) = \frac{1}{x(x + 1)\cdots(x + n - 1)}\,f(x + n), \quad -n < x < -n + 1.$$

Then $f(x)$ is defined for all real numbers except for 0 and the negative integers, and satisfies the factorial property $f(x + 1) = xf(x)$. This means that our definition of the gamma function seems rather arbitrary, since the factorial function can be extended in many different ways. The property that singles out the gamma function from all the other possible functions with this factorial property is that it is log convex: $y = \log\Gamma(x)$ is very smooth and convex. For a discussion of convex functions, see Section 11.2. By Theorem 11.2.11, one can easily see that $\Gamma(x)$ is log convex by taking $\varphi(t) = e^{-t}$.

Theorem 11.1.1 Let $f(x)$ be a function on an interval $I$ satisfying the following three conditions:

(1) $f(x + 1) = xf(x)$,

(2) The domain $I$ of $f$ contains all $x > 0$, and $f$ is log convex for those $x$,

(3) $f(1) = 1$.

Then $f = \Gamma$.

Proof: We have seen that $\Gamma$ satisfies the three conditions. Let $f(x)$ be another function satisfying the three conditions. By (1) and (3), $f(n) = (n-1)!$ for all positive integers $n$. By condition (1), it suffices to show $f = \Gamma$ on $(0, 1]$. Since condition (2) means that $\ln f$ is convex, for $x \in (0, 1]$ and an integer $n \ge 2$, we have the monotone growth of the difference quotients

$$\frac{\ln f(-1 + n) - \ln f(n)}{(-1 + n) - n} \le \frac{\ln f(x + n) - \ln f(n)}{(x + n) - n} \le \frac{\ln f(1 + n) - \ln f(n)}{(1 + n) - n}.$$

$$\therefore\ \ln(n - 1) \le \frac{\ln f(x + n) - \ln (n - 1)!}{x} \le \ln n,$$

or

$$\ln\left[(n - 1)^x (n - 1)!\right] \le \ln f(x + n) \le \ln\left[n^x (n - 1)!\right].$$

Since $\log$ is a monotonic function,

$$(n - 1)^x (n - 1)! \le f(x + n) \le n^x (n - 1)!.$$

From $f(x + n) = (x + n - 1)(x + n - 2)\cdots(x + 1)\,x\,f(x)$,

$$\frac{(n - 1)^x (n - 1)!}{x(x + 1)\cdots(x + n - 1)} \le f(x) \le \frac{n^x (n - 1)!}{x(x + 1)\cdots(x + n - 1)} = \frac{n^x n!}{x(x + 1)\cdots(x + n - 1)(x + n)}\cdot\frac{x + n}{n},$$

which holds for all $n \ge 2$. Thus, by replacing $n$ by $n + 1$ on the left side, we get

$$\frac{n^x n!}{x(x + 1)\cdots(x + n)} \le f(x) \le \frac{n^x n!}{x(x + 1)\cdots(x + n)}\cdot\frac{x + n}{n}.$$

Thus

$$f(x)\,\frac{n}{x + n} \le \frac{n^x n!}{x(x + 1)\cdots(x + n)} \le f(x).$$

As $n \to \infty$, we obtain

$$f(x) = \lim_{n\to\infty} \frac{n^x n!}{x(x + 1)\cdots(x + n)}.$$

Since $\Gamma$ also satisfies all three conditions, this equation is also valid for $\Gamma$: i.e., we have the Gauss formula

$$\Gamma(x) = \lim_{n\to\infty} \frac{n^x n!}{x(x + 1)\cdots(x + n)}, \tag{11.2}$$

for $x \in (0, 1]$. Thus $\Gamma(x) = f(x)$ for $x \in (0, 1]$. $\square$

Equation (11.2) is only proved on $(0, 1]$. To show that it holds in general, we define

$$\Gamma_n(x) \equiv \frac{n^x n!}{x(x + 1)\cdots(x + n)}.$$

Then

$$\Gamma_n(x + 1) = x\,\frac{n}{x + n + 1}\,\Gamma_n(x), \quad \text{or} \quad \Gamma_n(x) = \frac{1}{x}\,\frac{x + n + 1}{n}\,\Gamma_n(x + 1).$$

This means that if $\lim_{n\to\infty} \Gamma_n(x)$ exists for a number $x$, then it also exists for $x + 1$, and conversely. Hence, the limit exists for exactly those values of $x$ for which $\Gamma(x)$ is defined. If we denote the limit in Equation (11.2) by $f(x)$, we get $f(x + 1) = xf(x)$. Since we have shown that $f(x) = \Gamma(x)$ on $(0, 1]$, it must also agree everywhere else.
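The convergence of $\Gamma_n(x)$ to $\Gamma(x)$ is slow, which a small experiment makes visible. The sketch below evaluates $\Gamma_n(x)$ through logarithms to avoid overflow in $n!$; the function name `gauss_gamma` and the sample values of $x$ and $n$ are illustrative choices, and `math.gamma` serves as the reference value.

```python
import math

def gauss_gamma(x, n):
    """Gamma_n(x) = n^x n! / (x (x+1) ... (x+n)), evaluated with logarithms (x > 0)."""
    log_val = x * math.log(n) + math.lgamma(n + 1)   # ln(n^x) + ln(n!)
    for k in range(n + 1):
        log_val -= math.log(x + k)
    return math.exp(log_val)

for x in (0.5, 2.5):
    for n in (10, 1000, 100000):
        print(x, n, gauss_gamma(x, n), math.gamma(x))
```

For $x \in (0, 1]$ the printed values also illustrate the two-sided bound $\Gamma(x)\,\frac{n}{x+n} \le \Gamma_n(x) \le \Gamma(x)$ from the proof of Theorem 11.1.1.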

Remark: Originally, Euler used Equation (11.2) for the definition of the gamma function, and derived the integral representation, Euler's second integral. E. Artin chose the integral form for the definition because this approach saves us the trouble of proving the convergence of Gauss's formula. The symbol $\Gamma(x)$ and the name gamma function were first proposed in 1814 by A. M. Legendre.

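Weierstrass formula: A simple manipulation shows

$$\Gamma_n(x) = \frac{n^x n!}{x(x + 1)\cdots(x + n)} = \frac{n^x}{x\left(1 + \frac{x}{1}\right)\left(1 + \frac{x}{2}\right)\cdots\left(1 + \frac{x}{n}\right)} = e^{x\left(\ln n - 1 - \frac{1}{2} - \cdots - \frac{1}{n}\right)}\,\frac{e^{x/1}\,e^{x/2}\cdots e^{x/n}}{x\left(1 + \frac{x}{1}\right)\left(1 + \frac{x}{2}\right)\cdots\left(1 + \frac{x}{n}\right)}.$$

We have seen in Example 3.1.6 that Euler's constant

$$C = \lim_{n\to\infty}\left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right)$$

exists. Hence,

$$\Gamma(x) = \lim_{n\to\infty} \Gamma_n(x) = e^{-Cx}\,\frac{1}{x}\,\lim_{n\to\infty} \prod_{k=1}^n \frac{e^{x/k}}{1 + \frac{x}{k}} = e^{-Cx}\,\frac{1}{x}\,\prod_{k=1}^\infty \frac{e^{x/k}}{1 + \frac{x}{k}},$$

which is called the Weierstrass formula.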

11.1.3 Differentiability of the gamma function

The gamma function $\Gamma(x)$ is differentiable infinitely many times: By equation (11.1), it suffices to prove the assertion for $x > 0$. $\ln\Gamma(x)$ is well-defined since $\Gamma(x) > 0$ for $x > 0$, and by the Weierstrass formula,

$$\ln\Gamma(x) = -Cx - \ln x + \lim_{n\to\infty} \sum_{k=1}^n \left(\frac{x}{k} - \ln\left(1 + \frac{x}{k}\right)\right) = -Cx - \ln x + \sum_{k=1}^\infty \left(\frac{x}{k} - \ln\left(1 + \frac{x}{k}\right)\right).$$

Differentiating the series on the right side term by term gives

$$-C - \frac{1}{x} + \sum_{k=1}^\infty \left(\frac{1}{k} - \frac{1}{x + k}\right) = -C - \frac{1}{x} + \sum_{k=1}^\infty \frac{x}{k(x + k)}.$$

For an arbitrary $r$ and $x$ with $0 < x \le r$, the bound $0 \le \frac{x}{k(x + k)} \le \frac{x}{k^2} \le \frac{r}{k^2}$ implies that the above series converges uniformly in $x$ on $(0, r]$. Thus the series of term-by-term integrals, which is $\ln\Gamma(x)$, also converges. This means that $\ln\Gamma(x)$ is differentiable on the interval $(0, r]$ and

$$\frac{d}{dx}\ln\Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)} = -C - \frac{1}{x} + \sum_{k=1}^\infty \left(\frac{1}{k} - \frac{1}{x + k}\right).$$

Since $r$ was arbitrary, this derivative exists for all $x > 0$.

Take the derivative once again:

$$\frac{d^2}{dx^2}\ln\Gamma(x) = \frac{\Gamma''(x)\Gamma(x) - \Gamma'(x)^2}{\Gamma(x)^2} = \sum_{k=0}^\infty \frac{1}{(x + k)^2} > 0, \tag{11.3}$$

which obviously converges for $x > 0$. In general, we have the convergent series

$$\frac{d^n}{dx^n}\ln\Gamma(x) = \frac{d^{n-1}}{dx^{n-1}}\,\frac{\Gamma'(x)}{\Gamma(x)} = \sum_{k=0}^\infty \frac{(-1)^n (n - 1)!}{(x + k)^n}, \quad n \ge 2,\ x > 0.$$

The validity of this equation extends to $x < 0$ by using $\Gamma(x + 1) = x\Gamma(x)$. In particular, for $n = 2$, equation (11.3) gives

$$\Gamma''(x)\Gamma(x) - \Gamma'(x)^2 > 0, \quad \text{or} \quad \Gamma''(x)\Gamma(x) > \Gamma'(x)^2 \ge 0,$$

which means that $\Gamma(x)$ and $\Gamma''(x)$ are either both positive or both negative for all $x$. Thus $|\Gamma(x)|$ is convex. We have $\Gamma(x) > 0$ for $x > 0$, and by the definition of $\Gamma(x)$ for $x < 0$, the sign of $\Gamma(x)$ is $(-1)^n$ on $-n < x < -n + 1$. Note that the formula also shows that $|\Gamma(x)| \to \infty$ as $x$ approaches 0 or a negative integer.

Figure 11.1: The graphs of $\Gamma(x)$ and $\dfrac{1}{\Gamma(x)}$.

For $x > 0$, $\Gamma(x)$ has a unique minimum $0.885603\cdots$ at $x = 1.4616321\cdots$. The local minima of $|\Gamma(x)|$ form a sequence tending to 0 as $x \to -\infty$.
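Since $\ln\Gamma(x)$ is convex for $x > 0$, the location of the minimum can be found by any search method for unimodal functions. The following Python sketch uses a golden-section search on `math.lgamma`; the search routine `golden_min`, the bracket $[1, 2]$, and the tolerance are assumptions made for this illustration.

```python
import math

def golden_min(f, a, b, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            # The minimum lies in [a, d]; reuse c as the new right probe.
            b, d = d, c
            c = b - r * (b - a)
        else:
            # The minimum lies in [c, b]; reuse d as the new left probe.
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2

x_min = golden_min(math.lgamma, 1.0, 2.0)
print(x_min, math.gamma(x_min))
```

The printed pair can be compared with the values $1.4616321\cdots$ and $0.885603\cdots$ quoted above; because the graph is flat near the minimum, only about half of the available floating-point digits of $x$ are reliable.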


11.1.4 The Euler’s first integral

Recall that the integral definition of the gamma function was called Euler's second integral.

Definition 11.1.2 Euler's first integral is defined as a function of two variables, called the Beta function:

$$B(x, y) = \int_0^1 t^{x-1}(1 - t)^{y-1}\,dt. \tag{11.4}$$

Theorem 11.1.2 The integral exists for $x$ and $y$ positive. Thus, by Theorem 11.2.11, it is log convex.

Proof:

$$B(x, y) = \int_0^{1/2} t^{x-1}(1 - t)^{y-1}\,dt + \int_{1/2}^1 t^{x-1}(1 - t)^{y-1}\,dt$$

$$\le \int_0^{1/2} t^{x-1}(1 - t)^{-1}\,dt + \int_{1/2}^1 t^{-1}(1 - t)^{y-1}\,dt$$

$$\le \int_0^{1/2} 2t^{x-1}\,dt + \int_{1/2}^1 2(1 - t)^{y-1}\,dt.$$

Now, apply the same method used to prove the existence of Euler's second integral $\Gamma(x)$. $\square$

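Using integration by parts for $B(x + 1, y)$, we get:

$$B(x + 1, y) = \int_0^1 (1 - t)^{x+y-1}\left(\frac{t}{1 - t}\right)^x dt = \lim_{\delta,\varepsilon\to 0} \int_\varepsilon^{1-\delta} (1 - t)^{x+y-1}\left(\frac{t}{1 - t}\right)^x dt$$

$$= \lim_{\delta,\varepsilon\to 0} \left\{\left[-\frac{(1 - t)^{x+y}}{x + y}\left(\frac{t}{1 - t}\right)^x\right]_\varepsilon^{1-\delta} + \int_\varepsilon^{1-\delta} \frac{x}{x + y}\,(1 - t)^{x+y}\left(\frac{t}{1 - t}\right)^{x-1} \frac{1}{(1 - t)^2}\,dt\right\}$$

$$= \lim_{\delta,\varepsilon\to 0} \left[\frac{(1 - \varepsilon)^y\varepsilon^x - \delta^y(1 - \delta)^x}{x + y}\right] + \frac{x}{x + y}\int_0^1 t^{x-1}(1 - t)^{y-1}\,dt$$

$$= \frac{x}{x + y}\,B(x, y).$$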


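Holding $y$ fixed, define $f(x) = B(x, y)\Gamma(x + y)$. Then $f(x)$ is log convex as the product of two log convex functions, by Theorem 11.2.9. Moreover,

$$f(x + 1) = B(x + 1, y)\Gamma(x + y + 1) = \frac{x}{x + y}\,B(x, y)\,(x + y)\Gamma(x + y) = xB(x, y)\Gamma(x + y) = xf(x).$$

Since $B(1, y) = \int_0^1 (1 - t)^{y-1}\,dt = \frac{1}{y}$,

$$f(1) = \frac{1}{y}\,\Gamma(1 + y) = \Gamma(y).$$

Thus if we set $F(x) = \dfrac{f(x)}{\Gamma(y)}$, a normalization of $f(x)$ by $\Gamma(y)$, then one can easily see that $F(x)$ satisfies the three conditions of Theorem 11.1.1. Hence by the uniqueness of the gamma function,

$$F(x) = \frac{f(x)}{\Gamma(y)} = \Gamma(x), \quad \text{or} \quad B(x, y)\Gamma(x + y) = \Gamma(x)\Gamma(y), \quad \text{or} \quad B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x + y)},$$

for all positive $x$ and $y$. In particular, for $x = \frac{1}{2} = y$,

$$\Gamma\!\left(\tfrac{1}{2}\right)^2 = \int_0^1 \frac{1}{\sqrt{t(1 - t)}}\,dt = 2\int_0^{\pi/2} d\varphi = \pi,$$

where $t = \sin^2\varphi$. Hence,

$$\sqrt{\pi} = \Gamma\!\left(\tfrac{1}{2}\right) = \int_0^\infty \frac{1}{e^t\sqrt{t}}\,dt,$$

$$\Gamma\!\left(n + \tfrac{1}{2}\right) = \left(n - \tfrac{1}{2}\right)\left(n - \tfrac{3}{2}\right)\cdots\left(n - \tfrac{2n - 1}{2}\right)\sqrt{\pi}.$$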
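The identity $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x + y)$ and the half-integer values can be checked with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the midpoint rule is applied only for $x, y \ge 1$, where the integrand of $B(x, y)$ is bounded.

```python
import math

def beta_integral(x, y, n=100000):
    """Midpoint rule for B(x, y); adequate for x, y >= 1 where the integrand is bounded."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (x - 1) * (1 - t) ** (y - 1)
    return total * h

def beta_gamma(x, y):
    """B(x, y) expressed through the gamma function."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

def gamma_half(n):
    """Gamma(n + 1/2) from the product (n - 1/2)(n - 3/2)...(1/2) sqrt(pi)."""
    value = math.sqrt(math.pi)
    for k in range(n):
        value *= k + 0.5
    return value

print(beta_integral(2.5, 3.0), beta_gamma(2.5, 3.0))
print(gamma_half(4), math.gamma(4.5))
```

Each printed pair should agree to many digits.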

11.1.5 Γ(x) for large x

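Note that, in Example 2.4.1, for integers $k = 1, \ldots, n - 1$, we have the inequalities

$$\left(1 + \frac{1}{k}\right)^k < e < \left(1 + \frac{1}{k}\right)^{k+1}.$$

By multiplying them together for $k = 1, \ldots, n - 1$, we get

$$\frac{2}{1}\cdot\frac{3^2}{2^2}\cdot\frac{4^3}{3^3}\cdots\frac{n^{n-1}}{(n - 1)^{n-1}} < e^{n-1} < \frac{2^2}{1^2}\cdot\frac{3^3}{2^3}\cdot\frac{4^4}{3^4}\cdots\frac{n^n}{(n - 1)^n},$$

$$\therefore\ \frac{n^{n-1}}{(n - 1)!} < e^{n-1} < \frac{n^n}{(n - 1)!}, \quad \text{or} \quad \frac{e\,n^{n-1}}{e^n} < (n - 1)! < \frac{e\,n^n}{e^n}.$$

Thus $(n - 1)! = \Gamma(n)$ grows faster than $\dfrac{n^{n-1}}{e^n}$ but not quite as fast as $\dfrac{n^n}{e^n}$. This suggests considering a function

$$f(x) = x^{x - 1/2}\,e^{-x}\,e^{\mu(x)},$$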

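where $\mu(x)$ is to be chosen so that $f(x)$ satisfies the basic properties of $\Gamma(x)$.

(1) Note that

$$\frac{f(x + 1)}{f(x)} = \frac{(x + 1)^{x + 1/2}\,e^{-x-1}\,e^{\mu(x+1)}}{x^{x - 1/2}\,e^{-x}\,e^{\mu(x)}} = \left(1 + \frac{1}{x}\right)^{x + 1/2} x\,e^{-1}\,e^{\mu(x+1) - \mu(x)}.$$

Thus, $f(x + 1) = xf(x)$ if and only if

$$\mu(x) - \mu(x + 1) = \left(x + \frac{1}{2}\right)\ln\left(1 + \frac{1}{x}\right) - 1.$$

Set

$$g(x) \equiv \left(x + \frac{1}{2}\right)\ln\left(1 + \frac{1}{x}\right) - 1.$$

Then

$$\mu(x) = \sum_{n=0}^\infty g(x + n),$$

since

$$\mu(x) - \mu(x + 1) = \sum_{n=0}^\infty g(x + n) - \sum_{n=0}^\infty g(x + n + 1) = \sum_{n=0}^\infty g(x + n) - \sum_{n=1}^\infty g(x + n) = g(x).$$

(2) Note that $x^{x - 1/2}e^{-x}$ is log convex, since

$$\ln\left(x^{x - 1/2}e^{-x}\right) = \left(x - \frac{1}{2}\right)\ln x - x,$$

$$\left(\ln\left(x^{x - 1/2}e^{-x}\right)\right)' = \frac{1}{x}\left(x - \frac{1}{2}\right) + \ln x - 1,$$

$$\left(\ln\left(x^{x - 1/2}e^{-x}\right)\right)'' = \frac{1}{2}\,\frac{1}{x^2} + \frac{1}{x} > 0, \quad \text{for } x > 0.$$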


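(3) $\mu(x)$ is convex, so that $e^{\mu(x)}$ and $f(x)$ are log convex: $\mu$ is convex if $g(x + n)$ is convex for each $n \ge 0$, and indeed

$$g''(x) = \frac{1}{2x^2(x + 1)^2} > 0.$$

(4) $\mu(x) = \sum_{n=0}^\infty g(x + n)$ is convergent: Consider

$$\frac{1}{2}\ln\frac{1 + y}{1 - y} = y + \frac{y^3}{3} + \frac{y^5}{5} + \cdots, \quad \text{for } |y| < 1.$$

Replace $y$ by $\frac{1}{2x + 1} < 1$ for $x > 0$, and multiply the equation by $2x + 1$. Then

$$\left(x + \frac{1}{2}\right)\ln\left(1 + \frac{1}{x}\right) - 1 = g(x) = \frac{1}{3(2x + 1)^2} + \frac{1}{5(2x + 1)^4} + \frac{1}{7(2x + 1)^6} + \cdots > 0.$$

The right side also shows that $g(x)$ is convex, since each term is convex. If we replace $5, 7, 9, \ldots$ by 3, then the right side increases and becomes a geometric series with ratio $\frac{1}{(2x + 1)^2}$, so that its sum is

$$\frac{1/(3(2x + 1)^2)}{1 - 1/(2x + 1)^2} = \frac{1}{12x(x + 1)} = \frac{1}{12x} - \frac{1}{12(x + 1)}.$$

That is,

$$0 < g(x) < \frac{1}{12x} - \frac{1}{12(x + 1)}.$$

Hence,

$$0 < \mu(x) = \sum_{n=0}^\infty g(x + n) < \sum_{n=0}^\infty \left(\frac{1}{12(x + n)} - \frac{1}{12(x + n + 1)}\right) = \frac{1}{12x}.$$

Therefore,

$$\mu(x) = \frac{\theta}{12x},$$

for some $\theta = \theta(x) \in (0, 1)$, which in general depends on $x$.

(5) Hence, for a suitable constant $a$, $af(x) = \Gamma(x)$ by Theorem 11.1.1, so that

$$\Gamma(x) = a\,x^{x - 1/2}\,e^{-x + \mu(x)} = a\,x^{x - 1/2}\,e^{-x + \frac{\theta}{12x}}. \tag{11.5}$$

For integers $x = n$,

$$n! = n\Gamma(n) = a\,n^{n + 1/2}\,e^{-n + \frac{\theta}{12n}}.$$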


11.1.6 Stirling’s formula

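We now determine the exact value of the constant $a$ and some other important constants at the same time.

Let $p$ be a positive integer, and consider the function

$$f(x) = p^x\,\Gamma\!\left(\frac{x}{p}\right)\Gamma\!\left(\frac{x + 1}{p}\right)\cdots\Gamma\!\left(\frac{x + p - 1}{p}\right), \quad x > 0.$$

(1) Since $(\ln p^x)'' = (\ln p)' = 0$ and each $\Gamma\!\left(\frac{x + k}{p}\right)$ is log convex, $f(x)$ is log convex.

(2) Moreover,

$$f(x + 1) = p\,p^x\,\Gamma\!\left(\frac{x + 1}{p}\right)\cdots\Gamma\!\left(\frac{x + p - 1}{p}\right)\Gamma\!\left(\frac{x + p}{p}\right) = p\,p^x\,\Gamma\!\left(\frac{x + 1}{p}\right)\cdots\Gamma\!\left(\frac{x + p - 1}{p}\right)\frac{x}{p}\,\Gamma\!\left(\frac{x}{p}\right) = xf(x).$$

(3) Hence, for some constant $a_p$,

$$a_p\Gamma(x) = f(x) = p^x\,\Gamma\!\left(\frac{x}{p}\right)\Gamma\!\left(\frac{x + 1}{p}\right)\cdots\Gamma\!\left(\frac{x + p - 1}{p}\right). \tag{11.6}$$

If we take $x = 1$, then

$$a_p = p\,\Gamma\!\left(\frac{1}{p}\right)\Gamma\!\left(\frac{2}{p}\right)\cdots\Gamma\!\left(\frac{p}{p}\right).$$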

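For $x = \frac{k}{p}$ in the Gauss formula,

$$\Gamma\!\left(\frac{k}{p}\right) = \lim_{n\to\infty} \frac{n^{k/p}\,n!}{\frac{k}{p}\left(\frac{k}{p} + 1\right)\cdots\left(\frac{k}{p} + n\right)} = \lim_{n\to\infty} \frac{n^{k/p}\,n!\,p^{n+1}}{k(k + p)\cdots(k + np)}.$$

Since, for $k = 1, \ldots, p$,

$$\prod_{k=1}^p k(k + p)\cdots(k + np) = [1\cdot 2\cdots p]\,[(1 + p)\cdots(p + p)]\cdots[(1 + np)\cdots(p + np)] = (p + np)! = ((n + 1)p)!,$$

we get

$$a_p = p\,\Gamma\!\left(\frac{1}{p}\right)\Gamma\!\left(\frac{2}{p}\right)\cdots\Gamma\!\left(\frac{p}{p}\right) = p\lim_{n\to\infty} \frac{(n!)^p\,p^{p(n+1)}\,n^{\frac{1}{p} + \cdots + \frac{p}{p}}}{(p + np)!} = p\lim_{n\to\infty} \frac{(n!)^p\,p^{p(n+1)}\,n^{\frac{p+1}{2}}}{(p + np)!}.$$

We now multiply $a_p$ by the well-known formula (see Example 7.3.4)

$$1 = \lim_{n\to\infty} \left(1 + \frac{1}{np}\right)\left(1 + \frac{2}{np}\right)\cdots\left(1 + \frac{p}{np}\right) = \lim_{n\to\infty} \frac{(p + np)!}{(np)!\,(np)^p},$$

to get

$$a_p = p\lim_{n\to\infty} \frac{(n!)^p\,p^{p(n+1)}\,n^{\frac{p+1}{2}}}{(p + np)!}\cdot 1 = p\lim_{n\to\infty} \frac{(n!)^p\,p^{p(n+1)}\,n^{\frac{p+1}{2}}}{(p + np)!}\cdot\frac{(p + np)!}{(np)!\,(np)^p} = p\lim_{n\to\infty} \frac{(n!)^p\,p^{np}}{(np)!\,n^{\frac{p-1}{2}}}.$$

Since

$$(n!)^p = a^p\,n^{pn + p/2}\,e^{-np + \frac{\theta_1 p}{12n}}, \qquad (np)! = a\,(np)^{pn + 1/2}\,e^{-np + \frac{\theta_2}{12np}},$$

we obtain

$$a_p = \sqrt{p}\;a^{p-1}\lim_{n\to\infty} e^{\frac{\theta_1 p}{12n} - \frac{\theta_2}{12np}} = \sqrt{p}\;a^{p-1}.$$

For $p = 2$, $a_2 = \sqrt{2}\,a$, while $a_2 = 2\,\Gamma\!\left(\frac{1}{2}\right)\Gamma(1) = 2\sqrt{\pi}$ by the definition. Thus

$$a = \sqrt{2\pi} \quad \text{and} \quad a_p = \sqrt{p}\,\left(\sqrt{2\pi}\right)^{p-1}.$$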

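Now from equations (11.5) and (11.6), we obtain Stirling's formula and Gauss' product formula:

$$\Gamma(x) = \sqrt{2\pi}\;x^{x - 1/2}\,e^{-x + \frac{\theta}{12x}}, \quad 0 < \theta < 1, \tag{11.7}$$

$$\left(\sqrt{2\pi}\right)^{p-1}\Gamma(x) = p^{x - 1/2}\,\Gamma\!\left(\frac{x}{p}\right)\Gamma\!\left(\frac{x + 1}{p}\right)\cdots\Gamma\!\left(\frac{x + p - 1}{p}\right). \tag{11.8}$$

In particular, for $p = 2$, equation (11.8) reduces to

$$\frac{\sqrt{\pi}}{2^{x-1}}\,\Gamma(x) = \Gamma\!\left(\frac{x}{2}\right)\Gamma\!\left(\frac{x + 1}{2}\right), \tag{11.9}$$

which is called Legendre's relation.

For a positive integer $n$, Stirling's formula (11.7) becomes

$$n! = n(n - 1)! = n\Gamma(n) = \sqrt{2\pi}\;n^{n + 1/2}\,e^{-n + \frac{\theta}{12n}}.$$

And, by replacing $x$ by $px$ in equation (11.8) and by $2x$ in equation (11.9), we get

$$\Gamma(px) = \frac{p^{px - 1/2}}{\left(\sqrt{2\pi}\right)^{p-1}}\,\Gamma(x)\,\Gamma\!\left(x + \frac{1}{p}\right)\cdots\Gamma\!\left(x + \frac{p - 1}{p}\right),$$

$$\Gamma(2x) = \frac{2^{2x-1}}{\sqrt{\pi}}\,\Gamma(x)\,\Gamma\!\left(x + \frac{1}{2}\right),$$

which gives rise to the name Gauss' "product formula". These formulas describe the behavior of $\Gamma(x)$ for large values of $x$: with our approximation of $\mu(x)$, the accuracy of Stirling's formula for $\Gamma(x)$ increases as $x$ increases. For $n!$, the accuracy is high enough if $n \ge 10$.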
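The bound $0 < \theta < 1$ in Stirling's formula (11.7) can be observed numerically by solving the formula for $\theta$. The Python sketch below does this with `math.lgamma`; the function name `stirling_theta` and the sample points are illustrative assumptions.

```python
import math

def stirling_theta(x):
    """theta in Gamma(x) = sqrt(2 pi) x^(x - 1/2) e^(-x + theta / (12 x)), for x > 0."""
    log_main = 0.5 * math.log(2 * math.pi) + (x - 0.5) * math.log(x) - x
    return 12 * x * (math.lgamma(x) - log_main)

for x in (0.5, 1.0, 10.0, 100.0):
    print(x, stirling_theta(x))

# Ratio of sqrt(2 pi) n^(n + 1/2) e^(-n), i.e. Stirling without the theta term, to n! at n = 10.
approx_10 = math.sqrt(2 * math.pi) * 10 ** 10.5 * math.exp(-10)
ratio_10 = approx_10 / math.factorial(10)
print(ratio_10)
```

The computed $\theta$ varies with $x$, and the ratio for $n = 10$ equals $e^{-\theta/(12n)}$, so it lies just below 1.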

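In summary, we have the following expressions of $\Gamma(x)$:

$$\mu(x) = \sum_{n=0}^\infty g(x + n) = \sum_{n=0}^\infty \left[\left(x + n + \frac{1}{2}\right)\ln\left(1 + \frac{1}{x + n}\right) - 1\right] = \frac{\theta}{12x}, \quad 0 < \theta < 1,$$

$$\begin{aligned}
\Gamma(x) &= \int_0^\infty e^{-t}t^{x-1}\,dt, && \text{Euler's second integral,} \\
&= \lim_{n\to\infty} \frac{n^x n!}{x(x + 1)\cdots(x + n)}, && \text{Gauss formula,} \\
&= e^{-Cx}\,\frac{1}{x}\,\prod_{k=1}^\infty \frac{e^{x/k}}{1 + \frac{x}{k}}, && \text{Weierstrass formula,} \\
&= \sqrt{2\pi}\;x^{x - 1/2}\,e^{-x + \frac{\theta}{12x}}, \quad 0 < \theta < 1, && \text{Stirling's formula,} \\
&= \frac{p^{x - 1/2}}{\left(\sqrt{2\pi}\right)^{p-1}}\,\Gamma\!\left(\frac{x}{p}\right)\Gamma\!\left(\frac{x + 1}{p}\right)\cdots\Gamma\!\left(\frac{x + p - 1}{p}\right), && \text{Gauss product,}
\end{aligned}$$

$$n! = \sqrt{2\pi}\;n^{n + 1/2}\,e^{-n + \frac{\theta}{12n}}, \quad \text{Stirling's formula,}$$

where $C = \lim_{n\to\infty}\left(1 + \frac{1}{2} + \cdots + \frac{1}{n} - \ln n\right)$ in the Weierstrass formula is Euler's constant.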

11.1.7 The connection with sin x

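The gamma function satisfies another important functional equation. We define

$$\varphi(x) = \Gamma(x)\Gamma(1 - x)\sin\pi x, \quad \text{for non-integer } x.$$

(1) Since $\Gamma(1 - x) = -x\Gamma(-x)$ and $\sin\pi(1 + x) = \sin(\pi + \pi x) = -\sin\pi x$,

$$\varphi(x + 1) = \Gamma(x + 1)\Gamma(-x)\sin\pi(x + 1) = x\Gamma(x)\,\frac{\Gamma(1 - x)}{-x}\,(-\sin\pi x) = \Gamma(x)\Gamma(1 - x)\sin\pi x = \varphi(x),$$

which shows that $\varphi(x)$ is periodic of period 1.

(2) From the Legendre relation:

$$\Gamma\!\left(\frac{x}{2}\right)\Gamma\!\left(\frac{x + 1}{2}\right) = b\,2^{-x}\,\Gamma(x), \quad b = 2\sqrt{\pi},$$

$$\Gamma\!\left(\frac{1 - x}{2}\right)\Gamma\!\left(1 - \frac{x}{2}\right) = b\,2^{x-1}\,\Gamma(1 - x).$$

Hence, for non-integral values of $x$,

$$\varphi\!\left(\frac{x}{2}\right)\varphi\!\left(\frac{x + 1}{2}\right) = \Gamma\!\left(\frac{x}{2}\right)\Gamma\!\left(1 - \frac{x}{2}\right)\sin\frac{\pi x}{2}\;\Gamma\!\left(\frac{x + 1}{2}\right)\Gamma\!\left(\frac{1 - x}{2}\right)\cos\frac{\pi x}{2} = \frac{b^2}{4}\,\Gamma(x)\Gamma(1 - x)\sin\pi x = d\,\varphi(x), \quad d = \frac{b^2}{4} = \pi.$$

(3) $\varphi(x)$ is differentiable infinitely many times since so are $\Gamma(x)$ and $\sin\pi x$. Moreover,

$$\varphi(x) = \Gamma(x)\Gamma(1 - x)\sin\pi x = \frac{1}{x}\,\Gamma(1 + x)\Gamma(1 - x)\sin\pi x = \Gamma(1 + x)\Gamma(1 - x)\left(\pi - \frac{\pi^3x^2}{3!} + \frac{\pi^5x^4}{5!} - \cdots\right),$$

where the right side converges for all $x$, especially for $x = 0$, and represents a function having derivatives of all orders at this point.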

(4) Thus, we can extend $\varphi$ to $x = 0$ by setting $\varphi(0) = \pi$. Since $\varphi$ is periodic by (1), we define $\varphi(x)$ to be $\pi$ for all integers $x$, so that $\varphi(x)$ is continuous everywhere and has derivatives of all orders at every point. The validity of the functional equation in (2) for all $x$ follows from the continuity. By definition, $\varphi(x) > 0$ for $0 < x < 1$, and hence for all $x$ by $\varphi(x + 1) = \varphi(x)$. In general, we have the following.


Theorem 11.1.3 Every positive periodic function $\psi(x)$ of period $p$ that has a continuous second derivative and satisfies the functional equation

$$\psi\!\left(\frac{x}{2}\right)\psi\!\left(\frac{x + 1}{2}\right) = d\,\psi(x), \quad d \text{ a constant},$$

is a constant. In particular, $\varphi(x) = \pi$ for all $x$.

Proof: Define $g(x) = \dfrac{d^2}{dx^2}\ln\psi(x) = \dfrac{\psi''(x)\psi(x) - \psi'(x)^2}{\psi(x)^2}$, which is also periodic of period $p$. From the logarithm of the equation $\psi\!\left(\frac{x}{2}\right)\psi\!\left(\frac{x + 1}{2}\right) = d\,\psi(x)$, one can easily get

$$\frac{1}{4}\left[g\!\left(\frac{x}{2}\right) + g\!\left(\frac{x + 1}{2}\right)\right] = g(x).$$

Since $g(x)$ is continuous on $[0, p]$, it is bounded on this interval by, say, $M$: $|g(x)| \le M$, which holds for all $x$ since $g(x)$ is periodic. Then

$$|g(x)| \le \frac{1}{4}\left|g\!\left(\frac{x}{2}\right)\right| + \frac{1}{4}\left|g\!\left(\frac{x + 1}{2}\right)\right| \le \frac{M}{4} + \frac{M}{4} = \frac{M}{2}.$$

By repeating this process, the upper bound of $|g(x)|$ can be made as small as we please. That is, $g(x) = (\ln\psi(x))'' = 0$. This means that $\ln\psi(x) = ax + b$ for some constants $a$, $b$. Since $\ln\psi(x)$ is periodic, it must be a constant: $\ln\psi(x) = b$, and so $\psi(x)$ is a constant. In particular, since $\varphi(0) = \pi$, we have $\varphi(x) = \pi$ for all $x$. $\square$

(5) Since π = ϕ(x) = Γ(x)Γ(1− x) sinπx, we get

Γ(x)Γ(1− x) =π

sinπx, (11.10)

which is called the Euler’s functional equation. For x = 12 , we get a

new proof of Γ(12) =

√π. In this proof, the exact value of the constant in

Legebdre’s relation was never used, that is, this proof is independent of thearguments in Sections 11.1.5 and 11.1.6.

(6) Euler's functional equation in (5) can be rewritten in the form
\[ \sin\pi x = \frac{\pi}{-x\,\Gamma(x)\,\Gamma(-x)}. \]
By using the Weierstrass product formula for $\Gamma(x)$ and $\Gamma(-x)$, we get the following representation of $\sin\pi x$ as an infinite product:
\[ \sin\pi x = \pi x\prod_{k=1}^{\infty}\left(1 - \frac{x^2}{k^2}\right). \]

For the importance of this expression in analysis, we refer the reader to books on function theory.
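The infinite product can be explored numerically by truncating it after finitely many factors. The sketch below is an illustration under that assumption; the function name `sine_product` and the choice of 100000 factors are ours, and the truncated product converges slowly (the relative error is roughly $x^2$ divided by the number of factors).

```python
import math

def sine_product(x, terms):
    """Partial product  πx · ∏_{k=1}^{terms} (1 - x²/k²)."""
    p = math.pi * x
    for k in range(1, terms + 1):
        p *= 1.0 - (x * x) / (k * k)
    return p

x = 0.3
for terms in (10, 1000, 100000):
    # Compare the truncated product with sin(πx).
    print(terms, sine_product(x, terms), math.sin(math.pi * x))
```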

Remark: The exceptional importance of the gamma function in mathematical analysis is due to the fact that it can be used to express a large number of definite integrals, infinite products, and sums of series; the Beta function is one example. Moreover, it is widely used in the theory of special functions, such as the hypergeometric function or the cylinder functions, and in analytic number theory.

11.1.8 Applications to definite integrals

Here, we mention a few of the important results on Euler's two integrals.

(1) Take the substitution $e^{-t} = \tau$ in Euler's second integral. Then $-e^{-t}\,dt = d\tau$ and $t = \ln\frac{1}{\tau}$, and so we get, by replacing $\tau$ with $t$ again,
\[ \Gamma(x) = \int_0^1 \left(\ln\frac{1}{t}\right)^{x-1} dt. \]

(2) Take the substitution $t^x = \tau$ in the same integral. Then $x\,t^{x-1}\,dt = d\tau$ and $t = \tau^{1/x}$, and so we get, by replacing $\tau$ with $t$ again,
\[ \Gamma(x) = \int_0^\infty e^{-t^{1/x}}\,\frac{1}{x}\,dt. \]

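(3) From (2), $\Gamma(\frac{1}{x}) = \int_0^\infty e^{-t^x}\,x\,dt$. Thus
\[ \Gamma\!\left(1 + \frac{1}{x}\right) = \frac{1}{x}\,\Gamma\!\left(\frac{1}{x}\right) = \frac{1}{x}\int_0^\infty e^{-t^x}\,x\,dt = \int_0^\infty e^{-t^x}\,dt. \]
For $x = 2$,
\[ \int_0^\infty e^{-t^2}\,dt = \frac12\,\Gamma\!\left(\frac12\right) = \frac12\sqrt{\pi}. \]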
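The formula $\int_0^\infty e^{-t^x}\,dt = \Gamma(1 + \frac1x)$ can be checked with a simple quadrature. The following is a rough sketch, not a careful numerical method: it assumes a midpoint rule on a truncated interval $[0, 40]$, which is adequate here because the integrand is negligible beyond that point for $x \ge 1$. The function name and parameters are illustrative.

```python
import math

def integral_exp_power(x, upper=40.0, steps=200_000):
    """Midpoint-rule estimate of the integral of exp(-t**x) over [0, upper]."""
    h = upper / steps
    return h * sum(math.exp(-((k + 0.5) * h) ** x) for k in range(steps))

# x = 2 reproduces the Gaussian integral sqrt(pi)/2.
print(integral_exp_power(2.0), math.sqrt(math.pi) / 2)

# Other exponents can be compared with Γ(1 + 1/x).
for x in (1.0, 3.0):
    print(x, integral_exp_power(x), math.gamma(1.0 + 1.0 / x))
```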

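(4) For $a > 0$, take the substitution $t = a\tau$ in the same integral. Then $dt = a\,d\tau$, and by replacing $\tau$ with $t$ again, we get
\[ \Gamma(x) = a^x\int_0^\infty e^{-at}\,t^{x-1}\,dt, \quad\text{or}\quad \frac{\Gamma(x)}{a^x} = \int_0^\infty e^{-at}\,t^{x-1}\,dt. \]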

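(5) Take the substitution $t = \frac{\tau}{\tau+1}$ in Euler's first integral. Then $dt = (\tau+1)^{-2}\,d\tau$, $1 - t = \frac{1}{\tau+1}$, and $\tau = \frac{t}{1-t}$. By replacing $\tau$ with $t$ again, we get
\[ B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt = \int_0^\infty \frac{t^{x-1}}{(1+t)^{x+y}}\,dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}. \]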


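(6) Take the substitution $t = \sin^2\varphi$ in Euler's first integral. Then $dt = 2\sin\varphi\cos\varphi\,d\varphi$, $1 - t = \cos^2\varphi$, and we get
\[ B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt = 2\int_0^{\pi/2} (\sin\varphi)^{2x-1}(\cos\varphi)^{2y-1}\,d\varphi = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)}. \]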
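The trigonometric form is convenient for numerical checks because, for $x, y \ge \frac12$, the integrand is bounded on $[0, \frac{\pi}{2}]$. The sketch below compares a midpoint-rule estimate with the gamma-function expression; the names `beta_trig` and `beta_gamma` and the step count are assumptions made for this illustration.

```python
import math

def beta_trig(x, y, steps=100_000):
    """Midpoint estimate of 2 * integral over [0, pi/2] of sin^(2x-1) * cos^(2y-1)."""
    h = (math.pi / 2) / steps
    total = 0.0
    for k in range(steps):
        phi = (k + 0.5) * h
        total += math.sin(phi) ** (2 * x - 1) * math.cos(phi) ** (2 * y - 1)
    return 2 * h * total

def beta_gamma(x, y):
    """B(x, y) expressed through the gamma function."""
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

for x, y in ((0.5, 0.5), (1.5, 2.0), (2.0, 3.0)):
    print((x, y), beta_trig(x, y), beta_gamma(x, y))
```

For $x = y = \frac12$ the integrand is identically 1, so the estimate is $\pi$ up to rounding, in agreement with $\Gamma(\frac12)^2 = \pi$.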

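If we set $y = 1 - x$, then $\Gamma(x + y) = \Gamma(1) = 1$, and so, by Euler's functional equation, for $0 < x < 1$,
\begin{align*}
B(x, 1-x) = \Gamma(x)\Gamma(1-x) = \frac{\pi}{\sin\pi x}
  &= \int_0^1 t^{x-1}(1-t)^{-x}\,dt \\
  &= \int_0^\infty \frac{t^{x-1}}{1+t}\,dt \\
  &= 2\int_0^{\pi/2} (\tan\varphi)^{2x-1}\,d\varphi.
\end{align*}
In particular, for $x = \frac12$,
\[ \Gamma\!\left(\tfrac12\right)\Gamma\!\left(\tfrac12\right) = \int_0^1 \frac{1}{\sqrt{t(1-t)}}\,dt = \int_0^\infty \frac{1}{\sqrt{t}\,(1+t)}\,dt = \pi. \]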

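(7) If $x$ and $y$ are both rational numbers, Euler's first integral is the integral of an algebraic function. For example, if we set $x = \frac{m}{n}$ and $y = \frac12$, and take the substitution $t = \tau^n$ in Euler's first integral, then $dt = n\tau^{n-1}\,d\tau$ and, replacing $\tau$ with $t$ again, we get
\[ B\!\left(\frac{m}{n}, \frac12\right) = n\int_0^1 \frac{t^{m-1}}{\sqrt{1-t^n}}\,dt = \frac{\Gamma(\frac{m}{n})\sqrt{\pi}}{\Gamma(\frac{m}{n} + \frac12)}. \]

(i) For $m = 1$, $n = 4$, we have $\Gamma(\frac14 + \frac12) = \Gamma(\frac34) = \dfrac{\pi\sqrt{2}}{\Gamma(\frac14)}$ by Euler's functional equation. Thus we get
\[ \int_0^1 \frac{1}{\sqrt{1-t^4}}\,dt = \frac{\Gamma(\frac14)\sqrt{\pi}}{4\,\Gamma(\frac34)} = \frac{\Gamma(\frac14)^2}{\sqrt{32\pi}}. \]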


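(ii) For $m = 1$, $n = 3$,
\[ \int_0^1 \frac{1}{\sqrt{1-t^3}}\,dt = \frac{\Gamma(\frac13)\sqrt{\pi}}{3\,\Gamma(\frac13 + \frac12)} = \frac{\Gamma(\frac13)\sqrt{\pi}}{3\,\Gamma(\frac56)}. \]
For $x = \frac23$ in Legendre's relation $\Gamma(\frac{x}{2})\,\Gamma(\frac{1+x}{2}) = \dfrac{\sqrt{\pi}}{2^{x-1}}\,\Gamma(x)$,
\[ \Gamma\!\left(\tfrac13\right)\Gamma\!\left(\tfrac56\right) = 2^{1/3}\sqrt{\pi}\;\Gamma\!\left(\tfrac23\right). \]
For $x = \frac13$ in Euler's functional equation $\Gamma(x)\Gamma(1-x) = \dfrac{\pi}{\sin\pi x}$,
\[ \Gamma\!\left(\tfrac13\right)\Gamma\!\left(\tfrac23\right) = \frac{\pi}{\sin\frac{\pi}{3}} = \frac{2\pi}{\sqrt{3}}. \]
By multiplying these two equations and cancelling $\Gamma(\frac23)$,
\[ \Gamma\!\left(\tfrac13\right)^2\Gamma\!\left(\tfrac56\right) = \frac{2^{4/3}\,\pi\sqrt{\pi}}{\sqrt{3}}, \]
and so
\[ \int_0^1 \frac{1}{\sqrt{1-t^3}}\,dt = \frac{\Gamma(\frac13)\sqrt{\pi}}{3\,\Gamma(\frac56)} = \frac{\Gamma(\frac13)^3}{\pi\sqrt{3}\,\sqrt[3]{16}}. \]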
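Both closed forms in (7) can be compared with direct numerical integration. Because the integrands blow up at $t = 1$, the sketch below first removes the singularity by a change of variable; these substitutions ($t = \sin\theta$ for the quartic case and $t = 1 - u^2$ for the cubic case) are our own choice for the illustration and are not taken from the text, as is the helper `midpoint`.

```python
import math

def midpoint(f, a, b, steps=100_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

# t = sin(theta): dt / sqrt(1 - t^4) becomes d(theta) / sqrt(1 + sin^2(theta)).
quartic = midpoint(lambda th: 1.0 / math.sqrt(1.0 + math.sin(th) ** 2),
                   0.0, math.pi / 2)

# t = 1 - u^2: dt / sqrt(1 - t^3) becomes 2 du / sqrt(1 + t + t^2).
def cubic_integrand(u):
    t = 1.0 - u * u
    return 2.0 / math.sqrt(1.0 + t + t * t)

cubic = midpoint(cubic_integrand, 0.0, 1.0)

# Closed forms from cases (i) and (ii).
quartic_closed = math.gamma(0.25) ** 2 / math.sqrt(32 * math.pi)
cubic_closed = math.gamma(1 / 3) ** 3 / (math.pi * math.sqrt(3) * 16 ** (1 / 3))

print(quartic, quartic_closed)
print(cubic, cubic_closed)
```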

11.2 Convex Functions

Recall that, in developing the theory of the gamma function, log convexity was the key property of the gamma function. In this section we derive the log convexity of the gamma function.

Let $f(x)$ be a real-valued function defined on $(a, b) \subseteq \mathbb{R}$. For any pair $x_1 \ne x_2$ in $(a, b)$, let
\[ f(x_1, x_2) = \frac{f(x_1) - f(x_2)}{x_1 - x_2} = f(x_2, x_1). \]

Definition 11.2.1 $f(x)$ is said to be convex on $(a, b)$ if, for every $x_3 \in (a, b)$, $f(x, x_3)$ is a monotonically increasing function of $x$; i.e., $f(x_1, x_3) \ge f(x_2, x_3)$ holds for any pair $x_1 > x_2$ distinct from $x_3$.


For any distinct triple $x_1, x_2, x_3$ in $(a, b)$, if we set
\[ f^*(x_1, x_2, x_3) = \frac{f(x_1, x_3) - f(x_2, x_3)}{x_1 - x_2} = \frac{(x_3 - x_2)f(x_1) + (x_1 - x_3)f(x_2) + (x_2 - x_1)f(x_3)}{(x_1 - x_2)(x_2 - x_3)(x_3 - x_1)}, \]
which is independent of the order of $x_1, x_2, x_3$, then the convexity of $f(x)$ is equivalent to
\[ f^*(x_1, x_2, x_3) \ge 0. \]
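The two expressions for $f^*$ and its symmetry can be checked numerically. The following sketch implements both forms; the function names are ours, and the exponential function is used only as a convenient example of a convex function.

```python
import math
from itertools import permutations

def first_quotient(f, x1, x2):
    """Difference quotient f(x1, x2) = (f(x1) - f(x2)) / (x1 - x2)."""
    return (f(x1) - f(x2)) / (x1 - x2)

def second_quotient(f, x1, x2, x3):
    """f*(x1, x2, x3) defined through the first difference quotients."""
    return (first_quotient(f, x1, x3) - first_quotient(f, x2, x3)) / (x1 - x2)

def second_quotient_symmetric(f, x1, x2, x3):
    """The same quantity written in its symmetric form."""
    num = (x3 - x2) * f(x1) + (x1 - x3) * f(x2) + (x2 - x1) * f(x3)
    den = (x1 - x2) * (x2 - x3) * (x3 - x1)
    return num / den

# For the convex function exp, every ordering gives the same nonnegative value.
for triple in permutations((0.3, 1.1, 2.0)):
    print(triple, second_quotient(math.exp, *triple),
          second_quotient_symmetric(math.exp, *triple))
```

For $f(x) = x^2$ both forms return exactly 1 for every distinct triple, which matches $f''/2 = 1$.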

For another convex function $g(x)$ on $(a, b)$, one can easily show that $f + g$ is also convex, since $(f+g)(x_1, x_2) = f(x_1, x_2) + g(x_1, x_2)$ and $(f + g)^* = f^* + g^*$. Moreover, if $\{f_n(x)\}$ is a sequence of convex functions on $(a, b)$ whose limit exists, $\lim_{n\to\infty} f_n(x) = f(x)$, then $\lim_{n\to\infty} f_n^*(x_1, x_2, x_3) = f^*(x_1, x_2, x_3) \ge 0$ for distinct $x_1, x_2, x_3 \in (a, b)$, which shows that $f(x)$ is also convex.

Theorem 11.2.1 The sum of convex functions and the limit function of a convergent sequence of convex functions are convex. The sum of a convergent series whose terms are all convex is also convex.

The last statement follows from the fact that the sum is the limit of the partial sums, which are also convex.

Let $f(x)$ be a convex function defined on $(a, b)$. For a fixed $x_0 \in (a, b)$ and $x_1 > x_0 > x_2$, we have $f(x_1, x_0) \ge f(x_2, x_0)$. Keeping $x_2$ fixed, let $x_1$ decrease to $x_0$. Then the left side also decreases, but never below the right side. That is, the right-hand derivative of $f(x)$ exists:
\[ f'(x_0^+) = \lim_{x_1\to x_0^+} \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \lim_{x_1\to x_0^+} f(x_1, x_0) \ge f(x_2, x_0). \]
Similarly, the left-hand derivative $f'(x_0^-)$ also exists and
\[ f'(x_0^-) \le f'(x_0^+). \]
For $x_0 < x_2 < x_3 < x_1$, since
\[ f(x_0, x_2) \le f(x_0, x_3) \le f(x_1, x_3) = f(x_3, x_1), \]
letting $x_2 \to x_0^+$ and $x_3 \to x_1^-$ we have
\[ f'(x_0^+) \le f'(x_1^-) \]
for $x_0 < x_1$. Thus the one-sided derivatives are monotonically increasing.

The following is a generalized intermediate value theorem for continuous functions.


Theorem 11.2.2 (Quasi-Rolle's Theorem) Let $f(x)$ be a continuous function defined on $[a, b]$ with one-sided derivatives on $(a, b)$. If $f(a) = f(b)$, then there is a point $c \in (a, b)$ such that one of the values $f'(c^-)$ and $f'(c^+)$ is $\ge 0$ and the other is $\le 0$.

Proof: (1) If $f(x)$ has its maximum at $c \in (a, b)$, then
\[ \frac{f(c + h) - f(c)}{h} \;\begin{cases} \le 0, & \text{for } h > 0, \\ \ge 0, & \text{for } h < 0. \end{cases} \]
Thus, $f'(c^+) \le 0$ and $f'(c^-) \ge 0$.

(2) If $f(x)$ has its minimum at $c \in (a, b)$, then similarly we obtain $f'(c^+) \ge 0$ and $f'(c^-) \le 0$.

(3) If $f(x)$ has both its maximum and minimum at $a$ or $b$, then, since $f(a) = f(b)$, $f(x)$ is a constant function, so that $f'(x) = 0$ for all $x \in (a, b)$. □

Theorem 11.2.3 (Quasi-Mean-Value Theorem) Let $f(x)$ be a continuous function defined on $[a, b]$ with one-sided derivatives on $(a, b)$. Then there is a point $c \in (a, b)$ such that
\[ f(a, b) = \frac{f(b) - f(a)}{b - a} \]
lies between $f'(c^-)$ and $f'(c^+)$.

Proof: The function
\[ F(x) = f(x) - f(a, b)(x - a) \]
is continuous with one-sided derivatives
\[ F'(x^\pm) = f'(x^\pm) - f(a, b), \]
and $F(a) = f(a) = F(b)$. By Theorem 11.2.2, there is $c \in (a, b)$ such that one of the values
\[ f'(c^-) - f(a, b) \quad\text{or}\quad f'(c^+) - f(a, b) \]
is $\ge 0$ and the other is $\le 0$. □

Let $f(x)$ be a function defined on $(a, b)$. Suppose that $f(x)$ has monotonically increasing one-sided derivatives on $(a, b)$.

For any distinct triple $x_2 < x_3 < x_1$ in $(a, b)$, by Theorem 11.2.3, there are points $c$ and $d$ with $x_2 < c < x_3 < d < x_1$ such that $f(x_3, x_1)$ lies between $f'(d^-)$ and $f'(d^+)$, and $f(x_2, x_3)$ lies between $f'(c^-)$ and $f'(c^+)$. Since $f'(x^-) \le f'(x^+)$, we have
\[ f'(d^-) \le f(x_3, x_1) \quad\text{and}\quad f(x_2, x_3) \le f'(c^+). \]
Thus
\[ f^*(x_1, x_2, x_3) = \frac{f(x_3, x_1) - f(x_2, x_3)}{x_1 - x_2} \ge \frac{f'(d^-) - f'(c^+)}{x_1 - x_2} \ge 0, \quad\text{since } c < d. \]
Therefore, we have proved the following theorem.

Theorem 11.2.4 $f(x)$ is a convex function on $(a, b)$ if and only if $f(x)$ has monotonically increasing one-sided derivatives.

Corollary 11.2.5 Let $f(x)$ be a twice differentiable function on $(a, b)$. Then $f(x)$ is convex on $(a, b)$ if and only if $f'(x)$ is monotonically increasing, if and only if $f''(x) \ge 0$ for all $x \in (a, b)$.

Suppose now that $f(x)$ is convex on $(a, b)$. Take $x_3 = \frac{x_1 + x_2}{2}$ for $x_2 < x_1$. Then
\[ x_3 - x_2 = x_1 - x_3 = \tfrac12(x_1 - x_2), \]
and so
\[ 0 \le f^*(x_1, x_2, x_3) = \frac{(x_1 - x_2)\left[\tfrac12(f(x_1) + f(x_2)) - f(x_3)\right]}{\tfrac14(x_1 - x_2)^3} = \frac{\tfrac12(f(x_1) + f(x_2)) - f(x_3)}{\tfrac14(x_1 - x_2)^2}. \]
Thus
\[ f\!\left(\frac{x_1 + x_2}{2}\right) \le \frac{f(x_1) + f(x_2)}{2}, \]
which is symmetric in $x_1$ and $x_2$, and so is also true for $x_1 < x_2$. For $x_1 = x_2$ it is trivial. A function $f(x)$ satisfying this inequality for all $x_1, x_2 \in (a, b)$ is called weakly convex. Theorem 11.2.1 is also true for weakly convex functions.

Lemma 11.2.6 Let $f(x)$ be a weakly convex function on $(a, b)$. Then, for any $x_1, \ldots, x_n \in (a, b)$,
\[ f\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \le \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n}. \tag{11.11} \]


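Proof: (1) If the inequality (11.11) holds for $n$, then it also holds for $2n$. Indeed, set
\[ X_1 = \frac{x_1 + \cdots + x_n}{n} \quad\text{and}\quad X_2 = \frac{x_{n+1} + \cdots + x_{2n}}{n}. \]
Then
\[ f\!\left(\frac{X_1 + X_2}{2}\right) \le \frac{f(X_1) + f(X_2)}{2}. \]
By applying the inequality (11.11) to the two terms on the right side, we obtain the required inequality.

(2) If the inequality (11.11) holds for $n + 1$, then it also holds for $n$. Indeed, for $n$ numbers $x_1, \ldots, x_n$,
\[ x_{n+1} = \frac{1}{n}(x_1 + \cdots + x_n) \]
also belongs to $(a, b)$, and
\begin{align*}
f(x_{n+1}) &= f\!\left(\frac{(n+1)x_{n+1}}{n+1}\right) = f\!\left(\frac{n x_{n+1} + x_{n+1}}{n+1}\right) \\
  &= f\!\left(\frac{x_1 + \cdots + x_n + x_{n+1}}{n+1}\right) \\
  &\le \frac{f(x_1) + \cdots + f(x_n) + f(x_{n+1})}{n+1}.
\end{align*}
Now, $f(x_{n+1}) - \frac{f(x_{n+1})}{n+1} = \frac{n}{n+1}f(x_{n+1})$, so that $\frac{n}{n+1}f(x_{n+1}) \le \frac{f(x_1) + \cdots + f(x_n)}{n+1}$, which gives the desired inequality for the $n$ numbers.

(3) The inequality (11.11) holds for $n = 2$. By step (1) it holds for $n = 4, 8, 16, \ldots$, that is, for every power of 2, and by step (2) it then holds for every $n$ smaller than a power of 2. Hence the inequality holds for all $n$. □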

Theorem 11.2.7 A function is convex if and only if it is continuous and weakly convex.

Proof: (1) A convex function is continuous since it has one-sided derivatives. It is also weakly convex, as has already been shown.

(2) Conversely, let $f$ be continuous and weakly convex. Let $x_n < x_1$ be two points of $(a, b)$, and let $p$, $n$ be arbitrary integers with $0 \le p \le n$ and $n \ge 1$. Set $x_1 = \cdots = x_p$ and $x_{p+1} = \cdots = x_n$. Then, by Lemma 11.2.6,
\[ f\!\left(\frac{p}{n}x_1 + \left(1 - \frac{p}{n}\right)x_n\right) \le \frac{p}{n}f(x_1) + \left(1 - \frac{p}{n}\right)f(x_n). \]

For any real number $t$ with $0 \le t \le 1$, choose a sequence of rational numbers $t_k \in (0, 1)$ that converges to $t$. Every term of this sequence is of the form $\frac{p}{n}$ for suitable integers $p$ and $n$, so that the above inequality holds for each term $t_k$. Since $f$ is continuous, we can pass to the limit to obtain
\[ f(t x_1 + (1 - t)x_n) \le t f(x_1) + (1 - t)f(x_n). \tag{11.12} \]

For any distinct triple $x_n < x_3 < x_1$ in $(a, b)$, the denominator of $f^*(x_1, x_n, x_3)$ is positive. It remains to show that the numerator is nonnegative.

Set $t = \dfrac{x_3 - x_n}{x_1 - x_n}$. Then
\[ 0 \le t \le 1, \qquad 1 - t = \frac{x_1 - x_3}{x_1 - x_n}, \]
and
\[ t x_1 + (1 - t)x_n = \frac{(x_3 - x_n)x_1 + (x_1 - x_3)x_n}{x_1 - x_n} = x_3. \]
Thus, by the inequality (11.12),
\[ f(x_3) \le \frac{x_3 - x_n}{x_1 - x_n}f(x_1) + \frac{x_1 - x_3}{x_1 - x_n}f(x_n), \]
which shows that the numerator $(x_3 - x_n)f(x_1) + (x_1 - x_3)f(x_n) + (x_n - x_1)f(x_3)$ of $f^*(x_1, x_n, x_3)$ is nonnegative. □

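Example 11.2.1 Let $f(x) = -\ln x$ for $x > 0$. Then $f''(x) = \frac{1}{x^2} > 0$ means that $f(x)$ is convex. Thus the inequality (11.11) applies: for positive numbers $x_1, \ldots, x_n$,
\[ -\ln\!\left(\frac{x_1 + \cdots + x_n}{n}\right) \le -\frac{1}{n}(\ln x_1 + \cdots + \ln x_n), \]
or
\[ \ln\!\left(\frac{x_1 + \cdots + x_n}{n}\right) \ge \ln\sqrt[n]{x_1\cdots x_n}, \]
or
\[ \frac{x_1 + \cdots + x_n}{n} \ge \sqrt[n]{x_1\cdots x_n}. \]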
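The inequality between the arithmetic and geometric means, and the underlying inequality (11.11) for $f(x) = -\ln x$, can be illustrated numerically. This is a small sketch with helper names (`jensen_gap`, `am_gm`) that we introduce here; the sample values are arbitrary positive numbers.

```python
import math

def jensen_gap(f, xs):
    """Right side minus left side of (11.11); nonnegative when f is convex."""
    n = len(xs)
    return sum(f(x) for x in xs) / n - f(sum(xs) / n)

def am_gm(xs):
    """Return (arithmetic mean, geometric mean) of positive numbers xs."""
    n = len(xs)
    arithmetic = sum(xs) / n
    geometric = math.exp(sum(math.log(x) for x in xs) / n)
    return arithmetic, geometric

xs = [1.0, 4.0, 16.0]
print(jensen_gap(lambda x: -math.log(x), xs))  # nonnegative
print(am_gm(xs))                               # arithmetic mean 7, geometric mean 4
```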

Definition 11.2.2 A function $f(x)$ that is positive on $(a, b)$ is called log convex (weakly log convex) if the function $\ln f(x)$ is convex (weakly convex).

The following is an immediate consequence of the earlier results.

Theorem 11.2.8 The product of log convex (weakly log convex) functions is again log convex (weakly log convex). The limit of a convergent sequence of log convex (weakly log convex) functions is also log convex (weakly log convex), provided the limit is positive.


Theorem 11.2.9 Suppose $f(x)$ is a twice differentiable function. Then $f(x)$ is log convex if
\[ f(x) > 0 \quad\text{and}\quad f(x)f''(x) - f'(x)^2 \ge 0. \]

This follows immediately from the fact that
\[ (\ln f(x))'' = \left(\frac{f'(x)}{f(x)}\right)' = \frac{f''(x)f(x) - f'(x)^2}{f(x)^2} \ge 0 \]
if and only if $\ln f(x)$ is convex.

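Theorem 11.2.10 If $f$ and $g$ are both log convex (weakly log convex) on $I = (a, b)$, then so is $f + g$.

Proof: It suffices to prove the statement for weak log convexity, since log convexity then follows immediately with the addition of continuity. For $x_1, x_2 \in I$,
\[ \ln f\!\left(\frac{x_1 + x_2}{2}\right) \le \frac12[\ln f(x_1) + \ln f(x_2)] = \frac12\ln[f(x_1)f(x_2)] \iff f\!\left(\frac{x_1 + x_2}{2}\right)^2 \le f(x_1)f(x_2). \]
Similarly, the same inequality holds for $g$. We need to show
\[ \left[f\!\left(\frac{x_1 + x_2}{2}\right) + g\!\left(\frac{x_1 + x_2}{2}\right)\right]^2 \le [f(x_1) + g(x_1)][f(x_2) + g(x_2)]. \]
This amounts to showing
\[ (a_1 + a_2)(c_1 + c_2) - (b_1 + b_2)^2 \ge 0 \]
for positive real numbers $a_i, b_i, c_i$ with $a_i c_i - b_i^2 \ge 0$, $i = 1, 2$, where $a_1 = f(x_1)$, $b_1 = f(\frac{x_1 + x_2}{2})$, $c_1 = f(x_2)$, and $a_2, b_2, c_2$ are the corresponding values of $g$. However, this follows immediately from the following tricky computation:
\begin{align*}
a_1 a_2\left[(a_1 + a_2)(c_1 + c_2) - (b_1 + b_2)^2\right]
  &= a_2(a_1 + a_2)(a_1 c_1 - b_1^2) + a_1(a_1 + a_2)(a_2 c_2 - b_2^2) \\
  &\quad + a_2(a_1 + a_2)b_1^2 + a_1(a_1 + a_2)b_2^2 - a_1 a_2(b_1 + b_2)^2 \\
  &= a_2(a_1 + a_2)(a_1 c_1 - b_1^2) + a_1(a_1 + a_2)(a_2 c_2 - b_2^2) \\
  &\quad + (a_1 b_2 - a_2 b_1)^2 \ge 0. \qquad\square
\end{align*}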
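Since the computation above is a polynomial identity, it can be verified exactly with integer arithmetic. The sketch below is only a check of the algebra on random inputs; the function names `lhs` and `rhs` and the sampling range are our own choices.

```python
import random

def lhs(a1, b1, c1, a2, b2, c2):
    """a1*a2*[(a1 + a2)(c1 + c2) - (b1 + b2)^2]."""
    return a1 * a2 * ((a1 + a2) * (c1 + c2) - (b1 + b2) ** 2)

def rhs(a1, b1, c1, a2, b2, c2):
    """The regrouped form ending in the perfect square (a1*b2 - a2*b1)^2."""
    return (a2 * (a1 + a2) * (a1 * c1 - b1 ** 2)
            + a1 * (a1 + a2) * (a2 * c2 - b2 ** 2)
            + (a1 * b2 - a2 * b1) ** 2)

random.seed(0)
for _ in range(1000):
    vals = [random.randint(1, 50) for _ in range(6)]
    # Integer arithmetic is exact, so equality can be tested directly.
    assert lhs(*vals) == rhs(*vals)
print("identity holds on all sampled inputs")
```

When $a_i c_i - b_i^2 \ge 0$ and $a_i > 0$, every term of the regrouped form is nonnegative, which is the point of the computation.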


Consider a continuous function $f(t, x)$ of two variables $t \in [a, b]$ and $x \in [c, d]$. Suppose that, for any fixed $t$, $f(t, x)$ is log convex and twice differentiable in $x$. For every positive integer $n$, define
\[ F_n(x) = h\,[f(a, x) + f(a + h, x) + \cdots + f(a + (n-1)h, x)] = \sum_{k=0}^{n-1} f(a + kh, x)\,h, \]
where $h = \frac{b - a}{n}$; this is log convex as a sum of log convex functions. Thus
\[ \lim_{n\to\infty} F_n(x) = \int_a^b f(t, x)\,dt, \]
which is also log convex.

If the improper integral $\int_a^b f(t, x)\,dt$ exists as $b \to \infty$, i.e., if
\[ \int_a^\infty f(t, x)\,dt \]
exists, then it is also log convex, as the limit of proper integrals over finite subintervals.

Theorem 11.2.11 Let $\phi(t)$ be a positive continuous function on $[a, b]$. Then
\[ \int_a^b \phi(t)\,t^{x-1}\,dt \]
is log convex in $x$ on every interval on which the proper or improper integral exists.

Note that
\[ \ln\!\left(\phi(t)\,t^{x-1}\right) = \ln\phi(t) + (\ln t)(x - 1), \qquad \frac{d^2}{dx^2}\left[\ln\!\left(\phi(t)\,t^{x-1}\right)\right] = 0. \]
Thus $\phi(t)\,t^{x-1}$ is log convex, and so is $\int_a^b \phi(t)\,t^{x-1}\,dt$ by the above argument.
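With $\phi(t) = e^{-t}$ on $(0, \infty)$, the integral in Theorem 11.2.11 is the gamma function itself, so the theorem predicts that $\ln\Gamma$ satisfies the midpoint inequality of weak convexity. The following sketch checks this on a grid using the standard `math.lgamma`; the helper name `log_midpoint_gap` and the grid are illustrative assumptions.

```python
import math

def log_midpoint_gap(x1, x2):
    """(ln Γ(x1) + ln Γ(x2))/2 - ln Γ((x1 + x2)/2); nonnegative for a log convex Γ."""
    return 0.5 * (math.lgamma(x1) + math.lgamma(x2)) - math.lgamma(0.5 * (x1 + x2))

# Grid of positive arguments 0.5, 1.0, ..., 10.0.
xs = [0.5 + 0.5 * k for k in range(20)]
smallest = min(log_midpoint_gap(u, v) for u in xs for v in xs if u != v)
print(smallest)  # positive, consistent with log convexity of Γ on (0, ∞)
```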

Theorem 11.2.12 If $f(x)$ is log convex on $I$, and if $c \ne 0$ is any real number, then both $f(x + c)$ and $f(cx)$ are log convex on the corresponding intervals.

Index

n-th term test for divergence, 73p-Series test, 75

absolute minimum, 23absolute maximum, 23Absolutely Convergence Test, 82acceleration vector, 222algebraic functions, 37Alternating Series Estimation, 81Alternating Series Test, 80amplitude modulation, 164angle, 213antiderivative, 29arc length, 225arc length function, 225area, 216, 266average value, 266

Bessel’s equation of order ν, 177Beta function, 326binomial series, 97binormal vectore, 229bounded from above, 70brachistochrone problem, 119

Cartesian coordinate frame, 211Cauchy’s Mean Value Theorem, 27Cauchy-Schwarz inequality, 213center of mass, 267Chain rule, 19chain rule, 246characteristic equation, 144circle of curvature, 229circulation, 288circulation density, 294Comparison Test, 76Comparison Test for convergence, 66

component functions, 221Compound interest, 54conservative vector field, 290constraint, 251continuous, 221continuous at a point, 9continuous extension, 10continuous on an interval, 9converges, 64, 67converges absolutely, 82converges conditionally, 82convex function, 338convolution, 202critical point, 254critical points, 116critically damped, 162cross product, 214, 215curl, 294, 305curvature, 227curve, 221cycloid, 122

damping force, 161definite integral, 30, 223density of flux, 317density of the circulation, 310derivative, 13, 222, 246, 248determinant, 214difference quotient, 13differentiable, 13, 243, 248differential, 18differential equation, 30Differentiation rules, 15Dirac delta function, 199direction, 212directional derivative, 246


divergence, 294, 306Divergence Theorem, 312diverges, 64diverges to infinity, 68domain, 1dot product, 212double integral, 261

element of surface area, 299environmental carrying capacity, 116equilibrium points, 116equilibrium position, 161equilibrium solutions, 116error term, 93Euler’s constant, 71, 324Euler’s equation, 173Euler’s first integral, 326Euler’s functional equation, 334Euler’s second integral, 320exponential decay, 53exponential function, 47exponential growth, 53Extreme Value Theorem, 23

factorial, 70factorial function, 319Fibonacci numbers, 70First derivative test, 14, 253First Fundamental Theorem of Cal-

culus, 33first moments, 267fixed point, 11flow, 288flux, 303flux density, 294Fourier series, 100Frenet equations, 231Frenet frame, 229Fubini’s Theorem I, 263Fubini’s Theorem II, 265fundamental set of solutions, 143

gamma function, 319, 320Gauss formula, 323Gauss’ product formula, 331

geometric sequence, 70Geometric Series, 83gradient vector, 246gradient vector field, 286gravitational field, 286gravitational potential energy, 312Green’s Theorem, 295growth rate, 114

half-life, 53harmonic motion, 162Harmonic series, 73Heaviside function, 196Hessian, 254homogeneous, 139Hooke’s law, 160horizontal asymptote, 6

image, 1implicit differentiation, 251improper, 63improper integrals, 63impulsive function, 198incompressible, 317indefinite integral, 29, 223independent variable, 1indicial equation, 181infinite sequence, 67infinite series, 71injective, 2integrable, 31integral curves, 107integral test, 75integral transforms, 189integrating factor, 108, 127Intermediate value property of deriva-

tives, 15Intermediate Value Theorem, 11interval of convergence, 86inverse, 2invertible, 2involute curve, 228irregular singular point, 176irrotational, 297, 312isoclines, 107


iterated integral, 263

Kepler’s the first law, 235Kepler’s the second law, 235Kepler’s the third law, 236kernel, 189

L´Hopital´s Rule, 26Lagrange multiplier, 258Laplace transform, 189law of exponential change, 53least upper bound, 70left-hand limit, 5Legendre equation, 177Legendre’s relation, 331length density, 225limit, 2, 221Limit Comparison Test, 76line integral, 283linear operator, 140linear approximation, 17, 242linearization, 17linearly dependent, 141linearly independent, 141local maximum, 253local minimum, 253log convex, 342logistic, 118logistic equation, 115

Maclaurin series, 90Malthusian law of population growth,

115mass, 267mass density, 267mean value, 31Mean Value Theorem, 24mean value Theorem, 32method of Frobenius, 180moment of inertia, 268, 269moving frame, 229

natural logarithm, 44natural number, 47Newton’s law of cooling, 53nondecreasing, 70

one-to-one, 2onto, 2order, 105ordinary differential equation, 105ordinary point, 166orientable, 303orientation-preserving, 286orientation-reversing, 286origin, 211orthogonal, 213osculating circle, 229osculating plane, 229overdamped, 162

parametric curve, 20parametric equations, 20parametrization, 221, 298partial derivative, 240partial differential equation, 105particular solution, 108patch, 298path, 221perpendicular, 213piecewise smooth, 222potential function, 290Power rule, 23power series, 83principal unit normal vector, 228

Quasi-Rolle’s Theorem, 339

radius of convergence, 86radius of curvature, 229range, 1Ratio Test, 78Rearranged Series, 82regular singular point, 176regular curve, 222remainder of order n, 93reparametrization, 226repeated integral, 263resonance, 165Riemann sum, 30right-hand limit, 5right-handed, 211, 216


Root Test, 79

S-shaped, 118saddle point, 254Sandwich Theorem, 68saturation level, 116Second Fundamental Theorem of Cal-

culus, 33separable variables, 111sequence of partial sums, 72sink, 317smooth curve, 222Snell’s law of deflection, 121source, 317speed, 20, 222spring constant, 161steady state part, 163Stirling’s formula, 331Stokes’ Theorem, 307surjective, 2

tangent, 222tautochrone problem, 206Taylor polynomial of order n, 93Taylor series, 90Taylor’s Theorem, 91Telescoping Series, 73TNB-frame, 229torsion, 230total differential, 242, 244transcendental functions, 37transient part, 163

underdamped, 163unit impulse function, 199unit speed curve, 226unit step function, 196unit tangent vector, 226unit vector, 212upper bound, 70

vector, 212vector field, 286vector function, 221vector-valued function, 221velocity vector, 20, 222

Verhulst equation, 115vertical asymptote, 6vital coefficients, 115volume, 217vorticity vector, 311

weakly log convex, 342Weierstrass formula, 323, 324Wronskian, 141
