Physics 172

Transcript of Physics 172

Physics 172 - Mathematical Methods in Physics II

0.1 Lecture 1:

In this section I discuss infinite dimensional vector spaces. Most of the concepts that I introduced last semester on vector spaces apply to both the finite and infinite dimensional cases.

Infinite dimensional vector spaces are important because they can be used to reduce the solution of linear differential, integral, or partial differential equations to problems in linear algebra. While the resulting infinite dimensional systems cannot always be solved exactly, the exact solutions can often be expressed as limits of solutions of approximate finite-dimensional linear equations, of the type studied previously. On a computer these approximate problems involve the inversion of a large matrix or the solution of an eigenvalue problem.

The solutions of linear differential, integral, or partial differential equations are functions. The infinite dimensional vector spaces that contain the solutions of these problems are vector spaces of functions.

There are different vector spaces of functions that have properties that are needed to solve a given problem. If I want to construct a function that vanishes at x = a and x = b, it is useful to express it as a linear combination of functions that vanish at x = a and x = b.

For example, consider the equation

f(x) = g(x) + ∫_a^b K(x, y) f(y) dy    (1)

where g(x) and K(x, y) are known functions, and f(x) is an unknown function. If we can write

f(x) = ∑_{n=0}^∞ φ_n(x) f_n    (2)

where the φ_n(x) are known functions and the f_n are unknown complex coefficients, then the above equation becomes

∑_{n=0}^∞ φ_n(x) f_n = g(x) + ∫_a^b K(x, y) ∑_{n=0}^∞ φ_n(y) f_n dy    (3)


If the functions φn(x) are chosen to satisfy the orthogonality condition

∫_a^b φ*_m(x) φ_n(x) dx = δ_mn    (4)

then multiplying the equation by φ*_m(x) and integrating from a to b, assuming that we can change the order of the sum and integral, gives the infinite algebraic system

f_m = g_m + ∑_{n=0}^∞ K_mn f_n    (5)

where

g_m = ∫_a^b φ*_m(x) g(x) dx,    K_mn = ∫_a^b ∫_a^b φ*_m(x) K(x, y) φ_n(y) dx dy.    (6)

I can attempt to solve this by approximating the sum by its first N terms. This leads to a linear algebraic equation for the first N coefficients f_0, · · · , f_{N−1}.
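As a rough numerical illustration of this truncation, the sketch below projects the integral equation onto N basis functions and solves the resulting linear system. The sine basis, the separable kernel K, and the inhomogeneity g are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Sketch: truncate f_m = g_m + sum_n K_mn f_n at N terms and solve the
# N x N linear system (1 - K) f = g. Basis, kernel, and g are assumed
# examples, not from the lecture.
a, b, N = 0.0, np.pi, 8
x = np.linspace(a, b, 2001)
dx = x[1] - x[0]

# Orthonormal sine basis on [0, pi]: phi_n(x) = sqrt(2/pi) sin((n+1)x)
phi = np.array([np.sqrt(2 / np.pi) * np.sin((n + 1) * x) for n in range(N)])

Kxy = 0.1 * np.outer(np.sin(x), np.sin(x))   # a simple separable kernel
g = x * (np.pi - x)

gm = phi @ g * dx                            # g_m = <phi_m|g> by quadrature
Kmn = phi @ Kxy @ phi.T * dx * dx            # K_mn = <phi_m|K|phi_n>

fn = np.linalg.solve(np.eye(N) - Kmn, gm)    # f = g + K f  =>  (1 - K) f = g
f = fn @ phi                                 # reconstruct f(x) on the grid

ok = np.allclose(fn, gm + Kmn @ fn)          # truncated system is satisfied
```

Larger N gives a better approximation to the exact solution; the point is only that the truncated problem is ordinary linear algebra.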

This example will be studied in more detail later in the semester. It provides the motivation for why it is interesting to study infinite dimensional vector spaces.

As a first example of an infinite-dimensional vector space I define the space C[a, b] to be the space of continuous complex valued functions of a real variable x on the interval [a, b]. These functions are defined to be continuous on the entire closed interval, including the endpoints.

It is obvious that if f(x), g(x) ∈ C[a, b] then h(x) = αf(x) + g(x) ∈ C[a, b] for any complex constant α. It is a simple exercise to show that C[a, b] satisfies all of the requirements needed to be a vector space.

Many infinite-dimensional vector spaces are metric spaces, normed linear spaces or inner product spaces.

The norm on the space C[a, b] is

‖f‖ = sup_{x∈[a,b]} |f(x)|    (7)

This is defined for all continuous functions on [a, b]. To show that (7) is a norm note that

‖f‖ = sup_{x∈[a,b]} |f(x)| ≥ 0    (8)

‖αf‖ = sup_{x∈[a,b]} |αf(x)| = |α|‖f‖    (9)


for any complex α,

‖f + g‖ = sup_{x∈[a,b]} |f(x) + g(x)| ≤ sup_{x∈[a,b]} |f(x)| + sup_{x∈[a,b]} |g(x)| = ‖f‖ + ‖g‖    (10)

and

0 = ‖f‖ = sup_{x∈[a,b]} |f(x)|  ⇒  f(x) = 0 for all x ∈ [a, b].    (11)

The norm (7) is not derived from an inner product. It is not the only norm that can be used as a “distance function” for the continuous complex valued functions on [a, b].

An inner product that is defined on the space of continuous complex valued functions on the interval [a, b] is given by

〈g|f〉 := ∫_a^b g*(x) f(x) dx    (12)

The definition (12) implies 〈f|g〉 = 〈g|f〉*, 〈f|g + αh〉 = 〈f|g〉 + α〈f|h〉, and 〈f|f〉 ≥ 0.

If 〈f|f〉 = 0 and f(x_0) = c ≠ 0 for some x_0 ∈ [a, b], then by continuity there is a small interval I ⊂ [a, b] of finite width δ containing x_0 where |f(x)| > |c|/2. It follows that

0 = 〈f|f〉 = ∫_a^b |f(x)|² dx ≥ ∫_I |f(x)|² dx > δ|c|²/4 > 0    (13)

This leads to the contradiction 0 > 0, which implies that f(x) must be identically zero on the interval [a, b].

The function 〈f|f〉^{1/2} is a norm on the space of continuous functions on [a, b]. This norm is not identical to the norm (7).

One property of finite dimensional vector spaces is that Cauchy sequences of vectors converge to vectors. In the finite dimensional case the Cauchy sequence was used to construct a sum which was shown to converge to a vector. This is not automatic in infinite dimensional vector spaces. Previously we defined a vector space as complete if every Cauchy sequence of vectors converges to a vector in the space.

Consider the space of continuous complex valued functions on [a, b] with inner product 〈f|g〉. It is possible to construct a sequence of continuous


functions {f_n} with the property 〈f_m − f_n|f_m − f_n〉 → 0 as m, n → ∞ (i.e. they are a Cauchy sequence) such that the sequence converges to a function that is discontinuous. Since the limiting function is discontinuous, it is not an element of the inner product space. This means that the space of continuous functions on [a, b] with the inner product (12) is not complete.

A set of functions with this property is

f_n(x) = { 0 for a ≤ x ≤ c_n−;  (x − c_n−)/(c_n+ − c_n−) for c_n− < x < c_n+;  1 for c_n+ ≤ x ≤ b }    (14)

where c_n− := a + (b − a)(n − 1)/(2n) and c_n+ := a + (b − a)(n + 1)/(2n). This sequence converges to the step function that jumps discontinuously from 0 to 1 at the midpoint of the interval [a, b].

This shows that the space of continuous complex valued functions on [a, b] is not complete in the inner product 〈f|g〉 defined by equation (12).

This is a problem if I want to construct solutions of differential or integral equations by taking limits of solutions of finite dimensional approximations to a given equation. I need to be sure that the limiting functions are also vectors in the desired infinite dimensional vector space. This will be true if the vector space is complete in the inner product.

For example, if the function is a solution to a second order differential equation, and a sequence of approximate solutions has second derivatives, we would like the limit to have second derivatives so it can be used in the differential equation.


0.2 Lecture 2:

If I would like to use continuous functions with the inner product (12) there are two options. The first is to extend the class of functions in the space by defining new vectors in the space as Cauchy sequences of vectors already in the space. One complication is that a given Cauchy sequence may not converge to a single function. In the above example, the limiting step function could be defined as having the value 1 or 0 at the midpoint of (a, b). This defines two different functions that differ at the point x = a + (b − a)/2, but both of them are limit points of the above Cauchy sequence (14).

Since the difference of any two of the limiting functions is a non-zero function with zero norm, we have a new problem. This can be avoided by defining abstract vectors as “equivalence classes” of functions or Cauchy sequences that differ by zero-norm vectors. An equivalence relation ∼ is a relation on a set that divides the set into mutually distinct equivalence classes, where elements in the same class are related by

a ∼ a    (15)

a ∼ b → b ∼ a    (16)

a ∼ b and b ∼ c → a ∼ c    (17)

In this case f(x) ∼ g(x), i.e. f(x) and g(x) are in the same equivalence class, if and only if

〈f − g|f − g〉 = 0 (18)

This method of making a vector space complete by (1) adding all Cauchy sequences and (2) replacing vectors by equivalence classes of vectors whose difference has zero norm can always be used to make infinite dimensional vector spaces complete. In this case vectors are equivalence classes of Cauchy sequences of functions.

The inner product of a Cauchy sequence {f_n} with a vector g is defined by

〈g|f〉 = lim_{n→∞} 〈g|f_n〉    (19)

This becomes a Cauchy sequence of complex numbers, which does converge to a complex number. The convergence is independent of g because

|〈g|f − fn〉| ≤ ‖g‖‖f − fn‖ (20)


which vanishes independently of g if ‖f − f_n‖ → 0.

This method is perfectly good, but it is not popular because a space of functions has been replaced by a space of equivalence classes of functions. This means that every operation on a vector has to be checked to make sure that it maps every element of one equivalence class to the same equivalence class.

There is a second method that is more commonly used to make spaces complete. In this case the class of functions is enlarged. In order to accommodate the larger class of functions it is necessary to use a more general definition of the integral. The enlarged class of functions has the property that the inner product 〈·|·〉 is defined for all functions in the class, and it agrees with the previously defined scalar product for all continuous functions.

The first step in this second process is to redefine what is meant by an integral. The Riemann integral is defined as a limit of finite sums

∫_a^b f(x) dx = lim_{N→∞} ∑_{n=1}^N f(a + (b − a)(2n − 1)/(2N)) · (b − a)/N.    (21)

In this definition the interval [a, b] is divided up into N smaller subintervals. The integral is approximated by summing products of the value of the function at a point in each subinterval multiplied by the width of the subinterval. With this definition all continuous functions on [a, b] are integrable.

There is an alternate way to define the above integral. Instead of dividing up the interval [a, b] into small subintervals, it is also possible to divide the values (range) of the function into subintervals. Consider the following definition

∫_a^b f(x) dx = lim_{N→∞} ∑_{n=−∞}^∞ (n/N) · Volume{x ∈ [a, b] | n/N ≤ f(x) < (n + 1)/N}    (22)

While this sum is formally infinite, if the function is continuous its values will lie between finite minimum and maximum values on the interval [a, b], resulting in a finite sum. The number of terms in the sum will increase with N.
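The two definitions can be compared numerically. The sketch below, with an assumed integrand and grid-based estimates of the level-set "volumes", implements both the Riemann sum (21) and the level-set sum (22):

```python
import numpy as np

# Sketch comparing the Riemann sum (21) with the level-set sum (22) for a
# continuous function on [a, b]. The integrand is an illustrative choice,
# and level-set "volumes" are estimated by counting grid points.
a, b = 0.0, 1.0
f = lambda t: t ** 2                         # exact integral over [0,1] is 1/3

# Riemann: midpoint rule with N subintervals
N = 1000
mid = a + (b - a) * (2 * np.arange(1, N + 1) - 1) / (2 * N)
riemann = np.sum(f(mid)) * (b - a) / N

# Level sets: slice the range of f into steps of height 1/M and weight each
# slice n/M by the measure of {x : n/M <= f(x) < (n+1)/M}
M = 1000
x = np.linspace(a, b, 200001)
dx = x[1] - x[0]
levels = np.floor(f(x) * M) / M              # n/M for each grid point
level_sum = np.sum(levels) * dx              # sum_n (n/M) * Volume(level set)

exact = 1.0 / 3.0
```

Both sums approach the same value; the level-set version only needs a notion of "volume" for the sets where f takes values in a given slice.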

To make sense of this definition it is necessary to define the “volume” of a set. It turns out that this is difficult to do in a useful way for arbitrary sets. The allowed sets should be large enough so the integral defined by (22) can be used to define the integral of any continuous function.


The problem is that it can be shown that there is no volume function m that

a. Is defined for all sets

b. Satisfies m(∪_n A_n) = ∑_n m(A_n) whenever A_m ∩ A_n = ∅ for m ≠ n

c. Is translationally invariant

d. Satisfies m([a, b]) = b− a

To motivate the limited choice of sets note that continuous functions have the property that the inverse image of any open interval

f−1((a, b)) = {x|a < f(x) < b} (23)

can be expressed as a union of disjoint open intervals. This is actually an alternative definition of a continuous function that is equivalent to the standard ε-δ definition. I state this result without proof. This property will be used to ensure that the modified integral is defined for all continuous functions.

To limit the class of sets to a meaningful subclass with properties that permit the formulation of an integral, it is useful to introduce the concept of a σ-algebra of sets. A collection of subsets M of a set X is called a σ-algebra if it has the following three properties

1. X ∈ M

2. A ∈ M → Ac = X − A ∈ M

3. A_n ∈ M for n ∈ Z → ∪_{n∈Z} A_n ∈ M

The identity A ∩ B = (A^c ∪ B^c)^c can be used to express intersections of sets in terms of complements and unions. As a consequence of this relation it follows that countable intersections of elements of a σ-algebra are also members of the algebra.

A measure m on a σ-algebra M is a function with values in [0, ∞] that is countably additive. This means that for disjoint A_n ∈ M, n ∈ Z,

m(∪_n A_n) = ∑_n m(A_n)    (24)


The σ-algebra of Borel sets of the real line is the smallest σ-algebra of the real line R that contains all open subsets of R. This definition implies that the σ-algebra of Borel sets is the smallest σ-algebra that contains the inverse image of every open set with respect to any continuous function.

For homework you will show, using properties of σ-algebras, that closed sets and single points are also Borel sets.

Lebesgue measure is the measure defined on the Borel sets of the real line satisfying Vol((a, b)) = b − a.

Countable additivity can be used to show that Lebesgue measure satisfies Vol((a, b)) = Vol([a, b]) = b − a and Vol([a, a]) = a − a = 0, so points have 0 Lebesgue measure.

A function f(x) on the real line is Lebesgue measurable if the inverse image of every open subset of the real line is a Borel set. By this definition all continuous functions are Lebesgue measurable. Since the class of Borel sets is bigger than the class of open sets, the class of measurable functions is larger than the class of continuous functions.

The Lebesgue integral is the integral given by (22) where the volume function is Lebesgue measure.


0.3 Lecture 3:

The class of Lebesgue measurable functions contains the continuous functions, but is larger than the class of continuous functions. It contains functions that are piecewise continuous, the function that is 1 on the rationals and 1/2 on the irrationals, and many other highly discontinuous functions. The important property of measurable functions is that they are functions, and Cauchy sequences of measurable functions with respect to the inner product (12) converge to measurable functions in a sense made precise by the theorems below. While the limit may still not be unique, the condition 〈f|f〉 = 0 means that f(x) vanishes except possibly on a set of Lebesgue measure zero. This is sometimes written f(x) = 0 [a.e.], which states that f(x) is zero almost everywhere with respect to Lebesgue measure.

What has been gained is that the space of equivalence classes of functions that differ at most on a set of Lebesgue measure zero and are square integrable with respect to Lebesgue measure is complete with respect to the inner product (12). I state the three key properties of the Lebesgue integral:

Monotone Convergence Theorem

a. fn(x) measurable

b. fn(x) → f(x) each x

c. fn(x) ≤ fn+1(x) each x

d. ∫ |f_n(x)| dx < C for all n

Then f(x) is measurable, ∫ |f(x)| dx < ∞, and

lim_{n→∞} ∫ |f(x) − f_n(x)| dx = 0

Dominated Convergence Theorem

a. fn(x) measurable

b. fn(x) → f(x) each x


c. |f_n(x)| ≤ g(x) for all x, with ∫ |g(x)| dx < ∞

Then f(x) is measurable, ∫ |f(x)| dx < ∞, and

lim_{n→∞} ∫ |f(x) − f_n(x)| dx = 0

Define the L^p(R) spaces as normed spaces of Lebesgue measurable functions with the L^p norm given by

‖f‖_p := [∫ |f(x)|^p dx]^{1/p}    (25)

The proof that this actually defines a norm is not obvious for p ≠ 2. It uses a generalization of the Schwarz inequality called Jensen’s inequality. The argument can be found in Real and Complex Analysis by W. Rudin in the reference list. The result also applies if the interval is restricted to a subinterval of the real line or if the integral includes a positive weight function, dx → w(x)dx.

Riesz-Fischer Theorem: The L^p spaces for 1 ≤ p ≤ ∞ are complete normed linear spaces. The Riesz-Fischer theorem can be proved from the monotone convergence theorem and the dominated convergence theorem. The proofs of these theorems can be found in any graduate level text on mathematical analysis. They are consequences of the definitions above, and also hold for more abstract measure spaces. The important thing is that the Riesz-Fischer theorem provides a large class of complete function spaces.
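As a small numerical illustration of the definition (25), using an assumed sample function on the unit interval, the L^p norms increase with p and approach the sup norm:

```python
import numpy as np

# Sketch: L^p norms (25) of an assumed sample function on [0, 1]. On a
# unit interval the norms are nondecreasing in p and approach the sup norm
# as p grows.
x = np.linspace(0.0, 1.0, 200001)
dx = x[1] - x[0]
f = np.sin(np.pi * x)

def lp_norm(p):
    return (np.sum(np.abs(f) ** p) * dx) ** (1.0 / p)

norms = [lp_norm(p) for p in (1, 2, 8, 64)]
sup_norm = np.max(np.abs(f))                 # the p -> infinity limit
```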

It is useful to include a positive weight function in the definition of these spaces of Borel measurable functions

L^p(R, w) := {f(x) | ‖f‖_p := [∫ |f(x)|^p w(x) dx]^{1/p} < ∞}    (26)

When p = 2 the space L²(R, w) is a complete inner product space or Hilbert space with inner product

〈f|g〉 = ∫ f*(x) g(x) w(x) dx    (27)


A final important result, which I state without proof, is that measurable functions that are bounded except on sets of measure zero can be approximated by continuous functions, in the sense that |c(x) − m(x)| < ε except on sets of measure < ε, where c(x) is continuous and m(x) is measurable.

In this section I discuss some properties of complete inner product spaces or Hilbert spaces. In what follows the integrals will always be assumed to be Lebesgue integrals and functions will be equivalence classes of Lebesgue measurable functions that differ only on sets of zero measure.

A set of complex valued functions {φ_n(x)} on [a, b] is orthonormal with respect to a positive weight w(x) if the functions satisfy the orthonormality relations

〈φ_n|φ_m〉 = ∫_a^b w(x) φ*_n(x) φ_m(x) dx = δ_nm.    (28)

Here a and b do not have to be finite.

We want to investigate when a set of orthonormal functions is a basis for, or spans, a Hilbert space H.

Let |φ_n〉 be an orthonormal set of functions and let |v〉 be an arbitrary

vector. Define

|v_N〉 = ∑_{n=1}^N |φ_n〉 c_n    (29)

with

c_m = 〈φ_m|v〉    (30)

It follows from the definitions that

〈v|v_N〉 = ∑_{n=1}^N |〈φ_n|v〉|²    (31)

Applying the Cauchy-Schwarz inequality to (31) gives

|〈v|v_N〉|² ≤ 〈v|v〉〈v_N|v_N〉 = 〈v|v〉 ∑_{n=1}^N |〈φ_n|v〉|²    (32)

Combining (32) with (31) gives

|〈v|vN〉| ≤ 〈v|v〉 (33)


or equivalently

∑_{n=1}^N |c_n|² ≤ 〈v|v〉    (34)

This inequality is called Bessel’s inequality. It says that the sum of the squares of the first N expansion coefficients is always less than or equal to the square of the norm of the vector.

Next I give a definition of a basis for infinite dimensional vector spaces. The problem in infinite dimensional spaces is that if one basis vector is removed from an infinite set, the remaining set of vectors is still infinite, but not necessarily a basis. This means that if we keep adding orthogonal vectors to a set, the limiting set is not necessarily a basis.

Definition: A set of orthonormal vectors {|φ_n〉} is a basis for a Hilbert space H if and only if the only vector orthogonal to all of the |φ_n〉 is the zero vector.

Let |v〉 ∈ H be arbitrary and consider the partial sums

|v_N〉 := ∑_{n=1}^N |φ_n〉〈φ_n|v〉    (35)

Consider the norm of the difference of the partial sums for M > N terms.

‖|vM〉 − |vN〉‖2 = 〈vM |vM〉 − 〈vN |vM〉 − 〈vM |vN〉 + 〈vN |vN〉 (36)

Note that for M > N

〈vN |vM〉 = 〈vM |vN〉 = 〈vN |vN〉 (37)

so (36) becomes

〈v_M|v_M〉 − 〈v_N|v_N〉 = ∑_{n=N+1}^M |〈φ_n|v〉|²    (38)

Since the right hand side of Bessel’s inequality is independent of N it follows that

∑_{n=1}^∞ |〈φ_n|v〉|² < ∞    (39)


which means that the partial sums of the |〈φ_n|v〉|² form a convergent sequence of numbers. This implies that the difference in (36) converges to zero and the sequence of vectors {|v_N〉} is a Cauchy sequence of vectors. Since, by definition, the Hilbert space is complete, the infinite sum

|w〉 := lim_{N→∞} |v_N〉 = ∑_{n=1}^∞ |φ_n〉〈φ_n|v〉    (40)

converges to a vector |w〉 ∈ H.

Next consider the difference

|u〉 = |v〉 − |w〉 (41)

There are two possibilities; either |u〉 is the zero vector or it is not the zero vector.

If |u〉 is not zero then

〈φm|u〉 = 〈φm|v〉 − 〈φm|w〉 = 〈φm|v〉 − 〈φm|v〉 = 0 (42)

which means that there is a non-zero vector orthogonal to every element of the orthonormal set. When this happens {|φ_n〉} cannot be a basis.

Conversely, if this set is a basis, the vector |u〉 must vanish for any choice of |v〉 ∈ H, which implies

|v〉 = ∑_{n=1}^∞ |φ_n〉〈φ_n|v〉    (43)

It follows that

〈v|v〉 = ∑_{n=1}^∞ |〈φ_n|v〉|²    (44)

in which case Bessel’s inequality becomes an equality for every vector in H. Equation (44) is called Parseval’s relation.

This shows that a condition for completeness is for Bessel’s inequality to be an equality for all vectors. If it fails to be an equality for even one vector, the orthonormal set is not a basis.
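Bessel's inequality (34) and Parseval's relation (44) can be checked numerically for a concrete orthonormal set. The sketch below uses an assumed sine basis on [0, π] and a sample vector, with grid quadrature standing in for the integrals:

```python
import numpy as np

# Sketch: Bessel's inequality (34) and Parseval's relation (44) for the
# orthonormal sine basis on [0, pi]. Basis and sample vector are assumed
# examples, not from the lecture.
x = np.linspace(0.0, np.pi, 20001)
dx = x[1] - x[0]
phi = np.array([np.sqrt(2 / np.pi) * np.sin((n + 1) * x) for n in range(50)])

v = x * (np.pi - x)                          # a sample vector in L2([0, pi])
norm_sq = np.sum(v * v) * dx                 # <v|v>
c = phi @ v * dx                             # c_n = <phi_n|v>

partial = np.cumsum(np.abs(c) ** 2)          # Bessel partial sums
bessel_holds = bool(np.all(partial <= norm_sq + 1e-8))
parseval_gap = abs(partial[-1] - norm_sq)    # small: this set spans v
```

Every partial sum stays below 〈v|v〉, and because this particular v is well represented by the sine set the partial sums approach 〈v|v〉, i.e. Parseval's relation holds to quadrature accuracy.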

When {|φ_n〉} is an orthonormal basis, what we have shown is that any vector in the Hilbert space has the form

|v〉 = ∑_{n=1}^∞ |φ_n〉 c_n    (45)


where

∑_{n=1}^∞ |c_n|² < ∞    (46)

The infinite collection of numbers c_n are the coordinates of the vector |v〉 in the basis |φ_n〉.

In this representation the components of the sum of two vectors, or of the product of a vector and a complex scalar, are the sums of the individual components, or the product of each component by the scalar. In both cases, if the original coefficients satisfy (46) then the new vectors will also satisfy this condition.

The abstract requirements for an infinite set of orthogonal vectors to be a basis are interesting, but they are not of real practical value. In applications it is always important to work with an explicit set of functions that are known to be a basis.

It turns out that for the Hilbert space of measurable functions on a finite interval [a, b] the polynomials

φ_n(x) = x^n,  n = 0, 1, 2, · · ·    (47)

are a basis. While these are not orthonormal functions, they can be used to construct an infinite orthonormal set using the Gram-Schmidt method.

The theorem that we will prove is even stronger. It is called the Weierstrass approximation theorem. The statement of the theorem is

Theorem (Weierstrass Approximation Theorem): Let f(x) be a continuous function on the finite interval [a, b]. Then for every ε > 0 there exists an integer n > 0 and a polynomial p_n(x) of degree n with the property that

sup_{x∈[a,b]} |f(x) − p_n(x)| < ε    (48)

This theorem shows that the approximation holds on the space C[a, b] with its natural norm. Formally this means that the approximation is uniform, and in particular pointwise.

Uniform convergence on a finite interval implies L²([a, b]) convergence and more generally L^p([a, b]) convergence. Because Cauchy sequences of continuous functions can be used to construct the measurable functions in L²([a, b]), if we let f(x) be measurable, for every ε > 0 we can find a continuous function c_ε(x) satisfying

‖c_ε − f‖ < ε/2    (49)


Since c_ε is continuous we can find a polynomial of sufficiently high degree n so that

‖c_ε − p_n‖ < ε/2    (50)

Using the triangle inequality

‖p_n − f‖ = ‖p_n − c_ε + c_ε − f‖ ≤ ‖p_n − c_ε‖ + ‖c_ε − f‖ < ε/2 + ε/2 = ε    (51)

Since ε is arbitrary, we see that the convergence also applies to measurable functions with respect to a Hilbert space norm. This means that the polynomials are also a basis for L²([a, b]) (and L^p([a, b])).

Note however that with measurable functions the sup norm of the difference between the polynomial and the measurable function does not have to be small. For example the function could have very different values on sets of measure zero or sets of very small measure.

I also note that the Weierstrass theorem is the key tool in the infinite-dimensional generalization of the proof that normal linear operators have complete sets of eigenfunctions. This is called the spectral theorem. It is used all the time in quantum mechanics.


0.4 Lecture 4:

The trick for proving the Weierstrass theorem is to construct very sharply peaked polynomials that integrate to 1 on [a, b].

I start by transforming the interval [a, b] to an interior subinterval of the standard interval [0, 1] using

x′ = [(x − a)(1 − σ) + (b − x)σ]/(b − a)    (52)

where 1/2 > σ > 0. In this case x = a → x′ = σ and x = b → x′ = 1 − σ. The problem reduces to finding a polynomial approximation that is accurate on the interior subinterval.

Next we extend the function g(x′) := f(x(x′)), which is defined and continuous on the subinterval [σ, 1 − σ], to a continuous function on all of [0, 1]. This can be done by setting g(x′) := g(σ) for x′ < σ and g(x′) := g(1 − σ) for x′ > 1 − σ.

It is sufficient to show that g(x) can be uniformly approximated by a polynomial on the interior interval [σ, 1 − σ].

Let 1 > δ > 0 and define the functions

A_n(δ) := ∫_δ^1 (1 − y²)^n dy    (53)

To prove the Weierstrass theorem I will need the result

lim_{n→∞} A_n(δ)/A_n(0) = 0    (54)

To show this note

A_n(δ) = ∫_δ^1 (1 − y²)^n dy ≤ (1 − δ²)^n (1 − δ) ≤ (1 − δ²)^n    (55)

and

A_n(0) = ∫_0^1 (1 − y²)^n dy ≥ ∫_0^1 (1 − y)^n dy = 1/(n + 1)    (56)

Using these together I get

A_n(δ)/A_n(0) ≤ (n + 1)(1 − δ²)^n    (57)


which vanishes as n → ∞ for 0 < δ < 1.

The desired Weierstrass polynomial approximation to g(x) is

P_{2n}(x) := [1/(2A_n(0))] ∫_0^1 g(z)[1 − (z − x)²]^n dz = [1/(2A_n(0))] ∫_{−x}^{1−x} g(x + y)[1 − y²]^n dy    (58)

By inspection it is clear that this is a polynomial of degree 2n in the variable x.

Pick any ε > 0 and choose δ so that for |y| < δ

|g(x+ y) − g(x)| < ε (59)

This can be done because the interval is closed and bounded. The proof of this statement is non-trivial and is based on a compactness argument.

Consider

P_{2n}(x) = [1/(2A_n(0))] [∫_{−x}^{−δ} + ∫_{−δ}^{δ} + ∫_{δ}^{1−x}] g(y + x)(1 − y²)^n dy    (60)

where without loss of generality I can always choose δ small enough so that δ < x.

We estimate all three integrals

P2n(x) = I1(n) + I2(n) + I3(n) (61)

where

|I_1(n)| := [1/(2A_n(0))] |∫_{−x}^{−δ} g(y + x)(1 − y²)^n dy| ≤ [|g_max|/(2A_n(0))] |∫_{−x}^{−δ} (1 − y²)^n dy| = [|g_max|/(2A_n(0))] |∫_δ^x (1 − y²)^n dy| ≤ |g_max| A_n(δ)/(2A_n(0)) → 0    (62)

and

|I_3(n)| := [1/(2A_n(0))] |∫_δ^{1−x} g(y + x)(1 − y²)^n dy| ≤ [|g_max|/(2A_n(0))] |∫_δ^{1−x} (1 − y²)^n dy| ≤ |g_max| A_n(δ)/(2A_n(0)) → 0    (63)

and

I_2(n) := [1/(2A_n(0))] ∫_{−δ}^{δ} g(y + x)(1 − y²)^n dy = [1/(2A_n(0))] ∫_{−δ}^{δ} (g(x) + g(y + x) − g(x))(1 − y²)^n dy
= g(x) [2A_n(0) − 2A_n(δ)]/(2A_n(0)) + R_n,  where  |R_n| ≤ ε [2A_n(0) − 2A_n(δ)]/(2A_n(0)) ≤ ε    (64)

using ∫_{−δ}^{δ} (1 − y²)^n dy = 2[A_n(0) − A_n(δ)] and |g(y + x) − g(x)| < ε for |y| < δ.

This has the form g(x) + ε + (terms that vanish as n → ∞). This means that

|P_{2n}(x) − g(x)|    (65)

can be made as small as desired by choosing ε small enough and n large enough. This completes the proof of the Weierstrass theorem.
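The construction can be tested numerically. This sketch evaluates (58) by quadrature for an assumed continuous g on [0, 1] and checks that the sup-norm error on an interior subinterval decreases as n grows:

```python
import numpy as np

# Sketch: evaluate the Weierstrass polynomial (58) by quadrature for an
# assumed continuous g on [0, 1], and watch the sup-norm error on an
# interior subinterval shrink as n grows.
g = lambda z: np.abs(z - 0.5)                # continuous but not smooth

z = np.linspace(0.0, 1.0, 200001)
dz = z[1] - z[0]

def P2n(xs, n):
    An0 = np.sum((1 - z ** 2) ** n) * dz     # A_n(0) by quadrature
    return np.array([np.sum(g(z) * (1 - (z - xi) ** 2) ** n) * dz / (2 * An0)
                     for xi in xs])

xs = np.linspace(0.3, 0.7, 41)               # interior subinterval
err_50 = np.max(np.abs(P2n(xs, 50) - g(xs)))
err_400 = np.max(np.abs(P2n(xs, 400) - g(xs)))
```

The kernel [1 − (z − x)²]^n sharpens as n grows, so the smoothing error near the kink of g shrinks, exactly as in the estimates (62)-(64).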

What we get from the Weierstrass theorem is an explicit basis for all of the L^p spaces of functions on a finite interval [a, b]. These are well understood functions that can be used in explicit computations. On L²([a, b]) these basis functions can be replaced by orthogonal polynomials using the Gram-Schmidt method. The orthogonal polynomials defined this way on the interval [−1, 1] are proportional to the Legendre polynomials (to be discussed later).
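As an illustration of the last remark, a minimal Gram-Schmidt sketch on the monomials over [−1, 1] (with grid quadrature standing in for the integrals) reproduces a normalized Legendre polynomial:

```python
import numpy as np

# Sketch: Gram-Schmidt on the monomials 1, x, x^2, x^3 on [-1, 1] with
# unit weight, using grid quadrature for the inner products. The result
# matches the (normalized) Legendre polynomials, as stated in the text.
x = np.linspace(-1.0, 1.0, 200001)
dx = x[1] - x[0]

def inner(f, g):
    return np.sum(f * g) * dx

basis = []
for n in range(4):
    p = x ** n
    for q in basis:                          # subtract projections on earlier members
        p = p - inner(q, p) * q
    basis.append(p / np.sqrt(inner(p, p)))

P2 = (3 * x ** 2 - 1) / 2                    # Legendre P_2
P2 = P2 / np.sqrt(inner(P2, P2))             # normalized the same way
mismatch = np.max(np.abs(basis[2] - P2))
ortho = abs(inner(basis[1], basis[2]))
```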


0.5 Lecture 5:

Now that we know that the orthogonal polynomials are a basis for a number of function spaces, it is useful to discuss properties of orthogonal polynomials with different choices of weight functions.

While in principle it is possible to start with any positive weight function w(x) on the interval [a, b] and construct orthogonal polynomials using the Gram-Schmidt method, there is a special set of orthogonal polynomials, called the classical orthogonal polynomials, that arise in many physics applications. These polynomials are all constructed using similar methods.

These polynomials are constructed using the following formula, called a generalized Rodrigues formula:

C_n(x) = [1/w(x)] d^n/dx^n (w(x) s^n(x))    (66)

where

(1.) s(x) is a polynomial of degree ≤ 2 in x with real roots.

(2.) C1(x) is a polynomial of degree 1 in x.

(3.) w(x) is a positive weight function satisfying w(a)s(a) = w(b)s(b) = 0.

We will show that the functions C_n(x) are orthogonal polynomials on the interval [a, b] with weight w(x), i.e.

∫_a^b w(x) C_n(x) C_m(x) dx = K_n δ_nm    (67)

where K_n is a positive normalization constant. These can be made orthonormal by replacing C_n(x) by C_n(x)/√K_n.

In addition to naturally appearing in applications, classical polynomials of high order are easily generated by computer.

In what follows all of these properties will be established. I begin by considering consequences of the assumptions above.

(1.) The Rodrigues formula applied to C_1(x) implies

C_1(x) = [1/w(x)] d/dx (w(x)s(x)) = [1/w(x)] (dw(x)/dx) s(x) + ds(x)/dx    (68)


This equation can be written as

(dw(x)/dx) s(x) = w(x)(C_1(x) − ds(x)/dx) = w(x) p_{≤1}(x)    (69)

where p_{≤1}(x) is a polynomial of degree one or less. This is because C_1(x) has degree 1 and s(x) is of degree two or less. Thus

(dw(x)/dx) s(x) = w(x) p_{≤1}(x)    (70)

(2.) Next consider

d^m/dx^m (w(x) s^n(x) p_{≤k}(x)) = d^{m−1}/dx^{m−1} ((dw(x)/dx) s^n(x) p_{≤k}(x) + n w(x) s^{n−1}(x) (ds(x)/dx) p_{≤k}(x) + w(x) s^n(x) dp_{≤k}(x)/dx)    (71)

Next factor w(x) s^{n−1}(x) out of the parentheses, using the result of (1.), to get

= d^{m−1}/dx^{m−1} (w(x) s^{n−1}(x) (p_{≤1}(x) p_{≤k}(x) + n (ds(x)/dx) p_{≤k}(x) + s(x) dp_{≤k}(x)/dx))    (72)

which has the general form

d^m/dx^m (w(x) s^n(x) p_{≤k}(x)) = d^{m−1}/dx^{m−1} (w(x) s^{n−1}(x) p_{≤k+1}(x))    (73)

Continuing by mathematical induction, it follows that repeating this m ≤ n times gives

d^m/dx^m (w(x) s^n(x) p_{≤k}(x)) = w(x) s^{n−m}(x) p_{≤k+m}(x)    (74)

(3.) Next note that for m < n the result (2.) implies

d^m/dx^m (w(x) s^n(x) p_{≤k}(x)) = 0 at x = a or x = b    (75)

This follows because a factor of w(x)s(x) can be factored out of the right side of (74) when m < n.


I use these three results to show that the C_n(x) are (a.) polynomials and (b.) mutually orthogonal.

First note that equation (74) with k = 0 and m = n gives

C_n(x) = [1/w(x)] d^n/dx^n (w(x) s^n(x)) = [1/w(x)] w(x) s^0(x) p_{≤n}(x) = p_{≤n}(x)    (76)

which establishes that C_n(x) is a polynomial of degree no higher than n.

Next I observe that for n > m

∫_a^b C_n(x) P_m(x) w(x) dx = ∫_a^b [1/w(x)] P_m(x) w(x) d^n/dx^n (w(x) s^n(x)) dx    (77)

Integrating by parts, using w(a)s(a) = w(b)s(b) = 0, gives

∫_a^b C_n(x) P_m(x) w(x) dx = (−1) ∫_a^b (dP_m(x)/dx) d^{n−1}/dx^{n−1} (w(x) s^n(x)) dx    (78)

Repeating this n − 1 more times gives

∫_a^b C_n(x) P_m(x) w(x) dx = (−1)^n ∫_a^b (d^n P_m(x)/dx^n) w(x) s^n(x) dx = 0    (79)

because n > m. This shows that C_n(x) is orthogonal to all polynomials of degree m < n. Since C_n(x) has degree no more than n, and it cannot be orthogonal to itself, it must be a polynomial of degree exactly n.

This shows that the C_n(x) are indeed orthogonal polynomials on [a, b] with weight w(x).

The requirements on the classical orthogonal polynomials are somewhat restrictive. In fact they are so restrictive that it is possible to classify all of the classical orthogonal polynomials.

I start with the case s = k where k is a constant. Then the Rodrigues formula for C_1(x) becomes

c_1 + c_2 x = C_1(x) = [1/w(x)] d/dx (k w(x))    (80)

This gives

d/dx ln(w(x)) = c_1/k + (c_2/k) x    (81)


and

w(x) = exp((c_2/2k) x² + (c_1/k) x + c_0)    (82)

This function will vanish at x = ±∞ provided that c_2/(2k) < 0.

By completing the square in the exponent and rescaling the resulting variable I reduce the general case to

s = 1,  w(x) = e^{−x²},  a = −∞,  b = +∞    (83)

The corresponding orthogonal polynomials are called Hermite polynomials.They appear in solutions to the three dimensional quantum harmonic oscil-lator. The polynomials satisfy

$$\int_{-\infty}^{\infty} H_n(x)H_m(x)\,e^{-x^2}\,dx = 0 \qquad m \ne n \qquad (84)$$
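The orthogonality relation (84) can be checked numerically. The sketch below assumes numpy's Gauss-Hermite routine, which builds the weight e^{−x²} into its quadrature weights:

```python
import numpy as np
from numpy.polynomial import hermite

# Gauss-Hermite nodes and weights; the weight e^{-x^2} is absorbed into wts
x, wts = hermite.hermgauss(20)
H2 = hermite.hermval(x, [0, 0, 1])      # H_2(x)
H3 = hermite.hermval(x, [0, 0, 0, 1])   # H_3(x)
assert abs(np.sum(wts * H2 * H3)) < 1e-6   # m != n: the integral (84) vanishes
assert np.sum(wts * H2 * H2) > 0           # m == n: nonzero norm
```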

Next I consider the case that s(x) = k1 + k2x. In this case the Rodrigues formula for C1(x) gives

$$c_1 + c_2 x = C_1(x) = \frac{1}{w(x)}\frac{d}{dx}\bigl((k_1 + k_2 x)w(x)\bigr) \qquad (85)$$

This can be expressed as

$$\frac{c_1 - k_2 + c_2 x}{k_1 + k_2 x} = \frac{d\ln(w(x))}{dx} \qquad (86)$$

The left side of this equation can be written as

$$\frac{(c_2/k_2)(k_2 x + k_1) + c_1 - k_2 - c_2 k_1/k_2}{k_1 + k_2 x} = \frac{c_2}{k_2} + \frac{c_1 - k_2 - c_2 k_1/k_2}{k_1 + k_2 x} = \frac{d\ln(w(x))}{dx} \qquad (87)$$

Integrating both sides of this equation gives

$$\frac{c_2}{k_2}x + \frac{c_1 - k_2 - c_2 k_1/k_2}{k_2}\ln(k_1 + k_2 x) = \ln(w(x)) + c_0' \qquad (88)$$

Evaluating the weight function gives

$$w(x) = (k_1 + k_2 x)^{\frac{c_1 - k_2 - c_2 k_1/k_2}{k_2}}\, e^{c_0 + \frac{c_2}{k_2}x} \qquad (89)$$


In this case s(x)w(x) = 0 requires a finite endpoint at x = −k1/k2 with (c1 − k2 − c2k1/k2)/k2 > −1, together with x = ∞ if c2/k2 < 0 or x = −∞ if c2/k2 > 0. By suitable transformations I can put this in the form

$$w(x) = e^{-x}x^{\nu}, \quad \nu > -1, \qquad s(x) = x, \qquad a = 0,\; b = \infty \qquad (90)$$

The corresponding classical polynomials are called the Associated Laguerre polynomials. These polynomials appear in the wave functions for the Hydrogen atom.
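A quick orthogonality check for the ν = 0 case of (90) (a sketch assuming numpy's Gauss-Laguerre routine, which builds the weight e^{−x} on [0, ∞) into its weights):

```python
import numpy as np
from numpy.polynomial import laguerre

# nu = 0 Laguerre case: weight e^{-x} on [0, infinity)
x, wts = laguerre.laggauss(20)
L2 = laguerre.lagval(x, [0, 0, 1])      # L_2(x)
L3 = laguerre.lagval(x, [0, 0, 0, 1])   # L_3(x)
assert abs(np.sum(wts * L2 * L3)) < 1e-6   # distinct degrees are orthogonal
```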

The last case is when s(x) = k0(x − k1)(x − k2). The Rodrigues formula for C1(x) gives

$$c_1 + c_2 x = \frac{1}{w(x)}\frac{d}{dx}\bigl(w(x)k_0(x - k_1)(x - k_2)\bigr) = \frac{d\ln(w(x))}{dx}\,k_0(x - k_1)(x - k_2) + k_0(x - k_2) + k_0(x - k_1) \qquad (91)$$

This can be written in the form

$$\frac{(c_1 + k_0 k_1 + k_0 k_2) + x(c_2 - 2k_0)}{k_0(x - k_1)(x - k_2)} = \frac{d\ln(w(x))}{dx} \qquad (92)$$

To compute this we use the rational function expansion that we proved last semester:

$$\frac{(c_1 + k_0 k_1 + k_0 k_2) + x(c_2 - 2k_0)}{k_0(x - k_1)(x - k_2)} = \frac{(c_1 + k_0 k_1 + k_0 k_2) + k_1(c_2 - 2k_0)}{k_0(k_1 - k_2)}\,\frac{1}{x - k_1} + \frac{(c_1 + k_0 k_1 + k_0 k_2) + k_2(c_2 - 2k_0)}{k_0(k_2 - k_1)}\,\frac{1}{x - k_2} \qquad (93)$$

Integrating both sides of this equation gives

$$\frac{(c_1 + k_0 k_1 + k_0 k_2) + k_1(c_2 - 2k_0)}{k_0(k_1 - k_2)}\ln(x - k_1) + \frac{(c_1 + k_0 k_1 + k_0 k_2) + k_2(c_2 - 2k_0)}{k_0(k_2 - k_1)}\ln(x - k_2) = \ln(w(x)) + d_1 \qquad (94)$$

If we define

$$\nu_1 = \frac{(c_1 + k_0 k_1 + k_0 k_2) + k_1(c_2 - 2k_0)}{k_0(k_1 - k_2)} \qquad (95)$$


$$\nu_2 = \frac{(c_1 + k_0 k_1 + k_0 k_2) + k_2(c_2 - 2k_0)}{k_0(k_2 - k_1)} \qquad (96)$$

then

$$w(x) = e^{-d_1}(x - k_1)^{\nu_1}(x - k_2)^{\nu_2} \qquad (97)$$

Note that

$$w(x)s(x) = k_0 e^{-d_1}(x - k_1)^{\nu_1 + 1}(x - k_2)^{\nu_2 + 1} \qquad (98)$$

vanishes at k1 and k2 provided ν1, ν2 > −1. In this case the corresponding orthogonal polynomials, P_n^{ν1 ν2}, are related to the Jacobi polynomials, which are associated with k1 = −1 and k2 = 1.

There are several special cases of the Jacobi polynomials:

ν1 = ν2 = λ − 1/2 : Gegenbauer Cλn(x)
ν1 = ν2 = 0 : Legendre Pn(x)
ν1 = ν2 = −1/2 : Tchebichef 1st kind Tn(x)
ν1 = ν2 = 1/2 : Tchebichef 2nd kind Un(x)
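The ν1 = ν2 = −1/2 case can be checked numerically; this sketch assumes numpy's Gauss-Chebyshev routine, which builds the weight (1 − x²)^{−1/2} on [−1, 1] into its quadrature weights:

```python
import numpy as np
from numpy.polynomial import chebyshev

# Tchebichef (Chebyshev) 1st kind: nu1 = nu2 = -1/2, weight (1 - x^2)^(-1/2)
x, wts = chebyshev.chebgauss(16)
T2 = chebyshev.chebval(x, [0, 0, 1])              # T_2(x)
T5 = chebyshev.chebval(x, [0, 0, 0, 0, 0, 1])     # T_5(x)
assert abs(np.sum(wts * T2 * T5)) < 1e-8   # distinct degrees are orthogonal
```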


0.6 Lecture 6:

The classical orthogonal polynomials often arise as solutions of ordinary differential equations. Because of this it is useful to recognize the form of the differential equation satisfied by each of the classical orthogonal polynomials.

To derive the differential equation start with

$$\frac{d}{dx}\Bigl[w(x)s(x)\underbrace{\frac{dC_n(x)}{dx}}_{P_{n-1}(x)}\Bigr] = w(x)s(x)^{1-1}P_{n-1+1}(x) \qquad (99)$$

The polynomial on the right hand side of this equation can be expanded as a linear combination of the Cm(x) for m ≤ n. I write this as

$$P_{n-1+1}(x) = -\sum_{m=0}^{n}\lambda_{nm}C_m(x) \qquad (100)$$

First I show that λnm = 0 unless m = n. To do this multiply the above expression by Cl(x) with l ≤ n and integrate from a to b:

$$\int_a^b C_l(x)\frac{d}{dx}\Bigl(w(x)s(x)\frac{dC_n(x)}{dx}\Bigr)dx = -\sum_{m=0}^{n}\int_a^b w(x)C_l(x)\lambda_{nm}C_m(x)\,dx = -\lambda_{nl}K_l \qquad (101)$$

where

$$K_l := \int_a^b w(x)C_l^2(x)\,dx \qquad (102)$$

Integrate the left side of the above equation twice by parts, using the fact that w(x)s(x) vanishes at the endpoints, to get

$$\int_a^b \frac{d}{dx}\Bigl(\frac{dC_l(x)}{dx}w(x)s(x)\Bigr)C_n(x)\,dx = \int_a^b w(x)P_l(x)C_n(x)\,dx \qquad (103)$$


which vanishes unless l = n. This leads to the differential equation

$$\frac{1}{w(x)}\frac{d}{dx}\Bigl[s(x)w(x)\frac{dC_n(x)}{dx}\Bigr] + \lambda_{nn}C_n(x) = 0 \qquad (104)$$

In order to obtain the explicit form of this equation I need to determine the coefficient λnn.

First note

$$C_1(x) = \frac{1}{w(x)}\frac{d}{dx}\bigl(s(x)w(x)\bigr). \qquad (105)$$

Using this in the differential equation gives

$$0 = \frac{1}{w(x)}\frac{d}{dx}\Bigl[s(x)w(x)\frac{dC_n(x)}{dx}\Bigr] + \lambda_{nn}C_n(x) = s(x)\frac{d^2C_n(x)}{dx^2} + C_1(x)\frac{dC_n(x)}{dx} + \lambda_{nn}C_n(x) \qquad (106)$$

If

$$C_n(x) = k_{nn}x^n + k_{nn-1}x^{n-1} + \cdots \qquad (107)$$

then comparing coefficients of x^n gives the relation

$$\frac{1}{2!}\frac{d^2s(x)}{dx^2}\,n(n-1)k_{nn} + nk_{11}k_{nn} + \lambda_{nn}k_{nn} = 0 \qquad (108)$$

which gives

$$-\lambda_{nn} = nk_{11} + \frac{n(n-1)}{2}\frac{d^2s(x)}{dx^2} \qquad (109)$$

where, because s is at most second degree, d²s/dx² is a constant independent of x.

This gives an explicit expression for all coefficients in the differential equation in terms of s and the leading coefficient of C1(x).
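For the Hermite case (s = 1, w = e^{−x²}) equation (105) gives C1(x) = −2x, so k11 = −2 and (109) gives λnn = 2n; the differential equation (104) then reads H″ − 2xH′ + 2nH = 0. A symbolic check of this instance (a sketch using sympy):

```python
import sympy as sp

x = sp.symbols('x')
n = 4
# Hermite instance of (104): s = 1, C_1(x) = -2x, lambda_nn = 2n
H = sp.hermite(n, x)
ode = sp.diff(H, x, 2) - 2 * x * sp.diff(H, x) + 2 * n * H
assert sp.simplify(ode) == 0
```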


0.7 Lecture 7:

While the classical orthogonal polynomials can be computed using the Rodrigues formula, they can be computed very efficiently using recursion relations, which are satisfied by all orthogonal polynomials.

To formulate the recursion relations we assume the normalization

$$\int_a^b w(x)C_n(x)C_m(x)\,dx = \delta_{mn} \qquad (110)$$

and

$$C_n(x) = k_{nn}x^n + k_{nn-1}x^{n-1} + p_{\le n-2}(x) \qquad (111)$$

Using these relations it follows that

$$C_{n+1}(x) = k_{n+1\,n+1}x^{n+1} + k_{n+1\,n}x^n + p_{\le n-1}(x) \qquad (112)$$

I can cancel off the coefficient of x^{n+1} by taking the difference

$$C_{n+1}(x) - \frac{k_{n+1\,n+1}}{k_{nn}}\,xC_n(x) = p_{\le n} = \sum_{l=0}^{n}a_{n+1,l}C_l(x) \qquad (113)$$

Multiplying by Cn−1(x)w(x) and integrating over [a, b] using the orthogonality relations (110) gives

$$-\frac{k_{n+1\,n+1}}{k_{nn}}\int_a^b w(x)\,xC_n(x)C_{n-1}(x)\,dx = a_{n+1\,n-1} \qquad (114)$$

Since

$$xC_{n-1}(x) = \frac{k_{n-1\,n-1}}{k_{nn}}C_n(x) + p_{\le n-1} \qquad (115)$$

the above integral becomes

$$a_{n+1\,n-1} = -\frac{k_{n+1\,n+1}}{k_{nn}}\,\frac{k_{n-1\,n-1}}{k_{nn}} \qquad (116)$$

The coefficient of x^n in equation (113) gives the relation

$$k_{n+1\,n}x^n - \frac{k_{n+1\,n+1}}{k_{nn}}k_{nn-1}x^n = a_{n+1\,n}k_{nn}x^n \qquad (117)$$

or

$$a_{n+1\,n} = \frac{1}{k_{nn}}\Bigl(k_{n+1\,n} - \frac{k_{n+1\,n+1}k_{nn-1}}{k_{nn}}\Bigr) \qquad (118)$$


This gives the relation

$$C_{n+1}(x) = (A_nx + B_n)C_n(x) - D_nC_{n-1}(x) \qquad (119)$$

where

$$A_n = \frac{k_{n+1\,n+1}}{k_{nn}} \qquad (120)$$

$$B_n = \frac{1}{k_{nn}}\Bigl(k_{n+1\,n} - \frac{k_{n+1\,n+1}k_{nn-1}}{k_{nn}}\Bigr) \qquad (121)$$

$$D_n = \frac{k_{n+1\,n+1}}{k_{nn}}\,\frac{k_{n-1\,n-1}}{k_{nn}} \qquad (122)$$

where in these recursion relations the normalization is assumed to be 1. Similar relations hold with more general normalizations.

Equation (119) can be used to generate all of the orthogonal polynomials in terms of the first two.
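As an illustration, the three-term recursion specialized to the Legendre polynomials (standard, non-unit normalization, where the known coefficients are A_n = (2n+1)/(n+1), B_n = 0, D_n = n/(n+1)) can be sketched as:

```python
import numpy as np

def legendre_recursion(n, x):
    """Evaluate P_0 .. P_n at x by the three-term recursion of type (119):
    (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    P = [np.ones_like(x), x]
    for k in range(1, n):
        P.append(((2 * k + 1) * x * P[k] - k * P[k - 1]) / (k + 1))
    return P

x = np.linspace(-1, 1, 5)
P = legendre_recursion(3, x)
# compare against the closed forms P_2 = (3x^2 - 1)/2, P_3 = (5x^3 - 3x)/2
assert np.allclose(P[2], (3 * x**2 - 1) / 2)
assert np.allclose(P[3], (5 * x**3 - 3 * x) / 2)
```

Only the first two polynomials and the recursion coefficients are needed, as the text states.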

The differential equations are useful for recognizing when certain classical orthogonal polynomials arise as solutions of dynamical equations. The recursion relations are useful for generating the different orthogonal polynomials. Both the recursion relation and differential equation need explicit expressions for the coefficients of the leading two powers of x:

$$C_n(x) = k_{nn}x^n + k_{nn-1}x^{n-1} + \cdots \qquad (123)$$

These can be determined directly from the Rodrigues formula. They are given on pages 212-214 of the text, along with the standard normalizations.

As discussed previously the orthogonal polynomials are a complete set of functions on the Hilbert space of square integrable functions with weight w

$$\langle f|g\rangle = \int_a^b w(x)f^*(x)g(x)\,dx \qquad (124)$$

(Technically the Weierstrass theorem does not apply to the Laguerre or Hermite polynomials, whose intervals are infinite; however they can also be shown to be complete using a different proof.)

To use these bases I need to know how to express arbitrary vectors in this basis (in what follows I assume that the polynomials are orthonormal)

$$f(x) = \sum_{n=0}^{\infty}P_n(x)\langle P_n|f\rangle \qquad (125)$$


where

$$\langle P_n|f\rangle = \int_a^b w(x)P_n^*(x)f(x)\,dx \qquad (126)$$

In practice the infinite sum is truncated. This is usually reasonable because of the Parseval relation, which ensures that the sum of the squares of the discarded coefficients gets small as more coefficients are retained. There is still the matter of computing the inner products (126) involving the retained basis functions.

In practice these integrals cannot be done analytically for an arbitrary function. They need to be approximated. There is a class of approximations that are closely related to orthogonal polynomials with a given weight.

To motivate the problem of Gaussian quadrature let w(x) be a positive weight function on the interval [a, b].

Problem: Find N points {xi} and weights {wi} such that for any polynomial of degree 2N − 1

$$\int_a^b w(x)P(x)\,dx = \sum_{n=1}^{N}P(x_n)w_n \qquad (127)$$

Before I discuss the solution of this problem I discuss how these quadrature rules can be used. Assume that f(x) is a function representing a vector in the Hilbert space. Assume that f(x) can be accurately approximated by a polynomial of degree N. Let {xn} and {wn} be the points and weights for an N point quadrature with weight w(x). Then

$$\langle f|P_n\rangle = \int_a^b w(x)P_n^*(x)f(x)\,dx \approx \sum_{m=1}^{N}w_mf(x_m)P_n(x_m) \qquad (128)$$

This formula is exact if f(x) is exactly a polynomial of degree N.

To construct the points and weights associated with a given weight function w(x), begin by assuming that the {xn}, n = 1, …, N, are known and define the polynomial

$$P_N(x) = \prod_{n=1}^{N}(x - x_n) = \sum_{n=0}^{N}p_nx^n \qquad (129)$$

By construction the points {xn} are roots of PN(x):

$$P_N(x_n) = 0, \qquad n = 1,\cdots,N \qquad (130)$$


Define

$$Q_m(x) := P_N(x)x^m, \qquad m = 0,\cdots,N-1 \qquad (131)$$

Since PN(x) can be factored out of each Qm(x), the points {xn} are also roots of each Qm(x):

$$Q_m(x_n) = 0, \qquad n = 1,\cdots,N, \quad m = 0,\cdots,N-1 \qquad (132)$$

To find the desired quadrature points require

$$\int_a^b Q_m(x)w(x)\,dx = \sum_{n=1}^{N}Q_m(x_n)w_n = 0 \qquad (133)$$

or equivalently

$$\int_a^b w(x)P_N(x)x^m\,dx = 0 \qquad (134)$$

Using the expansion PN(x) = Σ pnx^n I can write

$$0 = \sum_{n=0}^{N}p_n\int_a^b x^{m+n}w(x)\,dx \qquad (135)$$

It follows from (129) that pN = 1, so this becomes a linear system for the coefficients of the polynomial PN(x):

$$\sum_{n=0}^{N-1}I_{mn}p_n = q_m \qquad (136)$$

where

$$I_{mn} := \int_a^b x^{m+n}w(x)\,dx \qquad (137)$$

$$q_m := -\int_a^b x^{N+m}w(x)\,dx \qquad (138)$$

The coefficients pn can be determined by solving the linear system (136). The zeros of this polynomial are the quadrature points.
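The moment-matching construction (136)-(138) can be carried out explicitly for w(x) = 1 on [−1, 1] (a sketch; the closed-form `moment` helper is an assumption used so the moment integrals (137)-(138) are exact):

```python
import numpy as np

N = 3
def moment(p):
    # integral of x^p over [-1, 1]: zero for odd p, 2/(p+1) for even p
    return 0.0 if p % 2 else 2.0 / (p + 1)

# build the linear system (136) from the moments (137)-(138)
I = np.array([[moment(m + n) for n in range(N)] for m in range(N)])
q = np.array([-moment(N + m) for m in range(N)])
p = np.linalg.solve(I, q)                 # coefficients p_0 .. p_{N-1}; p_N = 1
roots = np.roots(np.r_[1.0, p[::-1]]).real  # numpy wants highest power first
# the quadrature points agree with the roots of the degree-3 Legendre polynomial
assert np.allclose(np.sort(roots), [-np.sqrt(3/5), 0.0, np.sqrt(3/5)])
```

The resulting monic polynomial is x³ − (3/5)x, whose roots are the familiar 3-point Gauss-Legendre nodes.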

To connect this with the discussion of orthogonal polynomials note that the equation

$$\int_a^b w(x)P_N(x)x^m\,dx = 0, \qquad m = 0,\cdots,N-1 \qquad (139)$$


means that, up to an overall constant multiplier, PN(x) is the degree-N orthogonal polynomial on the interval [a, b] with weight w. Thus the points xn are the roots of the degree-N orthogonal polynomial on [a, b] with weight w(x).

Given the quadrature points, it is still necessary to calculate the weights. To do this define the polynomials

$$L_n(x) := \frac{\prod_{m\ne n}(x - x_m)}{\prod_{m\ne n}(x_n - x_m)}, \qquad L_n(x_m) = \delta_{mn} \qquad (140)$$

Using these polynomials

$$\int_a^b w(x)L_m(x)\,dx = \sum_n w_nL_m(x_n) = \sum_n \delta_{mn}w_n = w_m \qquad (141)$$

which gives the integral expressions for the weights

$$w_n = \int_a^b w(x)L_n(x)\,dx \qquad (142)$$

By accurately performing the integrals (142) once, and computing the roots of PN(x), it is possible to accurately integrate a large class of functions on [a, b].
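The weights (142) can be checked for the 3-point Gauss-Legendre rule (w(x) = 1 on [−1, 1]); this sketch uses numpy's polynomial helpers to integrate each Lagrange polynomial exactly:

```python
import numpy as np

# the 3-point Gauss-Legendre nodes (roots of the degree-3 Legendre polynomial)
nodes = np.array([-np.sqrt(3/5), 0.0, np.sqrt(3/5)])

w = []
for n in range(3):
    others = np.delete(nodes, n)
    c = np.poly(others)                  # product of (x - x_m), m != n
    c = c / np.polyval(c, nodes[n])      # normalize so that L_n(x_n) = 1
    antider = np.polyint(c)              # exact antiderivative of L_n
    w.append(np.polyval(antider, 1) - np.polyval(antider, -1))  # eq. (142)

# the known Gauss-Legendre weights
assert np.allclose(w, [5/9, 8/9, 5/9])
```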

While roots and weights still have to be evaluated, there exist standard subroutines and tables that give the points and weights for Gauss quadratures associated with all of the classical orthogonal polynomials.

The most common of the quadratures is the Gauss-Legendre quadrature. In this case the quadrature points are the zeros of the N-th Legendre polynomial. It is associated with the weight w(x) = 1 on the interval [−1, 1]. It can be linearly transformed to any other interval. To integrate functions that are not well-approximated by polynomials, it is possible to break the domain of integration into subintervals, and apply Gauss quadrature methods to each subinterval.
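A short usage example with numpy's Gauss-Legendre routine, including the linear map to another interval (the interval [0, 2] is an arbitrary choice for illustration):

```python
import numpy as np

# N-point Gauss-Legendre integrates polynomials of degree 2N - 1 exactly
x, w = np.polynomial.legendre.leggauss(5)
assert np.isclose(np.sum(w * x**8), 2/9)    # exact: integral of x^8 over [-1, 1]

# linear map of [-1, 1] onto [0, 2]: substitute x -> 1 + x (Jacobian 1)
assert np.isclose(np.sum(w * np.exp(1 + x)), np.exp(2) - 1)
```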


0.8 Lecture 8:

The Muntz-Szasz Theorem: Let 0 < λ1 < λ2 < · · · < ∞. The set {1, t^{λ1}, t^{λ2}, · · ·} is dense in the continuous functions on [0, 1] if and only if

$$\sum_{n=1}^{\infty}\frac{1}{\lambda_n} = \infty \qquad (143)$$

Obviously Σn 1/n = ∞, which is consistent with the Weierstrass theorem. What is interesting is that this divergence condition is still satisfied if we remove any finite subset of the λn's from the basis set. For example the odd or even powers are separately dense on [0, 1].

This result does not apply to orthogonal polynomials, which is clear from the definition of basis. It is special to the infinite dimensional case.

I will outline the proof of the result, only because it uses properties of analytic functions derived last semester.

To prove the Muntz-Szasz theorem I observe that if the set of functions {x^{λn}} is not a basis, then there is a continuous function φ(x) such that

$$\int_0^1 x^{\lambda_n}\phi(x)\,dx = 0 \qquad (144)$$

for all n. By contradiction I assume that a non-zero φ(x) satisfying (144) exists. For that function I define

$$f_\phi(z) := \int_0^1 t^z\phi(t)\,dt, \qquad z = x + iy \qquad (145)$$

Note that

$$t^z = e^{z\ln(t)} = t^xt^{iy} = e^{x\ln(t)}e^{iy\ln(t)} = t^xe^{iy\ln(t)} \qquad (146)$$

For x > 0 and t ∈ (0, 1]

$$\lim_{t\to 0}\bigl|e^{x\ln(t)}e^{iy\ln(t)}\bigr| = \lim_{t\to 0}e^{-x|\ln(t)|} = 0 \qquad (147)$$

It follows that t^z is analytic in the right half plane and

$$|t^z| = t^x = e^{x\ln(t)} \le 1 \qquad (148)$$


To demonstrate the analyticity note that t^z is continuous for x > 0 and satisfies the Cauchy-Riemann equations:

$$t^z = u + iv = e^{x\ln(t)}\cos(y\ln(t)) + ie^{x\ln(t)}\sin(y\ln(t)) \qquad (149)$$

$$\frac{\partial u}{\partial x} = \ln(t)\,u = \frac{\partial v}{\partial y} \qquad (150)$$

$$\frac{\partial u}{\partial y} = -\ln(t)\,v = -\frac{\partial v}{\partial x} \qquad (151)$$

In the integral representation

$$f_\phi(z) := \int_0^1 t^z\phi(t)\,dt \qquad (152)$$

for z in the right half plane, t^z is analytic in z for any t ∈ [0, 1] and t^zφ(t) is continuous in t for t ∈ [0, 1]. Theorem 2 on page 40 of the text (or page 56 of last semester's notes) implies that f(z) is analytic in the right half plane.

Next note that

$$|f(z)| \le \int_0^1|t^z||\phi(t)|\,dt = \int_0^1 t^x|\phi(t)|\,dt \le \int_0^1|\phi(t)|\,dt \le \max_{t\in[0,1]}|\phi(t)| = C < \infty \qquad (153)$$

The last step follows because a continuous function on a closed interval isnecessarily bounded.

The condition that

$$f_\phi(\lambda_n) := \int_0^1 t^{\lambda_n}\phi(t)\,dt = 0 \qquad (154)$$

means that f(z) has a zero at each of the real exponents λn. What we have established is that f(z) is a bounded analytic function on the right half plane with an infinite number of zeros on the positive real axis. I use the conformal mapping

$$z = \frac{1 + z'}{1 - z'} \qquad z' = \frac{z - 1}{z + 1} \qquad (155)$$

to map the right half plane to the interior of the unit disc

$$z = \frac{1 + re^{i\phi}}{1 - re^{i\phi}} = \frac{1 - r^2 + 2ir\sin(\phi)}{1 + r^2 - 2r\cos(\phi)} \qquad (156)$$


It is not hard to check that z takes on all values in the right half complex plane as z′ varies over the interior of the unit disk.

Thus

g(z′) = f(1 + z′

1 − z′) (157)

is a bounded analytic function on the disk with and infinite number of zerosat

αn =λn − 1

λn + 1(158)

These zeros accumulate at the point z′ = 1. Consider the contour integral

$$\frac{1}{2\pi i}\int \frac{1 - z}{1 + z}\,\frac{1}{g(z)}\frac{dg}{dz}(z)\,dz \qquad (159)$$

where the contour is in the unit circle enclosing the interval [−1 + ε, 1 − δ], and we take the limit δ → 0. The function

$$\frac{1}{g(z)}\frac{dg}{dz}(z) \qquad (160)$$

has isolated poles at the αn. The integrand vanishes at z = 1 and the point −1 is avoided by the choice of contour. The theorem on page 85 implies that

$$\frac{1}{2\pi i}\int \frac{1 - z}{1 + z}\,\frac{1}{g(z)}\frac{dg}{dz}(z)\,dz = \sum_n \frac{1 - \alpha_n}{1 + \alpha_n} + \cdots = \sum_n \frac{1}{\lambda_n} + \text{other terms} \qquad (161)$$

The other terms come from (a) additional roots in the right half plane or (b) higher multiplicities of the roots λn. In all cases they have positive real parts.

Since the integrand is bounded on this contour it follows that

$$\sum_n \frac{1}{\lambda_n} < \Bigl|\sum_n \frac{1}{r_n}\Bigr| = C < \infty \qquad (162)$$

where the second sum is over all roots (counted with multiplicity) in the right half plane.

If the sum on the left is finite, then it is possible that there is a non-zero continuous function φ(x) orthogonal to all of the x^{λn}; if on the other hand the sum is infinite then there are no continuous functions φ(t) orthogonal to all of the t^{λn}'s. This means that the set {t^{λn}} is complete.


Clearly the powers x^n satisfy Σ_{n=1}^∞ 1/n = ∞, which is consistent with what we learned from the Weierstrass theorem. What is interesting is that we can remove a finite, and in some cases infinite, number of terms from this basis and still get a basis.

This theorem does not apply if we orthogonalize our basis functions using the Gram-Schmidt method. If we remove one function from an orthonormal basis we have a non-trivial function that is orthogonal to the rest of the basis functions, which means that the remaining functions are not a basis.
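The Muntz-Szasz phenomenon can be seen numerically: the even powers alone (for which Σ 1/λn still diverges) can approximate the odd function f(t) = t on [0, 1]. This is a sketch with assumed helper names, using least-squares fits as a stand-in for best uniform approximation:

```python
import numpy as np

t = np.linspace(0, 1, 400)
f = t

def best_fit_error(num_terms):
    # least-squares fit with basis 1, t^2, t^4, ..., t^(2(num_terms-1))
    A = np.column_stack([t ** (2 * k) for k in range(num_terms)])
    c, *_ = np.linalg.lstsq(A, f, rcond=None)
    return np.max(np.abs(A @ c - f))

errs = [best_fit_error(n) for n in (2, 5, 10)]
# the approximation error keeps shrinking as more even powers are added
assert errs[0] > errs[1] > errs[2]
```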


0.9 Lecture 9:

In this section the Weierstrass approximation theorem is used to introduceFourier series.

Let f(θ), θ ∈ [−π, π] be continuous and periodic, f(π) = f(−π). Since this function is periodic it follows that f(r, θ) = rf(θ) is a continuous function in the plane, where r and θ are considered as polar coordinates. Let

$$x = r\cos(\theta) \qquad y = r\sin(\theta) \qquad (163)$$

$$g(x, y) = g(r\cos(\theta), r\sin(\theta)) = f(r, \theta) \qquad (164)$$

Since g(x, y) is continuous, it is also continuous on x ∈ [−1, 1], y ∈ [−1, 1]. We can apply the Weierstrass theorem separately to each variable. The proof is a straightforward extension of the one-variable proof. This means that there is a polynomial

$$G_{mn} = \sum_{k,l=0}^{m,n}g_{kl}x^ky^l \qquad (165)$$

It is useful to choose the polynomial in x and y to have the same order (m = n). This can be done by choosing the maximum order to be the larger of m or n. The Weierstrass theorem means that for any ε > 0 there is a sufficiently large N such that

$$\Bigl|g(x, y) - \sum_{m,n=0}^{N}g_{mn}x^my^n\Bigr| < \epsilon \qquad (166)$$

This result still holds if it is expressed in terms of polar coordinates:

$$\Bigl|f(r, \theta) - \sum_{m,n=0}^{N}g_{mn}r^{m+n}\cos^m(\theta)\sin^n(\theta)\Bigr| < \epsilon \qquad (167)$$

Since this holds uniformly for all x, y in the square of side length 2 centered about the origin, we can set r = 1 and the inequality is still valid, which gives

$$\Bigl|f(\theta) - \sum_{m,n=0}^{N}g_{mn}\cos^m(\theta)\sin^n(\theta)\Bigr| < \epsilon \qquad (168)$$

This series converges uniformly and pointwise for all θ ∈ [−π, π].


It looks more symmetric to replace

$$\cos(\theta) = \frac{e^{i\theta} + e^{-i\theta}}{2} \qquad \sin(\theta) = \frac{e^{i\theta} - e^{-i\theta}}{2i} \qquad (169)$$

Using the binomial theorem we have

$$\cos^m(\theta) = \frac{1}{2^m}\sum_{k=0}^{m}\frac{m!}{k!(m-k)!}e^{i(2k-m)\theta} \qquad (170)$$

$$\sin^m(\theta) = \frac{1}{(2i)^m}\sum_{k=0}^{m}\frac{m!\,(-1)^{m-k}}{k!(m-k)!}e^{i(2k-m)\theta} \qquad (171)$$

For both sums −m ≤ 2k − m ≤ m.

This leads to an alternative expression for the series

$$\Bigl|f(\theta) - \sum_{m=-2N}^{2N}c_me^{im\theta}\Bigr| < \epsilon \qquad (173)$$

This shows that any continuous periodic function can be uniformly pointwise approximated by a trigonometric polynomial of sufficiently high order.

The basis functions e^{imθ} have the nice property that they are orthogonal for different values of m:

$$\int_{-\pi}^{\pi}e^{-im\theta}e^{in\theta}\,d\theta = 2\pi\delta_{mn} \qquad (174)$$

We define the orthonormal basis functions

$$|e_m\rangle \qquad \langle\theta|e_m\rangle = \frac{1}{\sqrt{2\pi}}e^{im\theta} \qquad (175)$$

Equation (174) is equivalent to

$$\langle e_m|e_n\rangle = \int_{-\pi}^{\pi}\langle e_m|\theta\rangle\langle\theta|e_n\rangle\,d\theta = \delta_{mn} \qquad (176)$$

We can use the Weierstrass theorem to calculate expansion coefficients for any continuous periodic function. This does not use the inner product.


An alternative is to choose the coefficients to minimize the L²[−π, π] norm of the difference:

$$\Bigl\|\,|f\rangle - \sum_{m=-N}^{N}f_m|e_m\rangle\Bigr\|^2 = \langle f|f\rangle + \sum_m|f_m|^2 - \sum_m\bigl(f_m^*\langle e_m|f\rangle + f_m\langle f|e_m\rangle\bigr)$$

Adding and subtracting Σ|⟨em|f⟩|² gives

$$= \langle f|f\rangle - \sum_m\bigl|\langle e_m|f\rangle\bigr|^2 + \sum_m\bigl|f_m - \langle e_m|f\rangle\bigr|^2$$

Bessel's inequality tells us that

$$\sum_n\bigl|\langle f|e_n\rangle\bigr|^2 \le \langle f|f\rangle \qquad (177)$$

while the term that depends on the fm is non-negative. The minimum is achieved only when

$$f_m = \langle e_m|f\rangle \qquad (178)$$

This leads to the Fourier expansion of the periodic function f(θ)

$$|f\rangle = \sum_{m=-\infty}^{\infty}|e_m\rangle\langle e_m|f\rangle \qquad (179)$$

This converges for continuous functions because the error is smaller than the Weierstrass bound, which goes to zero.
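The expansion (178)-(179) can be tried numerically. This sketch computes the coefficients ⟨em|f⟩ for the continuous periodic test function f(θ) = |θ| by a Riemann sum (the function choice and helper names are assumptions):

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
dt = theta[1] - theta[0]
f = np.abs(theta)

def coeff(m):
    # f_m = <e_m|f> with <theta|e_m> = e^{i m theta} / sqrt(2 pi)
    return np.sum(np.exp(-1j * m * theta) * f) * dt / np.sqrt(2 * np.pi)

def partial_sum(N):
    s = np.zeros_like(theta, dtype=complex)
    for m in range(-N, N + 1):
        s += coeff(m) * np.exp(1j * m * theta) / np.sqrt(2 * np.pi)
    return s.real

# the truncation error shrinks as more terms are kept
e5 = np.max(np.abs(partial_sum(5) - f))
e50 = np.max(np.abs(partial_sum(50) - f))
assert e50 < e5 < 1.0
```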


0.10 Lecture 10:

There are various generalizations of the convergence result that extend to some non-continuous functions. For example if f(θ) is piecewise differentiable on finite subintervals and periodic then

$$\frac{1}{2}\bigl(f(\theta^+) + f(\theta^-)\bigr) = \sum_{n=-\infty}^{+\infty}\langle\theta|e_n\rangle\langle e_n|f\rangle \qquad (180)$$

where

$$f(\theta^\pm) = \lim_{\epsilon\to 0^+}f(\theta \pm \epsilon) \qquad (181)$$

If θ is a point of continuity then f(θ±) = f(θ). The proof of this is similar to the proof of the next theorem, where we let the finite interval approach the real line. I will prove the following important result:

Theorem: Let f(x) be absolutely integrable and piecewise differentiable on the real line. Then

$$\frac{1}{2}\bigl(f(x^+) + f(x^-)\bigr) = \frac{1}{\pi}\int_0^\infty d\lambda\int_{-\infty}^{\infty}f(y)\cos(\lambda(x - y))\,dy \qquad (182)$$

To prove this I replace the integral by

$$\frac{1}{\pi}\int_0^\infty d\lambda\int_{-\infty}^{\infty}f(y)\cos(\lambda(x - y))\,dy = \lim_{\Lambda\to\infty}\frac{1}{\pi}\int_0^\Lambda d\lambda\int_{-\infty}^{\infty}f(y)\cos(\lambda(x - y))\,dy \qquad (183)$$

Since

$$f(y)\cos(\lambda(x - y)) \qquad (184)$$

is absolutely integrable on [0, Λ] × (−∞, ∞), I can change the order of the integration to get

$$\frac{1}{\pi}\int_0^\infty d\lambda\int_{-\infty}^{\infty}f(y)\cos(\lambda(x - y))\,dy = \lim_{\Lambda\to\infty}\frac{1}{\pi}\int_{-\infty}^{\infty}dy\,f(y)\int_0^\Lambda d\lambda\,\cos(\lambda(x - y)) = \lim_{\Lambda\to\infty}\frac{1}{\pi}\int_{-\infty}^{\infty}dy\,f(y)\,\frac{\sin(\Lambda(x - y))}{x - y} \qquad (185)$$


Next let y′ = y − x, y = y′ + x (note that sin(ax)/x is even), to get

$$\lim_{\Lambda\to\infty}\frac{1}{\pi}\int_{-\infty}^{\infty}dy'\,f(x + y')\,\frac{\sin(\Lambda y')}{y'} \qquad (186)$$

Last semester we used contour integrals to calculate

$$\int_0^\infty\frac{\sin(\Lambda y)}{y}\,dy = \frac{\pi}{2} \qquad \int_{-\infty}^0\frac{\sin(\Lambda y)}{y}\,dy = \frac{\pi}{2} \qquad (187)$$

Using these integrals we can write our integral as

I − 1

2(f(x+) − f(x−)) = lim

Λ→∞

[∫ ∞

0

(f(y + x) − f(x+))sin(Λy)

ydy+

∫ 0

−∞

(f(y + x) − f(x−))sin(Λy)

ydy

]

(188)

The problem is to show that the two integrals on the right side of this equation vanish as Λ → ∞. I break the first integral up into four parts (the second is treated the same way):

$$\int_0^\epsilon\bigl(f(y + x) - f(x^+)\bigr)\frac{\sin(\Lambda y)}{y}\,dy \qquad (189)$$

$$\int_\epsilon^A\bigl(f(y + x) - f(x^+)\bigr)\frac{\sin(\Lambda y)}{y}\,dy \qquad (190)$$

$$\int_A^\infty f(y + x)\frac{\sin(\Lambda y)}{y}\,dy \qquad (191)$$

$$-f(x^+)\int_A^\infty\frac{\sin(\Lambda y)}{y}\,dy \qquad (192)$$


0.11 Lecture 11:

The integrand in (189) is uniformly bounded. This is because the function has a right-handed derivative and f(x + y) approaches f(x+) as y → 0 from above. The integral can be made as small as desired by choosing ε small enough.

The difference in (190) is piecewise differentiable. It can be expressed as a finite sum of integrals of differentiable functions on disjoint subintervals. This means that we can write

$$\sin(\Lambda y) = -\frac{1}{\Lambda}\frac{d}{dy}\cos(\Lambda y) \qquad (193)$$

This can be integrated by parts. Since everything is finite, after integrating by parts this behaves like a constant times 1/Λ. For any fixed ε and A this vanishes as Λ → ∞.

The integral in (191) can be made as small as desired, using the absolute integrability of f(y)/y on [A, ∞), by choosing A sufficiently large. In practice one first chooses A large enough so this is small, then chooses Λ large enough to make the other integrals small.

The integral in (192) can be transformed to

$$\int_{\Lambda A}^\infty\frac{\sin(u)}{u}\,du \to 0 \qquad (194)$$

as Λ → ∞. Because the integrand oscillates it is useful to write the integral as a sum of positive terms

$$\sum_{n=m}^{\infty}\int_{2n\pi}^{2(n+1)\pi}\frac{\sin(u)}{u}\,du \qquad (195)$$

Since the full sum converges, and the terms are all positive, this tail vanishes as m → ∞.

The other integral can be made to vanish in the same way. The final result is

$$I = \frac{1}{2}\bigl(f(x^+) + f(x^-)\bigr) = \frac{1}{\pi}\int_0^\infty d\lambda\int_{-\infty}^{\infty}f(y)\cos(\lambda(x - y))\,dy \qquad (196)$$

The more familiar form of this result is obtained by replacing cos(λ(x − y)) by its exponential form:

$$\frac{1}{2}\bigl(f(x^+) + f(x^-)\bigr) = \frac{1}{2\pi}\int_0^\infty d\lambda\int_{-\infty}^{\infty}f(y)\bigl(e^{i\lambda(x - y)} + e^{-i\lambda(x - y)}\bigr)\,dy = \frac{1}{2\pi}\int_{-\infty}^{\infty}d\lambda\int_{-\infty}^{\infty}e^{i\lambda(x - y)}f(y)\,dy \qquad (197)$$

When f(x) is continuous this gives

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}d\lambda\int_{-\infty}^{\infty}e^{i\lambda(x - y)}f(y)\,dy \qquad (198)$$

We define the Fourier transform of f(x) by

$$f(k) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iky}f(y)\,dy \qquad (199)$$

The inverse Fourier transform is given by

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ikx}f(k)\,dk \qquad (200)$$

While this relation looks symmetric, except for the sign in the exponent, we started with a piecewise differentiable, absolutely integrable function f(x). We have not established that f(k) has any of these properties.

While there are a number of distinct relations between properties of functions and their Fourier transforms, one of the most useful relations involves the space S(R) of Schwartz functions of a real variable.

The useful property is that the Fourier transform of a Schwartz function is a Schwartz function.

A function f(x) is a Schwartz function, written f(x) ∈ S(R), if and only if f(x) has an infinite number of derivatives and f(x) and each of its derivatives fall off faster than any inverse polynomial in the sense:

$$\lim_{|x|\to\infty}\Bigl|x^n\frac{d^mf}{dx^m}(x)\Bigr| = 0 \quad \text{for all } n, m \qquad (201)$$

While this condition sounds restrictive, the functions

$$f_n(x) = \frac{1}{\sqrt{h_n}}H_n(x)e^{-x^2/2}, \qquad (202)$$

where Hn(x) are the Hermite polynomials, and the hn are the normalization integrals, are all Schwartz functions.
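The Schwartz property of the Hermite functions can be spot-checked symbolically (a sketch using sympy with an unnormalized Hermite function, since the normalization does not affect the decay):

```python
import sympy as sp

x = sp.symbols('x')
f3 = sp.hermite(3, x) * sp.exp(-x**2 / 2)   # unnormalized Hermite function f_3
# any power of x times any derivative still vanishes at infinity, as in (201)
expr = x**7 * sp.diff(f3, x, 2)
assert sp.limit(expr, x, sp.oo) == 0
assert sp.limit(expr, x, -sp.oo) == 0
```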


0.12 Lecture 12:

While finite linear combinations of Schwartz functions are Schwartz functions, infinite linear combinations are not necessarily Schwartz functions. This follows because the functions fn(x) in (202) above are known to be an orthonormal basis for the Hilbert space of square integrable functions on the real line, but we know that there are many square integrable functions that do not have as many derivatives or fall off as fast as Schwartz functions.

It turns out that the Schwartz functions are members of a complete metric space defined using an infinite collection of norms:

$$\|f\|_{mn} := \sup_{x\in\mathbb{R}}\Bigl|x^n\frac{d^mf}{dx^m}(x)\Bigr| \qquad (203)$$

$$\rho(f - g) := \sum_{mn}\frac{1}{2^{m+n}}\|f - g\|_{mn} \qquad (204)$$

This is the only place in this course where inner product spaces and normed linear spaces are insufficient. It is worth noting that there are different equivalent ways to write this metric that appear in the literature. Two metrics are equivalent if every Cauchy sequence in one metric is Cauchy in the other metric.

Note that

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iky}\,y^n\frac{d^mf}{dy^m}(y)\,dy = (i)^{m+n}\frac{d^n}{dk^n}\bigl(k^mf(k)\bigr) \qquad (205)$$

which means that the roles of m and n get interchanged under Fourier transforms. Note that

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iky}f(y)\,dy = \frac{-i}{k}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-iky}\frac{df}{dy}(y)\,dy \qquad (206)$$

Since df/dy is also a Schwartz function the integral has a bound independent of k, so the Fourier transform of a Schwartz function vanishes as |k| → ∞. Because derivatives of Schwartz functions are Schwartz functions, and polynomials times Schwartz functions are Schwartz functions, the Fourier transforms of these functions also vanish in the same limit. Since polynomials times derivatives of Schwartz functions get Fourier transformed into powers and derivatives, these Fourier transforms also vanish as |k| → ∞. This shows that the Schwartz functions are closed under Fourier transform.


A useful characterization of Schwartz functions can be given in terms of their expansion coefficients in the orthonormal basis of Hermite functions

$$\langle x|n\rangle = \frac{1}{\sqrt{h_n}}H_n(x)e^{-\frac{1}{2}x^2}. \qquad (207)$$

Assume that

$$f(x) = \sum_{n=0}^{\infty}\langle x|n\rangle f_n \qquad (208)$$

Then

$$f(x) \in L^2(\mathbb{R}) \iff \sum_{n=0}^{\infty}|f_n|^2 < \infty \qquad (209)$$

and it can be shown that

$$f(x) \in S(\mathbb{R}) \iff \sum_{n=0}^{\infty}|f_n|^2(1 + n)^m < \infty \quad \text{for all } m \ge 0 \qquad (210)$$

Thus we see that Schwartz functions have restrictive growth conditions on their expansion coefficients in the Hermite basis.

Later we will extend Fourier transforms to a larger class of functions. Fourier transforms are especially useful for treating linear partial differential equations. For example consider the wave equation

$$\Bigl[\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2}\Bigr]u(x, t) = 0 \qquad (211)$$

If we express u(x, t) in terms of its Fourier transform

$$u(x, t) = \frac{1}{2\pi}\int u(k, \omega)e^{i\omega t + ikx}\,dk\,d\omega \qquad (212)$$

then equation (211) can be written as

$$\frac{1}{2\pi}\int\Bigl(-\frac{\omega^2}{c^2} + k^2\Bigr)u(k, \omega)e^{i\omega t + ikx}\,dk\,d\omega = 0 \qquad (213)$$

This will vanish provided

$$\omega = \pm ck \qquad (214)$$

which leads to a general solution of the form

$$u(x, t) = \frac{1}{2\pi}\Bigl[\int f_l(k)e^{ik(x + ct)}\,dk + \int f_r(k)e^{ik(x - ct)}\,dk\Bigr] \qquad (215)$$

$$= \frac{1}{\sqrt{2\pi}}\bigl(f_l(x + ct) + f_r(x - ct)\bigr) \qquad (216)$$

In this case fl(x) and fr(x) are arbitrary functions. fl(x + ct) represents a disturbance moving to the left while fr(x − ct) represents a disturbance moving to the right. For each wave number k, ω = ±ck is the angular frequency of the corresponding plane wave.

It is easy to check that functions of this general form are solutions of the wave equation. This application is typical of how Fourier transforms are used. The important property is that derivatives get converted into multiplication operators. In this simple case the partial differential equation was reduced to the algebraic equation c²k² − ω² = 0.
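The check is quick to do symbolically. This sketch picks arbitrary smooth functions for fl and fr (the specific choices and the wave speed c = 2 are assumptions for illustration only):

```python
import sympy as sp

x, t = sp.symbols('x t')
c = 2  # an arbitrary wave speed for the check
# a d'Alembert-type solution f_l(x + ct) + f_r(x - ct)
u = sp.exp(-(x + c * t)**2) + sp.sin(x - c * t)
wave = sp.diff(u, t, 2) / c**2 - sp.diff(u, x, 2)   # the operator in (211)
assert sp.simplify(wave) == 0
```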

Returning to the original Fourier transformation, for Schwartz functions we have shown that

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}d\lambda\int_{-\infty}^{\infty}e^{i\lambda(x - y)}f(y)\,dy \qquad (217)$$

If I change the order of the integrals without first cutting off the limit of integration in the λ integral I get the quantity

$$\delta(x - y) = \frac{1}{2\pi}\int_{-\infty}^{\infty}d\lambda\,e^{i\lambda(x - y)}, \qquad (218)$$

which does not define a convergent integral. However δ(x − y) formally satisfies

$$f(x) = \int\delta(x - y)f(y)\,dy \qquad (219)$$

In terms of the Lebesgue integral, if δ(x − y) represented a function it would have to be zero except on a set of measure zero. This would make the Lebesgue integral zero; so we cannot think of δ(x − y) as a function. It is usually called a Dirac delta function. Quantities like the Dirac delta function are called generalized functions or distributions.

We can represent the integral of a Dirac delta function multiplied by a function as a limit of integrals over a sequence of functions multiplied by the same function. For example

$$\delta(x - y) = \lim_{\Lambda\to\infty}\frac{1}{\pi}\,\frac{\sin(\Lambda(x - y))}{x - y} \qquad (220)$$


has the property

$$\lim_{\Lambda\to\infty}\int f(y)\,\frac{1}{\pi}\,\frac{\sin(\Lambda(x - y))}{x - y}\,dy = f(x) = \int f(y)\delta(x - y)\,dy. \qquad (221)$$

This makes sense provided the y integral is performed before taking the limit. The delta function can also be represented by the sequence

$$\delta_n(x) = \begin{cases}n & -\frac{1}{2n} < x < \frac{1}{2n}\\ 0 & |x| \ge \frac{1}{2n}\end{cases} \qquad (222)$$

which also has the property

$$\lim_{n\to\infty}\int f(y)\delta_n(x - y)\,dy = f(x) = \int f(y)\delta(x - y)\,dy. \qquad (223)$$

The integrals on the left exist for each n and the limit exists if f is continuous. There are many other sequences of functions that can represent the "Dirac delta function" δ(x − y). All that matters is that (1) they integrate to 1 and (2) the region that contributes most of the integral eventually contains only the point x.
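The box sequence (222) can be tried numerically on a smooth test function; this sketch (with the assumed test function cos and a midpoint-rule integral) shows the smeared integrals approaching f(0) = 1 as n grows:

```python
import numpy as np

def smeared(n, pieces=10000):
    # midpoint-rule approximation of integral of cos(y) * delta_n(y)
    dy = (1.0 / n) / pieces
    y = -1 / (2 * n) + dy * (np.arange(pieces) + 0.5)  # midpoints on the box
    return np.sum(np.cos(y) * n) * dy

# the integrals approach f(0) = cos(0) = 1 as the box narrows
assert abs(smeared(10) - 1.0) < 1e-3
assert abs(smeared(1000) - 1.0) < 1e-7
```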

While there are many ways to represent delta functions, the basic rule

$$\int f(y)\delta(x - y)\,dy = f(x) \qquad (224)$$

is simple, and can be implemented without the use of sequences of auxiliary functions or the computation of complicated integrals.

What we can observe about the rule (224) is that (1) it is linear in f and (2) its value is a complex number. This is a property that it shares with the integral of a fixed function multiplied by f.

Definition: A tempered distribution is a continuous linear functional on the space of Schwartz functions.

The space of tempered distributions is denoted by S′(R). Thus if L is a tempered distribution, α is complex, and f(x) and g(x) are Schwartz functions, then

Schwartz functions, then

L[f + αg] = L[f ] + αL[g] (225)

and if

$$\lim_{l\to\infty}\rho(f_l - g) := \lim_{l\to\infty}\sum_{mn}\frac{1}{2^{m+n}}\|f_l - g\|_{mn} = 0 \qquad (226)$$


then

$$\lim_{l\to\infty}L[f_l] = L[g] \qquad (227)$$

In the case of the delta function

$$L[f] = f(x) \qquad (228)$$

defines a tempered distribution since

$$L[f + \alpha g] = f(x) + \alpha g(x) \qquad (229)$$

and if

$$\rho(f_n - g) \to 0 \qquad (230)$$

then

$$\lim_{n\to\infty}L[f_n] = L\bigl[\lim_{n\to\infty}f_n\bigr] = L[g] = g(x) \qquad (231)$$

because the

limn→∞

ρ(fn − g) → 0 ⇒ supy∈R|fn(y) − g(y)| → 0 ⇒

|fn(x) − g(x)| → 0 (232)

Many tempered distributions can be written as integrals. In a Hilbert space of square integrable functions, each element |f⟩ ∈ H defines a linear functional on H by the inner product

$$L_f[g] = \int f^*(x)g(x)\,dx \qquad (233)$$

These linear functionals are also tempered distributions because they are also continuous on the subspace of Schwartz functions. If I expand f(x) and g(x) in terms of the basis functions

$$\langle x|n\rangle = \frac{1}{\sqrt{h_n}}H_n(x)e^{-x^2/2} \qquad (234)$$

$$|f\rangle = \sum|n\rangle f_n \qquad (235)$$

$$|g\rangle = \sum|n\rangle g_n \qquad (236)$$

then the Schwartz inequality

$$|\langle f|g\rangle|^2 = \Bigl|\sum_n f_n^*g_n\Bigr|^2 \le \Bigl[\sum_n|f_n|^2\Bigr]\Bigl[\sum_m|g_m|^2\Bigr] \qquad (237)$$


ensures that this scalar product is well defined.

When g(x) is a Schwartz function the coefficients gn fall off faster than any inverse power of n. This means that it is possible to relax the fall-off conditions on f so it is no longer square integrable, but still gives a finite scalar product. Specifically, if I require

$$\sum_n|g_n|^2(1 + n)^m < \infty \quad \text{for all } m \qquad (238)$$

and

$$\sum_n|f_n|^2\,\frac{1}{(1 + n)^m} < \infty \quad \text{for some } m \qquad (239)$$

then the Schwartz inequality gives

$$\sum|f_n^*g_n| = \sum\Bigl|\frac{f_n}{\sqrt{(1 + n)^m}}\,\sqrt{(1 + n)^m}\,g_n\Bigr| \le \Bigl[\sum_n\frac{|f_n|^2}{(1 + n)^m}\Bigr]^{1/2}\Bigl[\sum_n(1 + n)^m|g_n|^2\Bigr]^{1/2} \qquad (240)$$

This means that if g is a Schwartz function and f is a tempered distribution this "inner product" is still finite.

Condition (239) is one way to characterize tempered distributions. We see that the tempered distributions have less restrictive growth conditions than square integrable functions. One consequence of the Hermite representation of tempered distributions is that any tempered distribution, including the delta function, can be approximated by an expansion in the Hermite basis functions

$$L = \sum_n c_n^*\langle n| \qquad (241)$$

The only new property is that the cn are restricted so that

$$\sum_n\frac{|c_n|^2}{(1 + n)^m} < \infty \qquad (242)$$

for some m, so

$$L[f] = \sum_n c_n^*\int\langle n|x\rangle\,dx\,\langle x|f\rangle \qquad (243)$$


If I consider

$$\sum_n c_n^*\int\frac{d\langle n|x\rangle}{dx}\,dx \qquad (244)$$

and apply this to a Schwartz function ⟨x|f⟩:

$$L'[f] = \sum_n c_n^*\int\frac{d\langle n|x\rangle}{dx}\,dx\,\langle x|f\rangle = \qquad (245)$$

$$L\Bigl[-\frac{df}{dx}\Bigr] = -\sum_n c_n^*\int\langle n|x\rangle\,dx\,\frac{d\langle x|f\rangle}{dx} \qquad (246)$$

Interchanging the order of the sum and integral is valid provided ⟨x|f⟩ is a Schwartz function and the coefficients cn satisfy the growth condition for a tempered distribution. Note that there are no boundary contributions when integrating by parts because the Schwartz functions and all of their derivatives vanish as x → ±∞. Since derivatives of Schwartz functions are Schwartz functions, the right hand side of this expression is always defined if L is a tempered distribution.

This leads us to define the derivative of the tempered distribution L by

L′[f] = −L[df/dx]   (247)

This can be repeated n times to get the n-th derivative of a tempered distribution

L^{(n)}[f] := (−1)^n L[d^n f/dx^n]   (248)

Clearly tempered distributions have an infinite number of derivatives, because the Schwartz functions have an infinite number of derivatives. These derivatives are only well defined on Schwartz functions; they may not be defined on an ordinary square integrable function.

As an example consider the Heaviside function

θ(x) = { 1   x > 0
       { 0   x ≤ 0   (249)

which defines the tempered distribution

L[f] = ∫ θ(x) f(x) dx = ∫_0^∞ f(x) dx   (250)

L′[f] = −L[f′] = −∫ θ(x) (df(x)/dx) dx = −∫_0^∞ (df(x)/dx) dx = −f(∞) + f(0) = f(0)   (251)

But this has the same value as the distribution δ(x). This gives the distributional identity

dθ(x)/dx = δ(x)   (252)

We will use this identity when we discuss Green's functions. It is useful to establish some additional properties of tempered distributions.
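As a quick numerical sanity check of the identity (252) — not part of the lecture — we can evaluate θ′[f] = −θ[f′] = −∫_0^∞ f′(x) dx on a Gaussian test function (an illustrative choice of Schwartz function) and confirm that it reproduces f(0):

```python
import numpy as np

# Check the distributional identity d(theta)/dx = delta, eq. (252):
# theta'[f] = -∫ theta(x) f'(x) dx should equal f(0).
# Test function f(x) = exp(-x^2/2), so f'(x) = -x exp(-x^2/2) and f(0) = 1.
dx = 1.0e-4
x = np.arange(-20.0, 20.0, dx)           # wide grid so f vanishes at the ends
fprime = -x * np.exp(-x**2 / 2.0)        # analytic f'(x)
theta = np.where(x > 0.0, 1.0, 0.0)      # Heaviside function, eq. (249)

lhs = -np.sum(theta * fprime) * dx       # Riemann sum for -∫ theta f' dx
print(lhs)                               # close to f(0) = 1
```

The sum over the region x > 0 telescopes to f(0) − f(∞), exactly as the integration by parts argument predicts.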

We introduced Schwartz functions because Schwartz functions get mapped into Schwartz functions under the Fourier transform. We can use this property to define the Fourier transform of a tempered distribution.

First let f(x) and g(x) both be Schwartz functions. Then the Fourier transforms of these functions are

f(x) = (1/√(2π)) ∫ dk e^{ikx} f(k)   (253)

g(x) = (1/√(2π)) ∫ dk e^{ikx} g(k).   (254)

Since Schwartz functions are also square integrable, the inner product of f with g is defined:

〈f|g〉 = ∫ f(x)^* g(x) dx = (1/2π) ∫ dx dk₁ dk₂ e^{ix(k₂−k₁)} f^*(k₁) g(k₂) =   (255)

∫ dk₁ f^*(k₁) g(k₁) = 〈f|g〉   (256)

This shows that Fourier transforms preserve the Hilbert space inner product. This property can be used to define the Fourier transform of a tempered distribution. If f is a tempered distribution we can write

f = ∑_{n=0}^∞ f_n φ_n(x)   (257)

φ_n(x) := 〈x|n〉   (258)

We define finite partial sums

f_N(x) := ∑_{n=0}^N f_n φ_n(x).   (259)


The f_N are all Schwartz functions. It follows that if g is a Schwartz function

〈f|g〉 = lim_{N→∞} 〈f_N|g〉 = lim_{N→∞} 〈f̃_N|g̃〉 := 〈f̃|g̃〉   (260)

where the tildes denote Fourier transforms.

This defines the Fourier transform of any tempered distribution.

Let's use this to compute the Fourier transform of a delta function. Start by letting g(x) be a Schwartz function. We can write

〈δ_y|g〉 = ∫ δ(x − y) g(x) dx = g(y) = (1/√(2π)) ∫ e^{iky} g(k) dk   (261)

One can immediately read off the linear functional that acts on g(k):

δ_y(k) = (1/√(2π)) e^{−iky}   (262)

We can also arrive at the same result using the formal rule

δ_y(k) = (1/√(2π)) ∫ δ(x − y) e^{−ikx} dx = (1/√(2π)) e^{−iky}   (263)

It often happens that while an equation may have no solutions in the space of square integrable functions, if the space is enlarged it might have solutions in the larger space. The space of tempered distributions is much larger than the space of square integrable functions.

The Schrödinger equation for a free particle with energy k²,

−d²ψ(x)/dx² = k²ψ(x)   (264)

has solutions of the form

e^{±ikx}   (265)

These solutions are not square integrable functions on ℝ, but they are immediately recognized as Fourier transforms of delta functions. Thus, they can be interpreted as tempered distributions.

These distributional solutions e^{ikx} are complete in the sense that I can write any square integrable function in the form

f(y) = (1/√(2π)) ∫ e^{iky} f(k) dk   (266)

(technically it may have to be written as a limit of Schwartz functions which can be put in this form).

There is a lot more that can be said about distribution theory. Most of the important results can be derived by expressing the distributions as limits of finite sums of Hermite basis functions.

As a practical matter, elementary results can be obtained by acting on the distributions as if they were functions.

The tempered distributions are one example of a much larger class of distributions. Ordinary distributions replace the Schwartz functions by infinitely differentiable functions with bounded support. In this case the corresponding space of distributions is even larger. The delta function is a well defined distribution in both of these spaces.

0.13 Lecture 13:

In this section I begin the discussion of linear operators on infinite dimensional vector spaces. A linear operator L on an infinite dimensional vector space V is a function that maps vectors from a subset of V, called the domain of L, to vectors in another subset of V, called the range of L, satisfying the linearity condition

L(|v〉 + α|w〉) = L(|v〉) + αL(|w〉) (267)

Linear operators fall into several useful classes. We will primarily be concerned with bounded operators, unbounded operators, and compact operators.

As in the finite dimensional case, if V is a normed linear vector space then the norm of the operator L is defined by

|||L||| = sup_{‖|v〉‖=1} ‖L|v〉‖   (268)

The operator L is bounded if L is defined for all vectors in V and |||L||| = C < ∞. Operators that are defined on subsets of V and are not bounded are called unbounded operators. If L is unbounded then it is always possible to find an infinite sequence of unit normalized vectors |v_n〉 such that

‖L|v_n〉‖ > n   (269)


Bounded operators have the property that they are continuous, because if ‖|v_n〉 − |v〉‖ → 0 then

‖L|v_n〉 − L|v〉‖ = ‖L(|v_n〉 − |v〉)‖ ≤ |||L||| · ‖|v_n〉 − |v〉‖ → 0   (270)

Bounded sets in infinite dimensional spaces have some properties that are not shared by bounded sets in finite dimensional spaces. For example, all orthonormal basis vectors fit inside a bounded sphere of radius 1 + ε. The distance between any pair of orthonormal basis vectors is √2. This means that it is possible to put an infinite collection of vectors into this bounded set and they never have to get close to each other.
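A tiny illustration of this point (not from the lecture): in any dimension, distinct orthonormal basis vectors are always exactly √2 apart, so a ball of radius 1 + ε can hold arbitrarily many vectors that never approach one another as the dimension grows.

```python
import numpy as np

# Distinct orthonormal basis vectors e_n are all a distance sqrt(2) apart:
# ||e_n - e_m||^2 = <e_n|e_n> + <e_m|e_m> - 2<e_n|e_m> = 1 + 1 - 0 = 2.
dim = 1000
basis = np.eye(dim)                      # orthonormal basis e_0, ..., e_{dim-1}

dist = np.linalg.norm(basis[3] - basis[777])
print(dist)                              # sqrt(2) = 1.41421...
```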

In a finite dimensional ball, if I have an infinite collection of vectors of norm 1, there has to be a subsequence whose members get closer and closer to each other. This result is called the Bolzano-Weierstrass theorem. The step that was glossed over in the proof of the Weierstrass theorem, where we stated that g(x + y) − g(x) could be made less than some ε for all x in a closed bounded set, for small enough y, is based on a variant of the Bolzano-Weierstrass theorem. Our interest in this theorem is to identify subsets of infinite dimensional vector spaces that look like finite dimensional vector spaces.


0.14 Lecture 14:

Theorem (Bolzano-Weierstrass): Let {x_n} be an infinite collection of points in the unit square [0, 1] × [0, 1]. Then there is an infinite subsequence of the points that converges to a point in the square.

Points in a square can be thought of as vectors. Divide the square into a 10 × 10 array of 100 smaller squares. Label them [m, n], where m, n are integers that go from 0 to 9. At least one of these squares, say [m₁, n₁], has an infinite number of points. Divide that square into a 10 × 10 array of 100 smaller squares, labeled by m₂, n₂. At least one of these contains an infinite number of points. Label this square by [m₁, m₂; n₁, n₂]. This process can be continued, giving successively smaller nested squares [m₁, m₂, ···; n₁, n₂, ···], each containing an infinite number of points. Each of the sequences can be considered as a decimal representation of a number between 0 and 1:

x = .m₁m₂m₃ ···   y = .n₁n₂n₃ ···   (271)

This pair of numbers also identifies a point in the intersection of all of the nested squares. The truncated sequences can be associated with Cauchy sequences of rational numbers. Because of the completeness of the real numbers, these Cauchy sequences converge to a pair of real numbers between zero and one. This pair of numbers defines a vector in the plane.

The limit point is in the intersection of an infinite number of squares, each containing an infinite number of points. Picking one point from each of the nested squares gives a convergent subsequence. This argument can be extended to N dimensions, but it breaks down when N is infinite, because then each subdivision results in an infinite number of cubes of a fixed size, each of which could contain a point.

In infinite dimensional vector spaces bounded sets do not have the Bolzano-Weierstrass property. Subsets of points in infinite dimensional vector spaces that have the Bolzano-Weierstrass property are called compact sets.

Definition: A subset of a vector space is compact if every bounded infinite sequence has a subsequence that converges to a vector in the set.

Compactness depends on the definition of convergence. We will normally be concerned with compact subsets of normed linear spaces. In these spaces convergence is defined in terms of the norm.

What does a compact subset of an infinite dimensional space look like? Consider an infinite sequence of vectors |m〉. I can assume that an infinite number of them are linearly independent, otherwise all of the vectors live in a finite dimensional subspace, which is compact if it is closed (includes its limit points) and bounded.

If this set contains an infinite number of independent vectors with

‖|n〉 − |m〉‖ > ε,   (272)

it cannot be compact, because this subset does not have a convergent subsequence.

For the set to be compact it must be true that for all m, n > N_ε

‖|n〉 − |m〉‖ < ε   (273)

This means that for every ε > 0, every vector in the set can be expressed as the sum of a vector in a finite dimensional subspace plus an additional vector with norm less than ε. As ε gets smaller the dimension of the finite dimensional subspace can get larger.

What this means in words is that compact sets are very thin in all but a finite number of dimensions.


0.15 Lecture 15

Definition C1: A compact linear operator is a bounded linear operator thatmaps bounded sets to sets with compact closure.

In the Hilbert space case compact operators have a simple characterization.

Theorem: A linear operator on a Hilbert space is compact if and only if for every ε > 0 it can be expressed as the sum of a finite rank operator and an operator with operator norm less than ε.

Note that in a Hilbert space with an orthonormal basis {|n〉} we can write a general operator A in the form

A = ∑_{mn} |n〉 A_{nm} 〈m|.   (274)

If I define

〈a_n| := ∑_m A_{nm} 〈m|   (275)

then

A = ∑_n |n〉〈a_n|   (276)

where |a_n〉 is not generally a unit vector. A is a rank N operator if the n sum contains N terms.

I prove this theorem by contradiction. Pick an ε > 0 and assume that there is an infinite collection of orthogonal vectors {|n〉} with norm greater than ε such that there are vectors |w_n〉 with norm 1 satisfying

C|w_n〉 = |n〉   (277)

for all n. If this is the case the image of the set of unit normed vectors is not a compact set, and consequently the operator C cannot be compact. This means that for any ε > 0 I can find a projection operator P_N on an N dimensional subspace such that

C = P_N C + (I − P_N)C   (278)

where ‖(I − P_N)C|v〉‖ < ε. P_N C is the desired finite rank operator.

Conversely, if for every ε > 0 I can find a finite rank F_N with

|||C − FN ||| < ε (279)


then every vector that gets mapped into the orthogonal complement of the range of F_N necessarily gets mapped into a vector of norm less than ε, which means that the range of C has compact closure. The compact operators can thus be understood as the closure of the space of finite rank operators in the operator norm.

This shows that compact operators on Hilbert spaces can be uniformly approximated by finite dimensional matrices.

Theorem: Let B be bounded and C be compact on a Hilbert space. Then A = BC and D = CB are compact.

This theorem means that the compact operators are a two-sided ideal in the space of bounded operators.

To prove this let ε > 0. Since C is compact it is possible to find a finite rank operator C_N such that

|||C − C_N||| < ε.   (280)

It follows that

|||BC − BC_N||| ≤ |||B||| · |||C − C_N||| < ε|||B|||   (281)

and

|||CB − C_N B||| ≤ |||C − C_N||| · |||B||| < ε|||B|||   (282)

Since BC_N and C_N B are finite rank and |||B||| is a fixed finite number, it follows that A and D are compact.

An important result that illustrates the utility of compact operators is the following, which is one of the central results in the theory of integral equations.

Theorem (Fredholm Alternative): If A is a compact operator on a Hilbert space H then either

A|ψ〉 = |ψ〉   (283)

has a non-trivial solution or (I − A)^{−1} exists.

Proof: Since A is compact there exists a finite rank A_N such that

|||A − A_N||| < 1/2.   (284)

It follows that

(I − (A − A_N))^{−1} = ∑_{n=0}^∞ (A − A_N)^n   (285)


where this series converges in norm because

|||∑_{n=0}^∞ (A − A_N)^n||| ≤ ∑_{n=0}^∞ |||(A − A_N)^n||| ≤ ∑_{n=0}^∞ |||A − A_N|||^n ≤ ∑_{n=0}^∞ 1/2^n = 2.   (286)

Next note that

(I − A) = I − (A − A_N) − A_N =

I − (A − A_N) − A_N (I − (A − A_N))^{−1} (I − (A − A_N)) =

(I − A_N (I − (A − A_N))^{−1})(I − (A − A_N)) =

(I − F_N)(I − (A − A_N))   (287)

where

F_N := A_N (I − (A − A_N))^{−1}   (288)

is finite rank. It follows that

(I − A)^{−1} = (I − (A − A_N))^{−1} (I − F_N)^{−1}.   (289)

This will exist provided (I − F_N) has a non-zero determinant. If (I − F_N) has zero determinant, then (I − F_N)|ψ〉 = 0 has at least one non-trivial solution.

Recall that in the finite dimensional case 0 might correspond to a generalized eigenvector of some order. If the order is k > 1, then (I − F_N)^{k−1} applied to the generalized eigenvector gives the desired eigenvector.

Note that the Fredholm Alternative does not hold for bounded operators in general. To see this consider the operator

Aφ(x) = xφ(x)   (290)

on L²([0, 2]). Then

Aφ(x) = φ(x)   (291)

requires (x − 1)φ(x) = 0, which has no non-zero solutions, and (I − A)^{−1} is not a bounded operator because of the singularity at x = 1.

The Fredholm Alternative illustrates how one can extend results from finite dimensional matrices to compact operators.


0.16 Lecture 16

One of the advantages of compact operators is that it is possible to use the error estimates on the approximated operator to compute bounds on the error of the approximate solution. To show this, assume that A is compact and |||A − A_N||| < ε for some rank N approximation A_N. Consider the equation

|f〉 = |g〉 + A|f〉 (292)

The solution to this equation, if it exists, is

|f〉 = (I − A)^{−1}|g〉   (293)

which can be written as

|f〉 = (I − (A − A_N))^{−1}(I − F_N)^{−1}|g〉.   (294)

I can approximate

(I − (A − A_N))^{−1} ≈ ∑_{m=0}^M (A − A_N)^m.   (295)

The error is

|||(I − (A − A_N))^{−1} − ∑_{m=0}^M (A − A_N)^m||| = |||∑_{m=M+1}^∞ (A − A_N)^m||| ≤ ε^{M+1}/(1 − ε)   (296)

This means that the error in the approximation is bounded by

‖|f〉 − ∑_{m=0}^M (A − A_N)^m (I − F_N)^{−1}|g〉‖ ≤ (ε^{M+1}/(1 − ε)) |||(I − F_N)^{−1}||| · ‖|g〉‖   (297)

where

F_N := A_N (I − (A − A_N))^{−1}   (298)


is a rank N operator. In principle, because F_N is finite rank, |||(I − F_N)^{−1}||| can be computed by matrix methods, but in practice the consistent approximation to F_N is

F_N ≈ B_N := A_N ∑_{m=0}^M (A − A_N)^m   (299)

with error

|||F_N − B_N||| ≤ |||A_N||| ε^{M+1}/(1 − ε)   (300)

Using this we can estimate the error in (I − F_N)^{−1} by writing F_N = B_N + ∆, which gives

(I − F_N)^{−1} = (I − B_N)^{−1} + (I − B_N)^{−1} ∆ (I − F_N)^{−1}   (301)

(I − F_N)^{−1} = ∑_{l=0}^∞ ((I − B_N)^{−1} ∆)^l (I − B_N)^{−1}   (302)

If

|||(I − B_N)^{−1}||| < (1 − ε)/ε^{M+1}   (303)

then

|||(I − B_N)^{−1}∆||| = δ < 1   (304)

|||(I − F_N)^{−1} − ∑_{l=0}^K ((I − B_N)^{−1}∆)^l (I − B_N)^{−1}||| ≤ (δ^{K+1}/(1 − δ)) |||(I − B_N)^{−1}|||   (305)

Putting all of the estimates and approximations together,

|f〉 ≈ ∑_{m=0}^M (A − A_N)^m ∑_{l=0}^K ((I − B_N)^{−1}∆)^l (I − B_N)^{−1} |g〉   (306)

which has the desirable property that there is a computable upper bound on the error that can be made as small as desired. This assumes that 1 is not near an eigenvalue of B_N.

Theorem (Riesz-Schauder): Let A be a compact Hermitian operator on a Hilbert space. Then for any ε > 0, A can have only a finite number of eigenvalues λ_n with |λ_n| > ε.

Because A is Hermitian I can assume that the eigenvectors are orthonormal. Let {|n〉} denote the collection of orthonormal eigenvectors with eigenvalues having magnitude larger than ε.


This set of vectors defines an infinite sequence with the property that

‖A|n〉 − A|m〉‖ = ‖λ_n|n〉 − λ_m|m〉‖ = √(λ_n² + λ_m²) > √2 ε   (307)

which contradicts the compactness of A. It follows that the subspace of linear combinations of eigenvectors with eigenvalues |λ_n| > ε has finite dimension.

Theorem (Completeness): Let A be a compact Hermitian operator. Then A has a complete set of eigenvectors with eigenvalues λ_n satisfying

lim_{n→∞} |λ_n| = 0   (308)

To prove this let |||A||| be the operator norm of A. By definition

|||A||| = sup_{‖|v〉‖=1} ‖A|v〉‖   (309)

This definition means that either there is a vector |v〉 with

A|v〉 = ±|||A||| · |v〉   (310)

or there is a sequence of unit normed vectors |v_n〉 with

‖A|v_n〉‖ → |||A|||   (311)

I will show that this second possibility implies the first.

Since A is compact, the sequence A|v_n〉 = |w_n〉 satisfies the Bolzano-Weierstrass property, so it has a subsequence that converges to a vector |w〉 in the Hilbert space (Hilbert spaces are complete).

I claim

A|w〉 = ±|||A||| · |w〉   (312)

To verify this, note that

‖|w〉‖ = |||A|||   (313)

by the definition of the operator norm, and

‖A|w〉‖ ≤ |||A||| · ‖|w〉‖ = |||A|||²   (314)

Pick ε > 0 and |v_n〉 with n large enough so that

‖|w_n〉 − |w〉‖ < ε   (315)


so

‖A|w_n〉‖ ≤ ‖A|w〉‖ + |||A||| · ‖|w_n〉 − |w〉‖ ≤ ‖A|w〉‖ + |||A||| ε   (316)

Let ε′ = ε|||A|||, so

‖A|w〉‖ ≥ ‖A|w_n〉‖ − ε′   (317)

for all n > N_ε. Next note

‖|w_n〉‖² = 〈v_n|AA|v_n〉 ≤ 〈v_n|v_n〉^{1/2} 〈AAv_n|AAv_n〉^{1/2} = ‖A|w_n〉‖   (318)

Using (318) with (317) gives

‖A|w〉‖ ≥ ‖|w_n〉‖² − ε′.   (319)

Since ε is independent of n, for large enough n

lim_{n→∞} ‖|w_n〉‖ = ‖|w〉‖ = |||A|||   (320)

‖A|w〉‖ ≥ ‖|w〉‖²   (321)

Combining (321) with the inequality

|||A||| · ‖|w〉‖ ≥ ‖A|w〉‖   (322)

gives

‖A|w〉‖ = |||A|||²   (323)

Next note

〈w|AA|w〉 ≤ 〈w|w〉^{1/2} 〈w|A⁴|w〉^{1/2}   (324)

The left side of this expression is |||A|||⁴ by (323). The right side is bounded by |||A||| · |||A||| · |||A|||², which gives the inequalities

|||A|||⁴ ≤ 〈w|w〉^{1/2} 〈w|A⁴|w〉^{1/2} ≤ |||A|||⁴   (325)

leading to

|||A|||³ = 〈w|A⁴|w〉^{1/2}   (326)

From these relations I get

〈w|(A² − |||A|||²)(A² − |||A|||²)|w〉 = 〈w|A⁴|w〉 − 2|||A|||²〈w|A²|w〉 + |||A|||⁶ = 0   (327)

This gives

0 = (A² − |||A|||²)|w〉 = (A + |||A|||)(A − |||A|||)|w〉   (328)

which proves that |w〉 is an eigenvector with eigenvalue ±|||A|||. This establishes the existence of one eigenvector |w₁〉 with eigenvalue λ₁ equal to ±|||A|||.

I establish completeness by induction. Define

A₁ = A − |w₁〉λ₁〈w₁|   (329)

where |w₁〉 is normalized to unity. This is a compact Hermitian operator with

A₁|w₁〉 = (λ₁ − λ₁)|w₁〉 = 0   (330)

Any other eigenvector of A is also an eigenvector of A₁ with the same eigenvalue. Repeating our analysis, we find there is an eigenvector with eigenvalue equal (in magnitude) to the largest eigenvalue of A₁, which is the second largest eigenvalue of A. The associated eigenvector is orthogonal to |w₁〉.

This process can be repeated N times, giving

A_N = ∑_{n=1}^N |w_n〉λ_n〈w_n|   (331)

By the Riesz-Schauder theorem there are only a finite number of eigenvalues with magnitude larger than ε. This means

|||A − A_N||| < ε   (332)

lim_{N→∞} A_N = A   (333)

To complete the proof let |v〉 be any Hilbert space vector. Then

A|v〉 − ∑_{n=1}^∞ |w_n〉λ_n〈w_n|v〉 = ∑_{n=1}^∞ |w_n〉(λ_n − λ_n)〈w_n|v〉   (334)

It follows that

|v〉 = ∑_{n=1}^∞ |w_n〉〈w_n|v〉 + |v_⊥〉   (335)


where |v_⊥〉 is the difference. It follows by direct computation that

A|v_⊥〉 = 0   (336)

which means that |v_⊥〉 is either zero or an eigenvector of A with eigenvalue zero. Equation (335) shows that the eigenvectors of A are complete.
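The inductive step of the proof — peel off the largest eigenvalue, deflate with eq. (329), repeat — is easy to watch numerically. In the sketch below (not from the lecture; the matrix is built from chosen eigenvalues for illustration), the operator norm equals the largest |eigenvalue|, and after deflation it equals the next largest:

```python
import numpy as np

# For Hermitian A, |||A||| = max |lambda_n|; deflating with
# A_1 = A - |w_1> lambda_1 <w_1| (eq. (329)) removes the top eigenvalue.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
eigs = np.array([3.0, -2.0, 1.5, -1.0, 0.5, 0.1])
A = Q @ np.diag(eigs) @ Q.T              # Hermitian with known spectrum

norm_A = np.linalg.norm(A, 2)            # operator norm = 3.0 = |lambda_1|
w1 = Q[:, 0]                             # normalized eigenvector for lambda_1
A1 = A - 3.0 * np.outer(w1, w1)          # deflated operator, eq. (329)
norm_A1 = np.linalg.norm(A1, 2)          # = 2.0, the next largest |eigenvalue|
print(norm_A, norm_A1)
```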


0.17 Lecture 17

Next I show the general structure of a compact operator on a Hilbert space.

Theorem: Let A be compact on a Hilbert space. Then there are two orthonormal bases {|v_n〉} and {|w_n〉} and λ_n > 0 satisfying

lim_{n→∞} λ_n = 0   (337)

and

A = ∑_{n=1}^∞ |v_n〉λ_n〈w_n|   (338)

To prove this, note that if A is compact then A†A is compact (see homework), Hermitian, and positive. It has a complete set of eigenvectors with non-negative eigenvalues

A†A|w_n〉 = λ_n²|w_n〉   (339)

In what follows the |w_n〉 are taken to be orthonormal. Define

|v_n〉 := A|w_n〉/λ_n   (340)

for n with λ_n > 0. Then

〈v_n|v_m〉 = (1/(λ_m λ_n)) 〈w_n|A†A|w_m〉 = (λ_m²/(λ_m λ_n)) δ_{mn} = δ_{mn}   (341)

Note

A|ξ〉 = A ∑_n |w_n〉〈w_n|ξ〉 = ∑_n (A|w_n〉/λ_n) λ_n 〈w_n|ξ〉 = (∑_n |v_n〉λ_n〈w_n|) |ξ〉   (342)

which shows

A = ∑_n |v_n〉λ_n〈w_n|   (343)

as desired.
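For matrices the canonical form (343) is the singular value decomposition, and the proof above is literally a construction of it. The sketch below (not from the lecture; the matrix is built from chosen singular values for illustration) follows eqs. (339)–(341) step by step:

```python
import numpy as np

# Construct A = sum_n |v_n> lambda_n <w_n|, eq. (343), from the
# eigenvectors of A†A, eq. (339), and |v_n> := A|w_n>/lambda_n, eq. (340).
rng = np.random.default_rng(2)
Q1, _ = np.linalg.qr(rng.standard_normal((6, 6)))
Q2, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q1 @ np.diag([3.0, 2.0, 1.0, 0.5, 0.2, 0.1]) @ Q2.T

lam2, W = np.linalg.eigh(A.T @ A)        # A†A |w_n> = lambda_n^2 |w_n>
lam = np.sqrt(lam2)                      # lambda_n > 0 here
V = A @ W / lam                          # columns: |v_n> = A|w_n>/lambda_n

ortho_err = np.linalg.norm(V.T @ V - np.eye(6))        # eq. (341): orthonormal
recon_err = np.linalg.norm(V @ np.diag(lam) @ W.T - A)  # eq. (343)
print(ortho_err, recon_err)
```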


The converse of this theorem is also true, because if

A = ∑_{n=1}^∞ |v_n〉λ_n〈w_n|   (344)

and I define

A_N := ∑_{n=1}^N |v_n〉λ_n〈w_n|   (345)

then

A − A_N = ∑_{n=N+1}^∞ |v_n〉λ_n〈w_n|   (346)

and

|||A − A_N||| = max_{n>N} |λ_n|   (347)

which vanishes as N → ∞ by the Riesz-Schauder theorem.

In applications it is important to know that an operator is compact before one attempts to make finite dimensional approximations. There are two important cases:

A linear operator K is Hilbert-Schmidt if and only if

Tr(K†K) < ∞   (348)

A linear operator K is trace class if and only if

Tr((K†K)^{1/2}) < ∞   (349)

Here the trace of a linear operator on an infinite dimensional vector space is

Tr(A) = ∑_n 〈n|A|n〉   (350)

where {|n〉} is any orthonormal basis. Note that the trace is independent of the choice of basis because

Tr′(A) = ∑_n 〈n′|A|n′〉 = ∑_{lmn} 〈n′|m〉〈m|A|l〉〈l|n′〉 = ∑_{lm} (∑_n 〈l|n′〉〈n′|m〉) 〈m|A|l〉 = ∑_{lm} δ_{ml} 〈m|A|l〉 = ∑_m 〈m|A|m〉 = Tr(A)   (351)

To show that both trace class and Hilbert-Schmidt operators are compact, first note that

Tr(K†K) ≥ Tr(K†P_N K)   (352)

where P_N is any orthogonal projector. If this is the orthogonal projector on the first N eigenstates of K†K, the condition of being trace class or Hilbert-Schmidt ensures that the sum of any number of eigenvalues is bounded.

This requires that λ_n → 0, which means that K is compact. In the Hilbert-Schmidt case

∑_n λ_n² < ∞   (353)

For the trace class case

∑_n λ_n < ∞   (354)

and in general if

∑_n λ_n^p < ∞   (355)

for some p > 0 the operator is still compact. In practice the condition of being Hilbert-Schmidt is the easiest condition to show.

Example: Compact operators are often associated with integral equations. Consider the integral equation

f(x) = g(x) + ∫_a^b K(x, y) f(y) dy   (356)

We can check whether K(x, y) is the kernel of a Hilbert-Schmidt operator by calculating

Tr(K†K) = ∫_a^b dx ∫_a^b dy K^*(x, y) K(x, y)   (357)


If this is finite then it follows that we can uniformly approximate K by

K(x, y) ≈ ∑_{n=1}^N φ_n(x) λ_n χ_n^*(y)   (358)

Replacing the actual kernel by the approximate kernel gives

f(x) ≈ g(x) + ∫_a^b ∑_{n=1}^N φ_n(x) λ_n χ_n^*(y) f(y) dy   (359)

This can be reduced to a matrix equation if I multiply by χ_n^*(x) and integrate over x:

c_n = g_n + ∑_{m=1}^N K_{nm} c_m   (360)

where

c_n = ∫_a^b dx χ_n^*(x) f(x)   (361)

g_n = ∫_a^b dx χ_n^*(x) g(x)   (362)

K_{nm} = ∫_a^b dx χ_n^*(x) λ_m φ_m(x)   (363)

leading to the approximate solution

f(x) ≈ g(x) + ∑_{n=1}^N ∫_a^b K(x, y) χ_n(y) dy c_n   (364)
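The reduction of eq. (356) to a finite linear system can be tried directly on a grid. The sketch below is not the lecture's example: the separable kernel K(x, y) = xy/2, the interval [0, 1], and the choice of g are illustrative, picked so that the exact solution f(x) = x is known:

```python
import numpy as np

# Solve f(x) = g(x) + ∫_0^1 K(x,y) f(y) dy, eq. (356), as the matrix
# equation (I - K W) f = g on a trapezoid quadrature grid.
n = 401
x = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / (n - 1))            # trapezoid weights
w[0] *= 0.5
w[-1] *= 0.5

K = 0.5 * np.outer(x, x)                 # separable kernel K(x,y) = x*y/2
# With f(x) = x: ∫_0^1 (x*y/2)*y dy = x/6, so pick g(x) = x - x/6 = (5/6)x
g = (5.0 / 6.0) * x

f = np.linalg.solve(np.eye(n) - K * w, g)  # (K*w) applies weights columnwise

err = np.max(np.abs(f - x))              # compare with exact solution f(x) = x
print(err)                               # small (quadrature error only)
```

The same code works for any smooth kernel; the separable choice just makes the exact answer available for comparison.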

If a linear operator A is compact we know that it can be uniformly approximated by a finite matrix. In principle any finite dimensional approximation will eventually converge. If A is compact and Hermitian then

A = ∑_{n=1}^∞ |n〉λ_n〈n|   (365)

with

|λ_n| ≥ |λ_{n+1}|   (366)


In this basis the approximation

A ≈ A_N := ∑_{n=1}^N |n〉λ_n〈n|   (367)

satisfies

|||A − A_N||| ≤ |λ_{N+1}| → 0   (368)

It is clear that this is the best choice of basis. For a large matrix it may be too much work to diagonalize A. Note that if we take a large power m of A,

A^m = ∑_{n=1}^∞ |n〉λ_n^m〈n|   (369)

then the terms with the largest |λ_n| dominate.

One way to take advantage of this is to generate a basis using different powers of K applied to a single vector. After normalization, as long as the starting vector has some overlap with the eigenvector with the largest eigenvalue, that eigenvector will get emphasized. The problem is that all of the vectors generated this way will be almost parallel. To fix this we use the Gram-Schmidt method and orthogonalize the vectors at each step.

To illustrate this method consider the linear equation

|f〉 = |g〉 +K|f〉 (370)

where |g〉 is known and K is compact and Hermitian. We generate an orthonormal basis as follows:

|1〉 = |g〉/‖|g〉‖   (371)

|2′〉 = K|1〉 − |1〉〈1|K|1〉   (372)

|2〉 = |2′〉/‖|2′〉‖   (373)

⋮   (374)

|n′〉 = K|n−1〉 − |n−1〉〈n−1|K|n−1〉 − |n−2〉〈n−2|K|n−1〉   (375)

|n〉 = |n′〉/‖|n′〉‖   (376)

where the primed vectors are the unnormalized intermediate vectors.


Vectors generated by this algorithm have the properties

〈m|n〉 = δ_{mn}   (377)

〈m|K|n〉 = 0 for |m − n| > 1   (378)

In this basis the equation has the form

〈n|f〉 = ‖|g〉‖ δ_{n1} + ∑_{m=n−1, m≥1}^{n+1} 〈n|K|m〉〈m|f〉   (379)

This gives a tridiagonal matrix. More importantly, if K has a small number of large eigenvalues, the solution to the equation can often be accurately approximated using a small number of these basis functions, even when the original matrix is very large.
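The recursion (371)–(376) is the Lanczos algorithm, and the tridiagonality claim (378) can be verified directly. The sketch below (not from the lecture; K and |g〉 are illustrative random choices) builds a few basis vectors and checks that 〈m|K|n〉 vanishes for |m − n| > 1:

```python
import numpy as np

# Gram-Schmidt/Lanczos basis from eqs. (371)-(376): K is Hermitian, so only
# the previous two vectors need to be subtracted, and <m|K|n> is tridiagonal.
rng = np.random.default_rng(3)
M = rng.standard_normal((20, 20))
K = (M + M.T) / 2.0                      # Hermitian (real symmetric)
g = rng.standard_normal(20)

m = 6                                    # number of basis vectors to build
Q = np.zeros((20, m))
Q[:, 0] = g / np.linalg.norm(g)          # |1> = |g>/||g||, eq. (371)
for j in range(1, m):
    v = K @ Q[:, j - 1]
    v -= Q[:, j - 1] * (Q[:, j - 1] @ v)      # remove |n-1> component
    if j >= 2:
        v -= Q[:, j - 2] * (Q[:, j - 2] @ v)  # remove |n-2> component, eq. (375)
    Q[:, j] = v / np.linalg.norm(v)           # normalize, eq. (376)

T = Q.T @ K @ Q                          # matrix elements <m|K|n>
off = max(np.max(np.abs(np.tril(T, -2))), np.max(np.abs(np.triu(T, 2))))
print(off)                               # entries with |m-n| > 1 are ~0
```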

This same method can also be used to find approximate solutions to the eigenvalue equation. If K is not Hermitian there are related methods that use two sets of basis vectors. One can also convert the equation to a compact kernel equation with a Hermitian kernel using

(I − K)|f〉 = |g〉 →

(I − K†)(I − K)|f〉 = (I − K†)|g〉   (380)

(I − K†)(I − K) = I − K′,   K′ := K + K† − K†K   (381)

where K′† = K′ and

|f〉 = (I − K′)^{−1}(I − K†)|g〉   (382)

These are the methods of choice for solving very large linear systems based on approximating compact kernels.


0.18 Lecture 18

Definition: An nth order ordinary differential equation is an equation of the form

F(x, u, du/dx, ···, d^n u/dx^n) = 0   (383)

The differential equation is linear if F is linear in u and all of its derivatives:

F = q(x) + r₀(x)u(x) + r₁(x) du/dx + ··· + r_n(x) d^n u/dx^n = 0   (384)

The linear differential equation is called homogeneous if q(x) = 0; otherwise it is called inhomogeneous.

The existence of a solution to any of these equations depends on the space of functions where one looks for solutions. The functions may be required to have N derivatives, they could be tempered distributions, or they may satisfy certain boundary or asymptotic conditions.

It is difficult to find results that apply to general equations of the form (383). If the partial derivative of F with respect to the highest derivative is not zero at a point where all of the other arguments are fixed, then the implicit function theorem allows us to replace (383) by an equation of the form

d^n u/dx^n = G(x, u, du/dx, ···, d^{n−1}u/dx^{n−1})   (385)

which is valid near the point defined by

x = x₀,   u(x₀) = u₀,   ···,   d^{n−1}u/dx^{n−1}(x₀) = u_{0,n−1}   (386)

There is no guarantee that this inversion is valid far from this point, but we will show that under mild conditions on G this equation always has local solutions.

The mild conditions are the following. Note that G depends on n + 1 variables. The function G(x₁, ···, x_{n+1}) satisfies a Lipschitz condition in x_i if there exists an η > 0 such that x_i, x_i′ ∈ [c_i − η, c_i + η] implies

|G(x₁, ···, x_i, ···, x_{n+1}) − G(x₁, ···, x_i′, ···, x_{n+1})| < k|x_i − x_i′|   (387)

If G satisfies a Lipschitz condition in all n + 1 variables, then the differential equation with initial conditions

u(x₀) = c₀,   ···,   d^{n−1}u/dx^{n−1}(x₀) = c_{n−1}   (388)

has a local solution.


I outline the proof. The differential equation can be converted to a system of first order equations using

y₀ := u,   y₁ := du/dx,   ···,   y_{n−1} := d^{n−1}u/dx^{n−1}   (389)

dy₀/dx = y₁
dy₁/dx = y₂
⋮
dy_{n−2}/dx = y_{n−1}
dy_{n−1}/dx = G(x, y₀, ···, y_{n−1})   (390)

Next we convert this to a system of integral equations by integrating from x₀ to x:

y₀(x) = c₀ + ∫_{x₀}^x y₁(x′) dx′
y₁(x) = c₁ + ∫_{x₀}^x y₂(x′) dx′
⋮
y_{n−2}(x) = c_{n−2} + ∫_{x₀}^x y_{n−1}(x′) dx′
y_{n−1}(x) = c_{n−1} + ∫_{x₀}^x G(x′, y₀(x′), ···, y_{n−1}(x′)) dx′   (391)

We can attempt to solve this system by successive approximations. To do this write the integral equation in vector form as

y(x) = c + ∫_{x₀}^x G(x′, y(x′)) dx′   (392)

y^{(1)}(x) = c   (393)

y^{(2)}(x) = c + ∫_{x₀}^x G(x′, y^{(1)}(x′)) dx′   (394)

⋮   (395)

y^{(n)}(x) = c + ∫_{x₀}^x G(x′, y^{(n−1)}(x′)) dx′   (396)


0.19 Lecture 19

I define a norm on this system by

|y(x) − y′(x)| = ∑_{m=0}^{n−1} |y_m(x) − y′_m(x)|   (397)

The Lipschitz condition means that on a sufficiently small n + 1 dimensional volume there is a constant C such that

|y^{(1)} − y^{(0)}| ≤ |x − x₀| C   (398)

|y^{(2)} − y^{(1)}| ≤ (1/2)|x − x₀|² C²   (399)

|y^{(n)} − y^{(n−1)}| ≤ (1/n!)|x − x₀|^n C^n   (400)

|y^{(n)}| = |∑_{k=1}^n (y^{(k)} − y^{(k−1)}) + y^{(0)}|   (401)

≤ ∑_{k=0}^n (1/k!)|x − x₀|^k C^k + |y^{(0)}|   (402)

lim_{n→∞} |y^{(n)}| ≤ e^{C|x−x₀|} + |y^{(0)}|   (403)

This shows that not only does the series for the function converge, but the series for all of the lower order derivatives converges. This leads to the local existence of the function and its first n − 1 derivatives. The local existence does not mean that the solution can be extended.

This method of converting the differential equation to an integral equation is called Picard's method. Most local existence theorems for solutions to ordinary differential equations are based on this result. It is of limited value because typical functions G will satisfy a Lipschitz condition in all variables in only a small region about an initial point.
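Picard iteration is easy to run numerically. The sketch below uses the illustrative (not from the lecture) initial value problem u′ = u, u(0) = 1, written as the integral equation u(x) = 1 + ∫₀^x u(x′) dx′; each iterate adds one more Taylor term of e^x:

```python
import numpy as np

# Picard iteration, eqs. (392)-(396), for u' = u, u(0) = 1.
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]

u = np.ones_like(x)                      # y^{(1)}(x) = c = 1, eq. (393)
for _ in range(20):                      # successive approximations, eq. (396)
    # cumulative trapezoid integral of u from 0 to each grid point
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (u[1:] + u[:-1]) * dx)))
    u = 1.0 + integral

err = np.max(np.abs(u - np.exp(x)))
print(err)                               # iterates converge to e^x
```

The geometric/factorial error bound (400) is why 20 iterations are more than enough on this interval.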

In what follows I will discuss the solution of linear differential equations of first and second order. These equations have the general form

α(x) du/dx + β(x) u(x) = f(x),   α(x) ≠ 0   (404)

α(x) d²u/dx² + β(x) du/dx + γ(x) u(x) = f(x),   α(x) ≠ 0   (405)


First I show that any first order linear differential equation can be solved in terms of integrals of known functions. To do this, define the function p(x) as the solution to

(1/p(x)) dp/dx = β(x)/α(x)   (406)

which can be solved to give

p(x) = p(x₀) e^{∫_{x₀}^x dx′ β(x′)/α(x′)}   (407)

The differential equation can then be written as

0 = du/dx + (β(x)/α(x)) u(x) − f(x)/α(x)

Multiplying by p(x) gives

p(x) du/dx + u(x) dp/dx − f(x)p(x)/α(x) = 0   →   d/dx (u(x)p(x)) = f(x)p(x)/α(x)

which can be integrated to get the solution

u(x) = (1/p(x)) ( u(x₀)p(x₀) + ∫_{x₀}^x dx′ f(x′)p(x′)/α(x′) )   (408)
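A quick numerical check of formula (408) on an illustrative equation (not from the lecture): u′ + u = 1 with u(0) = 0, so α = 1, β = 1, f = 1, p(x) = e^x, and the exact solution is u = 1 − e^{−x}:

```python
import numpy as np

# Evaluate eq. (408) for u' + u = 1, u(0) = 0 on a grid.
x = np.linspace(0.0, 2.0, 2001)
dx = x[1] - x[0]

p = np.exp(x)                            # integrating factor, eq. (407)

# ∫_{x0}^x f(x') p(x') / alpha(x') dx' with f = alpha = 1
integrand = p
integral = np.concatenate(([0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * dx)))
u = (0.0 + integral) / p                 # eq. (408) with u(x0) p(x0) = 0

err = np.max(np.abs(u - (1.0 - np.exp(-x))))
print(err)                               # small (trapezoid quadrature error)
```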

Next I discuss some general properties of second order differential equations of the form (405).

First consider the homogeneous case f(x) = 0. In this case if u₁(x) and u₂(x) are both solutions of (405) then so is

u₃(x) = c₁u₁(x) + c₂u₂(x).   (409)

This is an elementary consequence of the linearity of the differential operator.

I use this to show that two solutions of a second order homogeneous differential equation are independent if and only if

W(u₁, u₂) := det ( u₁(x)        u₂(x)
                   du₁/dx(x)    du₂/dx(x) ) ≠ 0   (410)

This determinant is called the Wronskian.


Clearly dependence means that there are constants c₁ and c₂ satisfying

( u₁(x)        u₂(x)     ) ( c₁ )   ( 0 )
( du₁/dx(x)    du₂/dx(x) ) ( c₂ ) = ( 0 )   (411)

which implies the matrix has 0 as an eigenvalue and the determinant vanishes. Conversely, if the determinant does not vanish the solutions cannot be dependent.

Clearly if W = 0 then

(1/u₁) du₁/dx = (1/u₂) du₂/dx   (412)

which can be integrated to get

ln(u₁/u₂) = const   (413)

or

u₁ = u₂ × constant   (414)

Note that

dW(x)/dx = u₁(x) d²u₂/dx² − u₂(x) d²u₁/dx² = −(β(x)/α(x)) W(x).   (415)

Integrating this gives

W(x) = W(x₀) e^{−∫_{x₀}^x dx′ β(x′)/α(x′)}   (416)

which shows that if W(x) is zero at one point x₀, then it vanishes everywhere; otherwise it vanishes nowhere (recall α(x) ≠ 0).

It follows that if the Wronskian is zero at one point, it vanishes at all points and the solutions must be proportional. If it is non-zero at one point it cannot be zero at any point, so the solutions are independent.

Assume that a second order linear homogeneous differential equation has three independent solutions. This means that there are no non-zero coefficients c₁, c₂, c₃ satisfying

c₁u₁(x) + c₂u₂(x) + c₃u₃(x) = 0   (417)

Differentiating, I also have

c₁ du₁/dx + c₂ du₂/dx + c₃ du₃/dx = 0   (418)


and

c₁ d²u₁/dx² + c₂ d²u₂/dx² + c₃ d²u₃/dx² = 0   (419)

c₁u₁(x) + c₂u₂(x) + c₃u₃(x) = 0   (420)

Consider

det ( u₁                          u₂                          u₃
      du₁/dx                      du₂/dx                      du₃/dx
      −(β/α)du₁/dx − (γ/α)u₁      −(β/α)du₂/dx − (γ/α)u₂      −(β/α)du₃/dx − (γ/α)u₃ ) = 0   (421)

This vanishes because, by the differential equation, the last row is a linear combination of the first two rows for every value of x. This means that it is possible to find non-zero coefficients c₁, c₂, c₃ that make this vanish. It follows that the solutions are necessarily linearly dependent.

It follows that second order linear homogeneous differential equations have at most two independent solutions.

Next I show that if a second order linear homogeneous differential equation has one solution u_1(x), then it necessarily has a second independent solution, u_2(x).

To construct u_2(x) assume that
\[ u_2(x) = u_1(x)h(x) \tag{422} \]

The differential equation implies
\[ \alpha(x)\frac{d^2}{dx^2}\big(u_1(x)h(x)\big) + \beta(x)\frac{d}{dx}\big(u_1(x)h(x)\big) + \gamma(x)u_1(x)h(x) = 0 \tag{423} \]

The important observation is that all of the terms with no derivatives on h cancel by the differential equation when applied to u_1. What survives involves only first and second derivatives of h:

\[ \alpha(x)\Big(\frac{d^2h}{dx^2}u_1(x) + 2\frac{dh}{dx}\frac{du_1}{dx}\Big) + \beta(x)u_1(x)\frac{dh}{dx} = 0 \tag{424} \]
\[ \frac{d^2h}{dx^2} + \Big(\frac{2}{u_1(x)}\frac{du_1}{dx} + \frac{\beta(x)}{\alpha(x)}\Big)\frac{dh}{dx} = 0 \tag{425} \]

This is a first order linear homogeneous equation for dh/dx. It has the form
\[ \frac{d}{dx}\Big(\ln\frac{dh}{dx}\Big) = -2\frac{d}{dx}\ln u_1(x) - p(x) \tag{426} \]


where
\[ p(x) = \frac{\beta(x)}{\alpha(x)} \tag{427} \]

Integrating this gives
\[ \frac{dh}{dx}(x) = \frac{c}{u_1(x)^2}\,e^{-\int_{x_0}^{x} p(x')\,dx'} \tag{428} \]
Integrating a second time gives
\[ h(x) = \int_{x_0}^{x} \frac{c}{u_1(x')^2}\,e^{-\int_{x_0}^{x'} p(x'')\,dx''}\,dx' \tag{429} \]
\[ u_2(x) = h(x)u_1(x) \tag{430} \]

The structure of the solution leads to a Wronskian of the form
\[ W(u_1,u_2) = u_1\frac{d}{dx}(u_1h) - u_1h\frac{d}{dx}u_1 = u_1^2(x)\frac{dh}{dx} = c\,e^{-\int_{x_0}^{x} p(x')\,dx'} \tag{431} \]

which is not zero. This shows that it is always possible to construct a second independent solution to the homogeneous equation if one is already known.
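As a check of the reduction-of-order formulas, the following sketch (assuming SymPy is available) starts from the known solution u_1 = e^x of u'' − 2u' + u = 0 and reconstructs the second solution u_2 = x e^x:

```python
import sympy as sp

# Reduction of order (eqs. 422-431) for u'' - 2u' + u = 0 with the known
# solution u1 = exp(x).  Here alpha = 1, beta = -2, so p = beta/alpha = -2.
x = sp.symbols('x')
u1 = sp.exp(x)
p = sp.Integer(-2)

# eq. (428) with c = 1 and x0 = 0:  dh/dx = exp(-int_0^x p) / u1^2
dhdx = sp.exp(-sp.integrate(p, (x, 0, x))) / u1**2
h = sp.integrate(dhdx, (x, 0, x))          # eq. (429): h = x
u2 = sp.simplify(h * u1)                   # eq. (430): u2 = x e^x

ode = lambda u: sp.diff(u, x, 2) - 2*sp.diff(u, x) + u
assert sp.simplify(ode(u2)) == 0           # u2 solves the equation
W = sp.simplify(u1*sp.diff(u2, x) - u2*sp.diff(u1, x))
assert sp.simplify(W - sp.exp(2*x)) == 0   # eq. (431): W = e^{-int p}, never 0
```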

Next assume that a solution to the homogeneous equation is known. I show that it is possible to construct a solution to the inhomogeneous equation.

Assume a solution v(x) of the form v(x) = u_1(x)h(x). Using this in the differential equation, all of the terms involving no derivatives of h cancel because u_1 satisfies the homogeneous form of the differential equation. What remains is

\[ \frac{d^2h}{dx^2} + \Big(\frac{2}{u_1(x)}\frac{du_1}{dx} + \frac{\beta(x)}{\alpha(x)}\Big)\frac{dh}{dx} = \frac{q(x)}{\alpha(x)u_1(x)} \tag{432} \]

To solve this let R(x) be a solution to
\[ \frac{1}{R}\frac{dR}{dx} = \frac{2}{u_1(x)}\frac{du_1}{dx} + \frac{\beta(x)}{\alpha(x)} \tag{433} \]

leading to the equation
\[ R\frac{d^2h}{dx^2} + \frac{dR}{dx}\frac{dh}{dx} = \frac{R(x)q(x)}{\alpha(x)u_1(x)} \tag{434} \]

where
\[ R(x) = c\,u_1(x)^2\,e^{\int_{x_0}^{x} p(x')\,dx'} = \frac{c\,u_1^2(x)}{W(x)} \tag{435} \]


The differential equation can be written as
\[ \frac{d}{dx}\Big(R\frac{dh}{dx}\Big) = \frac{R(x)q(x)}{\alpha(x)u_1(x)} \tag{436} \]

which has the solution
\[ \frac{dh}{dx}(x) = \frac{1}{R(x)}\int_{x_0}^{x} \frac{R(x')q(x')}{\alpha(x')u_1(x')}\,dx' \tag{437} \]

This can be integrated to get
\[ h(x) = \int_{x_0}^{x} \frac{1}{R(x')}\int_{x_0}^{x'} \frac{R(x'')q(x'')}{\alpha(x'')u_1(x'')}\,dx''\,dx' \tag{438} \]
\[ v(x) = u_1(x)h(x) \tag{439} \]

All of the methods just discussed require that one solution is known. This is used to express the other solution as the solution of a first order equation that can be solved to get the desired solution.

In the general setting one is not typically fortunate enough to have a solution available.


0.20 Lecture 20

In this section I begin the treatment of linear differential operators. Define the differential operator
\[ L_x := \sum_{n=0}^{N} a_n(x)\frac{d^n}{dx^n}. \tag{440} \]

This is really a formal operator until we specify the vector space on which it acts.

Suppose that for a given formal differential operator L_x there is another formal differential operator L†_x with the property that for any sufficiently differentiable functions u(x) and v(x) and positive weight w(x) on the interval [a, b],
\[ w(x)\big[v^*(x)L_xu(x) - u(x)(L^\dagger_xv(x))^*\big] = \frac{d}{dx}Q\Big[u(x), v^*(x), \frac{du}{dx}, \frac{dv^*}{dx}\Big] \tag{441} \]
where Q[u(x), v^*(x), du/dx, dv^*/dx] is a function of the form
\[ Q = A(x)u(x)v^*(x) + B(x)u(x)\frac{dv^*}{dx}(x) + C(x)\frac{du}{dx}(x)v^*(x) + D(x)\frac{du}{dx}(x)\frac{dv^*}{dx}(x) \tag{442} \]

Equation (441) is called the Lagrange identity and L†_x is called the formal adjoint of L_x with respect to the weight w(x). Integrating the Lagrange identity from a to b gives

\[ \int_a^b w(x)v^*(x)L_xu(x)\,dx - \int_a^b w(x)u(x)(L^\dagger_xv(x))^*\,dx = Q\Big[b, u(b), v^*(b), \frac{du}{dx}(b), \frac{dv^*}{dx}(b)\Big] - Q\Big[a, u(a), v^*(a), \frac{du}{dx}(a), \frac{dv^*}{dx}(a)\Big] \tag{443} \]

Equation (443) is called a generalized Green's identity. When L_x = L†_x then L_x is called self-adjoint. Next I discuss boundary conditions. Assume conditions of the general form
\[ B_1(u) = \alpha_1u(a) + \beta_1\frac{du}{dx}(a) + \gamma_1u(b) + \delta_1\frac{du}{dx}(b) = 0 \tag{444} \]


\[ B_2(u) = \alpha_2u(a) + \beta_2\frac{du}{dx}(a) + \gamma_2u(b) + \delta_2\frac{du}{dx}(b) = 0 \tag{445} \]

When these boundary conditions are combined with the requirement
\[ Q\Big[b, u(b), v^*(b), \frac{du}{dx}(b), \frac{dv^*}{dx}(b)\Big] - Q\Big[a, u(a), v^*(a), \frac{du}{dx}(a), \frac{dv^*}{dx}(a)\Big] = 0 \tag{446} \]

one can eliminate two of the four quantities u(a), du/dx(a), u(b), du/dx(b). The coefficients of the remaining two quantities will be linear combinations of

\[ C_1(v^*) = \alpha_3v^*(a) + \beta_3\frac{dv^*}{dx}(a) + \gamma_3v^*(b) + \delta_3\frac{dv^*}{dx}(b) = 0 \tag{447} \]
\[ C_2(v^*) = \alpha_4v^*(a) + \beta_4\frac{dv^*}{dx}(a) + \gamma_4v^*(b) + \delta_4\frac{dv^*}{dx}(b) = 0 \tag{448} \]

These are called adjoint boundary conditions. When the functions u satisfy the homogeneous boundary conditions (444) and (445) and the function v^* satisfies the adjoint homogeneous boundary conditions, then we obtain Green's identity

\[ \int_a^b dx\,w(x)v^*(x)L_xu(x) = \int_a^b dx\,w(x)u(x)(L^\dagger v)^*(x) \tag{449} \]

The choice of homogeneous boundary conditions, along with sufficient differentiability, defines the domain of the formal differential operator L_x. If L_x = L†_x and the domains of both operators are identical, then the operator is Hermitian.

In what follows I illustrate these concepts using second order differential operators. These have the form
\[ L_x = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x) \tag{450} \]

First I construct the formal adjoint operator. I consider the case w(x) = 1. Consider the following two quantities:

\[ a(x)v^*(x)\frac{d^2}{dx^2}u(x) - u(x)\frac{d^2}{dx^2}\big(a(x)v^*(x)\big) = \frac{d}{dx}\Big(a(x)v^*(x)\frac{d}{dx}u(x) - u(x)\frac{d}{dx}\big(a(x)v^*(x)\big)\Big) \tag{451} \]


and
\[ v^*(x)b(x)\frac{d}{dx}u(x) + u(x)\frac{d}{dx}\big(b(x)v^*(x)\big) = \frac{d}{dx}\big(v^*(x)b(x)u(x)\big) \tag{452} \]

Since these are both total derivatives, their sum is a total derivative, which gives, after adding and subtracting v^*(x)c(x)u(x):

\[ v^*(x)\Big[a(x)\frac{d^2}{dx^2}u(x) + b(x)\frac{d}{dx}u(x) + c(x)u(x)\Big] - u(x)\Big[\frac{d^2}{dx^2}\big(a^*(x)v(x)\big) - \frac{d}{dx}\big(b^*(x)v(x)\big) + c^*(x)v(x)\Big]^* = \frac{d}{dx}\Big(a(x)v^*(x)\frac{d}{dx}u(x) - u(x)\frac{d}{dx}\big(a(x)v^*(x)\big) + v^*(x)b(x)u(x)\Big) \tag{453} \]

In this case the formal adjoint to L_x is the operator
\[ L^\dagger_x = a^*(x)\frac{d^2}{dx^2} + \Big(2\frac{da^*}{dx} - b^*(x)\Big)\frac{d}{dx} + \Big(\frac{d^2a^*}{dx^2} - \frac{db^*}{dx} + c^*(x)\Big) \tag{454} \]

and the function Q is
\[ Q = a(x)v^*(x)\frac{d}{dx}u(x) - u(x)a(x)\frac{d}{dx}v^*(x) + v^*(x)\Big(b(x) - \frac{da}{dx}\Big)u(x) \tag{455} \]
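The formal adjoint (454) and Q (455) can be verified symbolically. The following sketch (assuming SymPy is available, and taking all functions real for simplicity) checks the Lagrange identity v L u − u L†v = dQ/dx with w = 1:

```python
import sympy as sp

# Symbolic check of the Lagrange identity (441) with w = 1 and real a, b, c.
x = sp.symbols('x')
a, b, c, u, v = (sp.Function(f)(x) for f in 'abcuv')

Lu = a*u.diff(x, 2) + b*u.diff(x) + c*u
# formal adjoint, eq. (454), with real coefficients:
Ldag_v = (a*v.diff(x, 2) + (2*a.diff(x) - b)*v.diff(x)
          + (a.diff(x, 2) - b.diff(x) + c)*v)
# eq. (455):
Q = a*v*u.diff(x) - u*a*v.diff(x) + v*(b - a.diff(x))*u

assert sp.simplify(v*Lu - u*Ldag_v - Q.diff(x)) == 0
```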

The conditions for L_x to be self-adjoint are
\[ a(x) = a^*(x) \tag{456} \]
\[ b(x) = 2\frac{da^*}{dx} - b^*(x) \tag{457} \]
\[ c(x) = \frac{d^2a^*}{dx^2} - \frac{db^*}{dx} + c^*(x) \tag{458} \]

The first condition requires that a is real. The second condition requires that b(x) is real and given by
\[ b(x) = \frac{da}{dx} \tag{459} \]

The third condition requires that c is real. When L_x is self-adjoint then
\[ Q = a(x)\Big(v^*(x)\frac{d}{dx}u(x) - u(x)\frac{d}{dx}v^*(x)\Big) \tag{460} \]

In this case L_x can be expressed in the compact form
\[ L_xu = \frac{d}{dx}\Big(a(x)\frac{du}{dx}\Big) + c(x)u(x) \tag{461} \]


0.21 Lecture 21

Next I show that, given any second order differential operator with real coefficient functions, it is possible to choose a weight function so that the operator is self-adjoint.

Let
\[ L = a(x)\frac{d^2}{dx^2} + b(x)\frac{d}{dx} + c(x) \tag{462} \]

and consider
\[ w(x)g^*(x)a(x)\frac{d^2f}{dx^2} - f(x)\frac{d^2}{dx^2}\big(w(x)g^*(x)a(x)\big) = \frac{d}{dx}\Big\{w(x)g^*(x)a(x)\frac{df}{dx} - f(x)\frac{d}{dx}\big(w(x)g^*(x)a(x)\big)\Big\} \tag{463} \]

and
\[ w(x)g^*(x)b(x)\frac{df}{dx} + f(x)\frac{d}{dx}\big(w(x)g^*(x)b(x)\big) = \frac{d}{dx}\big(w(x)g^*(x)b(x)f(x)\big) \tag{464} \]

Adding these, and adding and subtracting w(x)g^*(x)c(x)f(x), gives

\[ w(x)g^*(x)\Big\{a(x)\frac{d^2f}{dx^2} + b(x)\frac{df}{dx} + c(x)f(x)\Big\} - f(x)\Big\{\frac{d^2}{dx^2}\big(a(x)g^*(x)w(x)\big) - \frac{d}{dx}\big(b(x)g^*(x)w(x)\big) + c(x)g^*(x)w(x)\Big\} = \frac{d}{dx}\Big\{w(x)a(x)g^*(x)\frac{df}{dx} - f(x)\frac{d}{dx}\big(w(x)g^*(x)a(x)\big) + w(x)b(x)g^*(x)f(x)\Big\} \tag{465} \]

The left side of this equation can be put in the form

\[ w(x)g^*(x)(L_xf)(x) - w(x)f(x)\Big\{a(x)\frac{d^2}{dx^2} + \Big(\frac{2}{w}\frac{d}{dx}\big(w(x)a(x)\big) - b(x)\Big)\frac{d}{dx} + \Big(c(x) - \frac{1}{w}\frac{d}{dx}\big(w(x)b(x)\big) + \frac{1}{w(x)}\frac{d^2}{dx^2}\big(a(x)w(x)\big)\Big)\Big\}g^*(x) \tag{466} \]

From this expression it is possible to read off L† and Q:

\[ L^\dagger = a^*(x)\frac{d^2}{dx^2} + \Big(\frac{2}{w}\frac{d}{dx}\big(w(x)a^*(x)\big) - b^*(x)\Big)\frac{d}{dx} + \Big(c^*(x) - \frac{1}{w}\frac{d}{dx}\big(w(x)b^*(x)\big) + \frac{1}{w(x)}\frac{d^2}{dx^2}\big(a^*(x)w(x)\big)\Big) \tag{467} \]

and
\[ Q = w(x)a(x)g^*(x)\frac{df}{dx} - f(x)w(x)a(x)\frac{dg^*}{dx} - f(x)\frac{d}{dx}\big(w(x)a(x)\big)g^*(x) + w(x)b(x)g^*(x)f(x) \tag{468} \]

In arriving at this result the weight w(x) was assumed to be known. It is possible to try to choose w(x) so L = L†. I show that this can be done if all of the coefficient functions are real: a(x) = a^*(x), b(x) = b^*(x), c(x) = c^*(x). The requirement that L = L† is equivalent to the following relations:

\[ a(x) = a(x) \tag{469} \]
\[ b(x) = \frac{2}{w}\frac{d}{dx}\big(w(x)a(x)\big) - b(x) \tag{470} \]
\[ c(x) = c(x) - \frac{1}{w}\frac{d}{dx}\big(w(x)b(x)\big) + \frac{1}{w(x)}\frac{d^2}{dx^2}\big(a(x)w(x)\big) \tag{471} \]

The first equation is trivially satisfied; the second and third equations are satisfied provided

\[ b(x) = \frac{1}{w}\frac{d}{dx}\big(w(x)a(x)\big) \tag{472} \]

This equation can be integrated to find the desired weight:
\[ \frac{b(x)}{a(x)} - \frac{1}{a(x)}\frac{da}{dx} = \frac{1}{w}\frac{d}{dx}w(x) \tag{473} \]
\[ \ln\Big(\frac{w(x)}{w(x_0)}\Big) = -\ln\Big(\frac{a(x)}{a(x_0)}\Big) + \int_{x_0}^{x}\frac{b(x')}{a(x')}\,dx' \tag{474} \]
\[ w(x) = \frac{w(x_0)a(x_0)}{a(x)}\,e^{\int_{x_0}^{x} b(x')/a(x')\,dx'} \tag{475} \]

which can be made positive by a suitable choice of w(x_0). With this choice the function Q becomes

\[ Q = w(x)a(x)\Big(g^*(x)\frac{df}{dx} - f(x)\frac{dg^*}{dx}\Big) \tag{476} \]
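As an illustration of the weight formula (475), the sketch below (assuming SymPy is available) treats the radial-type operator L = d²/dx² + (1/x)d/dx, for which the weight works out to w(x) = x:

```python
import sympy as sp

# Weight construction (475) for L = d^2/dx^2 + (1/x) d/dx, i.e. a = 1,
# b = 1/x, taking x0 = 1 and w(x0) = 1.
x, xp = sp.symbols('x xp', positive=True)
a = sp.Integer(1)
b = 1/x

w = (1/a) * sp.exp(sp.integrate(1/xp, (xp, 1, x)))   # eq. (475) -> w = x
assert sp.simplify(w - x) == 0
# self-adjointness condition (472):  b = (1/w) d(w a)/dx
assert sp.simplify(b - (w*a).diff(x)/w) == 0
```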


We can also investigate the boundary conditions. The differential operator is Hermitian if the boundary conditions are identical to the adjoint boundary conditions. I consider some useful cases.
Dirichlet boundary conditions: Dirichlet boundary conditions at x = a and x = b are f(a) = f(b) = 0. The adjoint boundary conditions can be read off from (476):
\[ w(a)a(a)\frac{df}{dx}(a)g^*(a) = 0 \qquad w(b)a(b)\frac{df}{dx}(b)g^*(b) = 0 \tag{477} \]

which are satisfied for g(a) = g(b) = 0. Since these are identical to the boundary conditions on f, it follows that L with Dirichlet boundary conditions is a Hermitian operator.
Neumann boundary conditions: Neumann boundary conditions at x = a and x = b are df/dx(a) = df/dx(b) = 0. The adjoint boundary conditions can be read off from (476):
\[ w(a)a(a)f(a)\frac{dg^*}{dx}(a) = 0 \qquad w(b)a(b)f(b)\frac{dg^*}{dx}(b) = 0 \tag{478} \]

which are satisfied for dg/dx(a) = dg/dx(b) = 0. Since these are identical to the boundary conditions on f, it follows that L with Neumann boundary conditions is a Hermitian operator.
Mixed boundary conditions: Mixed boundary conditions at x = a and x = b are
\[ \alpha f(a) + \frac{df}{dx}(a) = 0 \qquad \beta f(b) + \frac{df}{dx}(b) = 0 \tag{479} \]

To compute the adjoint boundary conditions, note that the mixed conditions lead to the relations

\[ w(a)a(a)\Big(g^*(a)\big(-\alpha f(a)\big) - f(a)\frac{dg^*}{dx}(a)\Big) = 0 \tag{480} \]
\[ w(b)a(b)\Big(g^*(b)\big(-\beta f(b)\big) - f(b)\frac{dg^*}{dx}(b)\Big) = 0 \tag{481} \]

Factoring out f gives
\[ w(a)a(a)f(a)\Big(-g(a)\alpha^* - \frac{dg}{dx}(a)\Big)^* = 0 \tag{482} \]
\[ w(b)a(b)f(b)\Big(-g(b)\beta^* - \frac{dg}{dx}(b)\Big)^* = 0 \tag{483} \]


For real α and β these are equivalent to
\[ g(a)\alpha + \frac{dg}{dx}(a) = 0 \qquad g(b)\beta + \frac{dg}{dx}(b) = 0 \tag{484} \]

which have the exact same form as the original boundary conditions. This means that L with mixed boundary conditions and real coefficients (α, β) is a Hermitian operator. Dirichlet and Neumann boundary conditions are special cases of mixed boundary conditions.
Periodic boundary conditions: Periodic boundary conditions at x = a and x = b are f(a) = f(b) and df/dx(a) = df/dx(b). The adjoint boundary conditions can be read off from (476):

\[ 0 = w(b)a(b)\Big(g^*(b)\frac{df}{dx}(b) - f(b)\frac{dg^*}{dx}(b)\Big) - w(a)a(a)\Big(g^*(a)\frac{df}{dx}(a) - f(a)\frac{dg^*}{dx}(a)\Big) \tag{485} \]
\[ 0 = f(a)\Big(w(a)a(a)\frac{dg^*}{dx}(a) - w(b)a(b)\frac{dg^*}{dx}(b)\Big) + \frac{df}{dx}(a)\Big(w(b)a(b)g^*(b) - w(a)a(a)g^*(a)\Big) \tag{486} \]

This will vanish provided w(a)a(a) = w(b)a(b), g(a) = g(b), and dg/dx(a) = dg/dx(b). Thus L is Hermitian with periodic boundary conditions provided w(x)a(x) is periodic. Returning to the expression (475) for the weight,

\[ w(b)a(b) = w(a)a(a)\,e^{\int_a^b b(x')/a(x')\,dx'} \tag{487} \]
so the periodic boundary conditions require
\[ 1 = e^{\int_a^b b(x')/a(x')\,dx'} \tag{488} \]


0.22 Lecture 22:

In this section I discuss Green functions. Many of the concepts will be illustrated using Hermitian differential operators; however, some of the methods deal with larger classes of operators.

The discussion starts by considering the problems
\[ L_x|u\rangle = |f\rangle \tag{489} \]
with homogeneous boundary conditions and
\[ L^\dagger_x|v\rangle = |h\rangle \tag{490} \]
with the adjoint boundary conditions. Assume that one can find linear operators G and g that satisfy
\[ L_xG = I \tag{491} \]
where G satisfies the same homogeneous boundary conditions as u(x), and
\[ L^\dagger_xg = I \tag{492} \]
where g satisfies the same adjoint boundary conditions as v(x). Then

\[ L_xG|f\rangle = I|f\rangle = |f\rangle \tag{493} \]
which gives
\[ |u\rangle = G|f\rangle \tag{494} \]
Similarly,
\[ L^\dagger_xg|h\rangle = I|h\rangle = |h\rangle \tag{495} \]
which gives
\[ |v\rangle = g|h\rangle \tag{496} \]

Next I introduce more of the quantum mechanical notation that was used previously:
\[ \langle x|f\rangle = f(x) \tag{497} \]
\[ \langle f|g\rangle = \int_a^b w(x)f^*(x)g(x)\,dx \tag{498} \]

The condition
\[ \langle f|g\rangle = \langle f|I|g\rangle \tag{499} \]


leads to
\[ I = \int dx\,|x\rangle w(x)\langle x| \tag{500} \]
and
\[ \langle x|y\rangle = \frac{1}{w(x)}\delta(x-y) \tag{501} \]
\[ \langle x|A|f\rangle = \int \langle x|A|y\rangle w(y)f(y)\,dy \tag{502} \]
\[ \langle g|A|f\rangle = \int\!\!\int g^*(x)w(x)\langle x|A|y\rangle w(y)f(y)\,dx\,dy \tag{503} \]

In this notation the Green's functions G and g are denoted by
\[ G(x,y) = \langle x|G|y\rangle \tag{504} \]
\[ g(x,y) = \langle x|g|y\rangle \tag{505} \]
The equation LG = I reads
\[ L_x\langle x|G|y\rangle = \frac{1}{w(x)}\delta(x-y) \tag{506} \]
and L†g = I reads
\[ L^\dagger_x\langle x|g|y\rangle = \frac{1}{w(x)}\delta(x-y) \tag{507} \]

Note that if G and g satisfy the boundary conditions mentioned above then
\[ \langle g, LG\rangle = \langle L^\dagger g, G\rangle \tag{508} \]
where the meaning of this equation is
\[ \int_a^b \langle x|g|z\rangle^* w(x)L_x\langle x|G|y\rangle\,dx = \int_a^b (L^\dagger_x\langle x|g|z\rangle)^* w(x)\langle x|G|y\rangle\,dx \tag{509} \]

Note that equations (506) and (507) give
\[ \int_a^b \langle x|g|z\rangle^* w(x)\frac{1}{w(x)}\delta(x-y)\,dx = \int_a^b \Big(\frac{1}{w(x)}\delta(x-z)\Big)^* w(x)\langle x|G|y\rangle\,dx \tag{510} \]
which gives
\[ \langle y|g|z\rangle^* = \langle z|G|y\rangle \tag{511} \]


This means that the second variable in G satisfies the adjoint boundary conditions.

In most cases we will consider G that is Hermitian. Then G = g and both variables satisfy the same boundary conditions.

If the two independent solutions of the homogeneous form of the differential equation are f_1 and f_2, then when x ≠ y the kernel ⟨x|G|y⟩ satisfies the homogeneous form of the differential equation. It necessarily has the form
\[ \langle x|G|y\rangle = a_>(y)f_1(x) + b_>(y)f_2(x) \qquad x > y \tag{512} \]
\[ \langle x|G|y\rangle = a_<(y)f_1(x) + b_<(y)f_2(x) \qquad x < y \tag{513} \]

The problem is then to find the four coefficient functions a_>(y), a_<(y), b_>(y), b_<(y). To find them I use the form of the differential operator and the relation

\[ L_x\langle x|G|y\rangle = \frac{1}{w(x)}\delta(x-y) \tag{514} \]

The desired relations can be obtained by integrating the general solutions for x > y and x < y over the point x = y in the above relation.

To do this write (514) as
\[ \Big(\frac{d^2}{dx^2} + \frac{b(x)}{a(x)}\frac{d}{dx} + \frac{c(x)}{a(x)}\Big)\langle x|G|y\rangle = \frac{1}{a(x)w(x)}\delta(x-y) \tag{515} \]

Next let
\[ \frac{1}{p(x)}\frac{dp}{dx}(x) = \frac{b(x)}{a(x)} \tag{516} \]
which can be integrated to get
\[ p(x) = p(x_0)\,e^{\int_{x_0}^{x} b(x')/a(x')\,dx'} \tag{517} \]

Using (517) in (515) gives
\[ \Big(p(x)\frac{d^2}{dx^2} + \frac{dp}{dx}(x)\frac{d}{dx} + \frac{p(x)c(x)}{a(x)}\Big)\langle x|G|y\rangle = \frac{p(x)}{a(x)w(x)}\delta(x-y) \tag{518} \]
\[ \frac{d}{dx}\Big(p(x)\frac{d}{dx}\langle x|G|y\rangle\Big) = -\frac{p(x)c(x)}{a(x)}\langle x|G|y\rangle + \frac{p(x)}{a(x)w(x)}\delta(x-y) \tag{519} \]


Next I use the relation
\[ \delta(x-y) = \frac{d}{dx}\theta(x-y) \tag{520} \]
where
\[ \theta(x) = \begin{cases} 0 & x < 0 \\ 1 & x \geq 0 \end{cases} \tag{521} \]

Next I write
\[ \frac{p(x)}{a(x)w(x)}\delta(x-y) = \frac{p(y)}{a(y)w(y)}\delta(x-y) = \frac{d}{dx}\frac{p(y)}{a(y)w(y)}\theta(x-y) \tag{522} \]

Using (522) in (519) gives
\[ \frac{d}{dx}\Big(p(x)\frac{d}{dx}\langle x|G|y\rangle - \frac{p(y)}{a(y)w(y)}\theta(x-y)\Big) = -\frac{p(x)c(x)}{a(x)}\langle x|G|y\rangle \tag{523} \]

Integrating (523) from x_0 to x gives
\[ p(x)\frac{d}{dx}\langle x|G|y\rangle - \frac{p(y)}{a(y)w(y)}\theta(x-y) - p(x_0)\frac{d}{dx}\langle x|G|y\rangle\Big|_{x=x_0} + \frac{p(y)}{a(y)w(y)}\theta(x_0-y) = -\int_{x_0}^{x} dx'\,\frac{p(x')c(x')}{a(x')}\langle x'|G|y\rangle \tag{524} \]

Next I evaluate this for x = y + ε and x_0 = y − ε, assuming that ⟨x′|G|y⟩ is bounded and measurable. In the limit ε → 0 the right hand side of the equation vanishes under this assumption. The left hand side becomes, after dividing by p(y),

\[ \frac{d}{dx}\langle x|G|y\rangle\Big|_{x=y+\epsilon} - \frac{d}{dx}\langle x|G|y\rangle\Big|_{x=y-\epsilon} = \frac{1}{a(y)w(y)} \tag{525} \]

This gives the discontinuity in the derivative of ⟨x|G|y⟩ in x at x = y. Using (524) again, I write the derivative of ⟨x|G|y⟩ as

\[ \frac{d}{dx}\langle x|G|y\rangle = \frac{p(y)}{p(x)a(y)w(y)}\theta(x-y) + \frac{p(x_0)}{p(x)}\frac{d}{dx}\langle x|G|y\rangle\Big|_{x=x_0} - \frac{p(y)}{p(x)a(y)w(y)}\theta(x_0-y) - \frac{1}{p(x)}\int_{x_0}^{x} dx'\,\frac{p(x')c(x')}{a(x')}\langle x'|G|y\rangle \tag{526} \]

This has the form
\[ \frac{d}{dx}\langle x|G|y\rangle = f(x) \tag{527} \]
where f(x) is a bounded function, which gives
\[ \langle x|G|y\rangle = \langle x_0|G|y\rangle + \int_{x_0}^{x} f(x')\,dx' \tag{528} \]

This implies that ⟨x|G|y⟩ is continuous at x = y. Equations (525) and (528) relate the coefficient functions a_>(y), a_<(y), b_>(y), b_<(y). The relations are

\[ (a_>(y) - a_<(y))f_1(y) + (b_>(y) - b_<(y))f_2(y) = 0 \tag{529} \]
\[ (a_>(y) - a_<(y))\frac{df_1}{dx}(y) + (b_>(y) - b_<(y))\frac{df_2}{dx}(y) = \frac{1}{a(y)w(y)} \tag{530} \]

It is useful to write this pair of equations as a matrix equation
\[ \begin{pmatrix} f_1(y) & f_2(y) \\[2pt] \frac{df_1}{dx}(y) & \frac{df_2}{dx}(y) \end{pmatrix}\begin{pmatrix} a_>(y) - a_<(y) \\ b_>(y) - b_<(y) \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{1}{a(y)w(y)} \end{pmatrix} \tag{531} \]
which has a solution
\[ \begin{pmatrix} a_>(y) - a_<(y) \\ b_>(y) - b_<(y) \end{pmatrix} = \begin{pmatrix} f_1(y) & f_2(y) \\[2pt] \frac{df_1}{dx}(y) & \frac{df_2}{dx}(y) \end{pmatrix}^{-1}\begin{pmatrix} 0 \\ \frac{1}{a(y)w(y)} \end{pmatrix} \tag{532} \]

because the determinant of the matrix is the Wronskian of the solutions of the homogeneous equation, which is never zero.

The remaining relations are determined by imposing the homogeneous boundary conditions on the x variable in ⟨x|G|y⟩. This works if L_x f(x) = 0 has no non-trivial solutions satisfying the homogeneous boundary conditions.

To understand this result first note that we can always solve for the differences a_> − a_< and b_> − b_< because the Wronskian is not zero and a(y)w(y) is not zero. We can write

\[ \langle x|G|y\rangle = a_>(y)f_1(x) + b_>(y)f_2(x) + \langle x|R|y\rangle \tag{533} \]
where ⟨x|R|y⟩ is defined by this equation, assuming ⟨x|G|y⟩ exists.


Recall that the homogeneous boundary conditions can be considered as linear operators B_1 and B_2 on the solution, giving formally

\[ B_1(G) = a_>B_1(f_1) + b_>B_1(f_2) + B_1(R) = 0 \tag{534} \]
\[ B_2(G) = a_>B_2(f_1) + b_>B_2(f_2) + B_2(R) = 0 \tag{535} \]
This is a linear system for the coefficients a_> and b_>:
\[ \begin{pmatrix} B_1(f_1) & B_1(f_2) \\ B_2(f_1) & B_2(f_2) \end{pmatrix}\begin{pmatrix} a_>(y) \\ b_>(y) \end{pmatrix} = \begin{pmatrix} -B_1(R) \\ -B_2(R) \end{pmatrix} \tag{536} \]

This system will have a unique solution if the matrix has a non-zero determinant. This shows that the absence of non-trivial solutions satisfying the homogeneous boundary conditions is sufficient for the existence and uniqueness of ⟨x|G|y⟩. If the determinant is zero then there is a non-trivial linear combination g(x) = c_1f_1(x) + c_2f_2(x) that is a solution to the homogeneous equation satisfying B_1(g) = B_2(g) = 0.

Next we show that if G exists then the homogeneous equation has no non-trivial solutions. Consider
\[ \langle v|LG\rangle - \langle L^\dagger v|G\rangle = 0 \tag{537} \]
This is equivalent to
\[ v^*(y) = \int w(x)\big(L^\dagger_x v(x)\big)^*\langle x|G|y\rangle\,dx \tag{538} \]

From this equation, if L†|v⟩ = 0 has a non-trivial solution satisfying the adjoint homogeneous boundary conditions, then by (538) this solution must vanish. Thus L†|v⟩ = 0 has no non-trivial solutions. Applying our previous argument to g and L†, it follows that ⟨x|g|y⟩ exists and is unique. Next consider

\[ \langle g|Lf\rangle = \langle L^\dagger g|f\rangle = f(y) \tag{539} \]

which shows that if Lf = 0 has a solution it must be zero, which means the homogeneous equation has no non-trivial solutions.

Next I consider an example. The differential equation is
\[ L\langle x|u\rangle = \langle x|f\rangle \tag{540} \]


where
\[ L = \frac{d^2}{dx^2} \tag{541} \]
on the interval [0, 1] satisfying Dirichlet boundary conditions
\[ \langle 0|u\rangle = \langle 1|u\rangle = 0 \tag{542} \]
First note that there are two independent solutions to the homogeneous equation
\[ L\langle x|f\rangle = 0 \tag{543} \]
given by
\[ \langle x|f_1\rangle = 1 \qquad \langle x|f_2\rangle = x \tag{544} \]

If we set a + b·0 = 0 and a + b·1 = 0 we conclude that there are no non-trivial solutions to the homogeneous equation satisfying the boundary conditions.

We also note that
\[ \langle x|v\rangle^*L\langle x|u\rangle - \langle x|u\rangle L\langle x|v\rangle^* = \frac{d}{dx}\Big(\langle x|v\rangle^*\frac{d}{dx}\langle x|u\rangle - \langle x|u\rangle\frac{d}{dx}\langle x|v\rangle^*\Big) \tag{545} \]
which shows that L is self-adjoint with weight 1. It is also straightforward to show that the adjoint boundary conditions are
\[ \langle 0|v\rangle = \langle 1|v\rangle = 0 \tag{546} \]

which are identical to the boundary conditions on ⟨x|u⟩, which shows that L with these boundary conditions is Hermitian.

The Green's function ⟨x|G|y⟩ has the general form
\[ \langle x|G|y\rangle = \begin{cases} a_>(y)\cdot 1 + b_>(y)\cdot x & x > y \\ a_<(y)\cdot 1 + b_<(y)\cdot x & x < y \end{cases} \tag{547} \]

The matching conditions at x = y, with a(x) = w(x) = 1 so that 1/(a(y)w(y)) = 1, are

\[ (a_>(y) - a_<(y)) + (b_>(y) - b_<(y))y = 0 \tag{548} \]
\[ (b_>(y) - b_<(y))\cdot 1 = 1 \tag{549} \]
which gives
\[ b_>(y) - b_<(y) = 1 \tag{550} \]
\[ a_>(y) - a_<(y) = -y \tag{551} \]

Next we impose the Dirichlet boundary conditions at 0 and 1. On the left end we can assume that y > x = 0, so

\[ 0 = a_<(y)\cdot 1 + b_<(y)\cdot 0 = a_<(y) \tag{552} \]
On the right end we can assume that 1 = x > y:
\[ 0 = a_>(y)\cdot 1 + b_>(y)\cdot 1 \tag{553} \]
Taking these four equations together we get
\[ a_<(y) = 0 \qquad a_>(y) = -y \qquad b_>(y) = y \qquad b_<(y) = y - 1 \tag{554} \]
Putting these together gives
\[ \langle x|G|y\rangle = \begin{cases} y(x-1) & x > y \\ (y-1)x & x < y \end{cases} \tag{555} \]

We can now use this to construct the general solution to the equation
\[ L\langle x|u\rangle = \langle x|f\rangle \tag{556} \]
which is
\[ \langle x|u\rangle = \int_0^1 dy\,\langle x|G|y\rangle\langle y|f\rangle \tag{557} \]

which in this case is
\[ \langle x|u\rangle = (x-1)\int_0^x dy\,y\langle y|f\rangle + x\int_x^1 dy\,(y-1)\langle y|f\rangle \tag{558} \]

This clearly vanishes at x = 0 and x = 1. We can check the differential equation:

\[ \frac{d^2}{dx^2}\langle x|u\rangle = \frac{d}{dx}\Big[\int_0^x dy\,y\langle y|f\rangle + (x-1)x\langle x|f\rangle + \int_x^1 dy\,(y-1)\langle y|f\rangle - x(x-1)\langle x|f\rangle\Big] = \frac{d}{dx}\Big[\int_0^x dy\,y\langle y|f\rangle + \int_x^1 dy\,(y-1)\langle y|f\rangle\Big] = x\langle x|f\rangle - (x-1)\langle x|f\rangle = \langle x|f\rangle \tag{559} \]

as required. To be more specific, choose ⟨x|f⟩ = x². Then the general formula becomes

\[ \langle x|u\rangle = (x-1)\int_0^x dy\,y^3 + x\int_x^1 dy\,(y-1)y^2 = \frac{1}{4}(x-1)x^4 + x\Big(\frac{1}{4} - \frac{1}{3} - \frac{x^4}{4} + \frac{x^3}{3}\Big) = \frac{1}{12}x(x^3-1) \tag{560} \]

which obviously satisfies the differential equation and boundary conditions. In this example the solution could have been written down by inspection; the advantage of the Green function method is that solutions for any choice of ⟨x|f⟩ can be calculated this way.
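The closed form (555) can also be checked numerically. The sketch below (assuming NumPy is available) integrates the Green's function against f(x) = x² and compares with the exact solution x(x³ − 1)/12:

```python
import numpy as np

# Numerical check of the Green's function (555): solve u'' = f on [0, 1]
# with Dirichlet conditions for f(x) = x^2 and compare with x(x^3 - 1)/12.
def G(x, y):
    return y*(x - 1.0) if x > y else (y - 1.0)*x

def trapezoid(vals, grid):
    """Composite trapezoid rule on a given grid."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

xs = np.linspace(0.0, 1.0, 21)
ys = np.linspace(0.0, 1.0, 4001)      # quadrature grid for the y integral
f = ys**2

u = np.array([trapezoid(np.array([G(x, y) for y in ys]) * f, ys) for x in xs])
exact = xs * (xs**3 - 1.0) / 12.0
assert np.max(np.abs(u - exact)) < 1e-6
```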


0.23 Lecture 23

In this section I consider the situation where the homogeneous equation
\[ L_x|f\rangle = 0 \tag{561} \]
has non-trivial solutions satisfying homogeneous boundary conditions. In this case it is still possible to construct a generalized Green's function. I start by assuming that
\[ L^\dagger_x|v_i\rangle = 0 \tag{562} \]
\[ L_x|u_i\rangle = 0 \tag{563} \]
If there is more than one solution to these equations, we can without loss of generality choose linear combinations of these solutions that are orthonormal:
\[ \langle v_i|v_j\rangle = \delta_{ij} \qquad \langle u_i|u_j\rangle = \delta_{ij} \tag{564} \]

The equations
\[ L_xG = I - \sum_i |v_i\rangle\langle v_i| \qquad \langle u_i|G = 0 \tag{565} \]
\[ L^\dagger_xg = I - \sum_i |u_i\rangle\langle u_i| \qquad \langle v_i|g = 0 \tag{566} \]

can be solved. The solution is called a generalized Green's function. As an example of how this works consider

\[ L = \frac{d^2}{dx^2} \tag{567} \]
with periodic boundary conditions at x = −a and x = a. Clearly the function
\[ \langle x|u_1\rangle = \frac{1}{\sqrt{2a}} \tag{568} \]

is a unit normalized solution to the homogeneous equation satisfying the homogeneous boundary conditions. In this case G = g,
\[ \langle x|v_1\rangle = \langle x|u_1\rangle = \frac{1}{\sqrt{2a}} \tag{569} \]


and equations (565) and (566) become
\[ \frac{d^2}{dx^2}\langle x|G|y\rangle = \delta(x-y) - \langle x|v_1\rangle\langle v_1|y\rangle = \delta(x-y) - \frac{1}{2a} \tag{570} \]

The above equation can be solved for x ≠ y. Using
\[ \frac{d^2}{dx^2}\Big(\frac{x^2}{4a}\Big) = \frac{1}{2a} \tag{571} \]
gives
\[ \frac{d^2}{dx^2}\Big(\langle x|G|y\rangle + \frac{x^2}{4a}\Big) = 0 \tag{572} \]

for x ≠ y. The solutions of this equation are arbitrary linear combinations of the solutions of the homogeneous form of the differential equation. It follows that
\[ \langle x|G|y\rangle = -\frac{x^2}{4a} + a_>(y)\cdot 1 + b_>(y)\,x \qquad x > y \tag{573} \]
and
\[ \langle x|G|y\rangle = -\frac{x^2}{4a} + a_<(y)\cdot 1 + b_<(y)\,x \qquad x < y \tag{574} \]

The matching conditions on the generalized Green function remain unchanged because the subtracted term is smooth across x = y. This means that

\[ a_>(y) + b_>(y)y - \frac{y^2}{4a} - a_<(y) - b_<(y)y + \frac{y^2}{4a} = 0 \tag{575} \]
and
\[ b_>(y) - \frac{y}{2a} - b_<(y) + \frac{y}{2a} = 1 \tag{576} \]

which are identical to the matching conditions on the regular Green function. These can be solved to give

\[ b_<(y) = b_>(y) - 1 \tag{577} \]
\[ a_<(y) = a_>(y) + y \tag{578} \]

Next we impose the periodic boundary conditions and the condition that
\[ \langle u_1|G = 0 \tag{579} \]


\[ \langle x|G|y\rangle = -\frac{x^2}{4a} + a_>(y)\cdot 1 + b_>(y)\,x \qquad x > y \tag{580} \]
and
\[ \langle x|G|y\rangle = -\frac{x^2}{4a} + (a_>(y) + y)\cdot 1 + (b_>(y) - 1)\,x \qquad x < y \tag{581} \]

The periodic boundary conditions require
\[ -\frac{a^2}{4a} + a_>(y) + b_>(y)\,a = -\frac{(-a)^2}{4a} + (a_>(y) + y) + (b_>(y) - 1)(-a) \tag{582} \]
and
\[ -\frac{a}{2a} + b_>(y) = -\frac{(-a)}{2a} + (b_>(y) - 1) \tag{583} \]

The second equation is trivially satisfied for any value of b_>(y). The first equation gives

\[ b_>(y) = \frac{y}{2a} + \frac{1}{2} = 1 + b_<(y) \tag{584} \]

To find a_>(y), the requirement ⟨u_1|G = 0 means that

\[ 0 = \int_{-a}^{a} \frac{1}{\sqrt{2a}}\langle x|G|y\rangle\,dx. \tag{585} \]

Multiplying through by √(2a) gives

\[ 0 = \int_{-a}^{y}\Big[-\frac{x^2}{4a} + (a_>(y) + y) + (b_>(y) - 1)x\Big]dx + \int_{y}^{a}\Big[-\frac{x^2}{4a} + a_>(y) + b_>(y)x\Big]dx \]
\[ = -\frac{y^3}{12a} - \frac{a^3}{12a} + (a_> + y)(y + a) + (b_> - 1)\frac{y^2}{2} - (b_> - 1)\frac{a^2}{2} - \frac{a^3}{12a} + \frac{y^3}{12a} + a_>(a - y) + b_>\frac{a^2}{2} - b_>\frac{y^2}{2} \tag{586} \]

Inserting
\[ b_> = \frac{y}{2a} + \frac{1}{2} \tag{587} \]

in (586) gives

\[ 0 = -\frac{y^3}{12a} - \frac{a^3}{12a} + (a_> + y)(y + a) + \Big(\frac{y}{2a} + \frac{1}{2} - 1\Big)\frac{y^2}{2} - \Big(\frac{y}{2a} + \frac{1}{2} - 1\Big)\frac{a^2}{2} - \frac{a^3}{12a} + \frac{y^3}{12a} + a_>(a - y) + \Big(\frac{y}{2a} + \frac{1}{2}\Big)\frac{a^2}{2} - \Big(\frac{y}{2a} + \frac{1}{2}\Big)\frac{y^2}{2} \tag{588} \]

which is a linear equation for a_> that can be solved to get
\[ a_> = -\frac{1}{2a}\Big(\frac{a^2}{3} + \frac{y^2}{2} + ya\Big) \tag{589} \]

along with the full solution for the coefficients
\[ a_< = -\frac{1}{2a}\Big(\frac{a^2}{3} + \frac{y^2}{2} - ya\Big) \tag{590} \]
\[ b_> = \frac{y}{2a} + \frac{1}{2} \tag{591} \]
\[ b_< = \frac{y}{2a} - \frac{1}{2} \tag{592} \]

This example shows how to handle the special case of generalized Green's functions. In order to understand the nature of generalized Green functions, consider a compact operator of the form

\[ L = \sum_{n} |v_n\rangle\lambda_n\langle u_n| \tag{593} \]
and
\[ L^\dagger = \sum_{n} |u_n\rangle\lambda_n\langle v_n| \tag{594} \]

Note that
\[ L^\dagger L = \sum_{n} |u_n\rangle\lambda_n^2\langle u_n| \tag{595} \]
\[ LL^\dagger = \sum_{n} |v_n\rangle\lambda_n^2\langle v_n| \tag{596} \]

so λ_n = 0 implies
\[ L|u_n\rangle = L^\dagger|v_n\rangle = 0 \tag{597} \]


If none of the λ_n = 0 then it is possible to construct the inverses of these operators:
\[ G = \sum_{n} |u_n\rangle\frac{1}{\lambda_n}\langle v_n| \tag{598} \]
\[ g = \sum_{n} |v_n\rangle\frac{1}{\lambda_n}\langle u_n| \tag{599} \]

If some of the λ_n are zero, restrict the sum Σ → Σ′ to a sum over the non-zero λ_n and define
\[ G' = \sum_{n}{}' |u_n\rangle\frac{1}{\lambda_n}\langle v_n| \tag{600} \]
\[ g' = \sum_{n}{}' |v_n\rangle\frac{1}{\lambda_n}\langle u_n| \tag{601} \]

In this case
\[ LG' = \sum_{n}{}' |v_n\rangle\langle v_n| = I - \sum_{n}{}'' |v_n\rangle\langle v_n| \tag{602} \]

where the second sum in (602) is over all |v_n⟩ that satisfy

\[ L^\dagger|v_n\rangle = 0 \tag{603} \]
Similarly
\[ L^\dagger g' = \sum_{n}{}' |u_n\rangle\langle u_n| = I - \sum_{n}{}'' |u_n\rangle\langle u_n| \tag{604} \]
where the second sum in (604) is over all |u_n⟩ that satisfy
\[ L|u_n\rangle = 0 \tag{605} \]

Note that
\[ LG'L = L = \sum_{n}{}' |v_n\rangle\lambda_n\langle u_n| \tag{606} \]
\[ G'LG' = G' = \sum_{n}{}' |u_n\rangle\frac{1}{\lambda_n}\langle v_n| \tag{607} \]
\[ (LG') = (LG')^\dagger = \sum_{n}{}' |v_n\rangle\langle v_n| \tag{608} \]
\[ (G'L) = (G'L)^\dagger = \sum_{n}{}' |u_n\rangle\langle u_n| \tag{609} \]

These equations are called the Penrose equations. They always have a unique solution, called the Moore-Penrose generalized inverse.
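These properties are easy to verify numerically: numpy.linalg.pinv computes the Moore-Penrose inverse via the singular value decomposition, which is the construction (600) with the sum restricted to nonzero singular values. A sketch (assuming NumPy is available):

```python
import numpy as np

# Verify the Penrose equations (606)-(609) for a rank-deficient matrix.
rng = np.random.default_rng(0)
L = rng.normal(size=(5, 3)) @ rng.normal(size=(3, 6))   # rank 3, shape 5x6
Gp = np.linalg.pinv(L)                                   # Moore-Penrose inverse

assert np.allclose(L @ Gp @ L, L)                # L G' L = L       (606)
assert np.allclose(Gp @ L @ Gp, Gp)              # G' L G' = G'     (607)
assert np.allclose((L @ Gp).conj().T, L @ Gp)    # (L G') Hermitian (608)
assert np.allclose((Gp @ L).conj().T, Gp @ L)    # (G' L) Hermitian (609)
```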

So far we have used Green functions to treat the case where the solution satisfies homogeneous boundary conditions. We can also treat the case where the solution satisfies inhomogeneous boundary conditions of the form

\[ b_{i1}f(b) + b_{i2}f'(b) + b_{i3}f(a) + b_{i4}f'(a) = \sigma_i \tag{610} \]

for i = 1, 2. To solve this problem, construct the Green function for the differential equation satisfying the homogeneous form of the boundary conditions. Consider the generalized Green's identity:

\[ \int \big[(\langle x|g|y\rangle)^*w(x)L_x\langle x|f\rangle - (L^\dagger_x\langle x|g|y\rangle)^*w(x)\langle x|f\rangle\big]dx = Q(g^*, f, g^{*\prime}, f')(b) - Q(g^*, f, g^{*\prime}, f')(a) \tag{611} \]

This holds for any f. Using L†g = I gives
\[ \int (\langle x|g|y\rangle)^*w(x)L_x\langle x|f\rangle\,dx = \langle y|f\rangle + Q(g^*, f, g^{*\prime}, f')(b) - Q(g^*, f, g^{*\prime}, f')(a) \tag{612} \]

Next choose the weight function w that makes L = L†. In this case Q has the form
\[ Q = w(x)a(x)\Big[g^*(x)\frac{df}{dx} - f(x)\frac{dg^*}{dx}\Big] \]

and ⟨x|g|y⟩* = ⟨y|G|x⟩. It follows that
\[ \langle y|f\rangle = \int \langle y|G|x\rangle w(x)L_x\langle x|f\rangle\,dx - w(b)a(b)\Big[\langle y|G|b\rangle\frac{df}{dx}(b) - f(b)\frac{d}{dx}\langle y|G|x\rangle\Big|_{x=b}\Big] + w(a)a(a)\Big[\langle y|G|a\rangle\frac{df}{dx}(a) - f(a)\frac{d}{dx}\langle y|G|x\rangle\Big|_{x=a}\Big] \tag{613} \]

This equation is true for any function ⟨x|f⟩. Now assume that L⟨x|f⟩ = ⟨x|h⟩. The case ⟨x|h⟩ = 0 is a special case where this method also works; it gives a solution to the homogeneous form of the differential equation with inhomogeneous boundary conditions.

Also note that G satisfies adjoint boundary conditions in the second variable. What survives involves boundary values of f.

To be specific, assume G is the Green function satisfying homogeneous Dirichlet boundary conditions. The adjoint boundary conditions are also Dirichlet, which means
\[ 0 = \langle y|G|b\rangle = \langle y|G|a\rangle \tag{614} \]

This leaves
\[ \langle y|f\rangle = \int \langle y|G|x\rangle w(x)\langle x|h\rangle\,dx + w(b)a(b)f(b)\frac{d}{dx}\langle y|G|x\rangle\Big|_{x=b} - w(a)a(a)f(a)\frac{d}{dx}\langle y|G|x\rangle\Big|_{x=a} \tag{615} \]

where the inhomogeneous Dirichlet boundary conditions are
\[ f(x = a) = f(a) \qquad f(x = b) = f(b) \tag{616} \]

which is specified input on the right side of this equation. This equation gives the solution to
\[ L\langle x|f\rangle = \langle x|h\rangle \tag{617} \]

with values ⟨b|f⟩ = f(b) and ⟨a|f⟩ = f(a). Obviously there is nothing special about Dirichlet boundary conditions; a similar result could have been achieved using Neumann boundary conditions. Setting ⟨x|h⟩ = 0 gives a solution of the homogeneous differential equation satisfying inhomogeneous boundary conditions.


0.24 Sturm-Liouville Equations

Consider a second order differential operator with weight chosen so L = L†. In this case
\[ L\langle x|f\rangle = \frac{d}{dx}\Big(a(x)\frac{d\langle x|f\rangle}{dx}\Big) + c(x)\langle x|f\rangle \tag{618} \]

Consider the eigenvalue problem for this operator,
\[ L\langle x|f\rangle = \lambda\langle x|f\rangle \tag{619} \]
In what follows we assume that ⟨x|G|y⟩ exists and satisfies LG = I. We also assume that while ⟨x|G|y⟩ may have a discontinuous derivative at x = y, it is bounded:
\[ |\langle x|G|y\rangle| < C < \infty \tag{620} \]

for all a ≤ x, y ≤ b. Consider
\[ L\langle x|f\rangle = \lambda\langle x|f\rangle \tag{621} \]
Then
\[ G|f\rangle = \frac{1}{\lambda}GL|f\rangle = \frac{1}{\lambda}|f\rangle \tag{622} \]

which means that |f⟩ is also an eigenvector of G with eigenvalue 1/λ. The eigenvalue equation has the form
\[ \int \langle x|G|y\rangle w(y)\langle y|f\rangle\,dy = \frac{1}{\lambda}\langle x|f\rangle \tag{623} \]

The bound |⟨x|G|y⟩| < C < ∞ ensures that for a bounded interval [a, b]
\[ \int\!\!\int |\langle x|G|y\rangle|^2 w(x)w(y)\,dx\,dy \leq C^2(b-a)^2W^2 < \infty \tag{624} \]
where W is any upper bound for w(x) on [a, b]. This shows that ⟨x|G|y⟩ is Hilbert-Schmidt and hence G is compact.

It follows that 1/|λ_n| → 0 as n → ∞. This means that the eigenvalues of L satisfy |λ_n| → ∞ as n → ∞. Since we know

\[ G = \sum_{n=0}^{\infty} |n\rangle\eta_n\langle n| \tag{625} \]
it follows that
\[ L = \sum_{n=0}^{\infty} |n\rangle\frac{1}{\eta_n}\langle n| \tag{626} \]
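This spectral picture can be illustrated with a finite-difference discretization (assuming NumPy is available). For L = d²/dx² on [0, 1] with Dirichlet conditions the exact eigenvalues are λ_n = −(nπ)², so the eigenvalues of the Green operator G = L⁻¹ accumulate at zero:

```python
import numpy as np

# Discretized illustration of compactness: the eigenvalues of G = L^{-1}
# for L = d^2/dx^2 (Dirichlet on [0, 1]) are -1/(n pi)^2 -> 0.
N = 500
h = 1.0 / (N + 1)
Lmat = (np.diag(-2.0*np.ones(N)) + np.diag(np.ones(N-1), 1)
        + np.diag(np.ones(N-1), -1)) / h**2            # second-difference matrix

inv_eigs = np.sort(np.abs(np.linalg.eigvalsh(np.linalg.inv(Lmat))))[::-1]
for n in (1, 2, 3):
    assert abs(inv_eigs[n-1] - 1.0/(n*np.pi)**2) < 1e-5
assert inv_eigs[-1] < 1e-4    # the eigenvalues 1/|lambda_n| accumulate at 0
```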

Example: Let γ(λ) denote a curve connecting points a and b as λ varies between 0 and 1. It follows that
\[ \gamma(0) = a \qquad \gamma(1) = b \tag{627} \]
In classical mechanics a particle travels along a path between two points that is an extremum of the action functional defined by
\[ A[\gamma] = \int_0^1 d\lambda\,\Big[\dot\gamma^i\dot\gamma^i - V(\vec\gamma)\Big] \tag{628} \]

where V represents the "potential energy". The equations that determine the desired path involve writing a general path as the sum of the correct path γ⃗_0(λ) and a correction δγ⃗(λ):
\[ \gamma^i = \gamma^i_0 + \eta\,\delta\gamma^i(\lambda) \tag{629} \]
where the requirement that both paths have the same endpoints means that
\[ \delta\vec\gamma(0) = \delta\vec\gamma(1) = 0. \tag{630} \]

The condition that γ⃗_0 is an extremum of this action functional can be written as
\[ \frac{dA[\gamma, \eta]}{d\eta} = 0 \tag{631} \]
when γ⃗ = γ⃗_0. This gives the usual Lagrange equations
\[ \frac{d}{d\lambda}\Big(\frac{\partial L}{\partial \dot\gamma^i}\Big) - \frac{\partial L}{\partial \gamma^i} = 0 \tag{632} \]

We can ask whether the solution of this equation is a minimum of the action functional or simply an extremum. This is done by looking at the "second derivative" and checking to see that it is positive. I will show that this is the case if a certain Sturm-Liouville differential equation has only positive eigenvalues.

The check is to look at the second derivative of the action with respect to η evaluated at the extreme "point" γ⃗ = γ⃗_0:

\[ \frac{d^2}{d\eta^2}A[\vec\gamma_0 + \eta\,\delta\vec\gamma] = \int_0^1 \Big\{\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\dot\gamma^j}\Big)_0 \delta\dot\gamma^i\,\delta\dot\gamma^j + 2\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\gamma^j}\Big)_0 \delta\dot\gamma^i\,\delta\gamma^j + \Big(\frac{\partial^2L}{\partial\gamma^i\,\partial\gamma^j}\Big)_0 \delta\gamma^i\,\delta\gamma^j\Big\}\,d\lambda \tag{633} \]

The partial derivative terms are well defined functions of λ because they depend on the original curve γ_0. The expression is quadratic in the variations, but it is homogeneous. This means that we can always scale the variations to make it have any desired value. In order to remove the freedom to rescale the variations we normalize them so

\[ \int_0^1 \sum_i \delta\gamma^i\delta\gamma^i\,d\lambda = 1 \tag{634} \]

We then define a new functional of the variations,
\[ \int_0^1 \Big\{2\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\dot\gamma^j}\Big)_0 \delta\dot\gamma^i\,\delta\dot\gamma^j + 2\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\gamma^j}\Big)_0 \delta\dot\gamma^i\,\delta\gamma^j + 2\Big(\frac{\partial^2L}{\partial\gamma^i\,\partial\gamma^j}\Big)_0 \delta\gamma^i\,\delta\gamma^j - 2\sigma\sum_i \delta\gamma^i\delta\gamma^i\Big\}\,d\lambda \tag{635} \]

We seek extreme values of this functional subject to the constraint (634). In this case
\[ \delta\vec\gamma = \delta\vec\gamma_0 + \rho\,\delta\vec\gamma_1 \tag{636} \]

Clearly the minimum will be an extremum of this functional. Taking the derivative with respect to ρ and setting ρ = 0 gives

\[ \int_0^1 \Big\{2\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\dot\gamma^j}\Big)_0 \delta\dot\gamma^i_0\,\delta\dot\gamma^j_1 + 2\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\gamma^j}\Big)_0 \big(\delta\dot\gamma^i_0\,\delta\gamma^j_1 + \delta\dot\gamma^i_1\,\delta\gamma^j_0\big) + 2\Big(\frac{\partial^2L}{\partial\gamma^i\,\partial\gamma^j}\Big)_0 \delta\gamma^i_0\,\delta\gamma^j_1 - 2\sigma\,\delta\gamma^i_0\,\delta\gamma^i_1\Big\}\,d\lambda \tag{637} \]

Integrating the terms containing derivatives of δγ^i_1 by parts gives the following:

\[ \int_0^1 \Big\{2\Big(\frac{\partial^2L}{\partial\dot\gamma^j\,\partial\gamma^i}\Big)_0 \delta\dot\gamma^i_0 - 2\frac{d}{d\lambda}\Big[\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\dot\gamma^j}\Big)_0 \delta\dot\gamma^i_0\Big] + 2\Big(\frac{\partial^2L}{\partial\gamma^i\,\partial\gamma^j}\Big)_0 \delta\gamma^i_0 - \frac{d}{d\lambda}\Big[2\Big(\frac{\partial^2L}{\partial\dot\gamma^j\,\partial\gamma^i}\Big)_0 \delta\gamma^i_0\Big] - 2\sigma\,\delta\gamma^j_0\Big\}\,\delta\gamma^j_1\,d\lambda \tag{638} \]

If this is required to be extremal then it should vanish for all δγ^j_1. This condition requires:


The resulting equation is
\[ -\frac{d}{d\lambda}\Big[\Big(\frac{\partial^2L}{\partial\dot\gamma^i\,\partial\dot\gamma^j}\Big)_0 \frac{d\,\delta\gamma^i_0}{d\lambda}\Big] + \Big(\frac{\partial^2L}{\partial\gamma^i\,\partial\gamma^j}\Big)_0 \delta\gamma^i_0 - \frac{d}{d\lambda}\Big[\Big(\frac{\partial^2L}{\partial\dot\gamma^j\,\partial\gamma^i}\Big)_0 \delta\gamma^i_0\Big] = -\sigma\,\delta\gamma^j_0 \tag{639} \]
This equation has the general form

−(d/dλ)( Aij(λ) dfj/dλ ) + Cij(λ) fj = σ fi   (640)

This is an ordinary Sturm-Liouville equation when γ is a one dimensional curve. The fi represent variations of the curve in extremal directions, while the eigenvalue σ represents the value of the second-order variation when the displacements are normalized to 1.

The condition for a minimum is that all of the eigenvalues of the Sturm-Liouville problem must be positive. If the time interval [0, 1] is changed the eigenvalues will change. Normally for short times the eigenvalues are positive; for longer times, however, an eigenvalue may eventually pass through zero.
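This sign test is easy to carry out numerically by discretizing the Sturm-Liouville operator. The sketch below is my own example, not from the notes: for the harmonic-oscillator Lagrangian L = ½(ẋ² − ω²x²) the second-variation operator is −d²/dλ² − ω² with Dirichlet conditions, whose exact eigenvalues are (nπ/T)² − ω², so the extremal path stops being a minimum once T exceeds π/ω.

```python
import numpy as np

# Sketch (my example, not from the notes): eigenvalues of the discretized
# second-variation operator  -d^2/dlambda^2 - omega^2  on [0, T] with
# Dirichlet boundary conditions, for L = (1/2)(xdot^2 - omega^2 x^2).
def sturm_liouville_eigs(T, omega, N=400):
    h = T / (N + 1)
    A = (np.diag(np.full(N, 2.0)) - np.diag(np.ones(N - 1), 1)
         - np.diag(np.ones(N - 1), -1)) / h**2 - omega**2 * np.eye(N)
    return np.linalg.eigvalsh(A)   # ascending; exact values (n*pi/T)^2 - omega^2

print(sturm_liouville_eigs(1.0, 1.0)[0] > 0)   # True: T < pi/omega, a minimum
print(sturm_liouville_eigs(4.0, 1.0)[0] > 0)   # False: an eigenvalue went negative
```

The crossover of the smallest eigenvalue through zero at T = π/ω reproduces the statement above about longer time intervals.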

Next I consider the relation between Green's functions, resolvents, and solutions to the Sturm-Liouville problem.

Let L be a second order Sturm-Liouville differential operator with a given set of boundary conditions and define

L′ = z − L (641)

where z is any complex number. We know that L has a complete set of eigenvectors |fn⟩ with eigenvalues λn which get large in magnitude as n → ∞.

Clearly

L′|fn⟩ = (z − L)|fn⟩ = (z − λn)|fn⟩   (642)

so L′ also has the same complete set of eigenvectors, with different eigenvalues. The Green's function for L′ satisfies

L′G′(z) = I (643)

and it exists whenever z is not an eigenvalue of L. Expand

⟨x|G′(z)|y⟩ = Σ_{n=0}^∞ ⟨x|fn⟩ an(y)   (644)

To find the expansion coefficients an(y), apply L′:

L′⟨x|G′(z)|y⟩ = Σ_{n=0}^∞ ⟨x|fn⟩ (z − λn) an(y) = δ(x − y)   (645)

If we expand the delta function the same way

δ(x − y) = Σ_{n=0}^∞ ⟨x|fn⟩ bn(y)   (646)

and multiply by ⟨fm|x⟩ and integrate over x, we get

⟨fm|y⟩ = Σ_{n=0}^∞ δmn bn(y) = bm(y)   (647)

giving

δ(x − y) = Σ_{n=0}^∞ ⟨x|fn⟩⟨fn|y⟩   (648)

Using (648) in (645) gives

Σ_{n=0}^∞ ⟨x|fn⟩ [ (z − λn) an(y) − ⟨fn|y⟩ ] = 0   (649)

The independence of the basis functions gives

(z − λn) an(y) − ⟨fn|y⟩ = 0   (650)

which can be solved for the coefficients an(y):

an(y) = ⟨fn|y⟩ / (z − λn)   (651)

Using (651) in (644) gives

⟨x|G′(z)|y⟩ = Σ_{n=0}^∞ ⟨x|fn⟩⟨fn|y⟩ / (z − λn)   (652)

This expression relates the resolvent

G(z) = (z − L)⁻¹   (653)

of L to both the Green's functions and the eigenfunction expansion. In some problems it is easy to solve the Sturm-Liouville eigenvalue problem, and then the above method can be used to write the Green's function in terms of eigenfunctions and eigenvalues of L.
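The spectral representation (652) can be checked numerically for a finite-dimensional stand-in for L; the finite-difference discretization below is my own choice, not from the notes.

```python
import numpy as np

# Sketch: verify G(z) = sum_n |f_n><f_n| / (z - lambda_n) for a
# finite-difference model of L = -d^2/dx^2 on [0,1], Dirichlet conditions.
N = 200
h = 1.0 / (N + 1)
L = (np.diag(np.full(N, 2.0)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2

lam, F = np.linalg.eigh(L)          # eigenvalues lambda_n, orthonormal columns f_n
z = -1.0 + 0.5j                     # any z that is not an eigenvalue of L

G_direct = np.linalg.inv(z * np.eye(N) - L)
G_spectral = (F / (z - lam)) @ F.conj().T   # sum_n f_n f_n^T / (z - lambda_n)

print(np.allclose(G_direct, G_spectral))    # True
```

The division `F / (z - lam)` scales each eigenvector column by 1/(z − λn), so the matrix product is exactly the eigenfunction expansion of the resolvent.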

It also happens that if one can find the resolvent by other methods, it can be used to extract solutions of the Sturm-Liouville problem. Assume that λn is an isolated eigenvalue of L. We showed previously that G(z) is analytic in a neighborhood of any point z that is not an eigenvalue of L. At isolated eigenvalues it has simple poles. Consider

(1/2πi) ∮ G(z) dz   (654)

about a contour that encloses z = λn, but no other eigenvalues. Then the above integral becomes

(1/2πi) ∮ G(z) dz = |fn⟩⟨fn|   (655)

which is the orthogonal projector on the n-th eigenstate of L.

The advantage of the representation of the Green's functions in terms of eigenfunction expansions is that it can be applied to any Hermitian Hilbert space operator, whether a differential operator, partial differential operator, integral operator, or integrodifferential operator.

The resolvent G(z) has poles at the eigenvalues of L. These are precisely the points where L′ has homogeneous solutions satisfying homogeneous boundary conditions.

0.25 Lecture 25

So far our discussion of second order linear differential equations was based on the assumption that we already had at least one solution to the homogeneous equation. In this section we discuss one method for solving a large class of second order differential equations using power series.

Consider a second order linear differential operator of the form

L = d²/dz² + p(z) d/dz + q(z)   (656)

where p(z) and q(z) are analytic in an open region R except possibly on a set of isolated points. Recall that an open region is pathwise connected.

z0 ∈ R is called an ordinary point if both p(z) and q(z) are analytic at z = z0.

z0 ∈ R is called a singular point if p(z) or q(z) has a singularity at z = z0.

z0 ∈ R is called a regular singular point if p(z) has a pole of order ≤ 1 and q(z) has a pole of order ≤ 2 at z = z0.

z0 ∈ R is called an irregular singular point if it is a singular point but not a regular singular point.

Next I show that in a neighborhood of an ordinary point of L the equation Lu(z) = 0 has two independent analytic solutions. Consider the problem

Lu(z) = 0   u(z0) = a   u′(z0) = b   (657)

The first step is to change this into a differential equation for a new function f(z) defined by

u(z) = f(z) e^{−(1/2)∫_{z0}^{z} p(z′)dz′}   (658)

where the integral does not depend on the path in a sufficiently small region about z0. Using this definition

u′(z) = ( f′(z) − (1/2)p(z)f(z) ) e^{−(1/2)∫_{z0}^{z} p(z′)dz′}   (659)

and

u″(z) = ( f″(z) − p(z)f′(z) − (1/2)p′(z)f(z) + (1/4)p(z)²f(z) ) e^{−(1/2)∫_{z0}^{z} p(z′)dz′}   (660)

Using these formulas in the differential equation gives

Lu(z) = ( f″(z) − p(z)f′(z) − (1/2)p′(z)f(z) + (1/4)p(z)²f(z) + p(z)f′(z) − (1/2)p(z)²f(z) + q(z)f(z) ) × e^{−(1/2)∫_{z0}^{z} p(z′)dz′} = 0   (661)


which is equivalent to

f″(z) + ( q(z) − (1/4)p(z)² − (1/2)p′(z) ) f(z) = 0   (662)

which I write as

f″(z) = −k(z)f(z)   (663)

with

k(z) = q(z) − (1/4)p(z)² − (1/2)p′(z)   (664)

To solve this integrate twice to get

f′(z) = f′(z0) − ∫_{z0}^{z} k(z′)f(z′)dz′   (665)

f(z) = f(z0) + f′(z0)(z − z0) − ∫_{z0}^{z} dz′ ∫_{z0}^{z′} k(z″)f(z″)dz″ = a + b(z − z0) − ∫_{z0}^{z} dz′ ∫_{z0}^{z′} k(z″)f(z″)dz″   (666)

I reduce the double integral to a one dimensional integral by integrating by parts as follows

a + b(z − z0) − ∫_{z0}^{z} ( (d/dz′) z′ ) [ ∫_{z0}^{z′} k(z″)f(z″)dz″ ] dz′ =

a + b(z − z0) + ∫_{z0}^{z} z′ k(z′)f(z′)dz′ − z ∫_{z0}^{z} k(z′)f(z′)dz′ =

a + b(z − z0) + ∫_{z0}^{z} (z′ − z) k(z′)f(z′)dz′   (667)

I solve this by successive approximations. I assume that

⟨z|f⟩ = Σ_{n=0}^∞ ⟨z|fn⟩   (668)

where

⟨z|f0⟩ = a + b(z − z0)   (669)

⟨z|fn⟩ = ∫_{z0}^{z} (z′ − z) k(z′) ⟨z′|f_{n−1}⟩ dz′   (670)

By induction I assume that

|⟨z|fn⟩| ≤ F0 K0ⁿ |z − z0|^{2n} / n!   (671)

where K0 is the maximum value of |k(z)| along the curve between z0 and z. Using this induction assumption in equation (670) for n − 1, with the path

z′ = z0 + (z − z0)t   (672)

I get

|⟨z|fn⟩| ≤ ∫_{z0}^{z} |z′ − z| K0 ( F0 K0^{n−1} |z′ − z0|^{2n−2} / (n − 1)! ) |dz′| = |z − z0|^{2n} ( K0ⁿ F0 / (n − 1)! ) ∫₀¹ (1 − t) t^{2n−2} dt   (673)

Next note

∫₀¹ (1 − t) t^{2n−2} dt = B(2, 2n − 1) = Γ(2)Γ(2n − 1)/Γ(2n + 1) = (2n − 2)!/(2n)! = 1/(2n(2n − 1)) ≤ 1/n   (674)

Using (674) in (673) gives

|⟨z|fn⟩| ≤ |z − z0|^{2n} K0ⁿ F0 (1/n!)   (675)

which verifies that if the (n − 1)-st term satisfies the induction assumption then the n-th term satisfies it as well.

Note that the starting function in the iteration is analytic and k(z) is analytic (for z0 an ordinary point). At each stage the second derivative of the n-th term is −k(z) times the (n − 1)-st term, which is analytic. Consider a disk centered at z = z0 where both f0(z) and k(z) are analytic. Then the integral is analytic in this region (see equation 16.1 of the text). Integrating this function again we obtain another function analytic in the same region.


It follows that the series (668) is a uniformly convergent sum of analytic functions in a region where k(z) is analytic.

Because of the uniform convergence I can change the order of sum and derivative to obtain

d²/dz² ⟨z|f⟩ = d²/dz² Σ_{n=0}^∞ ⟨z|fn⟩ = Σ_{n=1}^∞ d²/dz² ⟨z|fn⟩ = −Σ_{n=1}^∞ k(z)⟨z|f_{n−1}⟩ = −k(z) Σ_{n=0}^∞ ⟨z|fn⟩   (676)

which shows that this series satisfies the differential equation. The series solution also clearly satisfies the boundary conditions.

If there is another solution satisfying the same boundary conditions, the difference must be a solution of the equation that vanishes and has vanishing derivative at z = z0. It follows from the differential equation that all derivatives vanish. By Taylor's theorem the difference function is necessarily zero.

This shows that the series solution is the unique analytic solution of the differential equation satisfying the two boundary conditions at z0.
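The successive-approximation scheme is easy to carry out explicitly when k(z) is constant. The sketch below is my own example (k = 1, z0 = 0, not from the notes): each term ⟨z|fn⟩ is represented by its power-series coefficients, and with a = 0, b = 1 the partial sums reproduce the series for sin z.

```python
import math

# Sketch: Picard terms for f'' = -k f with constant k(z) = 1:
#   f_0 = a + b z,   f_n(z) = int_0^z (z' - z) f_{n-1}(z') dz'.
# The integral maps a coefficient c z^m to -c z^(m+2)/((m+1)(m+2)).
def picard_terms(a, b, n_terms):
    terms = [[a, b]]                          # f_0 = a + b*z
    for _ in range(n_terms - 1):
        prev = terms[-1]
        nxt = [0.0, 0.0] + [0.0] * len(prev)
        for m, coeff in enumerate(prev):
            nxt[m + 2] = -coeff / ((m + 1) * (m + 2))
        terms.append(nxt)
    return terms

# With a = 0, b = 1 the partial sums give the sine series.
terms = picard_terms(0.0, 1.0, 6)
z = 0.7
f = sum(sum(c * z**m for m, c in enumerate(t)) for t in terms)
print(abs(f - math.sin(z)) < 1e-9)   # True
```

Each term is a polynomial, mirroring the proof above that every iterate stays analytic in the disk where k(z) is analytic.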

Since the solution of the differential equation is analytic in a neighborhood of an ordinary point z0, it can be expressed as a convergent power series in z − z0:

u(z) = Σ_{n=0}^∞ cn (z − z0)ⁿ   (677)

The equation can be solved by recursion:

k(z) = Σ_n kn (z − z0)ⁿ   (678)

d²u/dz² = −k(z) u(z)   (679)

Σ_{n=0}^∞ cn n(n − 1)(z − z0)^{n−2} = −Σ_{l=0}^∞ Σ_{m=0}^∞ kl cm (z − z0)^{m+l}.   (680)

The first two terms on the left do not contribute, giving

Σ_{n=0}^∞ cn n(n − 1)(z − z0)^{n−2} = Σ_{n=2}^∞ cn n(n − 1)(z − z0)^{n−2} = Σ_{n=0}^∞ c_{n+2}(n + 2)(n + 1)(z − z0)ⁿ,   (681)

while the sum on the right is

Σ_{l=0}^∞ Σ_{m=0}^∞ kl cm (z − z0)^{m+l} = Σ_{n=0}^∞ (z − z0)ⁿ Σ_{m=0}^n k_{n−m} cm   (682)

Equating the coefficients of identical powers of (z − z0) gives the recursion relations

c_{n+2} = −Σ_{m=0}^n ( k_{n−m} / ((n + 2)(n + 1)) ) cm   (683)

which requires the boundary conditions c0 and c1 to start the recursion. As an example consider Hermite's equation

d²⟨z|u⟩/dz² − 2z d⟨z|u⟩/dz + 2λ⟨z|u⟩ = 0   (684)

Here I simply write the solution as a series

⟨z|u⟩ = Σ_{n=0}^∞ cn zⁿ   (685)

Inserting this series into the differential equation gives

Σ_{n=2}^∞ cn n(n − 1)z^{n−2} − Σ_{n=1}^∞ 2n cn zⁿ + 2λ Σ_{n=0}^∞ cn zⁿ =   (686)

Σ_{n=0}^∞ c_{n+2}(n + 2)(n + 1)zⁿ − Σ_{n=1}^∞ 2n cn zⁿ + 2λ Σ_{n=0}^∞ cn zⁿ = 0   (687)

This gives the recursion

c_{n+2} = ( 2(n − λ) / ((n + 2)(n + 1)) ) cn   (688)


Note that one of the series terminates when λ is a non-negative integer, giving a polynomial solution, but analytic solutions exist for non-integer λ.
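The recursion (688) is simple enough to run directly; the sketch below is my own illustration, not from the notes.

```python
# Sketch: series coefficients for Hermite's equation (684) from the
# recursion c_{n+2} = 2(n - lam)/((n+2)(n+1)) c_n, eq. (688).
def hermite_series_coeffs(lam, c0, c1, n_max):
    c = [0.0] * (n_max + 1)
    c[0], c[1] = c0, c1
    for n in range(n_max - 1):
        c[n + 2] = 2.0 * (n - lam) / ((n + 2) * (n + 1)) * c[n]
    return c

# For lam = 3 the odd chain terminates: z - (2/3) z^3, proportional to H_3.
c = hermite_series_coeffs(3, 0.0, 1.0, 10)
print(c[3] == -2/3, c[5] == 0.0)   # True True
```

The termination at n = λ is exactly the mechanism by which the Hermite polynomials emerge from the series solution.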

Next I consider series solutions about a regular singular point. In this case

A(z) = (z − z0) p(z)   B(z) = (z − z0)² q(z)   (689)

are analytic in a region that includes the point z0. These functions can be expanded about z0:

A(z) = Σ_{n=0}^∞ an (z − z0)ⁿ   B(z) = Σ_{n=0}^∞ bn (z − z0)ⁿ   (690)

where

an = (1/2πi) ∮ A(z)(z − z0)^{−n−1} dz   bn = (1/2πi) ∮ B(z)(z − z0)^{−n−1} dz   (691)

where the integral is over a circle about z0 in the region of analyticity. I try a solution of the differential equation of the form

⟨z|u⟩ = (z − z0)^r Σ_{n=0}^∞ cn (z − z0)ⁿ   (692)

The constant r will be chosen later. Inserting all three series in the differential equation, and multiplying the result by (z − z0)^{2−r}, gives

Σ_{n=0}^∞ (n + r)(n + r − 1) cn (z − z0)ⁿ + Σ_{k=0}^∞ Σ_{m=0}^∞ ak cm (m + r)(z − z0)^{k+m} + Σ_{k=0}^∞ Σ_{m=0}^∞ bk cm (z − z0)^{k+m} =

Σ_{n=0}^∞ [ (n + r)(n + r − 1) cn + Σ_{m=0}^n ( a_{n−m}(m + r) + b_{n−m} ) cm ] (z − z0)ⁿ = 0   (693)

Equating coefficients of (z − z0)ⁿ gives

(n + r)(n + r − 1) cn + Σ_{m=0}^n ( a_{n−m}(m + r) + b_{n−m} ) cm = 0   (694)

Next I separate all terms involving cn to get

cn [ (n + r)(n + r − 1) + a0(n + r) + b0 ] = −Σ_{m=0}^{n−1} ( a_{n−m}(m + r) + b_{n−m} ) cm   (695)

This recursion makes sense as long as n ≠ 0. We can now choose r so that the coefficient of c0 vanishes. This gives

λ0(r) := r(r − 1) + a0r + b0 = 0 (696)

This is called the indicial equation. Choosing r to be one of its two roots allows c0 to be an arbitrary starting value for the recursion. If the roots are distinct and do not differ by an integer they lead to independent solutions of the differential equation. I call the roots r+ and r−, distinguished by requiring Re(r+) ≥ Re(r−).

First I demonstrate that (692), with cn determined by (695) and r = r+, converges. Note that

(n + r)(n + r − 1) + a0(n + r) + b0 = λ0(n + r)   (697)

Because r+ is the root of this polynomial with the largest real part, λ0(n + r+) is necessarily non-zero for any n > 0 (otherwise there would be a root with larger real part). This gives

cn = −Σ_{m=0}^{n−1} ( a_{n−m}(m + r+) + b_{n−m} ) cm / λ0(n + r+)   (698)

Next I argue that the series generated this way is uniformly convergent in a sufficiently small disk about z = z0.

Start by observing that both A(z) and B(z) are analytic in a neighborhood of z0. In a disk D in the region of analyticity both functions are bounded because they are analytic on the closed disk:

|A(z)| < Λ1   (699)

|B(z)| < Λ2   z ∈ D   (700)

The integral representations (691) of the coefficients an and bn imply the bounds

|an| ≤ Λ1/Rⁿ   |bn| ≤ Λ2/Rⁿ   (701)

where R is the radius of the circle.


Using these bounds

|cn| = | Σ_{m=0}^{n−1} ( a_{n−m}(m + r+) + b_{n−m} ) cm / λ0(n + r+) | ≤ Σ_{m=0}^{n−1} ( (Λ1(m + |r+|) + Λ2) / (R^{n−m} |λ0(n + r+)|) ) |cm|   (702)

The denominator can be bounded by noting that

λ0(n + r+) = n² + n(2r+ − 1 + a0)   (703)

since λ0(r+) = 0. In addition, solving the indicial equation gives

2r± + a0 − 1 = ±√((1 − a0)² − 4b0)   (704)

where r+ is the root for which the real part satisfies

Re(2r+ + a0 − 1) ≥ 0   (705)

It follows that

|λ0(n + r+)| = |n² + n(2r+ − 1 + a0)| ≥ n²   (706)

which gives

|cn| ≤ Σ_{m=0}^{n−1} ( (Λ1(m + |r+|) + Λ2) / (n² R^{n−m}) ) |cm|   (707)

giving

Rⁿ|cn| ≤ (K/n) Σ_{m=0}^{n−1} R^m|cm|   (708)

where

K ≥ Λ1(1 + |r+|) + Λ2 ≥ (Λ1(m + |r+|) + Λ2)/n,   m < n   (709)

By induction assume that

R^m|cm| ≤ K^m F0   (710)

then it follows that

Rⁿ|cn| ≤ (F0 K/n) Σ_{m=0}^{n−1} K^m =   (711)

Rⁿ|cn| ≤ (F0 Kⁿ/n) ( 1 + 1/K + · · · + 1/K^{n−1} )   (712)

We can always choose K > 1, and since each of the n terms in parentheses is then at most 1 this gives

Rⁿ|cn| ≤ F0 Kⁿ   (713)

It follows that

⟨z|u⟩ = (z − z0)^r f(z)   (714)

|f(z)| ≤ Σ_{n=0}^∞ |cn (z − z0)ⁿ|   (715)

≤ Σ_{n=0}^∞ F0 |K(z − z0)/R|ⁿ   (716)

which converges whenever

|K(z − z0)/R| < 1   (717)

By starting with the root with the largest real part we ensured that λ0(n + r+) never vanishes. This also holds for the root with the smaller real part provided the two roots do not differ by an integer. In that case the second solution is determined by the same procedure as above, but starting with the second root of the indicial equation. The convergence proof is unchanged.
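As a concrete check of this construction, the sketch below (my own example, not from the notes) runs the recursion (695) for Bessel's equation z²u″ + zu′ + (z² − ν²)u = 0, for which a0 = 1, b0 = −ν², b2 = 1 and all other coefficients vanish, so (695) collapses to cn[(n + r)² − ν²] = −c_{n−2}.

```python
import math

# Sketch: Frobenius recursion (695) specialized to Bessel's equation,
# using the indicial root r = +nu (the larger real part).
def frobenius_coeffs(nu, n_max):
    r = nu
    c = [1.0, 0.0] + [0.0] * (n_max - 1)     # c_0 = 1, odd coefficients vanish
    for n in range(2, n_max + 1):
        c[n] = -c[n - 2] / ((n + r) ** 2 - nu**2)
    return r, c

# For nu = 1/2 the Frobenius solution is proportional to sin(z)/sqrt(z).
r, c = frobenius_coeffs(0.5, 20)
z = 1.3
u = z**r * sum(cn * z**n for n, cn in enumerate(c))
print(abs(u * math.sqrt(z) - math.sin(z)) < 1e-12)   # True
```

For ν = 1/2 the recursion denominator is (n + 1/2)² − 1/4 = n(n + 1), which reproduces the sine series term by term, a classic sanity check on the method.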

In the case that the two roots of the indicial equation differ by an integer, the series method only works for the root with the larger real part. The other solution can be constructed by writing the second solution in the form

u2(z) = h(z)u1(z) (718)

where previously we found

dh/dz = ( c / u1(z)² ) e^{−∫_{z0}^{z} p(z′)dz′}   (719)

Using the series expansion for p(z) and u+(z)

p(z) = (1/(z − z0)) Σ_{n=0}^∞ an (z − z0)ⁿ   (720)

u+(z) = (z − z0)^{r+} Σ_{n=0}^∞ cn (z − z0)ⁿ   (721)

dh/dz = c / ( (z − z0)^{r+} Σ_{n=0}^∞ cn (z − z0)ⁿ )² × e^{−a0 ln(z − z0) − Σ_{n=1}^∞ (an/n)(z − z0)ⁿ}   (722)

where we have absorbed an (infinite) constant into c; this does not change the z dependence.

The function dh/dz has the general form

dh/dz = ( c / (z − z0)^{2r+ + a0} ) F(z)   (723)

where F(z) is analytic near z = z0 provided that c0 ≠ 0. We note that since r− = r+ − N, where N is a positive integer, we have

2r+ + a0 = 2r− + a0 + 2N (724)

Recall the indicial equation has the form

r(r − 1) + a0 r + b0 = 0   (725)

Subtracting the equation for each root gives

(r+)² − (r−)² + (a0 − 1)(r+ − r−) = 0   (726)

r+ + r− + a0 − 1 = 0   (727)

0 = r+ + (r+ − N) + a0 − 1 = 2r+ + a0 − N − 1   (728)

Using this in the representation (723) gives

dh/dz = ( c / (z − z0)^{N+1} ) F(z)   (729)

It follows that if

F(z) = Σ_{n=0}^∞ fn (z − z0)ⁿ   (730)

then

h(z) = c fN ln(z − z0) + (z − z0)^{−N} Σ_{n=0, n≠N}^∞ ( c fn / (n − N) ) (z − z0)ⁿ   (731)

is a solution of the above equation. We finally get the form of the second solution

u−(z) = u+(z) [ d0 ln(z − z0) + (z − z0)^{−N} G(z) ]   (732)

where G(z) is analytic. This can be solved by the series method by expanding u+(z) and G(z).

0.26 Lecture 26

In this section I consider a class of differential equations with regular singular points. These are called Fuchsian equations. We limit our considerations to a class of Fuchsian equations with at most three singular points in the entire complex plane, including a possible singular point at infinity. This type of equation was studied by Riemann and includes a large class of equations that are of great importance in physics.

It will be useful to write these equations in the following form

d²u/dz² + ( (1 − α − α′)/(z − z1) + (1 − β − β′)/(z − z2) + (1 − γ − γ′)/(z − z3) ) du/dz
+ [ (z1 − z2)(z1 − z3)αα′/(z − z1) + (z2 − z1)(z2 − z3)ββ′/(z − z2) + (z3 − z2)(z3 − z1)γγ′/(z − z3) ] × u(z)/((z − z1)(z − z2)(z − z3)) = 0   (733)

where

α + α′ + β + β′ + γ + γ′ = 1   (734)

The equation is put in this form because the indicial equations at each singular point are

r(r − 1) + (1 − α− α′)r + αα′ = 0 (735)

r(r − 1) + (1 − β − β ′)r + ββ ′ = 0 (736)

r(r − 1) + (1 − γ − γ ′)r + γγ′ = 0 (737)


which have roots α, α′, β, β′, and γ, γ′. Solutions of this equation are represented by the symbol

P ⎧ z1   z2   z3      ⎫
  ⎨ α    β    γ    z  ⎬   (738)
  ⎩ α′   β′   γ′      ⎭

which I also write in one line as P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z }.

The function will have a different form when z is in a neighborhood of each of the three singular points. This symbol has nine parameters.

If the symbol is multiplied by

(z − z1)^r (z − z2)^s (z − z3)^t,   with r + s + t = 0,   (739)

the result is

(z − z1)^r (z − z2)^s (z − z3)^t P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z } =

( (z − z1)/(z − z3) )^r ( (z − z2)/(z − z3) )^s P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z } =

( (z − z2)/(z − z1) )^s ( (z − z3)/(z − z1) )^t P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z } =

( (z − z1)/(z − z2) )^r ( (z − z3)/(z − z2) )^t P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z } =

P{ z1, z2, z3; α + r, β + s, γ + t; α′ + r, β′ + s, γ′ + t; z }   (740)

The proof of this result will be a homework exercise. It shows that given one solution of the differential equation corresponding to one set of the nine parameters (eight independent), we can relate it to a special class of solutions having only six independent parameters. This is because we are free to choose the parameters r and s.

If we use the homographic transformations

z′ = (Az + B)/(Cz + D)   z′i = (Azi + B)/(Czi + D)   (741)


it is possible to show

P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z } = P{ z′1, z′2, z′3; α, β, γ; α′, β′, γ′; z′ }   (742)

This will also be verified in the homework. The result means that the six parameters can be reduced to three by utilizing the freedom to use homographic transformations to move the positions of the singularities.

Using this freedom we choose

z1 = 0, z2 = ∞, z3 = 1   (743)

r = −α, s = α + γ, t = −γ   (744)

which means that the general Riemann P-symbol can be expressed in terms of the special P-symbol

P{ 0, ∞, 1; 0, α + β + γ, 0; α′ − α, β′ + α + γ, γ′ − γ; z }   (745)

We define

1 − c := α′ − α   a := α + β + γ   b := β′ + α + γ   c − a − b := γ′ − γ   (746)

P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z } =   (747)

( (z − z1)/(z − z2) )^α ( (z − z3)/(z − z2) )^γ P{ 0, ∞, 1; 0, a, 0; 1 − c, b, c − a − b; (Az + B)/(Cz + D) }   (748)

where

0 = (Az1 + B)/(Cz1 + D)   (749)

∞ = (Az2 + B)/(Cz2 + D)   (750)

1 = (Az3 + B)/(Cz3 + D)   (751)


can be solved to give

B/A = −z1   D/C = −z2   A/D = (z2 − z3)/(z2(z3 − z1))   (752)

z′ = ( (z − z1)/(z − z2) ) ( (z3 − z2)/(z3 − z1) ).   (753)

The function

P{ 0, ∞, 1; 0, a, 0; 1 − c, b, c − a − b; z }   (754)

is a solution to the hypergeometric equation

d²u/dz² + ( c/z + (1 − c + a + b)/(z − 1) ) du/dz + ( ab/(z(z − 1)) ) u(z) = 0   (755)

It is useful to multiply this equation by z(z − 1) to get

z(z − 1) d²u/dz² + ( z(1 + a + b) − c ) du/dz + ab u = 0.   (756)

The specific series solution about the origin, corresponding to the root α = 0 of the indicial equation, with value 1 at the origin, is called the hypergeometric function, denoted F(a, b, c; z). Because α = 0 it is analytic about the origin and thus can be expressed in the form

F(a, b, c; z) = Σ_{n=0}^∞ cn zⁿ   c0 = 1   (757)

Inserting this expansion in the differential equation above gives

Σ_{n=0}^∞ n(n − 1)(zⁿ − z^{n−1}) cn + Σ_{n=0}^∞ n( zⁿ(1 + a + b) − c z^{n−1} ) cn + Σ_{n=0}^∞ ab zⁿ cn = 0   (758)


Shifting to get common powers of z gives

Σ_{n=0}^∞ ( n(n − 1)cn − (n + 1)n c_{n+1} + n(1 + a + b)cn − c(n + 1)c_{n+1} + ab cn ) zⁿ = 0   (759)

which leads to the following relation between c_{n+1} and cn:

c_{n+1} (n + 1)(n + c) = ( n² + n(a + b) + ab ) cn   (760)

which gives the simple relationship

c_{n+1} = ( (n + a)(n + b) / ((n + 1)(n + c)) ) cn   (761)

This can be iterated to get

cn = ( a(a + 1)···(a + n − 1) b(b + 1)···(b + n − 1) / ( n! c(c + 1)···(c + n − 1) ) ) c0   (762)

a(a + 1)···(a + n − 1) = Γ(n + a)/Γ(a)   (763)

b(b + 1)···(b + n − 1) = Γ(n + b)/Γ(b)   (764)

c(c + 1)···(c + n − 1) = Γ(n + c)/Γ(c)   (765)

which gives the following expression for the hypergeometric function

F(a, b, c; z) = ( Γ(c)/(Γ(a)Γ(b)) ) Σ_{n=0}^∞ ( Γ(n + a)Γ(n + b)/(n! Γ(n + c)) ) zⁿ   (766)

This series converges for |z| < 1, since the next singular point is at 1. This function has a number of useful properties; they are derived in the text.
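The series (766) can be summed directly from the term ratio (761); the sketch below is my own illustration, checked against the special case F(1, 1, 2; −z) = ln(1 + z)/z quoted later in this section.

```python
import math

# Sketch: sum the hypergeometric series using the term ratio
# c_{n+1}/c_n = (n+a)(n+b)/((n+1)(n+c)), eq. (761); valid for |z| < 1.
def hyp2f1_series(a, b, c, z, n_terms=200):
    total, term = 0.0, 1.0          # c_0 = 1
    for n in range(n_terms):
        total += term * z**n
        term *= (n + a) * (n + b) / ((n + 1) * (n + c))
    return total

z = 0.5
print(abs(hyp2f1_series(1, 1, 2, -z) - math.log(1 + z) / z) < 1e-12)   # True
```

Using the ratio avoids evaluating the Gamma functions in (766) and is numerically stable term by term inside the unit disk.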

1. z^{1−c} F(b − c + 1, a − c + 1, 2 − c; z) satisfies the same differential equation as F(a, b, c; z) in a neighborhood of z = 0. It gives the second independent solution to the hypergeometric equation near z = 0.

Note that we can write equation (748), replacing the P-symbol on the right by the hypergeometric function of the appropriate parameters, to get

P{ z1, z2, z3; α, β, γ; α′, β′, γ′; z } =   (767)

( (z − z1)/(z − z2) )^α ( (z − z3)/(z − z2) )^γ × F( α + β + γ, α + β′ + γ, 1 + α − α′; ( (z − z1)(z3 − z2) )/( (z − z2)(z3 − z1) ) )   (768)

Note that the differential equation is invariant with respect to the 3! permutations of the columns of the Riemann P symbol. On the other hand, the right side of the above equation is not symmetric with respect to permutations of the columns.

The equation is also symmetric with respect to α ↔ α′, β ↔ β′, γ ↔ γ′, while the right side of (768) is only symmetric with respect to β ↔ β′.

Using these properties it is possible to relate both solutions in a neighborhood of each singular point to hypergeometric functions. The following solutions can be derived using the above symmetries, or simply checked by substituting them into the differential equation, using properties of the hypergeometric functions.

The two solutions near the singular point z = 1 have the form

F(a, b, a + b + 1 − c; 1 − z)   (769)

(1 − z)^{c−a−b} F(c − b, c − a, 1 + c − a − b; 1 − z)   (770)

while the two solutions near the singular point at infinity have the form

z^{−a} F(a, a − c + 1, a − b + 1; 1/z)   (771)

z^{−b} F(b, b − c + 1, b − a + 1; 1/z)   (772)


This shows that all of the solutions of the equation of Riemann are related to the hypergeometric function.

2. The series solution to the hypergeometric equation has a limited radius of convergence. The domain of analyticity can be extended using integral representations. Some of these representations are

F(a, b, c; z) = ( Γ(c)/(Γ(b)Γ(c − b)) ) ∫₁^∞ (t − z)^{−a} t^{a−c} (t − 1)^{c−b−1} dt   (773)

F(a, b, c; z) = ( Γ(c)/(Γ(c − b)Γ(b)) ) ∫₀¹ (1 − tz)^{−a} t^{b−1} (1 − t)^{c−b−1} dt   (774)

These can be checked by direct substitution into the differential equation. There are many other representations that are analytic in different regions.

Some special cases of Hypergeometric functions are

F(−a, b, b; −z) = (1 + z)^a   (775)

F(1, 1, 2; −z) = (1/z) ln(1 + z)   (776)

F(−λ, λ + α + β + 1, α + 1; (1 − z)/2) = ( Γ(λ + 1)Γ(α + 1)/Γ(λ + α + 1) ) P^{(α,β)}_λ(z)   (777)

F(λ + 1, λ + α + 1, 2λ + α + β + 2; 2/(1 − z)) = ( Γ(2λ + α + β + 2)(z − 1)^{λ+α+1}(z + 1)^β / ( Γ(λ + α + 1)Γ(λ + β + 1) 2^{λ+α+β} ) ) Q^{(α,β)}_λ(z)   (778)

which are the Jacobi functions of the first and second kind. There are a number of special cases:

C^{α+1/2}_λ(z) = ( Γ(2α + λ + 1)Γ(α + 1) / ( Γ(2α + 1)Γ(λ + α + 1) ) ) P^{(α,α)}_λ(z)   (779)

Qλ(z) = Q^{(0,0)}_λ(z) = ( 2^λ Γ(1 + λ)² / ( Γ(2λ + 2)(z − 1)^{1+λ} ) ) F( λ + 1, λ + 1, 2λ + 2; 2/(1 − z) )   (780)

The Riemann equation

d²f/dz² + ( c/z + (1 − a − b)/(z − z2) + (1 − c − a + b)/(z − z3) ) df/dz + ( ab z2(z2 − z3) / ( z(z − z2)²(z − z3) ) ) f = 0   (781)


has regular singular points at z = 0, z2, and z3. Set

z2 = b = 2α   z3 = α   (782)

and let α → ∞ keeping a and c fixed. The resulting equation is

z d²f/dz² + (c − z) df/dz − a f = 0   (783)

In this limit the distinct singular points at z2 and z3 are both transformed into singular points at infinity, and the point z = ∞ is no longer a regular singular point; it is an irregular singular point.

This equation has a solution that is analytic near zero and equal to 1 at the origin. It can be constructed using the series method:

Φ(a, c; z) = Σ_{n=0}^∞ fn zⁿ   (784)

Using this in the differential equation gives

Σ_{n=0}^∞ n(n − 1) fn z^{n−1} + Σ_{n=0}^∞ c n fn z^{n−1} − Σ_{n=0}^∞ n fn zⁿ − Σ_{n=0}^∞ a fn zⁿ = 0   (785)

Finding common powers of z gives

Σ_{n=0}^∞ ( (n + 1)n f_{n+1} + c(n + 1)f_{n+1} − n fn − a fn ) zⁿ = 0   (786)

which leads to the recursion

f_{n+1} = ( (a + n) / ((n + 1)(n + c)) ) fn   (787)

If we normalize this to 1 at z = 0 the series gives

Φ(a, c; z) = ( Γ(c)/Γ(a) ) Σ_{n=0}^∞ ( Γ(a + n) / ( Γ(c + n)Γ(n + 1) ) ) zⁿ   (788)

This is called the confluent hypergeometric function. The series is valid when c is not zero or a negative integer. When the series makes sense it is analytic in the entire complex plane.
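The recursion (787) sums the series directly; the sketch below is my own check against the elementary case Φ(a, a; z) = e^z, which follows from (787) since the terms collapse to 1/n!.

```python
import math

# Sketch: sum the confluent hypergeometric series via the recursion
# f_{n+1} = (a+n)/((n+1)(n+c)) f_n, eq. (787); c must not be zero
# or a negative integer.
def kummer_phi(a, c, z, n_terms=80):
    total, term = 0.0, 1.0
    for n in range(n_terms):
        total += term * z**n
        term *= (a + n) / ((n + 1) * (n + c))
    return total

print(abs(kummer_phi(2.5, 2.5, 1.2) - math.exp(1.2)) < 1e-12)   # True
```

Because the series is entire, the same loop works for any z, with more terms needed as |z| grows.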


Comparing the series for the Hypergeometric function shows that

Φ(a, c; z) = lim_{b→∞} F(a, b, c; z/b)   (789)

When c is not an integer a second linearly independent solution of equation (783) is

z^{1−c} Φ(a − c + 1, 2 − c, z)   (790)

It is conventional to define

Ψ(a, c; z) := ( Γ(1 − c)/Γ(a − c + 1) ) Φ(a, c, z) + ( Γ(c − 1)/Γ(a) ) z^{1−c} Φ(a − c + 1, 2 − c, z)   (791)

There are a number of special functions that are special cases of the confluent hypergeometric function.

Some of the more familiar ones are

Hn(z√2) = 2ⁿ Φ(−n/2, 1/2, z²/2)   (792)

L^μ_n(z) = ( Γ(n + μ + 1) / ( Γ(n + 1)Γ(μ + 1) ) ) Φ(−n, μ + 1, z)   (793)

Erf(z) = z Φ(1/2, 3/2, −z²)   (794)

Jν(z) = ( 1/Γ(ν + 1) ) (z/2)^ν e^{−iz} Φ(ν + 1/2, 2ν + 1, 2iz)   (795)

Other related functions are

• Kelvin functions

• Coulomb Wave functions

• Incomplete Gamma Function

• Weber functions

• Error integral

• Airy functions

• Si and Ci functions

• Bateman functions

• Cunningham functions


0.27 Lecture 27

In this section I give a brief introduction to group theory, representation theory, and tensors.

Definition A group is a set G with a multiplication operator · with the following properties

1. a, b ∈ G⇒ a · b ∈ G

2. ∃ e ∈ G such that ∀a ∈ G, e · a = a · e = a. The unique element e is called the identity element of the group.

3. ∀ a ∈ G ∃ b satisfying a · b = b · a = e. The unique element b is called the inverse of a.

4. ∀ a, b, c ∈ G (a · b) · c = a · (b · c).

There are many examples of groups.

Example 1: The set of permutations of 4 objects, with the group product being composition of permutations.

p1 := ( 1 2 3 4
        4 3 1 2 )

p2 := ( 1 2 3 4
        4 2 1 3 )

p1 · p2 = ( 1 2 3 4
            2 3 4 1 )
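The composition above is easy to reproduce in code; the sketch below is my own illustration, storing each permutation as the bottom row of its two-row symbol.

```python
# Sketch: the permutations p1, p2 above as tuples (bottom rows), with the
# convention (p1 . p2)(i) = p1(p2(i)); entries are 1..4, stored 0-based.
p1 = (4, 3, 1, 2)
p2 = (4, 2, 1, 3)

def compose(p, q):
    """Return p . q: apply q first, then p."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

print(compose(p1, p2))   # (2, 3, 4, 1)
```

The output matches the bottom row of p1 · p2 computed above.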

Example 2: The real numbers with the group product being addition.

Example 3: Invertible N×N matrices with the group product being matrix multiplication.

Definition A subgroup H of a group G consists of a subset of the elements of G that

1. Is closed under group multiplication

2. a ∈ H → a−1 ∈ H.

Example 4: The group of integers under addition is a subgroup of the real numbers under addition.

Example 5: The group of N×N matrices with determinant 1 is a subgroup of the invertible matrices.


Definition A group G is Abelian if a · b = b · a for every a, b ∈ G. A group that is not Abelian is called non-Abelian.

Definition Two groups G1 and G2 are isomorphic if there is a 1 to 1 correspondence between the elements of the two groups that preserves the group product:

g1 ↔ g2, h1 ↔ h2 ⇒ g1 ·1 h1 ↔ g2 ·2 h2

Definition A group G is cyclic if all elements of G have the form g^m for some fixed g ∈ G (by definition g⁰ = e).

Definition Let H be a subgroup of G. The elements of G of the form hg, where g is fixed in G and h is any element of H, form a right coset of H in G. The elements of G of the form gh, where g is fixed in G and h is any element of H, form a left coset of H in G.

Theorem: The right (left) cosets divide the elements of G into disjoint equivalence classes.

Definition A subgroup H of G is normal if the left and right cosets are identical.

Theorem: The cosets of a normal subgroup H of G form a group. This group is called the quotient group, written G/H.

Recall that linear operators on vector spaces can be represented by matrices.

Definition A linear representation of a group G on a vector space V is a mapping from the group G to the linear transformations on V satisfying

g → D(g)

with the property

D(g1 · g2) = D(g1)D(g2)

for all gi ∈ G. In this expression D(g1)D(g2) is matrix multiplication.

Definition Two representations D1(g) and D2(g) of a group G are equivalent if they are related by a similarity transformation S

D1(g) = SD2(g)S−1 (796)

Definition A representation D(g) is reducible if it is equivalent to a block diagonal representation

SD(g)S⁻¹ = ( D1(g)  0      · · ·
             0      D2(g)  · · ·
             ⋮      ⋮      ⋱   )   (797)


Definition A representation D(g) is irreducible if it is not reducible.

Theorem: (Schur's Lemma) Any matrix that commutes with all elements of an irreducible representation is necessarily a multiple of the identity.

Proof: Let Ax = λx. Then x′ = D(g)x is also an eigenvector of A with eigenvalue λ for any g ∈ G. Thus the subspace spanned by the eigenvectors of A with eigenvalue λ is an invariant subspace of D(G). If D(G) is irreducible the only invariant subspace is the whole space, which means that A = λI.

A Lie group is a group where the group elements are analytic functions of some parameters and where the group multiplication law depends analytically on the parameters. Lie groups are some of the most important groups in physics.

Examples of Lie groups are

Example 1: U(1), the group of unitary matrices in 1 dimension.

Example 2: U(N), the group of unitary matrices in N dimensions.

Example 3: SU(N), the group of unitary matrices in N dimensions with determinant 1.

Example 4: SO(N), the group of real orthogonal matrices in N dimensions with determinant 1.

Example 5: SO(M,N), the group of real matrices in M + N dimensions with determinant 1 that preserve a diagonal metric of the form

η = ( 1  0  · · ·  0   0
      0  1  · · ·  0   0
      ⋮  ⋮  ⋱     ⋮   ⋮
      0  0  · · · −1   0
      0  0  · · ·  0  −1 )   (798)

with M factors of 1 and N factors of −1.

Tensors are associated with group representations. Consider a set of linear transformations on an N dimensional vector space that preserves the following quadratic form

Q(v, w) := Σ_{i,j=1}^N v*i Mij wj   (799)

where wj = ⟨j|w⟩, v*i = ⟨v|i⟩, and

Mij := 〈i|M |j〉 (800)


where Mij is understood to be a fixed invertible matrix. In this representation linear transformations Gij that satisfy

Mij = G†imMmnGnj (801)

leave Q(v, w) invariant.

It is a simple exercise to show that this set of transformations is a group under matrix multiplication. This is like a generalization of the unitarity condition, where the identity is replaced by an arbitrary but fixed matrix.

In terms of matrix multiplication

Q(v, w) = v†Mw

If I define

uᵗ := v†M

then the invariant quadratic form becomes

Q(v, w) = uᵗw.

This is invariant under the combined transformations

w → w′ = Gw   (802)

u → u′ = ( Mᵗ G* (Mᵗ)⁻¹ ) u = G^D u   (803)

There are quantities in nature that transform like products of vectors under the action of a group of transformations. A rank (m, n) tensor is a multi-index quantity that transforms like

T_{a1···am;b1···bn} → T′_{a1···am;b1···bn} = G_{a1a′1} · · · G_{ama′m} G^D_{b1b′1} · · · G^D_{bnb′n} T_{a′1···a′m;b′1···b′n}   (804)

We see that there are two kinds of indices, sometimes called covariant and contravariant indices.

The transformation law treats the covariant and contravariant indices separately: covariant indices transform among themselves, and likewise for contravariant indices. This means that each set of indices (covariant or contravariant) can be chosen to have a specific symmetry with respect to permutations, which is preserved under the group transformation law.

If a covariant and a contravariant index are summed, the resulting quantity is a tensor of rank (m − 1, n − 1). A tensor of rank 0 is a scalar with respect to the group transformation law.


A useful property of tensors is that if T1 and T2 are tensors of the same rank, and T1 = T2 for one value of g ∈ G, then it holds for all values.

For example, in special relativity, if we demand that Newton's second law holds in the rest frame of a particle, and if the right and left sides of the equation are assumed to be rank (1,0) tensors with respect to the Lorentz group, then the transformed equation gives the generalization of these equations in any frame.

Examples of tensors are:


29:172 Assignment 1 - Due Wed. Jan. 24

1.) Consider the space of continuous complex-valued functions on the interval [a, b] with inner product

∫_a^b f*(x) g(x) dx

a. Show that the sum of two Cauchy sequences is a Cauchy sequence.

b. Show that if two different Cauchy sequences converge to the same function, then the difference of these sequences converges to zero.

2.) Consider the functions {f_n(x)} defined by

f_n(x) =
  0            : x < −1 − 1/n
  n + 1 + nx   : −1 − 1/n < x < −1
  1            : −1 ≤ x ≤ 1
  n + 1 − nx   : 1 < x < 1 + 1/n
  0            : x ≥ 1 + 1/n

Show that this sequence is a Cauchy sequence. Show that it converges to the discontinuous block function

b(x) =
  0 : |x| > 1
  1 : |x| ≤ 1

in the L2(R) norm.
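The convergence can be spot-checked numerically (a sketch; the integration window, grid size, and sample values of n are arbitrary choices). A direct calculation gives ‖f_n − b‖² = 2/(3n) exactly, so the distance should shrink like 1/√n:

```python
import math

def f_n(n, x):
    # the trapezoid-shaped approximants from problem 2
    if x < -1 - 1 / n:
        return 0.0
    if x < -1:
        return n + 1 + n * x
    if x <= 1:
        return 1.0
    if x < 1 + 1 / n:
        return n + 1 - n * x
    return 0.0

def b(x):
    return 1.0 if abs(x) <= 1 else 0.0

def l2_dist(n, N=20000, L=3.0):
    # crude Riemann sum for ||f_n - b|| in L2(R); the support lies in [-L, L]
    h = 2 * L / N
    return math.sqrt(h * sum((f_n(n, -L + i * h) - b(-L + i * h)) ** 2
                             for i in range(N)))

# the exact distance is sqrt(2/(3n))
print([round(l2_dist(n), 3) for n in (1, 4, 16, 64)])
```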

3.) Estimate the Lebesgue integral

∫_0^1 x^2 dx

by dividing the range of this function into 10 equally spaced intervals between 0 and 1. Find upper and lower bounds for the integral by using the largest or smallest value of the function on each interval. Compare this to the exact value.
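The range-partition picture can be sketched numerically. For f(x) = x² on [0, 1], the preimage of the range slice [y_i, y_{i+1}) is the interval [√y_i, √y_{i+1}), whose Lebesgue measure is available in closed form, so the lower and upper sums are easy to tabulate (the 10-slice partition follows the problem statement):

```python
import math

N = 10
ys = [i / N for i in range(N + 1)]   # partition of the range of f(x) = x^2, i.e. [0, 1]
# preimage of the slice [y_i, y_{i+1}) under f on [0, 1] is [sqrt(y_i), sqrt(y_{i+1}))
measures = [math.sqrt(ys[i + 1]) - math.sqrt(ys[i]) for i in range(N)]
lower = sum(ys[i] * measures[i] for i in range(N))       # smallest value on each slice
upper = sum(ys[i + 1] * measures[i] for i in range(N))   # largest value on each slice
print(lower, 1 / 3, upper)   # lower < 1/3 < upper
```

Since the slice measures sum to 1 and the slices have height 1/10, the bounds differ by exactly 1/10.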

4.) Show that the set of rational numbers (numbers that can be expressed as ratios of integers m/n) between 0 and 1 is a set of Lebesgue measure zero.


5.) Consider the set of rational numbers between zero and one. Show that it is possible to place each rational in the interior of an open interval (i.e. a < m/n < b) in such a way that if we discard all of the open intervals containing these rationals, then what remains in the interval [0, 1] has a measure as close to 1 as desired. [This exercise was a critical element in Kolmogorov, Arnold, and Moser's solution to the classical three-body problem, which asks whether the solar system is stable.]

6.) Let S1 and S2 be subsets of a larger set S. For a subset S' of S, let S'^c be the complement of S' in S, i.e. the set of points in S that are not in S'.

a. Show that S1 ∩ S2 = (S1^c ∪ S2^c)^c.

b. How does this show that the intersection of two Lebesgue measurable sets is measurable?

c. Show that every closed interval [a, b] of the real line is Lebesgue measurable. Find the Lebesgue measure of this set.


29:172 Assignment 2 - Due Wednesday, Jan. 31

1.) Consider the function f(x) = (1 − x^2) on the interval [−1, 1]. Calculate a 4-th degree Weierstrass polynomial approximation to this function. Compare the exact and approximate functions.
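One concrete choice of Weierstrass approximation (an assumption; the problem does not fix the construction) is the degree-4 Bernstein polynomial, after mapping [−1, 1] to [0, 1] via x = 2t − 1:

```python
from math import comb

def bernstein(f, n, t):
    # degree-n Bernstein polynomial of f on [0, 1], evaluated at t
    return sum(f(k / n) * comb(n, k) * t**k * (1 - t) ** (n - k)
               for k in range(n + 1))

# f(x) = 1 - x^2 on [-1, 1], mapped to [0, 1] via x = 2t - 1
g = lambda t: 1.0 - (2.0 * t - 1.0) ** 2
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    t = (x + 1.0) / 2.0
    print(x, 1.0 - x * x, bernstein(g, 4, t))
```

Bernstein polynomials interpolate the endpoints exactly; in the interior the degree-4 approximation of this quadratic undershoots by t(1 − t).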

2.) Use the Rodrigues formula to calculate the first four Hermite polynomials.
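The Rodrigues formula H_n(x) = (−1)^n e^{x²} (d/dx)^n e^{−x²} can be checked mechanically: differentiating p(x)e^{−x²} gives (p′ − 2xp)e^{−x²}, so the polynomial part obeys q_{k+1} = q_k′ − 2x q_k. A sketch using coefficient lists (low-to-high order):

```python
def hermite(n):
    # H_n = (-1)^n q_n where q_0 = 1 and q_{k+1} = q_k' - 2 x q_k
    # (q_k is the polynomial part of the k-th derivative of e^{-x^2});
    # coefficients are stored low-to-high
    q = [1]
    for _ in range(n):
        dq = [i * c for i, c in enumerate(q)][1:] or [0]   # q'
        sh = [0] + [-2 * c for c in q]                     # -2 x q
        q = [(dq[i] if i < len(dq) else 0) + (sh[i] if i < len(sh) else 0)
             for i in range(max(len(dq), len(sh)))]
    return [(-1) ** n * c for c in q]

for n in range(4):
    print(n, hermite(n))   # [1], [0, 2], [-2, 0, 4], [0, -12, 0, 8]
```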

3.) Use the Gram-Schmidt method to calculate the first three orthogonal polynomials on [−1, 1] with weight w(x) = 1. Compare these to the first three Legendre polynomials.
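Problem 3 can be checked with exact rational arithmetic; a Gram-Schmidt sketch over the monomials 1, x, x², using ∫_{−1}^1 x^k dx = 2/(k + 1) for even k and 0 for odd k:

```python
from fractions import Fraction

def ip(p, q):
    # exact inner product over [-1, 1] for coefficient lists (low-to-high)
    s = Fraction(0)
    for i, ai in enumerate(p):
        for j, bj in enumerate(q):
            if (i + j) % 2 == 0:             # odd powers integrate to zero
                s += Fraction(2, i + j + 1) * ai * bj
    return s

basis = []
for n in range(3):
    v = [Fraction(0)] * n + [Fraction(1)]    # start from the monomial x^n
    for u in basis:
        c = ip(v, u) / ip(u, u)              # project out the earlier vectors
        v = [(v[i] if i < len(v) else Fraction(0)) -
             c * (u[i] if i < len(u) else Fraction(0))
             for i in range(max(len(v), len(u)))]
    basis.append(v)
print(basis)   # 1, x, x^2 - 1/3 (proportional to P0, P1, P2)
```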

4. Assume that |f(x) − p_n(x)| < ε for all x ∈ [a, b], where p_n(x) is a polynomial. Let M be a 2 × 2 Hermitian matrix with real eigenvalues λ satisfying a < λ < b. Show that

|||f(M) − p_n(M)||| < ε

where |||O||| is the matrix or operator norm of O.


29:172 Assignment 3 - Due Wednesday, Feb. 7

1.) Use the Rodrigues formula for the Laguerre polynomials to derive the constants k_n, k'_n on page 212 of the text.

2.) Let

g(x, t) = 1/√(1 − 2xt + t^2).   (805)

Show that

g(x, t) = Σ_{n=0}^∞ t^n P_n(x)   (806)

where P_n(x) is the n-th degree Legendre polynomial. The function g(x, t) is called a generating function.
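Equation (806) can be spot-checked numerically, generating P_n(x) from the Bonnet recursion (the sample point (x, t) = (0.3, 0.2) and the truncation order are arbitrary choices):

```python
import math

def legendre(n, x):
    # Bonnet recursion: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

x, t = 0.3, 0.2                   # arbitrary sample point with |t| < 1
partial = sum(t**n * legendre(n, x) for n in range(30))
exact = 1.0 / math.sqrt(1 - 2 * x * t + t * t)
print(partial, exact)
```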

3.) The Schrodinger equation for a harmonic oscillator has the form

(−d^2/dx^2 + x^2) F_n(x) = (2n + 1) F_n(x).   (807)

Assume that

F_n(x) = C_n(x) e^{−x^2/2}.   (808)

Show that C_n satisfies the same differential equation as one of the classical orthogonal polynomials.

4.) Find the Gauss-Legendre points and weights for a two point Gauss-Legendre quadrature on [−1, 1]. Show that

∫_{−1}^{1} x^n dx = Σ_{i=1}^{2} x_i^n w_i   (809)

is exact for n = 0, 1, 2, 3. How good is the integral for x^6?
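For two-point Gauss-Legendre quadrature the standard nodes are ±1/√3 with weights 1; a sketch verifying exactness through n = 3 and the error at n = 6:

```python
import math

nodes = [-1 / math.sqrt(3), 1 / math.sqrt(3)]   # two-point Gauss-Legendre nodes
weights = [1.0, 1.0]

def quad(n):
    # quadrature estimate of the integral of x^n over [-1, 1]
    return sum(w * x**n for x, w in zip(nodes, weights))

def exact(n):
    return (1 - (-1) ** (n + 1)) / (n + 1)      # exact value of the integral

for n in range(4):
    print(n, quad(n), exact(n))   # agree exactly
print(6, quad(6), exact(6))       # 2/27 ≈ 0.074 vs 2/7 ≈ 0.286
```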

5. Use the recursion relations to generate the first four Laguerre polynomials with ν = 0.

6. Show that the Legendre polynomials satisfy the following recursion relation involving first derivatives:

(n + 1) P_{n+1}(x) = (n + 1) x P_n(x) − (1 − x^2) dP_n/dx.   (810)


29:172 Assignment 4 - Due Wednesday, Feb. 14

1.) Fourier Series: In class I developed the Fourier series for periodic functions on the interval [−π, π]. I constructed the orthonormal basis functions

〈θ|n〉 = (1/√(2π)) e^{inθ}

a. Find the corresponding orthonormal basis functions for periodic functions on the interval [0, L].

b. If f(θ) is a real-valued periodic function, how can you recognize that it is real by looking at the expansion coefficients in

f(θ) = Σ_n f_n 〈θ|n〉

2.) Calculate ∫_0^∞ (sin(ax)/x) dx for a > 0.

3.) Calculate the Fourier transform of e^{−ax^2}.
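Problem 3 can be checked numerically with the convention f̃(k) = (1/√(2π)) ∫ e^{ikx} f(x) dx; the expected closed form e^{−k²/(4a)}/√(2a) is a standard result, stated here as a check rather than taken from the text. The integration window and grid are arbitrary choices:

```python
import math

def ft_gauss(k, a, L=10.0, N=4000):
    # Riemann-sum estimate of (1/sqrt(2*pi)) ∫ e^{ikx} e^{-a x^2} dx on [-L, L];
    # the imaginary part vanishes by symmetry, so only cos is summed
    h = 2 * L / N
    s = sum(math.cos(k * (-L + j * h)) * math.exp(-a * (-L + j * h) ** 2)
            for j in range(N + 1))
    return h * s / math.sqrt(2 * math.pi)

a, k = 1.0, 1.5
analytic = math.exp(-k * k / (4 * a)) / math.sqrt(2 * a)
print(ft_gauss(k, a), analytic)
```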

4.) Let f(x) and g(x) have Fourier transforms

f(k) = (1/√(2π)) ∫ e^{iky} f(y) dy,   g(k) = (1/√(2π)) ∫ e^{iky} g(y) dy.

Find an expression for the Fourier transform of the product f(x)g(x) in terms of their individual Fourier transforms, f(k) and g(k).

5.) Show

lim_{λ→∞} ∫_{−∞}^{∞} e^{iλx} f(x) dx = 0

if f is absolutely integrable and differentiable for every x.

6.) Let f(θ) be 1 for 0 < θ < π and −1 for −π < θ ≤ 0. Find the coefficients f_n in the Fourier series

f(θ) = Σ_n f_n 〈θ|n〉


29:172 Assignment 5 - Due Wednesday, Feb. 21

1.) Delta functions. Show

δ(x − x0) = lim_{ε→0+} (1/π) ε/((x − x0)^2 + ε^2)

δ(x − x0) = lim_{λ→∞} (1/π) (1 − cos(λ(x − x0)))/(λ(x − x0)^2)

δ(x − x0) = lim_{λ→0} χ_λ(x − x0)/(2λ)

where χ_λ(x) is 1 when −λ < x < λ and zero otherwise.

2.) Prove the following:

δ(x) = δ(−x)

x δ(x) = 0

δ(ax) = (1/|a|) δ(x)

x dδ(x)/dx = −δ(x)

δ(f(x)) = Σ_{x_i : f(x_i)=0} (1/|df/dx(x_i)|) δ(x − x_i)

3.) Show that if f(x) is a continuous function that is identically zero for |x| > L, then its Fourier transform is an entire function.

4.) Show that

∫ θ(x) δ(x) f(x) dx

does not make sense on the space of Schwartz functions.

5. Calculate the Fourier transforms of

f(x) = 1/((x^2 + a^2)(x^2 + b^2))

f(x) = 1 for |x| < 1, 0 for |x| ≥ 1

θ(x) = 1 for x > 0, 0 for x ≤ 0


6. Show that the linear operator d/dx is not bounded on the space of square integrable functions on the real line.


29:172 Assignment 6 - Due Wed, Feb. 28

1. Let A be a bounded operator with the property that A^2 is compact. Under what conditions does

(I − A)^{−1}

exist? Hint: use the Fredholm alternative.

2. Show that if A is compact then A† is also compact.

3. Let {|χ_n〉}_{n=1}^∞ be any orthonormal basis and let K be compact. Consider the finite rank approximations

K_N = Σ_{m,n=1}^{N} |χ_n〉〈χ_n|K|χ_m〉〈χ_m|

Show that

|||K − K_N|||

can be made as small as desired by choosing a large enough N.

4. Show that the integral operator K defined by

〈x|K|f〉 = ∫_{−∞}^{∞} 〈x|K|y〉 dy 〈y|f〉

with

〈x|K|y〉 = e^{−x^2−y^2}

is compact.

5. The harmonic oscillator Hamiltonian is defined by

H := −d^2/dx^2 + x^2

a. Show that H is a linear operator.

b. Show that H is an unbounded operator.


c. H is known to be Hermitian and has a complete set of orthogonal eigenvectors with eigenvalues 2n + 1, n = 0, 1, 2, · · · . The resolvent of H is the operator

R(z) = (z − H)^{−1}

where z is a complex number. Show that if z ≠ 2n + 1 then R(z) is compact.

6. Is it possible for a unitary operator to be compact? Prove your result.


29:172 Assignment 7 - Due Wednesday, March 21

1.) Implicit function theorem. Consider the function w = e^{−(x^2+y^2)}. We would like to solve this for x near x = 2 by constructing

x = g(w, y).

While this can be done analytically, write

w = e^{−(2^2+y^2)} + ∂e^{−(x^2+y^2)}/∂x |_{x=2} (x − 2) + R(x, y)

Write this as

x = 2 + [1/(∂e^{−(x^2+y^2)}/∂x |_{x=2})] [w − e^{−(2^2+y^2)} − R(x, y)]

Try to approximate this by iteration:

x_0 := 2

x_n := 2 + [1/(∂e^{−(x^2+y^2)}/∂x |_{x=2})] [w − e^{−(2^2+y^2)} − R(x_{n−1}, y)]

for x near 2. Compare the result of this for a few iterations to the exact result for a selected value of y. The purpose of this exercise is to illustrate how the implicit function theorem works.
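A sketch of the iteration for the fixed value y = 0 (the target point x = 2.1 is an arbitrary choice); the map is a contraction near x = 2, so x_n converges to the exact preimage:

```python
import math

f = lambda x: math.exp(-x * x)     # w = e^{-(x^2+y^2)} at the fixed value y = 0
fp2 = -4.0 * math.exp(-4.0)        # df/dx at x = 2

x_true = 2.1                       # arbitrary target; w is taken as its exact image
w = f(x_true)

def R(x):
    # Taylor remainder of f about x = 2
    return f(x) - f(2.0) - fp2 * (x - 2.0)

x = 2.0                            # x_0 := 2
for _ in range(25):
    x = 2.0 + (w - f(2.0) - R(x)) / fp2
print(x)   # converges to x_true = 2.1
```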

2. Consider Newton's equation for a linear harmonic oscillator

m d^2x/dt^2 = −kx

a. Convert this to a system of two coupled first order differential equations.

b. Convert the system of first order linear differential equations to a pair of coupled integral equations.


c. Using the initial conditions x(0) = a and dx/dt(0) = 0, solve the coupled integral equations by successive approximations, using the initial conditions as the first approximation (the method used in class to prove the existence of solutions of differential equations). While an infinite number of iterations is needed to obtain the full solution, this problem is simple enough that you should be able to obtain the exact solution.
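The successive-approximation scheme of part c can be sketched with polynomial coefficient lists; each pair of iterations reproduces one more term of the exact solution x(t) = a cos(√(k/m) t) (here a = 1 and k/m = 1 are arbitrary choices):

```python
def integrate(c):
    # antiderivative with zero constant term, coefficients low-to-high
    return [0.0] + [ck / (i + 1) for i, ck in enumerate(c)]

a, w2 = 1.0, 1.0                   # x(0) = a, omega^2 = k/m (arbitrary values)
x, v = [a], [0.0]                  # zeroth approximation = the initial conditions
for _ in range(12):
    ix, iv = integrate(x), integrate(v)
    # x(t) = a + ∫_0^t v,   v(t) = -omega^2 ∫_0^t x
    x, v = [a] + iv[1:], [-w2 * ck for ck in ix]
print(x[:7])   # 1, 0, -1/2, 0, 1/24, 0, -1/720: the Taylor series of cos(t)
```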

3. Consider Legendre's differential equation. If we do not specify boundary conditions, then we know that it has two independent solutions. One of the two independent solutions is the Legendre polynomial. Find the second independent solution for the case n = 1.

4. Find the solution of the first order differential equation

cosh(x) df/dx + sinh(x) f(x) = 0

5. Find the solution of the first order differential equation

cosh(x) df/dx + sinh(x) f(x) = sinh(2x)

6. Consider the Wronskian of the differential equation

a(x) d^2f/dx^2 + b(x) df/dx + c(x) f(x) = 0

where a(x) > 0. Show that the Wronskian of this equation is non-zero for all x.


29:172 Assignment 8 - Due Wednesday, March 28

1.) Let L_x = d^2/dx^2 on the interval [a, b].

Find the weight w(x) that makes this a Hermitian operator.

Find the Green function for this operator corresponding to Dirichlet boundary conditions.

Find the Green function for this operator corresponding to Neumann boundary conditions.

2. Consider the linear differential operator

L_x = e^x d^2/dx^2 − x^2 d/dx + 1

Find the adjoint operator on the interval [0, 1].

Find the boundary conditions that are adjoint to Dirichlet boundary conditions.

3. Assume that K is the compact operator

K = Σ_{n=0}^{∞} |p_n〉 (1/(n + 1)) 〈p_n|

that acts on square integrable functions on the interval [−1, 1]. In this expression

〈x|p_n〉 = P_n(x) √(n + 1/2)

where P_n(x) is the n-th Legendre polynomial. Find the Green function for K. Express your answer in terms of Legendre polynomials. Hint: use the abstract identity KG = I.

4. Consider the differential equation

L f(x) = d^2f(x)/dx^2 − ω^2 f(x) = g(x)

where ω is real.

Find independent solutions to the homogeneous equation.


Calculate the Wronskian of this system. Verify that it never vanishes.

Find the most general solution to the inhomogeneous equation for a general g(x).
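For this equation the homogeneous solutions are e^{±ωx}, and since there is no first-derivative term Abel's formula makes the Wronskian constant; a numeric spot-check (ω = 0.7 is an arbitrary choice):

```python
import math

w = 0.7                                    # arbitrary value of omega
f1, f2 = lambda x: math.exp(w * x), lambda x: math.exp(-w * x)
df1, df2 = lambda x: w * f1(x), lambda x: -w * f2(x)

# W(x) = f1 f2' - f2 f1'; with no f' term in the equation, Abel's
# formula says W is constant in x
for x in (-2.0, 0.0, 3.0):
    print(x, f1(x) * df2(x) - f2(x) * df1(x))   # always -2*omega = -1.4
```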

5.) Consider Legendre's differential equation for n = 1. Last week you calculated a second independent solution to the homogeneous equation. Use that solution to find the differences a_< − a_> and b_< − b_>.


29:172 Assignment 9 - Due Wed, April 4

1.) Find the Green's function for the differential operator

L = d/dx [x d/dx] − 1/x

satisfying Dirichlet boundary conditions at x = 0 and x = 1 (hint: see text).

2.) Let A be a linear operator and let M be the Moore-Penrose generalized inverse of A.

a. Prove that M is unique.

b. Let A be compact. Let M be the Moore-Penrose generalized inverse of A. Discuss conditions on A for M to be compact.

3.) Consider the differential operator L = d^2/dx^2 on the interval [0, π] with periodic boundary conditions.

a. Are there any solutions to the homogeneous equation L|f〉 = 0 satisfying periodic boundary conditions?

b. Find the generalized Green function for L that satisfies periodic boundary conditions.

c. If we want to solve L|f〉 = |g〉, are there any restrictions on |g〉 in order for a solution to exist?

d. Solve L|f〉 = |g〉 for 〈x|g〉 = cos(2x).

4.) Consider the differential operator L = d^2/dx^2 on the interval [a, b].

Find a solution to the equation L|f〉 = 0 satisfying 〈a|f〉 = 1, 〈b|f〉 = −1.


29:172 Assignment 10 - Due Monday, April 11

1.) Let M denote a finite dimensional matrix satisfying |||M||| < c < ∞. Let M' = (1/2c) M.

a. Show that the Moore-Penrose generalized inverse of M is X = (1/2c) X', where X' is the Moore-Penrose generalized inverse of M'.

b. Consider the series

X' = Σ_{n=0}^{∞} (I − M'^† M')^n M'^†

Show that this series converges to the Moore-Penrose generalized inverse of M'.
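The series in part b can be seen in miniature with a diagonal, rank-deficient example, where each diagonal entry evolves independently (M' = diag(s, 0) with s = 0.3 is an arbitrary choice): the nonzero entry sums to 1/s, and the zero entry stays zero, which is exactly the Moore-Penrose inverse.

```python
# Diagonal rank-deficient sketch: M' = diag(s, 0), so M'^† M' = diag(s^2, 0)
# and the series acts entrywise. For the nonzero entry:
#   sum_n (1 - s^2)^n * s  ->  s / s^2 = 1/s, the Moore-Penrose inverse.
# The zero entry contributes 0 at every order, as it should.
s = 0.3
total = 0.0
for n in range(400):
    total += (1 - s * s) ** n * s
print(total, 1 / s)
```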

2. Consider the differential operator

L = (1/x) d^2/dx^2 x + 1

a. Find independent solutions to the homogeneous equation L|f〉 = 0 on the interval [0, a].

b. Find the Green function satisfying Dirichlet boundary conditions on [0, a].

3.) Let L be a Hermitian differential operator with some specified homogeneous boundary conditions. Consider the resolvent operator

R(z) = (z − L)^{−1}

which is defined when

L|f〉 = z|f〉

has no solutions satisfying the homogeneous boundary conditions. Let γ(t), t ∈ [0, 1], be a parameterized circle (counterclockwise) in the complex plane with radius r and center at z = 0, with the property that R(z) is defined for all z on the curve.

a. Show that

P = (1/(2πi)) ∫_0^1 R(γ(t)) (dγ/dt) dt

is an orthogonal projection operator.


b. If L|f〉 = η|f〉, what can you say about P |f〉?

4.) The Lagrangian for a free particle in a one dimensional box is

L = (1/2) (dx/dt)^2

The solution that is a stationary point of the action functional

A[x] = ∫_0^T L(x(t)) dt

for x(0) = a and x(T) = b is known to be

x_0(t) = a + ((b − a)/T) t

a. Use the method used in the example in class to determine whether this solution is a minimum (among all curves satisfying x(0) = a and x(T) = b) of the action functional or not.

6.) Given a Sturm-Liouville operator of the form

L = d/dx g(x) d/dx − h(x)

with weight w(x) = 1, x ∈ [a, b], and g(x) and h(x) real, show explicitly that the eigenvalues of L|f_n〉 = λ_n|f_n〉 are real and that eigenvectors corresponding to different eigenvalues are orthogonal on [a, b].


29:172 Assignment 11 - Due Wed, April 25

1.) Use the series method to solve the second order differential equation with constant coefficients,

L〈x|f〉 = 0,   L = d^2/dx^2 + a d/dx + b

with boundary conditions

〈0|f〉 = 1,   d/dx 〈x|f〉|_{x=0} = 1

b. What is the domain of analyticity of your solution?

c. Put this equation in the form (14.6) (see K&D) and verify equation (14.12) for this example.

2.) Consider the differential equation

d^2f/dz^2 − f(z) = 0

Find recursion relations that define the coefficients of the two independent power series solutions of this equation about z = 0.
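For f″ = f the recursion is a_{n+2} = a_n/((n + 2)(n + 1)); choosing (a_0, a_1) = (1, 0) or (0, 1) gives the two independent solutions (cosh z and sinh z). A sketch:

```python
import math

def series_solution(a0, a1, N=20):
    # f'' = f  =>  a_{n+2} = a_n / ((n+2)(n+1)); coefficients low-to-high
    a = [0.0] * (N + 1)
    a[0], a[1] = a0, a1
    for n in range(N - 1):
        a[n + 2] = a[n] / ((n + 2) * (n + 1))
    return a

z = 0.5
even = series_solution(1.0, 0.0)   # -> cosh z
odd = series_solution(0.0, 1.0)    # -> sinh z
print(sum(c * z**k for k, c in enumerate(even)), math.cosh(z))
print(sum(c * z**k for k, c in enumerate(odd)), math.sinh(z))
```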

3.) Consider the differential equation

d^2f/dz^2 + sin(z) df/dz − cos(z) f(z) = 0

Find recursion relations that define the coefficients of the two independent power series solutions of this equation about z = 0.

4.) Consider the differential equation

d^2f/dz^2 + (2/z) df/dz − z f(z) = 0

Find the indicial equation for a solution about z = 0. Find the roots.

5.) Find the recursion relation that defines the coefficients of the series solution to the differential equation in problem 4) associated with the root of the indicial equation with largest real part.


29:172 Assignment 12 - Due Wed., May 2

1.) Let r + s + t = 0 and α + α' + β + β' + γ + γ' = 1. Show that

(z − z1)^r (z − z2)^s (z − z3)^t P{ z1 z2 z3 ; α β γ ; α' β' γ' ; z }

satisfies the Riemann equation satisfied by

P{ z1 z2 z3 ; α+r β+s γ+t ; α'+r β'+s γ'+t ; z }

2. Show that for

z' = (Az + B)/(Cz + D),   z'_i = (Az_i + B)/(Cz_i + D)

that

P{ z1 z2 z3 ; α β γ ; α' β' γ' ; z } = P{ z'_1 z'_2 z'_3 ; α β γ ; α' β' γ' ; z' }

3. Verify that

z^{1−c} F(b − c + 1, a − c + 1, 2 − c, z)

satisfies the same differential equation as F(a, b, c, z).

4. Verify that

F(a, b, c, z) = [Γ(c)/(Γ(b)Γ(c − b))] ∫_1^∞ (t − z)^{−a} t^{a−c} (t − 1)^{c−b−1} dt

F(a, b, c, z) = [Γ(c)/(Γ(c − b)Γ(b))] ∫_0^1 (1 − tz)^{−a} t^{b−1} (1 − t)^{c−b−1} dt

are solutions of the Hypergeometric equation.

5. Verify that

z^{1−c} Φ(a − c + 1, 2 − c, z)

satisfies the same differential equation as the confluent Hypergeometric function Φ(a, c, z) (c not an integer).
