
Finite-Dimensional Linear Algebra

Solutions to selected odd-numbered exercises

Mark S. Gockenbach

June 19, 2012


Errata for the first printing

The following corrections will be made in the second printing of the text, expected in 2011. These solutions are written as if they have already been made.

Page 65, Exercise 14: belongs in Section 2.7.
Page 65, Exercise 16: should read “(cf. Exercise 2.3.21)”, not “(cf. Exercise 2.2.21)”.
Page 71, Exercise 9(b): Z_4^5 should be Z_5^4.

Page 72, Exercise 11: “over V” should be “over F”.
Page 72, Exercise 15: “i = 1, 2, . . . , k” should be “j = 1, 2, . . . , k” (twice).
Page 79, Exercise 1: “x3 = 2” should be “x3 = 3”.
Page 82, Exercise 14(a): “Each Ai and Bi has degree 2n + 1” should read “Ai, Bi ∈ P2n+1 for all i = 0, 1, . . . , n”.
Page 100, Exercise 11: “K : C[a, b] → C[a, b]” should be “K : C[c, d] → C[a, b]”.
Page 114, Line 9: “L : Fn → Rm” should be “L : Fn → Fm”.
Page 115, Exercise 8:

    S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} X = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}

should be

    S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, X = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}.

Page 116, Exercise 17(b): “F^{mn}” should be “F^{m×n}”.
Page 121, Exercise 3: “T : R4 → R3” should be “T : R4 → R4”.
Page 124, Exercise 15: “T : X/ker(L) → R(U)” should be “T : X/ker(L) → R(L)”.
Page 124, Exercise 15:

    T([x]) = T(x) for all [x] ∈ X/ker(L)

should be

    T([x]) = L(x) for all [x] ∈ X/ker(L).

Page 129, Exercise 4(b): Period is missing at the end of the sentence.
Page 130, Exercise 8: “L : Z_3^3 → Z_3^3” should read “L : Z_5^3 → Z_5^3”.
Page 130, Exercise 13(b): “T defines . . . ” should be “S defines . . . ”.
Page 131, Exercise 15: “K : C[a, b] × C[c, d] → C[a, b]” should be “K : C[c, d] → C[a, b]”.
Page 138, Exercise 7(b): “define” should be “defined”.
Page 139, Exercise 12: In the last line, “sp{x1, x2, . . . , xn}” should be “sp{x1, x2, . . . , xk}”.
Page 139, Exercise 12: The proposed plan for the proof is not valid. Instead, the instructions should read: Choose vectors x1, . . . , xk ∈ X such that {T(x1), . . . , T(xk)} is a basis for R(T), and choose a basis {y1, . . . , yℓ} for ker(T). Prove that {x1, . . . , xk, y1, . . . , yℓ} is a basis for X. (Hint: First show that ker(T) ∩ sp{x1, . . . , xk} is trivial.)
Page 140, Exercise 15: In the displayed equation, “|Aii” should be “|Aii|”.
Page 168: Definition 132 defines the adjacency matrix of a graph, not the incidence matrix (which is something different). The correct term (adjacency matrix) is used throughout the rest of the section. (Change “incidence” to “adjacency” in three places: the title of Section 3.10.1, page 168 line −2, page 169 line 1.)
Page 199, Equation (3.41d): “x1, x2 ≤ 0” should be “x1, x2 ≥ 0”.



Page 204, Exercise 10: “α1, . . . , αk ∈ R” should be “α1, . . . , αk ≥ 0”. Also, C should not be boldface in the displayed formula.

Page 221, Exercise 9: “m > n” should be “m < n”.

Page 242, Corollary 194: “for each i = 1, 2, . . . , t” should be “for each i = 1, 2, . . . , m”.

Page 251, Exercise 18(e):

    w = [ 0 ]
        [ v ] ,

should be

    w = [ 0 ]
        [ v ] .

(That is, the comma should be a period.)

Page 256, Exercise 13: First line should read “Let X be a finite-dimensional vector space over C with basis. . . ”. References in part (b) to Fn×n, F k×k, F k×ℓ, F ℓ×ℓ should be replaced with Cn×n, etc. Also, in part (b), “Prove that [T ]X” should be replaced with “Prove that [T ]X,X”.

Page 264, Exercise 3: Add “Assume {p, q} is linearly independent.”

Page 271, Exercise 3: “. . . we introduced the incidence matrix . . . ” should be “. . . we introduced the adjacency matrix . . . ”.

Page 282, Exercise 6: S = sp{(1, 3,−3, 2), (3, 7,−11,−4)} should be S = sp{(1, 4,−1, 3), (4, 7,−19, 3)}.
Page 282, Exercise 7(b): “N(A) ∩ col(A)” should be “N(A) ∩ col(A) = {0}”.

Page 283, Exercise 12: “Lemma 5.1.2” should be “Lemma 229”.

Page 306, Example 252: “B = {p0, D(p0), D2(p0)} = {x2, 2x, 2}” should be “B = {D2(p0), D(p0), p0} = {2, 2x, x2}”. Also, “[T ]B,B” should be “[D]B,B” (twice). Similarly, A should be defined as {2, −1 + 2x, 1 − x + x2} and “[T ]A,A” should be “[D]A,A”.

Page 308, Exercise 3: “Suppose X is a vector space. . . ” should be “Suppose X is a finite-dimensional vector space. . . ”.

Page 311, Line 7: “corresponding to λ” should be “corresponding to λi”.

Page 316, Exercise 6(f): Should end with a “;” instead of a “.”.

Page 317, Exercise 15: “ker((T − λI)2) = ker(A− λI)” should be “ker((T − λI)2) = ker(T − λI)”.

Page 322, displayed equation (5.21): The last line should read v′r = λvr.

Page 325, Exercise 9: “If U(t0) is singular, say U(t)c = 0 for some c ∈ Cn, c ≠ 0” should be “If U(t0) is singular, say U(t0)c = 0 for some c ∈ Cn, c ≠ 0”.

Page 331, Line 16: “. . . is at least t + 1” should be “. . . is at least s + 1”.

Page 356, Exercise 9: “. . . such that {x1, x2, x3, x4}.” should be “. . . such that {x1, x2, x3, x4} is an orthogonal basis for R4.”

Page 356, Exercise 13: “. . . be a linearly independent subset of V ” should be “. . . be an orthogonal subset of V ”.

Page 356, Exercise 14: “. . . be a linearly independent subset of V ” should be “. . . be an orthogonal subset of V ”.

Pages 365–368: Miscellaneous exercises 1–21 should be numbered 2–22.

Page 365, Exercise 6 (should be 7): “. . . under the L2(0, 1) norm” should be “. . . under the L2(0, 1) inner product”.

Page 383, Line 1: “col(T )” should be “col(A)” and “col(T )⊥” should be “col(A)⊥”.

Page 383, Exercise 3: “. . . a basis for R4” should be “. . . a basis for R3”.

Page 384, Exercise 6: “basis” should be “bases”.

Page 385, Exercise 14: “Exercise 6.4.13” should be “Exercise 6.4.1”. “That exercise also” should be “Exercise 6.4.13”.

Page 385, Exercise 15: “See Exercise 6.4” should be “See Exercise 6.4.14”.

Page 400, Exercise 4:

    f(x) = f( a + (b − a)(t + 1)/2 )

should be

    f(x) = f( a + (b − a)(x + 1)/2 ).

Page 410, Exercise 1: The problem should specify ℓ = 1, k(x) = x + 1, f(x) = −4x − 1.
Page 411, Exercise 6: “u(ℓ) = 0.” should be “u(ℓ) = 0” (i.e., there should not be a period after 0).
Page 424, Exercise 1: “. . . prove (1)” should be “. . . prove (6.50)”.
Page 432, Exercise 9: “G−1/2 is the inverse of G−1/2” should be “G−1/2 is the inverse of G1/2”.
Page 433, Exercise 16: “. . . so we will try to estimate the values u(x1), u(x2), . . . , u(xn)” should be “. . . so we will try to estimate the values u(x1), u(x2), . . . , u(xn−1)”.
Page 438, Exercise 3: “. . . define T : Rn → Fn” should be “. . . define T : Fn → Fn”.
Page 448, Exercise 8: In the formula for f, “−200x2 1x2” should be “−200x1^2 x2”. Also, “(−1.2, 1)” should be “(1, 1)”.

Page 453, Exercise 6: Add: “Assume ∇g(x(0)) has full rank.”
Page 475, Exercise 10: “A = GH” should be “A = GQ”.
Page 476, Exercise 15(a):

    ‖A‖F = ∑_{i=1}^m ∑_{j=1}^n |A2ij for all A ∈ Cm×n

should be

    ‖A‖F = ∑_{i=1}^m ∑_{j=1}^n |Aij|^2 for all A ∈ Cm×n.

Page 476, Exercise 15: No need to define ‖C‖F again.
Page 501, last paragraph: The text fails to define k ≡ ℓ (mod p) for general k, ℓ ∈ Z. The following text should be added: “In general, for k, ℓ ∈ Z, we say that k ≡ ℓ (mod p) if p divides k − ℓ, that is, if there exists m ∈ Z with k = ℓ + mp. It is easy to show that, if r is the congruence class of k ∈ Z, then p divides k − r, and hence this is consistent with the earlier definition. Moreover, it is a straightforward exercise to show that k ≡ ℓ (mod p) if and only if k and ℓ have the same congruence class modulo p.”

Page 511, Theorem 381: “A^(k)_ij = A_ij” should be “M^(k)_ij = A_ij”.

Page 516, Exercise 8: “A^(1), A^(2), . . . , A^(n−1)” should be “M^(1), M^(2), . . . , M^(n−1)”.
Page 516, Exercise 10: n²/2 − n/2 should be n² − n.
Page 523, Exercise 6(b): “. . . the columns of AP are. . . ” should be “. . . the columns of AP^T are. . . ”.
Page 535, Theorem 401: ‖A‖1 should be ‖A‖∞ (twice).
Page 536, Exercise 1: “. . . be any matrix norm. . . ” should be “. . . be any induced matrix norm. . . ”.
Page 554, Exercise 4: “. . . b is consider the data . . . ” should be “. . . b is considered the data . . . ”.
Page 554, Exercise 5: “. . . b is consider the data . . . ” should be “. . . b is considered the data . . . ”.
Page 563, Exercise 7: “Let v ∈ Rm be given and define α = ±‖x‖2, let {u1, u2, . . . , um} be an orthonormal basis for Rm, where u1 = x/‖x‖2 . . . ” should be “Let v ∈ Rm be given, define α = ±‖v‖2, x = αe1 − v, u1 = x/‖x‖2, and let {u1, u2, . . . , um} be an orthonormal basis for Rm, . . . ”.
Page 571, Exercise 3: “Prove that the angle between A^k v0 and x1 converges to zero as k → ∞” should be “Prove that the angle between A^k v0 and sp{x1} = EA(λ1) converges to zero as k → ∞.”
Page 575, line 15: 3n² − n should be 3n² + 2n − 5.
Page 575, line 16: “n square roots” should be “n − 1 square roots”.
Page 580, Exercise 3: “. . . requires 3n² − n arithmetic operations, plus the calculation of n square roots, . . . ” should be “. . . requires 3n² + 2n − 5 arithmetic operations, plus the calculation of n − 1 square roots, . . . ”.
Page 585, line 19: “original subsequence” should be “original sequence”.
Page 585, line 20: “original subsequence” should be “original sequence”.
Page 604, Exercise 4: “Theorem 4” should be “Theorem 451”.
Page 608, line 18: “. . . exists a real number. . . ” should be “. . . exists as a real number. . . ”.


Chapter 2

Fields and vector spaces

2.1 Fields

3. Let F be a field and let α ∈ F be nonzero. We wish to show that the multiplicative inverse of α is unique. Suppose β ∈ F satisfies αβ = 1. Then, multiplying both sides of the equation by α−1, we obtain α−1(αβ) = α−1 · 1, or (α−1α)β = α−1, or 1 · β = α−1. It follows that β = α−1, and thus α has a unique multiplicative inverse.

7. Let F be a field and let α, β be elements of F. We wish to show that the equation α + x = β has a unique solution. The proof has two parts. First, if x satisfies α + x = β, then adding −α to both sides shows that x must equal −α + β = β − α. This shows that the equation has at most one solution. On the other hand, x = −α + β is a solution since α + (−α + β) = (α − α) + β = 0 + β = β. Therefore, α + x = β has a unique solution, namely, x = −α + β.

13. Let F = {(α, β) : α, β ∈ R}, and define addition and multiplication on F by (α, β) + (γ, δ) = (α + γ, β + δ), (α, β) · (γ, δ) = (αγ, βδ). With these definitions, F is not a field because multiplicative inverses do not exist. It is straightforward to verify that (0, 0) is an additive identity and (1, 1) is a multiplicative identity. Then (1, 0) ≠ (0, 0), yet (1, 0) · (α, β) = (α, 0) ≠ (1, 1) for all (α, β) ∈ F. Since F contains a nonzero element with no multiplicative inverse, F is not a field.

15. Suppose F is a set on which are defined two operations, addition and multiplication, such that all the properties of a field are satisfied except that addition is not assumed to be commutative. We wish to show that, in fact, addition must be commutative, and therefore F must be a field. We first note that it is possible to prove that 0 · γ = 0, −1 · γ = −γ, and −(−γ) = γ for all γ ∈ F without invoking commutativity of addition. Moreover, for all α, β ∈ F, −β + (−α) = −(α + β) since (α + β) + (−β + (−α)) = ((α + β) + (−β)) + (−α) = (α + (β + (−β))) + (−α) = (α + 0) + (−α) = α + (−α) = 0. We therefore conclude that −1 · (α + β) = −β + (−α) for all α, β ∈ F. But, by the distributive property, −1 · (α + β) = −1 · α + (−1) · β = −α + (−β), and therefore −α + (−β) = −β + (−α) for all α, β ∈ F. Applying this property to −α, −β in place of α, β, respectively, yields α + β = β + α for all α, β ∈ F, which is what we wanted to prove.

19. Let F be a finite field.

(a) Consider the elements 1, 1 + 1, 1 + 1 + 1, . . . in F. Since F contains only finitely many elements, there must exist two terms in this sequence that are equal, say 1 + 1 + · · · + 1 (ℓ terms) and 1 + 1 + · · · + 1 (k terms), where k > ℓ. We can then add −1 to both sides ℓ times to show that 1 + 1 + · · · + 1 (k − ℓ terms) equals 0 in F. Since at least one term of the sequence 1, 1 + 1, 1 + 1 + 1, . . . equals 0, we can define n to be the smallest integer greater than 1 such that 1 + 1 + · · · + 1 = 0 (n terms). We call n the characteristic of the field.

(b) Given that the characteristic of F is n, for any α ∈ F, we have α + α + · · · + α = α(1 + 1 + · · · + 1) = α · 0 = 0 if the sum has n terms.



(c) We now wish to show that the characteristic n is prime. Suppose, by way of contradiction, that n = kℓ, where 1 < k, ℓ < n. Define α = 1 + 1 + · · · + 1 (k terms) and β = 1 + 1 + · · · + 1 (ℓ terms). Then αβ = 1 + 1 + · · · + 1 (n terms), so that αβ = 0. But this implies that α = 0 or β = 0, which contradicts the definition of the characteristic n. This contradiction shows that n must be prime.

2.2 Vector spaces

7. (a) The elements of P1(Z2) are the polynomials 0, 1, x, 1 + x, which define distinct functions on Z2. We have 0 + 0 = 0, 0 + 1 = 1, 0 + x = x, 0 + (1 + x) = 1 + x, 1 + 1 = 0, 1 + x = 1 + x, 1 + (1 + x) = x, x + x = (1 + 1)x = 0x = 0, x + (1 + x) = 1 + (x + x) = 1, (1 + x) + (1 + x) = (1 + 1) + (x + x) = 0 + 0 = 0.

(b) Nominally, the elements of P2(Z2) are 0, 1, x, 1 + x, x2, 1 + x2, x + x2, 1 + x + x2. However, since these elements are interpreted as functions mapping Z2 into Z2, it turns out that the last four functions equal the first four. In particular, x2 = x (as functions), since 02 = 0 and 12 = 1. Then 1 + x2 = 1 + x, x + x2 = x + x = 0, and 1 + x + x2 = 1 + 0 = 1. Thus we see that the function spaces P2(Z2) and P1(Z2) are the same.

(c) Let V be the vector space consisting of all functions from Z2 into Z2. To specify f ∈ V means to specify the two values f(0) and f(1). There are exactly four ways to do this: f(0) = 0, f(1) = 0 (so f(x) = 0); f(0) = 1, f(1) = 1 (so f(x) = 1); f(0) = 0, f(1) = 1 (so f(x) = x); and f(0) = 1, f(1) = 0 (so f(x) = 1 + x). Thus we see that V = P1(Z2).

9. Let V = R2 with the usual scalar multiplication and the following nonstandard vector addition: u ⊕ v = (u1 + v1, u2 + v2 + 1) for all u, v ∈ R2. It is easy to check that commutativity and associativity of ⊕ hold, that (0, −1) is an additive identity, and that each u = (u1, u2) has an additive inverse, namely, (−u1, −u2 − 2). Also, α(βu) = (αβ)u for all u ∈ V, α, β ∈ R (since scalar multiplication is defined in the standard way). However, if α ∈ R, then α(u ⊕ v) = α(u1 + v1, u2 + v2 + 1) = (αu1 + αv1, αu2 + αv2 + α), while αu ⊕ αv = (αu1, αu2) ⊕ (αv1, αv2) = (αu1 + αv1, αu2 + αv2 + 1), and these are unequal if α ≠ 1. Thus the first distributive property fails to hold, and V is not a vector space over R. (In fact, the second distributive property also fails.)

15. Suppose U and V are vector spaces over a field F, and define addition and scalar multiplication on U × V by (u, v) + (w, z) = (u + w, v + z), α(u, v) = (αu, αv). We wish to prove that U × V is a vector space over F. In fact, the verifications of all the defining properties of a vector space are straightforward. For instance, (u, v) + (w, z) = (u + w, v + z) = (w + u, z + v) = (w, z) + (u, v) (using the commutativity of addition in U and V), and therefore addition in U × V is commutative. Note that the additive identity in U × V is (0, 0), where the first 0 is the zero vector in U and the second is the zero vector in V. We will not verify the remaining properties here.

2.3 Subspaces

3. Let V be a vector space over R, and let v ∈ V be nonzero. We wish to prove that S = {0, v} is not a subspace of V. If S were a subspace, then 2v would lie in S. But 2v ≠ 0 by Theorem 5, and 2v ≠ v (since otherwise adding −v to both sides would imply that v = 0). Hence 2v ∉ S, and therefore S is not a subspace of V.

7. Define S = {x ∈ R2 : ax1 + bx2 = 0}, where a, b ∈ R are constants. We will show that S is a subspace of R2. First, (0, 0) ∈ S, since a · 0 + b · 0 = 0. Next, suppose x ∈ S and α ∈ R. Then ax1 + bx2 = 0, and therefore a(αx1) + b(αx2) = α(ax1 + bx2) = α · 0 = 0. This shows that αx ∈ S, and therefore S is closed under scalar multiplication. Finally, suppose x, y ∈ S, so that ax1 + bx2 = 0 and ay1 + by2 = 0. Then a(x1 + y1) + b(x2 + y2) = (ax1 + bx2) + (ay1 + by2) = 0 + 0 = 0, which shows that x + y ∈ S, and therefore that S is closed under addition. This completes the proof.

11. Let R be regarded as a vector space over R. We wish to prove that R has no subspaces other than {0} and R itself. It suffices to prove that if S is a nontrivial subspace of R, then S = R. So suppose S is a nontrivial subspace,


which means that there exists x ≠ 0 belonging to S. But then, given any y ∈ R, y = (yx−1)x belongs to S because S is closed under scalar multiplication. Thus R ⊂ S, and hence S = R.

17. Let S = {u ∈ C[a, b] : ∫_a^b u(x) dx = 0}. We will show that S is a subspace of C[a, b]. First, since the integral of the zero function is zero, we see that the zero function belongs to S. Next, suppose u ∈ S and α ∈ R. Then ∫_a^b (αu)(x) dx = ∫_a^b αu(x) dx = α ∫_a^b u(x) dx = α · 0 = 0, and therefore αu ∈ S. Finally, suppose u, v ∈ S. Then ∫_a^b (u + v)(x) dx = ∫_a^b (u(x) + v(x)) dx = ∫_a^b u(x) dx + ∫_a^b v(x) dx = 0 + 0 = 0. This shows that u + v ∈ S, and we have proved that S is a subspace of C[a, b].

19. Let V be a vector space over a field F , and let X and Y be subspaces of V .

(a) We will show that X ∩ Y is also a subspace of V. First of all, since 0 ∈ X and 0 ∈ Y, it follows that 0 ∈ X ∩ Y. Next, suppose x ∈ X ∩ Y and α ∈ F. Then, by definition of intersection, x ∈ X and x ∈ Y. Since X and Y are subspaces, both are closed under scalar multiplication and therefore αx ∈ X and αx ∈ Y, from which it follows that αx ∈ X ∩ Y. Thus X ∩ Y is closed under scalar multiplication. Finally, suppose x, y ∈ X ∩ Y. Then x, y ∈ X and x, y ∈ Y. Since X and Y are closed under addition, we have x + y ∈ X and x + y ∈ Y, from which we see that x + y ∈ X ∩ Y. Therefore, X ∩ Y is closed under addition, and we have proved that X ∩ Y is a subspace of V.

(b) It is not necessarily the case that X ∪ Y is a subspace of V. For instance, let V = R2, and define X = {x ∈ R2 : x2 = 0}, Y = {x ∈ R2 : x1 = 0}. Then (1, 0) ∈ X ⊂ X ∪ Y and (0, 1) ∈ Y ⊂ X ∪ Y; however, (1, 0) + (0, 1) = (1, 1) ∉ X ∪ Y. Thus X ∪ Y is not closed under addition, and hence is not a subspace of R2.

2.4 Linear combinations and spanning sets

3. Let S = sp{1 + 2x + 3x2, x − x2} ⊂ P2.

(a) There is a (unique) solution α1 = 2, α2 = 1 to α1(1 + 2x + 3x2) + α2(x − x2) = 2 + 5x + 5x2. Therefore, 2 + 5x + 5x2 ∈ S.

(b) There is no solution α1, α2 to α1(1 + 2x + 3x2) + α2(x − x2) = 1 − x + x2. Therefore, 1 − x + x2 ∉ S.

7. Let u = (1, 1, −1), v = (1, 0, 2) be vectors in R3. We wish to show that S = sp{u, v} is a plane in R3. First note that if S = {x ∈ R3 : ax1 + bx2 + cx3 = 0}, then (taking x = u, x = v) we see that a, b, c must satisfy a + b − c = 0, a + 2c = 0. One solution is a = 2, b = −3, c = −1. We will now prove that S = {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. First, suppose x ∈ S. Then there exist α, β ∈ R such that x = αu + βv = α(1, 1, −1) + β(1, 0, 2) = (α + β, α, −α + 2β), and 2x1 − 3x2 − x3 = 2(α + β) − 3α − (−α + 2β) = 2α + 2β − 3α + α − 2β = 0. Therefore, x ∈ {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. Conversely, suppose x ∈ {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. If we solve the equation αu + βv = x, we see that it has the solution α = x2, β = x1 − x2, and therefore x ∈ S. (Notice that x2(1, 1, −1) + (x1 − x2)(1, 0, 2) = (x1, x2, 2x1 − 3x2), and the assumption 2x1 − 3x2 − x3 = 0 implies that 2x1 − 3x2 = x3.) This completes the proof.

11. Let S = sp{(−1, −3, 3), (−1, −4, 3), (−1, −1, 4)} ⊂ R3. We wish to determine if S = R3 or if S is a proper subspace of R3. Given an arbitrary x ∈ R3, we solve α1(−1, −3, 3) + α2(−1, −4, 3) + α3(−1, −1, 4) = (x1, x2, x3) and find that there is a unique solution, namely, α1 = −13x1 + x2 − 3x3, α2 = 9x1 − x2 + 2x3, α3 = 3x1 + x3. This shows that every x ∈ R3 lies in S, and therefore S = R3.
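As a numerical check, the rows of the inverse of the matrix whose columns are the three spanning vectors are exactly the claimed coefficient formulas; a Python sketch (NumPy assumed):

    import numpy as np

    # Columns are the spanning vectors (-1,-3,3), (-1,-4,3), (-1,-1,4).
    M = np.array([[-1.0, -1.0, -1.0],
                  [-3.0, -4.0, -1.0],
                  [ 3.0,  3.0,  4.0]])

    # Row i of M^{-1} gives the formula for alpha_{i+1}:
    # (-13, 1, -3), (9, -1, 2), (3, 0, 1).
    print(np.linalg.inv(M).round(10))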

15. Let V be a vector space over a field F, and let u ∈ V, u ≠ 0, α ∈ F. We wish to prove that sp{u} = sp{u, αu}. First, if x ∈ sp{u}, then x = βu for some β ∈ F, in which case we can write x = βu + 0(αu), which shows that x also belongs to sp{u, αu}. Conversely, if x ∈ sp{u, αu}, then there exist scalars β, γ ∈ F such that x = βu + γ(αu). But then x = (β + γα)u, and therefore x ∈ sp{u}. Thus sp{u} = sp{u, αu}.


2.5 Linear independence

3. Let V be a vector space over a field F, and let u1, . . . , un ∈ V. Suppose ui = 0 for some i, 1 ≤ i ≤ n, and define scalars α1, . . . , αn ∈ F by αk = 0 if k ≠ i, αi = 1. Then α1u1 + · · · + αnun = 0 · u1 + · · · + 0 · ui−1 + 1 · 0 + 0 · ui+1 + · · · + 0 · un = 0, and hence there is a nontrivial solution to α1u1 + · · · + αnun = 0. This shows that {u1, . . . , un} is linearly dependent.

9. We wish to show that {1, x, x2} is linearly dependent in P2(Z2). The equation α1 · 1 + α2x + α3x2 = 0 has the nontrivial solution α1 = 0, α2 = 1, α3 = 1. To verify this, we must simply verify that x + x2 is the zero function in P2(Z2). Substituting x = 0, we obtain 0 + 02 = 0 + 0 = 0, and with x = 1, we obtain 1 + 12 = 1 + 1 = 0.

13. We wish to show that {p1, p2, p3}, where p1(x) = 1 − x2, p2(x) = 1 + x − 6x2, p3(x) = 3 − 2x2, is linearly independent and spans P2. We first verify that the set is linearly independent by solving α1(1 − x2) + α2(1 + x − 6x2) + α3(3 − 2x2) = 0. This equation is equivalent to the system α1 + α2 + 3α3 = 0, α2 = 0, −α1 − 6α2 − 2α3 = 0, and a direct calculation shows that the only solution is α1 = α2 = α3 = 0. To show that the set spans P2, we take an arbitrary p ∈ P2, say p(x) = c0 + c1x + c2x2, and solve α1(1 − x2) + α2(1 + x − 6x2) + α3(3 − 2x2) = c0 + c1x + c2x2. This is equivalent to the system α1 + α2 + 3α3 = c0, α2 = c1, −α1 − 6α2 − 2α3 = c2. There is a unique solution: α1 = −2c0 − 16c1 − 3c2, α2 = c1, α3 = c0 + 5c1 + c2. This shows that p ∈ sp{p1, p2, p3}, and, since p was arbitrary, that {p1, p2, p3} spans all of P2.

17. (a) Let V be a vector space over R, and suppose {x, y, z} is a linearly independent subset of V. We wish to show that {x + y, y + z, x + z} is also linearly independent. Let α1, α2, α3 ∈ R satisfy α1(x + y) + α2(y + z) + α3(x + z) = 0. This equation is equivalent to (α1 + α3)x + (α1 + α2)y + (α2 + α3)z = 0. Since {x, y, z} is linearly independent, it follows that α1 + α3 = α1 + α2 = α2 + α3 = 0. This system can be solved directly to show that α1 = α2 = α3 = 0, which proves that {x + y, y + z, x + z} is linearly independent.

(b) We now show, by example, that the previous result is not necessarily true if V is a vector space over some field F ≠ R. Let V = Z_2^3, and define x = (1, 0, 0), y = (0, 1, 0), and z = (0, 0, 1). Obviously {x, y, z} is linearly independent. On the other hand, we have (x + y) + (y + z) + (x + z) = (1, 1, 0) + (0, 1, 1) + (1, 0, 1) = (1 + 0 + 1, 1 + 1 + 0, 0 + 1 + 1) = (0, 0, 0), which shows that {x + y, y + z, x + z} is linearly dependent.

21. Let V be a vector space over a field F, and suppose {u1, u2, . . . , un} is linearly dependent. We wish to prove that, given any i, 1 ≤ i ≤ n, either ui is a linear combination of u1, . . . , ui−1, ui+1, . . . , un or these vectors form a linearly dependent set. By assumption, there exist scalars α1, . . . , αn ∈ F, not all zero, such that α1u1 + · · · + αiui + · · · + αnun = 0. We now consider two cases. If αi ≠ 0, then we can solve the latter equation for ui to obtain

ui = −αi^−1 α1u1 − · · · − αi^−1 αi−1ui−1 − αi^−1 αi+1ui+1 − · · · − αi^−1 αnun.

In this case, ui is a linear combination of the remaining vectors. The second case is that αi = 0, in which case at least one of α1, . . . , αi−1, αi+1, . . . , αn is nonzero, and we have α1u1 + · · · + αi−1ui−1 + αi+1ui+1 + · · · + αnun = 0. This shows that {u1, . . . , ui−1, ui+1, . . . , un} is linearly dependent.

2.6 Basis and dimension

3. We now repeat the previous exercise for the vectors v1 = (−1, 3, −1), v2 = (1, −2, −2), v3 = (−1, 7, −13). If we try to solve α1v1 + α2v2 + α3v3 = x for an arbitrary x ∈ R3, we find that this equation is equivalent to the following system:

−α1 + α2 − α3 = x1

α2 + 4α3 = 3x1 + x2

0 = 8x1 + 3x2 + x3.

Since this system is inconsistent for most x ∈ R3 (the system is consistent only if x happens to satisfy 8x1 + 3x2 + x3 = 0), {v1, v2, v3} does not span R3 and therefore is not a basis.


7. Consider the subspace S = sp{p1, p2, p3, p4, p5} of P3, where

p1(x) = −1 + 4x− x2 + 3x3, p2(x) = 2− 8x + 2x2 − 5x3,

p3(x) = 3− 11x + 3x2 − 8x3, p4(x) = −2 + 8x− 2x2 − 3x3,

p5(x) = 2− 8x + 2x2 + 3x3.

(a) The set {p1, p2, p3, p4, p5} is linearly dependent (by Theorem 34) because it contains five elements and the dimension of P3 is only four.

(b) As illustrated in Example 39, we begin by solving

α1p1(x) + α2p2(x) + α3p3(x) + α4p4(x) + α5p5(x) = 0;

this is equivalent to the system

−α1 + 2α2 + 3α3 − 2α4 + 2α5 = 0,

4α1 − 8α2 − 11α3 + 8α4 − 8α5 = 0,

−α1 + 2α2 + 3α3 − 2α4 + 2α5 = 0,

3α1 − 5α2 − 8α3 − 3α4 + 3α5 = 0,

which reduces to

α1 = 16α4 − 16α5,

α2 = 9α4 − 9α5,

α3 = 0.

Since there are nontrivial solutions, {p1, p2, p3, p4, p5} is linearly dependent (which we already knew), but we can deduce more than that. By taking α4 = 1, α5 = 0, we see that α1 = 16, α2 = 9, α3 = 0, α4 = 1, α5 = 0 is one solution, which means that

16p1(x) + 9p2(x) + p4(x) = 0 ⇒ p4(x) = −16p1(x)− 9p2(x).

This shows that p4 ∈ sp{p1, p2} ⊂ sp{p1, p2, p3}. Similarly, taking α4 = 0, α5 = 1, we find that

−16p1(x)− 9p2(x) + p5(x) = 0 ⇒ p5(x) = 16p1(x) + 9p2(x),

and hence p5 ∈ sp{p1, p2} ⊂ sp{p1, p2, p3}. It follows from Lemma 19 that sp{p1, p2, p3, p4, p5} = sp{p1, p2, p3}. Our calculations above show that {p1, p2, p3} is linearly independent (if α4 = α5 = 0, then also α1 = α2 = α3 = 0). Therefore, {p1, p2, p3} is a linearly independent spanning set of S and hence a basis for S.

13. Suppose V is a vector space over a field F, and S, T are two n-dimensional subspaces of V. We wish to prove that if S ⊂ T, then in fact S = T. Let {s1, s2, . . . , sn} be a basis for S. Since S ⊂ T, this implies that {s1, s2, . . . , sn} is a linearly independent subset of T. We will now show that {s1, s2, . . . , sn} also spans T. Let t ∈ T be arbitrary. Since T has dimension n, the set {s1, s2, . . . , sn, t} is linearly dependent by Theorem 34. But then, by Lemma 33, t must be a linear combination of s1, s2, . . . , sn (since no sk is a linear combination of s1, s2, . . . , sk−1). This shows that t ∈ sp{s1, s2, . . . , sn}, and hence we have shown that {s1, s2, . . . , sn} is a basis for T. But then

T = sp{s1, s2, . . . , sn} = S,

as desired.


2.7 Properties of bases

1. Consider the following vectors in R3: v1 = (1, 5, 4), v2 = (1, 5, 3), v3 = (17, 85, 56), v4 = (1, 5, 2), v5 = (3, 16, 13).

(a) We wish to show that {v1, v2, v3, v4, v5} spans R3. Given an arbitrary x ∈ R3, the equation

α1v1 + α2v2 + α3v3 + α4v4 + α5v5 = x

is equivalent to the system

α1 + α2 + 17α3 + α4 + 3α5 = x1,

5α1 + 5α2 + 85α3 + 5α4 + 16α5 = x2,

4α1 + 3α2 + 56α3 + 2α4 + 13α5 = x3.

Applying Gaussian elimination, this system reduces to

α1 = 17x1 − 4x2 + x3 − 5α3 + α4,

α2 = x2 − x1 − x3 − 12α3 − 2α4,

α5 = x2 − 5x1.

This shows that there are solutions regardless of the value of x; that is, each x ∈ R3 can be written as a linear combination of v1, v2, v3, v4, v5. Therefore, {v1, v2, v3, v4, v5} spans R3.

(b) Now we wish to find a subset of {v1, v2, v3, v4, v5} that is a basis for R3. According to the calculations given above, each x ∈ R3 can be written as a linear combination of v1, v2, v5 (just take α3 = α4 = 0 in the system solved above). Since dim(R3) = 3, any three vectors spanning R3 form a basis for R3 (by Theorem 45). Hence {v1, v2, v5} is a basis for R3.

5. Let u1 = (1, 4, 0,−5, 1), u2 = (1, 3, 0,−4, 0), u3 = (0, 4, 1, 1, 4) be vectors in R5.

(a) To show that {u1, u2, u3} is linearly independent, we solve the equation α1u1 + α2u2 + α3u3 = 0, which is equivalent to the system

α1 + α2 = 0,

4α1 + 3α2 + 4α3 = 0,

α3 = 0,

−5α1 − 4α2 + α3 = 0,

α1 + 4α3 = 0.

A direct calculation shows that this system has only the trivial solution.

(b) To extend {u1, u2, u3} to a basis for R5, we need two more vectors. We will try u4 = (0, 0, 0, 1, 0) and u5 = (0, 0, 0, 0, 1). We solve α1u1 + α2u2 + α3u3 + α4u4 + α5u5 = 0 and find that the only solution is the trivial one. This implies that {u1, u2, u3, u4, u5} is linearly independent and hence, by Theorem 45, a basis for R5.

9. Consider the vectors u1 = (3, 1, 0, 4) and u2 = (1, 1, 1, 4) in Z_5^4.

(a) It is obvious that {u1, u2} is linearly independent, since neither vector is a multiple of the other.

(b) To extend {u1, u2} to a basis for Z_5^4, we must find vectors u3, u4 such that {u1, u2, u3, u4} is linearly independent. We try u3 = (0, 0, 1, 0) and u4 = (0, 0, 0, 1). A direct calculation then shows that α1u1 + α2u2 + α3u3 + α4u4 = 0 has only the trivial solution. Therefore {u1, u2, u3, u4} is linearly independent and hence, since dim(Z_5^4) = 4, it is a basis for Z_5^4.


15. Let V be a vector space over a field F, and let {u1, . . . , un} be a basis for V. Let v1, . . . , vk be vectors in V, and suppose

vj = α1,ju1 + . . . + αn,jun, j = 1, 2, . . . , k.

Define the vectors x1, . . . , xk in Fn by

xj = (α1,j , . . . , αn,j), j = 1, 2, . . . , k.

(a) We first prove that {v1, . . . , vk} is linearly independent if and only if {x1, . . . , xk} is linearly independent. We will do this by showing that c1v1 + · · · + ckvk = 0 in V is equivalent to c1x1 + · · · + ckxk = 0 in Fn. Then the first equation has only the trivial solution if and only if the second equation does, and the result follows. The proof is a direct manipulation, for which summation notation is convenient:

∑_{j=1}^k cj vj = 0 ⇔ ∑_{j=1}^k cj ( ∑_{i=1}^n αij ui ) = 0

⇔ ∑_{j=1}^k ∑_{i=1}^n cj αij ui = 0

⇔ ∑_{i=1}^n ∑_{j=1}^k cj αij ui = 0

⇔ ∑_{i=1}^n ( ∑_{j=1}^k cj αij ) ui = 0.

Since {u1, . . . , un} is linearly independent, the last equation is equivalent to

∑_{j=1}^k cj αij = 0,  i = 1, 2, . . . , n,

which, by definition of xj and of addition in Fn, is equivalent to

∑_{j=1}^k cj xj = 0.

This completes the proof.

(b) Now we show that {v1, . . . , vk} spans V if and only if {x1, . . . , xk} spans Fn. Since each vector in V can be represented uniquely as a linear combination of u1, . . . , un, there is a one-to-one correspondence between V and Fn:

w = c1u1 + · · · + cnun ∈ V ←→ x = (c1, . . . , cn) ∈ Fn.

Mimicking the manipulations in the first part of the exercise, we see that

∑_{j=1}^k cj vj = w ⇔ ∑_{j=1}^k cj xj = x.

Thus the first equation has a solution for every w ∈ V if and only if the second equation has a solution for every x ∈ Fn. The result follows.


2.8 Polynomial interpolation and the Lagrange basis

1. (a) The Lagrange polynomials for the interpolation nodes x0 = 1, x1 = 2, x2 = 3 are

L0(x) = (x − 2)(x − 3)/((1 − 2)(1 − 3)) = (1/2)(x − 2)(x − 3),

L1(x) = (x − 1)(x − 3)/((2 − 1)(2 − 3)) = −(x − 1)(x − 3),

L2(x) = (x − 1)(x − 2)/((3 − 1)(3 − 2)) = (1/2)(x − 1)(x − 2).

(b) The quadratic polynomial interpolating (1, 0), (2, 2), (3, 1) is

p(x) = 0 · L0(x) + 2L1(x) + L2(x)
     = −2(x − 1)(x − 3) + (1/2)(x − 1)(x − 2)
     = −(3/2)x2 + (13/2)x − 5.
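Both the interpolation conditions and the expanded form of p can be confirmed with a short Python sketch (exact rational arithmetic via the standard library):

    from fractions import Fraction as Fr

    # Lagrange basis for the nodes 1, 2, 3, as computed above.
    L0 = lambda x: Fr(1, 2) * (x - 2) * (x - 3)
    L1 = lambda x: -(x - 1) * (x - 3)
    L2 = lambda x: Fr(1, 2) * (x - 1) * (x - 2)
    p = lambda x: 0 * L0(x) + 2 * L1(x) + L2(x)

    print([p(x) for x in (1, 2, 3)])   # [0, 2, 1]: interpolates the data
    # p agrees with -(3/2)x^2 + (13/2)x - 5 (checked at several points):
    print(all(p(x) == Fr(-3, 2) * x**2 + Fr(13, 2) * x - 5
              for x in range(-5, 6)))  # True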

5. We wish to write p(x) = 2 + x − x2 as a linear combination of the Lagrange polynomials constructed on the nodes x0 = −1, x1 = 1, x2 = 3. The graph of p passes through the points (−1, p(−1)), (1, p(1)), (3, p(3)), that is, (−1, 0), (1, 2), (3, −4). The Lagrange polynomials are

L0(x) = (x − 1)(x − 3)/((−1 − 1)(−1 − 3)) = (1/8)(x − 1)(x − 3),

L1(x) = (x + 1)(x − 3)/((1 + 1)(1 − 3)) = −(1/4)(x + 1)(x − 3),

L2(x) = (x + 1)(x − 1)/((3 + 1)(3 − 1)) = (1/8)(x + 1)(x − 1),

and therefore,

p(x) = 0 · L0(x) + 2L1(x) − 4L2(x)
     = −(1/2)(x + 1)(x − 3) − (1/2)(x + 1)(x − 1).

11. Consider a secret sharing scheme in which five individuals will receive information about the secret, and any two of them, working together, will have access to the secret. Assume that the secret is a two-digit integer, and that p is chosen to be 101. The degree of the polynomial will be one, since then the polynomial will be uniquely determined by two data points. Let us suppose that the secret is N = 42 and we choose the polynomial to be p(x) = N + c1x, where c1 = 71 (recall that c1 is chosen at random). We also choose the five interpolation nodes at random to obtain x1 = 9, x2 = 14, x3 = 39, x4 = 66, and x5 = 81. We then compute

y1 = p(x1) = 42 + 71 · 9 = 75,
y2 = p(x2) = 42 + 71 · 14 = 26,
y3 = p(x3) = 42 + 71 · 39 = 84,
y4 = p(x4) = 42 + 71 · 66 = 82,
y5 = p(x5) = 42 + 71 · 81 = 36

(notice that all arithmetic is done modulo 101). The data points, to be distributed to the five individuals, are (9, 75), (14, 26), (39, 84), (66, 82), (81, 36).
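The arithmetic, and the fact that any two shares recover N, can be verified with a Python sketch (Python 3.8+ assumed for pow(·, −1, m)); recovery is just linear interpolation at x = 0 over Z101:

    p_mod = 101                       # the prime p of the scheme
    N, c1 = 42, 71                    # secret and random coefficient
    shares = [(x, (N + c1 * x) % p_mod) for x in (9, 14, 39, 66, 81)]
    print(shares)  # [(9, 75), (14, 26), (39, 84), (66, 82), (81, 36)]

    def recover(s1, s2, m=p_mod):
        # Lagrange interpolation at 0: p(0) = (y1*x2 - y2*x1)/(x2 - x1) mod m.
        (x1, y1), (x2, y2) = s1, s2
        return (y1 * x2 - y2 * x1) * pow(x2 - x1, -1, m) % m

    print(recover(shares[0], shares[3]))  # 42, from shares (9,75) and (66,82)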


2.9 Continuous piecewise polynomial functions

1. The following table shows the maximum errors obtained in approximating f(x) = ex on the interval [0, 1] by polynomial interpolation and by piecewise linear interpolation, each on a uniform grid with n + 1 nodes.

 n    Poly. interp. err.    PW linear interp. err.
 1    2.1187 · 10^−1        2.1187 · 10^−1
 2    1.4420 · 10^−2        6.6617 · 10^−2
 3    9.2390 · 10^−4        3.2055 · 10^−2
 4    5.2657 · 10^−5        1.8774 · 10^−2
 5    2.6548 · 10^−6        1.2312 · 10^−2
 6    1.1921 · 10^−7        8.6902 · 10^−3
 7    4.8075 · 10^−9        6.4596 · 10^−3
 8    1.7565 · 10^−10       4.9892 · 10^−3
 9    5.8575 · 10^−12       3.9692 · 10^−3
10    1.8119 · 10^−13       3.2328 · 10^−3

For this example, polynomial interpolation is very effective.
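The table can be reproduced with a Python sketch (NumPy assumed; the grid is taken to have n + 1 nodes, so the polynomial interpolant has degree n, which matches the n = 1 row):

    import numpy as np

    f = np.exp
    t = np.linspace(0.0, 1.0, 20001)         # fine grid for the max error
    for n in range(1, 11):
        x = np.linspace(0.0, 1.0, n + 1)     # n + 1 uniform nodes
        coef = np.polyfit(x, f(x), n)        # degree-n interpolant
        e_poly = np.max(np.abs(f(t) - np.polyval(coef, t)))
        e_pw = np.max(np.abs(f(t) - np.interp(t, x, f(x))))
        print(f"{n:2d}   {e_poly:.4e}   {e_pw:.4e}")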


Chapter 3

Linear operators

3.1 Linear operators

1. Let m, b be real numbers, and let f : R → R be defined by f(x) = mx + b. If b is nonzero, then this function is not linear. For example,

f(2 · 1) = f(2) = 2m + b,

2f(1) = 2(m + b) = 2m + 2b.

Since b ≠ 0, 2m + b ≠ 2m + 2b, which shows that f is not linear. On the other hand, if b = 0, then f is linear:

f(x + y) = m(x + y) = mx + my = f(x) + f(y) for all x, y ∈ R,

f(ax) = m(ax) = a(mx) = af(x) for all a, x ∈ R.

Thus we see that f(x) = mx + b is linear if and only if b = 0.

5. We wish to determine which of the following real-valued functions defined on Rn are linear:

(a) f : Rn → R, f(x) = ∑_{i=1}^n xi;
(b) g : Rn → R, g(x) = ∑_{i=1}^n |xi|;
(c) h : Rn → R, h(x) = ∏_{i=1}^n xi.

The function f is linear:

f(x + y) = ∑_{i=1}^n (x + y)i = ∑_{i=1}^n (xi + yi) = ∑_{i=1}^n xi + ∑_{i=1}^n yi = f(x) + f(y),

f(αx) = ∑_{i=1}^n (αx)i = ∑_{i=1}^n αxi = α ∑_{i=1}^n xi = αf(x).

However, g and h are both nonlinear. For instance, if x ≠ 0, then g(−x) ≠ −g(x) (in fact, g(−x) = g(x) for all x ∈ Rn). Also, if no component of x is zero, then h(2x) = 2^n h(x) ≠ 2h(x) (of course, we are assuming n > 1).

9. (a) If A ∈ C2×3, x ∈ C3 are defined by

A = [ 1 + i   1 − i   2i ]      x = [ 3      ]
    [ 2 − i   1 + 2i   3 ]          [ 2 + i  ]
                                    [ 1 − 3i ]

then

Ax = [ 12 + 4i ]
     [ 9 − 7i  ]

(b) If A ∈ Z_2^{3×3}, x ∈ Z_2^3 are defined by

A = [ 1 1 0 ]      x = [ 1 ]
    [ 1 0 1 ]          [ 1 ]
    [ 0 1 1 ]          [ 1 ]

then

Ax = [ 0 ]
     [ 0 ]
     [ 0 ]

13. We now wish to give a formula for the ith row of AB, assuming A ∈ Fm×n, B ∈ Fn×p. As pointed out in the text, we have a standard notation for the columns of a matrix, but no standard notation for the rows. Let us suppose that the rows of B are r1, r2, . . . , rn. Then

(AB)ij = ∑_{k=1}^n Aik (rk)j = ( ∑_{k=1}^n Aik rk )j

(once again using the componentwise definition of the operations). This shows that the ith row of AB is

∑_{k=1}^n Aik rk,

that is, the linear combination of the rows of B, with the weights taken from the ith row of A.
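This identity is easy to test numerically; a Python sketch (NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, size=(2, 3))
    B = rng.integers(-3, 4, size=(3, 4))

    i = 1  # compare the ith row of AB with the combination of rows of B
    row = sum(A[i, k] * B[k, :] for k in range(B.shape[0]))
    print(np.array_equal(row, (A @ B)[i, :]))  # True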

3.2 More properties of linear operators

3. Consider the linear operator mapping R2 into itself that sends each vector (x, y) to its projection onto the x-axis, namely, (x, 0). We see that e1 is mapped to itself and e2 = (0, 1) is mapped to (0, 0). Therefore, the matrix representing the linear operator is

A = [ 1 0 ]
    [ 0 0 ]

9. Let x ∈ RN be denoted as x = (x0, x1, . . . , xN−1). Given x, y ∈ RN, the convolution of x and y is the vector x ∗ y ∈ RN defined by

(x ∗ y)n = ∑_{m=0}^{N−1} xm y_{n−m},  n = 0, 1, . . . , N − 1.

In this formula, y is regarded as defining a periodic vector of period N; therefore, if n − m < 0, we take y_{n−m} = y_{N+n−m}. The linearity of the convolution operator is obvious:

((x + z) ∗ y)n = ∑_{m=0}^{N−1} (x + z)m y_{n−m} = ∑_{m=0}^{N−1} (xm + zm) y_{n−m}
              = ∑_{m=0}^{N−1} (xm y_{n−m} + zm y_{n−m})
              = ∑_{m=0}^{N−1} xm y_{n−m} + ∑_{m=0}^{N−1} zm y_{n−m}
              = (x ∗ y)n + (z ∗ y)n,

((αx) ∗ y)n = ∑_{m=0}^{N−1} (αx)m y_{n−m} = ∑_{m=0}^{N−1} (αxm) y_{n−m} = ∑_{m=0}^{N−1} α(xm y_{n−m})
            = α ∑_{m=0}^{N−1} xm y_{n−m} = α(x ∗ y)n.

Therefore, if y is fixed and F : RN → RN is defined by F(x) = x ∗ y, then F(x + z) = F(x) + F(z) for all x, z ∈ RN and F(αx) = αF(x) for all x ∈ RN, α ∈ R. This proves that F is linear.

Next, notice that, if ek is the kth standard basis vector, then

F(ek)n = y_{n−k},  n = 0, 1, . . . , N − 1.

It follows that

F(e0) = (y0, y1, . . . , y_{N−1}),
F(e1) = (y_{N−1}, y0, . . . , y_{N−2}),
F(e2) = (y_{N−2}, y_{N−1}, . . . , y_{N−3}),
. . .
F(e_{N−1}) = (y1, y2, . . . , y0).

Therefore, F(x) = Ax for all x ∈ RN, where

A = [ y0        y_{N−1}   y_{N−2}   · · ·   y1 ]
    [ y1        y0        y_{N−1}   · · ·   y2 ]
    [ y2        y1        y0        · · ·   y3 ]
    [ . . .                                    ]
    [ y_{N−1}   y_{N−2}   y_{N−3}   · · ·   y0 ]
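A Python sketch (NumPy assumed) verifying that the circulant matrix built from y reproduces the cyclic convolution:

    import numpy as np

    N = 5
    rng = np.random.default_rng(1)
    x, y = rng.standard_normal(N), rng.standard_normal(N)

    # Direct cyclic convolution, indices taken mod N as in the text.
    conv = np.array([sum(x[m] * y[(n - m) % N] for m in range(N))
                     for n in range(N)])
    # Circulant matrix: A[n, m] = y_{(n-m) mod N}.
    A = np.array([[y[(n - m) % N] for m in range(N)] for n in range(N)])
    print(np.allclose(conv, A @ x))  # True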

3.3 Isomorphic vector spaces

1. (a) The function f : R → R, f(x) = 2x + 1 is invertible since f(x) = y has a unique solution for each y ∈ R:

f(x) = y ⇔ 2x + 1 = y ⇔ x = (y − 1)/2.

We see that f−1(y) = (y − 1)/2.

(b) The function f : R → (0, ∞), f(x) = ex is also invertible, and the inverse is f−1(y) = ln(y):

e^{ln(y)} = y for all y ∈ (0, ∞),  ln(ex) = x for all x ∈ R.


(c) The function f : R2 → R2, f(x) = (x1 + x2, x1 − x2) is invertible since f(x) = y has a unique solution for each y ∈ R2:

f(x) = y ⇔ x1 + x2 = y1, x1 − x2 = y2 ⇔ x1 = (y1 + y2)/2, x2 = (y1 − y2)/2.

We see that

f−1(y) = ( (y1 + y2)/2, (y1 − y2)/2 ).

(d) The function f : R2 → R2, f(x) = (x1 − 2x2, −2x1 + 4x2) fails to be invertible, and in fact is neither injective nor surjective. For example, f(0) = 0 and also f((2, 1)) = (0, 0) = 0, which shows that f is not injective. The equation f(x) = (1, 1) has no solution:

f(x) = (1, 1) ⇔ x1 − 2x2 = 1, −2x1 + 4x2 = 1 ⇔ x1 − 2x2 = 1, 0 = 3.

This shows that f is not surjective.

5. Let X, Y, and Z be sets, and suppose f : X → Y, g : Y → Z are given functions.

(a) If f and g ∘ f are invertible, then g is invertible. We can prove this using the previous exercise and the fact that f−1 is invertible. We have g = (g ∘ f) ∘ f−1:

((g ∘ f) ∘ f−1)(y) = g(f(f−1(y))) = g(y) for all y ∈ Y.

Therefore, since g ∘ f and f−1 are invertible, the previous exercise implies that g is invertible.

(b) Similarly, if g and g ∘ f are invertible, then f is invertible since f = g−1 ∘ (g ∘ f).

(c) If we merely know that g ∘ f is invertible, we cannot conclude that either f or g is invertible. For example, let f : R → R2, g : R2 → R be defined by f(x) = (x, 0) for all x ∈ R and g(y) = y1 + y2 for all y ∈ R2, respectively. Then

(g ∘ f)(x) = g(f(x)) = g((x, 0)) = x + 0 = x,

and it is obvious that g ∘ f is invertible (in fact, (g ∘ f)−1 = g ∘ f). However, f is not surjective and g is not injective, so neither is invertible.

13. Let X be the basis for Z_2^3 from the previous exercise, let A ∈ Z_2^{3×3} be defined by

A = [ 1 1 0 ]
    [ 1 0 1 ]
    [ 0 1 1 ]

and define L : Z_2^3 → Z_2^3 by L(x) = Ax. We wish to find [L]X,X. The columns of this matrix are [L(x1)]X, [L(x2)]X, [L(x3)]X. We have

[L(x1)]X = [(1, 1, 0)]X = (0, 1, 0),
[L(x2)]X = [(0, 1, 1)]X = (1, 0, 1),
[L(x3)]X = [(0, 0, 0)]X = (0, 0, 0),

and hence

[L]X,X = [ 0 1 0 ]
         [ 1 0 0 ]
         [ 0 1 0 ]
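The basis X is not restated here, but the coordinate vectors above are consistent only with X = {(1, 0, 0), (1, 1, 0), (1, 1, 1)}; under that assumption, the result can be checked as a mod-2 similarity transform P−1AP, where the columns of P are x1, x2, x3 (a Python sketch, NumPy assumed):

    import numpy as np

    # Assumes X = {(1,0,0), (1,1,0), (1,1,1)}, as forced by the
    # coordinate vectors computed above.
    A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
    P = np.array([[1, 1, 1], [0, 1, 1], [0, 0, 1]])      # columns x1, x2, x3
    P_inv = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])  # P @ P_inv = I mod 2

    print((P @ P_inv) % 2)      # identity matrix
    print((P_inv @ A @ P) % 2)  # [[0 1 0] [1 0 0] [0 1 0]] = [L]_{X,X}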

19. We wish to determine if the operator D defined in Example 79 is an isomorphism. In fact, D is not an isomorphism since it is not injective. For example, if p(x) = x and q(x) = x + 1, then p ≠ q but D(p) = D(q).


3.4 Linear operator equations

1. Suppose L : R3 → R3 is linear, b ∈ R3 is given, and y = (1, 0, 1), z = (1, 1, −1) are two solutions to L(x) = b. We are asked to find two more solutions to L(x) = b. We know that y − z = (0, −1, 2) satisfies L(y − z) = 0. Therefore, z + α(y − z) satisfies L(z + α(y − z)) = L(z) + αL(y − z) = b + α · 0 = b for all α ∈ R. Thus two more solutions of L(x) = b are

z + 2(y − z) = (1, 1,−1) + (0,−2, 4) = (1,−1, 3),

z + 3(y − z) = (1, 1,−1) + (0,−3, 6) = (1,−2, 5).

5. Let L : R3 → R3 satisfy ker(L) = sp{(1, 1, 1)} and L(u) = v, where u = (1, 1, 0) and v = (2, −1, 2). A vector x satisfies L(x) = v if and only if there exists α such that x = u + αz (where z = (1, 1, 1)), that is, if and only if x − u = αz for some α ∈ R.

(a) For x = (1, 2, 1), x − u = (0, 1, 1), which is not a multiple of z. Thus L(x) ≠ v.

(b) For x = (3, 3, 2), x − u = (2, 2, 2) = 2z. Thus L(x) = v.

(c) For x = (−3, −3, −2), x − u = (−4, −4, −2), which is not a multiple of z. Thus L(x) ≠ v.

7. Let A ∈ Z_2^{3×3}, b ∈ Z_2^3 be defined by

A = [ 1 1 0 ]      b = [ 1 ]
    [ 1 0 1 ]          [ 0 ]
    [ 0 1 1 ]          [ 1 ]

If we solve Ax = 0 directly, we obtain two solutions: x = (0, 0, 0) or x = (1, 1, 1). Thus the solution space of Ax = 0 is {(0, 0, 0), (1, 1, 1)}. If we solve Ax = b, we also obtain two solutions: x = (0, 1, 0) or x = (1, 0, 1). If L : Z_2^3 → Z_2^3 is the linear operator defined by L(x) = Ax for all x ∈ Z_2^3, then ker(L) = {(0, 0, 0), (1, 1, 1)} and the solution set of L(x) = b is {(0, 1, 0), (1, 0, 1)} = (0, 1, 0) + ker(L), as predicted by Theorem 90.
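Since Z_2^3 has only eight elements, the claim is easy to verify exhaustively (a Python sketch):

    from itertools import product

    A = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
    b = (1, 0, 1)

    def mat_vec(A, x):  # matrix-vector product over Z_2
        return tuple(sum(a * xi for a, xi in zip(row, x)) % 2 for row in A)

    vecs = list(product((0, 1), repeat=3))
    print([x for x in vecs if mat_vec(A, x) == (0, 0, 0)])
    # [(0, 0, 0), (1, 1, 1)] = ker(L)
    print([x for x in vecs if mat_vec(A, x) == b])
    # [(0, 1, 0), (1, 0, 1)] = (0, 1, 0) + ker(L)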

3.5 Existence and uniqueness of solutions

3. Each part of this exercise describes an operator with certain properties. We wish to determine if such an operator can exist.

(a) We wish to find a linear operator T : R3 → R2 such that T(x) = b has a solution for all b ∈ R2. Any surjective operator will do, such as T : R3 → R2 defined by T(x) = (x1, x2).

(b) No linear operator T : R2 → R3 has the property that T(x) = b has a solution for all b ∈ R3. This would imply that T is surjective, but then Theorem 99 would imply that dim(R3) ≤ dim(R2), which is obviously not true.

(c) Every linear operator T : R3 → R2 has the property that, for some b ∈ R2, the equation T(x) = b has infinitely many solutions. In fact, since dim(R2) < dim(R3), any such T : R3 → R2 fails to be injective and hence has a nontrivial kernel. Therefore, T(x) = 0 necessarily has infinitely many solutions.

(d) We wish to construct a linear operator T : R2 → R3 such that, for some b ∈ R3, the equation T(x) = b has infinitely many solutions. We will take T defined by T(x) = (x1 − x2, x1 − x2, 0). Then, for every α ∈ R, x = (α, α) satisfies T(x) = 0.

(e) We wish to find a linear operator T : R2 → R3 with the property that T(x) = b does not have a solution for all b ∈ R3, but when there is a solution, it is unique. Any nonsingular T will do, since R(T) is necessarily a proper subspace of R3 (that is, T : R2 → R3 cannot be surjective), and nonsingularity implies that T(x) = b has a unique solution for each b ∈ R(T) (by Definition 96 and Theorem 92). For example, T : R2 → R3 defined by T(x) = (x1, x2, 0) has the desired properties.


7. Define S : Pn → Pn by S(p)(x) = p(2x + 1). We wish to find the rank and nullity of S. We first find the kernel of S. We have that S(p) = 0 if and only if p(2x + 1) = 0 for all x ∈ R. Consider an arbitrary y ∈ R and define x = (y − 1)/2. Then

p(2x + 1) = 0 ⇒ p( 2 · (y − 1)/2 + 1 ) = 0 ⇒ p(y) = 0.

Since p(y) = 0 for all y ∈ R, it follows that p must be the zero polynomial, and hence the kernel of S is trivial. Thus nullity(S) = 0.

Now consider any q ∈ Pn, and define p ∈ Pn by p(x) = q((x − 1)/2). Then

(S(p))(x) = p(2x + 1) = q( ((2x + 1) − 1)/2 ) = q(x).

This shows that S is surjective, and hence rank(S) = dim(Pn) = n + 1.
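Equivalently, in the basis {1, x, . . . , xn}, column j of the matrix of S holds the coefficients of (2x + 1)^j; that matrix is upper triangular with diagonal 2^0, 2^1, . . . , 2^n, hence invertible. A Python sketch (NumPy assumed):

    import numpy as np
    from math import comb

    # Matrix of S(p)(x) = p(2x+1) on P_n in the basis {1, x, ..., x^n}:
    # (2x+1)^j = sum_i C(j,i) 2^i x^i, so M[i, j] = C(j,i) * 2^i for i <= j.
    n = 4
    M = np.array([[comb(j, i) * 2**i if i <= j else 0
                   for j in range(n + 1)] for i in range(n + 1)], dtype=float)

    # det(M) = 2^(0+1+...+n) != 0, so nullity(S) = 0 and rank(S) = n + 1.
    print(np.linalg.det(M), 2.0 ** (n * (n + 1) // 2))  # ~1024.0 and 1024.0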

13. (a) Suppose X and U are finite-dimensional vector spaces over a field F and T : X → U is an injective linear operator. Then R(T) is a subspace of U, and we can define T1 : X → R(T) by T1(x) = T(x) for all x ∈ X. Obviously T1 is surjective, and it is injective since T is injective. Thus T1 is an isomorphism between X and R(T).

(b) Consider the operator S : Pn−1 → Pn of Example 100 (and the previous exercise), which we have seen to be injective. By the previous part of this exercise, S defines an isomorphism between Pn−1 and R(S) = sp{x, x2, . . . , xn} ⊂ Pn.

3.6 The fundamental theorem; inverse operators

3. Let F be a field, let A ∈ Fn×n, and let T : Fn → Fn be defined by T(x) = Ax. We first wish to show that A is invertible if and only if T is invertible. We begin by noting that if M ∈ Fn×n and Mx = x for all x ∈ Fn, then M = I. This follows because each linear operator mapping Fn into Fn is represented by a unique matrix (with respect to the standard basis on Fn). The condition Mx = x for all x ∈ Fn shows that M represents the identity operator, as does the identity matrix I; hence M = I.

Now suppose that T is invertible, and let B be the matrix of T−1 under the standard basis. We then have

(AB)x = A(Bx) = T(T−1(x)) = x for all x ∈ Fn ⇒ AB = I

and

(BA)x = B(Ax) = T−1(T(x)) = x for all x ∈ Fn ⇒ BA = I.

This shows that A is invertible and that B = A−1.

Conversely, suppose that A is invertible, and define S : Fn → Fn by S(x) = A−1x. Then

S(T(x)) = A−1(Ax) = (A−1A)x = Ix = x for all x ∈ Fn

and

T(S(x)) = A(A−1x) = (AA−1)x = Ix = x for all x ∈ Fn.

This shows that T is invertible and that S = T−1.

Notice that the above also shows that if A is invertible, then A−1 is the matrix defining T−1.

7. We repeat the previous exercise for the operators defined below.

(a) M : R2 → R3 defined by M(x) = Ax, where

A = [ 1 1 ]
    [ 1 0 ]
    [ 0 1 ]

Since the dimension of the domain is less than the dimension of the co-domain, Theorem 99 implies that M is not surjective. The range of M is spanned by the columns of A (which are linearly independent), so

R(M) = sp{(1, 1, 0), (1, 0, 1)}

and rank(M) = 2. By the fundamental theorem, we see that nullity(M) = 0; thus ker(M) is trivial and M is injective.

(b) M : R3 → R2 defined by M(x) = Ax, where

A = [ 1 2  1 ]
    [ 1 0 −1 ]

Since the dimension of the domain is greater than the dimension of the co-domain, Theorem 93 implies that M cannot be injective. A direct calculation shows that ker(M) = sp{(1, −1, 1)}, and hence nullity(M) = 1. By the fundamental theorem, we see that rank(M) = 2 = dim(R2). Hence M must be surjective.

13. Let F be a field and suppose A ∈ Fm×n, B ∈ Fn×p. We wish to show that rank(AB) ≤ rank(A). For all x ∈ Fp, we have (AB)x = A(Bx), where Bx ∈ Fn. It follows that (AB)x ∈ col(A) for all x ∈ Fp, which shows that col(AB) ⊂ col(A). Hence, by Exercise 2.6.13, dim(col(AB)) ≤ dim(col(A)), that is, rank(AB) ≤ rank(A).

15. Suppose A ∈ Cn×n is strictly diagonally dominant:

|Aii| > ∑_{j=1, j≠i}^n |Aij|,  i = 1, 2, . . . , n.

We wish to prove that A is nonsingular. Suppose x ∈ Cn satisfies x ≠ 0, Ax = 0. Let i be an index with the property that |xi| ≥ |xk| for all k = 1, 2, . . . , n (so that |xi| > 0). Then

(Ax)i = ∑_{j=1}^n Aij xj = Aii xi + ∑_{j=1, j≠i}^n Aij xj.

Since (Ax)i = 0, we obtain

Aii xi = − ∑_{j=1, j≠i}^n Aij xj.

But

| − ∑_{j=1, j≠i}^n Aij xj | ≤ ∑_{j=1, j≠i}^n |Aij||xj| ≤ ∑_{j=1, j≠i}^n |Aij||xi| = |xi| ∑_{j=1, j≠i}^n |Aij| < |xi||Aii|.

This is a contradiction, which shows that x cannot be nonzero. Thus the only solution of Ax = 0 is x = 0, which shows that A must be nonsingular.
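A quick empirical illustration (a Python sketch, NumPy assumed): matrices forced to be strictly diagonally dominant always come out with full rank, as the argument above guarantees.

    import numpy as np

    rng = np.random.default_rng(2)
    for _ in range(5):
        A = rng.standard_normal((6, 6))
        off = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
        s = np.where(np.diag(A) >= 0, 1.0, -1.0)
        A[np.arange(6), np.arange(6)] = s * (off + 1.0)  # |A_ii| > sum of row
        print(np.linalg.matrix_rank(A))                  # 6 every time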

3.7 Gaussian elimination

3. The matrix A has no inverse; its rank is only 2.

7. The matrix A is not invertible; its rank is only 2.


3.8 Newton’s method

1. The two solutions of (3.13) are

( (−1 + √5)/2, (−1 + √5)/2 )  and  ( (−1 − √5)/2, (−1 − √5)/2 ).

3. We apply Newton’s method to solve F(x) = 0, where F : R3 → R3 is defined by

F(x) = [ x1^2 + x2^2 + x3^2 − 1 ]
       [ x1^2 + x2^2 − x3       ]
       [ 3x1 + x2 + 3x3         ]

There are two solutions:

(−0.721840, 0.311418, 0.618034),  (−0.390621, −0.682238, 0.618034)

(to six digits).
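A Python sketch of the Newton iteration (NumPy assumed; starting guesses chosen near each root):

    import numpy as np

    def F(x):
        x1, x2, x3 = x
        return np.array([x1**2 + x2**2 + x3**2 - 1.0,
                         x1**2 + x2**2 - x3,
                         3.0 * x1 + x2 + 3.0 * x3])

    def J(x):  # Jacobian of F
        x1, x2, x3 = x
        return np.array([[2*x1, 2*x2, 2*x3],
                         [2*x1, 2*x2, -1.0],
                         [3.0,  1.0,  3.0]])

    for x in (np.array([-0.7, 0.3, 0.6]), np.array([-0.4, -0.7, 0.6])):
        for _ in range(20):
            x = x - np.linalg.solve(J(x), F(x))
        print(x.round(6))  # the two roots reported above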

3.9 Linear ordinary differential equations

1. If x1(t) = e^{rt}, x2(t) = te^{rt}, where r ∈ R, then the Wronskian matrix is

W = [ e^{rt0}     t0 e^{rt0}             ]
    [ r e^{rt0}   e^{rt0} + r t0 e^{rt0} ]

Choosing t0 = 0, we obtain

W = [ 1 0 ]
    [ r 1 ]

which is obviously nonsingular. Thus {e^{rt}, te^{rt}} is a linearly independent subset of C(R).

7. Consider the set {x1, x2, x3} ⊂ C(R), where x1(t) = t, x2(t) = t2, x3(t) = t3.

(a) The Wronskian matrix of x1, x2, x3 at t0 = 1 is

W = [ 1 1 1 ]
    [ 1 2 3 ]
    [ 0 2 6 ]

A direct calculation shows that ker(W) = {0}, and hence W is nonsingular. By Theorem 129, this implies that {x1, x2, x3} is linearly independent.

(b) The Wronskian matrix of x1, x2, x3 at t0 = 0 is

W = [ 0 0 0 ]
    [ 1 0 0 ]
    [ 0 2 0 ]

which is obviously singular.

Theorem 130 states that, if x1, x2, x3 are all solutions of a third-order linear ODE, then W is nonsingular for all t0 if and only if {x1, x2, x3} is linearly independent. In this example, {x1, x2, x3} is linearly independent, but W is singular for t0 = 0. This does not violate Theorem 130, because x1, x2, x3 are not solutions of a common third-order ODE.


3.10 Graph theory

1. Let G be a graph, and let vi, vj be two nodes in VG. Since (A_G^ℓ)ij is the number of walks of length ℓ joining vi and vj, the distance between vi and vj is the smallest ℓ (ℓ = 1, 2, 3, . . .) such that (A_G^ℓ)ij ≠ 0.

3. Let G be a graph, and let AG be the adjacency matrix of G. We wish to prove that (A_G^2)ii is the degree of vi for each i = 1, 2, . . . , n. Since AG is symmetric ((AG)ij = (AG)ji for all i, j = 1, 2, . . . , n), we have

(A_G^2)ii = ∑_{j=1}^n (AG)ij^2.

Now, (AG)ij^2 is 1 if an edge joins vi and vj, and it equals 0 otherwise. Thus (A_G^2)ii counts the number of edges having vi as an endpoint.
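Both facts are easy to check on a small example, say the path graph v1 − v2 − v3 − v4 (a Python sketch, NumPy assumed):

    import numpy as np

    A = np.array([[0, 1, 0, 0],      # adjacency matrix of the path
                  [1, 0, 1, 0],      # v1 - v2 - v3 - v4
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]])

    print(np.diag(A @ A))            # [1 2 2 1]: the degrees (Exercise 3)

    P = np.eye(4, dtype=int)         # distance(v1, v4): smallest l with
    for l in range(1, 5):            # (A^l)_{14} != 0 (Exercise 1)
        P = P @ A
        if P[0, 3] != 0:
            print("distance =", l)   # prints 3
            break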

3.11 Coding theory

3. The following message is received.

010110 000101 010110 010110 010001 100100 010110 010001

It is known that the code of Example 141 is used. Let B be the 8 × 6 binary matrix whose rows are the above codewords. If we try to solve MG = B, we obtain

M = [ 0 1 0 1 ]
    [ x x x x ]
    [ 0 1 0 1 ]
    [ 0 1 0 1 ]
    [ 0 1 0 0 ]
    [ 1 0 0 1 ]
    [ 0 1 0 1 ]
    [ 0 1 0 0 ]

where the second “codeword” cannot be decoded because it is not a true codeword (that is, mG = b2 is inconsistent, where b2 = 000101). In fact, both 000111 (= mG for m = 0001) and 001101 (= mG for m = 0011) are distance 1 from b2. This means that the first ASCII character, with code 0101xxxx, could be 01010001 (Q) or 01010011 (S). The remaining characters are 01010101 (U), 01001001 (I), 01010100 (T), so the message is either “QUIT” or “SUIT”.
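The generator matrix G of Example 141 is not reproduced in this solution, but it is pinned down by the pairs quoted above (0001 → 000111, 0011 → 001101, 0100 → 010001, 1001 → 100100); taking that G, a brute-force nearest-codeword decoder (a Python sketch) reproduces the ambiguity in the second block:

    from itertools import product

    # Generator matrix consistent with the decoded pairs above.
    G = [[1, 0, 0, 0, 1, 1],
         [0, 1, 0, 0, 0, 1],
         [0, 0, 1, 0, 1, 0],
         [0, 0, 0, 1, 1, 1]]

    def encode(m):  # codeword mG over Z_2
        return tuple(sum(mi * gi for mi, gi in zip(m, col)) % 2
                     for col in zip(*G))

    book = {m: encode(m) for m in product((0, 1), repeat=4)}
    dist = lambda a, b: sum(u != v for u, v in zip(a, b))

    for block in ("010110", "000101", "010001", "100100"):
        r = tuple(int(c) for c in block)
        d = min(dist(r, c) for c in book.values())
        print(block, "->", [m for m, c in book.items() if dist(r, c) == d])
    # 000101 -> [(0, 0, 0, 1), (0, 0, 1, 1)]: the ambiguous block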

3.12 Linear programming

5. The LP is unbounded; every point of the form (x1, x2) = (3 + t, t/3), t ≥ 0, is feasible, and for such points z increases without bound as t → ∞.

11. If we apply the simplex method to the LP of Example 158, using the smallest subscript rule to choose both the entering and leaving variables, the method terminates after 7 iterations with an optimal solution of x = (1, 0, 1, 0) and z = 1. The basic variables change as follows:

{x5, x6, x7} → {x1, x6, x7} → {x1, x2, x7} → {x3, x2, x7} → {x3, x4, x7} → {x5, x4, x7} → {x5, x1, x7} → {x5, x1, x3}.


Chapter 4

Determinants and eigenvalues

4.1 The determinant function

3. Define

A = [ a 0 0 0 ]      B = [ a b c d ]
    [ 0 b 0 0 ]          [ 0 e f g ]
    [ 0 0 c 0 ]          [ 0 0 h i ]
    [ 0 0 0 d ]          [ 0 0 0 j ]

Then

det(A) = det(ae1, be2, ce3, de4) = a det(e1, be2, ce3, de4)
       = ab det(e1, e2, ce3, de4)
       = abc det(e1, e2, e3, de4)
       = abcd det(e1, e2, e3, e4)
       = abcd.

Here we have repeatedly used the second part of the definition of the determinant. To compute det(B), we note that we can add any multiple of one column to another without changing the determinant (the third part of the definition of the determinant). We have

det(B) = det(ae1, be1 + ee2, ce1 + fe2 + he3, de1 + ge2 + ie3 + je4)
       = a det(e1, be1 + ee2, ce1 + fe2 + he3, de1 + ge2 + ie3 + je4)
       = a det(e1, ee2, fe2 + he3, ge2 + ie3 + je4)
       = ae det(e1, e2, fe2 + he3, ge2 + ie3 + je4)
       = ae det(e1, e2, he3, ie3 + je4)
       = aeh det(e1, e2, e3, ie3 + je4)
       = aeh det(e1, e2, e3, je4)
       = aehj det(e1, e2, e3, e4) = aehj.
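Both conclusions are easy to spot-check numerically (a Python sketch, NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(3)
    a, b, c, d, e, f, g, h, i, j = rng.standard_normal(10)

    A = np.diag([a, b, c, d])
    B = np.array([[a, b, c, d],
                  [0, e, f, g],
                  [0, 0, h, i],
                  [0, 0, 0, j]])
    print(np.isclose(np.linalg.det(A), a * b * c * d),
          np.isclose(np.linalg.det(B), a * e * h * j))  # True True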

5. Consider the permutation τ = (4, 3, 2, 1) ∈ S4. We can write

τ = [2, 3][1, 4],

τ = [3, 4][2, 4][2, 3][1, 4][1, 3][1, 2].

Since τ is the product of an even number of transpositions, we see that σ(τ) = 1.
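The sign can also be computed by counting inversions, since each transposition changes the parity of the inversion count (a Python sketch):

    def sign(tau):
        # sigma(tau) = (-1)^(number of inversions)
        n = len(tau)
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if tau[i] > tau[j])
        return (-1) ** inv

    print(sign((4, 3, 2, 1)))  # 1, agreeing with the factorizations above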

25

Page 28: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

26 CHAPTER 4. DETERMINANTS AND EIGENVALUES

11. Let n be a positive integer, and let i and j be integers satisfying

1 ≤ i, j ≤ n, i 6= j.

For any τ ∈ Sn, define τ ′ by τ ′ = τ [i, j] (that is, τ ′ is the composition of τ and the transposition [i, j].Finally, define f : Sn → Sn by f(τ) = τ ′. We wish to prove that f is a bijection. First, let γ ∈ Sn, anddefine θ ∈ Sn by θ = γ[i, j]. Then f(θ) = (γ[i, j])[i, j] = γ([i, j][i, j]). It is obvious that [i, j][i, j] is theidentity permutation, and hence f(θ) = γ. This shows that f is surjective. Similarly, if θ1, θ2 ∈ Sn andf(θ1) = f(θ2), then θ1[i, j] = θ2[i, j]. But then

θ1[i, j] = θ2[i, j] ⇒ (θ1[i, j])[i, j] = (θ2[i, j])[i, j]

⇒ θ1([i, j][i, j]) = θ2([i, j][i, j]) ⇒ θ1 = θ2.

This shows that f is injective, and hence bijective.

4.2 Further properties of the determinant function

3. Let F be a field and let A ∈ Fn×n. We wish to show that AT A is singular if and only if A is singular.We have

det(AT A) = det(AT )det(A) = det(A)2,

since det(AT ) = det(A). It follows that det(AT A) = 0 if and only if det(A) = 0, that is, AT A is singularif and only if A is singular.

9. Let A ∈ Fm×n, B ∈ Fn×m, where m < n. We will show by example that both det(AB) = 0 anddet(AB) 6= 0 are possible. First, note that

A =

[

1 2 11 0 1

]

, B =

1 11 11 1

⇒ AB =

[

4 42 2

]

⇒ det(AB) = 4 · 2− 2 · 4 = 0.

On the other hand,

A =

[

1 2 11 0 1

]

, B =

1 12 01 1

⇒ AB =

[

6 22 2

]

⇒ det(AB) = 6 · 2− 2 · 2 = 8.

4.3 Practical computation of det(A)

7. Suppose A ∈ Rn×n is invertible and has integer entries, and assume det(A) = ±1. Notice that thedeterminant of a matrix with integer entries is obviously an integer. We can compute the jth column ofA−1 by solving Ax = ej. By Cramer’s rule,

(A−1)ij =det(Ai(ej))

det(A)= ±det(Ai(ej)), i = 1, . . . , n.

Since det(Ai(ej)) is an integer, so is (A−1)ij . This holds for all i, j, and hence A−1 has integer entries.

8. The solution is x = (2, 1, 1).

Page 29: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

4.5. EIGENVALUES AND THE CHARACTERISTIC POLYNOMIAL 27

4.5 Eigenvalues and the characteristic polynomial

1. (a) The eigenvalues are λ1 = 1 (algebraic multiplicity 2) and λ2 = −1 (algebraic multiplicity 1). Basesfor the eigenspaces are (0, 1, 0), (1, 0, 2) and (4, 0, 7), respectively.

(b) The eigenvalues are λ1 = 2 (algebraic multiplicity 2) and λ2 = 1 (algebraic multiplicity 1). Basesfor the eigenspaces are (1, 1,−2) and (5, 5,−9), respectively.

5. Suppose A ∈ Rn×n has a real eigenvalue λ and a corresponding eigenvector z ∈ Cn. We wish to showthat either the real or imaginary part of z is an eigenvector of A. We are given that Az = λz, z 6= 0.Write z = x + iy, x, y ∈ Rn. Then

Az = λz ⇒ A(x + iy) = λ(x + iy)⇒ Ax + iAy = λx + iλy

⇒ Ax = λx and Ay = λy.

Since z 6= 0, it follows that x 6= 0 or y 6= 0. If x 6= 0, then the real part of z is an eigenvector for Acorresponding to λ; otherwise, y 6= 0 and the imaginary part of z is an eigenvector.

9. Let q(r) = rn + cn−1rn−1 + · · ·+ c0 be an arbitrary polynomial with coefficients in a field F , and let

A =

0 0 0 · · · −c0

1 0 0 · · · −c1

0 1 0 · · · −c2

.... . .

. . ....

0 0 · · · 1 −cn−1

.

We wish to prove that pA(r) = q(r). We argue by induction on n. The case n = 1 is trivial, since A is1×1 in that case: A = [−c0], |rI−A| = r+ c0 = q(r). Suppose the result holds for polynomials of degreen− 1, let q(r) = rn + cn−1r

n−1 + . . . + c0, and let A be defined as above. Then

|rI −A| =

r 0 0 · · · c0

−1 r 0 · · · c1

0 −1 r · · · c2

.... . .

. . ....

0 0 · · · −1 r + cn−1

,

and cofactor expansion along the first row yields

|rI −A| = r

r 0 · · · c1

−1 r · · · c2

. . .. . .

...0 · · · −1 r + cn−1

+

(−1)n+1c0

−1 r · · ·−1 r · · ·

. . .. . .

−1 r−1

.

By the induction hypothesis, the first determinant is

c1 + c2r + · · ·+ cn−1rn−2 + rn−1,

and the second is simply (−1)n−1. Thus

pA(r) = |rI −A|= r

(

c1 + c2r + · · ·+ cn−1rn−2 + rn−1

)

+ (−1)n+1c0(−1)n−1

= c1r + c2r2 + · · ·+ cn−1r

n−1 + rn + c0 = q(r).

Page 30: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

28 CHAPTER 4. DETERMINANTS AND EIGENVALUES

This completes the proof by induction.

4.6 Diagonalization

3. A is diagonalizable: A = XDX−1, where

D =

[

1− i√

2 0

0 1 + i√

2

]

, X =

[

i√

2 −i√

21 1

]

.

7. A is diagonalizable: A = XDX−1, where

D =

[

0 00 1

]

, X =

[

0 11 1

]

.

11. Let F be a finite field. We will show that F is not algebraically closed by constructing a polynomial p(x)with coefficients in F such that p(x) 6= 0 for all x ∈ F . Let the elements of F be α1, α2, . . . , αq. Define

p(x) = (x− α1)(x − α2) · · · (x− αq) + 1.

Then p is a polynomial of degree q, and p(x) = 1 for all x ∈ F . Thus p(x) has no roots, which showsthat F is not algebraically closed.

4.7 Eigenvalues of linear operators

1. Let T : R3 → R3 be defined by

T (x) = (ax1 + bx2, bx1 + ax2 + bx3, bx2 + ax3),

where a, b ∈ R are constants. Notice that T (x) = Ax for all x ∈ R3, where

A =

a b 0b a b0 b a

.

The eigenvalues of A are a, a+√

2b, a−√

2b. If b = 0, then A is already diagonal, which means that [T ]X ,Xis diagonal if X is the standard basis. If b 6= 0, then A (and hence T ) has three distinct eigenvalues, andhence three linearly independent eigenvectors. It follows that there exists a basis X such that [T ]X ,X isdiagonal.

9. Let L : C3 → C3 be defined by

L(z) = (z1, 2z2, z1 + z3).

Then L(z) = Az for all z ∈ C3, where

A =

1 0 00 2 01 0 1

.

The eigenvalues of A are the diagonal entries, λ1 = 1 (with algebraic multiplicity 2) and λ2 = 2. Astraightforward calculation shows that EA(1) = sp(0, 0, 1), and hence A is defective and not diagonal-izable. Notice that A = [L]S,S , where S is the standard basis for C3. Since [L]X ,X is either diagonalizablefor every basis X or diagonalizable for no basis X , we see that [L]X ,X is not diagonalizable for every basisX of C3.

Page 31: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

4.8. SYSTEMS OF LINEAR ODES 29

11. Let T : Z22 → Z2

2 be defined by T (x) = (0, x1 + x2). Then T (x) = Ax for all x ∈ Z2, where

A =

[

0 01 1

]

.

The eigenvalues of A are the diagonal entries, λ1 = 0 and λ2 = 1. Corresponding eigenvectors arex1 = (1, 1) and x2 = (0, 1), respectively. It follows that if X = x1, x2, then [T ]X ,X is diagonal:

[T ]X ,X =

[

0 00 1

]

.

4.8 Systems of linear ODEs

5. Let A ∈ R2×2 be defined by

A =

[

1 49 1

]

.

We wish to solve the IVP u′ = Au, u(0) = v, where v = (1, 2). The eigenvalues of A are λ1 = 7 andλ2 = −5. Corresponding eigenvectors are (2, 3) and (2,−3), respectively. Therefore, the general solutionof the ODE u′ = Au is

x(t) = c1e7t

[

23

]

+ c2e−5t

[

2−3

]

.

We solve x(0) = (1, 2) to obtain c1 = 7/12, c2 = −1/12, and thus the solution of the IVP is

x(t) =7

12e7t

[

23

]

− 1

12e−5t

[

2−3

]

.

7.

etA =

[

et cos (2t) et sin (2t)−et sin (2t) et cos (2t)

]

.

4.9 Integer programming

1. Each of the matrices is totally unimodular by Theorem 219. The sets S1 and S2 are given below.

(a) S1 = 1, 2, S2 = 3, 4.(b) S1 = 1, 2, 3, 4, S2 = ∅.(c) S1 = 2, 3, S2 = 1, 4.

Page 32: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional
Page 33: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

Chapter 5

The Jordan canonical form

5.1 Invariant subspaces

1. (a) S is not invariant under A (in fact, A does not map either basis vector into S).

(b) T is invariant under A.

5. Let A ∈ R3×3 be defined by

A =

3 0 −1−6 1 3

2 0 0

,

and let S = sps1, s2, where s1 = (0, 1, 0), s2 = (1, 0, 1). A direct calculation shows that

As1 = s1, As2 = −3s1 + 2s2.

It follows that S is invariant under A. We extend s1, s2 to a basis s1, s2, s3 of R3 by definings3 = (0, 0, 1), and define X = [s1|s2|s3]. Then

X−1AX =

1 −3 30 2 −10 0 1

.

is block upper triangular (in fact, simply upper triangular).

9. Let U be a finite-dimensional vector space over a field F , and let T : U → U be a linear operator. LetU = u1, u2, . . . , un be a basis for U and define A = [T ]U ,U . Suppose X ∈ Fn×n is an invertible matrix,and define J = X−1AX . Finally, define

vj =

n∑

i=1

Xijui, j = 1, 2, . . . , n.

(a) We wish to show that V = v1, v2, . . . , vn is a basis for U . Since n = dim(V ), it suffices to provethat V is linearly independent. First notice that

n∑

j=1

cjvj =

n∑

j=1

cj

(

n∑

i=1

Xijui

)

=

n∑

j=1

n∑

i=1

Xijcjui =

n∑

i=1

n∑

j=1

Xijcjui

=

n∑

j=1

(

n∑

i=1

Xijcj

)

ui

=

n∑

j=1

(Xc)iui.

31

Page 34: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

32 CHAPTER 5. THE JORDAN CANONICAL FORM

Also,∑n

j=1(Xc)iui = 0 implies that Xc = 0 since U is linearly independent. Therefore,

n∑

j=1

cjvj = 0 ⇒n∑

j=1

(Xc)iui = 0 ⇒ Xc = 0 ⇒ c = 0,

where the last step follows from the fact that X is invertible. Thus V is linearly independent.

(b) Now we wish to prove that [T ]V,V = J . The calculation above shows that if [v]V = c, then [v]U = Xc,that is,

[v]U = X [v]V .

Therefore, for all v ∈ V ,

[T ]V,V [v]V = [T (v)]V ⇔ [T ]V,VX−1[v]U = X−1[T (v)]U

⇔(

X [T ]V,VX−1)

[v]U = [T (v)]U ,

which shows that [T ]U ,U = X [T ]V,VX−1, or [T ]V,V = X−1[T ]U ,UX = X−1AX = J , as desired.

5.2 Generalized eigenspaces

5. For the given A ∈ R5×5, we have pA(r) = (r − 1)3(r + 1)2, so the eigenvalues are λ1 = 1 and λ2 = −1.Direct calculation shows that

dim(N (A − I)) = 1, dim(N ((A − I)2)) = 2, dim(N ((A − I)3)) = 3, dim(N ((A − I)4)) = 3

anddim(N (A + I)) = 1, dim(N ((A + I)2)) = 2, dim(N ((A + I)3)) = 2.

These results show that the generalized eigenspaces are N ((A − I)3) and N ((A + I)2). Bases for thesesubspaces are

N ((A − I)3) = sp(0, 1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 1, 0),N ((A + I)2) = sp(−1, 0, 12, 4, 0), (1, 4, 0, 0, 2).

A direct calculation shows that the union of the two bases is linearly independent and hence a basis forR5, and it then follows from Theorem 226 that R5 = N ((A− I)3) +N ((A + I)2).

9. Let F be a field, let λ ∈ F be an eigenvalue of A ∈ Fn×n, and suppose that the algebraic and geometricmultiplicities of λ are equal, say to m. We wish to show that N ((A − λI)2) = N (A − λI). By Theorem235, there exists a positive integer k such that

dim(N ((A − λI)k+1)) = dim(N ((A − λI)k))

and dim(N ((A − λI)k)) = m. We know that

N (A− λI) ⊂ N ((A − λI)2) ⊂ · · · ⊂ N ((A − λI)k),

and, by hypothesis, dim(N (A−λI)) = m = dim(N ((A−λI)k)). This implies that N (A−λI) = N ((A−λI)k), and hence that N ((A − λI)2) = N (A− λI) (since N (A− λI) ⊂ N ((A− λI)2) ⊂ N ((A − λI)k)).

13. Let F be a field and suppose A ∈ Fn×n has distinct eigenvalues λ1, . . . , λt. We wish to show that A isdiagonalizable if and only if

mA(r) = (r − λ1) · · · (r − λt).

First, suppose A is diagonalizable and write p(r) = (r − λ1) · · · (r − λt). Every vector x ∈ Rn can bewritten as a linear combination of eigenvectors of A, from which it is easy to prove that p(A)x = 0 forall x ∈ Fn (recall that the factors (A− λiI) commute with one another). Hence p(A) is the zero matrix.

Page 35: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

5.3. NILPOTENT OPERATORS 33

It follows from Theorem 239 that mA(r) divides p(r). But every eigenvalue of A is a root of mA(r), andhence p(r) divides mA(r). Since both p(r) and mA(r) are monic, it follows that mA(r) = p(r), as desired.

Conversely, suppose the minimal polynomial of A is mA(r) = (r − λ1) · · · (r − λt). But then Corollary244 implies that Fn is the direct sum of the subspaces

N (A− λ1I), . . . ,N (A− λtI),

which are the eigenspaces

EA(λ1), . . . , EA(λt).

Since Fn is the direct sum of the eigenspaces, it follows that there is a basis of Fn consisting of eigenvectorsof A (see the proof of Theorem 226 given in Exercise 5.1.10), and hence A is diagonalizable.

5.3 Nilpotent operators

5. Let A ∈ Cn×n be nilpotent. We wish to prove that the index of nilpotency of A is at most n. ByExercise 2, the only eigenvalue of A is λ = 0, which means that the characteristic polynomial of A mustbe pA(r) = rn. By the Cayley-Hamilton theorem, pA(A) = 0, and hence An = 0. It follows that theindex of nilpotency of A is at most n.

9. Suppose A ∈ Rn×n has 0 as its only eigenvalue, so that it must be nilpotent of index k for some ksatisfying 1 ≤ k ≤ 5. The possibilities are

• If k = 1, then A = 0 and dim(N (A)) = 5.

• If k = 2, there is at least one chain x1, Ax1 of nonzero vectors. There could be a second suchchain, x2, Ax2, in which case the fifth basis vector must come from N (A), we have dim(N (A)) = 3,dim(N (A2)) = 5, and A is similar to

1 1 0 0 00 1 0 0 00 0 1 1 00 0 0 1 00 0 0 0 1

.

It is also possible that there is only one such chain, in which case dim(N (A)) = 4, dim(N (A2)) = 5,and A is similar to

1 1 0 0 00 1 0 0 00 0 1 0 00 0 0 1 00 0 0 0 1

.

• If k = 3, there is a chain x1, Ax1, A2x1 of nonzero vectors. There are two possibilities for the

other vectors needed for the basis. There could be a chain x2, Ax2 (with A2x20), in which casedim(N (A)) = 2, dim(N (A2)) = 4, and dim(N (A3)) = 5, and A is similar to

1 1 0 0 00 1 1 0 00 0 1 0 00 0 0 1 10 0 0 0 1

.

Page 36: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

34 CHAPTER 5. THE JORDAN CANONICAL FORM

Altenatively, there could be two more independent vectors in N (A), in which case dim(N (A)) = 3,dim(N (A2)) = 4, dim(N (A3)) = 5, and A is similar to

1 1 0 0 00 1 1 0 00 0 1 0 00 0 0 1 00 0 0 0 1

.

• If k = 4, then there is a chain x1, Ax1, A2x1, A

3x1 of nonzero vectors, and there must be a sec-ond independent vector in N (A). Then dim(N (A)) = 2, dim(N (A2)) = 3, dim(N (A3)) = 4,dim(N (A4)) = 5, and A is similar to

1 1 0 0 00 1 1 0 00 0 1 1 00 0 0 1 00 0 0 0 1

.

• If k = 5, then there is a chain x1, Ax1, A2x1, A

3x1, A4x1 of nonzero vectors, dim(N (A)) = 1,

dim(N (A2)) = 2, dim(N (A3)) = 3, dim(N (A4)) = 4, dim(N (A5)) = 5, and A is similar to

1 1 0 0 00 1 1 0 00 0 1 1 00 0 0 1 10 0 0 0 1

.

15. Let F be a field and suppose A ∈ Fn×n is nilpotent. We wish to prove that det(I +A) = 1. By Theorem251 and the following discussion, we know there exists an invertible matrix X ∈ Fn such that X−1AXis upper triangular with zeros on the diagonal. But then

X−1(I + A)X = I + X−1AX

is upper triangular with ones on the diagonal. From this, we conclude that det(X−1(I +A)X) = 1. SinceI + A is similar to X−1(I + A)X , it has the same determinant, and hence det(I + A) = 1.

5.4 The Jordan canonical form of a matrix

1. Let

A =

1 1 10 1 10 0 1

∈ R3×3.

The only eigenvalue of A is λ = 1, and dim(N (A − I)) = 1. Therefore, there must be a single chain ofthe form (A− I)2x1, (A− I)x1, x1, where x1 ∈ N ((A − I)3) \ N ((A − I)2). We have

(A− I)2 =

0 0 10 0 00 0 0

,

and thus it is easy to see that x1 = (0, 0, 1) 6∈ N ((A − I)2). We define

X = [(A− I)2x1|(A− I)x1|x1] =

1 1 00 1 00 0 1

,

Page 37: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

5.5. THE MATRIX EXPONENTIAL 35

and then

X−1AX =

1 1 00 1 10 0 1

.

5. Let A ∈ R4×4 be defined by

A =

−3 1 −4 −4−17 1 −17 −38−4 −1 −3 −14

4 0 4 10

.

Then pA(r) = (r−1)3(r−2). A direct calculation shows that dim(N (A− I)) = 1, dim(N ((A− I)2)) = 2,dim(N ((A − I)3)) = 3 (and, of course, dim(N (A − 2I)) = 1. Therefore, the Jordan canonical form of Ais

1 1 0 00 1 1 00 0 1 00 0 0 2

.

7. Let A ∈ R5×5 be defined by

A =

−7 1 24 4 7−9 4 21 3 6−2 −1 11 2 3−7 13 −18 −6 −8

3 −5 6 3 5

.

Then pA(r) = (r−1)3(r−2)2. A direct calculation shows that dim(N (A−I)) = 2, dim(N ((A−I)2)) = 3,dim(N (A− 2I)) = 1, and dim(N ((A − 2I)2)) = 2. Therefore, the Jordan canonical form of A is

1 1 0 0 00 1 0 0 00 0 1 0 00 0 0 2 10 0 0 0 2

5.5 The matrix exponential

3. We wish to find the matrix exponential etA for the matrix given in Exercise 5.4.5. The similarity trans-formation defined by

X =

1 20 −9 00 1 0 −4−1 −20 0 −2

0 0 4 1

puts A in Jordan canonical form:

X−1AX = J =

1 1 0 00 1 1 00 0 1 00 0 0 2

.

We have

etJ =

et tet t2

2 et 00 et tet 00 0 et 00 0 0 e2t

,

Page 38: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

36 CHAPTER 5. THE JORDAN CANONICAL FORM

and etA = XetJX−1 is2

6

6

6

6

6

6

4

et„

1 − 4t − t2

2

«

tet et„

−4t − t2

2

«

et“

−4t − t2”

et (16 − t) − 16e2t et et (16 − t) − 16e2t et (36 − 2t) − 36e2t

et„

8 − 4t + t2

2

«

− 8e2t −tet et„

9 + 4t + t2

2

«

− 8e2t et“

18 + 4t + t2”

− 18e2t

−4et + 4e2t 0 −4et + 4e2t −8et + 4e2t

3

7

7

7

7

7

7

5

.

5. Let A, B ∈ Cn×n.

(a) We first show that if A and B commute, then so do etA and B. We define U(t) = etAB − BetA.Then

U ′(t) = AetAB −BAetA = AetAB −ABetA

= A(

etAB −BetA)

= AU(t).

Also, U(0) = IB −BI = B −B = 0. Thus U satisfies U ′ = AU , U(0) = 0. But the unique solutionof this IVP is U(t) = 0; hence etAB = BetA.

(b) Use the preceding result to show that if A and B commute, the et(A+B) = etAetB holds. We defineU(t) = etAetB. Then

U ′(t) = AetAetB + etABetB = AetAetB + BetAetB

= (A + B)etAetB

(notice how we used the first part of the exercise), and U(0) = I. But et(A+B) also satisfies the IVPU ′ = (A + B)U , U(0) = I. Hence et(A+B) and etAetB must be equal.

(c) Let

A =

[

0 11 0

]

, B =

[

0 10 0

]

.

Then

etA =

[

1+e2

2−1+e2

2e−1+e2

2e1+e2

2e

]

, etB =

[

1 10 1

]

and

eA+B =

12

(

e√

2 + e−√

2)

1√2

(

e√

2 − e−√

2)

12√

2

(

e√

2 − e−√

2)

12

(

e√

2 + e−√

2)

.

It is easy to verify that eAeB 6= eA+B.

5.6 Graphs and eigenvalues

3. The adjacency matrix is

AG =

0 1 1 0 0 0 01 0 1 0 0 0 01 1 0 0 0 0 00 0 0 0 1 0 10 0 0 1 0 1 00 0 0 0 1 0 10 0 0 1 0 1 0

,

and its eigenvalues are 0,±1,±2. Notice that the adjacency matrix shows that every vertex has degree2, and hence G is 2-regular. Theorem 263 states that the largest eigenvalue of G should be 2, as indeedit is. Also, since the multiplicity of λ = 2 is 2, G must have two connected components. It does; thevertices v1, v2, v3 form one connected component, and the vertices v4, v5, v6, v7 form the other.

Page 39: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

Chapter 6

Orthogonality and best approximation

6.1 Norms and inner products

1. Let V be a normed vector space over R, and suppose u, v ∈ V with v = αu, α ≥ 0. Then

‖u + v‖ = ‖u + au‖ = ‖(1 + a)u‖ = (1 + a)‖u‖= ‖u‖+ a‖u‖= ‖u‖+ ‖au‖ = ‖u‖+ ‖v‖

(note the repeated use of the second property of a norm from Definition 265). Thus, if v = au, a ≥ 0,then equality holds in the triangle inequality (‖u + v‖ = ‖u‖+ ‖v‖).

5. We wish to derive relationships among the L1(a, b), L2(a, b), and L∞(a, b) norms. Three such relationshipsexist:

‖f‖1 ≤ (b− a)‖f‖∞ for all f ∈ L∞(a, b),

‖f‖2 ≤√

b− a‖f‖∞ for all f ∈ L∞(a, b),

‖f‖1 ≤√

b− a‖f‖2 for all f ∈ L2(a, b).

The first two are simple; we have |f(x)| ≤ ‖f‖∞ for (almost) all x ∈ [a, b], and hence

‖f‖1 =

∫ b

a

|f(x)| dx ≤∫ b

a

‖f‖∞ dx = (b− a)‖f‖∞,

‖f‖22 =

∫ b

a

|f(x)|2 dx ≤∫ b

a

‖f‖2∞ dx = (b− a)‖f‖2∞.

To prove the third relationship, define v(x) = 1 for all x ∈ [a, b]. Then

∫ b

a

|f(x)| dx =

∫ b

a

|v(x)||f(x)| dx = 〈|v|, |f |〉2 ≤ ‖v‖2‖f‖2,

where the last step follows from the Cauchy-Schwarz inequality. Since ‖v‖2 =√

b− 1, the desired resultfollows.

Note that it is not possible to bound ‖f‖∞ by a multiple of either ‖f‖1 or ‖f‖2, nor to bound ‖f‖2 by amultiple of ‖f‖1.

9. The following graph shows the (boundaries of the) unit balls in the ℓ2 norm (solid curve), the ℓ1 norm(dotted curve), and the ℓ∞ norm (dashed curve).

37

Page 40: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

38 CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION

Notice that ℓ1 unit ball is contained in the other two, which is consistent with ‖x‖1 ≥ ‖x‖2, ‖x‖∞, whilethe ℓ2 unit ball is contained in the ℓ∞ unit ball, consistent with ‖x‖∞ ≤ ‖x‖2.

11. Suppose V is an inner product space and ‖ · ‖ is the norm defined by the inner product 〈·, ·〉 on V . Then,for all u, v ∈ V ,

‖u + v‖2 + ‖u− v‖2 = 〈u + v, u + v〉+ 〈u− v, u− v〉= 〈u, u〉+ 2 〈u, v〉+ 〈v, v〉+ 〈u, u〉 − 2 〈u, v〉+ 〈v, v〉= 2 〈u, u〉+ 2 〈v, v〉= 2‖u‖2 + 2‖v‖2.

Thus the parallelogram law holds in V .

If u = (1, 1) and v = (1,−1) in R2, then a direct calculation shows that

‖u + v‖21 + ‖u− v‖21 = 8, 2‖u‖21 + 2‖v‖21 = 16,

and hence the parallelogram law does not hold for ‖ · ‖1. Therefore, ‖ · ‖1 cannot be defined by an innerproduct. For the same u and v, we have

‖u + v‖2∞ + ‖u− v‖2∞ = 8, 2‖u‖2∞ + 2‖v‖2∞ = 4,

and hence the parallelogram law does not hold for ‖ · ‖∞. Therefore, ‖ · ‖∞ cannot be defined by an innerproduct.

6.2 The adjoint of a linear operator

9. Let M : P2 → P3 be defined by M(p) = q, where q(x) = xp(x). We wish to find M∗, assumingthat the L2(0, 1) inner product is imposed on both P2 and P3. Following the technique of Example277 (and the previous exercise), we compute the matrix N , defined by Nij = 〈M(pi), qj〉P3

, where

S1 = p1, p2, p3 = 1, x, x2 and S2 = q1, q2, q3, q4 = 1, x, x2, x3:

N =

12

13

14

15

13

14

15

16

14

15

16

17

.

The Gram matrix is the same as in the previous exercise, and we obtain

[M ]S3,S2 = G−1N =

0 0 120

335

1 0 − 35 − 32

35

0 1 32

127

.

This shows that M∗ maps a0 + a1x + a2x2 + a3x

3 to

(

1

20a2 +

3

35a3

)

+

(

a0 −3

5a2 −

32

35a3

)

x +

(

a1 +3

2a2 +

12

7a3

)

x2.

Page 41: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

6.3. ORTHOGONAL VECTORS AND BASES 39

11. Let X and U be finite-dimensional inner product spaces over R, and suppose T : X → U is linear. DefineS : R(T ∗)→ R(T ) by S(x) = T (x) for all x ∈ R(T ∗). We will prove that S is an isomorphism betweenR(T ∗) and R(T ).

(a) First, suppose x1, x2 ∈ R(T ∗) satisfy S(x1) = S(x2). Since x1, x2 ∈ R(T ∗), there exist u1, u2 ∈ Usuch that x1 = T ∗(u1), x2 = T ∗(u2). Then we have

S(x1) = S(x2)⇒ T (x1) = T (x2)

⇒ T (T ∗(u1)) = T (T ∗(u2))

⇒ T (T ∗(u1 − u2)) = 0

⇒ 〈u1 − u2, T (T ∗(u1 − u2))〉U = 0

⇒ 〈T ∗(u1 − u2), T∗(u1 − u2)〉X = 0

⇒ T ∗(u1 − u2) = 0

⇒ T ∗(u1) = T ∗(u2)

⇒ x1 = x2.

Therefore, S is injective.

(b) Since S injective, Theorem 93 implies that dim(R(T )) ≥ dim(R(T ∗)). This results holds for anylinear operator mapping one finite-dimensional inner product space to another, and hence it ap-plies to the operator T ∗. Hence dim(R(T ∗)) ≥ dim(R((T ∗)∗)). Since (T ∗)∗ = T , it follows thatdim(R(T ∗)) ≥ dim(R(T )), and therefore that dim(R(T ∗)) ≥ dim(R(T )) (that is, rank(T ∗) =rank(T )).

(c) Now we see that S is an injective linear operator mapping one finite-dimensional vector space toanother of the same dimension. It follows from Corollary 105 that S is also surjective and hence isan isomorphism.

6.3 Orthogonal vectors and bases

3. The equation α1p1 + α2p2 + α3p3 = q is equivalent to

(

α1 −1

2α2 +

1

6α3

)

+ (α2 − α3)x + α3x2 = 3 + 2x + x2,

which in turn is equivalent to the system

α1 −1

2α2 +

1

6α3 = 3,

α2 − α3 = 2,

α3 = 1.

The solution is α = (13/3, 3/1), and thus

q(x) =13

3p1(x) + 3p2(x) + p3(x).

7. Consider the functions ex and e−x to be elements of C[0, 1], and regard C[0, 1] as an inner product spaceunder the L2(0, 1) inner product. Define S = spex, e−x. We wish to find an orthogonal basis f1, f2for S. We will take f1(x) = ex. The function f2 must be of the form f(x) = c1e

x + c2e−x and satisfy

∫ 1

0

f(x)ex dx = 0.

Page 42: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

40 CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION

This last condition leads to the equation

c1

∫ 1

0

e2x dx + c2

∫ 1

0

1 dx = 0 ⇔ 1

2(e2 − 1)c1 + c2 = 0.

One solution is c1 = 2, c2 = 1− e2. Thus if f2(x) = 2ex + (1− e2)e−x, then f1, f2 is an orthonal basisfor S.

15. Let V be an inner product space over R, and let u, v be vectors in V .

(a) Assume u and v are nonzero. We wish to prove that v ∈ spu if and only if | 〈u, v〉 | = ‖u‖‖v‖.First suppose v ∈ spu. Then, by Exercise 13,

v =〈v, u〉〈u, u〉u⇒ ‖v‖ =

〈v, u〉〈u, u〉u

=| 〈v, u〉 |‖u‖2 ‖u‖ =

| 〈v, u〉 |‖u‖ .

This yields ‖u‖‖v‖ = | 〈v, u〉 |, as desired.

On the other hand, if v 6∈ spu, then, by the previous exercise,

‖v‖ >

〈v, u〉〈u, u〉u

=| 〈v, u〉 |‖u‖ ,

and hence ‖u‖‖v‖ > | 〈v, u〉 |. Thus v ∈ spu if and only if ‖u‖‖v‖ = | 〈v, u〉 |.(b) If u and v are nonzero, then the first part of the exercise shows that equality holds in the Cauchy-

Schwarz inequality if and only if v ∈ spu, that is, if and only if v is a multiple of u. If v = 0 oru = 0, then equality trivially holds in the Cauchy-Schwarz inequality (both sides are zero). Thus wecan say that equality holds in the Cauchy-Schwarz inequality if and only if u = 0 or v is a multipleof u.

6.4 The projection theorem

1. Let A ∈ Rm×n.

(a) We wish to prove that N (AT A) = N (A). First, if x ∈ N (A), then Ax = 0, which implies thatAT Ax = 0, and hence that x ∈ N (AT A). Thus N (A) ⊂ N (AT A).

Conversely, suppose x ∈ N (AT A). Then

AT Ax = 0 ⇒ x ·AT Ax = 0 ⇒ (Ax) · (Ax) = 0 ⇒ Ax = 0.

Therefore, x ∈ N (A), and we have shown that N (AT A) ⊂ N (A). This completes the proof.

(b) If A has full rank, then the null space of A is trivial (by the fundamental theorem of linear algebra)and hence so is the null space of N (AT A). Since AT A is square, this shows that AT A is invertible.

(c) Thus, if A has full rank, then AT A is invertible and, for any y ∈ Rm there is a unique solutionx = (AT A)−1AT y of the normal equations AT Ax = AT y. Thus, by Theorem 291, there is a uniqueleast-squares solution to Ax = y.

9. Consider the following data points: (0, 3.1), (1, 1.4), (2, 1.0), (3, 2.2), (4, 5.2), (5, 15.0). We wish to findthe function of the form f(x) = a1e

x + a2e−x that fits the data as nearly as possible in the least-squares

sense. We wish to solve the equations

a1exi + a2e

−xi = yi, i = 1, 2, 3, 4, 5, 6

Page 43: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

6.5. THE GRAM-SCHMIDT PROCESS 41

in the least-squares sense. These equations are equivalent to the system Ma = y, where

M =

ex1 e−x1

ex2 e−x2

ex3 e−x3

ex4 e−x4

ex5 e−x5

ex6 e−x6

, y =

3.11.41.02.25.215.0

.

The solution is a.= (0.10013, 2.9878), and the approximating function is 0.10013ex + 2.9878e−x.

15. Let A ∈ Rm×n, where m < n and rank(A) = m. Let y ∈ Rm.

(a) Since rank(A) = dim(col(A)), the fact that rank(A) = m proves that col(A) = Rm. Thus Ax = yhas a solution for all y ∈ Rm. Moreover, by Theorem 93, N (A) is nontrivial (a linear operatorcannot be injective unless the dimension of the co-domain is at least as large as the dimension ofthe domain), and hence Ax = y has infinitely many solutions.

(b) Consider the matrix AAT ∈ Rm×m. We know from Exercise 1(a) (applied to AT ) that N (AAT ) =N (AT ). By Exercise 6.2.11, rank(AT ) = rank(A) = m, and hence nullity(AT ) = 0 by the funda-mental theorem of linear algebra. Since N (AAT ) = N (AT ) is trivial, this proves that the squarematrix AAT is invertible.

(c) If x = AT(

AAT)−1

y, then

Ax = A(

AT(

AAT)−1

y)

= (AAT )(

AAT)−1

y = y,

and hence x is a solution of Ax = y.

6.5 The Gram-Schmidt process

5. (a) The best cubic approximation, in the L2(−1, 1) norm, to the function f(x) = ex, is the polynomialq(x) = α1+α2x+α3x

2+α4x3, where Gα = b. G is the Gram matrix, and b is defined by bi = 〈f, p1〉2,

where pi(x) = xi−1, i = 1, 2, 3, 4:

G =

2 0 23 0

0 23 0 2

523 0 2

5 00 2

5 0 27

, b =

e− e−1

2e−1

e− 5e−1

16e−1 − 2e

.

We obtain

α =

334e − 3e

4105e

4 − 7654e

15e4 − 105

4e12954e − 175e

4

.

(b) Applying the Gram-Schmidt process to the standard basis 1, x, x2, x3, we obtain the orthogonalbasis

1, x, x2 − 1

3, x3 − 3

5x

.

(c) Using the orthogonal basis for P3, we compute the best approximation to f(x) = ex from P3 (relativeto the L2(−1, 1) norm) to be

e− e−1

2+

3

ex +

(

15e

4− 105

4e

)(

x2 − 1

3

)

+

(

1295

4e− 175e

4

)(

x3 − 3

5x

)

.

Page 44: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

42 CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION

9. Let P be the plane in R3 defined by the equation 3x− y − z = 0.

(a) A basis for P is (1, 3, 0), (1, 0, 3); applying the Gram-Schmidt process to this basis yields theorthogonal basis (1, 3, 0), (9/10,−3/10, 3).

(b) The projection of u = (1, 1, 1) onto P is (8/11, 12/11, 12/11).

13. Define an inner product on C[0, 1] by

〈f, g〉 =∫ 1

0

(1 + x)f(x)g(x) dx.

(a) We will first verify that 〈·, ·〉 really does define an inner product on C[0, 1]. For all f, g ∈ C[0, 1], wehave

〈f, g〉 =∫ 1

0

(1 + x)f(x)g(x) dx =

∫ 1

0

(1 + x)g(x)f(x) dx = 〈g, f〉 ,

and thus the first property of an inner product is satisfied. If f, g, h ∈ C[0, 1] and α, β ∈ R, then

〈αf + βg, h〉 =∫ 1

0

(1 + x)(αf(x) + βg(x))h(x) dx

=

∫ 1

0

α(1 + x)f(x)h(x) + β(1 + x)g(x)h(x) dx

= α

∫ 1

0

(1 + x)f(x)h(x) dx + β

∫ 1

0

(1 + x)g(x)h(x) dx

= α 〈f, h〉+ β 〈g, h〉 .This verifies the second property of an inner product. Finally, for any f ∈ C[0, 1],

〈f, f〉 =∫ 1

0

(1 + x)f(x)2 dx ≥ 0

(since (1 + x)f(x)2 ≥ 0 for all x ∈ [0, 1]). Also, if 〈f, f〉 = 0, then (1 + x)f(x)2 = 0 for all x ∈ [0, 1].Since 1 + x > 0 for all x ∈ [0, 1], this implies that f(x)2 ≡ 0, or f(x) ≡ 0. Therefore, 〈f, f〉 = 0 ifand only if f = 0, and we have verified that 〈·, ·〉 defines an inner product on C[0, 1].

(b) Applying the Gram-Schmidt process to the standard basis yields the orthogonal basis

1, x− 5

9, x2 − 68

65x +

5

26

.

6.6 Orthogonal complements

1. Let S = sp(1, 2, 1,−1), (1, 1, 2, 0). We wish to find a basis for S⊥. Let A ∈ R2×4 be the matrixwhose rows are the given vectors (the basis vectors for S). Then x ∈ S⊥ if and only if Ax = 0; that is,S⊥ = N (A). A direct calculation shows that S⊥ = N (A) = (−3, 1, 1, 0), (−1, 1, 0, 1).

5. (a) Since N (A) is orthogonal to col(AT ), it suffices to orthogonalize the basis of col(AT ) by (a singlestep of) the Gram-Schmidt process, yielding (1, 4,−4), (−16/33,−97/33,−101/33). Then

(24,−5, 1), (1, 4,−4), (−16/33,−97/33,−101/33)is an orthogonal basis for R3.

(b) A basis for N (AT ) is (1,−1, 1, 0), (−2, 1, 0, 1) and a basis for col(A) is (1, 1, 0, 1), (4, 3,−1, 5).Applying the Gram-Schmidt process to each of the bases individually yields (1,−1, 1, 0), (−1, 0, 1, 1)and (1, 1, 0, 1), (0,−1,−1, 1), respectively. The union of these two bases,

(1, 1, 0, 1), (−1, 0, 1, 1), (1, 1, 0, 1), (0,−1,−1, 1),is an orthogonal basis for R4.

Page 45: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

6.7. COMPLEX INNER PRODUCT SPACES 43

9. (a) Let A ∈ Rn×n be symmetric. Then N (A)⊥ = col(A) and col(A)⊥ = N (A).

(b) It follows that y ∈ col(A) if and only if y ∈ N (A)⊥; in other words, Ax = y has a solution if andonly if

Az = 0 ⇒ y · z = 0.

6.7 Complex inner product spaces

1. The projection of v onto S is

w =

(

2

3− 4

9i,

4

9+

2

3i,

1

3+

4

9i

)

.

5. The best approximation to f from P2 is

2i

π− 24

π2

(

x− 1

2

)

+60i(π2 − 12)

π3

(

x2 − x +1

6

)

.

9. (a) Let u = (1, 1), v = (1,−1 + i) ∈ C2. Then a direct calculation shows that ‖u + v‖22 = ‖u‖22 + ‖v‖22and 〈u, v〉2 = −i 6= 0.

(b) Suppose V is a complex inner product space. If u, v ∈ V and ‖u + v‖2 = ‖u‖2 + ‖v‖2, then

〈u, u〉+ 〈v, v〉+ 〈u, v〉+ 〈v, u〉 = 〈u, u〉+ 〈v, v〉⇔ 〈u, v〉+ 〈v, u〉 = 0

⇔ 〈u, v〉+ 〈u, v〉 = 0.

For any z = x + iy ∈ C, z + z = 0 is equivalent to 2x = 0 or, equivalently, x = 0. Thus‖u + v‖2 = ‖u‖2 + ‖v‖2 holds if and only if the real part of 〈u, v〉 is zero.

6.8 More on polynomial approximation

1. (a) The best quadratic approximation to f in the (unweighted) L2(−1, 1) norm is

e− e−1

2+

3

ex +

15e− 105e−1

4

(

x2 − 1

3

)

.

(b) The best quadratic approximation to f in the weighted L2(−1, 1) norm is (approximately)

1.2660659 + 1.1303182x + 0.27149534(2x2− 1)

(note that the integrals were computed numerically).

The following graph shows the error in the L2 approximation (solid curve) and the error in the weightedL2 approximation (dashed) curve.

−1 −0.5 0 0.5 10

0.02

0.04

0.06

0.08

0.1

We see that the weighted L2 approximation has the smaller maximum error.

Page 46: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

44 CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION

3. (a) The orthogonal basis for P3 on [−1, 1] is

1, x, x2 − 1

3, x3 − 3

5x

.

Transforming this to the interval [0, π], we obtain the following orthogonal basis for P3 as a subspaceof L2(0, π):

1,2

πt− 1,

4

π2t2 − 4

πt +

2

3,

8

π3t3 − 12

π2t2 +

24

5πt− 2

5

.

The best cubic approximation to f(t) = sin (t) on [0, π] is

p(t) =2

π+

15(π2 − 12)

π3

(

4

π2t2 − 4

πt +

2

3

)

.

(b) The orthogonal basis for P3 on [−1, 1], in the weighted L2 inner product, is

1, x, 2x2 − 1, 4x3 − 3x

.

Transforming this to the interval [0, π], we obtain the following orthogonal basis for P3 as a subspaceof L2(0, π):

1,2

πt− 1,

8

π2t2 − 8

πt + 1,

32

π3t3 − 48

π2t2 +

18

πt− 1

.

The best cubic approximation to f(t) = sin (t) on [0, π], in the weighted L2 norm, is (approximately)

q(t) = 0.47200122− 0.49940326

(

8

π2t2 − 8

πt + 1

)

(the integrals were computed numerically).

(c) The following graph shows the error in the ordinary L2 approximation (solid curve) and the errorin the weighted L2 approximation (dashed curve):

0 1 2 3 40

0.01

0.02

0.03

0.04

0.05

0.06

The second approximation has a smaller maximum error.

6.9 The energy inner product and Galerkin’s method

5. The variational form of the BVP is to find u ∈ V such that

∫ ℓ

0

k(x)du

dx(x)

dv

dx(x) + p(x)u(x)v(x)

dx =

∫ ℓ

0

f(x)v(x) dx for all v ∈ V.

If the basis for Vn is φ1, . . . , φn, then Galerkin’s method results in the system (K + M)U = F , whereK and F are the same stiffness matrix and load vector as before, and M is the mass matrix:

Mij =

∫ ℓ

0

p(x)φj(x)φi(x) dx, i, j = 1, . . . , n.

Page 47: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

6.10. GAUSSIAN QUADRATURE 45

7. The projection of f onto V can be computed just as described by the projection theorem. The projectionis∑n−1

i=1 Viφi, where V ∈ Rn−1 satisfies the system MV = B. The Gram matrix is called the mass matrixin this context (see the solution of Exercise 5):

Mij =

∫ ℓ

0

φj(x)φi(x) dx, i, j = 1, . . . , n− 1.

The vector B is defined by Bi =∫ ℓ

0f(x)φi(x) dx, i = 1, . . . , n− 1.

6.10 Gaussian quadrature

1. We wish to find the Gaussian quadrature rule with n = 3 quadrature nodes (on the reference interval[−1, 1]). We know that the nodes are the roots of the third orthogonal polynomial, p3(x). Therefore, thenodes are x1 = −

3/5, x2 = 0, x3 =√

3/5. To find the weights, we need the Lagrange polynomialsdefined by these nodes, which are

L1(x) =5

6x(x −

3/5), L2(x) = −5

3x2 + 1, L3(x) =

5

6x(x +

3/5).

Then

w1 =

∫ 1

−1

L1(x) dx =5

9,

w2 =

∫ 1

−1

L2(x) dx =5

9,

w3 =

∫ 1

−1

L3(x) dx =8

9.

3. Let w(x) = 1/√

1− x2. We wish to find the weighted Gaussian quadrature rule∫ 1

−1

w(x)f(x) dx.=

n∑

i=1

wif(xi),

where n = 3. The nodes are the roots of the third orthogonal polynomial under this weight function,which is T3(x) = 4x3 − 3x. The nodes are thus x1 = −

3/4, x2 = 0, x3 =√

3/4. The correspondingLagrange polynomials are

L1(x) =2

3x(x −

3/4), L2(x) = −4

3x2 + 1, L3(x) =

2

3x(x +

3/4).

The weights are then Then

w1 =

∫ 1

−1

w(x)L1(x) dx =π

3,

w2 =

∫ 1

−1

w(x)L2(x) dx =π

3,

w3 =

∫ 1

−1

w(x)L3(x) dx =π

3.

6.11 The Helmholtz decomposition

1. Let Ω be a domain in R3, and let φ, u be a scalar field and a vector field, respectively, defined on Ω. Wehave

∇ · (φu) =

3∑

i=1

∂xi(φui) =

3∑

i=1

(

∂φ

∂xiui + φ

∂ui

∂xi

)

=

3∑

i=1

∂φ

∂xiui + φ

3∑

i=1

∂ui

∂xi= ∇φ · u + φ∇ · u.

Page 48: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

46 CHAPTER 6. ORTHOGONALITY AND BEST APPROXIMATION

3. Let φ : Ω→ R be a smooth scalar field. Then

∇ · ∇φ =3∑

i=1

∂xi

(

∂φ

∂xi

)

=3∑

i=1

∂2φ

∂x2i

=∂2φ

∂x21

+∂2φ

∂x22

+∂2φ

∂x23

.

Page 49: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

Chapter 7

The spectral theory of symmetric

matrices

7.1 The spectral theorem for symmetric matrices

1. Let A ∈ Rm×n. Then (AT A)T = AT (AT )T = AT A and, for every x ∈ Rn,

x · AT Ax = (Ax) · (Ax) = ‖Ax‖22 ≥ 0.

Therefore, AT A is symmetric and positive semidefinite.

5. Suppose A ∈ Rn×n satisfies(Ax) · (Ay) = x · y for all x, y ∈ Rn.

Then

x · (AT Ay) = x · y for all x, y ∈ Rn

⇒ AT Ay = y for all y ∈ Rn (by Corollary 275)

⇒ AT A = I.

The last step follows from the uniqueness of the matrix representing a linear operator on Rn. SinceAT A = I, we see that A is orthogonal.

9. Let X be a finite-dimensional inner product space over R with basis X = x1, . . . , xn, and assume thatT : X → X is a self-adjoint linear operator (T ∗ = T ). Define A = [T ]X ,X . Let G be the Gram matrix forthe basis X , and define B ∈ Rn×n by

B = G1/2AG−1/2,

where G1/2 is the square root of G (see Exercise 7) and G−1/2 is the inverse of G1/2.

(a) We wish to prove that B is symmetric. Let x, y ∈ X be given, and define α = [x]X , β = [y]X . Then,since [T (x)]X = A[x]X = Aα, we have T (x) =

∑ni=1(Aα)ixi. Therefore,

〈T (x), y〉 =⟨

n∑

i=1

(Aα)ixi,

n∑

j=1

βjxj

=

n∑

i=1

n∑

j=1

(Aα)iβj 〈xi, xj〉

=

n∑

i=1

(Aα)i

n∑

j=1

Gijβj

= (Aα) ·Gβ = α · (AT G)β.

47

Page 50: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

48 CHAPTER 7. THE SPECTRAL THEORY OF SYMMETRIC MATRICES

Similarly, [T (y)]X = Aβ, and an analogous calculation shows that 〈x, T (y)〉 = α · GAβ. Since〈T (x), y〉 = 〈x, T (y)〉, it follows that α · (AT G)β = α ·GAβ for all α, β ∈ Rn, and hence GA = AT G.This implies that (GA)T = AT GT = AT G = GA (using the fact that G is symmetric), and henceGA is symmetric. (Thus the fact that T is self-adjoint does not imply that A is symmetric, butrather than GA is symmetric.)

Now, it is easy to see that G−1/2 is symmetric, and if C is symmetric, then so is XCXT for anysquare matrix X . Therefore,

G−1/2GA(G−1/2)T = G−1/2GAG−1/2 = G1/2AG−1/2

is symmetric, which is what we wanted to prove.

(b) If λ, u is an eigenvalue/eigenvector pair of B, then

Bu = λu ⇒ G1/2AG−1/2u = λu ⇒ AG−1/2u = λG−1/2u.

Since u 6= 0 and G−1/2 is nonsingular, G−1/2u is also nonzero, and hence λ, G−1/2u is an eigenpairof A.

(c) Since B is symmetric, there exists an orthonormal basis u1, . . . , un of Rn consisting of eigenvectorsof B. Let λ1, . . . , λn be the corresponding eigenvalues. From above, we know that

G−1/2u1, . . . , G−1/2un

is a basis of Rn consisting of eigenvectors of A. Define y1, . . . , yn ⊂ X by [yi]X = G−1/2ui, thatis,

yi =

n∑

k=1

(

G−1/2ui

)

kxk, i = 1, . . . , n.

Notice that

〈yi, yj〉 =⟨

n∑

k=1

(

G−1/2ui

)

kxk,

n∑

ℓ=1

(

G−1/2uj

)

ℓxℓ

=

n∑

k=1

n∑

ℓ=1

(

G−1/2ui

)

k

(

G−1/2uj

)

ℓ〈xk, xℓ〉

=(

G−1/2ui

)

·G(

G−1/2uj

)

= ui ·(

G−1/2GG−1/2)

uj

= ui · uj =

1, i = j,0, i 6= j.

This shows that y1, . . . , yn is an orthonormal basis for X . Also,

[T (yi)]X = A[yi]X = AG−1/2ui = λiG−1/2ui = λi[yi]X = [λiyi]X ,

which proves that T (yi) = λiyi. Thus each yi is an eigenvector of T , and we have proved that thereexists an orthonormal basis of X consisting of eigenvectors of T .

7.2 The spectral theorem for normal matrices

1. The matrix A is normal since AT A = AAT = 2I. We have A = UDU∗, where

U =

[

i√2− i√

21√2

1√2

]

, D =

[

1− i 00 1 + i

]

.

Page 51: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

7.3. OPTIMIZATION AND THE HESSIAN MATRIX 49

7. Let A ∈ Rn×n be skew-symmetric.

(a) Since AT A = AAT = −A2, it follows that A is normal

(b) Hence there exist a unitary matrix X ∈ Cn×n and a diagonal matrix D ∈ Cn×n such that A =XDX∗, or, equivalently, D = X∗AX . We then have

D∗ = X∗A∗X = X∗(−A)X = −X∗AX = −D.

Therefore, each diagonal entry λ of D, that is, each eigenvalue of A, satisfies λ = −λ. This impliesthat λ is of the form iθ, θ ∈ R. Thus a skew-symmetric matrix has only purely imaginary eigenvalues.

11. Suppose A, B ∈ Cn×n are normal and commute (AB = BA). Then, by Exercise 5.4.18, A and B aresimultaneously diagonalizable; that is, there exists a unitary matrix X ∈ Cn×n such that X∗AX = Dand X∗BX = C are both diagonal matrices in Cn×n. It follows that

A + B = XDX∗ + XCX∗ = X(D + C)X∗, (A + B)∗ = X(D + C)∗X∗.

Since A + B and (A + B)∗ are simultaneously diagonalizable, it follows that they commute and henceA + B is normal.

15. Let A ∈ Fm×n and B ∈ Fn×p, where F represents R or C. We wish to find a formula for the productAB in terms of outer products of the columns of A and the rows of B. Let A = [c1| · · · |cn],

B =

r1

...rn

.

Then

AB =

n∑

k=1

ckrTk .

To verify this, notice that (ckrTk )ij = AikBkj , and hence

(

n∑

k=1

ckrTk

)

ij

=n∑

k=1

(

ckrTk

)

ij=

n∑

k=1

AikBkj = (AB)ij .

This holds for all i, j, and thus verifies the formula above. It follows that the linear operator defined byAB can be written as

n∑

k=1

ck ⊗ rk.

7.3 Optimization and the Hessian matrix

1. Suppose A ∈ Rn×n and define Asym = (1/2)(

A + AT)

. We have

(Asym)T =1

2

(

A + AT)T

=1

2

(

AT + A)

=1

2

(

A + AT)

= Asym,

and hence Asym is symmetric. Also, for any x ∈ Rn,

x ·Asymx = x ·(

1

2

(

A + AT)

)

x =1

2x ·(

Ax + AT x)

=1

2x · Ax +

1

2x · AT x

=1

2x · Ax +

1

2Ax · x

=1

2x · Ax +

1

2x · Ax = x · Ax.

Page 52: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

50 CHAPTER 7. THE SPECTRAL THEORY OF SYMMETRIC MATRICES

5. The eigenvalues of A are λ = −1, λ = 3. Since A is indefinite, q has no global minimizer (or globalmaximizer).

7. The eigenvalues of A are λ = 0, λ = 5, and therefore A is positive semidefinite and singular. The vectorb belongs to col(A), and therefore, by Exercise 2, every vector in x∗ +N (A), where x∗ is any solution ofAx = −b, is a global minimizer of q. (In other words, every solution of Ax = −b is a global minimizer.)We can take x∗ = (−1, 0), and N (A) = sp(2,−1).

7.4 Lagrange multipliers

3. The maximizer is x.= (−0.058183, 0.73440,−0.67622), with Lagrange multiplier λ

.= (3.1883,−5.7454)

and f(x).= 17.923, while the minimizer is x

.= (0.67622,−0.73440, 0.058183), with Lagrange multiplier

λ.= (0.81171,−5.7454) and f(x)

.= 14.077.

7. The minimizer, and associated Lagrange multiplier, of f(x) subject to g(x) = u is

x(u) =

(

−√

1 + u√3

,−√

1 + u√3

,−√

1 + u√3

)

, λ(u) = −√

3

2√

1 + u.

We have p(u) = f(x(u)) = −√

3√

1 + u. Therefore, by direct calculation,

∇p(u) = −√

3

2√

1 + u⇒ ∇p(0) = −

√3

2.

On the other hand, the Lagrange multiplier associated with x∗ = x(0) is λ∗ = λ(0) = −√

3/2. Thus∇p(0) = λ∗, as implied by the previous exercise.

7.5 Spectral methods for differential equations

1. The exact solution is u(x) = (x− x2)/2, and the solution obtained by the method of Fourier series is

u(x) =

∞∑

n=1

2(1− (−1)n)

n3π3sin (nπx).

The following graph shows the exact solution u (the solid curve), together with the partial Fourier serieswith 1 (the dashed curve) and 4 terms (the dotted curve).

0 0.2 0.4 0.6 0.8 10

0.02

0.04

0.06

0.08

0.1

0.12

0.14

Notice that the partial Fourier series with 4 terms is already indistinguishable from the exact solution onthis scale.

3. Consider the operator M : C2D[0, ℓ]→ C[0, ℓ] defined by M(u) = −u′′ + u.

Page 53: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

7.5. SPECTRAL METHODS FOR DIFFERENTIAL EQUATIONS 51

(a) For any u, v ∈ C2D[0, ℓ], we have

〈M(u), v〉2 =

∫ 1

0

(−u′′(x) + u(x))v(x) dx

= −∫ 1

0

u′′(x)v(x) dx +

∫ 1

0

u(x)v(x) dx

= − u′(x)v(x)|10 +

∫ 1

0

u′(x)v′(x) dx +

∫ 1

0

u(x)v(x) dx

=

∫ 1

0

u′(x)v′(x) dx +

∫ 1

0

u(x)v(x) dx (since v(0) = v(1) = 0)

= u(x)v′(x)|10 −∫ 1

0

u(x)v′′(x) dx +

∫ 1

0

u(x)v(x) dx

= −∫ 1

0

u(x)v′′(x) dx +

∫ 1

0

u(x)v(x) dx (since u(0) = u(1) = 0)

=

∫ 1

0

u(x)(−v′′(x) + v(x)) dx

= 〈u, M(v)〉2 .

This shows that M is a symmetric operator. Also, notice from the above calculation that

〈M(u), u〉2 =

∫ 1

0

(u′(x))2 dx +

∫ 1

0

(u(x))2 dx,

which shows that 〈M(u), u〉2 > 0 for all u ∈ C2D[0, ℓ], u 6= 0. Then, if λ is an eigenvalue of M and u

is a corresponding eigenfunction with 〈u, u〉2 = 1, then

λ = λ 〈u, u〉2 = 〈λu, u〉2 = 〈M(u), u〉2 > 0,

which shows that all the eigenvalues of M are positive.

(b) It is easy to show that the eigenvalues of M are n2π2 + 1, n = 1, 2, . . ., with corresponding eigen-functions sin (nπx) (the calculation is essentially the same as in Section 7.5.1).

Page 54: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional
Page 55: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

Chapter 8

The singular value decomposition

8.1 Introduction to the SVD

3. The SVD of A is UΣV T , where

U =

1√2

0 1√2

1√2

0 − 1√2

0 −1 0

, Σ =

4√

3 0 0

0√

11 00 0 0

, V =

1√6− 1√

117√66

2√6− 1√

11− 4√

661√6

3√11

1√66

.

The outer product form simplifies to

A = 2

110

[

1 2 1]

+

00−1

[

−1 −1 3]

.

7. If the columns of A are A1, . . . , An, U =[

‖A1‖−1A1| · · · |‖An‖−1An

]

, V = I, and Σ is the diagonal matrixwith diagonal entries ‖A1‖, . . . , ‖An‖, then A = UΣV T is the SVD of A.

11. Suppose A ∈ Cn×n is invertible and A = UΣV ∗ is the SVD of A.

(a) We have (UΣV ∗)∗ = V Σ∗U∗ = V ΣU∗ (since Σ is a real, diagonal matrix), and hence A∗ = V ΣU∗

is the SVD of A.

(b) Since A is invertible, all the diagonal entries of Σ are positive, and hence Σ−1 exists. We have(UΣV ∗)−1 = V Σ−1U∗; however, the diagonal entries of Σ−1, σ−1

1 , . . . , σ−1n , are ordered from smallest

to largest. We obtain the SVD of A−1 by re-ordering. Define W = [vn| · · · |v1], Z = [un| · · · |u1], andlet T be the diagonal matrix with diagonal entries σ−1

n , . . . , σ−11 . Then A−1 = WTZ∗ is the SVD of

A−1.

(c) The SVD of A−∗ is ZTW ∗, where W , T , and Z are defined above.

In outer product form,

A∗ =

n∑

i=1

σi(vi ⊗ ui), A−1 =

n∑

i=1

σ−1i (vi ⊗ ui), A−∗ =

n∑

i=1

σ−1i (ui ⊗ vi).

8.2 The SVD for general matrices

3. We haveprojcol(A)b = (1, 5/2, 5/2, 4) , projN (AT )b = (0,−1/2, 1/2, 0) .

53

Page 56: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

54 CHAPTER 8. THE SINGULAR VALUE DECOMPOSITION

5. Referring to the solution of Exercise 8.1.4, if U = [u1|u2|u3|u4], V = [v1|v2|v3|v4], then u1, u2, u3 isa basis for col(A), u4 is a basis for N (AT ), v1, v2, v3 is a basis for col(AT ), and v4 is a basis forN (A).

9. Let A ∈ Rm×n be nonsingular. We wish to compute min‖Ax‖2 : x ∈ Rn, ‖x‖2 = 1. Note that

‖Ax‖2 =√

x · AT Ax. By Exercise 7.4.5, the minimum value of x ·AT Ax, where ‖x‖2 = 1, is the smallesteigenvalue of AT A, which is σ2

n. Therefore, min‖Ax‖2 : x ∈ Rn, ‖x‖2 = 1 = σn, and the value of xyielding the minimum is vn, the right singular vector corresponding to σn, the smallest singular value.

15. (a) Let U ∈ Cm×m, V ∈ Cn×n be unitary. We wish to prove that ‖UA‖F = ‖A‖F and ‖AV ‖F = ‖A‖Ffor all A ∈ Cm×n. We begin with two preliminary observations. By definition of the Frobeniusnorm, for any A ∈ Cm×n,

‖A‖2F =

n∑

j=1

(

m∑

i=1

|Aij |2)

=

n∑

j=1

‖Aj‖22,

where Aj is the jth column of A. Also, it is obvious that ‖AT ‖F = ‖A‖F for all A ∈ Cm×n.

We thus have

‖UA‖2F =

n∑

j=1

‖(UA)j‖2F =

n∑

j=1

‖UAj‖2F =

n∑

j=1

‖Aj‖2F = ‖A‖2F

and‖AV ‖F = ‖V T AT ‖F = ‖AT ‖F = ‖A‖F ,

as desired.

(b) Let A ∈ Cm×n be given, and let r be a positive integer with r < rank(A). We wish to find thematrix B ∈ Cm×n of rank r such that ‖A−B‖F is as small as possible. If we define B = UΣrV

T ,where Σr ∈ Rm×n is the diagonal matrix with diagonal entries σ1, . . . , σr, 0, . . . , 0, then

‖A−B‖F = ‖UΣV ∗ − UΣrV∗‖F = ‖U(Σ− Σr)V

∗‖F = ‖(Σ− Σr)V∗‖F = ‖Σ− Σr‖F

=√

σ2r+1 + · · ·+ σw

t ,

where t ≤ minm, n is the rank of A. Notice that the rank of a matrix is the number of positive sin-

gular values, so rank(B) = r, as desired. Thus we can make ‖A−B‖F as small as√

σ2r+1 + · · ·+ σw

t .

Moreover, for any B ∈ Cm×n, we have

‖A−B‖2F = ‖Σ− U∗BV ‖2F

=m∑

i=1

n∑

j=1

(

Σij − (U∗BV )ij

)2

=

t∑

i=1

(σi − (U∗BV )ii)2

+

m∑

i=1

t∑

j = 1j 6= i

(U∗BV )2ij +

m∑

i=1

n∑

j=t+1

(U∗BV )2ij .

Now, we are free to choose all the entries of U∗BV (since U∗ and V are invertible, given anyC ∈ Cm×n, there exists a unique B ∈ Cm×n with U∗BV = C) to make the above sum as smallas possible. Since all three summations are nonnegative, we should choose (U∗BV )ij = 0 fori = 1, . . . , m, j = 1, . . . , n, j 6= i for i = 1, . . . , t. This causes the second two summations to vanish,and yields

‖A−B‖2F =

t∑

i=1

(σi − (U∗BV )ii)2.

Page 57: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

8.3. SOLVING LEAST-SQUARES PROBLEMS USING THE SVD 55

The rank of U∗BV (and hence the rank of B) is the number of nonzero diagonal entries

(U∗BV )11, . . . , (U∗BV )tt.

Since the rank of B must be r, it is clear that U∗BV = Σr, where Σr is defined above, will make‖A−B‖F as small as possible. This shows that B = UΣrV

∗ is the desired matrix.

8.3 Solving least-squares problems using the SVD

3. The matrix A has two positive singular values, σ1 = 6 and σ2 = 2. The corresponding right singularvectors are

v1 =

232313

, v2 =

− 1√2

1√2

0

and the left singular vectors are

u1 =

12121212

, u2 =

− 1√2

01√2

0

.

The minimum-norm least-squares solution to Ax = b is then given by

x =u1 · bσ1

v1 +u2 · bσ2

v2.

(a) x = (2/3,−1/3, 1/12);

(b) x = (13/18,−5/18, 1/9);

(c) x = (0, 0, 0) (b is orthogonal to col(A)).

7. Let A ∈ Rm×n have rank r, and let σ1, . . . , σr be the positive singular values of A, with correspondingright singular vectors v1, . . . , vr ∈ Rn and left singular vectors u1, . . . , ur ∈ Rm. Then, for all x ∈ Rn,

Ax =

r∑

i=1

σi(vi · x)ui

and, for all b ∈ Rm,

A†b =

r∑

i=1

ui · bσi

vi.

8.4 The SVD and linear inverse problems

1. (a) Let A ∈ Rm×n be given, let I ∈ Rn×n be the identity matrix, and let ǫ be a positive number. For

any b ∈ Rm, we can solve the equation

[

AǫI

]

x =

[

b0

]

(8.1)

in the least-square sense, which is equivalent to minimizing

[

AǫI

]

x−[

b0

]∥

2

2

=

[

Ax− bǫx

]∥

2

2

.

Page 58: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

56 CHAPTER 8. THE SINGULAR VALUE DECOMPOSITION

Now, for any Euclidean vector w ∈ Rk, partitioned as w = (u, v), u ∈ Rp, v ∈ Rq, p + q = k, wehave ‖w‖22 = ‖u‖22 + ‖v‖22, and hence

[

Ax− bǫx

]∥

2

2

= ‖Ax− b‖22 + ‖ǫx‖22 = ‖Ax− b‖22 + ǫ2‖x‖22.

Thus solving (8.1) in the least-squares sense is equivalent to choosing x ∈ Rn to minimize ‖Ax −b‖22 + ǫ2‖x‖22.

(b) We have[

AǫI

]T

=[

AT ǫI]

,

which implies that

[

AǫI

]T [AǫI

]

= AT A + ǫ2I,

[

AǫI

]T [b0

]

= AT b.

Hence the normal equations for (8.1) take the form (AT A + ǫ2I)x = AT b.

(c) We have[

AǫI

]

x = 0 ⇒[

Axǫx

]

=

[

00

]

,

which yields ǫx = 0, that is, x = 0. Thus the matrix is nonsingular and hence, by the fundamentaltheorem, it has full rank. It follows from Exercise 6.4.1 that AT A+ ǫ2I is invertible, and hence thereis a unique solution xǫ to (AT A + ǫ2I)x = AT b.

3. With the errors drawn from a normal distribution with mean zero and standard deviation 10−4:

• truncated SVD works best with k = 3 singular values/vectors;

• Tikhonov regularization works best with ǫ around 10−3.

8.5 The Smith normal form of a matrix

1. We have A = USV , where

U =

4 2 −16 1 03 2 −1

, S =

1 0 00 3 00 0 15

, V =

1 0 −10 1 40 0 1

.

3. We have A = USV , where

U =

4 0 15 0 17 −1 0

, S =

1 0 00 3 00 0 0

, V =

2 1 41 0 70 0 1

.

Page 59: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

Chapter 9

Matrix factorizations and numerical

linear algebra

9.1 The LU factorization

1.

L =

1 0 0 03 1 0 0−1 −4 1 0−2 0 5 1

, U =

1 3 2 40 1 −1 30 0 1 −10 0 0 1

.

7. Let

L =

[

1 0ℓ 1

]

, U =

[

u v0 w

]

,

and notice that

LU =

[

u vℓu ℓv + w

]

.

(a) We wish to show that there do not exist matrices L, U of the above forms such that LU = A, where

A =

[

0 11 1

]

.

This is straightforward, since LU = A implies u = 0 (comparing the 1, 1 entries) and also ℓu = 1(comparing the 2, 1 entries). No choice of ℓ, u, v, w can make both of these true, and hence theredo not exist such L and U .

(b) If

A =

[

0 10 1

]

,

then LU = A is equivalent to u = 0, v = 1, and ℓ + w = 1. There are infinitely many choices of ℓand w that will work, and hence there exist infinitely many L, U satisfying LU = A.

11. Computing A−1 is equivalent to solving Ax = ej for j = 1, 2, . . . , n. If we compute the LU factorizationand then solve the n systems LU = ej, j = 1, . . . , n, the operation count is

2

3n3 − 1

2n2 − 1

6n + n

(

2n2 − n)

=8

3n3 +

1

2n2 − 7

6n.

57

Page 60: Finite-Dimensional Linear Algebra Solutions to …pages.mtu.edu/~msgocken/fdlabook/SSManual.pdf · Finite-Dimensional Linear Algebra ... First line should read “Let X be a finite-dimensional

58 CHAPTER 9. MATRIX FACTORIZATIONS AND NUMERICAL LINEAR ALGEBRA

We can reduce the above operation count by taking advantage of the zeros in the vectors e_j. Solving Lc = e_j takes the following form:

c_i = 0, i = 1, ..., j − 1,

c_j = 1,

c_i = −Σ_{k=j}^{i−1} L_{ik} c_k, i = j + 1, ..., n.

Thus solving Lc = e_j requires Σ_{i=j+1}^{n} 2(i − j) = (n − j)² + (n − j) operations. The total for solving all n of the lower triangular systems Lc = e_j, j = 1, ..., n, is

Σ_{j=1}^{n} [(n − j)² + (n − j)] = (1/3)n³ − (1/3)n

(instead of n(n² − n) = n³ − n² if we ignore the structure of the right-hand side). We still need n³ operations to solve the n upper triangular systems UA_j = L⁻¹e_j, where A_j denotes the jth column of A⁻¹ (since we perform back substitution, there is no simplification from the fact that the first j − 1 entries of L⁻¹e_j are zero). Hence the total is

(2/3)n³ − (1/2)n² − (1/6)n + (1/3)n³ − (1/3)n + n³ = 2n³ − (1/2)n² − (1/2)n.

Notice the reduction in the leading term from (8/3)n3 to 2n3.
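A sketch of the column-by-column inversion described above, using SciPy's LU routines (these add partial pivoting, which the operation count above ignores; the test matrix is arbitrary):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(2)
    n = 6
    A = rng.standard_normal((n, n))

    lu, piv = lu_factor(A)  # one O(n^3) factorization, reused n times
    A_inv = np.column_stack([lu_solve((lu, piv), e) for e in np.eye(n)])
    print(np.allclose(A_inv, np.linalg.inv(A)))  # True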

9.2 Partial pivoting

1. The solution is x = (2, 1, −1); partial pivoting requires interchanging rows 1 and 2 on the first step, and interchanging rows 2 and 3 on the second step.

5. Suppose A ∈ R^{n×n} has an LU decomposition, A = LU. We know that the determinant of a square matrix is the product of its eigenvalues, and also that the determinant of a triangular matrix is the product of its diagonal entries. We therefore have det(A) = λ₁λ₂···λₙ, where λ₁, λ₂, ..., λₙ are the eigenvalues of A (listed according to multiplicity), and also

det(A) = det(LU) = det(L)det(U) = (1 · 1 ··· 1)(U₁₁U₂₂···Uₙₙ) = U₁₁U₂₂···Uₙₙ.

This shows that λ₁λ₂···λₙ = U₁₁U₂₂···Uₙₙ.

9. On the first step, partial pivoting requires that rows 1 and 3 be interchanged. No interchange is necessary on step 2, and rows 3 and 4 must be interchanged on step 3. Thus

\[
P = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}.
\]

The LU factorization of PA is PA = LU, where

\[
L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0.5 & 1 & 0 & 0 \\ 0.25 & -0.5 & 1 & 0 \\ -0.25 & -0.1 & 0.25 & 1 \end{bmatrix}, \quad
U = \begin{bmatrix} 6 & 2 & -1 & 4 \\ 0 & 3 & 1 & -2 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]
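SciPy computes this factorization, but with a different convention: scipy.linalg.lu returns P, L, U with A = PLU, so the permutation satisfying PA = LU in the sense used here is the transpose of the returned P. A sketch on an arbitrary matrix:

    import numpy as np
    from scipy.linalg import lu

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))

    P, L, U = lu(A)
    print(np.allclose(P.T @ A, L @ U))  # True: (P^T) A = LU
    # Partial pivoting keeps every multiplier at most 1 in magnitude.
    print(np.all(np.abs(L[np.tril_indices(4, -1)]) <= 1.0))  # True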


9.3 The Cholesky factorization

1. (a) The Cholesky factorization is A = R^T R, where

\[
R = \begin{bmatrix} 1 & -3 & 2 \\ 0 & 2 & -2 \\ 0 & 0 & 2 \end{bmatrix}.
\]

(b) We also have A = LDL^T, where L = (D^{−1/2}R)^T and the diagonal entries of D are the diagonal entries of R:

\[
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}, \quad
L = \begin{bmatrix} 1 & 0 & 0 \\ -3 & \sqrt{2} & 0 \\ 2 & -\sqrt{2} & \sqrt{2} \end{bmatrix}.
\]

Alternatively, we can write A = LU, where U = DR and L = (D⁻¹R)^T:

\[
L = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 2 & -1 & 1 \end{bmatrix}, \quad
U = \begin{bmatrix} 1 & -3 & 2 \\ 0 & 4 & -4 \\ 0 & 0 & 4 \end{bmatrix}.
\]
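A quick numerical check of part (a) and the LDL^T factorization of part (b) (a sketch assuming NumPy; numpy.linalg.cholesky returns the lower triangular factor L_c with A = L_c L_c^T, so R = L_c^T):

    import numpy as np

    A = np.array([[ 1.,  -3.,   2.],
                  [-3.,  13., -10.],
                  [ 2., -10.,  12.]])  # A = R^T R for the R above

    R = np.linalg.cholesky(A).T
    print(R)  # [[1 -3 2], [0 2 -2], [0 0 2]]

    # Part (b): D = diag(R), L = (D^{-1/2} R)^T, and A = L D L^T.
    d = np.diag(R)
    D = np.diag(d)
    L = (np.diag(1.0 / np.sqrt(d)) @ R).T
    print(np.allclose(L @ D @ L.T, A))  # True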

5. Let A ∈ R^{n×n} be SPD. The algorithm described on pages 527–528 of the text (that is, the equations derived on those pages) shows that there is a unique upper triangular matrix R with positive diagonal entries such that R^T R = A. (The only freedom in solving those equations for the entries of R lies in choosing the positive or negative square root when computing R_ii. If R_ii is constrained to be positive, then the entries of R are uniquely determined.)

9.4 Matrix norms

1. Let ‖·‖ be any induced matrix norm on R^{n×n}. If λ ∈ C, x ∈ C^n, x ≠ 0, form an eigenvalue/eigenvector pair of A, then

‖Ax‖ ≤ ‖A‖‖x‖ ⇒ ‖λx‖ ≤ ‖A‖‖x‖ ⇒ |λ|‖x‖ ≤ ‖A‖‖x‖ ⇒ |λ| ≤ ‖A‖

(the last step follows from the fact that x ≠ 0). Since this holds for every eigenvalue of A, it follows that

ρ(A) = max{ |λ| : λ is an eigenvalue of A } ≤ ‖A‖.

7. Let A ∈ R^{m×n}. We wish to prove that ‖A^T‖₂ = ‖A‖₂. This follows immediately from Theorem 403 and Exercise 4.5.14. Theorem 403 implies that

‖A‖₂ = √(λ_max(A^T A)), ‖A^T‖₂ = √(λ_max(AA^T)).

By Exercise 4.5.14, A^T A and AA^T have the same nonzero eigenvalues, and hence λ_max(A^T A) = λ_max(AA^T), from which it follows that ‖A‖₂ = ‖A^T‖₂.
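Both results are easy to test numerically. A sketch (assuming NumPy; the matrix is a random example) checks ρ(A) ≤ ‖A‖ for the induced 1-, 2-, and ∞-norms, and the equality ‖A^T‖₂ = ‖A‖₂:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 5))

    rho = np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius
    for p in (1, 2, np.inf):
        print(rho <= np.linalg.norm(A, p))  # True for each induced norm

    print(np.isclose(np.linalg.norm(A, 2), np.linalg.norm(A.T, 2)))  # True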

9.5 The sensitivity of linear systems to errors

1. Let A ∈ R^{n×n} be nonsingular.

(a) Suppose b ∈ R^n is given. Choose c ∈ R^n such that

‖A⁻¹c‖/‖c‖ = sup{ ‖A⁻¹x‖/‖x‖ : x ∈ R^n, x ≠ 0 } = ‖A⁻¹‖,

and notice that ‖A⁻¹c‖ = ‖A⁻¹‖‖c‖. (Such a c exists. For the common norms (the ℓ¹, ℓ∞, and Euclidean norms) we have seen in the text how to compute such a c; for an arbitrary norm, it can be shown that such a c exists, although some results from analysis concerning continuity and compactness are needed.) Define b̂ = b + c, x = A⁻¹b, and x̂ = A⁻¹(b + c). Then

‖x − x̂‖ = ‖A⁻¹(b + c) − A⁻¹b‖ = ‖A⁻¹c‖ = ‖A⁻¹‖‖c‖ = ‖A⁻¹‖‖b − b̂‖.

Notice that, for the Euclidean norm ‖·‖₂ and the induced matrix norm, c should be chosen to be a right singular vector corresponding to the smallest singular value of A.

(b) Let A ∈ R^{n×n} be given, and let x, c ∈ R^n be chosen so that

‖Ax‖/‖x‖ = sup{ ‖Ay‖/‖y‖ : y ∈ R^n, y ≠ 0 } = ‖A‖,

‖A⁻¹c‖/‖c‖ = sup{ ‖A⁻¹y‖/‖y‖ : y ∈ R^n, y ≠ 0 } = ‖A⁻¹‖.

Define b = Ax, b̂ = b + c, and x̂ = A⁻¹(b + c) = x + A⁻¹c. We then have

‖b‖ = ‖Ax‖ = ‖A‖‖x‖ ⇒ ‖x‖ = ‖b‖/‖A‖

and

‖x − x̂‖ = ‖A⁻¹c‖ = ‖A⁻¹‖‖c‖ = ‖A⁻¹‖‖b − b̂‖.

It follows that

‖x − x̂‖/‖x‖ = ‖A⁻¹‖‖b − b̂‖/(‖b‖/‖A‖) = ‖A‖‖A⁻¹‖ · (‖b − b̂‖/‖b‖).

5. Let A ∈ R^{n×n} be invertible, and let ‖·‖ denote any norm on R^n and the corresponding induced matrix norm.

(a) Let B ∈ R^{n×n} be any singular matrix. We have seen (Exercise 9.4.9) that ‖Ax‖ ≥ ‖x‖/‖A⁻¹‖ for all x ∈ R^n. Let x ∈ N(B) with ‖x‖ = 1. Then

‖A − B‖ ≥ ‖(A − B)x‖ = ‖Ax‖ ≥ ‖x‖/‖A⁻¹‖ = 1/‖A⁻¹‖.

(b) It follows that

inf{ ‖A − B‖/‖A‖ : B ∈ R^{n×n}, det(B) = 0 } ≥ inf{ (1/‖A⁻¹‖)/‖A‖ : B ∈ R^{n×n}, det(B) = 0 } = 1/cond(A).

(c) Consider the special case of the Euclidean norm on R^n and induced norm ‖·‖₂ on R^{n×n}. Let A = UΣV^T be the SVD of A, and define A′ = UΣ′V^T, where Σ′ is the diagonal matrix with diagonal entries σ₁, ..., σ_{n−1}, 0 (σ₁ ≥ ··· ≥ σ_{n−1} ≥ σ_n > 0 are the singular values of A). Then

‖A − A′‖₂ = ‖UΣV^T − UΣ′V^T‖₂ = ‖U(Σ − Σ′)V^T‖₂ = ‖Σ − Σ′‖₂ = σ_n = 1/‖A⁻¹‖₂

(‖Σ − Σ′‖₂ is the largest singular value of Σ − Σ′, which is a diagonal matrix with a single nonzero entry, σ_n). It follows that

‖A − A′‖₂/‖A‖₂ = 1/(‖A‖₂‖A⁻¹‖₂) = 1/cond₂(A).

Hence the inequality derived in part (b) is an equality in this case.
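Part (c) can be illustrated numerically: zeroing out the smallest singular value produces a singular matrix at 2-norm distance exactly σ_n = 1/‖A⁻¹‖₂. A sketch (assuming NumPy, arbitrary test matrix):

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((5, 5))

    U, s, Vt = np.linalg.svd(A)
    s_prime = s.copy()
    s_prime[-1] = 0.0                      # drop the smallest singular value
    A_prime = U @ np.diag(s_prime) @ Vt    # nearest singular matrix in ||.||_2

    print(np.linalg.matrix_rank(A_prime))  # n - 1 = 4
    print(np.isclose(np.linalg.norm(A - A_prime, 2), s[-1]))                # True
    print(np.isclose(s[-1], 1.0 / np.linalg.norm(np.linalg.inv(A), 2)))     # True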


9.6 Numerical stability

1. (a) Suppose x and y are two real numbers and x̂ and ŷ are perturbations of x and y, respectively. We have

x̂ŷ − xy = x̂ŷ − x̂y + x̂y − xy = x̂(ŷ − y) + (x̂ − x)y,

which implies that

|x̂ŷ − xy| ≤ |x̂||ŷ − y| + |x̂ − x||y|.

This gives a bound on the absolute error in approximating xy by x̂ŷ. Dividing by |xy| yields

|x̂ŷ − xy|/|xy| ≤ (|x̂|/|x|)·(|ŷ − y|/|y|) + |x̂ − x|/|x|.

This yields a bound on the relative error in xy in terms of the relative errors in x and y, althoughit would be preferable if the bound did not contain x (except in the expression for the relative errorin x). We can manipulate the bound as follows:

|xy − xy||xy| ≤ |x||x|

|y − y||y| +

|x− x||x| =

|y − y||y| +

|x− x||x| +

( |x||x| − 1

) |y − y||y|

≤ |y − y||y| +

|x− x||x| +

|x||x| − 1

|y − y||y|

=|y − y||y| +

|x− x||x| +

||x| − |x|||x|

|y − y||y|

≤ |y − y||y| +

|x− x||x| +

|x− x||x|

|y − y||y| .

When the relative errors in x and y are small (that is, much less than 1), then their product ismuch smaller, and we see that the relative error in xy as an approximation to xy is approximatelybounded by the sum of the errors in x and y.

(b) If x and y are floating point numbers, then fl(xy) = xy(1 + ε), where |ε| ≤ u. Therefore, fl(xy) is the exact product of x̂ and ŷ, where x̂ = x and ŷ = y(1 + ε). This shows that the computed product is the exact product of nearby numbers, and therefore that floating point multiplication is backward stable.
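The bound |ε| ≤ u for a single floating-point multiplication can be observed directly. A sketch using Python's fractions module for exact rational arithmetic (u = 2⁻⁵³ for IEEE double precision):

    import sys
    from fractions import Fraction

    x, y = 0.1, 0.3
    exact = Fraction(x) * Fraction(y)  # exact product of the stored floats
    computed = Fraction(x * y)         # the floating-point product fl(xy)

    eps = abs((computed - exact) / exact)
    u = Fraction(sys.float_info.epsilon) / 2  # unit roundoff 2^{-53}
    print(float(eps), eps <= u)  # tiny relative error, True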

9.7 The sensitivity of the least-squares problem

3. Let A ∈ R^{m×n} and b ∈ R^m be given, and let x be a least-squares solution of Ax = b. We have

b = Ax + (b − Ax) ⇒ b · Ax = (Ax) · (Ax) + (b − Ax) · Ax = ‖Ax‖₂²,

since (b − Ax) · Ax = 0 (b − Ax is orthogonal to col(A)). It follows that

‖b‖₂‖Ax‖₂ cos(θ) = ‖Ax‖₂² ⇒ ‖Ax‖₂ = ‖b‖₂ cos(θ),

where θ is the angle between Ax and b. Also, since Ax and b − Ax are orthogonal, the Pythagorean theorem implies

‖b‖₂² = ‖Ax‖₂² + ‖b − Ax‖₂².

Dividing both sides by ‖b‖₂² yields

1 = ‖Ax‖₂²/‖b‖₂² + ‖b − Ax‖₂²/‖b‖₂² ⇒ 1 = cos²(θ) + ‖b − Ax‖₂²/‖b‖₂².

Therefore,

‖b − Ax‖₂²/‖b‖₂² = sin²(θ) ⇒ ‖b − Ax‖₂ = ‖b‖₂ sin(θ).
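A numerical check of the two identities on a random least-squares problem (a sketch assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((10, 3))
    b = rng.standard_normal(10)

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = b - A @ x  # residual, orthogonal to col(A)
    theta = np.arccos((b @ (A @ x)) /
                      (np.linalg.norm(b) * np.linalg.norm(A @ x)))

    print(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(b) * np.cos(theta)))  # True
    print(np.isclose(np.linalg.norm(r), np.linalg.norm(b) * np.sin(theta)))      # True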


9.8 The QR factorization

1. Let x, y ∈ R³ be defined by x = (1, 2, 1) and y = (2, 1, 1). We define

u = (x − y)/‖x − y‖₂ = (−1/√2, 1/√2, 0),

\[
Q = I - 2uu^T = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Then Qx = y.
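The computation is easy to verify. A sketch (assuming NumPy) builds the reflector and applies it to x:

    import numpy as np

    x = np.array([1., 2., 1.])
    y = np.array([2., 1., 1.])

    u = (x - y) / np.linalg.norm(x - y)
    Q = np.eye(3) - 2.0 * np.outer(u, u)  # Householder reflector

    print(Q)      # swaps the first two coordinates
    print(Q @ x)  # [2. 1. 1.] = y (valid since ||x||_2 = ||y||_2)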

3. We have A = QR, where

\[
Q = \begin{bmatrix} -0.42640 & 0.65970 & 0.61885 \\ -0.63960 & -0.70368 & 0.30943 \\ 0.63960 & -0.26388 & 0.72199 \end{bmatrix}, \quad
R = \begin{bmatrix} -4.6904 & -3.8376 & 2.7716 \\ 0 & 2.0671 & -2.1110 \\ 0 & 0 & 9.2828 \end{bmatrix}
\]

(correct to the digits shown). The Householder vectors are

u₁ = (0.84451, 0.37868, −0.37868), u₂ = (−0.99987, 0.015968).

9.9 Eigenvalues and simultaneous iteration

1. We apply n iterations of the power method, normalizing the approximate eigenvectors in the Euclidean norm at each iteration, and estimating the eigenvalue by λ = (x · Ax)/(x · x) = x · Ax (the last equality holding since ‖x‖₂ = 1). We use a random starting vector. With n = 10, we have

x₀ = (0.37138, −0.22558, 1.1174),

x₁₀ = (0.40779, −0.8165, 0.40871),

λ ≈ 3.9999991538580488.

With n = 20, we obtain

x₀ = (−1.0891, 0.032557, 0.55253),

x₂₀ = (−0.40825, 0.8165, −0.40825),

λ ≈ 3.9999999999593752.

It seems clear that the dominant eigenvalue is λ = 4.
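For reference, here is a minimal sketch of the iteration just described (assuming NumPy). The matrix is only a stand-in with the same dominant eigenpair, λ = 4 with eigenvector proportional to (1, −2, 1); it is not necessarily the matrix from the exercise:

    import numpy as np

    A = np.array([[ 2., -1.,  0.],
                  [-1.,  3., -1.],
                  [ 0., -1.,  2.]])

    rng = np.random.default_rng(7)
    x = rng.standard_normal(3)
    for _ in range(20):
        x = A @ x
        x /= np.linalg.norm(x)  # normalize in the Euclidean norm

    print(x, x @ A @ x)  # approx (0.408, -0.816, 0.408) up to sign, and 4.0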

5. Let A ∈ R^{n×n}. We wish to prove that there exists an orthogonal matrix Q such that Q^T AQ is block upper triangular, with each diagonal block of size 1×1 or 2×2. If all the eigenvalues of A are real, then the proof of Theorem 413 can be given using only real numbers and vectors, and the result is immediate. Therefore, let λ, λ̄ be a complex conjugate pair of eigenvalues of A. By an induction argument similar to that in the proof of Theorem 413, it suffices to prove that there exists an orthogonal matrix Q ∈ R^{n×n} such that

\[
Q^T A Q = \begin{bmatrix} T & B \\ 0 & C \end{bmatrix},
\]

where T ∈ R^{2×2} has eigenvalues λ, λ̄. Suppose z = x + iy, z ≠ 0, x, y ∈ R^n, satisfies Az = λz. In the solution of the previous exercise, we saw that {x, y} is linearly independent, so define S = sp{x, y} ⊂ R^n and Ŝ = sp{x, y} ⊂ C^n (so that Ŝ = sp{z, z̄}). Let {q₁, q₂} be an orthonormal basis for S, and extend it to an


orthonormal basis {q₁, q₂, ..., qₙ} for R^n. Define Q₁ = [q₁|q₂], Q₂ = [q₃|···|qₙ], and Q = [Q₁|Q₂]. We then have

\[
Q^T A Q = \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix}
\begin{bmatrix} AQ_1 \,|\, AQ_2 \end{bmatrix}
= \begin{bmatrix} Q_1^T A Q_1 & Q_1^T A Q_2 \\ Q_2^T A Q_1 & Q_2^T A Q_2 \end{bmatrix}.
\]

Since both columns of AQ₁ belong to S and each column of Q₂ belongs to S^⊥, it follows that Q₂^T AQ₁ = 0. Thus it remains only to prove that the eigenvalues of T = Q₁^T AQ₁ are λ, λ̄. We see that η ∈ C, u ∈ C² form an eigenpair of T if and only if

Tu = ηu ⇔ Q₁^T AQ₁u = ηu ⇔ A(Q₁u) = η(Q₁u).

Since both z and z̄ can be written as Q₁u for some u ∈ C², it follows that both λ and λ̄ are eigenvalues of T; moreover, since T is 2×2, these are the only eigenvalues of T. This completes the proof.

9.10 The QR algorithm

1. Two steps are required to reduce A to upper Hessenberg form. The result is

\[
H = \begin{bmatrix}
-2.0000 & 4.8507 \times 10^{-1} & -1.1021 \times 10^{-1} & 8.6750 \times 10^{-1} \\
-4.1231 & 3.1765 & 1.3851 & -7.6145 \times 10^{-1} \\
0 & -1.4240 & 1.9481 & 1.6597 \times 10^{-1} \\
0 & 0 & 8.9357 \times 10^{-1} & -3.1246
\end{bmatrix}
\]

and the vectors defining the two Householder transformations are

u₁ = (8.6171×10⁻¹, −2.8146×10⁻¹, 4.2219×10⁻¹), u₂ = (9.3689×10⁻¹, 3.4964×10⁻¹).
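The same reduction is available in SciPy. A sketch on an arbitrary 4×4 matrix (not the exercise's matrix), checking A = QHQ^T and the Hessenberg pattern:

    import numpy as np
    from scipy.linalg import hessenberg

    rng = np.random.default_rng(8)
    A = rng.standard_normal((4, 4))

    H, Q = hessenberg(A, calc_q=True)
    print(np.allclose(Q @ H @ Q.T, A))       # True
    print(np.allclose(np.tril(H, -2), 0.0))  # True: zeros below the subdiagonal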

5. The inequality

|λ_{k+1} − µ| / |λ_k − µ| < |λ_{k+1}| / |λ_k|   (9.1)

is equivalent to

|λ_{k+1} − µ| / |λ_{k+1}| < |λ_k − µ| / |λ_k|,

and hence (9.1) holds if and only if the relative error in µ as an estimate of λ_{k+1} is less than the relative error in µ as an estimate of λ_k.


Chapter 10

Analysis in vector spaces

10.1 Analysis in Rn

3. Let ‖·‖ and ‖·‖∗ be two norms on R^n. Since ‖·‖ and ‖·‖∗ are equivalent, there exist positive constants c₁, c₂ such that c₁‖x‖ ≤ ‖x‖∗ ≤ c₂‖x‖ for all x ∈ R^n. Suppose that x_k → x under ‖·‖, and let ε > 0. Then there exists a positive integer N such that ‖x_k − x‖ < ε/c₂ for all k ≥ N. It follows that

‖x_k − x‖∗ ≤ c₂‖x_k − x‖ < c₂(ε/c₂) = ε

for all k ≥ N. Therefore, x_k → x under ‖·‖∗.

Conversely, if x_k → x under ‖·‖∗ and ε > 0 is given, there exists a positive integer N such that ‖x_k − x‖∗ < c₁ε for all k ≥ N. It follows that ‖x_k − x‖ ≤ c₁⁻¹‖x_k − x‖∗ < c₁⁻¹c₁ε = ε for all k ≥ N. Therefore, x_k → x under ‖·‖.

7. Let ‖·‖ and ‖·‖∗ be two norms on R^n, let S be a nonempty subset of R^n, let f : S → R^n be a function,

and let y be an accumulation point of S. Since ‖·‖ and ‖·‖∗ are equivalent, there exist positive constants c₁, c₂ such that c₁‖x‖ ≤ ‖x‖∗ ≤ c₂‖x‖ for all x ∈ R^n. Suppose first that lim_{x→y} f(x) = L under ‖·‖, and let ε > 0 be given. Then there exists δ > 0 such that if x ∈ S and ‖x − y‖ < δ, then |f(x) − L| < ε. But then

‖x − y‖∗ < c₁δ ⇒ ‖x − y‖ ≤ c₁⁻¹‖x − y‖∗ < c₁⁻¹c₁δ = δ.

Therefore, if x ∈ S and ‖x − y‖∗ < c₁δ, it follows that ‖x − y‖ < δ, and hence that |f(x) − L| < ε. This shows that lim_{x→y} f(x) = L under ‖·‖∗.

Conversely, suppose that lim_{x→y} f(x) = L under ‖·‖∗, and let ε > 0 be given. Then there exists δ > 0 such that if x ∈ S and ‖x − y‖∗ < δ, then |f(x) − L| < ε. But then

‖x − y‖ < c₂⁻¹δ ⇒ ‖x − y‖∗ ≤ c₂‖x − y‖ < c₂c₂⁻¹δ = δ.

Therefore, if x ∈ S and ‖x − y‖ < c₂⁻¹δ, it follows that ‖x − y‖∗ < δ, and hence that |f(x) − L| < ε. This shows that lim_{x→y} f(x) = L under ‖·‖.

11. Let ‖·‖ and ‖·‖∗ be two norms on R^n, and let {x_k} be a sequence in R^n. Since ‖·‖ and ‖·‖∗ are

equivalent, there exist positive constants c₁, c₂ such that c₁‖x‖ ≤ ‖x‖∗ ≤ c₂‖x‖ for all x ∈ R^n. Suppose first that {x_k} is Cauchy under ‖·‖, and let ε > 0 be given. Then there exists a positive integer N such that m, n ≥ N implies that ‖x_m − x_n‖ < c₂⁻¹ε. But then m, n ≥ N implies that

‖x_m − x_n‖∗ ≤ c₂‖x_m − x_n‖ < c₂c₂⁻¹ε = ε,

and hence {x_k} is Cauchy under ‖·‖∗.

Conversely, suppose {x_k} is Cauchy under ‖·‖∗, and let ε > 0 be given. Then there exists a positive integer N such that m, n ≥ N implies that ‖x_m − x_n‖∗ < c₁ε. But then m, n ≥ N implies that

‖x_m − x_n‖ ≤ c₁⁻¹‖x_m − x_n‖∗ < c₁⁻¹c₁ε = ε,

and hence {x_k} is Cauchy under ‖·‖.


10.2 Infinite-dimensional vector spaces

3. Suppose {f_k} is a Cauchy sequence in C[a, b] (under the L∞ norm) that converges pointwise to f : [a, b] → R. We wish to prove that f_k → f in the L∞ norm. By Theorem 442, C[a, b] is complete under ‖·‖∞, and hence there exists a function g ∈ C[a, b] such that ‖f_k − g‖∞ → 0 as k → ∞. By the previous exercise, {f_k} converges uniformly to g and hence, in particular, g(x) = lim_{k→∞} f_k(x) for all x ∈ [a, b] (cf. the discussion on page 593 in the text). However, by assumption, f(x) = lim_{k→∞} f_k(x) for all x ∈ [a, b]. This proves that g(x) = f(x) for all x ∈ [a, b], that is, that g = f. Thus ‖f_k − f‖∞ → 0 as k → ∞.

10.3 Functional analysis

3. Let V be a normed vector space. We wish to prove that V∗ is complete. Let {f_k} be a Cauchy sequence in V∗, let v ∈ V, v ≠ 0, be fixed, and let ε > 0 be given. Then, since {f_k} is Cauchy, there exists a positive integer N such that ‖f_n − f_m‖_{V∗} < ε/‖v‖ for all m, n ≥ N. By definition of ‖·‖_{V∗}, it follows that |f_n(v) − f_m(v)| < ε for all m, n ≥ N. This proves that {f_k(v)} is a Cauchy sequence of real numbers and hence converges. We define f(v) = lim_{k→∞} f_k(v). Since v was an arbitrary element of V, this defines f : V → R. Moreover, it is easy to show that f is linear:

f(αv) = lim_{k→∞} f_k(αv) = lim_{k→∞} αf_k(v) = α lim_{k→∞} f_k(v) = αf(v),

f(u + v) = lim_{k→∞} f_k(u + v) = lim_{k→∞} (f_k(u) + f_k(v)) = lim_{k→∞} f_k(u) + lim_{k→∞} f_k(v) = f(u) + f(v).

We can also show that f is bounded. Since {f_k} is Cauchy under ‖·‖_{V∗}, it is easy to show that {‖f_k‖_{V∗}} is a bounded sequence of real numbers, that is, that there exists M > 0 such that ‖f_k‖_{V∗} ≤ M for all k. Therefore, if v ∈ V, ‖v‖ ≤ 1, then |f(v)| = |lim_{k→∞} f_k(v)| = lim_{k→∞} |f_k(v)| ≤ lim_{k→∞} ‖f_k‖_{V∗}‖v‖ ≤ M‖v‖. Thus f is bounded, and hence f ∈ V∗. Finally, we must show that f_k → f under ‖·‖_{V∗}, that is, that ‖f_k − f‖_{V∗} → 0 as k → ∞. Let ε > 0 be given. Since {f_k} is Cauchy under ‖·‖_{V∗}, there exists a positive integer N such that ‖f_m − f_n‖_{V∗} < ε/2 for all m, n ≥ N. We will show that |f_n(v) − f(v)| < ε for all v ∈ V, ‖v‖ ≤ 1, and all n ≥ N, which then implies that ‖f_n − f‖_{V∗} < ε for all n ≥ N and completes the proof. For any v ∈ V, ‖v‖ ≤ 1, and all n, m ≥ N, we have

|f_n(v) − f(v)| ≤ |f_n(v) − f_m(v)| + |f_m(v) − f(v)| < ε/2 + |f_m(v) − f(v)|.

Moreover, since f_m(v) → f(v), there exists m ≥ N such that |f_m(v) − f(v)| ≤ ε/2. Therefore, it follows that |f_n(v) − f(v)| < ε for all n ≥ N. This holds for all v ∈ V (notice that N is independent of v), and the proof is complete.

10.4 Weak convergence

5. Let V be a normed linear space over R, let x be any vector in V, and let S = B_r(x) = {y ∈ V : ‖y − x‖ < r}, where r > 0. Then, for any y, z ∈ S and α, β ∈ [0, 1] with α + β = 1, we have

‖αy + βz − x‖ = ‖αy + βz − αx − βx‖ = ‖α(y − x) + β(z − x)‖ ≤ α‖y − x‖ + β‖z − x‖ < αr + βr = r.

This shows that αy + βz ∈ S, and hence that S is convex.

7. Let f : R^n → R be convex and continuously differentiable, and let x, y ∈ R^n be given. By the previous exercise,

f(x) ≥ f(y) +∇f(y) · (x− y),

f(y) ≥ f(x) +∇f(x) · (y − x).


Adding these two inequalities yields

f(x) + f(y) ≥ f(y) + f(x) +∇f(y) · (x− y) +∇f(x) · (y − x).

Canceling the common terms and rearranging yields

(∇f(x)−∇f(y)) · (x− y) ≥ 0.
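This monotonicity inequality is easy to observe numerically. A sketch (assuming NumPy) for the convex, continuously differentiable function f(x) = x · Mx with M symmetric positive definite, for which ∇f(x) = 2Mx:

    import numpy as np

    rng = np.random.default_rng(9)
    B = rng.standard_normal((4, 4))
    M = B.T @ B + 4.0 * np.eye(4)  # symmetric positive definite

    grad = lambda x: 2.0 * M @ x
    for _ in range(5):
        x, y = rng.standard_normal(4), rng.standard_normal(4)
        # (grad f(x) - grad f(y)) . (x - y) = 2 (x-y)^T M (x-y) >= 0
        print((grad(x) - grad(y)) @ (x - y) >= 0)  # True every time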