
Linear Algebra 2

Course Notes for MATH 235

Edition 1.1

D. Wolczuk

Copyright: D. Wolczuk, 1st Edition, 2011


Contents

7 Fundamental Subspaces
  7.1 Bases of Fundamental Subspaces
  7.2 Subspaces of Linear Mappings

8 Linear Mappings
  8.1 General Linear Mappings
  8.2 Rank-Nullity Theorem
  8.3 Matrix of a Linear Mapping
  8.4 Isomorphisms

9 Inner Products
  9.1 Inner Product Spaces
  9.2 Orthogonality and Length
  9.3 The Gram-Schmidt Procedure
  9.4 General Projections
  9.5 The Fundamental Theorem
  9.6 The Method of Least Squares

10 Applications of Orthogonal Matrices
  10.1 Orthogonal Similarity
  10.2 Orthogonal Diagonalization
  10.3 Quadratic Forms
  10.4 Graphing Quadratic Forms
  10.5 Optimizing Quadratic Forms
  10.6 Singular Value Decomposition

11 Complex Vector Spaces
  11.1 Complex Number Review
  11.2 Complex Vector Spaces
  11.3 Complex Diagonalization
  11.4 Complex Inner Products
  11.5 Unitary Diagonalization
  11.6 Cayley-Hamilton Theorem


Chapter 7

Fundamental Subspaces

The main purpose of this chapter is to review a few important concepts from the first six chapters. These concepts include subspaces, bases, dimension, and linear mappings. As you will soon see, the rest of the book relies heavily on these and other concepts from the first six chapters.

7.1 Bases of Fundamental Subspaces

Recall from Math 136 the four fundamental subspaces of a matrix.

DEFINITION (Fundamental Subspaces)

Let A be an m x n matrix. The four fundamental subspaces of A are:

1. The columnspace of A: Col(A) = {A~x in R^m | ~x in R^n}.
2. The rowspace of A: Row(A) = {A^T~x in R^n | ~x in R^m}.
3. The nullspace of A: Null(A) = {~x in R^n | A~x = ~0}.
4. The left nullspace of A: Null(A^T) = {~x in R^m | A^T~x = ~0}.

THEOREM 1: Let A be an m x n matrix. Then Col(A) and Null(A^T) are subspaces of R^m, and Row(A) and Null(A) are subspaces of R^n.

Our goal now is to find an easy way to determine a basis for each of the four fundamental subspaces.

REMARK

To help you understand the following two proofs, you may wish to pick a simple 3 x 2 matrix A and follow the steps of each proof with your matrix A.


THEOREM 2: Let A be an m x n matrix. The columns of A which correspond to leading ones in the reduced row echelon form of A form a basis for Col(A). Moreover,

dim Col(A) = rank A

Proof: We first observe that if A is the zero matrix, then the result is trivial. Hence, we can assume that rank A = r > 0.

Denote the columns of the reduced row echelon form R of A by ~r_1, ..., ~r_n. Since rank A = r, R contains r leading ones. Let t_1, ..., t_r denote the indices of the columns of R which contain leading ones. We will first show that B = {~r_{t_1}, ..., ~r_{t_r}} is a basis for Col(R).

Observe that by the definition of the reduced row echelon form, the vectors ~r_{t_1}, ..., ~r_{t_r} are distinct standard basis vectors of R^m and hence form a linearly independent set. Additionally, every column of R which does not contain a leading one can be written as a linear combination of the columns which do contain leading ones, so Span B = Col(R). Therefore, B is a basis for Col(R) as claimed.

Denote the columns of A by ~a_1, ..., ~a_n. We will now show that C = {~a_{t_1}, ..., ~a_{t_r}} is a basis for Col(A) by using the fact that B is a basis for Col(R). To do this, we first need to find a relationship between the vectors in B and C.

Since R is the reduced row echelon form of A, there exists a sequence of elementary matrices E_1, ..., E_k such that E_k ... E_1 A = R. Let E = E_k ... E_1. Recall that every elementary matrix is invertible, hence E^{-1} = E_1^{-1} ... E_k^{-1} exists. Then

R = EA = [E~a_1 ... E~a_n]

Consequently, ~r_i = E~a_i, or ~a_i = E^{-1}~r_i.

To prove that C is linearly independent, we use the definition of linear independence. Consider

c_1~a_{t_1} + ... + c_r~a_{t_r} = ~0

Multiplying both sides by E gives

E(c_1~a_{t_1} + ... + c_r~a_{t_r}) = E~0
c_1 E~a_{t_1} + ... + c_r E~a_{t_r} = ~0
c_1~r_{t_1} + ... + c_r~r_{t_r} = ~0

Thus c_1 = ... = c_r = 0 since {~r_{t_1}, ..., ~r_{t_r}} is linearly independent. Hence, C is linearly independent.

To show that C spans Col(A), we need to show that any ~b in Col(A) can be written as a linear combination of the vectors ~a_{t_1}, ..., ~a_{t_r}. Pick ~b in Col(A). Then,


by definition of the columnspace, there exists ~x in R^n such that A~x = ~b. Then we get

~b = A~x
~b = E^{-1}R~x
E~b = R~x

Therefore, E~b is in the columnspace of R and hence can be written as a linear combination of the basis vectors {~r_{t_1}, ..., ~r_{t_r}}. Hence, we have

E~b = d_1~r_{t_1} + ... + d_r~r_{t_r}
~b = E^{-1}(d_1~r_{t_1} + ... + d_r~r_{t_r})
~b = d_1 E^{-1}~r_{t_1} + ... + d_r E^{-1}~r_{t_r}
~b = d_1~a_{t_1} + ... + d_r~a_{t_r}

as required. Thus, Span{~a_{t_1}, ..., ~a_{t_r}} = Col(A). Hence, {~a_{t_1}, ..., ~a_{t_r}} is a basis for Col(A).

Recall that the dimension of a vector space is the number of vectors in any basis. Thus, dim Col(A) = r = rank A. []

THEOREM 3: Let A be an m x n matrix. The set of all non-zero rows in the reduced row echelon form of A forms a basis for Row(A). Hence,

dim Row(A) = rank A

Proof: We first observe that if A is the zero matrix, then the result is trivial. Hence, we can assume that rank A = r > 0.

Let R be the reduced row echelon form of A. By definition of the reduced row echelon form, the set of all non-zero rows of R forms a basis for the rowspace of R. We will now show that they also form a basis for the rowspace of A.

As in the proof above, denote the columns of A by ~a_1, ..., ~a_n. Let E = E_k ... E_1, where E_1, ..., E_k are elementary matrices such that E_k ... E_1 A = R. We then have A = E^{-1}R.

Let ~b in Row(A). Then

~b = A^T~x = (E^{-1}R)^T~x = R^T(E^{-1})^T~x

Let ~y = (E^{-1})^T~x. Then we have ~b = R^T~y, so ~b in Row(R). Thus, Row(A) is contained in Row(R). Conversely, since R = EA, the same argument shows Row(R) is contained in Row(A), so Row(A) = Row(R). Hence, the non-zero rows of R span Row(A). Since they are also linearly independent, they form a basis for Row(A), and the result follows. []


We know how to find a basis for the nullspace of a matrix from our work on finding bases of eigenspaces in Chapter 6. You may have noticed in Chapter 6 that the standard procedure for finding a spanning set of a nullspace always led to a linearly independent set. We could prove this directly, but it would be awkward. Instead, we first prove the following important result.

THEOREM 4 (Dimension Theorem): Let A be an m x n matrix. Then

rank A + dim Null(A) = n

Proof: Let {~v_1, ..., ~v_k} be a basis for Null(A), so that dim Null(A) = k. Then we can extend this to a basis {~v_1, ..., ~v_k, ~v_{k+1}, ..., ~v_n} for R^n. We will prove that {A~v_{k+1}, ..., A~v_n} is a basis for Col(A).

Consider

~0 = c_{k+1}A~v_{k+1} + ... + c_n A~v_n = A(c_{k+1}~v_{k+1} + ... + c_n~v_n)

Then c_{k+1}~v_{k+1} + ... + c_n~v_n in Null(A). Hence, we can write it as a linear combination of the basis vectors for Null(A). So, we get

c_{k+1}~v_{k+1} + ... + c_n~v_n = d_1~v_1 + ... + d_k~v_k
-d_1~v_1 - ... - d_k~v_k + c_{k+1}~v_{k+1} + ... + c_n~v_n = ~0

Thus, d_1 = ... = d_k = c_{k+1} = ... = c_n = 0 since {~v_1, ..., ~v_n} is linearly independent. Therefore, {A~v_{k+1}, ..., A~v_n} is linearly independent.

Let ~b in Col(A). Then ~b = A~x for some ~x in R^n. Writing ~x as a linear combination of the basis vectors {~v_1, ..., ~v_n} gives

~b = A(c_1~v_1 + ... + c_k~v_k + c_{k+1}~v_{k+1} + ... + c_n~v_n)
   = c_1 A~v_1 + ... + c_k A~v_k + c_{k+1}A~v_{k+1} + ... + c_n A~v_n

But ~v_i in Null(A) for 1 <= i <= k, so A~v_i = ~0. Hence, we have

~b = ~0 + ... + ~0 + c_{k+1}A~v_{k+1} + ... + c_n A~v_n = c_{k+1}A~v_{k+1} + ... + c_n A~v_n

Thus, Span{A~v_{k+1}, ..., A~v_n} = Col(A).

Therefore, we have shown that {A~v_{k+1}, ..., A~v_n} is a basis for Col(A), and hence

rank A = dim Col(A) = n - k = n - dim Null(A)

as required. []
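For readers who like to verify such results computationally, the following sketch checks the Dimension Theorem on a particular matrix. It uses the Python library SymPy, which is an assumption of convenience; the matrix is an arbitrary test example, not one from these notes.

    from sympy import Matrix

    # An arbitrary 3x4 test matrix (our choice, not from the notes).
    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])

    n = A.cols
    rank_A = A.rank()               # dim Col(A)
    nullity_A = len(A.nullspace())  # dim Null(A)

    assert rank_A + nullity_A == n  # the Dimension Theorem
    print(rank_A, nullity_A)        # 2 2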


COROLLARY 5: Let A be an m x n matrix with rank A = r. Then

dim Null(A) = n - r
dim Null(A^T) = m - r

We can now find a basis for each of the four fundamental subspaces of a matrix. We demonstrate this with a couple of examples.

EXAMPLE 1: Let

A = [ 1  2 -1 ]
    [ 1  1  2 ]
    [ 1  0  5 ]

Find a basis for each of the four fundamental subspaces.

Solution: Row reducing A we get

[ 1  2 -1 ]   [ 1  0  5 ]
[ 1  1  2 ] → [ 0  1 -3 ] = R
[ 1  0  5 ]   [ 0  0  0 ]

A basis for Row(A) is the set of non-zero rows of R. Thus, a basis is

{ [1, 0, 5]^T, [0, 1, -3]^T }

(remember that in linear algebra, vectors in R^n are always written as column vectors).

A basis for Col(A) is given by the columns of A which correspond to the columns of R that contain leading ones. Hence, a basis for Col(A) is

{ [1, 1, 1]^T, [2, 1, 0]^T }

To find a basis for Null(A), we solve A~x = ~0 as usual. That is, we rewrite the system R~x = ~0 as a system of equations:

x_1 + 5x_3 = 0
x_2 - 3x_3 = 0

Hence, x_3 is a free variable and the general solution is

[x_1, x_2, x_3]^T = [-5x_3, 3x_3, x_3]^T = x_3 [-5, 3, 1]^T

Consequently, a basis for Null(A) is { [-5, 3, 1]^T }.

To find a basis for Null(A^T), we can just use our method for finding the nullspace of a matrix on A^T. That is, we row reduce A^T to get

[ 1  1  1 ]   [ 1  0 -1 ]
[ 2  1  0 ] → [ 0  1  2 ]
[-1  2  5 ]   [ 0  0  0 ]


So, we have

x_1 - x_3 = 0
x_2 + 2x_3 = 0

Thus, x_3 is a free variable and the general solution is

[x_1, x_2, x_3]^T = [x_3, -2x_3, x_3]^T = x_3 [1, -2, 1]^T

Therefore, a basis for the left nullspace of A is { [1, -2, 1]^T }.
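The computations above can be double-checked by machine. Here is a minimal sketch, assuming the Python library SymPy is available (a convenience on our part; note that a CAS may return a different, but equivalent, basis for each subspace):

    from sympy import Matrix

    A = Matrix([[1, 2, -1],
                [1, 1,  2],
                [1, 0,  5]])

    R, pivots = A.rref()
    print(R)                 # the reduced row echelon form computed above
    print(A.rowspace())      # basis for Row(A): the non-zero rows of R
    print(A.columnspace())   # basis for Col(A): the pivot columns of A
    print(A.nullspace())     # basis for Null(A)
    print(A.T.nullspace())   # basis for Null(A^T), the left nullspace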

EXAMPLE 2: Let

A = [ 2  4 -4  2 ]
    [ 1  2  2 -7 ]
    [-1 -2  0  3 ]

Find a basis for each of the four fundamental subspaces.

Solution: Row reducing A we get

[ 2  4 -4  2 ]   [ 1  2  0 -3 ]
[ 1  2  2 -7 ] → [ 0  0  1 -2 ] = R
[-1 -2  0  3 ]   [ 0  0  0  0 ]

The set of non-zero rows of R, { [1, 2, 0, -3]^T, [0, 0, 1, -2]^T }, forms a basis for Row(A).

The columns of A which correspond to the columns of R which contain leading ones form a basis for Col(A). Hence, a basis for Col(A) is { [2, 1, -1]^T, [-4, 2, 0]^T }.

To find a basis for Null(A), we solve A~x = ~0 as usual. That is, we rewrite the system R~x = ~0 as a system of equations:

x_1 + 2x_2 - 3x_4 = 0
x_3 - 2x_4 = 0

Hence, x_2 and x_4 are free variables and the general solution is

[x_1, x_2, x_3, x_4]^T = [-2x_2 + 3x_4, x_2, 2x_4, x_4]^T = x_2 [-2, 1, 0, 0]^T + x_4 [3, 0, 2, 1]^T

Hence, a basis for Null(A) is { [-2, 1, 0, 0]^T, [3, 0, 2, 1]^T }.


We row reduce A^T to get

[ 2  1 -1 ]   [ 1  0 -1/4 ]
[ 4  2 -2 ] → [ 0  1 -1/2 ]
[-4  2  0 ]   [ 0  0  0   ]
[ 2 -7  3 ]   [ 0  0  0   ]

Thus, we have

x_1 - (1/4)x_3 = 0
x_2 - (1/2)x_3 = 0

Consequently, x_3 is a free variable and the general solution is

[x_1, x_2, x_3]^T = [(1/4)x_3, (1/2)x_3, x_3]^T = x_3 [1/4, 1/2, 1]^T

Therefore, a basis for the left nullspace of A is { [1/4, 1/2, 1]^T }.

REMARK

It is not actually necessary to row reduce A^T to find a basis for the left nullspace of A. We can use the fact that EA = R, where E is the product of the elementary matrices used to bring A to its reduced row echelon form R, to find a basis for the left nullspace. The derivation of this procedure is left as an exercise.

Section 7.1 Problems

1. Find a basis for the four fundamental subspaces of each matrix.

(a) [ 1  2 -1 ]
    [ 2  4  3 ]

(b) [ 2 -1  5 ]
    [ 3  4  2 ]
    [ 1  1  1 ]

(c) [ 1 -2  3  5 ]
    [-2  4  0 -4 ]
    [ 3 -6 -5  1 ]

(d) [ 3  2  5  3 ]
    [-1  0 -3  1 ]
    [ 1  1  1  1 ]
    [ 1  4 -5  1 ]

2. Let A be an n x n matrix. Prove that Null(A) = {~0} if and only if det A ≠ 0.

3. Let B be an m x n matrix.

(a) Prove that if ~x is any vector in the left nullspace of B, then ~x^T B = ~0^T.

(b) Prove that if ~x is any vector in the left nullspace of B and ~y is any vector in the columnspace of B, then ~x · ~y = 0.

4. Invent a 2 x 2 matrix A such that Null(A) = Col(A).


7.2 Subspaces of Linear Mappings

Recall from Math 136 that we defined a linear mapping L : R^n → R^m to be a function with domain R^n and codomain R^m such that

L(s~x + t~y) = sL(~x) + tL(~y)

for all ~x, ~y in R^n and s, t in R. We defined the range and kernel (nullspace) of a linear mapping L : R^n → R^m by

Range(L) = {L(~x) in R^m | ~x in R^n}
Ker(L) = {~x in R^n | L(~x) = ~0}

It is easy to verify using the Subspace Test that Range(L) is a subspace of R^m and Ker(L) is a subspace of R^n.

Additionally, we saw that every linear mapping L : R^n → R^m is a matrix mapping. In particular, we defined the standard matrix [L] of L by

[L] = [ L(~e_1) ... L(~e_n) ]

where {~e_1, ..., ~e_n} is the standard basis for R^n. It satisfies

L(~x) = [L]~x

for all ~x in R^n. This relationship between linear mappings and matrix mappings is very important, and we will continue to look at it during the remainder of the book. The purpose of the next two theorems is to further demonstrate this relationship.

THEOREM 1: If L : R^n → R^m is a linear mapping, then Range(L) = Col([L]) and rank[L] = dim Range(L).

THEOREM 2: If L : R^n → R^m is a linear mapping, then Ker(L) = Null([L]) and dim(Ker(L)) = n - rank[L].

EXAMPLE 1: Let L : R^4 → R^3 be the linear mapping with standard matrix

[L] = [ 0  1  0 -2 ]
      [ 1  2  1 -1 ]
      [ 2  4  3 -1 ]

Find a basis for the range of L and the kernel of L.

Solution: To find a basis for the range of L, we can just find a basis for the columnspace of [L]. Row reducing [L] gives

[ 0  1  0 -2 ]   [ 1  0  0  2 ]
[ 1  2  1 -1 ] → [ 0  1  0 -2 ]
[ 2  4  3 -1 ]   [ 0  0  1  1 ]


Since the first three columns of the reduced row echelon form of [L] form a basis for the columnspace of the reduced row echelon form, the first three columns of [L] form a basis for Col([L]). Thus, a basis for the range of L is

{ [0, 1, 2]^T, [1, 2, 4]^T, [0, 1, 3]^T }

To find a basis for Ker(L), we will find a basis for the nullspace of [L]. Thus, we need to find a basis for the solution space of the homogeneous system [L]~x = ~0. We found above that the RREF of [L] gives the system

x_1 + 2x_4 = 0
x_2 - 2x_4 = 0
x_3 + x_4 = 0

Then the general solution of [L]~x = ~0 is

~x = [x_1, x_2, x_3, x_4]^T = [-2x_4, 2x_4, -x_4, x_4]^T = x_4 [-2, 2, -1, 1]^T

So, a basis for Ker(L) is { [-2, 2, -1, 1]^T }.
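Since Range(L) = Col([L]) and Ker(L) = Null([L]), the entire computation can be delegated to a CAS. A minimal SymPy sketch (an assumption of convenience, not part of the course):

    from sympy import Matrix

    L = Matrix([[0, 1, 0, -2],
                [1, 2, 1, -1],
                [2, 4, 3, -1]])

    print(L.columnspace())   # basis for Range(L) = Col([L])
    print(L.nullspace())     # basis for Ker(L); spanned by (-2, 2, -1, 1)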

The Dimension Theorem for matrices gives us the following result.

THEOREM 3: Let L : R^n → R^m be a linear mapping. Then

dim Range(L) + dim Ker(L) = dim(R^n)

Section 7.2 Problems

1. Let L(x_1, x_2) = (x_1, -2x_1 + x_2, 0).

(a) Prove that L is linear.

(b) Find the standard matrix of L.

(c) Find a basis for Ker(L) and Range(L).

2. Prove Theorem 1 and Theorem 2.


Chapter 8

Linear Mappings

8.1 General Linear Mappings

Linear Mappings L : V → W

We now observe that we can extend our definition of a linear mapping to the case where the domain and codomain are general vector spaces instead of just R^n.

DEFINITION (Linear Mapping)

Let V and W be vector spaces. A mapping L : V → W is called linear if

L(s~x + t~y) = sL(~x) + tL(~y)

for all ~x, ~y in V and s, t in R.

REMARKS

1. As before, two linear mappings L and M are equal if and only if they have the same domain, the same codomain, and L(~v) = M(~v) for all ~v in the domain.

2. The definition of a linear mapping above still makes sense because of the closure properties of vector spaces.

3. As before, we sometimes call a linear mapping L : V → V a linear operator.

4. It is important not to assume that any results that held for linear mappings L : R^n → R^m also hold for linear mappings L : V → W.


EXAMPLE 1: Let L : P_3(R) → R^2 be defined by L(a + bx + cx^2 + dx^3) = [a + d, b + c]^T.

(a) Evaluate L(1 + 2x^2 - x^3).

Solution: L(1 + 2x^2 - x^3) = [1 + (-1), 0 + 2]^T = [0, 2]^T.

(b) Prove that L is linear.

Solution: Let a_1 + b_1x + c_1x^2 + d_1x^3, a_2 + b_2x + c_2x^2 + d_2x^3 in P_3(R) and s, t in R. Then

L(s(a_1 + b_1x + c_1x^2 + d_1x^3) + t(a_2 + b_2x + c_2x^2 + d_2x^3))
  = L((sa_1 + ta_2) + (sb_1 + tb_2)x + (sc_1 + tc_2)x^2 + (sd_1 + td_2)x^3)
  = [sa_1 + ta_2 + sd_1 + td_2, sb_1 + tb_2 + sc_1 + tc_2]^T
  = s[a_1 + d_1, b_1 + c_1]^T + t[a_2 + d_2, b_2 + c_2]^T
  = sL(a_1 + b_1x + c_1x^2 + d_1x^3) + tL(a_2 + b_2x + c_2x^2 + d_2x^3)

Thus, L is linear.

EXAMPLE 2: Let tr : M_{n x n}(R) → R be defined by tr A = Σ_{i=1}^{n} (A)_{ii} (called the trace of a matrix). Prove that tr is linear.

Solution: Let A, B in M_{n x n}(R) and s, t in R. Then

tr(sA + tB) = Σ_{i=1}^{n} (sA + tB)_{ii}
            = Σ_{i=1}^{n} (s(A)_{ii} + t(B)_{ii})
            = s Σ_{i=1}^{n} (A)_{ii} + t Σ_{i=1}^{n} (B)_{ii}
            = s tr A + t tr B

Thus, tr is linear.
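The linearity of the trace is easy to spot-check numerically. A small sketch using the Python library NumPy (an assumption of convenience; the matrices and scalars are arbitrary test values):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(3, 3))
    B = rng.integers(-5, 5, size=(3, 3))
    s, t = 2.0, -3.0

    # tr(sA + tB) = s tr(A) + t tr(B)
    assert np.isclose(np.trace(s * A + t * B),
                      s * np.trace(A) + t * np.trace(B))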


EXAMPLE 3: Prove that the mapping L : P_2(R) → M_{2x2}(R) defined by

L(a + bx + cx^2) = [ a  bc  ]
                   [ 0  abc ]

is not linear.

Solution: Observe that

L(1 + x + x^2) = [ 1  1 ]
                 [ 0  1 ]

But

L(2(1 + x + x^2)) = L(2 + 2x + 2x^2) = [ 2  4 ] ≠ 2L(1 + x + x^2)
                                       [ 0  8 ]

We now begin to show that many of the results we had for linear mappings L : R^n → R^m also hold for linear mappings L : V → W.

THEOREM 1: Let V and W be vector spaces and let L : V → W be a linear mapping. Then

L(~0) = ~0

DEFINITION (Addition and Scalar Multiplication)

Let L : V → W and M : V → W be linear mappings. Then we define L + M by

(L + M)(~v) = L(~v) + M(~v)

and for any t in R we define tL by

(tL)(~v) = tL(~v)

EXAMPLE 4: Let L : M_{2x2}(R) → P_2(R) be defined by L([a b; c d]) = a + bx + (a + c + d)x^2, and let M : M_{2x2}(R) → P_2(R) be defined by M([a b; c d]) = (b + c) + dx^2. Then L and M are both linear, and L + M is the mapping defined by

(L + M)([a b; c d]) = L([a b; c d]) + M([a b; c d])
                    = [a + bx + (a + c + d)x^2] + [(b + c) + dx^2]
                    = (a + b + c) + bx + (a + c + 2d)x^2

Similarly, 4L is the mapping defined by

(4L)([a b; c d]) = 4L([a b; c d]) = 4a + 4bx + (4a + 4c + 4d)x^2


THEOREM 2: Let V and W be vector spaces. The set L of all linear mappings L : V → W with standard addition and scalar multiplication of mappings is a vector space.

Proof: To prove that L is a vector space, we need to show that it satisfies all ten vector space axioms. We will prove V1 and V2 and leave the rest as exercises. Let L, M be in L.

V1 To prove that L is closed under addition, we need to show that L + M is a linear mapping with domain V and codomain W.

By definition, the domain of L + M is V, and for any ~v in V we have

(L + M)(~v) = L(~v) + M(~v) in W

since L(~v) in W, M(~v) in W, and W is closed under addition. Moreover, since L and M are linear, for any ~v_1, ~v_2 in V and s, t in R we have

(L + M)(s~v_1 + t~v_2) = L(s~v_1 + t~v_2) + M(s~v_1 + t~v_2)
                       = sL(~v_1) + tL(~v_2) + sM(~v_1) + tM(~v_2)
                       = s[L(~v_1) + M(~v_1)] + t[L(~v_2) + M(~v_2)]
                       = s(L + M)(~v_1) + t(L + M)(~v_2)

Thus, L + M is linear, so L + M is in L.

V2 For any ~v in V we have

(L + M)(~v) = L(~v) + M(~v) = M(~v) + L(~v) = (M + L)(~v)

since addition in W is commutative. Hence L + M = M + L.

We also define the composition of mappings in the expected way.

DEFINITION (Composition)

Let L : V → W and M : W → U be linear mappings. Then we define M ∘ L by

(M ∘ L)(~v) = M(L(~v))

for all ~v in V.


EXAMPLE 5: Let L : P_2(R) → R^3 be defined by L(a + bx + cx^2) = [a + b, c, 0]^T and let M : R^3 → M_{2x2}(R) be defined by

M([x_1, x_2, x_3]^T) = [ x_1        0 ]
                       [ x_2 + x_3  0 ]

Then M ∘ L is the mapping defined by

(M ∘ L)(a + bx + cx^2) = M(L(a + bx + cx^2)) = M([a + b, c, 0]^T) = [ a + b  0 ]
                                                                    [ c      0 ]

Observe that M ∘ L is in fact a linear mapping from P_2(R) to M_{2x2}(R).

Section 8.1 Problems

1. Determine which of the following mappings are linear. If a mapping is linear, prove it. If not, give a counterexample to show that it is not linear.

(a) L : R^2 → M_{2x2}(R) defined by L(a, b) = [a 0; 0 b]

(b) T : P_2(R) → P_2(R) defined by T(a + bx + cx^2) = (a - b) + (bc)x^2

(c) L : P_2(R) → M_{2x2}(R) defined by L(a + bx + cx^2) = [1 0; a + c  b + c]

(d) T : M_{2x2}(R) → R^3 defined by T([a b; c d]) = [a + b, b + c, c - a]^T

(e) D : P_3(R) → P_2(R) defined by D(a + bx + cx^2 + dx^3) = b + 2cx + 3dx^2

(f) L : M_{2x2}(R) → R defined by L(A) = det A

2. Let B be a basis for an n-dimensional vector space V. Prove that the mapping L : V → R^n defined by L(~v) = [~v]_B for all ~v in V is linear.

3. Let L : V → W be a linear mapping. Prove that L(~0) = ~0.

4. Let L : V → W and M : W → U be linear mappings. Prove that M ∘ L is a linear mapping from V to U.

5. Let L be the set of all linear mappings from V to W with standard addition and scalar multiplication of linear mappings. Prove that

(a) tL is in L for all L in L and t in R.

(b) t(L + M) = tL + tM for all L, M in L and t in R.


8.2 Rank-Nullity Theorem

Our goal in this section is not only to extend the definitions of the range and kernel of a linear mapping to general linear mappings, but also to generalize Theorem 7.2.3 to general linear mappings. We will see that this generalization, called the Rank-Nullity Theorem, is extremely useful.

DEFINITION (Range, Kernel)

Let L : V → W be a linear mapping. Then the kernel of L is

Ker(L) = {~v in V | L(~v) = ~0_W}

and the range of L is

Range(L) = {L(~v) | ~v in V}

THEOREM 1: Let L : V → W be a linear mapping. Then Ker(L) is a subspace of V and Range(L) is a subspace of W.

The procedure for finding a basis for the range and kernel of a general linear mapping is, of course, exactly the same as the one we saw for linear mappings from R^n to R^m.

EXAMPLE 1: Find a basis for the range and kernel of L : P_3(R) → R^2 defined by

L(a + bx + cx^2 + dx^3) = [a + d, b + c]^T

Solution: If a + bx + cx^2 + dx^3 is in Ker(L), then

[0, 0]^T = L(a + bx + cx^2 + dx^3) = [a + d, b + c]^T

Therefore, a = -d and b = -c. Thus, every vector in Ker(L) has the form

a + bx + cx^2 + dx^3 = -d - cx + cx^2 + dx^3 = d(-1 + x^3) + c(-x + x^2)

Since {-1 + x^3, -x + x^2} is clearly linearly independent, it is a basis for Ker(L).

The range of L contains all vectors of the form

[a + d, b + c]^T = (a + d)[1, 0]^T + (b + c)[0, 1]^T

So a basis for Range(L) is { [1, 0]^T, [0, 1]^T }.


EXAMPLE 2: Determine the dimension of the range and kernel of L : P_2(R) → M_{3x2}(R) defined by

L(a + bx + cx^2) = [ a      b         ]
                   [ c      b         ]
                   [ a + b  a + c - b ]

Solution: If a + bx + cx^2 is in Ker(L), then L(a + bx + cx^2) is the 3 x 2 zero matrix:

[ a      b         ]   [ 0  0 ]
[ c      b         ] = [ 0  0 ]
[ a + b  a + c - b ]   [ 0  0 ]

Hence, we must have a = b = c = 0. Thus, Ker(L) = {~0}, so a basis for Ker(L) is the empty set. Hence, dim(Ker(L)) = 0.

The range of L contains all vectors of the form

[ a      b         ]     [ 1  0 ]     [ 0  1 ]     [ 0  0 ]
[ c      b         ] = a [ 0  0 ] + b [ 0  1 ] + c [ 1  0 ]
[ a + b  a + c - b ]     [ 1  1 ]     [ 1 -1 ]     [ 0  1 ]

We can easily verify that the set of these three matrices is linearly independent and hence a basis for Range(L). Consequently, dim(Range(L)) = 3.

EXERCISE 1: Find a basis for the range and kernel of L : R^2 → M_{2x2}(R) defined by

L([x_1, x_2]^T) = [ x_1  x_1 + x_2 ]
                  [ 0    x_1 + x_2 ]

Observe that in all of the examples above, dim(Range(L)) + dim(Ker(L)) = dim V, which matches the result we had for linear mappings L : R^n → R^m. Before we extend this to the general case, we make a couple of definitions.

DEFINITION (Rank, Nullity)

Let L : V → W be a linear mapping. Then we define the rank of L by

rank(L) = dim(Range(L))

We define the nullity of L to be

nullity(L) = dim(Ker(L))


THEOREM 2 (Rank-Nullity Theorem): Let V be an n-dimensional vector space and let W be a vector space. If L : V → W is linear, then

rank(L) + nullity(L) = n

Proof: Assume that nullity(L) = k and let {~v_1, ..., ~v_k} be a basis for Ker(L). Then we can extend {~v_1, ..., ~v_k} to a basis {~v_1, ..., ~v_k, ~v_{k+1}, ..., ~v_n} for V. We claim that {L(~v_{k+1}), ..., L(~v_n)} is a basis for Range(L).

Let ~y be in Range(L). Then by definition, there exists ~v in V such that ~y = L(~v). Writing ~v = c_1~v_1 + ... + c_n~v_n gives

~y = L(c_1~v_1 + ... + c_k~v_k + c_{k+1}~v_{k+1} + ... + c_n~v_n)
   = c_1L(~v_1) + ... + c_kL(~v_k) + c_{k+1}L(~v_{k+1}) + ... + c_nL(~v_n)
   = ~0 + ... + ~0 + c_{k+1}L(~v_{k+1}) + ... + c_nL(~v_n)
   = c_{k+1}L(~v_{k+1}) + ... + c_nL(~v_n)

Hence ~y is in Span{L(~v_{k+1}), ..., L(~v_n)}.

Consider

c_{k+1}L(~v_{k+1}) + ... + c_nL(~v_n) = ~0
L(c_{k+1}~v_{k+1} + ... + c_n~v_n) = ~0

This implies that c_{k+1}~v_{k+1} + ... + c_n~v_n is in Ker(L). Thus we can write

c_{k+1}~v_{k+1} + ... + c_n~v_n = d_1~v_1 + ... + d_k~v_k

But then we have

-d_1~v_1 - ... - d_k~v_k + c_{k+1}~v_{k+1} + ... + c_n~v_n = ~0

Hence, d_1 = ... = d_k = c_{k+1} = ... = c_n = 0 since {~v_1, ..., ~v_n} is linearly independent, as it is a basis for V. Therefore, {L(~v_{k+1}), ..., L(~v_n)} is linearly independent.

Therefore, we have shown that {L(~v_{k+1}), ..., L(~v_n)} is a basis for Range(L), and hence

rank(L) = n - k = n - nullity(L)   []

REMARK

The proof of the Rank-Nullity Theorem should seem very familiar. It is essentially identical to that of the Dimension Theorem in Chapter 7. In this book there will be quite a few times where proofs are essentially repeated, so spending time to make sure you understand proofs when you first see them can have real long-term benefits.


EXAMPLE 3: Let L : P_2(R) → M_{2x3}(R) be defined by

L(a + bx + cx^2) = [ a  a  c ]
                   [ a  a  c ]

Find the rank and nullity of L.

Solution: If a + bx + cx^2 is in Ker(L), then

[ a  a  c ]                      [ 0  0  0 ]
[ a  a  c ] = L(a + bx + cx^2) = [ 0  0  0 ]

Hence a = 0 and c = 0. So every vector in Ker(L) has the form bx, and a basis for Ker(L) is {x}. Thus nullity(L) = 1, and so by the Rank-Nullity Theorem we get

rank(L) = dim P_2(R) - nullity(L) = 3 - 1 = 2

In the example above, we could instead have found a basis for the range and then used the Rank-Nullity Theorem to find the nullity of L, as sketched below.
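One way to check a rank/nullity computation like this one is to pass to coordinates, as Section 8.3 will formalize: identify P_2(R) with R^3 via the coefficients (a, b, c) and M_{2x3}(R) with R^6 by flattening row by row, so that L becomes a 6 x 3 matrix. A hedged SymPy sketch (the identification is our own choice of bookkeeping, not anything fixed by the notes):

    from sympy import Matrix

    # Columns are the flattened images L(1), L(x), L(x^2) of the
    # standard basis of P2(R) under L(a+bx+cx^2) = [[a,a,c],[a,a,c]].
    M = Matrix([[1, 0, 0],
                [1, 0, 0],
                [0, 0, 1],
                [1, 0, 0],
                [1, 0, 0],
                [0, 0, 1]])

    print(M.rank())            # rank(L) = 2
    print(len(M.nullspace()))  # nullity(L) = 1
    print(M.nullspace())       # spanned by (0, 1, 0), i.e. the polynomial x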

Section 8.2 Problems

1. Find the rank and nullity of the following linear mappings.

(a) L : R^2 → M_{2x2}(R) defined by L(a, b) = [a 0; 0 b]

(b) L : P_2(R) → P_2(R) defined by L(a + bx + cx^2) = (a - b) + (bc)x^2

(c) T : P_2(R) → M_{2x2}(R) defined by T(a + bx + cx^2) = [0 0; a + c  b + c]

(d) L : R^3 → P_1(R) defined by L([a, b, c]^T) = (a + b) + (a + b + c)x

2. Let L : V → W be a linear mapping and let ~v_1, ..., ~v_k be in V. Prove that if {L(~v_1), ..., L(~v_k)} spans W, then dim V >= dim W.

3. Find a linear mapping L : V → W where dim V >= dim W, but Range(L) ≠ W.

4. Find a linear mapping L : V → V such that Ker(L) = Range(L).

5. Let V and W be n-dimensional vector spaces and let L : V → W be a linear mapping. Prove that Range(L) = W if and only if Ker(L) = {~0}.

6. Let U, V, W be finite dimensional vector spaces and let L : V → U and M : U → W be linear mappings.

(a) Prove that rank(M ∘ L) <= rank(M).

(b) Prove that rank(M ∘ L) <= rank(L).

(c) Prove that if M is invertible, then rank(M ∘ L) = rank L.


8.3 Matrix of a Linear Mapping

We now show that every linear mapping L : V → W can also be represented as a matrix mapping. However, we must be careful when dealing with general vector spaces as our domain and codomain. For example, it is certainly impossible to represent a linear mapping L : P_2(R) → M_{2x2}(R) as a matrix mapping L(~x) = A~x, since we cannot multiply a matrix by a polynomial ~x in P_2(R). Moreover, we require the output to be a 2 x 2 matrix.

Thus, if we are going to define a matrix representation of a general linear mapping, we need to convert vectors in V to vectors in R^n. Recall that the coordinate vector of ~v in V with respect to an ordered basis B is a vector in R^n. In particular, if B = {~v_1, ..., ~v_n} and ~v = b_1~v_1 + ... + b_n~v_n, then the coordinate vector of ~v with respect to B is defined to be

[~v]_B = [b_1, ..., b_n]^T

Using coordinates, we can write a matrix mapping representation for a linear mapping L : V → W. That is, we want to find a matrix A such that

[L(~x)]_C = A[~x]_B

for every ~x in V, where B is a basis for V and C is a basis for W.

Consider the left-hand side [L(~x)]_C. Using properties of linear mappings and coordinates, we get

[L(~x)]_C = [L(b_1~v_1 + ... + b_n~v_n)]_C
          = [b_1L(~v_1) + ... + b_nL(~v_n)]_C
          = b_1[L(~v_1)]_C + ... + b_n[L(~v_n)]_C
          = [ [L(~v_1)]_C ... [L(~v_n)]_C ] [b_1, ..., b_n]^T
          = A[~x]_B

Thus, we see the desired matrix is

A = [ [L(~v_1)]_C ... [L(~v_n)]_C ]

DEFINITION (Matrix of a Linear Mapping)

Suppose B = {~v_1, ..., ~v_n} is any basis for a vector space V, C is any basis for a finite dimensional vector space W, and L : V → W is a linear mapping. Then the matrix of L with respect to bases B and C is

C[L]B = [ [L(~v_1)]_C ... [L(~v_n)]_C ]

It satisfies

[L(~x)]_C = C[L]B [~x]_B

for all ~x in V.


EXAMPLE 1: Let L : P_2(R) → R^2 be the linear mapping defined by L(a + bx + cx^2) = [a + c, b - c]^T, let B = {1 + x^2, 1 + x, -1 + x + x^2}, and let C = { [1, 0]^T, [1, 1]^T }. Find C[L]B.

Solution: To find C[L]B we need to determine the C-coordinates of the images of the vectors in B under L. We have

L(1 + x^2) = [2, -1]^T = 3[1, 0]^T + (-1)[1, 1]^T
L(1 + x) = [1, 1]^T = 0[1, 0]^T + 1[1, 1]^T
L(-1 + x + x^2) = [0, 0]^T = 0[1, 0]^T + 0[1, 1]^T

Hence,

C[L]B = [ [L(1 + x^2)]_C  [L(1 + x)]_C  [L(-1 + x + x^2)]_C ] = [  3  0  0 ]
                                                                [ -1  1  0 ]

We can check our answer. Let p(x) = a + bx + cx^2. To use C[L]B we need [p(x)]_B. Hence, we need to write p(x) as a linear combination of the vectors in B. In particular, we need to solve

a + bx + cx^2 = t_1(1 + x^2) + t_2(1 + x) + t_3(-1 + x + x^2) = (t_1 + t_2 - t_3) + (t_2 + t_3)x + (t_1 + t_3)x^2

Row reducing the corresponding augmented matrix gives

[ 1  1 -1 | a ]   [ 1  0  0 | (a - b + 2c)/3 ]
[ 0  1  1 | b ] → [ 0  1  0 | (a + 2b - c)/3 ]
[ 1  0  1 | c ]   [ 0  0  1 | (-a + b + c)/3 ]

Thus,

[p(x)]_B = (1/3) [a - b + 2c, a + 2b - c, -a + b + c]^T

So we get

[L(p(x))]_C = C[L]B [p(x)]_B = [  3  0  0 ] (1/3) [a - b + 2c, a + 2b - c, -a + b + c]^T = [a - b + 2c, b - c]^T
                               [ -1  1  0 ]

Therefore, by the definition of C-coordinates,

L(p(x)) = (a - b + 2c)[1, 0]^T + (b - c)[1, 1]^T = [a + c, b - c]^T

as required.
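The columns of C[L]B can also be produced programmatically: apply L to each vector of B (working in coefficient coordinates) and solve for the C-coordinates of the result. A SymPy sketch of this recipe for the example above (the coordinate conventions are our own bookkeeping):

    from sympy import Matrix

    # Represent a + bx + cx^2 by its coefficient vector (a, b, c).
    def L(p):
        a, b, c = p
        return Matrix([a + c, b - c])

    B = [Matrix([1, 0, 1]),    # 1 + x^2
         Matrix([1, 1, 0]),    # 1 + x
         Matrix([-1, 1, 1])]   # -1 + x + x^2

    C = Matrix([[1, 1],
                [0, 1]])       # columns are the C basis vectors

    # Column j of C[L]B is the C-coordinate vector of L applied to the
    # j-th vector of B, obtained by solving C * coords = L(p).
    CLB = Matrix.hstack(*[C.solve(L(p)) for p in B])
    print(CLB)                 # Matrix([[3, 0, 0], [-1, 1, 0]])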


EXAMPLE 2: Let T : R^2 → M_{2x2}(R) be the linear mapping defined by

T([a, b]^T) = [ a + b  0     ]
              [ 0      a - b ]

and let B = { [2, -1]^T, [1, 2]^T } and C = { [1 1; 0 0], [1 0; 0 1], [1 1; 0 1], [0 0; 1 0] }. Determine C[T]B and use it to calculate T(~v), where [~v]_B = [2, -3]^T.

Solution: To find C[T]B we need to determine the C-coordinates of the images of the vectors in B under T. We have

T(2, -1) = [1 0; 0 3] = -2[1 1; 0 0] + 1[1 0; 0 1] + 2[1 1; 0 1] + 0[0 0; 1 0]

T(1, 2) = [3 0; 0 -1] = 4[1 1; 0 0] + 3[1 0; 0 1] + (-4)[1 1; 0 1] + 0[0 0; 1 0]

Hence,

C[T]B = [ [T(2, -1)]_C  [T(1, 2)]_C ] = [ -2  4 ]
                                        [  1  3 ]
                                        [  2 -4 ]
                                        [  0  0 ]

Thus, we get

[T(~v)]_C = [ -2  4 ] [  2 ]   [ -16 ]
            [  1  3 ] [ -3 ] = [  -7 ]
            [  2 -4 ]          [  16 ]
            [  0  0 ]          [   0 ]

so

T(~v) = (-16)[1 1; 0 0] + (-7)[1 0; 0 1] + 16[1 1; 0 1] + 0[0 0; 1 0] = [ -7  0 ]
                                                                        [  0  9 ]

In the special case of a linear operator L acting on a finite-dimensional vector space V with basis B, we often wish to find the matrix B[L]B.

DEFINITION (Matrix of a Linear Operator)

Suppose B = {~v_1, ..., ~v_n} is any basis for a vector space V and let L : V → V be a linear operator. Then the B-matrix of L (or the matrix of L with respect to the basis B) is

[L]_B = [ [L(~v_1)]_B ... [L(~v_n)]_B ]

It satisfies

[L(~x)]_B = [L]_B [~x]_B

for all ~x in V.


EXAMPLE 3: Let

B = { [1, 1, 1]^T, [1, 0, 1]^T, [1, 3, 2]^T }   and   A = [ -6 -2  9 ]
                                                          [ -5 -1  7 ]
                                                          [ -7 -2 10 ]

Find the B-matrix of the linear mapping defined by L(~x) = A~x.

Solution: By definition, the columns of the B-matrix are the B-coordinates of the images of the vectors in B under L. We find that

A[1, 1, 1]^T = [1, 1, 1]^T = 1[1, 1, 1]^T + 0[1, 0, 1]^T + 0[1, 3, 2]^T
A[1, 0, 1]^T = [3, 2, 3]^T = 2[1, 1, 1]^T + 1[1, 0, 1]^T + 0[1, 3, 2]^T
A[1, 3, 2]^T = [6, 6, 7]^T = 3[1, 1, 1]^T + 2[1, 0, 1]^T + 1[1, 3, 2]^T

Hence,

[L]_B = [ [L(1, 1, 1)]_B  [L(1, 0, 1)]_B  [L(1, 3, 2)]_B ] = [ 1  2  3 ]
                                                             [ 0  1  2 ]
                                                             [ 0  0  1 ]
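When L(~x) = A~x and P is the matrix whose columns are the vectors of B, each column of AP is the image of a basis vector, and multiplying by P^{-1} converts it to B-coordinates; hence [L]_B = P^{-1}AP. A SymPy sketch for the example above (SymPy being an assumption of convenience):

    from sympy import Matrix

    A = Matrix([[-6, -2,  9],
                [-5, -1,  7],
                [-7, -2, 10]])

    # Columns of P are the basis vectors of B.
    P = Matrix([[1, 1, 1],
                [1, 0, 3],
                [1, 1, 2]])

    print(P.inv() * A * P)   # Matrix([[1, 2, 3], [0, 1, 2], [0, 0, 1]])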

EXAMPLE 4: Let L : P_2(R) → P_2(R) be defined by L(a + bx + cx^2) = (2a + b) + (a + b + c)x + (b + 2c)x^2. Find [L]_B, where B = {-1 + x^2, 1 - 2x + x^2, 1 + x + x^2}, and find [L]_S, where S = {1, x, x^2}.

Solution: To determine [L]_B we need to find the B-coordinates of the images of the vectors in B under L.

L(-1 + x^2) = -2 + 0x + 2x^2 = 2(-1 + x^2) + 0(1 - 2x + x^2) + 0(1 + x + x^2)
L(1 - 2x + x^2) = 0 + 0x + 0x^2 = 0(-1 + x^2) + 0(1 - 2x + x^2) + 0(1 + x + x^2)
L(1 + x + x^2) = 3 + 3x + 3x^2 = 0(-1 + x^2) + 0(1 - 2x + x^2) + 3(1 + x + x^2)

Thus,

[L]_B = [ [L(-1 + x^2)]_B  [L(1 - 2x + x^2)]_B  [L(1 + x + x^2)]_B ] = [ 2  0  0 ]
                                                                      [ 0  0  0 ]
                                                                      [ 0  0  3 ]

Similarly, for [L]_S we calculate the S-coordinates of the images of the vectors in S under L.

L(1) = 2 + x = 2(1) + 1(x) + 0(x^2)
L(x) = 1 + x + x^2 = 1(1) + 1(x) + 1(x^2)
L(x^2) = x + 2x^2 = 0(1) + 1(x) + 2(x^2)

Hence,

[L]_S = [ 2  1  0 ]
        [ 1  1  1 ]
        [ 0  1  2 ]


In Example 4 we found that the matrix of L with respect to the basis B is diagonal, while the matrix of L with respect to the basis S is not. This is perhaps not surprising, since the vectors in B are eigenvectors of L (that is, they satisfy L(~v_i) = λ_i~v_i), while the vectors in S are not. Recall from Math 136 that we called such a basis a geometrically natural basis.

REMARK

The relationship between diagonalization and the matrix of a linear operator is extremely important. We will review this a little later in the text. You may also find it useful at this point to review Section 6.1 in the Math 136 course notes.

Section 8.3 Problems

1. Find the matrix of each linear mapping with respect to the bases B and C.

(a) D : P_2(R) → P_1(R) defined by D(a + bx + cx^2) = b + 2cx, B = {1, x, x^2}, C = {1, x}

(b) L : R^2 → P_2(R) defined by L(a_1, a_2) = (a_1 + a_2) + a_1x^2, B = { [1, -1]^T, [1, 2]^T }, C = {1 + x^2, 1 + x, -1 - x + x^2}

(c) T : M_{2x2}(R) → R^2 defined by T([a b; c d]) = [a + c, b + c]^T, B = { [1 1; 0 0], [0 1; 1 0], [1 0; 0 1], [0 0; 1 1] }, C = { [1, 1]^T, [-1, 1]^T }

2. Find the B-matrix of each linear mapping with respect to the given basis B.

(a) D : P_2(R) → P_2(R) defined by D(a + bx + cx^2) = b + 2cx, B = {1, x, x^2}

(b) L : M_{2x2}(R) → M_{2x2}(R) defined by L([a b; c d]) = [a  b + c; 0  d], B = { [1 1; 0 0], [1 0; 0 1], [1 1; 0 1], [0 0; 1 0] }

(c) T : U → U, where U is the subspace of diagonal matrices in M_{2x2}(R), defined by T([a 0; 0 b]) = [a + b  0; 0  2a + b], B = { [1 0; 0 1], [2 0; 0 3] }

3. Let V be an n-dimensional vector space and let L : V → V be the linear operator defined by L(~v) = ~v. Prove that [L]_B = I for any basis B of V.

4. Invent a mapping L : R^2 → R^2 and a basis B for R^2 such that Col([L]_B) ≠ Range(L). Justify your answer.


8.4 Isomorphisms

Recall that the ten vector space axioms define a "structure" for the set based on the operations of addition and scalar multiplication. Since all vector spaces satisfy the same ten properties, we would expect that all n-dimensional vector spaces should have the same structure. This idea seems to be confirmed by our work with coordinate vectors in Math 136. No matter which n-dimensional vector space V we use and which basis we use for the vector space, we have a nice way of relating the vectors in V to vectors in R^n. Moreover, with respect to this basis, the operations of addition and scalar multiplication are preserved. Consider the following calculations:

[1, 2, 3]^T + [4, 5, 6]^T = [5, 7, 9]^T
(1 + 2x + 3x^2) + (4 + 5x + 6x^2) = 5 + 7x + 9x^2
(~v_1 + 2~v_2 + 3~v_3) + (4~v_1 + 5~v_2 + 6~v_3) = 5~v_1 + 7~v_2 + 9~v_3

No matter which vector space we are using and which basis for that vector space, any linear combination of vectors is really just performed on the coordinates of the vectors with respect to the basis.

We will now look at how to use general linear mappings to prove these observations mathematically. We begin with some familiar concepts.

One-To-One and Onto

DEFINITION (One-To-One, Onto)

Let L : V → W be a linear mapping. L is called one-to-one (injective) if for every ~u, ~v in V such that L(~u) = L(~v), we must have ~u = ~v. L is called onto (surjective) if for every ~w in W there exists ~v in V such that L(~v) = ~w.

EXAMPLE 1: Let L : R^2 → P_2(R) be the linear mapping defined by L(a_1, a_2) = a_1 + a_2x^2. Then L is one-to-one, since if L(a_1, a_2) = L(b_1, b_2), then a_1 + a_2x^2 = b_1 + b_2x^2 and hence a_1 = b_1 and a_2 = b_2. L is not onto, since there is no ~a in R^2 such that L(~a) = x.

EXAMPLE 2: Let L : M_{2x2}(R) → R be the linear mapping defined by L(A) = tr A. Then L is not one-to-one, since L([1 0; 0 -1]) = L([0 0; 0 0]) but [1 0; 0 -1] ≠ [0 0; 0 0]. L is onto, since if we pick any x in R, then we can find a matrix A in M_{2x2}(R), for example A = [x 0; 0 0], such that L(A) = x.

Observe that a linear mapping being one-to-one means that each ~w in Range(L) is mapped to by exactly one ~v in V, while L being onto means that Range(L) = W. So there is a relationship between a mapping being onto and its range. We now establish a relationship between a mapping being one-to-one and its kernel. These relationships will be exploited in some proofs below.


LEMMA 1: Let L : V → W be a linear mapping. L is one-to-one if and only if Ker(L) = {~0}.

Proof: Assume L is one-to-one. Let ~x be in Ker(L). Then L(~x) = ~0 = L(~0), which implies that ~x = ~0 since L is one-to-one. Hence Ker(L) = {~0}.

Conversely, assume Ker(L) = {~0} and consider L(~u) = L(~v). Then

~0 = L(~u) - L(~v) = L(~u - ~v)

Hence, ~u - ~v is in Ker(L), so ~u - ~v = ~0. Therefore, ~u = ~v, and so L is one-to-one. []

EXAMPLE 3: Let L : R^3 → P_5(R) be the linear mapping defined by

L(a, b, c) = a(1 + x^3) + b(1 + x^2 + x^5) + cx^4

Prove that L is one-to-one, but not onto.

Solution: Let ~a = [a, b, c]^T be in Ker(L). Then

0 = L(a, b, c) = a(1 + x^3) + b(1 + x^2 + x^5) + cx^4

This gives a = b = c = 0, so Ker(L) = {~0}. Hence, L is one-to-one by Lemma 1.

Observe that a basis for Range(L) is {1 + x^3, 1 + x^2 + x^5, x^4}. Hence, dim(Range(L)) = 3 < dim P_5(R). Thus, Range(L) ≠ P_5(R), so L is not onto.

EXAMPLE 4: Let L : M_{2x2}(R) → R^2 be the linear mapping defined by

L([a b; c d]) = [a, b]^T

Prove that L is onto, but not one-to-one.

Solution: Observe that [0 0; 1 1] is in Ker(L). Thus, L is not one-to-one by Lemma 1.

Let [x_1, x_2]^T be in R^2. Then observe that L([x_1 x_2; 0 0]) = [x_1, x_2]^T, so L is onto.

EXERCISE 1: Let L : R^3 → P_2(R) be the linear mapping defined by L(a, b, c) = a + bx + cx^2. Prove that L is one-to-one and onto.


Isomorphisms

To prove that two vector spaces V and W have the same structure, we need to relate each vector ~v_i in V with a unique vector ~w_i in W such that if a~v_1 + b~v_2 = ~v_3, then a~w_1 + b~w_2 = ~w_3. Thus, we will require a linear mapping from V to W which is both one-to-one and onto.

DEFINITION (Isomorphism, Isomorphic)

Let V and W be vector spaces. Then V is said to be isomorphic to W if there exists a linear mapping L : V → W which is one-to-one and onto. L is called an isomorphism from V to W.

EXAMPLE 5: Show that R^4 is isomorphic to M_{2x2}(R).

Solution: We need to invent a linear mapping L from R^4 to M_{2x2}(R) that is one-to-one and onto. To define such a linear mapping, we think about how to relate vectors in R^4 to vectors in M_{2x2}(R). Keeping in mind our work at the beginning of this section, we define the mapping L : R^4 → M_{2x2}(R) by

L(a_0, a_1, a_2, a_3) = [ a_0  a_1 ]
                        [ a_2  a_3 ]

Linear: Let ~a = [a_0, a_1, a_2, a_3]^T, ~b = [b_0, b_1, b_2, b_3]^T be in R^4 and s, t in R. Then L is linear since

L(s~a + t~b) = L(sa_0 + tb_0, sa_1 + tb_1, sa_2 + tb_2, sa_3 + tb_3)
             = [sa_0 + tb_0  sa_1 + tb_1; sa_2 + tb_2  sa_3 + tb_3]
             = s[a_0 a_1; a_2 a_3] + t[b_0 b_1; b_2 b_3]
             = sL(~a) + tL(~b)

One-To-One: If L(~a) = L(~b), then [a_0 a_1; a_2 a_3] = [b_0 b_1; b_2 b_3], which implies a_i = b_i for i = 0, 1, 2, 3. Therefore, ~a = ~b, and so L is one-to-one.

Onto: Pick [a_0 a_1; a_2 a_3] in M_{2x2}(R). Then L(a_0, a_1, a_2, a_3) = [a_0 a_1; a_2 a_3], so L is onto.

Hence, L is an isomorphism, and so R^4 is isomorphic to M_{2x2}(R).

It is instructive to think carefully about the isomorphism in the example above. Observe that the images of the standard basis vectors for R^4 are the standard basis vectors for M_{2x2}(R). That is, the isomorphism maps basis vectors to basis vectors. We will keep this in mind when constructing an isomorphism in the next example.
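In computational terms, this particular isomorphism is just a reshape: the four entries are untouched and only the indexing changes, which is exactly why addition and scalar multiplication are preserved. A NumPy illustration (assuming row-major flattening, which matches the map L above):

    import numpy as np

    def L(v):
        # The isomorphism R^4 -> M_{2x2}(R): reshape, entries unchanged.
        return v.reshape(2, 2)

    a = np.array([1.0, 2.0, 3.0, 4.0])
    b = np.array([5.0, 6.0, 7.0, 8.0])
    s, t = 2.0, -1.0

    # Linearity: L(s*a + t*b) == s*L(a) + t*L(b)
    assert np.allclose(L(s * a + t * b), s * L(a) + t * L(b))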


EXAMPLE 6: Let T = {p(x) in P_2(R) | p(1) = 0} and S = { [a, b, c]^T in R^3 | a + b + c = 0 }. Prove that T is isomorphic to S.

Solution: By the Factor Theorem, every vector p(x) in T has the form (x - 1)(a + bx). Hence, a basis for T is B = {1 - x, x - x^2}. Also, every vector ~x in S has the form

~x = [a, b, -a - b]^T = a[1, 0, -1]^T + b[0, 1, -1]^T

Therefore, a basis for S is C = { [1, 0, -1]^T, [0, 1, -1]^T }. So, to define an isomorphism, we want to map the vectors in B to the vectors in C. In particular, we define L : T → S by

L(a(1 - x) + b(x - x^2)) = a[1, 0, -1]^T + b[0, 1, -1]^T

Linear: Let p(x), q(x) be in T with p(x) = a_1(1 - x) + b_1(x - x^2) and q(x) = a_2(1 - x) + b_2(x - x^2). Then for any s, t in R we get

L(sp(x) + tq(x)) = L((sa_1 + ta_2)(1 - x) + (sb_1 + tb_2)(x - x^2))
                 = (sa_1 + ta_2)[1, 0, -1]^T + (sb_1 + tb_2)[0, 1, -1]^T
                 = s(a_1[1, 0, -1]^T + b_1[0, 1, -1]^T) + t(a_2[1, 0, -1]^T + b_2[0, 1, -1]^T)
                 = sL(p(x)) + tL(q(x))

Thus L is linear.

One-To-One: Let a(1 - x) + b(x - x^2) be in Ker(L). Then

[0, 0, 0]^T = L(a(1 - x) + b(x - x^2)) = a[1, 0, -1]^T + b[0, 1, -1]^T = [a, b, -a - b]^T

Hence, a = b = 0, so Ker(L) = {~0}. Therefore, by Lemma 1, L is one-to-one.

Onto: Let a[1, 0, -1]^T + b[0, 1, -1]^T be in S. Then L(a(1 - x) + b(x - x^2)) = a[1, 0, -1]^T + b[0, 1, -1]^T, so L is onto.

Thus, L is an isomorphism and T is isomorphic to S.


EXERCISE 2: Prove that P_2(R) is isomorphic to R^3.

Observe in the examples above that the isomorphic vector spaces have the same dimension. This makes sense, since if we are mapping basis vectors to basis vectors, then both vector spaces need to have the same number of basis vectors. We now prove that this is indeed always the case.

THEOREM 2: Let V and W be finite dimensional vector spaces. Then V is isomorphic to W if and only if dim V = dim W.

Proof: On one hand, assume that dim V = dim W = n. Let B = {~v_1, ..., ~v_n} be a basis for V and C = {~w_1, ..., ~w_n} be a basis for W. As in our examples above, we define a mapping L : V → W which maps the basis vectors in B to the basis vectors in C. So, we define

L(t_1~v_1 + ... + t_n~v_n) = t_1~w_1 + ... + t_n~w_n

Linear: Let ~x = a_1~v_1 + ... + a_n~v_n, ~y = b_1~v_1 + ... + b_n~v_n be in V and s, t in R. Then

L(s~x + t~y) = L((sa_1 + tb_1)~v_1 + ... + (sa_n + tb_n)~v_n)
             = (sa_1 + tb_1)~w_1 + ... + (sa_n + tb_n)~w_n
             = s(a_1~w_1 + ... + a_n~w_n) + t(b_1~w_1 + ... + b_n~w_n)
             = sL(~x) + tL(~y)

Therefore, L is linear.

One-To-One: If ~v is in Ker(L), then

~0 = L(~v) = L(c_1~v_1 + ... + c_n~v_n) = c_1~w_1 + ... + c_n~w_n

Since C is linearly independent, we get c_1 = ... = c_n = 0. Therefore Ker(L) = {~0}, and hence L is one-to-one.

Onto: We have Ker(L) = {~0} since L is one-to-one. Thus, we get rank(L) = dim V - 0 = n by the Rank-Nullity Theorem. Consequently, Range(L) is an n-dimensional subspace of W, which implies that Range(L) = W, so L is onto.

Thus, L is an isomorphism from V to W, and so V is isomorphic to W.

On the other hand, assume that V is isomorphic to W. Then there exists an isomorphism L from V to W. Since L is one-to-one and onto, we get Range(L) = W and Ker(L) = {~0}. Thus, the Rank-Nullity Theorem gives

dim W = dim(Range(L)) = rank(L) = dim V - nullity(L) = dim V

as required. []


The proof not only proves the theorem, but also demonstrates a few additional facts. First, it shows the intuitively obvious fact that if V is isomorphic to W, then W is isomorphic to V. Typically we just say that V and W are isomorphic. Second, it confirms that we can make an isomorphism from V to W by mapping basis vectors of V to basis vectors of W. Finally, observe that once we had proven that L was one-to-one, we could exploit the Rank-Nullity Theorem and the fact that dim V = dim W to immediately get that the mapping was onto. In particular, it shows us how to prove the following theorem.

THEOREM 3: If V and W are isomorphic vector spaces and L : V → W is linear, then L is one-to-one if and only if L is onto.

REMARK

Note that if V and W are both n-dimensional, this does not mean that every linear mapping L : V → W must be an isomorphism. For example, L : R^4 → M_{2x2}(R) defined by L(x_1, x_2, x_3, x_4) = [0 0; 0 0] is definitely not one-to-one nor onto!

We saw that we can make an isomorphism by mapping basis vectors to basis vectors. The following theorem shows that this property actually characterizes isomorphisms.

THEOREM 4: Let V and W be isomorphic vector spaces and let {~v_1, ..., ~v_n} be a basis for V. Then a linear mapping L : V → W is an isomorphism if and only if {L(~v_1), ..., L(~v_n)} is a basis for W.

The following exercise is intended to be quite challenging. It requires a complete understanding of what we have done in this chapter.

EXERCISE 3: Use the following steps to find a basis for the vector space L of all linear operators L : V → V on a 2-dimensional vector space V.

1. Prove that L is isomorphic to M_{2x2}(R) by constructing an isomorphism T.

2. Find an isomorphism T^{-1} from M_{2x2}(R) to L. Let {~e_1, ~e_2, ~e_3, ~e_4} denote the standard basis vectors for M_{2x2}(R). Calculate T^{-1}(~e_1), T^{-1}(~e_2), T^{-1}(~e_3), T^{-1}(~e_4). By Theorem 4, these vectors form a basis for L.

3. Demonstrate that you are correct by proving that the set {T^{-1}(~e_1), T^{-1}(~e_2), T^{-1}(~e_3), T^{-1}(~e_4)} is linearly independent and spans L.


Section 8.4 Problems

1. For each of the following pairs of vector spaces, define an explicit isomorphism to establish that the spaces are isomorphic. Prove that your map is an isomorphism.

(a) P_1(R) and R^2

(b) P_3(R) and M_{2x2}(R)

(c) L : P_2(R) → P_2(R) defined by L(a + bx + cx^2) = (a - b) + (bc)x^2

(d) The vector space P = {p(x) in P_3 | p(1) = 0} and the vector space U of 2 x 2 upper triangular matrices

(e) The vector space S = { [x_1, x_2, 0, x_3]^T in R^4 | x_1 - x_2 + x_3 = 0 } and the vector space U = {a + bx + cx^2 in P_2 | a = b}

(f) Any n-dimensional vector space V and P_{n-1}(R).

2. Let V and W be n-dimensional vector spaces and let L : V → W be a linear mapping. Prove that L is one-to-one if and only if L is onto.

3. Let L : V → W be an isomorphism and let {~v_1, ..., ~v_n} be a basis for V. Prove that {L(~v_1), ..., L(~v_n)} is a basis for W.

4. Let L : V → U and M : U → W be linear mappings.

(a) Prove that if L and M are onto, then M ∘ L is onto.

(b) Give an example where L is not onto, but M ∘ L is onto.

(c) Is it possible to give an example where M is not onto, but M ∘ L is onto? Explain.

5. Let V and W be vector spaces with dim V = n and dim W = m, let L : V → W be a linear mapping, and let A be the matrix of L with respect to bases B for V and C for W.

(a) Define an explicit isomorphism from Range(L) to Col(A). Prove that your map is an isomorphism.

(b) Use (a) to prove that rank(L) = rank(A).

6. Let U and V be finite dimensional vector spaces and let L : U → V and M : V → U both be one-to-one linear mappings. Prove that U and V are isomorphic.


Chapter 9

Inner Products

In Math 136, we looked at the dot product function for vectors in R^n. We saw that the dot product has an important relationship to the length of a vector and to the angle between two vectors. Moreover, the dot product gave us an easy way of finding the projection of a vector onto a plane. In Chapter 8, we saw that every n-dimensional vector space is identical to R^n in terms of being a vector space. Thus, we should be able to extend these ideas to general vector spaces.

9.1 Inner Product Spaces

Recall that we defined a vector space so that the operations of addition and scalar multiplication had the essential properties of addition and scalar multiplication of vectors in R^n. Thus, to generalize the concept of the dot product to general vector spaces, it makes sense to include the essential properties of the dot product in our definition.

DEFINITION (Inner Product)

Let V be a vector space. An inner product on V is a function ⟨ , ⟩ : V x V → R that has the following properties: for every ~v, ~u, ~w in V and s, t in R we have

I1 ⟨~v, ~v⟩ >= 0, and ⟨~v, ~v⟩ = 0 if and only if ~v = ~0 (positive definite)
I2 ⟨~v, ~w⟩ = ⟨~w, ~v⟩ (symmetric)
I3 ⟨s~v + t~u, ~w⟩ = s⟨~v, ~w⟩ + t⟨~u, ~w⟩ (bilinear)

DEFINITION (Inner Product Space)

A vector space V with an inner product ⟨ , ⟩ on V is called an inner product space.

In the same way that a vector space is dependent on the defined operations of addition and scalar multiplication, an inner product space is dependent on the definitions of addition, scalar multiplication, and the inner product. This will be demonstrated in the examples below.


EXAMPLE 1: The dot product is an inner product on R^n, called the standard inner product on R^n.

EXAMPLE 2: Which of the following defines an inner product on R^3?

(a) ⟨~x, ~y⟩ = x_1y_1 + 2x_2y_2 + 4x_3y_3

Solution: We have

⟨~x, ~x⟩ = x_1^2 + 2x_2^2 + 4x_3^2 >= 0

and ⟨~x, ~x⟩ = 0 if and only if ~x = ~0. Hence, it is positive definite. Next,

⟨~x, ~y⟩ = x_1y_1 + 2x_2y_2 + 4x_3y_3 = y_1x_1 + 2y_2x_2 + 4y_3x_3 = ⟨~y, ~x⟩

Therefore, it is symmetric. Also, it is bilinear since

⟨s~x + t~y, ~z⟩ = (sx_1 + ty_1)z_1 + 2(sx_2 + ty_2)z_2 + 4(sx_3 + ty_3)z_3
              = s(x_1z_1 + 2x_2z_2 + 4x_3z_3) + t(y_1z_1 + 2y_2z_2 + 4y_3z_3)
              = s⟨~x, ~z⟩ + t⟨~y, ~z⟩

Thus ⟨ , ⟩ is an inner product.

(b) ⟨~x, ~y⟩ = x_1y_1 - x_2y_2 + x_3y_3

Solution: Observe that if ~x = [0, 1, 0]^T, then

⟨~x, ~x⟩ = 0(0) - 1(1) + 0(0) = -1 < 0

So, it is not positive definite, and thus it is not an inner product.

(c) ⟨~x, ~y⟩ = x_1y_1 + x_2y_2

Solution: If ~x = [0, 0, 1]^T, then

⟨~x, ~x⟩ = 0(0) + 0(0) = 0

but ~x ≠ ~0. So, it is not positive definite, and thus it is not an inner product.

REMARK

A vector space can have infinitely many different inner products. However, different inner products do not necessarily differ in an interesting way. For example, in R^n all inner products behave like the dot product with respect to some basis.


EXAMPLE 3: On the vector space M_{2x2}(R), define ⟨ , ⟩ by

⟨A, B⟩ = tr(B^T A)

where tr is the trace of a matrix, defined as the sum of the diagonal entries. Show that ⟨ , ⟩ is an inner product on M_{2x2}(R).

Solution: Let A, B, C be in M_{2x2}(R) and s, t in R. Then:

⟨A, A⟩ = tr(A^T A) = tr( [a_11 a_21; a_12 a_22][a_11 a_12; a_21 a_22] )
       = tr( [a_11^2 + a_21^2  a_11a_12 + a_21a_22; a_11a_12 + a_21a_22  a_12^2 + a_22^2] )
       = a_11^2 + a_21^2 + a_12^2 + a_22^2 >= 0

Moreover, ⟨A, A⟩ = 0 if and only if A is the zero matrix. Hence, ⟨ , ⟩ is positive definite. Next,

⟨A, B⟩ = tr(B^T A) = tr( [b_11 b_21; b_12 b_22][a_11 a_12; a_21 a_22] )
       = a_11b_11 + a_12b_12 + a_21b_21 + a_22b_22

With a similar calculation we find that

⟨B, A⟩ = a_11b_11 + a_12b_12 + a_21b_21 + a_22b_22 = ⟨A, B⟩

Therefore, ⟨ , ⟩ is symmetric. In Section 8.1 we proved that tr is a linear operator. Hence, we get

⟨sA + tB, C⟩ = tr(C^T(sA + tB)) = tr(sC^T A + tC^T B)
             = s tr(C^T A) + t tr(C^T B) = s⟨A, C⟩ + t⟨B, C⟩

Thus, ⟨ , ⟩ is also bilinear, and so it is an inner product on M_{2x2}(R).

REMARKS

1. Of course, this can be generalized to the inner product ⟨A, B⟩ = tr(B^T A) on M_{m x n}(R). This is called the standard inner product on M_{m x n}(R).

2. Observe that calculating ⟨A, B⟩ corresponds exactly to finding the dot product of the isomorphic vectors in R^{mn}. As a result, when using this inner product you do not actually have to compute B^T A.
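The second remark is easy to see in code: flattening A and B and taking the ordinary dot product gives the same number as tr(B^T A). A NumPy sketch with arbitrary test matrices (an assumption of convenience):

    import numpy as np

    A = np.array([[1.0, 2.0], [0.0, -1.0]])
    B = np.array([[3.0, -4.0], [2.0, 5.0]])

    inner = np.trace(B.T @ A)        # <A, B> = tr(B^T A)
    dot = A.flatten() @ B.flatten()  # dot product of the isomorphic vectors in R^4

    assert np.isclose(inner, dot)    # both equal 1(3) + 2(-4) + 0(2) + (-1)(5) = -10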


EXAMPLE 4: Prove that the function ⟨ , ⟩ defined by

⟨p(x), q(x)⟩ = p(-1)q(-1) + p(0)q(0) + p(1)q(1)

is an inner product on P_2(R).

Solution: Let p(x) = a + bx + cx^2, q(x), r(x) be in P_2(R). Then:

⟨p(x), p(x)⟩ = [p(-1)]^2 + [p(0)]^2 + [p(1)]^2 = [a - b + c]^2 + [a]^2 + [a + b + c]^2 >= 0

Moreover, ⟨p(x), p(x)⟩ = 0 if and only if a - b + c = 0, a = 0, and a + b + c = 0, which implies a = b = c = 0. Hence, ⟨ , ⟩ is positive definite. We have

⟨p(x), q(x)⟩ = p(-1)q(-1) + p(0)q(0) + p(1)q(1)
             = q(-1)p(-1) + q(0)p(0) + q(1)p(1)
             = ⟨q(x), p(x)⟩

Hence, ⟨ , ⟩ is symmetric. Finally, we get

⟨(sp + tq)(x), r(x)⟩ = (sp + tq)(-1)r(-1) + (sp + tq)(0)r(0) + (sp + tq)(1)r(1)
                     = [sp(-1) + tq(-1)]r(-1) + [sp(0) + tq(0)]r(0) + [sp(1) + tq(1)]r(1)
                     = s[p(-1)r(-1) + p(0)r(0) + p(1)r(1)] + t[q(-1)r(-1) + q(0)r(0) + q(1)r(1)]
                     = s⟨p(x), r(x)⟩ + t⟨q(x), r(x)⟩

Therefore, ⟨ , ⟩ is also bilinear and hence is an inner product on P_2(R).

EXAMPLE 5: An extremely important inner product in applied mathematics, physics, and engineering is the inner product

⟨f(x), g(x)⟩ = ∫ from -π to π of f(x)g(x) dx

on the vector space C[-π, π] of continuous functions defined on the closed interval from -π to π. This is the foundation for Fourier series.

THEOREM 1: Let V be an inner product space with inner product ⟨ , ⟩. Then for any ~v in V we have

⟨~v, ~0⟩ = 0


REMARK

Whenever one considers an inner product space, one must define which inner product is being used. However, in R^n and M_{m x n}(R) one generally uses the standard inner product. Thus, whenever we are working in R^n or M_{m x n}(R) and no other inner product is mentioned, we will take this to mean that the standard inner product is being used.

Section 9.1 Problems

1. Calculate the following inner products under the standard inner product on M_{2x2}(R).

(a) ⟨[1 2; 0 -1], [1 2; 0 -1]⟩

(b) ⟨[3 -4; 2 5], [4 1; 2 -1]⟩

(c) ⟨[2 3; -4 1], [-5 3; 1 5]⟩

2. Calculate the following, given that the function

⟨p(x), q(x)⟩ = p(0)q(0) + p(1)q(1) + p(2)q(2)

defines an inner product on P_2(R).

(a) ⟨x, 1 + x + x^2⟩   (b) ⟨1 + x^2, 1 + x^2⟩   (c) ⟨x + x^2, 1 + x + x^2⟩

3. Determine, with proof, which of the following functions defines an inner product on the given vector space.

(a) On R^3: ⟨~x, ~y⟩ = 2x_1y_1 + 3x_2y_2 + 4x_3y_3

(b) On P_2(R): ⟨p(x), q(x)⟩ = p(-1)q(-1) + p(1)q(1)

(c) On M_{2x2}(R): ⟨[a_1 a_2; a_3 a_4], [b_1 b_2; b_3 b_4]⟩ = a_1b_1 - a_4b_4 + a_2b_2 - a_3b_3

(d) On P_2(R): ⟨p(x), q(x)⟩ = p(-2)q(-2) + p(1)q(1) + p(2)q(2)

(e) On R^3: ⟨~x, ~y⟩ = 2x_1y_1 - x_1y_2 - x_2y_1 + 2x_2y_2 + x_3y_3

(f) On P_2(R): ⟨p(x), q(x)⟩ = 2p(-1)q(-1) + 2p(0)q(0) + 2p(1)q(1) - p(-1)q(1) - p(1)q(-1)

4. Let V be an inner product space with inner product ⟨ , ⟩. Prove that ⟨~v, ~0⟩ = 0 for any ~v in V.

5. Let ⟨ , ⟩ denote the standard inner product on R^n and let A be in M_{n x n}(R). Prove that

⟨A~x, ~y⟩ = ⟨~x, A^T~y⟩

for all ~x, ~y in R^n. [HINT: Recall that ~x^T~y = ~x · ~y.]


9.2 Orthogonality and Length

The concepts of length and orthogonality are fundamental in geometry and have many real-world applications. Thus, we now extend these concepts to general inner product spaces.

Length

DEFINITION (Length)

Let V be an inner product space with inner product ⟨ , ⟩. Then for any ~v in V we define the length (or norm) of ~v by

‖~v‖ = sqrt(⟨~v, ~v⟩)

Observe that the length of a vector in an inner product space is dependent on the definition of the inner product being used, in the same way that the sum of two vectors is dependent on the definition of addition being used. This is demonstrated in the examples below.

EXAMPLE 1: Find the length of ~x = [1, 0, 2]^T in R^3 with the inner product ⟨~x, ~y⟩ = x_1y_1 + 2x_2y_2 + 4x_3y_3.

Solution: We have

‖~x‖ = sqrt(⟨~x, ~x⟩) = sqrt(1(1) + 2(0)(0) + 4(2)(2)) = sqrt(17)

Observe that under the standard inner product the length of ~x would have been sqrt(5).

EXAMPLE 2: Let p(x) = x and q(x) = 2 - 3x^2 be vectors in P_2(R) with inner product

⟨p(x), q(x)⟩ = p(-1)q(-1) + p(0)q(0) + p(1)q(1)

Calculate the length of p(x) and the length of q(x).

Solution: We have

‖p(x)‖ = sqrt(⟨p(x), p(x)⟩) = sqrt(p(-1)p(-1) + p(0)p(0) + p(1)p(1)) = sqrt((-1)(-1) + 0(0) + 1(1)) = sqrt(2)

‖q(x)‖ = sqrt(⟨q(x), q(x)⟩) = sqrt(q(-1)q(-1) + q(0)q(0) + q(1)q(1)) = sqrt((-1)(-1) + 2(2) + (-1)(-1)) = sqrt(6)


EXAMPLE 3: Find the length of p(x) = x in P_2(R) with inner product

⟨p(x), q(x)⟩ = p(0)q(0) + p(1)q(1) + p(2)q(2)

Solution: We have

‖p(x)‖ = sqrt(⟨p(x), p(x)⟩) = sqrt(p(0)p(0) + p(1)p(1) + p(2)p(2)) = sqrt(0(0) + 1(1) + 2(2)) = sqrt(5)

EXAMPLE 4: On C[-π, π], find the length of f(x) = x under the inner product

⟨f(x), g(x)⟩ = ∫ from -π to π of f(x)g(x) dx

Solution: We have

‖x‖^2 = ∫ from -π to π of x^2 dx = (1/3)x^3 evaluated from -π to π = (2/3)π^3

Thus, ‖x‖ = sqrt(2π^3/3).
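The integral is mechanical but worth verifying; a short SymPy check (an assumption of convenience):

    from sympy import symbols, integrate, sqrt, pi

    x = symbols('x')
    norm_sq = integrate(x * x, (x, -pi, pi))  # <x, x> on C[-pi, pi]
    print(norm_sq)        # 2*pi**3/3
    print(sqrt(norm_sq))  # sqrt(6)*pi**(3/2)/3, i.e. sqrt(2*pi^3/3)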

EXAMPLE 5: Find the length of

A = [  1/2   1/2 ]
    [ -1/2  -1/2 ]

in M_{2x2}(R).

Solution: Using the standard inner product, we get

‖A‖ = sqrt(⟨A, A⟩) = sqrt((1/2)^2 + (1/2)^2 + (-1/2)^2 + (-1/2)^2) = 1

In many cases it is useful (and sometimes necessary) to use vectors which have length one.

DEFINITION (Unit Vector)

Let V be an inner product space with inner product ⟨ , ⟩. If ~v in V is a vector such that ‖~v‖ = 1, then ~v is called a unit vector.

In many situations, we will be given a general vector ~v in V and need to find a unit vector in the direction of ~v. This is called normalizing the vector. The following theorem shows us how to do this.

THEOREM 1: Let V be an inner product space with inner product ⟨ , ⟩. Then for any ~v in V and t in R, we have

‖t~v‖ = |t| ‖~v‖

Moreover, if ~v ≠ ~0, then v̂ = (1/‖~v‖)~v is a unit vector in the direction of ~v.


EXAMPLE 6 Find a unit vector in the direction of p(x) = x in P2(R) with inner product

⟨p(x), q(x)⟩ = p(0)q(0) + p(1)q(1) + p(2)q(2)

Solution: We have

‖p(x)‖ = √(p(0)p(0) + p(1)p(1) + p(2)p(2)) = √(0(0) + 1(1) + 2(2)) = √5

Thus, a unit vector in the direction of p is

p̂(x) = (1/√5) x
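For readers who want to experiment, this computation is easy to check numerically. The following is a minimal sketch, not part of the original notes; it assumes Python with NumPy is available and represents a polynomial by its coefficient vector (a0, a1, a2).

    import numpy as np

    # Evaluation points defining the inner product <p, q> = p(0)q(0) + p(1)q(1) + p(2)q(2)
    pts = np.array([0.0, 1.0, 2.0])

    def ip(p, q):
        # p, q are coefficient arrays (a0, a1, a2); polyval wants highest degree first
        return np.sum(np.polyval(p[::-1], pts) * np.polyval(q[::-1], pts))

    p = np.array([0.0, 1.0, 0.0])        # p(x) = x
    norm_p = np.sqrt(ip(p, p))           # sqrt(5)
    p_hat = p / norm_p                   # coefficients of x / sqrt(5)
    print(norm_p, p_hat)                 # 2.2360..., [0. 0.4472... 0.]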

Orthogonality

DEFINITION Orthogonal Vectors

Let V be an inner product space with inner product ⟨ , ⟩. If ~x, ~y ∈ V are such that

⟨~x, ~y⟩ = 0

then ~x and ~y are said to be orthogonal.

EXAMPLE 7 The standard basis vectors for R2 are not orthogonal under the inner product ⟨~x, ~y⟩ = 2x1y1 + x1y2 + x2y1 + 2x2y2 for R2 since

⟨(1, 0)^T, (0, 1)^T⟩ = 2(1)(0) + 1(1) + 0(0) + 2(0)(1) = 1

EXAMPLE 8 Show that [1 2; 3 −1] is orthogonal to [−1 2; −1 0] in M2×2(R).

Solution: ⟨[1 2; 3 −1], [−1 2; −1 0]⟩ = (−1)(1) + 2(2) + (−1)(3) + 0(−1) = 0.

So, they are orthogonal.

DEFINITION Orthogonal Set

If S = {~v1, . . . , ~vk} is a set of vectors in an inner product space V with inner product ⟨ , ⟩ such that ⟨~vi, ~vj⟩ = 0 for all i ≠ j, then S is called an orthogonal set.


EXAMPLE 9 Prove that {1, x, 3x^2 − 2} is an orthogonal set of vectors in P2(R) under the inner product ⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1).

Solution: We have

⟨1, x⟩ = 1(−1) + 1(0) + 1(1) = 0
⟨1, 3x^2 − 2⟩ = 1(1) + 1(−2) + 1(1) = 0
⟨x, 3x^2 − 2⟩ = (−1)(1) + 0(−2) + 1(1) = 0

Hence, each vector is orthogonal to all the other vectors, so the set is orthogonal.

EXAMPLE 10 Prove that {1, x, 3x^2 − 2} is not an orthogonal set of vectors in P2(R) under the inner product ⟨p(x), q(x)⟩ = p(0)q(0) + p(1)q(1) + p(2)q(2).

Solution: We have ⟨1, x⟩ = 1(0) + 1(1) + 1(2) = 3, so 1 and x are not orthogonal. Thus, the set is not orthogonal.

EXAMPLE 11 Prove that {[1 2; 0 1], [0 1; 0 −2], [0 0; 1 0]} is an orthogonal set in M2×2(R).

Solution: We have

⟨[1 2; 0 1], [0 1; 0 −2]⟩ = 1(0) + 2(1) + 0(0) + 1(−2) = 0
⟨[1 2; 0 1], [0 0; 1 0]⟩ = 1(0) + 2(0) + 0(1) + 1(0) = 0
⟨[0 0; 1 0], [0 1; 0 −2]⟩ = 0(0) + 0(1) + 1(0) + 0(−2) = 0

Hence, the set is orthogonal.
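Checks like the one in Example 11 amount to summing entrywise products, so they are one line of code each. A small sketch (an illustration assuming NumPy, not part of the original notes):

    import numpy as np

    # The set from Example 11 under the standard inner product on M2x2(R)
    mats = [np.array([[1, 2], [0, 1]]),
            np.array([[0, 1], [0, -2]]),
            np.array([[0, 0], [1, 0]])]

    # Standard inner product <A, B> = sum over i, j of a_ij * b_ij
    for i in range(len(mats)):
        for j in range(i + 1, len(mats)):
            print(i, j, np.sum(mats[i] * mats[j]))   # all three pairs print 0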

EXERCISE 1 Determine if {[3 1; −1 2], [1 −3; 2 1], [1 2; 3 −1]} is an orthogonal set in M2×2(R).

One important property of orthogonal vectors in R2 is the Pythagorean Theorem. This extends to an orthogonal set in an inner product space.

THEOREM 2 If {~v1, . . . , ~vk} is an orthogonal set in an inner product space V, then

‖~v1 + · · · + ~vk‖^2 = ‖~v1‖^2 + · · · + ‖~vk‖^2


The following theorem gives us an important result about orthogonal sets which do not contain the zero vector.

THEOREM 3 Let S = {~v1, . . . , ~vk} be an orthogonal set in an inner product space V with inner product ⟨ , ⟩ such that ~vi ≠ ~0 for all 1 ≤ i ≤ k. Then S is linearly independent.

Proof: Consider

~0 = c1~v1 + · · · + ck~vk

Take the inner product of both sides with ~vi to get

⟨~vi, ~0⟩ = ⟨~vi, c1~v1 + · · · + ck~vk⟩
0 = c1⟨~vi, ~v1⟩ + · · · + ci⟨~vi, ~vi⟩ + · · · + ck⟨~vi, ~vk⟩
= 0 + · · · + 0 + ci‖~vi‖^2 + 0 + · · · + 0
= ci‖~vi‖^2

Since ~vi ≠ ~0 we have that ‖~vi‖ ≠ 0, and so ci = 0 for 1 ≤ i ≤ k. Therefore, S is linearly independent. □

Consequently, if we have an orthogonal set of non-zero vectors which spans an inner product space V, then it will be a basis for V. We will call this an orthogonal basis. Since the vectors in the basis are all orthogonal to each other, our geometric intuition tells us that it should be quite easy to find the coordinates of any vector with respect to this basis. The following theorem demonstrates this.

THEOREM 4 If S = {~v1, . . . , ~vn} is an orthogonal basis for an inner product space V with inner product ⟨ , ⟩ and ~v ∈ V, then the coefficient of ~vi when ~v is written as a linear combination of the vectors in S is ⟨~v, ~vi⟩/‖~vi‖^2. In particular,

~v = (⟨~v, ~v1⟩/‖~v1‖^2)~v1 + · · · + (⟨~v, ~vn⟩/‖~vn‖^2)~vn

Proof: Since ~v ∈ V we can write

~v = c1~v1 + · · · + cn~vn

Taking the inner product of both sides with ~vi gives

⟨~vi, ~v⟩ = ⟨~vi, c1~v1 + · · · + cn~vn⟩
= c1⟨~vi, ~v1⟩ + · · · + ci⟨~vi, ~vi⟩ + · · · + cn⟨~vi, ~vn⟩
= 0 + · · · + 0 + ci‖~vi‖^2 + 0 + · · · + 0

since S is orthogonal. Also, ~vi ≠ ~0, which implies that ‖~vi‖^2 ≠ 0. Therefore, we get

ci = ⟨~v, ~vi⟩/‖~vi‖^2

This is valid for 1 ≤ i ≤ n and so the result follows. □


EXAMPLE 12 Find the coordinates of ~x = (1, 2, −3)^T in R3 with respect to the orthogonal basis

B = {(1, 1, 1)^T, (1, 0, −1)^T, (1, −2, 1)^T}

Solution: By Theorem 4 we have

c1 = ⟨~x, ~v1⟩/‖~v1‖^2 = 0/3 = 0
c2 = ⟨~x, ~v2⟩/‖~v2‖^2 = 4/2 = 2
c3 = ⟨~x, ~v3⟩/‖~v3‖^2 = −6/6 = −1

Thus, [~x]B = (0, 2, −1)^T.

Note that it is easy to verify that

(1, 2, −3)^T = 0 (1, 1, 1)^T + 2 (1, 0, −1)^T + (−1)(1, −2, 1)^T
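The coefficient formula of Theorem 4 translates directly into a computation. A minimal sketch of Example 12, again assuming NumPy and the standard inner product (the notes themselves do not use code):

    import numpy as np

    x = np.array([1.0, 2.0, -3.0])
    B = [np.array([1.0, 1.0, 1.0]),
         np.array([1.0, 0.0, -1.0]),
         np.array([1.0, -2.0, 1.0])]

    # c_i = <x, v_i> / ||v_i||^2 for an orthogonal (not necessarily orthonormal) basis
    coords = [np.dot(x, v) / np.dot(v, v) for v in B]
    print(coords)    # [0.0, 2.0, -1.0]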

EXAMPLE 13 Given that B = {[1 1; 1 1], [−2 0; 1 1], [0 0; 1 −1]} is an orthogonal basis for a subspace S of M2×2(R), find the coordinates of A = [4 2; 3 −1] with respect to B.

Solution: We have

c1 = ⟨A, ~v1⟩/‖~v1‖^2 = (4(1) + 2(1) + 3(1) + (−1)(1)) / (1^2 + 1^2 + 1^2 + 1^2) = 8/4 = 2
c2 = ⟨A, ~v2⟩/‖~v2‖^2 = (4(−2) + 2(0) + 3(1) + (−1)(1)) / ((−2)^2 + 0^2 + 1^2 + 1^2) = −6/6 = −1
c3 = ⟨A, ~v3⟩/‖~v3‖^2 = (4(0) + 2(0) + 3(1) + (−1)(−1)) / (0^2 + 0^2 + 1^2 + (−1)^2) = 4/2 = 2

Thus, [A]B = (2, −1, 2)^T.


Orthonormal Bases

Observe that the formula for the coordinates with respect to an orthogonal basis would be even easier if all the basis vectors were unit vectors. Thus, it is desirable to consider such bases.

DEFINITION Orthonormal Set

If S = {~v1, . . . , ~vk} is an orthogonal set in an inner product space V such that ‖~vi‖ = 1 for 1 ≤ i ≤ k, then S is called an orthonormal set.

By Theorem 3 an orthonormal set is necessarily linearly independent.

DEFINITION Orthonormal Basis

A basis for an inner product space V which is an orthonormal set is called an orthonormal basis of V.

EXAMPLE 14 The standard basis of Rn is an orthonormal basis under the standard inner product.

EXAMPLE 15 The standard basis for R3 is not an orthonormal basis under the inner product

⟨~x, ~y⟩ = x1y1 + 2x2y2 + x3y3

since ~e2 = (0, 1, 0)^T is not a unit vector under this inner product.

EXAMPLE 16 Let B = {(1/√3)(1, 1, −1)^T, (1/√2)(1, 0, 1)^T, (1/√6)(1, −2, −1)^T}. Verify that B is an orthonormal basis in R3.

Solution: We first verify that each vector is a unit vector. We have

⟨(1/√3)(1, 1, −1)^T, (1/√3)(1, 1, −1)^T⟩ = 1/3 + 1/3 + 1/3 = 1
⟨(1/√2)(1, 0, 1)^T, (1/√2)(1, 0, 1)^T⟩ = 1/2 + 0 + 1/2 = 1
⟨(1/√6)(1, −2, −1)^T, (1/√6)(1, −2, −1)^T⟩ = 1/6 + 4/6 + 1/6 = 1


We now verify that B is orthogonal.

⟨(1/√3)(1, 1, −1)^T, (1/√2)(1, 0, 1)^T⟩ = (1/√6)(1 + 0 − 1) = 0
⟨(1/√3)(1, 1, −1)^T, (1/√6)(1, −2, −1)^T⟩ = (1/√18)(1 − 2 + 1) = 0
⟨(1/√2)(1, 0, 1)^T, (1/√6)(1, −2, −1)^T⟩ = (1/√12)(1 + 0 − 1) = 0

Therefore, B is an orthonormal set of 3 vectors in R3, which implies that it is a basis for R3.

EXERCISE 2 Show that B = {[1/√2 1/√2; 0 0], [0 0; 1/√2 1/√2], [1/2 −1/2; 1/2 −1/2]} is an orthonormal set in M2×2(R).

EXERCISE 3 Show that B = {1, x, x^2} is not an orthonormal set for P2(R) under the inner product

⟨p(x), q(x)⟩ = p(−2)q(−2) + p(0)q(0) + p(2)q(2)

EXAMPLE 17 Turn {1, x, 3x^2 − 2} into an orthonormal basis for P2(R) under the inner product

⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1)

Solution: In Example 9 we showed that {1, x, 3x^2 − 2} is orthogonal. Hence, it is a linearly independent set of three vectors in P2(R) and thus is a basis for P2(R). To turn it into an orthonormal basis, we just need to normalize each vector. We have

‖1‖ = √(1^2 + 1^2 + 1^2) = √3
‖x‖ = √((−1)^2 + 0^2 + 1^2) = √2
‖3x^2 − 2‖ = √(1^2 + (−2)^2 + 1^2) = √6

Hence, an orthonormal basis for P2(R) is

{1/√3, x/√2, (3x^2 − 2)/√6}


EXAMPLE 18 Show that the set {1, sin x, cos x} is an orthogonal set in C[−π, π] under

⟨f(x), g(x)⟩ = ∫_{−π}^{π} f(x)g(x) dx

then make it an orthonormal set.

Solution:

⟨1, sin x⟩ = ∫_{−π}^{π} 1 · sin x dx = 0
⟨1, cos x⟩ = ∫_{−π}^{π} 1 · cos x dx = 0
⟨sin x, cos x⟩ = ∫_{−π}^{π} sin x · cos x dx = 0

Thus, the set is orthogonal. Next we find that

⟨1, 1⟩ = ∫_{−π}^{π} 1^2 dx = 2π
⟨sin x, sin x⟩ = ∫_{−π}^{π} sin^2 x dx = π
⟨cos x, cos x⟩ = ∫_{−π}^{π} cos^2 x dx = π

Hence, ‖1‖ = √(2π), ‖sin x‖ = √π, and ‖cos x‖ = √π. Therefore,

{1/√(2π), sin x/√π, cos x/√π}

is an orthonormal set in C[−π, π].
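These integrals can also be confirmed numerically. A sketch, assuming SciPy's quad integrator is available (this is an illustration by way of numerical quadrature, not part of the original notes):

    import numpy as np
    from scipy.integrate import quad

    # Inner product <f, g> = integral of f*g over [-pi, pi]
    def ip(f, g):
        return quad(lambda x: f(x) * g(x), -np.pi, np.pi)[0]

    one = lambda x: 1.0
    print(ip(one, np.sin), ip(one, np.cos), ip(np.sin, np.cos))  # all approximately 0
    print(ip(one, one), ip(np.sin, np.sin), ip(np.cos, np.cos))  # 2*pi, pi, pi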

The following theorem shows, as predicted, that it is very easy to find the coordinates of a vector with respect to an orthonormal basis.

THEOREM 5 If ~v is any vector in an inner product space V with inner product ⟨ , ⟩ and B = {~v1, . . . , ~vn} is an orthonormal basis for V, then

~v = ⟨~v, ~v1⟩~v1 + · · · + ⟨~v, ~vn⟩~vn

Proof: By Theorem 4 we have

~v = (⟨~v, ~v1⟩/‖~v1‖^2)~v1 + · · · + (⟨~v, ~vn⟩/‖~vn‖^2)~vn = ⟨~v, ~v1⟩~v1 + · · · + ⟨~v, ~vn⟩~vn

since ‖~vi‖ = 1 for 1 ≤ i ≤ n. □


REMARK

The formula for finding coordinates with respect to an orthonormal basis looks nicer than that for an orthogonal basis. However, in practice it is not necessarily easier to use, as the vectors in an orthonormal basis often contain square roots. Compare the following example to Example 12.

EXAMPLE 19 Find [(1, 2, 3)^T]B given that B = {(1/√3)(1, 1, −1)^T, (1/√2)(1, 0, 1)^T, (1/√6)(1, −2, −1)^T} is an orthonormal basis for R3.

Solution: We have

c1 = ⟨~x, ~v1⟩ = ⟨(1, 2, 3)^T, (1/√3)(1, 1, −1)^T⟩ = 0
c2 = ⟨~x, ~v2⟩ = ⟨(1, 2, 3)^T, (1/√2)(1, 0, 1)^T⟩ = 4/√2 = 2√2
c3 = ⟨~x, ~v3⟩ = ⟨(1, 2, 3)^T, (1/√6)(1, −2, −1)^T⟩ = −6/√6 = −√6

Thus, [~x]B = (0, 2√2, −√6)^T.

EXAMPLE 20 Let ⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1) be an inner product for P2(R). Write f(x) = 1 + x + x^2 as a linear combination of the vectors in the orthonormal basis B = {1/√3, x/√2, (3x^2 − 2)/√6}.

Solution: By Theorem 5, the coordinates of f with respect to B are

c1 = ⟨f, ~v1⟩ = 1(1/√3) + 1(1/√3) + 3(1/√3) = 5/√3
c2 = ⟨f, ~v2⟩ = 1(−1/√2) + 1(0) + 3(1/√2) = √2
c3 = ⟨f, ~v3⟩ = 1(1/√6) + 1(−2/√6) + 3(1/√6) = 2/√6

Thus,

1 + x + x^2 = (5/√3)(1/√3) + √2 (x/√2) + (2/√6)((3x^2 − 2)/√6)


Orthogonal Matrices

We end this section by looking at a very important application of orthonormal bases of Rn.

THEOREM 6 For an n × n matrix P, the following are equivalent.

(1) The columns of P form an orthonormal basis for Rn.
(2) PT = P−1
(3) The rows of P form an orthonormal basis for Rn.

Proof: Let P = [~v1 · · · ~vn] where B = {~v1, . . . , ~vn} is an orthonormal basis for Rn.

(1) ⇔ (2): By definition of matrix multiplication we have

PTP = [ ~v1 · ~v1   ~v1 · ~v2   · · ·   ~v1 · ~vn ]
      [ ~v2 · ~v1   ~v2 · ~v2   · · ·   ~v2 · ~vn ]
      [    ...          ...     . . .      ...    ]
      [ ~vn · ~v1   ~vn · ~v2   · · ·   ~vn · ~vn ]

Therefore PTP = I if and only if ~vi · ~vj = 0 whenever i ≠ j and ~vi · ~vi = 1.

(2) ⇔ (3): The rows of P form an orthonormal basis for Rn if and only if the columns of PT form an orthonormal basis for Rn. We proved above that this is true if and only if

(PT)T = (PT)−1
P = (P−1)T
PT = ((P−1)T)T
PT = P−1

as required. □

DEFINITION Orthogonal Matrix

Let P be an n × n matrix such that the columns of P form an orthonormal basis for Rn. Then P is called an orthogonal matrix.

REMARKS

1. Be very careful to remember that an orthogonal matrix has orthonormal rows and columns. A matrix whose columns form an orthogonal set but not an orthonormal set is not orthogonal!

2. By Theorem 6 we could have defined an orthogonal matrix as a matrix P such that PT = P−1 instead. We will use this property of orthogonal matrices frequently.
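Remark 2 also gives a one-line computational test for orthogonality: form PTP and compare it with I. A minimal sketch assuming NumPy, using the basis of Example 16 as columns:

    import numpy as np

    # Columns: (1,1,-1)/sqrt(3), (1,0,1)/sqrt(2), (1,-2,-1)/sqrt(6)
    P = np.array([[1.0, 1.0, 1.0],
                  [1.0, 0.0, -2.0],
                  [-1.0, 1.0, -1.0]]) / np.sqrt([3.0, 2.0, 6.0])

    # P is orthogonal if and only if P^T P = I
    print(np.allclose(P.T @ P, np.eye(3)))    # True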


EXAMPLE 21 Which of the following matrices are orthogonal?

A = [0 1 0; 0 0 1; 1 0 0],  B = [1/2 1/2; −1/2 1/2],  C = [1/√2 1/√2 0; 1/√6 −1/√6 2/√6; −1/√3 1/√3 1/√3]

Solution: The columns of A are the standard basis vectors for R3, which we know form an orthonormal basis for R3. Thus, A is orthogonal.

Although the columns of B are clearly orthogonal, we have that

‖(1/2, −1/2)^T‖ = √((1/2)^2 + (−1/2)^2) = √(1/2) ≠ 1

Thus, the first column is not a unit vector, so B is not orthogonal.

By matrix multiplication, we find that

CCT = [1/√2 1/√2 0; 1/√6 −1/√6 2/√6; −1/√3 1/√3 1/√3] [1/√2 1/√6 −1/√3; 1/√2 −1/√6 1/√3; 0 2/√6 1/√3] = I

Hence C is orthogonal.

We now look at some useful properties of orthogonal matrices.

THEOREM 7 Let P and Q be n × n orthogonal matrices and ~x, ~y ∈ Rn. Then

(1) (P~x) · (P~y) = ~x · ~y.
(2) ‖P~x‖ = ‖~x‖.
(3) det P = ±1.
(4) All real eigenvalues of P are 1 or −1.
(5) PQ is an orthogonal matrix.

Proof: We will prove (1) and (2) and leave the others as exercises. We have

(P~x) · (P~y) = (P~x)T(P~y) = ~xTPTP~y = ~xT~y = ~x · ~y

Thus,

‖P~x‖^2 = (P~x) · (P~x) = ~x · ~x = ‖~x‖^2

as required. □


Section 9.2 Problems

1. Consider the subspace S = Span B in M2×2(R) where

B = {[1 1; −1 −1], [1 −1; 1 −1], [−1 1; 1 −1]}

(a) Prove that B is an orthogonal basis for S.

(b) Turn B into an orthonormal basis C by normalizing the vectors in B.

(c) Find the coordinates of ~x = [−3 6; −2 −1] with respect to the orthonormal basis C.

2. Consider the inner product ⟨ , ⟩ on P2(R) defined by

⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1)

Let B = {1 + x, −2 + 3x, 2 − 3x^2}.

(a) Prove that B is an orthogonal basis for P2(R).

(b) Find the coordinates of 5 + x − 3x^2 with respect to B.

3. Consider the inner product ⟨ , ⟩ on R3 defined by ⟨~x, ~y⟩ = x1y1 + 2x2y2 + x3y3. Let B = {(1, 2, −1)^T, (3, −1, −1)^T, (3, 1, 7)^T}.

(a) Prove that B is an orthogonal basis for R3.

(b) Turn B into an orthonormal basis C by normalizing the vectors in B.

(c) Find the coordinates of ~x = (1, 1, 1)^T with respect to the orthonormal basis C.

4. Which of the following matrices are orthogonal?

(a) [1/√10 −3/√10; 3/√10 1/√10]

(b) [2 1; −1 2]

(c) [2/3 1/3 2/3; −1/√18 4/√18 −1/√18; −1/√2 0 1/√2]

(d) [1 1 1; 2 −1 0; 1 1 −1]

(e) [0 1 0; 0 0 1; 1 0 0]

(f) [2/√20 3/√11 0; 3/√20 −1/√11 −1/√2; 3/√20 −1/√11 1/√2]

5. Consider P2(R) with inner product ⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1). Given that B = {1 − x^2, (1/2)(x − x^2)} is an orthonormal set, extend B to find an orthonormal basis for P2(R).

6. Let V be an inner product space with inner product ⟨ , ⟩. Prove that for any ~v ∈ V and t ∈ R, we have

‖t~v‖ = |t| ‖~v‖


7. Let P and Q be n × n orthogonal matrices.

(a) Prove that det P = ±1.

(b) Prove that all real eigenvalues of P are ±1.

(c) Give an orthogonal matrix P whose eigenvalues are not ±1.

(d) Prove that PQ is an orthogonal matrix.

8. Given that B = {(1/√6, −2/√6, 1/√6)^T, (1/√3, 1/√3, 1/√3)^T, (1/√2, 0, −1/√2)^T} is an orthonormal basis for R3, use B to determine another orthonormal basis for R3 which includes the vector (1/√6, 1/√3, 1/√2)^T, and briefly explain why your basis is orthonormal.

9. Let {~v1, . . . , ~vm} be a set of m orthonormal vectors in Rn with m < n. Let Q = [~v1 · · · ~vm]. Prove that QTQ = Im.

10. Let {~v1, . . . , ~vn} be an orthonormal basis for an inner product space V with inner product ⟨ , ⟩ and let ~x = c1~v1 + · · · + cn~vn and ~y = d1~v1 + · · · + dn~vn. Show that

⟨~x, ~y⟩ = c1d1 + · · · + cndn  and  ‖~x‖^2 = c1^2 + · · · + cn^2

9.3 The Gram-Schmidt Procedure

We saw in Math 136 that every finite dimensional vector space has a basis. In this section, we derive an algorithm for changing a basis for a given inner product space into an orthogonal basis. Once we have an orthogonal basis, it is easy to normalize each vector to make it orthonormal.

Let W be an n-dimensional inner product space and let {~w1, . . . , ~wn} be a basis for W. We want to find an orthogonal basis {~v1, . . . , ~vn} for W.

To develop a standard procedure, we will look at a few simple cases and then generalize.

First, consider the case where {~w1} is a basis for W. In this case, we can take ~v1 = ~w1 so that {~v1} is an orthogonal basis for W.

Now, consider the case where {~w1, ~w2} is a basis for W. Starting as in the case above, we let ~v1 = ~w1. We then need to pick ~v2 so that it is orthogonal to ~v1 and {~v1, ~v2} spans W. To see how to find such a vector ~v2 we work backwards. Assume that we have an orthogonal set {~v1, ~v2} which spans W; then we must have ~w2 ∈ Span{~v1, ~v2}. If this is true, then, from our work with coordinates with respect to an orthogonal basis, we find that

~w2 = (⟨~w2, ~v1⟩/‖~v1‖^2)~v1 + (⟨~w2, ~v2⟩/‖~v2‖^2)~v2

where ⟨~w2, ~v2⟩/‖~v2‖^2 ≠ 0 since ~w2 is not a scalar multiple of ~w1 = ~v1. Rearranging gives

(⟨~w2, ~v2⟩/‖~v2‖^2)~v2 = ~w2 − (⟨~w2, ~v1⟩/‖~v1‖^2)~v1

Multiplying by a non-zero scalar does not change orthogonality or spanning, so we can take any scalar multiple of ~v2. For simplicity, we take

~v2 = ~w2 − (⟨~w2, ~v1⟩/‖~v1‖^2)~v1

Consequently, we have that {~v1, ~v2} is an orthogonal basis for W.

For the case where {~w1, ~w2, ~w3} is a basis for W, we can repeat the same procedure. We start by picking ~v1 and ~v2 as above. We then need to find ~v3 such that {~v1, ~v2, ~v3} is orthogonal and spans W. Using the same argument as in the previous case, we get

~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖^2)~v1 − (⟨~w3, ~v2⟩/‖~v2‖^2)~v2

and {~v1, ~v2, ~v3} is an orthogonal basis for W. Repeating this algorithm for the case W = Span{~w1, . . . , ~wk} gives us the following result.

THEOREM 1 (Gram-Schmidt Orthogonalization)

Let W be a subspace of an inner product space with basis {~w1, . . . , ~wk}. Define ~v1, . . . , ~vk successively as follows:

~v1 = ~w1
~v2 = ~w2 − (⟨~w2, ~v1⟩/‖~v1‖^2)~v1
~vi = ~wi − (⟨~wi, ~v1⟩/‖~v1‖^2)~v1 − (⟨~wi, ~v2⟩/‖~v2‖^2)~v2 − · · · − (⟨~wi, ~vi−1⟩/‖~vi−1‖^2)~vi−1

for 3 ≤ i ≤ k. Then Span{~v1, . . . , ~vi} = Span{~w1, . . . , ~wi} for 1 ≤ i ≤ k. In particular, {~v1, . . . , ~vk} is an orthogonal basis of W.
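The procedure translates directly into code. The following is a minimal sketch (an illustration assuming NumPy, not part of the original notes) for vectors in Rn with an arbitrary inner product supplied as a function; it also drops any vector that reduces to zero, a situation we will meet in Example 4 below.

    import numpy as np

    def gram_schmidt(ws, ip=np.dot):
        # ws: list of basis vectors; ip: inner product (defaults to the dot product)
        vs = []
        for w in ws:
            v = w.astype(float).copy()
            for u in vs:
                v -= (ip(w, u) / ip(u, u)) * u    # subtract the projection onto u
            if ip(v, v) > 1e-12:                  # skip dependent (zero) vectors
                vs.append(v)
        return vs

    # Example 1 below: recover the orthogonal directions (up to scaling)
    B = [np.array([1, 1, 1]), np.array([1, -1, 1]), np.array([1, 1, 0])]
    for v in gram_schmidt(B):
        print(v)   # [1 1 1], [0.666... -1.333... 0.666...], [0.5 0 -0.5]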

EXAMPLE 1 Use the Gram-Schmidt procedure to transform B = {(1, 1, 1)^T, (1, −1, 1)^T, (1, 1, 0)^T} into an orthonormal basis for R3.

Solution: Denote the vectors in B by ~w1, ~w2, and ~w3 respectively. We first take ~v1 = ~w1 = (1, 1, 1)^T. Then

~w2 − (⟨~w2, ~v1⟩/‖~v1‖^2)~v1 = (1, −1, 1)^T − (1/3)(1, 1, 1)^T = (2/3, −4/3, 2/3)^T

We can take any scalar multiple of this, so we take ~v2 = (1, −2, 1)^T to simplify the next calculation. Next,

~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖^2)~v1 − (⟨~w3, ~v2⟩/‖~v2‖^2)~v2
= (1, 1, 0)^T − (2/3)(1, 1, 1)^T + (1/6)(1, −2, 1)^T
= (1/2, 0, −1/2)^T

Then {~v1, ~v2, ~v3} is an orthogonal basis for R3. To get an orthonormal basis, we normalize each vector. We get

v̂1 = (1/√3)(1, 1, 1)^T,  v̂2 = (1/√6)(1, −2, 1)^T,  v̂3 = (1/√2)(1, 0, −1)^T

So, {v̂1, v̂2, v̂3} is an orthonormal basis for R3.

EXAMPLE 2 Use the Gram-Schmidt procedure to transform B = {(1, 1, 1)^T, (1, −1, 1)^T, (1, 1, 0)^T} into an orthogonal basis for R3 with inner product

⟨~x, ~y⟩ = 2x1y1 + x2y2 + 3x3y3

Solution: Take ~v1 = (1, 1, 1)^T. Then

~w2 − (⟨~w2, ~v1⟩/‖~v1‖^2)~v1 = (1, −1, 1)^T − (4/6)(1, 1, 1)^T = (1/3, −5/3, 1/3)^T

To simplify calculations we take ~v2 = (1, −5, 1)^T. Next,

~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖^2)~v1 − (⟨~w3, ~v2⟩/‖~v2‖^2)~v2
= (1, 1, 0)^T − (3/6)(1, 1, 1)^T + (3/30)(1, −5, 1)^T
= (3/5, 0, −2/5)^T

Thus, an orthogonal basis is {(1, 1, 1)^T, (1, −5, 1)^T, (3, 0, −2)^T}.


EXAMPLE 3 Find an orthogonal basis of P3(R) with inner product ⟨p, q⟩ = ∫_{−1}^{1} p(x)q(x) dx by applying the Gram-Schmidt procedure to the basis {1, x, x^2, x^3}.

Solution: Take ~v1 = 1. Then

~v2 = x − (⟨x, 1⟩/‖1‖^2) 1 = x − (0/2) 1 = x

~v3 = x^2 − (⟨x^2, 1⟩/‖1‖^2) 1 − (⟨x^2, x⟩/‖x‖^2) x = x^2 − ((2/3)/2) 1 − (0/(2/3)) x = x^2 − 1/3

We instead take ~v3 = 3x^2 − 1 to make calculations easier. Finally, we get

~v4 = x^3 − (⟨x^3, 1⟩/‖1‖^2) 1 − (⟨x^3, x⟩/‖x‖^2) x − (⟨x^3, 3x^2 − 1⟩/‖3x^2 − 1‖^2)(3x^2 − 1) = x^3 − (3/5)x

Hence, an orthogonal basis for this inner product space is {1, x, x^2 − 1/3, x^3 − (3/5)x}.

EXAMPLE 4 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace W of M2×2(R) spanned by

{~w1, ~w2, ~w3, ~w4} = {[1 1; 0 1], [1 0; 1 1], [2 1; 1 2], [1 0; 0 1]}

under the standard inner product.

Solution: Take ~v1 = ~w1. Then

~w2 − (⟨~w2, ~v1⟩/‖~v1‖^2)~v1 = [1 0; 1 1] − (2/3)[1 1; 0 1] = [1/3 −2/3; 1 1/3]

So, we take ~v2 = [1 −2; 3 1]. Next we get

~v3 = ~w3 − (⟨~w3, ~v1⟩/‖~v1‖^2)~v1 − (⟨~w3, ~v2⟩/‖~v2‖^2)~v2 = [2 1; 1 2] − (5/3)[1 1; 0 1] − (5/15)[1 −2; 3 1] = [0 0; 0 0]

At first glance, we may think something has gone wrong since the zero vector cannot be a member of a basis. However, if we look closely, we see that this implies that ~w3 is a linear combination of ~w1 and ~w2. Hence, we have

Span{~w1, ~w2, ~w4} = Span{~w1, ~w2, ~w3, ~w4}

Therefore, we can ignore ~w3 and continue the procedure using ~w4.

~v3 = ~w4 − (⟨~w4, ~v1⟩/‖~v1‖^2)~v1 − (⟨~w4, ~v2⟩/‖~v2‖^2)~v2 = [1 0; 0 1] − (2/3)[1 1; 0 1] − (2/15)[1 −2; 3 1] = [1/5 −2/5; −2/5 1/5]

Hence an orthogonal basis for W is {~v1, ~v2, ~v3}.


Section 9.3 Problems

1. Use the Gram-Schmidt procedure to transform B = {(1, 0, −1)^T, (2, 2, 3)^T, (1, 3, 1)^T} into an orthogonal basis for R3.

2. Consider P2(R) with the inner product

⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1)

Use the Gram-Schmidt procedure to transform S = {1, x, x^2} into an orthonormal basis for P2(R).

3. Let B = {[−1 1; 2 2], [1 0; −1 −1], [1 1; 0 0], [1 0; 0 1]}. Find an orthogonal basis for the subspace S = Span B of M2×2(R).

4. Suppose P2(R) has an inner product ⟨ , ⟩ which satisfies the following:

⟨1, 1⟩ = 2   ⟨1, x⟩ = 2   ⟨1, x^2⟩ = −2
⟨x, x⟩ = 4   ⟨x, x^2⟩ = −2   ⟨x^2, x^2⟩ = 3

Given that B = {x, 2x^2 + x, 2} is a basis of P2(R), apply the Gram-Schmidt procedure to this basis to find an orthogonal basis of P2(R).

9.4 General Projections

In Math 136 we saw how to calculate the projection of a vector onto a plane in R3. Such projections have many important uses. So, our goal now is to extend projections to general inner product spaces.

We will do this by mimicking what we did with projections in Rn. We first recall from Math 136 that given a vector ~x ∈ R3 and a plane P in R3 which passes through the origin, we wanted to write ~x as the sum of a vector in P and a vector orthogonal to every vector in P. Therefore, given a subspace W of an inner product space V and any vector ~v ∈ V, we want to find a vector proj_W ~v in W and a vector perp_W ~v which is orthogonal to every vector in W such that

~v = proj_W ~v + perp_W ~v

In the case of the plane in R3, we knew that the orthogonal vector was a scalar multiple of the normal vector. In the general case, we need to start by looking at the set of vectors which are orthogonal to every vector in W.

DEFINITION Orthogonal Complement

Let W be a subspace of an inner product space V. The orthogonal complement W⊥ of W in V is defined by

W⊥ = {~v ∈ V | ⟨~w, ~v⟩ = 0 for all ~w ∈ W}


EXAMPLE 1 Let W = Span{(1, 0, 0, 1)^T, (1, 1, 0, 0)^T}. Find W⊥ in R4.

Solution: We want to find all ~v = (v1, v2, v3, v4)^T such that

0 = ⟨(v1, v2, v3, v4)^T, (1, 0, 0, 1)^T⟩ = v1 + v4
0 = ⟨(v1, v2, v3, v4)^T, (1, 1, 0, 0)^T⟩ = v1 + v2

Solving this homogeneous system of two equations we find that the general solution is ~v = s(0, 0, 1, 0)^T + t(−1, 1, 0, 1)^T, s, t ∈ R. Thus,

W⊥ = Span{(0, 0, 1, 0)^T, (−1, 1, 0, 1)^T}

EXAMPLE 2 Let W = Span{x} in P2(R) with ⟨p(x), q(x)⟩ = ∫_0^1 p(x)q(x) dx. Find W⊥.

Solution: We want to find all p(x) = a + bx + cx^2 such that ⟨p, x⟩ = 0. This gives

0 = ∫_0^1 (ax + bx^2 + cx^3) dx = (1/2)a + (1/3)b + (1/4)c

So c = −2a − (4/3)b. Thus every vector in the orthogonal complement has the form

a + bx − (2a + (4/3)b)x^2 = a(1 − 2x^2) + b(x − (4/3)x^2)

Thus we get

W⊥ = Span{1 − 2x^2, 3x − 4x^2}

The following theorem shows that for a vector ~x to be in W⊥ it need only be orthogonal to each of the basis vectors of W.


THEOREM 1 Let {~v1, . . . , ~vk} be a basis for a subspace W of an inner product space V. If ⟨~x, ~vi⟩ = 0 for 1 ≤ i ≤ k, then ~x ∈ W⊥.

The following theorem gives some important properties of the orthogonal complement which we will require later.

THEOREM 2 Let W be a finite dimensional subspace of an inner product space V. Then

(1) W⊥ is a subspace of V.
(2) If dim V = n, then dim W⊥ = n − dim W.
(3) If dim V = n, then (W⊥)⊥ = W.
(4) W ∩ W⊥ = {~0}.
(5) If dim V = n, {~v1, . . . , ~vk} is an orthogonal basis for W, and {~vk+1, . . . , ~vn} is an orthogonal basis for W⊥, then {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal basis for V.

Proof: For (1), we apply the Subspace Test. By definition W⊥ is a subset of V. Also, ~0 ∈ W⊥ since ⟨~0, ~w⟩ = 0 for all ~w ∈ W by Theorem 3.2.1. Let ~u, ~v ∈ W⊥ and t ∈ R. Then for any ~w ∈ W we have

⟨~u + ~v, ~w⟩ = ⟨~u, ~w⟩ + ⟨~v, ~w⟩ = 0 + 0 = 0

and

⟨t~u, ~w⟩ = t⟨~u, ~w⟩ = t(0) = 0

Thus, ~u + ~v ∈ W⊥ and t~u ∈ W⊥, so W⊥ is a subspace of V.

For (2), let {~v1, . . . , ~vk} be an orthonormal basis for W. We can extend this to a basis for V and then apply the Gram-Schmidt procedure to get an orthonormal basis {~v1, . . . , ~vn} for V. Then, for any ~x ∈ W⊥ we have ~x ∈ V, so we can write

~x = ⟨~x, ~v1⟩~v1 + · · · + ⟨~x, ~vk⟩~vk + ⟨~x, ~vk+1⟩~vk+1 + · · · + ⟨~x, ~vn⟩~vn
= ~0 + · · · + ~0 + ⟨~x, ~vk+1⟩~vk+1 + · · · + ⟨~x, ~vn⟩~vn

Hence, W⊥ = Span{~vk+1, . . . , ~vn}. Moreover, {~vk+1, . . . , ~vn} is linearly independent since it is orthonormal. Thus, dim W⊥ = n − k = n − dim W.

For (3), if ~w ∈ W, then ⟨~w, ~v⟩ = 0 for all ~v ∈ W⊥ by definition of W⊥. Hence ~w ∈ (W⊥)⊥, so W ⊆ (W⊥)⊥. Also, dim(W⊥)⊥ = n − dim W⊥ = n − (n − dim W) = dim W. Hence W = (W⊥)⊥.

For (4), if ~x ∈ W ∩ W⊥, then ⟨~x, ~x⟩ = 0 and hence ~x = ~0. Therefore, W ∩ W⊥ = {~0}.

For (5), we have that {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal set of non-zero vectors in V since every vector in W⊥ is orthogonal to every vector in W. Moreover, it contains n vectors by (2). Hence, it is a linearly independent set of n vectors. Thus, it is an orthogonal basis for V. □


Property (3) may seem obvious at first glance, which may make us wonder why the condition that dim V = n was necessary. This is because this property may not be true in an infinite dimensional inner product space! That is, we can have

(W⊥)⊥ ≠ W

We demonstrate this with an example.

EXAMPLE 3 Let P be the inner product space of all real polynomials with inner product

⟨a0 + a1x + · · · + akx^k, b0 + b1x + · · · + blx^l⟩ = a0b0 + a1b1 + · · · + apbp

where p = min(k, l). Now, let U be the subspace

U = {g(x) ∈ P | g(1) = 0}

We claim that U⊥ = {0}. Let f(x) ∈ U⊥, say f(x) = a0 + a1x + · · · + anx^n. Note that ⟨f, x^{n+1}⟩ = 0. Define g(x) = f(x) − f(1)x^{n+1}. We get that g(1) = 0 and hence g ∈ U. Therefore, we have

0 = ⟨f, g⟩ = ⟨f, f − f(1)x^{n+1}⟩ = ⟨f, f⟩ − f(1)⟨f, x^{n+1}⟩ = ⟨f, f⟩

and hence f(x) = 0.

Of course (U⊥)⊥ = P, since ⟨~0, ~v⟩ = 0 for every ~v ∈ V. Therefore,

(U⊥)⊥ ≠ U

Notice that, as in the proof of property (3) above, we do have that U ⊂ (U⊥)⊥.

We now return to our purpose of looking at orthogonal complements: to define the projection of a vector ~v onto a subspace W of an inner product space V. We stated that we want to find proj_W ~v and perp_W ~v such that ~v = proj_W ~v + perp_W ~v with proj_W ~v ∈ W and perp_W ~v ∈ W⊥.

Suppose that dim V = n, {~v1, . . . , ~vk} is an orthogonal basis for W, and {~vk+1, . . . , ~vn} is an orthogonal basis for W⊥. By Theorem 2 we have that {~v1, . . . , ~vk, ~vk+1, . . . , ~vn} is an orthogonal basis for V. So, for any ~v ∈ V we find the coordinates with respect to this orthogonal basis are:

~v = (⟨~v, ~v1⟩/‖~v1‖^2)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖^2)~vk + (⟨~v, ~vk+1⟩/‖~vk+1‖^2)~vk+1 + · · · + (⟨~v, ~vn⟩/‖~vn‖^2)~vn

This gives us what we want! Taking

proj_W ~v = (⟨~v, ~v1⟩/‖~v1‖^2)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖^2)~vk ∈ W

and

perp_W ~v = (⟨~v, ~vk+1⟩/‖~vk+1‖^2)~vk+1 + · · · + (⟨~v, ~vn⟩/‖~vn‖^2)~vn ∈ W⊥

we get ~v = proj_W ~v + perp_W ~v with proj_W ~v ∈ W and perp_W ~v ∈ W⊥.


DEFINITION Projection, Perpendicular

Suppose W is a k-dimensional subspace of an inner product space V and {~v1, . . . , ~vk} is an orthogonal basis for W. For any ~v ∈ V we define the projection of ~v onto W by

proj_W ~v = (⟨~v, ~v1⟩/‖~v1‖^2)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖^2)~vk

and the perpendicular of the projection by

perp_W ~v = ~v − proj_W ~v

We have defined the perpendicular of the projection in such a way that we do not need an orthogonal basis (or any basis) for W⊥. To ensure this is valid, we need to verify that perp_W ~v ∈ W⊥.

THEOREM 3 Suppose W is a k-dimensional subspace of an inner product space V. Then for any ~v ∈ V, we have

perp_W ~v = ~v − proj_W ~v ∈ W⊥

Proof: Let {~v1, . . . , ~vk} be an orthogonal basis for W. Then we can write any ~w ∈ W as ~w = c1~v1 + · · · + ck~vk. This gives

⟨perp_W ~v, ~w⟩ = ⟨~v − proj_W ~v, ~w⟩ = ⟨~v, ~w⟩ − ⟨proj_W ~v, ~w⟩

Observe that

⟨~v, ~w⟩ = ⟨~v, c1~v1 + · · · + ck~vk⟩ = c1⟨~v, ~v1⟩ + · · · + ck⟨~v, ~vk⟩

and, using the fact that {~v1, . . . , ~vk} is an orthogonal set, we get

⟨proj_W ~v, ~w⟩ = ⟨(⟨~v, ~v1⟩/‖~v1‖^2)~v1 + · · · + (⟨~v, ~vk⟩/‖~vk‖^2)~vk, c1~v1 + · · · + ck~vk⟩
= c1 (⟨~v, ~v1⟩/‖~v1‖^2)⟨~v1, ~v1⟩ + · · · + ck (⟨~v, ~vk⟩/‖~vk‖^2)⟨~vk, ~vk⟩
= c1⟨~v, ~v1⟩ + · · · + ck⟨~v, ~vk⟩

Thus,

⟨~v, ~w⟩ − ⟨proj_W ~v, ~w⟩ = 0

and hence perp_W ~v is orthogonal to every ~w ∈ W and so perp_W ~v ∈ W⊥. □

Observe that this implies that proj_W ~v and perp_W ~v are orthogonal for any ~v ∈ V, as we saw in Math 136. Additionally, these functions satisfy other properties which we saw for projections in Math 136.


THEOREM 4 Suppose W is a k-dimensional subspace of an inner product space V. Then proj_W is a linear operator on V with kernel W⊥.

THEOREM 5 Suppose that W is a subspace of a finite dimensional inner product space V. Then, for any ~v ∈ V we have

proj_{W⊥} ~v = perp_W ~v

EXAMPLE 4 Let W = Span{[1 0; 0 1], [1 0; 0 −1]} be a subspace of M2×2(R). Find proj_W [1 −1; 2 3].

Solution: Denote the vectors in the basis for W by ~v1 = [1 0; 0 1] and ~v2 = [1 0; 0 −1], and observe that {~v1, ~v2} is orthogonal. Let ~x = [1 −1; 2 3]; then

proj_W [1 −1; 2 3] = (⟨~x, ~v1⟩/‖~v1‖^2)~v1 + (⟨~x, ~v2⟩/‖~v2‖^2)~v2 = (4/2)[1 0; 0 1] + (−2/2)[1 0; 0 −1] = [1 0; 0 3]

EXAMPLE 5 Let B = {(1, 0, −1, 1)^T, (1, 1, 0, 1)^T, (−1, 1, 0, −1)^T} and let ~x = (4, 3, −2, 5)^T. Find the projection of ~x onto the subspace S = Span B of R4.

Solution: To find the projection, we must have an orthogonal basis for S. Thus, our first step must be to perform the Gram-Schmidt procedure on B. Denote the given basis by ~z1 = (1, 0, −1, 1)^T, ~z2 = (1, 1, 0, 1)^T, ~z3 = (−1, 1, 0, −1)^T. Let ~w1 = ~z1. Then, we get

~w2 = ~z2 − (⟨~z2, ~w1⟩/‖~w1‖^2)~w1 = (1, 1, 0, 1)^T − (2/3)(1, 0, −1, 1)^T = (1/3)(1, 3, 2, 1)^T

To simplify calculations we use ~w2 = (1, 3, 2, 1)^T instead. Then, we get

~w3 = ~z3 − (⟨~z3, ~w1⟩/‖~w1‖^2)~w1 − (⟨~z3, ~w2⟩/‖~w2‖^2)~w2 = (−1, 1, 0, −1)^T + (2/3)(1, 0, −1, 1)^T − (1/15)(1, 3, 2, 1)^T = (2/5)(−1, 2, −2, −1)^T

We take ~w3 = (−1, 2, −2, −1)^T. Thus, the set {~w1, ~w2, ~w3} is an orthogonal basis for S. We can now determine the projection.

proj_S ~x = (⟨~x, ~w1⟩/‖~w1‖^2)~w1 + (⟨~x, ~w2⟩/‖~w2‖^2)~w2 + (⟨~x, ~w3⟩/‖~w3‖^2)~w3 = (11/3)~w1 + (14/15)~w2 + (1/10)~w3 = (9/2, 3, −2, 9/2)^T
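The two-step computation in Example 5 (orthogonalize, then project) is exactly what one would code. A minimal self-contained sketch, assuming NumPy and the standard inner product (not part of the original notes):

    import numpy as np

    def proj(x, orthogonal_basis):
        # proj_S(x) = sum of <x, v>/||v||^2 * v over an orthogonal basis of S
        return sum((np.dot(x, v) / np.dot(v, v)) * v for v in orthogonal_basis)

    # The orthogonal basis for S found in Example 5
    W = [np.array([1.0, 0.0, -1.0, 1.0]),
         np.array([1.0, 3.0, 2.0, 1.0]),
         np.array([-1.0, 2.0, -2.0, -1.0])]
    x = np.array([4.0, 3.0, -2.0, 5.0])

    p = proj(x, W)
    print(p)        # [4.5, 3.0, -2.0, 4.5]
    print(x - p)    # perp_S(x), orthogonal to every vector in W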

REMARK

Observe that the iterative step in the Gram-Schmidt procedure is just calculating a perpendicular of a projection. In particular, the Gram-Schmidt procedure can be restated as follows: If {~w1, . . . , ~wk} is a basis for an inner product space W, then let ~v1 = ~w1, and for 2 ≤ i ≤ k recursively define Wi−1 = Span{~v1, . . . , ~vi−1} and ~vi = perp_{Wi−1} ~wi. Then {~v1, . . . , ~vk} is an orthogonal basis for W.

If {~v1, . . . , ~vk} is an orthonormal basis for W, then the formula for calculating a projection simplifies to

proj_W ~v = ⟨~v, ~v1⟩~v1 + · · · + ⟨~v, ~vk⟩~vk

EXAMPLE 6 Let W = Span{(0, 1, 0)^T, (−4/5, 0, 3/5)^T} be a subspace of R3. Find proj_W (1, 1, 1)^T and perp_W (1, 1, 1)^T.

Solution: Observe that {(0, 1, 0)^T, (−4/5, 0, 3/5)^T} is an orthonormal basis for W. Hence,

proj_W ~x = ⟨(1, 1, 1)^T, (0, 1, 0)^T⟩ (0, 1, 0)^T + ⟨(1, 1, 1)^T, (−4/5, 0, 3/5)^T⟩ (−4/5, 0, 3/5)^T = (4/25, 1, −3/25)^T

perp_W ~x = (1, 1, 1)^T − (4/25, 1, −3/25)^T = (21/25, 0, 28/25)^T


Section 9.4 Problems

1. Consider S = Span{[1 0; 1 0], [2 −1; 1 3]} in M2×2(R). Find a basis for S⊥.

2. Find an orthogonal basis for the orthogonal complement of each subspace of P2(R) under the inner product ⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1).

(a) S = Span{x^2 + 1}

(b) S = Span{x^2}

3. Let ~w1 = (1, 2, 0, 1)^T, ~w2 = (2, 1, 1, 2)^T, ~w3 = (2, 1, 3, 2)^T, and let S = Span{~w1, ~w2, ~w3}.

(a) Find an orthonormal basis for S.

(b) Let ~y = (0, 0, 0, 12)^T. Determine proj_S ~y.

4. Consider the subspace S = Span{[1 0; −1 1], [1 1; 1 1], [2 0; 1 1]} of M2×2(R).

(a) Find an orthogonal basis for S.

(b) Determine the projection of ~x = [4 3; −2 5] onto S.

5. On P2(R) define the inner product ⟨p, q⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1) and let S = Span{1, x − x^2}.

(a) Use the Gram-Schmidt procedure to determine an orthogonal basis for S.

(b) Determine proj_S (1 + x + x^2).

6. Define the inner product ⟨~x, ~y⟩ = 2x1y1 + x2y2 + 3x3y3 on R3. Extend the set {(1, 0, −1)^T} to an orthogonal basis for R3.

7. Suppose W is a k-dimensional subspace of an inner product space V. Prove that proj_W is a linear operator on V with kernel W⊥.

8. In P3(R) using the inner product ⟨p(x), q(x)⟩ = p(−1)q(−1) + p(0)q(0) + p(1)q(1) + p(2)q(2), the following four vectors are orthogonal:

p1(x) = 9 − 13x − 15x^2 + 10x^3
p2(x) = 1 + x − x^2
p3(x) = 1 − 2x
p4(x) = 1

Let S = Span{p1(x), p2(x), p3(x)}. Determine the projection of −1 + x + x^2 − x^3 onto S. [HINT: If you do an excessive number of calculations, you have missed the point of the question.]


9.5 The Fundamental Theorem

Let W be a subspace of a finite dimensional inner product space V. In the last section we saw that any ~v ∈ V can be written as ~v = ~x + ~y where ~x ∈ W and ~y ∈ W⊥. We now invent some notation for this.

DEFINITION Direct Sum

Let U and W be subspaces of a vector space V such that U ∩ W = {~0}. The direct sum of U and W is

U ⊕ W = {~u + ~w ∈ V | ~u ∈ U, ~w ∈ W}

EXAMPLE 1 Let U = Span{(1, 0, 1)^T} and W = Span{(1, 1, 1)^T}. What is U ⊕ W?

Solution: Since U ∩ W = {~0} we have

U ⊕ W = {~u + ~w ∈ V | ~u ∈ U, ~w ∈ W} = {s(1, 0, 1)^T + t(1, 1, 1)^T | s, t ∈ R} = Span{(1, 0, 1)^T, (1, 1, 1)^T}

THEOREM 1 If U and W are subspaces of a vector space V such that U ∩ W = {~0}, then U ⊕ W is a subspace of V. Moreover, if {~v1, . . . , ~vk} is a basis for U and {~w1, . . . , ~wℓ} is a basis for W, then {~v1, . . . , ~vk, ~w1, . . . , ~wℓ} is a basis for U ⊕ W.

EXAMPLE 2 Let U = Span{x^2 + 1, x^4} and W = Span{x^3 − x, x^3 + x^2 + 1}. Find a basis for U ⊕ W.

Solution: Since U ∩ W = {~0}, a basis for U ⊕ W is {x^2 + 1, x^4, x^3 − x, x^3 + x^2 + 1}.

Theorem 9.4.2 tells us that for any subspace W of an n-dimensional inner product space V we have W ∩ W⊥ = {~0} and dim W⊥ = n − dim W. Combining this with Theorem 1 we get the following result.

THEOREM 2 Let V be a finite dimensional inner product space and let W be a subspace of V. Then

W ⊕ W⊥ = V


We now look at the related result for matrices. We start by looking at an example.

EXAMPLE 3 Find a basis for each of the four fundamental subspaces of

A = [1 2 0 −1; 3 6 1 −1; −2 −4 2 6]

Then determine the orthogonal complement of the rowspace of A and the orthogonal complement of the columnspace of A.

Solution: To find a basis for each of the four fundamental subspaces, we use the method derived in Section 7.1. Row reducing A to reduced row echelon form gives

R = [1 2 0 −1; 0 0 1 2; 0 0 0 0]

Hence, a basis for Row(A) is {(1, 2, 0, −1)^T, (0, 0, 1, 2)^T}, a basis for Null(A) is {(−2, 1, 0, 0)^T, (1, 0, −2, 1)^T}, and a basis for Col(A) is {(1, 3, −2)^T, (0, 1, 2)^T}.

Row reducing AT gives [1 0 −8; 0 1 2; 0 0 0; 0 0 0]. Thus, a basis for the left nullspace is {(8, −2, 1)^T}.

Observe that the basis vectors for Null(A) are orthogonal to the basis vectors for Row(A). Additionally, we know that dim(Row A)⊥ = 4 − 2 = 2 by Theorem 9.4.2 (2). Therefore, since dim(Null(A)) = 2 we have that (Row A)⊥ = Null(A).

Similarly, we see that (Col A)⊥ = Null(AT).

The result of this example seems pretty amazing. Moreover, applying parts (4) and (5) of Theorem 9.4.2 and our notation for direct sums, we get that

Rn = Row(A) ⊕ Null(A) and Rm = Col(A) ⊕ Null(AT)

This tells us that every vector in R4 is the sum of a vector in the rowspace of A and a vector in the nullspace of A.

The generalization of this result to any m × n matrix is extremely important.


THEOREM 3 (The Fundamental Theorem of Linear Algebra)

Let A be an m × n matrix. Then Col(A)⊥ = Null(AT) and Row(A)⊥ = Null(A). In particular,

Rn = Row(A) ⊕ Null(A) and Rm = Col(A) ⊕ Null(AT)

Proof: Let ~v1, . . . , ~vm denote the rows of A, so that A has ~v1T, . . . , ~vmT as its rows.

If ~x ∈ Row(A)⊥, then ~x is orthogonal to each row of A. Hence, ~vi · ~x = 0 for 1 ≤ i ≤ m. Thus we get

A~x = (~v1T~x, . . . , ~vmT~x)^T = (~v1 · ~x, . . . , ~vm · ~x)^T = ~0

hence ~x ∈ Null(A).

On the other hand, let ~x ∈ Null(A). Then A~x = ~0, so ~vi · ~x = 0 for all i. Now, pick any ~b ∈ Row(A). Then ~b = c1~v1 + · · · + cm~vm. Therefore,

~b · ~x = (c1~v1 + · · · + cm~vm) · ~x = c1(~v1 · ~x) + · · · + cm(~vm · ~x) = 0

Hence ~x ∈ Row(A)⊥.

Applying what we proved above to AT we get

(Col(A))⊥ = (Row(AT))⊥ = Null(AT)  □

Observe that the Fundamental Theorem of Linear Algebra tells us more than the relationship between the four fundamental subspaces. For example, if A is an m × n matrix, then we know that the rank and nullity of the linear mapping L(~x) = A~x are rank L = rank A = dim Row A and nullity L = dim(Null A). Since Rn = Row(A) ⊕ Null(A), we get by Theorem 9.4.2 (2) that

n = rank L + nullity(L)

That is, the Fundamental Theorem of Linear Algebra implies the Rank-Nullity Theorem.

It can also be shown that the Fundamental Theorem of Linear Algebra implies a lot of our results about solving systems of linear equations. We will also find it useful in a couple of proofs in future sections.
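The orthogonality of the fundamental subspaces is also easy to test numerically: every row of A must annihilate every nullspace basis vector, and likewise for AT. A sketch for the matrix of Example 3, assuming NumPy and SciPy's null_space helper (not part of the original notes):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1, 2, 0, -1],
                  [3, 6, 1, -1],
                  [-2, -4, 2, 6]])

    N = null_space(A)                 # columns span Null(A)
    print(np.allclose(A @ N, 0))      # True: Row(A) is orthogonal to Null(A)
    Nt = null_space(A.T)              # columns span Null(A^T)
    print(np.allclose(A.T @ Nt, 0))   # True: Col(A) is orthogonal to Null(A^T)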


Section 9.5 Problems

1. Let A = [1 1 3 1; 2 2 6 2; −1 0 −2 −3; 3 1 7 7]. Find a basis for Row(A)⊥.

2. Prove that if U and W are subspaces of a vector space V such that U ∩ W = {~0}, then U ⊕ W is a subspace of V. Moreover, if {~v1, . . . , ~vk} is a basis for U and {~w1, . . . , ~wℓ} is a basis for W, then {~v1, . . . , ~vk, ~w1, . . . , ~wℓ} is a basis for U ⊕ W.

9.6 The Method of Least Squares

In the sciences one often tries to find a correlation between two quantities by collecting data from repeated experimentation. Say, for example, a scientist is comparing quantities x and y which are known to satisfy a quadratic relation y = a0 + a1x + a2x^2. The scientist would like to find the values of a0, a1, and a2 which best fit their experimental data.

Observe that the data collected from each experiment will correspond to an equation which can be used to determine values of the three unknowns a0, a1, and a2. To get as accurate a result as possible, the scientist will perform the experiment many times. We thus end up with a system of linear equations which has more equations than unknowns. Such a system of equations is called an overdetermined system. Also notice that due to experimentation error, the system is very likely to be inconsistent. So, the scientist needs to find the values of a0, a1, and a2 which best approximate the data they collected.

To solve this problem, we rephrase it in terms of linear algebra. Let A be an m × n matrix with m > n and let the system A~x = ~b be inconsistent. We want to find a vector ~x that minimizes the distance between A~x and ~b. Hence, we need to minimize ‖A~x − ~b‖. The following theorem tells us how to do this.

THEOREM 1 (Approximation Theorem)

Let W be a finite dimensional subspace of an inner product space V. If ~v ∈ V, then the vector closest to ~v in W is proj_W ~v. That is,

‖~v − proj_W ~v‖ < ‖~v − ~w‖

for all ~w ∈ W, ~w ≠ proj_W ~v.

Proof: Consider ~v − ~w = (~v − proj_W ~v) + (proj_W ~v − ~w). Then observe that

⟨~v − proj_W ~v, proj_W ~v − ~w⟩ = ⟨perp_W ~v, proj_W ~v⟩ − ⟨perp_W ~v, ~w⟩ = 0 − 0 = 0

Hence, {~v − proj_W ~v, proj_W ~v − ~w} is an orthogonal set. Therefore, since the Pythagorean Theorem holds for an orthogonal set, we get

‖~v − ~w‖^2 = ‖(~v − proj_W ~v) + (proj_W ~v − ~w)‖^2
= ‖~v − proj_W ~v‖^2 + ‖proj_W ~v − ~w‖^2
> ‖~v − proj_W ~v‖^2

since ‖proj_W ~v − ~w‖^2 > 0 if ~w ≠ proj_W ~v. The result now follows. □

Notice that A~x is in the columnspace of A. Thus, the Approximation Theorem tells us that we can minimize ‖A~x − ~b‖ by finding the projection of ~b onto the columnspace of A. Therefore, if we solve the consistent system

A~x = proj_{Col A} ~b

we will find the desired vector ~x. This might seem quite simple, but we can make it even easier. Subtracting both sides of this equation from ~b we get

~b − A~x = ~b − proj_{Col A} ~b = perp_{Col A} ~b

Thus, ~b − A~x is in the orthogonal complement of the columnspace of A which, by the Fundamental Theorem of Linear Algebra, means ~b − A~x is in the nullspace of AT. Hence

AT(~b − A~x) = ~0

or equivalently

ATA~x = AT~b

This is called the normal system and the individual equations are called the normal equations. This system will be consistent by construction. However, it need not have a unique solution. If it does have infinitely many solutions, then each of the solutions will minimize ‖A~x − ~b‖.
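In code, solving the normal system is a single linear solve. A minimal sketch assuming NumPy (np.linalg.lstsq performs the same job with better numerical stability; the code is an illustration, not part of the original notes):

    import numpy as np

    def least_squares(A, b):
        # Solve the normal system A^T A x = A^T b
        return np.linalg.solve(A.T @ A, A.T @ b)

    # The system of Example 1 below
    A = np.array([[3.0, -1.0], [1.0, 2.0], [2.0, 1.0]])
    b = np.array([4.0, 0.0, 1.0])
    print(least_squares(A, b))    # [1.0481..., -0.6746...] = [87/83, -56/83]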

EXAMPLE 1 Determine the vector ~x that minimizes ‖A~x − ~b‖ for the system

3x1 − x2 = 4
x1 + 2x2 = 0
2x1 + x2 = 1

Solution: We have A = [3 −1; 1 2; 2 1], so ATA = [14 1; 1 6] and AT~b = [14; −3]. Since ATA is invertible, we find that

~x = (ATA)−1 AT~b = (1/83)[6 −1; −1 14][14; −3] = (1/83)[87; −56]

Thus, x1 = 87/83 and x2 = −56/83.

Note that these give 3x1 − x2 ≈ 3.82, x1 + 2x2 ≈ −0.3, and 2x1 + x2 ≈ 1.42, so we have found a fairly good approximation.


EXAMPLE 2 Determine the vector ~x that minimizes ‖A~x − ~b‖ for the system

2x1 + x2 = −5
−2x1 + x2 = 8
2x1 + 3x2 = 1

Solution: We have A = [2 1; −2 1; 2 3], so

ATA = [12 6; 6 11],  AT~b = [−24; 6]

Since ATA is invertible, we find that

~x = (ATA)−1 AT~b = (1/96)[11 −6; −6 12][−24; 6] = (1/96)[−300; 216]

Thus, x1 = −25/8 and x2 = 9/4.

This method of finding an approximate solution is called the method of least squares. It is called this because if we write A~x − ~b = (v1, . . . , vm)^T, then we are minimizing

‖A~x − ~b‖ = √(v1^2 + · · · + vm^2)

which is equivalent to minimizing v1^2 + · · · + vm^2, a sum of squares.

We now return to our problem of finding the curve of best fit for a set of data points.

Say in an experiment we get a set of data points (x1, y1), . . . , (xm, ym) and we want to find the values of a0, . . . , an such that p(x) = a0 + a1x + · · · + anx^n is the polynomial of best fit. That is, we want the values of a0, . . . , an such that the values of yi are approximated as well as possible by p(xi).

Let ~y = (y1, . . . , ym)^T and define p(~x) = (p(x1), . . . , p(xm))^T. To make this look like the method of least squares we write

p(~x) = (a0 + a1x1 + · · · + an x1^n, . . . , a0 + a1xm + · · · + an xm^n)^T = X~a

where

X = [1 x1 · · · x1^n; . . . ; 1 xm · · · xm^n]  and  ~a = (a0, . . . , an)^T

Thus, we are trying to minimize ‖X~a − ~y‖ and we can use the method of least squares above.

This can be summarized in the following theorem.


THEOREM 2 Let m data points (x1, y1), . . . , (xm, ym) be given and write

~y = (y1, . . . , ym)^T,  X = [1 x1 x1^2 · · · x1^n; 1 x2 x2^2 · · · x2^n; . . . ; 1 xm xm^2 · · · xm^n]

Then, if ~a = (a0, . . . , an)^T is any solution to the normal system

XTX~a = XT~y

then the polynomial

p(x) = a0 + a1x + · · · + anx^n

is the best fitting polynomial of degree n for the given data. Moreover, if at least n + 1 of the numbers x1, . . . , xm are distinct, then the matrix XTX is invertible and thus ~a is unique with

~a = (XTX)−1 XT~y

Proof: We proved the first part above, so we just prove the second part.

Suppose that n + 1 of the xi are distinct and consider a linear combination of the columns of X:

c0 (1, 1, . . . , 1)^T + c1 (x1, x2, . . . , xm)^T + · · · + cn (x1^n, x2^n, . . . , xm^n)^T = (0, 0, . . . , 0)^T

Now let q(x) = c0 + c1x + · · · + cnx^n. Comparing entries we see that x1, x2, . . . , xm are all roots of q(x). Then q(x) is a polynomial of degree at most n with at least n + 1 distinct roots. Since a non-zero polynomial of degree at most n has at most n roots, q(x) must be the zero polynomial, and so ci = 0 for 0 ≤ i ≤ n. Thus, the columns of X are linearly independent.

To show this implies that XTX is invertible, we consider XTX~v = ~0. We have

‖X~v‖^2 = (X~v)T X~v = ~vT XTX~v = ~vT~0 = 0

Hence, X~v = ~0, so ~v = ~0 since the columns of X are linearly independent. Thus XTX is invertible as required. □


EXAMPLE 3 Find a0 and a1 to obtain the best fitting equation of the form y = a0 + a1x for the given data.

x: 1, 3, 4, 6, 7
y: 1, 2, 3, 4, 5

Solution: We let

X = [1 1; 1 3; 1 4; 1 6; 1 7]  and  ~y = (1, 2, 3, 4, 5)^T

We then get

XTX = [5 21; 21 111],  XT~y = [15; 78]

By Theorem 2 we get that ~a = (a0, a1)^T is unique and satisfies

~a = (XTX)−1 XT~y = (1/114)[111 −21; −21 5][15; 78] = (1/38)[9; 25]

So, the line of best fit is y = 9/38 + (25/38)x.
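The same numbers fall out of a library least-squares routine. A sketch assuming NumPy (an illustration, not part of the original notes):

    import numpy as np

    x = np.array([1.0, 3.0, 4.0, 6.0, 7.0])
    y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

    X = np.column_stack([np.ones_like(x), x])   # design matrix with columns [1, x]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(a)    # [0.2368..., 0.6578...] = [9/38, 25/38]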


EXAMPLE 4 Find a0, a1, a2 to obtain the best fitting equation of the form y = a0 + a1x + a2x^2 for the given data.

x: −3, −1, 0, 1, 3
y: 3, 1, 1, 2, 4

Solution: We have

~y = (3, 1, 1, 2, 4)^T  and  X = [1 −3 9; 1 −1 1; 1 0 0; 1 1 1; 1 3 9]

Hence,

XTX = [5 0 20; 0 20 0; 20 0 164],  XT~y = [11; 4; 66]

By Theorem 2 we get that ~a = (a0, a1, a2)^T is unique and satisfies

~a = (XTX)−1 XT~y = (1/210)(242, 42, 55)^T

So the best fitting parabola is

p(x) = 121/105 + (1/5)x + (11/42)x^2


Section 9.6 Problems

1. Find the vector ~x that minimizes ‖A~x − ~b‖ for each of the following systems.

(a) x1 + 2x2 = 1
    x1 + x2 = 2
    x1 − x2 = 5

(b) x1 + 5x2 = 1
    −2x1 − 7x2 = 1
    x1 + 2x2 = 0

(c) 2x1 + x2 = 4
    2x1 − x2 = −1
    3x1 + 2x2 = 8

(d) x1 + 2x2 = 2
    x1 + 2x2 = 3
    x1 + 3x2 = 2
    x1 + 3x2 = 3

(e) x1 − x2 = 2
    2x1 − x2 + x3 = −2
    2x1 + 2x2 + x3 = 3
    x1 − x2 = −2

(f) 2x1 − 2x2 = 1
    −x1 + x2 = 2
    3x1 + x2 = 1
    2x1 − x2 = 2

2. Find a0 and a1 to obtain the best fitting equation of the form y = a0 + a1x for the given data.

(a) x: −1, 0, 1
    y: −3, 2, 2

(b) x: −1, 0, 1, 2
    y: −3, −1, 0, 1

3. Find a0, a1, a2 to obtain the best fitting equation of the form y = a0 + a1x + a2x^2 for the given data.

(a) x: −2, −1, 1, 2
    y: 0, −2, 0, 1

(b) x: −2, −1, 0, 1, 2
    y: 0, 1, −1, 3, 1


Chapter 10

Applications of Orthogonal Matrices

10.1 Orthogonal Similarity

In Math 136, we saw that two matrices A and B are similar if there exists an invertible matrix P such that P−1AP = B. Moreover, we saw that this corresponds to applying a change of coordinates to the standard matrix of a linear operator. In particular, if L : Rn → Rn is a linear operator and B = {~v1, . . . , ~vn} is a basis for Rn, then taking P = [~v1 · · · ~vn] we get

[L]B = P−1[L]P

We then looked for a basis B of Rn such that [L]B was diagonal. We found that such a basis is made up of the eigenvectors of [L].

In the last chapter, we saw that orthonormal bases have some very nice properties. For example, if the columns of P form an orthonormal basis for Rn, then P is orthogonal and it is very easy to find P−1. In particular, we have P−1 = PT. Hence, we now turn our attention to trying to find an orthonormal basis B of eigenvectors such that [L]B is diagonal. Since the corresponding matrix P will be orthogonal, we will get

[L]B = PT[L]P

DEFINITION Orthogonally Similar

Two matrices A and B are said to be orthogonally similar if there exists an orthogonal matrix P such that

PTAP = B

Since PT = P−1, we have that if A and B are orthogonally similar, then they are similar. Therefore, all the properties of similar matrices still hold. In particular, if A and B are orthogonally similar, then rank A = rank B, tr A = tr B, det A = det B, and A and B have the same eigenvalues.

We saw in Math 136 that not every square matrix A has a set of eigenvectors which forms a basis for Rn. Since we are now going to require the additional condition that the basis is orthonormal, we expect that even fewer matrices will have this property. So, we first just look for matrices A such that there exists an orthogonal matrix P for which PTAP is upper triangular.



THEOREM 1 (Triangularization Theorem)

If A is an n × n matrix with real eigenvalues, then A is orthogonally similar to an upper triangular matrix T.

Proof: We prove the result by induction on n. If n = 1, then A is upper triangular. Therefore, we can take P to be the orthogonal matrix P = [1] and the result follows. Now assume the result holds for all (n − 1) × (n − 1) matrices and consider an n × n matrix A with all real eigenvalues.

Let ~v1 be a unit eigenvector of one of A's eigenvalues λ1. Extend the set {~v1} to a basis for Rn and apply the Gram-Schmidt procedure to produce an orthonormal basis {~v1, ~w2, . . . , ~wn} for Rn. Then the matrix P1 = [~v1 ~w2 · · · ~wn] is orthogonal and

P1^T A P1 = [ ~v1 · A~v1   ~v1 · A~w2   · · ·   ~v1 · A~wn ]
            [ ~w2 · A~v1   ~w2 · A~w2   · · ·   ~w2 · A~wn ]
            [     ...          ...      . . .       ...    ]
            [ ~wn · A~v1   ~wn · A~w2   · · ·   ~wn · A~wn ]

Consider the entries in the first column. Since A~v1 = λ1~v1 we get

~wi · A~v1 = ~wi · λ1~v1 = λ1(~wi · ~v1) = 0

for 2 ≤ i ≤ n and ~v1 · A~v1 = λ1. Hence

P1^T A P1 = [λ1 ~bT; ~0 A1]

where A1 is an (n − 1) × (n − 1) matrix, ~b ∈ Rn−1, and ~0 is the zero vector in Rn−1. A is similar to [λ1 ~bT; ~0 A1], so all the eigenvalues of A1 are eigenvalues of A and hence all the eigenvalues of A1 are real. Therefore, by the inductive hypothesis, there exists an (n − 1) × (n − 1) orthogonal matrix Q such that QTA1Q = T1 is upper triangular.

Let P2 = [1 ~0T; ~0 Q]. Since the columns of Q are orthonormal, the columns of P2 are also orthonormal and hence P2 is an orthogonal matrix. Consequently, the matrix P = P1P2 is also orthogonal by Theorem 9.2.7. Then by block multiplication we get

PTAP = (P1P2)^T A (P1P2) = P2^T P1^T A P1 P2
= [1 ~0T; ~0 QT][λ1 ~bT; ~0 A1][1 ~0T; ~0 Q]
= [λ1 ~bTQ; ~0 QTA1Q]
= [λ1 ~bTQ; ~0 T1]

which is upper triangular. Thus, the result follows by induction. □


If A is orthogonally similar to an upper triangular matrix T, then A and T must share the same eigenvalues. Thus, since T is upper triangular, the eigenvalues must appear along the main diagonal of T.

Notice that the proof of the theorem gives us a method for finding an orthogonal matrix P so that PTAP = T is upper triangular. However, since the proof is by induction, this leads to a recursive algorithm. We demonstrate this with an example.

EXAMPLE 1 Let A = [2 1 1; 0 3 −2; 0 2 −1]. Find an orthogonal matrix P such that PTAP is upper triangular.

Solution: As in the proof, we first need to find one of the real eigenvalues of A. By inspection, we see that 2 is an eigenvalue of A with unit eigenvector ~v1 = (1, 0, 0)^T. Next we extend {~v1} to an orthonormal basis for R3. This is very easy in this case as we can take ~w2 = (0, 1, 0)^T and ~w3 = (0, 0, 1)^T. Thus we get P1 = I and

P1^T A P1 = [2 1 1; 0 3 −2; 0 2 −1] = [2 ~bT; ~0 A1]

where A1 = [3 −2; 2 −1] and ~bT = [1 1].

We now need to apply the inductive hypothesis to A1. Thus, we need to find a 2 × 2 orthogonal matrix Q such that QTA1Q = T1 is upper triangular. We start by finding a real eigenvalue of A1. Consider A1 − λI = [3 − λ, −2; 2, −1 − λ]. Then C(λ) = det(A1 − λI) = (λ − 1)^2, so we have one eigenvalue λ = 1. This gives

A1 − λI = [2 −2; 2 −2]

so a corresponding unit eigenvector is (1/√2, 1/√2)^T.

We extend {(1/√2, 1/√2)^T} to an orthonormal basis for R2 by picking (1/√2)(−1, 1)^T. Therefore,

Q = (1/√2)[1 −1; 1 1]

is an orthogonal matrix and

QTA1Q = (1/√2)[1 1; −1 1] [3 −2; 2 −1] (1/√2)[1 −1; 1 1] = [1 −4; 0 1] = T1

Now that we have Q and T1, the proof of the theorem tells us to choose

P2 = [1 ~0T; ~0 Q] = [1 0 0; 0 1/√2 −1/√2; 0 1/√2 1/√2]


and P = P1P2 = P2. Then

PTAP = [2 ~bTQ; ~0 T1] = [2 √2 0; 0 1 −4; 0 0 1]

as required.

Since the procedure is recursive, it would be very inefficient for larger matrices. There is a much better algorithm for doing this, although we will not cover it in this book. The purpose of the example above is not to teach one how to find such an orthogonal matrix, but to help one understand the proof.

Section 10.1 Problems

1. Prove that if A is orthogonally similar to B and B is orthogonally similar to C, then A is orthogonally similar to C.

2. By following the steps in the proof of the Triangularization Theorem, find an orthogonal matrix P and upper triangular matrix T such that PTAP = T for each of the following matrices.

(a) A = [2 4; 1 5]

(b) A = [1 0 −1; 2 2 1; 4 0 5]

10.2 Orthogonal Diagonalization

We now know that we can find an orthonormal basis B such that the B-matrix of any linear operator L on Rn with real eigenvalues is upper triangular. However, as we saw in Math 136, having a diagonal matrix would be even better. So, we now look at which matrices are orthogonally similar to a diagonal matrix.

DEFINITION Orthogonally Diagonalizable

An n × n matrix A is said to be orthogonally diagonalizable if there exists an orthogonal matrix P and diagonal matrix D such that

PTAP = D

that is, if A is orthogonally similar to a diagonal matrix.

Our goal is to determine all matrices which are orthogonally diagonalizable. At first glance, it may not be clear how to even start trying to find matrices which are orthogonally diagonalizable, if any exist. One way to approach this problem is to start by looking at what condition a matrix must satisfy if it is orthogonally diagonalizable.


THEOREM 1 If A is orthogonally diagonalizable, then AT = A.

Proof: If A is orthogonally diagonalizable, then there exists an orthogonal matrix P such that PTAP = D with D diagonal. Since PT = P−1 we can write this as A = PDPT. Taking transposes gives

AT = (PDPT)T = (PT)T DT PT = PDPT = A

since D = DT as D is diagonal. Thus, A = AT. □

Observe that this theorem does not even guarantee the existence of an orthogonally diagonalizable matrix. It only shows that if one exists, then it must satisfy AT = A. But, on the other hand, it gives us a place to start looking for orthogonally diagonalizable matrices. Let's first consider the set of such 2 × 2 matrices.

EXAMPLE 1 Prove that every 2 × 2 matrix A = [a b; b c] is orthogonally diagonalizable.

Solution: We will show that we can find an orthonormal basis of eigenvectors for A so that we can find an orthogonal matrix P which diagonalizes A. We have

C(λ) = det [a − λ, b; b, c − λ] = λ^2 − (a + c)λ + ac − b^2

Thus, by the quadratic formula, the eigenvalues of A are

λ+ = (a + c + √((a + c)^2 − 4(ac − b^2)))/2 = (a + c + √((a − c)^2 + 4b^2))/2
λ− = (a + c − √((a + c)^2 − 4(ac − b^2)))/2 = (a + c − √((a − c)^2 + 4b^2))/2

If a = c and b = 0, then A is already diagonal and thus can be orthogonally diagonalized by I. Otherwise, A has two real eigenvalues, each with algebraic multiplicity 1. We have

A − λ+I = [ (a − c − √((a − c)^2 + 4b^2))/2 ,  b ;  b ,  (c − a − √((a − c)^2 + 4b^2))/2 ] ∼ [ 1 , 2b/(a − c − √((a − c)^2 + 4b^2)) ; 0 , 0 ]

Hence, an eigenvector for λ+ is

~v1 = ( −2b/(a − c − √((a − c)^2 + 4b^2)) , 1 )^T

Also,

A − λ−I = [ (a − c + √((a − c)^2 + 4b^2))/2 ,  b ;  b ,  (c − a + √((a − c)^2 + 4b^2))/2 ] ∼ [ 1 , 2b/(a − c + √((a − c)^2 + 4b^2)) ; 0 , 0 ]

so an eigenvector for λ− is

~v2 = ( −2b/(a − c + √((a − c)^2 + 4b^2)) , 1 )^T


Observe that

~v1 · ~v2 = (−2b/(a − c − √((a − c)^2 + 4b^2)))(−2b/(a − c + √((a − c)^2 + 4b^2))) + 1(1)
= 4b^2/((a − c)^2 − ((a − c)^2 + 4b^2)) + 1 = −1 + 1 = 0

Thus, the eigenvectors are orthogonal and hence we can normalize them to get an orthonormal basis for R2. So, A can be diagonalized by an orthogonal matrix.

It is clear that the condition AT = A is going to be very important. Thus, we make the following definition.

DEFINITION Symmetric Matrix

A matrix A such that AT = A is said to be symmetric.

It is clear from the definition that a matrix must be square to be symmetric. Moreover, observe that a matrix A is symmetric if and only if aij = aji for all 1 ≤ i, j ≤ n.

EXAMPLE 2 Determine which of the following matrices are symmetric.

(a) A = [1 2 1; 2 0 1; 1 1 −3]  (b) B = [1 0 1; 1 1 1; 1 0 1]  (c) C = [1 0 0; 0 2 0; 0 0 3]

Solution: It is easy to verify that AT = A and CT = C, so they are both symmetric. However,

BT = [1 1 1; 0 1 0; 1 1 1] ≠ B

so B is not symmetric.

Example 1 shows that every 2 × 2 symmetric matrix is orthogonally diagonalizable. Thanks to our work in the last section, proving that every symmetric matrix is orthogonally diagonalizable is not difficult. We start by stating an amazing result about symmetric matrices.

LEMMA 2 If A is a symmetric matrix with real entries, then all of its eigenvalues are real.

This proof requires properties of vectors and matrices with complex entries and so will be delayed until Chapter 11.


THEOREM 3 (Principal Axis Theorem)

Every symmetric matrix is orthogonally diagonalizable.

Proof: Let A be a symmetric matrix. By Lemma 2 all eigenvalues of A are real. Therefore, we can apply the Triangularization Theorem to get that there exists an orthogonal matrix P such that PTAP = T is upper triangular. Since A is symmetric, we have that AT = A and hence

TT = (PTAP)T = PTAT(PT)T = PTAP = T

Therefore, T is an upper triangular symmetric matrix. But, if T is upper triangular, then TT is lower triangular, and so T is both upper and lower triangular and hence T is diagonal. □

Although proving this important theorem is quite easy thanks to the Triangularization Theorem, it does not give us a nice way of orthogonally diagonalizing a symmetric matrix. So, we instead refer to our solution of Example 1. In the example we saw that the eigenvectors corresponding to different eigenvalues were naturally orthogonal. To prove that this is always the case, we first prove an important property of symmetric matrices.

THEOREM 4 A matrix A is symmetric if and only if ~x · (A~y) = (A~x) · ~y for all ~x, ~y ∈ R^n.

Proof: Suppose that A is symmetric. Then for any ~x, ~y ∈ R^n we have

~x · (A~y) = ~x^T A ~y = ~x^T A^T ~y = (A~x)^T ~y = (A~x) · ~y

Conversely, if ~x · (A~y) = (A~x) · ~y for all ~x, ~y ∈ R^n, then

(~x^T A)~y = (A~x)^T ~y = (~x^T A^T)~y

Since this is valid for all ~y ∈ R^n we get that ~x^T A = ~x^T A^T by Theorem 3.1.4 from the Math 136 course notes. Taking transposes of both sides gives A^T ~x = A~x, and this is still valid for all ~x ∈ R^n. Hence, applying Theorem 3.1.4 again gives A^T = A as required. ∎

THEOREM 5 If ~v1, ~v2 are eigenvectors of a symmetric matrix A corresponding to distinct eigenvalues λ1, λ2, then ~v1 is orthogonal to ~v2.

Proof: We are assuming that A~v1 = λ1~v1 and A~v2 = λ2~v2, with λ1 ≠ λ2. Theorem 4 gives

λ1(~v1 · ~v2) = (λ1~v1) · ~v2 = (A~v1) · ~v2 = ~v1 · (A~v2) = ~v1 · (λ2~v2) = λ2(~v1 · ~v2)

Hence, (λ1 − λ2)(~v1 · ~v2) = 0. But λ1 ≠ λ2, so ~v1 · ~v2 = 0 as required. ∎


Consequently, if a symmetric matrix A has n distinct eigenvalues, the basis of eigenvectors which diagonalizes A will naturally be orthogonal. Hence, to orthogonally diagonalize such a matrix A, we just need to normalize these eigenvectors to form an orthonormal basis for R^n of eigenvectors of A.

EXAMPLE 3 Find an orthogonal matrix P such that P^T A P is diagonal, where

A = [ 1  0 −1 ]
    [ 0  1  2 ]
    [ −1 2  5 ]

Solution: The characteristic polynomial is

C(λ) = det [ 1−λ  0   −1  ]
           [  0  1−λ   2  ] = (1−λ)[(1−λ)(5−λ) − 4] − (1−λ) = −(λ−1)λ(λ−6)
           [ −1   2   5−λ ]

Thus the eigenvalues of A are λ1 = 0, λ2 = 1, and λ3 = 6. For λ1 = 0 we get

A − 0I = [ 1  0 −1 ]   [ 1 0 −1 ]
         [ 0  1  2 ] ∼ [ 0 1  2 ]
         [ −1 2  5 ]   [ 0 0  0 ]

Hence, a basis for Eλ1, the eigenspace of λ1, is { [1, −2, 1]^T }. For λ2 = 1, we get

A − I ∼ [ 1 −2 0 ]
        [ 0  0 1 ]
        [ 0  0 0 ]

Hence, a basis for Eλ2 is { [2, 1, 0]^T }. For λ3 = 6, we get

A − 6I ∼ [ 1 0  1/5 ]
         [ 0 1 −2/5 ]
         [ 0 0   0  ]

Hence, a basis for Eλ3 is { [−1, 2, 5]^T }.

As predicted by the theorem, we can easily verify that these are orthogonal to each other. After normalizing we find that an orthonormal basis of eigenvectors for A is

{ [1/√6, −2/√6, 1/√6]^T, [2/√5, 1/√5, 0]^T, [−1/√30, 2/√30, 5/√30]^T }

Thus, we take

P = [ 1/√6   2/√5  −1/√30 ]
    [ −2/√6  1/√5   2/√30 ]
    [ 1/√6    0     5/√30 ]


Since the columns of P form an orthonormal basis for R³ we get that P is orthogonal. Moreover, since the columns of P form a basis of eigenvectors for A, we get that P diagonalizes A. In particular,

P^T A P = [ 0 0 0 ]
          [ 0 1 0 ]
          [ 0 0 6 ]
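This kind of computation is easy to check numerically. The following sketch is not part of the original notes and assumes the NumPy library; numpy.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order together with an orthogonal matrix of eigenvectors.

    import numpy as np

    A = np.array([[ 1.0, 0.0, -1.0],
                  [ 0.0, 1.0,  2.0],
                  [-1.0, 2.0,  5.0]])

    evals, P = np.linalg.eigh(A)       # eigenvalues in ascending order
    print(np.round(evals, 10))         # [0. 1. 6.]
    print(np.round(P.T @ A @ P, 10))   # diag(0, 1, 6)
    print(np.round(P.T @ P, 10))       # identity, so P is orthogonal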

Notice that Theorem 5 does not say anything about different eigenvectors corresponding to the same eigenvalue. In particular, if an eigenvalue of a symmetric matrix A has geometric multiplicity 2, then there is no guarantee that a basis for the eigenspace of the eigenvalue will be orthogonal. Of course, this is easily fixed as we can just apply the Gram-Schmidt procedure to find an orthonormal basis for the eigenspace.

EXAMPLE 4 Orthogonally diagonalize

A = [ 8 −2 2 ]
    [ −2 5 4 ]
    [ 2  4 5 ]

Solution: The characteristic polynomial of A is

C(λ) = det [ 8−λ −2   2  ]   det [ 8−λ  −4    2  ]
           [ −2  5−λ  4  ] =     [  0    0   9−λ ] = −λ(λ − 9)²
           [  2   4  5−λ ]       [  2   λ−1  5−λ ]

Thus, the eigenvalues are λ1 = 0 and λ2 = 9, where the algebraic multiplicity of λ1 is 1 and the algebraic multiplicity of λ2 is 2. For λ1 = 0 we get

A − 0I ∼ [ 1 0 1/2 ]
         [ 0 1  1  ]
         [ 0 0  0  ]

Hence, a basis for Eλ1 is { [1, 2, −2]^T } = {~v1}. For λ2 = 9 we get

A − 9I ∼ [ 1 2 −2 ]
         [ 0 0  0 ]
         [ 0 0  0 ]

Hence, a basis for Eλ2 is { [−2, 1, 0]^T, [2, 0, 1]^T } = {~v2, ~v3}.

Observe that ~v2 and ~v3 are both orthogonal to ~v1, but not to each other. Thus, we apply the Gram-Schmidt procedure on the basis {~v2, ~v3} to get an orthogonal basis for Eλ2.

~w2 = [−2, 1, 0]^T

~w3 = ~v3 − (⟨~v3, ~w2⟩/‖~w2‖²) ~w2 = [2, 0, 1]^T − (−4/5)[−2, 1, 0]^T = [2/5, 4/5, 1]^T

Instead, we take ~w3 = [2, 4, 5]^T. Thus, {~w2, ~w3} is an orthogonal basis for the eigenspace of

λ2 and hence {~v1, ~w2, ~w3} is an orthogonal basis for R³ of eigenvectors of A.

We normalize the vectors to get the orthonormal basis

{ [1/3, 2/3, −2/3]^T, [−2/√5, 1/√5, 0]^T, [2/√45, 4/√45, 5/√45]^T }

Hence, we get

P = [ 1/3  −2/√5  2/√45 ]
    [ 2/3   1/√5  4/√45 ]
    [ −2/3    0   5/√45 ]

and

D = P^T A P = [ 0 0 0 ]
              [ 0 9 0 ]
              [ 0 0 9 ]

EXAMPLE 5 Orthogonally diagonalize

A = [ 1 1 1 ]
    [ 1 1 1 ]
    [ 1 1 1 ]

Solution: The characteristic polynomial of A is

C(λ) = det [ 1−λ  1   1  ]   det [ 1−λ   1    2  ]
           [  1  1−λ  1  ] =     [  1   1−λ  2−λ ] = −λ²(λ − 3)
           [  1   1  1−λ ]       [  0    λ    0  ]

Thus, the eigenvalues are λ1 = 0 with algebraic multiplicity 2 and λ2 = 3 with algebraic multiplicity 1. For λ1 = 0 we get

A − 0I ∼ [ 1 1 1 ]
         [ 0 0 0 ]
         [ 0 0 0 ]

Hence, a basis for Eλ1 is { [−1, 1, 0]^T, [−1, 0, 1]^T }.


Since this is not an orthogonal basis for Eλ1, we need to apply the Gram-Schmidt procedure. We get

~w1 = [−1, 1, 0]^T

~w2 = [−1, 0, 1]^T − (1/2)[−1, 1, 0]^T = [−1/2, −1/2, 1]^T

Instead, we take ~w2 = [1, 1, −2]^T. Then, {~w1, ~w2} is an orthogonal basis for Eλ1. For λ2 = 3 we get

A − 3I ∼ [ 1 0 −1 ]
         [ 0 1 −1 ]
         [ 0 0  0 ]

Hence, a basis for Eλ2 is { [1, 1, 1]^T }.

We normalize the basis vectors to get the orthonormal basis

{ [−1/√2, 1/√2, 0]^T, [1/√6, 1/√6, −2/√6]^T, [1/√3, 1/√3, 1/√3]^T }

Hence, we take

P = [ −1/√2  1/√6  1/√3 ]
    [ 1/√2   1/√6  1/√3 ]
    [  0    −2/√6  1/√3 ]

and get

P^T A P = [ 0 0 0 ]
          [ 0 0 0 ]
          [ 0 0 3 ]


Section 10.2 Problems

1. Orthogonally diagonalize each of the following symmetric matrices.

(a) [ 2 2 ]    (b) [ 4 −2 ]    (c) [ 1 −2 ]
    [ 2 2 ]        [ −2 7 ]        [ −2 −2 ]

(d) [ 1 1 1 ]    (e) [ 0 2 −2 ]    (f) [ 2 −2 −5 ]
    [ 1 1 1 ]        [ 2 1  0 ]        [ −2 −5 −2 ]
    [ 1 1 1 ]        [ −2 0 1 ]        [ −5 −2 2 ]

(g) [ 2 −4 −4 ]    (h) [ 5 −1 1 ]    (i) [ 1 1 1 1 ]
    [ −4 2 −4 ]        [ −1 5 1 ]        [ 1 1 1 1 ]
    [ −4 −4 2 ]        [ 1 1 5 ]        [ 1 1 1 1 ]
                                        [ 1 1 1 1 ]

2. Prove that if A is invertible and orthogonally diagonalizable, then A^(−1) is orthogonally diagonalizable.

3. Determine whether each statement is true or false. Justify your answer with a proof or a counterexample.

(a) Every orthogonal matrix is orthogonally diagonalizable.

(b) If A and B are orthogonally diagonalizable, then AB is orthogonally diagonalizable.

(c) If A is orthogonally similar to a symmetric matrix B, then A is orthogonally diagonalizable.

10.3 Quadratic Forms

We now use the results of the last section to study a very important class of functions called quadratic forms. Quadratic forms are not only important in linear algebra, but also in number theory, group theory, differential geometry, and many other areas.

Simply put, a quadratic form is a function defined as a linear combination of all possible terms xi xj for 1 ≤ i ≤ j ≤ n. For example, for two variables x1, x2 we have a1x1² + a2x1x2 + a3x2², and for three variables x1, x2, x3 we get a1x1² + a2x1x2 + a3x1x3 + a4x2² + a5x2x3 + a6x3².

Notice that a quadratic form is certainly not a linear function. Instead, quadratic forms are tied to linear algebra by matrix multiplication. For example, if

B = [ 1  2 ]   and   ~x = [ x1 ]
    [ 0 −1 ]              [ x2 ]

then we have

Q(~x) = ~x^T B ~x = [ x1 x2 ] [ 1  2 ] [ x1 ]
                              [ 0 −1 ] [ x2 ]

                  = [ x1 x2 ] [ x1 + 2x2 ]
                              [   −x2    ]

                  = x1(x1 + 2x2) + x2(−x2) = x1² + 2x1x2 − x2²

which is a quadratic form. We now use this to precisely define a quadratic form.


DEFINITION
Quadratic Form

A quadratic form on R^n with corresponding n × n matrix A is defined by

Q(~x) = ~x^T A ~x, for any ~x ∈ R^n

There is a connection between the dot product and quadratic forms. Observe that

Q(~x) = ~x^T A ~x = ~x · (A~x) = (A~x) · ~x = (A~x)^T ~x = ~x^T A^T ~x

Hence, A and A^T give the same quadratic form (this does not imply that A = A^T). Of course, this is most natural when A is symmetric. In fact, it can be shown that every quadratic form can be written as Q(~x) = ~x^T A ~x where A is symmetric. Moreover, each symmetric matrix A uniquely determines a quadratic form. Thus, as we will see, we often deal with a quadratic form and its corresponding symmetric matrix in the same way.

EXAMPLE 1 Let B = [ 1  2 ]. Find the symmetric matrix corresponding to Q(~x) = ~x^T B ~x.
                  [ 0 −1 ]

Solution: From our work above we have

Q(~x) = ~x^T B ~x = x1² + 2x1x2 − x2²

We want to find a symmetric matrix A = [ a b ] such that
                                       [ b c ]

x1² + 2x1x2 − x2² = ~x^T A ~x

We have that

x1² + 2x1x2 − x2² = ~x^T A ~x = [ x1 x2 ] [ a b ] [ x1 ]
                                          [ b c ] [ x2 ]

                  = [ x1 x2 ] [ ax1 + bx2 ]
                              [ bx1 + cx2 ]

                  = x1(ax1 + bx2) + x2(bx1 + cx2)

                  = ax1² + 2bx1x2 + cx2²

Hence, we take a = 1, b = 1, and c = −1. Consequently, the symmetric matrix corresponding to Q(~x) is

A = [ 1  1 ]
    [ 1 −1 ]

For a given quadratic form on R^n, an easy way to think of the corresponding symmetric matrix is as a grid with the rows and columns corresponding to the variables x1, . . . , xn. So the 11 entry of the matrix is the x1x1 grid and so must have the coefficient of x1². The 12 entry of the matrix is the x1x2 grid, but the 21 entry is also an x1x2 grid, so they have to equally split the coefficient of x1x2 between them. Of course, this also works in reverse.
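The grid rule is easy to mechanize. The sketch below is only an illustration (it is not from the notes; the function name symmetric_matrix is made up, and NumPy is assumed): the coefficient of xi² goes on the diagonal, and the coefficient of xi xj for i < j is split equally between the ij and ji entries.

    import numpy as np

    def symmetric_matrix(diag_coeffs, cross_coeffs):
        # diag_coeffs[i] is the coefficient of x_{i+1}^2;
        # cross_coeffs[(i, j)] is the coefficient of x_{i+1} x_{j+1}, i < j.
        A = np.diag(np.array(diag_coeffs, dtype=float))
        for (i, j), c in cross_coeffs.items():
            A[i, j] = A[j, i] = c / 2.0   # split the cross-term equally
        return A

    # Q(x1, x2) = x1^2 + 2 x1 x2 - x2^2 from Example 1:
    print(symmetric_matrix([1, -1], {(0, 1): 2}))   # [[1. 1.] [1. -1.]]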


EXAMPLE 2 Let A = [ 1 4 ]. Then the quadratic form corresponding to A is
                  [ 4 3 ]

Q(x1, x2) = x1² + 2(4)x1x2 + 3x2² = x1² + 8x1x2 + 3x2²

EXAMPLE 3 Let A = [ 1  2 −3 ]
                  [ 2 −4  0 ]
                  [ −3 0 −1 ]

Then the quadratic form corresponding to A is

Q(x1, x2, x3) = x1² + 2(2)x1x2 + 2(−3)x1x3 − 4x2² + 2(0)x2x3 − x3²
              = x1² + 4x1x2 − 6x1x3 − 4x2² − x3²

EXERCISE 1 Let A = [ −3  1/2  1 ]
                   [ 1/2 3/2  2 ]
                   [  1   2   0 ]

Find the quadratic form corresponding to A.

EXAMPLE 4 Let Q(x1, x2) = 2x1² + 3x1x2 − 4x2². Find the corresponding symmetric matrix A.

Solution: We get a11 = 2, 2a12 = 3, and a22 = −4. Hence,

A = [ 2   3/2 ]
    [ 3/2 −4  ]

EXAMPLE 5 Let Q(x1, x2, x3) = x1² − 2x1x2 + 8x1x3 + 3x2² + x2x3 − 5x3². Find the corresponding symmetric matrix A.

Solution: We have a11 = 1, 2a12 = −2, 2a13 = 8, a22 = 3, 2a23 = 1, and a33 = −5. Hence,

A = [ 1  −1   4  ]
    [ −1  3  1/2 ]
    [ 4  1/2 −5  ]

EXERCISE 2 Let Q(x1, x2, x3) = 3x1² + (1/2)x1x2 − x2² + 2x2x3 + 2x3². Find the corresponding symmetric matrix A.


EXAMPLE 6 Let Q(x1, x2, x3) = x1² − 2x2² − 3x3². Then the corresponding symmetric matrix is

A = [ 1  0  0 ]
    [ 0 −2  0 ]
    [ 0  0 −3 ]

The terms xi xj with i ≠ j are called cross-terms. Thus, a quadratic form Q(~x) has no cross-terms if and only if its corresponding symmetric matrix is diagonal. We will call such a quadratic form a diagonal quadratic form.

Classifying Quadratic Forms

DEFINITION Let Q(~x) be a quadratic form. Then

1. Q(~x) is positive definite if Q(~x) > 0 for all ~x ≠ ~0.
2. Q(~x) is negative definite if Q(~x) < 0 for all ~x ≠ ~0.
3. Q(~x) is indefinite if Q(~x) > 0 for some ~x and Q(~x) < 0 for some ~x.
4. Q(~x) is positive semidefinite if Q(~x) ≥ 0 for all ~x and Q(~x) = 0 for some ~x ≠ ~0.
5. Q(~x) is negative semidefinite if Q(~x) ≤ 0 for all ~x and Q(~x) = 0 for some ~x ≠ ~0.

EXAMPLE 7 Q(x1, x2) = x1² + x2² is positive definite, Q(x1, x2) = −x1² − x2² is negative definite, Q(x1, x2) = x1² − x2² is indefinite, Q(x1, x2) = x1² is positive semidefinite, and Q(x1, x2) = −x1² is negative semidefinite.

We classify symmetric matrices in the same way that we classify their corresponding quadratic forms.

EXAMPLE 8 A = [ 1 0 ]
              [ 0 1 ]

corresponds to the quadratic form Q(x1, x2) = x1² + x2², so A is positive definite.

EXAMPLE 9 Classify the symmetric matrix A = [ 4 −2 ] and the corresponding quadratic form Q(~x) = ~x^T A ~x.
                                            [ −2 4 ]

Solution: Observe that we have

Q(~x) = 4x1² − 4x1x2 + 4x2² = (2x1 − x2)² + 3x2² > 0

whenever ~x ≠ ~0. Thus, Q(~x) and A are both positive definite.


REMARK

The definitions, of course, can be generalized to any function and matrix. For example, the matrix

A = [ 1 1 ]
    [ 0 1 ]

is positive definite since

~x^T A ~x = x1² + x1x2 + x2² = (x1 + (1/2)x2)² + (3/4)x2² > 0

for all ~x ≠ ~0. In this book we will just be dealing with quadratic forms and symmetric matrices. However, it is important to remember that "Assume A is positive definite" does not imply that A is symmetric.

Each of the quadratic forms in the example above was simple to classify since it either had no cross-terms or we were easily able to complete a square. To classify more difficult quadratic forms, we use our theory from the last section and the connection between quadratic forms and symmetric matrices.

THEOREM 1 Let A be the symmetric matrix corresponding to a quadratic form Q(~x) = ~x^T A ~x. If P is an orthogonal matrix that diagonalizes A, then Q(~x) can be expressed as

λ1y1² + · · · + λnyn²

where [y1, . . . , yn]^T = ~y = P^T ~x and where λ1, . . . , λn are the eigenvalues of A corresponding to the columns of P.

Proof: Since P is orthogonal, if ~y = P^T ~x, then ~x = P~y. Moreover, we have that P^T A P = D is diagonal, where the diagonal entries of D are the eigenvalues λ1, . . . , λn of A. Hence,

Q(~x) = ~x^T A ~x = (P~y)^T A (P~y) = ~y^T P^T A P ~y = ~y^T D ~y = λ1y1² + · · · + λnyn²  ∎

This theorem shows that every quadratic form Q(~x) is equivalent to a diagonal quadratic form. Moreover, Q(~x) can be brought into this form by the change of variables ~y = P^T ~x where P is an orthogonal matrix which diagonalizes the corresponding symmetric matrix A.

EXAMPLE 10 Find new variables y1, y2, y3, y4 such that

Q(x1, x2, x3, x4) = 3x1² + 2x1x2 − 10x1x3 + 10x1x4 + 3x2² + 10x2x3 − 10x2x4 + 3x3² + 2x3x4 + 3x4²

has diagonal form. Use the diagonal form to classify Q(~x).


Solution: We have corresponding symmetric matrix

A = [ 3  1 −5  5 ]
    [ 1  3  5 −5 ]
    [ −5 5  3  1 ]
    [ 5 −5  1  3 ]

We first find an orthogonal matrix P which diagonalizes A. We have C(λ) = (λ − 12)(λ + 8)(λ − 4)² and finding the corresponding normalized eigenvectors gives

P = (1/2) [ 1  1  1  1 ]          with   P^T A P = [ 12 0 0 0 ]
          [ −1 −1 1  1 ]                           [ 0 −8 0 0 ]
          [ −1 1  1 −1 ]                           [ 0  0 4 0 ]
          [ 1 −1  1 −1 ]                           [ 0  0 0 4 ]

Thus, if

~y = [y1, y2, y3, y4]^T = P^T ~x = (1/2) [ x1 − x2 − x3 + x4 ]
                                         [ x1 − x2 + x3 − x4 ]
                                         [ x1 + x2 + x3 + x4 ]
                                         [ x1 + x2 − x3 − x4 ]

then we get Q = 12y1² − 8y2² + 4y3² + 4y4².

Therefore, Q(~x) clearly takes positive and negative values, so Q(~x) is indefinite.
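A numerical check of this example (a sketch assuming NumPy, not part of the original notes): diagonalize A, apply the change of variables ~y = P^T ~x to an arbitrary test vector, and confirm that Q(~x) equals the diagonal form evaluated at ~y.

    import numpy as np

    A = np.array([[ 3.0,  1.0, -5.0,  5.0],
                  [ 1.0,  3.0,  5.0, -5.0],
                  [-5.0,  5.0,  3.0,  1.0],
                  [ 5.0, -5.0,  1.0,  3.0]])

    evals, P = np.linalg.eigh(A)
    print(np.round(evals, 10))           # [-8. 4. 4. 12.] in ascending order

    x = np.array([1.0, -2.0, 0.5, 3.0])  # any test vector
    y = P.T @ x                          # the change of variables
    print(np.isclose(x @ A @ x, np.sum(evals * y**2)))   # True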

REMARKS

1. The eigenvectors we used to make up P are the principal axes of A. We will see a geometric interpretation of this in the next section.

2. By changing the order of the eigenvectors in P we also change the order of the eigenvalues in D and hence the coefficients of the corresponding yi. For example, if we took

P = (1/2) [ 1  1  1  1 ]
          [ −1 1  1 −1 ]
          [ 1  1 −1 −1 ]
          [ −1 1 −1  1 ]

then we would get Q = −8y1² + 4y2² + 4y3² + 12y4². Notice that since we can pick any vector ~y ∈ R⁴, this does in fact give us exactly the same set of values as our choice above. Alternatively, you can think of just doing another change of variables z1 = y2, z2 = y3, z3 = y4, and z4 = y1.

Generalizing the method of the example above gives us the following theorem.


THEOREM 2 Let A be a symmetric matrix. Then the quadratic form Q(~x) = ~x^T A ~x is

(1) positive definite if and only if the eigenvalues of A are all positive.
(2) negative definite if and only if the eigenvalues of A are all negative.
(3) indefinite if and only if some of the eigenvalues of A are positive and some are negative.
(4) positive semidefinite if and only if it has some zero eigenvalues and the rest positive.
(5) negative semidefinite if and only if it has some zero eigenvalues and the rest negative.

Proof: We will prove (1). The proofs of the others are similar. By the Principal Axis Theorem A is orthogonally diagonalizable, so there exists an orthogonal matrix P such that

Q(~x) = ~x^T A ~x = ~y^T D ~y = λ1y1² + · · · + λnyn²

where ~y = P^T ~x and λ1, . . . , λn are the eigenvalues of A. Consequently, Q(~x) > 0 for all ~x ≠ ~0 if and only if all λi are positive. Thus, Q(~x) is positive definite if and only if the eigenvalues of A are all positive. ∎

EXAMPLE 11 Classify the following:

(a) Q(x1, x2) = 4x1² − 6x1x2 + 2x2².

Solution: We have A = [ 4 −3 ], so
                      [ −3 2 ]

det(A − λI) = | 4−λ  −3  | = λ² − 6λ − 1
              | −3  2−λ  |

Applying the quadratic formula we get λ = (6 ± √40)/2. So, we have both positive and negative eigenvalues, so Q is indefinite.

(b) Q(x1, x2, x3) = 2x1² + 2x1x2 + 2x1x3 + 2x2² + 2x2x3 + 2x3².

Solution: We have

A = [ 2 1 1 ]
    [ 1 2 1 ]
    [ 1 1 2 ]

We can find that the eigenvalues are 1, 1, and 4. Hence all the eigenvalues of A are positive, so A is positive definite.

(c) A = [ 3 2 0 ]
        [ 2 2 2 ]
        [ 0 2 1 ]

Solution: The eigenvalues of A are 5, 2, and −1. Therefore, A is indefinite.
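Theorem 2 turns classification into an eigenvalue computation, which the following sketch carries out (assuming NumPy; the function classify and its tolerance are illustrative choices, not part of the notes):

    import numpy as np

    def classify(A, tol=1e-10):
        evals = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
        pos = np.any(evals > tol)
        neg = np.any(evals < -tol)
        zero = np.any(np.abs(evals) <= tol)
        if pos and neg:
            return "indefinite"
        if pos:
            return "positive semidefinite" if zero else "positive definite"
        if neg:
            return "negative semidefinite" if zero else "negative definite"
        return "zero form"

    print(classify(np.array([[4.0, -3.0], [-3.0, 2.0]])))  # indefinite
    print(classify(np.array([[2.0, 1.0, 1.0],
                             [1.0, 2.0, 1.0],
                             [1.0, 1.0, 2.0]])))           # positive definite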


Section 10.3 Problems

1. For each of the following quadratic forms Q(~x):

(i) Find the corresponding symmetric matrix.

(ii) Classify the quadratic form and its corresponding symmetric matrix.

(iii) Find a corresponding diagonal form of Q(~x) and the change of variables which brings it into this form.

(a) Q(~x) = x1² + 3x1x2 + x2²
(b) Q(~x) = 8x1² + 4x1x2 + 11x2²
(c) Q(~x) = x1² + 8x1x2 − 5x2²
(d) Q(~x) = −x1² + 4x1x2 + 2x2²
(e) Q(~x) = 4x1² + 4x1x2 + 4x1x3 + 4x2² + 4x2x3 + 4x3²
(f) Q(~x) = 4x1x2 − 4x1x3 + x2² − x3²
(g) Q(~x) = 4x1² − 2x1x2 + 2x1x3 + 4x2² + 2x2x3 + 4x3²
(h) Q(~x) = −4x1² + 2x1x2 + 2x1x3 − 4x2² − 2x2x3 − 4x3²

2. Let Q(~x) = ~x^T A ~x with A = [ a b ] and det A ≠ 0.
                                  [ b c ]

(a) Prove that Q is positive definite if det A > 0 and a > 0.

(b) Prove that Q is negative definite if det A > 0 and a < 0.

(c) Prove that Q is indefinite if det A < 0.

3. Let A be an n × n matrix and let ~x, ~y ∈ R^n. Define ⟨~x, ~y⟩ = ~x^T A ~y. Prove that ⟨ , ⟩ is an inner product on R^n if and only if A is a positive definite, symmetric matrix.

4. Let A be a positive definite symmetric matrix. Prove that:

(a) the diagonal entries of A are all positive.

(b) A is invertible.

5. Let A and B be symmetric n × n matrices whose eigenvalues are all positive. Show that the eigenvalues of A + B are all positive.


10.4 Graphing Quadratic Forms

In many applications of quadratic forms it is important to be able to sketch the graph of a quadratic form Q(~x) = k for some k. Observe that in general it is not easy to identify the shape of the graph of a general equation ax1² + bx1x2 + cx2² = k by inspection. On the other hand, if you are familiar with conic sections, it is easy to determine the shape of the graph of an equation of the form λ1x1² + λ2x2² = k. We demonstrate this in a table.

The graph of λ1x1² + λ2x2² = k looks like:

                  | k > 0          | k = 0                    | k < 0
λ1 > 0, λ2 > 0    | ellipse        | point (0, 0)             | dne
λ1 < 0, λ2 < 0    | dne            | point (0, 0)             | ellipse
λ1λ2 < 0          | hyperbola      | asymptotes for hyperbola | hyperbola
λ1 = 0, λ2 > 0    | parallel lines | line x2 = 0              | dne
λ1 = 0, λ2 < 0    | dne            | line x2 = 0              | parallel lines

(Here "dne" indicates the graph does not exist; the equation has no solutions.)

In the last section we saw that we could bring a quadratic form into diagonal form by performing the change of variables ~y = P^T ~x where P orthogonally diagonalizes the corresponding symmetric matrix. Therefore, to sketch an equation of the form ax1² + bx1x2 + cx2² = k we could first apply the change of variables to bring it into the form λ1y1² + λ2y2² = k, which will be easy to sketch. However, we must determine how performing the change of variables will affect the graph.

THEOREM 1 Let Q(~x) = ax1² + bx1x2 + cx2². Then there exists an orthogonal matrix P which corresponds to a rotation such that the change of variables ~y = P^T ~x brings Q(~x) into diagonal form.

Proof: Let A = [ a   b/2 ]
               [ b/2  c  ]

Since A is symmetric we can apply the Principal Axis Theorem to get that there exists an orthonormal basis {~v1, ~v2} of R² of eigenvectors of A. Let ~v1 = [a1, a2]^T and ~v2 = [b1, b2]^T. Since ~v1 is a unit vector we must have a1² + a2² = 1. Hence, the components lie on the unit circle and so there exists an angle θ such that a1 = cos θ and a2 = sin θ. Moreover, since ~v2 is a unit vector orthogonal to ~v1 we can pick b1 = −sin θ and b2 = cos θ. Hence we have

P = [ cos θ  −sin θ ]
    [ sin θ   cos θ ]

which corresponds to a rotation by θ. Finally, from our work above, we know that this change of basis matrix brings Q into diagonal form. ∎


REMARK

Observe that our choices for ~v1 and ~v2 in the proof above were not unique. For example, we could have picked ~v1 = [cos θ, sin θ]^T and ~v2 = [sin θ, −cos θ]^T. In this case the matrix P = [~v1 ~v2] would correspond to a rotation and a reflection. This does not contradict the theorem though. The theorem only says that we can choose P to correspond to a rotation, not that a rotation is the only choice. We will demonstrate this in an example below.

EXAMPLE 1 Consider the equation x1² + x1x2 + x2² = 1. Find a rotation so that the equation has no cross-terms.

Solution: We have A = [ 1   1/2 ]
                      [ 1/2  1  ]

The eigenvalues are λ1 = 1/2 and λ2 = 3/2. Therefore, the graph of x1² + x1x2 + x2² = 1 is an ellipse. We find that the corresponding unit eigenvectors are

~v1 = (1/√2)[−1, 1]^T   and   ~v2 = (1/√2)[−1, −1]^T

So, from our work in the proof of the theorem, we can pick cos θ = −1/√2 and sin θ = 1/√2. Hence, θ = 3π/4 and

[ x1 ] = [ −1/√2  −1/√2 ] [ y1 ] = [ (−y1 − y2)/√2 ]
[ x2 ]   [ 1/√2   −1/√2 ] [ y2 ]   [ (y1 − y2)/√2  ]

Substituting these into the equation gives

1 = x1² + x1x2 + x2²
  = ((−y1 − y2)/√2)² + ((−y1 − y2)/√2)((y1 − y2)/√2) + ((y1 − y2)/√2)²
  = (1/2)(y1² + 2y1y2 + y2²) + (1/2)(−y1² + y2²) + (1/2)(y1² − 2y1y2 + y2²)
  = (1/2)y1² + (3/2)y2²

Thus, the graph of x1² + x1x2 + x2² = 1 is the graph of 1 = (1/2)y1² + (3/2)y2² rotated by 3π/4. This is shown below.
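The rotation angle can also be recovered numerically. The sketch below (assuming NumPy; not part of the notes) reads θ off a unit eigenvector (cos θ, sin θ); the answer can differ from 3π/4 by a sign or by π, since eigenvectors are only determined up to a scalar.

    import numpy as np

    A = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
    evals, P = np.linalg.eigh(A)
    print(np.round(evals, 10))          # [0.5 1.5]

    v1 = P[:, 0]                        # unit eigenvector for lambda = 1/2
    theta = np.arctan2(v1[1], v1[0])    # angle the y1-axis makes with the x1-axis
    print(np.degrees(theta))            # 135 degrees (3*pi/4), up to sign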


Let us look at this procedure in general. Assume that we want to sketch Q(x1, x2) = k where Q(x1, x2) is a quadratic form with corresponding symmetric matrix A. We first write Q(x1, x2) in diagonal form λ1y1² + λ2y2² by finding the orthogonal matrix P = [~v1 ~v2] which diagonalizes A and performing the change of variables ~y = P^T ~x. We can then easily sketch λ1y1² + λ2y2² = k (finding the equations of the asymptotes if it is a hyperbola) in the y1y2-plane. Then using the fact that

~x = P~y = [~v1 ~v2] [ y1 ] = y1~v1 + y2~v2
                     [ y2 ]

we can find the y1-axis and the y2-axis in the x1x2-plane. In particular, the y1-axis in the y1y2-plane is spanned by [1, 0]^T, thus the y1-axis in the x1x2-plane is spanned by

~x = 1~v1 + 0~v2 = ~v1

Hence, performing the change of variables rotates the y1-axis to be in the direction of ~v1. Similarly, the y2-axis is rotated to the ~v2 direction. Therefore, we can sketch the graph of Q(x1, x2) = k in the x1x2-plane by drawing the y1 and y2 axes in the x1x2-plane and sketching the graph of λ1y1² + λ2y2² = k on these axes as we did in the y1y2-plane. For this reason, the orthogonal eigenvectors ~v1, ~v2 of A are called the principal axes for Q(x1, x2).

EXAMPLE 2 Sketch 6x1² + 6x1x2 − 2x2² = 5.

Solution: We start by finding an orthogonal matrix which diagonalizes the corresponding symmetric matrix A. We have

A = [ 6  3 ],   so   det(A − λI) = | 6−λ   3   | = (λ − 7)(λ + 3)
    [ 3 −2 ]                       |  3   −2−λ |

We find corresponding eigenvectors of A are [3, 1]^T for λ = 7 and [−1, 3]^T for λ = −3. Thus,

P = [ 3/√10  −1/√10 ]
    [ 1/√10   3/√10 ]

orthogonally diagonalizes A. Then, from our work above, we know that the change of variables ~y = P^T ~x changes 6x1² + 6x1x2 − 2x2² = 5 into 7y1² − 3y2² = 5. To sketch the hyperbola 5 = 7y1² − 3y2² more accurately, we first find the equations of its asymptotes. We get

0 = 7y1² − 3y2²
y2 = ±(√21/3) y1 ≈ ±1.528 y1


Now to sketch 6x1² + 6x1x2 − 2x2² = 5 in the x1x2-plane we first draw the y1-axis in the direction of [3, 1]^T and the y2-axis in the direction of [−1, 3]^T. Next, to be more precise, we use the change of variables ~y = P^T ~x to convert the equations of the asymptotes of the hyperbola to equations in the x1x2-plane. We have

[ y1 ] = (1/√10) [ 3  1 ] [ x1 ] = (1/√10) [ 3x1 + x2  ]
[ y2 ]           [ −1 3 ] [ x2 ]           [ −x1 + 3x2 ]

Hence, we get that the equations of the asymptotes are

y2 = ±(√21/3) y1
(1/√10)(−x1 + 3x2) = ±(√21/3)(1/√10)(3x1 + x2)
−3x1 + 9x2 = ±3√21 x1 ± √21 x2
x2(9 ∓ √21) = (3 ± 3√21) x1
x2 = ((3 ± 3√21)/(9 ∓ √21)) x1

Plotting the asymptotes and copying the graph from above onto the y1 and y2 axes gives the picture to the right.

REMARK

The advantage of finding the principal axes over just calculating the rotation is that we actually get precise equations of the axes and the asymptotes, which are required in some applications.

EXAMPLE 3 Sketch 2x1² + 4x1x2 + 5x2² = 1.

Solution: We have

A = [ 2 2 ],   so   det(A − λI) = | 2−λ   2  | = (λ − 6)(λ − 1)
    [ 2 5 ]                       |  2   5−λ |

We find corresponding eigenvectors of A are [1, 2]^T for λ = 6 and [−2, 1]^T for λ = 1. To demonstrate a rotation and a reflection, we will pick

P = [ −2/√5  1/√5 ]
    [ 1/√5   2/√5 ]


The change of variables ~y = P^T ~x changes 2x1² + 4x1x2 + 5x2² = 1 into the ellipse y1² + 6y2² = 1. Graphing gives the picture to the right.

To sketch 2x1² + 4x1x2 + 5x2² = 1 in the x1x2-plane we draw the y1-axis in the direction of [−2, 1]^T and the y2-axis in the direction of [1, 2]^T and copy the picture above onto the axes appropriately to get the picture to the right.

EXAMPLE 4 Sketch x1² + 4x1x2 − 2x2² = 1.

Solution: We have

A = [ 1  2 ],   so   det(A − λI) = | 1−λ   2   | = (λ + 3)(λ − 2)
    [ 2 −2 ]                       |  2   −2−λ |

We find corresponding eigenvectors of A are [−1, 2]^T for λ = −3 and [2, 1]^T for λ = 2. Thus,

P = [ −1/√5  2/√5 ]
    [ 2/√5   1/√5 ]

orthogonally diagonalizes A. Then, the change of variables ~y = P^T ~x changes x1² + 4x1x2 − 2x2² = 1 into −3y1² + 2y2² = 1. This is a hyperbola, so we find the equations of its asymptotes. We get

0 = −3y1² + 2y2²
y2 = ±(√6/2) y1 ≈ ±1.225 y1


To sketch x1² + 4x1x2 − 2x2² = 1 in the x1x2-plane we first draw the y1-axis in the direction of [−1, 2]^T and the y2-axis in the direction of [2, 1]^T. Next, we use the change of variables ~y = P^T ~x to convert the equations of the asymptotes of the hyperbola to equations in the x1x2-plane. We have

[ y1 ] = (1/√5) [ −1 2 ] [ x1 ] = (1/√5) [ −x1 + 2x2 ]
[ y2 ]          [ 2  1 ] [ x2 ]          [ 2x1 + x2  ]

Hence, we get that the equations of the asymptotes are

y2 = ±(√6/2) y1
(1/√5)(2x1 + x2) = ±(√6/2)(1/√5)(−x1 + 2x2)
4x1 + 2x2 = ∓√6 x1 ± 2√6 x2
x2(2 ∓ 2√6) = (−4 ∓ √6) x1
x2 = ((−4 ∓ √6)/(2 ∓ 2√6)) x1

Plotting the asymptotes and copying the graph from above onto the y1 and y2 axes gives the picture to the right.

Notice that in the last two examples, the change of variables we used actually corresponded to a rotation and a reflection.

Section 10.4 Problems

1. Sketch the graph of each of the following equations showing both the original and new axes. For any hyperbola, find the equations of the asymptotes.

(a) x1² + 8x1x2 + x2² = 6
(b) 3x1² − 2x1x2 + 3x2² = 12
(c) −4x1² + 4x1x2 − 7x2² = 8
(d) −3x1² − 4x1x2 = 4
(e) −x1² + 4x1x2 + 2x2² = 6
(f) 4x1² + 4x1x2 + 4x2² = 12


10.5 Optimizing Quadratic Forms

In calculus we use quadratic forms to classify critical points as local minima and maxima. However, in many other applications of quadratic forms, we actually want to find the maximum and/or minimum of the quadratic form subject to a constraint. Most of the time, it is possible to use a change of variables so that the constraint is ‖~x‖ = 1. That is, given a quadratic form Q(~x) = ~x^T A ~x on R^n we want to find the maximum and minimum value of Q(~x) subject to ‖~x‖ = 1. For ease, we instead use the equivalent constraint

1 = ‖~x‖² = x1² + · · · + xn²

To develop a procedure for solving this problem in general, we begin by looking at a couple of examples.

EXAMPLE 1 Find the maximum and minimum of Q(x1, x2) = 2x1² + 3x2² subject to the constraint x1² + x2² = 1.

Solution: Since we don't have a general method for solving this type of problem, we resort to the basics. We will first try a few points (x1, x2) that satisfy the constraint. We get

Q(1, 0) = 2
Q(√3/2, 1/2) = 9/4
Q(1/√2, 1/√2) = 5/2
Q(1/2, √3/2) = 11/4
Q(0, 1) = 3

Although we have only tried a few points, we may guess that 3 is the maximum value. Since we have already found that Q(0, 1) = 3, to prove that 3 is the maximum we just need to show that 3 is an upper bound for Q(x1, x2) subject to x1² + x2² = 1. Indeed, we have that

Q(x1, x2) = 2x1² + 3x2² ≤ 3x1² + 3x2² = 3(x1² + x2²) = 3

Hence, 3 is the maximum.

Similarly, we have

Q(x1, x2) = 2x1² + 3x2² ≥ 2x1² + 2x2² = 2(x1² + x2²) = 2

Hence, 2 is a lower bound for Q(x1, x2) subject to the constraint and Q(1, 0) = 2, so 2 is the minimum.
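As a quick sanity check (a sketch assuming NumPy; not part of the notes), we can sample many points on the unit circle and confirm that the values of Q stay between 2 and 3:

    import numpy as np

    t = np.linspace(0.0, 2.0 * np.pi, 100001)
    x1, x2 = np.cos(t), np.sin(t)       # points with x1^2 + x2^2 = 1
    Q = 2.0 * x1**2 + 3.0 * x2**2
    print(Q.max(), Q.min())             # approximately 3.0 and 2.0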


This example was extremely easy since we had no cross-terms in the quadratic form. We now look at what happens if we have cross-terms.

EXAMPLE 2 Find the maximum and minimum of Q(x1, x2) = 7x1² + 8x1x2 + x2² subject to the constraint x1² + x2² = 1.

Solution: We first try some values of (x1, x2) that satisfy the constraint. We have

Q(1, 0) = 7
Q(1/√2, 1/√2) = 8
Q(0, 1) = 1
Q(−1/√2, 1/√2) = 0
Q(−1, 0) = 7

From this we might think that the maximum is 8 and the minimum is 0, although we again realize that we have tested very few points. Indeed, if we try more points, we quickly find that these are not the maximum and minimum. So, instead of testing points, we need to think about where the maximum and minimum should occur. In the previous example, they occurred at (1, 0) and (0, 1), which are on the principal axes. Thus, it makes sense to consider the principal axes of this quadratic form as well.

We find that Q(x1, x2) has eigenvalues λ = 9 and λ = −1 with corresponding unit eigenvectors ~v1 = [2/√5, 1/√5]^T and ~v2 = [1/√5, −2/√5]^T. Taking these for (x1, x2) we get

Q(2/√5, 1/√5) = 9,   Q(1/√5, −2/√5) = −1

Moreover, taking

~y = P^T ~x = (1/√5) [ 2  1 ] [ x1 ] = (1/√5) [ 2x1 + x2 ]
                     [ 1 −2 ] [ x2 ]          [ x1 − 2x2 ]

we get

Q(x1, x2) = ~y^T D ~y = 9((2/√5)x1 + (1/√5)x2)² − ((1/√5)x1 − (2/√5)x2)²
          ≤ 9((2/√5)x1 + (1/√5)x2)² + 9((1/√5)x1 − (2/√5)x2)²
          = 9[((2/√5)x1 + (1/√5)x2)² + ((1/√5)x1 − (2/√5)x2)²] = 9

since ((2/√5)x1 + (1/√5)x2)² + ((1/√5)x1 − (2/√5)x2)² = x1² + x2² = 1.

Similarly, we can show that Q(x1, x2) ≥ −1. Thus, the maximum of Q(x1, x2) subject to ‖~x‖ = 1 is 9 and the minimum is −1.


We generalize what we learned in the examples above to get the following theorem.

THEOREM 1 Let Q(~x) be a quadratic form on R^n with corresponding symmetric matrix A. The maximum value and minimum value of Q(~x) subject to the constraint ‖~x‖ = 1 are the greatest and least eigenvalues of A respectively. Moreover, these values occur when ~x is taken to be a corresponding unit eigenvector of the eigenvalue.

Proof: Since A is symmetric, we can find an orthogonal matrix P = [~v1 · · · ~vn] which orthogonally diagonalizes A, where ~v1, . . . , ~vn are arranged so that the corresponding eigenvalues λ1, . . . , λn satisfy λ1 ≥ λ2 ≥ · · · ≥ λn. Hence, there exists a diagonal matrix D such that

~x^T A ~x = ~y^T D ~y

where ~x = P~y. Since P is orthogonal we have that

‖~y‖ = ‖P~y‖ = ‖~x‖ = 1

Thus, the quadratic form ~y^T D ~y subject to ‖~y‖ = 1 takes the same set of values as the quadratic form ~x^T A ~x subject to ‖~x‖ = 1. Hence, we just need to find the maximum and minimum of ~y^T D ~y subject to ‖~y‖ = 1.

Since y1² + · · · + yn² = ‖~y‖² = 1 we have

~y^T D ~y = λ1y1² + · · · + λnyn² ≤ λ1y1² + · · · + λ1yn² = λ1(y1² + · · · + yn²) = λ1

and ~y^T D ~y = λ1 with ~y = [1, 0, . . . , 0]^T. Hence λ1 is the maximum value. Similarly,

~y^T D ~y = λ1y1² + · · · + λnyn² ≥ λny1² + · · · + λnyn² = λn(y1² + · · · + yn²) = λn

and ~y^T D ~y = λn with ~y = [0, . . . , 0, 1]^T. Hence λn is the minimum value. ∎

EXAMPLE 3 Let A = [ −3 0 −1 ]
                  [ 0  3  0 ]
                  [ −1 0 −3 ]

Find the maximum and minimum value of the quadratic form Q(~x) = ~x^T A ~x subject to x1² + x2² + x3² = 1.

Solution: The characteristic polynomial of A is

C(λ) = det [ −3−λ  0   −1  ]
           [  0   3−λ   0  ] = −(λ − 3)(λ + 4)(λ + 2)
           [ −1    0  −3−λ ]

Thus, by Theorem 1 the maximum of Q(~x) subject to the constraint is 3 and the minimum is −4.
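Theorem 1 reduces the optimization to an eigenvalue computation, as this sketch shows for the matrix of Example 3 (assuming NumPy; not part of the notes):

    import numpy as np

    A = np.array([[-3.0, 0.0, -1.0],
                  [ 0.0, 3.0,  0.0],
                  [-1.0, 0.0, -3.0]])
    evals = np.linalg.eigvalsh(A)       # ascending: -4, -2, 3
    print(evals.min(), evals.max())     # minimum -4, maximum 3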


EXAMPLE 4 Find the maximum and minimum value of the quadratic form

Q(x1, x2, x3) = 4x1² + 2x1x2 + 2x1x3 + 4x2² − 2x2x3 + 4x3²

subject to x1² + x2² + x3² = 1. Find vectors at which the maximum and minimum occur.

Solution: The characteristic polynomial of the corresponding symmetric matrix is

C(λ) = det [ 4−λ  1    1  ]
           [  1  4−λ  −1  ] = −(λ − 5)²(λ − 2)
           [  1  −1   4−λ ]

We find that unit eigenvectors corresponding to λ1 = 5 and λ2 = 2 are ~v1 = [1/√2, 1/√2, 0]^T and ~v2 = [−1/√3, 1/√3, 1/√3]^T respectively. Thus, by Theorem 1 the maximum of Q(~x) subject to x1² + x2² + x3² = 1 is 5 and occurs when ~x = ~v1, and the minimum is 2 and occurs when ~x = ~v2.

Section 10.5 Problems

1. Find the maximum and minimum of each quadratic form subject to ‖~x‖ = 1.

(a) Q(~x) = 4x1² − 2x1x2 + 4x2²
(b) Q(~x) = 3x1² + 10x1x2 − 3x2²
(c) Q(~x) = −x1² + 8x2² + 4x2x3 + 11x3²
(d) Q(~x) = 3x1² − 4x1x2 + 8x1x3 + 6x2² + 4x2x3 + 3x3²
(e) Q(~x) = 2x1² + 2x1x2 + 2x1x3 + 2x2² + 2x2x3 + 2x3²

10.6 Singular Value Decomposition

We have now seen that finding the maximum and minimum of a quadratic form subject to ‖~x‖ = 1 is very easy. However, in many applications we are not lucky enough to get a symmetric matrix; in fact, we often do not even get a square matrix. Thus, it is very desirable to derive a similar method for finding the maximum and minimum of ‖A~x‖ for an m × n matrix A subject to ‖~x‖ = 1. This derivation will lead us to a very important matrix factorization known as the singular value decomposition. We again begin by looking at an example.


EXAMPLE 1 Consider the linear mapping f : R³ → R² defined by f(~x) = A~x with

A = [ 1 0  1 ]
    [ 1 1 −1 ]

Since the range is now a subspace of R², we want to find the maximum length of A~x subject to ‖~x‖ = 1. That is, we want to maximize ‖A~x‖. Observe that

‖A~x‖² = (A~x)^T(A~x) = ~x^T A^T A ~x

Since A^T A is symmetric, this is a quadratic form. Hence, using Theorem 10.5.1, to maximize ‖A~x‖ we just need to find the square root of the largest eigenvalue of A^T A. We have

A^T A = [ 2  1  0 ]
        [ 1  1 −1 ]
        [ 0 −1  2 ]

The characteristic polynomial of A^T A is C(λ) = −λ(λ − 3)(λ − 2). Thus, the eigenvalues of A^T A are 3, 2, and 0. Hence, the maximum of ‖A~x‖ subject to ‖~x‖ = 1 is √3 and the minimum is 0. Moreover, these occur at the corresponding eigenvectors [−1/√3, −1/√3, 1/√3]^T and [−1/√6, 2/√6, 1/√6]^T of A^T A.

In the example, we found that the eigenvalues of A^T A were all non-negative. This was important as we had to take the square root of them to maximize/minimize ‖A~x‖. We now prove that this is always the case.

THEOREM 1 Let A be an m × n matrix and let λ1, . . . , λn be the eigenvalues of A^T A with corresponding unit eigenvectors ~v1, . . . , ~vn. Then λ1, . . . , λn are all non-negative. In particular,

‖A~vi‖ = √λi

Proof: For 1 ≤ i ≤ n we have A^T A ~vi = λi~vi and hence

‖A~vi‖² = (A~vi)^T A~vi = ~vi^T A^T A ~vi = ~vi^T(λi~vi) = λi ~vi^T ~vi = λi‖~vi‖² = λi

since ‖~vi‖ = 1. Since ‖A~vi‖² ≥ 0, each λi is non-negative. ∎
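The identity ‖A~vi‖ = √λi is easy to confirm numerically. A sketch (assuming NumPy; not part of the notes), using the matrix of Example 1:

    import numpy as np

    A = np.array([[1.0, 0.0,  1.0],
                  [1.0, 1.0, -1.0]])
    evals, V = np.linalg.eigh(A.T @ A)  # eigenpairs of A^T A
    for lam, v in zip(evals, V.T):      # columns of V are unit eigenvectors
        print(np.isclose(np.linalg.norm(A @ v), np.sqrt(max(lam, 0.0))))  # True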

Observe from the example above that the square roots of the largest and smallest eigenvalues of A^T A are the maximum and minimum values of ‖A~x‖ subject to ‖~x‖ = 1. So, these are behaving like the eigenvalues of a symmetric matrix. This motivates the following definition.

DEFINITION
Singular Values

The singular values σ1, . . . , σn of an m × n matrix A are the square roots of the eigenvalues of A^T A, arranged so that σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.


EXAMPLE 2 Find the singular values of A = [ 1 1 1 ]
                                          [ 2 2 2 ]

Solution: We have

A^T A = [ 5 5 5 ]
        [ 5 5 5 ]
        [ 5 5 5 ]

The characteristic polynomial is C(λ) = −λ²(λ − 15), so the eigenvalues are (from greatest to least) λ1 = 15, λ2 = 0, and λ3 = 0. Thus, the singular values of A are σ1 = √15, σ2 = 0, and σ3 = 0.

EXAMPLE 3 Find the singular values of B = [ 1  1 ]
                                          [ −1 2 ]
                                          [ −1 1 ]

Solution: We have

B^T B = [ 3 −2 ]
        [ −2 6 ]

The characteristic polynomial is C(λ) = (λ − 2)(λ − 7). Hence, the eigenvalues of B^T B are λ1 = 7 and λ2 = 2, so the singular values of B are σ1 = √7 and σ2 = √2.

EXAMPLE 4 Find the singular values of

A = [ 2 −1 ]         B = [ 2  1 1 ]
    [ 1  0 ]   and       [ −1 0 2 ]
    [ 1  2 ]

Solution: We have

A^T A = [ 6 0 ]
        [ 0 5 ]

Hence, the eigenvalues of A^T A are λ1 = 6 and λ2 = 5. Thus, the singular values of A are σ1 = √6 and σ2 = √5.

We have

B^T B = [ 5 2 0 ]
        [ 2 1 1 ]
        [ 0 1 5 ]

The characteristic polynomial of B^T B is

C(λ) = det [ 5−λ  2    0  ]
           [  2  1−λ   1  ] = −λ(λ − 5)(λ − 6)
           [  0   1   5−λ ]

Hence, the eigenvalues of B^T B are λ1 = 6, λ2 = 5, and λ3 = 0. Thus, the singular values of B are σ1 = √6, σ2 = √5, and σ3 = 0.

Recall that 0 is an eigenvalue of an n × n matrix A if and only if rank A ≠ n. In particular, we know that the number of non-zero eigenvalues of A^T A is the rank of A^T A. Thus, it is natural to ask if there is a relationship between the number of non-zero singular values of an m × n matrix A and the rank of A. In the problems above, we see that the rank of A equals the number of non-zero singular values. To prove this result in general, we use the following theorem.


THEOREM 2 Let A be an m × n matrix. Then rank(A^T A) = rank(A).

Proof: If A~x = ~0, then A^T A~x = A^T~0 = ~0. Hence the nullspace of A is a subset of the nullspace of A^T A. On the other hand, consider A^T A~x = ~0. Then

‖A~x‖² = (A~x) · (A~x) = (A~x)^T(A~x) = ~x^T A^T A ~x = ~x^T ~0 = 0

Thus, A~x = ~0. Hence, the nullspace of A^T A is a subset of the nullspace of A. Therefore, dim(Null(A^T A)) = dim(Null(A)) and so by the Rank-Nullity Theorem

rank(A^T A) = n − dim(Null(A^T A)) = n − dim(Null(A)) = rank(A)  ∎

COROLLARY 3 If A is an m × n matrix and rank(A) = r, then A has r non-zero singular values.

We now want to extend the similarity of singular values and eigenvalues by defining singular vectors. Observe that if A is an m × n matrix with m ≠ n, then we cannot have A~v = σ~v since A~v ∈ R^m. Hence, the best we can do to match the definition of eigenvalues and eigenvectors is to pick suitable non-zero vectors ~v ∈ R^n and ~u ∈ R^m such that A~v = σ~u.

By definition, for any non-zero singular value σ of A there is a vector ~v ≠ ~0 such that A^T A~v = σ²~v. Thus, if we have A~v = σ~u, then we have A^T A~v = A^T(σ~u), so σ²~v = σA^T~u. Dividing by σ, we see that we must also have A^T~u = σ~v. Moreover, by Theorem 1, if ~v is a unit eigenvector of A^T A, then we get that ~u = (1/σ)A~v is also a unit vector.

Therefore, for a non-zero singular value σ of A, we will want unit vectors ~v and ~u such that

A~v = σ~u   and   A^T~u = σ~v

However, our derivation does not work for σ = 0. In this case, we will see that we are satisfied with just one of these conditions being satisfied.

DEFINITION
Singular Vectors

Let A be an m × n matrix. If ~v ∈ R^n and ~u ∈ R^m are unit vectors and σ ≠ 0 is a singular value of A such that

A~v = σ~u   and   A^T~u = σ~v

then we say that ~u is a left singular vector of A and ~v is a right singular vector of A. Additionally, if ~u is a unit vector such that A^T~u = ~0, then ~u is a left singular vector of A. If ~v is a unit vector such that A~v = ~0, then ~v is a right singular vector of A.

This definition of right singular vectors not only preserves the relationship between these vectors and the eigenvectors of A^T A, but also gives the corresponding result for the left singular vectors and the eigenvectors of AA^T.


THEOREM 4 Let A be an m × n matrix. If ~v is a right singular vector of A, then ~v is an eigenvector of A^T A. If ~u is a left singular vector of A, then ~u is an eigenvector of AA^T.

Now that we have singular values and singular vectors mimicking eigenvalues and eigenvectors, we wish to try to mimic orthogonal diagonalization. Let A be an m × n matrix. As was the case with singular vectors, if m ≠ n, then P^T A P is not defined for a square matrix P. Thus, we want to find an m × m orthogonal matrix U and an n × n orthogonal matrix V such that U^T A V = Σ, where Σ is an m × n "diagonal" matrix.

To do this, we require that for any m × n matrix A there exists an orthonormal basis for R^n of right singular vectors and an orthonormal basis for R^m of left singular vectors. We now prove that these always exist.

LEMMA 5 Let A be an m × n matrix with rank(A) = r, and suppose that {~v1, . . . , ~vn} is an orthonormal basis for R^n consisting of eigenvectors of A^T A, arranged so that the corresponding eigenvalues λ1, . . . , λn are ordered from greatest to least, and let σ1, . . . , σn be the singular values of A. Then {A~v1/σ1, . . . , A~vr/σr} is an orthonormal basis for Col A.

Proof: Since A has rank r, we know that A^T A has rank r and so A^T A has r non-zero eigenvalues. Thus, σ1, . . . , σr are the r non-zero singular values of A. Now observe that for 1 ≤ i, j ≤ r with i ≠ j we have

A~vi · A~vj = (A~vi)^T A~vj = ~vi^T A^T A ~vj = ~vi^T λj~vj = λj ~vi^T ~vj = 0

since {~v1, . . . , ~vr} is an orthonormal set. Hence, {A~v1, . . . , A~vr} is orthogonal, and so {A~v1/σ1, . . . , A~vr/σr} is orthonormal by Theorem 1. Moreover, we know that dim Col(A) = r, so this is an orthonormal basis for Col A. ∎

THEOREM 6 Let A be an m × n matrix with rank r. There exists an orthonormal basis {~v1, . . . , ~vn} of R^n of right singular vectors of A and an orthonormal basis {~u1, . . . , ~um} of R^m of left singular vectors of A.

Proof: By definition, eigenvectors of A^T A are right singular vectors of A. Hence, since A^T A is symmetric, we can find an orthonormal basis {~v1, . . . , ~vn} of right singular vectors of A.

By Lemma 5, the left singular vectors ~ui = (1/σi)A~vi, 1 ≤ i ≤ r, form an orthonormal basis for Col(A). Also, by definition, a left singular vector ~uj of A corresponding to σ = 0 lies in the nullspace of A^T. Hence, by the Fundamental Theorem of Linear Algebra, if {~ur+1, . . . , ~um} is an orthonormal basis for the nullspace of A^T, then {~u1, . . . , ~um} is an orthonormal basis for R^m of left singular vectors of A. ∎


Thus, for any m × n matrix A with rank r we have an orthonormal basis {~v1, . . . , ~vn} for R^n of right singular vectors of A corresponding to the n singular values σ1, . . . , σn of A, and an orthonormal basis {~u1, . . . , ~um} for R^m of left singular vectors such that

A[~v1 · · · ~vn] = [A~v1 · · · A~vr  A~vr+1 · · · A~vn]
                 = [σ1~u1 · · · σr~ur  ~0 · · · ~0]
                 = [~u1 · · · ~um] Σ

where Σ is the m × n matrix with (Σ)ii = σi for 1 ≤ i ≤ r and all other entries of Σ are 0.

Hence, we have that there exist an orthogonal matrix V and an orthogonal matrix U such that AV = UΣ. Instead of writing this as U^T A V = Σ, we typically write this as A = UΣV^T to get a matrix decomposition of A.

DEFINITION
Singular Value Decomposition (SVD)

A singular value decomposition of an m × n matrix A is a factorization of the form

A = UΣV^T

where U is an orthogonal matrix containing left singular vectors of A, V is an orthogonal matrix containing right singular vectors of A, and Σ is the m × n matrix with (Σ)ii = σi for 1 ≤ i ≤ r and all other entries of Σ are 0.

ALGORITHM

To find a singular value decomposition of a matrix, we follow what we did above.

(1) Find the eigenvalues λ1, . . . , λn of A^T A arranged from greatest to least and a corresponding set of orthonormal eigenvectors {~v1, . . . , ~vn}.

(2) Let V = [~v1 · · · ~vn] and let Σ be the m × n matrix whose first r diagonal entries are the r non-zero singular values of A arranged from greatest to least and all other entries 0.

(3) Find left singular vectors of A by computing ~ui = (1/σi)A~vi for 1 ≤ i ≤ r. Then extend the set {~u1, . . . , ~ur} to an orthonormal basis for R^m. Take U = [~u1 · · · ~um].

Then A = UΣV^T.

REMARK

As indicated in the proof of Theorem 6, one way to extend {~u1, . . . , ~ur} to an orthonormal basis for R^m is by finding an orthonormal basis for Null(A^T).
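The three steps translate directly into code. The sketch below is not part of the notes and assumes NumPy; the function name is made up. It builds V from eigenvectors of A^T A, builds Σ from the non-zero singular values, computes ~ui = (1/σi)A~vi, and then extends to an orthonormal basis of R^m by running Gram-Schmidt over the standard basis vectors rather than by computing Null(A^T) explicitly.

    import numpy as np

    def svd_from_algorithm(A, tol=1e-10):
        m, n = A.shape
        evals, V = np.linalg.eigh(A.T @ A)      # step (1): eigenpairs of A^T A
        order = np.argsort(evals)[::-1]         # arrange greatest to least
        sigma = np.sqrt(np.clip(evals[order], 0.0, None))
        V = V[:, order]
        r = int(np.sum(sigma > tol))            # r = rank(A)
        Sigma = np.zeros((m, n))                # step (2)
        Sigma[:r, :r] = np.diag(sigma[:r])
        basis = [(A @ V[:, i]) / sigma[i] for i in range(r)]   # step (3)
        for e in np.eye(m):                     # extend by Gram-Schmidt
            w = e - sum((u @ e) * u for u in basis)
            if np.linalg.norm(w) > tol:
                basis.append(w / np.linalg.norm(w))
        U = np.column_stack(basis[:m])
        return U, Sigma, V.T

    A = np.array([[1.0, -1.0, 3.0],
                  [3.0,  1.0, 1.0]])
    U, S, Vt = svd_from_algorithm(A)
    print(np.allclose(U @ S @ Vt, A))           # True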


EXAMPLE 5 Find a singular value decomposition of A = [ 1 −1 3 ]
                                                     [ 3  1 1 ]

Solution: We find that

A^T A = [ 10  2  6 ]
        [ 2   2 −2 ]
        [ 6  −2 10 ]

has eigenvalues (ordered from greatest to least) λ1 = 16, λ2 = 6, and λ3 = 0. Corresponding orthonormal eigenvectors are

~v1 = [1/√2, 0, 1/√2]^T,  ~v2 = [1/√3, 1/√3, −1/√3]^T,  ~v3 = [1/√6, −2/√6, −1/√6]^T

So we let

V = [ 1/√2  1/√3   1/√6 ]
    [  0    1/√3  −2/√6 ]
    [ 1/√2 −1/√3  −1/√6 ]

The singular values of A are σ1 = √16 = 4 and σ2 = √6, so we let

Σ = [ 4  0  0 ]
    [ 0 √6  0 ]

Next, we compute

~u1 = (1/σ1)A~v1 = (1/4)[4/√2, 4/√2]^T = [1/√2, 1/√2]^T
~u2 = (1/σ2)A~v2 = (1/√6)[−3/√3, 3/√3]^T = [−1/√2, 1/√2]^T

Since {~u1, ~u2} forms a basis for R² we take

U = [ 1/√2  −1/√2 ]
    [ 1/√2   1/√2 ]

Then we have a singular value decomposition A = UΣV^T.
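For comparison (a sketch assuming NumPy; not part of the notes), the library routine numpy.linalg.svd produces the same singular values for this matrix; the singular vectors may differ from ours by a sign, since singular vectors are not unique.

    import numpy as np

    A = np.array([[1.0, -1.0, 3.0],
                  [3.0,  1.0, 1.0]])
    U, s, Vt = np.linalg.svd(A)
    print(np.round(s, 8))                             # [4.  2.44948975] = [4, sqrt(6)]
    print(np.allclose(U @ np.diag(s) @ Vt[:2, :], A)) # True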

EXAMPLE 6 Find a singular value decomposition of B = [ 2 −4 ]
                                                     [ 2  2 ]
                                                     [ −4 0 ]
                                                     [ 1  4 ]

Solution: We have

B^T B = [ 25  0 ]
        [ 0  36 ]

The eigenvalues are λ1 = 36 and λ2 = 25, with corresponding orthonormal eigenvectors [0, 1]^T and [1, 0]^T. So

V = [ 0 1 ]
    [ 1 0 ]

The singular values of B are σ1 = 6 and σ2 = 5, so

Σ = [ 6 0 ]
    [ 0 5 ]
    [ 0 0 ]
    [ 0 0 ]


We compute

~u1 = (1/σ1)B~v1 = (1/6)[−4, 2, 0, 4]^T
~u2 = (1/σ2)B~v2 = (1/5)[2, 2, −4, 1]^T

But we only have 2 orthogonal vectors in R⁴, so we need to extend {~u1, ~u2} to an orthonormal basis for R⁴. We know that we can complete the basis by finding an orthonormal basis for Null(B^T). Applying the Gram-Schmidt algorithm to a basis for Null(B^T) we get the vectors

~u3 = (1/3)[1, −2, 0, 2]^T,  ~u4 = (1/15)[8, 8, 9, 4]^T

We let U = [~u1 ~u2 ~u3 ~u4] and then we have a singular value decomposition UΣV^T of B.

EXAMPLE 7 Find a singular value decomposition for A = [ 1  1 ]
                                                      [ 2  2 ]
                                                      [ −1 −1 ]

Solution: We have

A^T A = [ 6 6 ]
        [ 6 6 ]

Thus, by inspection, the eigenvalues of A^T A are λ1 = 12 and λ2 = 0, with corresponding unit eigenvectors ~v1 = [1/√2, 1/√2]^T and ~v2 = [−1/√2, 1/√2]^T respectively. Hence, the singular values of A are σ1 = √12 and σ2 = 0. Thus, we have

V = [ 1/√2  −1/√2 ]         Σ = [ √12 0 ]
    [ 1/√2   1/√2 ]             [ 0   0 ]
                                [ 0   0 ]

We compute

~u1 = (1/σ1)A~v1 = (1/√24)[2, 4, −2]^T = (1/√6)[1, 2, −1]^T

Since we only have one vector in R³, we need to extend {~u1} to an orthonormal basis for R³. To do this, we find an orthonormal basis for Null(A^T).


We observe that a basis for Null(A^T) is { [−2, 1, 0]^T, [1, 0, 1]^T }. Applying the Gram-Schmidt algorithm to this basis and normalizing gives

~u2 = (1/√5)[−2, 1, 0]^T,  ~u3 = (1/√30)[1, 2, 5]^T

We let U = [~u1 ~u2 ~u3] and then we have a singular value decomposition UΣV^T of A.

Section 10.6 Problems

1. Find a singular value decomposition for each of the following matrices.

(a) [ 2 2 ]    (b) [ 3 −4 ]    (c) [ 2 3 ]
    [ −1 2 ]       [ 8  6 ]        [ 0 2 ]

(d) [ 1 2 ]    (e) [ 1 1 1 1 ]    (f) [ 4 4 −2 ]
    [ 1 2 ]        [ 2 2 2 2 ]        [ 3 2  4 ]
    [ 1 2 ]
    [ 1 2 ]

2. Let A be an m × n matrix and let P be an m × m orthogonal matrix. Show that PA has the same singular values as A.

3. Let UΣV^T be a singular value decomposition for an m × n matrix A with rank r. Find, with proof, an orthonormal basis for Row(A), Col(A), Null(A), and Null(A^T) from the columns of U and V.

4. Let A = [ 2 1 1 ]
           [ 0 2 0 ]
           [ 0 0 2 ]

be the standard matrix of a linear mapping L. (Observe that A is not diagonalizable.)

(a) Find a basis B for R³ of right singular vectors of A.

(b) Find a basis C for R³ of left singular vectors of A.

(c) Determine C[L]B.


Chapter 11

Complex Vector Spaces

Our goal in this chapter is to extend everything we have done with real vector spaces to complex vector spaces. That is, vector spaces which use complex numbers for their scalars instead of just real numbers. We begin with a very brief review of the basic operations on complex numbers.

11.1 Complex Number Review

Recall that a complex number is a number of the form z = a + bi where a, b ∈ R and i² = −1. The set of all complex numbers is denoted C. We call a the real part of z and denote it by

Re z = a

and we call b the imaginary part of z and denote it by

Im z = b

It is very important to notice that the real numbers are a subset of the complex numbers. In particular, every real number a is the complex number a + 0i. If the real part of a complex number z is 0, then we say that z is imaginary. Any complex number which has non-zero imaginary part is said to be non-real.

We define addition and subtraction of complex numbers by

(a + bi) ± (c + di) = (a ± c) + (b ± d)i

EXAMPLE 1 We have

(3 + 4i) + (2 − 3i) = (3 + 2) + (4 − 3)i = 5 + i
(1 − 3i) − (4 + 2i) = (1 − 4) + (−3 − 2)i = −3 − 5i


Observe that addition of complex numbers is defined by adding the real parts and adding the imaginary parts of the complex numbers. That is, we add complex numbers componentwise, just like vectors in R^n.

In fact, the set C with this rule for addition and with scalar multiplication by a real scalar defined by

t(a + bi) = ta + tbi,  t ∈ R

forms a vector space over R. Let's find a basis for this real vector space. Notice that every vector in C can be written as a + bi with a, b ∈ R, so {1, i} spans C. Also, {1, i} is clearly linearly independent as the only solution to

t1·1 + t2·i = 0 + 0i

is t1 = t2 = 0 since t1 and t2 are real scalars. Hence, C forms a two dimensional real vector space. Consequently, it is isomorphic to every other two dimensional real vector space. In particular, it is isomorphic to the plane R². As a result, the set C is often called the complex plane. We generally think of having an isomorphism which maps the complex number z = a + bi to the point (a, b) in R².

We can use this isomorphism to define the absolute value of a complex number z = a + bi. Recall that the absolute value of a real number x measures the distance that x is from the origin. So, we define the absolute value of z as the distance from the point (a, b) in R² to the origin.

DEFINITION
Absolute Value

The absolute value or modulus of a complex number z = a + bi is defined by

|z| = √(a² + b²)

EXAMPLE 2 We have

|2 − 3i| = √(2² + (−3)²) = √13

|1/2 + (1/2)i| = √((1/2)² + (1/2)²) = 1/√2

Of course, this has the same properties as the real absolute value.

THEOREM 1 Let w, z ∈ C. Then

(1) |z| is a non-negative real number.
(2) |z| = 0 if and only if z = 0.
(3) |w + z| ≤ |w| + |z|


REMARK

If z = a + bi is real, then b = 0, so |z| = |a + 0i| = √(a² + 0²) = |a|. Hence, this is a direct generalization of the real absolute value function.

The ability to represent complex numbers as points in the plane is very important in the study of complex numbers. However, viewing C as a real vector space has one serious limitation: it only defines the multiplication of a real scalar by a complex number and does not immediately give us a way of multiplying two complex numbers together. So, we return to the standard form of complex numbers and define multiplication of complex numbers by using the normal distributive property and the fact that i² = −1. We get

(a + bi)(c + di) = ac + adi + bci + bdi² = ac − bd + (ad + bc)i

EXAMPLE 3 We have

(3 + 4i)(2 − 3i) = 3(2) − 4(−3) + [4(2) + 3(−3)]i = 18 − i
(1 − 3i)(4 + 2i) = 1(4) − (−3)(2) + [(−3)(4) + 1(2)]i = 10 − 10i

We get the following familiar properties.

THEOREM 2 Let z1, z2, z3 ∈ C. Then

(1) z1 + z2 = z2 + z1
(2) z1z2 = z2z1
(3) z1 + (z2 + z3) = (z1 + z2) + z3
(4) z1(z2z3) = (z1z2)z3
(5) z1(z2 + z3) = z1z2 + z1z3
(6) |z1z2| = |z1||z2|

We look at one more important example of complex multiplication.

EXAMPLE 4 We have

(a + bi)(a − bi) = a² + b² + [b(a) − a(b)]i = a² + b² = |a + bi|²

This example motivates the following definition.

DEFINITION
Complex Conjugate

Let z = a + bi be a complex number. Then the complex conjugate of z, denoted z̄, is defined by

z̄ = a − bi


EXAMPLE 5 We have

(4 − 3i)‾ = 4 + 3i
(3i)‾ = −3i
4̄ = 4
(2 + 2i)‾ = 2 − 2i

We have many important properties of complex conjugates.

THEOREM 3 Let w, z ∈ C with z = a + bi. Then

(1) (z̄)‾ = z
(2) z is real if and only if z̄ = z
(3) z is imaginary if and only if z̄ = −z
(4) (z ± w)‾ = z̄ ± w̄
(5) (zw)‾ = z̄ w̄
(6) z + z̄ = 2 Re(z) = 2a
(7) z − z̄ = 2i Im(z) = 2bi
(8) z z̄ = a² + b² = |z|²

Since the absolute value of a non-zero complex number is a positive real number, we can use property (8) to divide complex numbers. We demonstrate this with some examples.

EXAMPLE 6 Calculate (2 − 3i)/(1 + 2i).

Solution: To calculate this, we make the denominator real by multiplying the numerator and the denominator by the complex conjugate of the denominator. We get

(2 − 3i)/(1 + 2i) = (2 − 3i)/(1 + 2i) · (1 − 2i)/(1 − 2i) = (2 − 6 + (−3 − 4)i)/(1² + 2²) = (−4 − 7i)/5 = −4/5 − (7/5)i

We can easily check this answer. We have

(1 + 2i)(−4/5 − (7/5)i) = −4/5 + 14/5 + [−8/5 − 7/5]i = 2 − 3i

EXAMPLE 7 Calculate i/(1 + i).

Solution: We have

i/(1 + i) = i/(1 + i) · (1 − i)/(1 − i) = (1 + i)/(1 + 1) = 1/2 + (1/2)i
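Python's built-in complex numbers (written with 1j for i) can be used to check these calculations; this is a quick sketch, not part of the notes.

    print((2 - 3j) / (1 + 2j))     # (-0.8-1.4j), i.e. -4/5 - (7/5)i
    print(1j / (1 + 1j))           # (0.5+0.5j), i.e. 1/2 + (1/2)i
    print(abs(2 - 3j) ** 2)        # approximately 13.0, since |2 - 3i| = sqrt(13)
    print((2 - 3j).conjugate())    # (2+3j)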


EXERCISE 1 Calculate 5/(3 − 4i).

In some real world applications, such as problems involving electric circuits, one needs to solve systems of linear equations involving complex numbers. The procedure for solving systems of linear equations is the same, except that we can now multiply a row by a non-zero complex number and we can add a complex multiple of one row to another. We demonstrate this with a few examples.

EXAMPLE 8 Solve the system of linear equations

z1 + iz2 + (−1 + 2i)z3 = −1 + 2i
z2 + 2z3 = 2 + 2i
2z1 + (−1 + 2i)z2 + (−6 + 4i)z3 = −4

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1    i    −1+2i | −1+2i ]
[ 0    1      2   | 2+2i  ]
[ 2 −1+2i  −6+4i  | −4    ]

R3 − 2R1:

[ 1  i  −1+2i | −1+2i ]
[ 0  1    2   | 2+2i  ]
[ 0 −1   −4   | −2−4i ]

R1 − iR2 and R3 + R2:

[ 1 0 −1 | 1    ]
[ 0 1  2 | 2+2i ]
[ 0 0 −2 | −2i  ]

−(1/2)R3:

[ 1 0 −1 | 1    ]
[ 0 1  2 | 2+2i ]
[ 0 0  1 | i    ]

R1 + R3 and R2 − 2R3:

[ 1 0 0 | 1+i ]
[ 0 1 0 | 2   ]
[ 0 0 1 | i   ]

Hence, the solution is z1 = 1 + i, z2 = 2, and z3 = i.
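NumPy solves complex linear systems directly, which gives a quick check of this example (a sketch assuming NumPy; not part of the notes):

    import numpy as np

    A = np.array([[1,       1j, -1 + 2j],
                  [0,        1,       2],
                  [2, -1 + 2j, -6 + 4j]])
    b = np.array([-1 + 2j, 2 + 2j, -4])
    print(np.linalg.solve(A, b))   # [1.+1.j  2.+0.j  0.+1.j]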

REMARK

It is very easy to make calculation mistakes when working with complex numbers. Thus, it is highly recommended that you check your answer whenever possible.


EXAMPLE 9 Solve the system of linear equations

z1 + z2 + iz3 = 1
−2iz1 + z2 + (1 − 2i)z3 = −2i
(1 + i)z1 + (2 + 2i)z2 − 2z3 = 2

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1    1     i   | 1   ]
[ −2i  1   1−2i  | −2i ]
[ 1+i 2+2i  −2   | 2   ]

R2 + 2iR1 and R3 − (1 + i)R1:

[ 1   1      i    | 1   ]
[ 0  1+2i  −1−2i  | 0   ]
[ 0  1+i   −1−i   | 1−i ]

(1/(1+2i))R2:

[ 1   1    i   | 1   ]
[ 0   1   −1   | 0   ]
[ 0  1+i −1−i  | 1−i ]

R1 − R2 and R3 − (1 + i)R2:

[ 1 0 1+i | 1   ]
[ 0 1 −1  | 0   ]
[ 0 0  0  | 1−i ]

Hence, the system is inconsistent.

EXAMPLE 10 Find the general solution of the system of linear equations

z1 − iz2 = 1 + i
(1 + i)z1 + (1 − i)z2 + (1 − i)z3 = 1 + 3i
2z1 − 2iz2 + (3 + i)z3 = 1 + 5i

Solution: We row reduce the augmented matrix of the system to RREF.

[ 1    −i    0   | 1+i  ]
[ 1+i  1−i  1−i  | 1+3i ]
[ 2    −2i  3+i  | 1+5i ]

R2 − (1 + i)R1 and R3 − 2R1:

[ 1 −i   0  | 1+i   ]
[ 0  0  1−i | 1+i   ]
[ 0  0  3+i | −1+3i ]

(1/(1−i))R2:

[ 1 −i   0  | 1+i   ]
[ 0  0   1  | i     ]
[ 0  0  3+i | −1+3i ]

R3 − (3 + i)R2:

[ 1 −i 0 | 1+i ]
[ 0  0 1 | i   ]
[ 0  0 0 | 0   ]

Thus, z2 is a free variable. Since we are solving this system over C, this means that z2 can take any complex value. Hence, we let z2 = α ∈ C. Then the general solution is z1 = 1 + i + αi, z2 = α, and z3 = i.


Section 11.1 Problems

1. Calculate the following.

(a) (3 + 4i) − (−2 + 6i)   (b) (−1 + 2i)(3 + 2i)   (c) −3i(−2 + 3i)
(d) |(1 + i)(1 − 2i)(3 + 4i)|   (e) |1 + 6i|   (f) |2/3 − √2 i|
(g) 2/(1 − i)   (h) (4 − 3i)/(3 − 4i)   (i) (2 + 5i)/(−3 − 6i)

2. Solve the following systems of linear equations over C.

(a) z1 + (2 + i)z2 + iz3 = 1 + i
    iz1 + (−1 + 2i)z2 + 2iz4 = −i
    z1 + (2 + i)z2 + (1 + i)z3 + 2iz4 = 2 − i

(b) iz1 + 2z2 − (3 + i)z3 = 1
    (1 + i)z1 + (2 − 2i)z2 − 4z3 = i
    iz1 + 2z2 − (3 + 3i)z3 = 1 + 2i

(c) iz1 + (1 + i)z2 + z3 = 2i
    (1 − i)z1 + (1 − 2i)z2 + (−2 + i)z3 = −2 + i
    2iz1 + 2iz2 + 2z3 = 4 + 2i

(d) z1 − z2 + iz3 = 2i
    (1 + i)z1 − iz2 + iz3 = −2 + i
    (1 − i)z1 + (−1 + 2i)z2 + (1 + 2i)z3 = 3 + 2i

3. Prove that for any z1, z2, z3 ∈ C we have

z1(z2 + z3) = z1z2 + z1z3

4. Prove that for any positive integer n and z ∈ C we have

(z^n)‾ = (z̄)^n


11.2 Complex Vector Spaces

We saw in the last section that the set of all complex numbers C can be thought of as a two dimensional real vector space. However, we see this is inappropriate for the solution space of Example 5.1.10 since it requires multiplication by complex scalars. Thus, we need to extend the definition of a real vector space to a complex vector space.

DEFINITION
Complex Vector Space

A set V is called a vector space over C if there is an operation of addition and an operation of scalar multiplication such that for any ~v, ~z, ~w ∈ V and α, β ∈ C we have:

V1 ~z + ~w ∈ V
V2 (~z + ~w) + ~v = ~z + (~w + ~v)
V3 ~z + ~w = ~w + ~z
V4 There is a vector denoted ~0 in V such that ~z + ~0 = ~z. It is called the zero vector.
V5 For each ~z ∈ V there exists an element −~z such that ~z + (−~z) = ~0. −~z is called the additive inverse of ~z.
V6 α~z ∈ V
V7 α(β~z) = (αβ)~z
V8 (α + β)~z = α~z + β~z
V9 α(~z + ~w) = α~z + α~w
V10 1~z = ~z

This definition looks a lot like the definition of a real vectorspace. In fact, the onlydifference is that we now allow the scalars to be any complex number, instead of justany real number. In the rest of this section we will generalize most of our conceptsfrom real vector spaces to complex vector spaces.

EXAMPLE 1 The setCn =

z1...zn

| zi ∈ C

is a vector space overC with addition defined by

z1...zn

+

w1...

wn

=

z1 + w1...

zn + wn

and scalar multiplication defined by

α

z1...zn

=

αz1...αzn

for anyα ∈ C.

Just like a complex numberz ∈ C, we can split a vector inCn into a real and imaginarypart.


THEOREM 1 If $\vec z \in \mathbb{C}^n$, then there exist vectors $\vec x, \vec y \in \mathbb{R}^n$ such that
$$\vec z = \vec x + i\vec y$$

Other familiar real vector spaces can be turned into complex vector spaces.

EXAMPLE 2 The set $M_{m\times n}(\mathbb{C})$ of all $m \times n$ matrices with complex entries is a complex vector space with standard addition and complex scalar multiplication of matrices.

We now extend all of our theory from real vector spaces to complex vector spaces.

DEFINITION Subspace

If $S$ is a subset of a complex vector space $V$ and $S$ is a complex vector space under the same operations as $V$, then $S$ is said to be a subspace of $V$.

As in the real case, we use the Subspace Test to prove a non-empty subset $S$ of a complex vector space $V$ is a subspace of $V$.

EXAMPLE 3 Prove that $S = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \;\middle|\; z_1 + iz_2 = -z_3 \right\}$ is a subspace of $\mathbb{C}^3$.

Solution: By definition $S$ is a subset of $\mathbb{C}^3$. Moreover, $S$ is non-empty since $\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$ satisfies $0 + i(0) = -0$. Let $\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}, \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} \in S$. Then $z_1 + iz_2 = -z_3$ and $w_1 + iw_2 = -w_3$. Hence,
$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} z_1 + w_1 \\ z_2 + w_2 \\ z_3 + w_3 \end{bmatrix} \in S$$
since $(z_1+w_1) + i(z_2+w_2) = z_1 + iz_2 + w_1 + iw_2 = -z_3 - w_3 = -(z_3+w_3)$, and
$$\alpha\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} \alpha z_1 \\ \alpha z_2 \\ \alpha z_3 \end{bmatrix} \in S$$
since $\alpha z_1 + i(\alpha z_2) = \alpha(z_1 + iz_2) = \alpha(-z_3) = -(\alpha z_3)$. Thus, $S$ is a subspace of $\mathbb{C}^3$ by the Subspace Test.


EXAMPLE 4 Is $\mathbb{R}$ a subspace of $\mathbb{C}$ as a vector space over $\mathbb{C}$?

Solution: By definition $\mathbb{R}$ is a non-empty subset of $\mathbb{C}$. Let $x, y \in \mathbb{R}$. Then $x + y$ is also a real number, so the set is closed under addition. But, the difference between a real vector space and a complex vector space is that we have scalar multiplication by complex numbers in a complex vector space. So, if we take $2 \in \mathbb{R}$ and the scalar $i \in \mathbb{C}$, then we get $i(2) \notin \mathbb{R}$. Thus, $\mathbb{R}$ is not closed under complex scalar multiplication, and hence it is not a subspace of $\mathbb{C}$.

The concepts of linear independence, spanning, bases, and dimension are defined the same as in the real case except that we now have complex scalar multiplication.

EXAMPLE 5 Find a basis for $\mathbb{C}$ as a vector space over $\mathbb{C}$.

Solution: It may be tempting to think that a basis for $\mathbb{C}$ is $\{1, i\}$, but this would be incorrect since this set is linearly dependent in $\mathbb{C}$. In particular, the equation $\alpha_1 1 + \alpha_2 i = 0$ has solution $\alpha_1 = 1$ and $\alpha_2 = i$ since $1(1) + i(i) = 1 - 1 = 0$. A basis for $\mathbb{C}$ is $\{1\}$ since any complex number $a + bi$ is just equal to $(a+bi)(1)$.

EXERCISE 1 What is the standard basis for $\mathbb{C}^n$?

EXAMPLE 6 Determine the dimension of the subspace $S = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \;\middle|\; z_1 + iz_2 = -z_3 \right\}$ of $\mathbb{C}^3$.

Solution: Every vector in $S$ can be written in the form
$$\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} z_1 \\ z_2 \\ -z_1 - iz_2 \end{bmatrix} = z_1\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + z_2\begin{bmatrix} 0 \\ 1 \\ -i \end{bmatrix}$$
Thus, $\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -i \end{bmatrix} \right\}$ spans $S$ and is clearly linearly independent since neither vector is a scalar multiple of the other, so it is a basis for $S$. Thus, $\dim S = 2$.

EXERCISE 2 Find a basis for the four fundamental subspaces of $A = \begin{bmatrix} 1 & -1 & -i \\ i & 1 & -1+2i \\ -i & 1+2i & -3+2i \\ 2 & 0 & 2i \end{bmatrix}$.


REMARK

Be warned that it is possible for two complex vectors to be scalar multiples of each other without looking like they are. For example, can you determine by inspection which of the following vectors is a scalar multiple of $\begin{bmatrix} 1-i \\ -2i \end{bmatrix}$?
$$\begin{bmatrix} 1 \\ -1-i \end{bmatrix}, \quad \begin{bmatrix} -i \\ -1-i \end{bmatrix}, \quad \begin{bmatrix} 7-i \\ 8+6i \end{bmatrix}$$

Coordinates with respect to a basis, linear mappings, and matrices of linear mappings are also defined in exactly the same way.

EXAMPLE 7 Given that $B = \left\{ \begin{bmatrix} 1 \\ 1+i \\ 1-i \end{bmatrix}, \begin{bmatrix} -1 \\ -i \\ -1+2i \end{bmatrix}, \begin{bmatrix} i \\ i \\ 2+2i \end{bmatrix} \right\}$ is a basis of $\mathbb{C}^3$, find the $B$-coordinates of $\vec z = \begin{bmatrix} 2i \\ -2+i \\ 3+2i \end{bmatrix}$.

Solution: We need to find complex numbers $\alpha_1$, $\alpha_2$, and $\alpha_3$ such that
$$\begin{bmatrix} 2i \\ -2+i \\ 3+2i \end{bmatrix} = \alpha_1\begin{bmatrix} 1 \\ 1+i \\ 1-i \end{bmatrix} + \alpha_2\begin{bmatrix} -1 \\ -i \\ -1+2i \end{bmatrix} + \alpha_3\begin{bmatrix} i \\ i \\ 2+2i \end{bmatrix}$$
Row reducing the corresponding augmented matrix gives, applying $R_2 - (1+i)R_1$ and $R_3 - (1-i)R_1$,
$$\left[\begin{array}{ccc|c} 1 & -1 & i & 2i \\ 1+i & -i & i & -2+i \\ 1-i & -1+2i & 2+2i & 3+2i \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & -1 & i & 2i \\ 0 & 1 & 1 & -i \\ 0 & i & 1+i & 1 \end{array}\right]$$
then, applying $R_1 + R_2$ and $R_3 - iR_2$,
$$\sim \left[\begin{array}{ccc|c} 1 & 0 & 1+i & i \\ 0 & 1 & 1 & -i \\ 0 & 0 & 1 & 0 \end{array}\right]$$
and finally, applying $R_1 - (1+i)R_3$ and $R_2 - R_3$,
$$\sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & i \\ 0 & 1 & 0 & -i \\ 0 & 0 & 1 & 0 \end{array}\right]$$
Thus, $[\vec z]_B = \begin{bmatrix} i \\ -i \\ 0 \end{bmatrix}$.
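Finding $B$-coordinates is just solving a linear system whose coefficient matrix has the basis vectors as columns, so the answer can be checked the same way; a sketch, again assuming NumPy:

```python
import numpy as np

# Columns are the basis vectors of B from Example 7.
P = np.array([[1, -1, 1j],
              [1 + 1j, -1j, 1j],
              [1 - 1j, -1 + 2j, 2 + 2j]])
z = np.array([2j, -2 + 1j, 3 + 2j])

# Should print approximately [i, -i, 0], the B-coordinates of z.
print(np.linalg.solve(P, z))
```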


EXAMPLE 8 Find the standard matrix of $L : \mathbb{C}^2 \to \mathbb{C}^3$ given by $L(z_1, z_2) = \begin{bmatrix} iz_1 \\ z_2 \\ z_1 + z_2 \end{bmatrix}$.

Solution: The standard basis for $\mathbb{C}^2$ is $\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$. We have $L(1,0) = \begin{bmatrix} i \\ 0 \\ 1 \end{bmatrix}$ and $L(0,1) = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$. Hence,
$$[L] = \begin{bmatrix} L(1,0) & L(0,1) \end{bmatrix} = \begin{bmatrix} i & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}$$

Of course, the concepts of determinants and invertibility are also the same.

EXAMPLE 9 We have
$$\det\begin{bmatrix} 1+i & 2 & 3 \\ 0 & -2i & -2+3i \\ 0 & 0 & 1-i \end{bmatrix} = (1+i)(-2i)(1-i) = -4i$$
$$\det\begin{bmatrix} i & i & 2i \\ 1 & 1 & 2 \\ 1+i & 1-i & 3i \end{bmatrix} = 0$$

EXAMPLE 10 Let $A = \begin{bmatrix} 1+i & 1 \\ 1 & 2-i \end{bmatrix}$ and $B = \begin{bmatrix} i & 0 & 1 \\ 1 & i & 1-i \\ 0 & 1 & 1+i \end{bmatrix}$. Show that $A$ and $B$ are invertible and find their inverses.

Solution: We have $\det A = \begin{vmatrix} 1+i & 1 \\ 1 & 2-i \end{vmatrix} = 2 + i \neq 0$, so $A$ is invertible. Using the formula for the inverse of a $2 \times 2$ matrix we found in Math 136 gives
$$A^{-1} = \frac{1}{2+i}\begin{bmatrix} 2-i & -1 \\ -1 & 1+i \end{bmatrix}$$
Row reducing $[\,B \mid I\,]$ to RREF gives
$$\left[\begin{array}{ccc|ccc} i & 0 & 1 & 1 & 0 & 0 \\ 1 & i & 1-i & 0 & 1 & 0 \\ 0 & 1 & 1+i & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2-2i & -1 & i \\ 0 & 1 & 0 & 3-i & -1-i & i \\ 0 & 0 & 1 & -1-2i & i & 1 \end{array}\right]$$
Since the RREF of $B$ is $I$, $B$ is invertible. Moreover, $B^{-1} = \begin{bmatrix} 2-2i & -1 & i \\ 3-i & -1-i & i \\ -1-2i & i & 1 \end{bmatrix}$.
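Determinants and inverses of complex matrices can also be double-checked numerically; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1 + 1j, 1],
              [1, 2 - 1j]])
print(np.linalg.det(A))  # approximately 2+1j

B = np.array([[1j, 0, 1],
              [1, 1j, 1 - 1j],
              [0, 1, 1 + 1j]])
Binv = np.linalg.inv(B)
print(Binv)                              # compare with the row reduction above
print(np.allclose(B @ Binv, np.eye(3)))  # True
```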


EXERCISE 3 Let $A = \begin{bmatrix} 1+i & 2 & i \\ 2 & 3-2i & 1+i \\ 1-i & 1-3i & 1+i \end{bmatrix}$. Find the determinant of $A$ and $A^{-1}$.

Complex Conjugates

We saw in Section 11.1 that the complex conjugate was a very important and useful operation on complex numbers. Thus, we should expect that it will also be useful and important in the study of general complex vector spaces. Therefore, we extend the definition of the complex conjugate to vectors in $\mathbb{C}^n$ and matrices in $M_{m\times n}(\mathbb{C})$.

DEFINITION Complex Conjugate in $\mathbb{C}^n$

Let $\vec z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} \in \mathbb{C}^n$. Then, we define $\overline{\vec z}$ by $\overline{\vec z} = \begin{bmatrix} \overline{z_1} \\ \vdots \\ \overline{z_n} \end{bmatrix}$.

DEFINITION Complex Conjugate in $M_{m\times n}(\mathbb{C})$

Let $A = \begin{bmatrix} z_{11} & \cdots & z_{1n} \\ \vdots & & \vdots \\ z_{m1} & \cdots & z_{mn} \end{bmatrix} \in M_{m\times n}(\mathbb{C})$. Then, we define $\overline{A}$ by $\overline{A} = \begin{bmatrix} \overline{z_{11}} & \cdots & \overline{z_{1n}} \\ \vdots & & \vdots \\ \overline{z_{m1}} & \cdots & \overline{z_{mn}} \end{bmatrix}$.

THEOREM 2 Let $A \in M_{m\times n}(\mathbb{C})$ and $\vec z \in \mathbb{C}^n$. Then $\overline{A\vec z} = \overline{A}\,\overline{\vec z}$.

EXAMPLE 11 Let $\vec w = \begin{bmatrix} 1 \\ 2i \\ 1-i \end{bmatrix}$, $\vec z = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, and $A = \begin{bmatrix} 1 & 1-3i \\ -i & 2i \end{bmatrix}$. Then
$$\overline{\vec w} = \begin{bmatrix} 1 \\ -2i \\ 1+i \end{bmatrix}, \qquad \overline{\vec z} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \qquad \overline{A} = \begin{bmatrix} 1 & 1+3i \\ i & -2i \end{bmatrix}$$

EXAMPLE 12 Is the mapping $L : \mathbb{C}^2 \to \mathbb{C}^2$ defined by $L(\vec z) = \overline{\vec z}$ a linear mapping?

Solution: Consider $L(\alpha\vec z_1 + \vec z_2) = \overline{\alpha\vec z_1 + \vec z_2} = \overline{\alpha}\,\overline{\vec z_1} + \overline{\vec z_2}$, but $\overline{\alpha} \neq \alpha$ if $\alpha$ is not real. So, it is not linear. For example,
$$L\left((1+i)\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = L\left(\begin{bmatrix} 1+i \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 1-i \\ 0 \end{bmatrix} \neq \begin{bmatrix} 1+i \\ 0 \end{bmatrix} = (1+i)L\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right)$$


Even though the complex conjugate of a vector in $\mathbb{C}^n$ does not define a linear mapping, we will see that complex conjugates of vectors in $\mathbb{C}^n$ and matrices in $M_{m\times n}(\mathbb{C})$ occur naturally in the study of complex vector spaces.

Section 11.2 Problems

1. Determine which of the following sets is a subspace of $\mathbb{C}^3$. Find a basis and the dimension of each subspace.

(a) $S_1 = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \in \mathbb{C}^3 \;\middle|\; iz_1 = z_3 \right\}$
(b) $S_2 = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \in \mathbb{C}^3 \;\middle|\; z_1z_2 = 0 \right\}$
(c) $S_3 = \left\{ \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} \in \mathbb{C}^3 \;\middle|\; z_1 + z_2 + z_3 = 0 \right\}$

2. Find a basis for the four fundamental subspaces of $A = \begin{bmatrix} 1 & i \\ 1+i & -1+i \\ -1 & i \end{bmatrix}$.

3. Let $\vec z \in \mathbb{C}^n$. Prove that there exist vectors $\vec x, \vec y \in \mathbb{R}^n$ such that $\vec z = \vec x + i\vec y$.

4. Let $L : \mathbb{C}^3 \to \mathbb{C}^2$ be defined by $L(z_1, z_2, z_3) = \begin{bmatrix} -iz_1 + (1+i)z_2 + (1+2i)z_3 \\ (-1+i)z_1 - 2iz_2 - 3iz_3 \end{bmatrix}$.

(a) Prove that $L$ is linear and find the standard matrix of $L$.
(b) Determine a basis for the range and kernel of $L$.

5. Let $L : \mathbb{C}^2 \to \mathbb{C}^2$ be defined by $L(z_1, z_2) = \begin{bmatrix} z_1 + (1+i)z_2 \\ (1+i)z_1 + 2iz_2 \end{bmatrix}$.

(a) Prove that $L$ is linear.
(b) Determine a basis for the range and kernel of $L$.
(c) Use the result of part (b) to define a basis $B$ for $\mathbb{C}^2$ such that $[L]_B$ is diagonal.

6. Calculate the determinant of $\begin{bmatrix} 1 & -1 & i \\ 1+i & -i & i \\ 1-i & -1+2i & 1+2i \end{bmatrix}$.

7. Find the inverse of $\begin{bmatrix} 1 & 1 & 2 \\ 0 & 1+i & 1 \\ i & 1+i & 1+2i \end{bmatrix}$.

8. Prove that for any $n \times n$ matrix $A$, we have $\det \overline{A} = \overline{\det A}$.


11.3 Complex Diagonalization

In Math 136 we saw that some matrices were not diagonalizable over $\mathbb{R}$ because they had complex eigenvalues. But, if we extend all of our concepts of eigenvalues, eigenvectors, and diagonalization to complex vector spaces, then we will be able to diagonalize these matrices.

DEFINITION Eigenvalue, Eigenvector

Let $A \in M_{n\times n}(\mathbb{C})$. If there exists $\lambda \in \mathbb{C}$ and $\vec z \in \mathbb{C}^n$ with $\vec z \neq \vec 0$ such that $A\vec z = \lambda\vec z$, then $\lambda$ is called an eigenvalue of $A$ and $\vec z$ is called an eigenvector of $A$ corresponding to $\lambda$. We call $(\lambda, \vec z)$ an eigenpair.

All of the theory of similar matrices, eigenvalues, eigenvectors, and diagonalization we did in Math 136 still applies in the complex case, except that we now allow the use of complex eigenvalues and eigenvectors.

EXAMPLE 1 Diagonalize $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ over $\mathbb{C}$.

Solution: The characteristic polynomial is
$$C(\lambda) = \begin{vmatrix} -\lambda & 1 \\ -1 & -\lambda \end{vmatrix} = \lambda^2 + 1$$
Therefore, the eigenvalues of $A$ are $\lambda_1 = i$ and $\lambda_2 = -i$. For $\lambda_1 = i$ we get
$$A - iI = \begin{bmatrix} -i & 1 \\ -1 & -i \end{bmatrix} \sim \begin{bmatrix} 1 & i \\ 0 & 0 \end{bmatrix}$$
Hence, a basis for $E_{\lambda_1}$ is $\left\{ \begin{bmatrix} -i \\ 1 \end{bmatrix} \right\}$. For $\lambda_2 = -i$ we get
$$A + iI = \begin{bmatrix} i & 1 \\ -1 & i \end{bmatrix} \sim \begin{bmatrix} 1 & -i \\ 0 & 0 \end{bmatrix}$$
So, a basis for $E_{\lambda_2}$ is $\left\{ \begin{bmatrix} i \\ 1 \end{bmatrix} \right\}$. Thus, $A$ is diagonalized by $P = \begin{bmatrix} -i & i \\ 1 & 1 \end{bmatrix}$ to $D = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$.
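Numerically, NumPy's np.linalg.eig returns complex eigenvalues and eigenvectors even when the input matrix is real; a sketch (assuming NumPy) that checks this diagonalization:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
evals, P = np.linalg.eig(A)
print(evals)  # i and -i (in some order)

# The eigenvector columns may differ from the text's P by nonzero scalar
# multiples, but they still diagonalize A.
print(np.allclose(np.linalg.inv(P) @ A @ P, np.diag(evals)))  # True
```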

EXAMPLE 2 Diagonalize $B = \begin{bmatrix} i & 1+i \\ 1+i & 1 \end{bmatrix}$ over $\mathbb{C}$.

Solution: The characteristic polynomial is
$$C(\lambda) = \begin{vmatrix} i-\lambda & 1+i \\ 1+i & 1-\lambda \end{vmatrix} = \lambda^2 - (1+i)\lambda - i$$
Using the quadratic formula, since $(1+i)^2 + 4i = 6i$, we get
$$\lambda = \frac{(1+i) \pm \sqrt{6i}}{2} = \frac{(1+i) \pm \sqrt{6}\left(\frac{1+i}{\sqrt{2}}\right)}{2} = \frac{1 \pm \sqrt{3}}{2}(1+i)$$
So, the eigenvalues are $\lambda_1 = \frac{1+\sqrt{3}}{2}(1+i)$ and $\lambda_2 = \frac{1-\sqrt{3}}{2}(1+i)$. For $\lambda_1$ we get
$$B - \lambda_1I = \begin{bmatrix} i - \frac{1+\sqrt{3}}{2}(1+i) & 1+i \\ 1+i & 1 - \frac{1+\sqrt{3}}{2}(1+i) \end{bmatrix} \sim \begin{bmatrix} -\sqrt{3}+i & 2 \\ 2 & -\sqrt{3}-i \end{bmatrix} \sim \begin{bmatrix} 1 & -2/(\sqrt{3}-i) \\ 0 & 0 \end{bmatrix}$$
Then a basis for $E_{\lambda_1}$ is $\{\vec v_1\}$ with $\vec v_1 = \begin{bmatrix} 2 \\ \sqrt{3}-i \end{bmatrix}$. For $\lambda_2$ we get
$$B - \lambda_2I = \begin{bmatrix} i - \frac{1-\sqrt{3}}{2}(1+i) & 1+i \\ 1+i & 1 - \frac{1-\sqrt{3}}{2}(1+i) \end{bmatrix} \sim \begin{bmatrix} \sqrt{3}+i & 2 \\ 2 & \sqrt{3}-i \end{bmatrix} \sim \begin{bmatrix} 1 & 2/(\sqrt{3}+i) \\ 0 & 0 \end{bmatrix}$$
Hence, a basis for $E_{\lambda_2}$ is $\{\vec v_2\}$ with $\vec v_2 = \begin{bmatrix} -2 \\ \sqrt{3}+i \end{bmatrix}$. Thus, $B$ is diagonalized by
$$P = \begin{bmatrix} 2 & -2 \\ \sqrt{3}-i & \sqrt{3}+i \end{bmatrix} \quad\text{to}\quad D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$$

EXAMPLE 3 Diagonalize $F = \begin{bmatrix} 3 & 2+i \\ 2-i & 7 \end{bmatrix}$ over $\mathbb{C}$.

Solution: The characteristic polynomial is
$$C(\lambda) = \begin{vmatrix} 3-\lambda & 2+i \\ 2-i & 7-\lambda \end{vmatrix} = \lambda^2 - 10\lambda + 16 = (\lambda-2)(\lambda-8)$$
So the eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 8$. For $\lambda_1$ we get
$$F - \lambda_1I = \begin{bmatrix} 1 & 2+i \\ 2-i & 5 \end{bmatrix} \sim \begin{bmatrix} 1 & 2+i \\ 0 & 0 \end{bmatrix}$$
Then a basis for $E_{\lambda_1}$ is $\{\vec v_1\}$ with $\vec v_1 = \begin{bmatrix} -2-i \\ 1 \end{bmatrix}$. For $\lambda_2$ we get
$$F - \lambda_2I = \begin{bmatrix} -5 & 2+i \\ 2-i & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & -1/(2-i) \\ 0 & 0 \end{bmatrix}$$
Hence, a basis for $E_{\lambda_2}$ is $\{\vec v_2\}$ with $\vec v_2 = \begin{bmatrix} -1 \\ -2+i \end{bmatrix}$. Thus, $F$ is diagonalized by
$$P = \begin{bmatrix} -2-i & -1 \\ 1 & -2+i \end{bmatrix} \quad\text{to}\quad D = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix}$$


As usual, these examples teach us more than just how to diagonalize complex matrices. In the first example we see that when the matrix has only real entries, the eigenvalues come in complex conjugate pairs. The second example shows that when working with matrices with non-real entries our theory of symmetric matrices for real matrices does not apply. In particular, we had $B^T = B$, but the eigenvalues of $B$ are not real. On the other hand, observe that the matrix $F$ has non-real entries but the eigenvalues of $F$ are all real.

THEOREM 1 Let $A \in M_{n\times n}(\mathbb{R})$. If $\lambda$ is a non-real eigenvalue of $A$ with corresponding eigenvector $\vec z$, then $\overline{\lambda}$ is also an eigenvalue of $A$ with eigenvector $\overline{\vec z}$.

Proof: We have $A\vec z = \lambda\vec z$. Hence,
$$\overline{\lambda}\,\overline{\vec z} = \overline{\lambda\vec z} = \overline{A\vec z} = \overline{A}\,\overline{\vec z} = A\overline{\vec z}$$
since $A$ is a real matrix. Thus, $\overline{\lambda}$ is an eigenvalue of $A$ with eigenvector $\overline{\vec z}$. $\blacksquare$

COROLLARY 2 If $A \in M_{n\times n}(\mathbb{R})$ with $n$ odd, then $A$ has at least one real eigenvalue.

EXAMPLE 4 Find all eigenvalues of $A = \begin{bmatrix} 0 & 2 & 1 \\ -2 & 3 & 0 \\ 1 & 0 & 2 \end{bmatrix}$ given that $\lambda_1 = 2+i$ is an eigenvalue of $A$.

Solution: Since $A$ is a real matrix and $\lambda_1 = 2+i$ is an eigenvalue of $A$, we have by Theorem 1 that $\lambda_2 = \overline{\lambda_1} = 2-i$ is also an eigenvalue of $A$. Finally, we know that the sum of the eigenvalues is the trace of the matrix, so the remaining eigenvalue, which must be real, is
$$\lambda_3 = \operatorname{tr} A - \lambda_1 - \lambda_2 = 5 - (2+i) - (2-i) = 1$$
It is easy to verify this result by checking that $\det(A - I) = 0$.
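A sketch of the same check in code (assuming NumPy): the eigenvalues of this real matrix should be 1 and the conjugate pair 2 ± i, and they should sum to the trace.

```python
import numpy as np

A = np.array([[0, 2, 1],
              [-2, 3, 0],
              [1, 0, 2]])
evals = np.linalg.eig(A)[0]
print(np.sort_complex(evals))                # approximately 1, 2-1j, 2+1j
print(np.isclose(evals.sum(), np.trace(A)))  # True: the eigenvalues sum to tr A
```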

Section 11.3 Problems

1. For each of the following matrices, either diagonalize the matrix over $\mathbb{C}$, or show that it is not diagonalizable.

(a) $\begin{bmatrix} 2 & 1+i \\ 1-i & 1 \end{bmatrix}$ (b) $\begin{bmatrix} 3 & 5 \\ -5 & -3 \end{bmatrix}$ (c) $\begin{bmatrix} 2 & 2 & -1 \\ -4 & 1 & 2 \\ 2 & 2 & -1 \end{bmatrix}$
(d) $\begin{bmatrix} 1 & i \\ i & -1 \end{bmatrix}$ (e) $\begin{bmatrix} 2 & 1 & -1 \\ 2 & 1 & 0 \\ 3 & -1 & 2 \end{bmatrix}$ (f) $\begin{bmatrix} 1+i & 1 & 0 \\ 1 & 1 & -i \\ 1 & 0 & 1 \end{bmatrix}$


2. For any $\theta \in \mathbb{R}$, let $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.

(a) Diagonalize $R_\theta$ over $\mathbb{C}$.
(b) Verify your answer in (a) is correct by showing the matrix $P$ and diagonal matrix $D$ from part (a) satisfy $P^{-1}R_\theta P = D$ for $\theta = 0$ and $\theta = \frac{\pi}{4}$.

3. Let $A \in M_{n\times n}(\mathbb{C})$ and let $\vec z$ be an eigenvector of $A$. Prove that $\overline{\vec z}$ is an eigenvector of $\overline{A}$. What is the corresponding eigenvalue?

4. Suppose that a real $2\times 2$ matrix $A$ has $2+i$ as an eigenvalue with a corresponding eigenvector $\begin{bmatrix} 1+i \\ i \end{bmatrix}$. Determine $A$.

11.4 Complex Inner Products

The concepts of orthogonality and orthonormal bases were very useful in real vector spaces. Hence, it makes sense to extend these concepts to complex vector spaces. In particular, we would like to be able to define the concepts of orthogonality and length in complex vector spaces to match those in the real case.

REMARK

In this section we leave the proofs of the theorems to the reader (or homework assignments/tests) as they are essentially the same as in the real case.

We start by looking at how to define an inner product for $\mathbb{C}^n$.

We first observe that if we extend the standard dot product to vectors in $\mathbb{C}^n$, then this does not define an inner product. Indeed, if $\vec z = \vec x + i\vec y$, then we have
$$\vec z \cdot \vec z = z_1^2 + \cdots + z_n^2 = (x_1^2 + \cdots + x_n^2 - y_1^2 - \cdots - y_n^2) + 2i(x_1y_1 + \cdots + x_ny_n)$$
But, to be an inner product we require that $\langle \vec z, \vec z\rangle$ be a non-negative real number so that we can define the length of a vector. Since $\vec z \cdot \vec z$ does not even have to be real, it cannot define an inner product.

Thinking of properties of complex numbers, the only way we can ensure that we get a non-negative real number when multiplying complex numbers is to multiply the complex number by its conjugate. In particular, if we define
$$\langle \vec z, \vec w\rangle = z_1\overline{w_1} + \cdots + z_n\overline{w_n} = \vec z \cdot \overline{\vec w}$$
then we get
$$\langle \vec z, \vec z\rangle = z_1\overline{z_1} + \cdots + z_n\overline{z_n} = |z_1|^2 + \cdots + |z_n|^2$$
which is not only a non-negative real number, but it is 0 if and only if $\vec z = \vec 0$. Thus, this is what we are looking for.


DEFINITION Standard Inner Product on $\mathbb{C}^n$

The standard inner product $\langle\ ,\ \rangle$ on $\mathbb{C}^n$ is defined by
$$\langle \vec z, \vec w\rangle = \vec z \cdot \overline{\vec w} = z_1\overline{w_1} + \cdots + z_n\overline{w_n}$$
for any $\vec z, \vec w \in \mathbb{C}^n$.

REMARK

In some other fields, for example engineering, they use
$$\langle \vec z, \vec w\rangle = \vec w \cdot \overline{\vec z}$$
for the definition of the standard inner product for $\mathbb{C}^n$, so that the conjugate falls on the first argument. Be warned that many computer programs use the engineering definition of the inner product for $\mathbb{C}^n$.
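For example, NumPy's np.vdot conjugates its first argument, which is the engineering convention; the inner product of these notes conjugates the second vector instead. A sketch, using the vectors of Example 1 below:

```python
import numpy as np

z = np.array([1, 2 + 1j, 1 - 1j])
w = np.array([1j, 2, 1 + 3j])

print(np.vdot(z, w))           # sum(conj(z) * w): the engineering convention
print(np.sum(z * np.conj(w)))  # <z, w> as defined in these notes; gives 2-3j
```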

EXAMPLE 1 Let $\vec z = \begin{bmatrix} 1 \\ 2+i \\ 1-i \end{bmatrix}$ and $\vec w = \begin{bmatrix} i \\ 2 \\ 1+3i \end{bmatrix}$. Then
$$\langle \vec z, \vec z\rangle = \vec z \cdot \overline{\vec z} = \begin{bmatrix} 1 \\ 2+i \\ 1-i \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2-i \\ 1+i \end{bmatrix} = 1^2 + (2+i)(2-i) + (1-i)(1+i) = 8$$
$$\langle \vec z, \vec w\rangle = \vec z \cdot \overline{\vec w} = \begin{bmatrix} 1 \\ 2+i \\ 1-i \end{bmatrix} \cdot \begin{bmatrix} -i \\ 2 \\ 1-3i \end{bmatrix} = 1(-i) + (2+i)(2) + (1-i)(1-3i) = 2-3i$$
$$\langle \vec w, \vec z\rangle = \vec w \cdot \overline{\vec z} = \begin{bmatrix} i \\ 2 \\ 1+3i \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2-i \\ 1+i \end{bmatrix} = i(1) + 2(2-i) + (1+3i)(1+i) = 2+3i$$

This example shows that the standard inner product for $\mathbb{C}^n$ is not symmetric. The next example shows that it is also not bilinear.

EXAMPLE 2 Let $\vec z, \vec w \in \mathbb{C}^n$. Then, for any $\alpha \in \mathbb{C}$ we have $\langle\alpha\vec z, \vec w\rangle = \alpha\langle\vec z, \vec w\rangle$, but $\langle\vec z, \alpha\vec w\rangle = \overline{\alpha}\langle\vec z, \vec w\rangle$.

Solution: We have
$$\langle\alpha\vec z, \vec w\rangle = (\alpha\vec z) \cdot \overline{\vec w} = \alpha(\vec z \cdot \overline{\vec w}) = \alpha\langle\vec z, \vec w\rangle$$
$$\langle\vec z, \alpha\vec w\rangle = \vec z \cdot \overline{\alpha\vec w} = \vec z \cdot (\overline{\alpha}\,\overline{\vec w}) = \overline{\alpha}(\vec z \cdot \overline{\vec w}) = \overline{\alpha}\langle\vec z, \vec w\rangle$$


EXERCISE 1 Let $\vec z = \begin{bmatrix} 1+i \\ 1-i \\ 2 \end{bmatrix}$ and $\vec w = \begin{bmatrix} 1-i \\ 2i \\ -2 \end{bmatrix}$. Find $\langle\vec z, \vec w\rangle$ and $\langle\vec w, \vec z\rangle$.

THEOREM 1 Let $\langle\ ,\ \rangle$ denote the standard inner product for $\mathbb{C}^n$. Then, for all $\vec v, \vec z, \vec w \in \mathbb{C}^n$ and $\alpha \in \mathbb{C}$ we have

(1) $\langle\vec z, \vec z\rangle \in \mathbb{R}$ and $\langle\vec z, \vec z\rangle \geq 0$, and $\langle\vec z, \vec z\rangle = 0$ if and only if $\vec z = \vec 0$
(2) $\langle\vec z, \vec w\rangle = \overline{\langle\vec w, \vec z\rangle}$
(3) $\langle\vec v + \vec z, \vec w\rangle = \langle\vec v, \vec w\rangle + \langle\vec z, \vec w\rangle$
(4) $\langle\alpha\vec z, \vec w\rangle = \alpha\langle\vec z, \vec w\rangle$

As we saw in the examples, this theorem shows that the standard inner product on $\mathbb{C}^n$ does not satisfy the same properties as a real inner product. So, to define an inner product on a general complex vector space, we should use these properties instead of those for a real inner product.

DEFINITION Hermitian Inner Product

Let $V$ be a complex vector space. A Hermitian inner product on $V$ is a function $\langle\ ,\ \rangle : V \times V \to \mathbb{C}$ such that for all $\vec v, \vec w, \vec z \in V$ and $\alpha \in \mathbb{C}$ we have

(1) $\langle\vec z, \vec z\rangle \in \mathbb{R}$ and $\langle\vec z, \vec z\rangle \geq 0$ for all $\vec z \in V$, and $\langle\vec z, \vec z\rangle = 0$ if and only if $\vec z = \vec 0$
(2) $\langle\vec z, \vec w\rangle = \overline{\langle\vec w, \vec z\rangle}$
(3) $\langle\vec v + \vec z, \vec w\rangle = \langle\vec v, \vec w\rangle + \langle\vec z, \vec w\rangle$
(4) $\langle\alpha\vec z, \vec w\rangle = \alpha\langle\vec z, \vec w\rangle$

A complex vector space with a Hermitian inner product is called a Hermitian inner product space.

THEOREM 2 If $\langle\ ,\ \rangle$ is a Hermitian inner product on a complex vector space $V$, then for all $\vec v, \vec w, \vec z \in V$ and $\alpha \in \mathbb{C}$ we have
$$\langle\vec z, \vec v + \vec w\rangle = \langle\vec z, \vec v\rangle + \langle\vec z, \vec w\rangle$$
$$\langle\vec z, \alpha\vec w\rangle = \overline{\alpha}\langle\vec z, \vec w\rangle$$

It is important to observe that this is a true generalization of the real inner product. That is, we could use this for the definition of the real inner product since if $\langle\vec z, \vec w\rangle$ and $\alpha$ are strictly real, then this will satisfy the definition of a real inner product.

THEOREM 3 The function $\langle A, B\rangle = \operatorname{tr}(\overline{B}^TA)$ defines a Hermitian inner product on $M_{m\times n}(\mathbb{C})$.


EXAMPLE 3 Let $A = \begin{bmatrix} 1 & i \\ 2 & 2+i \end{bmatrix}$ and $B = \begin{bmatrix} -3 & 3+i \\ -2i & 0 \end{bmatrix}$. Find $\langle A, B\rangle$ under the inner product $\langle A, B\rangle = \operatorname{tr}(\overline{B}^TA)$.

Solution: We have
$$\langle A, B\rangle = \operatorname{tr}(\overline{B}^TA) = \operatorname{tr}\left(\begin{bmatrix} -3 & 2i \\ 3-i & 0 \end{bmatrix}\begin{bmatrix} 1 & i \\ 2 & 2+i \end{bmatrix}\right) = \operatorname{tr}\begin{bmatrix} -3+4i & -2+i \\ 3-i & 1+3i \end{bmatrix} = -3+4i+1+3i = -2+7i$$

EXAMPLE 4 If $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$ and $B = \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix}$, then
$$\overline{B}^TA = \begin{bmatrix} \overline{b_{11}} & \cdots & \overline{b_{m1}} \\ \vdots & & \vdots \\ \overline{b_{1n}} & \cdots & \overline{b_{mn}} \end{bmatrix}\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$$
Hence, we get
$$\operatorname{tr}(\overline{B}^TA) = \sum_{\ell=1}^m a_{\ell 1}\overline{b_{\ell 1}} + \sum_{\ell=1}^m a_{\ell 2}\overline{b_{\ell 2}} + \cdots + \sum_{\ell=1}^m a_{\ell n}\overline{b_{\ell n}} = \sum_{j=1}^n\sum_{\ell=1}^m a_{\ell j}\overline{b_{\ell j}}$$
which corresponds to the standard inner product of the corresponding vectors under the obvious isomorphism with $\mathbb{C}^{mn}$.

REMARK

As in the real case, whenever we are working with $\mathbb{C}^n$ or $M_{m\times n}(\mathbb{C})$, if no other inner product is specified, we will assume we are working with the standard inner product.

EXERCISE 2 Let $A, B \in M_{2\times 2}(\mathbb{C})$ with $A = \begin{bmatrix} 1 & 1-i \\ 2+2i & i \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1-i \\ 2+2i & i \end{bmatrix}$. Determine $\langle A, B\rangle$ and $\langle B, A\rangle$.


Length and Orthogonality

We can now define length and orthogonality to match the definitions in the real case.

DEFINITION Length, Unit Vector

Let $V$ be a Hermitian inner product space with inner product $\langle\ ,\ \rangle$. Then, for any $\vec z \in V$ we define the length of $\vec z$ by
$$\|\vec z\| = \sqrt{\langle\vec z, \vec z\rangle}$$
If $\|\vec z\| = 1$, then $\vec z$ is called a unit vector.

DEFINITION Orthogonality

Let $V$ be a Hermitian inner product space with inner product $\langle\ ,\ \rangle$. Then, for any $\vec z, \vec w \in V$ we say that $\vec z$ and $\vec w$ are orthogonal if $\langle\vec z, \vec w\rangle = 0$.

EXAMPLE 5 Let $\vec u = \begin{bmatrix} i \\ 1+i \\ 2-i \end{bmatrix} \in \mathbb{C}^3$. Then
$$\|\vec u\| = \sqrt{\langle\vec u, \vec u\rangle} = \sqrt{\vec u \cdot \overline{\vec u}} = \sqrt{i(-i) + (1+i)(1-i) + (2-i)(2+i)} = \sqrt{1+2+5} = \sqrt{8}$$

EXAMPLE 6 Let $\vec z = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} \in \mathbb{C}^3$. Which of the following vectors are orthogonal to $\vec z$:
$$\vec u = \begin{bmatrix} -i \\ 1 \\ 0 \end{bmatrix}, \quad \vec v = \begin{bmatrix} i \\ 1 \\ 0 \end{bmatrix}, \quad \vec w = \begin{bmatrix} 1-2i \\ -7 \\ 1-i \end{bmatrix}$$

Solution: We have
$$\langle\vec z, \vec u\rangle = \vec z \cdot \overline{\vec u} = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} \cdot \begin{bmatrix} i \\ 1 \\ 0 \end{bmatrix} = i + i + 0 = 2i$$
$$\langle\vec z, \vec v\rangle = \vec z \cdot \overline{\vec v} = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} \cdot \begin{bmatrix} -i \\ 1 \\ 0 \end{bmatrix} = -i + i + 0 = 0$$
$$\langle\vec z, \vec w\rangle = \vec z \cdot \overline{\vec w} = \begin{bmatrix} 1 \\ i \\ 2+3i \end{bmatrix} \cdot \begin{bmatrix} 1+2i \\ -7 \\ 1+i \end{bmatrix} = (1+2i) - 7i + (-1+5i) = 0$$
Hence, $\vec v$ and $\vec w$ are orthogonal to $\vec z$, and $\vec u$ is not orthogonal to $\vec z$.


Of course, these satisfy all of our familiar properties of length and orthogonality.

THEOREM 4 Let $V$ be a Hermitian inner product space with inner product $\langle\ ,\ \rangle$. Then, for any $\vec z, \vec w \in V$ and $\alpha \in \mathbb{C}$ we have

(1) $\|\alpha\vec z\| = |\alpha|\,\|\vec z\|$
(2) $\|\vec z + \vec w\| \leq \|\vec z\| + \|\vec w\|$
(3) $\frac{1}{\|\vec z\|}\vec z$ is a unit vector in the direction of $\vec z$.

DEFINITION Orthogonal, Orthonormal

Let $S = \{\vec z_1, \ldots, \vec z_k\}$ be a set in a Hermitian inner product space with Hermitian inner product $\langle\ ,\ \rangle$. $S$ is said to be orthogonal if $\langle\vec z_\ell, \vec z_j\rangle = 0$ for all $\ell \neq j$. $S$ is said to be orthonormal if it is orthogonal and $\|\vec z_j\| = 1$ for all $j$.

THEOREM 5 If $S = \{\vec z_1, \ldots, \vec z_k\}$ is an orthogonal set in a Hermitian inner product space $V$, then
$$\|\vec z_1 + \cdots + \vec z_k\|^2 = \|\vec z_1\|^2 + \cdots + \|\vec z_k\|^2$$

THEOREM 6 If $S = \{\vec z_1, \ldots, \vec z_k\}$ is an orthogonal set of non-zero vectors in a Hermitian inner product space $V$, then $S$ is linearly independent.

The concept of projections and, of course, the Gram-Schmidt procedure are still valid for Hermitian inner products.

EXAMPLE 7 Use the Gram-Schmidt procedure to find an orthogonal basis for the subspace $S$ of $\mathbb{C}^4$ defined by
$$S = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} i \\ 1 \\ i \\ 1 \end{bmatrix}, \begin{bmatrix} 1+i \\ -1 \\ i \\ 1+i \end{bmatrix} \right\} = \operatorname{Span}\{\vec w_1, \vec w_2, \vec w_3\}$$

Solution: First, we let $\vec v_1 = \vec w_1$. Then we have
$$\vec v_2 = \vec w_2 - \frac{\langle\vec w_2, \vec v_1\rangle}{\|\vec v_1\|^2}\vec v_1 = \begin{bmatrix} i \\ 1 \\ i \\ 1 \end{bmatrix} - \frac{2i}{2}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}$$
$$\vec v_3 = \vec w_3 - \frac{\langle\vec w_3, \vec v_1\rangle}{\|\vec v_1\|^2}\vec v_1 - \frac{\langle\vec w_3, \vec v_2\rangle}{\|\vec v_2\|^2}\vec v_2 = \begin{bmatrix} 1+i \\ -1 \\ i \\ 1+i \end{bmatrix} - \frac{1+2i}{2}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} - \frac{i}{2}\begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/2 \\ -1-i/2 \\ -1/2 \\ 1+i/2 \end{bmatrix}$$
Consequently, $\{\vec v_1, \vec v_2, \vec v_3\}$ is an orthogonal basis for $S$.
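A sketch of the procedure in code, assuming NumPy; the helper gram_schmidt below is ours, not a library routine, and it uses the Hermitian inner product $\langle\vec z, \vec w\rangle = \vec z \cdot \overline{\vec w}$ rather than the plain dot product.

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt with the inner product <z, w> = sum(z * conj(w))."""
    basis = []
    for w in vectors:
        v = w.astype(complex)
        for u in basis:
            # Subtract the projection of w onto u: (<w, u> / ||u||^2) u.
            v = v - (np.sum(w * np.conj(u)) / np.sum(u * np.conj(u))) * u
        basis.append(v)
    return basis

w1 = np.array([1, 0, 1, 0])
w2 = np.array([1j, 1, 1j, 1])
w3 = np.array([1 + 1j, -1, 1j, 1 + 1j])
v1, v2, v3 = gram_schmidt([w1, w2, w3])
print(v3)  # approximately [0.5, -1-0.5j, -0.5, 1+0.5j], as computed above
```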


We recall that a real square matrix with orthonormal columns is an orthogonal matrix. Since orthogonal matrices were so useful in the real case, we generalize them as well.

DEFINITION Unitary Matrix

If the columns of a matrix $U$ form an orthonormal basis for $\mathbb{C}^n$, then $U$ is called unitary.

THEOREM 7 Let $U \in M_{n\times n}(\mathbb{C})$. Then the following are equivalent:

(1) the columns of $U$ form an orthonormal basis for $\mathbb{C}^n$
(2) $\overline{U}^T = U^{-1}$
(3) the rows of $U$ form an orthonormal basis for $\mathbb{C}^n$

EXAMPLE 8 Determine which of the following matrices are unitary:
$$A = \begin{bmatrix} 1 & 1+2i \\ -1 & 1+2i \end{bmatrix}, \quad B = \begin{bmatrix} (1+i)/\sqrt{2} & 0 & 0 \\ 0 & 0 & 1 \\ 0 & i & 0 \end{bmatrix}, \quad C = \begin{bmatrix} \frac{1+i}{2} & \frac{1-i}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{i}{2} & 0 & \frac{-3-3i}{\sqrt{24}} \\ \frac{1}{2} & \frac{2i}{\sqrt{6}} & \frac{1-i}{\sqrt{24}} \end{bmatrix}$$

Solution: The columns of $A$ are not unit vectors, so $A$ is not unitary.

The columns of $B$ are clearly unit vectors and orthogonal to each other, so the columns of $B$ form an orthonormal basis for $\mathbb{C}^3$. Hence, $B$ is unitary.

We have
$$\overline{C}^TC = \begin{bmatrix} \frac{1-i}{2} & \frac{-i}{2} & \frac{1}{2} \\ \frac{1+i}{\sqrt{6}} & 0 & \frac{-2i}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} & \frac{-3+3i}{\sqrt{24}} & \frac{1+i}{\sqrt{24}} \end{bmatrix}\begin{bmatrix} \frac{1+i}{2} & \frac{1-i}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{i}{2} & 0 & \frac{-3-3i}{\sqrt{24}} \\ \frac{1}{2} & \frac{2i}{\sqrt{6}} & \frac{1-i}{\sqrt{24}} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
so $C$ is unitary.

REMARK

Observe that if the entries of $A$ are all real, then $A$ is unitary if and only if it is orthogonal.

THEOREM 8 If $U$ and $W$ are unitary matrices, then $UW$ is a unitary matrix.
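In code, unitarity is a one-line check; a sketch (assuming NumPy) applied to the matrix $B$ of Example 8:

```python
import numpy as np

B = np.array([[(1 + 1j) / np.sqrt(2), 0, 0],
              [0, 0, 1],
              [0, 1j, 0]])

# U is unitary if and only if conj(U)^T U = I.
print(np.allclose(B.conj().T @ B, np.eye(3)))  # True
```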


Notice this is the second time we have seen the conjugate of the transpose of a matrix. So, we make the following definition.

DEFINITION Conjugate Transpose

Let $A \in M_{m\times n}(\mathbb{C})$. Then the conjugate transpose of $A$ is
$$A^* = \overline{A}^T$$

EXAMPLE 9 If $A = \begin{bmatrix} 1 & 0 & 1+i & -2i \\ 1-3i & 2 & 2+i & i \\ i & -3i & 4-i & 1-5i \end{bmatrix}$, then $A^* = \begin{bmatrix} 1 & 1+3i & -i \\ 0 & 2 & 3i \\ 1-i & 2-i & 4+i \\ 2i & -i & 1+5i \end{bmatrix}$.

REMARK

Observe that if $A$ is a real matrix, then $A^* = A^T$. Hence, the conjugate transpose is the complex version of the transpose.

THEOREM 9 Let $A, B \in M_{m\times n}(\mathbb{C})$ and let $\alpha \in \mathbb{C}$. Then

(1) $\langle A\vec z, \vec w\rangle = \langle\vec z, A^*\vec w\rangle$ for all $\vec z, \vec w \in \mathbb{C}^n$
(2) $(A^*)^* = A$
(3) $(A+B)^* = A^* + B^*$
(4) $(\alpha A)^* = \overline{\alpha}A^*$
(5) $(AB)^* = B^*A^*$

We end this section by proving Lemma 10.2.2.

LEMMA 10 If $A$ is a symmetric matrix with real entries, then all of its eigenvalues are real.

Proof: Let $\lambda$ be any eigenvalue of $A$ with corresponding unit eigenvector $\vec z \in \mathbb{C}^n$. Then, we have
$$\lambda = \lambda\langle\vec z, \vec z\rangle = \langle\lambda\vec z, \vec z\rangle = \langle A\vec z, \vec z\rangle = A\vec z \cdot \overline{\vec z} = \vec z \cdot A\overline{\vec z}$$
by Theorem 10.2.4. Since $A$ has all real entries, we get
$$A\overline{\vec z} = \overline{A}\,\overline{\vec z} = \overline{A\vec z} = \overline{\lambda\vec z} = \overline{\lambda}\,\overline{\vec z}$$
Thus,
$$\lambda = \vec z \cdot \overline{\lambda}\,\overline{\vec z} = \overline{\lambda}(\vec z \cdot \overline{\vec z}) = \overline{\lambda}\langle\vec z, \vec z\rangle = \overline{\lambda}$$
Consequently, $\lambda$ is real. $\blacksquare$


Section 11.4 Problems

1. Let $\vec u = \begin{bmatrix} 1 \\ 2i \\ 1-i \end{bmatrix}$, $\vec z = \begin{bmatrix} 2 \\ -2i \\ 1-i \end{bmatrix}$, and $\vec w = \begin{bmatrix} 1+i \\ 1-2i \\ i \end{bmatrix}$ be vectors in $\mathbb{C}^3$. Calculate the following.

(a) $\|\vec u\|$ (b) $\|\vec w\|$ (c) $\langle\vec u, \vec z\rangle$ (d) $\langle\vec z, \vec u\rangle$
(e) $\langle\vec u, (2+i)\vec z\rangle$ (f) $\langle\vec z, \vec w\rangle$ (g) $\langle\vec w, \vec z\rangle$ (h) $\langle\vec u + \vec z, 2i\vec w - i\vec z\rangle$

2. Determine which of the following matrices is unitary.

(a) $\begin{bmatrix} i/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -i/\sqrt{2} \end{bmatrix}$ (b) $\begin{bmatrix} (1+i)/2 & (1-i)/\sqrt{6} & (1-i)/\sqrt{12} \\ 1/2 & 0 & 3i/\sqrt{12} \\ 1/2 & 2i/\sqrt{6} & -i/\sqrt{12} \end{bmatrix}$
(c) $\begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$ (d) $\begin{bmatrix} 0 & i & 0 \\ i & 0 & 0 \\ 0 & 0 & (1+i)/2 \end{bmatrix}$

3. Consider $\mathbb{C}^3$ with its standard inner product. Let $\vec z = \begin{bmatrix} 1+i \\ 2-i \\ -1+i \end{bmatrix}$, $\vec w = \begin{bmatrix} 1-i \\ -2-3i \\ -1 \end{bmatrix}$.

(a) Evaluate $\langle\vec z, \vec w\rangle$ and $\langle\vec w, 2i\vec z\rangle$.
(b) Find a vector in $\operatorname{Span}\{\vec z, \vec w\}$ that is orthogonal to $\vec z$.
(c) Write the formula for the projection of $\vec u$ onto $S = \operatorname{Span}\{\vec z, \vec w\}$ and then find the projection.

4. Let $A, B \in M_{m\times n}(\mathbb{C})$.

(a) Prove that $(A^*)^* = A$.
(b) Prove that $\langle A\vec z, \vec w\rangle = \langle\vec z, A^*\vec w\rangle$ for all $\vec z, \vec w \in \mathbb{C}^n$.

5. Let $U \in M_{n\times n}(\mathbb{C})$. Prove that if the columns of $U$ form an orthonormal basis for $\mathbb{C}^n$, then $U^* = U^{-1}$.

6. Let $U$ be an $n\times n$ unitary matrix.

(a) Show that $\langle U\vec z, U\vec w\rangle = \langle\vec z, \vec w\rangle$ for any $\vec z, \vec w \in \mathbb{C}^n$.
(b) Suppose $\lambda$ is an eigenvalue of $U$. Use part (a) to show that $|\lambda| = 1$.
(c) Find a unitary matrix $U$ which has eigenvalue $\lambda$ where $\lambda \neq \pm 1$.

7. Let $V$ be a complex inner product space, with complex inner product $\langle\ ,\ \rangle$. Prove that if $\langle\vec u, \vec v\rangle = 0$, then $\|\vec u + \vec v\|^2 = \|\vec u\|^2 + \|\vec v\|^2$. Is the converse true?


11.5 Unitary Diagonalization

In Section 10.2, we saw that every real symmetric matrix is orthogonally diagonalizable. It is natural to ask if there is a comparable result in the case of matrices with complex entries.

First, we observe that if $A$ is a real symmetric matrix, then the condition $A^T = A$ is equivalent to $A^* = A$. Hence, this condition should be our equivalent of symmetric.

DEFINITION Hermitian Matrix

A matrix $A \in M_{n\times n}(\mathbb{C})$ such that $A^* = A$ is called Hermitian.

EXAMPLE 1 Which of the following matrices are Hermitian?
$$A = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2i \\ -2i & 3-i \end{bmatrix}, \quad C = \begin{bmatrix} 0 & i & i \\ -i & 0 & i \\ -i & i & 0 \end{bmatrix}$$

Solution: We have $A^* = \begin{bmatrix} 2 & 3-i \\ 3+i & 4 \end{bmatrix} = A$, so $A$ is Hermitian.

$B^* = \begin{bmatrix} 1 & 2i \\ -2i & 3+i \end{bmatrix} \neq B$, so $B$ is not Hermitian.

$C^* = \begin{bmatrix} 0 & i & i \\ -i & 0 & -i \\ -i & -i & 0 \end{bmatrix} \neq C$, so $C$ is not Hermitian.

Observe that if $A$ is Hermitian then we have $\overline{(A)_{\ell j}} = (A)_{j\ell}$, so the diagonal entries of $A$ must be real, and for $\ell \neq j$ the $\ell j$-th entry must be the complex conjugate of the $j\ell$-th entry. Moreover, we see that every real symmetric matrix is Hermitian. Thus, we expect that Hermitian matrices satisfy the same properties as symmetric matrices.

THEOREM 1 An $n\times n$ matrix $A$ is Hermitian if and only if for all $\vec z, \vec w \in \mathbb{C}^n$ we have
$$\langle A\vec z, \vec w\rangle = \langle\vec z, A\vec w\rangle$$

Proof: If $A$ is Hermitian, then $A^* = A$ and hence by Theorem 11.4.9 we get
$$\langle A\vec z, \vec w\rangle = \langle\vec z, A^*\vec w\rangle = \langle\vec z, A\vec w\rangle$$
Conversely, if $\langle\vec z, A\vec w\rangle = \langle A\vec z, \vec w\rangle$, then we have
$$\vec z^{\,T}\overline{A}\,\overline{\vec w} = \vec z^{\,T}\overline{A\vec w} = \langle\vec z, A\vec w\rangle = \langle A\vec z, \vec w\rangle = (A\vec z)^T\overline{\vec w} = \vec z^{\,T}A^T\overline{\vec w}$$
Since this is valid for all $\vec z, \vec w \in \mathbb{C}^n$, we have that $\overline{A} = A^T$, and hence $A = A^*$. Thus $A$ is Hermitian. $\blacksquare$


THEOREM 2 Suppose $A$ is an $n\times n$ Hermitian matrix. Then:

(1) all the eigenvalues of $A$ are real.
(2) if $\lambda_1$ and $\lambda_2$ are distinct eigenvalues with corresponding eigenvectors $\vec v_1$ and $\vec v_2$ respectively, then $\vec v_1$ and $\vec v_2$ are orthogonal.

From this result we expect to get something very similar to the Principal Axis Theorem for Hermitian matrices. But, instead of orthogonally diagonalizing, we now look at the complex equivalent: unitarily diagonalizing.

DEFINITION Unitarily Similar

If $A$ and $B$ are matrices such that $U^*AU = B$, where $U$ is a unitary matrix, then we say that $A$ and $B$ are unitarily similar.

If $A$ and $B$ are unitarily similar, then they are similar. Consequently, all of our properties of similarity still apply.

DEFINITION Unitarily Diagonalizable

An $n\times n$ matrix $A$ is said to be unitarily diagonalizable if it is unitarily similar to a diagonal matrix $D$.

EXAMPLE 2 Let $A = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix}$. Verify that $A$ is Hermitian and show that $A$ is unitarily diagonalizable.

Solution: We see that $A^* = \begin{bmatrix} -4 & 1-3i \\ 1+3i & 5 \end{bmatrix} = A$, so $A$ is Hermitian. We have
$$C(\lambda) = \begin{vmatrix} -4-\lambda & 1-3i \\ 1+3i & 5-\lambda \end{vmatrix} = \lambda^2 - \lambda - 30 = (\lambda+5)(\lambda-6)$$
Hence, the eigenvalues are $\lambda_1 = -5$ and $\lambda_2 = 6$. We have
$$A + 5I = \begin{bmatrix} 1 & 1-3i \\ 1+3i & 10 \end{bmatrix} \sim \begin{bmatrix} 1 & 1-3i \\ 0 & 0 \end{bmatrix}$$
So a basis for $E_{\lambda_1}$ is $\left\{ \begin{bmatrix} -1+3i \\ 1 \end{bmatrix} \right\}$. We also have
$$A - 6I = \begin{bmatrix} -10 & 1-3i \\ 1+3i & -1 \end{bmatrix} \sim \begin{bmatrix} 1 & -(1-3i)/10 \\ 0 & 0 \end{bmatrix}$$
So a basis for $E_{\lambda_2}$ is $\left\{ \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\}$. Observe that
$$\left\langle \begin{bmatrix} -1+3i \\ 1 \end{bmatrix}, \begin{bmatrix} 1-3i \\ 10 \end{bmatrix} \right\rangle = (-1+3i)(1+3i) + 1(10) = 0$$
Thus, the eigenvectors are orthogonal. Normalizing them, we get that $A$ is diagonalized by the unitary matrix
$$U = \begin{bmatrix} (-1+3i)/\sqrt{11} & (1-3i)/\sqrt{110} \\ 1/\sqrt{11} & 10/\sqrt{110} \end{bmatrix} \quad\text{to}\quad D = \begin{bmatrix} -5 & 0 \\ 0 & 6 \end{bmatrix}$$
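For Hermitian matrices, NumPy provides np.linalg.eigh, which returns real eigenvalues (in ascending order) and orthonormal eigenvectors; a sketch checking Example 2:

```python
import numpy as np

A = np.array([[-4, 1 - 3j],
              [1 + 3j, 5]])

evals, U = np.linalg.eigh(A)  # eigh assumes its input is Hermitian
print(evals)                  # approximately [-5., 6.]
print(np.allclose(U.conj().T @ A @ U, np.diag(evals)))  # True
```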


We expect that every Hermitian matrix is unitarily diagonalizable. We will prove this exactly the same way that we did in Section 10.2 for symmetric matrices: by first proving that every square matrix is unitarily similar to an upper triangular matrix. However, we now get the additional benefit of not having to restrict ourselves to real eigenvalues as we did with the Triangularization Theorem.

THEOREM 3 (Schur's Theorem)

If $A$ is an $n\times n$ matrix, then $A$ is unitarily similar to an upper triangular matrix whose diagonal entries are the eigenvalues of $A$.

Proof: We prove this by induction on $n$. If $n = 1$, then $A$ is upper triangular. Assume the result holds for all $(n-1)\times(n-1)$ matrices. Let $\lambda_1$ be an eigenvalue of $A$ with corresponding unit eigenvector $\vec z_1$. Extend $\{\vec z_1\}$ to an orthonormal basis $\{\vec z_1, \ldots, \vec z_n\}$ of $\mathbb{C}^n$ and let $U_1 = \begin{bmatrix} \vec z_1 & \cdots & \vec z_n \end{bmatrix}$. Then, $U_1$ is unitary and we have
$$U_1^*AU_1 = \begin{bmatrix} \overline{\vec z_1}^{\,T} \\ \vdots \\ \overline{\vec z_n}^{\,T} \end{bmatrix} A \begin{bmatrix} \vec z_1 & \cdots & \vec z_n \end{bmatrix} = \begin{bmatrix} \overline{\vec z_1}^{\,T}\lambda_1\vec z_1 & \overline{\vec z_1}^{\,T}A\vec z_2 & \cdots & \overline{\vec z_1}^{\,T}A\vec z_n \\ \overline{\vec z_2}^{\,T}\lambda_1\vec z_1 & \overline{\vec z_2}^{\,T}A\vec z_2 & \cdots & \overline{\vec z_2}^{\,T}A\vec z_n \\ \vdots & \vdots & & \vdots \\ \overline{\vec z_n}^{\,T}\lambda_1\vec z_1 & \overline{\vec z_n}^{\,T}A\vec z_2 & \cdots & \overline{\vec z_n}^{\,T}A\vec z_n \end{bmatrix}$$
For $1 \leq j \leq n$ we have
$$\overline{\vec z_j}^{\,T}\lambda_1\vec z_1 = \lambda_1\overline{\vec z_j}^{\,T}\vec z_1 = \lambda_1(\vec z_1 \cdot \overline{\vec z_j}) = \lambda_1\langle\vec z_1, \vec z_j\rangle$$
Since $\{\vec z_1, \ldots, \vec z_n\}$ is orthonormal, we get that $\overline{\vec z_1}^{\,T}\lambda_1\vec z_1 = \lambda_1$ and $\overline{\vec z_j}^{\,T}\lambda_1\vec z_1 = 0$ for $2 \leq j \leq n$. Hence, we can write
$$U_1^*AU_1 = \begin{bmatrix} \lambda_1 & \vec b^{\,T} \\ \vec 0 & A_1 \end{bmatrix}$$
where $A_1$ is an $(n-1)\times(n-1)$ matrix and $\vec b \in \mathbb{C}^{n-1}$. Thus, by the inductive hypothesis we get that there exists a unitary matrix $Q$ such that $Q^*A_1Q = T_1$ is upper triangular.

Let $U_2 = \begin{bmatrix} 1 & \vec 0^{\,T} \\ \vec 0 & Q \end{bmatrix}$. Then $U_2$ is clearly unitary and hence $U = U_1U_2$ is unitary. Thus,
$$U^*AU = (U_1U_2)^*A(U_1U_2) = U_2^*U_1^*AU_1U_2 = \begin{bmatrix} 1 & \vec 0^{\,T} \\ \vec 0 & Q^* \end{bmatrix}\begin{bmatrix} \lambda_1 & \vec b^{\,T} \\ \vec 0 & A_1 \end{bmatrix}\begin{bmatrix} 1 & \vec 0^{\,T} \\ \vec 0 & Q \end{bmatrix} = \begin{bmatrix} \lambda_1 & \vec b^{\,T}Q \\ \vec 0 & T_1 \end{bmatrix}$$
which is upper triangular. Since unitarily similar matrices have the same eigenvalues and the eigenvalues of a triangular matrix are its diagonal entries, we get that the eigenvalues of $A$ are along the main diagonal of $U^*AU$, as required. $\blacksquare$


REMARK

Schur's Theorem shows that for every $n\times n$ matrix $A$ there exists a unitary matrix $U$ such that $U^*AU = T$. Since $U^* = U^{-1}$, we can solve this for $A$ to get $A = UTU^*$. This is called a Schur decomposition of $A$.
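Schur decompositions can also be computed numerically; SciPy's scipy.linalg.schur with output='complex' returns $T$ and $U$ with $A = UTU^*$. A sketch, assuming SciPy is available, applied to the matrix of Example 11.3.1:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])

T, U = schur(A, output='complex')  # A = U T U*, with T upper triangular
print(np.allclose(A, U @ T @ U.conj().T))  # True
print(np.diag(T))  # the eigenvalues i and -i on the diagonal of T
```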

We can now prove the result corresponding to the Principal Axis Theorem.

THEOREM 4 (Spectral Theorem for Hermitian Matrices)

If $A$ is Hermitian, then it is unitarily diagonalizable.

Proof: By Schur's Theorem, there exists a unitary matrix $U$ and an upper triangular matrix $T$ such that $U^*AU = T$. Since $A^* = A$ we get that
$$T^* = (U^*AU)^* = U^*A^*(U^*)^* = U^*AU = T$$
Since $T$ is upper triangular, we get that $T^*$ is lower triangular. Thus, $T$ is both upper and lower triangular and hence is diagonal. Consequently, $U$ unitarily diagonalizes $A$. $\blacksquare$

In the real case, we also had that every orthogonally diagonalizable matrix was symmetric. Unfortunately, the corresponding result is not true in the complex case as the next example demonstrates.

EXAMPLE 3 Prove that $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ is unitarily diagonalizable, but not Hermitian.

Solution: Observe that $A^* = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \neq A$, so $A$ is not Hermitian.

In Example 11.3.1, we showed that $A$ is diagonalized by $P = \begin{bmatrix} -i & i \\ 1 & 1 \end{bmatrix}$. Observe that
$$\left\langle \begin{bmatrix} -i \\ 1 \end{bmatrix}, \begin{bmatrix} i \\ 1 \end{bmatrix} \right\rangle = (-i)(-i) + 1(1) = 0$$
so the columns of $P$ are orthogonal. Hence, if we normalize them, we get that $A$ is unitarily diagonalized by
$$U = \begin{bmatrix} -\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$

The matrix in Example 3 satisfies $A^* = -A$ and so is called skew-Hermitian. Thus, we see that there are matrices which are not Hermitian but whose eigenvectors form an orthonormal basis for $\mathbb{C}^n$ and hence are unitarily diagonalizable. In particular, in the proof of the Spectral Theorem for Hermitian Matrices, we can observe that the reverse direction fails because if $A$ is not Hermitian, we cannot guarantee that $T^* = T$ since some of the eigenvalues may not be real.

So, as we did in the real case, we should look for a necessary condition for a matrix to be unitarily diagonalizable.

Assume that the eigenvectors of $A$ form an orthonormal basis $\{\vec z_1, \ldots, \vec z_n\}$ for $\mathbb{C}^n$. Then, we know that $U = \begin{bmatrix} \vec z_1 & \cdots & \vec z_n \end{bmatrix}$ unitarily diagonalizes $A$. That is, we have $U^*AU = D$, where $D$ is diagonal. Hence, $D^* = U^*A^*U$. Now, notice that $DD^* = D^*D$ since $D$ and $D^*$ are diagonal. This gives
$$(U^*AU)(U^*A^*U) = (U^*A^*U)(U^*AU) \quad\Rightarrow\quad U^*AA^*U = U^*A^*AU$$
Consequently, if $A$ is unitarily diagonalizable, then we must have $AA^* = A^*A$.

DEFINITION Normal Matrix

An $n\times n$ matrix $A$ is called normal if $AA^* = A^*A$.

THEOREM 5 (Spectral Theorem for Normal Matrices)

A matrix $A$ is unitarily diagonalizable if and only if $A$ is normal.

Proof: We proved above that every unitarily diagonalizable matrix is normal. So, we just need to prove that every normal matrix is unitarily diagonalizable. Of course, we do this using Schur's Theorem.

By Schur's Theorem, there exists an upper triangular matrix $T$ and a unitary matrix $U$ such that $U^*AU = T$. We just need to prove that $T$ is in fact diagonal. Observe that
$$TT^* = (U^*AU)(U^*A^*U) = U^*AA^*U = U^*A^*AU = (U^*A^*U)(U^*AU) = T^*T$$
Hence $T$ is also normal, and if we compare the diagonal entries of $TT^*$ and $T^*T$ we get
$$\begin{aligned}
|t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2 &= |t_{11}|^2 \\
|t_{22}|^2 + \cdots + |t_{2n}|^2 &= |t_{12}|^2 + |t_{22}|^2 \\
&\ \,\vdots \\
|t_{nn}|^2 &= |t_{1n}|^2 + |t_{2n}|^2 + \cdots + |t_{nn}|^2
\end{aligned}$$
Hence, we must have $t_{\ell j} = 0$ for all $\ell \neq j$, so $T$ is diagonal as required. $\blacksquare$

As a result, normal matrices are very important. We now look at some useful properties of normal matrices.

THEOREM 6 If $A$ is an $n\times n$ normal matrix, then

(1) $\|A\vec z\| = \|A^*\vec z\|$ for all $\vec z \in \mathbb{C}^n$.
(2) $A - \lambda I$ is normal for every $\lambda \in \mathbb{C}$.
(3) If $A\vec z = \lambda\vec z$, then $A^*\vec z = \overline{\lambda}\vec z$.


Proof: For (1), observe that if $AA^* = A^*A$, then taking the complex conjugate of both sides gives $A^T\overline{A} = \overline{A}A^T$. Then
$$\|A\vec z\|^2 = \langle A\vec z, A\vec z\rangle = (A\vec z)^T\overline{A\vec z} = \vec z^{\,T}A^T\overline{A}\,\overline{\vec z} = \vec z^{\,T}\overline{A}A^T\overline{\vec z} = (A^*\vec z)^T\overline{A^*\vec z} = \langle A^*\vec z, A^*\vec z\rangle = \|A^*\vec z\|^2$$
for any $\vec z \in \mathbb{C}^n$.

For (2) we observe that
$$\begin{aligned}
(A-\lambda I)(A-\lambda I)^* &= (A-\lambda I)(A^* - \overline{\lambda}I) = AA^* - \overline{\lambda}A - \lambda A^* + |\lambda|^2I \\
&= A^*A - \overline{\lambda}A - \lambda A^* + |\lambda|^2I = (A^* - \overline{\lambda}I)(A-\lambda I) = (A-\lambda I)^*(A-\lambda I)
\end{aligned}$$

For (3), suppose that $A\vec z = \lambda\vec z$ for some $\vec z \in \mathbb{C}^n$ and let $B = A - \lambda I$. Then $B$ is normal by (2) and
$$B\vec z = (A-\lambda I)\vec z = A\vec z - \lambda\vec z = \vec 0$$
So, by (1) we get
$$0 = \|B\vec z\| = \|B^*\vec z\| = \|(A^* - \overline{\lambda}I)\vec z\| = \|A^*\vec z - \overline{\lambda}\vec z\|$$
Thus, $A^*\vec z = \overline{\lambda}\vec z$, as required. $\blacksquare$

Section 11.5 Problems

1. Unitarily diagonalize the following matrices.

(a) $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$ (b) $B = \begin{bmatrix} 4i & 1+3i \\ -1+3i & i \end{bmatrix}$ (c) $C = \begin{bmatrix} 1 & 0 & 1+i \\ 0 & 2 & 0 \\ 1-i & 0 & 0 \end{bmatrix}$

2. Prove that all the eigenvalues of a Hermitian matrix $A$ are real.

3. Prove that all the eigenvalues of a skew-Hermitian matrix $A$ are purely imaginary.

4. Let $A \in M_{n\times n}(\mathbb{C})$ satisfy $A^* = iA$.

(a) Prove that $A$ is normal.
(b) Show that every eigenvalue $\lambda$ of $A$ must satisfy $\lambda = -i\overline{\lambda}$.

5. Let $A \in M_{n\times n}(\mathbb{C})$ be normal and invertible. Prove that $B = A^*A^{-1}$ is unitary.

6. Let $A$ and $B$ be Hermitian matrices. Prove that $AB$ is Hermitian if and only if $AB = BA$.

7. Let $A = \begin{bmatrix} 2i & -2+i \\ 1-i & 3 \end{bmatrix}$.

(a) Prove that $\lambda_1 = 2+i$ is an eigenvalue of $A$.
(b) Find a unitary matrix $U$ such that $U^*AU = T$ is upper triangular.


11.6 Cayley-Hamilton Theorem

We now use Schur's Theorem to prove an important result about the relationship of a matrix with its characteristic polynomial.

THEOREM 1 (Cayley-Hamilton Theorem)

If $A \in M_{n\times n}(\mathbb{C})$, then $A$ is a root of its characteristic polynomial $C(\lambda)$.

Proof: By Schur's Theorem, there exists a unitary matrix $U$ and an upper triangular matrix $T$ such that $U^*AU = T$. Let
$$C(\lambda) = c_n\lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$$
For any $n\times n$ matrix $X$ we define
$$C(X) = c_nX^n + c_{n-1}X^{n-1} + \cdots + c_1X + c_0I$$
Observe that
$$\begin{aligned}
C(T) = C(U^*AU) &= c_n(U^*AU)^n + \cdots + c_1(U^*AU) + c_0I \\
&= c_nU^*A^nU + \cdots + c_1U^*AU + c_0U^*U \\
&= U^*(c_nA^n + \cdots + c_1A + c_0I)U \\
&= U^*C(A)U
\end{aligned}$$
Hence $C(A) = UC(T)U^*$, so we can complete the proof by showing $C(T)$ is the zero matrix.

Since the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are the diagonal entries of $T$, we have
$$C(X) = (-1)^n(X - \lambda_1I)(X - \lambda_2I)\cdots(X - \lambda_nI)$$
so
$$C(T) = (-1)^n(T - \lambda_1I)(T - \lambda_2I)\cdots(T - \lambda_nI) \qquad (11.1)$$
Observe that the first column of $T - \lambda_1I$ is zero, since the first column of $T$ is $\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \end{bmatrix}^T$. Then, since the second column of $T - \lambda_2I$ has the form $\begin{bmatrix} c & 0 & \cdots & 0 \end{bmatrix}^T$, multiplying it by $T - \lambda_1I$ gives $c$ times the zero first column of $T - \lambda_1I$; hence $(T - \lambda_1I)(T - \lambda_2I)$ has its first two columns zero. Continuing in this way, we see that all the columns of Equation (11.1) are zero, as required. $\blacksquare$


EXAMPLE 1 Show that $A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix}$ is a root of its characteristic polynomial.

Solution: Since $A$ is upper triangular, we have $C(\lambda) = (\lambda-a)(\lambda-c) = \lambda^2 - (a+c)\lambda + ac$. Then,
$$C(A) = A^2 - (a+c)A + acI = \begin{bmatrix} a^2 & ab+bc \\ 0 & c^2 \end{bmatrix} - \begin{bmatrix} a^2+ac & ab+cb \\ 0 & ac+c^2 \end{bmatrix} + \begin{bmatrix} ac & 0 \\ 0 & ac \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
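The theorem is also easy to check numerically for any particular matrix; a sketch assuming NumPy, where np.poly(A) returns the coefficients of $\det(\lambda I - A)$ from highest power to lowest (the matrix below is an arbitrary example of ours):

```python
import numpy as np

A = np.array([[1 + 1j, 2],
              [3j, 4]])

c = np.poly(A)  # [1, c1, c0]: coefficients of lambda^2 + c1*lambda + c0
C_A = np.linalg.matrix_power(A, 2) + c[1] * A + c[2] * np.eye(2)
print(np.allclose(C_A, 0))  # True: A is a root of its characteristic polynomial
```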

EXERCISE 1 Show that $A = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}$ is a root of its characteristic polynomial in two ways. First, show it directly by using the method of Example 1. Second, show it by evaluating
$$C(A) = (A - aI)(A - dI)(A - fI)$$
by multiplying from left to right as in the proof.