
Lecture Notes

on

Linear and Multilinear Algebra

2301-610

Wicharn Lewkeeratiyutkul

Department of Mathematics and Computer Science

Faculty of Science

Chulalongkorn University

August 2014


Contents

Preface

1 Vector Spaces
1.1 Vector Spaces and Subspaces
1.2 Basis and Dimension
1.3 Linear Maps
1.4 Matrix Representation
1.5 Change of Bases
1.6 Sums and Direct Sums
1.7 Quotient Spaces
1.8 Dual Spaces

2 Multilinear Algebra
2.1 Free Vector Spaces
2.2 Multilinear Maps and Tensor Products
2.3 Determinants
2.4 Exterior Products

3 Canonical Forms
3.1 Polynomials
3.2 Diagonalization
3.3 Minimal Polynomial
3.4 Jordan Canonical Forms

4 Inner Product Spaces
4.1 Bilinear and Sesquilinear Forms
4.2 Inner Product Spaces
4.3 Operators on Inner Product Spaces
4.4 Spectral Theorem

Bibliography

Preface

This book grew out of the lecture notes for the course 2301-610 Linear and Multilinear Algebra given at the Department of Mathematics, Faculty of Science, Chulalongkorn University, which I have taught for the past five years.

Linear Algebra is one of the most important subjects in Mathematics, with numerous applications in pure and applied sciences. A more theoretical linear algebra course will emphasize linear maps between vector spaces, while a more applied course will mainly work with matrices. Matrices have the advantage of being easier to compute with, while it is often easier to establish results by working with linear maps. This book tries to establish a close connection between these two aspects of the subject.

I would like to thank my students who took this course with me and proofread the early drafts. Special thanks go to Chao Kusollerschariya, who provided technical help with LaTeX and suggested several easier proofs, and Detchat Samart, who supplied the proofs on polynomials.

Please do not distribute.

Wicharn Lewkeeratiyutkul



Chapter 1

Vector Spaces

In this chapter, we will study an abstract theory of vector spaces and linear maps between vector spaces. A vector space is a generalization of the space of vectors in the 2- or 3-dimensional Euclidean space. We can add two vectors and multiply a vector by a scalar. In a general framework, we can still add vectors, but the scalars don’t have to be numbers; they are required to satisfy some algebraic properties which constitute a field. A vector space is defined to be a non-empty set that satisfies certain axioms that generalize the addition and scalar multiplication of vectors in R2 and R3. This will allow our theory to be applicable to a wide range of situations.

Once we have some vector spaces, we can construct new vector spaces from existing ones by taking subspaces, direct sums and quotient spaces. We then introduce a basis for a vector space, which can be regarded as choosing a coordinate system. Once we fix a basis for the vector space, every other element can be written uniquely as a linear combination of elements in the basis.

We also study linear maps between vector spaces. A linear map is a function that preserves the vector space operations. If we fix bases for vector spaces V and W, a linear map from V into W can be represented by a matrix. This allows for the computational aspects of the theory. The set of all linear maps between two vector spaces is itself a vector space. The case where the target space is the scalar field is of particular importance; the resulting space is called the dual space of the vector space.


1.1 Vector Spaces and Subspaces

Definition 1.1.1. A field is a set F with two binary operations, + and ·, and two distinct elements 0 and 1, satisfying the following properties:

(i) ∀x, y, z ∈ F , (x+ y) + z = x+ (y + z);

(ii) ∀x ∈ F , x+ 0 = 0 + x = x;

(iii) ∀x ∈ F ∃y ∈ F , x+ y = y + x = 0;

(iv) ∀x, y ∈ F , x+ y = y + x;

(v) ∀x, y, z ∈ F , (x · y) · z = x · (y · z);

(vi) ∀x ∈ F , x · 1 = 1 · x = x;

(vii) ∀x ∈ F − {0} ∃y ∈ F , x · y = y · x = 1;

(viii) ∀x, y ∈ F , x · y = y · x;

(ix) ∀x, y, z ∈ F , x · (y + z) = x · y + x · z.

Properties (i)-(iv) say that (F, +) is an abelian group. Properties (v)-(viii) say that (F − {0}, ·) is an abelian group. Property (ix) is the distributive law for the multiplication over addition.

Example 1.1.2. Q, R, C, Zp, where p is a prime number, are fields.
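For Zp, the only axiom that is not immediate is (vii), the existence of multiplicative inverses. The following short Python sketch (an illustration, not part of the original notes) tests this axiom by brute force and shows where it fails for a composite modulus:

```python
# Axiom (vii) for Z_n: every nonzero class has a multiplicative inverse.
# This holds exactly when n is prime.
def has_all_inverses(n: int) -> bool:
    return all(
        any((x * y) % n == 1 for y in range(1, n))
        for x in range(1, n)
    )

print(has_all_inverses(7))   # True: Z_7 is a field
print(has_all_inverses(6))   # False: e.g. 2 has no inverse mod 6
```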

Definition 1.1.3. A vector space over a field F is a non-empty set V , together with an addition + : V × V → V and a scalar multiplication · : F × V → V , satisfying the following properties:

(i) ∀u, v, w ∈ V , (u+ v) + w = u+ (v + w);

(ii) ∃0 ∈ V ∀v ∈ V , v + 0 = 0 + v = v;

(iii) ∀v ∈ V ∃ − v ∈ V , v + (−v) = (−v) + v = 0;

(iv) ∀u, v ∈ V , u+ v = v + u;

(v) ∀m,n ∈ F ∀v ∈ V , (m+ n) · v = m · v + n · v;


(vi) ∀m ∈ F ∀u, v ∈ V , m · (u+ v) = m · u+m · v;

(vii) ∀m,n ∈ F ∀v ∈ V , (m · n) · v = m · (n · v);

(viii) ∀v ∈ V , 1 · v = v.

Proposition 1.1.4. Let V be a vector space over a field F . Then

(i) ∀v ∈ V , 0 · v = 0;

(ii) ∀k ∈ F , k · 0 = 0;

(iii) ∀v ∈ V ∀k ∈ F , k · v = 0 ⇔ k = 0 or v = 0;

(iv) ∀v ∈ V , (−1) · v = −v.

Proof. Let v ∈ V and k ∈ F . Then

(i) 0 · v + v = 0 · v + 1 · v = (0 + 1) · v = 1 · v = v. Hence 0 · v = 0.

(ii) k · 0 = k · (0 · 0) = (k · 0) · 0 = 0 · 0 = 0.

(iii) If k · v = 0 and k ≠ 0, then

v = 1 · v = ((1/k) · k) · v = (1/k) · (k · v) = (1/k) · 0 = 0.

(iv) (−1) · v+ v = (−1) · v+ 1 · v = (−1 + 1) · v = 0 · v = 0. Hence (−1) · v = −v.

Remark. When there is no confusion, we will denote the additive identity 0V of a vector space simply by 0.

Example 1.1.5. The following sets with the given addition and scalar multiplication are vector spaces.

(i) The set of n-tuples whose entries are in F :

Fn = {(x1, x2, . . . , xn) | xi ∈ F, i = 1, 2, . . . , n},

with the addition and scalar multiplication given by

(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),

k(x1, . . . , xn) = (kx1, . . . , kxn).

Page 10: Lecture Notes on Linear and Multilinear Algebra 2301-610pioneer.netserv.chula.ac.th/~lwicharn/materials/notes610.pdf · This book grew out of the lecture notes for the course 2301-610

4 CHAPTER 1. VECTOR SPACES

(ii) The set of m× n matrices whose entries are in F :

Mm×n(F ) = {[aij ]m×n | aij ∈ F, i = 1, 2, . . . ,m; j = 1, 2, . . . , n},

with the usual matrix addition and scalar multiplication. Note that if m = n, we write Mn(F ) for Mn×n(F ).

(iii) The set of polynomials over F :

F [x] = {a0 + a1x + · · · + anx^n | n ∈ N ∪ {0}, ai ∈ F, i = 0, 1, . . . , n},

with the usual polynomial addition and scalar multiplication.

(iv) The set of sequences over F :

S = {(xn) | xn ∈ F for all n ∈ N},

with the addition and scalar multiplication given by

(xn) + (yn) = (xn + yn),

k(xn) = (kxn).

Here we are not concerned with convergence of the sequences.

(v) Let X be a non-empty set. The set of F -valued functions on X

F(X) = {f : X → F}

with the following addition and scalar multiplication:

(f + g)(x) = f(x) + g(x) (x ∈ X),

(kf)(x) = kf(x) (x ∈ X).
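To make example (v) concrete, here is a minimal Python sketch (illustrative only) of the pointwise operations on F(X), taking F = R:

```python
# Pointwise operations on F(X) = {f : X -> F}, here with F = R.
def add(f, g):
    return lambda x: f(x) + g(x)

def smul(k, f):
    return lambda x: k * f(x)

f = lambda x: x ** 2
g = lambda x: 3 * x
h = add(f, smul(2.0, g))   # the function h = f + 2g
print(h(1.0))              # 1 + 2*3 = 7.0
```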

Once we have some vector spaces to begin with, there are several methods to construct new vector spaces from the old ones. We first consider a subset which is also a vector space under the same operations.

Definition 1.1.6. Let V be a vector space over a field F . A subspace of V is a subset of V which is also a vector space over F under the same operations. We write W ≤ V to denote that W is a subspace of V .


Proposition 1.1.7. Let W be a non-empty subset of a vector space V over a field F . Then the following statements are equivalent:

(i) W is a subspace of V ;

(ii) ∀v ∈W ∀w ∈W , v + w ∈W and ∀v ∈W ∀k ∈ F, kv ∈W ;

(iii) ∀v ∈W ∀w ∈W ∀α ∈ F ∀β ∈ F , αv + βw ∈W .

Proof. We will establish (i) ⇔ (ii) and (ii) ⇔ (iii).

(i) ⇒ (ii). Assume W is a subspace of V . Then W is a vector space over the field F under the restriction of the addition and the scalar multiplication to W . Hence v + w ∈ W and kv ∈ W for any v, w ∈ W and any k ∈ F .

(ii) ⇒ (i). Assume (ii) holds. Since the axioms of a vector space hold for all elements of V , they hold for the elements of W as well. Since W is non-empty, we can choose an element v ∈ W . Then 0 = 0 · v ∈ W . Moreover, for any v ∈ W , −v = (−1) · v ∈ W . This shows that W contains the additive identity and the additive inverse of each of its elements, so W is a vector space under the restricted operations.

(ii) ⇒ (iii). Let v, w ∈ W and α, β ∈ F . Then αv ∈ W and βw ∈ W , which implies αv + βw ∈ W .

(iii) ⇒ (ii). Assume (iii) holds. Then for any v, w ∈ W , v + w = 1 · v + 1 · w ∈ W , and for any v ∈ W and any k ∈ F , kv = k · v + 0 · v ∈ W .

Example 1.1.8.

(i) {0} and V are subspaces of a vector space V .

(ii) [0, 1] and [0,∞) are not subspaces of R.

(iii) Let F be a field and define

(a) W1 = the set of upper triangular matrices in Mn(F ),

(b) W2 = the set of lower triangular matrices in Mn(F ), and

(c) W3 = the set of nonsingular matrices in Mn(F ).

Then W1 and W2 are subspaces of Mn(F ), but W3 is not a subspace of Mn(F ).


(iv) Let Pn = {p ∈ F [x] : deg p ≤ n} ∪ {0}. Then Pn is a subspace of F [x], but the set {p ∈ F [x] : deg p = n} ∪ {0} is not a subspace of F [x].

(v) By Example 1.1.5 (v), the set of real-valued functions F([a, b]) is a vector space over R. Now let

C([a, b]) = {f : [a, b] → R | f is continuous}.

Then C([a, b]) is a subspace of F([a, b]). This follows from the standard facts from calculus that a sum of two continuous functions is continuous and that a scalar multiple of a continuous function is continuous.

(vi) Let S be the sequence space in Example 1.1.5 (iv). The following subsets are subspaces of S:

ℓ1 = {(xn) ∈ S : ∑_{n=1}^∞ |xn| < ∞},

ℓ∞ = {(xn) ∈ S : sup_{n∈N} |xn| < ∞},

c = {(xn) ∈ S : lim_{n→∞} xn exists}.

These subspaces play an important role and will be studied in greater detail in functional analysis.

Proposition 1.1.9. Let V be a vector space and suppose Wα is a subspace of V for each α ∈ Λ. Then ⋂_{α∈Λ} Wα is a subspace of V .

Proof. Since 0 ∈ Wα for each α ∈ Λ, we have 0 ∈ ⋂_{α∈Λ} Wα. Thus ⋂_{α∈Λ} Wα ≠ ∅. Next, let w1, w2 ∈ ⋂_{α∈Λ} Wα and k1, k2 ∈ F . Then w1, w2 ∈ Wα for each α ∈ Λ. Hence k1w1 + k2w2 ∈ Wα for each α ∈ Λ, i.e. k1w1 + k2w2 ∈ ⋂_{α∈Λ} Wα.

Proposition 1.1.10. Let S be a subset of a vector space V . Then there is a smallest subspace of V containing S.

Proof. Define T = {W ≤ V | S ⊆ W}. Then T ≠ ∅ because V ∈ T . Let U = ⋂_{W∈T} W . Then U is a subspace of V containing S. If W∗ is a subspace of V containing S, then W∗ ∈ T , which implies U = ⋂_{W∈T} W ⊆ W∗. This shows that U is the smallest subspace of V containing S.


Definition 1.1.11. Let S be a subset of a vector space V . Then we call the smallest subspace containing S the subspace of V generated by S, or the subspace of V spanned by S, or simply the span of S, denoted by spanS or 〈S〉.

If 〈S〉 = V , we say that V is spanned by S, or that S spans V .

Proposition 1.1.12. Let S and T be subsets of a vector space V . Then

(i) 〈∅〉 = {0};

(ii) S ⊆ 〈S〉;

(iii) S ⊆ T ⇒ 〈S〉 ⊆ 〈T 〉;

(iv) 〈〈S〉〉 = 〈S〉.

Proof. (i) Clearly, {0} is the smallest subspace of V containing ∅.

(ii) This follows from the definition of 〈S〉.

(iii) Assume S ⊆ T . Since T ⊆ 〈T 〉, we have S ⊆ 〈T 〉. Thus 〈T 〉 is a subspace of V containing S. But 〈S〉 is the smallest subspace of V containing S. Hence 〈S〉 ⊆ 〈T 〉.

(iv) Since S ⊆ 〈S〉, by (iii), 〈S〉 ⊆ 〈〈S〉〉. On the other hand, 〈〈S〉〉 is the smallest subspace of V containing 〈S〉. But 〈S〉 is itself a subspace of V containing 〈S〉. It follows that 〈〈S〉〉 ⊆ 〈S〉. Hence 〈〈S〉〉 = 〈S〉.

Definition 1.1.13. Let v1, . . . , vn ∈ V and k1, . . . , kn ∈ F . Then the element k1v1 + · · · + knvn is called a linear combination of v1, . . . , vn with coefficients k1, . . . , kn, respectively.

Theorem 1.1.14. If S ⊆ V , then 〈S〉 = the set of linear combinations of elements in S.

Proof. Let W be the set of linear combinations of elements in S. For any s ∈ S, s is a linear combination of elements in S, namely s = 1 · s. Hence S ⊆ W . Let v, w ∈ W and k, ℓ ∈ F . Then there exist v1, . . . , vn, w1, . . . , wm in S and scalars α1, . . . , αn, β1, . . . , βm, for some m,n ∈ N, such that

v = α1v1 + · · · + αnvn and w = β1w1 + · · · + βmwm.

It follows that

kv + ℓw = (kα1)v1 + · · · + (kαn)vn + (ℓβ1)w1 + · · · + (ℓβm)wm.

Thus kv + ℓw is a linear combination of elements in S. This shows that W is a subspace of V containing S. Hence 〈S〉 ⊆ W . On the other hand, let v ∈ W . Then there exist v1, . . . , vn ∈ S and α1, . . . , αn ∈ F , for some n ∈ N, such that v = α1v1 + · · · + αnvn. Since each vi ∈ S ⊆ 〈S〉 and 〈S〉 is a subspace of V , we have v = ∑_{i=1}^n αivi ∈ 〈S〉. Hence W ⊆ 〈S〉. We conclude that 〈S〉 = W .
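For V = Rm and a finite set S, Theorem 1.1.14 makes membership in 〈S〉 a concrete computation: v ∈ 〈S〉 exactly when the linear system with the elements of S as columns is consistent. A small numpy sketch (illustrative, assuming real scalars):

```python
import numpy as np

# v is in the span of the columns of A iff A @ x = v has a solution,
# i.e. iff appending v as a column does not increase the rank.
def in_span(A: np.ndarray, v: np.ndarray) -> bool:
    return np.linalg.matrix_rank(np.column_stack([A, v])) == np.linalg.matrix_rank(A)

A = np.column_stack([[1.0, 2.0, 5.0], [0.0, -1.0, 1.0]])  # two vectors in R^3
print(in_span(A, np.array([1.0, 1.0, 6.0])))   # True: the sum of the columns
print(in_span(A, np.array([0.0, 0.0, 1.0])))   # False
```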

Example 1.1.15.

(i) Let V = Fn, where F is a field. Let

e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1).

Then span{e1, e2, . . . , en} = Fn. Indeed, any element (x1, . . . , xn) ∈ Fn can be written as the linear combination x1e1 + · · · + xnen.

(ii) Let V = Mm×n(F ). For i = 1, . . . ,m and j = 1, . . . , n, let Eij be the m × n matrix whose (i, j)-entry is 1 and 0 otherwise. Then Mm×n(F ) is spanned by {Eij | i = 1, . . . ,m, j = 1, . . . , n}. For, if A = [aij ] ∈ Mm×n(F ), then

A = ∑_{i=1}^m ∑_{j=1}^n aij Eij .

(iii) The set of monomials {1, x, x^2, x^3, . . . } spans the vector space F [x] because any polynomial in F [x] can be written as a linear combination of monomials.

(iv) Let S = {(xn) | xn ∈ F for all n ∈ N} be the vector space of sequences in F . For each k ∈ N, let

ek = (0, . . . , 0, 1, 0, 0, . . . ),

where ek has 1 in the k-th coordinate and 0’s elsewhere. Then {ek}_{k=1}^∞ does not span S. For example, the sequence (1, 1, 1, . . . ) cannot be written as a linear combination of the ek’s. In fact,

span {ek}_{k=1}^∞ = {(xn) ∈ S | xn = 0 for all but finitely many n’s}.

We leave this as an exercise.


Exercises

1.1.1. Determine which of the following subsets of R4 are subspaces of R4.

(i) U = {(a, b, c, d) | a + 2b = c + 2d}.

(ii) V = {(a, b, c, d) | a + 2b = c + 2d + 1}.

(iii) W = {(a, b, c, d) | ab = cd}.

1.1.2. Let A be an m× n matrix.

(i) Show that {b ∈ Rm | Ax = b for some x ∈ Rn} is a subspace of Rm.

(ii) Show that {x ∈ Rn | Ax = 0} is a subspace of Rn.

(iii) Let b ≠ 0 be an element in Rm. Verify whether {x ∈ Rn | Ax = b} is a subspace of Rn.

1.1.3. Verify whether the following sets are subspaces of Mn(R).

(i) {A ∈ Mn(R) | detA = 0};

(ii) {A ∈ Mn(R) | At = A};

(iii) {A ∈ Mn(R) | At = −A}.

1.1.4. Let W1 and W2 be subspaces of a vector space V . Prove that W1 ∪ W2 is a subspace of V if and only if W1 ⊆ W2 or W2 ⊆ W1.

1.1.5. Let V be a vector space over an infinite field F . Prove that V cannot be written as a finite union of its proper subspaces.

1.1.6. Let S = {(xn) | xn ∈ F for all n ∈ N}. Define

f = {(xn) ∈ S | xn = 0 for all but finitely many n’s}.

Prove that f = {(xn) ∈ S | ∃N ∈ N ∀n ≥ N, xn = 0} and that f is a subspace of S spanned by {ek}_{k=1}^∞, where ek is defined in Example 1.1.15 (iv).

1.1.7. An abelian group 〈V,+〉 is called divisible if for any non-zero integer n, nV = V , i.e. if for every u ∈ V and for any non-zero integer n, there exists v ∈ V such that u = nv.

Prove that an abelian group 〈V,+〉 is a vector space over Q if and only if V is divisible and all of its non-zero elements are of infinite order.


1.2 Basis and Dimension

Definition 1.2.1. Let V be a vector space over a field F and S a subset of V . We say that S is linearly dependent if there exist distinct elements v1, . . . , vn ∈ S and scalars k1, . . . , kn ∈ F , not all zero, such that k1v1 + · · · + knvn = 0.

We say that S is linearly independent if S is not linearly dependent. In other words, S is linearly independent if and only if for any distinct elements v1, . . . , vn ∈ S and any k1, . . . , kn ∈ F ,

k1v1 + · · · + knvn = 0 ⇒ k1 = · · · = kn = 0.

Remark.

(i) ∅ is linearly independent.

(ii) If 0 ∈ S, then S is linearly dependent.

(iii) If S ⊆ T and T is linearly independent, then S is linearly independent.

Example 1.2.2.

(i) Let V = Fn, where F is a field. Let e1, . . . , en be the coordinate vectors defined in Example 1.1.15 (i). Then {e1, . . . , en} is linearly independent. To see this, let α1, . . . , αn ∈ F be such that α1e1 + · · · + αnen = 0. Then

(0, . . . , 0) = α1e1 + · · · + αnen = (α1, . . . , αn).

Hence α1 = · · · = αn = 0.

(ii) Let V = Mm×n(F ). For i = 1, . . . ,m and j = 1, . . . , n, let Eij be defined as in Example 1.1.15 (ii). Then {Eij | i = 1, . . . ,m, j = 1, . . . , n} is linearly independent.

(iii) Let V = F [x]. Then the set {1, x, x^2, x^3, . . . } is linearly independent. This follows from the fact that a polynomial a0 + a1x + · · · + anx^n = 0 if and only if ai = 0 for i = 0, 1, . . . , n.

(iv) Let S be the vector space of sequences in F defined in Example 1.1.15 (iv). For each k ∈ N, let ek be the k-th coordinate sequence. Then {ek}_{k=1}^∞ is linearly independent in S. We leave it to the reader to verify this fact.


(v) Let V = C([0, 1]), the space of continuous real-valued functions defined on [0, 1]. Let f(x) = 2^x and g(x) = 3^x for any x ∈ [0, 1]. Then {f, g} is linearly independent in C([0, 1]). Indeed, let α, β ∈ R be such that αf + βg = 0. Then α2^x + β3^x = 0 for any x ∈ [0, 1]. If x = 0, α + β = 0. If x = 1, 2α + 3β = 0. Solving these equations, we obtain α = β = 0.

(vi) If V is a vector space over fields F1 and F2 and S ⊆ V , then it is possible that a subset S of V is linearly independent over F1, but linearly dependent over F2. For example, let V = R, F1 = Q, F2 = R and S = {1, √2}. Then S is linearly dependent over R: (−√2) · 1 + 1 · √2 = 0. On the other hand, suppose α, β ∈ Q are such that α · 1 + β · √2 = 0. If β ≠ 0, then α/β = −√2, which is a contradiction. Hence β = 0, which implies α = 0. This shows that S is linearly independent over Q.
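For finitely many vectors in Rn, linear independence over R can also be tested numerically: the vectors are independent exactly when the matrix having them as columns has rank equal to the number of vectors. A numpy sketch (illustrative):

```python
import numpy as np

def is_independent(vectors) -> bool:
    # Independent iff the column rank equals the number of vectors.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
print(is_independent([e1, e2]))            # True
print(is_independent([e1, e2, e1 + e2]))   # False: third is a combination
```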

Theorem 1.2.3. Let S be a linearly independent subset of a vector space V . Then ∀v ∈ V − S, v ∉ 〈S〉 ⇔ S ∪ {v} is linearly independent.

Proof. Let v ∈ V − S be such that v ∉ 〈S〉. To show that S ∪ {v} is linearly independent, let v1, . . . , vn ∈ S and k1, . . . , kn, k ∈ F be such that

k1v1 + k2v2 + · · · + knvn + kv = 0.

Then kv = −k1v1 − k2v2 − · · · − knvn. If k ≠ 0, we have

v = −(k1/k)v1 − (k2/k)v2 − · · · − (kn/k)vn ∈ 〈S〉,

which is a contradiction. It follows that k = 0 and that k1v1 + · · · + knvn = 0. By linear independence of S, we also have k1 = · · · = kn = 0. Hence S ∪ {v} is linearly independent.

On the other hand, let v ∈ V be such that v ∈ 〈S〉 and v ∉ S. Then there exist v1, . . . , vn ∈ S and k1, . . . , kn ∈ F such that v = k1v1 + · · · + knvn. Hence k1v1 + · · · + knvn + (−1)v = 0. Thus S ∪ {v} is linearly dependent.

Definition 1.2.4. A subset S of a vector space V is called a basis for V if

(i) S spans V, and

(ii) S is linearly independent.


Example 1.2.5.

(i) The following set of coordinate vectors is a basis for Fn:

{(1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1)}.

It is called the standard basis for Fn.

(ii) For i = 1, . . . ,m and j = 1, . . . , n, let Eij be an m × n matrix whose (i, j)-entry is 1 and 0 otherwise. Then {Eij | i = 1, . . . ,m, j = 1, . . . , n} is a basis for Mm×n(F ).

(iii) The set of monomials {1, x, x^2, x^3, . . . } is a basis for F [x].

Theorem 1.2.6. Let B be a basis for a vector space V . Then any element in V can be written uniquely as a linear combination of elements in B.

Proof. Since B spans V , any element in V can be written as a linear combination of elements in B. We have to show that it can be written in a unique way. Let v ∈ V . Assume that

v = ∑_{i=1}^n αivi = ∑_{j=1}^m βjwj ,

where αi, βj ∈ F and vi, wj ∈ B for i = 1, . . . , n and j = 1, . . . ,m, for some m,n ∈ N. Without loss of generality, assume that vi = wi for i = 1, . . . , k and {vk+1, . . . , vn} ∩ {wk+1, . . . , wm} = ∅. Then we have

∑_{i=1}^k (αi − βi)vi + ∑_{i=k+1}^n αivi + ∑_{j=k+1}^m (−βj)wj = 0.

By linear independence of {v1, . . . , vk, vk+1, . . . , vn, wk+1, . . . , wm} ⊆ B, we have

αi − βi = 0 for i = 1, . . . , k;

αi = 0 for i = k + 1, . . . , n;

−βj = 0 for j = k + 1, . . . ,m.

Hence m = n = k and v is written uniquely as the linear combination ∑_{i=1}^n αivi.

Next, we give an alternative definition of a basis for a vector space.


Theorem 1.2.7. Let B be a subset of a vector space V . Then B is a basis for V if and only if B is a maximal linearly independent subset of V .

Proof. Let B be a basis for V and let C be a linearly independent subset of V such that B ⊆ C. Assume that B ≠ C. Then there is an element v ∈ C such that v ∉ B. Hence B ∪ {v} is still linearly independent, being a subset of C. By Theorem 1.2.3, v ∉ 〈B〉, which is a contradiction since 〈B〉 = V . Hence B = C.

Conversely, let B be a maximal linearly independent subset of V . Suppose that 〈B〉 ≠ V . Then there exists v ∈ V such that v ∉ 〈B〉. By Theorem 1.2.3 again, B ∪ {v} is linearly independent, contradicting the maximality of B. Hence B spans V . It follows that B is a basis for V .

Next we show that every vector space has a basis. The proof requires the Axiom of Choice, in the equivalent form of Zorn’s Lemma, which we recall first.

Theorem 1.2.8 (Zorn’s Lemma). If S is a partially ordered set in which every totally ordered subset has an upper bound, then S has a maximal element.

Theorem 1.2.9. In a vector space, every linearly independent set can be extended to a basis. In particular, every vector space has a basis.

Proof. The second statement follows from the first one by noting that the empty set is a linearly independent set and thus can be extended to a basis. To prove the first statement, let S0 be a linearly independent set in a vector space V . Set

E = {S ⊆ V | S is linearly independent and S0 ⊆ S}.

Then E is partially ordered by inclusion. Let E′ = {Sα}α∈Λ be a totally ordered subset of E . Let T = ⋃_{α∈Λ} Sα. Clearly, S0 ⊆ T ⊆ V . To establish linear independence of T , let v1, . . . , vn ∈ T and k1, . . . , kn ∈ F be such that ∑_{i=1}^n kivi = 0. Since each vi belongs to some Sαi and E′ is totally ordered, there must be an Sβ in E′ such that vi ∈ Sβ for i = 1, . . . , n. Since Sβ is linearly independent, we have ki = 0 for i = 1, . . . , n. This shows that T is an upper bound of E′ in E .

By Zorn’s Lemma, E has a maximal element S∗. Thus S∗ is a linearly independent set containing S0. If there is v ∉ 〈S∗〉, then by Theorem 1.2.3, S∗ ∪ {v} is a linearly independent set containing S0. This contradicts the maximality of S∗. Hence 〈S∗〉 = V , which implies that S∗ is a basis for V .


The proof of the existence of a basis above relies on Zorn’s Lemma, which is equivalent to the Axiom of Choice. Any proof that requires the Axiom of Choice is nonconstructive: it gives existence without explaining how to find the object. In this situation, it implies that every vector space contains a basis, but it does not tell us how to construct one. If the vector space is finitely generated, i.e. spanned by a finite set, then we can construct a basis from the spanning set. In general, we know that a vector space has a basis but we may not be able to exhibit one. For example, consider the vector space S = {(xn) | xn ∈ F for all n ∈ N}. We have seen that the set of coordinate sequences {ek}_{k=1}^∞ is a linearly independent subset of S and hence can be extended to a basis for S, but we do not have an explicit description of such a basis.

A basis for a vector space is not unique. However, any two bases for the same vector space have the same cardinality. We begin by proving the following theorem.

Theorem 1.2.10. For any vector space V , if V has a spanning set with n elements, then any subset of V with more than n elements is linearly dependent.

Proof. We will prove the statement by induction on n.

Case n = 1: Assume that V = span{v}. Let R be a subset of V with at least two elements. Choose x, y ∈ R with x ≠ y. Then there exist α, β ∈ F such that x = αv and y = βv. Hence βx − αy = 0. Since x ≠ y, α and β cannot both be zero. This shows that R is linearly dependent.

Assume that the statement holds for n − 1. Let V be a vector space with a spanning set S = {v1, . . . , vn}. Let R = {x1, . . . , xm} be a subset of V where m > n. Each xi ∈ R can be written as a linear combination of v1, . . . , vn:

xi = ∑_{j=1}^n aijvj , i = 1, 2, . . . ,m. (1.1)

We examine the scalars ai1 that multiply v1 and split the proof into two cases.

Case 1: ai1 = 0 for i = 1, . . . ,m. In this case, the sums in (1.1) do not involve v1. Let W = span{v2, . . . , vn}. Then W is spanned by a set with n − 1 elements, R ⊆ W and |R| = m > n > n − 1. It follows from the induction hypothesis that R is linearly dependent.

Case 2: ai1 ≠ 0 for some i. By renumbering if necessary, we assume that a11 ≠ 0. Consider i = 1 in (1.1):

x1 = ∑_{j=1}^n a1jvj .

Multiplying both sides by ci = ai1/a11, for i = 2, . . . ,m, we have

cix1 = ai1v1 + ∑_{j=2}^n cia1jvj , i = 2, . . . ,m. (1.2)

Subtract (1.1) from (1.2):

cix1 − xi = ∑_{j=2}^n (cia1j − aij)vj , i = 2, . . . ,m.

Let W = span{v2, . . . , vn}. The m − 1 vectors cix1 − xi, i = 2, . . . ,m, lie in W , which is spanned by n − 1 elements, and m − 1 > n − 1. By the induction hypothesis, this family is linearly dependent. Hence there exist α2, . . . , αm ∈ F , not all zero, such that

(∑_{i=2}^m αici)x1 − ∑_{i=2}^m αixi = ∑_{i=2}^m αi(cix1 − xi) = 0.

This implies that R = {x1, . . . , xm} is linearly dependent.

Corollary 1.2.11. If V has finite bases B and C, then |B| = |C|.

Proof. From the above theorem, if B spans V and C is linearly independent, then |C| ≤ |B|. By reversing the roles of B and C, we have |B| ≤ |C|. Hence |B| = |C|.

In fact, the above corollary is true if V has infinite bases too, but the proof requires arguments involving infinite cardinal numbers, which is beyond the scope of this book, so we state it as a fact below and omit the proof.

Theorem 1.2.12. All bases for a vector space have the same cardinality.

Definition 1.2.13. Let V be a vector space over a field F . The cardinality of a basis for V is called the dimension of V , denoted by dimV . If dimV < ∞ (i.e. V has a finite basis), we say that V is finite-dimensional. Otherwise, we say that V is infinite-dimensional.


Example 1.2.14.

(i) dim({0}) = 0.

(ii) dimFn = n.

(iii) dim(Mm×n(F )) = mn.

(iv) dim(F [x]) = ∞.

Proposition 1.2.15. Let V be a vector space. If W is a subspace of V , then dimW ≤ dimV .

Proof. Let B be a basis for W . Then B is a linearly independent subset of V , and hence can be extended to a basis C for V . Thus dimW = |B| ≤ |C| = dimV .

Corollary 1.2.16. Let V be a finite-dimensional vector space. If B is a linearly independent subset of V such that |B| = dimV , then B is a basis for V .

Proof. Let B be a linearly independent subset of V such that |B| = dimV . Suppose B is not a basis for V . Then B can be extended to a basis C for V with B ⊊ C. But then |C| = dimV = |B|, which is impossible since B is a proper subset of the finite set C.

Corollary 1.2.17. Let V be a finite-dimensional vector space and W a subspace of V . If dimW = dimV , then W = V .

Proof. Let B be a basis for W . Then |B| = dimW = dimV . By Corollary 1.2.16, B is a basis for V . Hence W = 〈B〉 = V .


Exercises

1.2.1. Prove that {1, √2, √3} is linearly independent over Q, but linearly dependent over R.

1.2.2. Prove that {sinx, cosx} is a linearly independent subset of C([0, π]).

1.2.3. Prove that R is an infinite-dimensional vector space over Q.

1.2.4. If {u, v, w} is a basis for a vector space V , show that {u + v, v + w, w + u} is a basis for V .

1.2.5. Let A and B be linearly independent subsets of a vector space V such that A ∩ B = ∅. Show that A ∪ B is linearly independent if and only if 〈A〉 ∩ 〈B〉 = {0}.

1.2.6. Prove the converse of Theorem 1.2.6: if B is a subset of a vector space V over a field F such that every element in V can be written uniquely as a linear combination of elements in B, then B is a basis for V .

1.2.7. Let V be a vector space over a field F and S ⊆ V with |S| ≥ 2. Show that S is linearly dependent if and only if some element of S can be written as a linear combination of the other elements in S.

1.2.8. Let S be a subset of a vector space V . Show that S is a basis for V if and only if S is a minimal spanning subset of V .

1.2.9. Let S be a spanning subset of a vector space V . Show that there is a subset B of S which is a basis for V .

1.2.10. Let V be a finite-dimensional vector space with dimension n. Let S ⊆ V . Prove that

(i) if |S| < n, then S does not span V ;

(ii) if |S| = n and V is spanned by S, then S is a basis for V .


1.3 Linear Maps

In this section, we study a function between vector spaces that preserves the vector space operations.

Definition 1.3.1. Let V and W be vector spaces over a field F . A function T : V → W is called a linear map or a linear transformation if

(i) T (u+ v) = T (u) + T (v) for any u, v ∈ V ;

(ii) T (kv) = k T (v) for any v ∈ V and k ∈ F .

We can combine conditions (i) and (ii) into a single condition as follows:

Proposition 1.3.2. Let V and W be vector spaces over a field F . A function T : V → W is linear if and only if

T (αu + βv) = αT (u) + βT (v) for any u, v ∈ V and α, β ∈ F .

Proof. Assume that T : V →W is linear. Then for any u, v ∈ V and α, β ∈ F ,

T (αu+ βv) = T (αu) + T (βv) = αT (u) + βT (v).

Conversely, assume the displayed condition holds. Then for any u, v ∈ V ,

T (u+ v) = T (1 · u+ 1 · v) = 1 · T (u) + 1 · T (v) = T (u) + T (v)

and for any v ∈ V and any k ∈ F ,

T (kv) = T (k · v + 0 · v) = kT (v) + 0T (v) = kT (v).

Hence T is linear.

The above proposition says that a linear map preserves a linear combination of two elements. We can apply mathematical induction to show that it preserves any linear combination of elements in a vector space.

Corollary 1.3.3. Let T : V →W be a linear map. Then

T (α1v1 + · · ·+ αnvn) = α1T (v1) + · · ·+ αnT (vn),

for any n ∈ N, any v1, . . . , vn ∈ V and any α1, . . . , αn ∈ F .

Page 25: Lecture Notes on Linear and Multilinear Algebra 2301-610pioneer.netserv.chula.ac.th/~lwicharn/materials/notes610.pdf · This book grew out of the lecture notes for the course 2301-610

1.3. LINEAR MAPS 19

Proposition 1.3.4. If T : V → W is a linear map, then T (0) = 0.

Proof. T (0) = T (0 · 0) = 0 · T (0) = 0.

Example 1.3.5. The following functions are examples of linear maps.

(i) The zero map T : V → W defined by T (v) = 0 for all v ∈ V . The zero map will be denoted by 0.

(ii) The identity map IV : V → V defined by IV (v) = v for all v ∈ V .

(iii) Let A ∈Mm×n(F ). Define LA : Fn → Fm by

LA(x) = Ax for any x ∈ Fn,

where x is represented as an n× 1 matrix.

(iv) Define D : F [x] → F [x] by

D(a0 + a1x + · · · + anx^n) = a1 + 2a2x + · · · + nanx^{n−1}.

The map D is the “formal” differentiation of polynomials. We may denote D(f) by f ′. The linearity of D can be written as (f + g)′ = f ′ + g′ and (kf)′ = k f ′ for any f, g ∈ F [x] and k ∈ F .

(v) Define T : C([a, b]) → R by

T (f) = ∫_a^b f(x) dx for any f ∈ C([a, b]).

The linearity of T follows from properties of the Riemann integral.

(vi) Let S denote the set of all sequences in F . Define R, L : S → S by

R((x1, x2, x3, . . . )) = (0, x1, x2, x3, . . . ), and

L((x1, x2, x3, . . . )) = (x2, x3, x4, . . . ).

The map R is called the right-shift operator and the map L is called the left-shift operator.
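Several of these maps are easy to experiment with. Here is a small Python sketch (illustrative only) of the formal derivative from (iv), acting on coefficient lists, and the shift operators from (vi), acting on finite prefixes of sequences; note that LR is the identity while RL is not, which makes this pair a natural candidate for problem 1.3.13:

```python
# Polynomials as coefficient lists [a0, a1, ..., an];
# sequences represented by a finite prefix of their terms.
def D(p):
    # Formal derivative: a0 + a1 x + ... + an x^n |-> a1 + 2 a2 x + ...
    return [k * a for k, a in enumerate(p)][1:] or [0]

def R(x):  # right shift: (x1, x2, ...) -> (0, x1, x2, ...)
    return [0] + x

def L(x):  # left shift: (x1, x2, ...) -> (x2, x3, ...)
    return x[1:]

print(D([1, 2, 3]))    # [2, 6]: the derivative of 1 + 2x + 3x^2 is 2 + 6x
print(L(R([5, 7])))    # [5, 7]: LR is the identity
print(R(L([5, 7])))    # [0, 7]: RL is not the identity
```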


Definition 1.3.6. Let T : V → W be a linear map. Define the kernel and the image of T to be the following sets:

kerT = {v ∈ V | T (v) = 0},

imT = {w ∈ W | ∃v ∈ V, w = T (v)}.

Proposition 1.3.7. Let T : V →W be a linear map. Then

(i) kerT is a subspace of V ;

(ii) imT is a subspace of W ;

(iii) T is onto if and only if imT = W ;

(iv) T is 1-1 if and only if kerT = {0}.

Proof. (i) Since T (0) = 0, 0 ∈ kerT . Let u, v ∈ kerT and α, β ∈ F . Then

T (αu + βv) = αT (u) + βT (v) = 0.

Hence αu + βv ∈ kerT . This shows that kerT is a subspace of V .

(ii) Since T (0) = 0, 0 ∈ imT . Let u, v ∈ imT and α, β ∈ F . Then there exist x, y ∈ V such that T (x) = u and T (y) = v. It follows that

αu + βv = αT (x) + βT (y) = T (αx + βy) ∈ imT.

Thus imT is a subspace of W .

(iii) This is a restatement of T being onto.

(iv) Suppose that T is 1-1. It is clear that {0} ⊆ kerT . Let u ∈ kerT . Then T (u) = 0 = T (0). Since T is 1-1, u = 0. Hence kerT = {0}.

Conversely, assume that kerT = {0}. Let u, v ∈ V be such that T (u) = T (v). Then T (u − v) = T (u) − T (v) = 0. Thus u − v = 0, i.e. u = v. This shows that T is 1-1.

The next theorem states the relation between the dimensions of the kernel and the image of a linear map.


Theorem 1.3.8. Let T : V → W be a linear map between finite-dimensional vector spaces. Then

dimV = dim(kerT ) + dim(imT ).

Proof. Let A = {v1, . . . , vk} be a basis for kerT . Then it is a linearly independent set in V and thus can be extended to a basis B = {v1, . . . , vk, vk+1, . . . , vn} for V . We will show that C = {T (vk+1), . . . , T (vn)} is a basis for imT . To see that it spans imT , let w = T (v), where v ∈ V . Then v can be written uniquely as v = α1v1 + · · · + αnvn, for some α1, . . . , αn ∈ F . Hence

T (v) = T (∑_{i=1}^n αivi) = ∑_{i=1}^n αiT (vi) = ∑_{i=k+1}^n αiT (vi),

because T (v1) = · · · = T (vk) = 0. Hence w = T (v) is in the span of C. Now let αk+1, . . . , αn ∈ F be such that

αk+1T (vk+1) + · · · + αnT (vn) = 0.

Then

T (∑_{i=k+1}^n αivi) = ∑_{i=k+1}^n αiT (vi) = 0.

Hence ∑_{i=k+1}^n αivi ∈ kerT . Since A is a basis for kerT , there exist α1, . . . , αk ∈ F such that

∑_{i=k+1}^n αivi = ∑_{i=1}^k αivi.

It follows that

∑_{i=1}^k αivi + ∑_{i=k+1}^n (−αi)vi = 0.

Since B is a basis for V , αi = 0 for i = 1, . . . , n. In particular, it means that C is linearly independent. We conclude that C is a basis for imT . Now,

dim(imT ) = n − k = dimV − dim(kerT ).

This establishes the theorem.
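For T = LA : Rn → Rm, the theorem reads n = dim(kerLA) + rankA, which is easy to check numerically. A sketch assuming numpy and scipy are available (illustrative only):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 6)).astype(float)   # L_A : R^6 -> R^4

rank = np.linalg.matrix_rank(A)       # dim(im L_A)
nullity = null_space(A).shape[1]      # dim(ker L_A), computed independently
print(A.shape[1] == nullity + rank)   # True: 6 = nullity + rank
```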


Definition 1.3.9. Let V and W be vector spaces and T : V → W a linear map. We call dim(kerT ) and dim(imT ) the nullity and rank of T , respectively. Denote the rank of T by rankT . (We do not introduce notation for the nullity because it is used less often.)

Example 1.3.10. Let T : R3 → R3 be defined by

T (x, y, z) = (2x− y, x+ 2y − z, z − 5x).

Find kerT , imT , rankT and the nullity of T .

Solution. If T (x, y, z) = (0, 0, 0), then

2x − y = 0, x + 2y − z = 0, z − 5x = 0.

Solving this system of equations, we see that y = 2x, z = 5x, where x is a free variable. Hence kerT = {(x, 2x, 5x) | x ∈ R} = 〈(1, 2, 5)〉. Moreover,

T (x, y, z) = x(2, 1,−5) + y(−1, 2, 0) + z(0,−1, 1).

Since (2, 1,−5) = −2(−1, 2, 0) − 5(0,−1, 1), we get imT = 〈(−1, 2, 0), (0,−1, 1)〉. Hence rankT = 2 and the nullity of T is 1.
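This computation can be confirmed with a computer algebra system. A sympy sketch (illustrative), using the matrix of T with respect to the standard basis:

```python
from sympy import Matrix

# Columns are T(e1), T(e2), T(e3) for T(x, y, z) = (2x - y, x + 2y - z, z - 5x).
A = Matrix([[ 2, -1,  0],
            [ 1,  2, -1],
            [-5,  0,  1]])

print(A.nullspace())   # one basis vector, proportional to (1, 2, 5)
print(A.rank())        # 2, so the nullity is 3 - 2 = 1
```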

The next theorem states that a function defined on a basis of a vector space can be uniquely extended to a linear map on the entire vector space. Hence a linear map is uniquely determined by its values on a basis.

Theorem 1.3.11. Let B be a basis for a vector space V . Then for any vector space W and any function t : B → W , there is a unique linear map T : V → W which extends t.

Proof. Existence: Let v ∈ V . Then v can be written uniquely in the form

v = ∑_{i=1}^n αivi

for some n ∈ N, v1, . . . , vn ∈ B and α1, . . . , αn ∈ F . Define

T (v) = ∑_{i=1}^n αit(vi).


Clearly, this map is well-defined and T extends t. To show that T is linear, let u, v ∈ V and r, s ∈ F . Then

u = ∑_{i=1}^m αiui and v = ∑_{j=1}^n βjvj

for some m,n ∈ N, ui, vj ∈ B and αi, βj ∈ F , i = 1, . . . ,m, j = 1, . . . , n. By renumbering if necessary, we may assume that ui = vi for i = 1, . . . , k and {uk+1, . . . , um} ∩ {vk+1, . . . , vn} = ∅. Then

ru + sv = ∑_{i=1}^k (rαi + sβi)ui + ∑_{i=k+1}^m rαiui + ∑_{j=k+1}^n sβjvj .

Hence

T (ru + sv) = ∑_{i=1}^k (rαi + sβi)t(ui) + ∑_{i=k+1}^m rαit(ui) + ∑_{j=k+1}^n sβjt(vj)
= r ∑_{i=1}^m αit(ui) + s ∑_{j=1}^n βjt(vj)
= r T (u) + s T (v).

Uniqueness: Assume that S and T are linear maps from V into W that are extensions of t. Let v ∈ V . Then v can be written uniquely as v = ∑_{i=1}^n kivi for some n ∈ N, v1, . . . , vn ∈ B and k1, . . . , kn ∈ F . Since S is linear,

S(v) = ∑_{i=1}^n kiS(vi) = ∑_{i=1}^n kit(vi).

The same holds for T . Hence S(v) = T (v) for any v ∈ V .

We can state the above theorem in terms of a universal mapping property, which will be useful later.

Let iB : B → V denote the inclusion map defined by iB(x) = x for any x ∈ B. Then the above theorem can be restated as:

[Commutative diagram: iB : B → V , t : B → W , T : V → W , with T ◦ iB = t.]

For any vector space W and a function t : B → W , there exists a unique linear map T : V → W such that T ◦ iB = t.


Definition 1.3.12. A function T : V → W is called a linear isomorphism if it is linear and bijective. If there is a linear isomorphism from V onto W , we say that V is isomorphic to W , denoted by V ≅ W .

Proposition 1.3.13. Let T : V → W be a linear map. Then T is a linear isomorphism if and only if T has a linear inverse, i.e., a linear map S : W → V such that ST = IV and TS = IW .

Proof. (⇒) Assume that T is a linear isomorphism. Since T is bijective, T has an inverse function T−1 : W → V such that T−1T = IV and TT−1 = IW . It remains to show that T−1 is linear. Let u, v ∈ W and α, β ∈ F . Then

T (αT−1(u) + βT−1(v)) = αT (T−1(u)) + βT (T−1(v)) = αu + βv.

Thus T−1(αu + βv) = αT−1(u) + βT−1(v).

(⇐) Suppose there is a linear map S : W → V such that ST = IV and TS = IW . It is easy to verify that ST = IV implies injectivity of T and TS = IW implies surjectivity of T . Hence T is linear and bijective, i.e. a linear isomorphism.

By the above proposition, a linear isomorphism is also called an invertible linear map. Frequently, the easiest way to prove that two vector spaces are isomorphic is to find linear maps from one vector space to the other which are inverses of each other.

Example 1.3.14.

(i) Let V be a finite-dimensional vector space of dimension n over a field F . Then V ≅ Fn. To see this, fix a basis {v1, . . . , vn} for V . Then any element in V can be written uniquely as a1v1 + · · · + anvn, where a1, . . . , an ∈ F . A linear isomorphism between V and Fn is given by

a1v1 + · · · + anvn ←→ (a1, . . . , an).

(ii) Mm×n(F ) ≅ Mn×m(F ). The linear maps Φ: Mm×n(F ) → Mn×m(F ) and Ψ: Mn×m(F ) → Mm×n(F ) defined by

Φ(A) = At and Ψ(B) = Bt,

for any A ∈ Mm×n(F ) and B ∈ Mn×m(F ), are inverses of each other.


Theorem 1.3.15. Let V and W be vector spaces. Then V ≅ W if and only if dimV = dimW .

Proof. Assume that T : V → W is a linear isomorphism. Let B be a basis for V . Then it is easy to show that T [B] is a basis for W . Since T is a bijection, |B| = |T [B]|, which implies that dimV = dimW . Conversely, assume that dimV = dimW and let B and C be bases for V and W , respectively. Since |B| = |C|, we may write B = {vα}α∈Λ and C = {wα}α∈Λ with a common index set Λ. Define T : B → W by T (vα) = wα for each α ∈ Λ and extend it to a linear map, with the same name, from V into W . Similarly, define S : C → V by S(wα) = vα for each α ∈ Λ and extend it to a linear map S from W into V . It is easy to see that ST = IV and TS = IW . Hence S and T are linear isomorphisms, which implies that V ≅ W .

Theorem 1.3.16. Let T : V → W be a linear map between finite-dimensional vector spaces with dimV = dimW . Then the following statements are equivalent:

(i) T is 1-1;

(ii) T is onto;

(iii) T is a linear isomorphism.

Proof. It suffices to show that T is 1-1 if and only if T is onto. Suppose T is 1-1. Then kerT = {0}. By Theorem 1.3.8, dimW = dimV = dim(imT ). It follows from Corollary 1.2.17 that W = imT . On the other hand, suppose T is onto. Then imT = W . Hence dim(imT ) = dimW = dimV . By Theorem 1.3.8, dim(kerT ) = 0, i.e. kerT = {0}, which implies that T is 1-1.

Corollary 1.3.17. Let T and S be linear maps on a finite-dimensional vector space V . Then ST = IV implies TS = IV . In other words, if a linear map on a finite-dimensional vector space is either left-invertible or right-invertible, then it is invertible.

Proof. The condition ST = IV implies that S is onto and T is 1-1. By Theorem 1.3.16, T is a linear isomorphism. Hence S = S(TT−1) = (ST )T−1 = T−1, so TS = TT−1 = IV .

Remark. Theorem 1.3.16 and Corollary 1.3.17 may not hold if V is infinite-dimensional. See problem 1.3.13.


Proposition 1.3.18. Let V and W be vector spaces over F . If S, T : V → W are linear maps and k ∈ F , define S + T and kT by

(S + T )(v) = S(v) + T (v),

(kT )(v) = k T (v).

Then S + T and kT are linear maps from V into W .

Proof. For any u, v ∈ V and α, β ∈ F ,

(S + T )(αu + βv) = S(αu + βv) + T (αu + βv)

= αS(u) + βS(v) + αT (u) + βT (v)

= α(S(u) + T (u)) + β(S(v) + T (v))

= α(S + T )(u) + β(S + T )(v).

Hence S + T is linear. Similarly, we can show that kT is linear.

Definition 1.3.19. Let V and W be vector spaces over a field F . Denote by L(V,W ) or Hom(V,W ) the set of linear maps from V into W :

L(V,W ) = Hom(V,W ) = {T : V → W | T is linear}.

If V = W , we simply write L(V ) or Hom(V ).

Proposition 1.3.20. Let V and W be vector spaces over F . Then L(V,W ) is a vector space over F under the operations defined above.

Proof. We leave this as a routine exercise.

Proposition 1.3.21. Let V and W be finite-dimensional vector spaces over F . Then

dim(L(V,W )) = (dimV )(dimW ).

Proof. Let B = {v1, . . . , vn} and C = {w1, . . . , wm} be bases for V and W , respectively. For i = 1, . . . ,m and j = 1, . . . , n, define Tij : B → W by

Tij(vj) = wi and Tij(vk) = 0 if k ≠ j.

Extend each of them to a linear map from V into W . We leave it as an exercise to show that {Tij} is a basis for L(V,W ). Since {Tij} has mn elements, we see that dim(L(V,W )) = mn = (dimV )(dimW ).

Page 33: Lecture Notes on Linear and Multilinear Algebra 2301-610pioneer.netserv.chula.ac.th/~lwicharn/materials/notes610.pdf · This book grew out of the lecture notes for the course 2301-610

1.3. LINEAR MAPS 27

The next proposition shows that a composition of linear maps is still linear.

Proposition 1.3.22. Let U , V and W be vector spaces over F . If S : U → V and T : V → W are linear maps, let

TS(v) = (T ◦ S)(v) = T (S(v)) for any v ∈ U .

Then TS is a linear map from U into W .

Proof. For any u, v ∈ U and α, β ∈ F ,

(TS)(αu + βv) = T (S(αu + βv)) = T (αS(u) + βS(v)) = αTS(u) + β TS(v).

This shows that TS is linear.

Sometimes a vector space has an extra operation which can be regarded as a multiplication.

Definition 1.3.23. Let V be a vector space over a field F . A product is a function V × V → V , (x, y) ↦ x · y, satisfying the following properties: for any x, y, z ∈ V and α, β ∈ F ,

(i) x · (αy + βz) = α(x · y) + β(x · z); “left-distributive law”

(ii) (αx + βy) · z = α(x · z) + β(y · z). “right-distributive law”

The product is said to be associative if it satisfies

(x · y) · z = x · (y · z) for any x, y, z ∈ V .

An algebra is a vector space equipped with an associative product. It is said to be commutative if the product is commutative. If it has a multiplicative identity, i.e. ∃ 1 ∈ V , 1 · v = v · 1 = v for all v ∈ V , we call it a unital algebra.

Note that an algebra has 3 operations: addition, multiplication and scalar multiplication. It has a ring structure under the addition and multiplication.

Definition 1.3.24. Let V and W be algebras over a field F . A map φ : V → W is called an algebra homomorphism if it is a linear map such that

φ(x · y) = φ(x) · φ(y) for any x, y ∈ V .

It is called an algebra isomorphism if it is a bijective algebra homomorphism.


Proposition 1.3.25. Let V be a vector space over a field F . Define the product on L(V ) by ST = S ◦ T for any S, T ∈ L(V ). Then L(V ) is a unital algebra. If dimV > 1, it is a non-commutative algebra.

Proof. By Proposition 1.3.20, L(V ) is a vector space over F . By linearity of S, for any S, T1, T2 ∈ L(V ) and α, β ∈ F ,

S(αT1 + βT2) = αST1 + β ST2.

On the other hand, by the definition of addition, for any S1, S2, T ∈ L(V ) and α, β ∈ F ,

(αS1 + βS2)T = αS1T + β S2T.

The associativity of the product follows from the associativity of the composition of functions. Moreover, IV T = TIV = T . Hence L(V ) is a unital algebra. If dimV > 1, choose a linearly independent subset {x, y} of V and extend it to a basis for V . Define S(x) = y, S(y) = y, T (x) = x and T (y) = x, and extend them to linear maps on V . It is easy to see that ST (x) ≠ TS(x).
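Anticipating the matrix representation of Section 1.4, the maps S and T in the proof can be realized as 2 × 2 matrices over the basis {x, y}, and the non-commutativity becomes a quick computation. A numpy sketch (illustrative):

```python
import numpy as np

# Columns are the coordinates of the images of the basis vectors x, y.
S = np.array([[0, 0], [1, 1]])   # S(x) = y, S(y) = y
T = np.array([[1, 1], [0, 0]])   # T(x) = x, T(y) = x

print(S @ T)   # matrix of ST: ST(x) = y
print(T @ S)   # matrix of TS: TS(x) = x, hence ST != TS
```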

We have seen that L(V ) is a unital algebra. Other examples of algebras can be found below.

Example 1.3.26.

(i) For n ≥ 2, the set Mn(F ) of n × n matrices over F is a unital non-commutative algebra, where the product is the usual matrix multiplication. The identity matrix In is the multiplicative identity.

(ii) The set F [x] of polynomials over F is a unital commutative algebra under the usual polynomial operations. The polynomial 1 is the multiplicative identity.

(iii) Let X be a non-empty set and F a field. The set of all F -valued functions F(X) = {f : X → F} is a unital commutative algebra under the pointwise operations. The constant function 1(x) = 1 for any x ∈ X is the multiplicative identity.

(iv) The space C([a, b]) of continuous functions on [a, b] is a unital commutative algebra under the pointwise operations.


Exercises

1.3.1. Fix a matrix Q ∈ Mn(F ) and let W = {A ∈ Mn(F ) | AQ = QA}.

(a) Prove that W is a subspace of Mn(F ).

(b) Define T : Mn(F ) → Mn(F ) by T (A) = AQ − QA for any A ∈ Mn(F ). Prove that T is a linear map and find kerT .

1.3.2. Let T : U → V be a linear map. If W is a subspace of V , prove that

T−1[W ] = {u ∈ U | T (u) ∈ W}

is a subspace of U .

1.3.3. Let V and W be vector spaces and T : V → W a linear transformation. Prove that T is one-to-one if and only if T maps every linearly independent subset of V to a linearly independent subset of W .

1.3.4. Let V and W be vector spaces and T : V → W a linear transformation. Let B be a basis for kerT and C a basis for V such that B ⊆ C. Let B′ = C − B. Show that

(i) for any v1 and v2 in B′, if v1 ≠ v2 then T (v1) ≠ T (v2);

(ii) T [B′] = {T (v) | v ∈ B′} is a basis for imT .

Remark. We do not assume that V and W are finite-dimensional.

1.3.5. Let V be a vector space over a field F with dimV = 1. Show that if T : V → V is a linear map, then there exists a unique scalar k such that T (v) = kv for any v ∈ V .

1.3.6. Let T be a linear map on a finite-dimensional vector space V such that rankT = rankT^2. Show that imT ∩ kerT = {0}.

1.3.7. Let T be a linear map on a finite-dimensional vector space V such that T^2 = 0. Show that 2 rankT ≤ dimV .


1.3.8. Let V and W be finite-dimensional vector spaces and T : V → W a linear map. Show that

rankT ≤ min{dimV, dimW}.

1.3.9. Let U , V and W be finite-dimensional vector spaces and S : U → V , T : V → W linear maps. Show that

rank(TS) ≤ min{rankS, rankT}.

Moreover, if S or T is a linear isomorphism, then

rank(TS) = min{rankS, rankT}.

1.3.10. Let Vi be vector spaces over a field F and fi : Vi → Vi+1 linear maps. Consider a sequence

· · · −→ Vi−1 −fi−1→ Vi −fi→ Vi+1 −→ · · ·

It is called exact at Vi if im fi−1 = ker fi. It is exact if it is exact at each Vi.

(i) Prove that 0 −→ V −T→ W is exact if and only if T is 1-1.

(ii) Prove that V −T→ W −→ 0 is exact if and only if T is onto.

(iii) Let V1, . . . , Vn be finite-dimensional vector spaces. Assume that we have an exact sequence

0 −→ V1 −→ V2 −→ · · · −→ Vn −→ 0.

Prove that ∑_{i=1}^n (−1)^i dimVi = 0.

1.3.11. Prove Proposition 1.3.20.

1.3.12. Recall that f = {(xn) | ∃N ∈ N ∀n ≥ N, xn = 0} is a subspace of S. Prove that f is isomorphic to F [x].

1.3.13. Give an example to show that Theorem 1.3.16 and Corollary 1.3.17 may not hold if V is infinite-dimensional.


1.3.14. Prove that the set {Tij} in Proposition 1.3.21 is a basis for L(V,W ).

1.3.15. Let V and W be vector spaces and let U : V → W be a linear isomorphism. Show that the map T ↦ UTU−1 is a linear isomorphism from L(V, V ) onto L(W,W ).

1.3.16. Suppose V is a finite-dimensional vector space and T : V → V a linear map such that T ≠ 0 and T is not a linear isomorphism. Show that there is a linear map S : V → V such that ST = 0 but TS ≠ 0.

1.3.17. Let V be a finite-dimensional vector space and suppose that U and W are subspaces of V such that dimU + dimW = dimV . Prove that there exists a linear transformation T : V → V such that kerT = U and imT = W .


1.4 Matrix Representation

In this section, we give a computational aspect of linear maps. The main theorem is that there is a 1-1 correspondence between the set of linear maps and the set of matrices. By assigning coordinates with respect to bases for the vector spaces, we turn a linear map into multiplication by a matrix. Conversely, results about matrices are often easily proved by considering the linear maps given by matrix multiplication.

Definition 1.4.1. Let V be a vector space of dimension n. An ordered n-tuple (v1, . . . , vn) of n elements in V is called an ordered basis if {v1, . . . , vn} is a basis for V .

In other words, an ordered basis for a vector space is a basis such that the order of its elements is taken into account. We still use the usual notation {v1, . . . , vn} for the ordered basis (v1, . . . , vn).

Definition 1.4.2. Let B = {v1, . . . , vn} be an ordered basis for a vector space V . If v = k1v1 + · · · + knvn, where ki ∈ F for i = 1, . . . , n, then (k1, . . . , kn) is called the coordinate vector of v with respect to B, denoted by [v]B.

Remark. For a computational purpose, we write a vector (α_1, . . . , α_n) in F^n as a column matrix:

\begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}

We will also write it horizontally as [α_1 . . . α_n]^t.

Proposition 1.4.3. Let V be a vector space over a field F with dim V = n. Fix an ordered basis B for V. Then the map v ↦ [v]_B is a linear isomorphism from V onto F^n.

Proof. Let B = {v_1, . . . , v_n} be an ordered basis for V. Any v ∈ V can be written uniquely as v = α_1v_1 + · · · + α_nv_n, where α_i ∈ F for i = 1, . . . , n. A linear isomorphism between V and F^n is given by

v = α1v1 + · · ·+ αnvn ←→ [α1 . . . αn]t.


It is easy to see that the maps in both directions are linear and that they are inverses of each other.
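The coordinate isomorphism is easy to compute with. The following is a minimal numerical sketch over F = R; it assumes the numpy library, and the helper name coords is ours, not the text's.

```python
import numpy as np

def coords(basis, v):
    # Solve B x = v, where the columns of B are the basis vectors;
    # the solution x is the coordinate vector [v]_B.
    B = np.column_stack(basis)
    return np.linalg.solve(B, v)  # unique, since a basis gives an invertible matrix

basis = [np.array([1.0, 1.0]), np.array([-1.0, 1.0])]
v = np.array([3.0, 5.0])
x = coords(basis, v)
print(x)  # [4. 1.], i.e. (3,5) = 4(1,1) + 1(-1,1)
assert np.allclose(x[0] * basis[0] + x[1] * basis[1], v)
```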

Theorem 1.4.4. Let V and W be vector spaces over a field F with dim V = n and dim W = m. Fix ordered bases B for V and C for W, respectively. If T : V → W is a linear map, then there is a unique m × n matrix A such that

[T (v)]C = A[v]B for any v ∈ V . (1.3)

Proof. Let B = {v_1, . . . , v_n} and C = {w_1, . . . , w_m} be ordered bases for V and W, respectively. First assume that there exists an m × n matrix A such that (1.3) holds. For each v_j in B, [T(v_j)]_C = A[v_j]_B. But the column matrix [v_j]_B has 1 in the j-th position and 0 in the other places, so A[v_j]_B is the j-th column of A. This shows that A must be formed by taking [T(v_j)]_C as its j-th column:

A = [ [T(v_1)]_C  [T(v_2)]_C  · · ·  [T(v_n)]_C ].

For each j ∈ {1, . . . , n}, there exist a_{1j}, . . . , a_{mj} ∈ F such that

T(v_j) = ∑_{i=1}^m a_{ij} w_i. (1.4)

Now all the entries a_{ij} of A are determined. Hence if A satisfies (1.3), then A must be of this form. Next we show that the matrix A defined this way does satisfy (1.3). Let v ∈ V and write v = k_1v_1 + · · · + k_nv_n, where k_1, . . . , k_n ∈ F. Then

T(v) = T(∑_{j=1}^n k_j v_j) = ∑_{j=1}^n k_j T(v_j) = ∑_{j=1}^n k_j (∑_{i=1}^m a_{ij} w_i) = ∑_{i=1}^m (∑_{j=1}^n a_{ij} k_j) w_i.

Hence [T(v)]_C is an m × 1 matrix whose i-th row is ∑_{j=1}^n a_{ij}k_j. On the other hand, A[v]_B is an m × 1 matrix whose i-th row is obtained by multiplying the i-th row of A by the only column of [v]_B; hence the i-th row of A[v]_B is also ∑_{j=1}^n a_{ij}k_j. This shows that (1.3) holds, which finishes the proof.


Remark. We can give an alternative proof of the existence part of the above theorem as follows. For each v_j in B, we write

T(v_j) = ∑_{i=1}^m a_{ij} w_i.

Form an m × n matrix A with the (i, j)-entry a_{ij} given by the above equation. Hence

[T(v_j)]_C = [a_{1j} . . . a_{mj}]^t.

On the other hand, A[v_j]_B is the j-th column of A. Hence [T(v_j)]_C = A[v_j]_B. Now we can view [T( · )]_C and A[ · ]_B = L_A([ · ]_B) as composites of linear maps, and hence both of them are linear. We have established the equality of these two linear maps on the ordered basis B = {v_1, . . . , v_n}, and thus they must be equal on all elements v ∈ V.

Definition 1.4.5. The unique matrix A in Theorem 1.4.4 is called the matrix representation of T with respect to the ordered bases B and C, respectively, and is denoted by [T]_{B,C}. Hence

[T (v)]C = [T ]B,C [v]B for all v ∈ V .

          T
     V ------> W
     |         |
   [ ]_B     [ ]_C
     v         v
    F^n -----> F^m
       [T]_{B,C}

If V = W and B = C, we simply write [T ]B.

Given an m × n matrix A over F, we can define a linear map L_A : F^n → F^m by matrix multiplication: L_A(x) = Ax for any x ∈ F^n. Conversely, given a linear map, we can construct a matrix so that the linear map, in coordinates, is just this matrix multiplication.
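Theorem 1.4.4 translates directly into a procedure: the j-th column of [T]_{B,C} is obtained by solving for the C-coordinates of T(v_j). The sketch below works over F = R; it assumes numpy, and the name matrix_of is ours, not the text's.

```python
import numpy as np

def matrix_of(T, B, C):
    """Matrix representation [T]_{B,C} of a linear map T : R^n -> R^m.

    B and C are lists of basis vectors; the j-th column is [T(v_j)]_C,
    computed by solving the linear system whose columns are the C-basis.
    """
    C_mat = np.column_stack(C)
    return np.column_stack([np.linalg.solve(C_mat, T(v)) for v in B])
```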


Example 1.4.6. Let T : R^3 → R^2 be defined by

T(x, y, z) = (2x − 3y − z, −x + y + 2z).

Let B = {(1, 1, 0), (0, 1, 1), (1, 0, 1)} and C = {(1, 1), (−1, 1)} be ordered bases for R^3 and R^2, respectively. Find [T]_{B,C}.

Solution. Note that

T(1, 1, 0) = (−1, 0) = −(1/2)(1, 1) + (1/2)(−1, 1)
T(0, 1, 1) = (−4, 3) = −(1/2)(1, 1) + (7/2)(−1, 1)
T(1, 0, 1) = (1, 1) = 1(1, 1) + 0(−1, 1).

This shows that

[T]_{B,C} = \begin{pmatrix} -1/2 & -1/2 & 1 \\ 1/2 & 7/2 & 0 \end{pmatrix}.
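As a quick numerical check of this example (a sketch, not part of the text; numpy assumed):

```python
import numpy as np

T = lambda v: np.array([2*v[0] - 3*v[1] - v[2], -v[0] + v[1] + 2*v[2]])
B = [np.array([1., 1., 0.]), np.array([0., 1., 1.]), np.array([1., 0., 1.])]
C_mat = np.column_stack([np.array([1., 1.]), np.array([-1., 1.])])

# j-th column of [T]_{B,C} is [T(v_j)]_C, i.e. the solution of C_mat x = T(v_j)
A = np.column_stack([np.linalg.solve(C_mat, T(v)) for v in B])
print(A)  # [[-0.5 -0.5  1. ]
          #  [ 0.5  3.5  0. ]]

# sanity check of (1.3): [T(v)]_C = A [v]_B for an arbitrary coordinate vector
k = np.array([2., -1., 3.])                   # [v]_B
v = sum(ki * bi for ki, bi in zip(k, B))      # the vector v itself
assert np.allclose(np.linalg.solve(C_mat, T(v)), A @ k)
```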

Example 1.4.7. Let T_θ : R^2 → R^2 be the counterclockwise rotation by the angle θ around the origin. Then it is a linear map on R^2. Let B = {(1, 0), (0, 1)} be the standard basis for R^2. Find [T_θ]_B and write down an explicit formula for T_θ.

Solution. If we rotate the points (1, 0) and (0, 1) in the plane counterclockwise by the angle θ, then elementary geometry shows that they are moved to the points (cos θ, sin θ) and (− sin θ, cos θ), respectively. Hence

Tθ(1, 0) = (cos θ, sin θ) = cos θ(1, 0) + sin θ(0, 1),

Tθ(0, 1) = (− sin θ, cos θ) = − sin θ(1, 0) + cos θ(0, 1).

Thus

[T_θ]_B = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.

If (x, y) ∈ R^2, then

[T_θ(x, y)]_B = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}.

Hence Tθ(x, y) = (x cos θ − y sin θ, x sin θ + y cos θ).
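In coordinates the rotation is just multiplication by this matrix; a small numpy sketch:

```python
import numpy as np

def rotation_matrix(theta):
    # [T_theta]_B with respect to the standard basis of R^2
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

print(rotation_matrix(np.pi / 2) @ np.array([1.0, 0.0]))  # ~ [0. 1.]: a quarter turn
```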


Proposition 1.4.8. Let V and W be finite-dimensional vector spaces with ordered bases B and C, respectively. Let S, T : V → W be linear maps and α, β ∈ F. Then

[αS + βT ]B,C = α[S]B,C + β[T ]B,C.

Proof. Note that for any v ∈ V ,

[S(v)]C = [S]B,C[v]B and [T (v)]C = [T ]B,C[v]B,

which implies

[(αS + βT )(v)]C = α[S(v)]C + β[T (v)]C

= α[S]B,C[v]B + β[T ]B,C[v]B

= (α[S]B,C + β[T ]B,C)[v]B.

But [αS + βT]_{B,C} is the unique matrix such that

[(αS + βT )(v)]C = [αS + βT ]B,C[v]B for any v ∈ V .

We conclude that [αS + βT ]B,C = α[S]B,C + β[T ]B,C.

Proposition 1.4.9. Let U, V and W be finite-dimensional vector spaces with ordered bases A, B and C, respectively. Let S : U → V and T : V → W be linear maps. Then

[TS]A,C = [T ]B,C[S]A,B.

Proof. Note that

[S(u)]B = [S]A,B [u]A for any u ∈ U , and (1.5)

[T (v)]C = [T ]B,C [v]B for any v ∈ V . (1.6)

Replacing v = S(u) in (1.6) and applying (1.5), we have

[T (S(u))]C = [T ]B,C[S(u)]B = [T ]B,C[S]A,B[u]A for any u ∈ U .

On the other hand, [TS]A,C is the unique matrix such that

[TS(u)]C = [TS]A,C [u]A for any u ∈ U .

We now conclude that [TS]A,C = [T ]B,C[S]A,B.
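With the standard bases of F^n the matrix representations are the matrices themselves, so the proposition reduces to the familiar fact that composition corresponds to matrix multiplication. A quick numerical instance (a sketch; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 3))   # represents S : R^3 -> R^4
T = rng.standard_normal((2, 4))   # represents T : R^4 -> R^2
u = rng.standard_normal(3)

# applying the maps one after the other agrees with the product matrix T @ S
assert np.allclose(T @ (S @ u), (T @ S) @ u)
```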


Theorem 1.4.10. Let V and W be finite-dimensional vector spaces over a field F with dim V = n and dim W = m. Let B and C be ordered bases for V and W, respectively. Then the map T ↦ [T]_{B,C} is a linear isomorphism from L(V, W) onto M_{m×n}(F). Hence

L(V,W ) ∼= Mm×n(F ).

Proof. Define Φ : L(V, W) → M_{m×n}(F) by Φ(T) = [T]_{B,C} for any T ∈ L(V, W). By Proposition 1.4.8, Φ is a linear map. To see that it is 1-1, let T ∈ L(V, W) be such that [T]_{B,C} = 0. Then for any v ∈ V,

[T (v)]C = [T ]B,C[v]B = 0 [v]B = 0.

Hence for any v ∈ V, the coordinate vector of T(v) with respect to C is the zero vector. This shows that T(v) = 0 for any v ∈ V, i.e., T ≡ 0. To show that Φ is onto, let A = [a_{ij}] be an m × n matrix. Write B = {v_1, . . . , v_n} and C = {w_1, . . . , w_m}. Define t : B → W by

t(v_j) = ∑_{i=1}^m a_{ij} w_i for j = 1, . . . , n.

Extend t uniquely to a linear map T : V → W. It is easy to see that [T]_{B,C} = A. Hence Φ is a linear isomorphism.

If V = W, we know from Proposition 1.3.25 and Example 1.3.26 that L(V) and M_n(F) are algebras. In this case, they are also isomorphic as algebras.

Corollary 1.4.11. Let V be a finite-dimensional vector space over a field F with dim V = n. Let B be an ordered basis for V. Then the map Φ : T ↦ [T]_B is an algebra isomorphism from L(V) onto M_n(F).

Proof. By Theorem 1.4.10, Φ is a linear isomorphism. That Φ(TS) = Φ(T)Φ(S) for all S, T ∈ L(V) follows from Proposition 1.4.9.

By Theorem 1.4.10 and Corollary 1.4.11, we see that linear maps and matrices are two aspects of the same thing. We can prove theorems about matrices by working with linear maps instead; see, e.g., Exercises 1.4.1–1.4.3. On the other hand, matrices have the advantage of being easier to calculate with.


Proposition 1.4.12. Let V and W be finite-dimensional vector spaces with the same dimension. Let B and C be ordered bases for V and W, respectively. Then a linear map T : V → W is invertible if and only if [T]_{B,C} is an invertible matrix.

Proof. Let n = dim V = dim W. Assume that T is invertible. Then there is a linear map T⁻¹ : W → V such that T⁻¹T = I_V and TT⁻¹ = I_W. Then

[T−1]C,B[T ]B,C = [T−1T ]B = [IV ]B = In and

[T ]B,C[T−1]C,B = [TT−1]C = [IW ]C = In.

Hence [T]_{B,C} is invertible and ([T]_{B,C})⁻¹ = [T⁻¹]_{C,B}.

Conversely, write A = [T]_{B,C} and assume that A is an invertible matrix. Then there is an n × n matrix B such that AB = BA = I_n. By Theorem 1.4.10, there is a linear map S : W → V such that [S]_{C,B} = B. Hence

[ST ]B = [S]C,B[T ]B,C = BA = In = [IV ]B.

Hence ST = I_V. Similarly, TS = I_W. This shows that T and S are invertible and T⁻¹ = S.

In the remaining part of this section, we discuss the rank of a matrix. The rank of a linear map is the dimension of its image. We will define the rank of a matrix to be the dimension of its column space, which turns out to be the same as the dimension of its row space. We will then establish the relation between the rank of a matrix and the rank of the corresponding linear map.

Definition 1.4.13. Let A be an m × n matrix over a field F. The row space of A is the subspace of F^n spanned by the row vectors of A. Similarly, the column space of A is the subspace of F^m spanned by the column vectors of A.

The row rank of A is defined to be the dimension of the row space of A. The column rank of A is defined to be the dimension of the column space of A.

If A is an m × n matrix over F, the row space of A is a subspace of F^n, while the column space is a subspace of F^m. However, it is remarkable that their dimensions are equal.

Theorem 1.4.14. Let A be an m × n matrix over a field F. Then the row rank and the column rank of A are equal.


Proof. Let A = [a_{ij}]. Let r_1, . . . , r_m be the row vectors of A, and let c_1, . . . , c_n be the column vectors of A. Let d be the row rank of A and let {v_1, . . . , v_d} be a basis for the row space of A. Write each v_k ∈ F^n as

v_k = (β_{k1}, . . . , β_{kn}) ∈ F^n.

For i = 1, . . . , m, write

r_i = ∑_{k=1}^d α_{ik} v_k = ∑_{k=1}^d α_{ik}(β_{k1}, . . . , β_{kn}),

where each α_{ik} ∈ F. Hence

r_i = (a_{i1}, . . . , a_{in}) = (∑_{k=1}^d α_{ik}β_{k1}, . . . , ∑_{k=1}^d α_{ik}β_{kn}).

From this, it follows that, for i = 1, . . . , m and for j = 1, . . . , n,

a_{ij} = ∑_{k=1}^d α_{ik}β_{kj}.

Hence for j = 1, . . . , n,

c_j = (a_{1j}, . . . , a_{mj}) = (∑_{k=1}^d α_{1k}β_{kj}, . . . , ∑_{k=1}^d α_{mk}β_{kj}) = ∑_{k=1}^d β_{kj}(α_{1k}, . . . , α_{mk}) = ∑_{k=1}^d β_{kj} x_k,

where x_k = (α_{1k}, . . . , α_{mk}) ∈ F^m, for k = 1, . . . , d. This shows that

〈c_1, . . . , c_n〉 ⊆ 〈x_1, . . . , x_d〉.

Hence the column rank of A ≤ d = the row rank of A. But this is true for any matrix A; thus the column rank of A^t ≤ the row rank of A^t. Since the row space and the column space of A^t are the column space and the row space of A, respectively, this shows that the column rank of A equals the row rank of A.


Definition 1.4.15. Let A be an m × n matrix over a field F. The rank of A is defined to be the column rank of A, which equals the row rank of A, and is denoted by rank A.

Remark. The elementary row operations preserve the row space of a matrix. Hence the rank of a matrix is preserved under elementary row operations. We can apply these operations to the matrix until it is in a reduced echelon form; the rank of the matrix is then the number of non-zero rows in the reduced echelon form. Note that the elementary row operations do not preserve the column space, but they do preserve the column rank, which equals the row rank.
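Numerically, the common value of the row rank and the column rank can be checked with numpy (a sketch):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # row 2 = 2 * row 1, so the rank drops to 2
              [0., 1., 1.]])

# the rank of A^t is the row rank of A; Theorem 1.4.14 says the two agree
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))  # 2 2
```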

Proposition 1.4.16. Let A be an m × n matrix over a field F. Let L_A : F^n → F^m be the linear map defined by L_A(x) = Ax for any x ∈ F^n. Then

(i) imLA = the column space of A;

(ii) rankLA = rankA.

Proof. Let c_1, . . . , c_n be the column vectors of A. If x = (x_1, . . . , x_n) ∈ F^n, it is easy to see that

Ax = x1c1 + · · ·+ xncn.

It follows that imLA = the column space of A. Thus (ii) follows from (i).

On the other hand, if T : V → W is a linear map, the rank of T coincides with the rank of the matrix representation of T.

Proposition 1.4.17. Let V and W be finite-dimensional vector spaces with ordered bases B and C, respectively. For any linear map T : V → W,

rankT = rank([T ]B,C).

Proof. See Exercise 1.4.5.


Exercises

1.4.1. Recall that if A ∈ M_{m×n}(F), then the linear map L_A : F^n → F^m is given by L_A(x) = Ax, for all x ∈ F^n. Prove the following statements:

(i) if B and C are the standard ordered bases for F^n and F^m, respectively, then [L_A]_{B,C} = A;

(ii) if A and B are m× n matrices, then A = B if and only if LA = LB;

(iii) if A is an n× n matrix, then A is invertible if and only if LA is invertible.

1.4.2. If A and B are n × n matrices such that AB = I_n, prove that BA = I_n. Hence A and B are invertible and A⁻¹ = B.

1.4.3. Let A be an m × n matrix and B an n × m matrix such that AB = I_m and BA = I_n. Prove that m = n, A and B are invertible, and A = B⁻¹.

1.4.4. Let A be an n × n matrix. Show that A is invertible if and only if rank A = n.

1.4.5. Let V and W be finite-dimensional vector spaces over a field F with dim V = n and dim W = m. Let B and C be ordered bases for V and W, respectively. Let T : V → W be a linear map, and write A = [T]_{B,C}. Let L_A : F^n → F^m be defined by L_A(x) = Ax for all x ∈ F^n. Prove the following statements:

(i) kerT ∼= kerLA;

(ii) imT ∼= imLA;

(iii) rankT = rankA.

1.4.6. Let T : R^2 → R^2 be a linear map such that T² = T. Show that T = 0 or T = I or there is an ordered basis B for R^2 such that

[T]_B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.

Hint: Consider dim(kerT ).


1.5 Change of Bases

Given two ordered bases B and B′ for a vector space V, the coordinate vectors of an element v ∈ V with respect to B and B′, respectively, are usually different. We can transform one to the other by a matrix multiplication.

Theorem 1.5.1. Let B and B′ be ordered bases for a vector space V. Then there exists a unique square matrix P such that

[v]B′ = P [v]B for any v ∈ V . (1.7)

           V
          / \
    [ ]_B/   \[ ]_{B′}
        v     v
      F^n --P--> F^n

Proof. The proof of this theorem is similar to the proof of Theorem 1.4.4. Let B = {v_1, . . . , v_n} and B′ = {v′_1, . . . , v′_n} be ordered bases for V. First, assume that there is a matrix P such that (1.7) holds. For each v_j ∈ B, [v_j]_B is the n × 1 column matrix with 1 in the j-th row and 0 in the other positions. Thus P[v_j]_B is the j-th column of P. Hence for (1.7) to hold, the j-th column of P must be [v_j]_{B′}. It follows that P must be of the form

P = [ [v_1]_{B′}  [v_2]_{B′}  · · ·  [v_n]_{B′} ].

It remains to show that the matrix P defined above satisfies (1.7). The proof is the same as that of Theorem 1.4.4 and we leave it as an exercise.

Definition 1.5.2. The matrix P with the property above is called the transition matrix from B to B′. Notice that this is the same as [I_V]_{B,B′}.

The proof of Theorem 1.5.1 gives a method for finding a transition matrix. Let B = {v_1, . . . , v_n} and B′ = {v′_1, . . . , v′_n} be ordered bases for V. The j-th column of the transition matrix from B to B′ is the coordinate vector of v_j with respect to B′. More precisely, for j = 1, . . . , n, write

v_j = ∑_{i=1}^n p_{ij} v′_i.

The matrix P = [p_{ij}] is the transition matrix from B to B′.


Example 1.5.3. Let B = {(1, 0), (0, 1)} and B′ = {(1, 1), (−1, 1)} be ordered bases for R^2. Find the transition matrix from B to B′ and the transition matrix from B′ to B.

Solution. Note that

(1, 0) = (1/2)(1, 1) − (1/2)(−1, 1),
(0, 1) = (1/2)(1, 1) + (1/2)(−1, 1).

Hence the transition matrix from B to B′ is

\begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}.

Similarly,

(1, 1) = 1(1, 0) + 1(0, 1),
(−1, 1) = −1(1, 0) + 1(0, 1).

Hence the transition matrix from B′ to B is

\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}.

In the above example, notice that

\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 1/2 \end{pmatrix}.

This is true in general, as stated in the next proposition.

Proposition 1.5.4. The transition matrix is invertible. In fact, the inverse of the transition matrix from B to B′ is the transition matrix from B′ to B.

Proof. Let P be the transition matrix from B to B′ and Q the transition matrix from B′ to B. Then

[v]B′ = P [v]B and [v]B = Q[v]B′ for any v ∈ V .

Hence

[v]B = Q[v]B′ = QP [v]B for any v ∈ V , and

[v]B′ = P [v]B = PQ[v]B′ for any v ∈ V .

But the identity matrix I is the unique matrix such that [v]_B = I[v]_B for any v ∈ V. Thus QP = I. For the same reason, PQ = I. This shows that P and Q are inverses of each other.
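The transition matrices of Example 1.5.3 can be computed numerically, and the proposition checked (a sketch; numpy assumed). Since v = B[v]_B = B′[v]_{B′} when the basis vectors are the columns of the matrices B and B′, the transition matrix from B to B′ is (B′)⁻¹B.

```python
import numpy as np

B  = np.column_stack([[1., 0.], [0., 1.]])    # standard basis as columns
Bp = np.column_stack([[1., 1.], [-1., 1.]])   # basis B' as columns

P = np.linalg.solve(Bp, B)   # transition matrix from B to B': (B')^{-1} B
Q = np.linalg.solve(B, Bp)   # transition matrix from B' to B: B^{-1} B'
print(P)  # [[ 0.5  0.5]
          #  [-0.5  0.5]]
assert np.allclose(P @ Q, np.eye(2)) and np.allclose(Q @ P, np.eye(2))
```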


The next theorem shows the relation between the matrix representations of the same linear map with respect to different ordered bases.

Theorem 1.5.5. Let V and W be finite-dimensional vector spaces, B, B′ ordered bases for V, and C, C′ ordered bases for W. Let P be the transition matrix from B to B′ and Q the transition matrix from C to C′. Then for any linear map T : V → W,

[T ]B′,C′ = Q [T ]B,C P−1.

(Commutative diagram: T : V → W on top; [T]_{B,C} : F^n → F^m at the back and [T]_{B′,C′} : F^n → F^m at the front, joined by the coordinate maps [ ]_B, [ ]_{B′} on V and [ ]_C, [ ]_{C′} on W, and by the transition matrices P : F^n → F^n and Q : F^m → F^m.)

Proof. We can rephrase the statement of this theorem in terms of the commutative diagram above: if the diagram commutes on all four sides of the tent, it must commute at the base of the tent as well.

Now we prove the theorem. Write down the defining properties of the relevant matrices:

[T (v)]C = [T ]B,C[v]B for any v ∈ V ; (1.8)

[v]B′ = P [v]B for any v ∈ V ; (1.9)

[w]C′ = Q[w]C for any w ∈W . (1.10)

Replacing w = T (v) in (1.10) and applying the other identities above, we have

[T (v)]C′ = Q [T (v)]C = Q [T ]B,C[v]B = Q [T ]B,C P−1[v]B′

for any v ∈ V . But then [T ]B′,C′ is the unique matrix such that

[T (v)]C′ = [T ]B′,C′ [v]B′ for any v ∈ V .

We now conclude that [T ]B′,C′ = Q [T ]B,C P−1.


Corollary 1.5.6. Let V be a finite-dimensional vector space with ordered bases B and B′. Let P be the transition matrix from B to B′. Then for any linear map T : V → V,

[T ]B′ = P [T ]B P−1.

Proof. Let C = B, C′ = B′ and Q = P in Theorem 1.5.5.
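A numerical sanity check of the corollary (a sketch; numpy assumed): conjugating [T]_B by any invertible P produces a matrix representing the same map, and such similar matrices share basis-independent quantities like trace and rank (cf. Exercises 1.5.1 and 1.5.7).

```python
import numpy as np

rng = np.random.default_rng(1)
T_B = rng.standard_normal((3, 3))   # [T]_B for some linear map T
P = rng.standard_normal((3, 3))     # a random matrix is invertible with probability 1
T_Bp = P @ T_B @ np.linalg.inv(P)   # [T]_{B'} = P [T]_B P^{-1}

assert np.isclose(np.trace(T_B), np.trace(T_Bp))
assert np.linalg.matrix_rank(T_B) == np.linalg.matrix_rank(T_Bp)
```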

Definition 1.5.7. Let A and B be square matrices. We say that A is similar to B if there is an invertible matrix P such that B = PAP⁻¹. We use the notation A ∼ B to denote that A is similar to B.

Proposition 1.5.8. Similarity is an equivalence relation on Mn(F ).

Proof. This is easy and is left as an exercise.

If A is similar to B, then B is similar to A. Hence we can simply say that A and B are similar.

Proposition 1.5.9. If T : V → V is a linear map on a finite-dimensional vector space V and if B and B′ are ordered bases for V, then [T]_B ∼ [T]_{B′}.

Proof. It follows from Corollary 1.5.6.


Exercises

1.5.1. If A = [a_{ij}] is a square matrix in M_n(F), define the trace of A to be the sum of all entries on the main diagonal:

tr(A) = ∑_{i=1}^n a_{ii}.

For any A,B ∈Mn(F ), prove that

(i) tr(AB) = tr(BA);

(ii) tr(ABA−1) = tr(B) if A is invertible.

1.5.2. Let T : V → V be a linear map on a finite-dimensional vector space V. Define the trace of T to be

tr(T ) = tr([T ]B)

where B is an ordered basis for V . Prove the following statements:

(i) this definition is well-defined, i.e., independent of a basis;

(ii) tr(TS) = tr(ST ) for any linear maps S and T on V .

1.5.3. Let V be a vector space over a field F .

(i) If V is finite-dimensional, prove that it is impossible to find two linear maps S and T on V such that ST − TS = I_V.

(ii) Show that the statement in (i) is not true if V is infinite-dimensional. (Take V = F[x], S(f)(x) = f′(x) and T(f)(x) = xf(x) for any f ∈ F[x].)

1.5.4. Let V be a finite-dimensional vector space with dim V = n. Let B be an ordered basis for V and T : V → V a linear map on V. Prove that if A is an n × n matrix similar to [T]_B, then there is an ordered basis C for V such that [T]_C = A.

1.5.5. Let V be a finite-dimensional vector space with dim V = n. Show that two n × n matrices A and B are similar if and only if they are matrix representations of the same linear map on V with respect to (possibly) different ordered bases.


1.5.6. Let S, T : V → V be linear maps on a finite-dimensional vector space V. Show that there exist ordered bases B and B′ for V such that [S]_B = [T]_{B′} if and only if there is a linear isomorphism U : V → V such that T = USU⁻¹.
Hint: If [S]_B = [T]_{B′}, let U be a linear map that carries B onto B′. Conversely, if T = USU⁻¹, let B be any basis for V and B′ = U[B].

1.5.7. Show that if A and B are similar matrices, then rankA = rankB.


1.6 Sums and Direct Sums

In this section, we construct new vector spaces from existing ones. Given vector spaces V and W over the same field, we can define a vector space structure on the Cartesian product V × W. The new vector space obtained this way is called an external direct sum. On the other hand, we can define the sum of subspaces of a vector space. If the subspaces have only the zero vector in common, the sum is called the (internal) direct sum. We will investigate the relation between external and internal direct sums and generalize the idea to the case where we have an arbitrary number of vector spaces.

Definition 1.6.1. Let V and W be vector spaces over the same field F . Define

V × W = {(v, w) | v ∈ V, w ∈ W},

together with the following operations

(v, w) + (v′, w′) = (v + v′, w + w′)

k(v, w) = (kv, kw)

for any (v, w), (v′, w′) ∈ V × W and k ∈ F. It is easy to check that V × W is a vector space over F, called the direct product or the external direct sum of V and W.

Proposition 1.6.2. Let V and W be finite-dimensional vector spaces. Then

dim(V ×W ) = dimV + dimW.

Proof. Let {v_1, . . . , v_n} and {w_1, . . . , w_m} be bases for V and W, respectively. Then B = {(v_1, 0), . . . , (v_n, 0)} ∪ {(0, w_1), . . . , (0, w_m)} is a basis for V × W. To see that B spans V × W, let (v, w) ∈ V × W. Then v and w can be written uniquely as v = α_1v_1 + · · · + α_nv_n and w = β_1w_1 + · · · + β_mw_m, where α_i, β_j ∈ F for all i and j. Then

(v, w) = (v, 0) + (0, w) = ∑_{i=1}^n α_i(v_i, 0) + ∑_{j=1}^m β_j(0, w_j).


Now, let α_1, . . . , α_n, β_1, . . . , β_m be elements in F such that

∑_{i=1}^n α_i(v_i, 0) + ∑_{j=1}^m β_j(0, w_j) = (0, 0).

Then

(∑_{i=1}^n α_i v_i, ∑_{j=1}^m β_j w_j) = (0, 0).

Hence ∑_{i=1}^n α_i v_i = 0 and ∑_{j=1}^m β_j w_j = 0. It follows that α_i = 0 and β_j = 0 for all i, j.

Proposition 1.6.2 shows that the dimension of V × W is the sum of the dimensions of V and W when they are finite-dimensional. It suggests that the Cartesian product V × W is really a "sum" and not a product of vector spaces; that is why we call it the external direct sum. The adjective external emphasizes that we construct a new vector space from the existing ones.

Next, we turn to constructing a new subspace from existing ones. We know that an intersection of subspaces is still a subspace, but a union of subspaces may not be a subspace. The sum of subspaces will play the role of the union, as we will see below.

Definition 1.6.3. Let W_1 and W_2 be subspaces of a vector space V. Define the sum of W_1 and W_2 to be

W_1 + W_2 = {w_1 + w_2 | w_1 ∈ W_1, w_2 ∈ W_2}.

Proposition 1.6.4. Let W_1 and W_2 be subspaces of a vector space V over a field F. Then W_1 + W_2 is the subspace of V generated by W_1 ∪ W_2, i.e.

W1 +W2 = 〈W1 ∪W2〉 .

Proof. Clearly W_1 + W_2 ≠ ∅. Let x, y ∈ W_1 + W_2 and k ∈ F. Then there exist w_1, w′_1 ∈ W_1 and w_2, w′_2 ∈ W_2 such that x = w_1 + w_2 and y = w′_1 + w′_2. Hence

x + y = (w_1 + w_2) + (w′_1 + w′_2) = (w_1 + w′_1) + (w_2 + w′_2) ∈ W_1 + W_2;
kx = k(w_1 + w_2) = (kw_1) + (kw_2) ∈ W_1 + W_2.


Thus W_1 + W_2 is a subspace of V. Next, note that W_1 and W_2 are subsets of W_1 + W_2, which implies W_1 ∪ W_2 ⊆ W_1 + W_2. Hence 〈W_1 ∪ W_2〉 ⊆ W_1 + W_2. Now let x ∈ W_1 + W_2. Then x = w_1 + w_2, where w_1 ∈ W_1 and w_2 ∈ W_2. Thus w_1 and w_2 belong to W_1 ∪ W_2. This implies x = w_1 + w_2 ∈ 〈W_1 ∪ W_2〉. It follows that W_1 + W_2 ⊆ 〈W_1 ∪ W_2〉.

Example 1.6.5.

(i) We can write R2 as a sum of two subspaces in several ways:

R2 = 〈(1, 0)〉+ 〈(0, 1)〉 = 〈(1, 0)〉+ 〈(1, 1)〉 = 〈(1,−1)〉+ 〈(1, 1)〉 .

(ii) R3 can be written as a sum of the xy-plane and the yz-plane:

R^3 = {(x, y, 0) | x, y ∈ R} + {(0, y, z) | y, z ∈ R}. (1.11)

Also, R3 can be written as a sum of the xy-plane and the z-axis:

R^3 = {(x, y, 0) | x, y ∈ R} + {(0, 0, z) | z ∈ R}. (1.12)

Note that in (1.11), R^3 is a sum of subspaces of dimension 2, while in (1.12), R^3 is a sum of a subspace of dimension 2 and a subspace of dimension 1.

If V = W_1 + W_2, every element v in V can be written as v = w_1 + w_2, where w_1 ∈ W_1 and w_2 ∈ W_2, but this representation need not be unique. It is unique precisely when W_1 ∩ W_2 = {0}, in which case the sum is called the (internal) direct sum.

Definition 1.6.6. Let W_1 and W_2 be subspaces of a vector space V. We say that V is the (internal) direct sum of W_1 and W_2, written V = W_1 ⊕ W_2, if V = W_1 + W_2 and W_1 ∩ W_2 = {0}.

Proposition 1.6.7. Let W_1 and W_2 be subspaces of a vector space V. Then V = W_1 ⊕ W_2 if and only if every v ∈ V can be written uniquely as v = w_1 + w_2 for some w_1 ∈ W_1 and w_2 ∈ W_2.

Proof. Assume that V = W_1 ⊕ W_2. By the definition of W_1 + W_2, every v ∈ V can be written as v = w_1 + w_2 for some w_1 ∈ W_1 and w_2 ∈ W_2. Assume


that v = w_1 + w_2 = w′_1 + w′_2, where w_1, w′_1 ∈ W_1 and w_2, w′_2 ∈ W_2. Then w_1 − w′_1 = w′_2 − w_2 ∈ W_1 ∩ W_2 = {0}. This shows that w_1 = w′_1 and w_2 = w′_2.

Conversely, it is easy to see that V = W_1 + W_2. Let v ∈ W_1 ∩ W_2. Then we can write v = v + 0 ∈ W_1 + W_2 and v = 0 + v ∈ W_1 + W_2. By the uniqueness part of the assumption, we have v = 0. Hence W_1 ∩ W_2 = {0} and thus V = W_1 ⊕ W_2.

Example 1.6.8.

(i) In Example 1.6.5, we have the following sum of R3:

R^3 = {(x, y, 0) | x, y ∈ R} + {(0, y, z) | y, z ∈ R}.

However, this is not a direct sum because

{(x, y, 0) | x, y ∈ R} ∩ {(0, y, z) | y, z ∈ R} = {(0, y, 0) | y ∈ R}.

On the other hand, we have the following direct sum

R^3 = {(x, y, 0) | x, y ∈ R} ⊕ {(0, 0, z) | z ∈ R}.

(ii) Let V = M_n(R) and let W_1 and W_2 be the subspaces of symmetric matrices and of skew-symmetric matrices, respectively:

W_1 = {A ∈ M_n(R) | A^t = A} and W_2 = {A ∈ M_n(R) | A^t = −A}.

We leave it as an exercise to show that V = W1 ⊕W2.
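The decomposition in (ii) is completely explicit: A = (A + A^t)/2 + (A − A^t)/2, with the first summand symmetric and the second skew-symmetric. A numerical sketch (numpy assumed):

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)
sym  = (A + A.T) / 2    # the symmetric part, in W_1
skew = (A - A.T) / 2    # the skew-symmetric part, in W_2

assert np.allclose(sym, sym.T)
assert np.allclose(skew, -skew.T)
assert np.allclose(A, sym + skew)   # A splits as claimed
```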

Theorem 1.6.9. Let W_1 and W_2 be subspaces of a finite-dimensional vector space. Then

dim(W1 +W2) = dimW1 + dimW2 − dim(W1 ∩W2).

Proof. Let B_0 = {v_1, . . . , v_k} be a basis for W_1 ∩ W_2. Then B_0 is a linearly independent subset of both W_1 and W_2. Extend it to a basis B_1 = {v_1, . . . , v_k, w_1, . . . , w_n} for W_1 and also extend it to a basis B_2 = {v_1, . . . , v_k, w′_1, . . . , w′_m} for W_2. Note that w_i ≠ w′_j for any i, j; for otherwise they would lie in W_1 ∩ W_2. We will show that the set B = B_1 ∪ B_2 = {v_1, . . . , v_k, w_1, . . . , w_n, w′_1, . . . , w′_m} is a basis for W_1 + W_2. Once we establish this fact, it then follows that

dim(W_1 + W_2) = k + n + m = dim W_1 + dim W_2 − dim(W_1 ∩ W_2).


To show that B spans W_1 + W_2, let v = u_1 + u_2, where u_1 ∈ W_1 and u_2 ∈ W_2. Since B_1 and B_2 are bases for W_1 and W_2, respectively, we can write

u_1 = α_1v_1 + · · · + α_kv_k + α_{k+1}w_1 + · · · + α_{k+n}w_n and
u_2 = β_1v_1 + · · · + β_kv_k + β_{k+1}w′_1 + · · · + β_{k+m}w′_m,

where the α_i's and β_j's are in F. Then

u_1 + u_2 = ∑_{i=1}^k (α_i + β_i)v_i + ∑_{i=1}^n α_{k+i}w_i + ∑_{i=1}^m β_{k+i}w′_i.

Hence B spans W_1 + W_2. To establish linear independence of B, let α_1, . . . , α_k, β_1, . . . , β_n and β′_1, . . . , β′_m be elements in F such that

∑_{i=1}^k α_iv_i + ∑_{i=1}^n β_iw_i + ∑_{i=1}^m β′_iw′_i = 0. (1.13)

Then

∑_{i=1}^k α_iv_i + ∑_{i=1}^n β_iw_i = −∑_{i=1}^m β′_iw′_i ∈ W_1 ∩ W_2.

Hence

−∑_{i=1}^m β′_iw′_i = ∑_{i=1}^k γ_iv_i

for some γ_1, . . . , γ_k in F, which implies

∑_{i=1}^k γ_iv_i + ∑_{i=1}^m β′_iw′_i = 0.

By linear independence of B_2, the γ_i and β′_i are all zero. Now (1.13) reduces to

∑_{i=1}^k α_iv_i + ∑_{i=1}^n β_iw_i = 0.

By linear independence of B_1, we see that the α_i and β_i are all zero. Hence B is a basis for W_1 + W_2.
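Theorem 1.6.9 can be checked numerically when W_1 and W_2 are given as column spaces: dim(W_1 + W_2) is the rank of the matrix obtained by putting the spanning vectors side by side, and the theorem then yields dim(W_1 ∩ W_2). A sketch with the two planes of Example 1.6.5 (numpy assumed):

```python
import numpy as np

M1 = np.array([[1., 0.], [0., 1.], [0., 0.]])   # columns span the xy-plane in R^3
M2 = np.array([[0., 0.], [1., 0.], [0., 1.]])   # columns span the yz-plane in R^3

d1 = np.linalg.matrix_rank(M1)                       # dim W_1 = 2
d2 = np.linalg.matrix_rank(M2)                       # dim W_2 = 2
d_sum = np.linalg.matrix_rank(np.hstack([M1, M2]))   # dim(W_1 + W_2) = 3

print(d1 + d2 - d_sum)   # 1 = dim(W_1 ∩ W_2): the intersection is the y-axis
```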

Corollary 1.6.10. If W_1 and W_2 are subspaces of a finite-dimensional vector space V and V = W_1 ⊕ W_2, then dim V = dim W_1 + dim W_2.


Proof. If V = W_1 ⊕ W_2, then W_1 ∩ W_2 = {0}, and thus dim(W_1 ∩ W_2) = 0.

The next proposition shows the relation between internal and external direct sums.

Proposition 1.6.11. Let W_1 and W_2 be subspaces of a vector space V. Suppose that V = W_1 ⊕ W_2. Then V ∼= W_1 × W_2.

On the other hand, let V and W be vector spaces over a field F. Let X = V × W be the external direct sum of V and W. Let X_1 = {(v, 0) | v ∈ V} and X_2 = {(0, w) | w ∈ W}. Then X_1 and X_2 are subspaces of X, X_1 ∼= V, X_2 ∼= W and X = X_1 ⊕ X_2.

Proof. Exercise.

From now on, we will often talk about a direct sum without stating whether it is internal or external. We also write V ⊕ W to denote the (external) direct sum of V and W. It should be clear from the context whether it is internal or external. Moreover, by Proposition 1.6.11, we can regard it as an internal direct sum or an external direct sum without confusion. We sometimes omit the adjective "internal" or "external" and simply talk about the direct sum of vector spaces.

Proposition 1.6.12. Let W be a subspace of a vector space V. Then there exists a subspace U of V such that V = U ⊕ W.

Proof. Let B be a basis for W. Then B is linearly independent in V and hence can be extended to a basis C for V. Let B′ = C − B and U = 〈B′〉. It is easy to check that U is a subspace of V such that V = W ⊕ U.

Proposition 1.6.13. Let V_1 and V_2 be subspaces of a vector space V such that V = V_1 ⊕ V_2. Then given any vector space W and linear maps T_1 : V_1 → W and T_2 : V_2 → W, there is a unique linear map T : V_1 ⊕ V_2 → W such that T|_{V_1} = T_1 and T|_{V_2} = T_2.

Proof. Assume that there is a linear map T : V_1 ⊕ V_2 → W such that T|_{V_1} = T_1 and T|_{V_2} = T_2. By linearity, for any v_1 ∈ V_1 and v_2 ∈ V_2,

T (v1 + v2) = T (v1) + T (v2) = T1(v1) + T2(v2).


Hence we define the map T : V1 ⊕ V2 →W by

T (v1 + v2) = T1(v1) + T2(v2) for any v1 ∈ V1, v2 ∈ V2.

It is easy to show that T is linear and satisfies T|_{V_1} = T_1 and T|_{V_2} = T_2. This establishes both the existence and the uniqueness of the map T.

Proposition 1.6.13 is the universal mapping property for the direct sum. If we let ι_1 : V_1 → V_1 ⊕ V_2 and ι_2 : V_2 → V_1 ⊕ V_2 be the inclusion maps of V_1 and V_2 into V_1 ⊕ V_2, respectively, then it can be summarized by the following diagram:

  V_1 --ι_1--> V_1 ⊕ V_2 <--ι_2-- V_2
      \            |            /
       T_1         T         T_2
         \         v         /
          `------> W <------'

This proposition can also be interpreted for the external direct sum if we define ι_1 : V_1 → V_1 ⊕ V_2 and ι_2 : V_2 → V_1 ⊕ V_2 by ι_1(v_1) = (v_1, 0) and ι_2(v_2) = (0, v_2) for any v_1 ∈ V_1 and v_2 ∈ V_2.

There is also another universal mapping property of the direct sum, in terms of the projection maps.

Proposition 1.6.14. Let V_1 and V_2 be vector spaces over the same field. For i = 1, 2, define π_i : V_1 ⊕ V_2 → V_i by π_i(v_1, v_2) = v_i for any v_1 ∈ V_1 and v_2 ∈ V_2. Then given any vector space W and linear maps T_1 : W → V_1 and T_2 : W → V_2, there is a unique linear map T : W → V_1 ⊕ V_2 such that π_1 ∘ T = T_1 and π_2 ∘ T = T_2.

                   W
          T_1    / | \    T_2
                /  T  \
               v   v   v
    V_1 <--π_1-- V_1 ⊕ V_2 --π_2--> V_2

Proof. Exercise.

Next, we will define a sum and a direct sum for a finite number of subspaces.


Definition 1.6.15. Let W1, . . . ,Wn be subspaces of a vector space V . Define

W_1 + · · · + W_n = {w_1 + · · · + w_n | w_1 ∈ W_1, . . . , w_n ∈ W_n}.

Proposition 1.6.16. If W_1, . . . , W_n are subspaces of a vector space V, then W_1 + · · · + W_n is the subspace of V generated by W_1 ∪ · · · ∪ W_n:

W1 + · · ·+Wn = 〈W1 ∪ · · · ∪Wn〉 .

Proof. The proof is the same as that of Proposition 1.6.4.

Definition 1.6.17. Let W_1, . . . , W_n be subspaces of a vector space V. We say that V is the (internal) direct sum of W_1, . . . , W_n if

(i) V = W1 + · · ·+Wn, and

(ii) W_i ∩ (W_1 + · · · + W_{i−1} + W_{i+1} + · · · + W_n) = {0} for i = 1, . . . , n.

Denote it by V = W1 ⊕ · · · ⊕Wn.

The second condition in the above definition can be replaced by either of the equivalent statements below.

Proposition 1.6.18. Let W_1, . . . , W_n be subspaces of a vector space V and let V = W_1 + · · · + W_n. Then the following are equivalent:

(i) W_i ∩ (W_1 + · · · + W_{i−1} + W_{i+1} + · · · + W_n) = {0} for i = 1, . . . , n;

(ii) ∀w1 ∈W1 . . .∀wn ∈Wn, w1 + · · ·+ wn = 0 ⇒ w1 = · · · = wn = 0;

(iii) every v ∈ V can be written uniquely as v = w1 + · · ·+ wn, with wi ∈Wi.

Proof. (i) ⇒ (ii). Assume (i) holds. Let w_1 ∈ W_1, . . . , w_n ∈ W_n be such that w_1 + · · · + w_n = 0. For each i ∈ {1, . . . , n}, we see that

−w_i = w_1 + · · · + w_{i−1} + w_{i+1} + · · · + w_n ∈ W_i ∩ (W_1 + · · · + W_{i−1} + W_{i+1} + · · · + W_n) = {0}.

Hence w_i = 0 for each i ∈ {1, . . . , n}.

(ii) ⇒ (iii). Assume (ii) holds. Suppose an element v ∈ V can be written as

v = w1 + · · ·+ wn = w′1 + · · ·+ w′n,


where w_i, w′_i ∈ W_i for i = 1, . . . , n. Then

(w1 − w′1) + · · ·+ (wn − w′n) = 0.

By the assumption, w_i = w′_i for i = 1, . . . , n.

(iii) ⇒ (i). Assume (iii) holds. Let v ∈ W_i ∩ (W_1 + · · · + W_{i−1} + W_{i+1} + · · · + W_n). Then v ∈ W_i and

v = w_1 + · · · + w_{i−1} + 0 + w_{i+1} + · · · + w_n,

where w_j ∈ W_j for j ≠ i. On the other hand, v can also be written with v in the i-th position and 0 elsewhere. By the uniqueness in (iii), v = 0. This means that

W_i ∩ (W_1 + · · · + W_{i−1} + W_{i+1} + · · · + W_n) = {0}

for i = 1, 2, . . . , n.

The concepts of a direct product and an external direct sum of an arbitrary number of vector spaces can be defined similarly. We first start with the case when there are finitely many vector spaces.

Definition 1.6.19. Let W1, . . . ,Wn be vector spaces over a field F . Define

W_1 × · · · × W_n = {(w_1, . . . , w_n) | w_1 ∈ W_1, . . . , w_n ∈ W_n}.

Define the vector space operations componentwise. Then W_1 × · · · × W_n is a vector space over F, called the external direct sum of W_1, . . . , W_n.

We list important results for a finite direct sum of vector spaces, whose proofs are left as exercises.

Proposition 1.6.20. Let W_1, . . . , W_n be subspaces of a vector space V. Suppose that V = W_1 ⊕ · · · ⊕ W_n. Then V ∼= W_1 × · · · × W_n.

On the other hand, let V_1, . . . , V_n be vector spaces. Let X = V_1 × · · · × V_n be the external direct sum of V_1, . . . , V_n. For i = 1, . . . , n, let

X_i = {(v_1, . . . , v_n) | v_i ∈ V_i and v_j = 0 for j ≠ i}.

Then each X_i is a subspace of X, X_i ∼= V_i and X = X_1 ⊕ · · · ⊕ X_n.


Proof. Exercise.

We will also denote the external direct sum of V1, . . . , Vn by V1 ⊕ · · · ⊕ Vn.

Proposition 1.6.21. Assume that V = V_1 ⊕ · · · ⊕ V_n, where V_1, . . . , V_n are subspaces of a vector space V. For i = 1, . . . , n, let B_i be a linearly independent subset of V_i. Then B_1 ∪ · · · ∪ B_n is a linearly independent subset of V. In particular, if B_i is a basis for each V_i, then B_1 ∪ · · · ∪ B_n is a basis for V.

Proof. Exercise.

Corollary 1.6.22. Let V_1, . . . , V_n be subspaces of a finite-dimensional vector space V such that V = V_1 ⊕ · · · ⊕ V_n. Then

dimV = dimV1 + · · ·+ dimVn.

Proof. Exercise.

From the above definition, we see that a direct product and an external direct sum are the same when we have a finite number of vector spaces. Next, we consider the general case, when we have an arbitrary number of vector spaces. In this case, the definitions of a direct product and an external direct sum will be different, but there is still a close relation between the internal direct sum and the external direct sum.

Definition 1.6.23. Let {V_α}_{α∈Λ} be a family of vector spaces. Define the Cartesian product

∏_{α∈Λ} V_α = { v : Λ → ⋃_{α∈Λ} V_α : v(α) ∈ V_α for all α ∈ Λ }.

Define the following operations:

(v + w)(α) = v(α) + w(α)

(kv)(α) = kv(α),

for any v, w ∈ ∏_{α∈Λ} V_α and α ∈ Λ. It is easy to check that ∏_{α∈Λ} V_α is a vector space under the operations defined above. It is called the direct product of {V_α}_{α∈Λ}. Next, we define

⊕_{α∈Λ} V_α = { v ∈ ∏_{α∈Λ} V_α : v(α) = 0 for all but finitely many α }.


It is easy to see that ⊕_{α∈Λ} V_α is a subspace of ∏_{α∈Λ} V_α. We call ⊕_{α∈Λ} V_α the (external) direct sum of {V_α}_{α∈Λ}. Note that the direct product and the external direct sum of {V_α}_{α∈Λ} are the same when the index set Λ is finite.

Now we define an arbitrary internal direct sum of a vector space.

Definition 1.6.24. Let V be a vector space over a field F. Let {V_α}_{α∈Λ} be a family of subspaces of V such that

(i) V = 〈⋃_{α∈Λ} V_α〉;

(ii) for each β ∈ Λ, V_β ∩ 〈⋃_{α∈Λ−{β}} V_α〉 = {0}.

Then we say that V is the (internal) direct sum of {V_α}_{α∈Λ} and denote it by V = ⊕_{α∈Λ} V_α. An element in ⊕_{α∈Λ} V_α can be written as a finite sum ∑_{α∈Λ} v_α, where v_α ∈ V_α for each α ∈ Λ and v_α = 0 for all but finitely many α's. Moreover, this representation is unique.

Theorem 1.6.25. Let {V_α}_{α∈Λ} be a family of vector spaces. Form the external direct sum V = ⊕_{α∈Λ} V_α. For each α ∈ Λ, let W_α be the subspace of V defined by

W_α = {v ∈ V | v(β) = 0 for all β ∈ Λ − {α}}.

Then W_α ∼= V_α for each α ∈ Λ and V = ⊕_{α∈Λ} W_α as an internal direct sum.

On the other hand, let V be a vector space over a field F and {W_α}_{α∈Λ} a family of subspaces of V such that V = ⊕_{α∈Λ} W_α as an internal direct sum. Form the external direct sum W = ⊕_{α∈Λ} W_α. Then V ∼= W.

Proof. Exercise.


Exercises

1.6.1. Let V = Mn(R) be a vector space over R. Define

W_1 = {A ∈ M_n(R) | A^t = A} and W_2 = {A ∈ M_n(R) | A^t = −A}.

(a) Prove that W1 and W2 are subspaces of V .

(b) Prove that V = W1 ⊕W2.

1.6.2. Let A1, . . . , An be subsets of a vector space V . Show that

〈A1 ∪ · · · ∪An〉 = 〈A1〉+ · · ·+ 〈An〉.

1.6.3. Assume V = U ⊕ W, where U and W are subspaces of V. For any v ∈ V, there exists a unique pair (u, w) with u ∈ U and w ∈ W such that v = u + w. Define P(v) = u and Q(v) = w. Prove that

(i) P and Q are linear maps on V ;

(ii) P² = P and Q² = Q;

(iii) P (V ) = U and Q(V ) = W .

1.6.4. Let P : V → V be a linear map on a vector space V such that P² = P. Prove that V = im P ⊕ ker P.

1.6.5. Prove Proposition 1.6.11.

1.6.6. Prove Proposition 1.6.14.

1.6.7. Let U , V and W be vector spaces. Prove that

(i) L(U ⊕ V,W ) ∼= L(U,W )⊕ L(V,W );

(ii) L(U, V ⊕W ) ∼= L(U, V )⊕ L(U,W ).

Now generalize these statements to finite direct sums:

(iii) L(⊕_{i=1}^n V_i, W) ∼= ⊕_{i=1}^n L(V_i, W);

(iv) L(U, ⊕_{i=1}^n V_i) ∼= ⊕_{i=1}^n L(U, V_i).


1.6.8. Let W_1, . . . , W_n be subspaces of a vector space V. Show that V = W_1 ⊕ · · · ⊕ W_n if and only if there exist linear maps P_1, . . . , P_n on V such that

(i) IV = P1 + · · ·+ Pn;

(ii) P_iP_j = 0 for any i ≠ j;

(iii) Pi(V ) = Wi for each i.

Moreover, show that if the P_i satisfy (i) and (ii), then P_i² = P_i for each i.

1.6.9. Prove Proposition 1.6.20.

1.6.10. Let V_1, . . . , V_n be subspaces of a vector space V such that V = V_1 ⊕ · · · ⊕ V_n. Prove that given any vector space W and linear maps T_i : V_i → W for i = 1, . . . , n, there is a unique linear map T : V → W such that T|_{V_i} = T_i for i = 1, . . . , n.

1.6.11. Prove Proposition 1.6.21 and Corollary 1.6.22.

1.6.12. Let W_1, . . . , W_n be subspaces of a finite-dimensional vector space V such that V = W_1 + · · · + W_n. Prove that V = W_1 ⊕ · · · ⊕ W_n if and only if dim V = dim W_1 + · · · + dim W_n.

1.6.13. Prove Theorem 1.6.25.

1.6.14. Let {V_α}_{α∈Λ} be a family of vector spaces. For each α ∈ Λ, define the α-th projection π_α : ∏_{β∈Λ} V_β → V_α by π_α(v) = v(α) for each v ∈ ∏_{β∈Λ} V_β, and the α-th inclusion ι_α : V_α → ∏_{β∈Λ} V_β by ι_α(x) = v, where v(α) = x and v(β) = 0 for any β ≠ α.

Let V = ∏_{α∈Λ} V_α and let U = ⊕_{α∈Λ} V_α. Prove the following statements:

(i) π_α ∘ ι_α = I_{V_α}, the identity map on V_α, for each α ∈ Λ;

(ii) π_α ∘ ι_β = 0 for all α ≠ β in Λ;

(iii) π_α is surjective and ι_α is injective for all α ∈ Λ;

(iv) given a vector space W and a family of linear maps T_α : W → V_α, α ∈ Λ, there is a unique linear map T : W → V such that π_α ∘ T = T_α for each α ∈ Λ;

(v) given a vector space W and a family of linear maps S_α : V_α → W, α ∈ Λ, there is a unique linear map S : U → W such that S ∘ ι_α = S_α for each α ∈ Λ.


1.7 Quotient Spaces

Definition 1.7.1. Let W be a subspace of a vector space V. For each v ∈ V, define the affine space of v to be

v + W = {v + w | w ∈ W}.

Note that u ∈ v + W if and only if u = v + w for some w ∈ W. In general, an affine space v + W is not a subspace of V because 0 may not be in v + W.

Proposition 1.7.2. Let W be a subspace of a vector space V. Then for any u, v ∈ V,

(i) u+W = v +W ⇔ u− v ∈W ;

(ii) u + W ≠ v + W ⇒ (u + W) ∩ (v + W) = ∅.

Proof. (i) Assume that u + W = v + W. Since u = u + 0 ∈ u + W, we have u = v + w for some w ∈ W. Hence u − v = w ∈ W. Conversely, assume that u − v ∈ W. Then u + w = v + (u − v + w) ∈ v + W for any w ∈ W. Thus u + W ⊆ v + W. Since W is a subspace of V, u − v ∈ W implies v − u ∈ W, and hence v + w = u + (v − u + w) ∈ u + W for any w ∈ W. It follows that v + W ⊆ u + W. Thus u + W = v + W.

(ii) Suppose there exists x ∈ (u + W) ∩ (v + W). Then x = u + w = v + w′ for some w, w′ ∈ W. Hence u − v = w′ − w ∈ W. By part (i), it follows that u + W = v + W.

Definition 1.7.3. Let V be a vector space over a field F and W a subspace of V. Define

V/W = {v + W | v ∈ V}.

Define the vector space operations on V/W by

(u+W ) + (v +W ) = (u+ v) +W

k(v +W ) = kv +W,

for any u + W, v + W ∈ V/W and k ∈ F. It is easy to check that these operations are well-defined and that V/W is a vector space over the field F under


the operations defined above. The space V/W is called the quotient space of V modulo W.

Proposition 1.7.4. Let W be a subspace of a vector space V. Define the canonical map π : V → V/W by

π(v) = v +W for any v ∈ V .

Then π is a surjective linear map with kerπ = W .

Proof. Exercise.

Theorem 1.7.5 (Universal Mapping Property). Let V be a vector space and W ≤ V. Given any vector space U and a linear map t : V → U such that t(w) = 0 for all w ∈ W, i.e. ker t ⊇ W, there exists a unique linear map T : V/W → U such that T ∘ π = t.

    V ---π---> V/W
      \         |
       t        T
        \       |
         v      v
             U

Proof. If there exists a linear map T : V/W → U such that T ∘ π = t, then we must have

T(v + W) = T(π(v)) = (T ∘ π)(v) = t(v) for each v ∈ V.

Hence if T exists, it must be defined by the formula T(v + W) = t(v) for any v ∈ V. Now we show that this is a well-defined linear map. Assume that v_1 + W = v_2 + W. Then v_1 − v_2 ∈ W ⊆ ker t. Hence t(v_1) = t(v_2). This shows that T is well-defined. It is then easy to check that T is linear and T ∘ π = t. The uniqueness of T follows from the discussion above.

Theorem 1.7.6 (First Isomorphism Theorem). Let t : V → U be a surjective linear map. Then V/ker t ∼= U.

Proof. By the Universal Mapping Property (Theorem 1.7.5), there is a unique linear map T : V/ker t → U such that T ∘ π = t. Since t = T ∘ π is surjective,


T is surjective. To show that T is 1-1, we will prove that ker T is trivial, i.e., consists only of the coset ker t (recall that ker t is the zero element of V/ker t). Let v + ker t ∈ ker T. Then

0 = T (v + ker t) = T (π(v)) = t(v).

Hence v ∈ ker t. That is, v + ker t = ker t. On the other hand,

T (ker t) = T (π(0)) = t(0) = 0.

Hence ker t ∈ kerT .

Theorem 1.7.7 (Second Isomorphism Theorem). Let U and W be subspaces of a vector space V. Then (U + W)/W ∼= U/(U ∩ W).

Proof. Exercise.

Corollary 1.7.8. Let V = U ⊕W . Then (U ⊕W )/W ∼= U .

Theorem 1.7.9 (Third Isomorphism Theorem). Let W ≤ U and U ≤ V. Then (V/W)/(U/W) ∼= V/U.

Proof. Exercise.

Theorem 1.7.10. Let V be a finite-dimensional vector space and W a subspace of V. Then

dim(V/W ) = dimV − dimW.

Proof. Note that the canonical map π : V → V/W is a surjective linear map whose kernel is W. Now the result follows from Theorem 1.3.8.


Exercises

1.7.1. Let W be a subspace of a vector space V . Define a relation ∼ on V by

u ∼ v if u− v ∈W.

Prove that ∼ is an equivalence relation on V and that the equivalence classes are the affine spaces of V.

1.7.2. Prove that an affine space v + W is a subspace of V if and only if v ∈ W, in which case v + W = W.

1.7.3. Prove Proposition 1.7.4.

1.7.4. Let F be a field and let V = F [x]. Define W by

W = {a_0 + a_1x + · · · + a_nx^n ∈ F[x] : n ∈ N ∪ {0}, a_0 + a_1 + · · · + a_n = 0}.

Prove that W is a subspace of V and find dim(V/W ).

1.7.5. Let T : V → W be a linear map between vector spaces V and W. Let A and B be subspaces of V and W, respectively, such that T[A] ⊆ B. Denote by p : V → V/A and q : W → W/B their respective canonical maps. Prove that there is a unique linear map T̄ : V/A → W/B such that T̄ ∘ p = q ∘ T.

    V ---T---> W
    |          |
    p          q
    v          v
   V/A --T̄--> W/B

Furthermore, prove that

(i) T̄ is 1-1 if and only if A = T⁻¹[B];

(ii) T̄ is onto if and only if B + im T = W.

1.7.6. Prove the Second and Third Isomorphism Theorems.


1.8 Dual Spaces

We know that if V and W are vector spaces over a field F, the set L(V, W) of linear maps from V into W is also a vector space over F. In this section, we consider the special case where W = F. It plays an important role in various subjects such as differential geometry, functional analysis and quantum mechanics.

Definition 1.8.1. Let V be a vector space over a field F. A linear map T : V → F is called a linear functional on V. The set of linear functionals on V is called the dual space or the conjugate space of V, denoted by V∗:

V ∗ = L(V, F ) = Hom(V, F ).

By Proposition 1.3.20, V∗ is a vector space over the field F. Hence if V is a finite-dimensional vector space, then so is V∗, and dim V = dim V∗.

Example 1.8.2.

(i) For i = 1, . . . , n, let p_i : F^n → F be defined by p_i(a_1, . . . , a_n) = a_i for any (a_1, . . . , a_n) ∈ F^n. It is easy to see that each p_i is a linear functional on F^n, called the i-th coordinate function.

(ii) Let a = (a1, . . . , an) ∈ Fn. The map Ta : Fn → F defined by

Ta(x1, . . . , xn) = a · x = a1x1 + · · ·+ anxn,

for any x = (x1, . . . , xn) ∈ Fn, is a linear functional on Fn.

(iii) For each a ∈ F, define E_a : F[x] → F by E_a(p) = p(a) for each p ∈ F[x]. Then E_a is a linear functional on F[x].

(iv) Define T : C([a, b]) → R by T(f) = ∫_a^b f(x) dx for each f ∈ C([a, b]). Then T is a linear functional on C([a, b]).

(v) For any square matrix, its trace is the sum of all entries on the main diagonal of the matrix. Define tr : M_n(F) → F as follows:

tr([a_{ij}]) = ∑_{i=1}^n a_{ii}.

The map tr is a linear functional on Mn(F ), called the trace function.


Proposition 1.8.3. Let B = {v_1, v_2, . . . , v_n} be a basis for a finite-dimensional vector space V. For i = 1, 2, . . . , n, define v∗_i ∈ V∗ on the basis B by

v∗_i(v_j) = δ_{ij} = 1 if i = j; 0 if i ≠ j.

Then B∗ = {v∗_1, v∗_2, . . . , v∗_n} is a basis for V∗.

Proof. Let f ∈ V∗. We will show that f = ∑_{i=1}^n f(v_i)v∗_i. To see this, let g be the linear functional ∑_{i=1}^n f(v_i)v∗_i. Then for j = 1, 2, . . . , n,

g(v_j) = ∑_{i=1}^n f(v_i)v∗_i(v_j) = ∑_{i=1}^n f(v_i)δ_{ij} = f(v_j).

Hence f = g on the basis B and thus f = g on V. This implies that B∗ spans V∗. Next, let α_1, . . . , α_n ∈ F be such that ∑_{i=1}^n α_iv∗_i = 0. Applying both sides to v_j, for each j, we have

0 = ∑_{i=1}^n α_iv∗_i(v_j) = ∑_{i=1}^n α_iδ_{ij} = α_j.

Hence α_j = 0 for j = 1, 2, . . . , n. This shows that B∗ is linearly independent and thus a basis for V∗.
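For V = F^n the dual basis is easy to compute: if the columns of a matrix are the basis vectors, the rows of its inverse represent the dual functionals, since the defining relation v∗_i(v_j) = δ_{ij} says exactly that the two matrices multiply to the identity. A numerical sketch (numpy assumed):

```python
import numpy as np

B_mat = np.column_stack([[1., 1.], [-1., 1.]])  # basis {(1,1), (-1,1)} of R^2
dual = np.linalg.inv(B_mat)                     # row i represents the functional v*_i

# v*_i(v_j) = delta_ij: applying each row to each column gives the identity
assert np.allclose(dual @ B_mat, np.eye(2))

v = np.array([3., 5.])
print(dual @ v)  # [4. 1.]: v*_i(v) is the i-th coordinate of v in this basis
```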

Remark. Proposition 1.8.3 is not true if V is infinite-dimensional. For example, let V = F[x] and B = {1, x, x², . . .}. Then B is a basis for V. Let B∗ = {f_0, f_1, f_2, . . .}, where f_k(x^n) = δ_{kn}. It is easy to check that B∗ is linearly independent, but B∗ does not span V∗. To see this, let g ∈ V∗ be defined on the basis B by g(x^n) = 1 for every n ∈ N ∪ {0}. Suppose g ∈ 〈B∗〉. Then

g = k_0f_0 + k_1f_1 + · · · + k_mf_m

for some m ∈ N and k_0, . . . , k_m ∈ F. Apply the above equation to x^{m+1}. Then g(x^{m+1}) = 1, but f_k(x^{m+1}) = 0 for k = 0, 1, . . . , m, which is a contradiction. Hence g ∉ 〈B∗〉.


Definition 1.8.4. Let V be a vector space. For any subset S of V, the annihilator of S, denoted by S°, is defined by

S° = {f ∈ V∗ | f(x) = 0 for all x ∈ S}.

Proposition 1.8.5. Let S be a subset of a vector space V. Then

(i) {0_V}° = V∗ and V° = {0_{V∗}};

(ii) S° is a subspace of V∗;

(iii) for any subsets S_1 and S_2 of V, S_1 ⊆ S_2 implies S_2° ⊆ S_1°.

Proof. (i) This follows from the definition of the annihilator.
(ii) The proof is routine and we leave it to the reader.
(iii) Assume that S_1 ⊆ S_2 ⊆ V. For any f ∈ V∗, if f(x) = 0 for all x ∈ S_2, then f(x) = 0 for all x ∈ S_1. Hence S_2° ⊆ S_1°.

Proposition 1.8.6. If W is a subspace of a finite-dimensional vector space V, then

dim V = dim W + dim W°.

Proof. Let W be a subspace of V. Let {v_1, . . . , v_k} be a basis for W and extend it to a basis B = {v_1, . . . , v_k, v_{k+1}, . . . , v_n} for V. Let B∗ = {v∗_1, . . . , v∗_n} be the dual basis of B and let C∗ = {v∗_{k+1}, . . . , v∗_n}. We will show that C∗ is a basis for W°. Since C∗ ⊆ B∗, it follows that C∗ is linearly independent. To see that C∗ spans W°, let f ∈ W°. We will show that f = ∑_{i=k+1}^n f(v_i)v∗_i. By the proof of Proposition 1.8.3, f = ∑_{i=1}^n f(v_i)v∗_i. Since f ∈ W° and v_i ∈ W for i = 1, . . . , k, we have f(v_i) = 0 for i = 1, . . . , k. Hence f = ∑_{i=k+1}^n f(v_i)v∗_i ∈ 〈C∗〉. Now, dim W° = |C∗| = n − k = dim V − dim W.

Next, we define the dual of a linear map. Given a linear map T : V → W, we can use T to turn a linear functional f on W into a linear functional on V simply by composition: f ∘ T. Hence there is a map from W∗ into V∗ associated to T.

Definition 1.8.7. Let T : V → W be a linear map. Define T^t : W∗ → V∗ by

T^t(f) = f ∘ T for any f ∈ W∗.

The map T t is called the transpose or the dual of T .


Proposition 1.8.8. Let V and W be vector spaces. Then

(i) If T ∈ L(V,W ), then T t ∈ L(W ∗, V ∗).

(ii) (IV )t = IV ∗.

(iii) (αS + βT )t = αSt + βT t for any S, T ∈ L(V,W ) and α, β ∈ F .

(iv) (TS)t = StT t for any S ∈ L(U, V ) and T ∈ L(V,W ).

(v) If T ∈ L(V,W ) is invertible, then T t is invertible and (T t)−1 = (T−1)t.

Proof. (i) Let f , g ∈W ∗ and α, β ∈ F . Then

T^t(αf + βg) = (αf + βg) ∘ T = α(f ∘ T) + β(g ∘ T) = αT^t(f) + βT^t(g).

Hence T^t : W∗ → V∗ is linear.

(ii) Note that

(I_V)^t(f) = f ∘ I_V = f = I_{V∗}(f) for any f ∈ V∗.

Hence (I_V)^t = I_{V∗}.

(iii) Let S, T ∈ L(V, W) and α, β ∈ F. Then for any f ∈ W∗,

(αS + βT)^t(f) = f ∘ (αS + βT) = α(f ∘ S) + β(f ∘ T) = αS^t(f) + βT^t(f).

Hence (αS + βT)^t = αS^t + βT^t.

(iv) Let S ∈ L(U, V) and T ∈ L(V, W). Then for any f ∈ W∗,

(TS)^t(f) = f ∘ (T ∘ S) = (f ∘ T) ∘ S = S^t(f ∘ T) = S^t(T^t(f)).

Hence (TS)^t = S^tT^t.

(v) Assume that T ∈ L(V, W) is invertible. Then there is S ∈ L(W, V) such that ST = I_V and TS = I_W. Then

T tSt = (ST )t = (IV )t = IV ∗ and StT t = (TS)t = (IW )t = IW ∗ .

This shows that T t is invertible and (T t)−1 = St = (T−1)t.


Proposition 1.8.9. Let V and W be finite-dimensional vector spaces over F and T : V → W a linear map. Let B and C be ordered bases for V and W, respectively. Also, let B∗ and C∗ be the dual (ordered) bases of B and C, respectively. Then

[T t]C∗,B∗ = [T ]tB,C.

Proof. Let B = {v1, . . . , vn} and C = {w1, . . . , wm} be ordered bases for V and W, respectively. Let B∗ = {v∗1, . . . , v∗n} and C∗ = {w∗1, . . . , w∗m} be the dual bases of B and C, respectively. Let A = [aij] = [T]B,C and B = [bij] = [Tᵗ]C∗,B∗. Then for j = 1, . . . , n,

T(vj) = ∑_{k=1}^m akj wk

and for i = 1, . . . , m,

Tᵗ(w∗i) = ∑_{k=1}^n bki v∗k.

These two equalities imply that

aij = w∗i(T(vj)) = (Tᵗ(w∗i))(vj) = bji

for i = 1, . . . , m and j = 1, . . . , n. This shows that B = Aᵗ.
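Numerically, with respect to the standard bases and their duals, this proposition says the dual map of "multiply by A" is "multiply by Aᵗ". A minimal sketch in numpy (illustrative only; the particular matrices are ours):

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., -1., 4.]])   # [T]_{B,C}: a map R^3 -> R^2
f = np.array([2., -5.])         # a functional on R^2, coordinates in C*
v = np.array([1., 1., 1.])

# T^t(f) acts on v as f(T(v)); in coordinates this is (A.T @ f) @ v.
assert np.isclose(f @ (A @ v), (A.T @ f) @ v)
print(A.T)                      # the matrix [T^t]_{C*,B*} = [T]^t
```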

If V is a vector space, its dual space V∗ is also a vector space, and hence we can again define the dual space (V∗)∗ of V∗. In the sense that we will describe below, the second dual V∗∗ = (V∗)∗ is closely related to the original space V; in particular, if V is finite-dimensional, V and V∗∗ are isomorphic via a canonical (basis-free) linear isomorphism.

Definition 1.8.10. If V is a vector space, the dual space of V∗, denoted by V∗∗, is called the double dual or the second dual of V.

To establish the main result about the double dual space, the following proposition will be useful.

Proposition 1.8.11. Let V be a vector space and v ∈ V. If f(v) = 0 for every f ∈ V∗, then v = 0. Equivalently, if v ≠ 0, then there exists f ∈ V∗ such that f(v) ≠ 0.


Proof. Assume that v ≠ 0. Then {v} is linearly independent and thus can be extended to a basis B for V. Let t : B → F be defined by t(v) = 1 and t(x) = 0 for any x ∈ B − {v}. Extend t to a linear functional f on V. Hence f ∈ V∗ and f(v) ≠ 0.

Theorem 1.8.12. Let V be a vector space over a field F. For each v ∈ V, define v̂ : V∗ → F by v̂(f) = f(v) for any f ∈ V∗. Then

(i) v̂ is a linear functional on V∗, i.e. v̂ ∈ V∗∗ for each v ∈ V;

(ii) the map Φ : V → V∗∗, v ↦ v̂, is an injective linear map;

(iii) if V is finite-dimensional, then Φ is a linear isomorphism.

Hence V ≅ V∗∗, via the canonical map Φ, if V is finite-dimensional.

Proof. (i) For any f, g ∈ V∗ and α, β ∈ F,

v̂(αf + βg) = (αf + βg)(v) = αf(v) + βg(v) = αv̂(f) + βv̂(g).

This shows that v̂ is linear on V∗ for each v ∈ V.

(ii) Let v, w ∈ V and α ∈ F. Then, for any f ∈ V∗,

Φ(v + w)(f) = f(v + w) = f(v) + f(w) = v̂(f) + ŵ(f) = (v̂ + ŵ)(f).

Similarly, for any f ∈ V∗,

Φ(αv)(f) = f(αv) = αf(v) = αv̂(f).

Hence Φ(v + w) = v̂ + ŵ = Φ(v) + Φ(w) and Φ(αv) = αv̂ = αΦ(v). This shows that Φ is linear. To see that it is 1-1, let v ∈ V be such that v ≠ 0. By Proposition 1.8.11, there exists f ∈ V∗ such that f(v) ≠ 0. Thus v̂(f) ≠ 0, i.e., v̂ ≠ 0. Hence Φ is 1-1.

(iii) If V is finite-dimensional, then dim V∗∗ = dim V∗ = dim V. Since Φ is 1-1, by Theorem 1.3.16, it is a linear isomorphism.


Exercises

1.8.1. Consider C as a vector space over R. Prove that the dual basis for {1, i} is {Re, Im}, where Re and Im are the real part and the imaginary part, respectively, of a complex number.

1.8.2. Let V be a finite-dimensional vector space and U, W subspaces of V. Prove that

(i) (U + W)⁰ = U⁰ ∩ W⁰.

(ii) (U ∩ W)⁰ = U⁰ + W⁰.

(iii) If V = U ⊕ W, then V∗ = U⁰ ⊕ W⁰.

Also try to prove these statements without assuming that V is finite-dimensional.

1.8.3. Let V be a finite-dimensional vector space and W a subspace of V. Prove that W ≅ (W⁰)⁰ under the canonical isomorphism between V and V∗∗.

1.8.4. Let V be a vector space. For any M ⊆ V∗, define the annihilator °M of M by

°M = {x ∈ V | f(x) = 0 for all f ∈ M}.

Prove that

(i) °{0V∗} = V and °(V∗) = {0V}.

(ii) °M is a subspace of V.

(iii) For any M1, M2 ⊆ V∗, if M1 ⊆ M2, then °M2 ⊆ °M1.

(iv) If V is finite-dimensional and W is a subspace of V∗, then

dim V = dim V∗ = dim W + dim(°W).

1.8.5. Let V and W be vector spaces. Prove that (V ⊕ W)∗ ≅ V∗ ⊕ W∗, where the direct sums are external.


1.8.6. Let {Vα}α∈Λ be a family of vector spaces. Prove that

(⊕α∈Λ Vα)∗ ≅ ∏α∈Λ V∗α.

1.8.7. Let V be a vector space. Prove the following statements:

(i) If U is a proper subspace of V and x ∈ V − U, then there exists f ∈ V∗ such that f(x) = 1 but f(U) = {0};

(ii) for any subspaces W1 and W2 of V, W1 = W2 if and only if W1⁰ = W2⁰.

1.8.8. Let V be a finite-dimensional vector space. If C is a basis for V∗, prove that there exists a basis B for V such that C = B∗.

1.8.9. Let f, g ∈ V∗ be such that ker f ⊆ ker g. Prove that g = αf for some α ∈ F.

1.8.10. Let T : V → W be a linear map. Prove the following statements:

(i) ker Tᵗ = (im T)⁰.

(ii) im Tᵗ = (ker T)⁰.

(iii) T is 1-1 if and only if Tᵗ is onto.

(iv) T is onto if and only if Tᵗ is 1-1.

(v) rank T = rank Tᵗ if V is finite-dimensional.

Hint for (ii): Let U be a subspace of W such that W = im T ⊕ U. If g ∈ (ker T)⁰, define f(T(x) + u) = g(x) for any x ∈ V and u ∈ U.

1.8.11. Let T : V → W be a linear map. Let ΦV : V → V∗∗ and ΦW : W → W∗∗ be the canonical maps, defined in Theorem 1.8.12, for V and W, respectively. Let Tᵗᵗ : V∗∗ → W∗∗ denote the double transpose of T. Prove that Tᵗᵗ ∘ ΦV = ΦW ∘ T. Draw a commutative diagram.


Chapter 2

Multilinear Algebra

In this chapter, we study various aspects of multilinear maps. A multilinear map is a function defined on a product of vector spaces which is linear in each factor. To study a multilinear map, we turn it into a linear map on a new vector space, called a tensor product of the vector spaces. A tensor product is characterized by its universal mapping property.

Then we look at the determinant of a matrix, which can be regarded as a multilinear map on the row vectors. The determinant function is an example of an alternating multilinear map. This leads to a study of the exterior product of vector spaces, which is also defined by a universal mapping property, now for alternating multilinear maps.

To acquaint the reader with the concept of a universal mapping property, we first start with free vector spaces, which will be used in the construction of a tensor product of vector spaces.

2.1 Free Vector Spaces

Throughout this chapter, unless otherwise stated, F will be an arbitrary field. Given any non-empty set X, we will construct a vector space FX over F which contains X as a basis. Then FX is called a free vector space on X.

Recall that if V is a vector space with a basis B, we have the following universal mapping property:


[Diagram: iB : B → V, t : B → W, and the unique T : V → W with T ∘ iB = t.]

Given a vector space W and a function t : B → W, there exists a unique linear map T : V → W such that T ∘ iB = t.

We now define a free vector space on a non-empty set by this universal mapping property.

Definition 2.1.1. Let X be a non-empty set. A free vector space on X is a pair (V, i) consisting of a vector space V and a function i : X → V satisfying the following universal mapping property: given a vector space W and a function t : X → W, there exists a unique linear map T : V → W such that T ∘ i = t.

[Diagram: i : X → V, t : X → W, and the unique T : V → W with T ∘ i = t.]

Hence if V is a vector space over F with a basis B, then (V, iB) is a free vector space on B, where iB : B → V is the inclusion map.

Proposition 2.1.2. If (V, i) is a free vector space on a non-empty set X, then i is injective.

Proof. Let x, y ∈ X be such that x ≠ y. Take W = F in the universal mapping property and choose a function t : X → F so that t(x) ≠ t(y) (e.g., t(x) = 0, t(y) = 1). Then there is a unique linear map T : V → F such that T ∘ i = t. It follows that T(i(x)) ≠ T(i(y)), which implies i(x) ≠ i(y). Thus i is injective.

If (V, i) is a free vector space on a non-empty set X, we will soon see that i(X) forms a basis for V. Since i is injective, we can identify X with the subset i(X) of V and simply say that V is a vector space containing X as a basis. The term “free” means there is no relationship between the elements of X. The point of view here is that, starting from an arbitrary set, we can construct a vector space for which the given set is a basis.

Proposition 2.1.3. Let F be a field and X a non-empty set. Then there exists a free vector space over F on X.


Proof. Define

FX = {f : X → F | f(x) ≠ 0 for only finitely many x}.

For each x ∈ X, define δx : X → F by δx(y) = 1 if y = x and δx(y) = 0 if y ≠ x. It is now routine to verify that

(i) FX is a subspace of the space of all functions from X to F;

(ii) for any f ∈ FX, f = ∑_x f(x)δx, where the sum is a finite sum;

(iii) {δx}x∈X is linearly independent.

It follows that FX is a vector space over F containing {δx}x∈X as a basis. Let iX : X → FX be defined by iX(x) = δx for each x ∈ X. It is readily checked that the universal mapping property is satisfied. Hence (FX, iX) is a free vector space over F on X.

With a slight abuse of notation, we will identify the function δx with the element x ∈ X itself. Then we can view FX as a vector space containing X as a basis. A typical element in FX can be written as ∑_{i=1}^n αixi, where n ∈ ℕ, αi ∈ F and xi ∈ X, for i = 1, . . . , n. The vector space operations are performed by combining like terms using the rules

αxi + βxi = (α + β)xi,
α(βxi) = (αβ)xi.
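This construction is easy to mirror in code, which may make the "combining like terms" rules concrete: a finitely supported function X → F is just a dictionary from elements of X to nonzero scalars. A minimal illustrative sketch in Python (not part of the notes; all function names are ours):

```python
from collections import defaultdict

def delta(x):
    """The basis vector delta_x of F_X, as a finitely supported function."""
    return {x: 1.0}

def add(f, g):
    """f + g: combine like terms via (alpha + beta) x_i."""
    h = defaultdict(float, f)
    for x, beta in g.items():
        h[x] += beta
    return {x: c for x, c in h.items() if c != 0.0}

def scale(alpha, f):
    """alpha * f: the rule alpha(beta x_i) = (alpha beta) x_i."""
    return {x: alpha * c for x, c in f.items()} if alpha != 0.0 else {}

# 2*'a' + 3*'b' - 2*'a' leaves 3*'b'
v = add(add(scale(2.0, delta('a')), scale(3.0, delta('b'))),
        scale(-2.0, delta('a')))
print(v)   # {'b': 3.0}
```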

In general, there are several ways to construct a free vector space on a non-empty set. However, the universal mapping property shows that different constructions of a free vector space on the same set are all isomorphic. Hence a free vector space is uniquely determined up to isomorphism.

Proposition 2.1.4. A free vector space on a non-empty set X is unique up to isomorphism. More precisely, if (V1, i1) and (V2, i2) are free vector spaces on X, then there is a linear isomorphism T : V1 → V2 such that T ∘ i1 = i2.


Proof. Let (V1, i1) and (V2, i2) be free vector spaces on a non-empty set X. By the universal mapping property, we have the following commutative diagrams:

[Diagrams: i1 : X → V1 and i2 : X → V2, each with a map t into an arbitrary vector space W.]

Now, taking W = V2 and t = i2 in the first diagram, we obtain a linear map T1 : V1 → V2 such that T1 ∘ i1 = i2. Similarly, taking W = V1 and t = i1 in the second diagram, we obtain a linear map T2 : V2 → V1 such that T2 ∘ i2 = i1.

Hence (T1 ∘ T2) ∘ i2 = i2 and (T2 ∘ T1) ∘ i1 = i1. However, the identity map IV1 on V1 is the unique linear map such that IV1 ∘ i1 = i1, and similarly IV2 is the unique linear map such that IV2 ∘ i2 = i2. Hence T2 ∘ T1 = IV1 and T1 ∘ T2 = IV2. This shows that T1 and T2 are inverses of each other. Hence T = T1 : V1 → V2 is a linear isomorphism such that T ∘ i1 = i2.


Exercises

2.1.1. Let (V, i) be a free vector space on a non-empty set X. Given a vector space U and a function j : X → U, show that (U, j) is a free vector space on X if and only if there is a unique linear map f : U → V such that f ∘ j = i.

2.1.2. Let (V, i) be a free vector space on a non-empty set X. Prove directly from the universal mapping property that i(X) spans V.
Hint: Let W be the span of i(X) and iW : W → V the inclusion map. Apply the universal mapping property to the commutative diagram below to show that iW is surjective.

[Diagram: the corestriction X → W of i, the inclusion iW : W → V, and the map ϕ : V → W given by the universal mapping property.]

2.1.3. Let (V, i) be a free vector space on a non-empty set X. Prove that i(X) is a basis for V.


2.2 Multilinear Maps and Tensor Products

Definition 2.2.1. Let V1, . . . , Vn and W be vector spaces over F. A function f : V1 × · · · × Vn → W is said to be multilinear if for each i ∈ {1, 2, . . . , n},

f(x1, . . . , αxi + βyi, . . . , xn) = αf(x1, . . . , xi, . . . , xn) + βf(x1, . . . , yi, . . . , xn)

for any xi, yi ∈ Vi and α, β ∈ F. In other words, a multilinear map is a function on a Cartesian product of vector spaces which is linear in each variable.

If W = F, we call it a multilinear form. Denote by Mul(V1, . . . , Vn; W) the set of multilinear maps from V1 × · · · × Vn into W.

Remark. If n = 1, a multilinear map is simply a linear map. If n = 2, we call it a bilinear map. In general, we may call a multilinear map on a product of n vector spaces an n-linear map.

Examples.

(1) Let V be a vector space. Then the dual pairing ω : V × V∗ → F defined by ω(v, f) = f(v) for any v ∈ V and f ∈ V∗ is a bilinear form.

(2) If V is an algebra, a multiplication · : V × V → V is a bilinear map.

(3) Let A be an n × n matrix over F. The map L : Fⁿ × Fⁿ → F defined by

L(x, y) = yᵗAx for any x, y ∈ Fⁿ,

is a bilinear form on Fⁿ. Here we identify a vector in Fⁿ with an n × 1 column matrix. (See the sketch after these examples.)

(4) We can view the determinant function on Mn(F) as a multilinear map as follows. Let A be an n × n matrix and r1, . . . , rn the rows of A. Then the determinant can be viewed as a function det : Fⁿ × · · · × Fⁿ → F defined by

det(r1, . . . , rn) = det A.

That det is a multilinear map follows from the following properties:

det(r1, . . . , ri + r′i, . . . , rn) = det(r1, . . . , ri, . . . , rn) + det(r1, . . . , r′i, . . . , rn),
det(r1, . . . , αri, . . . , rn) = α det(r1, . . . , ri, . . . , rn).
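Example (3) is easy to test numerically: bilinearity of L(x, y) = yᵗAx amounts to linearity of matrix multiplication in each argument. A quick illustrative check in numpy (not part of the notes; the random data is ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x, x2, y = rng.standard_normal((3, n))
a, b = 2.0, -3.0

L = lambda x, y: y @ A @ x   # y^t A x, with vectors as 1-D arrays
# linearity in the first variable; the check in y is symmetric
assert np.isclose(L(a * x + b * x2, y), a * L(x, y) + b * L(x2, y))
```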


Proposition 2.2.2. Let V1, . . . , Vn and W be vector spaces over F. Then the set of multilinear maps Mul(V1, . . . , Vn; W) is a vector space over F under the addition and scalar multiplication defined by

(f + g)(v1, . . . , vn) = f(v1, . . . , vn) + g(v1, . . . , vn),
(kf)(v1, . . . , vn) = k f(v1, . . . , vn).

Proof. The proof is routine and we leave it to the reader.

Proposition 2.2.3. Let V1, . . . , Vn and W be vector spaces over F. Then for any n ≥ 2,

Mul(V1, . . . , Vn; W) ∩ L(V1 × · · · × Vn, W) = {0}.

Proof. Let T ∈ Mul(V1, . . . , Vn; W) ∩ L(V1 × · · · × Vn, W). Then

T(v1, v2, . . . , vn) = T(v1, 0, . . . , 0) + T(0, v2, . . . , vn)
= 0 · T(v1, v2, 0, . . . , 0) + 0 · T(v1, v2, . . . , vn)
= 0

for any (v1, v2, . . . , vn) ∈ V1 × · · · × Vn. The first equality follows from the linearity of T, and the second from the multilinearity in the second and first variables, respectively.

From this proposition, we see that the theory of linear maps cannot be applied to multilinear maps directly. However, we can transform a multilinear map into a linear map on a certain vector space, apply the theory of linear algebra to this induced linear map, and then transfer the information back to the original multilinear map. In the process of doing so, we will construct a new vector space which is very important in its own right. It is called a tensor product of vector spaces. We begin by considering a tensor product of two vector spaces.

Let U and V be vector spaces over F. We would like to define a new vector space U ⊗ V which is the “product” of U and V. (Note that the direct product U × V is really the “sum” of U and V.) The space U ⊗ V will consist of formal elements of the form

α1(u1 ⊗ v1) + · · · + αn(un ⊗ vn), (2.1)

where n ∈ ℕ, αi ∈ F, ui ∈ U and vi ∈ V, for i = 1, 2, . . . , n.


Moreover, it satisfies the distributive laws:

(αu1 + βu2) ⊗ v = α(u1 ⊗ v) + β(u2 ⊗ v) ∀u1, u2 ∈ U ∀v ∈ V ∀α, β ∈ F;
u ⊗ (αv1 + βv2) = α(u ⊗ v1) + β(u ⊗ v2) ∀u ∈ U ∀v1, v2 ∈ V ∀α, β ∈ F.

But we do not assume that the tensor product is commutative: in general, u ⊗ v ≠ v ⊗ u. In fact, u ⊗ v and v ⊗ u live in different spaces.

By the distributive laws, we can rewrite the formal sum (2.1) as

(α1u1) ⊗ v1 + · · · + (αnun) ⊗ vn.

By renaming αiui as ui, any element in U ⊗ V can be written as

u1 ⊗ v1 + · · · + un ⊗ vn. (2.2)

However, this representation is not unique. We can have different formal sums (2.2) that represent the same element in U ⊗ V. This will be a problem when we define a function on the tensor product U ⊗ V. To get around this problem, we will introduce the universal mapping property of a tensor product. In fact, we will define a tensor product U ⊗ V to be the universal object that turns a bilinear map on U × V into a linear map on U ⊗ V. Any linear map on the tensor product U ⊗ V will be defined through the universal mapping property.

Definition 2.2.4. Let U and V be vector spaces over F. A tensor product of U and V is a vector space X over F, together with a bilinear map b : U × V → X, with the following universal mapping property: given any vector space W and a bilinear map ϕ : U × V → W, there exists a unique linear map φ : X → W such that φ ∘ b = ϕ.

[Diagram: b : U × V → X, ϕ : U × V → W, and the unique φ : X → W with φ ∘ b = ϕ.]

There are several ways to define a tensor product of vector spaces. If the vector spaces are finite-dimensional, we can give an elementary construction. On the other hand, one can construct a tensor product of modules, in which case a construction of a tensor product of vector spaces is a special case. Here, we will adopt a middle ground in which we construct a tensor product of two vector spaces, not necessarily finite-dimensional.


Theorem 2.2.5. Let U and V be vector spaces. Then a tensor product of U and V exists.

Proof. Let U and V be vector spaces over a field F. Let (FU×V, i) denote the free vector space on U × V. Here U × V is the Cartesian product of U and V with no algebraic structure. Then

FU×V = {∑_finite αj(uj, vj) | (uj, vj) ∈ U × V and αj ∈ F}.

Let T be the subspace of FU×V generated by all vectors of the form

α(u, v) + β(u′, v) − (αu + βu′, v) and
α(u, v) + β(u, v′) − (u, αv + βv′)

for all α, β ∈ F, u, u′ ∈ U and v, v′ ∈ V. Let b : U × V → FU×V/T be the map defined by b(u, v) = (u, v) + T. Note that b is just the composition of the canonical map i : U × V → FU×V with the projection map π : FU×V → FU×V/T. Since α(u, v) + β(u′, v) − (αu + βu′, v) ∈ T and α(u, v) + β(u, v′) − (u, αv + βv′) ∈ T for all α, β ∈ F, u, u′ ∈ U and v, v′ ∈ V, it follows that

(αu + βu′, v) + T = α(u, v) + β(u′, v) + T and
(u, αv + βv′) + T = α(u, v) + β(u, v′) + T

for all α, β ∈ F, u, u′ ∈ U and v, v′ ∈ V. From this, it is easy to see that b is bilinear. Next, we prove that the quotient space FU×V/T satisfies the universal mapping property in Definition 2.2.4. Consider the following diagram:

[Diagram: i : U × V → FU×V, π : FU×V → FU×V/T, b = π ∘ i, ϕ : U × V → W, the induced ϕ̄ : FU×V → W, and the unique φ : FU×V/T → W.]

Let W be a vector space over F and ϕ : U × V → W a bilinear map. By the universal mapping property of the free vector space FU×V, there exists a unique linear map ϕ̄ : FU×V → W such that ϕ̄ ∘ i = ϕ. Since ϕ is bilinear, ϕ̄


sends each of the vectors which generate T to zero, so T ⊆ ker ϕ̄. Hence by the universal mapping property of the quotient space, there exists a unique linear map φ : FU×V/T → W such that φ ∘ π = ϕ̄. Hence

φ ∘ b = φ ∘ π ∘ i = ϕ̄ ∘ i = ϕ.

It remains to show that φ is unique. Suppose that φ′ : FU×V/T → W is a linear map such that φ′ ∘ b = ϕ. Then φ′ ∘ π : FU×V → W is a linear map for which (φ′ ∘ π) ∘ i = φ′ ∘ b = ϕ. Hence by the uniqueness of ϕ̄, we have φ′ ∘ π = ϕ̄. But then by the uniqueness of φ, we have φ′ = φ.

We have given a construction of a tensor product of two vector spaces. In fact, there are different ways of constructing a tensor product. For example, if U and V are finite-dimensional vector spaces, then the space Bil(U∗, V∗; F) consisting of all bilinear maps from U∗ × V∗ into F satisfies the universal mapping property for the tensor product. (Exercise!) However, any construction of a tensor product gives an isomorphic vector space, as stated in the next proposition.

Proposition 2.2.6. A tensor product of U and V is unique up to isomorphism. More precisely, if (X1, b1) and (X2, b2) are tensor products of U and V, then there is a linear isomorphism F : X1 → X2 such that F ∘ b1 = b2.

Proof. The proof here is the same as the proof of uniqueness of a free vector space on a non-empty set (Proposition 2.1.4). We repeat it here for the sake of completeness. Let (X1, b1) and (X2, b2) be tensor products of U and V. Note that b1 and b2 are bilinear maps from U × V into X1 and X2, respectively. By the universal mapping property of (X1, b1), there exists a unique linear map F1 : X1 → X2 such that F1 ∘ b1 = b2. Similarly, there exists a unique linear map F2 : X2 → X1 such that F2 ∘ b2 = b1.

[Diagram: b1 : U × V → X1 and b2 : U × V → X2, with the induced maps F1 : X1 → X2 and F2 : X2 → X1.]


Hence F2 ∘ F1 ∘ b1 = b1. But IX1 is the unique linear map from X1 into X1 such that IX1 ∘ b1 = b1. Thus F2 ∘ F1 = IX1. Similarly, F1 ∘ F2 = IX2.

[Diagrams: the uniqueness argument applied to b1 and b2; IX1 is the unique map with IX1 ∘ b1 = b1, and IX2 the unique map with IX2 ∘ b2 = b2.]

Thus F1 = F2⁻¹. Hence F = F1 : X1 → X2 is a linear isomorphism such that F ∘ b1 = b2.

Remark. Since the tensor product of U and V is unique up to isomorphism, we denote it by U ⊗ V, or U ⊗F V if the base field is to be emphasized. We summarize the universal mapping property as follows: given any vector space W and a bilinear map ϕ : U × V → W, there exists a unique linear map φ : U ⊗ V → W such that φ ∘ b = ϕ.

[Diagram: b : U × V → U ⊗ V, ϕ : U × V → W, and the unique φ : U ⊗ V → W.]

We also write u ⊗ v = b(u, v).

Proposition 2.2.7. Let U and V be vector spaces over a field F . Then

(i) (αu1 + βu2)⊗ v = α(u1 ⊗ v) + β(u2 ⊗ v) ∀u1, u2 ∈ U ∀v ∈ V ∀α, β ∈ F ;

(ii) u⊗ (αv1 + βv2) = α(u⊗ v1) + β(u⊗ v2) ∀u ∈ U ∀v1, v2 ∈ V ∀α, β ∈ F .

Proof. (i) Let u1, u2 ∈ U , v ∈ V and α, β ∈ F . Then by bilinearity of b,

(αu1 + βu2)⊗ v = b(αu1 + βu2, v)

= αb(u1, v) + βb(u2, v)

= α(u1 ⊗ v) + β(u2 ⊗ v)

(ii) The proof of (ii) is very similar and is omitted here.

Theorem 2.2.8. Let U and V be vector spaces. If B and C are bases for U and V, respectively, then {u ⊗ v | u ∈ B, v ∈ C} is a basis for U ⊗ V.


Proof. Let D = {u ⊗ v | u ∈ B, v ∈ C}. To see that D is linearly independent, let ui ∈ B, vj ∈ C and aij ∈ F, for i = 1, . . . , n, j = 1, . . . , m, be such that

∑_{i=1}^n ∑_{j=1}^m aij(ui ⊗ vj) = 0. (2.3)

For k = 1, . . . , n, define ϕk : B → F by ϕk(u) = 1 if u = uk and ϕk(u) = 0 otherwise. Similarly, for ℓ = 1, . . . , m, define ψℓ : C → F by ψℓ(v) = 1 if v = vℓ and ψℓ(v) = 0 otherwise. Extend ϕk and ψℓ to linear functionals on U and V, respectively. Moreover, for k = 1, . . . , n, ℓ = 1, . . . , m, define fkℓ : U × V → F by

fkℓ(u, v) = ϕk(u)ψℓ(v) for any (u, v) ∈ U × V.

It is easy to see that fkℓ is a bilinear map for k = 1, . . . , n, ℓ = 1, . . . , m. Hence there is a unique linear map Fkℓ : U ⊗ V → F such that Fkℓ ∘ b = fkℓ. In particular, Fkℓ(ui ⊗ vj) = Fkℓ ∘ b(ui, vj) = fkℓ(ui, vj) = ϕk(ui)ψℓ(vj).

Now, for k = 1, . . . , n, ℓ = 1, . . . , m, apply Fkℓ to (2.3):

0 = Fkℓ(∑_{i=1}^n ∑_{j=1}^m aij(ui ⊗ vj)) = ∑_{i=1}^n ∑_{j=1}^m aij Fkℓ(ui ⊗ vj) = ∑_{i=1}^n ∑_{j=1}^m aij ϕk(ui)ψℓ(vj) = akℓ.

This shows that the coefficients aij = 0 for any i = 1, . . . , n, j = 1, . . . , m. Thus D is linearly independent.


Next, let Y = span D. Then Y is a subspace of the vector space U ⊗ V. Thus there is a subspace Z of U ⊗ V such that U ⊗ V = Y ⊕ Z. Let Φ be the projection map from U ⊗ V (= Y ⊕ Z) onto Y, namely,

Φ(y + z) = y (y ∈ Y, z ∈ Z).

To show that Φ ∘ b = b, let u = ∑_{i=1}^n αiui ∈ U and v = ∑_{j=1}^m βjvj ∈ V, where ui ∈ B, vj ∈ C and αi, βj ∈ F for i = 1, . . . , n and j = 1, . . . , m. Then by bilinearity of b, we have

b(u, v) = b(∑_{i=1}^n αiui, ∑_{j=1}^m βjvj) = ∑_{i,j} αiβj b(ui, vj) = ∑_{i,j} αiβj(ui ⊗ vj).

Hence b(u, v) ∈ span D = Y. It follows that

Φ ∘ b(u, v) = Φ(b(u, v)) = b(u, v).

Thus Φ ∘ b = b, as desired. Now Φ : U ⊗ V → U ⊗ V is a linear map such that Φ ∘ b = b, and IU⊗V : U ⊗ V → U ⊗ V is the unique linear map such that IU⊗V ∘ b = b. By the uniqueness, we have Φ = IU⊗V. It follows that Z = {0} and that span D = Y = U ⊗ V. We can now conclude that D is a basis for U ⊗ V.

Corollary 2.2.9. Let U and V be finite-dimensional vector spaces. Then

dim(U ⊗ V ) = (dimU)(dimV ).

Proof. Let B, C and D be the bases of U, V and U ⊗ V, respectively, as in the proof of Theorem 2.2.8. Then |D| = |B| · |C|.
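For U = Fᵐ and V = Fⁿ, the tensor product can be realized concretely so that u ⊗ v becomes the outer product uvᵗ (cf. Exercise 2.2.2 below); the mn matrices ei ⊗ ej = eiejᵗ are then exactly the basis of Theorem 2.2.8, matching dim(U ⊗ V) = mn. An illustrative check in numpy (not part of the notes):

```python
import numpy as np

m, n = 2, 3
Em, En = np.eye(m), np.eye(n)
# the basis {e_i (x) e_j}, realized as outer products e_i e_j^t
basis = [np.outer(Em[i], En[j]) for i in range(m) for j in range(n)]

# stacked as vectors, they are linearly independent and there are mn of them
M = np.stack([b.ravel() for b in basis])
assert np.linalg.matrix_rank(M) == m * n   # dim(U ⊗ V) = (dim U)(dim V)
```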

Corollary 2.2.10. Let U and V be vector spaces. Then any element in U ⊗ V can be written as

∑_{i=1}^n ui ⊗ vi,

where n ∈ ℕ, ui ∈ U and vi ∈ V, for i = 1, . . . , n.

Proof. Let B and C be bases for U and V, respectively. Let x ∈ U ⊗ V. Then x can be written as

x = ∑_{i=1}^n ∑_{j=1}^m aij(ui ⊗ vj),

where m, n ∈ ℕ, ui ∈ B, vj ∈ C and aij ∈ F for i = 1, . . . , n and j = 1, . . . , m. Thus

x = ∑_{i=1}^n ∑_{j=1}^m aij(ui ⊗ vj) = ∑_{i=1}^n ui ⊗ (∑_{j=1}^m aij vj) = ∑_{i=1}^n ui ⊗ v′i,

where each v′i = ∑_{j=1}^m aij vj ∈ V.

Remark. A typical element in U ⊗ V is not u ⊗ v, but a linear combination of elements of the form ∑_{i=1}^n ui ⊗ vi, where n ∈ ℕ, ui ∈ U, vi ∈ V, i = 1, . . . , n. A linear combination of products may not be expressible as a single product of two elements:

U ⊗ V ≠ {u ⊗ v | u ∈ U, v ∈ V}.

But

U ⊗ V = span{u ⊗ v | u ∈ U, v ∈ V} = {∑_{i=1}^n ui ⊗ vi | n ∈ ℕ, ui ∈ U, vi ∈ V}.

However, a linear combination that represents an element in U ⊗ V is not unique. For example,

2(u ⊗ v) = (2u) ⊗ v = u ⊗ (2v) = u ⊗ v + u ⊗ v = (2u) ⊗ (2v) − 2(u ⊗ v).

This is an important point because a function on the tensor product U ⊗ V defined by specifying its action on such expressions may not be well-defined. In general, we will use the universal mapping property to define a linear map on the tensor product. We will see more about this later.
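In the outer-product model of Fᵐ ⊗ Fⁿ, the non-uniqueness above can be seen directly: all five expressions for 2(u ⊗ v) give the same matrix. A small illustrative check (numpy; the vectors are ours):

```python
import numpy as np

u, v = np.array([1., 2.]), np.array([3., 0., -1.])
t = lambda a, b: np.outer(a, b)                    # a ⊗ b as a matrix

expr = [2 * t(u, v),
        t(2 * u, v),
        t(u, 2 * v),
        t(u, v) + t(u, v),
        t(2 * u, 2 * v) - 2 * t(u, v)]
assert all(np.allclose(e, expr[0]) for e in expr)  # one element, many sums
```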

Next we investigate several properties of tensor products.

Theorem 2.2.11. Let V be a vector space over a field F . Then

F ⊗ V ≅ V ≅ V ⊗ F.

Proof. Let b : F × V → F ⊗ V be a bilinear map defined by

b(k, v) = k ⊗ v for any (k, v) ∈ F × V .


Define ϕ : F × V → V by

ϕ(k, v) = kv for any (k, v) ∈ F × V.

It is easy to see that ϕ is a bilinear map. Then there is a unique linear map Φ : F ⊗ V → V such that Φ ∘ b = ϕ. In particular, Φ(k ⊗ v) = kv for any k ∈ F and v ∈ V. Now define Ψ : V → F ⊗ V by Ψ(v) = 1 ⊗ v for any v ∈ V. By Proposition 2.2.7, Ψ is linear. Moreover, Φ ∘ Ψ = IV. To see that Ψ ∘ Φ = IF⊗V, consider, for any k ∈ F and v ∈ V,

Ψ ∘ Φ(k ⊗ v) = Ψ(kv) = 1 ⊗ kv = k(1 ⊗ v) = k ⊗ v. (2.4)

Hence Ψ ∘ Φ = IF⊗V on the set {k ⊗ v | k ∈ F, v ∈ V}, which spans F ⊗ V. It follows that Ψ ∘ Φ = IF⊗V. (Alternatively, (2.4) shows that Ψ ∘ Φ ∘ b = b; by uniqueness, Ψ ∘ Φ = IF⊗V.) It means that Φ and Ψ are inverses of each other, and that Φ : F ⊗ V → V is a linear isomorphism. This establishes F ⊗ V ≅ V. Similarly, we can show that V ⊗ F ≅ V.

Theorem 2.2.12. Let U and V be vector spaces over a field F . Then

U ⊗ V ≅ V ⊗ U.

Proof. Let b1 : U × V → U ⊗ V and b2 : V × U → V ⊗ U be the canonical bilinear maps, given by

b1(u, v) = u ⊗ v and b2(v, u) = v ⊗ u.

Define ϕ : U × V → V ⊗ U by

ϕ(u, v) = b2(v, u) = v ⊗ u.

Similarly, define ψ : V × U → U ⊗ V by

ψ(v, u) = b1(u, v) = u ⊗ v.

Then ϕ and ψ are bilinear maps, and hence there exists a unique pair of linear maps Φ : U ⊗ V → V ⊗ U and Ψ : V ⊗ U → U ⊗ V such that Φ ∘ b1 = ϕ and Ψ ∘ b2 = ψ. Note that

Ψ ∘ Φ ∘ b1(u, v) = Ψ ∘ ϕ(u, v) = Ψ ∘ b2(v, u) = ψ(v, u) = b1(u, v)

for any (u, v) ∈ U × V. Hence Ψ ∘ Φ ∘ b1 = b1. Similarly, Φ ∘ Ψ ∘ b2 = b2. Thus Ψ ∘ Φ = IU⊗V and Φ ∘ Ψ = IV⊗U. It follows that U ⊗ V ≅ V ⊗ U.


Theorem 2.2.13. Let U , V , W be vector spaces over a field F . Then

(U ⊗ V) ⊗ W ≅ U ⊗ (V ⊗ W).

Proof. Fix w ∈ W. Define ϕw : U × V → U ⊗ (V ⊗ W) by

ϕw(u, v) = u ⊗ (v ⊗ w) for any u ∈ U and v ∈ V.

By Proposition 2.2.7, ϕw is bilinear. Then there exists a unique linear map φw : U ⊗ V → U ⊗ (V ⊗ W) such that

φw(u ⊗ v) = u ⊗ (v ⊗ w) for any u ∈ U and v ∈ V.

It is easy to see that φw+w′ = φw + φw′ and φkw = kφw for any w, w′ ∈ W and k ∈ F. Hence we may define a bilinear map φ : (U ⊗ V) × W → U ⊗ (V ⊗ W) by

φ(x, w) = φw(x) for any x ∈ U ⊗ V and w ∈ W.

Then there is a unique linear map Φ : (U ⊗ V) ⊗ W → U ⊗ (V ⊗ W) such that

Φ(x ⊗ w) = φ(x, w) = φw(x) for any x ∈ U ⊗ V and w ∈ W.

In particular, for any u ∈ U, v ∈ V and w ∈ W,

Φ((u ⊗ v) ⊗ w) = u ⊗ (v ⊗ w). (2.5)

Similarly, there is a linear map Ψ : U ⊗ (V ⊗ W) → (U ⊗ V) ⊗ W such that

Ψ(u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w (2.6)

for any u ∈ U, v ∈ V and w ∈ W. By (2.5) and (2.6), we see that

Ψ ∘ Φ((u ⊗ v) ⊗ w) = Ψ(u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w

for any u ∈ U, v ∈ V and w ∈ W. If x ∈ U ⊗ V, then x is a linear combination of elements of the form u ⊗ v. This implies that

Ψ ∘ Φ(x ⊗ w) = x ⊗ w for any x ∈ U ⊗ V and w ∈ W.

Hence Ψ ∘ Φ = I(U⊗V)⊗W on the set {x ⊗ w | x ∈ U ⊗ V, w ∈ W}, which spans (U ⊗ V) ⊗ W. It follows that Ψ ∘ Φ = I(U⊗V)⊗W. Similarly, Φ ∘ Ψ = IU⊗(V⊗W). This shows that (U ⊗ V) ⊗ W ≅ U ⊗ (V ⊗ W).


Theorem 2.2.14. Let U , V , W be vector spaces over a field F . Then

U ⊗ (V ⊕ W) ≅ (U ⊗ V) ⊕ (U ⊗ W).

Proof. Define ϕ : U × (V ⊕ W) → (U ⊗ V) ⊕ (U ⊗ W) by

ϕ(u, (v, w)) = (u ⊗ v, u ⊗ w).

It is easy to check that ϕ is bilinear. Hence there is a unique linear map Φ : U ⊗ (V ⊕ W) → (U ⊗ V) ⊕ (U ⊗ W) such that

Φ(u ⊗ (v, w)) = (u ⊗ v, u ⊗ w).

Let f1 : U × V → U ⊗ (V ⊕ W) and f2 : U × W → U ⊗ (V ⊕ W) be defined by

f1(u, v) = u ⊗ (v, 0) and f2(u, w) = u ⊗ (0, w).

Then f1 and f2 are bilinear maps, and hence there exist linear maps ψ1 : U ⊗ V → U ⊗ (V ⊕ W) and ψ2 : U ⊗ W → U ⊗ (V ⊕ W) such that

ψ1(u ⊗ v) = u ⊗ (v, 0) and ψ2(u ⊗ w) = u ⊗ (0, w).

Now, define Ψ : (U ⊗ V) ⊕ (U ⊗ W) → U ⊗ (V ⊕ W) by

Ψ(x, y) = ψ1(x) + ψ2(y) for any x ∈ U ⊗ V and y ∈ U ⊗ W.

In particular, for any u1, u2 ∈ U, v ∈ V and w ∈ W,

Ψ(u1 ⊗ v, u2 ⊗ w) = u1 ⊗ (v, 0) + u2 ⊗ (0, w).

It is routine to verify that

Ψ ∘ Φ(u ⊗ (v, w)) = u ⊗ (v, w).

Since {u ⊗ (v, w) | u ∈ U, v ∈ V, w ∈ W} spans U ⊗ (V ⊕ W), it follows that Ψ ∘ Φ = IU⊗(V⊕W). On the other hand, if u ∈ U and v ∈ V, then

Φ ∘ ψ1(u ⊗ v) = Φ(u ⊗ (v, 0)) = (u ⊗ v, u ⊗ 0) = (u ⊗ v, 0).


Hence Φ ∘ ψ1(x) = (x, 0) for any x ∈ U ⊗ V. Similarly, Φ ∘ ψ2(y) = (0, y) for any y ∈ U ⊗ W. Hence for any x ∈ U ⊗ V and y ∈ U ⊗ W,

Φ ∘ Ψ(x, y) = Φ(ψ1(x) + ψ2(y)) = Φ ∘ ψ1(x) + Φ ∘ ψ2(y) = (x, 0) + (0, y) = (x, y).

Thus Φ ∘ Ψ = I(U⊗V)⊕(U⊗W). Hence Φ and Ψ are mutually inverse linear isomorphisms.

We can generalize the definition of a tensor product of two vector spaces to a tensor product of n vector spaces by the universal mapping property.

Definition 2.2.15. Let V1, . . . , Vn be vector spaces over the same field F. A tensor product of V1, . . . , Vn is a vector space V1 ⊗ · · · ⊗ Vn, together with an n-linear map t : V1 × · · · × Vn → V1 ⊗ · · · ⊗ Vn, satisfying the following universal mapping property: given a vector space W and an n-linear map ϕ : V1 × · · · × Vn → W, there exists a unique linear map φ : V1 ⊗ · · · ⊗ Vn → W such that φ ∘ t = ϕ.

[Diagram: t : V1 × · · · × Vn → V1 ⊗ · · · ⊗ Vn, ϕ : V1 × · · · × Vn → W, and the unique φ with φ ∘ t = ϕ.]

We can show that a tensor product V1 ⊗ · · · ⊗ Vn exists and is unique up to isomorphism. The proof is similar to the case n = 2 and will only be sketched here. The uniqueness part is routine. For the existence, consider the free vector space FV1×···×Vn on V1 × · · · × Vn modulo the subspace T generated by the elements of the form

(v1, . . . , αvi + βv′i, . . . , vn) − α(v1, . . . , vi, . . . , vn) − β(v1, . . . , v′i, . . . , vn).

Let t(v1, . . . , vn) = (v1, . . . , vn) + T. It is routine to verify that t is an n-linear map and that FV1×···×Vn/T satisfies the universal mapping property above. An element t(v1, . . . , vn) in V1 ⊗ · · · ⊗ Vn will be denoted by v1 ⊗ · · · ⊗ vn.

If Bi is a basis for Vi for i = 1, . . . , n, then the following set is a basis for V1 ⊗ · · · ⊗ Vn:

{v1 ⊗ · · · ⊗ vn | v1 ∈ B1, . . . , vn ∈ Bn}.


Theorem 2.2.13 can be generalized to the following theorem:

Theorem 2.2.16. Let V1, . . . , Vn be vector spaces over the same field F. For k = 1, . . . , n − 1, there is a unique linear isomorphism

Φk : (V1 ⊗ · · · ⊗ Vk) ⊗ (Vk+1 ⊗ · · · ⊗ Vn) → V1 ⊗ · · · ⊗ Vn

such that for any v1 ∈ V1, . . . , vn ∈ Vn,

Φk((v1 ⊗ · · · ⊗ vk) ⊗ (vk+1 ⊗ · · · ⊗ vn)) = v1 ⊗ · · · ⊗ vk ⊗ vk+1 ⊗ · · · ⊗ vn.

Proof. Exercise.


Exercises

2.2.1. Let U , V and W be vector spaces over F . Show that

Bil(U, V; W) ≅ L(U ⊗ V, W),

where Bil(U, V ;W ) is the space of bilinear maps from U × V into W .

2.2.2. Show that there is a unique linear map Φ : Fᵐ ⊗ Fⁿ → Mm×n(F) such that

Φ((x1, x2, . . . , xm) ⊗ (y1, y2, . . . , yn)) = [xiyj],

the m × n matrix whose (i, j)-entry is xiyj. Then prove that Φ is a linear isomorphism. Hence Fᵐ ⊗ Fⁿ ≅ Mm×n(F).

Now, let n = 2 and let e1 = (1, 0), e2 = (0, 1) be the standard basis for F². Notice that the 2 × 2 identity matrix I2 corresponds to the element e1 ⊗ e1 + e2 ⊗ e2 in F² ⊗ F². Show that we cannot find u, v ∈ F² such that e1 ⊗ e1 + e2 ⊗ e2 = u ⊗ v. This shows that an element of a tensor product need not be a simple tensor.
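In the matrix model of this exercise, simple tensors u ⊗ v correspond to the rank-one (or zero) matrices uvᵗ, so I2, which has rank 2, cannot be a simple tensor. A one-line illustrative check (numpy):

```python
import numpy as np

# A simple tensor u ⊗ v corresponds to np.outer(u, v), which has rank <= 1.
print(np.linalg.matrix_rank(np.eye(2)))  # 2, so e1⊗e1 + e2⊗e2 is not simple
```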

2.2.3. Let V and W be vector spaces.

(i) Prove that there is a unique linear map Φ: V ∗⊗W ∗ → (V ⊗W )∗ such that

Φ(f ⊗ g)(v ⊗ w) = f(v)g(w)

for any f ∈ V ∗, g ∈W ∗, v ∈ V and w ∈W .

(ii) Show that if V and W are finite-dimensional, then Φ is a linear isomorphism. Hence

(V ⊗ W)∗ ≅ V∗ ⊗ W∗.

2.2.4. Let V and W be vector spaces. Give a canonical linear map V∗ ⊗ W → Hom(V, W) and prove that it is a linear isomorphism when V and W are finite-dimensional. Hence

V∗ ⊗ W ≅ Hom(V, W).


2.2.5. Let U and V be finite-dimensional vector spaces over a field F. Denote by Bil(U∗, V∗; F) the set of bilinear maps from U∗ × V∗ into F. Let b : U × V → Bil(U∗, V∗; F) be defined by

b(u, v)(f, g) = f(u)g(v)

for any u ∈ U, v ∈ V and f ∈ U∗, g ∈ V∗.
Prove that the pair (Bil(U∗, V∗; F), b) satisfies the universal mapping property for a tensor product: given any vector space W over F and a bilinear map ϕ : U × V → W, there exists a unique linear map Φ : Bil(U∗, V∗; F) → W such that Φ ∘ b = ϕ. (This gives another construction of a tensor product U ⊗ V when U and V are finite-dimensional.)

2.2.6. Let U and V be finite-dimensional vector spaces, u1, . . . , un ∈ U and v1, . . . , vn ∈ V. Prove that if ∑_{i=1}^n ui ⊗ vi = 0 and {v1, . . . , vn} is linearly independent, then ui = 0 for i = 1, . . . , n.

2.2.7. Let V, V′, W and W′ be vector spaces. Let S : V → V′ and T : W → W′ be linear maps. Show that there exists a unique linear map Ψ : V ⊗ W → V′ ⊗ W′ such that

Ψ(v ⊗ w) = S(v) ⊗ T(w) for any v ∈ V and w ∈ W.

The unique linear map Ψ is called the tensor product of S and T, denoted by S ⊗ T. Hence

(S ⊗ T)(v ⊗ w) = S(v) ⊗ T(w) for any v ∈ V and w ∈ W.

2.2.8. Let V, V′, W and W′ be vector spaces over F. Let S, S′ : V → V′ and T, T′ : W → W′ be linear maps and k ∈ F. Show that

(i) S ⊗ (T + T ′) = S ⊗ T + S ⊗ T ′;

(ii) (S + S′)⊗ T = S ⊗ T + S′ ⊗ T ;

(iii) (kS)⊗ T = S ⊗ (kT ) = k(S ⊗ T ).


2.2.9. Let V and W be vector spaces. Let S1, S2 : V → V and T1, T2 : W → W

be linear maps. Show that

(S1 ⊗ T1)(S2 ⊗ T2) = (S1S2)⊗ (T1T2).

2.2.10. Prove Theorem 2.2.16.


2.3 Determinants

In this section, we will define the determinant function. Here, we do not need the fact that F is a field; it suffices to assume that F is a commutative ring with identity. However, we will develop the theory on vector spaces over a field as before, keeping in mind that what we do here works in the more general situation where vector spaces are replaced by modules over a commutative ring with identity.

Definition 2.3.1. Let V and W be vector spaces and f : Vⁿ → W a multilinear function. Then f is said to be symmetric if

f(vσ(1), . . . , vσ(n)) = f(v1, . . . , vn) for any σ ∈ Sn, (2.7)

and skew-symmetric if

f(vσ(1), . . . , vσ(n)) = (sgn σ) f(v1, . . . , vn) for any σ ∈ Sn. (2.8)

Moreover, f is said to be alternating if

f(v1, . . . , vn) = 0 whenever vi = vj for some i ≠ j.

Recall that Sn is the set of permutations of {1, 2, . . . , n} and that |Sn| = n!. Moreover, sgn σ = 1 if σ is an even permutation and sgn σ = −1 if σ is an odd permutation. Since any permutation can be written as a product of transpositions, (2.8) is equivalent to

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn) (2.9)

for any v1, . . . , vn ∈ V.

Proposition 2.3.2. Let V and W be vector spaces over F and f : Vⁿ → W a multilinear function. If f is alternating, then it is skew-symmetric. The converse holds if 1 + 1 ≠ 0 in F.

Proof. Assume that f is alternating. First, let us consider the case n = 2. Note that for any u, v ∈ V,

0 = f(u + v, u + v) = f(u, u) + f(u, v) + f(v, u) + f(v, v) = 0 + f(u, v) + f(v, u) + 0.


Hence f(u, v) = −f(v, u) for any u, v ∈ V. This argument generalizes to arbitrary n: for any v1, . . . , vn ∈ V,

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn).

This shows that (2.8) holds for a transposition σ = (i j) ∈ Sn, and hence holds for any σ ∈ Sn.

On the other hand, assume that f is skew-symmetric. Let (v1, . . . , vn) ∈ Vⁿ with vi = vj for some i ≠ j. Let σ be the transposition (i j). Then sgn σ = −1 and thus

f(v1, . . . , vi, . . . , vj, . . . , vn) = −f(v1, . . . , vj, . . . , vi, . . . , vn) = −f(v1, . . . , vi, . . . , vj, . . . , vn),

because vi = vj. Since 1 + 1 ≠ 0, we have f(v1, . . . , vi, . . . , vj, . . . , vn) = 0.

Next, we will consider multilinear maps on the vector space Fⁿ over the field F. We can view an element of (Fⁿ)ⁿ as an n × n matrix whose i-th row is the i-th component of the element.

Theorem 2.3.3. Let r ∈ F. Then there is a unique alternating multilinear map f : (Fⁿ)ⁿ → F such that f(e1, . . . , en) = r, where {e1, . . . , en} is the standard basis for Fⁿ.

Proof. (Uniqueness) Suppose f : (Fⁿ)ⁿ → F is an alternating multilinear map such that f(e1, . . . , en) = r. Let X1, . . . , Xn ∈ Fⁿ and write each of them as

Xi = (ai1, . . . , ain) = ∑_{j=1}^n aij ej.

By multilinearity,

f(X1, . . . , Xn) = f(∑_{j1=1}^n a1j1 ej1, . . . , ∑_{jn=1}^n anjn ejn) = ∑_{j1=1}^n · · · ∑_{jn=1}^n a1j1 · · · anjn f(ej1, . . . , ejn).


Since f is alternating, f(ej1, . . . , ejn) = 0 unless ej1, . . . , ejn are all distinct; that is, unless {j1, . . . , jn} = {1, . . . , n} in some order. Hence the sum above reduces to a sum of n! terms over all the permutations in Sn:

f(X1, . . . , Xn) = ∑_{σ∈Sn} a1σ(1) · · · anσ(n) f(eσ(1), . . . , eσ(n)) = ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) f(e1, . . . , en) = ∑_{σ∈Sn} r (sgn σ) a1σ(1) · · · anσ(n).

(Existence) We define the function f : (Fⁿ)ⁿ → F by

f(X1, . . . , Xn) = ∑_{σ∈Sn} r (sgn σ) a1σ(1) · · · anσ(n), (2.10)

where each Xi = (ai1, . . . , ain), and verify that it satisfies the desired properties.

To see that f is multilinear, we will show that f is linear in the first coordinate; for the other coordinates the proof is similar. Assume that X′1 = (b11, . . . , b1n). Then

f(αX1 + βX′1, . . . , Xn) = ∑_{σ∈Sn} r (sgn σ) [αa1σ(1) + βb1σ(1)] a2σ(2) · · · anσ(n)
= α ∑_{σ∈Sn} r (sgn σ) a1σ(1) · · · anσ(n) + β ∑_{σ∈Sn} r (sgn σ) b1σ(1) a2σ(2) · · · anσ(n)
= αf(X1, . . . , Xn) + βf(X′1, . . . , Xn).

To show that f(e1, . . . , en) = r, note that each ei = (δi1, . . . , δin), where δij = 1 if i = j and zero otherwise. Hence the product δ1σ(1) · · · δnσ(n) = 0 unless σ(1) = 1, . . . , σ(n) = n, i.e., σ is the identity permutation. Thus

f(e1, . . . , en) = ∑_{σ∈Sn} r (sgn σ) δ1σ(1) · · · δnσ(n) = r.

To show that f is alternating, suppose that Xj = Xk for some j ≠ k. Then the map σ ↦ σ(j k) is a 1-1 correspondence between the set of even permutations and the set of odd permutations. Recall that the set of even permutations is


denoted by An. Hence the sum on the right-hand side of (2.10) can be separated into the sum over even permutations and the sum over odd permutations:

f(X1, . . . , Xn) = ∑_{σ∈An} r a1σ(1) · · · anσ(n) − ∑_{τ∈An(j k)} r a1τ(1) · · · anτ(n). (2.11)

Let σ ∈ An and τ = σ(j k). If i ∉ {j, k}, then τ(i) = σ(j k)(i) = σ(i) and thus aiτ(i) = aiσ(i). Moreover, τ(j) = σ(j k)(j) = σ(k) implies ajτ(j) = ajσ(k) = akσ(k), since Xj = Xk. Similarly, akτ(k) = akσ(j) = ajσ(j). This shows that for any σ ∈ An and τ = σ(j k),

a1σ(1) · · · anσ(n) = a1τ(1) · · · anτ(n).

Thus each term in the first sum in (2.11) cancels with the corresponding term in the second sum, so the total sum is zero. Hence f is alternating.

Definition 2.3.4. The unique alternating multilinear function d : Mn(F) → F such that d(In) = 1 is called the determinant function on Mn(F), denoted by det. The determinant of a matrix A ∈ Mn(F) is the element det(A) of F. Hence if A = [aij], then

det(A) = ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n).

Remark. By Theorem 2.3.3 and Definition 2.3.4, it follows that any alternating multilinear function f : Mn(F) → F is a scalar multiple of the determinant function.
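The defining formula is directly computable, although at O(n! · n) cost it is practical only for small n. A sketch of the Leibniz sum in Python (illustrative only; numpy's det uses LU factorization instead):

```python
import math
import numpy as np
from itertools import permutations

def sgn(sigma):
    """Sign of a permutation, counted via inversions."""
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def det_leibniz(A):
    """det(A) = sum over sigma of sgn(sigma) * a_{1,sigma(1)} ... a_{n,sigma(n)}."""
    n = len(A)
    return sum(sgn(s) * math.prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

A = [[2., 1., 0.], [0., 3., -1.], [4., 0., 1.]]
assert np.isclose(det_leibniz(A), np.linalg.det(np.array(A)))
```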

Theorem 2.3.5. For any A, B ∈Mn(F ), det(AB) = det(A) det(B).

Proof. First, we note the following fact: if A is an m × n matrix and B is an n × p matrix, then the i-th row of the product AB is the product of the i-th row Ai of A with B.

To prove the theorem, let A1, . . . , An be the rows of A. Let d denote the determinant function, regarded as a multilinear function of the rows of a matrix. Hence det(AB) = d(A1B, . . . , AnB).

Now, keep B fixed and let

f(A1, . . . , An) = det(AB) = d(A1B, . . . , AnB).


It is easy to verify that f is an alternating multilinear function. It follows that f(A) = c det(A), where c = f(In) = det(B). Hence det(AB) = det(B) det(A).

Corollary 2.3.6. If A ∈ Mn(F) is invertible, then det(A⁻¹) = 1/det(A).

Proof. Since AA⁻¹ = In, we have det A · det(A⁻¹) = det(AA⁻¹) = det In = 1.

Theorem 2.3.7. For any A ∈ Mn(F), det(A) = det(Aᵗ).

Proof. Let A = [aij] and Aᵗ = [bij], where bij = aji. Note that if σ ∈ Sn is such that σ(i) = j, then i = σ⁻¹(j) and thus aσ(i)i = ajσ⁻¹(j). Moreover, sgn σ = sgn σ⁻¹ for any σ ∈ Sn. Hence

det(Aᵗ) = ∑_{σ∈Sn} (sgn σ) b1σ(1) · · · bnσ(n) = ∑_{σ∈Sn} (sgn σ) aσ(1)1 · · · aσ(n)n = ∑_{σ∈Sn} (sgn σ⁻¹) a1σ⁻¹(1) · · · anσ⁻¹(n).

Since the last sum is taken over all permutations in Sn, it must equal det A.

Now, we define the determinant of a linear operator on a finite-dimensional vector space.

Definition 2.3.8. Let V be a finite-dimensional vector space and T : V → V a linear map. Define the determinant of T, denoted det T, by

det T = det([T]B),

where [T]B is the matrix representation of T with respect to an ordered basis B. Note that this definition is independent of the choice of ordered basis: if B and B′ are ordered bases for V and P is the transition matrix from B to B′, then

[T]B′ = P [T]B P⁻¹,

and hence det([T]B′) = det(P [T]B P⁻¹) = det([T]B).


Proposition 2.3.9. Let S and T be linear maps on a finite-dimensional vector space. Then

det(ST ) = det(S) det(T ).

Proof. Let [S] and [T] be the matrix representations of S and T (with respect to a fixed ordered basis), respectively. Then [ST] = [S][T]. Hence

det(ST ) = det([ST ]) = det([S][T ]) = det([S]) det([T ]) = detS detT.


2.4 Exterior Products

In this section, we will construct a vector space that satisfies the universal mapping property for alternating multilinear maps.

Theorem 2.4.1. Let V be a vector space over a field F and k a positive integer. Then there exists a vector space X over F, together with a k-linear alternating map a : Vᵏ → X, satisfying the universal mapping property: given a vector space W and a k-linear alternating map ϕ : Vᵏ → W, there exists a unique linear map φ : X → W such that φ ∘ a = ϕ.

[Diagram: a : Vᵏ → X, ϕ : Vᵏ → W, and the unique φ : X → W with φ ∘ a = ϕ.]

Moreover, the pair (X, a) satisfying the universal mapping property above is unique up to isomorphism.

Proof. Let T be the subspace of V⊗k = V ⊗ · · · ⊗ V (k times) spanned by

{v1 ⊗ · · · ⊗ vk | vi = vj for some i ≠ j}.

Let X = V⊗k/T and let a : Vᵏ → X be defined by

a(v1, . . . , vk) = v1 ⊗ · · · ⊗ vk + T.

It is easy to see that a is k-linear. If v1, . . . , vk ∈ V are such that vi = vj for some i ≠ j, then v1 ⊗ · · · ⊗ vk ∈ T and hence a(v1, . . . , vk) = v1 ⊗ · · · ⊗ vk + T = T. This shows that a is alternating.

Now we show that (X, a) satisfies the universal mapping property. Let f : Vᵏ → V⊗k be the canonical k-linear map sending (v1, . . . , vk) to v1 ⊗ · · · ⊗ vk and let π : V⊗k → V⊗k/T be the canonical projection map. Then a = π ∘ f.

[Diagram: f : Vᵏ → V⊗k, π : V⊗k → V⊗k/T, a = π ∘ f, ϕ : Vᵏ → W, the induced ϕ̄ : V⊗k → W, and the unique φ : V⊗k/T → W.]


Let W be a vector space and ϕ : Vᵏ → W a k-linear alternating map. By the universal mapping property of the tensor product, there is a unique linear map ϕ̄ : V⊗k → W such that ϕ̄ ∘ f = ϕ. If v1 ⊗ · · · ⊗ vk is a generator of T, i.e. vi = vj for some i ≠ j, then

ϕ̄(v1 ⊗ · · · ⊗ vk) = ϕ̄(f(v1, . . . , vk)) = ϕ(v1, . . . , vk) = 0

because ϕ is alternating. This shows that ϕ̄ sends the elements that generate T to zero. Hence T ⊆ ker ϕ̄. Then by the universal mapping property of the quotient space, there is a unique linear map φ : V⊗k/T → W such that φ ∘ π = ϕ̄. Hence

φ ∘ a = φ ∘ π ∘ f = ϕ̄ ∘ f = ϕ.

To show that φ is unique, let φ′ : V⊗k/T → W be such that φ′ ∘ a = ϕ. Then φ′ ∘ π : V⊗k → W is a linear map for which (φ′ ∘ π) ∘ f = φ′ ∘ a = ϕ. Hence by the uniqueness of ϕ̄, we have φ′ ∘ π = ϕ̄. But then by the uniqueness of φ, we have φ′ = φ.

Finally, the uniqueness of the pair (X, a) up to isomorphism follows from the standard universal-mapping-property argument.

Definition 2.4.2. The vector space X in Theorem 2.4.1 is called the k-th exterior power of V and is denoted by ∧ᵏV. Hence ∧ᵏV is a vector space together with a k-linear alternating map a : Vᵏ → ∧ᵏV satisfying the universal mapping property: given any vector space W and a k-linear alternating map ϕ : Vᵏ → W, there is a unique linear map φ : ∧ᵏV → W such that φ ∘ a = ϕ.

[Diagram: a : Vᵏ → ∧ᵏV, ϕ : Vᵏ → W, and the unique φ : ∧ᵏV → W.]

An element a(v1, . . . , vk) in ∧ᵏV will be denoted by v1 ∧ · · · ∧ vk. It is called an exterior product or a wedge product of v1, . . . , vk.

Proposition 2.4.3. The wedge product satisfies the following properties.

(i) v1 ∧ · · · ∧ (α vi) ∧ · · · ∧ vk = α(v1 ∧ · · · ∧ vi ∧ · · · ∧ vk) for any α ∈ F ;

(ii) v1 ∧ · · · ∧ (vi + v′i) ∧ · · · ∧ vk = v1 ∧ · · · ∧ vi ∧ · · · ∧ vk + v1 ∧ · · · ∧ v′i ∧ · · · ∧ vk;


(iii) v1 ∧ · · · ∧ vk = 0 if vi = vj for some i ≠ j;

(iv) vσ(1) ∧ · · · ∧ vσ(k) = (sgn σ) v1 ∧ · · · ∧ vk for any σ ∈ Sk.

Proof. The first two properties follow from the multilinearity of a. The last two properties follow from the fact that a is alternating and skew-symmetric.

Theorem 2.4.4. Let V be a finite-dimensional vector space with a basis B = {v1, . . . , vn}. Then the following set is a basis for ∧ᵏV:

{vi1 ∧ · · · ∧ vik | 1 ≤ i1 < · · · < ik ≤ n}. (2.12)

Proof. Let C be the set in (2.12). To show that C spans ∧ᵏV, recall that the following set is a basis for V⊗k:

{vi1 ⊗ · · · ⊗ vik | 1 ≤ i1, . . . , ik ≤ n}.

By the universal mapping property of the tensor product, there is a unique linear map π : V⊗k → ∧ᵏV such that

π(x1 ⊗ · · · ⊗ xk) = x1 ∧ · · · ∧ xk

for any x1, . . . , xk ∈ V. It follows that {vi1 ∧ · · · ∧ vik | 1 ≤ i1, . . . , ik ≤ n} spans ∧ᵏV. If two indices are the same, then vi1 ∧ · · · ∧ vik = 0. If the indices are all different, then we can rearrange them in increasing order using Proposition 2.4.3 (iv). Hence C spans ∧ᵏV.

Next, we show that C is linearly independent. Let

I = {(i1, . . . , ik) | 1 ≤ i1 < · · · < ik ≤ n}.

If α = (i1, . . . , ik) ∈ I, write vα = vi1 ∧ · · · ∧ vik. Now suppose

∑_{α∈I} aα vα = 0, (2.13)

where each aα ∈ F. We will construct linear maps Fβ : ∧ᵏV → F such that Fβ(vα) = δαβ for all α, β ∈ I. Let B∗ = {f1, . . . , fn} be the dual basis of B for V∗. Then fj(vi) = δij for i, j = 1, . . . , n. For α = (i1, . . . , ik) ∈ I, define fα : Vᵏ → F by

fα(x1, . . . , xk) = ∑_{σ∈Sk} (sgn σ) fiσ(1)(x1) · · · fiσ(k)(xk).


Then fα is a k-linear alternating map. The proof of this fact is similar to the existence part of the proof of Theorem 2.3.3 and is omitted here. By the universal mapping property, there is a unique linear map Fα : ∧ᵏV → F such that Fα ∘ a = fα. Then for any α = (i1, . . . , ik) and β in I,

Fβ(vα) = fβ(vi1, . . . , vik) = δαβ.

If we apply each Fβ to both sides of (2.13), we see that aβ = 0. This means that C is linearly independent.

Corollary 2.4.5. Let V be a finite-dimensional vector space with dim V = n. Then ∧ᵏV is a finite-dimensional vector space of dimension n!/(k!(n − k)!), the binomial coefficient (n choose k).

Proof. This follows from a standard combinatorial argument: the basis (2.12) has exactly (n choose k) elements.
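The basis (2.12) is indexed by the strictly increasing k-tuples from {1, . . . , n}, which is where the binomial coefficient comes from. An illustrative enumeration in Python (not part of the notes):

```python
from itertools import combinations
from math import comb

n, k = 5, 3
basis_indices = list(combinations(range(1, n + 1), k))  # 1 <= i1 < ... < ik <= n
assert len(basis_indices) == comb(n, k)                 # dim of the k-th exterior power
print(len(basis_indices))                               # 10
```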

From this corollary, it is natural to define ∧⁰V = F. Moreover, if k > n, then ∧ᵏV = {0}. If k = n, we see that dim(∧ⁿV) = 1. Hence if {v1, . . . , vn} is a basis for V, then the singleton set {v1 ∧ · · · ∧ vn} is a basis for ∧ⁿV.

∧n V .Let T : V → V be a linear map on a vector space V over a field F . Then T

induces a unique linear map T∧k :∧k V →

∧k V such that

T∧k(x1 ∧ · · · ∧ xk) = T (x1) ∧ · · · ∧ T (xk)

for any x1, . . . , xk ∈ V . The case k = dimV will be of special interest.

Theorem 2.4.6. Let V be a finite-dimensional vector space with dim V = n and T : V → V a linear map. Then det T is the unique scalar such that

T(v1) ∧ · · · ∧ T(vn) = (det T)(v1 ∧ · · · ∧ vn)

for any v1, . . . , vn ∈ V.

Proof. By the discussion above, T∧n is a linear map on the 1-dimensional vector space ∧ⁿV. Hence there exists a unique scalar c such that T∧n(w) = cw for any w ∈ ∧ⁿV. Next, we show that c = det T. Let {v1, . . . , vn} be a basis for V. Then v1 ∧ · · · ∧ vn is a basis for ∧ⁿV. For i = 1, . . . , n, write

T(vi) = ∑_{j=1}^n aij vj.


Note that the matrix A = [aij] so obtained is the transpose of the matrix representation of T. But the determinant of a matrix equals the determinant of its transpose, so det T = det A. Now, consider

T(v1) ∧ · · · ∧ T(vn) = (∑_{j1=1}^n a1j1 vj1) ∧ · · · ∧ (∑_{jn=1}^n anjn vjn)
= ∑_{j1=1}^n · · · ∑_{jn=1}^n a1j1 · · · anjn (vj1 ∧ · · · ∧ vjn)
= ∑_{σ∈Sn} a1σ(1) · · · anσ(n) (vσ(1) ∧ · · · ∧ vσ(n))
= ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) (v1 ∧ · · · ∧ vn)
= (det T)(v1 ∧ · · · ∧ vn).

Hence T∧n(v1 ∧ · · · ∧ vn) = (det T)(v1 ∧ · · · ∧ vn). Thus c = det T.


Exercises

2.4.1. Let V be a vector space and v1, . . . , vk ∈ V. If {v1, . . . , vk} is linearly dependent, show that v1 ∧ · · · ∧ vk = 0. In particular, if dim V = n and k > n, then v1 ∧ · · · ∧ vk = 0.

2.4.2. Let V be a finite-dimensional vector space. Prove that for any k ∈ ℕ,

(∧ᵏV)∗ ≅ ∧ᵏ(V∗).

2.4.3. Let V be a finite-dimensional vector space and T : V → V a linear map. Show that if dim V = n and f : Vⁿ → F is an n-linear alternating form, then

f(T(v1), . . . , T(vn)) = (det T) f(v1, . . . , vn)

for any v1, . . . , vn ∈ V. Moreover, det T is the only scalar satisfying the above equality for all v1, . . . , vn ∈ V.

2.4.4. Let f : V^n → W be a multilinear map. Define f̃ : V^n → W by

f̃(v1, . . . , vn) = ∑_{σ∈Sn} sgn(σ) f(vσ(1), . . . , vσ(n))

for any v1, . . . , vn ∈ V. Prove that f̃ is a multilinear alternating map.

2.4.5. Let V be a vector space over a field F and k a positive integer. Show that there exists a vector space X over F, together with a symmetric k-linear map s : V^k → X satisfying the universal mapping property: given a vector space W and a symmetric k-linear map ϕ : V^k → W, there exists a unique linear map φ : X → W such that φ ∘ s = ϕ. Moreover, show that the pair (X, s) satisfying the universal mapping property above is unique up to isomorphism.

The pair (X, s) satisfying the above universal mapping property for symmetric k-linear maps is called the k-th symmetric product of V, denoted by S^k(V).

2.4.6. Let V be a finite-dimensional vector space with dimension n. What is the dimension of S^k(V)? Justify your answer.


Chapter 3

Canonical Forms

The basic question in this chapter is as follows. Given a finite-dimensional vector space V and a linear operator T : V → V, does there exist an ordered basis B for V such that [T]B has a “simple” form? First we investigate when T can be represented by a diagonal matrix. Then we will find a Jordan canonical form of a linear operator. Before doing so, we review some results about polynomials that will be used in this chapter.

3.1 Polynomials

Definition 3.1.1. A polynomial f(x) ∈ F[x] is said to be monic if the coefficient of the highest degree term of f(x) is 1. A polynomial f(x) ∈ F[x] is said to be constant if f(x) = c for some c ∈ F. Equivalently, f(x) is constant if f(x) = 0 or deg f(x) = 0.

Definition 3.1.2. Let f(x), g(x) ∈ F[x], with g(x) ≠ 0. We say that g(x) divides f(x), denoted by g(x) | f(x), if there is a polynomial q(x) ∈ F[x] such that f(x) = q(x)g(x).

Theorem 3.1.3 (Division Algorithm). Let f(x), g(x) ∈ F[x], with g(x) ≠ 0. Then there exist unique polynomials q(x) and r(x) in F[x] such that

f(x) = q(x)g(x) + r(x)

and deg r(x) < deg g(x) or r(x) = 0.


Proof. First, we will show the existence part. If f(x) = 0, take q(x) = 0 and r(x) = 0. If f(x) ≠ 0 and deg f(x) < deg g(x), take q(x) = 0 and r(x) = f(x). Assume that deg f(x) ≥ deg g(x). We will prove the theorem by induction on deg f(x). If deg f(x) = 0, then deg g(x) = 0, i.e., f(x) = a and g(x) = b for some a, b ∈ F − {0}. Then f(x) = ab⁻¹g(x) + 0, with q(x) = ab⁻¹ and r(x) = 0. Next, let f(x), g(x) ∈ F[x] with deg f(x) = n > 0 and deg g(x) = m ≤ n. Assume that the statement holds for any polynomial of degree < n. Write

f(x) = an x^n + · · · + a1 x + a0 and g(x) = bm x^m + · · · + b1 x + b0,

where n ≥ m, ai, bj ∈ F for all i, j, and bm ≠ 0. Let

h(x) = f(x) − an bm⁻¹ x^{n−m} g(x). (1)

Then either h(x) = 0 or deg h(x) < n. If h(x) = 0, take q(x) = an bm⁻¹ x^{n−m} and r(x) = 0. If deg h(x) < n, by the induction hypothesis, there exist q′(x) and r′(x) in F[x] such that

h(x) = q′(x)g(x) + r′(x) (2)

where either r′(x) = 0 or deg r′(x) < deg g(x). Combining (1) and (2), we have f(x) = (an bm⁻¹ x^{n−m} + q′(x))g(x) + r′(x), as desired.

To prove uniqueness, assume that

f(x) = q1(x)g(x) + r1(x) = q2(x)g(x) + r2(x),

where qi(x), ri(x) ∈ F[x] and ri(x) = 0 or deg ri(x) < deg g(x), for i = 1, 2. Then (q1(x) − q2(x))g(x) = r2(x) − r1(x). If r2(x) − r1(x) ≠ 0, then q1(x) − q2(x) ≠ 0, which implies

deg g(x) ≤ deg((q1(x) − q2(x))g(x)) = deg(r2(x) − r1(x)) < deg g(x),

a contradiction. Thus r2(x) − r1(x) = 0, which implies q1(x) − q2(x) = 0.
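In practice the quotient and remainder can be computed with any computer algebra system; a small sketch using SymPy's `div` (the polynomials are arbitrary illustrations):

```python
from sympy import symbols, div, expand

x = symbols('x')
f = x**4 + 2*x**3 - x + 5
g = x**2 + 1

# div returns (q, r) with f = q*g + r and deg r < deg g (or r = 0)
q, r = div(f, g, x)
print(q, r)                       # x**2 + 2*x - 1 and -3*x + 6
assert expand(q*g + r - f) == 0
```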

Corollary 3.1.4. Let p(x) ∈ F[x] and α ∈ F. Then p(α) = 0 if and only if p(x) = (x − α)q(x) for some q(x) ∈ F[x].


Proof. Assume that p(α) = 0. By the Division Algorithm, there exist q(x), r(x) in F[x] such that p(x) = (x − α)q(x) + r(x) where deg r(x) < 1 or r(x) = 0, i.e., r(x) is constant. By the assumption, we have r(α) = p(α) = 0, which implies r(x) = 0. So p(x) = (x − α)q(x). The converse is obvious.

Definition 3.1.5. Let p(x) ∈ F[x] and α ∈ F. We say that α is a root or a zero of p(x) if p(α) = 0. Hence the above corollary says that α is a root of p(x) if and only if x − α is a factor of p(x).

Corollary 3.1.6. Any polynomial of degree n ≥ 1 has at most n distinct zeros.

Proof. We will prove by induction on n. The case n = 1 is clear. Assume that the statement holds for a positive integer n. Let f(x) be a polynomial of degree n + 1. If f has no zero, we are done. Suppose that f(α) = 0 for some α ∈ F. By Corollary 3.1.4, f(x) = (x − α)g(x) for some g(x) ∈ F[x]. Hence deg g(x) = n. By the induction hypothesis, g(x) has at most n distinct zeros, which implies that f(x) has at most n + 1 distinct zeros.

Definition 3.1.7. Let f1(x), . . . , fn(x) ∈ F[x]. A monic polynomial g(x) ∈ F[x] is said to be the greatest common divisor of f1(x), . . . , fn(x) if it satisfies these two properties:

(i) g(x) | fi(x) for i = 1, . . . , n;

(ii) for any h(x) ∈ F [x], if h(x) | fi(x) for i = 1, . . . , n, then h(x) | g(x).

We denote the greatest common divisor of f1, . . . , fn by gcd(f1, . . . , fn).

Definition 3.1.8. Let f1(x), . . . , fn(x) ∈ F[x]. A monic polynomial g(x) ∈ F[x] is said to be the least common multiple of f1(x), . . . , fn(x) if it satisfies these two properties:

(i) fi(x) | g(x) for i = 1, . . . , n;

(ii) for any h(x) ∈ F [x], if fi(x) | h(x) for i = 1, . . . , n, then g(x) | h(x).

We denote the least common multiple of f1, . . . , fn by lcm(f1, . . . , fn).


Proposition 3.1.9. Let f1(x), . . . , fn(x) be nonzero polynomials in F [x] and let

g(x) = gcd(f1(x), . . . , fn(x)).

Then there exist q1(x), . . . , qn(x) ∈ F [x] such that

g(x) = q1(x)f1(x) + · · ·+ qn(x)fn(x).

Proof. Let

P = { ∑_{i=1}^n pi(x)fi(x) : pi(x) ∈ F[x], i = 1, . . . , n }.

Since each fi(x) ∈ P is nonzero, P contains a nonzero element. By the Well-Ordering Principle, P contains a nonzero polynomial d(x) ∈ F[x] of smallest degree. Thus d(x) = ∑_{i=1}^n pi(x)fi(x) for some pi(x) ∈ F[x], i = 1, . . . , n. By the Division Algorithm, there exist a(x), r(x) ∈ F[x] such that f1(x) = a(x)d(x) + r(x), where r(x) = 0 or deg r(x) < deg d(x). It follows that r(x) ∈ P. Indeed,

r(x) = f1(x) − a(x)d(x)
     = f1(x) − a(x)(∑_{i=1}^n pi(x)fi(x))
     = (1 − a(x)p1(x))f1(x) − ∑_{i=2}^n a(x)pi(x)fi(x) ∈ P.

But d(x) is a nonzero element of P of smallest degree. Hence r(x) = 0, which implies d(x) | f1(x). Similarly, we have d(x) | fi(x) for i = 2, . . . , n. It follows that d(x) | g(x). Thus there exists σ(x) ∈ F[x] such that

g(x) = σ(x)d(x) = ∑_{i=1}^n σ(x)pi(x)fi(x).

Letting qi(x) = σ(x)pi(x), for i = 1, . . . , n, finishes the proof.
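For two polynomials, the coefficients in such a Bézout identity can be produced by the extended Euclidean algorithm; a sketch using SymPy's `gcdex` (the inputs are arbitrary examples):

```python
from sympy import symbols, gcdex, expand

x = symbols('x')
f = x**3 - 1
g = x**2 - 1

# gcdex returns (s, t, h) with s*f + t*g = h, where h is the monic gcd(f, g)
s, t, h = gcdex(f, g, x)
print(h)                          # x - 1
assert expand(s*f + t*g - h) == 0
```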

Definition 3.1.10. Two polynomials f(x) and g(x) in F[x] are said to be relatively prime if gcd(f(x), g(x)) = 1.

Corollary 3.1.11. Two polynomials f(x) and g(x) in F[x] are relatively prime if and only if there exist p(x), q(x) ∈ F[x] such that p(x)f(x) + q(x)g(x) = 1.


Proof. If gcd(f(x), g(x)) = 1, the implication follows from Proposition 3.1.9. Conversely, suppose there exist p(x), q(x) ∈ F[x] such that p(x)f(x) + q(x)g(x) = 1. Let d(x) = gcd(f(x), g(x)). Then d(x) | f(x) and d(x) | g(x). It follows easily that d(x) | p(x)f(x) + q(x)g(x). Hence d(x) | 1, which implies d(x) = 1.

Definition 3.1.12. Let p(x), q(x), r(x) ∈ F[x] with r(x) ≠ 0. We say that p(x) is congruent to q(x) modulo r(x) if r(x) | (p(x) − q(x)), denoted by

p(x) ≡ q(x) mod r(x).

Theorem 3.1.13 (Chinese Remainder Theorem). Let σ1(x), . . . , σn(x) be polynomials in F[x] such that σi(x) and σj(x) are relatively prime for i ≠ j. Then given any polynomials r1(x), . . . , rn(x) ∈ F[x], there exists a polynomial p(x) ∈ F[x] such that

p(x) ≡ ri(x) mod σi(x) for i = 1, . . . , n.

Proof. For j = 1, . . . , n, let

ϕj(x) = ∏_{i≠j} σi(x).

Then σj(x) and ϕj(x) are relatively prime for each j. By Proposition 3.1.9, for each i there exist pi(x), qi(x) ∈ F[x] such that 1 = pi(x)ϕi(x) + qi(x)σi(x). Let

p(x) = r1(x)p1(x)ϕ1(x) + · · · + rn(x)pn(x)ϕn(x).

Note that for each i,

ri(x)pi(x)ϕi(x) = ri(x) − ri(x)qi(x)σi(x) ≡ ri(x) mod σi(x)

and

rj(x)pj(x)ϕj(x) ≡ 0 mod σi(x) for j ≠ i,

since σi(x) | ϕj(x) for j ≠ i. Hence p(x) ≡ ri(x) mod σi(x) for i = 1, . . . , n.
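The proof is constructive, and the construction can be followed line by line in SymPy; a sketch (the moduli and target remainders are arbitrary pairwise relatively prime examples):

```python
from sympy import symbols, gcdex, prod, rem, expand

x = symbols('x')
sigma = [x - 1, x + 1, x**2 + 1]   # pairwise relatively prime moduli
r = [2, x, x + 3]                  # target remainders

p = 0
for i, s in enumerate(sigma):
    phi = prod(sigma[j] for j in range(len(sigma)) if j != i)
    # Bezout identity: p_i*phi + q_i*s = 1, so p_i*phi ≡ 1 (mod s)
    p_i, q_i, h = gcdex(phi, s, x)
    assert h == 1
    p += r[i] * p_i * phi

# verify p ≡ r_i (mod sigma_i) for each i
for s, ri in zip(sigma, r):
    assert expand(rem(p - ri, s, x)) == 0
```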

Definition 3.1.14. We say that f(x) ∈ F[x] is irreducible over F if f is not constant, and whenever f(x) = g(x)h(x) with g(x), h(x) ∈ F[x], then g(x) or h(x) is constant.


Note that the notion of irreducibility depends on the field F. For example, f(x) = x² + 1 is irreducible over R, but not irreducible over C because

x² + 1 = (x + i)(x − i) over C.

Also, note that a linear polynomial ax + b, where a ≠ 0, is always irreducible.
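This dependence on the base field is visible in a computer algebra system, which factors over a domain you specify; a small SymPy sketch:

```python
from sympy import symbols, factor

x = symbols('x')
p = x**2 + 1

print(factor(p))                  # x**2 + 1 (no rational factorization; it is
                                  # also irreducible over R, having no real root)
print(factor(p, gaussian=True))   # (x - I)*(x + I), i.e. factored over Q(i)
```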

Lemma 3.1.15. Let f(x), g(x) and h(x) be polynomials in F[x] where f(x) is irreducible. If f(x) | g(x)h(x), then either f(x) | g(x) or f(x) | h(x).

Proof. Assume that f(x) is irreducible, f(x) | g(x)h(x), but f(x) ∤ g(x). We will show that gcd(f(x), g(x)) = 1. Let d(x) = gcd(f(x), g(x)). Since d(x) | f(x), we can write f(x) = d(x)k(x) for some k(x) ∈ F[x]. By irreducibility of f(x), d(x) or k(x) is a constant. If k(x) = k is a constant, then d(x) = k⁻¹f(x), which implies that f(x) | g(x), a contradiction. Hence d(x) is a constant, i.e., d(x) = 1. By Proposition 3.1.9, we have 1 = p(x)f(x) + q(x)g(x) for some p(x), q(x) ∈ F[x]. Thus h(x) = p(x)f(x)h(x) + q(x)g(x)h(x). Since f(x) divides g(x)h(x), it divides both terms on the right-hand side of this equation, and hence f(x) | h(x).

Theorem 3.1.16 (Unique factorization of polynomials). Every nonconstant polynomial in F[x] can be written as a product of irreducible polynomials, and the factorization is unique up to associates; namely, if f(x) ∈ F[x] and

f(x) = g1(x) . . . gn(x) = h1(x) . . . hm(x),

then n = m and we can renumber the indices so that gi(x) = αi hi(x) for some αi ∈ F, for i = 1, . . . , n.

Proof. We will prove the theorem by induction on deg f(x). Obviously, any polynomial of degree 1 is irreducible. Let n > 1 be an integer and assume that any nonconstant polynomial of degree less than n can be written as a product of irreducible polynomials. Let f(x) ∈ F[x] with deg f(x) = n. If f(x) is irreducible, we are done. Otherwise, we can write f(x) = g(x)h(x) for some g(x), h(x) ∈ F[x], where deg g(x) < n and deg h(x) < n. By the induction hypothesis, both g(x) and h(x) can be written as products of irreducible polynomials, and hence so can f(x).

Next, we will prove uniqueness of the factorization, again by induction on deg f(x). This is clear in case deg f(x) = 1. Let n > 1 and assume that a factorization of any polynomial of degree less than n is unique up to associates. Let f(x) ∈ F[x] with deg f(x) = n. Suppose that

f(x) = g1(x) . . . gn(x) = h1(x) . . . hm(x),

where the gi(x) and hj(x) are all irreducible. Hence g1(x) | h1(x) . . . hm(x). It follows easily, by a generalization of Lemma 3.1.15, that g1(x) | hi(x) for some i = 1, . . . , m. By renumbering the irreducible factors in the second factorization if necessary, we may assume that i = 1. Since g1(x) and h1(x) are irreducible, g1(x) = α1h1(x) for some α1 ∈ F. Thus

α1g2(x) . . . gn(x) = h2(x) . . . hm(x).

Note that the polynomial above has degree less than n. Hence, by the induction hypothesis, m = n and for each j = 2, . . . , n, gj(x) = αjhj(x) for some αj ∈ F. This finishes the induction and the proof of the theorem.

Remark. Theorem 3.1.16 says that F[x] is a Unique Factorization Domain (UFD) whenever F is a field.

Definition 3.1.17. A nonconstant polynomial p(x) ∈ F[x] is said to split over F if p(x) can be written as a product of linear factors in F[x].

Definition 3.1.18. A field F is said to be algebraically closed if every nonconstant polynomial over F has a root in F.

Examples. C is algebraically closed by the Fundamental Theorem of Algebra, but Q and R are not algebraically closed.

Proposition 3.1.19. The following statements on a field F are equivalent:

(i) F is algebraically closed;

(ii) every nonconstant polynomial p(x) ∈ F [x] splits over F ;

(iii) every irreducible polynomial in F [x] has degree one.


Proof. (i) ⇒ (ii). Let p(x) ∈ F[x] − F and n = deg p(x). We will prove by induction on n. If n = 1, then we are done. Assume that n > 1 and that every nonconstant polynomial of degree n − 1 in F[x] splits over F. Since F is algebraically closed, p(x) has a root α ∈ F. By Corollary 3.1.4, p(x) = (x − α)q(x) for some q(x) ∈ F[x]. Then deg q(x) = n − 1, and hence q(x) splits over F by the induction hypothesis. Thus p(x) also splits over F.

(ii) ⇒ (iii). Let q(x) be an irreducible polynomial in F[x]. Then deg q(x) ≥ 1 and hence q(x) splits over F by the assumption. If deg q(x) > 1, then any linear factor of q(x) is a nonconstant proper factor, contradicting irreducibility of q(x). Thus deg q(x) = 1.

(iii) ⇒ (i). Let f(x) be a nonconstant polynomial over F. By Theorem 3.1.16, f(x) can be written as a product of irreducible polynomials, which by assumption are all linear. Hence there exists α ∈ F such that (x − α) | f(x), i.e., α is a root of f(x), by Corollary 3.1.4. This shows that F is algebraically closed.


3.2 Diagonalization

Throughout this chapter, V will be a finite-dimensional vector space over a fieldF and T : V → V a linear operator on V .

Definition 3.2.1. A linear operator T : V → V is said to be diagonalizable if there exists an ordered basis B for V such that [T]B is a diagonal matrix.

An n × n matrix A is said to be diagonalizable if there is an invertible matrix P such that P⁻¹AP is a diagonal matrix, i.e., A is similar to a diagonal matrix.

Lemma 3.2.2. Let V be a finite-dimensional vector space with dimV = n. Let B be an ordered basis for V and T : V → V a linear operator on V. If A is an n × n matrix similar to [T]B, then there is an ordered basis C for V such that [T]C = A.

Proof. Let B = {v1, . . . , vn} and let A = [aij] be an n × n matrix similar to [T]B. Then there is an invertible matrix P such that A = P⁻¹[T]BP. Since L(V) ∼= Mn(F), there is a linear map U : V → V such that [U]B = P. Then U is a linear isomorphism because P is invertible. Hence

[U⁻¹TU]B = [U⁻¹]B[T]B[U]B = P⁻¹[T]BP = A.

It follows that, for j = 1, . . . , n,

U⁻¹TU(vj) = ∑_{i=1}^n aij vi, and hence TU(vj) = ∑_{i=1}^n aij U(vi). (3.1)

Let C = {U(v1), . . . , U(vn)}. Since B is an ordered basis for V and U is a linear isomorphism, we see that C is a basis for V and [T]C = A by (3.1).

Proposition 3.2.3. Let T : V → V be a linear operator on V and B an ordered basis for V. Then T is diagonalizable if and only if [T]B is diagonalizable.

Proof. Assume that T is diagonalizable. Then there is an ordered basis C such that [T]C is a diagonal matrix. Thus there is an invertible matrix P such that P⁻¹[T]BP = [T]C. This shows that [T]B is diagonalizable.


Conversely, assume that [T]B is diagonalizable. Then [T]B is similar to a diagonal matrix D. By Lemma 3.2.2, there is an ordered basis C for V such that [T]C = D. Hence T is diagonalizable.

Corollary 3.2.4. Let A ∈ Mn(F) and let LA : Fⁿ → Fⁿ be the linear map defined by LA(x) = Ax for any x ∈ Fⁿ, considered as a column matrix. Then A is diagonalizable if and only if LA is diagonalizable.

Proof. Exercise.

Proposition 3.2.5. Let V be a vector space over a field F with dimV = n and T : V → V a linear operator. Then T is diagonalizable if and only if there is a basis B = {v1, . . . , vn} for V and scalars λ1, . . . , λn ∈ F, not necessarily distinct, such that

Tvj = λjvj for j = 1, . . . , n.

Proof. Assume there is a basis B = {v1, . . . , vn} for V and scalars λ1, . . . , λn in F, not necessarily distinct, such that

Tvj = λjvj for j = 1, . . . , n.

By the definition of matrix representation, we have

[T]B = [ λ1  0  . . .  0  ]
       [ 0   λ2 . . .  0  ]
       [ ⋮   ⋮   ⋱    ⋮  ]
       [ 0   0  . . .  λn ] .   (3.2)

Conversely, assume that T is diagonalizable. Then there is an ordered basis B = {v1, . . . , vn} such that [T]B is a diagonal matrix of the form (3.2). Again, by the definition of matrix representation, we have Tvj = λjvj for j = 1, . . . , n.

Corollary 3.2.6. Let A ∈ Mn(F). Then A is diagonalizable if and only if there is a basis B = {v1, . . . , vn} for Fⁿ and scalars λ1, . . . , λn ∈ F, not necessarily distinct, such that

Avj = λjvj for j = 1, . . . , n.

Proof. This follows immediately from Proposition 3.2.5 and Corollary 3.2.4.


Definition 3.2.7. Let T : V → V be a linear operator on V. A scalar λ ∈ F is called an eigenvalue of T if there is a non-zero v ∈ V such that T(v) = λv. A non-zero vector v such that T(v) = λv is called an eigenvector corresponding to the eigenvalue λ.

For each λ ∈ F, define

Vλ = {v ∈ V | T(v) = λv} = ker(T − λIV).

Then Vλ is a subspace of V. If λ is not an eigenvalue of T, then Vλ = {0}. If λ is an eigenvalue of T, we call Vλ the eigenspace corresponding to the eigenvalue λ. Any non-zero vector in Vλ is an eigenvector corresponding to λ.

Similarly, we define an eigenvalue, an eigenvector and an eigenspace of a matrix in an analogous way.

Definition 3.2.8. Let A be an n × n matrix with entries in a field F. A scalar λ ∈ F is called an eigenvalue of A if there is a non-zero v ∈ Fⁿ such that Av = λv. A non-zero vector v such that Av = λv is called an eigenvector corresponding to the eigenvalue λ. For each λ ∈ F, define

Vλ = {v ∈ Fⁿ | Av = λv}.

If λ is not an eigenvalue of A, then Vλ = {0}. If λ is an eigenvalue of A, Vλ is called the eigenspace corresponding to the eigenvalue λ. Any non-zero vector in Vλ is an eigenvector corresponding to λ.

In fact, an eigenvalue (eigenvector, eigenspace) for a matrix A is an eigenvalue (eigenvector, eigenspace) for the linear operator LA : Fⁿ → Fⁿ, x ↦ Ax. Hence any result about eigenvalues and eigenvectors of a linear operator transfers to the analogous result for a matrix as well.

Using the language of eigenvalues and eigenvectors, we can rephrase Proposition 3.2.5 as follows:

Corollary 3.2.9. A linear operator T is diagonalizable if and only if there is a basis for V consisting of eigenvectors of T.

From this Corollary, to verify whether a linear operator is diagonalizable, we will find its eigenvectors and see whether they form a basis for the vector space.


Proposition 3.2.10. Let T : V → V be a linear operator. If v1, . . . , vk are eigenvectors of T corresponding to distinct eigenvalues, then {v1, . . . , vk} is linearly independent.

Proof. We will proceed by induction on k. If k = 1, then the result follows immediately because a non-zero vector forms a linearly independent set. Assume the statement holds for k − 1 eigenvectors. Let v1, . . . , vk be eigenvectors of T corresponding to distinct eigenvalues λ1, . . . , λk, respectively. Let α1, . . . , αk ∈ F be such that

α1v1 + α2v2 + · · · + αkvk = 0. (3.3)

Applying T to both sides of (3.3), we have

α1λ1v1 + α2λ2v2 + · · · + αkλkvk = 0. (3.4)

Multiplying equation (3.3) by λk, we also have

α1λkv1 + α2λkv2 + · · · + αkλkvk = 0. (3.5)

Subtracting (3.5) from (3.4), we obtain

α1(λ1 − λk)v1 + α2(λ2 − λk)v2 + · · · + αk−1(λk−1 − λk)vk−1 = 0.

By the induction hypothesis, αi(λi − λk) = 0 for i = 1, . . . , k − 1. Hence αi = 0 for i = 1, . . . , k − 1 because the λi are all distinct. Substituting αi = 0 for i = 1, . . . , k − 1 into (3.3), it follows that αk = 0. Thus {v1, . . . , vk} is linearly independent.

Corollary 3.2.11. Let V be a finite-dimensional vector space with dimV = n and T : V → V a linear operator. Then T has at most n distinct eigenvalues. Furthermore, if T has n distinct eigenvalues, then T is diagonalizable.

Proof. Let λ1, . . . , λk be the distinct eigenvalues of T with corresponding eigenvectors v1, . . . , vk, respectively. By Proposition 3.2.10, {v1, . . . , vk} is linearly independent. Since dimV = n, it follows that k ≤ n. If k = n, then {v1, . . . , vn} is a basis for V consisting of eigenvectors of T. Hence T is diagonalizable by Corollary 3.2.9.


Proposition 3.2.12. Let T : V → V be a linear operator with distinct eigenvalues λ1, . . . , λk. Let W = Vλ1 + · · · + Vλk, where each Vλi is the eigenspace corresponding to λi. Then W = Vλ1 ⊕ · · · ⊕ Vλk. In other words, the sum of eigenspaces is indeed a direct sum.

Proof. Let v1 ∈ Vλ1, . . . , vk ∈ Vλk be such that v1 + · · · + vk = 0. Suppose vi ≠ 0 for some i. By renumbering if necessary, assume that vi ≠ 0 for 1 ≤ i ≤ j and vi = 0 for i = j + 1, . . . , k. Then v1 + · · · + vj = 0. This shows that {v1, . . . , vj} is linearly dependent. But this contradicts Proposition 3.2.10. Hence vi = 0 for i = 1, . . . , k.

Theorem 3.2.13. Let T : V → V be a linear operator with distinct eigenvalues λ1, . . . , λk. Then the following are equivalent:

(i) T is diagonalizable;

(ii) V = Vλ1 ⊕ · · · ⊕ Vλk ;

(iii) dimV = dimVλ1 + · · ·+ dimVλk .

Proof. Let W = Vλ1 + · · · + Vλk. By Proposition 3.2.12, W = Vλ1 ⊕ · · · ⊕ Vλk.

(i) ⇒ (ii). Assume that T is diagonalizable. Let B be a basis for V consisting of eigenvectors of T. For i = 1, . . . , k, let Bi = B ∩ Vλi. Then B = B1 ∪ · · · ∪ Bk and Bi ∩ Bj = ∅ for any i ≠ j. Note that each Bi is a linearly independent subset of Vλi. Hence

dimV = |B| = ∑_{i=1}^k |Bi| ≤ ∑_{i=1}^k dimVλi = dimW.

This implies V = W = Vλ1 ⊕ · · · ⊕ Vλk.

(ii) ⇒ (iii). This follows from Corollary 1.6.22.

(iii) ⇒ (i). Suppose dimV = ∑_{i=1}^k dimVλi. For i = 1, . . . , k, choose a basis Bi for each Vλi. Then Bi ∩ Bj = ∅ for any i ≠ j. Let B = B1 ∪ · · · ∪ Bk. By Proposition 1.6.21, B is a basis for W and hence is linearly independent in V. It follows that

dimV = ∑_{i=1}^k dimVλi = ∑_{i=1}^k |Bi| = |B|.

Thus B is a basis for V consisting of eigenvectors of T. This shows that T is diagonalizable.


The next proposition gives a method for computing an eigenvalue of a linear map by solving a certain polynomial equation called the characteristic equation.

Proposition 3.2.14. Let T : V → V be a linear operator and λ ∈ F. Then λ is an eigenvalue of T if and only if det(T − λIV) = 0.

Proof. For any λ ∈ F, we have the following equivalent statements:

λ is an eigenvalue of T ⇔ ∃v ∈ V − {0}, T(v) = λv
⇔ ∃v ∈ V − {0}, (T − λIV)(v) = 0
⇔ T − λIV is not one-to-one
⇔ T − λIV is not invertible
⇔ det(T − λIV) = 0.

Notice that we use the assumption that V is finite-dimensional in the fourth equivalence.

Corollary 3.2.15. Let A ∈ Mn(F) and λ ∈ F. Then λ is an eigenvalue of A if and only if det(A − λIn) = 0.

Proposition 3.2.16. Let T : V → V be a linear operator and λ ∈ F. If B is an ordered basis for V, then

det(T − λIV) = det([T]B − λIn).

Hence λ is an eigenvalue of T if and only if λ is an eigenvalue of [T ]B.

Proof. The first statement follows from the fact that

det(T − λIV ) = det([T − λIV ]B) = det([T ]B − λIn).

The second statement immediately follows.

Definition 3.2.17. Let A ∈ Mn(F). We define the characteristic polynomial of A to be

χA(x) = det(xIn − A).


Similarly, if T : V → V is a linear operator on V, we define the characteristic polynomial of T to be

χT(x) = det(xIn − [T]B),

where B is an ordered basis for V.

Notice that χT (and χA) is a monic polynomial of degree n = dimV. Moreover, Proposition 3.2.14 shows that the eigenvalues of T (or A) are exactly the roots of its characteristic polynomial.

Remark. Note that the matrix xIn − A is in Mn(F[x]), with each entry of xIn − A being a polynomial in F[x]. In this case, F[x] is a ring but not a field. We can extend the definition of the determinant of a matrix over a field to that of a matrix over a ring. However, we cannot define the characteristic polynomial of a linear operator T to be det(xIV − T), because xIV − T is not a linear operator on the vector space V. We define its characteristic polynomial using its matrix representation instead.

Example. Define T : R² → R² by

T(x, y) = (x + 4y, 3x + 2y).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find a basis for R² consisting of eigenvectors of T.

Solution. Let B = {(1, 0), (0, 1)} be the standard ordered basis for R². Let

A = [T]B = [ 1  4 ]
           [ 3  2 ].

Then T can be viewed as T(v) = Av = LA(v) for any v ∈ R², written as a 2 × 1 column matrix. Hence

χT(x) = χA(x) = det [ x − 1   −4   ]
                    [ −3    x − 2 ]
              = (x − 1)(x − 2) − 12
              = (x − 5)(x + 2).


Thus the eigenvalues of T are −2 and 5. Since T has 2 distinct eigenvalues, it is diagonalizable. To find a basis consisting of eigenvectors of T, we will find the eigenspaces corresponding to −2 and 5, respectively.

λ = −2: Let v = (x, y) ∈ ker(T + 2I). Then (A + 2I)v = 0, i.e.,

[ 3  4 ] [ x ]   [ 0 ]
[ 3  4 ] [ y ] = [ 0 ].

Thus 3x + 4y = 0. Hence the eigenspace corresponding to λ = −2 is 〈(4, −3)〉.

λ = 5: Let v = (x, y) ∈ ker(T − 5I). Then (A − 5I)v = 0, i.e.,

[ −4  4 ] [ x ]   [ 0 ]
[ 3  −3 ] [ y ] = [ 0 ].

Thus x − y = 0. Hence the eigenspace corresponding to λ = 5 is 〈(1, 1)〉.

Let C = {(4, −3), (1, 1)}. Then C is a linearly independent set with 2 elements, and hence it is a basis for R² consisting of eigenvectors of T. In fact, to obtain a basis for R² consisting of eigenvectors of T, we simply choose one vector from each eigenspace.
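As a sanity check, a short SymPy sketch reproducing this computation:

```python
from sympy import Matrix, symbols

A = Matrix([[1, 4], [3, 2]])
x = symbols('x')

print(A.charpoly(x).as_expr())   # x**2 - 3*x - 10, i.e. (x - 5)(x + 2)
P, D = A.diagonalize()           # columns of P are eigenvectors of A
assert P.inv() * A * P == D      # D is diagonal with the eigenvalues
```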

Example. Define T : R² → R² by

T(x, y) = (x + y, y).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find a basis for R² consisting of eigenvectors of T.

Solution. Let

A = [T]B = [ 1  1 ]
           [ 0  1 ],

where B is the standard ordered basis for R². Hence

χT(x) = χA(x) = det [ x − 1   −1   ]
                    [ 0     x − 1 ]
              = (x − 1)².

Hence the only eigenvalue of T is 1, with multiplicity 2. Now we find the eigenspace corresponding to 1. Let v = (x, y) ∈ ker(T − I). Then (A − I)v = 0, i.e.,

[ 0  1 ] [ x ]   [ 0 ]
[ 0  0 ] [ y ] = [ 0 ].


Thus y = 0. Hence the eigenspace corresponding to λ = 1 is 〈(1, 0)〉. We see that there is no basis for R² consisting of eigenvectors of T. Thus T is not diagonalizable.
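A computer algebra system reaches the same verdict; a quick check with SymPy:

```python
from sympy import Matrix

A = Matrix([[1, 1], [0, 1]])
print(A.is_diagonalizable())   # False: only one independent eigenvector
print(A.eigenvects())          # [(1, 2, [Matrix([[1], [0]])])]
```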

Example. Define T : R² → R² by

T(x, y) = (−y, x).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find a basis for R² consisting of eigenvectors of T.

Solution. Let

A = [T]B = [ 0  −1 ]
           [ 1   0 ],

where B is the standard ordered basis for R². Hence

χT(x) = χA(x) = det [ x   1 ]
                    [ −1  x ]
              = x² + 1.

Since x² + 1 has no root in R, the operator T has no eigenvalues and hence is not diagonalizable. Note that if T is regarded as a linear map on C², then T has two eigenvalues, i and −i, and hence it is diagonalizable over C.

Remark. A linear map on a complex vector space always has an eigenvalue, by the Fundamental Theorem of Algebra.

If A is an n × n diagonalizable matrix, then there is an invertible matrix P such that P⁻¹AP = D is a diagonal matrix. Assume that D = diag(λ1, . . . , λn). Then

AP = PD.

If we write P = [p1 . . . pn], where pj is the j-th column of P, then

AP = [Ap1 . . . Apn] and PD = [λ1p1 . . . λnpn].

Hence

[Ap1 . . . Apn] = [λ1p1 . . . λnpn].

It follows that Apj = λjpj for j = 1, . . . , n. Thus each pj is an eigenvector of A corresponding to the eigenvalue λj. Hence the j-th column of P is an eigenvector of A corresponding to the j-th diagonal entry of D.


Example. Given the following 3 × 3 matrix A:

A = [ 2   0  −2 ]
    [ 0   1   0 ]
    [ −2  0   5 ],

determine whether A is diagonalizable. If it is, find an invertible matrix P such that P⁻¹AP is a diagonal matrix.

Solution.

χA(x) = det [ x − 2   0     2   ]
            [ 0     x − 1   0   ]
            [ 2       0   x − 5 ]
      = (x − 1)²(x − 6).

Hence the eigenvalues of A are 1, 1, 6. By a routine calculation, we see that the eigenspace corresponding to λ = 1 is 〈(0, 1, 0), (2, 0, 1)〉 and the eigenspace corresponding to λ = 6 is 〈(1, 0, −2)〉. Hence we have a basis

{(0, 1, 0), (2, 0, 1), (1, 0, −2)}

for R³ consisting of eigenvectors of A. This shows that A is diagonalizable. If we let

P = [ 0  2   1 ]        D = [ 1  0  0 ]
    [ 1  0   0 ]  and       [ 0  1  0 ]
    [ 0  1  −2 ],           [ 0  0  6 ],

then we have P⁻¹AP = D.
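A numerical verification of P⁻¹AP = D, sketched with NumPy:

```python
import numpy as np

A = np.array([[2, 0, -2], [0, 1, 0], [-2, 0, 5]], dtype=float)
P = np.array([[0, 2, 1], [1, 0, 0], [0, 1, -2]], dtype=float)

# conjugating A by the eigenvector matrix P yields the diagonal D
D = np.linalg.inv(P) @ A @ P
assert np.allclose(D, np.diag([1.0, 1.0, 6.0]))
```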

The following proposition about the determinant of a block matrix will be useful.

Proposition 3.2.18. Suppose A is an (m + n) × (m + n) matrix which can be written in the block form

A = [ B  C ]
    [ O  D ],

where B ∈ Mm×m(F), C ∈ Mm×n(F), D ∈ Mn×n(F), and O is the zero matrix of size n × m. Then

detA = detB · detD.


Proof. We outline the calculations and leave the details to the reader. Note that

[ B  C ]   [ Im  C ] [ B  O  ]
[ O  D ] = [ O   D ] [ O  In ],

where the zero matrices are of suitable sizes. It is easy to verify that

det [ Im  C ]           det [ B  O  ]
    [ O   D ] = detD and    [ O  In ] = detB.

The desired result now follows.
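A quick numerical illustration of the block formula (a sketch; the blocks are arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2
B = rng.standard_normal((m, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((n, n))

# assemble the block upper triangular matrix [[B, C], [O, D]]
A = np.block([[B, C], [np.zeros((n, m)), D]])
assert np.isclose(np.linalg.det(A), np.linalg.det(B) * np.linalg.det(D))
```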

Definition 3.2.19. Let T : V → V be a linear operator. Assume that

χT(x) = (x − λ1)^{n1} . . . (x − λk)^{nk},

where λ1, . . . , λk are the distinct eigenvalues of T. We call ni the algebraic multiplicity of λi and dimVλi the geometric multiplicity of λi.

In other words, the algebraic multiplicity of λi is the number of repeated factors x − λi in the characteristic polynomial, and its geometric multiplicity is the maximal number of linearly independent eigenvectors corresponding to λi.

Proposition 3.2.20. Let T : V → V be a linear operator. Assume that the characteristic polynomial χT(x) splits over F with distinct roots λ1, . . . , λk. For each eigenvalue λi of T, its geometric multiplicity is no greater than its algebraic multiplicity. The two multiplicities are equal for all eigenvalues if and only if T is diagonalizable.

Proof. Let λ be an eigenvalue of T with geometric multiplicity d. Then Vλ contains d linearly independent eigenvectors, say v1, . . . , vd. Extend this set to a basis B = {v1, . . . , vn} for V. Since

T(vi) = λvi for i = 1, . . . , d,

the matrix representation [T]B has the block form

[T]B = [ λId  B ]
       [ O    C ],


where B is a d × (n − d) matrix, C is an (n − d) × (n − d) matrix, and O is the zero matrix of size (n − d) × d. By Proposition 3.2.18,

det(xIn − [T]B) = det((x − λ)Id) · det(xIn−d − C) = (x − λ)^d g(x),

where g(x) = det(xIn−d − C) is a polynomial in F[x]. Hence (x − λ)^d | χT(x). This shows that d is at most the algebraic multiplicity of λ.

Now let n = dimV, and for i = 1, . . . , k let ni be the algebraic multiplicity and di the geometric multiplicity of λi.

Suppose T is diagonalizable. Let B be a basis for V consisting of eigenvectors of T. For i = 1, . . . , k, let Bi = B ∩ Vλi, the set of vectors in B that are eigenvectors of T corresponding to λi, and let mi = |Bi|. Then

mi ≤ di ≤ ni for i = 1, . . . , k.

Hence

n = ∑_{i=1}^k mi ≤ ∑_{i=1}^k di ≤ ∑_{i=1}^k ni = n.

This implies that mi = di = ni for i = 1, . . . , k. Conversely, if di = ni for i = 1, . . . , k, then

dimV = n = ∑_{i=1}^k ni = ∑_{i=1}^k di = ∑_{i=1}^k dimVλi.

Hence T is diagonalizable by Theorem 3.2.13.


Exercises

In these exercises, let V be a finite-dimensional vector space over a field F and T : V → V a linear operator.

3.2.1. For each of the following 3 × 3 matrices A, determine whether A is diagonalizable. If it is, find an invertible matrix P such that P⁻¹AP is a diagonal matrix.

(a) [ 3  1  −1 ]
    [ 2  2  −1 ]
    [ 2  2   0 ]

(b) [ 5   −6  −6 ]
    [ −1   4   2 ]
    [ 3   −6  −4 ]

3.2.2. Prove the following statements:

(i) 0 is an eigenvalue of T if and only if T is non-invertible;

(ii) If T is invertible and λ is an eigenvalue of T, then λ⁻¹ is an eigenvalue of T⁻¹.

3.2.3. Let S and T be linear operators on V. Show that ST and TS have the same set of eigenvalues.
Hint: Treat separately the cases where 0 is or is not an eigenvalue.

3.2.4. If A and B are similar square matrices, show that χA = χB. Hence similar matrices have the same set of eigenvalues.

3.2.5. If T² has an eigenvalue λ², for some λ ∈ F, show that λ or −λ is an eigenvalue of T.
Remark: Try to use the definition and not the characteristic equation.

3.2.6. Prove that if λ is an eigenvalue of T and p(x) ∈ F[x], then p(λ) is an eigenvalue of p(T).

3.2.7. Let λ ∈ F and suppose there is a non-zero v ∈ V such that T(v) = λv. Prove that there is a non-zero linear functional f ∈ V∗ such that T^t(f) = λf. In other words, if λ is an eigenvalue of T, then it is an eigenvalue of T^t.


3.3 Minimal Polynomial

Definition 3.3.1. Let T : V → V be a linear operator and p(x) ∈ F[x]. If p(x) = a0 + a1x + · · · + an x^n, define

p(T) = a0I + a1T + · · · + an T^n.

Then p(T) is a linear operator on V. Also, if p(x), q(x) ∈ F[x] and k ∈ F,

(p + q)(T) = p(T) + q(T),
(kp)(T) = k p(T),
(pq)(T) = p(T) q(T).

In other words, the map p(x) ↦ p(T) is an algebra homomorphism from the polynomial algebra F[x] into the algebra of linear operators L(V). Note that any two polynomials in T commute:

p(T)q(T) = (pq)(T) = (qp)(T) = q(T)p(T).

Similarly, if A is an n × n matrix over F and p(x) ∈ F[x] is as above, define

p(A) = a0In + a1A + · · · + an A^n.

Then p(A) is an n × n matrix over F, and the map p(x) ↦ p(A) is an algebra homomorphism from the polynomial algebra F[x] into the algebra of n × n matrices Mn(F).

Lemma 3.3.2. Let T : V → V be a linear operator. Then there is a non-zero polynomial p(x) ∈ F[x] such that p(T) = 0.

Proof. Let n = dimV. Consider the set of n² + 1 elements I, T, T², . . . , T^{n²} in L(V). Since dimL(V) = n², this set is linearly dependent. Hence there exist scalars a0, a1, . . . , a_{n²}, not all zero, such that

a0I + a1T + · · · + a_{n²}T^{n²} = 0.

Now, let p(x) = a0 + a1x + · · · + a_{n²}x^{n²}. Then p(x) is a non-zero polynomial in F[x] and p(T) = 0.


Theorem 3.3.3. Let T : V → V be a linear operator. Then there is a unique monic polynomial of smallest degree mT(x) ∈ F[x] such that mT(T) = 0. Moreover, if f(x) ∈ F[x] is such that f(T) = 0, then mT(x) divides f(x).

Similarly, for any matrix A ∈ Mn(F), there is a unique monic polynomial of smallest degree mA(x) ∈ F[x] such that mA(A) = 0. Moreover, if f(x) ∈ F[x] is such that f(A) = 0, then mA(x) divides f(x).

Proof. We will prove only the first part of the theorem. By Lemma 3.3.2, there is a non-zero polynomial p(x) such that p(T) = 0. By the Well-Ordering Principle, let m(x) be a polynomial over F of smallest degree such that m(T) = 0. By dividing all the coefficients by the leading coefficient, we may choose m(x) to be monic. Now let f(x) ∈ F[x] be a polynomial such that f(T) = 0. By the Division Algorithm for polynomials (Theorem 3.1.3), there exist q(x), r(x) ∈ F[x] such that

f(x) = q(x)m(x) + r(x),

where deg r(x) < degm(x) or r(x) = 0. Hence

f(T) = q(T)m(T) + r(T).

Since f(T) = m(T) = 0, it follows that r(T) = 0. But m(x) is a polynomial of smallest degree annihilating T, and deg r(x) < degm(x) if r(x) ≠ 0. This shows that r(x) = 0 and that f(x) = q(x)m(x). Thus m(x) | f(x).

Now, let m(x) and m′(x) be monic polynomials of smallest degree such that m(T) = m′(T) = 0. By the argument above, m(x) | m′(x) and m′(x) | m(x). This implies that m′(x) = cm(x) for some c ∈ F. Since m(x) and m′(x) are monic, we see that c = 1 and hence m(x) = m′(x).

Definition 3.3.4. Let T : V → V be a linear operator. The unique monic polynomial mT(x) ∈ F[x] of smallest degree such that mT(T) = 0 is called the minimal polynomial of T.

If A ∈ Mn(F), then the unique monic polynomial mA(x) ∈ F[x] of smallest degree such that mA(A) = 0 is called the minimal polynomial of A.


Theorem 3.3.5. Let T : V → V be a linear operator. Then mT(λ) = 0 if and only if λ is an eigenvalue of T. In other words, χT(x) and mT(x) have the same set of roots, except possibly for multiplicities.

Proof. (⇒) Assume that mT(λ) = 0. By Corollary 3.1.4, mT(x) = (x − λ)q(x) for some q(x) ∈ F[x]. Since deg q(x) < degmT(x), we see that q(T) ≠ 0. Hence there is a nonzero v ∈ V such that q(T)(v) ≠ 0. Let w = q(T)(v). It follows that

(T − λI)(w) = (T − λI)q(T)(v) = mT(T)(v) = 0.

Thus λ is an eigenvalue of T with corresponding eigenvector w.

(⇐) Assume that λ is an eigenvalue of T. Then there is a nonzero v ∈ V such that T(v) = λv. By the Division Algorithm for polynomials (Theorem 3.1.3), there exist q(x), r(x) ∈ F[x] such that

mT(x) = q(x)(x − λ) + r(x),

where deg r(x) < deg(x − λ) or r(x) = 0, i.e., r(x) = r is a constant. Thus

0 = mT(T) = q(T)(T − λI) + rI.

Applying this equality to the eigenvector v, we obtain

0 = q(T)(T − λI)(v) + rv = rv.

Then r = 0. Hence mT(x) = q(x)(x − λ), which implies mT(λ) = 0.

Theorem 3.3.6 (Cayley-Hamilton). If A ∈ Mn(F), then χA(A) = 0.

Proof. Let A ∈ Mn(F). Write C = xIn − A. Then

χA(x) = det(xIn − A) = kn x^n + kn−1 x^{n−1} + · · · + k1 x + k0,

where kn, kn−1, . . . , k0 ∈ F. We will show that

χA(A) = kn A^n + kn−1 A^{n−1} + · · · + k1 A + k0 In = 0.

Recall that for any square matrix P, adjP = (Cof P)^t is a matrix satisfying P adjP = (detP)In. Thus adjC is an n × n matrix whose entries are polynomials in x of degree ≤ n − 1. Hence we can write adjC as

adjC = Mn−1 x^{n−1} + Mn−2 x^{n−2} + · · · + M1 x + M0,


where the Mi, i = 0, 1, . . . , n − 1, are n × n matrices with scalar entries. Thus

C adjC = (xIn − A)(Mn−1 x^{n−1} + Mn−2 x^{n−2} + · · · + M1 x + M0)
       = Mn−1 x^n + (Mn−2 − AMn−1) x^{n−1} + · · · + (M0 − AM1) x − AM0.

On the other hand,

(detC)In = χA(x)In = (kn In) x^n + (kn−1 In) x^{n−1} + · · · + (k1 In) x + k0 In.

By comparing the matrix coefficients, we see that

kn In = Mn−1
kn−1 In = Mn−2 − AMn−1
...
k1 In = M0 − AM1
k0 In = −AM0.

Multiply the first equation on the left by A^n, the second by A^{n−1}, and so on. We then have

kn A^n = A^n Mn−1
kn−1 A^{n−1} = A^{n−1} Mn−2 − A^n Mn−1
...
k1 A = AM0 − A² M1
k0 In = −AM0.

Adding up these equations, we obtain

kn A^n + kn−1 A^{n−1} + · · · + k1 A + k0 In = 0.

Hence χA(A) = 0.
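A concrete check of the theorem for a sample matrix, sketched in SymPy (the matrix is an arbitrary example):

```python
from sympy import Matrix, symbols, eye, zeros

x = symbols('x')
A = Matrix([[1, 4], [3, 2]])
coeffs = A.charpoly(x).all_coeffs()   # [1, -3, -10] for x**2 - 3*x - 10

# Horner evaluation of chi_A at the matrix A itself
result = zeros(2, 2)
for c in coeffs:
    result = result * A + c * eye(2)
assert result == zeros(2, 2)          # Cayley-Hamilton: chi_A(A) = 0
```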

Corollary 3.3.7. Let T : V → V be a linear operator. Then χT (T ) = 0.


Proof. Let B be an ordered basis for V and write A = [T]B. Note that

χT(x) = det(xIn − [T]B) = χA(x).

We leave it as an exercise (Exercise 3.3.5) to show that

[p(T)]B = p([T]B) for any p(x) ∈ F[x].

Hence [χT(T)]B = χT([T]B) = χA(A) = 0. This shows that χT(T) = 0.

Corollary 3.3.8. If T : V → V is a linear operator, then mT divides χT. Similarly, if A ∈ Mn(F), then mA divides χA.

Proof. This follows immediately from Theorem 3.3.3, Theorem 3.3.6 and Corollary 3.3.7.

Example. Let T : R² → R² be defined by

T(x, y) = (3x − 2y, 2x − y).

Find χT and mT.

Solution. Let B = {(1, 0), (0, 1)} be the standard basis for R². Let

A = [T]B = [ 3  −2 ]
           [ 2  −1 ].

Then

χT(x) = χA(x) = det [ x − 3    2   ]
                    [ −2     x + 1 ]
              = (x − 3)(x + 1) + 4 = (x − 1)².

Since mA divides χA and they have the same roots, we see that mA(x) = x − 1 or mA(x) = (x − 1)². If p(x) = x − 1, then p(A) = A − I ≠ 0. Hence mT(x) = mA(x) = (x − 1)².

Example. Let T : R³ → R³ be defined by

T(x, y, z) = (3x − 2y, −2x + 3y, 5z).

Find χT and mT.


Solution. Let B = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} be the standard basis for R³. Let

A = [T]B = [ 3  −2  0 ]
           [ −2  3  0 ]
           [ 0   0  5 ].

Then

χT(x) = χA(x) = det [ x − 3    2     0   ]
                    [ 2     x − 3    0   ]
                    [ 0       0    x − 5 ]
              = (x − 5)²(x − 1).

Thus mA(x) = (x − 5)(x − 1) or mA(x) = (x − 5)²(x − 1). But

(A − 5I)(A − I) = [ −2  −2  0 ] [ 2  −2  0 ]
                  [ −2  −2  0 ] [ −2  2  0 ]
                  [ 0    0  0 ] [ 0   0  4 ] = 0.

Hence mT(x) = mA(x) = (x − 1)(x − 5).
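The procedure used in these examples, listing the monic divisors of χA that share its roots and testing them in order of degree, is easy to mechanize. A sketch in SymPy, assuming χA splits over the field SymPy works in (the helper `minimal_poly_of` is ours, not a library function):

```python
from sympy import Matrix, symbols, roots, prod, zeros, eye, expand
from itertools import product as cartesian

x = symbols('x')

def minimal_poly_of(A):
    """Find m_A by testing monic divisors of chi_A containing every root."""
    n = A.shape[0]
    rts = roots(A.charpoly(x).as_expr(), x)        # {root: multiplicity}
    candidates = []
    for exps in cartesian(*[range(1, m + 1) for m in rts.values()]):
        candidates.append(prod((x - r)**e for r, e in zip(rts.keys(), exps)))
    for p in sorted(candidates, key=lambda q: q.as_poly(x).degree()):
        coeffs = p.as_poly(x).all_coeffs()
        M = zeros(n, n)
        for c in coeffs:                            # Horner: evaluate p at A
            M = M * A + c * eye(n)
        if M == zeros(n, n):                        # first annihilator found
            return expand(p)

A = Matrix([[3, -2, 0], [-2, 3, 0], [0, 0, 5]])
print(minimal_poly_of(A))    # x**2 - 6*x + 5, i.e. (x - 1)(x - 5)
```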

Theorem 3.3.9. Let T : V → V be a linear operator. Then T is diagonalizable if and only if mT(x) is a product of distinct linear factors over F.

Proof. (⇒) Assume that there is a basis B = {v1, . . . , vn} for V consisting of eigenvectors of T. For i = 1, . . . , n, let λi be the eigenvalue corresponding to vi, i.e., T(vi) = λivi. Let α1, . . . , αk be the distinct elements of the set {λ1, . . . , λn}. Let p(x) = (x − α1) . . . (x − αk). We will show that p(T) = 0. Since B is a basis for V, it suffices to show that p(T)(vi) = 0 for i = 1, . . . , n. Fix i ∈ {1, . . . , n} and assume that λi = αj. Then T(vi) = αjvi. Since

p(T) = (T − α1I) . . . (T − αjI) . . . (T − αkI)

and the factors commute, we can move T − αjI to the rightmost position. Then (T − αjI)(vi) = 0, which implies p(T)(vi) = 0. Hence p(T) = 0. It follows that mT(x) | p(x). But p(x) is a product of distinct linear factors, and hence so is mT(x).


(⇐) Suppose mT(x) = (x − λ1) . . . (x − λk), where λ1, . . . , λk are all distinct. Let Vλi = ker(T − λiI) be the eigenspace corresponding to λi for i = 1, . . . , k. First we show that V = Vλ1 + · · · + Vλk. For i = 1, . . . , k, let

σi(x) = x − λi and τi(x) = ∏_{j≠i} (x − λj).

Then σi(x)τi(x) = mT(x) for i = 1, . . . , k, and hence σi(T)τi(T) = mT(T) = 0. Note that τ1(x), . . . , τk(x) have no common factors in F[x], and thus

gcd(τ1(x), . . . , τk(x)) = 1.

By Proposition 3.1.9, there exist q1(x), . . . , qk(x) ∈ F[x] such that

q1(x)τ1(x) + · · · + qk(x)τk(x) = 1.

It follows that

q1(T)τ1(T) + · · · + qk(T)τk(T) = I.

Let v ∈ V and vi = qi(T)τi(T)(v) for i = 1, . . . , k. Then v = v1 + · · · + vk and

(T − λiI)(vi) = σi(T)qi(T)τi(T)(v) = qi(T)σi(T)τi(T)(v) = 0.

Thus vi ∈ Vλi for i = 1, . . . , k. Hence V = Vλ1 + · · · + Vλk. By Proposition 3.2.12 and Theorem 3.2.13, we conclude that V = Vλ1 ⊕ · · · ⊕ Vλk and that T is diagonalizable.

Definition 3.3.10. Let T : V → V be a linear operator. A subspace W of V is said to be invariant under T, or T-invariant, if T(W) ⊆ W.

Example. {0}, V, kerT and imT are all T-invariant. An eigenspace of T is also T-invariant.

Example. Let T : F[x] → F[x] be the differentiation operator f ↦ f′. Then each subspace Pn of F[x], consisting of the polynomials of degree ≤ n together with the zero polynomial, is T-invariant.

Remark. If T : V → V is a linear operator on V and W is a T-invariant subspace of V, then the restriction T|W of T to W is a linear operator on W. We will denote it by TW. Hence TW : W → W is a linear operator such that TW(x) = T(x) for any x ∈ W.


Let us discuss the relation between the matrix representation of a linear operator and that of its restriction to an invariant subspace. Let W be a T-invariant subspace of V. Let C = {v1, . . . , vk} be a basis for W and extend it to a basis B = {v1, . . . , vn} for V. Write

T(vj) = ∑_{i=1}^n αij vi, j = 1, . . . , n.

Since W is T-invariant, we have T(vj) ∈ W for j = 1, . . . , k. Hence αij = 0 for j = 1, . . . , k and i = k + 1, . . . , n. We see that [T]B has the block form

[T]B = [ B  C ]
       [ O  D ],

where B = [TW]C is a k × k matrix, C is a k × (n − k) matrix, D is an (n − k) × (n − k) matrix, and O is the zero matrix of size (n − k) × k.

Suppose V = V1 ⊕ · · · ⊕ Vk, where the Vi are T-invariant subspaces of V. Let Bi be a basis for Vi for each i. Then, by Proposition 1.6.21, B = B1 ∪ · · · ∪ Bk is an ordered basis for V and [T]B has the block form

[T]B = [ A1  O  . . .  O  ]
       [ O   A2 . . .  O  ]
       [ ⋮   ⋮    ⋱   ⋮  ]
       [ O   O  . . .  Ak ]   (3.6)

where Ai = [T|Vi]Bi for i = 1, . . . , k.

Next we investigate relations between the characteristic and minimal polynomials of a linear operator and those of its restrictions to invariant subspaces.

Proposition 3.3.11. Let T : V → V be a linear operator, and let W be a T-invariant subspace of V. Then

(a) the characteristic polynomial of TW divides the characteristic polynomial of T;

(b) the minimal polynomial of TW divides the minimal polynomial of T.

Proof. (a) Let C be a basis for W and extend it to a basis B for V. Let A = [T]B and B = [TW]C. Then χA(x) and χB(x) are the characteristic polynomials of T and TW, respectively. Note that A has the block form

A = [ B  C ]
    [ O  D ].

By Proposition 3.2.18,

χA(x) = det [ xIk − B      −C      ]
            [ O        xIn−k − D ]
      = χB(x) det(xIn−k − D).

Since det(xIn−k − D) is a polynomial in x, we see that χB(x) | χA(x).

(b) Denote by mT(x) and mTW(x) the minimal polynomials of T and TW, respectively. Since mT(T) = 0, we have mT(TW) = mT(T)|W = 0. It follows that mTW(x) | mT(x).

Corollary 3.3.12. Let T : V → V be a linear operator, and W a T-invariant subspace of V. If T is diagonalizable, then TW is diagonalizable.

Proof. If T is diagonalizable, then its minimal polynomial is a product of distinct linear factors, by Theorem 3.3.9. Since the minimal polynomial of TW divides the minimal polynomial of T, it must be a product of distinct linear factors as well. Hence TW is diagonalizable.

Proposition 3.3.13. Let T : V → V be a linear operator and suppose that V = V1 ⊕ · · · ⊕ Vk, where each Vi is a T-invariant subspace of V. Let Ti = T|Vi, regarded as a linear operator on Vi, for i = 1, . . . , k. Then

(a) χT(x) = ∏_{i=1}^k χTi(x);

(b) mT(x) = lcm(mT1(x), . . . , mTk(x)).

Proof. (a) Let Bi be a basis for Vi for each i. Then B = B1 ∪ · · · ∪ Bk is an ordered basis for V, by Proposition 1.6.21, and the matrix [T]B has the block form (3.6). By a generalization of Proposition 3.2.18, we see that

χT(x) = det(xI − [T]B) = ∏_{i=1}^k det(xIi − [Ti]Bi) = ∏_{i=1}^k χTi(x),

where Ii denotes the identity matrix of size dimVi for each i. This finishes the proof of part (a).

(b) We will show that

(i) mTi | mT for i = 1, . . . , k;

(ii) for any p(x) ∈ F[x], if mTi(x) | p(x) for i = 1, . . . , k, then mT(x) | p(x).

The first statement follows from Proposition 3.3.11 (b). To show the second statement, let p(x) ∈ F[x] be such that mTi(x) | p(x) for i = 1, . . . , k. Then p(x) = qi(x)mTi(x) for some qi(x) ∈ F[x]. In particular, if vi ∈ Vi, then

p(Ti)(vi) = qi(Ti)mTi(Ti)(vi) = 0.

Now let v ∈ V and write v = v1 + · · · + vk, where vi ∈ Vi for i = 1, . . . , k. Since each Vi is invariant under T, it is also invariant under p(T). Note that p(T)|Vi = p(T|Vi) = p(Ti). Hence

p(T)(v) = ∑_{i=1}^k p(T)(vi) = ∑_{i=1}^k p(Ti)(vi) = 0.

This shows that p(T) = 0, which implies mT(x) | p(x). This finishes the proof of the second statement and of part (b).

We finish this section by proving that commuting diagonalizable linear operators can be simultaneously diagonalized, i.e., they have a common basis of eigenvectors. This is an important result which is useful in some applications of linear algebra. Since the proof of this theorem is quite involved, we start with an easier version.

Proposition 3.3.14. Let V be a vector space over an algebraically closed field F and let S, T : V → V be linear operators such that ST = TS. Then S and T have a common eigenvector.

Proof. First, consider the linear operator T : V → V. Since F is an algebraically closed field, the characteristic polynomial of T has a root in F, and hence T has an eigenvalue, say λ. Then the eigenspace W = Vλ corresponding to λ is nonzero. Consider the restriction S|W : W → V of S to W. We will show that S(W) ⊆ W. To see this, let w ∈ W. Then T(w) = λw. It follows that TS(w) = ST(w) = S(λw) = λS(w). Hence S(w) ∈ W. Now we can regard S|W as a linear operator on W.


By the same reasoning, S|W : W → W has an eigenvalue, say α, with a corresponding eigenvector v. Then S(v) = S|W(v) = αv. Moreover, v ∈ W ⊆ V, which implies that v is an eigenvector of T corresponding to λ: T(v) = λv. Hence v is a common eigenvector of S and T.

Theorem 3.3.15. Let F be a commuting family of diagonalizable linear operators on V, i.e., ST = TS for all S, T ∈ F. Then there is an ordered basis B for V such that [T]B is a diagonal matrix for every T ∈ F. In other words, all linear operators in F are simultaneously diagonalizable.

Proof. We will prove this theorem by induction on the dimension of V. If dimV = 1, the statement is obvious. Next, let n be a positive integer such that the statement of the theorem holds for all vector spaces of dimension less than n, and let V be a vector space of dimension n. If every operator in F is a scalar multiple of I, then any basis diagonalizes all of them; otherwise choose T ∈ F which is not a scalar multiple of I. Let λ1, . . . , λk be the distinct eigenvalues of T and let Vi be the eigenspace corresponding to λi for each i. Then each Vi is a proper subspace of V and V = V1 ⊕ · · · ⊕ Vk. Note that each Vi is S-invariant for any S ∈ F. To see this, let S ∈ F. If v ∈ Vi, then T(v) = λiv, which implies TS(v) = ST(v) = S(λiv) = λiS(v). Hence S(v) ∈ Vi.

Fix i ∈ {1, . . . , k} and let Fi denote the family of linear operators S|Vi : Vi → Vi, where S ∈ F. Each operator S|Vi in Fi is diagonalizable by Corollary 3.3.12. Hence Fi is a commuting family of diagonalizable linear operators on Vi. Since dimVi < n, by the induction hypothesis, the operators in Fi can be simultaneously diagonalized, i.e., there exists a basis Bi for Vi consisting of common eigenvectors of every operator in Fi. Thus B = B1 ∪ · · · ∪ Bk is a basis for V consisting of simultaneous eigenvectors of every operator in F. This completes the induction and the proof of the theorem.


Exercises

In these exercises, unless otherwise stated, V is a finite-dimensional vector space over a field F.

3.3.1. Find the characteristic polynomials and the minimal polynomials of the following matrices, and determine whether they are diagonalizable.

(a) [ 3  1  −1 ]
    [ 2  2  −1 ]
    [ 2  2   0 ]

(b) [ 5   −6  −6 ]
    [ −1   4   2 ]
    [ 3   −6  −4 ]

3.3.2. Let P : R² → R² be defined by P(x, y) = (x, 0). Find the minimal polynomial of P.

3.3.3. Let V be a finite-dimensional vector space over C and T : V → V a linear operator. If Tⁿ = I for some n ∈ N, show that T is diagonalizable.

3.3.4. Show that T is invertible if and only if the constant term in the minimal polynomial of T is non-zero. Moreover, if T is invertible, then T⁻¹ = p(T) for some p(x) ∈ F[x].

3.3.5. Let T : V → V be a linear operator and let B be an ordered basis for V. If p(x) ∈ F[x], prove that [p(T)]B = p([T]B). Then prove that mT = m_{[T]B}.

3.3.6. If A and B are similar matrices, show that mA = mB.

3.3.7. If A is an invertible matrix such that A^k is diagonalizable for some k ≥ 2, show that A is diagonalizable.

3.3.8. Let T : V → V be a linear operator. If f(x) ∈ F[x] is any polynomial, show that ker f(T) = ker g(T), where g = gcd(f, mT).

3.3.9. Let T : V → V be a linear operator. If every subspace of V is T-invariant, show that T is a scalar multiple of the identity.

3.3.10. Let T : R² → R² be the linear operator defined by T(x, y) = (2x + y, 2y). Let W1 = 〈(1, 0)〉. Prove that W1 is T-invariant and that there is no T-invariant subspace W2 of R² such that R² = W1 ⊕ W2.


3.3.11. Let A and B be nonsingular complex square matrices such that ABA = B. Prove that

(i) if v is an eigenvector of A, then so is Bv;

(ii) A and B2 have a common eigenvector.


3.4 Jordan Canonical Forms

In this section, we show that in case a linear operator (or a matrix) is not diagonalizable, there is still a matrix representation which has a nice form, called the Jordan canonical form. If the operator is diagonalizable, then its Jordan canonical form is a diagonal matrix.

Definition 3.4.1. Let V be a finite-dimensional vector space over a field F and Ω an algebraically closed field containing F. Let T : V → V be a linear operator. We say that

- T is semisimple if mT(x) is a product of distinct linear factors in Ω[x];

- T is nilpotent if Tⁿ = 0 for some n ∈ N.

Remark. If F is algebraically closed, then T is semisimple if and only if T is diagonalizable.

Proposition 3.4.2. Let T : V → V be a linear operator. If T is semisimple and nilpotent, then T = 0.

Proof. Since T is nilpotent, T^n = 0 for some n ∈ N. Let p(x) = x^n ∈ F [x]. Then p(T ) = 0. Hence mT | p, which implies mT (x) = x^k for some k ≤ n. Since T is semisimple, k = 1. Thus mT (x) = x, so that T = mT (T ) = 0.

Proposition 3.4.3. Let S, T : V → V be linear operators such that ST = TS and α, β ∈ F .

(i) If S and T are semisimple, then so is αS + βT .

(ii) If S and T are nilpotent, then so is αS + βT .

Proof. (i) We will prove this statement under the assumption that F is algebraically closed. In this case, S and T are diagonalizable. (In the general case, we can extend V to a vector space over an algebraically closed field Ω containing F . Then S and T will be diagonalizable over Ω.) Since S and T commute, they are simultaneously diagonalizable by Theorem 3.3.15. Thus there is an ordered basis B for V such that [S]B and [T ]B are diagonal matrices. Hence


[αS + βT ]B = α[S]B + β[T ]B is also a diagonal matrix. This shows that αS + βT is diagonalizable (semisimple).

(ii) Assume that S^m = 0 and T^n = 0 for some m, n ∈ N. Since ST = TS,

(αS + βT )^{m+n} = ∑_{k=0}^{m+n} \binom{m+n}{k} α^{m+n−k} β^k S^{m+n−k} T^k.

If 0 ≤ k ≤ n, then m + n − k ≥ m and thus S^{m+n−k} = 0. If n < k ≤ m + n, then T^k = 0. It follows that (αS + βT )^{m+n} = 0 and that αS + βT is nilpotent.
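The binomial argument above is easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the matrices and scalars are illustrative choices, not part of the notes. Here S is the 4 × 4 nilpotent block N4 (defined later in (3.11)) and T = S², so S and T commute.

```python
import numpy as np

# Illustrative check of Proposition 3.4.3(ii): S and T = S^2 commute,
# S^4 = 0 and T^2 = 0, so (alpha*S + beta*T)^(4+2) should vanish.
S = np.diag(np.ones(3), k=1)       # 4x4 matrix with 1's on the superdiagonal
T = S @ S                          # T = S^2 commutes with S
A = 2 * S + 3 * T                  # alpha = 2, beta = 3 (arbitrary choices)
print(np.allclose(np.linalg.matrix_power(A, 4 + 2), 0))   # True
```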

Theorem 3.4.4 (Primary Decomposition). Let T : V → V be a linear operator. Assume that the minimal polynomial mT (x) can be written as

mT (x) = (x − λ1)^{m1} · · · (x − λk)^{mk},

where λ1, . . . , λk are distinct elements in F . Define

Vi = ker(T − λiI)^{mi}, i = 1, . . . , k.

Then

(i) each Vi is a non-zero, T -invariant subspace of V ;

(ii) V = V1 ⊕ · · · ⊕ Vk.

Proof. (i) Let i ∈ {1, . . . , k}. Since λi is a root of mT (x), it is an eigenvalue of T . Hence ker(T − λiI) ≠ {0}. But then ker(T − λiI) ⊆ ker(T − λiI)^{mi} = Vi. It follows that Vi is a non-zero subspace of V . To see that Vi is T -invariant, note that T commutes with any polynomial in T . Thus T (T − λiI)^{mi} = (T − λiI)^{mi} T , which implies that, for any v ∈ Vi,

(T − λiI)^{mi} T (v) = T (T − λiI)^{mi}(v) = T (0) = 0,

and hence T (v) ∈ ker(T − λiI)^{mi} = Vi. This shows that Vi is T -invariant.

(ii) For i = 1, . . . , k, let

σi(x) = (x − λi)^{mi} and τi(x) = ∏_{j≠i} σj(x) = ∏_{j≠i} (x − λj)^{mj}. (3.7)


Then for each i = 1, . . . , k, we have σi(x)τi(x) = mT (x) and hence

σi(T )τi(T ) = mT (T ) = 0.

Note that τ1(x), . . . , τk(x) have no common factor in F [x], and thus

gcd(τ1(x), . . . , τk(x)) = 1.

By Proposition 3.1.9, there exist q1(x), . . . , qk(x) ∈ F [x] such that

q1(x)τ1(x) + · · · + qk(x)τk(x) = 1.

Hence

q1(T )τ1(T ) + · · · + qk(T )τk(T ) = I.

Let v ∈ V and vi = qi(T )τi(T )(v) for i = 1, . . . , k. Then v = v1 + · · · + vk and

(T − λiI)^{mi}(vi) = σi(T )qi(T )τi(T )(v) = qi(T )σi(T )τi(T )(v) = 0.

Thus vi ∈ Vi for i = 1, . . . , k. It remains to show that

Vi ∩ (∑_{j≠i} Vj) = {0} for i = 1, . . . , k. (3.8)

Fix i ∈ {1, . . . , k}. Let v ∈ Vi ∩ (∑_{j≠i} Vj). Since v ∈ Vi,

σi(T )(v) = 0. (3.9)

Write v = ∑_{j≠i} vj , where vj ∈ Vj for j = 1, . . . , k and j ≠ i. Then τi(T )(vj) = 0 for all j ≠ i. Hence

τi(T )(v) = ∑_{j≠i} τi(T )(vj) = 0. (3.10)

Note that gcd(σi, τi) = 1. By Proposition 3.1.9, there exist p(x), q(x) ∈ F [x] such that

p(x)σi(x) + q(x)τi(x) = 1.

Thus

p(T )σi(T ) + q(T )τi(T ) = I.

By (3.9) and (3.10), it follows that

v = p(T )σi(T )(v) + q(T )τi(T )(v) = 0.

Hence (3.8) holds. This establishes (ii).
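For a concrete matrix, the subspaces Vi = ker(T − λiI)^{mi} can be computed directly. The sketch below, assuming SymPy is available (the matrix is an illustrative choice), verifies that the dimensions of the kernels add up and that the union of their bases spans V .

```python
import sympy as sp

# Illustrative check of Theorem 3.4.4: m_A(x) = (x-2)^2 (x-3) for this A.
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
I = sp.eye(3)
B1 = ((A - 2*I)**2).nullspace()    # basis of V_1 = ker(A - 2I)^2
B2 = (A - 3*I).nullspace()         # basis of V_2 = ker(A - 3I)
print(len(B1), len(B2))            # 2 1 : dimensions add up to dim V = 3
P = sp.Matrix.hstack(*(B1 + B2))
print(P.rank())                    # 3 : V = V_1 (+) V_2
```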


Proposition 3.4.5. Let T : V → V be a linear operator with characteristic polynomial χT (x) and minimal polynomial mT (x) given by

χT (x) = (x − λ1)^{n1} · · · (x − λk)^{nk} and
mT (x) = (x − λ1)^{m1} · · · (x − λk)^{mk}.

For i = 1, . . . , k, if Vi = ker(T − λiI)^{mi}, then

(i) the characteristic polynomial of T |Vi is (x − λi)^{ni},

(ii) the minimal polynomial of T |Vi is (x − λi)^{mi}, and

(iii) dimVi = ni.

Proof. (i) Let Ti = T |Vi . Then (Ti − λiI)^{mi} = 0 on Vi. Hence the minimal polynomial mTi(x) of Ti divides σi(x) = (x − λi)^{mi}. Thus mTi(x) = (x − λi)^{pi} for some integer pi. It follows that χTi(x) = (x − λi)^{qi} for some integer qi. By Proposition 3.3.13(i), we have

(x − λ1)^{n1} · · · (x − λk)^{nk} = χT (x) = (x − λ1)^{q1} · · · (x − λk)^{qk}.

We conclude that qi = ni, by Theorem 3.1.16, and that χTi(x) = (x − λi)^{ni}.

(ii) Note that (x − λi)^{pi} and (x − λj)^{pj} are relatively prime if i ≠ j. Hence their least common multiple is just the product of all of them. By Proposition 3.3.13(ii),

(x − λ1)^{m1} · · · (x − λk)^{mk} = mT (x) = (x − λ1)^{p1} · · · (x − λk)^{pk}.

Again, we have pi = mi and hence mTi(x) = (x − λi)^{mi} for i = 1, . . . , k.

(iii) Note that the degree of the characteristic polynomial is the dimension of the vector space. Hence dimVi = ni.

On each Vi = ker(T − λiI)^{mi}, write Ti = T |Vi so that

Ti = λiIVi + (Ti − λiIVi).

Hence if Bi is any ordered basis for Vi, then

[Ti]Bi = [λiIVi ]Bi + [Ti − λiIVi ]Bi = diag(λi, . . . , λi) + [Ti − λiIVi ]Bi .


We will choose an ordered basis Bi for Vi so that [Ti − λiIVi ]Bi has a nice form. Note that (Ti − λiIVi)^{mi} = 0 on Vi. In this case, we say that Ti − λiIVi is a nilpotent operator. We will now investigate nilpotent operators more carefully.

Definition 3.4.6. Let T : V → V be a linear operator on a finite-dimensional vector space V . We say that T is nilpotent if T^k = 0 for some k ∈ N. The smallest positive integer k such that T^k = 0 is called the index of nilpotency, or simply the index, of T , denoted by IndT . A nilpotent matrix is defined similarly.

Example. For k ∈ N, define Nk ∈ Mk(F ) by

Nk =
  0 1 0 · · · 0
  0 0 1 · · · 0
  ⋮ ⋮ ⋮  ⋱  ⋮
  0 0 0 · · · 1
  0 0 0 · · · 0     (3.11)

Then Nk is a nilpotent matrix of index k.
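A quick numerical check, assuming NumPy (the size k = 5 is an arbitrary illustrative choice), confirms that Nk has index exactly k:

```python
import numpy as np

k = 5
N = np.diag(np.ones(k - 1), k=1)   # the matrix N_k from (3.11)
# N^j is nonzero for j < k and N^k = 0, so Ind(N_k) = k.
print([not np.any(np.linalg.matrix_power(N, j)) for j in range(1, k + 1)])
# [False, False, False, False, True]
```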

Proposition 3.4.7. Let T : V → V be a nilpotent operator. Then

(i) IndT = k if and only if mT (x) = x^k;

(ii) IndT ≤ dimV ;

(iii) if n = dimV , then T^n = 0.

Proof. Exercise.

Definition 3.4.8. Let T : V → V be a linear operator. We say that V is T -cyclic if there is a vector v ∈ V such that V is spanned by the set {v, T (v), T^2(v), . . .}. In this case, v is called a cyclic vector of V . The T -cyclic subspace of V generated by v is the span of the set {v, T (v), T^2(v), . . .}.

Remark. It is obvious that a T -cyclic subspace is T -invariant.

Example 3.4.9. Let T : R[x] → R[x] be the differentiation operator: T (f) = f ′. Then the T -cyclic subspace of R[x] generated by x^2 is 〈x^2, 2x, 2〉 = P2(R).


Proposition 3.4.10. Let T : V → V be a linear operator and n = dimV . If V is T -cyclic generated by v ∈ V , then {v, T (v), T^2(v), . . . , T^{n−1}(v)} is a basis for V .

Proof. Let j be the smallest integer for which {v, T (v), . . . , T^j(v)} is linearly dependent. The existence of j follows from the assumption that V is finite-dimensional. It follows that {v, T (v), . . . , T^{j−1}(v)} is linearly independent. We will show that

T^k(v) ∈ 〈v, T (v), . . . , T^{j−1}(v)〉 for any k ∈ N ∪ {0}. (3.12)

This is clear for 0 ≤ k ≤ j − 1. Suppose T^s(v) ∈ 〈v, T (v), . . . , T^{j−1}(v)〉. Write

T^s(v) = b0v + b1T (v) + · · · + b_{j−1}T^{j−1}(v),

where b0, b1, . . . , b_{j−1} ∈ F . Applying T to both sides gives

T^{s+1}(v) = b0T (v) + b1T^2(v) + · · · + b_{j−1}T^j(v).

Since the set {v, T (v), . . . , T^j(v)} is linearly dependent, T^j(v) can be written as a linear combination of v, T (v), . . . , T^{j−1}(v). Hence

T^{s+1}(v) ∈ 〈v, T (v), . . . , T^{j−1}(v)〉.

By induction, we have established (3.12). It follows that

V = 〈v, T (v), . . .〉 ⊆ 〈v, T (v), . . . , T^{j−1}(v)〉 ⊆ V.

Hence V = 〈v, T (v), . . . , T^{j−1}(v)〉. It follows that {v, T (v), . . . , T^{j−1}(v)} is a basis for V . Since dimV = n, we see that j = n.

Proposition 3.4.11. Let V be a vector space with dimV = k and T : V → V a linear operator. If V is T -cyclic generated by v ∈ V with ordered basis B = {T^{k−1}(v), T^{k−2}(v), . . . , T (v), v}, then

[T ]B = Nk,

where Nk is the k × k matrix defined by (3.11).


Proof. Let vi = T^{k−i}(v) for i = 1, . . . , k. Then T (v1) = 0 and T (vi) = v_{i−1} for i = 2, . . . , k. It follows that [T ]B = Nk, where Nk is defined by (3.11).

Lemma 3.4.12. Let T : V → V be a nilpotent linear operator of index k. Then there exist subspaces W and W ′ of V such that

(i) W and W ′ are T -invariant;

(ii) W is T -cyclic with dimension k;

(iii) V = W ⊕W ′.

Proof. Let v ∈ V be such that T^{k−1}(v) ≠ 0. Let W be the subspace of V generated by B = {v, T (v), . . . , T^{k−1}(v)}. To show that B is linearly independent, let α0, . . . , α_{k−1} be scalars such that

α0v + α1T (v) + · · · + α_{k−1}T^{k−1}(v) = 0. (3.13)

Applying T^{k−1} to (3.13), we obtain α0T^{k−1}(v) = 0, which implies α0 = 0. Hence (3.13) reduces to

α1T (v) + · · · + α_{k−1}T^{k−1}(v) = 0. (3.14)

Again, applying T^{k−2} to (3.14), we have α1 = 0. Repeating this process, we obtain α0 = α1 = · · · = α_{k−1} = 0. Thus B is linearly independent. Hence W is a T -cyclic subspace of dimension k. Next define

𝒯 = {Z ≤ V | Z is a T -invariant subspace of V and Z ∩ W = {0}}.

Then 𝒯 ≠ ∅ since {0} ∈ 𝒯 . Choose W ′ ∈ 𝒯 with maximum dimension. Then W ′ is a T -invariant subspace of V and W ′ ∩ W = {0}. It remains to show that V = W + W ′.

Suppose V ≠ W + W ′. We will produce an element y ∈ V such that

y ∉ W + W ′, but T (y) = w′ ∈ W ′. (3.15)

Once we have this element y, let Z = 〈W ′ ∪ {y}〉 = W ′ + 〈y〉 be the subspace of V generated by W ′ and y. Then Z is T -invariant and Z ∩ W = {0}. Indeed, if t = s + αy ∈ Z, where s ∈ W ′ and α ∈ F , we have

T (t) = T (s) + αT (y) ∈ W ′ ⊆ Z.


Hence T (Z) ⊆ Z. Next, let t = s + αy ∈ Z ∩ W , where s ∈ W ′ and α ∈ F . Then αy = t − s ∈ W + W ′. If α ≠ 0, we have y ∈ W + W ′, a contradiction. Thus α = 0, which implies t = s ∈ W ∩ W ′ = {0}. It follows that t = 0 and that Z ∩ W = {0}. This contradicts the choice of W ′ since dimW ′ < dimZ. We can conclude that V = W + W ′ and hence V = W ⊕ W ′.

Now we find an element y satisfying (3.15). Since W + W ′ ≠ V , there exists an x ∈ V such that x ∉ W + W ′. Note that T^0(x) = x and T^k(x) = 0 ∈ W + W ′. Hence there is an i ∈ N ∪ {0} such that T^i(x) ∉ W + W ′ but T^{i+1}(x) ∈ W + W ′. Set u = T^i(x). Then u ∉ W + W ′ but T (u) ∈ W + W ′. Write T (u) = w + w′, where w ∈ W and w′ ∈ W ′. We claim that w = T (z) for some z ∈ W .

To see this, note that

T^{k−1}(w) + T^{k−1}(w′) = T^{k−1}(w + w′) = T^k(u) = 0.

Since W and W ′ are both T -invariant,

T^{k−1}(w) = −T^{k−1}(w′) ∈ W ∩ W ′ = {0}.

Hence T^{k−1}(w) = 0. Since w ∈ W = 〈v, T (v), . . . , T^{k−1}(v)〉,

w = α0v + α1T (v) + · · · + α_{k−1}T^{k−1}(v)

for some scalars α0, . . . , α_{k−1} ∈ F . Applying T^{k−1} to the above equation, we see that 0 = T^{k−1}(w) = α0T^{k−1}(v), which implies α0 = 0. As a result,

w = α1T (v) + · · · + α_{k−1}T^{k−1}(v) = T (z),

where z = α1v + · · · + α_{k−1}T^{k−2}(v) ∈ W .

Thus T (u) = w + w′ = T (z) + w′. Hence w′ = T (u − z). Let y = u − z. If y ∈ W + W ′, then u = z + y ∈ W + W ′, a contradiction. Hence y ∉ W + W ′, but T (y) = w′ ∈ W ′. This finishes the proof.

Theorem 3.4.13. Let T : V → V be a nilpotent linear operator. Then there exist T -cyclic subspaces W1, . . . , Wr of V such that


(i) V = W1 ⊕ · · · ⊕Wr;

(ii) dimWi = Ind(T |Wi) for i = 1, . . . , r;

(iii) IndT = dimW1 ≥ · · · ≥ dimWr.

Proof. We induct on n = dimV . If n = 1, then IndT = dimV = 1. Hence T = 0 and we are done with r = 1 and W1 = V .

Assume that the statement of the theorem holds whenever dimV < n. Let V be a vector space of dimension n and T : V → V a nilpotent operator. If IndT = n = dimV , then we are done with r = 1 and W1 = V by Lemma 3.4.12. Suppose now that IndT < dimV . By Lemma 3.4.12 again, there exist T -invariant subspaces W1 and W ′ such that W1 is T -cyclic with dimW1 = IndT and V = W1 ⊕ W ′. Since dimW1 ≥ 1, dimW ′ < n and T |W ′ is a nilpotent operator on W ′. By the induction hypothesis, there exist T -cyclic subspaces W2, . . . , Wr of W ′ such that W ′ = W2 ⊕ · · · ⊕ Wr, dimWi = Ind(T |Wi) for i = 2, . . . , r and Ind(T |W ′) = dimW2 ≥ · · · ≥ dimWr. Thus

V = W1 ⊕ W ′ = W1 ⊕ · · · ⊕ Wr.

Since IndT ≥ Ind(T |W ′), we have dimW1 ≥ dimW2 ≥ · · · ≥ dimWr.

While the cyclic subspaces that constitute the cyclic decomposition in Theorem 3.4.13 are not unique, the number of cyclic subspaces in the direct sum and their respective dimensions are uniquely determined by the operator T alone.

Proposition 3.4.14. Let T : V → V be a nilpotent linear operator. Suppose

V = W1 ⊕ · · · ⊕ Wr,

where the Wi’s are T -cyclic subspaces such that IndT = dimW1 ≥ · · · ≥ dimWr and dimWi = Ind(T |Wi) for i = 1, . . . , r. Then

(i) r = dim(kerT );

(ii) for any q ∈ N, the number of subspaces Wi with dimWi = q is

2 dim(kerT^q) − dim(kerT^{q−1}) − dim(kerT^{q+1}).


Proof. Suppose V = W1 ⊕ · · · ⊕ Wr. For any q = 1, 2, . . . , we first show that

kerT^q = (W1 ∩ kerT^q) ⊕ · · · ⊕ (Wr ∩ kerT^q). (3.16)

Let u ∈ kerT^q and write u = u1 + · · · + ur, where ui ∈ Wi for each i. Then

0 = T^q(u) = T^q(u1) + · · · + T^q(ur).

Since each Wi is T -invariant, T^q(ui) ∈ Wi for each i. By a property of direct sums, we conclude that T^q(ui) = 0 for each i. Hence each ui ∈ Wi ∩ kerT^q. This establishes (3.16).

Suppose Wi is a T -cyclic subspace spanned by Bi = {v, T (v), . . . , T^{ki−1}(v)} for some v ∈ Wi, where ki = dimWi = Ind(T |Wi). Then T^{ki}(v) = 0. Note that Wi ⊆ kerT^q if q ≥ ki. Hence

dim(Wi ∩ kerT^q) = ki if ki < q. (3.17)

Next, we show that

dim(Wi ∩ kerT^q) = q if ki ≥ q. (3.18)

Clearly, T^{ki−q}(v), . . . , T^{ki−1}(v) ∈ Wi ∩ kerT^q. If x ∈ Wi ∩ kerT^q, then

x = α0v + α1T (v) + · · · + α_{ki−1}T^{ki−1}(v),

where α0, . . . , α_{ki−1} ∈ F . Then

0 = T^q(x) = α0T^q(v) + α1T^{q+1}(v) + · · · + α_{ki−q−1}T^{ki−1}(v).

Since Bi is linearly independent, it follows that α0 = · · · = α_{ki−q−1} = 0. Hence

x = α_{ki−q}T^{ki−q}(v) + · · · + α_{ki−1}T^{ki−1}(v).

This shows that Wi ∩ kerT^q is spanned by {T^{ki−q}(v), . . . , T^{ki−1}(v)} and thus has dimension q.

Now, applying (3.16) and (3.18) to q = 1, we see that r = dim(kerT ). In general,

dim(kerT^q) = ∑_{i=1}^{r} dim(Wi ∩ kerT^q) = ∑_{ki≤q−1} ki + ∑_{ki≥q} q.


Hence

dim(kerT^{q−1}) = ∑_{ki≤q−2} ki + ∑_{ki≥q−1} (q − 1) = ∑_{ki≤q−1} ki + ∑_{ki≥q} (q − 1).

It follows that

(# of Wi with dimWi ≥ q) = dim(kerT^q) − dim(kerT^{q−1}).

This shows that

(# of Wi with dimWi = q) = 2 dim(kerT^q) − dim(kerT^{q−1}) − dim(kerT^{q+1}).
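The counting formula lends itself to a direct computational check, since dim(kerT^q) = n − rank(T^q). The sketch below, assuming NumPy (the block sizes 3, 2, 2 are illustrative choices), recovers the number of cyclic subspaces of each dimension.

```python
import numpy as np

def nilpotent_block(k):
    # the k x k block N_k from (3.11)
    return np.diag(np.ones(k - 1), k=1)

def block_diag(*blocks):
    # assemble a block diagonal matrix from square blocks
    n = sum(b.shape[0] for b in blocks)
    M = np.zeros((n, n))
    i = 0
    for b in blocks:
        s = b.shape[0]
        M[i:i + s, i:i + s] = b
        i += s
    return M

T = block_diag(nilpotent_block(3), nilpotent_block(2), nilpotent_block(2))
n = T.shape[0]

def d(q):   # dim ker T^q = n - rank(T^q)
    return n - np.linalg.matrix_rank(np.linalg.matrix_power(T, q))

print(d(1))                                   # 3 = r, the number of blocks
for q in (1, 2, 3):
    print(q, 2 * d(q) - d(q - 1) - d(q + 1))  # prints 1 0, 2 2, 3 1
```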

Corollary 3.4.15. Let T : V → V be a nilpotent linear operator of index k ≥ 2. Then there exists an ordered basis B for V such that

[T ]B = diag(N_{k1}, . . . , N_{kr}),

a block diagonal matrix with diagonal blocks N_{k1}, . . . , N_{kr}, where

(i) k = k1 ≥ k2 ≥ · · · ≥ kr;

(ii) k1 + · · ·+ kr = n = dimV .

Moreover, the numbers r and k1, . . . , kr are uniquely determined by T .

Proof. It follows from Theorem 3.4.13 and Proposition 3.4.11. The uniqueness part follows from Proposition 3.4.14.

Theorem 3.4.16 (Jordan canonical form). Let T : V → V be a linear operator such that mT (x) splits over F . Then there exists an ordered basis B for V such that [T ]B is in the following Jordan canonical form:

[T ]B = diag(A1, A2, . . . , Ak), (3.19)


where each Ai is a block diagonal matrix of the form

Ai = diag(Ji1, Ji2, . . . , Jiri),

and where each Jij , called a Jordan block, is of the form

Jij =
  λi  1
      λi  ⋱
          ⋱  1
             λi      (3.20)

Proof. Let mT (x) = (x − λ1)^{m1} · · · (x − λk)^{mk}, where λ1, . . . , λk are distinct elements in F . Let Vi = ker(T − λiI)^{mi} for i = 1, . . . , k. By Theorem 3.4.4, each Vi is a non-zero T -invariant subspace and

V = V1 ⊕ · · · ⊕ Vk.

Fix i ∈ {1, . . . , k}. Consider Ti = T |Vi , regarded as a linear operator on Vi. Write

Ti = λiIVi + (Ti − λiIVi). (3.21)

By Proposition 3.4.5 (ii), the minimal polynomial of Ti is (x − λi)^{mi}. Hence the minimal polynomial of Ti − λiIVi is x^{mi}. This shows that the linear operator TN := Ti − λiIVi is nilpotent of index mi. By Theorem 3.4.13, there are non-zero subspaces Wi1, . . . , Wiri of Vi such that each Wij is TN -cyclic, Ind(TN |Wij ) = dimWij , mi = Ind(TN ) = dimWi1 ≥ · · · ≥ dimWiri and

Vi = Wi1 ⊕ · · · ⊕ Wiri .

By Proposition 3.4.11, there is an ordered basis Bij for Wij such that

[TN |Wij ]Bij = Nkj for some kj ∈ N.


It follows from (3.21) that

[Ti|Wij ]Bij = [λiIWij ]Bij + [TN |Wij ]Bij = diag(λi, . . . , λi) + N_{kj},

which is the matrix

  λi  1
      λi  ⋱
          ⋱  1
             λi

Finally, we have

V = ⊕_{i=1}^{k} Vi = ⊕_{i=1}^{k} ⊕_{j=1}^{ri} Wij .

Let B = ∪_{i=1}^{k} ∪_{j=1}^{ri} Bij be the ordered basis for V obtained from the Bij ’s. Then [T ]B is of the form (3.19) as desired.
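Computer algebra systems implement this construction. A minimal sketch, assuming SymPy (the matrices J and P are illustrative choices), builds A = PJP^{−1} from a known Jordan form and then recovers it:

```python
import sympy as sp

J = sp.Matrix([[2, 1, 0, 0],      # blocks J_2(2), J_1(2), J_1(3)
               [0, 2, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 3]])
P = sp.Matrix([[1, 0, 0, 1],      # an invertible change of basis
               [0, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 0, 1]])
A = P * J * P.inv()
P2, J2 = A.jordan_form()
print(J2)   # equals J up to the order of the Jordan blocks
```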

The following theorem gives a procedure for finding a Jordan canonical form of a linear operator whose minimal polynomial splits.

Theorem 3.4.17. Let T : V → V be a linear operator. Assume that

χT (x) = (x − λ1)^{n1} · · · (x − λk)^{nk} and
mT (x) = (x − λ1)^{m1} · · · (x − λk)^{mk}.

Let J be the Jordan canonical form of T . Then

(i) For each i, each entry on the main diagonal of Jij is λi, and the number of λi’s on the main diagonal of J is equal to ni. Hence the sum (over j) of the orders of the Jij ’s is ni.

(ii) For each i, the largest Jordan block Jij is of size mi ×mi.

(iii) For each i, the number of blocks Jij equals the dimension of the eigenspace ker(T − λiI).


(iv) For each i, the number of blocks Jij of size q × q equals

2 dim(ker(T − λiI)^q) − dim(ker(T − λiI)^{q+1}) − dim(ker(T − λiI)^{q−1}).

(v) The Jordan canonical form is unique up to the order of the Jordan blocks.

Proof. Let Vi = ker(T − λiI)^{mi} for i = 1, . . . , k. By Proposition 3.4.5, dimVi = ni for each i. This, together with the proof of Theorem 3.4.16, implies (i).

Recall that the linear operator T − λiI, restricted to Vi, is nilpotent of index mi and that we have a cyclic decomposition

Vi = Wi1 ⊕ · · · ⊕ Wiri , with mi = dimWi1 ≥ · · · ≥ dimWiri .

Each Jordan block Jij corresponds to the subspace Wij in the cyclic decomposition above. Hence the largest Jordan block Jij is of size mi × mi.

Parts (iii) and (iv) follow from Proposition 3.4.14. The knowledge of (i)–(iv) shows that the Jordan canonical form is unique up to the order of the Jordan blocks.

Corollary 3.4.18. Let A be a square matrix such that mA(x) splits over F . Then A is similar to a matrix in the Jordan canonical form (3.19). Moreover, two matrices are similar if and only if they have the same Jordan canonical form, except possibly for a permutation of the blocks.

Example. Let T : V → V be a linear operator. Assume that

χT (x) = (x − 2)^4 (x − 3)^3 and
mT (x) = (x − 2)^2 (x − 3)^2.

Find all possible Jordan canonical forms of T .

Solution. We can extract the following information about the Jordan canonical form J of T :

• J has size 7× 7.

• λ = 2 appears 4 times and λ = 3 appears 3 times.


• There is at least one Jordan block corresponding to λ = 2 of order 2.

• There is at least one Jordan block corresponding to λ = 3 of order 2.

With these three properties, the Jordan canonical form of T is one of the following matrices (blank entries are zero):

  2 1
  0 2
      2 1
      0 2
          3 1
          0 3
              3

or

  2 1
  0 2
      2
        2
          3 1
          0 3
              3

The first matrix occurs when dim ker(T − 2I) = 2 and the second one occurs when dim ker(T − 2I) = 3.


Example. Suppose that T : V → V is a linear operator such that

(i) χT (x) = (x − 5)^7,

(ii) mT (x) = (x − 5)^3,

(iii) dim ker(T − 5I) = 3, and

(iv) dim ker(T − 5I)^2 = 6.

Find a possible Jordan canonical form of T .

Solution. Note that since mT (x) = (x − 5)^3, we see that ker(T − 5I)^3 = V . From the given information, we know that

• the Jordan canonical form of T has size 7× 7 and λ = 5 appears 7 times.

• The largest Jordan block has size 3× 3.

• The number of Jordan blocks = dim ker(T − 5I) = 3.

With these three pieces of information, the Jordan canonical form of T is one of the following matrices (blank entries are zero):

  5 1 0
  0 5 1
  0 0 5
        5 1
        0 5
            5 1
            0 5

or

  5 1 0
  0 5 1
  0 0 5
        5 1 0
        0 5 1
        0 0 5
              5

But the number of Jordan blocks of size 2 × 2 equals

2 dim ker(T − 5I)^2 − dim ker(T − 5I)^3 − dim ker(T − 5I) = 12 − 7 − 3 = 2.

Hence the only possible Jordan canonical form is the first matrix above.


Example. Classify 3 × 3 matrices A such that A^2 = 0 up to similarity.

Solution. Two matrices are similar if and only if they have the same Jordan canonical form. Hence we will find 3 × 3 matrices in Jordan canonical form such that A^2 = 0.

Let p(x) = x^2. Then p(A) = 0, which implies that mA(x) | p(x). Hence mA(x) = x or mA(x) = x^2. If mA(x) = x, then A = 0. If mA(x) = x^2, then A has two Jordan blocks, of sizes 2 × 2 and 1 × 1, each with 0 on its diagonal:

  0 1 0
  0 0 0
  0 0 0

Hence, up to similarity, there are two 3 × 3 matrices A such that A^2 = 0.

As an application of the Jordan canonical form, we prove the following result:

Theorem 3.4.19. Let A be an n × n matrix over F . Assume that χA(x) splits over F . Then

(i) the sum of the eigenvalues of A, counted with multiplicity, equals trA;

(ii) the product of the eigenvalues of A, counted with multiplicity, equals detA.

Proof. Let J be the Jordan canonical form of A. Then A = PJP^{−1}, where P is an invertible matrix. Thus

detA = det(PJP^{−1}) = detJ and trA = tr(PJP^{−1}) = trJ.

Since J is upper triangular, det J and tr J are the product and the sum, respectively, of its diagonal entries. But the diagonal entries of J are exactly the eigenvalues of A. The result now follows.
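Both identities are easy to test numerically, e.g. over C, where χA(x) always splits. A minimal sketch, assuming NumPy (the random matrix is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
w = np.linalg.eigvals(A)                          # the eigenvalues of A
print(np.allclose(w.sum(), np.trace(A)))          # True: sum = trace
print(np.allclose(w.prod(), np.linalg.det(A)))    # True: product = det
```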


Exercises

3.4.1. Find the characteristic polynomial and the minimal polynomial of the matrix

A =
  1 2 3
  0 4 5
  0 0 4

and determine its Jordan canonical form.

3.4.2. Let T : V → V be a linear operator. Show that T is nilpotent of index k if and only if [T ]B is nilpotent of index k for any ordered basis B for V .

3.4.3. Let T : V → V be a linear operator. Show that the T -cyclic subspace generated by v ∈ V is {p(T )(v) | p(x) ∈ F [x]}.

3.4.4. Let V be a finite-dimensional vector space over a field F and T : V → V a linear operator. For λ ∈ F , define the generalized λ-eigenspace V[λ] to be

V[λ] = {v ∈ V | (T − λI)^k(v) = 0 for some k ∈ N}.

(i) Prove that V[λ] is a subspace of V and that if λ, β1, . . . , βn are distinct elements in F , then V[λ] ∩ (V[β1] + · · · + V[βn]) = {0}.

(ii) If mT (x) = (x − λ1)^{m1} · · · (x − λk)^{mk}, where λ1, . . . , λk ∈ F , prove that

V[λi] = ker(T − λiI)^{mi}, i = 1, . . . , k.

3.4.5. If A ∈ M5(C) with χA(x) = (x − 2)^3 (x + 7)^2 and mA(x) = (x − 2)^2 (x + 7), what is the Jordan canonical form for A?

3.4.6. How many possible Jordan canonical forms are there for a 6 × 6 complex matrix A with χA(x) = (x + 2)^4 (x − 1)^2?

3.4.7. Classify up to similarity all 3 × 3 complex matrices A such that A^3 = I.

3.4.8. Let T : R2 → R2 be a linear operator such that T^2 = 0. Show that T = 0 or there is an ordered basis B for R2 such that

[T ]B =
  0 1
  0 0


3.4.9. List up to similarity all real 4 × 4 matrices A such that A^3 = A^2 ≠ 0, and exhibit the Jordan canonical form of each.

3.4.10. Let A and B be square matrices such that A^2 = A and B^2 = B. Show that A and B are similar if and only if they have the same rank.

3.4.11. Let A be an n × n complex matrix such that A^2 = cA for some c ∈ C.

(i) Describe all the possibilities for the Jordan canonical form of A.

(ii) Suppose B is an n × n complex matrix such that B^2 = cB (same c), and assume that rankA = rankB. Prove that A and B are similar over C.

Remark. Consider the cases c = 0 and c 6= 0.

3.4.12. Let A ∈Mn(C) with rankA = 1.

(i) Find all the possibilities for the Jordan canonical form of A.

(ii) Prove that det(In +A) = 1 + tr(A).


Chapter 4

Inner Product Spaces

4.1 Bilinear and Sesquilinear Forms

Definition 4.1.1. Let V be a vector space over a field F . A bilinear form on V is a function f : V × V → F which is linear in both variables, i.e., for any x, y, z ∈ V and α, β ∈ F ,

(i) f(αx+ βy, z) = αf(x, z) + βf(y, z),

(ii) f(z, αx+ βy) = αf(z, x) + βf(z, y).

A bilinear form f on V is said to be symmetric if

f(v, w) = f(w, v) for any v, w ∈ V .

Similarly, it is said to be skew-symmetric if

f(v, w) = −f(w, v) for any v, w ∈ V .

If the underlying field is the field of complex numbers, we can define a sesquilinear form on V .

Definition 4.1.2. Let V be a vector space over C. A sesquilinear form on V is a function f : V × V → C which is linear in the first variable and conjugate-linear (or anti-linear) in the second variable, i.e., for any x, y, z ∈ V and α, β ∈ C,

(i) f(αx+ βy, z) = αf(x, z) + βf(y, z),


(ii) f(z, αx + βy) = ᾱf(z, x) + β̄f(z, y).

A sesquilinear form f is hermitian if f(x, y) = \overline{f(y, x)} for any x, y ∈ V .

Definition 4.1.3. If f : V × V → F is a bilinear form or a sesquilinear form on V , then the map q : V → F defined by

q(v) = f(v, v) for any v ∈ V

is called a quadratic form associated with f .

The following proposition gives a formula that shows how to recover a sesquilinear form from its quadratic form.

Proposition 4.1.4 (Polarization identity). If f : V × V → C is a sesquilinear form on V and q(v) = f(v, v) is its associated quadratic form, then for any x, y ∈ V ,

f(x, y) = (1/4) ∑_{k=0}^{3} i^k q(x + i^k y)
        = (1/4)[q(x + y) − q(x − y)] + (i/4)[q(x + iy) − q(x − iy)]. (4.1)

Proof. The proof is a straightforward calculation and is left as an exercise.
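The identity (4.1) is also easy to verify numerically. In the sketch below, assuming NumPy (the matrix M and the vectors are illustrative choices), f(x, y) = xᵗM ȳ is a sesquilinear form on Cⁿ, linear in x and conjugate-linear in y:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
f = lambda x, y: x @ M @ np.conj(y)      # a sesquilinear form
q = lambda v: f(v, v)                    # its associated quadratic form
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
rhs = sum(1j**k * q(x + 1j**k * y) for k in range(4)) / 4
print(np.allclose(f(x, y), rhs))         # True
```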

We also have a polarization identity for a symmetric bilinear form, which will be given as an exercise.

Definition 4.1.5. Let f be a bilinear form or a sesquilinear form on a vector space V . Then f is said to be

- nondegenerate if

f(x, y) = 0 ∀y ∈ V ⇒ x = 0, and

f(y, x) = 0 ∀y ∈ V ⇒ x = 0.

- positive semi-definite if

f(x, x) ≥ 0 for any x ∈ V .


- positive definite if

∀x ∈ V, x ≠ 0 ⇒ f(x, x) > 0.

Remark. A positive definite (sesquilinear or bilinear) form is positive semi-definite. A positive semi-definite form f is positive definite if and only if f(v, v) = 0 implies v = 0.

Proposition 4.1.6. A sesquilinear form f on a complex vector space V is hermitian if and only if f(x, x) ∈ R for any x ∈ V .

Proof. If the sesquilinear form is hermitian, then

f(x, x) = \overline{f(x, x)} for any x ∈ V ,

which implies f(x, x) ∈ R for any x ∈ V . Conversely, assume that f(x, x) ∈ R for any x ∈ V . Then the associated quadratic form q(x) = f(x, x) ∈ R for any x ∈ V . Since q(αx) = |α|^2 q(x) for any x ∈ V and α ∈ C, we have

q(y + ix) = q(i(x − iy)) = q(x − iy),
q(y − ix) = q(−i(x + iy)) = q(x + iy).

These identities, together with the polarization identity, imply that f is hermitian.

Corollary 4.1.7. A positive semi-definite sesquilinear form is hermitian.

Proof. It follows immediately from Definition 4.1.5 and Proposition 4.1.6.

Proposition 4.1.8. A positive definite (sesquilinear or bilinear) form is nondegenerate.

Proof. Let f be a positive definite sesquilinear (bilinear) form on V . Let u ∈ V be such that f(u, v) = 0 for any v ∈ V . In particular, f(u, u) = 0, which implies u = 0. Similarly, if f(v, u) = 0 for any v ∈ V , then u = 0. Hence f is nondegenerate.


Theorem 4.1.9 (Cauchy–Schwarz inequality). Let f be a positive semi-definite (bilinear or sesquilinear) form on V . Then for any x, y ∈ V ,

|f(x, y)| ≤ √f(x, x) · √f(y, y).

Proof. We will prove this for a sesquilinear form on a complex vector space. Let A = f(x, x), B = |f(x, y)| and C = f(y, y). If B = 0, the result follows trivially. Suppose B ≠ 0. Let α = B/f(y, x). Then |α| = 1 and αf(y, x) = B. By Corollary 4.1.7, we also have ᾱf(x, y) = B. For any r ∈ R,

f(x − rαy, x − rαy) = f(x, x) − rᾱf(x, y) − rαf(y, x) + r^2 f(y, y) = A − 2rB + r^2 C.

Hence A − 2rB + r^2 C ≥ 0 for any r ∈ R. If C = 0, then 2rB ≤ A for any r ∈ R, which implies B = 0, a contradiction. If C > 0, take r = B/C so that A − B^2/C ≥ 0, which implies B^2 ≤ AC.
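A quick numerical sanity check of the inequality, assuming NumPy. The choice f(x, y) = 〈Bx, By〉 with a rank-deficient B is an illustrative way to obtain a form that is positive semi-definite but not positive definite:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
f = lambda x, y: (B @ x) @ np.conj(B @ y)   # positive semi-definite form
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
lhs = abs(f(x, y))
rhs = np.sqrt(f(x, x).real * f(y, y).real)
print(lhs <= rhs + 1e-12)                    # True
```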


Exercises

4.1.1. If f : V × V → F is a symmetric bilinear form on V and q(v) = f(v, v) is its associated quadratic form, show that for any x, y ∈ V ,

f(x, y) = (1/4)[q(x + y) − q(x − y)] = (1/2)[q(x + y) − q(x) − q(y)].

4.1.2. For any z = (z1, . . . , zn) and w = (w1, . . . , wn) in Cn, define

(z, w) = z1w1 + · · · + znwn.

Prove that this is a nondegenerate symmetric bilinear form on Cn which is not positive definite. However, the formula

〈z, w〉 = z1w̄1 + · · · + znw̄n

defines a positive definite hermitian sesquilinear form on Cn.

4.1.3. Verify whether each of the following bilinear forms on Fn, where F = R, C, is symmetric, skew-symmetric or nondegenerate:

(i) f(x, y) = x1y1 + · · ·+ xpyp − xp+1yp+1 − · · · − xnyn;

(ii) g(x, y) = (x1y2 − x2y1) + · · ·+ (x2m−1y2m − x2my2m−1);

(iii) h(x, y) = (x1ym+1 − xm+1y1) + · · ·+ (xmy2m − x2mym).

Remark. In (i), p is an integer in the set {1, . . . , n}. In (ii) and (iii), we assume that n = 2m is even.

4.1.4. Let V be a finite-dimensional vector space of dimension n. Let B be an ordered basis for V and let A and B be n × n matrices such that

[v]^t_B A [w]_B = [v]^t_B B [w]_B for any v, w ∈ V .

Prove that A = B.

4.1.5. Let V be a finite-dimensional vector space with a basis B = {v1, . . . , vn} and let f be a bilinear form on V . The matrix representation of f , denoted by [f ]B, is the matrix whose ij-entry is f(vi, vj).


Prove the following statements:

(i) f(v, w) = [v]^t_B [f ]B [w]_B for any v, w ∈ V ;

(ii) [f ]B is symmetric (skew-symmetric) if and only if f is symmetric (skew-symmetric);

(iii) [f ]B is invertible if and only if f is nondegenerate.

4.1.6. Compute the matrix representations of the bilinear forms in Problem 4.1.3.

4.1.7. Let V be a finite-dimensional vector space and f a bilinear form on V . Let B and B′ be ordered bases for V and P the transition matrix from B to B′. Show that

[f ]B = P^t [f ]B′ P.

4.1.8. Let f be a bilinear form on a vector space V . A linear operator T : V → V is said to preserve the bilinear form f if

f(Tv, Tw) = f(v, w) for any v, w ∈ V .

Show that

(i) if f is nondegenerate and T preserves f , then T is invertible;

(ii) if f is nondegenerate, then the set of linear operators preserving f is a group under composition;

(iii) if V is finite-dimensional, then T preserves f if and only if

[T ]^t_B [f ]B [T ]B = [f ]B

for any ordered basis B for V .


4.2 Inner Product Spaces

Definition 4.2.1. Let V be a vector space over a field F (where F = R or F = C). An inner product on V is a map 〈· , ·〉 : V × V → F satisfying

(1) 〈x+ y, z〉 = 〈x, z〉+ 〈y, z〉 for each x, y, z ∈ V ;

(2) 〈αx, y〉 = α〈x, y〉 for each x, y ∈ V and α ∈ F;

(3) 〈x, y〉 = \overline{〈y, x〉} for each x, y ∈ V ;

(4) ∀x ∈ V , x ≠ 0 ⇒ 〈x, x〉 > 0.

A real (or complex) vector space equipped with an inner product is called a real (or complex) inner product space.

Proposition 4.2.2. Let V be an inner product space. Then

(i) 〈x, αy〉 = ᾱ〈x, y〉 for any x, y ∈ V and α ∈ F;

(ii) 〈x, y + z〉 = 〈x, y〉+ 〈x, z〉 for any x, y, z ∈ V ;

(iii) 〈x, 0〉 = 〈0, x〉 = 0 for any x ∈ V .

Proof. Easy.

From Definition 4.2.1 and Proposition 4.2.2, we see that if F = R, then the inner product is linear in both variables, and if F = C, then the inner product is linear in the first variable and conjugate-linear in the second variable. Hence the real inner product is a positive definite, symmetric bilinear form and the complex inner product is a positive definite hermitian sesquilinear form.

The next proposition is useful in proving results about an inner product.

Proposition 4.2.3. If y and z are elements in an inner product space V suchthat 〈x, y〉 = 〈x, z〉 for each x ∈ V , then y = z.

Proof. Since 〈x, y − z〉 = 0 for any x ∈ V , choosing x = y − z gives 〈y − z, y − z〉 = 0. Hence y = z.


Let V be an inner product space. For each x ∈ V , write

‖x‖ = √〈x, x〉. (4.2)

In other words, ‖x‖ is the square root of the associated quadratic form at x. The Cauchy–Schwarz inequality (Theorem 4.1.9) can be written as

|〈x, y〉| ≤ ‖x‖ ‖y‖ for any x, y ∈ V .

Definition 4.2.4. Let V be a vector space. A function ‖ · ‖ : V → [0,∞) is said to be a norm on V if

(i) ‖x‖ = 0 if and only if x = 0,

(ii) ‖cx‖ = |c| ‖x‖ for any x ∈ V and c ∈ F,

(iii) ‖x+ y‖ ≤ ‖x‖+ ‖y‖ for any x, y ∈ V .

A vector space equipped with a norm is called a normed linear space, or simply a normed space. Property (iii) is referred to as the triangle inequality.

Proposition 4.2.5. Let V be an inner product space. Then the function ‖ · ‖ defined in (4.2) is a norm on V .

Proof. It is easy to see that ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0. For any x ∈ V and α ∈ F,

‖αx‖^2 = 〈αx, αx〉 = αᾱ〈x, x〉 = |α|^2 ‖x‖^2.

Hence ‖αx‖ = |α|‖x‖. To prove the triangle inequality, let x, y ∈ V .

‖x + y‖^2 = 〈x + y, x + y〉
          = 〈x, x〉 + 〈x, y〉 + 〈y, x〉 + 〈y, y〉
          = ‖x‖^2 + 2 Re〈x, y〉 + ‖y‖^2
          ≤ ‖x‖^2 + 2|〈x, y〉| + ‖y‖^2
          ≤ ‖x‖^2 + 2‖x‖‖y‖ + ‖y‖^2
          = (‖x‖ + ‖y‖)^2.

Hence ‖x + y‖ ≤ ‖x‖ + ‖y‖.


Proposition 4.2.6 (Parallelogram law). Let V be an inner product space. Then for any x, y ∈ V ,

‖x + y‖^2 + ‖x − y‖^2 = 2‖x‖^2 + 2‖y‖^2.

Proof. For any x, y ∈ V , we have

‖x + y‖^2 = 〈x, x〉 + 〈x, y〉 + 〈y, x〉 + 〈y, y〉 = ‖x‖^2 + 2 Re〈x, y〉 + ‖y‖^2, and
‖x − y‖^2 = 〈x, x〉 − 〈x, y〉 − 〈y, x〉 + 〈y, y〉 = ‖x‖^2 − 2 Re〈x, y〉 + ‖y‖^2.

We immediately see that ‖x + y‖^2 + ‖x − y‖^2 = 2‖x‖^2 + 2‖y‖^2.
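A one-line numerical check of the parallelogram law for the standard inner product on Cⁿ — a sketch, assuming NumPy; the vectors are random illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
y = rng.standard_normal(5) + 1j * rng.standard_normal(5)
nrm = np.linalg.norm
print(np.allclose(nrm(x + y)**2 + nrm(x - y)**2,
                  2 * nrm(x)**2 + 2 * nrm(y)**2))   # True
```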

Proposition 4.2.7 (Polarization identity). Let V be an inner product space.

(1) If F = R, then

〈x, y〉 = (1/4)(‖x + y‖^2 − ‖x − y‖^2).

(2) If F = C, then

〈x, y〉 = (1/4)(‖x + y‖^2 − ‖x − y‖^2 + i‖x + iy‖^2 − i‖x − iy‖^2).

Proof. The complex case is Proposition 4.1.4. The real case is easy and is left as an exercise.

Examples.

1. Fn is an inner product space with respect to the inner product

〈x, y〉 = ∑_{i=1}^{n} xiȳi = x1ȳ1 + x2ȳ2 + · · · + xnȳn,

where x = (x1, . . . , xn) and y = (y1, . . . , yn) ∈ Fn. Note that when F = R, the inner product is simply 〈x, y〉 = ∑_{i=1}^{n} xiyi.

2. ℓ² = {(xn) | ∑_{n=1}^{∞} |xn|^2 < ∞}. If x = (xn) and y = (yn) ∈ ℓ², then

〈x, y〉 = ∑_{i=1}^{∞} xiȳi


is an inner product on ℓ². The series above is convergent by the Cauchy–Schwarz inequality. Note that √〈x, x〉 = ‖x‖₂ on ℓ².

3. Mn(F), regarded as F^{n²}, is an inner product space with respect to the inner product

〈A, B〉 = ∑_{i=1}^{n} ∑_{j=1}^{n} aij b̄ij = tr(AB∗),

where B∗ = (B̄)^t is the conjugate transpose of B. In the case of a real matrix, B∗ is simply B^t.

4. The vector space C[0, 1] of all continuous functions on [0, 1] is an inner product space with respect to the inner product

〈f, g〉 = ∫_0^1 f(x)\overline{g(x)} dx (f, g ∈ C[0, 1]).
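The matrix inner product in Example 3 can be checked directly in code: the entrywise formula and the trace formula agree. A sketch, assuming NumPy; the random matrices are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
entrywise = np.sum(A * np.conj(B))                 # sum of a_ij * conj(b_ij)
print(np.allclose(entrywise, np.trace(A @ B.conj().T)))   # True: tr(AB*)
```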

Definition 4.2.8. Let V be an inner product space.

(1) We say that u, v ∈ V are orthogonal if 〈u, v〉 = 0 and write u ⊥ v.

(2) If x ∈ V is orthogonal to every element of a subset W of V , then we say that x is orthogonal or perpendicular to W and write x ⊥ W .

(3) If U , W are subsets of V and u ⊥ w for all u ∈ U and all w ∈ W , then we say that U and W are orthogonal and write U ⊥ W .

(4) The set of all x ∈ V orthogonal to a set W is denoted by W⊥ and called the orthogonal complement of W :

W⊥ = {x ∈ V | x ⊥ W }.

Proposition 4.2.9. Let V be an inner product space.

(1) {0}⊥ = V and V ⊥ = {0}.

(2) If A is a subset of V , then A⊥ is a subspace of V .

(3) If A is a subset of V , then A ∩ A⊥ = ∅ or A ∩ A⊥ = {0}; if A is a subspace of V , then A ∩ A⊥ = {0}.


(4) For any subsets A, B of V , if A ⊆ B then B⊥ ⊆ A⊥.

(5) For any subset A of V , A ⊆ A⊥⊥.

Proof. (1) is trivial.

(2) Clearly, 0 ∈ A⊥. If x1, x2 ∈ A⊥ and α, β ∈ F, then

〈αx1 + βx2, y〉 = α〈x1, y〉+ β〈x2, y〉 = 0 for all y ∈ A.

Hence αx1 + βx2 ∈ A⊥.

(3) Assume that A ∩ A⊥ ≠ ∅. Let x ∈ A ∩ A⊥. Since x ∈ A⊥, we have 〈x, y〉 = 0 for each y ∈ A. In particular, 〈x, x〉 = 0. Hence x = 0. This shows that A ∩ A⊥ ⊆ {0}. Now, assume that A is a subspace of V . Since both A and A⊥ are subspaces of V , 0 ∈ A ∩ A⊥. Hence A ∩ A⊥ = {0}.

(4) Assume that A ⊆ B. Let x ∈ B⊥. If y ∈ A, then y ∈ B and hence 〈x, y〉 = 0. This shows that x ∈ A⊥. Thus, B⊥ ⊆ A⊥.

(5) Let x ∈ A. Then 〈x, y〉 = 0 for any y ∈ A⊥. Hence x ∈ A⊥⊥.

Definition 4.2.10. A nonempty collection O = {uα | α ∈ Λ} of elements in an inner product space is said to be an orthogonal set if uα ⊥ uβ for all α ≠ β in Λ. If, in addition, each uα has norm one, then we say that the set O is an orthonormal set. That is, the set O is orthonormal if and only if 〈uα, uβ〉 = δαβ for each α, β ∈ Λ, where δαβ is the Kronecker delta.

Note that we can always construct an orthonormal set from an orthogonal set of nonzero vectors by dividing each vector by its norm.

Examples.

(1) {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal set in R³.

(2) {(1, −1, 0), (1, 1, 0), (0, 0, 1)} is an orthogonal set in R³, but not an orthonormal set. By dividing each element by its norm, we obtain an orthonormal set

{(1/√2, −1/√2, 0), (1/√2, 1/√2, 0), (0, 0, 1)}.


(3) {e^{2nπix}}_{n=−∞}^{∞} is an orthonormal set in C[0, 1] because

∫_0^1 e^{2nπix} e^{−2mπix} dx = ∫_0^1 e^{2(n−m)πix} dx = δnm.

Proposition 4.2.11. Any orthogonal set of nonzero vectors is linearly independent.

Proof. Assume that O is an orthogonal set consisting of non-zero vectors. If ui ∈ O and ci ∈ F, i = 1, 2, . . . , n, are such that ∑_{i=1}^{n} ciui = 0, then, for j = 1, 2, . . . , n,

0 = 〈∑_{i=1}^{n} ciui, uj〉 = ∑_{i=1}^{n} ci〈ui, uj〉 = cj‖uj‖^2,

which implies that cj = 0 for each j. Hence O is linearly independent.

The next proposition is a generalization of the Pythagorean theorem for a right-angled triangle.

Proposition 4.2.12 (Pythagorean formula). If {x1, x2, . . . , xn} is an orthogonal subset of an inner product space, then

‖∑_{i=1}^{n} xi‖^2 = ∑_{i=1}^{n} ‖xi‖^2.

Proof. Using the fact that 〈xi, xj〉 = 0 if i ≠ j, we have

‖∑_{i=1}^{n} xi‖^2 = 〈∑_{i=1}^{n} xi, ∑_{j=1}^{n} xj〉 = ∑_{i=1}^{n} ∑_{j=1}^{n} 〈xi, xj〉 = ∑_{i=1}^{n} ‖xi‖^2.

If S = {u1, u2, . . . , un} is a linearly independent subset of a vector space V , then any element x ∈ span(S) can be written uniquely as a linear combination x = ∑_{i=1}^{n} αiui. However, if S is an orthonormal set in an inner product space, it is linearly independent by Proposition 4.2.11. In this case, if x = ∑_{i=1}^{n} αiui is in span(S), we can determine a formula for the coefficients αi.

Proposition 4.2.13. Let {u1, . . . , un} be an orthonormal set in an inner product space and x = ∑_{i=1}^{n} αiui, where each αi ∈ F. Then αi = 〈x, ui〉 for i = 1, . . . , n and

‖x‖^2 = ∑_{i=1}^{n} |αi|^2 = ∑_{i=1}^{n} |〈x, ui〉|^2.


Proof. If x = ∑_{i=1}^{n} αiui, then

〈x, uj〉 = 〈∑_{i=1}^{n} αiui, uj〉 = ∑_{i=1}^{n} αi〈ui, uj〉 = αj .

Moreover, by Proposition 4.2.12,

‖x‖^2 = ‖∑_{i=1}^{n} αiui‖^2 = ∑_{i=1}^{n} ‖αiui‖^2 = ∑_{i=1}^{n} |αi|^2 = ∑_{i=1}^{n} |〈x, ui〉|^2.

Proposition 4.2.14. Let {u1, u2, . . . , un} be an orthonormal subset of an inner product space V . Let N = span{u1, u2, . . . , un}. For any x ∈ V , define the orthogonal projection of x on N by

PN (x) = ∑_{i=1}^{n} 〈x, ui〉ui.

Then PN (x) ∈ N and x− PN (x) ∈ N⊥. In particular, x− PN (x) ⊥ PN (x).

Proof. It is obvious that PN (x) ∈ N . First, we show that x − PN (x) ⊥ uj for j = 1, . . . , n. Using the fact that 〈ui, uj〉 = δij , we have

〈x − PN (x), uj〉 = 〈x − ∑_{i=1}^{n} 〈x, ui〉ui, uj〉 = 〈x, uj〉 − ∑_{i=1}^{n} 〈x, ui〉〈ui, uj〉 = 〈x, uj〉 − 〈x, uj〉 = 0.

This implies that x − PN (x) ⊥ uj for j = 1, . . . , n. If y = ∑_{j=1}^{n} cjuj ∈ N , then

〈x − PN (x), y〉 = 〈x − PN (x), ∑_{j=1}^{n} cjuj〉 = ∑_{j=1}^{n} c̄j〈x − PN (x), uj〉 = 0.

Hence x − PN (x) ⊥ N , which implies x − PN (x) ⊥ PN (x).

We will apply this proposition to show that from any linearly independent set we can construct an orthonormal set with the same span. This is known as the Gram–Schmidt orthogonalization process.


Theorem 4.2.15 (Gram–Schmidt orthogonalization process). Let V be an inner product space and let {x1, x2, . . .} be a linearly independent set in V . Then there is an orthonormal set {u1, u2, . . .} such that, for each n ∈ N,

span{x1, . . . , xn} = span{u1, . . . , un}.

Proof. First, set u1 = x1/‖x1‖. Next, let z2 = x2 − 〈x2, u1〉u1. Clearly, z2 is orthogonal to u1. Then, we set u2 = z2/‖z2‖. In general, assume that we have chosen an orthonormal set {u1, u2, . . . , u_{n−1}} such that

span{x1, . . . , x_{n−1}} = span{u1, . . . , u_{n−1}}. (4.3)

Define

zn = xn − ∑_{i=1}^{n−1} 〈xn, ui〉ui.

By Proposition 4.2.14, ∑_{i=1}^{n−1} 〈xn, ui〉ui is the orthogonal projection of xn onto span{u1, . . . , u_{n−1}}. Hence zn is orthogonal to each uj . Now, let un = zn/‖zn‖. It follows that {u1, . . . , un} is an orthonormal set. Moreover, it is easy to see that un ∈ span{u1, . . . , u_{n−1}, xn}. This, together with (4.3), implies

span{u1, . . . , un} ⊆ span{x1, . . . , xn}.

On the other hand, xn ∈ span{u1, . . . , u_{n−1}, zn} = span{u1, . . . , u_{n−1}, un}. Thus

span{x1, . . . , xn} ⊆ span{u1, . . . , un}.

This finishes the proof.
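The proof translates directly into an algorithm. Below is a minimal sketch of the process, assuming NumPy and a linearly independent input list; the test vectors are those of the worked example that follows. Note that np.vdot conjugates its first argument, so 〈x, u〉 in the notes corresponds to np.vdot(u, x):

```python
import numpy as np

def gram_schmidt(xs):
    us = []
    for x in xs:
        # subtract the orthogonal projection of x onto span(us)
        z = x - sum(np.vdot(u, x) * u for u in us)   # sum of <x, u> u
        us.append(z / np.linalg.norm(z))
    return us

xs = [np.array([1., 1., 0.]), np.array([0., 1., 1.]), np.array([1., 0., 1.])]
us = gram_schmidt(xs)
# the Gram matrix of the output is (numerically) the identity
print(np.round([[np.vdot(u, v) for v in us] for u in us], 10))
```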

Definition 4.2.16. Let V be a finite-dimensional inner product space. An orthonormal basis for V is a basis for V which is an orthonormal set.

Example.

(1) {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal basis for R³.

(2) {(1/√2, −1/√2, 0), (1/√2, 1/√2, 0), (0, 0, 1)} is an orthonormal basis for R³.


Corollary 4.2.17. Let V be a finite-dimensional inner product space. Then V has an orthonormal basis.

Proof. Let {x1, . . . , xn} be a basis for V . By the Gram–Schmidt orthogonalization process, there is an orthonormal set {u1, . . . , un} such that

span{x1, . . . , xn} = span{u1, . . . , un}.

This shows that {u1, . . . , un} is an orthonormal basis for V .

Example. Apply the Gram–Schmidt process to produce an orthonormal basis for R³ from the basis {(1, 1, 0), (0, 1, 1), (1, 0, 1)}.

Solution. Let x1 = (1, 1, 0), x2 = (0, 1, 1) and x3 = (1, 0, 1). First, set

u1 = x1/‖x1‖ = (1/√2, 1/√2, 0).

Next, let

z2 = x2 − 〈x2, u1〉u1 = (0, 1, 1) − (1/√2)(1/√2, 1/√2, 0) = (−1/2, 1/2, 1).

Then set

u2 = z2/‖z2‖ = (√2/√3)(−1/2, 1/2, 1) = (−1/√6, 1/√6, 2/√6).

Now let

z3 = x3 − 〈x3, u1〉u1 − 〈x3, u2〉u2
   = (1, 0, 1) − (1/√2)(1/√2, 1/√2, 0) − (1/√6)(−1/√6, 1/√6, 2/√6)
   = (2/3, −2/3, 2/3).

Finally, set

u3 = z3/‖z3‖ = (√3/2)(2/3, −2/3, 2/3) = (1/√3, −1/√3, 1/√3).

We have an orthonormal basis

{(1/√2, 1/√2, 0), (−1/√6, 1/√6, 2/√6), (1/√3, −1/√3, 1/√3)}.


Theorem 4.2.18 (Projection Theorem). Let V be a finite-dimensional inner product space and W a subspace of V . Then

V = W ⊕ W⊥.

Proof. Let {u1, . . . , un} be an orthonormal basis for W . Let x ∈ V . Consider the orthogonal projection of x on W :

PW (x) = ∑_{i=1}^{n} 〈x, ui〉ui.

By Proposition 4.2.14, PW (x) ∈ W and x − PW (x) ∈ W⊥. Thus

x = PW (x) + (x − PW (x)) ∈ W + W⊥.

This shows that V = W + W⊥. We already know that W ∩ W⊥ = {0}. Hence V = W ⊕ W⊥.

Corollary 4.2.19. Let W be a subspace of a finite-dimensional inner product space V . Then W⊥⊥ = W .

Proof. If x ∈ W , then x ⊥ W⊥, which implies x ∈ W⊥⊥. Hence W ⊆ W⊥⊥. On the other hand, let x ∈ W⊥⊥. By Theorem 4.2.18, we can write x = y + z, where y ∈ W and z ∈ W⊥. Since y ∈ W , it is also in W⊥⊥. Since W⊥⊥ is a subspace of V , we have x − y ∈ W⊥⊥. But then x − y = z ∈ W⊥. Hence x − y ∈ W⊥ ∩ (W⊥)⊥ = {0}. Thus x = y ∈ W , which shows that W = W⊥⊥.

Next we show that, given a subspace W and a point v in V , the orthogonal projection PW (v) is the point of W that minimizes the distance from v to W .

Proposition 4.2.20. Let W be a subspace of a finite-dimensional inner product space V and v ∈ V . Then

‖v − PW (v)‖ ≤ ‖v − w‖ for any w ∈W .

Moreover, the equality holds if and only if w = PW (v).


Proof. For any w ∈ W ,

‖v − w‖^2 = ‖(v − PW (v)) + (PW (v) − w)‖^2 = ‖v − PW (v)‖^2 + ‖PW (v) − w‖^2 ≥ ‖v − PW (v)‖^2,

where the second equality holds because v − PW (v) ∈ W⊥ and PW (v) − w ∈ W . Also, the equality holds if and only if ‖PW (v) − w‖ = 0, i.e., PW (v) = w.

Now we consider linear functionals on an inner product space. It is easily seen that for a fixed w ∈ V , the map v 7→ 〈v, w〉 is a linear functional on V . The next theorem shows that these are the only linear functionals on V . It is also true in a more general setting, with a variety of interesting applications.

Theorem 4.2.21 (Riesz’s Theorem). Let f be a linear functional on a finite-dimensional inner product space V . Then there is a unique w ∈ V such that

f(v) = 〈v, w〉 for any v ∈ V .

Proof. Let {u1, . . . , un} be an orthonormal basis for V . Let

w = ∑_{i=1}^{n} \overline{f(ui)} ui.

Let v ∈ V and write v = ∑_{i=1}^{n} 〈v, ui〉ui. Then

f(v) = f(∑_{i=1}^{n} 〈v, ui〉ui) = ∑_{i=1}^{n} 〈v, ui〉f(ui) = 〈v, ∑_{i=1}^{n} \overline{f(ui)} ui〉 = 〈v, w〉.

To show uniqueness, let w′ ∈ V be such that f(v) = 〈v, w〉 = 〈v, w′〉 for any v ∈ V . By Proposition 4.2.3, w = w′.


Exercises

4.2.1. Let n ≥ 3. Prove that if x, y are elements of a complex inner product space, then

〈x, y〉 = (1/n) ∑_{k=1}^{n} ω^k ‖x + ω^k y‖^2,

where ω is a primitive n-th root of unity (e.g., ω = e^{2πi/n}).

4.2.2. Show that in a complex inner product space, x ⊥ y if and only if

‖x+ αy‖ = ‖x− αy‖ for all scalars α.

4.2.3. Let ϕ be a nonzero linear functional on a finite-dimensional inner product space V . Prove that (kerϕ)⊥ is a subspace of dimension 1.

4.2.4. Let W1, W2 be subspaces of an inner product space V . Prove that

(W1 + W2)⊥ = W1⊥ ∩ W2⊥.

4.2.5. Let A be a subset of a finite-dimensional inner product space V . Prove that A⊥ = (spanA)⊥.

4.2.6. In each of the following parts, apply the Gram–Schmidt process to the given basis for R³ to produce an orthonormal basis, and write the given element x as a linear combination of the elements in the orthonormal basis thus obtained.

(a) {(1, 0, −1), (0, 1, 1), (1, 2, 3)}, and x = (2, 1, −2);

(b) {(1, 1, 1), (0, 1, 1), (0, 0, 3)}, and x = (3, 3, 1).

4.2.7. Let {v1, . . . , vk} be an orthonormal subset of an inner product space V . Show that for any x ∈ V ,

∑_{i=1}^{k} |〈x, vi〉|^2 ≤ ‖x‖^2.

Prove also that the equality holds for every x ∈ V if and only if {v1, . . . , vk} is an orthonormal basis for V .


4.2.8. Let V be a complex inner product space. Let {v1, . . . , vn} be an orthonormal basis for V . Prove that

(i) x = ∑_{i=1}^{n} 〈x, vi〉vi for any x ∈ V ;

(ii) 〈x, y〉 = ∑_{i=1}^{n} 〈x, vi〉\overline{〈y, vi〉} for any x, y ∈ V .


4.3 Operators on Inner Product Spaces

Throughout this section, unless otherwise stated, V is a finite-dimensional inner product space. For simplicity, we will write Tx for T (x) when there is no confusion.

Proposition 4.3.1. Let V be an inner product space and T a linear operator on V .

(i) If 〈Tx, y〉 = 0 for any x, y ∈ V , then T = 0.

(ii) If V is a complex inner product space and 〈Tx, x〉 = 0 for all x ∈ V , then T = 0.

Proof. (i) For each x ∈ V , 〈Tx, y〉 = 0 for any y ∈ V . Hence Tx = 0 for each x ∈ V , which implies that T = 0.

(ii) Let x, y ∈ V and r ∈ C. Then

0 = 〈T (rx + y), rx + y〉 = |r|^2〈Tx, x〉 + 〈Ty, y〉 + r〈Tx, y〉 + r̄〈Ty, x〉 = r〈Tx, y〉 + r̄〈Ty, x〉.

Setting r = 1, we have

〈Tx, y〉 + 〈Ty, x〉 = 0.

Setting r = i and dividing by i, we have

〈Tx, y〉 − 〈Ty, x〉 = 0.

Hence 〈Tx, y〉 = 0 for any x, y ∈ V . It follows from part (i) that T = 0.

Remark. Part (ii) may fail for a real inner product space. For example, let V = R² and let T be the rotation by 90°, i.e., T (x, y) = (−y, x) for any (x, y) ∈ R². Then 〈Tv, v〉 = 0 for each v ∈ V , but T ≠ 0.

Theorem 4.3.2. Let T be a linear operator on V . Then there is a unique linear operator T ∗ on V satisfying

〈Tx, y〉 = 〈x, T ∗y〉 for all x, y ∈ V .


Proof. Let T be a linear operator on V . For any y ∈ V , the map x 7→ 〈Tx, y〉 is a linear functional on V . By Riesz’s Theorem (Theorem 4.2.21), there exists a unique z ∈ V such that

〈Tx, y〉 = 〈x, z〉 for all x ∈ V .

Define T ∗y = z. To show that the map T ∗ is linear, let y1, y2 ∈ V and α, β ∈ F. Then for any x ∈ V ,

〈x, T ∗(αy1 + βy2)〉 = 〈Tx, αy1 + βy2〉 = ᾱ〈Tx, y1〉 + β̄〈Tx, y2〉 = ᾱ〈x, T ∗y1〉 + β̄〈x, T ∗y2〉 = 〈x, αT ∗y1 + βT ∗y2〉.

Hence T ∗(αy1 + βy2) = αT ∗y1 + βT ∗y2. For uniqueness, assume that S is a linear operator on V such that

〈Tx, y〉 = 〈x, Sy〉 for all x, y ∈ V .

Then

〈x, Sy〉 = 〈x, T ∗y〉 for all x, y ∈ V .

Thus S = T ∗ by Proposition 4.3.1.

Definition 4.3.3. Let T be a linear operator on V . Then the linear operator T ∗ defined in Theorem 4.3.2 is called the adjoint of T .

We summarize important properties of the adjoint of an operator in the following theorem:

Theorem 4.3.4. Let T , S be linear operators on V . Then

1. T ∗∗ = T ;

2. (αT + βS)∗ = ᾱT ∗ + β̄S∗ for all α, β ∈ F ;

3. (TS)∗ = S∗T ∗ ;

4. If T is invertible, then T ∗ is also invertible and (T ∗)−1 = (T−1)∗.


Proof. Let T and S be linear operators on V .

(1) For any x, y ∈ V ,

〈x, T ∗∗y〉 = 〈T ∗x, y〉 = 〈x, Ty〉.

Hence T ∗∗ = T .

(2) We leave this as a (straightforward) exercise.

(3) For any x, y ∈ V ,

〈x, (TS)∗y〉 = 〈TSx, y〉 = 〈Sx, T ∗y〉 = 〈x, S∗T ∗y〉.

Hence (TS)∗ = S∗T ∗.

(4) Assume that T^{−1} exists. Then TT^{−1} = T^{−1}T = I. Taking adjoints and applying (3), we see that

(T−1)∗T ∗ = T ∗(T−1)∗ = I∗ = I.

Hence T ∗ is invertible and (T ∗)−1 = (T−1)∗.

Remark. If T : V → W is a linear operator between finite-dimensional inner product spaces, we can define T ∗ : W → V to be the unique linear operator from W into V satisfying

〈Tx, y〉W = 〈x, T ∗y〉V for any x ∈ V and y ∈ W.

It has all the properties listed in the previous theorem. Since we are mainly interested in the case where V = W , we will restrict ourselves to this setting.

Examples. If we write elements of Cn as column vectors (n × 1 matrices), then we can write the inner product on Cn as

〈x, y〉 = ∑_{i=1}^n xi ȳi = xᵗȳ.

Recall that any n × n matrix A defines a linear operator LA on Cn by left multiplication: LA(x) = Ax, where x ∈ Cn is written as an n × 1 matrix. Then

(LA)∗ = LA∗ ,


where A∗ = Āᵗ is the conjugate transpose of A. To see this, let x, y ∈ Cn. Then

〈LA(x), y〉 = 〈Ax, y〉 = (Ax)ᵗȳ = xᵗAᵗȳ = 〈x, Āᵗy〉 = 〈x, LA∗(y)〉.

On the other hand, if T is a linear operator on V and B is an ordered orthonormal basis for V, then [T∗]B = [T]∗B. We leave this as an exercise.
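The identity (LA)∗ = LA∗ is easy to confirm numerically. The sketch below is illustrative only (it uses NumPy, and the helper inner is our own, implementing the convention 〈u, v〉 = ∑ ui v̄i used in these notes); it checks 〈Ax, y〉 = 〈x, A∗y〉 for a random complex matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random complex 3x3 matrix and two random vectors in C^3.
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

def inner(u, v):
    # <u, v> = sum_i u_i * conj(v_i): linear in the first argument,
    # conjugate-linear in the second.
    return np.sum(u * np.conj(v))

A_star = A.conj().T  # conjugate transpose: the matrix of the adjoint

# <Ax, y> agrees with <x, A*y> up to floating-point rounding:
print(np.isclose(inner(A @ x, y), inner(x, A_star @ y)))  # True
```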

Definition 4.3.5. Let T be a linear operator on V . Then

- T is said to be normal if TT ∗ = T ∗T ;

- T is said to be self-adjoint or hermitian if T ∗ = T ;

- T is said to be unitary if T is invertible and T ∗ = T−1.

If V is a real inner product space and T is unitary, then we may say that T is orthogonal. It is clear that if T is self-adjoint or unitary, then it is normal.

Definition 4.3.6. Let A ∈ Mn(F). Then

- A is said to be normal if AA∗ = A∗A ;

- A is said to be self-adjoint or hermitian if A∗ = A ;

- A is said to be unitary if A is invertible and A∗ = A−1.

If F = R, then

- A is said to be symmetric if At = A ;

- A is said to be orthogonal if A is invertible and At = A−1.

In other words, a symmetric matrix is a real self-adjoint matrix and an orthogonal matrix is a real unitary matrix.

Examples. Let V = Fn and let A be an n × n matrix over F. Consider the linear operator LA given by left multiplication by A. It is easy to verify that

- LA is normal if and only if A is normal;

- LA is hermitian if and only if A is hermitian;


- LA is unitary if and only if A is unitary.
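These definitions translate directly into matrix tests. Below is a minimal NumPy sketch (the predicate names are our own); it also illustrates the earlier remark that self-adjoint and unitary matrices are normal:

```python
import numpy as np

def is_normal(A, tol=1e-10):
    # A is normal iff A A* = A* A.
    return np.allclose(A @ A.conj().T, A.conj().T @ A, atol=tol)

def is_hermitian(A, tol=1e-10):
    # A is hermitian (self-adjoint) iff A* = A.
    return np.allclose(A.conj().T, A, atol=tol)

def is_unitary(A, tol=1e-10):
    # A is unitary iff A* A = I; in finite dimensions this already
    # forces A to be invertible with A^{-1} = A*.
    return np.allclose(A.conj().T @ A, np.eye(A.shape[0]), atol=tol)

H = np.array([[2, 1j], [-1j, 3]])              # hermitian
U = np.array([[0, 1], [1, 0]], dtype=complex)  # unitary (a permutation)
print(is_hermitian(H), is_normal(H))  # True True
print(is_unitary(U), is_normal(U))    # True True
```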

Theorem 4.3.7. Let T be a linear operator on V .

(i) T is self-adjoint if and only if 〈Tx, y〉 = 〈x, Ty〉 for any x, y ∈ V .

(ii) If T is self-adjoint, then 〈Tx, x〉 ∈ R for each x ∈ V .

(iii) If V is a complex inner product space, then T is self-adjoint if and only if 〈Tx, x〉 ∈ R for each x ∈ V.

Proof. (i) Assume that T = T ∗. Then

〈Tx, y〉 = 〈x, T ∗y〉 = 〈x, Ty〉 for any x, y ∈ V .

Conversely, if 〈Tx, y〉 = 〈x, Ty〉 for any x, y ∈ V , then

〈Tx, y〉 = 〈x, Ty〉 = 〈T ∗x, y〉 for any x, y ∈ V .

Hence T = T ∗ by Proposition 4.3.1 (i).

(ii) Assume that T is self-adjoint. Then for any x ∈ V,

〈Tx, x〉 = 〈x, Tx〉,

and 〈x, Tx〉 is the complex conjugate of 〈Tx, x〉. Hence 〈Tx, x〉 ∈ R for any x ∈ V.

(iii) Let V be a complex inner product space. Assume that 〈Tx, x〉 ∈ R for any x ∈ V. Then 〈Tx, x〉 equals its own complex conjugate, so

〈Tx, x〉 = 〈x, Tx〉 = 〈T∗x, x〉 for any x ∈ V.

Hence 〈(T − T∗)x, x〉 = 0 for any x ∈ V, and by Proposition 4.3.1 (ii) applied to T − T∗, we conclude that T = T∗.

Proposition 4.3.8. Let T be a self-adjoint operator on V. If 〈Tx, x〉 = 0 for all x ∈ V, then T = 0.

Proof. Assume that 〈Tx, x〉 = 0 for all x ∈ V. If V is a complex inner product space, then T = 0 (without assuming that T is self-adjoint) by Proposition 4.3.1 (ii). Thus we will establish this for a real inner product space. For any x, y ∈ V,

0 = 〈T(x + y), x + y〉 = 〈Tx, x〉 + 〈Tx, y〉 + 〈Ty, x〉 + 〈Ty, y〉,


which implies 〈Tx, y〉+ 〈Ty, x〉 = 0. But then

〈Tx, y〉 = 〈y, Tx〉 = 〈Ty, x〉.

The first equality follows from the fact that the inner product is real and the second one follows because T is self-adjoint. Combining with 〈Tx, y〉 + 〈Ty, x〉 = 0, we get 2〈Tx, y〉 = 0, so 〈Tx, y〉 = 0 for all x, y ∈ V, and hence T = 0 by Proposition 4.3.1 (i).

Theorem 4.3.9. Let T be a linear operator on V. Then T is normal if and only if ‖Tx‖ = ‖T∗x‖ for each x ∈ V.

Proof. Let T be a linear operator on V. Note that T∗T − TT∗ is self-adjoint. Then by Proposition 4.3.8,

T∗T − TT∗ = 0 ⇐⇒ 〈(T∗T − TT∗)x, x〉 = 0 for any x ∈ V

⇐⇒ 〈T∗Tx, x〉 = 〈TT∗x, x〉 for any x ∈ V

⇐⇒ ‖Tx‖² = ‖T∗x‖² for any x ∈ V.

Hence T is normal if and only if ‖Tx‖ = ‖T ∗x‖ for any x ∈ V .

Theorem 4.3.10. Let T be a linear operator on V . Then TFAE:

(i) T is unitary;

(ii) ‖Tx‖ = ‖x‖ for all x ∈ V ;

(iii) 〈Tx, Ty〉 = 〈x, y〉 for all x, y ∈ V ;

(iv) T ∗T = I.

Proof. (i)⇒ (ii). For all x ∈ V ,

‖Tx‖2 = 〈Tx, Tx〉 = 〈T ∗Tx, x〉 = 〈T−1Tx, x〉 = 〈x, x〉 = ‖x‖2.

(ii) ⇒ (iii). We use the Polarization identity (Proposition 4.2.7). We will prove it when F = C; the real case can be done the same way. For all x, y ∈ V,

〈x, y〉 = (1/4)(‖x + y‖² − ‖x − y‖²) + (i/4)(‖x + iy‖² − ‖x − iy‖²)

= (1/4)(‖Tx + Ty‖² − ‖Tx − Ty‖²) + (i/4)(‖Tx + iTy‖² − ‖Tx − iTy‖²)

= 〈Tx, Ty〉.


(iii) ⇒ (iv). Since 〈T∗Tx, y〉 = 〈Tx, Ty〉 = 〈x, y〉 for all x, y ∈ V, we have T∗T = I.

(iv) ⇒ (i). If T∗T = I, then T is one-to-one. Since V is finite-dimensional, T is invertible, and multiplying T∗T = I on the right by T−1 gives T∗ = T−1. Hence T is unitary.

Remark. This theorem is not true for an infinite-dimensional inner product space. For example, let R : ℓ² → ℓ² be the right-shift operator, i.e.

R(x1, x2, . . . ) = (0, x1, x2, . . . ).

Then ‖Rx‖ = ‖x‖ for all x ∈ ℓ², but R is not surjective and thus not invertible.

Theorem 4.3.11. Let A ∈ Mn(C). Then TFAE:

(i) A is unitary;

(ii) ‖Ax‖ = ‖x‖ for all x ∈ Cn;

(iii) 〈Ax,Ay〉 = 〈x, y〉 for all x, y ∈ Cn;

(iv) A∗A = In;

(v) the column vectors of A are orthonormal;

(vi) the row vectors of A are orthonormal.

Proof. The proof that (i), (ii), (iii) and (iv) are equivalent is similar to the proof of Theorem 4.3.10. We now show that (iv) and (v) are equivalent. Let A = [aij] and A∗ = [bij], where bij = āji. Then A∗A = [cij], where

cij = (A∗A)ij = ∑_{k=1}^n bik akj = ∑_{k=1}^n āki akj. (4.4)

The fact that A∗A = In is equivalent to (A∗A)ij = δij for i, j ∈ {1, . . . , n}. The i-th column vector of A is Ci = (a1i, . . . , ani), for i = 1, . . . , n. Hence

〈Ci, Cj〉 = ∑_{k=1}^n aki ākj.


It follows that

〈Cj, Ci〉 = ∑_{k=1}^n āki akj, (4.5)

since 〈Cj, Ci〉 is the complex conjugate of 〈Ci, Cj〉.

From (4.4) and (4.5), we see that (iv) and (v) are equivalent.

That (vi) is equivalent to the other statements follows from the fact that A is unitary if and only if Aᵗ is unitary and that the row vectors of A are the column vectors of Aᵗ.

We also have the following version for an orthogonal real matrix.

Theorem 4.3.12. Let A ∈ Mn(R). Then TFAE:

(i) A is orthogonal;

(ii) ‖Ax‖ = ‖x‖ for all x ∈ Rn;

(iii) 〈Ax,Ay〉 = 〈x, y〉 for all x, y ∈ Rn;

(iv) AtA = In;

(v) the column vectors of A are orthonormal;

(vi) the row vectors of A are orthonormal.


Exercises

4.3.1. Let V be a (finite-dimensional) inner product space. If P is an orthogonal projection onto a subspace of V, prove that P² = P and P∗ = P. Conversely, if P is a linear operator on V such that P² = P and P∗ = P, show that P is an orthogonal projection onto a subspace of V.

4.3.2. Prove that a linear operator on a finite-dimensional inner product spaceis unitary if and only if it maps an orthonormal basis onto an orthonormal basis.

4.3.3. Let V be a finite-dimensional inner product space with dim V = n. Let B = {v1, . . . , vn} be an ordered orthonormal basis for V. If T is a linear operator on V, prove that

(i) the ij-entry of [T]B is 〈T(vj), vi〉 for any i, j ∈ {1, 2, . . . , n};

(ii) [T ∗]B = [T ]∗B;

(iii) T is normal if and only if [T ]B is normal;

(iv) T is self-adjoint if and only if [T ]B is self-adjoint;

(v) T is unitary if and only if [T ]B is unitary.

4.3.4. If T : V → V is a linear operator on a finite-dimensional inner product space, show that

kerT ∗ = (imT )⊥ and imT ∗ = (kerT )⊥.

4.3.5. Let f be a sesquilinear form on a finite-dimensional complex inner product space V. Prove that there is a linear operator T : V → V such that

f(x, y) = 〈Tx, y〉 for any x, y ∈ V .

Moreover, show that

(i) T is self-adjoint if and only if f is hermitian;

(ii) T is invertible if and only if f is nondegenerate.


4.3.6. Let T be a unitary linear operator on a complex inner product space V. Prove that for any subspace W of V,

T (W⊥) = T (W )⊥.

4.3.7. Show that every linear operator T on a complex inner product space V can be written uniquely in the form

T = T1 + iT2,

where T1 and T2 are self-adjoint linear operators on V. The operators T1 and T2 are called the real part and the imaginary part of T, respectively. Moreover, show that T is normal if and only if its real part and imaginary part commute.

4.3.8. Let P be a linear operator on a finite-dimensional complex inner product space V such that P² = P. Show that the following statements are equivalent:

(a) P is self-adjoint;

(b) P is normal;

(c) kerP = (imP )⊥;

(d) 〈Px, x〉 = ‖Px‖² for all x ∈ V.


4.4 Spectral Theorem

Theorem 4.4.1. Let T be a self-adjoint operator on V .

(i) Any eigenvalue of T is real.

(ii) The eigenspaces associated with distinct eigenvalues are orthogonal.

Proof. (i) Let λ be an eigenvalue of T. Then there is a nonzero vector v ∈ V such that Tv = λv. Hence

λ〈v, v〉 = 〈Tv, v〉 = 〈v, Tv〉 = λ̄〈v, v〉.

Since v ≠ 0, we have λ = λ̄. This implies that λ is real.

(ii) Let λ and µ be distinct eigenvalues of T. Let u and v be elements in V such that Tu = λu and Tv = µv. Then

λ〈u, v〉 = 〈λu, v〉 = 〈Tu, v〉 = 〈u, Tv〉 = 〈u, µv〉 = µ〈u, v〉.

In the last equality, we use the fact that an eigenvalue of a self-adjoint operator is real. Since λ ≠ µ, we have 〈u, v〉 = 0. Hence the eigenspaces associated with λ and µ are orthogonal.
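Both parts of Theorem 4.4.1 can be observed numerically. In the sketch below (NumPy; the matrix is an arbitrary hermitian example with the distinct eigenvalues 1 and 4), np.linalg.eig knows nothing about hermitian structure, so the reality of the computed eigenvalues and the orthogonality of the eigenvectors are exactly what the theorem predicts:

```python
import numpy as np

# A hermitian matrix with distinct eigenvalues 1 and 4:
A = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])

evals, evecs = np.linalg.eig(A)
print(np.allclose(evals.imag, 0))  # True: the eigenvalues are real

# Eigenvectors belonging to the two distinct eigenvalues are orthogonal:
v0, v1 = evecs[:, 0], evecs[:, 1]
print(np.isclose(np.vdot(v0, v1), 0))  # True
```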

Theorem 4.4.2. Let T be a normal operator on V .

(i) If v is an eigenvector of T corresponding to an eigenvalue λ, then v is an eigenvector of T∗ corresponding to the eigenvalue λ̄. Moreover,

ker(T − λI) = ker(T∗ − λ̄I).

(ii) The eigenspaces associated with distinct eigenvalues are orthogonal.

Proof. (i) Let v be an eigenvector of T corresponding to an eigenvalue λ. Since T is normal, it is easy to check that T − λI is normal. By Theorem 4.3.9,

0 = ‖(T − λI)v‖ = ‖(T − λI)∗v‖ = ‖(T∗ − λ̄I)v‖,

which implies T∗v = λ̄v. Hence v is an eigenvector of T∗ corresponding to the eigenvalue λ̄.


(ii) Let λ and µ be distinct eigenvalues of T. Let u and v be nonzero elements in V such that Tu = λu and Tv = µv. By (i), T∗v = µ̄v. Hence

λ〈u, v〉 = 〈λu, v〉 = 〈Tu, v〉 = 〈u, T∗v〉 = 〈u, µ̄v〉 = µ〈u, v〉.

Since λ ≠ µ, we have 〈u, v〉 = 0. Hence the eigenspaces associated with λ and µ are orthogonal.

Proposition 4.4.3. Let T be a linear operator on V. If W is a T-invariant subspace of V, then W⊥ is T∗-invariant.

Proof. Suppose T(W) ⊆ W, i.e. Tw ∈ W for any w ∈ W. If v ∈ W⊥, then

〈T∗v, w〉 = 〈v, Tw〉 = 0 for any w ∈ W.

Hence T∗v ∈ W⊥, which implies T∗(W⊥) ⊆ W⊥.

Definition 4.4.4. A linear operator T on a finite-dimensional inner product space V is said to be orthogonally diagonalizable if there is an orthonormal basis for V consisting of eigenvectors of T.

A matrix A ∈ Mn(F) is said to be orthogonally diagonalizable if there is an orthonormal basis for Fn consisting of eigenvectors of A.

Theorem 4.4.5 (Spectral theorem - complex version). A linear operator on a finite-dimensional complex inner product space is orthogonally diagonalizable if and only if it is normal.

Proof. If V has an orthonormal basis B consisting of eigenvectors of T, then [T]B is a diagonal matrix, say, [T]B = diag(λ1, . . . , λn), and thus

[T∗]B = [T]∗B = diag(λ̄1, . . . , λ̄n).

From this, it is easy to check that [T]B[T]∗B = [T]∗B[T]B. It follows that [T]B is normal, and thus T is normal.

Now assume that T is normal. We will prove the result by induction on the dimension of V. If dim V = 1, then the result is trivial. Now suppose n = dim V > 1 and assume the result holds for any complex inner product space of dimension less than n. Since V is a complex vector space, T has an eigenvalue λ because the characteristic polynomial always has a root in C. Let W be the eigenspace corresponding to λ. If W = V, then T = λI and the result follows trivially. Assume that W is a proper subspace of V. By Theorem 4.4.2, W = ker(T − λI) = ker(T∗ − λ̄I). Hence W is invariant under both T and T∗. By Proposition 4.4.3 applied to T∗, W⊥ is invariant under T∗∗ = T. This shows that both W and W⊥ are T-invariant. Thus T|W and T|W⊥ are normal operators on W and W⊥, respectively (see Theorem 4.3.9). Since V = W ⊕ W⊥ and 0 < dim W < n, we have 0 < dim W⊥ < n. By the induction hypothesis, there exist an orthonormal basis {u1, . . . , uk} for W consisting of eigenvectors of T|W and an orthonormal basis {uk+1, . . . , un} for W⊥ consisting of eigenvectors of T|W⊥. Thus {u1, u2, . . . , un} is an orthonormal basis of V consisting of eigenvectors of T.

The complex spectral theorem says that a linear operator on a complex inner product space can be diagonalized by an orthonormal basis precisely when it is normal. However, if the inner product space is real, a linear operator is diagonalized by an orthonormal basis precisely when it is self-adjoint. To prove this, we need an important lemma: a linear operator on a real inner product space may fail to have an eigenvalue, but a self-adjoint operator always has one.

Lemma 4.4.6. Let T be a self-adjoint operator on a finite-dimensional real inner product space. Then T has an eigenvalue (and an eigenvector).

Proof. Let V be a finite-dimensional real inner product space and T : V → V a self-adjoint operator. Fix an orthonormal basis B for V and let A be the matrix representation of T with respect to B. View A as a matrix in Mn(C). Since T is self-adjoint, A is self-adjoint. As a complex matrix, A must have an eigenvalue λ, which will be real by Theorem 4.4.1 (i). Thus there is an eigenvector v ∈ Cn such that Av = λv. In fact, we can choose v to be in Rn: since A is a real matrix and λ is real, A − λIn is also a real matrix, and the system (A − λIn)v = 0 has a nontrivial solution in Rn because A − λIn is singular. It follows that there is a nonzero vector v ∈ V such that Tv = λv. Hence λ is an eigenvalue of T corresponding to an eigenvector v.

Theorem 4.4.7 (Spectral theorem - real version). A linear operator on a finite-dimensional real inner product space is orthogonally diagonalizable if and only if it is self-adjoint.

Proof. Let V be a real inner product space. Assume first that V has an orthonormal basis B consisting of eigenvectors of T. Then [T]B is a diagonal matrix, say, [T]B = diag(λ1, . . . , λn), where the λi are all real. Thus

[T]∗B = diag(λ̄1, . . . , λ̄n) = diag(λ1, . . . , λn) = [T]B.

Hence [T]B is self-adjoint and thus T is self-adjoint.

Now assume that T is self-adjoint. We will prove the result by induction on the dimension of V. If dim V = 1, then the result is trivial. Now suppose dim V > 1. Then T has an eigenvalue λ by Lemma 4.4.6. Let W be the eigenspace corresponding to λ. If W = V, then T = λI and we are done, so assume that W is a proper subspace of V. Then W is invariant under T. By Proposition 4.4.3, W⊥ is invariant under T∗ = T. Hence both W and W⊥ are T-invariant, and T|W and T|W⊥ are self-adjoint operators on W and W⊥, respectively. The rest of the proof is the same as the proof of Theorem 4.4.5.

If A is a diagonalizable matrix, then there is an invertible matrix P such that P−1AP is a diagonal matrix; in fact, the columns of P are formed by eigenvectors of A. If A is a complex normal matrix or a real symmetric matrix, then A is diagonalizable, and in this case the matrix P can be chosen to be unitary in the complex case and orthogonal in the real case. This is the matrix version of the spectral theorem.

Theorem 4.4.8. A complex matrix A is orthogonally diagonalizable if and only if there is a unitary matrix P such that P−1AP is a diagonal matrix. A real matrix A is orthogonally diagonalizable if and only if there is an orthogonal matrix P such that P−1AP is a diagonal matrix.

Proof. We will give a proof for the complex case. Let A be a complex matrix. First, note that if P is an invertible matrix, then D = P−1AP is equivalent to PD = AP. Moreover, if D = diag(λ1, . . . , λn) and P = [u1 . . . un], where each ui is the i-th column of P, then

AP = [Au1 . . . Aun] and PD = [λ1u1 . . . λnun]. (4.6)


Assume that A is orthogonally diagonalizable. Then there is an orthonormal basis B = {u1, . . . , un} for Cn and λ1, . . . , λn ∈ C such that Aui = λiui for i = 1, . . . , n. Let D = diag(λ1, . . . , λn) and let P be the matrix whose i-th column is ui for each i. Then P is unitary by Theorem 4.3.11. From (4.6), we see that AP = PD, which implies P−1AP is a diagonal matrix.

Conversely, assume that there is a unitary matrix P such that D = P−1AP is a diagonal matrix. Then AP = PD. Assume that D = diag(λ1, . . . , λn) and P = [u1 . . . un], where each ui is the i-th column of P. By (4.6), it follows that Aui = λiui for i = 1, . . . , n. Hence each ui is an eigenvector of A corresponding to the eigenvalue λi. Since P is unitary, {u1, . . . , un} is an orthonormal basis for Cn by Theorem 4.3.11. This completes the proof.

Theorem 4.4.9 (Spectral theorem - matrix version). If A is a normal matrix in Mn(C), then there is a unitary matrix P such that P−1AP = D is a diagonal matrix. Hence

A = PDP−1 = PDP ∗.

If A is a real symmetric matrix, then there is an orthogonal matrix P such that P−1AP = D is a diagonal matrix. Hence

A = PDP−1 = PDPᵗ.

Proof. Let A be a complex normal matrix. Then LA is a normal operator on Cn. By the Spectral theorem (Theorem 4.4.5), there is an orthonormal basis for Cn consisting of eigenvectors of LA (which are eigenvectors of A). By Theorem 4.4.8, there is a unitary matrix P such that P−1AP = D is a diagonal matrix. The proof for a real symmetric matrix is the same and will be omitted.
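Computationally, the matrix version of the spectral theorem is what np.linalg.eigh delivers for a hermitian or real symmetric input: real eigenvalues together with an orthonormal eigenbasis. A minimal sketch, with an arbitrary symmetric matrix chosen purely for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])  # real symmetric

evals, P = np.linalg.eigh(A)  # orthonormal eigenbasis in the columns of P
D = np.diag(evals)

print(np.allclose(P.T @ P, np.eye(3)))  # True: P is orthogonal
print(np.allclose(A, P @ D @ P.T))      # True: A = P D P^t = P D P^{-1}
```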

Example. Define

A = ( 1   2 )
    ( 2  −2 ).

Find an orthogonal matrix P and a diagonal matrix D such that A = PDP−1.

Solution. The characteristic polynomial is

χA(x) = det ( x − 1    −2   )
            (  −2     x + 2 )
      = x² + x − 6.


The eigenvalues of A are −3 and 2. Moreover,

V−3 = 〈(1, −2)〉 and V2 = 〈(2, 1)〉.

Thus B = {(1/√5, −2/√5), (2/√5, 1/√5)} is an orthonormal basis for R² consisting of eigenvectors of A. Let

P = (  1/√5   2/√5 )
    ( −2/√5   1/√5 )

and

D = ( −3  0 )
    (  0  2 ).

Then P is an orthogonal matrix such that A = PDP−1.

Example. Define

A = ( 5  4  2 )
    ( 4  5  2 )
    ( 2  2  2 ).

Find an orthogonal matrix P and a diagonal matrix D such that A = PDP−1.

Solution. Solving the equation det(xI3 − A) = 0, we have x = 1, 1, 10; the eigenvalues of A are 1 (with multiplicity 2) and 10. Hence

V1 = 〈(1, 0, −2), (0, 1, −2)〉 and V10 = 〈(2, 2, 1)〉.

Note that V1 ⊥ V10, so we only have to choose two orthonormal vectors from V1 by applying the Gram-Schmidt process to its spanning vectors. Let x1 = (1, 0, −2) and x2 = (0, 1, −2). Let u1 = x1/‖x1‖ = (1/√5, 0, −2/√5). Write

z2 = x2 − 〈x2, u1〉u1 = (−4/5, 1, −2/5)

and u2 = z2/‖z2‖ = (−4/√45, 5/√45, −2/√45). Moreover, let

u3 = (2, 2, 1)/‖(2, 2, 1)‖ = (2/3, 2/3, 1/3).

Let

P = (  1/√5   −4/√45   2/3 )
    (   0      5/√45   2/3 )
    ( −2/√5   −2/√45   1/3 )

and

D = ( 1  0   0 )
    ( 0  1   0 )
    ( 0  0  10 ).

Then P is an orthogonal matrix and A = PDP−1.
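The orthonormalization step above is the Gram-Schmidt process. A minimal NumPy sketch of the same computation (gram_schmidt is our own helper; the classical variant shown is fine for small exact examples like this one):

```python
import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize a list of linearly independent real vectors by
    # subtracting projections onto the previously built unit vectors.
    basis = []
    for x in vectors:
        z = x.astype(float)
        for u in basis:
            z = z - np.dot(x, u) * u
        basis.append(z / np.linalg.norm(z))
    return basis

u1, u2 = gram_schmidt([np.array([1, 0, -2]), np.array([0, 1, -2])])
u3 = np.array([2, 2, 1]) / np.linalg.norm([2, 2, 1])

A = np.array([[5, 4, 2], [4, 5, 2], [2, 2, 2]])
P = np.column_stack([u1, u2, u3])
print(np.allclose(P.T @ A @ P, np.diag([1, 1, 10])))  # True
```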


Remark. 1. A linear operator or a square matrix can be diagonalizable but not orthogonally diagonalizable. For example, let

A = ( 1  1 )
    ( 0  2 ).

It is easy to show that A has eigenvalues 1 and 2 and that V1 = span{(1, 0)} and V2 = span{(1, 1)}. Hence {(1, 0), (1, 1)} is a basis for R² consisting of eigenvectors of A. However, every eigenvector of A lies in V1 or V2, and no nonzero vector of V1 is orthogonal to a nonzero vector of V2. Hence there is no orthonormal basis for R² consisting of eigenvectors of A.

2. A real matrix can be orthogonally diagonalizable over C, but not over R. For example, consider the following real orthogonal matrix:

A = ( cos θ  −sin θ )
    ( sin θ   cos θ )

where θ is a real number. Note that A, regarded as a complex matrix, is unitary and hence normal. Thus A is orthogonally diagonalizable over C. Its eigenvalues are e^{iθ} and e^{−iθ}, with corresponding eigenspaces 〈(1, −i)〉 and 〈(1, i)〉, respectively. Then

( 1   1 )−1 ( cos θ  −sin θ ) ( 1   1 )   ( e^{iθ}     0     )
( −i  i )   ( sin θ   cos θ ) ( −i  i ) = (   0     e^{−iθ}  ).

However, the only real matrices that are orthogonally diagonalizable over R are the symmetric ones. Hence, whenever sin θ ≠ 0, the matrix A is not symmetric and therefore not orthogonally diagonalizable over R.
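This remark can be replayed numerically. In the sketch below (NumPy; θ = 0.7 is an arbitrary angle with sin θ ≠ 0, and the eigenvector matrix is scaled by 1/√2 so that it is unitary):

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Over C, the normalized eigenvectors (1, -i)/sqrt(2) and (1, i)/sqrt(2)
# form a unitary P that diagonalizes A:
P = np.array([[1, 1], [-1j, 1j]]) / np.sqrt(2)
D = np.diag([np.exp(1j * theta), np.exp(-1j * theta)])

print(np.allclose(P.conj().T @ P, np.eye(2)))  # True: P is unitary
print(np.allclose(A, P @ D @ P.conj().T))      # True: A = P D P*

# Over R there is nothing to diagonalize: the eigenvalues are not real.
print(np.linalg.eigvals(A))  # e^{i*theta}, e^{-i*theta}
```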

Definition 4.4.10. Let T be a linear operator on an inner product space V. We say that T is positive or positive semi-definite if T is self-adjoint and

〈Tx, x〉 ≥ 0 for any x ∈ V .

Moreover, we say that T is positive definite if T is self-adjoint and

〈Tx, x〉 > 0 for any x ≠ 0.

A positive (semi-definite) matrix and a positive definite matrix can be defined analogously.


Note that if V is a complex inner product space, then by Theorem 4.3.7 (iii) we can drop the assumption that T is self-adjoint: the condition 〈Tx, x〉 ≥ 0 for all x ∈ V already forces 〈Tx, x〉 ∈ R for all x, so T is automatically self-adjoint.

Example. The following matrix is positive definite, as can be readily checked:

A = (  2  −1 )
    ( −1   2 ).

Theorem 4.4.11. Let T be a linear operator on a finite-dimensional inner product space V. Then TFAE:

(i) T is positive;

(ii) T is self-adjoint and all eigenvalues of T are nonnegative;

(iii) T = P 2 for some self-adjoint operator P ;

(iv) T = S∗S for some linear operator S.

Proof. (i) ⇒ (ii). Assume that T is positive. Clearly, T is self-adjoint. Let λ be an eigenvalue of T. Then Tx = λx for some nonzero x ∈ V. Thus

λ‖x‖² = 〈λx, x〉 = 〈Tx, x〉 ≥ 0,

which implies λ ≥ 0.

(ii) ⇒ (iii). Assume T is self-adjoint and all eigenvalues of T are nonnegative. By the Spectral theorem, there is an orthonormal basis B = {u1, . . . , un} for V consisting of eigenvectors of T. Assume that Tuj = λjuj for j = 1, . . . , n. Then λj ≥ 0 for all j. Define Puj = √λj uj for j = 1, . . . , n and extend P to a linear operator on V. Clearly,

P²uj = P(√λj uj) = λjuj = Tuj for j = 1, . . . , n.

Hence P² = T on the basis B for V, and it follows that P² = T on V. Note that [P]B = diag(√λ1, . . . , √λn). Thus [P]B is a self-adjoint matrix, which implies P is a self-adjoint operator.

(iii) ⇒ (iv). If T = P², where P is a self-adjoint operator, then

P∗P = PP = P² = T.


(iv) ⇒ (i). If T = S∗S, then T∗ = (S∗S)∗ = S∗S = T, and

〈Tx, x〉 = 〈S∗Sx, x〉 = 〈Sx, Sx〉 = ‖Sx‖² ≥ 0

for any x ∈ V . Hence T is positive.
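The proof of (ii) ⇒ (iii) is constructive: diagonalize, take square roots of the eigenvalues, and transform back. A minimal NumPy sketch of this construction (positive_sqrt is our own name; the clip call merely guards against tiny negative rounding errors in the computed eigenvalues):

```python
import numpy as np

def positive_sqrt(T):
    # Positive square root of a positive hermitian matrix, exactly as
    # in the proof: P acts as sqrt(lambda_j) on each eigenspace.
    evals, Q = np.linalg.eigh(T)
    return Q @ np.diag(np.sqrt(np.clip(evals, 0, None))) @ Q.conj().T

T = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
S = positive_sqrt(T)
print(np.allclose(S @ S, T))       # True: S^2 = T
print(np.allclose(S, S.conj().T))  # True: S is self-adjoint
```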

Similarly, we can establish the following corollary for positive definiteness:

Corollary 4.4.12. Let T be a linear operator on a finite-dimensional inner product space V. Then TFAE:

(i) T is positive definite;

(ii) T is self-adjoint and all eigenvalues of T are positive;

(iii) T = P 2 for some self-adjoint invertible operator P ;

(iv) T = S∗S for some invertible linear operator S;

(v) T is positive and invertible.

Proof. Exercise.

Remark. The operator P in Theorem 4.4.11 (iii) is in fact a positive operator, as can be seen from the proof. It is called a positive square root of T. Although a linear operator can have many square roots, a positive operator has a unique positive square root.

Proposition 4.4.13. A positive operator has a unique positive square root.

Proof. Let T be a positive operator on V. Let λ1, . . . , λk be the distinct eigenvalues of T. Since T is positive, λi ≥ 0 for i = 1, . . . , k. Since T is self-adjoint, it is diagonalizable. Hence

V = ker(T − λ1I) ⊕ · · · ⊕ ker(T − λkI).

Let P be a positive operator such that P² = T. If α is an eigenvalue of P with corresponding eigenvector v, then Pv = αv, which implies Tv = P²v = α²v. Hence α² = λj for some j ∈ {1, . . . , k}. Since α ≥ 0, it follows that α = √λj, and that

ker(P − √λj I) ⊆ ker(T − λjI).


This shows that the only possible eigenvalues of P are √λ1, . . . , √λk. Since P is self-adjoint, it is diagonalizable, and thus

V = ker(P − √λ1 I) ⊕ · · · ⊕ ker(P − √λk I).

Comparing this with the decomposition of V above, the inclusions force

ker(P − √λj I) = ker(T − λjI) for j = 1, . . . , k.

Hence on each subspace ker(T − λjI) of V, P = √λj I. Thus the positive square root P of T is uniquely determined.

Theorem 4.4.14 (Polar Decomposition). Let T be an invertible operator on V. Then T = UP, where U is a unitary operator and P is a positive definite operator. Moreover, the operators U and P are uniquely determined.

Proof. Since T∗T is positive, there is a (unique) positive square root P such that P² = T∗T. Since T is invertible, so is P. By Corollary 4.4.12, P is positive definite. Let U = TP−1. Then

U∗U = (TP−1)∗(TP−1) = (P−1)∗T∗TP−1 = P−1P²P−1 = I.

Hence U is unitary.

Suppose T = U1P1 = U2P2, where U1, U2 are unitary and P1, P2 are positive definite. Then

T∗T = (P1∗U1∗)(U1P1) = P1∗IP1 = P1².

Similarly, T∗T = P2². But the positive square root of T∗T is unique. Hence P1 = P2. It follows that U1P1 = U2P2 = U2P1. Since P1 is invertible, we have U1 = U2.
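The proof is again constructive: take P to be the positive square root of T∗T and set U = TP−1. A minimal NumPy sketch for an invertible real matrix (polar is our own name; readers with SciPy can compare against scipy.linalg.polar):

```python
import numpy as np

def polar(T):
    # Polar decomposition T = U P of an invertible matrix, following
    # the proof above: P = sqrt(T* T), U = T P^{-1}.
    evals, Q = np.linalg.eigh(T.conj().T @ T)  # T*T is positive definite
    P = Q @ np.diag(np.sqrt(evals)) @ Q.conj().T
    U = T @ np.linalg.inv(P)
    return U, P

T = np.array([[1.0, 2.0],
              [0.0, 3.0]])
U, P = polar(T)
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: U is unitary
print(np.allclose(U @ P, T))                   # True: T = U P
```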

Corollary 4.4.15. Let A be an invertible matrix in Mn(F). Then A = UP, where U is a unitary (orthogonal if F = R) matrix and P is a positive definite matrix. Moreover, the matrices U and P are uniquely determined.


Exercises

4.4.1. Given

A = (  0   2  −1 )
    (  2   3  −2 )
    ( −1  −2   0 ),

find an orthogonal matrix P that diagonalizes A.

4.4.2. Let T be a normal operator on a complex finite-dimensional inner product space and let σ(T) denote the set of eigenvalues of T. Prove that

(a) T is self-adjoint if and only if σ(T ) ⊆ R;

(b) T is unitary if and only if σ(T) ⊆ {z ∈ C : |z| = 1}.

4.4.3. Let T be a self-adjoint operator on a finite-dimensional inner product space such that tr(T²) = 0. Prove that T = 0.

4.4.4. Show that if T is a self-adjoint, nilpotent operator on a finite-dimensional inner product space, then T = 0.

4.4.5. Let T be a normal (symmetric) operator on a complex (real) finite-dimensional inner product space V. Show that there exist orthogonal projections E1, . . . , En on V and scalars λ1, . . . , λn such that

(i) EiEj = 0 for i ≠ j;

(ii) I = E1 + · · ·+ En;

(iii) T = λ1E1 + · · ·+ λnEn.

Conversely, if there exist orthogonal projections E1, . . . , En and scalars λ1, . . . , λn satisfying (i)-(iii) above, show that T is normal.

4.4.6. Let a1, . . . , an, b1, . . . , bn ∈ F for some n ∈ N, where a1, . . . , an are distinct. Show that there is a polynomial p(x) ∈ F[x] such that p(ai) = bi for i = 1, . . . , n. This is called the Lagrange Interpolation Theorem.

4.4.7. Let T be a linear operator on a finite-dimensional complex inner product space. Show that T is normal if and only if there is a polynomial p ∈ C[x] such that T∗ = p(T).


4.4.8. Let S and T be linear operators on a finite-dimensional inner product space. Prove that

(i) if S and T are positive, then so is S + T ;

(ii) if T is positive and c ≥ 0, then so is cT ;

(iii) any orthogonal projection is positive;

(iv) if T is positive, then S∗TS is positive;

(v) if T is a linear operator, then TT ∗ and T ∗T are positive;

(vi) if T is positive and invertible, then so is T−1.

4.4.9. Prove Corollary 4.4.12.

4.4.10. Let T be a self-adjoint operator on a finite-dimensional inner product space. Prove that T is positive definite if and only if T is invertible and T−1 is positive definite.

4.4.11. Let A be an n × n real symmetric matrix. Prove there is an α ∈ R such that A + αIn is positive definite.

4.4.12. Let A be the 2 × 2 real symmetric matrix

A = ( a  b )
    ( b  d ).

Prove that A is positive definite if and only if a > 0 and det A > 0.

4.4.13. Find a square root of the matrix

A = ( 1  3  −3 )
    ( 0  4   5 )
    ( 0  0   9 ).

4.4.14. Let T be a positive operator on a finite-dimensional inner product space V. Prove that

|〈Tx, y〉|² ≤ 〈Tx, x〉〈Ty, y〉 for all x, y ∈ V.

