
A Concise Text on Advanced Linear Algebra

This engaging textbook for advanced undergraduate students and beginning graduates covers the core subjects in linear algebra. The author motivates the concepts by drawing clear links to applications and other important areas.

The book places particular emphasis on integrating ideas from analysis wherever appropriate and features many novelties in its presentation. For example, the notion of determinant is shown to appear from calculating the index of a vector field which leads to a self-contained proof of the Fundamental Theorem of Algebra; the Cayley–Hamilton theorem is established by recognizing the fact that the set of complex matrices of distinct eigenvalues is dense; the existence of a real eigenvalue of a self-adjoint map is deduced by the method of calculus; the construction of the Jordan decomposition is seen to boil down to understanding nilpotent maps of degree two; and a lucid and elementary introduction to quantum mechanics based on linear algebra is given.

The material is supplemented by a rich collection of over 350 mostly proof-oriented exercises, suitable for readers from a wide variety of backgrounds. Selected solutions are provided at the back of the book, making it ideal for self-study as well as for use as a course text.


YISONG YANG
Polytechnic School of Engineering, New York University

University Printing House, Cambridge CB2 8BS, United Kingdom

Cambridge University Press is part of the University of Cambridge.

It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107087514

© Yisong Yang 2015

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2015

Printed in the United Kingdom by Clays, St Ives plc

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Yang, Yisong.
A concise text on advanced linear algebra / Yisong Yang, Polytechnic School of Engineering, New York University.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-08751-4 (Hardback) – ISBN 978-1-107-45681-5 (Paperback)
1. Algebras, Linear–Textbooks. 2. Algebras, Linear–Study and teaching (Higher). 3. Algebras, Linear–Study and teaching (Graduate). I. Title. II. Title: Advanced linear algebra.
QA184.2.Y36 2015
512′.5–dc23 2014028951

ISBN 978-1-107-08751-4 Hardback
ISBN 978-1-107-45681-5 Paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

For Sheng, Peter, Anna, and Julia

Contents

Preface
Notation and convention

1 Vector spaces
1.1 Vector spaces
1.2 Subspaces, span, and linear dependence
1.3 Bases, dimensionality, and coordinates
1.4 Dual spaces
1.5 Constructions of vector spaces
1.6 Quotient spaces
1.7 Normed spaces

2 Linear mappings
2.1 Linear mappings
2.2 Change of basis
2.3 Adjoint mappings
2.4 Quotient mappings
2.5 Linear mappings from a vector space into itself
2.6 Norms of linear mappings

3 Determinants
3.1 Motivational examples
3.2 Definition and properties of determinants
3.3 Adjugate matrices and Cramer’s rule
3.4 Characteristic polynomials and Cayley–Hamilton theorem

4 Scalar products
4.1 Scalar products and basic properties
4.2 Non-degenerate scalar products
4.3 Positive definite scalar products
4.4 Orthogonal resolutions of vectors
4.5 Orthogonal and unitary versus isometric mappings

5 Real quadratic forms and self-adjoint mappings
5.1 Bilinear and quadratic forms
5.2 Self-adjoint mappings
5.3 Positive definite quadratic forms, mappings, and matrices
5.4 Alternative characterizations of positive definite matrices
5.5 Commutativity of self-adjoint mappings
5.6 Mappings between two spaces

6 Complex quadratic forms and self-adjoint mappings
6.1 Complex sesquilinear and associated quadratic forms
6.2 Complex self-adjoint mappings
6.3 Positive definiteness
6.4 Commutative self-adjoint mappings and consequences
6.5 Mappings between two spaces via self-adjoint mappings

7 Jordan decomposition
7.1 Some useful facts about polynomials
7.2 Invariant subspaces of linear mappings
7.3 Generalized eigenspaces as invariant subspaces
7.4 Jordan decomposition theorem

8 Selected topics
8.1 Schur decomposition
8.2 Classification of skewsymmetric bilinear forms
8.3 Perron–Frobenius theorem for positive matrices
8.4 Markov matrices

9 Excursion: Quantum mechanics in a nutshell
9.1 Vectors in $\mathbb{C}^n$ and Dirac bracket
9.2 Quantum mechanical postulates
9.3 Non-commutativity and uncertainty principle
9.4 Heisenberg picture for quantum mechanics

Solutions to selected exercises
Bibliographic notes
References
Index

Preface

This book is concisely written to provide comprehensive core materials for a year-long course in Linear Algebra for senior undergraduate and beginning graduate students in mathematics, science, and engineering. Students who gain profound understanding and grasp of the concepts and methods of this course will acquire an essential knowledge foundation to excel in their future academic endeavors.

Throughout the book, methods and ideas of analysis are greatly emphasized and used, along with those of algebra, wherever appropriate, and a delicate balance is cast between abstract formulation and practical origins of various subject matters.

The book is divided into nine chapters. The first seven chapters embody a traditional course curriculum. An outline of the contents of these chapters is sketched as follows.

In Chapter 1 we cover basic facts and properties of vector spaces. These include definitions of vector spaces and subspaces, concepts of linear dependence, bases, coordinates, dimensionality, dual spaces and dual bases, quotient spaces, normed spaces, and the equivalence of the norms of a finite-dimensional normed space.

In Chapter 2 we cover linear mappings between vector spaces. We start from the definition of linear mappings and discuss how linear mappings may be concretely represented by matrices with respect to given bases. We then introduce the notion of adjoint mappings and quotient mappings. Linear mappings from a vector space into itself comprise a special but important family of mappings and are given a separate treatment later in this chapter. Topics studied there include invariance and reducibility, eigenvalues and eigenvectors, projections, nilpotent mappings, and polynomials of linear mappings. We end the chapter with a discussion of the concept of the norms of linear mappings and use it to show that being invertible is a generic property of a linear mapping and then to show how the exponential of a linear mapping may be constructed and understood.

In Chapter 3 we cover determinants. As a non-traditional but highly motivating example, we show that the calculation of the topological degree of a differentiable map from a closed curve into the unit circle in $\mathbb{R}^2$ involves computing a two-by-two determinant, and the knowledge gained allows us to prove the Fundamental Theorem of Algebra. We then formulate the definition of a general determinant inductively, without resorting to the notion of permutations, and establish all its properties. We end the chapter by establishing the Cayley–Hamilton theorem. Two independent proofs of this important theorem are given. The first proof is analytic and consists of two steps. In the first step, we show that the theorem is valid for a matrix of distinct eigenvalues. In the second step, we show that any matrix may be regarded as a limiting point of a sequence of matrices of distinct eigenvalues. Hence the theorem follows again by taking the limit. The second proof, on the other hand, is purely algebraic.

In Chapter 4 we discuss vector spaces with scalar products. We start from the most general notion of scalar products without requiring either non-degeneracy or positive definiteness. We then carry out detailed studies on non-degenerate and positive definite scalar products, respectively, and elaborate on adjoint mappings in terms of scalar products. We end the chapter with a discussion of isometric mappings in both real and complex space settings and noting their subtle differences.

In Chapter 5 we focus on real vector spaces with positive definite scalar products and quadratic forms. We first establish the main spectral theorem for self-adjoint mappings. We will not take the traditional path of first using the Fundamental Theorem of Algebra to assert that there is an eigenvalue and then applying the self-adjointness to show that the eigenvalue must be real. Instead we shall formulate an optimization problem and use calculus to prove directly that a self-adjoint mapping must have a real eigenvalue. We then present a series of characteristic conditions for a symmetric bilinear form, a symmetric matrix, or a self-adjoint mapping, to be positive definite. We end the chapter by a discussion of the commutativity of self-adjoint mappings and the usefulness of self-adjoint mappings for the investigation of linear mappings between different spaces.

In Chapter 6 we study complex vector spaces with Hermitian scalar products and related notions. Much of the theory here is parallel to that of the real space situation with the exception that normal mappings can only be fully understood and appreciated within a complex space formalism.

In Chapter 7 we establish the Jordan decomposition theorem. We start with a discussion of some basic facts regarding polynomials. We next show how to reduce a linear mapping over its generalized eigenspaces via the Cayley–Hamilton theorem and the prime factorization of the characteristic polynomial of the mapping. We then prove the Jordan decomposition theorem. The key and often the most difficult step in this construction is a full understanding of how a nilpotent mapping is reduced canonically. We approach this problem inductively with the degree of a nilpotent mapping and show that it is crucial to tackle a mapping of degree 2. Such a treatment eases the subtlety of the subject considerably.

In Chapter 8 we present four selected topics that may be used as materials for some optional extra-curricular study when time and interest permit. In the first section we present the Schur decomposition theorem, which may be viewed as a complement to the Jordan decomposition theorem. In the second section we give a classification of skewsymmetric bilinear forms. In the third section we state and prove the Perron–Frobenius theorem regarding the principal eigenvalues of positive matrices. In the fourth section we establish some basic properties of the Markov matrices.

In Chapter 9 we present yet another selected topic for the purpose of optional extra-curricular study: a short excursion into quantum mechanics using gadgets purely from linear algebra. Specifically we will use $\mathbb{C}^n$ as the state space and Hermitian matrices as quantum mechanical observables to formulate the over-simplified quantum mechanical postulates including Bohr’s statistical interpretation of quantum mechanics and the Schrödinger equation governing the time evolution of a state. We next establish Heisenberg’s uncertainty principle. Then we prove the equivalence of the Schrödinger description via the Schrödinger equation and the Heisenberg description via the Heisenberg equation of quantum mechanics.

Also provided in the book is a rich collection of mostly proof-oriented exercises to supplement and consolidate the main course materials. The diversity and elasticity of these exercises aim to satisfy the needs and interests of students from a wide variety of backgrounds.

At the end of the book, solutions to some selected exercises are presented. These exercises and solutions provide additional illustrative examples, extend main course materials, and render convenience for the reader to master the subjects and methods covered in a broader range.

Finally some bibliographic notes conclude the book.

This text may be curtailed to meet the time constraint of a semester-long course. Here is a suggested list of selected sections for such a plan: Sections 1.1–1.5, 2.1–2.3, 2.5, 3.1.2, 3.2, and 3.3 (present the concept of adjugate matrices only), Section 3.4 (give the second proof of the Cayley–Hamilton theorem only, based on an adjugate matrix expansion), Sections 4.3, 4.4, 5.1, 5.2 (omit the analytic proof that a self-adjoint mapping must have an eigenvalue but resort to Exercise 5.2.1 instead), Sections 5.3, 6.1, 6.2, 6.3.1, and 7.1–7.4. Depending on the pace of lectures and time available, the instructor may decide in the later stage of the course to what extent the topics in Sections 7.1–7.4 (the Jordan decomposition) can be presented productively.

The author would like to take this opportunity to thank Patrick Lin, Thomas Otway, and Robert Sibner for constructive comments and suggestions, and Roger Astley of Cambridge University Press for valuable editorial advice, which helped improve the presentation of the book.

West Windsor, New Jersey Yisong Yang

Notation and convention

We use $\mathbb{N}$ to denote the set of all natural numbers,

\[ \mathbb{N} = \{0, 1, 2, \dots\}, \]

and $\mathbb{Z}$ the set of all integers,

\[ \mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}. \]

We use $i$ to denote the imaginary unit $\sqrt{-1}$. For a complex number $c = a + ib$ where $a, b$ are real numbers we use

\[ \bar{c} = a - ib \]

to denote the complex conjugate of $c$. We use $\Re\{c\}$ and $\Im\{c\}$ to denote the real and imaginary parts of the complex number $c = a + ib$. That is,

\[ \Re\{c\} = a, \quad \Im\{c\} = b. \]

We use $i, j, k, l, m, n$ to denote integer-valued indices or space dimension numbers, $a, b, c$ scalars, $u, v, w, x, y, z$ vectors, $A, B, C, D$ matrices, $P, R, S, T$ mappings, and $U, V, W, X, Y, Z$ vector spaces, unless otherwise stated.

We use $t$ to denote the variable in a polynomial or a function or the transpose operation on a vector or a matrix.

When $X$ or $Y$ is given, we use $X \equiv Y$ to denote that $Y$, or $X$, is defined to be $X$, or $Y$, respectively.

Occasionally, we use the symbol $\forall$ to express ‘for all’.

Let $X$ be a set and $Y, Z$ subsets of $X$. We use $Y \setminus Z$ to denote the subset of elements in $Y$ which are not in $Z$.


1

Vector spaces

In this chapter we study vector spaces and their basic properties and structures. We start by stating the definition and a discussion of the examples of vector spaces. We next introduce the notions of subspaces, linear dependence, bases, coordinates, and dimensionality. We then consider dual spaces, direct sums, and quotient spaces. Finally we cover normed vector spaces.

1.1 Vector spaces

A vector space is a non-empty set consisting of elements called vectors which can be added and multiplied by some quantities called scalars. In this section, we start with a study of vector spaces.

1.1.1 Fields

The scalars to operate on vectors in a vector space are required to form a field, which may be denoted by $\mathbb{F}$, where two operations, usually called addition, denoted by ‘+’, and multiplication, denoted by ‘·’ or omitted, over $\mathbb{F}$ are performed between scalars, such that the following axioms are satisfied.

(1) (Closure) If $a, b \in \mathbb{F}$, then $a + b \in \mathbb{F}$ and $ab \in \mathbb{F}$.
(2) (Commutativity) For $a, b \in \mathbb{F}$, there hold $a + b = b + a$ and $ab = ba$.
(3) (Associativity) For $a, b, c \in \mathbb{F}$, there hold $(a + b) + c = a + (b + c)$ and $a(bc) = (ab)c$.
(4) (Distributivity) For $a, b, c \in \mathbb{F}$, there holds $a(b + c) = ab + ac$.
(5) (Existence of zero) There is a scalar, called zero, denoted by 0, such that $a + 0 = a$ for any $a \in \mathbb{F}$.
(6) (Existence of unity) There is a scalar different from zero, called one, denoted by 1, such that $1a = a$ for any $a \in \mathbb{F}$.
(7) (Existence of additive inverse) For any $a \in \mathbb{F}$, there is a scalar, denoted by $-a$ or $(-a)$, such that $a + (-a) = 0$.
(8) (Existence of multiplicative inverse) For any $a \in \mathbb{F} \setminus \{0\}$, there is a scalar, denoted by $a^{-1}$, such that $aa^{-1} = 1$.

It is easily seen that zero, unity, additive and multiplicative inverses are allunique. Besides, a field consists of at least two elements.

With the usual addition and multiplication, the sets of rational numbers, real numbers, and complex numbers, denoted by $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, respectively, are all fields. These fields are infinite fields. However, the set of integers, $\mathbb{Z}$, is not a field because there is a lack of multiplicative inverses for its non-unit elements.

Let $p$ be a prime ($p = 2, 3, 5, \dots$) and set $p\mathbb{Z} = \{n \in \mathbb{Z} \mid n = kp,\ k \in \mathbb{Z}\}$. Classify $\mathbb{Z}$ into the so-called cosets modulo $p\mathbb{Z}$, that is, some non-overlapping subsets of $\mathbb{Z}$ represented as $[i]$ ($i \in \mathbb{Z}$) such that

\[ [i] = \{j \in \mathbb{Z} \mid i - j \in p\mathbb{Z}\}. \tag{1.1.1} \]

It is clear that $\mathbb{Z}$ is divided into exactly $p$ cosets, $[0], [1], \dots, [p - 1]$. Use $\mathbb{Z}_p$ to denote the set of these cosets and pass the additive and multiplicative operations in $\mathbb{Z}$ over naturally to the elements in $\mathbb{Z}_p$ so that

\[ [i] + [j] = [i + j], \quad [i][j] = [ij]. \tag{1.1.2} \]

It can be verified that, with these operations, $\mathbb{Z}_p$ becomes a field with its obvious zero and unit elements, $[0]$ and $[1]$. Of course, $p[1] = [1] + \cdots + [1]$ ($p$ terms) $= [p] = [0]$. In fact, $p$ is the smallest positive integer whose multiplication with unit element results in zero element. A number of such a property is called the characteristic of the field. Thus, $\mathbb{Z}_p$ is a field of characteristic $p$. For $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{C}$, since no such integer exists, we say that these fields are of characteristic 0.
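The coset arithmetic in (1.1.2) can be tried out on a computer. The following is a minimal sketch, assuming each coset $[i]$ is stored as its representative in $\{0, 1, \dots, p - 1\}$; the function names are our own and hypothetical, and the inverse is found by plain search rather than by any more efficient method.

```python
def add_mod(i, j, p):
    """Coset sum [i] + [j] = [i + j], as a representative in 0..p-1."""
    return (i + j) % p


def mul_mod(i, j, p):
    """Coset product [i][j] = [ij], as a representative in 0..p-1."""
    return (i * j) % p


def neg_mod(i, p):
    """Additive inverse -[i], i.e. the coset [k] with [i] + [k] = [0]."""
    return (-i) % p


def inv_mod(i, p):
    """Multiplicative inverse of [i] != [0], found by searching 1..p-1."""
    if i % p == 0:
        raise ZeroDivisionError("[0] has no multiplicative inverse")
    return next(j for j in range(1, p) if mul_mod(i, j, p) == 1)


def characteristic(p):
    """Smallest k >= 1 such that k[1] = [1] + ... + [1] (k terms) = [0]."""
    total, k = 1 % p, 1
    while total != 0:
        total = add_mod(total, 1, p)
        k += 1
    return k
```

For instance, with $p = 7$ one may call `inv_mod(3, 7)` and confirm the result by multiplying back with `mul_mod`. The search in `inv_mod` only terminates successfully because $p$ is assumed prime.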

1.1.2 Vector spaces

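Let $\mathbb{F}$ be a field. Consider the set of $n$-tuples, denoted by $\mathbb{F}^n$, with elements called vectors arranged in row or column forms such as

\[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \quad \text{or} \quad (a_1, \dots, a_n) \quad \text{where } a_1, \dots, a_n \in \mathbb{F}. \tag{1.1.3} \]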

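Furthermore, we can define the addition of two vectors and the scalar multiplication of a vector by a scalar following the rules such as

\[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} + \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{pmatrix}, \tag{1.1.4} \]

\[ \alpha \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} \alpha a_1 \\ \vdots \\ \alpha a_n \end{pmatrix} \quad \text{where } \alpha \in \mathbb{F}. \tag{1.1.5} \]

The set $\mathbb{F}^n$, modeled over the field $\mathbb{F}$ and equipped with the above operations, is a prototype example of a vector space.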

More generally, we say that a set $U$ is a vector space over a field $\mathbb{F}$ if $U$ is non-empty and there is an operation called addition, denoted by ‘+’, between the elements of $U$, called vectors, and another operation called scalar multiplication between elements in $\mathbb{F}$, called scalars, and vectors, such that the following axioms hold.

(1) (Closure) For $u, v \in U$, we have $u + v \in U$. For $u \in U$ and $a \in \mathbb{F}$, we have $au \in U$.
(2) (Commutativity) For $u, v \in U$, we have $u + v = v + u$.
(3) (Associativity of addition) For $u, v, w \in U$, we have $u + (v + w) = (u + v) + w$.
(4) (Existence of zero vector) There is a vector, called zero and denoted by 0, such that $u + 0 = u$ for any $u \in U$.
(5) (Existence of additive inverse) For any $u \in U$, there is a vector, denoted as $(-u)$, such that $u + (-u) = 0$.
(6) (Associativity of scalar multiplication) For any $a, b \in \mathbb{F}$ and $u \in U$, we have $a(bu) = (ab)u$.
(7) (Property of unit scalar) For any $u \in U$, we have $1u = u$.
(8) (Distributivity) For any $a, b \in \mathbb{F}$ and $u, v \in U$, we have $(a + b)u = au + bu$ and $a(u + v) = au + av$.

As in the case of the definition of a field, we see that it readily follows from the definition that zero vector and additive inverse vectors are all unique in a vector space. Besides, any vector multiplied by zero scalar results in zero vector. That is, $0u = 0$ for any $u \in U$.
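As a sketch of how the last claim follows from the axioms (one possible route, not necessarily the argument the text has in mind), use distributivity (8) together with $0 + 0 = 0$ in the field, and then add the additive inverse of $0u$ to both sides. Here 0 stands for the zero scalar when it multiplies $u$ and for the zero vector otherwise.

\[
\begin{aligned}
0u &= (0 + 0)u = 0u + 0u, \\
0 &= 0u + (-(0u)) = (0u + 0u) + (-(0u)) = 0u + \bigl(0u + (-(0u))\bigr) = 0u + 0 = 0u.
\end{aligned}
\]

The second line uses the additive inverse (5), the first line, associativity of addition (3), the additive inverse (5) again, and the zero vector (4), in that order.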

Other examples of vector spaces (with obviously defined vector addition and scalar multiplication) include the following.

(1) The set of all polynomials with coefficients in $\mathbb{F}$ defined by

\[ \mathcal{P} = \{a_0 + a_1 t + \cdots + a_n t^n \mid a_0, a_1, \dots, a_n \in \mathbb{F},\ n \in \mathbb{N}\}, \tag{1.1.6} \]

where $t$ is a variable parameter.

(2) The set of all real-valued continuous functions over the interval $[a, b]$ for $a, b \in \mathbb{R}$ and $a < b$, usually denoted by $C[a, b]$.

(3) The set of real-valued solutions to the differential equation

\[ a_n \frac{d^n x}{dt^n} + \cdots + a_1 \frac{dx}{dt} + a_0 x = 0, \quad a_0, a_1, \dots, a_n \in \mathbb{R}. \tag{1.1.7} \]

(4) In addition, we can also consider the set of arrays of scalars in $\mathbb{F}$ consisting of $m$ rows of vectors in $\mathbb{F}^n$ or $n$ columns of vectors in $\mathbb{F}^m$ of the form

\[ (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \tag{1.1.8} \]

where $a_{ij} \in \mathbb{F}$, $i = 1, \dots, m$, $j = 1, \dots, n$, called an $m$ by $n$ or $m \times n$ matrix and each $a_{ij}$ is called an entry or component of the matrix. The set of all $m \times n$ matrices with entries in $\mathbb{F}$ may be denoted by $\mathbb{F}(m, n)$. In particular, $\mathbb{F}(m, 1)$ or $\mathbb{F}(1, n)$ is simply $\mathbb{F}^m$ or $\mathbb{F}^n$. Elements in $\mathbb{F}(n, n)$ are also called square matrices.

1.1.3 Matrices

Here we consider some of the simplest manipulations on, and properties of, matrices.

Let $A$ be the matrix given in (1.1.8). Then $A^t$, called the transpose of $A$, is defined to be

\[ A^t = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \cdots & \cdots & \cdots & \cdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}. \tag{1.1.9} \]

Of course, $A^t \in \mathbb{F}(n, m)$. Simply put, $A^t$ is a matrix obtained from taking the row (column) vectors of $A$ to be its corresponding column (row) vectors.

For $A \in \mathbb{F}(n, n)$, we say that $A$ is symmetric if $A = A^t$, or skew-symmetric or anti-symmetric if $A^t = -A$. The sets of symmetric and anti-symmetric matrices are denoted by $\mathbb{F}_S(n, n)$ and $\mathbb{F}_A(n, n)$, respectively.

It is clear that $(A^t)^t = A$.
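To make the transpose concrete, here is a small sketch that stores a matrix as a list of rows. This representation and the function names are our own assumptions for illustration only.

```python
def transpose(A):
    """Return A^t: row i of A becomes column i of the result, as in (1.1.9)."""
    return [list(column) for column in zip(*A)]


def is_symmetric(A):
    """True when A = A^t."""
    return A == transpose(A)


def is_skew_symmetric(A):
    """True when A^t = -A."""
    return transpose(A) == [[-entry for entry in row] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]          # a 2 x 3 matrix
At = transpose(A)        # a 3 x 2 matrix
```

Transposing `At` once more returns the original `A`, which is the identity $(A^t)^t = A$ on this example.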


It will now be useful to introduce the notion of dot product. For any two vectors $u = (a_1, \dots, a_n)$ and $v = (b_1, \dots, b_n)$ in $\mathbb{F}^n$, their dot product $u \cdot v \in \mathbb{F}$ is defined to be

\[ u \cdot v = a_1 b_1 + \cdots + a_n b_n. \tag{1.1.10} \]

The following properties of dot product can be directly examined.

(1) (Commutativity) $u \cdot v = v \cdot u$ for any $u, v \in \mathbb{F}^n$.

(2) (Associativity and homogeneity) $u \cdot (av + bw) = a(u \cdot v) + b(u \cdot w)$ for any $u, v, w \in \mathbb{F}^n$ and $a, b \in \mathbb{F}$.

With the notion of dot product, we can define the product of two matrices $A \in \mathbb{F}(m, k)$ and $B \in \mathbb{F}(k, n)$ by

\[ C = (c_{ij}) = AB, \quad i = 1, \dots, m, \quad j = 1, \dots, n, \tag{1.1.11} \]

where $c_{ij}$ is the dot product of the $i$th row of $A$ and the $j$th column of $B$. Thus $AB \in \mathbb{F}(m, n)$.

Alternatively, if we use $u, v$ to denote column vectors in $\mathbb{F}^n$, then

\[ u \cdot v = u^t v. \tag{1.1.12} \]

That is, the dot product of $u$ and $v$ may be viewed as a matrix product of the $1 \times n$ matrix $u^t$ and $n \times 1$ matrix $v$ as well.
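The definition (1.1.11) translates almost word for word into code. The sketch below assumes a matrix is stored as a list of rows; `dot` and `matmul` are hypothetical names chosen for this illustration.

```python
def dot(u, v):
    """Dot product u . v = a_1 b_1 + ... + a_n b_n, as in (1.1.10)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of entries")
    return sum(a * b for a, b in zip(u, v))


def matmul(A, B):
    """Product C = AB: c_ij is the dot product of row i of A and column j of B."""
    columns_of_B = list(zip(*B))
    return [[dot(row, column) for column in columns_of_B] for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]          # in F(2, 3)
B = [[1, 0],
     [0, 1],
     [2, 2]]             # in F(3, 2)
C = matmul(A, B)         # in F(2, 2)
```

Viewing $u = (1, 2, 3)$ as a $1 \times 3$ matrix and $v = (4, 5, 6)$ as a $3 \times 1$ matrix, `matmul([[1, 2, 3]], [[4], [5], [6]])` returns a $1 \times 1$ matrix whose single entry equals `dot([1, 2, 3], [4, 5, 6])`, in line with (1.1.12).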

Matrix product (or matrix multiplication) enjoys the following properties.

(1) (Associativity of scalar multiplication) $a(AB) = (aA)B = A(aB)$ for any $a \in \mathbb{F}$ and any $A \in \mathbb{F}(m, k)$, $B \in \mathbb{F}(k, n)$.

(2) (Distributivity) $A(B + C) = AB + AC$ for any $A \in \mathbb{F}(m, k)$ and $B, C \in \mathbb{F}(k, n)$; $(A + B)C = AC + BC$ for any $A, B \in \mathbb{F}(m, k)$ and $C \in \mathbb{F}(k, n)$.

(3) (Associativity) $A(BC) = (AB)C$ for any $A \in \mathbb{F}(m, k)$, $B \in \mathbb{F}(k, l)$, $C \in \mathbb{F}(l, n)$.

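Alternatively, if we express $A \in \mathbb{F}(m, k)$ and $B \in \mathbb{F}(k, n)$ as made of $m$ row vectors and $n$ column vectors, respectively, rewritten as

\[ A = \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix}, \quad B = (B_1, \dots, B_n), \tag{1.1.13} \]

then, formally, we have

\[
\begin{aligned}
AB &= \begin{pmatrix} A_1 \cdot B_1 & A_1 \cdot B_2 & \cdots & A_1 \cdot B_n \\ A_2 \cdot B_1 & A_2 \cdot B_2 & \cdots & A_2 \cdot B_n \\ \cdots & \cdots & \cdots & \cdots \\ A_m \cdot B_1 & A_m \cdot B_2 & \cdots & A_m \cdot B_n \end{pmatrix} \\
&= \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix} (B_1, \dots, B_n) \\
&= \begin{pmatrix} A_1 B_1 & A_1 B_2 & \cdots & A_1 B_n \\ A_2 B_1 & A_2 B_2 & \cdots & A_2 B_n \\ \cdots & \cdots & \cdots & \cdots \\ A_m B_1 & A_m B_2 & \cdots & A_m B_n \end{pmatrix},
\end{aligned}
\tag{1.1.14}
\]

which suggests that matrix multiplication may be carried out with legitimate multiplications executed over appropriate matrix blocks.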

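If $A \in \mathbb{F}(m, k)$ and $B \in \mathbb{F}(k, n)$, then $A^t \in \mathbb{F}(k, m)$ and $B^t \in \mathbb{F}(n, k)$ so that $B^t A^t \in \mathbb{F}(n, m)$. Regarding how $AB$ and $B^t A^t$ are related, here is the conclusion.

Theorem 1.1 For $A \in \mathbb{F}(m, k)$ and $B \in \mathbb{F}(k, n)$, there holds

\[ (AB)^t = B^t A^t. \tag{1.1.15} \]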
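Before attempting a proof, it can help to see the identity hold on one concrete pair of matrices. The sketch below is only a numerical spot check under our own list-of-rows representation, with hypothetical helper names; it is not a substitute for the proof.

```python
def transpose(A):
    """Return A^t for a matrix stored as a list of rows."""
    return [list(column) for column in zip(*A)]


def matmul(A, B):
    """Return AB, with entry (i, j) the dot product of row i of A and column j of B."""
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*B)]
            for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]                       # in F(2, 3)
B = [[1, 0],
     [0, 1],
     [2, 2]]                          # in F(3, 2)

left = transpose(matmul(A, B))              # (AB)^t, in F(2, 2)
right = matmul(transpose(B), transpose(A))  # B^t A^t, in F(2, 2)
```

Note the order on the right-hand side: $A^t B^t$ here would be a $3 \times 3$ matrix and could not equal $(AB)^t$.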

The proof of this basic fact is assigned as an exercise.

Other matrices in $\mathbb{F}(n, n)$ having interesting properties include the following.

(1) Diagonal matrices are of the form $A = (a_{ij})$ with $a_{ij} = 0$ whenever $i \neq j$. The set of diagonal matrices is denoted as $\mathbb{F}_D(n, n)$.

(2) Lower triangular matrices are of the form $A = (a_{ij})$ with $a_{ij} = 0$ whenever $j > i$. The set of lower triangular matrices is denoted as $\mathbb{F}_L(n, n)$.

(3) Upper triangular matrices are of the form $A = (a_{ij})$ with $a_{ij} = 0$ whenever $i > j$. The set of upper triangular matrices is denoted as $\mathbb{F}_U(n, n)$.

There is a special element in $\mathbb{F}(n, n)$, called the identity matrix, or unit matrix, and denoted by $I_n$, or simply $I$, which is a diagonal matrix whose diagonal entries are all 1 (unit scalar) and off-diagonal entries are all 0. For any $A \in \mathbb{F}(n, n)$, we have $AI = IA = A$.


Definition 1.2 A matrix $A \in \mathbb{F}(n, n)$ is called invertible or nonsingular if there is some $B \in \mathbb{F}(n, n)$ such that

\[ AB = BA = I. \tag{1.1.16} \]

In this situation, $B$ is unique (cf. Exercise 1.1.7) and called the inverse of $A$ and denoted by $A^{-1}$.

If $A, B \in \mathbb{F}(n, n)$ are such that $AB = I$, then we say that $A$ is a left inverse of $B$ and $B$ a right inverse of $A$. It can be shown that a left or right inverse is simply the inverse. In other words, if $A$ is a left inverse of $B$, then both $A$ and $B$ are invertible and the inverses of each other.

If $A \in \mathbb{R}(n, n)$ enjoys the property $AA^t = A^tA = I$, then $A$ is called an orthogonal matrix. For $A = (a_{ij}) \in \mathbb{C}(m, n)$, we adopt the notation $\bar{A} = (\bar{a}_{ij})$ for taking the complex conjugate of $A$ and use $A^\dagger$ to denote taking the complex conjugate of the transpose of $A$, $A^\dagger = \bar{A}^t$, which is also commonly referred to as taking the Hermitian conjugate of $A$. If $A \in \mathbb{C}(n, n)$, we say that $A$ is Hermitian symmetric, or simply Hermitian, if $A^\dagger = A$, and skew-Hermitian or anti-Hermitian, if $A^\dagger = -A$. If $A \in \mathbb{C}(n, n)$ enjoys the property $AA^\dagger = A^\dagger A = I$, then $A$ is called a unitary matrix. We will see the importance of these notions later.
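These definitions can be tested on small examples. The sketch below relies on Python's built-in complex numbers and on our own list-of-rows representation; the function names are hypothetical. The examples use integer and Gaussian-integer entries so that equality can be tested exactly, and entries such as $1/\sqrt{2}$ would call for a tolerance instead.

```python
def hermitian_conjugate(A):
    """Return the complex conjugate of the transpose of A."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def matmul(A, B):
    """Return AB for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*B)]
            for row in A]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def is_hermitian(A):
    """True when the Hermitian conjugate of A equals A."""
    return hermitian_conjugate(A) == A


def is_unitary(A):
    """True when A times its Hermitian conjugate, in both orders, is I."""
    Ad = hermitian_conjugate(A)
    I = identity(len(A))
    return matmul(A, Ad) == I and matmul(Ad, A) == I


H = [[2, 1 - 1j],
     [1 + 1j, 3]]        # Hermitian
U = [[0, 1j],
     [1j, 0]]            # unitary
R = [[0, -1],
     [1, 0]]             # real and orthogonal (rotation by a right angle)
```

For a real matrix such as `R` the Hermitian conjugate reduces to the transpose, so `is_unitary(R)` checks orthogonality.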

Exercises

1.1.1 Show that it follows from the definition of a field that zero, unit, additive, and multiplicative inverse scalars are all unique.

1.1.2 Let $p \in \mathbb{N}$ be a prime and $[n] \in \mathbb{Z}_p$. Find $-[n]$ and prove the existence of $[n]^{-1}$ when $[n] \neq [0]$. In $\mathbb{Z}_5$, find $-[4]$ and $[4]^{-1}$.

1.1.3 Show that it follows from the definition of a vector space that both zero and additive inverse vectors are unique.

1.1.4 Prove the associativity of matrix multiplication by showing that $A(BC) = (AB)C$ for any $A \in \mathbb{F}(m, k)$, $B \in \mathbb{F}(k, l)$, $C \in \mathbb{F}(l, n)$.

1.1.5 Prove Theorem 1.1.

1.1.6 Let $A \in \mathbb{F}(n, n)$ ($n \ge 2$) and rewrite $A$ as

\[ A = \begin{pmatrix} A_1 & A_2 \\ A_3 & A_4 \end{pmatrix}, \tag{1.1.17} \]

where $A_1 \in \mathbb{F}(k, k)$, $A_2 \in \mathbb{F}(k, l)$, $A_3 \in \mathbb{F}(l, k)$, $A_4 \in \mathbb{F}(l, l)$, $k, l \ge 1$, $k + l = n$. Show that

\[ A^t = \begin{pmatrix} A_1^t & A_3^t \\ A_2^t & A_4^t \end{pmatrix}. \tag{1.1.18} \]

1.1.7 Prove that the inverse of an invertible matrix is unique by showing the fact that if $A, B, C \in \mathbb{F}(n, n)$ satisfy $AB = I$ and $CA = I$ then $B = C$.

1.1.8 Let $A \in \mathbb{C}(n, n)$. Show that $A$ is Hermitian if and only if $iA$ is anti-Hermitian.

1.2 Subspaces, span, and linear dependence

Let $U$ be a vector space over a field $\mathbb{F}$ and $V \subset U$ a non-empty subset of $U$. We say that $V$ is a subspace of $U$ if $V$ is a vector space over $\mathbb{F}$ with the inherited addition and scalar multiplication from $U$. It is worth noting that, when checking whether a subset $V$ of a vector space $U$ becomes a subspace, one only needs to verify the closure axiom (1) in the definition of a vector space since the rest of the axioms follow automatically as a consequence of (1).

The two trivial subspaces of $U$ are those consisting only of zero vector, $\{0\}$, and $U$ itself. A nontrivial subspace is also called a proper subspace.

Consider the subset $\mathcal{P}_n$ ($n \in \mathbb{N}$) of $\mathcal{P}$ defined by

\[ \mathcal{P}_n = \{a_0 + a_1 t + \cdots + a_n t^n \mid a_0, a_1, \dots, a_n \in \mathbb{F}\}. \tag{1.2.1} \]

It is clear that $\mathcal{P}_n$ is a subspace of $\mathcal{P}$ and $\mathcal{P}_m$ is a subspace of $\mathcal{P}_n$ when $m \le n$.

Consider the set $S_a$ of all vectors $(x_1, \dots, x_n)$ in $\mathbb{F}^n$ satisfying the equation

\[ x_1 + \cdots + x_n = a, \tag{1.2.2} \]

where $a \in \mathbb{F}$. Then $S_a$ is a subspace of $\mathbb{F}^n$ if and only if $a = 0$.

Let $u_1, \dots, u_k$ be vectors in $U$. The linear span of $\{u_1, \dots, u_k\}$, denoted by $\operatorname{Span}\{u_1, \dots, u_k\}$, is the subspace of $U$ defined by

\[ \operatorname{Span}\{u_1, \dots, u_k\} = \{u \in U \mid u = a_1 u_1 + \cdots + a_k u_k,\ a_1, \dots, a_k \in \mathbb{F}\}. \tag{1.2.3} \]

Thus, if $u \in \operatorname{Span}\{u_1, \dots, u_k\}$, then there are $a_1, \dots, a_k \in \mathbb{F}$ such that

\[ u = a_1 u_1 + \cdots + a_k u_k. \tag{1.2.4} \]

We also say that $u$ is linearly spanned by $u_1, \dots, u_k$ or linearly dependent on $u_1, \dots, u_k$. Therefore, zero vector 0 is linearly dependent on any finite set of vectors.

If $U = \operatorname{Span}\{u_1, \dots, u_k\}$, we also say that $U$ is generated by the vectors $u_1, \dots, u_k$.

For $\mathcal{P}_n$ defined in (1.2.1), we have $\mathcal{P}_n = \operatorname{Span}\{1, t, \dots, t^n\}$. Thus $\mathcal{P}_n$ is generated by $1, t, \dots, t^n$. Naturally, for two elements $p, q$ in $\mathcal{P}_n$, say $p(t) = a_0 + a_1 t + \cdots + a_n t^n$, $q(t) = b_0 + b_1 t + \cdots + b_n t^n$, we identify $p$ and $q$ if and only if all the coefficients of $p$ and $q$ of like powers of $t$ coincide in $\mathbb{F}$, or, $a_i = b_i$ for all $i = 0, 1, \dots, n$.

In $\mathbb{F}^n$, define

\[ e_1 = (1, 0, 0, \dots, 0), \quad e_2 = (0, 1, 0, \dots, 0), \quad \dots, \quad e_n = (0, 0, \dots, 0, 1). \tag{1.2.5} \]

Then $\mathbb{F}^n = \operatorname{Span}\{e_1, e_2, \dots, e_n\}$ and $\mathbb{F}^n$ is generated by $e_1, e_2, \dots, e_n$.

Thus, for $S_0$ defined in (1.2.2), we have

\[
\begin{aligned}
(x_1, x_2, \dots, x_n) &= -(x_2 + \cdots + x_n) e_1 + x_2 e_2 + \cdots + x_n e_n \\
&= x_2 (e_2 - e_1) + \cdots + x_n (e_n - e_1),
\end{aligned}
\tag{1.2.6}
\]

where $x_2, \dots, x_n$ are arbitrarily taken from $\mathbb{F}$. Therefore

\[ S_0 = \operatorname{Span}\{e_2 - e_1, \dots, e_n - e_1\}. \tag{1.2.7} \]
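The expansion (1.2.6) can be replayed numerically. The sketch below works with integer entries and stores a vector as a Python list; both choices, and the function names, are assumptions made for this illustration.

```python
def basis_vector(i, n):
    """Standard vector e_i in F^n (1-indexed), as in (1.2.5)."""
    return [1 if k == i else 0 for k in range(1, n + 1)]


def expand_in_S0(x):
    """Rebuild x from x_2 (e_2 - e_1) + ... + x_n (e_n - e_1), as in (1.2.6)."""
    n = len(x)
    if sum(x) != 0:
        raise ValueError("x is not in S_0: its entries do not sum to zero")
    e1 = basis_vector(1, n)
    result = [0] * n
    for i in range(2, n + 1):
        ei = basis_vector(i, n)
        # Add the multiple x_i (e_i - e_1), entry by entry.
        result = [r + x[i - 1] * (a - b) for r, a, b in zip(result, ei, e1)]
    return result
```

For example, `expand_in_S0([-5, 2, 3])` forms $2(e_2 - e_1) + 3(e_3 - e_1)$ and recovers the vector it started from. The first entry never has to be supplied separately, because the constraint $x_1 + \cdots + x_n = 0$ determines it.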

For $\mathbb{F}(m, n)$, we define $M_{ij} \in \mathbb{F}(m, n)$ to be the vector such that all its entries vanish except that its entry at the position $(i, j)$ (at the $i$th row and $j$th column) is 1, $i = 1, \dots, m$, $j = 1, \dots, n$. We have

\[ \mathbb{F}(m, n) = \operatorname{Span}\{M_{ij} \mid i = 1, \dots, m,\ j = 1, \dots, n\}. \tag{1.2.8} \]

The notion of spans can be extended to cover some useful situations. Let $U$ be a vector space and $S$ be a (finite or infinite) subset of $U$. Define

\[ \operatorname{Span}(S) = \text{the set of linear combinations of all possible finite subsets of } S. \tag{1.2.9} \]

It is obvious that $\operatorname{Span}(S)$ is a subspace of $U$. If $U = \operatorname{Span}(S)$, we say that $U$ is spanned or generated by the set of vectors $S$.

As an example, we have

\[ \mathcal{P} = \operatorname{Span}\{1, t, \dots, t^n, \dots\}. \tag{1.2.10} \]

Alternatively, we can also express $\mathcal{P}$ as

\[ \mathcal{P} = \bigcup_{n=0}^{\infty} \mathcal{P}_n. \tag{1.2.11} \]

The above discussion motivates the following formal definition.

Definition 1.3 Let $u_1, \dots, u_m$ be $m$ vectors in the vector space $U$ over a field $\mathbb{F}$. We say that these vectors are linearly dependent if one of them may be written as a linear span of the rest of them or linearly dependent on the rest of them. Or equivalently, $u_1, \dots, u_m$ are linearly dependent if there are scalars $a_1, \dots, a_m \in \mathbb{F}$ where $(a_1, \dots, a_m) \neq (0, \dots, 0)$ such that

\[ a_1 u_1 + \cdots + a_m u_m = 0. \tag{1.2.12} \]

Otherwise $u_1, \dots, u_m$ are called linearly independent. In this latter case, the only possible vector $(a_1, \dots, a_m) \in \mathbb{F}^m$ to make (1.2.12) fulfilled is the zero vector, $(0, \dots, 0)$.

To proceed further, we need to consider the following system of linear equations

\[
\begin{cases}
a_{11} x_1 + \cdots + a_{1n} x_n = 0, \\
\qquad \cdots \cdots \cdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n = 0,
\end{cases}
\tag{1.2.13}
\]

over $\mathbb{F}$ with unknowns $x_1, \dots, x_n$.

Theorem 1.4 In the system (1.2.13), if $m < n$, then the system has a nontrivial solution $(x_1, \dots, x_n) \neq (0, \dots, 0)$.

Proof We prove the theorem by using induction on m+ n.The beginning situation is m+n = 3 when m = 1 and n = 2. It is clear that

we always have a nontrivial solution.Assume that the statement of the theorem is true when m + n ≤ k where

k ≥ 3.Let m+n = k+1. If k = 3, the condition m < n implies m = 1, n = 3 and

the existence of a nontrivial solution is obvious. Assume then k ≥ 4. If all thecoefficients of the variable x1 in (1.2.13) are zero, i.e. a11 = · · · = am1 = 0,then x1 = 1, x2 = · · · = xn = 0 is a nontrivial solution. So we may assumeone of the coefficients of x1 is nonzero. Without loss of generality, we assumea11 = 0. If m = 1, there is again nothing to show. Assume m ≥ 2. Dividing thefirst equation in (1.2.13) by a11 if necessary, we can further assume a11 = 1.Then, adding the (−ai1)-multiple of the first equation into the ith equation, in(1.2.13), for i = 2, . . . , m, we arrive at⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩

x1+ a12x2 + · · · + a1nxn = 0,

b22x2 + · · · + b2nxn = 0,

. . . . . . . . . . . . . . . . . . . . . . . .

bm2x2 + · · · + bmnxn = 0.

(1.2.14)

The system below the first equation in (1.2.14) contains m − 1 equations andn − 1 unknowns x2, . . . , xn. Of course, m − 1 < n − 1. So, in view of the

1.2 Subspaces, span, and linear dependence 11

inductive assumption, it has a nontrivial solution. Substituting this nontrivialsolution into the first equation in (1.2.14) to determine the remaining unknownx1, we see that the existence of a nontrivial solution to the original system(1.2.13) follows.
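The inductive argument above is constructive, and it can be sketched in code. The following is a minimal illustration, assuming rational coefficients (exact arithmetic via `Fraction`); the function name `nontrivial_solution` and the sample system are hypothetical choices made here, not part of the text.

```python
from fractions import Fraction


def nontrivial_solution(rows, n):
    """Return a nonzero solution x of the homogeneous system rows * x = 0.

    rows is a list of m coefficient lists, each of length n, with m < n.
    The recursion mirrors the proof: eliminate x1, solve the smaller
    system in x2, ..., xn, then recover x1 from the pivot equation.
    """
    rows = [[Fraction(c) for c in row] for row in rows]
    if len(rows) >= n:
        raise ValueError("need fewer equations than unknowns (m < n)")
    # If x1 appears in no equation (or there are no equations),
    # then (1, 0, ..., 0) is a nontrivial solution.
    p = next((i for i, r in enumerate(rows) if r[0] != 0), None)
    if p is None:
        return [Fraction(1)] + [Fraction(0)] * (n - 1)
    # Normalize the pivot equation so that its x1-coefficient equals 1.
    pivot = [c / rows[p][0] for c in rows[p]]
    others = rows[:p] + rows[p + 1:]
    # Remove x1 from the other equations; keep only columns 2, ..., n.
    reduced = [[r[j] - r[0] * pivot[j] for j in range(1, n)] for r in others]
    tail = nontrivial_solution(reduced, n - 1)
    # Determine x1 from the pivot equation.
    x1 = -sum(pivot[j] * tail[j - 1] for j in range(1, n))
    return [x1] + tail


system = [[1, 2, 3], [4, 5, 6]]  # m = 2 equations, n = 3 unknowns
solution = nontrivial_solution(system, 3)
print(solution)
```

Unlike the proof, the sketch picks the first equation with a nonzero x1-coefficient as the pivot instead of relabeling the equations; this is only a bookkeeping convenience.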

The importance of Theorem 1.4 is seen in the following.

Theorem 1.5 Any set of more than m vectors in Span{u1, . . . , um} must be linearly dependent.

Proof Let v1, . . . , vn ∈ Span{u1, . . . , um} be n vectors where n > m. Consider the possible linear dependence relation

x1v1 + · · · + xnvn = 0, (1.2.15)

for some x1, . . . , xn ∈ F.

Since each vj ∈ Span{u1, . . . , um}, j = 1, . . . , n, there are scalars aij ∈ F (i = 1, . . . , m, j = 1, . . . , n) such that

\[ v_j = \sum_{i=1}^{m} a_{ij} u_i, \quad j = 1, \dots, n. \tag{1.2.16} \]

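Substituting (1.2.16) into (1.2.15), we have

\[ \sum_{j=1}^{n} x_j \left( \sum_{i=1}^{m} a_{ij} u_i \right) = 0. \tag{1.2.17} \]

Regrouping the terms in the above equation, we arrive at

\[ \sum_{i=1}^{m} \left( \sum_{j=1}^{n} a_{ij} x_j \right) u_i = 0, \tag{1.2.18} \]

which may be fulfilled by setting

\[ \sum_{j=1}^{n} a_{ij} x_j = 0, \quad i = 1, \dots, m. \tag{1.2.19} \]

This system of equations is exactly the system (1.2.13), which allows a nontrivial solution in view of Theorem 1.4 since m < n. Such a solution (x1, . . . , xn) ≠ (0, . . . , 0) fulfills (1.2.15), so v1, . . . , vn are linearly dependent. Hence the proof follows.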
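As a small worked instance of this proof (the specific vectors are chosen here purely for illustration), take m = 2, n = 3, any two vectors u_1, u_2, and three combinations of them. The system (1.2.19) then has two equations in three unknowns, and a nontrivial solution yields an explicit dependence relation.

```latex
\[
v_1 = u_1 + u_2, \qquad v_2 = u_1 - u_2, \qquad v_3 = 2u_1 + u_2 .
\]
% The coefficients a_{ij} of (1.2.16) give the system (1.2.19):
\[
\begin{cases}
x_1 + x_2 + 2x_3 = 0,\\
x_1 - x_2 + x_3 = 0.
\end{cases}
\]
% One nontrivial solution is (x_1, x_2, x_3) = (3, 1, -2), and indeed
\[
3v_1 + v_2 - 2v_3 = (3 + 1 - 4)\,u_1 + (3 - 1 - 2)\,u_2 = 0 .
\]
```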

We are now prepared to study in the next section several fundamental properties of vector spaces.

Exercises

1.2.1 Let U1 and U2 be subspaces of a vector space U . Show that U1 ∪ U2 is a subspace of U if and only if U1 ⊂ U2 or U2 ⊂ U1.

1.2.2 Let Pn denote the vector space of the polynomials of degrees up to n over a field F expressed in terms of a variable t . Show that the vectors 1, t, . . . , t^n in Pn are linearly independent.

1.2.3 Show that the vectors in F^n defined in (1.2.5) are linearly independent.

1.2.4 Show that S0 defined in (1.2.2) may also be expressed as

S0 = Span{e1 − en, . . . , en−1 − en}, (1.2.20)

and deduce that, in R^n, the vectors

\[ \left(1, \frac{1}{2}, \dots, \frac{1}{2^{n-2}}, 2\left[\frac{1}{2^{n-1}} - 1\right]\right), \quad e_1 - e_n, \dots, e_{n-1} - e_n, \tag{1.2.21} \]

are linearly dependent (n ≥ 4).

1.2.5 Show that FS(n, n), FA(n, n), FD(n, n), FL(n, n), and FU(n, n) are all subspaces of F(n, n).

1.2.6 Let u1, . . . , un (n ≥ 2) be linearly independent vectors in a vector space U and set

vi−1 = ui−1 + ui, i = 2, . . . , n; vn = un + u1. (1.2.22)

Investigate whether v1, . . . , vn are linearly independent as well.

1.2.7 Let F be a field. For any two vectors u = (a1, . . . , an) and v = (b1, . . . , bn) in F^n (n ≥ 2), viewed as matrices, we see that the matrix product A = u^t v lies in F(n, n). Prove that any two row vectors of A are linearly dependent. What happens to the column vectors of A?

1.2.8 Consider a slightly strengthened version of the second part of Exercise 1.2.1 above: Let U1, U2 be subspaces of a vector space U , U1 ≠ U , U2 ≠ U . Show without using Exercise 1.2.1 that there exists a vector in U which lies outside U1 ∪ U2. Explain how you may apply the conclusion of this exercise to prove that of Exercise 1.2.1.

1.2.9 (A challenging extension of the previous exercise) Let U1, . . . , Uk be k subspaces of a vector space U over a field of characteristic 0. If Ui ≠ U for i = 1, . . . , k, show that there is a vector in U which lies outside U1 ∪ · · · ∪ Uk.

1.2.10 For A ∈ F(m, n) and B ∈ F(n, m) with m > n show that AB as an element in F(m, m) can never be invertible.

1.3 Bases, dimensionality, and coordinates

Let U be a vector space over a field F, take u1, . . . , un ∈ U , and set V = Span{u1, . . . , un}. Eliminating linearly dependent vectors from the set {u1, . . . , un} if necessary, we can certainly assume that the vectors u1, . . . , un are already made linearly independent. Thus, any vector u ∈ V may take the form

u = a1u1 + · · · + anun, a1, . . . , an ∈ F. (1.3.1)

It is not hard to see that the coefficients a1, . . . , an in the above representation must be unique. In fact, if we also have

u = b1u1 + · · · + bnun, b1, . . . , bn ∈ F, (1.3.2)

then, combining the above two relations, we have (a1 − b1)u1 + · · · + (an − bn)un = 0. Since u1, . . . , un are linearly independent, we obtain a1 = b1, . . . , an = bn and the uniqueness follows.

Furthermore, if there is another set of vectors v1, . . . , vm in U such that

Span{v1, . . . , vm} = Span{u1, . . . , un}, (1.3.3)

then m ≥ n in view of Theorem 1.5. As a consequence, if v1, . . . , vm are also linearly independent, then the same argument with the roles of the two sets exchanged gives n ≥ m, and so m = n. This observation leads to the following.

Definition 1.6 If there are linearly independent vectors u1, . . . , un ∈ U such that U = Span{u1, . . . , un}, then U is said to be finitely generated and the set of vectors {u1, . . . , un} is called a basis of U . The number of vectors in any basis of a finitely generated vector space, n, is independent of the choice of the basis and is referred to as the dimension of the finitely generated vector space, written as dim(U) = n. A finitely generated vector space is also said to be of finite dimensionality or finite dimensional. If a vector space U is not finite dimensional, it is said to be infinite dimensional, also written as dim(U) = ∞.

As an example of an infinite-dimensional vector space, we show that when R is regarded as a vector space over Q, then dim(R) = ∞. In fact, recall that a real number is called an algebraic number if it is the zero of a nonzero polynomial with coefficients in Q. We also know that there are many non-algebraic numbers in R, called transcendental numbers. Let τ be such a transcendental number. Then for any n = 1, 2, . . . the numbers 1, τ, τ^2, . . . , τ^n are linearly independent in the vector space R over the field Q. Indeed, if there are r0, r1, r2, . . . , rn ∈ Q so that

\[ r_0 + r_1 \tau + r_2 \tau^2 + \cdots + r_n \tau^n = 0, \tag{1.3.4} \]

and at least one number among r0, r1, r2, . . . , rn is nonzero, then τ is the zero of the nontrivial polynomial

\[ p(t) = r_0 + r_1 t + r_2 t^2 + \cdots + r_n t^n, \tag{1.3.5} \]

which violates the assumption that τ is transcendental. Thus R is infinite dimensional over Q.

The following theorem indicates that it is fairly easy to construct a basis for a finite-dimensional vector space.

Theorem 1.7 Let U be an n-dimensional vector space over a field F. Any n linearly independent vectors in U form a basis of U .

Proof Let u1, . . . , un ∈ U be linearly independent vectors. We only need to show that they span U . In fact, take any u ∈ U . Since U is spanned by n vectors, Theorem 1.5 shows that the n + 1 vectors u1, . . . , un, u are linearly dependent. So there is a nonzero vector (a1, . . . , an, a) ∈ F^(n+1) such that

a1u1 + · · · + anun + au = 0. (1.3.6)

Of course, a ≠ 0, otherwise it contradicts the assumption that u1, . . . , un are linearly independent. So u = (−a^(−1))(a1u1 + · · · + anun). Thus u ∈ Span{u1, . . . , un}.

Definition 1.8 Let {u1, . . . , un} be a basis of the vector space U . Given u ∈ U there are unique scalars a1, . . . , an ∈ F such that

u = a1u1 + · · · + anun. (1.3.7)

These scalars, a1, . . . , an, are called the coordinates, and (a1, . . . , an) ∈ F^n the coordinate vector, of the vector u with respect to the basis {u1, . . . , un}.

It will be interesting to investigate the relation between the coordinate vectors of a vector under different bases.

Let U = {u1, . . . , un} and V = {v1, . . . , vn} be two bases of the vector space U . For u ∈ U , let (a1, . . . , an) ∈ F^n and (b1, . . . , bn) ∈ F^n be the coordinate vectors of u with respect to U and V , respectively. Thus

u = a1u1 + · · · + anun = b1v1 + · · · + bnvn. (1.3.8)

On the other hand, we have

\[ v_j = \sum_{i=1}^{n} a_{ij} u_i, \quad j = 1, \dots, n. \tag{1.3.9} \]

The n × n matrix A = (aij ) is called a basis transition matrix or basis change matrix. Inserting (1.3.9) into (1.3.8), we have

\[ \sum_{i=1}^{n} a_i u_i = \sum_{i=1}^{n} \left( \sum_{j=1}^{n} a_{ij} b_j \right) u_i. \tag{1.3.10} \]

Hence, by the linear independence of the basis vectors, we have

\[ a_i = \sum_{j=1}^{n} a_{ij} b_j, \quad i = 1, \dots, n. \tag{1.3.11} \]

Note that the relation (1.3.9) between bases may be formally and conveniently expressed in a ‘matrix form’ as

(v1, . . . , vn) = (u1, . . . , un)A, (1.3.12)

or concisely V = UA, or

\[ \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = A^t \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}, \tag{1.3.13} \]

where multiplications between scalars and vectors are made in a well defined manner. On the other hand, the relation (1.3.11) between coordinate vectors may be rewritten as

\[ \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = A \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}, \tag{1.3.14} \]

or

\[ (a_1, \dots, a_n) = (b_1, \dots, b_n) A^t. \tag{1.3.15} \]
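The relations (1.3.9) and (1.3.14) can be checked numerically on a small case. The sketch below assumes U = R^2 with a hypothetical basis and a hypothetical transition matrix A chosen only for illustration; the helper names `combine` and `mat_vec` are likewise not from the text.

```python
def combine(coeffs, vectors):
    """Return the linear combination sum_k coeffs[k] * vectors[k]."""
    size = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(size)]


def mat_vec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


# A basis {u1, u2} of R^2 (illustrative choice).
U = [[1, 0], [1, 1]]

# Transition matrix A = (a_ij); column j holds the coordinates of v_j
# with respect to {u1, u2}, as in (1.3.9).
A = [[2, 1],
     [1, 1]]

# Build the second basis: v_j = sum_i a_ij u_i.
V = [combine([A[i][j] for i in range(2)], U) for j in range(2)]

# Coordinates b of some vector u with respect to {v1, v2}.
b = [3, -1]
u = combine(b, V)

# Relation (1.3.14): coordinates with respect to {u1, u2} are a = A b.
a = mat_vec(A, b)
print(V, u, a)
```

Here the vector u rebuilt from the coordinates a and the basis {u1, u2} agrees with the one built from b and {v1, v2}, which is what (1.3.8) requires.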

Exercises

1.3.1 Let U be a vector space with dim(U) = n ≥ 2 and V a subspace of U with a basis {v1, . . . , vn−1}. Prove that for any u ∈ U \ V the vectors u, v1, . . . , vn−1 form a basis for U .

1.3.2 Show that dim(F(m, n)) = mn.

1.3.3 Determine dim(FS(n, n)), dim(FA(n, n)), and dim(FD(n, n)).

1.3.4 Let P be the vector space of all polynomials with coefficients in a field F. Show that dim(P) = ∞.

1.3.5 Consider the vector space R^3 and the bases U = {e1, e2, e3} and V = {e1, e1 + e2, e1 + e2 + e3}. Find the basis transition matrix A from U into V satisfying V = UA. Find the coordinate vectors of the given vector (1, 2, 3) ∈ R^3 with respect to the bases U and V , respectively, and relate these vectors with the matrix A.

1.3.6 Prove that a basis transition matrix must be invertible.

1.3.7 Let U be an n-dimensional vector space over a field F where n ≥ 2 (say). Consider the following construction.

(i) Take u1 ∈ U \ {0}.
(ii) Take u2 ∈ U \ Span{u1}.
(iii) Take (if any) u3 ∈ U \ Span{u1, u2}.
(iv) In general, take ui ∈ U \ Span{u1, . . . , ui−1} (i ≥ 2).

Show that this construction will terminate itself in exactly n steps, that is, it will not be possible anymore to get un+1, and that the vectors u1, u2, . . . , un so obtained form a basis of U .

1.4 Dual spaces

Let U be an n-dimensional vector space over a field F. A functional (also called a form or a 1-form) f over U is a linear function f : U → F satisfying

f (u + v) = f (u) + f (v), u, v ∈ U ; f (au) = af (u), a ∈ F, u ∈ U. (1.4.1)

Let f, g be two functionals. Then we can define another functional called the sum of f and g, denoted by f + g, by

(f + g)(u) = f (u) + g(u), u ∈ U. (1.4.2)

Similarly, let f be a functional and a ∈ F. We can define another functional called the scalar multiple of a with f , denoted by af , by

(af )(u) = af (u), u ∈ U. (1.4.3)

It is a simple exercise to check that these two operations make the set of all functionals over U a vector space over F. This vector space is called the dual space of U , denoted by U ′.

Let {u1, . . . , un} be a basis of U . For any f ∈ U ′ and any u = a1u1 + · · · + anun ∈ U , we have

\[ f(u) = f\left( \sum_{i=1}^{n} a_i u_i \right) = \sum_{i=1}^{n} a_i f(u_i). \tag{1.4.4} \]

Hence, f is uniquely determined by its values on the basis vectors,

f1 = f (u1), . . . , fn = f (un). (1.4.5)

Conversely, for arbitrarily assigned values f1, . . . , fn in (1.4.5), we define

\[ f(u) = \sum_{i=1}^{n} a_i f_i \quad \text{for any } u = \sum_{i=1}^{n} a_i u_i \in U. \tag{1.4.6} \]

It is clear that f is a functional. That is, f ∈ U ′. Of course, such an f satisfies (1.4.5).

Thus, we have seen a well-defined 1-1 correspondence

U ′ ↔ Fn, f ↔ (f1, . . . , fn). (1.4.7)

In particular, we may use u′1, . . . , u′n to denote the elements in U ′ corresponding to the vectors e1, . . . , en in F^n given by (1.2.5). Then we have

\[ u'_i(u_j) = \delta_{ij} = \begin{cases} 0, & i \neq j,\\ 1, & i = j, \end{cases} \quad i, j = 1, \dots, n. \tag{1.4.8} \]

It is clear that u′1, . . . , u′n are linearly independent and span U ′ because an element f of U ′ satisfying (1.4.5) is simply given by

\[ f = f_1 u'_1 + \cdots + f_n u'_n. \tag{1.4.9} \]

In other words, {u′1, . . . , u′n} is a basis of U ′, commonly called the dual basis of U ′ with respect to the basis {u1, . . . , un} of U . In particular, we have seen that U and U ′ are of the same dimensionality.
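For a concrete feel of (1.4.8) and (1.4.9), one may take U = R^2 and represent a functional by its row of coefficients. Under that assumption, if the basis vectors are placed as the columns of a matrix B, the rows of the inverse of B give the dual basis, since the product of the inverse with B is the identity. The sketch below uses a hypothetical basis and hypothetical helper names (`inverse_2x2`, `apply`).

```python
from fractions import Fraction


def inverse_2x2(B):
    """Return the inverse of a 2 x 2 matrix using exact arithmetic."""
    (a, b), (c, d) = B
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply(functional, vector):
    """Evaluate a functional, stored as a coefficient row, on a vector."""
    return sum(c * x for c, x in zip(functional, vector))


# A basis {u1, u2} of R^2 (illustrative choice).
u = [[2, 1], [1, 1]]

# Matrix whose columns are u1 and u2; row i of its inverse is u'_i.
B = [[u[0][0], u[1][0]],
     [u[0][1], u[1][1]]]
dual = inverse_2x2(B)

# Relation (1.4.8): u'_i(u_j) = delta_ij.
delta = [[apply(dual[i], u[j]) for j in range(2)] for i in range(2)]

# Relation (1.4.9): f = f1 u'_1 + f2 u'_2 with f_i = f(u_i).
f = [3, 5]  # the functional f(x, y) = 3x + 5y
values = [apply(f, uj) for uj in u]
rebuilt = [sum(values[i] * dual[i][k] for i in range(2)) for k in range(2)]
print(dual, delta, values, rebuilt)
```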

Let U = {u1, . . . , un} and V = {v1, . . . , vn} be two bases of the vector space U . Let their dual bases be denoted by U ′ = {u′1, . . . , u′n} and V ′ = {v′1, . . . , v′n}, respectively. Suppose that the bases U ′ and V ′ are related through

\[ u'_j = \sum_{i=1}^{n} a'_{ij} v'_i, \quad j = 1, \dots, n. \tag{1.4.10} \]

Using (1.3.9) and (1.4.10) to evaluate u′i (vj ), we obtain

\[ u'_i(v_j) = u'_i\left( \sum_{k=1}^{n} a_{kj} u_k \right) = \sum_{k=1}^{n} a_{kj} u'_i(u_k) = \sum_{k=1}^{n} a_{kj} \delta_{ik} = a_{ij}, \tag{1.4.11} \]

\[ u'_i(v_j) = \sum_{k=1}^{n} a'_{ki} v'_k(v_j) = \sum_{k=1}^{n} a'_{ki} \delta_{kj} = a'_{ji}, \tag{1.4.12} \]

which leads to a′ij = aji (i, j = 1, . . . , n). In other words, we have arrived at the correspondence relation

\[ u'_j = \sum_{i=1}^{n} a_{ji} v'_i, \quad j = 1, \dots, n. \tag{1.4.13} \]

With matrix notation as before, we have

\[ (u'_1, \dots, u'_n) = (v'_1, \dots, v'_n) A^t, \tag{1.4.14} \]

or

\[ \begin{pmatrix} u'_1 \\ \vdots \\ u'_n \end{pmatrix} = A \begin{pmatrix} v'_1 \\ \vdots \\ v'_n \end{pmatrix}. \tag{1.4.15} \]

Besides, for any u′ ∈ U ′ written as

\[ u' = a'_1 u'_1 + \cdots + a'_n u'_n = b'_1 v'_1 + \cdots + b'_n v'_n, \tag{1.4.16} \]

the discussion in the previous section and the above immediately allow us to get

\[ b'_i = \sum_{j=1}^{n} a_{ji} a'_j, \quad i = 1, \dots, n. \tag{1.4.17} \]

Thus, in matrix form, we obtain the relation

\[ (b'_1, \dots, b'_n) = (a'_1, \dots, a'_n) A, \tag{1.4.18} \]

or

\[ \begin{pmatrix} b'_1 \\ \vdots \\ b'_n \end{pmatrix} = A^t \begin{pmatrix} a'_1 \\ \vdots \\ a'_n \end{pmatrix}. \tag{1.4.19} \]

Comparing the above results with those established in the previous section, we see that, with respect to bases and dual bases, the coordinate vectors in U and U ′ follow ‘opposite’ rules of correspondence. For this reason, coordinate vectors in U are often called covariant vectors, and those in U ′ contravariant vectors.

Using the relation stated in (1.4.8), we see that we may naturally view u1, . . . , un as elements in (U ′)′ = U ′′ so that they form a basis of U ′′ dual to {u′1, . . . , u′n} since

\[ u_i(u'_j) \equiv u'_j(u_i) = \delta_{ij} = \begin{cases} 0, & i \neq j,\\ 1, & i = j, \end{cases} \quad i, j = 1, \dots, n. \tag{1.4.20} \]

Thus, for any u ∈ U ′′ satisfying u(u′i ) = ai (i = 1, . . . , n), we have

u = a1u1 + · · · + anun. (1.4.21)

In this way, we see that U ′′ may be identified with U . In other words, we have seen that the identification just made spells out the relationship

U ′′ = U, (1.4.22)

which is also referred to as reflectivity of U , or U is said to be reflective.

Notationally, for u ∈ U and u′ ∈ U ′, it is often convenient to rewrite u′(u), which is linear in both the u and u′ arguments, as

u′(u) = 〈u′, u〉. (1.4.23)

Then our identification (1.4.22), made through setting

u′(u) = u(u′), u ∈ U, u′ ∈ U ′, (1.4.24)

simply says that the ‘pairing’ 〈·, ·〉 as given in (1.4.23) is symmetric:

〈u′, u〉 = 〈u, u′〉, u ∈ U, u′ ∈ U ′. (1.4.25)

For any non-empty subset S ⊂ U , the annihilator of S, denoted by S^0, is the subset of U ′ given by

S^0 = {u′ ∈ U ′ | 〈u′, u〉 = 0, ∀u ∈ S}. (1.4.26)

It is clear that S^0 is always a subspace of U ′ regardless of whether S is a subspace of U . Likewise, for any non-empty subset S′ ⊂ U ′, we can define the annihilator of S′, S′^0, as the subset

S′^0 = {u ∈ U | 〈u′, u〉 = 0, ∀u′ ∈ S′} (1.4.27)

of U . Of course, S′^0 is always a subspace of U .

Exercises

1.4.1 Let F be a field. Describe the dual spaces F′ and (F^2)′.

1.4.2 Let U be a finite-dimensional vector space. Prove that for any vectors u, v ∈ U (u ≠ v) there exists an element f ∈ U ′ such that f (u) ≠ f (v).

1.4.3 Let U be a finite-dimensional vector space and f, g ∈ U ′. Suppose that, for any v ∈ U , f (v) = 0 if and only if g(v) = 0. Show that f and g are linearly dependent.

1.4.4 Let F be a field and

V = {(x1, . . . , xn) ∈ F^n | x1 + · · · + xn = 0}. (1.4.28)

Show that any f ∈ V^0 may be expressed as

\[ f(x_1, \dots, x_n) = c \sum_{i=1}^{n} x_i, \quad (x_1, \dots, x_n) \in F^n, \tag{1.4.29} \]

for some c ∈ F.

1.4.5 Let U = P2 and f, g, h ∈ U ′ be defined by

f (p) = p(−1), g(p) = p(0), h(p) = p(1), p(t) ∈ P2. (1.4.30)

(i) Show that B′ = {f, g, h} is a basis for U ′.
(ii) Find a basis B of U which is dual to B′.

1.4.6 Let U be an n-dimensional vector space and V be an m-dimensional subspace of U . Show that the annihilator V^0 is an (n − m)-dimensional subspace of U ′. In other words, there holds the dimensionality equation

dim(V ) + dim(V^0) = dim(U). (1.4.31)

1.4.7 Let U be an n-dimensional vector space and V be an m-dimensional subspace of U . Show that V^00 = (V^0)^0 = V .

1.5 Constructions of vector spaces

Let U be a vector space and V, W its subspaces. It is clear that V ∩ W is also a subspace of U but V ∪ W in general may fail to be a subspace of U . The smallest subspace of U that contains V ∪ W should contain all vectors in U of the form v + w where v ∈ V and w ∈ W . Such an observation motivates the following definition.

Definition 1.9 If U is a vector space and V, W its subspaces, the sum of V and W , denoted by V + W , is the subspace of U given by

V + W ≡ {u ∈ U | u = v + w, v ∈ V, w ∈ W }. (1.5.1)

Checking that V + W is a subspace of U that is also the smallest subspace of U containing V ∪ W will be assigned as an exercise.

Now let B0 = {u1, . . . , uk} be a basis of V ∩ W . Expand it to obtain bases for V and W , respectively, of the forms

BV = {u1, . . . , uk, v1, . . . , vl}, BW = {u1, . . . , uk, w1, . . . , wm}. (1.5.2)

From the definition of V + W , we get

V + W = Span{u1, . . . , uk, v1, . . . , vl, w1, . . . , wm}. (1.5.3)

We can see that {u1, . . . , uk, v1, . . . , vl, w1, . . . , wm} is a basis of V + W . In fact, we only need to show that the vectors u1, . . . , uk, v1, . . . , vl, w1, . . . , wm are linearly independent. For this purpose, consider the linear relation

a1u1 + · · · + akuk + b1v1 + · · · + blvl + c1w1 + · · · + cmwm = 0, (1.5.4)

where a1, . . . , ak, b1, . . . , bl, c1, . . . , cm are scalars. We claim that

w = c1w1 + · · · + cmwm = 0. (1.5.5)

Otherwise, w ≠ 0 and (1.5.4) shows that w ∈ V . However, we already have w ∈ W . So w is a nonzero vector in V ∩ W = Span{u1, . . . , uk}, which is false since u1, . . . , uk, w1, . . . , wm are linearly independent. Thus (1.5.5) follows and c1 = · · · = cm = 0. Applying (1.5.5) in (1.5.4), we immediately have a1 = · · · = ak = b1 = · · · = bl = 0.

Therefore we can summarize the above discussion by concluding with the following theorem.

Theorem 1.10 The following general dimensionality formula

dim(V +W) = dim(V )+ dim(W)− dim(V ∩W) (1.5.6)

is valid for the sum of any two subspaces V and W of finite dimensions in a vector space.
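The formula (1.5.6) can be checked on a small example built exactly as in (1.5.2). The sketch below assumes subspaces of Q^4 chosen here for illustration, with B0 a basis of V ∩ W by construction (this is verified by hand, not by the code); `rank` is a hypothetical helper that computes the dimension of a span by Gaussian elimination.

```python
from fractions import Fraction


def rank(vectors):
    """Return the dimension of the span of the given vectors."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


# B0 is a basis of V ∩ W; it is extended to bases BV of V and BW of W.
B0 = [(1, 1, 0, 0)]
BV = B0 + [(1, 0, 0, 0), (0, 0, 1, 0)]   # V = {x4 = 0}
BW = B0 + [(0, 0, 0, 1)]                 # W = Span{(1,1,0,0), (0,0,0,1)}

dim_V = rank(BV)
dim_W = rank(BW)
dim_sum = rank(BV + BW)       # V + W is spanned by the union of the bases
dim_cap = rank(B0)
print(dim_sum, dim_V + dim_W - dim_cap)
```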

Of great importance is the situation when dim(V ∩ W) = 0, that is, V ∩ W = {0}. In this situation, the sum is called a direct sum, and rewritten as V ⊕ W . Thus, we have

dim(V ⊕W) = dim(V )+ dim(W). (1.5.7)

Direct sum has the following characteristic.

Theorem 1.11 The sum of two subspaces V and W of U is a direct sum if and only if each vector u in V + W may be expressed as the sum of a unique vector v ∈ V and a unique vector w ∈ W .

Proof Suppose first V ∩ W = {0}. For any u ∈ V + W , assume that it may be expressed as

u = v1 + w1 = v2 + w2, v1, v2 ∈ V, w1, w2 ∈ W. (1.5.8)

From (1.5.8), we have v1 − v2 = w2 − w1 which lies in V ∩ W . So v1 − v2 = w2 − w1 = 0 and the stated uniqueness follows.

Suppose that any u ∈ V + W can be expressed as u = v + w for some unique v ∈ V and w ∈ W . If V ∩ W ≠ {0}, take x ∈ V ∩ W with x ≠ 0. Then the zero vector 0 may be expressed as 0 = x + (−x) with x ∈ V and (−x) ∈ W , which violates the stated uniqueness since 0 = 0 + 0 with 0 ∈ V and 0 ∈ W , as well.

Let V be a subspace of an n-dimensional vector space U and BV = {v1, . . . , vk} be a basis of V . Extend BV to get a basis of U , say {v1, . . . , vk, w1, . . . , wl}, where k + l = n. Define

W = Span{w1, . . . , wl}. (1.5.9)

Then we obtain U = V ⊕ W . The subspace W is called a linear complement, or simply complement, of V in U . Besides, the subspaces V and W are said to be mutually complementary in U .

We may also build up a vector space from any two vector spaces, say V and W , over the same field F, as a direct sum of V and W . To see this, we construct vectors of the form

u = (v, w), v ∈ V, w ∈ W, (1.5.10)

and define vector addition and scalar multiplication component-wise by

u1 + u2 = (v1, w1) + (v2, w2) = (v1 + v2, w1 + w2), v1, v2 ∈ V, w1, w2 ∈ W, (1.5.11)

au = a(v, w) = (av, aw), v ∈ V, w ∈ W, a ∈ F. (1.5.12)

It is clear that the set U of all vectors of the form (1.5.10) equipped with the vector addition (1.5.11) and scalar multiplication (1.5.12) is a vector space over F. Naturally we may identify V and W with the subspaces of U given by

Ṽ = {(v, 0) | v ∈ V }, W̃ = {(0, w) | w ∈ W }. (1.5.13)

Of course, U = Ṽ ⊕ W̃. Thus, in a well-understood sense, we may also rewrite this relation as U = V ⊕ W as anticipated. Sometimes the vector space U so constructed is also referred to as the direct product of V and W and rewritten as U = V × W . In this way, R^2 may naturally be viewed as R × R, for example.

More generally, let V1, . . . , Vk be any k subspaces of a vector space U . In a similar manner we can define the sum

V = V1 + · · · + Vk, (1.5.14)

which is of course a subspace of U . Suggested by the above discussion, if each v ∈ V may be written as v = v1 + · · · + vk for uniquely determined v1 ∈ V1, . . . , vk ∈ Vk , then we say that V is the direct sum of V1, . . . , Vk and rewrite such a relation as

V = V1 ⊕ · · · ⊕ Vk. (1.5.15)

It should be noted that, when k ≥ 3, extra caution has to be exerted when checking whether the sum (1.5.14) is a direct sum. For example, the naive condition

Vi ∩ Vj = {0}, i ≠ j, i, j = 1, . . . , k, (1.5.16)

among V1, . . . , Vk , is not sufficient anymore to ensure (1.5.15).

To illustrate this subtlety, let us consider V = F^2 and take

\[ V_1 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\}, \quad V_2 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}, \quad V_3 = \operatorname{Span}\left\{ \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}. \tag{1.5.17} \]

It is clear that V1, V2, V3 satisfy (1.5.16) and V = V1 + V2 + V3 but V cannot be a direct sum of V1, V2, V3.
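The failure of uniqueness in this example can be made explicit. The following worked equation is a sketch under the assumption that 1 ≠ −1 in F (that is, the characteristic of F is not 2), so that V_2 and V_3 are indeed distinct lines.

```latex
% Two different decompositions of the same vector along V_1, V_2, V_3:
\[
\begin{pmatrix} 2 \\ 0 \end{pmatrix}
= 2\begin{pmatrix} 1 \\ 0 \end{pmatrix} + 0 + 0
= 0 + \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]
% Equivalently, V_1 \cap (V_2 + V_3) contains the nonzero vector (2, 0)^t,
% even though the pairwise intersections V_i \cap V_j (i \neq j) are all \{0\}.
```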

In fact, in such a general situation, a correct condition which replaces (1.5.16) is

\[ V_i \cap \left( \sum_{1 \le j \le k,\ j \neq i} V_j \right) = \{0\}, \quad i = 1, \dots, k. \tag{1.5.18} \]

In other words, when the condition (1.5.18) is fulfilled, then (1.5.15) is valid. The proof of this fact is left as an exercise.

Exercises

1.5.1 For

U = {(x1, . . . , xn) ∈ F^n | x1 + · · · + xn = 0},
V = {(x1, . . . , xn) ∈ F^n | x1 = · · · = xn}, (1.5.19)

prove that F^n = U ⊕ V .

1.5.2 Consider the vector space of all n × n matrices over a field F, denoted by F(n, n). As before, use FS(n, n) and FA(n, n) to denote the subspaces of symmetric and anti-symmetric matrices. Assume that the characteristic of F is not equal to 2. For any M ∈ F(n, n), rewrite M as

\[ M = \frac{1}{2}(M + M^t) + \frac{1}{2}(M - M^t). \tag{1.5.20} \]

Check that (1/2)(M + M^t) ∈ FS(n, n) and (1/2)(M − M^t) ∈ FA(n, n). Use this fact to prove the decomposition

F(n, n) = FS(n, n) ⊕ FA(n, n). (1.5.21)

What happens when the characteristic of F is 2 such as when F = Z2?

1.5.3 Show that FL(n, n) ∩ FU(n, n) = FD(n, n).

1.5.4 Use FL(n, n) and FU(n, n) in F(n, n) to give an example for the dimensionality relation (1.5.6).

1.5.5 Let X = C[a, b] (a, b ∈ R and a < b) be the vector space of all real-valued continuous functions over the interval [a, b] and

\[ Y = \left\{ f \in X \,\middle|\, \int_a^b f(t)\, dt = 0 \right\}. \tag{1.5.22} \]

(i) Prove that if R is identified with the set of all constant functions over [a, b] then X = R ⊕ Y .
(ii) For a = 0, b = 1, and f (t) = t^2 + t − 1, find the unique c ∈ R and g ∈ Y such that f (t) = c + g(t) for all t ∈ [0, 1].

1.5.6 Let U be a vector space and V, W its subspaces such that U = V + W . If X is a subspace of U , is it true that X = (X ∩ V ) + (X ∩ W)?

1.5.7 Let U be a vector space and V, W, X some subspaces of U such that

U = V ⊕ W, U = V ⊕ X. (1.5.23)

Can one infer W = X from the condition (1.5.23)? Explain why or why not.

1.5.8 Let V1, . . . , Vk be some subspaces of U and set V = V1 + · · · + Vk . Show that this sum is a direct sum if and only if one of the following statements is true.

(i) V1, . . . , Vk satisfy (1.5.18).

(ii) If non-overlapping sets of vectors

\[ \{v^1_1, \dots, v^1_{l_1}\}, \dots, \{v^k_1, \dots, v^k_{l_k}\} \tag{1.5.24} \]

are bases of V1, . . . , Vk , respectively, then the union

\[ \{v^1_1, \dots, v^1_{l_1}\} \cup \cdots \cup \{v^k_1, \dots, v^k_{l_k}\} \tag{1.5.25} \]

is a basis of V .

(iii) There holds the dimensionality relation

dim(V ) = dim(V1) + · · · + dim(Vk). (1.5.26)

1.6 Quotient spaces

In order to motivate the introduction of the concept of quotient spaces, we first consider a concrete example in R^2.

Let v ∈ R2 be any nonzero vector. Then

V = Span{v} (1.6.1)

represents the line passing through the origin and along (or opposite to) the direction of v. More generally, for any u ∈ R^2, the coset

[u] = {u + w | w ∈ V } = {x ∈ R^2 | x − u ∈ V } (1.6.2)

represents the line passing through the vector u and parallel to the vector v.

Naturally, we define [u1] + [u2] = {x + y | x ∈ [u1], y ∈ [u2]} and claim

[u1] + [u2] = [u1 + u2]. (1.6.3)

In fact, let z ∈ [u1] + [u2]. Then there exist x ∈ [u1] and y ∈ [u2] such that z = x + y. Rewrite x, y as x = u1 + w1, y = u2 + w2 for some w1, w2 ∈ V . Hence z = (u1 + u2) + (w1 + w2), which implies z ∈ [u1 + u2]. Conversely, if z ∈ [u1 + u2], then there is some w ∈ V such that z = (u1 + u2) + w = (u1 + w) + u2. Since u1 + w ∈ [u1] and u2 ∈ [u2], we see that z ∈ [u1] + [u2]. Hence the claim follows.

From the property (1.6.3), we see clearly that the coset [0] = V serves as an additive zero element among the set of all cosets.

Similarly, we may also naturally define a[u] = {ax | x ∈ [u]} for a ∈ R where a ≠ 0. Note that this last restriction is necessary because otherwise 0[u] would be a single-point set consisting of the zero vector only, rather than the coset [0] = V . We claim

a[u] = [au], a ∈ R \ {0}. (1.6.4)

In fact, if z ∈ a[u], there is some x ∈ [u] such that z = ax. Since x ∈ [u], there is some w ∈ V such that x = u + w. So z = au + aw which implies z ∈ [au]. Conversely, if z ∈ [au], then there is some w ∈ V such that z = au + w. Since z = a(u + a^(−1)w), we get z ∈ a[u]. So (1.6.4) is established.

Since the coset [0] is already seen to be the additive zero when adding cosets, we are prompted to define

0[u] = [0]. (1.6.5)

Note that (1.6.4) and (1.6.5) may be collectively rewritten as

a[u] = [au], a ∈ R, u ∈ R2. (1.6.6)

We may examine how the above introduced addition between cosets and scalar multiplication with cosets make the set of all cosets into a vector space over R, denoted by R^2/V , and called the quotient space of R^2 modulo V . As investigated, the geometric meaning of R^2/V is that it is the set of all the lines in R^2 parallel to the vector v and that these lines can be added and multiplied by real scalars so that the set of lines enjoys the structure of a real vector space.

There is no difficulty extending the discussion to the case of R^3 with V a line or plane through the origin.

More generally, the above quotient-space construction may be formulated as follows.

Definition 1.12 Let U be a vector space over a field F and V a subspace of U . The set of cosets represented by u ∈ U given by

[u] = {u+ w |w ∈ V } = {u} + V ≡ u+ V, (1.6.7)

equipped with addition and scalar multiplication defined by

[u] + [v] = [u+ v], u, v ∈ U, a[u] = [au], a ∈ F, u ∈ U, (1.6.8)

forms a vector space over F, called the quotient space of U modulo V , and is denoted by U/V .

Let BV = {v1, . . . , vk} be a basis of V . Extend BV to get a basis of U , say

BU = {v1, . . . , vk, u1, . . . , ul}. (1.6.9)

We claim that {[u1], . . . , [ul]} forms a basis for U/V . In fact, it is evident that these vectors span U/V . So we only need to show their linear independence.

Consider the relation

a1[u1] + · · · + al[ul] = 0, a1, . . . , al ∈ F. (1.6.10)

Note that 0 = [0] in U/V . So a1u1 + · · · + alul ∈ V . Thus there are scalars b1, . . . , bk ∈ F such that

a1u1 + · · · + alul = b1v1 + · · · + bkvk. (1.6.11)

Since the vectors in BU are linearly independent, we have a1 = · · · = al = b1 = · · · = bk = 0 and the claimed linear independence follows.

As a consequence of the foregoing discussion, we arrive at the following basic dimensionality relation

dim(U/V ) + dim(V ) = dim(U). (1.6.12)

Note that the construction made to arrive at (1.6.12) demonstrates a practical way to find a basis for U/V . The quantity dim(U/V ) = dim(U) − dim(V ) is sometimes called the codimension of the subspace V in U . The quotient space U/V may be viewed as constructed from the space U after ‘collapsing’ or ‘suppressing’ its subspace V .
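For the motivating case U = R^2 and V = Span{v}, the relation (1.6.12) says that R^2/V is one-dimensional, so each coset can be labeled by a single number. The sketch below uses one such label, chosen here for illustration: the quantity v1·y − v2·x, which vanishes exactly on V and is therefore constant on each coset. The helper names and sample vectors are hypothetical.

```python
def coset_label(u, v):
    """Return a number that is constant on the coset u + Span{v} in R^2."""
    return v[0] * u[1] - v[1] * u[0]


def same_coset(u1, u2, v):
    """u1 and u2 represent the same coset iff u1 - u2 lies in Span{v}."""
    return coset_label(u1, v) == coset_label(u2, v)


def add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def scale(a, x):
    return (a * x[0], a * x[1])


v = (2, 1)                      # V = Span{v}
u1, u2 = (1, 0), (0, 3)         # representatives of two cosets
u1_alt = add(u1, scale(3, v))   # another representative of [u1]
u2_alt = add(u2, scale(-2, v))  # another representative of [u2]

# The sum and scalar multiple of cosets do not depend on the
# representatives chosen, in line with (1.6.3) and (1.6.6).
sum_ok = same_coset(add(u1, u2), add(u1_alt, u2_alt), v)
scale_ok = same_coset(scale(5, u1), scale(5, u1_alt), v)
print(sum_ok, scale_ok)
```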

Exercises

1.6.1 Let V be the subspace of R^2 given by V = Span{(1, −1)}.

(i) Draw V in R^2.
(ii) Draw the cosets

S1 = (1, 1) + V, S2 = (2, 1) + V. (1.6.13)

(iii) Describe the quotient space R^2/V .
(iv) Draw the coset S3 = (−2)S1 + S2.
(v) Determine whether S3 is equal to (−1, 0) + V (explain why).

1.6.2 Let V be the subspace of P2 (set of polynomials of degrees up to 2 and with coefficients in R) satisfying

\[ \int_{-1}^{1} p(t)\, dt = 0, \quad p(t) \in P_2. \tag{1.6.14} \]

(i) Find a basis to describe V .
(ii) Find a basis for the quotient space P2/V and verify (1.6.12).

1.6.3 Describe the plane given in terms of the coordinates (x, y, z) ∈ R^3 by the equation

ax + by + cz = d, (a, b, c) ≠ (0, 0, 0), (1.6.15)

where a, b, c, d ∈ R are constants, as a coset in R^3 and as a point in a quotient space.

1.6.4 Let U be a vector space and V and W some subspaces of U . For u ∈ U , use [u]V and [u]W to denote the cosets u + V and u + W , respectively. Show that, if V ⊂ W and u1, . . . , uk ∈ U , then the linear dependence of [u1]V , . . . , [uk]V implies that of [u1]W , . . . , [uk]W . In particular, if U is finite dimensional, then dim(U/V ) ≥ dim(U/W), if V ⊂ W , as may also be seen from using (1.6.12).

1.7 Normed spaces

It will be desirable to be able to evaluate the ‘length’ or ‘magnitude’ or ‘amplitude’ of any vector in a vector space. In other words, it will be useful to associate to each vector a quantity that resembles the notion of length of a vector in (say) R^3. Such a quantity is generically called a norm.

In this section, we take the field F to be either R or C.

Definition 1.13 Let U be a vector space over the field F. A norm over U is a correspondence ‖ · ‖ : U → R such that we have the following.

(1) (Positivity) ‖u‖ ≥ 0 for u ∈ U and ‖u‖ = 0 only for u = 0.
(2) (Homogeneity) ‖au‖ = |a|‖u‖ for a ∈ F and u ∈ U .
(3) (Triangle inequality) ‖u + v‖ ≤ ‖u‖ + ‖v‖ for u, v ∈ U .

A vector space equipped with a norm is called a normed space. If ‖ · ‖ is the specific norm of the normed space U , we sometimes spell this fact out by stating ‘normed space (U, ‖ · ‖)’.

Definition 1.14 Let (U, ‖ · ‖) be a normed space and {uk} a sequence in U . For some u0 ∈ U , we say that uk → u0 or {uk} converges to u0 as k → ∞ if

\[ \lim_{k \to \infty} \|u_0 - u_k\| = 0, \tag{1.7.1} \]

and the sequence {uk} is said to be a convergent sequence. The vector u0 is also said to be the limit of the sequence {uk}.

The notions of convergence and limit are essential for carrying out calculus in a normed space.

Let U be a finite-dimensional space. We can easily equip U with a norm. For example, assume that B = {u1, . . . , un} is a basis of U . For any u ∈ U , define

\[ \|u\|_1 = \sum_{i=1}^{n} |a_i|, \quad \text{where } u = \sum_{i=1}^{n} a_i u_i. \tag{1.7.2} \]

It is straightforward to verify that ‖ · ‖1 indeed defines a norm.

More generally, for p ≥ 1, we may set

\[ \|u\|_p = \left( \sum_{i=1}^{n} |a_i|^p \right)^{\frac{1}{p}}, \quad \text{where } u = \sum_{i=1}^{n} a_i u_i. \tag{1.7.3} \]

It can be shown that ‖ · ‖p also defines a norm over U . We will not check this fact. What interests us here, however, is the limit

\[ \lim_{p \to \infty} \|u\|_p = \max\{ |a_i| \mid i = 1, \dots, n \}. \tag{1.7.4} \]

To prove (1.7.4), we note that the right-hand side of (1.7.4) is simply |a_{i_0}| for some i_0 ∈ {1, . . . , n}. Thus, in view of (1.7.3), we have

\[ \|u\|_p \le \left( \sum_{i=1}^{n} |a_{i_0}|^p \right)^{\frac{1}{p}} = |a_{i_0}|\, n^{\frac{1}{p}}. \tag{1.7.5} \]

From (1.7.5), we obtain

\[ \limsup_{p \to \infty} \|u\|_p \le |a_{i_0}|. \tag{1.7.6} \]

On the other hand, using (1.7.3) again, we have

\[ \|u\|_p \ge \left( |a_{i_0}|^p \right)^{\frac{1}{p}} = |a_{i_0}|. \tag{1.7.7} \]

Thus,

\[ \liminf_{p \to \infty} \|u\|_p \ge |a_{i_0}|. \tag{1.7.8} \]

Therefore (1.7.4) is established. As a consequence, we are motivated to adopt the notation

\[ \|u\|_\infty = \max\{ |a_i| \mid i = 1, \dots, n \}, \quad \text{where } u = \sum_{i=1}^{n} a_i u_i, \tag{1.7.9} \]

and restate our result (1.7.4) more elegantly as

\[ \lim_{p \to \infty} \|u\|_p = \|u\|_\infty, \quad u \in U. \tag{1.7.10} \]
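The limit (1.7.10), together with the two-sided bound from (1.7.5) and (1.7.7), is easy to observe numerically. The sketch below works directly with a coordinate vector (a_1, . . . , a_n); the sample coordinates and the function names `p_norm` and `max_norm` are illustrative assumptions, and floating-point arithmetic is used, so the comparisons hold only up to rounding.

```python
def p_norm(coords, p):
    """Return (sum_i |a_i|^p)^(1/p) for the coordinates a_i, as in (1.7.3)."""
    return sum(abs(a) ** p for a in coords) ** (1.0 / p)


def max_norm(coords):
    """Return max_i |a_i|, as in (1.7.9)."""
    return max(abs(a) for a in coords)


coords = [3.0, -4.0, 1.0]
n = len(coords)

for p in (1, 2, 10, 100):
    lower = max_norm(coords)                   # bound (1.7.7)
    upper = max_norm(coords) * n ** (1.0 / p)  # bound (1.7.5)
    print(p, lower, p_norm(coords, p), upper)
```

As p grows, the factor n^(1/p) tends to 1, which squeezes the p-norm toward the maximum of the |a_i|.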

It is evident that ‖ · ‖∞ does define a norm over U.

Thus we have seen that there are many ways to introduce a norm over a vector space. So it will be important to be able to compare norms. For calculus, the most important thing with regard to a norm is the notion of convergence. In particular, assume there are two norms, say ‖ · ‖ and ‖ · ‖′, equipped over the vector space U. We hope to know whether convergence with respect to norm ‖ · ‖ implies convergence with respect to norm ‖ · ‖′. This desire motivates the introduction of the following concept.

Definition 1.15 Let U be a vector space and ‖ · ‖ and ‖ · ‖′ two norms over U. We say that ‖ · ‖ is stronger than ‖ · ‖′ if convergence with respect to norm ‖ · ‖ implies convergence with respect to norm ‖ · ‖′. More precisely, any convergent sequence {uk} with limit u0 in (U, ‖ · ‖) is also a convergent sequence with the same limit in (U, ‖ · ‖′).

Regarding the above definition, we have the following.


Theorem 1.16 Let U be a vector space and ‖ · ‖ and ‖ · ‖′ two norms over U. Then norm ‖ · ‖ is stronger than norm ‖ · ‖′ if and only if there is a constant C > 0 such that

‖u‖′ ≤ C‖u‖, ∀u ∈ U. (1.7.11)

Proof If (1.7.11) holds, then it is clear that ‖ · ‖ is stronger than ‖ · ‖′.

Suppose that ‖ · ‖ is stronger than ‖ · ‖′ but (1.7.11) does not hold. Thus there is a sequence {uk} in U such that uk ≠ 0 (k = 1, 2, . . . ) and

\[ \frac{\|u_k\|'}{\|u_k\|} \ge k, \quad k = 1, 2, \dots. \tag{1.7.12} \]

Define

\[ v_k = \frac{1}{\|u_k\|'}\, u_k, \quad k = 1, 2, \dots. \tag{1.7.13} \]

Then (1.7.12) yields the bounds

\[ \|v_k\| \le \frac{1}{k}, \quad k = 1, 2, \dots. \tag{1.7.14} \]

Consequently, vk → 0 as k → ∞ with respect to norm ‖ · ‖. However, according to (1.7.13), we have ‖vk‖′ = 1, k = 1, 2, . . . , so vk does not converge to 0 as k → ∞ with respect to norm ‖ · ‖′. This is a contradiction.

Definition 1.17 Let U be a vector space and ‖ · ‖ and ‖ · ‖′ two norms over U. We say that norms ‖ · ‖ and ‖ · ‖′ are equivalent if convergence in norm ‖ · ‖ is equivalent to convergence in norm ‖ · ‖′.

In view of Theorem 1.16, we see that norms ‖ · ‖ and ‖ · ‖′ are equivalent if and only if the inequality

C1‖u‖ ≤ ‖u‖′ ≤ C2‖u‖, ∀u ∈ U, (1.7.15)

holds true for some suitable constants C1, C2 > 0.

The following theorem regarding norms over finite-dimensional spaces is of fundamental importance.

Theorem 1.18 Any two norms over a finite-dimensional space are equivalent.

Proof Let U be an n-dimensional vector space and B = {u1, . . . , un} a basis for U. Define the norm ‖ · ‖1 by (1.7.2). Let ‖ · ‖ be any given norm over U. Then by the properties of norm we have

\[ \|u\| \le \sum_{i=1}^{n} |a_i| \|u_i\| \le \alpha_2 \sum_{i=1}^{n} |a_i| = \alpha_2 \|u\|_1, \tag{1.7.16} \]

where we have set

\[ \alpha_2 = \max\{ \|u_i\| \mid i = 1, \dots, n \}. \tag{1.7.17} \]

In other words, we have shown that ‖ · ‖1 is stronger than ‖ · ‖.

In order to show that ‖ · ‖ is also stronger than ‖ · ‖1, we need to prove the existence of a constant α1 > 0 such that

α1‖u‖1 ≤ ‖u‖, ∀u ∈ U. (1.7.18)

Suppose otherwise that (1.7.18) fails to be valid. In other words, the set of ratios

\[ \left\{ \frac{\|u\|}{\|u\|_1} \,\middle|\, u \in U,\ u \ne 0 \right\} \tag{1.7.19} \]

does not have a positive infimum. Then there is a sequence {vk} in U such that

\[ \frac{1}{k} \|v_k\|_1 \ge \|v_k\|, \quad v_k \ne 0, \quad k = 1, 2, \dots. \tag{1.7.20} \]

Now set

\[ w_k = \frac{1}{\|v_k\|_1}\, v_k, \quad k = 1, 2, \dots. \tag{1.7.21} \]

Then (1.7.20) implies that wk → 0 as k → ∞ with respect to ‖ · ‖.

On the other hand, we have ‖wk‖1 = 1 (k = 1, 2, . . . ). Express wk with respect to basis B as

\[ w_k = a_{1,k} u_1 + \cdots + a_{n,k} u_n, \quad a_{1,k}, \dots, a_{n,k} \in \mathbb{F}, \quad k = 1, 2, \dots. \tag{1.7.22} \]

The definition of norm ‖ · ‖1 then implies |ai,k| ≤ 1 (i = 1, . . . , n, k = 1, 2, . . . ). Hence, by the Bolzano–Weierstrass theorem, there is a subsequence of {k}, denoted by {ks}, such that ks → ∞ as s → ∞ and

\[ a_{i,k_s} \to \text{some } a_{i,0} \in \mathbb{F} \text{ as } s \to \infty, \quad i = 1, \dots, n. \tag{1.7.23} \]

Now set w0 = a1,0u1 + · · · + an,0un. Then ‖w0 − wks‖1 → 0 as s → ∞. Moreover, the triangle inequality gives us

\[ \|w_0\|_1 \ge \|w_{k_s}\|_1 - \|w_{k_s} - w_0\|_1 = 1 - \|w_{k_s} - w_0\|_1, \quad s = 1, 2, \dots. \tag{1.7.24} \]

Thus, letting s → ∞ in (1.7.24), we obtain ‖w0‖1 ≥ 1. In particular, w0 ≠ 0.

However, substituting u = w0 − wks in (1.7.16) and letting s → ∞, we see that wks → w0 as s → ∞ with respect to norm ‖ · ‖ as well. This is impossible because we already know that wk → 0 as k → ∞ with respect to norm ‖ · ‖, which would force w0 = 0 by the uniqueness of limits.


Summarizing the above study, we see that there are some constants α1, α2 > 0 such that

α1‖u‖1 ≤ ‖u‖ ≤ α2‖u‖1, ∀u ∈ U. (1.7.25)

Finally, let ‖ · ‖′ be another norm over U. Then we have some constants β1, β2 > 0 such that

β1‖u‖1 ≤ ‖u‖′ ≤ β2‖u‖1, ∀u ∈ U. (1.7.26)

Combining (1.7.25) and (1.7.26), we arrive at the desired conclusion

\[ \frac{\beta_1}{\alpha_2} \|u\| \le \|u\|' \le \frac{\beta_2}{\alpha_1} \|u\|, \quad \forall u \in U, \tag{1.7.27} \]

which establishes the equivalence of norms ‖ · ‖ and ‖ · ‖′ as stated.
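As a concrete instance of (1.7.25), one may take U = R^n with the standard basis, ‖ · ‖1 as in (1.7.2), and ‖ · ‖ the Euclidean norm. The sketch below is a numerical illustration under assumptions made here: α2 = 1 comes from (1.7.17), since each standard basis vector has Euclidean length 1, while α1 = 1/√n is the constant supplied by the Cauchy–Schwarz inequality rather than by the compactness argument of the proof. The random sampling only tests the bounds; it does not prove them.

```python
import math
import random


def norm1(a):
    """The norm (1.7.2) with respect to the standard basis."""
    return sum(abs(x) for x in a)


def norm2(a):
    """The Euclidean norm."""
    return math.sqrt(sum(x * x for x in a))


random.seed(0)
n = 5
alpha1 = 1.0 / math.sqrt(n)   # assumed constant, from Cauchy-Schwarz
alpha2 = 1.0                  # max of the Euclidean norms of e_1, ..., e_n

ratios = []
for _ in range(1000):
    u = [random.uniform(-1.0, 1.0) for _ in range(n)]
    if norm1(u) == 0.0:
        continue  # the ratio in (1.7.19) is only defined for u != 0
    ratios.append(norm2(u) / norm1(u))

# Every sampled ratio lies between alpha1 and alpha2, as in (1.7.25).
assert all(alpha1 - 1e-12 <= r <= alpha2 + 1e-12 for r in ratios)
print(min(ratios), max(ratios))
```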

As an immediate application of Theorem 1.18, we have the following.

Theorem 1.19 A finite-dimensional normed space (U, ‖ · ‖) is locally compact. That is, any bounded sequence in U contains a convergent subsequence.

Proof Let {u1, . . . , un} be any basis of U and introduce norm ‖ · ‖1 by the expression (1.7.2). Let {vk} be a bounded sequence in (U, ‖ · ‖). Then Theorem 1.18 says that {vk} is also bounded in (U, ‖ · ‖1). If we rewrite vk as

\[ v_k = a_{1,k} u_1 + \cdots + a_{n,k} u_n, \quad a_{1,k}, \dots, a_{n,k} \in \mathbb{F}, \quad k = 1, 2, \dots, \tag{1.7.28} \]

the definition of ‖ · ‖1 then implies that the sequences {ai,k} (i = 1, . . . , n) are all bounded in F. Thus the Bolzano–Weierstrass theorem indicates that there are subsequences {ai,ks} (i = 1, . . . , n) which converge to some ai,0 ∈ F (i = 1, . . . , n) as s → ∞. Consequently, setting v0 = a1,0u1 + · · · + an,0un, we have ‖v0 − vks‖1 → 0 as s → ∞. Thus ‖v0 − vks‖ ≤ C‖v0 − vks‖1 → 0 as s → ∞ as well.

Let (U, ‖ · ‖) be a finite-dimensional normed vector space and V a subspace of U. Consider the quotient space U/V. As an exercise, it may be shown that

‖[u]‖ = inf{‖v‖ | v ∈ [u]}, [u] ∈ U/V, u ∈ U, (1.7.29)

defines a norm over U/V .

Exercises

1.7.1 Consider the vector space C[a, b] of all real-valued continuous functions over the interval [a, b] where a, b ∈ R and a < b. Define two norms ‖ · ‖1 and ‖ · ‖∞ by setting

\[ \|u\|_1 = \int_a^b |u(t)|\, dt, \quad \|u\|_\infty = \max_{t \in [a,b]} |u(t)|, \quad \forall u \in C[a, b]. \tag{1.7.30} \]

Show that ‖ · ‖∞ is stronger than ‖ · ‖1 but not vice versa.

1.7.2 Over the vector space C[a, b] again, define norm ‖ · ‖p (p ≥ 1) by

\[ \|u\|_p = \left( \int_a^b |u(t)|^p\, dt \right)^{1/p}, \quad \forall u \in C[a, b]. \tag{1.7.31} \]

Show that ‖ · ‖∞ is stronger than ‖ · ‖p for any p ≥ 1 and prove that

\[ \lim_{p\to\infty} \|u\|_p = \|u\|_\infty, \quad \forall u \in C[a, b]. \tag{1.7.32} \]

1.7.3 Let (U, ‖ · ‖) be a normed space and use U′ to denote the dual space of U. For each u′ ∈ U′, define

\[ \|u'\|' = \sup\{ |u'(u)| \mid u \in U,\ \|u\| = 1 \}. \tag{1.7.33} \]

Show that ‖ · ‖′ defines a norm over U′.

1.7.4 Let U be a vector space and U′ its dual space which is equipped with a norm, say ‖ · ‖′. Define

\[ \|u\| = \sup\{ |u'(u)| \mid u' \in U',\ \|u'\|' = 1 \}. \tag{1.7.34} \]

Show that ‖ · ‖ defines a norm over U as well.

1.7.5 Prove that for any finite-dimensional normed space (U, ‖ · ‖) with a given subspace V we may indeed use (1.7.29) to define a norm for the quotient space U/V.

1.7.6 Let ‖ · ‖ denote the Euclidean norm of R2 which is given by

\[ \|u\| = \sqrt{a_1^2 + a_2^2}, \quad u = (a_1, a_2) \in \mathbb{R}^2. \tag{1.7.35} \]

Consider the subspace V = {(x, y) ∈ R2 | 2x − y = 0} and the coset [(−1, 1)] in R2 modulo V.

(i) Find the unique vector v ∈ R2 such that ‖v‖ = ‖[(−1, 1)]‖.

(ii) Draw the coset [(−1, 1)] in R2 and the vector v found in part (i) and explain the geometric content of the results.

2 Linear mappings

In this chapter we consider linear mappings over vector spaces. We begin by stating the definition and a discussion of the structural properties of linear mappings. We then introduce the notion of adjoint mappings and illustrate some of their applications. We next focus on linear mappings from a vector space into itself and study a series of important concepts such as invariance and reducibility, eigenvalues and eigenvectors, projections, nilpotent mappings, and polynomials of linear mappings. Finally we discuss the use of norms of linear mappings and present a few analytic applications.

2.1 Linear mappings

A linear mapping may be regarded as the simplest correspondence between two vector spaces. In this section we start our study with the definition of linear mappings. We then discuss the matrix representation of a linear mapping, composition of linear mappings, and the rank and nullity of a linear mapping.

2.1.1 Definition, examples, and notion of associated matrices

Let U and V be two vector spaces over the same field F. A linear mapping or linear map or linear operator is a correspondence T from U into V, written as T : U → V, satisfying the following.

(1) (Additivity) T (u1 + u2) = T (u1)+ T (u2), u1, u2 ∈ U .

(2) (Homogeneity) T (au) = aT (u), a ∈ F, u ∈ U .

A special implication of the homogeneity condition is that T(0) = 0. One may also say that a linear mapping ‘respects’ or preserves vector addition and scalar multiplication.

The set of all linear mappings from U into V will be denoted by L(U, V). For S, T ∈ L(U, V), we define S + T to be a mapping from U into V satisfying

(S + T )(u) = S(u)+ T (u), ∀u ∈ U. (2.1.1)

For any a ∈ F and T ∈ L(U, V), we define aT to be the mapping from U into V satisfying

(aT )(u) = aT (u), ∀u ∈ U. (2.1.2)

We can directly check that the mapping addition (2.1.1) and scalar-mapping multiplication (2.1.2) make L(U, V) a vector space over F. We adopt the notation L(U) = L(U, U).

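As an example, consider the space of matrices, F(m, n). For A = (aij) ∈ F(m, n), define

\[ T_A(x) = Ax = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{F}^n. \tag{2.1.3} \]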

Then TA ∈ L(Fn, Fm). Besides, for the standard basis e1, . . . , en of Fn, we have

\[ T_A(e_1) = \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix}, \quad \dots, \quad T_A(e_n) = \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix}. \tag{2.1.4} \]

In other words, the images of e1, . . . , en under the linear mapping TA are simply the column vectors of the matrix A, respectively.

Conversely, take any element T ∈ L(Fn, Fm). Let v1, . . . , vn ∈ Fm be images of e1, . . . , en under T such that

\[ T(e_1) = v_1 = \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix} = \sum_{i=1}^{m} a_{i1} e_i, \quad \dots, \quad T(e_n) = v_n = \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix} = \sum_{i=1}^{m} a_{in} e_i, \tag{2.1.5} \]

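where {e1, . . . , em} is the standard basis of Fm. Then any x = x1e1 + · · · + xnen has the image

\[ T(x) = T\left( \sum_{j=1}^{n} x_j e_j \right) = \sum_{j=1}^{n} x_j v_j = (v_1, \dots, v_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \tag{2.1.6} \]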

In other words, T can be identified with TA through the matrix A consisting of column vectors as images of e1, . . . , en under T given in (2.1.5).

It may also be examined that mapping addition and scalar-mapping multiplication correspond to matrix addition and scalar-matrix multiplication.

In this way, as vector spaces, F(m, n) and L(Fn,Fm) may be identified witheach other.
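The identification of a matrix with the mapping x ↦ Ax can be traced on a small example. The Python sketch below uses plain lists of rows; the helper names `apply` and `standard_basis` and the sample matrix are assumptions introduced for illustration only.

```python
def apply(A, x):
    """Return the column vector Ax, with A stored as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


def standard_basis(n):
    """Return the standard basis e_1, ..., e_n of F^n as lists."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


# A sample matrix in F(2, 3), so that T_A maps F^3 into F^2.
A = [[1, 2, 3],
     [4, 5, 6]]

# Images of e_1, e_2, e_3 under T_A: these are the columns of A, as in (2.1.4).
columns = [apply(A, e) for e in standard_basis(3)]

# Linearity as in (2.1.6): T(x) = x_1 T(e_1) + x_2 T(e_2) + x_3 T(e_3).
x = [7, -1, 2]
recombined = [sum(x[j] * columns[j][i] for j in range(3)) for i in range(2)]
assert recombined == apply(A, x)
print(columns, recombined)
```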

In general, let BU = {u1, . . . , un} and BV = {v1, . . . , vm} be bases of the finite-dimensional vector spaces U and V over the field F, respectively. For any T ∈ L(U, V), we can write

\[ T(u_j) = \sum_{i=1}^{m} a_{ij} v_i, \quad a_{ij} \in \mathbb{F}, \quad i = 1, \dots, m, \quad j = 1, \dots, n. \tag{2.1.7} \]

Since the images T(u1), . . . , T(un) completely determine the mapping T, we see that the matrix A = (aij) completely determines the mapping T. In other words, to each mapping in L(U, V), there corresponds a unique matrix in F(m, n).

Conversely, for any A = (aij) ∈ F(m, n), we define T(u1), . . . , T(un) by (2.1.7). Moreover, for any x ∈ U given by x = x1u1 + · · · + xnun for x1, . . . , xn ∈ F, we set

\[ T(x) = T\left( \sum_{j=1}^{n} x_j u_j \right) = \sum_{j=1}^{n} x_j T(u_j). \tag{2.1.8} \]

It is easily checked that this makes T a well-defined element in L(U, V ).


Thus we again see, after specifying bases for U and V, that we may identify L(U, V) with F(m, n) in a natural way.

After the above general description of linear mappings, especially their identification with matrices, we turn our attention to some basic properties of linear mappings.

2.1.2 Composition of linear mappings

Let U, V, W be vector spaces over a field F of respective dimensions n, l, m. For T ∈ L(U, V) and S ∈ L(V, W), we can define the composition of T and S with the understanding

(S ◦ T )(x) = S(T (x)), x ∈ U. (2.1.9)

It is obvious that S ◦ T ∈ L(U, W). We now investigate the matrix of S ◦ T in terms of the matrices of S and T.

To this end, let BU = {u1, . . . , un}, BV = {v1, . . . , vl}, BW = {w1, . . . , wm} be the bases of U, V, W, respectively, and A = (aij) ∈ F(l, n) and B = (bij) ∈ F(m, l) be the correspondingly associated matrices of T and S, respectively. Then we have

\[ (S \circ T)(u_j) = S(T(u_j)) = S\left( \sum_{i=1}^{l} a_{ij} v_i \right) = \sum_{i=1}^{l} a_{ij} S(v_i) = \sum_{i=1}^{l} \sum_{k=1}^{m} a_{ij} b_{ki} w_k = \sum_{i=1}^{m} \sum_{k=1}^{l} b_{ik} a_{kj} w_i. \tag{2.1.10} \]

In other words, we see that if we take C = (cij) ∈ F(m, n) to be the matrix associated to the linear mapping S ◦ T with respect to the bases BU and BW, then C = BA. Hence the composition of linear mappings corresponds to the multiplication of their associated matrices, in the same order. For this reason, it is also customary to use ST to denote S ◦ T, when there is no risk of confusion.
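The relation C = BA can be checked on coordinates. In the following Python sketch, the matrices `A` (for T) and `B` (for S), the test vector `x`, and the helpers `apply` and `matmul` are all illustrative assumptions; spaces are identified with coordinate columns via the chosen bases.

```python
def apply(M, x):
    """Return the column vector Mx, with M stored as a list of rows."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def matmul(B, A):
    """Return the matrix product BA, whose (i, j) entry is sum_k b_ik a_kj."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))]
            for i in range(len(B))]


# T : F^3 -> F^2 with matrix A in F(2, 3); S : F^2 -> F^3 with matrix B in F(3, 2).
A = [[1, 0, 2],
     [0, 1, -1]]
B = [[1, 1],
     [0, 2],
     [3, 0]]

C = matmul(B, A)        # matrix of S o T, as read off from (2.1.10)
x = [1, 2, 3]

# Applying T and then S agrees with applying the single matrix C = BA.
assert apply(B, apply(A, x)) == apply(C, x)
print(C)
```

Note the order: the matrix of S ◦ T is BA, not AB, which here would not even have the right shape to act on F^3.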

The composition of linear mappings obviously enjoys the associativity property

\[ R \circ (S \circ T) = (R \circ S) \circ T, \quad R \in L(W, X),\ S \in L(V, W),\ T \in L(U, V), \tag{2.1.11} \]

as can be seen from

\[ (R \circ (S \circ T))(u) = R((S \circ T)(u)) = R(S(T(u))) \tag{2.1.12} \]

and

\[ ((R \circ S) \circ T)(u) = (R \circ S)(T(u)) = R(S(T(u))) \tag{2.1.13} \]

for any u ∈ U.

Thus, when applying (2.1.11) to the situation of linear mappings between finite-dimensional vector spaces and using their associated matrix representations, we obtain another proof of the associativity of matrix multiplication.

2.1.3 Null-space, range, nullity, and rank

For T ∈ L(U, V ), the subset

N(T ) = {x ∈ U | T (x) = 0} (2.1.14)

in U is a subspace of U called the null-space of T, which is sometimes called the kernel of T, and denoted as Kernel(T) or T⁻¹(0). Here and in the sequel, note that, in general, T⁻¹(v) denotes the set of preimages of v ∈ V under T. That is,

T −1(v) = {x ∈ U | T (x) = v}. (2.1.15)

Besides, the subset

R(T ) = {v ∈ V | v = T (x) for some x ∈ U} (2.1.16)

is a subspace of V called the range of T, which is sometimes called the image of U under T, and denoted as Image(T) or T(U).

For T ∈ L(U, V), we say that T is one-to-one, or 1-1, or injective, if T(x) ≠ T(y) whenever x, y ∈ U with x ≠ y. It is not hard to show that T is 1-1 if and only if N(T) = {0}. We say that T is onto or surjective if R(T) = V.

A basic problem in linear algebra is to investigate whether the equation

T (x) = v (2.1.17)

has a solution x in U for a given vector v ∈ V. From the meaning of R(T), we see that (2.1.17) has a solution if and only if v ∈ R(T), and the solution is unique if and only if N(T) = {0}. More generally, if v ∈ R(T) and u ∈ U is any particular solution of (2.1.17), then the set of all solutions of (2.1.17) is simply the coset

\[ [u] = \{u\} + N(T) = u + N(T). \tag{2.1.18} \]
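The coset description of the solution set can be seen on a small system. In the Python sketch below, the matrix `A`, the right-hand side `v`, the particular solution `u`, and the null-space vector `z` are hand-picked assumptions for illustration; the code only confirms that every u + tz that is tried solves the same equation.

```python
def apply(A, x):
    """Return the column vector Ax, with A stored as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


# T : F^3 -> F^2 given by x -> Ax.
A = [[1, 1, 0],
     [0, 0, 1]]
v = [2, 5]

u = [2, 0, 5]    # a particular solution of T(x) = v
z = [1, -1, 0]   # a vector spanning the null-space N(T)

assert apply(A, u) == v
assert apply(A, z) == [0, 0]

# Every element u + t z of the coset u + N(T) solves T(x) = v as well.
for t in range(-2, 3):
    x = [u[i] + t * z[i] for i in range(3)]
    assert apply(A, x) == v
```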

Thus, in terms of N(T) and R(T), we may understand the structure of the set of solutions of (2.1.17) completely.

First, as the set of cosets, we have

\[ U/N(T) = \{ T^{-1}(v) \mid v \in R(T) \}. \tag{2.1.19} \]

Furthermore, for v1, v2 ∈ R(T), take u1, u2 ∈ U such that T(u1) = v1, T(u2) = v2. Then [u1] = T⁻¹(v1), [u2] = T⁻¹(v2), and [u1] + [u2] = [u1 + u2] = T⁻¹(v1 + v2); for a ∈ F, v ∈ R(T), and u ∈ T⁻¹(v), we have au ∈ T⁻¹(av) and a[u] = [au] = T⁻¹(av), which naturally give us the construction of U/N(T) as a vector space.

Next, we consider the quotient space V/R(T), referred to as the cokernel of T, written as Cokernel(T). Recall that V/R(T) is the set of non-overlapping cosets in V modulo R(T) so that exactly [0] = R(T). Therefore we have

\[ V \setminus R(T) = \bigcup [v], \tag{2.1.20} \]

where the union of cosets on the right-hand side of (2.1.20) is made over all [v] ∈ V/R(T) except the zero element [0]. In other words, the union of all cosets in V/R(T) \ {[0]} gives us the set of all such vectors v in V that the equation (2.1.17) fails to have a solution in U. With set notation, this last statement is

\[ \{ v \in V \mid T^{-1}(v) = \emptyset \} = \bigcup [v]. \tag{2.1.21} \]

Thus, loosely speaking, the quotient space or cokernel V/R(T) measures the ‘size’ of the set of vectors in V that are ‘missed’ by the image of U under T.

Definition 2.1 Let U and V be finite-dimensional vector spaces over a field F. The nullity and rank of a linear mapping T : U → V, denoted by n(T) and r(T), respectively, are the dimensions of N(T) and R(T). That is,

n(T ) = dim(N(T )), r(T ) = dim(R(T )). (2.1.22)

As a consequence of this definition, we see that T is 1-1 if and only if n(T) = 0 and T is onto if and only if r(T) = dim(V) or dim(Cokernel(T)) = 0.

The following simple theorem will be of wide usefulness.

Theorem 2.2 Let U,V be vector spaces over a field F and T ∈ L(U, V ).

(1) If v1, . . . , vk ∈ V are linearly independent and u1, . . . , uk ∈ U are such that T(u1) = v1, . . . , T(uk) = vk, then u1, . . . , uk are linearly independent as well. In other words, the preimages of linearly independent vectors are also linearly independent.

(2) If N(T) = {0} and u1, . . . , uk ∈ U are linearly independent, then v1 = T(u1), . . . , vk = T(uk) ∈ V are linearly independent as well. In other words, the images of linearly independent vectors under a 1-1 linear mapping are also linearly independent.

Proof (1) Consider the relation a1u1 + · · · + akuk = 0, a1, . . . , ak ∈ F. Applying T to the above relation, we get a1v1 + · · · + akvk = 0. Hence a1 = · · · = ak = 0.

(2) We similarly consider a1v1 + · · · + akvk = 0. This relation immediately gives us T(a1u1 + · · · + akuk) = 0. Using N(T) = {0}, we deduce a1u1 + · · · + akuk = 0. Thus the linear independence of u1, . . . , uk implies a1 = · · · = ak = 0.

The fact stated in the following theorem is known as the nullity-rank equation or simply rank equation.

Theorem 2.3 Let U, V be finite-dimensional vector spaces over a field F and T ∈ L(U, V). Then

n(T )+ r(T ) = dim(U). (2.1.23)

Proof Let {u1, . . . , uk} be a basis of N(T). We then expand it to get a basis for the full space U written as {u1, . . . , uk, w1, . . . , wl}. Since T(u1) = · · · = T(uk) = 0, we have

R(T) = Span{T(w1), . . . , T(wl)}. (2.1.24)

We now show that T(w1), . . . , T(wl) form a basis for R(T) by establishing their linear independence. To this end, consider b1T(w1) + · · · + blT(wl) = 0 for some b1, . . . , bl ∈ F. Hence T(b1w1 + · · · + blwl) = 0 or b1w1 + · · · + blwl ∈ N(T). So there are a1, . . . , ak ∈ F such that b1w1 + · · · + blwl = a1u1 + · · · + akuk. Since u1, . . . , uk, w1, . . . , wl are linearly independent, we arrive at a1 = · · · = ak = b1 = · · · = bl = 0. In particular, r(T) = l. Since n(T) = k and dim(U) = k + l, the rank equation is valid.
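The construction in the proof above can be replayed on a concrete matrix. The Python sketch below is an illustration under assumptions made here: the sample matrix `A`, the null-space vector `z`, and the completing vectors `w1`, `w2` were chosen by hand, and `rank` is a small helper performing Gaussian elimination over exact fractions.

```python
from fractions import Fraction


def rank(rows):
    """Return the rank of a matrix (list of rows) by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [M[i][j] - factor * M[r][j] for j in range(n_cols)]
        r += 1
        if r == n_rows:
            break
    return r


def apply(A, x):
    """Return the column vector Ax, with A stored as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]


# T : F^3 -> F^3 given by x -> Ax; the second row is twice the first.
A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]

z = [1, 1, -1]                    # spans N(T), so k = n(T) = 1
w1, w2 = [1, 0, 0], [0, 1, 0]     # completes {z} to a basis of F^3

assert apply(A, z) == [0, 0, 0]
assert rank([z, w1, w2]) == 3                 # {z, w1, w2} is a basis of U
assert rank([apply(A, w1), apply(A, w2)]) == 2  # T(w1), T(w2) are independent
assert rank(A) == 2                           # r(T) = l = 2, so 1 + 2 = dim(U)
```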

As an immediate application of the above theorem, we have the following.

Theorem 2.4 Let U, V be finite-dimensional vector spaces over a field F. If there is some T ∈ L(U, V) such that T is both 1-1 and onto, then there must hold dim(U) = dim(V). Conversely, if dim(U) = dim(V), then there is some T ∈ L(U, V) which is both 1-1 and onto.

Proof If T is 1-1 and onto, then n(T) = 0 and r(T) = dim(V). Thus dim(U) = dim(V) follows from the rank equation.


Conversely, assume dim(U) = dim(V) = n and let {u1, . . . , un} and {v1, . . . , vn} be any bases of U and V, respectively. We can define a unique linear mapping T : U → V by specifying the images of u1, . . . , un under T to be v1, . . . , vn. That is,

T(u1) = v1, . . . , T(un) = vn, (2.1.25)

so that

\[ T(u) = \sum_{i=1}^{n} a_i v_i, \quad \forall u = \sum_{i=1}^{n} a_i u_i \in U. \tag{2.1.26} \]

Since r(T ) = n, the rank equation implies n(T ) = 0. So T is 1-1 as well.

We can slightly modify the proof of Theorem 2.4 to prove the following.

Theorem 2.5 Let U, V be finite-dimensional vector spaces over a field F so that dim(U) = dim(V). If T ∈ L(U, V) is either 1-1 or onto, then T must be both 1-1 and onto. In this situation, there is a unique S ∈ L(V, U) such that S ◦ T : U → U and T ◦ S : V → V are identity mappings, denoted by IU and IV, respectively, satisfying IU(u) = u, ∀u ∈ U and IV(v) = v, ∀v ∈ V.

Proof Suppose dim(U) = dim(V) = n. If T ∈ L(U, V) is 1-1, then n(T) = 0. So the rank equation gives us r(T) = n = dim(V). Hence T is onto. If T ∈ L(U, V) is onto, then r(T) = n = dim(U). So the rank equation gives us n(T) = 0. Hence T is 1-1. Now let {u1, . . . , un} be any basis of U. Since T is 1-1 and onto, we know that v1, . . . , vn ∈ V defined by (2.1.25) form a basis for V. Define now S ∈ L(V, U) by setting

S(v1) = u1, . . . , S(vn) = un, (2.1.27)

so that

\[ S(v) = \sum_{i=1}^{n} b_i u_i, \quad \forall v = \sum_{i=1}^{n} b_i v_i \in V. \tag{2.1.28} \]

Then it is clear that S ◦ T = IU and T ◦ S = IV. Thus the stated existence of S ∈ L(V, U) follows.

If R ∈ L(V, U) is another mapping such that R ◦ T = IU and T ◦ R = IV, then the associativity of the composition of linear mappings (2.1.11) gives us the result R = R ◦ IV = R ◦ (T ◦ S) = (R ◦ T) ◦ S = IU ◦ S = S. Thus the stated uniqueness of S ∈ L(V, U) follows as well.

Let U, V be vector spaces over a field F and T ∈ L(U, V). We say that T is invertible if there is some S ∈ L(V, U) such that S ◦ T = IU and T ◦ S = IV. Such a mapping S is necessarily unique and is called the inverse of T, denoted as T⁻¹.

The notion of inverse can be slightly relaxed to something referred to as a left or right inverse: We call an element S ∈ L(V, U) a left inverse of T ∈ L(U, V) if

S ◦ T = IU ; (2.1.29)

an element R ∈ L(V,U) a right inverse of T ∈ L(U, V ) if

T ◦ R = IV . (2.1.30)

It is interesting that if T is known to be invertible, then the left and right inverses coincide and are simply the inverse of T. To see this, we let T⁻¹ ∈ L(V, U) denote the inverse of T and use the associativity of composition of mappings to obtain from (2.1.29) and (2.1.30) the results

\[ S = S \circ I_V = S \circ (T \circ T^{-1}) = (S \circ T) \circ T^{-1} = I_U \circ T^{-1} = T^{-1}, \tag{2.1.31} \]

and

\[ R = I_U \circ R = (T^{-1} \circ T) \circ R = T^{-1} \circ (T \circ R) = T^{-1} \circ I_V = T^{-1}, \tag{2.1.32} \]

respectively.

It is clear that, regarding T ∈ L(U, V), the condition (2.1.29) implies that T is 1-1, and the condition (2.1.30) implies that T is onto. In view of Theorem 2.5, we see that when U, V are finite-dimensional and dim(U) = dim(V), either condition implies that T is invertible. Thus we have S = R = T⁻¹. On the other hand, if dim(U) ≠ dim(V), T can never be invertible and the notion of left and right inverses is of separate interest.

As an example, we consider the linear mappings S : F3 → F2 and T : F2 → F3 associated with the matrices

\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \tag{2.1.33} \]

respectively, according to (2.1.3). Then we may check that S is a left inverse of T (thus T is a right inverse of S). However, it is absurd to talk about the invertibility of these mappings.
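The claim about the matrices in (2.1.33) can be verified by direct multiplication. In the Python sketch below, the helpers `matmul` and `identity` are assumptions introduced for illustration; the matrices `A` and `B` are those of (2.1.33).

```python
def matmul(P, Q):
    """Return the matrix product PQ, with matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 0, 0],
     [0, 1, 0]]      # matrix of S : F^3 -> F^2
B = [[1, 0],
     [0, 1],
     [0, 0]]         # matrix of T : F^2 -> F^3

# S o T is the identity on F^2, so S is a left inverse of T.
assert matmul(A, B) == identity(2)

# T o S is not the identity on F^3: it annihilates the third standard vector.
assert matmul(B, A) != identity(3)
print(matmul(B, A))
```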

Let U and V be two vector spaces over the same field. An invertible element T ∈ L(U, V) (if it exists) is also called an isomorphism. If there is an isomorphism between U and V, they are said to be isomorphic to each other, written as U ≈ V.

As a consequence of Theorem 2.4, we see that two finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension.

Let U, V be finite-dimensional vector spaces over a field F and {u1, . . . , un}, {v1, . . . , vm} any bases of U, V, respectively. For fixed i = 1, . . . , m, j = 1, . . . , n, define Tij ∈ L(U, V) by setting

\[ T_{ij}(u_j) = v_i; \quad T_{ij}(u_k) = 0, \quad 1 \le k \le n,\ k \ne j. \tag{2.1.34} \]

It is clear that {Tij | i = 1, . . . , m, j = 1, . . . , n} is a basis of L(U, V). In particular, we have

dim(L(U, V)) = mn. (2.1.35)

Thus naturally L(U, V) ≈ F(m, n). In the special situation when V = F we have L(U, F) = U′. Thus dim(L(U, F)) = dim(U′) = dim(U) = n as obtained earlier.

Exercises

2.1.1 For A = (aij) ∈ F(m, n), define a mapping MA : Fm → Fn by setting

\[ M_A(x) = xA = (x_1, \dots, x_m) \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}, \quad x \in \mathbb{F}^m, \tag{2.1.36} \]

where Fl is taken to be the vector space of F-valued l-component row vectors. Show that MA is linear and the row vectors of the matrix A are the images of the standard vectors e1, . . . , em of Fm, again taken to be row vectors. Describe, as vector spaces, how F(m, n) may be identified with L(Fm, Fn).

2.1.2 Let U, V be vector spaces over the same field F and T ∈ L(U, V). Prove that T is 1-1 if and only if N(T) = {0}.

2.1.3 Let U, V be vector spaces over the same field F and T ∈ L(U, V). If Y is a subspace of V, is the subset of U given by

\[ X = \{ x \in U \mid T(x) \in Y \} \equiv T^{-1}(Y), \tag{2.1.37} \]

necessarily a subspace of U?

2.1.4 Let U, V be finite-dimensional vector spaces and T ∈ L(U, V). Prove that r(T) ≤ dim(U). In particular, if T is onto, then dim(U) ≥ dim(V).

2.1.5 Prove that if T ∈ L(U, V) satisfies (2.1.29) or (2.1.30) then T is 1-1 or onto, respectively.

2.1.6 Consider A ∈ F(n, n). Prove that, if there is B or C ∈ F(n, n) such that either AB = In or CA = In, then A is invertible and B = A⁻¹ or C = A⁻¹.

2.1.7 Let U be an n-dimensional vector space (n ≥ 2) and U′ its dual space. For f ∈ U′ recall that f⁰ = N(f) = {u ∈ U | f(u) = 0}. Let g ∈ U′ be such that f and g are linearly independent. Hence f⁰ ≠ g⁰.

(i) Show that U = f⁰ + g⁰ must hold.

(ii) Establish dim(f⁰ ∩ g⁰) = n − 2.

2.1.8 For T ∈ L(U), where U is a finite-dimensional vector space, show that N(T²) = N(T) if and only if R(T²) = R(T).

2.1.9 Let U, V be finite-dimensional vector spaces and S, T ∈ L(U, V). Show that

\[ r(S + T) \le r(S) + r(T). \tag{2.1.38} \]

2.1.10 Let U, V, W be finite-dimensional vector spaces over the same field and T ∈ L(U, V), S ∈ L(V, W). Establish the rank and nullity inequalities

\[ r(S \circ T) \le \min\{ r(S), r(T) \}, \quad n(S \circ T) \le n(S) + n(T). \tag{2.1.39} \]

2.1.11 Let A, B ∈ F(n, n) and A or B be nonsingular. Show that r(AB) = min{r(A), r(B)}. In particular, if A = BC or A = CB where C ∈ F(n, n) is nonsingular, then r(A) = r(B).

2.1.12 Let A ∈ F(m, n) and B ∈ F(n, m). Prove that AB ∈ F(m, m) must be singular when m > n.

2.1.13 Let T ∈ L(U, V) and S ∈ L(V, W) where dim(U) = n and dim(V) = m. Prove that

\[ r(S \circ T) \ge r(S) + r(T) - m. \tag{2.1.40} \]

(This result is also known as the Sylvester inequality.)

2.1.14 Let U, V, W, X be finite-dimensional vector spaces over the same field and T ∈ L(U, V), S ∈ L(V, W), R ∈ L(W, X). Establish the rank inequality

\[ r(R \circ S) + r(S \circ T) \le r(S) + r(R \circ S \circ T). \tag{2.1.41} \]

(This result is also known as the Frobenius inequality.) Note that (2.1.40) may be deduced as a special case of (2.1.41).

2.1.15 Let U be a vector space over a field F and {v1, . . . , vk} and {w1, . . . , wl} be two sets of linearly independent vectors. That is, the dimensions of the subspaces V = Span{v1, . . . , vk} and W = Span{w1, . . . , wl} of U are k and l, respectively. Consider the subspace of F^{k+l} defined by

\[ S = \left\{ (y_1, \dots, y_k, z_1, \dots, z_l) \in \mathbb{F}^{k+l} \,\middle|\, \sum_{i=1}^{k} y_i v_i + \sum_{j=1}^{l} z_j w_j = 0 \right\}. \tag{2.1.42} \]

Show that dim(S) = dim(V ∩ W).

2.1.16 Let T ∈ L(U, V) be invertible and U = U1 ⊕ · · · ⊕ Uk. Show that there holds V = V1 ⊕ · · · ⊕ Vk where Vi = T(Ui), i = 1, . . . , k.

2.1.17 Let U be finite-dimensional and T ∈ L(U). Show that U = N(T) ⊕ R(T) if and only if N(T) ∩ R(T) = {0}.

2.2 Change of basis

It will be important to investigate how the associated matrix changes with respect to a change of basis for a linear mapping.

Let {u1, . . . , un} and {ũ1, . . . , ũn} be two bases of the n-dimensional vector space U over a field F. A change of bases from {u1, . . . , un} to {ũ1, . . . , ũn} is simply a linear mapping R ∈ L(U, U) such that

\[ R(u_j) = \tilde{u}_j, \quad j = 1, \dots, n. \tag{2.2.1} \]

Following the study of the previous section, we know that R is invertible. Moreover, if we rewrite (2.2.1) in a matrix form, we have

\[ \tilde{u}_k = \sum_{j=1}^{n} b_{jk} u_j, \quad k = 1, \dots, n, \tag{2.2.2} \]

where the matrix B = (bjk) ∈ F(n, n) is called the basis transition matrix from the basis {u1, . . . , un} to the basis {ũ1, . . . , ũn}, which is necessarily invertible.

Let V be an m-dimensional vector space over F and take T ∈ L(U, V). Let A = (aij) and Ã = (ãij) be the m × n matrices of the linear mapping T associated with the pairs of the bases

\[ \{u_1, \dots, u_n\} \text{ and } \{v_1, \dots, v_m\}, \quad \{\tilde{u}_1, \dots, \tilde{u}_n\} \text{ and } \{v_1, \dots, v_m\}, \tag{2.2.3} \]

of the spaces U and V, respectively. Thus, we have

\[ T(u_j) = \sum_{i=1}^{m} a_{ij} v_i, \quad T(\tilde{u}_j) = \sum_{i=1}^{m} \tilde{a}_{ij} v_i, \quad j = 1, \dots, n. \tag{2.2.4} \]

Combining (2.2.2) and (2.2.4), we obtain

\[ \sum_{i=1}^{m} \tilde{a}_{ij} v_i = T(\tilde{u}_j) = T\left( \sum_{k=1}^{n} b_{kj} u_k \right) = \sum_{k=1}^{n} b_{kj} \sum_{i=1}^{m} a_{ik} v_i = \sum_{i=1}^{m} \left( \sum_{k=1}^{n} a_{ik} b_{kj} \right) v_i, \quad j = 1, \dots, n. \tag{2.2.5} \]

Therefore, we can read off the relation

\[ \tilde{a}_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \quad i = 1, \dots, m, \quad j = 1, \dots, n, \tag{2.2.6} \]

or

\[ \tilde{A} = AB. \tag{2.2.7} \]

Another way to arrive at (2.2.7) is to define T̃ ∈ L(U, V) by

\[ \tilde{T}(u_j) = \sum_{i=1}^{m} \tilde{a}_{ij} v_i, \quad j = 1, \dots, n. \tag{2.2.8} \]

Then, with the mapping R ∈ L(U, U) given in (2.2.1), the second relation in (2.2.4) simply says T̃ = T ◦ R, which leads to (2.2.7) immediately.

Similarly, we may consider a change of basis in V. Let {ṽ1, . . . , ṽm} be another basis of V which is related to the basis {v1, . . . , vm} through an invertible mapping S ∈ L(V, V) with

\[ S(v_i) = \tilde{v}_i, \quad i = 1, \dots, m, \tag{2.2.9} \]

given by the basis transition matrix C = (cil) ∈ F(m, m) so that

\[ \tilde{v}_i = \sum_{l=1}^{m} c_{li} v_l, \quad i = 1, \dots, m. \tag{2.2.10} \]

For T ∈ L(U, V), let A = (aij), Ã = (ãij) ∈ F(m, n) be the matrices of T associated with the pairs of the bases

\[ \{u_1, \dots, u_n\} \text{ and } \{v_1, \dots, v_m\}, \quad \{u_1, \dots, u_n\} \text{ and } \{\tilde{v}_1, \dots, \tilde{v}_m\}, \tag{2.2.11} \]

of the spaces U and V, respectively. Thus, we have

\[ T(u_j) = \sum_{i=1}^{m} a_{ij} v_i = \sum_{i=1}^{m} \tilde{a}_{ij} \tilde{v}_i, \quad j = 1, \dots, n. \tag{2.2.12} \]

Inserting (2.2.10) into (2.2.12), we obtain the relation

\[ a_{ij} = \sum_{l=1}^{m} c_{il} \tilde{a}_{lj}, \quad i = 1, \dots, m, \quad j = 1, \dots, n. \tag{2.2.13} \]

Rewriting (2.2.13) in its matrix form, we arrive at

\[ A = C\tilde{A}. \tag{2.2.14} \]

As before, we may also define T̃ ∈ L(U, V) by

\[ \tilde{T}(u_j) = \sum_{i=1}^{m} \tilde{a}_{ij} v_i, \quad j = 1, \dots, n. \tag{2.2.15} \]

Then

\[ (S \circ \tilde{T})(u_j) = \sum_{i=1}^{m} \tilde{a}_{ij} S(v_i) = \sum_{i=1}^{m} \tilde{a}_{ij} \tilde{v}_i, \quad j = 1, \dots, n. \tag{2.2.16} \]

In view of (2.2.12) and (2.2.16), we have S ◦ T̃ = T, which leads again to (2.2.14).

An important special case is when U = V with dim(U) = n. We investigate how the associated matrices of an element T ∈ L(U, U) with respect to two bases are related through the transition matrix between the two bases. For this purpose, let {u1, . . . , un} and {ũ1, . . . , ũn} be two bases of U related through the mapping R ∈ L(U, U) so that

\[ R(u_j) = \tilde{u}_j, \quad j = 1, \dots, n. \tag{2.2.17} \]

Now let A = (aij), Ã = (ãij) ∈ F(n, n) be the associated matrices of T with respect to the bases {u1, . . . , un}, {ũ1, . . . , ũn}, respectively. Thus

\[ T(u_j) = \sum_{i=1}^{n} a_{ij} u_i, \quad T(\tilde{u}_j) = \sum_{i=1}^{n} \tilde{a}_{ij} \tilde{u}_i, \quad j = 1, \dots, n. \tag{2.2.18} \]

Define T̃ ∈ L(U, U) so that

\[ \tilde{T}(u_j) = \sum_{i=1}^{n} \tilde{a}_{ij} u_i, \quad j = 1, \dots, n. \tag{2.2.19} \]

Then (2.2.17), (2.2.18), and (2.2.19) give us

\[ R \circ \tilde{T} \circ R^{-1} = T. \tag{2.2.20} \]

Thus, if we choose B = (bij) ∈ F(n, n) to represent the invertible mapping (2.2.17), that is,

\[ R(u_j) = \sum_{i=1}^{n} b_{ij} u_i, \quad j = 1, \dots, n, \tag{2.2.21} \]

then (2.2.20) gives us the manner in which the two matrices A and Ã are related:

\[ A = B\tilde{A}B^{-1}. \tag{2.2.22} \]

Such a relation spells out the following definition.

Definition 2.6 Two matrices A, B ∈ F(n, n) are said to be similar if there is an invertible matrix C ∈ F(n, n) such that A = CBC⁻¹, written as A ∼ B.

It is clear that similarity of matrices is an equivalence relation. In other words, we have A ∼ A, B ∼ A if A ∼ B, and A ∼ C if A ∼ B and B ∼ C, for A, B, C ∈ F(n, n).

Consequently, we see that the matrices associated to a linear mapping of a finite-dimensional vector space into itself with respect to different bases are similar.
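The similarity of the matrices of one mapping in two bases can be worked through on a 2 × 2 example. The Python sketch below is an illustration under assumptions made here: U = F^2 with the standard basis {u1, u2}, the sample matrix `A` of T in that basis, and the transition matrix `B` whose columns list the new basis vectors u1 and u1 + u2. The helper names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def matmul(P, Q):
    """Return the matrix product PQ, with matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def inverse_2x2(M):
    """Return the inverse of an invertible 2 x 2 matrix using exact fractions."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1],
     [0, 3]]     # matrix of T in the basis {u1, u2}
B = [[1, 1],
     [0, 1]]     # columns: coordinates of the new basis vectors u1 and u1 + u2

# Matrix of T in the new basis, obtained by solving A = B A_tilde B^{-1}.
A_tilde = matmul(inverse_2x2(B), matmul(A, B))

# The two matrices are similar through B.
assert matmul(matmul(B, A_tilde), inverse_2x2(B)) == A

# Column by column, AB = B A_tilde says that T applied to each new basis
# vector is expanded in the new basis with coefficients from A_tilde.
assert matmul(A, B) == matmul(B, A_tilde)
print(A_tilde)
```

In this example the new basis happens to diagonalize the matrix, which shows how a well-chosen basis can simplify the representation of a mapping.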

Exercises

2.2.1 Consider the differentiation operation D = d/dt as a linear mapping over P (the vector space of all real polynomials of variable t) given by

\[ D(p)(t) = \frac{dp(t)}{dt}, \quad p \in \mathcal{P}. \tag{2.2.23} \]

(i) Find the matrix that represents D ∈ L(P2,P1) with respect to the standard bases

\[ \mathcal{B}^1_{\mathcal{P}_2} = \{1, t, t^2\}, \quad \mathcal{B}^1_{\mathcal{P}_1} = \{1, t\} \tag{2.2.24} \]

of P2, P1, respectively.

(ii) If the basis of P2 is changed into

\[ \mathcal{B}^2_{\mathcal{P}_2} = \{t - 1, t + 1, (t - 1)(t + 1)\}, \tag{2.2.25} \]

find the basis transition matrix from the basis $\mathcal{B}^1_{\mathcal{P}_2}$ into the basis $\mathcal{B}^2_{\mathcal{P}_2}$.

(iii) Obtain the matrix that represents D ∈ L(P2,P1) with respect to $\mathcal{B}^2_{\mathcal{P}_2}$ and $\mathcal{B}^1_{\mathcal{P}_1}$ directly first and then obtain it again by using the results of (i) and (ii) in view of the relation stated as in (2.2.7).

(iv) If the basis of P1 is changed into

\[ \mathcal{B}^2_{\mathcal{P}_1} = \{t - 1, t + 1\}, \tag{2.2.26} \]

obtain the matrix that represents D ∈ L(P2,P1) with respect to the bases $\mathcal{B}^1_{\mathcal{P}_2}$ and $\mathcal{B}^2_{\mathcal{P}_1}$ for P2 and P1 respectively.

(v) Find the basis transition matrix from $\mathcal{B}^1_{\mathcal{P}_1}$ into $\mathcal{B}^2_{\mathcal{P}_1}$.

(vi) Obtain the matrix that represents D ∈ L(P2,P1) with respect to $\mathcal{B}^1_{\mathcal{P}_2}$ and $\mathcal{B}^2_{\mathcal{P}_1}$ directly first and then obtain it again by using the results of (i) and (ii) in view of the relation stated as in (2.2.14).

2.2.2 (Continued from Exercise 2.2.1) Now consider D as an element in L(P2,P2).

(i) Find the matrices that represent D with respect to $\mathcal{B}^1_{\mathcal{P}_2}$ and $\mathcal{B}^2_{\mathcal{P}_2}$, respectively.

(ii) Use the basis transition matrix from the basis $\mathcal{B}^1_{\mathcal{P}_2}$ into the basis $\mathcal{B}^2_{\mathcal{P}_2}$ of P2 to verify your results in (i) in view of the similarity relation stated as in (2.2.22).

2.2.3 Let D : Pn → Pn be defined by (2.2.23) and consider the linear mapping T : Pn → Pn defined by

T(p) = tD(p) − p, p = p(t) ∈ Pn. (2.2.27)

Find N(T) and R(T) and show that Pn = N(T) ⊕ R(T).

2.2.4 Show that the real matrices

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad \frac{1}{2} \begin{pmatrix} a + b + c + d & a - b + c - d \\ a + b - c - d & a - b - c + d \end{pmatrix} \tag{2.2.28} \]

are similar through investigating the matrix representations of a certain linear mapping over R2 under the standard basis {e1, e2} and the transformed basis {u1 = e1 + e2, u2 = e1 − e2}, respectively.

2.2.5 Prove that the matrices

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} a_{nn} & \cdots & a_{n2} & a_{n1} \\ \cdots & \cdots & \cdots & \cdots \\ a_{2n} & \cdots & a_{22} & a_{21} \\ a_{1n} & \cdots & a_{12} & a_{11} \end{pmatrix} \tag{2.2.29} \]

in F(n, n) are similar by realizing them as the matrix representatives of a certain linear mapping over Fn with respect to two appropriate bases of Fn.

2.3 Adjoint mappings

Let U,V be finite-dimensional vector spaces over a field F and U′, V′ their dual spaces. For T ∈ L(U, V) and v′ ∈ V′, we see that

〈T(u), v′〉, ∀u ∈ U, (2.3.1)

defines a linear functional over U. Hence there is a unique vector u′ ∈ U′ such that

〈u, u′〉 = 〈T(u), v′〉, ∀u ∈ U. (2.3.2)

Of course, u′ depends on T and v′. So we may write this relation as

u′ = T′(v′). (2.3.3)

With this notation, we can rewrite (2.3.2) as

〈u, T′(v′)〉 = 〈T(u), v′〉, ∀u ∈ U, ∀v′ ∈ V′. (2.3.4)

Thus, in this way we have constructed a mapping T′ : V′ → U′. We now show that T′ is linear.

In fact, let v′1, v′2 ∈ V′. Then (2.3.4) gives us

〈u, T′(v′i)〉 = 〈T(u), v′i〉, ∀u ∈ U, i = 1, 2. (2.3.5)

Thus

〈u, T′(v′1) + T′(v′2)〉 = 〈T(u), v′1 + v′2〉, ∀u ∈ U. (2.3.6)

In view of (2.3.4) and (2.3.6), we arrive at T′(v′1 + v′2) = T′(v′1) + T′(v′2). Besides, for any a ∈ F, we have from (2.3.4) that

〈u, T′(av′)〉 = 〈T(u), av′〉 = a〈T(u), v′〉 = a〈u, T′(v′)〉 = 〈u, aT′(v′)〉, ∀u ∈ U, ∀v′ ∈ V′, (2.3.7)

which yields T′(av′) = aT′(v′). Thus the linearity of T′ is established.

Definition 2.7 For any given T ∈ L(U, V), the linear mapping T′ : V′ → U′ defined by (2.3.4) is called the adjoint mapping, or simply adjoint, of T.


Using the fact U ′′ = U,V ′′ = V , and the relation (2.3.4), we have

T ′′ ≡ (T ′)′ = T . (2.3.8)

It will be interesting to study the matrices associated with T and T′. For this purpose, let {u1, . . . , un} and {v1, . . . , vm} be bases of U and V, and {u′1, . . . , u′n} and {v′1, . . . , v′m} the corresponding dual bases, respectively. Take A = (aij) ∈ F(m, n) and A′ = (a′kl) ∈ F(n,m) such that

\[ T(u_j) = \sum_{i=1}^{m} a_{ij} v_i, \quad j = 1, \dots, n, \tag{2.3.9} \]

\[ T'(v'_l) = \sum_{k=1}^{n} a'_{kl} u'_k, \quad l = 1, \dots, m. \tag{2.3.10} \]

Inserting u = uj and v′ = v′i in (2.3.4), we have a′ji = aij (i = 1, . . . , m, j = 1, . . . , n). In other words, we get A′ = A^t. Thus the adjoint of a linear mapping corresponds to the transpose of a matrix.

We now investigate the relation between N(T ), R(T ) and N(T ′), R(T ′).

Theorem 2.8 Let U,V be finite-dimensional vector spaces and T ∈ L(U, V). Then N(T), N(T′), R(T), R(T′) and their annihilators enjoy the following relations.

(1) N(T)⁰ = R(T′).

(2) N(T) = R(T′)⁰.

(3) N(T′)⁰ = R(T).

(4) N(T′) = R(T)⁰.

Proof We first prove (2). Let u ∈ N(T). Then 〈T′(v′), u〉 = 〈v′, T(u)〉 = 0 for any v′ ∈ V′. So u ∈ R(T′)⁰. Conversely, if u ∈ R(T′)⁰, then 〈u′, u〉 = 0 for any u′ ∈ R(T′). So 〈T′(v′), u〉 = 0 for any v′ ∈ V′. That is, 〈v′, T(u)〉 = 0 for any v′ ∈ V′. Hence u ∈ N(T). So (2) is established.

We now prove (1). If u′ ∈ R(T′), there is some v′ ∈ V′ such that u′ = T′(v′). So 〈u′, u〉 = 〈T′(v′), u〉 = 〈v′, T(u)〉 = 0 for any u ∈ N(T). This shows u′ ∈ N(T)⁰. Hence R(T′) ⊂ N(T)⁰. However, using (2) and (1.4.31), we have

dim(N(T)) = dim(R(T′)⁰) = dim(U′) − dim(R(T′)) = dim(U) − dim(R(T′)), (2.3.11)

which then implies

dim(R(T′)) = dim(U) − dim(N(T)) = dim(N(T)⁰). (2.3.12)

This proves N(T)⁰ = R(T′).

Finally, (3) and (4) follow from replacing T by T′ and using T′′ = T in (1) and (2).

We note that (1) in Theorem 2.8 may also follow from taking the annihilators on both sides of (2) and applying Exercise 1.4.7. Thus, part (2) of Theorem 2.8 is the core of all the statements there.

We now use Theorem 2.8 to establish the following result.

Theorem 2.9 Let U,V be finite-dimensional vector spaces. For any T ∈ L(U, V), the rank of T and that of its adjoint T′ ∈ L(V′, U′) are equal. That is,

r(T) = r(T′). (2.3.13)

Proof Using (2) in Theorem 2.8, we have n(T) = dim(U) − r(T′). However, applying the rank equation (2.1.23) or Theorem 2.3, we have n(T) + r(T) = dim(U). Combining these results, we have r(T) = r(T′).

As an important application of Theorem 2.9, we consider the notion of rank of a matrix.

Let T ∈ L(Fn,Fm) be defined as in (2.1.6) by the matrix A = (aij) ∈ F(m, n). Then the images of the standard basis vectors e1, . . . , en are the column vectors of A. Thus R(T) is the vector space spanned by the column vectors of A, whose dimension is commonly called the column rank of A, denoted as corank(A). Thus, we have

r(T) = corank(A). (2.3.14)

On the other hand, the associated matrix of T′ is A^t, whose column vectors are the row vectors of A. Hence R(T′) is the vector space spanned by the row vectors of A (since (Fm)′ = Fm), whose dimension is likewise called the row rank of A, denoted as rorank(A). Thus, we have

r(T′) = rorank(A). (2.3.15)

Consequently, in view of (2.3.14), (2.3.15), and Theorem 2.9, we see that the column and row ranks of a matrix coincide, although they have different meanings.

Since the column and row ranks of a matrix are identical, they are jointly called the rank of the matrix.
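The coincidence of column rank and row rank is easy to observe on a concrete matrix. The following is a small sketch, assuming exact rational arithmetic and Gauss-Jordan elimination as the way to compute a rank (the text itself does not prescribe an algorithm); the function name `rank` and the sample matrix are illustrative.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) via Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [0, 2, 2]]   # a 4 x 3 example
A_t = [list(col) for col in zip(*A)]               # its transpose

print(rank(A), rank(A_t))
```

Here `rank(A)` is the dimension of the row space of `A` and `rank(A_t)` is the dimension of its column space.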


Exercises

2.3.1 Let U be a finite-dimensional vector space over a field F and regard a given element f ∈ U′ as an element in L(U,F). Describe the adjoint of f, namely, f′, as an element in L(F′, U′), and verify r(f) = r(f′).

2.3.2 Let U,V,W be finite-dimensional vector spaces over a field F. Show that for T ∈ L(U, V) and S ∈ L(V,W) there holds (S ◦ T)′ = T′ ◦ S′.

2.3.3 Let U,V be finite-dimensional vector spaces over a field F and T ∈ L(U, V). Prove that for given v ∈ V the non-homogeneous equation

T(u) = v, (2.3.16)

has a solution for some u ∈ U if and only if for any v′ ∈ V′ such that T′(v′) = 0 one has 〈v, v′〉 = 0. In particular, the equation (2.3.16) has a solution for any v ∈ V if and only if T′ is injective. (This statement is also commonly known as the Fredholm alternative.)

2.3.4 Let U be a finite-dimensional vector space over a field F and a ∈ F. Define T ∈ L(U,U) by T(u) = au where u ∈ U. Show that T′ is then given by T′(u′) = au′ for any u′ ∈ U′.

2.3.5 Let U = F(n, n) and B ∈ U. Define T ∈ L(U,U) by T(A) = AB − BA where A ∈ U. For f ∈ U′ given by

f(A) = 〈f,A〉 = Tr(A), A ∈ U, (2.3.17)

where Tr(A), or trace of A, is the sum of the diagonal entries of A, determine T′(f).

2.3.6 Let U = F(n, n). Then f(A) = Tr(AB^t) defines an element f ∈ U′.

(i) Use Mij to denote the element in F(n, n) whose entry in the ith row and jth column is 1 but all other entries are zero. Verify the formula

Tr(MijMkl) = δilδjk, i, j, k, l = 1, . . . , n. (2.3.18)

(ii) Apply (i) to show that for any f ∈ U′ there is a unique element B ∈ U such that f(A) = Tr(AB^t).

(iii) For T ∈ L(U,U), defined by T(A) = A^t, describe T′.

2.4 Quotient mappings

Let U,V be two vector spaces over a field F and T ∈ L(U, V). Suppose that X, Y are subspaces of U,V, respectively, which satisfy the property T(X) ⊂ Y or T ∈ L(X, Y). We now show that such a property allows us to generate a linear mapping

\[ \tilde{T} : U/X \to V/Y, \tag{2.4.1} \]

from T naturally.

As before, we use [·] to denote a coset in U/X or V/Y. Define T̃ : U/X → V/Y by setting

\[ \tilde{T}([u]) = [T(u)], \quad \forall [u] \in U/X. \tag{2.4.2} \]

We begin by showing that this definition does not suffer any ambiguity by verifying

\[ \tilde{T}([u_1]) = \tilde{T}([u_2]) \quad \text{whenever} \quad [u_1] = [u_2]. \tag{2.4.3} \]

In fact, if [u1] = [u2], then u1 − u2 ∈ X. Thus T(u1) − T(u2) = T(u1 − u2) ∈ Y, which implies [T(u1)] = [T(u2)]. So (2.4.3) follows.

The linearity of T̃ can now be checked directly.

First, let u1, u2 ∈ U. Then, by (2.4.2), we have

\[ \tilde{T}([u_1] + [u_2]) = \tilde{T}([u_1 + u_2]) = [T(u_1 + u_2)] = [T(u_1) + T(u_2)] = [T(u_1)] + [T(u_2)] = \tilde{T}([u_1]) + \tilde{T}([u_2]). \tag{2.4.4} \]

Next, let a ∈ F and u ∈ U. Then, again by (2.4.2), we have

\[ \tilde{T}(a[u]) = \tilde{T}([au]) = [T(au)] = a[T(u)] = a\tilde{T}([u]). \tag{2.4.5} \]

It is not hard to show that the property T(X) ⊂ Y is also necessary to ensure that (2.4.2) gives us a well-defined mapping T̃ from U/X into V/Y for T ∈ L(U, V). Indeed, let u ∈ X. Then [u] = [0]. Thus [T(u)] = T̃([u]) = T̃([0]) = [0], which implies T(u) ∈ Y.

In summary, we can state the following basic theorem regarding constructing quotient mappings between quotient spaces.

Theorem 2.10 Let X, Y be subspaces of U,V, respectively, and T ∈ L(U, V). Then (2.4.2) defines a linear mapping T̃ from U/X into V/Y if and only if the condition

T(X) ⊂ Y (2.4.6)

holds.

In the special situation when X = {0} ⊂ U and Y is an arbitrary subspace of V, we see that U/X = U and any T ∈ L(U, V) induces the quotient mapping

\[ \tilde{T} : U \to V/Y, \quad \tilde{T}(u) = [T(u)]. \tag{2.4.7} \]


Exercises

2.4.1 Let V,W be subspaces of a vector space U. Then the quotient mapping Ĩ : U/V → U/W induced from the identity mapping I : U → U is well-defined and given by

\[ \tilde{I}([u]_V) = [u]_W, \quad u \in U, \tag{2.4.8} \]

if V ⊂ W, where [·]_V and [·]_W denote the cosets in U/V and U/W, respectively. Show that the fact Ĩ([0]_V) = [0]_W implies that if [u1]_V, . . . , [uk]_V are linearly dependent in U/V then [u1]_W, . . . , [uk]_W are linearly dependent in U/W.

2.4.2 Let U be a finite-dimensional vector space and V a subspace of U. Use Ĩ to denote the quotient mapping Ĩ : U → U/V induced from the identity mapping I : U → U given as in (2.4.7). Apply Theorem 2.3 or the rank equation (2.1.23) to Ĩ to reestablish the relation (1.6.12).

2.4.3 Let U,V be finite-dimensional vector spaces over a field and X, Y the subspaces of U,V, respectively. For T ∈ L(U, V), assume T(X) ⊂ Y and use T̃ to denote the quotient mapping from U/X into V/Y induced from T. Show that r(T̃) ≤ r(T).

2.5 Linear mappings from a vector space into itself

Let U be a vector space over a field F. In this section, we study the important special situation when linear mappings are from U into itself. We shall denote the space L(U,U) simply by L(U). Such mappings are also often called linear transformations. First, we consider some general properties such as invariance and reducibility. Then we present some examples.

2.5.1 Invariance and reducibility

In this subsection, we consider some situations in which the complexity of a linear mapping may be ‘reduced’ somewhat.

Definition 2.11 Let T ∈ L(U) and V be a subspace of U. We say that V is an invariant subspace of T if T(V) ⊂ V.

Given T ∈ L(U), it is clear that the null-space N(T) and range R(T) of T are both invariant subspaces of T.

To see how the knowledge about an invariant subspace reduces the complexity of a linear mapping, we assume that V is a nontrivial invariant subspace of T ∈ L(U) where U is n-dimensional. Let {u1, . . . , uk} be any basis of V. We extend it to get a basis of U, say {u1, . . . , uk, uk+1, . . . , un}. With respect to such a basis, we have

\[ T(u_i) = \sum_{i'=1}^{k} b_{i'i} u_{i'}, \quad i = 1, \dots, k, \qquad T(u_j) = \sum_{j'=1}^{n} c_{j'j} u_{j'}, \quad j = k + 1, \dots, n, \tag{2.5.1} \]

where B = (bi′i) ∈ F(k, k) and C = (cj′j) ∈ F(n, [n − k]). With respect to this basis, the associated matrix A ∈ F(n, n) becomes

\[ A = \begin{pmatrix} B & C_1 \\ 0 & C_2 \end{pmatrix}, \quad \begin{pmatrix} C_1 \\ C_2 \end{pmatrix} = C. \tag{2.5.2} \]

Such a matrix is sometimes referred to as boxed upper triangular.

Thus, we see that a linear mapping T over a finite-dimensional vector space U has a nontrivial invariant subspace if and only if there is a basis of U so that the associated matrix of T with respect to this basis is boxed upper triangular.

For the matrix A given in (2.5.2), the vanishing of the entries in the left-lower portion of the matrix indeed reduces the complexity of the matrix. We have seen clearly that such a ‘reduction’ happens because of the invariance property

T(Span{u1, . . . , uk}) ⊂ Span{u1, . . . , uk}. (2.5.3)

Consequently, if we also have the following additionally imposed invariance property

T(Span{uk+1, . . . , un}) ⊂ Span{uk+1, . . . , un}, (2.5.4)

then cj′j = 0 for j′ = 1, . . . , k in (2.5.1) or C1 = 0 in (2.5.2), which further reduces the complexity of the matrix A.

The above investigation motivates the introduction of the concept of reducibility of a linear mapping as follows.

Definition 2.12 We say that T ∈ L(U) is reducible if there are nontrivial invariant subspaces V,W of T so that U = V ⊕ W. In such a situation, we also say that T may be reduced over the subspaces V and W. Otherwise, if no such pair of invariant subspaces exists, we say that T is irreducible.

Thus, if U is n-dimensional and T ∈ L(U) may be reduced over the nontrivial subspaces V and W, then we take {v1, . . . , vk} and {w1, . . . , wl} to be any bases of V and W, so that {v1, . . . , vk, w1, . . . , wl} is a basis of U. Over such a basis, T has the representation

\[ T(v_i) = \sum_{i'=1}^{k} b_{i'i} v_{i'}, \quad i = 1, \dots, k, \qquad T(w_j) = \sum_{j'=1}^{l} c_{j'j} w_{j'}, \quad j = 1, \dots, l, \tag{2.5.5} \]

where the matrices B = (bi′i) and C = (cj′j) are in F(k, k) and F(l, l), respectively. Thus, we see that, with respect to such a basis of U, the associated matrix A ∈ F(n, n) assumes the form

\[ A = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix}, \tag{2.5.6} \]

which takes the form of a special kind of matrices called boxed diagonal matrices.

Thus, we see that a linear mapping T over a finite-dimensional vector space U is reducible if and only if there is a basis of U so that the associated matrix of T with respect to this basis is boxed diagonal.

An important and useful family of invariant subspaces are called eigenspaces, which may be viewed as an extension of the notion of null-spaces.

Definition 2.13 For T ∈ L(U), a scalar λ ∈ F is called an eigenvalue of T if the null-space N(T − λI) is not the zero space {0}. Any nonzero vector in N(T − λI) is called an eigenvector associated with the eigenvalue λ and N(T − λI) is called the eigenspace of T associated with the eigenvalue λ, and often denoted as Eλ. The integer dim(Eλ) is called the geometric multiplicity of the eigenvalue λ.

In particular, for A ∈ F(n, n), an eigenvalue, eigenspace, and the eigenvectors associated with and geometric multiplicity of an eigenvalue of the matrix A are those of the A-induced mapping TA : Fn → Fn, defined by TA(x) = Ax, x ∈ Fn.

Let λ be an eigenvalue of T ∈ L(U). It is clear that Eλ is invariant under T and T = λI (I is the identity mapping) over Eλ. Thus, let {u1, . . . , uk} be a basis of Eλ and extend it to obtain a basis for the full space U. Then the associated matrix A of T with respect to this basis takes a boxed upper triangular form

\[ A = \begin{pmatrix} \lambda I_k & C_1 \\ 0 & C_2 \end{pmatrix}, \tag{2.5.7} \]

where Ik denotes the identity matrix in F(k, k).


We may explore the concept of eigenspaces to pursue a further reduction of the associated matrix of a linear mapping.

Theorem 2.14 Let u1, . . . , uk be eigenvectors associated with distinct eigenvalues λ1, . . . , λk of some T ∈ L(U). Then these vectors are linearly independent.

Proof We use induction on k. If k = 1, the statement of the theorem is already valid since u1 ≠ 0. Assume the statement of the theorem is valid for k = m ≥ 1. For k = m + 1, consider the relation

\[ c_1 u_1 + \cdots + c_m u_m + c_{m+1} u_{m+1} = 0, \quad c_1, \dots, c_m, c_{m+1} \in \mathbb{F}. \tag{2.5.8} \]

Applying T to (2.5.8), we have

\[ c_1 \lambda_1 u_1 + \cdots + c_m \lambda_m u_m + c_{m+1} \lambda_{m+1} u_{m+1} = 0. \tag{2.5.9} \]

Multiplying (2.5.8) by λm+1 and subtracting the result from (2.5.9), we obtain

\[ c_1 (\lambda_1 - \lambda_{m+1}) u_1 + \cdots + c_m (\lambda_m - \lambda_{m+1}) u_m = 0. \tag{2.5.10} \]

Thus, by the assumption that u1, . . . , um are linearly independent, we get c1 = · · · = cm = 0 since λ1, . . . , λm, λm+1 are distinct. Inserting this result into (2.5.8) and using um+1 ≠ 0, we find cm+1 = 0. So the proof follows.

As a corollary of the theorem, we immediately conclude that a linear mapping over an n-dimensional vector space may have at most n distinct eigenvalues.

If λ1, . . . , λk are the distinct eigenvalues of T, then Theorem 2.14 indicates that the sum of $E_{\lambda_1}, \dots, E_{\lambda_k}$ is a direct sum,

\[ E_{\lambda_1} + \cdots + E_{\lambda_k} = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}. \tag{2.5.11} \]

Thus, the equality

\[ \dim(E_{\lambda_1}) + \cdots + \dim(E_{\lambda_k}) = \dim(U) = n \tag{2.5.12} \]

holds if and only if

\[ U = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}. \tag{2.5.13} \]

In this situation T is reducible over $E_{\lambda_1}, \dots, E_{\lambda_k}$ and may naturally be expressed as a direct sum of mappings

\[ T = \lambda_1 I \oplus \cdots \oplus \lambda_k I, \tag{2.5.14} \]

where λiI is understood to operate on $E_{\lambda_i}$, i = 1, . . . , k. Furthermore, using the bases of $E_{\lambda_1}, \dots, E_{\lambda_k}$ to form a basis of U, we see that the associated matrix A of T becomes diagonal

\[ A = \operatorname{diag}\{\lambda_1 I_{n_1}, \dots, \lambda_k I_{n_k}\}, \tag{2.5.15} \]

where $n_i = \dim(E_{\lambda_i})$ (i = 1, . . . , k). In particular, when T has n distinct eigenvalues, λ1, . . . , λn, then any n eigenvectors correspondingly associated with these eigenvalues form a basis of U. With respect to this basis, the associated matrix A of T is simply

\[ A = \operatorname{diag}\{\lambda_1, \dots, \lambda_n\}. \tag{2.5.16} \]

We now present some examples as illustrations.

Consider T ∈ L(R2) defined by

\[ T(x) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2. \tag{2.5.17} \]

Then it may be checked directly that T has 1 and 3 as eigenvalues and

E1 = Span{(1,−1)^t}, E3 = Span{(1, 1)^t}. (2.5.18)

Thus, with respect to the basis {(1,−1)^t, (1, 1)^t}, the associated matrix of T is

\[ \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}. \tag{2.5.19} \]

So T is reducible and reduced over the pair of eigenspaces E1 and E3.

Next, consider S ∈ L(R2) defined by

\[ S(x) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2. \tag{2.5.20} \]

We show that S has no nontrivial invariant spaces. In fact, if V is one, then dim(V) = 1. Let x ∈ V such that x ≠ 0. Then the invariance of V requires S(x) = λx for some λ ∈ R. Inserting this relation into (2.5.20), we obtain

−x2 = λx1, x1 = λx2. (2.5.21)

Hence x1 ≠ 0, x2 ≠ 0. Iterating the two equations in (2.5.21), we obtain λ^2 + 1 = 0, which is impossible for real λ. So S is also irreducible.
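The two examples can be cross-checked with a short computation. The sketch below assumes the standard fact, not yet introduced at this point of the text, that λ is an eigenvalue of a real 2 × 2 matrix exactly when it is a real root of t^2 − (a + d)t + (ad − bc); the function name `real_eigenvalues_2x2` is a hypothetical helper. A negative discriminant means there is no real eigenvalue and hence no invariant line.

```python
import math

def real_eigenvalues_2x2(M):
    """Real roots of t^2 - (a + d) t + (a d - b c) for M = [[a, b], [c, d]]."""
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return []   # no real eigenvalue
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})

T = [[2, 1], [1, 2]]    # the matrix in (2.5.17)
S = [[0, -1], [1, 0]]   # the matrix in (2.5.20)

print(real_eigenvalues_2x2(T))
print(real_eigenvalues_2x2(S))
```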


2.5.2 Projections

In this subsection, we study an important family of reducible linear mappings called projections.

Definition 2.15 Let V and W be two complementary subspaces of U. That is, U = V ⊕ W. For any u ∈ U, express u uniquely as u = v + w, v ∈ V, w ∈ W, and define the mapping P : U → U by

P(u) = v. (2.5.22)

Then P ∈ L(U) and is called the projection of U onto V along W.

We need to check that the mapping P defined in Definition 2.15 is indeed linear. To see this, we take u1, u2 ∈ U and express them as u1 = v1 + w1, u2 = v2 + w2, for unique v1, v2 ∈ V, w1, w2 ∈ W. Hence P(u1) = v1, P(u2) = v2. On the other hand, from u1 + u2 = (v1 + v2) + (w1 + w2), we get P(u1 + u2) = v1 + v2. Thus P(u1 + u2) = P(u1) + P(u2). Moreover, for any a ∈ F and u ∈ U, write u = v + w for unique v ∈ V, w ∈ W. Thus P(u) = v and au = av + aw give us P(au) = av = aP(u). So P ∈ L(U) as claimed.

From Definition 2.15, we see that for v ∈ V we have P(v) = v. Thus P(P(u)) = P(u) for any u ∈ U. In other words, the projection P satisfies the special property P ◦ P = P. For notational convenience, we shall use T^k to denote the k-fold composition T ◦ · · · ◦ T for any T ∈ L(U). With this notation, we see that a projection P satisfies the condition P^2 = P. Any linear mapping satisfying such a condition is called idempotent.

We now show that being idempotent characterizes a linear mapping being a projection.

Theorem 2.16 A linear mapping T over a vector space U is idempotent if and only if it is a projection. More precisely, if T^2 = T and

V = N(I − T), W = N(T), (2.5.23)

then U = V ⊕ W and T is simply the projection of U onto V along W.

Proof Let T be idempotent and define the subspaces V,W by (2.5.23). We claim

R(T) = N(I − T), R(I − T) = N(T). (2.5.24)

In fact, if u ∈ R(T), then there is some x ∈ U such that u = T(x). Hence (I − T)u = (I − T)(T(x)) = T(x) − T^2(x) = 0. So u ∈ N(I − T). If u ∈ N(I − T), then u = T(u), which implies u ∈ R(T) already. If u ∈ R(I − T), then there is some x ∈ U such that u = (I − T)(x). So T(u) = T ◦ (I − T)(x) = (T − T^2)(x) = 0 and u ∈ N(T). If u ∈ N(T), then T(u) = 0, which allows us to rewrite u as u = (I − T)(u). Thus u ∈ R(I − T).

Now consider the identity

I = T + (I − T). (2.5.25)

Applying (2.5.25) to U and using (2.5.24), we obtain U = V + W. Let u ∈ V ∩ W. Using the definition of V,W in (2.5.23), we get T(u) − u = 0 and T(u) = 0. Thus u = 0. So U = V ⊕ W.

Finally, for any u ∈ U, express u as u = v + w for some unique v ∈ V and w ∈ W and let P ∈ L(U) be the projection of U onto V along W. Then (2.5.23) indicates that T(v) = v and T(w) = 0. So T(u) = T(v + w) = T(v) + T(w) = v = P(u). That is, T = P and the proof follows.

An easy but interesting consequence of Theorem 2.16 is the following.

Theorem 2.17 Let V,W be complementary subspaces of U. Then P ∈ L(U) is the projection of U onto V along W if and only if I − P is the projection of U onto W along V.

Proof Since

(I − P)^2 = I − 2P + P^2, (2.5.26)

we see that (I − P)^2 = I − P if and only if P^2 = P. The rest of the conclusion follows from (2.5.23).
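A small numerical instance may make the two theorems concrete. The sketch below assumes a particular idempotent 2 × 2 matrix `P`, chosen only for illustration; it projects R2 onto Span{(1, 0)} along Span{(1, −1)}. The code checks that both `P` and `Q = I − P` are idempotent and that they split a vector `u` into its components in N(I − P) and N(P).

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def apply(M, x):
    """Apply the matrix M to the coordinate vector x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

P = [[1, 1], [0, 0]]    # illustrative idempotent matrix
I = [[1, 0], [0, 1]]
Q = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]   # I - P

u = [3, 5]
v = apply(P, u)   # component of u in V = N(I - P)
w = apply(Q, u)   # component of u in W = N(P)

print(matmul(P, P) == P, matmul(Q, Q) == Q)
print(v, w)
```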

From (2.5.23), we see that when T ∈ L(U) is idempotent, it is reducible over the null-spaces N(T) and N(I − T). It may be shown that the converse is true, which is assigned as an exercise.

Recall that the null-spaces N(T) and N(I − T), if nontrivial, of an idempotent mapping T ∈ L(U), are simply the eigenspaces E0 and E1 of T associated with the eigenvalues 0 and 1, respectively. So, with respect to a basis of U consisting of the vectors in any bases of E0 and E1, the associated matrix A of T is of the form

\[ A = \begin{pmatrix} 0_k & 0 \\ 0 & I_l \end{pmatrix}, \tag{2.5.27} \]

where 0k is the zero matrix in F(k, k), k = dim(N(T)) = n(T), l = dim(N(I − T)) = r(T).


Let T ∈ L(U) and λ1, . . . , λk be some distinct eigenvalues of T such that T reduces over $E_{\lambda_1}, \dots, E_{\lambda_k}$. Use Pi to denote the projection of U onto $E_{\lambda_i}$ along

\[ E_{\lambda_1} \oplus \cdots \oplus \widehat{E_{\lambda_i}} \oplus \cdots \oplus E_{\lambda_k}, \tag{2.5.28} \]

where $\widehat{[\cdot]}$ indicates the item that is missing in the expression. Then we can represent T as

T = λ1P1 + · · · + λkPk. (2.5.29)

2.5.3 Nilpotent mappings

Consider the vector space Pn of the set of all polynomials of degrees up to n with coefficients in a field F and the differentiation operator

\[ D = \frac{d}{dt} \quad \text{so that} \quad D(a_0 + a_1 t + \cdots + a_n t^n) = a_1 + 2a_2 t + \cdots + n a_n t^{n-1}. \tag{2.5.30} \]

Then D^{n+1} = 0 (zero mapping). Such a linear mapping D : Pn → Pn is an example of nilpotent mappings we now study.

Definition 2.18 Let U be a finite-dimensional vector space and T ∈ L(U). We say that T is nilpotent if there is an integer k ≥ 1 such that T^k = 0. For a nilpotent mapping T ∈ L(U), the smallest integer k ≥ 1 such that T^k = 0 is called the degree or index of nilpotence of T.

The same definition may be stated for square matrices.

Of course, the degree of a nonzero nilpotent mapping is always at least 2.
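The nilpotence of the differentiation operator on Pn can be observed by repeated differentiation of coefficient lists. The sketch below assumes integer coefficients (so the field has characteristic zero) and represents a0 + a1 t + · · · + an t^n as the list `[a0, ..., an]`; the function names `differentiate` and `steps_to_zero` are hypothetical.

```python
def differentiate(coeffs):
    """Derivative of a0 + a1 t + ... + an t^n, padded so the result stays in P_n."""
    n = len(coeffs) - 1
    return [k * coeffs[k] for k in range(1, n + 1)] + [0]

def steps_to_zero(coeffs):
    """Number of differentiations needed to reach the zero polynomial."""
    count = 0
    while any(coeffs):
        coeffs = differentiate(coeffs)
        count += 1
    return count

p = [5, 0, -2, 1]   # 5 - 2 t^2 + t^3, an element of P_3

print(differentiate(p))
print(steps_to_zero(p))
```

A polynomial of exact degree n needs n + 1 differentiations, which matches D^{n+1} = 0 while D^n ≠ 0 on Pn.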

Definition 2.19 Let U be a vector space and T ∈ L(U). For any nonzero vector u ∈ U, we say that u is T-cyclic if there is an integer m ≥ 1 such that T^m(u) = 0. The smallest such integer m is called the period of u under or relative to T. If each vector in U is T-cyclic, T is said to be locally nilpotent.

It is clear that a nilpotent mapping must be locally nilpotent. In fact, these two notions are equivalent in finite dimensions.

Theorem 2.20 If U is finite-dimensional, then a mapping T ∈ L(U) is nilpotent if and only if it is locally nilpotent.

Proof Suppose that dim(U) = n ≥ 1 and T ∈ L(U) is locally nilpotent. Let {u1, . . . , un} be any basis of U and m1, . . . , mn be the periods of u1, . . . , un, respectively. Set

k = max{m1, . . . , mn}. (2.5.31)

Then it is seen that T is nilpotent of degree k.

We now show how to use a cyclic vector to generate an invariant subspace.

Theorem 2.21 For T ∈ L(U), let u ∈ U be a nonzero cyclic vector under T of period m. Then the vectors

\[ u, \ T(u), \ \dots, \ T^{m-1}(u), \tag{2.5.32} \]

are linearly independent so that they span an m-dimensional T-invariant subspace of U. In particular, m ≤ dim(U).

Proof We only need to show that the set of vectors in (2.5.32) are linearly independent. If m = 1, the statement is self-evident. So we now assume m ≥ 2.

Let c0, . . . , cm−1 ∈ F so that

\[ c_0 u + c_1 T(u) + \cdots + c_{m-1} T^{m-1}(u) = 0. \tag{2.5.33} \]

Assume that there is some l = 0, . . . , m − 1 such that cl ≠ 0. Let l ≥ 0 be the smallest such integer. Then l ≤ m − 2 (since T^{m−1}(u) ≠ 0) so that (2.5.33) gives us

\[ T^{l}(u) = -\frac{1}{c_l} \left( c_{l+1} T^{l+1}(u) + \cdots + c_{m-1} T^{m-1}(u) \right). \tag{2.5.34} \]

This relation implies that

\[ T^{m-1}(u) = T^{l + ([m-1] - l)}(u) = -\frac{1}{c_l} \left( c_{l+1} T^{m}(u) + \cdots + c_{m-1} T^{2(m-1) - l}(u) \right) = 0, \tag{2.5.35} \]

which contradicts the fact that T^{m−1}(u) ≠ 0.

We can apply the above results to study the reducibility of a nilpotent mapping.

Theorem 2.22 Let U be a finite-dimensional vector space and T ∈ L(U) a nilpotent mapping of degree k. If T is reducible over V and W, that is, U = V ⊕ W, T(V) ⊂ V, and T(W) ⊂ W, then

k ≤ max{dim(V), dim(W)}. (2.5.36)

In particular, if k = dim(U), then T is irreducible.

Proof Let {v1, . . . , vn1} and {w1, . . . , wn2} be bases of V and W, respectively. Then, by Theorem 2.21, the periods of these vectors cannot exceed the right-hand side of (2.5.36). Thus (2.5.36) follows from (2.5.31).

If k = dim(U), then dim(V) = k and dim(W) = 0 or dim(V) = 0 and dim(W) = k, which shows that T is irreducible.

Let dim(U) = n and T ∈ L(U) be nilpotent of degree n. Then there is a vector u ∈ U of period n. Set

\[ U_1 = \operatorname{Span}\{T^{n-1}(u)\}, \ \dots, \ U_{n-1} = \operatorname{Span}\{T(u), \dots, T^{n-1}(u)\}. \tag{2.5.37} \]

Then U1, . . . , Un−1 are nontrivial invariant subspaces of T satisfying U1 ⊂ · · · ⊂ Un−1 and dim(U1) = 1, . . . , dim(Un−1) = n − 1. Thus, T has invariant subspaces of all possible dimensions, and yet is irreducible.

Again, let dim(U) = n and T ∈ L(U) be nilpotent of degree n. Choose a cyclic vector u ∈ U of period n and set

\[ u_1 = u, \ \dots, \ u_n = T^{n-1}(u). \tag{2.5.38} \]

Then, with respect to the basis B = {u1, . . . , un}, we have

\[ T(u_i) = u_{i+1}, \quad i = 1, \dots, n - 1, \quad T(u_n) = 0. \tag{2.5.39} \]

Hence, if we use S = (sij) to denote the matrix of T with respect to the basis B, then

\[ s_{ij} = \begin{cases} 1, & j = i - 1, \ i = 2, \dots, n, \\ 0, & \text{otherwise}. \end{cases} \tag{2.5.40} \]

That is,

\[ S = \begin{pmatrix} 0 & 0 & \cdots & \cdots & 0 \\ 1 & 0 & \cdots & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & \cdots & 1 & 0 \end{pmatrix}. \tag{2.5.41} \]

Alternatively, we may also set

\[ u_1 = T^{n-1}(u), \ \dots, \ u_n = u. \tag{2.5.42} \]

Then with respect to this reordered basis the matrix of T becomes

\[ S = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \vdots & \cdots & \ddots & 0 & 1 \\ 0 & \cdots & \cdots & 0 & 0 \end{pmatrix}. \tag{2.5.43} \]

The n × n matrix S expressed in either (2.5.41) or (2.5.43) is also called a shift matrix, which clearly indicates that T is nilpotent of degree n.

In general, we shall see later that, if T ∈ L(U) is nilpotent and U is finite-dimensional, there are T-invariant subspaces U1, . . . , Ul such that

U = U1 ⊕ · · · ⊕ Ul, dim(U1) = n1, . . . , dim(Ul) = nl, (2.5.44)

and the degree of T restricted to Ui is exactly ni (i = 1, . . . , l). Thus, we can choose T-cyclic vectors of respective periods n1, . . . , nl, say u1, . . . , ul, and use the vectors

\[ u_1, \dots, T^{n_1 - 1}(u_1), \ \dots, \ u_l, \dots, T^{n_l - 1}(u_l), \tag{2.5.45} \]

say, as a basis of U. With respect to such a basis, the matrix S of T is

\[ S = \begin{pmatrix} S_1 & 0 & \cdots & 0 \\ 0 & \ddots & \vdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & S_l \end{pmatrix}, \tag{2.5.46} \]

where each Si is an ni × ni shift matrix (i = 1, . . . , l).

Let k be the degree of T or the matrix S. From (2.5.46), we see clearly that the relation

k = max{n1, . . . , nl} (2.5.47)

holds.
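The relation (2.5.47) can be tested on a concrete boxed diagonal matrix built from shift matrices. The following is a minimal sketch, assuming the lower shift form (2.5.40) and block sizes 2, 3, 1 chosen purely for illustration; the helper names `shift_matrix`, `block_diagonal`, and `nilpotency_degree` are hypothetical.

```python
def shift_matrix(n):
    """n x n matrix with entry 1 where j = i - 1 (just below the diagonal), as in (2.5.40)."""
    return [[1 if j == i - 1 else 0 for j in range(n)] for i in range(n)]

def block_diagonal(blocks):
    """Square matrix with the given square blocks along the diagonal, zeros elsewhere."""
    size = sum(len(b) for b in blocks)
    result = [[0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                result[offset + i][offset + j] = value
        offset += len(b)
    return result

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def nilpotency_degree(M):
    """Smallest k >= 1 with M^k = 0, or None if no power up to len(M) vanishes."""
    power = M
    for k in range(1, len(M) + 1):
        if all(x == 0 for row in power for x in row):
            return k
        power = matmul(power, M)
    return None

sizes = [2, 3, 1]
S = block_diagonal([shift_matrix(n) for n in sizes])

print(nilpotency_degree(shift_matrix(4)))
print(nilpotency_degree(S), max(sizes))
```

Stopping the search at the matrix size is justified by Theorem 2.21, since the period of any cyclic vector cannot exceed dim(U).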

2.5.4 Polynomials of linear mappings

In this section, we have seen that it is often useful to consider various powers of a linear mapping T ∈ L(U) as well as some linear combinations of appropriate powers of T. These manipulations motivate the introduction of the notion of polynomials of linear mappings. Specifically, for any p(t) ∈ P with the form

\[ p(t) = a_n t^n + \cdots + a_1 t + a_0, \quad a_0, a_1, \dots, a_n \in \mathbb{F}, \tag{2.5.48} \]

we define p(T) ∈ L(U) to be the linear mapping over U given by

\[ p(T) = a_n T^n + \cdots + a_1 T + a_0 I. \tag{2.5.49} \]

It is straightforward to check that all usual operations over polynomials in variable t can be carried over correspondingly to those over polynomials in the powers of a linear mapping T over the vector space U. For example, if f, g, h ∈ P satisfy the relation f(t) = g(t)h(t), then

f(T) = g(T)h(T), (2.5.50)

because the powers of T follow the same rule as the powers of t. That is, T^k T^l = T^{k+l}, k, l ∈ N.

For T ∈ L(U), let λ ∈ F be an eigenvalue of T. Then, for any p(t) ∈ P given as in (2.5.48), p(λ) is an eigenvalue of p(T). To see this, we assume that u ∈ U is an eigenvector of T associated with λ. We have

\[ p(T)(u) = (a_n T^n + \cdots + a_1 T + a_0 I)(u) = (a_n \lambda^n + \cdots + a_1 \lambda + a_0) u = p(\lambda) u, \tag{2.5.51} \]

as anticipated.

If p ∈ P is such that p(T) = 0, then we say that T is a root of p. Hence, if T is a root of p, any eigenvalue λ of T must also be a root of p, p(λ) = 0, by virtue of (2.5.51).

For example, an idempotent mapping is a root of the polynomial p(t) = t^2 − t, and a nilpotent mapping is a root of a polynomial of the form p(t) = t^m (m ∈ N). Consequently, the eigenvalues of an idempotent mapping can only be 0 and 1, and that of a nilpotent mapping, 0.

For T ∈ L(U), let λ1, . . . , λk be some distinct eigenvalues of T such that T reduces over $E_{\lambda_1}, \dots, E_{\lambda_k}$. Then T must be a root of the polynomial

p(t) = (t − λ1) · · · (t − λk). (2.5.52)

To see this, we rewrite any u ∈ U as u = u1 + · · · + uk where $u_i \in E_{\lambda_i}$, i = 1, . . . , k. Then

\[ p(T)u = \sum_{i=1}^{k} p(T) u_i = \sum_{i=1}^{k} (T - \lambda_1 I) \cdots \widehat{(T - \lambda_i I)} \cdots (T - \lambda_k I)(T - \lambda_i I) u_i = 0, \tag{2.5.53} \]

which establishes p(T) = 0, as claimed.

It is clear that the polynomial p(t), given in (2.5.52), is the lowest-degree nontrivial polynomial among all polynomials for which T is a root, which is often referred to as the minimal polynomial of T, a basic notion to be detailed later.
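The statement that a mapping can be a root of a polynomial is easy to test on small matrices. The sketch below assumes integer matrices and evaluates p(M) by Horner's rule, with coefficients listed as `[a0, a1, ..., an]`; the function name `poly_of_matrix` and the three sample matrices are illustrative choices. It checks an idempotent matrix against t^2 − t, a nilpotent matrix against t^2, and a matrix with eigenvalues 1 and 3 against (t − 1)(t − 3) = t^2 − 4t + 3.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_of_matrix(coeffs, M):
    """Evaluate p(M) = a0 I + a1 M + ... + an M^n by Horner's rule."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = matmul(result, M)
        for i in range(n):
            result[i][i] += a   # add a * I
    return result

ZERO = [[0, 0], [0, 0]]
P = [[1, 1], [0, 0]]   # idempotent: P^2 = P
N = [[0, 1], [0, 0]]   # nilpotent of degree 2
A = [[2, 1], [1, 2]]   # eigenvalues 1 and 3

print(poly_of_matrix([0, -1, 1], P) == ZERO)   # t^2 - t
print(poly_of_matrix([0, 0, 1], N) == ZERO)    # t^2
print(poly_of_matrix([3, -4, 1], A) == ZERO)   # (t - 1)(t - 3)
print(poly_of_matrix([-1, 1], A))              # t - 1 alone does not vanish on A
```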

Exercises

2.5.1 Let S, T ∈ L(U) and S ◦ T = T ◦ S. Show that the null-space N(S) and range R(S) of S are invariant subspaces under T. In particular, an eigenspace of T associated with eigenvalue λ is seen to be invariant under T when taking S = T − λI.

2.5.2 Let S, T ∈ L(U). Prove that the invertibility of the mappings I + S ◦ T and I + T ◦ S are equivalent by showing that if I + S ◦ T is invertible then so is I + T ◦ S with

\[ (I + T \circ S)^{-1} = I - T \circ (I + S \circ T)^{-1} \circ S. \tag{2.5.54} \]

2.5.3 The rotation transformation over R2, denoted by Rθ ∈ L(R2), is given by

\[ R_\theta(x) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2. \tag{2.5.55} \]

Show that Rθ has no nontrivial invariant subspace in R2 unless θ = kπ (k ∈ Z).

2.5.4 Consider the vector space P of the set of all polynomials over a field F. Define T ∈ L(P) by setting

\[ T(p) = t p(t), \quad \forall p(t) = a_0 + a_1 t + \cdots + a_n t^n \in \mathcal{P}. \tag{2.5.56} \]

Show that T cannot have an eigenvalue.

2.5.5 Let A = (aij) ∈ F(n, n) be invertible and satisfy

\[ \sum_{j=1}^{n} a_{ij} = a \in \mathbb{F}, \quad i = 1, \dots, n. \tag{2.5.57} \]


(i) Show that a must be an eigenvalue of A and that a = 0.(ii) Show that if A−1 = (bij ) then

n∑j=1

bij = 1

a, i = 1, . . . , n. (2.5.58)

2.5.6 Let S, T ∈ L(U) and assume that S, T are similar. That is, there is an invertible element R ∈ L(U) such that $S = R \circ T \circ R^{-1}$. Show that λ ∈ F is an eigenvalue of S if and only if it is an eigenvalue of T.

2.5.7 Let U = F(n, n) and T ∈ L(U) be defined by taking matrix transpose,

\[
T(A) = A^t, \quad A \in U. \tag{2.5.59}
\]

Show that both ±1 are the eigenvalues of T and identify $E_1$ and $E_{-1}$. Prove that T is reducible over the pair $E_1$ and $E_{-1}$. Can T have an eigenvalue different from ±1? What is the minimal polynomial of T?

2.5.8 Let U be a vector space over a field F and T ∈ L(U). If $\lambda_1, \lambda_2 \in F$ are distinct eigenvalues and $u_1, u_2$ are the respectively associated eigenvectors of T, show that, for any nonzero $a_1, a_2 \in F$, the vector $u = a_1 u_1 + a_2 u_2$ cannot be an eigenvector of T.

2.5.9 Let U be a finite-dimensional vector space over a field F and T ∈ L(U). Assume that T is of rank 1.

(i) Prove that T must have an eigenvalue, say λ, in F.
(ii) If λ ≠ 0, show that $R(T) = E_\lambda$.
(iii) Is the statement in (ii), i.e. $R(T) = E_\lambda$, valid when λ = 0?

2.5.10 Let A ∈ C(n, n) be unitary. That is, $AA^\dagger = A^\dagger A = I_n$. Show that any eigenvalue of A is of unit modulus.

2.5.11 Show that, if T ∈ L(U) is reducible over the pair of null-spaces N(T) and N(I − T), then T is idempotent.

2.5.12 Let S, T ∈ L(U) be idempotent. Show that

(i) R(S) = R(T) if and only if S ◦ T = T, T ◦ S = S.
(ii) N(S) = N(T) if and only if S ◦ T = S, T ◦ S = T.

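2.5.13 Let $T \in L(\mathbb{R}^3)$ be defined by

\[
T(x) = \begin{pmatrix} 1 & 1 & -1 \\ -1 & 1 & 1 \\ 1 & 3 & -1 \end{pmatrix} x, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{R}^3. \tag{2.5.60}
\]

Let $S \in L(\mathbb{R}^3)$ project $\mathbb{R}^3$ onto R(T) along N(T). Determine S by obtaining a matrix A ∈ R(3, 3) that represents S. That is, S(x) = Ax for $x \in \mathbb{R}^3$.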


2.5.14 Let U be an n-dimensional vector space (n ≥ 2) and U′ its dual space. Let $u_1, u_2$ and $u'_1, u'_2$ be independent vectors in U and U′, respectively. Define a linear mapping T ∈ L(U) by setting

\[
T(u) = \langle u, u'_1 \rangle u_1 + \langle u, u'_2 \rangle u_2, \quad u \in U. \tag{2.5.61}
\]

(i) Find the condition(s) regarding $u_1, u_2, u'_1, u'_2$ under which T is a projection.
(ii) When T is a projection, determine subspaces V and W of U in terms of $u_1, u_2, u'_1, u'_2$ so that T projects U onto V along W.

2.5.15 We may slightly generalize the notion of idempotent mappings to a mapping T ∈ L(U) satisfying

\[
T^2 = aT, \quad T \neq 0, \quad a \in F, \quad a \neq 0, 1. \tag{2.5.62}
\]

Show that such a mapping T is reduced over the pair of subspaces N(T) and N(aI − T).

2.5.16 Let T ∈ L(U) and $\lambda_1, \dots, \lambda_k$ be some distinct eigenvalues of T such that T is reducible over the eigenspaces $E_{\lambda_1}, \dots, E_{\lambda_k}$. Show that $\lambda_1, \dots, \lambda_k$ are all the eigenvalues of T. In other words, if λ is any eigenvalue of T, then λ is among $\lambda_1, \dots, \lambda_k$.

2.5.17 Is the differential operator $D : P_n \to P_n$ reducible? Find an element in $P_n$ that is of period n + 1 under D and use it to obtain a basis of $P_n$.

2.5.18 Let α and β be nonzero elements in F(n, 1). Then $A = \alpha\beta^t \in F(n, n)$.

(i) Prove that the necessary and sufficient condition for A to be nilpotent is $\alpha^t \beta = 0$. If A is nilpotent, what is the degree of A?
(ii) Prove that the necessary and sufficient condition for A to be idempotent is $\alpha^t \beta = 1$.

2.5.19 Show that the linear mapping T ∈ L(P) defined in (2.5.56) and the differential operator D ∈ L(P) satisfy the identity D ◦ T − T ◦ D = I.

2.5.20 Let S, T ∈ L(U) satisfy S ◦ T − T ◦ S = I. Establish the identity

\[
S^k \circ T - T \circ S^k = k S^{k-1}, \quad k \geq 2. \tag{2.5.63}
\]

2.5.21 Let U be a finite-dimensional vector space over a field F and T ∈ L(U) an idempotent mapping satisfying the equation

\[
T^3 + T - 2I = 0. \tag{2.5.64}
\]

Show that T = I.

2.5.22 Let U be a finite-dimensional vector space over a field F and T ∈ L(U) satisfying

\[
T^3 = T \tag{2.5.65}
\]

so that ±1 cannot be the eigenvalues of T. Show that T = 0.


2.5.23 Let U be a finite-dimensional vector space over a field F and S, T ∈ L(U) satisfying S ∼ T. For any p ∈ P, show that p(S) ∼ p(T).

2.5.24 Let U be a finite-dimensional vector space over a field F and T ∈ L(U). For any p ∈ P, show that p(T)′ = p(T′).

2.5.25 Let U be an n-dimensional vector space over a field F, T ∈ L(U) nilpotent of degree k ≥ 2, and n(T) = l ≥ 2. Assume k + l = n + 1.

(i) Show that there are subspaces V and W of U where

\[
V = \mathrm{Span}\{u, T(u), \dots, T^{k-1}(u)\}, \tag{2.5.66}
\]

with u ∈ U a T-cyclic vector of period k, and W is an (l − 1)-dimensional subspace of N(T), such that T is reducible over the pair V and W.
(ii) Describe R(T) and determine r(T).

2.5.26 Let U be a finite-dimensional vector space over a field F and S, T ∈ L(U).

(i) Show that if S is invertible and T nilpotent, S − T must also be invertible provided that S and T commute, S ◦ T = T ◦ S.
(ii) Find an example to show that the condition in (i) that S and T commute cannot be removed.

2.5.27 Let U be a finite-dimensional vector space, T ∈ L(U), and V a subspace of U which is invariant under T. Show that if U = R(T) ⊕ V then V = N(T).

2.6 Norms of linear mappings

In this section, we begin by considering general linear mappings between arbitrary finite-dimensional normed vector spaces. We then concentrate on mappings from a normed space into itself.

2.6.1 Definition and elementary properties of norms of linear mappings

Let U, V be finite-dimensional vector spaces over R or C and $\|\cdot\|_U$, $\|\cdot\|_V$ norms on U, V, respectively. Assume that $B = \{u_1, \dots, u_n\}$ is any basis of U. For any u ∈ U, write u as $u = \sum_{i=1}^{n} a_i u_i$ where $a_1, \dots, a_n$ are the coordinates of u with respect to B. Thus, for any T ∈ L(U, V),

\[
\begin{aligned}
\|T(u)\|_V &= \left\| T\left( \sum_{i=1}^{n} a_i u_i \right) \right\|_V \leq \sum_{i=1}^{n} |a_i| \|T(u_i)\|_V \\
&\leq \left( \max_{1 \leq i \leq n} \{\|T(u_i)\|_V\} \right) \sum_{i=1}^{n} |a_i| \\
&\equiv \left( \max_{1 \leq i \leq n} \{\|T(u_i)\|_V\} \right) \|u\|_1 \leq C \|u\|_U,
\end{aligned}
\tag{2.6.1}
\]

where we have used the fact that norms over a finite-dimensional space are all equivalent. This estimate may also be restated as

\[
\frac{\|T(u)\|_V}{\|u\|_U} \leq C, \quad u \in U, \quad u \neq 0. \tag{2.6.2}
\]

This boundedness result enables us to formulate the definition of the norm of a linear mapping T ∈ L(U, V) as follows.

Definition 2.23 For T ∈ L(U, V), we define the norm of T, induced from the respective norms of U and V, by

\[
\|T\| = \sup \left\{ \frac{\|T(u)\|_V}{\|u\|_U} \;\middle|\; u \in U, \; u \neq 0 \right\}. \tag{2.6.3}
\]

To show that (2.6.3) indeed defines a norm for the space L(U, V), we need to check that it fulfills all the properties required of a norm. To this end, we note from (2.6.3) that

\[
\|T(u)\|_V \leq \|T\| \|u\|_U, \quad u \in U. \tag{2.6.4}
\]

For S, T ∈ L(U, V), since the triangle inequality and (2.6.4) give us

\[
\|(S + T)(u)\|_V \leq \|S(u)\|_V + \|T(u)\|_V \leq (\|S\| + \|T\|) \|u\|_U, \tag{2.6.5}
\]

we obtain ‖S + T‖ ≤ ‖S‖ + ‖T‖, which says the triangle inequality holds over L(U, V).

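Let a be any scalar. Then $\|aT(u)\|_V = |a| \|T(u)\|_V$ for any u ∈ U. Hence

\[
\|aT\| = \sup \left\{ \frac{\|aT(u)\|_V}{\|u\|_U} \;\middle|\; u \in U, \; u \neq 0 \right\}
= |a| \sup \left\{ \frac{\|T(u)\|_V}{\|u\|_U} \;\middle|\; u \in U, \; u \neq 0 \right\}
= |a| \|T\|, \tag{2.6.6}
\]

which indicates that homogeneity follows.

Finally, it is clear that ‖T‖ = 0 implies T = 0, since (2.6.4) then gives T(u) = 0 for all u ∈ U.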


Let T ∈ L(U, V) and S ∈ L(V, W) where W is another normed vector space with the norm $\|\cdot\|_W$. Then S ◦ T ∈ L(U, W) and it follows from (2.6.4) that

\[
\|(S \circ T)(u)\|_W \leq \|S\| \|T(u)\|_V \leq \|S\| \|T\| \|u\|_U, \quad u \in U. \tag{2.6.7}
\]

Consequently, we get

\[
\|S \circ T\| \leq \|S\| \|T\|. \tag{2.6.8}
\]

This simple but general inequality is of basic usefulness.

In the rest of the section, we will focus on mappings from U into itself. We note that, for the identity mapping I ∈ L(U), it is clear that ‖I‖ = 1.
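The inequality (2.6.8) and the normalization ‖I‖ = 1 can be observed numerically. The sketch below assumes, as an illustration only, that the vector norm is the max-norm on $\mathbb{R}^2$, in which case the induced norm of a matrix is its largest absolute row sum (the formula stated in (2.6.25) of the exercises). The matrices `S` and `T` and the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def norm_inf(A):
    """Largest absolute row sum: the norm induced by the vector max-norm."""
    return max(sum(abs(x) for x in row) for row in A)

# Hypothetical sample mappings on R^2, written as matrices.
S = [[1, -2], [3, 4]]
T = [[0, 5], [-1, 2]]
I = [[1, 0], [0, 1]]

ST = matmul(S, T)                    # [[2, 1], [-4, 23]]
lhs = norm_inf(ST)                   # ||S o T|| = 27
rhs = norm_inf(S) * norm_inf(T)      # ||S|| ||T|| = 7 * 5 = 35
```

Here the inequality is strict, which shows that (2.6.8) cannot in general be improved to an equality.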

2.6.2 Invertibility of linear mappings as a generic property

Let U be a finite-dimensional vector space with norm ‖ · ‖. It has been seen that the space L(U) may be equipped with an induced norm which may also be denoted by ‖ · ‖ since there is no risk of confusion. The availability of a norm of L(U) allows one to perform analysis on L(U) so that a deeper understanding of L(U) may be achieved.

As an illustration, in this subsection, we will characterize invertibility of linear mappings by using the norm.

Theorem 2.24 Let U be a finite-dimensional normed space. An element T ∈ L(U) is invertible if and only if there is a constant c > 0 such that

\[
\|T(u)\| \geq c \|u\|, \quad u \in U. \tag{2.6.9}
\]

Proof Assume (2.6.9) is valid. Then it is clear that N(T) = {0}. Hence T is invertible. Conversely, assume that T is invertible and $T^{-1} \in L(U)$ is its inverse. Then $1 = \|I\| = \|T^{-1} \circ T\| \leq \|T^{-1}\| \|T\|$ implies that the norm of an invertible mapping can never be zero. Thus, for any u ∈ U, we have $\|u\| = \|(T^{-1} \circ T)(u)\| \leq \|T^{-1}\| \|T(u)\|$ or $\|T(u)\| \geq (\|T^{-1}\|)^{-1} \|u\|$, u ∈ U, which establishes (2.6.9).

We now show that invertibility is a generic property for elements in L(U).

Theorem 2.25 Let U be a finite-dimensional normed space and T ∈ L(U).

(1) For any ε > 0 there exists an invertible element S ∈ L(U) such that ‖S − T‖ < ε. This property says that the subset of invertible mappings in L(U) is dense in L(U) with respect to the norm of L(U).

(2) If T ∈ L(U) is invertible, then there is some ε > 0 such that S ∈ L(U) is invertible whenever S satisfies ‖S − T‖ < ε. This property says that the subset of invertible mappings in L(U) is open in L(U) with respect to the norm of L(U).

Proof For any scalar λ, consider $S_\lambda = T - \lambda I$. If dim(U) = n, there are at most n possible values of λ for which $S_\lambda$ is not invertible, namely the eigenvalues of T. Now $\|T - S_\lambda\| = |\lambda| \|I\| = |\lambda|$. So for any ε > 0, there is a scalar λ, |λ| < ε, such that $S_\lambda$ is invertible. This proves (1).

We next consider (2). Let T ∈ L(U) be invertible. Then (2.6.9) holds for some c > 0. Let S ∈ L(U) be such that ‖S − T‖ < ε for some ε > 0. Then, for any u ∈ U, we have

\[
c \|u\| \leq \|T(u)\| = \|(T - S)(u) + S(u)\| \leq \|T - S\| \|u\| + \|S(u)\| \leq \varepsilon \|u\| + \|S(u)\|, \quad u \in U, \tag{2.6.10}
\]

or ‖S(u)‖ ≥ (c − ε)‖u‖. Therefore, if we choose ε < c, we see in view of Theorem 2.24 that S is invertible when ‖S − T‖ < ε. Hence (2) follows as well.
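Part (1) of the proof is constructive, and the construction can be sketched in a few lines. The example below is an illustration under stated assumptions: the singular 2 × 2 matrix `T`, the helper names, and the search order λ = ε/2, ε/3, . . . are all hypothetical choices. Exact rational arithmetic is used so that the test for invertibility (nonzero determinant) is reliable.

```python
from fractions import Fraction

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def perturb(T, lam):
    """Return S_lam = T - lam * I."""
    return [[T[i][j] - (lam if i == j else 0) for j in range(2)] for i in range(2)]

def nearby_invertible(T, eps):
    """Try lam = eps/2, eps/3, ... until S_lam is invertible.

    Only the eigenvalues of T (at most two here) can fail,
    so the loop stops after finitely many steps.
    """
    k = 2
    while True:
        lam = Fraction(eps) / k
        S = perturb(T, lam)
        if det2(S) != 0:
            return lam, S
        k += 1

# A singular (non-invertible) mapping: the second row is twice the first.
T = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction(4)]]
lam, S = nearby_invertible(T, Fraction(1, 1000))
```

Since ‖T − S_λ‖ = |λ| for any induced norm, the returned `S` lies within distance 1/1000 of `T`.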

2.6.3 Exponential of a linear mapping

Let T ∈ L(U). For a positive integer m, we consider $T_m \in L(U)$ given by

\[
T_m = \sum_{k=0}^{m} \frac{1}{k!} T^k, \tag{2.6.11}
\]

with the understanding $T^0 = I$. Therefore, for l < m, we have the estimate

\[
\|T_l - T_m\| \leq \sum_{k=l+1}^{m} \frac{\|T\|^k}{k!}. \tag{2.6.12}
\]

In particular, $\|T_l - T_m\| \to 0$ as l, m → ∞. Hence we see that the limit

\[
\lim_{m \to \infty} \sum_{k=0}^{m} \frac{1}{k!} T^k \tag{2.6.13}
\]

is a well-defined element in L(U) and is naturally denoted as

\[
e^T = \sum_{k=0}^{\infty} \frac{1}{k!} T^k, \tag{2.6.14}
\]

and is called the exponential of T ∈ L(U). Thus $e^0 = I$. As in calculus, if S, T ∈ L(U) are commutative, we can verify the formula

\[
e^S e^T \equiv e^S \circ e^T = e^{S+T}. \tag{2.6.15}
\]

A special consequence of this simple property is that the exponential of any mapping T ∈ L(U) is invertible. In fact, the relation (2.6.15) indicates that

\[
(e^T)^{-1} = e^{-T}. \tag{2.6.16}
\]
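The partial sums (2.6.11) give a direct, if naive, way to approximate the exponential of a matrix. The following sketch uses plain Python lists; the function names, the sample matrices, and the cutoff m = 30 are illustrative assumptions rather than a recommended numerical method. It checks that a nilpotent matrix has a finite exponential series and that $e^{A} e^{-A}$ is close to the identity, in line with (2.6.16).

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def exp_partial(T, m):
    """Partial sum T_m = sum_{k=0}^{m} T^k / k! from (2.6.11)."""
    total = identity(len(T))
    term = identity(len(T))            # holds T^k / k!, starting with k = 0
    for k in range(1, m + 1):
        term = [[x / k for x in row] for row in matmul(term, T)]
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total

# Nilpotent example: N^2 = 0, so the series stops and e^N = I + N.
N = [[0.0, 1.0], [0.0, 0.0]]
exp_N = exp_partial(N, 5)

# Hypothetical sample matrix for the inverse relation (2.6.16).
A = [[0.5, 1.0], [-1.0, 0.25]]
neg_A = [[-x for x in row] for row in A]
product = matmul(exp_partial(A, 30), exp_partial(neg_A, 30))
```

The estimate (2.6.12) explains why a modest cutoff suffices here: the tail is bounded by the tail of the scalar series for $e^{\|A\|}$.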

More generally, with the notation $\Phi(t) = e^{tT}$ (t ∈ R), we have

(1) Φ(s)Φ(t) = Φ(s + t), s, t ∈ R,
(2) Φ(0) = I,

and we say that Φ : R → L(U) defines a one-parameter group.

Furthermore, we also have

\[
\begin{aligned}
\frac{1}{h} (\Phi(t + h) - \Phi(t)) &= \frac{1}{h} (e^{(t+h)T} - e^{tT}) \\
&= \frac{1}{h} (e^{hT} - I) e^{tT} = e^{tT} \frac{1}{h} (e^{hT} - I) \\
&= T \left( \sum_{k=0}^{\infty} \frac{h^k}{(k+1)!} T^k \right) e^{tT} \\
&= e^{tT} \left( \sum_{k=0}^{\infty} \frac{h^k}{(k+1)!} T^k \right) T, \quad h \in \mathbb{R}, \; h \neq 0.
\end{aligned}
\tag{2.6.17}
\]

Therefore, we obtain the limit

\[
\lim_{h \to 0} \frac{1}{h} (\Phi(t + h) - \Phi(t)) = T e^{tT} = e^{tT} T. \tag{2.6.18}
\]

In other words, the above result gives us

\[
\Phi'(t) = \frac{d}{dt} e^{tT} = T e^{tT} = e^{tT} T = T \Phi(t) = \Phi(t) T, \quad t \in \mathbb{R}. \tag{2.6.19}
\]

In particular, Φ′(0) = T, which intuitively says that T is the initial rate of change of the one-parameter group Φ. One also refers to this relationship as ‘T generates Φ’ or ‘T is the generator of Φ.’
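The group law and the generator relation can both be observed numerically. The sketch below is self-contained and makes several assumptions for illustration: the sample matrix `T`, the parameters `s`, `t`, `h`, the truncation at 40 terms, and the helper names are not taken from the text. It approximates $e^{tT}$ by a partial sum of (2.6.14) and measures how far $e^{sT} e^{tT}$ is from $e^{(s+t)T}$ and how far the difference quotient $(e^{hT} - I)/h$ is from T.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm(T, t=1.0, terms=40):
    """Partial sum of the series (2.6.14) for e^{tT}."""
    n = len(T)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]          # holds (tT)^k / k!
    for k in range(1, terms + 1):
        term = [[t * x / k for x in row] for row in matmul(term, T)]
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, term)]
    return total

def max_abs_diff(A, B):
    return max(abs(a - b) for r1, r2 in zip(A, B) for a, b in zip(r1, r2))

T = [[0.0, 1.0], [-2.0, -3.0]]                # hypothetical generator
s, t, h = 0.3, 0.7, 1e-6

# Group law: e^{sT} e^{tT} versus e^{(s+t)T}.
group_gap = max_abs_diff(matmul(expm(T, s), expm(T, t)), expm(T, s + t))

# Difference quotient at t = 0 versus the generator T.
quotient = [[(a - float(i == j)) / h for j, a in enumerate(row)]
            for i, row in enumerate(expm(T, h))]
generator_gap = max_abs_diff(quotient, T)
```

By (2.6.17), the second gap is of order h, so shrinking `h` improves the agreement until rounding error takes over.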

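The relation (2.6.19) suggests that it is legitimate to differentiate the series $\Phi(t) = \sum_{k=0}^{\infty} \frac{1}{k!} t^k T^k$ term by term:

\[
\frac{d}{dt} \sum_{k=0}^{\infty} \frac{1}{k!} t^k T^k = \sum_{k=0}^{\infty} \frac{1}{k!} \frac{d}{dt}(t^k) T^k = T \sum_{k=0}^{\infty} \frac{1}{k!} t^k T^k, \quad t \in \mathbb{R}. \tag{2.6.20}
\]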


With the exponential of a linear mapping, T ∈ L(U), various elementary functions of T involving the exponentials of T may also be introduced accordingly. For example,

\[
\cosh T = \frac{1}{2} \left( e^T + e^{-T} \right), \quad \sinh T = \frac{1}{2} \left( e^T - e^{-T} \right), \tag{2.6.21}
\]

\[
\cos T = \frac{1}{2} \left( e^{iT} + e^{-iT} \right), \quad \sin T = \frac{1}{2i} \left( e^{iT} - e^{-iT} \right), \tag{2.6.22}
\]

are all well defined and enjoy similar properties of the corresponding classical functions.

The matrix version of the discussion here can also be easily formulated analogously and is omitted.

Exercises

2.6.1 Let U be a finite-dimensional normed space and $\{T_n\} \subset L(U)$ a sequence of invertible mappings that converges to a non-invertible mapping T ∈ L(U). Show that $\|T_n^{-1}\| \to \infty$ as n → ∞.

2.6.2 Let U and V be finite-dimensional normed spaces with the norms $\|\cdot\|_U$ and $\|\cdot\|_V$, respectively. For T ∈ L(U, V), show that the induced norm of T may also be evaluated by the expression

\[
\|T\| = \sup \{ \|T(u)\|_V \mid u \in U, \; \|u\|_U = 1 \}. \tag{2.6.23}
\]

2.6.3 Consider $T \in L(\mathbb{R}^n, \mathbb{R}^m)$ defined by

\[
T(x) = Ax, \quad A = (a_{ij}) \in \mathbb{R}(m, n), \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n. \tag{2.6.24}
\]

Use the norm $\|\cdot\|_\infty$ for both $\mathbb{R}^m$ and $\mathbb{R}^n$, correspondingly, and denote by $\|T\|_\infty$ the induced norm of the linear mapping T given in (2.6.24). Show that

\[
\|T\|_\infty = \max_{1 \leq i \leq m} \left\{ \sum_{j=1}^{n} |a_{ij}| \right\}. \tag{2.6.25}
\]

(This quantity is also sometimes denoted as $\|A\|_\infty$.)

2.6.4 Let T be the linear mapping defined in (2.6.24). Show that, if we use the norm $\|\cdot\|_1$ for both $\mathbb{R}^m$ and $\mathbb{R}^n$, correspondingly, and denote by $\|T\|_1$ the induced norm of T, then

\[
\|T\|_1 = \max_{1 \leq j \leq n} \left\{ \sum_{i=1}^{m} |a_{ij}| \right\}. \tag{2.6.26}
\]

(This quantity is also sometimes denoted as $\|A\|_1$.)

2.6.5 Let U be a finite-dimensional normed space over R or C and use N to denote the subset of L(U) consisting of all nilpotent mappings. Show that N is not an open subset of L(U).

2.6.6 Let U be a finite-dimensional normed space and T ∈ L(U). Prove that $\|e^T\| \leq e^{\|T\|}$.

2.6.7 Let A ∈ R(n, n). Show that $(e^A)^t = e^{A^t}$ and that, if A is skew-symmetric, $A^t = -A$, then $e^A$ is orthogonal.

2.6.8 Let A ∈ C(n, n). Show that $(e^A)^\dagger = e^{A^\dagger}$ and that, if A is anti-Hermitian, $A^\dagger = -A$, then $e^A$ is unitary.

2.6.9 Let A ∈ R(n, n) and consider the initial value problem of the following system of differential equations

\[
\frac{dx}{dt} = Ax, \quad x = x(t) = \begin{pmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{pmatrix}; \quad x(0) = x_0 = \begin{pmatrix} x_{1,0} \\ \vdots \\ x_{n,0} \end{pmatrix}. \tag{2.6.27}
\]

(i) Show that, with the one-parameter group $\Phi(t) = e^{tA}$, the solution of the problem (2.6.27) is simply given by $x = \Phi(t) x_0$.
(ii) Moreover, use $x^{(i)}(t)$ to denote the solution of (2.6.27) when $x_0 = e_i$ where $\{e_1, \dots, e_n\}$ is the standard basis of $\mathbb{R}^n$, i = 1, . . . , n. Show that $\Phi(t) = e^{tA}$ is the n × n matrix with $x^{(1)}(t), \dots, x^{(n)}(t)$ as the n corresponding column vectors,

\[
\Phi(t) = (x^{(1)}(t), \dots, x^{(n)}(t)). \tag{2.6.28}
\]

(This result provides a practical method for computing the matrix exponential, $e^{tA}$, which may also be viewed as the solution of the matrix-valued initial value problem

\[
\frac{dX}{dt} = AX, \quad X(0) = I_n, \quad X \in \mathbb{R}(n, n).) \tag{2.6.29}
\]

2.6.10 Consider the mapping Φ : R → R(2, 2) defined by

\[
\Phi(t) = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}, \quad t \in \mathbb{R}. \tag{2.6.30}
\]

(i) Show that Φ(t) is a one-parameter group.
(ii) Find the generator, say A ∈ R(2, 2), of Φ(t).
(iii) Compute $e^{tA}$ directly by the formula

\[
e^{tA} = \sum_{k=0}^{\infty} \frac{t^k}{k!} A^k \tag{2.6.31}
\]

and verify $\Phi(t) = e^{tA}$.
(iv) Use the practical method illustrated in Exercise 2.6.9 to obtain the matrix exponential $e^{tA}$ through solving two appropriate initial value problems as given in (2.6.27).

2.6.11 For the functions of T ∈ L(U) defined in (2.6.21) and (2.6.22), establish the identities

\[
\cosh^2 T - \sinh^2 T = I, \quad \cos^2 T + \sin^2 T = I. \tag{2.6.32}
\]

2.6.12 For T ∈ L(U), establish the formulas

\[
\frac{d}{dt} (\sinh tT) = T \cosh tT, \quad \frac{d}{dt} (\cosh tT) = T \sinh tT. \tag{2.6.33}
\]

2.6.13 For T ∈ L(U), establish the formulas

\[
\frac{d}{dt} (\sin tT) = T \cos tT, \quad \frac{d}{dt} (\cos tT) = -T \sin tT. \tag{2.6.34}
\]

3 Determinants

In this chapter we introduce one of the most important computational tools in linear algebra: determinants. First we discuss some motivational examples. Next we present the definition and basic properties of determinants. Then we study some applications of determinants.

3.1 Motivational examples

We now present some examples occurring in geometry, algebra, and topology that use determinants as a natural underlying computational tool.

3.1.1 Area and volume

Let $u = (a_1, a_2)$ and $v = (b_1, b_2)$ be nonzero vectors in $\mathbb{R}^2$. We consider the area of the parallelogram formed from using these two vectors as adjacent edges. First we may express u in polar coordinates as

\[
u = (a_1, a_2) = \|u\| (\cos\theta, \sin\theta). \tag{3.1.1}
\]

Thus, we may easily resolve the vector v along the direction of u and the direction perpendicular to u as follows:

\[
\begin{aligned}
v = (b_1, b_2) &= c_1 (\cos\theta, \sin\theta) + c_2 \left( \cos\left[\theta \pm \frac{\pi}{2}\right], \sin\left[\theta \pm \frac{\pi}{2}\right] \right) \\
&= (c_1 \cos\theta \mp c_2 \sin\theta, \; c_1 \sin\theta \pm c_2 \cos\theta), \quad c_1, c_2 \in \mathbb{R}.
\end{aligned}
\tag{3.1.2}
\]

Here $c_2$ may be interpreted as the length of the vector in the resolution that is taken to be perpendicular to u. Hence, from (3.1.2), we can read off the result

\[
c_2 = \pm (b_2 \cos\theta - b_1 \sin\theta) = |b_2 \cos\theta - b_1 \sin\theta|. \tag{3.1.3}
\]


Therefore, using (3.1.3) and then (3.1.1), we see that the area σ of the parallelogram under consideration is given by

\[
\sigma = c_2 \|u\| = \big| \|u\| \cos\theta \, b_2 - \|u\| \sin\theta \, b_1 \big| = |a_1 b_2 - a_2 b_1|. \tag{3.1.4}
\]

Thus we see that the quantity $a_1 b_2 - a_2 b_1$ formed from the vectors $(a_1, a_2)$ and $(b_1, b_2)$ stands out. It will be called the determinant of the matrix

\[
A = \begin{pmatrix} a_1 & a_2 \\ b_1 & b_2 \end{pmatrix}, \tag{3.1.5}
\]

written as det(A) or denoted by

\[
\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}. \tag{3.1.6}
\]

Since det(A) = ±σ, it is also referred to as the signed area of the parallelogram formed by the vectors $(a_1, a_2)$ and $(b_1, b_2)$.
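The two routes to the area, the geometric resolution (3.1.1)–(3.1.3) and the closed formula (3.1.4), can be compared on a concrete pair of vectors. The sketch below is illustrative only; the vectors `u`, `v` and the function names are hypothetical.

```python
import math

def det2(u, v):
    """The quantity a1*b2 - a2*b1 for u = (a1, a2), v = (b1, b2)."""
    return u[0] * v[1] - u[1] * v[0]

def area_by_resolution(u, v):
    """Area computed as in (3.1.1)-(3.1.4): height c2 times base length ||u||."""
    theta = math.atan2(u[1], u[0])                 # polar angle of u
    c2 = abs(v[1] * math.cos(theta) - v[0] * math.sin(theta))
    return c2 * math.hypot(u[0], u[1])

u, v = (3.0, 1.0), (1.0, 2.0)
signed = det2(u, v)              # 3*2 - 1*1 = 5
swapped = det2(v, u)             # exchanging the vectors flips the sign
area = area_by_resolution(u, v)  # agrees with |signed| up to rounding
```

The sign change under exchanging `u` and `v` is what the term 'signed area' refers to.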

We now consider volume. We shall apply some vector algebra over $\mathbb{R}^3$ to facilitate our discussion.

We use · and × to denote the usual dot and cross products between vectors in $\mathbb{R}^3$. We use i, j, k to denote the standard mutually orthogonal unit vectors in $\mathbb{R}^3$ that form a right-hand system. For any vectors

\[
u = (a_1, a_2, a_3) = a_1 \mathbf{i} + a_2 \mathbf{j} + a_3 \mathbf{k}, \quad v = (b_1, b_2, b_3) = b_1 \mathbf{i} + b_2 \mathbf{j} + b_3 \mathbf{k}, \tag{3.1.7}
\]

in $\mathbb{R}^3$, we know that

\[
u \times v = (a_2 b_3 - a_3 b_2) \mathbf{i} - (a_1 b_3 - a_3 b_1) \mathbf{j} + (a_1 b_2 - a_2 b_1) \mathbf{k} \tag{3.1.8}
\]

is perpendicular to both u and v and ‖u × v‖ gives us the area of the parallelogram formed from using u, v as two adjacent edges, which generalizes the preceding discussion in $\mathbb{R}^2$. To avoid the trivial situation, we assume u and v are linearly independent. So u × v ≠ 0 and

\[
\mathbb{R}^3 = \mathrm{Span}\{u, v, u \times v\}. \tag{3.1.9}
\]

Let $w = (c_1, c_2, c_3)$ be another vector. Then (3.1.9) allows us to express w as

\[
w = au + bv + c(u \times v), \quad a, b, c \in \mathbb{R}. \tag{3.1.10}
\]

From the geometry of the problem, we see that the volume of the parallelepiped formed from using u, v, w as adjacent edges is given by

\[
\delta = \|u \times v\| \, \|c(u \times v)\| = |c| \, \|u \times v\|^2 \tag{3.1.11}
\]


because ‖c(u × v)‖ is the height of the parallelepiped, with the bottom area ‖u × v‖.

From (3.1.10), we have

\[
w \cdot (u \times v) = c \|u \times v\|^2. \tag{3.1.12}
\]

Inserting (3.1.12) into (3.1.11), we obtain the simplified volume formula

\[
\delta = |w \cdot (u \times v)| = |c_1 (a_2 b_3 - a_3 b_2) - c_2 (a_1 b_3 - a_3 b_1) + c_3 (a_1 b_2 - a_2 b_1)|. \tag{3.1.13}
\]

In analogy with the earlier discussion on the area formula in $\mathbb{R}^2$, we may set up the matrix

\[
A = \begin{pmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{pmatrix}, \tag{3.1.14}
\]

and define the signed volume or determinant of the 3 × 3 matrix A as

\[
\det(A) = \begin{vmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
= c_1 (a_2 b_3 - a_3 b_2) - c_2 (a_1 b_3 - a_3 b_1) + c_3 (a_1 b_2 - a_2 b_1). \tag{3.1.15}
\]

In view of the 2 × 2 determinants already defined, we may rewrite (3.1.15) in the decomposed form

\[
\begin{vmatrix} c_1 & c_2 & c_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
= c_1 \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix}
- c_2 \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix}
+ c_3 \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}. \tag{3.1.16}
\]
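The agreement between the triple product w · (u × v) and the decomposed form (3.1.16) can be checked on sample vectors. This is a minimal sketch with hypothetical vectors and helper names; integer entries keep the arithmetic exact.

```python
def cross(u, v):
    """Cross product u x v in R^3."""
    a1, a2, a3 = u
    b1, b2, b3 = v
    return (a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def det2(a, b, c, d):
    """2 x 2 determinant with rows (a, b) and (c, d)."""
    return a * d - b * c

def det3_rows(w, u, v):
    """Expansion (3.1.16) of the 3 x 3 determinant with rows w, u, v."""
    c1, c2, c3 = w
    a1, a2, a3 = u
    b1, b2, b3 = v
    return (c1 * det2(a2, a3, b2, b3)
            - c2 * det2(a1, a3, b1, b3)
            + c3 * det2(a1, a2, b1, b2))

u, v, w = (1, 2, 3), (4, 5, 6), (7, 8, 10)
normal = cross(u, v)                 # (-3, 6, -3), perpendicular to u and v
signed_volume = dot(w, normal)       # -21 + 48 - 30 = -3
volume = abs(signed_volume)          # volume of the parallelepiped
```

For these vectors both routes give −3, so the parallelepiped has volume 3.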

3.1.2 Solving systems of linear equations

Next we remark that a more standard motivational example for determinants is Cramer’s rule or Cramer’s formulas for solving systems of linear equations.

For example, consider the 2 × 2 system

\[
\begin{cases}
a_1 x_1 + a_2 x_2 = c_1, \\
b_1 x_1 + b_2 x_2 = c_2,
\end{cases}
\tag{3.1.17}
\]

where $a_1, a_2, b_1, b_2, c_1, c_2 \in F$. Multiplying the first equation by $b_1$, the second equation by $a_1$, and subtracting, we have

\[
(a_1 b_2 - a_2 b_1) x_2 = a_1 c_2 - b_1 c_1; \tag{3.1.18}
\]

multiplying the first equation by $b_2$, the second equation by $a_2$, and subtracting, we have

\[
(a_1 b_2 - a_2 b_1) x_1 = b_2 c_1 - a_2 c_2. \tag{3.1.19}
\]

Thus, with the notation of determinants and in view of (3.1.18) and (3.1.19), we may express the solution to (3.1.17) elegantly as

\[
x_1 = \frac{\begin{vmatrix} c_1 & a_2 \\ c_2 & b_2 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}, \quad
x_2 = \frac{\begin{vmatrix} a_1 & c_1 \\ b_1 & c_2 \end{vmatrix}}{\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}}, \quad \text{if} \quad
\begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \neq 0. \tag{3.1.20}
\]

The extension of these formulas to 3 × 3 systems will be assigned as an exercise.
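The formulas (3.1.20) translate directly into code. The sketch below is one possible rendering for the 2 × 2 case; the function name, argument order, and sample system are assumptions made for illustration. Exact fractions avoid rounding.

```python
from fractions import Fraction

def det2(a, b, c, d):
    """2 x 2 determinant with rows (a, b) and (c, d)."""
    return a * d - b * c

def cramer_2x2(a1, a2, b1, b2, c1, c2):
    """Solve a1*x1 + a2*x2 = c1, b1*x1 + b2*x2 = c2 via (3.1.20)."""
    d = det2(a1, a2, b1, b2)
    if d == 0:
        raise ValueError("coefficient determinant is zero; (3.1.20) does not apply")
    x1 = Fraction(det2(c1, a2, c2, b2), d)   # first column replaced by (c1, c2)
    x2 = Fraction(det2(a1, c1, b1, c2), d)   # second column replaced by (c1, c2)
    return x1, x2

# Sample system: 2*x1 + x2 = 5, x1 + 3*x2 = 10.
x1, x2 = cramer_2x2(2, 1, 1, 3, 5, 10)
```

Substituting the result back into the sample system confirms the solution $x_1 = 1$, $x_2 = 3$.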

3.1.3 Topological invariants

Let f be a real-valued continuously differentiable function over R that maps the closed interval [α, β] (α < β) into [a, b] (a < b) so that the boundary points are mapped into boundary points as well,

\[
f : \{\alpha, \beta\} \to \{a, b\}. \tag{3.1.21}
\]

The function f maps the interval [α, β] to cover the interval [a, b].

At a point $t_0 \in [\alpha, \beta]$ where the derivative of f is positive, $f'(t_0) > 0$, f maps a small interval around $t_0$ onto a small interval around $f(t_0)$ and preserves the orientation (from left to right) of the intervals; if $f'(t_0) < 0$, it reverses the orientation. If $f'(t_0) \neq 0$, we say that $t_0$ is a regular point of f.

For c ∈ [a, b], if f′(t) ≠ 0 for any $t \in f^{-1}(c)$, we call c a regular value of f. It is clear that $f^{-1}(c)$ is a finite set when c is a regular value of f. If c is a regular value of f, we define the integer

\[
N(f, c) = N^+(f, c) - N^-(f, c), \tag{3.1.22}
\]

where

\[
N^\pm(f, c) = \text{the number of points in } f^{-1}(c) \text{ where } \pm f' > 0. \tag{3.1.23}
\]

If f′(t) ≠ 0 for any t ∈ (α, β), then $N^\pm(f, c) = 1$ and $N^\mp(f, c) = 0$ according to ±f′(t) > 0 (t ∈ (α, β)), which leads to N(f, c) = ±1 according to ±f′(t) > 0 (t ∈ (α, β)). In particular, N(f, c) is independent of c.

If f′ = 0 at some point, such a point is called a critical point of f. We assume further that f has finitely many critical points in (α, β), say $t_1, \dots, t_m$ (m ≥ 1). Set $t_0 = \alpha$, $t_{m+1} = \beta$, and assume $t_i < t_{i+1}$ for i = 0, . . . , m. Then, if c is a regular value of f, we have

\[
N^\pm(f, c) = \text{the number of intervals } (t_i, t_{i+1}) \text{ containing a point of } f^{-1}(c) \text{ and satisfying } \pm f'(t) > 0, \; t \in (t_i, t_{i+1}), \; i = 0, \dots, m. \tag{3.1.24}
\]

Thus, we have seen that, although both $N^+(f, c)$ and $N^-(f, c)$ may depend on c, the quantity N(f, c) as given in (3.1.22) does not depend on c and is seen to satisfy

\[
N(f, c) = \begin{cases}
1, & f(\alpha) = a, \; f(\beta) = b, \\
-1, & f(\alpha) = b, \; f(\beta) = a, \\
0, & f(\alpha) = f(\beta).
\end{cases}
\tag{3.1.25}
\]

This quantity may summarily be rewritten into the form of an integral

\[
N(f, c) = \frac{1}{b - a} (f(\beta) - f(\alpha)) = \frac{1}{b - a} \int_\alpha^\beta f'(t) \, dt, \tag{3.1.26}
\]

and interpreted to be the number count for the orientation-preserving times minus the orientation-reversing times the function f maps the interval [α, β] to cover the interval [a, b]. The advantage of using the integral representation for N(f, c) is that it is clearly independent of the choice of the regular value c. Indeed, the right-hand side of (3.1.26) is well defined for any differentiable function f not necessarily having only finitely many critical points and the specification of a regular value c for f becomes irrelevant. In fact, the integral on the right-hand side of (3.1.26) is the simplest topological invariant called the degree of the function f, which may now be denoted by

\[
\deg(f) = \frac{1}{b - a} \int_\alpha^\beta f'(t) \, dt. \tag{3.1.27}
\]

The word ‘topological’ is used to refer to the fact that a small alteration of f cannot perturb the value of deg(f), since deg(f) may take only integer values while the right-hand side of (3.1.27) depends continuously on the derivative of f.

As a simple application, we note that it is not hard to see that for any c ∈ [a, b] the equation f(t) = c has at least one solution when deg(f) ≠ 0.

We next extend our discussion of topological invariants to two-dimensional situations.

Let Γ and C be two closed differentiable curves in $\mathbb{R}^2$ oriented counterclockwise and let

\[
u : \Gamma \to C \tag{3.1.28}
\]

be a differentiable map. In analogy with the case of a real-valued function over an interval discussed above, we may express the number count for the orientation-preserving times minus the orientation-reversing times u maps the curve Γ to cover the curve C in the form of a line integral,

\[
\deg(u) = \frac{1}{|C|} \int_\Gamma \tau \cdot du, \tag{3.1.29}
\]

where |C| denotes the length of the curve C and τ is the unit tangent vector along the positive direction of C.

In the special situation when $C = S^1$ (the unit circle in $\mathbb{R}^2$ centered at the origin), we write u as

\[
u = (f, g), \quad f^2 + g^2 = 1, \tag{3.1.30}
\]

where f, g are real-valued functions, so that

\[
\tau = (-g, f). \tag{3.1.31}
\]

Now assume further that the curve Γ is parameterized by a parameter t taken over the interval [α, β]. Then, inserting (3.1.30) and (3.1.31) into (3.1.29) and using $|S^1| = 2\pi$, we have

\[
\begin{aligned}
\deg(u) &= \frac{1}{2\pi} \int_\alpha^\beta (-g, f) \cdot (f', g') \, dt \\
&= \frac{1}{2\pi} \int_\alpha^\beta (f g' - g f') \, dt \\
&= \frac{1}{2\pi} \int_\alpha^\beta \begin{vmatrix} f & g \\ f' & g' \end{vmatrix} dt.
\end{aligned}
\tag{3.1.32}
\]

Thus we see that the concept of determinant arises naturally again.

Let v be a vector field over $\mathbb{R}^2$. Let Γ be a closed curve in $\mathbb{R}^2$ where v ≠ 0. Then

\[
v_\Gamma = \frac{1}{\|v\|} v \tag{3.1.33}
\]

defines a map from Γ into $S^1$. The index of the vector field v along the curve Γ is then defined as

\[
\mathrm{ind}(v|_\Gamma) = \deg(v_\Gamma), \tag{3.1.34}
\]

which is another useful topological invariant.

As an example, we consider a vector field v over $\mathbb{R}^2$ given by

\[
v(x, y) = (x^2 - y^2, 2xy), \quad (x, y) \in \mathbb{R}^2. \tag{3.1.35}
\]

Then $\|v(x, y)\|^2 = (x^2 + y^2)^2 > 0$ for (x, y) ≠ (0, 0). So for any closed curve Γ not intersecting the origin, the quantity $\mathrm{ind}(v|_\Gamma)$ is well defined.


Let $S^1_R$ denote the circle of radius R > 0 in $\mathbb{R}^2$ centered at the origin. We may parameterize $S^1_R$ by the polar angle θ: x = R cos θ, y = R sin θ, θ ∈ [0, 2π]. With (3.1.35), we have

\[
v_{S^1_R} = \frac{1}{R^2} (x^2 - y^2, 2xy) = (\cos^2\theta - \sin^2\theta, \; 2\cos\theta \sin\theta) = (\cos 2\theta, \sin 2\theta). \tag{3.1.36}
\]

Therefore, using (3.1.32), we get

\[
\mathrm{ind}(v|_{S^1_R}) = \deg(v_{S^1_R}) = \frac{1}{2\pi} \int_0^{2\pi} \begin{vmatrix} \cos 2\theta & \sin 2\theta \\ -2\sin 2\theta & 2\cos 2\theta \end{vmatrix} d\theta = 2. \tag{3.1.37}
\]
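The value 2 in (3.1.37) can be reproduced numerically. Rather than evaluating the determinant integral (3.1.32) by quadrature, the sketch below uses an equivalent discretization that is swapped in here as an assumption: it sums the small signed angles between successive values of v along the circle, which accumulates the same total turning. The function names and the sample count are hypothetical.

```python
import math

def index_on_circle(v, R=1.0, samples=2000):
    """Approximate ind(v | S^1_R) by summing signed angle increments of v."""
    total = 0.0
    px, py = v(R, 0.0)
    for k in range(1, samples + 1):
        theta = 2.0 * math.pi * k / samples
        qx, qy = v(R * math.cos(theta), R * math.sin(theta))
        # signed angle from the previous value of v to the current one
        total += math.atan2(px * qy - py * qx, px * qx + py * qy)
        px, py = qx, qy
    return round(total / (2.0 * math.pi))

def v_example(x, y):
    """The vector field (3.1.35)."""
    return (x * x - y * y, 2.0 * x * y)

def v_identity(x, y):
    """The field v(x, y) = (x, y), for comparison."""
    return (x, y)

index_example = index_on_circle(v_example)         # expected to match (3.1.37)
index_identity = index_on_circle(v_identity)
index_other_radius = index_on_circle(v_example, R=0.5)
```

The result does not change with the radius, which is consistent with the index depending only on the zeros enclosed by the curve.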

For any closed curve Γ enclosing but not intersecting the origin, we can continuously deform it into a circle $S^1_R$ (R > 0), while staying away from the origin in the process. By continuity or topological invariance, we obtain $\mathrm{ind}(v|_\Gamma) = \mathrm{ind}(v|_{S^1_R}) = 2$. The meaning of this result will be seen in the following theorem.

Theorem 3.1 Let v be a vector field that is differentiable over a bounded domain Ω in $\mathbb{R}^2$ and let Γ be a closed curve contained in Ω. If v ≠ 0 on Γ and $\mathrm{ind}(v|_\Gamma) \neq 0$, then there must be at least one point enclosed inside Γ where v = 0.

Proof Assume otherwise that v ≠ 0 in the domain enclosed by Γ. Let γ be another closed curve enclosed inside Γ. Since γ may be obtained from Γ through a continuous deformation and v = 0 nowhere inside Γ, we have $\mathrm{ind}(v|_\Gamma) = \mathrm{ind}(v|_\gamma)$. On the other hand, if we parameterize the curve γ using its arclength s, then

\[
\mathrm{ind}(v|_\gamma) = \deg(v_\gamma) = \frac{1}{2\pi} \int_\gamma \tau \cdot v'_\gamma(s) \, ds, \quad v_\gamma = \left. \left( \frac{1}{\|v\|} v \right) \right|_\gamma, \tag{3.1.38}
\]

where τ(s) is the unit tangent vector along the unit circle $S^1$ at the image point $v_\gamma(s)$ under the map $v_\gamma : \gamma \to S^1$. Rewrite $v_\gamma$ as

\[
v_\gamma(s) = (f(x(s), y(s)), \; g(x(s), y(s))). \tag{3.1.39}
\]

Then

\[
v'_\gamma(s) = (f_x x'(s) + f_y y'(s), \; g_x x'(s) + g_y y'(s)). \tag{3.1.40}
\]

The assumption on v gives us the uniform boundedness of $|f_x|, |f_y|, |g_x|, |g_y|$ inside Γ. Using this property and (3.1.40), we see that there is a γ-independent constant C > 0 such that

\[
\|v'_\gamma(s)\| \leq C \sqrt{x'(s)^2 + y'(s)^2} = C. \tag{3.1.41}
\]

Since $\mathrm{ind}(v|_\gamma) = \mathrm{ind}(v|_\Gamma)$ is a nonzero integer, in view of (3.1.38) and (3.1.41), we have

\[
1 \leq |\mathrm{ind}(v|_\gamma)| \leq \frac{1}{2\pi} \int_\gamma \|v'_\gamma(s)\| \, ds \leq \frac{C}{2\pi} |\gamma|, \tag{3.1.42}
\]

which leads to a contradiction when the total arclength |γ| of the curve γ is made small enough.

Thus, returning to the example (3.1.35), we conclude that the vector field v has a zero inside any circle $S^1_R$ (R > 0) since we have shown that $\mathrm{ind}(v|_{S^1_R}) = 2 \neq 0$, which can only be the origin as seen trivially in (3.1.35) already.

We now use Theorem 3.1 to establish the celebrated Fundamental Theorem of Algebra as stated below.

Theorem 3.2 Any polynomial of degree n ≥ 1 with coefficients in C of the form

\[
f(z) = a_n z^n + a_{n-1} z^{n-1} + \cdots + a_0, \quad a_0, \dots, a_{n-1}, a_n \in \mathbb{C}, \quad a_n \neq 0, \tag{3.1.43}
\]

must have a zero in C. That is, there is some $z_0 \in \mathbb{C}$ such that $f(z_0) = 0$.

Proof Without loss of generality and for the sake of simplicity, we may assume $a_n = 1$; otherwise we may divide f(z) by $a_n$.

Let z = x + iy, x, y ∈ R, and write f(z) as

\[
f(z) = P(x, y) + iQ(x, y), \tag{3.1.44}
\]

where P, Q are real-valued functions of x, y. Consider the vector field

\[
v(x, y) = (P(x, y), Q(x, y)). \tag{3.1.45}
\]

Then it is clear that ‖v(x, y)‖ = |f(z)| and it suffices to show that v vanishes at some $(x_0, y_0) \in \mathbb{R}^2$.

In order to simplify our calculation, we consider a one-parameter deformation of f(z) given by

\[
f^t(z) = z^n + t (a_{n-1} z^{n-1} + \cdots + a_0), \quad t \in [0, 1], \tag{3.1.46}
\]

and denote the correspondingly constructed vector field by $v^t(x, y)$. So on the circle $S^1_R = \{(x, y) \in \mathbb{R}^2 \mid \|(x, y)\| = |z| = R\}$ (R > 0), we have the uniform lower estimate

\[
\|v^t(x, y)\| = |f^t(z)| \geq R^n \left( 1 - |a_{n-1}| \frac{1}{R} - \cdots - |a_0| \frac{1}{R^n} \right) \equiv C(R), \quad t \in [0, 1]. \tag{3.1.47}
\]

Thus, when R is sufficiently large, we have C(R) ≥ 1 (say). For such a choice of R, by topological invariance, we have

\[
\mathrm{ind}\left( v|_{S^1_R} \right) = \mathrm{ind}\left( v^1|_{S^1_R} \right) = \mathrm{ind}\left( v^0|_{S^1_R} \right). \tag{3.1.48}
\]

On the other hand, over $S^1_R$ we may again use the polar angle θ: x = R cos θ, y = R sin θ, or $z = Re^{i\theta}$, to represent $f^0$ as $f^0(z) = R^n e^{in\theta}$. Hence $v^0 = R^n (\cos n\theta, \sin n\theta)$. Consequently,

\[
v^0_{S^1_R} = \frac{1}{\|v^0\|} v^0 = (\cos n\theta, \sin n\theta). \tag{3.1.49}
\]

Therefore, as before, we obtain

\[
\mathrm{ind}\left( v^0|_{S^1_R} \right) = \deg\left( v^0_{S^1_R} \right) = \frac{1}{2\pi} \int_0^{2\pi} \begin{vmatrix} \cos n\theta & \sin n\theta \\ -n \sin n\theta & n \cos n\theta \end{vmatrix} d\theta = n. \tag{3.1.50}
\]

In view of (3.1.48) and (3.1.50), we get $\mathrm{ind}(v|_{S^1_R}) = n$. Thus, applying Theorem 3.1, we conclude that v must vanish somewhere inside the circle $S^1_R$.
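The key step of the proof, that the index of v on a large circle equals the degree n, can be observed numerically for a specific polynomial. The sketch below is illustrative only: the cubic $z^3 - 2z + 5$, the radii, and the helper names are hypothetical choices, and the index is approximated by accumulating the phase change of f(z) around the circle rather than by the determinant integral.

```python
import cmath
import math

def poly(coeffs, z):
    """Evaluate a polynomial by Horner's rule; coeffs run from highest degree down."""
    result = 0
    for c in coeffs:
        result = result * z + c
    return result

def index_on_circle(coeffs, R, samples=4000):
    """Approximate the index of v = (Re f, Im f) along |z| = R."""
    total = 0.0
    prev = poly(coeffs, R)
    for k in range(1, samples + 1):
        z = R * cmath.exp(2j * math.pi * k / samples)
        cur = poly(coeffs, z)
        total += cmath.phase(cur / prev)     # small signed angle between samples
        prev = cur
    return round(total / (2.0 * math.pi))

coeffs = [1, 0, -2, 5]                       # f(z) = z^3 - 2z + 5, so n = 3
index_large = index_on_circle(coeffs, 10.0)  # large circle, as in the proof
index_small = index_on_circle(coeffs, 0.1)   # small circle near z = 0, where f is close to 5
```

On the large circle the computed index is the degree 3, while on the small circle, which encloses no zero of this polynomial, it is 0.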

Use Γ to denote a closed surface in $\mathbb{R}^3$ and $S^2$ the standard unit sphere in $\mathbb{R}^3$. We may also consider a map $u : \Gamma \to S^2$. Since the orientation of $S^2$ is given by its unit outnormal vector, say ν, we may analogously express the number count, for the number of times that u covers $S^2$ in an orientation-preserving manner minus the number of times that u covers $S^2$ in an orientation-reversing manner, in the form of a surface integral, also called the degree of the map $u : \Gamma \to S^2$, by

\[
\deg(u) = \frac{1}{|S^2|} \int_\Gamma \nu \cdot d\sigma, \tag{3.1.51}
\]

where dσ is the vector area element over $S^2$ induced from the map u.

To further facilitate computation, we may assume that Γ is parameterized by the parameters s, t over a two-dimensional domain Ω and u = (f, g, h), where f, g, h are real-valued functions of s, t so that $f^2 + g^2 + h^2 = 1$. At the image point u, the unit outnormal of $S^2$ at u is simply u itself. Moreover, the vector area element at u under the mapping u can be represented as

\[
d\sigma = \left( \frac{\partial u}{\partial s} \times \frac{\partial u}{\partial t} \right) ds \, dt. \tag{3.1.52}
\]

Thus, inserting these and $|S^2| = 4\pi$ into (3.1.51), we arrive at

\[
\deg(u) = \frac{1}{4\pi} \int_\Omega u \cdot \left( \frac{\partial u}{\partial s} \times \frac{\partial u}{\partial t} \right) ds \, dt
= \frac{1}{4\pi} \int_\Omega \begin{vmatrix} f & g & h \\ f_s & g_s & h_s \\ f_t & g_t & h_t \end{vmatrix} ds \, dt. \tag{3.1.53}
\]

This gives another example of the use of determinants.

Exercises

3.1.1 For the 3 × 3 system of equations

\[
\begin{cases}
a_1 x_1 + a_2 x_2 + a_3 x_3 = d_1, \\
b_1 x_1 + b_2 x_2 + b_3 x_3 = d_2, \\
c_1 x_1 + c_2 x_2 + c_3 x_3 = d_3,
\end{cases}
\tag{3.1.54}
\]

find solution formulas similar to those for (3.1.17), expressed as ratios of some 3 × 3 determinants.

3.1.2 Let f : [α, β] → [a, b] be a real-valued continuously differentiable function satisfying {f(α), f(β)} ⊂ {a, b}, where α, β, a, b ∈ R and α < β and a < b. Show that if deg(f) ≠ 0 then f(t) = c has a solution for any c ∈ (a, b).

3.1.3 Consider the vector fields or maps $f, g : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

\[
f(x, y) = (ax, by), \quad g(x, y) = (a^2 x^2 - b^2 y^2, 2abxy), \quad (x, y) \in \mathbb{R}^2, \tag{3.1.55}
\]

where a, b > 0 are constants.

(i) Compute the indices of f and g around a suitable closed curve around the origin of $\mathbb{R}^2$.
(ii) What do you notice? Do the results depend on a, b? In particular, what are the numbers (counting multiplicities) of zeros of the maps f and g?
(iii) If we make some small alterations of f and g by a positive parameter ε > 0 to get (say)

\[
f_\varepsilon(x, y) = (ax - \varepsilon, by), \quad g_\varepsilon(x, y) = (a^2 x^2 - b^2 y^2 - \varepsilon, 2abxy), \quad (x, y) \in \mathbb{R}^2, \tag{3.1.56}
\]

what happens to the indices of $f_\varepsilon$ and $g_\varepsilon$ around the same closed curve? What happens to the zeros of $f_\varepsilon$ and $g_\varepsilon$?

3.1.4 Consider the following simultaneous system of nonlinear equations

\[
\begin{cases}
x^3 - 3xy^2 = 5\cos^2(x + y), \\
3x^2 y - y^3 = -2e^{-x^2 y^2}.
\end{cases}
\tag{3.1.57}
\]

Use the topological method in this section to prove that the system has at least one solution.

3.1.5 Consider the stereographic projection of $S^2$ sited in $\mathbb{R}^3$ with the Cartesian coordinates $x, y, z$ onto the $xy$-plane through the south pole $(0, 0, -1)$, which induces a parameterization of $S^2$ by $\mathbb{R}^2$ given by

\[ f = \frac{2x}{1 + x^2 + y^2}, \quad g = \frac{2y}{1 + x^2 + y^2}, \quad h = \frac{1 - x^2 - y^2}{1 + x^2 + y^2}, \quad (x, y) \in \mathbb{R}^2, \tag{3.1.58} \]

where $u = (f, g, h) \in S^2$. Regarded as the identity map $u : S^2 \to S^2$, we have $\deg(u) = 1$. Verify this result by computing the integral

\[ \deg(u) = \frac{1}{4\pi} \int_{\mathbb{R}^2} \begin{vmatrix} f & g & h \\ f_x & g_x & h_x \\ f_y & g_y & h_y \end{vmatrix} dx\,dy. \tag{3.1.59} \]

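3.1.6 The hedgehog map is a map $S^2 \to S^2$ defined in terms of the parameterization of $\mathbb{R}^2$ by polar coordinates $r, \theta$ by the expression

\[ u = (\cos(n\theta) \sin f(r), \sin(n\theta) \sin f(r), \cos f(r)), \tag{3.1.60} \]

where $0 < r < \infty$, $\theta \in [0, 2\pi]$, $n \in \mathbb{Z}$, and $f$ is a real-valued function satisfying the boundary condition $f(0) = \pi$ and $f(\infty) = 0$. Compute

\[ \deg(u) = \frac{1}{4\pi} \int_0^\infty \int_0^{2\pi} u \cdot \left( \frac{\partial u}{\partial r} \times \frac{\partial u}{\partial \theta} \right) d\theta\,dr \tag{3.1.61} \]

and explain your result.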

3.2 Definition and properties of determinants

Motivated by the practical examples shown in the previous section, we now systematically develop the notion of determinants. There are many ways to define determinants. The inductive definition to be presented below is perhaps the simplest.

Definition 3.3 Consider $A \in \mathbb{F}(n, n)$ ($n \geq 1$). If $n = 1$, then $A = (a)$ ($a \in \mathbb{F}$) and the determinant of $A$, $\det(A)$, is defined to be $\det(A) = a$; if $A = (a_{ij})$ with $n \geq 2$, the minor $M_{ij}$ of the entry $a_{ij}$ is the determinant of the $(n-1) \times (n-1)$ submatrix of $A$ obtained from deleting the $i$th row and $j$th column of $A$ occupied by $a_{ij}$, and the cofactor $C_{ij}$ is given by

\[ C_{ij} = (-1)^{i+j} M_{ij}, \quad i, j = 1, \dots, n. \tag{3.2.1} \]

The determinant of $A$ is defined by the expansion formula

\[ \det(A) = \sum_{i=1}^{n} a_{i1} C_{i1}. \tag{3.2.2} \]

The formula (3.2.2) is also referred to as the cofactor expansion of the determinant according to the first column.
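The inductive definition translates directly into a recursive procedure. The following is a minimal sketch, assuming a matrix is stored as a list of rows with 0-based indices; the names `minor_matrix` and `det` are illustrative choices and not notation from the text.

```python
def minor_matrix(a, i, j):
    """Return the submatrix of a with row i and column j deleted (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """Determinant by cofactor expansion down the first column, as in (3.2.2)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for i in range(n):
        # With 0-based row index i and first column, the sign (-1)^(i+j)
        # of (3.2.1) becomes (-1)^i.
        cofactor = (-1) ** i * det(minor_matrix(a, i, 0))
        total += a[i][0] * cofactor
    return total

upper = [[2, 5, 7], [0, 3, 1], [0, 0, 4]]
general = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det(upper))    # 24, the product of the diagonal entries
print(det(general))  # -3
```

The recursion performs on the order of $n!$ multiplications, so it is useful for checking small examples rather than for large matrices.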

This definition indicates that if a column of an $n \times n$ matrix $A$ is zero then $\det(A) = 0$. To show this, we use induction. When $n = 1$, it is trivial. Assume that the statement is true at $n - 1$ ($n \geq 2$). We now prove the statement at $n$ ($n \geq 2$). In fact, if the first column of $A$ is zero, then $\det(A) = 0$ simply by the definition of determinant (see (3.2.2)); if another column rather than the first column of $A$ is zero, then all the cofactors $C_{i1}$ vanish by the inductive assumption, which still results in $\det(A) = 0$. The definition also implies that if $A = (a_{ij})$ is upper triangular then $\det(A)$ is the product of its diagonal entries, $\det(A) = a_{11} \cdots a_{nn}$, as may be shown by induction as well.

The above definition of a determinant immediately leads to the following important properties.

Theorem 3.4 Consider the $n \times n$ matrices $A, B$ given as

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{k1} & \cdots & a_{kn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad B = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ r a_{k1} & \cdots & r a_{kn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}. \tag{3.2.3} \]

That is, $B$ is obtained from $A$ by multiplying the $k$th row of $A$ by a scalar $r \in \mathbb{F}$, $k = 1, \dots, n$. Then we have $\det(B) = r \det(A)$.

Proof We prove the theorem by induction.

When $n = 1$, the statement of the theorem is trivial.

Assume that the statement is true for $(n-1) \times (n-1)$ matrices when $n \geq 2$.

Now for the $n \times n$ matrices given in (3.2.3), use $C^A_{ij}$ and $C^B_{ij}$ to denote the cofactors of $A$ and $B$, respectively, $i, j = 1, \dots, n$. Then, (3.2.3) and the inductive assumption give us

\[ C^B_{k1} = C^A_{k1}; \quad C^B_{i1} = r C^A_{i1}, \quad i \neq k. \tag{3.2.4} \]

Therefore, we arrive at

\[ \det(B) = \sum_{i \neq k} a_{i1} C^B_{i1} + r a_{k1} C^B_{k1} = \sum_{i \neq k} a_{i1} r C^A_{i1} + r a_{k1} C^A_{k1} = r \det(A). \tag{3.2.5} \]

The proof is complete.

This theorem implies that if a row of a matrix $A$ is zero then $\det(A) = 0$.

As an application, we show that if an $n \times n$ matrix $A = (a_{ij})$ is lower triangular then $\det(A) = a_{11} \cdots a_{nn}$. In fact, when $n = 1$, there is nothing to show. Assume that the formula is true at $n - 1$ ($n \geq 2$). At $n \geq 2$, the first row of the minor $M_{i1}$ of $A$ vanishes for each $i = 2, \dots, n$. So $M_{i1} = 0$, $i = 2, \dots, n$. However, the inductive assumption gives us $M_{11} = a_{22} \cdots a_{nn}$. Thus $\det(A) = a_{11} (-1)^{1+1} M_{11} = a_{11} \cdots a_{nn}$ as claimed.

Therefore, if an $n \times n$ matrix $A = (a_{ij})$ is either upper or lower triangular, we infer that there holds $\det(A^t) = \det(A)$, although, later, we will show that such a result is true for general matrices.

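Theorem 3.5 For the $n \times n$ matrices $A = (a_{ij})$, $B = (b_{ij})$, $C = (c_{ij})$ which have identical rows except the $k$th row in which $c_{kj} = a_{kj} + b_{kj}$, $j = 1, \dots, n$, we have $\det(C) = \det(A) + \det(B)$.

Proof We again use induction on $n$.

The statement is clear when $n = 1$.

Assume that the statement is valid for the $n - 1$ case ($n \geq 2$).

For $A, B, C$ given in the theorem with $n \geq 2$, with the notation in the proof of Theorem 3.4 and in view of the inductive assumption, we have

\[ C^C_{k1} = C^A_{k1}; \quad C^C_{i1} = C^A_{i1} + C^B_{i1}, \quad i \neq k. \tag{3.2.6} \]

Consequently,

\[ \begin{aligned} \det(C) &= \sum_{i \neq k} a_{i1} C^C_{i1} + c_{k1} C^C_{k1} \\ &= \sum_{i \neq k} a_{i1} (C^A_{i1} + C^B_{i1}) + (a_{k1} + b_{k1}) C^A_{k1} \\ &= \sum_{i \neq k} a_{i1} C^A_{i1} + a_{k1} C^A_{k1} + \sum_{i \neq k} a_{i1} C^B_{i1} + b_{k1} C^A_{k1} \\ &= \det(A) + \det(B), \end{aligned} \tag{3.2.7} \]

as asserted. (In the last step we use the facts that $a_{i1} = b_{i1}$ for $i \neq k$ and $C^A_{k1} = C^B_{k1}$, since $A$ and $B$ agree outside the $k$th row.)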

Theorem 3.6 Let $A, B$ be two $n \times n$ ($n \geq 2$) matrices so that $B$ is obtained from interchanging any two rows of $A$. Then $\det(B) = -\det(A)$.

Proof We use induction on $n$.

At $n = 2$, we can directly check that the statement of the theorem is true.

Assume that the statement is true at $n - 1 \geq 2$.

Let $A, B$ be $n \times n$ ($n \geq 3$) matrices given by

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ a_{j1} & \cdots & a_{jn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \quad B = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{j1} & \cdots & a_{jn} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \tag{3.2.8} \]

where $j = i + k$ for some $k \geq 1$. We observe that it suffices to prove the adjacent case when $k = 1$ because when $k \geq 2$ we may obtain $B$ from $A$ simply by interchanging adjacent rows $k$ times downwardly and then $k - 1$ times upwardly, which gives rise to an odd number, $2k - 1$, of adjacent row interchanges.

For the adjacent row interchange, $j = i + 1$, the inductive assumption allows us to arrive at the following relations between the minors of the matrices $A$ and $B$ immediately,

\[ M^B_{k1} = -M^A_{k1}, \quad k \neq i, i + 1; \quad M^B_{i1} = M^A_{i+1,1}, \quad M^B_{i+1,1} = M^A_{i1}, \tag{3.2.9} \]

which implies that the corresponding cofactors of $A$ and $B$ all differ by a sign,

\[ C^B_{k1} = -C^A_{k1}, \quad k \neq i, i + 1; \quad C^B_{i1} = -C^A_{i+1,1}, \quad C^B_{i+1,1} = -C^A_{i1}. \tag{3.2.10} \]

Hence

\[ \begin{aligned} \det(B) &= \sum_{k \neq i, i+1} a_{k1} C^B_{k1} + a_{i+1,1} C^B_{i1} + a_{i1} C^B_{i+1,1} \\ &= \sum_{k \neq i, i+1} a_{k1} (-C^A_{k1}) + a_{i+1,1} (-C^A_{i+1,1}) + a_{i1} (-C^A_{i1}) \\ &= -\det(A), \end{aligned} \tag{3.2.11} \]

as expected.

This theorem indicates that if two rows of an $n \times n$ ($n \geq 2$) matrix $A$ are identical then $\det(A) = 0$. Thus adding a multiple of a row to another row of $A$ does not alter the determinant of $A$:

\[ \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ r a_{i1} + a_{j1} & \cdots & r a_{in} + a_{jn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ r a_{i1} & \cdots & r a_{in} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} + \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{i1} & \cdots & a_{in} \\ \cdots & \cdots & \cdots \\ a_{j1} & \cdots & a_{jn} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = \det(A). \tag{3.2.12} \]

The above results provide us with practical computational techniques when evaluating the determinant of an $n \times n$ matrix $A$. In fact, we may perform the following three types of permissible row operations on $A$.

(1) Multiply a row of $A$ by a nonzero scalar. Such an operation may also be realized by multiplying $A$ from the left by the matrix obtained from multiplying the corresponding row of the $n \times n$ identity matrix $I$ by the same scalar.

(2) Interchange any two rows of $A$ when $n \geq 2$. Such an operation may also be realized by multiplying $A$ from the left by the matrix obtained from interchanging the corresponding two rows of the $n \times n$ identity matrix $I$.

(3) Add a multiple of a row to another row of $A$ when $n \geq 2$. Such an operation may also be realized by multiplying $A$ from the left by the matrix obtained from adding the same multiple of the row to another row, correspondingly, of the $n \times n$ identity matrix $I$.

The matrices constructed in the above three types of permissible row operations are called elementary matrices of types 1, 2, 3. Let $E$ be an elementary matrix of a given type. Then $E$ is invertible and $E^{-1}$ is of the same type. More precisely, if $E$ is of type 1 and obtained from multiplying a row of $I$ by the scalar $r$, then $\det(E) = r \det(I) = r$ and $E^{-1}$ is simply obtained from multiplying the same row of $I$ by $r^{-1}$, resulting in $\det(E^{-1}) = r^{-1}$; if $E$ is of type 2, then $E^{-1} = E$ and $\det(E) = \det(E^{-1}) = -\det(I) = -1$; if $E$ is of type 3 and obtained from adding an $r$ multiple of the $i$th row to the $j$th row ($i \neq j$) of $I$, then $E^{-1}$ is obtained from adding a $(-r)$ multiple of the $i$th row to the $j$th row of $I$ and $\det(E) = \det(E^{-1}) = \det(I) = 1$. In all cases,

\[ \det(E^{-1}) = \det(E)^{-1}. \tag{3.2.13} \]

In conclusion, the properties of determinants under permissible row operations may be summarized collectively as follows.

Theorem 3.7 Let $A$ be an $n \times n$ matrix and $E$ be an elementary matrix of the same dimensions. Then

\[ \det(EA) = \det(E) \det(A). \tag{3.2.14} \]
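The three types of elementary matrices and the relation (3.2.14) can be checked on a small example. The sketch below is illustrative only: the helper names (`scale_row`, `swap_rows`, `add_multiple`) and the 0-based row indices are assumptions made here, and the determinant is computed by the first-column cofactor expansion of Definition 3.3.

```python
def det(a):
    """Cofactor expansion down the first column, as in Definition 3.3."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for i, row in enumerate(a):
        minor = [r[1:] for k, r in enumerate(a) if k != i]
        total += (-1) ** i * row[0] * det(minor)
    return total

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def scale_row(n, k, r):
    """Type 1: multiply row k of the identity by the nonzero scalar r."""
    e = identity(n)
    e[k][k] = r
    return e

def swap_rows(n, i, j):
    """Type 2: interchange rows i and j of the identity."""
    e = identity(n)
    e[i], e[j] = e[j], e[i]
    return e

def add_multiple(n, i, j, r):
    """Type 3: add r times row i to row j of the identity (i != j)."""
    e = identity(n)
    e[j][i] = r
    return e

a = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
for e in (scale_row(3, 1, 5), swap_rows(3, 0, 2), add_multiple(3, 0, 2, -7)):
    # Left multiplication by e performs the row operation on a.
    print(det(e), det(matmul(e, a)) == det(e) * det(a))
```

Each line of output shows $\det(E)$ followed by whether $\det(EA) = \det(E)\det(A)$ holds for that elementary matrix.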

For an $n \times n$ matrix $A$, we can perform a sequence of permissible row operations on $A$ to reduce it into an upper triangular matrix, say $U$, whose determinant is simply the product of its diagonal entries. Thus, if we write $E_k \cdots E_1 A = U$ where $E_1, \dots, E_k$ are some elementary matrices, then Theorem 3.7 gives us the relation

\[ \det(E_k) \cdots \det(E_1) \det(A) = \det(U). \tag{3.2.15} \]
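The relation (3.2.15) suggests a practical way to evaluate a determinant: reduce $A$ to an upper triangular $U$ while recording the product of the determinants of the elementary matrices used. The following is a sketch under the assumption that only type 2 and type 3 operations are applied; the function name `det_by_row_reduction` and the use of exact rational arithmetic are choices made for this illustration.

```python
from fractions import Fraction

def det_by_row_reduction(a):
    """Reduce a to upper triangular form U with type 2 and type 3 row
    operations, track det(E_k) ... det(E_1), and solve (3.2.15) for det(a)."""
    u = [[Fraction(x) for x in row] for row in a]
    n = len(u)
    det_e = Fraction(1)  # running product of the det(E_l)
    for col in range(n):
        # Look for a nonzero pivot in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if u[r][col] != 0), None)
        if pivot is None:
            # The diagonal entry of U in this column will be zero.
            return Fraction(0)
        if pivot != col:
            u[col], u[pivot] = u[pivot], u[col]  # type 2, det(E) = -1
            det_e = -det_e
        for r in range(col + 1, n):
            factor = u[r][col] / u[col][col]
            # type 3: add (-factor) times row col to row r, det(E) = 1
            u[r] = [x - factor * y for x, y in zip(u[r], u[col])]
    det_u = Fraction(1)
    for i in range(n):
        det_u *= u[i][i]
    return det_u / det_e

print(det_by_row_reduction([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # -3
print(det_by_row_reduction([[0, 1], [1, 0]]))                    # -1
```

Since only row interchanges change the tracked product, `det_e` is always $\pm 1$ in this sketch, and the cost is on the order of $n^3$ arithmetic operations.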

We are now prepared to establish the similar properties of determinants with respect to column operations.

Theorem 3.8 The conclusions of Theorems 3.4, 3.5, and 3.6 hold when the statements about the row vectors are replaced correspondingly with the column vectors of matrices.

Proof Using induction, it is easy to see that the conclusions of Theorems 3.4 and 3.5 hold. We now prove that the conclusion of Theorem 3.6 holds when the row interchange there is replaced with column interchange. That is, we show that if two columns in an $n \times n$ matrix $A$ are interchanged, its determinant will change sign. This property is not so obvious since our definition of determinant is based on the cofactor expansion by the first column vector and an interchange of the first column with another column alters the first column of the matrix. The effect of such an alteration on the value of the determinant needs to be examined closely, which will be our task below.

We still use induction.

At $n = 2$ the conclusion may be checked directly.

Assume the conclusion holds at $n - 1 \geq 2$.

We now prove the conclusion at $n \geq 3$. As before, it suffices to establish the conclusion for any adjacent column interchange.

If the column interchange does not involve the first column, we see that the conclusion about the sign change of the determinant clearly holds in view of the inductive assumption and the cofactor expansion formula (3.2.2) since all the cofactors $C_{i1}$ ($i = 1, \dots, n$) change their sign exactly once under any such column interchange.

Now consider the effect of an interchange of the first and second columns of $A$. It can be checked that such an operation may be carried out through multiplying $A$ by the matrix $F$ from the right where $F$ is obtained from the $n \times n$ identity matrix $I$ by interchanging the first and second columns of $I$. Of course, $\det(F) = -1$.

Let $E_1, \dots, E_k$ be a sequence of elementary matrices and $U = (u_{ij})$ an upper triangular matrix so that $E_k \cdots E_1 A = U$. Then we have

\[ UF = \begin{pmatrix} u_{12} & u_{11} & u_{13} & \cdots & u_{1n} \\ u_{22} & 0 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & 0 & u_{nn} \end{pmatrix}. \tag{3.2.16} \]

Thus the cofactor expansion formula down the first column, as stated in Definition 3.3, and the inductive assumption at $n - 1$ lead us to the result

\[ \det(UF) = u_{12} \begin{vmatrix} 0 & u_{23} & \cdots & \cdots \\ 0 & u_{33} & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & u_{nn} \end{vmatrix} - u_{22} \begin{vmatrix} u_{11} & u_{13} & \cdots & \cdots \\ 0 & u_{33} & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ 0 & \cdots & \cdots & u_{nn} \end{vmatrix} = -u_{11} u_{22} \cdots u_{nn} = -\det(U). \tag{3.2.17} \]

Here the first determinant vanishes because its first column is zero, and the second is the determinant of an upper triangular matrix.

Combining (3.2.15) and (3.2.17), we obtain

\[ \det(E_k \cdots E_1 A F) = \det(UF) = -\det(U) = -\det(E_k) \cdots \det(E_1) \det(A). \tag{3.2.18} \]

Thus, applying (3.2.14) on the left-hand side of (3.2.18), we arrive at $\det(AF) = -\det(A)$ as desired.

Theorem 3.8 gives us additional practical computational techniques when evaluating the determinant of an $n \times n$ matrix $A$ because it allows us to perform the following three types of permissible column operations on $A$.

(1) Multiply a column of $A$ by a nonzero scalar. Such an operation may also be realized by multiplying $A$ from the right by the matrix obtained from multiplying the corresponding column of the $n \times n$ identity matrix $I$ by the same scalar.

(2) Interchange any two columns of $A$ when $n \geq 2$. Such an operation may also be realized by multiplying $A$ from the right by the matrix obtained from interchanging the corresponding two columns of the $n \times n$ identity matrix $I$.

(3) Add a multiple of a column to another column of $A$ when $n \geq 2$. Such an operation may also be realized by multiplying $A$ from the right by the matrix obtained from adding the same multiple of the column to another column, correspondingly, of the $n \times n$ identity matrix $I$.

The matrices constructed in the above three types of permissible column operations are simply the elementary matrices defined and described earlier.

Like those for permissible row operations, the properties of a determinant under permissible column operations may be summarized collectively as well, as follows.

Theorem 3.9 Let $A$ be an $n \times n$ matrix and $E$ be an elementary matrix of the same dimensions. Then

\[ \det(AE) = \det(A) \det(E). \tag{3.2.19} \]

With the above preparation, we are ready to harvest a series of important properties of determinants.

First we show that a determinant is invariant under matrix transpose.

Theorem 3.10 Let $A$ be an $n \times n$ matrix. Then

\[ \det(A) = \det(A^t). \tag{3.2.20} \]

Proof Choose a sequence of elementary matrices $E_1, \dots, E_k$ such that

\[ E_k \cdots E_1 A = U \tag{3.2.21} \]

is an upper triangular matrix. Thus from $U^t = A^t E_1^t \cdots E_k^t$ and Theorem 3.9 we have

\[ \det(U) = \det(U^t) = \det(A^t E_1^t \cdots E_k^t) = \det(A^t) \det(E_1^t) \cdots \det(E_k^t). \tag{3.2.22} \]

Comparing (3.2.22) with (3.2.15) and noting $\det(E_l) = \det(E_l^t)$, $l = 1, \dots, k$, because an elementary matrix is either symmetric or lower or upper triangular, we arrive at $\det(A) = \det(A^t)$ as claimed.

Next we show that a determinant preserves matrix multiplication.

Theorem 3.11 Let $A$ and $B$ be two $n \times n$ matrices. Then

\[ \det(AB) = \det(A) \det(B). \tag{3.2.23} \]

Proof Let $E_1, \dots, E_k$ be a sequence of elementary matrices so that (3.2.21) holds for an upper triangular matrix $U$. There are two cases to be treated separately.

(1) $U$ has a zero row. Then $\det(U) = 0$. Moreover, by the definition of matrix multiplication, we see that $UB$ also has a zero row. Hence $\det(UB) = 0$. On the other hand, (3.2.21) leads us to $E_k \cdots E_1 AB = UB$. So

\[ \det(E_k) \cdots \det(E_1) \det(A) = \det(U) = 0, \quad \det(E_k) \cdots \det(E_1) \det(AB) = 0. \tag{3.2.24} \]

In particular, $\det(A) = 0$ and $\det(AB) = 0$. Thus (3.2.23) is valid.

(2) $U$ has no zero row. Hence $u_{nn} \neq 0$. Using (type 3) row operations if necessary we may assume $u_{in} = 0$ for all $i \leq n - 1$. Therefore $u_{n-1,n-1} \neq 0$. Thus, using more (type 3) row operations if necessary, we may eventually assume that $U$ is made diagonal with $u_{11} \neq 0, \dots, u_{nn} \neq 0$. Using (type 1) row operations if necessary we may assume $u_{11} = \cdots = u_{nn} = 1$. That is, $U = I$. Thus we get $E_k \cdots E_1 A = I$ and $E_k \cdots E_1 AB = B$. Consequently,

\[ \det(E_k) \cdots \det(E_1) \det(A) = 1, \quad \det(E_k) \cdots \det(E_1) \det(AB) = \det(B), \tag{3.2.25} \]

which immediately lead us to the anticipated conclusion (3.2.23).



The formula (3.2.23) can be used immediately to derive a few simple but basic conclusions about various matrices.

For example, if $A \in \mathbb{F}(n, n)$ is invertible, then there is some $B \in \mathbb{F}(n, n)$ such that $AB = I_n$. Thus $\det(A) \det(B) = \det(AB) = \det(I_n) = 1$, which implies that $\det(A) \neq 0$. In other words, the condition $\det(A) \neq 0$ is necessary for any $A \in \mathbb{F}(n, n)$ to be invertible. In the next section, we shall show that this condition is also sufficient. As another example, if $A \in \mathbb{R}(n, n)$ is orthogonal, then $AA^t = I_n$. Hence $(\det(A))^2 = \det(A) \det(A^t) = \det(AA^t) = \det(I_n) = 1$. In other words, the determinant of an orthogonal matrix can only take values $\pm 1$. Similarly, if $A \in \mathbb{C}(n, n)$ is unitary, then the condition $AA^\dagger = I_n$ leads us to the conclusion

\[ |\det(A)|^2 = \det(A) \overline{\det(A)} = \det(A) \det(\overline{A}^t) = \det(AA^\dagger) = \det(I_n) = 1 \tag{3.2.26} \]

(cf. (3.2.38)). That is, the determinant of a unitary matrix is of modulus one.

Below we show that we can make a cofactor expansion along any column or row to evaluate a determinant.

Theorem 3.12 Let $A = (a_{ij})$ be an $n \times n$ matrix and $C = (C_{ij})$ its cofactor matrix. Then

\[ \det(A) = \sum_{i=1}^{n} a_{ik} C_{ik} = \sum_{j=1}^{n} a_{kj} C_{kj}, \quad k = 1, \dots, n. \tag{3.2.27} \]

In other words, the determinant of $A$ may be evaluated by a cofactor expansion along any column or any row of $A$.

Proof We first consider the column expansion case.

We will make induction on $k = 1, \dots, n$.

If $k = 1$, there is nothing to show.

Assume the statement is true at $k \geq 1$ and $n \geq 2$. Interchanging the $k$th and $(k+1)$th columns and using the inductive assumption that a cofactor expansion can be made along the $k$th column of the matrix with the two columns already interchanged, we have

\[ -\det(A) = a_{1,k+1} (-1)^{1+k} M_{1,k+1} + \cdots + a_{i,k+1} (-1)^{i+k} M_{i,k+1} + \cdots + a_{n,k+1} (-1)^{n+k} M_{n,k+1}, \tag{3.2.28} \]

which is exactly what was claimed at $k + 1$:

\[ \det(A) = \sum_{i=1}^{n} a_{i,k+1} (-1)^{i+(k+1)} M_{i,k+1}. \tag{3.2.29} \]

We next consider the row expansion case.

We use $M^{A^t}_{ij}$ to denote the minor of the $(i, j)$th entry of the matrix $A^t$ ($i, j = 1, \dots, n$). Applying Theorem 3.10 and Definition 3.3 we have

\[ \det(A) = \det(A^t) = \sum_{j=1}^{n} a_{1j} (-1)^{j+1} M^{A^t}_{j1} = \sum_{j=1}^{n} a_{1j} (-1)^{1+j} M_{1j}, \tag{3.2.30} \]

which establishes the legitimacy of the cofactor expansion formula along the first row of $A$. The validity of the cofactor expansion along an arbitrary row may be proved by induction as done for the column case.
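Theorem 3.12 can be checked numerically on a small matrix by comparing the expansions along every column and every row with the first-column expansion of Definition 3.3. The sketch below uses hypothetical helper names (`submatrix`, `expand_column`, `expand_row`) and 0-based indices; the parity of $i + k$ is the same in 0-based and 1-based indexing, so the signs agree with (3.2.27).

```python
def submatrix(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """First-column cofactor expansion, as in Definition 3.3."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(submatrix(a, i, 0)) for i in range(len(a)))

def expand_column(a, k):
    """Cofactor expansion along column k."""
    n = len(a)
    return sum(a[i][k] * (-1) ** (i + k) * det(submatrix(a, i, k)) for i in range(n))

def expand_row(a, k):
    """Cofactor expansion along row k."""
    n = len(a)
    return sum(a[k][j] * (-1) ** (k + j) * det(submatrix(a, k, j)) for j in range(n))

a = [[2, 0, 1, 3], [1, 4, -1, 0], [5, 2, 2, 1], [0, 3, 1, -2]]
values = [expand_column(a, k) for k in range(4)] + [expand_row(a, k) for k in range(4)]
print(all(v == det(a) for v in values))  # True
```

In hand computations the same freedom is used to expand along the row or column containing the most zeros.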

Assume that $A \in \mathbb{F}(n, n)$ takes a boxed upper triangular form,

\[ A = \begin{pmatrix} A_1 & A_3 \\ 0 & A_2 \end{pmatrix}, \tag{3.2.31} \]

where $A_1 \in \mathbb{F}(k, k)$, $A_2 \in \mathbb{F}(l, l)$, $A_3 \in \mathbb{F}(k, l)$, and $k + l = n$. Then we have the useful formula

\[ \det(A) = \det(A_1) \det(A_2). \tag{3.2.32} \]

To prove (3.2.32), we use a few permissible row operations to reduce $A_1$ into an upper triangular form $U_1$ whose diagonal entries are $u_{11}, \dots, u_{kk}$ (say). Thus $\det(U_1) = u_{11} \cdots u_{kk}$. Likewise, we may also use a few permissible row operations to reduce $A_2$ into an upper triangular form $U_2$ whose diagonal entries are $u_{k+1,k+1}, \dots, u_{nn}$ (say). Thus $\det(U_2) = u_{k+1,k+1} \cdots u_{nn}$. Now apply the same sequences of permissible row operations on $A$. The boxed upper triangular form of $A$ allows us to reduce $A$ into the following upper triangular form

\[ U = \begin{pmatrix} U_1 & A_4 \\ 0 & U_2 \end{pmatrix}. \tag{3.2.33} \]

Since the diagonal entries of $U$ are $u_{11}, \dots, u_{kk}, u_{k+1,k+1}, \dots, u_{nn}$, we have

\[ \det(U) = u_{11} \cdots u_{kk} u_{k+1,k+1} \cdots u_{nn} = \det(U_1) \det(U_2). \tag{3.2.34} \]

Discounting the effects of the permissible row operations on $A$, $A_1$, and $A_2$, we see that (3.2.34) implies (3.2.32).


It is easy to see that if $A$ takes a boxed lower triangular form,

\[ A = \begin{pmatrix} A_1 & 0 \\ A_3 & A_2 \end{pmatrix}, \tag{3.2.35} \]

where $A_1 \in \mathbb{F}(k, k)$, $A_2 \in \mathbb{F}(l, l)$, $A_3 \in \mathbb{F}(l, k)$, and $k + l = n$, then (3.2.32) still holds. Indeed, taking transpose in (3.2.35), we have

\[ A^t = \begin{pmatrix} A_1^t & A_3^t \\ 0 & A_2^t \end{pmatrix}, \tag{3.2.36} \]

which becomes a boxed upper triangular matrix studied earlier. Thus, using Theorem 3.10 and (3.2.32), we obtain

\[ \det(A) = \det(A^t) = \det(A_1^t) \det(A_2^t) = \det(A_1) \det(A_2), \tag{3.2.37} \]

as anticipated.

Exercises

3.2.1 For $A \in \mathbb{C}(n, n)$ use Definition 3.3 to establish the property

\[ \det(\overline{A}) = \overline{\det(A)}, \tag{3.2.38} \]

where $\overline{A}$ is the matrix obtained from $A$ by taking complex conjugate for all entries of $A$.

3.2.2 Let $E \in \mathbb{F}(n, n)$ be such that each row and each column of $E$ has exactly one nonzero entry, which may be either $1$ or $-1$. Show that $\det(E) = \pm 1$.

3.2.3 In $\mathbb{F}(n, n)$, anti-upper triangular and anti-lower triangular matrices are of the forms

\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & \iddots & 0 \\ \vdots & \iddots & \iddots & \vdots \\ a_{n1} & 0 & \cdots & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & \cdots & 0 & a_{1n} \\ \vdots & \iddots & \iddots & \vdots \\ 0 & \iddots & \iddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \tag{3.2.39} \]

respectively. Establish the formulas to express the determinants of these matrices in terms of the anti-diagonal entries $a_{n1}, \dots, a_{1n}$.

3.2.4 Show that

\[ \det \begin{pmatrix} x & 1 & 1 & 1 \\ 1 & y & 0 & 0 \\ 1 & 0 & z & 0 \\ 1 & 0 & 0 & t \end{pmatrix} = txyz - yz - tz - ty, \quad x, y, z, t \in \mathbb{R}. \tag{3.2.40} \]

3.2.5 Let $A, B \in \mathbb{F}(3, 3)$ and assume that the first and second columns of $A$ are the same as the first and second columns of $B$. If $\det(A) = 5$ and $\det(B) = 2$, find $\det(3A - 2B)$ and $\det(3A + 2B)$.

3.2.6 (Extension of Exercise 3.2.5) Let $A, B \in \mathbb{F}(n, n)$ be such that only their $j$th columns are possibly different. Establish the formula

\[ \det(aA + bB) = (a + b)^{n-1} (a \det(A) + b \det(B)), \quad a, b \in \mathbb{F}. \tag{3.2.41} \]

3.2.7 Let $A(t) = (a_{ij}(t)) \in \mathbb{R}(n, n)$ be such that each entry $a_{ij}(t)$ is a differentiable function of $t \in \mathbb{R}$. Establish the differentiation formula

\[ \frac{d}{dt} \det(A(t)) = \sum_{i,j=1}^{n} \frac{d a_{ij}(t)}{dt} C_{ij}(t), \tag{3.2.42} \]

where $C_{ij}(t)$ is the cofactor of the entry $a_{ij}(t)$, $i, j = 1, \dots, n$, in the matrix $A(t)$.

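3.2.8 Prove the formula

\[ \det \begin{pmatrix} x & a_1 & a_2 & \cdots & a_n \\ a_1 & x & a_2 & \cdots & a_n \\ a_1 & a_2 & x & \cdots & a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & a_3 & \cdots & x \end{pmatrix} = \left( x + \sum_{i=1}^{n} a_i \right) \prod_{i=1}^{n} (x - a_i). \tag{3.2.43} \]

3.2.9 Let $p_1(t), \dots, p_{n+2}(t)$ be $n + 2$ polynomials of degrees up to $n \in \mathbb{N}$ and with coefficients in $\mathbb{C}$. Show that for any $n + 2$ numbers $c_1, \dots, c_{n+2}$ in $\mathbb{C}$, there holds

\[ \det \begin{pmatrix} p_1(c_1) & p_1(c_2) & \cdots & p_1(c_{n+2}) \\ p_2(c_1) & p_2(c_2) & \cdots & p_2(c_{n+2}) \\ \vdots & \vdots & \ddots & \vdots \\ p_{n+2}(c_1) & p_{n+2}(c_2) & \cdots & p_{n+2}(c_{n+2}) \end{pmatrix} = 0. \tag{3.2.44} \]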


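3.2.10 (Determinant representation of a polynomial) Establish the formula

\[ \det \begin{pmatrix} x & -1 & 0 & \cdots & 0 & 0 \\ 0 & x & -1 & \cdots & 0 & 0 \\ 0 & 0 & x & \ddots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & x & -1 \\ a_0 & a_1 & a_2 & \cdots & a_{n-1} & a_n \end{pmatrix} = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0. \tag{3.2.45} \]

3.2.11 For $n \geq 2$ establish the formula

\[ \det \begin{pmatrix} a_1 - \lambda & a_2 & \cdots & a_n \\ a_1 & a_2 - \lambda & \cdots & a_n \\ \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & \cdots & a_n - \lambda \end{pmatrix} = (-1)^n \lambda^{n-1} \left( \lambda - \sum_{i=1}^{n} a_i \right). \tag{3.2.46} \]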

3.2.12 Let $A \in \mathbb{F}(n, n)$ be such that $AA^t = I_n$ and $\det(A) < 0$. Show that $\det(A + I_n) = 0$.

3.2.13 Let $A \in \mathbb{F}(n, n)$ be such that the entries of $A$ are either $1$ or $-1$. Show that $\det(A)$ is an even integer when $n \geq 2$.

3.2.14 Let $A = (a_{ij}) \in \mathbb{R}(n, n)$ satisfy the following diagonally dominant condition:

\[ |a_{ii}| > \sum_{j \neq i} |a_{ij}|, \quad i = 1, \dots, n. \tag{3.2.47} \]

Show that $\det(A) \neq 0$. (This result is also known as the Levy–Desplanques theorem.)

3.2.15 (A refined version of the previous exercise) Let $A = (a_{ij}) \in \mathbb{R}(n, n)$ satisfy the following positive diagonally dominant condition:

\[ a_{ii} > \sum_{j \neq i} |a_{ij}|, \quad i = 1, \dots, n. \tag{3.2.48} \]

Show that $\det(A) > 0$. (This result is also known as the Minkowski theorem.)

3.2.16 If $A \in \mathbb{F}(n, n)$ is skewsymmetric and $n$ is odd, then $A$ must be singular. What happens when $n$ is even?

3.2.17 Let $\alpha, \beta \in \mathbb{F}(1, n)$. Establish the formula

\[ \det(I_n - \alpha^t \beta) = 1 - \alpha \beta^t. \tag{3.2.49} \]

3.2.18 Compute the determinant

\[ f(x_1, \dots, x_n) = \det \begin{pmatrix} 100 & x_1 & x_2 & \cdots & x_n \\ x_1 & 1 & 0 & \cdots & 0 \\ x_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & 0 & 0 & \cdots & 1 \end{pmatrix} \tag{3.2.50} \]

and describe what the equation $f(x_1, \dots, x_n) = 0$ represents geometrically.

3.3 Adjugate matrices and Cramer’s rule

Let $A = (a_{ij})$ be an $n \times n$ ($n \geq 2$) matrix and let $C = (C_{ij})$ be the associated cofactor matrix. For $k, l = 1, \dots, n$, we may apply Theorem 3.12 to obtain the relations

\[ \sum_{i=1}^{n} a_{ik} C_{il} = 0, \quad k \neq l, \tag{3.3.1} \]

\[ \sum_{j=1}^{n} a_{kj} C_{lj} = 0, \quad k \neq l. \tag{3.3.2} \]

In fact, it is easy to see that the left-hand side of (3.3.1) is the cofactor expansion, along the $l$th column, of the determinant of the matrix obtained from $A$ by replacing the $l$th column with the $k$th column of $A$. This matrix has two identical columns, so its determinant must be zero. Likewise, the left-hand side of (3.3.2) is the cofactor expansion, along the $l$th row, of the determinant of the matrix obtained from $A$ by replacing the $l$th row with the $k$th row of $A$, whose value must also be zero.

We can summarize the properties stated in (3.2.27), (3.3.1), and (3.3.2) by the expressions

\[ C^t A = \det(A) I_n, \quad A C^t = \det(A) I_n. \tag{3.3.3} \]

These results motivate the following definition.

Definition 3.13 Let $A$ be an $n \times n$ matrix and $C$ its cofactor matrix. The adjugate matrix of $A$, denoted by $\operatorname{adj}(A)$, is the transpose of the cofactor matrix of $A$:

\[ \operatorname{adj}(A) = C^t. \tag{3.3.4} \]

Adjugate matrices are sometimes also called adjoint or adjunct matrices.

As a consequence of this definition and (3.3.3) we have

\[ \operatorname{adj}(A) A = A \operatorname{adj}(A) = \det(A) I_n, \tag{3.3.5} \]

which leads immediately to the following conclusion.

Theorem 3.14 Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if and only if $\det(A) \neq 0$. Furthermore, if $A$ is invertible, then $A^{-1}$ may be expressed as

\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A). \tag{3.3.6} \]

Proof If $\det(A) \neq 0$, from (3.3.5) we arrive at (3.3.6). Conversely, if $A^{-1}$ exists, then from $AA^{-1} = I_n$ and Theorem 3.11 we have $\det(A) \det(A^{-1}) = 1$. Thus $\det(A) \neq 0$.
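The identities (3.3.5) and (3.3.6) can be verified on a concrete matrix. The following sketch assumes $n \geq 2$, stores matrices as lists of rows, and uses exact rational arithmetic; the function names `adjugate` and `inverse` are illustrative and not taken from the text.

```python
from fractions import Fraction

def submatrix(a, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """First-column cofactor expansion, as in Definition 3.3."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** i * a[i][0] * det(submatrix(a, i, 0)) for i in range(len(a)))

def adjugate(a):
    """Transpose of the cofactor matrix: entry (i, j) is the cofactor C_ji."""
    n = len(a)
    return [[(-1) ** (i + j) * det(submatrix(a, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(a):
    """Inverse via (3.3.6); raises if det(a) = 0."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(x) / d for x in row] for row in adjugate(a)]

a = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det(a))                   # -3
print(matmul(a, adjugate(a)))   # det(a) times the identity matrix
print(matmul(a, inverse(a)))    # the identity matrix, with Fraction entries
```

Formula (3.3.6) is mainly of theoretical value; for larger matrices, row reduction is considerably cheaper than computing $n^2$ cofactors.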

As an important application, we consider the unique solution of the system

\[ Ax = b \tag{3.3.7} \]

when $A = (a_{ij})$ is an invertible $n \times n$ matrix, $x = (x_1, \dots, x_n)^t$ the vector of unknowns, and $b = (b_1, \dots, b_n)^t$ a given non-homogeneous right-hand-side vector.

In such a situation we can use (3.3.6) to get

\[ x = \frac{1}{\det(A)} \operatorname{adj}(A) b. \tag{3.3.8} \]

Therefore, with $\operatorname{adj}(A) = (A_{ij}) = C^t$ (where $C = (C_{ij})$ is the cofactor matrix of $A$) we may read off to obtain the result

\[ x_i = \frac{1}{\det(A)} \sum_{j=1}^{n} A_{ij} b_j = \frac{1}{\det(A)} \sum_{j=1}^{n} b_j C_{ji} = \frac{\det(A_i)}{\det(A)}, \quad i = 1, \dots, n, \tag{3.3.9} \]

where $A_i$ is the matrix obtained from $A$ after replacing the $i$th column of $A$ by the vector $b$, $i = 1, \dots, n$.

The formulas stated in (3.3.9) are called Cramer's formulas. Such a solution method is also called Cramer's rule.
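Cramer's formulas (3.3.9) can be sketched in a few lines. The example below is a hypothetical illustration: the function name `cramer_solve`, the list-of-rows representation, and the sample systems are assumptions made here, with the determinant again computed by first-column cofactor expansion.

```python
from fractions import Fraction

def det(a):
    """First-column cofactor expansion, as in Definition 3.3."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for i, row in enumerate(a):
        minor = [r[1:] for k, r in enumerate(a) if k != i]
        total += (-1) ** i * row[0] * det(minor)
    return total

def cramer_solve(a, b):
    """Solve a x = b by (3.3.9): x_i = det(A_i) / det(A)."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule requires det(A) != 0")
    n = len(a)
    solution = []
    for i in range(n):
        # A_i: replace column i of a by the right-hand side b.
        a_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i)) / d)
    return solution

# 2 x1 + x2 = 5, x1 + 3 x2 = 10 has the solution x1 = 1, x2 = 3.
print(cramer_solve([[2, 1], [1, 3]], [5, 10]))
```

The method requires $n + 1$ determinant evaluations, so it is best suited to small systems or to deriving closed-form expressions for the solution.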

Let $A \in \mathbb{F}(m, n)$ be of rank $k$. Then there are $k$ row vectors of $A$ which are linearly independent. Use $B$ to denote the submatrix of $A$ consisting of those $k$ row vectors. Since $B$ is of rank $k$ we know that there are $k$ column vectors of $B$ which are linearly independent. Use $C$ to denote the submatrix of $B$ consisting of those $k$ column vectors. Then $C$ is a submatrix of $A$ which lies in $\mathbb{F}(k, k)$ and is of rank $k$. In particular $\det(C) \neq 0$. In other words, we have shown that if $A$ is of rank $k$ then $A$ has a $k \times k$ submatrix of nonzero determinant.

To end this section, we consider a practical problem as an application of determinants: the unique determination of a polynomial by interpolation.

Let $p(t)$ be a polynomial of degree $(n - 1) \geq 1$ over a field $\mathbb{F}$ given by

\[ p(t) = a_{n-1} t^{n-1} + \cdots + a_1 t + a_0, \tag{3.3.10} \]

and let $t_1, \dots, t_n$ be $n$ points in $\mathbb{F}$ so that $p(t_i) = p_i$ ($i = 1, \dots, n$). To ease the illustration to follow, we may assume that $n$ is sufficiently large (say $n \geq 5$). Therefore we have the simultaneous system of equations

\[ \begin{cases} a_0 + a_1 t_1 + \cdots + a_{n-2} t_1^{n-2} + a_{n-1} t_1^{n-1} = p_1, \\ a_0 + a_1 t_2 + \cdots + a_{n-2} t_2^{n-2} + a_{n-1} t_2^{n-1} = p_2, \\ \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \\ a_0 + a_1 t_n + \cdots + a_{n-2} t_n^{n-2} + a_{n-1} t_n^{n-1} = p_n, \end{cases} \tag{3.3.11} \]

in the $n$ unknowns $a_0, a_1, \dots, a_{n-2}, a_{n-1}$, whose coefficient matrix, say $A$, has the determinant

\[ \det(A) = \begin{vmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^{n-2} & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \cdots & t_2^{n-2} & t_2^{n-1} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-2} & t_n^{n-1} \end{vmatrix}. \tag{3.3.12} \]

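Adding the $(-t_1)$ multiple of the second last column to the last column, ..., the $(-t_1)$ multiple of the second column to the third column, and the $(-t_1)$ multiple of the first column to the second column, we get

\[ \begin{aligned} \det(A) &= \begin{vmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 1 & (t_2 - t_1) & t_2 (t_2 - t_1) & \cdots & t_2^{n-3} (t_2 - t_1) & t_2^{n-2} (t_2 - t_1) \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & (t_n - t_1) & t_n (t_n - t_1) & \cdots & t_n^{n-3} (t_n - t_1) & t_n^{n-2} (t_n - t_1) \end{vmatrix} \\ &= \prod_{i=2}^{n} (t_i - t_1) \begin{vmatrix} 1 & t_2 & t_2^2 & \cdots & t_2^{n-3} & t_2^{n-2} \\ 1 & t_3 & t_3^2 & \cdots & t_3^{n-3} & t_3^{n-2} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & t_n & t_n^2 & \cdots & t_n^{n-3} & t_n^{n-2} \end{vmatrix}. \end{aligned} \tag{3.3.13} \]

Here the second equality follows from a cofactor expansion along the first row and from factoring $(t_i - t_1)$ out of the row containing $t_i$, $i = 2, \dots, n$. Continuing the same expansion, we eventually get

\[ \begin{aligned} \det(A) &= \left( \prod_{i=2}^{n} (t_i - t_1) \right) \left( \prod_{j=3}^{n} (t_j - t_2) \right) \cdots \left( \prod_{k=n-1}^{n} (t_k - t_{n-2}) \right) (t_n - t_{n-1}) \\ &= \prod_{1 \leq i < j \leq n} (t_j - t_i). \end{aligned} \tag{3.3.14} \]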

Hence the matrix $A$ is invertible if and only if $t_1, t_2, \dots, t_n$ are distinct. Under such a condition the coefficients $a_0, a_1, \dots, a_{n-1}$ are uniquely determined.
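The product formula (3.3.14) can be compared with a direct evaluation of the determinant (3.3.12) for a few sample points. In the sketch below, the function names and the sample points are assumptions chosen for illustration; the determinant is computed by first-column cofactor expansion.

```python
def det(a):
    """First-column cofactor expansion, as in Definition 3.3."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for i, row in enumerate(a):
        minor = [r[1:] for k, r in enumerate(a) if k != i]
        total += (-1) ** i * row[0] * det(minor)
    return total

def vandermonde(ts):
    """Coefficient matrix of (3.3.11): row i is (1, t_i, t_i^2, ..., t_i^(n-1))."""
    n = len(ts)
    return [[t ** j for j in range(n)] for t in ts]

def product_formula(ts):
    """Right-hand side of (3.3.14): product of (t_j - t_i) over i < j."""
    result = 1
    for j in range(len(ts)):
        for i in range(j):
            result *= ts[j] - ts[i]
    return result

points = [1, 2, 4, 7]
print(det(vandermonde(points)), product_formula(points))  # 540 540
print(det(vandermonde([1, 2, 2, 7])))                     # 0, repeated point
```

The second call illustrates the singular case: when two of the points coincide, two rows of the matrix are identical and the determinant vanishes.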

The determinant (3.3.12) is called the Vandermonde determinant.

If $A, B \in \mathbb{F}(n, n)$ are similar, there is an invertible $C \in \mathbb{F}(n, n)$ such that $A = C^{-1} B C$, which gives us $\det(A) = \det(B)$. That is, similar matrices have the same determinant. In view of this fact and the fact that the matrix representations of a linear mapping from a finite-dimensional vector space into itself with respect to different bases are similar, we may define the determinant of a linear mapping, say $T$, denoted by $\det(T)$, to be the determinant of the matrix representation of $T$ with respect to any basis. In this situation, we see that $T$ is invertible if and only if $\det(T) \neq 0$.

Exercises

3.3.1 Show that $A \in \mathbb{F}(m, n)$ is of rank $k$ if and only if there is a $k \times k$ submatrix of $A$ whose determinant is nonzero and the determinant of any square submatrix of a larger size (if any) is zero.

3.3.2 Let $A \in \mathbb{R}(n, n)$ be an orthogonal matrix. Prove that $\operatorname{adj}(A) = A^t$ or $-A^t$ depending on the sign of $\det(A)$.

3.3.3 For $A \in \mathbb{F}(n, n)$ ($n \geq 2$) establish the formula

\[ \det(\operatorname{adj}(A)) = (\det(A))^{n-1}. \tag{3.3.15} \]

In particular this implies that $A$ is nonsingular if and only if $\operatorname{adj}(A)$ is so.

3.3.4 For $A \in \mathbb{F}(n, n)$ ($n \geq 2$) prove the rank relations

\[ r(\operatorname{adj}(A)) = \begin{cases} n, & r(A) = n, \\ 1, & r(A) = n - 1, \\ 0, & r(A) \leq n - 2. \end{cases} \tag{3.3.16} \]

3.3.5 Let $A \in \mathbb{F}(n, n)$ be invertible. Show that $\operatorname{adj}(A^{-1}) = (\operatorname{adj}(A))^{-1}$.

3.3.6 For $A \in \mathbb{F}(n, n)$ where $n \geq 3$ show that $\operatorname{adj}(\operatorname{adj}(A)) = (\det(A))^{n-2} A$. What happens when $n = 2$?

3.3.7 For $A \in \mathbb{F}(n, n)$ where $n \geq 2$ show that $\operatorname{adj}(A^t) = (\operatorname{adj}(A))^t$.

3.3.8 Let $A \in \mathbb{R}(n, n)$ satisfy $\operatorname{adj}(A) = A^t$. Prove that $A = 0$ if and only if $\det(A) = 0$.

3.3.9 For $A, B \in \mathbb{R}(n, n)$ show that $\operatorname{adj}(AB) = \operatorname{adj}(B) \operatorname{adj}(A)$. Thus, if $A$ is idempotent, so is $\operatorname{adj}(A)$, and if $A$ is nilpotent, so is $\operatorname{adj}(A)$.

3.3.10 For $A = (a_{ij}) \in \mathbb{F}(n, n)$ consider the linear system

\[ \begin{cases} a_{11} x_1 + \cdots + a_{1n} x_n = 0, \\ \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \\ a_{n1} x_1 + \cdots + a_{nn} x_n = 0. \end{cases} \tag{3.3.17} \]

Show that if $A$ is singular but $\operatorname{adj}(A) \neq 0$ then the space of solutions of (3.3.17) is spanned by any nonzero column vector of $\operatorname{adj}(A)$.

3.3.11 Use Cramer's rule and the Vandermonde determinant to find the quadratic polynomial $p(t)$ with coefficients in $\mathbb{C}$ whose values at $t = 1, 1 + i, -3$ are $-2, 0, 5$, respectively.

3.3.12 Let $p(t)$ be a polynomial of degree $n - 1$ given in (3.3.10). Use Cramer's rule and the Vandermonde determinant to prove that $p(t)$ cannot have $n$ distinct zeros unless it is the zero polynomial.

3.3.13 Let $A \in \mathbb{F}(m, n)$, $B \in \mathbb{F}(n, m)$, and $m > n$. Show that $\det(AB) = 0$.

3.3.14 For $U = \mathbb{F}(n, n)$ define $T \in L(U)$ by $T(A) = AB - BA$ for $A \in U$ where $B \in U$ is fixed. Show that for such a linear mapping $T$ we have $\det(T) = 0$ no matter how $B$ is chosen.


3.4 Characteristic polynomials and Cayley–Hamilton theorem

We first consider the concrete case of matrices.Let A = (aij ) be an n× n matrix over a field F. We first consider the linear

mapping TA : Fn → Fn induced from A defined by

TA(x) = Ax, x =

⎛⎜⎜⎝x1

...

xn

⎞⎟⎟⎠ ∈ Fn. (3.4.1)

Recall that an eigenvalue of TA is a scalar λ ∈ F such that the null-space

Eλ = N(TA − λI) = {x ∈ Fn | TA(x) = λx}, (3.4.2)

where I ∈ L(Fn) is the identity mapping, is nontrivial and Eλ is called aneigenspace of TA, whose nonzero vectors are called eigenvectors. The eigen-values, eigenspaces, and eigenvectors of TA are also often simply referred toas those of the matrix A. The purpose of this section is to show how to usedeterminant as a tool to find the eigenvalues of A.

If λ is an eigenvalue of A and x an eigenvector, then Ax = λx. In otherwords, x is a nonzero solution of the equation (λIn − A)x = 0. Hence thematrix λIn − A is singular. Therefore λ satisfies

det(λIn − A) = 0. (3.4.3)

Of course the converse is true as well: if λ satisfies (3.4.3) then (λI_n − A)x = 0 has a nontrivial solution, which indicates that λ is an eigenvalue of A. Consequently the eigenvalues of A are the roots of the function

$$p_A(\lambda) = \det(\lambda I_n - A), \tag{3.4.4}$$

which is seen to be a polynomial of degree n following the cofactor expansion formula (3.2.2) in Definition 3.3. The polynomial p_A(λ) defined in (3.4.4) is called the characteristic polynomial associated with the matrix A, whose roots are called the characteristic roots of A. So the eigenvalues of A are the characteristic roots of A. In particular, A can have at most n distinct eigenvalues.
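The criterion (3.4.3) can be tried out on a small matrix. The following is a minimal sketch, not part of the text: the helper names `det` and `char_poly_value` are hypothetical, the determinant is computed by cofactor expansion along the first column (one of several equivalent choices), and the sample matrix is an arbitrary illustration.

```python
def det(M):
    """Determinant by cofactor expansion along the first column."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for i in range(n):
        # Minor: delete row i and the first column.
        minor = [row[1:] for k, row in enumerate(M) if k != i]
        total += (-1) ** i * M[i][0] * det(minor)
    return total


def char_poly_value(A, lam):
    """Evaluate p_A(lam) = det(lam * I_n - A) at a given scalar lam."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return det(M)


A = [[2, 1], [1, 2]]  # illustrative matrix (an assumption, not from the text)

# Scan a few integer candidates and keep those where p_A vanishes.
roots = [lam for lam in range(-5, 6) if char_poly_value(A, lam) == 0]
print(roots)

# x = (1, 1)^t is an eigenvector for the root 3: A x = 3 x.
x = [1, 1]
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
print(Ax)
```

Scanning integers only works here because the sample matrix was chosen to have integer eigenvalues; in general the roots must be found by other means.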

Theorem 3.15 Let A = (a_ij) be an n × n (n ≥ 2) matrix and p_A(λ) its characteristic polynomial. Then p_A(λ) is of the form

$$p_A(\lambda) = \lambda^n - \mathrm{Tr}(A)\lambda^{n-1} + \cdots + (-1)^n \det(A). \tag{3.4.5}$$


Proof Write p_A(λ) = a_nλ^n + a_{n−1}λ^{n−1} + · · · + a_0. Then

$$a_0 = p_A(0) = \det(-A) = (-1)^n \det(A) \tag{3.4.6}$$

as asserted. Besides, using Definition 3.3 and induction, we see that the two leading-degree terms in p_A(λ) containing λ^n and λ^{n−1} can only appear in the product of the entry λ − a_11 and the cofactor C_11^{λI_n−A} of the entry λ − a_11 in the matrix λI_n − A. Let A_{n−1} be the submatrix of A obtained by deleting the row and column vectors of A occupied by the entry a_11. Then C_11^{λI_n−A} = det(λI_{n−1} − A_{n−1}), whose two leading-degree terms containing λ^{n−1} and λ^{n−2} can only appear in the product of λ − a_22 and its cofactor in the matrix λI_{n−1} − A_{n−1}. Carrying out this process to the end we see that the two leading-degree terms in p_A(λ) can appear only in the product

$$(\lambda - a_{11}) \cdots (\lambda - a_{nn}), \tag{3.4.7}$$

which may be read off to give us the results

$$\lambda^n, \quad -(a_{11} + \cdots + a_{nn})\lambda^{n-1}. \tag{3.4.8}$$

This completes the proof.
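Theorem 3.15 can be checked on a concrete matrix by expanding det(λI_n − A) symbolically. The sketch below is illustrative only: polynomials are stored as coefficient lists [a_0, a_1, ...], the helper names `padd`, `pmul`, `pdet`, and `char_poly` are hypothetical, and the 3 × 3 matrix is an arbitrary choice.

```python
def padd(p, q):
    """Add two polynomials given as coefficient lists [c_0, c_1, ...]."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def pmul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def pdet(M):
    """Cofactor expansion along the first column; entries are polynomials."""
    if len(M) == 1:
        return M[0][0]
    total = [0]
    for i in range(len(M)):
        minor = [row[1:] for k, row in enumerate(M) if k != i]
        term = pmul(M[i][0], pdet(minor))
        if i % 2:  # sign (-1)^i of the cofactor
            term = [-c for c in term]
        total = padd(total, term)
    return total


def char_poly(A):
    """Coefficients [a_0, ..., a_n] of p_A(lam) = det(lam * I_n - A)."""
    n = len(A)
    M = [[[-A[i][j], 1] if i == j else [-A[i][j]] for j in range(n)] for i in range(n)]
    return pdet(M)


A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]  # illustrative matrix (an assumption)
coeffs = char_poly(A)
trace = sum(A[i][i] for i in range(3))
print(coeffs, trace)
```

For this matrix the leading coefficient is 1, the coefficient of λ^2 is −Tr(A), and the constant term is (−1)^3 det(A), in agreement with (3.4.5).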

If A ∈ C(n, n) we may also establish Theorem 3.15 by means of calculus. In fact, we have

$$\frac{p_A(\lambda)}{\lambda^n} = \det\left(I_n - \frac{1}{\lambda}A\right). \tag{3.4.9}$$

Thus, letting λ → ∞ in (3.4.9), we obtain a_n = det(I_n) = 1. Moreover, (3.4.9) also gives us the expression

$$a_n + a_{n-1}\frac{1}{\lambda} + \cdots + a_0\frac{1}{\lambda^n} = \det\left(I_n - \frac{1}{\lambda}A\right). \tag{3.4.10}$$

Thus, replacing 1/λ by t, we get

$$q(t) \equiv a_n + a_{n-1}t + \cdots + a_0t^n = \begin{vmatrix} 1 - ta_{11} & -ta_{12} & \cdots & -ta_{1n} \\ -ta_{21} & 1 - ta_{22} & \cdots & -ta_{2n} \\ \cdots & \cdots & \cdots & \cdots \\ -ta_{n1} & -ta_{n2} & \cdots & 1 - ta_{nn} \end{vmatrix}. \tag{3.4.11}$$

Therefore

$$a_{n-1} = q'(0) = -\sum_{i=1}^n a_{ii} \tag{3.4.12}$$


as before.

We now consider the abstract case.

Let U be an n-dimensional vector space over a field F and T ∈ L(U). Assume that u ∈ U is an eigenvector of T associated to the eigenvalue λ ∈ F. Given a basis 𝒰 = {u_1, . . . , u_n} we express u as

$$u = x_1u_1 + \cdots + x_nu_n, \quad x = (x_1, \ldots, x_n)^t \in F^n. \tag{3.4.13}$$

Let A = (a_ij) ∈ F(n, n) be the matrix representation of T with respect to the basis 𝒰 so that

$$T(u_j) = \sum_{i=1}^n a_{ij}u_i, \quad j = 1, \ldots, n. \tag{3.4.14}$$

Then the relation T(u) = λu leads to

$$\sum_{j=1}^n a_{ij}x_j = \lambda x_i, \quad i = 1, \ldots, n, \tag{3.4.15}$$

which is exactly the matrix equation Ax = λx already discussed. Hence λ may be obtained from solving for the roots of the characteristic equation (3.4.3).

Let 𝒱 = {v_1, . . . , v_n} be another basis of U and B = (b_ij) ∈ F(n, n) be the matrix representation of T with respect to the basis 𝒱 so that

$$T(v_j) = \sum_{i=1}^n b_{ij}v_i, \quad j = 1, \ldots, n. \tag{3.4.16}$$

Since 𝒰 and 𝒱 are two bases of U, there is an invertible matrix C = (c_ij) ∈ F(n, n) called the basis transition matrix such that

$$v_j = \sum_{i=1}^n c_{ij}u_i, \quad j = 1, \ldots, n. \tag{3.4.17}$$

Following the study of the previous chapter we know that A, B, C are related through the similarity relation A = C^{−1}BC. Hence we have

$$\begin{aligned} p_A(\lambda) &= \det(\lambda I_n - A) = \det(\lambda I_n - C^{-1}BC) \\ &= \det(C^{-1}[\lambda I_n - B]C) \\ &= \det(\lambda I_n - B) = p_B(\lambda). \end{aligned} \tag{3.4.18}$$

That is, two similar matrices have the same characteristic polynomial. Thus we may use p_A(λ) to define the characteristic polynomial of the linear mapping T ∈ L(U), rewritten as p_T(λ), where A is the matrix representation of T with respect to any given basis of U, since such a polynomial is independent of the choice of the basis.
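The invariance stated in (3.4.18) can be illustrated in the 2 × 2 case, where p_M(λ) = λ^2 − Tr(M)λ + det(M) by Theorem 3.15. This is a small sketch under that assumption; the helper names `matmul`, `inv2`, and `char_poly2` and the matrices B and C are hypothetical choices, with exact rational arithmetic used to avoid rounding.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inv2(C):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = C
    D = Fraction(a * d - b * c)
    return [[d / D, -b / D], [-c / D, a / D]]


def char_poly2(M):
    """Return (a_1, a_0) with p_M(lam) = lam^2 + a_1*lam + a_0 (2 x 2 case only)."""
    (a, b), (c, d) = M
    return (-(a + d), a * d - b * c)


B = [[1, 2], [3, 4]]   # illustrative matrix
C = [[2, 1], [1, 1]]   # illustrative invertible matrix
A = matmul(matmul(inv2(C), B), C)  # A = C^{-1} B C

print(char_poly2(A), char_poly2(B))
```

The matrices A and B differ entry by entry, yet the two coefficient pairs coincide.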


The following theorem, known as the Cayley–Hamilton theorem, is of fundamental importance in linear algebra.

Theorem 3.16 Let A ∈ C(n, n) and p_A(λ) be its characteristic polynomial. Then

$$p_A(A) = 0. \tag{3.4.19}$$

Proof We split the proof into two situations.

First we assume that A has n distinct eigenvalues, say λ_1, . . . , λ_n. We can write p_A(λ) as

$$p_A(\lambda) = \prod_{i=1}^n (\lambda - \lambda_i). \tag{3.4.20}$$

Let u_1, . . . , u_n ∈ C^n be the eigenvectors associated to the eigenvalues λ_1, . . . , λ_n respectively, which form a basis of C^n. Then

$$p_A(A)u_i = \left(\prod_{j=1}^n (A - \lambda_jI_n)\right)u_i = \left(\prod_{j \neq i} (A - \lambda_jI_n)\right)(A - \lambda_iI_n)u_i = 0, \tag{3.4.21}$$

for any i = 1, . . . , n. This proves p_A(A) = 0.

Next we consider the general situation when A may have multiple eigenvalues. We begin by showing that A may be approximated by matrices of n distinct eigenvalues. We show this by induction on n. When n = 1, there is nothing to do. Assume the statement is true at n − 1 ≥ 1. We now proceed with n ≥ 2.

In view of Theorem 3.2 there exists an eigenvalue of A, say λ_1. Let u_1 be an associated eigenvector. Extend {u_1} to get a basis for C^n, say {u_1, u_2, . . . , u_n}. Then the linear mapping defined by (3.4.1) satisfies

$$T_A(u_j) = Au_j = \sum_{i=1}^n b_{ij}u_i, \quad j = 1, \ldots, n; \qquad b_{11} = \lambda_1, \quad b_{i1} = 0, \quad i = 2, \ldots, n. \tag{3.4.22}$$

That is, the matrix B = (b_ij) of T_A with respect to the basis {u_1, . . . , u_n} is of the form

$$B = \begin{pmatrix} \lambda_1 & b_0 \\ 0 & B_0 \end{pmatrix}, \tag{3.4.23}$$

where B_0 ∈ C(n − 1, n − 1) and b_0 ∈ C^{n−1} is a row vector. By the inductive assumption, for any ε > 0, there is some C_0 ∈ C(n − 1, n − 1) such that ‖B_0 − C_0‖ < ε and C_0 has n − 1 distinct eigenvalues, say λ_2, . . . , λ_n, where we use ‖ · ‖ to denote any norm of the space of square matrices whenever there is no risk of confusion. Now set

$$C = \begin{pmatrix} \lambda_1 + \delta & b_0 \\ 0 & C_0 \end{pmatrix}. \tag{3.4.24}$$

It is clear that the eigenvalues of C are λ_1 + δ, λ_2, . . . , λ_n, which are distinct when δ > 0 is small enough. Of course there is a constant K > 0 depending on n only such that ‖B − C‖ < Kε when δ > 0 is sufficiently small.

On the other hand, if we use the notation u_j = (u_1j, . . . , u_nj)^t, j = 1, . . . , n, then we have

$$u_j = \sum_{i=1}^n u_{ij}e_i, \quad j = 1, \ldots, n. \tag{3.4.25}$$

Thus, with U = (u_ij), we obtain the relation A = U^{−1}BU. Therefore

$$\|A - U^{-1}CU\| = \|U^{-1}(B - C)U\| \le \|U^{-1}\|\|U\|K\varepsilon. \tag{3.4.26}$$

Of course U^{−1}CU has the same n distinct eigenvalues as the matrix C. Hence the asserted approximation property in the situation of n × n matrices is established.

Finally, let {A^(l)} be a sequence in C(n, n) such that A^(l) → A as l → ∞ and each A^(l) has n distinct eigenvalues (l = 1, 2, . . . ). We have already shown that p_{A^(l)}(A^(l)) = 0 (l = 1, 2, . . . ). Since the coefficients of the characteristic polynomial are polynomials in the entries of the matrix, they depend continuously on the matrix. Consequently,

$$p_A(A) = \lim_{l \to \infty} p_{A^{(l)}}(A^{(l)}) = 0. \tag{3.4.27}$$

The proof of the theorem is now complete.

Of course Theorem 3.16 is valid for A ∈ F(n, n) whenever F is a subfield of C although A may not have an eigenvalue in F. Important examples include F = Q and F = R.

We have seen that the proof of Theorem 3.16 relies on the assumption of the existence of an eigenvalue which in general is only valid when the underlying field is C. In fact, however, the theorem holds universally over any field. Below we give a proof of it without assuming F = C, which is of independent interest and importance.


To proceed, we need to recall and re-examine the definition of polynomials in the most general terms. Given a field F, a polynomial p over F is an expression of the form

$$p(t) = a_0 + a_1t + \cdots + a_nt^n, \quad a_0, a_1, \ldots, a_n \in F, \tag{3.4.28}$$

where a_0, a_1, . . . , a_n are called the coefficients of p and the variable t is a formal symbol that ‘generates’ the formal symbols t^0 = 1, t^1 = t, . . . , t^n = (t)^n (called the powers of t) which ‘participate’ in all the algebraic operations following the usual associative, commutative, and distributive laws as if t were an F-valued parameter or variable. For example, (at^i)(bt^j) = abt^{i+j} for any a, b ∈ F and i, j ∈ N. The set of all polynomials with coefficients in F and in the variable t is denoted by P, which is a vector space over F, with addition and scalar multiplication defined in the usual ways, whose zero element is the polynomial with all coefficients equal to zero. In other words, two polynomials in P are considered equal if and only if the coefficients of the same powers of t in the two polynomials all agree.

In the rest of this section we shall assume that the variable of any polynomial we consider is such a formal symbol. It is clear that this assumption does not affect the computation we perform.

Let A ∈ F(n, n) and λ be a (formal) variable just introduced. Then we have

$$(\lambda I_n - A)\,\mathrm{adj}(\lambda I_n - A) = \det(\lambda I_n - A)I_n, \tag{3.4.29}$$

whose right-hand side is simply

$$\det(\lambda I_n - A)I_n = (\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0)I_n. \tag{3.4.30}$$

On the other hand, we may expand the left-hand side of (3.4.29) into the form

$$\begin{aligned} (\lambda I_n - A)\,\mathrm{adj}(\lambda I_n - A) &= (\lambda I_n - A)(A_{n-1}\lambda^{n-1} + \cdots + A_1\lambda + A_0) \\ &= A_{n-1}\lambda^n + (A_{n-2} - AA_{n-1})\lambda^{n-1} + \cdots + (A_0 - AA_1)\lambda - AA_0, \end{aligned} \tag{3.4.31}$$

where A_0, A_1, . . . , A_{n−1} ∈ F(n, n). Comparing the like powers of λ in (3.4.30) and (3.4.31) we get the relations

$$A_{n-1} = I_n, \quad A_{n-2} - AA_{n-1} = a_{n-1}I_n, \quad \ldots, \quad A_0 - AA_1 = a_1I_n, \quad -AA_0 = a_0I_n. \tag{3.4.32}$$

Multiplying from the left the first relation in (3.4.32) by A^n, the second by A^{n−1}, . . . , and the second last by A, and then summing up the results, we obtain

$$0 = A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I_n = p_A(A), \tag{3.4.33}$$

as anticipated.
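The relations (3.4.32) can be run as a recursion: A_{n−1} = I_n, then A_{n−k−1} = AA_{n−k} + a_{n−k}I_n, and the last relation says that AA_0 + a_0I_n must vanish. The sketch below is an illustration rather than the method of the text. To make the recursion self-starting it obtains each coefficient from the trace formula a_{n−k} = −Tr(AA_{n−k})/k (the Faddeev–LeVerrier scheme, which is an added assumption here and requires division by k, so it is used over the rationals). The function name `adjugate_recursion` and the sample matrix are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def adjugate_recursion(A):
    """Return (coeffs, residual).

    coeffs   = [a_n, a_{n-1}, ..., a_0] with a_n = 1,
    residual = A*A_0 + a_0*I_n, which equals p_A(A) and should be zero.
    """
    n = len(A)
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # A_{n-1} = I_n
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # a_{n-k} from the trace formula (an assumption, not from the text).
        a = -sum(AM[i][i] for i in range(n)) / k
        coeffs.append(a)
        # Next matrix in (3.4.32): A_{n-k-1} = A*A_{n-k} + a_{n-k}*I_n.
        M = [[AM[i][j] + (a if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs, M


A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]  # illustrative matrix (an assumption)
coeffs, residual = adjugate_recursion(A)
print(coeffs)
print(residual)
```

Unrolling the recursion shows that the final matrix is A^n + a_{n−1}A^{n−1} + · · · + a_0I_n, so a zero residual is exactly the statement (3.4.33) for this matrix.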


Let U be an n-dimensional vector space over a field F and T ∈ L(U). Given a basis 𝒰 = {u_1, . . . , u_n} let A ∈ F(n, n) be the matrix that represents T with respect to 𝒰. Then we know that A^i represents T^i with respect to 𝒰 for any i = 1, 2, . . . . As a consequence, for any polynomial p(t) with coefficients in F, the matrix p(A) represents p(T) with respect to 𝒰. Therefore the Cayley–Hamilton theorem may be restated in terms of linear mappings as follows.

Theorem 3.17 Let U be a finite-dimensional vector space over an arbitrary field. Any T ∈ L(U) is trivialized or annihilated by its characteristic polynomial p_T(λ). That is,

$$p_T(T) = 0. \tag{3.4.34}$$

For A ∈ F(n, n) (n ≥ 2) let p_A(λ) = λ^n + a_{n−1}λ^{n−1} + · · · + a_1λ + a_0 be the characteristic polynomial of A. Theorem 3.15 gives us a_0 = (−1)^n det(A). Inserting this result into the equation p_A(A) = 0 we have

$$A(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I_n) = (-1)^{n+1}\det(A)I_n, \tag{3.4.35}$$

which leads us again to the conclusion that A is invertible whenever det(A) ≠ 0. Furthermore, in this situation, the relation (3.4.35) implies

$$A^{-1} = \frac{(-1)^{n+1}}{\det(A)}(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I_n), \tag{3.4.36}$$

or alternatively,

$$\mathrm{adj}(A) = (-1)^{n+1}(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I_n), \quad \det(A) \neq 0. \tag{3.4.37}$$

We leave it as an exercise to show that the condition det(A) ≠ 0 above for (3.4.37) to hold is not necessary and can thus be dropped.
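Formula (3.4.36) turns the characteristic polynomial into an inversion recipe. The following sketch assumes the coefficients of p_A are already known (here they are supplied by hand for an illustrative 3 × 3 matrix with p_A(λ) = λ^3 − 2λ^2 − 27λ + 34); the function name `inverse_from_char_poly` is hypothetical, and the matrix polynomial is evaluated by Horner's scheme.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inverse_from_char_poly(A, coeffs):
    """Inverse of A via (3.4.36).

    coeffs = [a_0, a_1, ..., a_{n-1}] are the lower coefficients of the
    monic characteristic polynomial; a_0 must be nonzero.
    """
    n = len(A)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    # Horner evaluation of A^{n-1} + a_{n-1} A^{n-2} + ... + a_1 I_n.
    S = identity
    for a in reversed(coeffs[1:]):  # a_{n-1}, ..., a_1
        AS = matmul(A, S)
        S = [[AS[i][j] + (a if i == j else 0) for j in range(n)] for i in range(n)]
    det_A = (-1) ** n * Fraction(coeffs[0])  # a_0 = (-1)^n det(A)
    factor = (-1) ** (n + 1) / det_A
    return [[factor * S[i][j] for j in range(n)] for i in range(n)]


A = [[1, 2, 0], [3, -1, 4], [0, 5, 2]]             # illustrative matrix (an assumption)
A_inv = inverse_from_char_poly(A, [34, -27, -2])   # a_0, a_1, a_2 supplied by hand
product = matmul(A, A_inv)
print(product)
```

The product A·A_inv comes out as the identity matrix, which is a direct check of (3.4.35) for this example.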

Exercises

3.4.1 Let p_A(λ) be the characteristic polynomial of a matrix A = (a_ij) ∈ F(n, n), where n ≥ 2 and F = R or C. Show that in p_A(λ) = λ^n + · · · + a_1λ + a_0 we have

$$a_1 = (-1)^{n-1}\sum_{i=1}^n C_{ii}, \tag{3.4.38}$$

where C_ii is the cofactor of the entry a_ii of the matrix A (i = 1, . . . , n).

3.4.2 Consider the subset D of C(n, n) defined by

$$D = \{A \in C(n, n) \mid A \text{ has } n \text{ distinct eigenvalues}\}. \tag{3.4.39}$$

Prove that D is open in C(n, n).

3.4.3 Let A ∈ F(2, 2) be given by

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \tag{3.4.40}$$

Find the characteristic polynomial of A. Assume det(A) = ad − bc ≠ 0 and use (3.4.36) to derive the formula that gives A^{−1}.

3.4.4 Show that (3.4.37) is true for any A ∈ F(n, n). That is, the condition det(A) ≠ 0 in (3.4.37) may actually be removed.

3.4.5 Let U be an n-dimensional vector space over F and let T ∈ L(U) be nilpotent of degree n. What is the characteristic polynomial of T?

3.4.6 For A, B ∈ F(n, n), prove that the characteristic polynomials of AB and BA are identical. That is,

$$p_{AB}(\lambda) = p_{BA}(\lambda). \tag{3.4.41}$$

3.4.7 Let F be a field and α, β ∈ F(1, n). Find the characteristic polynomial of the matrix α^tβ ∈ F(n, n).

3.4.8 Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 2 & 0 \\ -2 & -1 & -1 \end{pmatrix}. \tag{3.4.42}$$

(i) Use the characteristic polynomial of A and the Cayley–Hamilton theorem to find A^{−1}.

(ii) Use the characteristic polynomial of A and the Cayley–Hamilton theorem to find A^10.

4

Scalar products

In this chapter we consider vector spaces over a field which is either R or C. We shall start from the most general situation of scalar products. We then consider the situations when scalar products are non-degenerate and positive definite, respectively.

4.1 Scalar products and basic properties

In this section, we use F to denote the field R or C.

Definition 4.1 Let U be a vector space over F. A scalar product over U is defined to be a bilinear symmetric function f : U × U → F, written simply as (u, v) ≡ f(u, v), u, v ∈ U. In other words the following properties hold.

(1) (Symmetry) (u, v) = (v, u) ∈ F for u, v ∈ U.
(2) (Additivity) (u + v, w) = (u, w) + (v, w) for u, v, w ∈ U.
(3) (Homogeneity) (au, v) = a(u, v) for a ∈ F and u, v ∈ U.

We say that u, v ∈ U are mutually perpendicular or orthogonal to each other, written as u ⊥ v, if (u, v) = 0. More generally for any non-empty subset S of U we use the notation

$$S^\perp = \{u \in U \mid (u, v) = 0 \text{ for any } v \in S\}. \tag{4.1.1}$$

For u ∈ U we say that u is a null vector if (u, u) = 0.

It is obvious that S^⊥ is a subspace of U for any non-empty subset S of U. Moreover {0}^⊥ = U. Furthermore it is easy to show that if the vectors u_1, . . . , u_k are mutually perpendicular and not null then they are linearly independent.


Let u, v ∈ U so that u is not null. Then we can resolve v into the sum of two mutually perpendicular vectors, one in Span{u}, say cu for some scalar c, and one in Span{u}^⊥, say w. In fact, rewrite v as v = w + cu and require (u, w) = 0. We obtain the unique solution c = (u, v)/(u, u). In summary, we have obtained the orthogonal decomposition

$$v = w + \frac{(u, v)}{(u, u)}u, \quad w = v - \frac{(u, v)}{(u, u)}u \in \mathrm{Span}\{u\}^\perp. \tag{4.1.2}$$
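The decomposition (4.1.2) works for any scalar product as long as u is not null. As a small sketch, assume the scalar product on R^2 is given by a symmetric matrix G through (u, v) = u^tGv; the helper names `sp` and `decompose`, the matrix G (an indefinite example), and the vectors are all illustrative assumptions.

```python
from fractions import Fraction


def sp(u, v, G):
    """Scalar product (u, v) = u^t G v for a symmetric matrix G."""
    n = len(G)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def decompose(u, v, G):
    """Return (c, w) with v = w + c*u and (u, w) = 0, as in (4.1.2)."""
    uu = sp(u, u, G)
    if uu == 0:
        raise ValueError("u is a null vector; (4.1.2) does not apply")
    c = Fraction(sp(u, v, G), uu)
    w = [vi - c * ui for vi, ui in zip(v, u)]
    return c, w


G = [[1, 0], [0, -1]]  # an indefinite scalar product on R^2 (illustrative)
u = [2, 1]             # (u, u) = 3, so u is not null
v = [1, 3]
c, w = decompose(u, v, G)
print(c, w, sp(u, w, G))
```

The guard against (u, u) = 0 matters: for an indefinite product a nonzero vector such as (1, 1)^t can be null, and then no such decomposition is defined by this formula.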

As the first application of the decomposition (4.1.2), we state the following.

Theorem 4.2 Let U be a finite-dimensional vector space equipped with a scalar product (·, ·) and set U_0 = U^⊥. Any basis of U_0 can be extended to become an orthogonal basis of U. In other words any two vectors in such a basis of U are mutually perpendicular.

Proof If U_0 = U there is nothing to show. Below we assume U_0 ≠ U.

If U_0 = {0} we may start from any basis of U. If U_0 ≠ {0} let {u_1, . . . , u_k} be a basis of U_0 and extend it to get a basis of U, say {u_1, . . . , u_k, v_1, . . . , v_l}. That is, U = U_0 ⊕ V, where

$$V = \mathrm{Span}\{v_1, \ldots, v_l\}. \tag{4.1.3}$$

If (v_1, v_1) = 0 then there is some v_i (i ≥ 2) such that (v_1, v_i) ≠ 0, otherwise v_1 ∈ U_0, which is false. Thus, without loss of generality, we may assume (v_1, v_2) ≠ 0 and consider {v_2, v_1, . . . , v_l} instead if (v_2, v_2) ≠ 0; otherwise we may consider {v_1 + v_2, v_2, . . . , v_l} as a basis of V because now (v_1 + v_2, v_1 + v_2) = 2(v_1, v_2) ≠ 0. So we have seen that we may assume (v_1, v_1) ≠ 0 to start with after renaming the basis vectors {v_1, . . . , v_l} of V if necessary. Now let w_1 = v_1 and set

$$w_i = v_i - \frac{(w_1, v_i)}{(w_1, w_1)}w_1, \quad i = 2, \ldots, l. \tag{4.1.4}$$

Then w_i ≠ 0 since v_1, v_i are linearly independent for all i = 2, . . . , l. It is clear that w_i ⊥ w_1 (i = 2, . . . , l). If (w_i, w_i) ≠ 0 for some i = 2, . . . , l, we may assume i = 2 after renaming the basis vectors {v_1, . . . , v_l} of V if necessary. If (w_i, w_i) = 0 for all i = 2, . . . , l, there must be some j ≠ i, i, j = 2, . . . , l, such that (w_i, w_j) ≠ 0, otherwise w_i ∈ U_0 for i = 2, . . . , l, which is false. Without loss of generality, we may assume (w_2, w_3) ≠ 0 and consider

$$w_2 + w_3 = (v_2 + v_3) - \frac{(w_1, v_2 + v_3)}{(w_1, w_1)}w_1. \tag{4.1.5}$$


It is clear that (w_2 + w_3, w_2 + w_3) = 2(w_2, w_3) ≠ 0 and (w_2 + w_3) ⊥ w_1. Since {v_1, v_2 + v_3, v_3, . . . , v_l} is also a basis of V, the above procedure indicates that we may rename the basis vectors {v_1, . . . , v_l} of V if necessary so that we obtain

$$w_2 = v_2 - \frac{(w_1, v_2)}{(w_1, w_1)}w_1, \tag{4.1.6}$$

which satisfies (w_2, w_2) ≠ 0. Of course Span{v_1, . . . , v_l} = Span{w_1, w_2, v_3, . . . , v_l}. Now set

$$w_i = v_i - \frac{(w_2, v_i)}{(w_2, w_2)}w_2 - \frac{(w_1, v_i)}{(w_1, w_1)}w_1, \quad i = 3, \ldots, l. \tag{4.1.7}$$

Then w_i ⊥ w_2 and w_i ⊥ w_1 for i = 3, . . . , l. If (w_i, w_i) ≠ 0 for some i = 3, . . . , l, by renaming {v_1, . . . , v_l} if necessary, we may assume (w_3, w_3) ≠ 0. If (w_i, w_i) = 0 for all i = 3, . . . , l, then there is some i = 4, . . . , l such that (w_3, w_i) ≠ 0. We may assume (w_3, w_4) ≠ 0. Thus (w_3 + w_4, w_3 + w_4) = 2(w_3, w_4) ≠ 0 and (w_3 + w_4) ⊥ w_1, (w_3 + w_4) ⊥ w_2. Of course, {v_1, v_2, v_3 + v_4, v_4, . . . , v_l} is also a basis of V. Thus we see that we may rename the basis vectors {v_1, . . . , v_l} of V so that we obtain

$$w_3 = v_3 - \frac{(w_2, v_3)}{(w_2, w_2)}w_2 - \frac{(w_1, v_3)}{(w_1, w_1)}w_1, \tag{4.1.8}$$

which again satisfies (w_3, w_3) ≠ 0 and

$$\mathrm{Span}\{v_1, \ldots, v_l\} = \mathrm{Span}\{w_1, w_2, w_3, v_4, \ldots, v_l\}. \tag{4.1.9}$$

Therefore, by renaming the basis vectors {v_1, . . . , v_l} if necessary, we will be able to carry the above procedure out and obtain a new set of vectors {w_1, . . . , w_l} given by

$$w_1 = v_1, \quad w_i = v_i - \sum_{j=1}^{i-1}\frac{(w_j, v_i)}{(w_j, w_j)}w_j, \quad i = 2, \ldots, l, \tag{4.1.10}$$

and having the properties

$$\begin{aligned} (w_i, w_i) &\neq 0, \quad i = 1, \ldots, l, \\ (w_i, w_j) &= 0, \quad i \neq j, \quad i, j = 1, \ldots, l, \\ \mathrm{Span}\{w_1, \ldots, w_l\} &= \mathrm{Span}\{v_1, \ldots, v_l\}. \end{aligned} \tag{4.1.11}$$

In other words {u_1, . . . , u_k, w_1, . . . , w_l} is seen to be an orthogonal basis of U.

The method described in the proof of Theorem 4.2, especially the scheme given by the formulas in (4.1.10)–(4.1.11), is known as the Gram–Schmidt procedure for basis orthogonalization.
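The recursion (4.1.10) is short to write down once the renaming steps of the proof are set aside. The sketch below makes that simplifying assumption explicit: it takes the basis in the given order and raises an error if some w_i turns out to be null, instead of reordering or recombining vectors as the proof does. The scalar product is assumed to be (u, v) = u^tGv for a symmetric matrix G; the names `sp` and `gram_schmidt` and the sample G are hypothetical.

```python
from fractions import Fraction


def sp(u, v, G):
    """Scalar product (u, v) = u^t G v for a symmetric matrix G."""
    n = len(G)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def gram_schmidt(vs, G):
    """Apply (4.1.10) to the vectors vs in the given order.

    Simplification (an assumption): no renaming is attempted; a null
    intermediate vector aborts the computation.
    """
    ws = []
    for v in vs:
        w = [Fraction(x) for x in v]
        for wj in ws:
            c = sp(wj, v, G) / sp(wj, wj, G)  # coefficient (w_j, v_i)/(w_j, w_j)
            w = [wi - c * wji for wi, wji in zip(w, wj)]
        if sp(w, w, G) == 0:
            raise ValueError("null vector met; rename or recombine the basis first")
        ws.append(w)
    return ws


G = [[2, 1, 0], [1, 0, 1], [0, 1, -1]]       # illustrative non-degenerate product
vs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ws = gram_schmidt(vs, G)
gram = [[sp(a, b, G) for b in ws] for a in ws]
print(ws)
print(gram)
```

For this example the Gram matrix of the output is diagonal with nonzero diagonal entries of both signs, which is exactly what (4.1.11) asserts; unlike the positive definite case, the diagonal entries need not be positive.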


In the rest of this section, we assume F = R.

If U_0 = U^⊥ is a proper subspace of U, that is, U_0 ≠ {0} and U_0 ≠ U, we have seen from Theorem 4.2 that we may express an orthogonal basis of U by (say)

$$\{u_1, \ldots, u_{n_0}, v_1, \ldots, v_{n_+}, w_1, \ldots, w_{n_-}\}, \tag{4.1.12}$$

so that {u_1, . . . , u_{n_0}} is a basis of U_0 and that (if any)

$$(v_i, v_i) > 0, \quad i = 1, \ldots, n_+, \qquad (w_i, w_i) < 0, \quad i = 1, \ldots, n_-. \tag{4.1.13}$$

It is clear that, with

$$U_+ = \mathrm{Span}\{v_1, \ldots, v_{n_+}\}, \quad U_- = \mathrm{Span}\{w_1, \ldots, w_{n_-}\}, \tag{4.1.14}$$

we have the following elegant orthogonal subspace decomposition

$$U = U_0 \oplus U_+ \oplus U_-, \quad \dim(U_0) = n_0, \quad \dim(U_+) = n_+, \quad \dim(U_-) = n_-. \tag{4.1.15}$$

It is interesting that the integers n_0, n_+, n_− are independent of the choice of an orthogonal basis. Such a statement is also known as the Sylvester theorem.

In fact, it is obvious that n_0 is independent of the choice of an orthogonal basis since n_0 is the dimension of U_0 = U^⊥. Assume that

$$\{\tilde u_1, \ldots, \tilde u_{n_0}, \tilde v_1, \ldots, \tilde v_{m_+}, \tilde w_1, \ldots, \tilde w_{m_-}\} \tag{4.1.16}$$

is another orthogonal basis of U so that {ũ_1, . . . , ũ_{n_0}} is a basis of U_0 and that (if any)

$$(\tilde v_i, \tilde v_i) > 0, \quad i = 1, \ldots, m_+, \qquad (\tilde w_i, \tilde w_i) < 0, \quad i = 1, \ldots, m_-. \tag{4.1.17}$$

To proceed, we assume n_+ ≥ m_+ for definiteness and we need to establish n_+ ≤ m_+. For this purpose, we show that u_1, . . . , u_{n_0}, v_1, . . . , v_{n_+}, w̃_1, . . . , w̃_{m_−} are linearly independent. Indeed, if there are scalars a_1, . . . , a_{n_0}, b_1, . . . , b_{n_+}, c_1, . . . , c_{m_−} in R such that

$$a_1u_1 + \cdots + a_{n_0}u_{n_0} + b_1v_1 + \cdots + b_{n_+}v_{n_+} = c_1\tilde w_1 + \cdots + c_{m_-}\tilde w_{m_-}, \tag{4.1.18}$$

then we may take the scalar products of both sides of (4.1.18) with themselves to get

$$b_1^2(v_1, v_1) + \cdots + b_{n_+}^2(v_{n_+}, v_{n_+}) = c_1^2(\tilde w_1, \tilde w_1) + \cdots + c_{m_-}^2(\tilde w_{m_-}, \tilde w_{m_-}). \tag{4.1.19}$$


Thus, applying the properties (4.1.13) and (4.1.17) in (4.1.19), we conclude that b_1 = · · · = b_{n_+} = c_1 = · · · = c_{m_−} = 0. Inserting this result into (4.1.18) and using the linear independence of u_1, . . . , u_{n_0}, we arrive at a_1 = · · · = a_{n_0} = 0. So the asserted linear independence follows. As a consequence, we have

$$n_0 + n_+ + m_- \le \dim(U). \tag{4.1.20}$$

In view of (4.1.20) and n_0 + m_+ + m_− = dim(U) we find n_+ ≤ m_+ as desired.

Thus the integers n_0, n_+, n_− are determined by the scalar product and independent of the choice of an orthogonal basis. These integers are sometimes referred to as the indices of nullity, positivity, and negativity of the scalar product, respectively.
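The independence of the indices from the chosen orthogonal basis can be observed on a small degenerate example. The sketch below assumes a scalar product on R^3 of the form (u, v) = u^tGv; the function name `indices`, the matrix G, and the two orthogonal bases are illustrative choices, and the function simply counts the signs of (u, u) over a basis that it first checks to be orthogonal.

```python
def sp(u, v, G):
    """Scalar product (u, v) = u^t G v for a symmetric matrix G."""
    n = len(G)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def indices(basis, G):
    """Return (n_0, n_plus, n_minus) read off from an orthogonal basis."""
    for i, u in enumerate(basis):
        for v in basis[i + 1:]:
            if sp(u, v, G) != 0:
                raise ValueError("basis is not orthogonal")
    squares = [sp(u, u, G) for u in basis]
    n_zero = sum(1 for s in squares if s == 0)
    n_plus = sum(1 for s in squares if s > 0)
    n_minus = sum(1 for s in squares if s < 0)
    return (n_zero, n_plus, n_minus)


G = [[1, 1, 0], [1, 1, 0], [0, 0, -1]]            # degenerate product (illustrative)
basis_1 = [[1, 0, 0], [-1, 1, 0], [0, 0, 1]]      # one orthogonal basis
basis_2 = [[2, 0, 1], [1, 0, 2], [1, -1, 0]]      # a different orthogonal basis
print(indices(basis_1, G), indices(basis_2, G))
```

Both bases give the same triple even though the individual values (u, u) differ, as the Sylvester theorem predicts.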

It is clear that for the orthogonal basis (4.1.12) of U we can further rescale the vectors v_1, . . . , v_{n_+} and w_1, . . . , w_{n_−} to make them satisfy

$$(v_i, v_i) = 1, \quad i = 1, \ldots, n_+, \qquad (w_i, w_i) = -1, \quad i = 1, \ldots, n_-. \tag{4.1.21}$$

Such an orthogonal basis is called an orthonormal basis.

Exercises

4.1.1 Let S be a non-empty subset of a vector space U equipped with a scalar product (·, ·). Show that S^⊥ is a subspace of U and S ⊂ (S^⊥)^⊥.

4.1.2 Let u_1, . . . , u_k be mutually perpendicular vectors of a vector space U equipped with a scalar product (·, ·). Show that if these vectors are not null then they must be linearly independent.

4.1.3 Let S_1 and S_2 be two non-empty subsets of a vector space U equipped with a scalar product (·, ·). If S_1 ⊂ S_2, show that S_1^⊥ ⊃ S_2^⊥.

4.1.4 Consider the vector space R^n and define

$$(u, v) = u^tAv, \quad u = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \quad v = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in R^n, \tag{4.1.22}$$

where A ∈ R(n, n).

(i) Show that (4.1.22) defines a scalar product over R^n if and only if A is symmetric.

(ii) Show that the subspace U_0 = (R^n)^⊥ is in fact the null-space of the matrix A given by

$$N(A) = \{x \in R^n \mid Ax = 0\}. \tag{4.1.23}$$

4.1.5 In special relativity, one equips the space R^4 with the Minkowski scalar product or Minkowski metric given by

$$(u, v) = a_1b_1 - a_2b_2 - a_3b_3 - a_4b_4, \quad u = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}, \quad v = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} \in R^4. \tag{4.1.24}$$

Find an orthogonal basis and determine the indices of nullity, positivity, and negativity of R^4 equipped with this scalar product.

4.1.6 With the notation of the previous exercise, consider the following modified scalar product

$$(u, v) = a_1b_1 - a_2b_3 - a_3b_2 - a_4b_4. \tag{4.1.25}$$

(i) Use the Gram–Schmidt procedure to find an orthonormal basis of R^4.

(ii) Compute the indices of nullity, positivity, and negativity.

4.1.7 Let U be a vector space with a scalar product and V, W two subspaces of U. Show that

$$(V + W)^\perp = V^\perp \cap W^\perp. \tag{4.1.26}$$

4.1.8 Let U be an n-dimensional vector space over C with a scalar product (·, ·). Show that if n ≥ 2 then there must be a vector u ∈ U, u ≠ 0, such that (u, u) = 0.

4.2 Non-degenerate scalar products

Let U be a vector space equipped with the scalar product (·, ·). In this section, we examine the special situation when U_0 = U^⊥ = {0}.

Definition 4.3 A scalar product (·, ·) over U is said to be non-degenerate if U_0 = U^⊥ = {0}. Or equivalently, if (u, v) = 0 for all v ∈ U then u = 0.

The most important consequence of a non-degenerate scalar product is that it allows us to identify U with its dual space U′ naturally through the pairing given by the scalar product. To see this, we note that for each v ∈ U

$$f(u) = (u, v), \quad u \in U \tag{4.2.1}$$

defines an element f ∈ U′. We now show that all elements of U′ may be defined this way.

In fact, assume dim(U) = n and let {u_1, . . . , u_n} be an orthogonal basis of U. Since

$$(u_i, u_i) \equiv c_i \neq 0, \quad i = 1, \ldots, n, \tag{4.2.2}$$

we may take

$$v_i = \frac{1}{c_i}u_i, \quad i = 1, \ldots, n, \tag{4.2.3}$$

to achieve

$$(u_i, v_j) = \delta_{ij}, \quad i, j = 1, \ldots, n. \tag{4.2.4}$$

Thus, if we define f_i ∈ U′ by setting

$$f_i(u) = (u, v_i), \quad u \in U, \quad i = 1, \ldots, n, \tag{4.2.5}$$

then {f_1, . . . , f_n} is seen to be a basis of U′ dual to {u_1, . . . , u_n}. Therefore, for each f ∈ U′, there are scalars a_1, . . . , a_n such that

$$f = a_1f_1 + \cdots + a_nf_n. \tag{4.2.6}$$

Consequently, for any u ∈ U, we have

$$\begin{aligned} f(u) &= (a_1f_1 + \cdots + a_nf_n)(u) = a_1f_1(u) + \cdots + a_nf_n(u) \\ &= a_1(u, v_1) + \cdots + a_n(u, v_n) = (u, a_1v_1 + \cdots + a_nv_n) \equiv (u, v), \end{aligned} \tag{4.2.7}$$

which proves that any element f of U′ is of the form (4.2.1). Such a statement, that is, any functional over U may be represented as a scalar product, in the context of infinite-dimensional spaces, is known as the Riesz representation theorem.
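The construction (4.2.2)–(4.2.7) is effectively an algorithm: evaluate f on an orthogonal basis to get a_i = f(u_i), then form v = a_1u_1/c_1 + · · · + a_nu_n/c_n. Here is a minimal sketch, assuming a scalar product on R^2 of the form (u, v) = u^tGv; the name `riesz_vector`, the matrix G, the orthogonal basis, and the functional f are all illustrative assumptions.

```python
from fractions import Fraction


def sp(u, v, G):
    """Scalar product (u, v) = u^t G v for a symmetric matrix G."""
    n = len(G)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))


def riesz_vector(f, basis, G):
    """Vector v with f(u) = (u, v) for all u, built as in (4.2.3)-(4.2.7).

    basis must be orthogonal with respect to G and contain no null vector.
    """
    n = len(G)
    v = [Fraction(0)] * n
    for u in basis:
        c = sp(u, u, G)        # c_i in (4.2.2), nonzero by non-degeneracy
        a = Fraction(f(u))     # a_i = f(u_i), since f_i(u_j) = delta_ij
        v = [vk + a / c * uk for vk, uk in zip(v, u)]
    return v


def f(u):
    """An illustrative linear functional on R^2."""
    return 3 * u[0] + 2 * u[1]


G = [[1, 1], [1, -1]]          # non-degenerate, indefinite (illustrative)
basis = [[1, 0], [-1, 1]]      # orthogonal with respect to G
v = riesz_vector(f, basis, G)
print(v)
print([sp(u, v, G) == f(u) for u in ([1, 0], [0, 1], [2, -3])])
```

The representing vector differs from the coefficient vector (3, 2)^t of f because the scalar product is not the dot product.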

In order to make our discussion more precise, we denote the dependence of f on v in (4.2.1) explicitly by f ≡ v′ ∈ U′ and use ρ : U → U′ to express this correspondence,

ρ(v) = v′. (4.2.8)

Therefore, we may summarize various relations discussed above as follows,

〈u, ρ(v)〉 = 〈u, v′〉 = (u, v), u, v ∈ U. (4.2.9)

Then we can check to see that ρ is linear. In fact, for v, w ∈ U, we have

$$\begin{aligned} \langle u, \rho(v + w)\rangle &= (u, v + w) = (u, v) + (u, w) \\ &= \langle u, \rho(v)\rangle + \langle u, \rho(w)\rangle = \langle u, \rho(v) + \rho(w)\rangle, \quad u \in U, \end{aligned} \tag{4.2.10}$$

which implies ρ(v + w) = ρ(v) + ρ(w). Besides, for a ∈ F and v ∈ U, we have

$$\langle u, \rho(av)\rangle = (u, av) = a(u, v) = a\langle u, \rho(v)\rangle = \langle u, a\rho(v)\rangle, \tag{4.2.11}$$

which establishes ρ(av) = aρ(v). Thus the linearity of ρ : U → U′ follows.

Since we have seen that ρ : U → U′ is onto and dim(U′) = dim(U), we conclude that ρ : U → U′ is an isomorphism, which may rightfully be called the Riesz isomorphism. As a consequence, we can rewrite (4.2.9) as

$$\langle u, \rho(v)\rangle = \langle u, v'\rangle = (u, v) = (u, \rho^{-1}(v')), \quad u, v \in U, \quad v' \in U'. \tag{4.2.12}$$

On the other hand, for T ∈ L(U), recall that the dual of T, T′ ∈ L(U′), satisfies

$$\langle u, T'(v')\rangle = \langle T(u), v'\rangle, \quad u \in U, \quad v' \in U'. \tag{4.2.13}$$

So in view of (4.2.12) and (4.2.13), we arrive at

$$(T(u), v) = (u, (\rho^{-1} \circ T' \circ \rho)(v)), \quad u, v \in U. \tag{4.2.14}$$

In other words, for any T ∈ L(U), there is a unique element T* ∈ L(U), called the dual of T with respect to the non-degenerate scalar product (·, ·) and determined by the relation

$$T^* = \rho^{-1} \circ T' \circ \rho, \tag{4.2.15}$$

via the Riesz isomorphism ρ : U → U′, such that

$$(T(u), v) = (u, T^*(v)), \quad u, v \in U. \tag{4.2.16}$$

Through the Riesz isomorphism, we may naturally identify U′ with U. In this way, we may view U as its own dual space and describe U as a self-dual space. Thus, for T ∈ L(U), we may naturally identify T* with T′ as well, without spelling out the Riesz isomorphism, which leads us to formulate the following definition.

Definition 4.4 Let U be a vector space with a non-degenerate scalar product (·, ·). For a mapping T ∈ L(U), the unique mapping T′ ∈ L(U) satisfying

$$(u, T(v)) = (T'(u), v), \quad u, v \in U, \tag{4.2.17}$$

is called the dual or adjoint mapping of T, with respect to the scalar product (·, ·).

If T = T′, T is said to be a self-dual or self-adjoint mapping with respect to the scalar product (·, ·).


Definition 4.5 Let T ∈ L(U) where U is a vector space equipped with a scalar product (·, ·). We say that T is an orthogonal mapping if (T(u), T(v)) = (u, v) for any u, v ∈ U.

As an immediate consequence of the above definition, we have the following basic results.

Theorem 4.6 That T ∈ L(U) is an orthogonal mapping is equivalent to each of the following statements.

(1) (T(u), T(u)) = (u, u) for any u ∈ U.
(2) For any orthogonal basis {u_1, . . . , u_n} of U the vectors T(u_1), . . . , T(u_n) are mutually orthogonal and (T(u_i), T(u_i)) = (u_i, u_i) for i = 1, . . . , n.
(3) T′ ∘ T = T ∘ T′ = I, the identity mapping over U.

Proof If T is orthogonal, it is clear that (1) holds.

Now assume (1) is valid. Using the properties of the scalar product, we have the identity

$$2(u, v) = (u + v, u + v) - (u, u) - (v, v), \quad u, v \in U. \tag{4.2.18}$$

So 2(T(u), T(v)) = (T(u + v), T(u + v)) − (T(u), T(u)) − (T(v), T(v)) = 2(u, v) for any u, v ∈ U. Thus T is orthogonal.

That T being orthogonal implies (2) is trivial.

Assume (2) holds. We express any u, v ∈ U as

$$u = \sum_{i=1}^n a_iu_i, \quad v = \sum_{i=1}^n b_iu_i, \quad a_i, b_i \in F, \quad i = 1, \ldots, n. \tag{4.2.19}$$

Therefore we have

$$\begin{aligned} (T(u), T(v)) &= \left(T\left(\sum_{i=1}^n a_iu_i\right), T\left(\sum_{j=1}^n b_ju_j\right)\right) = \sum_{i,j=1}^n a_ib_j(T(u_i), T(u_j)) \\ &= \sum_{i=1}^n a_ib_i(T(u_i), T(u_i)) \\ &= \sum_{i=1}^n a_ib_i(u_i, u_i) \\ &= \left(\sum_{i=1}^n a_iu_i, \sum_{j=1}^n b_ju_j\right) = (u, v), \end{aligned} \tag{4.2.20}$$

which establishes the orthogonality of T.

We now show that T being orthogonal and (3) are equivalent. In fact, if T is orthogonal, then (u, (T′ ∘ T)(v)) = (u, v) or (u, (T′ ∘ T − I)(v)) = 0 for any u ∈ U. By the non-degeneracy of the scalar product we get (T′ ∘ T − I)(v) = 0 for any v ∈ U, which proves T′ ∘ T = I. In other words, T′ is a left inverse of T. In view of the discussion in Section 2.1, T′ is also a right inverse of T. That is, T ∘ T′ = I. So (3) follows. That (3) implies the orthogonality of T is obvious.

As an example, we consider R^2 equipped with the standard Euclidean scalar product, i.e. the dot product, given as

$$(u, v)_+ \equiv u \cdot v = a_1b_1 + a_2b_2 \quad \text{for } u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \ v = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2. \tag{4.2.21}$$

It is straightforward to check that the rotation mapping R_θ : R^2 → R^2 defined by

$$R_\theta(u) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}u, \quad \theta \in R, \quad u \in R^2, \tag{4.2.22}$$

is an orthogonal mapping with respect to the scalar product (4.2.21). However, in general it fails to be orthogonal with respect to the Minkowski scalar product

$$(u, v)_- \equiv a_1b_1 - a_2b_2 \quad \text{for } u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \ v = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2. \tag{4.2.23}$$

Nevertheless, if we modify R_θ into ρ_θ : R^2 → R^2 using hyperbolic cosine and sine functions and dropping the negative sign, by setting

$$\rho_\theta(u) = \begin{pmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{pmatrix}u, \quad \theta \in R, \quad u \in R^2, \tag{4.2.24}$$

we see that ρ_θ is orthogonal with respect to the scalar product (4.2.23), although it now fails to be orthogonal with respect to (4.2.21) when θ ≠ 0, of course. This example clearly illustrates the dependence of the form of an orthogonal mapping on the underlying scalar product.
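In matrix terms, a mapping u ↦ Tu is orthogonal with respect to (u, v) = u^tGv exactly when T^tGT = G, since (Tu, Tv) = u^tT^tGTv. The sketch below checks this numerically for the rotation and its hyperbolic counterpart; the helper names `pullback` and `is_orthogonal`, the tolerance, and the sample angle are assumptions for illustration, and floating-point comparison is used because cos, sin, cosh, and sinh are not exact.

```python
import math


def pullback(T, G):
    """Compute T^t G T for square matrices given as nested lists."""
    n = len(G)
    GT = [[sum(G[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(T[k][i] * GT[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def is_orthogonal(T, G, tol=1e-12):
    """True when T^t G T agrees with G up to the tolerance tol."""
    P = pullback(T, G)
    n = len(G)
    return all(abs(P[i][j] - G[i][j]) <= tol for i in range(n) for j in range(n))


theta = 0.7  # sample angle (an assumption); any value away from 0 and pi will do
R = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
H = [[math.cosh(theta), math.sinh(theta)], [math.sinh(theta), math.cosh(theta)]]

euclid = [[1, 0], [0, 1]]      # matrix of (4.2.21)
minkowski = [[1, 0], [0, -1]]  # matrix of (4.2.23)

print(is_orthogonal(R, euclid), is_orthogonal(R, minkowski))
print(is_orthogonal(H, minkowski), is_orthogonal(H, euclid))
```

Each mapping passes the test for its own scalar product and fails it for the other one.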

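As another example, consider the space R^2 with the scalar product

$$(u, v)_* = a_1b_1 + a_1b_2 + a_2b_1 - a_2b_2, \quad u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad v = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2. \tag{4.2.25}$$

It is clear that (·, ·)_* is non-degenerate and may be rewritten as

$$(u, v)_* = u^t\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}v. \tag{4.2.26}$$

Thus, for a mapping T ∈ L(R^2) defined by

$$T(u) = \begin{pmatrix} a & b \\ c & d \end{pmatrix}u \equiv Au, \quad u \in R^2, \quad a, b, c, d \in R, \tag{4.2.27}$$

we have

$$\begin{aligned} (T'(u), v)_* = (u, T(v))_* &= u^t\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}Av \\ &= u^t\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}A\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}v, \end{aligned} \tag{4.2.28}$$

which implies

$$\begin{aligned} T'(u) &= \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}^{-1}A^t\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}u \\ &= \frac{1}{2}\begin{pmatrix} a + b + c + d & a + b - c - d \\ a - b + c - d & a - b - c + d \end{pmatrix}u, \quad u \in R^2. \end{aligned} \tag{4.2.29}$$

Consequently, T is self-adjoint with respect to the scalar product (·, ·)_* if and only if a = b + c + d, where b, c, d are arbitrary: each of the four entry-wise equations in T′ = T reduces to this single condition.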
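The formula behind (4.2.29) is that the adjoint of u ↦ Au with respect to (u, v) = u^tGv has matrix G^{−1}A^tG. The following sketch evaluates it exactly for the matrix of (4.2.26); the helper names `matmul`, `transpose`, `inv2`, and `adjoint` and the two sample matrices are illustrative assumptions.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices of the same size."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(X):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*X)]


def inv2(M):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = M
    D = Fraction(a * d - b * c)
    return [[d / D, -b / D], [-c / D, a / D]]


def adjoint(A, G):
    """Matrix of the adjoint mapping: G^{-1} A^t G (G symmetric, invertible)."""
    return matmul(matmul(inv2(G), transpose(A)), G)


G = [[1, 1], [1, -1]]       # matrix of the scalar product (4.2.26)

A = [[1, 2], [3, 4]]        # generic sample: a = 1 differs from b + c + d = 9
A_adj = adjoint(A, G)

S = [[6, 1], [2, 3]]        # sample with a = b + c + d
S_adj = adjoint(S, G)

print(A_adj)
print(S_adj)
```

The first matrix is changed by taking the adjoint, while the second, which satisfies a = b + c + d, is returned unchanged, in line with the self-adjointness condition above.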

Exercises

4.2.1 Let U be a vector space equipped with a non-degenerate scalar product, (·, ·), and T ∈ L(U).

(i) Show that the pairing (u, v)_T = (u, T(v)) (u, v ∈ U) defines a scalar product on U if and only if T is self-adjoint.

(ii) Show that for a self-adjoint mapping T ∈ L(U) the pairing (·, ·)_T given above defines a non-degenerate scalar product over U if and only if T is invertible.

4.2.2 Consider the vector space R^2 equipped with the non-degenerate scalar product

$$(u, v) = a_1b_2 + a_2b_1, \quad u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad v = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2. \tag{4.2.30}$$

Let

$$V = \mathrm{Span}\left\{\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right\}. \tag{4.2.31}$$

Show that V^⊥ = V. This provides an example that in general V + V^⊥ may fail to make up the full space.

4.2.3 Let A = (a_ij) ∈ R(2, 2) and define T_A ∈ L(R^2) by

$$T_A(u) = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}u, \quad u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \in R^2. \tag{4.2.32}$$

Find conditions on A such that T_A is self-adjoint with respect to the scalar product (·, ·)_− defined in (4.2.23).

4.2.4 Define the scalar product

$$(u, v)_0 = a_1b_2 + a_2b_1, \quad u = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad v = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \in R^2, \tag{4.2.33}$$

over R^2.

(i) Show that the scalar product (·, ·)_0 is non-degenerate.

(ii) For T_A ∈ L(R^2) given in (4.2.32), obtain the adjoint mapping T_A′ of T_A and find conditions on the matrix A so that T_A is self-adjoint with respect to the scalar product (·, ·)_0.

4.2.5 Use U to denote the vector space of real-valued functions with all orders of derivatives over the real line R which vanish outside bounded intervals. Equip U with the scalar product

$$(u, v) = \int_{-\infty}^{\infty} u(t)v(t)\,dt, \quad u, v \in U. \tag{4.2.34}$$

Show that the linear mapping D = d/dt : U → U is anti-self-dual or anti-self-adjoint. That is, D′ = −D.

4.2.6 Let U be a finite-dimensional vector space equipped with a scalar product, (·, ·). Decompose U into the direct sum as stated in (4.1.15). Show that we may use the scalar product (·, ·) of U to make the quotient space U/U_0 into a space with a non-degenerate scalar product when U_0 ≠ U, still denoted by (·, ·), given by

$$([u], [v]) = (u, v), \quad [u], [v] \in U/U_0. \tag{4.2.35}$$

4.2.7 Let U be a finite-dimensional vector space equipped with a scalar product, (·, ·), and let V be a subspace of U. Show that (·, ·) is a non-degenerate scalar product over V if and only if V ∩ V^⊥ = {0}.

4.2.8 Let U be a finite-dimensional vector space with a non-degenerate scalar product (·, ·). Let ρ : U → U′ be the Riesz isomorphism. Show that for any subspace V of U there holds

$$\rho(V^\perp) = V^0. \tag{4.2.36}$$

In other words, the mapping ρ is an isomorphism from V^⊥ onto V^0. In particular, dim(V^⊥) = dim(V^0). Thus, in view of (1.4.31) and (4.2.36), we have the dimensionality equation

$$\dim(V) + \dim(V^\perp) = \dim(U). \tag{4.2.37}$$

4.2.9 Let U be a finite-dimensional space with a non-degenerate scalar product and let V be a subspace of U. Use (4.2.37) to establish V = (V^⊥)^⊥.

4.2.10 Let U be a finite-dimensional space with a non-degenerate scalar product and let V, W be two subspaces of U. Establish the relation

$$(V \cap W)^\perp = V^\perp + W^\perp. \tag{4.2.38}$$

4.2.11 Let U be an n-dimensional vector space over C with a non-degenerate scalar product (·, ·). Show that if n ≥ 2 then there must be linearly independent vectors u, v ∈ U such that (u, u) = 0 and (v, v) = 0 but (u, v) = 1.

4.3 Positive definite scalar products

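In this section we consider two types of positive definite scalar products: real ones and complex ones. Real ones are modeled over the standard Euclidean scalar product on $\mathbb{R}^n$:

$$(u, v) = u \cdot v = u^t v = a_1 b_1 + \cdots + a_n b_n, \quad u = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \; v = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in \mathbb{R}^n, \tag{4.3.1}$$

and complex ones over the standard Hermitian scalar product on $\mathbb{C}^n$:

$$(u, v) = \overline{u} \cdot v = u^\dagger v = \overline{a}_1 b_1 + \cdots + \overline{a}_n b_n, \quad u = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \; v = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in \mathbb{C}^n. \tag{4.3.2}$$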


The common feature of these products is the positivity property: $(u, u) \geq 0$ for any vector $u$, and $(u, u) = 0$ only when $u = 0$. The major difference is that the former is symmetric but the latter fails to be so. Instead, there holds the adjusted property $(u, v) = \overline{(v, u)}$ for any $u, v \in \mathbb{C}^n$, which is exactly what is needed to ensure the positivity property. As a consequence, in both the $\mathbb{R}^n$ and $\mathbb{C}^n$ cases, we are able to define the norm of a vector $u$ to be $\|u\| = \sqrt{(u, u)}$.

Motivated by the above examples, we introduce the following definition.

Definition 4.7 A positive definite scalar product over a real vector space $U$ is a scalar product $(\cdot, \cdot)$ satisfying $(u, u) \geq 0$ for $u \in U$ and $(u, u) = 0$ only when $u = 0$.

A positive definite scalar product over a complex vector space $U$ is a scalar function $(u, v) \in \mathbb{C}$, defined for each pair of vectors $u, v \in U$, which satisfies the following conditions.

(1) (Hermitian symmetry) $(u, v) = \overline{(v, u)}$ for $u, v \in U$.
(2) (Additivity) $(u + v, w) = (u, w) + (v, w)$ for $u, v, w \in U$.
(3) (Partial homogeneity) $(u, av) = a(u, v)$ for $a \in \mathbb{C}$ and $u, v \in U$.
(4) (Positivity) $(u, u) \geq 0$ for $u \in U$ and $(u, u) = 0$ only when $u = 0$.

Since the real case is contained as a special situation of the complex case, we shall focus our discussion on the complex case, unless otherwise stated.

Needless to say, additivity regarding the second argument in $(\cdot, \cdot)$ still holds since

$$(u, v + w) = \overline{(v + w, u)} = \overline{(v, u)} + \overline{(w, u)} = (u, v) + (u, w), \quad u, v, w \in U. \tag{4.3.3}$$

On the other hand, homogeneity regarding the first argument takes a modified form,

$$(au, v) = \overline{(v, au)} = \overline{a} \, \overline{(v, u)} = \overline{a} (u, v), \quad a \in \mathbb{C}, \; u, v \in U. \tag{4.3.4}$$

We will extend our study carried out in the previous two sections for general scalar products to the current situation of a positive definite scalar product, which is necessarily non-degenerate since $(u, u) > 0$ for any nonzero vector $u$ in $U$.

First, we see that for $u, v \in U$ we can still use the condition $(u, v) = 0$ to define $u, v$ to be mutually perpendicular vectors. Next, since for any $u \in U$ we have $(u, u) \geq 0$, we can formally define the norm of $u$ as in $\mathbb{C}^n$ by

$$\|u\| = \sqrt{(u, u)}. \tag{4.3.5}$$

It is clearly seen that the norm so defined enjoys the positivity and homogeneity conditions required of a norm. That it also satisfies the triangle inequality will be established shortly. Thus (4.3.5) indeed gives rise to a norm of the space $U$ that is said to be induced from the positive definite scalar product $(\cdot, \cdot)$.

Let $u, v \in U$ be perpendicular. Then we have

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2. \tag{4.3.6}$$

This important expression is also known as the Pythagoras theorem, which may be proved by a simple expansion

$$\|u + v\|^2 = (u + v, u + v) = (u, u) + (u, v) + (v, u) + (v, v) = \|u\|^2 + \|v\|^2, \tag{4.3.7}$$

since $(u, v) = 0$ and $(v, u) = \overline{(u, v)} = 0$.

For $u, v \in U$ with $u \neq 0$, we may use the orthogonal decomposition formula (4.1.2) to resolve $v$ into the form

$$v = \left( v - \frac{(u, v)}{(u, u)} u \right) + \frac{(u, v)}{(u, u)} u \equiv w + \frac{(u, v)}{(u, u)} u, \tag{4.3.8}$$

so that $(u, w) = 0$. Hence, in view of the Pythagoras theorem, we get

$$\|v\|^2 = \|w\|^2 + \left| \frac{(u, v)}{(u, u)} \right|^2 \|u\|^2 \geq \frac{|(u, v)|^2}{\|u\|^2}. \tag{4.3.9}$$

In other words, we have

$$|(u, v)| \leq \|u\| \|v\|, \tag{4.3.10}$$

with equality if and only if $w = 0$ or, equivalently, $v \in \operatorname{Span}\{u\}$. Of course, (4.3.10) is valid when $u = 0$. Hence, in summary, we may state that (4.3.10) holds for any $u, v \in U$ and that equality is achieved if and only if $u, v$ are linearly dependent.

The inequality (4.3.10) is the celebrated Schwarz inequality, whose derivation is seen to be another direct application of the vector orthogonal decomposition formula (4.1.2).
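The decomposition argument behind the Schwarz inequality can be checked numerically. The following is a minimal sketch in plain Python, assuming the standard Hermitian product (4.3.2) on $\mathbb{C}^3$; the helper names `inner`, `norm`, and `resolve` and the sample vectors are illustrative choices, not notation from the text.

```python
from math import sqrt

def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm(u):
    """Induced norm ||u|| = sqrt((u, u)), as in (4.3.5)."""
    return sqrt(inner(u, u).real)

def resolve(u, v):
    """Split v as w + c*u with (u, w) = 0, following (4.3.8); needs u != 0."""
    c = inner(u, v) / inner(u, u)
    w = [b - c * a for a, b in zip(u, v)]
    return w, c

# Hypothetical sample vectors in C^3.
u = [1 + 2j, -1j, 3]
v = [2, 1 - 1j, 1j]
w, c = resolve(u, v)

# Nonnegative by the Schwarz inequality (4.3.10).
schwarz_gap = norm(u) * norm(v) - abs(inner(u, v))
```

Here `schwarz_gap` vanishes, up to rounding, exactly when `v` is a scalar multiple of `u`, which mirrors the equality case stated above.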

We now apply the Schwarz inequality to establish the triangle inequality for the norm $\| \cdot \|$ induced from a positive definite scalar product $(\cdot, \cdot)$.

Let $u, v \in U$. Then in view of (4.3.10) we have

$$\begin{aligned} \|u + v\|^2 &= \|u\|^2 + \|v\|^2 + (u, v) + (v, u) \\ &\leq \|u\|^2 + \|v\|^2 + 2|(u, v)| \\ &\leq \|u\|^2 + \|v\|^2 + 2\|u\| \|v\| \\ &= (\|u\| + \|v\|)^2. \end{aligned} \tag{4.3.11}$$


Hence the triangle inequality $\|u + v\| \leq \|u\| + \|v\|$ follows.

If $\{u_1, \dots, u_n\}$ is a basis of $U$, we may invoke the Gram–Schmidt procedure

$$v_1 = u_1, \quad v_i = u_i - \sum_{j=1}^{i-1} \frac{(v_j, u_i)}{(v_j, v_j)} v_j, \quad i = 2, \dots, n, \tag{4.3.12}$$

as before to obtain an orthogonal basis for $U$. In fact, we may examine the validity of this procedure by a simple induction.

When $n = 1$, there is nothing to show.

Assume the procedure is valid at $n = k \geq 1$.

At $n = k + 1$, by the inductive assumption, we know that we may construct $\{v_1, \dots, v_k\}$ to get an orthogonal basis for $\operatorname{Span}\{u_1, \dots, u_k\}$. Define

$$v_{k+1} = u_{k+1} - \sum_{j=1}^{k} \frac{(v_j, u_{k+1})}{(v_j, v_j)} v_j. \tag{4.3.13}$$

Then we can check that $(v_{k+1}, v_i) = 0$ for $i = 1, \dots, k$ and

$$v_{k+1} \in \operatorname{Span}\{u_{k+1}, v_1, \dots, v_k\} \subset \operatorname{Span}\{u_1, \dots, u_k, u_{k+1}\}. \tag{4.3.14}$$

Of course $v_{k+1} \neq 0$, since otherwise $u_{k+1} \in \operatorname{Span}\{v_1, \dots, v_k\} = \operatorname{Span}\{u_1, \dots, u_k\}$, contradicting the linear independence of $u_1, \dots, u_{k+1}$. Thus we have obtained $k + 1$ nonzero mutually orthogonal vectors $v_1, \dots, v_k, v_{k+1}$ that make up a basis for $\operatorname{Span}\{u_1, \dots, u_k, u_{k+1}\}$ as asserted.

Thus, we have seen that, in the positive definite scalar product situation, from any basis $\{u_1, \dots, u_n\}$ of $U$, the Gram–Schmidt procedure (4.3.12) provides a scheme for getting an orthogonal basis $\{v_1, \dots, v_n\}$ of $U$ so that each of its subsets $\{v_1, \dots, v_k\}$ is an orthogonal basis of $\operatorname{Span}\{u_1, \dots, u_k\}$ for $k = 1, \dots, n$.

Let $\{v_1, \dots, v_n\}$ be an orthogonal basis for $U$. The positivity property allows us to modify the basis further by setting

$$w_i = \frac{1}{\|v_i\|} v_i, \quad i = 1, \dots, n, \tag{4.3.15}$$

so that $\{w_1, \dots, w_n\}$ is an orthogonal basis of $U$ consisting of unit vectors (that is, $\|w_i\| = 1$ for $i = 1, \dots, n$). Such an orthogonal basis is called an orthonormal basis.
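The Gram–Schmidt procedure (4.3.12) followed by the normalization (4.3.15) may be sketched as follows. This is an illustrative plain-Python sketch for $\mathbb{C}^n$ with the standard Hermitian product; the function names and the sample basis are assumptions made here for the example and do not come from the text.

```python
from math import sqrt

def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm(u):
    """Induced norm ||u|| = sqrt((u, u)), as in (4.3.5)."""
    return sqrt(inner(u, u).real)

def gram_schmidt(basis):
    """Orthogonalize a basis following (4.3.12)."""
    ortho = []
    for u in basis:
        v = [complex(x) for x in u]
        for w in ortho:
            coeff = inner(w, u) / inner(w, w)   # (v_j, u_i) / (v_j, v_j)
            v = [vk - coeff * wk for vk, wk in zip(v, w)]
        ortho.append(v)
    return ortho

def normalize(vectors):
    """Rescale each vector to unit length, as in (4.3.15)."""
    return [[x / norm(v) for x in v] for v in vectors]

# Hypothetical basis of C^3 (its determinant is -1 - i, hence nonzero).
basis = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]
ortho = gram_schmidt(basis)
onb = normalize(ortho)
```

The sketch subtracts projections of the original vector $u_i$, exactly as written in (4.3.12). In floating-point practice a "modified" variant that projects the running remainder is often preferred for stability, but that is a numerical refinement outside the scope of the text.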

We next examine the dual space $U'$ of $U$ in view of the positive definite scalar product $(\cdot, \cdot)$ over $U$.

For any $u \in U$, it is clear that

$$f(v) = (u, v), \quad v \in U, \tag{4.3.16}$$

defines an element in $U'$. Now let $\{u_1, \dots, u_n\}$ be an orthonormal basis of $U$ and define $f_i \in U'$ by setting

$$f_i(v) = (u_i, v), \quad v \in U. \tag{4.3.17}$$

Then $\{f_1, \dots, f_n\}$ is a basis of $U'$ which is dual to $\{u_1, \dots, u_n\}$. Hence, for any $f \in U'$, there are scalars $a_1, \dots, a_n$ such that

$$f = a_1 f_1 + \cdots + a_n f_n. \tag{4.3.18}$$

Consequently, we have

$$\begin{aligned} f(v) &= a_1 f_1(v) + \cdots + a_n f_n(v) \\ &= a_1 (u_1, v) + \cdots + a_n (u_n, v) \\ &= (\overline{a}_1 u_1 + \cdots + \overline{a}_n u_n, v) \\ &\equiv (u, v), \quad u = \overline{a}_1 u_1 + \cdots + \overline{a}_n u_n. \end{aligned} \tag{4.3.19}$$

Thus, each element in $U'$ may be represented by an element in $U$ in the form of a scalar product. In other words, the Riesz representation theorem still holds here, although homogeneity of such a representation takes an adjusted form,

$$f_i \mapsto u_i, \quad i = 1, \dots, n; \qquad f = a_1 f_1 + \cdots + a_n f_n \mapsto \overline{a}_1 u_1 + \cdots + \overline{a}_n u_n. \tag{4.3.20}$$

Nevertheless, we now show that adjoint mappings are well defined.

Let $U, V$ be finite-dimensional vector spaces over $\mathbb{C}$ with positive definite scalar products, both denoted by $(\cdot, \cdot)$. For $T \in L(U, V)$ and any $v \in V$, the expression

$$f(u) = (v, T(u)), \quad u \in U, \tag{4.3.21}$$

defines an element $f \in U'$. Hence there is a unique element $w \in U$ such that $f(u) = (w, u)$. Since $w$ depends on $v$, we may denote this relation by $w = T'(v)$. Hence

$$(v, T(u)) = (T'(v), u). \tag{4.3.22}$$

We will prove $T' \in L(V, U)$.

In fact, for $v_1, v_2 \in V$, we have

$$\begin{aligned} (T'(v_1 + v_2), u) &= (v_1 + v_2, T(u)) \\ &= (v_1, T(u)) + (v_2, T(u)) \\ &= (T'(v_1), u) + (T'(v_2), u) \\ &= (T'(v_1) + T'(v_2), u), \quad u \in U. \end{aligned} \tag{4.3.23}$$

Thus $T'(v_1 + v_2) = T'(v_1) + T'(v_2)$ and additivity follows.


For $a \in \mathbb{C}$ and $v \in V$, we have

$$(T'(av), u) = (av, T(u)) = \overline{a} (v, T(u)) = \overline{a} (T'(v), u) = (a T'(v), u), \quad u \in U. \tag{4.3.24}$$

This shows $T'(av) = a T'(v)$ and homogeneity also follows.

Of particular interest is a mapping from $U$ into itself. In this case we can define a self-dual or self-adjoint mapping $T$ with respect to the positive definite scalar product $(\cdot, \cdot)$ to be such that $T' = T$. Similar to Definition 4.5, we also have the following.

Definition 4.8 Let $U$ be a real or complex vector space equipped with a positive definite scalar product $(\cdot, \cdot)$. Assume that $T \in L(U)$ satisfies $(T(u), T(v)) = (u, v)$ for any $u, v \in U$.

(1) $T$ is called orthogonal when $U$ is real.
(2) $T$ is called unitary when $U$ is complex.

In analogy with Theorem 4.6, we have the following.

Theorem 4.9 That $T \in L(U)$ is orthogonal or unitary is equivalent to each of the following statements.

(1) $T$ is norm-preserving. That is, $\|T(u)\| = \|u\|$ for any $u \in U$, where $\| \cdot \|$ is the norm of $U$ induced from its positive definite scalar product $(\cdot, \cdot)$.
(2) $T$ maps an orthonormal basis to another orthonormal basis of $U$.
(3) $T' \circ T = T \circ T' = I$.

Proof We need only to carry out the proof in the complex case because now the scalar product $(\cdot, \cdot)$ fails to be symmetric and the relation (4.2.18) is invalid.

That $T$ being unitary implies (1) is trivial since $\|T(u)\|^2 = (T(u), T(u)) = (u, u) = \|u\|^2$ for any $u \in U$.

Assume (1) holds. From the expansions

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2 + 2 \Re\{(u, v)\}, \tag{4.3.25}$$

$$\|iu + v\|^2 = \|u\|^2 + \|v\|^2 + 2 \Im\{(u, v)\}, \tag{4.3.26}$$

we obtain the following polarization identity in the complex situation:

$$(u, v) = \frac{1}{2} \left( \|u + v\|^2 - \|u\|^2 - \|v\|^2 \right) + \frac{i}{2} \left( \|iu + v\|^2 - \|u\|^2 - \|v\|^2 \right), \quad u, v \in U. \tag{4.3.27}$$

Applying (4.3.27), we obtain

$$\begin{aligned} (T(u), T(v)) &= \frac{1}{2} \left( \|T(u + v)\|^2 - \|T(u)\|^2 - \|T(v)\|^2 \right) + \frac{i}{2} \left( \|T(iu + v)\|^2 - \|T(u)\|^2 - \|T(v)\|^2 \right) \\ &= \frac{1}{2} \left( \|u + v\|^2 - \|u\|^2 - \|v\|^2 \right) + \frac{i}{2} \left( \|iu + v\|^2 - \|u\|^2 - \|v\|^2 \right) \\ &= (u, v), \quad u, v \in U. \end{aligned} \tag{4.3.28}$$

Hence $T$ is unitary.

The rest of the proof is similar to that of Theorem 4.6 and thus skipped.
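As a sanity check on the polarization identity (4.3.27), the sketch below rebuilds the Hermitian product on $\mathbb{C}^3$ from norms alone and compares it with a direct evaluation. It assumes the convention of (4.3.2), conjugate-linear in the first argument; the names `inner`, `norm_sq`, `polarization` and the sample vectors are hypothetical.

```python
def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm_sq(u):
    """Squared induced norm ||u||^2 = (u, u)."""
    return inner(u, u).real

def polarization(u, v):
    """Right-hand side of (4.3.27), computed from norms only."""
    s = [a + b for a, b in zip(u, v)]        # u + v
    t = [1j * a + b for a, b in zip(u, v)]   # iu + v
    real_part = 0.5 * (norm_sq(s) - norm_sq(u) - norm_sq(v))   # see (4.3.25)
    imag_part = 0.5 * (norm_sq(t) - norm_sq(u) - norm_sq(v))   # see (4.3.26)
    return complex(real_part, imag_part)

# Hypothetical sample vectors in C^3.
u = [1 + 2j, -1j, 3]
v = [2, 1 - 1j, 1j]
difference = abs(polarization(u, v) - inner(u, v))   # rounding error only
```

Because the right-hand side uses only norms, any norm-preserving linear map automatically preserves the scalar product, which is the step from statement (1) to unitarity in the proof above.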

If $T \in L(U)$ is orthogonal or unitary and $\lambda \in \mathbb{C}$ is an eigenvalue of $T$, then it is clear that $|\lambda| = 1$ since $T$ is norm-preserving.

Let $A \in \mathbb{F}(n, n)$ and define $T_A \in L(\mathbb{F}^n)$ in the usual way $T_A(u) = Au$ for any column vector $u \in \mathbb{F}^n$.

When $\mathbb{F} = \mathbb{R}$, let the positive definite scalar product be the Euclidean one given in (4.3.1). That is,

$$(u, v) = u^t v, \quad u, v \in \mathbb{R}^n. \tag{4.3.29}$$

Thus

$$(u, T_A(v)) = u^t A v = (A^t u)^t v = (T_A'(u), v), \quad u, v \in \mathbb{R}^n. \tag{4.3.30}$$

Therefore $T_A'(u) = A^t u$ ($u \in \mathbb{R}^n$). If $T_A$ is orthogonal, then $T_A' \circ T_A = T_A \circ T_A' = I$, which leads to $A^t A = A A^t = I_n$. Besides, a self-adjoint mapping $T_A = T_A'$ is defined by a symmetric matrix, $A = A^t$.

The above discussion may be carried over to the abstract setting as follows.

Let $U$ be a real vector space with a positive definite scalar product $(\cdot, \cdot)$ and $B = \{u_1, \dots, u_n\}$ an orthonormal basis. Assume $T \in L(U)$ is represented by the matrix $A \in \mathbb{R}(n, n)$ with respect to the basis $B$ so that

$$T(u_j) = \sum_{i=1}^{n} a_{ij} u_i, \quad j = 1, \dots, n. \tag{4.3.31}$$

Similarly $T' \in L(U)$ is represented by $A' = (a'_{ij}) \in \mathbb{R}(n, n)$. Then we have

$$a_{ij} = (u_i, T(u_j)) = (T'(u_i), u_j) = a'_{ji}, \quad i, j = 1, \dots, n. \tag{4.3.32}$$


So we again have $A' = A^t$. If $T$ is orthogonal, then $T \circ T' = T' \circ T = I$, which gives us $A A^t = A^t A = I_n$ as before. If $T$ is self-adjoint, $T = T'$, then $A$ is again a symmetric matrix.

When $\mathbb{F} = \mathbb{C}$, let the positive definite scalar product be the Hermitian one given in (4.3.2). Then

$$(u, v) = \overline{u}^t v, \quad u, v \in \mathbb{C}^n. \tag{4.3.33}$$

Thus

$$(u, T_A(v)) = \overline{u}^t A v = \left( \overline{\overline{A}^t u} \right)^t v = (T_A'(u), v), \quad u, v \in \mathbb{C}^n. \tag{4.3.34}$$

Therefore $T_A'(u) = \overline{A}^t u$ ($u \in \mathbb{C}^n$). If $T_A$ is unitary, then $T_A' \circ T_A = T_A \circ T_A' = I$, which leads to $\overline{A}^t A = A \overline{A}^t = I_n$. Besides, a self-adjoint mapping $T_A = T_A'$ is defined by a matrix which is invariant under the combined operation of complex conjugation and transposition. That is, $A = \overline{A}^t$.

Similar to the real vector space situation, we leave it as an exercise to verify that, in the abstract setting, the matrix representing the adjoint mapping with respect to an orthonormal basis is obtained by taking the transpose and complex conjugate of the matrix of the original mapping with respect to the same basis.
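The relation $(u, T_A(v)) = (T_A'(u), v)$ with $T_A'(u) = \overline{A}^t u$ can be illustrated numerically. The following plain-Python sketch assumes the standard Hermitian product on $\mathbb{C}^2$; the helpers `inner`, `matvec`, `dagger` and the sample matrices `A`, `H`, `Q` are hypothetical names introduced only for this example.

```python
from math import sqrt

def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dagger(A):
    """Conjugate transpose of A (the Hermitian conjugate)."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

# Hypothetical sample data in C^2.
A = [[1 + 1j, 2], [-1j, 3 - 2j]]
u = [1, 2 - 1j]
v = [1j, 4]

lhs = inner(u, matvec(A, v))          # (u, T_A(v))
rhs = inner(matvec(dagger(A), u), v)  # (T_A'(u), v) with T_A' given by dagger(A)

# A Hermitian matrix and a unitary matrix, for comparison.
H = [[2, 1 - 1j], [1 + 1j, 3]]
Q = [[1 / sqrt(2), 1j / sqrt(2)], [1j / sqrt(2), 1 / sqrt(2)]]
```

In this sketch `H` equals its own conjugate transpose, so it defines a self-adjoint mapping, while `Q` preserves the scalar product.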

The above calculations lead us to formulate the following concepts, which were originally introduced in Section 1.1 without explanation.

Definition 4.10 A real matrix $A \in \mathbb{R}(n, n)$ is said to be orthogonal if its transpose $A^t$ is its inverse. That is, $A A^t = A^t A = I_n$. It is easily checked that $A$ is orthogonal if and only if its sets of column and row vectors both form orthonormal bases of $\mathbb{R}^n$ with the standard Euclidean scalar product.

A complex matrix $A \in \mathbb{C}(n, n)$ is said to be unitary if the complex conjugate of its transpose $\overline{A}^t$, also called its Hermitian conjugate and denoted by $A^\dagger = \overline{A}^t$, is the inverse of $A$. That is, $A A^\dagger = A^\dagger A = I_n$. It is easily checked that $A$ is unitary if and only if its sets of column and row vectors both form orthonormal bases of $\mathbb{C}^n$ with the standard Hermitian scalar product.

A complex matrix $A \in \mathbb{C}(n, n)$ is called Hermitian if it satisfies the property $A = A^\dagger$ or, equivalently, if it defines a self-adjoint mapping $T_A = T_A'$.

With the above terminology and the Gram–Schmidt procedure, we may establish a well-known matrix factorization result, commonly referred to as the QR factorization, for a non-singular matrix.

In fact, let $A \in \mathbb{C}(n, n)$ be non-singular and use $u_1, \dots, u_n$ to denote the $n$ corresponding column vectors of $A$ that form a basis of $\mathbb{C}^n$. Use $(\cdot, \cdot)$ to denote the Hermitian scalar product on $\mathbb{C}^n$. That is, $(u, v) = u^\dagger v$, where $u, v \in \mathbb{C}^n$ are column vectors. Apply the Gram–Schmidt procedure to set

$$\begin{cases} v_1 = u_1, \\ v_2 = u_2 - \dfrac{(v_1, u_2)}{(v_1, v_1)} v_1, \\ \quad \cdots \\ v_n = u_n - \dfrac{(v_1, u_n)}{(v_1, v_1)} v_1 - \cdots - \dfrac{(v_{n-1}, u_n)}{(v_{n-1}, v_{n-1})} v_{n-1}. \end{cases} \tag{4.3.35}$$

Then $\{v_1, \dots, v_n\}$ is an orthogonal basis of $\mathbb{C}^n$. Set $w_i = (1/\|v_i\|) v_i$ for $i = 1, \dots, n$. We see that $\{w_1, \dots, w_n\}$ is an orthonormal basis of $\mathbb{C}^n$. Therefore, inverting (4.3.35) and rewriting the resulting relations in terms of $\{w_1, \dots, w_n\}$, we get

$$\begin{cases} u_1 = \|v_1\| w_1, \\ u_2 = \dfrac{(v_1, u_2)}{(v_1, v_1)} \|v_1\| w_1 + \|v_2\| w_2, \\ \quad \cdots \\ u_n = \dfrac{(v_1, u_n)}{(v_1, v_1)} \|v_1\| w_1 + \cdots + \dfrac{(v_{n-1}, u_n)}{(v_{n-1}, v_{n-1})} \|v_{n-1}\| w_{n-1} + \|v_n\| w_n. \end{cases} \tag{4.3.36}$$

For convenience, we may express (4.3.36) in the compressed form

$$\begin{cases} u_1 = r_{11} w_1, \\ u_2 = r_{12} w_1 + r_{22} w_2, \\ \quad \cdots \\ u_n = r_{1n} w_1 + \cdots + r_{n-1,n} w_{n-1} + r_{nn} w_n, \end{cases} \tag{4.3.37}$$

which implies the matrix relation

$$A = (u_1, \dots, u_n) = (w_1, \dots, w_n) \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \end{pmatrix} = QR, \tag{4.3.38}$$

where $Q = (w_1, \dots, w_n) \in \mathbb{C}(n, n)$ is unitary since its column vectors are mutually perpendicular and of unit length, and $R = (r_{ij}) \in \mathbb{C}(n, n)$ is upper triangular with positive diagonal entries since $r_{ii} = \|v_i\|$ for $i = 1, \dots, n$.


This explicit construction is the desired QR factorization for $A$. It is clear that if $A$ is real then the matrices $Q$ and $R$ are also real and $Q$ is orthogonal.
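The construction (4.3.35) to (4.3.38) translates almost line by line into code. The sketch below is a plain-Python illustration under the assumption that the input is a non-singular square matrix stored as a list of rows; it uses the observation that $r_{ij} = (v_i, u_j)\|v_i\| / (v_i, v_i) = (w_i, u_j)$ for $i < j$. The function names and the sample matrix are hypothetical.

```python
from math import sqrt

def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def qr_gram_schmidt(A):
    """Return (Q, R) with A = QR, following (4.3.35)-(4.3.38)."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]   # u_1, ..., u_n
    W = []                                                   # w_1, ..., w_n
    R = [[0j] * n for _ in range(n)]
    for j, u in enumerate(cols):
        v = [complex(x) for x in u]
        for i, w in enumerate(W):
            R[i][j] = inner(w, u)                  # r_ij = (w_i, u_j), i < j
            v = [vk - R[i][j] * wk for vk, wk in zip(v, w)]
        R[j][j] = complex(sqrt(inner(v, v).real))  # r_jj = ||v_j|| > 0
        W.append([vk / R[j][j] for vk in v])
    Q = [[W[j][i] for j in range(n)] for i in range(n)]      # columns are w_j
    return Q, R

def matmul(P, S):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * S[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical non-singular matrix (its determinant is 3).
A = [[1, 1j, 0], [1j, 1, 1], [0, 1, 2]]
Q, R = qr_gram_schmidt(A)
QR = matmul(Q, R)
```

This is a direct transcription for small examples rather than a numerically robust routine; for ill-conditioned matrices other approaches (for instance Householder reflections) are commonly used instead.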

To end this section, we note that the norm of a vector $u$ in a vector space $U$ equipped with a positive definite scalar product $(\cdot, \cdot)$ may be evaluated through the following useful expression

$$\|u\| = \sup\{ |(u, v)| \mid v \in U, \; \|v\| = 1 \}. \tag{4.3.39}$$

Indeed, let $\eta$ denote the right-hand side of (4.3.39). From the Schwarz inequality (4.3.10), we get $|(u, v)| \leq \|u\|$ for any $v \in U$ satisfying $\|v\| = 1$. So $\eta \leq \|u\|$. To show that $\eta \geq \|u\|$, it suffices to consider the nontrivial situation $u \neq 0$. In this case, we have

$$\eta \geq \left| \left( u, \frac{1}{\|u\|} u \right) \right| = \|u\|. \tag{4.3.40}$$

Hence, in conclusion, $\eta = \|u\|$ and (4.3.39) follows.

Exercises

4.3.1 Let $U$ be a vector space with a positive definite scalar product $(\cdot, \cdot)$. Show that $u_1, \dots, u_k \in U$ are linearly independent if and only if their associated metric matrix, also called the Gram matrix,

$$M = \begin{pmatrix} (u_1, u_1) & \cdots & (u_1, u_k) \\ \cdots & \cdots & \cdots \\ (u_k, u_1) & \cdots & (u_k, u_k) \end{pmatrix} \tag{4.3.41}$$

is nonsingular.

4.3.2 (Continued from Exercise 4.3.1) Show that if $u \in U$ lies in $\operatorname{Span}\{u_1, \dots, u_k\}$ then the column vector $((u_1, u), \dots, (u_k, u))^t$ lies in the column space of the metric matrix $M$. However, the converse is not true when $k < \dim(U)$.

4.3.3 Let $U$ be a complex vector space with a positive definite scalar product and $B = \{u_1, \dots, u_n\}$ an orthonormal basis of $U$. For $T \in L(U)$, let $A, A' \in \mathbb{C}(n, n)$ be the matrices that represent $T, T'$, respectively, with respect to the basis $B$. Show that $A' = A^\dagger$.

4.3.4 Let $U$ be a finite-dimensional complex vector space with a positive definite scalar product and $S \in L(U)$ be anti-self-adjoint. That is, $S' = -S$. Show that $I \pm S$ must be invertible.

4.3.5 Consider the complex vector space $\mathbb{C}(m, n)$. Show that

$$(A, B) = \operatorname{Tr}(A^\dagger B), \quad A, B \in \mathbb{C}(m, n) \tag{4.3.42}$$


defines a positive definite scalar product over $\mathbb{C}(m, n)$ that extends the traditional Hermitian scalar product over $\mathbb{C}^m = \mathbb{C}(m, 1)$. With such a scalar product, the quantity $\|A\| = \sqrt{(A, A)}$ is sometimes called the Hilbert–Schmidt norm of the matrix $A \in \mathbb{C}(m, n)$.

4.3.6 Let $(\cdot, \cdot)$ be the standard Hermitian scalar product on $\mathbb{C}^m$ and $A \in \mathbb{C}(m, n)$. Establish the following statement known as the Fredholm alternative for complex matrix equations: Given $b \in \mathbb{C}^m$, the non-homogeneous equation $Ax = b$ has a solution for some $x \in \mathbb{C}^n$ if and only if $(y, b) = 0$ for any solution $y \in \mathbb{C}^m$ of the homogeneous equation $A^\dagger y = 0$.

4.3.7 For $A \in \mathbb{C}(n, n)$ show that if the column vectors of $A$ form an orthonormal basis of $\mathbb{C}^n$ with the standard Hermitian scalar product, then so do the row vectors of $A$.

4.3.8 For the matrix

$$A = \begin{pmatrix} 1 & -1 & 1 \\ -1 & 1 & 2 \\ 2 & 1 & -2 \end{pmatrix}, \tag{4.3.43}$$

obtain a QR factorization.

4.4 Orthogonal resolutions of vectors

In this section, we continue to study a finite-dimensional vector space $U$ with a positive definite scalar product $(\cdot, \cdot)$. We focus our attention on the problem of resolving a vector into the span of a given set of orthogonal vectors. To avoid the trivial situation, we always assume that the set of the orthogonal vectors concerned never contains a zero vector unless otherwise stated.

We begin with the following basic orthogonal decomposition theorem.

Theorem 4.11 Let $V$ be a subspace of $U$. Then there holds the orthogonal decomposition

$$U = V \oplus V^\perp. \tag{4.4.1}$$

Proof If $V = \{0\}$, there is nothing to show. Assume $V \neq \{0\}$. Let $\{v_1, \dots, v_k\}$ be an orthogonal basis of $V$. We can then expand $\{v_1, \dots, v_k\}$ into an orthogonal basis for $U$, which may be denoted as $\{v_1, \dots, v_k, w_1, \dots, w_l\}$, where $k + l = n = \dim(U)$. Of course, $w_1, \dots, w_l \in V^\perp$. For any $u \in V^\perp$, we rewrite $u$ as

$$u = a_1 v_1 + \cdots + a_k v_k + b_1 w_1 + \cdots + b_l w_l, \tag{4.4.2}$$


with some scalars $a_1, \dots, a_k, b_1, \dots, b_l$. From $(v_i, u) = a_i (v_i, v_i) = 0$ ($i = 1, \dots, k$), we obtain $a_1 = \cdots = a_k = 0$, which establishes $u \in \operatorname{Span}\{w_1, \dots, w_l\}$. So $V^\perp \subset \operatorname{Span}\{w_1, \dots, w_l\}$. Therefore $V^\perp = \operatorname{Span}\{w_1, \dots, w_l\}$ and $U = V + V^\perp$.

Finally, take $u \in V \cap V^\perp$. Then $(u, u) = 0$. By the positivity condition, we get $u = 0$. Thus (4.4.1) follows.

We now follow (4.4.1) to concretely construct the orthogonal decomposition of a given vector.

Theorem 4.12 Let $V$ be a nonzero subspace of $U$ with an orthogonal basis $\{v_1, \dots, v_k\}$. Any $u \in U$ may be uniquely decomposed into the form

$$u = v + w, \quad v \in V, \; w \in V^\perp, \tag{4.4.3}$$

where $v$ is given by the expression

$$v = \sum_{i=1}^{k} \frac{(v_i, u)}{(v_i, v_i)} v_i. \tag{4.4.4}$$

Moreover, the vector $v$ given in (4.4.4) is the unique solution of the minimization problem

$$\eta \equiv \inf\{ \|u - x\| \mid x \in V \}. \tag{4.4.5}$$

Proof The validity of the expression (4.4.3) for some unique $v \in V$ and $w \in V^\perp$ is already ensured by Theorem 4.11.

We rewrite $v$ as

$$v = \sum_{i=1}^{k} a_i v_i. \tag{4.4.6}$$

Then, since $w \in V^\perp$, we have $(v_i, u) = (v_i, v) = a_i (v_i, v_i)$ ($i = 1, \dots, k$). That is,

$$a_i = \frac{(v_i, u)}{(v_i, v_i)}, \quad i = 1, \dots, k, \tag{4.4.7}$$

which verifies (4.4.4).

For the scalars $a_1, \dots, a_k$ given in (4.4.7) and $v$ in (4.4.6), we have from (4.4.3) the relation

$$u - \sum_{i=1}^{k} b_i v_i = w + \sum_{i=1}^{k} (a_i - b_i) v_i, \quad x = \sum_{i=1}^{k} b_i v_i \in V. \tag{4.4.8}$$

Consequently,

$$\begin{aligned} \|u - x\|^2 &= \left\| u - \sum_{i=1}^{k} b_i v_i \right\|^2 = \left\| w + \sum_{i=1}^{k} (a_i - b_i) v_i \right\|^2 \\ &= \|w\|^2 + \sum_{i=1}^{k} |a_i - b_i|^2 \|v_i\|^2 \\ &\geq \|w\|^2, \end{aligned} \tag{4.4.9}$$

and the lower bound $\|w\|^2$ is attained only when $b_i = a_i$ for all $i = 1, \dots, k$, or $x = v$.

So the proof is complete.
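The content of Theorem 4.12 can be illustrated on a small example. The sketch below, in plain Python, projects a vector of $\mathbb{R}^3$ onto a two-dimensional subspace using (4.4.4) and compares distances as in (4.4.5). The orthogonal basis `V_basis`, the vector `u`, and the helper names are hypothetical choices made for this illustration.

```python
from math import sqrt

def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm(u):
    """Induced norm ||u|| = sqrt((u, u))."""
    return sqrt(inner(u, u).real)

def project(ortho_basis, u):
    """Return v = sum_i (v_i, u)/(v_i, v_i) v_i, as in (4.4.4)."""
    v = [0j] * len(u)
    for b in ortho_basis:
        a = inner(b, u) / inner(b, b)
        v = [vk + a * bk for vk, bk in zip(v, b)]
    return v

# Hypothetical orthogonal basis of a plane V in R^3 and a vector u.
V_basis = [[1, 1, 0], [1, -1, 2]]
u = [3, 1, 4]
v = project(V_basis, u)                   # component of u in V
w = [uk - vk for uk, vk in zip(u, v)]     # component of u in the complement

def distance_to(coeffs):
    """Distance ||u - x|| for x = sum_i coeffs[i] * v_i in V."""
    x = [sum(c * b[k] for c, b in zip(coeffs, V_basis)) for k in range(len(u))]
    return norm([uk - xk for uk, xk in zip(u, x)])
```

For this data the coefficients in (4.4.7) are $a_1 = 2$ and $a_2 = 5/3$, and `distance_to` evaluated at any other coefficient pair is at least `norm(w)`, in line with (4.4.9).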

Definition 4.13 Let $\{v_1, \dots, v_k\}$ be a set of orthogonal vectors in $U$. For $u \in U$, the sum

$$\sum_{i=1}^{k} a_i v_i, \quad a_i = \frac{(v_i, u)}{(v_i, v_i)}, \quad i = 1, \dots, k, \tag{4.4.10}$$

is called the Fourier expansion of $u$ and $a_1, \dots, a_k$ are the Fourier coefficients, with respect to the orthogonal set $\{v_1, \dots, v_k\}$.

Of particular interest is when a set of orthogonal vectors becomes a basis.

Definition 4.14 Let $\{v_1, \dots, v_n\}$ be a set of orthogonal vectors in $U$. The set is said to be complete if it is a basis of $U$.

The completeness of a set of orthogonal vectors is seen to be characterized by the norms of vectors in relation to their Fourier coefficients.

Theorem 4.15 Let $U$ be a vector space with a positive definite scalar product and $V = \{v_1, \dots, v_n\}$ a set of orthogonal vectors in $U$. For any $u \in U$, let the scalars $a_1, \dots, a_n$ be the Fourier coefficients of $u$ with respect to $V$.

(1) There holds the inequality

$$\sum_{i=1}^{n} |a_i|^2 \|v_i\|^2 \leq \|u\|^2 \tag{4.4.11}$$


(which is often referred to as the Bessel inequality).

(2) We have $u \in \operatorname{Span}\{v_1, \dots, v_n\}$ if and only if equality in (4.4.11) is attained. That is,

$$\sum_{i=1}^{n} |a_i|^2 \|v_i\|^2 = \|u\|^2 \tag{4.4.12}$$

(which is often referred to as the Parseval identity). Therefore, the set $V$ is complete if and only if the Parseval identity (4.4.12) holds for any $u \in U$.

Proof For given $u$, use $v$ to denote the Fourier expansion of $u$ as stated in (4.4.3) and (4.4.4) with $k = n$. Then $\|u\|^2 = \|w\|^2 + \|v\|^2$, where $\|v\|^2 = \sum_{i=1}^{n} |a_i|^2 \|v_i\|^2$ by the orthogonality of $v_1, \dots, v_n$. In particular, $\|v\|^2 \leq \|u\|^2$, which is (4.4.11).

It is clear that $u \in \operatorname{Span}\{v_1, \dots, v_n\}$ if and only if $w = 0$, which is equivalent to the fulfillment of the equality $\|u\|^2 = \|v\|^2$, which is (4.4.12).

If $\{u_1, \dots, u_n\}$ is a set of orthogonal unit vectors in $U$, the Fourier expansion and Fourier coefficients of a vector $u \in U$ with respect to $\{u_1, \dots, u_n\}$ take the elegant forms

$$\sum_{i=1}^{n} a_i u_i, \quad a_i = (u_i, u), \quad i = 1, \dots, n, \tag{4.4.13}$$

such that the Bessel inequality becomes

$$\sum_{i=1}^{n} |(u_i, u)|^2 \leq \|u\|^2. \tag{4.4.14}$$

Therefore $\{u_1, \dots, u_n\}$ is complete if and only if the Parseval identity

$$\sum_{i=1}^{n} |(u_i, u)|^2 = \|u\|^2 \tag{4.4.15}$$

holds for any $u \in U$.

It is interesting to note that, since $a_i = (v_i, u)/\|v_i\|^2$ ($i = 1, \dots, n$) in (4.4.11) and (4.4.12), the inequalities (4.4.11) and (4.4.14), and the identities (4.4.12) and (4.4.15), are actually of the same structure, respectively.
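To see the Bessel inequality and the Parseval identity side by side, the following plain-Python sketch evaluates the left-hand side of (4.4.11) for an incomplete and for a complete orthogonal set in $\mathbb{R}^3$. The sample vectors and the helper name `fourier_energy` are assumptions introduced for illustration only.

```python
def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def fourier_energy(ortho_set, u):
    """Left-hand side of (4.4.11): sum of |a_i|^2 ||v_i||^2 with a_i from (4.4.10)."""
    total = 0.0
    for v in ortho_set:
        a = inner(v, u) / inner(v, v)            # Fourier coefficient a_i
        total += abs(a) ** 2 * inner(v, v).real
    return total

# Hypothetical mutually orthogonal vectors in R^3.
v1, v2, v3 = [1, 1, 0], [1, -1, 2], [-1, 1, 1]
u = [3, 1, 4]
u_norm_sq = inner(u, u).real                       # ||u||^2 = 26

incomplete = fourier_energy([v1, v2], u)           # strictly less than ||u||^2
complete = fourier_energy([v1, v2, v3], u)         # equals ||u||^2
```

With these values the incomplete set gives $8 + 50/3 = 74/3 < 26$, while adding the third vector contributes the missing $4/3$ and restores equality.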

Exercises

4.4.1 Let $U$ be a finite-dimensional vector space with a positive definite scalar product $(\cdot, \cdot)$, and $S = \{u_1, \dots, u_n\}$ an orthogonal set of vectors in $U$. Show that the set $S$ is complete if and only if $S^\perp = \{0\}$.

4.4.2 Let $V$ be a nontrivial subspace of a vector space $U$ with a positive definite scalar product. Show that, if $\{v_1, \dots, v_k\}$ is an orthogonal basis of $V$, then the mapping $P : U \to U$ given by its Fourier expansion,

$$P(u) = \sum_{i=1}^{k} \frac{(v_i, u)}{(v_i, v_i)} v_i, \quad u \in U, \tag{4.4.16}$$

is the projection of $U$ along $V^\perp$ onto $V$. That is, $P \in L(U)$, $P^2 = P$, $N(P) = V^\perp$, and $R(P) = V$.

4.4.3 Let $U$ be a finite-dimensional vector space with a positive definite scalar product $(\cdot, \cdot)$, and $V$ a subspace of $U$. Use $P_V : U \to U$ to denote the projection of $U$ onto $V$ along $V^\perp$. Prove that if $W$ is a subspace of $U$ containing $V$ then

$$\|u - P_V(u)\| \geq \|u - P_W(u)\|, \quad u \in U, \tag{4.4.17}$$

and $V = W$ if and only if equality in (4.4.17) holds, where the norm $\| \cdot \|$ of $U$ is induced from the positive definite scalar product $(\cdot, \cdot)$. In other words, orthogonal projection of a vector into a larger subspace provides a better approximation of the vector.

4.4.4 Let $V$ be a nontrivial subspace of a finite-dimensional vector space $U$ with a positive definite scalar product. Recall that over $U/V$ we may define the norm

$$\|[u]\| = \inf\{ \|x\| \mid x \in [u] \}, \quad [u] \in U/V, \tag{4.4.18}$$

for the quotient space $U/V$.

(i) Prove that for each $[u] \in U/V$ there is a unique $w \in [u]$ such that $\|[u]\| = \|w\|$.
(ii) Find a practical method to compute the vector $w$ shown to exist in part (i) among the coset $[u]$.

4.4.5 Consider the vector space $P_n$ of real-coefficient polynomials in variable $t$ of degrees up to $n \geq 1$ with the positive definite scalar product

$$(u, v) = \int_{-1}^{1} u(t) v(t) \, dt, \quad u, v \in P_n. \tag{4.4.19}$$

Applying the Gram–Schmidt procedure to the standard basis $\{1, t, \dots, t^n\}$ of $P_n$, we may construct an orthonormal basis, say $\{L_0, L_1, \dots, L_n\}$, of $P_n$. The polynomials $\{L_i(t)\}$ are the well-known Legendre polynomials. Explain why the degree of each $L_i(t)$ must be $i$ ($i = 0, 1, \dots, n$) and find $L_i(t)$ for $i = 0, 1, 2, 3, 4$.


4.4.6 In $P_2$, find the Fourier expansion of the polynomial $u(t) = -3t^2 + t - 5$ in terms of the Legendre polynomials.

4.4.7 Let $P_n$ be the vector space with the scalar product defined in (4.4.19). For $f \in P_3'$ satisfying

$$f(1) = -1, \quad f(t) = 2, \quad f(t^2) = 6, \quad f(t^3) = -5, \tag{4.4.20}$$

find an element $v \in P_3$ such that $v$ is the pre-image of $f \in P_3'$ under the Riesz mapping $\rho : P_3 \to P_3'$, or $\rho(v) = f$. That is, there holds

$$f(u) = (u, v), \quad u \in P_3. \tag{4.4.21}$$

4.5 Orthogonal and unitary versus isometric mappings

Let $U$ be a vector space with a positive definite scalar product $(\cdot, \cdot)$, which induces a norm $\| \cdot \|$ on $U$. If $T \in L(U)$ is orthogonal or unitary, then it is clear that

$$\|T(u) - T(v)\| = \|u - v\|, \quad u, v \in U. \tag{4.5.1}$$

In other words, the distance between the images of any two vectors in $U$ under $T$ is the same as that between the two vectors. A mapping from $U$ into itself satisfying such a property is called an isometry or isometric. In this section, we show that any zero-vector preserving mapping from a real vector space $U$ with a positive definite scalar product into itself satisfying the property (4.5.1) must be linear. Therefore, in view of Theorem 4.6, it is orthogonal. In other words, in the real setting, being isometric characterizes a mapping being orthogonal.

Theorem 4.16 Let $U$ be a real vector space with a positive definite scalar product. A mapping $T$ from $U$ into itself satisfies the isometric property (4.5.1) and $T(0) = 0$ if and only if it is orthogonal.

Proof Assume $T$ satisfies $T(0) = 0$ and (4.5.1). We show that $T$ must be linear. To this end, from (4.5.1) and replacing $v$ by $0$, we get $\|T(u)\| = \|u\|$ for any $u \in U$. On the other hand, the symmetry of the scalar product $(\cdot, \cdot)$ gives us the identity

$$(u, v) = \frac{1}{2} \left( \|u + v\|^2 - \|u\|^2 - \|v\|^2 \right), \quad u, v \in U. \tag{4.5.2}$$


Replacing $u, v$ in (4.5.2) by $T(u), -T(v)$, respectively, we get

$$\begin{aligned} -(T(u), T(v)) &= \frac{1}{2} \left( \|T(u) - T(v)\|^2 - \|T(u)\|^2 - \|{-T(v)}\|^2 \right) \\ &= \frac{1}{2} \left( \|u - v\|^2 - \|u\|^2 - \|v\|^2 \right) \\ &= -(u, v). \end{aligned} \tag{4.5.3}$$

Hence $(T(u), T(v)) = (u, v)$ for any $u, v \in U$. Using this result, we have

$$\begin{aligned} \|T(u + v) - T(u) - T(v)\|^2 &= \|T(u + v)\|^2 + \|T(u)\|^2 + \|T(v)\|^2 \\ &\quad - 2(T(u + v), T(u)) - 2(T(u + v), T(v)) + 2(T(u), T(v)) \\ &= \|u + v\|^2 + \|u\|^2 + \|v\|^2 - 2(u + v, u) - 2(u + v, v) + 2(u, v) \\ &= \|(u + v) - u - v\|^2 = 0, \end{aligned} \tag{4.5.4}$$

which proves the additivity condition

$$T(u + v) = T(u) + T(v), \quad u, v \in U. \tag{4.5.5}$$

Besides, setting $v = -u$ in (4.5.5) and using $T(0) = 0$, we have $T(-u) = -T(u)$ for any $u \in U$.

Moreover, (4.5.5) also implies that, for any integer $m$, we have $T(mu) = mT(u)$. Replacing $u$ by $\frac{1}{m} u$ where $m$ is a nonzero integer, we also have $T\left(\frac{1}{m} u\right) = \frac{1}{m} T(u)$. Combining these results, we conclude that

$$T(ru) = rT(u), \quad r \in \mathbb{Q}, \; u \in U. \tag{4.5.6}$$

Finally, for any $a \in \mathbb{R}$, let $\{r_k\}$ be a sequence in $\mathbb{Q}$ such that $r_k \to a$ as $k \to \infty$. Then we find

$$\begin{aligned} \|T(au) - aT(u)\| &\leq \|T(au) - T(r_k u)\| + \|r_k T(u) - aT(u)\| \\ &= |a - r_k| \|u\| + |r_k - a| \|T(u)\| \to 0 \quad \text{as } k \to \infty. \end{aligned} \tag{4.5.7}$$

That is, $T(au) = aT(u)$ for any $a \in \mathbb{R}$ and $u \in U$. So homogeneity is established and the proof follows.

We next give an example showing that, in the complex situation, being isometric alone is not sufficient to ensure that a mapping is unitary.

The vector space we consider here is taken to be $\mathbb{C}^n$ with the standard Hermitian scalar product (4.3.2) and the mapping $T : \mathbb{C}^n \to \mathbb{C}^n$ is given by $T(u) = \overline{u}$ for $u \in \mathbb{C}^n$. Then $T$ satisfies (4.5.1) and $T(0) = 0$. However,


$T$ is not homogeneous with respect to scalar multiplication. More precisely, $T(au) \neq aT(u)$ whenever $a \in \mathbb{C}$ with $\Im(a) \neq 0$ and $u \in \mathbb{C}^n \setminus \{0\}$. Hence $T \notin L(U)$.
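The behavior of the conjugation map is easy to confirm numerically. The sketch below, in plain Python on $\mathbb{C}^2$, measures how far $T(u) = \overline{u}$ is from violating the isometric property (it is not) and how far it is from being homogeneous (it fails by $2\|u\|$ for $a = i$). The helper names and sample vectors are hypothetical.

```python
from math import sqrt

def inner(u, v):
    """Hermitian product (u, v) = sum of conj(u_i) * v_i, as in (4.3.2)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def norm(u):
    """Induced norm ||u|| = sqrt((u, u))."""
    return sqrt(inner(u, u).real)

def T(u):
    """Componentwise complex conjugation."""
    return [complex(a).conjugate() for a in u]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(a, u):
    return [a * x for x in u]

# Hypothetical sample vectors in C^2.
u = [1 + 2j, -1j]
v = [0.5, 3 - 1j]

# Isometric property (4.5.1): this defect is zero up to rounding.
isometry_defect = abs(norm(sub(T(u), T(v))) - norm(sub(u, v)))

# Homogeneity fails for a = i: T(iu) - iT(u) = -2i * conj(u).
homogeneity_defect = norm(sub(T(scale(1j, u)), scale(1j, T(u))))

# The extra condition ||iT(p) - T(q)|| = ||ip - q|| also fails, e.g. for q = ip.
p, q = [1, 0], [1j, 0]
lhs = norm(sub(scale(1j, T(p)), T(q)))   # equals 2
rhs = norm(sub(scale(1j, p), q))         # equals 0
```

The last two quantities show that conjugation does not satisfy the additional norm condition that is combined with (4.5.1) later in this section to characterize unitary mappings.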

It will be interesting to spell out some conditions in the complex situation that, when put together with (4.5.1) and the zero-vector preserving property, ensure that a mapping is unitary. The following theorem is such a result.

Theorem 4.17 Let $U$ be a complex vector space with a positive definite scalar product $(\cdot, \cdot)$. A mapping $T$ from $U$ into itself satisfies the isometric property (4.5.1), $T(0) = 0$, and

$$\|iT(u) - T(v)\| = \|iu - v\|, \quad u, v \in U, \tag{4.5.8}$$

if and only if it is unitary, where $\| \cdot \|$ is induced from $(\cdot, \cdot)$.

Proof It is obvious that if $T \in L(U)$ is unitary then both (4.5.1) and (4.5.8) are fulfilled. We now need to show that the converse is true.

First, setting $v = iu$ in (4.5.8), we obtain

$$T(iu) = iT(u), \quad u \in U. \tag{4.5.9}$$

Next, replacing $u, v$ in (4.3.27) by $T(u), -T(v)$, respectively, and using (4.5.1) and (4.5.8), together with $\|T(u)\| = \|u\|$, which follows from (4.5.1) and $T(0) = 0$, we have

$$\begin{aligned} -(T(u), T(v)) &= \frac{1}{2} \left( \|T(u) - T(v)\|^2 - \|T(u)\|^2 - \|T(v)\|^2 \right) + \frac{i}{2} \left( \|iT(u) - T(v)\|^2 - \|T(u)\|^2 - \|T(v)\|^2 \right) \\ &= \frac{1}{2} \left( \|u - v\|^2 - \|u\|^2 - \|v\|^2 \right) + \frac{i}{2} \left( \|iu - v\|^2 - \|u\|^2 - \|v\|^2 \right) \\ &= -(u, v), \quad u, v \in U. \end{aligned} \tag{4.5.10}$$

That is, $(T(u), T(v)) = (u, v)$ for $u, v \in U$. Thus a direct expansion gives us the result

$$\begin{aligned} \|T(u + v) - T(u) - T(v)\|^2 &= \|T(u + v)\|^2 + \|T(u)\|^2 + \|T(v)\|^2 \\ &\quad - 2\Re(T(u + v), T(u)) - 2\Re(T(u + v), T(v)) + 2\Re(T(u), T(v)) \\ &= \|u + v\|^2 + \|u\|^2 + \|v\|^2 - 2\Re(u + v, u) - 2\Re(u + v, v) + 2\Re(u, v) \\ &= \|(u + v) - u - v\|^2 = 0, \end{aligned} \tag{4.5.11}$$


which establishes the additivity property $T(u + v) = T(u) + T(v)$ for $u, v \in U$ as before. Therefore (4.5.6) holds in the current complex formalism.

In view of the additivity property of $T$, (4.5.6), and (4.5.9), we have

$$T((p + iq)u) = T(pu) + T(iqu) = pT(u) + iqT(u) = (p + iq)T(u), \quad p, q \in \mathbb{Q}, \; u \in U. \tag{4.5.12}$$

Finally, for any $a = b + ic \in \mathbb{C}$ where $b, c \in \mathbb{R}$, we may choose a sequence $\{r_k\}$ ($r_k = p_k + iq_k$ with $p_k, q_k \in \mathbb{Q}$) such that $p_k \to b$ and $q_k \to c$, or $r_k \to a$, as $k \to \infty$. In view of these and (4.5.12), we see that (4.5.7) holds in the complex situation as well. In other words, $T(au) = aT(u)$ for $a \in \mathbb{C}$ and $u \in U$.

It is worth noting that Theorems 4.16 and 4.17 are valid in general without restricting to finite-dimensional spaces.

Exercises

4.5.1 Let $U$ be a real vector space with a positive definite scalar product $(\cdot, \cdot)$. Establish the following variant of Theorem 4.16: A mapping $T$ from $U$ into itself satisfies the property

$$\|T(u) + T(v)\| = \|u + v\|, \quad u, v \in U, \tag{4.5.13}$$

if and only if it is orthogonal, where $\| \cdot \|$ is induced from $(\cdot, \cdot)$.

4.5.2 Show that the mapping $T : \mathbb{C}^n \to \mathbb{C}^n$ given by $T(u) = \overline{u}$ for $u \in \mathbb{C}^n$, which is equipped with the standard Hermitian scalar product, satisfies (4.5.13) but that it is not unitary.

4.5.3 Let $U$ be a complex vector space with a positive definite scalar product $(\cdot, \cdot)$. Establish the following variant of Theorem 4.17: A mapping $T$ from $U$ into itself satisfies the property (4.5.13) and

$$\|iT(u) + T(v)\| = \|iu + v\|, \quad u, v \in U, \tag{4.5.14}$$

if and only if it is unitary, where $\| \cdot \|$ is induced from $(\cdot, \cdot)$.

4.5.4 Check to see why the mapping $T$ defined in Exercise 4.5.2 fails to satisfy the property (4.5.14) with $U = \mathbb{C}^n$.

4.5.5 Let $T : \mathbb{R}^n \to \mathbb{R}^n$ ($n \geq 2$) be defined by

$$T(x) = \begin{pmatrix} x_n \\ \vdots \\ x_1 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n, \tag{4.5.15}$$


where $\mathbb{R}^n$ is equipped with the standard Euclidean scalar product.

(i) Show that $T$ is an isometry.
(ii) Determine all the eigenvalues of $T$.
(iii) Show that eigenvectors associated to different eigenvalues are mutually perpendicular.
(iv) Determine the minimal polynomial of $T$.

5

Real quadratic forms and self-adjoint mappings

In this chapter we exclusively consider vector spaces over the field of reals unless otherwise stated. We first present a general discussion on bilinear and quadratic forms and their matrix representations. We also show how a symmetric bilinear form may be uniquely represented by a self-adjoint mapping. We then establish the main spectrum theorem for self-adjoint mappings based on a proof of the existence of an eigenvalue using calculus. We next focus on characterizing the positive definiteness of self-adjoint mappings. After these we study the commutativity of self-adjoint mappings. In the last section we show the effectiveness of using self-adjoint mappings in computing the norm of a mapping between different spaces and in the formalism of least squares approximations.

5.1 Bilinear and quadratic forms

Let U be a finite-dimensional vector space over R. The simplest real-valued functions over U are linear functions, which were called functionals earlier and have already been studied. The next simplest real-valued functions to be studied are bilinear forms, whose definition is given as follows.

Definition 5.1 A function f : U × U → R is called a bilinear form if it satisfies, for any u, v, w ∈ U and a ∈ R, the following conditions.

(1) f(u + v, w) = f(u, w) + f(v, w), f(au, v) = af(u, v).

(2) f(u, v + w) = f(u, v) + f(u, w), f(u, av) = af(u, v).

Let B = {u1, . . . , un} be a basis of U. For u, v ∈ U with coordinate vectors x = (x1, . . . , xn)^t, y = (y1, . . . , yn)^t ∈ Rn with respect to B, we have

\[
f(u, v) = f\left( \sum_{i=1}^n x_i u_i, \sum_{j=1}^n y_j u_j \right) = \sum_{i,j=1}^n x_i f(u_i, u_j) y_j = x^t A y, \tag{5.1.1}
\]

where A = (aij) = (f(ui, uj)) ∈ R(n, n) is referred to as the matrix representation of the bilinear form f with respect to the basis B.

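Let $\tilde{\mathcal{B}} = \{\tilde{u}_1, \dots, \tilde{u}_n\}$ be another basis of U so that $\tilde{A} = (\tilde{a}_{ij}) = (f(\tilde{u}_i, \tilde{u}_j)) \in \mathbb{R}(n, n)$ is the matrix representation of f with respect to $\tilde{\mathcal{B}}$. If $\tilde{x}, \tilde{y} \in \mathbb{R}^n$ are the coordinate vectors of u, v ∈ U with respect to $\tilde{\mathcal{B}}$ and the basis transition matrix between the original basis $\mathcal{B} = \{u_1, \dots, u_n\}$ and $\tilde{\mathcal{B}}$ is the matrix $B = (b_{ij})$, so that

\[
\tilde{u}_j = \sum_{i=1}^n b_{ij} u_i, \quad j = 1, \dots, n, \tag{5.1.2}
\]

then $x = B\tilde{x}$, $y = B\tilde{y}$ (cf. Section 1.3). Hence we arrive at

\[
f(u, v) = \tilde{x}^t \tilde{A} \tilde{y} = x^t A y = \tilde{x}^t (B^t A B) \tilde{y}, \tag{5.1.3}
\]

which leads to the relation $\tilde{A} = B^t A B$ and gives rise to the following concept.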

Definition 5.2 For A, B ∈ F(n, n), we say that A and B are congruent if there is an invertible element C ∈ F(n, n) such that

A = C^t B C. (5.1.4)

Therefore, our calculation above shows that the matrix representations of a bilinear form with respect to different bases are congruent.
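As a small numerical check of this congruence relation, the following sketch evaluates a bilinear form in two coordinate systems. The matrix A, the transition matrix B, and the sample coordinate vectors are hypothetical values chosen only for illustration, and the helper names are not part of the text.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def bilinear(x, A, y):
    """Evaluate x^t A y for coordinate vectors x and y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

# Hypothetical matrix of f with respect to the original basis {u1, u2}.
A = [[1, 2], [0, 3]]
# Hypothetical transition matrix: column j holds the old coordinates of the j-th new basis vector.
B = [[1, 1], [0, 1]]

# Matrix of f with respect to the new basis, as in the relation derived from (5.1.3).
A_new = matmul(matmul(transpose(B), A), B)

# Coordinates of u and v in the new basis, and the corresponding old coordinates x = B x_new.
x_new, y_new = [2, -1], [1, 4]
x_old = [sum(B[i][j] * x_new[j] for j in range(2)) for i in range(2)]
y_old = [sum(B[i][j] * y_new[j] for j in range(2)) for i in range(2)]

value_old = bilinear(x_old, A, y_old)        # f(u, v) computed in the old basis
value_new = bilinear(x_new, A_new, y_new)    # f(u, v) computed in the new basis
print(A_new, value_old, value_new)
```

Both evaluations give the same number, as they must, since they compute f(u, v) for the same pair of vectors.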

For a bilinear form f : U × U → R, we can set

q(u) = f (u, u), u ∈ U, (5.1.5)

which is called the quadratic form associated with the bilinear form f. The quadratic form q is homogeneous of degree 2 since q(tu) = t²q(u) for any t ∈ R and u ∈ U.

Of course q is uniquely determined by f through (5.1.5). However, the converse is not true, which will become clear after the following discussion.

To proceed, let B = {u1, . . . , un} be a basis of U and u ∈ U any given vector whose coordinate vector with respect to B is x = (x1, . . . , xn)^t. Then, from (5.1.1), we have

\[
q(u) = x^t A x = x^t \left( \frac{1}{2}(A + A^t) + \frac{1}{2}(A - A^t) \right) x = \frac{1}{2} x^t (A + A^t) x, \tag{5.1.6}
\]

since x^t A x is a scalar, which results in x^t A x = (x^t A x)^t = x^t A^t x. In other words, the quadratic form q constructed from (5.1.5) can only capture the information contained in the symmetric part, (1/2)(A + A^t), but nothing in the skewsymmetric part, (1/2)(A − A^t), of the matrix A = (f(ui, uj)). Consequently, the quadratic form q cannot determine the bilinear form f completely, in general situations, unless the skewsymmetric part of A is absent, that is, A − A^t = 0. In other words, A = (f(ui, uj)) is symmetric. This observation motivates the following definition.

Definition 5.3 A bilinear form f : U × U → R is symmetric if it satisfies

f (u, v) = f (v, u), u, v ∈ U. (5.1.7)

If f is a symmetric bilinear form, then we have the expansion

f (u+ v, u+ v) = f (u, u)+ f (v, v)+ 2f (u, v), u, v ∈ U. (5.1.8)

Thus, if q is the quadratic form associated with f, we derive from (5.1.8) the relation

\[
f(u, v) = \frac{1}{2}\left( q(u + v) - q(u) - q(v) \right), \quad u, v \in U, \tag{5.1.9}
\]

which indicates how f is uniquely determined by q. In a similar manner, we also have

\[
f(u, v) = \frac{1}{4}\left( q(u + v) - q(u - v) \right), \quad u, v \in U. \tag{5.1.10}
\]

As in the situation of scalar products, the relations of the types (5.1.9) and (5.1.10) are often referred to as polarization identities for symmetric bilinear forms.
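The two observations above, that q only sees the symmetric part of A and that the polarization identities recover the symmetric bilinear form, can be checked numerically. The sketch below uses a hypothetical non-symmetric matrix A and hypothetical coordinate vectors u, v; none of these values come from the text.

```python
def bilinear(x, A, y):
    """Evaluate x^t A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def q(x, A):
    """Quadratic form q(x) = x^t A x associated with the matrix A."""
    return bilinear(x, A, x)

A = [[1, 4], [0, 3]]    # hypothetical non-symmetric matrix
S = [[1, 2], [2, 3]]    # its symmetric part (A + A^t) / 2

u, v = [1, 2], [3, -1]
u_plus_v = [a + b for a, b in zip(u, v)]
u_minus_v = [a - b for a, b in zip(u, v)]

# A and its symmetric part S define the same quadratic form on these vectors.
same_quadratic_form = q(u, A) == q(u, S) and q(v, A) == q(v, S)

# Polarization identities (5.1.9) and (5.1.10), evaluated with q alone.
polar_1 = (q(u_plus_v, A) - q(u, A) - q(v, A)) / 2
polar_2 = (q(u_plus_v, A) - q(u_minus_v, A)) / 4

f_sym = bilinear(u, S, v)       # symmetric bilinear form built from S
f_nonsym = bilinear(u, A, v)    # original, non-symmetric bilinear form
print(same_quadratic_form, polar_1, polar_2, f_sym, f_nonsym)
```

In this example both identities return the value of the symmetric form, while the non-symmetric form built from A takes a different value at (u, v), so q alone cannot recover it.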

From now on we will concentrate on symmetric bilinear forms.

Let f : U × U → R be a symmetric bilinear form. If x, y ∈ Rn are the coordinate vectors of u, v ∈ U with respect to a basis B, then f(u, v) is given by (5.1.1) so that the matrix A ∈ R(n, n) is symmetric. Recall that (x, y) = x^t y is the Euclidean scalar product over Rn. Thus, if we view A as a linear mapping Rn → Rn given by x ↦ Ax, then the right-hand side of (5.1.1) is simply (x, Ay). Since A = A^t, the right-hand side of (5.1.1) is also (Ax, y). In other words, A defines a self-adjoint mapping over Rn with respect to the standard Euclidean scalar product over Rn.

Conversely, if U is a vector space equipped with a positive definite scalar product (·, ·) and T ∈ L(U) is a self-adjoint or symmetric mapping, then

f(u, v) = (u, T(v)), u, v ∈ U, (5.1.11)

is a symmetric bilinear form. Thus, in this way, we see that symmetric bilinear forms are completely characterized.

In a more precise manner, we have the following theorem, which relates symmetric bilinear forms and self-adjoint mappings over a vector space with a positive definite scalar product.

Theorem 5.4 Let U be a finite-dimensional vector space with a positive definite scalar product (·, ·). For any symmetric bilinear form f : U × U → R, there is a unique self-adjoint or symmetric linear mapping, say T ∈ L(U), such that the relation (5.1.11) holds.

Proof For each v ∈ U, the existence of a unique vector, say T(v), so that (5.1.11) holds, is already shown in Section 4.2. Since f is bilinear, we have T ∈ L(U). The self-adjointness or symmetry of T follows from the symmetry of f and the scalar product.

Note that, in view of Section 4.1, a symmetric bilinear form is exactly a kind of scalar product as well, though not necessarily positive definite.

Exercises

5.1.1 Let f : U × U → R be a bilinear form such that f(u, u) > 0 and f(v, v) < 0 for some u, v ∈ U.

(i) Show that u, v are linearly independent.

(ii) Show that there is some w ∈ U, w ≠ 0, such that f(w, w) = 0.

5.1.2 Let A, B ∈ F(n, n). Show that if A, B are congruent then A, B must have the same rank.

5.1.3 Let A, B ∈ F(n, n). Show that if A, B are congruent and A is symmetric then so is B.

5.1.4 Are the matrices

\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \tag{5.1.12}
\]

congruent in R(2, 2)?

5.1.5 Consider the identity matrix In ∈ F(n, n). Show that In and −In are not congruent if F = R but are congruent if F = C.

5.1.6 Consider the quadratic form

\[
q(x) = x_1^2 + 2x_2^2 - x_3^2 + 2x_1x_2 - 4x_1x_3, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{R}^3, \tag{5.1.13}
\]

where R3 is equipped with the standard Euclidean scalar product (x, y) = x^t y for x, y ∈ R3.

(i) Find the unique symmetric bilinear form f : R3 × R3 → R such that q(x) = f(x, x) for x ∈ R3.

(ii) Find the unique self-adjoint mapping T ∈ L(R3) such that f(x, y) = (x, T(y)) for any x, y ∈ R3.

5.1.7 Let f : U × U → R be a bilinear form.

(i) Show that f is skewsymmetric, that is, f(u, v) = −f(v, u) for any u, v ∈ U, if and only if f(u, u) = 0 for any u ∈ U.

(ii) Show that if f is skewsymmetric then the matrix representation of f with respect to an arbitrary basis of U must be skewsymmetric.

5.1.8 Prove that the existence and uniqueness of an element T ∈ L(U) asserted in Theorem 5.4 remain true if the scalar product (·, ·) of U is only assumed to be non-degenerate instead of positive definite.

5.2 Self-adjoint mappings

Let U be a finite-dimensional vector space with a positive definite scalar product, (·, ·). For self-adjoint mappings, we have the following foundational theorem.

Theorem 5.5 Assume that T ∈ L(U) is self-adjoint with respect to the scalar product of U. Then

(1) T must have a real eigenvalue,

(2) there is an orthonormal basis of U consisting of eigenvectors of T.

Proof We use induction on dim(U).

There is nothing to show at dim(U) = 1.

Assume that the theorem is true at dim(U) = n − 1 ≥ 1.

We investigate the situation when dim(U) = n ≥ 2.

For convenience, let B = {u1, . . . , un} be an orthonormal basis of U. For any vector u ∈ U, let x = (x1, . . . , xn)^t ∈ Rn be its coordinate vector. Use A = (aij) ∈ R(n, n) to denote the matrix representation of T with respect to B so that

\[
T(u_j) = \sum_{i=1}^n a_{ij} u_i, \quad j = 1, \dots, n. \tag{5.2.1}
\]

Then set

\[
\begin{aligned}
Q(x) = (u, T(u)) &= \left( \sum_{i=1}^n x_i u_i, \sum_{j=1}^n x_j T(u_j) \right) \\
&= \left( \sum_{i=1}^n x_i u_i, \sum_{j=1}^n x_j \sum_{k=1}^n a_{kj} u_k \right) = \sum_{i,j,k=1}^n x_i x_j a_{kj} \delta_{ik} \\
&= \sum_{i,j=1}^n x_i x_j a_{ij} = x^t A x.
\end{aligned} \tag{5.2.2}
\]

Consider the unit sphere in U given as

\[
S = \left\{ u = \sum_{i=1}^n x_i u_i \in U \,\middle|\, \|u\|^2 = (u, u) = \sum_{i=1}^n x_i^2 = 1 \right\}, \tag{5.2.3}
\]

which may also be identified as the unit sphere in Rn centered at the origin and commonly denoted by S^{n−1}. Since S^{n−1} is compact, the function Q given in (5.2.2) attains its minimum over S^{n−1} at a certain point on S^{n−1}, say x^0 = (x^0_1, . . . , x^0_n)^t.

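We may assume x^0_n ≠ 0, since x^0 lies on the unit sphere and so has at least one nonzero component, and the basis vectors may be relabeled if necessary. Without loss of generality, we may also assume x^0_n > 0 (the case x^0_n < 0 can be treated similarly). Hence, near x^0, we may represent the points on S^{n−1} by the formulas

\[
x_n = \sqrt{1 - x_1^2 - \cdots - x_{n-1}^2} \quad \text{where } (x_1, \dots, x_{n-1}) \text{ is near } (x^0_1, \dots, x^0_{n-1}). \tag{5.2.4}
\]

Therefore, with

\[
P(x_1, \dots, x_{n-1}) = Q\left( x_1, \dots, x_{n-1}, \sqrt{1 - x_1^2 - \cdots - x_{n-1}^2} \right), \tag{5.2.5}
\]

we see that (x^0_1, . . . , x^0_{n−1}) is a critical point of P. Thus we have

\[
(\nabla P)(x^0_1, \dots, x^0_{n-1}) = 0. \tag{5.2.6}
\]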


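In order to carry out the computation involved in (5.2.6), we rewrite the function P as

\[
\begin{aligned}
P(x_1, \dots, x_{n-1}) = {} & \sum_{i,j=1}^{n-1} a_{ij} x_i x_j + 2 \sum_{i=1}^{n-1} a_{in} x_i \sqrt{1 - x_1^2 - \cdots - x_{n-1}^2} \\
& + a_{nn} \left( 1 - x_1^2 - \cdots - x_{n-1}^2 \right).
\end{aligned} \tag{5.2.7}
\]

Thus

\[
\begin{aligned}
\frac{\partial P}{\partial x_i} = {} & 2 \sum_{j=1}^{n-1} a_{ij} x_j + 2 a_{in} \sqrt{1 - x_1^2 - \cdots - x_{n-1}^2} \\
& - \frac{2 x_i}{\sqrt{1 - x_1^2 - \cdots - x_{n-1}^2}} \sum_{j=1}^{n-1} a_{jn} x_j - 2 a_{nn} x_i, \quad i = 1, \dots, n-1.
\end{aligned} \tag{5.2.8}
\]

Using (5.2.8) in (5.2.6), we arrive at

\[
\sum_{j=1}^n a_{ij} x^0_j = \frac{1}{x^0_n} \left( \sum_{j=1}^n a_{jn} x^0_j \right) x^0_i, \quad i = 1, \dots, n-1. \tag{5.2.9}
\]

Note that we also have

\[
\sum_{j=1}^n a_{nj} x^0_j = \frac{1}{x^0_n} \left( \sum_{j=1}^n a_{jn} x^0_j \right) x^0_n \tag{5.2.10}
\]

automatically since A is symmetric. Combining (5.2.9) and (5.2.10), we get

\[
A x^0 = \lambda_0 x^0, \quad \lambda_0 = \frac{1}{x^0_n} \left( \sum_{j=1}^n a_{jn} x^0_j \right), \tag{5.2.11}
\]

which establishes that λ0 is an eigenvalue, and x^0 an eigenvector associated with λ0, of A.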

Let v1 ∈ U be such that its coordinate vector with respect to the basis B is x^0. Then we have

\[
\begin{aligned}
T(v_1) &= T\left( \sum_{j=1}^n x^0_j u_j \right) = \sum_{j=1}^n x^0_j T(u_j) \\
&= \sum_{i,j=1}^n a_{ij} u_i x^0_j = \sum_{i=1}^n \left( \sum_{j=1}^n a_{ij} x^0_j \right) u_i \\
&= \lambda_0 \sum_{i=1}^n x^0_i u_i = \lambda_0 v_1,
\end{aligned} \tag{5.2.12}
\]

which verifies that λ0 is an eigenvalue of T and v1 is an associated eigenvector.

Now set U1 = Span{v1} and make the decomposition U = U1 ⊕ U1^⊥. We claim that U1^⊥ is invariant under T. In fact, for any u ∈ U1^⊥, we have (u, v1) = 0. Hence (T(u), v1) = (u, T(v1)) = (u, λ0v1) = λ0(u, v1) = 0, which indicates T(u) ∈ U1^⊥.

Of course dim(U1^⊥) = n − 1. Using the inductive assumption, we know that U1^⊥ has an orthonormal basis {v2, . . . , vn} so that each vi is an eigenvector associated with a real eigenvalue of T, i = 2, . . . , n.

Finally, we may rescale v1 to make it a unit vector. Thus {v1, v2, . . . , vn} is an orthonormal basis of U so that each vector vi is an eigenvector associated with a real eigenvalue of T, i = 1, 2, . . . , n, as desired.

With respect to the basis {v1, . . . , vn}, the matrix representation of T is diagonal, and its diagonal entries are the eigenvalues of T, which are shown to be all real. In other words, all eigenvalues of T are real.
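The central idea of the proof, that a minimum point of Q over the unit sphere is an eigenvector, can be observed numerically in the plane. The following sketch is only an illustration under stated assumptions: the symmetric matrix A is a hypothetical example, and the minimum is located by sampling the unit circle on a fine grid rather than by the calculus argument used above.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical symmetric matrix

def Q(x):
    """Quadratic form Q(x) = x^t A x, as in (5.2.2)."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

# Sample the unit circle S^1 and keep the sample point where Q is smallest.
N = 3600
points = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N)) for k in range(N)]
x0 = min(points, key=Q)
lam0 = Q(x0)   # on the unit sphere, Q(x0) equals the candidate eigenvalue

# Residual of the eigenvalue equation A x0 = lam0 x0 from (5.2.11).
Ax0 = [A[i][0] * x0[0] + A[i][1] * x0[1] for i in range(2)]
residual = max(abs(Ax0[i] - lam0 * x0[i]) for i in range(2))
print(lam0, residual)
```

For this matrix, Q restricted to the circle is 2 + sin 2θ, so the sampled minimum sits at an angle where sin 2θ = −1 and the residual is at the level of rounding error.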

Let T ∈ L(U) be self-adjoint, use λ1, . . . , λk to denote all the distinct eigenvalues of T, and denote by E_{λ1}, . . . , E_{λk} the corresponding eigenspaces, which are of course invariant subspaces of T. Using Theorem 5.5, we see that there holds the direct sum

\[
U = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}. \tag{5.2.13}
\]

In particular, we may use E0 to denote the eigenspace corresponding to the eigenvalue 0 (if any). Moreover, we may set

\[
E_+ = \bigoplus_{\lambda_i > 0} E_{\lambda_i}, \quad E_- = \bigoplus_{\lambda_i < 0} E_{\lambda_i}. \tag{5.2.14}
\]

The associated numbers

n0 = dim(E0), n+ = dim(E+), n− = dim(E−), (5.2.15)

are exactly what were previously called the indices of nullity, positivity, and negativity, respectively, in the context of a real scalar product in Section 4.2, which is simply a symmetric bilinear form here and may always be represented by a self-adjoint mapping over U. It is clear that n0 is simply the nullity of T, or n0 = n(T), and n+ + n− is the rank of T, n+ + n− = r(T). Furthermore, for ui ∈ E_{λi}, uj ∈ E_{λj}, we have

λi(ui, uj) = (T(ui), uj) = (ui, T(uj)) = λj(ui, uj), (5.2.16)

which leads to (λi − λj)(ui, uj) = 0. Thus, for i ≠ j, we have (ui, uj) = 0. In other words, the eigenspaces associated with distinct eigenvalues of T are mutually perpendicular. (This observation suggests a practical way to construct an orthogonal basis of U consisting of eigenvectors of T: First find all eigenspaces of T. Then obtain an orthogonal basis for each of these eigenspaces by using the Gram–Schmidt procedure. Finally put all these orthogonal bases together to get an orthogonal basis of the full space.)
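The practical procedure just described may be sketched as follows. The symmetric matrix A and the spanning vectors of its two eigenspaces are hypothetical and were found by hand for this illustration; the function names are likewise only illustrative, and the vectors are normalized as well so that the result is an orthonormal basis.

```python
import math

A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]   # hypothetical symmetric matrix

def dot(x, y):
    """Standard Euclidean scalar product."""
    return sum(a * b for a, b in zip(x, y))

def apply(M, x):
    """Matrix-vector product M x."""
    return [dot(row, x) for row in M]

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(v, e)                              # component of v along e
            w = [wi - c * ei for wi, ei in zip(w, e)]  # remove that component
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

# Step 1: the eigenspaces, each given by a (not yet orthogonal) spanning set.
eigenspaces = {4: [[1, 1, 1]], 1: [[1, -1, 0], [1, 0, -1]]}

# Steps 2 and 3: orthonormalize inside each eigenspace, then collect everything.
basis = []
for lam, vectors in eigenspaces.items():
    for e in gram_schmidt(vectors):
        basis.append((lam, e))

for lam, e in basis:
    print(lam, e)
```

No orthogonalization across different eigenspaces is needed, since vectors taken from eigenspaces of distinct eigenvalues are already perpendicular by (5.2.16).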

A useful matrix version of Theorem 5.5 may be stated as follows.

Theorem 5.6 A matrix A ∈ R(n, n) is symmetric if and only if there is an orthogonal matrix P ∈ R(n, n) such that

A = P^t D P, (5.2.17)

where D ∈ R(n, n) is a diagonal matrix and the diagonal entries of D are the eigenvalues of A.

Proof The right-hand side of (5.2.17) is of course symmetric.

Conversely, let A be symmetric. The linear mapping T ∈ L(Rn) defined by T(x) = Ax, with x ∈ Rn any column vector, is self-adjoint with respect to the standard scalar product over Rn. Hence there are column vectors u1, . . . , un in Rn consisting of eigenvectors of T or A associated with real eigenvalues, say λ1, . . . , λn, which form an orthonormal basis of Rn. The relations Au1 = λ1u1, . . . , Aun = λnun may collectively be rewritten as AQ = QD where D = diag{λ1, . . . , λn} and Q ∈ R(n, n) is formed by taking u1, . . . , un as its respective column vectors. Since u_i^t u_j = δij, i, j = 1, . . . , n, we see that Q is an orthogonal matrix. Setting P = Q^t, we arrive at (5.2.17).

Note that the proof of Theorem 5.6 gives us a practical way to construct the matrices P and D in (5.2.17).
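To make this construction concrete, the sketch below assembles P and D for a hypothetical 2 × 2 symmetric matrix whose orthonormal eigenvectors are known in closed form, and then multiplies out P^t D P. The matrix, the eigenpairs, and the helper names are assumptions made for the example.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical symmetric matrix

s = 1 / math.sqrt(2)
u1, lam1 = [s, s], 3.0     # A u1 = 3 u1
u2, lam2 = [s, -s], 1.0    # A u2 = 1 u2

# In the proof, Q has u1, u2 as columns and P = Q^t, so the rows of P are u1, u2.
P = [u1, u2]
D = [[lam1, 0.0], [0.0, lam2]]

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

PtDP = matmul(matmul(transpose(P), D), P)   # should reproduce A, as in (5.2.17)
PPt = matmul(P, transpose(P))               # should be the identity, since P is orthogonal
print(PtDP)
print(PPt)
```

Up to rounding, the first product reproduces A and the second is the identity matrix.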

Exercises

5.2.1 The fact that a symmetric matrix A ∈ R(n, n) has and can have only real eigenvalues may also be proved algebraically, in a more traditional way, as follows. Consider A as an element in C(n, n) and let λ ∈ C be any of its eigenvalues, whose existence is ensured by the Fundamental Theorem of Algebra. Let u ∈ Cn be an eigenvector of A associated to λ. Then, using (·, ·) to denote the standard Hermitian scalar product over Cn, we have

λ(u, u) = λu†u = (u, Au). (5.2.18)

Use (5.2.18) to show that λ must be real, or equivalently, λ̄ = λ. Then show that A has an eigenvector in Rn associated to λ.

5.2.2 Consider the quadratic form

\[
q(x) = x_1^2 + 2x_2^2 - 2x_3^2 + 4x_1x_3, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{R}^3. \tag{5.2.19}
\]

(i) Find a symmetric matrix A ∈ R(3, 3) such that q(x) = x^t A x.

(ii) Find an orthonormal basis of R3 (with respect to the standard Euclidean scalar product of R3) consisting of the eigenvectors of A.

(iii) Find an orthogonal matrix P ∈ R(3, 3) so that the substitution of the variable, y = Px, transforms the quadratic form (5.2.19) into the diagonal form

\[
\lambda_1 y_1^2 + \lambda_2 y_2^2 + \lambda_3 y_3^2 = y^t \,\mathrm{diag}\{\lambda_1, \lambda_2, \lambda_3\}\, y, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \in \mathbb{R}^3, \tag{5.2.20}
\]

where λ1, λ2, λ3 are the eigenvalues of A, counting multiplicities.

5.2.3 Show that any symmetric matrix A ∈ R(n, n) must be congruent with a diagonal matrix of the form D = diag{d1, . . . , dn}, where di = ±1 or 0 for i = 1, . . . , n.

5.2.4 Let A ∈ R(n, n) be symmetric and det(A) < 0. Show that there is a column vector x ∈ Rn such that x^t A x < 0.

5.2.5 Show that if A ∈ R(n, n) is orthogonal then adj(A) = ±A^t.

5.2.6 Show that if A1, . . . , Ak ∈ R(n, n) are orthogonal, so is their product A = A1 · · · Ak.

5.2.7 Assume that A ∈ R(n, n) is symmetric and all of its eigenvalues are ±1. Prove that A is orthogonal.

5.2.8 Let A ∈ R(n, n) be an upper or lower triangular matrix. If A is orthogonal, show that A must be diagonal and the diagonal entries of A can only be ±1.

5.2.9 Show that if T ∈ L(U) is self-adjoint and T^m = 0 for some integer m ≥ 1 then T = 0.

5.2.10 Show that if T ∈ L(U) is self-adjoint and m is an odd positive integer then there is a self-adjoint element S ∈ L(U) such that T = S^m.

5.2.11 Let A ∈ R(n, n) be symmetric and satisfy the equation A³ + A² + 4In = 0. Prove that A = −2In.

5.2.12 Let A ∈ R(n, n) be orthogonal.

(i) Show that the real eigenvalues of A can only be ±1.

(ii) If n is odd and det(A) = 1, then 1 is an eigenvalue of A.

(iii) If det(A) = −1, then −1 is an eigenvalue of A.

5.2.13 Let x ∈ Rn be a nonzero column vector. Show that

\[
P = I_n - \left( \frac{2}{x^t x} \right) x x^t \tag{5.2.21}
\]

is an orthogonal matrix.

5.2.14 Let A ∈ R(n, n) be an idempotent symmetric matrix such that r(A) = r. Show that the characteristic polynomial of A is

\[
p_A(\lambda) = \det(\lambda I_n - A) = (\lambda - 1)^r \lambda^{n-r}. \tag{5.2.22}
\]

5.2.15 Let A ∈ R(n, n) be a symmetric matrix whose eigenvalues are all nonnegative. Show that det(A + In) > 1 if A ≠ 0.

5.2.16 Let u ∈ Rn be a nonzero column vector. Show that there is an orthogonal matrix Q ∈ R(n, n) such that

Q^t (u u^t) Q = diag{u^t u, 0, . . . , 0}. (5.2.23)

5.3 Positive definite quadratic forms, mappings,and matrices

Let q be a quadratic form over a finite-dimensional vector space U with a positive definite scalar product (·, ·) and f a symmetric bilinear form that induces q: q(u) = f(u, u), u ∈ U. Then there is a self-adjoint mapping T ∈ L(U) such that

q(u) = (u, T(u)), u ∈ U. (5.3.1)

In this section, we apply the results of the previous section to investigate the situation when q stays positive, which is important in applications.

Definition 5.7 The positive definiteness of various subjects of concern is defined as follows.

(1) A quadratic form q over U is said to be positive definite if

q(u) > 0, u ∈ U, u ≠ 0. (5.3.2)

(2) A self-adjoint mapping T ∈ L(U) is said to be positive definite if

(u, T(u)) > 0, u ∈ U, u ≠ 0. (5.3.3)

(3) A symmetric matrix A ∈ R(n, n) is said to be positive definite if for any nonzero column vector x ∈ Rn there holds

x^t A x > 0. (5.3.4)

Thus, when a quadratic form q and a self-adjoint mapping T are related through (5.3.1), the positive definiteness of q and that of T are equivalent. Therefore it will be sufficient to study the positive definiteness of self-adjoint mappings, which will be shown to be equivalent to the positive definiteness of any matrix representations of the associated bilinear forms of the self-adjoint mappings. Recall that the matrix representation of the symmetric bilinear form induced from T with respect to an arbitrary basis {u1, . . . , un} of U is a matrix A ∈ R(n, n) defined by

A = (aij), aij = (ui, T(uj)), i, j = 1, . . . , n. (5.3.5)

The following theorem links various notions of positive definiteness.

Theorem 5.8 That a self-adjoint mapping T ∈ L(U) is positive definite is equivalent to any of the following statements.

(1) All the eigenvalues of T are positive.

(2) There is a positive constant, λ0 > 0, such that

(u, T(u)) ≥ λ0‖u‖², u ∈ U. (5.3.6)

(3) There is a positive definite mapping S ∈ L(U) such that T = S².

(4) The matrix A defined in (5.3.5) with respect to an arbitrary basis is positive definite.

(5) The eigenvalues of the matrix A defined in (5.3.5) with respect to an arbitrary basis are all positive.

(6) The matrix A defined in (5.3.5) with respect to an arbitrary basis enjoys the factorization A = B² for some positive definite matrix B ∈ R(n, n).

Proof Assume T is positive definite. Let λ be any eigenvalue of T and u ∈ U an associated eigenvector. Then λ‖u‖² = (u, T(u)) > 0. Thus λ > 0 and (1) follows.

Assume (1) holds. Let {u1, . . . , un} be an orthonormal basis of U so that ui is an eigenvector associated with the eigenvalue λi (i = 1, . . . , n). Set λ0 = min{λ1, . . . , λn}. Then λ0 > 0. Moreover, for any u ∈ U with u = a1u1 + · · · + anun where ai ∈ R (i = 1, . . . , n), we have

\[
(u, T(u)) = \left( \sum_{i=1}^n a_i u_i, \sum_{j=1}^n \lambda_j a_j u_j \right) = \sum_{i=1}^n \lambda_i a_i^2 \geq \lambda_0 \sum_{i=1}^n a_i^2 = \lambda_0 \|u\|^2, \tag{5.3.7}
\]

which establishes (2). It is obvious that (2) implies the positive definiteness of T.

We now show that the positive definiteness of T is equivalent to the statement (3). In fact, let {u1, . . . , un} be an orthonormal basis of U consisting of eigenvectors with {λ1, . . . , λn} the corresponding eigenvalues. In view of (1), λi > 0 for i = 1, . . . , n. Now define S ∈ L(U) to satisfy

\[
S(u_i) = \sqrt{\lambda_i}\, u_i, \quad i = 1, \dots, n. \tag{5.3.8}
\]

It is clear that S is self-adjoint with positive eigenvalues, hence positive definite, and that T = S².

Conversely, if T = S² for some positive definite mapping S, then 0 is not an eigenvalue of S. So S is invertible. Therefore

(u, T(u)) = (u, S²(u)) = (S(u), S(u)) = ‖S(u)‖² > 0, u ∈ U, u ≠ 0, (5.3.9)

and the positive definiteness of T follows.

Let {u1, . . . , un} be an arbitrary basis of U and x = (x1, . . . , xn)^t ∈ Rn a nonzero vector. For u = x1u1 + · · · + xnun, we have

\[
(u, T(u)) = \left( \sum_{i=1}^n x_i u_i, \sum_{j=1}^n x_j T(u_j) \right) = \sum_{i,j=1}^n x_i (u_i, T(u_j)) x_j = x^t A x, \tag{5.3.10}
\]

which is positive if T is positive definite, and vice versa. So the equivalence of the positive definiteness of T and the statement (4) follows.

Suppose that A is positive definite. Let λ be any eigenvalue of A and x an associated eigenvector. Then x^t A x = λ x^t x > 0 implies λ > 0 since x^t x > 0. So (5) follows.

Now assume (5). Using Theorem 5.6, we see that A = P^t D P where D is a diagonal matrix in R(n, n) whose diagonal entries are the positive eigenvalues of A, say λ1, . . . , λn, and P ∈ R(n, n) is an orthogonal matrix. Then, with y = Px = (y1, . . . , yn)^t ∈ Rn where x ∈ Rn, we have

\[
x^t A x = (Px)^t D P x = y^t D y = \sum_{i=1}^n \lambda_i y_i^2 > 0, \tag{5.3.11}
\]

whenever x ≠ 0 since P is nonsingular. Hence (5) implies (4) as well.

Finally, assume (5) holds and set

\[
D_1 = \mathrm{diag}\{\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n}\}. \tag{5.3.12}
\]

Then D1² = D. Thus

\[
A = P^t D P = P^t D_1^2 P = (P^t D_1 P)(P^t D_1 P) = B^2, \quad B = P^t D_1 P. \tag{5.3.13}
\]

Using (5) we know that B is positive definite. So (6) follows. Conversely, if (6) holds, we may use the symmetry of B to get x^t A x = x^t B² x = (Bx)^t (Bx) > 0 whenever x ∈ Rn with x ≠ 0 because B is nonsingular, which implies Bx ≠ 0. Thus (4) holds.
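The factorization in (5.3.12) and (5.3.13) can be carried out explicitly in a small case. The sketch below assumes a hypothetical positive definite 2 × 2 matrix A whose eigenvalues 3 and 1 and orthonormal eigenvectors are known; the matrix and helper names are illustrative only.

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical positive definite matrix
eigenvalues = [3.0, 1.0]       # its eigenvalues, both positive

s = 1 / math.sqrt(2)
P = [[s, s], [s, -s]]          # rows are orthonormal eigenvectors, so A = P^t D P

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

D1 = diag([math.sqrt(lam) for lam in eigenvalues])   # as in (5.3.12)
B = matmul(matmul(transpose(P), D1), P)              # B = P^t D1 P, as in (5.3.13)
B_squared = matmul(B, B)
print(B)
print(B_squared)
```

Here B is symmetric with eigenvalues √3 and 1, and B² reproduces A up to rounding.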


We note that, if in (5.3.5) the basis {u1, . . . , un} is orthonormal, then

\[
T(u_j) = \sum_{i=1}^n (u_i, T(u_j)) u_i = \sum_{i=1}^n a_{ij} u_i, \quad j = 1, \dots, n, \tag{5.3.14}
\]

which coincides with our notation adopted earlier.

For practical convenience, it is worth stating the following matrix version of Theorem 5.8.

Theorem 5.9 That a symmetric matrix A ∈ R(n, n) is positive definite is equivalent to any of the following statements.

(1) All the eigenvalues of A are positive.

(2) There is a positive definite matrix B ∈ R(n, n) such that A = B².

(3) A is congruent to the identity matrix. That is, there is a nonsingular matrix B ∈ R(n, n) such that A = B^t In B = B^t B.

Proof That A is positive definite is equivalent to either (1) or (2) has already been demonstrated in the proof of Theorem 5.8 and it remains to establish (3).

If there is a nonsingular matrix B ∈ R(n, n) such that A = B^t B, then x^t A x = (Bx)^t (Bx) > 0 for x ∈ Rn with x ≠ 0, which implies that A is positive definite.

Now assume A is positive definite. Then, combining Theorem 5.6 and (1), we can rewrite A as A = P^t D P where P ∈ R(n, n) is orthogonal and D ∈ R(n, n) is diagonal whose diagonal entries, say λ1, . . . , λn, are all positive. Define the diagonal matrix D1 by (5.3.12) and set D1P = B. Then B is nonsingular and A = B^t B as asserted.
Similarly, we say that the quadratic form q, self-adjoint mapping T, or symmetric matrix A ∈ R(n, n) is positive semi-definite or non-negative if in the condition (5.3.2), (5.3.3), or (5.3.4), respectively, the ‘greater than’ sign (>) is replaced by the ‘greater than or equal to’ sign (≥). For positive semi-definite or non-negative mappings and matrices, the corresponding versions of Theorems 5.8 and 5.9 can similarly be stated, simply with the word ‘positive’ there being replaced by ‘non-negative’ and In in Theorem 5.9 (3) by diag{1, . . . , 1, 0, . . . , 0}.

Besides, we can define a quadratic form q, self-adjoint mapping T, or symmetric matrix A to be negative definite or negative semi-definite (non-positive), if −q, −T, or −A is positive definite or positive semi-definite (non-negative).

Thus, a quadratic form, self-adjoint mapping, or symmetric matrix is called indefinite or non-definite if it is neither positive semi-definite nor negative semi-definite.

As an illustration, we consider the quadratic form

\[
q(x) = a(x_1^2 + x_2^2 + x_3^2 + x_4^2) + 2x_1x_2 + 2x_1x_3 + 2x_1x_4, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_4 \end{pmatrix} \in \mathbb{R}^4, \tag{5.3.15}
\]

where a is a real number. It is clear that q may be represented by the matrix

\[
A = \begin{pmatrix} a & 1 & 1 & 1 \\ 1 & a & 0 & 0 \\ 1 & 0 & a & 0 \\ 1 & 0 & 0 & a \end{pmatrix}, \tag{5.3.16}
\]

with respect to the standard basis of R4, whose characteristic equation is

(λ − a)²([λ − a]² − 3) = 0. (5.3.17)

Consequently the eigenvalues of A are λ1 = λ2 = a, λ3 = a + √3, λ4 = a − √3. Therefore, when a > √3, the matrix A is positive definite; when a = √3, A is positive semi-definite but not positive definite; when a < −√3, A is negative definite; when a = −√3, A is negative semi-definite but not negative definite; when a ∈ (−√3, √3), A is indefinite.
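The case analysis above can be reproduced by a short computation. The sketch below takes the eigenvalues a, a, a + √3, a − √3 as given, checks them against explicit eigenvectors (worked out by hand for this illustration and not taken from the text), and classifies A by the signs of its eigenvalues. The function names and the tolerance are assumptions of the example.

```python
import math

R3 = math.sqrt(3)

def matrix(a):
    """The matrix A of (5.3.16) for a given real number a."""
    return [[a, 1, 1, 1], [1, a, 0, 0], [1, 0, a, 0], [1, 0, 0, a]]

def eigenpairs(a):
    """Eigenvalues of A paired with hand-computed eigenvectors."""
    return [
        (a, [0, 1, -1, 0]),
        (a, [0, 1, 0, -1]),
        (a + R3, [R3, 1, 1, 1]),
        (a - R3, [-R3, 1, 1, 1]),
    ]

def residual(a):
    """Largest entry of A v - lam v over all listed eigenpairs."""
    A = matrix(a)
    worst = 0.0
    for lam, v in eigenpairs(a):
        Av = [sum(A[i][j] * v[j] for j in range(4)) for i in range(4)]
        worst = max(worst, max(abs(Av[i] - lam * v[i]) for i in range(4)))
    return worst

def classify(a, tol=1e-12):
    """Classify A by the signs of its smallest and largest eigenvalues."""
    lams = [lam for lam, _ in eigenpairs(a)]
    lo, hi = min(lams), max(lams)
    if lo > tol:
        return "positive definite"
    if hi < -tol:
        return "negative definite"
    if lo >= -tol:
        return "positive semi-definite"
    if hi <= tol:
        return "negative semi-definite"
    return "indefinite"

for a in (2.0, R3, 0.0, -R3, -2.0):
    print(a, classify(a), residual(a))
```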

Exercises

5.3.1 Let T ∈ L(U) be self-adjoint and S ∈ L(U) be anti-self-adjoint. Show that if T is positive definite then so is T − S².

5.3.2 Prove that if A and B are congruent matrices then A being positive definite or positive semi-definite is equivalent to B being so.

5.3.3 Let U be a vector space with a positive definite scalar product (·, ·). Show that, for any set of linearly independent vectors {u1, . . . , uk} in U, the metric matrix

\[
M = \begin{pmatrix} (u_1, u_1) & \cdots & (u_1, u_k) \\ \cdots & \cdots & \cdots \\ (u_k, u_1) & \cdots & (u_k, u_k) \end{pmatrix}, \tag{5.3.18}
\]

must be positive definite.

5.3.4 Show that a necessary and sufficient condition for A ∈ R(n, n) to be symmetric is that there exists an invertible matrix B ∈ R(n, n) such that A = B^t B + aIn for some a ∈ R.

5.3.5 Let A ∈ R(n, n) be positive definite. Show that det(A) > 0.

5.3.6 Show that the inverse of a positive definite matrix is also positive definite.

5.3.7 Show that if A ∈ R(n, n) is positive definite then so is adj(A).

5.3.8 Let A ∈ R(m, n) where m ≠ n. Then AA^t ∈ R(m, m) and A^t A ∈ R(n, n) are both symmetric. Prove that if A is of full rank, that is, r(A) = min{m, n}, then one of the matrices AA^t, A^t A is positive definite and the other is positive semi-definite but can never be positive definite.

5.3.9 Let A = (aij) ∈ R(n, n) be positive definite.

(i) Prove that aii > 0 for i = 1, . . . , n.

(ii) Establish the inequalities

\[
|a_{ij}| < \sqrt{a_{ii} a_{jj}}, \quad i \neq j, \quad i, j = 1, \dots, n. \tag{5.3.19}
\]

5.3.10 Let T ∈ L(U) be self-adjoint. Show that, if T is positive semi-definite and k ≥ 1 is any integer, then there is a unique positive semi-definite element S ∈ L(U) such that T = S^k.

5.3.11 Let A ∈ R(n, n) be positive semi-definite and consider the null set

S = {x ∈ Rn | x^t A x = 0}. (5.3.20)

Prove that x ∈ S if and only if Ax = 0. Hence S is the null-space of the matrix A. What happens when A is indefinite such that there are nonzero vectors x, y ∈ Rn so that x^t A x > 0 and y^t A y < 0?

5.3.12 Let A, B ∈ R(n, n) be symmetric and A positive definite. Prove that there is a nonsingular matrix C ∈ R(n, n) so that both C^t A C and C^t B C are diagonal matrices (that is, A and B are simultaneously congruent to diagonal matrices).

5.3.13 Assume that A, B ∈ R(n, n) are symmetric and the eigenvalues of A, B are greater than or equal to a, b ∈ R, respectively. Show that the eigenvalues of A + B are greater than or equal to a + b.

5.3.14 Let A, B ∈ R(n, n) be positive definite. Prove that the eigenvalues of AB are all positive.

5.3.15 Consider the quadratic form

\[
q(x) = (n + 1) \sum_{i=1}^n x_i^2 - \left( \sum_{i=1}^n x_i \right)^2, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n. \tag{5.3.21}
\]

(i) Find a symmetric matrix A ∈ R(n, n) such that q(x) = x^t A x for x ∈ Rn.

(ii) Compute the eigenvalues of A to determine whether q or A is positive definite.

5.3.16 Let A, B ∈ R(n, n) where A is positive definite and B is positive semi-definite. Establish the inequality

det(A + B) ≥ det(A) + det(B) (5.3.22)

and show that equality in (5.3.22) holds only when B = 0.

5.3.17 Let A, B ∈ R(n, n) be positive definite such that B − A is positive semi-definite. Prove that any solution λ of the equation

det(λA − B) = 0, (5.3.23)

must be real and satisfy λ ≥ 1.

5.3.18 Let U be an n-dimensional vector space with a positive definite scalar product (·, ·) and T ∈ L(U) a positive definite self-adjoint mapping. Given b ∈ U, consider the function

\[
f(u) = \frac{1}{2}(u, T(u)) - (u, b), \quad u \in U. \tag{5.3.24}
\]

(i) Show that the non-homogeneous equation

T(x) = b (5.3.25)

enjoys the following variational principle: x ∈ U solves (5.3.25) if and only if it is the minimum point of f, i.e. f(u) ≥ f(x) for any u ∈ U.

(ii) Show without using the invertibility of T that the solution to (5.3.25) is unique.

5.3.19 Use the notation of the previous exercise and consider the quadratic form q(u) = (u, T(u)) (u ∈ U) where T ∈ L(U) is positive semi-definite. Prove that q is convex. That is, for any α, β ≥ 0 satisfying α + β = 1, there holds

q(αu + βv) ≤ αq(u) + βq(v), u, v ∈ U. (5.3.26)

5.4 Alternative characterizations of positive definite matrices

In this section, we present two alternative characterizations of positive definite matrices that are useful in applications. The first one involves using determinants and the second one involves an elegant matrix decomposition.

Theorem 5.10 Let A = (aij) ∈ R(n, n) be symmetric. The matrix A is positive definite if and only if all its leading principal minors are positive, that is, if and only if

\[
a_{11} > 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad \dots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} > 0. \tag{5.4.1}
\]

Proof First assume that A is positive definite. For any k = 1, . . . , n, set

\[
A_k = \begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \cdots & \cdots & \cdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix}. \tag{5.4.2}
\]

For any y = (y1, . . . , yk)^t ∈ Rk, y ≠ 0, we take x = (y1, . . . , yk, 0, . . . , 0)^t ∈ Rn. Thus, there holds

\[
y^t A_k y = \sum_{i,j=1}^k a_{ij} y_i y_j = x^t A x > 0. \tag{5.4.3}
\]


So Ak ∈ R(k, k) is positive definite. Using Theorem 5.9, there is a nonsingularmatrix B ∈ R(k, k) such that Ak = BtB. Thus, det(Ak) = det(BtB) =det(Bt ) det(B) = det(B)2 > 0, and (5.4.1) follows.

We next assume that (5.4.1) holds and we show that A is positive definite byan inductive argument.

If n = 1, there is nothing to show.Assume that the assertion is valid at n− 1 (n ≥ 2).We prove that A is positive definite at n ≥ 2.To proceed, we rewrite A in a blocked form as

A =(

An−1 α

αt ann

), (5.4.4)

where α = (a1n, . . . , an−1,n)t ∈ Rn−1. By the inductive assumption at n− 1,

we know that An−1 is positive definite. Hence, applying Theorem 5.9, we havea nonsingular matrix B ∈ R(n− 1, n− 1) such that An−1 = BtB. Thus

A =(

An−1 α

αt ann

)=(

Bt 0

0 1

)(In−1 β

βt ann

)(B 0

0 1

), (5.4.5)

if we choose β = (Bt )−1α ≡ (b1, . . . , bn−1)t ∈ Rn−1. Since det(A) > 0, we

obtain, after making some suitable row operations, the result

det(A)

det(An−1)=det

(In−1 β

βt ann

)=ann − b2

1 − · · · − b2n−1 > 0. (5.4.6)

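Now for $x \in \mathbb{R}^n$ so that

\[
y = \begin{pmatrix} B & 0 \\ 0 & 1 \end{pmatrix} x \equiv (y_1, \dots, y_{n-1}, y_n)^t, \tag{5.4.7}
\]

we have

\[
\begin{aligned}
x^t A x &= y^t \begin{pmatrix} I_{n-1} & \beta \\ \beta^t & a_{nn} \end{pmatrix} y \\
&= y_1^2 + \cdots + y_{n-1}^2 + 2(b_1 y_1 + \cdots + b_{n-1} y_{n-1}) y_n + a_{nn} y_n^2 \\
&= (y_1 + b_1 y_n)^2 + \cdots + (y_{n-1} + b_{n-1} y_n)^2 + (a_{nn} - b_1^2 - \cdots - b_{n-1}^2) y_n^2,
\end{aligned} \tag{5.4.8}
\]

which is clearly positive whenever $y \neq 0$, or equivalently, $x \neq 0$, in view of the condition (5.4.6).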

Therefore the proof is complete.
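As a numerical illustration of Theorem 5.10, the following is a minimal sketch that tests a symmetric matrix for positive definiteness by evaluating its leading principal minors exactly. The helper names `det`, `leading_principal_minors` and `is_positive_definite` are chosen here for illustration and are not part of the text; symmetry of the input is assumed rather than checked.

```python
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n = len(a)
    sign = 1
    result = Fraction(1)
    for c in range(n):
        # Look for a nonzero pivot in column c, at or below the diagonal.
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]  # a row swap flips the sign
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return sign * result


def leading_principal_minors(m):
    """Return det(A_1), ..., det(A_n) as in (5.4.1) and (5.4.2)."""
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]


def is_positive_definite(m):
    """Criterion of Theorem 5.10; the matrix m is assumed symmetric."""
    return all(d > 0 for d in leading_principal_minors(m))


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
C = [[1, 2], [2, 1]]
print(leading_principal_minors(A))  # the minors are 2, 3, 4
print(is_positive_definite(A), is_positive_definite(C))
```

The second matrix fails the test because its leading principal minor of order 2 equals $1 - 4 = -3$.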


Note that, if we take $x = (x_1, \dots, x_n)^t \in \mathbb{R}^n$ so that all the nonvanishing components of $x$, if any, are given by

\[
x_{i_1} = y_1, \dots, x_{i_k} = y_k, \quad i_1, \dots, i_k = 1, \dots, n, \quad i_1 < \cdots < i_k, \tag{5.4.9}
\]

(if $k = 1$ it is understood that $x$ has only one nonvanishing component at $x_{i_1} = y_1$, if any) then for $y = (y_1, \dots, y_k)^t \in \mathbb{R}^k$ there holds

\[
y^t A_{i_1,\dots,i_k}\, y = \sum_{l,m=1}^{k} a_{i_l i_m} y_l y_m = x^t A x, \tag{5.4.10}
\]

where

\[
A_{i_1,\dots,i_k} = \begin{pmatrix} a_{i_1 i_1} & \cdots & a_{i_1 i_k} \\ \cdots & \cdots & \cdots \\ a_{i_k i_1} & \cdots & a_{i_k i_k} \end{pmatrix} \tag{5.4.11}
\]

is a submatrix of $A$ obtained from deleting all the $i$th rows and $j$th columns of $A$ for $i, j = 1, \dots, n$ and $i, j \neq i_1, \dots, i_k$. The quantity $\det(A_{i_1,\dots,i_k})$ is referred to as a principal minor of $A$ of order $k$. Such a principal minor becomes a leading principal minor when $i_1 = 1, \dots, i_k = k$ with $A_{1,\dots,k} = A_k$. For $A = (a_{ij})$ the principal minors of order 1 are all its diagonal entries, $a_{11}, \dots, a_{nn}$. It is clear that if $A$ is positive definite then so is $A_{i_1,\dots,i_k}$. Hence $\det(A_{i_1,\dots,i_k}) > 0$. Therefore we arrive at the following slightly strengthened version of Theorem 5.10.

Theorem 5.11 Let $A = (a_{ij}) \in \mathbb{R}(n,n)$ be symmetric. The matrix $A$ is positive definite if and only if all its principal minors are positive, that is, if and only if

\[
a_{ii} > 0, \quad
\begin{vmatrix} a_{i_1 i_1} & \cdots & a_{i_1 i_k} \\ \cdots & \cdots & \cdots \\ a_{i_k i_1} & \cdots & a_{i_k i_k} \end{vmatrix} > 0, \tag{5.4.12}
\]

for $i, i_1, \dots, i_k = 1, \dots, n$, $i_1 < \cdots < i_k$, $k = 2, \dots, n$.

We now pursue the decomposition of a positive definite matrix as another characterization of matrices of this kind.

Theorem 5.12 Let $A \in \mathbb{R}(n,n)$ be symmetric. The matrix $A$ is positive definite if and only if there is a nonsingular lower triangular matrix $L \in \mathbb{R}(n,n)$ such that $A = LL^t$. Moreover, when $A$ is positive definite, there is a unique $L$ with the property that all the diagonal entries of $L$ are positive.


Proof If there is a nonsingular $L \in \mathbb{R}(n,n)$ such that $A = LL^t$, we see in view of Theorem 5.9 that $A$ is of course positive definite.

Now we assume $A = (a_{ij}) \in \mathbb{R}(n,n)$ is positive definite. We look for a unique lower triangular matrix $L = (l_{ij}) \in \mathbb{R}(n,n)$ such that $A = LL^t$ so that all the diagonal entries of $L$ are positive, $l_{11} > 0, \dots, l_{nn} > 0$.

We again use induction.

When $n = 1$, then $A = (a_{11})$ with $a_{11} > 0$. So the unique choice is $L = (\sqrt{a_{11}})$.

Assume that the assertion is valid at $n-1$ ($n \geq 2$). We establish the decomposition at $n$ ($n \geq 2$).

To proceed, we rewrite $A$ in the form (5.4.4). In view of Theorem 5.10, the matrix $A_{n-1}$ is positive definite. Thus there is a unique lower triangular matrix $L_1 \in \mathbb{R}(n-1,n-1)$, with positive diagonal entries, so that $A_{n-1} = L_1 L_1^t$.

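Now take $L \in \mathbb{R}(n,n)$ with

\[
L = \begin{pmatrix} L_2 & 0 \\ \gamma^t & a \end{pmatrix}, \tag{5.4.13}
\]

where $L_2 \in \mathbb{R}(n-1,n-1)$ is a lower triangular matrix, $\gamma \in \mathbb{R}^{n-1}$ is a column vector, and $a \in \mathbb{R}$ is a suitable number. Then, if we set $A = LL^t$, we obtain

\[
A = \begin{pmatrix} A_{n-1} & \alpha \\ \alpha^t & a_{nn} \end{pmatrix}
= \begin{pmatrix} L_2 & 0 \\ \gamma^t & a \end{pmatrix}
\begin{pmatrix} L_2^t & \gamma \\ 0 & a \end{pmatrix}
= \begin{pmatrix} L_2 L_2^t & L_2\gamma \\ (L_2\gamma)^t & \gamma^t\gamma + a^2 \end{pmatrix}. \tag{5.4.14}
\]

Therefore we arrive at the relations

\[
A_{n-1} = L_2 L_2^t, \quad \alpha = L_2\gamma, \quad \gamma^t\gamma + a^2 = a_{nn}. \tag{5.4.15}
\]

If we require that all the diagonal entries of $L_2$ be positive, then the inductive assumption leads to $L_2 = L_1$. Hence the vector $\gamma$ is also uniquely determined, $\gamma = L_1^{-1}\alpha$. So it remains to show that the number $a$ may be uniquely determined as well. For this purpose, we need to show in (5.4.15) that

\[
a_{nn} - \gamma^t\gamma > 0. \tag{5.4.16}
\]

In fact, in (5.4.5), the matrix $B$ is any nonsingular element in $\mathbb{R}(n-1,n-1)$ that gives us $A_{n-1} = B^t B$. Thus, from the relation $A_{n-1} = L_1 L_1^t$ here we may specify $L_1 = B^t$ in (5.4.5) so that $\beta = \gamma$ there. That is,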


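\[
A = \begin{pmatrix} A_{n-1} & \alpha \\ \alpha^t & a_{nn} \end{pmatrix}
= \begin{pmatrix} L_1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} I_{n-1} & \gamma \\ \gamma^t & a_{nn} \end{pmatrix}
\begin{pmatrix} L_1^t & 0 \\ 0 & 1 \end{pmatrix}. \tag{5.4.17}
\]

Hence (5.4.6), namely, (5.4.16), follows immediately. Therefore the last relation in (5.4.15) leads to the unique determination of the number $a$:

\[
a = \sqrt{a_{nn} - \gamma^t\gamma} > 0. \tag{5.4.18}
\]

The proof follows.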

The above-described characterization that a positive definite $n \times n$ matrix can be decomposed as the product of a unique lower triangular matrix $L$, with positive diagonal entries, and its transpose, is known as the Cholesky decomposition theorem, which has wide applications in numerous areas. Through resolving the Cholesky relation $A = LL^t$, it may easily be seen that the lower triangular matrix $L = (l_{ij})$ can actually be constructed from $A = (a_{ij})$ explicitly following the formulas

\[
\begin{cases}
l_{11} = \sqrt{a_{11}}, \\[4pt]
l_{i1} = \dfrac{1}{l_{11}}\, a_{i1}, & i = 2, \dots, n, \\[8pt]
l_{ii} = \sqrt{a_{ii} - \displaystyle\sum_{j=1}^{i-1} l_{ij}^2}, & i = 2, \dots, n, \\[8pt]
l_{ij} = \dfrac{1}{l_{jj}} \left( a_{ij} - \displaystyle\sum_{k=1}^{j-1} l_{ik} l_{jk} \right), & i = j+1, \dots, n, \quad j = 2, \dots, n.
\end{cases} \tag{5.4.19}
\]

Thus, if $A \in \mathbb{R}(n,n)$ is nonsingular, the product $AA^t$ may always be reduced into the form $LL^t$ for a unique lower triangular matrix $L$ with positive diagonal entries.
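The formulas (5.4.19) translate directly into a column-by-column procedure. The following is a minimal sketch in floating-point arithmetic; the function name `cholesky` and the error raised for a non-positive radicand are choices made here for illustration, and the input is assumed to be symmetric.

```python
import math


def cholesky(a):
    """Lower triangular L with positive diagonal and A = L L^t, per (5.4.19).

    The matrix a (a list of rows) is assumed symmetric. A non-positive
    radicand signals that a is not positive definite.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum_{k<j} l_jk^2).
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            t = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = t / L[j][j]
    return L


A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
L = cholesky(A)
for row in L:
    print(row)
```

For this sample matrix the factor has rows $(2, 0, 0)$, $(6, 1, 0)$, $(-8, 5, 3)$, which can be confirmed by multiplying out $LL^t$.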

Exercises

5.4.1 Let $a_1, a_2, a_3$ be real numbers and set

\[
A = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_2 & a_3 & a_1 \\ a_3 & a_1 & a_2 \end{pmatrix}. \tag{5.4.20}
\]

Prove that $A$ can never be positive definite no matter how $a_1, a_2, a_3$ are chosen.



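5.4.2 Consider the quadratic form

\[
q(x) = (x_1 + a_1 x_2)^2 + (x_2 + a_2 x_3)^2 + (x_3 + a_3 x_1)^2, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \in \mathbb{R}^3, \tag{5.4.21}
\]

where $a_1, a_2, a_3 \in \mathbb{R}$. Find a necessary and sufficient condition on $a_1, a_2, a_3$ for $q$ to be positive definite. Can you extend your finding to the case over $\mathbb{R}^n$?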

5.4.3 Let $A = (a_{ij}) \in \mathbb{R}(n,n)$ be symmetric. Prove that if the matrix $A$ is positive semi-definite then all its leading principal minors are non-negative,

\[
a_{11} \geq 0, \quad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \geq 0, \quad \dots, \quad
\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} \geq 0. \tag{5.4.22}
\]

Is the converse true?

5.4.4 Investigate whether Theorem 5.12 can be extended so that the lower triangular matrix may be replaced by an upper triangular matrix so that the theorem may now read: A symmetric matrix $A$ is positive definite if and only if there is a nonsingular upper triangular matrix $U$ such that $A = UU^t$. Furthermore, if $A$ is positive definite, then there is a unique upper triangular matrix $U$ with positive diagonal entries such that $A = UU^t$.

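5.4.5 Assume that $A = (a_{ij}) \in \mathbb{R}(n,n)$ is positive definite and rewrite $A$ as

\[
A = \begin{pmatrix} A_{n-1} & \alpha \\ \alpha^t & a_{nn} \end{pmatrix}, \tag{5.4.23}
\]

where $A_{n-1} \in \mathbb{R}(n-1,n-1)$ is positive definite by Theorem 5.10 and $\alpha$ is a column vector in $\mathbb{R}^{n-1}$.

(i) Show that the matrix equation

\[
\begin{pmatrix} I_{n-1} & \beta \\ 0 & 1 \end{pmatrix}^t
\begin{pmatrix} A_{n-1} & \alpha \\ \alpha^t & a_{nn} \end{pmatrix}
\begin{pmatrix} I_{n-1} & \beta \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} A_{n-1} & 0 \\ 0 & a_{nn} + a \end{pmatrix} \tag{5.4.24}
\]

has a unique solution for some $\beta \in \mathbb{R}^{n-1}$ and $a \leq 0$.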


(ii) Use (i) to establish the inequality

det(A) ≤ det(An−1)ann. (5.4.25)

(iii) Use (ii) and induction to establish the general conclusion

det(A) ≤ a11 · · · ann. (5.4.26)

(iv) Show that (5.4.26) still holds when A is positive semi-definite.

5.4.6 Let $A \in \mathbb{R}(n,n)$ and view $A$ as made of $n$ column vectors $u_1, \dots, u_n$ in $\mathbb{R}^n$, which is equipped with the standard Euclidean scalar product.

(i) Show that $A^t A$ is the metric or Gram matrix of $u_1, \dots, u_n \in \mathbb{R}^n$.

(ii) Use the fact that $A^t A$ is positive semi-definite and (5.4.26) to prove that

\[
|\det(A)| \leq \|u_1\| \cdots \|u_n\|. \tag{5.4.27}
\]

5.4.7 Let $A = (a_{ij}) \in \mathbb{R}(n,n)$ and let $a \geq 0$ be a bound of the entries of $A$ satisfying

\[
|a_{ij}| \leq a, \quad i, j = 1, \dots, n. \tag{5.4.28}
\]

Apply the conclusion of the previous exercise to establish the following Hadamard inequality for determinants,

\[
|\det(A)| \leq a^n n^{\frac{n}{2}}. \tag{5.4.29}
\]


5.4.8 Consider the matrix

\[
A = \begin{pmatrix} 2 & 1 & -1 \\ 1 & 3 & 1 \\ -1 & 1 & 2 \end{pmatrix}. \tag{5.4.30}
\]

(i) Apply Theorem 5.10 to show that $A$ is positive definite.

(ii) Find the unique lower triangular matrix $L \in \mathbb{R}(3,3)$ stated in Theorem 5.12 such that $A = LL^t$.

5.5 Commutativity of self-adjoint mappings

We continue our discussion about self-adjoint mappings over a real finite-dimensional vector space $U$ with a positive definite scalar product $(\cdot,\cdot)$.

The main focus of this section is to characterize a situation when two mappings may be simultaneously diagonalized.


Theorem 5.13 Let $S, T \in L(U)$ be self-adjoint. Then there is an orthonormal basis of $U$ consisting of eigenvectors of both $S, T$ if and only if $S$ and $T$ commute, or $S \circ T = T \circ S$.

Proof Let $\lambda_1, \dots, \lambda_k$ be all the distinct eigenvalues of $T$ and $E_{\lambda_1}, \dots, E_{\lambda_k}$ the associated eigenspaces, which are known to be mutually perpendicular. Assume that $S, T \in L(U)$ commute. Then, for any $u \in E_{\lambda_i}$ ($i = 1, \dots, k$), we have

\[
T(S(u)) = S(T(u)) = S(\lambda_i u) = \lambda_i S(u). \tag{5.5.1}
\]

Thus $S(u) \in E_{\lambda_i}$. In other words, each $E_{\lambda_i}$ is invariant under $S$. Since $S$ is self-adjoint, each $E_{\lambda_i}$ has an orthonormal basis, say $\{u_{i,1}, \dots, u_{i,m_i}\}$, consisting of the eigenvectors of $S$. Therefore, the set of vectors

\[
\{u_{1,1}, \dots, u_{1,m_1}, \dots, u_{k,1}, \dots, u_{k,m_k}\} \tag{5.5.2}
\]

is an orthonormal basis of $U$ consisting of the eigenvectors of both $S$ and $T$.

Conversely, if $\{u_1, \dots, u_n\}$ is an orthonormal basis of $U$ consisting of the eigenvectors of both $S$ and $T$, then

\[
S(u_i) = \varepsilon_i u_i, \quad T(u_i) = \lambda_i u_i, \quad \varepsilon_i, \lambda_i \in \mathbb{R}, \quad i = 1, \dots, n. \tag{5.5.3}
\]

Thus $T(S(u_i)) = \varepsilon_i \lambda_i u_i = S(T(u_i))$ ($i = 1, \dots, n$), which establishes $S \circ T = T \circ S$ as anticipated.

Note that although the theorem says that, when $S$ and $T$ commute and $n = \dim(U)$, there are $n$ mutually orthogonal vectors that are the eigenvectors of $S$ and $T$ simultaneously, it does not say that an eigenvector of $S$ or $T$ must also be an eigenvector of $T$ or $S$. For example, we may take $S = I$ (the identity mapping). Then $S$ commutes with any mapping and any nonzero vector $u$ is an eigenvector of $S$, but $u$ cannot be an eigenvector of all self-adjoint mappings.

The matrix version of Theorem 5.13 is easily stated and proved: Two symmetric matrices $A, B \in \mathbb{R}(n,n)$ are commutative, $AB = BA$, if and only if there is an orthogonal matrix $P \in \mathbb{R}(n,n)$ such that

\[
A = P^t D_A P, \quad B = P^t D_B P, \tag{5.5.4}
\]

where $D_A, D_B$ are diagonal matrices in $\mathbb{R}(n,n)$ whose diagonal entries are the eigenvalues of $A, B$, respectively.
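As a small worked illustration of (5.5.4), with matrices chosen here only for the purpose of the example, take

\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad
B = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \quad
AB = BA = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.
\]

The orthonormal vectors $\frac{1}{\sqrt{2}}(1, 1)^t$ and $\frac{1}{\sqrt{2}}(1, -1)^t$ are eigenvectors of both matrices, with eigenvalues $3, 1$ for $A$ and $0, 2$ for $B$. Placing them as the rows of $P$ gives

\[
P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
A = P^t \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} P, \quad
B = P^t \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix} P.
\]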


Exercises

5.5.1 Let $U$ be a finite-dimensional vector space over a field $\mathbb{F}$. Show that for $T \in L(U)$, if $T$ commutes with any $S \in L(U)$, then there is some $a \in \mathbb{F}$ such that $T = aI$.

5.5.2 If $A, B \in \mathbb{R}(n,n)$ are positive definite matrices and $AB = BA$, show that $AB$ is also positive definite.

5.5.3 Let $U$ be an $n$-dimensional vector space over a field $\mathbb{F}$, where $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$, and $T, S \in L(U)$. Show that, if $T$ has $n$ distinct eigenvalues in $\mathbb{F}$ and $S$ commutes with $T$, then there is a polynomial $p(t)$, of degree at most $n-1$, of the form

\[
p(t) = a_0 + a_1 t + \cdots + a_{n-1} t^{n-1}, \quad a_0, a_1, \dots, a_{n-1} \in \mathbb{F}, \tag{5.5.5}
\]

such that $S = p(T)$.

5.5.4 (Continued from the previous exercise) If $T \in L(U)$ has $n$ distinct eigenvalues in $\mathbb{F}$, show that

\[
C_T = \{S \in L(U) \mid S \circ T = T \circ S\}, \tag{5.5.6}
\]

i.e. the subset of linear mappings over $U$ that commute with $T$, as a subspace of $L(U)$, is exactly $n$-dimensional.

5.5.5 Let $U$ be a finite-dimensional vector space with a positive definite scalar product and $T \in L(U)$.

(i) Show that if $T$ is normal, satisfying $T \circ T' = T' \circ T$, then $\|T(u)\| = \|T'(u)\|$ for any $u \in U$. In particular, $N(T) = N(T')$.

(ii) Show that if $T$ is normal and idempotent, $T^2 = T$, then $T = T'$.

5.6 Mappings between two spaces

In this section, we briefly illustrate how to use self-adjoint mappings to study general mappings between two vector spaces with positive definite scalar products.

Use $U$ and $V$ to denote two real vector spaces of finite dimensions equipped with positive definite scalar products $(\cdot,\cdot)_U$ and $(\cdot,\cdot)_V$, respectively. For $T \in L(U,V)$, since

\[
f(u) = (T(u), v)_V, \quad u \in U, \tag{5.6.1}
\]

defines an element $f$ in $U'$, we know that there is a unique element in $U$ depending on $v$, say $T'(v)$, such that $f(u) = (u, T'(v))_U$. That is,

\[
(T(u), v)_V = (u, T'(v))_U, \quad u \in U, \ v \in V. \tag{5.6.2}
\]


Hence we have obtained a well-defined mapping $T' : V \to U$.

It is straightforward to check that $T'$ is linear. Thus $T' \in L(V,U)$. This construction allows us to consider the composed mappings $T' \circ T \in L(U)$ and $T \circ T' \in L(V)$, which are both seen to be self-adjoint.

Moreover, since

(u, (T ′ ◦ T )(u))U = (T (u), T (u))V ≥ 0, u ∈ U, (5.6.3)

(v, (T ◦ T ′)(v))V = (T ′(v), T ′(v))U ≥ 0, v ∈ V, (5.6.4)

we see that $T' \circ T \in L(U)$ and $T \circ T' \in L(V)$ are both positive semi-definite.

Let $\|\cdot\|_U$ and $\|\cdot\|_V$ be the norms induced from the positive definite scalar products $(\cdot,\cdot)_U$ and $(\cdot,\cdot)_V$, respectively. Recall that the norm of $T$ with respect to $\|\cdot\|_U$ and $\|\cdot\|_V$ is given by

\[
\|T\| = \sup\{\|T(u)\|_V \mid \|u\|_U = 1, \ u \in U\}. \tag{5.6.5}
\]

On the other hand, since $T' \circ T \in L(U)$ is self-adjoint and positive semi-definite, there is an orthonormal basis $\{u_1, \dots, u_n\}$ of $U$ consisting of eigenvectors of $T' \circ T$, associated with the corresponding non-negative eigenvalues $\sigma_1, \dots, \sigma_n$. Therefore, for $u = \sum_{i=1}^{n} a_i u_i \in U$ with $\|u\|_U^2 = \sum_{i=1}^{n} a_i^2 = 1$, we have

\[
\|T(u)\|_V^2 = (T(u), T(u))_V = (u, (T' \circ T)(u))_U = \sum_{i=1}^{n} \sigma_i a_i^2 \leq \sigma_0, \tag{5.6.6}
\]

where

\[
\sigma_0 = \max_{1 \leq i \leq n} \{\sigma_i\}, \tag{5.6.7}
\]

which proves $\|T\| \leq \sqrt{\sigma_0}$. Furthermore, let $i = 1, \dots, n$ be such that $\sigma_0 = \sigma_i$. Then (5.6.5) leads to

\[
\|T\|^2 \geq \|T(u_i)\|_V^2 = (u_i, (T' \circ T)(u_i))_U = \sigma_i = \sigma_0. \tag{5.6.8}
\]

Consequently, we may conclude with

‖T ‖ = √σ0, σ0 is the largest eigenvalue of the mapping T ′ ◦ T . (5.6.9)

In particular, the above also gives us a practical method to compute the norm of a linear mapping $T$ from $U$ into itself by using the generated self-adjoint mapping $T' \circ T$.
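The formula (5.6.9) can be tried out on a small case. The sketch below assumes the standard Euclidean scalar products and a matrix with exactly two columns, so that $A^t A$ is a symmetric $2 \times 2$ matrix whose largest eigenvalue has a closed form. The function names are illustrative choices, not notation from the text.

```python
import math


def gram(a):
    """Return A^t A for a matrix given as a list of rows."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]


def largest_eigenvalue_sym2(s):
    """Largest eigenvalue of a symmetric 2x2 matrix [[p, q], [q, r]]."""
    p, q, r = s[0][0], s[0][1], s[1][1]
    return (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q ** 2)


def operator_norm_two_columns(a):
    """||A|| = sqrt(largest eigenvalue of A^t A), as in (5.6.9).

    Only valid for matrices with exactly two columns.
    """
    return math.sqrt(largest_eigenvalue_sym2(gram(a)))


A = [[1, 1], [0, 1]]
norm = operator_norm_two_columns(A)
print(norm)
```

Here $A^t A$ has rows $(1, 1)$ and $(1, 2)$ with eigenvalues $(3 \pm \sqrt{5})/2$, so the norm is $\sqrt{(3 + \sqrt{5})/2} = (1 + \sqrt{5})/2$.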


If $T \in L(U)$ is already self-adjoint, then, since $\|T\|$ is the square root of the largest eigenvalue of $T^2$, we have the expression

\[
\|T\| = \max_{1 \leq i \leq n} \{|\lambda_i|\}, \tag{5.6.10}
\]

where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $T$.

An immediate consequence of (5.6.9) and (5.6.10) is the elegant formula

\[
\|T\|^2 = \|T' \circ T\|, \quad T \in L(U,V). \tag{5.6.11}
\]

Assume that $T \in L(U)$ is self-adjoint. From (5.6.10), it is not hard to see that $\|T\|$ may also be assessed according to

\[
\|T\| = \sup\{|(u, T(u))| \mid u \in U, \ \|u\| = 1\}. \tag{5.6.12}
\]

In fact, let $\eta$ denote the right-hand side of (5.6.12). Then the Schwarz inequality (4.3.10) implies that $|(u, T(u))| \leq \|T(u)\| \leq \|T\|$ for $u \in U$ satisfying $\|u\| = 1$. So $\eta \leq \|T\|$. On the other hand, let $\lambda$ be the eigenvalue of $T$ such that $\|T\| = |\lambda|$ and $u \in U$ an associated eigenvector with $\|u\| = 1$. Then $\|T\| = |\lambda| = |(u, T(u))| \leq \eta$. Hence (5.6.12) is verified.

We now show how to extend (5.6.12) to evaluate the norm of a general linear mapping between vector spaces with scalar products.

Theorem 5.14 Let $U, V$ be finite-dimensional vector spaces equipped with positive definite scalar products $(\cdot,\cdot)_U$, $(\cdot,\cdot)_V$, respectively. For $T \in L(U,V)$, we have

\[
\|T\| = \sup\{|(T(u), v)_V| \mid u \in U, \ v \in V, \ \|u\|_U = 1, \ \|v\|_V = 1\}. \tag{5.6.13}
\]

Proof Recall that there holds

‖T ‖ = sup{‖T (u)‖V | u ∈ U, ‖u‖U = 1}. (5.6.14)

Thus, for any ε > 0, there is some uε ∈ U with ‖uε‖ = 1 such that

‖T (uε)‖V ≥ ‖T ‖ − ε. (5.6.15)

Furthermore, for T (uε) ∈ V , we have

‖T (uε)‖V = sup{|(T (uε), v)V | | v ∈ V, ‖v‖V = 1}. (5.6.16)

Thus, there is some vε ∈ V with ‖vε‖V = 1 such that

\[
|(T(u_\varepsilon), v_\varepsilon)_V| \geq \|T(u_\varepsilon)\|_V - \varepsilon. \tag{5.6.17}
\]


As a consequence, if we use $\eta$ to denote the right-hand side of (5.6.13), we may combine (5.6.15) and (5.6.17) to obtain $\eta \geq \|T\| - 2\varepsilon$. Since $\varepsilon > 0$ is arbitrary, we have $\eta \geq \|T\|$.

On the other hand, for $u \in U$, $v \in V$ with $\|u\|_U = 1$, $\|v\|_V = 1$, we may use the Schwarz inequality (4.3.10) to get $|(T(u), v)_V| \leq \|T(u)\|_V \leq \|T\|$. Hence $\eta \leq \|T\|$.

Therefore we arrive at ‖T ‖ = η and the proof follows.

An important consequence of Theorem 5.14 is that the norms of a linear mapping, over two vector spaces with positive definite scalar products, and its dual assume the same value.

Theorem 5.15 Let $U, V$ be finite-dimensional vector spaces equipped with positive definite scalar products $(\cdot,\cdot)_U$, $(\cdot,\cdot)_V$, respectively. For $T \in L(U,V)$ and its dual $T' \in L(V,U)$, we have $\|T\| = \|T'\|$. Thus $\|T' \circ T\| = \|T \circ T'\|$ and the largest eigenvalues of the positive semi-definite mappings $T' \circ T \in L(U)$ and $T \circ T' \in L(V)$ are the same.

Proof The fact that $\|T\| = \|T'\|$ may be deduced from applying (5.6.13) to $T'$ and the relation $(T(u), v)_V = (u, T'(v))_U$ ($u \in U$, $v \in V$). The conclusion $\|T' \circ T\| = \|T \circ T'\|$ follows from (5.6.11). That the largest eigenvalues of the positive semi-definite mappings $T' \circ T$ and $T \circ T'$ are the same is a consequence of the eigenvalue characterization of the norm of a self-adjoint mapping stated in (5.6.10) and $\|T' \circ T\| = \|T \circ T'\|$.

As an application, we see that, for any matrix $A \in \mathbb{R}(m,n)$, the largest eigenvalues of the symmetric matrices $A^t A \in \mathbb{R}(n,n)$ and $AA^t \in \mathbb{R}(m,m)$ must coincide. In fact, regarding eigenvalues, it is not hard to establish a more general result, stated as follows.

Theorem 5.16 Let $U, V$ be vector spaces over a field $\mathbb{F}$ and $T \in L(U,V)$, $S \in L(V,U)$. Then the nonzero eigenvalues of $S \circ T$ and $T \circ S$ are the same.

Proof Let $\lambda \in \mathbb{F}$ be a nonzero eigenvalue of $S \circ T$ and $u \in U$ an associated eigenvector. We show that $\lambda$ is also an eigenvalue of $T \circ S$. In fact, from

\[
(S \circ T)(u) = \lambda u, \tag{5.6.18}
\]

we have $(T \circ S)(T(u)) = \lambda T(u)$. In order to show that $\lambda$ is an eigenvalue of $T \circ S$, it suffices to show $T(u) \neq 0$. However, such a property is already seen in (5.6.18) since $\lambda \neq 0$ and $u \neq 0$.

Interchanging $S$ and $T$, we see that, if $\lambda$ is a nonzero eigenvalue of $T \circ S$, then it is also an eigenvalue of $S \circ T$.

It is interesting that we do not require any additional properties for the vector spaces $U, V$ in order for Theorem 5.16 to hold.

We next study another important problem in applications known as the least squares approximation.
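A small worked instance of Theorem 5.16, with matrices chosen here purely for illustration: let $S$ and $T$ be given by

\[
S = \begin{pmatrix} 1 & 2 \end{pmatrix} \in \mathbb{R}(1,2), \quad
T = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \in \mathbb{R}(2,1), \quad
ST = (11), \quad
TS = \begin{pmatrix} 3 & 6 \\ 4 & 8 \end{pmatrix}.
\]

The characteristic polynomial of $TS$ is $\lambda^2 - 11\lambda$, so its eigenvalues are $0$ and $11$. The only nonzero eigenvalue, $11$, is exactly the eigenvalue of $ST$, and $T(1) = (3, 4)^t$ is an associated eigenvector of $TS$, in agreement with the argument of the proof.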

Let T ∈ L(U, V ) and v ∈ V . Consider the optimization problem

\[
\eta \equiv \inf\{\|T(u) - v\|_V^2 \mid u \in U\}. \tag{5.6.19}
\]

For simplicity, we first assume that (5.6.19) has a solution, that is, there is some $x \in U$ such that $\|T(x) - v\|_V^2 = \eta$, and we look for some appropriate condition the solution $x$ will fulfill. For this purpose, we set

\[
f(\varepsilon) = \|T(x + \varepsilon y) - v\|_V^2, \quad y \in U, \ \varepsilon \in \mathbb{R}. \tag{5.6.20}
\]

Then $f(0) = \eta \leq f(\varepsilon)$. Hence we may expand the right-hand side of (5.6.20) to obtain

\[
\begin{aligned}
0 = \left( \frac{df}{d\varepsilon} \right)_{\varepsilon=0}
&= (T(x) - v, T(y))_V + (T(y), T(x) - v)_V \\
&= 2((T' \circ T)(x) - T'(v), y)_U, \quad y \in U,
\end{aligned} \tag{5.6.21}
\]

which implies that $x \in U$ is a solution to the equation

\[
(T' \circ T)(x) = T'(v), \quad v \in V. \tag{5.6.22}
\]

This equation is commonly referred to as the normal equation. It is not hard to see that the equation (5.6.22) is always consistent for any $v \in V$. In fact, it is clear that $R(T' \circ T) \subset R(T')$. On the other hand, if $u \in N(T' \circ T)$, then $\|T(u)\|_V^2 = (u, (T' \circ T)(u)) = 0$. Thus $u \in N(T)$. This establishes $N(T) = N(T' \circ T)$. So by the rank equation we deduce $\dim(U) = n(T) + r(T) = n(T' \circ T) + r(T' \circ T)$. Therefore, in view of Theorem 2.9, we get $r(T' \circ T) = r(T) = r(T')$. That is, $R(T' \circ T) = R(T')$, as desired, which proves the solvability of (5.6.22) for any $v \in V$.

Next, let $x$ be a solution to (5.6.22). We show that $x$ solves (5.6.19). In fact, if $y$ is another solution to (5.6.22), then $z = x - y \in N(T' \circ T) = N(T)$. Thus

\[
\|T(y) - v\|_V^2 = \|T(x - z) - v\|_V^2 = \|T(x) - v\|_V^2, \tag{5.6.23}
\]

which shows that the quantity $\|T(x) - v\|_V^2$ is independent of the solution $x$ of (5.6.22). Besides, for any test element $u \in U$, we rewrite $u$ as $u = x + w$ where $x$ is a solution to (5.6.22) and $w \in U$. Then we have

\[
\begin{aligned}
\|T(u) - v\|_V^2 &= \|T(x) - v\|_V^2 + 2(T(x) - v, T(w))_V + \|T(w)\|_V^2 \\
&= \|T(x) - v\|_V^2 + 2((T' \circ T)(x) - T'(v), w)_U + \|T(w)\|_V^2 \\
&= \|T(x) - v\|_V^2 + \|T(w)\|_V^2 \\
&\geq \|T(x) - v\|_V^2.
\end{aligned} \tag{5.6.24}
\]

Consequently, $x$ solves (5.6.19) as anticipated.

Using knowledge about self-adjoint mappings, we can express a solution to the normal equation (5.6.22) explicitly in terms of the eigenvectors and eigenvalues of $T' \circ T$. In fact, let $\{u_1, \dots, u_k, \dots, u_n\}$ be an orthonormal basis of $U$ consisting of the eigenvectors of the positive semi-definite mapping $T' \circ T \in L(U)$ associated with the non-negative eigenvalues $\lambda_1, \dots, \lambda_k, \dots, \lambda_n$ among which $\lambda_1, \dots, \lambda_k$ are positive (if any). Then a solution $x$ of (5.6.22) may be written $x = \sum_{i=1}^{n} a_i u_i$ for some $a_1, \dots, a_n \in \mathbb{R}$. Inserting this into (5.6.22), we obtain

\[
\sum_{i=1}^{k} a_i \lambda_i u_i = T'(v). \tag{5.6.25}
\]

Thus, taking the scalar product of both sides of the above with $u_i$, we find

\[
a_i = \frac{1}{\lambda_i} (u_i, T'(v))_U, \quad i = 1, \dots, k, \tag{5.6.26}
\]

which leads to the following general solution formula for the equation (5.6.22):

\[
x = \sum_{i=1}^{k} \frac{1}{\lambda_i} (u_i, T'(v))_U \, u_i + x_0, \quad x_0 \in N(T), \tag{5.6.27}
\]

since N(T ′ ◦ T ) = N(T ).
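In matrix form, with the standard Euclidean scalar products, the normal equation (5.6.22) reads $A^t A x = A^t b$. The following is a minimal sketch that solves it exactly for a matrix with two columns and trivial null-space, using Cramer's rule on the $2 \times 2$ system. The function name and the sample data (fitting a line $c + mt$ through the points $(0,1)$, $(1,2)$, $(2,2)$) are assumptions made for this illustration only.

```python
from fractions import Fraction


def solve_normal_equation_2(a, b):
    """Solve (A^t A) x = A^t b exactly for a matrix A with two columns.

    a is a list of rows [a_i1, a_i2] and b the list of right-hand sides.
    The null-space of A is assumed trivial, so that A^t A is invertible.
    """
    g11 = sum(Fraction(r[0]) * r[0] for r in a)
    g12 = sum(Fraction(r[0]) * r[1] for r in a)
    g22 = sum(Fraction(r[1]) * r[1] for r in a)
    c1 = sum(Fraction(r[0]) * bi for r, bi in zip(a, b))
    c2 = sum(Fraction(r[1]) * bi for r, bi in zip(a, b))
    d = g11 * g22 - g12 * g12  # det(A^t A)
    if d == 0:
        raise ValueError("A^t A is singular; the minimizer is not unique")
    # Cramer's rule for the 2x2 system.
    return [(c1 * g22 - g12 * c2) / d, (g11 * c2 - g12 * c1) / d]


# Fit b ~ c + m t at t = 0, 1, 2: the columns of A are (1, 1, 1) and (0, 1, 2).
A = [[1, 0], [1, 1], [1, 2]]
b = [1, 2, 2]
c, m = solve_normal_equation_2(A, b)
print(c, m)
```

For these data $A^t A$ has rows $(3, 3)$ and $(3, 5)$ and $A^t b = (5, 6)^t$, which gives $c = 7/6$ and $m = 1/2$. When $N(A)$ is nontrivial the solution is determined only up to an element of $N(A)$, as in (5.6.27), and this sketch raises an error instead.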

Exercises

5.6.1 Show that the mapping $T' : V \to U$ defined in (5.6.2) is linear.

5.6.2 Let $U, V$ be finite-dimensional vector spaces with positive definite scalar products. For any $T \in L(U,V)$ show that the mapping $T' \circ T \in L(U)$ is positive definite if and only if $n(T) = 0$.


5.6.3 Let $U, V$ be finite-dimensional vector spaces with positive definite scalar products, $(\cdot,\cdot)_U$, $(\cdot,\cdot)_V$, respectively. For any $T \in L(U,V)$, define $T' \in L(V,U)$ by (5.6.2).

(i) Establish the relations $R(T)^\perp = N(T')$ and $R(T) = N(T')^\perp$ (the latter is also known as the Fredholm alternative).

(ii) Prove directly, without resorting to Theorem 2.9, that $r(T) = r(T')$.

(iii) Prove that $r(T \circ T') = r(T' \circ T) = r(T)$.

5.6.4 Apply the previous exercise to verify the validity of the rank relation

\[
r(A^t A) = r(AA^t) = r(A) = r(A^t), \quad A \in \mathbb{R}(m,n). \tag{5.6.28}
\]

5.6.5 Let $T \in L(\mathbb{R}^2, \mathbb{R}^3)$ be defined by

\[
T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 1 & 2 \\ -1 & 1 \\ 2 & 3 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2, \tag{5.6.29}
\]

where $\mathbb{R}^2$ and $\mathbb{R}^3$ are equipped with the standard Euclidean scalar products.

(i) Find the eigenvalues of $T' \circ T$ and compute $\|T\|$.

(ii) Find the eigenvalues of $T \circ T'$ and verify that $T \circ T'$ is positive semi-definite but not positive definite (cf. part (ii) of (2)).

(iii) Check to see that the largest eigenvalues of $T' \circ T$ and $T \circ T'$ are the same.

(iv) Compute all eigenvalues of $T \circ T'$ and $T' \circ T$ and explain the results in view of Theorem 5.16.

5.6.6 Let $A \in \mathbb{F}(m,n)$ and $B \in \mathbb{F}(n,m)$ where $m < n$ and consider $AB \in \mathbb{F}(m,m)$ and $BA \in \mathbb{F}(n,n)$. Show that $BA$ has at most $m+1$ distinct eigenvalues in $\mathbb{F}$.

5.6.7 (A specialization of the previous exercise) Let $u, v \in \mathbb{F}^n$ be column vectors. Hence $u^t v \in \mathbb{F}$ and $vu^t \in \mathbb{F}(n,n)$.

(i) Show that the matrix $vu^t$ has a nonzero eigenvalue in $\mathbb{F}$ only if $u^t v \neq 0$.

(ii) Show that, when $u^t v \neq 0$, the only nonzero eigenvalue of $vu^t$ in $\mathbb{F}$ is $u^t v$ so that $v$ is an associated eigenvector.

(iii) Show that, when $u^t v \neq 0$, the eigenspace of $vu^t$ associated with the eigenvalue $u^t v$ is one-dimensional.


5.6.8 Consider the Euclidean space $\mathbb{R}^k$ equipped with the standard inner product and let $A \in \mathbb{R}(m,n)$. Formulate a solution of the following optimization problem

\[
\eta \equiv \inf\{\|Ax - b\|^2 \mid x \in \mathbb{R}^n\}, \quad b \in \mathbb{R}^m, \tag{5.6.30}
\]

by deriving a matrix version of the normal equation.

5.6.9 Consider a parametrized plane in $\mathbb{R}^3$ given by

\[
x_1 = y_1 + y_2, \quad x_2 = y_1 - y_2, \quad x_3 = y_1 + y_2, \quad y_1, y_2 \in \mathbb{R}. \tag{5.6.31}
\]

Use the least squares approximation to find a point in the plane that is the closest to the point in $\mathbb{R}^3$ with the coordinates $x_1 = 2$, $x_2 = 1$, $x_3 = 3$.

6

Complex quadratic forms and self-adjoint mappings

In this chapter we extend our study on real quadratic forms and self-adjoint mappings to the complex situation. We begin with a discussion on the complex version of bilinear forms and the Hermitian structures. We will relate the Hermitian structure of a bilinear form with representing it by a unique self-adjoint mapping. Then we establish the main spectrum theorem for self-adjoint mappings. We next focus again on the positive definiteness of self-adjoint mappings. We explore the commutativity of self-adjoint mappings and apply it to obtain the main spectrum theorem for normal mappings. We also show how to use self-adjoint mappings to study a mapping between two spaces.

6.1 Complex sesquilinear and associated quadratic forms

Let $U$ be a finite-dimensional vector space over $\mathbb{C}$. Extending the standard Hermitian scalar product over $\mathbb{C}^n$, we may formulate the notion of a complex 'bilinear' form as follows.

Definition 6.1 A complex-valued function $f : U \times U \to \mathbb{C}$ is called a sesquilinear form, which is also sometimes loosely referred to as a bilinear form, if it satisfies for any $u, v, w \in U$ and $a \in \mathbb{C}$ the following conditions.

(1) $f(u + v, w) = f(u, w) + f(v, w)$, $f(u, v + w) = f(u, v) + f(u, w)$.

(2) $f(au, v) = \overline{a} f(u, v)$, $f(u, av) = a f(u, v)$.

As in the real situation, we may consider how to use a matrix to represent a sesquilinear form. To this end, let $\mathcal{B} = \{u_1, \dots, u_n\}$ be a basis of $U$. For $u, v \in U$, let $x = (x_1, \dots, x_n)^t$, $y = (y_1, \dots, y_n)^t \in \mathbb{C}^n$ denote the coordinate vectors of $u, v$ with respect to the basis $\mathcal{B}$. Then


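\[
f(u, v) = f\left( \sum_{i=1}^{n} x_i u_i, \sum_{j=1}^{n} y_j u_j \right)
= \sum_{i,j=1}^{n} \overline{x}_i f(u_i, u_j) y_j = \overline{x}^t A y = x^\dagger A y, \tag{6.1.1}
\]

where $A = (a_{ij}) = (f(u_i, u_j))$ lies in $\mathbb{C}(n,n)$, which is the matrix representation of the sesquilinear form $f$ with respect to $\mathcal{B}$.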

Let $\tilde{\mathcal{B}} = \{\tilde{u}_1, \dots, \tilde{u}_n\}$ be another basis of $U$ so that the coordinate vectors of $u, v$ are $\tilde{x}, \tilde{y} \in \mathbb{C}^n$ with respect to $\tilde{\mathcal{B}}$. Hence, using $\tilde{A} = (\tilde{a}_{ij}) = (f(\tilde{u}_i, \tilde{u}_j)) \in \mathbb{C}(n,n)$ to denote the matrix representation of the sesquilinear form $f$ with respect to $\tilde{\mathcal{B}}$ and $B = (b_{ij}) \in \mathbb{C}(n,n)$ the basis transition matrix from $\mathcal{B}$ into $\tilde{\mathcal{B}}$ so that

\[
\tilde{u}_j = \sum_{i=1}^{n} b_{ij} u_i, \quad j = 1, \dots, n, \tag{6.1.2}
\]

we know that the relations $x = B\tilde{x}$ and $y = B\tilde{y}$ are valid. Therefore, we can conclude with

\[
f(u, v) = x^\dagger A y = \tilde{x}^\dagger \tilde{A} \tilde{y} = \tilde{x}^\dagger (B^\dagger A B) \tilde{y}. \tag{6.1.3}
\]

Consequently, there holds $\tilde{A} = B^\dagger A B$. As in the real situation, we make the definition that two matrices $A, B \in \mathbb{C}(n,n)$ are said to be Hermitian congruent, or simply congruent, if there is an invertible matrix $C \in \mathbb{C}(n,n)$ such that

\[
A = C^\dagger B C. \tag{6.1.4}
\]

Hence we see that the matrix representations of a sesquilinear form over $U$ with respect to different bases of $U$ are Hermitian congruent.

Let $f : U \times U \to \mathbb{C}$ be a sesquilinear form. Define $q : U \to \mathbb{C}$ by setting

q(u) = f (u, u), u ∈ U. (6.1.5)

As in the real situation, we may call $q$ the quadratic form associated with $f$. We have the following homogeneity property

\[
q(zu) = |z|^2 q(u), \quad u \in U, \ z \in \mathbb{C}. \tag{6.1.6}
\]

Conversely, we can also show that $f$ is uniquely determined by $q$. This fact is in sharp contrast with the real situation.

In fact, since

\[
f(u + v, u + v) = f(u, u) + f(v, v) + f(u, v) + f(v, u), \quad u, v \in U, \tag{6.1.7}
\]

\[
f(u + iv, u + iv) = f(u, u) + f(v, v) + i f(u, v) - i f(v, u), \quad u, v \in U, \tag{6.1.8}
\]


we have the following polarization identity relating a sesquilinear form to its induced quadratic form,

\[
f(u, v) = \frac{1}{2}\big(q(u + v) - q(u) - q(v)\big) - \frac{i}{2}\big(q(u + iv) - q(u) - q(v)\big), \quad u, v \in U. \tag{6.1.9}
\]
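The polarization identity (6.1.9) can be checked numerically on $\mathbb{C}^2$. The sketch below assumes the form $f(x, y) = x^\dagger A y$ for an arbitrarily chosen matrix $A$ that is deliberately not Hermitian, since the identity does not need that assumption. All names and sample values are illustrative.

```python
def f(a, x, y):
    """Sesquilinear form f(x, y) = x^dagger A y (conjugate-linear in x)."""
    n = len(x)
    return sum(complex(x[i]).conjugate() * a[i][j] * y[j]
               for i in range(n) for j in range(n))


def q(a, x):
    """Associated quadratic form q(x) = f(x, x)."""
    return f(a, x, x)


def polarization(a, u, v):
    """Right-hand side of (6.1.9), built from values of q only."""
    u_plus_v = [ui + vi for ui, vi in zip(u, v)]
    u_plus_iv = [ui + 1j * vi for ui, vi in zip(u, v)]
    first = q(a, u_plus_v) - q(a, u) - q(a, v)
    second = q(a, u_plus_iv) - q(a, u) - q(a, v)
    return 0.5 * first - 0.5j * second


A = [[1, 2 + 1j], [0, 1j]]      # not Hermitian
u = [1 + 2j, -1j]
v = [3, 1 - 1j]
lhs = f(A, u, v)
rhs = polarization(A, u, v)
print(lhs, rhs)
```

The two printed values agree up to rounding, which reflects the fact that a sesquilinear form over a complex space is recovered from its quadratic form.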

For $U$, let $\mathcal{B} = \{u_1, \dots, u_n\}$ be a basis, $u \in U$, and $x \in \mathbb{C}^n$ the coordinate vector of $u$ with respect to $\mathcal{B}$. In view of (6.1.1), we get

\[
\begin{aligned}
q(u) = x^\dagger A x &= x^\dagger \left( \frac{1}{2}(A + A^\dagger) + \frac{1}{2}(A - A^\dagger) \right) x \\
&= \frac{1}{2} x^\dagger (A + A^\dagger) x + \frac{1}{2} x^\dagger (A - A^\dagger) x = \Re\{q(u)\} + i\,\Im\{q(u)\}.
\end{aligned} \tag{6.1.10}
\]

In other words, $q$ is real-valued when $A$ is Hermitian, $A = A^\dagger$, which may be checked to be equivalent to the condition

\[
f(u, v) = \overline{f(v, u)}, \quad u, v \in U. \tag{6.1.11}
\]

In fact, if $q$ is real-valued, then replacing $u$ by $iv$ and $v$ by $u$ in (6.1.9), we have

\[
-i f(v, u) = \frac{1}{2}\big(q(u + iv) - q(u) - q(v)\big) - \frac{i}{2}\big(q(u + v) - q(u) - q(v)\big). \tag{6.1.12}
\]

Combining (6.1.9) and (6.1.12), and using the condition that $q$ is real-valued, we arrive at (6.1.11).

Thus we are led to the following definition.

Definition 6.2 A sesquilinear form $f : U \times U \to \mathbb{C}$ is said to be Hermitian if it satisfies the condition (6.1.11).

For any column vectors $x, y \in \mathbb{C}^n$, the standard Hermitian scalar product is given by

\[
(x, y) = x^\dagger y. \tag{6.1.13}
\]

Thus $f : \mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}$ defined by $f(x, y) = (x, y)$ for $x, y \in \mathbb{C}^n$ is a Hermitian sesquilinear form. More generally, with any $n \times n$ Hermitian matrix $A$, we see that

\[
f(x, y) = x^\dagger A y = (x, Ay), \quad x, y \in \mathbb{C}^n, \tag{6.1.14}
\]

is also a Hermitian sesquilinear form. Conversely, the relation (6.1.1) indicates that a Hermitian sesquilinear form over an $n$-dimensional complex vector space is completely represented, with respect to a given basis, by a Hermitian matrix, in terms of the standard Hermitian scalar product over $\mathbb{C}^n$.
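To make the Hermitian condition (6.1.11) concrete, the following minimal sketch evaluates $f(x, y) = x^\dagger A y$ for a sample Hermitian matrix and confirms that $f(x, y)$ equals the complex conjugate of $f(y, x)$ and that $q(x) = f(x, x)$ is real. The matrix and vectors are arbitrary choices made for illustration.

```python
def sesquilinear(a, x, y):
    """f(x, y) = x^dagger A y, conjugate-linear in x and linear in y."""
    n = len(x)
    return sum(complex(x[i]).conjugate() * a[i][j] * y[j]
               for i in range(n) for j in range(n))


def is_hermitian(a):
    """Check A = A^dagger entry by entry."""
    n = len(a)
    return all(complex(a[i][j]) == complex(a[j][i]).conjugate()
               for i in range(n) for j in range(n))


A = [[2, 1j], [-1j, 3]]
x = [1 + 2j, -1j]
y = [3, 1 - 1j]

fxy = sesquilinear(A, x, y)
fyx = sesquilinear(A, y, x)
qx = sesquilinear(A, x, x)
print(is_hermitian(A), fxy, fyx.conjugate(), qx)
```

For this choice a direct computation gives $q(x) = 15$, a real number, as expected for a Hermitian matrix.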

Consider a finite-dimensional complex vector space $U$ with a positive definite scalar product $(\cdot,\cdot)$. Given any sesquilinear form $f : U \times U \to \mathbb{C}$, since

\[
g(v) = f(u, v), \quad v \in U, \tag{6.1.15}
\]

is a linear functional in $v$ depending on $u \in U$, there is a unique vector (say) $T(u) \in U$ such that

\[
f(u, v) = (T(u), v), \quad u, v \in U, \tag{6.1.16}
\]

which in fact defines the correspondence $T$ as an element in $L(U)$. Let $T'$ denote the adjoint of $T$. Then $f$ may also be represented as

\[
f(u, v) = (u, T'(v)), \quad u, v \in U. \tag{6.1.17}
\]

In other words, a sesquilinear form $f$ over $U$ may completely be represented by a linear mapping $T$ or $T'$ from $U$ into itself, in terms of the positive definite scalar product over $U$, alternatively through the expression (6.1.16) or (6.1.17).

Applying (6.1.9) to $f(u, v) = (u, T(v))$ ($u, v \in U$) for any $T \in L(U)$, we obtain the following useful polarization identity for $T$:

\[
\begin{aligned}
(u, T(v)) = {} & \frac{1}{2}\big((u + v, T(u + v)) - (u, T(u)) - (v, T(v))\big) \\
& - \frac{i}{2}\big((u + iv, T(u + iv)) - (u, T(u)) - (v, T(v))\big), \quad u, v \in U.
\end{aligned} \tag{6.1.18}
\]

The Hermitian situation is especially interesting for us.

Theorem 6.3 Let $f$ be a sesquilinear form over a finite-dimensional complex vector space $U$ with a positive definite scalar product $(\cdot,\cdot)$ and represented by the mapping $T \in L(U)$ through (6.1.16) or (6.1.17). Then $f$ is Hermitian if and only if $T$ is self-adjoint, $T = T'$.

Proof If $f$ is Hermitian, then

\[
f(u, v) = \overline{f(v, u)} = \overline{(T(v), u)} = (u, T(v)), \quad u, v \in U. \tag{6.1.19}
\]

In view of (6.1.17) and (6.1.19), we arrive at $T = T'$. The converse is similar.

Let $\mathcal{B} = \{u_1, \dots, u_n\}$ be an orthonormal basis of $U$, $T \in L(U)$ be self-adjoint, and $A = (a_{ij}) \in \mathbb{C}(n,n)$ the matrix representation of $T$ with respect to $\mathcal{B}$. Then $T(u_j) = \sum_{i=1}^{n} a_{ij} u_i$ ($j = 1, \dots, n$) so that

\[
a_{ij} = (u_i, T(u_j)) = (T(u_i), u_j) = \overline{a}_{ji}, \quad i, j = 1, \dots, n. \tag{6.1.20}
\]

Hence $A = A^\dagger$. Of course the converse is true too. Therefore $T$ is self-adjoint if and only if the matrix representation of $T$ with respect to any orthonormal basis is Hermitian. Consequently, a self-adjoint mapping over a complex vector space with a positive definite scalar product is interchangeably referred to as Hermitian as well.

Exercises

6.1.1 Let $U$ be a complex vector space with a positive definite scalar product $(\cdot,\cdot)$ and $T \in L(U)$. Show that $T = 0$ if and only if $(u, T(u)) = 0$ for any $u \in U$. Give an example to show that the same may not hold for a real vector space with a positive definite scalar product.

6.1.2 Let $U$ be a complex vector space with a positive definite scalar product $(\cdot,\cdot)$ and $T \in L(U)$. Show that $T$ is self-adjoint or Hermitian if and only if $(u, T(u)) = (T(u), u)$ for any $u \in U$. Give an example to show that the same may not hold for a real vector space with a positive definite scalar product.

6.1.3 Let $U$ be a complex vector space with a basis $\mathcal{B} = \{u_1, \dots, u_n\}$ and $A = (f(u_i, u_j)) \in \mathbb{C}(n,n)$ the matrix representation of $f$ with respect to $\mathcal{B}$. Show that $f$ is Hermitian if and only if $A$ is Hermitian.

6.1.4 Let $U$ be a complex vector space with a positive definite scalar product $(\cdot,\cdot)$ and $T \in L(U)$. Show that $T$ can be uniquely decomposed into a sum $T = R + iS$ where $R, S \in L(U)$ are both self-adjoint.

6.1.5 Show that the inverse of an invertible self-adjoint mapping is also self-adjoint.

6.1.6 Show that $I_n$ and $-I_n$ cannot be Hermitian congruent, although they are congruent as elements in $\mathbb{C}(n,n)$.

6.2 Complex self-adjoint mappings

As in the real situation, we now show that complex self-adjoint or Hermitian mappings are completely characterized by their spectra as well.

Theorem 6.4 Let $U$ be a complex vector space with a positive definite scalar product $(\cdot,\cdot)$ and $T \in L(U)$. If $T$ is self-adjoint, then the following are valid.

(1) The eigenvalues of $T$ are all real.

(2) Let $\lambda_1, \dots, \lambda_k$ be all the distinct eigenvalues of $T$. Then $T$ may be reduced over the direct sum of mutually perpendicular eigenspaces

\[
U = E_{\lambda_1} \oplus \cdots \oplus E_{\lambda_k}. \tag{6.2.1}
\]

(3) There is an orthonormal basis of $U$ consisting of eigenvectors of $T$.

Proof Let $T$ be self-adjoint and $\lambda \in \mathbb{C}$ an eigenvalue of $T$ with $u \in U$ an associated eigenvector. Then, using $T(u) = \lambda u$, we have

\[
\lambda \|u\|^2 = (u, T(u)) = (T(u), u) = \overline{\lambda} \|u\|^2, \tag{6.2.2}
\]

which gives us $\lambda = \overline{\lambda}$ so that $\lambda \in \mathbb{R}$. This establishes (1). Note that this proof does not assume that $U$ is finite dimensional.

To establish (2), we use induction on dim(U). If dim(U) = 1, there is nothing to show. Assume that the statement (2) is valid if dim(U) ≤ n − 1 for some n ≥ 2. Consider dim(U) = n, n ≥ 2. Let λ1 be an eigenvalue of T in C. From (1), we know that actually λ1 ∈ R. Use Eλ1 to denote the eigenspace of T associated with λ1:

Eλ1 = {u ∈ U | T(u) = λ1u} = N(λ1I − T). (6.2.3)

If Eλ1 = U, then T = λ1I and there is nothing more to show. We now assume Eλ1 ≠ U.

It is clear that Eλ1 is invariant under T. In fact, T is reducible over the direct sum U = Eλ1 ⊕ (Eλ1)⊥. To see this, we need to establish the invariance T((Eλ1)⊥) ⊂ (Eλ1)⊥. Indeed, for any u ∈ Eλ1 and v ∈ (Eλ1)⊥, we have

(u, T(v)) = (T(u), v) = λ1(u, v) = 0. (6.2.4)

Thus T(v) ∈ (Eλ1)⊥ and T((Eλ1)⊥) ⊂ (Eλ1)⊥.

Using the fact that dim((Eλ1)⊥) = dim(U) − dim(Eλ1) ≤ n − 1 and the inductive assumption, we see that T is reduced over a direct sum of mutually perpendicular eigenspaces of T in (Eλ1)⊥:

(Eλ1)⊥ = Eλ2 ⊕ · · · ⊕ Eλk, (6.2.5)

where λ2, . . . , λk are all the distinct eigenvalues of T over the invariant subspace (Eλ1)⊥, which are real.

Finally, we need to show that λ1, . . . , λk obtained above are all possible eigenvalues of T. For this purpose, let λ be an eigenvalue of T and u an associated eigenvector. Then there are u1 ∈ Eλ1, . . . , uk ∈ Eλk such that u = u1 + · · · + uk. Hence the relation T(u) = λu gives us λ1u1 + · · · + λkuk = λ(u1 + · · · + uk). That is,

(λ1 − λ)u1 + · · · + (λk − λ)uk = 0. (6.2.6)

Since u ≠ 0, there exists some i = 1, . . . , k such that ui ≠ 0. Thus, taking the scalar product of the above equation with ui, we get (λi − λ)‖ui‖² = 0, which implies λ = λi. Therefore λ1, . . . , λk are all the possible eigenvalues of T.

To establish (3), we simply construct an orthonormal basis over each eigenspace Eλi, obtained in (2), denoted by ℬλi, i = 1, . . . , k. Then ℬ = ℬλ1 ∪ · · · ∪ ℬλk is a desired orthonormal basis of U as stated in (3).

Let A = (aij ) ∈ C(n, n) and consider the mapping T ∈ L(Cn) defined by

T (x) = Ax, x ∈ Cn. (6.2.7)

Using the mapping (6.2.7) and Theorem 6.4, we may obtain the following characterization of a Hermitian matrix, which may also be regarded as a matrix version of Theorem 6.4.

Theorem 6.5 A matrix A ∈ C(n, n) is Hermitian if and only if there is a unitary matrix P ∈ C(n, n) such that

A = P†DP, (6.2.8)

where D ∈ R(n, n) is a real diagonal matrix whose diagonal entries are the eigenvalues of A.

Proof If (6.2.8) holds, it is clear that A is Hermitian, A = A†.

Conversely, assume A is Hermitian. Using ℬ0 = {e1, . . . , en} to denote the standard basis of Cn equipped with the usual Hermitian positive definite scalar product (x, y) = x†y for x, y ∈ Cn, we see that ℬ0 is an orthonormal basis of Cn. With the mapping T defined by (6.2.7), we have (T(x), y) = (Ax)†y = x†Ay = (x, T(y)) for any x, y ∈ Cn, and

\[ T(e_j) = \sum_{i=1}^{n} a_{ij} e_i, \quad j = 1, \dots, n. \tag{6.2.9} \]

Thus T is self-adjoint or Hermitian. Using Theorem 6.4, there is an orthonormal basis, say ℬ, consisting of eigenvectors of T, associated with eigenvalues λ1, . . . , λn, which are all real. With respect to ℬ, the matrix representation of T is diagonal, D = diag{λ1, . . . , λn}. Since the basis transition matrix from ℬ0 into ℬ is unitary, (6.2.8) must hold for some unitary matrix P as expected.

In other words, Theorem 6.5 states that a complex square matrix is diagonalizable through a unitary matrix into a real diagonal matrix if and only if it is Hermitian.

In practice, the decomposition (6.2.8) for a Hermitian matrix A is more conveniently established as in the proof of Theorem 5.6, as follows.


Find an orthonormal basis {u1, . . . , un} of Cn, with the standard Hermitian scalar product, consisting of eigenvectors of A: Aui = λiui, i = 1, . . . , n. Let Q ∈ C(n, n) be the matrix whose column vectors are u1, . . . , un, respectively. Then Q is unitary and AQ = QD where D = diag{λ1, . . . , λn}. Thus P = Q† renders (6.2.8).
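The construction just described can be tried numerically. The following Python sketch is an illustration added here, not part of the original exposition. It assumes NumPy is available, and the 3 × 3 Hermitian matrix `A` is a hypothetical example. The routine `numpy.linalg.eigh` returns real eigenvalues together with orthonormal eigenvectors as columns, which play the role of Q.

```python
import numpy as np

# Hypothetical 3x3 Hermitian example (an assumption for illustration).
A = np.array([[2.0, 1j, 0.0],
              [-1j, 3.0, 1 - 1j],
              [0.0, 1 + 1j, 1.0]])
assert np.allclose(A, A.conj().T)  # A equals its conjugate transpose

# eigh is intended for Hermitian input: real eigenvalues, and orthonormal
# eigenvectors stored as the columns of Q, so that A Q = Q D.
eigenvalues, Q = np.linalg.eigh(A)
D = np.diag(eigenvalues)

P = Q.conj().T  # P = Q†, as in the text

is_unitary = np.allclose(P @ P.conj().T, np.eye(3))
reconstructs = np.allclose(P.conj().T @ D @ P, A)  # checks A = P† D P
print(eigenvalues, is_unitary, reconstructs)
```

Both checks should report `True` up to rounding, in agreement with (6.2.8).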

Exercises

6.2.1 Let A ∈ C(n, n) be Hermitian. Show that det(A) must be a real number.

6.2.2 (Extension of the previous exercise) Let U be a complex vector space with a positive definite scalar product and T ∈ L(U). If T is self-adjoint, then the coefficients of the characteristic polynomial of T are all real.

6.2.3 Let U be a complex vector space with a positive definite scalar product and S, T ∈ L(U) self-adjoint and commutative, S ◦ T = T ◦ S.

(i) Prove the identity

‖(S ± iT)(u)‖² = ‖S(u)‖² + ‖T(u)‖², u ∈ U. (6.2.10)

(ii) Show that S ± iT is invertible if either S or T is so. However, the converse is not true.

(This is an extended version of Exercise 4.3.4.)

6.2.4 Let U be a complex vector space with a positive definite scalar product and V a subspace of U. Show that the mapping P ∈ L(U) that projects U onto V along V⊥ is self-adjoint.

6.2.5 (A strengthened version of the previous exercise) If P ∈ L(U) is idempotent, P² = P, show that P projects U onto R(P) along R(P)⊥ if and only if P′ = P.

6.2.6 We rewrite any A ∈ C(n, n) into the form A = B + iC where B, C ∈ R(n, n). Show that a necessary and sufficient condition for A to be unitary is that B^tC = C^tB and B^tB + C^tC = In.

6.2.7 Let U = C[0, 1] be the vector space of all complex-valued continuous functions in the variable t ∈ [0, 1] equipped with the positive definite scalar product

\[ (u, v) = \int_0^1 \overline{u(t)}\, v(t)\, dt, \quad u, v \in U. \tag{6.2.11} \]

(i) Show that T(u)(t) = tu(t) for t ∈ [0, 1] defines a self-adjoint mapping over U.

(ii) Show that T does not have an eigenvalue whatsoever.

6.2.8 Let T ∈ L(U) be self-adjoint where U is a finite-dimensional complex vector space with a positive definite scalar product.

(i) Show that if T^k = 0 for some integer k ≥ 1 then T = 0.

(ii) (A sharpened version of (i)) Given u ∈ U show that if T^k(u) = 0 for some integer k ≥ 1 then T(u) = 0.

6.3 Positive definiteness

We now consider the notion of positive definiteness in the complex situation. Assume that U is a finite-dimensional complex vector space with a positive definite scalar product (·, ·).

6.3.1 Definition and basic characterization

We start with the definition of the positive definiteness of various subjects of our interest in the complex situation.

Definition 6.6 The positive definiteness of a quadratic form, a self-adjoint or Hermitian mapping, or a Hermitian matrix may be defined as follows.

(1) A real-valued quadratic form q over U (hence it is generated from a sesquilinear Hermitian form) is positive definite if

q(u) > 0, u ∈ U, u ≠ 0. (6.3.1)

(2) A self-adjoint or Hermitian mapping T ∈ L(U) is positive definite if

(u, T(u)) > 0, u ∈ U, u ≠ 0. (6.3.2)

(3) A Hermitian matrix A ∈ C(n, n) is positive definite if

x†Ax > 0, x ∈ Cn, x ≠ 0. (6.3.3)

Let f : U × U → C be a sesquilinear Hermitian form and let the quadratic form q be obtained from f through (6.1.5). Then there is a unique self-adjoint or Hermitian mapping T ∈ L(U) such that

q(u) = f(u, u) = (u, T(u)), u ∈ U. (6.3.4)

Thus we see that the positive definiteness of q and that of T are equivalent.

Besides, let {u1, . . . , un} be any basis of U and write u ∈ U as u = ∑_{i=1}^{n} x_i u_i, where x = (x1, . . . , xn)^t ∈ Cn is the coordinate vector of u. Then, in view of (6.1.1),

q(u) = x†Ax, A = (f(ui, uj)), (6.3.5)


and the real-valuedness of q is seen to be equivalent to A being Hermitian and, thus, the positive definiteness of A and that of q are equivalent. Therefore the positive definiteness of a self-adjoint or Hermitian mapping is central, and it is the focus of this section.

Parallel to Theorem 5.8, we have the following.

Theorem 6.7 That a self-adjoint or Hermitian mapping T ∈ L(U) is positive definite is equivalent to any of the following statements.

(1) All the eigenvalues of T are positive.

(2) There is a positive constant, λ0 > 0, such that

(u, T(u)) ≥ λ0‖u‖², u ∈ U. (6.3.6)

(3) There is a positive definite self-adjoint or Hermitian mapping S ∈ L(U) such that T = S².

(4) The Hermitian matrix A defined by

A = (aij), aij = (ui, T(uj)), i, j = 1, . . . , n, (6.3.7)

with respect to an arbitrary basis {u1, . . . , un} of U, is positive definite.

(5) The eigenvalues of the Hermitian matrix A defined in (6.3.7) with respect to an arbitrary basis are all positive.

(6) The Hermitian matrix A defined in (6.3.7) with respect to an arbitrary basis enjoys the factorization A = B² for some Hermitian positive definite matrix B ∈ C(n, n).

The proof of Theorem 6.7 is similar to that of Theorem 5.8 and thus left as an exercise. Here we check only that the matrix A defined in (6.3.7) is indeed Hermitian. In fact, this may be seen from the self-adjointness of T through

\[ a_{ji} = (u_j, T(u_i)) = (T(u_j), u_i) = \overline{(u_i, T(u_j))} = \overline{a_{ij}}, \quad i, j = 1, \dots, n. \tag{6.3.8} \]

Note also that if {u1, . . . , un} is an orthonormal basis of U, then the quantities aij (i, j = 1, . . . , n) in (6.3.7) simply give rise to the matrix representation of T with respect to this basis. That is, T(uj) = ∑_{i=1}^{n} a_ij u_i (j = 1, . . . , n).

A matrix version of Theorem 6.7 may be stated as follows.

Theorem 6.8 That a Hermitian matrix A ∈ C(n, n) is positive definite is equivalent to any of the following statements.

(1) All the eigenvalues of A are positive.

(2) There is a unitary matrix P ∈ C(n, n) and a diagonal matrix D ∈ R(n, n) whose diagonal entries are all positive such that A = P†DP.

(3) There is a positive definite Hermitian matrix B ∈ C(n, n) such that A = B².

(4) A is Hermitian congruent to the identity matrix. That is, there is a nonsingular matrix B ∈ C(n, n) such that A = B†InB = B†B.

The proof is similar to that of Theorem 5.9 and left as an exercise.

Positive semi-definiteness and negative definiteness can be defined and investigated analogously as in the real situation in Section 5.3 and are skipped here.
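Statements (1) and (3) of Theorem 6.8 lend themselves to a quick numerical check. The sketch below is an added illustration under stated assumptions: it uses NumPy, the helper names `is_positive_definite` and `hermitian_sqrt` are hypothetical, and the test matrix is built as M†M from a hypothetical nonsingular M, which is positive definite by statement (4).

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Statement (1): every eigenvalue of the Hermitian matrix A is positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

def hermitian_sqrt(A):
    """Statement (3): from A = Q D Q† take B = Q sqrt(D) Q†, so that A = B^2."""
    w, Q = np.linalg.eigh(A)
    return Q @ np.diag(np.sqrt(w)) @ Q.conj().T

# Hypothetical example: A = M†M with M nonsingular, as in statement (4).
M = np.array([[1.0, 1j], [0.0, 2.0]])
A = M.conj().T @ M          # equals [[1, i], [-i, 5]]

B = hermitian_sqrt(A)
print(is_positive_definite(A))                              # statement (1)
print(np.allclose(B @ B, A))                                # A = B^2
print(np.allclose(B, B.conj().T), is_positive_definite(B))  # B is Hermitian and positive definite
```

The tolerance `tol` is a design choice for floating-point work; it is not part of the theorem.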

6.3.2 Determinant characterization of positive definite Hermitian matrices

If A ∈ C(n, n) is Hermitian, then det(A) ∈ R. Moreover, if A is positive definite, Theorem 6.8 says there is a nonsingular matrix B ∈ C(n, n) such that A = B†B. Thus det(A) = det(B†) det(B) = |det(B)|² > 0. Such a property suggests that it may be possible to extend Theorem 5.10 to Hermitian matrices as well, as we now do.

Theorem 6.9 Let A = (aij) ∈ C(n, n) be Hermitian. The matrix A is positive definite if and only if all its leading principal minors are positive, that is, if and only if

\[ a_{11} > 0, \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \quad \dots, \quad \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \cdots & \cdots & \cdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} > 0. \tag{6.3.9} \]

Proof The necessity proof is similar to that in Theorem 5.1.11 and omitted. The sufficiency proof is also similar but needs some adaptation to meet the delicacy with handling complex numbers. To see how this is done, we may assume that (6.3.9) holds and we show that A is positive definite by an inductive argument. Again, if n = 1, there is nothing to show, and assume that the assertion is valid at n − 1 (that is, when A ∈ C(n − 1, n − 1)) (n ≥ 2). It remains to prove that A is positive definite at n ≥ 2.

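As before, we rewrite the Hermitian matrix A in the blocked form

\[ A = \begin{pmatrix} A_{n-1} & \alpha \\ \alpha^\dagger & a_{nn} \end{pmatrix}, \tag{6.3.10} \]

where α = (a1n, . . . , an−1,n)^t ∈ Cn−1. By the inductive assumption at n − 1, we know that the Hermitian matrix An−1 is positive definite. Hence, applying Theorem 6.8, we have a nonsingular matrix B ∈ C(n − 1, n − 1) such that An−1 = B†B. Thus

\[ A = \begin{pmatrix} A_{n-1} & \alpha \\ \alpha^\dagger & a_{nn} \end{pmatrix} = \begin{pmatrix} B^\dagger & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I_{n-1} & \beta \\ \beta^\dagger & a_{nn} \end{pmatrix} \begin{pmatrix} B & 0 \\ 0 & 1 \end{pmatrix}, \tag{6.3.11} \]

where β = (B†)^{−1}α ≡ (b1, . . . , bn−1)^t ∈ Cn−1. Since det(A) > 0, we obtain with some suitable row operations the result

\[ \frac{\det(A)}{\det(A_{n-1})} = \det \begin{pmatrix} I_{n-1} & \beta \\ \beta^\dagger & a_{nn} \end{pmatrix} = a_{nn} - |b_1|^2 - \cdots - |b_{n-1}|^2 > 0, \tag{6.3.12} \]

with det(B†B) = det(An−1). Taking x ∈ Cn and setting

\[ y = \begin{pmatrix} B & 0 \\ 0 & 1 \end{pmatrix} x \equiv (y_1, \dots, y_{n-1}, y_n)^t, \tag{6.3.13} \]

we have

\[ \begin{aligned} x^\dagger A x &= y^\dagger \begin{pmatrix} I_{n-1} & \beta \\ \beta^\dagger & a_{nn} \end{pmatrix} y \\ &= \left( \overline{y_1} + \overline{b_1}\,\overline{y_n}, \dots, \overline{y_{n-1}} + \overline{b_{n-1}}\,\overline{y_n},\; b_1 \overline{y_1} + \cdots + b_{n-1} \overline{y_{n-1}} + a_{nn} \overline{y_n} \right) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \\ &= |y_1 + b_1 y_n|^2 + \cdots + |y_{n-1} + b_{n-1} y_n|^2 + \left( a_{nn} - |b_1|^2 - \cdots - |b_{n-1}|^2 \right) |y_n|^2, \end{aligned} \tag{6.3.14} \]

which is positive when y ≠ 0 or x ≠ 0 because of the condition (6.3.12). Thus the proof is complete.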
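The criterion (6.3.9) translates directly into a computation. The following sketch is an added illustration, assuming NumPy; the function names and the two 2 × 2 matrices are hypothetical. Determinants are computed in floating point, so a minor that is very close to zero would need a tolerance that this sketch does not attempt to choose.

```python
import numpy as np

def leading_principal_minors(A):
    """Return det(A_1), ..., det(A_n) for the upper-left k x k blocks A_k."""
    n = A.shape[0]
    # For a Hermitian A each minor is real up to rounding, so keep the real part.
    return [np.linalg.det(A[:k, :k]).real for k in range(1, n + 1)]

def positive_definite_by_minors(A):
    """Criterion (6.3.9): all leading principal minors are positive."""
    return all(m > 0 for m in leading_principal_minors(A))

# Hypothetical Hermitian examples.
A = np.array([[2.0, 1j], [-1j, 1.0]])   # minors: 2 and 2*1 - |i|^2 = 1
B = np.array([[1.0, 2j], [-2j, 1.0]])   # minors: 1 and 1*1 - |2i|^2 = -3

print(leading_principal_minors(A), positive_definite_by_minors(A))
print(leading_principal_minors(B), positive_definite_by_minors(B))
```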

6.3.3 Characterization by the Cholesky decomposition

In this subsection, we show that the Cholesky decomposition theorem is also valid in the complex situation.

Theorem 6.10 Let A = (aij) ∈ C(n, n) be Hermitian. The matrix A is positive definite if and only if there is a nonsingular lower triangular matrix L ∈ C(n, n) such that A = LL†. Moreover, when A is positive definite, there is a unique such L with the property that all the diagonal entries of L are positive.


Proof Assume A = LL† with L being nonsingular. Then y = L†x ≠ 0 whenever x ≠ 0. Thus we have x†Ax = x†(LL†)x = y†y > 0, which proves that A is positive definite.

Assume now A is positive definite. We show that there is a unique lower triangular matrix L = (lij) ∈ C(n, n) whose diagonal entries are all positive, l11, . . . , lnn > 0, so that A = LL†. We again use induction.

When n = 1, we have A = (a11) and the unique choice for L is L = (√a11) since a11 > 0 in view of Theorem 6.9.

Assume the conclusion is established at n − 1 (n ≥ 2). We proceed to establish the conclusion at n (n ≥ 2).

Rewrite A in the blocked form (6.3.10). Applying Theorem 6.9, we know that An−1 ∈ C(n − 1, n − 1) is positive definite. So there is a unique lower triangular matrix L1 ∈ C(n − 1, n − 1) with positive diagonal entries such that An−1 = L1L1†.

Now consider L ∈ C(n, n) of the form

\[ L = \begin{pmatrix} L_2 & 0 \\ \gamma^\dagger & a \end{pmatrix}, \tag{6.3.15} \]

where L2 ∈ C(n − 1, n − 1) is a lower triangular matrix, γ ∈ Cn−1 is a column vector, and a ∈ R is a suitable number. Then, if we set A = LL†, we obtain

\[ A = \begin{pmatrix} A_{n-1} & \alpha \\ \alpha^\dagger & a_{nn} \end{pmatrix} = \begin{pmatrix} L_2 & 0 \\ \gamma^\dagger & a \end{pmatrix} \begin{pmatrix} L_2^\dagger & \gamma \\ 0 & a \end{pmatrix} = \begin{pmatrix} L_2 L_2^\dagger & L_2 \gamma \\ (L_2 \gamma)^\dagger & \gamma^\dagger \gamma + a^2 \end{pmatrix}. \tag{6.3.16} \]

Therefore we arrive at the relations

\[ A_{n-1} = L_2 L_2^\dagger, \quad \alpha = L_2 \gamma, \quad \gamma^\dagger \gamma + a^2 = a_{nn}. \tag{6.3.17} \]

Thus, if we require that all the diagonal entries of L2 are positive, then the inductive assumption gives us L2 = L1. So the vector γ is also uniquely determined, γ = L1^{−1}α. Thus it remains only to show that the number a may be uniquely determined as well. To this end, we need to show in (6.3.17) that

\[ a_{nn} - \gamma^\dagger \gamma > 0. \tag{6.3.18} \]

In fact, inserting L1 = B† in (6.3.11), we have β = γ. That is,

\[ A = \begin{pmatrix} A_{n-1} & \alpha \\ \alpha^\dagger & a_{nn} \end{pmatrix} = \begin{pmatrix} L_1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I_{n-1} & \gamma \\ \gamma^\dagger & a_{nn} \end{pmatrix} \begin{pmatrix} L_1^\dagger & 0 \\ 0 & 1 \end{pmatrix}. \tag{6.3.19} \]

Hence (6.3.18) follows as a consequence of det(A) > 0. Thus the third equation in (6.3.17) leads to the unique determination of the positive number a as in the real situation:

\[ a = \sqrt{a_{nn} - \gamma^\dagger \gamma} > 0. \tag{6.3.20} \]

The inductive proof is now complete.
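The inductive proof is constructive, and the recursion (6.3.15)–(6.3.20) can be written out as a short program. The sketch below is an added illustration, assuming NumPy; the function name `cholesky_inductive` and the test matrix are hypothetical, and the recursion mirrors the proof rather than an efficient production algorithm. The result is compared with `numpy.linalg.cholesky`, which also returns a lower triangular factor L with A = LL†.

```python
import numpy as np

def cholesky_inductive(A):
    """Lower triangular L with positive diagonal and A = L L†, following (6.3.15)-(6.3.20)."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:
        a11 = A[0, 0].real
        if a11 <= 0:
            raise ValueError("matrix is not positive definite")
        return np.array([[np.sqrt(a11)]], dtype=complex)
    L1 = cholesky_inductive(A[:-1, :-1])      # inductive step: A_{n-1} = L1 L1†
    alpha = A[:-1, -1]                        # last column above a_nn
    gamma = np.linalg.solve(L1, alpha)        # alpha = L1 gamma
    a_sq = A[-1, -1].real - np.vdot(gamma, gamma).real   # a_nn - gamma† gamma
    if a_sq <= 0:
        raise ValueError("matrix is not positive definite")
    L = np.zeros((n, n), dtype=complex)
    L[:-1, :-1] = L1
    L[-1, :-1] = gamma.conj()                 # bottom row is gamma†
    L[-1, -1] = np.sqrt(a_sq)                 # the number a in (6.3.20)
    return L

# Hypothetical test: start from a known factor M with positive diagonal.
M = np.array([[2, 0, 0], [-1j, 2, 0], [1, 1 + 1j, 3]], dtype=complex)
A = M @ M.conj().T
L = cholesky_inductive(A)
print(np.allclose(L, M), np.allclose(L, np.linalg.cholesky(A)))
```

Recovering `M` itself from `A` is consistent with the uniqueness statement of Theorem 6.10.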

Exercises

6.3.1 Prove Theorem 6.7.

6.3.2 Prove Theorem 6.8.

6.3.3 Let A ∈ C(n, n) be nonsingular. Prove that there is a unique lower triangular matrix L ∈ C(n, n) with positive diagonal entries so that AA† = LL†.

6.3.4 Show that if A1, . . . , Ak ∈ C(n, n) are unitary matrices, so is their product A = A1 · · · Ak.

6.3.5 Show that if A ∈ C(n, n) is Hermitian and all its eigenvalues are ±1 then A is unitary.

6.3.6 Assume that u ∈ Cn is a nonzero column vector. Show that there is a unitary matrix Q ∈ C(n, n) such that

Q†(uu†)Q = diag{u†u, 0, . . . , 0}. (6.3.21)

6.3.7 Let A = (aij) ∈ C(n, n) be Hermitian. Show that if A is positive definite then aii > 0 for i = 1, . . . , n and

|aij| < √(aii ajj), i ≠ j, i, j = 1, . . . , n. (6.3.22)

6.3.8 Is the Hermitian matrix

\[ A = \begin{pmatrix} 5 & i & 2-i \\ -i & 4 & 1-i \\ 2+i & 1+i & 3 \end{pmatrix} \tag{6.3.23} \]

positive definite?

6.3.9 Let A = (aij) ∈ C(n, n) be a positive semi-definite Hermitian matrix. Show that aii ≥ 0 for i = 1, . . . , n and that

det(A) ≤ a11 · · · ann. (6.3.24)

6.3.10 Let u1, . . . , un be n column vectors of Cn, with the standard Hermitian scalar product, which are the n column vectors of a matrix A ∈ C(n, n). Establish the inequality

|det(A)| ≤ ‖u1‖ · · · ‖un‖. (6.3.25)


What is the Hadamard inequality in the context of a complex matrix?

6.3.11 Let A ∈ C(n, n) be nonsingular. Show that there exist a unitary matrix P ∈ C(n, n) and a positive definite Hermitian matrix B ∈ C(n, n) such that A = PB. Show also that, if A is real, then the afore-mentioned matrices P and B may also be chosen to be real so that P is orthogonal and B positive definite.

6.3.12 Let A ∈ C(n, n) be nonsingular. Since A†A is positive definite, its eigenvalues, say λ1, . . . , λn, are all positive. Show that there exist unitary matrices P and Q in C(n, n) such that A = PDQ where

D = diag{√λ1, . . . , √λn} (6.3.26)

is a diagonal matrix. Show also that the same conclusion is true for some orthogonal matrices P and Q in R(n, n) when A is real.

6.3.13 Let U be a finite-dimensional complex vector space with a positive definite scalar product (·, ·) and T ∈ L(U) a positive definite mapping.

(i) Establish the generalized Schwarz inequality

|(u, T(v))| ≤ √(u, T(u)) √(v, T(v)), u, v ∈ U. (6.3.27)

(ii) Show that the equality in (6.3.27) occurs if and only if u, v are linearly dependent.

6.4 Commutative self-adjoint mappings and consequences

In this section, we extend our investigation about commutativity of self-adjoint mappings in the real situation to that in the complex situation. It is not hard to see that the conclusion in this extended situation is the same as in the real situation.

Theorem 6.11 Let U be a finite-dimensional complex vector space with a positive definite scalar product (·, ·) and S, T ∈ L(U) be self-adjoint or Hermitian. Then there is an orthonormal basis of U consisting of eigenvectors of both S, T if and only if S and T commute, or S ◦ T = T ◦ S.

The proof of the theorem is the same as that for the real situation and omitted.

We now use Theorem 6.11 to study a slightly larger class of linear mappings commonly referred to as normal mappings.


Definition 6.12 Let T ∈ L(U). We say that T is normal if T and T ′ commute:

T ◦ T ′ = T ′ ◦ T . (6.4.1)

Assume T is normal. Decompose T in the form

\[ T = \frac{1}{2}(T + T') + \frac{1}{2}(T - T') = P + Q, \tag{6.4.2} \]

where P = ½(T + T′) ∈ L(U) is self-adjoint or Hermitian and Q = ½(T − T′) ∈ L(U) is anti-self-adjoint or anti-Hermitian since it satisfies Q′ = −Q. From

(u, iQ(v)) = i(u, Q(v)) = −i(Q(u), v) = (iQ(u), v), u, v ∈ U, (6.4.3)

we see that iQ ∈ L(U) is self-adjoint or Hermitian. In view of (6.4.1), P and iQ are commutative. Applying Theorem 6.11, we conclude that U has an orthonormal basis consisting of eigenvectors of P and iQ simultaneously, say {u1, . . . , un}, associated with the corresponding real eigenvalues, {ε1, . . . , εn} and {δ1, . . . , δn}, respectively. Consequently, we have

T(ui) = (P + Q)(ui) = (P − i[iQ])(ui) = (εi + iωi)ui, i = 1, . . . , n, (6.4.4)

where ωi = −δi (i = 1, . . . , n). This discussion leads us to the following theorem.

Theorem 6.13 Let T ∈ L(U). Then T is normal if and only if there is an orthonormal basis of U consisting of eigenvectors of T.

Proof Suppose that U has an orthonormal basis consisting of eigenvectors of T, say ℬ = {u1, . . . , un}, with the corresponding eigenvalues {λ1, . . . , λn}. Let T′ ∈ L(U) be represented by the matrix B = (bij) ∈ C(n, n) with respect to ℬ. Then

\[ T'(u_j) = \sum_{i=1}^{n} b_{ij} u_i, \quad j = 1, \dots, n. \tag{6.4.5} \]

Therefore, we have

\[ (T'(u_j), u_i) = \left( \sum_{k=1}^{n} b_{kj} u_k, u_i \right) = \overline{b_{ij}}, \quad i, j = 1, \dots, n, \tag{6.4.6} \]

and

\[ (T'(u_j), u_i) = (u_j, T(u_i)) = (u_j, \lambda_i u_i) = \lambda_i \delta_{ij}, \quad i, j = 1, \dots, n. \tag{6.4.7} \]

Combining (6.4.6) and (6.4.7), we find bij = λ̄iδij (i, j = 1, . . . , n). In other words, B is diagonal, B = diag{λ̄1, . . . , λ̄n}, such that λ̄1, . . . , λ̄n are the eigenvalues of T′ with the corresponding eigenvectors u1, . . . , un. In particular,

(T ◦ T′)(ui) = λiλ̄iui = λ̄iλiui = (T′ ◦ T)(ui), i = 1, . . . , n, (6.4.8)

which proves T ◦ T′ = T′ ◦ T. That is, T is normal.

Conversely, if T is normal, the existence of an orthonormal basis of U consisting of eigenvectors of T is already shown.

The proof is complete.

To end this section, we consider commutative normal mappings.

Theorem 6.14 Let S, T ∈ L(U) be normal. Then there is an orthonormal basis of U consisting of eigenvectors of S and T simultaneously if and only if S and T are commutative, S ◦ T = T ◦ S.

Proof If {u1, . . . , un} is an orthonormal basis of U so that S(ui) = γiui and T(ui) = λiui, γi, λi ∈ C, i = 1, . . . , n, then (S ◦ T)(ui) = γiλiui = (T ◦ S)(ui), i = 1, . . . , n, which establishes the commutativity of S and T: S ◦ T = T ◦ S.

Conversely, assume S and T are commutative normal mappings. Let λ1, . . . , λk be all the distinct eigenvalues of T and Eλ1, . . . , Eλk the associated eigenspaces that may readily be checked to be mutually perpendicular (left as an exercise in this section). Fix i = 1, . . . , k. For any u ∈ Eλi, we have T(S(u)) = S(T(u)) = S(λiu) = λiS(u). Thus S(u) ∈ Eλi. That is, Eλi is invariant under S. Since S is normal, Eλi has an orthonormal basis, say {ui,1, . . . , ui,mi}, consisting of the eigenvectors of S. Therefore, the set of vectors

{u1,1, . . . , u1,m1, . . . , uk,1, . . . , uk,mk} (6.4.9)

is an orthonormal basis of U consisting of the eigenvectors of both S and T.

An obvious but pretty general example of a normal mapping that is not necessarily self-adjoint or Hermitian is a unitary mapping T ∈ L(U) since it satisfies T ◦ T′ = T′ ◦ T = I. Consequently, for a unitary mapping T ∈ L(U), there is an orthonormal basis of U consisting of eigenvectors of T. Furthermore, since T is isometric, it is clear that all eigenvalues of T are of absolute value 1, as already observed in Section 4.3.


The matrix versions of Definition 6.12 and Theorems 6.13 and 6.14 may be stated as follows.

Definition 6.15 A matrix A ∈ C(n, n) is said to be normal if it satisfies the property AA† = A†A.

Theorem 6.16 Normal matrices have the following characteristic properties.

(1) A matrix A ∈ C(n, n) is diagonalizable through a unitary matrix, that is, there is a unitary matrix P ∈ C(n, n) and a diagonal matrix D ∈ C(n, n) such that A = P†DP, if and only if A is normal.

(2) Two normal matrices A, B ∈ C(n, n) are simultaneously diagonalizable through a unitary matrix, that is, there is a unitary matrix P ∈ C(n, n) and two diagonal matrices D1, D2 ∈ C(n, n) such that A = P†D1P, B = P†D2P, if and only if A, B are commutative: AB = BA.

To prove the theorem, we may simply follow the standard way to associate a matrix A ∈ C(n, n) with the mapping it generates over Cn through x ↦ Ax for x ∈ Cn as before and apply Theorems 6.13 and 6.14.

Moreover, if A ∈ C(n, n) is unitary, AA† = A†A = In, then there is a unitary matrix P ∈ C(n, n) and a diagonal matrix D = diag{λ1, . . . , λn} with |λi| = 1, i = 1, . . . , n, such that A = P†DP.
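Part (1) of Theorem 6.16 can be illustrated numerically. The sketch below is an addition made here, assuming NumPy; the helper `is_normal` and both matrices are hypothetical. It relies on the example having distinct eigenvalues: in that case the unit eigenvectors returned by `numpy.linalg.eig` are mutually orthogonal for a normal matrix, whereas for repeated eigenvalues `eig` gives no such guarantee and a different routine would be needed.

```python
import numpy as np

def is_normal(A):
    """Definition 6.15: A A† = A† A, up to rounding."""
    return np.allclose(A @ A.conj().T, A.conj().T @ A)

# Hypothetical example: normal, but neither Hermitian nor unitary.
A = np.array([[2.0, -1.0], [1.0, 2.0]], dtype=complex)   # eigenvalues 2 + i and 2 - i
N = np.array([[1.0, 1.0], [0.0, 1.0]], dtype=complex)    # not normal

eigenvalues, Q = np.linalg.eig(A)   # columns of Q are unit eigenvectors of A
D = np.diag(eigenvalues)
P = Q.conj().T                      # P = Q†

unitary = np.allclose(P @ P.conj().T, np.eye(2))
reconstructs = np.allclose(P.conj().T @ D @ P, A)        # A = P† D P
print(is_normal(A), is_normal(N), unitary, reconstructs)
```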

Exercises

6.4.1 Let U be a finite-dimensional complex vector space with a positive definite scalar product (·, ·) and T ∈ L(U). Show that T is normal if and only if T satisfies the identity

‖T(u)‖ = ‖T′(u)‖, u ∈ U. (6.4.10)

6.4.2 Use the previous exercise and the property (λI − T)′ = λ̄I − T′ (λ ∈ C) to show directly that if T is normal and λ is an eigenvalue of T with an associated eigenvector u ∈ U then λ̄ and u are a pair of eigenvalue and eigenvector of T′.

6.4.3 Let U be a finite-dimensional complex vector space with a positive definite scalar product (·, ·) and T ∈ L(U) satisfy the property that, if λ ∈ C is an eigenvalue of T and u ∈ U an associated eigenvector, then λ̄ is an eigenvalue and u an associated eigenvector of T′. Show that if λ, μ ∈ C are two different eigenvalues of T and u, v are the associated eigenvectors, respectively, then (u, v) = 0.

6.4.4 Assume that A ∈ C(n, n) is triangular and normal. Show that A must be diagonal.

6.4.5 Let U be a finite-dimensional complex vector space with a positive definite scalar product and T ∈ L(U) be normal.

(i) Show that T is self-adjoint if and only if all the eigenvalues of T are real.

(ii) Show that T is anti-self-adjoint if and only if all the eigenvalues of T are imaginary.

(iii) Show that T is unitary if and only if all the eigenvalues of T are of absolute value 1.

6.4.6 Let T ∈ L(U) be a normal mapping where U is a finite-dimensional complex vector space with a positive definite scalar product.

(i) Show that if T^k = 0 for some integer k ≥ 1 then T = 0.

(ii) (A sharpened version of (i)) Given u ∈ U show that if T^k(u) = 0 for some integer k ≥ 1 then T(u) = 0.

(This is an extended version of Exercise 6.2.8.)

6.4.7 If R, S ∈ L(U) are Hermitian and commutative, show that T = R ± iS is normal.

6.4.8 Consider the matrix

\[ A = \begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}. \tag{6.4.11} \]

(i) Show that A is neither Hermitian nor unitary, but normal.

(ii) Find an orthonormal basis of C2, with the standard Hermitian scalar product, consisting of the eigenvectors of A.

6.4.9 Let U be a finite-dimensional complex vector space with a positive definite scalar product and T ∈ L(U) be normal. Show that for any integer k ≥ 1 there is a normal element S ∈ L(U) such that T = S^k. Moreover, if T is unitary, then there is a unitary element S ∈ L(U) such that T = S^k.

6.4.10 Recall that any c ∈ C may be rewritten as c = ae^{iθ}, where a, θ ∈ R and a ≥ 0, known as a polar decomposition of c. Let U be a finite-dimensional complex vector space with a positive definite scalar product and T ∈ L(U). Show that T enjoys a similar polar decomposition property such that there are a positive semi-definite element R and a unitary element S, both in L(U), satisfying T = R ◦ S = S ◦ R, if and only if T is normal.


6.4.11 Let U be a complex vector space with a positive definite scalar product (·, ·). A mapping T ∈ L(U) is called hyponormal if it satisfies

(u, (T ◦ T′ − T′ ◦ T)(u)) ≤ 0, u ∈ U. (6.4.12)

(i) Show that T is normal if and only if T and T′ are both hyponormal.

(ii) Show that T being hyponormal is equivalent to

‖T′(u)‖ ≤ ‖T(u)‖, u ∈ U. (6.4.13)

(iii) If T is hyponormal, so is T + λI where λ ∈ C.

(iv) Show that if λ ∈ C is an eigenvalue and u an associated eigenvector of a hyponormal mapping T ∈ L(U) then λ̄ and u are an eigenvalue and an associated eigenvector of T′.

6.4.12 Let U be an n-dimensional (n ≥ 2) complex vector space with a positive definite scalar product and T ∈ L(U). Show that T is normal if and only if there is a complex-coefficient polynomial p(t) of degree at most n − 1 such that T′ = p(T).

6.4.13 Let U be a finite-dimensional complex vector space with a positive definite scalar product and T ∈ L(U). Show that T is normal if and only if T and T′ have the same invariant subspaces of U.

6.5 Mappings between two spaces via self-adjoint mappings

As in the real situation, we show that self-adjoint or Hermitian mappings may be used to study mappings between two complex vector spaces.

As in Section 5.6, let U and V denote two complex vector spaces of finite dimensions, with positive definite Hermitian scalar products (·, ·)_U and (·, ·)_V, respectively. Given T ∈ L(U, V) and v ∈ V, it is clear that

f(u) = (v, T(u))_V, u ∈ U, (6.5.1)

defines an element f in U′ which depends on v ∈ V. So there is a unique element in U depending on v, say T′(v), such that f(u) = (T′(v), u)_U. That is,

(v, T(u))_V = (T′(v), u)_U, u ∈ U, v ∈ V. (6.5.2)

Thus we have obtained a well-defined mapping T′ : V → U.

It may easily be verified that T′ is linear. Thus T′ ∈ L(V, U). As in the real situation, we can consider the composed mappings T′ ◦ T ∈ L(U) and T ◦ T′ ∈ L(V), which are both self-adjoint or Hermitian.

Besides, from the relations

(u, (T′ ◦ T)(u))_U = (T(u), T(u))_V ≥ 0, u ∈ U,
(v, (T ◦ T′)(v))_V = (T′(v), T′(v))_U ≥ 0, v ∈ V, (6.5.3)

it is seen that T′ ◦ T ∈ L(U) and T ◦ T′ ∈ L(V) are both positive semi-definite.

Use ‖·‖_U and ‖·‖_V to denote the norms induced from (·, ·)_U and (·, ·)_V, respectively. Then

‖T‖ = sup{‖T(u)‖_V | ‖u‖_U = 1, u ∈ U}. (6.5.4)

On the other hand, using the fact that T′ ◦ T ∈ L(U) is self-adjoint and positive semi-definite, we know that there is an orthonormal basis {u1, . . . , un} of U consisting of eigenvectors of T′ ◦ T, with the corresponding non-negative eigenvalues, σ1, . . . , σn, respectively. Hence, for any u = ∑_{i=1}^{n} a_i u_i ∈ U with ‖u‖_U² = ∑_{i=1}^{n} |a_i|² = 1, we have

\[ \|T(u)\|_V^2 = (T(u), T(u))_V = ((T' \circ T)(u), u)_U = \sum_{i=1}^{n} \sigma_i |a_i|^2 \le \sigma_0, \tag{6.5.5} \]

where

\[ \sigma_0 = \max_{1 \le i \le n} \{\sigma_i\} \ge 0, \tag{6.5.6} \]

which shows ‖T‖ ≤ √σ0. Moreover, let i = 1, . . . , n be such that σ0 = σi. Then (6.5.4) gives us

‖T‖² ≥ ‖T(ui)‖_V² = ((T′ ◦ T)(ui), ui)_U = σi = σ0. (6.5.7)

Consequently, as in the real situation, we conclude with

‖T‖ = √σ0, where σ0 ≥ 0 is the largest eigenvalue of the mapping T′ ◦ T. (6.5.8)

Therefore the norm of a linear mapping T ∈ L(U, V) may be obtained by computing the largest eigenvalue of the induced self-adjoint or Hermitian mapping T′ ◦ T ∈ L(U).

In particular, if T ∈ L(U) is already self-adjoint, then, since ‖T‖ is simply the square root of the largest eigenvalue of T², we arrive at

\[ \|T\| = \max_{1 \le i \le n} \{|\lambda_i|\}, \tag{6.5.9} \]

where λ1, . . . , λn are the eigenvalues of T.

Thus, a combination of (6.5.8) and (6.5.9) leads to the formula

‖T‖² = ‖T′ ◦ T‖, T ∈ L(U, V). (6.5.10)

Analogously, from (6.5.9), we note that, when T ∈ L(U) is self-adjoint, the quantity ‖T‖ may also be evaluated accordingly by

‖T‖ = sup{|(u, T(u))| | u ∈ U, ‖u‖ = 1}, (6.5.11)

as in the real situation.

We can similarly show how to extend (6.5.11) to evaluate the norm of an arbitrary linear mapping between U and V.

Theorem 6.17 Let U, V be finite-dimensional complex vector spaces with positive definite Hermitian scalar products (·, ·)_U, (·, ·)_V, respectively. For T ∈ L(U, V), we have

‖T‖ = sup{|(v, T(u))_V| | u ∈ U, v ∈ V, ‖u‖_U = 1, ‖v‖_V = 1}. (6.5.12)

The proof is identical to that for Theorem 5.14.

As in the real situation, we may use Theorem 6.17 to establish the fact that the norms of a linear mapping and its dual, over two complex vector spaces with positive definite Hermitian scalar products, share the same value.

Theorem 6.18 Let U, V be finite-dimensional complex vector spaces with positive definite Hermitian scalar products (·, ·)_U, (·, ·)_V, respectively. For T ∈ L(U, V) and its dual T′ ∈ L(V, U), we have ‖T‖ = ‖T′‖. Thus ‖T′ ◦ T‖ = ‖T ◦ T′‖ and the largest eigenvalues of the positive semi-definite self-adjoint or Hermitian mappings T′ ◦ T ∈ L(U) and T ◦ T′ ∈ L(V) are the same.

Proof The fact that ‖T‖ = ‖T′‖ may be deduced from applying (6.5.12) to T′ and the relation (v, T(u))_V = (T′(v), u)_U (u ∈ U, v ∈ V). The conclusion ‖T′ ◦ T‖ = ‖T ◦ T′‖ follows from (6.5.10), and that the largest eigenvalues of the positive semi-definite self-adjoint or Hermitian mappings T′ ◦ T and T ◦ T′ are the same is a consequence of the eigenvalue characterization of the norm of a self-adjoint mapping stated in (6.5.9) and ‖T′ ◦ T‖ = ‖T ◦ T′‖.

However, as in the real situation, Theorem 6.18 is natural and hardly surprising in view of Theorem 5.16.
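For matrices, the formulas (6.5.8) and (6.5.10) and the identity ‖T‖ = ‖T′‖ of Theorem 6.18 can be checked with a few lines of code. The following sketch is an added illustration, assuming NumPy, with Cn and Cm carrying the standard Hermitian scalar products so that the dual of x ↦ Ax is x ↦ A†x. The function name `operator_norm` and the 2 × 3 matrix are hypothetical.

```python
import numpy as np

def operator_norm(A):
    """Square root of the largest eigenvalue of the Hermitian matrix A†A, as in (6.5.8)."""
    sigma = np.linalg.eigvalsh(A.conj().T @ A)      # real eigenvalues in ascending order
    return float(np.sqrt(max(sigma[-1], 0.0)))      # guard against tiny negative rounding

# Hypothetical 2x3 example.
A = np.array([[1.0, 1j, 0.0], [2.0, 0.0, 1 - 1j]])

norm_eig = operator_norm(A)
norm_ref = np.linalg.norm(A, 2)                     # NumPy's spectral norm, for comparison
norm_dual = operator_norm(A.conj().T)               # norm of the dual mapping
norm_gram = operator_norm(A.conj().T @ A)           # norm of A†A

print(norm_eig, norm_ref)        # the two values should agree
print(norm_dual)                 # should equal norm_eig (Theorem 6.18)
print(norm_gram, norm_eig ** 2)  # should agree, as in (6.5.10)
```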

We continue to study a mapping T ∈ L(U, V ), where U and V are two com-plex vector spaces with positive definite Hermitian product (·, ·)U and (·, ·)V


and of dimensions n and m, respectively. Let σ1, . . . , σn be all the eigen-values of the positive semi-definite mapping T ′ ◦ T ∈ L(U), among whichσ1, . . . , σk are positive, say. Use {u1, . . . , uk, . . . , un} to denote an orthonor-mal basis of U consisting of eigenvectors of T ′ ◦ T associated with the eigen-values σ1, . . . , σk, . . . , σn. Then we have

(T (ui), T (uj ))V = (ui, (T′ ◦ T )(uj ))U

= σj (ui, uj )U = σj δij , i, j = 1, . . . , n. (6.5.13)

This simple expression indicates that T (ui) = 0 for i > k (if any) and that{T (u1), . . . , T (uk)} forms an orthogonal basis of R(T ). In particular, k =r(T ).

Now set

$$v_i = \frac{1}{\|T(u_i)\|_V}\, T(u_i), \quad i = 1, \dots, k. \tag{6.5.14}$$

Then {v1, . . . , vk} is an orthonormal basis for R(T). Taking i = j = 1, . . . , k in (6.5.13), we see that ‖T(ui)‖V = √σi, i = 1, . . . , k. In view of this and (6.5.14), we arrive at

$$T(u_i) = \sqrt{\sigma_i}\, v_i, \quad i = 1, \dots, k, \qquad T(u_j) = 0, \quad j > k \ (\text{if any}). \tag{6.5.15}$$

In the above construction, the positive numbers √σ1, . . . , √σk are called the singular values of T and the expression (6.5.15) the singular value decomposition for T. This result may conveniently be summarized as a theorem.

Theorem 6.19 Let U, V be finite-dimensional complex vector spaces with positive definite Hermitian scalar products and let T ∈ L(U, V) be of rank k ≥ 1. Then there are orthonormal bases {u1, . . . , uk, . . . , un} and {v1, . . . , vk, . . . , vm} of U and V, respectively, and positive numbers λ1, . . . , λk, referred to as the singular values of T, such that

T (ui) = λivi, i = 1, . . . , k, T (uj ) = 0, j > k (if any). (6.5.16)

In fact the numbers $\lambda_1^2, \dots, \lambda_k^2$ are all the positive eigenvalues and u1, . . . , uk the associated eigenvectors of the self-adjoint mapping T′ ◦ T ∈ L(U).

Let A ∈ C(m, n). Theorem 6.19 implies that there are unitary matrices P ∈ C(n, n) and Q ∈ C(m, m) such that

$$AP = Q\Sigma \quad \text{or} \quad A = Q\Sigma P^{\dagger}, \qquad \Sigma = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}(m, n), \quad D = \operatorname{diag}\{\lambda_1, \dots, \lambda_k\}, \tag{6.5.17}$$

where k = r(A) and λ1, . . . , λk are some positive numbers for which $\lambda_1^2, \dots, \lambda_k^2$ are all the positive eigenvalues of the Hermitian matrix A†A. The numbers λ1, . . . , λk are called the singular values of the matrix A and the expression (6.5.17) the singular value decomposition for the matrix A.

Note that Exercise 6.3.12 may be regarded as an early and special version of the general singular value decomposition procedure here.

Of course the results above may also be established similarly for mappings between real vector spaces and for real matrices.
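The construction leading to (6.5.15) can be traced numerically. The following is a minimal sketch for a real 2 × 2 matrix, assuming the real version of the result mentioned above; the matrix A and all helper names are hypothetical choices made for illustration, not taken from the text. It forms the Gram matrix AᵗA, reads off its eigenvalues σ1, σ2, and builds the vectors vi as in (6.5.14).

```python
import math

# Hypothetical example matrix (an assumption made for illustration only).
A = [[3.0, 0.0],
     [4.0, 5.0]]

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def dot(x, y):
    """Real scalar product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

# Gram matrix G = A^t A, the real counterpart of A†A.
G = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Eigenvalues of the symmetric 2x2 matrix G from its trace and determinant.
tr = G[0][0] + G[1][1]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
disc = math.sqrt(tr * tr - 4.0 * det)
eigenvalues = [(tr + disc) / 2.0, (tr - disc) / 2.0]

# Orthonormal eigenvectors of G; the formula (b, sigma - a) is valid here
# because the off-diagonal entry G[0][1] is nonzero.
us = []
for sigma in eigenvalues:
    x = [G[0][1], sigma - G[0][0]]
    norm = math.sqrt(dot(x, x))
    us.append([c / norm for c in x])

# Singular values and the vectors v_i = A u_i / ||A u_i||, as in (6.5.14).
singular_values = [math.sqrt(s) for s in eigenvalues]
vs = [[c / lam for c in matvec(A, u)] for u, lam in zip(us, singular_values)]
```

In this sketch both eigenvalues of the Gram matrix are positive, so k = 2, and the vectors in `vs` come out orthonormal, in agreement with (6.5.13).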

Exercises

6.5.1 Verify that for T ∈ L(U, V) the mapping T′ given in (6.5.2) is a well-defined element in L(V, U) although the scalar products (·, ·)U and (·, ·)V of U and V are both sesquilinear.


6.5.2 Consider the matrix

$$A = \begin{pmatrix} 2 & 1 - i & 3 \\ 3i & -1 & 3 + 2i \end{pmatrix}. \tag{6.5.18}$$

(i) Find the eigenvalues of AA† and A†A and compare.
(ii) Find ‖A‖.

6.5.3 Apply (6.5.8) and use the fact that the nonzero eigenvalues of T′ ◦ T and T ◦ T′ are the same to prove directly that ‖T‖ = ‖T′‖ as stated in Theorem 6.18.

6.5.4 If U is a finite-dimensional vector space with a positive definite scalar product and T ∈ L(U) is normal, show that ‖T²‖ = ‖T‖². Can this result be extended to $\|T^m\| = \|T\|^m$ for any positive integer m?


6.5.5 Consider the matrix

$$A = \begin{pmatrix} 1 + i & 2 & -1 \\ 2 & 1 - i & 1 \end{pmatrix}. \tag{6.5.19}$$

(i) Find the singular values of A.
(ii) Find a singular value decomposition of A.

6.5.6 Let A ∈ C(m, n). Show that A and A† have the same singular values.

6.5.7 Let A ∈ C(n, n) be invertible. Investigate the relationship between the singular values of A and those of A⁻¹.

6.5.8 Let A ∈ C(n, n). Use the singular value decomposition for A to show that A may be rewritten as A = PB = CQ, where P, Q are some unitary matrices and B, C some positive semi-definite Hermitian matrices.

6.5.9 Let U be a finite-dimensional complex vector space with a positive definite Hermitian scalar product and T ∈ L(U) positive semi-definite. Show that the singular values of T are simply the positive eigenvalues of T.

6.5.10 Show that a square complex matrix is unitary if and only if its singular values are all 1.

Note that most of the exercises in Chapter 5 may be restated in the context of the complex situation of this chapter and are omitted.

7

Jordan decomposition

In this chapter we establish the celebrated Jordan decomposition theorem which allows us to reduce a linear mapping over C into a canonical form in terms of its eigenspectrum. As a preparation we first recall some facts regarding factorization of polynomials. Then we show how to reduce a linear mapping over a set of its invariant subspaces determined by a prime factorization of the characteristic polynomial of the mapping. Next we reduce a linear mapping over its generalized eigenspaces. Finally we prove the Jordan decomposition theorem by understanding how a mapping behaves over each of its generalized eigenspaces.

7.1 Some useful facts about polynomials

Let P be the vector space of all polynomials with coefficients in a given field F and in the variable t. Various technical computations and concepts involving elements in P may be simplified considerably with the notion of an 'ideal', as we now describe.

Definition 7.1 A non-empty subset I ⊂ P is called an ideal of P if it satisfies the following two conditions.

(1) f + g ∈ I for any f, g ∈ I.
(2) fg ∈ I for any f ∈ P and g ∈ I.

Since F may naturally be viewed as a subset of P, we see that af ∈ I for any a ∈ F and f ∈ I. Hence an ideal is also a subspace.

Let g1, . . . , gk ∈ P . Construct the subset of P given by

{f1g1 + · · · + fkgk | f1, . . . , fk ∈ P}. (7.1.1)


It is readily checked that the subset defined in (7.1.1) is an ideal of P. We may say that this ideal is generated from g1, . . . , gk and use the notation I(g1, . . . , gk) to denote it.

There are two trivial ideals: I = {0} and I = P, and it is obvious that {0} = I(0) and P = I(1). That is, both {0} and P are generated from some single elements in P. The following theorem establishes that any ideal in P may be generated from a single element in P.

Theorem 7.2 Any ideal in P is singly generated. More precisely, if I ≠ {0} is an ideal of P, then there is an element g ∈ P such that I = I(g). Moreover, if there is another h ∈ P such that I = I(h), then g and h are of the same degree. Besides, if the coefficients of the highest-degree terms of g and h coincide, then g = h.

Proof If I = I(g) for some g ∈ P, it is clear that g will have the lowest degree among all nonzero elements in I in view of the definition of I(g). Such an observation indicates what to look for in our proof. Indeed, since I ≠ {0}, we may choose g to be an element in I \ {0} that is of the lowest degree. Then it is clear that I(g) ⊂ I. For any h ∈ I \ {0}, we claim g|h (i.e. g divides h). Otherwise, we may rewrite h as

h(t) = q(t)g(t)+ r(t) (7.1.2)

for some q, r ∈ P, r ≠ 0, so that the degree of r is lower than that of g. Since g ∈ I, we have qg ∈ I. Thus r = h − qg ∈ I, which contradicts the definition of g. Consequently, g|h. So h ∈ I(g) as expected, which proves I ⊂ I(g).

If there is another h ∈ P such that I = I(h), of course g and h must have the same degree since h|g and g|h. If the coefficients of the highest-degree terms of g and h coincide, then g = h, otherwise g − h ∈ I \ {0} would be of a lower degree, which contradicts the choice of g given earlier.

The theorem is proved.

Let g1, . . . , gk ∈ P \ {0}. Choose g ∈ P such that I(g) = I(g1, . . . , gk). Then there are elements f1, . . . , fk ∈ P such that

g = f1g1 + · · · + fkgk, (7.1.3)

which implies that g contains all common divisors of g1, . . . , gk. In other words, if h|g1, . . . , h|gk, then h|g. On the other hand, the definition of I(g1, . . . , gk) already gives us g|g1, . . . , g|gk. So g itself is a common divisor of g1, . . . , gk. In view of Theorem 7.2, we see that the coefficient of the highest-degree term of g determines g completely. Thus, we may fix g by taking the coefficient of the highest-degree term of g to be 1. Such a polynomial g is referred to as the greatest common divisor of g1, . . . , gk and often denoted as g = gcd(g1, . . . , gk). Therefore, we see that the notion of an ideal and its generator provides an effective tool for the study of greatest common divisors.

Given g1, . . . , gk ∈ P \ {0}, if there does not exist a common divisor of a nontrivial degree (≥ 1) for g1, . . . , gk, then we say that g1, . . . , gk are relatively prime or co-prime. Thus g1, . . . , gk are relatively prime if and only if gcd(g1, . . . , gk) = 1. In this situation, there are elements f1, . . . , fk ∈ P such that the identity

$$f_1(t) g_1(t) + \cdots + f_k(t) g_k(t) = 1 \tag{7.1.4}$$

is valid for arbitrary t. This fact will be our starting point in the subsequent development.
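For two polynomials, the generator g and the coefficients f1, f2 in (7.1.3) can be computed by repeated use of the division (7.1.2). The following is a minimal sketch of this extended Euclidean procedure over the rationals; the function names, the coefficient-list representation (lowest degree first), and the sample polynomials are assumptions made for illustration, and the inputs are assumed not to be both zero.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly(*coeffs):
    """Build a polynomial from coefficients listed lowest degree first."""
    return trim([Fraction(c) for c in coeffs])

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divide(h, g):
    """Return (q, r) with h = q*g + r and deg r < deg g, as in (7.1.2)."""
    quotient = [Fraction(0)] * max(len(h) - len(g) + 1, 1)
    r = list(h)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        c = r[-1] / g[-1]
        quotient[shift] = c
        for i, b in enumerate(g):
            r[i + shift] -= c * b
        r = trim(r)
    return trim(quotient), r

def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b = g, where g is the monic gcd."""
    r0, r1 = trim(a), trim(b)
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = divide(r0, r1)
        neg_q = [-c for c in q]
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, mul(neg_q, s1))
        t0, t1 = t1, add(t0, mul(neg_q, t1))
    lead = r0[-1]  # normalize the highest-degree coefficient to 1
    return ([c / lead for c in r0],
            [c / lead for c in s0],
            [c / lead for c in t0])

g1 = poly(-1, 0, 1)   # t^2 - 1
g2 = poly(2, -3, 1)   # t^2 - 3t + 2
g, f1, f2 = extended_gcd(g1, g2)   # g represents t - 1
```

Here the returned g is t − 1, and f1, f2 are constants satisfying f1 g1 + f2 g2 = g. When the two inputs are relatively prime, the same routine returns g = 1 together with coefficients realizing (7.1.4).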

A polynomial p ∈ P of degree at least 1 is called a prime polynomial or an irreducible polynomial if p cannot be factored into the product of two polynomials in P of degrees at least 1. Two polynomials are said to be equivalent if one is a scalar multiple of the other.

Exercises

7.1.1 For f, g ∈ P show that f|g if and only if I(f) ⊃ I(g).

7.1.2 Consider I ⊂ P over a field F given by

$$I = \{ f \in P \mid f(a_1) = \cdots = f(a_k) = 0 \}, \tag{7.1.5}$$

where a1, . . . , ak ∈ F are distinct.

(i) Show that I is an ideal.
(ii) Find a concrete element g ∈ P such that I = I(g).
(iii) To what extent is the element g obtained in (ii) unique?

7.1.3 Show that if f ∈ P and $(t - 1) \mid f(t^n)$, where n ≥ 1 is an integer, then $(t^n - 1) \mid f(t^n)$.

7.1.4 Let f, g ∈ P, g ≠ 0, and n ≥ 1 be an integer. Prove that f|g if and only if $f^n \mid g^n$.

7.1.5 Let f, g ∈ P be nonzero polynomials and n ≥ 1 an integer. Show that

$$(\gcd(f, g))^n = \gcd(f^n, g^n). \tag{7.1.6}$$

7.1.6 Let U be a finite-dimensional vector space over a field F, T ∈ L(U), and P the vector space of all polynomials over F. Show that I = {p ∈ P | p(T) = 0} is an ideal of P. Let g ∈ P be such that I = I(g) and the coefficient of the highest-degree term of g is 1. Is g the minimal polynomial of T that has the minimum degree among all the polynomials that annihilate T?


7.2 Invariant subspaces of linear mappings

In this section, we show how to use characteristic polynomials and their prime factorization to resolve or reduce linear mappings into mappings over invariant subspaces. As a preparation, we first establish a factorization theorem for characteristic polynomials.

Theorem 7.3 Let U be a finite-dimensional vector space over a field F and T ∈ L(U). Assume that V, W are nontrivial subspaces of U such that U = V ⊕ W and V, W are invariant under T. Use R, S to denote T restricted to V, W, and pR(λ), pS(λ), pT(λ) the characteristic polynomials of R ∈ L(V), S ∈ L(W), T ∈ L(U), respectively. Then pT(λ) = pR(λ)pS(λ).

Proof Let {v1, . . . , vk} and {w1, . . . , wl} be bases of V, W, respectively. Then {v1, . . . , vk, w1, . . . , wl} is a basis of U. Assume that B ∈ F(k, k) and C ∈ F(l, l) are the matrix representations of R and S, with respect to the bases {v1, . . . , vk} and {w1, . . . , wl}, respectively. Then

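$$A = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix} \tag{7.2.1}$$

is the matrix representation of T with respect to the basis {v1, . . . , vk, w1, . . . , wl}. Consequently, we have

$$p_T(\lambda) = \det(\lambda I - A) = \det \begin{pmatrix} \lambda I_k - B & 0 \\ 0 & \lambda I_l - C \end{pmatrix} = \det(\lambda I_k - B) \det(\lambda I_l - C) = p_R(\lambda) p_S(\lambda), \tag{7.2.2}$$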

as asserted, and the theorem is proved.
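The identity (7.2.2) is easy to check on a small example. The following sketch, with hypothetical blocks B and C chosen only for illustration, computes characteristic polynomials in exact rational arithmetic by the Faddeev–LeVerrier recursion (a technique swapped in here for convenience; it is not the method of the text) and compares the two sides of pT = pR pS.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def charpoly(A):
    """Coefficients (lowest degree first) of det(lambda*I - A),
    obtained from the Faddeev-LeVerrier recursion."""
    n = len(A)
    coeffs = [Fraction(0)] * (n + 1)
    coeffs[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0)
              for j in range(n)] for i in range(n)]
        AMk = matmul(A, M)
        coeffs[n - k] = -sum(AMk[i][i] for i in range(n)) / k
    return coeffs

def polymul(p, q):
    """Product of two polynomials given by coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def block_diag(B, C):
    """Assemble the block matrix of (7.2.1) from square blocks B and C."""
    k, l = len(B), len(C)
    return [row + [0] * l for row in B] + [[0] * k + row for row in C]

B = [[1, 2], [3, 4]]   # hypothetical matrix of R on V
C = [[5]]              # hypothetical matrix of S on W
A = block_diag(B, C)

p_T = charpoly(A)
p_R_times_p_S = polymul(charpoly(B), charpoly(C))
```

For these blocks both `p_T` and `p_R_times_p_S` represent λ³ − 10λ² + 23λ + 10.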

We can now demonstrate how to use the factorization of the characteristic polynomial of a linear mapping to naturally resolve it over invariant subspaces.

Theorem 7.4 Let U be a finite-dimensional vector space over a field F and T ∈ L(U) and use pT(λ) to denote the characteristic polynomial of T. Factor pT(λ) into the form

$$p_T = p_1^{n_1} \cdots p_k^{n_k}, \tag{7.2.3}$$

where p1, . . . , pk are nonequivalent prime polynomials in P, and set

$$g_i = \frac{p_T}{p_i^{n_i}} = p_1^{n_1} \cdots \widehat{p_i^{n_i}} \cdots p_k^{n_k}, \quad i = 1, \dots, k, \tag{7.2.4}$$

where $\widehat{\ \cdot\ }$ denotes the factor that is missing. Then we have the following.

(1) The vector space U has the direct decomposition

$$U = V_1 \oplus \cdots \oplus V_k, \quad V_i = N(p_i^{n_i}(T)), \quad i = 1, \dots, k. \tag{7.2.5}$$

(2) T is invariant over each Vi , i = 1, . . . , k.

Proof Since pT(λ) is the characteristic polynomial, the Cayley–Hamilton theorem gives us pT(T) = 0. Since g1, . . . , gk are relatively prime, there are polynomials f1, . . . , fk ∈ P such that

f1(λ)g1(λ)+ · · · + fk(λ)gk(λ) = 1. (7.2.6)

Therefore, we have

f1(T )g1(T )+ · · · + fk(T )gk(T ) = I. (7.2.7)

Thus, given u ∈ U , we can rewrite u as

u = u1 + · · · + uk, ui = fi(T )gi(T )u, i = 1, . . . , k. (7.2.8)

Now since $p_i^{n_i}(T) u_i = p_i^{n_i}(T) f_i(T) g_i(T) u = f_i(T) p_T(T) u = 0$, we get $u_i \in N(p_i^{n_i}(T))$ (i = 1, . . . , k), which proves U = V1 + · · · + Vk.

For any i = 1, . . . , k, we need to show

$$W_i \equiv V_i \cap \Bigg( \sum_{1 \le j \le k,\, j \ne i} V_j \Bigg) = \{0\}. \tag{7.2.9}$$

In fact, since $p_i^{n_i}$ and $g_i$ are relatively prime, there are polynomials $q_i$ and $r_i$ in P such that

$$q_i p_i^{n_i} + r_i g_i = 1. \tag{7.2.10}$$

This gives us the relation

$$q_i(T) p_i^{n_i}(T) + r_i(T) g_i(T) = I. \tag{7.2.11}$$

Note also that the definition of $g_i$ indicates that

$$\sum_{1 \le j \le k,\, j \ne i} V_j \subset N(g_i(T)). \tag{7.2.12}$$

Thus, let u ∈ Wi. Then, applying (7.2.11) to u and using (7.2.12), we find

$$u = q_i(T) p_i^{n_i}(T) u + r_i(T) g_i(T) u = 0. \tag{7.2.13}$$

Therefore (1) is established.

Let u ∈ Vi. Then $p_i^{n_i}(T)(T(u)) = T p_i^{n_i}(T)(u) = 0$. Thus T(u) ∈ Vi and the invariance of Vi under T is proved, which establishes (2).
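The splitting (7.2.8) can be made concrete on a small matrix. In the following sketch the matrix A is a hypothetical example with pT(λ) = (λ − 1)(λ − 2), so that g1 = λ − 2, g2 = λ − 1, and the constants f1 = −1, f2 = 1 satisfy (7.2.6); all names are assumptions made for illustration.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in M]

def shifted(A, lam):
    """Return A - lam*I."""
    return [[A[i][j] - (lam if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

def scale(c, M):
    return [[c * entry for entry in row] for row in M]

# Hypothetical example: p_T(lambda) = (lambda - 1)(lambda - 2).
A = [[1, 1],
     [0, 2]]

# f1(T) g1(T) = -(A - 2I) and f2(T) g2(T) = A - I, as in (7.2.7).
P1 = scale(-1, shifted(A, 2))
P2 = shifted(A, 1)

# Decompose a sample vector u as u = u1 + u2, following (7.2.8).
u = [3, 5]
u1 = matvec(P1, u)   # lies in V1 = N(A - I)
u2 = matvec(P2, u)   # lies in V2 = N(A - 2I)

# The products below vanish because p_T(A) = 0.
check1 = matmul(shifted(A, 1), P1)
check2 = matmul(shifted(A, 2), P2)
```

In this example P1 + P2 is the identity matrix, and both `check1` and `check2` are zero matrices, so u1 ∈ V1 and u2 ∈ V2 for every choice of u.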


Exercises

7.2.1 Let S, T ∈ L(U), where U is a finite-dimensional vector space over a field F. Show that if the characteristic polynomials pS(λ) and pT(λ) of S and T are relatively prime then pS(T) and pT(S) are both invertible.

7.2.2 Let S, T ∈ L(U) where U is a finite-dimensional vector space over a field F. Use the previous exercise to show that if the characteristic polynomials of S and T are relatively prime and R ∈ L(U) satisfies R ◦ S = T ◦ R then R = 0.

7.2.3 Let U be a finite-dimensional vector space over a field F and T ∈ L(U). Show that T is idempotent, T² = T, if and only if

$$r(T) + r(I - T) = \dim(U). \tag{7.2.14}$$

7.2.4 Let U be a finite-dimensional vector space over a field F and T ∈ L(U). Prove the following slightly extended version of Theorem 7.4.

Suppose that the characteristic polynomial of T, say pT(λ), has the factorization pT(λ) = g1(λ)g2(λ) over F where g1 and g2 are relatively prime polynomials. Then U = N(g1(T)) ⊕ N(g2(T)) and both N(g1(T)) and N(g2(T)) are invariant under T.

7.2.5 Let $u \in \mathbb{C}^n$ be a nonzero column vector and set A = uu† ∈ C(n, n).

(i) Show that A² = aA, where a = u†u.
(ii) Find a nonsingular matrix B ∈ C(n, n) so that A = B⁻¹DB where D ∈ C(n, n) is diagonal and determine D.
(iii) Describe

$$N(A) = \{ x \in \mathbb{C}^n \mid Ax = 0 \}, \quad N(A - aI_n) = \{ x \in \mathbb{C}^n \mid (A - aI_n)x = 0 \}, \tag{7.2.15}$$

as two invariant subspaces of the mapping $T \in L(\mathbb{C}^n)$ given by x ↦ Ax ($x \in \mathbb{C}^n$) over which T reduces.
(iv) Determine r(T).

7.2.6 Let U be a finite-dimensional vector space and T1, . . . , Tk ∈ L(U) satisfy $T_i^2 = T_i$ (i = 1, . . . , k) and Ti ◦ Tj = 0 (i, j = 1, . . . , k, i ≠ j, if any). Show that there holds the space decomposition

$$U = R(T_1) \oplus \cdots \oplus R(T_k) \oplus V, \quad V = \bigcap_{i=1}^{k} N(T_i), \tag{7.2.16}$$

which reduces T1, . . . , Tk simultaneously.


7.3 Generalized eigenspaces as invariant subspaces

In the first part of this section, we will carry out a study of nilpotent mappings, which will be crucial for the understanding of the structure of a general linear mapping in terms of its eigenvalues, to be seen in the second part of the section.

7.3.1 Reducibility of nilpotent mappings

Let U be a finite-dimensional vector space over a field F and T ∈ L(U) a nilpotent mapping. Recall that T is said to be of degree m ≥ 1 if m is the smallest integer such that $T^m = 0$. When m = 1, then T = 0 and the situation is trivial. In the nontrivial case, m ≥ 2, we know that if u ∈ U is of period m (that is, $T^m(u) = 0$ but $T^{m-1}(u) \ne 0$), then $u, T(u), \dots, T^{m-1}(u)$ are linearly independent vectors in U. Thus, in any nontrivial situation, m satisfies 2 ≤ m ≤ dim(U).

For a nontrivial nilpotent mapping, we have the following key results.

Theorem 7.5 Let T ∈ L(U) be a nilpotent mapping of degree m ≥ 2 where U is finite-dimensional. Then the following are valid.

(1) There are k vectors u1, . . . , uk and k integers m1 ≥ 2, . . . , mk ≥ 2 such that u1, . . . , uk are of periods m1, . . . , mk, respectively, and U has a basis of the form

$$u_1^0, \dots, u_{k_0}^0, \; u_1, T(u_1), \dots, T^{m_1 - 1}(u_1), \; \dots, \; u_k, T(u_k), \dots, T^{m_k - 1}(u_k), \tag{7.3.1}$$

where $u_1^0, \dots, u_{k_0}^0$, if any, are some vectors taken from N(T). Thus, setting

$$\begin{cases} U_0 = \operatorname{Span}\{u_1^0, \dots, u_{k_0}^0\}, \\ U_i = \operatorname{Span}\{u_i, T(u_i), \dots, T^{m_i - 1}(u_i)\}, \quad i = 1, \dots, k, \end{cases} \tag{7.3.2}$$

we have the decomposition

$$U = U_0 \oplus U_1 \oplus \cdots \oplus U_k, \tag{7.3.3}$$

and that U0, U1, . . . , Uk are invariant under T.

(2) The degree m of T is given by

m = max{mi | i = 1, . . . , k}. (7.3.4)

(3) The sum of the integers k0 and k is the nullity of T :

k0 + k = n(T ). (7.3.5)


Proof (1) We proceed inductively with m ≥ 2.

(i) m = 2.

Let {v1, . . . , vk} be a basis of R(T). Choose u1, . . . , uk ∈ U such that T(u1) = v1, . . . , T(uk) = vk. We assert that

u1, T (u1), . . . , uk, T (uk) (7.3.6)

are linearly independent vectors in U . To see this, consider

$$a_1 u_1 + b_1 T(u_1) + \cdots + a_k u_k + b_k T(u_k) = 0, \quad a_i, b_i \in \mathbb{F}, \quad i = 1, \dots, k. \tag{7.3.7}$$

Applying T to (7.3.7) and using T² = 0, we obtain a1v1 + · · · + akvk = 0. Thus a1 = · · · = ak = 0. Inserting this result into (7.3.7), we obtain b1v1 + · · · + bkvk = 0, which leads to b1 = · · · = bk = 0.

It is clear that v1, . . . , vk ∈ N(T) since T² = 0. Choose $u_1^0, \dots, u_{k_0}^0 \in N(T)$, if any, so that $\{u_1^0, \dots, u_{k_0}^0, v_1, \dots, v_k\}$ becomes a basis of N(T).

For any u ∈ U, we rewrite T(u) as

T (u) = a1v1 + · · · + akvk = T (a1u1 + · · · + akuk), (7.3.8)

where a1, . . . , ak ∈ F are uniquely determined. Therefore we conclude that $u - (a_1 u_1 + \cdots + a_k u_k) \in N(T)$ and that there are unique scalars $c_1, \dots, c_{k_0}, b_1, \dots, b_k$ such that

$$u - (a_1 u_1 + \cdots + a_k u_k) = c_1 u_1^0 + \cdots + c_{k_0} u_{k_0}^0 + b_1 v_1 + \cdots + b_k v_k. \tag{7.3.9}$$

Consequently we see that

$$u_1^0, \dots, u_{k_0}^0, \; u_1, T(u_1), \dots, u_k, T(u_k) \tag{7.3.10}$$

form a basis of U, as described.

(ii) m ≥ 3.

We assume the statement in (1) holds when the degree of a nilpotent mapping is up to m − 1 ≥ 2.

Let T ∈ L(U) be of degree m ≥ 3 and set V = R(T). Since V is invariant under T, we may regard T as an element in L(V). To avoid confusion, we use $T_V$ to denote the restriction of T over V.

It is clear that the degree of $T_V$ is m − 1 ≥ 2. Applying the inductive assumption to $T_V$, we see that there are vectors v1, . . . , vl ∈ V with respective periods $m'_1 \ge 2, \dots, m'_l \ge 2$ and vectors $v_1^0, \dots, v_{l_0}^0 \in N(T_V)$, if any, such that

$$v_1^0, \dots, v_{l_0}^0, \; v_1, T_V(v_1), \dots, T_V^{m'_1 - 1}(v_1), \; \dots, \; v_l, T_V(v_l), \dots, T_V^{m'_l - 1}(v_l) \tag{7.3.11}$$

form a basis of V.


Since V = R(T), we can find some $w_1, \dots, w_{l_0}, u_1, \dots, u_l \in U$ such that

$$T(w_1) = v_1^0, \dots, T(w_{l_0}) = v_{l_0}^0, \quad T(u_1) = v_1, \dots, T(u_l) = v_l. \tag{7.3.12}$$

Hinted by (i), we assert that

$$w_1, v_1^0, \dots, w_{l_0}, v_{l_0}^0, \; u_1, T(u_1), \dots, T^{m'_1}(u_1), \; \dots, \; u_l, T(u_l), \dots, T^{m'_l}(u_l) \tag{7.3.13}$$

are linearly independent. In fact, consider the relation

$$\sum_{i=1}^{l_0} (a_i w_i + b_i v_i^0) + \sum_{i=1}^{l} \sum_{j=0}^{m'_i} b_{ij} T^j(u_i) = 0, \tag{7.3.14}$$

where the $a_i$, $b_i$, and $b_{ij}$ are scalars. Applying T to (7.3.14) and using (7.3.12) we arrive at

$$\sum_{i=1}^{l_0} a_i v_i^0 + \sum_{i=1}^{l} \sum_{j=0}^{m'_i - 1} b_{ij} T_V^j(v_i) = 0, \tag{7.3.15}$$

which results in the conclusion

$$a_i = 0, \quad i = 1, \dots, l_0; \qquad b_{ij} = 0, \quad j = 0, 1, \dots, m'_i - 1, \quad i = 1, \dots, l. \tag{7.3.16}$$

Substituting (7.3.16) into (7.3.14) we have

$$b_1 v_1^0 + \cdots + b_{l_0} v_{l_0}^0 + b_{1 m'_1} T^{m'_1}(u_1) + \cdots + b_{l m'_l} T^{m'_l}(u_l) = 0. \tag{7.3.17}$$

In other words, we get

$$b_1 v_1^0 + \cdots + b_{l_0} v_{l_0}^0 + b_{1 m'_1} T_V^{m'_1 - 1}(v_1) + \cdots + b_{l m'_l} T_V^{m'_l - 1}(v_l) = 0. \tag{7.3.18}$$

Since the vectors given in (7.3.11) are linearly independent, we find

$$b_1 = \cdots = b_{l_0} = b_{1 m'_1} = \cdots = b_{l m'_l} = 0 \tag{7.3.19}$$

as well, which establishes that the vectors given in (7.3.13) are linearly independent.

It is clear that

$$v_1^0, \dots, v_{l_0}^0, \; T^{m'_1}(u_1), \dots, T^{m'_l}(u_l) \in N(T). \tag{7.3.20}$$

Find $u_1^0, \dots, u_{k_0}^0 \in N(T)$, if any, such that

$$u_1^0, \dots, u_{k_0}^0, \; v_1^0, \dots, v_{l_0}^0, \; T^{m'_1}(u_1), \dots, T^{m'_l}(u_l) \tag{7.3.21}$$

form a basis of N(T ).


Take any u ∈ U . We know that T (u) ∈ V implies that

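$$\begin{aligned} T(u) &= \sum_{i=1}^{l_0} b_i v_i^0 + \sum_{i=1}^{l} \sum_{j=0}^{m'_i - 1} b_{ij} T_V^j(v_i) \\ &= \sum_{i=1}^{l_0} b_i T(w_i) + \sum_{i=1}^{l} \sum_{j=0}^{m'_i - 1} b_{ij} T^{j+1}(u_i) \end{aligned} \tag{7.3.22}$$

for some unique $b_i$ and $b_{ij}$ in F, which gives rise to the result

$$u - \sum_{i=1}^{l_0} b_i w_i - \sum_{i=1}^{l} \sum_{j=0}^{m'_i - 1} b_{ij} T^j(u_i) \in N(T). \tag{7.3.23}$$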

Consequently, we see that

$$u - \sum_{i=1}^{l_0} b_i w_i - \sum_{i=1}^{l} \sum_{j=0}^{m'_i - 1} b_{ij} T^j(u_i) = \sum_{i=1}^{k_0} a_i u_i^0 + \sum_{i=1}^{l_0} c_i v_i^0 + \sum_{i=1}^{l} d_i T^{m'_i}(u_i) \tag{7.3.24}$$

for some unique $a_i$, $c_i$, and $d_i$ in F. Thus we conclude that the vectors

$$\begin{cases} u_1^0, \dots, u_{k_0}^0, \; w_1, T(w_1), \dots, w_{l_0}, T(w_{l_0}), \\ u_i, T(u_i), \dots, T^{m'_i}(u_i), \quad i = 1, \dots, l, \end{cases} \tag{7.3.25}$$

form a basis of U so that $u_1^0, \dots, u_{k_0}^0 \in N(T)$ (if any), $w_1, \dots, w_{l_0}$ are of period 2, and u1, . . . , ul of periods $m'_1 + 1, \dots, m'_l + 1$, respectively.

Thus (1) is proved.

Statement (2) is obvious.

In (7.3.25), we have seen that k = l0 + l. Hence, from (7.3.21), we have n(T) = k0 + k as anticipated, which proves (3).
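Statements (2) and (3) of Theorem 7.5 can be checked on a matrix written directly in a basis of the form (7.3.1). The sketch below is an illustration under assumed data: the helper names, the choice k0 = 1, and the periods 3 and 2 are hypothetical. It builds the corresponding nilpotent matrix and computes its degree and nullity with exact arithmetic.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def nilpotent_matrix(k0, periods):
    """Matrix of T in a basis of the form (7.3.1): k0 vectors of N(T),
    then, for each period m, a chain u, T(u), ..., T^(m-1)(u)."""
    n = k0 + sum(periods)
    N = [[0] * n for _ in range(n)]
    offset = k0
    for m in periods:
        for j in range(m - 1):
            N[offset + j + 1][offset + j] = 1   # T maps each chain vector to the next
        offset += m
    return N

def degree(N):
    """Smallest d with N^d = 0 (N is assumed to be nilpotent)."""
    P, d = N, 1
    while any(any(row) for row in P):
        P = matmul(P, N)
        d += 1
    return d

k0, periods = 1, [3, 2]            # hypothetical data, so k = 2
N = nilpotent_matrix(k0, periods)
nullity = len(N) - rank(N)
```

With these choices `degree(N)` equals max{3, 2} = 3, as in (7.3.4), and `nullity` equals k0 + k = 3, as in (7.3.5).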

We are now prepared to consider general linear mappings over complex vector spaces.

7.3.2 Reducibility of a linear mapping via generalized eigenspaces

We now assume that the field we work on is C. In this situation, any prime polynomial must be of degree 1, which allows us to make the statements in Theorem 7.4 concrete and explicit.


Theorem 7.6 Let U be a complex n-dimensional vector space and T ∈ L(U). For the characteristic polynomial pT(λ) of T, let λ1, . . . , λk be the distinct roots of pT(λ) so that

$$p_T(\lambda) = (\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_k)^{n_k}, \quad n_1, \dots, n_k \in \mathbb{N}. \tag{7.3.26}$$

Then the following statements are valid.

(1) The vector space U has the decomposition

$$U = V_1 \oplus \cdots \oplus V_k, \quad V_i = N((T - \lambda_i I)^{n_i}), \quad i = 1, \dots, k. \tag{7.3.27}$$

(2) Each Vi is invariant under T and T − λiI is nilpotent over Vi, i = 1, . . . , k.

(3) The characteristic polynomial of T restricted to Vi is simply

$$p_{V_i}(\lambda) = (\lambda - \lambda_i)^{n_i}, \tag{7.3.28}$$

and the dimension of Vi is ni, i = 1, . . . , k.

Proof In view of Theorem 7.4, it remains only to establish (3).

From Theorem 7.3, we have the factorization

$$p_T(\lambda) = p_{V_1}(\lambda) \cdots p_{V_k}(\lambda), \tag{7.3.29}$$

where $p_{V_i}(\lambda)$ is the characteristic polynomial of T restricted to Vi (i = 1, . . . , k).

For fixed i, we consider T over Vi. Since T − λiI is nilpotent on Vi, we can find vectors $u_1^0, \dots, u_{m_0}^0 \in N(T - \lambda_i I)$, if any, and cyclic vectors u1, . . . , ul of respective periods m1 ≥ 2, . . . , ml ≥ 2, if any, so that

$$\begin{cases} u_1^0, \dots, u_{m_0}^0, \\ u_1, (T - \lambda_i I)(u_1), \dots, (T - \lambda_i I)^{m_1 - 1}(u_1), \\ \cdots \cdots \cdots \\ u_l, (T - \lambda_i I)(u_l), \dots, (T - \lambda_i I)^{m_l - 1}(u_l), \end{cases} \tag{7.3.30}$$

form a basis for Vi, in view of Theorem 7.5. In particular, we have

$$d_i \equiv \dim(V_i) = m_0 + m_1 + \cdots + m_l. \tag{7.3.31}$$


With respect to such a basis, the matrix of T − λiI is seen to take the following boxed diagonal form

$$\begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & S_1 & \cdots & 0 \\ 0 & \cdots & \ddots & 0 \\ 0 & \cdots & 0 & S_l \end{pmatrix}, \tag{7.3.32}$$

where, in the diagonal of the above matrix, 0 is the zero matrix of size m0 × m0 and S1, . . . , Sl are shift matrices of sizes m1 × m1, . . . , ml × ml, respectively. Therefore the matrix of T = (T − λiI) + λiI with respect to the same basis is simply

$$A_i = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & S_1 & \cdots & 0 \\ 0 & \cdots & \ddots & 0 \\ 0 & \cdots & 0 & S_l \end{pmatrix} + \lambda_i I_{d_i}. \tag{7.3.33}$$

Consequently, it follows immediately that the characteristic polynomial of T restricted to Vi may be computed by

$$p_{V_i}(\lambda) = \det(\lambda I_{d_i} - A_i) = (\lambda - \lambda_i)^{d_i}. \tag{7.3.34}$$

Finally, inserting (7.3.34) into (7.3.29), we obtain

$$p_T(\lambda) = (\lambda - \lambda_1)^{d_1} \cdots (\lambda - \lambda_k)^{d_k}. \tag{7.3.35}$$

Comparing (7.3.35) with (7.3.26), we arrive at di = ni, i = 1, . . . , k, as asserted.

Note that the factorization expressed in (7.3.26) indicates that each eigenvalue, λi, repeats itself ni times as a root in the characteristic polynomial of T. For this reason, the integer ni is called the algebraic multiplicity of the eigenvalue λi of T.

Given T ∈ L(U), let λi be an eigenvalue of T. For any integer m ≥ 1, the nonzero vectors in $N((T - \lambda_i I)^m)$ are called the generalized eigenvectors and $N((T - \lambda_i I)^m)$ the generalized eigenspace associated to the eigenvalue λi. If m = 1, generalized eigenvectors and eigenspace are simply eigenvectors and eigenspace, respectively, associated to the eigenvalue λi of T, and

$$n(T - \lambda_i I) = \dim(N(T - \lambda_i I)) \tag{7.3.36}$$

is the geometric multiplicity of the eigenvalue λi. Since $N(T - \lambda_i I) \subset N((T - \lambda_i I)^{n_i})$, we have

$$n(T - \lambda_i I) \le \dim(N((T - \lambda_i I)^{n_i})) = n_i. \tag{7.3.37}$$

In other words, the geometric multiplicity of any eigenvalue of a linear mapping is less than or equal to its algebraic multiplicity.
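The inequality (7.3.37) may be observed on a small matrix. In the following sketch the upper triangular matrix A, and therefore its eigenvalues 2 and 3 with algebraic multiplicities 2 and 1, are hypothetical choices made for illustration; the helper names are likewise assumptions. Nullities are computed by exact Gaussian elimination.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r

def nullity(M):
    return len(M[0]) - rank(M)

def shifted(A, lam):
    """Return A - lam*I."""
    return [[A[i][j] - (lam if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

# Hypothetical example with p_T(lambda) = (lambda - 2)^2 (lambda - 3).
A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
algebraic = {2: 2, 3: 1}   # read off the diagonal of the triangular matrix A

geometric = {}             # n(T - lambda_i I), as in (7.3.36)
generalized = {}           # dim N((T - lambda_i I)^{n_i})
for lam, n_i in algebraic.items():
    B = shifted(A, lam)
    P = B
    for _ in range(n_i - 1):
        P = matmul(P, B)
    geometric[lam] = nullity(B)
    generalized[lam] = nullity(P)
```

For the eigenvalue 2 the geometric multiplicity is 1 while the generalized eigenspace has dimension 2, which equals the algebraic multiplicity, in agreement with Theorem 7.6 (3) and (7.3.37).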

Exercises

7.3.1 Let U be a finite-dimensional vector space over a field F and T ∈ L(U). Assume that λ0 ∈ F is an eigenvalue of T and consider the eigenspace $E_{\lambda_0} = N(T - \lambda_0 I)$ associated with λ0. Let {u1, . . . , uk} be a basis of $E_{\lambda_0}$ and extend it to obtain a basis of U, say B = {u1, . . . , uk, v1, . . . , vl}. Show that, using the matrix representation of T with respect to the basis B, the characteristic polynomial of T may be shown to take the form

$$p_T(\lambda) = (\lambda - \lambda_0)^k q(\lambda), \tag{7.3.38}$$

where q(λ) is a polynomial of degree l with coefficients in F. In particular, use (7.3.38) to infer again, without relying on Theorem 7.6, that the geometric multiplicity of the eigenvalue λ0 does not exceed its algebraic multiplicity.

7.3.2 Let U be an n-dimensional vector space over a field F and T ∈ L(U). We say that U is a cyclic vector space with respect to T if there is a vector u ∈ U such that the vectors

$$u, T(u), \dots, T^{n-1}(u) \tag{7.3.39}$$

form a basis for U, and we call the vector u a cyclic vector of T.

(i) Find the matrix representation of T with respect to the basis

$$\{T^{n-1}(u), \dots, T(u), u\} \tag{7.3.40}$$

with specifying

$$T(T^{n-1}(u)) = a_{n-1} T^{n-1}(u) + \cdots + a_1 T(u) + a_0 u, \quad a_0, a_1, \dots, a_{n-1} \in \mathbb{F}. \tag{7.3.41}$$

Note that T is nilpotent of degree n only when a0 = a1 = · · · = an−1 = 0.

(ii) Find the characteristic polynomial and the minimal polynomial of T in terms of a0, a1, . . . , an−1.


7.3.3 Let U be an n-dimensional vector space (n ≥ 2) over a field F and T ∈ L(U). Assume that T has a cyclic vector. Show that S ∈ L(U) commutes with T if and only if S = p(T) for some polynomial p(t) of degree at most n − 1 and with coefficients in F.

7.3.4 Let U be an n-dimensional complex vector space with a positive definite scalar product and T ∈ L(U) a normal mapping.

(i) Show that if T has a cyclic vector then T has n distinct eigenvalues.
(ii) Assume that T has n distinct eigenvalues and u1, . . . , un are the associated eigenvectors. Show that

$$u = a_1 u_1 + \cdots + a_n u_n \tag{7.3.42}$$

is a cyclic vector of T if and only if ai ≠ 0 for every i = 1, . . . , n.

7.3.5 Let U be an n-dimensional vector space over a field F and T ∈ L(U) a degree n nilpotent mapping, n ≥ 2. Show that there is no S ∈ L(U) such that S² = T.

7.4 Jordan decomposition theorem

Let T ∈ L(U) where U is an n-dimensional vector space over C and λ1, . . . , λk all the distinct eigenvalues of T, of respective algebraic multiplicities n1, . . . , nk, so that the characteristic polynomial of T assumes the form

$$p_T(\lambda) = (\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_k)^{n_k}. \tag{7.4.1}$$

For each i = 1, . . . , k, use Vi to denote the generalized eigenspace associated with the eigenvalue λi:

$$V_i = N((T - \lambda_i I)^{n_i}). \tag{7.4.2}$$

Then we have seen the following.

(1) Vi is invariant under T.

(2) There are eigenvectors $u_{i,1}^0, \dots, u_{i,l_i^0}^0$ of T, if any, associated with the eigenvalue λi, and cyclic vectors $u_{i,1}, \dots, u_{i,l_i}$ of respective periods $m_{i,1} \ge 2, \dots, m_{i,l_i} \ge 2$, if any, relative to T − λiI, such that Vi has a basis, denoted by Bi, consisting of vectors

$$\begin{cases} u_{i,1}^0, \dots, u_{i,l_i^0}^0, \\ u_{i,1}, (T - \lambda_i I)(u_{i,1}), \dots, (T - \lambda_i I)^{m_{i,1} - 1}(u_{i,1}), \\ \cdots \cdots \cdots \\ u_{i,l_i}, (T - \lambda_i I)(u_{i,l_i}), \dots, (T - \lambda_i I)^{m_{i,l_i} - 1}(u_{i,l_i}). \end{cases} \tag{7.4.3}$$


(3) T − λiI is nilpotent of degree mi on Vi where

$$m_i = \max\{1, m_{i,1}, \dots, m_{i,l_i}\}. \tag{7.4.4}$$

Since $(T - \lambda_i I)^{n_i}$ is null over Vi, we have mi ≤ ni. Furthermore, applying T to the vectors in (7.4.3), we have

$$\begin{cases} T(u_{i,s}^0) = \lambda_i u_{i,s}^0, \quad s = 1, \dots, l_i^0, \\ T(u_{i,1}) = (T - \lambda_i I)(u_{i,1}) + \lambda_i u_{i,1}, \\ T((T - \lambda_i I)(u_{i,1})) = (T - \lambda_i I)^2(u_{i,1}) + \lambda_i (T - \lambda_i I)(u_{i,1}), \\ \cdots \cdots \cdots \\ T((T - \lambda_i I)^{m_{i,1} - 1}(u_{i,1})) = \lambda_i (T - \lambda_i I)^{m_{i,1} - 1}(u_{i,1}), \\ \cdots \cdots \cdots \\ T(u_{i,l_i}) = (T - \lambda_i I)(u_{i,l_i}) + \lambda_i u_{i,l_i}, \\ T((T - \lambda_i I)(u_{i,l_i})) = (T - \lambda_i I)^2(u_{i,l_i}) + \lambda_i (T - \lambda_i I)(u_{i,l_i}), \\ \cdots \cdots \cdots \\ T((T - \lambda_i I)^{m_{i,l_i} - 1}(u_{i,l_i})) = \lambda_i (T - \lambda_i I)^{m_{i,l_i} - 1}(u_{i,l_i}). \end{cases} \tag{7.4.5}$$

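From (7.4.5), we see that, as an element in L(Vi), the matrix representation of T with respect to the basis (7.4.3) is

$$J_i \equiv \begin{pmatrix} J_{i,0} & 0 & \cdots & 0 \\ 0 & J_{i,1} & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & \cdots & 0 & J_{i,l_i} \end{pmatrix}, \tag{7.4.6}$$

where $J_{i,0} = \lambda_i I_{l_i^0}$ and

$$J_{i,s} = \begin{pmatrix} \lambda_i & 0 & \cdots & \cdots & 0 \\ 1 & \lambda_i & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & \cdots & \ddots & \ddots & \vdots \\ 0 & \cdots & \cdots & 1 & \lambda_i \end{pmatrix} \tag{7.4.7}$$

is an $m_{i,s} \times m_{i,s}$ matrix, s = 1, . . . , li.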


Alternatively, we may also reorder the vectors listed in (7.4.3) to get

$$\begin{cases} u_{i,1}^0, \dots, u_{i,l_i^0}^0, \\ (T - \lambda_i I)^{m_{i,1} - 1}(u_{i,1}), \dots, (T - \lambda_i I)(u_{i,1}), u_{i,1}, \\ \cdots \cdots \cdots \\ (T - \lambda_i I)^{m_{i,l_i} - 1}(u_{i,l_i}), \dots, (T - \lambda_i I)(u_{i,l_i}), u_{i,l_i}. \end{cases} \tag{7.4.8}$$

With the choice of these reordered basis vectors, the submatrix $J_{i,s}$ instead takes the following updated form,

$$J_{i,s} = \begin{pmatrix} \lambda_i & 1 & 0 & \cdots & 0 \\ 0 & \lambda_i & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & \cdots & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda_i \end{pmatrix} \in \mathbb{C}(m_{i,s}, m_{i,s}), \quad s = 1, \dots, l_i. \tag{7.4.9}$$

The submatrix $J_{i,s}$ given in either (7.4.7) or (7.4.9) is called a Jordan block. To simplify this statement, $J_{i,0} = \lambda_i I_{l_i^0}$ is customarily said to consist of $l_i^0$ 1 × 1 (degenerate) Jordan blocks.

Consequently, if we choose

B = B1 ∪ · · · ∪ Bk, (7.4.10)

where Bi is as given in (7.4.3) or (7.4.8), i = 1, . . . , k, to be a basis of U, then the matrix that represents T with respect to B is

$$J = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & J_k \end{pmatrix}, \tag{7.4.11}$$

which is called a Jordan canonical form or a Jordan matrix.

We may summarize the above discussion into the following theorem, which is the celebrated Jordan decomposition theorem.

Theorem 7.7 Let U be an n-dimensional vector space over C and T ∈ L(U) so that its distinct eigenvalues are λ1, . . . , λk with respective algebraic multiplicities n1, . . . , nk. Then the following hold.

(1) U has the decomposition U = V1 ⊕ · · · ⊕ Vk and T is invariant over each of the subspaces V1, . . . , Vk.

(2) For each i = 1, . . . , k, T − λiI is nilpotent of degree mi over Vi where mi ≤ ni.

(3) For each i = 1, . . . , k, appropriate eigenvectors and generalized eigenvectors may be chosen in Vi as stated in (7.4.3) or (7.4.8) which generate a basis of Vi, say Bi.

(4) With respect to the basis B = B1 ∪ · · · ∪ Bk of U, the matrix representation of T assumes the Jordan canonical form (7.4.11).

The theorem indicates that T nullifies the polynomial

$$m_T(\lambda) = (\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_k)^{m_k} \tag{7.4.12}$$

which is a polynomial of the minimum degree among all nonzero polynomials having T as a root. In fact, to show mT(T) = 0, we rewrite any u ∈ U in the form

$$u = u_1 + \cdots + u_k, \quad u_1 \in V_1, \dots, u_k \in V_k. \tag{7.4.13}$$

Thus, we have

$$\begin{aligned} m_T(T)(u) &= (T - \lambda_1 I)^{m_1} \cdots (T - \lambda_k I)^{m_k}(u_1 + \cdots + u_k) \\ &= \sum_{i=1}^{k} \Big( (T - \lambda_1 I)^{m_1} \cdots [(T - \lambda_i I)^{m_i}] \cdots (T - \lambda_k I)^{m_k} \Big) (T - \lambda_i I)^{m_i}(u_i) \\ &= 0, \end{aligned} \tag{7.4.14}$$

which establishes mT(T) = 0 as asserted. Here [·] denotes the item that is missing in the expression. Since mT(λ)|pT(λ), we arrive at pT(T) = 0, as stated in the Cayley–Hamilton theorem.

Given T ∈ L(U), use P to denote the vector space of all polynomials with coefficients in C and consider the subspace of P:

$$A_T = \{ p \in P \mid p(T) = 0 \}. \tag{7.4.15}$$

It is clear that $A_T$ is an ideal in P. Elements in $A_T$ are also called annihilating polynomials of the mapping T. Let $A_T$ be generated by some m(λ). Then m(λ) has the property that it is a minimal-degree polynomial among all nonzero elements in $A_T$. If we normalize the coefficient of the highest-degree term of m(λ) to 1, then m is uniquely determined and is called the minimal polynomial of the linear mapping T. It is clear that, given T ∈ L(U), if λ1, . . . , λk are all the distinct eigenvalues of T and m1, . . . , mk are the corresponding degrees of nilpotence of T − λ1I, . . . , T − λkI over the respective generalized eigenspaces (7.4.2), then mT(λ) defined in (7.4.12) is the minimal polynomial of T.

For example, if T is nilpotent of degree k, then $m_T(\lambda) = \lambda^k$; if T is idempotent, that is, T² = T, and T ≠ 0, T ≠ I, then mT(λ) = λ² − λ.

We may use minimal polynomials to characterize a diagonalizable linear mapping whose matrix representation with respect to a suitable basis is diagonal. In such a situation it is clear that this basis is made of eigenvectors and the diagonal entries of the diagonal matrix are the corresponding eigenvalues of the mapping.

Theorem 7.8 Let U be an n-dimensional vector space over C and T ∈ L(U). Then T is diagonalizable if and only if its minimal polynomial has only simple roots.

Proof Let mT (λ) be the minimal polynomial of T and λ1, . . . , λk the distincteigenvalues of T . If all roots of mT (λ) are simple, then for each eigenvalue λi ,the degree of T − λiI over the generalized eigenspace associated to λi is 1, orT = λiI , i= 1, . . . , k. Thus all the Jordan blocks given in (7.4.6) are diagonal,which makes J stated in (7.4.11) diagonal.

Conversely, assume T is diagonalizable and λ_1, ..., λ_k are all the distinct eigenvalues of T. Since U = E_{λ_1} ⊕ · · · ⊕ E_{λ_k}, we see that h(λ) = (λ − λ_1) · · · (λ − λ_k) ∈ A_T defined in (7.4.15). We claim that m_T(λ) = h(λ). To see this, we only need to show that for any element p ∈ A_T we have p(λ_i) = 0, i = 1, ..., k. Assume otherwise p(λ_1) ≠ 0 (say). Then p(λ) and λ − λ_1 are co-prime. So there are polynomials f, g such that f(λ)p(λ) + g(λ)(λ − λ_1) ≡ 1. Consequently, since p(T) = 0, I = g(T)(T − λ_1 I), which leads to the contradiction u = 0 for any u ∈ E_{λ_1}. Thus every nonzero element of A_T vanishes at λ_1, ..., λ_k, and we see that h is the lowest degree element in A_T \ {0}. In other words, m_T = h.

As a matrix version of Theorem 7.7, we may state that any n × n complex matrix is similar to a Jordan matrix of the form (7.4.11). For matrices, minimal polynomials may be defined analogously and, hence, omitted. Besides, the matrix version of Theorem 7.8 may read as follows: An n × n matrix is diagonalizable if and only if the roots of its minimal polynomial are all simple.
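The matrix version of Theorem 7.8 suggests a simple test: form h(A) = (A − λ_1 I) · · · (A − λ_k I) over the distinct eigenvalues and check whether it vanishes. The sketch below assumes the distinct eigenvalues are already known and supplied by the caller; the function names and the two sample matrices are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def h_of(A, distinct_eigenvalues):
    """Evaluate h(A) = (A - l_1 I) ... (A - l_k I) for the given l_i."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for lam in distinct_eigenvalues:
        shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
                   for i in range(n)]
        result = matmul(result, shifted)
    return result

def has_simple_minimal_polynomial(A, distinct_eigenvalues):
    """True when h(A) = 0, i.e. the minimal polynomial has simple roots."""
    return all(entry == 0
               for row in h_of(A, distinct_eigenvalues) for entry in row)

# A 2 x 2 Jordan block: single eigenvalue 2, minimal polynomial (t - 2)^2.
jordan_block = [[2, 1], [0, 2]]
# An upper triangular matrix with distinct eigenvalues 2 and 3.
triangular = [[2, 1], [0, 3]]

block_result = has_simple_minimal_polynomial(jordan_block, [2])
triangular_result = has_simple_minimal_polynomial(triangular, [2, 3])
```

Here `block_result` is `False` (the Jordan block is not diagonalizable) and `triangular_result` is `True`.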

Exercises

7.4.1 Show that a diagonalizable nilpotent mapping must be trivial, T = 0, and a nontrivial idempotent mapping T ≠ I is diagonalizable.


7.4.2 Let λ_1, ..., λ_k be all the distinct eigenvalues of a Hermitian mapping T over an n-dimensional vector space U over C. Show that

\[ m_T(\lambda) = (\lambda - \lambda_1) \cdots (\lambda - \lambda_k). \tag{7.4.16} \]

7.4.3 Let U be an n-dimensional vector space over C and S, T ∈ L(U). In the proof of Theorem 7.8, it is shown that if S ∼ T, then m_S(λ) = m_T(λ). Give an example to show that the condition m_S(λ) = m_T(λ) is not sufficient to ensure S ∼ T.

7.4.4 Let A ∈ C(n, n). Show that A ∼ A^t.

7.4.5 Let A, B ∈ R(n, n). Show that if there is a nonsingular element C ∈ C(n, n) such that A = C^{−1}BC then there is a nonsingular element K ∈ R(n, n) such that A = K^{−1}BK.

7.4.6 Let A, B ∈ C(n, n) be normal. Show that if the characteristic polynomials of A, B coincide then A ∼ B.

7.4.7 Let T ∈ L(C^n) be defined by

\[ T(x) = \begin{pmatrix} x_n \\ \vdots \\ x_1 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{C}^n. \tag{7.4.17} \]

(i) Determine all the eigenvalues of T.

(ii) Find the minimal polynomial of T.

(iii) Does C^n have a basis consisting of eigenvectors of T?

7.4.8 Consider the matrix

\[ A = \begin{pmatrix} a & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \tag{7.4.18} \]

where a ∈ R.

(i) For what value(s) of a can or cannot the matrix A be diagonalized?

(ii) Find the Jordan forms of A corresponding to various values of a.

7.4.9 Let A ∈ C(n, n) and k ≥ 2 be an integer such that A ∼ A^k.

(i) Show that, if λ is an eigenvalue of A, so is λ^k.

(ii) Show that, if in addition A is nonsingular, then each eigenvalue of A is a root of unity. In other words, if λ ∈ C is an eigenvalue of A, then there is an integer s ≥ 1 such that λ^s = 1.

7.4.10 Let A ∈ C(n, n) satisfy A^m = aI_n for some integer m ≥ 1 and nonzero a ∈ C. Use the information about the minimal polynomial of A to prove that A is diagonalizable.


7.4.11 Let A, B ∈ F(n, n) satisfy A ∼ B. Show that adj(A) ∼ adj(B).

7.4.12 Show that the n × n matrices

\[ A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}, \quad B = \begin{pmatrix} n & 0 & \cdots & 0 \\ b_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ b_n & 0 & \cdots & 0 \end{pmatrix}, \tag{7.4.19} \]

where b_2, ..., b_n ∈ R, are similar and diagonalizable.

7.4.13 Show that the matrices

\[ A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & -6 & 2 \end{pmatrix}, \tag{7.4.20} \]

are similar and find a nonsingular element C ∈ R(3, 3) such that A = C^{−1}BC.

7.4.14 Show that if A ∈ C(n, n) has a single eigenvalue λ then A is not diagonalizable unless A = λI_n. In particular, a triangular matrix with identical diagonal entries can never be diagonalizable unless it is already diagonal.

7.4.15 Consider A ∈ C(n, n) and express its characteristic polynomial p_A(λ) as

\[ p_A(\lambda) = (\lambda - \lambda_1)^{n_1} \cdots (\lambda - \lambda_k)^{n_k}, \tag{7.4.21} \]

where λ_1, ..., λ_k ∈ C are the distinct eigenvalues of A and n_1, ..., n_k the respective algebraic multiplicities of these eigenvalues such that n_1 + · · · + n_k = n. Show that A is diagonalizable if and only if r(λ_i I − A) = n − n_i for i = 1, ..., k.

7.4.16 Show that the matrix

\[ A = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ a^4 & 0 & 0 & 0 \end{pmatrix}, \tag{7.4.22} \]

where a > 0, is diagonalizable in C(4, 4) but not in R(4, 4).

7.4.17 Show that there is no matrix in R(3, 3) whose minimal polynomial is m(λ) = λ^2 + 3λ + 4.

7.4.18 Let A ∈ R(n, n) and satisfy A^2 + I_n = 0.

(i) Show that the minimal polynomial of A is simply λ^2 + 1.

(ii) Show that n must be an even integer, n = 2m.

(iii) Show that there are n linearly independent vectors, u_1, v_1, ..., u_m, v_m, in R^n such that

\[ Au_i = -v_i, \quad Av_i = u_i, \quad i = 1, \dots, m. \tag{7.4.23} \]

(iv) Use the vectors u_1, v_1, ..., u_m, v_m to construct an invertible matrix B ∈ R(n, n) such that

\[ A = B \begin{pmatrix} 0 & I_m \\ -I_m & 0 \end{pmatrix} B^{-1}. \tag{7.4.24} \]

8

Selected topics

In this chapter we present a few selected subjects that are important in applications as well, but are not usually included in a standard linear algebra course. These subjects may serve as supplemental or extracurricular materials. The first subject is the Schur decomposition theorem, the second is about the classification of skewsymmetric bilinear forms, the third is the Perron–Frobenius theorem for positive matrices, and the fourth concerns the Markov or stochastic matrices.

8.1 Schur decomposition

In this section we establish the Schur decomposition theorem, which serves as a useful complement to the Jordan decomposition theorem and renders further insight and fresh angles into various subjects, such as the spectral structures of normal mappings and self-adjoint mappings, already covered.

Theorem 8.1 Let U be a finite-dimensional complex vector space with a positive definite scalar product and T ∈ L(U). There is an orthonormal basis ℬ = {u_1, ..., u_n} of U such that the matrix representation of T with respect to ℬ is upper triangular. That is,

\[ T(u_j) = \sum_{i=1}^{j} b_{ij} u_i, \quad j = 1, \dots, n, \tag{8.1.1} \]

for some b_{ij} ∈ C, i = 1, ..., j, j = 1, ..., n. In particular, the diagonal entries b_{11}, ..., b_{nn} of the upper triangular matrix B = (b_{ij}) ∈ C(n, n) are the eigenvalues of T, which are not necessarily distinct.

Proof We prove the theorem by induction on dim(U).

When dim(U) = 1, there is nothing to show.

Assume that the theorem holds when dim(U) = n − 1 ≥ 1.

We proceed to establish the theorem when dim(U) = n ≥ 2.

Let w be an eigenvector of the adjoint mapping T′ associated with the eigenvalue λ and consider

\[ V = (\mathrm{Span}\{w\})^{\perp}. \tag{8.1.2} \]

We assert that V is invariant under T. In fact, for any v ∈ V, we have

\[ (w, T(v)) = (T'(w), v) = (\lambda w, v) = \overline{\lambda}(w, v) = 0, \tag{8.1.3} \]

which verifies T(v) ∈ V. Thus T ∈ L(V).

Applying the inductive assumption on T ∈ L(V) since dim(V) = n − 1, we see that there is an orthonormal basis {u_1, ..., u_{n−1}} of V and scalars b_{ij} ∈ C, i = 1, ..., j, j = 1, ..., n − 1 such that

\[ T(u_j) = \sum_{i=1}^{j} b_{ij} u_i, \quad j = 1, \dots, n - 1. \tag{8.1.4} \]

Finally, setting u_n = (1/‖w‖)w, we conclude that {u_1, ..., u_{n−1}, u_n} is an orthonormal basis of U with the stated properties.

Using the Schur decomposition theorem, Theorem 8.1, the Cayley–Hamilton theorem (over C) can be readily proved.

In fact, since T(u_1) = b_{11}u_1, we have (T − b_{11}I)(u_1) = 0. For j − 1 ≥ 1, we assume (T − b_{11}I) · · · (T − b_{j−1,j−1}I)(u_k) = 0 for k = 1, ..., j − 1. Using the matrix representation of T with respect to {u_1, ..., u_n}, we have

\[ (T - b_{jj} I)(u_j) = \sum_{i=1}^{j-1} b_{ij} u_i. \tag{8.1.5} \]

Hence we arrive at the general conclusion

\[ (T - b_{11} I) \cdots (T - b_{jj} I)(u_j) = (T - b_{11} I) \cdots (T - b_{j-1,j-1} I) \left( \sum_{i=1}^{j-1} b_{ij} u_i \right) = 0, \tag{8.1.6} \]

for j = 2, ..., n. Hence (T − b_{11}I) · · · (T − b_{jj}I)(u_k) = 0 for k = 1, ..., j, j = 1, ..., n. In particular, since the characteristic polynomial of T takes the form

\[ p_T(\lambda) = \det(\lambda I - B) = (\lambda - b_{11}) \cdots (\lambda - b_{nn}), \tag{8.1.7} \]

we have

\[ p_T(T)(u_i) = (T - b_{11} I) \cdots (T - b_{nn} I)(u_i) = 0, \quad i = 1, \dots, n. \tag{8.1.8} \]

Consequently, p_T(T) = 0, as anticipated.

It is clear that, in Theorem 8.1, if the upper triangular matrix B ∈ C(n, n) is diagonal, then the orthonormal basis ℬ is made of the eigenvectors of T. In this situation T is normal. Likewise, if B is diagonal and real, then T is self-adjoint.
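The inductive step behind (8.1.5)–(8.1.8) can be watched at work on a concrete upper triangular matrix: after multiplying the first j factors B − b_{11}I, ..., B − b_{jj}I, the first j columns of the product vanish, and after all n factors the whole product is zero. The following is a small sketch with an arbitrarily chosen integer matrix; the names are illustrative only.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def partial_products(B):
    """Return [M_1, ..., M_n] with M_j = (B - b_11 I) ... (B - b_jj I)."""
    n = len(B)
    product = [[int(i == j) for j in range(n)] for i in range(n)]
    products = []
    for j in range(n):
        shifted = [[B[r][c] - (B[j][j] if r == c else 0) for c in range(n)]
                   for r in range(n)]
        product = matmul(product, shifted)
        products.append(product)
    return products

# An upper triangular matrix; its diagonal entries are its eigenvalues.
B = [[1, 2, 3],
     [0, 4, 5],
     [0, 0, 6]]

products = partial_products(B)

# Column k of M_j is M_j applied to the k-th standard basis vector, so
# the claim "M_j(u_k) = 0 for k <= j" says the first j columns vanish.
leading_columns_vanish = all(
    products[j][row][k] == 0
    for j in range(len(B)) for k in range(j + 1) for row in range(len(B))
)
```

The last entry of `products` is p_B(B), which is the zero matrix here, as the Cayley–Hamilton theorem asserts.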

We remark that Theorem 8.1 may also be proved without resorting to the adjoint mapping.

In fact, use the notation of Theorem 8.1 and proceed to the nontrivial situation dim(U) = n ≥ 2 directly. Let λ_1 ∈ C be an eigenvalue of T and u_1 an associated unit eigenvector. Then we have the orthogonal decomposition

\[ U = \mathrm{Span}\{u_1\} \oplus V, \quad V = (\mathrm{Span}\{u_1\})^{\perp}. \tag{8.1.9} \]

Let P ∈ L(U) be the projection of U onto V along Span{u_1} and set S = P ◦ T. Then S may be viewed as an element in L(V).

Since dim(V) = n − 1, we may apply the inductive assumption to obtain an orthonormal basis, say {u_2, ..., u_n}, of V, such that

\[ S(u_j) = \sum_{i=2}^{j} b_{ij} u_i, \quad j = 2, \dots, n, \tag{8.1.10} \]

for some b_{ij}'s in C. Of course the vectors u_1, u_2, ..., u_n now form an orthonormal basis of U. Moreover, in view of (8.1.10) and R(I − P) = Span{u_1}, we have

\[ \begin{aligned} T(u_1) &= \lambda_1 u_1 \equiv b_{11} u_1, \\ T(u_j) &= ((I - P) \circ T)(u_j) + (P \circ T)(u_j) \\ &= b_{1j} u_1 + \sum_{i=2}^{j} b_{ij} u_i, \quad b_{1j} \in \mathbb{C}, \quad j = 2, \dots, n, \end{aligned} \tag{8.1.11} \]

where the matrix B = (b_{ij}) ∈ C(n, n) is clearly upper triangular as described. Thus Theorem 8.1 is again established.

The matrix version of Theorem 8.1 may be stated as follows.

Theorem 8.2 For any matrix A ∈ C(n, n) there is a unitary matrix P ∈ C(n, n) and an upper triangular matrix B ∈ C(n, n) such that

\[ A = P^{\dagger} B P. \tag{8.1.12} \]

That is, the matrix A is Hermitian congruent or similar through a unitary matrix to an upper triangular matrix B whose diagonal entries are all the eigenvalues of A.

The proof of Theorem 8.2 may be obtained by applying Theorem 8.1, where we take U = C^n with the standard Hermitian scalar product and define T ∈ L(C^n) by setting T(u) = Au for u ∈ C^n. In fact, with ℬ = {u_1, ..., u_n} being the orthonormal basis of C^n stated in Theorem 8.1, the unitary matrix P in (8.1.12) is such that the ith column vector of P† is simply u_i, i = 1, ..., n.

From (8.1.12) we see immediately that A is normal if and only if B is diagonal and that A is Hermitian if and only if B is diagonal and real.
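As a small numerical illustration of (8.1.12), consider a real 2 × 2 matrix with real eigenvalues, for which the unitary matrix can be taken real orthogonal. The sketch below is not a general algorithm: the matrix `A` and its eigenvector (2, 3)^t for the eigenvalue 4 are chosen by hand, and the variable names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*X)]

A = [[1.0, 2.0],
     [3.0, 2.0]]

# A(2, 3)^t = (8, 12)^t = 4 (2, 3)^t, so (2, 3)^t is an eigenvector.
scale = math.sqrt(13.0)
u1 = [2.0 / scale, 3.0 / scale]    # unit eigenvector for the eigenvalue 4
u2 = [-3.0 / scale, 2.0 / scale]   # unit vector orthogonal to u1

# Q has columns u1, u2; in the notation of (8.1.12), Q plays the role
# of P^dagger, so that B = Q^t A Q and A = Q B Q^t.
Q = [[u1[0], u2[0]],
     [u1[1], u2[1]]]
B = matmul(matmul(transpose(Q), A), Q)
A_rebuilt = matmul(matmul(Q, B), transpose(Q))
```

Up to rounding, `B` is upper triangular with the eigenvalues 4 and −1 of `A` on its diagonal, and `A_rebuilt` reproduces `A`.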

Exercises

8.1.1 Show that the matrix B may be taken to be lower triangular in Theorems 8.1 and 8.2.

8.1.2 Let A = (a_{ij}), B = (b_{ij}) ∈ C(n, n) be stated in the relation (8.1.12). Use the fact Tr(A†A) = Tr(B†B) to infer the identity

\[ \sum_{i,j=1}^{n} |a_{ij}|^2 = \sum_{1 \le i \le j \le n} |b_{ij}|^2. \tag{8.1.13} \]

8.1.3 Denote by λ_1, ..., λ_n all the eigenvalues of a matrix A ∈ C(n, n). Show that A is normal if and only if it satisfies the equation

\[ \mathrm{Tr}(A^{\dagger} A) = |\lambda_1|^2 + \cdots + |\lambda_n|^2. \tag{8.1.14} \]

8.1.4 For A ∈ R(n, n) assume that the roots of the characteristic polynomial of A are all real. Establish the real version of the Schur decomposition theorem, which asserts that there is an orthogonal matrix P ∈ R(n, n) and an upper triangular matrix B ∈ R(n, n) such that A = P^t BP and that the diagonal entries of B are all the eigenvalues of A. Can you prove a linear mapping version of the theorem when U is a real vector space with a positive definite scalar product?

8.1.5 Show that if A ∈ R(n, n) is normal and all the roots of its characteristic polynomial are real then A must be symmetric.

8.1.6 Let U be a finite-dimensional complex vector space with a positive definite scalar product and S, T ∈ L(U). If S and T are commutative, show that U has an orthonormal basis, say ℬ, such that, with respect to ℬ, the matrix representations of S and T are both upper triangular.

8.1.7 Let U be an n-dimensional (n ≥ 2) complex vector space with a positive definite scalar product and T ∈ L(U). If λ ∈ C is an eigenvalue of T and u ∈ U an associated eigenvector, then the quotient space V = U/Span{u} is of dimension n − 1.

(i) Define a positive definite scalar product over V and show that T induces an element in L(V).

(ii) Formulate an inductive proof of Theorem 8.1 using the construction in (i).

8.1.8 Given A ∈ C(n, n) (n ≥ 2), follow the steps below to carry out an inductive but also constructive proof of Theorem 8.2.

(i) Find an eigenvalue, say λ_1, of A, and a unit eigenvector u_1 ∈ C^n, taken as a column vector. Let u_2, ..., u_n ∈ C^n be chosen so that u_1, u_2, ..., u_n form an orthonormal basis of C^n. Use u_1, u_2, ..., u_n as the first, second, ..., and the nth column vectors of a matrix called Q_1. Then Q_1 is unitary. Check that

\[ Q_1^{\dagger} A Q_1 = \begin{pmatrix} \lambda_1 & \alpha \\ 0 & A_{n-1} \end{pmatrix}, \tag{8.1.15} \]

where A_{n−1} ∈ C(n − 1, n − 1) and α ∈ C^{n−1} is a row vector.

(ii) Apply the inductive assumption to get a unitary element Q ∈ C(n − 1, n − 1) so that Q†A_{n−1}Q is upper triangular. Show that

\[ Q_2 = \begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix} \tag{8.1.16} \]

is a unitary element in C(n, n) such that

\[ (Q_1 Q_2)^{\dagger} A (Q_1 Q_2) = Q_2^{\dagger} Q_1^{\dagger} A Q_1 Q_2 \tag{8.1.17} \]

becomes upper triangular as desired.

8.1.9 Let T ∈ L(U) where U is a finite-dimensional complex vector space with a positive definite scalar product. Prove that if λ is an eigenvalue of T then its complex conjugate \overline{λ} is an eigenvalue of T′.

8.2 Classification of skewsymmetric bilinear forms

Let U be a finite-dimensional vector space over a field F. A bilinear form f : U × U → F is called skewsymmetric or anti-symmetric if it satisfies

\[ f(u, v) = -f(v, u), \quad u, v \in U. \tag{8.2.1} \]

Let ℬ = {u_1, ..., u_n} be a basis of U. For u, v ∈ U with coordinate vectors x = (x_1, ..., x_n)^t, y = (y_1, ..., y_n)^t ∈ F^n with respect to ℬ, we can rewrite f(u, v) as

\[ f(u, v) = f\left( \sum_{i=1}^{n} x_i u_i, \sum_{j=1}^{n} y_j u_j \right) = \sum_{i,j=1}^{n} x_i f(u_i, u_j) y_j = x^t A y, \tag{8.2.2} \]

where A = (a_{ij}) = (f(u_i, u_j)) ∈ F(n, n) is the matrix representation of f with respect to the basis ℬ. Thus, combining (8.2.1) and (8.2.2), we see that A must be skewsymmetric or anti-symmetric, A = −A^t, because

\[ -x^t A y = -f(u, v) = f(v, u) = y^t A x = (y^t A x)^t = x^t A^t y, \tag{8.2.3} \]

and x, y ∈ F^n are arbitrary.

Let ℬ̃ = {ũ_1, ..., ũ_n} be another basis of U and Ã = (ã_{ij}) = (f(ũ_i, ũ_j)) ∈ F(n, n) the matrix representation of f with respect to ℬ̃. If the transition matrix between ℬ and ℬ̃ is B = (b_{ij}) ∈ F(n, n) so that

\[ \tilde{u}_j = \sum_{i=1}^{n} b_{ij} u_i, \quad j = 1, \dots, n, \tag{8.2.4} \]

then we know that A and Ã are congruent through B, Ã = B^t AB, as discussed in Chapter 5.
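The coordinate formula (8.2.2) and the congruence relation under a change of basis can be checked directly. In the sketch below the skewsymmetric matrix `A`, the transition matrix `transition` (playing the role of B in (8.2.4)) and the coordinate vectors are arbitrary illustrative choices.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*X)]

def form(M, x, y):
    """Evaluate x^t M y for coordinate vectors x and y."""
    n = len(M)
    return sum(x[i] * M[i][j] * y[j] for i in range(n) for j in range(n))

def mat_vec(M, x):
    """Matrix-vector product M x."""
    return [sum(m * c for m, c in zip(row, x)) for row in M]

# Matrix of a skewsymmetric form with respect to the first basis.
A = [[0, 2, -1],
     [-2, 0, 3],
     [1, -3, 0]]

# Column j of the transition matrix holds the old coordinates of the
# j-th vector of the second basis, as in (8.2.4).
transition = [[1, 1, 0],
              [0, 1, 2],
              [0, 0, 1]]

# Matrix of the same form with respect to the second basis.
A_new = matmul(matmul(transpose(transition), A), transition)
still_skew = A_new == [[-entry for entry in row] for row in transpose(A_new)]

# Coordinates in the second basis and the corresponding old coordinates.
x_new, y_new = [1, 2, 3], [-1, 0, 2]
x_old, y_old = mat_vec(transition, x_new), mat_vec(transition, y_new)
same_value = form(A, x_old, y_old) == form(A_new, x_new, y_new)
```

Both `still_skew` and `same_value` hold: congruence preserves skewsymmetry, and the value f(u, v) does not depend on the basis used to compute it.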

In this section, we study the canonical forms of skewsymmetric forms through a classification of skewsymmetric matrices by congruent relations.

Theorem 8.3 Let A ∈ F(n, n) be skewsymmetric. Then there is some nonsingular matrix C ∈ F(n, n) satisfying det(C) = ±1 so that A is congruent through C to a matrix Λ ∈ F(n, n) of the following canonical form

\[ \Lambda \equiv \begin{pmatrix} \alpha_1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & \alpha_k & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix} = C A C^t, \tag{8.2.5} \]

where

\[ \alpha_i = \begin{pmatrix} 0 & a_i \\ -a_i & 0 \end{pmatrix}, \quad i = 1, \dots, k, \tag{8.2.6} \]

are some 2 × 2 skewsymmetric matrices given in terms of k (if any) nonzero scalars a_1, ..., a_k ∈ F.

Proof In the trivial case A = 0, there is nothing to show.

We now make induction on n.

If n = 1, then A = 0 and there is nothing to show. If n = 2 but A ≠ 0, then

\[ A = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}, \quad a \neq 0. \tag{8.2.7} \]

Hence there is nothing to show either.

Assume the statement of the theorem is valid for any n ≤ l for some l ≥ 2.

Let n = l + 1. Assume the nontrivial case A = (a_{ij}) ≠ 0. So there is some a_{ij} ≠ 0 for i, j = 1, ..., n. Let i be the smallest among {1, ..., n} such that a_{ij} ≠ 0 for some j = 1, ..., n. If i = 1 and j = 2, then a_{12} ≠ 0. If i = 1 and j > 2, we let E_1 be the elementary matrix obtained from interchanging the second and jth rows of I_n. Then det(E_1) = −1 and the entry at the first row and second column of E_1 A E_1^t is nonzero. If i > 1, then the first row of A is a zero row. Let E_1 be the elementary matrix obtained from interchanging the first and ith rows of I_n. Then det(E_1) = −1 and the first row of E_1 A E_1^t = (b_{ij}) is nonzero. So there is some 2 ≤ j ≤ n such that b_{1j} ≠ 0. Let E_2 be the elementary matrix obtained from interchanging the second and jth rows of I_n. Then, in E_2 E_1 A E_1^t E_2^t = (c_{ij}), we have c_{12} ≡ a_1 ≠ 0. We also have det(E_2) = ±1 depending on whether j = 2 or j ≠ 2.

Thus we may summarize that there is a matrix E with det(E) = ±1 such that

\[ E A E^t = \begin{pmatrix} \alpha_1 & \beta \\ -\beta^t & A_{n-2} \end{pmatrix}, \quad \alpha_1 = \begin{pmatrix} 0 & a_1 \\ -a_1 & 0 \end{pmatrix}, \tag{8.2.8} \]

where β ∈ F(2, n − 2) and A_{n−2} ∈ F(n − 2, n − 2) is skewsymmetric.

Consider P ∈ F(n, n) of the form

\[ P = \begin{pmatrix} I_2 & 0 \\ \gamma^t & I_{n-2} \end{pmatrix}, \tag{8.2.9} \]

where γ ∈ F(2, n − 2) is to be determined. Then det(P) = 1 and

\[ P^t = \begin{pmatrix} I_2 & \gamma \\ 0 & I_{n-2} \end{pmatrix}. \tag{8.2.10} \]

Consequently, we have

\[ \begin{aligned} P E A E^t P^t &= \begin{pmatrix} I_2 & 0 \\ \gamma^t & I_{n-2} \end{pmatrix} \begin{pmatrix} \alpha_1 & \beta \\ -\beta^t & A_{n-2} \end{pmatrix} \begin{pmatrix} I_2 & \gamma \\ 0 & I_{n-2} \end{pmatrix} \\ &= \begin{pmatrix} \alpha_1 & \alpha_1 \gamma + \beta \\ \gamma^t \alpha_1 - \beta^t & \gamma^t \alpha_1 \gamma - \beta^t \gamma + \gamma^t \beta + A_{n-2} \end{pmatrix}. \end{aligned} \tag{8.2.11} \]

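To proceed, we choose γ to satisfy

\[ \alpha_1 \gamma + \beta = 0. \tag{8.2.12} \]

This is possible to do since α_1 ∈ F(2, 2) is invertible. In other words, we are led to the unique choice

\[ \gamma = -\alpha_1^{-1} \beta, \quad \alpha_1^{-1} = \begin{pmatrix} 0 & -a_1^{-1} \\ a_1^{-1} & 0 \end{pmatrix}. \tag{8.2.13} \]

Thus, in view of (8.2.12) and its transposed version γ^t α_1 − β^t = 0, we see that (8.2.11) becomes

\[ P E A E^t P^t = \begin{pmatrix} \alpha_1 & 0 \\ 0 & G \end{pmatrix}, \tag{8.2.14} \]

where

\[ G = \gamma^t \alpha_1 \gamma - \beta^t \gamma + \gamma^t \beta + A_{n-2} \tag{8.2.15} \]

is a skewsymmetric element in F(n − 2, n − 2).

Using the inductive assumption, we can find an element D ∈ F(n − 2, n − 2) satisfying det(D) = ±1 such that DGD^t is of the desired canonical form stated in the theorem.

Now let

\[ Q = \begin{pmatrix} I_2 & 0 \\ 0 & D \end{pmatrix}. \tag{8.2.16} \]

Then det(Q) = ±1 and

\[ Q P E A E^t P^t Q^t = \begin{pmatrix} \alpha_1 & 0 \\ 0 & D G D^t \end{pmatrix} = \Lambda, \tag{8.2.17} \]

where the matrix Λ ∈ F(n, n) is as given in (8.2.5).

Taking C = QPE, we see that the theorem is established.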

Furthermore, applying a sequence of elementary matrices to the left and right of the canonical matrix Λ given in (8.2.5) realizing suitable row and column interchanges, we see that we can use a nonsingular matrix of determinant ±1 to congruently reduce Λ into another canonical form,

\[ \widetilde{\Lambda} = \begin{pmatrix} 0 & D_k & 0 \\ -D_k & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \in F(n, n), \tag{8.2.18} \]

where D_k ∈ F(k, k) is the diagonal matrix

\[ D_k = \mathrm{diag}\{a_1, \dots, a_k\}. \tag{8.2.19} \]

As a by-product of the above discussion, we infer that the rank of a skewsymmetric matrix is 2k, an even number.
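The evenness of the rank is easy to observe on examples. The sketch below computes ranks over the rationals by Gaussian elimination; the sample matrices and the helper `skew_from` are illustrative assumptions rather than part of the text.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix over the rationals by Gaussian elimination."""
    rows = [[Fraction(entry) for entry in row] for row in M]
    pivots = 0
    for col in range(len(rows[0])):
        # Look for a usable pivot at or below the current pivot row.
        pivot = next((i for i in range(pivots, len(rows))
                      if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        for i in range(pivots + 1, len(rows)):
            factor = rows[i][col] / rows[pivots][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivots])]
        pivots += 1
    return pivots

def skew_from(n, entry):
    """Build the n x n skewsymmetric matrix with a_ij = entry(i, j), i < j."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = entry(i, j)
            A[j][i] = -entry(i, j)
    return A

A3 = [[0, 2, -1], [-2, 0, 3], [1, -3, 0]]
A4 = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]
A5 = skew_from(5, lambda i, j: (i * j + i + 1) % 4 - 1)

ranks = [rank(A3), rank(A4), rank(A5)]
all_even = all(value % 2 == 0 for value in ranks)
```

For instance, the 3 × 3 matrix `A3` is nonzero but has rank 2, so no skewsymmetric matrix of odd size over the rationals can be nonsingular.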

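We now consider the situation when F = R. By row and column operations if necessary, we may assume without loss of generality that a_1, ..., a_k > 0 in (8.2.6). With this assumption, define

\[ R = \begin{pmatrix} \beta_1 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & \beta_k & 0 \\ 0 & \cdots & 0 & I_{n-2k} \end{pmatrix}, \quad \beta_i = \begin{pmatrix} \frac{1}{\sqrt{a_i}} & 0 \\ 0 & \frac{1}{\sqrt{a_i}} \end{pmatrix}, \quad i = 1, \dots, k. \tag{8.2.20} \]

Then

\[ R \Lambda R^t = \Omega \equiv \begin{pmatrix} J_2 & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & J_2 & 0 \\ 0 & \cdots & \cdots & 0 \end{pmatrix}, \tag{8.2.21} \]

where

\[ J_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \tag{8.2.22} \]

appears k times in (8.2.21).

Of course, as before, we may also further congruently reduce Ω in (8.2.21) into another canonical form,

\[ \widetilde{\Omega} = \begin{pmatrix} J_{2k} & 0 \\ 0 & 0 \end{pmatrix}, \tag{8.2.23} \]

where J_{2k} ∈ R(2k, 2k) is given by

\[ J_{2k} = \begin{pmatrix} 0 & I_k \\ -I_k & 0 \end{pmatrix}. \tag{8.2.24} \]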

As in the case of scalar products, for a skewsymmetric bilinear form f : U × U → F, define

\[ S^{\perp} = \{ u \in U \mid f(u, v) = 0, \ v \in S \} \tag{8.2.25} \]

for a non-empty subset S of U. Then S^⊥ is a subspace of U.

Set

\[ U_0 = U^{\perp} = \{ u \in U \mid f(u, v) = 0, \ v \in U \}. \tag{8.2.26} \]

We call f non-degenerate if U_0 = {0}. In other words, a non-degenerate skewsymmetric bilinear form f is characterized by the fact that when u ∈ U and f(u, v) = 0 for all v ∈ U then u = 0. Thus f is called degenerate if U_0 ≠ {0} or dim(U_0) ≥ 1. It is not hard to show that f is degenerate if and only if the rank of the matrix representation of f with respect to any basis of U is smaller than dim(U).

If f is degenerate over U, then the canonical form (8.2.5) indicates that we may obtain a basis

\[ \{ u_1, \dots, u_{n_0}, v_1, \dots, v_k, w_1, \dots, w_k \} \tag{8.2.27} \]

of U so that {u_1, ..., u_{n_0}} is a basis of U_0 with n_0 = dim(U_0) and that

\[ \begin{cases} f(v_i, w_i) = a_i \neq 0, & i = 1, \dots, k, \\ f(v_i, v_j) = f(w_i, w_j) = 0, & i, j = 1, \dots, k, \\ f(v_i, w_j) = 0, & i \neq j, \ i, j = 1, \dots, k. \end{cases} \tag{8.2.28} \]

Of particular interest is when f is non-degenerate over U. In such a situation, dim(U) must be an even number, 2k.

Definition 8.4 Let U be a vector space of 2k dimensions. A skewsymmetric bilinear form f : U × U → F is called symplectic if f is non-degenerate. An even dimensional vector space U equipped with a symplectic form is called a symplectic vector space. A basis {v_1, ..., v_k, w_1, ..., w_k} of a 2k-dimensional symplectic vector space U equipped with the symplectic form f is called symplectic if

\[ \begin{cases} f(v_i, w_i) = 1, & i = 1, \dots, k, \\ f(v_i, v_j) = f(w_i, w_j) = 0, & i, j = 1, \dots, k, \\ f(v_i, w_j) = 0, & i \neq j, \ i, j = 1, \dots, k. \end{cases} \tag{8.2.29} \]

A symplectic basis is also called a Darboux basis.

Therefore, we have seen that a real symplectic vector space always has a symplectic basis.
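The conditions (8.2.29) can be verified mechanically. The following sketch takes the form f(x, y) = x^t J_{2k} y on R^{2k}, with J_{2k} as in (8.2.24), and checks that the standard unit vectors, split into two halves, form a symplectic basis; the function names are hypothetical.

```python
def standard_symplectic_matrix(k):
    """Return J_2k = [[0, I_k], [-I_k, 0]] as a list of rows."""
    n = 2 * k
    J = [[0] * n for _ in range(n)]
    for i in range(k):
        J[i][k + i] = 1
        J[k + i][i] = -1
    return J

def form(J, x, y):
    """Evaluate the bilinear form f(x, y) = x^t J y."""
    n = len(J)
    return sum(x[i] * J[i][j] * y[j] for i in range(n) for j in range(n))

def unit_vector(n, index):
    """Standard unit vector of length n with a 1 at the given index."""
    return [int(i == index) for i in range(n)]

def is_symplectic_basis(J, vs, ws):
    """Check the three families of conditions in (8.2.29)."""
    k = len(vs)
    for i in range(k):
        for j in range(k):
            if form(J, vs[i], ws[j]) != (1 if i == j else 0):
                return False
            if form(J, vs[i], vs[j]) != 0 or form(J, ws[i], ws[j]) != 0:
                return False
    return True

k = 2
J = standard_symplectic_matrix(k)
vs = [unit_vector(2 * k, i) for i in range(k)]
ws = [unit_vector(2 * k, k + i) for i in range(k)]

standard_ok = is_symplectic_basis(J, vs, ws)
# Pairing the w-vectors in the wrong order violates f(v_i, w_i) = 1.
swapped_ok = is_symplectic_basis(J, vs, list(reversed(ws)))
```

Here `standard_ok` is `True` and `swapped_ok` is `False`, which shows that the pairing between the v's and the w's matters, not just the two spans.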

Definition 8.5 Let U be a symplectic vector space equipped with a symplectic form f. A subspace V of U is called isotropic if f(u, v) = 0 for any u, v ∈ V. If dim(U) = 2k then any k-dimensional isotropic subspace of U is called a Lagrangian subspace.

Let U be a symplectic vector space with a symplectic basis given as in (8.2.29). Then we see that both

\[ V = \mathrm{Span}\{v_1, \dots, v_k\}, \quad W = \mathrm{Span}\{w_1, \dots, w_k\} \tag{8.2.30} \]

are Lagrangian subspaces of U.

If U is a finite-dimensional complex vector space, we may consider a skewsymmetric sesquilinear form f from U × U into C. Such a form satisfies

\[ f(u, v) = -\overline{f(v, u)}, \quad u, v \in U, \tag{8.2.31} \]

and is called Hermitian skewsymmetric or skew-Hermitian. Therefore, the form

\[ g(u, v) = \mathrm{i} f(u, v), \quad u, v \in U, \tag{8.2.32} \]

is Hermitian. It is clear that the matrix representation, say A ∈ C(n, n), with respect to any basis of U of a Hermitian skewsymmetric form is anti-Hermitian or skew-Hermitian, A† = −A. Hence iA is Hermitian. Applying the knowledge about Hermitian forms and Hermitian matrices studied in Chapter 6, it is not hard to come up with a complete understanding of skew-Hermitian forms and matrices, in the same spirit as Theorem 8.3. We leave this as an exercise.

Exercises

8.2.1 Let f be a bilinear form over a vector space U. Show that f is skewsymmetric if and only if f(u, u) = 0 for any u ∈ U.

8.2.2 Let f be a skewsymmetric bilinear form over U and S ⊂ U a non-empty subset. Show that S^⊥ defined in (8.2.25) is a subspace of U.

8.2.3 Let U be a finite-dimensional vector space and f a skewsymmetric bilinear form over U. Define U_0 by (8.2.26) and use A to denote the matrix representation of f with respect to any given basis of U. Show that

\[ \dim(U_0) = \dim(U) - r(A). \tag{8.2.33} \]

8.2.4 Let U be a symplectic vector space equipped with a symplectic form f. If V is a subspace of U, V^⊥ is called the symplectic complement of V in U. Prove the following.

(i) dim(V) + dim(V^⊥) = dim(U).

(ii) (V^⊥)^⊥ = V.

8.2.5 Let U be a symplectic vector space equipped with a symplectic form f. If V is a subspace of U, we can consider the restriction of f over V. Prove that f is symplectic over V if and only if V ∩ V^⊥ = {0}.

8.2.6 Show that a subspace V of a symplectic vector space U is isotropic if and only if V ⊂ V^⊥.

8.2.7 Show that a subspace V of a symplectic vector space U is Lagrangian if and only if V = V^⊥.

8.2.8 For A ∈ C(n, n), show that A is skew-Hermitian if and only if there is a nonsingular element C ∈ C(n, n) such that CAC† = iD where D is an n × n real diagonal matrix.

8.3 Perron–Frobenius theorem for positive matrices

Let A = (a_{ij}) ∈ R(n, n). We say that A is positive (non-negative) if a_{ij} > 0 (a_{ij} ≥ 0) for all i, j = 1, ..., n. Likewise we say that a vector u = (a_i) ∈ R^n is positive (non-negative) if a_i > 0 (a_i ≥ 0) for all i = 1, ..., n. More generally, for A, B ∈ R(n, n), we write A > B (A ≥ B) if (A − B) > 0 ((A − B) ≥ 0); for u, v ∈ R^n, we write u > v (u ≥ v) if (u − v) > 0 ((u − v) ≥ 0).

The Perron–Frobenius theorem concerns the existence and properties of a positive eigenvector, associated to a positive eigenvalue, of a positive matrix and may be stated as follows.

Theorem 8.6 Let A = (a_{ij}) ∈ R(n, n) be a positive matrix. Then there is a positive eigenvalue, r, of A, called the dominant eigenvalue, satisfying the following properties.

(1) There is a positive eigenvector, say u, associated with r, such that any other non-negative eigenvectors of A associated with r must be positive multiples of u.

(2) r is a simple root of the characteristic polynomial of A.

(3) If λ ∈ C is any other eigenvalue of A, then

\[ |\lambda| < r. \tag{8.3.1} \]

Furthermore, any non-negative eigenvector of A must be associated with the dominant eigenvalue r.
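For a 2 × 2 positive matrix the assertions of the theorem can be inspected through the quadratic formula. The following sketch uses an arbitrarily chosen symmetric positive matrix, so its eigenvalues are real; the variable names are illustrative.

```python
import math

# A positive 2 x 2 matrix; it is symmetric, so both eigenvalues are real.
A = [[2.0, 1.0],
     [1.0, 3.0]]

trace = A[0][0] + A[1][1]
determinant = A[0][0] * A[1][1] - A[0][1] * A[1][0]
discriminant = math.sqrt(trace * trace - 4.0 * determinant)

r = (trace + discriminant) / 2.0       # candidate dominant eigenvalue
other = (trace - discriminant) / 2.0   # the remaining eigenvalue

def eigenvector(eigenvalue):
    """Solve the first row of (A - eigenvalue I) x = 0 with x_1 = 1."""
    return [1.0, (eigenvalue - A[0][0]) / A[0][1]]

u = eigenvector(r)       # both components positive
w = eigenvector(other)   # components of opposite signs

residual = max(abs(sum(A[i][j] * u[j] for j in range(2)) - r * u[i])
               for i in range(2))
```

In this example r = (5 + √5)/2 strictly exceeds the modulus of the other eigenvalue, `u` is a positive eigenvector, and the eigenvector `w` of the other eigenvalue has components of mixed signs, so it is not non-negative, as the last statement of the theorem requires.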

Proof We equip R^n with the norm

\[ \|x\| = \max\{ |x_i| \mid i = 1, \dots, n \}, \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n, \tag{8.3.2} \]

and consider the subset

\[ S = \{ x \in \mathbb{R}^n \mid \|x\| = 1, \ x \ge 0 \} \tag{8.3.3} \]

of R^n. Then define

\[ \Lambda = \{ \lambda \in \mathbb{R} \mid \lambda \ge 0, \ Ax \ge \lambda x \text{ for some } x \in S \}. \tag{8.3.4} \]

We can show that Λ is an interval in R of the form [0, r] for some r > 0.

In fact, take a test vector, say y = (1, ..., 1)^t ∈ R^n. Since A is positive, it is clear that Ay ≥ λy if λ satisfies

\[ 0 < \lambda \le \min\left\{ \sum_{j=1}^{n} a_{ij} \ \middle|\ i = 1, \dots, n \right\}. \tag{8.3.5} \]

Moreover, the definition of Λ implies immediately that if λ ∈ Λ then [0, λ] ⊂ Λ. Thus Λ is connected.

Let λ ∈ Λ. Then there is some x ∈ S such that Ax ≥ λx. Therefore, since ‖x‖ = 1, we have

\[ \lambda = \|\lambda x\| \le \|Ax\| = \max\left\{ \left| \sum_{j=1}^{n} a_{ij} x_j \right| \ \middle|\ i = 1, \dots, n \right\} \le \max\left\{ \sum_{j=1}^{n} a_{ij} \ \middle|\ i = 1, \dots, n \right\}, \tag{8.3.6} \]

which establishes the boundedness of Λ.

Now set

\[ r = \sup\{ \lambda \in \Lambda \}. \tag{8.3.7} \]

Then we have seen that r satisfies 0 < r < ∞. Hence there is a sequence {λ_k} ⊂ Λ such that

\[ r = \lim_{k \to \infty} \lambda_k. \tag{8.3.8} \]

On the other hand, the assumption that {λ_k} ⊂ Λ indicates that there is a sequence {x^{(k)}} ⊂ S such that

\[ A x^{(k)} \ge \lambda_k x^{(k)}, \quad k = 1, 2, \dots. \tag{8.3.9} \]

Using the compactness of S, we may assume that there is a subsequence of {x^{(k)}}, which we still denote by {x^{(k)}} without loss of generality, such that it converges to some element, say u = (a_1, ..., a_n)^t, in S, as k → ∞. Letting k → ∞ in (8.3.9), we arrive at

\[ Au \ge ru. \tag{8.3.10} \]


In particular, this proves r ∈ Λ. Thus indeed Λ = [0, r].

We next show that equality must hold in (8.3.10).

Suppose otherwise there is some i_0 = 1, ..., n such that

\[ \sum_{j=1}^{n} a_{i_0 j} a_j > r a_{i_0}; \quad \sum_{j=1}^{n} a_{ij} a_j \ge r a_i \ \text{ for } i \in \{1, \dots, n\} \setminus \{i_0\}. \tag{8.3.11} \]

Let z = (z_i) = u + s e_{i_0} (s > 0). Since z_i = a_i for i ≠ i_0, we conclude from the second inequality in (8.3.11) that

\[ \sum_{j=1}^{n} a_{ij} z_j > r z_i \ \text{ for } i \in \{1, \dots, n\} \setminus \{i_0\}, \quad s > 0. \tag{8.3.12} \]

However, the first inequality in (8.3.11) gives us

\[ \sum_{j=1}^{n} a_{i_0 j} a_j > r(a_{i_0} + s) \ \text{ when } s > 0 \text{ is sufficiently small}, \tag{8.3.13} \]

which leads to

\[ \sum_{j=1}^{n} a_{i_0 j} z_j > r z_{i_0} \ \text{ when } s > 0 \text{ is sufficiently small}. \tag{8.3.14} \]

Thus Az > rz. Set v = z/‖z‖. Then v ∈ S and Av > rv. Hence we may choose ε > 0 small such that Av > (r + ε)v. So r + ε ∈ Λ, which contradicts the definition of r made in (8.3.7).

Therefore Au = ru and r is a positive eigenvalue of A.

The positivity of u = (a_i) follows easily since u ∈ S and A > 0 so that ru = Au leads to

\[ r a_i = \sum_{j=1}^{n} a_{ij} a_j > 0, \quad i = 1, \dots, n, \tag{8.3.15} \]

because u ≠ 0. That is, a non-negative eigenvector of A associated to a positive eigenvalue must be positive.

Let v be any non-negative eigenvector of A associated to r. We show that there is a positive number a such that v = au. For this purpose, we construct the vector

\[ u_s = u - s v, \quad s > 0. \tag{8.3.16} \]

Of course u_s is a non-negative eigenvector of A associated to r when s > 0 is small. Set

\[ s_0 = \sup\{ s \mid u_s \ge 0 \}. \tag{8.3.17} \]

Then s_0 > 0 and u_{s_0} ≥ 0 but u_{s_0} is not positive. If u_{s_0} ≠ 0 then u_{s_0} > 0 since u_{s_0} is an eigenvector of A associated to r, which contradicts the definition of s_0. Therefore we must have u_{s_0} = 0, which gives us the result v = (s_0^{−1})u as desired.

In order to show that r is a simple root of the characteristic polynomial of A, we need to prove that there is only one Jordan block associated with the eigenvalue r and that this Jordan block can only be 1 × 1. To do so, we first show that the geometric multiplicity of r is 1. Then we show that there is no generalized eigenvector. That is, the equation

\[ (A - r I_n) v = u, \quad v \in \mathbb{C}^n, \tag{8.3.18} \]

has no solution.

Suppose otherwise that the dimension of the eigenspace E_r is greater than one. Then there is some v ∈ E_r that is not a scalar multiple of u. Write v as v = v_1 + iv_2 with v_1, v_2 ∈ R^n. Then Av_1 = rv_1 and Av_2 = rv_2. We assert that one of the sets of vectors {u, v_1} and {u, v_2} must be linearly independent over R. Otherwise there are a_1, a_2 ∈ R such that v_1 = a_1 u and v_2 = a_2 u which imply v = (a_1 + ia_2)u, a contradiction. Use w to denote either v_1 or v_2 which is linearly independent from u over R. Then we know that ±w can never be non-negative. Thus there are components of w that have different signs. Now consider the vector

\[ u_s = u + s w, \quad s > 0. \tag{8.3.19} \]

It is clear that u_s > 0 when s > 0 is sufficiently small since u > 0. So there is some s_0 > 0 such that u_{s_0} ≥ 0 but a component of u_{s_0} is zero. However, because u_{s_0} ≠ 0 owing to the presence of a positive component in w, we arrive at a contradiction because u_{s_0} is seen to be a non-negative eigenvector of A associated to r and hence must be positive.

We next show that (8.3.18) has no solution. Since u ∈ R^n, we need only to consider v ∈ R^n in (8.3.18). We proceed as follows.

Consider

\[ w_s = v + s u, \quad s > 0. \tag{8.3.20} \]

Since Au = ru, we see that w_s also satisfies (8.3.18). Take s > 0 sufficiently large so that w_s > 0. Hence (A − rI_n)w_s = u > 0 or

\[ A w_s > r w_s. \tag{8.3.21} \]

Thus, if δ > 0 is sufficiently small, we have Aw_s > (r + δ)w_s. Rescaling w_s if necessary, we may assume w_s ∈ S. This indicates r + δ ∈ Λ which violates the definition of r stated in (8.3.7).

So the assertion that r is a simple root of the characteristic polynomial of A follows.


Moreover, let λ ∈ C be an eigenvalue of A which is not r and v = (b_i) ∈ C^n an associated eigenvector. From Av = λv we have

\[ |\lambda| |b_i| \le \sum_{j=1}^{n} a_{ij} |b_j|, \quad i = 1, \dots, n. \tag{8.3.22} \]

Rescaling if necessary, we may also assume

\[ w = \begin{pmatrix} |b_1| \\ \vdots \\ |b_n| \end{pmatrix} \in S. \tag{8.3.23} \]

Since (8.3.22) implies Aw ≥ |λ|w, we see in view of the definition of Λ that |λ| ∈ Λ. Using (8.3.7), we have |λ| ≤ r.

If |λ| < r, there is nothing more to do. If |λ| = r, the discussion just made in the earlier part of the proof shows that w is a non-negative eigenvector of A associated to r and thus equality holds in (8.3.22) for all i = 1, ..., n. Therefore, since A > 0, the complex numbers b_1, ..., b_n must share the same phase angle, θ ∈ R, so that

\[ b_i = |b_i| e^{\mathrm{i}\theta}, \quad i = 1, \dots, n. \tag{8.3.24} \]

On the other hand, since w is a non-negative eigenvector of A associated to r, there is a number a > 0 such that

\[ w = a u. \tag{8.3.25} \]

Combining (8.3.23)–(8.3.25), we have v = ae^{iθ}u. In particular, λ = r, which contradicts the assumption λ ≠ r. Hence |λ| < r.

Finally let v be an arbitrary eigenvector of A which is non-negative and associated to an eigenvalue λ ∈ C. Since A^t > 0 and the characteristic polynomial of A^t is the same as that of A, we conclude that r is also the dominant eigenvalue of A^t. Now use w to denote a positive eigenvector of A^t associated to r. Then A^t w = rw. Thus

\[ \lambda v^t w = (Av)^t w = v^t A^t w = r v^t w. \tag{8.3.26} \]

However, in view of the fact that v ≥ 0, v ≠ 0, w > 0, we have v^t w > 0. So it follows from (8.3.26) that λ = r.

The proof of the theorem is complete.

The dominant eigenvalue r of a positive matrix A, with the distinguished characteristics stated in Theorem 8.6, is also called the Perron or Perron–Frobenius eigenvalue of the matrix A.

For an interesting historical account of the Perron–Frobenius theory and a discussion of its many applications and generalizations and attributions of the proofs including the one presented here, see Bellman [6].
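In practice the dominant eigenvalue is often approximated by power iteration, a standard numerical technique that is not part of the proof above but relies on property (3) of Theorem 8.6 for its convergence. The sketch below normalizes with the max norm (8.3.2), so that the iterates stay in the set S of (8.3.3); the matrix and the function names are illustrative assumptions. For a positive iterate x, the number min_i (Ax)_i / x_i belongs to the set Λ of (8.3.4) and is therefore a lower bound for r, while max_i (Ax)_i / x_i is a known upper bound (the Collatz–Wielandt estimate).

```python
def mat_vec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]

def power_iteration(A, steps=200):
    """Approximate the Perron vector of a positive matrix A.

    Returns the iterate x with max norm 1 together with the bounds
    min_i (Ax)_i / x_i and max_i (Ax)_i / x_i enclosing r.
    """
    x = [1.0] * len(A)
    for _ in range(steps):
        y = mat_vec(A, x)
        norm = max(abs(c) for c in y)
        x = [c / norm for c in y]
    y = mat_vec(A, x)
    ratios = [yi / xi for yi, xi in zip(y, x)]
    return x, min(ratios), max(ratios)

# An arbitrarily chosen positive matrix.
A = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 4.0]]

vector, lower, upper = power_iteration(A)
```

After enough steps `lower` and `upper` agree to rounding error, their common value approximates r, and `vector` approximates the positive eigenvector u normalized so that ‖u‖ = 1.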


Exercises

8.3.1 Let $A = (a_{ij}) \in \mathbb{R}(n, n)$ be a positive matrix and $r > 0$ the Perron–Frobenius eigenvalue of $A$. Show that $r$ satisfies the estimate

$$\min_{1 \le i \le n} \left( \sum_{j=1}^{n} a_{ij} \right) \le r \le \max_{1 \le i \le n} \left( \sum_{j=1}^{n} a_{ij} \right). \tag{8.3.27}$$


8.3.2 Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{pmatrix}. \tag{8.3.28}$$

(i) Use (8.3.27) to find the dominant eigenvalue of $A$.

(ii) Check to see that $u = (1, 1, 1)^t$ is a positive eigenvector of $A$. Use $u$ and Theorem 8.6 to find the dominant eigenvalue of $A$ and confirm that this is exactly what was obtained in part (i).

(iii) Compute all the eigenvalues of $A$ directly and confirm the result obtained in part (i) or (ii).

8.3.3 Let $A, B \in \mathbb{R}(n, n)$ be positive matrices and use $r_A, r_B$ to denote the dominant eigenvalues of $A, B$, respectively. Show that $r_A \le r_B$ if $A \le B$.

8.3.4 Let $A \in \mathbb{R}(n, n)$ be a positive matrix and $u = (a_i) \in \mathbb{R}^n$ a non-negative eigenvector of $A$. Show that the dominant eigenvalue $r$ of $A$ may be computed by the formula

$$r = \frac{1}{\sum_{i=1}^{n} a_i} \left( \sum_{j,k=1}^{n} a_{jk} a_k \right). \tag{8.3.29}$$

8.4 Markov matrices

In this section we discuss a class of nonnegative matrices known as Markov or stochastic matrices, which are important in many areas of application.

Definition 8.7 Let $A = (a_{ij}) \in \mathbb{R}(n, n)$ so that $a_{ij} \ge 0$ for $i, j = 1, \dots, n$. If $A$ satisfies

$$\sum_{j=1}^{n} a_{ij} = 1, \quad i = 1, \dots, n, \tag{8.4.1}$$

that is, the components of each row vector in $A$ sum up to 1, then $A$ is called a Markov or stochastic matrix. If there is an integer $m \ge 1$ such that $A^m$ is a positive matrix, then $A$ is called a regular Markov or regular stochastic matrix.

A few immediate consequences follow directly from the definition of a Markov matrix and are stated below.

Theorem 8.8 Let $A = (a_{ij}) \in \mathbb{R}(n, n)$ be a Markov matrix. Then 1 is an eigenvalue of $A$ which enjoys the following properties.

(1) The vector $u = (1, \dots, 1)^t \in \mathbb{R}^n$ is an eigenvector of $A$ associated to the eigenvalue 1.

(2) Any eigenvalue $\lambda \in \mathbb{C}$ of $A$ satisfies

$$|\lambda| \le 1. \tag{8.4.2}$$

Proof Using (8.4.1), the fact that $u = (1, \dots, 1)^t$ satisfies $Au = u$ may be checked directly.

Now let $\lambda \in \mathbb{C}$ be any eigenvalue of $A$ and $v = (b_i) \in \mathbb{C}^n$ an associated eigenvector. Then there is some $i_0 = 1, \dots, n$ such that

$$|b_{i_0}| = \max\{|b_i| \mid i = 1, \dots, n\}. \tag{8.4.3}$$

Thus, from the relation $Av = \lambda v$, we have

$$\lambda b_{i_0} = \sum_{j=1}^{n} a_{i_0 j} b_j, \tag{8.4.4}$$

which in view of (8.4.1) gives us

$$|\lambda||b_{i_0}| \le \sum_{j=1}^{n} a_{i_0 j} |b_j| \le |b_{i_0}| \sum_{j=1}^{n} a_{i_0 j} = |b_{i_0}|. \tag{8.4.5}$$

Since $|b_{i_0}| > 0$, we see that the bound (8.4.2) follows.

In fact, for a nonnegative element $A \in \mathbb{R}(n, n)$, it is clear that $A$ being a Markov matrix is equivalent to the vector $u = (1, \dots, 1)^t$ being an eigenvector of $A$ associated to the eigenvalue 1. This simple fact establishes that the product of any number of Markov matrices in $\mathbb{R}(n, n)$ is also a Markov matrix.
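The equivalence just stated can be checked numerically. The following is a minimal sketch, assuming two hypothetical $2 \times 2$ Markov matrices `A` and `B`; the helper names are illustrative and not taken from the text.

```python
def mat_vec(A, x):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_markov(A, tol=1e-12):
    """Nonnegative entries and every row summing to 1, as in (8.4.1)."""
    nonneg = all(a >= 0 for row in A for a in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in A)
    return nonneg and rows_ok

# Hypothetical Markov matrices chosen for the example.
A = [[0.9, 0.1], [0.5, 0.5]]
B = [[0.2, 0.8], [0.3, 0.7]]
ones = [1.0, 1.0]

A_ones = mat_vec(A, ones)   # equals ones, i.e. A u = u
AB = mat_mul(A, B)          # again a Markov matrix
```

Since $ABu = A(Bu) = Au = u$ and $AB$ is nonnegative, `is_markov(AB)` holds, which is the argument of the paragraph in computational form.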

Let $A \in \mathbb{R}(n, n)$ be a Markov matrix. It will be interesting to identify a condition under which the eigenvalue 1 of $A$ becomes dominant as in the Perron–Frobenius theorem. Below is such a result.


Theorem 8.9 If $A \in \mathbb{R}(n, n)$ is a regular Markov matrix, then the eigenvalue 1 of $A$ is the dominant eigenvalue of $A$ which satisfies the following properties.

(1) The absolute value of any other eigenvalue $\lambda \in \mathbb{C}$ of $A$ is less than 1. That is, $|\lambda| < 1$.

(2) 1 is a simple root of the characteristic polynomial of $A$.

Proof Let $m \ge 1$ be an integer such that $A^m > 0$. Since $A^m$ is a Markov matrix, we see that 1 is the dominant eigenvalue of $A^m$. On the other hand, if $\lambda \in \mathbb{C}$ is any eigenvalue of $A$ other than 1, since $\lambda^m$ is an eigenvalue of $A^m$ other than 1, we have in view of Theorem 8.6 that $|\lambda^m| < 1$, which proves $|\lambda| < 1$.

We now show that 1 is a simple root of the characteristic polynomial of $A$. If $m = 1$, the conclusion follows from Theorem 8.6. So we may assume $m \ge 2$.

Recall that there is an invertible matrix $C \in \mathbb{C}(n, n)$ such that

$$A = CBC^{-1}, \tag{8.4.6}$$

where $B \in \mathbb{C}(n, n)$ takes the boxed diagonal form

$$B = \mathrm{diag}\{J_1, \dots, J_k\} \tag{8.4.7}$$

in which each $J_i$ ($i = 1, \dots, k$) is either a diagonal matrix of the form $\lambda I$ or a Jordan block of the form

$$J = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & \cdots & \ddots & \ddots & 1 \\ 0 & \cdots & \cdots & 0 & \lambda \end{pmatrix}, \tag{8.4.8}$$

where we use $\lambda$ to denote a generic eigenvalue of $A$. In either case we may rewrite $J$ as the sum of two matrices, in the form

$$J = \lambda I + P, \tag{8.4.9}$$

where for some integer $l \ge 1$ (with $l$ being the degree of nilpotence of $P$) we have $P^l = 0$. Hence, in view of the binomial expansion formula, we have


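$$\begin{aligned} J^m &= (\lambda I + P)^m \\ &= \sum_{s=0}^{m} \frac{m!}{s!(m-s)!} \lambda^{m-s} P^s \\ &= \sum_{s=0}^{l-1} \frac{m!}{s!(m-s)!} \lambda^{m-s} P^s \\ &= \lambda^m I + \lambda^{m-1} m P + \cdots + \lambda^{m-l+1} \frac{m(m-1) \cdots (m-[l-2])}{(l-1)!} P^{l-1}, \end{aligned} \tag{8.4.10}$$

which is an upper triangular matrix with the diagonal entries all equal to $\lambda^m$. From (8.4.6) and (8.4.7), we have

$$A^m = C \, \mathrm{diag}\{J_1^m, \dots, J_k^m\} \, C^{-1}. \tag{8.4.11}$$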

However, using the condition $A^m > 0$, Theorem 8.6, and (8.4.10), we conclude that there exists exactly one Jordan block among the Jordan blocks $J_1, \dots, J_k$ of $A$ with $\lambda = 1$ and that such a Jordan block can only be $1 \times 1$.

The proof of the theorem is thus complete.

Since in (8.4.10) there are only $l$ terms, with $l$ independent of $m$, and each term tends to zero as $m \to \infty$ when $|\lambda| < 1$, we see that $J^m \to 0$ as $m \to \infty$ when $|\lambda| < 1$. This observation leads to the following important convergence theorem for Markov matrices.

Theorem 8.10 If $A \in \mathbb{R}(n, n)$ is a regular Markov matrix, then

$$\lim_{m \to \infty} A^m = K, \tag{8.4.12}$$

where $K$ is a positive Markov matrix with $n$ identical row vectors, $v_1^t = \cdots = v_n^t = v^t$, where $v = (b_1, \dots, b_n)^t \in \mathbb{R}^n$ is the unique positive vector satisfying

$$A^t v = v, \quad \sum_{i=1}^{n} b_i = 1. \tag{8.4.13}$$

Proof If $A$ is a regular Markov matrix, then there is an integer $m \ge 1$ such that $A^m > 0$. Thus $(A^t)^m = (A^m)^t > 0$. Since $A^t$ and $A$ have the same characteristic polynomial, $A^t$ has 1 as its dominant eigenvalue as $A$ does. Let $v = (b_i)$ be an eigenvector of $A^t$ associated to the eigenvalue 1. Then $v$ is also an eigenvector of $(A^t)^m$ associated to 1. Since $(A^t)^m > 0$ and its eigenvalue 1 as the dominant eigenvalue of $(A^t)^m$ is simple, we see in view of Theorem 8.6 that we may choose $v \in \mathbb{R}^n$ so that either $v > 0$ or $v < 0$. We now choose $v > 0$ and normalize it so that its components sum to 1. That is, $v$ satisfies (8.4.13).


Using Theorem 8.9, let $C \in \mathbb{C}(n, n)$ be invertible such that

$$A = C \, \mathrm{diag}\{1, J_1, \dots, J_k\} \, C^{-1}. \tag{8.4.14}$$

Rewriting (8.4.14) as $AC = C \, \mathrm{diag}\{1, J_1, \dots, J_k\}$, we see that the first column vector of $C$ is an eigenvector of $A$ associated to the eigenvalue 1, which is simple by Theorem 8.9. Hence we may choose this first column vector of $C$ to be $u = a(1, \dots, 1)^t$ for some $a \in \mathbb{C}$, $a \ne 0$.

On the other hand, write $D = C^{-1}$ and express (8.4.14) as

$$DA = \mathrm{diag}\{1, J_1, \dots, J_k\} \, D. \tag{8.4.15}$$

We see that the first row vector of $D$, say $w^t$ for some $w \in \mathbb{C}^n$, satisfies $w^t A = w^t$. Hence $A^t w = w$. Since 1 is a simple root of the characteristic polynomial of $A^t$, we conclude that there is some $b \in \mathbb{C}$, $b \ne 0$, such that $w = bv$ where $v$ satisfies (8.4.13).

Since $D = C^{-1}$, we have $w^t u = 1$, which leads to

$$ab \, v^t (1, \dots, 1)^t = ab \sum_{i=1}^{n} b_i = ab = 1. \tag{8.4.16}$$

Finally, for any integer $m \ge 1$, (8.4.14) gives us

$$A^m = C \, \mathrm{diag}\{1, J_1^m, \dots, J_k^m\} \, D. \tag{8.4.17}$$

Since each of the Jordan blocks $J_1, \dots, J_k$ is of the form (8.4.9) for some $\lambda \in \mathbb{C}$ satisfying $|\lambda| < 1$, we have seen that

$$J_1^m, \dots, J_k^m \to 0 \quad \text{as } m \to \infty. \tag{8.4.18}$$

Therefore, taking $m \to \infty$ in (8.4.17), we arrive at

$$\lim_{m \to \infty} A^m = C \, \mathrm{diag}\{1, 0, \dots, 0\} \, D. \tag{8.4.19}$$

Inserting the results that the first column vector of $C$ is $u = a(1, \dots, 1)^t$, the first row vector of $D$ is $w^t = b v^t$, where $v$ satisfies (8.4.13), and $ab = 1$ into (8.4.19), we obtain

$$\lim_{m \to \infty} A^m = \begin{pmatrix} b_1 & \cdots & b_n \\ \vdots & \cdots & \vdots \\ b_1 & \cdots & b_n \end{pmatrix} = K, \tag{8.4.20}$$

as asserted.

For a Markov matrix $A \in \mathbb{R}(n, n)$, the power of $A$, $A^m$, may or may not approach a limiting matrix as $m \to \infty$. If the limit of $A^m$ as $m \to \infty$ exists and is some $K \in \mathbb{R}(n, n)$, then it is not hard to show that $K$ is also a Markov matrix, which is called the stable matrix of $A$, and $A$ is said to be a stable Markov matrix. Theorem 8.10 says that a regular Markov matrix $A$ is stable and, at the same time, gives us a constructive method to find the stable matrix of $A$.
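The convergence in Theorem 8.10 can be observed directly. The sketch below assumes a hypothetical regular Markov matrix `A` (not one of the exercise matrices) whose vector in (8.4.13) is $v = (5/6, 1/6)^t$, as may be verified by hand from $A^t v = v$; the exponent 60 is an arbitrary choice that is large enough for this example.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(A, m):
    """A**m for an integer m >= 1, by repeated multiplication."""
    P = A
    for _ in range(m - 1):
        P = mat_mul(P, A)
    return P

# Hypothetical regular Markov matrix; its eigenvalues are 1 and 0.4.
A = [[0.9, 0.1],
     [0.5, 0.5]]

K = mat_power(A, 60)          # numerically indistinguishable from the limit
v = [5.0 / 6.0, 1.0 / 6.0]    # positive solution of A^t v = v with components summing to 1

# Largest deviation of any row of K from v^t.
row_error = max(abs(K[i][j] - v[j]) for i in range(2) for j in range(2))
```

Both rows of `K` agree with $v^t$ up to rounding, which matches the form of the stable matrix in (8.4.20).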

Exercises

8.4.1 Let $A \in \mathbb{R}(n, n)$ be a stable Markov matrix and $K \in \mathbb{R}(n, n)$ the stable matrix of $A$. Show that $K$ is also a Markov matrix and satisfies $AK = KA = K$.

8.4.2 Show that the matrix

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \tag{8.4.21}$$

is a Markov matrix which is not regular. Is $A$ stable?

8.4.3 Consider the Markov matrix

$$A = \frac{1}{2} \begin{pmatrix} 1 & 1 & 0 \\ \frac{1}{2} & 1 & \frac{1}{2} \\ 0 & 1 & 1 \end{pmatrix}. \tag{8.4.22}$$

(i) Check that $A$ is regular by showing that $A^2 > 0$.

(ii) Find the stable matrix of $A$.

(iii) Use induction to establish the formula

$$A^m = \frac{1}{2^m} \begin{pmatrix} \frac{3}{2} + (2^{m-2} - 1) & 2^{m-1} & \frac{1}{2} + (2^{m-2} - 1) \\ 2^{m-2} & 2^{m-1} & 2^{m-2} \\ \frac{1}{2} + (2^{m-2} - 1) & 2^{m-1} & \frac{3}{2} + (2^{m-2} - 1) \end{pmatrix}, \quad m = 1, 2, \dots. \tag{8.4.23}$$

Take $m \to \infty$ in (8.4.23) and verify your result obtained in (ii).

8.4.4 Let $A \in \mathbb{R}(n, n)$ be a Markov matrix. If $A^t$ is also a Markov matrix, $A$ is said to be a doubly Markov matrix. Show that the stable matrix $K$ of a regular doubly Markov matrix is simply given by

$$K = \frac{1}{n} \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \cdots & \vdots \\ 1 & \cdots & 1 \end{pmatrix}. \tag{8.4.24}$$

8.4.5 Let $A_1, \dots, A_k \in \mathbb{R}(n, n)$ be $k$ doubly Markov matrices. Show that their product $A = A_1 \cdots A_k$ is also a doubly Markov matrix.

9

Excursion: Quantum mechanics in a nutshell

The content of this chapter may serve as yet another supplemental topic to meet needs and interests beyond those of a usual course curriculum. Here we shall present an over-simplified, but hopefully totally transparent, description of some of the fundamental ideas and concepts of quantum mechanics, using a pure linear algebra formalism.

9.1 Vectors in Cn and Dirac bracket

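Consider the vector space $\mathbb{C}^n$, consisting of column vectors, and use $\{e_1, \dots, e_n\}$ to denote the standard basis of $\mathbb{C}^n$. For $u, v \in \mathbb{C}^n$ with

$$u = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} = \sum_{i=1}^{n} a_i e_i, \quad v = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \sum_{i=1}^{n} b_i e_i, \tag{9.1.1}$$

recall that the Hermitian scalar product is given by

$$(u, v) = u^\dagger v = \sum_{i=1}^{n} \overline{a}_i b_i, \tag{9.1.2}$$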

so that $\{e_1, \dots, e_n\}$ is a unitary basis, satisfying $(e_i, e_j) = \delta_{ij}$, $i, j = 1, \dots, n$.

In quantum mechanics, it is customary to rewrite the scalar product (9.1.2) in a bracket form, $\langle u|v\rangle$. It was Dirac who suggested viewing $\langle u|v\rangle$ as the scalar pairing of a 'bra' vector $\langle u|$ and a 'ket' vector $|v\rangle$, representing the row vector $u^\dagger$ and the column vector $v$. Thus we may use $|e_1\rangle, \dots, |e_n\rangle$ to denote the standard basis vectors of $\mathbb{C}^n$ and represent the vector $u$ in $\mathbb{C}^n$ as

$$|u\rangle = \sum_{i=1}^{n} a_i |e_i\rangle. \tag{9.1.3}$$

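Therefore the bra-counterpart of $|u\rangle$ is simply given as

$$\langle u| = (|u\rangle)^\dagger = \sum_{i=1}^{n} \overline{a}_i \langle e_i|. \tag{9.1.4}$$

As a consequence, the orthonormal condition regarding the basis $\{e_1, \dots, e_n\}$ becomes

$$\langle e_i|e_j\rangle = \delta_{ij}, \quad i, j = 1, \dots, n, \tag{9.1.5}$$

and the Hermitian scalar product of the vectors $|u\rangle$ and $|v\rangle$ assumes the form

$$\langle u|v\rangle = \sum_{i=1}^{n} \overline{a}_i b_i = \overline{\langle v|u\rangle}. \tag{9.1.6}$$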

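For the vector $|u\rangle$ given in (9.1.3), we find that

$$a_i = \langle e_i|u\rangle, \quad i = 1, \dots, n. \tag{9.1.7}$$

Now rewriting $|u\rangle$ as

$$|u\rangle = \sum_{i=1}^{n} |e_i\rangle a_i, \tag{9.1.8}$$

and inserting (9.1.7) into (9.1.8), we obtain

$$|u\rangle = \sum_{i=1}^{n} |e_i\rangle \langle e_i|u\rangle \equiv \left( \sum_{i=1}^{n} |e_i\rangle \langle e_i| \right) |u\rangle, \tag{9.1.9}$$

which suggests that the strange-looking 'quantity', $\sum_{i=1}^{n} |e_i\rangle \langle e_i|$, should naturally be identified as the identity mapping or matrix,

$$\sum_{i=1}^{n} |e_i\rangle \langle e_i| = I, \tag{9.1.10}$$

which readily follows from the associativity property of matrix multiplication. Similarly, we have

$$\langle u| = \sum_{i=1}^{n} \overline{\langle e_i|u\rangle} \langle e_i| = \sum_{i=1}^{n} \langle u|e_i\rangle \langle e_i| = \langle u| \left( \sum_{i=1}^{n} |e_i\rangle \langle e_i| \right). \tag{9.1.11}$$

Thus (9.1.10) can be applied to both bra and ket vectors symmetrically and what it expresses is simply the fact that $|e_1\rangle, \dots, |e_n\rangle$ form an orthonormal basis of $\mathbb{C}^n$.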


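We may reexamine some familiar linear mappings under the new notation.

Let $|u\rangle$ be a unit vector in $\mathbb{C}^n$. Use $P_{|u\rangle}$ to denote the mapping that projects $\mathbb{C}^n$ onto $\mathrm{Span}\{|u\rangle\}$ along $(\mathrm{Span}\{|u\rangle\})^\perp$. Then we have

$$P_{|u\rangle} |v\rangle = \langle u|v\rangle |u\rangle, \quad |v\rangle \in \mathbb{C}^n. \tag{9.1.12}$$

Placing the scalar number $\langle u|v\rangle$ to the right-hand side of the above expression, we see that the mapping $P_{|u\rangle}$ can be rewritten as

$$P_{|u\rangle} = |u\rangle \langle u|. \tag{9.1.13}$$

Let $\{|u_1\rangle, \dots, |u_n\rangle\}$ be any orthonormal basis of $\mathbb{C}^n$. Then $P_{|u_i\rangle}$ projects $\mathbb{C}^n$ onto $\mathrm{Span}\{|u_i\rangle\}$ along

$$\mathrm{Span}\{|u_1\rangle, \dots, \widehat{|u_i\rangle}, \dots, |u_n\rangle\}, \tag{9.1.14}$$

where the hat indicates that $|u_i\rangle$ is omitted. It is clear that

$$I = \sum_{i=1}^{n} P_{|u_i\rangle}, \tag{9.1.15}$$

since $a_i = \langle u_i|u\rangle$ for the coordinates $a_i$ of $|u\rangle$ with respect to this basis, which generalizes the result (9.1.10).

If $T$ is an arbitrary Hermitian operator with eigenvalues $\lambda_1, \dots, \lambda_n$ and the associated orthonormal eigenvectors $u_1, \dots, u_n$ (which form a basis of $\mathbb{C}^n$), then we have the representation

$$T = \sum_{i=1}^{n} \lambda_i |u_i\rangle \langle u_i|, \tag{9.1.16}$$

as may be checked easily. Besides, in $\langle u|T|v\rangle$, we may interpret $T$ as applied either to the ket vector $|v\rangle$, from the left, or to the bra vector $\langle u|$, from the right, which will not cause any ambiguity.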
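The identities (9.1.13), (9.1.15), and (9.1.16) lend themselves to a quick numerical check. The sketch below assumes a hypothetical orthonormal basis `u1`, `u2` of $\mathbb{C}^2$ and hypothetical eigenvalues 2 and 5; the function names are illustrative only.

```python
import math

def outer(u, v):
    """Matrix of |u><v|, with entries u_i * conj(v_j)."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]

def add(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Scalar multiple of a matrix."""
    return [[c * a for a in row] for row in A]

def apply(A, x):
    """Matrix applied to a ket (column vector)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

s = 1 / math.sqrt(2)
u1 = [complex(s, 0), complex(0, s)]    # hypothetical orthonormal basis of C^2
u2 = [complex(s, 0), complex(0, -s)]

P1, P2 = outer(u1, u1), outer(u2, u2)  # projections |u_i><u_i|
identity = add(P1, P2)                 # should be the 2 x 2 identity matrix
T = add(scale(2.0, P1), scale(5.0, P2))  # Hermitian matrix with eigenvalues 2 and 5

# T|u1> - 2|u1> and T|u2> - 5|u2> should both vanish.
err1 = max(abs(a - 2.0 * b) for a, b in zip(apply(T, u1), u1))
err2 = max(abs(a - 5.0 * b) for a, b in zip(apply(T, u2), u2))
```

Building `T` from its projections and then recovering $T|u_i\rangle = \lambda_i |u_i\rangle$ is the "easy check" of (9.1.16) carried out in floating point.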

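To summarize, we may realize $\mathbb{C}^n$ by column vectors as ket vectors

$$|u\rangle = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}, \quad |v\rangle = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in \mathbb{C}^n, \tag{9.1.17}$$

so that the bra vectors are represented as row vectors

$$\langle u| = (|u\rangle)^\dagger = (\overline{a}_1, \dots, \overline{a}_n), \quad \langle v| = (|v\rangle)^\dagger = (\overline{b}_1, \dots, \overline{b}_n). \tag{9.1.18}$$

Consequently,

$$\langle u|v\rangle = (|u\rangle)^\dagger |v\rangle = \sum_{i=1}^{n} \overline{a}_i b_i, \tag{9.1.19}$$

$$|u\rangle \langle v| = |u\rangle (|v\rangle)^\dagger = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} (\overline{b}_1, \dots, \overline{b}_n) = \begin{pmatrix} a_1 \overline{b}_1 & \cdots & a_1 \overline{b}_n \\ \cdots & \cdots & \cdots \\ a_n \overline{b}_1 & \cdots & a_n \overline{b}_n \end{pmatrix} = (a_i \overline{b}_j). \tag{9.1.20}$$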

In particular, we have the representation

$$\sum_{i=1}^{n} |e_i\rangle \langle e_i| = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & \cdots & 1 \end{pmatrix}. \tag{9.1.21}$$

Finally, if $A \in \mathbb{C}(n, n)$ is a Hermitian matrix (or equivalently viewed as a self-adjoint mapping over $\mathbb{C}^n$), then $A$ in $\langle u|A|v\rangle$ may be interpreted as applied to the ket vector $|v\rangle$ from the left or to the row vector $\langle u|$ from the right, without ambiguity, since

$$\langle u|A|v\rangle = \langle u|(A|v\rangle) = (\langle u|A)|v\rangle = (A|u\rangle)^\dagger |v\rangle. \tag{9.1.22}$$

Exercises

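9.1.1 Consider the orthonormal basis of $\mathbb{C}^2$ consisting of the ket vectors

$$|u_1\rangle = \frac{1}{2} \begin{pmatrix} 1 + i \\ 1 - i \end{pmatrix}, \quad |u_2\rangle = \frac{1}{2} \begin{pmatrix} 1 + i \\ -1 + i \end{pmatrix}. \tag{9.1.23}$$

Verify the identity

$$|u_1\rangle \langle u_1| + |u_2\rangle \langle u_2| = I_2. \tag{9.1.24}$$

9.1.2 Consider the Hermitian matrix

$$A = \begin{pmatrix} -1 & 3 + i \\ 3 - i & 2 \end{pmatrix} \tag{9.1.25}$$

in $\mathbb{C}(2, 2)$.

(i) Find an orthonormal basis of $\mathbb{C}^2$ consisting of eigenvectors, say $|u_1\rangle, |u_2\rangle$, associated with the eigenvalues, say $\lambda_1, \lambda_2$, of $A$.

(ii) Find the matrices that represent the orthogonal projections

$$P_{|u_1\rangle} = |u_1\rangle \langle u_1|, \quad P_{|u_2\rangle} = |u_2\rangle \langle u_2|. \tag{9.1.26}$$

(iii) Verify the identity

$$A = \lambda_1 P_{|u_1\rangle} + \lambda_2 P_{|u_2\rangle} = \lambda_1 |u_1\rangle \langle u_1| + \lambda_2 |u_2\rangle \langle u_2|. \tag{9.1.27}$$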

9.2 Quantum mechanical postulates

In the physics literature, quantum mechanics may be formulated in terms of two objects referred to as states and observables. A state, also called a wave function, contains statistical information about a certain physical observable, which often describes one of the measurable quantities such as energy, momenta, and position coordinates of a mechanical system, such as those encountered in describing the motion of a hypothetical particle. Mathematically, states are vectors in a complex vector space with a positive definite Hermitian scalar product, called the state space, and observables are Hermitian mappings over the given state space. In this section, we present an over-simplified formalism of quantum mechanics using the space $\mathbb{C}^n$ as the state space and Hermitian matrices in $\mathbb{C}(n, n)$ as observables.

To proceed, we state a number of axioms, in our context called the quantum mechanical postulates, on which quantum mechanics is built.

• State postulate. The state of a mechanical system, hereby formally referred to as 'a particle', is described by a unit ket vector $|\phi\rangle$ in $\mathbb{C}^n$.

• Observable postulate. A physically measurable quantity of the particle such as energy, momenta, etc., called an observable, is represented by a Hermitian matrix $A \in \mathbb{C}(n, n)$ so that its expected value for the particle in the state $|\phi\rangle$, denoted by $\langle A\rangle$, is given by

$$\langle A\rangle = \langle \phi|A|\phi\rangle. \tag{9.2.1}$$

• Measurement postulate. Let $A \in \mathbb{C}(n, n)$ be an observable with eigenvalues $\lambda_1, \dots, \lambda_n$, which are known to be all real, and let the corresponding eigenvectors of $A$ be denoted by $|u_1\rangle, \dots, |u_n\rangle$, which form an orthonormal basis of $\mathbb{C}^n$. As a random variable, the measurement $X_A$ of the observable $A$, when the particle lies in the state $|\phi\rangle$, can result only in a reading among the eigenvalues of $A$, and obeys the probability distribution

$$P(\{X_A = \lambda\}) = \begin{cases} \displaystyle\sum_{\lambda_i = \lambda} |\phi_i|^2, & \text{if } \lambda \text{ is an eigenvalue}, \\ 0, & \text{if } \lambda \text{ is not an eigenvalue}, \end{cases} \tag{9.2.2}$$

where $\phi_1, \dots, \phi_n \in \mathbb{C}$ are the coordinates of $|\phi\rangle$ with respect to the basis $\{u_1, \dots, u_n\}$, that is,

$$|\phi\rangle = \sum_{i=1}^{n} \phi_i |u_i\rangle. \tag{9.2.3}$$

The measurement postulate (9.2.2) may be viewed as directly motivated by the expected (or expectation) value formula

$$\langle A\rangle = \sum_{i=1}^{n} \lambda_i |\phi_i|^2, \tag{9.2.4}$$

which can be obtained by substituting (9.2.3) into (9.2.1).

In the context of the measurement postulate, we have the following quantum mechanical interpretation of an eigenstate.

Theorem 9.1 The particle lies in an eigenstate of the observable $A$ if and only if all measurements of $A$ render the same value, which must be some eigenvalue of $A$.

Proof Suppose that $\lambda$ is an eigenvalue of $A$. Use $E_\lambda$ to denote the corresponding eigenspace. If the particle lies in a state $|\phi\rangle \in E_\lambda$, then $\phi_i = \langle u_i|\phi\rangle = 0$ when $\lambda_i \ne \lambda$, where $\lambda_1, \dots, \lambda_n$ are all the possible eigenvalues of $A$ and $|u_1\rangle, \dots, |u_n\rangle$ are the corresponding eigenvectors that form an orthonormal basis of $\mathbb{C}^n$. Therefore

$$P(\{X_A = \lambda_i\}) = 0 \quad \text{when } \lambda_i \ne \lambda, \tag{9.2.5}$$

which leads to $P(\{X_A = \lambda\}) = \langle \phi|\phi\rangle = 1$.

Conversely, if $P(\{X_A = \lambda\}) = 1$ holds for some $\lambda$ when the particle lies in the state $|\phi\rangle$, then according to the measurement postulate (9.2.2), we have $\lambda = \lambda_i$ for some $i = 1, \dots, n$. Hence

$$0 = 1 - P(\{X_A = \lambda_i\}) = \sum_{\lambda_j \ne \lambda_i} |\phi_j|^2, \tag{9.2.6}$$

from which it follows that $\phi_j = 0$ when $\lambda_j \ne \lambda_i$. Thus $|\phi\rangle \in E_{\lambda_i}$ as claimed.
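To make the measurement postulate concrete, here is a minimal sketch that computes the distribution (9.2.2) and compares the two expressions (9.2.1) and (9.2.4) for the expected value. The observable is assumed to be the matrix with rows $(0, 1)$ and $(1, 0)$, whose eigenvalues $\pm 1$ and eigenvectors are easily checked by hand; the state `phi` is a hypothetical unit vector.

```python
import math

def bracket(u, v):
    """Hermitian scalar product <u|v> = sum of conj(u_i) * v_i."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
A = [[0.0, 1.0],
     [1.0, 0.0]]                      # assumed observable
eigenvalues = [1.0, -1.0]
eigenvectors = [[s, s], [s, -s]]      # orthonormal eigenvectors of A
phi = [0.6, 0.8]                      # hypothetical unit state vector

# Coordinates phi_i = <u_i|phi> and probabilities |phi_i|^2 from (9.2.2).
coords = [bracket(u, phi) for u in eigenvectors]
probs = [abs(c) ** 2 for c in coords]

# Expected value from the distribution, as in (9.2.4).
expected_from_probs = sum(lam * p for lam, p in zip(eigenvalues, probs))

# Expected value <phi|A|phi>, as in (9.2.1).
A_phi = [sum(A[i][j] * phi[j] for j in range(2)) for i in range(2)]
expected_direct = bracket(phi, A_phi).real
```

The probabilities sum to 1 because `phi` is a unit vector, and the two expected values agree, which is the content of the substitution of (9.2.3) into (9.2.1).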

We are now in a position to consider how a state evolves with respect to time $t$.

• Time evolution postulate. A state vector $|\phi\rangle$ follows time evolution according to the law

$$i\hbar \frac{d}{dt} |\phi\rangle = H|\phi\rangle, \tag{9.2.7}$$

where $H \in \mathbb{C}(n, n)$ is a Hermitian matrix called the Hamiltonian of the system and $\hbar > 0$ a universal constant called the Planck constant. Equation (9.2.7) is the matrix version of the celebrated Schrödinger equation.

In classical mechanics, the Hamiltonian $H$ of a system measures the total energy, which may be written as the sum of the kinetic energy $K$ and potential energy $V$,

$$H = K + V. \tag{9.2.8}$$

If the system consists of a single particle of mass $m > 0$, then $K$ may be expressed in terms of the momentum $P$ of the particle through the relation

$$K = \frac{1}{2m} P^2. \tag{9.2.9}$$

In quantum mechanics in our context here, both $P$ and $V$ are taken to be Hermitian matrices.

Let $|\phi(0)\rangle = |\phi_0\rangle$ be the initial state of the time-dependent state vector $|\phi(t)\rangle$. Solving (9.2.7), we obtain

$$|\phi(t)\rangle = U(t)|\phi_0\rangle, \quad U(t) = e^{-\frac{i}{\hbar} tH}. \tag{9.2.10}$$

Since $H$ is Hermitian, $-\frac{i}{\hbar} H$ is anti-Hermitian. Hence $U(t)$ is unitary,

$$U(t) U^\dagger(t) = I, \tag{9.2.11}$$

which ensures the conservation of the normality of the state vector $|\phi(t)\rangle$. That is,

$$\langle \phi(t)|\phi(t)\rangle = \langle \phi_0|\phi_0\rangle = 1. \tag{9.2.12}$$

Assume that the eigenvalues of $H$, say $\lambda_1, \dots, \lambda_n$, are all positive. Let $u_1, \dots, u_n$ be the associated eigenstates of $H$ that form an orthonormal basis of $\mathbb{C}^n$. With the expansion

$$|\phi_0\rangle = \sum_{i=1}^{n} \phi_{0,i} |u_i\rangle, \tag{9.2.13}$$

we may rewrite the state vector $|\phi(t)\rangle$ as

$$|\phi(t)\rangle = \sum_{i=1}^{n} \phi_{0,i} e^{-\frac{i}{\hbar} tH} |u_i\rangle = \sum_{i=1}^{n} e^{-\frac{i}{\hbar} \lambda_i t} \phi_{0,i} |u_i\rangle = \sum_{i=1}^{n} e^{-i\omega_i t} \phi_{0,i} |u_i\rangle, \tag{9.2.14}$$

where

$$\omega_i = \frac{\lambda_i}{\hbar}, \quad i = 1, \dots, n, \tag{9.2.15}$$

are angular frequencies of the eigenmodes

$$e^{-i\omega_i t} \phi_{0,i} |u_i\rangle, \quad i = 1, \dots, n. \tag{9.2.16}$$

In other words, the state vector or wave function $|\phi(t)\rangle$ is a superposition of $n$ eigenmodes with the associated angular frequencies determined by the eigenvalues of the Hamiltonian through (9.2.15).
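The eigenmode expansion (9.2.14) can be tested numerically. The sketch below assumes units in which the Planck constant equals 1 (the name `HBAR` is an assumption of the example) and a hypothetical Hamiltonian with rows $(2, 1)$ and $(1, 2)$, whose eigenvalues 3 and 1 and eigenvectors are easily verified by hand. It checks that the norm is conserved, as in (9.2.12), and that (9.2.7) holds up to finite-difference error.

```python
import cmath
import math

HBAR = 1.0  # assumed units; not a value given in the text

def evolve(phi0, eigenvalues, eigenvectors, t):
    """Superpose the eigenmodes exp(-i * omega_i * t) * phi_{0,i} * u_i."""
    n = len(phi0)
    phi_t = [0j] * n
    for lam, u in zip(eigenvalues, eigenvectors):
        coeff = sum(complex(a).conjugate() * b for a, b in zip(u, phi0))  # <u_i|phi_0>
        phase = cmath.exp(-1j * (lam / HBAR) * t)
        for k in range(n):
            phi_t[k] += phase * coeff * u[k]
    return phi_t

s = 1 / math.sqrt(2)
H = [[2.0, 1.0],
     [1.0, 2.0]]                      # hypothetical Hamiltonian
eigenvalues = [3.0, 1.0]
eigenvectors = [[s, s], [s, -s]]
phi0 = [1.0, 0.0]

t, dt = 0.7, 1e-5
phi = evolve(phi0, eigenvalues, eigenvectors, t)
norm_sq = sum(abs(c) ** 2 for c in phi)

# Central difference approximation of d/dt |phi(t)>.
ahead = evolve(phi0, eigenvalues, eigenvectors, t + dt)
behind = evolve(phi0, eigenvalues, eigenvectors, t - dt)
lhs = [1j * HBAR * (a - b) / (2 * dt) for a, b in zip(ahead, behind)]
rhs = [sum(H[i][j] * phi[j] for j in range(2)) for i in range(2)]
residual = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The small value of `residual` reflects only the discretization of the time derivative; the expansion itself solves the equation exactly.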

As an observable, the Hamiltonian $H$ measures the total energy. Thus the eigenvalues $\lambda_1, \dots, \lambda_n$ of $H$ are the possible energy values of the system. For this reason, we may use $E$ to denote a generic energy value, among $\lambda_1, \dots, \lambda_n$. Correspondingly, we use $\omega$ to denote a generic angular frequency, among $\omega_1, \dots, \omega_n$. Therefore we arrive at the generic relation

$$E = \hbar\omega, \tag{9.2.17}$$

known as the Einstein formula, arising originally in the work of Einstein towards an understanding of the photoelectric effect, which later became one of the two basic equations in the wave–particle duality hypothesis of de Broglie and laid the very foundation of quantum mechanics. Roughly speaking, the formula (9.2.17) indicates that a particle of energy $E$ behaves like a wave of angular frequency $\omega$ and that a wave of angular frequency $\omega$ also behaves like a particle of energy $E$.

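Let $A$ be an observable and assume that the system lies in the state $|\phi(t)\rangle$ which is governed by the Schrödinger equation (9.2.7). We investigate whether the expectation value $\langle A\rangle(t) = \langle \phi(t)|A|\phi(t)\rangle$ is conserved or time-independent. For this purpose, we compute

$$\begin{aligned} \frac{d}{dt} \langle A\rangle(t) &= \frac{d}{dt} \langle \phi(t)|A|\phi(t)\rangle \\ &= \left( \frac{d}{dt} \langle \phi(t)| \right) A|\phi(t)\rangle + \langle \phi(t)|A \left( \frac{d}{dt} |\phi(t)\rangle \right) \\ &= \left( \frac{d}{dt} |\phi(t)\rangle \right)^\dagger A|\phi(t)\rangle + \langle \phi(t)|A \left( \frac{d}{dt} |\phi(t)\rangle \right) \\ &= \left( -\frac{i}{\hbar} H|\phi(t)\rangle \right)^\dagger A|\phi(t)\rangle + \langle \phi(t)|A \left( -\frac{i}{\hbar} H|\phi(t)\rangle \right) \\ &= \left\langle \phi(t) \left| \frac{i}{\hbar} [H, A] \right| \phi(t) \right\rangle, \end{aligned} \tag{9.2.18}$$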


where

$$[H, A] = HA - AH \tag{9.2.19}$$

is the commutator of $H$ and $A$ which measures the non-commutativity of $H$ and $A$. Hence we see that $\langle A\rangle$ is time-independent if $A$ commutes with $H$:

$$[H, A] = 0. \tag{9.2.20}$$

In particular, the average energy $\langle H\rangle$ is always conserved. Furthermore, if $H$ is related to the momentum $P$ and potential $V$ through (9.2.8) and (9.2.9) so that $P$ commutes with $V$, then $[H, P] = 0$ and the average momentum $\langle P\rangle$ is also conserved. These are the quantum mechanical extensions of the laws of conservation for energy and momentum.

Exercises

9.2.1 Consider the Hermitian matrix

$$A = \begin{pmatrix} 5 & i & 0 \\ -i & 3 & 1 - i \\ 0 & 1 + i & 5 \end{pmatrix} \tag{9.2.21}$$

as an observable of a system.

(i) Find the eigenvalues of $A$ as possible readings when measuring the observable $A$.

(ii) Find the corresponding unit eigenvectors of $A$, say $|u_1\rangle, |u_2\rangle, |u_3\rangle$, which form an orthonormal basis of the state space $\mathbb{C}^3$.

(iii) Assume that the state the system occupies is given by the vector

$$|\phi\rangle = \frac{1}{\sqrt{15}} \begin{pmatrix} i \\ 2 + i \\ 3 \end{pmatrix} \tag{9.2.22}$$

and resolve $|\phi\rangle$ in terms of $|u_1\rangle, |u_2\rangle, |u_3\rangle$.

(iv) If the system stays in the state $|\phi\rangle$, determine the probability distribution function of the random variable $X_A$, which is the random value read each time a measurement of $A$ is made.

(v) If the system stays in the state $|\phi\rangle$, evaluate the expected value, $\langle A\rangle$, of $X_A$.

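9.2.2 Consider the Schrödinger equation

$$i\hbar \frac{d}{dt} |\phi\rangle = H|\phi\rangle, \quad H = \begin{pmatrix} 4 & i \\ -i & 4 \end{pmatrix}. \tag{9.2.23}$$

(i) Find the orthonormal eigenstates of $H$ and use them to construct the solution $|\phi(t)\rangle$ of (9.2.23) satisfying the initial condition

$$|\phi(0)\rangle = |\phi_0\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \tag{9.2.24}$$

(ii) Consider a perturbation of the Hamiltonian $H$ given as

$$H_\varepsilon = H + \varepsilon\sigma_1, \quad \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \varepsilon \in \mathbb{R}, \tag{9.2.25}$$

where $\sigma_1$ is known as one of the Pauli matrices. Show that the commutator of $H$ and $H_\varepsilon$ is

$$[H, H_\varepsilon] = 2i\varepsilon\sigma_3, \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{9.2.26}$$

where $\sigma_3$ is another Pauli matrix, and use it and (9.2.18) to evaluate the rate of change of the time-dependent expected value $\langle H_\varepsilon\rangle(t) = \langle \phi(t)|H_\varepsilon|\phi(t)\rangle$.

(iii) Establish the formula

$$\langle \phi(t)|H_\varepsilon|\phi(t)\rangle = \langle \phi_0|H|\phi_0\rangle + \varepsilon \langle \phi(t)|\sigma_1|\phi(t)\rangle \tag{9.2.27}$$

and use it to verify the result regarding $\frac{d}{dt} \langle H_\varepsilon\rangle(t)$ obtained in (ii) through the commutator identity (9.2.18).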

9.3 Non-commutativity and uncertainty principle

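Let $\{|u_i\rangle\}$ be an orthonormal set of eigenstates of a Hermitian matrix $A \in \mathbb{C}(n, n)$ with the associated real eigenvalues $\{\lambda_i\}$ and use $X_A$ to denote the random variable of measurement of $A$. If the system lies in the state $|\phi\rangle$ with $\phi_i = \langle u_i|\phi\rangle$ ($i = 1, \dots, n$), the distribution function of $X_A$ is as given in (9.2.2). Thus the variance of $X_A$ can be calculated according to the formula

$$\begin{aligned} \sigma_A^2 &= \sum_{i=1}^{n} (\lambda_i - \langle A\rangle)^2 |\phi_i|^2 \\ &= ((A - \langle A\rangle I)|\phi\rangle)^\dagger (A - \langle A\rangle I)|\phi\rangle \\ &= \langle \phi|(A - \langle A\rangle I)|(A - \langle A\rangle I)|\phi\rangle, \end{aligned} \tag{9.3.1}$$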


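which is given in a form free of the choice of the basis $\{|u_i\rangle\}$. Thus, with (9.3.1), if $A$ and $B$ are two observables, then the Schwarz inequality implies that

$$\sigma_A^2 \sigma_B^2 \ge |\langle \phi|(A - \langle A\rangle I)|(B - \langle B\rangle I)|\phi\rangle|^2 \equiv |c|^2, \tag{9.3.2}$$

where the complex number $c$ is given by

$$\begin{aligned} c &= \langle \phi|(A - \langle A\rangle I)|(B - \langle B\rangle I)|\phi\rangle \\ &= \langle \phi|(A - \langle A\rangle I)(B - \langle B\rangle I)|\phi\rangle \\ &= \langle \phi|AB|\phi\rangle - \langle B\rangle \langle \phi|A|\phi\rangle - \langle A\rangle \langle \phi|B|\phi\rangle + \langle A\rangle \langle B\rangle \langle \phi|\phi\rangle \\ &= \langle AB\rangle - \langle A\rangle \langle B\rangle. \end{aligned} \tag{9.3.3}$$

Interchanging $A$ and $B$, we have

$$\overline{c} = \langle \phi|(B - \langle B\rangle I)|(A - \langle A\rangle I)|\phi\rangle = \langle BA\rangle - \langle A\rangle \langle B\rangle. \tag{9.3.4}$$

Therefore, we obtain

$$\Im(c) = \frac{1}{2i} (c - \overline{c}) = \frac{1}{2i} \langle [A, B]\rangle. \tag{9.3.5}$$

Inserting (9.3.5) into (9.3.2) and using $|c|^2 \ge (\Im(c))^2$, we arrive at the inequality

$$\sigma_A^2 \sigma_B^2 \ge \left( \frac{1}{2i} \langle [A, B]\rangle \right)^2, \tag{9.3.6}$$

which roughly says that if two observables are non-commutative, we cannot achieve simultaneous high-precision measurements for them. To put the statement another way, if we know one observable with high precision, we do not know the other observable at the same time with high precision. This fact, in particular the inequality (9.3.6), is known in quantum mechanics as the Heisenberg uncertainty principle.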
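As a small numerical illustration of (9.3.6), the sketch below takes two Pauli matrices as the observables and a basis vector as the state. These choices, and the helper names, are assumptions made for the example; the variances are computed as $\langle M^2\rangle - \langle M\rangle^2$, which agrees with (9.3.1) after expanding the product.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expect(M, phi):
    """Expected value <phi|M|phi> as a complex number."""
    M_phi = [sum(M[i][j] * phi[j] for j in range(len(phi))) for i in range(len(phi))]
    return sum(complex(a).conjugate() * b for a, b in zip(phi, M_phi))

def variance(M, phi):
    """<M^2> - <M>^2 for a Hermitian matrix M in the unit state phi."""
    return (expect(mat_mul(M, M), phi) - expect(M, phi) ** 2).real

A = [[0, 1], [1, 0]]        # Pauli matrix sigma_1
B = [[0, -1j], [1j, 0]]     # Pauli matrix sigma_2
phi = [1, 0]                # assumed unit state vector

AB, BA = mat_mul(A, B), mat_mul(B, A)
commutator = [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]

lhs = variance(A, phi) * variance(B, phi)
rhs = ((expect(commutator, phi) / 2j) ** 2).real
```

For this particular pair and state both sides equal 1, so the inequality holds with equality; other states or observables generally give a strict inequality.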

On the other hand, when $A$ and $B$ commute, we know that $A$ and $B$ share the same eigenstates which may form an orthonormal basis of $\mathbb{C}^n$. Let $\phi$ be a commonly shared eigenstate so that

$$A|\phi\rangle = \lambda_A |\phi\rangle, \quad B|\phi\rangle = \lambda_B |\phi\rangle, \quad \lambda_A, \lambda_B \in \mathbb{R}. \tag{9.3.7}$$

If the system lies in the state $|\phi\rangle$, then, with simultaneous full certainty, the measured values of the observables $A$ and $B$ are $\lambda_A$ and $\lambda_B$, respectively.

Let $k \ge 1$ be an integer and define the $k$th moment of an observable $A$ in the state $|\phi\rangle$ by

$$\langle A^k\rangle = \langle \phi|A^k|\phi\rangle. \tag{9.3.8}$$


Thus, in (9.3.3), when we set $B = A$, we see that the variance $\sigma_A^2$ of $A$ in the state $|\phi\rangle$ may be computed using the formula

$$\sigma_A^2 = \langle A^2\rangle - \langle A\rangle^2. \tag{9.3.9}$$

In probability theory, the square root of the variance, $\sigma_A = \sqrt{\sigma_A^2}$, is called the standard deviation. In quantum mechanics, $\sigma_A$ is also called the uncertainty, which measures the randomness of the observed values of the observable $A$.

It will be instructive to identify those states which render the maximum uncertainty. To simplify our discussion, we shall assume that $A \in \mathbb{C}(n, n)$ has $n$ distinct eigenvalues $\lambda_1, \dots, \lambda_n$. As before, we use $|u_1\rangle, \dots, |u_n\rangle$ to denote the corresponding eigenstates of $A$ which form an orthonormal basis of the state space $\mathbb{C}^n$. In order to emphasize the dependence of the uncertainty on the underlying state $|\phi\rangle$, we use $\sigma_{A,|\phi\rangle}^2$ to denote the associated variance. We are to solve the problem

$$\max\{\sigma_{A,|\phi\rangle}^2 \mid \langle \phi|\phi\rangle = 1\}. \tag{9.3.10}$$

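For this purpose, we write any normalized state vector $|\phi\rangle$ as

$$|\phi\rangle = \sum_{i=1}^{n} \phi_i |u_i\rangle. \tag{9.3.11}$$

Then $\langle A^2\rangle$ and $\langle A\rangle$ are given by

$$\langle A^2\rangle = \sum_{i=1}^{n} \lambda_i^2 |\phi_i|^2, \quad \langle A\rangle = \sum_{i=1}^{n} \lambda_i |\phi_i|^2. \tag{9.3.12}$$

Hence, we have

$$\sigma_{A,|\phi\rangle}^2 = \sum_{i=1}^{n} \lambda_i^2 |\phi_i|^2 - \left( \sum_{i=1}^{n} \lambda_i |\phi_i|^2 \right)^2. \tag{9.3.13}$$

To ease computation, we replace $|\phi_i|$ by $x_i \in \mathbb{R}$ ($i = 1, \dots, n$) and consider instead the constrained maximization problem

$$\begin{cases} \max \left\{ \displaystyle\sum_{i=1}^{n} \lambda_i^2 x_i^2 - \left( \sum_{i=1}^{n} \lambda_i x_i^2 \right)^2 \right\}, \\ \displaystyle\sum_{i=1}^{n} x_i^2 = 1. \end{cases} \tag{9.3.14}$$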


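Thus, using calculus, the maximum points are to be sought among the solutions of the equations

$$x_i \left( \lambda_i^2 - 2\lambda_i \left[ \sum_{j=1}^{n} \lambda_j x_j^2 \right] - \xi \right) = 0, \quad i = 1, \dots, n, \tag{9.3.15}$$

where $\xi \in \mathbb{R}$ is a Lagrange multiplier. Multiplying this equation by $x_i$ and summing over $i = 1, \dots, n$, we find

$$\xi = \langle A^2\rangle - 2\langle A\rangle^2. \tag{9.3.16}$$

Consequently (9.3.15) takes the form

$$x_i \left( \lambda_i^2 - 2\langle A\rangle \lambda_i - [\langle A^2\rangle - 2\langle A\rangle^2] \right) = 0, \quad i = 1, \dots, n. \tag{9.3.17}$$

On the other hand, since the quadratic equation

$$\lambda^2 - 2\langle A\rangle \lambda - (\langle A^2\rangle - 2\langle A\rangle^2) = 0 \tag{9.3.18}$$

has two real roots

$$\lambda = \langle A\rangle \pm \sqrt{\langle A^2\rangle - \langle A\rangle^2} = \langle A\rangle \pm \sigma_A \tag{9.3.19}$$

in the nontrivial situation $\sigma_A > 0$, and the eigenvalues $\lambda_1, \dots, \lambda_n$ are distinct, we see that there are at least $n - 2$ values of $i = 1, \dots, n$ such that

$$\lambda_i^2 - 2\langle A\rangle \lambda_i - (\langle A^2\rangle - 2\langle A\rangle^2) \ne 0, \tag{9.3.20}$$

which leads us to conclude in view of (9.3.17) that $x_i = 0$ at those values of $i$. For definiteness, we assume $x_i = 0$ when $i \ne 1, 2$. Hence (9.3.14) is reduced into

$$\begin{cases} \max \left\{ \lambda_1^2 x_1^2 + \lambda_2^2 x_2^2 - \left( \lambda_1 x_1^2 + \lambda_2 x_2^2 \right)^2 \right\}, \\ x_1^2 + x_2^2 = 1. \end{cases} \tag{9.3.21}$$

Using the constraint in (9.3.21), we may simplify the objective function of the problem (9.3.21) into the form

$$(\lambda_1 - \lambda_2)^2 (x_1^2 - x_1^4), \tag{9.3.22}$$

which may be further maximized to give us the solution

$$|x_1| = |x_2| = \frac{1}{\sqrt{2}}. \tag{9.3.23}$$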

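In this case, it is straightforward to check that $\lambda_1$ and $\lambda_2$ are indeed the two roots (9.3.19) of equation (9.3.18). In particular,

$$\sigma_{A,|\phi\rangle}^2 = \frac{1}{4} (\lambda_1 - \lambda_2)^2. \tag{9.3.24}$$

Consequently, if we use $\lambda_{\min}$ and $\lambda_{\max}$ to denote the smallest and largest eigenvalues of $A$, then we see in view of (9.3.24) that the maximum uncertainty is given by

$$\sigma_{A,\max} = \frac{1}{2} (\lambda_{\max} - \lambda_{\min}), \tag{9.3.25}$$

which is achieved when the system occupies the maximum uncertainty state

$$|\phi_{\max}\rangle = a|u_{\lambda_{\max}}\rangle + b|u_{\lambda_{\min}}\rangle, \quad a, b \in \mathbb{C}, \quad |a| = |b| = \frac{1}{\sqrt{2}}. \tag{9.3.26}$$

In other words, we have the result

$$\sigma_{A,|\phi_{\max}\rangle} = \sigma_{A,\max}. \tag{9.3.27}$$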
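The bound (9.3.25) can be probed numerically through (9.3.13), since the variance depends on the state only through the probabilities $|\phi_i|^2$. The sketch below assumes a hypothetical list of four distinct eigenvalues and samples random probability vectors; random sampling is used here only as a sanity check and is not part of the derivation above.

```python
import random

def variance(eigs, probs):
    """Variance from (9.3.13), with probs[i] playing the role of |phi_i|^2."""
    mean = sum(lam * p for lam, p in zip(eigs, probs))
    second = sum(lam * lam * p for lam, p in zip(eigs, probs))
    return max(0.0, second - mean ** 2)

eigs = [-1.0, 0.5, 2.0, 4.0]                 # hypothetical distinct eigenvalues
bound = (max(eigs) - min(eigs)) / 2          # right-hand side of (9.3.25)

# Probabilities of the state (9.3.26): weight 1/2 on the extreme eigenvalues.
probs_max = [0.5 if lam in (min(eigs), max(eigs)) else 0.0 for lam in eigs]
sigma_at_max_state = variance(eigs, probs_max) ** 0.5

# Uncertainty of many randomly drawn states; none should exceed the bound.
random.seed(0)
largest_sampled = 0.0
for _ in range(10000):
    weights = [random.random() for _ in eigs]
    total = sum(weights)
    probs = [w / total for w in weights]
    largest_sampled = max(largest_sampled, variance(eigs, probs) ** 0.5)
```

The state (9.3.26) attains the bound, while every sampled state stays at or below it.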

Exercises

9.3.1 Let $A, B \in \mathbb{C}(2, 2)$ be two observables given by

$$A = \begin{pmatrix} 1 & i \\ -i & 2 \end{pmatrix}, \quad B = \begin{pmatrix} -2 & 1 - i \\ 1 + i & 3 \end{pmatrix}. \tag{9.3.28}$$

Evaluate the quantities $\sigma_A^2$, $\sigma_B^2$, and $\langle [A, B]\rangle$ in the state

$$|\phi\rangle = \frac{1}{\sqrt{5}} \begin{pmatrix} -1 \\ 2 \end{pmatrix}, \tag{9.3.29}$$

and use them to check the uncertainty principle (9.3.6).

9.3.2 Consider an observable $A \in \mathbb{C}(n, n)$, which has $n$ distinct eigenvalues $\lambda_1, \dots, \lambda_n$. Let $|u_1\rangle, \dots, |u_n\rangle$ be the corresponding eigenstates of $A$ which form an orthonormal basis of $\mathbb{C}^n$. Consider the uniform state given by

$$|\phi\rangle = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} |u_i\rangle. \tag{9.3.30}$$

(i) Compute the uncertainty of $A$ in the state $|\phi\rangle$.

(ii) Compare your result with the maximum uncertainty given in the expression (9.3.25).


9.3.3 Let $A \in \mathbb{C}(3, 3)$ be an observable given by

$$A = \begin{pmatrix} 1 & 2 + i & 0 \\ 2 - i & -3 & 0 \\ 0 & 0 & 5 \end{pmatrix}. \tag{9.3.31}$$

(i) Compute the maximum uncertainty of $A$.

(ii) Find all maximum uncertainty states of $A$.

(iii) Let $|\phi\rangle$ be the uniform state defined in Exercise 9.3.2. Find the uncertainty of $A$ in the state $|\phi\rangle$ and compare it with the maximum uncertainty of $A$ found in (i).

9.4 Heisenberg picture for quantum mechanics

So far our description of quantum mechanics has been based on the Schrödinger equation, which governs the evolution of a state vector. Such a description of quantum mechanics is also called a Schrödinger picture. Here we study another important description of quantum mechanics called the Heisenberg picture within which the state vector is time-independent but observables evolve with respect to time following a dynamical equation similar to that seen in classical mechanics.

We start from the Schrödinger equation (9.2.7) defined by the Hamiltonian $H$. Let $|\phi(t)\rangle$ be a state vector and $|\phi(0)\rangle = |\phi_0\rangle$. Then

$$|\phi(t)\rangle = e^{-\frac{i}{\hbar} tH} |\phi_0\rangle. \tag{9.4.1}$$

Thus, for any observable $A$, its expected value in the state $|\phi(t)\rangle$ is given by

$$\langle A\rangle(t) = \langle \phi(t)|A|\phi(t)\rangle = \langle \phi_0| e^{\frac{i}{\hbar} tH} A e^{-\frac{i}{\hbar} tH} |\phi_0\rangle. \tag{9.4.2}$$

On the other hand, if we use the time-independent state vector, $|\phi_0\rangle$, and replace $A$ with a correctly formulated time-dependent version, $A(t)$, we are to have the same mechanical conclusion. In particular, the expected value of $A$ at time $t$ in the state $|\phi(t)\rangle$ must be equal to the expected value of $A(t)$ in the state $|\phi_0\rangle$. That is,

$$\langle \phi(t)|A|\phi(t)\rangle = \langle \phi_0|A(t)|\phi_0\rangle. \tag{9.4.3}$$

Comparing (9.4.2) and (9.4.3), we obtain

$$\langle \phi_0| e^{\frac{i}{\hbar} tH} A e^{-\frac{i}{\hbar} tH} - A(t) |\phi_0\rangle = 0. \tag{9.4.4}$$

Therefore, using the arbitrariness of $|\phi_0\rangle$, we arrive at the relation

$$A(t) = e^{\frac{i}{\hbar} tH} A e^{-\frac{i}{\hbar} tH}, \tag{9.4.5}$$

which indicates how an observable should evolve with respect to time. Differentiating (9.4.5), we are led to the equation

$$\frac{d}{dt} A(t) = \frac{i}{\hbar} [H, A(t)], \tag{9.4.6}$$

which spells out the dynamical law that a time-dependent observable must follow and is known as the Heisenberg equation.
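The passage from (9.4.5) to (9.4.6) can be checked numerically in a simple special case. The sketch below assumes a diagonal Hamiltonian with hypothetical energies, so that the exponentials in (9.4.5) are diagonal and entry $(j, k)$ of $A(t)$ is just a phase times entry $(j, k)$ of $A$; it also assumes units with the Planck constant equal to 1. All names are illustrative.

```python
import cmath

HBAR = 1.0                 # assumed units; not a value given in the text
energies = [1.0, 3.0]      # hypothetical H = diag(1, 3)
A0 = [[1.0, 2 - 1j],
      [2 + 1j, -1.0]]      # hypothetical Hermitian observable at t = 0

def heisenberg(t):
    """A(t) from (9.4.5) for diagonal H: entry (j, k) gains exp(i t (E_j - E_k) / hbar)."""
    return [[cmath.exp(1j * t * (energies[j] - energies[k]) / HBAR) * A0[j][k]
             for k in range(2)] for j in range(2)]

def commutator_with_H(A):
    """[H, A] for diagonal H: entry (j, k) equals (E_j - E_k) * A_jk."""
    return [[(energies[j] - energies[k]) * A[j][k] for k in range(2)]
            for j in range(2)]

t, dt = 0.4, 1e-5
ahead, behind = heisenberg(t + dt), heisenberg(t - dt)
comm = commutator_with_H(heisenberg(t))

# Compare a central difference for dA/dt with (i / hbar) [H, A(t)].
residual = max(
    abs((ahead[j][k] - behind[j][k]) / (2 * dt) - (1j / HBAR) * comm[j][k])
    for j in range(2) for k in range(2)
)
```

The residual is of the size of the finite-difference error only, consistent with $A(t)$ in (9.4.5) solving the Heisenberg equation.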

We next show that (9.4.6) implies (9.2.7) as well.

As preparation, we establish the following Gronwall inequality which is useful in the study of differential equations: If $f(t)$ and $g(t)$ are continuous non-negative functions in $t \ge 0$ and satisfy

$$f(t) \le a + \int_0^t f(\tau) g(\tau) \, d\tau, \quad t \ge 0, \tag{9.4.7}$$

for some constant $a \ge 0$, then

$$f(t) \le a \exp\left\{ \int_0^t g(\tau) \, d\tau \right\}, \quad t \ge 0. \tag{9.4.8}$$

To prove it, we modify (9.4.7) into

$$f(t) < a + \varepsilon + \int_0^t f(\tau) g(\tau) \, d\tau, \quad t \ge 0, \tag{9.4.9}$$

and set

$$h(t) = a + \varepsilon + \int_0^t f(\tau) g(\tau) \, d\tau, \quad t \ge 0, \tag{9.4.10}$$

where $\varepsilon > 0$. Therefore $h(t)$ is positive-valued and differentiable with $h'(t) = f(t) g(t)$ for $t > 0$ and $h(0) = a + \varepsilon$. Multiplying (9.4.9) by $g(t)$, we have

$$\frac{h'(t)}{h(t)} \le g(t), \quad t \ge 0. \tag{9.4.11}$$

Integrating (9.4.11), we obtain

$$h(t) \le h(0) \exp\left\{ \int_0^t g(\tau) \, d\tau \right\}, \quad t \ge 0. \tag{9.4.12}$$

However, (9.4.9) and (9.4.10) indicate that $f(t) < h(t)$ ($t \ge 0$). Hence we may use (9.4.12) to get

$$f(t) < (a + \varepsilon) \exp\left\{ \int_0^t g(\tau) \, d\tau \right\}, \quad t \ge 0. \tag{9.4.13}$$

Finally, since $\varepsilon > 0$ is arbitrary, we see that (9.4.8) follows.


We now turn our attention back to the Heisenberg equation (9.4.6). Suppose that $B(t)$ is another solution of the equation. Then $C(t) = A(t) - B(t)$ satisfies

\[ \frac{d}{dt}C(t) = \frac{i}{\hbar}[H, C(t)], \quad C(0) = A(0) - B(0). \tag{9.4.14} \]

Therefore, we have

\[ \|C'(t)\| \le \frac{2}{\hbar}\|H\|\|C(t)\|. \tag{9.4.15} \]

On the other hand, we may use the triangle inequality to get

‖C(t + h)‖ − ‖C(t)‖ ≤ ‖C(t + h)− C(t)‖, h > 0, (9.4.16)

which allows us to conclude with

\[ \frac{d}{dt}\|C(t)\| \le \|C'(t)\|. \tag{9.4.17} \]

Combining (9.4.15) with (9.4.17) and integrating, we find

\[ \|C(t)\| \le \|C(0)\| + \frac{2}{\hbar}\int_0^t \|H\|\|C(\tau)\|\,d\tau, \quad t \ge 0. \tag{9.4.18} \]

Consequently, it follows from applying the Gronwall inequality that

\[ \|C(t)\| \le \|C(0)\|e^{\frac{2}{\hbar}\|H\|t}, \quad t \ge 0. \tag{9.4.19} \]

The same argument may be carried out in the domain $t \le 0$ with the time flipping $t \mapsto -t$. Thus, in summary, we obtain the collective conclusion

\[ \|C(t)\| \le \|C(0)\|e^{\frac{2}{\hbar}\|H\||t|}, \quad t \in \mathbb{R}. \tag{9.4.20} \]

In particular, if $C(0) = 0$, then $C(t) \equiv 0$, which implies that the solution to the initial value problem of the Heisenberg equation (9.4.6) is unique. Hence, if $A(0) = A$, then the solution is uniquely given by (9.4.5). As a consequence, if $A$ commutes with $H$, then $A(t) = A$ for all time $t$. In other words, if an observable commutes with the Hamiltonian initially, it continues to commute with the Hamiltonian and in fact remains constant for all time.

We can now derive the Schrödinger equation (9.2.7) from the Heisenberg equation (9.4.6).

In fact, let $A(t)$ be the unique solution of (9.4.6) evolving from its initial state $A$. Then $A(t)$ is given by (9.4.5). Let $|\phi(t)\rangle$ denote the state vector that evolves with respect to $t$ from its initial state vector $|\phi_0\rangle$ so that it gives rise to the same expected value as that evaluated using the Heisenberg equation through $A(t)$. Then (9.4.3) holds. We wish to examine to what extent the relation (9.4.1) must be valid. To this end, we assume that

〈u|A|u〉 = 〈v|A|v〉 (9.4.21)

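holds for any Hermitian matrix $A \in \mathbb{C}(n, n)$ for some $|u\rangle, |v\rangle \in \mathbb{C}^n$ and we investigate how $|u\rangle$ and $|v\rangle$ are related.

If $|v\rangle = 0$, then $\langle u|A|u\rangle = 0$ for any Hermitian matrix $A$, which implies $|u\rangle = 0$ as well. Thus, in the following, we only consider the nontrivial situation $|u\rangle \ne 0$, $|v\rangle \ne 0$.

Since $|v\rangle \ne 0$, we set $V = \mathrm{Span}\{|v\rangle\}$ and $W = V^{\perp}$. Choose a Hermitian matrix $A$ so that $|x\rangle \mapsto A|x\rangle$ ($|x\rangle \in \mathbb{C}^n$) defines the projection of $\mathbb{C}^n$ onto $V$ along $W$. Write $|u\rangle = a|v\rangle + |w\rangle$, where $a \in \mathbb{C}$ and $|w\rangle \in W$. Then $A|u\rangle = a|v\rangle$. Thus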

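\[ \langle u|A|u\rangle = |a|^2\langle v|v\rangle = |a|^2\langle v|A|v\rangle, \tag{9.4.22} \]

which leads to

\[ |a| = 1 \quad\text{or}\quad a = e^{i\theta}, \quad \theta \in \mathbb{R}. \tag{9.4.23} \]

Moreover, let $A \in \mathbb{C}(n, n)$ be a Hermitian matrix so that $|x\rangle \mapsto A|x\rangle$ ($|x\rangle \in \mathbb{C}^n$) defines the projection of $\mathbb{C}^n$ onto $W$ along $V$. Then $A|v\rangle = 0$ and $A|w\rangle = |w\rangle$. Inserting these into (9.4.21), we find

\[ 0 = \langle v|A|v\rangle = \langle u|A|u\rangle = \langle w|w\rangle, \tag{9.4.24} \]

which gives us the result $|w\rangle = 0$. In other words, that (9.4.21) holds for any Hermitian matrix $A \in \mathbb{C}(n, n)$ implies that $|u\rangle$ and $|v\rangle$ differ from each other by a phase factor $a$ satisfying (9.4.23). That is,

\[ |u\rangle = e^{i\theta}|v\rangle, \quad \theta \in \mathbb{R}. \tag{9.4.25} \]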

Now inserting (9.4.5) into (9.4.3), we obtain

〈φ(t)|A|φ(t)〉 = 〈ψ(t)|A|ψ(t)〉, (9.4.26)

where

\[ |\psi(t)\rangle = e^{-\frac{i}{\hbar}tH}|\phi_0\rangle. \tag{9.4.27} \]

Consequently, in view of the conclusion (9.4.25), we arrive at the relation

\[ |\phi(t)\rangle = e^{i\theta(t)}e^{-\frac{i}{\hbar}tH}|\phi_0\rangle, \tag{9.4.28} \]

where $\theta(t)$ is a real-valued function of $t$, which simply cancels itself out in (9.4.26) and may well be taken to be zero. In other words, we are prompted to conclude that the state vector should follow the law of evolution given simply by (9.4.1), that is, by (9.4.28) with $\theta(t) \equiv 0$, which is the unique solution of the Schrödinger equation (9.2.7) subject to the initial condition $|\phi(0)\rangle = |\phi_0\rangle$. Thus the Schrödinger equation inevitably comes into being as anticipated.


Exercises

9.4.1 Consider the Heisenberg equation (9.4.6) subject to the initial condition $A(0) = A_0$. An integration of (9.4.6) gives

\[ A(t) = A_0 + \frac{i}{\hbar}\int_0^t [H, A(\tau)]\,d\tau. \tag{9.4.29} \]

(i) From (9.4.29) derive the result

\[ [H, A(t)] = [H, A_0] + \frac{i}{\hbar}\int_0^t [H, [H, A(\tau)]]\,d\tau. \tag{9.4.30} \]

(ii) Use (9.4.30) and the Gronwall inequality to come up with an alternative proof that $H$ and $A(t)$ commute for all time if and only if they do so initially.

9.4.2 Consider the Hamiltonian $H$ and an observable $A$ given by

\[ H = \begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 1 - i \\ 1 + i & 1 \end{pmatrix}. \tag{9.4.31} \]

(i) Solve the Schrödinger equation (9.2.7) to obtain the time-dependent state $|\phi(t)\rangle$ evolving from the initial state

\[ |\phi_0\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}. \tag{9.4.32} \]

(ii) Evaluate the expected value of the observable $A$ assuming that the system lies in the state $|\phi(t)\rangle$.

(iii) Solve the Heisenberg equation (9.4.6) with the initial condition $A(0) = A$ and use it to evaluate the expected value of the same observable within the Heisenberg picture. That is, compute the quantity $\langle\phi_0|A(t)|\phi_0\rangle$. Compare the result with that obtained in (ii) and explain.

Solutions to selected exercises

Section 1.1

1.1.2 Let $n$ satisfy $1 \le n < p$ and write $2n = kp + l$ for some integers $k$ and $l$ where $0 \le l < p$. Then $n + (n - l) = kp$. Thus $[n - l] = -[n]$.

If $[n] \ne [0]$, then $n$ is not a multiple of $p$. Since $p$ is a prime, the greatest common divisor of $n$ and $p$ is 1. Thus there are integers $k, l$ such that

\[ kp + ln = 1. \tag{S1} \]

Consequently $[l]$ is the multiplicative inverse of $[n]$. That is, $[l] = [n]^{-1}$.

We may also prove the existence of $[n]^{-1}$ without using (S1) but by using an interesting statement called Fermat's Little Theorem: For any nonnegative integer $m$ and prime $p$, the integer $m^p - m$ is divisible by $p$. We prove this theorem by induction. When $m = 0$, there is nothing to show. Assume the statement is true at $m \ge 0$. At $m + 1$ we have

\[ (m + 1)^p - (m + 1) = \sum_{k=1}^{p-1}\binom{p}{k}m^k + (m^p - m), \tag{S2} \]

which is clearly divisible by $p$ in view of the inductive assumption and the definition of binomial coefficients (for a prime $p$, each $\binom{p}{k}$ with $1 \le k \le p - 1$ is a multiple of $p$). So the theorem follows.

Now we come back to the construction of $[n]^{-1}$. Since $p$ is a prime and divides $n^p - n = n(n^{p-1} - 1)$ but not $n$, $p$ divides $n^{p-1} - 1$. So there is an integer $k$ such that $n^{p-1} - 1 = kp$. Therefore $n\,n^{p-2} = 1$ modulo $p$. That is, $[n^{p-2}] = [n]^{-1}$.
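The formula $[n]^{-1} = [n^{p-2}]$ translates directly into a computation. The sketch below uses Python's built-in three-argument `pow` for modular exponentiation; the function name `inverse_mod_prime` and the choice $p = 7$ are illustrative and not part of the text.

```python
def inverse_mod_prime(n, p):
    """Inverse of [n] in Z_p via [n]^{-1} = [n^(p-2)].

    Assumes p is prime and p does not divide n; primality is not checked.
    """
    if n % p == 0:
        raise ValueError("[0] has no multiplicative inverse")
    return pow(n, p - 2, p)

p = 7
inverses = {n: inverse_mod_prime(n, p) for n in range(1, p)}

# Fermat's Little Theorem itself: m^p - m is divisible by p.
fermat_holds = all((m ** p - m) % p == 0 for m in range(20))

print(inverses, fermat_holds)
```

For each nonzero class, multiplying `n` by `inverses[n]` and reducing modulo `p` should give 1.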

1.1.7 Multiplying the relation AB = I by C from the left we have C(AB) =C which gives us B = IB = (CA)B = C(AB) = C.

Section 1.2

1.2.7 We have

\[ u^t v = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}(b_1, \dots, b_n) = \begin{pmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_n \\ a_2b_1 & a_2b_2 & \cdots & a_2b_n \\ \cdots & \cdots & \cdots & \cdots \\ a_nb_1 & a_nb_2 & \cdots & a_nb_n \end{pmatrix}, \]

whose $i$th row and $j$th row are

\[ a_i(b_1, b_2, \dots, b_n) \quad\text{and}\quad a_j(b_1, b_2, \dots, b_n), \]

which are clearly linearly dependent. For column vectors we proceed similarly.

1.2.8 If either $U_1 \subset U_2$ or $U_2 \subset U_1$, the statement is trivially true. Suppose otherwise and pick $u_1 \in U_1$ but $u_1 \notin U_2$ and $u_2 \in U_2$ but $u_2 \notin U_1$. We assert that $u = u_1 + u_2 \notin U_1 \cup U_2$. If $u \in U_1 \cup U_2$ then $u \in U_1$ or $u \in U_2$. So, respectively, $u_2 = u + (-u_1) \in U_1$ or $u_1 = u + (-u_2) \in U_2$, which is false.

1.2.9 Without loss of generality we may assume that there are no such $i, j$, $i, j = 1, \dots, k$, $i \ne j$, that $U_i \subset U_j$. Thus, in view of the previous exercise, we know that for $k = 2$ we can find $u_1 \in U_1$ and $u_2 \in U_2$ such that $u = u_1 + u_2 \notin U_1 \cup U_2$. Assume that the statement of the problem is true at $k = m \ge 2$. We proceed to prove the statement at $k = m + 1$.

Assume otherwise that $U = U_1 \cup \cdots \cup U_{m+1}$. By the inductive assumption there is some $u \in U$ such that

\[ u \notin U_1 \cup \cdots \cup U_m. \tag{S3} \]

Thus $u \in U_{m+1}$. Pick $v \notin U_{m+1}$ and consider the following $m + 1$ vectors

\[ w_1 = u + v, \quad w_2 = 2u + v, \quad \dots, \quad w_{m+1} = (m + 1)u + v. \]

There is some $i = 1, \dots, m + 1$ such that $w_i \notin U_1 \cup \cdots \cup U_m$. Otherwise, if $w_i \in U_1 \cup \cdots \cup U_m$ for all $i = 1, \dots, m + 1$, then there are $i, j = 1, \dots, m + 1$, $i \ne j$, such that $w_i, w_j$ lie in one of the subspaces $U_1, \dots, U_m$, say $U_l$ ($1 \le l \le m$), which leads to $w_i - w_j = (i - j)u \in U_l$ or $u \in U_l$, a contradiction to (S3).

Now assume $w_i \notin U_1 \cup \cdots \cup U_m$ for some $i = 1, \dots, m + 1$. Thus $w_i \in U_{m+1}$. So $v = w_i + (-iu) \in U_{m+1}$ since $u \in U_{m+1}$, which is again false.

Section 1.4

1.4.3 Without loss of generality assume $f \ne 0$. Then there is some $u \in U$ such that $f(u) \ne 0$. For any $v \in U$ consider

\[ w = v - \frac{f(v)}{f(u)}u. \]

We have $f(w) = 0$. Hence $g(w) = 0$ as well, which gives us

\[ g(v) = \frac{g(u)}{f(u)}f(v) \equiv af(v), \quad v \in U. \]

That is, $g = af$.

1.4.4 Let $\{v_1, \dots, v_{n-1}\}$ be a basis of $V$. Extend it to get a basis of $\mathbb{F}^n$, say $\{v_1, \dots, v_{n-1}, v_n\}$. Let $\{f_1, \dots, f_{n-1}, f_n\}$ be a basis of $(\mathbb{F}^n)'$ dual to $\{v_1, \dots, v_{n-1}, v_n\}$. It is clear that $V^0 = \mathrm{Span}\{f_n\}$. On the other hand, for any $(x_1, \dots, x_n) \in \mathbb{F}^n$ we have

\[ v = (x_1, x_2, \dots, x_n) - (x_1 + x_2 + \cdots + x_n, 0, \dots, 0) \in V. \]

So

\[ 0 = f_n(v) = f_n(x_1, \dots, x_n) - (x_1 + \cdots + x_n)f_n(e_1), \quad (x_1, \dots, x_n) \in \mathbb{F}^n. \tag{S4} \]

For any $f \in V^0$, there is some $a \in \mathbb{F}$ such that $f = af_n$. Hence in view of (S4) we obtain

\[ f(x_1, \dots, x_n) = af_n(e_1)\sum_{i=1}^n x_i, \quad (x_1, \dots, x_n) \in \mathbb{F}^n. \]

Section 1.7

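1.7.2 Assume the nontrivial situation $u \ne 0$. It is clear that $\|u\|_p \le (b - a)^{\frac{1}{p}}\|u\|_\infty$ for any $p \ge 1$. Thus $\limsup_{p\to\infty}\|u\|_p \le \|u\|_\infty$.

Let $t_0 \in [a, b]$ be such that $|u(t_0)| = \|u\|_\infty > 0$. For any $\varepsilon \in (0, \|u\|_\infty)$ we can find an interval around $t_0$, say $I_\varepsilon \subset [a, b]$, such that $|u(t)| > \|u\|_\infty - \varepsilon$ when $t \in I_\varepsilon$. Thus

\[ \|u\|_p \ge \left(\int_{I_\varepsilon}|u(t)|^p\,dt\right)^{\frac{1}{p}} \ge (\|u\|_\infty - \varepsilon)|I_\varepsilon|^{\frac{1}{p}}, \tag{S5} \]

where $|I_\varepsilon|$ denotes the length of the interval $I_\varepsilon$. Letting $p \to \infty$ in (S5) we find $\liminf_{p\to\infty}\|u\|_p \ge \|u\|_\infty - \varepsilon$. Since $\varepsilon > 0$ may be chosen arbitrarily small, we arrive at $\liminf_{p\to\infty}\|u\|_p \ge \|u\|_\infty$.

Thus the limit $\|u\|_p \to \|u\|_\infty$ as $p \to \infty$ follows.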


1.7.3 (i) Positivity: Of course $\|u'\|' \ge 0$ for any $u' \in U'$. If $\|u'\|' = 0$, then $|u'(u)| = 0$ for any $u \in U$ satisfying $\|u\| = 1$. Thus, for any $v \in U$, $v \ne 0$, we have $u'(v) = \|v\|\,u'\!\left(\frac{1}{\|v\|}v\right) = 0$, which shows $u'(v) = 0$ for any $v \in U$. So $u' = 0$.

(ii) Homogeneity: For any $a \in \mathbb{F}$ and $u' \in U'$, we have

\[ \|au'\|' = \sup\{|au'(u)| \mid u \in U, \|u\| = 1\} = |a|\sup\{|u'(u)| \mid u \in U, \|u\| = 1\} = |a|\|u'\|'. \]

(iii) Triangle inequality: For any $u', v' \in U'$, we have

\[ \begin{aligned} \|u' + v'\|' &= \sup\{|u'(u) + v'(u)| \mid u \in U, \|u\| = 1\} \\ &\le \sup\{|u'(u)| + |v'(u)| \mid u \in U, \|u\| = 1\} \\ &\le \sup\{|u'(u)| \mid u \in U, \|u\| = 1\} + \sup\{|v'(v)| \mid v \in U, \|v\| = 1\} \\ &= \|u'\|' + \|v'\|'. \end{aligned} \]

1.7.4 (i) Positivity: Of course $\|u\| \ge 0$. If $\|u\| = 0$, then $|u'(u)| = 0$ for all $u' \in U'$ satisfying $\|u'\|' = 1$. Then for any $v' \in U'$, $v' \ne 0$, we have $v'(u) = \|v'\|'\left(\frac{1}{\|v'\|'}v'\right)(u) = 0$, which shows $v'(u) = 0$ for any $v' \in U'$. Hence $u = 0$.

(ii) Homogeneity: Let $a \in \mathbb{F}$ and $u \in U$. We have

\[ \|au\| = \sup\{|u'(au)| \mid u' \in U', \|u'\|' = 1\} = |a|\sup\{|u'(u)| \mid u' \in U', \|u'\|' = 1\} = |a|\|u\|. \]

(iii) Triangle inequality: For $u, v \in U$, we have

\[ \begin{aligned} \|u + v\| &= \sup\{|u'(u + v)| \mid u' \in U', \|u'\|' = 1\} \\ &\le \sup\{|u'(u)| + |u'(v)| \mid u' \in U', \|u'\|' = 1\} \\ &\le \sup\{|u'(u)| \mid u' \in U', \|u'\|' = 1\} + \sup\{|v'(v)| \mid v' \in U', \|v'\|' = 1\} \\ &= \|u\| + \|v\|. \end{aligned} \]

Section 2.1

2.1.7 (i) Since $f, g \ne 0$ we have $\dim(f^0) = \dim(g^0) = n - 1$. Let $\{u_1, \dots, u_{n-1}\}$ be a basis of $f^0$ and take $v \in g^0$ but $v \notin f^0$. Then $u_1, \dots, u_{n-1}, v$ are linearly independent, and hence form a basis of $U$. In particular, $U = f^0 + g^0$.

(ii) From the dimensionality equation

\[ n = \dim(f^0) + \dim(g^0) - \dim(f^0 \cap g^0) \]

the answer follows.

2.1.8 We have $N(T) \subset N(T^2)$ and $R(T^2) \subset R(T)$. On the other hand, if $n = \dim(U)$, the rank equation gives us

\[ n(T) + r(T) = n = n(T^2) + r(T^2). \]

Thus $n(T) = n(T^2)$ if and only if $r(T) = r(T^2)$. So $N(T^2) = N(T)$ if and only if $R(T^2) = R(T)$.

2.1.10 Let $w_1, \dots, w_k \in W$ be a basis of $R(S \circ T)$. Then there are $u_1, \dots, u_k \in U$ such that $S(T(u_i)) = w_i$, $i = 1, \dots, k$. Let $v_i = T(u_i)$, $i = 1, \dots, k$. Then $\mathrm{Span}\{v_1, \dots, v_k\} \subset R(T)$. Hence $k \le r(T)$. Of course, $R(S \circ T) \subset R(S)$. So $k \le r(S)$ as well.

Next, let $u_1, \dots, u_k \in U$ form a basis of $N(T) \subset N(S \circ T)$. Let $z_1, \dots, z_l \in N(S \circ T)$ so that $u_1, \dots, u_k, z_1, \dots, z_l$ form a basis of $N(S \circ T)$. Of course $T(z_1), \dots, T(z_l) \in N(S)$. We assert that these are linearly independent. In fact, if there are scalars $a_1, \dots, a_l$ such that $a_1T(z_1) + \cdots + a_lT(z_l) = 0$, then $T(a_1z_1 + \cdots + a_lz_l) = 0$, which indicates that $a_1z_1 + \cdots + a_lz_l \in N(T)$. Hence $a_1z_1 + \cdots + a_lz_l = b_1u_1 + \cdots + b_ku_k$ for some scalars $b_1, \dots, b_k$. However, we know that $u_1, \dots, u_k, z_1, \dots, z_l$ are linearly independent. So the $a$'s and $b$'s are all zero. In particular, $T(z_1), \dots, T(z_l)$ are linearly independent. Hence $l \le n(S)$, which proves $k + l \le n(T) + n(S)$.

2.1.12 Define $T \in L(\mathbb{F}^m, \mathbb{F}^n)$ by $T(x) = Bx$, $x \in \mathbb{F}^m$. Then, since $R(T) \subset \mathbb{F}^n$, we have $r(T) \le n$. By the rank equation $n(T) + r(T) = m$ and the condition $m > n$, we see that $n(T) > 0$. Hence there is a nonzero vector $x \in \mathbb{F}^m$ such that $T(x) = 0$ or $Bx = 0$. Thus $(AB)x = A(Bx) = 0$, which proves that the $m \times m$ matrix $AB$ cannot be invertible.

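2.1.14 Use $\mathcal{N}$ and $\mathcal{R}$ to denote the null-space and range of a mapping. Let

\[ \mathcal{N}(R) \cap \mathcal{R}(S \circ T) = \mathrm{Span}\{w_1, \dots, w_k\} \subset W, \]

where $w_1, \dots, w_k$ are independent vectors. Expand $\{w_1, \dots, w_k\}$ to get a basis for $\mathcal{R}(S \circ T)$ so that

\[ \mathcal{R}(S \circ T) = \mathrm{Span}\{w_1, \dots, w_k, y_1, \dots, y_l\}. \]

Then $\{R(y_1), \dots, R(y_l)\}$ is a basis for $\mathcal{R}(R \circ S \circ T)$ since

\[ R(w_1) = \cdots = R(w_k) = 0 \]

and $R(y_1), \dots, R(y_l)$ are independent. In particular, $r(R \circ S \circ T) = l$. Since $\mathcal{R}(S \circ T) \subset \mathcal{R}(S)$, we can expand $\{w_1, \dots, w_k, y_1, \dots, y_l\}$ to get a basis for $\mathcal{R}(S)$ so that

\[ \mathcal{R}(S) = \mathrm{Span}\{w_1, \dots, w_k, y_1, \dots, y_l, z_1, \dots, z_m\}. \]

Now we can count the numbers as follows:

\[ \begin{aligned} r(R \circ S) &= \dim(\mathrm{Span}\{R(w_1), \dots, R(w_k), R(y_1), \dots, R(y_l), R(z_1), \dots, R(z_m)\}) \\ &= \dim(\mathrm{Span}\{R(y_1), \dots, R(y_l), R(z_1), \dots, R(z_m)\}) \le l + m, \\ r(S \circ T) &= k + l. \end{aligned} \]

So

\[ r(R \circ S) + r(S \circ T) \le l + m + k + l = (k + l + m) + l = r(S) + r(R \circ S \circ T). \]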

2.1.15 Define a mapping $T : \mathbb{F}^{k+l} \to V + W$ by

\[ T(y_1, \dots, y_k, z_1, \dots, z_l) = \sum_{i=1}^k y_iv_i + \sum_{j=1}^l z_jw_j, \quad (y_1, \dots, y_k, z_1, \dots, z_l) \in \mathbb{F}^{k+l}. \]

Then $N(T) = S$. From the rank equation we have $n(T) + r(T) = k + l$ or $\dim(S) + r(T) = k + l$. On the other hand, it is clear that $R(T) = V + W$. So $r(T) = \dim(V + W)$. From the dimensionality equation (1.5.6) we have $\dim(V + W) = \dim(V) + \dim(W) - \dim(V \cap W)$, or $r(T) + \dim(V \cap W) = k + l$. Therefore $\dim(S) = \dim(V \cap W)$.

Section 2.2

2.2.4 Define $T \in L(\mathbb{R}^2)$ by setting

\[ T(x) = \begin{pmatrix} a & b \\ c & d \end{pmatrix}x, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2. \]

Then we have

\[ T(e_1) = ae_1 + ce_2, \quad T(e_2) = be_1 + de_2, \]

and

\[ \begin{aligned} T(u_1) &= \tfrac{1}{2}(a + b + c + d)u_1 + \tfrac{1}{2}(a + b - c - d)u_2, \\ T(u_2) &= \tfrac{1}{2}(a - b + c - d)u_1 + \tfrac{1}{2}(a - b - c + d)u_2, \end{aligned} \]

so that the problem follows.

2.2.5 Let $A = (a_{ij})$ and define $T \in L(\mathbb{F}^n)$ by setting $T(x) = Ax$ where $x \in \mathbb{F}^n$ is taken to be a column vector. Then

\[ T(e_j) = \sum_{i=1}^n a_{ij}e_i, \quad j = 1, \dots, n. \]

Consider a new basis of $\mathbb{F}^n$ given by

\[ f_1 = e_n, \quad f_2 = e_{n-1}, \quad \dots, \quad f_n = e_1, \]

or $f_i = e_{n-i+1}$ for $i = 1, \dots, n$. Then we obtain

\[ T(f_j) = \sum_{i=1}^n a_{n-i+1,n-j+1}f_i \equiv \sum_{i=1}^n b_{ij}f_i, \quad j = 1, \dots, n. \]

Since $A$ and $B = (b_{ij})$ are the matrix representations of the mapping $T$ under the bases $\{e_1, \dots, e_n\}$ and $\{f_1, \dots, f_n\}$, respectively, we conclude that $A \sim B$.

Section 2.3

2.3.3 That the equation $T(u) = v$ has a solution for some $u \in U$ is equivalent to $v \in R(T)$, which is equivalent to $v \in N(T')^0$ (by Theorem 2.8), which is equivalent to $\langle v, v'\rangle = 0$ for all $v' \in N(T')$ or $T'(v') = 0$.

Section 2.4

2.4.3 Let $k = r(\tilde{T}) = \dim(R(\tilde{T}))$, where $\tilde{T}$ denotes the mapping induced by $T$ between the quotient spaces. Let $\{[v_1]_Y, \dots, [v_k]_Y\}$ be a basis of $R(\tilde{T})$ in $V/Y$. Then there are $[u_1]_X, \dots, [u_k]_X \in U/X$ such that

\[ \tilde{T}([u_1]_X) = [v_1]_Y, \quad \dots, \quad \tilde{T}([u_k]_X) = [v_k]_Y. \]

On the other hand, from the definition of $\tilde{T}$, we have $\tilde{T}([u_i]_X) = [T(u_i)]_Y$ for $i = 1, \dots, k$. We claim that $T(u_1), \dots, T(u_k)$ are linearly independent. In fact, let $a_1, \dots, a_k$ be scalars such that

\[ a_1T(u_1) + \cdots + a_kT(u_k) = 0. \]

Taking cosets, we get $a_1[T(u_1)]_Y + \cdots + a_k[T(u_k)]_Y = [0]_Y$, which leads to $a_1[v_1]_Y + \cdots + a_k[v_k]_Y = [0]_Y$. Thus $a_1 = \cdots = a_k = 0$. This proves $r(T) \ge k$.


Section 2.5

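2.5.12 (i) Assume $R(S) = R(T)$. Since $S$ projects $U$ onto $R(S) = R(T)$ along $N(S)$, we have for any $u \in U$ the result $S(T(u)) = T(u)$. Thus $S \circ T = T$. Similarly, $T \circ S = S$.

Conversely, if $S \circ T = T$, then $T(U) = (S \circ T)(U)$ implies $R(T) \subset R(S)$. Likewise, $T \circ S = S$ implies $R(S) \subset R(T)$. So $R(S) = R(T)$.

(ii) Assume $N(S) = N(T)$. For any $u \in U$, rewrite $u$ as $u = v + w$ with $v \in R(T)$ and $w \in N(T)$. Then $T(v) = v$, $T(w) = 0$, and $S(w) = 0$ give us

\[ (S \circ T)(u) = S(T(v) + T(w)) = S(v) = S(v + w) = S(u). \]

Hence $S \circ T = S$. Similarly, $T \circ S = T$.

Conversely, assume $S \circ T = S$, $T \circ S = T$. Let $u \in N(T)$. Then $S(u) = S(T(u)) = 0$. So $u \in N(S)$. So $N(T) \subset N(S)$. Interchanging $S$ and $T$, we get $N(S) \subset N(T)$.

2.5.14 (i) Assume $T^2 = T$. We have

\[ \begin{aligned} T^2(u) &= T(\langle u, u_1'\rangle u_1 + \langle u, u_2'\rangle u_2) \\ &= \langle u, u_1'\rangle T(u_1) + \langle u, u_2'\rangle T(u_2) \\ &= \langle u, u_1'\rangle(\langle u_1, u_1'\rangle u_1 + \langle u_1, u_2'\rangle u_2) + \langle u, u_2'\rangle(\langle u_2, u_1'\rangle u_1 + \langle u_2, u_2'\rangle u_2) \\ &= (\langle u, u_1'\rangle\langle u_1, u_1'\rangle + \langle u, u_2'\rangle\langle u_2, u_1'\rangle)u_1 + (\langle u, u_1'\rangle\langle u_1, u_2'\rangle + \langle u, u_2'\rangle\langle u_2, u_2'\rangle)u_2 \\ &= T(u) = \langle u, u_1'\rangle u_1 + \langle u, u_2'\rangle u_2. \end{aligned} \tag{S6} \]

Since $u_1, u_2$ are independent, we have

\[ \begin{aligned} \langle u, u_1'\rangle\langle u_1, u_1'\rangle + \langle u, u_2'\rangle\langle u_2, u_1'\rangle &= \langle u, u_1'\rangle, \\ \langle u, u_1'\rangle\langle u_1, u_2'\rangle + \langle u, u_2'\rangle\langle u_2, u_2'\rangle &= \langle u, u_2'\rangle. \end{aligned} \]

Namely,

\[ \begin{aligned} \langle u, \langle u_1, u_1'\rangle u_1' + \langle u_2, u_1'\rangle u_2'\rangle &= \langle u, u_1'\rangle, \\ \langle u, \langle u_1, u_2'\rangle u_1' + \langle u_2, u_2'\rangle u_2'\rangle &= \langle u, u_2'\rangle. \end{aligned} \]

Since $u \in U$ is arbitrary, we have

\[ \langle u_1, u_1'\rangle u_1' + \langle u_2, u_1'\rangle u_2' = u_1', \quad \langle u_1, u_2'\rangle u_1' + \langle u_2, u_2'\rangle u_2' = u_2'. \]

Since $u_1', u_2'$ are independent, we have

\[ \langle u_1, u_1'\rangle = 1, \quad \langle u_2, u_1'\rangle = 0, \quad \langle u_1, u_2'\rangle = 0, \quad \langle u_2, u_2'\rangle = 1. \tag{S7} \]

Conversely, if (S7) holds, then we can use (S6) to get

\[ T^2(u) = \langle u, u_1'\rangle u_1 + \langle u, u_2'\rangle u_2 = T(u), \quad u \in U. \]

In conclusion, (S7) is a necessary and sufficient condition to ensure that $T$ is a projection.

(ii) Assume $T$ is a projection. Then (S7) holds. From $V = \{u \in U \mid T(u) = u\}$ and the definition of $T$, we see that $V \subset \mathrm{Span}\{u_1, u_2\}$. Using (S7), we also have $u_1, u_2 \in V$. So $V = \mathrm{Span}\{u_1, u_2\}$.

If $T(u) = 0$, then $\langle u, u_1'\rangle u_1 + \langle u, u_2'\rangle u_2 = 0$. Since $u_1, u_2$ are independent, we have $\langle u, u_1'\rangle = 0$, $\langle u, u_2'\rangle = 0$. So $W = (\mathrm{Span}\{u_1', u_2'\})^0$.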

2.5.15 We have $T(T - aI) = 0$ or $(aI - T)T = T(aI - T) = 0$. Besides,

\[ I = \left(I - \frac{1}{a}T\right) + \frac{1}{a}T = \frac{1}{a}(aI - T) + \frac{1}{a}T. \]

Therefore, we may rewrite any $u \in U$ as

\[ u = I(u) = \frac{1}{a}(aI - T)(u) + \frac{1}{a}T(u) \equiv v + w. \tag{S8} \]

Thus

\[ T(v) = T\left(\frac{1}{a}(aI - T)(u)\right) = 0, \quad (aI - T)(w) = (aI - T)\left(\frac{1}{a}T(u)\right) = 0. \]

That is, $v \in N(T)$ and $w \in N(aI - T)$. Hence $U = N(T) + N(aI - T)$. Let $u \in N(aI - T) \cap N(T)$. Inserting this into (S8), we get $u = 0$. So $N(aI - T) \cap N(T) = \{0\}$ and $U = N(aI - T) \oplus N(T)$.

Since $T$ commutes with $aI - T$ and $T$, it is clear that $N(aI - T)$ and $N(T)$ are invariant under $T$.

2.5.25 (i) Since $T$ is nilpotent of degree $k$, there is an element $u \in U$ such that $T^{k-1}(u) \ne 0$ but $T^k(u) = 0$. It is clear that

\[ V = \mathrm{Span}\{u, T(u), \dots, T^{k-1}(u)\} \]

is a $k$-dimensional subspace of $U$ invariant under $T$. Let $W$ be the complement of $\mathrm{Span}\{T^{k-1}(u)\}$ in $N(T)$. Then $\dim(W) = l - 1$. Consider the subspace $X = V + W$ of $U$. Pick $v \in V \cap W$ and write $v$ as

\[ v = a_0u + a_1T(u) + \cdots + a_{k-1}T^{k-1}(u), \quad a_0, a_1, \dots, a_{k-1} \in \mathbb{F}. \]

Then $T(v) = 0$ gives us $a_0T(u) + \cdots + a_{k-2}T^{k-1}(u) = 0$, which leads to $a_0 = a_1 = \cdots = a_{k-2} = 0$. So $v \in \mathrm{Span}\{T^{k-1}(u)\}$ and $v \in W$, which indicates $v = 0$. Hence $X = V \oplus W$. However, since $\dim(X) = \dim(V) + \dim(W) = k + (l - 1) = n$, we have $X = U$ so that $T$ is reducible over $V, W$.

(ii) It is clear that $R(T) = \mathrm{Span}\{T(u), \dots, T^{k-1}(u)\}$ and $r(T) = k - 1$.

2.5.26 (i) Write $S - T$ as $S \circ (I - S^{-1} \circ T)$ and set $P = S^{-1} \circ T$. Then $P$ is nilpotent as well. Let the degree of $P$ be $k$. Assume $P \ne 0$. Then $k \ge 2$. It may be checked that $I - P$ is invertible since $I + P + \cdots + P^{k-1}$ is the inverse of $I - P$.

(ii) For $A \in \mathbb{R}(2, 2)$ define $T_A \in L(\mathbb{R}^2)$ by $T_A(x) = Ax$ where $x \in \mathbb{R}^2$ is a column vector. Now set

\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]

Then $T_A$ is invertible, $T_B$ is nilpotent, but $T_A - T_B$ is not invertible. It is direct to check that $T_A, T_B$ do not commute.
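The four claims in 2.5.26 (ii) are small enough to confirm by direct computation. The sketch below does so with plain nested lists; the helper names `matmul` and `det2` are illustrative and not from the text.

```python
def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(X):
    """Determinant of a 2x2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

A = [[0, 1], [1, 0]]
B = [[0, 1], [0, 0]]
A_minus_B = [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]

print("det(A)     =", det2(A))          # nonzero, so T_A is invertible
print("B^2        =", matmul(B, B))     # zero matrix, so T_B is nilpotent
print("det(A - B) =", det2(A_minus_B))  # zero, so T_A - T_B is not invertible
print("AB, BA     =", matmul(A, B), matmul(B, A))  # differ, so no commutation
```

The example shows that the commutativity assumption in part (i) cannot be dropped.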

2.5.27 Let $v \in V$. Then $T(v) \in R(T) \cap V$ since $V$ is invariant under $T$. Since $R(T) \cap V = \{0\}$ we have $T(v) = 0$. So $v \in N(T)$. That is, $V \subset N(T)$. On the other hand, the rank equation indicates that $\dim(V) = n(T) = \dim(N(T))$. So $V = N(T)$.

Section 2.6

2.6.1 Write $S_n = T_n - T$. Then

\[ T = T_n - S_n = T_n \circ (I - R_n), \tag{S9} \]

where $R_n = T_n^{-1} \circ S_n$. If $\|T_n^{-1}\| \not\to \infty$ as $n \to \infty$, we may assume, passing to a subsequence if necessary, that $\{\|T_n^{-1}\|\}$ is bounded. Since $\|S_n\| \to 0$ as $n \to \infty$ we have $\|R_n\| \le \|T_n^{-1}\|\|S_n\| \to 0$ as $n \to \infty$ as well. So in view of Theorem 2.25 we see that $I - R_n$ is invertible when $n$ is sufficiently large, which leads to the false conclusion that $T$ is invertible in view of (S9).

2.6.5 Let $P \in \mathcal{N}$ and consider $T_\lambda = \lambda I + P$, where $\lambda$ is a scalar. It is clear that $T_\lambda$ is invertible for all $\lambda \ne 0$ and $\|P - T_\lambda\| \to 0$ as $\lambda \to 0$. Since $T_\lambda$ is invertible, it can never be nilpotent.

Section 3.1

3.1.3 (i) Let $C$ be a closed curve around but away from the origin of $\mathbb{R}^2$ and consider

\[ u : C \to S^1, \quad u(x, y) = \frac{f(x, y)}{\|f(x, y)\|}, \quad (x, y) \in C. \]

Take $R > 0$ to be a constant and let $C$ be parametrized by $\theta$: $x = \frac{R}{a}\cos\theta$, $y = \frac{R}{b}\sin\theta$, $0 \le \theta \le 2\pi$. Then $u = (u_1, u_2) = (\cos\theta, \sin\theta)$ (on $C$). So $\mathrm{ind}(f|_C) = \deg(u) = 1$. On the other hand, on $C$, we have

\[ \|g(x, y)\|^2 = (a^2x^2 - b^2y^2)^2 + 4a^2b^2x^2y^2 = (a^2x^2 + b^2y^2)^2 = R^4. \]

Now we can construct $v : C \to S^1$ by setting

\[ v(x, y) = \frac{g(x, y)}{\|g(x, y)\|} = \frac{1}{R^2}(a^2x^2 - b^2y^2, 2abxy) = (\cos^2\theta - \sin^2\theta, 2\cos\theta\sin\theta). \]

Therefore, $\mathrm{ind}(g|_C) = \deg(v) = 2$.

(ii) The origin of $\mathbb{R}^2$ is the only zero of $f$ and $g$; it is a simple zero of $f$ and a double zero of $g$.

(iii) Since $f_\varepsilon$ and $g_\varepsilon$ are small perturbations of $f$ and $g$, by stability we deduce $\mathrm{ind}(f_\varepsilon|_C) = 1$, $\mathrm{ind}(g_\varepsilon|_C) = 2$. However, $f_\varepsilon$ still has exactly one simple zero, $x = \varepsilon$, $y = 0$, but $g_\varepsilon$ has two simple zeros, $x = \pm\sqrt{\varepsilon}$, $y = 0$. Thus, when going from $\varepsilon = 0$ to $\varepsilon > 0$, the double zero of the latter splits into two simple zeros. Note that, algebraically, a double zero and two simple zeros are both counted as two zeros, which is, loosely speaking, indicative of the result $\mathrm{ind}(g_\varepsilon|_C) = 2$.

3.1.4 Consider the vector field

\[ v = \left(x^3 - 3xy^2 - 5\cos^2(x + y),\ 3x^2y - y^3 + 2e^{-x^2y^2}\right). \]

We are to show that $v$ has a zero somewhere. We use the deformation method again to simplify the problem. For this purpose, we set

\[ v_t = \left(x^3 - 3xy^2 - 5t\cos^2(x + y),\ 3x^2y - y^3 + 2te^{-x^2y^2}\right), \quad 0 \le t \le 1. \]

It is not hard to show that there is some $R > 0$ such that $\|v_t\| \ge 1$ for $\sqrt{x^2 + y^2} = r \ge R$, $t \in [0, 1]$. Thus, we only need to show that over a suitably large circle given by $C_R = \{r = R\}$, we have $\mathrm{ind}(v_0|_{C_R}) \ne 0$. Over $C_R$ we have $x = R\cos\theta$, $y = R\sin\theta$, $\theta \in [0, 2\pi]$. On the other hand, for $z = x + iy$, we have $z^3 = (x^3 - 3xy^2) + i(-y^3 + 3x^2y)$. Hence, with $z = Re^{i\theta}$, we get $v_0 = R^3(\cos 3\theta, \sin 3\theta)$, so that $u = \frac{1}{\|v_0\|}v_0 = (\cos 3\theta, \sin 3\theta)$. Thus $\deg(u) = 3$. So $\mathrm{ind}(v|_{C_R}) = \mathrm{ind}(v_1|_{C_R}) = \mathrm{ind}(v_0|_{C_R}) = 3 \ne 0$ as expected and $v$ must vanish at some point inside $C_R$.

3.1.5 Inserting the expression for the stereographic projection given and going through a tedious calculation, we obtain

\[ \deg(u) = \frac{1}{\pi}\int_{\mathbb{R}^2}\frac{1}{(1 + x^2 + y^2)^2}\,dx\,dy = \frac{1}{\pi}\int_0^{2\pi}d\theta\int_0^\infty\frac{r}{(1 + r^2)^2}\,dr = 1. \]

3.1.6 Inserting the hedgehog expression and integrating, we obtain

\[ \deg(u) = \frac{1}{4\pi}\int_0^\infty\int_0^{2\pi}u \cdot \left(\frac{\partial u}{\partial r} \times \frac{\partial u}{\partial\theta}\right)d\theta\,dr = -\frac{n}{2}\int_\pi^0\sin f\,df = n, \]

which indicates that the map covers the 2-sphere, while preserving the orientation, $n$ times.

Section 3.2

3.2.6 Consider the matrix $C = aA + bB$. The entries of $C$ not in the $j$th column are the corresponding entries of $A$ multiplied by $(a + b)$ but the $j$th column of $C$ is equal to the sum of the $a$-multiple of the $j$th column of $A$ and the $b$-multiple of the $j$th column of $B$. So by the properties of determinants, we have

\[ \det(C) = (a + b)^{n-1}\det\begin{pmatrix} a_{11} & \cdots & aa_{1j} + bb_{1j} & \cdots & a_{1n} \\ \vdots & \cdots & \cdots & \cdots & \vdots \\ a_{n1} & \cdots & aa_{nj} + bb_{nj} & \cdots & a_{nn} \end{pmatrix} = (a + b)^{n-1}(a\det(A) + b\det(B)). \]


3.2.7 Let the column vectors in $A(t)$ be denoted by $A_1(t), \dots, A_n(t)$. Then

\[ \begin{aligned} &\det(A(t + h)) - \det(A(t)) \\ &= |A_1(t + h), A_2(t + h), \dots, A_n(t + h)| - |A_1(t), A_2(t), \dots, A_n(t)| \\ &= |A_1(t + h), A_2(t + h), \dots, A_n(t + h)| - |A_1(t), A_2(t + h), \dots, A_n(t + h)| \\ &\quad + |A_1(t), A_2(t + h), A_3(t + h), \dots, A_n(t + h)| - |A_1(t), A_2(t), A_3(t + h), \dots, A_n(t + h)| \\ &\quad + |A_1(t), A_2(t), A_3(t + h), \dots, A_n(t + h)| - \cdots \\ &\quad + |A_1(t), A_2(t), \dots, A_n(t + h)| - |A_1(t), A_2(t), \dots, A_n(t)| \\ &= |A_1(t + h) - A_1(t), A_2(t + h), \dots, A_n(t + h)| \\ &\quad + |A_1(t), A_2(t + h) - A_2(t), A_3(t + h), \dots, A_n(t + h)| \\ &\quad + \cdots + |A_1(t), A_2(t), \dots, A_n(t + h) - A_n(t)|. \end{aligned} \]

Dividing the above by $h \ne 0$, we get

\[ \begin{aligned} \frac{1}{h}(\det(A(t + h)) - \det(A(t))) &= \left|\frac{1}{h}(A_1(t + h) - A_1(t)), A_2(t + h), \dots, A_n(t + h)\right| \\ &\quad + \left|A_1(t), \frac{1}{h}(A_2(t + h) - A_2(t)), \dots, A_n(t + h)\right| \\ &\quad + \cdots + \left|A_1(t), A_2(t), \dots, \frac{1}{h}(A_n(t + h) - A_n(t))\right|. \end{aligned} \]

Now taking the $h \to 0$ limits on both sides of the above, we arrive at

\[ \frac{d}{dt}\det(A(t)) = |A_1'(t), A_2(t), \dots, A_n(t)| + |A_1(t), A_2'(t), \dots, A_n(t)| + \cdots + |A_1(t), A_2(t), \dots, A_n'(t)|. \]

Finally, expanding the $j$th determinant along the $j$th column on the right-hand side of the above by the cofactors, $j = 1, 2, \dots, n$, we have

\[ \frac{d}{dt}\det(A(t)) = \sum_{i=1}^n a_{i1}'(t)C_{i1}(t) + \sum_{i=1}^n a_{i2}'(t)C_{i2}(t) + \cdots + \sum_{i=1}^n a_{in}'(t)C_{in}(t). \]

3.2.8 Adding all columns to the first column of the matrix, we get

\[ \begin{vmatrix} x + \sum_{i=1}^n a_i & a_1 & a_2 & \cdots & a_n \\ x + \sum_{i=1}^n a_i & x & a_2 & \cdots & a_n \\ x + \sum_{i=1}^n a_i & a_2 & x & \cdots & a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x + \sum_{i=1}^n a_i & a_2 & a_3 & \cdots & x \end{vmatrix} = \left(x + \sum_{i=1}^n a_i\right)\begin{vmatrix} 1 & a_1 & a_2 & \cdots & a_n \\ 1 & x & a_2 & \cdots & a_n \\ 1 & a_2 & x & \cdots & a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & a_2 & a_3 & \cdots & x \end{vmatrix}. \]

Consider the $(n + 1) \times (n + 1)$ determinant on the right-hand side of the above. We now subtract row $n$ from row $n + 1$, row $n - 1$ from row $n$, ..., row 2 from row 3, row 1 from row 2. Then that determinant becomes

\[ a \equiv \begin{vmatrix} 1 & a_1 & a_2 & \cdots & a_n \\ 0 & x - a_1 & 0 & \cdots & 0 \\ 0 & \cdots & x - a_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & \cdots & x - a_n \end{vmatrix}. \]

Since the minor of the entry at the position $(1, 1)$ is the determinant of a lower triangular matrix, we get $a = \prod_{i=1}^n (x - a_i)$.


3.2.9 If $c_1, \dots, c_{n+2}$ are not all distinct then it is clear that the determinant is zero since there are two identical columns. Assume now $c_1, \dots, c_{n+2}$ are distinct and consider the function of $x$ given by

\[ p(x) = \det\begin{pmatrix} p_1(x) & p_1(c_2) & \cdots & p_1(c_{n+2}) \\ p_2(x) & p_2(c_2) & \cdots & p_2(c_{n+2}) \\ \vdots & \vdots & \ddots & \vdots \\ p_{n+2}(x) & p_{n+2}(c_2) & \cdots & p_{n+2}(c_{n+2}) \end{pmatrix}. \]

By the cofactor expansion along the first column, we see that $p(x)$ is a polynomial of degree at most $n$ which vanishes at $n + 1$ points: $c_2, \dots, c_{n+2}$. Hence $p(x) = 0$ for all $x$. In particular, $p(c_1) = 0$ as well.

3.2.10 We use $D_{n+1}$ to denote the determinant and implement induction to do the computation. When $n = 1$, we have $D_2 = a_1x + a_0$. At $n - 1$, we assume $D_n = a_{n-1}x^{n-1} + \cdots + a_1x + a_0$. At $n$, by the cofactor expansion according to the first column, we get

\[ D_{n+1} = xD_n + (-1)^{n+2}a_0(-1)^n = x(a_nx^{n-1} + \cdots + a_2x + a_1) + a_0. \]

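3.2.11 Use $D(\lambda)$ to denote the determinant on the left-hand side of (3.2.46). It is clear that $D(0) = 0$. So (3.2.46) is true at $\lambda = 0$.

Now assume $\lambda \ne 0$ and rewrite $D(\lambda)$ as

\[ D(\lambda) = \det\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & a_1 - \lambda & a_2 & \cdots & a_n \\ 1 & a_1 & a_2 - \lambda & \cdots & a_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & a_1 & a_2 & \cdots & a_n - \lambda \end{pmatrix}. \]

Adding the $(-a_1)$ multiple of column 1 to column 2, the $(-a_2)$ multiple of column 1 to column 3, ..., and the $(-a_n)$ multiple of column 1 to column $n + 1$, we see that $D(\lambda)$ becomes

\[ D(\lambda) = \det\begin{pmatrix} 1 & -a_1 & -a_2 & \cdots & -a_n \\ 1 & -\lambda & 0 & \cdots & 0 \\ 1 & 0 & -\lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -\lambda \end{pmatrix} = \lambda^n\det\begin{pmatrix} 1 & -\frac{a_1}{\lambda} & -\frac{a_2}{\lambda} & \cdots & -\frac{a_n}{\lambda} \\ 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}. \]

On the right-hand side of the above, adding column 2, column 3, ..., column $n + 1$ to column 1, we get

\[ D(\lambda) = \lambda^n\det\begin{pmatrix} 1 - \frac{1}{\lambda}\sum_{i=1}^n a_i & -\frac{a_1}{\lambda} & -\frac{a_2}{\lambda} & \cdots & -\frac{a_n}{\lambda} \\ 0 & -1 & 0 & \cdots & 0 \\ 0 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & -1 \end{pmatrix} = \lambda^n\left(1 - \frac{1}{\lambda}\sum_{i=1}^n a_i\right)(-1)^n = (-1)^n\lambda^{n-1}\left(\lambda - \sum_{i=1}^n a_i\right). \]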
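The closed form obtained in 3.2.11 can be compared against a brute-force determinant for small $n$. The sketch below assumes, as the enlarged matrix in the solution indicates, that $D(\lambda)$ is the determinant of the $n \times n$ matrix whose $(i, j)$ entry is $a_j - \lambda\delta_{ij}$. The function names and the sample values of $a_1, \dots, a_n$ are illustrative, and the cofactor-expansion `det` is only meant for small matrices.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def D(a, lam):
    """Determinant of the matrix with entries a_j - lam * delta_ij."""
    n = len(a)
    return det([[a[j] - (lam if i == j else 0) for j in range(n)]
                for i in range(n)])

def closed_form(a, lam):
    """Right-hand side derived above: (-1)^n lam^(n-1) (lam - sum of the a_i)."""
    n = len(a)
    return (-1) ** n * lam ** (n - 1) * (lam - sum(a))

a = [2, -1, 3, 5]  # illustrative values
for lam in range(-3, 4):
    print(lam, D(a, lam), closed_form(a, lam))
```

With integer inputs the arithmetic is exact, so the two columns printed for each $\lambda$ should coincide, including at $\lambda = 0$ where both vanish.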

3.2.12 From $\det(AA^t) = \det(I_n) = 1$ and $\det(A)^2 = \det(AA^t)$ we get $\det(A)^2 = 1$. Using $\det(A) < 0$, we get $\det(A) = -1$. Hence $\det(A + I_n) = \det(A + AA^t) = \det(A)\det(I_n + A^t) = -\det(I_n + A)$, which results in $\det(I_n + A) = 0$.

3.2.13 Since $a_{11} = 1$ or $-1$, we may add or subtract the first row from the other rows of $A$ so that we reduce $A$ into a matrix $B = (b_{ij})$ satisfying $b_{11} = 1$ or $-1$, $b_{i1} = 0$ for $i \ge 2$, and all entries in the submatrix of $B$ with the first row and column of $B$ deleted are even numbers. By the cofactor expansion along the first column of $B$ we see that $\det(B)$ is an even integer. Since $\det(A) = \det(B)$, the proof follows.


3.2.14 To see that $\det(A) \ne 0$, it suffices to show that $A$ is invertible, or equivalently, $N(A) = \{x \in \mathbb{R}^n \mid Ax = 0\} = \{0\}$. In fact, take $x \in N(A)$ and assume otherwise $x = (x_1, \dots, x_n)^t \ne 0$. Let $i = 1, \dots, n$ be such that

\[ |x_i| = \max_{1 \le j \le n}\{|x_j|\}. \tag{S10} \]

Then $|x_i| > 0$. On the other hand, the $i$th component of the equation $Ax = 0$ reads

\[ a_{i1}x_1 + \cdots + a_{ii}x_i + \cdots + a_{in}x_n = 0. \tag{S11} \]

Combining (S10) and (S11) we arrive at

\[ \begin{aligned} 0 &= |a_{i1}x_1 + \cdots + a_{ii}x_i + \cdots + a_{in}x_n| \\ &\ge |a_{ii}||x_i| - \sum_{1 \le j \le n,\, j \ne i}|a_{ij}||x_j| \\ &\ge |x_i|\left(|a_{ii}| - \sum_{1 \le j \le n,\, j \ne i}|a_{ij}|\right) > 0, \end{aligned} \]

which is a contradiction.

3.2.15 Consider the modified matrix

\[ A(t) = D + t(A - D), \quad 0 \le t \le 1, \]

where $D = \mathrm{diag}\{a_{11}, \dots, a_{nn}\}$. Then $A(t)$ satisfies the condition stated in the previous exercise. So $\det(A(t)) \ne 0$. Furthermore, $\det(A(0)) = \det(D) > 0$. So $\det(A(1)) > 0$ as well; otherwise there is a point $t_0 \in (0, 1)$ such that $\det(A(t_0)) = 0$, which is false.

3.2.17 Let $\alpha = (a_1, \dots, a_n)$, $\beta = (b_1, \dots, b_n)$. To compute

\[ \det(I_n - \alpha^t\beta) = \begin{vmatrix} 1 - a_1b_1 & -a_1b_2 & \cdots & -a_1b_n \\ -a_2b_1 & 1 - a_2b_2 & \cdots & -a_2b_n \\ \vdots & \vdots & \ddots & \vdots \\ -a_nb_1 & -a_nb_2 & \cdots & 1 - a_nb_n \end{vmatrix}, \]

we artificially enlarge it into an $(n + 1) \times (n + 1)$ determinant of the form

\[ \det(I_n - \alpha^t\beta) = \begin{vmatrix} 1 & b_1 & b_2 & \cdots & b_n \\ 0 & 1 - a_1b_1 & -a_1b_2 & \cdots & -a_1b_n \\ 0 & -a_2b_1 & 1 - a_2b_2 & \cdots & -a_2b_n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & -a_nb_1 & -a_nb_2 & \cdots & 1 - a_nb_n \end{vmatrix}. \]

Now adding the $a_1$ multiple of row 1 to row 2, the $a_2$ multiple of row 1 to row 3, ..., and the $a_n$ multiple of row 1 to the last row of the above determinant, we get

\[ \det(I_n - \alpha^t\beta) = \begin{vmatrix} 1 & b_1 & b_2 & \cdots & b_n \\ a_1 & 1 & 0 & \cdots & 0 \\ a_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_n & 0 & 0 & \cdots & 1 \end{vmatrix}. \]

Next, subtracting the $b_1$ multiple of row 2 from row 1, the $b_2$ multiple of row 3 from row 1, ..., and the $b_n$ multiple of the last row from row 1, we obtain

\[ \det(I_n - \alpha^t\beta) = \begin{vmatrix} 1 - \sum_{i=1}^n a_ib_i & 0 & 0 & \cdots & 0 \\ a_1 & 1 & 0 & \cdots & 0 \\ a_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_n & 0 & 0 & \cdots & 1 \end{vmatrix} = 1 - \sum_{i=1}^n a_ib_i. \]
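The identity $\det(I_n - \alpha^t\beta) = 1 - \sum_{i=1}^n a_ib_i$ is easy to spot-check with exact integer arithmetic. The following sketch uses a cofactor-expansion determinant suitable only for small $n$; the function names and the sample vectors are illustrative choices, not part of the text.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def det_identity_minus_outer(a, b):
    """det(I_n - a^t b), where a and b are row vectors given as lists."""
    n = len(a)
    return det([[(1 if i == j else 0) - a[i] * b[j] for j in range(n)]
                for i in range(n)])

a = [1, 2, 3]    # illustrative alpha
b = [4, -1, 2]   # illustrative beta
lhs = det_identity_minus_outer(a, b)
rhs = 1 - sum(x * y for x, y in zip(a, b))
print(lhs, rhs)
```

The same comparison can be repeated for other integer vectors; agreement in every case is what the row-reduction argument above predicts.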

3.2.18 The method is contained in the solution of Exercise 3.2.17. In fact, adding the $(-x_1)$ multiple of row 2 to row 1, the $(-x_2)$ multiple of row 3 to row 1, ..., and the $(-x_n)$ multiple of the last row to row 1, we get

\[ f(x_1, \dots, x_n) = \det\begin{pmatrix} 100 - \sum_{i=1}^n x_i^2 & 0 & 0 & \cdots & 0 \\ x_1 & 1 & 0 & \cdots & 0 \\ x_2 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & 0 & 0 & \cdots & 1 \end{pmatrix} = 100 - \sum_{i=1}^n x_i^2. \]

So $f(x_1, \dots, x_n) = 0$ represents the sphere in $\mathbb{R}^n$ centered at the origin and of radius 10.

Section 3.3

3.3.3 First we observe that for $A \in \mathbb{F}(m, n)$ and $B \in \mathbb{F}(n, l)$ written as $B = (B_1, \dots, B_l)$ where $B_1, \dots, B_l \in \mathbb{F}^n$ are $l$ column vectors we have

\[ AB = (AB_1, \dots, AB_l), \tag{S12} \]

where $AB_1, \dots, AB_l$ are $l$ column vectors in $\mathbb{F}^m$.

Now for $A \in \mathbb{F}(n, n)$ we recall the relation (3.3.5). If $\det(A) \ne 0$, we can evaluate determinants on both sides of (3.3.5) to arrive at (3.3.15). If $\det(A) = 0$ then in view of (3.3.5) and (S12) we have

\[ \mathrm{adj}(A)A_1 = 0, \quad \dots, \quad \mathrm{adj}(A)A_n = 0, \tag{S13} \]

where $A_1, \dots, A_n$ are the $n$ column vectors of $A$. If $A \ne 0$, then at least one of $A_1, \dots, A_n$ is nonzero. In other words, $n(\mathrm{adj}(A)) \ge 1$ so that $r(\mathrm{adj}(A)) \le n - 1$. Hence $\det(\mathrm{adj}(A)) = 0$ and (3.3.15) is valid. If $A = 0$ then $\mathrm{adj}(A) = 0$ and (3.3.15) trivially holds.

3.3.4 If $r(A) = n$ then both $A$ and $\mathrm{adj}(A)$ are invertible by (3.3.5). So we obtain $r(\mathrm{adj}(A)) = n$. If $r(A) = n - 1$ then $A$ contains an $(n - 1) \times (n - 1)$ submatrix whose determinant is nonzero. So $\mathrm{adj}(A) \ne 0$, which leads to $r(\mathrm{adj}(A)) \ge 1$. On the other hand, from (S13), we see that $n(\mathrm{adj}(A)) \ge n - 1$. So $r(\mathrm{adj}(A)) \le 1$. This proves $r(\mathrm{adj}(A)) = 1$. If $r(A) \le n - 2$ then any $n - 1$ row vectors of $A$ are linearly dependent, which indicates that all row vectors of any $(n - 1) \times (n - 1)$ submatrix of $A$ are linearly dependent. In particular all cofactors of $A$ are zero. Hence $\mathrm{adj}(A) = 0$ and $r(\mathrm{adj}(A)) = 0$.

3.3.6 Let $n \ge 3$ and assume $A$ is invertible. Using (3.3.5) and (3.3.15) we have

\[ \mathrm{adj}(A)\,\mathrm{adj}(\mathrm{adj}(A)) = \det(\mathrm{adj}(A))I_n = (\det(A))^{n-1}I_n. \]

Comparing this with (3.3.5) again we obtain $\mathrm{adj}(\mathrm{adj}(A)) = (\det(A))^{n-2}A$. If $A$ is not invertible, then $\det(A) = 0$ and $r(\mathrm{adj}(A)) \le 1$. Since $n \ge 3$, we have $r(\mathrm{adj}(A)) \le 1 \le n - 2$. So in view of Exercise 3.3.4 we have $r(\mathrm{adj}(\mathrm{adj}(A))) = 0$ or $\mathrm{adj}(\mathrm{adj}(A)) = 0$. So $\mathrm{adj}(\mathrm{adj}(A)) = (\det(A))^{n-2}A$ is trivially true.

If $n = 2$ it is direct to verify the relation $\mathrm{adj}(\mathrm{adj}(A)) = A$.


3.3.7 For $A=(a_{ij})$, we use $M^A_{ij}$ and $C^A_{ij}$ to denote the minor and cofactor of the entry $a_{ij}$, respectively. Then we have

\[
M^A_{ij}=M^{A^t}_{ji}, \tag{S14}
\]

which leads to $C^A_{ij}=C^{A^t}_{ji}$, $i,j=1,\dots,n$. Hence

\[
(\mathrm{adj}(A))^t=(C^A_{ij})=(C^{A^t}_{ij})^t=\mathrm{adj}(A^t).
\]

(If $A$ is invertible, we may use (3.3.5) to write $\mathrm{adj}(A)=\det(A)A^{-1}$. Therefore $(\mathrm{adj}(A))^t=\det(A)(A^{-1})^t=\det(A^t)(A^t)^{-1}=\mathrm{adj}(A^t)$.)

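3.3.8 From (3.3.5) we have $AA^t=\det(A)I_n$. For $B=(b_{ij})=AA^t$, we get

\[
b_{ii}=\sum_{j=1}^{n}a_{ij}^2,\quad i=1,\dots,n.
\]

So $\det(A)=0$ if and only if $b_{ii}=0$ for all $i=1,\dots,n$, or $A=0$.

3.3.9 First assume that $A,B$ are both invertible. Then, in view of (3.3.5), we have

\[
\begin{aligned}
\mathrm{adj}(AB)&=\det(AB)(AB)^{-1}=\det(A)\det(B)B^{-1}A^{-1}\\
&=(\det(B)B^{-1})(\det(A)A^{-1})=\mathrm{adj}(B)\,\mathrm{adj}(A).
\end{aligned}
\]

Next we prove the conclusion without assuming that $A,B$ are invertible. Since $\det(A-\lambda I)$ and $\det(B-\lambda I)$ are polynomials of the variable $\lambda$, of degree $n$, they have at most $n$ roots. Let $\{\lambda_k\}$ be a sequence in $\mathbb{R}$ so that $\lambda_k\to 0$ as $k\to\infty$ and $\det(A-\lambda_kI)\neq 0$, $\det(B-\lambda_kI)\neq 0$ for $k=1,2,\dots$. Hence we have

\[
\mathrm{adj}((A-\lambda_kI)(B-\lambda_kI))=\mathrm{adj}(B-\lambda_kI)\,\mathrm{adj}(A-\lambda_kI),\quad k=1,2,\dots.
\]

Since the entries of an adjugate matrix are polynomials in the entries of the matrix, both sides depend continuously on $\lambda_k$. Letting $k\to\infty$ in the above we arrive at $\mathrm{adj}(AB)=\mathrm{adj}(B)\,\mathrm{adj}(A)$ again.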
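The point of the limiting argument in 3.3.9 is that the identity survives when $A$ and $B$ are singular. The following sketch checks this on two singular integer matrices picked for illustration; `det`, `adj`, and `matmul` are hypothetical helper names.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def adj(m):
    """Adjugate: entry (i, j) is the cofactor of the entry (j, i) of m."""
    n = len(m)

    def minor(i, j):
        return [r[:j] + r[j + 1:] for k, r in enumerate(m) if k != i]

    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # singular
B = [[1, 0, 2], [0, 0, 0], [3, 1, 1]]  # singular (zero row)

print(det(A), det(B))
print(adj(matmul(A, B)) == matmul(adj(B), adj(A)))
```

Note the reversed order on the right-hand side; swapping the two factors generally breaks the equality.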

Section 3.4

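3.4.1 Using $C^A_{ij}$ to denote the cofactor of the entry $a_{ij}$ of the matrix $A=(a_{ij})$ and applying (3.2.42), we have

\[
a_1=\frac{d}{d\lambda}p_A(\lambda)\Big|_{\lambda=0}
=\frac{d}{d\lambda}\bigl(\det(\lambda I_n-A)\bigr)\Big|_{\lambda=0}
=\sum_{i=1}^{n}C^{(-A)}_{ii}=(-1)^{n-1}\sum_{i=1}^{n}C^A_{ii}.
\]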


3.4.2 It suffices to show that $M=\mathbb{C}(n,n)\setminus D$ is closed in $\mathbb{C}(n,n)$. Let $\{A_k\}$ be a sequence in $M$ which converges to some $A\in\mathbb{C}(n,n)$. We need to prove that $A\in M$. To this end, consider $p_{A_k}(\lambda)$ and let $\{\lambda_k\}$ be multiple roots of $p_{A_k}(\lambda)$. Then we have

\[
p_{A_k}(\lambda_k)=0,\quad p'_{A_k}(\lambda_k)=0,\quad k=1,2,\dots. \tag{S15}
\]

On the other hand, we know that the coefficients of $p_A(\lambda)$ are continuously dependent on the entries of $A$. So the coefficients of $p_{A_k}(\lambda)$ converge to those of $p_A(\lambda)$, respectively. In particular, the coefficients of $p_{A_k}(\lambda)$, say $\{a^k_{n-1}\},\dots,\{a^k_1\},\{a^k_0\}$, are bounded sequences. Thus

\[
|\lambda_k|^n\le|\lambda_k^n-p_{A_k}(\lambda_k)|+|p_{A_k}(\lambda_k)|\le\sum_{i=0}^{n-1}|a^k_i||\lambda_k|^i,
\]

which indicates that $\{\lambda_k\}$ is a bounded sequence in $\mathbb{C}$. Passing to a subsequence if necessary, we may assume without loss of generality that $\lambda_k\to$ some $\lambda_0\in\mathbb{C}$ as $k\to\infty$. Letting $k\to\infty$ in (S15), we obtain $p_A(\lambda_0)=0$, $p'_A(\lambda_0)=0$. Hence $\lambda_0$ is a multiple root of $p_A(\lambda)$, which proves $A\in M$.

3.4.4 Using (3.4.31), we have

\[
\mathrm{adj}(\lambda I_n-A)=A_{n-1}\lambda^{n-1}+\cdots+A_1\lambda+A_0,
\]

where the matrices $A_{n-1},A_{n-2},\dots,A_1,A_0$ are determined through (3.4.32). Hence, setting $\lambda=0$ in the above, we get

\[
\mathrm{adj}(-A)=A_0=a_1I_n+a_2A+\cdots+a_{n-1}A^{n-2}+A^{n-1}.
\]

That is,

\[
\mathrm{adj}(A)=(-1)^{n-1}(a_1I_n+a_2A+\cdots+a_{n-1}A^{n-2}+A^{n-1}). \tag{S16}
\]

Note that (S16) may be used to prove some known facts more easily. For example, the relation $\mathrm{adj}(A^t)=(\mathrm{adj}(A))^t$ (see Exercise 3.3.7) follows immediately.
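Formula (S16) can be compared against the cofactor definition of the adjugate. The sketch below assumes $p_A(\lambda)=\lambda^n+a_{n-1}\lambda^{n-1}+\cdots+a_1\lambda+a_0$ and computes the coefficients with the Faddeev-LeVerrier recursion, which is a substitute technique chosen here for convenience and not the method of the text. All function names are illustrative.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def adj(m):
    """Adjugate from cofactors: entry (i, j) is the cofactor of entry (j, i)."""
    n = len(m)

    def minor(i, j):
        return [r[:j] + r[j + 1:] for k, r in enumerate(m) if k != i]

    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def char_coeffs(a):
    """Coefficients c[0..n] of det(lambda*I - A), by the Faddeev-LeVerrier recursion."""
    n = len(a)
    c = [Fraction(0)] * n + [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        m = [[am[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(matmul(a, m)[i][i] for i in range(n))
        c[n - k] = -trace / k
    return c


def adj_from_charpoly(a):
    """Right-hand side of (S16): (-1)^(n-1) (a_1 I + a_2 A + ... + A^(n-1))."""
    n = len(a)
    c = char_coeffs(a)
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    total = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        total = [[total[i][j] + c[k] * power[i][j] for j in range(n)] for i in range(n)]
        power = matmul(power, a)
    sign = (-1) ** (n - 1)
    return [[sign * x for x in row] for row in total]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print(adj_from_charpoly(A) == adj(A))
```

Exact rational arithmetic keeps the division by $k$ in the recursion free of rounding error.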

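3.4.6 First assume that $A$ is invertible. Then

\[
\begin{aligned}
p_{AB}(\lambda)&=\det(\lambda I_n-AB)=\det(A[\lambda A^{-1}-B])\\
&=\det(A)\det(\lambda A^{-1}-B)=\det([\lambda A^{-1}-B]A)\\
&=\det(\lambda I_n-BA)=p_{BA}(\lambda).
\end{aligned}
\]

Next assume $A$ is arbitrary. If $\mathbb{F}=\mathbb{R}$ or $\mathbb{C}$, we may proceed as follows. Let $\{a_k\}$ be a sequence in $\mathbb{F}$ such that $A-a_kI_n$ is invertible and $a_k\to 0$ as $k\to\infty$. Then we have

\[
p_{(A-a_kI_n)B}(\lambda)=p_{B(A-a_kI_n)}(\lambda),\quad k=1,2,\dots.
\]

Letting $k\to\infty$ in the above we arrive at $p_{AB}(\lambda)=p_{BA}(\lambda)$ again.

If $A$ is not invertible or $\mathbb{F}$ is not $\mathbb{R}$ or $\mathbb{C}$, the above methods fail and a different method needs to be used. To this end, we recognize the following matrix relations in $\mathbb{F}(2n,2n)$:

\[
\begin{pmatrix}I_n&-A\\0&\lambda I_n\end{pmatrix}
\begin{pmatrix}A&\lambda I_n\\I_n&B\end{pmatrix}
=\begin{pmatrix}0&\lambda I_n-AB\\\lambda I_n&\lambda B\end{pmatrix},
\]

\[
\begin{pmatrix}I_n&0\\-B&\lambda I_n\end{pmatrix}
\begin{pmatrix}A&\lambda I_n\\I_n&B\end{pmatrix}
=\begin{pmatrix}A&\lambda I_n\\\lambda I_n-BA&0\end{pmatrix}.
\]

The determinants of the left-hand sides of the above are the same. The determinants of the right-hand sides of the above are

\[
(-1)^n\lambda^n\det(\lambda I_n-AB),\quad(-1)^n\lambda^n\det(\lambda I_n-BA).
\]

Hence the conclusion $p_{AB}(\lambda)=p_{BA}(\lambda)$ follows.

3.4.7 We consider the nontrivial case $n\ge 2$. Using the notation and method in the solution to Exercise 3.2.17, we have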

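\[
\det(\lambda I_n-\alpha^t\beta)=
\begin{vmatrix}
1&b_1&b_2&\cdots&b_n\\
a_1&\lambda&0&\cdots&0\\
a_2&0&\lambda&\cdots&0\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
a_n&0&0&\cdots&\lambda
\end{vmatrix}.
\]

Assume $\lambda\neq 0$. Subtracting the $b_1/\lambda$ multiple of row 2 from row 1, the $b_2/\lambda$ multiple of row 3 from row 1, ..., and the $b_n/\lambda$ multiple of the last row from row 1, we obtain

\[
\det(\lambda I_n-\alpha^t\beta)=
\begin{vmatrix}
1-\frac{1}{\lambda}\sum_{i=1}^{n}a_ib_i&0&0&\cdots&0\\
a_1&\lambda&0&\cdots&0\\
a_2&0&\lambda&\cdots&0\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
a_n&0&0&\cdots&\lambda
\end{vmatrix}
=\Bigl(1-\frac{1}{\lambda}\sum_{i=1}^{n}a_ib_i\Bigr)\lambda^n=\lambda^{n-1}(\lambda-\alpha\beta^t).
\]

Since both sides of the above also agree at $\lambda=0$, they are identical for all $\lambda$.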
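The identity $\det(\lambda I_n-\alpha^t\beta)=\lambda^{n-1}(\lambda-\alpha\beta^t)$ can be spot-checked at integer values of $\lambda$, including $\lambda=0$. This is a small sketch with row vectors `a` and `b` standing for $\alpha$ and $\beta$; the sample vectors and the names `lhs` and `rhs` are chosen for illustration only.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def lhs(lam, a, b):
    """det(lam*I - a^t b), where a^t b is the rank-one matrix with entries a_i b_j."""
    n = len(a)
    return det([[lam * int(i == j) - a[i] * b[j] for j in range(n)] for i in range(n)])


def rhs(lam, a, b):
    """lam^(n-1) * (lam - a b^t), where a b^t is the scalar sum of a_i b_i."""
    n = len(a)
    return lam ** (n - 1) * (lam - sum(x * y for x, y in zip(a, b)))


a, b = [1, 2, 3, 4], [2, -1, 0, 5]
print(all(lhs(lam, a, b) == rhs(lam, a, b) for lam in range(-3, 4)))
```

Since both sides are polynomials of degree $n$ in $\lambda$, agreement at more than $n$ points already forces them to coincide.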

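3.4.8 (i) We have $p_A(\lambda)=\lambda^3-2\lambda^2-\lambda+2$. From $p_A(A)=0$ we have $A(A^2-2A-I_n)=-2I_n$. So

\[
A^{-1}=-\frac12(A^2-2A-I_n)=
\begin{pmatrix}
1&-1&0\\
0&\frac12&0\\
-2&\frac32&-1
\end{pmatrix}.
\]

(ii) We divide $\lambda^{10}$ by $p_A(\lambda)$ to find

\[
\lambda^{10}=p_A(\lambda)(\lambda^7+2\lambda^6+5\lambda^5+10\lambda^4+21\lambda^3+42\lambda^2+85\lambda+170)+341\lambda^2-340.
\]

Consequently, inserting $A$ in the above, we have

\[
A^{10}=341A^2-340I_n=
\begin{pmatrix}
1&2046&0\\
0&1024&0\\
0&-1705&1
\end{pmatrix}.
\]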
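The division of $\lambda^{10}$ by $p_A(\lambda)$ in part (ii) is a routine long division that is easy to reproduce. The sketch below represents polynomials as integer coefficient lists, highest degree first, and assumes a monic divisor; `poly_divmod` is an illustrative name.

```python
def poly_divmod(num, den):
    """Long division of polynomials (coefficients highest degree first, den monic)."""
    num = list(num)
    quotient = []
    for i in range(len(num) - len(den) + 1):
        coef = num[i]              # leading coefficient of the current remainder
        quotient.append(coef)
        for j, d in enumerate(den):
            num[i + j] -= coef * d
    remainder = num[len(num) - len(den) + 1:]
    return quotient, remainder


p_A = [1, -2, -1, 2]               # lambda^3 - 2 lambda^2 - lambda + 2
lam10 = [1] + [0] * 10             # lambda^10
q, r = poly_divmod(lam10, p_A)
print(q)  # quotient coefficients
print(r)  # remainder coefficients
```

The remainder can also be cross-checked at the roots $1,-1,2$ of $p_A$: a quadratic $r$ with $r(1)=r(-1)=1$ and $r(2)=2^{10}$ must be $341\lambda^2-340$.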

Section 4.1

4.1.8 Let $u_1$ and $u_2$ be linearly independent vectors in $U$ and assume $(u_1,u_1)\neq 0$ and $(u_2,u_2)\neq 0$; otherwise there is nothing to show. Set $u_3=a_1u_1+u_2$ where $a_1=-(u_1,u_2)/(u_1,u_1)$. Then $u_3\neq 0$ and is perpendicular to $u_1$. So $u_1,u_3$ are linearly independent as well. Consider $u=cu_1+u_3$ ($c\in\mathbb{C}$). Of course $u\neq 0$ for any $c$. Since $(u_1,u_3)=0$, we have $(u,u)=c^2(u_1,u_1)+(u_3,u_3)$. Thus, in order to have $(u,u)=0$, we may choose

\[
c=\sqrt{-\frac{(u_3,u_3)}{(u_1,u_1)}}.
\]

Section 4.2

4.2.6 We first show that the definition has no ambiguity. For this purpose, assume $[u_1]=[u]$ and $[v_1]=[v]$. We need to show that $(u_1,v_1)=(u,v)$. In fact, since $u_1\in[u]$ and $v_1\in[v]$, we know that $u_1-u\in U_0$ and $v_1-v\in U_0$. Hence there are $x,y\in U_0$ such that $u_1-u=x$, $v_1-v=y$. So $(u_1,v_1)=(u+x,v+y)=(u,v)+(u,y)+(x,v+y)=(u,v)$.

It is obvious that $([u],[v])$ is bilinear and symmetric.

To show that the scalar product is non-degenerate, we assume $[u]\in U/U_0$ satisfies $([u],[v])=0$ for any $[v]\in U/U_0$. Thus $(u,v)=0$ for all $v\in U$, which implies $u\in U_0$. In other words, $[u]=[0]$.

4.2.8 Let $w\in V^\perp$. Then $(v,w)=0$ for any $v\in V$. Hence $0=(v,w)=\langle v,\rho(w)\rangle$ for all $v\in V$, which implies $\rho(w)\in V^0$. Hence $\rho(V^\perp)\subset V^0$. On the other hand, take $w'\in V^0$. Then $\langle v,w'\rangle=0$ for all $v\in V$. Since $\rho:U\to U'$ is an isomorphism, there is a unique $w\in U$ such that $\rho(w)=w'$. Thus $(v,w)=\langle v,\rho(w)\rangle=\langle v,w'\rangle=0$, $v\in V$. That is, $w\in V^\perp$. Thus $w'\in\rho(V^\perp)$, which proves $V^0\subset\rho(V^\perp)$.

4.2.10 Recall (4.1.26). Replace $V,W$ in (4.1.26) by $V^\perp,W^\perp$ and use $(V^\perp)^\perp=V$, $(W^\perp)^\perp=W$. We get $(V^\perp+W^\perp)^\perp=V\cap W$. Taking $\perp$ on both sides of this equation, we see that the conclusion follows.

4.2.11 From Exercise 4.1.8 we know that there is a nonzero vector $u\in U$ such that $(u,u)=0$. Since $(\cdot,\cdot)$ is non-degenerate, there is some $w\in U$ such that $(u,w)\neq 0$. It is clear that $u,w$ are linearly independent. If $w$ satisfies $(w,w)=0$, then we take $v=w/(u,w)$. Hence $(v,v)=0$ and $(u,v)=1$. Assume now $(w,w)\neq 0$ and consider $x=u+cw$ ($c\in\mathbb{C}$). Then $x$ can never be zero for any $c\in\mathbb{C}$. Set $(x,x)=0$. We get $2c(u,w)+c^2(w,w)=0$. So we may choose $c=-2(u,w)/(w,w)$. Thus, with $v=x/(u,x)$ where

\[
(u,x)=-\frac{2(u,w)^2}{(w,w)}\neq 0,
\]

we have $(v,v)=0$ and $(u,v)=1$ again.


Section 4.3

4.3.1 We consider the real case for simplicity. The complex case is similar.

Suppose $M$ is nonsingular and consider $a_1u_1+\cdots+a_ku_k=0$, where $a_1,\dots,a_k$ are scalars. Taking scalar products of this equation with the vectors $u_1,\dots,u_k$ consecutively, we get

\[
a_1(u_1,u_i)+\cdots+a_k(u_k,u_i)=0,\quad i=1,\dots,k.
\]

Since $M^t$ is nonsingular, we get $a_1=\cdots=a_k=0$.

Suppose $M$ is singular. Then so is $M^t$ and the above system of equations will have a solution $(a_1,\dots,a_k)\neq(0,\dots,0)$. We rewrite the above system as

\[
\Bigl(\sum_{i=1}^{k}a_iu_i,u_1\Bigr)=0,\ \dots,\ \Bigl(\sum_{i=1}^{k}a_iu_i,u_k\Bigr)=0.
\]

Then modify the above into

\[
a_1\Bigl(\sum_{i=1}^{k}a_iu_i,u_1\Bigr)=0,\ \dots,\ a_k\Bigl(\sum_{i=1}^{k}a_iu_i,u_k\Bigr)=0
\]

and sum up the $k$ equations. We get

\[
\Bigl(\sum_{i=1}^{k}a_iu_i,\sum_{j=1}^{k}a_ju_j\Bigr)=0.
\]

That is, we have $(v,v)=0$ where $v=\sum_{i=1}^{k}a_iu_i$. Since the scalar product is positive definite, we arrive at $v=0$. Thus $u_1,\dots,u_k$ are linearly dependent.

4.3.2 Suppose $u\in\mathrm{Span}\{u_1,\dots,u_k\}$. Then there are scalars $a_1,\dots,a_k$ such that $u=a_1u_1+\cdots+a_ku_k$. Taking scalar products of this equation with $u_1,\dots,u_k$, we have

\[
(u_i,u)=a_1(u_i,u_1)+\cdots+a_k(u_i,u_k),\quad i=1,\dots,k,
\]

which establishes

\[
\begin{pmatrix}(u_1,u)\\\vdots\\(u_k,u)\end{pmatrix}\in\mathrm{Span}\left\{
\begin{pmatrix}(u_1,u_1)\\\vdots\\(u_k,u_1)\end{pmatrix},\dots,
\begin{pmatrix}(u_1,u_k)\\\vdots\\(u_k,u_k)\end{pmatrix}\right\}.
\]

On the other hand, let $V=\mathrm{Span}\{u_1,\dots,u_k\}$. If $k<\dim(U)$, then $V^\perp\neq\{0\}$. Take $u\in V^\perp$, $u\neq 0$. Then

\[
\begin{pmatrix}(u_1,u)\\\vdots\\(u_k,u)\end{pmatrix}=\begin{pmatrix}0\\\vdots\\0\end{pmatrix}\in\mathrm{Span}\left\{
\begin{pmatrix}(u_1,u_1)\\\vdots\\(u_k,u_1)\end{pmatrix},\dots,
\begin{pmatrix}(u_1,u_k)\\\vdots\\(u_k,u_k)\end{pmatrix}\right\},
\]

but $u\notin\mathrm{Span}\{u_1,\dots,u_k\}$.

4.3.4 For any $u\in U$, we have

\[
\begin{aligned}
\|(I\pm S)u\|^2&=((I\pm S)u,(I\pm S)u)\\
&=\|u\|^2+\|Su\|^2\pm(u,Su)\pm(Su,u)\\
&=\|u\|^2+\|Su\|^2\pm(S'u,u)\pm(Su,u)\\
&=\|u\|^2+\|Su\|^2\mp(Su,u)\pm(Su,u)\\
&=\|u\|^2+\|Su\|^2.
\end{aligned}
\]

So $(I\pm S)u\neq 0$ whenever $u\neq 0$. Thus $n(I\pm S)=0$ and $I\pm S$ must be invertible.

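4.3.6 Set $R(A)=\{u\in\mathbb{C}^m\mid u=Ax\text{ for some }x\in\mathbb{C}^n\}$. Rewrite $\mathbb{C}^m$ as $\mathbb{C}^m=R(A)\oplus(R(A))^\perp$ and assert $(R(A))^\perp=\{y\in\mathbb{C}^m\mid A^\dagger y=0\}=N(A^\dagger)$.

In fact, if $A^\dagger y=0$, then $(y,Ax)=y^\dagger Ax=(x^\dagger A^\dagger y)^\dagger=0$ for any $x\in\mathbb{C}^n$. That is, $N(A^\dagger)\subset R(A)^\perp$. On the other hand, take $z\in R(A)^\perp$. Then for any $x\in\mathbb{C}^n$ we have $0=(z,Ax)=z^\dagger Ax=(A^\dagger z)^\dagger x=(A^\dagger z,x)$, which indicates $A^\dagger z=0$. So $z\in N(A^\dagger)$. Thus $R(A)^\perp\subset N(A^\dagger)$.

Therefore the equation $Ax=b$ has a solution if and only if $b\in R(A)$ or, equivalently, if and only if $b$ is perpendicular to $R(A)^\perp=N(A^\dagger)$.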

Section 4.4

4.4.2 It is clear that $P\in L(U)$. Let $\{w_1,\dots,w_l\}$ be an orthogonal basis of $W=V^\perp$. Then $\{v_1,\dots,v_k,w_1,\dots,w_l\}$ is an orthogonal basis of $U$. Now it is direct to check that $P(v_j)=v_j$, $j=1,\dots,k$, and $P(w_s)=0$, $s=1,\dots,l$. So $P^2(v_j)=P(v_j)$, $j=1,\dots,k$, and $P^2(w_s)=P(w_s)$, $s=1,\dots,l$. Hence $P^2=P$, $R(P)=V$, and $N(P)=W$ as claimed. In other words, $P:U\to U$ is the projection of $U$ onto $V$ along $W$.


4.4.4 (i) We know that we have the direct sum $U=V\oplus V^\perp$. For any $x,y\in[u]$ we have $x=v_1+w_1$, $y=v_2+w_2$, $v_i\in V$, $w_i\in V^\perp$, $i=1,2$. So $x-y=(v_1-v_2)+(w_1-w_2)$. However, since $x-y\in V$, we have $w_1=w_2$. In other words, for any $x\in[u]$, there is a unique $w\in V^\perp$ so that $x=v+w$ for some $v\in V$. Of course, $[x]=[w]=[u]$ and $\|x\|^2=\|v\|^2+\|w\|^2$, whose minimum is attained at $v=0$. This proves $\|[u]\|=\|w\|$.

(ii) So we need to find the unique $w\in V^\perp$ such that $[w]=[u]$. First find an orthogonal basis $\{v_1,\dots,v_k\}$ of $V$. Then for $u$ we know that the projection of $u$ onto $V$ along $V^\perp$ is given by the Fourier expansion (see Exercise 4.4.2)

\[
v=\sum_{i=1}^{k}\frac{(v_i,u)}{(v_i,v_i)}v_i.
\]

Thus, from $u=v+w$ where $w\in V^\perp$, we get

\[
w=u-\sum_{i=1}^{k}\frac{(v_i,u)}{(v_i,v_i)}v_i.
\]
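The formula for $w$ in part (ii) translates directly into a few lines of code. The sketch below works in $\mathbb{R}^3$ with the standard scalar product and exact rationals; the subspace, the vector $u$, and the names `dot` and `residual` are assumptions made for the example.

```python
from fractions import Fraction


def dot(x, y):
    """Standard real scalar product."""
    return sum(a * b for a, b in zip(x, y))


def residual(u, basis):
    """w = u - sum_i (v_i, u)/(v_i, v_i) v_i for an orthogonal basis {v_i} of V."""
    w = [Fraction(x) for x in u]
    for v in basis:
        c = Fraction(dot(v, u), dot(v, v))  # Fourier coefficient of u along v
        w = [wi - c * vi for wi, vi in zip(w, v)]
    return w


basis = [[1, 1, 0], [1, -1, 0]]  # orthogonal basis of V = the (x1, x2)-plane
u = [3, 4, 5]
w = residual(u, basis)
print(w, dot(w, w))              # w and the squared norm of the class [u]
```

The basis must be orthogonal for the coefficient formula to be valid; for a general basis one would first orthogonalize it.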

Section 4.5

4.5.5 (i) It is direct to check that $\|T(x)\|=\|x\|$ for $x\in\mathbb{R}^n$.

(ii) Since $T(T(x))=x$ for $x\in\mathbb{R}^n$, we have $T^2=I$. Consequently, $T=T'$. So if $\lambda$ is an eigenvalue of $T$ then $\lambda^2=1$. Thus the eigenvalues of $T$ may only be $\pm1$. In fact, for $u=(1,0,\dots,0,1)^t$ and $v=(1,0,\dots,0,-1)^t$, we have $T(u)=u$ and $T(v)=-v$. So $\pm1$ are both eigenvalues of $T$.

(iii) Let $x,y\in\mathbb{R}^n$ satisfy $T(x)=x$ and $T(y)=-y$. Then $(x,y)=(T(x),T(y))=(x,-y)=-(x,y)$. So $(x,y)=0$.

(iv) Since $T^2-I=0$ and $T\neq\pm I$, we see that the minimal polynomial of $T$ is $m_T(\lambda)=\lambda^2-1$.

Section 5.2

5.2.7 We can write $A$ as $A=P^tDP$ where $P$ is orthogonal and

\[
D=\mathrm{diag}\{d_1,\dots,d_n\},\quad d_i=\pm1,\quad i=1,\dots,n.
\]

It is easily seen that the column vectors of $D$ are mutually orthogonal and of length 1. So $D$ is orthogonal as well. Hence $A$ must be orthogonal as a product of three orthogonal matrices.


5.2.16 Use $\|\cdot\|$ to denote the standard norm of $\mathbb{R}^n$ and let $u_1=\frac{1}{\|u\|}u$. Expand $\{u_1\}$ to get an orthonormal basis $\{u_1,u_2,\dots,u_n\}$ of $\mathbb{R}^n$ (with the usual scalar product) where $u_1,\dots,u_n$ are all column vectors. Let $Q$ be the $n\times n$ matrix whose 1st, 2nd, ..., $n$th column vectors are $u_1,u_2,\dots,u_n$. Then

\[
u^tQ=u^t(u_1,u_2,\dots,u_n)=(u^tu_1,u^tu_2,\dots,u^tu_n)=(\|u\|,0,\dots,0).
\]

Therefore

\[
Q^t(uu^t)Q=(u^tQ)^t(u^tQ)=
\begin{pmatrix}\|u\|\\0\\\vdots\\0\end{pmatrix}(\|u\|,0,\dots,0)=\mathrm{diag}\{\|u\|^2,0,\dots,0\}.
\]

Section 5.3

5.3.8 For definiteness, assume $m<n$. So $r(A)=m$. Use $\|\cdot\|$ to denote the standard norm of $\mathbb{R}^l$. Now check

\[
q(x)=x^t(A^tA)x=(Ax)^t(Ax)=\|Ax\|^2\ge 0,\quad x\in\mathbb{R}^n,
\]

\[
Q(y)=y^t(AA^t)y=(A^ty)^t(A^ty)=\|A^ty\|^2\ge 0,\quad y\in\mathbb{R}^m.
\]

Hence $A^tA\in\mathbb{R}(n,n)$ and $AA^t\in\mathbb{R}(m,m)$ are both semi-positive definite.

If $q(x)=0$, we have $Ax=0$. Since $r(A)=m<n$, we know that $n(A)=n-r(A)>0$. Thus there is some $x\neq 0$ such that $Ax=0$. Hence $q$ cannot be positive definite.

If $Q(y)=0$, we have $A^ty=0$. Since $r(A^t)=r(A)=m$, we know that $n(A^t)=m-r(A^t)=0$. Thus $y=0$. Hence $Q(y)>0$ whenever $y\neq 0$ and $Q$ is positive definite.

5.3.11 Assume $A$ is semi-positive definite. Then there is a semi-positive definite matrix $B$ such that $A=B^2$. If $x\in S$, then $0=x^tAx=x^tB^2x=(Bx)^t(Bx)=\|Bx\|^2$. That is, $Bx=0$, which implies $Ax=B^2x=0$. Conversely, $Ax=0$ clearly implies $x\in S$.

If $A$ is indefinite, $N(A)$ may not be equal to $S$. For example, take $A=\mathrm{diag}\{1,-1\}$. Then $N(A)=\{0\}$ and

\[
S=\left\{x=\begin{pmatrix}x_1\\x_2\end{pmatrix}\in\mathbb{R}^2\;\middle|\;x_1^2-x_2^2=0\right\},
\]

which is the union of the two lines $x_1=x_2$ and $x_1=-x_2$, which cannot even be a subspace of $\mathbb{R}^2$.

5.3.12 Since $A$ is positive definite, there is an invertible matrix $P\in\mathbb{R}(n,n)$ such that $P^tAP=I_n$. Since $P^tBP$ is symmetric, there is an orthogonal matrix $Q\in\mathbb{R}(n,n)$ such that $Q^t(P^tBP)Q$ is a real diagonal matrix. Set $C=PQ$. Then both $C^tAC$ and $C^tBC$ are diagonal.

5.3.13 First we observe that if $C$ is symmetric then the eigenvalues of $C$ are greater than or equal to $c$ if and only if $x^tCx\ge cx^tx=c\|x\|^2$, $x\in\mathbb{R}^n$.

In fact, if $\lambda_1,\dots,\lambda_n$ are eigenvalues of $C$, then there is an orthogonal matrix $P$ such that $C=P^tDP$, where $D=\mathrm{diag}\{\lambda_1,\dots,\lambda_n\}$. So with $y=Px$ we have

\[
x^tCx=(Px)^tD(Px)=y^tDy=\sum_{i=1}^{n}\lambda_iy_i^2\ge\lambda_0\|y\|^2,
\]

where $\lambda_0$ is the smallest eigenvalue of $C$. If $\lambda_0\ge c$ then the above gives us $x^tCx\ge c\|y\|^2=c\|x\|^2$. Conversely, we see that $x^tCx-c\|x\|^2\ge 0$ implies

\[
\sum_{i=1}^{n}\lambda_iy_i^2-c\|y\|^2=\sum_{i=1}^{n}(\lambda_i-c)y_i^2\ge 0.
\]

Since $x$ and hence $y$ can be arbitrarily chosen, we have $\lambda_i\ge c$ for all $i$.

Now from $x^tAx\ge a\|x\|^2$, $x^tBx\ge b\|x\|^2$, we infer $x^t(A+B)x\ge(a+b)\|x\|^2$. Thus the eigenvalues of $A+B$ are greater than or equal to $a+b$.

5.3.14 Since $A$ is positive definite, there is another positive definite matrix $P\in\mathbb{R}(n,n)$ such that $A=P^2$. Let $C=AB$. Then we have $C=P^2B$. Thus $P^{-1}CP=PBP$. Since the right-hand side of this relation is positive definite, its eigenvalues are all positive. Hence all the eigenvalues of $C$ are positive.

5.3.16 Let $P$ be a nonsingular matrix such that $A=P^tP$. Then $\det(A)=\det(P^tP)=\det(P)^2$. Now

\[
\det(A+B)=\det(P^tP+B)=\det(P^t)\det(I+[P^{-1}]^tB[P^{-1}])\det(P).
\]

Consider $C=[P^{-1}]^tB[P^{-1}]$, which is clearly semi-positive definite. Let the eigenvalues of $C$ be $\lambda_1,\dots,\lambda_n\ge 0$. Then there is an orthogonal matrix $Q$ such that $Q^tCQ=D$ where $D=\mathrm{diag}\{\lambda_1,\dots,\lambda_n\}$. Thus $\det(C)=\det(D)=\lambda_1\cdots\lambda_n$. Combining the above results and using $\det(Q)^2=1$, we obtain

\[
\begin{aligned}
\det(A+B)&=\det(Q^t)\det(P^t)\det(I+C)\det(P)\det(Q)\\
&=\det(A)\det(I+Q^tCQ)=\det(A)\det(I+D)\\
&=\det(A)\prod_{i=1}^{n}(1+\lambda_i)\\
&\ge\det(A)(1+\lambda_1\cdots\lambda_n)\\
&=\det(A)+\det(A)\det(C).
\end{aligned}
\]

However, recalling the relation between $B$ and $C$, we get $B=P^tCP$, which gives us $\det(B)=\det(P)^2\det(C)=\det(A)\det(C)$, so that the desired inequality is established.

If the inequality becomes an equality, then all $\lambda_1,\dots,\lambda_n$ vanish. So $C=0$, which indicates that $B=0$.
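The inequality $\det(A+B)\ge\det(A)+\det(B)$ from 5.3.16 can be observed on a concrete pair. In this sketch $A$ and $B$ are built as Gram matrices $P^tP$ and $R^tR$, with $P$ nonsingular and $R$ singular, so that $A$ is positive definite and $B$ is semi-positive definite; the particular matrices and the helper `gram` are assumptions for the example.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def gram(p):
    """P^t P, which is symmetric and semi-positive definite for any real square P."""
    n = len(p[0])
    return [[sum(row[i] * row[j] for row in p) for j in range(n)] for i in range(n)]


P = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]  # det(P) = 3, so A = P^t P is positive definite
R = [[1, 1, 1], [0, 1, 2], [0, 0, 0]]  # singular, so B = R^t R has det(B) = 0
A, B = gram(P), gram(R)
S = [[A[i][j] + B[i][j] for j in range(3)] for i in range(3)]

print(det(S), det(A) + det(B))
```

Here the inequality is strict, consistent with the remark that equality forces $B=0$.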

5.3.17 Since $A$ is positive definite, there is an invertible matrix $P\in\mathbb{R}(n,n)$ such that $P^tAP=I_n$. Hence we have

\[
\begin{aligned}
(\det(P))^2\det(\lambda A-B)&=\det(P^t((\lambda-1)A-(B-A))P)\\
&=\det((\lambda-1)I_n-P^t(B-A)P).
\end{aligned}
\]

Since $P^t(B-A)P$ is positive semi-definite, so that its eigenvalues are all non-negative, we see that the roots of the equation $\det(\lambda A-B)=0$ must satisfy $\lambda-1\ge 0$.

5.3.18 (i) Let $x\in U$ be a minimum point of $f$. Then for any $y\in U$ we have $g(\varepsilon)\ge g(0)$, where $g(\varepsilon)=f(x+\varepsilon y)$ ($\varepsilon\in\mathbb{R}$). Thus

\[
0=\Bigl(\frac{dg}{d\varepsilon}\Bigr)_{\varepsilon=0}=\frac12(y,T(x))+\frac12(x,T(y))-(y,b)=(y,T(x)-b).
\]

Since $y\in U$ is arbitrary, we get $T(x)-b=0$ and (5.3.25) follows.

Conversely, if $x\in U$ satisfies (5.3.25), then

\[
\begin{aligned}
f(u)-f(x)&=\frac12(u,T(u))-(u,T(x))-\frac12(x,T(x))+(x,T(x))\\
&=\frac12(u-x,T(u-x))\ge\lambda_0\|u-x\|^2,\quad u\in U,
\end{aligned}
\tag{S17}
\]

for some constant $\lambda_0>0$. That is, $f(u)\ge f(x)$ for all $u\in U$.

(ii) If $x,y\in U$ solve (5.3.25), then (i) indicates that $x,y$ are the minimum points of $f$. Hence $f(x)=f(y)$. Replacing $u$ by $y$ in (S17) we arrive at $x=y$.

5.3.19 Let $S\in L(U)$ be positive semi-definite so that $S^2=T$. Then $q(u)=(S(u),S(u))=\|S(u)\|^2$ for any $u\in U$. Hence the Schwarz inequality (4.3.10) gives us

\[
\begin{aligned}
q(\alpha u+\beta v)&=(S(\alpha u+\beta v),S(\alpha u+\beta v))\\
&=\alpha^2\|S(u)\|^2+2\alpha\beta(S(u),S(v))+\beta^2\|S(v)\|^2\\
&\le\alpha^2\|S(u)\|^2+2\alpha\beta\|S(u)\|\|S(v)\|+\beta^2\|S(v)\|^2\\
&\le\alpha^2\|S(u)\|^2+\alpha\beta(\|S(u)\|^2+\|S(v)\|^2)+\beta^2\|S(v)\|^2\\
&=\alpha(\alpha+\beta)\|S(u)\|^2+\beta(\alpha+\beta)\|S(v)\|^2\\
&=\alpha q(u)+\beta q(v),\quad u,v\in U,
\end{aligned}
\]

where we have also used the inequality $2ab\le a^2+b^2$ for $a,b\in\mathbb{R}$.

Section 5.4

5.4.1 Suppose otherwise that $A$ is positive definite. Using Theorem 5.11 we have

\[
a_1,a_2,a_3>0,\quad
\begin{vmatrix}a_1&a_2\\a_2&a_3\end{vmatrix},\
\begin{vmatrix}a_1&a_3\\a_3&a_2\end{vmatrix},\
\begin{vmatrix}a_3&a_1\\a_1&a_2\end{vmatrix}>0,
\]

which contradicts $\det(A)>0$.

5.4.2 In the general case, $n\ge 3$, we have

\[
q(x)=\sum_{i=1}^{n-1}(x_i+a_ix_{i+1})^2+(x_n+a_nx_1)^2,\quad x=\begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}\in\mathbb{R}^n.
\]

We can rewrite $q(x)$ as $\sum_{i=1}^{n}y_i^2$ with

\[
y_i=x_i+a_ix_{i+1},\quad i=1,\dots,n-1,\quad y_n=x_n+a_nx_1. \tag{S18}
\]

It is seen that $q(x)$ is positive definite if and only if the change of variables given in (S18) is invertible, which is equivalent to the condition $1+(-1)^{n+1}a_1a_2\cdots a_n\neq 0$.
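The invertibility condition in 5.4.2 comes from the determinant of the coefficient matrix of (S18), which has ones on the diagonal, $a_i$ just right of the diagonal, and $a_n$ in the lower-left corner. The sketch below builds that matrix and compares its determinant with $1+(-1)^{n+1}a_1\cdots a_n$; `change_matrix` and the sample coefficient lists are illustrative.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def change_matrix(a):
    """Coefficient matrix of y_i = x_i + a_i x_{i+1} (indices cyclic), for n >= 3."""
    n = len(a)
    m = [[int(i == j) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][(i + 1) % n] = a[i]
    return m


for a in ([2, 3, 5], [1, -1, 2, 3], [1, 1, 1, 1], [1, 1, 1, 1, 1]):
    n = len(a)
    print(det(change_matrix(a)), 1 + (-1) ** (n + 1) * math.prod(a))
```

For `[1, 1, 1, 1]` both values vanish, and indeed $q$ is then zero at $x=(1,-1,1,-1)^t$, so $q$ fails to be positive definite.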

5.4.3 If $A$ is positive semi-definite, then $A+\lambda I_n$ is positive definite for all $\lambda>0$. Hence all the leading principal minors of $A+\lambda I_n$ are positive when $\lambda>0$. Taking $\lambda\to0^+$ in these positive minors we arrive at the conclusion. The converse is not true, however. For example, take $A=\mathrm{diag}\{1,0,-1\}$. Then all the leading principal minors of $A$ are non-negative but $A$ is indefinite.
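The counterexample in 5.4.3 is small enough to verify directly. This sketch lists the leading principal minors of $\mathrm{diag}\{1,0,-1\}$ and evaluates the quadratic form at two coordinate vectors; `det` and `quad` are illustrative helper names.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


A = [[1, 0, 0], [0, 0, 0], [0, 0, -1]]

# Leading principal minors: determinants of the top-left k-by-k blocks.
minors = [det([row[:k] for row in A[:k]]) for k in range(1, 4)]


def quad(x):
    """Quadratic form x^t A x."""
    return sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3))


print(minors)
print(quad([1, 0, 0]), quad([0, 0, 1]))  # the form takes both signs
```

The last principal minor that is not leading, the entry $-1$, is what reveals the indefiniteness; the leading minors alone do not see it.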

5.4.5 (i) The left-hand side of equation (5.4.24) reads

\[
\begin{pmatrix}
A_{n-1}&A_{n-1}\beta+\alpha\\
\beta^tA_{n-1}+\alpha^t&\beta^tA_{n-1}\beta+\alpha^t\beta+\beta^t\alpha+a_{nn}
\end{pmatrix}.
\]

Comparing the above with the right-hand side of (5.4.24), we arrive at the equation $A_{n-1}\beta=-\alpha$, which has a unique solution since $A_{n-1}$ is invertible. Inserting this result into the entry at the position $(n,n)$ in the above matrix and comparing with the right-hand side of (5.4.24) again, we get $a=-\alpha^t(A_{n-1}^{-1})\alpha\le 0$ since $A_{n-1}^{-1}$ is positive definite.

(ii) Taking determinants on both sides of (5.4.24) and using the facts $\det(A_{n-1})>0$ and $a\le 0$, we obtain

\[
\det(A)=\det(A_{n-1})(a_{nn}+a)\le\det(A_{n-1})a_{nn}.
\]

(iii) This follows directly.

(iv) If $A$ is positive semi-definite, then $A+\lambda I_n$ is positive definite for $\lambda>0$. Hence $\det(A+\lambda I_n)\le(a_{11}+\lambda)\cdots(a_{nn}+\lambda)$, $\lambda>0$. Letting $\lambda\to0^+$ in the above inequality we see that (5.4.26) holds again.

5.4.6 (i) It is clear that the entry at the position $(i,j)$ of the matrix $A^tA$ is $(u_i,u_j)$.

(ii) Thus, applying (5.4.26), we have $\det(A^tA)\le(u_1,u_1)\cdots(u_n,u_n)$ or $(\det(A))^2\le\|u_1\|^2\cdots\|u_n\|^2$.

5.4.7 Using the definition of the standard Euclidean scalar product we have $\|u_i\|\le an^{\frac12}$, $i=1,\dots,n$. Thus (5.4.29) follows.
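The bound $(\det(A))^2\le\|u_1\|^2\cdots\|u_n\|^2$ of Exercise 5.4.6 can be tried on small integer matrices, taking $u_1,\dots,u_n$ to be the columns of $A$. The sketch below returns the gap between the two sides; `hadamard_gap` is a hypothetical name and the test matrices are arbitrary.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))


def hadamard_gap(a):
    """Product of squared column norms minus det(a)^2; non-negative by (5.4.26)."""
    n = len(a)
    col_norms_sq = [sum(a[i][j] ** 2 for i in range(n)) for j in range(n)]
    return math.prod(col_norms_sq) - det(a) ** 2


print(hadamard_gap([[1, 2, 0], [0, 1, 1], [1, 0, 1]]))  # strictly positive gap
print(hadamard_gap([[1, 1], [1, -1]]))                  # orthogonal columns: no gap
```

The gap closes exactly when the columns are mutually orthogonal, which is the case in which $A^tA$ is diagonal.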

Section 5.5

5.5.1 Assume the nontrivial case $\dim(U)\ge 2$. First suppose that $T$ satisfies $T(u)\in\mathrm{Span}\{u\}$ for any $u\in U$. If $T\neq aI$ for any $a\in\mathbb{F}$, then there are some linearly independent vectors $u,v\in U$ such that $T(u)=au$ and $T(v)=bv$ for some $a,b\in\mathbb{F}$ with $a\neq b$. Since $T(u+v)\in\mathrm{Span}\{u+v\}$, we have $T(u+v)=c(u+v)$ for some $c\in\mathbb{F}$. Thus $c(u+v)=au+bv$, which implies $a=c$ and $b=c$ because $u,v$ are linearly independent. This is false. Next we assume that there is some $u\in U$ such that $v=T(u)\notin\mathrm{Span}\{u\}$. Thus $u,v$ are linearly independent. Let $S\in L(U)$ be such that $S(u)=v$ and $S(v)=u$. Then $T(v)=T(S(u))=S(T(u))=S(v)=u$. Let $R\in L(U)$ be such that $R(u)=v$ and $R(v)=0$. Then $u=T(v)=T(R(u))=R(T(u))=R(v)=0$, which is again false.

5.5.3 Let $\lambda_1,\dots,\lambda_n\in\mathbb{F}$ be the $n$ distinct eigenvalues of $T$ and $u_1,\dots,u_n$ the associated eigenvectors, respectively, which form a basis of $U$. We have $T(S(u_i))=S(T(u_i))=\lambda_iS(u_i)$. So $S(u_i)\in E_{\lambda_i}$. In other words, there is some $b_i\in\mathbb{F}$ such that $S(u_i)=b_iu_i$ for $i=1,\dots,n$. It is clear that the scalars $b_1,\dots,b_n$ determine $S$ uniquely. On the other hand, consider a mapping $R\in L(U)$ defined by

\[
R=a_0I+a_1T+\cdots+a_{n-1}T^{n-1},\quad a_0,a_1,\dots,a_{n-1}\in\mathbb{F}. \tag{S19}
\]

Then

\[
R(u_i)=(a_0+a_1\lambda_i+\cdots+a_{n-1}\lambda_i^{n-1})u_i,\quad i=1,\dots,n.
\]

Therefore, if $a_0,a_1,\dots,a_{n-1}$ are so chosen that

\[
a_0+a_1\lambda_i+\cdots+a_{n-1}\lambda_i^{n-1}=b_i,\quad i=1,\dots,n, \tag{S20}
\]

then $R=S$. However, using the Vandermonde determinant, we know that the non-homogeneous system of equations (S20) has a unique solution in $a_0,a_1,\dots,a_{n-1}$. Thus $R$ may be constructed by (S19) to yield $R=S$.
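Solving (S20) is a polynomial interpolation problem: find the polynomial of degree at most $n-1$ taking the values $b_i$ at the distinct points $\lambda_i$. The sketch below uses the Lagrange form instead of inverting the Vandermonde matrix, which is an equivalent route chosen here for brevity; `interpolate`, `evaluate`, and the sample data are assumptions.

```python
from fractions import Fraction


def interpolate(lams, bs):
    """Coefficients [a_0, ..., a_{n-1}] of the polynomial p with p(lams[i]) = bs[i]."""
    n = len(lams)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis = [Fraction(1)]  # product of (t - lams[j]), j != i, lowest degree first
        denom = Fraction(1)
        for j in range(n):
            if j == i:
                continue
            new = [Fraction(0)] * (len(basis) + 1)
            for k, c in enumerate(basis):
                new[k + 1] += c            # contribution of t * c t^k
                new[k] -= lams[j] * c      # contribution of -lams[j] * c t^k
            basis = new
            denom *= lams[i] - lams[j]     # nonzero because the lams are distinct
        for k in range(n):
            coeffs[k] += bs[i] * basis[k] / denom
    return coeffs


def evaluate(coeffs, t):
    """Evaluate a_0 + a_1 t + ... + a_{n-1} t^(n-1)."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


lams = [1, 2, 3]   # distinct eigenvalues of T (sample values)
bs = [2, 3, 6]     # eigenvalues of S on the same eigenvectors (sample values)
a = interpolate(lams, bs)
print(a, [evaluate(a, lam) for lam in lams])
```

With these coefficients, $R=a_0I+a_1T+a_2T^2$ acts on each $u_i$ as multiplication by $b_i$, so $R=S$.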

5.5.4 From the previous exercise we have $C_T=\mathrm{Span}\{I,T,\dots,T^{n-1}\}$. It remains to show that, as vectors in $L(U)$, the mappings $I,T,\dots,T^{n-1}$ are linearly independent. To see this, we consider

\[
c_0I+c_1T+\cdots+c_{n-1}T^{n-1}=0,\quad c_0,c_1,\dots,c_{n-1}\in\mathbb{F}.
\]

Applying the above to $u_i$ we obtain

\[
c_0+c_1\lambda_i+\cdots+c_{n-1}\lambda_i^{n-1}=0,\quad i=1,\dots,n,
\]

which lead to $c_0=c_1=\cdots=c_{n-1}=0$ in view of the Vandermonde determinant. So $\dim(C_T)=n$.

Section 5.6

5.6.3 (i) Let $v\in N(T')$. Then $0=(u,T'(v))_U=(T(u),v)_V$ for any $u\in U$. So $v\in R(T)^\perp$. Conversely, if $v\in R(T)^\perp$, then for any $u\in U$ we have $0=(T(u),v)_V=(u,T'(v))_U$. So $T'(v)=0$ or $v\in N(T')$. Thus $N(T')=R(T)^\perp$. Taking $\perp$ on this relation we obtain $R(T)=N(T')^\perp$.

(ii) From $R(T)^\perp=N(T')$ and $V=R(T)\oplus R(T)^\perp$ we have $\dim(V)=r(T)+n(T')$. Applying this result in the rank equation $r(T')+n(T')=\dim(V)$ we get $r(T)=r(T')$.

(iii) It is clear that $N(T)\subset N(T'\circ T)$. If $u\in N(T'\circ T)$ then $0=(u,(T'\circ T)(u))_U=(T(u),T(u))_V$. Thus $u\in N(T)$. This proves $N(T)=N(T'\circ T)$. Hence $n(T)=n(T'\circ T)$. Using the rank equations $\dim(U)=r(T)+n(T)$ and $\dim(U)=r(T'\circ T)+n(T'\circ T)$ we obtain $r(T)=r(T'\circ T)$. Replacing $T$ by $T'$, we get $r(T')=r(T\circ T')$.

Section 6.2

6.2.5 Since $P^2=P$, we know that $P$ projects $U$ onto $R(P)$ along $N(P)$. If $N(P)=R(P)^\perp$, we show that $P'=P$. To this end, for any $u_1,u_2\in U$ we rewrite them as $u_1=v_1+w_1$, $u_2=v_2+w_2$ where $v_1,v_2\in R(P)$, $w_1,w_2\in R(P)^\perp$. Then we have $P(v_1)=v_1$, $P(v_2)=v_2$, $P(w_1)=P(w_2)=0$. Hence $(u_1,P(u_2))=(v_1+w_1,v_2)=(v_1,v_2)$ and $(P(u_1),u_2)=(v_1,v_2+w_2)=(v_1,v_2)$. This proves $(u_1,P(u_2))=(P(u_1),u_2)$, which indicates $P=P'$. Conversely, we note that for any $T\in L(U)$ there holds $R(T)^\perp=N(T')$ (cf. Exercise 5.6.3). If $P=P'$, we have $R(P)^\perp=N(P)$.

6.2.8 (i) If $T^k=0$ for some $k\ge 1$ then eigenvalues of $T$ vanish. Let $\{u_1,\dots,u_n\}$ be a basis of $U$ consisting of eigenvectors of $T$. Then $T(u_i)=0$ for all $i=1,\dots,n$. Hence $T=0$.

(ii) If $k=1$, then there is nothing to show. Assume that the statement is true up to $k=l\ge 1$. We show that the statement is true at $k=l+1$. If $l$ is odd, then $k=2m$ for some integer $m\ge 1$. Hence $T^{2m}(u)=0$ gives us $(u,T^{2m}(u))=(T^m(u),T^m(u))=0$. So $T^m(u)=0$. Since $m\le l$, we deduce $T(u)=0$. If $l$ is even, then $l=2m$ for some integer $m\ge 1$ and $T^{2m+1}(u)=0$ gives us $T^{2m}(v)=0$ where $v=T(u)$. Hence $T(v)=0$. That is, $T^2(u)=0$. So it follows again that $T(u)=0$.

Section 6.3

6.3.11 Since $A^\dagger A$ is positive definite, there is a positive definite Hermitian matrix $B\in\mathbb{C}(n,n)$ such that $A^\dagger A=B^2$. Thus we can rewrite $A$ as $A=PB$ with $P=(A^\dagger)^{-1}B$, which may be checked to satisfy

\[
PP^\dagger=(A^\dagger)^{-1}BB^\dagger A^{-1}=(A^\dagger)^{-1}B^2A^{-1}=(A^\dagger)^{-1}A^\dagger AA^{-1}=I_n.
\]

6.3.12 From the previous exercise, we may rewrite $A$ as $A=CB$ where $C,B$ are in $\mathbb{C}(n,n)$ such that $C$ is unitary and $B$ positive definite. Hence $A^\dagger A=B^2$. So if $\lambda_1,\dots,\lambda_n>0$ are the eigenvalues of $A^\dagger A$ then $\sqrt{\lambda_1},\dots,\sqrt{\lambda_n}$ are those of $B$. Let $Q\in\mathbb{C}(n,n)$ be unitary such that $B=Q^\dagger DQ$, where $D$ is given in (6.3.26). Then $A=CQ^\dagger DQ$. Let $P=CQ^\dagger$. We see that $P$ is unitary and the problem follows.

6.3.13 Since $T$ is positive definite, $(u,v)_T\equiv(u,T(v))$, $u,v\in U$, also defines a positive definite scalar product over $U$. So the problem follows from the Schwarz inequality stated in terms of this new scalar product.

Section 6.4

6.4.4 Assume that $A=(a_{ij})$ is upper triangular. Then we see that the diagonal entries of $A^\dagger A$ and $AA^\dagger$ are

\[
|a_{11}|^2,\ \dots,\ \sum_{j=1}^{i}|a_{ji}|^2,\ \dots,\ \sum_{j=1}^{n}|a_{jn}|^2,
\]

\[
\sum_{j=1}^{n}|a_{1j}|^2,\ \dots,\ \sum_{j=i}^{n}|a_{ij}|^2,\ \dots,\ |a_{nn}|^2,
\]

respectively. Comparing these entries in the equation $A^\dagger A=AA^\dagger$, we get $a_{ij}=0$ for $i<j$. So $A$ is diagonal.

6.4.6 (i) The conclusion $T=0$ follows from the fact that all eigenvalues of $T$ vanish and $U$ has a basis consisting of eigenvectors of $T$.

(ii) We have $(T'\circ T)^k(u)=0$ since $T',T$ commute. Using Exercise 6.2.8 we have $(T'\circ T)(u)=0$. Hence $0=(u,(T'\circ T)(u))=(T(u),T(u))$, which gives us $T(u)=0$.

6.4.10 Assume that there are positive semi-definite $R$ and unitary $S$, in $L(U)$, such that

\[
T=R\circ S=S\circ R. \tag{S21}
\]

Then $T'=S'\circ R=R\circ S'$. So $T'\circ T=T\circ T'$ and $T$ is normal. Now assume that $T$ is normal. Then $U$ has an orthonormal basis, say $\{u_1,\dots,u_n\}$, consisting of eigenvectors of $T$. Set $T(u_i)=\lambda_iu_i$ ($i=1,\dots,n$). Make polar decompositions of these eigenvalues, $\lambda_i=|\lambda_i|e^{i\theta_i}$, $i=1,\dots,n$, with the convention that $\theta_i=0$ if $\lambda_i=0$. Define $R,S\in L(U)$ by setting

\[
R(u_i)=|\lambda_i|u_i,\quad S(u_i)=e^{i\theta_i}u_i,\quad i=1,\dots,n.
\]

Then it is clear that $R$ is positive semi-definite, $S$ unitary, and (S21) holds.

6.4.12 Assume that $T$ is normal. Let $\lambda_1,\dots,\lambda_k\in\mathbb{C}$ be all the distinct eigenvalues of $T$ and $E_{\lambda_1},\dots,E_{\lambda_k}$ the corresponding eigenspaces. Then $U=E_{\lambda_1}\oplus\cdots\oplus E_{\lambda_k}$. Let $\{u_{i,1},\dots,u_{i,m_i}\}$ be an orthonormal basis of $E_{\lambda_i}$ ($i=1,\dots,k$). Then $T'(u)=\overline{\lambda}_iu$ for any $u\in E_{\lambda_i}$ ($i=1,\dots,k$). On the other hand, using the Vandermonde determinant, we know that we can find $a_0,a_1,\dots,a_{k-1}\in\mathbb{C}$ such that

\[
a_0+a_1\lambda_i+\cdots+a_{k-1}\lambda_i^{k-1}=\overline{\lambda}_i,\quad i=1,\dots,k.
\]

Now set $p(t)=a_0+a_1t+\cdots+a_{k-1}t^{k-1}$. Then $p(T)(u)=\overline{\lambda}_iu$ for any $u\in E_{\lambda_i}$ ($i=1,\dots,k$). Thus $T'=p(T)$.

6.4.13 Assume that $T$ is normal. If $\dim(U)=n=1$, there is nothing to do. Assume $n\ge 2$. Then the previous exercise establishes that there is a polynomial $p(t)$ such that $T'=p(T)$. Hence any subspace invariant under $T$ is also invariant under $T'$. However, $T'=p(T)$ implies $T=q(T')$ where $q$ is the polynomial obtained from $p$ by replacing the coefficients of $p$ by the complex conjugates of the corresponding coefficients of $p$. Thus any invariant subspace of $T'$ is also invariant under $T$.

Now assume that $T$ and $T'$ have the same invariant subspaces. We show that $T$ is normal. We proceed inductively on $\dim(U)=n$. If $n=1$, there is nothing to do. Assume that the conclusion is true at $n=k\ge 1$. We consider $n=k+1$. Let $\lambda\in\mathbb{C}$ be an eigenvalue of $T$ and $u\in U$ an associated eigenvector of $T$. Then $V=\mathrm{Span}\{u\}$ is invariant under $T$ and $T'$. We claim that $V^\perp$ is invariant under $T$ and $T'$ as well. In fact, take $w\in V^\perp$. Since $T'(u)=au$ for some $a\in\mathbb{C}$, we have $(u,T(w))=(T'(u),w)=\overline{a}(u,w)=0$. So $T(w)\in V^\perp$. Hence $V^\perp$ is invariant under $T$ and $T'$ as well. Since $\dim(V^\perp)=n-1=k$, we see that $T\circ T'=T'\circ T$ on $V^\perp$. Besides, we have $T(T'(u))=T(au)=a\lambda u$ and $T'(T(u))=T'(\lambda u)=\lambda au$. So $T\circ T'=T'\circ T$ on $V$. This proves that $T$ is normal on $U$ since $U=V\oplus V^\perp$.

Section 6.5

6.5.4 Since $T$ is normal, we have $(T^2)'\circ T^2=(T'\circ T)^2$ so that the eigenvalues of $(T^2)'\circ T^2$ are those of $T'\circ T$ squared. This proves $\|T^2\|=\|T\|^2$ in particular. Likewise, $(T^m)'\circ T^m=(T'\circ T)^m$ so that the eigenvalues of $(T^m)'\circ T^m$ are those of $T'\circ T$ raised to the $m$th power. So $\|T^m\|=\|T\|^m$ is true in general.


6.5.8 Note that some special forms of this problem have appeared as Exercises 6.3.11 and 6.3.12. By the singular value decomposition for $A$ we may rewrite $A$ as $A=P\Sigma Q$ where $Q,P\in\mathbb{C}(n,n)$ are unitary and $\Sigma\in\mathbb{R}(n,n)$ is diagonal with diagonal entries that are all nonnegative. Alternatively, we also have $A=(PQ)(Q^\dagger\Sigma Q)=(P\Sigma P^\dagger)(PQ)$, as products of a unitary and a positive semi-definite matrix, expressed in two different orders.

Section 7.1

7.1.4 Since $g\neq 0$, we also have $f\neq 0$. Assume $f^n\mid g^n$. We show that $f\mid g$. If $n=1$, there is nothing to do. Assume $n\ge 2$. Let $h=\gcd(f,g)$. Then we have $f=hp$, $g=hq$, $p,q\in\mathcal{P}$, and $\gcd(p,q)=1$. If $p$ is a scalar, then $f\mid g$. Suppose otherwise that $p$ is not a scalar. We rewrite $f^n$ and $g^n$ as $f^n=h^np^n$ and $g^n=h^nq^n$. Since $f^n\mid g^n$, we have $g^n=f^nr$, where $r\in\mathcal{P}$. Hence $h^nq^n=h^np^nr$. Therefore $q^n=p^nr$. In particular, $p\mid q^n$. However, since $\gcd(p,q)=1$, we have $p\mid q^{n-1}$. Arguing repeatedly we arrive at $p\mid q$, which is a contradiction.

7.1.5 Let $h=\gcd(f,g)$. Then $f=hp$, $g=hq$, and $\gcd(p,q)=1$. Thus $f^n=h^np^n$, $g^n=h^nq^n$, and $\gcd(p^n,q^n)=1$, which implies $\gcd(f^n,g^n)=h^n$.

Section 7.2

7.2.1 If $p_S(\lambda)$ and $p_T(\lambda)$ are relatively prime, then there are polynomials $f,g$ such that $f(\lambda)p_S(\lambda)+g(\lambda)p_T(\lambda)=1$. Thus, since $p_T(T)=0$ by the Cayley–Hamilton theorem, $I=f(T)p_S(T)+g(T)p_T(T)=f(T)p_S(T)$, which implies $p_S(T)$ is invertible and $p_S(T)^{-1}=f(T)$. Similarly we see that $p_T(S)$ is also invertible.

7.2.2 Using the notation of the previous exercise, we have $I=f(T)p_S(T)$. Thus, applying $R\circ S=T\circ R$, we get

\[
R=f(T)p_S(T)\circ R=R\circ f(S)p_S(S)=0.
\]

7.2.3 It is clear that $N(T)\subset R(I-T)$. It is also clear that $R(I-T)\subset N(T)$ when $T^2=T$. So $N(T)=R(I-T)$ if $T^2=T$. Thus (7.2.14) follows from the rank equation $r(T)+n(T)=\dim(U)$. Conversely, assume (7.2.14) holds. From this and the rank equation again we get $r(I-T)=n(T)$. So $N(T)=R(I-T)$, which establishes $T^2=T$.

7.2.4 We have $f_1g_1+f_2g_2=1$ for some polynomials $f_1,f_2$. So

\[
I=f_1(T)g_1(T)+f_2(T)g_2(T). \tag{S22}
\]

As a consequence, for any $u\in U$, we have $u=v+w$ where $v=f_1(T)g_1(T)u$ and $w=f_2(T)g_2(T)u$. Hence $g_2(T)v=f_1(T)p_T(T)u=0$ and $g_1(T)w=f_2(T)p_T(T)u=0$. This shows $v\in N(g_2(T))$ and $w\in N(g_1(T))$. Therefore $U=N(g_1(T))+N(g_2(T))$. Pick $u\in N(g_1(T))\cap N(g_2(T))$. Applying (S22) to $u$ we see that $u=0$. Hence the problem follows.

7.2.6 We proceed by induction on $k$. If $k=1$, (7.2.16) follows from Exercise 7.2.3. Assume that at $k-1\ge 1$ the relation (7.2.16) holds. That is,

\[
U=R(T_1)\oplus\cdots\oplus R(T_{k-1})\oplus W,\quad W=N(T_1)\cap\cdots\cap N(T_{k-1}). \tag{S23}
\]

Now we have $U=R(T_k)\oplus N(T_k)$. We assert $N(T_k)=R(T_1)\oplus\cdots\oplus R(T_{k-1})\oplus V$. In fact, pick any $u\in N(T_k)$. Then (S23) indicates that $u=u_1+\cdots+u_{k-1}+w$ for $u_1\in R(T_1),\dots,u_{k-1}\in R(T_{k-1})$, $w\in W$. Since $T_k\circ T_i=0$ for $i=1,\dots,k-1$, we see that $u_1,\dots,u_{k-1}\in N(T_k)$. Hence $w\in N(T_k)$. So $w\in V$. This establishes the assertion and the problem follows.

Section 7.3

7.3.2 (ii) From (7.3.41) we see that, if we set

\[
p(\lambda)=\lambda^n-a_{n-1}\lambda^{n-1}-\cdots-a_1\lambda-a_0,
\]

then $p(T)(T^k(u))=T^k(p(T)(u))=0$ for $k=0,1,\dots,n-1$, where $T^0=I$. This establishes $p(T)=0$ since $\{T^{n-1}(u),\dots,T(u),u\}$ is a basis of $U$. So $p_T(\lambda)=p(\lambda)$. It is clear that $m_T(\lambda)=p_T(\lambda)$.

7.3.3 Let $u$ be a cyclic vector of $T$. Assume that $S$ and $T$ commute. Let $a_0,a_1,\dots,a_{n-1}\in\mathbb{F}$ be such that $S(u)=a_{n-1}T^{n-1}(u)+\cdots+a_1T(u)+a_0u$. Set $p(t)=a_{n-1}t^{n-1}+\cdots+a_1t+a_0$. Hence $S(T^k(u))=T^k(S(u))=(T^kp(T))(u)=p(T)(T^k(u))$ for $k=1,\dots,n-1$. This proves $S=p(T)$ since $u,T(u),\dots,T^{n-1}(u)$ form a basis of $U$.

7.3.4 (i) Let $u$ be a cyclic vector of $T$. Since $T$ is normal, $U$ has a basis consisting of eigenvectors, say $u_1, \ldots, u_n$, of $T$, associated with the corresponding eigenvalues, $\lambda_1, \ldots, \lambda_n$. Express $u$ as $u = a_1 u_1 + \cdots + a_n u_n$ for some $a_1, \ldots, a_n \in \mathbb{C}$. Hence

$$\begin{aligned} T(u) &= a_1\lambda_1 u_1 + \cdots + a_n\lambda_n u_n, \\ &\;\;\vdots \\ T^{n-1}(u) &= a_1\lambda_1^{n-1} u_1 + \cdots + a_n\lambda_n^{n-1} u_n. \end{aligned}$$

Inserting the above relations into the equation

$$x_1 u + x_2 T(u) + \cdots + x_n T^{n-1}(u) = 0, \tag{S24}$$

we obtain

$$\begin{cases} a_1(x_1 + \lambda_1 x_2 + \cdots + \lambda_1^{n-1} x_n) = 0, \\ a_2(x_1 + \lambda_2 x_2 + \cdots + \lambda_2^{n-1} x_n) = 0, \\ \qquad\vdots \\ a_n(x_1 + \lambda_n x_2 + \cdots + \lambda_n^{n-1} x_n) = 0. \end{cases} \tag{S25}$$

If $\lambda_1, \ldots, \lambda_n$ are not all distinct, then (S25) has a solution $(x_1, \ldots, x_n) \in \mathbb{C}^n$, which is not the zero vector for any given $a_1, \ldots, a_n$, contradicting the linear independence of the vectors $u, T(u), \ldots, T^{n-1}(u)$.

(ii) With (7.3.42), we consider the linear dependence of the vectors $u, T(u), \ldots, T^{n-1}(u)$ as in part (i) and come up with (S24) and (S25). Since $\lambda_1, \ldots, \lambda_n$ are distinct, in view of the Vandermonde determinant, we see that the system (S25) has only the zero solution $x_1 = 0, \ldots, x_n = 0$ if and only if $a_i \neq 0$ for every $i = 1, \ldots, n$.
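The criterion in parts (i) and (ii) can be tested numerically. The sketch below assumes $T$ is already written in its eigenbasis, so $T = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $u$ has coordinates $(a_1, \ldots, a_n)$; the function names `det`, `krylov_det` and `is_cyclic` are hypothetical helpers introduced only for this illustration. The determinant of the matrix with columns $u, T(u), \ldots, T^{n-1}(u)$ equals $a_1 \cdots a_n$ times the Vandermonde determinant of the $\lambda_i$.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result          # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return result

def krylov_det(eigs, coeffs):
    """det of the matrix with columns u, T(u), ..., T^{n-1}(u) in the eigenbasis.

    Entry (i, k) is a_i * lambda_i**k.
    """
    n = len(eigs)
    return det([[coeffs[i] * eigs[i] ** k for k in range(n)] for i in range(n)])

def is_cyclic(eigs, coeffs):
    return krylov_det(eigs, coeffs) != 0

print(is_cyclic([1, 2, 3], [1, 1, 1]))   # distinct eigenvalues, all a_i nonzero
print(is_cyclic([1, 2, 3], [1, 0, 1]))   # one coefficient vanishes
print(is_cyclic([1, 1, 3], [1, 1, 1]))   # repeated eigenvalue
```

Only the first call reports a cyclic vector, in line with the two conditions: distinct eigenvalues and no vanishing coefficient.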

7.3.5 Assume there is such an $S$. Then $S$ is nilpotent of degree $m$ where $m$ satisfies $2(n - 1) < m \leq 2n$ since $S^{2(n-1)} = T^{n-1} \neq 0$ and $S^{2n} = T^n = 0$. By Theorem 2.22, we arrive at $2(n - 1) < m \leq n$, which is false.

Section 7.4

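7.4.5 Let the invertible matrix $C \in \mathbb{C}(n, n)$ be decomposed as $C = P + iQ$, where $P, Q \in \mathbb{R}(n, n)$. If $Q = 0$, there is nothing to show. Assume $Q \neq 0$. Then from $CA = BC$ we have $PA = BP$ and $QA = BQ$. Thus, for any real number $\lambda$, we have $(P + \lambda Q)A = B(P + \lambda Q)$. It is clear that there is some $\lambda \in \mathbb{R}$ such that $\det(P + \lambda Q) \neq 0$, because $\det(P + \lambda Q)$ is a polynomial in $\lambda$ which does not vanish at $\lambda = i$ and so has only finitely many roots. For such $\lambda$, set $K = P + \lambda Q$. Then $A = K^{-1}BK$.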
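The choice of $\lambda$ can be made concrete. Since $\det(P + \lambda Q)$ is a nonzero polynomial of degree at most $n$, at least one of the $n + 1$ values $\lambda = 0, 1, \ldots, n$ must work. The Python sketch below tries these values in turn; the function `real_intertwiner` and the sample matrices are assumptions chosen for illustration, with $C = P + iQ$ invertible although neither $P$ nor $Q$ is.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return result

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def real_intertwiner(P, Q):
    """Return (lam, K) with K = P + lam*Q real and invertible."""
    n = len(P)
    for lam in range(n + 1):   # a nonzero polynomial of degree <= n has <= n roots
        K = [[P[i][j] + lam * Q[i][j] for j in range(n)] for i in range(n)]
        if det(K) != 0:
            return lam, K
    raise ValueError("det(P + lam*Q) vanishes identically, so P + iQ is singular")

# Sample data: C = P + iQ = [[i, -i], [0, 1]] satisfies C A = B C.
A = [[1, 1], [0, 2]]
B = [[1, 0], [0, 2]]
P = [[0, 0], [0, 1]]
Q = [[1, -1], [0, 0]]

lam, K = real_intertwiner(P, Q)
print(lam, K, matmul(K, A) == matmul(B, K))
```

Here $\lambda = 0$ fails because $P$ is singular, and $\lambda = 1$ gives an invertible real $K$ with $KA = BK$.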

7.4.9 (i) Let $u \in \mathbb{C}^n \setminus \{0\}$ be an eigenvector associated with the eigenvalue $\lambda$. Then $Au = \lambda u$. Thus $A^k u = \lambda^k u$. Since there is an invertible matrix $B \in \mathbb{C}(n, n)$ such that $A^k = B^{-1}AB$, we obtain $B^{-1}ABu = \lambda^k u$ or $A(Bu) = \lambda^k (Bu)$, so that the problem follows.

(ii) From (i) we see that if $\lambda \in \mathbb{C}$ is an eigenvalue then $\lambda^k, \lambda^{k^2}, \ldots, \lambda^{k^l}$ are all eigenvalues of $A$, which cannot all be distinct when $l$ is large enough. So there are some integers $1 \leq l < m$ such that $\lambda^{k^l} = \lambda^{k^m}$. Since $A$ is nonsingular, $\lambda \neq 0$. Hence $\lambda$ satisfies $\lambda^{k^m - k^l} = 1$ as asserted.

7.4.11 Use (3.4.37) without assuming $\det(A) \neq 0$ or (S16).

7.4.12 Take $u = (1, \ldots, 1)^t \in \mathbb{R}^n$. It is clear that $Au = nu$. It is also clear that $n(A) = n - 1$ since $r(A) = 1$. So $n(A - nI_n) = 1$ and $A \sim \mathrm{diag}\{n, 0, \ldots, 0\}$. Take $v = \left(1, \frac{b_2}{n}, \ldots, \frac{b_n}{n}\right)^t \in \mathbb{R}^n$. Then $Bv = nv$. Since $r(B) = 1$, we get $n(B) = n - 1$. So $n(B - nI_n) = 1$. Consequently, $B \sim \mathrm{diag}\{n, 0, \ldots, 0\}$. Thus $A \sim B$.

7.4.17 Suppose otherwise that there is an $A \in \mathbb{R}(3, 3)$ such that $m(\lambda) = \lambda^2 + 3\lambda + 4$ is the minimal polynomial of $A$. Let $p_A(\lambda)$ be the characteristic polynomial of $A$. Since $p_A(\lambda) \in \mathcal{P}_3$ and the coefficients of $p_A(\lambda)$ are all real, $p_A(\lambda)$ has a real root. On the other hand, recall that the roots of $m(\lambda)$ are all the roots of $p_A(\lambda)$, but the former has no real root. So we arrive at a contradiction.

7.4.18 (i) Let $m_A(\lambda)$ be the minimal polynomial of $A$. Then $m_A(\lambda) \mid \lambda^2 + 1$. However, $\lambda^2 + 1$ is prime over $\mathbb{R}$, so $m_A(\lambda) = \lambda^2 + 1$.

(ii) Let $p_A(\lambda)$ be the characteristic polynomial of $A$. Then the degree of $p_A(\lambda)$ is $n$. The polynomial $m_A(\lambda) = \lambda^2 + 1$ contains all the roots of $p_A(\lambda)$ in $\mathbb{C}$, which are $\pm i$, and these must appear in conjugate pairs because $p_A(\lambda)$ has real coefficients. So $p_A(\lambda) = (\lambda^2 + 1)^m$ for some integer $m \geq 1$. Hence $n = 2m$.

(iii) Since $p_A(\lambda) = (\lambda - i)^m(\lambda + i)^m$ and $m_A(\lambda) = (\lambda - i)(\lambda + i)$ (i.e., $\pm i$ are single roots of $m_A(\lambda)$), we know that

$$N(A - iI_n) = \{x \in \mathbb{C}^n \mid Ax = ix\}, \quad N(A + iI_n) = \{y \in \mathbb{C}^n \mid Ay = -iy\},$$

are both of dimension $m$ in $\mathbb{C}^n$ and $\mathbb{C}^n = N(A - iI_n) \oplus N(A + iI_n)$. Since $A$ is real, we see that if $x \in N(A - iI_n)$ then $\overline{x} \in N(A + iI_n)$, and vice versa. Moreover, if $\{w_1, \ldots, w_m\}$ is a basis of $N(A - iI_n)$, then $\{\overline{w}_1, \ldots, \overline{w}_m\}$ is a basis of $N(A + iI_n)$, and vice versa. Thus $\{w_1, \ldots, w_m, \overline{w}_1, \ldots, \overline{w}_m\}$ is a basis of $\mathbb{C}^n$. We now make the decomposition

$$w_i = u_i + iv_i, \quad u_i, v_i \in \mathbb{R}^n, \quad i = 1, \ldots, m. \tag{S26}$$

Then $u_i, v_i$ ($i = 1, \ldots, m$) satisfy (7.4.23). It remains to show that these vectors are linearly independent in $\mathbb{R}^n$. In fact, consider

$$\sum_{i=1}^m a_i u_i + \sum_{i=1}^m b_i v_i = 0, \quad a_1, \ldots, a_m, b_1, \ldots, b_m \in \mathbb{R}. \tag{S27}$$

From (S26) we have

$$u_i = \frac{1}{2}(w_i + \overline{w}_i), \quad v_i = \frac{1}{2i}(w_i - \overline{w}_i), \quad i = 1, \ldots, m. \tag{S28}$$

Inserting (S28) into (S27), we obtain

$$\sum_{i=1}^m (a_i - ib_i) w_i + \sum_{i=1}^m (a_i + ib_i) \overline{w}_i = 0,$$

which leads to $a_i = b_i = 0$ for all $i = 1, \ldots, m$.

(iv) Take the ordered basis $\mathcal{B} = \{u_1, \ldots, u_m, v_1, \ldots, v_m\}$. Then it is seen that, with respect to $\mathcal{B}$, the matrix representation of the mapping $T_A \in L(\mathbb{R}^n)$ defined by $T_A(u) = Au$, $u \in \mathbb{R}^n$, is simply

$$C = \begin{pmatrix} 0 & I_m \\ -I_m & 0 \end{pmatrix}.$$

More precisely, if a matrix called $B$ is formed by using the vectors in the ordered basis $\mathcal{B}$ as its first, second, \ldots, and the $n$th column vectors, then $AB = BC$, which establishes (7.4.24).
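A small worked instance may help make parts (iii) and (iv) tangible. The Python sketch below uses a hypothetical $2 \times 2$ real matrix $A$ with $A^2 = -I_2$ (so $m = 1$), an eigenvector $w$ for the eigenvalue $i$ found by hand, and its real and imaginary parts $u, v$. It checks the relations $Au = -v$, $Av = u$ that follow from $Aw = iw$, and then $AB = BC$. All names and numbers are illustrative assumptions.

```python
def matvec(M, x):
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

# A real matrix with A^2 = -I (sample data, not taken from the text).
A = [[1, -2], [1, -1]]

# w = (2, 1 - i) is an eigenvector of A for the eigenvalue i.
w = [2 + 0j, 1 - 1j]

# Real and imaginary parts: w = u + i v.
u = [int(z.real) for z in w]   # (2, 1)
v = [int(z.imag) for z in w]   # (0, -1)

# B has u, v as its columns; C is the block matrix from part (iv) with m = 1.
B = [[u[0], v[0]], [u[1], v[1]]]
C = [[0, 1], [-1, 0]]

print(matmul(A, A))                    # equals -I
print(matvec(A, u), matvec(A, v))      # equals -v and u
print(matmul(A, B) == matmul(B, C))    # the relation AB = BC
```

The same recipe applies for larger $m$ once a basis of $N(A - iI_n)$ is known: stack the real parts first, then the imaginary parts, as the columns of $B$.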

Section 8.1

8.1.5 If $A$ is normal, there is a unitary matrix $P \in \mathbb{C}(n, n)$ such that $A = P^\dagger DP$, where $D$ is a diagonal matrix of the form $\mathrm{diag}\{\lambda_1, \ldots, \lambda_n\}$ with $\lambda_1, \ldots, \lambda_n$ the eigenvalues of $A$, which are assumed to be real. Thus $A^\dagger = A$. However, because $A$ is real, we have $A = A^t$.

8.1.6 We proceed inductively on $\dim(U)$. If $\dim(U) = 1$, there is nothing to show. Assume that the problem is true at $\dim(U) = n - 1 \geq 1$. We prove the conclusion at $\dim(U) = n \geq 2$. Let $\lambda \in \mathbb{C}$ be an eigenvalue of $T$ and $E_\lambda$ the associated eigenspace of $T$. Then for $u \in E_\lambda$ we have $T(S(u)) = S(T(u)) = \lambda S(u)$. Hence $S(u) \in E_\lambda$. So $E_\lambda$ is invariant under $S$. As an element in $L(E_\lambda)$, $S$ has an eigenvalue $\mu \in \mathbb{C}$. Let $u \in E_\lambda$ be an eigenvector of $S$ associated with $\mu$. Then $u$ is a common eigenvector of $S$ and $T$. Applying this observation to $S'$ and $T'$, since $S', T'$ commute as well, we know that $S'$ and $T'$ also have a common eigenvector, say $w$, satisfying $S'(w) = \sigma w$, $T'(w) = \gamma w$, for some $\sigma, \gamma \in \mathbb{C}$.

Let $V = (\mathrm{Span}\{w\})^\perp$. Then $V$ is invariant under $S$ and $T$ as can be seen from

$$(w, S(v)) = (S'(w), v) = (\sigma w, v) = \overline{\sigma}(w, v) = 0,$$

$$(w, T(v)) = (T'(w), v) = (\gamma w, v) = \overline{\gamma}(w, v) = 0,$$

for $v \in V$. Since $\dim(V) = n - 1$, we may find an orthonormal basis of $V$, say $\{u_1, \ldots, u_{n-1}\}$, under which the matrix representations of $S$ and $T$ are upper triangular. Let $u_n = w/\|w\|$. Then $\{u_1, \ldots, u_{n-1}, u_n\}$ is an orthonormal basis of $U$ under which the matrix representations of $S$ and $T$ are upper triangular.

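8.1.9 Let $\lambda \in \mathbb{C}$ be any eigenvalue of $T$ and $v \in U$ an associated eigenvector. Then we have

$$((T' - \overline{\lambda}I)(u), v) = (u, (T - \lambda I)(v)) = 0, \quad u \in U, \tag{S29}$$

which implies that $R(T' - \overline{\lambda}I) \subset (\mathrm{Span}\{v\})^\perp$. Therefore $r(T' - \overline{\lambda}I) \leq n - 1$. Thus, in view of the rank equation, we obtain $n(T' - \overline{\lambda}I) \geq 1$. In other words, this shows that $\overline{\lambda}$ must be an eigenvalue of $T'$.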

Section 8.2

8.2.3 Let $\mathcal{B} = \{u_1, \ldots, u_n\}$ be a basis of $U$ and $x, y \in \mathbb{F}^n$ the coordinate vectors of $u, v \in U$ with respect to $\mathcal{B}$. With $A = (a_{ij}) = (f(u_i, u_j))$ and (8.2.2), we see that $u \in U_0$ if and only if $x^t Ay = 0$ for all $y \in \mathbb{F}^n$ or $Ax = 0$. In other words, $u \in U_0$ if and only if $x \in N(A)$. So $\dim(U_0) = n(A) = n - r(A) = \dim(U) - r(A)$.

8.2.4 (i) As in the previous exercise, we use $\mathcal{B} = \{u_1, \ldots, u_n\}$ to denote a basis of $U$ and $x, y \in \mathbb{F}^n$ the coordinate vectors of any vectors $u, v \in U$ with respect to $\mathcal{B}$. With $A = (a_{ij}) = (f(u_i, u_j))$ and (8.2.2), we see that $u \in V^\perp$ if and only if

$$(Ax)^t y = 0, \quad v \in V. \tag{S30}$$

Let $\dim(V) = m$. We can find $m$ linearly independent vectors $y^{(1)}, \ldots, y^{(m)}$ in $\mathbb{F}^n$ to replace the condition (S30) by

$$(y^{(1)})^t (Ax) = 0, \quad \ldots, \quad (y^{(m)})^t (Ax) = 0.$$

These equations indicate that, if we use $B$ to denote the matrix formed by taking $(y^{(1)})^t, \ldots, (y^{(m)})^t$ as its first, \ldots, and $m$th row vectors, then $Ax \in N(B) = \{z \in \mathbb{F}^n \mid Bz = 0\}$. In other words, the subspace of $\mathbb{F}^n$ consisting of the coordinate vectors of the vectors in $V^\perp$ is given by $X = \{x \in \mathbb{F}^n \mid Ax \in N(B)\}$. Since $A$ is invertible, we have $\dim(X) = \dim(N(B)) = n(B) = n - r(B) = n - m$. This establishes $\dim(V^\perp) = \dim(X) = \dim(U) - \dim(V)$.

(ii) For v ∈ V , we have f (u, v) = 0 for any u ∈ V ⊥. So V ⊂ (V ⊥)⊥.On the other hand, from (i), we get

dim(V )+ dim(V ⊥) = dim(V ⊥)+ dim((V ⊥)⊥).

So dim(V ) = dim((V ⊥)⊥) which implies V = (V ⊥)⊥.

8.2.7 If $V = V^\perp$, then $V$ is isotropic and $\dim(V) = \frac{1}{2}\dim(U)$ in view of Exercise 8.2.4. Thus $V$ is Lagrangian. Conversely, if $V$ is Lagrangian, then $V$ is isotropic such that $V \subset V^\perp$ and $\dim(V) = \frac{1}{2}\dim(U)$. From Exercise 8.2.4, we have $\dim(V^\perp) = \frac{1}{2}\dim(U) = \dim(V)$. So $V = V^\perp$.

Section 8.3

8.3.1 Let $u = (a_i) \in \mathbb{R}^n$ be a positive eigenvector associated to $r$. Then the $i$th component of the relation $Au = ru$ reads

$$ra_i = \sum_{j=1}^n a_{ij} a_j, \quad i = 1, \ldots, n. \tag{S31}$$

Choose $k, l = 1, \ldots, n$ such that

$$a_k = \min\{a_i \mid i = 1, \ldots, n\}, \quad a_l = \max\{a_i \mid i = 1, \ldots, n\}.$$

Inserting these into (S31) we find

$$ra_k \geq a_k \sum_{j=1}^n a_{kj}, \quad ra_l \leq a_l \sum_{j=1}^n a_{lj}.$$

From these we see that the bounds stated in (8.3.27) follow.

8.3.3 Use the notation $\Lambda_A = \{\lambda \in \mathbb{R} \mid \lambda \geq 0, Ax \geq \lambda x \text{ for some } x \in S\}$, where $S$ is defined by (8.3.3). Recall the construction (8.3.7). We see that $r_A = \sup\{\lambda \in \Lambda_A\}$. Since $A \leq B$ implies $\Lambda_A \subset \Lambda_B$, we deduce $r_A \leq r_B$.
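Both conclusions, the row-sum bounds that the argument of Exercise 8.3.1 produces and the monotonicity $r_A \leq r_B$ of Exercise 8.3.3, can be observed numerically. The Python sketch below approximates the dominant eigenvalue of a positive matrix by plain power iteration, which is a standard numerical technique and not the construction used in the text; the function names and the two sample matrices are assumptions made for illustration.

```python
def perron_root(A, iterations=200):
    """Approximate the dominant eigenvalue of a positive matrix by power iteration."""
    n = len(A)
    x = [1.0] * n                      # positive start vector with max entry 1
    r = 0.0
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        r = max(y)                     # growth factor, since max(x) == 1
        x = [c / r for c in y]         # renormalize so that max(x) == 1 again
    return r

def row_sum_bounds(A):
    sums = [sum(row) for row in A]
    return min(sums), max(sums)

A = [[2, 1], [1, 3]]   # sample positive matrix
B = [[2, 2], [1, 3]]   # entrywise A <= B

rA, rB = perron_root(A), perron_root(B)
lo, hi = row_sum_bounds(A)
print(lo, rA, hi)      # smallest row sum, dominant eigenvalue, largest row sum
print(rA <= rB)
```

For the sample $A$ the dominant eigenvalue is $(5 + \sqrt{5})/2$, which lies between the row sums 3 and 4, and for $B$ it is 4.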

Section 8.4

8.4.4 From $\lim_{m\to\infty} A^m = K$, we obtain $\lim_{m\to\infty} (A^t)^m = \lim_{m\to\infty} (A^m)^t = K^t$. Since all the row vectors of $K$ and $K^t$ are identical, we see that all entries of $K$ are identical. By the condition (8.4.13), we deduce (8.4.24).


8.4.5 It is clear that all the entries of $A$ and $A^t$ are non-negative. It remains to show that 1 and $u = (1, \ldots, 1)^t \in \mathbb{R}^n$ are an eigenvalue and eigenvector pair of both $A$ and $A^t$. In fact, applying $A_i u = u$ and $A_i^t u = u$ ($i = 1, \ldots, k$) consecutively, we obtain $Au = A_1 \cdots A_k u = u$ and $A^t u = A_k^t \cdots A_1^t u = u$, respectively.
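The argument says that a product of non-negative matrices whose rows and columns each sum to 1 again has this property. A short exact check in Python, with two sample factors chosen only for illustration and a hypothetical helper `rows_and_columns_sum_to_one`, might look as follows.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def rows_and_columns_sum_to_one(M):
    """True when M is entrywise non-negative with all row and column sums equal to 1."""
    nonneg = all(x >= 0 for row in M for x in row)
    rows = all(sum(row) == 1 for row in M)          # M u = u
    cols = all(sum(col) == 1 for col in zip(*M))    # M^t u = u
    return nonneg and rows and cols

A1 = [[F(1, 4), F(3, 4)], [F(3, 4), F(1, 4)]]
A2 = [[F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]]

A = matmul(A1, A2)
print(A)
print(rows_and_columns_sum_to_one(A))
```

Exact fractions are used so that the row and column sums can be compared with 1 without rounding error.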

Section 9.3

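9.3.2 (i) In the uniform state (9.3.30), we have

$$\langle A \rangle = \frac{1}{n}\sum_{i=1}^n \lambda_i, \quad \langle A^2 \rangle = \frac{1}{n}\sum_{i=1}^n \lambda_i^2.$$

Hence the uncertainty $\sigma_A$ of the observable $A$ in the state (9.3.30) is given by the formula

$$\sigma_A^2 = \langle A^2 \rangle - \langle A \rangle^2 = \frac{1}{n}\sum_{i=1}^n \lambda_i^2 - \left(\frac{1}{n}\sum_{i=1}^n \lambda_i\right)^2. \tag{S32}$$

(ii) From (9.3.25) and (S32), we obtain the comparison

$$\frac{1}{n}\sum_{i=1}^n \lambda_i^2 - \left(\frac{1}{n}\sum_{i=1}^n \lambda_i\right)^2 \leq \frac{1}{4}(\lambda_{\max} - \lambda_{\min})^2.$$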
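The comparison in part (ii) is easy to test on sample spectra. The following Python sketch evaluates both sides exactly for integer eigenvalues; the function names and the sample lists are illustrative assumptions.

```python
from fractions import Fraction

def uniform_state_variance(eigs):
    """Left-hand side: mean of squares minus square of the mean, as in (S32)."""
    n = len(eigs)
    mean = Fraction(sum(eigs), n)
    mean_sq = Fraction(sum(x * x for x in eigs), n)
    return mean_sq - mean ** 2

def spread_bound(eigs):
    """Right-hand side: one quarter of the squared spread of the eigenvalues."""
    return Fraction((max(eigs) - min(eigs)) ** 2, 4)

for eigs in ([1, 2, 4], [-1, 1], [0, 0, 5, 5], [3, 3, 3]):
    lhs, rhs = uniform_state_variance(eigs), spread_bound(eigs)
    print(eigs, lhs, rhs, lhs <= rhs)
```

Equality occurs in the samples where half of the eigenvalues sit at each end of the spectrum, such as $\{-1, 1\}$.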

Bibliographic notes

We end the book by mentioning a few important but more specialized subjects that are not touched on in this book. We point out only some relevant references for the interested reader.

Convex sets. In Lang [23] basic properties and characterizations of convex sets in $\mathbb{R}^n$ are presented. For a deeper study of convex sets using advanced tools such as the Hahn–Banach theorem see Lax [25].

Tensor products and alternating forms. These topics are covered elegantly by Halmos [18]. In particular, there, the determinant is seen to arise as the unique scalar, associated with each linear mapping, defined by the one-dimensional space of top-degree alternating forms, over a finite-dimensional vector space.

Minmax principle for computing the eigenvalues of self-adjoint mappings. This is a classical variational resolution of the eigenvalue problem known as the method of the Rayleigh–Ritz quotients. For a thorough treatment see Bellman [6], Lancaster and Tismenetsky [22], and Lax [25].

Calculus of matrix-valued functions. These techniques are useful and powerful in applications. For an introduction see Lax [25].

Irreducible matrices. Such a notion is crucial for extending the Perron–Frobenius theorem and for exploring the Markov matrices further under more relaxed conditions. See Berman and Plemmons [7], Horn and Johnson [21], Lancaster and Tismenetsky [22], Meyer [29], and Xu [38] for related studies.

Transformation groups and bilinear forms. Given a non-degenerate bilinear form over a finite-dimensional vector space, the set of all linear mappings on the space which preserve the bilinear form is a group under the operation of composition. With a specific choice of the bilinear form, a particular such transformation group may thus be constructed and investigated. For a concise introduction to this subject in the context of linear algebra see Hoffman and Kunze [19].


Computational methods for solving linear systems. Practical methods for solving systems of linear equations are well investigated and documented. See Lax [25], Golub and Ortega [13], and Stoer and Bulirsch [32] for a description of some of the methods.

Computing the eigenvalues of symmetric matrices. This is a much developed subject and many nice methods are available. See Lax [25] for methods based on the QR factorization and differential flows. See also Stoer and Bulirsch [32].

Computing the eigenvalues of general matrices. Some nicely formulated iterative methods may be employed to approximate the eigenvalues of a general matrix under certain conditions. These methods include the QR convergence algorithm and the power method. See Golub and Ortega [13] and Stoer and Bulirsch [32].

Random matrices. The study of random matrices, the matrices whose entries are random variables, was pioneered by Wigner [36, 37] to model the spectra of large atoms and has recently become the focus of active mathematical research. See Akemann, Baik, and Di Francesco [1], Anderson, Guionnet, and Zeitouni [2], Mehta [28], and Tao [33], for textbooks, and Beenakker [5], Diaconis [12], and Guhr, Müller-Groeling, and Weidenmüller [17], for survey articles.

Besides, for a rich variety of applications of Linear Algebra and its related studies, see Bai, Fang, and Liang [3], Bapat [4], Bellman [6], Berman and Plemmons [7], Berry, Dumais, and O’Brien [8], Brualdi and Ryser [9], Datta [10], Davis [11], Gomide et al. [14], Graham [15], Graybill [16], Horadam [20], Latouche and Vaidyanathan [24], Leontief [26], Lyubich, Akin, Vulis, and Karpov [27], Meyn and Tweedie [30], Stinson [31], Taubes [34], Van Dooren and Wyman [35], and references therein.

References

[1] G. Akemann, J. Baik, and P. Di Francesco, The Oxford Handbook of Random Matrix Theory, Oxford University Press, Oxford, 2011.

[2] G. W. Anderson, A. Guionnet, and O. Zeitouni, An Introduction to Random Matrices, Cambridge University Press, Cambridge, 2010.

[3] Z. Bai, Z. Fang, and Y.-C. Liang, Spectral Theory of Large Dimensional Random Matrices and Its Applications to Wireless Communications and Finance Statistics, World Scientific, Singapore, 2014.

[4] R. B. Bapat, Linear Algebra and Linear Models, 3rd edn, Universitext, Springer-Verlag and Hindustan Book Agency, New Delhi, 2012.

[5] C. W. J. Beenakker, Random-matrix theory of quantum transport, Reviews of Modern Physics 69 (1997) 731–808.

[6] R. Bellman, Introduction to Matrix Analysis, 2nd edn, Society for Industrial and Applied Mathematics, Philadelphia, 1997.

[7] A. Berman and R. J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Society for Industrial and Applied Mathematics, Philadelphia, 1994.

[8] M. Berry, S. Dumais, and G. O’Brien, Using linear algebra for intelligent information retrieval, SIAM Review 37 (1995) 573–595.

[9] R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory, Encyclopedia of Mathematics and its Applications 39, Cambridge University Press, Cambridge, 1991.

[10] B. N. Datta, Numerical Linear Algebra and Applications, 2nd edn, Society for Industrial and Applied Mathematics, Philadelphia, 2010.

[11] E. Davis, Linear Algebra and Probability for Computer Science Applications, A. K. Peters/CRC Press, Boca Raton, FL, 2012.

[12] P. Diaconis, Patterns in eigenvalues: the 70th Josiah Willard Gibbs lecture, Bulletin of American Mathematical Society (New Series) 40 (2003) 155–178.

[13] G. H. Golub and J. M. Ortega, Scientific Computing and Differential Equations, Academic Press, Boston and New York, 1992.

[14] J. Gomide, R. Melo-Minardi, M. A. dos Santos, G. Neshich, W. Meira, Jr., J. C. Lopes, and M. Santoro, Using linear algebra for protein structural comparison and classification, Genetics and Molecular Biology 32 (2009) 645–651.

[15] A. Graham, Nonnegative Matrices and Applicable Topics in Linear Algebra, John Wiley & Sons, New York, 1987.


[16] F. A. Graybill, Introduction to Matrices with Applications in Statistics, Wadsworth Publishing Company, Belmont, CA, 1969.

[17] T. Guhr, A. Müller-Groeling, and H. A. Weidenmüller, Random-matrix theories in quantum physics: common concepts, Physics Reports 299 (1998) 189–425.

[18] P. R. Halmos, Finite-Dimensional Vector Spaces, 2nd edn, Springer-Verlag, New York, 1987.

[19] K. Hoffman and R. Kunze, Linear Algebra, Prentice-Hall, Englewood Cliffs, NJ, 1965.

[20] K. J. Horadam, Hadamard Matrices and Their Applications, Princeton University Press, Princeton, NJ, 2007.

[21] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, New York, and Melbourne, 1985.

[22] P. Lancaster and M. Tismenetsky, The Theory of Matrices, 2nd edn, Academic Press, San Diego, New York, London, Sydney, and Tokyo, 1985.

[23] S. Lang, Linear Algebra, 3rd edn, Springer-Verlag, New York, 1987.

[24] G. Latouche and R. Vaidyanathan, Introduction to Matrix Analytic Methods in Stochastic Modeling, Society for Industrial and Applied Mathematics, Philadelphia, 1999.

[25] P. D. Lax, Linear Algebra and Its Applications, John Wiley & Sons, Hoboken, NJ, 2007.

[26] W. Leontief, Input-Output Economics, Oxford University Press, New York, 1986.

[27] Y. I. Lyubich, E. Akin, D. Vulis, and A. Karpov, Mathematical Structures in Population Genetics, Springer-Verlag, New York, 2011.

[28] M. L. Mehta, Random Matrices, Elsevier Academic Press, Amsterdam, 2004.

[29] C. Meyer, Matrix Analysis and Applied Linear Algebra, Society for Industrial and Applied Mathematics, Philadelphia, 2000.

[30] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993; 2nd edn, Cambridge University Press, Cambridge, 2009.

[31] D. R. Stinson, Cryptography, Discrete Mathematics and its Applications, Chapman & Hall/CRC Press, Boca Raton, FL, 2005.

[32] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag, New York, Heidelberg, and Berlin, 1980.

[33] T. Tao, Topics in Random Matrix Theory, American Mathematical Society, Providence, RI, 2012.

[34] C. H. Taubes, Lecture Notes on Probability, Statistics, and Linear Algebra, Department of Mathematics, Harvard University, Cambridge, MA, 2010.

[35] P. Van Dooren and B. Wyman, Linear Algebra for Control Theory, IMA Volumes in Mathematics and its Applications, Springer-Verlag, New York, 2011.

[36] E. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Annals of Mathematics 62 (1955) 548–564.

[37] E. Wigner, On the distribution of the roots of certain symmetric matrices, Annals of Mathematics 67 (1958) 325–327.

[38] Y. Xu, Linear Algebra and Matrix Theory (in Chinese), 2nd edn, Higher Education Press, Beijing, 2008.

Index

1-form, 16

addition, 3
adjoint mapping, 50, 122
adjoint matrix, 103
adjugate matrix, 103
adjunct matrix, 103
algebraic multiplicity, 216
algebraic number, 13
angular frequencies, 255
annihilating polynomials, 221
annihilator, 19
anti-Hermitian mappings, 195
anti-Hermitian matrices, 7
anti-lower triangular matrix, 99
anti-self-adjoint, 126
anti-self-adjoint mappings, 195
anti-self-dual, 126
anti-symmetric, 4
anti-symmetric forms, 230
anti-upper triangular matrix, 99

basis, 13
basis change matrix, 15
basis orthogonalization, 117
basis transition matrix, 15, 45
Bessel inequality, 140
bilinear form, 147, 180
bilinear forms, 147
boxed diagonal matrix, 57
boxed lower triangular form, 99
boxed upper triangular form, 98
boxed upper triangular matrix, 56
bracket, 248

Cayley–Hamilton theorem, 110
characteristic of a field, 2
characteristic polynomial, 107
characteristic polynomial of linear mapping, 109
characteristic roots, 107
Cholesky decomposition theorem, 168
Cholesky relation, 168
co-prime, 207
codimension, 27
cofactor, 89
cofactor expansion, 89
cokernel, 39
column rank of a matrix, 52
commutator, 256
complement, 22
complete set of orthogonal vectors, 139
component, 4
composition of mappings, 37
congruent matrices, 148
contravariant vectors, 18
convergent sequence, 28
convex, 164
coordinate vector, 14
coordinates, 14
covariant vectors, 18
Cramer’s formulas, 80, 104
Cramer’s rule, 80, 104
cyclic vector, 62, 217
cyclic vector space, 217

Darboux basis, 235
degree, 86
degree of a nilpotent mapping, 62
dense, 72
determinant, 79
determinant of a linear mapping, 105
diagonal matrix, 6
diagonalizable, 222
diagonally dominant condition, 101
dimension, 13
direct product of vector spaces, 22
direct sum, 21
direct sum of mappings, 58
dominant, 237
dot product, 5
doubly Markov matrices, 247
dual basis, 17
dual mapping, 122
dual space, 16

eigenmodes, 255
eigenspace, 57
eigenvalue, 57
eigenvector, 57
Einstein formula, 255
entry, 4
equivalence of norms, 30
equivalence relation, 48
equivalent polynomials, 207
Euclidean scalar product, 124, 127

Fermat’s Little Theorem, 267
field, 1
field of characteristic 0, 2
field of characteristic p, 2
finite dimensional, 13
finite dimensionality, 13
finitely generated, 13
form, 16
Fourier coefficients, 139
Fourier expansion, 139
Fredholm alternative, 53, 137, 178
Frobenius inequality, 44
functional, 16
Fundamental Theorem of Algebra, 85

generalized eigenvectors, 216
generalized Schwarz inequality, 194
generated, 8
generic property, 72
geometric multiplicity, 57
Gram matrix, 136
Gram–Schmidt procedure, 117
great common divisor, 207
Gronwall inequality, 263

Hamiltonian, 254
hedgehog map, 88
Heisenberg equation, 263
Heisenberg picture, 262
Heisenberg uncertainty principle, 258
Hermitian congruent, 181
Hermitian conjugate, 7, 134
Hermitian matrices, 7
Hermitian matrix, 134
Hermitian scalar product, 127
Hermitian sesquilinear form, 182
Hermitian skewsymmetric forms, 236
Hermitian symmetric matrices, 7
Hilbert–Schmidt norm of a matrix, 137
homogeneous, 148
hyponormal mappings, 199

ideal, 205
idempotent linear mappings, 60
identity matrix, 6
image, 38
indefinite, 161
index, 83
index of negativity, 119
index of nilpotence, 62
index of nullity, 119
index of positivity, 119
infinite dimensional, 13
injective, 38
invariant subspace, 55
inverse mapping, 42
invertible linear mapping, 41
invertible matrix, 7
irreducible linear mapping, 56
irreducible polynomial, 207
isometric, 142
isometry, 142
isomorphic, 42
isomorphism, 42
isotropic subspace, 235

Jordan block, 220
Jordan canonical form, 220
Jordan decomposition theorem, 220
Jordan matrix, 220

kernel, 38

least squares approximation, 176
left inverse, 7, 42
Legendre polynomials, 141
Levy–Desplanques theorem, 101
limit, 28
linear complement, 22
linear function, 16
linear span, 8
linear transformation, 55
linearly dependent, 8, 9
linearly independent, 10
linearly spanned, 8
locally nilpotent mappings, 62
lower triangular matrix, 6

mapping addition, 35
Markov matrices, 243
matrix, 4
matrix exponential, 76
matrix multiplication, 5
matrix-valued initial value problem, 76
maximum uncertainty, 261
maximum uncertainty state, 261
Measurement postulate, 252
metric matrix, 136
minimal polynomial, 67, 221
Minkowski metric, 120
Minkowski scalar product, 120
Minkowski theorem, 101
minor, 89
mutually complementary, 22

negative definite, 161
negative semi-definite, 161
nilpotent mappings, 62
non-definite, 161
non-degenerate, 120
non-negative, 161
non-negative matrices, 237
non-negative vectors, 237
non-positive, 161
nonsingular matrix, 7
norm, 28
norm that is stronger, 29
normal equation, 176
normal mappings, 172, 194, 195
normal matrices, 197
normed space, 28
null vector, 115
null-space, 38
nullity, 39
nullity-rank equation, 40

observable postulate, 252
observables, 252
one-parameter group, 74
one-to-one, 38
onto, 38
open, 73
orthogonal, 115
orthogonal mapping, 123, 132
orthogonal matrices, 7
orthogonal matrix, 134
orthonormal basis, 119, 130

Parseval identity, 140
Pauli matrices, 257
period, 62
permissible column operations, 95
permissible row operations, 92
perpendicular, 115
Perron–Frobenius theorem, 237
photoelectric effect, 255
Planck constant, 254
polar decomposition, 198
polarization identities, 149
polarization identity, 132
positive definite Hermitian mapping, 188
positive definite Hermitian matrix, 188
positive definite quadratic form, 188
positive definite quadratic forms, 158
positive definite scalar product over a complex vector space, 128
positive definite scalar product over a real vector space, 128
positive definite self-adjoint mapping, 188
positive definite self-adjoint mappings, 158
positive definite symmetric matrices, 158
positive diagonally dominant condition, 101
positive matrices, 237
positive semi-definite, 161
positive vectors, 237
preimages, 38
prime polynomial, 207
principal minors, 166
product of matrices, 5
projection, 60
proper subspace, 8
Pythagoras theorem, 129

QR factorization, 134
quadratic form, 148, 181
quotient space, 26

random variable, 252
range, 38
rank, 39
rank equation, 40
rank of a matrix, 52
reducibility, 56
reducible linear mapping, 56
reflective, 19
reflectivity, 19
regular Markov matrices, 243
regular stochastic matrices, 243
relatively prime, 207
Riesz isomorphism, 122
Riesz representation theorem, 121
right inverse, 7, 42
row rank of a matrix, 52

scalar multiple of a functional, 16
scalar multiplication, 3
scalar product, 115
scalar-mapping multiplication, 35
scalars, 1, 3
Schrödinger equation, 254
Schrödinger picture, 262
Schur decomposition theorem, 226
Schwarz inequality, 129
self-adjoint mapping, 122
self-dual mapping, 122
self-dual space, 122
sesquilinear form, 180
shift matrix, 65
signed area, 79
signed volume, 80
similar matrices, 48
singular value decomposition for a mapping, 202
singular value decomposition for a matrix, 203
singular values of a mapping, 202
singular values of a matrix, 203
skewsymmetric, 4
skew-Hermitian forms, 236
skew-Hermitian matrices, 7
skewsymmetric forms, 230
special relativity, 120
square matrix, 4
stable Markov matrices, 247
stable matrix, 246
standard deviation, 259
State postulate, 252
state space, 252
states, 252
stochastic matrices, 243
subspace, 8
sum of functionals, 16
sum of subspaces, 20
surjective, 38
Sylvester inequality, 44
Sylvester theorem, 118
symmetric, 4
symmetric bilinear forms, 149
symplectic basis, 235
symplectic complement, 236
symplectic forms, 235
symplectic vector space, 235

time evolution postulate, 253
topological invariant, 82
transcendental number, 13
transpose, 4

uncertainty, 259
uniform state, 261
unit matrix, 6
unitary mapping, 132
unitary matrices, 7
unitary matrix, 134
upper triangular matrix, 6

Vandermonde determinant, 105
variance, 257
variational principle, 164
vector space, 1
vector space over a field, 3
vectors, 1, 3

wave function, 252
wave–particle duality hypothesis, 255
