
Introduction to Linear Algebra

FIFTH EDITION


LEE W. JOHNSON
R. DEAN RIESS
JIMMY T. ARNOLD

Virginia Polytechnic Institute and State University


Sponsoring Editor: Laurie Rosatone
Associate Production Supervisor: Julie LaChance
Marketing Manager: Michael Boezi
Manufacturing Buyer: Evelyn Beaton
Prepress Services Buyer: Caroline Fell
Senior Designer: Barbara T. Atkinson
Cover Designer: Barbara T. Atkinson
Cover Image: © EyeWire
Interior Designer: Sandra Rigney
Production Services: TechBooks
Composition and Art: Techsetters, Inc.

Library of Congress Cataloging-in-Publication Data

Johnson, Lee W.
Introduction to linear algebra / Lee W. Johnson, R. Dean Riess, Jimmy T. Arnold.—5th ed.

p. cm.
Includes index.

ISBN 0-201-65859-3 (alk. paper)
1. Algebra, Linear. I. Johnson, Lee W. II. Riess, R. Dean (Ronald Dean), 1940– III. Arnold, Jimmy T. (Jimmy Thomas), 1941– IV. Title.

QA184.J63 2001
512'.5—dc21    00-054308

Copyright © 2002 by Pearson Education, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.

Printed in the United States of America.

1 2 3 4 5 6 7 8 9 10-CRS-04 03 02 01


To our wives
Rochelle, Jan, and Linda


Preface

Linear algebra is an important component of undergraduate mathematics, particularly for students majoring in the scientific, engineering, and social science disciplines. At the practical level, matrix theory and the related vector-space concepts provide a language and a powerful computational framework for posing and solving important problems. Beyond this, elementary linear algebra is a valuable introduction to mathematical abstraction and logical reasoning because the theoretical development is self-contained, consistent, and accessible to most students.

Therefore, this book stresses both practical computation and theoretical principles and centers on the principal topics of the first four chapters:

matrix theory and systems of linear equations,
elementary vector-space concepts, and
the eigenvalue problem.

This core material can be used for a brief (10-week) course at the late-freshman/sophomore level. There is enough additional material in Chapters 5–7 either for a more advanced or a more leisurely paced course.

FEATURES

Our experience teaching freshman and sophomore linear algebra has led us to carefully choose the features of this text. Our approach is based on the way students learn and on the tools they need to be successful in linear algebra as well as in related courses.

We have found that students learn more effectively when the material has a consistent level of difficulty. Therefore, in Chapter 1, we provide early and meaningful coverage of topics such as linear combinations and linear independence. This approach helps the student negotiate what is usually a dramatic jump in level from solving systems of linear equations to working with concepts such as basis and spanning set.

Tools Students Need (When They Need Them)

The following examples illustrate how we provide students with the tools they need for success.

An early introduction to eigenvalues. In Chapter 3, elementary vector-space ideas (subspace, basis, dimension, and so on) are introduced in the familiar setting of Rn. Therefore, it is possible to cover the eigenvalue problem very early and in much greater depth than is usually possible. A brief introduction to determinants is given in Section 4.2 to facilitate the early treatment of eigenvalues.

An early introduction to linear combinations. In Section 1.5, we observe that the matrix-vector product Ax can be expressed as a linear combination of the columns of


A, Ax = x1A1 + x2A2 + · · · + xnAn. This viewpoint leads to a simple and natural development for the theory associated with systems of linear equations. For instance, the equation Ax = b is consistent if and only if b is expressible as a linear combination of the columns of A. Similarly, a consistent equation Ax = b has a unique solution if and only if the columns of A are linearly independent. This approach gives some early motivation for the vector-space concepts (introduced in Chapter 3) such as subspace, basis, and dimension. The approach also simplifies ideas such as rank and nullity (which are then naturally given in terms of dimension of appropriate subspaces).

Applications to different fields of study. Some applications are drawn from difference equations and differential equations. Other applications involve interpolation of data and least-squares approximations. In particular, students from a wide variety of disciplines have encountered problems of drawing curves that fit experimental or empirical data. Hence, they can appreciate techniques from linear algebra that can be applied to such problems.

Computer awareness. The increased accessibility of computers (especially personal computers) is beginning to affect linear algebra courses in much the same way as it has calculus courses. Accordingly, this text has somewhat of a numerical flavor, and (when it is appropriate) we comment on various aspects of solving linear algebra problems in a computer environment.

A Comfort in the Storm

We have attempted to provide the type of student support that will encourage success in linear algebra—one of the most important undergraduate mathematics courses that students take.

A gradual increase in the level of difficulty. In a typical linear algebra course, the students find the techniques of Gaussian elimination and matrix operations fairly easy. Then, the ensuing material relating to vector spaces is suddenly much harder. We do three things to lessen this abrupt midterm jump in difficulty:

1. We introduce linear independence early in Section 1.7.
2. We include a new Chapter 2, “Vectors in 2-Space and 3-Space.”
3. We first study vector space concepts such as subspace, basis, and dimension in Chapter 3, in the familiar geometrical setting of Rn.

Clarity of exposition. For many students, linear algebra is the most rigorous and abstract mathematical course they have taken since high-school geometry. We have tried to write the text so that it is accessible, but also so that it reveals something of the power of mathematical abstraction. To this end, the topics have been organized so that they flow logically and naturally from the concrete and computational to the more abstract. Numerous examples, many presented in extreme detail, have been included in order to illustrate the concepts. The sections are divided into subsections with boldface headings. This device allows the reader to develop a mental outline of the material and to see how the pieces fit together.

Extensive exercise sets. We have provided a large number of exercises, ranging from routine drill exercises to interesting applications and exercises of a theoretical nature. The more difficult theoretical exercises have fairly substantial hints. The computational


exercises are written using workable numbers that do not obscure the point with a mass of cumbersome arithmetic details.

Trustworthy answer key. Except for the theoretical exercises, solutions to the odd-numbered exercises are given at the back of the text. We have expended considerable effort to ensure that these solutions are correct.

Spiraling exercises. Many sections contain a few exercises that hint at ideas that will be developed later. Such exercises help to get the student involved in thinking about extensions of the material that has just been covered. Thus the student can anticipate a bit of the shape of things to come. This feature helps to lend unity and cohesion to the material.

Historical notes. We have a number of historical notes. These assist the student in gaining a historical and mathematical perspective of the ideas and concepts of linear algebra.

Supplementary exercises. We include, at the end of each chapter, a set of supplementary exercises. These exercises, some of which are true–false questions, are designed to test the student’s understanding of important concepts. They often require the student to use ideas from several different sections.

Integration of MATLAB. We have included a collection of MATLAB projects at the end of each chapter. For the student who is interested in computation, these projects provide hands-on experience with MATLAB.

A short MATLAB appendix. Many students are not familiar with MATLAB. Therefore, we include a very brief appendix that is sufficient to get the student comfortable with using MATLAB for problems that typically arise in linear algebra.

The vector form for the general solution. To provide an additional early introduction to linear combinations and spanning sets, in Section 1.5 we introduce the idea of the vector form for the general solution of Ax = b.

SUPPLEMENTS

Solutions Manuals

An Instructor’s Solutions Manual and a Student’s Solutions Manual are available. The odd-numbered computational exercises have answers at the back of the book. The student’s solutions manual (ISBN 0-201-65860-7) includes detailed solutions for these exercises. The instructor’s solutions manual (ISBN 0-201-75814-8) contains solutions to all the exercises.

New Technology Resource Manual. This manual was designed to assist in the teaching of the MATLAB, Maple, and Mathematica programs in the context of linear algebra. This manual is available from Addison-Wesley (ISBN 0-201-75812-1) or via our website, http://www.aw.com/jra.

Organization

To provide greater flexibility, Chapters 4, 5, and 6 are essentially independent. These chapters can be taken in any order once Chapters 1 and 3 are covered. Chapter 7 is a mélange of topics related to the eigenvalue problem: quadratic forms, differential


equations, QR factorizations, Householder transformations, generalized eigenvectors, and so on. The sections in Chapter 7 can be covered in various orders. A schematic diagram illustrating the chapter dependencies is given below. Note that Chapter 2, “Vectors in 2-Space and 3-Space,” can be omitted with no loss of continuity.

[Chapter dependency diagram: Chapter 1 (with Chapter 2 optional) leads to Chapter 3; Chapters 4, 5, and 6 each follow Chapter 3; Chapter 7 follows these.]

We especially note that Chapter 6 (Determinants) can be covered before Chapter 4 (Eigenvalues). However, Chapter 4 contains a brief introduction to determinants that should prove sufficient to users who do not wish to cover Chapter 6.

A very short but useful course at the beginning level can be built around the following sections:

Sections 1.1–1.3, 1.5–1.7, 1.9
Sections 3.1–3.6
Sections 4.1–4.2, 4.4–4.5

A syllabus that integrates abstract vector spaces. Chapter 3 introduces elementary vector-space ideas in the familiar setting of Rn. We designed Chapter 3 in this way so that it is possible to cover the eigenvalue problem much earlier and in greater depth than is generally possible. Many instructors, however, prefer an integrated approach to vector spaces, one that combines Rn and abstract vector spaces. The following syllabus, similar to ones used successfully at several universities, allows for a course that integrates abstract vector spaces into Chapter 3. This syllabus also allows for a detailed treatment of determinants:

Sections 1.1–1.3, 1.5–1.7, 1.9
Sections 3.1–3.3, 5.1–5.3, 3.4–3.5, 5.4–5.5
Sections 4.1–4.3, 6.4–6.5, 4.4–4.7

Augmenting the core sections. As time and interest permit, the core of Sections 1.1–1.3, 1.5–1.7, 1.9, 3.1–3.6, 4.1–4.2, and 4.4–4.5 can be augmented by including various combinations of the following sections:

(a) Data fitting and approximation: 1.8, 3.8–3.9, 7.5–7.6.
(b) Eigenvalue applications: 4.8, 7.1–7.2.


(c) More depth in vector space theory: 3.7, Chapter 5.
(d) More depth in eigenvalue theory: 4.6–4.7, 7.3–7.4, 7.7–7.8.
(e) Determinant theory: Chapter 6.

To allow the possibility of getting quickly to eigenvalues, Chapter 4 contains a brief introduction to determinants. If the time is available and if it is desirable, Chapter 6 (Determinants) can be taken after Chapter 3. In such a course, Section 4.1 can be covered quickly and Sections 4.2–4.3 can be skipped.

Finally, in the interest of developing the student’s mathematical sophistication, we have provided proofs for almost every theorem. However, some of the more technical proofs (such as the demonstration that det(AB) = det(A)det(B)) are deferred to the end of the sections. As always, constraints of time and class maturity will dictate which proofs should be omitted.

ACKNOWLEDGMENTS

A great many valuable contributions to the Fifth Edition were made by those who reviewed the manuscript as it developed through various stages:

Idris Assani, University of North Carolina, Chapel Hill
Satish Bhatnagar, University of Nevada, Las Vegas
Richard Daquila, Muskingum College
Robert Dobrow, Clarkson University
Branko Grunbaum, University of Washington
Isom Herron, Rensselaer Polytechnic Institute
Diane Hoffoss, Rice University
Richard Kubelka, San Jose State University
Tong Li, University of Iowa
David K. Neal, Western Kentucky University
Eileen Shugart, Virginia Institute of Technology
Nader Vakil, Western Illinois University
Tarynn Witten, Trinity University
Christos Xenophontos, Clarkson University

In addition, we wish to thank Michael A. Jones, Montclair State University, and Isom Herron, Rensselaer Polytechnic Institute, for their careful work in accuracy checking this edition.

Blacksburg, Virginia
L.W.J.
R.D.R.
J.T.A.


Contents

1 Matrices and Systems of Linear Equations
1.1 Introduction to Matrices and Systems of Linear Equations
1.2 Echelon Form and Gauss–Jordan Elimination
1.3 Consistent Systems of Linear Equations
1.4 Applications (Optional)
1.5 Matrix Operations
1.6 Algebraic Properties of Matrix Operations
1.7 Linear Independence and Nonsingular Matrices
1.8 Data Fitting, Numerical Integration, and Numerical Differentiation (Optional)
1.9 Matrix Inverses and Their Properties

2 Vectors in 2-Space and 3-Space
2.1 Vectors in the Plane
2.2 Vectors in Space
2.3 The Dot Product and the Cross Product
2.4 Lines and Planes in Space

3 The Vector Space Rn
3.1 Introduction
3.2 Vector Space Properties of Rn
3.3 Examples of Subspaces
3.4 Bases for Subspaces
3.5 Dimension
3.6 Orthogonal Bases for Subspaces
3.7 Linear Transformations from Rn to Rm
3.8 Least-Squares Solutions to Inconsistent Systems, with Applications to Data Fitting
3.9 Theory and Practice of Least Squares


4 The Eigenvalue Problem
4.1 The Eigenvalue Problem for (2 × 2) Matrices
4.2 Determinants and the Eigenvalue Problem
4.3 Elementary Operations and Determinants (Optional)
4.4 Eigenvalues and the Characteristic Polynomial
4.5 Eigenvectors and Eigenspaces
4.6 Complex Eigenvalues and Eigenvectors
4.7 Similarity Transformations and Diagonalization
4.8 Difference Equations; Markov Chains; Systems of Differential Equations (Optional)

5 Vector Spaces and Linear Transformations
5.1 Introduction
5.2 Vector Spaces
5.3 Subspaces
5.4 Linear Independence, Bases, and Coordinates
5.5 Dimension
5.6 Inner-Product Spaces, Orthogonal Bases, and Projections (Optional)
5.7 Linear Transformations
5.8 Operations with Linear Transformations
5.9 Matrix Representations for Linear Transformations
5.10 Change of Basis and Diagonalization

6 Determinants
6.1 Introduction
6.2 Cofactor Expansions of Determinants
6.3 Elementary Operations and Determinants
6.4 Cramer’s Rule
6.5 Applications of Determinants: Inverses and Wronskians

7 Eigenvalues and Applications
7.1 Quadratic Forms
7.2 Systems of Differential Equations
7.3 Transformation to Hessenberg Form


7.4 Eigenvalues of Hessenberg Matrices
7.5 Householder Transformations
7.6 The QR Factorization and Least-Squares Solutions
7.7 Matrix Polynomials and the Cayley–Hamilton Theorem
7.8 Generalized Eigenvectors and Solutions of Systems of Differential Equations

Appendix: An Introduction to MATLAB

Answers to Selected Odd-Numbered Exercises

Index


1 Matrices and Systems of Linear Equations

Overview  In this chapter we discuss systems of linear equations and methods (such as Gauss-Jordan elimination) for solving these systems. We introduce matrices as a convenient language for describing systems and the Gauss-Jordan solution method.

We next introduce the operations of addition and multiplication for matrices and show how these operations enable us to express a linear system in matrix-vector terms as

Ax = b.

Representing the matrix A in column form as A = [A1, A2, . . . , An], we then show that the equation Ax = b is equivalent to

x1A1 + x2A2 + · · · + xnAn = b.

The equation above leads naturally to the concepts of linear combination and linear independence. In turn, those ideas allow us to address questions of existence and uniqueness for solutions of Ax = b and to introduce the idea of an inverse matrix.
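To make this column viewpoint concrete, here is a minimal NumPy sketch (our illustration, with made-up entries; it is not part of the text) checking that the product Ax agrees with the linear combination x1A1 + x2A2:

    import numpy as np

    # A small (3 x 2) matrix A and a vector x (hypothetical values).
    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
    x = np.array([2.0, -1.0])

    # The matrix-vector product Ax ...
    product = A @ x

    # ... equals the linear combination x1*A1 + x2*A2 of the columns of A.
    combination = x[0] * A[:, 0] + x[1] * A[:, 1]

    print(product)      # [0. 2. 4.]
    print(combination)  # [0. 2. 4.]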

Core Sections

1.1 Introduction to Matrices and Systems of Linear Equations
1.2 Echelon Form and Gauss-Jordan Elimination
1.3 Consistent Systems of Linear Equations
1.5 Matrix Operations
1.6 Algebraic Properties of Matrix Operations
1.7 Linear Independence and Nonsingular Matrices
1.9 Matrix Inverses and Their Properties


1.1 INTRODUCTION TO MATRICES AND SYSTEMS OF LINEAR EQUATIONS

In the real world, problems are seldom so simple that they depend on a single input variable. For example, a manufacturer’s profit clearly depends on the cost of materials, but it also depends on other input variables such as labor costs, transportation costs, and plant overhead. A realistic expression for profit would involve all these variables. Using mathematical language, we say that profit is a function of several variables.

In linear algebra we study the simplest functions of several variables, the ones that are linear. We begin our study by considering linear equations. By way of illustration, the equation

x1 + 2x2 + x3 = 1

is an example of a linear equation, and x1 = 2, x2 = 1, x3 = −3 is one solution for the equation. In general a linear equation in n unknowns is an equation that can be put in the form

a1x1 + a2x2 + · · · + anxn = b. (1)

In (1), the coefficients a1, a2, . . . , an and the constant b are known, and x1, x2, . . . , xn denote the unknowns. A solution to Eq. (1) is any sequence s1, s2, . . . , sn of numbers such that the substitution x1 = s1, x2 = s2, . . . , xn = sn satisfies the equation.

Equation (1) is called linear because each term has degree one in the variables x1, x2, . . . , xn. (Also, see Exercise 37.)

Example 1 Determine which of the following equations are linear.

(i) x1 + 2x1x2 + 3x2 = 4
(ii) x1^(1/2) + 3x2 = 4
(iii) 2x1^(−1) + sin x2 = 0
(iv) 3x1 − x2 = x3 + 1

Solution Only Eq. (iv) is linear. The terms x1x2, x1^(1/2), x1^(−1), and sin x2 are all nonlinear.

Linear Systems
Our objective is to obtain simultaneous solutions to a system (that is, a set) of one or more linear equations. Here are three examples of systems of linear equations.

(a) x1 + x2 = 3
    x1 − x2 = 1

(b) x1 − 2x2 − 3x3 = −11
    −x1 + 3x2 + 5x3 = 15

(c) 3x1 − 2x2 = 1
    6x1 − 4x2 = 6

In terms of solutions, it is easy to check that x1 = 2, x2 = 1 is one solution to system (a). Indeed, it can be shown that this is the only solution to the system.


On the other hand, x1 = −4, x2 = 2, x3 = 1 and x1 = −2, x2 = 6, x3 = −1 are both solutions to system (b). In fact, it can be verified by substitution that x1 = −3 − x3 and x2 = 4 − 2x3 yields a solution to system (b) for any choice of x3. Thus, this system has infinitely many solutions.

Finally, note that the equations given in (c) can be viewed as representing two parallel lines in the plane. Therefore, system (c) has no solution. (Another way to see that (c) has no solution is to observe that the second equation in (c), when divided by 2, reduces to 3x1 − 2x2 = 3. Because the first equation requires 3x1 − 2x2 = 1, there is no way that both equations can be satisfied.)

In general, an (m × n) system of linear equations is a set of equations of the form:

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm.   (2)∗

For example, the general form of a (3× 3) system of linear equations is

a11x1 + a12x2 + a13x3 = b1

a21x1 + a22x2 + a23x3 = b2

a31x1 + a32x2 + a33x3 = b3.

A solution to system (2) is a sequence s1, . . . , sn of numbers that is simultaneously a solution for each equation in the system. The double subscript notation used for the coefficients is necessary to provide an “address” for each coefficient. For example, a32 appears in the third equation as the coefficient of x2.

Example 2

(a) Display the system of equations with coefficients a11 = 2, a12 = −1, a13 = −3, a21 = −2, a22 = 2, and a23 = 5, and with constants b1 = −1 and b2 = 3.

(b) Verify that x1 = 1, x2 = 0, x3 = 1 is a solution for the system.

Solution

(a) The system is

2x1 − x2 − 3x3 = −1
−2x1 + 2x2 + 5x3 = 3.

(b) Substituting x1 = 1, x2 = 0, and x3 = 1 yields

2(1) − (0) − 3(1) = −1
−2(1) + 2(0) + 5(1) = 3.

∗For clarity of presentation, we assume throughout the chapter that the constants aij and bi are real numbers, although all statements are equally valid for complex constants. When we consider eigenvalue problems, we will occasionally encounter linear systems having complex coefficients, but the solution technique is no different. In Chapter 4 we will discuss the technical details of solving systems that have complex coefficients.


Geometric Interpretations of Solution Sets
We can use geometric examples to get an initial impression about the nature of solution sets for linear systems. For example, consider a general (2 × 2) system of linear equations

a11x1 + a12x2 = b1    (a11, a12 not both zero)
a21x1 + a22x2 = b2    (a21, a22 not both zero).

Geometrically, the solution set for each of these equations can be represented as a line in the plane. A solution for the system, therefore, corresponds to a point (x1, x2) where the lines intersect. From this geometric interpretation, it follows that there are exactly three possibilities:

1. The two lines are coincident (the same line), so there are infinitely many solutions.
2. The two lines are parallel (never meet), so there are no solutions.
3. The two lines intersect at a single point, so there is a unique solution.

The three possibilities are illustrated in Fig. 1.1 and in Example 3.

[Figure 1.1: The three possibilities for the solution set of a (2 × 2) system: (a) coincident lines, infinitely many solutions; (b) parallel lines, no solution; (c) intersecting lines, unique solution.]

Example 3 Give a geometric representation for each of the following systems of equations.

(a) x1 + x2 = 2
    2x1 + 2x2 = 4

(b) x1 + x2 = 2
    x1 + x2 = 1

(c) x1 + x2 = 3
    x1 − x2 = 1

Solution The representations are displayed in Fig. 1.1.


The graph of a linear equation in three variables, ax1 + bx2 + cx3 = d, is a plane in three-dimensional space (as long as one of a, b, or c is nonzero). So, as another example, let us consider the general (2 × 3) system:

a11x1 + a12x2 + a13x3 = b1

a21x1 + a22x2 + a23x3 = b2.

Because the solution set for each equation can be represented by a plane, there are two possibilities:

1. The two planes might be coincident, or they might intersect in a line. In either case, the system has infinitely many solutions.

2. The two planes might be parallel. In this case, the system has no solution.

Note, for the case of the general (2 × 3) system, that the possibility of a unique solution has been ruled out.

As a final example, consider a general (3× 3) system:

a11x1 + a12x2 + a13x3 = b1

a21x1 + a22x2 + a23x3 = b2

a31x1 + a32x2 + a33x3 = b3.

If we view this (3 × 3) system as representing three planes, it is easy to see from the geometric perspective that there are three possible outcomes: infinitely many solutions, no solution, or a unique solution (see Fig. 1.2). Note that Fig. 1.2(b) does not illustrate every possible case of a (3 × 3) system that has no solution. For example, if just two of three planes are parallel, then the system has no solution even though the third plane might intersect each of the two parallel planes.

We conclude this subsection with the following remark, which we will state formally in Section 1.3 (see Corollary to Theorem 3). This remark says that the possible outcomes suggested by the geometric interpretations shown in Figs. 1.1 and 1.2 are typical for any system of linear equations.

Remark An (m × n) system of linear equations has either infinitely many solutions, no solution, or a unique solution.

[Figure 1.2: The general (3 × 3) system may have (a) infinitely many solutions, (b) no solution, or (c) a unique solution.]


In general, a system of equations is called consistent if it has at least one solution, and the system is called inconsistent if it has no solution. By the preceding remark, a consistent system has either one solution or an infinite number of solutions; it is not possible for a linear system to have, for example, exactly five solutions.

Matrices
We begin our introduction to matrix theory by relating matrices to the problem of solving systems of linear equations. Initially we show that matrix theory provides a convenient and natural symbolic language to describe linear systems. Later we show that matrix theory is also an appropriate and powerful framework within which to analyze and solve more general linear problems, such as least-squares approximations, representations of linear operations, and eigenvalue problems.

The rectangular array

[ 1  3  −1   2
  4  2   1  −3
  0  2   0   3 ]

is an example of a matrix. More generally, an (m × n) matrix is a rectangular array of numbers of the form

A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      ...
      am1  am2  · · ·  amn ].

Thus an (m × n) matrix has m rows and n columns. The subscripts for the entry aij indicate that the number appears in the ith row and jth column of A. For example, a32 is the entry in the third row and second column of A. We will frequently use the notation A = (aij) to denote a matrix A with entries aij.

Example 4 Display the (2 × 3) matrix A = (aij), where a11 = 6, a12 = 3, a13 = 7, a21 = 2, a22 = 1, and a23 = 4.

Solution

A = [ 6  3  7
      2  1  4 ]

Matrix Representation of a Linear System
To illustrate the use of matrices to represent linear systems, consider the (3 × 3) system of equations

x1 + 2x2 + x3 = 4
2x1 − x2 − x3 = 1
x1 + x2 + 3x3 = 0.


If we display the coefficients and constants for this system in matrix form,

B = [ 1   2   1  4
      2  −1  −1  1
      1   1   3  0 ],

then we have expressed compactly and naturally all the essential information. The matrix B is called the augmented matrix for the system.

In general, with the (m× n) system of linear equations

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm,   (3)

we associate two matrices. The coefficient matrix for system (3) is the (m × n) matrix A where

A = [ a11  a12  · · ·  a1n
      a21  a22  · · ·  a2n
      ...
      am1  am2  · · ·  amn ].

The augmented matrix for system (3) is the [m× (n+ 1)] matrix B where

B = [ a11  a12  · · ·  a1n  b1
      a21  a22  · · ·  a2n  b2
      ...
      am1  am2  · · ·  amn  bm ].

Note that B is nothing more than the coefficient matrix A augmented with an extra column; the extra column is the right-hand side of system (3).

The augmented matrix B is usually denoted as [A | b], where A is the coefficient matrix and

b = [ b1
      b2
      ...
      bm ].

Example 5 Display the coefficient matrix A and the augmented matrix B for the system

x1 − 2x2 + x3 = 2
2x1 + x2 − x3 = 1
−3x1 + x2 − 2x3 = −5.


Solution The coefficient matrix A and the augmented matrix [A | b] are given by

A = [ 1  −2   1
      2   1  −1
     −3   1  −2 ]

and

[A | b] = [ 1  −2   1    2
            2   1  −1    1
           −3   1  −2   −5 ].
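As an aside, the same bookkeeping is easy to reproduce in NumPy (a sketch we are adding for illustration; the book’s own computing material uses MATLAB):

    import numpy as np

    # Coefficient matrix A and right-hand side b from Example 5.
    A = np.array([[ 1, -2,  1],
                  [ 2,  1, -1],
                  [-3,  1, -2]])
    b = np.array([2, 1, -5])

    # The augmented matrix [A | b] is A with b appended as an extra column.
    augmented = np.column_stack((A, b))
    print(augmented)
    # [[ 1 -2  1  2]
    #  [ 2  1 -1  1]
    #  [-3  1 -2 -5]]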

Elementary Operations
As we shall see, there are two steps involved in solving an (m × n) system of equations. The steps are:

1. Reduction of the system (that is, the elimination of variables).
2. Description of the set of solutions.

The details of both steps will be left to the next section. For the remainder of this section, we will concentrate on giving an overview of the reduction step.

The goal of the reduction process is to simplify the given system by eliminating unknowns. It is, of course, essential that the reduced system of equations have the same set of solutions as the original system.

Definition 1 Two systems of linear equations in n unknowns are equivalent provided that they have the same set of solutions.

Thus the reduction procedure must yield an equivalent system of equations. The following theorem provides three operations, called elementary operations, that can be used in reduction.

Theorem 1 If one of the following elementary operations is applied to a system of linear equations, then the resulting system is equivalent to the original system.

1. Interchange two equations.
2. Multiply an equation by a nonzero scalar.
3. Add a constant multiple of one equation to another.

(In part 2 of Theorem 1, the term scalar means a constant; that is, a number.) The proof of Theorem 1 is included in Exercise 41 of Section 1.1.

To facilitate the use of the elementary operations listed above, we adopt the following notation:

Notation    Elementary Operation Performed
Ei ↔ Ej    The ith and jth equations are interchanged.
kEi         The ith equation is multiplied by the nonzero scalar k.
Ei + kEj    k times the jth equation is added to the ith equation.


The following simple example illustrates the use of elementary operations to solve a (2 × 2) system. (The complete solution process for a general (m × n) system is described in detail in the next section.)

Example 6 Use elementary operations to solve the system

x1 + x2 = 5
−x1 + 2x2 = 4.

Solution The elementary operation E2 + E1 produces the following equivalent system:

x1 + x2 = 5
3x2 = 9.

The operation (1/3)E2 then leads to

x1 + x2 = 5
x2 = 3.

Finally, using the operation E1 − E2, we obtain

x1 = 2
x2 = 3.

By Theorem 1, the system above is equivalent to the original system. Hence the solution to the original system is also x1 = 2, x2 = 3.

(Note: Example 6 illustrates a systematic method for solving a system of linear equations. This method is called Gauss-Jordan elimination and is described fully in the next section.)
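The three elementary operations also translate directly into code. The following Python/NumPy sketch (ours, not the book’s) replays the steps of Example 6, storing each equation as a row of coefficients together with its right-hand side:

    import numpy as np

    # Each row holds [coefficient of x1, coefficient of x2, right-hand side]:
    #    x1 +  x2 = 5
    #   -x1 + 2x2 = 4
    M = np.array([[ 1.0, 1.0, 5.0],
                  [-1.0, 2.0, 4.0]])

    M[1] = M[1] + M[0]         # E2 + E1: eliminate x1 from the second equation
    M[1] = (1.0 / 3.0) * M[1]  # (1/3)E2: give x2 coefficient 1
    M[0] = M[0] - M[1]         # E1 - E2: eliminate x2 from the first equation

    print(M)
    # [[1. 0. 2.]
    #  [0. 1. 3.]]   that is, x1 = 2 and x2 = 3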

Row Operations
As noted earlier, we want to use an augmented matrix as a shorthand notation for a system of equations. Because equations become translated to rows in the augmented matrix, we want to perform elementary operations on the rows of a matrix. Toward that end, we introduce the following terminology.

Definition 2 The following operations, performed on the rows of a matrix, are called elementary row operations:

1. Interchange two rows.
2. Multiply a row by a nonzero scalar.
3. Add a constant multiple of one row to another.


As before, we adopt the following notation:

Notation    Elementary Row Operation
Ri ↔ Rj    The ith and jth rows are interchanged.
kRi         The ith row is multiplied by the nonzero scalar k.
Ri + kRj    k times the jth row is added to the ith row.

We say that two (m × n) matrices, B and C, are row equivalent if one can be obtained from the other by a sequence of elementary row operations. Now if B is the augmented matrix for a system of linear equations and if C is row equivalent to B, then C is the augmented matrix for an equivalent system. This observation follows because the elementary row operations for matrices exactly duplicate the elementary operations for equations.

Thus, we can solve a linear system with the following steps:

1. Form the augmented matrix B for the system.
2. Use elementary row operations to transform B to a row equivalent matrix C which represents a “simpler” system.
3. Solve the simpler system that is represented by C.

We will specify what we mean by a simpler system in the next section. For now, we illustrate in Example 7 how using elementary row operations to reduce an augmented matrix is exactly parallel to using elementary operations to reduce the corresponding system of equations.

Example 7 Consider the (3× 3) system of equations

2x2 + x3 = −2
3x1 + 5x2 − 5x3 = 1
2x1 + 4x2 − 2x3 = 2.

Use elementary operations on equations to reduce the following system. Simultaneously use elementary row operations to reduce the augmented matrix for the system.

Solution In the left-hand column of the following table, we will reduce the given system ofequations using elementary operations. In the right-hand column we will perform theanalogous elementary row operations on the augmented matrix. (Note: At each step ofthe process, the system of equations obtained in the left-hand column is equivalent to theoriginal system. The corresponding matrix in the right-hand column is the augmentedmatrix for the system in the left-hand column.)

Our initial goal is to have x1 appear in the first equation with coefficient 1, and then to eliminate x1 from the remaining equations. This can be accomplished by the following steps:


System:                                Augmented Matrix:

       2x2 +  x3 = −2                  [ 0  2   1  −2
3x1 + 5x2 − 5x3 =  1                     3  5  −5   1
2x1 + 4x2 − 2x3 =  2                     2  4  −2   2 ]

E1 ↔ E3:                               R1 ↔ R3:

2x1 + 4x2 − 2x3 =  2                   [ 2  4  −2   2
3x1 + 5x2 − 5x3 =  1                     3  5  −5   1
       2x2 +  x3 = −2                    0  2   1  −2 ]

(1/2)E1:                               (1/2)R1:

 x1 + 2x2 −  x3 =  1                   [ 1  2  −1   1
3x1 + 5x2 − 5x3 =  1                     3  5  −5   1
       2x2 +  x3 = −2                    0  2   1  −2 ]

E2 − 3E1:                              R2 − 3R1:

 x1 + 2x2 −  x3 =  1                   [ 1   2  −1   1
      − x2 − 2x3 = −2                    0  −1  −2  −2
       2x2 +  x3 = −2                    0   2   1  −2 ]

The variable x1 has now been eliminated from the second and third equations. Next, we eliminate x2 from the first and third equations and leave x2, with coefficient 1, in the second equation. We continue the reduction process with the following operations:

(−1)E2:                                (−1)R2:

 x1 + 2x2 −  x3 =  1                   [ 1  2  −1   1
        x2 + 2x3 =  2                    0  1   2   2
       2x2 +  x3 = −2                    0  2   1  −2 ]

E1 − 2E2:                              R1 − 2R2:

 x1        − 5x3 = −3                  [ 1  0  −5  −3
        x2 + 2x3 =  2                    0  1   2   2
       2x2 +  x3 = −2                    0  2   1  −2 ]

E3 − 2E2:                              R3 − 2R2:

 x1        − 5x3 = −3                  [ 1  0  −5  −3
        x2 + 2x3 =  2                    0  1   2   2
            −3x3 = −6                    0  0  −3  −6 ]


The variable x2 has now been eliminated from the first and third equations. Next, we eliminate x3 from the first and second equations and leave x3, with coefficient 1, in the third equation:

System:                                Augmented Matrix:

(−1/3)E3:                              (−1/3)R3:

 x1        − 5x3 = −3                  [ 1  0  −5  −3
        x2 + 2x3 =  2                    0  1   2   2
              x3 =  2                    0  0   1   2 ]

E1 + 5E3:                              R1 + 5R3:

 x1              =  7                  [ 1  0  0   7
        x2 + 2x3 =  2                    0  1  2   2
              x3 =  2                    0  0  1   2 ]

E2 − 2E3:                              R2 − 2R3:

 x1              =  7                  [ 1  0  0   7
        x2       = −2                    0  1  0  −2
              x3 =  2                    0  0  1   2 ]

The last system above clearly has a unique solution given by x1 = 7, x2 = −2, and x3 = 2. Because the final system is equivalent to the original given system, both systems have the same solution.

The reduction process used in the preceding example is known as Gauss-Jordan elimination and will be explained in Section 1.2. Note the advantage of the shorthand notation provided by matrices. Because we do not need to list the variables, the sequence of steps in the right-hand column is easier to perform and record.
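In fact, the right-hand column of Example 7 can be replayed verbatim on a computer. This NumPy sketch (our addition, not the book’s) applies the same nine row operations and arrives at the same final augmented matrix:

    import numpy as np

    # Augmented matrix for the system of Example 7.
    M = np.array([[0.0, 2.0,  1.0, -2.0],
                  [3.0, 5.0, -5.0,  1.0],
                  [2.0, 4.0, -2.0,  2.0]])

    M[[0, 2]] = M[[2, 0]]       # R1 <-> R3
    M[0] = 0.5 * M[0]           # (1/2)R1
    M[1] = M[1] - 3.0 * M[0]    # R2 - 3R1
    M[1] = -M[1]                # (-1)R2
    M[0] = M[0] - 2.0 * M[1]    # R1 - 2R2
    M[2] = M[2] - 2.0 * M[1]    # R3 - 2R2
    M[2] = (-1.0 / 3.0) * M[2]  # (-1/3)R3
    M[0] = M[0] + 5.0 * M[2]    # R1 + 5R3
    M[1] = M[1] - 2.0 * M[2]    # R2 - 2R3

    print(M)  # last column reads 7, -2, 2: the solution x1 = 7, x2 = -2, x3 = 2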

Example 7 illustrates that row equivalent augmented matrices represent equivalent systems of equations. The following corollary to Theorem 1 states this in mathematical terms.

Corollary Suppose [A | b] and [C | d] are augmented matrices, each representing a different (m × n) system of linear equations. If [A | b] and [C | d] are row equivalent matrices, then the two systems are also equivalent.

1.1 EXERCISES

Which of the equations in Exercises 1–6 are linear?
1. x1 + 2x3 = 3
2. x1x2 + x2 = 1
3. x1 − x2 = sin^2 x1 + cos^2 x1
4. x1 − x2 = sin^2 x1 + cos^2 x2
5. |x1| − |x2| = 0
6. πx1 + √7 x2 = √3

In Exercises 7–10, coefficients are given for a system of the form (2). Display the system and verify that the given values constitute a solution.
7. a11 = 1, a12 = 3, a21 = 4, a22 = −1, b1 = 7, b2 = 2; x1 = 1, x2 = 2


8. a11 = 6, a12 = −1, a13 = 1, a21 = 1, a22 = 2, a23 = 4, b1 = 14, b2 = 4; x1 = 2, x2 = −1, x3 = 1
9. a11 = 1, a12 = 1, a21 = 3, a22 = 4, a31 = −1, a32 = 2, b1 = 0, b2 = −1, b3 = −3; x1 = 1, x2 = −1
10. a11 = 0, a12 = 3, a21 = 4, a22 = 0, b1 = 9, b2 = 8; x1 = 2, x2 = 3

In Exercises 11–14, sketch a graph for each equation to determine whether the system has a unique solution, no solution, or infinitely many solutions.
11. 2x + y = 5
    x − y = 1
12. 2x − y = −1
    2x − y = 2
13. 3x + 2y = 6
    −6x − 4y = −12
14. 2x + y = 5
    x − y = 1
    x + 3y = 9

15. The (2 × 3) system of linear equations
    a1x + b1y + c1z = d1
    a2x + b2y + c2z = d2
is represented geometrically by two planes. How are the planes related when:
    a) The system has no solution?
    b) The system has infinitely many solutions?
Is it possible for the system to have a unique solution? Explain.

In Exercises 16–18, determine whether the given (2 × 3) system of linear equations represents coincident planes (that is, the same plane), two parallel planes, or two planes whose intersection is a line. In the latter case, give the parametric equations for the line; that is, give equations of the form x = at + b, y = ct + d, z = et + f.
16. 2x1 + x2 + x3 = 3
    −2x1 + x2 − x3 = 1
17. x1 + 2x2 − x3 = 2
    x1 + x2 + x3 = 3
18. x1 + 3x2 − 2x3 = −1
    2x1 + 6x2 − 4x3 = −2

19. Display the (2 × 3) matrix A = (aij), where a11 = 2, a12 = 1, a13 = 6, a21 = 4, a22 = 3, and a23 = 8.
20. Display the (2 × 4) matrix C = (cij), where c23 = 4, c12 = 2, c21 = 2, c14 = 1, c22 = 2, c24 = 3, c11 = 1, and c13 = 7.
21. Display the (3 × 3) matrix Q = (qij), where q23 = 1, q32 = 2, q11 = 1, q13 = −3, q22 = 1, q33 = 1, q21 = 2, q12 = 4, and q31 = 3.
22. Suppose the matrix C in Exercise 20 is the augmented matrix for a system of linear equations. Display the system.
23. Repeat Exercise 22 for the matrices in Exercises 19 and 21.

In Exercises 24–29, display the coefficient matrix A and the augmented matrix B for the given system.
24. x1 − x2 = −1
    x1 + x2 = 3
25. x1 + x2 − x3 = 2
    2x1 − x3 = 1
26. x1 + 3x2 − x3 = 1
    2x1 + 5x2 + x3 = 5
    x1 + x2 + x3 = 3
27. x1 + x2 + 2x3 = 6
    3x1 + 4x2 − x3 = 5
    −x1 + x2 + x3 = 2
28. x1 + x2 − 3x3 = −1
    x1 + 2x2 − 5x3 = −2
    −x1 − 3x2 + 7x3 = 3
29. x1 + x2 + x3 = 1
    2x1 + 3x2 + x3 = 2
    x1 − x2 + 3x3 = 2

In Exercises 30–36, display the augmented matrix for the given system. Use elementary operations on equations to obtain an equivalent system of equations in which x1 appears in the first equation with coefficient one and has been eliminated from the remaining equations. Simultaneously, perform the corresponding elementary row operations on the augmented matrix.
30. 2x1 + 3x2 = 6
    4x1 − x2 = 7
31. x1 + 2x2 − x3 = 1
    x1 + x2 + 2x3 = 2
    −2x1 + x2 = 4
32. x2 + x3 = 4
    x1 − x2 + 2x3 = 1
    2x1 + x2 − x3 = 6
33. x1 + x2 = 9
    x1 − x2 = 7
    3x1 + x2 = 6
34. x1 + x2 + x3 − x4 = 1
    −x1 + x2 − x3 + x4 = 3
    −2x1 + x2 + x3 − x4 = 2
35. x2 + x3 − x4 = 3
    x1 + 2x2 − x3 + x4 = 1
    −x1 + x2 + 7x3 − x4 = 0
36. x1 + x2 = 0
    x1 − x2 = 0
    3x1 + x2 = 0
37. Consider the equation 2x1 − 3x2 + x3 − x4 = 3.
    a) In the six different possible combinations, set any two of the variables equal to 1 and graph the equation in terms of the other two.
    b) What type of graph do you always get when you set two of the variables equal to two fixed constants?
    c) What is one possible reason the equation in formula (1) is called linear?


38. Consider the (2 × 2) system
    a11x1 + a12x2 = b1
    a21x1 + a22x2 = b2.
Show that if a11a22 − a12a21 ≠ 0, then this system is equivalent to a system of the form
    c11x1 + c12x2 = d1
    c22x2 = d2,
where c11 ≠ 0 and c22 ≠ 0. Note that the second system always has a solution. [Hint: First suppose that a11 ≠ 0, and then consider the special case in which a11 = 0.]

39. In the following (2 × 2) linear systems (A) and (B), c is a nonzero scalar. Prove that any solution, x1 = s1, x2 = s2, for (A) is also a solution for (B). Conversely, show that any solution, x1 = t1, x2 = t2, for (B) is also a solution for (A). Where is the assumption that c is nonzero required?
(A) a11x1 + a12x2 = b1
    a21x1 + a22x2 = b2
(B) a11x1 + a12x2 = b1
    ca21x1 + ca22x2 = cb2

40. In the (2 × 2) linear systems that follow, the system (B) is obtained from (A) by performing the elementary operation E2 + cE1. Prove that any solution, x1 = s1, x2 = s2, for (A) is a solution for (B). Similarly, prove that any solution, x1 = t1, x2 = t2, for (B) is a solution for (A).
(A) a11x1 + a12x2 = b1
    a21x1 + a22x2 = b2
(B) a11x1 + a12x2 = b1
    (a21 + ca11)x1 + (a22 + ca12)x2 = b2 + cb1

41. Prove that any of the elementary operations in Theorem 1 applied to system (2) produces an equivalent system. [Hint: To simplify this proof, represent the ith equation in system (2) as fi(x1, x2, . . . , xn) = bi; so
    fi(x1, x2, . . . , xn) = ai1x1 + ai2x2 + · · · + ainxn
for i = 1, 2, . . . , m. With this notation, system (2) has the form of (A), which follows. Next, for example, if a multiple of c times the jth equation is added to the kth equation, a new system of the form (B) is produced:
    (A)                              (B)
    f1(x1, x2, . . . , xn) = b1      f1(x1, x2, . . . , xn) = b1
    ...                              ...
    fj(x1, x2, . . . , xn) = bj      fj(x1, x2, . . . , xn) = bj
    ...                              ...
    fk(x1, x2, . . . , xn) = bk      g(x1, x2, . . . , xn) = r
    ...                              ...
    fm(x1, x2, . . . , xn) = bm      fm(x1, x2, . . . , xn) = bm
where g(x1, x2, . . . , xn) = fk(x1, x2, . . . , xn) + cfj(x1, x2, . . . , xn), and r = bk + cbj. To show that the operation gives an equivalent system, show that any solution for (A) is a solution for (B), and vice versa.]

42. Solve the system of two nonlinear equations in two unknowns
    x1^2 − 2x1 + x2^2 = 3
    x1^2 − x2^2 = 1.

1.2 ECHELON FORM AND GAUSS-JORDAN ELIMINATION

As we noted in the previous section, our method for solving a system of linear equations will be to pass to the augmented matrix, use elementary row operations to reduce the augmented matrix, and then solve the simpler but equivalent system represented by the reduced matrix. This procedure is illustrated in Fig. 1.3.

[Figure 1.3: Procedure for solving a system of linear equations: given system of equations → augmented matrix → reduced matrix → reduced system of equations → solution.]

The objective of the Gauss-Jordan reduction process (represented by the middle block in Fig. 1.3) is to obtain a system of equations simplified to the point where we can immediately describe the solution. See, for example, Examples 6 and 7 in Section 1.1. We turn now to the question of how to describe this objective in mathematical terms—that is, how do we know when the system has been simplified as much as it can be? The answer is: The system has been simplified as much as possible when it is in reduced echelon form.

Echelon Form
When an augmented matrix is reduced to the form known as echelon form, it is easy to solve the linear system represented by the reduced matrix. The formal description of echelon form is given in Definition 3. Then, in Definition 4, we describe an even simpler form known as reduced echelon form.

Definition 3 An (m× n) matrix B is in echelon form if:

1. All rows that consist entirely of zeros are grouped together at the bottom of the matrix.
2. In every nonzero row, the first nonzero entry (counting from left to right) is a 1.
3. If the (i + 1)-st row contains nonzero entries, then the first nonzero entry is in a column to the right of the first nonzero entry in the ith row.

Put informally, a matrix A is in echelon form if the nonzero entries in A form a staircase-like pattern, such as the four examples shown in Fig. 1.4. (Note: Exercise 46 shows that there are exactly seven different types of echelon form for a (3 × 3) matrix. Figure 1.4 illustrates four of the possible patterns. In Fig. 1.4, the entries marked ∗ can be zero or nonzero.)

A = [ 1  ∗  ∗       A = [ 1  ∗  ∗       A = [ 1  ∗  ∗       A = [ 0  1  ∗
      0  1  ∗             0  1  ∗             0  0  1             0  0  1
      0  0  1 ]           0  0  0 ]           0  0  0 ]           0  0  0 ]

Figure 1.4 Patterns for four of the seven possible types of (3 × 3) matrices in echelon form. Entries marked ∗ can be either zero or nonzero.
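Definition 3 is mechanical enough to check by program. Here is a short Python sketch (our own; the text gives no code) that tests the three conditions for a matrix stored as a list of rows:

    def is_echelon(A):
        # Test the three echelon-form conditions of Definition 3.
        leading_cols = []
        seen_zero_row = False
        for row in A:
            nonzero = [j for j, entry in enumerate(row) if entry != 0]
            if not nonzero:
                seen_zero_row = True   # condition 1: zero rows belong at the bottom
                continue
            if seen_zero_row:
                return False           # a nonzero row sits below a zero row
            j = nonzero[0]
            if row[j] != 1:
                return False           # condition 2: first nonzero entry must be 1
            if leading_cols and j <= leading_cols[-1]:
                return False           # condition 3: leading 1s move strictly right
            leading_cols.append(j)
        return True

    print(is_echelon([[1, 2, 3], [0, 1, 4], [0, 0, 1]]))  # True
    print(is_echelon([[0, 0, 1], [1, 0, 0]]))             # False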


Two examples of matrices in echelon form are

A = [ 1  −1  4  3   0  2  0
      0   0  1  8  −4  3  2
      0   0  0  1   2  1  2
      0   0  0  0   0  1  3
      0   0  0  0   0  0  0 ]

B = [ 0  1  −1  4   3
      0  0   1  6  −5
      0  0   0  0   1 ].

We show later that every matrix can be transformed to echelon form with elementary row operations. It turns out, however, that echelon form is not unique. In order to guarantee uniqueness, we therefore add one more constraint and define a form known as reduced echelon form. As noted in Theorem 2, reduced echelon form is unique.

Definition 4 A matrix that is in echelon form is in reduced echelon form provided that the first nonzero entry in any row is the only nonzero entry in its column.

Figure 1.5 gives four examples (corresponding to the examples in Fig. 1.4) of matrices in reduced echelon form.

A = [ 1  0  0       A = [ 1  0  ∗       A = [ 1  ∗  0       A = [ 0  1  0
      0  1  0             0  1  ∗             0  0  1             0  0  1
      0  0  1 ]           0  0  0 ]           0  0  0 ]           0  0  0 ]

Figure 1.5 Patterns for four of the seven possible types of (3 × 3) matrices in reduced echelon form. Entries marked ∗ can be either zero or nonzero.

Two examples of matrices in reduced echelon form are

A = [ 1  0  0   2
      0  1  0  −1
      0  0  1   3 ]

B = [ 1  2  0  1  −1
      0  0  1  3   4
      0  0  0  0   0 ].

As can be seen from these examples and from Figs. 1.4 and 1.5, the feature that distinguishes reduced echelon form from echelon form is that the leading 1 in each nonzero row has only 0’s above and below it.
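That distinguishing feature can be bolted onto the checker sketched after Fig. 1.4 (again our own illustration; it assumes the is_echelon helper defined there):

    def is_reduced_echelon(A):
        # Echelon form, plus: each leading 1 is the only nonzero entry in its column.
        if not is_echelon(A):
            return False
        for i, row in enumerate(A):
            nonzero = [j for j, entry in enumerate(row) if entry != 0]
            if nonzero:
                j = nonzero[0]  # column of this row's leading 1
                if any(A[k][j] != 0 for k in range(len(A)) if k != i):
                    return False
        return True

    # The matrix A displayed above is in reduced echelon form.
    print(is_reduced_echelon([[1, 0, 0, 2], [0, 1, 0, -1], [0, 0, 1, 3]]))  # True
    # Echelon but not reduced: some leading 1s have nonzero entries above them.
    print(is_reduced_echelon([[1, 2, 3], [0, 1, 4], [0, 0, 1]]))            # False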

Example 1 For each matrix shown, choose one of the following phrases to describe the matrix.

(a) The matrix is not in echelon form.
(b) The matrix is in echelon form, but not in reduced echelon form.
(c) The matrix is in reduced echelon form.


A = [ 1   0  0        B = [ 1   3  2        C = [ 0  1  −1  0
      2   1  0              0  −1  1              0  0   0  1
      3  −4  1 ],           0   0  1 ],           0  0   0  0 ],

D = [ 1  2  3  4  5       E = [ 1       F = [ 0
      0  0  1  2  3             0             0
      0  0  0  1  0 ],          0 ],           1 ],

G = [1  0  0],   H = [0  0  1].

Solution A, B, and F are not in echelon form; D is in echelon form but not in reduced echelon form; C, E, G, and H are in reduced echelon form.

Solving a Linear System Whose Augmented Matrix Is in Reduced Echelon Form
Software packages that can solve systems of equations typically include a command that produces the reduced echelon form of a matrix. Thus, to solve a linear system on a machine, we first enter the augmented matrix for the system and then apply the machine’s reduce command. Once we get the machine output (that is, the reduced echelon form for the original augmented matrix), we have to interpret the output in order to find the solution. The next example illustrates this interpretation process.

Example 2 Each of the following matrices is in reduced echelon form and is the augmented matrix for a system of linear equations. In each case, give the system of equations and describe the solution.

B = [ 1  0  0   3        C = [ 1  0  −1  0
      0  1  0  −2              0  1   3  0
      0  0  1   7              0  0   0  1 ],
      0  0  0   0 ],

D = [ 1  −3  0   4  2    E = [ 1  2  0  5
      0   0  1  −5  1          0  0  1  0
      0   0  0   0  0 ],       0  0  0  0 ].

Solution

Matrix B: Matrix B is the augmented matrix for the following system:

x1 = 3
x2 = −2
x3 = 7.

Therefore, the system has the unique solution x1 = 3, x2 = −2, and x3 = 7.


Matrix C: Matrix C is the augmented matrix for the following system

x1 − x3 = 0
x2 + 3x3 = 0
0x1 + 0x2 + 0x3 = 1.

Because no values for x1, x2, or x3 can satisfy the third equation, the system is inconsistent.

Matrix D: Matrix D is the augmented matrix for the following system

x1 − 3x2 + 4x4 = 2
x3 − 5x4 = 1.

We solve each equation for the leading variable in its row, finding

x1 = 2 + 3x2 − 4x4
x3 = 1 + 5x4.

In this case, x1 and x3 are the dependent (or constrained) variables whereas x2 and x4 are the independent (or unconstrained) variables. The system has infinitely many solutions, and particular solutions can be obtained by assigning values to x2 and x4. For example, setting x2 = 1 and x4 = 2 yields the solution x1 = −3, x2 = 1, x3 = 11, and x4 = 2.

Matrix E: The second row of matrix E sometimes leads students to conclude erroneously that the system of equations is inconsistent. Note the critical difference between the third row of matrix C (which did represent an inconsistent system) and the second row of matrix E. In particular, if we write the system corresponding to E, we find

x1 + 2x2 = 5
x3 = 0.

Thus, the system has infinitely many solutions described by

x1 = 5 − 2x2
x3 = 0,

where x2 is an independent variable.

As we noted in Example 2, if an augmented matrix has a row of zeros, we sometimes jump to the conclusion (an erroneous conclusion) that the corresponding system of equations is inconsistent (see the discussion of matrix E in Example 2). Similar confusion can arise when the augmented matrix has a column of zeros. For example, consider the matrix

F = [1 0 0 −2 0 3; 0 0 1 −4 0 1; 0 0 0 0 1 2],

where F is the augmented matrix for a system of 3 equations in 5 unknowns. Thus, F represents the system


x1 − 2x4 = 3
x3 − 4x4 = 1
x5 = 2.

The solution of this system is x1 = 3 + 2x4, x3 = 1 + 4x4, x5 = 2, and x4 is arbitrary. Note that the equations place no constraint whatsoever on the variable x2. That does not mean that x2 must be zero; instead, it means that x2 is also arbitrary.

Recognizing an Inconsistent System

Suppose [A | b] is the augmented matrix for an (m × n) linear system of equations. If [A | b] is in reduced echelon form, you should be able to tell at a glance whether the linear system has any solutions. The idea was illustrated by matrix C in Example 2.

In particular, we can show that if the last nonzero row of [A | b] has its leading 1 in the last column, then the linear system has no solution. To see why this is true, suppose the last nonzero row of [A | b] has the form

[0, 0, 0, . . . , 0, 1].

This row, then, represents the equation

0x1 + 0x2 + 0x3 + · · · + 0xn = 1.

Because this equation cannot be satisfied, it follows that the linear system represented by [A | b] is inconsistent. We list this observation formally in the following remark.

Remark Let [A | b] be the augmented matrix for an (m × n) linear system of equations, and let [A | b] be in reduced echelon form. If the last nonzero row of [A | b] has its leading 1 in the last column, then the system of equations has no solution.

When you are carrying out the reduction of [A | b] to echelon form by hand, you might encounter a row that consists entirely of zeros except for a nonzero entry in the last column. In such a case, there is no reason to continue the reduction process, since you have found an equation in an equivalent system that has no solution; that is, the system represented by [A | b] is inconsistent.

Reduction to Echelon Form

The following theorem guarantees that every matrix can be transformed to one and only one matrix that is in reduced echelon form.

Theorem 2 Let B be an (m× n) matrix. There is a unique (m× n) matrix C such that:

(a) C is in reduced echelon form

and

(b) C is row equivalent to B.

Suppose B is the augmented matrix for an (m × n) system of linear equations. One important consequence of this theorem is that it shows we can always transform B by a series of elementary row operations into a matrix C which is in reduced echelon form. Then, because C is in reduced echelon form, it is easy to solve the equivalent linear system represented by C (recall Example 2).

The following steps show how to transform a given matrix B to reduced echelon form. As such, this list of steps constitutes an informal proof of the existence portion of Theorem 2. We do not prove the uniqueness portion of Theorem 2. The steps listed assume that B has at least one nonzero entry (because if B has only zero entries, then B is already in reduced echelon form).

Reduction to Reduced Echelon Form for an (m × n) Matrix

Step 1. Locate the first (left-most) column that contains a nonzero entry.
Step 2. If necessary, interchange the first row with another row so that the first nonzero column has a nonzero entry in the first row.
Step 3. If a denotes the leading nonzero entry in row one, multiply each entry in row one by 1/a. (Thus, the leading nonzero entry in row one is a 1.)
Step 4. Add appropriate multiples of row one to each of the remaining rows so that every entry below the leading 1 in row one is a 0.
Step 5. Temporarily ignore the first row of this matrix and repeat Steps 1–4 on the submatrix that remains. Stop the process when the resulting matrix is in echelon form.
Step 6. Having reached echelon form in Step 5, continue on to reduced echelon form as follows: Proceeding upward, add multiples of each nonzero row to the rows above in order to zero all entries above the leading 1.

The next example illustrates an application of the six-step process just described. When doing a small problem by hand, however, it is customary to alter the steps slightly: instead of going all the way to echelon form (sweeping from left to right) and then going from echelon to reduced echelon form (sweeping from bottom to top), it is customary to make a single pass (moving from left to right), introducing 0's above and below the leading 1. Example 3 demonstrates this single-pass variation.
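A MATLAB sketch of this single-pass variation follows (our own illustration; the function name reduce_to_rref is ours, and exact arithmetic is assumed, whereas a production routine such as MATLAB's built-in rref also guards against roundoff):

function C = reduce_to_rref(B)
% Single-pass reduction to reduced echelon form for an (m x n) matrix B.
[m, n] = size(B);
row = 1;                                 % row the next leading 1 goes in
for col = 1:n                            % Step 1: scan columns left to right
    p = find(B(row:m, col), 1) + row - 1;
    if isempty(p), continue, end         % column has no usable nonzero entry
    B([row p], :) = B([p row], :);       % Step 2: interchange rows
    B(row, :) = B(row, :) / B(row, col); % Step 3: scale to a leading 1
    for i = [1:row-1, row+1:m]           % Steps 4 and 6 in a single pass:
        B(i, :) = B(i, :) - B(i, col) * B(row, :);   % zero out the column
    end
    row = row + 1;                       % Step 5: repeat on the submatrix
    if row > m, break, end
end
C = B;
end

For instance, reduce_to_rref([1 1 3; 1 -1 1]) returns [1 0 2; 0 1 1].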

Example 3 Use elementary row operations to transform the following matrix to reduced echelon form:

A = [0  0   0  0  2   8   4;
     0  0   0  1  3  11   9;
     0  3 −12 −3 −9 −24 −33;
     0 −2   8  1  6  17  21].

Solution The following row operations will transform A to reduced echelon form.


R1 ↔ R3, (1/3)R1: Introduce a leading 1 into the first row of the first nonzero column.

[0  1 −4 −1 −3 −8 −11;
 0  0  0  1  3 11   9;
 0  0  0  0  2  8   4;
 0 −2  8  1  6 17  21]

R4 + 2R1: Introduce 0's below the leading 1 in row 1.

[0 1 −4 −1 −3 −8 −11;
 0 0  0  1  3 11   9;
 0 0  0  0  2  8   4;
 0 0  0 −1  0  1  −1]

R1 + R2, R4 + R2: Introduce 0's above and below the leading 1 in row 2.

[0 1 −4 0 0  3 −2;
 0 0  0 1 3 11  9;
 0 0  0 0 2  8  4;
 0 0  0 0 3 12  8]

(1/2)R3: Introduce a leading 1 into row 3.

[0 1 −4 0 0  3 −2;
 0 0  0 1 3 11  9;
 0 0  0 0 1  4  2;
 0 0  0 0 3 12  8]

R2 − 3R3, R4 − 3R3: Introduce 0's above and below the leading 1 in row 3.

[0 1 −4 0 0  3 −2;
 0 0  0 1 0 −1  3;
 0 0  0 0 1  4  2;
 0 0  0 0 0  0  2]

(1/2)R4: Introduce a leading 1 into row 4.

[0 1 −4 0 0  3 −2;
 0 0  0 1 0 −1  3;
 0 0  0 0 1  4  2;
 0 0  0 0 0  0  1]

R1 + 2R4, R2 − 3R4, R3 − 2R4: Introduce 0's above the leading 1 in row 4.

[0 1 −4 0 0  3 0;
 0 0  0 1 0 −1 0;
 0 0  0 0 1  4 0;
 0 0  0 0 0  0 1]


Having provided this example of how to transform a matrix to reduced echelon form, we can be more specific about the procedure for solving a system of equations that is diagrammed in Fig. 1.3.

Solving a System of Equations

Given a system of equations:
Step 1. Create the augmented matrix for the system.
Step 2. Transform the matrix in Step 1 to reduced echelon form.
Step 3. Decode the reduced matrix found in Step 2 to obtain its associated system of equations. (This system is equivalent to the original system.)
Step 4. By examining the reduced system in Step 3, describe the solution set for the original system.

The next example illustrates the complete process.

Example 4 Solve the following system of equations:

2x1 − 4x2 + 3x3 − 4x4 − 11x5 = 28
−x1 + 2x2 − x3 + 2x4 + 5x5 = −13
− 3x3 + x4 + 6x5 = −10
3x1 − 6x2 + 10x3 − 8x4 − 28x5 = 61.

Solution We first create the augmented matrix and then transform it to reduced echelon form. The augmented matrix is

[ 2 −4  3 −4 −11  28;
 −1  2 −1  2   5 −13;
  0  0 −3  1   6 −10;
  3 −6 10 −8 −28  61].

The first step is to introduce a leading 1 into row 1. We can introduce the leading 1 if we multiply row 1 by 1/2, but that would create fractions that are undesirable for hand work. As an alternative, we can add row 2 to row 1 and avoid fractions.

R1 + R2:

[ 1 −2  2 −2  −6  15;
 −1  2 −1  2   5 −13;
  0  0 −3  1   6 −10;
  3 −6 10 −8 −28  61]


R2 + R1, R4 − 3R1: Introduce 0's below the leading 1 in row 1.

[1 −2  2 −2  −6  15;
 0  0  1  0  −1   2;
 0  0 −3  1   6 −10;
 0  0  4 −2 −10  16]

R1 − 2R2, R3 + 3R2, R4 − 4R2: Introduce 0's above and below the leading 1 in row 2.

[1 −2 0 −2 −4 11;
 0  0 1  0 −1  2;
 0  0 0  1  3 −4;
 0  0 0 −2 −6  8]

R1 + 2R3, R4 + 2R3: Introduce 0's above and below the leading 1 in row 3.

[1 −2 0 0  2  3;
 0  0 1 0 −1  2;
 0  0 0 1  3 −4;
 0  0 0 0  0  0]

The matrix above represents the system of equations

x1 − 2x2 + 2x5 = 3
x3 − x5 = 2
x4 + 3x5 = −4.

Solving the preceding system, we find:

x1 = 3 + 2x2 − 2x5
x3 = 2 + x5
x4 = −4 − 3x5        (1)

In Eq. (1) we have a nice description of all of the infinitely many solutions to the original system; it is called the general solution for the system. For this example, x2 and x5 are viewed as independent (or unconstrained) variables and can be assigned values arbitrarily. The variables x1, x3, and x4 are dependent (or constrained) variables, and their values are determined by the values assigned to x2 and x5. For example, in Eq. (1), setting x2 = 1 and x5 = −1 yields a particular solution given by x1 = 7, x2 = 1, x3 = 1, x4 = −1, and x5 = −1.
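It is good practice to check such a particular solution against the original system. A MATLAB sketch of that check (our own illustration):

A = [ 2 -4  3 -4 -11;      % coefficient matrix of the original system
     -1  2 -1  2   5;
      0  0 -3  1   6;
      3 -6 10 -8 -28];
b = [28; -13; -10; 61];
x = [7; 1; 1; -1; -1];     % from Eq. (1) with x2 = 1 and x5 = -1
A*x - b                    % returns the zero vector, confirming the solution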

Electronic Aids and Software

One testimony to the practical importance of linear algebra is the wide variety of electronic aids available for linear algebra computations. For instance, many scientific calculators can solve systems of linear equations and perform simple matrix operations. For computers there are general-purpose computer algebra systems such as Derive, Mathematica, and Maple that have extensive computational capabilities. Special-purpose linear algebra software such as MATLAB is very easy to use and can perform virtually any type of matrix calculation.

In the following example, we illustrate the use of MATLAB. From time to time, as appropriate, we will include other examples that illustrate the use of electronic aids.

Example 5 In certain applications, it is necessary to evaluate sums of powers of integers such as

1 + 2 + 3 + · · · + n,
1^2 + 2^2 + 3^2 + · · · + n^2,
1^3 + 2^3 + 3^3 + · · · + n^3, and so on.

Interestingly, it is possible to derive simple formulas for such sums. For instance, you might be familiar with the formula

1 + 2 + 3 + · · · + n = n(n + 1)/2.

Such formulas can be derived using the following result: If n and r are positive integers, then there are constants a1, a2, . . . , a_{r+1} such that

1^r + 2^r + 3^r + · · · + n^r = a1n + a2n^2 + a3n^3 + · · · + a_{r+1}n^{r+1}.    (2)

Use Eq. (2) to find the formula for 1^3 + 2^3 + 3^3 + · · · + n^3. (Note: Eq. (2) can be derived from the theory of linear difference equations.)

Solution From Eq. (2) there are constants a1, a2, a3, and a4 such that

1^3 + 2^3 + 3^3 + · · · + n^3 = a1n + a2n^2 + a3n^3 + a4n^4.

If we evaluate the formula just given for n = 1, n = 2, n = 3, and n = 4, we obtain four equations for a1, a2, a3, and a4:

a1 + a2 + a3 + a4 = 1             (n = 1)
2a1 + 4a2 + 8a3 + 16a4 = 9        (n = 2)
3a1 + 9a2 + 27a3 + 81a4 = 36      (n = 3)
4a1 + 16a2 + 64a3 + 256a4 = 100.  (n = 4)

The augmented matrix for this system is

A = [1  1  1   1   1;
     2  4  8  16   9;
     3  9 27  81  36;
     4 16 64 256 100].

We used MATLAB to solve the system by transforming A to reduced echelon form. The steps, as they appear on a computer screen, are shown in Fig. 1.6. The symbol >> is a prompt from MATLAB. At the first prompt, we entered the augmented matrix A


>> A=[1,1,1,1,1;2,4,8,16,9;3,9,27,81,36;4,16,64,256,100]

A =
     1     1     1     1     1
     2     4     8    16     9
     3     9    27    81    36
     4    16    64   256   100

>> C=rref(A)

C =
    1.0000         0         0         0         0
         0    1.0000         0         0    0.2500
         0         0    1.0000         0    0.5000
         0         0         0    1.0000    0.2500

>> C

C =
    1      0      0      0      0
    0      1      0      0    1/4
    0      0      1      0    1/2
    0      0      0      1    1/4

Figure 1.6 Using MATLAB in Example 5 to row reduce the matrix A to the matrix C.

and then MATLAB displayed A. At the second prompt, we entered the MATLAB row-reduction command, C = rref(A). The new matrix C, as displayed by MATLAB, is the result of transforming A to reduced echelon form.

MATLAB normally displays results in decimal form. To obtain a rational form for the reduced matrix C, from the submenu numerical form we selected rat and entered C, finding

C = [1 0 0 0   0;
     0 1 0 0 1/4;
     0 0 1 0 1/2;
     0 0 0 1 1/4].

From this, we have a1 = 0, a2 = 1/4, a3 = 1/2, and a4 = 1/4. Therefore, the formula for the sum of the first n cubes is

1^3 + 2^3 + 3^3 + · · · + n^3 = (1/4)n^2 + (1/2)n^3 + (1/4)n^4

or, after simplification,

1^3 + 2^3 + 3^3 + · · · + n^3 = n^2(n + 1)^2 / 4.


ADDING INTEGERS Mathematical folklore has it that Gauss discovered the formula 1 + 2 + 3 + · · · + n = n(n + 1)/2 when he was only ten years old. To occupy time, his teacher asked the students to add the integers from 1 to 100. Gauss immediately wrote an answer and turned his slate over. To his teacher's amazement, Gauss had the only correct answer in the class. Young Gauss had recognized that the numbers could be put in 50 sets of pairs such that the sum of each pair was 101:

(50 + 51) + (49 + 52) + (48 + 53) + · · · + (1 + 100) = 50(101) = 5050.

Soon his brilliance was brought to the attention of the Duke of Brunswick, who thereafter sponsored the education of Gauss.

1.2 EXERCISES

Consider the matrices in Exercises 1–10.
a) Either state that the matrix is in echelon form or use elementary row operations to transform it to echelon form.
b) If the matrix is in echelon form, transform it to reduced echelon form.

1. [1 2; 0 1]
2. [1 2 −1; 0 1 3]
3. [2 3 1; 4 1 0]
4. [0 1 1; 1 2 3]
5. [0 0 2 3; 2 0 1 4]
6. [2 0 3 1; 0 0 1 2]
7. [1 3 2 1; 0 1 4 2; 0 0 1 1]
8. [2 −1 3; 0 1 1; 0 0 −3]
9. [1 2 −1 −2; 0 2 −2 −3; 0 0 0 1]
10. [−1 4 −3 4 6; 0 2 1 −3 −3; 0 0 0 1 2]

In Exercises 11–21, each of the given matrices represents the augmented matrix for a system of linear equations. In each exercise, display the solution set or state that the system is inconsistent.

11. [1 1 0; 0 1 0]
12. [1 1 0; 0 0 2]
13. [1 2 1 0; 0 1 3 1]
14. [1 2 2 1; 0 1 0 0]
15. [1 1 1 0; 0 1 0 0; 0 0 0 1]
16. [1 2 0 1; 0 1 1 0; 0 0 2 0]
17. [1 0 1 0 0; 0 0 1 1 0; 0 0 0 1 0]
18. [1 2 1 3; 0 0 0 2; 0 0 0 0]
19. [1 0 0 1; 0 1 0 1; 0 0 0 1]
20. [1 1 2 0 2 0; 0 1 1 1 0 0; 0 0 1 2 1 2]
21. [2 1 3 2 0 1; 0 0 1 1 2 1; 0 0 0 0 3 0]

In Exercises 22–35, solve the system by transforming the augmented matrix to reduced echelon form.

22. 2x1 − 3x2 = 5
    −4x1 + 6x2 = −10
23. x1 − 2x2 = 3
    2x1 − 4x2 = 1


24. x1 − x2 + x3 = 3
    2x1 + x2 − 4x3 = −3
25. x1 + x2 = 2
    3x1 + 3x2 = 6
26. x1 − x2 + x3 = 4
    2x1 − 2x2 + 3x3 = 2
27. x1 + x2 − x3 = 2
    −3x1 − 3x2 + 3x3 = −6
28. 2x1 + 3x2 − 4x3 = 3
    x1 − 2x2 − 2x3 = −2
    −x1 + 16x2 + 2x3 = 16
29. x1 + x2 − x3 = 1
    2x1 − x2 + 7x3 = 8
    −x1 + x2 − 5x3 = −5
30. x1 + x2 − x5 = 1
    x2 + 2x3 + x4 + 3x5 = 1
    x1 − x3 + x4 + x5 = 0
31. x1 + x3 + x4 − 2x5 = 1
    2x1 + x2 + 3x3 − x4 + x5 = 0
    3x1 − x2 + 4x3 + x4 + x5 = 1
32. x1 + x2 = 1
    x1 − x2 = 3
    2x1 + x2 = 3
33. x1 + x2 = 1
    x1 − x2 = 3
    2x1 + x2 = 2
34. x1 + 2x2 = 1
    2x1 + 4x2 = 2
    −x1 − 2x2 = −1
35. x1 − x2 − x3 = 1
    x1 + x3 = 2
    x2 + 2x3 = 3

In Exercises 36–40, find all values a for which the system has no solution.

36. x1 + 2x2 = −3
    ax1 − 2x2 = 5
37. x1 + 3x2 = 4
    2x1 + 6x2 = a
38. 2x1 + 4x2 = a
    3x1 + 6x2 = 5
39. 3x1 + ax2 = 3
    ax1 + 3x2 = 5
40. x1 + ax2 = 6
    ax1 + 2ax2 = 4

In Exercises 41 and 42, find all values α and β where 0 ≤ α ≤ 2π and 0 ≤ β ≤ 2π.

41. 2 cos α + 4 sin β = 3
    3 cos α − 5 sin β = −1
42. 2 cos^2 α − sin^2 β = 1
    12 cos^2 α + 8 sin^2 β = 13

43. Describe the solution set of the following system in terms of x3:

    x1 + x2 + x3 = 3
    x1 + 2x2 = 5.

For x1, x2, x3 in the solution set:
a) Find the maximum value of x3 such that x1 ≥ 0 and x2 ≥ 0.
b) Find the maximum value of y = 2x1 − 4x2 + x3 subject to x1 ≥ 0 and x2 ≥ 0.
c) Find the minimum value of y = (x1 − 1)^2 + (x2 + 3)^2 + (x3 + 1)^2 with no restriction on x1 or x2. [Hint: Regard y as a function of x3 and set the derivative equal to 0; then apply the second-derivative test to verify that you have found a minimum.]

44. Let A and I be as follows:

    A = [1 c; d b],   I = [1 0; 0 1].

Prove that if b − cd ≠ 0, then A is row equivalent to I.

45. As in Fig. 1.4, display all the possible configurations for a (2 × 3) matrix that is in echelon form. [Hint: There are seven such configurations. Consider the various positions that can be occupied by one, two, or none of the symbols.]

46. Repeat Exercise 45 for a (3 × 2) matrix, for a (3 × 3) matrix, and for a (3 × 4) matrix.

47. Consider the matrices B and C:

    B = [1 2; 2 3],   C = [1 2; 3 4].

By Exercise 44, B and C are both row equivalent to matrix I in Exercise 44. Determine elementary row operations that demonstrate that B is row equivalent to C.

48. Repeat Exercise 47 for the matrices

    B = [1 4; 3 7],   C = [1 2; 2 1].

49. A certain three-digit number N equals fifteen times the sum of its digits. If its digits are reversed, the resulting number exceeds N by 396. The one's digit is one larger than the sum of the other two. Give a linear system of three equations whose three unknowns are the digits of N. Solve the system and find N.

50. Find the equation of the parabola, y = ax^2 + bx + c, that passes through the points (−1, 6), (1, 4), and (2, 9). [Hint: For each point, give a linear equation in a, b, and c.]

51. Three people play a game in which there are always two winners and one loser. They have the understanding that the loser gives each winner an amount equal to what the winner already has. After three games, each has lost just once and each has $24. With how much money did each begin?

52. Find three numbers whose sum is 34 when the sum of the first and second is 7, and the sum of the second and third is 22.

53. A zoo charges $6 for adults, $3 for students, and $.50 for children. One morning 79 people enter and pay a total of $207. Determine the possible numbers of adults, students, and children.

54. Find a cubic polynomial, p(x) = a + bx + cx^2 + dx^3, such that p(1) = 5, p′(1) = 5, p(2) = 17, and p′(2) = 21.

In Exercises 55–58, use Eq. (2) to find the formula for the sum. If available, use linear algebra software for Exercises 57 and 58.

55. 1 + 2 + 3 + · · · + n
56. 1^2 + 2^2 + 3^2 + · · · + n^2
57. 1^4 + 2^4 + 3^4 + · · · + n^4
58. 1^5 + 2^5 + 3^5 + · · · + n^5

1.3 CONSISTENT SYSTEMS OF LINEAR EQUATIONS

We saw in Section 1.1 that a system of linear equations may have a unique solution, infinitely many solutions, or no solution. In this section and in later sections, it will be shown that with certain added bits of information we can, without solving the system, either eliminate one of the three possible outcomes or determine precisely what the outcome will be. This will be important later when situations will arise in which we are not interested in obtaining a specific solution, but we need to know only how many solutions there are.

To illustrate, consider the general (2 × 3) linear system

a11x1 + a12x2 + a13x3 = b1
a21x1 + a22x2 + a23x3 = b2.

Geometrically, the system is represented by two planes, and a solution corresponds to a point in the intersection of the planes. The two planes may be parallel, they may be coincident (the same plane), or they may intersect in a line. Thus the system is either inconsistent or has infinitely many solutions; the existence of a unique solution is impossible.

Solution Possibilities for a Consistent Linear System

We begin our analysis by considering the (m × n) system of linear equations

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
. . .
am1x1 + am2x2 + · · · + amnxn = bm.    (1)

Our goal is to deduce as much information as possible about the solution set of system (1) without actually solving the system.

To that end, let [A | b] denote the augmented matrix for system (1). We know we can use row operations to transform the [m × (n + 1)] matrix [A | b] to a row equivalent matrix [C | d], where [C | d] is in reduced echelon form. Hence, instead of trying to deduce the various possibilities for the solution set of (1), we will focus on the simpler problem of analyzing the solution possibilities for the equivalent system represented by the matrix [C | d].

We begin by making four remarks about an [m × (n + 1)] matrix [C | d] that is in reduced echelon form. Our first remark recalls an observation made in Section 1.2.

Remark 1: The system represented by the matrix [C | d] is inconsistent if and only if [C | d] has a row of the form [0, 0, 0, . . . , 0, 1].

Our second remark also follows because [C | d] is in reduced echelon form. In particular, we know every nonzero row of [C | d] has a leading 1. We also know there are no other nonzero entries in a column of [C | d] that contains a leading 1. Thus, if xk is the variable corresponding to a leading 1, then xk can be expressed in terms of other variables that do not correspond to any leading ones in [C | d]. Therefore, we obtain

Remark 2: Every variable corresponding to a leading 1 in [C | d] is a dependent variable. (That is, each "leading-one variable" can be expressed in terms of the independent or "nonleading-one variables.")

We illustrate Remark 2 with the following example.

Example 1 Consider the matrix [C | d] given by

[C | d] = [1 2 0 3 0 4 1;
           0 0 1 2 0 3 2;
           0 0 0 0 1 1 2;
           0 0 0 0 0 0 0;
           0 0 0 0 0 0 0].

The matrix [C | d] is in reduced echelon form and represents the consistent system

x1 + 2x2 + 3x4 + 4x6 = 1
x3 + 2x4 + 3x6 = 2
x5 + x6 = 2.

The dependent variables (corresponding to the leading 1's) are x1, x3, and x5. They can be expressed in terms of the other (independent) variables as follows:

x1 = 1 − 2x2 − 3x4 − 4x6
x3 = 2 − 2x4 − 3x6
x5 = 2 − x6.

Our third remark gives a bound on the number of nonzero rows in [C | d]. Let r denote the number of nonzero rows in [C | d]. (Later we will see that the number r is called the "rank" of C.) Since every nonzero row contains a leading 1, the number r is equal to the number of leading 1's. Because the matrix is in echelon form, there cannot be more leading 1's in [C | d] than there are columns. Since the matrix [C | d] has n + 1 columns, we conclude:

Remark 3: Let r denote the number of nonzero rows in [C | d]. Then r ≤ n + 1.


Our fourth remark is a consequence of Remark 1 and Remark 3. Let r denote the number of nonzero rows in [C | d]. If r = n + 1, then [C | d] has a row of the form [0, 0, . . . , 0, 1] and hence the system represented by [C | d] must be inconsistent. Therefore, if the system is consistent, we need to have r < n + 1. This observation leads to:

Remark 4: Let r denote the number of nonzero rows in [C | d]. If the system represented by [C | d] is consistent, then r ≤ n.

In general, let [C | d] be an [m × (n + 1)] matrix in reduced echelon form where [C | d] represents a consistent system. According to Remark 2, if [C | d] has r nonzero rows, then there are r dependent (constrained) variables in the solution of the system corresponding to [C | d]. In addition, by Remark 4, we know r ≤ n. Since there are n variables altogether in this (m × n) system, the remaining n − r variables are independent (or unconstrained) variables. See Theorem 3.

Theorem 3 Let [C | d] be an [m × (n + 1)] matrix in reduced echelon form, where [C | d] represents a consistent system. Let [C | d] have r nonzero rows. Then r ≤ n and in the solution of the system there are n − r variables that can be assigned arbitrary values.

Theorem 3 is illustrated below in Example 2.

Example 2 Illustrate Theorem 3 using the results of Example 1.

Solution The augmented matrix [C | d] in Example 1 is (5 × 7) and represents a consistent system since it does not have a row of the form [0, 0, 0, 0, 0, 0, 1]. The matrix has r = 3 nonzero rows and hence must have n − r = 6 − 3 = 3 independent variables. The 3 dependent variables and 3 independent variables are displayed in Example 1.

The remark in Section 1.1 that a system of linear equations has either infinitely many solutions, no solution, or a unique solution is an immediate consequence of Theorem 3. To see why, let [A | b] denote the augmented matrix for a system of m equations in n unknowns. Then [A | b] is row equivalent to a matrix [C | d] that is in reduced echelon form. Since the two augmented matrices represent equivalent systems, both of the systems have the same solution set. By Theorem 3, we know the only possibilities for the system represented by [C | d] (and hence for the system represented by [A | b]) are the three cases below (the sketch following the list shows how a machine can distinguish them):

1. The system is inconsistent.
2. The system is consistent and, in the notation of Theorem 3, r < n. In this case there are n − r unconstrained variables, so the system has infinitely many solutions.
3. The system is consistent and r = n. In this case there are no unconstrained variables, so the system has a unique solution.
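The following MATLAB sketch is our own illustration (the function name classify_system is hypothetical); it reduces an augmented matrix [A | b] and reports which of the three cases holds:

function classify_system(Ab)
% Classify the linear system whose augmented matrix is Ab = [A | b].
n = size(Ab, 2) - 1;                 % number of unknowns
C = rref(Ab);                        % reduced echelon form [C | d]
r = nnz(any(C, 2));                  % r = number of nonzero rows
% Inconsistent exactly when some row is zero except in the last column:
if any(~any(C(:, 1:n), 2) & C(:, n+1) ~= 0)
    disp('no solution')
elseif r < n
    fprintf('infinitely many solutions (%d unconstrained variables)\n', n - r)
else
    disp('unique solution')
end
end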

We can also use Theorem 3 to draw some conclusions about a general (m × n) system of linear equations in the case where m < n. These conclusions are given in the following corollary. Note that the hypotheses do not require the augmented matrix for the system to be in echelon form. Nor do the hypotheses require the system to be consistent.


Corollary Consider an (m × n) system of linear equations. If m < n, then either the system is inconsistent or it has infinitely many solutions.

Proof Consider an (m × n) system of linear equations where m < n. If the system is inconsistent, there is nothing to prove. If the system is consistent, then Theorem 3 applies. For a consistent system, suppose that the augmented matrix [A | b] is row equivalent to a matrix [C | d] that is in echelon form and has r nonzero rows. Because the given system has m equations, the augmented matrix [A | b] has m rows. Therefore the matrix [C | d] also has m rows. Because r is the number of nonzero rows for [C | d], it is clear that r ≤ m. But m < n, so it follows that r < n. By Theorem 3, there are n − r independent variables. Because n − r > 0, the system has infinitely many solutions.

Example 3 What are the possibilities for the solution set of a (3 × 4) system of linear equations? If the system is consistent, what are the possibilities for the number of independent variables?

Solution By the corollary to Theorem 3, the system either has no solution or has infinitely many solutions. If the system reduces to a system with r equations, then r ≤ 3. Thus r must be 1, 2, or 3. (The case r = 0 can occur only when the original system is the trivial system in which all coefficients and all constants are zero.) If the system is consistent, the number of free parameters is 4 − r, so the possibilities are 3, 2, and 1.

Example 4 What are the possibilities for the solution set of the following (3 × 4) system?

2x1 − x2 + x3 − 3x4 = 0
x1 + 3x2 − 2x3 + x4 = 0
−x1 − 2x2 + 4x3 − x4 = 0

Solution First note that x1 = x2 = x3 = x4 = 0 is a solution, so the system is consistent. By the corollary to Theorem 3, the system must have infinitely many solutions (here m = 3 and n = 4, so m < n).

Homogeneous Systems

The system in Example 4 is an example of a homogeneous system of equations. More generally, the (m × n) system of linear equations given in (2) is called a homogeneous system of linear equations:

a11x1 + a12x2 + · · · + a1nxn = 0
a21x1 + a22x2 + · · · + a2nxn = 0
. . .
am1x1 + am2x2 + · · · + amnxn = 0.    (2)

Thus system (2) is the special case of the general (m × n) system (1) given earlier in which b1 = b2 = · · · = bm = 0. Note that a homogeneous system is always consistent, because x1 = x2 = · · · = xn = 0 is a solution to system (2). This solution is called the trivial solution or zero solution, and any other solution is called a nontrivial solution. A homogeneous system of equations, therefore, either has the trivial solution as the unique solution or also has nontrivial (and hence infinitely many) solutions. With these observations, the following important theorem is an immediate consequence of the corollary to Theorem 3.

Theorem 4 A homogeneous (m × n) system of linear equations always has infinitely many nontrivial solutions when m < n.

Example 5 What are the possibilities for the solution set of

x1 + 2x2 + x3 + 3x4 = 0
2x1 + 4x2 + 3x3 + x4 = 0
3x1 + 6x2 + 6x3 + 2x4 = 0?

Solve the system.

Solution By Theorem 4, the system has infinitely many nontrivial solutions. We solve by reducing the augmented matrix:

[1 2 1 3 0;
 2 4 3 1 0;
 3 6 6 2 0].

R2 − 2R1, R3 − 3R1:

[1 2 1  3 0;
 0 0 1 −5 0;
 0 0 3 −7 0]

R3 − 3R2, R1 − R2:

[1 2 0  8 0;
 0 0 1 −5 0;
 0 0 0  8 0]

(1/8)R3, R1 − 8R3, R2 + 5R3:

[1 2 0 0 0;
 0 0 1 0 0;
 0 0 0 1 0].

Note that the last column of zeros is maintained under elementary row operations, so the given system is equivalent to the homogeneous system

x1 + 2x2 = 0
x3 = 0
x4 = 0.

Therefore, we obtain

x1 = −2x2
x3 = 0
x4 = 0

as the solution.


Example 6 What are the possibilities for the solution set of

2x1 + 4x2 + 2x3 = 0
−2x1 − 2x2 + 2x3 = 0
2x1 + 6x2 + 9x3 = 0?

Solve the system.

Solution Theorem 4 no longer applies because m = n = 3. However, because the system is homogeneous, either the trivial solution is the unique solution or there are infinitely many nontrivial solutions. To solve, we reduce the augmented matrix

[ 2  4 2 0;
 −2 −2 2 0;
  2  6 9 0].

(1/2)R1, R2 + 2R1, R3 − 2R1:

[1 2 1 0;
 0 2 4 0;
 0 2 7 0]

(1/2)R2, R1 − 2R2, R3 − 2R2:

[1 0 −3 0;
 0 1  2 0;
 0 0  3 0]

(1/3)R3, R1 + 3R3, R2 − 2R3:

[1 0 0 0;
 0 1 0 0;
 0 0 1 0].

Therefore, we find x1 = 0, x2 = 0, and x3 = 0 is the only solution to the system.

Example 7 For the system of equations

x1 − 2x2 + 3x3 = b1
2x1 − 3x2 + 2x3 = b2
−x1 + 5x3 = b3,

determine conditions on b1, b2, and b3 that are necessary and sufficient for the system to be consistent.

Solution The augmented matrix is

[ 1 −2 3 b1;
  2 −3 2 b2;
 −1  0 5 b3].


The augmented matrix reduces to

[1 0 −5 −3b1 + 2b2;
 0 1 −4 −2b1 + b2;
 0 0  0 −3b1 + 2b2 + b3].

If −3b1 + 2b2 + b3 ≠ 0, the system is inconsistent. On the other hand, if −3b1 + 2b2 + b3 = 0, then the system has general solution

x1 = −3b1 + 2b2 + 5x3
x2 = −2b1 + b2 + 4x3.

Thus, the given system is consistent if and only if −3b1 + 2b2 + b3 = 0.

Conic Sections and Quadric Surfaces

An interesting application of homogeneous equations involves the quadratic equation in two variables:

ax^2 + bxy + cy^2 + dx + ey + f = 0.    (3)

If Eq. (3) has real solutions, then the graph is a curve in the xy-plane. If at least one of a, b, or c is nonzero, the resulting graph is known as a conic section. Conic sections include such familiar plane figures as parabolas, ellipses, hyperbolas, and (as well) certain degenerate forms such as points and lines. Objects as diverse as planets, comets, man-made satellites, and electrons follow trajectories in space that correspond to conic sections. The earth, for instance, travels in an elliptical path about the sun, with the sun at one focus of the ellipse.

In this subsection we consider an important data-fitting problem associated with Eq. (3), namely:

Suppose we are given several points in the xy-plane, (x1, y1), (x2, y2), . . . , (xn, yn). Can we find coefficients a, b, . . . , f so that the graph of Eq. (3) passes through the given points?

For example, if we know an object is moving along an ellipse, can we make a few observations of the object's position and then determine its complete orbit? As we will see, the answer is yes. In fact, if an object follows a trajectory that corresponds to the graph of Eq. (3), then five or fewer observations are sufficient to determine the complete trajectory.

The following example introduces the data-fitting technique. As you will see, Example 8 describes a method for finding the equation of the line passing through two points in the plane. This is a simple and familiar problem, but its very simplicity is a virtue because it suggests methods we can use for solving more complicated problems.

Example 8 The general equation of a line is dx + ey + f = 0. Find the equation of the line through the points (1, 2) and (3, 7).


Solution In an analytic geometry course, we would probably find the equation of the line by first calculating the slope of the line. In this example, however, we are interested in developing methods that can be used to find equations for more complicated curves; and we do not want to use special purpose techniques, such as slopes, that apply only to lines.

Since the points (1, 2) and (3, 7) lie on the line defined by dx + ey + f = 0, we insert these values into the equation and find the following conditions on the coefficients d, e, and f:

d + 2e + f = 0
3d + 7e + f = 0.

We are guaranteed from Theorem 4 that the preceding homogeneous linear system has nontrivial solutions; that is, we can find a line passing through the two given points. To find the equation of the line, we need to solve the system. We begin by forming the associated augmented matrix

[1 2 1 0;
 3 7 1 0].

The preceding matrix can be transformed to reduced echelon form, yielding

[1 0  5 0;
 0 1 −2 0].

It follows that the solution is d = −5f, e = 2f, and hence the equation of the line is

−5fx + 2fy + f = 0.

Canceling the parameter f, we obtain an equation for the line:

−5x + 2y + 1 = 0.

Example 8 suggests how we might determine the equation of a conic that passes through a given set of points in the xy-plane. In particular, see Eq. (3); the general conic has six coefficients, a, b, . . . , f. So, given any five points (xi, yi) we can insert these five points into Eq. (3) and the result will be a homogeneous system of five equations for the six unknown coefficients that define the conic section. By Theorem 4, the resulting system is guaranteed to have a nontrivial solution; that is, we can guarantee that any five points in the plane lie on the graph of an equation of the form (3). Example 9 illustrates this point.

Example 9 Find the equation of the conic section passing through the five points (−1, 0), (0, 1), (2, 2), (2, −1), and (0, −3). Display the graph of the conic.

Solution The augmented matrix for the corresponding homogeneous system of five equations in six unknowns is listed below. In creating the augmented matrix, we formed the rows in the same order the points were listed and formed columns using the same order the unknowns were listed in Eq. (3). For example, the third row of the augmented matrix arises from inserting (2, 2) into Eq. (3):

4a + 4b + 4c + 2d + 2e + f = 0.

In particular, the augmented matrix is

[1  0 0 −1  0 1 0;
 0  0 1  0  1 1 0;
 4  4 4  2  2 1 0;
 4 −2 1  2 −1 1 0;
 0  0 9  0 −3 1 0].

We used MATLAB to transform the augmented matrix to reduced echelon form, finding

[1 0 0 0 0    7/18 0;
 0 1 0 0 0   −1/2  0;
 0 0 1 0 0    1/3  0;
 0 0 0 1 0 −11/18  0;
 0 0 0 0 1    2/3  0].

Thus, the coefficients of the conic through these five points are given by

a = −7f/18, b = f/2, c = −f/3, d = 11f/18, e = −2f/3.

Setting f = 18, we obtain a version of Eq. (3) with integer coefficients:

−7x^2 + 9xy − 6y^2 + 11x − 12y + 18 = 0.

The graph of this equation is an ellipse and is shown in Fig. 1.7. The graph was drawn using the contour command from MATLAB. Contour plots and other features of MATLAB graphics are described in the Appendix.
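The computation itself is easy to script. In the following MATLAB sketch (our own illustration), each point contributes one row of the homogeneous system, and the rational null-space basis reproduces the solution found above:

P = [-1 0; 0 1; 2 2; 2 -1; 0 -3];        % the five data points
A = zeros(5, 6);
for i = 1:5
    x = P(i, 1);  y = P(i, 2);
    A(i, :) = [x^2, x*y, y^2, x, y, 1];  % row obtained from Eq. (3)
end
v = null(A, 'r')     % proportional to [-7/18; 1/2; -1/3; 11/18; -2/3; 1]
18 * v               % the integer coefficients -7, 9, -6, 11, -12, 18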

Finally, it should be noted that the ideas discussed above are not limited to the xy-plane. For example, consider the quadratic equation in three variables:

ax^2 + by^2 + cz^2 + dxy + exz + fyz + gx + hy + iz + j = 0.    (4)

The graph of Eq. (4) is a surface in three-space; the surface is known as a quadric surface. Counting the coefficients in Eq. (4), we find ten. Thus, given any nine points in three-space, we can find a quadric surface passing through the nine points (see Exercises 30–31).


Figure 1.7 The ellipse determined by five data points; see Example 9.

1.3 EXERCISES

In Exercises 1–4, transform the augmented matrix for the given system to reduced echelon form and, in the notation of Theorem 3, determine n, r, and the number, n − r, of independent variables. If n − r > 0, then identify n − r independent variables.

1. 2x1 + 2x2 − x3 = 1
   −2x1 − 2x2 + 4x3 = 1
   2x1 + 2x2 + 5x3 = 5
   −2x1 − 2x2 − 2x3 = −3
2. 2x1 + 2x2 = 1
   4x1 + 5x2 = 4
   4x1 + 2x2 = −2
3. − x2 + x3 + x4 = 2
   x1 + 2x2 + 2x3 − x4 = 3
   x1 + 3x2 + x3 = 2
4. x1 + 2x2 + 3x3 + 2x4 = 1
   x1 + 2x2 + 3x3 + 5x4 = 2
   2x1 + 4x2 + 6x3 + x4 = 1
   −x1 − 2x2 − 3x3 + 7x4 = 2

In Exercises 5 and 6, assume that the given system is consistent. For each system determine, in the notation of Theorem 3, all possibilities for the number, r, of nonzero rows and the number, n − r, of unconstrained variables. Can the system have a unique solution?


5. ax1 + bx2 = c
   dx1 + ex2 = f
   gx1 + hx2 = i
6. a11x1 + a12x2 + a13x3 + a14x4 = b1
   a21x1 + a22x2 + a23x3 + a24x4 = b2
   a31x1 + a32x2 + a33x3 + a34x4 = b3

In Exercises 7–18, determine all possibilities for the solution set (from among infinitely many solutions, a unique solution, or no solution) of the system of linear equations described.

7. A homogeneous system of 3 equations in 4 unknowns.
8. A homogeneous system of 4 equations in 5 unknowns.
9. A system of 3 equations in 2 unknowns.
10. A system of 4 equations in 3 unknowns.
11. A homogeneous system of 3 equations in 2 unknowns.
12. A homogeneous system of 4 equations in 3 unknowns.
13. A system of 2 equations in 3 unknowns that has x1 = 1, x2 = 2, x3 = −1 as a solution.
14. A system of 3 equations in 4 unknowns that has x1 = −1, x2 = 0, x3 = 2, x4 = −3 as a solution.
15. A homogeneous system of 2 equations in 2 unknowns.
16. A homogeneous system of 3 equations in 3 unknowns.
17. A homogeneous system of 2 equations in 2 unknowns that has solution x1 = 1, x2 = −1.
18. A homogeneous system of 3 equations in 3 unknowns that has solution x1 = 1, x2 = 3, x3 = −1.

In Exercises 19–22, determine by inspection whether the given system has nontrivial solutions or only the trivial solution.

19. 2x1 + 3x2 − x3 = 0
    x1 − x2 + 2x3 = 0
20. x1 + 2x2 − x3 + 2x4 = 0
    2x1 + x2 + x3 − x4 = 0
    3x1 − x2 − 2x3 + 3x4 = 0
21. x1 + 2x2 − x3 = 0
    x2 + 2x3 = 0
    4x3 = 0
22. x1 − x2 = 0
    3x1 = 0
    2x1 + x2 = 0
23. For what value(s) of a does the system have nontrivial solutions?
    x1 + 2x2 + x3 = 0
    −x1 − x2 + x3 = 0
    3x1 + 4x2 + ax3 = 0.

24. Consider the system of equations

    x1 + 3x2 − x3 = b1
    x1 + 2x2 = b2
    3x1 + 7x2 − x3 = b3.

a) Determine conditions on b1, b2, and b3 that are necessary and sufficient for the system to be consistent. [Hint: Reduce the augmented matrix for the system.]
b) In each of the following, either use your answer from a) to show the system is inconsistent or exhibit a solution.
   i) b1 = 1, b2 = 1, b3 = 3
   ii) b1 = 1, b2 = 0, b3 = −1
   iii) b1 = 0, b2 = 1, b3 = 2

25. Let B be a (4 × 3) matrix in reduced echelon form.
a) If B has three nonzero rows, then determine the form of B. (Using Fig. 1.5 of Section 1.2 as a guide, mark entries that may or may not be zero by ∗.)
b) Suppose that a system of 4 linear equations in 2 unknowns has augmented matrix A, where A is a (4 × 3) matrix row equivalent to B. Demonstrate that the system of equations is inconsistent.

In Exercises 26–31, follow the ideas illustrated in Examples 8 and 9 to find the equation of the curve or surface through the given points. For Exercises 28–29, display the graph of the equation as in Fig. 1.7.

26. The line through (3, 1) and (7, 2).
27. The line through (2, 8) and (4, 1).
28. The conic through (−4, 0), (−2, −2), (0, 3), (1, 1), and (4, 0).
29. The conic through (−4, 1), (−1, 2), (3, 2), (5, 1), and (7, −1).
30. The quadric surface through (0, 0, 1), (1, 0, 1), (0, 1, 0), (3, 1, 0), (2, 0, 4), (1, 1, 2), (1, 2, 1), (2, 2, 3), (2, 2, 1).
31. The quadric surface through (1, 2, 3), (2, 1, 0), (6, 0, 6), (3, 1, 3), (4, 0, 2), (5, 5, 1), (1, 1, 2), (3, 1, 4), (0, 0, 2).


In Exercises 32–33, note that the equation of a circle has the form

ax^2 + ay^2 + bx + cy + d = 0.

Hence a circle is determined by three points. Find the equation of the circle through the given points.

32. (1, 1), (2, 1), and (3, 2)
33. (4, 3), (1, 2), and (2, 0)

1.4 APPLICATIONS (OPTIONAL)

In this brief section we discuss networks and methods for determining flows in networks. An example of a network is the system of one-way streets shown in Fig. 1.8. A typical problem associated with networks is estimating the flow of traffic through this network of streets. Another example is the electrical network shown in Fig. 1.9. A typical problem consists of determining the currents flowing through the loops of the circuit.

(Note: The network problems we discuss in this section are kept very simple so that the computational details do not obscure the ideas.)

Figure 1.8 A network of one-way streets

Figure 1.9 An electrical network


Flows in Networks

Networks consist of branches and nodes. For the street network shown in Fig. 1.8, the branches are the streets and the nodes are the intersections. We assume for a network that the total flow into a node is equal to the total flow out of the node. For example, Fig. 1.10 shows a flow of 40 into a node and a total flow of x1 + x2 + 5 out of the node. Since we assume that the flow into a node is equal to the flow out, it follows that the flows x1 and x2 must satisfy the linear equation 40 = x1 + x2 + 5, or equivalently,

x1 + x2 = 35.

As an example of network flow calculations, consider the system of one-way streets in Fig. 1.11, where the flow is given in vehicles per hour. For instance, x1 + x4 vehicles per hour enter node B, while x2 + 400 vehicles per hour leave.


Figure 1.10 Since we assume that the flow into a node is equal to the flow out, in this case, x1 + x2 = 35.


Figure 1.11 The traffic network analyzed in Example 1

Example 1

(a) Set up a system of equations that represents traffic flow for the network shown in Fig. 1.11. (The numbers give the average flows into and out of the network at peak traffic hours.)
(b) Solve the system of equations. What is the traffic flow if x6 = 300 and x7 = 1300 vehicles per hour?


Solution

(a) Since the flow into a node is equal to the flow out, we obtain the following system of equations:

800 = x1 + x5            (Node A)
x1 + x4 = 400 + x2       (Node B)
x2 = 600 + x3            (Node C)
1600 + x3 = 400 + x7     (Node D)
x7 = x4 + x6             (Node E)
x5 + x6 = 1000.          (Node F)

(b) The augmented matrix for the system above is

[1  0  0 0 1 0  0   800;
 1 −1  0 1 0 0  0   400;
 0  1 −1 0 0 0  0   600;
 0  0  1 0 0 0 −1 −1200;
 0  0  0 1 0 1 −1     0;
 0  0  0 0 1 1  0  1000].

Some calculations show that this matrix is row equivalent to

[1 0 0 0 0 −1  0  −200;
 0 1 0 0 0  0 −1  −600;
 0 0 1 0 0  0 −1 −1200;
 0 0 0 1 0  1 −1     0;
 0 0 0 0 1  1  0  1000;
 0 0 0 0 0  0  0     0].

Therefore, the solution is

x1 = x6 − 200
x2 = x7 − 600
x3 = x7 − 1200
x4 = x7 − x6
x5 = 1000 − x6.

If x6 = 300 and x7 = 1300, then (in vehicles per hour)

x1 = 100, x2 = 700, x3 = 100, x4 = 1000, x5 = 700.
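A MATLAB sketch of the computation in part (b), using the rref command from Section 1.2 (our own illustration):

A = [1  0  0 0 1 0  0   800;   % node A
     1 -1  0 1 0 0  0   400;   % node B
     0  1 -1 0 0 0  0   600;   % node C
     0  0  1 0 0 0 -1 -1200;   % node D
     0  0  0 1 0 1 -1     0;   % node E
     0  0  0 0 1 1  0  1000];  % node F
C = rref(A)                    % read off x1 = x6 - 200, x2 = x7 - 600, ...
x6 = 300;  x7 = 1300;
[x6-200, x7-600, x7-1200, x7-x6, 1000-x6]  % x1,...,x5 = 100 700 100 1000 700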

We normally want the flows in a network to be nonnegative. For instance, consider the traffic network in Fig. 1.11. If x6 were negative, it would indicate that traffic was flowing from F to E rather than in the prescribed direction from E to F.


Example 2 Consider the street network in Example 1 (see Fig. 1.11). Suppose that the streets from A to B and from B to C must be closed (that is, x1 = 0 and x2 = 0). How might the traffic be rerouted?

Solution By Example 1, the flows are

x1 = x6 − 200
x2 = x7 − 600
x3 = x7 − 1200
x4 = x7 − x6
x5 = 1000 − x6.

Therefore, if x1 = 0 and x2 = 0, it follows that x6 = 200 and x7 = 600. Using these values, we then obtain x3 = −600, x4 = 400, and x5 = 800. In order to have nonnegative flows, we must reverse directions on the street connecting C and D; this change makes x3 = 600 instead of −600. The network flows are shown in Fig. 1.12.


Figure 1.12 The traffic network analyzed in Example 2

Electrical Networks

We now consider current flow in simple electrical networks such as the one illustrated in Fig. 1.13. For such networks, current flow is governed by Ohm's law and Kirchhoff's laws, as follows.

Ohm's Law: The voltage drop across a resistor is the product of the current and the resistance.
Kirchhoff's First Law: The sum of the currents flowing into a node is equal to the sum of the currents flowing out.
Kirchhoff's Second Law: The algebraic sum of the voltage drops around a closed loop is equal to the total voltage in the loop.

(Note: With respect to Kirchhoff's second law, two basic closed loops in Fig. 1.13 are the counterclockwise paths BDCB and BCAB. Also, in each branch, we make a tentative assignment for the direction of current flow. If a current turns out to be negative, we then reverse our assignment for that branch.)

Figure 1.13 The electrical network analyzed in Example 3

Example 3 Determine the currents I1, I2, and I3 for the electrical network shown in Fig. 1.13.

Solution Applying Kirchhoff's second law to the loops BDCB and BCAB, we obtain the equations

−10I2 + 10I3 = 10    (BDCB)
20I1 + 10I2 = 5      (BCAB).

Applying Kirchhoff's first law to either of the nodes B or C, we find I1 = I2 + I3. Therefore,

I1 − I2 − I3 = 0.

The augmented matrix for this system of three equations is

[ 1  −1  −1  0;
  0 −10  10 10;
 20  10   0  5].

This matrix can be row reduced to

[1 0 0  0.4;
 0 1 0 −0.3;
 0 0 1  0.7].

Therefore, the currents are

I1 = 0.4, I2 = −0.3, I3 = 0.7.

Since I2 is negative, the current flow is from C to B rather than from B to C, as tentatively assigned in Fig. 1.13.
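Because this system is square with a unique solution, it can also be solved directly. A MATLAB sketch (our own illustration; the backslash operator solves a square linear system):

A = [ 0 -10  10;        % BDCB loop (Kirchhoff's second law)
     20  10   0;        % BCAB loop
      1  -1  -1];       % node B (Kirchhoff's first law)
b = [10; 5; 0];
I = A \ b               % I = [0.4; -0.3; 0.7]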


1.4 EXERCISES

In Exercises 1 and 2, (a) set up the system of equations that describes traffic flow; (b) determine the flows x1, x2, and x3 if x4 = 100; and (c) determine the maximum and minimum values for x4 if all the flows are constrained to be nonnegative.

1. [Traffic network diagram: branch flows x1, x2, x3, x4; external flows 200, 200, 200, 400, 400, 400, 600, 800]

2. [Traffic network diagram: branch flows x1, x2, x3, x4; external flows 200, 400, 400, 500, 600, 600, 600, 700]

In Exercises 3 and 4, find the flow of traffic in the rotary if x1 = 600.

3. [Rotary traffic diagram: branch flows x1, x2, x3, x4; external flows 200, 200, 400, 400]

4. [Rotary traffic diagram: branch flows x1, x2, x3, x4, x5, x6; external flows 200, 200, 300, 400, 400, 500]


In Exercises 5–8, determine the currents in the various branches.

5. [Circuit diagram: resistors of 4, 4, and 3 ohms; sources of 4 and 2 volts; branch currents I1, I2, I3]

6. [Circuit diagram: resistors of 2, 1, and 1 ohms; sources of 3 and 4 volts; branch currents I1, I2, I3]

7. [Circuit diagram: resistors of 4, 3, and 2 ohms; a 10-volt source]

8. [Circuit diagram: five 1-ohm resistors; sources of 5, 2, and 2 volts]

9. a) Set up the system of equations that describes the traffic flow in the accompanying figure.
   b) Show that the system is consistent if and only if a1 + b1 + c1 + d1 = a2 + b2 + c2 + d2.

[Traffic network diagram: branch flows x1, x2, x3, x4; boundary flows a1, a2, b1, b2, c1, c2, d1, d2]


10. The electrical network shown in the accompanying figure is called a Wheatstone bridge. In this bridge, R2 and R4 are known resistances and R3 is a known resistance that can be varied. The resistance R1 is unknown and is to be determined by using the bridge. The resistance R5 represents the internal resistance of a voltmeter attached between nodes B and D. The bridge is said to be balanced when R3 is adjusted so that there is no current flowing in the branch between B and D. Show that, when the bridge is balanced, R1R4 = R2R3. (In particular, the unknown resistance R1 can be found from R1 = R2R3/R4 when the bridge is balanced.)

[Wheatstone bridge diagram: nodes A, B, C, D; resistances R1, R2, R3, R4, R5; voltmeter V between B and D]

1.5 MATRIX OPERATIONS

In the previous sections, matrices were used as a convenient way of representing systems of equations. But matrices are of considerable interest and importance in their own right, and this section introduces the arithmetic operations that make them a useful computational and theoretical tool.

In this discussion of matrices and matrix operations (and later in the discussion of vectors), it is customary to refer to numerical quantities as scalars. For convenience we assume throughout this chapter that all matrix (and vector) entries are real numbers; hence the term scalar will mean a real number. In later chapters the term scalar will also be applied to complex numbers. We begin with a definition of the equality of two matrices.

Definition 5 Let A = (aij) be an (m × n) matrix, and let B = (bij) be an (r × s) matrix. We say that A and B are equal (and write A = B) if m = r, n = s, and aij = bij for all i and j, 1 ≤ i ≤ m, 1 ≤ j ≤ n.

Thus two matrices are equal if they have the same size and, moreover, if all their corresponding entries are equal. For example, no two of the matrices

A = [1 2; 3 4],   B = [2 1; 4 3],   and   C = [1 2 0; 3 4 0]

are equal.


Matrix Addition and Scalar Multiplication

The first two arithmetic operations, matrix addition and the multiplication of a matrix by a scalar, are defined quite naturally. In these definitions we use the notation (Q)ij to denote the ijth entry of a matrix Q.

Definition 6 Let A = (aij) and B = (bij) both be (m × n) matrices. The sum, A + B, is the (m × n) matrix defined by

(A + B)ij = aij + bij.

Note that this definition requires that A and B have the same size before their sum is defined. Thus if

A = [1 2 −1; 2 3 0],   B = [−3 1 2; 0 −4 1],   and   C = [1 2; 3 1],

then

A + B = [−2 3 1; 2 −1 1],

while A + C is undefined.

Definition 7 Let A = (aij) be an (m × n) matrix, and let r be a scalar. The product, rA, is the (m × n) matrix defined by

(rA)ij = raij.

For example,

2 [1 3; 2 −1; 0 3] = [2 6; 4 −2; 0 6].

Example 1 Let the matrices A, B, and C be given by

A = [1 3; −2 7], B = [6 1; 2 4], and C = [1 2 −1; 3 0 5].

Find each of A + B, A + C, B + C, 3C, and A + 2B, or state that the indicated operation is undefined.


Solution The defined operations yield

A + B = [7 4; 0 11], 3C = [3 6 −3; 9 0 15], and A + 2B = [13 5; 2 15],

while A + C and B + C are undefined.
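For readers who want to experiment on a computer, the entrywise operations of Example 1 can be checked with a few lines of Python using NumPy (an illustrative sketch only; the text's own computing exercises use MATLAB):

import numpy as np

A = np.array([[1, 3], [-2, 7]])
B = np.array([[6, 1], [2, 4]])
C = np.array([[1, 2, -1], [3, 0, 5]])

print(A + B)      # [[ 7  4] [ 0 11]]
print(3 * C)      # [[ 3  6 -3] [ 9  0 15]]
print(A + 2 * B)  # [[13  5] [ 2 15]]
# A + C raises a ValueError, since a (2 x 2) and a (2 x 3) matrix cannot be added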

Vectors in Rn

Before proceeding with the definition of matrix multiplication, recall that a point in n-dimensional space is represented by an ordered n-tuple of real numbers x = (x1, x2, . . . , xn). Such an n-tuple will be called an n-dimensional vector and will be written in the form of a matrix,

x = [x1; x2; . . . ; xn].

For example, an arbitrary three-dimensional vector has the form

x = [x1; x2; x3],

and the vectors

x = [1; 2; 3], y = [3; 2; 1], and z = [2; 3; 1]

are distinct three-dimensional vectors. The set of all n-dimensional vectors with real components is called Euclidean n-space and will be denoted by Rn. Vectors in Rn will be denoted by boldface type. Thus Rn is the set defined by

Rn = {x : x = [x1; x2; . . . ; xn], where x1, x2, . . . , xn are real numbers}.

As the notation suggests, an element of Rn can be viewed as an (n × 1) real matrix, and conversely an (n × 1) real matrix can be considered an element of Rn. Thus addition and scalar multiplication of vectors is just a special case of these operations for matrices.

Vector Form of the General Solution

Having defined addition and scalar multiplication for vectors and matrices, we can use these operations to derive a compact expression for the general solution of a consistent system of linear equations. We call this expression the vector form for the general solution.


The idea of the vector form for the general solution is straightforward and is best explained by a few examples.

Example 2 The matrix B is the augmented matrix for a homogeneous system of linear equations. Find the general solution for the linear system and express the general solution in terms of vectors, where

B = [1 0 −1 −3 0; 0 1 2 1 0].

Solution Since B is in reduced echelon form, it is easy to write the general solution:

x1 = x3 + 3x4, x2 = −2x3 − x4.

In vector form, therefore, the general solution can be expressed as

x = [x1; x2; x3; x4] = [x3 + 3x4; −2x3 − x4; x3; x4]
  = [x3; −2x3; x3; 0] + [3x4; −x4; 0; x4]
  = x3 [1; −2; 1; 0] + x4 [3; −1; 0; 1].

This last expression is called the vector form for the general solution.

In general, the vector form for the general solution of a homogeneous system consists of a sum of well-determined vectors multiplied by the free variables. Such expressions are called "linear combinations" and we will use this concept of a linear combination extensively, beginning in Section 1.7. The next example illustrates the vector form for the general solution of a nonhomogeneous system.

Example 3 Let B denote the augmented matrix for a system of linear equations

B = [1 −2 0 0 2 3; 0 0 1 0 −1 2; 0 0 0 1 3 −4].

Find the vector form for the general solution of the linear system.

Solution Since B is in reduced echelon form, we readily find the general solution:

x1 = 3+ 2x2 − 2x5, x3 = 2+ x5, x4 = −4− 3x5.


Expressing the general solution in vector form, we obtain

x = [x1; x2; x3; x4; x5] = [3 + 2x2 − 2x5; x2; 2 + x5; −4 − 3x5; x5]
  = [3; 0; 2; −4; 0] + [2x2; x2; 0; 0; 0] + [−2x5; 0; x5; −3x5; x5]
  = [3; 0; 2; −4; 0] + x2 [2; 1; 0; 0; 0] + x5 [−2; 0; 1; −3; 1].

Thus, the general solution has the form x = b + x2u + x5v, where b, u, and v are fixed vectors in R5.
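The vector form lends itself to a quick machine check; the following Python/NumPy sketch (illustrative only, with arrays simply restating Example 3) verifies that every choice of the free variables x2 and x5 produces a solution of the original system:

import numpy as np

# Coefficient part and right-hand side read off from the augmented matrix B above
A = np.array([[1, -2, 0, 0, 2],
              [0,  0, 1, 0, -1],
              [0,  0, 0, 1, 3]])
rhs = np.array([3, 2, -4])

b = np.array([3, 0, 2, -4, 0])
u = np.array([2, 1, 0, 0, 0])
v = np.array([-2, 0, 1, -3, 1])

for x2, x5 in [(0, 0), (1, -2), (3.5, 7.0)]:
    x = b + x2 * u + x5 * v
    assert np.allclose(A @ x, rhs)   # holds for every choice of x2 and x5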

Scalar Product

In vector calculus, the scalar product (or dot product) of two vectors

u = [u1; u2; . . . ; un] and v = [v1; v2; . . . ; vn]

in Rn is defined to be the number u1v1 + u2v2 + · · · + unvn = ∑_{i=1}^{n} ui vi. For example, if

u = [2; 3; −1] and v = [−4; 2; 3],

then the scalar product of u and v is 2(−4) + 3(2) + (−1)(3) = −5. The scalar product of two vectors will be considered further in the following section, and in Chapter 3 the properties of Rn will be more fully developed.
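In Python/NumPy (a one-line illustration, not part of the text), the same scalar product is:

import numpy as np

u = np.array([2, 3, -1])
v = np.array([-4, 2, 3])
print(u @ v)   # -5, the sum 2(-4) + 3(2) + (-1)(3)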

Matrix Multiplication

Matrix multiplication is defined in such a way as to provide a convenient mechanism for describing a linear correspondence between vectors. To illustrate, let the variables x1, x2, . . . , xn and the variables y1, y2, . . . , ym be related by the linear equations

a11x1 + a12x2 + · · · + a1nxn = y1
a21x1 + a22x2 + · · · + a2nxn = y2
...
am1x1 + am2x2 + · · · + amnxn = ym.    (1)


If we set

x = [x1; x2; . . . ; xn] and y = [y1; y2; . . . ; ym],

then (1) defines a correspondence x → y from vectors in Rn to vectors in Rm. The ith equation of (1) is

ai1x1 + ai2x2 + · · · + ainxn = yi,

and this can be written in a briefer form as

∑_{j=1}^{n} aij xj = yi.    (2)

If A is the coefficient matrix of system (1),

A = [a11 a12 · · · a1n; a21 a22 · · · a2n; . . . ; am1 am2 · · · amn],

then the left-hand side of Eq. (2) is precisely the scalar product of the ith row of A with the vector x. Thus if we define the product of A and x to be the (m × 1) vector Ax whose ith component is the scalar product of the ith row of A with x, then Ax is given by

Ax = [∑_{j=1}^{n} a1j xj; ∑_{j=1}^{n} a2j xj; . . . ; ∑_{j=1}^{n} amj xj].

Using the definition of equality (Definition 5), we see that the simple matrix equation

Ax = y (3)

is equivalent to system (1).

In a natural fashion, we can extend the idea of the product of a matrix and a vector to the product, AB, of an (m × n) matrix A and an (n × s) matrix B by defining the ijth entry of AB to be the scalar product of the ith row of A with the jth column of B. Formally, we have the following definition.


Definition 8 Let A = (aij) be an (m × n) matrix, and let B = (bij) be an (r × s) matrix. If n = r, then the product AB is the (m × s) matrix defined by

(AB)ij = ∑_{k=1}^{n} aik bkj.

If n ≠ r, then the product AB is not defined.

The definition can be visualized by referring to Fig. 1.14.

[Figure 1.14: schematic of an (m × n) matrix times an (n × s) matrix giving an (m × s) matrix. The ijth entry of AB is the scalar product of the ith row of A and the jth column of B.]

Thus the product AB is defined only when the inside dimensions of A and B are equal. In this case the outside dimensions, m and s, give the size of AB. Furthermore, the ijth entry of AB is the scalar product of the ith row of A with the jth column of B. For example,

[2 1 −3; −2 2 4] [−1 2; 0 −3; 2 1] = [2(−1) + 1(0) + (−3)(2)  2(2) + 1(−3) + (−3)(1); (−2)(−1) + 2(0) + 4(2)  (−2)(2) + 2(−3) + 4(1)] = [−8 −2; 10 −6],

whereas the product

[2 1 −3; −2 2 4] [−1 0 2; 2 −3 1]

is undefined.


Example 4 Let the matrices A, B, C, and D be given by

A = [1 2; 2 3], B = [−3 2; 1 −2],

C = [1 0 −2; 0 1 1], and D = [3 1; −1 −2; 1 1].

Find each of AB, BA, AC, CA, CD, and DC, or state that the indicated product is undefined.

Solution The definition of matrix multiplication yields

AB = [−1 −2; −3 −2], BA = [1 0; −3 −4], and AC = [1 2 0; 2 3 −1].

The product CA is undefined, and

CD = [1 −1; 0 −1] and DC = [3 1 −5; −1 −2 0; 1 1 −1].

Example 4 illustrates that matrix multiplication is not commutative; that is, normally AB and BA are different matrices. Indeed, the product AB may be defined while the product BA is undefined, or both may be defined but have different dimensions. Even when AB and BA have the same size, they usually are not equal.
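The failure of commutativity is easy to see numerically; a short Python/NumPy sketch (illustrative only) using the matrices of Example 4:

import numpy as np

A = np.array([[1, 2], [2, 3]])
B = np.array([[-3, 2], [1, -2]])
print(A @ B)   # [[-1 -2] [-3 -2]]
print(B @ A)   # [[ 1  0] [-3 -4]]  -- AB and BA are different matrices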

Example 5 Express each of the linear systems

x1 = 2y1 − y2
x2 = −3y1 + 2y2
x3 = y1 + 3y2

and

y1 = −4z1 + 2z2
y2 = 3z1 + z2

as a matrix equation and use matrix multiplication to express x1, x2, and x3 in terms of z1 and z2.

Solution We have

[x1; x2; x3] = [2 −1; −3 2; 1 3] [y1; y2] and [y1; y2] = [−4 2; 3 1] [z1; z2].

Substituting for [y1; y2] in the left-hand equation gives

[x1; x2; x3] = [2 −1; −3 2; 1 3] [−4 2; 3 1] [z1; z2] = [−11 3; 18 −4; 5 5] [z1; z2].


Therefore,

x1 = −11z1 + 3z2

x2 = 18z1 − 4z2

x3 = 5z1 + 5z2.
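Example 5 is the computational heart of matrix multiplication: composing two linear correspondences multiplies their matrices. A small Python/NumPy sketch (illustrative only):

import numpy as np

M = np.array([[2, -1], [-3, 2], [1, 3]])   # x = M y
N = np.array([[-4, 2], [3, 1]])            # y = N z
print(M @ N)   # [[-11   3] [ 18  -4] [  5   5]], the matrix expressing x in terms of z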

The use of the matrix equation (3) to represent the linear system (1) provides a convenient notational device for representing the (m × n) system

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm    (4)

of linear equations with unknowns x1, . . . , xn. Specifically, if A = (aij) is the coefficient matrix of (4), and if the unknown (n × 1) matrix x and the constant (m × 1) matrix b are defined by

x = [x1; x2; . . . ; xn] and b = [b1; b2; . . . ; bm],

then the system (4) is equivalent to the matrix equation

Ax = b. (5)

Example 6 Solve the matrix equation Ax = b, where

A = [1 3 −1; 2 5 −1; 2 8 −2], x = [x1; x2; x3], and b = [2; 6; 6].

Solution The matrix equation Ax = b is equivalent to the (3 × 3) linear system

x1 + 3x2 − x3 = 2
2x1 + 5x2 − x3 = 6
2x1 + 8x2 − 2x3 = 6.

This system can be solved in the usual way—that is, by reducing the augmented matrix—to obtain x1 = 2, x2 = 1, x3 = 3. Therefore,

s = [2; 1; 3]

is the unique solution to Ax = b.
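On a computer, the system of Example 6 can be solved directly; a Python/NumPy sketch (illustrative only):

import numpy as np

A = np.array([[1, 3, -1], [2, 5, -1], [2, 8, -2]])
b = np.array([2, 6, 6])
print(np.linalg.solve(A, b))   # [2. 1. 3.], the unique solution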


Other Formulations of Matrix Multiplication

It is frequently convenient and useful to express an (m × n) matrix A = (aij) in the form

A = [A1,A2, . . . ,An],    (6)

where for each j, 1 ≤ j ≤ n, Aj denotes the jth column of A. That is, Aj is the (m × 1) column vector

Aj = [a1j; a2j; . . . ; amj].

For example, if A is the (2× 3) matrix

A = [1 3 6; 2 4 0],    (7)

then A = [A1,A2,A3], where

A1 = [1; 2], A2 = [3; 4], and A3 = [6; 0].

The next two theorems use Eq. (6) to provide alternative ways of expressing the matrix products Ax and AB; these methods will be extremely useful in our later development of matrix theory.

Theorem 5 Let A = [A1,A2, . . . ,An] be an (m × n) matrix whose jth column is Aj, and let x be the (n × 1) column vector

x = [x1; x2; . . . ; xn].

Then the product Ax can be expressed as

Ax = x1A1 + x2A2 + · · · + xnAn.

The proof of this theorem is not difficult and uses only Definitions 5, 6, 7, and 8; the proof is left as an exercise for the reader. To illustrate Theorem 5, let A be the matrix

A = [1 3 6; 2 4 0],

and let x be the vector in R3,

x = [x1; x2; x3].


Then

Ax = [1 3 6; 2 4 0] [x1; x2; x3] = [x1 + 3x2 + 6x3; 2x1 + 4x2 + 0x3]
   = x1 [1; 2] + x2 [3; 4] + x3 [6; 0];

so that Ax = x1A1 + x2A2 + x3A3. In particular, if we set

x = [2; 2; −3],

then Ax = 2A1 + 2A2 − 3A3.

From Theorem 5, we see that the matrix equation Ax = b corresponding to the (m × n) system (4) can be expressed as

x1A1 + x2A2 + · · · + xnAn = b. (8)

Thus, Eq. (8) says that solving Ax = b amounts to showing that b can be written in terms of the columns of A.
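Theorem 5 can be spot-checked numerically; the following Python/NumPy sketch (illustrative only) confirms that Ax equals the corresponding linear combination of the columns of A:

import numpy as np

A = np.array([[1, 3, 6], [2, 4, 0]])
x = np.array([2, 2, -3])
lhs = A @ x
rhs = 2 * A[:, 0] + 2 * A[:, 1] - 3 * A[:, 2]   # x1*A1 + x2*A2 + x3*A3
print(lhs, rhs)   # both equal [-10  12]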

Example 7 Solve

x1 [1; 2; 2] + x2 [3; 5; 8] + x3 [−1; −1; −2] = [2; 6; 6].

Solution By Theorem 5, the given equation is equivalent to the matrix equation Ax = b, where

A = [1 3 −1; 2 5 −1; 2 8 −2], x = [x1; x2; x3], and b = [2; 6; 6].

This equation was solved in Example 6 giving x1 = 2, x2 = 1, x3 = 3, so we have

2 [1; 2; 2] + [3; 5; 8] + 3 [−1; −1; −2] = [2; 6; 6].

Although Eq. (8) is not particularly efficient as a computational tool, it is useful for understanding how the internal structure of the coefficient matrix affects the possible solutions of the linear system Ax = b.


Another important observation, which we will use later, is an alternative way of expressing the product of two matrices, as given in Theorem 6.

Theorem 6 Let A be an (m × n) matrix, and let B = [B1,B2, . . . ,Bs] be an (n × s) matrix whose kth column is Bk. Then the jth column of AB is ABj, so that

AB = [AB1, AB2, . . . , ABs].

Proof If A = (aij) and B = (bij), then the jth column of AB contains the entries

∑_{k=1}^{n} a1k bkj, ∑_{k=1}^{n} a2k bkj, . . . , ∑_{k=1}^{n} amk bkj;

and these are precisely the components of the column vector ABj, where

Bj = [b1j; b2j; . . . ; bnj].

It follows that we can write AB in the form AB = [AB1, AB2, . . . , ABs].

To illustrate Theorem 6, let A and B be given by

A = [2 6; 0 4; 1 2] and B = [1 3 0 1; 4 5 2 3].

Thus the column vectors for B are

B1 = [1; 4], B2 = [3; 5], B3 = [0; 2], and B4 = [1; 3]

and

AB1 = [26; 16; 9], AB2 = [36; 20; 13], AB3 = [12; 8; 4], and AB4 = [20; 12; 7].

Calculating AB, we see immediately that AB is a (3 × 4) matrix with columns AB1, AB2, AB3, and AB4; that is,

AB = [26 36 12 20; 16 20 8 12; 9 13 4 7].
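Theorem 6 says each column of AB is A times the corresponding column of B, which is also easy to verify numerically (a Python/NumPy sketch, illustrative only):

import numpy as np

A = np.array([[2, 6], [0, 4], [1, 2]])
B = np.array([[1, 3, 0, 1], [4, 5, 2, 3]])
AB = A @ B
for j in range(B.shape[1]):
    assert np.array_equal(AB[:, j], A @ B[:, j])   # jth column of AB is A times Bj
print(AB)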


1.5 EXERCISES

The (2 × 2) matrices listed in Eq. (9) are used in several of the exercises that follow.

A = [2 1; 1 3], B = [0 −1; 1 3],

C = [−2 3; 1 1], Z = [0 0; 0 0]    (9)

Exercises 1–6 refer to the matrices in Eq. (9).
1. Find (a) A + B; (b) A + C; (c) 6B; and (d) B + 3C.
2. Find (a) B + C; (b) 3A; (c) A + 2C; and (d) C + 8Z.
3. Find a matrix D such that A + D = B.
4. Find a matrix D such that A + 2D = C.
5. Find a matrix D such that A + 2B + 2D = 3B.
6. Find a matrix D such that 2A + 5B + D = 2B + 3A.

The vectors listed in Eq. (10) are used in several of the exercises that follow.

r = [1; 0], s = [2; −3],

t = [1; 4], u = [−4; 6]    (10)

In Exercises 7–12, perform the indicated computation, using the vectors in Eq. (10) and the matrices in Eq. (9).
7. a) r + s   b) 2r + t   c) 2s + u
8. a) t + s   b) r + 3u   c) 2u + 3t
9. a) Ar   b) Br   c) C(s + 3t)
10. a) Bt   b) C(r + s)   c) B(r + s)
11. a) (A + 2B)r   b) (B + C)u
12. a) (A + C)r   b) (2B + 3C)s

Exercises 13–20 refer to the vectors in Eq. (10). In each exercise, find scalars a1 and a2 that satisfy the given equation, or state that the equation has no solution.
13. a1r + a2s = t   14. a1r + a2s = u
15. a1s + a2t = u   16. a1s + a2t = r + t
17. a1s + a2u = 2r + t   18. a1s + a2u = t
19. a1t + a2u = 3s + 4t   20. a1t + a2u = 3r + 2s

Exercises 21–24 refer to the matrices in Eq. (9) and the vectors in Eq. (10).
21. Find w2, where w1 = Br and w2 = Aw1. Calculate Q = AB. Calculate Qr and verify that w2 is equal to Qr.
22. Find w2, where w1 = Cs and w2 = Aw1. Calculate Q = AC. Calculate Qs and verify that w2 is equal to Qs.
23. Find w3, where w1 = Cr, w2 = Bw1, and w3 = Aw2. Calculate Q = A(BC) and verify that w3 is equal to Qr.
24. Find w3, where w1 = Ar, w2 = Cw1, and w3 = Bw2. Calculate Q = B(CA) and verify that w3 is equal to Qr.

Exercises 25–30 refer to the matrices in Eq. (9). Find each of the following.
25. (A + B)C   26. (A + 2B)A
27. (A + C)B   28. (B + C)Z
29. A(BZ)   30. Z(AB)

The matrices and vectors listed in Eq. (11) are used in several of the exercises that follow.

A = [2 3; 1 4], B = [1 2; 1 4], u = [1; 3],

v = [2, 4], C = [2 1; 4 0; 8 −1; 3 2],    (11)

D = [2 1 3 6; 2 0 0 4; 1 −1 1 −1; 1 3 1 2], w = [2; 3; 1; 1].

Exercises 31–41 refer to the matrices and vectors in Eq. (11). Find each of the following.
31. AB and BA   32. DC
33. Au and vA   34. uv and vu
35. v(Bu)   36. Bu
37. CA   38. CB
39. C(Bu)   40. (AB)u and A(Bu)
41. (BA)u and B(Au)


In Exercises 42–49, the given matrix is the augmented matrix for a system of linear equations. Give the vector form for the general solution.

42. [1 0 −1 −2 0; 0 1 2 3 0]

43. [1 0 −1 −2; 0 1 2 3]

44. [1 0 −1 0 −1; 0 1 2 0 1; 0 0 0 1 1]

45. [1 0 −1 0 −1 0; 0 1 2 0 1 0; 0 0 0 1 1 0]

46. [1 0 −1 −2 −3 1; 0 1 2 3 4 0]

47. [1 0 −1 −2 −3 0; 0 1 2 3 4 0]

48. [1 0 −1 0 −1 −2 0; 0 1 2 0 1 2 0; 0 0 0 1 1 1 0]

49. [1 −1 0 −2 0 0; 0 0 1 2 0 0; 0 0 0 0 1 0]

50. In Exercise 40, the calculations (AB)u and A(Bu) produce the same result. Which calculation requires fewer multiplications of individual matrix entries? (For example, it takes two multiplications to get the (1, 1) entry of AB.)

51. The next section will show that all the following calculations produce the same result: C[A(Bu)] = (CA)(Bu) = [C(AB)]u = C[(AB)u]. Convince yourself that the first expression requires the fewest individual multiplications. [Hint: Forming Bu takes four multiplications, and thus A(Bu) takes eight multiplications, and so on.] Count the number of multiplications required for each of the four preceding calculations.

52. Refer to the matrices and vectors in Eq. (11).
a) Identify the column vectors in A = [A1,A2] and D = [D1,D2,D3,D4].
b) In part (a), is A1 in R2, R3, or R4? Is D1 in R2, R3, or R4?
c) Form the (2 × 2) matrix with columns [AB1, AB2], and verify that this matrix is the product AB.
d) Verify that the vector Dw is the same as 2D1 + 3D2 + D3 + D4.

53. Determine whether the following matrix products are defined. When the product is defined, give the size of the product.
a) AB and BA, where A is (2 × 3) and B is (3 × 4)
b) AB and BA, where A is (2 × 3) and B is (2 × 4)
c) AB and BA, where A is (3 × 7) and B is (6 × 3)
d) AB and BA, where A is (2 × 3) and B is (3 × 2)
e) AB and BA, where A is (3 × 3) and B is (3 × 1)
f) A(BC) and (AB)C, where A is (2 × 3), B is (3 × 5), and C is (5 × 4)
g) AB and BA, where A is (4 × 1) and B is (1 × 4)

54. What is the size of the product (AB)(CD), where A is (2 × 3), B is (3 × 4), C is (4 × 4), and D is (4 × 2)? Also calculate the size of A[B(CD)] and [(AB)C]D.

55. If A is a matrix, what should the symbol A² mean? What restrictions on A are required in order that A² be defined?

56. Set O = [0 0; 0 0], A = [2 0; 0 2], and B = [1 b; b⁻¹ 1], where b ≠ 0. Show that O, A, and B are solutions to the matrix equation X² − 2X = O. Conclude that this quadratic equation has infinitely many solutions.

57. Two newspapers compete for subscriptions in a region with 300,000 households. Assume that no household subscribes to both newspapers and that the following table gives the probabilities that a household will change its subscription status during the year.


          From A   From B   From None
To A        .70      .15       .30
To B        .20      .80       .20
To None     .10      .05       .50

For example, an interpretation of the first column of the table is that during a given year, newspaper A can expect to keep 70% of its current subscribers while losing 20% to newspaper B and 10% to no subscription.

At the beginning of a particular year, suppose that 150,000 households subscribe to newspaper A, 100,000 subscribe to newspaper B, and 50,000 have no subscription. Let P and x be defined by

P = [.70 .15 .30; .20 .80 .20; .10 .05 .50] and x = [150,000; 100,000; 50,000].

The vector x is called the state vector for the beginning of the year. Calculate Px and P²x and interpret the resulting vectors.

58. Let A = [1 2; 3 4].
a) Find all matrices B = [a b; c d] such that AB = BA.
b) Use the results of part (a) to exhibit (2 × 2) matrices B and C such that AB = BA and AC ≠ CA.

59. Let A and B be matrices such that the product AB is defined and is a square matrix. Argue that the product BA is also defined and is a square matrix.

60. Let A and B be matrices such that the product AB is defined. Use Theorem 6 to prove each of the following.
a) If B has a column of zeros, then so does AB.
b) If B has two identical columns, then so does AB.

61. a) Express each of the linear systems i) and ii) in the form Ax = b.
i) 2x1 − x2 = 3
   x1 + x2 = 3
ii) x1 − 3x2 + x3 = 1
    x1 − 2x2 + x3 = 2
    x2 − x3 = −1
b) Express systems i) and ii) in the form of Eq. (8).
c) Solve systems i) and ii) by Gaussian elimination. For each system Ax = b, represent b as a linear combination of the columns of the coefficient matrix.

62. Solve Ax = b, where A and b are given by

A = [1 1; 1 2], b = [2; 3].

63. Let A and I be the matrices

A = [1 1; 1 2], I = [1 0; 0 1].

a) Find a (2 × 2) matrix B such that AB = I. [Hint: Use Theorem 6 to determine the column vectors of B.]
b) Show that AB = BA for the matrix B found in part (a).

64. Prove Theorem 5 by showing that the ith component of Ax is equal to the ith component of x1A1 + x2A2 + · · · + xnAn, where 1 ≤ i ≤ m.

65. For A and C, which follow, find a matrix B (if possible) such that AB = C.
a) A = [1 3; 1 4], C = [2 6; 3 6]
b) A = [1 1 1; 0 2 1; 2 4 3], C = [1 0 0; 1 2 0; 1 3 5]
c) A = [1 2; 2 4], C = [0 0; 0 0], where B ≠ C.

66. A (3 × 3) matrix T = (tij) is called an upper-triangular matrix if T has the form

T = [t11 t12 t13; 0 t22 t23; 0 0 t33].

Formally, T is upper triangular if tij = 0 whenever i > j. If A and B are upper-triangular (3 × 3) matrices, verify that the product AB is also upper triangular.

67. An (n × n) matrix T = (tij) is called upper triangular if tij = 0 whenever i > j. Suppose that A and B are (n × n) upper-triangular matrices. Use Definition 8 to prove that the product AB is upper triangular. That is, show that the ijth entry of AB is zero when i > j.


In Exercises 68–70, find the vector form for the general solution.

68. x1 + 3x2 − 3x3 + 2x4 − 3x5 = −4
    3x1 + 9x2 − 10x3 + 10x4 − 14x5 = 2
    2x1 + 6x2 − 10x3 + 21x4 − 25x5 = 53

69. 14x1 − 8x2 + 3x3 − 49x4 + 29x5 = 44
    −8x1 + 5x2 − 2x3 + 29x4 − 16x5 = −24
    3x1 − 2x2 + x3 − 11x4 + 6x5 = 9

70. 18x1 + 18x2 − 10x3 + 7x4 + 2x5 + 50x6 = 26
    −10x1 − 10x2 + 6x3 − 4x4 − x5 − 27x6 = −13
    7x1 + 7x2 − 4x3 + 5x4 + 2x5 + 30x6 = 18
    2x1 + 2x2 − x3 + 2x4 + x5 + 12x6 = 8

71. In Exercise 57 we saw that the state vector giving the number of newspaper subscribers in year n could be found by forming Pⁿx, where x is the initial state. Later, in Section 3.8, we will see that as n grows larger and larger, the vector Pⁿx tends toward a limit. Use MATLAB to calculate Pⁿx for n = 1, 2, . . . , 30. For ease of reading, display the results using bank format in the MATLAB numeric options menu. What do you think the steady state distribution of newspapers will be?

1.6 ALGEBRAIC PROPERTIES OF MATRIX OPERATIONS

In the previous section we defined the matrix operations of addition, multiplication, and the multiplication of a matrix by a scalar. For these operations to be useful, the basic rules they obey must be determined. As we will presently see, many of the familiar algebraic properties of real numbers also hold for matrices. There are, however, important exceptions. We have already noted, for example, that matrix multiplication is not commutative. Another property of real numbers that does not carry over to matrices is the cancellation law for multiplication. That is, if a, b, and c are real numbers such that ab = ac and a ≠ 0, then b = c. By contrast, consider the three matrices

A = [1 1; 1 1], B = [1 4; 2 1], and C = [2 2; 1 3].

Note that AB = AC but B ≠ C. This example shows that the familiar cancellation law for real numbers does not apply to matrix multiplication.
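A few lines of Python/NumPy (illustrative only) make the failure of cancellation concrete:

import numpy as np

A = np.array([[1, 1], [1, 1]])
B = np.array([[1, 4], [2, 1]])
C = np.array([[2, 2], [1, 3]])
print(np.array_equal(A @ B, A @ C))   # True: AB = AC
print(np.array_equal(B, C))           # False: yet B is not equal to C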

Properties of Matrix Operations

The next three theorems list algebraic properties that do hold for matrix operations. In some cases, although the rule seems obvious and the proof simple, certain subtleties should be noted. For example, Theorem 9 asserts that (r + s)A = rA + sA, where r and s are scalars and A is an (m × n) matrix. Although the same addition symbol, +, appears on both sides of the equation, two different addition operations are involved; r + s is the sum of two scalars, and rA + sA is the sum of two matrices.

Our first theorem lists some of the properties satisfied by matrix addition.

Theorem 7 If A, B, and C are (m × n) matrices, then the following are true:
1. A + B = B + A.
2. (A + B) + C = A + (B + C).
3. There exists a unique (m × n) matrix O (called the zero matrix) such that A + O = A for every (m × n) matrix A.
4. Given an (m × n) matrix A, there exists a unique (m × n) matrix P such that A + P = O.


These properties are easily established, and the proofs of 2–4 are left as exercises. Regarding properties 3 and 4, we note that the zero matrix, O, is the (m × n) matrix, all of whose entries are zero. Also the matrix P of property 4 is usually called the additive inverse for A, and the reader can show that P = (−1)A. The matrix (−1)A is also denoted as −A, and the notation A − B means A + (−B). Thus property 4 states that A − A = O.

Proof of Property 1 If A = (aij) and B = (bij) are (m × n) matrices, then, by Definition 6,

(A + B)ij = aij + bij.

Similarly, by Definition 6,

(B + A)ij = bij + aij.

Since addition of real numbers is commutative, aij + bij and bij + aij are equal. Therefore, A + B = B + A.

Three associative properties involving scalar and matrix multiplication are given in Theorem 8.

Theorem 8
1. If A, B, and C are (m × n), (n × p), and (p × q) matrices, respectively, then (AB)C = A(BC).
2. If r and s are scalars, then r(sA) = (rs)A.
3. r(AB) = (rA)B = A(rB).

The proof is again left to the reader, but we will give one example to illustrate the theorem.

Example 1 Demonstrate that (AB)C = A(BC), where

A = [1 2; −1 3], B = [2 −1 3; 1 −1 1], and C = [3 1 2; −2 1 −1; 4 −2 −1].

Solution Forming the products AB and BC yields

AB = [4 −3 5; 1 −2 0] and BC = [20 −5 2; 9 −2 2].

Therefore, (AB)C is the product of a (2 × 3) matrix with a (3 × 3) matrix, whereas A(BC) is the product of a (2 × 2) matrix with a (2 × 3) matrix. Forming these products, we find

(AB)C = [38 −9 6; 7 −1 4] and A(BC) = [38 −9 6; 7 −1 4].
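The same check can be run numerically (a Python/NumPy sketch, illustrative only):

import numpy as np

A = np.array([[1, 2], [-1, 3]])
B = np.array([[2, -1, 3], [1, -1, 1]])
C = np.array([[3, 1, 2], [-2, 1, -1], [4, -2, -1]])
print(np.array_equal((A @ B) @ C, A @ (B @ C)))   # True: (AB)C = A(BC)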

Finally, the distributive properties connecting addition and multiplication are given in Theorem 9.


Theorem 9
1. If A and B are (m × n) matrices and C is an (n × p) matrix, then (A + B)C = AC + BC.
2. If A is an (m × n) matrix and B and C are (n × p) matrices, then A(B + C) = AB + AC.
3. If r and s are scalars and A is an (m × n) matrix, then (r + s)A = rA + sA.
4. If r is a scalar and A and B are (m × n) matrices, then r(A + B) = rA + rB.

Proof We will prove property 1 and leave the others to the reader. First observe that (A + B)C and AC + BC are both (m × p) matrices. To show that the components of these two matrices are equal, let Q = A + B, where Q = (qij). Then (A + B)C = QC, and the rsth component of QC is given by

∑_{k=1}^{n} qrk cks = ∑_{k=1}^{n} (ark + brk) cks = ∑_{k=1}^{n} ark cks + ∑_{k=1}^{n} brk cks.

Because

∑_{k=1}^{n} ark cks + ∑_{k=1}^{n} brk cks

is precisely the rsth entry of AC + BC, it follows that (A + B)C = AC + BC.

The Transpose of a Matrix

The concept of the transpose of a matrix is important in applications. Stated informally, the transpose operation, applied to a matrix A, interchanges the rows and columns of A. The formal definition of transpose is as follows.

Definition 9 If A = (aij) is an (m × n) matrix, then the transpose of A, denoted AT, is the (n × m) matrix AT = (bij), where bij = aji for all i and j, 1 ≤ j ≤ m, and 1 ≤ i ≤ n.

The following example illustrates the definition of the transpose of a matrix:

Example 2 Find the transpose of A = [1 3 7; 2 1 4].

Solution By Definition 9, AT is the (3 × 2) matrix

AT = [1 2; 3 1; 7 4].
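In Python/NumPy (illustrative only), the transpose of Example 2 is immediate:

import numpy as np

A = np.array([[1, 3, 7], [2, 1, 4]])
print(A.T)   # [[1 2] [3 1] [7 4]]: rows of A become columns of A transpose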


In the preceding example, note that the first row of A becomes the first column of AT, and the second row of A becomes the second column of AT. Similarly, the columns of A become the rows of AT. Thus AT is obtained by interchanging the rows and columns of A.

Three important properties of the transpose are given in Theorem 10.

Theorem 10 If A and B are (m × n) matrices and C is an (n × p) matrix, then:
1. (A + B)T = AT + BT.
2. (AC)T = CTAT.
3. (AT)T = A.

Proof We will leave properties 1 and 3 to the reader and prove property 2. Note first that (AC)T and CTAT are both (p × m) matrices, so we have only to show that their corresponding entries are equal. From Definition 9, the ijth entry of (AC)T is the jith entry of AC. Thus the ijth entry of (AC)T is given by

∑_{k=1}^{n} ajk cki.

Next the ijth entry of CTAT is the scalar product of the ith row of CT with the jth column of AT. In particular, the ith row of CT is [c1i, c2i, . . . , cni] (the ith column of C), whereas the jth column of AT is

[aj1; aj2; . . . ; ajn]

(the jth row of A). Therefore, the ijth entry of CTAT is given by

c1i aj1 + c2i aj2 + · · · + cni ajn = ∑_{k=1}^{n} cki ajk.

Finally, since

∑_{k=1}^{n} cki ajk = ∑_{k=1}^{n} ajk cki,

the ijth entries of (AC)T and CTAT agree, and the matrices are equal.

The transpose operation is used to define certain important types of matrices, such as positive-definite matrices, normal matrices, and symmetric matrices. We will consider these in detail later and give only the definition of a symmetric matrix in this section.

Definition 10 A matrix A is symmetric if A = AT .


If A is an (m × n) matrix, then AT is an (n × m) matrix, so we can have A = AT only if m = n. An (n × n) matrix is called a square matrix; thus if a matrix is symmetric, it must be a square matrix. Furthermore, Definition 9 implies that if A = (aij) is an (n × n) symmetric matrix, then aij = aji for all i and j, 1 ≤ i, j ≤ n. Conversely, if A is square and aij = aji for all i and j, then A is symmetric.

Example 3 Determine which of the matrices

A = [1 2; 2 3], B = [1 2; 1 2], and C = [1 6; 3 1; 2 0]

is symmetric. Also show that BTB and CTC are symmetric.

Solution By Definition 9,

AT = [1 2; 2 3], BT = [1 1; 2 2], and CT = [1 3 2; 6 1 0].

Thus A is symmetric since AT = A. However, BT ≠ B and CT ≠ C. Therefore, B and C are not symmetric. As can be seen, the matrices BTB and CTC are symmetric:

BTB = [2 4; 4 8] and CTC = [14 9; 9 37].

In Exercise 49, the reader is asked to show that QTQ is always a symmetric matrix whether or not Q is symmetric.
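That claim is easy to test numerically before proving it; a Python/NumPy sketch (illustrative only, using the non-symmetric matrix C above as Q):

import numpy as np

Q = np.array([[1, 6], [3, 1], [2, 0]])   # the (3 x 2) matrix C of Example 3
S = Q.T @ Q
print(S)                          # [[14  9] [ 9 37]]
print(np.array_equal(S, S.T))     # True: Q^T Q is symmetric even though Q is not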

In the (n × n) matrix A = (aij), the entries a11, a22, . . . , ann are called the main diagonal of A. For example, the main diagonal of a (3 × 3) matrix is illustrated in Fig. 1.15. Since the entries aij and aji are symmetric partners relative to the main diagonal, symmetric matrices are easily recognizable as those in which the entries form a symmetric array relative to the main diagonal. For example, if

A = [2 3 −1; 3 4 2; −1 2 0] and B = [1 2 2; −1 3 0; 5 2 6],

then, by inspection, A is symmetric, whereas B is not.

[Figure 1.15: a (3 × 3) matrix with the main diagonal entries a11, a22, a33 marked]

The Identity Matrix

As we will see later, the (n × n) identity matrix plays an important role in matrix theory. In particular, for each positive integer n, the identity matrix In is defined to be the


(n× n) matrix with ones on the main diagonal and zeros elsewhere:

In = [1 0 0 · · · 0; 0 1 0 · · · 0; 0 0 1 · · · 0; . . . ; 0 0 0 · · · 1].

That is, the ijth entry of In is 0 when i ≠ j, and is 1 when i = j. For example, I2 and I3 are given by

I2 = [1 0; 0 1] and I3 = [1 0 0; 0 1 0; 0 0 1].

The identity matrix is the multiplicative identity for matrix multiplication. Specifically, let A denote an (n × n) matrix. Then, as in Exercise 62, it is not hard to show that

AIn = A and InA = A.

Identity matrices can also be used with rectangular matrices. For example, let B denote a (p × q) matrix. Then, as in Exercise 62,

BIq = B and IpB = B.

By way of illustration, consider

A = [1 2 0; −1 3 4; 6 1 8], B = [2 3 1; 1 5 7], C = [−2 0; 8 3; 6 1],

and

x = [1; 0; 3].

Note that

I3A = AI3 = A, BI3 = B, I3C = C, and I3x = x,

whereas the products I3B and CI3 are not defined.

Usually the dimension of the identity matrix is clear from the context of the problem under consideration, and it is customary to drop the subscript, n, and denote the (n × n) identity matrix simply as I. So, for example, if A is an (n × n) matrix, we will write


IA = AI = A instead of InA = AIn = A. Note that the identity matrix is a symmetric matrix.
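In Python/NumPy (illustrative only), np.eye builds identity matrices, and the rectangular rules above can be checked directly:

import numpy as np

B = np.array([[2, 3, 1], [1, 5, 7]])        # the (2 x 3) matrix B above
print(np.array_equal(np.eye(2) @ B, B))     # True: I2 B = B
print(np.array_equal(B @ np.eye(3), B))     # True: B I3 = B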

Scalar Products and Vector Norms

The transpose operation can be used to represent scalar products and vector norms. As we will see, a vector norm provides a method for measuring the size of a vector.

To illustrate the connection between transposes and the scalar product, let x and y be vectors in R3 given by

x = [1; −3; 2] and y = [1; 2; 1].

Then xT is the (1 × 3) vector

xT = [1, −3, 2],

and xT y is the scalar (or (1 × 1) matrix) given by

xT y = [1, −3, 2] [1; 2; 1] = 1 − 6 + 2 = −3.

More generally, if x and y are vectors in Rn,

x = [x1; x2; . . . ; xn], y = [y1; y2; . . . ; yn],

then

xT y = ∑_{i=1}^{n} xi yi;

that is, xT y is the scalar product or dot product of x and y. Also note that yT x = ∑_{i=1}^{n} yi xi = ∑_{i=1}^{n} xi yi = xT y.

One of the basic concepts in computational work is that of the length or norm of a vector. If

x = [a; b]

is in R2, then x can be represented geometrically in the plane as the directed line segment OP⇀ from the origin O to the point P, which has coordinates (a, b), as illustrated in Fig. 1.16. By the Pythagorean theorem, the length of the line segment OP⇀ is √(a² + b²).

[Figure 1.16: geometric vector in two-space; the directed segment from the origin O to P = (a, b) represents x]


A similar idea is used in Rn. For a vector x in Rn,

x = [x1; x2; . . . ; xn],

it is natural to define the Euclidean length, or Euclidean norm, of x, denoted by ‖x‖, to be

‖x‖ = √(x1² + x2² + · · · + xn²).

(The quantity ‖x‖ gives us a way to measure the size of the vector x.) Noting that the scalar product of x with itself is

xT x = x1² + x2² + · · · + xn²,

we have

‖x‖ = √(xT x).    (1)

For vectors x and y in Rn, we define the Euclidean distance between x and y to be ‖x − y‖. Thus the distance between x and y is given by

‖x − y‖ = √((x − y)T(x − y)) = √((x1 − y1)² + (x2 − y2)² + · · · + (xn − yn)²).    (2)

Example 4 If x and y in R3 are given by

x = [−2; 3; 2] and y = [1; 2; −1],

then find xT y, ‖x‖, ‖y‖, and ‖x − y‖.

Solution We have

xT y = [−2, 3, 2] [1; 2; −1] = −2 + 6 − 2 = 2.

Also, ‖x‖ = √(xT x) = √(4 + 9 + 4) = √17, and ‖y‖ = √(yT y) = √(1 + 4 + 1) = √6. Subtracting y from x gives

x − y = [−3; 1; 3],

so ‖x − y‖ = √((x − y)T(x − y)) = √(9 + 1 + 9) = √19.
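Example 4 translates directly into Python/NumPy (illustrative only):

import numpy as np

x = np.array([-2, 3, 2])
y = np.array([1, 2, -1])
print(x @ y)                   # 2
print(np.linalg.norm(x))       # sqrt(17), about 4.123
print(np.linalg.norm(y))       # sqrt(6),  about 2.449
print(np.linalg.norm(x - y))   # sqrt(19), about 4.359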


1.6 EXERCISES

The matrices and vectors listed in Eq. (3) are used in several of the exercises that follow.

A = [3 1; 4 7; 2 6], B = [1 2 1; 7 4 3; 6 0 1],

C = [2 1 4 0; 6 1 3 5; 2 4 2 0], D = [2 1; 1 4],

E = [3 6; 2 3], F = [1 1; 1 1],

u = [1; −1], v = [−3; 3]    (3)

Exercises 1–25 refer to the matrices and vectors in Eq. (3). In Exercises 1–6, perform the multiplications to verify the given equality or nonequality.
1. (DE)F = D(EF)   2. (FE)D = F(ED)
3. DE ≠ ED   4. EF ≠ FE
5. Fu = Fv   6. 3Fu = 7Fv

In Exercises 7–12, find the matrices.
7. AT   8. DT   9. ETF
10. ATC   11. (Fv)T   12. (EF)v

In Exercises 13–25, calculate the scalars.
13. uTv   14. vTFu   15. vTDv
16. vTFv   17. uTu   18. vTv
19. ‖u‖   20. ‖Dv‖   21. ‖Au‖
22. ‖u − v‖   23. ‖Fu‖   24. ‖Fv‖
25. ‖(D − E)u‖

26. Let A and B be (2 × 2) matrices. Prove or find a counterexample for this statement: (A − B)(A + B) = A² − B².

27. Let A and B be (2 × 2) matrices such that A² = AB and A ≠ O. Can we assert that, by cancellation, A = B? Explain.

28. Let A and B be as in Exercise 27. Find the flaw in the following proof that A = B.

Since A² = AB, A² − AB = O. Factoring yields A(A − B) = O. Since A ≠ O, it follows that A − B = O. Therefore, A = B.

29. Two of the six matrices listed in Eq. (3) are symmetric. Identify these matrices.

30. Find (2 × 2) matrices A and B such that A and B are symmetric, but AB is not symmetric. [Hint: (AB)T = BTAT = BA.]

31. Let A and B be (n × n) symmetric matrices. Give a necessary and sufficient condition for AB to be symmetric. [Hint: Recall Exercise 30.]

32. Let G be the (2 × 2) matrix that follows, and consider any vector x in R2 where both entries are not simultaneously zero:

G = [2 1; 1 1], x = [x1; x2]; |x1| + |x2| > 0.

Show that xTGx > 0. [Hint: Write xTGx as a sum of squares.]

33. Repeat Exercise 32 using the matrix D in Eq. (3) in place of G.

34. For F in Eq. (3), show that xTFx ≥ 0 for all x in R2. Classify those vectors x such that xTFx = 0.

If x and y are vectors in Rn, then the product xTy is often called an inner product. Similarly, the product xyT is often called an outer product. Exercises 35–40 concern outer products; the matrices and vectors are given in Eq. (3). In Exercises 35–40, form the outer products.
35. uvT   36. u(Fu)T   37. v(Ev)T
38. u(Ev)T   39. (Au)(Av)T   40. (Av)(Au)T

41. Let a and b be given by

a = [1; 2] and b = [3; 4].

a) Find x in R2 that satisfies both xTa = 6 and xTb = 2.
b) Find x in R2 that satisfies both xT(a + b) = 12 and xTa = 2.

42. Let A be a (2 × 2) matrix, and let B and C be given by

B = [1 3; 1 4] and C = [2 3; 4 5].

a) If AT + B = C, what is A?


b) If ATB = C, what is A?
c) Calculate BC1, BT1C, (BC1)TC2, and ‖CB2‖.

43. Let

A = [4 −2 2; 2 4 −4; 1 1 0] and u = [1; 3; 2].

a) Verify that Au = 2u.
b) Without forming A⁵, calculate the vector A⁵u.
c) Give a formula for Aⁿu, where n is a positive integer. What property from Theorem 8 is required to derive the formula?

44. Let A, B, and C be (m × n) matrices such that A + C = B + C. The following statements are the steps in a proof that A = B. Using Theorem 7, provide justification for each of the assertions.
a) There exists an (m × n) matrix O such that A = A + O.
b) There exists an (m × n) matrix D such that A = A + (C + D).
c) A = (A + C) + D = (B + C) + D.
d) A = B + (C + D).
e) A = B + O.
f) A = B.

45. Let A, B, C, and D be matrices such that AB = D and AC = D. The following statements are steps in a proof that if r and s are scalars, then A(rB + sC) = (r + s)D. Use Theorems 8 and 9 to provide reasons for each of the steps.
a) A(rB + sC) = A(rB) + A(sC).
b) A(rB + sC) = r(AB) + s(AC) = rD + sD.
c) A(rB + sC) = (r + s)D.

46. Let x and y be vectors in Rn such that ‖x‖ = ‖y‖ = 1 and xTy = 0. Use Eq. (1) to show that ‖x − y‖ = √2.

47. Use Theorem 10 to show that A + AT is symmetric for any square matrix A.

48. Let A be the (2 × 2) matrix

A = [1 2; 3 6].

Choose some vector b in R2 such that the equation Ax = b is inconsistent. Verify that the associated equation ATAx = ATb is consistent for your choice of b. Let x* be a solution to ATAx = ATb, and select some vectors x at random from R2. Verify that ‖Ax* − b‖ ≤ ‖Ax − b‖ for any of these random choices for x. (In Chapter 3, we will show that ATAx = ATb is always consistent for any (m × n) matrix A regardless of whether Ax = b is consistent or not. We also show that any solution x* of ATAx = ATb satisfies ‖Ax* − b‖ ≤ ‖Ax − b‖ for all x in Rn; that is, such a vector x* minimizes the length of the residual vector r = Ax − b.)

49. Use Theorem 10 to prove each of the following:
a) If Q is any (m × n) matrix, then QTQ and QQT are symmetric.
b) If A, B, and C are matrices such that the product ABC is defined, then (ABC)T = CTBTAT. [Hint: Set BC = D.] Note: These proofs can be done quickly without considering the entries in the matrices.

50. Let Q be an (m × n) matrix and x any vector in Rn. Prove that xTQTQx ≥ 0. [Hint: Observe that Qx is a vector in Rm.]

51. Prove properties 2, 3, and 4 of Theorem 7.

52. Prove property 1 of Theorem 8. [Note: This is a long exercise, but the proof is similar to the proof of part 2 of Theorem 10.]

53. Prove properties 2 and 3 of Theorem 8.

54. Prove properties 2, 3, and 4 of Theorem 9.

55. Prove properties 1 and 3 of Theorem 10.

In Exercises 56–61, determine n and m so that InA = A and AIm = A, where:
56. A is (2 × 3)   57. A is (5 × 7)
58. A is (4 × 4)   59. A is (4 × 6)
60. A is (4 × 2)   61. A is (5 × 5)

62. a) Let A be an (n × n) matrix. Use the definition of matrix multiplication to show that AIn = A and InA = A.
b) Let B be a (p × q) matrix. Use the definition of matrix multiplication to show that BIq = B and IpB = B.


1.7 LINEAR INDEPENDENCE AND NONSINGULAR MATRICES

Section 1.5 demonstrated how the general linear system

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
...
am1x1 + am2x2 + · · · + amnxn = bm    (1)

can be expressed as a matrix equation Ax = b. We observed in Section 1.1 that system (1) may have a unique solution, infinitely many solutions, or no solution. The material in Section 1.3 illustrates that, with appropriate additional information, we can know which of the three possibilities will occur. The case in which m = n is of particular interest, and in this and later sections, we determine conditions on the matrix A in order that an (n × n) system has a unique solution.

Linear Independence

If A = [A1,A2, . . . ,An], then, by Theorem 5 of Section 1.5, the equation Ax = b can be written in terms of the columns of A as

x1A1 + x2A2 + · · · + xnAn = b.    (2)

From Eq. (2), it follows that system (1) is consistent if, and only if, b can be written as a sum of scalar multiples of the column vectors of A. We call a sum such as x1A1 + x2A2 + · · · + xnAn a linear combination of the vectors A1,A2, . . . ,An. Thus Ax = b is consistent if, and only if, b is a linear combination of the columns of A.

Example 1 If the vectors A1,A2,A3, b1, and b2 are given by

A1 = [1; 2; −1], A2 = [1; 3; 1], A3 = [1; 4; 3],

b1 = [3; 8; 1], and b2 = [2; 5; −1],

then express each of b1 and b2 as a linear combination of the vectors A1,A2,A3.

Solution If A = [A1,A2,A3], that is,

A = [1 1 1; 2 3 4; −1 1 3],


then expressing b1 as a linear combination of A1, A2, A3 is equivalent to solving the (3 × 3) linear system with matrix equation Ax = b1. The augmented matrix for the system is

[1 1 1 3; 2 3 4 8; −1 1 3 1],

and solving in the usual manner yields

x1 = 1 + x3
x2 = 2 − 2x3,

where x3 is an unconstrained variable. Thus b1 can be expressed as a linear combination of A1, A2, A3 in infinitely many ways. Taking x3 = 2, for example, yields x1 = 3, x2 = −2, so

3A1 − 2A2 + 2A3 = b1;

that is,

3 [1; 2; −1] − 2 [1; 3; 1] + 2 [1; 4; 3] = [3; 8; 1].

If we attempt to follow the same procedure to express b2 as a linear combination of A1, A2, A3, we discover that the system of equations Ax = b2 is inconsistent. Therefore, b2 cannot be expressed as a linear combination of A1, A2, A3.

It is convenient at this point to introduce a special symbol, θ, to denote the m-dimensional zero vector. Thus θ is the vector in Rm, all of whose components are zero:

θ = [0; 0; . . . ; 0].

We will use θ throughout to designate zero vectors in order to avoid any possible confusion between a zero vector and the scalar zero. With this notation, the (m × n) homogeneous system

a11x1 + a12x2 + · · · + a1nxn = 0
a21x1 + a22x2 + · · · + a2nxn = 0
...
am1x1 + am2x2 + · · · + amnxn = 0    (3)

has the matrix equation Ax = θ , which can be written as

x1A1 + x2A2 + · · · + xnAn = θ . (4)

In Section 1.3, we observed that the homogeneous system (3) always has the trivial solution x1 = x2 = · · · = xn = 0. Thus in Eq. (4), θ can always be expressed as a linear


combination of the columns A1, A2, . . . , An of A by taking x1 = x2 = · · · = xn = 0. There could, however, be nontrivial solutions, and this leads to the following definition.

Definition 11 A set of m-dimensional vectors {v1, v2, . . . , vp} is said to be linearly independent if the only solution to the vector equation

a1v1 + a2v2 + · · · + apvp = θ

is a1 = 0, a2 = 0, . . . , ap = 0. The set of vectors is said to be linearly dependent if it is not linearly independent. That is, the set is linearly dependent if we can find a solution to a1v1 + a2v2 + · · · + apvp = θ where not all the ai are zero.

Any time you need to know whether a set of vectors is linearly independent or linearly dependent, you should start with the dependence equation:

a1v1 + a2v2 + · · · + apvp = θ (5)

You would then solve Eq. (5). If there are nontrivial solutions, then the set of vectors is linearly dependent. If Eq. (5) has only the trivial solution, then the set of vectors is linearly independent.

We can phrase Eq. (5) in matrix terms. In particular, let V denote the (m × p) matrix made up from the vectors v1, v2, . . . , vp:

V = [v1, v2, . . . , vp].

Then Eq. (5) is equivalent to the matrix equation

V x = θ.    (6)

Thus to determine whether the set {v1, v2, . . . , vp} is linearly independent or dependent, we solve the homogeneous system of equations (6) by forming the augmented matrix [V | θ] and reducing [V | θ] to echelon form. If the system has nontrivial solutions, then {v1, v2, . . . , vp} is a linearly dependent set. If the trivial solution is the only solution, then {v1, v2, . . . , vp} is a linearly independent set.

Example 2 Determine whether the set {v1, v2, v3} is linearly independent or linearly dependent, where

v1 = [1; 2; 3], v2 = [2; −1; 4], and v3 = [0; 5; 2].

Solution To determine whether the set is linearly dependent, we must determine whether the vector equation

x1v1 + x2v2 + x3v3 = θ (7)


has a nontrivial solution. But Eq. (7) is equivalent to the (3 × 3) homogeneous system of equations V x = θ, where V = [v1, v2, v3]. The augmented matrix, [V | θ], for this system is

[1 2 0 0; 2 −1 5 0; 3 4 2 0].

This matrix reduces to

[1 0 2 0; 0 1 −1 0; 0 0 0 0].

Therefore, we find the solution x1 = −2x3, x2 = x3, where x3 is arbitrary. In particular, Eq. (7) has nontrivial solutions, so {v1, v2, v3} is a linearly dependent set. Setting x3 = 1, for example, gives x1 = −2, x2 = 1. Therefore,

−2v1 + v2 + v3 = θ.

Note that from this equation we can express v3 as a linear combination of v1 and v2:

v3 = 2v1 − v2.

Similarly, of course, v1 can be expressed as a linear combination of v2 and v3, and v2 can be expressed as a linear combination of v1 and v3.
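The row reduction in Example 2 can also be delegated to the machine. A Python/NumPy sketch (illustrative only; the rank test below is equivalent to checking for nontrivial solutions of V x = θ):

import numpy as np

v1, v2, v3 = np.array([1, 2, 3]), np.array([2, -1, 4]), np.array([0, 5, 2])
V = np.column_stack([v1, v2, v3])
# Vx = 0 has nontrivial solutions exactly when rank(V) is less than the number of columns
print(np.linalg.matrix_rank(V))            # 2 < 3, so {v1, v2, v3} is linearly dependent
print(np.allclose(-2 * v1 + v2 + v3, 0))   # True: the dependence found above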

Example 3 Determine whether or not the set {v1, v2, v3} is linearly dependent, where

v1 = [1; 2; −3], v2 = [−2; 1; 1], and v3 = [1; −1; −2].

Solution If V = [v1, v2, v3], then the augmented matrix [V | θ ] is row equivalent to

[1 0 0 0; 0 1 0 0; 0 0 1 0].

Thus the only solution of x1v1 + x2v2 + x3v3 = θ is the trivial solution x1 = x2 = x3 = 0; so the set {v1, v2, v3} is linearly independent.

In contrast to the preceding example, note that v3 cannot be expressed as a linear combination of v1 and v2. If there were scalars a1 and a2 such that

v3 = a1v1 + a2v2,

then there would be a nontrivial solution to x1v1 + x2v2 + x3v3 = θ; namely, x1 = −a1, x2 = −a2, x3 = 1.

We note that a set of vectors is linearly dependent if and only if one of the vectors is a linear combination of the remaining ones (see the exercises). It is also worth noting


THE VECTOR SPACE Rn, n > 3   The extension of vectors and their corresponding algebra into more than three dimensions was an extremely important step in the development of mathematics. This advancement is attributed largely to Hermann Grassmann (1809–1877) in his Ausdehnungslehre. In this work Grassmann discussed linear independence and dependence and many concepts dealing with the algebraic structure of Rn (such as dimension and subspaces), which we will study in Chapter 3. Unfortunately, Grassmann's work was so difficult to read that it went almost unnoticed for a long period of time, and he did not receive as much credit as he deserved.

that any set of vectors that contains the zero vector is linearly dependent (again, see the exercises).

The unit vectors e1, e2, . . . , en in Rn are defined by

e1 = [1; 0; 0; . . . ; 0], e2 = [0; 1; 0; . . . ; 0], e3 = [0; 0; 1; . . . ; 0], . . . , en = [0; 0; 0; . . . ; 1].    (8)

It is easy to see that {e1, e2, . . . , en} is linearly independent. To illustrate, consider the unit vectors

e1 = [1; 0; 0], e2 = [0; 1; 0], and e3 = [0; 0; 1]

in R3. If V = [e1, e2, e3], then

[V | θ] = [1 0 0 0; 0 1 0 0; 0 0 1 0],

so clearly the only solution of V x = θ (or equivalently, of x1e1 + x2e2 + x3e3 = θ) is the trivial solution x1 = 0, x2 = 0, x3 = 0.

The next example illustrates that, in some cases, the linear dependence of a set of vectors can be determined by inspection. The example is a special case of Theorem 11, which follows.

Example 4 Let {v1, v2, v3} be the set of vectors in R2 given by

v1 = [1; 2], v2 = [3; 1], and v3 = [2; 3].

Without solving the corresponding homogeneous system of equations, show that the set is linearly dependent.


Solution The vector equation x1v1 + x2v2 + x3v3 = θ is equivalent to the homogeneous system of equations V x = θ, where V = [v1, v2, v3]. But this is the homogeneous system

x1 + 3x2 + 2x3 = 0
2x1 + x2 + 3x3 = 0,

consisting of two equations in three unknowns. By Theorem 4 of Section 1.3, the system has nontrivial solutions; hence the set {v1, v2, v3} is linearly dependent.

Example 4 is a particular case of the following general result.

Theorem 11 Let {v1, v2, . . . , vp} be a set of vectors in Rm. If p > m, then this set is linearly dependent.

Proof The set {v1, v2, . . . , vp} is linearly dependent if the equation V x = θ has a nontrivial solution, where V = [v1, v2, . . . , vp]. But V x = θ represents a homogeneous (m × p) system of linear equations with m < p. By Theorem 4 of Section 1.3, V x = θ has nontrivial solutions.

Note that Theorem 11 does not say that if p ≤ m, then the set {v1, v2, . . . , vp} is linearly independent. Indeed, Examples 2 and 3 illustrate that if p ≤ m, then the set may be either linearly independent or linearly dependent.

Nonsingular Matrices

The concept of linear independence allows us to state precisely which (n × n) systems of linear equations always have a unique solution. We begin with the following definition.

Definition 12 An (n × n) matrix A is nonsingular if the only solution to Ax = θ is x = θ. Furthermore, A is said to be singular if A is not nonsingular.

If A = [A1,A2, . . . ,An], then Ax = θ can be written as

x1A1 + x2A2 + · · · + xnAn = θ,

so it is an immediate consequence of Definition 12 that A is nonsingular if and only if the column vectors of A form a linearly independent set. This observation is important enough to be stated as a theorem.

Theorem 12 The (n × n) matrix A = [A1,A2, . . . ,An] is nonsingular if and only if {A1,A2, . . . ,An} is a linearly independent set.

Example 5 Determine whether each of the matrices

A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} and B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}

is singular or nonsingular.


Solution The augmented matrix [A | θ] for the system Ax = θ is row equivalent to

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},

so the trivial solution x1 = 0, x2 = 0 (or x = θ) is the unique solution. Thus A is nonsingular.

The augmented matrix [B | θ] for the system Bx = θ is row equivalent to

\begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}.

Thus, B is singular because the vector x = [−2, 1]^T is a nontrivial solution of Bx = θ. Equivalently, the columns of B are linearly dependent because

−2B1 + B2 = θ.

The next theorem demonstrates the importance of nonsingular matrices with respect to linear systems.

Theorem 13 Let A be an (n × n) matrix. The equation Ax = b has a unique solution for every (n × 1) column vector b if and only if A is nonsingular.

Proof Suppose first that Ax = b has a unique solution no matter what choice we make for b. Choosing b = θ implies, by Definition 12, that A is nonsingular.

Conversely, suppose that A = [A1, A2, . . . , An] is nonsingular, and let b be any (n × 1) column vector. We first show that Ax = b has a solution. To see this, observe first that

{A1, A2, . . . , An, b}

is a set of n + 1 vectors in Rn; so by Theorem 11 this set is linearly dependent. Thus there are scalars a1, a2, . . . , an, an+1 such that

a1A1 + a2A2 + · · · + anAn + an+1b = θ; (9)

and moreover not all these scalars are zero. In fact, if an+1 = 0 in Eq. (9), then

a1A1 + a2A2 + · · · + anAn = θ,

and it follows that {A1, A2, . . . , An} is a linearly dependent set. Since this contradicts the assumption that A is nonsingular, we know that an+1 is nonzero. It follows from Eq. (9) that

s1A1 + s2A2 + · · · + snAn = b,

where

s1 = −a1/an+1, s2 = −a2/an+1, . . . , sn = −an/an+1.


Thus Ax = b has a solution s given by

s = [s1, s2, . . . , sn]^T.

This shows that Ax = b is always consistent when A is nonsingular.

To show that the solution is unique, suppose that the (n × 1) vector u is any solution whatsoever to Ax = b; that is, Au = b. Then As − Au = b − b, or

A(s − u) = θ;

therefore, y = s − u is a solution to Ax = θ. But A is nonsingular, so y = θ; that is, s = u. Thus Ax = b has one, and only one, solution.

In closing we note that for a specific system Ax = b, it is usually easier to demonstrate the existence and/or uniqueness of a solution by using Gaussian elimination and actually solving the system. There are many instances, however, in which theoretical information about existence and uniqueness is extremely valuable to practical computations. A specific instance of this is provided in the next section.
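As a concrete illustration (a sketch of ours, not part of the text), the following Python/NumPy fragment solves one specific nonsingular system by Gaussian elimination; Theorem 13 guarantees that the solution it returns is the only one.

import numpy as np

# A is nonsingular, so Ax = b has exactly one solution (Theorem 13).
A = np.array([[1.0, 2.0],
              [2.0, 5.0]])
b = np.array([-1.0, -10.0])

x = np.linalg.solve(A, b)   # Gaussian elimination; raises LinAlgError if A is singular
print(x)                    # [15., -8.]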

1.7 EXERCISES

The vectors listed in Eq. (10) are used in several of the exercises that follow.

v1 = [1, 2]^T, v2 = [2, 3]^T, v3 = [2, 4]^T, v4 = [1, 1]^T, v5 = [3, 6]^T,
u0 = [1, 0, 0]^T, u1 = [1, 2, −1]^T, u2 = [2, 1, −3]^T, u3 = [−1, 4, 3]^T, u4 = [4, 4, 0]^T, u5 = [1, 1, 0]^T (10)

In Exercises 1–14, use Eq. (6) to determine whether the given set of vectors is linearly independent or linearly dependent. If the set is linearly dependent, express one vector in the set as a linear combination of the others.
1. {v1, v2} 2. {v1, v3}
3. {v1, v5} 4. {v2, v3}
5. {v1, v2, v3} 6. {v2, v3, v4}
7. {u4, u5} 8. {u3, u4}
9. {u1, u2, u5} 10. {u1, u4, u5}
11. {u2, u4, u5} 12. {u1, u2, u4}
13. {u0, u1, u2, u4} 14. {u0, u2, u3, u4}
15. Consider the sets of vectors in Exercises 1–14. Using Theorem 11, determine by inspection which of these sets are known to be linearly dependent.

The matrices listed in Eq. (11) are used in some of the exercises that follow.

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}, C = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix},
D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}, E = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 1 & 3 \end{bmatrix}, F = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 3 & 2 \\ 0 & 0 & 1 \end{bmatrix} (11)


In Exercises 16–27, use Definition 12 to determine whether the given matrix is singular or nonsingular. If a matrix M is singular, give all solutions of Mx = θ.
16. A 17. B 18. C
19. AB 20. BA 21. D
22. F 23. D + F 24. E
25. EF 26. DE 27. F^T

In Exercises 28–33, determine conditions on the scalars so that the set of vectors is linearly dependent.

28. v1 = [1, a]^T, v2 = [2, 3]^T
29. v1 = [1, 2]^T, v2 = [3, a]^T
30. v1 = [1, 2, 1]^T, v2 = [1, 3, 2]^T, v3 = [0, 1, a]^T
31. v1 = [1, 2, 1]^T, v2 = [1, a, 3]^T, v3 = [0, 2, b]^T
32. v1 = [a, 1]^T, v2 = [b, 3]^T
33. v1 = [1, a]^T, v2 = [b, c]^T

In Exercises 34–39, the vectors and matrices are from Eq. (10) and Eq. (11). The equations listed in Exercises 34–39 all have the form Mx = b, and all the equations are consistent. In each exercise, solve the equation and express b as a linear combination of the columns of M.
34. Ax = v1 35. Ax = v3
36. Cx = v4 37. Cx = v2
38. Fx = u1 39. Fx = u3

In Exercises 40–45, express the given vector b as a linear combination of v1 and v2, where v1 and v2 are in Eq. (10).

40. b = [2, 7]^T 41. b = [3, −1]^T
42. b = [0, 4]^T 43. b = [0, 0]^T
44. b = [1, 2]^T 45. b = [1, 0]^T

In Exercises 46–47, let S = {v1, v2, v3}.
a) For what value(s) a is the set S linearly dependent?
b) For what value(s) a can v3 be expressed as a linear combination of v1 and v2?

46. v1 = [1, −1]^T, v2 = [−2, 2]^T, v3 = [3, a]^T
47. v1 = [1, 0]^T, v2 = [1, 1]^T, v3 = [3, a]^T

48. Let S = {v1, v2, v3} be a set of vectors in R3, where v1 = θ. Show that S is a linearly dependent set of vectors. [Hint: Exhibit a nontrivial solution for either Eq. (5) or Eq. (6).]

49. Let {v1, v2, v3} be a set of nonzero vectors in Rm such that v_i^T v_j = 0 when i ≠ j. Show that the set is linearly independent. [Hint: Set a1v1 + a2v2 + a3v3 = θ and consider θ^T θ.]

50. If the set {v1, v2, v3} of vectors in Rm is linearly dependent, then argue that the set {v1, v2, v3, v4} is also linearly dependent for every choice of v4 in Rm.

51. Suppose that {v1, v2, v3} is a linearly independent subset of Rm. Show that the set {v1, v1 + v2, v1 + v2 + v3} is also linearly independent.

52. If A and B are (n × n) matrices such that A is nonsingular and AB = O, then prove that B = O. [Hint: Write B = [B1, . . . , Bn] and consider AB = [AB1, . . . , ABn].]

53. If A, B, and C are (n × n) matrices such that A is nonsingular and AB = AC, then prove that B = C. [Hint: Consider A(B − C) and use the preceding exercise.]

54. Let A = [A1, . . . , An−1] be an (n × (n − 1)) matrix. Show that B = [A1, . . . , An−1, Ab] is singular for every choice of b in Rn−1.

55. Suppose that C and B are (2 × 2) matrices and that B is singular. Show that CB is singular. [Hint: By Definition 12, there is a vector x1 in R2, x1 ≠ θ, such that Bx1 = θ.]

56. Let {w1, w2} be a linearly independent set of vectors in R2. Show that if b is any vector in R2, then b is a linear combination of w1 and w2. [Hint: Consider the (2 × 2) matrix A = [w1, w2].]


57. Let A be an (n × n) nonsingular matrix. Show that A^T is nonsingular as follows:
a) Suppose that v is a vector in Rn such that A^T v = θ. Cite a theorem from this section that guarantees there is a vector w in Rn such that Aw = v.
b) By part (a), A^T Aw = θ, and therefore w^T A^T Aw = w^T θ = 0. Cite results from Section 1.6 that allow you to conclude that ‖Aw‖ = 0. [Hint: What is (Aw)^T?]
c) Use parts (a) and (b) to conclude that if A^T v = θ, then v = θ; this shows that A^T is nonsingular.

58. Let T be an (n × n) upper-triangular matrix

T = \begin{bmatrix} t_{11} & t_{12} & t_{13} & \cdots & t_{1n} \\ 0 & t_{22} & t_{23} & \cdots & t_{2n} \\ 0 & 0 & t_{33} & \cdots & t_{3n} \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & t_{nn} \end{bmatrix}.

Prove that if t_{ii} = 0 for some i, 1 ≤ i ≤ n, then T is singular. [Hint: If t11 = 0, find a nonzero vector v such that T v = θ. If trr = 0, but tii ≠ 0 for i = 1, 2, . . . , r − 1, use Theorem 4 of Section 1.3 to show that columns T1, T2, . . . , Tr of T are linearly dependent. Then select a nonzero vector v such that T v = θ.]

59. Let T be an (n × n) upper-triangular matrix as in Exercise 58. Prove that if tii ≠ 0 for i = 1, 2, . . . , n, then T is nonsingular. [Hint: Let T = [T1, T2, . . . , Tn], and suppose that a1T1 + a2T2 + · · · + anTn = θ for some scalars a1, a2, . . . , an. First deduce that an = 0. Next show an−1 = 0, and so on.] Note that Exercises 58 and 59 establish that an upper-triangular matrix is singular if and only if one of the entries t11, t22, . . . , tnn is zero. By Exercise 57 the same result is true for lower-triangular matrices.

60. Suppose that the (n × n) matrices A and B are row equivalent. Prove that A is nonsingular if and only if B is nonsingular. [Hint: The homogeneous systems Ax = θ and Bx = θ are equivalent by Theorem 1 of Section 1.1.]

1.8 DATA FITTING, NUMERICAL INTEGRATION, AND NUMERICAL DIFFERENTIATION (OPTIONAL)

In this section we present four applications of matrix theory toward the solution of a practical problem. Three of the applications involve numerical approximation techniques, and the fourth relates to solving certain types of differential equations. In each case, solving the general problem depends on being able to solve a system of linear equations, and the theory of nonsingular matrices will guarantee that a solution exists and is unique.

Polynomial Interpolation

We begin by applying matrix theory to the problem of interpolating data with polynomials. In particular, Theorem 13 of Section 1.7 is used to establish a general existence and uniqueness result for polynomial interpolation. The following example is a simple illustration of polynomial interpolation.

Example 1 Find a quadratic polynomial, q(t), such that the graph of q(t) goes through the points (1, 2), (2, 3), and (3, 6) in the ty-plane (see Fig. 1.17).

Figure 1.17 Points in the ty-plane

Solution A quadratic polynomial q(t) has the form

q(t) = a + bt + ct^2, (1a)


so our problem reduces to determining constants a, b, and c such that

q(1) = 2
q(2) = 3
q(3) = 6. (1b)

The constraints in (1b) are, by (1a), equivalent to

a + b + c = 2
a + 2b + 4c = 3
a + 3b + 9c = 6. (1c)

Clearly (1c) is a system of three linear equations in the three unknowns a, b, and c; so solving (1c) will determine the polynomial q(t). Solving (1c), we find the unique solution a = 3, b = −2, c = 1; therefore, q(t) = 3 − 2t + t^2 is the unique quadratic polynomial satisfying the conditions (1b). A portion of the graph of q(t) is shown in Fig. 1.18.

Figure 1.18 Graph of q(t)

Frequently polynomial interpolation is used when values of a function f(t) are given in tabular form. For example, given a table of n + 1 values of f(t) (see Table 1.1), an interpolating polynomial for f(t) is a polynomial, p(t), of the form

p(t) = a0 + a1 t + a2 t^2 + · · · + an t^n

such that p(ti) = yi = f(ti) for 0 ≤ i ≤ n. Problems of interpolating data in tables are quite common in scientific and engineering work; for example, y = f(t) might describe a temperature distribution as a function of time, with yi = f(ti) being observed (measured) temperatures. For a time t not listed in the table, p(t) provides an approximation for f(t).

Table 1.1

t     f(t)
t0    y0
t1    y1
t2    y2
...   ...
tn    yn

Example 2 Find an interpolating polynomial for the four observations given in Table 1.2. Give an approximation for f(1.5).

Table 1.2

t    f(t)
0    3
1    0
2    −1
3    6

Solution In this case, the interpolating polynomial is a polynomial of degree 3 or less,

p(t) = a0 + a1 t + a2 t^2 + a3 t^3,

where p(t) satisfies the four constraints p(0) = 3, p(1) = 0, p(2) = −1, and p(3) = 6. As in the previous example, these constraints are equivalent to the (4 × 4) system of equations

a0 = 3
a0 + a1 + a2 + a3 = 0
a0 + 2a1 + 4a2 + 8a3 = −1
a0 + 3a1 + 9a2 + 27a3 = 6.

Solving this system, we find that a0 = 3, a1 = −2, a2 = −2, a3 = 1 is the unique solution. Hence the unique polynomial that interpolates the tabular data for f(t) is

p(t) = 3 − 2t − 2t^2 + t^3.

The desired approximation for f(1.5) is p(1.5) = −1.125.


Note that in each of the two preceding examples, the interpolating polynomial was unique. Theorem 14, on page 83, states that this is always the case. The next example considers the general problem of fitting a quadratic polynomial to three data points and illustrates the proof of Theorem 14.

Example 3 Given three distinct numbers t0, t1, t2 and any set of three values y0, y1, y2, show that there exists a unique polynomial,

p(t) = a0 + a1 t + a2 t^2, (2a)

of degree 2 or less such that p(t0) = y0, p(t1) = y1, and p(t2) = y2.

Solution The given constraints and (2a) define a (3 × 3) linear system,

a0 + a1 t0 + a2 t0^2 = y0
a0 + a1 t1 + a2 t1^2 = y1
a0 + a1 t2 + a2 t2^2 = y2, (2b)

where a0, a1, and a2 are the unknowns. The problem is to show that system (2b) has a unique solution. We can write system (2b) in matrix form as T a = y, where

T = \begin{bmatrix} 1 & t_0 & t_0^2 \\ 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \end{bmatrix}, a = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}, and y = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix}. (2c)

By Theorem 13, the system is guaranteed to have a unique solution if T is nonsingular. To establish that T is nonsingular, it suffices to show that if c = [c0, c1, c2]^T is a solution to the homogeneous system T x = θ, then c = θ. But T c = θ is equivalent to

c0 + c1 t0 + c2 t0^2 = 0
c0 + c1 t1 + c2 t1^2 = 0
c0 + c1 t2 + c2 t2^2 = 0. (2d)

Let q(t) = c0 + c1 t + c2 t^2. Then q(t) has degree at most 2 and, by system (2d), q(t0) = q(t1) = q(t2) = 0. Thus q(t) has three distinct real zeros. By Exercise 25, if a quadratic polynomial has three distinct real zeros, then it must be identically zero. That is, c0 = c1 = c2 = 0, or c = θ. Hence T is nonsingular, and so system (2b) has a unique solution.

The matrix T given in (2c) is the (3 × 3) Vandermonde matrix. More generally, for real numbers t0, t1, . . . , tn, the [(n + 1) × (n + 1)] Vandermonde matrix T


is defined by

T = \begin{bmatrix} 1 & t_0 & t_0^2 & \cdots & t_0^n \\ 1 & t_1 & t_1^2 & \cdots & t_1^n \\ \vdots & & & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^n \end{bmatrix}. (3)

Following the argument given in Example 3 and making use of Exercise 26, we can show that if t0, t1, . . . , tn are distinct, then T is nonsingular. Thus, by Theorem 13, the linear system T x = y has a unique solution for each choice of y in Rn+1. As a consequence, we have the following theorem.

Theorem 14 Given n + 1 distinct numbers t0, t1, . . . , tn and any set of n + 1 values y0, y1, . . . , yn, there is one and only one polynomial p(t) of degree n or less, p(t) = a0 + a1 t + · · · + an t^n, such that p(ti) = yi, i = 0, 1, . . . , n.

Solutions to Initial Value Problems

The following example provides yet another application of the fact that the Vandermonde matrix T given in (3) is nonsingular when t0, t1, . . . , tn are distinct. Problems of this sort arise in solving initial value problems in differential equations.

Example 4 Given n + 1 distinct numbers t0, t1, . . . , tn and any set of n + 1 values y0, y1, . . . , yn, show that there is one, and only one, function that has the form

y = a_0 e^{t_0 x} + a_1 e^{t_1 x} + · · · + a_n e^{t_n x} (4a)

and that satisfies the constraints y(0) = y0, y′(0) = y1, . . . , y^{(n)}(0) = yn.

Solution Calculating the first n derivatives of y gives

y = a_0 e^{t_0 x} + a_1 e^{t_1 x} + · · · + a_n e^{t_n x}
y′ = a_0 t_0 e^{t_0 x} + a_1 t_1 e^{t_1 x} + · · · + a_n t_n e^{t_n x}
y′′ = a_0 t_0^2 e^{t_0 x} + a_1 t_1^2 e^{t_1 x} + · · · + a_n t_n^2 e^{t_n x}
...
y^{(n)} = a_0 t_0^n e^{t_0 x} + a_1 t_1^n e^{t_1 x} + · · · + a_n t_n^n e^{t_n x}. (4b)

Substituting x = 0 in each equation of system (4b) and setting y^{(k)}(0) = yk yields the system

y0 = a0 + a1 + · · · + an
y1 = a0 t0 + a1 t1 + · · · + an tn
y2 = a0 t0^2 + a1 t1^2 + · · · + an tn^2
...
yn = a0 t0^n + a1 t1^n + · · · + an tn^n (4c)


with unknowns a0, a1, . . . , an. Note that the coefficient matrix for the linear system (4c) is

T^T = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ t_0 & t_1 & \cdots & t_n \\ t_0^2 & t_1^2 & \cdots & t_n^2 \\ \vdots & & & \vdots \\ t_0^n & t_1^n & \cdots & t_n^n \end{bmatrix}, (4d)

where T is the [(n + 1) × (n + 1)] Vandermonde matrix given in Eq. (3). It is left as an exercise (see Exercise 57 of Section 1.7) to show that because T is nonsingular, the transpose T^T is also nonsingular. Thus by Theorem 13, the linear system (4c) has a unique solution.

The next example is a specific case of Example 4.

Example 5 Find the unique function y = c_1 e^x + c_2 e^{2x} + c_3 e^{3x} that satisfies the constraints y(0) = 1, y′(0) = 2, and y′′(0) = 0.

Solution The given function and its first two derivatives are

y = c_1 e^x + c_2 e^{2x} + c_3 e^{3x}
y′ = c_1 e^x + 2 c_2 e^{2x} + 3 c_3 e^{3x}
y′′ = c_1 e^x + 4 c_2 e^{2x} + 9 c_3 e^{3x}. (5a)

From (5a) the given constraints are equivalent to

1 = c1 + c2 + c3
2 = c1 + 2c2 + 3c3
0 = c1 + 4c2 + 9c3. (5b)

The augmented matrix for system (5b) is

\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 2 \\ 1 & 4 & 9 & 0 \end{bmatrix},

and solving in the usual manner yields the unique solution c1 = −2, c2 = 5, c3 = −2. Therefore, the function y = −2e^x + 5e^{2x} − 2e^{3x} is the unique function that satisfies the given constraints.

Numerical Integration

The Vandermonde matrix also arises in problems where it is necessary to estimate an integral or a derivative numerically. For example, let I(f) denote the definite integral


I(f) = \int_a^b f(t)\, dt.

If the integrand is fairly complicated or if the integrand is not a standard form that can be found in a table of integrals, then it will be necessary to approximate the value I(f) numerically.

One effective way to approximate I(f) is first to find a polynomial p that approximates f on [a, b],

p(t) ≈ f(t), a ≤ t ≤ b.

Next, given that p is a good approximation to f, we would expect that the approximation that follows is also a good one:

\int_a^b p(t)\, dt ≈ \int_a^b f(t)\, dt. (6)

Of course, since p is a polynomial, the integral on the left-hand side of Eq. (6) can be easily evaluated and provides a computable estimate to the unknown integral, I(f).

One way to generate a polynomial approximation to f is through interpolation. If we select n + 1 points t0, t1, . . . , tn in [a, b], then the nth-degree polynomial p that satisfies p(ti) = f(ti), 0 ≤ i ≤ n, is an approximation to f that can be used in Eq. (6) to estimate I(f).

In summary, the numerical integration process proceeds as follows:

1. Given f, construct the interpolating polynomial, p.
2. Given p, calculate the integral, \int_a^b p(t)\, dt.
3. Use \int_a^b p(t)\, dt as the approximation to \int_a^b f(t)\, dt.

It turns out that this approximation scheme can be simplified considerably, and step 1 can be skipped entirely. That is, it is not necessary to construct the actual interpolating polynomial p in order to know the integral of p, \int_a^b p(t)\, dt.

We will illustrate the idea with a quadratic interpolating polynomial. Suppose p is the quadratic polynomial that interpolates f at t0, t1, and t2. Next, suppose we can find scalars A0, A1, A2 such that

A_0 + A_1 + A_2 = \int_a^b 1\, dt
A_0 t_0 + A_1 t_1 + A_2 t_2 = \int_a^b t\, dt
A_0 t_0^2 + A_1 t_1^2 + A_2 t_2^2 = \int_a^b t^2\, dt. (7)

Now, if the interpolating polynomial p is given by p(t) = a_0 + a_1 t + a_2 t^2, then the equations in (7) give


\int_a^b p(t)\, dt = \int_a^b [a_0 + a_1 t + a_2 t^2]\, dt
= a_0 \int_a^b 1\, dt + a_1 \int_a^b t\, dt + a_2 \int_a^b t^2\, dt
= a_0 \sum_{i=0}^{2} A_i + a_1 \sum_{i=0}^{2} A_i t_i + a_2 \sum_{i=0}^{2} A_i t_i^2
= \sum_{i=0}^{2} A_i [a_0 + a_1 t_i + a_2 t_i^2]
= \sum_{i=0}^{2} A_i p(t_i).

The previous calculations demonstrate the following: If we know the values of a quadratic polynomial p at three points t0, t1, t2 and if we can find scalars A0, A1, A2 that satisfy system (7), then we can evaluate the integral of p with the formula

\int_a^b p(t)\, dt = \sum_{i=0}^{2} A_i p(t_i). (8)

Next, since p is the quadratic interpolating polynomial for f, we see that the values of p(ti) are known to us; that is, p(t0) = f(t0), p(t1) = f(t1), and p(t2) = f(t2). Thus, combining Eq. (8) and Eq. (6), we obtain

\int_a^b p(t)\, dt = \sum_{i=0}^{2} A_i p(t_i) = \sum_{i=0}^{2} A_i f(t_i) ≈ \int_a^b f(t)\, dt,

or equivalently,

\int_a^b f(t)\, dt ≈ \sum_{i=0}^{2} A_i f(t_i). (9)

The approximation \sum_{i=0}^{2} A_i f(t_i) in (9) is known as a numerical integration formula. Observe that once the evaluation points t0, t1, t2 are selected, the scalars A0, A1, A2 are determined by system (7). The coefficient matrix for system (7) has the form

A = \begin{bmatrix} 1 & 1 & 1 \\ t_0 & t_1 & t_2 \\ t_0^2 & t_1^2 & t_2^2 \end{bmatrix},

and so we see that A is nonsingular since A is the transpose of a Vandermonde matrix (recall matrix (4d)).

In general, if t0, t1, . . . , tn are n + 1 points in [a, b], we can proceed exactly as in the derivation of formula (9) and produce a numerical integration formula of the form

\int_a^b f(t)\, dt ≈ \sum_{i=0}^{n} A_i f(t_i). (10)


The weights Ai in formula (10) would be determined by solving the Vandermonde system:

A_0 + A_1 + · · · + A_n = \int_a^b 1\, dt
A_0 t_0 + A_1 t_1 + · · · + A_n t_n = \int_a^b t\, dt
...
A_0 t_0^n + A_1 t_1^n + · · · + A_n t_n^n = \int_a^b t^n\, dt. (11)

The approximation \sum_{i=0}^{n} A_i f(t_i) is the same number that would be produced by calculating the polynomial p of degree n that interpolates f at t0, t1, . . . , tn and then evaluating \int_a^b p(t)\, dt.

Example 6 For an interval [a, b] let t0 = a, t1 = (a + b)/2, and t2 = b. Construct the corresponding numerical integration formula.

Solution For t0 = a, t1 = (a + b)/2, and t2 = b, the system to be solved is given by (11) with n = 2. We write system (11) as Cx = d, where

C = \begin{bmatrix} 1 & 1 & 1 \\ a & t_1 & b \\ a^2 & t_1^2 & b^2 \end{bmatrix} and d = \begin{bmatrix} b − a \\ (b^2 − a^2)/2 \\ (b^3 − a^3)/3 \end{bmatrix}.

It can be shown (see Exercise 23) that the solution of Cx = d is A0 = (b − a)/6, A1 = 4(b − a)/6, A2 = (b − a)/6. The corresponding numerical integration formula is

\int_a^b f(t)\, dt ≈ [(b − a)/6]{f(a) + 4f[(a + b)/2] + f(b)}. (12)

The reader may be familiar with the preceding approximation, which is known as Simpson's rule.

Example 7 Use the integration formula (12) to approximate the integral

I(f) = \int_0^{1/2} \cos(\pi t^2/2)\, dt.

Solution With a = 0 and b = 1/2, formula (12) becomes

I(f) ≈ (1/12)[cos(0) + 4 cos(π/32) + cos(π/8)]
= (1/12)[1.0 + 4(0.995184 . . .) + 0.923879 . . .]
= 0.492051 . . . .

Note that in Example 7, the number I(f) is equal to C(0.5), where C(x) denotes the Fresnel integral

C(x) = \int_0^x \cos(\pi t^2/2)\, dt.


The function C(x) is important in applied mathematics, and extensive tables of the function C(x) are available. The integrand is not a standard form, and C(x) must be evaluated numerically. From a table, C(0.5) = 0.49223442 . . . .
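Formula (12) is a one-line computation. Here is a minimal Python sketch of Example 7 (NumPy is our assumption for illustration, not part of the text):

import numpy as np

# Simpson's rule (12) applied to the Fresnel integrand on [0, 1/2].
f = lambda t: np.cos(np.pi * t**2 / 2)
a, b = 0.0, 0.5

approx = ((b - a) / 6) * (f(a) + 4 * f((a + b) / 2) + f(b))
print(approx)   # approximately 0.492051; the tabulated value is C(0.5) = 0.49223442...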

Numerical Differentiation

Numerical differentiation formulas can also be derived in the same fashion as numerical integration formulas. In particular, suppose that f is a differentiable function and we wish to estimate the value f′(a), where f is differentiable at t = a.

Let p be the polynomial of degree n that interpolates f at t0, t1, . . . , tn, where the interpolation nodes ti are clustered near t = a. Then p provides us with an approximation for f, and we can estimate the value f′(a) by evaluating the derivative of p at t = a:

f′(a) ≈ p′(a).

As with a numerical integration formula, it can be shown that the value p′(a) can be expressed as

p′(a) = A_0 p(t_0) + A_1 p(t_1) + · · · + A_n p(t_n). (13)

In formula (13), the weights Ai are determined by the system of equations

q_0′(a) = A_0 q_0(t_0) + A_1 q_0(t_1) + · · · + A_n q_0(t_n)
q_1′(a) = A_0 q_1(t_0) + A_1 q_1(t_1) + · · · + A_n q_1(t_n)
...
q_n′(a) = A_0 q_n(t_0) + A_1 q_n(t_1) + · · · + A_n q_n(t_n),

where q0(t) = 1, q1(t) = t, . . . , qn(t) = t^n. So if formula (13) holds for the n + 1 special polynomials 1, t, . . . , t^n, then it holds for every polynomial p of degree n or less.

If p interpolates f at t0, t1, . . . , tn so that p(ti) = f(ti), 0 ≤ i ≤ n, then (by formula (13)) the approximation f′(a) ≈ p′(a) leads to

f′(a) ≈ A_0 f(t_0) + A_1 f(t_1) + · · · + A_n f(t_n). (14)

An approximation of the form (14) is called a numerical differentiation formula.

Example 8 Derive a numerical differentiation formula of the form

f′(a) ≈ A_0 f(a − h) + A_1 f(a) + A_2 f(a + h).

Solution The weights A0, A1, and A2 are determined by forcing Eq. (13) to hold for p(t) = 1, p(t) = t, and p(t) = t^2. Thus the weights are found by solving the system

[p(t) = 1]    0 = A_0 + A_1 + A_2
[p(t) = t]    1 = A_0(a − h) + A_1(a) + A_2(a + h)
[p(t) = t^2]  2a = A_0(a − h)^2 + A_1(a)^2 + A_2(a + h)^2.


In matrix form, the system above can be expressed as Cx = d, where

C = \begin{bmatrix} 1 & 1 & 1 \\ a − h & a & a + h \\ (a − h)^2 & a^2 & (a + h)^2 \end{bmatrix} and d = \begin{bmatrix} 0 \\ 1 \\ 2a \end{bmatrix}.

By (4d), the matrix C is nonsingular and (see Exercise 24) the solution is A0 = −1/(2h), A1 = 0, A2 = 1/(2h). The numerical differentiation formula has the form

f′(a) ≈ [f(a + h) − f(a − h)]/(2h). (15)

(Note: Formula (15) in this example is known as the centered-difference approximation to f′(a).)

The same techniques can be used to derive formulas for estimating higher derivatives.

Example 9 Construct a numerical differentiation formula of the form

f′′(a) ≈ A_0 f(a) + A_1 f(a + h) + A_2 f(a + 2h) + A_3 f(a + 3h).

Solution The weights A0, A1, A2, and A3 are determined by forcing the preceding approximation to be an equality for p(t) = 1, p(t) = t, p(t) = t^2, and p(t) = t^3. These constraints lead to the equations

[p(t) = 1]    0 = A_0 + A_1 + A_2 + A_3
[p(t) = t]    0 = A_0(a) + A_1(a + h) + A_2(a + 2h) + A_3(a + 3h)
[p(t) = t^2]  2 = A_0(a)^2 + A_1(a + h)^2 + A_2(a + 2h)^2 + A_3(a + 3h)^2
[p(t) = t^3]  6a = A_0(a)^3 + A_1(a + h)^3 + A_2(a + 2h)^3 + A_3(a + 3h)^3.

Since this system is a bit cumbersome to solve by hand, we decided to use the computer algebra system Derive. (Because the coefficient matrix has symbolic rather than numerical entries, we had to use a computer algebra system rather than numerical software such as MATLAB. In particular, Derive is a popular computer algebra system that is menu-driven and very easy to use.)

Figure 1.19 shows the results from Derive. Line 2 gives the command to row reduce the augmented matrix for the system. Line 3 gives the results. Therefore, the numerical differentiation formula is

f′′(a) ≈ (1/h^2)[2f(a) − 5f(a + h) + 4f(a + 2h) − f(a + 3h)].


Figure 1.19 Using Derive to solve the system of equations in Example 9
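Readers without Derive can reproduce the row reduction symbolically with other computer algebra systems. The sketch below uses SymPy (a free alternative we substitute here purely for illustration; it is not the system used in the text).

import sympy as sp

# Row reduce the augmented matrix of the Example 9 system symbolically.
a, h = sp.symbols('a h')
t = [a, a + h, a + 2*h, a + 3*h]

M = sp.Matrix([[1, 1, 1, 1, 0],
               [t[0], t[1], t[2], t[3], 0],
               [t[0]**2, t[1]**2, t[2]**2, t[3]**2, 2],
               [t[0]**3, t[1]**3, t[2]**3, t[3]**3, 6*a]])

weights = sp.simplify(M.rref()[0][:, -1])   # last column of the rref holds A0..A3
print(weights.T)                            # Matrix([[2/h**2, -5/h**2, 4/h**2, -1/h**2]])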

1.8 EXERCISES

In Exercises 1–6, find the interpolating polynomial for the given table of data. [Hint: If the data table has k entries, the interpolating polynomial will be of degree k − 1 or less.]

1. t: 0, 1, 2; y: −1, 3, 6
2. t: −1, 0, 2; y: 6, 1, −3
3. t: −1, 1, 2; y: 1, 5, 7
4. t: 1, 3, 4; y: 5, 11, 14
5. t: −1, 0, 1, 2; y: −6, 1, 4, 15
6. t: −2, −1, 1, 2; y: −3, 1, 3, 13

In Exercises 7–10, find the constants so that the given function satisfies the given conditions.
7. y = c_1 e^{2x} + c_2 e^{3x}; y(0) = 3, y′(0) = 7
8. y = c_1 e^{x−1} + c_2 e^{3(x−1)}; y(1) = 1, y′(1) = 5
9. y = c_1 e^{−x} + c_2 e^{x} + c_3 e^{2x}; y(0) = 8, y′(0) = 3, y′′(0) = 11
10. y = c_1 e^{x} + c_2 e^{2x} + c_3 e^{3x}; y(0) = −1, y′(0) = −3, y′′(0) = −5

As in Example 6, find the weights Ai for the numerical integration formulas listed in Exercises 11–16. [Note: It can be shown that the special formulas developed in Exercises 11–16 can be translated to any interval of the general form [a, b]. Similarly, the numerical differentiation formulas in Exercises 17–22 can also be translated.]


11. \int_0^{3h} f(t)\, dt ≈ A_0 f(h) + A_1 f(2h)
12. \int_0^{h} f(t)\, dt ≈ A_0 f(0) + A_1 f(h)
13. \int_0^{3h} f(t)\, dt ≈ A_0 f(0) + A_1 f(h) + A_2 f(2h) + A_3 f(3h)
14. \int_0^{4h} f(t)\, dt ≈ A_0 f(h) + A_1 f(2h) + A_2 f(3h)
15. \int_0^{h} f(t)\, dt ≈ A_0 f(−h) + A_1 f(0)
16. \int_0^{h} f(t)\, dt ≈ A_0 f(−h) + A_1 f(0) + A_2 f(h)

As in Example 8, find the weights for the numerical differentiation formulas in Exercises 17–22. For Exercises 21 and 22, replace p′(a) in formula (13) by p′′(a).

17. f′(0) ≈ A_0 f(0) + A_1 f(h)
18. f′(0) ≈ A_0 f(−h) + A_1 f(0)
19. f′(0) ≈ A_0 f(0) + A_1 f(h) + A_2 f(2h)
20. f′(0) ≈ A_0 f(0) + A_1 f(h) + A_2 f(2h) + A_3 f(3h)
21. f′′(0) ≈ A_0 f(−h) + A_1 f(0) + A_2 f(h)
22. f′′(0) ≈ A_0 f(0) + A_1 f(h) + A_2 f(2h)
23. Complete the calculations in Example 6 by transforming the augmented matrix [C | d] to reduced echelon form.
24. Complete the calculations in Example 8 by transforming the augmented matrix [C | d] to reduced echelon form.

25. Let p denote the quadratic polynomial defined by p(t) = at^2 + bt + c, where a, b, and c are real numbers. Use Rolle's theorem to prove the following: If t0, t1, and t2 are real numbers such that t0 < t1 < t2 and if p(t0) = 0, p(t1) = 0, and p(t2) = 0, then a = b = c = 0. (Recall that Rolle's theorem states there are values u1 and u2 such that u1 is in (t0, t1), u2 is in (t1, t2), p′(u1) = 0, and p′(u2) = 0.)

26. Use mathematical induction to prove that a polynomial of the form p(t) = an t^n + · · · + a1 t + a0 can have n + 1 distinct real zeros only if an = an−1 = · · · = a1 = a0 = 0. [Hint: Use Rolle's theorem, as in Exercise 25.]

Exercises 27–33 concern Hermite interpolation, where Hermite interpolation means the process of constructing polynomials that match both function values and derivative values.

In Exercises 27–30, find a polynomial p of the form p(t) = at^3 + bt^2 + ct + d that satisfies the given conditions.
27. p(0) = 2, p′(0) = 3, p(1) = 8, p′(1) = 10
28. p(0) = 1, p′(0) = 2, p(1) = 4, p′(1) = 4
29. p(−1) = −1, p′(−1) = 5, p(1) = 9, p′(1) = 9
30. p(1) = 3, p′(1) = 4, p(2) = 15, p′(2) = 22
31. Suppose that t0 and t1 are distinct real numbers, where t0 < t1. Prove: If p is any polynomial of the form p(t) = at^3 + bt^2 + ct + d, where p(t0) = p(t1) = 0 and p′(t0) = p′(t1) = 0, then a = b = c = d = 0. [Hint: Apply Rolle's theorem.]

32. Suppose t0 and t1 are distinct real numbers, where t0 < t1. Suppose y0, y1, s0, and s1 are given real numbers. Prove that there is one, and only one, polynomial p of the form p(t) = at^3 + bt^2 + ct + d such that p(t0) = y0, p′(t0) = s0, p(t1) = y1, and p′(t1) = s1. [Hint: Set up a system of four equations corresponding to the four interpolation constraints. Use Exercise 31 to show that the coefficient matrix is nonsingular.]

33. Let t0 < t1 < · · · < tn be n + 1 distinct real numbers. Let y0, y1, . . . , yn and s0, s1, . . . , sn be given real numbers. Show that there is one, and only one, polynomial p of degree 2n + 1 or less such that p(ti) = yi, 0 ≤ i ≤ n, and p′(ti) = si, 0 ≤ i ≤ n. [Hint: As in Exercise 31, show that all the coefficients of p are zero if yi = si = 0, 0 ≤ i ≤ n. Next, as in Exercise 32, write the system of equations corresponding to the interpolation constraints and verify that the coefficient matrix is nonsingular.]

In Exercises 34 and 35, use linear algebra software, such as Derive, to construct the formula.

34. \int_0^{5h} f(x)\, dx ≈ \sum_{j=0}^{5} A_j f(jh)
35. f′(a) ≈ A_0 f(a − 2h) + A_1 f(a − h) + A_2 f(a) + A_3 f(a + h) + A_4 f(a + 2h)


1.9 MATRIX INVERSES AND THEIR PROPERTIES

In the preceding sections the matrix equation

Ax = b (1)

has been used extensively to represent a system of linear equations. Equation (1) looks, symbolically, like the single linear equation

ax = b, (2)

where a and b are real numbers. Since Eq. (2) has the unique solution

x = a^{−1} b

when a ≠ 0, it is natural to ask whether Eq. (1) can also be solved as

x = A^{−1} b.

In this section we investigate this question. We begin by defining the inverse of a matrix, showing how to calculate it, and then showing how the inverse can be used to solve systems of the form Ax = b.

The Matrix Inverse

For a nonzero real number a, the inverse of a is the unique real number a^{−1} having the property that

a^{−1} a = a a^{−1} = 1. (3)

In Eq. (3), the number 1 is the multiplicative identity for real number multiplication. In an analogous fashion, let A be an (n × n) matrix. We now ask if we can find a matrix A^{−1} with the property that

A^{−1} A = A A^{−1} = I. (4)

(In Eq. (4), I denotes the (n × n) identity matrix; see Section 1.6.)

We formalize the idea suggested by Eq. (4) in the next definition. Note that the commutativity condition A^{−1}A = AA^{−1} means that A and A^{−1} must be square and of the same size; see Exercise 75.

Definition 13 Let A be an (n × n) matrix. We say that A is invertible if we can find an (n × n) matrix A^{−1} such that

A^{−1} A = A A^{−1} = I.

The matrix A^{−1} is called an inverse for A.

(Note: It is shown in Exercise 77 that if A is invertible, then A^{−1} is unique.)

As an example of an invertible matrix, consider

A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.


It is simple to show that A is invertible and that A^{−1} is given by

A^{−1} = \begin{bmatrix} −2 & 1 \\ 3/2 & −1/2 \end{bmatrix}.

(To show that the preceding matrix is indeed the inverse of A, we need only form the products A^{−1}A and AA^{−1} and then verify that both products are equal to I.)

Not every square matrix is invertible, as the next example shows.

Example 1 Let A be the (2 × 2) matrix

A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}.

Show that A has no inverse.

Solution An inverse for A must be a (2 × 2) matrix

B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

such that AB = BA = I. If such a matrix B exists, it must satisfy the following equation:

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a + 2c & b + 2d \\ 3a + 6c & 3b + 6d \end{bmatrix}.

The preceding equation requires that a + 2c = 1 and 3a + 6c = 0. Since 3a + 6c = 3(a + 2c), these two conditions cannot both hold, so A has no inverse.

Using Inverses to Solve Systems of Linear Equations

One major use of inverses is to solve systems of linear equations. In particular, consider the equation

Ax = b, (5)

where A is an (n × n) matrix and where A^{−1} exists. Then, to solve Ax = b, we might think of multiplying both sides of the equation by A^{−1}:

A^{−1} A x = A^{−1} b

or

x = A^{−1} b.

The preceding calculations suggest the following: To solve Ax = b we need only compute the vector x given by

x = A^{−1} b. (6)

To verify that the vector x = A^{−1} b is indeed a solution, we need only insert it into the equation:

Ax = A(A^{−1} b)
= (A A^{−1}) b   (by associativity of multiplication)
= I b            (by Definition 13)
= b.             (because I is the identity matrix)


Existence of Inverses

As we saw earlier in Example 1, some matrices do not have an inverse. We now turn our attention to determining exactly which matrices are invertible. In the process, we will also develop a simple algorithm for calculating A^{−1}.

Let A be an (n × n) matrix. If A does have an inverse, then that inverse is an (n × n) matrix B such that

AB = I. (7a)

(Of course, to be an inverse, the matrix B must also satisfy the condition BA = I. We will put this additional requirement aside for the moment and concentrate solely on the condition AB = I.)

Expressing B and I in column form, the equation AB = I can be rewritten as

A[b1, b2, . . . , bn] = [e1, e2, . . . , en]

or

[Ab1, Ab2, . . . , Abn] = [e1, e2, . . . , en]. (7b)

If A has an inverse, therefore, it follows that we must be able to solve each of the following n equations:

Ax = e1
Ax = e2
...
Ax = en. (7c)

In particular, if A is invertible, then the kth column of A^{−1} can be found by solving Ax = ek, k = 1, 2, . . . , n.

We know (recall Theorem 13) that all the equations listed in (7c) can be solved if A is nonsingular. We suspect, therefore, that a nonsingular matrix always has an inverse. In fact, as is shown in Theorem 15, A has an inverse if and only if A is nonsingular.

Before stating Theorem 15, we give a lemma. (Although we do not need it here, the converse of the lemma is also valid; see Exercise 70.)

Lemma Let P, Q, and R be (n × n) matrices such that PQ = R. If either P or Q is singular, then so is R.

Proof Suppose first that Q is singular. Then there is a nonzero vector x1 such that Qx1 = θ. Therefore, using associativity of matrix multiplication,

Rx1 = (PQ)x1 = P(Qx1) = Pθ = θ.

So, Q singular implies R is singular.

Now, suppose Q is nonsingular but the other factor, P, is singular. Then there is a nonzero vector x1 such that Px1 = θ. Also, Q nonsingular means we can find a vector x2 such that Qx2 = x1. (In addition, note that x2 must be nonzero because x1 is nonzero.) Therefore,

Rx2 = (PQ)x2 = P(Qx2) = Px1 = θ.

Thus, if either P or Q is singular, then the product PQ is also singular.

We are now ready to characterize invertible matrices.

Theorem 15 Let A be an (n × n) matrix. Then A has an inverse if and only if A is nonsingular.

Proof Suppose first that A has an inverse. That is, as in equation (7a), there is a matrix B such that AB = I. Now, as Exercise 74 proves, I is nonsingular. Therefore, by the lemma, neither A nor B can be singular. This argument shows that invertibility implies nonsingularity.

For the converse, suppose A is nonsingular. Since A is nonsingular, we see from equations (7a)–(7c) that there is a unique matrix B such that AB = I. This matrix B will be the inverse of A if we can show that A and B commute; that is, if we can also show that BA = I.

We will use a common algebraic trick to prove BA = I. First of all, note that the matrix B must also be nonsingular since AB = I. Therefore, just as with equations (7a)–(7c), there is a matrix C such that BC = I. Then, combining the expressions AB = I and BC = I, we obtain

A = AI = A(BC) = (AB)C = IC = C.

Since A = C, we also have BA = BC = I. Therefore, BA = I, and this shows that B is the inverse of A. Hence, A nonsingular implies that A is invertible.

Calculating the Inverse

In this subsection we give a simple algorithm for finding the inverse of a matrix A, provided that A has an inverse. The algorithm is based on the system of equations (7c):

Ax = e1, Ax = e2, . . . , Ax = en.

We first observe that there is a very efficient way to organize the solution of these n systems; we simply row reduce the associated augmented matrix [A | e1, e2, . . . , en]. The procedure is illustrated in the next example.

Example 2 Find the inverse of the (3 × 3) matrix

A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & −1 & 10 \end{bmatrix}.


Solution The augmented matrix [A | e1, e2, e3] is given by

\begin{bmatrix} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 4 & 0 & 1 & 0 \\ 1 & −1 & 10 & 0 & 0 & 1 \end{bmatrix}.

(Note that the augmented matrix has the form [A | I].)

We now perform appropriate row operations to transform [A | I] to reduced echelon form.

R2 − 2R1, R3 − R1:
\begin{bmatrix} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & −2 & −2 & 1 & 0 \\ 0 & −3 & 7 & −1 & 0 & 1 \end{bmatrix}

R1 − 2R2, R3 + 3R2:
\begin{bmatrix} 1 & 0 & 7 & 5 & −2 & 0 \\ 0 & 1 & −2 & −2 & 1 & 0 \\ 0 & 0 & 1 & −7 & 3 & 1 \end{bmatrix}

R1 − 7R3, R2 + 2R3:
\begin{bmatrix} 1 & 0 & 0 & 54 & −23 & −7 \\ 0 & 1 & 0 & −16 & 7 & 2 \\ 0 & 0 & 1 & −7 & 3 & 1 \end{bmatrix}.

Having the reduced echelon form above, we easily find the solutions of the three systems Ax = e1, Ax = e2, Ax = e3. In particular, Ax = e1 has solution x1 = [54, −16, −7]^T, Ax = e2 has solution x2 = [−23, 7, 3]^T, and Ax = e3 has solution x3 = [−7, 2, 1]^T. Therefore, A^{−1} = [x1, x2, x3], or

A^{−1} = \begin{bmatrix} 54 & −23 & −7 \\ −16 & 7 & 2 \\ −7 & 3 & 1 \end{bmatrix}.

The procedure illustrated in Example 2 can be summarized by the following algorithm for calculating A^{−1}.


Computation of A^{−1}

To calculate the inverse of a nonsingular (n × n) matrix A, we can proceed as follows:
Step 1. Form the (n × 2n) matrix [A | I].
Step 2. Use elementary row operations to transform [A | I] to the form [I | B].
Step 3. Reading from this final form, A^{−1} = B.

(Note: Step 2 of the algorithm above assumes that [A | I] can always be row reduced to the form [I | B] when A is nonsingular. This is indeed the case, and we ask you to prove it in Exercise 76 by showing that the reduced echelon form for any nonsingular matrix A is I. In fact, Exercise 76 actually establishes the stronger result listed next in Theorem 16.)
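The three-step algorithm translates directly into code. The following Python/NumPy sketch is ours, with partial pivoting added as a practical assumption beyond the algorithm's statement; it row reduces [A | I] to [I | B] and reads off B = A^{−1}.

import numpy as np

def inverse_by_row_reduction(A):
    """Row reduce [A | I] to [I | B]; assumes A is nonsingular."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])      # Step 1: form [A | I]
    for k in range(n):                               # Step 2: reduce to [I | B]
        pivot = k + np.argmax(np.abs(M[k:, k]))      # partial pivoting
        M[[k, pivot]] = M[[pivot, k]]
        M[k] = M[k] / M[k, k]
        for i in range(n):
            if i != k:
                M[i] = M[i] - M[i, k] * M[k]
    return M[:, n:]                                  # Step 3: read off B

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 4.0],
              [1.0, -1.0, 10.0]])
print(inverse_by_row_reduction(A))   # matches A^{-1} found in Example 2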

Theorem 16 Let A be an (n × n) matrix. Then A is nonsingular if and only if A is row equivalent to I.

The next example illustrates the algorithm for calculating A^{−1} and also illustrates how to compute the solution to Ax = b by forming x = A^{−1} b.

Example 3 Consider the system of equations

x1 + 2x2 = −1
2x1 + 5x2 = −10.

(a) Use the algorithm to find the inverse of the coefficient matrix A.
(b) Use the inverse to calculate the solution of the system.

Solution

(a) We begin by forming the (2 × 4) matrix [A | I],

[A | I] = \begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 5 & 0 & 1 \end{bmatrix}.

We next row reduce [A | I] to [I | B] as follows:

R2 − 2R1:
\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & −2 & 1 \end{bmatrix}

R1 − 2R2:
\begin{bmatrix} 1 & 0 & 5 & −2 \\ 0 & 1 & −2 & 1 \end{bmatrix}.


Thus, A^{−1} is the matrix

\begin{bmatrix} 5 & −2 \\ −2 & 1 \end{bmatrix}.

(b) The solution to the system is x = A^{−1} b, where

b = \begin{bmatrix} −1 \\ −10 \end{bmatrix}.

Now, A^{−1} b = [15, −8]^T, so the solution is x1 = 15, x2 = −8.

Inverses for (2 × 2) Matrices

There is a simple formula for the inverse of a (2 × 2) matrix, which we give in the remark that follows.

Remark Let A be a (2 × 2) matrix,

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},

and set Δ = ad − bc.

(a) If Δ = 0, then A does not have an inverse.
(b) If Δ ≠ 0, then A has an inverse given by

A^{−1} = (1/Δ) \begin{bmatrix} d & −b \\ −c & a \end{bmatrix}. (8)

Part (a) of the remark is Exercise 69. To verify the formula given in (b), suppose that Δ ≠ 0, and define B to be the matrix

B = (1/Δ) \begin{bmatrix} d & −b \\ −c & a \end{bmatrix}.

Then

BA = (1/Δ) \begin{bmatrix} d & −b \\ −c & a \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = (1/Δ) \begin{bmatrix} ad − bc & 0 \\ 0 & ad − bc \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.

Similarly, AB = I, so B = A^{−1}.

The reader familiar with determinants will recognize the number Δ in the remark as the determinant of the matrix A. We make use of the remark in the following example.

Example 4 Let A and B be given by

A = \begin{bmatrix} 6 & 8 \\ 3 & 4 \end{bmatrix} and B = \begin{bmatrix} 1 & 7 \\ 3 & 5 \end{bmatrix}.

For each matrix, determine whether an inverse exists and calculate the inverse if it does exist.


Solution For the matrix A, the number Δ is

Δ = 6(4) − 8(3) = 0,

so, by the remark, A cannot have an inverse. For the matrix B, the number Δ is

Δ = 1(5) − 7(3) = −16.

According to formula (8),

B^{−1} = −(1/16) \begin{bmatrix} 5 & −7 \\ −3 & 1 \end{bmatrix}.

Example 5 Consider the matrix

A = \begin{bmatrix} λ & 2 \\ 2 & λ − 3 \end{bmatrix}.

For what values of λ is the matrix A nonsingular? Find A^{−1} if A is nonsingular.

Solution The number Δ is given by

Δ = λ(λ − 3) − 4 = λ^2 − 3λ − 4 = (λ − 4)(λ + 1).

Thus, A is singular if and only if λ = 4 or λ = −1. For values other than these two, A^{−1} is given by

A^{−1} = \frac{1}{λ^2 − 3λ − 4} \begin{bmatrix} λ − 3 & −2 \\ −2 & λ \end{bmatrix}.

Properties of Matrix Inverses

The following theorem lists some of the properties of matrix inverses.

Theorem 17 Let A and B be (n × n) matrices, each of which has an inverse. Then:

1. A^{−1} has an inverse, and (A^{−1})^{−1} = A.
2. AB has an inverse, and (AB)^{−1} = B^{−1} A^{−1}.
3. If k is a nonzero scalar, then kA has an inverse, and (kA)^{−1} = (1/k) A^{−1}.
4. A^T has an inverse, and (A^T)^{−1} = (A^{−1})^T.

Proof

1. Since A A^{−1} = A^{−1} A = I, the inverse of A^{−1} is A; that is, (A^{−1})^{−1} = A.
2. Note that (AB)(B^{−1} A^{−1}) = A(B B^{−1}) A^{−1} = A(I A^{−1}) = A A^{−1} = I. Similarly, (B^{−1} A^{−1})(AB) = I, so, by Definition 13, B^{−1} A^{−1} is the inverse for AB. Thus (AB)^{−1} = B^{−1} A^{−1}.
3. The proof of property 3 is similar to the proofs given for properties 1 and 2 and is left as an exercise.


4. It follows from Theorem 10, property 2, of Section 1.6 that A^T (A^{−1})^T = (A^{−1} A)^T = I^T = I. Similarly, (A^{−1})^T A^T = I. Therefore, A^T has inverse (A^{−1})^T.

Note that the familiar formula (ab)^{−1} = a^{−1} b^{−1} for real numbers is valid only because multiplication of real numbers is commutative. We have already noted that matrix multiplication is not commutative, so, as the following example demonstrates, (AB)^{−1} ≠ A^{−1} B^{−1} in general.

Example 6 Let A and B be the (2 × 2) matrices

A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} and B = \begin{bmatrix} 3 & −2 \\ 1 & −1 \end{bmatrix}.

1. Use formula (8) to calculate A^{−1}, B^{−1}, and (AB)^{−1}.
2. Use Theorem 17, property 2, to calculate (AB)^{−1}.
3. Show that (AB)^{−1} ≠ A^{−1} B^{−1}.

Solution For A the number Δ is Δ = 1(4) − 3(2) = −2, so by formula (8)

A^{−1} = \begin{bmatrix} −2 & 3/2 \\ 1 & −1/2 \end{bmatrix}.

For B the number Δ is 3(−1) − 1(−2) = −1, so

B^{−1} = \begin{bmatrix} 1 & −2 \\ 1 & −3 \end{bmatrix}.

The product AB is given by

AB = \begin{bmatrix} 6 & −5 \\ 10 & −8 \end{bmatrix},

so by formula (8)

(AB)^{−1} = \begin{bmatrix} −4 & 5/2 \\ −5 & 3 \end{bmatrix}.

By Theorem 17, property 2,

(AB)^{−1} = B^{−1} A^{−1} = \begin{bmatrix} 1 & −2 \\ 1 & −3 \end{bmatrix} \begin{bmatrix} −2 & 3/2 \\ 1 & −1/2 \end{bmatrix} = \begin{bmatrix} −4 & 5/2 \\ −5 & 3 \end{bmatrix}.

Finally,

A^{−1} B^{−1} = \begin{bmatrix} −2 & 3/2 \\ 1 & −1/2 \end{bmatrix} \begin{bmatrix} 1 & −2 \\ 1 & −3 \end{bmatrix} = \begin{bmatrix} −1/2 & −1/2 \\ 1/2 & −1/2 \end{bmatrix} ≠ (AB)^{−1}.

The following theorem summarizes some of the important properties of nonsingular matrices.


Theorem 18 Let A be an (n × n) matrix. The following are equivalent:

(a) A is nonsingular; that is, the only solution of Ax = θ is x = θ.
(b) The column vectors of A are linearly independent.
(c) Ax = b always has a unique solution.
(d) A has an inverse.
(e) A is row equivalent to I.

Ill-Conditioned Matrices

In applications the equation Ax = b often serves as a mathematical model for a physical problem. In these cases it is important to know whether solutions to Ax = b are sensitive to small changes in the right-hand side b. If small changes in b can lead to relatively large changes in the solution x, then the matrix A is called ill-conditioned.

The concept of an ill-conditioned matrix is related to the size of A^{−1}. This connection is explained after the next example.

Example 7 The (n × n) Hilbert matrix is the matrix whose ijth entry is 1/(i + j − 1). For example, the (3 × 3) Hilbert matrix is

\begin{bmatrix} 1 & 1/2 & 1/3 \\ 1/2 & 1/3 & 1/4 \\ 1/3 & 1/4 & 1/5 \end{bmatrix}.

Let A denote the (6 × 6) Hilbert matrix, and consider two right-hand sides b and b + Δb that agree in every component except the fourth, where b has the entry 1.4141 and b + Δb has the entry 1.41421. Note that b and b + Δb differ only slightly in their fourth components. Compare the solutions of Ax = b and Ax = b + Δb.

Solution We used MATLAB to solve these two equations. If x1 denotes the solution of Ax = b, and x2 denotes the solution of Ax = b + Δb, the results are (rounded to the nearest integer):

x1 = [−6538, 185706, −1256237, 3271363, −3616326, 1427163]^T

and

x2 = [−6539, 185747, −1256519, 3272089, −3617120, 1427447]^T.


(Note: Despite the fact that b and b + Δb are nearly equal, x1 and x2 differ by almost 800 in their fifth components.)

Example 7 illustrates that the solutions of Ax = b and Ax = b + Δb may be quite different even though Δb is a small vector. In order to explain these differences, let x1 denote the solution of Ax = b and x2 the solution of Ax = b + Δb. Therefore, Ax1 = b and Ax2 = b + Δb. To assess the difference, x2 − x1, we proceed as follows:

Ax2 − Ax1 = (b + Δb) − b = Δb.

Therefore, A(x2 − x1) = Δb, or

x2 − x1 = A^{−1} Δb.

If A^{−1} contains large entries, then we see from the equation above that x2 − x1 can be large even though Δb is small.

The Hilbert matrices described in Example 7 are well-known examples of ill-conditioned matrices and have large inverses. For example, the inverse of the (6 × 6) Hilbert matrix is

A^{−1} = \begin{bmatrix} 36 & −630 & 3360 & −7560 & 7560 & −2772 \\ −630 & 14700 & −88200 & 211680 & −220500 & 83160 \\ 3360 & −88200 & 564480 & −1411200 & 1512000 & −582120 \\ −7560 & 211680 & −1411200 & 3628800 & −3969000 & 1552320 \\ 7560 & −220500 & 1512000 & −3969000 & 4410000 & −1746360 \\ −2772 & 83160 & −582120 & 1552320 & −1746360 & 698544 \end{bmatrix}.

Because of the large entries in A^{−1}, we should not be surprised at the large difference between x1 and x2, the two solutions in Example 7.
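The sensitivity seen in Example 7 is often summarized by a single number, the condition number of A, roughly the factor by which small relative changes in b can be amplified in the solution x. A Python sketch using SciPy's Hilbert-matrix constructor (our illustration; the text's computations were done in MATLAB):

import numpy as np
from scipy.linalg import hilbert

A = hilbert(6)               # the (6 x 6) Hilbert matrix
print(np.linalg.cond(A))     # about 1.5e7: tiny changes in b can be
                             # amplified enormously in the solution x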

1.9 EXERCISES

In Exercises 1–4, verify that B is the inverse of A by showing that AB = BA = I.

1. A = \begin{bmatrix} 7 & 4 \\ 5 & 3 \end{bmatrix}, B = \begin{bmatrix} 3 & −4 \\ −5 & 7 \end{bmatrix}
2. A = \begin{bmatrix} 3 & 10 \\ 2 & 10 \end{bmatrix}, B = \begin{bmatrix} 1 & −1 \\ −.2 & .3 \end{bmatrix}
3. A = \begin{bmatrix} −1 & −2 & 11 \\ 1 & 3 & −15 \\ 0 & −1 & 5 \end{bmatrix}, B = \begin{bmatrix} 0 & 1 & 3 \\ 5 & 5 & 4 \\ 1 & 1 & 1 \end{bmatrix}
4. A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}, B = \begin{bmatrix} 1 & 0 & 0 \\ −2 & 1 & 0 \\ 5 & −4 & 1 \end{bmatrix}

In Exercises 5–8, use the appropriate inverse matrix from Exercises 1–4 to solve the given system of linear equations.

5. 3x1 + 10x2 = 6
   2x1 + 10x2 = 9
6. 7x1 + 4x2 = 5
   5x1 + 3x2 = 2
7. x2 + 3x3 = 4
   5x1 + 5x2 + 4x3 = 2
   x1 + x2 + x3 = 2
8. x1 = 2
   −2x1 + x2 = 3
   5x1 − 4x2 + x3 = 2

In Exercises 9–12, verify that the given matrix A does not have an inverse. [Hint: One of AB = I or BA = I leads to an easy contradiction.]

9. A = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 2 & 1 \\ 3 & 2 & 1 \end{bmatrix}
10. A = \begin{bmatrix} 0 & 4 & 2 \\ 0 & 1 & 7 \\ 0 & 3 & 9 \end{bmatrix}


11. A = \begin{bmatrix} 2 & 2 & 4 \\ 1 & 1 & 7 \\ 3 & 3 & 9 \end{bmatrix}
12. A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 2 & 3 & 2 \end{bmatrix}

In Exercises 13–21, reduce [A | I] to find A^{−1}. In each case, check your calculations by multiplying the given matrix by the derived inverse.

13. \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}
14. \begin{bmatrix} 2 & 3 \\ 6 & 7 \end{bmatrix}
15. \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}
16. \begin{bmatrix} −1 & −2 & 11 \\ 1 & 3 & −15 \\ 0 & −1 & 5 \end{bmatrix}
17. \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix}
18. \begin{bmatrix} 1 & 3 & 5 \\ 0 & 1 & 4 \\ 0 & 2 & 7 \end{bmatrix}
19. \begin{bmatrix} 1 & 4 & 2 \\ 0 & 2 & 1 \\ 3 & 5 & 3 \end{bmatrix}
20. \begin{bmatrix} 1 & −2 & 2 & 1 \\ 1 & −1 & 5 & 0 \\ 2 & −2 & 11 & 2 \\ 0 & 2 & 8 & 1 \end{bmatrix}
21. \begin{bmatrix} 1 & 2 & 3 & 1 \\ −1 & 0 & 2 & 1 \\ 2 & 1 & −3 & 0 \\ 1 & 1 & 2 & 1 \end{bmatrix}

As in Example 5, determine whether the (2 × 2) matrices in Exercises 22–26 have an inverse. If A has an inverse, find A^{−1} and verify that A^{−1} A = I.

22. A = \begin{bmatrix} −3 & 2 \\ 1 & 1 \end{bmatrix}
23. A = \begin{bmatrix} 2 & −2 \\ 2 & 3 \end{bmatrix}
24. A = \begin{bmatrix} −1 & 3 \\ 2 & 1 \end{bmatrix}
25. A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}
26. A = \begin{bmatrix} 6 & −2 \\ 9 & −3 \end{bmatrix}

In Exercises 27–28, determine the value(s) of λ for which A has an inverse.

27. A = \begin{bmatrix} λ & 4 \\ 1 & λ \end{bmatrix}
28. A = \begin{bmatrix} 1 & −2 & 3 \\ 4 & −1 & 4 \\ 2 & −3 & λ \end{bmatrix}

In Exercises 29–34, solve the given system by forming x = A^{−1} b, where A is the coefficient matrix for the system.
29. 2x1 + x2 = 4
    3x1 + 2x2 = 2
30. x1 + x2 = 0
    2x1 + 3x2 = 4
31. x1 − x2 = 5
    3x1 − 4x2 = 2
32. 2x1 + 3x2 = 1
    3x1 + 4x2 = 7
33. 3x1 + x2 = 10
    −x1 + 3x2 = 5
34. x1 − x2 = 10
    2x1 + 3x2 = 4

The following matrices are used in Exercises 35–45.

A^{−1} = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}, B = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, C^{−1} = \begin{bmatrix} −1 & 1 \\ 1 & 2 \end{bmatrix}. (9)

In Exercises 35–45, use Theorem 17 and the matrices in (9) to form Q^{−1}, where Q is the given matrix.
35. Q = AC 36. Q = CA
37. Q = A^T 38. Q = A^T C
39. Q = C^T A^T 40. Q = B^{−1} A
41. Q = CB^{−1} 42. Q = B^{−1}
43. Q = 2A 44. Q = 10C
45. Q = (AC)B^{−1}

46. Let A be the matrix given in Exercise 13. Use the inverse found in Exercise 13 to obtain matrices B and C such that AB = D and CA = E, where

D = \begin{bmatrix} −1 & 2 & 3 \\ 1 & 0 & 2 \end{bmatrix} and E = \begin{bmatrix} 2 & −1 \\ 1 & 1 \\ 0 & 3 \end{bmatrix}.

47. Repeat Exercise 46 with A being the matrix given in Exercise 16 and where

D = \begin{bmatrix} 2 & −1 \\ 1 & 1 \\ 0 & 3 \end{bmatrix} and E = \begin{bmatrix} −1 & 2 & 3 \\ 1 & 0 & 2 \end{bmatrix}.

48. For what values of a is

A = \begin{bmatrix} 1 & 1 & −1 \\ 0 & 1 & 2 \\ 1 & 1 & a \end{bmatrix}

nonsingular?

A−1 =

1 2 53 1 62 8 1

and B−1 =

3 −3 45 1 37 6 −1

.


50. Find the (3 × 3) nonsingular matrix A if A^2 = AB + 2A, where

B = \begin{bmatrix} 2 & 1 & −1 \\ 0 & 3 & 2 \\ −1 & 4 & 1 \end{bmatrix}.

51. Simplify (A^{−1}B)^{−1}(C^{−1}A)^{−1}(B^{−1}C)^{−1} for (n × n) invertible matrices A, B, and C.

52. The equation x^2 = 1 can be solved by setting x^2 − 1 = 0 and factoring the expression to obtain (x − 1)(x + 1) = 0. This yields solutions x = 1 and x = −1.
a) Using the factorization technique given above, what (2 × 2) matrix solutions do you obtain for the matrix equation X^2 = I?
b) Show that

A = \begin{bmatrix} a & 1 − a^2 \\ 1 & −a \end{bmatrix}

is a solution to X^2 = I for every real number a.
c) Let b = ±1. Show that

B = \begin{bmatrix} b & 0 \\ c & −b \end{bmatrix}

is a solution to X^2 = I for every real number c.
d) Explain why the factorization technique used in part (a) did not yield all the solutions to the matrix equation X^2 = I.

53. Suppose that A is a (2 × 2) matrix with columns uand v, so that A = [u, v], u and v in R2. Supposealso that uT u = 1, uT v = 0, and vT v = 1. Provethat ATA = I . [Hint: Express the matrix A as

A =[u1 v1

u2 v2

], u =

[u1

u2

], v

[v1

v2

]

and form the product ATA.]54. Let u be a vector in Rn such that uT u = 1. LetA = I−uuT , where I is the (n×n) identity. Verifythat AA = A. [Hint: Write the product uuT uuT asuuT uuT = u(uT u)uT, and note that uT u is a scalar.]

55. Suppose that A is an (n × n) matrix such thatAA = A, as in Exercise 54. Show that if A hasan inverse, then A = I .

56. Let A = I − avvT , where v is a nonzero vector inRn, I is the (n×n) identity, and a is the scalar givenby a = 2/(vT v). Show thatA is symmetric and thatAA = I ; that is, A−1 = A.

57. Consider the (n × n) matrix A defined in Exercise56. For x in Rn, show that the product Ax has theform Ax = x− λv, where λ is a scalar. What is thevalue of λ for a given x?

58. Suppose that A is an (n × n) matrix such thatATA = I (the matrix defined in Exercise 56 is sucha matrix). Let x be any vector in Rn. Show that‖Ax‖ = ‖x‖; that is, multiplication of x by A pro-duces a vector Ax having the same length as x.

59. Let u and v be vectors in Rn, and let I denote the (n × n) identity. Let A = I + uvT, and suppose vT u ≠ −1. Establish the Sherman–Woodbury formula:

A−1 = I − auvT,  a = 1/(1 + vT u).   (10)

[Hint: Form AA−1, where A−1 is given by formula (10).]

60. If A is a square matrix, we define the powers A^2, A^3, and so on, as follows: A^2 = AA, A^3 = A(A^2), and so on. Suppose A is an (n × n) matrix such that

A^3 − 2A^2 + 3A − I = O.

Show that AB = I, where B = A^2 − 2A + 3I.

61. Suppose that A is (n × n) and

A^2 + b1A + b0I = O,   (11)

where b0 ≠ 0. Show that AB = I, where B = (−1/b0)[A + b1I].

It can be shown that when A is a (2 × 2) matrix such that A−1 exists, then there are constants b1 and b0 such that Eq. (11) holds. Moreover, b0 ≠ 0 in Eq. (11) unless A is a multiple of I. In Exercises 62–65, find the constants b1 and b0 in Eq. (11) for the given (2 × 2) matrix. Also, verify that A−1 = (−1/b0)[A + b1I].

62. A in Exercise 13.    63. A in Exercise 15.
64. A in Exercise 14.    65. A in Exercise 22.

66. a) If linear algebra software is available, solve the

systems Ax = b1 and Ax = b2, where

A = [0.932 0.443 0.417; 0.712 0.915 0.887; 0.632 0.514 0.493],
b1 = [1; 1; −1], b2 = [1.01; 1.01; −1.01].

Note the large difference between the two solutions.


b) Calculate A−1 and use it to explain the results of part (a).

67. a) Give examples of nonsingular (2 × 2) matrices A and B such that A + B is singular.
b) Give examples of singular (2 × 2) matrices A and B such that A + B is nonsingular.

68. Let A be an (n × n) nonsingular symmetric matrix. Show that A−1 is also symmetric.

69. a) Suppose that AB = O, where A is nonsingular. Prove that B = O.
b) Find a (2 × 2) matrix B such that AB = O, where B has nonzero entries and where A is the matrix

A = [1 1; 1 1].

Why does this example not contradict part (a)?

70. Let A, B, and C be matrices such that A is nonsingular and AB = AC. Prove that B = C.

71. Let A be the (2 × 2) matrix

A = [a b; c d],

and set Δ = ad − bc. Prove that if Δ = 0, then A is singular. Conclude that A has no inverse. [Hint: Consider the vector

v = [d; −c];

also treat the special case when d = c = 0.]

72. Let A and B be (n × n) nonsingular matrices. Show that AB is also nonsingular.

73. What is wrong with the following argument that if AB is nonsingular, then each of A and B is also nonsingular?

Since AB is nonsingular, (AB)−1 exists. But by Theorem 17, property 2, (AB)−1 = B−1A−1. Therefore, A−1 and B−1 exist, so A and B are nonsingular.

74. Let A and B be (n × n) matrices such that AB is nonsingular.
a) Prove that B is nonsingular. [Hint: Suppose v is any vector such that Bv = θ, and write (AB)v as A(Bv).]
b) Prove that A is nonsingular. [Hint: By part (a), B−1 exists. Apply Exercise 72 to the matrices AB and B−1.]

75. Let A be a singular (n × n) matrix. Argue that at least one of the systems Ax = ek, k = 1, 2, . . . , n, must be inconsistent, where e1, e2, . . . , en are the n-dimensional unit vectors.

76. Show that the (n × n) identity matrix, I, is nonsingular.

77. Let A and B be matrices such that AB = BA. Show that A and B must be square and of the same order. [Hint: Let A be (p × q) and let B be (r × s). Now show that p = r and q = s.]

78. Use Theorem 3 to prove Theorem 16.

79. Let A be (n × n) and invertible. Show that A−1 is unique.

SUPPLEMENTARY EXERCISES

1. Consider the system of equations

x1 = 1
2x1 + (a^2 + a − 2)x2 = a^2 − a − 4.

For what values of a does the system have infinitely many solutions? No solutions? A unique solution in which x2 = 0?

2. Let

A = [1 −1 −1; 2 −1 1; −3 1 −3], x = [x1; x2; x3], and b = [b1; b2; b3].


a) Determine conditions on b1, b2, and b3 that are necessary and sufficient for the system of equations Ax = b to be consistent. [Hint: Reduce the augmented matrix [A | b].]

b) For each of the following choices of b, either show that the system Ax = b is inconsistent or exhibit the solution.

i) b = [1; 1; 1]     ii) b = [5; 2; 1]
iii) b = [7; 3; 1]   iv) b = [0; 1; 2]

3. Let

A = [1 −1 3; 2 −1 5; −3 5 −10; 1 0 4] and x = [x1; x2; x3].

a) Simultaneously solve each of the systems Ax = bi, i = 1, 2, 3, where

b1 = [−5; −17; 19; 24], b2 = [5; 11; −12; 8], and b3 = [1; 2; −1; 5].

b) Let B = [b1, b2, b3]. Use the results of part (a) to exhibit a (3 × 3) matrix C such that AC = B.

4. Let

A = [1 −1 3; 2 −1 4] and C = [1 2; 3 1].

Find a (3 × 2) matrix B such that AB = C.

5. Let A be the nonsingular (5 × 5) matrix

A = [A1, A2, A3, A4, A5],

and let B = [A5, A1, A4, A2, A3]. For a given vector b, suppose that [1, 3, 5, 7, 9]T is the solution to Bx = b. What is the solution of Ax = b?

6. Let

v1 = [1; 1; 3], v2 = [2; 1; 4], and v3 = [5; 2; 9].

a) Solve the vector equation x1v1 + x2v2 + x3v3 = b, where

b = [8; 5; 18].

b) Show that the set of vectors {v1, v2, v3} is linearly dependent by exhibiting a nontrivial solution to the vector equation x1v1 + x2v2 + x3v3 = θ.

7. Let

A = [1 −1 3; 2 −1 5; −1 4 −5]

and define a function T: R3 → R3 by T(x) = Ax for each

x = [x1; x2; x3]

in R3.

a) Find a vector x in R3 such that T(x) = b, where

b = [1; 3; 2].

b) If θ is the zero vector of R3, then clearly T(θ) = θ. Describe all vectors x in R3 such that T(x) = θ.

8. Let

v1 = [1; −1; 3], v2 = [2; −1; 5], and v3 = [−1; 4; −5].

Find

x = [x1; x2; x3]

so that xT v1 = 2, xT v2 = 3, and xT v3 = −4.


9. Find A−1 for each of the following matrices A:

a) A = [1 2 1; 2 5 4; 1 1 0]    b) A = [cos θ, −sin θ; sin θ, cos θ]

10. For what values of λ is the matrix

A = [λ − 4, −1; 2, λ − 1]

singular? Find A−1 if A is nonsingular.

11. Find A if A is (2 × 2) and (4A)−1 = [3 1; 5 2].

12. Find A and B if they are (2 × 2) and

A + B = [4 6; 8 10] and A − B = [2 2; 4 6].

13. Let

A = [1 0 0; 0 −1 0; 0 0 −1].

Calculate A^99 and A^100.

In Exercises 14–18, A and B are (3 × 3) matrices such that

A−1 = [2 3 5; 7 2 1; 4 −4 3] and B−1 = [−6 4 3; 7 −1 5; 2 3 1].

14. Without calculating A, solve the system of equations Ax = b, where

x = [x1; x2; x3] and b = [−1; 0; 1].

15. Without calculating A or B, find (AB)−1.
16. Without calculating A, find (3A)−1.
17. Without calculating A or B, find (ATB)−1.
18. Without calculating A or B, find [(A−1B−1)−1A−1B]−1.

CONCEPTUAL EXERCISES

In Exercises 1–8, answer true or false. Justify your answer by providing a counterexample if the statement is false or an outline of a proof if the statement is true.

1. If A and B are symmetric (n × n) matrices, then AB is also symmetric.
2. If A is an (n × n) matrix, then A + AT is symmetric.
3. If A and B are nonsingular (n × n) matrices such that A^2 = I and B^2 = I, then (AB)−1 = BA.
4. If A and B are nonsingular (n × n) matrices, then A + B is also nonsingular.
5. A consistent (3 × 2) linear system of equations can never have a unique solution.
6. If A is an (m × n) matrix such that Ax = θ for every x in Rn, then A is the (m × n) zero matrix.
7. If A is a (2 × 2) nonsingular matrix and u1 and u2 are nonzero vectors in R2, then {Au1, Au2} is linearly independent.
8. Let A be (m × n) and B be (p × q). If AB is defined and square, then BA is also defined and square.

In Exercises 9–16, give a brief answer.

9. Let P, Q, and R be nonsingular (n × n) matrices such that PQR = I. Express Q−1 in terms of P and R.


10. Suppose that each of A, B, and AB are symmetric (n × n) matrices. Show that AB = BA.

11. Let u1, u2, and u3 be nonzero vectors in Rn such that uT1 u2 = 0, uT1 u3 = 0, and uT2 u3 = 0. Show that {u1, u2, u3} is a linearly independent set.

12. Let u1 and u2 be linearly dependent vectors in R2, and let A be a (2 × 2) matrix. Show that the vectors Au1 and Au2 are linearly dependent.

13. An (n × n) matrix A is orthogonal provided that AT = A−1, that is, if AAT = ATA = I. If A is an (n × n) orthogonal matrix, then prove that ‖x‖ = ‖Ax‖ for every vector x in Rn.

14. An (n × n) matrix A is idempotent if A^2 = A. What can you say about A if it is both idempotent and nonsingular?

15. Let A and B be (n × n) idempotent matrices such that AB = BA. Show that AB is also idempotent.

16. An (n × n) matrix A is nilpotent of index k if A^k = O but A^i ≠ O for 1 ≤ i ≤ k − 1.
a) Show: If A is nilpotent of index 2 or 3, then A is singular.
b) (Optional) Show: If A is nilpotent of index k, k ≥ 2, then A is singular. [Hint: Consider a proof by contradiction.]

MATLAB EXERCISES

Exercise 1 illustrates some ideas associated with population dynamics. We will look at this topic again in Chapter 4, after we have developed the necessary analytical tools—eigenvalues and eigenvectors.

1. Population dynamics  An island is divided into three regions, A, B, and C. The yearly migration of a certain animal among these regions is described by the following table.

           From A    From B    From C
To A        70%       15%       10%
To B        15%       80%       30%
To C        15%        5%       60%

For example, the first column in the table tells us, in any given year, that 70% of the population in A remains in region A, 15% migrates to B, and 15% migrates to C.

The total population of animals on the island is expected to remain stable for the foreseeable future, and a census finds the current population consists of 300 in region A, 350 in region B, and 200 in region C. Corresponding to the migration table and the census, we define a matrix A and a vector x0:

A = [.70 .15 .10; .15 .80 .30; .15 .05 .60], x0 = [300; 350; 200].

The matrix A is called the transition matrix and the vector x0 is the initial state vector. In general, let xk = [x1, x2, x3]T denote the state vector for year k. (The state vector tells us that in year k there are x1 animals in region A, x2 in region B, and x3 in region C.) Then, using the transition matrix, we find in year k + 1 that the population distribution is given by

xk+1 = Axk. (1)
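For readers who want to experiment before working parts (a)–(g) below, here is a minimal MATLAB sketch of how Eq. (1) can be iterated; the loop bound and the printing format are our own choices, not part of the exercise.

    % Iterate x_{k+1} = A*x_k for ten years, starting from the census.
    A  = [0.70 0.15 0.10; 0.15 0.80 0.30; 0.15 0.05 0.60];
    x0 = [300; 350; 200];
    xk = x0;
    for k = 1:10
        xk = A*xk;                               % state vector for year k
        fprintf('year %2d: %7.1f %7.1f %7.1f\n', k, xk);
    end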


a) Use Eq. (1) to find the population distribution one year after the census.
b) Give a formula for xn in terms of powers of A and x0.
c) Calculate the state vectors x1, x2, . . . , x10. Observe that the population distribution seems to be reaching a steady state. Estimate the steady-state population for each region.
d) Calculate x20 and compare it with your estimate from part c).
e) Let x−1 denote the state vector one year prior to the census. Calculate x−1.
f) Demonstrate that Eq. (1) has not always been an accurate model for population distribution by calculating the state vector four years prior to the census.
g) How should we rearrange the population just after the census so that the distribution three years later is x3 = [250, 400, 200]T? That is, what should x0 be in order to hit the target x3?

We have already seen one example of a partitioned matrix (also called a block matrix) when we wrote A in column form as A = [A1, A2, . . . , An]; recall Section 1.6. Exercise 2 expands on this idea and illustrates how partitioned matrices can be multiplied in a natural way.

2. Partitioned matrices  A matrix A is a (2 × 2) block matrix if it is represented in the form

A = [A1 A2; A3 A4],

where each of the Ai are matrices. Note that the matrix A need not be a square matrix; for instance, A might be (7 × 12) with A1 being (3 × 5), A2 being (3 × 7), A3 being (4 × 5), and A4 being (4 × 7). We can imagine creating a (2 × 2) block matrix by dividing the array into four pieces using a horizontal line and a vertical line.

Now suppose B is also a (2 × 2) block matrix given by

B = [B1 B2; B3 B4].

Finally, let us suppose that the product AB can be formed and that B has been partitioned in a way such that the following matrix is defined:

[A1B1 + A2B3  A1B2 + A2B4; A3B1 + A4B3  A3B2 + A4B4].

It turns out that the product AB is given by this block matrix. That is, if all the submatrix products are defined, then we can treat the blocks in a partitioned matrix as though they were scalars when forming products. It is tedious to prove this result in general, so we ask you to illustrate its validity with some randomly chosen matrices.

a) Using the MATLAB command round(10*rand(6,6)), generate two randomly selected (6 × 6) matrices A and B. Compute the product AB. Then write each of A and B as a block matrix of the form

A = [A1 A2; A3 A4], B = [B1 B2; B3 B4].

Page 125: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

August 2, 2001 13:48 i56-ch01 Sheet number 110 Page number 110 cyan black

110 Chapter 1 Matrices and Systems of Linear Equations

Above, each Ai and Bi should be a (3 × 3) block. Using matrix surgery (see Section 4 of Appendix A), extract the Ai and Bi matrices and form the new block matrix

[A1B1 + A2B3  A1B2 + A2B4; A3B1 + A4B3  A3B2 + A4B4].

Compare the preceding block matrix with AB and confirm that they are equal. (A MATLAB sketch of this computation appears after part c) below.)

b) Repeat this calculation on three other matrices (not necessarily (6 × 6) matrices). Break some of these matrices into blocks of unequal sizes. You need to make sure that corresponding blocks are the correct size so that matrix multiplication is defined.

c) Repeat the calculation in (a) with the product of a (2 × 3) block matrix times a (3 × 3) block matrix.
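The following is one possible MATLAB sketch of the computation in part a); the block sizes are fixed at (3 × 3) and all variable names are illustrative.

    % Part (a) sketch: block multiplication agrees with ordinary multiplication.
    A = round(10*rand(6,6));   B = round(10*rand(6,6));
    A1 = A(1:3,1:3); A2 = A(1:3,4:6); A3 = A(4:6,1:3); A4 = A(4:6,4:6);
    B1 = B(1:3,1:3); B2 = B(1:3,4:6); B3 = B(4:6,1:3); B4 = B(4:6,4:6);
    C = [A1*B1 + A2*B3, A1*B2 + A2*B4; A3*B1 + A4*B3, A3*B2 + A4*B4];
    max(max(abs(C - A*B)))     % should display 0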

Exercise 3 asks you to determine how many places are lost to round-off error when Ax = b is solved on a computer.

3. This exercise expands on the topic of ill-conditioned matrices, introduced at the end of Section 1.9. In general, a mathematician speaks of a problem as being ill conditioned if small changes in the parameters of the problem lead to large changes in the solution to the problem.

Part d) of this exercise also discusses a very practical question:

How much reliance can I place in the solution to Ax = b that my computer gives me?

A reasonably precise assessment of this question can be made using the concept of a condition number for A.

An easily understood example of an ill-conditioned problem is the equation Ax = b where A is the (n × n) Hilbert matrix (see Example 7, Section 1.9 for the definition of the Hilbert matrix). When A is the Hilbert matrix, then a small change in any entry of A or a small change in any entry of b will lead to a large change in the solution of Ax = b.

Let A denote the (n × n) Hilbert matrix; in MATLAB, A can be created by the command A = hilb(n).

a) Let B denote the inverse of A, as calculated by MATLAB. For n = 8, 9, 10, 11, and 12, form the product AB and note how the product looks less and less like the identity. In order to have the results clearly displayed, you might want to use the MATLAB Bank format for your output. For each value n, list the difference of the (1, 1) entries, (AB)11 − I11. [Note that it is not MATLAB's fault that the inverse cannot be calculated with any accuracy. MATLAB's calculations are all done with 17-place arithmetic, but the Hilbert matrix is so sensitive that seventeen places are not enough.]

b) This exercise illustrates how small changes in b can sometimes dramatically shift the solution of Ax = b when A is an ill-conditioned matrix. Let A denote the (9 × 9) Hilbert matrix and let b denote the (9 × 1) column vector consisting entirely of 1's. Use MATLAB to calculate the solution u = inv(A)*b. Next change the fourth component of b to 1.001 and let v = inv(A)*b. Compare the difference between the two solution vectors u and v; what is the largest component (in absolute value) of the difference vector u − v? For ease of comparison, you might form the matrix [u, v] and display it using Bank format.

c) This exercise illustrates that different methods of solving Ax = b may lead to wildly different numerical answers in the computer when A is ill-conditioned. For A and b


as in part b), compare the solution vector u found using the MATLAB command u = inv(A)*b with the solution w obtained from the MATLAB command ww = rref([A, b]) (the vector w is the final column of ww). For comparison, display the matrix [u, w] using Bank format. What is the largest component (in absolute value) of the difference vector u − w?

d) To give a numerical measure for how ill conditioned a matrix is, mathematicians use the concept of a condition number. You can find the definition of the condition number in a numerical analysis text. The condition number has many uses, one of which is to estimate the error between a machine-calculated solution to Ax = b and the true solution. To explain, let xc denote the machine-calculated solution and let xt denote the true solution. For a machine that uses d-place arithmetic, we can bound the relative error between the true solution and the machine solution as follows:

‖xc − xt‖ / ‖xt‖ ≤ 10^−d Cond(A).   (2)

In inequality (2), Cond(A) denotes the condition number. The left-hand side of the inequality is the relative error (sometimes also called the percentage error). The relative error has the following interpretation: If the relative error is about 10^−k, then the two vectors xc and xt agree to about k places. Thus, using inequality (2), suppose Cond(A) is about 10^c and suppose we are using MATLAB so that d = 17. Then the right-hand side of inequality (2) is roughly (10^−17)(10^c) = 10^−(17−c). In other words, we might have as few as 17 − c correct places in the computer-calculated solution (we might have more than 17 − c correct places, but inequality (2) is sharp and so there will be problems for which the inequality is nearly an equality). If c = 14, for instance, then we might have as few as 3 correct places in our answer.

Test inequality (2) using the (n × n) Hilbert matrix for n = 3, 4, . . . , 9. As the vector b, use the n-dimensional vector consisting entirely of 1's. For a calculated solution, use MATLAB to calculate xc = inv(A)*b, where A is the (n × n) Hilbert matrix. For this illustration we also need to determine the true solution xt. Now, it is known that the Hilbert matrix has an inverse with only integer entries; see Example 6 in Section 1.9 for a listing of the inverse of the (6 × 6) Hilbert matrix. (In fact, there is a known formula giving the entries of Hilbert matrix inverses.) Therefore, the true solution to our problem is a vector xt that has only integer entries. The calculated solution found by MATLAB can be rounded in order to generate the true solution. Do so, using the MATLAB rounding command: xt = round(xc). Finally, the MATLAB command cond(A) will calculate the condition number for A. Prepare a table listing n, the left-hand side of inequality (2), and the right-hand side of inequality (2) with d = 17. Next, using the long format, display several of the pairs xc and xt and comment on how well the order of magnitude of the relative error compares with the number of correct places in xc.


3  The Vector Space Rn

Overview  In Chapter 2 we discussed geometric vector concepts in the familiar setting of two-space and three-space. In this chapter, we extend these concepts to n-dimensional space. For instance, we will see that lines and planes in three-space give rise to the idea of a subspace of Rn.

Many of the results in this chapter are grounded in the basic idea of linear independence that was introduced in Chapter 1. Linear independence is key, for example, to defining the concept of the dimension of a subspace or a basis for a subspace.

In turn, ideas such as subspace and basis are fundamental to modern mathematics and applications. We will see how these ideas are used to solve applied problems involving least-squares fits to data, Fourier series approximations of functions, systems of differential equations, and so forth.

In this chapter, for example, Sections 3.8 and 3.9 deal with least-squares fits to data. As we see in these two sections, methods for determining a least-squares fit (and a framework for interpreting the results of a least-squares fit) cannot be understood without a thorough appreciation of the basic topics in this chapter—subspace, basis, and dimension.

Core Sections
3.2 Vector Space Properties of Rn
3.3 Examples of Subspaces
3.4 Bases for Subspaces
3.5 Dimension
3.6 Orthogonal Bases for Subspaces


3.1 INTRODUCTION

In mathematics and the physical sciences, the term vector is applied to a wide variety of objects. Perhaps the most familiar application of the term is to quantities, such as force and velocity, that have both magnitude and direction. Such vectors can be represented in two-space or in three-space as directed line segments or arrows. (A review of geometric vectors is given in Chapter 2.) As we will see in Chapter 5, the term vector may also be used to describe objects such as matrices, polynomials, and continuous real-valued functions. In this section we demonstrate that Rn, the set of n-dimensional vectors, provides a natural bridge between the intuitive and natural concept of a geometric vector and that of an abstract vector in a general vector space. The remainder of the chapter is concerned with the algebraic and geometric structure of Rn and subsets of Rn. Some of the concepts fundamental to describing this structure are subspace, basis, and dimension. These concepts are introduced and discussed in the first few sections. Although these ideas are relatively abstract, they are easy to understand in Rn, and they also have application to concrete problems. Thus Rn will serve as an example and as a model for the study in Chapter 5 of general vector spaces.

To make the transition from geometric vectors in two-space and three-space to two-dimensional and three-dimensional vectors in R2 and R3, recall that the geometric vector, v, can be uniquely represented as a directed line segment OP, with initial point at the origin, O, and with terminal point P. If v is in two-space and point P has coordinates (a, b), then it is natural to represent v in R2 as the vector

x = [a; b].

Similarly, if v is in three-space and point P has coordinates (a, b, c), then v can be represented by the vector

x = [a; b; c]


Figure 3.1 Geometric vectors

Page 129: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 24, 2001 14:10 i56-ch03 Sheet number 3 Page number 165 cyan black

3.1 Introduction 165

in R3 (see Fig. 3.1). Under the correspondence v → x described above, the usual geometric addition of vectors translates to the standard algebraic addition in R2 and R3. Similarly, geometric multiplication by a scalar corresponds precisely to the standard algebraic scalar multiplication (see Fig. 3.2). Thus the study of R2 and R3 allows us to translate the geometric properties of vectors to algebraic properties. As we consider vectors from the algebraic viewpoint, it becomes natural to extend the concept of a vector to other objects that satisfy the same algebraic properties but for which there is no geometric representation. The elements of Rn, n ≥ 4, are an immediate example.


Figure 3.2 Addition and scalar multiplication of vectors

We conclude this section by noting a useful geometric interpretation for vectors in R2 and R3. A vector

x = [a; b]

in R2 can be represented geometrically as the point in the plane that has coordinates (a, b). Similarly, the vector

x = [a; b; c]

in R3 corresponds to the point in three-space that has coordinates (a, b, c). As the next two examples illustrate, this correspondence allows us to interpret subsets of R2 and R3 geometrically.

Example 1  Give a geometric interpretation of the subset W of R2 defined by

W = {x: x = [x1; x2], x1 + x2 = 2}.

Solution Geometrically, W is the line in the plane with equation x + y = 2 (see Fig. 3.3).



Figure 3.3 The line x + y = 2

Example 2  Let W be the subset of R3 defined by

W = {x: x = [x1; x2; 1], x1 and x2 any real numbers}.

Give a geometric interpretation of W .

Solution  Geometrically, W can be viewed as the plane in three-space with equation z = 1 (see Fig. 3.4).


Figure 3.4 The plane z = 1

3.1 EXERCISES

Exercises 1–11 refer to the vectors given in (1).

u = [3; 1], v = [1; 2], x = [0; 1; 3], y = [2; 1; 0].   (1)

In Exercises 1–11, sketch the geometric vector (with initial point at the origin) corresponding to each of the vectors given.

1. u and −u           2. v and 2v
3. u and −3u          4. v and −2v
5. u, v, and u + v
6. u, 2v, and u + 2v
7. u, v, and u − v
8. u, v, and v − u
9. x and 2x
10. y and −y
11. x, y, and x + y

In Exercises 12–17, interpret the subset W of R2 geometrically by sketching a graph for W.

12. W = {x: x = [a; b], a + b = 1}


13. W = {x: x = [x1; x2], x1 = −3x2, x2 any real number}

14. W = {w: w = [0; b], b any real number}

15. W = {u: u = [c; d], c + d ≥ 0}

16. W = {x: x = t[1; 3], t any real number}

17. W = {x: x = [a; b], a^2 + b^2 = 4}

In Exercises 18–21, interpret the subset W of R3 geometrically by sketching a graph for W.

18. W = {x: x = [a; 0; 0], a > 0}

19. W = {x: x = [x1; x2; x3], x1 = −x2 − 2x3}

20. W = {w: w = r[2; 0; 1], r any real number}

21. W = {u: u = [a; b; c], a^2 + b^2 + c^2 = 1 and c ≥ 0}

In Exercises 22–26, give a set-theoretic description of the given points as a subset W of R2.

22. The points on the line x − 2y = 1
23. The points on the x-axis
24. The points in the upper half-plane
25. The points on the line y = 2
26. The points on the parabola y = x^2

In Exercises 27–30, give a set-theoretic description of the given points as a subset W of R3.

27. The points on the plane x + y − 2z = 0
28. The points on the line with parametric equations x = 2t, y = −3t, and z = t
29. The points in the yz-plane
30. The points in the plane y = 2

3.2 VECTOR SPACE PROPERTIES OF Rn

Recall that Rn is the set of all n-dimensional vectors with real components:

Rn = {x: x = [x1; x2; . . . ; xn], x1, x2, . . . , xn real numbers}.

If x and y are elements of Rn with

x = [x1; x2; . . . ; xn] and y = [y1; y2; . . . ; yn],


then (see Section 1.5) the vector x + y is defined by

x + y = [x1 + y1; x2 + y2; . . . ; xn + yn],

and if a is a real number, then the vector ax is defined to be

ax = [ax1; ax2; . . . ; axn].

In the context of Rn, scalars are always real numbers. In particular, throughout this chapter, the term scalar always means a real number.

The following theorem gives the arithmetic properties of vector addition and scalar multiplication. Note that the statements in this theorem are already familiar from Section 1.6, which discusses the arithmetic properties of matrix operations (a vector in Rn is an (n × 1) matrix, and hence the properties of matrix addition and scalar multiplication listed in Section 1.6 are inherited by vectors in Rn).

As we will see in Chapter 5, any set that satisfies the properties of Theorem 1 is called a vector space; thus for each positive integer n, Rn is an example of a vector space.

Theorem 1  If x, y, and z are vectors in Rn and a and b are scalars, then the following properties hold:

Closure properties:
(c1) x + y is in Rn.
(c2) ax is in Rn.

Properties of addition:
(a1) x + y = y + x.
(a2) x + (y + z) = (x + y) + z.
(a3) Rn contains the zero vector, θ, and x + θ = x for all x in Rn.
(a4) For each vector x in Rn, there is a vector −x in Rn such that x + (−x) = θ.

Properties of scalar multiplication:
(m1) a(bx) = (ab)x.
(m2) a(x + y) = ax + ay.
(m3) (a + b)x = ax + bx.
(m4) 1x = x for all x in Rn.

Subspaces of Rn

In this chapter we are interested in subsets, W, of Rn that satisfy all the properties of Theorem 1 (with Rn replaced by W throughout). Such a subset W is called a subspace


ORIGINS OF HIGHER-DIMENSIONAL SPACES  In addition to Grassmann (see Section 1.7), Sir William Hamilton (1805–1865) also envisioned algebras of n-tuples (which he called polyplets). In 1833, Hamilton gave rules for the addition and multiplication of ordered pairs, (a, b), which became the algebra of complex numbers, z = a + bi. He searched for years for an extension to 3-tuples. He finally discovered, in a flash of inspiration while crossing a bridge, that the extension was possible if he used 4-tuples (a, b, c, d) = a + bi + cj + dk. In this algebra of quaternions, however, multiplication is not commutative; for example, ij = k, but ji = −k. Hamilton stopped and carved the basic formula, i^2 = j^2 = k^2 = ijk = −1, on the bridge. He considered the quaternions his greatest achievement, even though his so-called Hamiltonian principle is considered fundamental to modern physics.

of Rn. For example, consider the subset W of R3 defined by

W = {x: x = [x1; x2; 0], x1 and x2 real numbers}.

Viewed geometrically, W is the xy-plane (see Fig. 3.5), so it can be represented by R2. Therefore, as can be easily shown, W is a subspace of R3.

Figure 3.5  W as a subset of R3

The following theorem provides a convenient way of determining when a subset W of Rn is a subspace of Rn.

Theorem 2  A subset W of Rn is a subspace of Rn if and only if the following conditions are met:

(s1)* The zero vector, θ, is in W.
(s2) x + y is in W whenever x and y are in W.
(s3) ax is in W whenever x is in W and a is any scalar.

Proof  Suppose that W is a subset of Rn that satisfies conditions (s1)–(s3). To show that W is a subspace of Rn, we must show that the 10 properties of Theorem 1 (with Rn replaced by W throughout) are satisfied. But properties (a1), (a2), (m1), (m2), (m3), and (m4) are satisfied by every subset of Rn and so hold in W. Condition (a3) is satisfied by W because the hypothesis (s1) guarantees that θ is in W. Similarly, (c1) and (c2) are given by the hypotheses (s2) and (s3), respectively. The only remaining condition is (a4), and we can easily see that −x = (−1)x. Thus if x is in W, then, by (s3), −x is also in W. Therefore, all the conditions of Theorem 1 are satisfied by W, and W is a subspace of Rn.

For the converse, suppose W is a subspace of Rn. The conditions (a3), (c1), and (c2) of Theorem 1 imply that properties (s1), (s2), and (s3) hold in W.

The next example illustrates the use of Theorem 2 to verify that a subset W of Rn is a subspace of Rn.

*The usual statement of Theorem 2 lists only conditions (s2) and (s3) but assumes that the subset W is nonempty. Thus (s1) replaces the assumption that W is nonempty. The two versions are equivalent (see Exercise 34).


Example 1  Let W be the subset of R3 defined by

W = {x: x = [x1; x2; x3], x1 = x2 − x3, x2 and x3 any real numbers}.

Verify that W is a subspace of R3 and give a geometric interpretation of W.

Solution  To show that W is a subspace of R3, we must check that properties (s1)–(s3) of Theorem 2 are satisfied by W. Clearly the zero vector, θ, satisfies the condition x1 = x2 − x3. Therefore, θ is in W, showing that (s1) holds. Now let u and v be in W, where

u = [u1; u2; u3] and v = [v1; v2; v3],

and let a be an arbitrary scalar. Since u and v are in W,

u1 = u2 − u3 and v1 = v2 − v3.   (1)

The sum u + v and the scalar product au are given by

u + v = [u1 + v1; u2 + v2; u3 + v3] and au = [au1; au2; au3].

To see that u + v is in W, note that (1) gives

u1 + v1 = (u2 − u3) + (v2 − v3) = (u2 + v2) − (u3 + v3).   (2)

Thus if the components of u and v satisfy the condition x1 = x2 − x3, then so do the components of the sum u + v. This argument shows that condition (s2) is met by W. Similarly, from (1),

au1 = a(u2 − u3) = au2 − au3,   (3)

so au is in W. Therefore, W is a subspace of R3.

Geometrically, W is the plane whose equation is x − y + z = 0 (see Fig. 3.6).

Figure 3.6  A portion of the plane x − y + z = 0


Verifying that Subsets are Subspaces

Example 1 illustrates the typical procedure for verifying that a subset W of Rn is a subspace of Rn. In general such a verification proceeds along the following lines:

Verifying that W Is a Subspace of Rn

Step 1. An algebraic specification for the subset W is given, and this specification serves as a test for determining whether a vector in Rn is or is not in W.
Step 2. Test the zero vector, θ, of Rn to see whether it satisfies the algebraic specification required to be in W. (This shows that W is nonempty.)
Step 3. Choose two arbitrary vectors x and y from W. Thus x and y are in Rn, and both vectors satisfy the algebraic specification of W.
Step 4. Test the sum x + y to see whether it meets the specification of W.
Step 5. For an arbitrary scalar, a, test the scalar multiple ax to see whether it meets the specification of W.

The next example illustrates again the use of the procedure described above to verify that a subset W of Rn is a subspace.

Example 2  Let W be the subset of R3 defined by

W = {x: x = [x1; x2; x3], x2 = 2x1, x3 = 3x1, x1 any real number}.

Verify that W is a subspace of R3 and give a geometric interpretation of W.

Solution  For clarity in this initial example, we explicitly number the five steps used to verify that W is a subspace.

1. The algebraic condition for x to be in W is

x2 = 2x1 and x3 = 3x1.   (4)

In words, x is in W if and only if the second component of x is twice the first component and the third component of x is three times the first.

2. Note that the zero vector, θ, clearly satisfies (4). Therefore, θ is in W.

3. Next, let u and v be two arbitrary vectors in W:

u = [u1; u2; u3] and v = [v1; v2; v3].


Because u and v are in W, each must satisfy the algebraic specification of W. That is,

u2 = 2u1 and u3 = 3u1   (5a)
v2 = 2v1 and v3 = 3v1.   (5b)

4. Next, check whether the sum, u + v, is in W. (That is, does the vector u + v satisfy Eq. (4)?) Now, the sum u + v is given by

u + v = [u1 + v1; u2 + v2; u3 + v3].

By (5a) and (5b), we have

u2 + v2 = 2(u1 + v1) and (u3 + v3) = 3(u1 + v1).

Thus u + v is in W whenever u and v are both in W (see Eq. (4)).

5. Similarly, for any scalar a, the scalar multiple au is given by

au = [au1; au2; au3].

Using (5a) gives au2 = a(2u1) = 2(au1) and au3 = a(3u1) = 3(au1). Therefore, au is in W whenever u is in W (see Eq. (4)).

Thus, by Theorem 2, W is a subspace of R3. Geometrically, W is a line through the origin with parametric equations

x = x1
y = 2x1
z = 3x1.

The graph of the line is given in Fig. 3.7.

Exercise 29 shows that any line in three-space through the origin is a subspace of R3, and Example 3 of Section 3.3 shows that in three-space any plane through the origin is a subspace. Also note that for each positive integer n, Rn is a subspace of itself and {θ} is a subspace of Rn. We conclude this section with examples of subsets that are not subspaces.

Example 3  Let W be the subset of R3 defined by

W = {x: x = [x1; x2; 1], x1 and x2 any real numbers}.

Show that W is not a subspace of R3.


Figure 3.7  A geometric representation of the subspace W (see Example 2)

Solution  To show that W is not a subspace of R3, we need only verify that at least one of the properties (s1)–(s3) of Theorem 2 fails. Note that geometrically W can be interpreted as the plane z = 1, which does not contain the origin. In other words, the zero vector, θ, is not in W. Because condition (s1) of Theorem 2 is not met, W is not a subspace of R3. Although it is not necessary to do so, in this example we can also show that both conditions (s2) and (s3) of Theorem 2 fail. To see this, let x and y be in W, where

x = [x1; x2; 1] and y = [y1; y2; 1].

Then x + y is given by

x + y = [x1 + y1; x2 + y2; 2].

In particular, x + y is not in W, because the third component of x + y does not have the value 1. Similarly,

ax = [ax1; ax2; a].

So if a ≠ 1, then ax is not in W.

Example 4  Let W be the subset of R2 defined by

W = {x: x = [x1; x2], x1 and x2 any integers}.

Demonstrate that W is not a subspace of R2.


Solution  In this case θ is in W, and it is easy to see that if x and y are in W, then so is x + y. If we set

x = [1; 1]

and a = 1/2, then x is in W but ax is not. Therefore, condition (s3) of Theorem 2 is not met by W, and hence W is not a subspace of R2.

Example 5  Let W be the subset of R2 defined by

W = {x: x = [x1; x2], where either x1 = 0 or x2 = 0}.

Show that W is not a subspace of R2.

Solution  Let x and y be defined by

x = [1; 0] and y = [0; 1].

Then x and y are in W. But

x + y = [1; 1]

is not in W, so W is not a subspace of R2. Note that θ is in W, and for any vector x in W and any scalar a, ax is again in W. Geometrically, W is the set of points in the plane that lie either on the x-axis or on the y-axis. Either of these axes alone is a subspace of R2, but, as this example demonstrates, their union is not a subspace.

3.2 EXERCISES

In Exercises 1–8, W is a subset of R2 consisting of vectors of the form

x = [x1; x2].

In each case determine whether W is a subspace of R2. If W is a subspace, then give a geometric description of W.

1. W = {x: x1 = 2x2}
2. W = {x: x1 − x2 = 2}
3. W = {x: x1 = x2 or x1 = −x2}
4. W = {x: x1 and x2 are rational numbers}
5. W = {x: x1 = 0}
6. W = {x: |x1| + |x2| = 0}
7. W = {x: x1^2 + x2 = 1}
8. W = {x: x1x2 = 0}

In Exercises 9–17, W is a subset of R3 consisting of vectors of the form

x = [x1; x2; x3].

In each case, determine whether W is a subspace of R3. If W is a subspace, then give a geometric description of W.

9. W = {x: x3 = 2x1 − x2}
10. W = {x: x2 = x3 + x1}
11. W = {x: x1x2 = x3}


12. W = {x: x1 = 2x3}
13. W = {x: x1^2 = x1 + x2}
14. W = {x: x2 = 0}
15. W = {x: x1 = 2x3, x2 = −x3}
16. W = {x: x3 = x2 = 2x1}
17. W = {x: x2 = x3 = 0}

18. Let a be a fixed vector in R3, and define W to be the subset of R3 given by

W = {x: aT x = 0}.

Prove that W is a subspace of R3.

19. Let W be the subspace defined in Exercise 18, where

a = [1; 2; 3].

Give a geometric description for W.

20. Let W be the subspace defined in Exercise 18, where

a = [1; 0; 0].

Give a geometric description of W.

21. Let a and b be fixed vectors in R3, and let W be the subset of R3 defined by

W = {x: aT x = 0 and bT x = 0}.

Prove that W is a subspace of R3.

In Exercises 22–25, W is the subspace of R3 defined in Exercise 21. For each choice of a and b, give a geometric description of W.

22. a = [1; −1; 2], b = [2; −1; 3]

23. a = [1; 2; 2], b = [1; 3; 0]

24. a = [1; 1; 1], b = [2; 2; 2]

25. a = [1; 0; −1], b = [−2; 0; 2]

26. In R4, let x = [1, −3, 2, 1]T, y = [2, 1, 3, 2]T, and z = [−3, 2, −1, 4]T. Set a = 2 and b = −3. Illustrate that the ten properties of Theorem 1 are satisfied by x, y, z, a, and b.

27. In R2, suppose that scalar multiplication were defined by

ax = a[x1; x2] = [2ax1; 2ax2]

for every scalar a. Illustrate with specific examples those properties of Theorem 1 that are not satisfied.

28. Let

W = {x: x = [x1; x2], x2 ≥ 0}.

In the statement of Theorem 1, replace each occurrence of Rn with W. Illustrate with specific examples each of the ten properties of Theorem 1 that are not satisfied.

29. In R3, a line through the origin is the set of all points in R3 whose coordinates satisfy x1 = at, x2 = bt, and x3 = ct, where t is a variable and a, b, and c are not all zero. Show that a line through the origin is a subspace of R3.

30. If U and V are subsets of Rn, then the set U + V is defined by

U + V = {x: x = u + v, u in U and v in V}.

Prove that if U and V are subspaces of Rn, then U + V is a subspace of Rn.

31. Let U and V be subspaces of Rn. Prove that the intersection, U ∩ V, is also a subspace of Rn.

32. Let U and V be the subspaces of R3 defined by

U = {x: aT x = 0} and V = {x: bT x = 0},

where

a = [1; 1; 0] and b = [0; 1; −1].

Demonstrate that the union, U ∪ V, is not a subspace of R3 (see Exercise 18).

33. Let U and V be subspaces of Rn.
a) Show that the union, U ∪ V, satisfies properties (s1) and (s3) of Theorem 2.
b) If neither U nor V is a subset of the other, show that U ∪ V does not satisfy condition (s2) of


Theorem 2. [Hint: Choose vectors u and v such that u is in U but not in V and v is in V but not in U. Assume that u + v is in either U or V and reach a contradiction.]

34. Let W be a nonempty subset of Rn that satisfies conditions (s2) and (s3) of Theorem 2. Prove that θ is in W and conclude that W is a subspace of Rn. (Thus property (s1) of Theorem 2 can be replaced with the assumption that W is nonempty.)

3.3 EXAMPLES OF SUBSPACES

In this section we introduce several important and particularly useful examples of subspaces of Rn.

The Span of a Subset

To begin, recall that if v1, . . . , vr are vectors in Rn, then a vector y in Rn is a linear combination of v1, . . . , vr, provided that there exist scalars a1, . . . , ar such that

y = a1v1 + · · · + arvr.

The next theorem shows that the set of all linear combinations of v1, . . . , vr is a subspace of Rn.

Theorem 3  If v1, . . . , vr are vectors in Rn, then the set W consisting of all linear combinations of v1, . . . , vr is a subspace of Rn.

Proof  To show that W is a subspace of Rn, we must verify that the three conditions of Theorem 2 are satisfied. Now θ is in W because

θ = 0v1 + · · · + 0vr.

Next, suppose that y and z are in W. Then there exist scalars a1, . . . , ar, b1, . . . , br such that

y = a1v1 + · · · + arvr

and

z = b1v1 + · · · + brvr.

Thus,

y + z = (a1 + b1)v1 + · · · + (ar + br)vr,

so y + z is a linear combination of v1, . . . , vr; that is, y + z is in W. Also, for any scalar c,

cy = (ca1)v1 + · · · + (car)vr.

In particular, cy is in W. It follows from Theorem 2 that W is a subspace of Rn.

If S = {v1, . . . , vr} is a subset of Rn, then the subspace W consisting of all linear combinations of v1, . . . , vr is called the subspace spanned by S and will be denoted by

Sp(S) or Sp{v1, . . . , vr}.


For a single vector v in Rn, Sp{v} is the subspace

Sp{v} = {av: a is any real number}.

If v is a nonzero vector in R2 or R3, then Sp{v} can be interpreted as the line determined by v (see Fig. 3.8). As a specific example, consider

v = [1; 2; 3].

Then

Sp{v} = {t[1; 2; 3]: t is any real number}.

Thus Sp{v} is the line with parametric equations

x = t
y = 2t
z = 3t.

Equivalently, Sp{v} is the line that passes through the origin and through the point with coordinates 1, 2, and 3 (see Fig. 3.9).

Figure 3.8  Sp{v}

Figure 3.9  Sp{[1; 2; 3]}

If u and v are noncollinear geometric vectors, then

Sp{u, v} = {au + bv: a, b any real numbers}

is the plane containing u and v (see Fig. 3.10). The following example illustrates this case with a subspace of R3.


Figure 3.10  Sp{u, v}

Example 1  Let u and v be the three-dimensional vectors

u = [2; 1; 0] and v = [0; 1; 2].

Determine W = Sp{u, v} and give a geometric interpretation of W.

Solution  Let y be an arbitrary vector in R3, where

y = [y1; y2; y3].

Then y is in W if and only if there exist scalars x1 and x2 such that

y = x1u + x2v.   (1)

That is, y is in W if and only if there exist scalars x1 and x2 such that

y1 = 2x1
y2 = x1 + x2
y3 = 2x2.   (2)

The augmented matrix for linear system (2) is

[2 0 y1; 1 1 y2; 0 2 y3],

PHYSICAL REPRESENTATIONS OF VECTORS  The vector space work of Grassmann and Hamilton was distilled and popularized for the case of R3 by a Yale University physicist, Josiah Willard Gibbs (1839–1903). Gibbs produced a pamphlet, “Elements of Vector Analysis,” mainly for the use of his students. In it, and subsequent articles, Gibbs simplified and improved Hamilton’s work in multiple algebras with regard to three-dimensional space. This led to the familiar geometrical representation of vector algebra in terms of operations on directed line segments.


and this matrix is row equivalent to the matrix

[1 0 (1/2)y1; 0 1 y2 − (1/2)y1; 0 0 (1/2)y3 + (1/2)y1 − y2]   (3)

in echelon form. Therefore, linear system (2) is consistent if and only if (1/2)y1 − y2 + (1/2)y3 = 0, or equivalently, if and only if

y1 − 2y2 + y3 = 0.   (4)

Thus W is the subspace given by

W = {y = [y1; y2; y3]: y1 − 2y2 + y3 = 0}.   (5)

It also follows from Eq. (5) that geometrically W is the plane in three-space with equation x − 2y + z = 0 (see Fig. 3.11).

Figure 3.11  A portion of the plane x − 2y + z = 0
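As a quick MATLAB check of condition (4) (an illustrative sketch; the particular test vector is our own choice), note that y lies in Sp{u, v} exactly when appending y to the matrix [u, v] does not increase its rank:

    u = [2; 1; 0];  v = [0; 1; 2];
    y = [2; 3; 4];                      % satisfies y1 - 2*y2 + y3 = 0
    rank([u v y]) == rank([u v])        % returns 1 (true): y is in W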

The Null Space of a Matrix

We now introduce two subspaces that have particular relevance to the linear system of equations Ax = b, where A is an (m × n) matrix. The first of these subspaces is called the null space of A (or the kernel of A) and consists of all solutions of Ax = θ.

Definition 1  Let A be an (m × n) matrix. The null space of A [denoted N(A)] is the set of vectors in Rn defined by

N(A) = {x: Ax = θ, x in Rn}.

In words, the null space consists of all those vectors x such that Ax is the zero vector. The next theorem shows that the null space of an (m × n) matrix A is a subspace of Rn.

Theorem 4  If A is an (m × n) matrix, then N(A) is a subspace of Rn.

Proof  To show that N(A) is a subspace of Rn, we must verify that the three conditions of Theorem 2 hold. Let θ be the zero vector in Rn. Then

Aθ = θ,   (6)

and so θ is in N(A). (Note: In Eq. (6), the left θ is in Rn but the right θ is in Rm.) Now let u and v be vectors in N(A). Then u and v are in Rn and

Au = θ and Av = θ.   (7)

To see that u + v is in N(A), we must test u + v against the algebraic specification of N(A); that is, we must show that A(u + v) = θ. But it follows from Eq. (7) that

A(u + v) = Au + Av = θ + θ = θ,


and therefore u + v is in N(A). Similarly, for any scalar a, it follows from Eq. (7) that

A(au) = aAu = aθ = θ.

Therefore, au is in N(A). By Theorem 2, N(A) is a subspace of Rn.

Example 2  Describe N(A), where A is the (3 × 4) matrix

A = [1 1 3 1; 2 1 5 4; 1 2 4 −1].

Solution  N(A) is determined by solving the homogeneous system

Ax = θ.   (8)

This is accomplished by reducing the augmented matrix [A | θ] to echelon form. It is easy to verify that [A | θ] is row equivalent to

[1 0 2 3 0; 0 1 1 −2 0; 0 0 0 0 0].

Solving the corresponding reduced system yields

x1 = −2x3 − 3x4
x2 = −x3 + 2x4

as the solution to Eq. (8). Thus a vector x in R4,

x = [x1; x2; x3; x4],

is in N(A) if and only if x can be written in the form

x = [−2x3 − 3x4; −x3 + 2x4; x3; x4] = x3[−2; −1; 1; 0] + x4[−3; 2; 0; 1],

where x3 and x4 are arbitrary; that is,

N(A) = {x: x = x3[−2; −1; 1; 0] + x4[−3; 2; 0; 1], x3 and x4 any real numbers}.


As the next example demonstrates, the fact that N(A) is a subspace can be used to show that in three-space every plane through the origin is a subspace.

Example 3  Verify that any plane through the origin in R3 is a subspace of R3.

Solution  The equation of a plane in three-space through the origin is

ax + by + cz = 0,   (9)

where a, b, and c are specified constants not all of which are zero. Now, Eq. (9) can be written as

Ax = θ,

where A is a (1 × 3) matrix and x is in R3:

A = [a b c] and x = [x; y; z].

Thus x is on the plane defined by Eq. (9) if and only if x is in N(A). Since N(A) is a subspace of R3 by Theorem 4, any plane through the origin is a subspace of R3.

The Range of a Matrix

Another important subspace associated with an (m × n) matrix A is the range of A, defined as follows.

Definition 2  Let A be an (m × n) matrix. The range of A [denoted R(A)] is the set of vectors in Rm defined by

R(A) = {y: y = Ax for some x in Rn}.

In words, the range of A consists of the set of all vectors y in Rm such that the linear system

Ax = y

is consistent. As another way to view R(A), suppose that A is an (m × n) matrix. We can regard multiplication by A as defining a function from Rn to Rm. In this sense, as x varies through Rn, the set of all vectors

y = Ax

produced in Rm constitutes the “range” of the function.


We saw in Section 1.5 (see Theorem 5) that if the (m × n) matrix A has columns A1, A2, . . . , An and if

x = [x1; x2; . . . ; xn],

then the matrix equation

Ax = y

is equivalent to the vector equation

x1A1 + x2A2 + · · · + xnAn = y.

Therefore, it follows that

R(A) = Sp{A1, A2, . . . , An}.

By Theorem 3, Sp{A1, A2, . . . , An} is a subspace of Rm. (This subspace is also called the column space of matrix A.) Consequently, R(A) is a subspace of Rm, and we have proved the following theorem.

Theorem 5  If A is an (m × n) matrix and if R(A) is the range of A, then R(A) is a subspace of Rm.

The next example illustrates a way to give an algebraic specification for R(A).

Example 4  Describe the range of A, where A is the (3 × 4) matrix

A = [1 1 3 1; 2 1 5 4; 1 2 4 −1].

Solution  Let b be an arbitrary vector in R3,

b = [b1; b2; b3].

Then b is in R(A) if and only if the system of equations

Ax = b

is consistent. The augmented matrix for the system is

[A | b] = [1 1 3 1 b1; 2 1 5 4 b2; 1 2 4 −1 b3],


which is equivalent to

$$\begin{bmatrix} 1 & 0 & 2 & 3 & b_2 - b_1 \\ 0 & 1 & 1 & -2 & 2b_1 - b_2 \\ 0 & 0 & 0 & 0 & -3b_1 + b_2 + b_3 \end{bmatrix}.$$

It follows that Ax = b has a solution [or equivalently, b is in R(A)] if and only if -3b1 + b2 + b3 = 0, or b3 = 3b1 - b2, where b1 and b2 are arbitrary. Thus

$$R(A) = \left\{ b : b = \begin{bmatrix} b_1 \\ b_2 \\ 3b_1 - b_2 \end{bmatrix} = b_1 \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} + b_2 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix},\ b_1 \text{ and } b_2 \text{ any real numbers} \right\}.$$
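Although the text does not use software, this reduction is easy to reproduce with a computer algebra system. A minimal sketch, assuming Python with the SymPy library is available, performs forward elimination on the augmented matrix with a symbolic right-hand side; the zero row exposes the consistency condition:

```python
from sympy import Matrix, symbols

b1, b2, b3 = symbols('b1 b2 b3')
A = Matrix([[1, 1, 3, 1],
            [2, 1, 5, 4],
            [1, 2, 4, -1]])
aug = A.row_join(Matrix([b1, b2, b3]))  # the augmented matrix [A | b]

# forward elimination by elementary row operations
aug[1, :] = aug[1, :] - 2 * aug[0, :]   # R2 <- R2 - 2*R1
aug[2, :] = aug[2, :] - aug[0, :]       # R3 <- R3 - R1
aug[2, :] = aug[2, :] + aug[1, :]       # R3 <- R3 + R2

print(aug.row(2))  # [0, 0, 0, 0, -3*b1 + b2 + b3]:
                   # Ax = b is consistent exactly when -3*b1 + b2 + b3 = 0
```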

The Row Space of a Matrix

If A is an (m × n) matrix with columns A1, A2, . . . , An, then we have already defined the column space of A to be

Sp{A1, A2, . . . , An}.

In a similar fashion, the rows of A can be regarded as vectors a1, a2, . . . , am in R^n, and the row space of A is defined to be

Sp{a1, a2, . . . , am}.

For example, if

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \end{bmatrix},$$

then the row space of A is Sp{a1, a2}, where

a1 = [1 2 3] and a2 = [1 0 1].

The following theorem shows that row-equivalent matrices have the same row space.

Theorem 6 Let A be an (m × n) matrix, and suppose that A is row equivalent to the (m × n) matrix B. Then A and B have the same row space.

The proof of Theorem 6 is given at the end of this section. To illustrate Theorem 6, let A be the (3 × 3) matrix

$$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 4 \\ 1 & 1 & 5 \end{bmatrix}.$$


By performing the elementary row operations R2 - 2R1, R3 - R1, R1 + R2, and R3 - 2R2, we obtain the matrix

$$B = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{bmatrix}.$$

By Theorem 6, matrices A and B have the same row space. Clearly the zero row of B contributes nothing as an element of the spanning set, so the row space of B is Sp{b1, b2}, where

b1 = [1 0 3] and b2 = [0 1 2].

If the rows of A are denoted by a1, a2, and a3, then

Sp{a1, a2, a3} = Sp{b1, b2}.

More generally, given a subset S = {v1, . . . , vm} of R^n, Theorem 6 allows us to obtain a "nicer" subset T = {w1, . . . , wk} of R^n such that Sp(S) = Sp(T). The next example illustrates this.

Example 5 Let S = {v1, v2, v3, v4} be a subset of R^3, where

$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 4 \\ -5 \end{bmatrix}, \quad \text{and} \quad v_4 = \begin{bmatrix} 2 \\ 5 \\ -1 \end{bmatrix}.$$

Show that there exists a set T = {w1, w2} consisting of two vectors in R^3 such that Sp(S) = Sp(T).

Solution Let A be the (3 × 4) matrix

A = [v1, v2, v3, v4];

that is,

$$A = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 2 & 3 & 4 & 5 \\ 1 & 5 & -5 & -1 \end{bmatrix}.$$

The matrix A^T is the (4 × 3) matrix

$$A^T = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & 5 \\ 1 & 4 & -5 \\ 2 & 5 & -1 \end{bmatrix},$$

and the row vectors of A^T are precisely the vectors v_1^T, v_2^T, v_3^T, and v_4^T. It is straightforward to see that A^T reduces to the matrix

$$B^T = \begin{bmatrix} 1 & 0 & 7 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$


So, by Theorem 6, A^T and B^T have the same row space. Thus A and B have the same column space, where

$$B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 7 & -3 & 0 & 0 \end{bmatrix}.$$

In particular, Sp(S) = Sp(T), where T = {w1, w2},

$$w_1 = \begin{bmatrix} 1 \\ 0 \\ 7 \end{bmatrix} \quad \text{and} \quad w_2 = \begin{bmatrix} 0 \\ 1 \\ -3 \end{bmatrix}.$$
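The reduction of A^T can be delegated to software. A short sketch, assuming SymPy (not part of the text):

```python
from sympy import Matrix

# the columns of A are v1, v2, v3, v4 from Example 5
A = Matrix([[1, 2, 1, 2],
            [2, 3, 4, 5],
            [1, 5, -5, -1]])

BT, _ = A.T.rref()  # reduced echelon form of A^T
print(BT)           # nonzero rows [1, 0, 7] and [0, 1, -3] give w1 and w2
```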

Proof of Theorem 6 (Optional)

Assume that A and B are row-equivalent (m × n) matrices. Then there is a sequence of matrices

A = A_1, A_2, . . . , A_{k-1}, A_k = B

such that for 2 ≤ j ≤ k, A_j is obtained by performing a single elementary row operation on A_{j-1}. It suffices, then, to show that A_{j-1} and A_j have the same row space for each j, 2 ≤ j ≤ k. This means that it is sufficient to consider only the case in which B is obtained from A by a single elementary row operation.

Let A have rows a1, . . . , am; that is, A is the (m × n) matrix

$$A = \begin{bmatrix} a_1 \\ \vdots \\ a_j \\ \vdots \\ a_k \\ \vdots \\ a_m \end{bmatrix},$$

where each a_i is a (1 × n) row vector,

a_i = [a_{i1}  a_{i2}  · · ·  a_{in}].

Clearly the order of the rows is immaterial; that is, if B is obtained by interchanging the jth and kth rows of A,

$$B = \begin{bmatrix} a_1 \\ \vdots \\ a_k \\ \vdots \\ a_j \\ \vdots \\ a_m \end{bmatrix},$$


then A and B have the same row space because

Sp{a1, . . . , aj, . . . , ak, . . . , am} = Sp{a1, . . . , ak, . . . , aj, . . . , am}.

Next, suppose that B is obtained by performing the row operation R_k + cR_j on A; thus,

$$B = \begin{bmatrix} a_1 \\ \vdots \\ a_j \\ \vdots \\ a_k + ca_j \\ \vdots \\ a_m \end{bmatrix}.$$

If the vector x is in the row space of A, then there exist scalars b1, . . . , bm such that

$$x = b_1a_1 + \cdots + b_ja_j + \cdots + b_ka_k + \cdots + b_ma_m. \qquad (10)$$

The vector equation (10) can be rewritten as

$$x = b_1a_1 + \cdots + (b_j - cb_k)a_j + \cdots + b_k(a_k + ca_j) + \cdots + b_ma_m, \qquad (11)$$

and hence x is in the row space of B. Conversely, if the vector y is in the row space of B, then there exist scalars d1, . . . , dm such that

$$y = d_1a_1 + \cdots + d_ja_j + \cdots + d_k(a_k + ca_j) + \cdots + d_ma_m. \qquad (12)$$

But Eq. (12) can be rearranged as

$$y = d_1a_1 + \cdots + (d_j + cd_k)a_j + \cdots + d_ka_k + \cdots + d_ma_m, \qquad (13)$$

so y is in the row space of A. Therefore, A and B have the same row space.

The remaining case is the one in which B is obtained from A by multiplying the jth row by the nonzero scalar c. This case is left as Exercise 54 at the end of this section.

3.3 EXERCISES

Exercises 1–11 refer to the vectors in Eq. (14).

$$a = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ -3 \end{bmatrix}, \quad c = \begin{bmatrix} -2 \\ 2 \end{bmatrix}, \quad d = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad e = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \qquad (14)$$

In Exercises 1–11, either show that Sp(S) = R^2 or give an algebraic specification for Sp(S). If Sp(S) ≠ R^2, then give a geometric description of Sp(S).

1. S = {a}    2. S = {b}    3. S = {e}
4. S = {a, b}    5. S = {a, d}    6. S = {a, c}
7. S = {b, e}    8. S = {a, b, d}
9. S = {b, c, d}    10. S = {a, b, e}
11. S = {a, c, e}

Exercises 12–19 refer to the vectors in Eq. (15).

$$v = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \quad w = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad y = \begin{bmatrix} -2 \\ -2 \\ 2 \end{bmatrix}, \quad z = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}. \qquad (15)$$


In Exercises 12–19, either show that Sp(S) = R^3 or give an algebraic specification for Sp(S). If Sp(S) ≠ R^3, then give a geometric description of Sp(S).

12. S = {v}    13. S = {w}
14. S = {v, w}    15. S = {v, x}
16. S = {v, w, x}    17. S = {w, x, z}
18. S = {v, w, z}    19. S = {w, x, y}

20. Let S be the set given in Exercise 14. For each vector given below, determine whether the vector is in Sp(S). Express those vectors that are in Sp(S) as a linear combination of v and w.

a) [1, 1, 1]^T    b) [1, 1, -1]^T    c) [1, 2, 0]^T
d) [2, 3, 1]^T    e) [-1, 2, 4]^T    f) [1, 1, 3]^T

21. Repeat Exercise 20 for the set S given in Exercise 15.

22. Determine which of the vectors listed in Eq. (14) is in the null space of the matrix

$$A = \begin{bmatrix} 2 & 2 \\ 3 & 3 \end{bmatrix}.$$

23. Determine which of the vectors listed in Eq. (14) is in the null space of the matrix

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \\ 0 & 3 \end{bmatrix}.$$

24. Determine which of the vectors listed in Eq. (15) is in the null space of the matrix

A = [-2  1  1].

25. Determine which of the vectors listed in Eq. (15) is in the null space of the matrix

$$A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & -1 & 1 \\ 3 & -5 & -2 \end{bmatrix}.$$

In Exercises 26–37, give an algebraic specification for the null space and the range of the given matrix A.

26. $A = \begin{bmatrix} 1 & -2 \\ -3 & 6 \end{bmatrix}$    27. $A = \begin{bmatrix} -1 & 3 \\ 2 & -6 \end{bmatrix}$

28. $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$    29. $A = \begin{bmatrix} 1 & 1 \\ 2 & 5 \end{bmatrix}$

30. $A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & -1 & 5 \end{bmatrix}$    31. $A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 6 & 4 \end{bmatrix}$

32. $A = \begin{bmatrix} 1 & 3 \\ 2 & 7 \\ 1 & 5 \end{bmatrix}$    33. $A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \\ 0 & 3 \end{bmatrix}$

34. $A = \begin{bmatrix} 1 & -2 & 1 \\ 2 & -3 & 5 \\ 1 & 0 & 7 \end{bmatrix}$    35. $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 1 \\ 2 & 2 & 10 \end{bmatrix}$

36. $A = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 2 \\ 1 & 2 & 2 \end{bmatrix}$    37. $A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 5 & 4 \\ 1 & 3 & 4 \end{bmatrix}$

38. Let A be the matrix given in Exercise 26.
a) For each vector b that follows, determine whether b is in R(A).
b) If b is in R(A), then exhibit a vector x in R^2 such that Ax = b.
c) If b is in R(A), then write b as a linear combination of the columns of A.

i) b = [1, -3]^T    ii) b = [-1, 2]^T    iii) b = [1, 1]^T
iv) b = [-2, 6]^T    v) b = [3, -6]^T    vi) b = [0, 0]^T


39. Repeat Exercise 38 for the matrix A given in Exercise 27.

40. Let A be the matrix given in Exercise 34.
a) For each vector b that follows, determine whether b is in R(A).
b) If b is in R(A), then exhibit a vector x in R^3 such that Ax = b.
c) If b is in R(A), then write b as a linear combination of the columns of A.

i) b = [1, 2, 0]^T    ii) b = [1, 1, -1]^T    iii) b = [4, 7, 2]^T
iv) b = [0, 1, 2]^T    v) b = [0, 1, -2]^T    vi) b = [0, 0, 0]^T

41. Repeat Exercise 40 for the matrix A given in Exercise 35.

42. Let

$$W = \left\{ y = \begin{bmatrix} 2x_1 - 3x_2 + x_3 \\ -x_1 + 4x_2 - 2x_3 \\ 2x_1 + x_2 + 4x_3 \end{bmatrix} : x_1, x_2, x_3 \text{ real} \right\}.$$

Exhibit a (3 × 3) matrix A such that W = R(A). Conclude that W is a subspace of R^3.

43. Let

$$W = \left\{ x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} : 3x_1 - 4x_2 + 2x_3 = 0 \right\}.$$

Exhibit a (1 × 3) matrix A such that W = N(A). Conclude that W is a subspace of R^3.

44. Let S be the set of vectors given in Exercise 16. Exhibit a matrix A such that Sp(S) = R(A).

45. Let S be the set of vectors given in Exercise 17. Exhibit a matrix A such that Sp(S) = R(A).

In Exercises 46–49, use the technique illustrated in Example 5 to find a set T = {w1, w2} consisting of two vectors such that Sp(S) = Sp(T).

46. S = {[1, 0, -1]^T, [2, 2, 1]^T, [1, 2, 2]^T}

47. S = {[-2, 1, 3]^T, [2, 2, -1]^T, [-2, 7, 7]^T}

48. S = {[1, 0, 1]^T, [-2, 0, -2]^T, [1, 1, 2]^T, [-2, 3, 1]^T}

49. S = {[1, 2, 2]^T, [1, 5, 3]^T, [0, 6, 2]^T, [1, -1, 1]^T}

50. Identify the range and the null space for each of the following.
a) The (n × n) identity matrix
b) The (n × n) zero matrix
c) Any (n × n) nonsingular matrix A

51. Let A and B be (n × n) matrices. Verify that N(A) ∩ N(B) ⊆ N(A + B).

52. Let A be an (m × r) matrix and B an (r × n) matrix.
a) Show that N(B) ⊆ N(AB).
b) Show that R(AB) ⊆ R(A).

53. Let W be a subspace of R^n, and let A be an (m × n) matrix. Let V be the subset of R^m defined by

V = {y : y = Ax for some x in W}.

Prove that V is a subspace of R^m.

54. Let A be an (m × n) matrix, and let B be obtained by multiplying the kth row of A by the nonzero scalar c. Prove that A and B have the same row space.

3.4 BASES FOR SUBSPACES

Two of the most fundamental concepts of geometry are those of dimension and the use of coordinates to locate a point in space. In this section and the next, we extend these notions to an arbitrary subspace of R^n by introducing the idea of a basis for a


subspace. The first part of this section is devoted to developing the definition of a basis, and in the latter part of the section, we present techniques for obtaining bases for the subspaces introduced in Section 3.3. We will consider the concept of dimension in Section 3.5.

An example from R^2 will serve to illustrate the transition from geometry to algebra. We have already seen that each vector v in R^2,

$$v = \begin{bmatrix} a \\ b \end{bmatrix}, \qquad (1)$$

can be interpreted geometrically as the point with coordinates a and b. Recall that in R^2 the vectors e1 and e2 are defined by

$$e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

Clearly the vector v in (1) can be expressed uniquely as a linear combination of e1 and e2:

v = ae1 + be2.  (2)

As we will see later, the set {e1, e2} is an example of a basis for R^2 (indeed, it is called the natural basis for R^2). In Eq. (2), the vector v is determined by the coefficients a and b (see Fig. 3.12). Thus the geometric concept of characterizing a point by its coordinates can be interpreted algebraically as determining a vector by its coefficients when the vector is expressed as a linear combination of "basis" vectors. (In fact, the coefficients obtained are often referred to as the coordinates of the vector. This idea will be developed further in Chapter 5.) We turn now to the task of making these ideas precise in the context of an arbitrary subspace W of R^n.

[Figure 3.12: v = ae1 + be2]

Spanning Sets

Let W be a subspace of R^n, and let S be a subset of W. The discussion above suggests that the first requirement for S to be a basis for W is that each vector in W be expressible as a linear combination of the vectors in S. This leads to the following definition.


Definition 3 Let W be a subspace of R^n, and let S = {w1, . . . , wm} be a subset of W. We say that S is a spanning set for W, or simply that S spans W, if every vector w in W can be expressed as a linear combination of vectors in S;

w = a1w1 + · · · + amwm.

A restatement of Definition 3 in the notation of the previous section is that S is a spanning set of W provided that Sp(S) = W. It is evident that the set S = {e1, e2, e3}, consisting of the unit vectors in R^3, is a spanning set for R^3. Specifically, if v is in R^3,

$$v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \qquad (3)$$

then v = ae1 + be2 + ce3. The next two examples consider other subsets of R^3.

Example 1 In R^3, let S = {u1, u2, u3}, where

$$u_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad u_2 = \begin{bmatrix} -2 \\ 3 \\ 1 \end{bmatrix}, \quad \text{and} \quad u_3 = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}.$$

Determine whether S is a spanning set for R^3.

Solution We must determine whether an arbitrary vector v in R^3 can be expressed as a linear combination of u1, u2, and u3. In other words, we must decide whether the vector equation

x1u1 + x2u2 + x3u3 = v,  (4)

where v is the vector in (3), always has a solution. The vector equation (4) is equivalent to the (3 × 3) linear system with the matrix equation

Ax = v,  (5)

where A is the (3 × 3) matrix A = [u1, u2, u3]. The augmented matrix for Eq. (5) is

$$[A \mid v] = \begin{bmatrix} 1 & -2 & 1 & a \\ -1 & 3 & 2 & b \\ 0 & 1 & 4 & c \end{bmatrix},$$

and this matrix is row equivalent to

$$\begin{bmatrix} 1 & 0 & 0 & 10a + 9b - 7c \\ 0 & 1 & 0 & 4a + 4b - 3c \\ 0 & 0 & 1 & -a - b + c \end{bmatrix}.$$


Therefore,

x1 = 10a + 9b - 7c
x2 = 4a + 4b - 3c
x3 = -a - b + c

is the solution of Eq. (4). In particular, Eq. (4) always has a solution, so S is a spanning set for R^3.
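Because A = [u1, u2, u3] is square, S spans R^3 exactly when A is nonsingular, and the coefficients above come from A^{-1}v. A small sketch, assuming SymPy (not part of the text):

```python
from sympy import Matrix

A = Matrix([[1, -2, 1],
            [-1, 3, 2],
            [0, 1, 4]])   # columns are u1, u2, u3

print(A.rank())   # 3, so Ax = v is solvable for every v in R^3
# e.g. the coefficients expressing e1 = (1, 0, 0) in terms of u1, u2, u3
print(A.inv() * Matrix(3, 1, [1, 0, 0]))   # (10, 4, -1)
```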

Example 2 Let S = {v1, v2, v3} be the subset of R^3 defined by

$$v_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -1 \\ 0 \\ -7 \end{bmatrix}, \quad \text{and} \quad v_3 = \begin{bmatrix} 2 \\ 7 \\ 0 \end{bmatrix}.$$

Does S span R^3?

Solution Let v be the vector given in Eq. (3). As before, the vector equation

x1v1 + x2v2 + x3v3 = v  (6)

is equivalent to the (3 × 3) system of equations

Ax = v,  (7)

where A = [v1, v2, v3]. The augmented matrix for Eq. (7) is

$$[A \mid v] = \begin{bmatrix} 1 & -1 & 2 & a \\ 2 & 0 & 7 & b \\ 3 & -7 & 0 & c \end{bmatrix},$$

and the matrix [A | v] is row equivalent to

$$\begin{bmatrix} 1 & 0 & 7/2 & b/2 \\ 0 & 1 & 3/2 & -a + (1/2)b \\ 0 & 0 & 0 & -7a + 2b + c \end{bmatrix}.$$

It follows that Eq. (6) has a solution if and only if -7a + 2b + c = 0. In particular, S does not span R^3. Indeed,

$$Sp(S) = \left\{ v : v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \text{ where } -7a + 2b + c = 0 \right\}.$$

For example, the vector

$$w = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

is in R^3 but is not in Sp(S); that is, w cannot be expressed as a linear combination of v1, v2, and v3.


The next example illustrates a procedure for constructing a spanning set for the null space, N(A), of a matrix A.

Example 3 Let A be the (3 × 4) matrix

$$A = \begin{bmatrix} 1 & 1 & 3 & 1 \\ 2 & 1 & 5 & 4 \\ 1 & 2 & 4 & -1 \end{bmatrix}.$$

Exhibit a spanning set for N (A), the null space of A.

Solution The first step toward obtaining a spanning set for N(A) is to obtain an algebraic specification for N(A) by solving the homogeneous system Ax = θ. For the given matrix A, this was done in Example 2 of Section 3.3. Specifically,

$$N(A) = \left\{ x : x = \begin{bmatrix} -2x_3 - 3x_4 \\ -x_3 + 2x_4 \\ x_3 \\ x_4 \end{bmatrix},\ x_3 \text{ and } x_4 \text{ any real numbers} \right\}.$$

Thus a vector x in N(A) is totally determined by the unconstrained parameters x3 and x4. Separating those parameters gives a decomposition of x:

$$x = \begin{bmatrix} -2x_3 - 3x_4 \\ -x_3 + 2x_4 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -2x_3 \\ -x_3 \\ x_3 \\ 0 \end{bmatrix} + \begin{bmatrix} -3x_4 \\ 2x_4 \\ 0 \\ x_4 \end{bmatrix} = x_3 \begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -3 \\ 2 \\ 0 \\ 1 \end{bmatrix}. \qquad (8)$$

Let u1 and u2 be the vectors

$$u_1 = \begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad u_2 = \begin{bmatrix} -3 \\ 2 \\ 0 \\ 1 \end{bmatrix}.$$

By setting x3 = 1 and x4 = 0 in Eq. (8), we obtain u1, so u1 is in N(A). Similarly, u2 can be obtained by setting x3 = 0 and x4 = 1, so u2 is in N(A). Moreover, it is an immediate consequence of Eq. (8) that each vector x in N(A) is a linear combination of u1 and u2. Therefore, N(A) = Sp{u1, u2}; that is, {u1, u2} is a spanning set for N(A).
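This construction (set one unconstrained variable to 1 and the others to 0) is exactly what SymPy's nullspace method automates. A sketch, again assuming SymPy:

```python
from sympy import Matrix

A = Matrix([[1, 1, 3, 1],
            [2, 1, 5, 4],
            [1, 2, 4, -1]])

for u in A.nullspace():   # one spanning vector per free variable
    print(u.T)            # [-2, -1, 1, 0] and [-3, 2, 0, 1], i.e. u1 and u2
```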

The remaining subspaces introduced in Section 3.3 were either defined or characterized by a spanning set. If S = {v1, . . . , vr} is a subset of R^n, for instance, then obviously S is a spanning set for Sp(S). If A is an (m × n) matrix,

A = [A1, . . . , An],


then, as we saw in Section 3.3, {A1, . . . , An} is a spanning set for R(A), the range of A. Finally, if

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix},$$

where a_i is the ith row vector of A, then, by definition, {a1, . . . , am} is a spanning set for the row space of A.

Minimal Spanning Sets

If W is a subspace of R^n, W ≠ {θ}, then spanning sets for W abound. For example, a vector v in a spanning set can always be replaced by av, where a is any nonzero scalar. It is easy to demonstrate, however, that not all spanning sets are equally desirable. For example, define u in R^2 by

$$u = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

The set S = {e1, e2, u} is a spanning set for R^2. Indeed, for an arbitrary vector v in R^2,

$$v = \begin{bmatrix} a \\ b \end{bmatrix},$$

v = (a - c)e1 + (b - c)e2 + cu, where c is any real number whatsoever. But the subset {e1, e2} already spans R^2, so the vector u is unnecessary.

Recall that a set {v1, . . . , vm} of vectors in R^n is linearly independent if the vector equation

x1v1 + · · · + xmvm = θ  (9)

has only the trivial solution x1 = · · · = xm = 0; if Eq. (9) has a nontrivial solution, then the set is linearly dependent. The set S = {e1, e2, u} is linearly dependent because

e1 + e2 - u = θ.

Our next example illustrates that a linearly dependent set is not an efficient spanning set; that is, fewer vectors will span the same space.

Example 4 Let S = {v1, v2, v3} be the subset of R^3, where

$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \quad \text{and} \quad v_3 = \begin{bmatrix} 3 \\ 5 \\ 1 \end{bmatrix}.$$

Show that S is a linearly dependent set, and exhibit a subset T of S such that T contains only two vectors but Sp(T) = Sp(S).

Solution The vector equation

x1v1 + x2v2 + x3v3 = θ  (10)


is equivalent to the (3 × 3) homogeneous system of equations with augmented matrix

$$A = \begin{bmatrix} 1 & 2 & 3 & 0 \\ 1 & 3 & 5 & 0 \\ 1 & 1 & 1 & 0 \end{bmatrix}.$$

Matrix A is row equivalent to

$$B = \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

in echelon form. Solving the system with augmented matrix B gives

x1 = x3
x2 = -2x3.

Because Eq. (10) has nontrivial solutions, the set S is linearly dependent. Taking x3 = 1, for example, gives x1 = 1, x2 = -2. Therefore,

v1 - 2v2 + v3 = θ.  (11)

Equation (11) allows us to express v3 as a linear combination of v1 and v2:

v3 = -v1 + 2v2.

(Note that we could just as easily have solved Eq. (11) for either v1 or v2.) It now follows that

Sp{v1, v2} = Sp{v1, v2, v3}.

To illustrate, let v be in the subspace Sp{v1, v2, v3}:

v = a1v1 + a2v2 + a3v3.

Making the substitution v3 = -v1 + 2v2 yields

v = a1v1 + a2v2 + a3(-v1 + 2v2).

This expression simplifies to

v = (a1 - a3)v1 + (a2 + 2a3)v2;

in particular, v is in Sp{v1, v2}. Clearly any linear combination of v1 and v2 is in Sp(S) because

b1v1 + b2v2 = b1v1 + b2v2 + 0v3.

Thus if T = {v1, v2}, then Sp(T) = Sp(S).

The lesson to be drawn from Example 4 is that a linearly dependent spanning set contains redundant information. That is, if S = {w1, . . . , wr} is a linearly dependent spanning set for a subspace W, then at least one vector from S is a linear combination of the other r - 1 vectors and can be discarded from S to produce a smaller spanning set. On the other hand, if B = {v1, . . . , vm} is a linearly independent spanning set for W, then no vector in B is a linear combination of the other m - 1 vectors in B. Hence if a


vector is removed from B, this smaller set cannot be a spanning set for W (in particular, the vector removed from B is in W but cannot be expressed as a linear combination of the vectors retained). In this sense a linearly independent spanning set is a minimal spanning set and hence represents the most efficient way of characterizing the subspace. This idea leads to the following definition.

Definition 4 Let W be a nonzero subspace of R^n. A basis for W is a linearly independent spanning set for W.

Note that the zero subspace of R^n, W = {θ}, contains only the vector θ. Although it is the case that {θ} is a spanning set for W, the set {θ} is linearly dependent. Thus the concept of a basis is not meaningful for W = {θ}.

Uniqueness of Representation

Let B = {v1, v2, . . . , vp} be a basis for a subspace W of R^n, and let x be a vector in W. Because B is a spanning set, we know that there are scalars a1, a2, . . . , ap such that

x = a1v1 + a2v2 + · · · + apvp.  (12)

Because B is also a linearly independent set, we can show that the representation of x in Eq. (12) is unique. That is, if we have any representation of the form x = b1v1 + b2v2 + · · · + bpvp, then a1 = b1, a2 = b2, . . . , ap = bp. To establish this uniqueness, suppose that b1, b2, . . . , bp are any scalars such that

x = b1v1 + b2v2 + · · · + bpvp.

Subtracting the preceding equation from Eq. (12), we obtain

θ = (a1 - b1)v1 + (a2 - b2)v2 + · · · + (ap - bp)vp.

Then, using the fact that {v1, v2, . . . , vp} is linearly independent, we see that a1 - b1 = 0, a2 - b2 = 0, . . . , ap - bp = 0. This discussion of uniqueness leads to the following remark.

Remark Let B = {v1, v2, . . . , vp} be a basis for W, where W is a subspace of R^n. If x is in W, then x can be represented uniquely in terms of the basis B. That is, there are unique scalars a1, a2, . . . , ap such that

x = a1v1 + a2v2 + · · · + apvp.

As we see later, these scalars are called the coordinates of x with respect to the basis.

Examples of Bases

It is easy to show that the unit vectors

$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$


constitute a basis for R^3. In general, the n-dimensional vectors e1, e2, . . . , en form a basis for R^n, frequently called the natural basis.

In Exercise 30, the reader is asked to use Theorem 13 of Section 1.7 to prove that any linearly independent subset B = {v1, v2, v3} of R^3 is actually a basis for R^3. Thus, for example, the vectors

$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

provide another basis for R^3.

In Example 3, a procedure for determining a spanning set for N(A), the null space of a matrix A, was illustrated. Note in Example 3 that the spanning set {u1, u2} obtained is linearly independent, so it is a basis for N(A). Oftentimes, if a subspace W of R^n has an algebraic specification in terms of unconstrained variables, the procedure illustrated in Example 3 yields a basis for W. The next example provides another illustration.

Example 5 Let A be the (3 × 4) matrix given in Example 4 of Section 3.3. Use the algebraic specification of R(A) derived in that example to obtain a basis for R(A).

Solution In Example 4 of Section 3.3, the range of A was determined to be

$$R(A) = \left\{ b : b = \begin{bmatrix} b_1 \\ b_2 \\ 3b_1 - b_2 \end{bmatrix},\ b_1 \text{ and } b_2 \text{ any real numbers} \right\}.$$

Thus b1 and b2 are unconstrained variables, and a vector b in R(A) can be decomposed as

$$b = \begin{bmatrix} b_1 \\ b_2 \\ 3b_1 - b_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ 0 \\ 3b_1 \end{bmatrix} + \begin{bmatrix} 0 \\ b_2 \\ -b_2 \end{bmatrix} = b_1 \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} + b_2 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}. \qquad (13)$$

If u1 and u2 are defined by

$$u_1 = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} \quad \text{and} \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix},$$

then u1 and u2 are in R(A). One can easily check that {u1, u2} is a linearly independent set, and it is evident from Eq. (13) that R(A) is spanned by u1 and u2. Therefore, {u1, u2} is a basis for R(A).
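A subspace has many bases. For instance, SymPy's columnspace method (assuming SymPy, which the text does not use) returns the pivot columns of A itself, a different but equally valid basis for R(A) than the one just obtained from the algebraic specification:

```python
from sympy import Matrix

A = Matrix([[1, 1, 3, 1],
            [2, 1, 5, 4],
            [1, 2, 4, -1]])

for c in A.columnspace():  # the pivot columns of A
    print(c.T)             # [1, 2, 1] and [1, 1, 2]: another basis for R(A)
```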

The previous example illustrates how to obtain a basis for a subspace W, given an algebraic specification for W. The last two examples of this section illustrate two different techniques for constructing a basis for W from a spanning set.


Example 6 Let W be the subspace of R^4 spanned by the set S = {v1, v2, v3, v4, v5}, where

$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \\ -1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 4 \\ -1 \\ 5 \end{bmatrix}, \quad v_4 = \begin{bmatrix} 1 \\ 0 \\ 4 \\ -1 \end{bmatrix}, \quad \text{and} \quad v_5 = \begin{bmatrix} 2 \\ 5 \\ 0 \\ 2 \end{bmatrix}.$$

Find a subset of S that is a basis for W.

Solution The procedure is suggested by Example 4. The idea is to solve the dependence relation

x1v1 + x2v2 + x3v3 + x4v4 + x5v5 = θ  (14)

and then determine which of the vj's can be eliminated. If V is the (4 × 5) matrix

V = [v1, v2, v3, v4, v5],

then the augmented matrix [V | θ] reduces to

$$\begin{bmatrix} 1 & 0 & -2 & 0 & 1 & 0 \\ 0 & 1 & 3 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \qquad (15)$$

The system of equations with augmented matrix (15) has solution

x1 = 2x3 - x5
x2 = -3x3 - 2x5
x4 = x5,  (16)

where x3 and x5 are unconstrained variables. In particular, the set S is linearly dependent. Moreover, taking x3 = 1 and x5 = 0 yields x1 = 2, x2 = -3, and x4 = 0. Thus Eq. (14) becomes

2v1 - 3v2 + v3 = θ.  (17)

Since Eq. (17) can be solved for v3,

v3 = -2v1 + 3v2,

it follows that v3 is redundant and can be removed from the spanning set. Similarly, setting x3 = 0 and x5 = 1 gives x1 = -1, x2 = -2, and x4 = 1. In this case, Eq. (14) becomes

-v1 - 2v2 + v4 + v5 = θ,


and hence

v5 = v1 + 2v2 - v4.

Since both v3 and v5 are in Sp{v1, v2, v4}, it follows (as in Example 4) that v1, v2, and v4 span W.

To see that the set {v1, v2, v4} is linearly independent, note that the dependence relation

x1v1 + x2v2 + x4v4 = θ  (18)

is just Eq. (14) with v3 and v5 removed. Thus the augmented matrix [v1, v2, v4 | θ] for Eq. (18) reduces to

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \qquad (19)$$

which is matrix (15) with the third and fifth columns removed. From matrix (19), it is clear that Eq. (18) has only the trivial solution; so {v1, v2, v4} is a linearly independent set and therefore a basis for W.

The procedure demonstrated in the preceding example can be outlined as follows:

1. A spanning set S = {v1, . . . , vm} for a subspace W is given.
2. Solve the vector equation

   x1v1 + · · · + xmvm = θ.  (20)

3. If Eq. (20) has only the trivial solution x1 = · · · = xm = 0, then S is a linearly independent set and hence is a basis for W.
4. If Eq. (20) has nontrivial solutions, then there are unconstrained variables. For each xj that is designated as an unconstrained variable, delete the vector vj from the set S. The remaining vectors constitute a basis for W.
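In SymPy the unconstrained variables correspond to the non-pivot columns reported by rref, so steps 2–4 reduce to keeping the pivot columns. A sketch, assuming SymPy:

```python
from sympy import Matrix

# v1, ..., v5 of Example 6 as the columns of V
V = Matrix([[1, 1, 1, 1, 2],
            [1, 2, 4, 0, 5],
            [2, 1, -1, 4, 0],
            [-1, 1, 5, -1, 2]])

_, pivots = V.rref()
print(pivots)                      # (0, 1, 3): columns v1, v2, v4
basis = [V[:, j] for j in pivots]  # the pivot columns form the basis
```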

Our final technique for constructing a basis uses Theorem 7.

Theorem 7 If the nonzero matrix A is row equivalent to the matrix B in echelon form, then the nonzero rows of B form a basis for the row space of A.

Proof By Theorem 6, A and B have the same row space. It follows that the nonzero rows of B span the row space of A. Since the nonzero rows of an echelon matrix are linearly independent vectors, it follows that the nonzero rows of B form a basis for the row space of A.

Example 7 Let W be the subspace of R^4 given in Example 6. Use Theorem 7 to construct a basis for W.


Solution As in Example 6, let V be the (4 × 5) matrix

V = [v1, v2, v3, v4, v5].

Thus W can be viewed as the row space of the matrix V^T, where

$$V^T = \begin{bmatrix} 1 & 1 & 2 & -1 \\ 1 & 2 & 1 & 1 \\ 1 & 4 & -1 & 5 \\ 1 & 0 & 4 & -1 \\ 2 & 5 & 0 & 2 \end{bmatrix}.$$

Since V^T is row equivalent to the matrix

$$B^T = \begin{bmatrix} 1 & 0 & 0 & -9 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

in echelon form, it follows from Theorem 7 that the nonzero rows of B^T form a basis for the row space of V^T. Consequently the nonzero columns of

$$B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ -9 & 4 & 2 & 0 & 0 \end{bmatrix}$$

are a basis for W. Specifically, the set {u1, u2, u3} is a basis of W, where

$$u_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -9 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 4 \end{bmatrix}, \quad \text{and} \quad u_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 2 \end{bmatrix}.$$

The procedure used in the preceding example can be summarized as follows:

1. A spanning set S = {v1, . . . , vm} for a subspace W of R^n is given.
2. Let V be the (n × m) matrix V = [v1, . . . , vm]. Use elementary row operations to transform V^T to a matrix B^T in echelon form.
3. The nonzero columns of B are a basis for W.
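The same computation can be done by machine: row-reduce V^T and read off the nonzero rows. A sketch, assuming SymPy:

```python
from sympy import Matrix

V = Matrix([[1, 1, 1, 1, 2],
            [1, 2, 4, 0, 5],
            [2, 1, -1, 4, 0],
            [-1, 1, 5, -1, 2]])

BT, _ = V.T.rref()
print(BT)  # nonzero rows [1,0,0,-9], [0,1,0,4], [0,0,1,2] give u1, u2, u3
```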


3.4 EXERCISES

In Exercises 1–8, let W be the subspace of R^4 consisting of vectors of the form

$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}.$$

Find a basis for W when the components of x satisfy the given conditions.

1. x1 + x2 - x3 = 0
   x2 - x4 = 0

2. x1 + x2 - x3 + x4 = 0
   x2 - 2x3 - x4 = 0

3. x1 - x2 + x3 - 3x4 = 0

4. x1 - x2 + x3 = 0

5. x1 + x2 = 0

6. x1 - x2 = 0
   x2 - 2x3 = 0
   x3 - x4 = 0

7. -x1 + 2x2 - x4 = 0
   x2 + x3 = 0

8. x1 - x2 - x3 + x4 = 0
   x2 + x3 = 0

9. Let W be the subspace described in Exercise 1. For each vector x that follows, determine if x is in W. If x is in W, then express x as a linear combination of the basis vectors found in Exercise 1.

a) x = [1, 1, 2, 1]^T    b) x = [-1, 2, 3, 2]^T
c) x = [3, -3, 0, -3]^T    d) x = [2, 0, 2, 0]^T

10. Let W be the subspace described in Exercise 2. For each vector x that follows, determine if x is in W. If x is in W, then express x as a linear combination of the basis vectors found in Exercise 2.

a) x = [-3, 3, 1, 1]^T    b) x = [0, 3, 2, -1]^T
c) x = [7, 8, 3, 2]^T    d) x = [4, -2, 0, -2]^T

In Exercises 11–16:
a) Find a matrix B in reduced echelon form such that B is row equivalent to the given matrix A.
b) Find a basis for the null space of A.
c) As in Example 6, find a basis for the range of A that consists of columns of A. For each column, A_j, of A that does not appear in the basis, express A_j as a linear combination of the basis vectors.
d) Exhibit a basis for the row space of A.

11. $A = \begin{bmatrix} 1 & 2 & 3 & -1 \\ 3 & 5 & 8 & -2 \\ 1 & 1 & 2 & 0 \end{bmatrix}$    12. $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 2 \\ 2 & 3 & 5 \end{bmatrix}$

13. $A = \begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 5 & 3 & -1 \\ 2 & 2 & 0 & 2 \\ 0 & 1 & 1 & -1 \end{bmatrix}$    14. $A = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 1 & 1 \\ 2 & 3 & 0 \end{bmatrix}$

15. $A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 1 \\ 3 & 6 & 2 \end{bmatrix}$


16. $A = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 2 & 1 \\ 2 & 3 & 0 \end{bmatrix}$

17. Use the technique illustrated in Example 7 to obtain a basis for the range of A, where A is the matrix given in Exercise 11.

18. Repeat Exercise 17 for the matrix given in Exercise 12.

19. Repeat Exercise 17 for the matrix given in Exercise 13.

20. Repeat Exercise 17 for the matrix given in Exercise 14.

In Exercises 21–24, for the given set S:
a) Find a subset of S that is a basis for Sp(S) using the technique illustrated in Example 6.
b) Find a basis for Sp(S) using the technique illustrated in Example 7.

21. S = {[1, 2]^T, [2, 4]^T}

22. S = {[1, 2]^T, [2, 1]^T, [3, 2]^T}

23. S = {[1, 2, 1]^T, [2, 5, 0]^T, [3, 7, 1]^T, [1, 1, 3]^T}

24. S = {[1, 2, -1, 3]^T, [-2, 1, 2, -1]^T, [-1, -1, 1, -3]^T, [-2, 2, 2, 0]^T}

25. Find a basis for the null space of each of the following matrices.

a) $\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}$    b) $\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}$    c) $\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$

26. Find a basis for the range of each matrix in Exercise 25.

27. Let S = {v1, v2, v3}, where

v1 = [1, 2, 1]^T, v2 = [-1, -1, 1]^T, and v3 = [-1, 1, 5]^T.

Show that S is a linearly dependent set, and verify that Sp{v1, v2, v3} = Sp{v1, v2}.

28. Let S = {v1, v2, v3}, where

v1 = [1, 0]^T, v2 = [0, 1]^T, and v3 = [-1, 1]^T.

Find every subset of S that is a basis for R^2.

29. Let S = {v1, v2, v3, v4}, where

v1 = [1, 2, 1]^T, v2 = [-1, -1, 1]^T, v3 = [-1, 1, 7]^T, and v4 = [-2, -4, -4]^T.

Find every subset of S that is a basis for R^3.

30. Let B = {v1, v2, v3} be a set of linearly independent vectors in R^3. Prove that B is a basis for R^3. [Hint: Use Theorem 13 of Section 1.7 to show that B is a spanning set for R^3.]

31. Let B = {v1, v2, v3} be a subset of R^3 such that Sp(B) = R^3. Prove that B is a basis for R^3. [Hint: Use Theorem 13 of Section 1.7 to show that B is a linearly independent set.]

In Exercises 32–35, determine whether the given set S is a basis for R^3.

32. S = {[1, -1, -2]^T, [1, 1, 2]^T, [2, -3, -3]^T}


33. S = {[1, 1, -2]^T, [2, 5, 2]^T, [1, 3, 2]^T}

34. S = {[1, -1, -2]^T, [1, 1, 2]^T, [2, -3, -3]^T, [1, 4, 5]^T}

35. S = {[1, 1, -2]^T, [2, 5, 2]^T}

36. Find a vector w in R^3 such that w is not a linear combination of v1 and v2:

v1 = [1, 2, -1]^T and v2 = [2, -1, -2]^T.

37. Prove that every basis for R^2 contains exactly two vectors. Proceed by showing the following:
a) A basis for R^2 cannot have more than two vectors.
b) A basis for R^2 cannot have one vector. [Hint: Suppose that a basis for R^2 could contain one vector. Represent e1 and e2 in terms of the basis and obtain a contradiction.]

38. Show that any spanning set for R^n must contain at least n vectors. Proceed by showing that if u1, u2, . . . , up are vectors in R^n, and if p < n, then there is a nonzero vector v in R^n such that v^T u_i = 0, 1 ≤ i ≤ p. [Hint: Write the constraints as a (p × n) system and use Theorem 4 of Section 1.3.] Given v as above, can v be a linear combination of u1, u2, . . . , up?

39. Recalling Exercise 38, prove that every basis for R^n contains exactly n vectors.

3.5 DIMENSION

In this section we translate the geometric concept of dimension into algebraic terms. Clearly R^2 and R^3 have dimension 2 and 3, respectively, since these vector spaces are simply algebraic interpretations of two-space and three-space. It would be natural to extrapolate from these two cases and declare that R^n has dimension n for each positive integer n; indeed, we have earlier referred to elements of R^n as n-dimensional vectors. But if W is a subspace of R^n, how is the dimension of W to be determined? An examination of the subspace, W, of R^3 defined by

$$W = \left\{ x : x = \begin{bmatrix} x_2 - 2x_3 \\ x_2 \\ x_3 \end{bmatrix},\ x_2 \text{ and } x_3 \text{ any real numbers} \right\}$$

suggests a possibility. Geometrically, W is the plane with equation x = y - 2z, so naturally the dimension of W is 2. The techniques of the previous section show that W has a basis {v1, v2} consisting of the two vectors

$$v_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}.$$

Thus in this case the dimension of W is equal to the number of vectors in a basis for W.

The Definition of Dimension

More generally, for any subspace W of R^n, we wish to define the dimension of W to be the number of vectors in a basis for W. We have seen, however, that a subspace W


may have many different bases. In fact, Exercise 30 of Section 3.4 shows that any set of three linearly independent vectors in R^3 is a basis for R^3. Therefore, for the concept of dimension to make sense, we must show that all bases for a given subspace W contain the same number of vectors. This fact will be an easy consequence of the following theorem.

Theorem 8 Let W be a subspace of R^n, and let B = {w1, w2, . . . , wp} be a spanning set for W containing p vectors. Then any set of p + 1 or more vectors in W is linearly dependent.

Proof Let {s1, s2, . . . , sm} be any set of m vectors in W, where m > p. To show that this set is linearly dependent, we first express each s_i in terms of the spanning set B:

$$\begin{aligned} s_1 &= a_{11}w_1 + a_{21}w_2 + \cdots + a_{p1}w_p \\ s_2 &= a_{12}w_1 + a_{22}w_2 + \cdots + a_{p2}w_p \\ &\ \ \vdots \\ s_m &= a_{1m}w_1 + a_{2m}w_2 + \cdots + a_{pm}w_p. \end{aligned} \qquad (1)$$

To show that {s1, s2, . . . , sm} is linearly dependent, we must show that there is a nontrivial solution of

c1s1 + c2s2 + · · · + cmsm = θ.  (2)

Now using system (1), we can rewrite Eq. (2) in terms of the vectors in B as

$$c_1(a_{11}w_1 + \cdots + a_{p1}w_p) + c_2(a_{12}w_1 + \cdots + a_{p2}w_p) + \cdots + c_m(a_{1m}w_1 + \cdots + a_{pm}w_p) = \theta. \qquad (3a)$$

Equation (3a) can be regrouped as

$$(c_1a_{11} + c_2a_{12} + \cdots + c_ma_{1m})w_1 + (c_1a_{21} + c_2a_{22} + \cdots + c_ma_{2m})w_2 + \cdots + (c_1a_{p1} + c_2a_{p2} + \cdots + c_ma_{pm})w_p = \theta. \qquad (3b)$$

Now finding c1, c2, . . . , cm to satisfy Eq. (2) is the same as finding c1, c2, . . . , cm to satisfy Eq. (3b). Furthermore, we can clearly satisfy Eq. (3b) if we can choose zero for each coefficient of each w_i. Therefore, to obtain one solution of Eq. (3b), it suffices to solve the system

$$\begin{aligned} a_{11}c_1 + a_{12}c_2 + \cdots + a_{1m}c_m &= 0 \\ a_{21}c_1 + a_{22}c_2 + \cdots + a_{2m}c_m &= 0 \\ &\ \ \vdots \\ a_{p1}c_1 + a_{p2}c_2 + \cdots + a_{pm}c_m &= 0. \end{aligned} \qquad (4)$$

[Recall that each a_{ij} is a specified constant determined by system (1), whereas each c_i is an unknown parameter of Eq. (2).] The homogeneous system in (4) has more unknowns than equations, so by Theorem 4 of Section 1.3 there is a nontrivial solution to system (4). But a solution to system (4) is also a solution to Eq. (2), so Eq. (2) has a nontrivial solution, and the theorem is established.


As an immediate corollary of Theorem 8, we can show that all bases for a subspace contain the same number of vectors.

Corollary Let W be a subspace of R^n, and let B = {w1, w2, . . . , wp} be a basis for W containing p vectors. Then every basis for W contains p vectors.

Proof Let Q = {u1, u2, . . . , ur} be any basis for W. Since Q is a spanning set for W, by Theorem 8 any set of r + 1 or more vectors in W is linearly dependent. Since B is a linearly independent set of p vectors in W, we know that p ≤ r. Similarly, since B is a spanning set of p vectors for W, any set of p + 1 or more vectors in W is linearly dependent. By assumption, Q is a set of r linearly independent vectors in W; so r ≤ p. Now, since we have p ≤ r and r ≤ p, it must be that r = p.

Given that every basis for a subspace contains the same number of vectors, we can make the following definition without any possibility of ambiguity.

Definition 5 Let W be a subspace of R^n. If W has a basis B = {w1, w2, . . . , wp} of p vectors, then we say that W is a subspace of dimension p, and we write dim(W) = p.

In Exercise 30, the reader is asked to show that every nonzero subspace of R^n does have a basis. Thus a value for dimension can be assigned to any subspace of R^n, where for completeness we define dim(W) = 0 if W is the zero subspace.

Since R^3 has a basis {e1, e2, e3} containing three vectors, we see that dim(R^3) = 3. In general, R^n has a basis {e1, e2, . . . , en} that contains n vectors; so dim(R^n) = n. Thus the definition of dimension (the number of vectors in a basis) agrees with the usual terminology; R^3 is three-dimensional, and in general, R^n is n-dimensional.

Example 1 Let W be the subspace of R^3 defined by

$$W = \left\{ x : x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\ x_1 = -2x_3,\ x_2 = x_3,\ x_3 \text{ arbitrary} \right\}.$$

Exhibit a basis for W and determine dim(W).

Solution A vector x in W can be written in the form

$$x = \begin{bmatrix} -2x_3 \\ x_3 \\ x_3 \end{bmatrix} = x_3 \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}.$$

Therefore, the set {u} is a basis for W, where

$$u = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}.$$


It follows that dim(W) = 1. Geometrically, W is the line through the origin and through the point with coordinates (-2, 1, 1), so again the definition of dimension coincides with our geometric intuition.

The next example illustrates the importance of the corollary to Theorem 8.

Example 2 Let W be the subspace of R^3, W = Sp{u1, u2, u3, u4}, where

$$u_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ 4 \\ 0 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix}, \quad \text{and} \quad u_4 = \begin{bmatrix} 2 \\ 5 \\ -2 \end{bmatrix}.$$

Use the techniques illustrated in Examples 5, 6, and 7 of Section 3.4 to find three different bases for W. Give the dimension of W.

Solution

(a) The technique used in Example 5 consisted of finding a basis for W by using the algebraic specification for W. In particular, let b be a vector in R^3:

$$b = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.$$

Then b is in W if and only if the vector equation

x1u1 + x2u2 + x3u3 + x4u4 = b  (5a)

is consistent. The matrix equation for (5a) is Ux = b, where U is the (3 × 4) matrix U = [u1, u2, u3, u4]. Now, the augmented matrix [U | b] is row equivalent to the matrix

$$\begin{bmatrix} 1 & 0 & 1 & -1 & 2a - b \\ 0 & 1 & 1 & 3/2 & -a/2 + b/2 \\ 0 & 0 & 0 & 0 & -4a + 2b + c \end{bmatrix}. \qquad (5b)$$

Thus b is in W if and only if -4a + 2b + c = 0 or, equivalently, c = 4a - 2b. The subspace W can then be described by

$$W = \left\{ b : b = \begin{bmatrix} a \\ b \\ 4a - 2b \end{bmatrix},\ a \text{ and } b \text{ any real numbers} \right\}.$$

From this description it follows that W has a basis {v1, v2}, where

$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}.$$


(b) The technique used in Example 6 consisted of discarding redundant vectors from a spanning set for W. In particular, since {u1, u2, u3, u4} spans W, this technique gives a basis for W that is a subset of {u1, u2, u3, u4}. To obtain such a subset, solve the dependence relation

x1u1 + x2u2 + x3u3 + x4u4 = θ.  (5c)

Note that Eq. (5c) is just Eq. (5a) with b = θ. It is easily seen from matrix (5b) that Eq. (5c) is equivalent to the reduced system

x1 + x3 - x4 = 0
x2 + x3 + (3/2)x4 = 0.  (5d)

Backsolving (5d) yields

x1 = -x3 + x4
x2 = -x3 - (3/2)x4,

where x3 and x4 are arbitrary. Therefore, the vectors u3 and u4 can be deleted from the spanning set for W, leaving {u1, u2} as a basis for W.

(c) Let U be the (3 × 4) matrix whose columns span W, U = [u1, u2, u3, u4]. Following the technique of Example 7, reduce U^T to the matrix

$$C^T = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

in echelon form. In this case the nonzero columns of

$$C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 4 & -2 & 0 & 0 \end{bmatrix}$$

form a basis for W; that is, {w1, w2} is a basis for W, where

$$w_1 = \begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix} \quad \text{and} \quad w_2 = \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}.$$

In each case the basis obtained for W contains two vectors, so dim(W) = 2. Indeed, viewed geometrically, W is the plane with equation -4x + 2y + z = 0.
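Since dim(W) is the number of vectors in any basis, it can be read off as the rank of a matrix whose columns span W. A one-line check, assuming SymPy:

```python
from sympy import Matrix

U = Matrix([[1, 2, 3, 2],
            [1, 4, 5, 5],
            [2, 0, 2, -2]])   # columns are u1, u2, u3, u4

print(U.rank())   # 2 = dim(W), agreeing with all three computations above
```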

Properties of a p-Dimensional Subspace

An important feature of dimension is that a p-dimensional subspace W has many of the same properties as R^p. For example, Theorem 11 of Section 1.7 shows that any set of p + 1 or more vectors in R^p is linearly dependent. The following theorem shows that this same property and others hold in W when dim(W) = p.


Theorem 9 Let W be a subspace of R^n with dim(W) = p.

1. Any set of p + 1 or more vectors in W is linearly dependent.
2. Any set of fewer than p vectors in W does not span W.
3. Any set of p linearly independent vectors in W is a basis for W.
4. Any set of p vectors that spans W is a basis for W.

Proof Property 1 follows immediately from Theorem 8, because dim(W) = p means that W has a basis (and hence a spanning set) of p vectors.

Property 2 is equivalent to the statement that a spanning set for W must contain at least p vectors. Again, this is an immediate consequence of Theorem 8.

To establish property 3, let {u1, u2, . . . , up} be a set of p linearly independent vectors in W. To see that the given set spans W, let v be any vector in W. By property 1, the set {v, u1, u2, . . . , up} is a linearly dependent set of vectors because the set contains p + 1 vectors. Thus there are scalars a0, a1, . . . , ap (not all of which are zero) such that

a0v + a1u1 + a2u2 + · · · + apup = θ.  (6)

In addition, in Eq. (6), a0 cannot be zero because {u1, u2, . . . , up} is a linearly independent set. Therefore, Eq. (6) can be rewritten as

v = (-1/a0)[a1u1 + a2u2 + · · · + apup].  (7)

It is clear from Eq. (7) that any vector in W can be expressed as a linear combination of u1, u2, . . . , up, so the given linearly independent set also spans W. Therefore, the set is a basis.

The proof of property 4 is left as an exercise.

Example 3 Let W be the subspace of R^3 given in Example 2, and let {v1, v2, v3} be the subset of W defined by

$$v_1 = \begin{bmatrix} 1 \\ -1 \\ 6 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \quad \text{and} \quad v_3 = \begin{bmatrix} 2 \\ 1 \\ 6 \end{bmatrix}.$$

Determine which of the subsets {v1}, {v2}, {v1, v2}, {v1, v3}, {v2, v3}, and {v1, v2, v3} is a basis for W.

Solution In Example 2, the subspace W was described as

$$W = \left\{ b : b = \begin{bmatrix} a \\ b \\ 4a - 2b \end{bmatrix},\ a \text{ and } b \text{ any real numbers} \right\}. \qquad (8)$$

Using Eq. (8), we can easily check that v1, v2, and v3 are in W. We saw further in Example 2 that dim(W) = 2. By Theorem 9, property 2, neither of the sets {v1} or {v2} spans W. By Theorem 9, property 1, the set {v1, v2, v3} is linearly dependent. We can easily check that each of the sets {v1, v2}, {v1, v3}, and {v2, v3} is linearly independent, so by Theorem 9, property 3, each is a basis for W.


The Rank of a Matrix

In this subsection we use the concept of dimension to characterize nonsingular matrices and to determine precisely when a system of linear equations Ax = b is consistent. For an (m × n) matrix A, the dimension of the null space is called the nullity of A, and the dimension of the range of A is called the rank of A. The following example will illustrate the relationship between the rank of A and the nullity of A, as well as the relationship between the rank of A and the dimension of the row space of A.

Example 4 Find the rank, nullity, and dimension of the row space for the matrix A, where

$$A = \begin{bmatrix} 1 & 1 & 1 & 2 \\ -1 & 0 & 2 & -3 \\ 2 & 4 & 8 & 5 \end{bmatrix}.$$

Solution To find the dimension of the row space of A, observe that A is row equivalent to the matrix

$$B = \begin{bmatrix} 1 & 0 & -2 & 0 \\ 0 & 1 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

and B is in echelon form. Since the nonzero rows of B form a basis for the row space of A, the row space of A has dimension 3.

To find the nullity of A, we must determine the dimension of the null space. Since the homogeneous system Ax = θ is equivalent to Bx = θ, the null space of A can be determined by solving Bx = θ. This gives

x1 = 2x3
x2 = -3x3
x4 = 0.

Thus N(A) can be described by

$$N(A) = \left\{ x : x = \begin{bmatrix} 2x_3 \\ -3x_3 \\ x_3 \\ 0 \end{bmatrix},\ x_3 \text{ any real number} \right\}.$$

It now follows that the nullity of A is 1 because the vector

$$v = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 0 \end{bmatrix}$$

forms a basis for N(A).

To find the rank of A, we must determine the dimension of the range of A. Recall that R(A), the range of A, equals the column space of A, so a basis for R(A) can be


found by reducing A^T to echelon form. It is straightforward to show that A^T is row equivalent to the matrix C^T, where

$$C^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

The nonzero columns of the matrix C,

$$C = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},$$

form a basis for R(A). Thus the rank of A is 3.
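All three numbers are easy to confirm computationally; note how the rank and the nullity of A sum to the number of columns. A sketch, assuming SymPy:

```python
from sympy import Matrix

A = Matrix([[1, 1, 1, 2],
            [-1, 0, 2, -3],
            [2, 4, 8, 5]])

rank = A.rank()               # 3: dimension of R(A) and of the row space
nullity = len(A.nullspace())  # 1: dimension of N(A)
print(rank, nullity, rank + nullity == A.cols)  # 3 1 True
```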

Note in the previous example that the row space of A is a subspace of R^4, whereas the column space (or range) of A is a subspace of R^3. Thus they are entirely different subspaces; even so, the dimensions are the same, and the next theorem states that this is always the case.

Theorem 10 If A is an (m × n) matrix, then the rank of A is equal to the rank of A^T.

The proof of Theorem 10 will be given at the end of this section. Note that the range of A^T is equal to the column space of A^T. But the column space of A^T is precisely the row space of A, so the following corollary is actually a restatement of Theorem 10.

Corollary If A is an (m × n) matrix, then the row space and the column space of A have the same dimension.

This corollary provides a useful way to determine the rank of a matrix A. Specifically, if A is row equivalent to a matrix B in echelon form, then the number, r, of nonzero rows in B equals the rank of A.

The null space of an (m × n) matrix A is determined by solving the homogeneous system of equations Ax = θ. Suppose the augmented matrix [A | θ] for the system is row equivalent to the matrix [B | θ], which is in echelon form. Then clearly A is row equivalent to B, and the number, r, of nonzero rows of B equals the rank of A. But r is also the number of nonzero rows of [B | θ]. It follows from Theorem 3 of Section 1.3 that there are n - r free variables in a solution for Ax = θ. But the number of vectors in a basis for N(A) equals the number of free variables in the solution for Ax = θ (see Example 3 of Section 3.4); that is, the nullity of A is n - r. Thus we have shown, informally, that the following formula holds.

Remark If A is an (m × n) matrix, then

n = rank(A) + nullity(A).

This remark will be proved formally in a more general context in Chapter 5.


Example 4 illustrates the argument preceding the remark. If A is the matrix given in Example 4,

$$A = \begin{bmatrix} 1 & 1 & 1 & 2 \\ -1 & 0 & 2 & -3 \\ 2 & 4 & 8 & 5 \end{bmatrix},$$

then the augmented matrix [A | θ] is row equivalent to

$$[B \mid \theta] = \begin{bmatrix} 1 & 0 & -2 & 0 & 0 \\ 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$

Since A is row equivalent to B, the corollary to Theorem 10 implies that A has rank 3. Further, in the notation of Theorem 3 of Section 1.3, the system Ax = θ has n = 4 unknowns, and the reduced matrix [B | θ] has r = 3 nonzero rows. Therefore, the solution for Ax = θ has n - r = 4 - 3 = 1 independent variables, and it follows that the nullity of A is 1. In particular,

rank(A) + nullity(A) = 3 + 1 = 4,

as is guaranteed by the remark.

The following theorem uses the concept of the rank of a matrix to establish necessary and sufficient conditions for a system of equations, Ax = b, to be consistent.

Theorem 11 An (m × n) system of linear equations, Ax = b, is consistent if and only if

rank(A) = rank([A | b]).

Proof Suppose that A = [A1, A2, . . . , An]. Then the rank of A is the dimension of the column space of A, that is, the subspace

Sp{A1, A2, . . . , An}.  (9)

Similarly, the rank of [A | b] is the dimension of the subspace

Sp{A1, A2, . . . , An, b}.  (10)

But we already know that Ax = b is consistent if and only if b is in the column space of A. It follows that Ax = b is consistent if and only if the subspaces given in Eq. (9) and Eq. (10) are equal and consequently have the same dimension.
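Theorem 11 translates directly into a computational test. The sketch below assumes SymPy, and `is_consistent` is a hypothetical helper name introduced here for illustration, not something from the text:

```python
from sympy import Matrix

def is_consistent(A, b):
    """Theorem 11: Ax = b is consistent iff rank(A) = rank([A | b])."""
    return A.rank() == A.row_join(b).rank()

# the matrix of Example 4, Section 3.3
A = Matrix([[1, 1, 3, 1],
            [2, 1, 5, 4],
            [1, 2, 4, -1]])
print(is_consistent(A, Matrix([1, 2, 1])))  # True:  b3 = 3*b1 - b2 holds
print(is_consistent(A, Matrix([1, 1, 1])))  # False: 3*1 - 1 = 2 != 1
```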

Our final theorem in this section shows that rank can be used to determine nonsingular matrices.

Theorem 12 An (n × n) matrix A is nonsingular if and only if the rank of A is n.

Proof Suppose that A = [A1, A2, . . . , An]. The proof of Theorem 12 rests on the observation that the range of A is given by

R(A) = Sp{A1, A2, . . . , An}.  (11)

If A is nonsingular then, by Theorem 12 of Section 1.7, the columns of A are linearly independent. Thus {A1, A2, . . . , An} is a basis for R(A), and the rank of A is n.


Conversely, suppose that A has rank n; that is, R(A) has dimension n. It is an immediate consequence of Eq. (11) and Theorem 9, property 4, that {A1, A2, . . . , An} is a basis for R(A). In particular, the columns of A are linearly independent, so, by Theorem 12 of Section 1.7, A is nonsingular.

Proof of Theorem 10 (Optional)

To prove Theorem 10, let A = (a_{ij}) be an (m × n) matrix. Denote the rows of A by a1, a2, . . . , am. Thus,

a_i = [a_{i1}, a_{i2}, . . . , a_{in}].

Similarly, let A1, A2, . . . , An be the columns of A, where

$$A_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}.$$

Suppose that A^T has rank k. Since the columns of A^T are a_1^T, a_2^T, . . . , a_m^T, it follows that if

W = Sp{a1, a2, . . . , am},

then dim(W) = k. Therefore, W has a basis {w1, w2, . . . , wk}, and, by Theorem 9, property 2, m ≥ k. For 1 ≤ j ≤ k, suppose that w_j is the (1 × n) vector

w_j = [w_{j1}, w_{j2}, . . . , w_{jn}].

Writing each a_i in terms of the basis yields

$$\begin{aligned} [a_{11}, a_{12}, \ldots, a_{1n}] = a_1 &= c_{11}w_1 + c_{12}w_2 + \cdots + c_{1k}w_k \\ [a_{21}, a_{22}, \ldots, a_{2n}] = a_2 &= c_{21}w_1 + c_{22}w_2 + \cdots + c_{2k}w_k \\ &\ \ \vdots \\ [a_{m1}, a_{m2}, \ldots, a_{mn}] = a_m &= c_{m1}w_1 + c_{m2}w_2 + \cdots + c_{mk}w_k. \end{aligned} \qquad (12)$$

Equating the jth component of the left side of system (12) with the jth component of the right side yields

$$\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} = w_{1j}\begin{bmatrix} c_{11} \\ c_{21} \\ \vdots \\ c_{m1} \end{bmatrix} + w_{2j}\begin{bmatrix} c_{12} \\ c_{22} \\ \vdots \\ c_{m2} \end{bmatrix} + \cdots + w_{kj}\begin{bmatrix} c_{1k} \\ c_{2k} \\ \vdots \\ c_{mk} \end{bmatrix} \qquad (13)$$

for 1 ≤ j ≤ n. For 1 ≤ i ≤ k, define c_i to be the (m × 1) column vector

$$c_i = \begin{bmatrix} c_{1i} \\ c_{2i} \\ \vdots \\ c_{mi} \end{bmatrix}.$$


Then system (13) becomes

A_j = w_{1j}c_1 + w_{2j}c_2 + · · · + w_{kj}c_k,  1 ≤ j ≤ n.  (14)

It follows from the equations in (14) that

R(A) = Sp{A1, A2, . . . , An} ⊆ Sp{c1, c2, . . . , ck}.

It follows from Theorem 8 that the subspace

V = Sp{c1, c2, . . . , ck}

has dimension k, at most. By Exercise 32, dim[R(A)] ≤ dim(V) ≤ k; that is, rank(A) ≤ rank(A^T).

Since (A^T)^T = A, the same argument implies that rank(A^T) ≤ rank(A). Thus rank(A) = rank(A^T).

3.5 EXERCISES

Exercises 1–14 refer to the vectors in (15).

$$u_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad u_3 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \quad u_4 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad u_5 = \begin{bmatrix} 3 \\ 3 \end{bmatrix},$$

$$v_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad v_4 = \begin{bmatrix} -1 \\ 3 \\ 3 \end{bmatrix}. \qquad (15)$$

In Exercises 1–6, determine by inspection why the givenset S is not a basis for R2. (That is, either S is linearlydependent or S does not span R2.)

1. S = {u1} 2. S = {u2}3. S = {u1, u2, u3} 4. S = {u2, u3, u5}5. S = {u1, u4} 6. S = {u1, u5}

In Exercises 7–9, determine by inspection why the givenset S is not a basis for R3. (That is, either S is linearlydependent or S does not span R3.)

7. S = {v1, v2} 8. S = {v1, v3}9. S = {v1, v2, v3, v4}

In Exercises 10–14, use Theorem 9, property 3, to de-termine whether the given set is a basis for the indicatedvector space.

10. S = {u1, u2} for R2

11. S = {u2, u3} for R2

12. S = {v1, v2, v3} for R3

13. S = {v1, v2, v4} for R3

14. S = {v2, v3, v4} for R3

In Exercises 15–20, W is a subspace of R4 consisting of vectors of the form

x = [x1, x2, x3, x4]T.

Determine dim(W) when the components of x satisfy the given conditions.

15. x1 − 2x2 + x3 − x4 = 0
16. x1 − 2x3 = 0
17. x1 = −x2 + 2x4
    x3 = −x4
18. x1 + x3 − 2x4 = 0
    x2 + 2x3 − 3x4 = 0
19. x1 = −x4
    x2 = 3x4
    x3 = 2x4
20. x1 − x2 = 0
    x2 − 2x3 = 0
    x3 − x4 = 0

In Exercises 21–24, find a basis for N(A) and give the nullity and the rank of A.

21. A = [ 1   2 ]
        [−2  −4 ]

22. A = [−1   2   0 ]
        [ 2  −5   1 ]


23. A = [ 1  −1   3 ]
        [ 2  −1   8 ]
        [−1   4   3 ]

24. A = [ 1   2   0   5 ]
        [ 1   3   1   7 ]
        [ 2   3  −1   9 ]

In Exercises 25 and 26, find a basis for R(A) and give the nullity and the rank of A.

25. A = [ 1   2   1 ]
        [−1   0   3 ]
        [ 1   5   7 ]

26. A = [ 1   1   2   0 ]
        [ 2   4   2   4 ]
        [ 2   1   5  −2 ]

27. Let W be a subspace, and let S be a spanning set for W. Find a basis for W, and calculate dim(W) for each set S.
a) S = {[1, 1, −2]T, [−1, −2, 3]T, [1, 0, −1]T, [2, −1, 0]T}
b) S = {[1, 2, −1, 1]T, [3, 1, 1, 2]T, [−1, 1, −2, 2]T, [0, −2, 1, 2]T}

28. Let W be the subspace of R4 defined by W = {x: vT x = 0}. Calculate dim(W), where

v = [1, 2, −3, −1]T.

29. Let W be the subspace of R4 defined by W = {x: aT x = 0 and bT x = 0 and cT x = 0}. Calculate dim(W) for

a = [1, −1, 0, 0]T,  b = [1, 0, −1, 0]T,  and  c = [0, 1, −1, 0]T.

30. Let W be a nonzero subspace of Rn. Show that W has a basis. [Hint: Let w1 be any nonzero vector in W. If {w1} is a spanning set for W, then we are done. If not, there is a vector w2 in W such that {w1, w2} is linearly independent. Why? Continue by asking whether this is a spanning set for W. Why must this process eventually stop?]

31. Suppose that {u1, u2, . . . , up} is a basis for a subspace W, and suppose that x is in W with x = a1u1 + a2u2 + · · · + apup. Show that this representation for x in terms of the basis is unique—that is, if x = b1u1 + b2u2 + · · · + bpup, then b1 = a1, b2 = a2, . . . , bp = ap.

32. Let U and V be subspaces of Rn, and suppose that U is a subset of V. Prove that dim(U) ≤ dim(V). If dim(U) = dim(V), prove that V is contained in U, and thus conclude that U = V.

33. For each of the following, determine the largest possible value for the rank of A and the smallest possible value for the nullity of A.
a) A is (3 × 3)
b) A is (3 × 4)
c) A is (5 × 4)

34. If A is a (3 × 4) matrix, prove that the columns of A are linearly dependent.

35. If A is a (4 × 3) matrix, prove that the rows of A are linearly dependent.

36. Let A be an (m × n) matrix. Prove that rank(A) ≤ m and rank(A) ≤ n.

37. Let A be a (2 × 3) matrix with rank 2. Show that the (2 × 3) system of equations Ax = b is consistent for every choice of b in R2.

38. Let A be a (3 × 4) matrix with nullity 1. Prove that the (3 × 4) system of equations Ax = b is consistent for every choice of b in R3.


39. Prove that an (n × n) matrix A is nonsingular if and only if the nullity of A is zero.

40. Let A be an (m × m) nonsingular matrix, and let B be an (m × n) matrix. Prove that N(AB) = N(B) and conclude that rank(AB) = rank(B).

41. Prove property 4 of Theorem 9 as follows: Assume that dim(W) = p and let S = {w1, . . . , wp} be a set of p vectors that spans W. To see that S is linearly independent, suppose that c1w1 + · · · + cpwp = θ. If ci ≠ 0, show that W = Sp{w1, . . . , wi−1, wi+1, . . . , wp}. Finally, use Theorem 8 to reach a contradiction.

42. Suppose that S = {u1, u2, . . . , up} is a set of linearly independent vectors in a subspace W, where dim(W) = m and m > p. Prove that there is a vector up+1 in W such that {u1, u2, . . . , up, up+1} is linearly independent. Use this proof to show that a basis including all the vectors in S can be constructed for W.

3.6 ORTHOGONAL BASES FOR SUBSPACES

We have seen that a basis provides a very efficient way to characterize a subspace. Also, given a subspace W, we know that there are many different ways to construct a basis for W. In this section we focus on a particular type of basis called an orthogonal basis.

Orthogonal Bases
The idea of orthogonality is a generalization of the vector geometry concept of perpendicularity. If u and v are two vectors in R2 or R3, then we know that u and v are perpendicular if uT v = 0 (see Theorem 7 in Section 2.3). For example, consider the vectors u and v given by

u = [1, −2]T  and  v = [6, 3]T.

Clearly uT v = 0, and these two vectors are perpendicular when viewed as directed line segments in the plane (see Fig. 3.13).

Figure 3.13 In R2, nonzero vectors u and v are perpendicular if and only if uT v = 0.

In general, for vectors in Rn, we use the term orthogonal rather than the term perpendicular. Specifically, if u and v are vectors in Rn, we say that u and v are orthogonal if

uT v = 0.

We will also find the concept of an orthogonal set of vectors to be useful.


Definition 6 Let S = {u1, u2, . . . , up} be a set of vectors in Rn. The set S is said to be an orthogonal set if each pair of distinct vectors from S is orthogonal; that is, uiT uj = 0 when i ≠ j.

Example 1 Verify that S is an orthogonal set of vectors, where

S = {[1, 0, 1, 2]T, [1, 1, −1, 0]T, [1, −2, −1, 0]T}.

Solution If we use the notation S = {u1, u2, u3}, then

u1T u2 = 1 + 0 − 1 + 0 = 0
u1T u3 = 1 + 0 − 1 + 0 = 0
u2T u3 = 1 − 2 + 1 + 0 = 0.

Therefore, S = {u1, u2, u3} is an orthogonal set of vectors in R4.

An important property of an orthogonal set S is that S is necessarily linearly independent (so long as S does not contain the zero vector).

Theorem 13 Let S = {u1, u2, . . . , up} be a set of nonzero vectors in Rn. If S is an orthogonal set of vectors, then S is a linearly independent set of vectors.

Proof Let c1, c2, . . . , cp be any scalars that satisfy

c1u1 + c2u2 + · · · + cpup = θ.    (1)

Form the scalar product

u1T(c1u1 + c2u2 + · · · + cpup) = u1T θ,

or

c1(u1T u1) + c2(u1T u2) + · · · + cp(u1T up) = 0.

Since u1T uj = 0 for 2 ≤ j ≤ p, the expression above reduces to

c1(u1T u1) = 0.    (2)

Next, because u1T u1 > 0 when u1 is nonzero, we see from Eq. (2) that c1 = 0. Similarly, forming the scalar product of both sides of Eq. (1) with ui, we see that ci(uiT ui) = 0, or ci = 0, for 1 ≤ i ≤ p. Thus S is a linearly independent set of vectors.

By Theorem 13, any orthogonal set S containing p nonzero vectors from a p-dimensional subspace W will be a basis for W (since S is a linearly independent subset of p vectors from W, where dim(W) = p). Such a basis is called an orthogonal basis. In the following definition, recall that the symbol ‖v‖ denotes the length of v, ‖v‖ = √(vT v).

Definition 7 Let W be a subspace of Rn, and let B = {u1, u2, . . . , up} be a basis for W. If B is an orthogonal set of vectors, then B is called an orthogonal basis for W. Furthermore, if ‖ui‖ = 1 for 1 ≤ i ≤ p, then B is said to be an orthonormal basis for W.

The word orthonormal suggests both orthogonal and normalized. Thus an orthonormal basis is an orthogonal basis consisting of vectors having length 1, where a vector of length 1 is a unit vector or a normalized vector. Observe that the unit vectors e1, e2, . . . , en form an orthonormal basis for Rn.

Example 2 Verify that the set B = {v1, v2, v3} is an orthogonal basis for R3, where

v1 = [1, 2, 1]T,  v2 = [3, −1, −1]T,  and  v3 = [1, −4, 7]T.

Solution We first verify that B is an orthogonal set by calculating

v1T v2 = 3 − 2 − 1 = 0
v1T v3 = 1 − 8 + 7 = 0
v2T v3 = 3 + 4 − 7 = 0.

Now, R3 has dimension 3. Thus, since B is a set of three vectors and is also a linearly independent set (see Theorem 13), it follows that B is an orthogonal basis for R3.

These observations are stated formally in the following corollary of Theorem 13.

Page 181: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 24, 2001 14:10 i56-ch03 Sheet number 55 Page number 217 cyan black

3.6 Orthogonal Bases for Subspaces 217

Corollary Let W be a subspace of Rn, where dim(W) = p. If S is an orthogonal set of p nonzero vectors and is also a subset of W, then S is an orthogonal basis for W.

Orthonormal Bases
If B = {u1, u2, . . . , up} is an orthogonal set, then C = {a1u1, a2u2, . . . , apup} is also an orthogonal set for any scalars a1, a2, . . . , ap. If B contains only nonzero vectors and if we define the scalars ai by

ai = 1/√(uiT ui),

then C is an orthonormal set. That is, we can convert an orthogonal set of nonzero vectors into an orthonormal set by dividing each vector by its length.

Example 3 Recall that the set B in Example 2 is an orthogonal basis for R3. Modify B so that it is an orthonormal basis.

Solution Given that B = {v1, v2, v3} is an orthogonal basis for R3, we can modify B to be an orthonormal basis by dividing each vector by its length. In particular (see Example 2), the lengths of v1, v2, and v3 are

‖v1‖ = √6,  ‖v2‖ = √11,  and  ‖v3‖ = √66.

Therefore, the set C = {w1, w2, w3} is an orthonormal basis for R3, where

w1 = (1/√6)v1 = [1/√6, 2/√6, 1/√6]T,
w2 = (1/√11)v2 = [3/√11, −1/√11, −1/√11]T,  and
w3 = (1/√66)v3 = [1/√66, −4/√66, 7/√66]T.
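In MATLAB, the same normalization is one line per vector, since norm(v) computes √(vT v). (This fragment is our illustration, not from the text.)

    v1 = [1; 2; 1];  v2 = [3; -1; -1];  v3 = [1; -4; 7];
    w1 = v1 / norm(v1);   % norm(v1) = sqrt(6)
    w2 = v2 / norm(v2);   % norm(v2) = sqrt(11)
    w3 = v3 / norm(v3);   % norm(v3) = sqrt(66)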

Determining Coordinates
Suppose that W is a p-dimensional subspace of Rn, and B = {w1, w2, . . . , wp} is a basis for W. If v is any vector in W, then v can be written uniquely in the form

v = a1w1 + a2w2 + · · · + apwp.    (3)

(In Eq. (3), the fact that the scalars a1, a2, . . . , ap are unique is proved in Exercise 31 of Section 3.5.) The scalars a1, a2, . . . , ap in Eq. (3) are called the coordinates of v with respect to the basis B.

As we will see, it is fairly easy to determine the coordinates of a vector with respect to an orthogonal basis. To appreciate the savings in computation, consider how coordinates are found when the basis is not orthogonal. For instance, the set B1 = {v1, v2, v3} is a

Page 182: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 24, 2001 14:10 i56-ch03 Sheet number 56 Page number 218 cyan black

218 Chapter 3 The Vector Space Rn

basis for R3, where

v1 = [1, 1, −1]T,  v2 = [−1, 2, 1]T,  and  v3 = [2, −2, 1]T.

As can be seen, v1T v3 ≠ 0, and so B1 is not an orthogonal basis. Next, suppose we wish to express some vector v in R3, say v = [5, −5, −2]T, in terms of B1. We must solve the (3 × 3) system a1v1 + a2v2 + a3v3 = v. In matrix terms the coordinates a1, a2, and a3 are found by solving the equation

[ 1  −1   2 ] [a1]   [ 5]
[ 1   2  −2 ] [a2] = [−5]
[−1   1   1 ] [a3]   [−2].

(By Gaussian elimination, the solution is a1 = 1, a2 = −2, a3 = 1.)

By contrast, if B2 = {w1, w2, w3} is an orthogonal basis for R3, it is easy to determine a1, a2, and a3 so that

v = a1w1 + a2w2 + a3w3.    (4)

To find the coordinate a1 in Eq. (4), we form the scalar product

w1T v = w1T(a1w1 + a2w2 + a3w3)
      = a1(w1T w1) + a2(w1T w2) + a3(w1T w3)
      = a1(w1T w1).

The last equality follows because w1T w2 = 0 and w1T w3 = 0. Therefore, from above,

a1 = (w1T v)/(w1T w1).

Similarly,

a2 = (w2T v)/(w2T w2)  and  a3 = (w3T v)/(w3T w3).

(Note: Since B2 is a basis, wiT wi > 0, 1 ≤ i ≤ 3.)

Example 4 Express the vector v in terms of the orthogonal basis B = {w1, w2, w3}, where

v = [12, −3, 6]T,  w1 = [1, 2, 1]T,  w2 = [3, −1, −1]T,  and  w3 = [1, −4, 7]T.

Solution Beginning with the equation

v = a1w1 + a2w2 + a3w3,


we form scalar products to obtain

w1T v = a1(w1T w1), or 12 = 6a1
w2T v = a2(w2T w2), or 33 = 11a2
w3T v = a3(w3T w3), or 66 = 66a3.

Thus a1 = 2, a2 = 3, and a3 = 1. Therefore, as can be verified directly, v = 2w1 + 3w2 + w3.

In general, let W be a subspace of Rn, and let B = {w1, w2, . . . , wp} be an orthogonal basis for W. If v is any vector in W, then v can be expressed uniquely in the form

v = a1w1 + a2w2 + · · · + apwp,    (5a)

where

ai = (wiT v)/(wiT wi), 1 ≤ i ≤ p.    (5b)
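Eq. (5b) translates directly into a few lines of MATLAB. The following sketch (ours, reusing the data of Example 4) computes all of the coordinates at once; diag(W'*W) collects the values wiT wi:

    W = [1 3 1; 2 -1 -4; 1 -1 7];   % columns are w1, w2, w3
    v = [12; -3; 6];
    a = (W' * v) ./ diag(W' * W)    % Eq. (5b); returns a = [2; 3; 1]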

Constructing an Orthogonal Basis
The next theorem gives a procedure that can be used to generate an orthogonal basis from any given basis. This procedure, called the Gram–Schmidt process, is quite practical from a computational standpoint (although some care must be exercised when programming the procedure for the computer). Generating an orthogonal basis is often the first step in solving problems in least-squares approximation; so Gram–Schmidt orthogonalization is of more than theoretical interest.

Theorem 14 Gram–Schmidt Let W be a p-dimensional subspace of Rn, and let {w1, w2, . . . , wp} be any basis for W. Then the set of vectors {u1, u2, . . . , up} is an orthogonal basis for W, where

u1 = w1
u2 = w2 − [(u1T w2)/(u1T u1)]u1
u3 = w3 − [(u1T w3)/(u1T u1)]u1 − [(u2T w3)/(u2T u2)]u2,

and where, in general,

ui = wi − Σk=1,...,i−1 [(ukT wi)/(ukT uk)]uk,  2 ≤ i ≤ p.    (6)

The proof of Theorem 14 is somewhat technical, and we defer it to the end of this section.

In Eq. (6) we have explicit expressions that can be used to generate an orthogonal set of vectors {u1, u2, . . . , up} from a given set of linearly independent vectors. These


explicit expressions are especially useful if we have reason to implement the Gram–Schmidt process on a computer.
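For instance, Eq. (6) can be transcribed almost line for line into a MATLAB function. The sketch below is our own illustration, assumes the columns of W are linearly independent, and is not numerically robust; for serious work a routine such as the orth command used in Example 7 is preferable.

    function U = gramschmidt(W)
    % GRAMSCHMIDT  A direct transcription of Eq. (6).
    %   The columns of W are assumed linearly independent; the columns
    %   of U are the orthogonal vectors u1, ..., up.
    [n, p] = size(W);
    U = zeros(n, p);
    U(:, 1) = W(:, 1);
    for i = 2:p
        ui = W(:, i);
        for k = 1:i-1
            uk = U(:, k);
            % subtract the component of wi along uk, as in Eq. (6)
            ui = ui - ((uk' * W(:, i)) / (uk' * uk)) * uk;
        end
        U(:, i) = ui;
    end
    end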

However, for hand calculations, it is not necessary to memorize formula (6). All we need to remember is the form or the general pattern of the Gram–Schmidt process. In particular, the Gram–Schmidt process starts with a basis {w1, w2, . . . , wp} and generates new vectors u1, u2, u3, . . . according to the following pattern:

u1 = w1
u2 = w2 + au1
u3 = w3 + bu1 + cu2
u4 = w4 + du1 + eu2 + fu3
. . .
ui = wi + α1u1 + α2u2 + · · · + αi−1ui−1
. . .

In this sequence, the scalars can be determined in a step-by-step fashion from the orthogonality conditions.

For instance, to determine the scalar a in the definition of u2, we use the condition u1T u2 = 0:

0 = u1T u2 = u1T w2 + a(u1T u1);
therefore, a = −(u1T w2)/(u1T u1).    (7)

To determine the two scalars b and c in the definition of u3, we use the two conditions u1T u3 = 0 and u2T u3 = 0. In particular,

0 = u1T u3 = u1T w3 + b(u1T u1) + c(u1T u2)
           = u1T w3 + b(u1T u1)  (since u1T u2 = 0 by Eq. (7)).
Therefore, b = −(u1T w3)/(u1T u1).

Similarly,

0 = u2T u3 = u2T w3 + b(u2T u1) + c(u2T u2)
           = u2T w3 + c(u2T u2)  (since u2T u1 = 0 by Eq. (7)).
Therefore, c = −(u2T w3)/(u2T u2).

The examples that follow illustrate the previous calculations.

Finally, to use the Gram–Schmidt orthogonalization process to find an orthogonal basis for W, we need some basis for W as a starting point. In many of the applications that require an orthogonal basis for a subspace W, it is relatively easy to produce this initial basis—we will give some examples in a later section. Given a basis for W, the Gram–Schmidt process proceeds in a mechanical fashion using Eq. (6). (Note: It was shown in Exercise 30 of Section 3.5 that every nonzero subspace of Rn has a basis. Therefore, by Theorem 14, every nonzero subspace of Rn has an orthogonal basis.)


Example 5 Let W be the subspace of R3 defined by W = Sp{w1, w2}, where

w1 = [1, 1, 2]T  and  w2 = [0, 2, −4]T.

Use the Gram–Schmidt process to construct an orthogonal basis for W .

Solution We define vectors u1 and u2 of the form

u1 = w1
u2 = w2 + au1,

where the scalar a is found from the condition u1T u2 = 0. Now, u1 = [1, 1, 2]T and thus u1T u2 is given by

u1T u2 = u1T(w2 + au1) = u1T w2 + a(u1T u1) = −6 + 6a.

Therefore, to have u1T u2 = 0, we need a = 1. With a = 1, u2 is given by u2 = w2 + u1 = [1, 3, −2]T.

In detail, an orthogonal basis for W is B = {u1, u2}, where

u1 = [1, 1, 2]T  and  u2 = [1, 3, −2]T.

For convenience in hand calculations, we can always eliminate fractional components in a set of orthogonal vectors. Specifically, if x and y are orthogonal, then so are ax and y for any scalar a:

If xT y = 0, then (ax)T y = a(xT y) = 0.

We will make use of this observation in the following example.

Example 6 Use the Gram–Schmidt orthogonalization process to generate an orthogonal basis for W = Sp{w1, w2, w3}, where

w1 = [0, 1, 2, 1]T,  w2 = [0, 1, 3, 1]T,  and  w3 = [1, 1, 1, 0]T.

Solution First we should check to be sure that {w1, w2, w3} is a linearly independent set. A calculation shows that the vectors are linearly independent. (Exercise 27 illustrates what happens when the Gram–Schmidt algorithm is applied to a linearly dependent set.)

To generate an orthogonal basis {u1, u2, u3} from {w1, w2, w3}, we first set

u1 = w1
u2 = w2 + au1
u3 = w3 + bu1 + cu2.


With u1 = [0, 1, 2, 1]T, the orthogonality condition u1T u2 = 0 leads to u1T w2 + a(u1T u1) = 0, or 8 + 6a = 0. Therefore, a = −4/3 and hence

u2 = w2 − (4/3)u1 = [0, −1/3, 1/3, −1/3]T.

Next, the conditions u1T u3 = 0 and u2T u3 = 0 lead to

0 = u1T(w3 + bu1 + cu2) = 3 + 6b
0 = u2T(w3 + bu1 + cu2) = 0 + (1/3)c.

Therefore, b = −1/2 and c = 0. Having the scalars b and c,

u3 = w3 − (1/2)u1 − (0)u2 = [1, 1/2, 0, −1/2]T.

For convenience, we can eliminate the fractional components in u2 and u3 and obtain an orthogonal basis {v1, v2, v3}, where

v1 = [0, 1, 2, 1]T,  v2 = [0, −1, 1, −1]T,  and  v3 = [2, 1, 0, −1]T.

(Note: In Example 6, we could have also eliminated fractional components in the middle of the Gram–Schmidt process. That is, we could have redefined u2 to be the vector u2 = [0, −1, 1, −1]T and then calculated u3 with this new, redefined multiple of u2.)
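As a check, applying the gramschmidt function sketched after Theorem 14 to the data of Example 6 reproduces the vectors computed by hand:

    W = [0 0 1; 1 1 1; 2 3 1; 1 1 0];   % columns are w1, w2, w3
    U = gramschmidt(W);
    % U(:,2) = [0; -1/3; 1/3; -1/3] and U(:,3) = [1; 1/2; 0; -1/2], as above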

As a final example, we use MATLAB to construct orthogonal bases.

Example 7 Let A be the (3 × 5) matrix

A = [ 1   2   1   3   2 ]
    [ 4   1   0   6   1 ]
    [ 1   1   2   4   5 ].

Find an orthogonal basis for R(A) and an orthogonal basis for N(A).

Solution The MATLAB command orth(A) gives an orthonormal basis for the range of A. The command null(A) gives an orthonormal basis for the null space of A. The results are shown in Fig. 3.14. Observe that the basis for R(A) has three vectors; that is, the dimension of R(A) is three or, equivalently, A has rank three. The basis for N(A) has two vectors; that is, the dimension of N(A) is two or, equivalently, A has nullity two.

A =
     1     2     1     3     2
     4     1     0     6     1
     1     1     2     4     5

>> orth(A)
ans =
    0.3841   -0.1173   -0.9158
    0.7682    0.5908    0.2466
    0.5121   -0.7983    0.3170

>> null(A)
ans =
   -0.7528   -0.0690
   -0.2063    0.1800
   -0.1069   -0.9047
    0.5736   -0.0469
   -0.2243    0.3772

Figure 3.14 The MATLAB command orth(A) produces an orthonormal basis for the range of A. The command null(A) gives an orthonormal basis for the null space of A.

Proof of Theorem 14 (Optional)
We first show that the expression given in Eq. (6) is always defined and that the vectors u1, u2, . . . , up are all nonzero. To begin, u1 is a nonzero vector since u1 = w1. Thus u1T u1 > 0, and so we can define u2. Furthermore, we observe that u2 has the form u2 = w2 − bu1 = w2 − b1w1; so u2 is nonzero since it is a nontrivial linear combination of w1 and w2. Proceeding inductively, suppose that u1, u2, . . . , ui−1 have been generated by Eq. (6); and suppose that each uk has the form

uk = wk − c1w1 − c2w2 − · · · − ck−1wk−1.

From this equation, each uk is nonzero; and it follows that Eq. (6) is a well-defined expression [since ukT uk > 0 for 1 ≤ k ≤ (i − 1)]. Finally, since each uk in Eq. (6) is a linear combination of w1, w2, . . . , wk, we see that ui is a nontrivial linear combination of w1, w2, . . . , wi; and therefore ui is nonzero.

All that remains to be proved is that the vectors generated by Eq. (6) are orthogonal. Clearly u1T u2 = 0. Proceeding inductively again, suppose that ujT uk = 0 for any j and k, where j ≠ k and 1 ≤ j, k ≤ i − 1. From (6) we have

ujT ui = ujT(wi − Σk=1,...,i−1 [(ukT wi)/(ukT uk)]uk)
       = ujT wi − Σk=1,...,i−1 [(ukT wi)/(ukT uk)](ujT uk)
       = ujT wi − [(ujT wi)/(ujT uj)](ujT uj) = 0.

Thus ui is orthogonal to uj for 1 ≤ j ≤ i − 1. Having this result, we have shown that {u1, u2, . . . , up} is an orthogonal set of p nonzero vectors. So, by the corollary of Theorem 13, the vectors u1, u2, . . . , up are an orthogonal basis for W.


3.6 EXERCISES

In Exercises 1–4, verify that {u1, u2, u3} is an orthogonal set for the given vectors.

1. u1 = [1, 1, 1]T, u2 = [−1, 0, 1]T, u3 = [−1, 2, −1]T
2. u1 = [1, 0, 1]T, u2 = [−1, 0, 1]T, u3 = [0, 1, 0]T
3. u1 = [1, 1, 2]T, u2 = [2, 0, −1]T, u3 = [1, −5, 2]T
4. u1 = [2, 1, 2]T, u2 = [1, 2, −2]T, u3 = [−2, 2, 1]T

In Exercises 5–8, find values a, b, and c such that {u1, u2, u3} is an orthogonal set.

5. u1 = [1, 1, 1]T, u2 = [2, 2, −4]T, u3 = [a, b, c]T
6. u1 = [2, 0, 1]T, u2 = [1, 1, −2]T, u3 = [a, b, c]T
7. u1 = [1, 1, 1]T, u2 = [−2, −1, a]T, u3 = [4, b, c]T
8. u1 = [2, 1, −1]T, u2 = [a, 1, −1]T, u3 = [b, 3, c]T

In Exercises 9–12, express the given vector v in terms of the orthogonal basis B = {u1, u2, u3}, where u1, u2, and u3 are as in Exercise 1.

9. v = [1, 1, 0]T
10. v = [0, 1, 2]T
11. v = [3, 3, 3]T
12. v = [1, 2, 1]T

In Exercises 13–18, use the Gram–Schmidt process to generate an orthogonal set from the given linearly independent vectors.

13. [0, 0, 1, 0]T, [1, 1, 2, 1]T, [1, 0, 1, 1]T
14. [1, 0, 1, 2]T, [2, 1, 0, 2]T, [1, −1, 0, 1]T
15. [1, 1, 0]T, [0, 2, 1]T, [1, 1, 6]T
16. [0, 1, 2]T, [3, 6, 2]T, [10, −5, 5]T
17. [0, 1, 0, 1]T, [1, 2, 0, 0]T, [0, 2, 1, 0]T
18. [1, 1, 0, 2]T, [0, 2, 1, 2]T, [0, 1, 0, 2]T

In Exercises 19 and 20, find a basis for the null space and the range of the given matrix. Then use Gram–Schmidt to obtain orthogonal bases.

19. [ 1  −2   1  −5 ]
    [ 2   1   7   5 ]
    [ 1  −1   2  −2 ]


20. [ 1   3  10  11   9 ]
    [−1   2   5   4   1 ]
    [ 2  −1  −1   1   4 ]

21. Argue that any set of four or more nonzero vectors in R3 cannot be an orthogonal set.

22. Let S = {u1, u2, u3} be an orthogonal set of nonzero vectors in R3. Define the (3 × 3) matrix A by A = [u1, u2, u3]. Show that A is nonsingular and ATA = D, where D is a diagonal matrix. Calculate the diagonal matrix D when A is created from the orthogonal vectors in Exercise 1.

23. Let W be a p-dimensional subspace of Rn. If v is a vector in W such that vT w = 0 for every w in W, show that v = θ. [Hint: Consider w = v.]

24. The Cauchy–Schwarz inequality. Let x and y be vectors in Rn. Prove that |xT y| ≤ ‖x‖‖y‖. [Hint: Observe that ‖x − cy‖² ≥ 0 for any scalar c. If y ≠ θ, let c = xT y/yT y and expand (x − cy)T(x − cy) ≥ 0. Also treat the case y = θ.]

25. The triangle inequality. Let x and y be vectors in Rn. Prove that ‖x + y‖ ≤ ‖x‖ + ‖y‖. [Hint: Expand ‖x + y‖² and use Exercise 24.]

26. Let x and y be vectors in Rn. Prove that |‖x‖ − ‖y‖| ≤ ‖x − y‖. [Hint: For one part consider ‖x + (y − x)‖ and Exercise 25.]

27. If the hypotheses for Theorem 14 were altered so that {w1, . . . , wp−1} is linearly independent and {w1, . . . , wp} is linearly dependent, use Exercise 23 to show that Eq. (6) yields up = θ.

28. Let B = {u1, u2, . . . , up} be an orthonormal basis for a subspace W. Let v be any vector in W, where v = a1u1 + a2u2 + · · · + apup. Show that

‖v‖² = a1² + a2² + · · · + ap².

3.7 LINEAR TRANSFORMATIONS FROM Rn TO Rm

In this section we consider a special class of functions, called linear transformations, that map vectors to vectors. As we will presently observe, linear transformations arise naturally as a generalization of matrices. Moreover, linear transformations have important applications in engineering science, the social sciences, and various branches of mathematics.

The notation for linear transformations follows the usual notation for functions. If V is a subspace of Rn and W is a subspace of Rm, then the notation

F: V → W

will denote a function, F, whose domain is the subspace V and whose range is contained in W. Furthermore, for v in V we write

w = F(v)

to indicate that F maps v to w. To illustrate, let F: R3 → R2 be defined by

F(x) = [x1 − x2, x2 + x3]T,  where x = [x1, x2, x3]T.


In this case if, for example, v is the vector v = [1, 2, 3]T, then F(v) = w, where w = [−1, 5]T.

In earlier sections we have seen that an (m × n) matrix A determines a function from Rn to Rm. Specifically, for x in Rn, the formula

T(x) = Ax    (1)

defines a function T: Rn → Rm. To illustrate, let A be the (3 × 2) matrix

A = [ 1  −1 ]
    [ 0   2 ]
    [ 3   1 ].

In this case Eq. (1) defines a function T: R2 → R3, and the formula for T is

T(x) = T([x1, x2]T) = [x1 − x2, 2x2, 3x1 + x2]T;

for instance,

T([1, 1]T) = [0, 2, 4]T.

Returning to the general case in which A is an (m × n) matrix, note that the function T defined by Eq. (1) satisfies the following linearity properties:

T(v + w) = A(v + w) = Av + Aw = T(v) + T(w)
T(cv) = A(cv) = cAv = cT(v),    (2)

where v and w are any vectors in Rn and c is an arbitrary scalar. We next define a linear transformation to be a function that satisfies the two linearity properties given in Eq. (2).

Definition 8 Let V and W be subspaces of Rn and Rm, respectively, and let T be a function from V to W, T: V → W. We say that T is a linear transformation if for all u and v in V and for all scalars a,

T(u + v) = T(u) + T(v)

and

T(au) = aT(u).    (3)


It is apparent from Eq. (2) that the function T defined in Eq. (1) by matrix multiplication is a linear transformation. Conversely, if T: Rn → Rm is a linear transformation, then (see Theorem 15 later in this section) there is an (m × n) matrix A such that T is defined by Eq. (1). Thus linear transformations from Rn to Rm are precisely those functions that can be defined by matrix multiplication as in Eq. (1). The situation is not so simple for linear transformations on arbitrary vector spaces or even for linear transformations on subspaces of Rn. Thus the concept of a linear transformation is a convenient and useful generalization, to arbitrary subspaces, of matrix functions defined as in Eq. (1).

Examples of Linear Transformations
Most of the familiar functions from the reals to the reals are not linear transformations. For example, none of the functions

f(x) = x + 1,  g(x) = x²,  h(x) = sin x,  k(x) = e^x

is a linear transformation. Indeed, it will follow from the exercises that a function f: R → R is a linear transformation if and only if f is defined by f(x) = ax for some scalar a.

We now give several examples to illustrate the use of Definition 8 in verifying whether a function is or is not a linear transformation.

Example 1 Let F: R3 → R2 be the function defined by

F(x) = [x1 − x2, x2 + x3]T,  where x = [x1, x2, x3]T.

Determine whether F is a linear transformation.

Solution We must determine whether the two linearity properties in Eq. (3) are satisfied by F. Thus let u and v be in R3,

u = [u1, u2, u3]T  and  v = [v1, v2, v3]T,

and let c be a scalar. Then

u + v = [u1 + v1, u2 + v2, u3 + v3]T.

Therefore, from the rule defining F,

F(u + v) = [(u1 + v1) − (u2 + v2), (u2 + v2) + (u3 + v3)]T
         = [u1 − u2, u2 + u3]T + [v1 − v2, v2 + v3]T
         = F(u) + F(v).


Similarly,

F(cu) = [cu1 − cu2, cu2 + cu3]T = c[u1 − u2, u2 + u3]T = cF(u),

so F is a linear transformation.
Note that F can also be defined as F(x) = Ax, where A is the (2 × 3) matrix

A = [ 1  −1   0 ]
    [ 0   1   1 ].

Example 2 Define H: R2 → R2 by

H(x) = [x1 − x2 + 1, 3x2]T,  where x = [x1, x2]T.

Determine whether H is a linear transformation.

Solution Let u and v be in R2:

u = [u1, u2]T  and  v = [v1, v2]T.

Then

H(u + v) = [(u1 + v1) − (u2 + v2) + 1, 3(u2 + v2)]T,

while

H(u) + H(v) = [u1 − u2 + 1, 3u2]T + [v1 − v2 + 1, 3v2]T
            = [(u1 + v1) − (u2 + v2) + 2, 3(u2 + v2)]T.

Thus we see that H(u + v) ≠ H(u) + H(v). Therefore, H is not a linear transformation. Although it is not necessary, it can also be verified easily that if c ≠ 1, then H(cu) ≠ cH(u).
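A single numerical counterexample is enough to disqualify H. In MATLAB (an illustrative fragment of ours, with arbitrary test vectors):

    H = @(x) [x(1) - x(2) + 1; 3*x(2)];
    u = [1; 0];  v = [0; 1];
    H(u + v) - (H(u) + H(v))   % = [-1; 0], not the zero vector, so H is not linear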

Example 3 Let W be a subspace of Rn such that dim(W) = p, and let S = {w1, w2, . . . , wp} be an orthonormal basis for W. Define T: Rn → W by

T(v) = (vT w1)w1 + (vT w2)w2 + · · · + (vT wp)wp.    (4)

Prove that T is a linear transformation.

Solution If u and v are in Rn, then

T(u + v) = [(u + v)T w1]w1 + [(u + v)T w2]w2 + · · · + [(u + v)T wp]wp
         = [(uT + vT)w1]w1 + [(uT + vT)w2]w2 + · · · + [(uT + vT)wp]wp
         = (uT w1)w1 + (uT w2)w2 + · · · + (uT wp)wp
           + (vT w1)w1 + (vT w2)w2 + · · · + (vT wp)wp
         = T(u) + T(v).


It can be shown similarly that T(cu) = cT(u) for each scalar c, so T is a linear transformation.

The vector T(v) defined by Eq. (4) is called the orthogonal projection of v onto W and will be considered further in Sections 3.8 and 3.9. As a specific illustration of Example 3, let W be the subspace of R3 consisting of all vectors of the form

x = [x1, x2, 0]T.

Thus W is the xy-plane, and the set {e1, e2} is an orthonormal basis for W. For x = [x1, x2, x3]T in R3, the formula in Eq. (4) yields

T(x) = (xT e1)e1 + (xT e2)e2 = x1e1 + x2e2.

Thus,

T(x) = [x1, x2, 0]T.

This transformation is illustrated geometrically by Fig. 3.15.

Figure 3.15 Orthogonal projection onto the xy-plane
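A short MATLAB version of this projection (our sketch; the sample vector x is arbitrary) applies Eq. (4) with the orthonormal basis {e1, e2}:

    e1 = [1; 0; 0];  e2 = [0; 1; 0];
    x = [3; -2; 5];                          % any vector in R3
    Tx = (x' * e1) * e1 + (x' * e2) * e2     % Eq. (4); Tx = [3; -2; 0]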

Example 4 Let W be a subspace of Rn, and let a be a scalar. Define T: W → W by T(w) = aw. Demonstrate that T is a linear transformation.

Solution If v and w are in W , then

T (v + w) = a(v + w) = av + aw = T (v)+ T (w).


Likewise, if c is a scalar, then

T(cw) = a(cw) = c(aw) = cT(w).

It follows that T is a linear transformation.

The linear transformation defined in Example 4 is called a dilation when a > 1 and a contraction when 0 < a < 1. These cases are illustrated geometrically in Fig. 3.16.

Figure 3.16 Dilations and contractions: (a) a > 1, dilation; (b) 0 < a < 1, contraction

The mapping I: W → W defined by I(w) = w is the special case of Example 4 in which a = 1. The linear transformation I is called the identity transformation.

Example 5 Let W be a subspace of Rn, and let θ be the zero vector in Rm. Define T: W → Rm by T(w) = θ for each w in W. Show that T is a linear transformation.

Solution Let v and w be vectors in W, and let c be a scalar. Then

T(v + w) = θ = θ + θ = T(v) + T(w)

and

T(cv) = θ = cθ = cT(v),

so T is a linear transformation.

The linear transformation T defined in Example 5 is called the zero transformation.

Later in this section we will consider other examples when we study a particular class of linear transformations from R2 to R2. For the present, we turn to further properties of linear transformations.

The Matrix of a Transformation
Let V and W be subspaces, and let T: V → W be a linear transformation. If u and v are vectors in V and if a and b are scalars, then the linearity properties (3) yield

T(au + bv) = T(au) + T(bv) = aT(u) + bT(v).    (5)


Inductively we can extend Eq. (5) to any finite subset of V. That is, if v1, v2, . . . , vr are vectors in V and if c1, c2, . . . , cr are scalars, then

T(c1v1 + c2v2 + · · · + crvr) = c1T(v1) + c2T(v2) + · · · + crT(vr).    (6)

The following example illustrates an application of Eq. (6).

Example 6 Let W be the subspace of R3 defined by

W = {x: x = [x2 + 2x3, x2, x3]T, x2 and x3 any real numbers}.

Then {w1, w2} is a basis for W, where

w1 = [1, 1, 0]T  and  w2 = [2, 0, 1]T.

Suppose that T: W → R2 is a linear transformation such that T(w1) = u1 and T(w2) = u2, where

u1 = [1, 1]T  and  u2 = [1, −1]T.

Let the vector w be given by

w = [−1, 3, −2]T.

Show that w is in W, express w as a linear combination of w1 and w2, and use Eq. (6) to determine T(w).

Solution It follows from the description of W that w is in W. Furthermore, it is easy to see that

w = 3w1 − 2w2.

By Eq. (6),

T(w) = 3T(w1) − 2T(w2) = 3u1 − 2u2 = 3[1, 1]T − 2[1, −1]T.

Thus,

T(w) = [1, 5]T.

Example 6 illustrates that the action of a linear transformation T on a subspace W is completely determined once the action of T on a basis for W is known. Our next example provides yet another illustration of this fact.


Example 7 Let T: R3 → R2 be a linear transformation such that

T(e1) = [1, 2]T,  T(e2) = [−1, 1]T,  and  T(e3) = [2, 3]T.

For an arbitrary vector x = [x1, x2, x3]T in R3, give a formula for T(x).

Solution The vector x can be written in the form

x = x1e1 + x2e2 + x3e3,

so by Eq. (6),

T(x) = x1T(e1) + x2T(e2) + x3T(e3).    (7)

Thus,

T(x) = x1[1, 2]T + x2[−1, 1]T + x3[2, 3]T = [x1 − x2 + 2x3, 2x1 + x2 + 3x3]T.

Continuing with the notation of the preceding example, let A be the (2 × 3) matrix with columns T(e1), T(e2), T(e3); thus,

A = [T(e1), T(e2), T(e3)] = [ 1  −1   2 ]
                            [ 2   1   3 ].

It is an immediate consequence of Eq. (7) and Theorem 5 of Section 1.5 that T(x) = Ax. Thus Example 7 illustrates the following theorem.

Theorem 15 Let T: Rn → Rm be a linear transformation, and let e1, e2, . . . , en be the unit vectors in Rn. If A is the (m × n) matrix defined by

A = [T(e1), T(e2), . . . , T(en)],

then T(x) = Ax for all x in Rn.

Proof If x is a vector in Rn, x = [x1, x2, . . . , xn]T, then x can be expressed in the form

x = x1e1 + x2e2 + · · · + xnen.


It now follows from Eq. (6) that

T(x) = x1T(e1) + x2T(e2) + · · · + xnT(en).    (8)

If A = [T(e1), T(e2), . . . , T(en)], then by Theorem 5 of Section 1.5, the right-hand side of Eq. (8) is simply Ax. Thus Eq. (8) is equivalent to T(x) = Ax.

Example 8 Let T: R2 → R3 be the linear transformation defined by the formula

T([x1, x2]T) = [x1 + 2x2, −x1 + x2, 2x1 − x2]T.

Find a matrix A such that T (x) = Ax for each x in R2.

Solution By Theorem 15, A is the (3 × 2) matrix

A = [T(e1), T(e2)].

It is an easy calculation that

T(e1) = [1, −1, 2]T  and  T(e2) = [2, 1, −1]T.

Therefore,

A = [ 1   2 ]
    [−1   1 ]
    [ 2  −1 ].

One can easily verify that T(x) = Ax for each x in R2.
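Theorem 15 also gives a mechanical way to recover A from any rule for T. A brief MATLAB sketch (ours) using the transformation of Example 8:

    T = @(x) [x(1) + 2*x(2); -x(1) + x(2); 2*x(1) - x(2)];
    A = [T([1; 0]), T([0; 1])]   % columns T(e1), T(e2); A = [1 2; -1 1; 2 -1]
    % For every x in R2, T(x) equals A*x.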

Null Space and Range
Associated with a linear transformation, T, are two important and useful subspaces called the null space and the range of T. These are defined as follows.

Definition 9 Let V and W be subspaces, and let T: V → W be a linear transformation. The null space of T, denoted by N(T), is the subset of V given by

N(T) = {v: v is in V and T(v) = θ}.

The range of T, denoted by R(T), is the subset of W defined by

R(T) = {w: w is in W and w = T(v) for some v in V}.

That N(T) and R(T) are subspaces will be proved in the more general context of Chapter 5. If T maps Rn into Rm, then by Theorem 15 there exists an (m × n) matrix


A such that T(x) = Ax. In this case it is clear that the null space of T is the null space of A and the range of T coincides with the range of A.

As with matrices, the dimension of the null space of a linear transformation T is called the nullity of T, and the dimension of the range of T is called the rank of T. If T is defined by matrix multiplication, T(x) = Ax, then the transformation T and the matrix A have the same nullity and the same rank. Moreover, if T: Rn → Rm, then A is an (m × n) matrix, so it follows from the remark in Section 3.5 that

rank(T) + nullity(T) = n.    (9)

Formula (9) will be proved in a more general setting in Chapter 5.

The next two examples illustrate the use of the matrix of T to determine the null space and the range of T.

Example 9 Let F be the linear transformation given in Example 1, F: R3 → R2. Describe the null space and the range of F, and determine the nullity and the rank of F.

Solution It follows from Theorem 15 that F(x) = Ax, where A is the (2 × 3) matrix

A = [F(e1), F(e2), F(e3)] = [ 1  −1   0 ]
                            [ 0   1   1 ].

Thus the null space and the range of F coincide, respectively, with the null space and the range of A. The null space of A is determined by backsolving the homogeneous system Ax = θ, where x = [x1, x2, x3]T. This gives

N(F) = N(A) = {x: x1 = −x3 and x2 = −x3}.

Using the techniques of Section 3.4, we can easily see that the vector

u = [−1, −1, 1]T

is a basis for N(F), so F has nullity 1. By Eq. (9),

rank(F) = n − nullity(F) = 3 − 1 = 2.

Thus R(F) is a two-dimensional subspace of R2, and hence R(F) = R2. Alternatively, note that the system of equations Ax = b has a solution for each b in R2, so R(F) = R(A) = R2.

Example 10 Let T: R2 → R3 be the linear transformation given in Example 8. Describe the null space and the range of T, and determine the nullity and the rank of T.


Solution In Example 8 it was shown that T(x) = Ax, where A is the (3 × 2) matrix

A = [ 1   2 ]
    [−1   1 ]
    [ 2  −1 ].

If b = [b1, b2, b3]T is a (3 × 1) vector, then the augmented matrix [A | b] for the linear system Ax = b is row equivalent to

[ 1   0   (1/3)b1 − (2/3)b2 ]
[ 0   1   (1/3)b1 + (1/3)b2 ]
[ 0   0   (−1/3)b1 + (5/3)b2 + b3 ].    (10)

Therefore, T(x) = Ax = b can be solved if and only if 0 = (−1/3)b1 + (5/3)b2 + b3. The range of T can thus be described as

R(T) = R(A) = {b: b = [b1, b2, (1/3)b1 − (5/3)b2]T, b1 and b2 any real numbers}.

A basis for R(T) is {u1, u2}, where

u1 = [1, 0, 1/3]T  and  u2 = [0, 1, −5/3]T.

Thus T has rank 2, and by Eq. (9),

nullity(T) = n − rank(T) = 2 − 2 = 0.

It follows that T has null space {θ}. Alternatively, it is clear from matrix (10), with b = θ, that the homogeneous system of equations Ax = θ has only the trivial solution. Therefore, N(T) = N(A) = {θ}.
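These dimension counts can be confirmed numerically. In MATLAB (an illustrative fragment of ours), rank and null report the rank and a basis for the null space of the matrix of T:

    A = [1 2; -1 1; 2 -1];        % matrix of T from Examples 8 and 10
    r = rank(A)                   % 2
    nullityT = size(A, 2) - r     % 0 by Eq. (9), so N(T) = {theta}
    null(A)                       % empty, confirming that nullity(T) = 0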

Orthogonal Transformations on R2 (Optional)
It is often informative and useful to view linear transformations on either R2 or R3 from a geometric point of view. To illustrate this general notion, the remainder of this section is devoted to determining those linear transformations T: R2 → R2 that preserve the length of a vector; that is, we are interested in linear transformations T such that

‖T(v)‖ = ‖v‖    (11)

for all v in R2. Transformations that satisfy Eq. (11) are called orthogonal transformations. We begin by giving some examples of orthogonal transformations.


Example 11 Let θ be a fixed angle, and let T: R2 → R2 be the linear transformation defined by T(v) = Av, where A is the (2 × 2) matrix

A = [ cos θ  −sin θ ]
    [ sin θ   cos θ ].

Give a geometric interpretation of T , and show that T is an orthogonal transformation.

Solution Suppose that v and T(v) are given by

v = [a, b]T  and  T(v) = [c, d]T.

Then T(v) = Av, so

[c, d]T = [a cos θ − b sin θ, a sin θ + b cos θ]T.    (12)

We proceed now to show that T(v) is obtained geometrically by rotating the vector v through the angle θ. To see this, let φ be the angle between v and the positive x-axis (see Fig. 3.17), and set r = ‖v‖. Then the coordinates a and b can be written as

a = r cos φ,  b = r sin φ.    (13)

Making the substitution (13) for a and b in (12) yields

c = r cos φ cos θ − r sin φ sin θ = r cos(φ + θ)

and

d = r cos φ sin θ + r sin φ cos θ = r sin(φ + θ).

Therefore, c and d are the coordinates of the point obtained by rotating the point (a, b) through the angle θ. Clearly then, ‖T(v)‖ = ‖v‖, and T is an orthogonal linear transformation.

Figure 3.17 Rotation through the angle θ

The linear transformation T defined in Example 11 is called a rotation. Thus if A is a (2 × 2) matrix,

A = [ a  −b ]
    [ b   a ],

where a² + b² = 1, then the linear transformation T(v) = Av is the rotation through the angle θ, 0 ≤ θ < 2π, where cos θ = a and sin θ = b.
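A quick MATLAB experiment (our sketch, with an arbitrary angle) confirms that a rotation preserves length:

    theta = pi/3;
    A = [cos(theta), -sin(theta); sin(theta), cos(theta)];
    v = [1; 1];
    Tv = A * v;
    norm(Tv) - norm(v)   % zero up to roundoff, as Eq. (11) requires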

Example 12 Define T: R2 → R2 by T(v) = Av, where

A = [ −1/2    √3/2 ]
    [ −√3/2  −1/2 ].

Give a geometric interpretation of T .

Solution Since cos(4π/3) = −1/2 and sin(4π/3) = −√3/2, T is the rotation through the angle 4π/3.

Now let l be a line in the plane that passes through the origin, and let v be a vector in the plane. If we define T(v) to be the symmetric image of v relative to l (see Fig. 3.18), then clearly T preserves the length of v. It can be shown that T is multiplication by the matrix

A = [ cos θ   sin θ ]
    [ sin θ  −cos θ ],

where (1/2)θ is the angle between l and the positive x-axis. Any such transformation is called a reflection. Note that a reflection T is also an orthogonal linear transformation.

Figure 3.18 Reflection about a line

Example 13 Let T: R2 → R2 be defined by T(v) = Av, where A is the (2 × 2) matrix

A = [ 1/2    √3/2 ]
    [ √3/2  −1/2 ].

Give a geometric interpretation of T .

Solution Since cos(π/3) = 1/2 and sin(π/3) = √3/2, T is the reflection about the line l, where l is the line that passes through the origin at an angle of 30 degrees.

The next theorem gives a characterization of orthogonal transformations on R2. A consequence of this theorem will be that every orthogonal transformation is either a rotation or a reflection.

Theorem 16 Let T: R2 → R2 be a linear transformation. Then T is an orthogonal transformation if and only if ‖T(e1)‖ = ‖T(e2)‖ = 1 and T(e1) is perpendicular to T(e2).

Proof If T is an orthogonal transformation, then ‖T(v)‖ = ‖v‖ for every vector v in R2. In particular, ‖T(e1)‖ = ‖e1‖ = 1, and similarly ‖T(e2)‖ = 1. Set u1 = T(e1), u2 = T(e2), and v = [1, 1]T = e1 + e2. Then

2 = ‖v‖² = ‖T(v)‖² = ‖T(e1 + e2)‖² = ‖T(e1) + T(e2)‖².


Thus,

2 = ‖u1 + u2‖²
  = (u1 + u2)T(u1 + u2)
  = (u1T + u2T)(u1 + u2)
  = u1T u1 + u1T u2 + u2T u1 + u2T u2
  = ‖u1‖² + 2u1T u2 + ‖u2‖²
  = 2 + 2u1T u2.

It follows that u1T u2 = 0, so u1 is perpendicular to u2.
The proof of the converse is Exercise 47.

We can now use Theorem 16 to give a geometric description for any orthogonal linear transformation, T, on R2. First, suppose that T(e1) = u1 and T(e2) = u2. If u1 = [a, b]T, then 1 = ‖u1‖² = a² + b². Since ‖u2‖ = 1 and u2 is perpendicular to u1, there are two choices for u2 (see Fig. 3.19): either

u2 = [−b, a]T  or  u2 = [b, −a]T.

Figure 3.19 Choices for u2

In either case, it follows from Theorem 15 that T is defined by T(v) = Av, where A is the (2 × 2) matrix A = [u1, u2]. Thus if u2 = [−b, a]T, then

A = [ a  −b ]
    [ b   a ],


so T is a rotation. If

u2 = [b, −a]T,

then

A = [ a   b ]
    [ b  −a ],

and T is a reflection. In either case note that ATA = I, so AT = A−1 (see Exercise 48). An (n × n) real matrix with the property that ATA = I is called an orthogonal matrix. Thus we have shown that an orthogonal transformation on R2 is defined by

T(x) = Ax,

where A is an orthogonal matrix.

3.7 EXERCISES

1. Define T: R2 → R2 by

T([x1, x2]T) = [2x1 − 3x2, −x1 + x2]T.

Find each of the following.
a) T([0, 0]T)
b) T([1, 1]T)
c) T([2, 1]T)
d) T([−1, 0]T)

2. Define T: R2 → R2 by T(x) = Ax, where

A = [ 1  −1 ]
    [−3   3 ].

Find each of the following.
a) T([2, 2]T)
b) T([3, 1]T)
c) T([2, 0]T)
d) T([0, 0]T)

3. Let T: R3 → R2 be the linear transformation defined by

T([x1, x2, x3]T) = [x1 + 2x2 + 4x3, 2x1 + 3x2 + 5x3]T.

Which of the following vectors are in the null space of T?
a) [0, 0, 0]T
b) [2, −3, 1]T
c) [1, 2, 1]T
d) [−1, 3/2, −1/2]T

4. Let T: R2 → R2 be the function defined in Exercise 1. Find x in R2 such that T(x) = b, where b = [2, −2]T.

5. Let T: R2 → R2 be the function given in Exercise 1. Show that for each b in R2, there is an x in R2 such that T(x) = b.

6. Let T be the linear transformation given in Exercise 2. Find x in R2 such that T(x) = b, where b = [−2, 6]T.

7. Let T be the linear transformation given in Exercise 2. Show that there is no x in R2 such that


T(x) = b for b = [1, 1]T.

In Exercises 8–17, determine whether the function F is a linear transformation.

8. F: R2 → R2 defined by F([x1, x2]T) = [2x1 − x2, x1 + 3x2]T

9. F: R2 → R2 defined by F([x1, x2]T) = [x2, x1]T

10. F: R2 → R2 defined by F([x1, x2]T) = [x1 + x2, 1]T

11. F: R2 → R2 defined by F([x1, x2]T) = [x1², x1x2]T

12. F: R3 → R2 defined by F([x1, x2, x3]T) = [x1 − x2 + x3, −x1 + 3x2 − 2x3]T

13. F: R3 → R2 defined by F([x1, x2, x3]T) = [x1, x2]T

14. F: R2 → R3 defined by F([x1, x2]T) = [x1 − x2, −x1 + x2, x2]T

15. F: R2 → R3 defined by F([x1, x2]T) = [x1, x2, 0]T

16. F: R2 → R defined by F([x1, x2]T) = 2x1 + 3x2

17. F: R2 → R defined by F([x1, x2]T) = |x1| + |x2|

18. Let W be the subspace of R3 defined by

W = {x: x = [x1, x2, x3]T, x2 = x3 = 0}.

Find an orthonormal basis for W, and use Eq. (4) of Example 3 to give a formula for the orthogonal projection T: R3 → W; that is, determine T(v) for arbitrary v = [a, b, c]T in R3. Give a geometric interpretation of W, v, and T(v).

19. Let T: R2 → R3 be a linear transformation such that T(e1) = u1 and T(e2) = u2, where

u1 = [1, 0, −1]T  and  u2 = [2, 1, 0]T.

Find each of the following.
a) T([1, 1]T)
b) T([2, −1]T)
c) T([3, 2]T)

20. Let T: R2 → R2 be a linear transformation such that T(v1) = u1 and T(v2) = u2, where

v1 = [0, 1]T,  v2 = [−1, 1]T,  u1 = [0, 2]T,  and  u2 = [3, 1]T.


Find each of the following.
a) T([1, 1]T)
b) T([2, −1]T)
c) T([3, 2]T)

In Exercises 21–24, the action of a linear transformation T on a basis for either R2 or R3 is given. In each case use Eq. (6) to derive a formula for T.

21. T([1, 1]T) = [2, −1]T and T([1, −1]T) = [0, 3]T

22. T([1, 1]T) = [1, 2, 1]T and T([1, −1]T) = [0, 2, 2]T

23. T([1, 0, 1]T) = [0, 1]T, T([0, −1, 1]T) = [1, 0]T, T([1, −1, 0]T) = [0, 0]T

24. T([1, 0, 1]T) = [0, −1, 1]T, T([0, −1, 1]T) = [2, 1, 0]T, T([1, −1, 0]T) = [0, 0, 1]T

In Exercises 25–30, a linear transformation T is given. In each case find a matrix A such that T(x) = Ax. Also describe the null space and the range of T and give the rank and the nullity of T.

25. T: R2 → R2 defined by T([x1, x2]T) = [x1 + 3x2, 2x1 + x2]T

26. T: R2 → R3 defined by T([x1, x2]T) = [x1 − x2, x1 + x2, x2]T

27. T: R2 → R defined by T([x1, x2]T) = 3x1 + 2x2

28. T: R3 → R3 defined by T([x1, x2, x3]T) = [x1 + x2, x3, x2]T

29. T: R3 → R2 defined by T([x1, x2, x3]T) = [x1 − x2, x2 − x3]T

30. T: R3 → R defined by T([x1, x2, x3]T) = 2x1 − x2 + 4x3

31. Let a be a real number, and define f: R → R by f(x) = ax for each x in R. Show that f is a linear transformation.


32. Let T: R → R be a linear transformation, and suppose that T(1) = a. Show that T(x) = ax for each x in R.

33. Let T: R2 → R2 be the function that maps each point in R2 to its reflection with respect to the x-axis. Give a formula for T and show that T is a linear transformation.

34. Let T: R2 → R2 be the function that maps each point in R2 to its reflection with respect to the line y = x. Give a formula for T and show that T is a linear transformation.

35. Let V and W be subspaces, and let F: V → W and G: V → W be linear transformations. Define F + G: V → W by [F + G](v) = F(v) + G(v) for each v in V. Prove that F + G is a linear transformation.

36. Let F: R3 → R2 and G: R3 → R2 be defined by

F([x1, x2, x3]T) = [2x1 − 3x2 + x3, 4x1 + 2x2 − 5x3]T

and

G([x1, x2, x3]T) = [−x1 + 4x2 + 2x3, −2x1 + 3x2 + 3x3]T.

a) Give a formula for the linear transformation F + G (see Exercise 35).
b) Find matrices A, B, and C such that F(x) = Ax, G(x) = Bx, and (F + G)(x) = Cx.
c) Verify that C = A + B.

37. Let V and W be subspaces, and let T: V → W be a linear transformation. If a is a scalar, define aT: V → W by [aT](v) = a[T(v)] for each v in V. Show that aT is a linear transformation.

38. Let T: R3 → R2 be the linear transformation defined in Exercise 29. The linear transformation [3T]: R3 → R2 is defined in Exercise 37.
a) Give a formula for the transformation 3T.
b) Find matrices A and B such that T(x) = Ax and [3T](x) = Bx.
c) Verify that B = 3A.

39. Let U, V, and W be subspaces, and let F: U → V and G: V → W be linear transformations. Prove that the composition G ◦ F: U → W of F and G, defined by [G ◦ F](u) = G(F(u)) for each u in U, is a linear transformation.

40. Let F: R3 → R2 and G: R2 → R3 be linear transformations defined by

F([x1, x2, x3]T) = [−x1 + 2x2 − 4x3, 2x1 + 5x2 + x3]T

and

G([x1, x2]T) = [x1 − 2x2, 3x1 + 2x2, −x1 + x2]T.

a) By Exercise 39, G ◦ F: R3 → R3 is a linear transformation. Give a formula for G ◦ F.
b) Find matrices A, B, and C such that F(x) = Ax, G(x) = Bx, and [G ◦ F](x) = Cx.
c) Verify that C = BA.

41. Let B be an (m × n) matrix, and let T: Rn → Rm be defined by T(x) = Bx for each x in Rn. If A is the matrix for T given by Theorem 15, show that A = B.

42. Let F: Rn → Rp and G: Rp → Rm be linear transformations, and assume that Theorem 15 yields matrices A and B, respectively, for F and G. Show that the matrix for the composition G ◦ F (see Exercise 39) is BA. [Hint: Show that (G ◦ F)(x) = BAx for x in Rn and then apply Exercise 41.]

43. Let I: Rn → Rn be the identity transformation. Determine the matrix A such that I(x) = Ax for each x in Rn.

44. Let a be a real number and define T: Rn → Rn by T(x) = ax (see Example 4). Determine the matrix A such that T(x) = Ax for each x in Rn.

Exercises 45–49 are based on the optional material.

45. Let T: R2 → R2 be a rotation through the angle θ. In each of the following cases, exhibit the matrix for T. Also represent v and T(v) geometrically, where v = [1, 1]T.
a) θ = π/2
b) θ = π/3
c) θ = 2π/3

46. Let T: R2 → R2 be the reflection with respect to the line l. In each of the following cases, exhibit


the matrix for T. Also represent e1, e2, T(e1), and T(e2) geometrically.
a) l is the x-axis.
b) l is the y-axis.
c) l is the line with equation y = x.
d) l is the line with equation y = √3x.

47. Let T: R2 → R2 be a linear transformation that satisfies the conditions of Theorem 16. Show that T is orthogonal. [Hint: If v = [a, b]T, then v = ae1 + be2. Now use Eq. (6).]

48. Let T: R2 → R2 be an orthogonal linear transformation, and let A be the corresponding (2 × 2) matrix. Show that ATA = I. [Hint: Use Theorem 16.]

49. Let A = [A1, A2] be a (2 × 2) matrix such that ATA = I, and define T: R2 → R2 by T(x) = Ax.
a) Show that {A1, A2} is an orthonormal set.
b) Use Theorem 16 to show that T is an orthogonal transformation.

3.8 LEAST-SQUARES SOLUTIONS TO INCONSISTENTSYSTEMS, WITH APPLICATIONS TO DATA FITTING

When faced with solving a linear system of the form Ax = b, our procedure has been to describe all solutions if the system is consistent but merely to say "there are no solutions" if the system is inconsistent. We now want to go a step further with regard to inconsistent systems. If a given linear system Ax = b has no solution, then we would like to do the next best thing—find a vector x∗ such that the residual vector, r = Ax∗ − b, is as small as possible. In terms of practical applications, we shall see that any technique for minimizing the residual vector can also be used to find best least-squares fits to data.

A common source of inconsistent systems is the overdetermined system (that is, a system with more equations than unknowns). The system that follows is an example of an overdetermined system:

x1 + 4x2 = −2
x1 + 2x2 = 6
2x1 + 3x2 = 1.

Overdetermined systems are often inconsistent, and the preceding example is no exception. Given that the above system has no solution, a reasonable goal is to find values for x1 and x2 that come as close as possible to satisfying all three equations. Methods for achieving that goal are the subject of this section.

Least-Squares Solutions to Ax = b

Consider the linear system Ax = b, where A is (m × n). If x is a vector in Rn, then the vector r = Ax − b is called a residual vector. A vector x∗ in Rn that yields the smallest possible residual vector is called a least-squares solution to Ax = b. More precisely, x∗ is a least-squares solution to Ax = b if

‖Ax∗ − b‖ ≤ ‖Ax − b‖, for all x in Rn.

(If Ax = b happens to be consistent, then a least-squares solution x∗ is also a solution in the usual sense since ‖Ax∗ − b‖ = 0.)


The special case of an inconsistent (3 × 2) system Ax = b suggests how we can calculate least-squares solutions. In particular, consider Fig. 3.20, which illustrates a vector b that is not in R(A); that is, Ax = b is inconsistent.

Figure 3.20 y∗ is the closest vector in R(A) to b

Let the vector y∗ in R(A) be the closest vector in R(A) to b; that is,

‖y∗ − b‖ ≤ ‖y − b‖, for all y in R(A).

Geometry suggests (see Fig. 3.20) that the vector y∗ − b is orthogonal to every vector in R(A). Since the columns of A form a spanning set for R(A), this orthogonality condition leads to

A1T(y∗ − b) = 0
A2T(y∗ − b) = 0

or, in matrix-vector terms,

AT(y∗ − b) = θ.

Since y∗ = Ax∗ for some x∗ in R2, the preceding equation becomes AT(Ax∗ − b) = θ, or

ATAx∗ = AT b.

Thus, the geometry of the (3 × 2) system, as illustrated in Fig. 3.20, suggests that we can find least-squares solutions by solving the associated system (1):

ATAx = AT b. (1)

As the following theorem asserts, this solution procedure is indeed valid.

Theorem 17 Consider the (m × n) system Ax = b.

(a) The associated system ATAx = AT b is always consistent.
(b) The least-squares solutions of Ax = b are precisely the solutions of ATAx = AT b.
(c) The least-squares solution is unique if and only if A has rank n.


We will give the proof of Theorem 17 in the next section. For now, we will illustrate the use of Theorem 17 and also give some examples showing the connections between data-fitting problems and least-squares solutions of inconsistent systems. (In parts (a) and (b) of Theorem 17, the associated equations ATAx = AT b are called the normal equations.)

Example 1 Find the least-squares solutions to the inconsistent system

x1 + 4x2 = −2
x1 + 2x2 = 6
2x1 + 3x2 = 1.

Solution By Theorem 17, we can find the least-squares solutions by solving the normal equations, ATAx = AT b, where

A =
[ 1  4
  1  2
  2  3 ]
and b = [−2, 6, 1]T.

Forming ATA and AT b, we obtain

ATA =
[  6  12
  12  29 ]
and AT b = [6, 7]T.

Solving the system ATAx = AT b, we find the least-squares solution x∗ = [3, −1]T.
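For readers who want to check Example 1 numerically, the normal equations can be set up and solved in MATLAB in a few lines. The sketch below is ours (the text solves the (2 × 2) system by hand), and the variable names are arbitrary.

% Example 1: least-squares solution via the normal equations
A = [1 4; 1 2; 2 3];
b = [-2; 6; 1];
x = (A'*A) \ (A'*b)     % displays the least-squares solution [3; -1]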

Least-Squares Fits to Data

One of the major applications for least-squares solutions is to the problem of determining best least-squares fits to data. To introduce this important topic, consider a table of data such as the one displayed next.

Table 3.1

t:  t0  t1  t2  · · ·  tn
y:  y0  y1  y2  · · ·  yn

Suppose, when we plot the data in Table 3.1, that it has a distribution such as the one shown in Fig. 3.21. When we examine Fig. 3.21, it appears that the data points nearly fall along a line of the form y = mt + c. A logical question is: "What is the best line that we can draw through the data, one that comes closest to representing the data?"


Figure 3.21 A nearly linear distribution of data points

In order to answer this question, we need a way to quantify the terms best and closest. There are many different methods we might use to measure best, but one of the most useful is the least-squares criterion:

Find m and c to minimize ∑i=0..n [(mti + c) − yi]^2.  (2)

The particular linear polynomial y = mt + c that minimizes the sum of squares in Eq. (2) is called the best least-squares linear fit to the data in Table 3.1. (We see in the next section that best least-squares linear fits always exist and are unique.)

Similarly, suppose the set of data points from Table 3.1 has a distribution such as the one displayed in Fig. 3.22. In this case, it appears that the data might nearly fall along the graph of a quadratic polynomial y = at^2 + bt + c. As in Eq. (2), we can use a least-squares criterion to choose the best least-squares quadratic fit:

Find a, b, and c to minimize ∑i=0..n [(a ti^2 + b ti + c) − yi]^2.

In a like manner, we can consider fitting data in a least-squares sense with polynomials of any appropriate degree.

Figure 3.22 A nearly parabolic distribution of data points

In the next several subsections, we examine the connection between least-squares fits to data and least-squares solutions to Ax = b.


Least-Squares Linear Fits to Data

Suppose the laws of physics tell us that two measurable quantities, t and y, are related in a linear fashion:

y = mt + c. (3)

Suppose also that we wish to determine experimentally the values of m and c. If we know that Eq. (3) models the phenomena exactly and that we have made no experimental error, then we can determine m and c with only two experimental observations. For instance, if y = y0 when t = t0 and if y = y1 when t = t1, we can solve for m and c from

mt0 + c = y0
mt1 + c = y1,

or

[ t0  1 ] [ m ]   [ y0 ]
[ t1  1 ] [ c ] = [ y1 ].

Usually we must be resigned to experimental errors or to imperfections in the model given by Eq. (3). In this case, we would probably make a number of experimental observations, (ti, yi) for i = 0, 1, . . . , k. Using these observed values in Eq. (3) leads to an overdetermined system of the form

mt0 + c = y0
mt1 + c = y1
  ...
mtk + c = yk.

In matrix terms, this overdetermined system can be expressed as Ax = b, where

A =
[ t0  1
  t1  1
  ...
  tk  1 ],
x = [m, c]T, and b = [y0, y1, . . . , yk]T.

In this context, a least-squares solution to Ax = b is a vector x∗ = [m∗, c∗]T that minimizes ‖Ax − b‖, where

‖Ax − b‖^2 = ∑i=0..k [(mti + c) − yi]^2.

Comparing the equation above with the least-squares criterion (2), we see that the best least-squares linear fit, y = m∗t + c∗, can be determined by finding the least-squares solution of Ax = b.

Example 2 Consider the experimental observations given in the following table:

t:  1  4  8  11
y:  1  2  4  5

Find the least-squares linear fit to the data.


Solution For the function defined by y = mt + c, the data lead to the overdetermined system

m + c = 1
4m + c = 2
8m + c = 4
11m + c = 5.

In matrix terms, the system is Ax = b, where

A =
[  1  1
   4  1
   8  1
  11  1 ],
x = [m, c]T, and b = [1, 2, 4, 5]T.

The least-squares solution, x∗, is found by solving ATAx = AT b, where

ATA =
[ 202  24
   24   4 ]
and AT b = [96, 12]T.

There is a unique solution to ATAx = AT b because A has rank 2. The solution is

x∗ = [12/29, 15/29]T.

Thus the least-squares linear fit is defined by

y = (12/29)t + 15/29.

The data points and the linear fit are sketched in Fig. 3.23.

Figure 3.23 The least-squares linear fit, y = (12/29)t + 15/29, to the data in Example 2

Using MATLAB to Find Least-Squares Solutions

Up to now we have been finding least-squares solutions to inconsistent systems by solving the normal equations ATAx = AT b. This method is fine in theory but (because of roundoff error) it is not reliable for machine calculations—especially for large systems


Ax = b. MATLAB has several reliable alternatives for finding least-squares solutions to inconsistent systems; these methods do not depend on solving the normal equations.

If A is not square, the simple MATLAB command x = A\b produces a least-squares solution to Ax = b using a QR-factorization of A. (In Chapter 7, we give a thorough discussion of how to find least-squares solutions using QR-factorizations and Householder transformations.) If A is square but the system is inconsistent, then the command x = A\b results in a warning but does not return a least-squares solution. If A is not square, a warning is also issued when A does not have full rank. In the next section we will give more details about these matters and about using MATLAB to find least-squares solutions.
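To make the comparison concrete, the following sketch (ours, using the data of Example 2) applies both routes: the backslash command, which uses a QR-factorization, and the normal equations. For a small, well-conditioned problem such as this one, the two answers agree to machine precision.

% Backslash (QR-based) versus the normal equations, Example 2 data
A = [1 1; 4 1; 8 1; 11 1];
b = [1; 2; 4; 5];
x_qr = A \ b              % least-squares solution via QR
x_ne = (A'*A) \ (A'*b)    % the same solution via the normal equations
% both are [12/29; 15/29], approximately [0.4138; 0.5172]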

Example 3 Lubricating characteristics of oils deteriorate at elevated temperatures, and the amount of bearing wear, y, is normally a linear function of the operating temperature, t. That is, y = mt + b. By weighing bearings before and after operation at various temperatures, the following table was constructed:

Operating temperature, °C:      120  148  175  204  232  260  288  316  343  371
Amount of wear, gm/10,000 hr:     3    4    5  5.5    6  7.5  8.8   10 11.1   12

Determine the least-squares linear fit from these readings and use it to determine an operating temperature that should limit bearing wear to 7 gm/10,000 hr of operation.

Solution For the system Ax = b, we see that A and b are given by

A = [ 120 148 175 204 232 260 288 316 343 371
        1   1   1   1   1   1   1   1   1   1 ]T

and b = [3, 4, 5, 5.5, 6, 7.5, 8.8, 10, 11.1, 12]T.

The least-squares solution to Ax = b is found from the MATLAB commands in Fig. 3.24.

>> A = [120 148 175 204 232 260 288 316 343 371; 1 1 1 1 1 1 1 1 1 1]';
>> b = [3 4 5 5.5 6 7.5 8.8 10 11.1 12]';
>> x = A\b

x =

    0.0362
   -1.6151

Figure 3.24 The MATLAB commands for Example 3


From Fig. 3.24 we see that the least-squares linear fit is

y = (0.0362)t − 1.6151.

Setting y = 7 yields t = 237.986. Hence an operating temperature of about 238°C should limit bearing wear to 7 gm/10,000 hr.

General Least-Squares Fits

Consider the following table of data:

t:  t0  t1  t2  · · ·  tm
y:  y0  y1  y2  · · ·  ym

When the data points (ti, yi) are plotted in the ty-plane, the plot may reveal a trend that is nonlinear (see Fig. 3.25). For a set of data such as that sketched in Fig. 3.25, a linear fit would not be appropriate. However, we might choose a polynomial function, y = p(t), where

p(ti) ≈ yi, 0 ≤ i ≤ m.

In particular, suppose we decide to fit the data with an nth-degree polynomial:

p(t) = an t^n + an−1 t^(n−1) + · · · + a1 t + a0, m ≥ n.

Figure 3.25 Nonlinear data

As a measure for goodness of fit, we can ask for coefficients a0, a1, . . . , an that minimize the quantity Q(a0, a1, . . . , an), where

Q(a0, a1, . . . , an) = ∑i=0..m [p(ti) − yi]^2
                     = ∑i=0..m [(a0 + a1 ti + · · · + an ti^n) − yi]^2.  (4)


As can be seen by inspection, minimizing Q(a0, a1, . . . , an) is the same as minimizing ‖Ax − b‖^2, where

A =
[ 1  t0  t0^2  · · ·  t0^n
  1  t1  t1^2  · · ·  t1^n
  ...
  1  tm  tm^2  · · ·  tm^n ],
x = [a0, a1, . . . , an]T, and b = [y0, y1, . . . , ym]T.  (5)

As before, we can minimize ‖Ax − b‖^2 = Q(a0, a1, . . . , an) by solving ATAx = AT b. The nth-degree polynomial p∗ that minimizes Eq. (4) is called the least-squares nth-degree fit.

Example 4 Consider the data from the following table:

t:  −2  −1  0  1  2
y:  12   5  3  2  4

Find the least-squares quadratic fit to the data.

Solution Since we want a quadratic fit, we are trying to match the form y = a0 + a1t + a2t^2 to the data. The equations are

a0 − 2a1 + 4a2 = 12
a0 − a1 + a2 = 5
a0 = 3
a0 + a1 + a2 = 2
a0 + 2a1 + 4a2 = 4.

This overdetermined system can be shown to be inconsistent. Therefore, we look for a least-squares solution to Ax = b, where A and b are as in system (5), with n = 2 and m = 4.

The matrix A and the vectors x and b are

A =
[ 1  −2  4
  1  −1  1
  1   0  0
  1   1  1
  1   2  4 ],
x = [a0, a1, a2]T, and b = [12, 5, 3, 2, 4]T.

The least-squares solution of Ax = b is found by solving ATAx = AT b, where

ATA =
[  5   0  10
   0  10   0
  10   0  34 ]
and AT b = [26, −19, 71]T.


The solution is x∗ = [87/35, −19/10, 19/14]T, and hence the least-squares quadratic fit is

p(t) = (19/14)t^2 − (19/10)t + 87/35.

A graph of y = p(t) and the data points are sketched in Fig. 3.26.

Figure 3.26 The least-squares quadratic fit, y = (19/14)t^2 − (19/10)t + 87/35, for the data in Example 4

The same principles apply when we decide to fit data with any linear combination of functions. For example, suppose y = f(t) is defined by

f(t) = a1g1(t) + a2g2(t) + · · · + angn(t),

where g1, g2, . . . , gn are given functions. We can use the method of least squares to determine scalars a1, a2, . . . , an that will minimize

Q(a1, a2, . . . , an) = ∑i=1..m [f(ti) − yi]^2
                     = ∑i=1..m {[a1g1(ti) + a2g2(ti) + · · · + angn(ti)] − yi}^2.  (6)

The ideas associated with minimizing Q(a1, a2, . . . , an) are explored in the exercises.
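Minimizing the Q of Eq. (6) is again a matrix least-squares problem in which column i of A holds gi evaluated at the data points (compare Exercise 15). The sketch below is ours; the functions g1 and g2 and the data are hypothetical.

% Fitting f(t) = a1*g1(t) + a2*g2(t) by least squares
g1 = @(t) t;                 % hypothetical choice for g1
g2 = @(t) sin(t);            % hypothetical choice for g2
t = [0.5; 1.0; 1.5; 2.0];    % made-up data abscissas
y = [1.0; 1.7; 2.1; 2.6];    % made-up data ordinates
A = [g1(t), g2(t)];          % column i holds gi at t1, ..., tm
a = (A'*A) \ (A'*y)          % [a1; a2] minimizing Q(a1, a2)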

Rank Deficient Matrices

In each of Examples 1–4, the least-squares solution to Ax = b was unique. Indeed, if A is (m × n), then part (c) of Theorem 17 states that least-squares solutions are unique if and only if the rank of A is equal to n. If the rank of A is less than n, then we say that A is rank deficient, or A does not have full rank.


Therefore, when A is rank deficient, there is an infinite family of least-squares solutions to Ax = b. Such an example is given next. This example is worked using MATLAB, and we note that MATLAB produces only a single least-squares solution but does give a warning that A is rank deficient. In Section 3.9 we will discuss this topic in more detail.

Example 5 For A and b as given, the system Ax = b has no solution. Find all the least-squares solutions.

A =
[  1  0   2
   0  2   2
  −1  1  −1
  −1  2   0 ],
b = [3, −3, 0, −3]T.

Solution The MATLAB calculation is displayed in Fig. 3.27(a). Notice that MATLAB warns us that A is rank deficient, having rank two. In Exercise 18 we ask you to verify that A does indeed have rank two.

>> A = [1 0 2; 0 2 2; -1 1 -1; -1 2 0];
>> b = [3 -3 0 -3]';
>> x = A\b

Warning: Rank deficient, rank = 2  tol = 2.6645e-15

x =

         0
   -1.5000
    0.5000

Figure 3.27(a) The MATLAB commands for Example 5

Since A is rank deficient, there are infinitely many least-squares solutions to the inconsistent system Ax = b. MATLAB returned just one of these solutions, namely x = [0, −1.5, 0.5]T. We can find all the solutions by solving the normal equations ATAx = AT b.

Fig. 3.27(b) shows the result of using MATLAB to solve the normal equations for the original system (since A and b have already been defined, in Fig. 3.27(a), MATLAB


>> NormEqn = [A'*A, A'*b]

NormEqn =

     3    -3     3     6
    -3     9     3   -12
     3     3     9     0

>> rref(NormEqn)

ans =

     1     0     2     1
     0     1     1    -1
     0     0     0     0

Figure 3.27(b) Setting up and solving the normal equations for Example 5

makes it very easy to define the augmented matrix for ATAx = AT b). The complete solution is x1 = 1 − 2x3, x2 = −1 − x3, or in vector form:

x∗ = [1 − 2x3, −1 − x3, x3]T = [1, −1, 0]T + x3[−2, −1, 1]T.

As can be seen from the complete solution just displayed, the particular MATLAB least-squares solution can be recovered by setting x3 = 0.5.
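The assertions above are easy to verify numerically; the following check is ours. Because the direction vector [−2, −1, 1]T lies in the null space of A, every member of the family leaves exactly the same residual, and x3 = 0.5 reproduces the answer MATLAB returned.

% Example 5: checking the one-parameter family of least-squares solutions
A = [1 0 2; 0 2 2; -1 1 -1; -1 2 0];
b = [3; -3; 0; -3];
xp = [1; -1; 0];  d = [-2; -1; 1];   % family x* = xp + x3*d
for x3 = [0, 0.5, 1]
    x = xp + x3*d;
    disp(norm(A*x - b))              % the same residual norm each time
end
% x3 = 0.5 gives [0; -1.5; 0.5], the solution returned by A\b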

3.8 EXERCISES

In Exercises 1–6, find all vectors x∗ that minimize ‖Ax − b‖, where A and b are as given. Use the procedure suggested by Theorem 17, as illustrated in Examples 1 and 5.

1. A =
[  1  2
  −1  1
   1  3 ],  b = [1, 1, 1]T

2. A =
[  1   2   4
  −2  −3  −7
   1   3   5 ],  b = [1, 1, 2]T

3. A =
[  1  2   1
   3  5   4
  −1  1  −4 ],  b = [1, 3, 0]T

4. A =
[  1   2  −1
   2   3   1
  −1  −1  −2
   3   5   0 ],  b = [1, 0, 1, 0]T

5. A =
[ 1  2
  2  4
  3  6 ],  b = [0, 2, 16]T


6. A =
[ 1  0  0
  3  0  0
  1  1  1 ],  b = [1, 13, 1]T

In Exercises 7–10, find the least-squares linear fit to the given data. In each exercise, plot the data points and the linear approximation.

7. t:  −1  0  1  2
   y:   0  1  2  4

8. t:  −2  0  1  2
   y:   2  1  0  0

9. t:  −1  0  1  2
   y:  −1  1  2  3

10. t:   0  1  2   3
    y:  −2  3  7  10

In Exercises 11–14, find the least-squares quadratic fit to the given data. In each exercise, plot the data points and the quadratic approximation.

11. t:  −2  −1  1  2
    y:   2   0  1  2

12. t:  0  1  2  3
    y:  0  0  1  2

13. t:  −2  −1  0  1
    y:  −3  −1  0  3

14. t:  −2  0  1  2
    y:   5  1  1  5

15. Consider the following table of data:

t:  t1  t2  · · ·  tm
y:  y1  y2  · · ·  ym

For given functions g1 and g2, consider a function f defined by f(t) = a1g1(t) + a2g2(t). Show that

∑i=1..m [f(ti) − yi]^2 = ‖Ax − b‖^2,

where

A =
[ g1(t1)  g2(t1)
  g1(t2)  g2(t2)
    ...
  g1(tm)  g2(tm) ],
x = [a1, a2]T, and b = [y1, y2, . . . , ym]T.

16. Let g1(t) = √t and g2(t) = cos πt, and consider the data points (ti, yi), 1 ≤ i ≤ 4, defined by

t:  1  4  9  16
y:  0  2  4  5

As in Eq. (6), let Q(a1, a2) = ∑i=1..4 [a1g1(ti) + a2g2(ti) − yi]^2, where g1(ti) = √ti and g2(ti) = cos πti.

a) Use the result of Exercise 15 to determine A, x, and b so that Q(a1, a2) = ‖Ax − b‖^2.

b) Find the coefficients for f(t) = a1√t + a2 cos πt that minimize Q(a1, a2).

17. Consider the [(m + 1) × (n + 1)] matrix A in Eq. (5), where m ≥ n. Show that A has rank n + 1. [Hint: Suppose that Ax = θ, where x = [a0, a1, . . . , an]T. What can you say about the polynomial p(t) = a0 + a1t + · · · + an t^n?]

18. Find the rank of the matrix A in Example 5.

3.9 THEORY AND PRACTICE OF LEAST SQUARES

In the previous section, we discussed least-squares solutions to Ax = b and the closely related idea of best least-squares fits to data. In this section, we have two major objectives:

(a) Develop the theory for the least-squares problem in Rn


(b) Use the theory to explain some of the technical language associated with least squares so that we can become comfortable using computational packages such as MATLAB for least-squares problems.

The Least-Squares Problem in Rn

The theory necessary for a complete understanding of least squares is fairly concise and geometric. To ensure our development is completely unambiguous, we begin by reviewing some familiar terminology and notation. In particular, let x be a vector in Rn, x = [x1, x2, . . . , xn]T.

We define the distance between two vectors x and y to be the length of the vector x − y; recall that the length of x − y is the number ‖x − y‖, where

‖x − y‖ = √((x − y)T(x − y)) = √((x1 − y1)^2 + (x2 − y2)^2 + · · · + (xn − yn)^2).

The problem we wish to consider is stated next.

The Least-Squares Problem in Rn

Let W be a p-dimensional subspace of Rn. Given a vector v in Rn, find a vector w∗ in W such that

‖v − w∗‖ ≤ ‖v − w‖, for all w in W .

The vector w∗ is called the best least-squares approximation to v.

That is, among all vectors w in W, we want to find the special vector w∗ in W that is closest to v. Although this problem can be extended to some very complicated and abstract settings, examination of the geometry of a simple special case will exhibit a fundamental principle that extends to all such problems.

Consider the special case where W is a two-dimensional subspace of R3. Geometrically, we can visualize W as a plane through the origin (see Fig. 3.28). Given a point v not on W, we wish to find the point in the plane, w∗, that is closest to v. The geometry of this problem seems to insist (see Fig. 3.28) that w∗ is characterized by the fact that the vector v − w∗ is perpendicular to the plane W.

The next theorem shows that Fig. 3.28 is not misleading. That is, if v − w∗ is orthogonal to every vector in W, then w∗ is the best least-squares approximation to v.

Theorem 18 Let W be a p-dimensional subspace of Rn, and let v be a vector in Rn. Suppose there is a vector w∗ in W such that (v − w∗)T w = 0 for every vector w in W. Then w∗ is the best least-squares approximation to v.


Figure 3.28 w∗ is the closest point in the plane W to v

Proof Let w be any vector in W and consider the following calculation for the distance from v to w:

‖v − w‖^2 = ‖(v − w∗) + (w∗ − w)‖^2
          = (v − w∗)T(v − w∗) + 2(v − w∗)T(w∗ − w) + (w∗ − w)T(w∗ − w)
          = ‖v − w∗‖^2 + ‖w∗ − w‖^2.  (1)

(The last equality follows because w∗ − w is a vector in W, and therefore (v − w∗)T(w∗ − w) = 0.) Since ‖w∗ − w‖^2 ≥ 0, it follows from Eq. (1) that ‖v − w‖^2 ≥ ‖v − w∗‖^2. Therefore, w∗ is the best approximation to v.

The equality in calculation (1), ‖v − w‖^2 = ‖v − w∗‖^2 + ‖w∗ − w‖^2, is reminiscent of the Pythagorean theorem. A schematic view of this connection is sketched in Fig. 3.29.

Figure 3.29 A geometric interpretation of the vector w∗ in W closest to v

In a later theorem, we will show that there is always one, and only one, vector w∗ in W such that v − w∗ is orthogonal to every vector in W. Thus it will be established that the best approximation always exists and is always unique. The proof of this fact will be constructive, so we now concentrate on methods for finding w∗.

Finding Best Approximations

Theorem 18 suggests a procedure for finding the best approximation w∗. In particular, we should search for a vector w∗ in W that satisfies the following condition:

If w is any vector in W , then (v − w∗)T w = 0.


The search for w∗ is simplified if we make the following observation: If v − w∗ is orthogonal to every vector in W, then v − w∗ is also orthogonal to every vector in a basis for W. In fact (see Theorem 19), the condition that v − w∗ be orthogonal to the basis vectors is both necessary and sufficient for v − w∗ to be orthogonal to every vector in W.

Theorem 19 Let W be a p-dimensional subspace of Rn, and let {u1, u2, . . . , up} be a basis for W. Let v be a vector in Rn. Then (v − w∗)T w = 0 for all w in W if and only if

(v − w∗)T ui = 0, 1 ≤ i ≤ p.

The proof of Theorem 19 is left as Exercise 17.

As Theorem 19 states, the best approximation w∗ can be found by solving the p equations:

(v − w∗)T u1 = 0
(v − w∗)T u2 = 0
   ...
(v − w∗)T up = 0.  (2)

Suppose we can show that these p equations always have a unique solution. Then, by Theorem 18, it will follow that the best approximation exists and is unique.

Existence and Uniqueness of Best Approximations

We saw above that w∗ is a best least-squares approximation to v if the vector v − w∗ satisfies system (2). We now use this result to prove that best approximations always exist and are always unique. In addition, we will give a formula for the best approximation.

Theorem 20 Let W be a p-dimensional subspace of Rn and let v be a vector in Rn. Then there is one and only one best least-squares approximation in W to v.

Proof The proof of existence is based on finding a solution to the system of Eq. (2). Now, system (2) is easiest to analyze and solve if we assume the basis vectors are orthogonal.

In particular, let {u1, u2, . . . , up} be an orthogonal basis for W (in Section 3.6 we observed that every subspace of Rn has an orthogonal basis). Let w∗ be a vector in W where

w∗ = a1u1 + a2u2 + · · · + apup.  (3)

Using Eq. (3), the equations in system (2) become

(v − (a1u1 + a2u2 + · · · + apup))T ui = 0, for i = 1, 2, . . . , p.

Then, because the basis vectors are orthogonal, the preceding equations simplify considerably:

vT ui − ai uiT ui = 0, for i = 1, 2, . . . , p.

Solving for the coefficients ai, we obtain

ai = (vT ui)/(uiT ui).


Note that the preceding expression for ai is well defined since ui is a basis vector, and hence the denominator uiT ui cannot be zero.

Having solved the system (2), we can write down an expression for a vector w∗ such that (v − w∗)T w = 0 for all w in W. By Theorem 18, this vector w∗ is a best approximation to v:

w∗ = ∑i=1..p [(vT ui)/(uiT ui)] ui.  (4)

Having established the existence of best approximations with formula (4), we turn now to the question of uniqueness. To begin, suppose w is any best approximation to v, and w∗ is the best approximation defined by Eq. (4). Since the vector v − w∗ was constructed so as to be orthogonal to every vector in W, we can make a calculation similar to the one in Eq. (1) and conclude the following:

‖v − w‖^2 = ‖v − w∗‖^2 + ‖w∗ − w‖^2.

But, if w and w∗ are both best approximations to v, then it follows from the equation above that ‖w∗ − w‖^2 = 0. This equality implies that w∗ − w = θ, or w∗ = w. Therefore, uniqueness of best approximations is established.
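Before turning to an example, we note that formula (4) translates directly into code. The sketch below is ours; it assumes the orthogonal basis vectors u1, . . . , up are stored as the columns of a matrix U and that v is a column vector.

% Best approximation w* from an orthogonal basis, via Eq. (4)
% Assumes: U has orthogonal columns u1, ..., up; v is a vector in R^n
wstar = zeros(size(v));
for i = 1:size(U, 2)
    ui = U(:, i);
    wstar = wstar + ((v'*ui) / (ui'*ui)) * ui;   % add ai*ui, ai from Eq. (4)
end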

The following example illustrates how a best approximation can be found from Eq. (4).

Example 1 Let W be the subspace of R3 defined by

W = {x: x = [x1, x2, x3]T, x1 + x2 − 3x3 = 0}.

Let v be the vector v = [1, −2, −4]T. Use Eq. (4) to find the best least-squares approximation to v.

Solution Our first task is to find an orthogonal basis for W. We will use the Gram–Schmidt process to find such a basis.

To begin, x is in W if and only if x1 = −x2 + 3x3; that is, if and only if x has the form

x = [−x2 + 3x3, x2, x3]T = x2[−1, 1, 0]T + x3[3, 0, 1]T.

Therefore, a natural basis for W consists of the two vectors w1 = [−1, 1, 0]T and w2 = [3, 0, 1]T.

We now use the Gram–Schmidt process to derive an orthogonal basis {u1, u2} from the natural basis {w1, w2}. In particular,

(a) Let u1 = w1.
(b) Choose a scalar a so that u2 = w2 + au1 is orthogonal to u1.


To find the scalar a in (b), consider

u1T u2 = u1T(w2 + au1) = u1T w2 + a u1T u1.

Thus, to have u1T u2 = 0, we need u1T w2 + a u1T u1 = 0, or

a = −(u1T w2)/(u1T u1) = −(−3)/2 = 1.5.

Having found a, we calculate the second vector in the orthogonal basis for W, finding u2 = w2 + 1.5u1 = [3, 0, 1]T + 1.5[−1, 1, 0]T = [1.5, 1.5, 1]T.

Next, let w∗ = a1u1 + a2u2 denote the best approximation, and determine the coefficients of w∗ using Eq. (4):

a1 = (vT u1)/(u1T u1) = −3/2 = −1.5
a2 = (vT u2)/(u2T u2) = −5.5/5.5 = −1.

Therefore, the best approximation is given by

w∗ = −1.5u1 − u2 = −1.5[−1, 1, 0]T − [1.5, 1.5, 1]T = [0, −3, −1]T.

(As a check for the calculations, we can form v − w∗ = [1, 1, −3]T and verify that v − w∗ is orthogonal to each of the original basis vectors, w1 = [−1, 1, 0]T and w2 = [3, 0, 1]T.)

Least-Squares Solutions to Inconsistent Systems Ax = b

In Section 3.8 we were interested in a special case of least-squares approximations—finding least-squares solutions to inconsistent systems Ax = b. Recall that our method for finding least-squares solutions consisted of solving the normal equations ATAx = AT b. In turn, the validity of the normal equations approach was based on Theorem 17, which said:

(a) The normal equations are always consistent.
(b) The solutions of the normal equations are precisely the least-squares solutions of Ax = b.
(c) If A is (m × n), then least-squares solutions of Ax = b are unique if and only if A has rank n.

We are now in a position to sketch a proof of Theorem 17. The basic ideas supporting Theorem 17 are very important to a complete understanding of least-squares solutions of inconsistent systems. These ideas are easy to explain and are illustrated in Fig. 3.30.


Figure 3.30 A geometric visualization of Theorem 17

In Fig. 3.30, we think of the (m × n) matrix A as defining a function of the form y = Ax from Rn to Rm. The subspace R(A) represents the range of A; it is a p-dimensional subspace of Rm. We have drawn the vector b so that it is not in R(A), illustrating the case where the system Ax = b is inconsistent. The vector y∗ represents the (unique) best approximation in R(A) to b.

Proof of Theorem 17 Because y∗ is in R(A), there must be vectors x in Rn such that Ax = y∗. In addition, because y∗ is the closest point in R(A) to b, we can say:

A vector x in Rn is a best least-squares solution to Ax = b if and only if Ax = y∗.  (5)

In order to locate y∗ in R(A), we note that y∗ is characterized by wT(y∗ − b) = 0 for any vector w in R(A). Then, since the columns of A form a spanning set for R(A), y∗ can be characterized by the conditions:

AiT(y∗ − b) = 0, for i = 1, 2, . . . , n.  (6)

The orthogonality conditions above can be rewritten in matrix/vector terms as

AT (y∗ − b) = θ . (7)

Finally, since y∗ is in R(A), finding y∗ to solve Eq. (7) is the same as finding vectors x in Rn that satisfy the normal equations:

AT(Ax − b) = θ.  (8)

We can now complete the proof of Theorem 17 by making the observation that Eq. (6) and Eq. (8) are equivalent in the following sense: A vector x in Rn satisfies Eq. (8) if and only if the vector y∗ satisfies Eq. (6), where y∗ = Ax.

To establish part (a) of Theorem 17, we note that Eq. (6) is consistent, and hence the normal equations given in Eq. (8) are consistent as well. Part (b) of Theorem 17 follows from rule (5) and the equivalence of equations (6) and (8). Part (c) of Theorem 17 follows because Ax = y∗ has a unique solution if and only if the columns of A are linearly independent.


Uniqueness of Least-Squares Solutions to Ax = b

Best least-squares approximations are always unique, but least-squares solutions to Ax = b might or might not be unique. The preceding statement is somewhat confusing because the term least-squares is being used in two different contexts. To clarify this widely accepted, but somewhat unfortunate, choice of terms, we can refer to Fig. 3.30.

In Fig. 3.30, the best least-squares approximation, y∗, is unique (uniqueness was proved in Theorem 20). A best least-squares solution to Ax = b, however, is a vector x such that Ax = y∗, and there might or might not be infinitely many solutions to Ax = y∗. (The equation Ax = y∗ is always consistent because y∗ is in R(A); the equation has a unique solution if and only if the columns of A are linearly independent.)

Recall from the previous section that an (m × n) matrix A is called rank deficient if it has rank less than n (that is, if the columns of A are linearly dependent). When A is rank deficient, there are infinitely many least-squares solutions to Ax = b. In this instance, we might want to select the minimum norm solution as the least-squares solution we use in our application. To explain, we say x∗ is the minimum norm least-squares solution to Ax = b if ‖x∗‖ minimizes ‖x‖ among all least-squares solutions. That is,

‖x∗‖ = min{‖x‖ : Ax = y∗}.

It can be shown that the minimum norm solution always exists and is always unique.

The minimum norm solution is associated with another least-squares concept, that of the pseudoinverse of A. The pseudoinverse of A is, in a sense, the closest thing to an inverse that a rectangular matrix can have. To explain the idea, we first introduce the Frobenius norm for an (m × n) matrix A. The Frobenius norm, denoted ‖A‖F, is defined by the following:

‖A‖F = √(∑i=1..m ∑j=1..n aij^2).

Just as ‖x‖ measures the size of a vector x, ‖A‖F measures the size of a matrix A.

Now, let A be an (m × n) matrix. The pseudoinverse of A, denoted A+, is the (n × m) matrix X that minimizes ‖AX − I‖F, where I denotes the (m × m) identity matrix. It can be shown that such a minimizing matrix always exists and is always unique. As can be seen from the definition of the pseudoinverse, it is the closest thing (in a least-squares sense) to an inverse for a rectangular matrix. In the event that A is square and invertible, the pseudoinverse coincides with the usual inverse, A−1. It can be shown that the minimum norm least-squares solution of Ax = b can be found from

x∗ = A+b.

An actual calculation of the pseudoinverse is usually made with the aid of another type of decomposition, the singular-value decomposition. A discussion of the singular-value decomposition would lead us too far afield, and so we ask the interested reader to consult a reference, such as Golub and Van Loan, Matrix Computations.
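In MATLAB, the Frobenius norm and the pseudoinverse are available directly as norm(A,'fro') and pinv(A). The sketch below is ours, using a small hypothetical rank-deficient matrix; it is an illustration, not part of the text's development.

% Frobenius norm, pseudoinverse, and the minimum norm solution
A = [1 2; 2 4; 3 6];       % rank 1, hence rank deficient
b = [1; 1; 1];             % hypothetical right-hand side
nF = norm(A, 'fro')        % sqrt(1+4+4+16+9+36) = sqrt(70)
Aplus = pinv(A);           % the (2 x 3) pseudoinverse A+
xstar = Aplus * b          % minimum norm least-squares solution x* = A+ b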

MATLAB and Least-Squares Solutions

As we noted in the previous section, there are several ways to solve least-squares problems using MATLAB.


(a) If A is (m × n) with m ≠ n, then the MATLAB command A\b returns a least-squares solution to Ax = b. If A happens to be rank deficient, then MATLAB selects a least-squares solution with no more than p nonzero entries (where p denotes the rank of A). The least-squares solution is calculated using a QR-factorization for A (see Chapter 7).

(b) If A is square and the system is inconsistent, then the MATLAB command A\b will produce a warning that A is singular or nearly singular, but will not give a least-squares solution. One way to use MATLAB to find a least-squares solution for a square but inconsistent system is to set up and solve the normal equations.

(c) Whether A is square or rectangular, the MATLAB command x = pinv(A)*b will give the minimum norm least-squares solution; the command pinv(A) generates the pseudoinverse A+.

Example 2 The following sample values from the function z = f(x, y) were obtained from experimental observations:

f(1, 1) = −1.1   f(1, 2) = 0.9
f(2, 1) = 0.2    f(2, 2) = 2.0
f(3, 1) = 0.9    f(3, 2) = 3.1

We would like to approximate the surface z = f(x, y) by a plane of the form z = ax + by + c. Use a least-squares criterion to choose the parameters a, b, and c.

Solution The conditions implied by the experimental observations are

a + b + c = −1.1
2a + b + c = 0.2
3a + b + c = 0.9
a + 2b + c = 0.9
2a + 2b + c = 2.0
3a + 2b + c = 3.1.

A least-squares solution,

a = 1.05, b = 2.00, c = −4.10,

to this overdetermined system Ax = b was found using MATLAB; see Fig. 3.31. Since MATLAB did not issue a rank deficient warning, we can assume that A has full rank (rank equal to 3) and therefore that the least-squares solution is unique.

Example 3 Find a least-squares solution to the equation Ax = b, where

A =
[ 1  1  2
  1  2  3
  1  3  4
  1  4  5 ],
b = [1, 2, 1, 2]T.


>> A = [1,1,1; 2,1,1; 3,1,1; 1,2,1; 2,2,1; 3,2,1];
>> b = [-1.1, .2, .9, .9, 2., 3.1]';
>> x = A\b

x =

    1.0500
    2.0000
   -4.1000

Figure 3.31 The results of Example 2

Solution The results are shown in Fig. 3.32(a). Note that MATLAB has issued a rank deficient warning and concluded that A has rank 2. Because A is not full rank, least-squares solutions to Ax = b are not unique. Since A has rank 2, the MATLAB command A\b selects a solution with no more than 2 nonzero components, namely x1 = [0.0, −0.8, 1.0]T.

As an alternative, we can use the pseudoinverse to calculate the minimum-norm least-squares solution (see Fig. 3.32(b)). As can be seen from Fig. 3.32(b), the MATLAB command pinv(A)*b has produced the least-squares solution x2 = [0.6, −0.2, 0.4]T. A calculation shows that ‖x1‖ = 1.2806, while the minimum norm solution in Fig. 3.32(b) has ‖x2‖ = 0.7483.

Finally, to complete this example, we can find all possible least-squares solutions by solving the normal equations. We find, using the MATLAB command rref(B),

>> x = A\b

Warning: Rank deficient, rank = 2

x =

         0
   -0.8000
    1.0000

>> x = pinv(A)*b

x =

    0.6000
   -0.2000
    0.4000

Figure 3.32 (a) Using the command A\b to find a least-squares solution for Example 3. (b) Using the pseudoinverse to find a least-squares solution for Example 3.


that the augmented matrix B = [ATA | AT b] is row equivalent to

[ 1  0  1  1
  0  1  1  0.2
  0  0  0  0 ].

Thus, the set of all least-squares solutions is found from x = [1 − x3, 0.2 − x3, x3]T = [1, 0.2, 0]T + x3[−1, −1, 1]T.
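As a check (ours, not part of the text's discussion), both machine answers from Fig. 3.32 lie in this family: the backslash solution corresponds to x3 = 1 and the pseudoinverse solution to x3 = 0.4. Both therefore satisfy the normal equations.

% Both computed solutions satisfy A'*A*x = A'*b for Example 3
A = [1 1 2; 1 2 3; 1 3 4; 1 4 5];
b = [1; 2; 1; 2];
x1 = [1; 0.2; 0] + 1.0*[-1; -1; 1];   % the A\b answer (x3 = 1)
x2 = [1; 0.2; 0] + 0.4*[-1; -1; 1];   % the pinv answer (x3 = 0.4)
A'*(A*x1 - b), A'*(A*x2 - b)          % both display the zero vector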

Example 4 As a final example to illustrate how MATLAB treats inconsistent square systems, find a least-squares solution to Ax = b, where

A =
[ 2  3  5
  1  0  3
  3  3  8 ],
b = [1, 1, 1]T.

Solution The results are given in Fig. 3.33 where, for clarity, we used the rational format to display the calculations. As can be seen, the MATLAB command A\b results in a warning that A may be ill conditioned and may have a solution vector with very large components.

Then, a least-squares solution is calculated using the pseudoinverse. The least-squares solution found is x = [2/39, −2/13, 8/39]T.

>> A = [2,3,5; 1,0,3; 3,3,8];
>> b = [1,1,1]';
>> x = A\b

Warning: Matrix is close to singular or badly scaled.
         Results may be inaccurate. RCOND = 6.405133e-18

x =

  -6755399441055744
   7505999378950822
    251799813685248

>> x = pinv(A)*b

x =

    2/39
   -2/13
    8/39

Figure 3.33 The results from Example 4


3.9 EXERCISES

Exercises 1–16 refer to the following subspaces:

a) W = {x: x = [x1, x2, x3]T, x1 − 2x2 + x3 = 0}

b) W = R(B), B =
[ 1  2
  1  1
  0  1 ]

c) W = R(B), B =
[  1  2   4
  −1  0  −2
   1  1   3 ]

d) W = {x: x = [x1, x2, x3]T, x1 + x2 + x3 = 0 and x1 − x2 − x3 = 0}

In Exercises 1–10, find a basis for the indicated subspace W. For the given vector v, solve the normal equations (2) and determine the best approximation w∗. Verify that v − w∗ is orthogonal to the basis vectors.

1. W given by (a), v = [1, 2, 6]T
2. W given by (a), v = [3, 0, 3]T
3. W given by (a), v = [1, 1, 1]T
4. W given by (b), v = [1, 1, 6]T
5. W given by (b), v = [3, 3, 3]T
6. W given by (b), v = [3, 0, 3]T
7. W given by (c), v = [2, 0, 4]T
8. W given by (c), v = [4, 0, −1]T
9. W given by (d), v = [1, 3, 1]T
10. W given by (d), v = [3, 4, 0]T

In Exercises 11–16, find an orthogonal basis for the indicated subspace W. Use Eq. (4) to determine the best approximation w∗ for the given vector v.

11. W and v as in Exercise 1
12. W and v as in Exercise 2
13. W and v as in Exercise 4
14. W and v as in Exercise 5
15. W and v as in Exercise 7
16. W and v as in Exercise 8

17. Prove Theorem 19.

SUPPLEMENTARY EXERCISES

1. Let

W = {x: x = [x1, x2]T, x1x2 = 0}.

Verify that W satisfies properties (s1) and (s3) of Theorem 2. Illustrate by example that W does not satisfy (s2).

2. Let

W = {x: x = [x1, x2]T, x1 ≥ 0, x2 ≥ 0}.

Verify that W satisfies properties (s1) and (s2) of Theorem 2. Illustrate by example that W does not satisfy (s3).

3. Let

A =
[ 2  −1   1
  1   4  −1
  2   2   1 ]

and

W = {x: x = [x1, x2, x3]T, Ax = 3x}.

a) Show that W is a subspace of R3.
b) Find a basis for W and determine dim(W).


4. If

S = {[1, 1, −2]T, [2, 1, 3]T}

and

T = {[1, 0, 5]T, [0, 1, −7]T, [3, 2, 1]T},

then show that Sp(S) = Sp(T). [Hint: Obtain an algebraic specification for each of Sp(S) and Sp(T).]

5. Let

A =
[ 1  −1  2  3
  2  −2  5  4
  1  −1  0  7 ].

a) Reduce the matrix A to echelon form, and determine the rank and the nullity of A.
b) Exhibit a basis for the row space of A.
c) Find a basis for the column space of A (that is, for R(A)) consisting of columns of A.
d) Use the answers obtained in parts b) and c) to exhibit bases for the row space and the column space of AT.
e) Find a basis for N(A).

6. Let S = {v1, v2, v3}, where

v1 = [1, −1, 1]T,  v2 = [1, 2, −1]T,  and v3 = [3, 3, −1]T.

a) Find a subset of S that is a basis for Sp(S).
b) Find a basis for Sp(S) by setting A = [v1, v2, v3] and reducing AT to echelon form.
c) Give an algebraic specification for Sp(S), and use that specification to obtain a basis for Sp(S).

7. Let A be the (m × n) matrix defined by

A =
[  n + 1    n + 2   · · ·   2n − 1        2n
  2n + 1   2n + 2   · · ·   3n − 1        3n
    ...
  mn + 1   mn + 2   · · ·   (m + 1)n − 1  (m + 1)n ].

Find a basis for the row space of A, and determine the rank and the nullity of A.

8. In a)–c), use the given information to determine the nullity of T.
a) T : R3 → R2 and the rank of T is 2.
b) T : R3 → R3 and the rank of T is 2.
c) T : R3 → R3 and the rank of T is 3.

9. In a)–c), use the given information to determine the rank of T.
a) T : R3 → R2 and the nullity of T is 2.
b) T : R3 → R3 and the nullity of T is 1.
c) T : R2 → R3 and the nullity of T is 0.

10. Let B = {x1, x2} be a basis for R2, and let T : R2 → R2 be a linear transformation such that

T(x1) = [1, 1]T and T(x2) = [2, −1]T.

If e1 = x1 − 2x2 and e2 = 2x1 + x2, where e1 and e2 are the unit vectors in R2, then find the matrix of T.

11. Let

b = [a, b]T,

and suppose that T : R3 → R2 is a linear transformation defined by T(x) = Ax, where A is a (2 × 3) matrix such that the augmented matrix [A | b] reduces to

[ 1  0   8  |  −5a + 3b
  0  1  −3  |   2a − b ].

a) Find vectors x1 and x2 in R3 such that T(x1) = e1 and T(x2) = e2, where e1 and e2 are the unit vectors in R2.
b) Exhibit a nonzero vector x3 in R3 such that x3 is in N(T).
c) Show that B = {x1, x2, x3} is a basis for R3.
d) Express each of the unit vectors e1, e2, e3 of R3 as a linear combination of the vectors in B. Now calculate T(ei), i = 1, 2, 3, and determine the matrix A.


In Exercises 12–18, b = [a, b, c, d]T, T : R6 → R4 is a linear transformation defined by T(x) = Ax, and A is a (4 × 6) matrix such that the augmented matrix [A | b] reduces to

[ 1  0   2  0  −3   1  |    4a + b − 2c
  0  1  −1  0   2   2  |   12a + 5b − 7c
  0  0   0  1  −1  −2  |   −5a − 2b + 3c
  0  0   0  0   0   0  |  −16a − 7b + 9c + d ].

12. Exhibit a basis for the row space of A, and determine the rank and the nullity of A.

13. Determine which of the following vectors are in R(T). Explain how you can tell.

w1 = [1, −1, 1, 0]T,  w2 = [1, 1, 3, 2]T,
w3 = [2, −2, 1, 9]T,  w4 = [2, 1, 4, 3]T

14. For each vector wi, i = 1, 2, 3, 4, listed in Exercise 13, if the system of equations Ax = wi is consistent, then exhibit a solution.

15. For each vector wi, i = 1, 2, 3, 4, listed in Exercise 13, if wi is in R(T), then find a vector x in R6 such that T(x) = wi.

16. Suppose that A = [A1, A2, A3, A4, A5, A6].
a) For each vector wi, i = 1, 2, 3, 4, listed in Exercise 13, if wi is in the column space of A, then express wi as a linear combination of the columns of A.
b) Find a subset of {A1, A2, A3, A4, A5, A6} that is a basis for the column space of A.
c) For each column, Aj, of A that does not appear in the basis obtained in part b), express Aj as a linear combination of the basis vectors.
d) Let b = [1, −2, 1, −7]T. Show that b is in the column space of A, and express b as a linear combination of the basis vectors found in part b).
e) If x = [2, 3, 1, −1, 1, 1]T, then express Ax as a linear combination of the basis vectors found in part b).

17. a) Give an algebraic specification for R(T), and use that specification to determine a basis for R(T).
b) Show that b = [1, 2, 3, 3]T is in R(T), and express b as a linear combination of the basis vectors found in part a).

18. a) Exhibit a basis for N(T).
b) Show that x = [6, 1, 1, −2, 2, −2]T is in N(T), and express x as a linear combination of the basis vectors found in part a).

CONCEPTUAL EXERCISES

In Exercises 1–12, answer true or false. Justify your answer by providing a counterexample if the statement is false or an outline of a proof if the statement is true.

1. If W is a subspace of Rn and x and y are vectors in Rn such that x + y is in W, then x is in W and y is in W.

2. If W is a subspace of Rn and ax is in W, where a is a nonzero scalar and x is in Rn, then x is in W.

3. If S = {x1, . . . , xk} is a subset of Rn and k ≤ n, then S is a linearly independent set.

4. If S = {x1, . . . , xk} is a subset of Rn and k > n, then S is a linearly dependent set.

5. If S = {x1, . . . , xk} is a subset of Rn and k < n, then S is not a spanning set for Rn.

6. If S = {x1, . . . , xk} is a subset of Rn and k ≥ n, then S is a spanning set for Rn.

7. If S1 and S2 are linearly independent subsets of Rn, then the set S1 ∪ S2 is also linearly independent.

8. If W is a subspace of Rn, then W has exactly one basis.


9. If W is a subspace of Rn, and dim(W) = k, then W contains exactly k vectors.

10. If B is a basis for Rn and W is a subspace of Rn, then some subset of B is a basis for W.

11. If W is a subspace of Rn, and dim(W) = n, then W = Rn.

12. Let W1 and W2 be subspaces of Rn with bases B1 and B2, respectively. Then B1 ∩ B2 is a basis for W1 ∩ W2.

In Exercises 13–23, give a brief answer.

13. Let W be a subspace of Rn, and set V = {x: x is in Rn but x is not in W}. Determine if V is a subspace of Rn.

14. Explain what is wrong with the following argument: Let W be a subspace of Rn, and let B = {e1, . . . , en} be the basis of Rn consisting of the unit vectors. Since B is linearly independent and since every vector w in W can be written as a linear combination of the vectors in B, it follows that B is a basis for W.

15. If B = {x1, x2, x3} is a basis for R3, show that B′ = {x1, x1 + x2, x1 + x2 + x3} is also a basis for R3.

16. Let W be a subspace of Rn, and let S = {w1, . . . , wk} be a linearly independent subset of W such that {w1, . . . , wk, w} is linearly dependent for every w in W. Prove that S is a basis for W.

17. Let {u1, . . . , un} be a linearly independent subset of Rn, and let x in Rn be such that u1T x = · · · = unT x = 0. Show that x = θ.

18. Let u be a nonzero vector in Rn, and let W be the subset of Rn defined by W = {x: uT x = 0}.
a) Prove that W is a subspace of Rn.
b) Show that dim(W) = n − 1.
c) If θ = w + cu, where w is in W and c is a scalar, show that w = θ and c = 0. [Hint: Consider uT(w + cu).]
d) If {w1, . . . , wn−1} is a basis for W, show that {w1, . . . , wn−1, u} is a basis for Rn. [Hint: Suppose that c1w1 + · · · + cn−1wn−1 + cu = θ. Now set w = c1w1 + · · · + cn−1wn−1 and use part c).]

19. Let V and W be subspaces of Rn such that V ∩ W = {θ} and dim(V) + dim(W) = n.
a) If v + w = θ, where v is in V and w is in W, show that v = θ and w = θ.
b) If B1 is a basis for V and B2 is a basis for W, show that B1 ∪ B2 is a basis for Rn. [Hint: Use part a) to show that B1 ∪ B2 is linearly independent.]
c) If x is in Rn, show that x can be written in the form x = v + w, where v is in V and w is in W. [Hint: First note that x can be written as a linear combination of the vectors in B1 ∪ B2.]
d) Show that the representation obtained in part c) is unique; that is, if x = v1 + w1, where v1 is in V and w1 is in W, then v = v1 and w = w1.

20. A linear transformation T : Rn → Rn is onto provided that R(T) = Rn. Prove each of the following.
a) If the rank of T is n, then T is onto.
b) If the nullity of T is zero, then T is onto.
c) If T is onto, then the rank of T is n and the nullity of T is zero.

21. If T : Rn → Rm is a linear transformation, then show that T(θn) = θm, where θn and θm are the zero vectors in Rn and Rm, respectively.

22. Let T : Rn → Rm be a linear transformation, and suppose that S = {x1, . . . , xk} is a subset of Rn such that {T(x1), . . . , T(xk)} is a linearly independent subset of Rm. Show that the set S is linearly independent.

23. Let T : Rn → Rm be a linear transformation with nullity zero. If S = {x1, . . . , xk} is a linearly independent subset of Rn, then show that {T(x1), . . . , T(xk)} is a linearly independent subset of Rm.


MATLAB EXERCISES

A continuing problem for university administrations is managing admissions so that the freshman class entering in the fall is neither too large nor too small. As you know, most high school seniors apply simultaneously for admission to several different universities. Therefore, a university must accept more applicants than it can handle in order to compensate for the expected number who decline an offer of admission.

Least-squares fits to historical data are often used for forecasting, whether it be forecasting university enrollments, or for business applications such as inventory control, or for technical applications such as modeling drag based on wind-tunnel data. In this exercise, we use a linear least-squares fit to model enrollment data.

1. Forecasting enrollments The following enrollment data is from Virginia Tech. It lists the total number of students, both undergraduate and graduate.

Total enrollment at Virginia Tech, 1979–1996

Year  Number    Year  Number    Year  Number
1979  20414     1985  22044     1991  23912
1980  21071     1986  22345     1992  23637
1981  21586     1987  22702     1993  23865
1982  21510     1988  22361     1994  23873
1983  21356     1989  22922     1995  23674
1984  22454     1990  23365     1996  24812

a) To get a feeling for the data, enter the numbers of students in a vector called TOTAL. Then, issue the MATLAB command plot(TOTAL,'o'). This command will give a scatterplot of the eighteen data points. If you want the years listed on the horizontal axis, you can define the vector YEAR with the command YEAR = 1979:1996 and then use the plot command plot(YEAR, TOTAL,'o'). Note that the scatterplot indicates a general trend of increasing enrollments, but with enrollments that decrease from time to time.

b) Because it is such a common but important problem, MATLAB has commands that can be used to generate best least-squares polynomial approximations. In particular, given data vectors X and Y, the command A = polyfit(X, Y, n) gives the vector of coefficients for the best least-squares polynomial of degree n for the data. Given a vector of evaluation points T, the command POFT = polyval(A, T) will evaluate (at each point of T) the polynomial having a vector of coefficients given by A. Use the polyfit command A = polyfit(YEAR, TOTAL, 1) to generate the coefficients for a linear fit of the data graphed in part a). Issue the hold command to hold the graph from part a). Generate the vector Y from Y = polyval(A, YEAR) and note that Y is the vector of values of the linear fit. Issue the command plot(YEAR, Y) to superimpose the graph of the linear fit on the scatterplot from part a).

c) In order to gain a feeling for how well the linear fit works as a forecasting tool, imagine that you do not know the enrollments for 1996 and 1995. Calculate the linear fit for the smaller set of data, the years 1979–1994, a set of 16 points. How well does the linear fit over these sixteen points predict the actual enrollment numbers for 1995 and 1996?


d) Use the linear fit calculated in part b) to estimate the year when enrollment can be expected to reach 30,000 and the year when enrollment should reach 35,000.
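(Note: The following is a minimal MATLAB sketch of one possible sequence of commands for parts a)–d); the variable name t30 is ours, and the data are those tabulated above.)

    % Exercise 1, parts a)-d): linear least-squares fit to the enrollment data
    YEAR  = 1979:1996;
    TOTAL = [20414 21071 21586 21510 21356 22454 22044 22345 22702 ...
             22361 22922 23365 23912 23637 23865 23873 23674 24812];
    plot(YEAR, TOTAL, 'o')          % part a): scatterplot of the data
    hold on
    A = polyfit(YEAR, TOTAL, 1);    % part b): coefficients of the linear fit
    Y = polyval(A, YEAR);           % values of the fit at the data years
    plot(YEAR, Y)                   % superimpose the fit on the scatterplot
    t30 = (30000 - A(2))/A(1)       % part d): year when the fit reaches 30,000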

How does a computer evaluate functions such as y = cos x or y = e^x? Exercises 2–5 illustrate how mathematical functions such as y = tan x are evaluated on a computer or calculator. By way of introduction, note that the only operations a computer can actually perform are addition, subtraction, multiplication, and division. A computer cannot directly evaluate functions such as y = √x or y = sin x; instead, whenever a number such as y = √2.7 or y = sin 2.7 is requested, the computer executes an algorithm that yields an approximation of the requested number. We now consider some computer algorithms for approximating mathematical functions.

2. We begin by formulating a method for estimating the function y = cos x. Recall that y = cos x is periodic with period 2π. Therefore, if we can find a polynomial y = g(x) that is a good approximation when x is in [−π, π], then we can also use it to approximate y = cos a for any real number a. To illustrate this point, consider the value a = 17.3 and note that 5π < 17.3 < 7π. Now, let x = 17.3 − 6π and note:

x = 17.3 − 6π is in the interval [−π, π].   (1)

cos 17.3 = cos(17.3 − 6π) = cos x ≈ g(x).   (2)

In general, if a is any real number, then we can always locate a between two successive odd multiples of π: (2k − 1)π ≤ a ≤ (2k + 1)π. Having located a, we see that x = a − 2kπ is in the interval [−π, π], and therefore we have a good approximation for the value cos(a − 2kπ), namely g(a − 2kπ). But, because of periodicity, cos(a − 2kπ) = cos a, and so g(a − 2kπ) is also a good approximation for cos a.

In light of the preceding observations, we turn our attention to the problem of approximating y = cos x for x in the interval [−π, π]. First, note that the approximation interval can be further reduced from [−π, π] to [0, π/2]. In particular, if we have an approximation to y = cos x that is good in [0, π/2], then we can use it to give a good approximation in [−π, π]. We ask you to establish this fact in part a).

a) Suppose y = g(x) is a good approximation to y = cos x whenever x is in the interval [0, π/2]. Now, let a be in the interval [π/2, π]. Use inequalities to show that π − a is in the interval [0, π/2]. Next, use trigonometric identities to show that cos a = −cos(π − a). Thus, for a in [π/2, π], we can use the approximation:

cos a = −cos(π − a) ≈ −g(π − a).

Finally, since cos(−x) = cos x, the approximation cos x ≈ g(x) can be extended to the interval [−π, π].

b) In part a) we saw that if we had an approximation for y = cos x that is a good one in [0, π/2], then we could use it to approximate y = cos a for any real value a. In this part, we see how a least-squares approximation y = g(x) will serve as a good way to estimate y = cos x in [0, π/2].

If we want to generate a least-squares approximation to y = cos x, we need a collection of data points (xi, cos xi), i = 1, 2, . . . , m. Given these m data points, we can choose y = g(x) to be the best least-squares polynomial approximation of degree n for the data. In order to carry out this project, however, we need to select appropriate values for both m and n. There is a rule of thumb that is well known among people who need to analyze data:


The degree of the least-squares fit should be about half the number of data points.

So, when we have m = 10 data points, we might guess that a polynomial of degree n = 5 will provide a reasonable least-squares fit. (By increasing the degree of the fitting polynomial, we can drive the error at the data points to zero; in fact, when n = m − 1, the fitting polynomial becomes an interpolating polynomial, matching the data points exactly. However, the graph of a high-degree interpolating polynomial often oscillates wildly and leads to a poor approximation between the data points. This deficiency of interpolating polynomials is one of the main reasons for using least-squares fits—we are looking for an approximation that behaves smoothly over the entire interval, and the choice n ≈ m/2 seems to work well in practice. In a later MATLAB exercise (see Chapter 4) we will explore some of the problems associated with polynomial interpolation.)

So, for m = 10, 12, 14, 16, 18, and 20, let us choose y = g(x) to be the least-squares polynomial approximation of degree n = 5, 6, 7, 8, 9, and 10, respectively. As data values xi, let us choose m points equally spaced in [0, π/2]. We also need a measure of goodness for the approximation cos x ≈ g(x). Let t1, t2, . . . , t100 denote 100 points, equally spaced in [0, π/2], and let D denote the maximum value of |cos ti − g(ti)|, i = 1, 2, . . . , 100. The size of D will serve as a measure of how well g(x) approximates cos x.

For each value of m, find the least-squares polynomial approximation of degree n and list the coefficients of the polynomial, using long format. Next, calculate the number D defined in the previous paragraph. Finally, list in column form, and in long format, the 100 values (cos ti, g(ti)). Write a brief report summarizing your conclusions.

Note: MATLAB provides a number of computational tools (see Appendix A) that make it very easy to carry out the investigations in Exercise 2. In particular, the command linspace generates vectors with equally spaced components (the data values xi and ti). If X denotes the vector of data values xi, then the command Y = cos(X) generates the vector of values yi = cos xi. In MATLAB, the number π can be entered by typing pi. Recall from part b) of Exercise 1 that the MATLAB function polyfit will calculate the coefficients for a best least-squares approximation and the function polyval will evaluate the approximation at a given set of points. Finally, to calculate the largest entry in absolute value of a vector v, use the command max(abs(v)).
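(The following is a minimal sketch of the computation for one choice of m, assuming m = 10 and n = 5; the other cases are handled the same way.)

    % Exercise 2: least-squares fit to y = cos x on [0, pi/2], m = 10, n = 5
    m = 10; n = 5;
    X = linspace(0, pi/2, m);                % m equally spaced data values
    A = polyfit(X, cos(X), n);               % degree-n least-squares fit
    T = linspace(0, pi/2, 100);              % 100 evaluation points
    D = max(abs(cos(T) - polyval(A, T)))     % measure of goodness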

Note: Exercise 2 illustrates the basic ideas underlying computer evaluation of mathematical functions. For obvious competitive reasons, computer and calculator manufacturers generally will not reveal the details of how their particular machine evaluates mathematical functions. If you are interested in knowing more about this topic, you might consult Computer Evaluation of Mathematical Functions by C. T. Fike. In addition, the now outdated line of IBM 360 and 370 mainframe computers provided a manual giving the exact description of how each FORTRAN command was implemented by the compiler (for instance, the FORTRAN command y = sqrt(α) was executed by carrying out two steps of Newton's method for the equation x² = α, starting with an initial guess generated from the value α).
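(As an aside, the square-root scheme just described is easy to imitate. The sketch below runs two Newton steps for x² = α; the IBM manuals specified a particular formula for the initial guess, which we do not have, so the starting value used here is simply our own assumption for illustration.)

    % Two steps of Newton's method for f(x) = x^2 - alpha
    alpha = 2.7;
    x = (1 + alpha)/2;          % initial guess (our assumption)
    for k = 1:2
        x = (x + alpha/x)/2;    % Newton step: x - f(x)/f'(x)
    end
    x                           % compare with sqrt(2.7) = 1.6432...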

3. If you enjoy programming, write a MATLAB function that calculates y = cos x for any real input value x. You could draw on the ideas in Exercise 2.

4. Repeat Exercise 2 for the function y = e^x. Consider choosing a least-squares polynomial approximation y = g(x) that is good on the interval [0, 1] and then using the fact that e^(a+b) = e^a e^b. For example, suppose x = 4.19. You could approximate e^4.19 as follows:


e^4.19 = e^4 · e^0.19 ≈ e^4 · g(0.19).

For the preceding approximation, we would have the constant e precalculated and stored. Then, the evaluation of e^4 just requires multiplication.
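(A minimal sketch of this range reduction follows; the fit G is built here with polyfit as in Exercise 2, and the choice of 12 sample points and degree 6 is our assumption, not a prescription from the text.)

    % Evaluate e^x by writing x = n + r, n an integer, r in [0, 1)
    S = linspace(0, 1, 12);
    G = polyfit(S, exp(S), 6);       % least-squares fit to e^x on [0, 1]
    x = 4.19;
    n = floor(x);                    % integer part of x
    r = x - n;                       % fractional part, in [0, 1)
    y = exp(1)^n * polyval(G, r)     % e^n times the polynomial value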

5. Repeat Exercise 2 for the function y = tan x. This time, a polynomial approximation will not be effective, since the tangent has vertical asymptotes at x = −π/2 and x = π/2 and polynomial functions cannot imitate such behavior. For functions having either vertical or horizontal asymptotes, you can try approximating by a rational function (that is, by a quotient of polynomials).

In particular, let y = f(x) denote the function that we wish to approximate. A rational function approximation for f will typically take the following form:

f(x) ≈ (a0 + a1x + · · · + am x^m) / (b0 + b1x + · · · + bn x^n).

The preceding approximation actually has only m + n + 1 parameters (rather than m + n + 2 coefficients) because we can divide the numerator and denominator by a constant. For example, we can assume that b0 = 1 if we want an approximation that is valid for x = 0.

An example should clarify the ideas. Suppose for 0 ≤ x < π/2, we want to approximate y = tan x by a rational function of the form

g(x) = (a0 + a1x + a2x²) / (1 + b1x + b2x²).

Since g is a five-parameter function, we will use a least-squares criterion involving ten data values to determine a0, a1, a2, b1, and b2. Since we want g(x) to approximate tan x, the ten data values will be yi = tan xi for i = 1, 2, . . . , 10. In particular, the ten conditions yi = g(xi) lead to the system:

y1(1 + b1x1 + b2x1²) = a0 + a1x1 + a2x1²
y2(1 + b1x2 + b2x2²) = a0 + a1x2 + a2x2²
⋮
y10(1 + b1x10 + b2x10²) = a0 + a1x10 + a2x10²

In matrix terms, this system is

[ 1   x1    x1²    −x1 y1     −x1² y1  ]  [ a0 ]     [ y1  ]
[ 1   x2    x2²    −x2 y2     −x2² y2  ]  [ a1 ]     [ y2  ]
[ ⋮    ⋮     ⋮        ⋮           ⋮     ]  [ a2 ]  =  [  ⋮  ]
[ 1   x10   x10²   −x10 y10   −x10² y10]  [ b1 ]     [ y10 ]
                                          [ b2 ]

As in Exercise 2, try various choices for m and n until you obtain a good approximation g(x) for tan x. Since the tangent function is odd, you might want to select m to be an odd integer and n to be an even integer. (A sketch of how the system above can be set up and solved in MATLAB follows below.)
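(A minimal sketch for the particular five-parameter g above; the sample points are our own choice, and MATLAB's backslash operator returns the least-squares solution of the overdetermined 10 × 5 system.)

    % Exercise 5: linearized least-squares setup for the rational fit
    X = linspace(0.1, 1.4, 10)';                   % ten points in [0, pi/2)
    Y = tan(X);
    C = [ones(10,1), X, X.^2, -X.*Y, -(X.^2).*Y];  % coefficient matrix above
    p = C \ Y;                                     % p = [a0; a1; a2; b1; b2]
    T = linspace(0, 1.4, 100)';
    GT = (p(1) + p(2)*T + p(3)*T.^2) ./ (1 + p(4)*T + p(5)*T.^2);
    D = max(abs(tan(T) - GT))                      % measure of goodness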

There are other ways you might think of to approximate y = tan x. For instance, if you have separate polynomial approximations for y = sin x and y = cos x, then it also makes sense to use the quotient of these two approximations; you will find, however, that the choice of a rational function determined as above will be better.


4  The Eigenvalue Problem

An understanding of the eigenvalue problem requires several results about determinants. We review the necessary results in Section 4.2. Readers familiar with determinants may omit Sections 4.2 and 4.3 with no loss of continuity. A thorough treatment of determinants is given in Chapter 6, which is designed so that it can be covered now (before eigenvalues) or later (after eigenvalues).

Overview  As we shall see, the eigenvalue problem is of great practical importance in mathematics and applications. In Section 4.1 we introduce the eigenvalue problem for the special case of (2 × 2) matrices; this special case can be handled using ideas developed in Chapter 1. In Section 4.4 we move on to the general case, the eigenvalue problem for (n × n) matrices. The general case requires several results from determinant theory, and these are summarized in Section 4.2. If you are familiar with these results, you can proceed directly to the general (n × n) case in Section 4.4.

If you have time and if you want a thorough discussion of determinants, you might want to cover Chapter 6 (Determinants) before Chapter 4 (The Eigenvalue Problem). Chapters 4 and 6 are independent, and they are designed to be read in either order.

Core Sections  4.1 The Eigenvalue Problem for (2 × 2) Matrices
               4.2 Determinants and the Eigenvalue Problem (or Sections 6.1–6.3)
               4.4 Eigenvalues and the Characteristic Polynomial
               4.5 Eigenvectors and Eigenspaces
               4.6 Complex Eigenvalues and Eigenvectors
               4.7 Similarity Transformations and Diagonalization


4.1 THE EIGENVALUE PROBLEM FOR (2 × 2) MATRICES

The eigenvalue problem, the topic of this chapter, is a problem of considerable theoretical interest and wide-ranging application. For instance, applications found in Sections 4.8 and 5.10 and Chapter 7 include procedures for:

(a) solving systems of differential equations;
(b) analyzing population growth models;
(c) calculating powers of matrices;
(d) diagonalizing linear transformations; and
(e) simplifying and describing the graphs of quadratic forms in two and three variables.

The eigenvalue problem is formulated as follows.

The Eigenvalue Problem
For an (n × n) matrix A, find all scalars λ such that the equation

Ax = λx   (1)

has a nonzero solution, x. Such a scalar λ is called an eigenvalue of A, and any nonzero (n × 1) vector x satisfying Eq. (1) is called an eigenvector corresponding to λ.

Let x be an eigenvector of A corresponding to an eigenvalue λ. Then the vector Ax is a scalar multiple of x (see Eq. (1)). Represented as geometric vectors, x and Ax have the same direction if λ is positive and the opposite direction if λ is negative (see Fig. 4.1).

Figure 4.1  Let Ax = λx, where x is a nonzero vector. Then x and Ax are parallel vectors: they point in the same direction when λ > 0 and in opposite directions when λ < 0.

Now, we can rewrite Eq. (1) as

Ax − λx = θ,

or

(A − λI)x = θ,  x ≠ θ,   (2)


where I is the (n × n) identity matrix. If Eq. (2) is to have nonzero solutions, then λ must be chosen so that the (n × n) matrix A − λI is singular. Therefore, the eigenvalue problem consists of two parts:

1. Find all scalars λ such that A − λI is singular.
2. Given a scalar λ such that A − λI is singular, find all nonzero vectors x such that (A − λI)x = θ.

If we know an eigenvalue of A, then the variable-elimination techniques described in Chapter 1 provide an efficient way to find the eigenvectors. The new feature of the eigenvalue problem is in part 1, determining all scalars λ such that the matrix A − λI is singular. In the next subsection, we discuss how such values λ are found.

Eigenvalues for (2 × 2) Matrices

Before discussing how the eigenvalue problem is solved for a general (n × n) matrix A, we first consider the special case where A is a (2 × 2) matrix. In particular, suppose we want to solve the eigenvalue problem for a matrix A of the form

A = [a  b;  c  d].

As we noted above, the first step is to find all scalars λ such that A − λI is singular. The matrix A − λI is given by

A − λI = [a  b;  c  d] − [λ  0;  0  λ],

or

A − λI = [a − λ,  b;  c,  d − λ].

Next we recall (see Exercise 68 in Section 1.9) that a (2 × 2) matrix is singular if and only if the product of the diagonal entries is equal to the product of the off-diagonal entries. That is, if B is the (2 × 2) matrix

B = [r  s;  t  u],   (3a)

then

B is singular ⇔ ru − st = 0.   (3b)

If we apply the result in (3b) to the matrix A − λI, it follows that A − λI is singular if and only if λ is a value such that

(a − λ)(d − λ) − bc = 0.   (4)

Expanding the equation for λ given above, we obtain the following condition on λ:

λ² − (a + d)λ + (ad − bc) = 0.


Equivalently, A − λI is singular if and only if λ is a root of the polynomial equation

t² − (a + d)t + (ad − bc) = 0.   (5)
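(Note: Eq. (5) makes the (2 × 2) eigenvalue computation completely mechanical. As a quick illustration, assuming MATLAB is at hand, the built-in roots command finds the roots of Eq. (5) from its coefficients; the entries below are those of the matrix in Example 1 that follows.)

    % Eigenvalues of a (2 x 2) matrix as the roots of Eq. (5)
    a = 5; b = -2; c = 6; d = -2;
    roots([1, -(a + d), a*d - b*c])   % returns 2 and 1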

An example will serve to illustrate this idea.

Example 1  Find all scalars λ such that A − λI is singular, where

A = [5  −2;  6  −2].

Solution  The matrix A − λI has the form

A − λI = [5 − λ,  −2;  6,  −2 − λ].

As in Eq. (4), A − λI is singular if and only if

−(5 − λ)(2 + λ) + 12 = 0,

or λ² − 3λ + 2 = 0. Since λ² − 3λ + 2 = (λ − 2)(λ − 1), it follows that A − λI is singular if and only if λ = 2 or λ = 1.

As a check for the calculations in Example 1, we list the matrices A − λI for λ = 2 and λ = 1:

A − 2I = [3  −2;  6  −4],   A − I = [4  −2;  6  −3].   (6)

Note that these matrices, A − 2I and A − I, are singular.

Eigenvectors for (2 × 2) Matrices

As we observed earlier, the eigenvalue problem consists of two steps: First find the eigenvalues (the scalars λ such that A − λI is singular). Next find the eigenvectors (the nonzero vectors x such that (A − λI)x = θ).

In the following example, we find the eigenvectors for matrix A in Example 1.

Example 2  For matrix A in Example 1, determine the eigenvectors corresponding to λ = 2 and to λ = 1.

Solution  According to Eq. (2), the eigenvectors corresponding to λ = 2 are the nonzero solutions of (A − 2I)x = θ. Thus, for the singular matrix A − 2I listed in (6), we need to solve the homogeneous system

3x1 − 2x2 = 0
6x1 − 4x2 = 0.

The solution of this system is given by 3x1 = 2x2, or x1 = (2/3)x2. Thus all the nonzero solutions of (A − 2I)x = θ are of the form

x = [(2/3)x2;  x2] = x2 [2/3;  1],   x2 ≠ 0.


For λ = 1, the solutions of (A − I)x = θ are found by solving

4x1 − 2x2 = 0
6x1 − 3x2 = 0.

The nonzero solutions of this system are all of the form

x = [(1/2)x2;  x2] = x2 [1/2;  1],   x2 ≠ 0.

The results of Examples 1 and 2 provide the solution to the eigenvalue problem for the matrix A, where A is given by

A = [5  −2;  6  −2].

In summary form, the eigenvalues and corresponding eigenvectors are as listed below:

Eigenvalue λ = 2;  Eigenvectors x = a[2/3;  1],  a ≠ 0.
Eigenvalue λ = 1;  Eigenvectors x = a[1/2;  1],  a ≠ 0.

Note that for a given eigenvalue λ, there are infinitely many eigenvectors corresponding to λ. Since Eq. (2) is a homogeneous system, it follows that if x is an eigenvector corresponding to λ, then so is ax for any nonzero scalar a.
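(Note: For readers following along in MATLAB, the built-in eig command provides a quick numerical check of Examples 1 and 2; this sketch is an aside, not part of the text's method.)

    % Numerical check of Examples 1 and 2
    A = [5 -2; 6 -2];
    [V, D] = eig(A)
    % The diagonal of D holds the eigenvalues 2 and 1 (in some order);
    % each column of V is a scalar multiple of [2/3; 1] or [1/2; 1].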

Finally, we make the following observation: If A is a (2 × 2) matrix, then we have a simple test for determining those values λ such that A − λI is singular. But if A is an (n × n) matrix with n > 2, we do not (as yet) have a test for determining whether A − λI is singular. In the next section a singularity test based on the theory of determinants will be developed.

4.1 EXERCISES

In Exercises 1–12, find the eigenvalues and the eigenvectors for the given matrix.

1. A = [1  0;  2  3]        2. A = [2  1;  0  −1]
3. A = [2  −1;  −1  2]      4. A = [1  −2;  1  4]
5. A = [2  1;  1  2]        6. A = [3  −1;  5  −3]
7. A = [1  0;  2  1]        8. A = [2  3;  0  2]
9. A = [2  2;  3  3]        10. A = [1  2;  4  8]
11. A = [1  −1;  1  3]      12. A = [2  −1;  1  4]


Using Eq. (4), apply the singularity test to the matrices in Exercises 13–16. Show that there is no real scalar λ such that A − λI is singular. [Note: Complex eigenvalues are discussed in Section 4.6.]

13. A = [−2  −1;  5  2]     14. A = [3  −2;  5  −3]
15. A = [2  −1;  1  2]      16. A = [1  −1;  1  1]

17. Consider the (2 × 2) symmetric matrix

A = [a  b;  b  d].

Show that there are always real scalars λ such that A − λI is singular. [Hint: Use the quadratic formula for the roots of Eq. (5).]

18. Consider the (2 × 2) matrix A given by

A = [a  b;  −b  a],   b ≠ 0.

Show that there are no real scalars λ such that A − λI is singular.

19. Let A be a (2 × 2) matrix. Show that A and A^T have the same set of eigenvalues by considering the polynomial equation (5).

4.2 DETERMINANTS AND THE EIGENVALUE PROBLEM

Now we turn our attention to the eigenvalue problem for a general (n × n) matrix A. As we observed in the last section, the first task is to determine all scalars λ such that the matrix A − λI is singular. For a (2 × 2) matrix, where A − λI is given by

A − λI = [a − λ,  b;  c,  d − λ],

we have a simple test for singularity:

A − λI is singular ⇔ (a − λ)(d − λ) − bc = 0.

For a general (n × n) matrix A, the theory of determinants can be used to discover those values λ such that A − λI is singular.

Determinant theory has long intrigued mathematicians. The reader has probably learned how to calculate determinants, at least for (2 × 2) and (3 × 3) matrices. The purpose of this section is to briefly review those aspects of determinant theory that can be used in the eigenvalue problem. A formal development of determinants, including proofs, definitions, and the important properties of determinants, can be found in Chapter 6. In this section we present three basic results: an algorithm for evaluating determinants, a characterization of singular matrices in terms of determinants, and a result concerning determinants of matrix products.

Determinants of (2 × 2) Matrices

We begin with the definition for the determinant of a (2 × 2) matrix.


Definition 1  Let A be the (2 × 2) matrix

A = [a11  a12;  a21  a22].

The determinant of A, denoted by det(A), is the number

det(A) = | a11  a12;  a21  a22 | = a11a22 − a21a12.

(Note: As Definition 1 indicates, the determinant of a (2 × 2) matrix is simply the difference of the products of the diagonal entries and the off-diagonal entries. Thus, in the context of the singularity test displayed in Eqs. (3a) and (3b) in the previous section, a (2 × 2) matrix A is singular if and only if det(A) = 0. Also note that we designate the determinant of A by vertical bars when we wish to exhibit the entries of A.)

Example 1  Find det(A), where

A = [2  4;  1  3].

Solution  By Definition 1,

det(A) = | 2  4;  1  3 | = 2 · 3 − 1 · 4 = 2.

Example 2  Find det(A), where

A = [2  4;  3  6].

DETERMINANTS  Determinants were studied and extensively used long before matrix algebra was developed. In 1693, the co-founder of calculus, Gottfried Wilhelm Leibniz (1646–1716), essentially used determinants to determine if a (3 × 3) linear system was consistent. (Similar work was done ten years earlier in Japan by Seki-Kowa.) Cramer's Rule (see Section 6.4), which uses determinants to solve linear systems, was developed in 1729 by Colin Maclaurin (1698–1746). Joseph Louis Lagrange (1736–1813) used determinants to express the area of a triangle and the volume of a tetrahedron.

It was Augustin-Louis Cauchy (1789–1857) who first coined the term "determinant" and in 1812 published a unification of the theory of determinants. In subsequent publications Cauchy used determinants in a variety of ways, such as the development of the functional determinant commonly called the Jacobian.


Solution  By Definition 1,

det(A) = | 2  4;  3  6 | = 2 · 6 − 3 · 4 = 0.

Again, Examples 1 and 2 are special instances that reaffirm our earlier observation about the singularity of a (2 × 2) matrix A. That is, A is singular if and only if det(A) = 0.

Determinants of (3 × 3) Matrices

In Definition 1, we associated a number, det(A), with a (2 × 2) matrix A. This number assignment had the property that det(A) = 0 if and only if A is singular. We now develop a similar association of a number, det(A), with an (n × n) matrix A.

We first consider the case in which n = 3.

Definition 2  Let A be the (3 × 3) matrix

A = [a11  a12  a13;  a21  a22  a23;  a31  a32  a33].

The determinant of A is the number det(A), where

det(A) = a11 | a22  a23;  a32  a33 | − a12 | a21  a23;  a31  a33 | + a13 | a21  a22;  a31  a32 |.   (1)

(Note: The determinant of a (3 × 3) matrix is defined to be the weighted sum of three (2 × 2) determinants. Similarly, the determinant of an (n × n) matrix will be defined as the weighted sum of n determinants, each of order [(n − 1) × (n − 1)].)

Example 3  Find det(A), where

A = [1  2  −1;  5  3  4;  −2  0  1].

Solution  From Definition 2,

det(A) = (1) | 3  4;  0  1 | − (2) | 5  4;  −2  1 | + (−1) | 5  3;  −2  0 |
        = 1(3 · 1 − 4 · 0) − 2[5 · 1 − 4(−2)] − 1[5 · 0 − 3(−2)]
        = 3 − 26 − 6 = −29.


Minors and Cofactors

If we examine the three (2 × 2) determinants that appear in Eq. (1), we can see a pattern. In particular, the entries in the first (2 × 2) determinant can be obtained from the matrix A by striking out the first row and column of A. Similarly, the entries in the second (2 × 2) determinant can be obtained by striking out the first row and second column of A. Finally, striking out the first row and third column yields the third (2 × 2) determinant.

The process of generating submatrices by striking out rows and columns is fundamental to the definition of a general (n × n) determinant. For a general (n × n) matrix A, we will use the notation Mrs to designate the [(n − 1) × (n − 1)] matrix generated by removing row r and column s from A (see Definition 3).

Definition 3  Let A = (aij) be an (n × n) matrix. The [(n − 1) × (n − 1)] matrix that results from removing the rth row and sth column from A is called a minor matrix of A and is designated by Mrs.

Example 4 illustrates the idea in Definition 3.

Example 4  List the minor matrices M21, M23, M42, and M11 for the (4 × 4) matrix A given by

A = [1  2  1  3;  0  1  2  0;  4  2  0  −1;  −2  3  1  1].

Solution  The minor matrix M21 is obtained from A by removing the second row and the first column from A:

M21 = [2  1  3;  2  0  −1;  3  1  1].

Similarly, we have

M23 = [1  2  3;  4  2  −1;  −2  3  1],   M42 = [1  1  3;  0  2  0;  4  0  −1],   and

M11 = [1  2  0;  2  0  −1;  3  1  1].


Using the notation for a minor matrix, we can reinterpret the definition of a (3 × 3) determinant as follows: If A = (aij) is a (3 × 3) matrix, then from Eq. (1) and Definition 3,

det(A) = a11 det(M11) − a12 det(M12) + a13 det(M13).   (2)

In determinant theory, the number det(Mij) is called a minor. Precisely, if A = (aij) is an (n × n) matrix, then the number det(Mij) is the minor of the (i, j)th entry, aij. In addition, the numbers Aij defined by

Aij = (−1)^(i+j) det(Mij)

are known as cofactors (or signed minors). Thus the expression for det(A) in Eq. (2) is known as a cofactor expansion corresponding to the first row.

It is natural, then, to wonder about other cofactor expansions of A that parallel the one given in Eq. (2). For instance, what is the cofactor expansion of A corresponding to the second row or even, perhaps, corresponding to the third column?

By analogy, a cofactor expansion along the second row would have the form

−a21 det(M21) + a22 det(M22) − a23 det(M23).   (3)

An expansion along the third column would take the form

a13 det(M13) − a23 det(M23) + a33 det(M33).   (4)

Example 5  Let A denote the (3 × 3) matrix from Example 3,

A = [1  2  −1;  5  3  4;  −2  0  1].

Calculate the second-row and third-column cofactor expansions defined by Eqs. (3) and (4), respectively.

Solution  According to the pattern in Eq. (3), a second-row expansion has the value

−5 | 2  −1;  0  1 | + 3 | 1  −1;  −2  1 | − 4 | 1  2;  −2  0 | = −10 − 3 − 16 = −29.

Using Eq. (4), we obtain a third-column expansion given by

− | 5  3;  −2  0 | − 4 | 1  2;  −2  0 | + | 1  2;  5  3 | = −6 − 16 − 7 = −29.


(Note: For the (3 × 3) matrix A in Example 5, there are three possible row expansions and three possible column expansions. It can be shown that each of these six expansions yields exactly the same value, namely, −29. In general, as we observe in the next subsection, all row expansions and all column expansions produce the same value for any (n × n) matrix.)

The Determinant of an (n × n) Matrix

We now give an inductive definition for det(A), the determinant of an (n × n) matrix. That is, det(A) is defined in terms of determinants of [(n − 1) × (n − 1)] matrices. The natural extension of Definition 2 is the following.

Definition 4  Let A = (aij) be an (n × n) matrix. The determinant of A is the number det(A), where

det(A) = a11 det(M11) − a12 det(M12) + · · · + (−1)^(n+1) a1n det(M1n)

       = Σ_{j=1}^{n} (−1)^(j+1) a1j det(M1j).   (5)

The definition for det(A) can be stated in a briefer form if we recall the notation Aij for a cofactor. That is, Aij = (−1)^(i+j) det(Mij). Using the cofactor notation, we can rephrase Definition 4 as

det(A) = Σ_{j=1}^{n} a1j A1j.   (6)

In the following example we see how Eq. (5) gives the determinant of a (4 × 4) matrix as the sum of four (3 × 3) determinants, where each (3 × 3) determinant is the sum of three (2 × 2) determinants.

Example 6  Use Definition 4 to calculate det(A), where

A = [1  2  −1  1;  −1  0  2  −2;  3  −1  1  1;  2  0  −1  2].


Solution  The determinants of the minor matrices M11, M12, M13, and M14 are (3 × 3) determinants and are calculated as before with Definition 2:

det(M11) = | 0  2  −2;  −1  1  1;  0  −1  2 |
         = 0 | 1  1;  −1  2 | − 2 | −1  1;  0  2 | + (−2) | −1  1;  0  −1 | = 2

det(M12) = | −1  2  −2;  3  1  1;  2  −1  2 |
         = (−1) | 1  1;  −1  2 | − 2 | 3  1;  2  2 | + (−2) | 3  1;  2  −1 | = −1

det(M13) = | −1  0  −2;  3  −1  1;  2  0  2 |
         = (−1) | −1  1;  0  2 | − 0 | 3  1;  2  2 | + (−2) | 3  −1;  2  0 | = −2

det(M14) = | −1  0  2;  3  −1  1;  2  0  −1 |
         = (−1) | −1  1;  0  −1 | − 0 | 3  1;  2  −1 | + 2 | 3  −1;  2  0 | = 3.

Hence, from Eq. (5) with n = 4,

det(A) = 1(2) − 2(−1) + (−1)(−2) − 1(3) = 3.
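(Note: Definition 4 translates directly into a recursive program. The following MATLAB sketch is for illustration only; it is far slower than the built-in det command, and the function name cofactordet is ours. Save it in a file named cofactordet.m.)

    function d = cofactordet(A)
    % COFACTORDET  Determinant by cofactor expansion along row one (Eq. 5).
    n = size(A, 1);
    if n == 1
        d = A(1,1);
    else
        d = 0;
        for j = 1:n
            M = A(2:n, [1:j-1, j+1:n]);   % minor matrix M1j
            d = d + (-1)^(1+j) * A(1,j) * cofactordet(M);
        end
    end

For the matrix of Example 6, cofactordet(A) returns 3, in agreement with the computation above.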

Example 7  For the (4 × 4) matrix A in Example 6, calculate the second-column cofactor expansion given by

−a12 det(M12) + a22 det(M22) − a32 det(M32) + a42 det(M42).

Solution  From Example 6, det(M12) = −1. Since a22 = 0 and a42 = 0, we need not calculate det(M22) and det(M42). The only other value needed is det(M32), where

det(M32) = | 1  −1  1;  −1  2  −2;  2  −1  2 |
         = 1 | 2  −2;  −1  2 | − (−1) | −1  −2;  2  2 | + 1 | −1  2;  2  −1 | = 1.

Thus the second-column expansion gives the value

−2(−1) + 0 · det(M22) − (−1)(1) + 0 · det(M42) = 3.

From Example 6, det(A) = 3. From Example 7, a second-column expansion also produces the same value, 3. The next theorem states that a cofactor expansion along any row or any column always produces the same number, det(A). The expansions in the theorem are phrased in the same brief notation as in Eq. (6). The proof of Theorem 1 is given in Chapter 6.

Theorem 1  Let A = (aij) be an (n × n) matrix with minor matrices Mij and cofactors Aij = (−1)^(i+j) det(Mij). Then


det(A) = Σ_{j=1}^{n} aij Aij   (ith-row expansion)

       = Σ_{i=1}^{n} aij Aij   (jth-column expansion).

Because of Theorem 1, we can always find det(A) by choosing the row or column of A with the most zeros for the cofactor expansion. (If aij = 0, then aij Aij = 0, and we need not compute Aij.) In the next section we consider how to use elementary row or column operations to create zeros and hence simplify determinant calculations.

Determinants and Singular Matrices

Theorems 2 and 3, which follow, are fundamental to our study of eigenvalues. These theorems are stated here, and their proofs are given in Chapter 6.

Theorem 2  Let A and B be (n × n) matrices. Then

det(AB) = det(A) det(B).

The following example illustrates Theorem 2.

Example 8  Calculate det(A), det(B), and det(AB) for the matrices

A = [1  2;  −1  1]   and   B = [2  3;  1  −1].

Solution  The product, AB, is given by

AB = [4  1;  −1  −4].

Clearly det(AB) = −15. We also see that det(A) = 3 and det(B) = −5. Observe, for this special case, that det(A) det(B) = det(AB).

To study the eigenvalue problem for an (n × n) matrix, we need a test for singularity. The following theorem shows that determinant theory provides a simple and elegant test.

Theorem 3  Let A be an (n × n) matrix. Then

A is singular if and only if det(A) = 0.

Theorem 3 is already familiar for the case in which A is a (2 × 2) matrix (recall Definition 1 and Examples 1 and 2). An outline for the proof of Theorem 3 is given in the next section. Finally, in Section 4.4 we will be able to use Theorem 3 to devise a procedure for solving the eigenvalue problem.

We conclude this brief introduction to determinants by observing that it is easy to calculate the determinant of a triangular matrix.


Theorem 4  Let T = (tij) be an (n × n) triangular matrix. Then

det(T) = t11 t22 · · · tnn.

The proof of Theorem 4 is left to the exercises. The next example illustrates how a proof for Theorem 4 might be constructed.

Example 9  Use a cofactor expansion (as in Definition 4 or Theorem 1) to calculate det(T):

T = [2  1  3  7;  0  4  8  1;  0  0  1  5;  0  0  0  3].

Solution  By Theorem 1, we can use a cofactor expansion along any row or column to calculate det(T). Because of the structure of T, an expansion along the first column or the fourth row will be easiest.

Expanding along the first column, we find

det(T) = | 2  1  3  7;  0  4  8  1;  0  0  1  5;  0  0  0  3 |
       = 2 | 4  8  1;  0  1  5;  0  0  3 |
       = (2)(4) | 1  5;  0  3 | = 24.

This example provides a special case of Theorem 4.
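(In MATLAB terms, Theorem 4 says that the determinant of a triangular matrix is just prod(diag(T)); a one-line check of Example 9 is sketched below.)

    T = [2 1 3 7; 0 4 8 1; 0 0 1 5; 0 0 0 3];
    prod(diag(T))     % returns 24, the product of the diagonal entries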

(Note: An easy corollary to Theorem 4 is the following: If I is the (n × n) identity matrix, then det(I) = 1. In the exercises that follow, some additional results are derived from the theorems in this section and from the fact that det(I) = 1.)

4.2 EXERCISES

In Exercises 1–6, list the minor matrix Mij, and calculate the cofactor Aij = (−1)^(i+j) det(Mij) for the matrix A given by

A = [2  −1  3  1;  4  1  3  −1;  6  2  4  1;  2  2  0  −2]   (7)

1. M11 2. M21 3. M31

4. M41 5. M34 6. M43

7. Use the results of Exercises 1–4 to calculate det(A) for the matrix A given in (7).

In Exercises 8–19, calculate the determinant of the given matrix. Use Theorem 3 to state whether the matrix is singular or nonsingular.

8. A = [2  1;  −1  2]       9. A = [1  −1;  −2  2]
10. A = [2  3;  4  6]      11. A = [1  1;  2  1]


12. A = [1  2  4;  2  3  7;  4  2  10]      13. A = [2  −3  2;  −1  −2  1;  3  1  −1]
14. A = [1  2  1;  0  3  2;  −1  1  1]      15. A = [2  0  0;  1  3  2;  2  1  4]
16. A = [2  0  0;  3  1  0;  2  4  2]       17. A = [1  2  1  5;  0  3  0  0;  0  4  1  2;  0  3  1  4]
18. A = [0  1  0  0;  0  0  1  0;  1  0  0  0;  0  0  0  1]
19. A = [0  0  0  2;  0  0  3  1;  0  2  1  2;  3  4  1  4]

20. Let A = (aij) be a given (3 × 3) matrix. Form the associated (3 × 5) matrix B shown next:

B = [a11  a12  a13  a11  a12;  a21  a22  a23  a21  a22;  a31  a32  a33  a31  a32]

a) Subtract the sum of the three upward diagonal products from the sum of the three downward diagonal products, and argue that your result is equal to det(A).
b) Show, by example, that a similar basketweave algorithm cannot be used to calculate the determinant of a (4 × 4) matrix.

In Exercises 21 and 22, find all ordered pairs (x, y) such that A is singular.

21. A = [x  y  1;  2  3  1;  0  −1  1]      22. A = [x  1  1;  2  1  1;  0  −1  y]

23. Let A = (aij) be the (n × n) matrix specified thus: aij = d for i = j and aij = 1 for i ≠ j. For n = 2, 3, and 4, show that

det(A) = (d − 1)^(n−1) (d − 1 + n).

24. Let A and B be (n × n) matrices. Use Theorems 2 and 3 to give a quick proof of each of the following.
a) If either A or B is singular, then AB is singular.
b) If AB is singular, then either A or B is singular.

25. Suppose that A is an (n × n) nonsingular matrix, and recall that det(I) = 1, where I is the (n × n) identity matrix. Show that det(A^(−1)) = 1/det(A).

26. If A and B are (n × n) matrices, then usually AB ≠ BA. Nonetheless, argue that always det(AB) = det(BA).

In Exercises 27–30, use Theorem 2 and Exercise 25 to evaluate the given determinant, where A and B are (n × n) matrices with det(A) = 3 and det(B) = 5.

27. det(ABA^(−1))      28. det(A²B)
29. det(A^(−1)B^(−1)A²)      30. det(AB^(−1)A^(−1)B)

31. a) Let A be an (n × n) matrix. If n = 3, det(A) can be found by evaluating three (2 × 2) determinants. If n = 4, det(A) can be found by evaluating twelve (2 × 2) determinants. Give a formula, H(n), for the number of (2 × 2) determinants necessary to find det(A) for an arbitrary n.
b) Suppose you can perform additions, subtractions, multiplications, and divisions each at a rate of one per second. How long does it take to evaluate H(n) (2 × 2) determinants when n = 2, n = 5, and n = 10?

32. Let U and V be (n × n) upper-triangular matrices. Prove a special case of Theorem 2: det(UV) = det(U) det(V). [Hint: Use the definition for matrix multiplication to calculate the diagonal entries of the product UV, and then apply Theorem 4. You will also need to recall from Exercise 67 in Section 1.5 that UV is an upper-triangular matrix.]

33. Let V be an (n × n) triangular matrix. Use Theorem 4 to prove that det(V^T) = det(V).

34. Let T = (tij) be an (n × n) upper-triangular matrix. Prove that det(T) = t11 t22 · · · tnn. [Hint: Use mathematical induction, beginning with a (2 × 2) upper-triangular determinant.]


4.3 ELEMENTARY OPERATIONS AND DETERMINANTS (OPTIONAL)*

*The results in this section are not required for a study of the eigenvalue problem. They are included here for the convenience of the reader and because they follow naturally from definitions and theorems in the previous section. See Chapter 6 for proofs.

We saw in Section 4.2 that having many zero entries in a matrix simplifies the calculation of its determinant. The ultimate case is given in Theorem 4. If T = (tij) is an (n × n) triangular matrix, then it is very easy to calculate det(T):

det(T) = t11 t22 · · · tnn.

In Chapter 1, we used elementary row operations to create zero entries. We now consider these row operations (along with similar column operations) and describe their effect on the value of the determinant. For instance, consider the (2 × 2) matrices

A = [1  2;  3  4]   and   B = [3  4;  1  2].

Clearly B is the result of interchanging the first and second rows of A (an elementary row operation). Also, we see that det(A) = −2, whereas det(B) = 2. This computation demonstrates that performing an elementary operation may change the value of the determinant. As we will see, however, it is possible to predict in advance the nature of any changes that might be produced by an elementary operation. For example, we will see that a row interchange always reverses the sign of the determinant.

Before studying the effects of elementary row operations on determinants, we consider the following theorem, which is an immediate consequence of Theorem 1.

Theorem 5  If A is an (n × n) matrix, then

det(A) = det(A^T).

Proof  The proof is by induction, and we begin with the case n = 2. Let A = (aij) be a (2 × 2) matrix:

A = [a11  a12;  a21  a22],   A^T = [a11  a21;  a12  a22].

Hence it is clear that det(A) = det(A^T) when A is a (2 × 2) matrix.

The inductive step hinges on the following observation about minor matrices: Suppose that B is a square matrix, and let C = B^T. Next, let Mrs and Nrs denote minor matrices of B and C, respectively. Then these minor matrices are related by

Nij = (Mji)^T.   (1)

(In words, the ijth minor matrix of B^T is equal to the transpose of the jith minor matrix of B.)

To proceed with the induction, suppose that Theorem 5 is valid for all (k × k) matrices, 2 ≤ k ≤ n − 1. Let A be an (n × n) matrix, where n > 2. Let Mrs denote the minor matrices of A, and let Nrs denote the minor matrices of A^T. Consider an expansion of det(A) along the first row and an expansion of det(A^T) along the first column:


det(A) = a11 det(M11) − a12 det(M12) + · · · + (−1)^(n+1) a1n det(M1n)

det(A^T) = a11 det(N11) − a12 det(N21) + · · · + (−1)^(n+1) a1n det(Nn1).   (2)

(The expansion for det(A^T) in Eq. (2) incorporates the fact that the first-column entries of A^T are the same as the first-row entries of A.)

By Eq. (1), the minor matrices Nj1 in Eq. (2) satisfy Nj1 = (M1j)^T, 1 ≤ j ≤ n. By the inductive hypothesis, det((M1j)^T) = det(M1j), since M1j is a matrix of order n − 1. Therefore, both expansions in Eq. (2) have the same value, showing that det(A^T) = det(A).

One valuable aspect of Theorem 5 is that it tells us an elementary column operation applied to a square matrix A will affect det(A) in precisely the same way as the corresponding elementary row operation.

Effects of Elementary Operations

We first consider how the determinant changes when rows of a matrix are interchanged.

Theorem 6  Let A be an (n × n) matrix, and let B be formed by interchanging any two rows (or columns) of A. Then

det(B) = −det(A).

Proof  First we consider the case where the two rows to be interchanged are adjacent, say, the ith and (i + 1)st rows. Let Mij, 1 ≤ j ≤ n, be the minor matrices of A from the ith row, and let N_(i+1,j), 1 ≤ j ≤ n, be the minor matrices of B from the (i + 1)st row. A bit of reflection will reveal that N_(i+1,j) = Mij. Since ai1, ai2, . . . , ain are the elements of the (i + 1)st row of B, we have

det(B) = Σ_{j=1}^{n} (−1)^(i+1+j) aij det(N_(i+1,j))
       = −Σ_{j=1}^{n} (−1)^(i+j) aij det(N_(i+1,j))
       = −Σ_{j=1}^{n} (−1)^(i+j) aij det(Mij),   since N_(i+1,j) = Mij
       = −det(A).

Thus far we know that interchanging any two adjacent rows changes the sign of the determinant. Now suppose that B is formed by interchanging the ith and kth rows of A, where k ≥ i + 1. The ith row can be moved to the kth row by (k − i) successive interchanges of adjacent rows. The original kth row at this point is now the (k − 1)st row. This row can be moved to the ith row by (k − 1 − i) successive interchanges of adjacent rows. At this point all other rows are in their original positions. Hence, we have formed B with 2k − 1 − 2i successive interchanges of adjacent rows. Thus,

det(B) = (−1)^(2k−1−2i) det(A) = −det(A).


Corollary  If A is an (n × n) matrix with two identical rows (columns), then det(A) = 0.

We leave the proof of the corollary as an exercise.

Example 1  Find det(A), where

A = [0  0  0  4;  0  0  3  2;  0  1  2  5;  2  3  1  3].

Solution  We could calculate det(A) by using a cofactor expansion, but we also see that we can rearrange the rows of A to produce a triangular matrix. Adopting the latter course of action, we have

det(A) = | 0  0  0  4;  0  0  3  2;  0  1  2  5;  2  3  1  3 |
       = −| 2  3  1  3;  0  0  3  2;  0  1  2  5;  0  0  0  4 |
       = | 2  3  1  3;  0  1  2  5;  0  0  3  2;  0  0  0  4 | = 24.

Next we consider the effect of another elementary operation.

Theorem 7  Suppose that B is obtained from the (n × n) matrix A by multiplying one row (or column) of A by a nonzero scalar c and leaving the other rows (or columns) unchanged. Then

det(B) = c det(A).

Proof  Suppose that [c ai1, c ai2, . . . , c ain] is the ith row of B. Since the other rows of B are unchanged from A, the minor matrices of B from the ith row are the same as Mij, the minor matrices of A from the ith row. Using a cofactor expansion from the ith row of B to calculate det(B) gives

det(B) = Σ_{j=1}^{n} (c aij)(−1)^(i+j) det(Mij)
       = c Σ_{j=1}^{n} aij (−1)^(i+j) det(Mij)
       = c det(A).

As we see in the next theorem, the third elementary operation leaves the determinant unchanged. (Note: Theorem 8 is also valid when the word column is substituted for the word row.)

Theorem 8  Let A be an (n × n) matrix. Suppose that B is the matrix obtained from A by replacing the ith row of A by the ith row of A plus a constant multiple of the kth row of A, k ≠ i. Then

det(B) = det(A).


Proof  Note that the ith row of B has the form

[ai1 + c ak1, ai2 + c ak2, . . . , ain + c akn].

Since the other rows of B are unchanged from A, the minor matrices taken with respect to the ith row of B are the same as the minor matrices Mij of A.

Using a cofactor expansion of det(B) from the ith row, we have

det(B) = Σ_{j=1}^{n} (aij + c akj)(−1)^(i+j) det(Mij)
       = Σ_{j=1}^{n} aij (−1)^(i+j) det(Mij) + c Σ_{j=1}^{n} akj (−1)^(i+j) det(Mij)
       = det(A) + c Σ_{j=1}^{n} akj (−1)^(i+j) det(Mij).   (3)

Theorem 8 will be proved if we can show that the last summation on the right-hand side of Eq. (3) has the value zero. In order to prove that the summation has the value zero, construct a matrix Q by replacing the ith row of A by the kth row of A. The matrix Q so constructed has two identical rows (the kth row of A appears both as the kth row and the ith row of Q). Therefore, by the corollary to Theorem 6, det(Q) = 0. Next, expanding det(Q) along the ith row of Q, we obtain (since the ith-row minors of Q are the same as those of A and since the ijth entry of Q is akj)

0 = det(Q) = Σ_{j=1}^{n} akj (−1)^(i+j) det(Mij).   (4)

Substituting Eq. (4) into Eq. (3) establishes the theorem.

Example 2  Evaluate det(A), where

A = [1  2  1;  0  3  2;  −2  1  1].

Solution  The value of det(A) is unchanged if we add 2 times row 1 to row 3. The effect of this row operation will be to introduce another zero entry in the first column. Specifically,

det(A) = | 1  2  1;  0  3  2;  −2  1  1 |   (R3 + 2R1)
       = | 1  2  1;  0  3  2;  0  5  3 | = −1.

Using Elementary Operations to Simplify Determinants

Clearly it is usually easier to calculate the determinant of a matrix with several zero entries than to calculate one with no zero entries. Therefore, a common strategy in


determinant evaluation is to mimic the steps of Gaussian elimination; that is, to use elementary row or column operations to reduce the matrix to triangular form.

Example 3  Evaluate det(A), where

A = [1  2  −1;  5  3  4;  −2  0  1].

Solution  With Gaussian elimination, we would first form the matrix B by the following operations: Replace R2 by R2 − 5R1, and replace R3 by R3 + 2R1. From Theorem 8, the matrix B produced by these two row operations has the same determinant as the original matrix A. In detail:

det(A) = | 1  2  −1;  5  3  4;  −2  0  1 |
       = | 1  2  −1;  0  −7  9;  0  4  −1 |
       = 1 | −7  9;  4  −1 | = 7 − 36 = −29.

We could have created a zero in the (2, 1) position of the last (2 × 2) determinant. The formula for (2 × 2) determinants is so simple, however, that it is customary to evaluate a (2 × 2) determinant directly. The next example illustrates that we need not always attempt to go to a triangular form in order to simplify a determinant.

Example 4  Evaluate det(A), where

A = [1  2  −1  1;  −1  0  2  −2;  3  −1  1  1;  2  0  −1  2].

Solution  We can introduce a third zero in the second column if we replace R1 by R1 + 2R3:

det(A) = | 1  2  −1  1;  −1  0  2  −2;  3  −1  1  1;  2  0  −1  2 |
       = | 7  0  1  3;  −1  0  2  −2;  3  −1  1  1;  2  0  −1  2 |
       = −(−1) | 7  1  3;  −1  2  −2;  2  −1  2 |.

(The second equality is from Theorem 8. The third equality is from an expansion along the second column.) Next we replace R2 by R2 − 2R1 and R3 by R1 + R3.


The details are

det(A) = | 7  1  3;  −1  2  −2;  2  −1  2 |
       = | 7  1  3;  −15  0  −8;  9  0  5 |
       = −1 | −15  −8;  9  5 | = −(−75 + 72) = 3.

The next example illustrates that if the entries in a determinant are integers, then we can avoid working with fractions until the last step. The technique involves multiplying various rows by constants to make each entry in a column divisible by the pivot entry in the column.

Example 5  Find det(A), where

A = [2  3  −2  4;  3  −3  5  2;  5  2  4  3;  −3  4  −3  2].

Solution  We first multiply rows 2, 3, and 4 by 2 to make them divisible by 2. (Each such multiplication doubles the determinant, so we compensate with the factor 1/8.) The row reduction operations to create zeros in the first column can then proceed without using fractions. The row operations are R2 − 3R1, R3 − 5R1, and R4 + 3R1:

det(A) = | 2  3  −2  4;  3  −3  5  2;  5  2  4  3;  −3  4  −3  2 |
       = (1/8) | 2  3  −2  4;  6  −6  10  4;  10  4  8  6;  −6  8  −6  4 |
       = (1/8) | 2  3  −2  4;  0  −15  16  −8;  0  −11  18  −14;  0  17  −12  16 |
       = (2/8) | −15  16  −8;  −11  18  −14;  17  −12  16 |
       = (1/4) | −15  2(8)  2(−4);  −11  2(9)  2(−7);  17  2(−6)  2(8) |
       = (2(2)/4) | −15  8  −4;  −11  9  −7;  17  −6  8 |.


We now multiply the second row by 4 (compensating with a factor of 1/4) and use R2 − 7R1 and R3 + 2R1:

det(A) = (1/4) | −15  8  −4;  −44  36  −28;  17  −6  8 |
       = (1/4) | −15  8  −4;  61  −20  0;  −13  10  0 |
       = (−4/4)(610 − 260) = −350.

The preceding examples illustrate that there are many strategies that will lead to a simpler determinant calculation. Exactly which choices are made is determined by experience and personal preference.

Proof of Theorem 3

In the last section we stated Theorem 3: An (n × n) matrix A is singular if and only if det(A) = 0. The results of this section enable us to sketch a proof for Theorem 3.

If A is an (n × n) matrix, then we know from Chapter 1 that we can use Gaussian elimination to produce a row-equivalent upper-triangular matrix T. This matrix T can be formed by using row interchanges and adding multiples of one row to other rows. Thus, by Theorems 6 and 8,

det(A) = ±det(T).   (5)

An outline for the proof of Theorem 3 is given below. We use tij to denote the entries of the upper-triangular matrix T:

1. det(A) = 0 ⇔ det(T) = 0, by Eq. (5);
2. det(T) = 0 ⇔ tii = 0 for some i, by Theorem 4;
3. tii = 0 for some i ⇔ T singular (see Exercise 56 of Section 1.7);
4. T singular ⇔ A singular, since T and A are row equivalent.
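(Note: This reduction strategy is also how determinants are computed in practice. The MATLAB sketch below uses the built-in lu factorization, which returns PA = LU with L unit lower triangular, so det(A) = ±prod(diag(U)), the sign coming from the row interchanges recorded in the permutation matrix P.)

    % Determinant via triangular reduction, as in Eq. (5)
    A = [1 2 -1; 5 3 4; -2 0 1];
    [L, U, P] = lu(A);
    d = det(P) * prod(diag(U))   % det(P) = +1 or -1; d equals det(A) = -29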

4.3 EXERCISES

In Exercises 1–6, evaluate det(A) by using row operations to introduce zeros into the second and third entries of the first column.

1. A = [1  2  1;  3  0  2;  −1  1  3]      2. A = [2  4  6;  3  1  2;  1  2  1]
3. A = [3  6  9;  2  0  2;  1  2  0]       4. A = [1  1  2;  −2  1  3;  1  4  1]
5. A = [2  4  −3;  3  2  5;  2  3  4]      6. A = [3  4  −2;  2  3  5;  2  4  3]

In Exercises 7–12, use only column interchanges or row interchanges to produce a triangular determinant, and then find the value of the original determinant.

7. | 1  0  0  0;  2  0  0  3;  1  1  0  1;  1  4  2  2 |
8. | 0  0  3  1;  2  1  0  1;  0  0  0  2;  0  2  2  1 |


9. | 0  0  2  0;  0  0  1  3;  0  4  1  3;  2  1  5  6 |
10. | 0  0  1  0;  1  2  1  3;  0  0  0  5;  0  3  1  2 |
11. | 0  0  1  0;  0  2  6  3;  2  4  1  5;  0  0  0  4 |
12. | 0  1  0  0;  0  2  0  3;  2  1  0  6;  3  2  2  4 |

In Exercises 13–18, assume that the (3 × 3) matrix A satisfies det(A) = 2, where A is given by

A = [a  b  c;  d  e  f;  g  h  i].

Calculate det(B) in each case.

13. B = [a  b  3c;  d  e  3f;  g  h  3i]      14. B = [d  e  f;  g  h  i;  a  b  c]
15. B = [b  a  c;  e  d  f;  h  g  i]         16. B = [a  b  c;  a + d  b + e  c + f;  g  h  i]
17. B = [d  e  f;  2a  2b  2c;  g  h  i]      18. B = [d  f  e;  a  c  b;  g  i  h]

In Exercises 19–22, evaluate the (4 × 4) determinants. Theorems 6–8 can be used to simplify the calculations.

19. | 2  4  2  6;  1  3  2  1;  2  1  2  3;  1  2  1  1 |
20. | 0  2  1  3;  1  2  1  0;  0  1  1  3;  2  2  1  2 |
21. | 0  4  1  3;  0  2  2  1;  1  3  1  2;  2  2  1  4 |
22. | 2  2  4  4;  1  1  3  3;  1  0  2  1;  4  1  3  2 |

In Exercises 23 and 24, use row operations to obtain a triangular determinant and find the value of the original Vandermonde determinant.

23. | 1  a  a²;  1  b  b²;  1  c  c² |
24. | 1  a  a²  a³;  1  b  b²  b³;  1  c  c²  c³;  1  d  d²  d³ |

25. Let A be an (n × n) matrix. Use Theorem 7 to argue that det(cA) = cⁿ det(A).
26. Prove the corollary to Theorem 6. [Hint: Suppose that the ith and jth rows of A are identical. Interchange these two rows and let B denote the matrix that results. How are det(A) and det(B) related?]

27. Find examples of (2 × 2) matrices A and B such that det(A + B) ≠ det(A) + det(B).

28. An (n × n) matrix A is called skew symmetric if A^T = −A. Show that if A is skew symmetric, then det(A) = (−1)ⁿ det(A). [Hint: Use Theorem 5 and Exercise 25.] Now, argue that an (n × n) skew-symmetric matrix is singular when n is an odd integer.


4.4 EIGENVALUES AND THE CHARACTERISTIC POLYNOMIAL

Having given the brief introduction to determinant theory presented in Section 4.2, we return to the central topic of this chapter, the eigenvalue problem. For reference, recall that the eigenvalue problem for an (n × n) matrix A has two parts:

1. Find all scalars λ such that A − λI is singular. (Such scalars are the eigenvalues of A.)
2. Given an eigenvalue λ, find all nonzero vectors x such that (A − λI)x = θ. (Such vectors are the eigenvectors corresponding to the eigenvalue λ.)

In this section we focus on part 1, finding the eigenvalues. In the next section we discuss eigenvectors.

In Section 4.1, we were able to determine the eigenvalues of a (2 × 2) matrix by using a test for singularity given by Eq. (4) in Section 4.1. Knowing Theorem 3 from Section 4.2, we now have a test for singularity that is applicable to any (n × n) matrix. As applied to the eigenvalue problem, Theorem 3 can be used as follows:

A− λI is singular⇔ det(A− λI) = 0. (1)

An example will illustrate how the singularity test given in Eq. (1) is used in practice.

Example 1 Use the singularity test given in Eq. (1) to determine the eigenvalues of the (3 × 3) matrix A, where

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 3 & 3 \\ -2 & 1 & 1 \end{bmatrix}.$$

Solution A scalar λ is an eigenvalue of A if and only if A − λI is singular. According to the singularity test in Eq. (1), λ is an eigenvalue of A if and only if λ is a scalar such that

det(A − λI) = 0.

Thus we focus on det(A − λI), where A − λI is the matrix given by

$$A - \lambda I = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 3 & 3 \\ -2 & 1 & 1 \end{bmatrix} - \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix} = \begin{bmatrix} 1-\lambda & 1 & 1 \\ 0 & 3-\lambda & 3 \\ -2 & 1 & 1-\lambda \end{bmatrix}.$$


Expanding det(A − λI) along the first column, we have

$$\begin{aligned}
\det(A - \lambda I) &= \begin{vmatrix} 1-\lambda & 1 & 1 \\ 0 & 3-\lambda & 3 \\ -2 & 1 & 1-\lambda \end{vmatrix} \\
&= (1-\lambda)\begin{vmatrix} 3-\lambda & 3 \\ 1 & 1-\lambda \end{vmatrix} - (0)\begin{vmatrix} 1 & 1 \\ 1 & 1-\lambda \end{vmatrix} + (-2)\begin{vmatrix} 1 & 1 \\ 3-\lambda & 3 \end{vmatrix} \\
&= (1-\lambda)[(3-\lambda)(1-\lambda) - 3] - 2[3 - (3-\lambda)] \\
&= (1-\lambda)[\lambda^2 - 4\lambda] - 2[\lambda] \\
&= [-\lambda^3 + 5\lambda^2 - 4\lambda] - [2\lambda] \\
&= -\lambda^3 + 5\lambda^2 - 6\lambda \\
&= -\lambda(\lambda^2 - 5\lambda + 6) \\
&= -\lambda(\lambda - 3)(\lambda - 2).
\end{aligned}$$

From the singularity test in Eq. (1), we see that A − λI is singular if and only if λ = 0, λ = 3, or λ = 2.

The ideas developed in Example 1 will be formalized in the next subsection.
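As a quick numerical check of Example 1, one can hand the matrix to an eigenvalue routine. The following is a sketch in Python with NumPy (software the text itself does not use; the setup is ours):

```python
import numpy as np

# The matrix A of Example 1.
A = np.array([[ 1, 1, 1],
              [ 0, 3, 3],
              [-2, 1, 1]], dtype=float)

# np.linalg.eig returns the eigenvalues (the eigenvectors are ignored here).
eigenvalues, _ = np.linalg.eig(A)
print(np.sort(eigenvalues.real))   # approximately [0. 2. 3.]
```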

The Characteristic Polynomial

From the singularity condition given in Eq. (1), we know that A − λI is singular if and only if det(A − λI) = 0. In Example 1, for a (3 × 3) matrix A, we saw that the expression det(A − λI) was a polynomial of degree 3 in λ. In general, it can be shown that det(A − λI) is a polynomial of degree n in λ when A is (n × n). Then, since A − λI is singular if and only if det(A − λI) = 0, it follows that the eigenvalues of A are precisely the zeros of the polynomial det(A − λI).

To avoid any possible confusion between the eigenvalues λ of A and the problem of finding the zeros of this associated polynomial (called the characteristic polynomial of A), we will use the variable t instead of λ in the characteristic polynomial and write p(t) = det(A − tI). To summarize this discussion, we give Theorems 9 and 10.

Theorem 9 Let A be an (n × n) matrix. Then det(A − tI) is a polynomial of degree n in t.

The proof of Theorem 9 is somewhat tedious, and we omit it. The fact that det(A − tI) is a polynomial leads us to the next definition.


Definition 5 Let A be an (n × n) matrix. The nth-degree polynomial, p(t), given by

p(t) = det(A − tI)

is called the characteristic polynomial for A.

Again, in the context of the singularity test in Eq. (1), the roots of p(t) = 0 are the eigenvalues of A. This observation is stated formally in the next theorem.

Theorem 10 Let A be an (n × n) matrix, and let p be the characteristic polynomial for A. Then the eigenvalues of A are precisely the roots of p(t) = 0.

Theorem 10 has the effect of replacing the original problem—determining values λ for which A − λI is singular—by an equivalent problem, finding the roots of a polynomial equation p(t) = 0. Since polynomials are familiar and an immense amount of theoretical and computational machinery has been developed for solving polynomial equations, we should feel more comfortable with the eigenvalue problem.

The equation p(t) = 0 that must be solved to find the eigenvalues of A is called the characteristic equation. Suppose that p(t) has degree n, where n ≥ 1. Then the equation p(t) = 0 can have no more than n distinct roots. From this fact, it follows that:

(a) An (n × n) matrix can have no more than n distinct eigenvalues.

Also, by the fundamental theorem of algebra, the equation p(t) = 0 always has at least one root (possibly complex). Therefore:

(b) An (n × n) matrix always has at least one eigenvalue (possibly complex).

Finally, we recall that any nth-degree polynomial p(t) can be written in the factored form

p(t) = a(t − r₁)(t − r₂) · · · (t − rₙ).

The zeros of p, namely r₁, r₂, . . . , rₙ, need not be distinct or real. The number of times the factor (t − r) appears in the factorization of p(t) given above is called the algebraic multiplicity of r.

Example 2 Find the characteristic polynomial and the eigenvalues for the (2 × 2) matrix

$$A = \begin{bmatrix} 1 & 5 \\ 3 & 3 \end{bmatrix}.$$

Solution By Definition 5, the characteristic polynomial is found by calculating p(t) = det(A − tI), or

$$p(t) = \begin{vmatrix} 1-t & 5 \\ 3 & 3-t \end{vmatrix} = (1-t)(3-t) - 15 = t^2 - 4t - 12 = (t-6)(t+2).$$


THE FUNDAMENTAL THEOREM OF ALGEBRA  The eigenvalues of an (n × n) matrix A are the zeros of p(t) = det(A − tI), a polynomial of degree n. The fundamental theorem of algebra states that the equation p(t) = 0 has a solution, r₁, in the field of complex numbers. Since q(t) = p(t)/(t − r₁) is a polynomial of degree n − 1, repeated use of this result allows us to write

p(t) = a(t − r₁)(t − r₂) · · · (t − rₙ).

A number of famous mathematicians (including Newton, Euler, d'Alembert, and Lagrange) attempted proofs of the fundamental theorem. In 1799, Gauss critiqued these attempts and presented a proof of his own. He admitted that his proof contained an unestablished assertion, but he stated that its validity could not be doubted. Gauss gave three more proofs in his lifetime, but all suffered from an imperfect understanding of the concept of continuity and the structure of the complex number system. These properties were established in 1874 by Weierstrass and not only made the proofs by Gauss rigorous, but a 1746 proof due to d'Alembert as well.

By Theorem 10, the eigenvalues of A are the roots of p(t) = 0; thus the eigenvalues are λ = 6 and λ = −2.
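The same computation can be scripted. As a hedged sketch (Python/NumPy, our notation): np.poly(A) returns the coefficients of the monic polynomial det(tI − A), which differs from det(A − tI) only by the factor (−1)ⁿ, so its roots are still the eigenvalues.

```python
import numpy as np

A = np.array([[1, 5],
              [3, 3]], dtype=float)

coeffs = np.poly(A)        # [1., -4., -12.]  <->  t^2 - 4t - 12
print(np.roots(coeffs))    # approximately [ 6., -2.]
```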

Example 3 Find the characteristic polynomial and the eigenvalues for the (2 × 2) matrix

$$A = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}.$$

Solution The characteristic polynomial is

$$p(t) = \begin{vmatrix} 2-t & -1 \\ 1 & 2-t \end{vmatrix} = t^2 - 4t + 5.$$

By the quadratic formula, the eigenvalues are λ = 2 + i and λ = 2 − i. Therefore, this example illustrates that a matrix with real entries can have eigenvalues that are complex. In Section 4.6, we discuss complex eigenvalues and eigenvectors at length.

Example 4 Find the characteristic polynomial and the eigenvalues for the (3 × 3) matrix

$$A = \begin{bmatrix} 3 & -1 & -1 \\ -12 & 0 & 5 \\ 4 & -2 & -1 \end{bmatrix}.$$

Solution By Definition 5, the characteristic polynomial is given by p(t) = det(A − tI), or

$$p(t) = \begin{vmatrix} 3-t & -1 & -1 \\ -12 & -t & 5 \\ 4 & -2 & -1-t \end{vmatrix}.$$


Expanding along the first column, we have

$$\begin{aligned}
p(t) &= (3-t)\begin{vmatrix} -t & 5 \\ -2 & -1-t \end{vmatrix} + 12\begin{vmatrix} -1 & -1 \\ -2 & -1-t \end{vmatrix} + 4\begin{vmatrix} -1 & -1 \\ -t & 5 \end{vmatrix} \\
&= (3-t)[t(1+t) + 10] + 12[(1+t) - 2] + 4[-5 - t] \\
&= (3-t)[t^2 + t + 10] + 12[t - 1] + 4[-t - 5] \\
&= [-t^3 + 2t^2 - 7t + 30] + [12t - 12] + [-4t - 20] \\
&= -t^3 + 2t^2 + t - 2.
\end{aligned}$$

By Theorem 10, the eigenvalues of A are the roots of p(t) = 0. We can write p(t) as

p(t) = −(t − 2)(t − 1)(t + 1),

and thus the eigenvalues of A are λ = 2, λ = 1, and λ = −1.

(Note: Finding or approximating the roots of a polynomial equation is a task that is generally best left to the computer. Therefore, so that the theory associated with the eigenvalue problem is not hidden by a mass of computational details, the examples and exercises in this chapter will usually be constructed so that the characteristic equation has integer roots.)

Special Results

If we know the eigenvalues of a matrix A, then we also know the eigenvalues of certain matrices associated with A. A list of such results is found in Theorems 11 and 12.

Theorem 11 Let A be an (n × n) matrix, and let λ be an eigenvalue of A. Then:

(a) λᵏ is an eigenvalue of Aᵏ, k = 2, 3, . . . .
(b) If A is nonsingular, then 1/λ is an eigenvalue of A⁻¹.
(c) If α is any scalar, then λ + α is an eigenvalue of A + αI.

Proof Property (a) is proved by induction, and we begin with the case k = 2. Suppose that λ is an eigenvalue of A with an associated eigenvector, x. That is,

Ax = λx, x ≠ θ. (2)

Multiplying both sides of Eq. (2) by the matrix A gives

A(Ax) = A(λx)
A²x = λ(Ax)
A²x = λ(λx)
A²x = λ²x, x ≠ θ.

Thus λ² is an eigenvalue of A² with a corresponding eigenvector, x. In the exercises the reader is asked to finish the proof of property (a) and prove properties (b) and (c) of Theorem 11. (Note: As the proof of Theorem 11 will demonstrate, if x is any eigenvector of A, then x is also an eigenvector of Aᵏ, A⁻¹, and A + αI.)


Example 5 Let A be the (3 × 3) matrix in Example 4. Determine the eigenvalues of A⁵, A⁻¹, and A + 2I.

Solution From Example 4, the eigenvalues of A are λ = 2, λ = 1, and λ = −1. By Theorem 11, A⁵ has eigenvalues λ = 2⁵ = 32, λ = 1⁵ = 1, and λ = (−1)⁵ = −1. Since A⁵ is a (3 × 3) matrix and can have no more than three eigenvalues, those eigenvalues must be 32, 1, and −1.

Similarly, the eigenvalues of A⁻¹ are λ = 1/2, λ = 1, and λ = −1. The eigenvalues of A + 2I are λ = 4, λ = 3, and λ = 1.
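Theorem 11 is easy to test numerically; a sketch (Python/NumPy, with the matrix of Example 4):

```python
import numpy as np

A = np.array([[  3, -1, -1],
              [-12,  0,  5],
              [  4, -2, -1]], dtype=float)

spectrum = lambda M: np.sort(np.linalg.eigvals(M).real)
print(spectrum(A))                             # [-1.  1.  2.]
print(spectrum(np.linalg.matrix_power(A, 5)))  # [-1.  1. 32.]
print(spectrum(np.linalg.inv(A)))              # [-1.  0.5  1.]
print(spectrum(A + 2 * np.eye(3)))             # [ 1.  3.  4.]
```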

The proof of the next theorem rests on the following fact (see Section 1.7): If B is a square matrix, then both B and B^T are nonsingular or both B and B^T are singular. (See also Exercise 30.)

Theorem 12 Let A be an (n × n) matrix. Then A and A^T have the same eigenvalues.

Proof Observe that (A − λI)^T = A^T − λI. By our earlier remark, A − λI and (A − λI)^T are either both singular or both nonsingular. Thus λ is an eigenvalue of A if and only if λ is an eigenvalue of A^T.

The next result follows immediately from the definition of an eigenvalue. We write the result as a theorem because it provides another important characterization of singularity.

Theorem 13 Let A be an (n × n) matrix. Then A is singular if and only if λ = 0 is an eigenvalue of A.

(Note: If A is singular, then the eigenvectors corresponding to λ = 0 are in the null space of A.)

Our final theorem treats a class of matrices for which eigenvalues can be determined by inspection.

Theorem 14 Let T = (tᵢⱼ) be an (n × n) triangular matrix. Then the eigenvalues of T are the diagonal entries, t₁₁, t₂₂, . . . , tₙₙ.

Proof Since T is triangular, the matrix T − tI is also triangular. The diagonal entries of T − tI are t₁₁ − t, t₂₂ − t, . . . , tₙₙ − t. Thus, by Theorem 4, the characteristic polynomial is given by

p(t) = det(T − tI) = (t₁₁ − t)(t₂₂ − t) · · · (tₙₙ − t).

By Theorem 10, the eigenvalues are λ = t₁₁, λ = t₂₂, . . . , λ = tₙₙ.

Example 6 Find the characteristic polynomial and the eigenvalues for the matrix A given by

$$A = \begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 3 & -1 & 1 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 3 \end{bmatrix}.$$


Solution By Theorem 4, p(t) = det(A − tI) has the form

p(t) = (1 − t)(3 − t)²(2 − t).

The eigenvalues are λ = 1, λ = 2, and λ = 3. The eigenvalues λ = 1 and λ = 2 have algebraic multiplicity 1, whereas the eigenvalue λ = 3 has algebraic multiplicity 2.
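Numerically, Theorem 14 is visible at once: for a triangular matrix, an eigenvalue routine returns (up to roundoff) just the diagonal entries. A sketch (Python/NumPy):

```python
import numpy as np

A = np.array([[1, 2,  1, 0],
              [0, 3, -1, 1],
              [0, 0,  2, 1],
              [0, 0,  0, 3]], dtype=float)

print(np.sort(np.linalg.eigvals(A).real))  # [1. 2. 3. 3.]
print(np.sort(np.diag(A)))                 # the same values, by Theorem 14
```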

Computational Considerations

In all the examples we have considered so far, it was possible to factor the characteristic polynomial and thus determine the eigenvalues by inspection. In reality we can rarely expect to be able to factor the characteristic polynomial; so we must solve the characteristic equation by using numerical root-finding methods. To be more specific about root finding, we recall that there are formulas for the roots of some polynomial equations. For instance, the solution of the linear equation

at + b = 0, a ≠ 0,

is given by

t = −b/a;

and the roots of the quadratic equation

at² + bt + c = 0, a ≠ 0,

are given by the familiar quadratic formula

$$t = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

There are similar (although more complicated) formulas for the roots of third-degree and fourth-degree polynomial equations. Unfortunately there are no such formulas for polynomials of degree 5 or higher [that is, formulas that express the zeros of p(t) as a simple function of the coefficients of p(t)]. Moreover, in the early nineteenth century Abel proved that such formulas cannot exist for polynomials of degree 5 or higher.* This means that in general we cannot expect to find the eigenvalues of a large matrix exactly—the best we can do is to find good approximations to the eigenvalues. The eigenvalue problem differs qualitatively from the problem of solving Ax = b. For a system Ax = b, if we are willing to invest the effort required to solve the system by hand, we can obtain the exact solution in a finite number of steps. On the other hand, we cannot in general expect to find roots of a polynomial equation in a finite number of steps.

Finding roots of the characteristic equation is not the only computational aspect of the eigenvalue problem that must be considered. In fact, it is not hard to see that special techniques must be developed even to find the characteristic polynomial. To see the dimensions of this problem, consider the characteristic polynomial of an (n × n) matrix A: p(t) = det(A − tI). The evaluation of p(t) from a cofactor expansion of det(A − tI)


ultimately requires the evaluation of n!/2 determinants of order (2 × 2). Even for modest values of n, the number n!/2 is alarmingly large. For instance,

10!/2 = 1,814,400,

whereas

20!/2 > 1.2 × 10¹⁸.

*For a historical discussion, see J. E. Maxfield and M. W. Maxfield, Abstract Algebra and Solution by Radicals (Dover, 1992).

The enormous number of calculations required to compute det(A − tI) means that we cannot find p(t) in any practical sense by expanding det(A − tI). In Chapter 6, we note that there are relatively efficient ways of finding det(A), but these techniques (which amount to using elementary row operations to triangularize A) are not useful in our problem of computing det(A − tI) because of the variable t. In Section 7.3, we resolve this difficulty by using similarity transformations to transform A to a matrix H, where A and H have the same characteristic polynomial, and where it is a trivial matter to calculate the characteristic polynomial for H. Moreover, these transformation methods will give us some other important results as a by-product, results such as the Cayley–Hamilton theorem, which have some practical computational significance.
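In practice, then, one never expands det(A − tI). A library routine computes the eigenvalues directly through similarity transformations of the kind previewed above. A sketch (Python/NumPy; the random matrix is ours, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))  # for n = 20, a cofactor expansion would need ~20!/2 (2x2) determinants

# eigvals reduces A to Hessenberg form and applies the QR iteration;
# the characteristic polynomial itself is never formed.
print(np.linalg.eigvals(A))
```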

4.4 EXERCISES

In Exercises 1–14, find the characteristic polynomial and the eigenvalues for the given matrix. Also, give the algebraic multiplicity of each eigenvalue. [Note: In each case the eigenvalues are integers.]

1. $\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$   2. $\begin{bmatrix} 2 & 1 \\ 0 & -1 \end{bmatrix}$   3. $\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$   4. $\begin{bmatrix} 13 & -16 \\ 9 & -11 \end{bmatrix}$

5. $\begin{bmatrix} 1 & -1 \\ 1 & 3 \end{bmatrix}$   6. $\begin{bmatrix} 2 & 2 \\ 3 & 3 \end{bmatrix}$   7. $\begin{bmatrix} -6 & -1 & 2 \\ 3 & 2 & 0 \\ -14 & -2 & 5 \end{bmatrix}$   8. $\begin{bmatrix} -2 & -1 & 0 \\ 0 & 1 & 1 \\ -2 & -2 & -1 \end{bmatrix}$

9. $\begin{bmatrix} 3 & -1 & -1 \\ -12 & 0 & 5 \\ 4 & -2 & -1 \end{bmatrix}$   10. $\begin{bmatrix} -7 & 4 & -3 \\ 8 & -3 & 3 \\ 32 & -16 & 13 \end{bmatrix}$   11. $\begin{bmatrix} 2 & 4 & 4 \\ 0 & 1 & -1 \\ 0 & 1 & 3 \end{bmatrix}$

12. $\begin{bmatrix} 6 & 4 & 4 & 1 \\ 4 & 6 & 1 & 4 \\ 4 & 1 & 6 & 4 \\ 1 & 4 & 4 & 6 \end{bmatrix}$   13. $\begin{bmatrix} 5 & 4 & 1 & 1 \\ 4 & 5 & 1 & 1 \\ 1 & 1 & 4 & 2 \\ 1 & 1 & 2 & 4 \end{bmatrix}$   14. $\begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}$

15. Prove property (b) of Theorem 11. [Hint: Begin with Ax = λx, x ≠ θ.]

16. Prove property (c) of Theorem 11.

17. Complete the proof of property (a) of Theorem 11.

18. Let q(t) = t³ − 2t² − t + 2; and for any (n × n) matrix H, define the matrix polynomial q(H) by

q(H) = H³ − 2H² − H + 2I,

where I is the (n × n) identity matrix.
a) Prove that if λ is an eigenvalue of H, then the number q(λ) is an eigenvalue of the matrix q(H). [Hint: Suppose that Hx = λx, where x ≠ θ, and use Theorem 11 to evaluate q(H)x.]
b) Use part a) to calculate the eigenvalues of q(A) and q(B), where A and B are from Exercises 7 and 8, respectively.

19. With q(t) as in Exercise 18, verify that q(C) is the zero matrix, where C is from Exercise 9. (Note that q(t) is the characteristic polynomial for C. See Exercises 20–23.)


Exercises 20–23 illustrate the Cayley–Hamilton theorem, which states that if p(t) is the characteristic polynomial for A, then p(A) is the zero matrix. (As in Exercise 18, p(A) is the (n × n) matrix that comes from substituting A for t in p(t).) In Exercises 20–23, verify that p(A) = O for the given matrix A.

20. A in Exercise 3   21. A in Exercise 4

22. A in Exercise 9   23. A in Exercise 13

24. This problem establishes a special case of the Cayley–Hamilton theorem.
a) Prove that if B is a (3 × 3) matrix, and if Bx = θ for every x in R³, then B is the zero matrix. [Hint: Consider Be₁, Be₂, and Be₃.]
b) Suppose that λ₁, λ₂, and λ₃ are the eigenvalues of a (3 × 3) matrix A, and suppose that u₁, u₂, and u₃ are corresponding eigenvectors. Prove that if {u₁, u₂, u₃} is a linearly independent set, and if p(t) is the characteristic polynomial for A, then p(A) is the zero matrix. [Hint: Any vector x in R³ can be expressed as a linear combination of u₁, u₂, and u₃.]

25. Consider the (2 × 2) matrix A given by

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

The characteristic polynomial for A is p(t) = t² − (a + d)t + (ad − bc). Verify the Cayley–Hamilton theorem for (2 × 2) matrices by forming A² and showing that p(A) is the zero matrix.

26. Let A be the (3 × 3) upper-triangular matrix given by

$$A = \begin{bmatrix} a & d & f \\ 0 & b & e \\ 0 & 0 & c \end{bmatrix}.$$

The characteristic polynomial for A is p(t) = −(t − a)(t − b)(t − c). Verify that p(A) has the form p(A) = −(A − aI)(A − bI)(A − cI). [Hint: Expand p(t) and p(A); for instance, (A − bI)(A − cI) = A² − (b + c)A + bcI.] Next, show that p(A) is the zero matrix by forming the product of the matrices A − aI, A − bI, and A − cI. [Hint: Form the product (A − bI)(A − cI) first.]

27. Let q(t) = tⁿ + aₙ₋₁tⁿ⁻¹ + · · · + a₁t + a₀, and define the (n × n) "companion" matrix by

$$A = \begin{bmatrix} -a_{n-1} & -a_{n-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$$

a) For n = 2 and for n = 3, show that det(A − tI) = (−1)ⁿq(t).
b) Give the companion matrix A for the polynomial q(t) = t⁴ + 3t³ − t² + 2t − 2. Verify that q(t) is the characteristic polynomial for A.
c) Prove for all n that det(A − tI) = (−1)ⁿq(t).

28. The power method is a numerical method used to estimate the dominant eigenvalue of a matrix A. (By the dominant eigenvalue, we mean the one that is largest in absolute value.) The algorithm proceeds as follows:
a) Choose any starting vector x₀, x₀ ≠ θ.
b) Let xₖ₊₁ = Axₖ, k = 0, 1, 2, . . . .
c) Let βₖ = xₖᵀxₖ₊₁/xₖᵀxₖ, k = 0, 1, 2, . . . .

Under suitable conditions, it can be shown that {βₖ} → λ₁, where λ₁ is the dominant eigenvalue of A. Use the power method to estimate the dominant eigenvalue of the matrix in Exercise 9. Use the starting vector

$$\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

and calculate β₀, β₁, β₂, β₃, and β₄. (A programmed version of this algorithm is sketched after these exercises.)

29. This exercise gives a condition under which the power method (see Exercise 28) converges. Suppose that A is an (n × n) matrix and has real eigenvalues λ₁, λ₂, . . . , λₙ with corresponding eigenvectors u₁, u₂, . . . , uₙ. Furthermore, suppose that |λ₁| > |λ₂| ≥ · · · ≥ |λₙ|, and the starting vector x₀ satisfies x₀ = c₁u₁ + c₂u₂ + · · · + cₙuₙ, where c₁ ≠ 0. Prove that

limₖ→∞ βₖ = λ₁.

[Hint: Observe that xⱼ = Aʲx₀, j = 1, 2, . . . , and use Theorem 11 to calculate xₖ₊₁ and xₖ. Next, factor all powers of λ₁ from the numerator and denominator of βₖ = xₖᵀxₖ₊₁/xₖᵀxₖ.]


30. Theorem 12 shows that A and A^T have the same eigenvalues. In Theorem 5 of Section 4.3, it was shown that det(A) = det(A^T). Use this result to show that A and A^T have the same characteristic polynomial. [Note: Theorem 12 proves that A − λI and A^T − λI are singular or nonsingular together. This exercise shows that the eigenvalues of A and A^T have the same algebraic multiplicity.]

The characteristic polynomial p(t) = det(A − tI) has the form

p(t) = (−1)ⁿtⁿ + aₙ₋₁tⁿ⁻¹ + · · · + a₁t + a₀.

The coefficients of p(t) can be found by evaluating det(A − tI) at n distinct values of t and solving the resulting Vandermonde system for aₙ₋₁, . . . , a₁, a₀. Employ this technique in Exercises 31–34 to find the characteristic polynomial for the indicated matrix A.

31. A in Exercise 5 32. A in Exercise 6

33. A in Exercise 7 34. A in Exercise 8
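The power method of Exercise 28 is short enough to program in full. The following sketch (Python/NumPy, our construction) applies it to the matrix of Exercise 9 with the starting vector of Exercise 28; by Example 4 of Section 4.4, the dominant eigenvalue is λ₁ = 2, so the βₖ should drift toward 2 (assuming, as in Exercise 29, that c₁ ≠ 0).

```python
import numpy as np

A = np.array([[  3, -1, -1],
              [-12,  0,  5],
              [  4, -2, -1]], dtype=float)
x = np.ones(3)                      # starting vector x0 = (1, 1, 1)^T

for k in range(5):
    x_next = A @ x                  # x_{k+1} = A x_k
    beta = (x @ x_next) / (x @ x)   # beta_k = x_k^T x_{k+1} / x_k^T x_k
    print(f"beta_{k} = {beta:.4f}")
    x = x_next
```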

4.5 EIGENVECTORS AND EIGENSPACES

As we saw in the previous section, we can find the eigenvalues of a matrix A by solving the characteristic equation det(A − tI) = 0. Once we know the eigenvalues, the familiar technique of Gaussian elimination can be employed to find the eigenvectors that correspond to the various eigenvalues.

In particular, the eigenvectors corresponding to an eigenvalue λ of A are the nonzero solutions of

(A − λI)x = θ. (1)

Given a value for λ, the equations in (1) can be solved for x by using Gaussian elimination.

Example 1 Find the eigenvectors that correspond to the eigenvalues of matrix A in Example 1 of Section 4.4.

Solution For matrix A in Example 1, A − λI is the matrix

$$A - \lambda I = \begin{bmatrix} 1-\lambda & 1 & 1 \\ 0 & 3-\lambda & 3 \\ -2 & 1 & 1-\lambda \end{bmatrix}.$$

Also, from Example 1 we know that the eigenvalues of A are given by λ = 0, λ = 2, and λ = 3.

For each eigenvalue λ, we find the eigenvectors that correspond to λ by solving the system (A − λI)x = θ. For the eigenvalue λ = 0, we have (A − 0I)x = θ, or Ax = θ, to solve:

$$\begin{aligned} x_1 + x_2 + x_3 &= 0 \\ 3x_2 + 3x_3 &= 0 \\ -2x_1 + x_2 + x_3 &= 0. \end{aligned}$$

The solution of this system is x₁ = 0, x₂ = −x₃, with x₃ arbitrary. Thus the eigenvectors


of A corresponding to λ = 0 are given by

$$\mathbf{x} = \begin{bmatrix} 0 \\ -a \\ a \end{bmatrix} = a\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}, \quad a \neq 0,$$

and any such vector x satisfies Ax = 0 · x. This equation illustrates that the definition of eigenvalues does allow the possibility that λ = 0 is an eigenvalue. We stress, however, that the zero vector is never considered an eigenvector (after all, Ax = λx is always satisfied for x = θ, no matter what value λ has).

The eigenvectors corresponding to the eigenvalue λ = 3 are found by solving (A − 3I)x = θ:

$$\begin{aligned} -2x_1 + x_2 + x_3 &= 0 \\ 3x_3 &= 0 \\ -2x_1 + x_2 - 2x_3 &= 0. \end{aligned}$$

The solution of this system is x₃ = 0, x₂ = 2x₁, with x₁ arbitrary. Thus the nontrivial solutions of (A − 3I)x = θ (the eigenvectors of A corresponding to λ = 3) all have the form

$$\mathbf{x} = \begin{bmatrix} a \\ 2a \\ 0 \end{bmatrix} = a\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \quad a \neq 0.$$

Finally, the eigenvectors corresponding to λ = 2 are found from (A − 2I)x = θ, and the solution is x₁ = −2x₃, x₂ = −3x₃, with x₃ arbitrary. So the eigenvectors corresponding to λ = 2 are of the form

$$\mathbf{x} = \begin{bmatrix} -2a \\ -3a \\ a \end{bmatrix} = a\begin{bmatrix} -2 \\ -3 \\ 1 \end{bmatrix}, \quad a \neq 0.$$

We pause here to make several comments. As Example 1 shows, there are infinitely many eigenvectors that correspond to a given eigenvalue. This comment should be obvious, for if A − λI is a singular matrix, there are infinitely many nontrivial solutions of (A − λI)x = θ. In particular, if Ax = λx for some nonzero vector x, then we also have Ay = λy when y = ax, with a being any scalar. Thus any nonzero multiple of an eigenvector is again an eigenvector.

Next, we again note that the scalar λ = 0 may be an eigenvalue of a matrix, as Example 1 showed. In fact, from Theorem 13 of Section 4.4 we know that λ = 0 is an eigenvalue of A whenever A is singular.

Last, we observe from Example 1 that finding all the eigenvectors corresponding to λ = 0 is precisely the same as finding the null space of A and then deleting the zero vector, θ. Likewise, the eigenvectors of A corresponding to λ = 2 and λ = 3 are the nonzero vectors in the null space of A − 2I and A − 3I, respectively.
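This null-space description translates directly into software. A sketch (Python, assuming SciPy's null_space helper is available in the reader's environment; the matrix is A from Example 1):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 1, 1, 1],
              [ 0, 3, 3],
              [-2, 1, 1]], dtype=float)

for lam in (0.0, 2.0, 3.0):
    N = null_space(A - lam * np.eye(3))  # basis for the null space of A - lambda*I
    print(lam, N.ravel())                # one unit eigenvector per eigenvalue here
```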


Eigenspaces and Geometric Multiplicity

In the preceding discussion, we made the following observation: If λ is an eigenvalue of A, then the eigenvectors corresponding to λ are precisely the nonzero vectors in the null space of A − λI. It is convenient to formalize this observation.

Definition 6 Let A be an (n × n) matrix. If λ is an eigenvalue of A, then:
(a) The null space of A − λI is denoted by Eλ and is called the eigenspace of λ.
(b) The dimension of Eλ is called the geometric multiplicity of λ.

(Note: Since A − λI is singular, the dimension of Eλ, the geometric multiplicity of λ, is always at least 1 and may be larger. It can be shown that the geometric multiplicity of λ is never larger than the algebraic multiplicity of λ. The next three examples illustrate some of the possibilities.)

Example 2 Determine the algebraic and geometric multiplicities for the eigenvalues of A,

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$

Solution The characteristic polynomial is p(t) = (1 − t)³, and thus the only eigenvalue of A is λ = 1. The eigenvalue λ = 1 has algebraic multiplicity 3.

The eigenspace is found by solving (A − I)x = θ. The system (A − I)x = θ is

$$\begin{aligned} x_2 &= 0 \\ x_3 &= 0. \end{aligned}$$

Thus x is in the eigenspace Eλ corresponding to λ = 1 if and only if x has the form

$$\mathbf{x} = \begin{bmatrix} x_1 \\ 0 \\ 0 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}. \quad (2)$$

The geometric multiplicity of the eigenvalue λ = 1 is 1, and x is an eigenvector if x has the form (2) with x₁ ≠ 0.

Example 3 Determine the algebraic and geometric multiplicities for the eigenvalues of B,

$$B = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$


Solution The characteristic polynomial is p(t) = (1 − t)³, so λ = 1 is the only eigenvalue, and it has algebraic multiplicity 3.

The corresponding eigenspace is found by solving (B − I)x = θ. Now (B − I)x = θ if and only if x has the form

$$\mathbf{x} = \begin{bmatrix} x_1 \\ 0 \\ x_3 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}. \quad (3)$$

By (3), the eigenspace has dimension 2, and so the eigenvalue λ = 1 has geometric multiplicity 2. The eigenvectors of B are the nonzero vectors of the form (3).

Example 4 Determine the algebraic and geometric multiplicities for the eigenvalues of C,

$$C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Solution The characteristic polynomial is p(t) = (1 − t)³, so λ = 1 has algebraic multiplicity 3. The eigenspace is found by solving (C − I)x = θ, and since C − I is the zero matrix, every vector in R³ is in the null space of C − I. The geometric multiplicity of the eigenvalue λ = 1 is equal to 3.

(Note: The matrices in Examples 2, 3, and 4 all have the same characteristic polynomial, p(t) = (1 − t)³. However, the respective eigenspaces are different.)
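Because Eλ is the null space of A − λI, its dimension is n − rank(A − λI), which gives a mechanical way to find geometric multiplicities. A sketch (Python/NumPy, our helper function) applied to the three matrices above:

```python
import numpy as np

def geometric_multiplicity(M, lam):
    """Dimension of the eigenspace: n - rank(M - lam*I)."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n))

A = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
B = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
C = np.eye(3)

print([geometric_multiplicity(M, 1.0) for M in (A, B, C)])  # [1, 2, 3]
```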

Defective Matrices

For applications (such as diagonalization) it will be important to know whether an (n × n) matrix A has a set of n linearly independent eigenvectors. As we will see later, if A is an (n × n) matrix and if some eigenvalue of A has a geometric multiplicity that is less than its algebraic multiplicity, then A will not have a set of n linearly independent eigenvectors. Such a matrix is called defective.

Definition 7 Let A be an (n × n) matrix. If there is an eigenvalue λ of A such that the geometric multiplicity of λ is less than the algebraic multiplicity of λ, then A is called a defective matrix.

Note that the matrices in Examples 1 and 4 are not defective. The matrices in Examples 2 and 3 are defective. Example 5 provides another instance of a defective matrix.


Example 5 Find all the eigenvalues and eigenvectors of the matrix A:

$$A = \begin{bmatrix} -4 & 1 & 1 & 1 \\ -16 & 3 & 4 & 4 \\ -7 & 2 & 2 & 1 \\ -11 & 1 & 3 & 4 \end{bmatrix}.$$

Also, determine the algebraic and geometric multiplicities of the eigenvalues.

Solution Omitting the details, a cofactor expansion yields

$$\det(A - tI) = t^4 - 5t^3 + 9t^2 - 7t + 2 = (t-1)^3(t-2).$$

Hence the eigenvalues are λ = 1 (algebraic multiplicity 3) and λ = 2 (algebraic multiplicity 1).

In solving (A − 2I)x = θ, we reduce the augmented matrix [A − 2I | θ] as follows, multiplying rows 2, 3, and 4 by constants to avoid working with fractions:

$$[A - 2I \mid \theta] = \begin{bmatrix} -6 & 1 & 1 & 1 & 0 \\ -16 & 1 & 4 & 4 & 0 \\ -7 & 2 & 0 & 1 & 0 \\ -11 & 1 & 3 & 2 & 0 \end{bmatrix} \sim \begin{bmatrix} -6 & 1 & 1 & 1 & 0 \\ -48 & 3 & 12 & 12 & 0 \\ -42 & 12 & 0 & 6 & 0 \\ -66 & 6 & 18 & 12 & 0 \end{bmatrix}$$

$$\sim \begin{bmatrix} -6 & 1 & 1 & 1 & 0 \\ 0 & -5 & 4 & 4 & 0 \\ 0 & 5 & -7 & -1 & 0 \\ 0 & -5 & 7 & 1 & 0 \end{bmatrix} \sim \begin{bmatrix} -6 & 1 & 1 & 1 & 0 \\ 0 & -5 & 4 & 4 & 0 \\ 0 & 0 & -3 & 3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Backsolving yields x₁ = 3x₄/5, x₂ = 8x₄/5, x₃ = x₄. Hence x is an eigenvector corresponding to λ = 2 only if

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 3x_4/5 \\ 8x_4/5 \\ x_4 \\ x_4 \end{bmatrix} = \frac{x_4}{5}\begin{bmatrix} 3 \\ 8 \\ 5 \\ 5 \end{bmatrix}, \quad x_4 \neq 0.$$

Thus the algebraic and geometric multiplicities of the eigenvalue λ = 2 are equal to 1.


In solving (A − I)x = θ, we reduce the augmented matrix [A − I | θ]:

$$[A - I \mid \theta] = \begin{bmatrix} -5 & 1 & 1 & 1 & 0 \\ -16 & 2 & 4 & 4 & 0 \\ -7 & 2 & 1 & 1 & 0 \\ -11 & 1 & 3 & 3 & 0 \end{bmatrix} \sim \begin{bmatrix} -5 & 1 & 1 & 1 & 0 \\ -80 & 10 & 20 & 20 & 0 \\ -35 & 10 & 5 & 5 & 0 \\ -55 & 5 & 15 & 15 & 0 \end{bmatrix}$$

$$\sim \begin{bmatrix} -5 & 1 & 1 & 1 & 0 \\ 0 & -6 & 4 & 4 & 0 \\ 0 & 3 & -2 & -2 & 0 \\ 0 & -6 & 4 & 4 & 0 \end{bmatrix} \sim \begin{bmatrix} -5 & 1 & 1 & 1 & 0 \\ 0 & 3 & -2 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Backsolving yields x₁ = (x₃ + x₄)/3 and x₂ = 2(x₃ + x₄)/3. Thus x is an eigenvector corresponding to λ = 1 only if x is a nonzero vector of the form

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} (x_3+x_4)/3 \\ 2(x_3+x_4)/3 \\ x_3 \\ x_4 \end{bmatrix} = \frac{x_3}{3}\begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix} + \frac{x_4}{3}\begin{bmatrix} 1 \\ 2 \\ 0 \\ 3 \end{bmatrix}. \quad (4)$$

By (4), the eigenspace Eλ corresponding to λ = 1 has a basis consisting of the vectors

$$\begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 \\ 2 \\ 0 \\ 3 \end{bmatrix}.$$

Since Eλ has dimension 2, the eigenvalue λ = 1 has geometric multiplicity 2 and algebraic multiplicity 3. (Matrix A is defective.)
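The multiplicities in Example 5 can be confirmed with the same rank computation used above (a Python/NumPy sketch, our construction):

```python
import numpy as np

A = np.array([[ -4, 1, 1, 1],
              [-16, 3, 4, 4],
              [ -7, 2, 2, 1],
              [-11, 1, 3, 4]], dtype=float)

for lam in (1.0, 2.0):
    geo = 4 - np.linalg.matrix_rank(A - lam * np.eye(4))
    print(lam, geo)   # lambda = 1: geometric multiplicity 2; lambda = 2: 1
```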

The next theorem shows that a matrix can be defective only if it has repeated eigenvalues. (As shown in Example 4, however, repeated eigenvalues do not necessarily mean that a matrix is defective.)

Theorem 15 Let u₁, u₂, . . . , uₖ be eigenvectors of an (n × n) matrix A corresponding to distinct eigenvalues λ₁, λ₂, . . . , λₖ. That is,

Auⱼ = λⱼuⱼ for j = 1, 2, . . . , k; k ≤ n (5)

λᵢ ≠ λⱼ for i ≠ j; 1 ≤ i, j ≤ k. (6)

Then {u₁, u₂, . . . , uₖ} is a linearly independent set.


Proof Since u₁ ≠ θ, the set {u₁} is trivially linearly independent. If the set {u₁, u₂, . . . , uₖ} were linearly dependent, then there would exist an integer m, 2 ≤ m ≤ k, such that:

(a) S₁ = {u₁, u₂, . . . , uₘ₋₁} is linearly independent.
(b) S₂ = {u₁, u₂, . . . , uₘ₋₁, uₘ} is linearly dependent.

Now since S₂ is linearly dependent, there exist scalars c₁, c₂, . . . , cₘ (not all zero) such that

c₁u₁ + c₂u₂ + · · · + cₘ₋₁uₘ₋₁ + cₘuₘ = θ. (7)

Furthermore, cₘ in Eq. (7) cannot be zero. (If cₘ = 0, then Eq. (7) would imply that S₁ is linearly dependent, contradicting (a).)

Multiplying both sides of Eq. (7) by A and using Auⱼ = λⱼuⱼ, we obtain

c₁λ₁u₁ + c₂λ₂u₂ + · · · + cₘ₋₁λₘ₋₁uₘ₋₁ + cₘλₘuₘ = θ. (8)

Multiplying both sides of Eq. (7) by λₘ yields

c₁λₘu₁ + c₂λₘu₂ + · · · + cₘ₋₁λₘuₘ₋₁ + cₘλₘuₘ = θ. (9)

Subtracting Eq. (8) from Eq. (9), we find that

c₁(λₘ − λ₁)u₁ + c₂(λₘ − λ₂)u₂ + · · · + cₘ₋₁(λₘ − λₘ₋₁)uₘ₋₁ = θ. (10)

If we set βⱼ = cⱼ(λₘ − λⱼ), 1 ≤ j ≤ m − 1, Eq. (10) becomes

β₁u₁ + β₂u₂ + · · · + βₘ₋₁uₘ₋₁ = θ.

Since S₁ is linearly independent, it then follows that

β₁ = β₂ = · · · = βₘ₋₁ = 0,

or

cⱼ(λₘ − λⱼ) = 0, for j = 1, 2, . . . , m − 1.

Because λₘ ≠ λⱼ for j ≠ m, we must have cⱼ = 0 for 1 ≤ j ≤ m − 1.

Finally (see Eq. 7), if cⱼ = 0 for 1 ≤ j ≤ m − 1, then cₘuₘ = θ. Since cₘ ≠ 0, it follows that uₘ = θ. But uₘ is an eigenvector, and so uₘ ≠ θ. Hence we have contradicted the assumption that there is an m, m ≤ k, such that S₂ is linearly dependent. Thus {u₁, u₂, . . . , uₖ} is linearly independent.

An important and useful corollary to Theorem 15 is given next.

Corollary Let A be an (n × n) matrix. If A has n distinct eigenvalues, then A has a set of n linearly independent eigenvectors.


4.5 EXERCISES

The following list of matrices and their respective characteristic polynomials is referred to in Exercises 1–11.

$$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \quad p(t) = (t-3)(t-1); \qquad B = \begin{bmatrix} 1 & -1 \\ 1 & 3 \end{bmatrix}, \quad p(t) = (t-2)^2;$$

$$C = \begin{bmatrix} -6 & -1 & 2 \\ 3 & 2 & 0 \\ -14 & -2 & 5 \end{bmatrix}, \quad p(t) = -(t-1)^2(t+1); \qquad D = \begin{bmatrix} -7 & 4 & -3 \\ 8 & -3 & 3 \\ 32 & -16 & 13 \end{bmatrix}, \quad p(t) = -(t-1)^3;$$

$$E = \begin{bmatrix} 6 & 4 & 4 & 1 \\ 4 & 6 & 1 & 4 \\ 4 & 1 & 6 & 4 \\ 1 & 4 & 4 & 6 \end{bmatrix}, \quad p(t) = (t+1)(t-5)^2(t-15); \qquad F = \begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}, \quad p(t) = (t+2)(t-2)^3.$$

In Exercises 1–11, find a basis for the eigenspace Eλ for the given matrix and the value of λ. Determine the algebraic and geometric multiplicities of λ.

1. A, λ = 3   2. A, λ = 1   3. B, λ = 2
4. C, λ = 1   5. C, λ = −1   6. D, λ = 1
7. E, λ = −1   8. E, λ = 5   9. E, λ = 15
10. F, λ = −2   11. F, λ = 2

In Exercises 12–17, find the eigenvalues and the eigenvectors for the given matrix. Is the matrix defective?

12. $\begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & 1 \end{bmatrix}$   13. $\begin{bmatrix} 2 & 1 & 2 \\ 0 & 3 & 2 \\ 0 & 0 & 2 \end{bmatrix}$   14. $\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$

15. $\begin{bmatrix} 2 & 0 & 3 \\ 0 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix}$   16. $\begin{bmatrix} -1 & 6 & 2 \\ 0 & 5 & -6 \\ 1 & 0 & -2 \end{bmatrix}$   17. $\begin{bmatrix} 3 & -1 & -1 \\ -12 & 0 & 5 \\ 4 & -2 & -1 \end{bmatrix}$

18. If a vector x is a linear combination of eigenvectors of a matrix A, then it is easy to calculate the product y = Aᵏx for any positive integer k. For instance, suppose that Au₁ = λ₁u₁ and Au₂ = λ₂u₂, where u₁ and u₂ are nonzero vectors. If x = a₁u₁ + a₂u₂, then (see Theorem 11 of Section 4.4) y = Aᵏx = Aᵏ(a₁u₁ + a₂u₂) = a₁Aᵏu₁ + a₂Aᵏu₂ = a₁(λ₁)ᵏu₁ + a₂(λ₂)ᵏu₂. Find A¹⁰x, where

$$A = \begin{bmatrix} 4 & -2 \\ 5 & -3 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} 0 \\ 9 \end{bmatrix}.$$

19. As in Exercise 18, calculate A¹⁰x for

$$A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 5 & -2 \\ 0 & 6 & -2 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} 2 \\ 4 \\ 7 \end{bmatrix}.$$

20. Consider a (4 × 4) matrix H of the form

$$H = \begin{bmatrix} \times & \times & \times & \times \\ a & \times & \times & \times \\ 0 & b & \times & \times \\ 0 & 0 & c & \times \end{bmatrix}. \quad (11)$$

In matrix (11) the entries designated × may be zero or nonzero. Suppose, in matrix (11), that a, b, and c are nonzero. Let λ be any eigenvalue of H. Show that the geometric multiplicity of λ is equal to 1. [Hint: Verify that the rank of H − λI is exactly equal to 3.]

21. An (n × n) matrix P is called idempotent if P² = P. Show that if P is an invertible idempotent matrix, then P = I.

22. Let P be an idempotent matrix. Show that the only eigenvalues of P are λ = 0 and λ = 1. [Hint: Suppose that Px = λx, x ≠ θ.]

23. Let u be a vector in Rⁿ such that uᵀu = 1. Show that the (n × n) matrix P = uuᵀ is an idempotent matrix. [Hint: Use the associative properties of matrix multiplication.]

24. Verify that if Q is idempotent, then so is I − Q. Also verify that (I − 2Q)⁻¹ = I − 2Q.

25. Suppose that u and v are vectors in Rⁿ such that uᵀu = 1, vᵀv = 1, and uᵀv = 0. Show that P = uuᵀ + vvᵀ is idempotent.

26. Show that any nonzero vector of the form au + bv is an eigenvector corresponding to λ = 1 for the matrix P in Exercise 25.


27. Let A be an (n × n) symmetric matrix, with (real) distinct eigenvalues λ₁, λ₂, . . . , λₙ. Let the corresponding eigenvectors u₁, u₂, . . . , uₙ be chosen so that ‖uᵢ‖ = 1 (that is, uᵢᵀuᵢ = 1). Exercise 29 shows that A can be decomposed as

A = λ₁u₁u₁ᵀ + λ₂u₂u₂ᵀ + · · · + λₙuₙuₙᵀ. (12)

Verify decomposition (12) for each of the following matrices.

a) $B = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$   b) $C = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$   c) $D = \begin{bmatrix} 3 & 2 \\ 2 & 0 \end{bmatrix}$

28. Let A be a symmetric matrix and suppose that Au = λu, u ≠ θ and Av = βv, v ≠ θ. Also suppose that λ ≠ β. Show that uᵀv = 0. [Hint: Since Av and u are vectors, (Av)ᵀu = uᵀ(Av). Rewrite the term (Av)ᵀu by using Theorem 10, property 2, of Section 1.6.]

29. Having A as in Exercise 27, we see from Exercise 28 that uᵢᵀuⱼ = 0, i ≠ j. By the corollary to Theorem 15, {u₁, u₂, . . . , uₙ} is an orthonormal basis for Rⁿ. To show that decomposition (12) is valid, let C denote the right-hand side of (12). Then show that (A − C)uᵢ = θ for 1 ≤ i ≤ n. Finally, show that A − C is the zero matrix. [Hint: Look at Exercise 24 in Section 4.4.]

(Note: We will see in the next section that a real symmetric matrix has only real eigenvalues. It can also be shown that the eigenvectors can be chosen to be orthonormal, even when the eigenvalues are not distinct. Thus decomposition (12) is valid for any real symmetric matrix A.)

4.6 COMPLEX EIGENVALUES AND EIGENVECTORS

Up to now we have not examined in detail the case in which the characteristic equation has complex roots—that is, the case in which a matrix has complex eigenvalues. We will see that the possibility of complex eigenvalues does not pose any additional problems except that the eigenvectors corresponding to complex eigenvalues will have complex components, and complex arithmetic will be required to find these eigenvectors.

Example 1 Find the eigenvalues and the eigenvectors for

$$A = \begin{bmatrix} 3 & 1 \\ -2 & 1 \end{bmatrix}.$$

Solution The characteristic polynomial for A is p(t) = t² − 4t + 5. The eigenvalues of A are the roots of p(t) = 0, which we can find from the quadratic formula,

$$\lambda = \frac{4 \pm \sqrt{-4}}{2} = 2 \pm i,$$

where i = √−1. Thus despite the fact that A is a real matrix, the eigenvalues of A are complex, λ = 2 + i and λ = 2 − i. To find the eigenvectors of A corresponding to λ = 2 + i, we must solve [A − (2 + i)I]x = θ, which leads to the (2 × 2) homogeneous system

$$\begin{aligned} (1-i)x_1 + x_2 &= 0 \\ -2x_1 - (1+i)x_2 &= 0. \end{aligned} \quad (1)$$


COMPLEX NUMBERS  Ancient peoples knew that certain quadratic equations, such as x² + 1 = 0, had no real solutions. This posed no difficulty, however, because their particular problems (such as finding the intersections of a line and a circle) did not require complex solutions. Hence people paid complex numbers little attention and referred to them as imaginary numbers. In 1545, however, Cardano published a formula for finding the roots of a cubic equation that often required algebraic manipulation of √−1 in order to find certain real solutions. Guided by Cardano's formula, Bombelli, in 1572, is credited with working out the algebra of complex numbers. However, the important link to geometry, the association of a + bi with the point (a, b), was not developed for another hundred years.

Probably the two people most influential in developing complex numbers into their essential role in describing scientific phenomena were Leonhard Euler (1707–1783) and Augustin-Louis Cauchy (1789–1857). Besides introducing much of the mathematical notation used today, Euler used complex numbers to unify the study of exponential, logarithmic, and trigonometric functions. Cauchy is regarded as the founder of the field of functions of a complex variable. Many terms and results in the extension of calculus to complex variables are due to Cauchy and are named after him.

At the end of this section, we will discuss the details of how such a system is solved. For the moment, we merely note that if the first equation is multiplied by 1 + i, then Eq. (1) is equivalent to

$$\begin{aligned} 2x_1 + (1+i)x_2 &= 0 \\ -2x_1 - (1+i)x_2 &= 0. \end{aligned}$$

Thus the solutions of Eq. (1) are determined by x₁ = −(1 + i)x₂/2. The nonzero solutions of Eq. (1), the eigenvectors corresponding to λ = 2 + i, are of the form

$$\mathbf{x} = a\begin{bmatrix} 1+i \\ -2 \end{bmatrix}, \quad a \neq 0.$$

Similar calculations show that the eigenvectors of A corresponding to λ = 2 − i are all of the form

$$\mathbf{x} = b\begin{bmatrix} 1-i \\ -2 \end{bmatrix}, \quad b \neq 0.$$

Complex Arithmetic and Complex Vectors

Before giving the major theoretical results of this section, we briefly review several of the details of complex arithmetic. We will usually represent a complex number z in the form z = a + ib, where a and b are real numbers and i² = −1. In the representation z = a + ib, a is called the real part of z, and b is called the imaginary part of z. If z = a + ib and w = c + id, then z + w = (a + c) + i(b + d), whereas zw = (ac − bd) + i(ad + bc). Thus, for example, if z₁ = 2 + 3i and z₂ = 1 − i, then

z₁ + z₂ = 3 + 2i and z₁z₂ = 5 + i.


If z is the complex number z = a + ib, then the conjugate of z (denoted by z̄) is defined to be z̄ = a − ib. We list several properties of the conjugate operation:

$$\overline{z+w} = \bar z + \bar w, \quad \overline{zw} = \bar z\,\bar w, \quad z + \bar z = 2a, \quad z - \bar z = 2ib, \quad z\bar z = a^2 + b^2. \quad (2)$$

From the last equality, we note that zz̄ is a positive real quantity when z ≠ 0. In fact, if we visualize z as the point (a, b) in the coordinate plane (called the complex plane), then √(a² + b²) is the distance from (a, b) to the origin (see Fig. 4.2). Hence we define the magnitude of z to be |z|, where

$$|z| = \sqrt{z\bar z} = \sqrt{a^2 + b^2}.$$

We also note from (2) that if z = z̄, then b = 0 and so z is a real number.

[Figure 4.2: A complex number and its conjugate. The number z = a + ib is plotted as the point (a, b) in the complex plane, and its conjugate z̄ = a − ib as the point (a, −b).]

Example 2 Let z = 4 − 2i and w = 3 + 5i.

(a) Find the values of the real and imaginary parts of w.
(b) Calculate z̄, w̄, and |z|.
(c) Find u = 2z + 3w̄ and v = z̄w.

Solution

(a) Since w = 3 + 5i, the real part of w is 3, and the imaginary part is 5.
(b) For z = 4 − 2i, z̄ = 4 + 2i. Similarly, since w = 3 + 5i, we have w̄ = 3 − 5i. Finally, |z| = √(4² + (−2)²) = √20.
(c) Here, 2z = 2(4 − 2i) = 8 − 4i, whereas 3w̄ = 3(3 − 5i) = 9 − 15i. Therefore,

u = 2z + 3w̄ = (8 − 4i) + (9 − 15i) = 17 − 19i.


The product v = z̄w is calculated as follows:

$$v = \bar z w = (4+2i)(3+5i) = 12 + 6i + 20i + 10i^2 = 2 + 26i.$$
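Most programming languages supply this arithmetic directly. In Python, for instance (a sketch; Python writes i as j), the computations of Example 2 become one-liners:

```python
z = 4 - 2j
w = 3 + 5j

print(z.conjugate(), w.conjugate())   # (4+2j) (3-5j)
print(abs(z))                         # 4.4721... = sqrt(20)
print(2*z + 3*w.conjugate())          # (17-19j)
print(z.conjugate() * w)              # (2+26j)
```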

The conjugate operation is useful when dealing with matrices and vectors that have complex components. We define the conjugate of a vector as follows: If x = [x₁, x₂, . . . , xₙ]ᵀ, then the conjugate vector (denoted by x̄) is given by

$$\bar{\mathbf{x}} = \begin{bmatrix} \bar x_1 \\ \bar x_2 \\ \vdots \\ \bar x_n \end{bmatrix}.$$

In (3), an example of a vector x and its conjugate, x̄, is given:

$$\mathbf{x} = \begin{bmatrix} 2+3i \\ 4 \\ 1-7i \end{bmatrix}, \quad \bar{\mathbf{x}} = \begin{bmatrix} 2-3i \\ 4 \\ 1+7i \end{bmatrix}. \quad (3)$$

The magnitude or norm of a complex vector x (denoted by ‖x‖) is defined in terms of x and x̄:

$$\|\mathbf{x}\| = \sqrt{\bar{\mathbf{x}}^T\mathbf{x}}. \quad (4)$$

With respect to Eq. (4), note that

$$\bar{\mathbf{x}}^T\mathbf{x} = \bar x_1 x_1 + \bar x_2 x_2 + \cdots + \bar x_n x_n = |x_1|^2 + |x_2|^2 + \cdots + |x_n|^2.$$

(If x is a real vector, so that x̄ = x, then the definition for ‖x‖ in Eq. (4) agrees with our earlier definition in Section 1.6.)

As the next example illustrates, the scalar product xᵀy will usually be complex valued if x and y are complex vectors.

Example 3 Find xᵀy, ‖x‖, and ‖y‖ for

$$\mathbf{x} = \begin{bmatrix} 2 \\ 1-i \\ 3+2i \end{bmatrix} \quad\text{and}\quad \mathbf{y} = \begin{bmatrix} i \\ 1+i \\ 2-i \end{bmatrix}.$$

Solution For xᵀy, we find

$$\mathbf{x}^T\mathbf{y} = (2)(i) + (1-i)(1+i) + (3+2i)(2-i) = 2i + 2 + (8+i) = 10 + 3i.$$

Similarly, ‖x‖ = √(x̄ᵀx) = √(4 + 2 + 13) = √19, whereas ‖y‖ = √(ȳᵀy) = √(1 + 2 + 5) = √8.
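In NumPy (a sketch, our notation), x @ y forms the unconjugated product xᵀy of Example 3, while np.vdot conjugates its first argument and so computes x̄ᵀx; np.linalg.norm applies the definition in Eq. (4):

```python
import numpy as np

x = np.array([2, 1 - 1j, 3 + 2j])
y = np.array([1j, 1 + 1j, 2 - 1j])

print(x @ y)                        # (10+3j), the scalar product x^T y
print(np.sqrt(np.vdot(x, x).real))  # sqrt(19) = ||x||
print(np.linalg.norm(y))            # sqrt(8)  = ||y||
```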


Eigenvalues of Real Matrices

In a situation where complex numbers might arise, it is conventional to refer to a vector x as a real vector if all the components of x are known to be real numbers. Similarly, we use the term real matrix to denote a matrix A, all of whose entries are real.

With these preliminaries, we can present two important results. The first result was illustrated in Example 1. We found λ = 2 + i to be an eigenvalue with a corresponding eigenvector x = [1 + i, −2]ᵀ. We also found that the conjugates, λ̄ = 2 − i and x̄ = [1 − i, −2]ᵀ, were the other eigenvalue/eigenvector pair. That is, the eigenvalues and eigenvectors occurred in conjugate pairs. The next theorem tells us that Example 1 is typical.

Theorem 16 Let A be a real (n × n) matrix with an eigenvalue λ and corresponding eigenvector x. Then λ̄ is also an eigenvalue of A, and x̄ is an eigenvector corresponding to λ̄.

Proof It can be shown (see Exercise 36) that $\overline{\lambda\mathbf{x}} = \bar\lambda\bar{\mathbf{x}}$. Furthermore, since A is real, it can be shown (see Exercise 36) that

$$\overline{A\mathbf{x}} = \bar A\bar{\mathbf{x}} = A\bar{\mathbf{x}}.$$

Using these two results and the assumption Ax = λx, we obtain

$$A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar\lambda\bar{\mathbf{x}}, \quad \bar{\mathbf{x}} \neq \theta.$$

Thus λ̄ is an eigenvalue corresponding to the eigenvector x̄.

Finally, as the next theorem shows, there is an important class of matrices for which the possibility of complex eigenvalues is precluded.

Theorem 17 If A is an (n × n) real symmetric matrix, then all the eigenvalues of A are real.

Proof Let A be any (n × n) real symmetric matrix, and suppose that Ax = λx, where x ≠ θ and where we allow the possibility that x is a complex vector. To isolate λ, we first note that

$$\bar{\mathbf{x}}^T(A\mathbf{x}) = \bar{\mathbf{x}}^T(\lambda\mathbf{x}) = \lambda(\bar{\mathbf{x}}^T\mathbf{x}). \quad (5)$$

Regarding Ax as a vector, we see that x̄ᵀ(Ax) = (Ax)ᵀx̄ (since, in general, uᵀv = vᵀu for complex vectors u and v). Using this observation in Eq. (5), we obtain

$$\lambda\bar{\mathbf{x}}^T\mathbf{x} = \bar{\mathbf{x}}^T(A\mathbf{x}) = (A\mathbf{x})^T\bar{\mathbf{x}} = \mathbf{x}^TA^T\bar{\mathbf{x}} = \mathbf{x}^TA\bar{\mathbf{x}}, \quad (6)$$

with the last equality holding because A = Aᵀ. Since A is real, we also know that Ax̄ = λ̄x̄; hence we deduce from Eq. (6) that

$$\lambda\bar{\mathbf{x}}^T\mathbf{x} = \mathbf{x}^TA\bar{\mathbf{x}} = \mathbf{x}^T(\bar\lambda\bar{\mathbf{x}}) = \bar\lambda\mathbf{x}^T\bar{\mathbf{x}},$$

or

$$\lambda\bar{\mathbf{x}}^T\mathbf{x} = \bar\lambda\bar{\mathbf{x}}^T\mathbf{x}. \quad (7)$$


Because x ≠ θ, x̄ᵀx is nonzero, and so from Eq. (7) we see that λ = λ̄, which means that λ is real.
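Numerical libraries exploit Theorem 17: for a real symmetric matrix they use special routines whose output is guaranteed real. A sketch (Python/NumPy; the small symmetric matrix is our own example):

```python
import numpy as np

A = np.array([[ 2, -1],
              [-1,  2]], dtype=float)   # real and symmetric

# eigvalsh assumes symmetry and returns real eigenvalues in ascending order.
print(np.linalg.eigvalsh(A))  # [1. 3.]
```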

Gaussian Elimination for Systems with Complex Coefficients (Optional)

The remainder of this section is concerned with the computational details of solving (A − λI)x = θ when λ is complex. We will see that although the arithmetic is tiresome, we can use Gaussian elimination to solve a system of linear equations that has some complex coefficients in exactly the same way that we solve systems of linear equations having real coefficients. For example, consider the (2 × 2) system

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 &= b_1 \\ a_{21}x_1 + a_{22}x_2 &= b_2, \end{aligned}$$

where the coefficients aᵢⱼ may be complex. Just as before, we can multiply the first equation by −a₂₁/a₁₁, add the result to the second equation to eliminate x₁ from the second equation, and then backsolve to find x₂ and x₁. For larger systems with complex coefficients, the principles of Gaussian elimination are exactly the same as they are for real systems; only the computational details are different.

One computational detail that might be unfamiliar is dividing one complex number by another (the first step of Gaussian elimination for the (2 × 2) system above is to form a₂₁/a₁₁). To see how a complex division is carried out, let z = a + ib and w = c + id, where w ≠ 0. To form the quotient z/w, we multiply numerator and denominator by w̄:

$$\frac{z}{w} = \frac{z\bar w}{w\bar w}.$$

In detail, we have

$$\frac{z}{w} = \frac{z\bar w}{w\bar w} = \frac{(a+ib)(c-id)}{c^2+d^2} = \frac{(ac+bd) + i(bc-ad)}{c^2+d^2}. \quad (8)$$

Our objective is to express the quotient z/w in the standard form z/w = r + is, where r and s are real numbers; from Eq. (8), r and s are given by

$$r = \frac{ac+bd}{c^2+d^2} \quad\text{and}\quad s = \frac{bc-ad}{c^2+d^2}.$$

For instance,

$$\frac{2+3i}{1+2i} = \frac{(2+3i)(1-2i)}{(1+2i)(1-2i)} = \frac{8-i}{5} = \frac{8}{5} - \frac{1}{5}i.$$
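Languages with built-in complex numbers perform this division automatically; in Python, a one-line check of the computation above reads:

```python
print((2 + 3j) / (1 + 2j))   # (1.6-0.2j), that is, 8/5 - (1/5)i
```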

Example 4 Use Gaussian elimination to solve the system in (1):

$$\begin{aligned} (1-i)x_1 + x_2 &= 0 \\ -2x_1 - (1+i)x_2 &= 0. \end{aligned}$$


Solution The initial step in solving this system is to multiply the first equation by 2/(1 − i) and then add the result to the second equation. Following the discussion above, we write 2/(1 − i) as

$$\frac{2}{1-i} = \frac{2(1+i)}{(1-i)(1+i)} = \frac{2+2i}{2} = 1+i.$$

Multiplying the first equation by 1 + i and adding the result to the second equation produces the equivalent system

$$\begin{aligned} (1-i)x_1 + x_2 &= 0 \\ 0 &= 0, \end{aligned}$$

which leads to x₁ = −x₂/(1 − i). Simplifying, we obtain

$$x_1 = \frac{-x_2}{1-i} = \frac{-x_2(1+i)}{(1-i)(1+i)} = \frac{-(1+i)}{2}x_2.$$

With x₂ = −2a, the solutions are all of the form

$$\mathbf{x} = a\begin{bmatrix} 1+i \\ -2 \end{bmatrix}.$$

(Note: Since we are allowing the possibility of vectors with complex components, we will also allow the parameter a in Example 4 to be complex. For example, with a = i we see that

$$\mathbf{x} = \begin{bmatrix} -1+i \\ -2i \end{bmatrix}$$

is also a solution.)

Example 5 Find the eigenvalues and the eigenvectors of A, where

$$A = \begin{bmatrix} -2 & -2 & -9 \\ -1 & 1 & -3 \\ 1 & 1 & 4 \end{bmatrix}.$$

Solution The characteristic polynomial of A is

$$p(t) = -(t-1)(t^2 - 2t + 2).$$

Thus the eigenvalues of A are λ = 1, λ = 1 + i, and λ = 1 − i.

As we noted earlier, the complex eigenvalues occur in conjugate pairs; and if we find an eigenvector x for λ = 1 + i, then we immediately see that x̄ is an eigenvector for λ = 1 − i. In this example we find the eigenvectors for λ = 1 + i by reducing the augmented matrix [A − λI | θ] to echelon form. Now for λ = 1 + i,

$$[A - \lambda I \mid \theta] = \begin{bmatrix} -3-i & -2 & -9 & 0 \\ -1 & -i & -3 & 0 \\ 1 & 1 & 3-i & 0 \end{bmatrix}.$$


To introduce a zero into the (2, 1) position, we use the multiple m, where

$$m = \frac{1}{-3-i} = \frac{-1}{3+i} = \frac{-(3-i)}{(3+i)(3-i)} = \frac{-3+i}{10}.$$

Multiplying the first row by m and adding the result to the second row, and then multiplying the first row by −m and adding the result to the third row, we find that [A − λI | θ] is row equivalent to

$$\begin{bmatrix} -3-i & -2 & -9 & 0 \\[2pt] 0 & \dfrac{6-12i}{10} & \dfrac{-3-9i}{10} & 0 \\[2pt] 0 & \dfrac{4+2i}{10} & \dfrac{3-i}{10} & 0 \end{bmatrix}.$$

Multiplying the second and third rows by 10 in the preceding matrix, we obtain a row-equivalent matrix:

$$\begin{bmatrix} -3-i & -2 & -9 & 0 \\ 0 & 6-12i & -3-9i & 0 \\ 0 & 4+2i & 3-i & 0 \end{bmatrix}.$$

Completing the reduction, we multiply the second row by r and add the result to the third row, where r is the multiple

$$r = \frac{-(4+2i)}{6-12i} = \frac{-(4+2i)(6+12i)}{(6-12i)(6+12i)} = \frac{-60i}{180} = \frac{-i}{3}.$$

We obtain the row-equivalent matrix

$$\begin{bmatrix} -3-i & -2 & -9 & 0 \\ 0 & 6-12i & -3-9i & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix};$$

and the eigenvectors of A corresponding to λ = 1 + i are found by solving

$$\begin{aligned} -(3+i)x_1 - 2x_2 &= 9x_3 \\ (6-12i)x_2 &= (3+9i)x_3, \end{aligned} \quad (9)$$

with x₃ arbitrary, x₃ ≠ 0. We first find x₂ from

$$x_2 = \frac{3+9i}{6-12i}x_3 = \frac{(3+9i)(6+12i)}{180}x_3 = \frac{-90+90i}{180}x_3,$$

or

$$x_2 = \frac{-1+i}{2}x_3.$$

From the first equation in (9), we obtain

$$-(3+i)x_1 = 2x_2 + 9x_3 = (8+i)x_3,$$


or

$$x_1 = \frac{-(8+i)}{3+i}x_3 = \frac{-(8+i)(3-i)}{10}x_3 = \frac{-25+5i}{10}x_3 = \frac{-5+i}{2}x_3.$$

Setting x₃ = 2a, we have x₂ = (−1 + i)a and x₁ = (−5 + i)a; so the eigenvectors of A corresponding to λ = 1 + i are all of the form

$$\mathbf{x} = \begin{bmatrix} (-5+i)a \\ (-1+i)a \\ 2a \end{bmatrix} = a\begin{bmatrix} -5+i \\ -1+i \\ 2 \end{bmatrix}, \quad a \neq 0.$$

Furthermore, we know that eigenvectors of A corresponding to λ = 1 − i have the form

$$\mathbf{x} = b\begin{bmatrix} -5-i \\ -1-i \\ 2 \end{bmatrix}, \quad b \neq 0.$$

If linear algebra software is available, then finding eigenvalues and eigenvectors is a simple matter.

Example 6 Find the eigenvalues and the eigenvectors for the (4 × 4) matrix

$$A = \begin{bmatrix} 3 & 3 & 6 & 9 \\ 1 & 4 & 3 & 7 \\ 2 & -5 & 8 & 3 \\ 2 & -9 & 7 & 4 \end{bmatrix}.$$

Solution We used MATLAB to solve this problem. The command [V, D] = eig(A) produces a diagonal matrix D and a matrix of eigenvectors V. That is, AV = VD or (if A is not defective) V⁻¹AV = D. The results from MATLAB are shown in Fig. 4.3. As can be seen from the matrix D in Fig. 4.3, A has two complex eigenvalues, which are (to the places shown) λ = 6.9014 + 5.3028i and λ = 6.9014 − 5.3028i. In addition, A has two real eigenvalues, λ = 4.0945 and λ = 1.1027. Eigenvectors are found in the corresponding columns of V.

As the preceding examples indicate, finding eigenvectors that correspond to a complex eigenvalue proceeds exactly as for a real eigenvalue except for the additional details required by complex arithmetic.

Although complex eigenvalues and eigenvectors may seem an undue complication, they are in fact fairly important to applications. For instance, we note (without trying to be precise) that oscillatory and periodic solutions to first-order systems of differential equations correspond to complex eigenvalues; and since many physical systems exhibit such behavior, we need some way to model them.


A =
     3     3     6     9
     1     4     3     7
     2    -5     8     3
     2    -9     7     4

>> [V,D] = eig(A)

V =
   0.6897 + 0.2800i   0.6897 - 0.2800i   0.8216    0.9609
   0.4761 + 0.2051i   0.4761 - 0.2051i   0.4196   -0.0067
   0.1338 + 0.2255i   0.1338 - 0.2255i   0.3014   -0.2765
  -0.1139 + 0.3090i  -0.1139 - 0.3090i  -0.2409   -0.0160

D =
   6.9014 + 5.3028i        0                  0         0
        0             6.9014 - 5.3028i        0         0
        0                  0               4.0945       0
        0                  0                  0      1.1027

Figure 4.3 MATLAB was used to find the eigenvalues and eigenvectors of matrix A in Example 6—that is, AV = VD or V⁻¹AV = D, where D is diagonal.
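Readers without MATLAB can reproduce Fig. 4.3 with comparable software; for instance, a NumPy sketch (our construction):

```python
import numpy as np

A = np.array([[3,  3, 6, 9],
              [1,  4, 3, 7],
              [2, -5, 8, 3],
              [2, -9, 7, 4]], dtype=float)

D, V = np.linalg.eig(A)   # NumPy returns the eigenvalues as a vector
print(D)                  # ~6.9014 +/- 5.3028i, 4.0945, and 1.1027
print(np.allclose(A @ V, V @ np.diag(D)))   # True: AV = VD
```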

4.6 EXERCISES

In Exercises 1–18, s = 1 + 2i, u = 3 − 2i, v = 4 + i, w = 2 − i, and z = 1 + i. In each exercise, perform the indicated calculation and express the result in the form a + ib.

1. ū   2. z̄   3. u + v
4. z + w   5. u + ū   6. s − s̄
7. vv̄   8. uv   9. s² − w
10. z²w   11. uw²   12. s(u² + v)
13. u/v   14. v/u²   15. s/z
16. (w + v)/u   17. w + iz   18. s − iw

Find the eigenvalues and the eigenvectors for the matrices in Exercises 19–24. (For the matrix in Exercise 24, one eigenvalue is λ = 1 + 5i.)

19. $\begin{bmatrix} 6 & 8 \\ -1 & 2 \end{bmatrix}$   20. $\begin{bmatrix} 2 & 4 \\ -2 & -2 \end{bmatrix}$   21. $\begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}$

22. $\begin{bmatrix} 5 & -5 & -5 \\ -1 & 4 & 2 \\ 3 & -5 & -3 \end{bmatrix}$   23. $\begin{bmatrix} 1 & -4 & -1 \\ 3 & 2 & 3 \\ 1 & 1 & 3 \end{bmatrix}$   24. $\begin{bmatrix} 1 & -5 & 0 & 0 \\ 5 & 1 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 2 & 1 \end{bmatrix}$

In Exercises 25 and 26, solve the linear system.

25. (1 + i)x + iy = 5 + 4i
    (1 − i)x − 4y = −11 + 5i

26. (1 − i)x − (3 + i)y = −5 − i
    (2 + i)x + (1 + 2i)y = 1 + 6i


In Exercises 27–30, calculate ‖x‖.

27. x = [1 + i, 2]^T    28. x = [3 + i, 2 − i]^T

29. x = [1 − 2i, i, 3 + i]^T    30. x = [2i, 1 − i, 3]^T

In Exercises 31–34, use linear algebra software to find the eigenvalues and the eigenvectors.

31. [2 2 5; 5 3 7; 1 5 3]    32. [1 2 8; 8 4 9; 2 6 1]

33. [5 −1 0 8; 3 6 8 −3; 1 1 4 2; 9 7 6 9]

34. [5 5 4 6; 0 8 6 7; 1 2 3 1; 6 3 8 5]

35. Establish the five properties of the conjugate operation listed in (2).

36. Let A be an (m × n) matrix, and let B be an (n × p) matrix, where the entries of A and B may be complex. Use Exercise 35 and the definition of AB to show that the conjugate of AB equals A̅B̅. (By A̅, we mean the matrix whose ijth entry is the conjugate of the ijth entry of A.) If A is a real matrix and x is an (n × 1) vector, show that the conjugate of Ax equals Ax̅.

37. Let A be an (m × n) matrix, where the entries of A may be complex. It is customary to use the symbol A^* to denote the matrix

A^* = (A̅)^T.

Suppose that A is an (m × n) matrix and B is an (n × p) matrix. Use Exercise 36 and the properties of the transpose operation to give a quick proof that (AB)^* = B^*A^*.

38. An (n × n) matrix A is called Hermitian if A^* = A.
a) Prove that a Hermitian matrix A has only real eigenvalues. [Hint: Observing that x̅^Tx = x^*x, modify the proof of Theorem 17.]
b) Let A = (aij) be an (n × n) Hermitian matrix. Show that aii is real for 1 ≤ i ≤ n.

39. Let p(t) = a0 + a1t + · · · + an t^n, where the coefficients a0, a1, . . . , an are all real.
a) Prove that if r is a complex root of p(t) = 0, then r̅ is also a root of p(t) = 0.
b) If p(t) has degree 3, argue that p(t) must have at least one real root.
c) If A is a (3 × 3) real matrix, argue that A must have at least one real eigenvalue.

40. An (n × n) real matrix A is called orthogonal if A^TA = I. Let λ be an eigenvalue of an orthogonal matrix A, where λ = r + is. Prove that λλ̅ = r^2 + s^2 = 1. [Hint: First show that ‖Ax‖ = ‖x‖ for any vector x.]

41. A real symmetric (n × n) matrix A is called positive definite if x^TAx > 0 for all x in R^n, x ≠ θ. Prove that the eigenvalues of a real symmetric positive-definite matrix A are all positive.

42. An (n × n) matrix A is called unitary if A^*A = I. (If A is a real unitary matrix, then A is orthogonal; see Exercise 40.) Show that if A is unitary and λ is an eigenvalue for A, then |λ| = 1.

4.7 SIMILARITY TRANSFORMATIONS AND DIAGONALIZATION

In Chapter 1, we saw that two linear systems of equations have the same solution if their augmented matrices are row equivalent. In this chapter, we are interested in identifying classes of matrices that have the same eigenvalues.

As we know, the eigenvalues of an (n × n) matrix A are the zeros of its characteristic polynomial,

p(t) = det(A − tI).


Thus if an (n × n) matrix B has the same characteristic polynomial as A, then A and B have the same eigenvalues. As we will see, it is fairly simple to find such matrices B.

Similarity

In particular, let A be an (n × n) matrix, and let S be a nonsingular (n × n) matrix. Then, as the following calculation shows, the matrices A and B = S^{-1}AS have the same characteristic polynomial. To establish this fact, observe that the characteristic polynomial for S^{-1}AS is given by

p(t) = det(S^{-1}AS − tI)
     = det(S^{-1}AS − tS^{-1}S)
     = det[S^{-1}(A − tI)S]
     = det(S^{-1}) det(A − tI) det(S), by Theorem 2
     = [det(S^{-1}) det(S)] det(A − tI)
     = det(A − tI).     (1)

(The last equality given follows because det(S^{-1}) det(S) = det(S^{-1}S) = det(I) = 1.)

Thus, by (1), the matrices S^{-1}AS and A have the same characteristic polynomial and hence the same set of eigenvalues. The discussion above leads to the next definition.

Definition 8  The (n × n) matrices A and B are said to be similar if there is a nonsingular (n × n) matrix S such that B = S^{-1}AS.

The calculations carried out in (1) show that similar matrices have the same characteristic polynomial. Consequently the following theorem is immediate.

Theorem 18  If A and B are similar (n × n) matrices, then A and B have the same eigenvalues. Moreover, these eigenvalues have the same algebraic multiplicity.

Although similar matrices always have the same characteristic polynomial, it is not true that two matrices with the same characteristic polynomial are necessarily similar. As a simple example, consider the two matrices

A = [1 0; 1 1] and I = [1 0; 0 1].

Now p(t) = (1 − t)^2 is the characteristic polynomial for both A and I; so A and I have the same set of eigenvalues. If A and I were similar, however, there would be a (2 × 2) matrix S such that

I = S^{-1}AS.

But the equation I = S^{-1}AS is equivalent to S = AS, which is in turn equivalent to SS^{-1} = ASS^{-1}, or I = A. Thus I and A cannot be similar. (A repetition of this


argument shows that the only matrix similar to the identity matrix is I itself.) In this respect, similarity is a more fundamental concept for the eigenvalue problem than is the characteristic polynomial; two matrices can have exactly the same characteristic polynomial without being similar, so similarity leads to a more finely detailed way of distinguishing matrices.

Although similar matrices have the same eigenvalues, they do not generally have the same eigenvectors. For example, if B = S^{-1}AS and if Bx = λx, then

S^{-1}ASx = λx, or A(Sx) = λ(Sx).

Thus if x is an eigenvector for B corresponding to λ, then Sx is an eigenvector for A corresponding to λ.

Diagonalization

Computations involving an (n × n) matrix A can often be simplified if we know that A is similar to a diagonal matrix. To illustrate, suppose S^{-1}AS = D, where D is a diagonal matrix. Next, suppose we need to calculate the power A^k, where k is a positive integer. Knowing that D = S^{-1}AS, we can proceed as follows:

D^k = (S^{-1}AS)^k = S^{-1}A^kS.     (2)

(The fact that (S^{-1}AS)^k = S^{-1}A^kS is established in Exercise 25.) Note that because D is a diagonal matrix, it is easy to form the power D^k.

Once the matrix D^k is computed, the matrix A^k can be recovered from Eq. (2) by forming SD^kS^{-1}:

SD^kS^{-1} = S(S^{-1}A^kS)S^{-1} = A^k.

Whenever an (n × n) matrix A is similar to a diagonal matrix, we say that A is diagonalizable. The next theorem gives a characterization of diagonalizable matrices.

Theorem 19  An (n × n) matrix A is diagonalizable if and only if A possesses a set of n linearly independent eigenvectors.

Proof  Suppose that {u1, u2, . . . , un} is a set of n linearly independent eigenvectors for A:

Auk = λkuk, k = 1, 2, . . . , n.

Let S be the (n × n) matrix whose column vectors are the eigenvectors of A:

S = [u1, u2, . . . , un].

Now S is a nonsingular matrix (its columns are linearly independent); so S^{-1} exists, where

S^{-1}S = [S^{-1}u1, S^{-1}u2, . . . , S^{-1}un] = [e1, e2, . . . , en] = I.     (3)

Furthermore, since Auk = λkuk, we obtain

AS = [Au1, Au2, . . . , Aun] = [λ1u1, λ2u2, . . . , λnun];

and so from Eq. (3),

S^{-1}AS = [λ1S^{-1}u1, λ2S^{-1}u2, . . . , λnS^{-1}un] = [λ1e1, λ2e2, . . . , λnen].


Therefore, S^{-1}AS has the form

S^{-1}AS = [λ1 0 0 · · · 0; 0 λ2 0 · · · 0; 0 0 λ3 · · · 0; . . . ; 0 0 0 · · · λn] = D;

and we have shown that if A has n linearly independent eigenvectors, then A is similar to a diagonal matrix.

Now suppose that C^{-1}AC = D, where C is nonsingular and D is a diagonal matrix. Let us write C and D in column form as

C = [C1, C2, . . . , Cn] and D = [d1e1, d2e2, . . . , dnen].

From C^{-1}AC = D, we obtain AC = CD, and we write both of these in column form as

AC = [AC1, AC2, . . . , ACn]
CD = [d1Ce1, d2Ce2, . . . , dnCen].

But since Cek = Ck for k = 1, 2, . . . , n, we see that AC = CD implies

ACk = dkCk, k = 1, 2, . . . , n.

Since C is nonsingular, the vectors C1, C2, . . . , Cn are linearly independent (and in particular, no Ck is the zero vector). Thus the diagonal entries of D are the eigenvalues of A, and the column vectors of C are a set of n linearly independent eigenvectors.

Note that the proof of Theorem 19 gives a procedure for diagonalizing an (n × n) matrix A. That is, if A has n linearly independent eigenvectors u1, u2, . . . , un, then the matrix S = [u1, u2, . . . , un] will diagonalize A.

Example 1  Show that A is diagonalizable by finding a matrix S such that S^{-1}AS = D:

A = [5 −6; 3 −4].

Solution  It is easy to verify that A has eigenvalues λ1 = 2 and λ2 = −1, with corresponding eigenvectors

u1 = [2, 1]^T and u2 = [1, 1]^T.

Forming S = [u1, u2], we obtain

S = [2 1; 1 1], S^{-1} = [1 −1; −1 2].

As a check on the calculations, we form S^{-1}AS. The matrix AS is given by

AS = [5 −6; 3 −4][2 1; 1 1] = [4 −1; 2 −1].


Next, forming S^{-1}(AS), we obtain

S^{-1}(AS) = [1 −1; −1 2][4 −1; 2 −1] = [2 0; 0 −1] = D.

Example 2  Use the result of Example 1 to calculate A^{10}, where

A = [5 −6; 3 −4].

Solution  As was noted in Eq. (2), D^{10} = S^{-1}A^{10}S. Therefore, A^{10} = SD^{10}S^{-1}. Now by Example 1,

D^{10} = [2^{10} 0; 0 (−1)^{10}] = [1024 0; 0 1].

Hence A^{10} = SD^{10}S^{-1} is given by

A^{10} = [2047 −2046; 1023 −1022].
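The same power can be cross-checked numerically. A minimal MATLAB sketch (our check, not part of the original example):

% Sketch: compute A^10 directly and via A^10 = S*D^10*inv(S).
A = [5 -6; 3 -4];
S = [2 1; 1 1];
D = [2 0; 0 -1];
direct   = A^10
via_diag = S * D^10 / S   % X/S multiplies X by the inverse of S
% both results should equal [2047 -2046; 1023 -1022]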

Sometimes complex arithmetic is necessary to diagonalize a real matrix.

Example 3  Show that A is diagonalizable by finding a matrix S such that S^{-1}AS = D:

A = [1 1; −1 1].

Solution  A has eigenvalues λ1 = 1 + i and λ2 = 1 − i, with corresponding eigenvectors

u1 = [1, i]^T and u2 = [1, −i]^T.

Forming the matrix S = [u1, u2], we obtain

S = [1 1; i −i], S^{-1} = [1/2 −i/2; 1/2 i/2].

As a check, note that AS is given by

AS = [1 1; −1 1][1 1; i −i] = [1 + i 1 − i; −1 + i −1 − i].

Next, S^{-1}(AS) is the matrix

S^{-1}(AS) = (1/2)[1 −i; 1 i][1 + i 1 − i; −1 + i −1 − i] = [1 + i 0; 0 1 − i] = D.


Some types of matrices are known to be diagonalizable. The next theorem lists one such condition. Then, in the last subsection, we prove the important theorem: If A is a real symmetric matrix, then A is diagonalizable.

Theorem 20  Let A be an (n × n) matrix with n distinct eigenvalues. Then A is diagonalizable.

Proof  By Theorem 15, if A has n distinct eigenvalues, then A has a set of n linearly independent eigenvectors. Thus by Theorem 19, A is diagonalizable.

As the next example shows, a matrix A may be diagonalizable even though it has repeated eigenvalues.

Example 4  Show that A is diagonalizable, where

A = [25 −8 30; 24 −7 30; −12 4 −14].

Solution  The eigenvalues of A are λ1 = λ2 = 1 and λ3 = 2. The eigenspace corresponding to λ1 = λ2 = 1 has dimension 2, with a basis {u1, u2}, where

u1 = [1, 3, 0]^T and u2 = [−4, 3, 4]^T.

An eigenvector corresponding to λ3 = 2 is

u3 = [4, 4, −2]^T.

Defining S by S = [u1, u2, u3], we can verify that

S^{-1}AS = D = [1 0 0; 0 1 0; 0 0 2].
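The verification left to the reader is also a one-line computation in MATLAB. A minimal sketch (ours, using the eigenvectors listed above):

% Sketch: check that S = [u1, u2, u3] diagonalizes A in Example 4.
A = [25 -8 30; 24 -7 30; -12 4 -14];
S = [1 -4 4; 3 3 4; 0 4 -2];   % columns are u1, u2, u3
D = S \ (A*S)                  % computes inv(S)*(A*S); expect diag(1, 1, 2)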

Orthogonal Matrices

A remarkable and useful fact about symmetric matrices is that they are always diagonalizable. Moreover, the diagonalization of a symmetric matrix A can be accomplished with a special type of matrix known as an orthogonal matrix.

Definition 9  A real (n × n) matrix Q is called an orthogonal matrix if Q is invertible and Q^{-1} = Q^T.


Definition 9 can be rephrased as follows: A real square matrix Q is orthogonal if and only if

Q^TQ = I.     (4)

Another useful description of orthogonal matrices can be obtained from Eq. (4). In particular, suppose that Q = [q1, q2, . . . , qn] is an (n × n) matrix. Since the ith row of Q^T is equal to qi^T, the definition of matrix multiplication tells us:

The ijth entry of Q^TQ is equal to qi^T qj.

Therefore, by Eq. (4), an (n × n) matrix Q = [q1, q2, . . . , qn] is orthogonal if and only if:

The columns of Q, {q1, q2, . . . , qn}, form an orthonormal set of vectors.     (5)

Example 5  Verify that the matrices Q1 and Q2 are orthogonal:

Q1 = (1/√2)[1 0 1; 0 √2 0; −1 0 1] and Q2 = [0 0 1; 1 0 0; 0 1 0].

Solution  We use Eq. (4) to show that Q1 is orthogonal. Specifically,

Q1^TQ1 = (1/2)[1 0 −1; 0 √2 0; 1 0 1][1 0 1; 0 √2 0; −1 0 1] = (1/2)[2 0 0; 0 2 0; 0 0 2] = I.

We use condition (5) to show that Q2 is orthogonal. The column vectors of Q2 are, in the order they appear, {e2, e3, e1}. Since these vectors are orthonormal, it follows from (5) that Q2 is orthogonal.
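Both checks are immediate in MATLAB as well. A minimal sketch (ours, not part of the text):

% Sketch: Q is orthogonal exactly when Q'*Q is the identity; see Eq. (4).
Q1 = (1/sqrt(2)) * [1 0 1; 0 sqrt(2) 0; -1 0 1];
Q2 = [0 0 1; 1 0 0; 0 1 0];
norm(Q1'*Q1 - eye(3))   % near zero, so Q1 is orthogonal
norm(Q2'*Q2 - eye(3))   % exactly zero, so Q2 is orthogonal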

From the characterization of orthogonal matrices given in condition (5), the following observation can be made: If Q = [q1, q2, . . . , qn] is an (n × n) orthogonal matrix and if P = [p1, p2, . . . , pn] is formed by rearranging the columns of Q, then P is also an orthogonal matrix.

As a special case of this observation, suppose that P is a matrix formed by rearranging the columns of the identity matrix, I. Then, since I is an orthogonal matrix, it follows that P is orthogonal as well. Such a matrix P, formed by rearranging the columns of I, is called a permutation matrix. The matrix Q2 in Example 5 is a specific instance of a (3 × 3) permutation matrix.

Orthogonal matrices have some special properties that make them valuable tools for applications. These properties were mentioned in Section 3.7 with regard to (2 × 2) orthogonal matrices. Suppose we think of an (n × n) matrix Q as defining a function (or linear transformation) from R^n to R^n. That is, for x in R^n, consider the function defined by

y = Qx.

As the next theorem shows, if Q is orthogonal, then the function y = Qx preserves the lengths of vectors and the angles between pairs of vectors.


Theorem 21  Let Q be an (n × n) orthogonal matrix.

(a) If x is in R^n, then ‖Qx‖ = ‖x‖.
(b) If x and y are in R^n, then (Qx)^T(Qy) = x^Ty.
(c) det(Q) = ±1.

Proof  We will prove property (a) and leave properties (b) and (c) to the exercises. Let x be a vector in R^n. Then

‖Qx‖ = √((Qx)^T(Qx)) = √(x^TQ^TQx) = √(x^TIx) = √(x^Tx) = ‖x‖.

The fact that x^T(Q^TQ)x = x^TIx comes from Eq. (4).

Theorem 21 can be illustrated geometrically (see Figs. 4.4 and 4.5). In Fig. 4.4(a), a vector x in R^2 is shown, where ‖x‖ = 1. The vector Qx is shown in Fig. 4.4(b), where, by Theorem 21, Qx also has length 1. In Fig. 4.5(a), vectors x and y are shown, where ‖x‖ = 1 and ‖y‖ = 2. From vector geometry, we know that the angle θ between x and y satisfies the condition

x^Ty = ‖x‖‖y‖ cos θ, 0 ≤ θ ≤ π.     (6)

In Fig. 4.5(b), the vectors Qx and Qy are shown, where the angle between Qx and Qy is also equal to θ. To establish that the angle between x and y is equal to the angle between Qx and Qy, we can argue as follows: Let γ denote the angle between Qx and Qy, where 0 ≤ γ ≤ π. As in Eq. (6), the angle γ satisfies the condition

(Qx)^T(Qy) = ‖Qx‖‖Qy‖ cos γ, 0 ≤ γ ≤ π.     (7)

By Theorem 21, (Qx)^T(Qy) = x^Ty and ‖Qx‖‖Qy‖ = ‖x‖‖y‖. Thus, from Eq. (6) and Eq. (7), cos θ = cos γ. Since the cosine function is one-to-one on [0, π], the condition cos θ = cos γ implies that θ = γ.

Figure 4.4  The length of x is equal to the length of Qx.


Figure 4.5  The angle between x and y is equal to the angle between Qx and Qy.

Diagonalization of Symmetric Matrices

We conclude this section by showing that every symmetric matrix can be diagonalized by an orthogonal matrix. Several approaches can be used to establish this diagonalization result. We choose to demonstrate it by first stating a special case of a theorem known as Schur's theorem.

Theorem 22  Let A be an (n × n) matrix, where A has only real eigenvalues. Then there is an (n × n) orthogonal matrix Q such that

Q^TAQ = T,

where T is an (n × n) upper-triangular matrix.

We leave the proof of Theorem 22 as a series of somewhat challenging exercises. It is important to observe that the triangular matrix T in Theorem 22 is similar to A. That is, since Q^{-1} = Q^T, it follows that Q^TAQ is a similarity transformation.

Schur's theorem (of which Theorem 22 is a special case) states that any (n × n) matrix A is unitarily similar to a triangular matrix T. The definition of a unitary matrix is given in the exercises of the previous section.

Linear algebra software can be used to find matrices Q and T that satisfy the conclusions of Schur's theorem: Q^TAQ = T. Note that we can rewrite Q^TAQ = T as

A = QTQ^T.

The decomposition A = QTQ^T is called a Schur decomposition or a Schur factorization of A.


Example 6  The (3 × 3) matrix A has real eigenvalues:

A = [2 4 3; 7 5 9; 1 3 1].

Find an orthogonal matrix Q and an upper-triangular matrix T such that Q^TAQ = T.

Solution  We used MATLAB in this example. The MATLAB command [Q, T] = schur(A) yields appropriate matrices Q and T (see Fig. 4.6). Since A and T are similar, the eigenvalues of A are the diagonal entries of T. Thus, to the places shown in Fig. 4.6, the eigenvalues of A are λ = 11.6179, λ = −0.3125, and λ = −3.3055.

A =
     2     4     3
     7     5     9
     1     3     1

>> [Q,T] = schur(A)

Q =
  -0.4421   0.7193  -0.5359
  -0.8514  -0.1486   0.5030
  -0.2822  -0.6786  -0.6781

T =
  11.6179   2.1869   6.6488
        0  -0.3125   0.1033
        0        0  -3.3055

Figure 4.6  MATLAB was used in Example 6 to find matrices Q and T such that Q^TAQ = T.
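Because Q should satisfy Q^TQ = I and A = QTQ^T, both identities can be tested numerically. A short sketch (our check, not part of the example):

% Sketch: verify the Schur factorization A = Q*T*Q' from Example 6.
A = [2 4 3; 7 5 9; 1 3 1];
[Q, T] = schur(A);
norm(Q'*Q - eye(3))   % Q is orthogonal
norm(Q*T*Q' - A)      % factorization residual, near machine precision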

With Theorem 22, it is a simple matter to show that any real symmetric matrix can be diagonalized by an orthogonal matrix. In fact, as the next theorem states, a matrix is orthogonally diagonalizable if and only if the matrix is symmetric. We will use this result in Section 7.1 when we discuss diagonalizing quadratic forms.

Theorem 23  Let A be a real (n × n) matrix.

(a) If A is symmetric, then there is an orthogonal matrix Q such that Q^TAQ = D, where D is diagonal.
(b) If Q^TAQ = D, where Q is orthogonal and D is diagonal, then A is a symmetric matrix.


Proof  To prove property (a), suppose A is symmetric. Recall, by Theorem 17, that A has only real eigenvalues. Thus, by Theorem 22, there is an orthogonal matrix Q such that Q^TAQ = M, where M is an upper-triangular matrix. Using the transpose operation on the equality M = Q^TAQ and also using the fact that A^T = A, we obtain

M^T = (Q^TAQ)^T = Q^TA^TQ = Q^TAQ = M.

Thus, since M is upper triangular and M^T = M, it follows that M is a diagonal matrix.

To prove property (b), suppose that Q^TAQ = D, where Q is orthogonal and D is diagonal. Since D is diagonal, we know that D^T = D. Thus, using the transpose operation on the equality Q^TAQ = D, we obtain

Q^TAQ = D = D^T = (Q^TAQ)^T = Q^TA^TQ.

From this result, we see that Q^TAQ = Q^TA^TQ. Multiplying by Q and Q^T, we obtain

Q(Q^TAQ)Q^T = Q(Q^TA^TQ)Q^T
(QQ^T)A(QQ^T) = (QQ^T)A^T(QQ^T)
A = A^T.

Thus, since A = A^T, matrix A is symmetric.

Theorem 23 states that every real symmetric matrix A is orthogonally diagonalizable; that is, Q^TAQ = D, where Q is orthogonal and D is diagonal. From the proof of Theorem 19 (also, see Examples 1, 3, and 4), the eigenvalues of A are the diagonal entries of D, and eigenvectors of A can be chosen as the columns of Q. Since the columns of Q form an orthonormal set, the following result is a corollary of Theorem 23.

Corollary  Let A be a real (n × n) symmetric matrix. It is possible to choose eigenvectors u1, u2, . . . , un for A such that {u1, u2, . . . , un} is an orthonormal basis for R^n.

The corollary is illustrated in the next example. Before presenting the example, we note the following fact, which is established in Exercise 43:

If u and v are eigenvectors of a symmetric matrix and if u and v belong to different eigenspaces, then u^Tv = 0.     (8)

Note that if A is not symmetric, then eigenvectors corresponding to different eigenvalues are not generally orthogonal.

Example 7  Find an orthonormal basis for R^4 consisting of eigenvectors of the matrix

A = [1 −1 −1 −1; −1 1 −1 −1; −1 −1 1 −1; −1 −1 −1 1].


Solution  Matrix A is a special case of the Rodman matrix (see Exercise 42). The characteristic polynomial for A is given by

p(t) = det(A − tI) = (t − 2)^3(t + 2).

Thus the eigenvalues of A are λ1 = λ2 = λ3 = 2 and λ4 = −2. It is easy to verify that corresponding eigenvectors are given by

w1 = [1, −1, 0, 0]^T, w2 = [1, 0, −1, 0]^T, w3 = [1, 0, 0, −1]^T, and w4 = [1, 1, 1, 1]^T.

Note that w1, w2, and w3 belong to the eigenspace associated with λ = 2, whereas w4 is in the eigenspace associated with λ = −2. As is promised by condition (8), w1^Tw4 = w2^Tw4 = w3^Tw4 = 0. Also note that the matrix S defined by S = [w1, w2, w3, w4] will diagonalize A. However, S is not an orthogonal matrix.

To obtain an orthonormal basis for R^4 (and hence an orthogonal matrix Q that diagonalizes A), we first find an orthogonal basis for the eigenspace associated with λ = 2. Applying the Gram–Schmidt process to the set {w1, w2, w3}, we produce orthogonal vectors

x1 = [1, −1, 0, 0]^T, x2 = [1/2, 1/2, −1, 0]^T, and x3 = [1/3, 1/3, 1/3, −1]^T.

Thus the set {x1, x2, x3, w4} is an orthogonal basis for R^4 consisting of eigenvectors of A. This set can then be normalized to determine an orthonormal basis for R^4 and an orthogonal matrix Q that diagonalizes A.

We conclude by mentioning a result that is useful in applications. Let A be an (n × n) symmetric matrix with eigenvalues λ1, λ2, . . . , λn. Let u1, u2, . . . , un be a corresponding set of orthonormal eigenvectors, where Aui = λiui, 1 ≤ i ≤ n. Matrix A can be expressed in the form

A = λ1u1u1^T + λ2u2u2^T + · · · + λnunun^T.     (9)

In Eq. (9), each (n × n) matrix uiui^T is a rank-one matrix. Expression (9) is called a spectral decomposition for A. A proof for Eq. (9) can be constructed along the lines of Exercise 29 of Section 4.5.
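Equation (9) is easy to illustrate numerically for a small symmetric matrix, since MATLAB's eig returns orthonormal eigenvectors when its argument is symmetric. A minimal sketch (ours):

% Sketch: rebuild a symmetric matrix from its spectral decomposition, Eq. (9).
A = [2 -1; -1 2];            % any real symmetric matrix will do
[U, D] = eig(A);             % columns of U are orthonormal eigenvectors
B = zeros(2);
for i = 1:2
    B = B + D(i,i) * U(:,i) * U(:,i)';   % lambda_i * u_i * u_i^T, a rank-one term
end
norm(B - A)                  % near zero: the sum reproduces A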

4.7 EXERCISES

In Exercises 1–12, determine whether the given matrix A is diagonalizable. If A is diagonalizable, calculate A^5 using the method of Example 2.

1. A = [2 −1; −1 2]    2. A = [1 −1; −1 1]


3. A = [−3 2; −2 1]    4. A = [1 3; 0 1]

5. A = [1 0; 10 2]    6. A = [−1 7; 0 1]

7. A = [3 −2 −4; 8 −7 −16; −3 3 7]    8. A = [−1 −1 −4; −8 −3 −16; 1 2 7]

9. A = [3 −1 −1; −12 0 5; 4 −2 −1]    10. A = [1 1 −1; 0 2 −1; 0 0 1]

11. A = [1 1 −2; 0 2 −1; 0 0 1]    12. A = [1 3 3; 0 5 4; 0 0 1]

In Exercises 13–18, use condition (5) to determine whether the given matrix Q is orthogonal.

13. Q = [0 1; 1 0]    14. Q = (1/√5)[1 −2; 2 1]

15. Q = [2 −1; 1 2]    16. Q = [3 2; −2 3]

17. Q = (1/√6)[√3 1 √2; 0 −2 √2; −√3 1 √2]

18. Q = [1 1 −4; 2 −2 1; 1 3 2]

In Exercises 19 and 20, find values α, β, a, b, and c such that matrix Q is orthogonal. Choose positive values for α and β. [Hint: Use condition (5) to determine the values.]

19. Q = [α β a; 0 2β b; α −β c]    20. Q = [α −β a; α 3β b; α −2β c]

In Exercises 21–24, use linear algebra software to find an orthogonal matrix Q and an upper-triangular matrix T such that Q^TAQ = T. [Note: In each exercise, the matrix A has only real eigenvalues.]

21. [1 0 1; 3 3 5; 2 6 2]    22. [3 0 7; 9 −6 4; 1 1 4]

23. [4 5 2 8; 0 6 7 5; 2 4 5 3; 9 7 3 6]    24. [4 7 3 5; 8 5 7 8; 2 4 3 5; 0 5 7 4]

25. Let A be an (n × n) matrix, and let S be a nonsingular (n × n) matrix.
a) Verify that (S^{-1}AS)^2 = S^{-1}A^2S and that (S^{-1}AS)^3 = S^{-1}A^3S.
b) Prove by induction that (S^{-1}AS)^k = S^{-1}A^kS for any positive integer k.

26. Show that if A is diagonalizable and if B is similar to A, then B is diagonalizable. [Hint: Suppose that S^{-1}AS = D and W^{-1}AW = B.]

27. Suppose that B is similar to A. Show each of the following.
a) B + αI is similar to A + αI.
b) B^T is similar to A^T.
c) If A is nonsingular, then B is nonsingular and, moreover, B^{-1} is similar to A^{-1}.

28. Prove properties (b) and (c) of Theorem 21. [Hint: For property (c), use the fact that Q^TQ = I.]

29. Let u be a vector in R^n such that u^Tu = 1. Let Q = I − 2uu^T. Show that Q is an orthogonal matrix. Also, calculate the vector Qu. Is u an eigenvector for Q?

30. Suppose that A and B are orthogonal (n × n) matrices. Show that AB is an orthogonal matrix.

31. Let x be a nonzero vector in R^2, x = [a, b]^T. Find a vector y in R^2 such that x^Ty = 0 and y^Ty = 1.

32. Let A be a real (2 × 2) matrix with only real eigenvalues. Suppose that Au = λu, where u^Tu = 1. By Exercise 31, there is a vector v in R^2 such that u^Tv = 0 and v^Tv = 1. Let Q be the (2 × 2) matrix given by Q = [u, v], and note that Q is an orthogonal matrix. Verify that

Q^TAQ = [λ u^TAv; 0 v^TAv].

(Thus Theorem 22 is proved for a (2 × 2) matrix A.)

In Exercises 33–36, use the procedure outlined in Exercise 32 to find an orthogonal matrix Q such that Q^TAQ = T, T upper triangular.

33. A = [1 −1; 1 3]    34. A = [5 −2; 6 −2]

35. A = [2 −1; −1 2]    36. A = [2 2; 3 3]

37. Let A and R be (n × n) matrices. Show that the ijth entry of R^TAR is given by Ri^T ARj, where R = [R1, R2, . . . , Rn].

38. Let A be a real (3 × 3) matrix with only real eigenvalues. Suppose that Au = λu, where u^Tu = 1. By the Gram–Schmidt process, there are vectors v and w in R^3 such that {u, v, w} is an orthonormal set. Consider the orthogonal matrix Q given by Q = [u, v, w]. Verify that

Q^TAQ = [λ u^TAv u^TAw; 0 v^TAv v^TAw; 0 w^TAv w^TAw];

in block form, Q^TAQ has first column [λ, 0, 0]^T and lower-right (2 × 2) block A1 = [v^TAv v^TAw; w^TAv w^TAw].

39. Let B = Q^TAQ, where Q and A are as in Exercise 38. Consider the (2 × 2) submatrix A1 of B defined in Exercise 38. Show that the eigenvalues of A1 are real. [Hint: Calculate det(B − tI), and show that every eigenvalue of A1 is an eigenvalue of B. Then make a statement showing that all the eigenvalues of B are real.]

40. Let B = Q^TAQ, where Q and A are as in Exercise 38. By Exercises 32 and 39, there is a (2 × 2) matrix S such that S^TS = I and S^TA1S = T1, where T1 is upper triangular. Form the (3 × 3) matrix R whose first row is [1, 0, 0], whose first column is [1, 0, 0]^T, and whose lower-right (2 × 2) block is S. Verify each of the following.
a) R^TR = I.
b) R^TQ^TAQR is an upper-triangular matrix.
(Note that this exercise verifies Theorem 22 for a (3 × 3) matrix A.)

41. Following the outline of Exercises 38–40, use induction to prove Theorem 22.

42. Consider the (n × n) symmetric matrix A = (aij) defined as follows:
a) aii = 1, 1 ≤ i ≤ n;
b) aij = −1, i ≠ j, 1 ≤ i, j ≤ n.
(A (4 × 4) version of this matrix is given in Example 7.) Verify that the eigenvalues of A are λ = 2 (geometric multiplicity n − 1) and λ = 2 − n (geometric multiplicity 1). [Hint: Show that the following are eigenvectors: ui = e1 − ei, 2 ≤ i ≤ n, and u1 = [1, 1, . . . , 1]^T.]

43. Suppose that A is a real symmetric matrix and that Au = λu, Av = βv, where λ ≠ β, u ≠ θ, and v ≠ θ. Show that u^Tv = 0. [Hint: Consider u^TAv.]

4.8 DIFFERENCE EQUATIONS; MARKOV CHAINS; SYSTEMS OF DIFFERENTIAL EQUATIONS (OPTIONAL)

In this section we examine how eigenvalues can be used to solve difference equations and systems of differential equations. In Chapter 7, we treat other applications of eigenvalues and also return to a deeper study of systems of differential equations.


Let A be an (n × n) matrix, and let x0 be a vector in R^n. Consider the sequence of vectors {xk} defined by

x1 = Ax0
x2 = Ax1
x3 = Ax2
  ...

In general, this sequence is given by

xk = Axk−1, k = 1, 2, . . . .     (1)

Vector sequences that are generated as in Eq. (1) occur in a variety of applications and serve as mathematical models to describe population growth, ecological systems, radar tracking of airborne objects, digital control of chemical processes, and the like. One of the objectives in such models is to describe the behavior of the sequence {xk} in qualitative or quantitative terms. In this section we see that the behavior of the sequence {xk} can be analyzed from the eigenvalues of A.

The following simple example illustrates a typical sequence of the form (1).

Example 1  Let xk = Axk−1, k = 1, 2, . . . . Calculate x1, x2, x3, x4, and x5, where

A = [.8 .2; .2 .8] and x0 = [1, 2]^T.

Solution  Some routine but tedious calculations show that

x1 = Ax0 = [1.2, 1.8]^T, x2 = Ax1 = [1.32, 1.68]^T, x3 = Ax2 = [1.392, 1.608]^T,
x4 = Ax3 = [1.4352, 1.5648]^T, and x5 = Ax4 = [1.46112, 1.53888]^T.

In Example 1, the first six terms of a vector sequence {xk} are listed. An inspection of these first few terms suggests that the sequence might have some regular pattern of behavior. For instance, the first components of these vectors are steadily increasing, whereas the second components are steadily decreasing. In fact, as shown in Example 3, this monotonic behavior persists for all terms of the sequence {xk}. Moreover, it can be shown that

lim k→∞ xk = x∗,

where the limit vector x∗ is given by

x∗ = [1.5, 1.5]^T.


Difference Equations

Let A be an (n × n) matrix. The equation

xk = Axk−1     (2)

is called a difference equation. A solution to the difference equation is any sequence of vectors {xk} that satisfies Eq. (2). That is, a solution is a sequence {xk} whose successive terms are related by x1 = Ax0, x2 = Ax1, . . . , xk = Axk−1, . . . . (Equation (2) is not the most general form of a difference equation.)

The basic challenge posed by a difference equation is to describe the behavior of the sequence {xk}. Some specific questions are:

1. For a given starting vector x0, is there a vector x∗ such that lim k→∞ xk = x∗?
2. If the sequence {xk} does have a limit, x∗, what is the limit vector?
3. Find a "formula" that can be used to calculate xk in terms of the starting vector x0.
4. Given a vector b and an integer k, determine x0 so that xk = b.
5. Given a vector b, characterize the set of starting vectors x0 for which {xk} → b.

Unlike many equations, the difference equation in (2) does not raise any interesting questions concerning the existence or uniqueness of solutions. For a given starting vector x0, we see that a solution to Eq. (2) always exists because it can be constructed. For instance, in Example 1 we found the first six terms of the solution to the given difference equation. In terms of uniqueness, suppose x0 is a given starting vector. It can be shown (see Exercise 22) that if {wk} is any sequence satisfying Eq. (2) and if w0 = x0, then wk = xk, k = 1, 2, . . . .

The next example shows how a difference equation might serve as a mathematical model for a physical process. The model is kept very simple so that the details do not obscure the ideas. Thus the example should be considered illustrative rather than realistic.

Example 2  Suppose that animals are being raised for market, and the grower wishes to determine how the annual rate of harvesting animals will affect the yearly size of the herd.

Solution  To begin, let x1(k) and x2(k) be the state variables that measure the size of the herd in the kth year of operation, where

x1(k) = number of animals less than one year old at year k
x2(k) = number of animals more than one year old at year k.

We assume that animals less than one year old do not reproduce, and that animals more than one year old have a reproduction rate of b per year. Thus if the herd has x2(k) mature animals at year k, we expect to have x1(k + 1) young animals at year k + 1, where

x1(k + 1) = bx2(k).

Next we assume that the young animals have a death rate of d1 per year, and the mature animals have a death rate of d2 per year. Furthermore, we assume that the mature animals are harvested at a rate of h per year and that young animals are not harvested. Thus we expect to have x2(k + 1) mature animals at year k + 1, where

x2(k + 1) = x1(k) + x2(k) − d1x1(k) − d2x2(k) − hx2(k).

This equation reflects the following facts: An animal that is young at year k will mature by year k + 1; an animal that is mature at year k is still mature at year k + 1; a certain percentage of young and mature animals will die during the year; and a certain percentage of mature animals will be harvested during the year. Collecting like terms in the second equation and combining the two equations, we obtain the state equations for the herd:

x1(k + 1) = bx2(k)
x2(k + 1) = (1 − d1)x1(k) + (1 − d2 − h)x2(k).     (3)

The state equations give the size and composition of the herd at year k + 1 in terms of the size and composition of the herd at year k. For example, if we know the initial composition of the herd at year zero, x1(0) and x2(0), we can use (3) to calculate the composition of the herd after one year, x1(1) and x2(1).

In matrix form, (3) becomes

x(k) = Ax(k − 1), k = 1, 2, 3, . . . ,

where

x(k) = [x1(k), x2(k)]^T and A = [0 b; (1 − d1) (1 − d2 − h)].

In the context of this example, the growth and composition of the herd are governed by the eigenvalues of A, and these can be controlled by varying the parameter h.

Solving Difference Equations

Consider the difference equation

xk = Axk−1,     (4)

where A is an (n × n) matrix. The key to finding a useful form for solutions of Eq. (4) is to observe that the sequence {xk} can be calculated by multiplying powers of A by the starting vector x0. That is,

x1 = Ax0
x2 = Ax1 = A(Ax0) = A^2x0
x3 = Ax2 = A(A^2x0) = A^3x0
x4 = Ax3 = A(A^3x0) = A^4x0,

and, in general,

xk = A^k x0, k = 1, 2, . . . .     (5)

Next, let A have eigenvalues λ1, λ2, . . . , λn and corresponding eigenvectors u1, u2, . . . , un. We now make a critical assumption: Let us suppose that matrix A is not defective. That is, let us suppose that the set of eigenvectors {u1, u2, . . . , un} is linearly independent.


With the assumption that A is not defective, we can use the set of eigenvectors as a basis for R^n. In particular, any starting vector x0 can be expressed as a linear combination of the eigenvectors:

x0 = a1u1 + a2u2 + · · · + anun.

Then, using Eq. (5), we can obtain the following expression for xk:

xk = A^k x0
   = A^k(a1u1 + a2u2 + · · · + anun)
   = a1A^k u1 + a2A^k u2 + · · · + anA^k un
   = a1(λ1)^k u1 + a2(λ2)^k u2 + · · · + an(λn)^k un.     (6)

(This last equality comes from Theorem 11 of Section 4.4: If Au = λu, then A^ku = λ^ku.)

Note that if A does not have a set of n linearly independent eigenvectors, then the expression for xk in Eq. (6) must be modified. The modification depends on the idea of a generalized eigenvector. It can be shown (see Section 7.8) that we can always choose a basis for R^n consisting of eigenvectors and generalized eigenvectors of A.

Example 3  Use Eq. (6) to find an expression for xk, where xk is the kth term of the sequence in Example 1. Use your expression to calculate xk for k = 10 and k = 20. Determine whether the sequence {xk} converges.

Solution  The sequence {xk} in Example 1 is generated by xk = Axk−1, k = 1, 2, . . . , where

A = [.8 .2; .2 .8] and x0 = [1, 2]^T.

Now the characteristic polynomial for A is

p(t) = t^2 − 1.6t + 0.6 = (t − 1)(t − 0.6).

Therefore, the eigenvalues of A are λ1 = 1 and λ2 = 0.6. Corresponding eigenvectors are

u1 = [1, 1]^T and u2 = [1, −1]^T.

The starting vector x0 can be expressed in terms of the eigenvectors as x0 = 1.5u1 − 0.5u2:

x0 = [1, 2]^T = 1.5[1, 1]^T − 0.5[1, −1]^T.

Therefore, the terms of the sequence {xk} are given by

xk = A^k x0 = A^k(1.5u1 − 0.5u2)
   = 1.5A^k u1 − 0.5A^k u2
   = 1.5(1)^k u1 − 0.5(0.6)^k u2
   = 1.5u1 − 0.5(0.6)^k u2.


In detail, the components of xk are

xk = [1.5 − 0.5(0.6)^k, 1.5 + 0.5(0.6)^k]^T, k = 0, 1, 2, . . . .     (7)

For k = 10 and k = 20, we calculate xk from Eq. (7), finding

x10 = [1.496976. . . , 1.503023. . .]^T and x20 = [1.499981. . . , 1.500018. . .]^T.

Finally, since lim k→∞ (0.6)^k = 0, we see from Eq. (7) that

lim k→∞ xk = x∗ = [1.5, 1.5]^T.

Types of Solutions to Difference Equations

If we reflect about the results of Example 3, the following observations emerge: Suppose a sequence {xk} is generated by xk = Axk−1, k = 1, 2, . . . , where A is the (2 × 2) matrix

A = [.8 .2; .2 .8].

Then, no matter what starting vector x0 is selected, the sequence {xk} will either converge to the zero vector, or the sequence will converge to a multiple of u1 = [1, 1]^T.

To verify this observation, let x0 be any given initial vector. We can express x0 in terms of the eigenvectors:

x0 = a1u1 + a2u2.

Since the eigenvalues of A are λ1 = 1 and λ2 = 0.6, the vector xk is given by

xk = A^k x0 = a1(1)^k u1 + a2(0.6)^k u2 = a1u1 + a2(0.6)^k u2.

Given this expression for xk, there are only two possibilities:

1. If a1 ≠ 0, then lim k→∞ xk = a1u1.
2. If a1 = 0, then lim k→∞ xk = θ.

In general, an analogous description can be given for the possible solutions of any difference equation. Specifically, let A be a nondefective (n × n) matrix with eigenvalues λ1, λ2, . . . , λn. For convenience, let us assume the eigenvalues are indexed according to their magnitude, where

|λ1| ≥ |λ2| ≥ · · · ≥ |λn|.

Let x0 be any initial vector, and consider the sequence {xk}, where xk = Axk−1, k = 1, 2, . . . . Finally, suppose x0 is expressed as

x0 = a1u1 + a2u2 + · · · + anun,

where a1 ≠ 0.


From Eq. (6), we have the following possibilities for the sequence {xk}:

1. If |λ1| < 1, then lim k→∞ xk = θ.
2. If |λ1| = 1, then there is a constant M > 0 such that ‖xk‖ ≤ M for all k.
3. If λ1 = 1 and |λ2| < 1, then lim k→∞ xk = a1u1.
4. If |λ1| > 1, then lim k→∞ ‖xk‖ = ∞.

Other possibilities exist that are not listed. For example, if λ1 = 1, λ2 = 1, and |λ3| < 1, then {xk} → a1u1 + a2u2.

Also, in listing the possibilities we assumed that A was not defective and that a1 ≠ 0. If a1 = 0 but a2 ≠ 0, it should be clear that a similar list can be made by using λ2 in place of λ1. If matrix A is defective, it can be shown (see Section 7.8) that the list above is still valid, with the following exception (see Exercise 19 for an example): If |λ1| = 1 and if the geometric multiplicity of λ1 is less than the algebraic multiplicity, then it will usually be the case that ‖xk‖ → ∞ as k → ∞.

Example 4  For the herd model described in Example 2, let the parameters be given by b = 0.9, d1 = 0.1, and d2 = 0.2. Thus xk = Axk−1, where

A = [0 .9; .9 .8 − h].

Determine a harvest rate h so that the herd neither dies out nor grows without bound.

Solution  For any given harvest rate h, the matrix A will have eigenvalues λ1 and λ2, where |λ1| ≥ |λ2|. If |λ1| < 1, then {xk} → θ, and the herd is dying out. If |λ1| > 1, then {‖xk‖} → ∞, which indicates that the herd is increasing without bound.

Therefore, we want to select h so that λ1 = 1. For any given h, λ1 and λ2 are roots of the characteristic equation p(t) = 0, where

p(t) = det(A − tI) = t^2 − (.8 − h)t − .81.

To have λ1 = 1, we need p(1) = 0, or

(1)^2 − (.8 − h)(1) − .81 = 0, or h − .61 = 0.

Thus a harvest rate of h = 0.61 will lead to λ1 = 1 and λ2 = −0.81.

Note that to the extent the herd model in Examples 2 and 4 is valid, a harvest rate of less than 0.61 will cause the herd to grow, whereas a rate greater than 0.61 will cause the herd to decrease. A harvest rate of 0.61 will cause the herd to approach a steady-state distribution of 9 young animals for every 10 mature animals. That is, for any initial vector x0 = a1u1 + a2u2, we have (with h = 0.61)

xk = a1u1 + a2(−0.81)^k u2,

where the eigenvectors u1 and u2 are given by

u1 = [9, 10]^T and u2 = [10, −9]^T.


Markov Chains

A special type of difference equation arises in the study of Markov chains or Markov processes. We cannot go into the interesting theory of Markov chains, but we will give an example that illustrates some of the ideas.

Example 5  An automobile rental company has three locations, which we designate as P, Q, and R. When an automobile is rented at one of the locations, it may be returned to any of the three locations.

Suppose, at some specific time, that there are p cars at location P, q cars at Q, and r cars at R. Experience has shown, in any given week, that the p cars at location P are distributed as follows: 10% are rented and returned to Q, 30% are rented and returned to R, and 60% remain at P (these either are not rented or are rented and returned to P). Similar rental histories are known for locations Q and R, as summarized below.

Weekly Distribution History
Location P: 60% stay at P, 10% go to Q, 30% go to R.
Location Q: 10% go to P, 80% stay at Q, 10% go to R.
Location R: 10% go to P, 20% go to Q, 70% stay at R.

Solution  Let xk represent the state of the rental fleet at the beginning of week k:

xk = [p(k), q(k), r(k)]^T.

For the state vector xk, p(k) denotes the number of cars at location P, q(k) the number at Q, and r(k) the number at R.

From the weekly distribution history, we see that

p(k + 1) = .6p(k) + .1q(k) + .1r(k)
q(k + 1) = .1p(k) + .8q(k) + .2r(k)
r(k + 1) = .3p(k) + .1q(k) + .7r(k).

(For instance, the number of cars at P when week k + 1 begins is determined by the 60% that remain at P, the 10% that arrive from Q, and the 10% that arrive from R.)

To the extent that the weekly distribution percentages do not change, the rental fleet is rearranged among locations P, Q, and R according to the rule xk+1 = Axk, k = 0, 1, . . . , where A is the (3 × 3) matrix

A = [.6 .1 .1; .1 .8 .2; .3 .1 .7].

Example 5 represents a situation in which a fixed population (the rental fleet) is rearranged in stages (week by week) among a fixed number of categories (the locations P, Q, and R). Moreover, in Example 5 the rules governing the rearrangement remain


fixed from stage to stage (the weekly distribution percentages stay constant). In general, such problems can be modeled by a difference equation of the form

xk+1 = Axk, k = 0, 1, . . . .

For such problems the matrix A is often called a transition matrix. Such a matrix has two special properties:

The entries of A are all nonnegative.     (8a)

In each column of A, the sum of the entries has the value 1.     (8b)

It turns out that a matrix having properties (8a) and (8b) always has an eigenvalue of λ = 1. This fact is established in Exercise 26 and illustrated in the next example.

Example 6  Suppose the automobile rental company described in Example 5 has a fleet of 600 cars. Initially an equal number of cars is based at each location, so that p(0) = 200, q(0) = 200, and r(0) = 200. As in Example 5, let the week-by-week distribution of cars be governed by xk+1 = Axk, k = 0, 1, . . . , where

xk = [p(k), q(k), r(k)]^T, A = [.6 .1 .1; .1 .8 .2; .3 .1 .7], and x0 = [200, 200, 200]^T.

Find lim k→∞ xk. Determine the number of cars at each location in the kth week, for k = 1, 5, and 10.

Solution  If A is not defective, we can use Eq. (6) to express xk as

xk = a1(λ1)^k u1 + a2(λ2)^k u2 + a3(λ3)^k u3,

where {u1, u2, u3} is a basis for R^3 consisting of eigenvectors of A. It can be shown that A has eigenvalues λ1 = 1, λ2 = .6, and λ3 = .5. Thus A has three linearly independent eigenvectors:

λ1 = 1, u1 = [4, 9, 7]^T;  λ2 = .6, u2 = [0, 1, −1]^T;  λ3 = .5, u3 = [−1, −1, 2]^T.

The initial vector, x0 = [200, 200, 200]^T, can be written as

x0 = 30u1 − 150u2 − 80u3.

Thus the vector xk = [p(k), q(k), r(k)]^T is given by

xk = A^k x0
   = A^k(30u1 − 150u2 − 80u3)
   = 30(λ1)^k u1 − 150(λ2)^k u2 − 80(λ3)^k u3
   = 30u1 − 150(.6)^k u2 − 80(.5)^k u3.     (9)


From the expression above, we see that

lim k→∞ xk = 30u1 = [120, 270, 210]^T.

Therefore, as the weeks proceed, the rental fleet will tend to an equilibrium state with 120 cars at P, 270 cars at Q, and 210 cars at R. To the extent that the model is valid, location Q will require the largest facility for maintenance, parking, and the like.

Finally, using Eq. (9), we can calculate the state of the fleet for the kth week:

x1 = [160, 220, 220]^T, x5 = [122.500, 260.836, 216.664]^T, and x10 = [120.078, 269.171, 210.751]^T.

Note that the components of x10 are rounded to three places. Of course, for an actual fleet the state vectors xk must have only integer components. The fact that the sequence defined in Eq. (9) need not have integer components represents a limitation of the assumed distribution model.
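The equilibrium state can be confirmed without computing the full eigenvector expansion: extract an eigenvector for λ = 1 and scale it to the fleet size. A minimal MATLAB sketch (ours; the search for the eigenvalue nearest 1 is a convenience, not part of the text):

% Sketch: steady state of the rental fleet from the lambda = 1 eigenvector.
A = [.6 .1 .1; .1 .8 .2; .3 .1 .7];
[V, D] = eig(A);
[~, j] = min(abs(diag(D) - 1));   % locate the eigenvalue closest to 1
v = V(:, j);
steady = 600 * v / sum(v)         % scale so the entries total 600 cars
% expect [120; 270; 210], in agreement with 30*u1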

Systems of Differential Equations

Difference equations are useful for describing the state of a physical system at discrete values of time. Mathematical models that describe the evolution of a physical system for all values of time are frequently expressed in terms of a differential equation or a system of differential equations. A simple example of a system of differential equations is

v′(t) = av(t) + bw(t)
w′(t) = cv(t) + dw(t).     (10)

In Eq. (10), the problem is to find functions v(t) and w(t) that simultaneously satisfy these equations and in which initial conditions v(0) and w(0) may also be specified. We can express Eq. (10) in matrix terms if we let

x(t) = [v(t), w(t)]^T.

Then Eq. (10) can be written as x′(t) = Ax(t), where

x′(t) = [v′(t), w′(t)]^T and A = [a b; c d].

The equation x′(t) = Ax(t) is reminiscent of the simple scalar differential equation y′(t) = αy(t), which is frequently used in calculus to model problems such as radioactive decay or bacterial growth. To find a function y(t) that satisfies the identity y′(t) = αy(t), we rewrite the equation as y′(t)/y(t) = α. Integrating both sides with respect to t yields ln |y(t)| = αt + β, or equivalently y(t) = y0e^{αt}, where y0 = y(0).

Using the scalar equation as a guide, we assume the vector equation x′(t) = Ax(t) has a solution of the form

x(t) = e^{λt}u,     (11)


where u is a constant vector. To see if the function x(t) in Eq. (11) can be a solution, we differentiate and get x′(t) = λe^{λt}u. On the other hand, Ax(t) = e^{λt}Au; so Eq. (11) will be a solution of x′(t) = Ax(t) if and only if

e^{λt}(A − λI)u = θ.     (12)

Now e^{λt} ≠ 0 for all values of t; so Eq. (12) will be satisfied only if (A − λI)u = θ. Therefore, if λ is an eigenvalue of A and u is a corresponding eigenvector, then x(t) given in Eq. (11) is a solution to x′(t) = Ax(t). (Note: The choice u = θ will also give a solution, but it is a trivial solution.)

If the (2 × 2) matrix A has eigenvalues λ1 and λ2 with corresponding eigenvectors u1 and u2, then two solutions of x′(t) = Ax(t) are x1(t) = e^{λ1t}u1 and x2(t) = e^{λ2t}u2. It is easy to verify that any linear combination of x1(t) and x2(t) is also a solution; so

x(t) = a1x1(t) + a2x2(t)     (13)

will solve x′(t) = Ax(t) for any choice of scalars a1 and a2. Finally, the initial-value problem consists of finding a solution to x′(t) = Ax(t) that satisfies an initial condition, x(0) = x0, where x0 is some specified vector. Given the form of x1(t) and x2(t), it is clear from Eq. (13) that x(0) = a1u1 + a2u2. If the eigenvectors u1 and u2 are linearly independent, we can always choose scalars b1 and b2 so that x0 = b1u1 + b2u2; and therefore x(t) = b1x1(t) + b2x2(t) is the solution of x′(t) = Ax(t), x(0) = x0.

Example 7  Solve the initial-value problem

v′(t) = v(t) − 2w(t), v(0) = 4
w′(t) = v(t) + 4w(t), w(0) = −3.

Solution  In vector form, the given equation can be expressed as x′(t) = Ax(t), x(0) = x0, where

x(t) = [v(t), w(t)]^T, A = [1 −2; 1 4], and x0 = [4, −3]^T.

The eigenvalues of A are λ1 = 2 and λ2 = 3, with corresponding eigenvectors

u1 = [2, −1]^T and u2 = [1, −1]^T.

As before, x1(t) = e^{2t}u1 and x2(t) = e^{3t}u2 are solutions of x′(t) = Ax(t), as is any linear combination, x(t) = b1x1(t) + b2x2(t). We now need only choose appropriate constants b1 and b2 so that x(0) = x0, where we know x(0) = b1u1 + b2u2. For x0 as given, it is routine to find x0 = u1 + 2u2. Thus the solution of x′(t) = Ax(t), x(0) = x0 is x(t) = x1(t) + 2x2(t), or

x(t) = e^{2t}u1 + 2e^{3t}u2.

In terms of the functions v and w, we have

x(t) = [v(t), w(t)]^T = e^{2t}[2, −1]^T + 2e^{3t}[1, −1]^T = [2e^{2t} + 2e^{3t}, −e^{2t} − 2e^{3t}]^T.


In general, given the problem of solving

x′(t) = Ax(t), x(0) = x0,     (14)

where A is an (n × n) matrix, we can proceed just as above. We first find the eigenvalues λ1, λ2, . . . , λn of A and corresponding eigenvectors u1, u2, . . . , un. For each i, xi(t) = e^{λit}ui is a solution of x′(t) = Ax(t), as is the general expression

x(t) = b1x1(t) + b2x2(t) + · · · + bnxn(t).     (15)

As before, x(0) = b1u1 + b2u2 + · · · + bnun; so if x0 can be expressed as a linear combination of u1, u2, . . . , un, then we can construct a solution to Eq. (14) in the form of Eq. (15). If the eigenvectors of A do not form a basis for R^n, we can still get a solution of the form of Eq. (15), but a more detailed analysis is required. See Example 4, Section 7.8.
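For a nondefective A, the coefficients b1, b2, . . . , bn are simply the solution of the linear system [u1, u2, . . . , un]b = x0, so the whole procedure is a few lines of MATLAB. The sketch below (ours) solves the initial-value problem of Example 7 and cross-checks the result with the matrix exponential; the evaluation time t = 1 is our choice:

% Sketch: solve x' = Ax, x(0) = x0 by the eigenvector expansion (15).
A  = [1 -2; 1 4];
x0 = [4; -3];
[U, D] = eig(A);
b = U \ x0;                        % expansion coefficients of x0
t = 1;
x_t = U * (exp(diag(D)*t) .* b)    % sum of b_i * e^(lambda_i * t) * u_i
expm(A*t) * x0                     % independent check via the matrix exponential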

4.8 EXERCISES

In Exercises 1–6, consider the vector sequence {xk}, where xk = Axk−1, k = 1, 2, . . . . For the given starting vector x0, calculate x1, x2, x3, and x4 by using direct multiplication, as in Example 1.

1. A = [0 1; 1 0], x0 = [2, 4]^T
2. A = [.5 .5; .5 .5], x0 = [16, 8]^T
3. A = [.5 .25; .5 .75], x0 = [128, 64]^T
4. A = [2 −1; −1 2], x0 = [3, 1]^T
5. A = [1 4; 1 1], x0 = [−1, 2]^T
6. A = [3 1; 4 3], x0 = [2, 0]^T

In Exercises 7–14, let xk = Axk−1, k = 1, 2, . . . , for the given A and x0. Find an expression for xk by using Eq. (6), as in Example 3. With a calculator, compute x4 and x10 from the expression. Comment on lim k→∞ xk.

7. A and x0 in Exercise 1
8. A and x0 in Exercise 2
9. A and x0 in Exercise 3
10. A and x0 in Exercise 4
11. A and x0 in Exercise 5
12. A and x0 in Exercise 6
13. A = [3 −1 −1; −12 0 5; 4 −2 −1], x0 = [3, −14, 8]^T
14. A = [−6 1 3; −3 0 2; −20 2 10], x0 = [1, 1, −1]^T

In Exercises 15–18, solve the initial-value problem.

15. u′(t) = 5u(t) − 6v(t), u(0) = 4
    v′(t) = 3u(t) − 4v(t), v(0) = 1

16. u′(t) = u(t) + 2v(t), u(0) = 1
    v′(t) = 2u(t) + v(t), v(0) = 5

17. u′(t) = u(t) + v(t) + w(t), u(0) = 3
    v′(t) = 3v(t) + 3w(t), v(0) = 3
    w′(t) = −2u(t) + v(t) + w(t), w(0) = 1

18. u′(t) = −2u(t) + 2v(t) − 3w(t), u(0) = 3
    v′(t) = 2u(t) + v(t) − 6w(t), v(0) = −1
    w′(t) = −u(t) − 2v(t), w(0) = 3

19. Consider the matrix A given by

A = [1 2; 0 1].

Note that λ = 1 is the only eigenvalue of A.
a) Verify that A is defective.
b) Consider the sequence {xk} determined by xk = Axk−1, k = 1, 2, . . . , where x0 = [1, 1]^T. Use induction to show that


xk = [2k + 1, 1]T . (This exercise gives anexample of a sequence xk = Axk−1, wherelimk→∞ ‖xk‖ = ∞, even though A has noeigenvalue larger than 1 in magnitude.)

In Exercises 20 and 21, choose a value α so that thematrix A has an eignevalue of λ = 1. Then, forx0 = [1, 1]T , calculate limk→∞ xk, where xk = Axk−1,

k = 1, 2, . . . .

20. A =[.5 .5.5 1+ α

]

21. A =[

0 .3.6 1+ α

]

22. Suppose that {uk} and {vk} are sequences satisfy-ing uk = Auk−1, k = 1, 2, . . . , and vk = Avk−1,

k = 1, 2, . . . . Show that if u0 = v0, then ui = vifor all i.

23. Let B = (bij ) be an (n × n) matrix. Matrix B iscalled a stochastic matrix if B contains only non-negative entries and if bi1 + bi2 + · · · + bin = 1,1 ≤ i ≤ n. (That is, B is a stochastic matrix if BTsatisfies conditions 8a and 8b.) Show that λ = 1is an eigenvalue of B. [Hint: Consider the vectorw = [1, 1, . . . , 1]T .]

24. Suppose that B is a stochastic matrix whose entriesare all positive. By Exercise 23, λ = 1 is an eigen-value of B. Show that if Bu = u, u �= θ , then uis a multiple of the vector w defined in Exercise 23.

[Hint: Define v = αu so that vi = 1 and |vj | ≤ 1,1 ≤ j ≤ n. Consider the ith equations in Bw = wand Bv = v.]

25. Let B be a stochastic matrix, and let λ be any eigenvalue of B. Show that |λ| ≤ 1. For simplicity, assume that λ is real. [Hint: Suppose that Bu = λu, u ≠ θ. Define a vector v as in Exercise 24.]

26. Let A be an (n × n) matrix satisfying conditions (8a) and (8b). Show that λ = 1 is an eigenvalue of A and that if Au = βu, u ≠ θ, then |β| ≤ 1. [Hint: Matrix A^T is stochastic.]

27. Suppose that (A − λI)u = θ, u ≠ θ, and there is a vector v such that (A − λI)v = u. Then v is called a generalized eigenvector. Show that {u, v} is a linearly independent set. [Hint: Note that Av = λv + u. Suppose that au + bv = θ, and multiply this equation by A.]

28. Let A, u, and v be as in Exercise 27. Show that A^k v = λ^k v + kλ^(k−1) u, k = 1, 2, . . . .

29. Consider matrix A in Exercise 19.
a) Find an eigenvector u and a generalized eigenvector v for A.
b) Express x0 = [1, 1]T as x0 = au + bv.
c) Using the result of Exercise 28, find an expression for A^k x0 = A^k(au + bv).
d) Verify that A^k x0 = [2k + 1, 1]T, as was shown by other means in Exercise 19.

SUPPLEMENTARY EXERCISES

1. Find all values x such that A is singular, where

A = [x 1 2; 3 x 0; 0 −1 1].

2. For what values x does A have only real eigenvalues, where

A = [2 1; x 3]?

3. Let

A = [a b; c d],

where a + b = 2 and c + d = 2. Show that λ = 2 is an eigenvalue for A. [Hint: Guess an eigenvector.]

4. Let A and B be (3 × 3) matrices such that det(A) = 2 and det(B) = 9. Find the values of each of the following.
a) det(A^(−1)B^2)
b) det(3A)
c) det(AB^2A^(−1))


5. For what values x is A defective, where

A = [2 x; 0 2].

In Exercises 6–9, A is a (2 × 2) matrix such that A^2 + 3A − I = O.

6. Suppose we know that Au = [2, 1]T, where u = [1, 3]T. Find A^2u and A^3u.

7. Show that A is nonsingular. [Hint: Is there a nonzero vector x such that Ax = θ?]

8. Find A^(−1)u, where u is as in Exercise 6.

9. Using the fact that A^2 = I − 3A, we can find scalars ak and bk such that A^k = akA + bkI. Find these scalars for k = 2, 3, 4, and 5.

In Exercises 10 and 11, find the eigenvalues λi given the corresponding eigenvector ui. Do not calculate the characteristic polynomial for A.

10. A = [2 −12; 1 −5], u1 = [4, 1]T, u2 = [3, 1]T

11. A = [1 2; −1 4], u1 = [2, 1]T, u2 = [1, 1]T

12. Find x so that u is an eigenvector. What is the corresponding eigenvalue λ?

A = [2 x; 1 −5], u = [1, −1]T

13. Find x and y so that u is an eigenvector corresponding to the eigenvalue λ = 1:

A = [x, y; 2x, −y], u = [−1, 1]T

14. Find x and y so that u is an eigenvector corresponding to the eigenvalue λ = 4:

A = [x + y, y; x − 3, 1], u = [−3, 1]T

CONCEPTUAL EXERCISES

In Exercises 1–8, answer true or false. Justify your answer by providing a counterexample if the statement is false or an outline of a proof if the statement is true. In each exercise, A is a real (n × n) matrix.

1. If A is nonsingular with A^(−1) = A^T, then det(A) = 1.

2. If x is an eigenvector for A, where A is nonsingular, then x is also an eigenvector for A^(−1).

3. If A is nonsingular, then det(A^4) is positive.

4. If A is defective, then A is singular.

5. If A is an orthogonal matrix and if x is in R^n, then ‖Ax‖ = ‖x‖.

6. If S is (n × n) and nonsingular, then A and S^(−1)AS have the same eigenvalues.

7. If A and B are diagonal (n × n) matrices, then det(A + B) = det(A) + det(B).

8. If A is singular, then A is defective.

In Exercises 9–14, give a brief answer.

9. Suppose that A and Q are (n × n) matrices where Q is orthogonal. Then we know that A and B = Q^T AQ have the same eigenvalues.
a) If x is an eigenvector of B corresponding to λ, give an eigenvector of A corresponding to λ.


b) If u is an eigenvector of A corresponding to λ, give an eigenvector of B corresponding to λ.

10. Suppose that A is (n × n) and A^3 = O. Show that 0 is the only eigenvalue of A.

11. Show that if A is (n × n) and is similar to the (n × n) identity I, then A = I.

12. Let A and B be (n × n) with A nonsingular. Show that AB and BA are similar. [Hint: Consider S^(−1)(AB)S = BA.]

13. Suppose that A and B are (n × n) and A is similar to B. Show that A^k is similar to B^k for k = 2, 3, and 4.

14. Let u be a vector in R^n such that u^T u = 1, and let A denote the (n × n) matrix A = I − 2uu^T.
a) Is A symmetric?
b) Is A orthogonal?
c) Calculate Au.
d) Suppose that w is in R^n and u^T w = 0. What is Aw?
e) Give the eigenvalues of A and give the geometric multiplicity for each eigenvalue.

MATLAB EXERCISES

1. Recognizing eigenvectors geometrically Let x = [x1, x2]T and let y = [y1, y2]T be vectors in R^2. The following MATLAB command gives us a geometric representation of x and y:

plot([0, x(1)], [0, x(2)], [0, y(1)], [0, y(2)]). (1)

(In particular, the single command plot([0, x(1)], [0, x(2)]) draws a line from the origin (0, 0) to the point (x(1), x(2)); this line is a geometric representation of the vector x. The longer command (1) draws two lines, one representing x and the other y.)

a) Let A be the (2 × 2) matrix

A = [3 7; 1 3].

For each of the following vectors x, use the command (1) to plot x and y = Ax. Which of the vectors x is an eigenvector for A? How can you tell from the geometric representation drawn by MATLAB?

i) x = [0.3536, 0.9354]T
ii) x = [0.9354, 0.3536]T
iii) x = [−0.3536, 0.9354]T
iv) x = [−0.9354, 0.3536]T
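As a quick illustration, here is one way to issue these commands for vector i); the same lines, with x changed, handle ii)–iv). (This sketch is ours, not part of the exercise.)

A = [3 7; 1 3];
x = [0.3536; 0.9354];       % vector i); replace with ii)-iv) in turn
y = A*x;
plot([0, x(1)], [0, x(2)], [0, y(1)], [0, y(2)])
% When x is an eigenvector, the two segments lie along one line through
% the origin, since y = Ax is then a scalar multiple of x.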

b) Let λ be an eigenvalue for A with corresponding eigenvector x. Then:

(Ax)^T x / (x^T x) = (λx)^T x / (x^T x) = λ(x^T x) / (x^T x) = λ.

The expression (Au)^T u / (u^T u) is called a Rayleigh quotient. Therefore, the preceding formula says that if u is an eigenvector for A, then the value of the Rayleigh quotient is equal to an eigenvalue corresponding to u.


For each of the eigenvectors found in part a), use MATLAB to compute the Rayleigh quotient and hence determine the corresponding eigenvalue λ. Check your calculations by comparing Ax and λx.
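One possible session for this computation (a sketch, ours), using vector ii) from part a):

A = [3 7; 1 3];
x = [0.9354; 0.3536];        % vector ii) from part a)
lambda = (A*x)'*x / (x'*x)   % Rayleigh quotient (Ax)^T x / (x^T x)
[A*x, lambda*x]              % columns agree (to rounding) when x is an eigenvector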

c) Repeat parts a) and b) for the following matrix A and vectors given in i)–iv):

A = [1 3; 1 1]

and

i) x = [−0.8660, 0.5000]T
ii) x = [0.5000, 0.8660]T
iii) x = [−0.5000, 0.8660]T
iv) x = [−0.8660, 0.5000]T

2. Determinants of block matrices In the MATLAB exercises for Chapter 1, we discussed how block matrices could be multiplied by thinking of the blocks as numbers. In this exercise, we extend the ideas to include determinants of block matrices.

Consider a (2 × 2) block matrix A of the form

A = [A1 A2; A3 A4]. (2)

We would like to be able to say that det(A) = det(A1) det(A4) − det(A3) det(A2) and, in fact, sometimes we can; but sometimes we cannot.

a) Generate a random (6 × 6) matrix A and partition it into four (3 × 3) blocks in order to create a (2 × 2) block matrix of the form displayed in Eq. (2). Use the MATLAB determinant command to calculate det(A); as you might expect, the command is det(A). Use MATLAB to calculate the determinant of each block and compare your result with the value det(A1) det(A4) − det(A3) det(A2). Is the formula det(A) = det(A1) det(A4) − det(A3) det(A2) valid in general?

b) Now, let us do part a) again. This time, however, we will choose A3 to be the (3 × 3) zero matrix. That is, A is a block upper-triangular matrix. Verify for your randomly chosen matrix that the expected result holds:

If A is block upper triangular, then det(A) = det(A1) det(A4).
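One way to run both experiments (a sketch; any random matrix will do):

A  = rand(6);                        % random (6 x 6) matrix
A1 = A(1:3,1:3);  A2 = A(1:3,4:6);   % the four (3 x 3) blocks of Eq. (2)
A3 = A(4:6,1:3);  A4 = A(4:6,4:6);
[det(A), det(A1)*det(A4) - det(A3)*det(A2)]   % part a): generally unequal
A(4:6,1:3) = zeros(3);               % force A3 = O, so A is block upper triangular
[det(A), det(A(1:3,1:3))*det(A(4:6,4:6))]     % part b): now the two values agree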

c) The result in part b) suggests the following theorem:

The determinant of a block triangular matrix is equal to the product of the determinants of its diagonal blocks.

This theorem is indeed true, and it is valid for a matrix with any number of blocks, so long as the matrix is partitioned in such a way that the diagonal blocks are square. Illustrate this result by generating a random (12 × 12) matrix A and partitioning it as follows:

A = [A11 A12 A13; 0 A22 A23; 0 0 A33].


Your (3 × 3) block upper-triangular matrix A must have its diagonal blocks square, but there is no other requirement. For example, A11 could be (2 × 2), A22 could be (7 × 7), and A33 could be (3 × 3). The proof of the theorem stated above can be established using techniques discussed in Chapter 6.

3. Dominant eigenvalues An eigenvalue λ for a matrix A is called a dominant eigenvalue if |λ| > |β| for any other eigenvalue β of A. This exercise will illustrate how powers of A multiplying a starting vector x0 will line up along a dominant eigenvector. That is, given a starting vector x0, the following sequence of vectors will tend to become a multiple of an eigenvector associated with the dominant eigenvalue λ.

x1 = Ax0, x2 = Ax1, . . . , xk = Axk−1, . . . . (3)

The sequence of vectors defined above was discussed in Section 4.8 under the topics of difference equations and Markov chains. In that section we saw how the dominant eigenvalue/eigenvector pair determined the steady-state solution of the difference equation. The sequence of vectors was also introduced in Exercise 28 in Section 4.4. In that exercise, we saw the converse: estimates of the steady-state solution can be used to estimate a dominant eigenvalue and eigenvector (this procedure is the power method).

The point of this exercise is to illustrate numerically and graphically how the sequence (3) lines up in the direction of a dominant eigenvector. First, however, we want to recall why this sequence behaves in such a fashion.

As an example, suppose A is a (3 × 3) matrix with eigenvalues λ1, λ2, λ3 and eigenvectors u1, u2, u3. Further, suppose λ1 is a dominant eigenvalue. Now we know that the kth term in sequence (3) can be expressed as xk = A^k x0 (see Eqs. (4) and (5) in Section 4.8). Finally, let us suppose that x0 can be expressed as a linear combination of the eigenvectors:

x0 = c1u1 + c2u2 + c3u3.

Using the fact that xk = A^k x0, we see from the preceding representation for x0 that we have

xk = c1(λ1)^k u1 + c2(λ2)^k u2 + c3(λ3)^k u3

or

xk = c1 λ1^k [ u1 + (c2/c1)(λ2/λ1)^k u2 + (c3/c1)(λ3/λ1)^k u3 ]. (4)

Since λ1 is a dominant eigenvalue, the reason that xk lines up in the direction of the dominant eigenvector u1 is clear from formula (4) for xk.

We note that formula (4) can be used in two different ways. For a given starting vector x0, we can use (4) to estimate the steady-state vector xk at some future time tk (this use is discussed in Section 4.8). Conversely, given a matrix A, we can calculate the sequence (3) and use formula (4) to estimate the dominant eigenvalue (this use is the power method).

a) Let the matrix A and the starting vector x0 be as follows:

A = [3 −1 −1; −12 0 5; 4 −2 −1], x0 = [1, 1, 1]T.

Use MATLAB to generate x1, x2, . . . , x10. (You need not use subscripted vectors; you can simply repeat the following command ten times: x = A*x. This assignment


statement replaces x by Ax each time it is executed.) As you can see, the vectors xk are lining up in a certain direction. To conveniently identify that direction, divide each component of x10 by the first component of x10. Calculate the next three vectors in the sequence (the vectors x11, x12, and x13), normalizing each one as you did for x10. What is your guess as to a dominant eigenvector for A?

b) From formula (4) we see that xk+1 ≈ λ1 xk. Use this approximation and the results of part a) to estimate the dominant eigenvalue of A.

c) As you can see from part a), when we generate the sequence (3) we obtain vectors with larger and larger components when the dominant eigenvalue is larger than 1 in absolute value. To avoid vectors with large components, it is customary to normalize each vector in the sequence. Thus, rather than generating sequence (3), we instead generate the following sequence (5) of unit vectors:

x1 = Ax0/‖Ax0‖, x2 = Ax1/‖Ax1‖, . . . , xk = Axk−1/‖Axk−1‖, . . . (5)

A slight modification of formula (4) shows that the normalized sequence (5) also lines up along the dominant eigenvector. Repeat part a) using the normalized sequence (5) and observe that you find the same dominant eigenvector. Try several different starting vectors, such as x0 = [1, 2, 3]T.
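For reference, here is a compact loop version of parts a)–c) (a sketch of one possible session, not required by the exercise):

A = [3 -1 -1; -12 0 5; 4 -2 -1];
x = [1; 1; 1];               % starting vector x0
for k = 1:13
    x = A*x / norm(A*x);     % one step of the normalized sequence (5)
end
x / x(1)                     % scale so the first component is 1
(A*x)'*x / (x'*x)            % Rayleigh quotient: estimates the dominant eigenvalue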

d) This exercise illustrates graphically the ideas in parts a)–c). Consider the matrix A and starting vector x0 given by

A = [2.8 −1.6; −1.6 5.2], x0 = [1/√2, 1/√2]T.

Use MATLAB to calculate the sequence of vectors defined by (5). In order to give a geometric representation of each term in the sequence, we can use the following MATLAB commands:

x = [1,1]'
x = x/norm(x)
plot([0, x(1)], [0, x(2)])
hold
x = A*x/norm(A*x)
plot([0, x(1)], [0, x(2)])
x = A*x/norm(A*x)
plot([0, x(1)], [0, x(2)])
etc.

Continue until the sequences appear to be stabilizing.

e) Exercise 28 in Section 4.4 describes the power method, which is based on the ideas discussed so far in this exercise. Exercise 28 gives an easy way (based on Rayleigh quotients) to estimate the dominant eigenvalue that corresponds to the dominant eigenvectors generated by the sequence (5); see the definition of βk in part c) of Exercise 28. Use this idea to estimate the dominant eigenvalue for the matrix A in part d) of this MATLAB exercise.


5 Vector Spaces and Linear Transformations

Overview In Chapter 3 we saw, by using an algebraic perspective, that we could extend geometric vector concepts to R^n. In this chapter, using R^n as a model, we further extend the idea of a vector to include objects such as matrices, polynomials, functions, infinite sequences, and so forth.

As we will see in this chapter, concepts introduced in Chapter 3 (such as subspace, basis, and dimension) have natural extensions to the general vector space setting. In addition, applications treated in Chapter 3 (such as least squares fits to data) also have extensions to the general vector space setting.

Core Sections
5.2 Vector Spaces
5.3 Subspaces
5.4 Linear Independence, Bases, and Coordinates
5.5 Dimension
5.7 Linear Transformations
5.8 Operations with Linear Transformations
5.9 Matrix Representations for Linear Transformations
5.10 Change of Basis and Diagonalization


5.1 INTRODUCTION

Chapter 3 illustrated that by passing from a purely geometric view of vectors to an algebraic perspective we could, in a natural way, extend the concept of a vector to include elements of R^n. Using R^n as a model, this chapter extends the notion of a vector even further to include objects such as matrices, polynomials, functions continuous on a given interval, and solutions to certain differential equations. Most of the elementary concepts (such as subspace, basis, and dimension) that are important to understanding vector spaces are immediate generalizations of the same concepts in R^n.

Linear transformations were also introduced in Chapter 3, and we showed in Section 3.7 that a linear transformation, T, from R^n to R^m is always defined by matrix multiplication; that is,

T(x) = Ax (1)

for some (m × n) matrix A. In Sections 5.7–5.10, we will consider linear transformations on arbitrary vector spaces, thus extending the theory of mappings defined as in Eq. (1) to a more general setting. For example, differentiation and integration can be viewed as linear transformations.

Although the theory of vector spaces is relatively abstract, the vector-space structure provides a unifying framework of great flexibility, and many important practical problems fit naturally into a vector-space framework. As examples, the set of all solutions to a differential equation such as

a(x)y′′ + b(x)y′ + c(x)y = 0

can be shown to be a two-dimensional vector space. Thus if two linearly independent solutions are known, then all the solutions are determined. The previously defined notion of dot product can be extended to more general vector spaces and used to define the distance between two vectors. This notion is essential when one wishes to approximate one object with another (for example, to approximate a function with a polynomial). Linear transformations permit a natural extension of the important concepts of eigenvalues and eigenvectors to arbitrary vector spaces.

A basic feature of vector spaces is that they possess both an algebraic character and a geometric character. In this regard the geometric character frequently gives a pictorial insight into how a particular problem can be solved, whereas the algebraic character is used actually to calculate a solution.

As an example of how we can use this dual geometric/algebraic character of vector spaces, consider the following. In 1811 and 1822, Fourier, in his Mathematical Theory of Heat, made extremely important discoveries by using trigonometric series of the form

s(x) = ∑_{k=0}^{∞} (ak cos kx + bk sin kx)

to represent functions, f(x), −π ≤ x ≤ π. Today these representations can be visualized and utilized in a simple way using vector-space concepts.

For any positive integer n, let Sn represent the set of all trigonometric polynomials of degree at most n:

Sn = { sn(x): sn(x) = ∑_{k=0}^{n} (ak cos kx + bk sin kx), ak and bk real numbers }.


Now, if s∗n(x) is the best approximation in Sn to f(x), then we might hope that s(x) = lim_{n→∞} s∗n(x). A heuristic picture of this setting is shown in Fig. 5.1, where F[−π, π] denotes all functions defined on [−π, π].


Figure 5.1 Among all sn(x) in Sn, we are searching for s∗n(x), which best approximates f(x), −π ≤ x ≤ π.

In Fig. 5.2, we have a vector approximation problem that we already know how to work from calculus. Here Π is a plane through the origin, and we are searching for a point y∗ in Π that is closer to the given point b than any other point y in Π. Using b, y, and y∗ as the position vectors for the points b, y, and y∗, respectively, we know that y∗ is characterized by the fact that the remainder vector, b − y∗, is perpendicular to every position vector y in Π. That is, we can find y∗ by setting (b − y∗)^T u1 = 0 and (b − y∗)^T u2 = 0, where {u1, u2} is any basis for Π.


Figure 5.2 The vector y∗ in Π is closer to b than is any other vector y in Π if and only if b − y∗ is perpendicular to all y in Π.

Figure 5.3 gives another way of visualizing this problem. We see a striking similarity between Figs. 5.1 and 5.3. It gives us the inspiration to ask if we can find s∗n(x) in Fig. 5.1 by choosing its coefficients, ak and bk, so that the remainder function, f(x) − s∗n(x), is in some way "perpendicular" to every sn(x) in Sn.



Figure 5.3 An abstract representation of the problem of finding the closest vector, y∗, in a subspace Π to a vector b.

As we will show in Section 5.6, this is precisely the approach we use to compute s∗n(x). Thus the geometric character of the vector-space setting provides our intuition with a possible procedure for solution. We must then use the algebraic character to:

(a) argue that our intuition is valid, and
(b) perform the actual calculation of the coefficients ak and bk for s∗n(x).

5.2 VECTOR SPACES

We begin our study of vector spaces by recalling the basic properties of R^n. First recall that there are two algebraic operations in R^n:

1. Vectors in R^n can be added.
2. Any vector in R^n can be multiplied by a scalar.

Furthermore, these two operations satisfy the 10 basic properties given in Theorem 1 of Section 3.2. For example, if u and v are in R^n, then u + v is also in R^n. Moreover, u + v = v + u and (u + v) + w = u + (v + w) for all u, v, and w in R^n.

There are numerous sets other than R^n on which there are defined algebraic operations of addition and scalar multiplication. Moreover, in many cases these operations will satisfy the same 10 rules listed in Theorem 1 of Section 3.2. For example, we have already defined matrix addition and scalar multiplication on the set of all (m × n) matrices. Furthermore, it follows from Theorems 7, 8, and 9 of Section 1.6 that these operations satisfy the properties given in Theorem 1 of Section 3.2 (see Example 2 later in this section). Thus, with R^n as a model, we could just as easily study the set of all (m × n) matrices and derive most of the properties and concepts given in Chapter 3, but in the context of matrices. Rather than study each such set individually, however, it is more efficient to define a vector space in the abstract as any set of objects that has algebraic operations that satisfy a given list of basic properties. Using only these assumed properties, we can prove other properties and develop further concepts. The results obtained in this manner then apply to any specific vector space. For example, later in this chapter the term linearly independent will be applied to a set of matrices, a set of polynomials, or a set of continuous functions.


Drawing on this discussion, we see that a general vector space should consist of a set of elements (or vectors), V, and a set of scalars, S, together with two algebraic operations:

1. An addition, which is defined between any two elements of V and which produces a sum that is in V;
2. A scalar multiplication, which defines how to multiply any element of V by a scalar from S.

In practice the set V can consist of any collection of objects for which meaningful operations of addition and scalar multiplication can be defined. For example, V might be the set of all (2 × 3) matrices, the set R^4 of all four-dimensional vectors, a set of functions, a set of polynomials, or the set of all solutions to a linear homogeneous differential equation. We will take the set S of scalars to be the set of real numbers, although for added flexibility other sets of scalars may be used (for example, S could be the set of complex numbers). Throughout this chapter the term scalar will always denote a real number.

Using R^n as a model and the properties of R^n listed in Theorem 1 of Section 3.2 as a guide, we now define a general vector space. Note that the definition says nothing about the set V but rather specifies rules that the algebraic operations must satisfy.

Definition 1 A set of elements V is said to be a vector space over a scalar field S if an addition operation is defined between any two elements of V and a scalar multiplication operation is defined between any element of S and any vector in V. Moreover, if u, v, and w are vectors in V, and if a and b are any two scalars, then these 10 properties must hold.

Closure properties:
(c1) u + v is a vector in V.
(c2) av is a vector in V.

Properties of addition:
(a1) u + v = v + u.
(a2) u + (v + w) = (u + v) + w.
(a3) There is a vector θ in V such that v + θ = v for all v in V.
(a4) Given a vector v in V, there is a vector −v in V such that v + (−v) = θ.

Properties of scalar multiplication:
(m1) a(bv) = (ab)v.
(m2) a(u + v) = au + av.
(m3) (a + b)v = av + bv.
(m4) 1v = v for all v in V.

The first two conditions, (c1) and (c2), in Definition 1, called closure properties, ensure that the sum of any two vectors in V remains in V and that any scalar multiple


of a vector in V remains in V. In condition (a3), θ is naturally called the zero vector (or the additive identity). In (a4), the vector −v is called the additive inverse of v, and (a4) asserts that the equation v + x = θ has a solution in V. When the set of scalars S is the set of real numbers, V is called a real vector space; and as we have said, we will consider only real vector spaces.

Examples of Vector Spaces

We already have two familiar examples of vector spaces, namely, R^n and the set of all (m × n) matrices. It is easy to verify that these are vector spaces, and the verification is sketched in the next two examples.

Example 1 For any positive integer n, verify that R^n is a real vector space.

Solution Theorem 1 of Section 3.2 shows that R^n satisfies the properties listed in Definition 1, so R^n is a real vector space.

Example 2 may strike the reader as being a little unusual since we are considering matrices as elements in a vector space. The example, however, illustrates the flexibility of the vector-space concept; any set of entities that has addition and scalar multiplication operations can be a vector space, provided that addition and scalar multiplication satisfy the requirements of Definition 1.

Example 2 Verify that the set of all (2× 3) matrices with real entries is a real vector space.

Solution Let A and B be any (2 × 3) matrices, and let addition and scalar multiplication be defined as in Definitions 6 and 7 of Section 1.5. Therefore, A + B and aA are defined by

A + B = [a11 a12 a13; a21 a22 a23] + [b11 b12 b13; b21 b22 b23] = [a11 + b11, a12 + b12, a13 + b13; a21 + b21, a22 + b22, a23 + b23]

aA = a[a11 a12 a13; a21 a22 a23] = [aa11 aa12 aa13; aa21 aa22 aa23].

From these definitions it is obvious that both the sum A + B and the scalar multiple aA are again (2 × 3) matrices; so (c1) and (c2) of Definition 1 hold. Properties (a1), (a2), (a3), and (a4) follow from Theorem 7 of Section 1.6; and (m1), (m2), and (m3) are proved in Theorems 8 and 9 of Section 1.6. Property (m4) is immediate from the definition of scalar multiplication [clearly 1A = A for any (2 × 3) matrix A]. For emphasis we recall that the zero element in this vector space is the matrix

O = [0 0 0; 0 0 0],

and clearly A + O = A for any (2 × 3) matrix A. We further observe that (−1)A is the additive inverse for A because

A + (−1)A = O.


[That is, (−1)A is a matrix we can add to A to produce the zero element O.] A duplication of these arguments shows that for any m and n the set of all (m × n) matrices with real entries is a real vector space.

The next three examples show that certain sets of functions have a natural vector-space structure.

Example 3 Let P2 denote the set of all real polynomials of degree 2 or less. Verify that P2 is a real vector space.

Solution Note that a natural addition is associated with polynomials. For example, let p(x) and q(x) be the polynomials

p(x) = 2x^2 − x + 3 and q(x) = x^2 + 2x − 1.

Then the sum r(x) = p(x) + q(x) is the polynomial r(x) = 3x^2 + x + 2. Scalar multiplication is defined similarly; so if s(x) = 2q(x), then

s(x) = 2x^2 + 4x − 2.

Given this natural addition and scalar multiplication associated with the set P2, it seems reasonable to expect that P2 is a real vector space.

To establish this conclusion rigorously, we must be a bit more careful. To begin, we define P2 to be the set of all expressions (or functions) of the form

p(x) = a2x^2 + a1x + a0, (1)

where a2, a1, and a0 are any real constants. Thus the following polynomials are vectors in P2:

p1(x) = x^2 − x + 3, p2(x) = x^2 + 1, p3(x) = x − 2,
p4(x) = 2x, p5(x) = 7, p6(x) = 0.

For instance, we see that p2(x) has the form of Eq. (1), with a2 = 1, a1 = 0, and a0 = 1. Similarly, p4(x) is in P2 because p4(x) is a function of the form (1), where a2 = 0, a1 = 2, and a0 = 0. Finally, p6(x) has the form (1) with a2 = 0, a1 = 0, and a0 = 0. To define addition precisely, let

p(x) = a2x^2 + a1x + a0 and q(x) = b2x^2 + b1x + b0

be two vectors in P2. We define the sum r(x) = p(x) + q(x) to be the polynomial

r(x) = (a2 + b2)x^2 + (a1 + b1)x + (a0 + b0);

and we define the scalar multiple s(x) = cp(x) to be the polynomial

s(x) = (ca2)x^2 + (ca1)x + (ca0).

We leave it to the reader to verify that these algebraic operations meet the requirements of Definition 1; we note only that we choose the zero vector to be the polynomial that is identically zero. That is, the zero element in P2 is the polynomial θ(x), where θ(x) = 0; or in terms of Eq. (1), θ(x) is defined by

θ(x) = 0x^2 + 0x + 0.
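Because a vector in P2 is determined by its three coefficients, the algebra of Example 3 is easy to mimic in MATLAB by storing p(x) = a2x^2 + a1x + a0 as the row vector [a2 a1 a0] (the coefficient convention that polyval expects); the sketch below is our illustration, not part of the text.

p = [2 -1 3];        % p(x) = 2x^2 - x + 3
q = [1 2 -1];        % q(x) = x^2 + 2x - 1
r = p + q            % r(x) = 3x^2 + x + 2: addition is coefficientwise
s = 2*q              % s(x) = 2x^2 + 4x - 2: scalar multiplication
polyval(r, 1)        % evaluates r at x = 1, giving 3 + 1 + 2 = 6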


Example 4 In this example we take Pn to be the set of all real polynomials of degree n or less. That is, Pn consists of all functions p(x) of the form

p(x) = anx^n + an−1x^(n−1) + · · · + a2x^2 + a1x + a0,

where an, an−1, . . . , a2, a1, a0 are any real constants. With addition and scalar multiplication defined as in Example 3, it is easy to show that Pn is a real vector space.

The next example presents one of the more important vector spaces in applications.

Example 5 Let C[a, b] be the set of functions defined by

C[a, b] = {f(x): f(x) is a real-valued continuous function, a ≤ x ≤ b}.

Verify that C[a, b] is a real vector space.

Solution C[a, b] has a natural addition, just as Pn. If f and g are vectors in C[a, b], then we define the sum h = f + g to be the function h given by

h(x) = f(x) + g(x), a ≤ x ≤ b.

Similarly, if c is a scalar, then the scalar multiple q = cf is the function

q(x) = cf(x), a ≤ x ≤ b.

As a concrete example, if f(x) = e^x and g(x) = sin x, then 3f + g is the function r, where the action of r is defined by r(x) = 3e^x + sin x. Note that the closure properties, (c1) and (c2), follow from elementary results of calculus: sums and scalar multiples of continuous functions are again continuous functions. The remaining eight properties of Definition 1 are easily seen to hold in C[a, b]; the verification proceeds exactly as in Pn.
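In MATLAB, anonymous function handles give a convenient model of this arithmetic; the following sketch (ours, for illustration) builds the vector 3f + g from the concrete example above.

f = @(x) exp(x);
g = @(x) sin(x);
r = @(x) 3*f(x) + g(x);   % the vector 3f + g in C[a, b]
r(0)                      % r(0) = 3*exp(0) + sin(0) = 3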

Note that any polynomial can be regarded as a continuous function on any interval [a, b]. Thus for any given positive integer n, Pn is not only a subset of C[a, b] but also a vector space contained in the vector space C[a, b]. This concept of a vector space that contains a smaller vector space (or a vector subspace) is quite important and is one topic of the next section.

FUNCTION SPACES The giant step of expanding vector spaces from R^n to spaces of functions was a combined effort of many mathematicians. Probably foremost among them, however, was David Hilbert (1862–1943), for whom Hilbert spaces are named. Hilbert had great success in solving several important contemporary problems by emphasizing abstraction and an axiomatic approach. His ideas on abstract spaces came largely from his work on important integral equations in physics. Hilbert related integral equations to problems of infinitely many equations in infinitely many unknowns, a natural extension of a fundamental problem in the setting of R^n. Great credit for expansion of vector-space ideas is also given to the work of Riesz, Fischer, Fréchet, and Weyl. In particular, Hermann Weyl (1885–1955) was known for his stress on the rigorous application of axiomatic logic rather than visual plausibility, which was all too often accepted as proof.


Further Vector-Space Properties

The algebraic operations in a vector space have additional properties that can be derived from the 10 fundamental properties listed in Definition 1. The first of these, the cancellation laws for vector addition, are straightforward to prove and will be left as exercises.

Cancellation Laws for Vector Addition
Let V be a vector space, and let u, v, and w be vectors in V.

1. If u + v = u + w, then v = w.
2. If v + u = w + u, then v = w.

Some additional properties of vector spaces are summarized in Theorem 1.

Theorem 1 If V is a vector space, then:

1. The zero vector, θ, is unique.
2. For each v, the additive inverse −v is unique.
3. 0v = θ for every v in V, where 0 is the zero scalar.
4. aθ = θ for every scalar a.
5. If av = θ, then a = 0 or v = θ.
6. (−1)v = −v.

Proof [We prove properties 1, 4, and 6 and leave the remaining properties as exercises.] We first prove property 1. Suppose that ζ is a vector in V such that v + ζ = v for all v in V. Then setting v = θ, we have

θ + ζ = θ . (2)

By property (a3) of Definition 1, we know also that

ζ + θ = ζ. (3)

But from property (a1) of Definition 1, we know that ζ + θ = θ + ζ; so using Eq. (2), property (a1), and Eq. (3), we conclude that

θ = θ + ζ = ζ + θ = ζ,

or θ = ζ.

We next prove property 4 of Theorem 1. We do so by observing that θ + θ = θ, from property (a3) of Definition 1. Therefore if a is any scalar, we see from property (m2) of Definition 1 that

aθ = a(θ + θ) = aθ + aθ . (4)


Since aθ = aθ + θ by property (a3) of Definition 1, Eq. (4) becomes

aθ + θ = aθ + aθ .

The cancellation laws now yield θ = aθ.

Finally, we outline a proof for property 6 of Theorem 1 by displaying a sequence of equalities (the last equality is based on property 3, which is an exercise):

v + (−1)v = (1)v + (−1)v = [1 + (−1)]v = 0v = θ.

Thus (−1)v is a solution to the equation v + x = θ. But from property 2 of Theorem 1, the additive inverse −v is the only solution of v + x = θ; so we must have (−1)v = −v. Thus property 6 constitutes a formula for the additive inverse. This formula is not totally unexpected, but neither is it so obvious as it might seem, since a number of vector-space properties were required to prove it.

Example 6 We conclude this section by introducing the zero vector space. The zero vector space contains only one vector, θ; the arithmetic operations are defined by

θ + θ = θ and kθ = θ.

It is easy to verify that the set {θ} with the operations just defined is a vector space.

5.2 EXERCISES

For u, v, and w given in Exercises 1–3, calculate u − 2v, u − (2v − 3w), and −2u − v + 3w.

1. In the vector space of (2 × 3) matrices

u = [2 1 3; −1 1 2], v = [1 4 −1; 5 2 7], w = [4 −5 11; −13 −1 −1].

2. In the vector space P2

u = x^2 − 2, v = x^2 + 2x − 1, w = 2x + 1.

3. In the vector space C[0, 1]

u = e^x, v = sin x, w = √(x^2 + 1).

4. For u, v, and w in Exercise 2, find nonzero scalars c1, c2, c3 such that c1u + c2v + c3w = θ. Are there nonzero scalars c1, c2, c3 such that c1u + c2v + c3w = θ for u, v, and w in Exercise 1?

5. For u, v, and w in Exercise 2, find scalars c1, c2, c3 such that c1u + c2v + c3w = x^2 + 6x + 1. Show that there are no scalars c1, c2, c3 such that c1u + c2v + c3w = x^2.

In Exercises 6–11, the given set is a subset of a vector space. Which of these subsets are also vector spaces in their own right? To answer this question, determine whether the subset satisfies the 10 properties of Definition 1. (Note: Because these sets are subsets of a vector space, properties (a1), (a2), (m1), (m2), (m3), and (m4) are automatically satisfied.)

6. S = {v in R^4: v1 + v4 = 0}
7. S = {v in R^4: v1 + v4 = 1}
8. P = {p(x) in P2: p(0) = 0}
9. P = {p(x) in P2: p′′(0) ≠ 0}
10. P = {p(x) in P2: p(x) = p(−x) for all x}
11. P = {p(x) in P2: p(x) has degree 2}


In Exercises 12–16, V is the vector space of all real (3 × 4) matrices. Which of the given subsets of V are also vector spaces?

12. S = {A in V: a11 = 0}
13. S = {A in V: a11 + a23 = 0}
14. S = {A in V: |a11| + |a21| = 1}
15. S = {A in V: a32 ≠ 0}
16. S = {A in V: each aij is an integer}

17. Let Q denote the set of all (2 × 2) nonsingular matrices with the usual matrix addition and scalar multiplication. Show that Q is not a vector space by exhibiting specific matrices in Q that violate property (c1) of Definition 1. Also show that properties (c2) and (a3) are not met.

18. Let Q denote the set of all (2 × 2) singular matrices with the usual matrix addition and scalar multiplication. Determine whether Q is a vector space.

19. Let Q denote the set of all (2 × 2) symmetric matrices with the usual matrix addition and scalar multiplication. Verify that Q is a vector space.

20. Prove the cancellation laws for vector addition.

21. Prove property 2 of Theorem 1. [Hint: See the proof of Theorem 15 in Section 1.9.]

22. Prove property 3 of Theorem 1. [Hint: Note that 0v = (0 + 0)v. Now mimic the proof given for property 4.]

23. Prove property 5 of Theorem 1. (If a ≠ 0, then multiply both sides of av = θ by a^(−1). Use properties (m1) and (m4) of Definition 1 and use property 4 of Theorem 1.)

24. Prove that the zero vector space, defined in Example 6, is indeed a vector space.

In Exercises 25–29, the given set is a subset of C[−1, 1]. Which of these are also vector spaces?

25. F = {f(x) in C[−1, 1]: f(−1) = f(1)}
26. F = {f(x) in C[−1, 1]: f(x) = 0 for −1/2 ≤ x ≤ 1/2}
27. F = {f(x) in C[−1, 1]: f(1) = 1}
28. F = {f(x) in C[−1, 1]: f(1) = 0}
29. F = {f(x) in C[−1, 1]: ∫_{−1}^{1} f(x) dx = 0}

30. The set C^2[a, b] is defined to be the set of all real-valued functions f(x) defined on [a, b], where f(x), f′(x), and f′′(x) are continuous on [a, b]. Verify that C^2[a, b] is a vector space by citing the appropriate theorems on continuity and differentiability from calculus.

31. The following are subsets of the vector space C^2[−1, 1]. Which of these are vector spaces?
a) F = {f(x) in C^2[−1, 1]: f′′(x) + f(x) = 0, −1 ≤ x ≤ 1}
b) F = {f(x) in C^2[−1, 1]: f′′(x) + f(x) = x^2, −1 ≤ x ≤ 1}

32. Show that the set P of all real polynomials is a vector space.

33. Let F(R) denote the set of all real-valued functions defined on the reals. Thus

F(R) = {f: f is a function, f: R → R}.

With addition of functions and scalar multiplication defined as in Example 5, show that F(R) is a vector space.

34. Let

V = {x: x = [x1, x2]T, where x1 and x2 are in R}.

For u and v in V and c in R, define the operations of addition and scalar multiplication on V by

u + v = [u1, u2]T + [v1, v2]T = [u1 + v1 + 1, u2 + v2 − 1]T and cu = [cu1, cu2]T. (5)

a) Show that the operations defined in (5) satisfy properties (c1), (c2), (a1)–(a4), (m1), and (m4) of Definition 1.
b) Give examples to illustrate that properties (m2) and (m3) are not satisfied by the operations defined in (5).

35. Let

V = {x: x = [x1, x2]T, where x1 and x2 are in R}.


For u and v in V and c in R, define the operations of addition and scalar multiplication on V by

u + v = [u1, u2]T + [v1, v2]T = [u1 + v1, u2 + v2]T and cu = [0, 0]T. (6)

Show that the operations defined in (6) satisfy all the properties of Definition 1 except (m4). (Note that the addition given in (6) is the usual addition of R^2. Since R^2 is a vector space, all the additive properties of Definition 1 are satisfied.)

36. Let

V = {x: x = [x1, x2]T, where x2 > 0}.

For u and v in V and c in R, define addition and scalar multiplication by

u + v = [u1, u2]T + [v1, v2]T = [u1 + v1, u2v2]T and cu = [cu1, (u2)^c]T. (7)

With the operations defined in (7), show that V is a vector space.

5.3 SUBSPACES

Chapter 3 demonstrated that whenever W is a p-dimensional subspace of R^n, then W behaves essentially like R^p (for instance, any set of p + 1 vectors in W is linearly dependent). The situation is much the same in a general vector space V. In this setting, certain subsets of V inherit the vector-space structure of V and are vector spaces in their own right.

Definition 2 If V and W are real vector spaces, and if W is a nonempty subset of V, then W is called a subspace of V.

Subspaces have considerable practical importance and are useful in problems involving approximation, optimization, differential equations, and so on. The vector-space/subspace framework allows us to pose and rigorously answer questions such as, How can we find good polynomial approximations to complicated functions? and How can we generate good approximate solutions to differential equations? Questions such as these are at the heart of many technical problems; and vector-space techniques, together with the computational power of the computer, are useful in helping to answer them.

As was the case in R^n, it is fairly easy to recognize when a subset of a vector space V is actually a subspace. Specifically, the following restatement of Theorem 2 of Section 3.2 holds in any vector space.

Theorem 2 Let W be a subset of a vector space V. Then W is a subspace of V if and only if the following conditions are met:


(s1) The zero vector, θ, of V is in W.
(s2) u + v is in W whenever u and v are in W.
(s3) au is in W whenever u is in W and a is any scalar.

The proof of Theorem 2 coincides with the proof given in Section 3.2 with one minor exception. In R^n it is easily seen that −v = (−1)v for any vector v. In a general vector space V, this is a consequence of Theorem 1 of Section 5.2.

Examples of Subspaces

If we are given that W is a subset of a known vector space V, Theorem 2 simplifies the task of determining whether or not W is itself a vector space. Instead of testing all 10 properties of Definition 1, Theorem 2 states that we need only verify that properties (s1)–(s3) hold. Furthermore, just as in Chapter 3, a subset W of V will be specified by certain defining relationships that tell whether a vector u is in W. Thus to verify that (s1) holds, it must be shown that the zero vector, θ, of V satisfies the specification given for W. To check (s2) and (s3), we select two arbitrary vectors, say u and v, that satisfy the defining relationships of W (that is, u and v are in W). We then test u + v and au to see whether they also satisfy the defining relationships of W. (That is, do u + v and au belong to W?) The next three examples illustrate the use of Theorem 2.

Example 1 Let V be the vector space of all real (2 × 2) matrices, and let W be the subset of V specified by

W = {A: A = [0 a12; a21 0], a12 and a21 any real scalars}.

Verify that W is a subspace of V.

Solution The zero vector for V is the (2 × 2) zero matrix O, and O is in W since it satisfies the defining relationships of W. If A and B are any two vectors in W, then A and B have the form

A = [0 a12; a21 0], B = [0 b12; b21 0].

Thus A + B and aA have the form

A + B = [0, a12 + b12; a21 + b21, 0], aA = [0, aa12; aa21, 0].

Therefore, A + B and aA are in W, and we conclude that W is a subspace of the set of all real (2 × 2) matrices.

Example 2 Let W be the subset of C[a, b] (see Example 5 of Section 5.2) defined by

W = {f(x) in C[a, b]: f(a) = f(b)}.

Verify that W is a subspace of C[a, b].


Solution The zero vector in C[a, b] is the zero function, θ(x), defined by θ(x) = 0 for all x in the interval [a, b]. In particular, θ(a) = θ(b) since θ(a) = 0 and θ(b) = 0. Therefore, θ(x) is in W. Now let g(x) and h(x) be any two functions that are in W; that is,

g(a) = g(b) and h(a) = h(b). (1)

The sum of g(x) and h(x) is the function s(x) defined by s(x) = g(x) + h(x). To see that s(x) is in W, note that property (1) gives

s(a) = g(a) + h(a) = g(b) + h(b) = s(b).

Similarly, if c is a scalar, then it is immediate from property (1) that cg(a) = cg(b). It follows that cg(x) is in W. Theorem 2 now implies that W is a subspace of C[a, b].

The next example illustrates how to use Theorem 2 to show that a subset of a vector space is not a vector space. Recall from Chapter 3 that if a subset fails to satisfy any one of the properties (s1), (s2), or (s3), then it is not a subspace.

Example 3 Let V be the vector space of all (2 × 2) matrices, and let W be the subset of V specified by

W = {A: A = [a b; c d], ad = 0 and bc = 0}.

Show that W is not a subspace of V.

Solution It is straightforward to show that W satisfies properties (s1) and (s3) of Theorem 2. Thus to demonstrate that W is not a subspace of V, we must show that (s2) fails. It suffices to give a specific example that illustrates the failure of (s2). For example, define A and B by

A = [1 0; 0 0] and B = [0 0; 0 1].

Then A and B are in W, but A + B is not, since

A + B = [1 0; 0 1].

In particular, ad = (1)(1) = 1, so ad ≠ 0.

If n ≤ m, then Pn is a subspace of Pm. We can verify this assertion directly from Definition 2 since we have already shown that Pn and Pm are each real vector spaces, and Pn is a subset of Pm. Similarly, for any n, Pn is a subspace of C[a, b]. Again this assertion follows directly from Definition 2 since any polynomial is continuous on any interval [a, b]. Therefore, Pn can be considered a subspace of C[a, b], as well as a vector space in its own right.

Spanning Sets

The vector-space structure as given in Definition 1 guarantees that the notion of a linear combination makes sense in a general vector space. Specifically, the vector v is a linear


combination of the vectors v1, v2, . . . , vm provided that there exist scalars a1, a2, . . . , am such that

v = a1v1 + a2v2 + · · · + amvm.

The next example illustrates this concept in the vector space P2.

Example 4 In P2 let p(x), p1(x), p2(x), and p3(x) be defined by p(x) = −1 + 2x^2, p1(x) = 1 + 2x − 2x^2, p2(x) = −1 − x, and p3(x) = −3 − 4x + 4x^2. Express p(x) as a linear combination of p1(x), p2(x), and p3(x).

Solution Setting p(x) = a1p1(x) + a2p2(x) + a3p3(x) yields

−1 + 2x^2 = a1(1 + 2x − 2x^2) + a2(−1 − x) + a3(−3 − 4x + 4x^2).

Equating coefficients yields the system of equations

a1 − a2 − 3a3 = −1
2a1 − a2 − 4a3 = 0
−2a1 + 4a3 = 2.

This system has the unique solution a1 = 3, a2 = −2, and a3 = 2. We can easily check that

p(x) = 3p1(x) − 2p2(x) + 2p3(x).
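If the system were larger, we could let MATLAB do the elimination; a sketch (ours) for the system above:

M = [ 1 -1 -3;
      2 -1 -4;
     -2  0  4 ];    % coefficient matrix from equating coefficients
b = [-1; 0; 2];
a = M \ b           % returns a = [3; -2; 2], matching the solution above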

The very useful concept of a spanning set is suggested by the preceding discussion.

Definition 3 Let V be a vector space, and let Q = {v1, v2, . . . , vm} be a set of vectors in V. If every vector v in V is a linear combination of vectors in Q,

v = a1v1 + a2v2 + · · · + amvm,

then we say that Q is a spanning set for V .

For many vector spaces V, it is relatively easy to find a natural spanning set. For example, it is easily seen that {1, x, x^2} is a spanning set for P2 and, in general, {1, x, . . . , x^n} is a spanning set for Pn. The vector space of all (2 × 2) matrices is spanned by the set {E11, E12, E21, E22}, where

E11 = [1 0; 0 0], E12 = [0 1; 0 0], E21 = [0 0; 1 0], and E22 = [0 0; 0 1].

More generally, if the (m × n) matrix Eij is the matrix with 1 as the ijth entry and zeros elsewhere, then {Eij: 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a spanning set for the vector space of (m × n) real matrices.

If Q = {v1, v2, . . . , vk} is a set of vectors in a vector space V, then, as in Section 3.3, the span of Q, denoted Sp(Q), is the set of all linear combinations of v1, v2, . . . , vk:

Sp(Q) = {v: v = a1v1 + a2v2 + · · · + akvk}.


From closure properties (c1) and (c2) of Definition 1, it is obvious that Sp(Q) is a subset of V. In fact, the proof of Theorem 3 in Section 3.3 is valid in a general vector space, so we have the following theorem.

Theorem 3 If V is a vector space and Q = {v1, v2, . . . , vk} is a set of vectors in V, then Sp(Q) is a subspace of V.

The connection between spanning sets and the span of a set is fairly obvious. If W is a subspace of V and Q ⊆ W, then Q is a spanning set for W if and only if W = Sp(Q). As the next three examples illustrate, it is often easy to obtain a spanning set for a subspace W when an algebraic specification for W is given.

Example 5 Let V be the vector space of all real (2 × 2) matrices, and let W be the subspace given in Example 1:

W = {A: A = [0 a12; a21 0], a12 and a21 any real scalars}.

Find a spanning set for W.

Solution One obvious spanning set for W is seen to be the set of vectors Q = {A1, A2}, where

A1 = [0 1; 0 0] and A2 = [0 0; 1 0].

To verify this assertion, suppose A is in W, where

A = [0 a12; a21 0].

Then clearly A = a12A1 + a21A2, and therefore Q is a spanning set for W.

Example 6 Let W be the subspace of P2 defined by

W = {p(x): p(x) = a0 + a1x + a2x^2, where a2 = −a1 + 2a0}.

Exhibit a spanning set for W.

Solution Let p(x) = a0 + a1x + a2x^2 be a vector in W. From the specifications of W, we know that a2 = −a1 + 2a0. That is,

p(x) = a0 + a1x + a2x^2
     = a0 + a1x + (−a1 + 2a0)x^2
     = a0(1 + 2x^2) + a1(x − x^2).

Since every vector p in W is a linear combination of p1(x) = 1 + 2x^2 and p2(x) = x − x^2, we see that {p1(x), p2(x)} is a spanning set for W.

A square matrix, A = (aij), is called skew symmetric if A^T = −A. Recall that the ijth entry of A^T is aji, the jith entry of A. Thus the entries of A must satisfy aji = −aij in order for A to be skew symmetric. In particular, each entry, aii, on the main diagonal must be zero since aii = −aii.


Example 7 Let W be the set of all (3 × 3) skew-symmetric matrices. Show that W is a subspace of the vector space V of all (3 × 3) matrices, and exhibit a spanning set for W.

Solution Let O denote the (3 × 3) zero matrix. Clearly O^T = O = −O, so O is in W. If A and B are in W, then A^T = −A and B^T = −B. Therefore,

(A + B)^T = A^T + B^T = −A − B = −(A + B).

It follows that A + B is skew symmetric; that is, A + B is in W. Likewise, if c is a scalar, then

(cA)^T = cA^T = c(−A) = −(cA),

so cA is in W. By Theorem 2, W is a subspace of V. Moreover, the remarks preceding the example imply that W can be described by

W = {A: A = [0 a b; −a 0 c; −b −c 0], a, b, c any real numbers}.

From this description it is easily seen that a natural spanning set for W is the set Q = {A1, A2, A3}, where

A1 = [0 1 0; −1 0 0; 0 0 0], A2 = [0 0 1; 0 0 0; −1 0 0], and A3 = [0 0 0; 0 0 1; 0 −1 0].

Finally, note that in Definition 3 we have implicitly assumed that spanning sets are finite. This is not a required assumption, and frequently Sp(Q) is defined as the set of all finite linear combinations of vectors from Q, where Q may be either an infinite set or a finite set. We do not need this full generality, and we will explore this idea no further other than to note later that one contrast between the vector space R^n and a general vector space V is that V might not possess a finite spanning set. An example of a vector space where the most natural spanning set is infinite is the vector space P, consisting of all polynomials (we place no upper limit on the degree). Then, for instance, Pn is a subspace of P for each n, n = 1, 2, 3, . . . . A natural spanning set for P (in the generalized sense described earlier) is the infinite set

Q = {1, x, x^2, . . . , x^k, . . .}.

5.3 EXERCISES

Let V be the vector space of all (2 × 3) matrices. Which of the subsets in Exercises 1–4 are subspaces of V?

1. W = {A in V: a11 + a13 = 1}
2. W = {A in V: a11 − a12 + 2a13 = 0}
3. W = {A in V: a11 − a12 = 0, a12 + a13 = 0, and a23 = 0}
4. W = {A in V: a11a12a13 = 0}

In Exercises 5–8, which of the given subsets of P2 are subspaces of P2?

5. W = {p(x) in P2: p(0) + p(2) = 0}
6. W = {p(x) in P2: p(1) = p(3)}
7. W = {p(x) in P2: p(1)p(3) = 0}
8. W = {p(x) in P2: p(1) = −p(−1)}


In Exercises 9–12, which of the given subsets of C[−1, 1] are subspaces of C[−1, 1]?

9. F = {f(x) in C[−1, 1]: f(−1) = −f(1)}
10. F = {f(x) in C[−1, 1]: f(x) ≥ 0 for all x in [−1, 1]}
11. F = {f(x) in C[−1, 1]: f(−1) = −2 and f(1) = 2}
12. F = {f(x) in C[−1, 1]: f(1/2) = 0}

In Exercises 13–16, which of the given subsets of C^2[−1, 1] (see Exercise 30 of Section 5.2) are subspaces of C^2[−1, 1]?

13. F = {f(x) in C^2[−1, 1]: f′′(0) = 0}
14. F = {f(x) in C^2[−1, 1]: f′′(x) − e^x f′(x) + xf(x) = 0, −1 ≤ x ≤ 1}
15. F = {f(x) in C^2[−1, 1]: f′′(x) + f(x) = sin x, −1 ≤ x ≤ 1}
16. F = {f(x) in C^2[−1, 1]: f′′(x) = 0, −1 ≤ x ≤ 1}

In Exercises 17–21, express the given vector as a linear combination of the vectors in the given set Q.

17. p(x) = −1 − 3x + 3x^2 and Q = {p1(x), p2(x), p3(x)}, where p1(x) = 1 + 2x + x^2, p2(x) = 2 + 5x, and p3(x) = 3 + 8x − 2x^2

18. p(x) = −2 − 4x + x^2 and Q = {p1(x), p2(x), p3(x), p4(x)}, where p1(x) = 1 + 2x^2 + x^3, p2(x) = 1 + x + 2x^3, p3(x) = −1 − 3x + 4x^2 − 4x^3, and p4(x) = 1 + 2x − x^2 + x^3

19. A = [−2 −4; 1 0] and Q = {B1, B2, B3, B4}, where B1 = [1 0; 2 1], B2 = [1 1; 0 2], B3 = [−1 −3; 4 −4], and B4 = [1 2; −1 1].

20. f(x) = e^x and Q = {sinh x, cosh x}
21. f(x) = cos 2x and Q = {sin^2 x, cos^2 x}
22. Let V be the vector space of all (2 × 2) matrices. The subset W of V defined by

W = {A in V: a11 − a12 = 0, a12 + a22 = 0}

is a subspace of V. Find a spanning set for W. [Hint: Observe that A is in W if and only if A has the form

A = [ a11 a11; a21 −a11 ],

where a11 and a21 are arbitrary.]
23. Let W be the subset of P3 defined by

W = {p(x) in P3: p(1) = p(−1) and p(2) = p(−2)}.

Show that W is a subspace of P3, and find a spanning set for W.

24. Let W be the subset of P3 defined by

W = {p(x) in P3: p(1) = 0 and p′(−1) = 0}.

Show that W is a subspace of P3, and find a spanning set for W.

25. Find a spanning set for each of the subsets that is a subspace in Exercises 1–8.

26. Show that the set W of all symmetric (3 × 3) matrices is a subspace of the vector space of all (3 × 3) matrices. Find a spanning set for W.

27. The trace of an (n × n) matrix A = (aij), denoted tr(A), is defined to be the sum of the diagonal elements of A; that is, tr(A) = a11 + a22 + · · · + ann. Let V be the vector space of all (3 × 3) matrices, and let W be defined by

W = {A in V: tr(A) = 0}.

Show that W is a subspace of V, and exhibit a spanning set for W.

28. Let A be an (n × n) matrix. Show that B = (A + A^T)/2 is symmetric and that C = (A − A^T)/2 is skew symmetric.

29. Use Exercise 28 to show that every (n × n) matrix can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix.

30. Use Exercises 26 and 29 and Example 7 to construct a spanning set for the vector space of all (3 × 3) matrices where the spanning set consists entirely of symmetric and skew-symmetric matrices. Specify how a (3 × 3) matrix A = (aij) can be expressed by using this spanning set.

31. Let V be the set of all (3 × 3) upper-triangular matrices, and note that V is a vector space. Each of the subsets W is a subspace of V. Find a spanning set for W.
a) W = {A in V: a11 = 0, a22 = 0, a33 = 0}
b) W = {A in V: a11 + a22 + a33 = 0, a12 + a23 = 0}


c) W = {A in V: a11 = a12, a13 = a23, a22 = a33}
d) W = {A in V: a11 = a22, a22 − a33 = 0, a12 + a23 = 0}
32. Let p(x) = a0 + a1x + a2x^2 be a vector in P2. Find b0, b1, and b2 in terms of a0, a1, and a2 so that p(x) = b0 + b1(x + 1) + b2(x + 1)^2. [Hint: Equate the coefficients of like powers of x.] Represent q(x) = 1 − x + 2x^2 and r(x) = 2 − 3x + x^2 in terms of the spanning set {1, x + 1, (x + 1)^2}.
33. Let A be an arbitrary matrix in the vector space of all (2 × 2) matrices:

A = [ a b; c d ].

Find scalars x1, x2, x3, x4 in terms of a, b, c, and d such that A = x1B1 + x2B2 + x3B3 + x4B4, where

B1 = [ 1 0; 1 −2 ],  B2 = [ 2 1; 1 −2 ],  B3 = [ −1 3; −3 6 ],  and  B4 = [ 1 1; −2 5 ].

Represent the matrices

C = [ 0 2; −1 1 ]  and  D = [ 2 1; 0 1 ]

in terms of the spanning set {B1, B2, B3, B4}.

5.4 LINEAR INDEPENDENCE, BASES, AND COORDINATES

One of the central ideas of Chapters 1 and 3 is linear independence. As we will see, this concept generalizes directly to vector spaces. With the concepts of linear independence and spanning sets, it is easy to extend the idea of a basis to our vector-space setting. The notion of a basis is one of the most fundamental concepts in the study of vector spaces. For example, in certain vector spaces a basis can be used to produce a coordinate system for the space. As a consequence, a real vector space with a basis of n vectors behaves essentially like Rn. Moreover, this coordinate system sometimes permits a geometric perspective in an otherwise nongeometric setting.

Linear Independence

We begin by restating Definition 11 of Section 1.7 in a general vector-space setting.

Definition 4 Let V be a vector space, and let {v1, v2, . . . , vp} be a set of vectors in V. This set is linearly dependent if there are scalars a1, a2, . . . , ap, not all of which are zero, such that

a1v1 + a2v2 + · · · + apvp = θ. (1)

The set {v1, v2, . . . , vp} is linearly independent if it is not linearly dependent; that is, the only scalars for which Eq. (1) holds are the scalars a1 = a2 = · · · = ap = 0.

Note that as a consequence of property 3 of Theorem 1 in Section 5.2, the vector equation (1) in Definition 4 always has the trivial solution a1 = a2 = · · · = ap = 0. Thus the set {v1, v2, . . . , vp} is linearly independent if the trivial solution is the only solution to Eq. (1). If another solution exists, then the set is linearly dependent.


As before, it is easy to prove that a set {v1, v2, . . . , vp} is linearly dependent if and only if some vi is a linear combination of the other p − 1 vectors in the set. The only real distinction between linear independence/dependence in Rn and in a general vector space is that we cannot always test for dependence by solving a homogeneous system of equations. That is, in a general vector space we may have to go directly to the defining equation

a1v1 + a2v2 + · · · + apvp = θ

and attempt to determine whether there are nontrivial solutions. Examples 2 and 3 illustrate the point.

Example 1 Let V be the vector space of (2 × 2) matrices, and let W be the subspace

W = {A: A = [ 0 a12; a21 0 ], a12 and a21 any real scalars}.

Define matrices B1, B2, and B3 in W by

B1 = [ 0 2; 1 0 ],  B2 = [ 0 1; 0 0 ],  and  B3 = [ 0 2; 3 0 ].

Show that the set {B1, B2, B3} is linearly dependent, and express B3 as a linear combination of B1 and B2. Show that {B1, B2} is a linearly independent set.

Solution According to Definition 4, the set {B1, B2, B3} is linearly dependent provided that there exist nontrivial solutions to the equation

a1B1 + a2B2 + a3B3 = O, (2)

where O is the zero element in V [that is, O is the (2 × 2) zero matrix]. Writing Eq. (2) in detail, we see that a1, a2, a3 are solutions of Eq. (2) if

[ 0 2a1; a1 0 ] + [ 0 a2; 0 0 ] + [ 0 2a3; 3a3 0 ] = [ 0 0; 0 0 ].

With corresponding entries equated, a1, a2, a3 must satisfy

2a1 + a2 + 2a3 = 0 and a1 + 3a3 = 0.

This (2 × 3) homogeneous system has nontrivial solutions by Theorem 4 of Section 1.3, and one such solution is a1 = −3, a2 = 4, a3 = 1. In particular,

−3B1 + 4B2 + B3 = O; (3)

so the set {B1, B2, B3} is a linearly dependent set of vectors in W. It is an immediate consequence of Eq. (3) that

B3 = 3B1 − 4B2.

To see that the set {B1, B2} is linearly independent, let a1 and a2 be scalars such that a1B1 + a2B2 = O. Then we must have

2a1 + a2 = 0 and a1 = 0.


Hence a1 = 0 and a2 = 0; so if a1B1 + a2B2 = O, then a1 = a2 = 0. Thus {B1, B2} is a linearly independent set of vectors in W.
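As a computational aside (our addition, in the spirit of the MATLAB usage in Example 9 below), the homogeneous system obtained from Eq. (2) can be solved with MATLAB's null function:

C = [2 1 2; 1 0 3];   % rows encode 2a1 + a2 + 2a3 = 0 and a1 + 3a3 = 0
null(C, 'r')          % rational-basis option; returns [-3; 4; 1], the solution used above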

Establishing linear independence/dependence in a vector space of functions such as Pn or C[a, b] may sometimes require techniques from calculus. We illustrate one such technique in the following example.

Example 2 Show that {1, x, x^2} is a linearly independent set in P2.

Solution Suppose that a0, a1, a2 are any scalars that satisfy the defining equation

a0 + a1x + a2x^2 = θ(x), (4)

where θ(x) is the zero polynomial. If Eq. (4) is to be an identity holding for all values of x, then [since θ′(x) = θ(x)] we can differentiate both sides of Eq. (4) to obtain

a1 + 2a2x = θ(x). (5)

Similarly, differentiating both sides of Eq. (5), we obtain

2a2 = θ(x). (6)

From Eq. (6) we must have a2 = 0. If a2 = 0, then Eq. (5) requires a1 = 0; hence in Eq. (4), a0 = 0 as well. Therefore, the only scalars that satisfy Eq. (4) are a0 = a1 = a2 = 0, and thus {1, x, x^2} is linearly independent in P2. (Also see the material on Wronskians in Section 6.5.)

The following example illustrates another procedure for showing that a set of functions is linearly independent.

Example 3 Show that {√x, 1/x, x^2} is a linearly independent subset of C[1, 10].

Solution If the equation

a1√x + a2(1/x) + a3x^2 = 0 (7)

holds for all x, 1 ≤ x ≤ 10, then it must hold for any three values of x in the interval. Successively letting x = 1, x = 4, and x = 9 in Eq. (7) yields the system of equations

a1 + a2 + a3 = 0
2a1 + (1/4)a2 + 16a3 = 0
3a1 + (1/9)a2 + 81a3 = 0. (8)

It is easily shown that the trivial solution a1 = a2 = a3 = 0 is the unique solution for system (8). It follows that the set {√x, 1/x, x^2} is linearly independent.

Note that a nontrivial solution for system (8) would have yielded no information regarding the linear independence/dependence of the given set of functions. We could have concluded only that Eq. (7) holds when x = 1, x = 4, or x = 9.
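For system (8) itself, a quick MATLAB check (our addition; any linear-system tool would do) confirms that only the trivial solution exists:

M = [1 1 1; 2 1/4 16; 3 1/9 81];   % coefficient matrix of system (8)
rank(M)                            % returns 3, so a1 = a2 = a3 = 0 is the unique solution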

Vector-Space Bases

It is now straightforward to combine the concepts of linear independence and spanning sets to define a basis for a vector space.


Definition 5 Let V be a vector space, and let B = {v1, v2, . . . , vp} be a spanning set for V. If B is linearly independent, then B is a basis for V.

Thus as before, a basis for V is a linearly independent spanning set for V. (Again we note the implicit assumption that a basis contains only a finite number of vectors.)

There is often a “natural” basis for a vector space. We have seen in Chapter 3 that the set of unit vectors {e1, e2, . . . , en} in Rn is a basis for Rn. In the preceding section we noted that the set {1, x, x^2} is a spanning set for P2. Example 2 showed further that {1, x, x^2} is linearly independent and hence is a basis for P2. More generally, the set {1, x, . . . , x^n} is a natural basis for Pn. Similarly, the matrices

E11 = [ 1 0; 0 0 ],  E12 = [ 0 1; 0 0 ],  E21 = [ 0 0; 1 0 ],  and  E22 = [ 0 0; 0 1 ]

constitute a basis for the vector space of all (2 × 2) real matrices (see Exercise 11). In general, the set of (m × n) matrices {Eij: 1 ≤ i ≤ m, 1 ≤ j ≤ n} defined in Section 5.3 is a natural basis for the vector space of all (m × n) real matrices.

Examples 5, 6, and 7 in Section 5.3 demonstrated a procedure for obtaining a natural spanning set for a subspace W when an algebraic specification for W is given. The spanning set obtained in this manner is often a basis for W. The following example provides another illustration.

Example 4 Let V be the vector space of all (2 × 2) real matrices, and let W be the subspace defined by

W = {A: A = [ a  a + b; a − b  b ], a and b any real numbers}.

Exhibit a basis for W.

Solution In the specification for W, a and b are unconstrained variables. Assigning values a = 1, b = 0 and then a = 0, b = 1 yields the matrices

B1 = [ 1 1; 1 0 ]  and  B2 = [ 0 1; −1 1 ]

in W. Since

[ a  a + b; a − b  b ] = a[ 1 1; 1 0 ] + b[ 0 1; −1 1 ],

the set {B1, B2} is clearly a spanning set for W. The equation

c1B1 + c2B2 = O

(where O is the (2 × 2) zero matrix) is equivalent to

[ c1  c1 + c2; c1 − c2  c2 ] = [ 0 0; 0 0 ].


Equating entries immediately yields c1 = c2 = 0; so the set {B1, B2} is linearly independent and hence is a basis for W.

Coordinate Vectors

As we noted in Chapter 3, a basis is a minimal spanning set; as such, a basis contains no redundant information. This lack of redundancy is an important feature of a basis in the general vector-space setting and allows every vector to be represented uniquely in terms of the basis (see Theorem 4). We cannot make such an assertion of unique representation about a spanning set that is linearly dependent; in fact, in this case, the representation is never unique.

Theorem 4 Let V be a vector space, and let B = {v1, v2, . . . , vp} be a basis for V. For each vector w in V, there exists a unique set of scalars w1, w2, . . . , wp such that

w = w1v1 + w2v2 + · · · + wpvp.

Proof Let w be a vector in V and suppose that w is represented in two ways as

w = w1v1 + w2v2 + · · · + wpvp
w = u1v1 + u2v2 + · · · + upvp.

Subtracting, we obtain

θ = (w1 − u1)v1 + (w2 − u2)v2 + · · · + (wp − up)vp.

Therefore, since {v1, v2, . . . , vp} is a linearly independent set, it follows that w1 − u1 = 0, w2 − u2 = 0, . . . , wp − up = 0. That is, a vector w cannot be represented in two different ways in terms of a basis B.

Now, let V be a vector space with a basis B = {v1, v2, . . . , vp}. Given that each vector w in V has a unique representation in terms of B as

w = w1v1 + w2v2 + · · · + wpvp, (9)

it follows that the scalars w1, w2, . . . , wp serve to characterize w completely in terms of the basis B. In particular, we can identify w unambiguously with the vector [w]B in Rp, where

[w]B = [w1; w2; . . . ; wp].

We will call the unique scalars w1, w2, . . . , wp in Eq. (9) the coordinates of w with respect to the basis B, and we will call the vector [w]B in Rp the coordinate vector of w with respect to B. This idea is a useful one; for example, we will show that a set of vectors {u1, u2, . . . , ur} in V is linearly independent if and only if the coordinate vectors [u1]B, [u2]B, . . . , [ur]B are linearly independent in Rp. Since we know how to determine whether vectors in Rp are linearly independent or not, we can use the idea of coordinates to reduce a problem of linear independence/dependence in a general vector


space to an equivalent problem in Rp, which we can work. Finally, we note that the subscript B is necessary when we write [w]B, since the coordinate vector for w changes when we change the basis.

Example 5 Let V be the vector space of all real (2 × 2) matrices. Let B = {E11, E12, E21, E22} and Q = {E11, E21, E12, E22}, where

E11 = [ 1 0; 0 0 ],  E12 = [ 0 1; 0 0 ],  E21 = [ 0 0; 1 0 ],  and  E22 = [ 0 0; 0 1 ].

Let the matrix A be defined by

A = [ 2 −1; −3 4 ].

Find [A]B and [A]Q.

Solution We have already noted that B is the natural basis for V. Since Q contains the same vectors as B, but in a different order, Q is also a basis for V. It is easy to see that

A = 2E11 − E12 − 3E21 + 4E22,

so

[A]B = [2; −1; −3; 4].

Similarly,

A = 2E11 − 3E21 − E12 + 4E22,

so

[A]Q = [2; −3; −1; 4].

It is apparent in the preceding example that the ordering of the basis vectors determined the ordering of the components of the coordinate vectors. A basis with such an implicitly fixed ordering is usually called an ordered basis. Although we do not intend to dwell on this point, we do have to be careful to work with a fixed ordering in a basis.
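As an aside for MATLAB users (our addition, not part of the example), when B is ordered as {E11, E12, E21, E22}, the coordinate vector of a (2 × 2) matrix is simply its entries listed row by row, and reordering the basis permutes the coordinates:

A = [2 -1; -3 4];
coordB = reshape(A.', [], 1)   % [2; -1; -3; 4], which is [A]B
coordQ = coordB([1 3 2 4])     % [2; -3; -1; 4], which is [A]Q (second and third entries swapped)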

If V is a vector space with (ordered) basis B = {v1, v2, . . . , vp}, then the correspondence

v → [v]B

provides an identification between vectors in V and elements of Rp. For instance, the preceding example identified a (2 × 2) matrix with a vector in R4. The following lemma


lists some of the properties of this correspondence. (The lemma hints at the idea of an isomorphism that will be developed in detail later.)

Lemma Let V be a vector space that has a basis B = {v1, v2, . . . , vp}. If u and v are vectors in V and if c is a scalar, then the following hold:

[u + v]B = [u]B + [v]B

and

[cu]B = c[u]B.

Proof Suppose that u and v are expressed in terms of the basis vectors in B as

u = a1v1 + a2v2 + · · · + apvp

and

v = b1v1 + b2v2 + · · · + bpvp.

Then clearly

u + v = (a1 + b1)v1 + (a2 + b2)v2 + · · · + (ap + bp)vp

and

cu = (ca1)v1 + (ca2)v2 + · · · + (cap)vp.

Therefore,

[u]B = [a1; a2; . . . ; ap],  [v]B = [b1; b2; . . . ; bp],
[u + v]B = [a1 + b1; a2 + b2; . . . ; ap + bp],  and  [cu]B = [ca1; ca2; . . . ; cap].

We can now easily see that [u + v]B = [u]B + [v]B and [cu]B = c[u]B.

The following example illustrates the preceding lemma.

Example 6 In P2, let p(x) = 3 − 2x + x^2 and q(x) = −2 + 3x − 4x^2. Show that

[p(x) + q(x)]B = [p(x)]B + [q(x)]B and [2p(x)]B = 2[p(x)]B,

where B is the natural basis for P2: B = {1, x, x^2}.

Solution The coordinate vectors for p(x) and q(x) are

[p(x)]B = [3; −2; 1]  and  [q(x)]B = [−2; 3; −4].


Furthermore, p(x) + q(x) = 1 + x − 3x^2 and 2p(x) = 6 − 4x + 2x^2. Thus

[p(x) + q(x)]B = [1; 1; −3]  and  [2p(x)]B = [6; −4; 2].

Therefore, [p(x) + q(x)]B = [p(x)]B + [q(x)]B and [2p(x)]B = 2[p(x)]B.

Suppose that the vector space V has basis B = {v1, v2, . . . , vp}, and let {u1, u2, . . . , um} be a subset of V. The two properties in the preceding lemma can easily be combined and extended to give

[c1u1 + c2u2 + · · · + cmum]B = c1[u1]B + c2[u2]B + · · · + cm[um]B. (10)

This observation will be useful in proving the next theorem.

Theorem 5 Suppose that V is a vector space with a basis B = {v1, v2, . . . , vp}. Let S = {u1, u2, . . . , um} be a subset of V, and let T = {[u1]B, [u2]B, . . . , [um]B}.

1. A vector u in V is in Sp(S) if and only if [u]B is in Sp(T).
2. The set S is linearly independent in V if and only if the set T is linearly independent in Rp.

Proof The vector equation

u = x1u1 + x2u2 + · · · + xmum (11)

in V is equivalent to the equation

[u]B = [x1u1 + x2u2 + · · · + xmum]B (12)

in Rp. It follows from Eq. (10) that Eq. (12) is equivalent to

[u]B = x1[u1]B + x2[u2]B + · · · + xm[um]B. (13)

Therefore, the vector equation (11) in V is equivalent to the vector equation (13) in Rp. In particular, Eq. (11) has a solution x1 = c1, x2 = c2, . . . , xm = cm if and only if Eq. (13) has the same solution. Thus u is in Sp(S) if and only if [u]B is in Sp(T).

To avoid confusion in the proof of property 2, let θV denote the zero vector for V and let θp denote the p-dimensional zero vector in Rp. Then [θV]B = θp. Thus setting u = θV in Eq. (11) and Eq. (13) implies that the vector equations

θV = x1u1 + x2u2 + · · · + xmum (14)

and

θp = x1[u1]B + x2[u2]B + · · · + xm[um]B (15)

have the same solutions. In particular, Eq. (14) has only the trivial solution if and only if Eq. (15) has only the trivial solution; that is, S is a linearly independent set in V if and only if T is linearly independent in Rp.

An immediate corollary to Theorem 5 is as follows.


Corollary Let V be a vector space with a basis B = {v1, v2, . . . , vp}. Let S = {u1, u2, . . . , um} be a subset of V, and let T = {[u1]B, [u2]B, . . . , [um]B}. Then S is a basis for V if and only if T is a basis for Rp.

Proof By Theorem 5, S is both linearly independent and a spanning set for V if and only if T is both linearly independent and a spanning set for Rp.

Theorem 5 and its corollary allow us to use the techniques developed in Chapter 3 to solve analogous problems in vector spaces other than Rp. The next two examples provide illustrations.

Example 7 Use the corollary to Theorem 5 to show that the set {1, 1 + x, 1 + 2x + x^2} is a basis for P2.

Solution Let B be the standard basis for P2: B = {1, x, x^2}. The coordinate vectors of 1, 1 + x, and 1 + 2x + x^2 are

[1]B = [1; 0; 0],  [1 + x]B = [1; 1; 0],  and  [1 + 2x + x^2]B = [1; 2; 1].

Clearly the coordinate vectors [1]B, [1 + x]B, and [1 + 2x + x^2]B are linearly independent in R3. Since R3 has dimension 3, the coordinate vectors constitute a basis for R3. It now follows that {1, 1 + x, 1 + 2x + x^2} is a basis for P2.
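The independence claim in Example 7 is also easy to confirm with the rref computation the text demonstrates in Example 9 (this check is our addition):

C = [1 1 1; 0 1 2; 0 0 1];   % columns are [1]B, [1 + x]B, [1 + 2x + x^2]B
rref(C)                      % the 3 x 3 identity, so the three columns are linearly independent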

Example 8 Let V be the vector space of all (2 × 2) matrices, and let the subset S of V be defined by S = {A1, A2, A3, A4}, where

A1 = [ 1 2; −1 3 ],  A2 = [ 0 −1; 1 4 ],  A3 = [ −1 0; 1 −10 ],  and  A4 = [ 3 7; −2 6 ].

Use the corollary to Theorem 5 and the techniques of Section 3.4 to obtain a basis for Sp(S).

Solution If B is the natural basis for V, B = {E11, E12, E21, E22}, then

[A1]B = [1; 2; −1; 3],  [A2]B = [0; −1; 1; 4],
[A3]B = [−1; 0; 1; −10],  and  [A4]B = [3; 7; −2; 6].


Let T = {[A1]B, [A2]B, [A3]B, [A4]B}. Several techniques for obtaining a basis for Sp(T) were illustrated in Section 3.4. For example, using the method demonstrated in Example 7 of Section 3.4, we form the matrix

C = [ 1 0 −1 3; 2 −1 0 7; −1 1 1 −2; 3 4 −10 6 ].

The matrix C^T can be reduced to the matrix

D^T = [ 1 2 −1 3; 0 −1 1 4; 0 0 2 1; 0 0 0 0 ].

Thus

D = [ 1 0 0 0; 2 −1 0 0; −1 1 2 0; 3 4 1 0 ],

and the nonzero columns of D constitute a basis for Sp(T). Denote the nonzero columns of D by w1, w2, and w3, respectively. Thus

w1 = [1; 2; −1; 3],  w2 = [0; −1; 1; 4],  and  w3 = [0; 0; 2; 1],

and {w1, w2, w3} is a basis for Sp(T). If B1, B2, and B3 are (2 × 2) matrices such that [B1]B = w1, [B2]B = w2, and [B3]B = w3, then it follows from Theorem 5 that {B1, B2, B3} is a basis for Sp(S). If

B1 = E11 + 2E12 − E21 + 3E22,

then clearly [B1]B = w1. B2 and B3 are obtained in the same fashion, and

B1 = [ 1 2; −1 3 ],  B2 = [ 0 −1; 1 4 ],  and  B3 = [ 0 0; 2 1 ].
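The reduction in Example 8 can be reproduced in MATLAB (our sketch; rref produces the reduced echelon form rather than the echelon form D^T shown above, so the basis it yields may differ from w1, w2, w3 while spanning the same space):

C = [1 0 -1 3; 2 -1 0 7; -1 1 1 -2; 3 4 -10 6];   % columns are [A1]B, ..., [A4]B
rank(C)      % returns 3, so Sp(S) has a basis of three matrices
rref(C.')    % the three nonzero rows span the same space as w1, w2, w3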

Examples 7 and 8 illustrate an important point. Although Theorem 5 shows that questions regarding the span or the linear dependence/independence of a subset of V can be translated to an equivalent problem in Rp, we do need one basis for V as a point of reference. For example, in P2, once we know that B = {1, x, x^2} is a basis, we can use Theorem 5 to pass from a problem in P2 to an analogous problem in R3. In order to obtain the first basis B, however, we cannot use Theorem 5.


Example 9 In P4, consider the set of vectors S = {p1, p2, p3, p4, p5}, where p1(x) = x^4 + 3x^3 + 2x + 4, p2(x) = x^3 − x^2 + 5x + 1, p3(x) = x^4 + x + 3, p4(x) = x^4 + x^3 − x + 2, and p5(x) = x^4 + x^2. Is S a basis for P4?

Solution Let B denote the standard basis for P4, B = {1, x, x^2, x^3, x^4}. By the corollary to Theorem 5, S is a basis for P4 if and only if T is a basis for R5, where T = {[p1]B, [p2]B, [p3]B, [p4]B, [p5]B}. In particular, the coordinate vectors in T are

[p1]B = [4; 2; 0; 3; 1],  [p2]B = [1; 5; −1; 1; 0],  [p3]B = [3; 1; 0; 0; 1],
[p4]B = [2; −1; 0; 1; 1],  and  [p5]B = [0; 0; 1; 0; 1].

Since R5 has dimension 5 and T contains 5 vectors, T will be a basis for R5 if T is a linearly independent set. To check whether T is linearly independent, we form the matrix A whose columns are the vectors in T and use MATLAB to reduce A to echelon form. As can be seen from the results in Fig. 5.4, the columns of A are linearly independent. Hence, T is a basis for R5. Therefore, S is a basis for P4.

A =
     4     1     3     2     0
     2     5     1    -1     0
     0    -1     0     0     1
     3     1     0     1     0
     1     0     1     1     1

>> rref(A)

ans =
     1     0     0     0     0
     0     1     0     0     0
     0     0     1     0     0
     0     0     0     1     0
     0     0     0     0     1

Figure 5.4 MATLAB was used for Example 9 to determine whether the columns of A are linearly independent. Since A is row equivalent to the identity, its columns are linearly independent.


5.4 EXERCISES

In Exercises 1–4, W is a subspace of the vector space V of all (2 × 2) matrices. A matrix A in W is written as

A = [ a b; c d ].

In each case exhibit a basis for W.
1. W = {A: a + b + c + d = 0}
2. W = {A: a = −d, b = 2d, c = −3d}
3. W = {A: a = 0}
4. W = {A: b = a − c, d = 2a + c}

In Exercises 5–8, W is a subspace of P2. In each case exhibit a basis for W.
5. W = {p(x) = a0 + a1x + a2x^2: a2 = a0 − 2a1}
6. W = {p(x) = a0 + a1x + a2x^2: a0 = 3a2, a1 = −a2}
7. W = {p(x) = a0 + a1x + a2x^2: p(0) = 0}
8. W = {p(x) = a0 + a1x + a2x^2: p(1) = p′(1) = 0}
9. Find a basis for the subspace V of P4, where V = {p(x) in P4: p(0) = 0, p′(1) = 0, p′′(−1) = 0}.

10. Prove that the set of all real (2 × 2) symmetric matrices is a subspace of the vector space of all real (2 × 2) matrices. Find a basis for this subspace (see Exercise 26 of Section 5.3).
11. Let V be the vector space of all (2 × 2) real matrices. Show that B = {E11, E12, E21, E22} (see Example 5) is a basis for V.
12. With respect to the basis B = {1, x, x^2} for P2, find the coordinate vector for each of the following.
a) p(x) = x^2 − x + 1
b) p(x) = x^2 + 4x − 1
c) p(x) = 2x + 5
13. With respect to the basis B = {E11, E12, E21, E22} for the vector space V of all (2 × 2) matrices, find the coordinate vector for each of the following.

a) A1 = [ 2 −1; 3 2 ]   b) A2 = [ 1 0; −1 1 ]   c) A3 = [ 2 3; 0 0 ]
14. Prove that {1, x, x^2, . . . , x^n} is a linearly independent set in Pn by supposing that p(x) = θ(x), where p(x) = a0 + a1x + · · · + anx^n. Next, take successive derivatives as in Example 2.

In Exercises 15–17, use the basis B of Exercise 11 and property 2 of Theorem 5 to test for linear independence in the vector space of (2 × 2) matrices.

15. A1 = [ 2 1; 2 1 ],  A2 = [ 3 0; 0 2 ],  A3 = [ 1 1; 2 1 ]
16. A1 = [ 1 3; 2 1 ],  A2 = [ 4 −2; 0 6 ],  A3 = [ 6 4; 4 8 ]
17. A1 = [ 2 2; 1 3 ],  A2 = [ 1 4; 0 5 ],  A3 = [ 4 10; 1 13 ]

In Exercises 18–21, use Exercise 14 and property 2 of Theorem 5 to test for linear independence in P3.
18. {x^3 − x, x^2 − 1, x + 4}
19. {x^2 + 2x − 1, x^2 − 5x + 2, 3x^2 − x}
20. {x^3 − x^2, x^2 − x, x − 1, x^3 − 1}
21. {x^3 + 1, x^2 + 1, x + 1, 1}
22. In P2, let S = {p1(x), p2(x), p3(x), p4(x)}, where p1(x) = 1 + 2x + x^2, p2(x) = 2 + 5x, p3(x) = 3 + 7x + x^2, and p4(x) = 1 + x + 3x^2. Use the method illustrated in Example 8 to obtain a basis for Sp(S). [Hint: Use the basis B = {1, x, x^2} to obtain coordinate vectors for p1(x), p2(x), p3(x), and p4(x). Now use the method illustrated in Example 7 of Section 3.4.]
23. Let S be the subset of P2 given in Exercise 22. Find a subset of S that is a basis for Sp(S). [Hint: Proceed as in Exercise 22, but use the technique illustrated in Example 6 of Section 3.4.]
24. Let V be the vector space of all (2 × 2) matrices and let S = {A1, A2, A3, A4}, where


A1 = [ 1 2; −1 3 ],  A2 = [ −2 1; 2 −1 ],  A3 = [ −1 −1; 1 −3 ],  and  A4 = [ −2 2; 2 0 ].

As in Example 8, find a basis for Sp(S).
25. Let V and S be as in Exercise 24. Find a subset of S that is a basis for Sp(S). [Hint: Use Theorem 5 and the technique illustrated in Example 6 of Section 3.4.]
26. In P2, let Q = {p1(x), p2(x), p3(x)}, where p1(x) = −1 + x + 2x^2, p2(x) = x + 3x^2, and p3(x) = 1 + 2x + 8x^2. Use the basis B = {1, x, x^2} to show that Q is a basis for P2.
27. Let Q be the basis for P2 given in Exercise 26. Find [p(x)]Q for p(x) = 1 + x + x^2.
28. Let Q be the basis for P2 given in Exercise 26. Find [p(x)]Q for p(x) = a0 + a1x + a2x^2.

29. In the vector space V of (2 × 2) matrices, let Q = {A1, A2, A3, A4}, where

A1 = [ 1 0; 0 0 ],  A2 = [ 1 −1; 0 0 ],  A3 = [ 0 2; 0 0 ],  and  A4 = [ −3 0; 2 1 ].

Use the corollary to Theorem 5 and the natural basis for V to show that Q is a basis for V.
30. With V and Q as in Exercise 29, find [A]Q for

A = [ 7 3; −3 −1 ].

31. With V and Q as in Exercise 29, find [A]Q for

A = [ a b; c d ].

32. Give an alternative proof that {1, x, x^2} is a linearly independent set in P2 as follows: Let p(x) = a0 + a1x + a2x^2, and suppose that p(x) = θ(x). Then p(−1) = 0, p(0) = 0, and p(1) = 0. These three equations can be used to show that a0 = a1 = a2 = 0.

33. The set {sin x, cos x} is a subset of the vector space C[−π, π]. Prove that the set is linearly independent. [Hint: Set f(x) = c1 sin x + c2 cos x, and assume that f(x) = θ(x). Then f(0) = 0 and f(π/2) = 0.]

In Exercises 34 and 35, V is the set of functions

V = {f(x): f(x) = ae^x + be^(2x) + ce^(3x) + de^(4x) for real numbers a, b, c, d}.

It can be shown that V is a vector space.
34. Show that B = {e^x, e^(2x), e^(3x), e^(4x)} is a basis for V. [Hint: To see that B is a linearly independent set, let h(x) = c1e^x + c2e^(2x) + c3e^(3x) + c4e^(4x) and assume that h(x) = θ(x). Then h′(x) = θ(x), h′′(x) = θ(x), and h′′′(x) = θ(x). Therefore, h(0) = 0, h′(0) = 0, h′′(0) = 0, and h′′′(0) = 0.]
35. Let S = {g1(x), g2(x), g3(x)} be the subset of V, where g1(x) = e^x − e^(4x), g2(x) = e^(2x) + e^(3x), and g3(x) = −e^x + e^(3x) + e^(4x). Use Theorem 5 and basis B of Exercise 34 to show that S is a linearly independent set.
36. Prove that if Q = {v1, v2, . . . , vm} is a linearly independent subset of a vector space V, and if w is a vector in V such that w is not in Sp(Q), then {v1, v2, . . . , vm, w} is also a linearly independent set in V. [Note: θ is always in Sp(Q).]
37. Let S = {v1, v2, . . . , vn} be a subset of a vector space V, where n ≥ 2. Prove that set S is linearly dependent if and only if at least one of the vectors, vj, can be expressed as a linear combination of the remaining vectors.
38. Use Exercise 37 to obtain necessary and sufficient conditions for a set {u, v} of two vectors to be linearly dependent. Determine by inspection whether each of the following sets is linearly dependent or linearly independent.

a) {1 + x, x^2}
b) {x, e^x}
c) {x, 3x}
d) {[ −1 2; 1 3 ], [ 2 −4; −2 −6 ]}
e) {[ 0 0; 0 0 ], [ 1 0; 0 1 ]}


5.5 DIMENSION

We now use Theorem 5 to generalize the idea of dimension to the general vector-space setting. We begin with two theorems that will be needed to show that dimension is a well-defined concept. These theorems are direct applications of the corollary to Theorem 5, and the proofs are left to the exercises because they are essentially the same as the proofs of the analogous theorems from Section 3.5.

Theorem 6 If V is a vector space and if B = {v1, v2, . . . , vp} is a basis of V, then any set of p + 1 vectors in V is linearly dependent.

Theorem 7 Let V be a vector space, and let B = {v1, v2, . . . , vp} be a basis for V. If Q = {u1, u2, . . . , um} is also a basis for V, then m = p.

If V is a vector space that has a basis of p vectors, then no ambiguity can arise if we define the dimension of V to be p (since the number of vectors in a basis for V is an invariant property of V by Theorem 7). There is, however, one extreme case, which is also included in Definition 6. That is, there may not be a finite set of vectors that spans V; in this case we call V an infinite-dimensional vector space.

Definition 6 Let V be a vector space.

1. If V has a basis B = {v1, v2, . . . , vn} of n vectors, then V has dimension n, and we write dim(V) = n. [If V = {θ}, then dim(V) = 0.]
2. If V is nontrivial and does not have a basis containing a finite number of vectors, then V is an infinite-dimensional vector space.

We already know from Chapter 3 that Rn has dimension n. In the preceding section it was shown that {1, x, x^2} is a basis for P2, so dim(P2) = 3. Similarly, the set {1, x, . . . , x^n} is a basis for Pn, so dim(Pn) = n + 1. The vector space V consisting of all (2 × 2) real matrices has a basis with four vectors, namely, B = {E11, E12, E21, E22}. Therefore, dim(V) = 4. More generally, the space of all (m × n) real matrices has dimension mn because the (m × n) matrices Eij, 1 ≤ i ≤ m, 1 ≤ j ≤ n, constitute a basis for the space.

Example 1 Let W be the subspace of the set of all (2 × 2) matrices defined by

W = {A = [ a b; c d ]: 2a − b + 3c + d = 0}.

Determine the dimension of W.

Solution The algebraic specification for W can be rewritten as d = −2a + b − 3c. Thus an element of W is completely determined by the three independent variables a, b, and c.


In succession, let a = 1, b = 0, c = 0; a = 0, b = 1, c = 0; and a = 0, b = 0, c = 1. This yields three matrices

A1 = [ 1 0; 0 −2 ],  A2 = [ 0 1; 0 1 ],  and  A3 = [ 0 0; 1 −3 ]

in W. The matrix A is in W if and only if A = aA1 + bA2 + cA3, so {A1, A2, A3} is a spanning set for W. It is easy to show that the set {A1, A2, A3} is linearly independent, so it is a basis for W. It follows that dim(W) = 3.
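A MATLAB spot-check of this conclusion (our addition) uses the coordinate vectors of A1, A2, and A3 relative to the natural basis {E11, E12, E21, E22}:

M = [1 0 0; 0 1 0; 0 0 1; -2 1 -3];   % columns are [A1]B, [A2]B, [A3]B
rank(M)                               % returns 3, so {A1, A2, A3} is linearly independent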

An example of an infinite-dimensional vector space is given next, in Example 2. As Example 2 illustrates, we can show that a vector space V is infinite dimensional if we can show that V contains subspaces of dimension k for k = 1, 2, 3, . . . .

If W is a subspace of a vector space V, and if dim(W) = k, then it is almost obvious that dim(V) ≥ dim(W) = k (we leave the proof of this as an exercise). This observation can be used to show that C[a, b] is an infinite-dimensional vector space.

Example 2 Show that C[a, b] is an infinite-dimensional vector space.

Solution To show that C[a, b] is not a finite-dimensional vector space, we merely note that Pn is a subspace of C[a, b] for every n. But dim(Pn) = n + 1; and so C[a, b] contains subspaces of arbitrarily large dimension. Thus C[a, b] must be an infinite-dimensional vector space.

Properties of a p-Dimensional Vector Space

The next two theorems summarize some of the properties of a p-dimensional vector space V and show how properties of Rp carry over into V.

Theorem 8 Let V be a finite-dimensional vector space with dim(V) = p.

1. Any set of p + 1 or more vectors in V is linearly dependent.
2. Any set of p linearly independent vectors in V is a basis for V.

This theorem is a direct generalization from Rp (Exercise 20). To complete our discussion of finite-dimensional vector spaces, we state the following lemma.

Lemma Let V be a vector space, and let Q = {u1, u2, . . . , up} be a spanning set for V. Then there is a subset Q′ of Q that is a basis for V.

Proof (We only sketch the proof of this lemma because the proof follows familiar lines.) If Q is linearly independent, then Q itself is a basis for V. If Q is linearly dependent, we can express some vector from Q in terms of the other p − 1 vectors in Q. Without loss of generality, let us suppose we can express u1 in terms of u2, u3, . . . , up. In that event we have

Sp{u2, u3, . . . , up} = Sp{u1, u2, u3, . . . , up} = V;

if {u2, u3, . . . , up} is linearly independent, it is a basis for V. If {u2, u3, . . . , up} is linearly dependent, we continue discarding redundant vectors until we obtain a linearly independent spanning set, Q′.


The following theorem is a companion to Theorem 8.

Theorem 9 Let V be a finite-dimensional vector space with dim(V) = p.

1. Any spanning set for V must contain at least p vectors.
2. Any set of p vectors that spans V is a basis for V.

Proof Property 1 follows immediately from the preceding lemma, for if there were a spanning set Q for V that contained fewer than p vectors, then we could find a subset Q′ of Q that is a basis for V containing fewer than p vectors. This finding would contradict Theorem 7, so property 1 must be valid.

Property 2 also follows from the lemma, because we know there is a subset Q′ of Q such that Q′ is a basis for V. Since dim(V) = p, Q′ must have p vectors, and since Q′ ⊆ Q, where Q has p vectors, we must have Q′ = Q.

Example 3 Let V be the vector space of all (2 × 2) real matrices. In V, set

A1 = [ 1 0; −1 0 ],  A2 = [ 0 1; 2 0 ],  A3 = [ 0 0; −1 3 ],
A4 = [ 1 0; −1 1 ],  and  A5 = [ 2 1; 3 1 ].

For each of the sets {A1, A2, A3}, {A1, A2, A3, A4}, and {A1, A2, A3, A4, A5}, determine whether the set is a basis for V.

Solution We have already noted that dim(V) = 4 and that B = {E11, E12, E21, E22} is a basis for V. It follows from property 1 of Theorem 9 that the set {A1, A2, A3} does not span V. Likewise, property 1 of Theorem 8 implies that {A1, A2, A3, A4, A5} is a linearly dependent set. By property 2 of Theorem 8, the set {A1, A2, A3, A4} is a basis for V if and only if it is a linearly independent set. It is straightforward to see that the set of coordinate vectors {[A1]B, [A2]B, [A3]B, [A4]B} is a linearly independent set. By Theorem 5 of Section 5.4, the set {A1, A2, A3, A4} is also linearly independent; thus the set is a basis for V.
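Again, the coordinate-vector check is easy to carry out in MATLAB (our addition):

M = [1 0 0 1; 0 1 0 0; -1 2 -1 -1; 0 0 3 1];   % columns are [A1]B, [A2]B, [A3]B, [A4]B
rank(M)                                        % returns 4, so {A1, A2, A3, A4} is a basis for V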

5.5 EXERCISES

1. Let V be the set of all real (3 × 3) matrices, and let V1 and V2 be subsets of V, where V1 consists of all the (3 × 3) lower-triangular matrices and V2 consists of all the (3 × 3) upper-triangular matrices.
a) Show that V1 and V2 are subspaces of V.
b) Find bases for V1 and V2.
c) Calculate dim(V), dim(V1), and dim(V2).
2. Suppose that V1 and V2 are subspaces of a vector space V. Show that V1 ∩ V2 is also a subspace of V. It is not necessarily true that V1 ∪ V2 is a subspace of V. Let V = R2 and find two subspaces of R2 whose union is not a subspace of R2.
3. Let V, V1, and V2 be as in Exercise 1. By Exercise 2, V1 ∩ V2 is a subspace of V. Describe V1 ∩ V2 and calculate its dimension.
4. Let V be as in Exercise 1, and let W be the subset of all the (3 × 3) symmetric matrices in V. Clearly W is a subspace of V. What is dim(W)?


5. Recall that a square matrix A is called skew symmetric if A^T = −A. Let V be as in Exercise 1 and let W be the subset of all the (3 × 3) skew-symmetric matrices in V. Calculate dim(W).
6. Let W be the subspace of P2 consisting of polynomials p(x) = a0 + a1x + a2x^2 such that 2a0 − a1 + 3a2 = 0. Determine dim(W).
7. Let W be the subspace of P4 defined thus: p(x) is in W if and only if p(1) + p(−1) = 0 and p(2) + p(−2) = 0. What is dim(W)?

In Exercises 8–13, a subset S of a vector space V is given. In each case choose one of the statements i), ii), or iii) that holds for S and verify that this is the case.
i) S is a basis for V.
ii) S does not span V.
iii) S is linearly dependent.
8. S = {1 + x − x^2, x + x^3, −x^2 + x^3}; V = P3

9. S = {1 + x^2, x − x^2, 1 + x, 2 − x + x^2}; V = P2
10. S = {1 + x + x^2, x + x^2, x^2}; V = P2

11. S = {[ 0 1; 1 0 ], [ 1 0; 0 1 ]}; V is the set of all (2 × 2) real matrices.
12. S = {[ 0 0; 0 1 ], [ 0 1; 0 1 ], [ 1 1; 1 1 ], [ 0 1; 1 1 ]}; V is the set of all (2 × 2) real matrices.

13. S = {[ 1 0; −1 0 ], [ 1 2; 1 −2 ], [ 1 −1; 1 4 ], [ 3 4; 0 4 ], [ 0 1; −1 3 ]}; V is the set of all (2 × 2) real matrices.
14. Let W be the subspace of C[−π, π] consisting of functions of the form f(x) = a sin x + b cos x. Determine the dimension of W.
15. Let V denote the set of all infinite sequences of real numbers:

V = {x: x = {xi}, i = 1, 2, . . . , with each xi in R}.

If x = {xi} and y = {yi} are in V, then x + y is the sequence {xi + yi}. If c is a real number, then cx is the sequence {cxi}.
a) Prove that V is a vector space.
b) Show that V has infinite dimension. [Hint: For each positive integer k, let sk denote the sequence sk = {eki}, where ekk = 1, but eki = 0 for i ≠ k. For each positive integer n, show that {s1, s2, . . . , sn} is a linearly independent subset of V.]

16. Let V be a vector space, and let W be a subspace of V, where dim(W) = k. Prove that if V is finite dimensional, then dim(V) ≥ k. [Hint: W must contain a set of k linearly independent vectors.]
17. Let W be a subspace of a finite-dimensional vector space V, where W contains at least one nonzero vector. Prove that W has a basis and that dim(W) ≤ dim(V). [Hint: Use Exercise 36 of Section 5.4 to show that W has a basis.]
18. Prove Theorem 6. [Hint: Let {u1, u2, . . . , uk} be a subset of V, where k ≥ p + 1. Consider the vectors [u1]B, [u2]B, . . . , [uk]B in Rp and apply Theorem 5 of Section 5.4.]

19. Prove Theorem 7.
20. Prove Theorem 8.
21. (Change of basis; see also Section 5.10.) Let V be a vector space, where dim(V) = n, and let B = {v1, v2, . . . , vn} and C = {u1, u2, . . . , un} be two bases for V. Let w be any vector in V, and suppose that w has these representations in terms of the bases B and C:

w = d1v1 + d2v2 + · · · + dnvn
w = c1u1 + c2u2 + · · · + cnun.

By considering Eq. (10) of Section 5.4, convince yourself that the coordinate vectors for w satisfy

[w]B = A[w]C,

where A is the (n × n) matrix whose ith column is equal to [ui]B, 1 ≤ i ≤ n. As an application, consider the two bases for P2: C = {1, x, x^2} and B = {1, x + 1, (x + 1)^2}.
a) Calculate the (3 × 3) matrix A just described.
b) Using the identity [p]B = A[p]C, calculate the coordinate vector of p(x) = x^2 + 4x + 8 with respect to B.
22. The matrix A in Exercise 21 is called a transition matrix and shows how to transform a representation with respect to one basis into a representation with respect to another. Use the matrix in part a) of Exercise 21 to convert p(x) = c0 + c1x + c2x^2 to the form p(x) = a0 + a1(x + 1) + a2(x + 1)^2, where:


a) p(x) = x^2 + 3x − 2;
b) p(x) = 2x^2 − 5x + 8;
c) p(x) = −x^2 − 2x + 3;
d) p(x) = x − 9.
23. By Theorem 5 of Section 5.4, an (n × n) transition matrix (see Exercises 21 and 22) is always nonsingular. Thus if [w]B = A[w]C, then [w]C = A^(−1)[w]B. Calculate A^(−1) for the matrix in part a) of Exercise 21 and use the result to transform each of the following polynomials to the form a0 + a1x + a2x^2.
a) p(x) = 2 − 3(x + 1) + 7(x + 1)^2
b) p(x) = 1 + 4(x + 1) − (x + 1)^2
c) p(x) = 4 + (x + 1)
d) p(x) = 9 − (x + 1)^2
24. Find a matrix A such that [p]B = A[p]C for all p(x) in P3, where C = {1, x, x^2, x^3} and B = {1, x, x(x − 1), x(x − 1)(x − 2)}. Use A to convert each of the following to the form p(x) = a0 + a1x + a2x(x − 1) + a3x(x − 1)(x − 2).
a) p(x) = x^3 − 2x^2 + 5x − 9
b) p(x) = x^2 + 7x − 2
c) p(x) = x^3 + 1
d) p(x) = x^3 + 2x^2 + 2x + 3
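To illustrate the transition-matrix idea of Exercises 21–24 without giving away their answers, here is a MATLAB sketch (our addition) that uses the alternative basis B = {1, x − 1, (x − 1)^2} in place of the basis in Exercise 21:

% Since 1 = 1, x = (x - 1) + 1, and x^2 = (x - 1)^2 + 2(x - 1) + 1, the
% columns of A are the B-coordinate vectors of the C-basis {1, x, x^2}.
A = [1 1 1; 0 1 2; 0 0 1];
pC = [8; 4; 1];   % [p]C for p(x) = x^2 + 4x + 8
pB = A*pC         % [13; 6; 1]: p(x) = 13 + 6(x - 1) + (x - 1)^2
A\pB              % recovers [p]C, illustrating [w]C = A^(-1)[w]B as in Exercise 23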

5.6 INNER-PRODUCT SPACES, ORTHOGONAL BASES, AND PROJECTIONS (OPTIONAL)

Up to now we have considered a vector space solely as an entity with an algebraic structure. We know, however, that Rn possesses more than just an algebraic structure; in particular, we know that we can measure the size or length of a vector x in Rn by the quantity ‖x‖ = √(x^T x). Similarly, we can define the distance from x to y as ‖x − y‖. The ability to measure distances means that Rn has a geometric structure, which supplements the algebraic structure. The geometric structure can be employed to study problems of convergence, continuity, and the like. In this section we briefly describe how a suitable measure of distance might be imposed on a general vector space. Our development will be brief, and we will leave most of the details to the reader; but the ideas parallel those in Sections 3.6 and 3.8–3.9.

Inner-Product Spaces

To begin, we observe that the geometric structure for Rn is based on the scalar product x^T y. Essentially the scalar product is a real-valued function of two vector variables: Given x and y in Rn, the scalar product produces a number x^T y. Thus to derive a geometric structure for a vector space V, we should look for a generalization of the scalar-product function. A consideration of the properties of the scalar-product function leads to the definition of an inner-product function for a vector space. (With reference to Definition 7, which follows, we note that the expression u^T v does not make sense in a general vector space V. Thus not only does the nomenclature change—scalar product becomes inner product—but also the notation changes as well, with 〈u, v〉 denoting the inner product of u and v.)


Definition 7 An inner product on a real vector space V is a function that assigns a real number, 〈u, v〉, to each pair of vectors u and v in V, and that satisfies these properties:

1. 〈u, u〉 ≥ 0 and 〈u, u〉 = 0 if and only if u = θ.
2. 〈u, v〉 = 〈v, u〉.
3. 〈au, v〉 = a〈u, v〉.
4. 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉.

The usual scalar product in Rn is an inner product in the sense of Definition 7, where 〈x, y〉 = x^T y. To illustrate the flexibility of Definition 7, we also note that there are other sorts of inner products for Rn. The following example gives another inner product for R2.

Example 1 Let V be the vector space R2, and let A be the (2 × 2) matrix

A = [ 3 2; 2 4 ].

Verify that the function 〈u, v〉 = u^T Av is an inner product for R2.

Solution Let u be a vector in R2:

u = [ u1; u2 ].

Then

〈u, u〉 = u^T Au = [u1, u2] [ 3 2; 2 4 ] [ u1; u2 ],

so 〈u, u〉 = 3u1^2 + 4u1u2 + 4u2^2 = 2u1^2 + (u1 + 2u2)^2. Thus 〈u, u〉 ≥ 0 and 〈u, u〉 = 0 if and only if u1 = u2 = 0. This shows that property 1 of Definition 7 is satisfied.

To see that property 2 of Definition 7 holds, note that A is symmetric; that is, A^T = A. Also observe that if u and v are in R2, then u^T Av is a (1 × 1) matrix, so (u^T Av)^T = u^T Av. It now follows that 〈u, v〉 = u^T Av = (u^T Av)^T = v^T A^T (u^T)^T = v^T A^T u = 〈v, u〉.

Properties 3 and 4 of Definition 7 follow easily from the properties of matrix multiplication, so 〈u, v〉 is an inner product for R2.
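A numerical spot-check of Example 1 in MATLAB (our addition; the vectors u and v are arbitrary choices) is reassuring:

A = [3 2; 2 4];
eig(A)          % both eigenvalues are positive, consistent with <u, u> > 0 for u ≠ θ
u = [1; -2]; v = [3; 5];
u'*A*v          % returns -33
v'*A*u          % also returns -33, illustrating the symmetry <u, v> = <v, u>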

In Example 1, an inner product for R2 was defined in terms of a matrix A:

〈u, v〉 = u^T Av.

In general, we might ask the following question:

“For what (n × n) matrices, A, does the operation u^T Av define an inner product on Rn?”


The answer to this question is suggested by the solution to Example 1. In particular (see Exercises 3 and 32), the operation 〈u, v〉 = u^T Av is an inner product for Rn if and only if A is a symmetric positive-definite matrix.

There are a number of ways in which inner products can be defined on spaces of functions. For example, Exercise 6 will show that

〈p, q〉 = p(0)q(0) + p(1)q(1) + p(2)q(2)

defines one inner product for P2. The following example gives yet another inner product for P2.

Example 2 For p(t) and q(t) in P2, verify that

〈p, q〉 = ∫_0^1 p(t)q(t) dt

is an inner product.

Solution To check property 1 of Definition 7, note that

〈p, p〉 = ∫_0^1 p(t)^2 dt,

and p(t)^2 ≥ 0 for 0 ≤ t ≤ 1. Thus 〈p, p〉 is the area under the curve p(t)^2, 0 ≤ t ≤ 1. Hence 〈p, p〉 ≥ 0, and equality holds if and only if p(t) = 0, 0 ≤ t ≤ 1 (see Fig. 5.5).

Properties 2, 3, and 4 of Definition 7 are straightforward to verify, and we include here only the verification of property 4. If p(t), q(t), and r(t) are in P2, then

〈p, q + r〉 = ∫_0^1 p(t)[q(t) + r(t)] dt = ∫_0^1 [p(t)q(t) + p(t)r(t)] dt
= ∫_0^1 p(t)q(t) dt + ∫_0^1 p(t)r(t) dt = 〈p, q〉 + 〈p, r〉,

as required by property 4.

[Figure 5.5: The value 〈p, p〉 is equal to the area under the graph of y = p(t)^2.]

After the key step of defining a vector-space analog of the scalar product, the rest is routine. For purposes of reference we call a vector space with an inner product an inner-product space. As in Rn, we can use the inner product as a measure of size: If V is an inner-product space, then for each v in V we define ‖v‖ (the norm of v) as

‖v‖ = √〈v, v〉.

Note that 〈v, v〉 ≥ 0 for all v in V, so the norm function is always defined.

Example 3 Use the inner product for P2 defined in Example 2 to determine ‖t^2‖.

Solution By definition, ‖t^2‖ = √〈t^2, t^2〉. But 〈t^2, t^2〉 = ∫_0^1 t^2 · t^2 dt = ∫_0^1 t^4 dt = 1/5, so ‖t^2‖ = 1/√5.
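The computation in Example 3 can be checked with numerical quadrature; the sketch below is our addition and assumes a MATLAB release that provides the integral function:

ip = integral(@(t) t.^4, 0, 1)   % returns 0.2000, i.e., <t^2, t^2> = 1/5
sqrt(ip)                         % returns 0.4472, i.e., 1/sqrt(5)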

Before continuing, we pause to illustrate one way in which the inner-product space framework is used in practice. One of the many inner products for the vector space C[0, 1] is

〈f, g〉 = ∫_0^1 f(x)g(x) dx.


If f is a relatively complicated function in C[0, 1], we might wish to approximate f by a simpler function, say a polynomial. For definiteness suppose we want to find a polynomial p in P2 that is a good approximation to f. The phrase “good approximation” is too vague to be used in any calculation, but the inner-product space framework allows us to measure size and thus to pose some meaningful problems. In particular, we can ask for a polynomial p∗ in P2 such that

‖f − p∗‖ ≤ ‖f − p‖

for all p in P2. Finding such a polynomial p∗ in this setting is equivalent to minimizing

∫_0^1 [f(x) − p(x)]^2 dx

among all p in P2. We will present a procedure for doing this shortly.

Orthogonal Bases

If u and v are vectors in an inner-product space V, we say that u and v are orthogonal if 〈u, v〉 = 0. Similarly, B = {v1, v2, . . . , vp} is an orthogonal set in V if 〈vi, vj〉 = 0 when i ≠ j. In addition, if an orthogonal set of vectors B is a basis for V, we call B an orthogonal basis. The next two theorems correspond to their analogs in Rn, and we leave the proofs to the exercises. [See Eq. (5a), Eq. (5b), and Theorem 14 of Section 3.6.]

Theorem 10 Let B = {v1, v2, . . . , vn} be an orthogonal basis for an inner-product space V. If u is any vector in V, then

u = (〈v1, u〉/〈v1, v1〉)v1 + (〈v2, u〉/〈v2, v2〉)v2 + · · · + (〈vn, u〉/〈vn, vn〉)vn.

Theorem 11 Gram–Schmidt Orthogonalization Let V be an inner-product space, and let {u1, u2, . . . , un} be a basis for V. Let v1 = u1, and for 2 ≤ k ≤ n define vk by

vk = uk − Σ_{j=1}^{k−1} (〈uk, vj〉/〈vj, vj〉) vj.

Then {v1, v2, . . . , vn} is an orthogonal basis for V.

Example 4 Let the inner product on P2 be the one given in Example 2. Starting with the natural basis {1, x, x^2}, use Gram–Schmidt orthogonalization to obtain an orthogonal basis for P2.

Solution If we let {p0, p1, p2} denote the orthogonal basis, we have p0(x) = 1 and find p1(x) from

p1(x) = x − (〈p0, x〉/〈p0, p0〉)p0(x).

We calculate

〈p0, x〉 = ∫_0^1 x dx = 1/2  and  〈p0, p0〉 = ∫_0^1 dx = 1;


so p1(x) = x − 1/2. The next step of the Gram–Schmidt orthogonalization process is to form

p2(x) = x^2 − (〈p1, x^2〉/〈p1, p1〉)p1(x) − (〈p0, x^2〉/〈p0, p0〉)p0(x).

The required constants are

〈p1, x^2〉 = ∫_0^1 (x^3 − x^2/2) dx = 1/12
〈p1, p1〉 = ∫_0^1 (x^2 − x + 1/4) dx = 1/12
〈p0, x^2〉 = ∫_0^1 x^2 dx = 1/3
〈p0, p0〉 = ∫_0^1 dx = 1.

Therefore, p2(x) = x^2 − p1(x) − p0(x)/3 = x^2 − x + 1/6, and {p0, p1, p2} is an orthogonal basis for P2 with respect to the inner product.
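Orthogonality of the basis just produced can be confirmed numerically (our addition, again assuming MATLAB's integral function):

p0 = @(x) ones(size(x));
p1 = @(x) x - 1/2;
p2 = @(x) x.^2 - x + 1/6;
integral(@(x) p0(x).*p1(x), 0, 1)   % essentially 0
integral(@(x) p0(x).*p2(x), 0, 1)   % essentially 0
integral(@(x) p1(x).*p2(x), 0, 1)   % essentially 0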

Example 5 Let B = {p0, p1, p2} be the orthogonal basis for P2 obtained in Example 4. Find the coordinates of x^2 relative to B.

Solution By Theorem 10, x^2 = a0p0(x) + a1p1(x) + a2p2(x), where

a0 = 〈p0, x^2〉/〈p0, p0〉
a1 = 〈p1, x^2〉/〈p1, p1〉
a2 = 〈p2, x^2〉/〈p2, p2〉.

The necessary calculations are

〈p0, x^2〉 = ∫_0^1 x^2 dx = 1/3
〈p1, x^2〉 = ∫_0^1 [x^3 − (1/2)x^2] dx = 1/12
〈p2, x^2〉 = ∫_0^1 [x^4 − x^3 + (1/6)x^2] dx = 1/180
〈p0, p0〉 = ∫_0^1 dx = 1
〈p1, p1〉 = ∫_0^1 [x^2 − x + 1/4] dx = 1/12
〈p2, p2〉 = ∫_0^1 [x^2 − x + 1/6]^2 dx = 1/180.

Thus a0 = 1/3, a1 = 1, and a2 = 1. We can easily check that x^2 = (1/3)p0(x) + p1(x) + p2(x).


Orthogonal Projections

We return now to the previously discussed problem of finding a polynomial p∗ in P2 that is the best approximation of a function f in C[0, 1]. Note that the problem amounts to determining a vector p∗ in a subspace of an inner-product space, where p∗ is closer to f than any other vector in the subspace. The essential aspects of this problem can be stated formally as the following general problem:

Let V be an inner-product space and let W be a subspace of V. Given a vector v in V, find a vector w∗ in W such that

‖v − w∗‖ ≤ ‖v − w‖ for all w in W. (1)

A vector w∗ in W satisfying inequality (1) is called the projection of v onto W, or (frequently) the best least-squares approximation to v. Intuitively w∗ is the nearest vector in W to v.

The solution process for this problem is almost exactly the same as that for the least-squares problem in Rn. One distinction in our general setting is that the subspace W might not be finite dimensional. If W is an infinite-dimensional subspace of V, then there may or may not be a projection of v onto W. If W is finite dimensional, then a projection always exists, is unique, and can be found explicitly. The next two theorems outline this concept, and again we leave the proofs to the reader since they parallel the proof of Theorem 18 of Section 3.9.

Theorem 12 Let V be an inner-product space, and let W be a subspace of V. Let v be a vector in V, and suppose w∗ is a vector in W such that

〈v − w∗, w〉 = 0 for all w in W.

Then ‖v − w∗‖ ≤ ‖v − w‖ for all w in W, with equality holding only for w = w∗.

Theorem 13 Let V be an inner-product space, and let v be a vector in V. Let W be an n-dimensional subspace of V, and let {u1, u2, . . . , un} be an orthogonal basis for W. Then

‖v − w∗‖ ≤ ‖v − w‖ for all w in W

if and only if

w∗ = (〈v, u1〉/〈u1, u1〉)u1 + (〈v, u2〉/〈u2, u2〉)u2 + · · · + (〈v, un〉/〈un, un〉)un. (2)

In view of Theorem 13, it follows that when W is a finite-dimensional subspace of an inner-product space V, we can always find projections by first finding an orthogonal basis for W (by using Theorem 11) and then calculating the projection w∗ from Eq. (2).

To illustrate the process of finding a projection, we return to the inner-product space C[0, 1] with the subspace P2. As a specific but rather unrealistic function, f, we choose f(x) = cos x, x in radians. The inner product is

〈f, g〉 = ∫_0^1 f(x)g(x) dx.


Example 6 In the vector space C[0, 1], let f(x) = cos x. Find the projection of f onto the subspace P2.

Solution Let {p0, p1, p2} be the orthogonal basis for P2 found in Example 4. (Note that the inner product used in Example 4 coincides with the present inner product on C[0, 1].) By Theorem 13, the projection of f onto P2 is the polynomial p∗ defined by

p∗(x) = (〈f, p0〉/〈p0, p0〉)p0(x) + (〈f, p1〉/〈p1, p1〉)p1(x) + (〈f, p2〉/〈p2, p2〉)p2(x),

where

〈f, p0〉 = ∫₀¹ cos(x) dx ≈ .841471

〈f, p1〉 = ∫₀¹ (x − 1/2) cos(x) dx ≈ −.038962

〈f, p2〉 = ∫₀¹ (x² − x + 1/6) cos(x) dx ≈ −.002394.

From Example 5, we have 〈p0, p0〉 = 1, 〈p1, p1〉 = 1/12, and 〈p2, p2〉 = 1/180. Therefore, p∗(x) is given by

p∗(x) = 〈f, p0〉p0(x) + 12〈f, p1〉p1(x) + 180〈f, p2〉p2(x)
≈ .841471p0(x) − .467544p1(x) − .430920p2(x).

In order to assess how well p∗(x) approximates cos x in the interval [0, 1], we can tabulate p∗(x) and cos x at various values of x (see Table 5.1).

Table 5.1

 x     p∗(x)    cos x    p∗(x) − cos x
0.0    1.0034   1.0000       .0034
0.2     .9789    .9801      −.0012
0.4     .9198    .9211      −.0013
0.6     .8263    .8253       .0010
0.8     .6983    .6967       .0016
1.0     .5359    .5403      −.0044
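The projection can also be computed numerically. The sketch below (Python, assuming SciPy is available) forms p∗ from Eq. (2) and should reproduce Table 5.1 up to rounding.

from math import cos
from scipy.integrate import quad

p0 = lambda x: 1.0
p1 = lambda x: x - 0.5
p2 = lambda x: x**2 - x + 1.0 / 6.0
basis = [p0, p1, p2]
ip = lambda f, g: quad(lambda t: f(t) * g(t), 0.0, 1.0)[0]   # <f, g> on C[0, 1]

# Eq. (2): p* = sum of (<f, pk>/<pk, pk>) pk
coef = [ip(cos, pk) / ip(pk, pk) for pk in basis]
pstar = lambda x: sum(c * pk(x) for c, pk in zip(coef, basis))

for xv in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"{xv:.1f}  {pstar(xv):7.4f}  {cos(xv):7.4f}  {pstar(xv) - cos(xv):+8.4f}")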

Example 7 The function Si(x) (important in applications such as optics) is defined as follows:

Si(x) = ∫₀ˣ (sin u / u) du, for x ≠ 0. (3)

The integral in (3) is not an elementary one and so, for a given value of x, Si(x) must be evaluated using a numerical integration procedure. In this example, we approximate Si(x) by a cubic polynomial for 0 ≤ x ≤ 1.


In particular, it can be shown that if we define Si(0) = 0, then Si(x) is continuous for all x. Thus we can ask:

“What is the projection of Si(x) onto the subspace P3 of C[0, 1]?”

This projection will serve as an approximation to Si(x) for 0 ≤ x ≤ 1.

Solution We used the computer algebra system Derive to carry out the calculations. Some of the steps are shown in Fig. 5.6. To begin, let {p0, p1, p2, p3} be the orthogonal basis for P3 found by the Gram–Schmidt process. From Example 4, we already know that

[Figure 5.6: Some of the steps used by Derive to generate the projection of Si(x) onto P3 in Example 7. Steps 6–9 evaluate the inner products 〈p1, x³〉 = 3/40 and 〈p2, x³〉 = 1/120; steps 15–18 construct p3(x) and evaluate 〈p3, p3〉 = 1/2800; steps 49–52 estimate 180〈Si, p2〉 ≈ −0.0804033 and 2800〈Si, p3〉 ≈ −0.0510442.]


p0(x) = 1, p1(x) = x − 1/2, and p2(x) = x² − x + 1/6. To find p3, we first calculate the inner products

〈p0, x³〉, 〈p1, x³〉, 〈p2, x³〉

(see steps 6–9 in Fig. 5.6 for 〈p1, x³〉 and 〈p2, x³〉). Using Theorem 11, we find p3 and, for later use, 〈p3, p3〉:

p3(x) = x³ − (3/2)x² + (3/5)x − 1/20,  〈p3, p3〉 = 1/2800

(see steps 15–18 in Fig. 5.6). Finally, by Theorem 13, the projection of Si(x) onto P3 is the polynomial p∗ defined by

p∗(x) = (〈Si, p0〉/〈p0, p0〉)p0(x) + (〈Si, p1〉/〈p1, p1〉)p1(x) + (〈Si, p2〉/〈p2, p2〉)p2(x) + (〈Si, p3〉/〈p3, p3〉)p3(x)
= 〈Si, p0〉p0(x) + 12〈Si, p1〉p1(x) + 180〈Si, p2〉p2(x) + 2800〈Si, p3〉p3(x).

In the expression above for p∗, the inner products 〈Si, pk〉 for k = 0, 1, 2, and 3 are given by

〈Si, pk〉 = ∫₀¹ pk(x) Si(x) dx = ∫₀¹ pk(x) {∫₀ˣ (sin u / u) du} dx

(see steps 49–52 in Fig. 5.6 for 180〈Si, p2〉 and 2800〈Si, p3〉). Now, since Si(x) must be estimated numerically, it follows that the inner products

〈Si, pk〉 must be estimated as well. Using Derive to approximate the inner products, we obtain the projection (or best least-squares approximation)

p∗(x) = .486385p0(x)+ .951172p1(x)− .0804033p2(x)− .0510442p3(x).

To assess how well p∗(x) approximates Si(x) in [0, 1], we tabulate each function at a few selected points (see Table 5.2). As can be seen from Table 5.2, it appears that p∗(x) is a very good approximation to Si(x).

Table 5.2

 x     p∗(x)      Si(x)      p∗(x) − Si(x)
0.0    .000049    .000000       .000049
0.2    .199578    .199556       .000022
0.4    .396449    .396461      −.000012
0.6    .588113    .588128      −.000015
0.8    .772119    .772095       .000024
1.0    .946018    .946083      −.000065
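Readers without access to Derive can repeat the computation with any numerical integrator. The sketch below (Python, assuming SciPy is available; scipy.special.sici evaluates Si(x)) recovers the four coefficients of p∗.

from scipy.integrate import quad
from scipy.special import sici

Si = lambda x: sici(x)[0]                # sici(x) returns the pair (Si(x), Ci(x))
polys = [lambda x: 1.0,
         lambda x: x - 0.5,
         lambda x: x**2 - x + 1.0 / 6.0,
         lambda x: x**3 - 1.5 * x**2 + 0.6 * x - 0.05]
ip = lambda f, g: quad(lambda t: f(t) * g(t), 0.0, 1.0)[0]

coef = [ip(Si, pk) / ip(pk, pk) for pk in polys]
print(coef)   # approximately [.486385, .951172, -.0804033, -.0510442]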


5.6 EXERCISES

1. Prove that 〈x, y〉 = 4x1y1 + x2y2 is an inner product on R2, where x = [x1, x2]T and y = [y1, y2]T.

2. Prove that 〈x, y〉 = a1x1y1 + a2x2y2 + · · · + anxnyn is an inner product on Rn, where a1, a2, . . . , an are positive real numbers and where x = [x1, x2, . . . , xn]T and y = [y1, y2, . . . , yn]T.

3. A real (n × n) symmetric matrix A is called positive definite if xTAx > 0 for all x in Rn, x ≠ θ. Let A be a symmetric positive-definite matrix, and verify that
〈x, y〉 = xTAy
defines an inner product on Rn; that is, verify that the four properties of Definition 7 are satisfied.

4. Prove that the following symmetric matrix A is positive definite. Prove this by choosing an arbitrary vector x in R2, x ≠ θ, and calculating xTAx.
A = [ 1 1 ; 1 2 ]

5. In P2 let p(x) = a0 + a1x + a2x² and q(x) = b0 + b1x + b2x². Prove that 〈p, q〉 = a0b0 + a1b1 + a2b2 is an inner product on P2.

6. Prove that 〈p, q〉 = p(0)q(0) + p(1)q(1) + p(2)q(2) is an inner product on P2.

7. Let A = (aij) and B = (bij) be (2 × 2) matrices. Show that 〈A, B〉 = a11b11 + a12b12 + a21b21 + a22b22 is an inner product for the vector space of all (2 × 2) matrices.

8. For x = [1, −2]T and y = [0, 1]T in R2, find 〈x, y〉, ‖x‖, ‖y‖, and ‖x − y‖ using the inner product given in Exercise 1.

9. Repeat Exercise 8 with the inner product defined in Exercise 3 and the matrix A given in Exercise 4.

10. In P2 let p(x) = −1 + 2x + x² and q(x) = 1 − x + 2x². Using the inner product given in Exercise 5, find 〈p, q〉, ‖p‖, ‖q‖, and ‖p − q‖.

11. Repeat Exercise 10 using the inner product defined in Exercise 6.

12. Show that {1, x, x²} is an orthogonal basis for P2 with the inner product defined in Exercise 5 but not with the inner product in Exercise 6.

13. In R2 let S = {x: ‖x‖ = 1}. Sketch a graph of S if 〈x, y〉 = xTy. Now graph S using the inner product given in Exercise 1.

14. Let A be the matrix given in Exercise 4, and for x, y in R2 define 〈x, y〉 = xTAy (see Exercise 3). Starting with the natural basis {e1, e2}, use Theorem 11 to obtain an orthogonal basis {u1, u2} for R2.

15. Let {u1, u2} be the orthogonal basis for R2 obtained in Exercise 14 and let v = [3, 4]T. Use Theorem 10 to find scalars a1, a2 such that v = a1u1 + a2u2.

16. Use Theorem 11 to calculate an orthogonal basis {p0, p1, p2} for P2 with respect to the inner product in Exercise 6. Start with the natural basis {1, x, x²} for P2.

17. Use Theorem 10 to write q(x) = 2 + 3x − 4x² in terms of the orthogonal basis {p0, p1, p2} obtained in Exercise 16.

18. Show that the function defined in Exercise 6 is not an inner product for P3. [Hint: Find p(x) in P3 such that 〈p, p〉 = 0, but p ≠ θ.]

19. Starting with the natural basis {1, x, x², x³, x⁴}, generate an orthogonal basis for P4 with respect to the inner product
〈p, q〉 = ∑_{i=−2}^{2} p(i)q(i).

20. If V is an inner-product space, show that 〈v, θ〉 = 0 for each vector v in V.

21. Let V be an inner-product space, and let u be a vector in V such that 〈u, v〉 = 0 for every vector v in V. Show that u = θ.

22. Let a be a scalar and v a vector in an inner-product space V. Prove that ‖av‖ = |a|‖v‖.

23. Prove that if {v1, v2, . . . , vk} is an orthogonal set of nonzero vectors in an inner-product space, then this set is linearly independent.

24. Prove Theorem 10.


25. Approximate x³ with a polynomial in P2. [Hint: Use the inner product
〈p, q〉 = ∫₀¹ p(t)q(t) dt,
and let {p0, p1, p2} be the orthogonal basis for P2 obtained in Example 4. Now apply Theorem 13.]

26. In Examples 4 and 7 we found p0(x), . . . , p3(x), which are orthogonal with respect to
〈f, g〉 = ∫₀¹ f(x)g(x) dx.
Continue the process, and find p4(x) so that {p0, p1, . . . , p4} is an orthogonal basis for P4. (Clearly there is an infinite sequence of polynomials p0, p1, . . . , pn, . . . that satisfy
∫₀¹ pi(x)pj(x) dx = 0, i ≠ j.
These are called the Legendre polynomials.)

27. With the orthogonal basis for P3 obtained in Example 7, use Theorem 13 to find the projection of f(x) = cos x in P3. Construct a table similar to Table 5.1 and note the improvement.

28. An inner product on C[−1, 1] is
〈f, g〉 = (2/π) ∫₋₁¹ [f(x)g(x)/√(1 − x²)] dx.
Starting with the set {1, x, x², x³, . . .}, use the Gram–Schmidt process to find polynomials T0(x), T1(x), T2(x), and T3(x) such that 〈Ti, Tj〉 = 0 when i ≠ j. These polynomials are called the Chebyshev polynomials of the first kind. [Hint: Make a change of variables x = cos θ.]

29. A sequence of orthogonal polynomials usually satisfies a three-term recurrence relation. For example, the Chebyshev polynomials are related by
Tn+1(x) = 2xTn(x) − Tn−1(x), n = 1, 2, . . . ,  (R)
where T0(x) = 1 and T1(x) = x. Verify that the polynomials defined by the relation (R) above are indeed orthogonal in C[−1, 1] with respect to the inner product in Exercise 28. Verify this as follows:
a) Make the change of variables x = cos θ, and use induction to show that Tk(cos θ) = cos kθ, k = 0, 1, . . . , where Tk(x) is defined by (R).
b) Using part a), show that 〈Ti, Tj〉 = 0 when i ≠ j.
c) Use induction to show that Tk(x) is a polynomial of degree k, k = 0, 1, . . . .
d) Use (R) to calculate T2, T3, T4, and T5.

30. Let C[−1, 1] have the inner product of Exercise 28, and let f be in C[−1, 1]. Use Theorem 13 to prove that ‖f − p∗‖ ≤ ‖f − p‖ for all p in Pn if
p∗(x) = a0/2 + ∑_{j=1}^{n} ajTj(x),
where aj = 〈f, Tj〉, j = 0, 1, . . . , n.

31. The iterated trapezoid rule provides a good estimate of ∫ₐᵇ f(x) dx when f(x) is periodic in [a, b]. In particular, let N be a positive integer, and let h = (b − a)/N. Next, define xi by xi = a + ih, i = 0, 1, . . . , N, and suppose f(x) is in C[a, b]. If we define A(f) by
A(f) = (h/2)f(x0) + h ∑_{j=1}^{N−1} f(xj) + (h/2)f(xN),
then A(f) is the iterated trapezoid rule applied to f(x). Using the result in Exercise 30, write a computer program that generates a good approximation to f(x) in C[−1, 1]. That is, for an input function f(x) and a specified value of n, calculate estimates of a0, a1, . . . , an, where
ak = 〈f, Tk〉 ≈ A(fTk).
To do this calculation, make the usual change of variables x = cos θ so that
ak = (2/π) ∫₀^π f(cos θ) cos(kθ) dθ, k = 0, 1, . . . , n.
Use the iterated trapezoid rule to estimate each ak. Test your program on f(x) = e²ˣ and note that (R) can be used to evaluate p∗(x) at any point x in [−1, 1].

32. Show that if A is a real (n × n) matrix and if the expression 〈u, v〉 = uTAv defines an inner product on Rn, then A must be symmetric and positive definite (see Exercise 3 for the definition of positive definite). [Hint: Consider 〈ei, ej〉.]


5.7 LINEAR TRANSFORMATIONS

Linear transformations on subspaces of Rn were introduced in Section 3.7. The definition given there extends naturally to the general vector-space setting. In this section and the next, we develop the basic properties of linear transformations, and in Section 5.8 we will use linear transformations and the concept of coordinate vectors to show that an n-dimensional vector space is essentially just Rn.

If T: Rn → Rm is a linear transformation, there exists an (m × n) matrix A such that T(x) = Ax. Although this is not the case in the general vector-space setting, we will show in Section 5.9 that there is still a close relationship between linear transformations and matrices, provided that the domain space is finite dimensional.

We begin with the definition of a linear transformation.

Definition 8 Let U and V be vector spaces, and let T be a function from U to V, T: U → V. We say that T is a linear transformation if, for all u and w in U and all scalars a,

T (u+ w) = T (u)+ T (w)

and

T (au) = aT (u).

Examples of Linear Transformations

To illustrate Definition 8, we now provide several examples of linear transformations.

Example 1 Let T : P2 → R1 be defined by T (p) = p(2). Verify that T is a linear transformation.

Solution First note that R1 is just the set R of real numbers, but in this context R is regarded as a vector space. To illustrate the definition of T, if p(x) = x² − 3x + 1, then T(p) = p(2) = −1.

To verify that T is a linear transformation, let p(x) and q(x) be in P2 and let a be a scalar. Then T(p + q) = (p + q)(2) = p(2) + q(2) = T(p) + T(q). Likewise, T(ap) = (ap)(2) = ap(2) = aT(p). Thus T is a linear transformation.

In general, if W is any subspace of C[a, b] and if x0 is any number in [a, b], then the function T: W → R1 defined by T(f) = f(x0) is a linear transformation.

Example 2 Let V be a p-dimensional vector space with basis B = {v1, v2, . . . , vp}. Show that T: V → Rp defined by T(v) = [v]B is a linear transformation.

Solution That T is a linear transformation is a direct consequence of the lemma in Section 5.4. Specifically, if u and v are vectors in V, then T(u + v) = [u + v]B = [u]B + [v]B = T(u) + T(v). Also, if a is a scalar, then T(au) = [au]B = a[u]B = aT(u).


Example 3 Let T: C[0, 1] → R1 be defined by

T(f) = ∫₀¹ f(t) dt.

Prove that T is a linear transformation.

Solution If f(x) and g(x) are functions in C[0, 1], then

T(f + g) = ∫₀¹ [f(t) + g(t)] dt = ∫₀¹ f(t) dt + ∫₀¹ g(t) dt = T(f) + T(g).

Likewise, if a is a scalar, the properties of integration give

T(af) = ∫₀¹ af(t) dt = a ∫₀¹ f(t) dt = aT(f).

Therefore, T is a linear transformation.
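Because T(f) is an ordinary definite integral, the two linearity properties can also be observed numerically. A minimal sketch (Python, assuming SciPy), with sin and exp chosen arbitrarily as test functions:

from math import sin, exp
from scipy.integrate import quad

T = lambda f: quad(f, 0.0, 1.0)[0]             # T(f) = integral of f over [0, 1]

f, g, a = sin, exp, 2.5
print(T(lambda t: f(t) + g(t)), T(f) + T(g))   # equal, up to roundoff
print(T(lambda t: a * f(t)), a * T(f))         # equal, up to roundoff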

Example 4 Let C1[0, 1] denote the set of all functions that have a continuous first derivative in the interval [0, 1]. (Note that C1[0, 1] is a subspace of C[0, 1].) Let k(x) be a fixed function in C[0, 1] and define T: C1[0, 1] → C[0, 1] by

T (f ) = f ′ + kf.

Verify that T is a linear transformation.

Solution To illustrate the definition of T, suppose, for example, that k(x) = x². If f(x) = sin x, then T(f) is the function defined by T(f)(x) = f′(x) + k(x)f(x) = cos x + x² sin x.

To see that T is a linear transformation, let g and h be functions in C1[0, 1]. Then

T (g + h) = (g + h)′ + k(g + h)

= g′ + h′ + kg + kh

= (g′ + kg)+ (h′ + kh)

= T (g)+ T (h).

Also, for a scalar c, T(cg) = (cg)′ + k(cg) = c(g′ + kg) = cT(g). Hence T is a linear transformation.
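The action of this differential operator is easy to experiment with symbolically. A sketch (Python, assuming SymPy), using k(x) = x² as in the solution and a few arbitrary test functions:

import sympy as sp

x = sp.symbols('x')
k = x**2                                  # the fixed function k(x)
T = lambda f: sp.diff(f, x) + k * f       # T(f) = f' + kf

print(T(sp.sin(x)))                       # cos(x) + x**2*sin(x), as computed above

g, h, c = sp.exp(x), x**3, 5
print(sp.expand(T(g + h) - T(g) - T(h)))  # 0, so T(g + h) = T(g) + T(h)
print(sp.expand(T(c * g) - c * T(g)))     # 0, so T(cg) = cT(g)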

The linear transformation in Example 4 is an example of a differential operator. We will return to differential operators later and only mention here that the term operator is traditional in the study of differential equations. Operator is another term for function or transformation, and we could equally well speak of T as a differential transformation.


For any vector space V, the mapping I: V → V defined by I(v) = v is a linear transformation called the identity transformation. Between any two vector spaces U and V, there is always at least one linear transformation, called the zero transformation. If θV is the zero vector in V, then the zero transformation T: U → V is defined by T(u) = θV for all u in U.

Properties of Linear Transformations

One of the important features of the two linearity properties in Definition 8 is that if T: U → V is a linear transformation and if U is a finite-dimensional vector space, then the action of T on U is completely determined by the action of T on a basis for U. To see why this statement is true, suppose U has a basis B = {u1, u2, . . . , up}. Then given any u in U, we know that u can be expressed uniquely as

u = a1u1 + a2u2 + · · · + apup.

From this expression it follows that T (u) is given by

T(u) = T(a1u1 + a2u2 + · · · + apup) = a1T(u1) + a2T(u2) + · · · + apT(up). (1)

Clearly Eq. (1) shows that if we know the vectors T(u1), T(u2), . . . , T(up), then we know T(u) for any u in U; T is completely determined once T is defined on the basis. The next example illustrates this concept.

Example 5 Let T: P3 → P2 be a linear transformation such that T(1) = 1 − x, T(x) = x + x², T(x²) = 1 + 2x, and T(x³) = 2 − x². Find T(2 − 3x + x² − 2x³).

Solution Applying Eq. (1) yields

T(2 − 3x + x² − 2x³) = 2T(1) − 3T(x) + T(x²) − 2T(x³)
= 2(1 − x) − 3(x + x²) + (1 + 2x) − 2(2 − x²)
= −1 − 3x − x².

Similarly,

T(a0 + a1x + a2x² + a3x³) = a0T(1) + a1T(x) + a2T(x²) + a3T(x³)
= a0(1 − x) + a1(x + x²) + a2(1 + 2x) + a3(2 − x²)
= (a0 + a2 + 2a3) + (−a0 + a1 + 2a2)x + (a1 − a3)x².
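Since the action of T is determined by the four images T(1), T(x), T(x²), T(x³), the computation above can be organized as a matrix–vector product on coordinate vectors, an idea developed fully in Section 5.9. A sketch (Python with NumPy):

import numpy as np

# Columns hold the coordinates of T(1), T(x), T(x^2), T(x^3) relative to {1, x, x^2}.
M = np.array([[ 1, 0, 1,  2],
              [-1, 1, 2,  0],
              [ 0, 1, 0, -1]])

p = np.array([2, -3, 1, -2])   # coefficients of 2 - 3x + x^2 - 2x^3
print(M @ p)                   # [-1 -3 -1], i.e., T(p) = -1 - 3x - x^2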

Before giving further properties of linear transformations, we require several definitions. Let T: U → V be a linear transformation, and for clarity let us denote the zero vectors in U and V as θU and θV, respectively. The null space (or kernel) of T, denoted by N(T), is the subset of U defined by

N(T) = {u in U: T(u) = θV}.

The range of T, denoted by R(T), is the subset of V defined by

R(T ) = {v in V : v = T (u) for some u in U}.


As before, the dimension of N(T) is called the nullity of T and is denoted by nullity(T). Likewise, the dimension of R(T) is called the rank of T and is denoted by rank(T). Finally, we say a linear transformation is one to one if T(u) = T(w) implies u = w for all u and w in U. Some of the elementary properties of linear transformations are given in the next theorem.

Theorem 14 Let T : U → V be a linear transformation. Then:

1. T(θU) = θV.
2. N(T) is a subspace of U.
3. R(T) is a subspace of V.
4. T is one to one if and only if N(T) = {θU}; that is, T is one to one if and only if nullity(T) = 0.

Proof To prove property 1, note that 0θU = θU , so

T(θU) = T(0θU) = 0T(θU) = θV.

To prove property 2, we must verify that N(T) satisfies the three properties of Theorem 2 in Section 5.3. It follows from property 1 that θU is in N(T). Next, let u1 and u2 be in N(T) and let a be a scalar. Then T(u1 + u2) = T(u1) + T(u2) = θV + θV = θV, so u1 + u2 is in N(T). Similarly, T(au1) = aT(u1) = aθV = θV, so au1 is in N(T). Therefore, N(T) is a subspace of U.

The proof of property 3 is left as an exercise. To prove property 4, suppose that N(T) = {θU}. In order to show that T is one to one, let u and w be vectors in U such that T(u) = T(w). Then θV = T(u) − T(w) = T(u) + (−1)T(w) = T[u + (−1)w] = T(u − w). It follows that u − w is in N(T). But N(T) = {θU}, so u − w = θU. Therefore, u = w and T is one to one. The converse is Exercise 24.

When T: Rn → Rm is given by T(x) = Ax, with A an (m × n) matrix, then N(T) is the null space of A and R(T) is the range of A. In this setting, property 4 of Theorem 14 states that a consistent system of equations Ax = b has a unique solution if and only if the trivial solution is the unique solution for the homogeneous system Ax = θ.

The following theorem gives additional properties of a linear transformation T: U → V, where U is a finite-dimensional vector space.

Theorem 15 Let T: U → V be a linear transformation and let U be p-dimensional, where B = {u1, u2, . . . , up} is a basis for U.

1. R(T) = Sp{T(u1), T(u2), . . . , T(up)}.
2. T is one to one if and only if {T(u1), T(u2), . . . , T(up)} is linearly independent in V.
3. rank(T) + nullity(T) = p.

Proof Property 1 is immediate from Eq. (1). That is, if v is in R(T), then v = T(u) for some u in U. But B is a basis for U; so u is of the form u = a1u1 + a2u2 + · · · + apup; and hence T(u) = v = a1T(u1) + a2T(u2) + · · · + apT(up). Therefore, v is in Sp{T(u1), T(u2), . . . , T(up)}.


To prove property 2, we can use property 4 of Theorem 14; T is one to one if and only if θU is the only vector in N(T). In particular, let us suppose that u is some vector in N(T), where u = b1u1 + b2u2 + · · · + bpup. Then T(u) = θV, or

b1T (u1)+ b2T (u2)+ · · · + bpT (up) = θV . (2)

If {T(u1), T(u2), . . . , T(up)} is a linearly independent set in V, then the only scalars satisfying Eq. (2) are b1 = b2 = · · · = bp = 0. Therefore, u must be θU; so T is one to one. On the other hand, if T is one to one, then there cannot be a nontrivial solution to Eq. (2); for if there were, N(T) would contain the nonzero vector u.

To prove property 3, we first note that 0 ≤ rank(T) ≤ p by property 1. We leave the two extreme cases, rank(T) = p and rank(T) = 0, to the exercises and consider only 0 < rank(T) < p. [Note that rank(T) < p implies that nullity(T) ≥ 1, so T is not one to one. We mention this point because we will need to choose a basis for N(T) below.]

It is conventional to let r denote rank(T), so let us suppose R(T) has a basis of r vectors, {v1, v2, . . . , vr}. From the definition of R(T), we know there are vectors w1, w2, . . . , wr in U such that

T (wi ) = vi , 1 ≤ i ≤ r. (3)

Now suppose that nullity(T) = k and let {x1, x2, . . . , xk} be a basis for N(T). We now show that the set

Q = {x1, x2, . . . , xk, w1, w2, . . . , wr}

is a basis for U (therefore, k + r = p, which proves property 3).

We first establish that Q is a linearly independent set in U by considering

c1x1 + c2x2 + · · · + ckxk + a1w1 + a2w2 + · · · + arwr = θU . (4)

Applying T to both sides of Eq. (4) yields

T (c1x1 + · · · + ckxk + a1w1 + · · · + arwr ) = T (θU). (5a)

Using Eq. (1) and property 1 of Theorem 14, Eq. (5a) becomes

c1T (x1)+ · · · + ckT (xk)+ a1T (w1)+ · · · + arT (wr ) = θV . (5b)

Since each xi is in N (T ) and T (wi ) = vi , Eq. (5b) becomes

a1v1 + a2v2 + · · · + arvr = θV . (5c)

Since the set {v1, v2, . . . , vr} is linearly independent, a1 = a2 = · · · = ar = 0. The vector equation (4) now becomes

c1x1 + c2x2 + · · · + ckxk = θU . (6)

But {x1, x2, . . . , xk} is a linearly independent set in U, so we must have c1 = c2 = · · · = ck = 0. Therefore, Q is a linearly independent set.

To complete the argument, we need to show that Q is a spanning set for U. So let u be any vector in U. Then v = T(u) is a vector in R(T); so

T (u) = b1v1 + b2v2 + · · · + brvr .


Consider an associated vector x in U , where x is defined by

x = b1w1 + b2w2 + · · · + brwr . (7)

We observe that T (u− x) = θV ; so obviously u− x is in N (T ) and can be written as

u− x = d1x1 + d2x2 + · · · + dkxk. (8)

Placing x on the right-hand side of Eq. (8) and using Eq. (7), we have shown that u is a linear combination of vectors in Q. Thus Q is a basis for U, and property 3 is proved since k + r must equal p.

As the following example illustrates, property 1 of Theorem 15 and the techniques of Section 5.4 give a method for obtaining a basis for R(T).

Example 6 Let V be the vector space of all (2 × 2) matrices, and let T: P3 → V be the linear transformation defined by

T(a0 + a1x + a2x² + a3x³) = [ a0 + a2   a0 + a3 ]
                            [ a1 + a2   a1 + a3 ].

Find a basis for R(T) and determine rank(T) and nullity(T). Finally, show that T is not one to one.

Solution By property 1 of Theorem 15, R(T) = Sp{T(1), T(x), T(x²), T(x³)}. Thus R(T) = Sp(S), where S = {A1, A2, A3, A4} and

A1 = [ 1 1 ; 0 0 ],  A2 = [ 0 0 ; 1 1 ],  A3 = [ 1 0 ; 1 0 ],  and  A4 = [ 0 1 ; 0 1 ].

Let B be the natural basis for V: B = {E11, E12, E21, E22}. Form the (4 × 4) matrix C with column vectors [A1]B, [A2]B, [A3]B, [A4]B; thus

C = [ 1 0 1 0 ]
    [ 1 0 0 1 ]
    [ 0 1 1 0 ]
    [ 0 1 0 1 ].

EMMY NOETHER Emmy Noether (1882–1935) is the most heralded female mathematician of the early twentieth century. Overcoming great obstacles for women in mathematics at the time, she received her doctorate from Göttingen and went on to work with David Hilbert and Felix Klein on the general theory of relativity. Among her most highly regarded results are the representation of noncommutative algebras as linear transformations and Noether’s Theorem, which is used to explain the correspondences between certain invariants and physical conservation laws. She fled from Germany in 1933 and spent the last two years of her life on the faculty at Bryn Mawr College in Philadelphia.


The matrix CT reduces to the matrix

DT = [ 1 1 0 0 ]
     [ 0 1 0 1 ]
     [ 0 0 1 1 ]
     [ 0 0 0 0 ]

in echelon form. Therefore,

D = [ 1 0 0 0 ]
    [ 1 1 0 0 ]
    [ 0 0 1 0 ]
    [ 0 1 1 0 ],

and the nonzero columns of D constitute a basis for the subspace Sp{[A1]B, [A2]B, [A3]B, [A4]B} of R4. If the matrices B1, B2, and B3 are defined by

B1 = [ 1 1 ; 0 0 ],  B2 = [ 0 1 ; 0 1 ],  and  B3 = [ 0 0 ; 1 1 ],

then [B1]B, [B2]B, and [B3]B are the nonzero columns of D. It now follows from Theorem 5 of Section 5.4 that {B1, B2, B3} is a basis for R(T).

By property 3 of Theorem 15,

dim(P3) = rank(T )+ nullity(T ).

We have just shown that rank(T) = 3. Since dim(P3) = 4, it follows that nullity(T) = 1. In particular, T is not one to one by property 4 of Theorem 14.
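The rank computation can be checked numerically; a short sketch (Python with NumPy) applied to the matrix C above:

import numpy as np

C = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [0, 1, 0, 1]])
r = np.linalg.matrix_rank(C)
print(r, 4 - r)                # rank(T) = 3 and nullity(T) = 4 - 3 = 1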

Example 7 Let T: P2 → R1 be defined by T(p(x)) = p(2). Exhibit a basis for N(T) and determine the rank and nullity of T.

Solution By definition, T(a0 + a1x + a2x²) = a0 + 2a1 + 4a2. Thus

N(T) = {p(x): p(x) is in P2 and a0 + 2a1 + 4a2 = 0}.

In the algebraic specification for N(T), a1 and a2 can be designated as unconstrained variables, and a0 = −2a1 − 4a2. Thus p(x) in N(T) can be decomposed as

p(x) = (−2a1 − 4a2) + a1x + a2x² = a1(−2 + x) + a2(−4 + x²).

It follows that {−2 + x, −4 + x²} is a basis for N(T). In particular, nullity(T) = 2. Then rank(T) = dim(P2) − nullity(T) = 3 − 2 = 1.

We have already noted that if A is an (m × n) matrix and T: Rn → Rm is defined by T(x) = Ax, then R(T) = R(A) and N(T) = N(A). The following corollary, given as a remark in Section 3.5, is now an immediate consequence of these observations and property 3 of Theorem 15.

Corollary If A is an (m× n) matrix, then

n = rank(A)+ nullity(A).


5.7 EXERCISES

In Exercises 1–4, V is the vector space of all (2 × 2) matrices and A has the form

A = [ a b ; c d ].

Determine whether the function T: V → R1 is a linear transformation.

1. T(A) = det(A)

2. T(A) = a + 2b − c + d

3. T(A) = tr(A), where tr(A) denotes the trace of A and is defined by tr(A) = a + d.

4. T(A) = (a − d)(b − c)

In Exercises 5–8, determine whether T is a linear transformation.

5. T: C1[−1, 1] → R1 defined by T(f) = f′(0)

6. T: C[0, 1] → C[0, 1] defined by T(f) = g, where g(x) = eˣf(x)

7. T: P2 → P2 defined by T(a0 + a1x + a2x²) = (a0 + 1) + (a1 + 1)x + (a2 + 1)x²

8. T: P2 → P2 defined by T(p(x)) = p(0) + xp′(x)

9. Suppose that T: P2 → P3 is a linear transformation, where T(1) = 1 + x², T(x) = x² − x³, and T(x²) = 2 + x³.
a) Find T(p), where p(x) = 3 − 2x + 4x².
b) Give a formula for T; that is, find T(a0 + a1x + a2x²).

10. Suppose that T: P2 → P4 is a linear transformation, where T(1) = x⁴, T(x + 1) = x³ − 2x, and T(x² + 2x + 1) = x. Find T(p) and T(q), where p(x) = x² + 5x − 1 and q(x) = x² + 9x + 5.

11. Let V be the set of all (2 × 2) matrices and suppose that T: V → P2 is a linear transformation such that T(E11) = 1 − x, T(E12) = 1 + x + x², T(E21) = 2x − x², and T(E22) = 2 + x − 2x².
a) Find T(A), where A = [ −2 2 ; 3 4 ].
b) Give a formula for T; that is, find T([ a b ; c d ]).

12. With V as in Exercise 11, define T: V → R2 by
T([ a b ; c d ]) = [a + 2d, b − c]T.
a) Prove that T is a linear transformation.
b) Give an algebraic specification for N(T).
c) Exhibit a basis for N(T).
d) Determine the nullity and the rank of T.
e) Without doing any calculations, argue that R(T) = R2.
f) Prove R(T) = R2 as follows: Let v be in R2, v = [x, y]T. Exhibit a (2 × 2) matrix A in V such that T(A) = v.

13. Let T: P4 → P2 be the linear transformation defined by T(p) = p′′(x).
a) Exhibit a basis for R(T) and conclude that R(T) = P2.
b) Determine the nullity of T and conclude that T is not one to one.
c) Give a direct proof that R(T) = P2; that is, for p(x) = a0 + a1x + a2x² in P2, exhibit a polynomial q(x) in P4 such that T(q) = p.

14. Define T: P4 → P3 by
T(a0 + a1x + a2x² + a3x³ + a4x⁴)
= (a0 − a1 + 2a2 − a3 + a4)
+ (−a0 + 3a1 − 2a2 + 3a3 − a4)x
+ (2a0 − 3a1 + 5a2 − a3 + a4)x²
+ (3a0 − a1 + 7a2 + 2a3 + 2a4)x³.
Find a basis for R(T) (see Example 6) and show that T is not one to one.

15. Identify N(T) and R(T) for the linear transformation T given in Example 1.

16. Identify N(T) and R(T) for the linear transformation T given in Example 3.

17. Let I: V → V be defined by I(v) = v for each v in V.
a) Prove that I is a linear transformation.
b) Determine N(I) and R(I).


18. Let U and V be vector spaces and define T: U → V by T(u) = θV for each u in U.
a) Prove that T is a linear transformation.
b) Determine N(T) and R(T).

19. Suppose that T: P4 → P2 is a linear transformation. Enumerate the various possibilities for rank(T) and nullity(T). Can T possibly be one to one?

20. Let T: U → V be a linear transformation and let U be finite dimensional. Prove that if dim(U) > dim(V), then T cannot be one to one.

21. Suppose that T: R3 → P3 is a linear transformation. Enumerate the various possibilities for rank(T) and nullity(T). Is R(T) = P3 a possibility?

22. Let T: U → V be a linear transformation and let U be finite dimensional. Prove that if dim(U) < dim(V), then R(T) = V is not possible.

23. Prove property 3 of Theorem 14.

24. Complete the proof of property 4 of Theorem 14 by showing that if T is one to one, then N(T) = {θU}.

25. Complete the proof of property 3 of Theorem 15 as follows:
a) If rank(T) = p, prove that nullity(T) = 0.
b) If rank(T) = 0, show that nullity(T) = p.

26. Let T: Rn → Rn be defined by T(x) = Ax, where A is an (n × n) matrix. Use property 4 of Theorem 14 to show that T is one to one if and only if A is nonsingular.

27. Let V be the vector space of all (2 × 2) matrices and define T: V → V by T(A) = AT.
a) Show that T is a linear transformation.
b) Determine the nullity and rank of T. Conclude that T is one to one and R(T) = V.
c) Show directly that R(T) = V; that is, for B in V exhibit a matrix C in V such that T(C) = B.

5.8 OPERATIONS WITH LINEAR TRANSFORMATIONS

We know that a useful arithmetic structure is associated with matrices: Matrices can be added and multiplied, nonsingular matrices have inverses, and so on. Much of this structure is available also for linear transformations. For our explanation we will need some definitions. Let U and V be vector spaces and let T1 and T2 be linear transformations, where T1: U → V and T2: U → V. By the sum T3 = T1 + T2, we mean the function T3: U → V, where T3(u) = T1(u) + T2(u) for all u in U. The following example illustrates this concept.

Example 1 Let T1: P4 → P2 be given by T1(p) = p′′(x), and suppose that T2: P4 → P2 is defined by T2(p) = xp(1). If S = T1 + T2, give the formula for S.

Solution By definition, the sum T1 + T2 is the linear transformation S: P4 → P2 defined by S(p) = T1(p) + T2(p) = p′′(x) + xp(1).

If T: U → V is a linear transformation and a is a scalar, then aT denotes the function aT: U → V defined by aT(u) = a(T(u)) for all u in U. Again, we illustrate with an example.

Example 2 Let V be the vector space of all (2 × 2) matrices and define T: V → R1 by T(A) = 2a − b + 3c + 4d, where

A = [ a b ; c d ].

Give the formula for 3T .


Solution By definition, 3T (A) = 3(T (A)) = 3(2a − b + 3c + 4d) = 6a − 3b + 9c + 12d.

It is straightforward to show that the functions T1 + T2 and aT, previously defined, are linear transformations (see Exercises 13 and 14).

Now let U, V, and W be vector spaces and let S and T be linear transformations, where S: U → V and T: V → W. The composition, L = T ◦ S, of S and T is defined to be the function L: U → W given by L(u) = T(S(u)) for all u in U (see Fig. 5.7).

[Figure 5.7: The composition of linear transformations is a linear transformation (see Example 3). S maps u in U to S(u) in V; T then maps S(u) to T(S(u)) in W; the composition T ◦ S maps U directly into W.]

Example 3 Let S: U → V and T: V → W be linear transformations. Verify that the composition L = T ◦ S is also a linear transformation.

Solution Let u1, u2 be vectors in U. Then L(u1 + u2) = T(S(u1 + u2)). Since S is a linear transformation, S(u1 + u2) = S(u1) + S(u2). But T is also a linear transformation, so L(u1 + u2) = T(S(u1) + S(u2)) = T(S(u1)) + T(S(u2)) = L(u1) + L(u2). Similarly, if u is in U and a is a scalar, L(au) = T(S(au)) = T(aS(u)) = aT(S(u)) = aL(u). This shows that L = T ◦ S is a linear transformation.

The next two examples provide specific illustrations of the composition of two linear transformations.

Example 4 Let U be the vector space of all (2 × 2) matrices. Define S: U → P2 by S(A) = (a − c) + (b + 2c)x + (3c − d)x², where

A = [ a b ; c d ].

Define T: P2 → R2 by

T(a0 + a1x + a2x²) = [a0 − a1, 2a1 + a2]T.

Give the formula for T ◦ S and show that S ◦ T is not defined.

Solution The composition T ◦ S: U → R2 is defined by (T ◦ S)(A) = T(S(A)) = T[(a − c) + (b + 2c)x + (3c − d)x²]. Thus

(T ◦ S)(A) = [a − b − 3c, 2b + 7c − d]T.

If p(x) is in P2, then T(p(x)) = v, where v is in R2. Thus (S ◦ T)(p(x)) = S(T(p(x))) = S(v). But v is not in the domain of S, so S(v) is not defined. Therefore, S ◦ T is undefined.


Example 4 illustrates that, as with matrix multiplication, T ◦ S may be defined whereas S ◦ T is not defined. The next example illustrates that even when both are defined, they may be different transformations.

Example 5 Let S: P4 → P4 be given by S(p) = p′′(x) and define T: P4 → P4 by T(q) = xq(1). Give the formulas for T ◦ S and S ◦ T.

Solution The linear transformation T ◦ S: P4 → P4 is defined by

(T ◦ S)(p) = T (S(p)) = T (p′′(x)) = xp′′(1),

and S ◦ T : P4 → P4 is given by

(S ◦ T)(p) = S(T(p)) = S(xp(1)) = [xp(1)]′′ = θ(x).

In particular, S ◦ T ≠ T ◦ S.
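A quick symbolic experiment confirms that the two compositions differ. A sketch (Python, assuming SymPy), with an arbitrary test polynomial in P4:

import sympy as sp

x = sp.symbols('x')
S = lambda p: sp.diff(p, x, 2)      # S(p) = p''(x)
T = lambda q: x * q.subs(x, 1)      # T(q) = x q(1)

p = 1 + x + x**2 + x**3 + x**4      # an arbitrary test polynomial
print(sp.expand(T(S(p))))           # 20*x, that is, x p''(1)
print(sp.expand(S(T(p))))           # 0, the zero polynomial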

Invertible Transformations

As we have previously noted, linear transformations can be viewed as an extension of the notion of a matrix to general vector spaces. In this subsection we introduce those linear transformations that correspond to nonsingular (or invertible) matrices. First, suppose X and Y are any sets and f: X → Y is a function; and suppose R(f) denotes the range of f, where R(f) ⊆ Y. Recall that f is onto provided that R(f) = Y; that is, f is onto if for each element y in Y there exists an element x in X such that f(x) = y.

In order to show that a linear transformation T: U → V is onto, it is frequently convenient to use the results of Section 5.7 and a dimension argument to determine whether R(T) = V. To be more specific, suppose V has finite dimension. If R(T) has the same dimension, then, since R(T) is a subspace of V, it must be the case that R(T) = V. Thus in order to show that T is onto when the dimension of V is finite, it suffices to demonstrate that rank(T) = dim(V). Alternatively, an elementwise argument can be used to show that T is onto. The next two examples illustrate both procedures.

Example 6 Let U be the subspace of (2 × 2) matrices defined by

U = {A: A = [ a −b ; b a ], where a and b are in R},

and let V = {f(x) in C[0, 1]: f(x) = ceˣ + de⁻ˣ, where c and d are in R}. Define T: U → V by

T([ a −b ; b a ]) = (a + b)eˣ + (a − b)e⁻ˣ.

Show that R(T) = V.

Solution Note that U has basis {A1, A2}, where

A1 = [ 1 0 ; 0 1 ]  and  A2 = [ 0 −1 ; 1 0 ].


It follows from Theorem 15, property 1, of Section 5.7 that R(T) = Sp{T(A1), T(A2)} = Sp{eˣ + e⁻ˣ, eˣ − e⁻ˣ}. It is easily shown that the set {eˣ + e⁻ˣ, eˣ − e⁻ˣ} is linearly independent. It follows that rank(T) = 2. Since {eˣ, e⁻ˣ} is a linearly independent set and V = Sp{eˣ, e⁻ˣ}, the set is a basis for V. In particular, dim(V) = 2. Since R(T) ⊆ V and rank(T) = dim(V), it follows that R(T) = V.

Example 7 Let T: P → P be defined by T(p) = p′′(x). Show that R(T) = P.

Solution Recall that P is the vector space of all polynomials, with no bound on the degree. We have previously seen that P does not have a finite basis, so the techniques of Example 6 do not apply. To show that R(T) = P, let q(x) = a0 + a1x + · · · + anxⁿ be an arbitrary polynomial in P. We must exhibit a polynomial p(x) in P such that T(p) = p′′(x) = q(x). It is easy to see that p(x) = (1/2)a0x² + (1/6)a1x³ + · · · + [1/(n + 1)(n + 2)]anxⁿ⁺² is one choice for p(x). Thus T is onto.
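The choice of p(x) is just a double antiderivative of q(x), which a computer algebra system produces directly. A sketch (Python, assuming SymPy), with an arbitrary q:

import sympy as sp

x = sp.symbols('x')
q = 1 - 2*x + 3*x**2                       # an arbitrary q in P
p = sp.integrate(sp.integrate(q, x), x)    # antidifferentiate twice (constants set to 0)
print(p)                                   # x**4/4 - x**3/3 + x**2/2
print(sp.expand(sp.diff(p, x, 2) - q))     # 0, so T(p) = p'' = q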

Let f: X → Y be a function. If f is both one to one and onto, then the inverse of f, f⁻¹: Y → X, is the function defined by

f⁻¹(y) = x if and only if f(x) = y. (1)

Therefore, if T: U → V is a linear transformation that is both one to one and onto, then the inverse function T⁻¹: V → U is defined. The next two examples illustrate this concept.

Example 8 Let T: P4 → P3 be defined by T(p) = p′′(x). Show that T⁻¹ is not defined.

Solution It is easy to see that N(T) = P1. In particular, by property 4 of Theorem 14 (Section 5.7), T is not one to one. Thus T⁻¹ is not defined. To illustrate specifically, note that T(x) = T(x + 1) = θ(x). Thus by formula (1) above, we have both T⁻¹(θ(x)) = x and T⁻¹(θ(x)) = x + 1. Since T⁻¹(θ(x)) is not uniquely determined, T⁻¹ does not exist.

Since N(T) = P1, it follows that nullity(T) = 2. By property 3 of Theorem 15 (Section 5.7), rank(T) = dim(P4) − nullity(T) = 5 − 2 = 3. But dim(P3) = 4, so T is not onto. In particular, x³ is in P3, and it is easy to see that there is no polynomial p(x) in P4 such that T(p(x)) = p′′(x) = x³. Thus T⁻¹(x³) remains undefined by formula (1).

Example 8 illustrates the following: If T: U → V is not one to one, then there exists v in V such that T⁻¹(v) is not uniquely determined by formula (1), since there exist u1 and u2 in U such that u1 ≠ u2 but T(u1) = v = T(u2). On the other hand, if T is not onto, there exists v in V such that T⁻¹(v) is not defined by formula (1), since there exists no vector u in U such that T(u) = v.

Example 9 Let T: U → V be the linear transformation defined in Example 6. Show that T is both one to one and onto, and give the formula for T⁻¹.

Solution We showed in Example 6 that T is onto. In order to show that T is one to one, it suffices, by Theorem 14, property 4, of Section 5.7, to show that if A ∈ N(T), then A = O


[where O is the (2× 2) zero matrix]. Thus suppose that

A = [ a −b ; b a ]

and T(A) = θ(x); that is, (a + b)eˣ + (a − b)e⁻ˣ = θ(x). Since the set {eˣ, e⁻ˣ} is linearly independent, it follows that a + b = 0 and a − b = 0. Therefore, a = b = 0 and A = O.

To determine the formula for T⁻¹, let f(x) = ceˣ + de⁻ˣ be in V. By formula (1), T⁻¹(f) = A, where A is a matrix such that T(A) = f(x); that is,

T(A) = (a + b)eˣ + (a − b)e⁻ˣ = ceˣ + de⁻ˣ. (2)

Since {eˣ, e⁻ˣ} is a linearly independent set, Eq. (2) requires that a + b = c and a − b = d. This yields a = (1/2)c + (1/2)d and b = (1/2)c − (1/2)d. Therefore, the formula for T⁻¹ is given by

T⁻¹(ceˣ + de⁻ˣ) = (1/2) [ c + d   −c + d ]
                        [ c − d    c + d ].
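In coordinates, T sends the pair (a, b) to the pair (c, d) = (a + b, a − b), and the formula just derived inverts this map. A minimal round-trip check (Python; the function names are ours, not the text's):

def T(a, b):
    # T sends A = [ a -b ; b a ] to the coefficients (c, d) of c e^x + d e^-x
    return a + b, a - b

def T_inv(c, d):
    # the formula for T^-1 derived in Example 9
    return (c + d) / 2, (c - d) / 2

print(T_inv(*T(3.0, -2.0)))   # (3.0, -2.0): T_inv undoes T
print(T(*T_inv(7.0, 1.0)))    # (7.0, 1.0):  T undoes T_inv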

A linear transformation T: U → V that is both one to one and onto is called an invertible linear transformation. Thus if T is invertible, then the mapping T⁻¹: V → U exists and is defined by formula (1). The next theorem lists some of the properties of T⁻¹.

Theorem 16 Let U and V be vector spaces, and let T: U → V be an invertible linear transformation. Then:

1. T⁻¹: V → U is a linear transformation.
2. T⁻¹ is invertible and (T⁻¹)⁻¹ = T.
3. T⁻¹ ◦ T = IU and T ◦ T⁻¹ = IV, where IU and IV are the identity transformations on U and V, respectively.

Proof For property 1, we need to show that T⁻¹: V → U satisfies Definition 8. Suppose that v1 and v2 are vectors in V. Since T is onto, there are vectors u1 and u2 in U such that T(u1) = v1 and T(u2) = v2. By formula (1),

T⁻¹(v1) = u1 and T⁻¹(v2) = u2. (3)

Furthermore, v1 + v2 = T(u1) + T(u2) = T(u1 + u2), so by formula (1),

T⁻¹(v1 + v2) = u1 + u2 = T⁻¹(v1) + T⁻¹(v2).

It is equally easy to see that T⁻¹(cv) = cT⁻¹(v) for all v in V and for any scalar c (see Exercise 15).

The proof of property 2 requires showing that T⁻¹ is both one to one and onto. To see that T⁻¹ is one to one, let v be in N(T⁻¹). Then T⁻¹(v) = θU, so by formula (1), T(θU) = v. By Theorem 14, property 1, of Section 5.7, v = θV, so Theorem 14, property 4, implies that T⁻¹ is one to one. To see that T⁻¹ is onto, let u be an arbitrary vector in U. If v = T(u), then v is in V and, by formula (1), T⁻¹(v) = u. Therefore, T⁻¹ is onto, and it follows that T⁻¹ is invertible.


That (T⁻¹)⁻¹ = T is an easy consequence of formula (1), as are the equalities given in property 3, and the proofs are left as exercises.

As might be guessed from the corresponding theorems for nonsingular matrices, other properties of invertible transformations can be established. For example, if T: U → V is an invertible transformation, then for each vector b in V, x = T⁻¹(b) is the unique solution of T(x) = b. Also, if S and T are invertible and S ◦ T is defined, then S ◦ T is invertible and (S ◦ T)⁻¹ = T⁻¹ ◦ S⁻¹.

Isomorphic Vector Spaces

Suppose that a linear transformation T: U → V is invertible. Since T is both one to one and onto, T establishes an exact pairing between elements of U and V. Moreover, because T is a linear transformation, this pairing preserves algebraic properties. Therefore, although U and V may be different sets, they may be regarded as indistinguishable (or equivalent) algebraically. Stated another way, U and V both represent just one underlying vector space but perhaps with different “labels” for the elements. The invertible linear transformation T acts as a translation from one set of labels to another.

If U and V are vector spaces and if T: U → V is an invertible linear transformation, then U and V are said to be isomorphic vector spaces. Also, an invertible transformation T is called an isomorphism. For instance, the vector spaces U and V given in Example 6 are isomorphic, as shown in Example 9. The next example provides another illustration.

Example 10 Let U be the subspace of P3 defined by

U = {p(x) = a0 + a1x + a2x² + a3x³: a3 = −2a0 + 3a1 + a2}.

Show that U is isomorphic to R3.

Solution Note that dim(U) = 3 and the set {1 − 2x³, x + 3x³, x² + x³} is a basis for U. Moreover, each polynomial p(x) in U can be decomposed as

p(x) = a0 + a1x + a2x² + a3x³ = a0(1 − 2x³) + a1(x + 3x³) + a2(x² + x³). (4)

It is reasonable to expect that an isomorphism T: U → R3 will map a basis of U to a basis of R3. Since {e1, e2, e3} is a basis for R3, we seek a linear transformation T such that

T(1 − 2x³) = e1, T(x + 3x³) = e2, and T(x² + x³) = e3. (5)

It follows from Eq. (4) in this example and from Eq. (1) of Section 5.7 that if such a linear transformation exists, then it is defined by

T(a0 + a1x + a2x² + a3x³) = a0T(1 − 2x³) + a1T(x + 3x³) + a2T(x² + x³)
= a0e1 + a1e2 + a2e3.

That is,

T(a0 + a1x + a2x² + a3x³) = [a0, a1, a2]T. (6)


It is straightforward to show that the function T defined by Eq. (6) is a linear transformation. Moreover, the constraints placed on T by (5) imply, by Theorem 15, property 1, of Section 5.7, that R(T) = R3. Likewise, by Theorem 15, property 2, T is one to one. Therefore, T is an isomorphism and U and R3 are isomorphic vector spaces.

The previous example is actually just a special case of the following theorem, which states that every real n-dimensional vector space is isomorphic to Rn.

Theorem 17 If U is a real n-dimensional vector space, then U and Rn are isomorphic.

Proof To prove this theorem, we need only exhibit the isomorphism, and a coordinate system on U will provide the means. Let B = {u1, u2, . . . , un} be a basis for U, and let T: U → Rn be the linear transformation defined by

T(u) = [u]B.

Since B is a basis, θU is the only vector in N(T); and therefore T is one to one. Furthermore, T(ui) is the vector ei in Rn; so

R(T ) = Sp{T (u1), T (u2), . . . , T (un)} = Sp{e1, e2, . . . , en} = Rn.

Hence T is one to one and onto.

As an illustration of Theorem 17, note that dim(P2) = 3, so P2 and R3 are isomorphic. Moreover, if B = {1, x, x²} is the natural basis for P2, then the linear transformation T: P2 → R3 defined by T(p) = [p]B is an isomorphism; thus

T(a0 + a1x + a2x²) = [a0, a1, a2]T.

The isomorphism T “pairs” the elements of P2 with elements of R3, p(x) ↔ [p(x)]B. Furthermore, under this correspondence the sum of two polynomials, p(x) + q(x), is paired with the sum of the corresponding coordinate vectors:

p(x) + q(x) ↔ [p(x)]B + [q(x)]B.

Similarly, a scalar multiple, ap(x), of a polynomial p(x) is paired with the corresponding scalar multiple of [p(x)]B:

ap(x) ↔ a[p(x)]B.

In this sense, P2 and R3 have the same algebraic character.

It is easy to show that if U is isomorphic to V and V is isomorphic to W, then U and W are also isomorphic (see Exercise 19). Using this fact, we obtain the following corollary of Theorem 17.

Corollary If U and V are real n-dimensional vector spaces, then U and V are isomorphic.


5.8 EXERCISES

In Exercises 1–6, the linear transformations S, T, and H are defined as follows:

S: P3 → P4 is defined by S(p) = p′(0).
T: P3 → P4 is defined by T(p) = (x + 2)p(x).
H: P4 → P3 is defined by H(p) = p′(x) + p(0).

1. Give the formula for S + T. Calculate (S + T)(x) and (S + T)(x²).

2. Give the formula for 2T. Calculate (2T)(x).

3. Give the formula for H ◦ T. What is the domain for H ◦ T? Calculate (H ◦ T)(x).

4. Give the formula for T ◦ H. What is the domain for T ◦ H? Calculate (T ◦ H)(x).

5. a) Prove that T is one to one but not onto.
b) Attempt to define T⁻¹: P4 → P3 as in formula (1) by setting T⁻¹(q) = p if and only if T(p) = q. What is T⁻¹(x)?

6. a) Prove that H is onto but not one to one.
b) Attempt to define H⁻¹: P3 → P4 as in formula (1) by setting H⁻¹(q) = p if and only if H(p) = q. Show that H⁻¹(x) is not uniquely determined.

7. The functions eˣ, e²ˣ, and e³ˣ are linearly independent in C[0, 1]. Let V be the subspace of C[0, 1] defined by V = Sp{eˣ, e²ˣ, e³ˣ}, and let T: V → V be given by T(p) = p′(x). Show that T is invertible and calculate T⁻¹(eˣ), T⁻¹(e²ˣ), and T⁻¹(e³ˣ). What is T⁻¹(aeˣ + be²ˣ + ce³ˣ)?

8. Let V be the subspace of C[0, 1] defined by V = Sp{sin x, cos x, e⁻ˣ}, and let T: V → V be given by T(f) = f′(x). Given that the set {sin x, cos x, e⁻ˣ} is linearly independent, show that T is invertible. Calculate T⁻¹(sin x), T⁻¹(cos x), and T⁻¹(e⁻ˣ) and give the formula for T⁻¹; that is, determine T⁻¹(a sin x + b cos x + ce⁻ˣ).

9. Let V be the vector space of all (2 × 2) matrices and define T: V → V by T(A) = AT. Show that T is invertible and give the formula for T⁻¹.

10. Let V be the vector space of all (2 × 2) matrices, and let Q be a given nonsingular (2 × 2) matrix. If T: V → V is defined by T(A) = Q⁻¹AQ, prove that T is invertible and give the formula for T⁻¹.

11. Let V be the vector space of all (2 × 2) matrices.
a) Use Theorem 17 to show that V is isomorphic to R4.
b) Use the corollary to Theorem 17 to show that V is isomorphic to P3.
c) Exhibit an isomorphism T: V → P3. [Hint: See Example 10.]

12. Let U be the vector space of all (2 × 2) symmetric matrices.
a) Use Theorem 17 to show that U is isomorphic to R3.
b) Use the corollary to Theorem 17 to show that U is isomorphic to P2.
c) Exhibit an isomorphism T: U → P2.

13. Let T1: U → V and T2: U → V be linear transformations. Prove that S: U → V, where S = T1 + T2, is a linear transformation.

14. If T: U → V is a linear transformation and a is a scalar, show that aT: U → V is a linear transformation.

15. Complete the proof of property 1 of Theorem 16 by showing that T⁻¹(cv) = cT⁻¹(v) for all v in V and for an arbitrary scalar c.

16. Complete the proof of property 2 of Theorem 16 by showing that (T⁻¹)⁻¹ = T. [Hint: Use formula (1).]

17. Prove property 3 of Theorem 16.

18. Let S: U → V and T: V → W be linear transformations.
a) Prove that if S and T are both one to one, then T ◦ S is one to one.
b) Prove that if S and T are both onto, then T ◦ S is onto.
c) Prove that if S and T are both invertible, then T ◦ S is invertible and (T ◦ S)⁻¹ = S⁻¹ ◦ T⁻¹.

19. Let U, V, and W be vector spaces such that U and V are isomorphic and V and W are isomorphic. Use Exercise 18 to show that U and W are isomorphic.

20. Let U and V both be n-dimensional vector spaces, and suppose that T: U → V is a linear transformation.
a) If T is one to one, prove that T is invertible. [Hint: Use property 3 of Theorem 15 to prove that R(T) = V.]


b) If T is onto, prove that T is invertible. [Hint: Use property 3 of Theorem 15 and property 4 of Theorem 14 to prove that T is one to one.]

21. Define T: P → P by T(a0 + a1x + · · · + anxⁿ) = a0x + (1/2)a1x² + · · · + (1/(n + 1))anxⁿ⁺¹. Prove that T is one to one but not onto. Why is this example not a contradiction of part a) of Exercise 20?

22. Define S: P → P by S(p) = p′(x). Prove that S is onto but not one to one. Why is this example not a contradiction of part b) of Exercise 20?

In Exercises 23–25, S: U → V and T: V → W are linear transformations.

23. Show that N(S) ⊆ N(T ◦ S). Conclude that if T ◦ S is one to one, then S is one to one.

24. Show that R(T ◦ S) ⊆ R(T). Conclude that if T ◦ S is onto, then T is onto.

25. Assume that U, V, and W all have dimension n. Prove that if T ◦ S is invertible, then both T and S are invertible. [Hint: Use Exercises 20, 23, and 24.]

26. Let A be an (m × p) matrix and B a (p × n) matrix. Use Exercises 23 and 24 to show that nullity(B) ≤ nullity(AB) and rank(AB) ≤ rank(A).

27. Let A be an (n × n) matrix, and suppose that T: Rn → Rn is defined by T(x) = Ax. Show that T is invertible if and only if A is nonsingular. If T is invertible, give a formula for T⁻¹.

28. Let A and B be (n × n) matrices such that AB is nonsingular. Use Exercises 25 and 27 to show that each of the matrices A and B is nonsingular.

29. Let U and V be vector spaces, and let L(U, V) = {T: T is a linear transformation from U to V}. With the operations of addition and scalar multiplication defined in this section, show that L(U, V) is a vector space.

5.9 MATRIX REPRESENTATIONS FOR LINEAR TRANSFORMATIONS

In Section 3.7 we showed that a linear transformation T: Rn → Rm can be represented as multiplication by an (m × n) matrix A; that is, T(x) = Ax for all x in Rn. In the general vector-space setting, we have viewed a linear transformation T: U → V as an extension of this notion. Now suppose that U and V both have finite dimension, say dim(U) = n and dim(V) = m. By Theorem 17 of Section 5.8, U is isomorphic to Rn and V is isomorphic to Rm. To be specific, let B be a basis for U and let C be a basis for V. Then each vector u in U can be represented by the vector [u]B in Rn, and similarly each vector v in V can be represented by the vector [v]C in Rm. In this section we show that T can be represented by an (m × n) matrix Q in the sense that if T(u) = v, then Q[u]B = [v]C.

The Matrix of a Transformation

We begin by defining the matrix of a linear transformation. Thus let T: U → V be a linear transformation, where dim(U) = n and dim(V) = m. Let B = {u1, u2, . . . , un} be a basis for U, and let C = {v1, v2, . . . , vm} be a basis for V. The matrix representation for T with respect to the bases B and C is the (m × n) matrix Q defined by

Q = [Q1, Q2, . . . , Qn],


where

Qj = [T(uj)]_C.

Thus to determine Q, we first represent each of the vectors T(u1), T(u2), . . . , T(un) in terms of the basis C for V:

T(u1) = q11 v1 + q21 v2 + · · · + qm1 vm
T(u2) = q12 v1 + q22 v2 + · · · + qm2 vm
  ...
T(un) = q1n v1 + q2n v2 + · · · + qmn vm.     (1)

It follows from system (1) that

Q1 = [T(u1)]_C = [q11, q21, . . . , qm1]^T, . . . , Qn = [T(un)]_C = [q1n, q2n, . . . , qmn]^T.     (2)

The following example provides a specific illustration.

Example 1  Let U be the vector space of all (2 × 2) matrices and define T : U → P2 by

T([a b; c d]) = (a − d) + (a + 2b)x + (b − 3c)x^2.

Find the matrix of T relative to the natural bases for U and P2.

Solution  Let B = {E11, E12, E21, E22} be the natural basis for U, and let C = {1, x, x^2} be the natural basis for P2. Then T(E11) = 1 + x, T(E12) = 2x + x^2, T(E21) = −3x^2, and T(E22) = −1. In this example system (1) becomes

T(E11) =  1 + 1x + 0x^2
T(E12) =  0 + 2x + 1x^2
T(E21) =  0 + 0x − 3x^2
T(E22) = −1 + 0x + 0x^2.

Therefore, the matrix of T is the (3 × 4) matrix Q given by

Q = [ 1  0   0  −1
      1  2   0   0
      0  1  −3   0 ].
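Readers who want to check such a computation mechanically can do so in a few lines. The following Python/NumPy sketch (our own illustration, not part of the text; the function name T_coords and the basis ordering are our choices) rebuilds Q column by column from the definition Qj = [T(uj)]_C.

    import numpy as np

    # Coordinates of T([a b; c d]) = (a - d) + (a + 2b)x + (b - 3c)x^2
    # relative to the natural basis C = {1, x, x^2} of P2.
    def T_coords(a, b, c, d):
        return np.array([a - d, a + 2*b, b - 3*c], dtype=float)

    # Natural basis B = {E11, E12, E21, E22}; column j of Q is [T(uj)]_C.
    basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
    Q = np.column_stack([T_coords(*e) for e in basis])
    print(Q)   # rows: [1 0 0 -1], [1 2 0 0], [0 1 -3 0]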

The Representation Theorem

The next theorem shows that if we translate from general vector spaces to coordinate vectors, the action of a linear transformation T translates to multiplication by its matrix representative.


Theorem 18  Let T : U → V be a linear transformation, where dim(U) = n and dim(V) = m. Let B and C be bases for U and V, respectively, and let Q be the matrix of T relative to B and C. If u is a vector in U and if T(u) = v, then

Q[u]_B = [v]_C.     (3)

Moreover, Q is the unique matrix that satisfies (3).

The representation of T by Q is illustrated in Fig. 5.8. Before giving the proof of Theorem 18, we illustrate the result with an example.

Figure 5.8  The matrix of T: the action u ↦ T(u) = v in the vector spaces corresponds to [u]_B ↦ Q[u]_B = [v]_C in coordinates.

Example 2  Let T : U → P2 be the linear transformation defined in Example 1, and let Q be the matrix representation determined in that example. Show by direct calculation that if T(A) = p(x), then

Q[A]_B = [p(x)]_C.     (4)

Solution  Recall that B = {E11, E12, E21, E22} and C = {1, x, x^2}. Equation (4) is, of course, an immediate consequence of Eq. (3). To verify Eq. (4) directly, note that if

A = [a b; c d],

then

[A]_B = [a, b, c, d]^T.

Further, if p(x) = T(A), then p(x) = (a − d) + (a + 2b)x + (b − 3c)x^2, so

[p(x)]_C = [a − d, a + 2b, b − 3c]^T.

Therefore,

Q[A]_B = [ 1  0   0  −1
           1  2   0   0
           0  1  −3   0 ] [a, b, c, d]^T = [a − d, a + 2b, b − 3c]^T = [p(x)]_C.
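A numerical spot check of Eq. (4) can be run the same way. In this Python/NumPy sketch (again our own illustration), the entries a, b, c, d are arbitrary test values; the identity should hold for any choice.

    import numpy as np

    Q = np.array([[1, 0,  0, -1],
                  [1, 2,  0,  0],
                  [0, 1, -3,  0]], dtype=float)
    rng = np.random.default_rng(0)
    a, b, c, d = rng.integers(-5, 6, size=4)
    coords_A = np.array([a, b, c, d])                # [A]_B
    coords_p = np.array([a - d, a + 2*b, b - 3*c])   # [p(x)]_C from the formula for T
    assert np.array_equal(Q @ coords_A, coords_p)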


Proof of Theorem 18  Let B = {u1, u2, . . . , un} be the given basis for U, let u be in U, and set T(u) = v. First write u in terms of the basis vectors:

u = a1 u1 + a2 u2 + · · · + an un.     (5)

It follows from Eq. (5) that the coordinate vector for u is

[u]_B = [a1, a2, . . . , an]^T.

Furthermore, the action of T is completely determined by its action on a basis for U (see Eq. (1) of Section 5.7), so Eq. (5) implies that

T(u) = a1 T(u1) + a2 T(u2) + · · · + an T(un) = v.     (6)

The vectors in Eq. (6) are in V, and passing to coordinate vectors relative to the basis C yields, by Eq. (10) of Section 5.4,

a1 [T(u1)]_C + a2 [T(u2)]_C + · · · + an [T(un)]_C = [v]_C.     (7)

Recall that the matrix Q of T is the (m × n) matrix Q = [Q1, Q2, . . . , Qn], where Qj = [T(uj)]_C. Thus Eq. (7) can be rewritten as

a1 Q1 + a2 Q2 + · · · + an Qn = [v]_C.     (8)

Since Eq. (8) is equivalent to the matrix equation Q[u]_B = [v]_C, this shows that Eq. (3) of Theorem 18 holds. The uniqueness of Q is left as an exercise.

Example 3  Let S : P2 → P3 be the differential operator defined by S(f) = x^2 f'' − 2f' + xf. Find the (4 × 3) matrix P that represents S with respect to the natural bases C = {1, x, x^2} and D = {1, x, x^2, x^3} for P2 and P3, respectively. Also, illustrate that P satisfies Eq. (3) of Theorem 18.

Solution  To construct the (4 × 3) matrix P that represents S, we need to find the coordinate vectors of S(1), S(x), and S(x^2) with respect to D. We calculate that S(1) = x, S(x) = x^2 − 2, and S(x^2) = x^3 + 2x^2 − 4x; so the coordinate vectors of S(1), S(x), and S(x^2) are

[S(1)]_D = [0, 1, 0, 0]^T,  [S(x)]_D = [−2, 0, 1, 0]^T,  and  [S(x^2)]_D = [0, −4, 2, 1]^T.


Thus the matrix representation for S is the (4 × 3) matrix

P = [ 0  −2   0
      1   0  −4
      0   1   2
      0   0   1 ].

To see that Eq. (3) of Theorem 18 holds, let p(x) = a0 + a1x + a2x^2 be in P2. Then S(p) = −2a1 + (a0 − 4a2)x + (a1 + 2a2)x^2 + a2x^3. In this case

[p(x)]_C = [a0, a1, a2]^T,

and if S(p) = q(x), then

[q(x)]_D = [−2a1, a0 − 4a2, a1 + 2a2, a2]^T.

A straightforward calculation shows that P[p(x)]_C = [q(x)]_D.
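Example 3 is also easy to verify in code. The sketch below uses Python's numpy.polynomial module, whose coefficient arrays run from the constant term upward, matching the ordered bases C and D; the helper S_coeffs is our own construction, not part of the text.

    import numpy as np
    from numpy.polynomial import polynomial as npoly

    def S_coeffs(c):
        # Coefficients of S(p) = x^2 p'' - 2 p' + x p, padded to degree 3.
        out = npoly.polyadd(npoly.polymul([0, 0, 1], npoly.polyder(c, 2)),
                            npoly.polyadd(-2 * npoly.polyder(c),
                                          npoly.polymul([0, 1], c)))
        return np.pad(out, (0, 4 - len(out)))

    # Columns of P are [S(1)]_D, [S(x)]_D, [S(x^2)]_D.
    P = np.column_stack([S_coeffs(e) for e in np.eye(3)])
    print(P)   # rows: [0 -2 0], [1 0 -4], [0 1 2], [0 0 1]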

Example 4  Let A be an (m × n) matrix and consider the linear transformation T : R^n → R^m defined by T(x) = Ax. Show that relative to the natural bases for R^n and R^m, the matrix for T is A.

Solution  Let B = {e1, e2, . . . , en} be the natural basis for R^n, and let C denote the natural basis for R^m. First note that for each vector y in R^m, y = [y]_C. Now let Q denote the matrix of T relative to B and C, Q = [Q1, Q2, . . . , Qn], and write A = [A1, A2, . . . , An]. The definition of Q gives

Qj = [T(ej)]_C = T(ej) = Aej = Aj.

It now follows that Q = A.

If V is a vector space, then linear transformations of the form T : V → V are of considerable interest and importance. In this case, the same basis B is normally chosen for both the domain and the range of T, and we refer to the representation as the matrix of T with respect to B. In this case, if Q is the matrix of T and if v is in V, then Eq. (3) of Theorem 18 becomes

Q[v]_B = [T(v)]_B.

The next example illustrates this special case.

Example 5  Let T : P2 → P2 be the linear transformation defined by T(p) = xp'(x). Find the matrix, Q, of T relative to the natural basis for P2.


Solution  Let B = {1, x, x^2}. Then T(1) = 0, T(x) = x, and T(x^2) = 2x^2. The coordinate vectors for T(1), T(x), T(x^2) relative to B are

[T(1)]_B = [0, 0, 0]^T,  [T(x)]_B = [0, 1, 0]^T,  and  [T(x^2)]_B = [0, 0, 2]^T.

It follows that the matrix of T with respect to B is the (3 × 3) matrix

Q = [ 0  0  0
      0  1  0
      0  0  2 ].

If p(x) = a0 + a1x + a2x^2, then T[p(x)] = x(a1 + 2a2x) = a1x + 2a2x^2. Thus

[p(x)]_B = [a0, a1, a2]^T  and  [T(p(x))]_B = [0, a1, 2a2]^T.

A direct calculation verifies that Q[p(x)]_B = [T(p(x))]_B.
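Under the same coefficient convention as before, a short Python/NumPy sketch (again our own illustration) reproduces this diagonal Q directly from T(p) = xp'(x).

    import numpy as np
    from numpy.polynomial import polynomial as npoly

    def T_coeffs(c):
        # Coefficients of x * p'(x), padded back to the basis {1, x, x^2}.
        out = npoly.polymul([0, 1], npoly.polyder(c))
        return np.pad(out, (0, 3 - len(out)))

    Q = np.column_stack([T_coeffs(e) for e in np.eye(3)])
    print(Q)   # diag(0, 1, 2)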

Algebraic Properties

In Section 5.8, we defined the algebraic operations of addition, scalar multiplication, and composition for linear transformations. We now examine the matrix representations of the resulting transformations. We begin with the following theorem.

Theorem 19  Let U and V be vector spaces, with dim(U) = n and dim(V) = m, and suppose that B and C are bases for U and V, respectively. Let T1, T2, and T be transformations from U to V, and let Q1, Q2, and Q be the matrix representations with respect to B and C for T1, T2, and T, respectively. Then:

1. Q1 + Q2 is the matrix representation for T1 + T2 with respect to B and C.
2. For a scalar a, aQ is the matrix representation for aT with respect to B and C.

Proof  We include here only the proof of property 1. The proof of property 2 is left for Exercises 26 and 27.

To prove property 1, set T3 = T1 + T2 and let Q3 be the matrix of T3. By Eq. (3) of Theorem 18, Q3 satisfies the equation

Q3[u]_B = [T3(u)]_C     (9)

for every vector u in U; moreover, any other matrix that satisfies Eq. (9) is equal to Q3. We also know from Theorem 18 that

Q1[u]_B = [T1(u)]_C  and  Q2[u]_B = [T2(u)]_C     (10)

for every vector u in U. Using Eq. (10) in Section 5.4 gives

[T1(u) + T2(u)]_C = [T1(u)]_C + [T2(u)]_C.     (11)


It follows from Eqs. (10) and (11) that

(Q1 + Q2)[u]_B = [T1(u) + T2(u)]_C = [T3(u)]_C;

therefore, Q3 = Q1 + Q2.

The following example illustrates the preceding theorem.

Example 6  Let T1 and T2 be the linear transformations from P2 to R^2 defined by

T1(p) = [p(0), p(1)]^T  and  T2(p) = [p'(0), p(−1)]^T.

Set T3 = T1 + T2 and T4 = 3T1, and let B = {1, x, x^2} and C = {e1, e2}. Use the definition to calculate the matrices Q1, Q2, Q3, and Q4 for T1, T2, T3, and T4, respectively, relative to the bases B and C. Note that Q3 = Q1 + Q2 and Q4 = 3Q1.

Solution  Since

T1(1) = [1, 1]^T,  T1(x) = [0, 1]^T,  and  T1(x^2) = [0, 1]^T,

it follows that Q1 is the (2 × 3) matrix given by

Q1 = [ 1  0  0
       1  1  1 ].

Similarly,

T2(1) = [0, 1]^T,  T2(x) = [1, −1]^T,  and  T2(x^2) = [0, 1]^T,

so Q2 is given by

Q2 = [ 0   1  0
       1  −1  1 ].

Now T3(p) = T1(p) + T2(p), so

T3(p) = [p(0) + p'(0), p(1) + p(−1)]^T.

Proceeding as before, we obtain

T3(1) = [1, 2]^T,  T3(x) = [1, 0]^T,  and  T3(x^2) = [0, 2]^T.

Thus

Q3 = [ 1  1  0
       2  0  2 ],

and clearly Q3 = Q1 + Q2.


The formula for T4 is

T4(p) = 3T1(p) = [3p(0), 3p(1)]^T.

The matrix, Q4, for T4 is easily obtained and is given by

Q4 = [ 3  0  0
       3  3  3 ].

In particular, Q4 = 3Q1.
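The conclusions Q3 = Q1 + Q2 and Q4 = 3Q1 can also be confirmed numerically. In this Python/NumPy sketch (an illustration under our own naming choices), each matrix is assembled column by column from the definitions of T1 and T2.

    import numpy as np
    from numpy.polynomial import polynomial as npoly

    def T1(c):   # p -> (p(0), p(1))
        return np.array([npoly.polyval(0, c), npoly.polyval(1, c)])

    def T2(c):   # p -> (p'(0), p(-1))
        return np.array([npoly.polyval(0, npoly.polyder(c)), npoly.polyval(-1, c)])

    E = np.eye(3)   # coordinates of the basis {1, x, x^2}
    Q1 = np.column_stack([T1(e) for e in E])
    Q2 = np.column_stack([T2(e) for e in E])
    Q3 = np.column_stack([T1(e) + T2(e) for e in E])
    Q4 = np.column_stack([3 * T1(e) for e in E])
    assert np.allclose(Q3, Q1 + Q2) and np.allclose(Q4, 3 * Q1)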

The following theorem shows that the composition of two linear transformations corresponds to the product of the matrix representations.

Theorem 20  Let T : U → V and S : V → W be linear transformations, and suppose dim(U) = n, dim(V) = m, and dim(W) = k. Let B, C, and D be bases for U, V, and W, respectively. If the matrix for T relative to B and C is Q [Q is (m × n)] and the matrix for S relative to C and D is P [P is (k × m)], then the matrix representation for S ◦ T is PQ.

Proof  The composition of T and S is illustrated in Fig. 5.9(a), and the matrix representation is illustrated in Fig. 5.9(b).

Figure 5.9  The matrix for S ◦ T: (a) u ↦ T(u) ↦ S[T(u)] through U → V → W; (b) [u]_B ↦ Q[u]_B ↦ PQ[u]_B through R^n → R^m → R^k.

To prove Theorem 20, let N be the matrix of S ◦ T with respect to the bases B and D. Then N is the unique matrix with the property that

N[u]_B = [(S ◦ T)(u)]_D     (12)

for every vector u in U. Similarly, Q and P are characterized by

Q[u]_B = [T(u)]_C  and  P[v]_C = [S(v)]_D     (13)

for all u in U and v in V. It follows from Eq. (13) that

PQ[u]_B = P[T(u)]_C = [S(T(u))]_D = [(S ◦ T)(u)]_D.

The uniqueness of N in Eq. (12) now implies that PQ = N.

Example 7  Let U be the vector space of (2 × 2) matrices. If T : U → P2 is the transformation given in Example 1 and S : P2 → P3 is the transformation described in Example 3, give the formula for S ◦ T. By direct calculation, find the matrix of S ◦ T with respect to the bases B = {E11, E12, E21, E22} and D = {1, x, x^2, x^3} for U and P3, respectively.


Finally, use Theorem 20 and the matrices found in Examples 1 and 3 to calculate the matrix for S ◦ T.

Solution  Recall that T : U → P2 is given by

T([a b; c d]) = (a − d) + (a + 2b)x + (b − 3c)x^2,

and S : P2 → P3 is defined by

S(p) = x^2 p'' − 2p' + xp.

Therefore, S ◦ T : U → P3 is defined by

(S ◦ T)(A) = S(T(A)) = S((a − d) + (a + 2b)x + (b − 3c)x^2)
           = (−2a − 4b) + (a − 4b + 12c − d)x + (a + 4b − 6c)x^2 + (b − 3c)x^3.

The matrix, N, of S ◦ T relative to the given bases B and D is easily determined to be the (4 × 4) matrix

N = [ −2  −4    0   0
       1  −4   12  −1
       1   4   −6   0
       0   1   −3   0 ].

Moreover, N = PQ, where Q is the matrix for T found in Example 1 and P is the matrix for S determined in Example 3.
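Theorem 20 reduces the last step of Example 7 to a single matrix product. The following Python/NumPy sketch simply multiplies the arrays copied from Examples 1 and 3 and reproduces N.

    import numpy as np

    Q = np.array([[1, 0,  0, -1],    # matrix of T from Example 1
                  [1, 2,  0,  0],
                  [0, 1, -3,  0]], dtype=float)
    P = np.array([[0, -2,  0],       # matrix of S from Example 3
                  [1,  0, -4],
                  [0,  1,  2],
                  [0,  0,  1]], dtype=float)
    N = P @ Q
    print(N)   # rows: [-2 -4 0 0], [1 -4 12 -1], [1 4 -6 0], [0 1 -3 0]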

A particularly useful case of Theorem 20 is the one in which S and T both map V to V, dim(V) = n, and the same basis B is used for both the domain and the range. In this case, the composition S ◦ T is always defined, and the matrices P and Q for S and T, respectively, are both (n × n) matrices. Using Theorem 20, we can easily show that if T is invertible, then Q is nonsingular, and furthermore the matrix representation for T^{-1} is Q^{-1}. The matrix representation for the identity transformation on V, I_V, is the (n × n) identity matrix I. The matrix representation for the zero transformation on V is the (n × n) zero matrix. (Observe that the identity and the zero transformations always have the same matrix representations, regardless of what basis we choose for V. Thus changing the basis for V may change the matrix representation for T or may leave the representation unchanged.)

The Vector Space L(U, V) (Optional)

If U and V are vector spaces, then L(U, V) denotes the set of all linear transformations from U to V:

L(U, V) = {T : T is a linear transformation; T : U → V}.

If T, T1, and T2 are in L(U, V) and a is a scalar, we have seen in Section 5.8 that T1 + T2 and aT are again in L(U, V). In fact, with these operations of addition and scalar multiplication, we have the following.


Remark  The set L(U, V) is a vector space.

The proof of this remark is Exercise 29 of Section 5.8. We note here only that the zero of L(U, V) is the zero transformation T0 : U → V defined by T0(u) = θ_V for all u in U. To see this, let T be in L(U, V). Then (T + T0)(u) = T(u) + T0(u) = T(u) + θ_V = T(u). This shows that T + T0 = T, so T0 is the zero vector in L(U, V).

Now let Rmn denote the vector space of (m × n) real matrices. If dim(U) = n and dim(V) = m, then we can define a function ψ : L(U, V) → Rmn as follows: Let B and C be bases for U and V, respectively. For a transformation T in L(U, V), set ψ(T) = Q, where Q is the matrix of T with respect to B and C. We will now show that ψ is an isomorphism. In particular, the following theorem holds.

Theorem 21  If U and V are vector spaces such that dim(U) = n and dim(V) = m, then L(U, V) is isomorphic to Rmn.

Proof  It is an immediate consequence of Theorem 19 that the function ψ defined previously is a linear transformation; that is, if S and T are in L(U, V) and a is a scalar, then ψ(S + T) = ψ(S) + ψ(T) and ψ(aT) = aψ(T).

To show that ψ maps L(U, V) onto Rmn, let Q = [qij] be an (m × n) matrix. Assume that B = {u1, u2, . . . , un} and C = {v1, v2, . . . , vm} are the given bases for U and V, respectively. Define a subset {w1, w2, . . . , wn} of V as follows:

w1 = q11 v1 + q21 v2 + · · · + qm1 vm
w2 = q12 v1 + q22 v2 + · · · + qm2 vm
  ...
wn = q1n v1 + q2n v2 + · · · + qmn vm.     (14)

Each vector u in U can be expressed uniquely in the form

u = a1 u1 + a2 u2 + · · · + an un.

If T : U → V is the function defined by

T(u) = a1 w1 + a2 w2 + · · · + an wn,

then T is a linear transformation and T(uj) = wj for each j, 1 ≤ j ≤ n. By comparing systems (14) and (1), it becomes clear that the matrix of T with respect to B and C is Q; that is, ψ(T) = Q.

The proof that ψ is one to one is Exercise 33.

The following example illustrates the method, described in the proof of Theorem 21, for obtaining the transformation when its matrix representation is given.

Example 8  Let Q be the (3 × 4) matrix

Q = [ 1  0  −1  0
      0  1   1  0
      2  0   3  1 ].

Give the formula for a linear transformation T : P3 → P2 such that the matrix of T relative to the natural bases for P3 and P2 is Q.


Solution  Let B = {1, x, x^2, x^3} and let C = {1, x, x^2}. Following the proof of Theorem 21, we form a subset {q0(x), q1(x), q2(x), q3(x)} of P2 by using the columns of Q. Thus

q0(x) = (1)1 + 0x + 2x^2 = 1 + 2x^2
q1(x) = (0)1 + 1x + 0x^2 = x
q2(x) = (−1)1 + 1x + 3x^2 = −1 + x + 3x^2
q3(x) = (0)1 + 0x + 1x^2 = x^2.

If p(x) = a0 + a1x + a2x^2 + a3x^3 is an arbitrary polynomial in P3, then define T : P3 → P2 by T(p(x)) = a0 q0(x) + a1 q1(x) + a2 q2(x) + a3 q3(x). Thus

T(p(x)) = (a0 − a2) + (a1 + a2)x + (2a0 + 3a2 + a3)x^2.

It is straightforward to show that T is a linear transformation. Moreover, T(1) = q0(x), T(x) = q1(x), T(x^2) = q2(x), and T(x^3) = q3(x). It follows that Q is the matrix of T with respect to B and C.

Let U and V be vector spaces such that dim(U) = n and dim(V) = m. Theorem 21 implies that L(U, V) and Rmn are essentially the same vector space. Thus, for example, we can now conclude that L(U, V) has dimension mn. Furthermore, if T is a linear transformation in L(U, V) and Q is the corresponding matrix in Rmn, then properties of T can be ascertained by studying Q. For example, a vector u in U is in N(T) if and only if [u]_B is in N(Q), and a vector v in V is in R(T) if and only if [v]_C is in R(Q). It follows that nullity(T) = nullity(Q) and rank(T) = rank(Q). In summary, the correspondence between L(U, V) and Rmn allows both the computational and the theoretical aspects of linear transformations to be studied in the more familiar context of matrices.
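For instance, the rank and nullity of the transformation T constructed in Example 8 can be read off from its matrix Q; a minimal Python/NumPy sketch (our own illustration):

    import numpy as np

    Q = np.array([[1, 0, -1, 0],
                  [0, 1,  1, 0],
                  [2, 0,  3, 1]], dtype=float)
    rank = np.linalg.matrix_rank(Q)
    print(rank, Q.shape[1] - rank)   # rank(T) = 3, nullity(T) = 1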

5.9 EXERCISES

In Exercises 1–10, the linear transformations S, T, H are defined as follows:

S : P3 → P4 is defined by S(p) = p'(0).
T : P3 → P4 is defined by T(p) = (x + 2)p(x).
H : P4 → P3 is defined by H(p) = p'(x) + p(0).

Also, B = {1, x, x^2, x^3} is the natural basis for P3, and C = {1, x, x^2, x^3, x^4} is the natural basis for P4.

1. Find the matrix for S with respect to B and C.

2. Find the matrix for T with respect to B and C.

3. a) Use the formula for S + T (see Exercise 1 of Section 5.8) to calculate the matrix of S + T relative to B and C.
   b) Use Theorem 19 and the matrices found in Exercises 1 and 2 to obtain the matrix representation of S + T.

4. a) Use the formula for 2T (see Exercise 2 of Section 5.8) to calculate the matrix of 2T with respect to B and C.
   b) Use Theorem 19 and the matrix found in Exercise 2 to find the matrix for 2T.

5. Find the matrix for H with respect to C and B.

6. a) Use the formula for H ◦ T (see Exercise 3 of Section 5.8) to determine the matrix of H ◦ T with respect to B.


   b) Use Theorem 20 and the matrices obtained in Exercises 2 and 5 to obtain the matrix representation for H ◦ T.

7. a) Use the formula for T ◦ H (see Exercise 4 of Section 5.8) to determine the matrix of T ◦ H with respect to C.
   b) Use Theorem 20 and the matrices obtained in Exercises 2 and 5 to obtain the matrix representation for T ◦ H.

8. Let p(x) = a0 + a1x + a2x^2 + a3x^3 be an arbitrary polynomial in P3.
   a) Exhibit the coordinate vectors [p]_B and [S(p)]_C.
   b) If P is the matrix for S obtained in Exercise 1, demonstrate that P[p]_B = [S(p)]_C.

9. Let p(x) = a0 + a1x + a2x^2 + a3x^3 be an arbitrary polynomial in P3.
   a) Exhibit the coordinate vectors [p]_B and [T(p)]_C.
   b) If Q is the matrix for T obtained in Exercise 2, demonstrate that Q[p]_B = [T(p)]_C.

10. Let N be the matrix representation obtained for H in Exercise 5. Demonstrate that N[q]_C = [H(q)]_B for q(x) = a0 + a1x + a2x^2 + a3x^3 + a4x^4 in P4.

11. Let T : V → V be the linear transformation defined in Exercise 7 of Section 5.8, and let B = {e^x, e^{2x}, e^{3x}}.
   a) Find the matrix, Q, of T with respect to B.
   b) Find the matrix, P, of T^{-1} with respect to B.
   c) Show that P = Q^{-1}.

12. Let T : V → V be the linear transformation defined in Exercise 8 of Section 5.8, and let B = {sin x, cos x, e^{-x}}. Repeat Exercise 11.

13. Let V be the vector space of (2 × 2) matrices and define T : V → V by T(A) = A^T (see Exercise 9 of Section 5.8). Let B = {E11, E12, E21, E22} be the natural basis for V.
   a) Find the matrix, Q, of T with respect to B.
   b) For arbitrary A in V, show that Q[A]_B = [A^T]_B.

14. Let S : P2 → P3 be given by S(p) = x^3 p'' − x^2 p' + 3p. Find the matrix representation of S with respect to the natural bases B = {1, x, x^2} for P2 and C = {1, x, x^2, x^3} for P3.

15. Let S be the transformation in Exercise 14, let the basis for P2 be B = {x + 1, x + 2, x^2}, and let the basis for P3 be C = {1, x, x^2, x^3}. Find the matrix representation for S.

16. Let S be the transformation in Exercise 14, let the basis for P2 be B = {1, x, x^2}, and let the basis for P3 be D = {3, 3x − x^2, 3x^2, x^3}. Find the matrix for S.

17. Let T : P2 → R^3 be given by

T(p) = [p(0), 3p'(1), p'(1) + p''(0)]^T.

Find the representation of T with respect to the natural bases for P2 and R^3.

18. Find the representation for the transformation in Exercise 17 with respect to the natural basis for P2 and the basis {u1, u2, u3} for R^3, where

u1 = [1, 0, 1]^T,  u2 = [0, 1, 1]^T,  and  u3 = [1, 1, 1]^T.

19. Let T : V → V be a linear transformation, where B = {v1, v2, v3, v4} is a basis for V. Find the matrix representation of T with respect to B if T(v1) = v2, T(v2) = v3, T(v3) = v1 + v2, and T(v4) = v1 + 3v4.

20. Let T : R^3 → R^2 be given by T(x) = Ax, where

A = [ 1  2  1
      3  0  4 ].

Find the representation of T with respect to the natural bases for R^3 and R^2.

21. Let T : P2 → P2 be defined by T(a0 + a1x + a2x^2) = (−4a0 − 2a1) + (3a0 + 3a1)x + (−a0 + 2a1 + 3a2)x^2. Determine the matrix of T relative to the natural basis B for P2.

22. Let T be the linear transformation defined in Exercise 21. If Q is the matrix representation found in Exercise 21, show that Q[p]_B = [T(p)]_B for p(x) = a0 + a1x + a2x^2.


23. Let T be the linear transformation defined in Exercise 21. Find the matrix of T with respect to the basis C = {1 − 3x + 7x^2, 6 − 3x + 2x^2, x^2}.

24. Complete the proof of Theorem 18 by showing that Q is the unique matrix that satisfies Eq. (3). [Hint: Suppose P = [P1, P2, . . . , Pn] is an (m × n) matrix such that P[u]_B = [T(u)]_C for each u in U. By taking u in B, show that Pj = Qj for 1 ≤ j ≤ n.]

25. Give another proof of property 1 of Theorem 19 by constructing matrix representations for T1, T2, and T1 + T2.

26. Give a proof of property 2 of Theorem 19 by constructing matrix representations for T and aT.

27. Give a proof of property 2 of Theorem 19 that uses the uniqueness assertion in Theorem 18.

28. Let V be an n-dimensional vector space, and let I_V : V → V be the identity transformation on V. [Recall that I_V(v) = v for all v in V.] Show that the matrix representation for I_V with respect to any basis for V is the (n × n) identity matrix I.

29. Let V be an n-dimensional vector space, and let T0 : V → V be the zero transformation on V; that is, T0(v) = θ_V for all v in V. Show that the matrix representation for T0 with respect to any basis for V is the (n × n) zero matrix.

30. Let V be an n-dimensional vector space with basis B, and let T : V → V be an invertible linear transformation. Let Q be the matrix of T with respect to B, and let P be the matrix of T^{-1} with respect to B. Prove that P = Q^{-1}. [Hint: Note that T^{-1} ◦ T = I_V. Now apply Theorem 20 and Exercise 28.]

In Exercises 31 and 32, Q is the (3 × 4) matrix given by

Q = [ 1  0  2   0
      0  1  0   1
     −1  1  0  −1 ].

∗31. Give the formula for a linear transformation T : P3 → P2 such that Q is the matrix of T with respect to the natural bases for P3 and P2.

∗32. Let V be the vector space of all (2 × 2) matrices. Give the formula for a linear transformation S : P2 → V such that Q^T is the matrix of S with respect to the natural bases for P2 and V.

∗33. Complete the proof of Theorem 21 by showing that the mapping described in the proof of the theorem is one to one.

∗Exercises that are based on optional material.

5.10 CHANGE OF BASIS AND DIAGONALIZATION

In Section 5.9, we saw that a linear transformation from U to V can be represented as an (m × n) matrix when dim(U) = n and dim(V) = m. A consequence of this representation is that properties of transformations can be studied by examining their corresponding matrix representations. Moreover, we have a great deal of machinery in place for matrix theory, so matrices provide a suitable analytical and computational framework for studying a linear transformation. To simplify matters somewhat, we consider only transformations from V to V, where dim(V) = n. So let T : V → V be a linear transformation, and suppose that Q is the matrix representation for T with respect to a basis B; that is,

if w = T(u), then [w]_B = Q[u]_B.     (1)

As we know, when we change the basis B for V, we may change the matrix representation for T. If we are interested in the properties of T, then it is reasonable to search for a basis for V that makes the matrix representation for T as simple as possible. Finding such a basis is the subject of this section.


Diagonalizable Transformations

A particularly nice matrix to deal with computationally is a diagonal matrix. If T : V → V is a linear transformation whose matrix representation with respect to B is a diagonal matrix,

D = [ d1   0   0  · · ·   0
       0  d2   0  · · ·   0
      ...
       0   0   0  · · ·  dn ],     (2)

then it is easy to analyze the action of T on V, as the following example illustrates.

Example 1  Let V be a three-dimensional vector space with basis B = {v1, v2, v3}, and suppose that T : V → V is a linear transformation with matrix

D = [ 2  0  0
      0  3  0
      0  0  0 ]

with respect to B. Describe the action of T in terms of the basis vectors and determine bases for N(T) and R(T).

Solution  It follows from the definition of D that T(v1) = 2v1, T(v2) = 3v2, and T(v3) = θ. If u is any vector in V and

u = av1 + bv2 + cv3,

then

T(u) = aT(v1) + bT(v2) + cT(v3).

Therefore, the action of T on u is given by

T(u) = 2av1 + 3bv2.

It follows that u is in N(T) if and only if a = b = 0; that is,

N(T) = Sp{v3}.     (3)

Further, R(T) = Sp{T(v1), T(v2), T(v3)}, and since T(v3) = θ, it follows that

R(T) = Sp{T(v1), T(v2)} = Sp{2v1, 3v2}.     (4)

One can easily see that the spanning sets given in Eqs. (3) and (4) are linearly independent, so they are bases for N(T) and R(T), respectively.

If T is a linear transformation with a matrix representation that is diagonal, then T is called diagonalizable. Before characterizing diagonalizable linear transformations, we need to extend the concepts of eigenvalues and eigenvectors to the general vector-space setting. Specifically, a scalar λ is called an eigenvalue for a linear transformation T : V → V provided that there is a nonzero vector v in V such that T(v) = λv. The vector v is called an eigenvector for T corresponding to λ. The following example illustrates these concepts.


Example 2  Let T : P2 → P2 be defined by

T(a0 + a1x + a2x^2) = (2a1 − 2a2) + (2a0 + 3a2)x + 3a2x^2.

Show that C = {1 + x, 1 − x, x + x^2} is a basis of P2 consisting of eigenvectors for T, and exhibit the matrix of T with respect to C.

Solution  It is straightforward to show that C is a basis for P2. Also,

T(1 + x) = 2 + 2x = 2(1 + x)
T(1 − x) = −2 + 2x = −2(1 − x)
T(x + x^2) = 3x + 3x^2 = 3(x + x^2).     (5)

Thus T has eigenvalues 2, −2, and 3 with corresponding eigenvectors 1 + x, 1 − x, and x + x^2, respectively. Moreover, it follows from (5) that the matrix of T with respect to C is the (3 × 3) diagonal matrix

Q = [ 2   0  0
      0  −2  0
      0   0  3 ].

In particular, T is a diagonalizable linear transformation.

The linear transformation in Example 2 provides an illustration of the following general result.

Theorem 22  Let V be an n-dimensional vector space. A linear transformation T : V → V is diagonalizable if and only if there exists a basis for V consisting of eigenvectors for T.

Proof  First, suppose that B = {v1, v2, . . . , vn} is a basis for V consisting entirely of eigenvectors, say T(v1) = d1v1, T(v2) = d2v2, . . . , T(vn) = dnvn. It follows that the coordinate vectors for T(v1), T(v2), . . . , T(vn) are the n-dimensional vectors

[T(v1)]_B = [d1, 0, . . . , 0]^T,  [T(v2)]_B = [0, d2, . . . , 0]^T,  . . . ,  [T(vn)]_B = [0, 0, . . . , dn]^T.     (6)

Therefore, the matrix representation for T with respect to B is the (n × n) diagonal matrix D given in (2). In particular, T is diagonalizable.

Conversely, assume that T is diagonalizable and that the matrix for T with respect to the basis B = {v1, v2, . . . , vn} is the diagonal matrix D given in (2). Then the coordinate vectors for T(v1), T(v2), . . . , T(vn) are given by (6), so it follows that

T(v1) = d1v1 + 0v2 + · · · + 0vn = d1v1
T(v2) = 0v1 + d2v2 + · · · + 0vn = d2v2
  ...
T(vn) = 0v1 + 0v2 + · · · + dnvn = dnvn.

Thus B consists of eigenvectors for T.


As with matrices, not every linear transformation is diagonalizable. Equivalently, if T : V → V is a linear transformation, it may be that no matter what basis we choose for V, we never obtain a matrix representation for T that is diagonal. Moreover, even if T is diagonalizable, Theorem 22 provides no procedure for calculating a basis for V consisting of eigenvectors for T. Before providing such a procedure, we will examine the relationship between matrix representations of a single transformation with respect to different bases. First we need to facilitate the process of changing bases.

The Transition Matrix

Let B and C be bases for an n-dimensional vector space V. Theorem 23, which follows, relates the coordinate vectors [v]_B and [v]_C for an arbitrary vector v in V. Using this theorem, we will be able to show later that if Q is the matrix of a linear transformation T with respect to B, and if P is the matrix of T relative to C, then Q and P are similar. Since we know how to determine whether a matrix is similar to a diagonal matrix, we will be able to determine when T is diagonalizable.

Theorem 23 (Change of Basis)  Let B and C be bases for the vector space V, with B = {u1, u2, . . . , un}, and let P be the (n × n) matrix given by P = [P1, P2, . . . , Pn], where the ith column of P is

Pi = [ui]_C.     (7)

Then P is a nonsingular matrix and

[v]_C = P[v]_B     (8)

for each vector v in V.

Proof  Let I_V denote the identity transformation on V; that is, I_V(v) = v for all v in V. Recall that the ith column of the matrix of I_V with respect to B and C is the coordinate vector [I_V(ui)]_C. But I_V(ui) = ui, so it follows that the matrix P described above is just the matrix representation of I_V with respect to B and C. It now follows from Eq. (3) of Theorem 18 that

P[v]_B = [I_V(v)]_C = [v]_C

for each v in V; in particular, Eq. (8) is proved. The proof that P is nonsingular is left as Exercise 17.

The matrix P given in Theorem 23 is called the transition matrix. Since P is nonsingular, we have, in addition to Eq. (8), the relationship

[v]_B = P^{-1}[v]_C     (9)

for each vector v in V. The following example illustrates the use of the transition matrix.

Example 3  Let B and C be the bases for P2 given by B = {1, x, x^2} and C = {1, x + 1, (x + 1)^2}. Find the transition matrix P such that

P[q]_B = [q]_C

for each polynomial q(x) in P2.


Solution  Following Theorem 23, we determine the coordinates of 1, x, and x^2 in terms of 1, x + 1, and (x + 1)^2. This determination is easy, and we find

1 = 1
x = (x + 1) − 1
x^2 = (x + 1)^2 − 2(x + 1) + 1.

Thus with respect to C the coordinate vectors of B are

[1]_C = [1, 0, 0]^T,  [x]_C = [−1, 1, 0]^T,  and  [x^2]_C = [1, −2, 1]^T.

The transition matrix P is therefore

P = [ 1  −1   1
      0   1  −2
      0   0   1 ].

In particular, any polynomial q(x) = a0 + a1x + a2x^2 can be expressed in terms of 1, x + 1, and (x + 1)^2 by forming [q]_C = P[q]_B. Forming this, we find

[q]_C = [a0 − a1 + a2, a1 − 2a2, a2]^T.

So with respect to C, we can write q(x) as q(x) = (a0 − a1 + a2) + (a1 − 2a2)(x + 1) + a2(x + 1)^2 [a result that we can verify directly by multiplying out the new expression for q(x)].
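Because the columns of the matrix M below record the C-basis polynomials 1, x + 1, and (x + 1)^2 in terms of B, we have M[v]_C = [v]_B, so the transition matrix of this example is P = M^{-1}. The following Python/NumPy sketch (our own check, with randomly chosen coefficients) verifies the change of coordinates pointwise.

    import numpy as np

    # Columns: [1]_B, [x+1]_B, [(x+1)^2]_B, so M[v]_C = [v]_B and P = inv(M).
    M = np.array([[1, 1, 1],
                  [0, 1, 2],
                  [0, 0, 1]], dtype=float)
    P = np.linalg.inv(M)

    rng = np.random.default_rng(1)
    a = rng.standard_normal(3)            # [q]_B for q = a0 + a1 x + a2 x^2
    c = P @ a                             # [q]_C
    xs = np.linspace(-2.0, 2.0, 5)
    lhs = a[0] + a[1]*xs + a[2]*xs**2
    rhs = c[0] + c[1]*(xs + 1) + c[2]*(xs + 1)**2
    assert np.allclose(lhs, rhs)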

Matrix Representation and Change of Basis

In terms of the transition matrix, we can now state precisely the relationship between the matrix representations of a linear transformation with respect to two different bases B and C. Moreover, given a basis B, the relationship suggests how to determine a basis C such that the matrix relative to C is a simpler matrix.

Theorem 24  Let B and C be bases for the n-dimensional vector space V, and let T : V → V be a linear transformation. If Q1 is the matrix of T with respect to B and if Q2 is the matrix of T with respect to C, then

Q2 = P^{-1} Q1 P,     (10)

where P is the transition matrix from C to B.

Proof  First note that the notation is reversed from Theorem 23; P is the transition matrix from C to B, so

[v]_B = P[v]_C     (11)

for all v in V. Also,

P^{-1}[w]_B = [w]_C     (12)


for each w in V. If v is in V and if T(v) = w, then (1) implies that

Q1[v]_B = [w]_B  and  Q2[v]_C = [w]_C.     (13)

From the equations given in (11), (12), and (13), we obtain

P^{-1} Q1 P[v]_C = P^{-1} Q1 [v]_B = P^{-1}[w]_B = [w]_C;

that is, the matrix P^{-1} Q1 P satisfies the same property as Q2 in (13). By the uniqueness of Q2, we conclude that Q2 = P^{-1} Q1 P.

The following example provides an illustration of Theorem 24.

Example 4  Let T : P2 → P2 be the linear transformation given in Example 2, and let B and C be the bases for P2 given by B = {1, x, x^2} and C = {1 + x, 1 − x, x + x^2}. Calculate the matrix of T with respect to B and use Theorem 24 to find the matrix of T with respect to C.

Solution  Recall that T is defined by

T(a0 + a1x + a2x^2) = (2a1 − 2a2) + (2a0 + 3a2)x + 3a2x^2.

In particular, T(1) = 2x, T(x) = 2, and T(x^2) = −2 + 3x + 3x^2. Thus

[T(1)]_B = [0, 2, 0]^T,  [T(x)]_B = [2, 0, 0]^T,  and  [T(x^2)]_B = [−2, 3, 3]^T.

It follows that the matrix of T with respect to B is the matrix Q1 given by

Q1 = [ 0  2  −2
       2  0   3
       0  0   3 ].

Now let P be the transition matrix from C to B; that is, P[v]_C = [v]_B for each vector v in V (note that the roles of B and C are reversed from Theorem 23). By Theorem 23, P is the (3 × 3) matrix P = [P1, P2, P3], where

P1 = [1 + x]_B,  P2 = [1 − x]_B,  and  P3 = [x + x^2]_B.

Thus P is given by

P = [ 1   1  0
      1  −1  1
      0   0  1 ].

The inverse of P is easily calculated and is given by

P^{-1} = (1/2) [ 1   1  −1
                 1  −1   1
                 0   0   2 ].


By Theorem 24, the matrix of T with respect to C is the matrix Q2 determined by Q2 = P^{-1} Q1 P. This yields

Q2 = [ 2   0  0
       0  −2  0
       0   0  3 ].
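The similarity relation Q2 = P^{-1}Q1P is easy to confirm numerically; in this Python/NumPy sketch the arrays are exactly the Q1 and P computed above.

    import numpy as np

    Q1 = np.array([[0, 2, -2],
                   [2, 0,  3],
                   [0, 0,  3]], dtype=float)
    P = np.array([[1,  1, 0],        # columns: [1+x]_B, [1-x]_B, [x+x^2]_B
                  [1, -1, 1],
                  [0,  0, 1]], dtype=float)
    Q2 = np.linalg.inv(P) @ Q1 @ P
    print(np.round(Q2, 10))          # diag(2, -2, 3)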

Although the preceding example serves to illustrate the statement of Theorem 24, a comparison of Examples 2 and 4 makes it clear that when the basis C is given, it may be easier to calculate the matrix of T with respect to C directly from the definition. Theorem 24, however, suggests the following idea: If we are given a linear transformation T : V → V and the matrix representation, Q, for T with respect to a given basis B, then we should look for a simple matrix R (diagonal if possible) that is similar to Q, R = S^{-1}QS. In this case we can use S^{-1} as a transition matrix to obtain a new basis C for V, where [u]_C = S^{-1}[u]_B. With respect to the basis C, T will have the matrix representation R, where R = S^{-1}QS.

Given the transition matrix S^{-1}, it is an easy matter to find the actual basis vectors of C. In particular, suppose that B = {u1, u2, . . . , un} is the given basis for V, and we wish to find the vectors in C = {v1, v2, . . . , vn}. Since [u]_C = S^{-1}[u]_B for all u in V, we know that S[u]_C = [u]_B. Moreover, with respect to C, [vi]_C = ei. So from S[vi]_C = [vi]_B we obtain

S ei = [vi]_B,  1 ≤ i ≤ n.     (14)

But if S = [S1, S2, . . . , Sn], then S ei = Si, and Eq. (14) tells us that the coordinate vector of vi with respect to the known basis B is the ith column of S. The procedure just described can be summarized as follows:

Summary

Let T : V → V be a linear transformation and let B = {u1, u2, . . . , un} be a given basis for V.

Step 1. Calculate the matrix, Q, for T with respect to the basis B.
Step 2. Use matrix techniques to find a "simple" matrix R and a nonsingular matrix S such that R = S^{-1}QS.
Step 3. Determine vectors v1, v2, . . . , vn in V so that [vi]_B = Si, 1 ≤ i ≤ n, where Si is the ith column of S.

Then C = {v1, v2, . . . , vn} is a basis for V and R is the matrix of T with respect to C.

The case of particular interest is the one in which Q is similar to a diagonal matrix R. In this case, if we choose {S1, S2, . . . , Sn} to be a basis of R^n consisting of eigenvectors for Q, then

R = S^{-1}QS = [ d1   0  · · ·   0
                  0  d2  · · ·   0
                 ...
                  0   0  · · ·  dn ],


where d1, d2, . . . , dn are the (not necessarily distinct) eigenvalues for Q and where QSi = di Si. Since R is the matrix of T with respect to C, C = {v1, v2, . . . , vn} is a basis of V consisting of eigenvectors for T; specifically, T(vi) = di vi for 1 ≤ i ≤ n. The following example provides an illustration.

Example 5  Show that the differential operator T : P2 → P2 defined by T(p) = x^2 p'' + (2x − 1)p' + 3p is diagonalizable.

Solution  With respect to the basis B = {1, x, x^2}, T has the matrix representation

Q = [ 3  −1   0
      0   5  −2
      0   0   9 ].

Since Q is triangular, we see that the eigenvalues are 3, 5, and 9; and since Q has distinct eigenvalues, Q can be diagonalized: the matrix S of eigenvectors will diagonalize Q.

We calculate the eigenvectors u1, u2, u3 for Q and form S = [u1, u2, u3], which yields

S = [ 1   1   1
      0  −2  −6
      0   0  12 ].

In this case it follows that

S^{-1}QS = [ 3  0  0
             0  5  0
             0  0  9 ] = R.

In view of our remarks above, R is the matrix representation for T with respect to the basis C = {v1, v2, v3}, where [vi]_B = Si, or

[v1]_B = [1, 0, 0]^T,  [v2]_B = [1, −2, 0]^T,  and  [v3]_B = [1, −6, 12]^T.

Therefore, the basis C is given precisely as C = {1, 1 − 2x, 1 − 6x + 12x^2}. Moreover, it is easy to see that T(v1) = 3v1, T(v2) = 5v2, and T(v3) = 9v3, where v1 = 1, v2 = 1 − 2x, and v3 = 1 − 6x + 12x^2.
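Step 2 of the summary is precisely the matrix diagonalization problem, so it can be delegated to an eigenvalue routine. The sketch below (Python/NumPy; note that np.linalg.eig returns normalized eigenvectors, so its columns are scalar multiples of the columns of S used in the example) recovers R.

    import numpy as np

    Q = np.array([[3, -1,  0],
                  [0,  5, -2],
                  [0,  0,  9]], dtype=float)
    S = np.array([[1,  1,  1],       # eigenvector columns from the example
                  [0, -2, -6],
                  [0,  0, 12]], dtype=float)
    R = np.linalg.inv(S) @ Q @ S
    print(np.round(R, 10))           # diag(3, 5, 9)

    vals, vecs = np.linalg.eig(Q)    # an eigenvalue routine finds the same spectrum
    print(np.sort(vals))             # [3. 5. 9.]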

5.10 EXERCISES

1. Let T : R^2 → R^2 be defined by

T([x1, x2]^T) = [2x1 + x2, x1 + 2x2]^T.

Define u1, u2 in R^2 by

u1 = [−1, 1]^T  and  u2 = [1, 1]^T.


Show that C = {u1, u2} is a basis of R^2 consisting of eigenvectors for T. Calculate the matrix of T with respect to C.

2. Let T : P2 → P2 be defined by

T(a0 + a1x + a2x^2) = (2a0 − a1 − a2) + (a0 − a2)x + (−a0 + a1 + 2a2)x^2.

Show that C = {1 + x − x^2, 1 + x^2, 1 + x} is a basis of P2 consisting of eigenvectors for T, and find the matrix of T with respect to C.

3. Let V be the vector space of (2 × 2) matrices, and let T : V → V be defined by

T([a b; c d]) = [−3a + 5d, 3b − 5c; −2c, 2d].

If C = {A1, A2, A3, A4}, where

A1 = [1 0; 0 1],  A2 = [0 1; 1 0],  A3 = [0 1; 0 0],  and  A4 = [1 0; 0 0],

then show that C is a basis of V consisting of eigenvectors for T. Find the matrix of T with respect to C.

4. Let C be the basis for R^2 given in Exercise 1, and let B be the natural basis for R^2. Find the transition matrix and represent the following vectors in terms of C:

a = [4, 2]^T,  b = [−2, 0]^T,  c = [9, 5]^T,  and  d = [a, b]^T.

5. Let C be the basis for P2 given in Exercise 2 and let B = {1, x, x^2}. Find the transition matrix and represent the following polynomials in terms of C:

p(x) = 2 + x,  q(x) = −1 + 2x + 2x^2,  s(x) = −1 + x^2,  and  r(x) = a0 + a1x + a2x^2.

6. Let V be the vector space of (2 × 2) matrices, and let C be the basis given in Exercise 3. If B is the natural basis for V, B = {E11, E12, E21, E22}, then find the transition matrix and express the following matrices in terms of the vectors in C:

A = [1 2; 3 4],  B = [−1 1; 0 3],  and  C = [a b; c d].

7. Find the transition matrix for R^2 when B = {u1, u2} and C = {w1, w2}:

w1 = [2, 1]^T,  w2 = [1, 2]^T,  u1 = [1, 1]^T,  and  u2 = [3, 1]^T.

8. Repeat Exercise 7 for the basis vectors

w1 = [4, 3]^T,  w2 = [2, 3]^T,  u1 = [4, 1]^T,  and  u2 = [2, 1]^T.

9. Let B = {1, x, x^2, x^3} and C = {x, x + 1, x^2 − 2x, x^3 + 3} be bases for P3. Find the transition matrix and use it to represent the following in terms of C:

p(x) = x^2 − 7x + 2,  q(x) = x^3 + 9x − 1,  and  r(x) = x^3 − 2x^2 + 6.

10. Represent the following quadratic polynomials in the form a0 + a1x + a2x(x − 1) by constructing the appropriate transition matrix:

p(x) = x^2 + 5x − 3,  q(x) = 2x^2 − 6x + 8,  and  r(x) = x^2 − 5.

11. Let T : R^2 → R^2 be the linear transformation defined in Exercise 1. Find the matrix of T with respect to the natural basis B = {e1, e2}. If C is the basis for R^2 given in Exercise 1, use Theorem 24 to calculate the matrix of T with respect to C.

12. Let T : P2 → P2 be the linear transformation given in Exercise 2. Find the matrix representation of T with respect to the natural basis B = {1, x, x^2} and then use Theorem 24 to calculate the matrix of T relative to the basis C given in Exercise 2.


13. Let V and T be as in Exercise 3. Find the matrix representation of T with respect to the natural basis B = {E11, E12, E21, E22}. If C is the basis for V given in Exercise 3, use Theorem 24 to determine the matrix of T with respect to C.

In Exercises 14–16, proceed through the following steps:
a) Find the matrix, Q, of T with respect to the natural basis B for V.
b) Show that Q is similar to a diagonal matrix; that is, find a nonsingular matrix S and a diagonal matrix R such that R = S^{-1}QS.
c) Exhibit a basis C of V such that R is the matrix representation of T with respect to C.
d) Calculate the transition matrix, P, from B to C.
e) Use the transition matrix P and the formula R[v]_C = [T(v)]_C to calculate T(w1), T(w2), and T(w3).

14. V = P1 and T : V → V is defined by T(a0 + a1x) = (4a0 + 3a1) + (−2a0 − 3a1)x. Also,

w1 = 2 + 3x,  w2 = −1 + x,  and  w3 = x.

15. V = P2 and T : V → V is defined by T(p) = xp'' + (x + 1)p' + p. Also,

w1 = −8 + 7x + x^2,  w2 = 5 + x^2,  and  w3 = 4 − 3x + 2x^2.

16. V is the vector space of (2 × 2) matrices and T : V → V is given by

T([a b; c d]) = [a − b, 2b − 2c; 5c − 3d, 10d].

Also,

w1 = [0 3; 0 1],  w2 = [2 −3; 1 0],  and  w3 = [8 −7; 0 2].

17. Complete the proof of Theorem 23 by showing that the transition matrix P is nonsingular. [Hint: We have already noted in the proof of Theorem 23 that P is the matrix representation of I_V with respect to the bases B and C. Let Q be the matrix representation of I_V with respect to C and B. Now apply Theorem 20 with T = S = I_V.]

18. Let V be an n-dimensional vector space with basis B, and assume that T : V → V is a linear transformation with matrix representation Q relative to B.
   a) If v is an eigenvector for T associated with the eigenvalue λ, then prove that [v]_B is an eigenvector for Q associated with λ.
   b) If the vector x in R^n is an eigenvector for Q corresponding to the eigenvalue λ and if v in V is a vector such that [v]_B = x, prove that v is an eigenvector for T corresponding to the eigenvalue λ. [Hint: Make use of Eq. (1).]

19. Let T : V → V be a linear transformation, and let λ be an eigenvalue for T. Show that λ^2 is an eigenvalue for T^2 = T ◦ T.

20. Prove that a linear transformation T : V → V is one to one if and only if zero is not an eigenvalue for T. [Hint: Use Theorem 14, property 4, of Section 5.7.]

21. Let T : V → V be an invertible linear transformation. If λ is an eigenvalue for T, prove that λ^{-1} is an eigenvalue for T^{-1}. (Note that λ ≠ 0 by Exercise 20.)

SUPPLEMENTARY EXERCISES

1. Let V be the set of all (2 × 2) matrices with real entries and with the usual operation of addition. Suppose, however, that scalar multiplication in V is defined by

k [a b; c d] = [ka 0; 0 kd].

Determine whether V is a real vector space.


2. Recall that F(R) denotes the set of all functions from R to R; that is, F(R) = {f : R → R}. A function g in F(R) is called an even function if g(−x) = g(x) for every x in R. Prove that the set of all even functions in F(R) is a subspace of F(R).

3. In each of parts a)–c), show that the set S is linearly dependent, and write one of the vectors in S as a linear combination of the remaining vectors.
   a) S = {A1, A2, A3, A4}, where

   A1 = [1 0; −1 1],  A2 = [−1 1; 0 1],  A3 = [−1 3; −2 5],  and  A4 = [−3 2; 2 0].

   b) S = {p1(x), p2(x), p3(x), p4(x)}, where p1(x) = 1 − x^2 + x^3, p2(x) = −1 + x + x^3, p3(x) = −1 + 3x − 2x^2 + 5x^3, and p4(x) = −3 + 2x + 2x^2.
   c) S = {v1, v2, v3, v4}, where

   v1 = [1, 0, −1, 1]^T,  v2 = [−1, 1, 0, 1]^T,  v3 = [−1, 3, −2, 5]^T,  and  v4 = [−3, 2, 2, 0]^T.

4. Let W be the subspace of the set of (2 × 2) real matrices defined by

W = {A = [a b; c d] : a − 2b + 3c + d = 0}.

   a) Exhibit a basis B for W.
   b) Find a matrix A in W such that [A]_B = [2, 1, −2]^T.

5. In P2, let S = {p1(x), p2(x), p3(x)}, where p1(x) = 1 − x + 2x^2, p2(x) = 2 + 3x + x^2, and p3(x) = 1 − 6x + 5x^2.
   a) Obtain an algebraic specification for Sp(S).
   b) Determine which of the following polynomials are in Sp(S): q1(x) = 5 + 5x + 4x^2, q2(x) = 5 − 5x + 8x^2, q3(x) = −5x + 3x^2, and q4(x) = 5 + 7x^2.
   c) Use the algebraic specification obtained in part a) to determine a basis, B, of Sp(S).
   d) For each polynomial qi(x), i = 1, 2, 3, 4, given in part b), if qi(x) is in Sp(S), then find [qi(x)]_B.

6. In parts a)–c), find a subset of S that is a basis for Sp(S). Express each element of S that does not appear in the basis as a linear combination of the basis vectors.
   a) S = {A1, A2, A3, A4, A5}, where

   A1 = [1 −2; 1 −1],  A2 = [2 −3; 4 −3],  A3 = [−1 1; −3 2],  A4 = [1 −1; 4 0],  and  A5 = [12 −17; 30 −11].

   b) S = {p1(x), p2(x), p3(x), p4(x), p5(x)}, where

   p1(x) = 1 − 2x + x^2 − x^3,
   p2(x) = 2 − 3x + 4x^2 − 3x^3,
   p3(x) = −1 + x − 3x^2 + 2x^3,
   p4(x) = 1 − x + 4x^2, and
   p5(x) = 12 − 17x + 30x^2 − 11x^3.

   c) S = {f1(x), f2(x), f3(x), f4(x), f5(x)}, where

   f1(x) = e^x − 2e^{2x} + e^{3x} − e^{4x},
   f2(x) = 2e^x − 3e^{2x} + 4e^{3x} − 3e^{4x},
   f3(x) = −e^x + e^{2x} − 3e^{3x} + 2e^{4x},
   f4(x) = e^x − e^{2x} + 4e^{3x}, and
   f5(x) = 12e^x − 17e^{2x} + 30e^{3x} − 11e^{4x}.

In Exercises 7–11, use the fact that the matrix

[A | b] = [ 1  −1  3   1  3  2  a
            1   0  2   3  2  3  b
            0  −2  2  −4  3  0  c
            2  −1  5   4  6  7  d ]

is row equivalent to

[ 1  0   2  3  0  −1   4a − 3b − 2c
  0  1  −1  2  0   3  −3a + 3b + c
  0  0   0  0  1   2  −2a + 2b + c
  0  0   0  0  0   0   a − 3b − c + d ].

7. Find a basis for Sp{A1, A2, A3, A4}, where

A1 = [1 −1 3; 1 3 2],  A2 = [1 0 2; 3 2 3],  A3 = [0 −2 2; −4 3 0],  and  A4 = [2 −1 5; 4 6 7].

8. Let S = {p1(x), p2(x), p3(x), p4(x), p5(x), p6(x)}, where

p1(x) = 1 + x + 2x^3,
p2(x) = −1 − 2x^2 − x^3,
p3(x) = 3 + 2x + 2x^2 + 5x^3,
p4(x) = 1 + 3x − 4x^2 + 4x^3,
p5(x) = 3 + 2x + 3x^2 + 6x^3, and
p6(x) = 2 + 3x + 7x^3.

Find a subset of S that is a basis for Sp(S).

9. Let S be the set of polynomials given in Exercise 8. Show that q(x) = 1 + 2x − x^2 + 4x^3 is in Sp(S), and express q(x) as a linear combination of the basis vectors found in Exercise 8.

10. If

S = { [1 1; 0 2],  [−1 0; −2 −1],  [3 2; 2 5],  [1 3; −4 4],  [3 2; 3 6],  [2 3; 0 7] },

then give an algebraic specification for Sp(S) and use the specification to determine a basis for Sp(S).

11. Let V be the vector space of all (2 × 3) matrices, and suppose that T : V → P3 is the linear transformation defined by

T([a11 a12 a13; a21 a22 a23])
= (a11 − a12 + 3a13 + a21 + 3a22 + 2a23)
+ (a11 + 2a13 + 3a21 + 2a22 + 3a23)x
+ (−2a12 + 2a13 − 4a21 + 3a22)x^2
+ (2a11 − a12 + 5a13 + 4a21 + 6a22 + 7a23)x^3.

   a) Calculate the matrix of T relative to the natural bases B and C for V and P3, respectively.
   b) Determine the rank and the nullity of T.
   c) Give an algebraic specification for R(T) and use the specification to determine a basis for R(T).
   d) Show that q(x) = 1 + 2x − x^2 + 4x^3 is in R(T) and find a matrix A in V such that T(A) = q(x).
   e) Find a basis for N(T).

12. Show that there is a linear transformation T : R^2 → P2 such that

T([0, 1]^T) = 1 + 2x + x^2  and  T([−1, 1]^T) = 2 − x.

Give a formula for T([a, b]^T).

13. Show that there are infinitely many linear transformations T : P2 → R^2 such that

T(x) = [1, 0]^T  and  T(x^2) = [0, 1]^T.

Give a formula for T(a + bx + cx^2) for one such linear transformation.


14. Let V be the vector space of (2 × 2) matrices, and let T : V → P2 be the linear transformation defined by

T([a b; c d]) = (a − b + c − 4d) + (b + c + 3d)x + (a + 2c − d)x^2.

   a) Find the matrix of T relative to the natural bases, B and C, for V and P2, respectively.
   b) Give an algebraic specification for R(T) and use the specification to obtain a basis S for R(T).
   c) For each polynomial q(x) in S, find a matrix A in V such that T(A) = q(x). Let B1 denote the set of matrices found.
   d) Find a basis, B2, for N(T).
   e) Show that B1 ∪ B2 is a basis for V. (Note: This exercise illustrates the proof that rank(T) + nullity(T) = dim(V).)

CONCEPTUAL EXERCISES

In Exercises 1–10, answer true or false. Justify your answer by providing a counterexample if the statement is false or an outline of a proof if the statement is true.

1. If a is a nonzero scalar and u and v are vectors in a vector space V such that au = av, then u = v.

2. If v is a nonzero vector in a vector space V and a and b are scalars such that av = bv, then a = b.

3. Every vector space V contains a unique vector called the additive inverse of V.

4. If V consists of all real polynomials of degree exactly n together with the zero polynomial, then V is a vector space.

5. If W is a subspace of the vector space V and dim(W) = dim(V) = n, then W = V.

6. If dim(V) = n and W is a subspace of V, then dim(W) ≤ n.

7. The subset {θ} of a vector space is linearly dependent.

8. Let S1 and S2 be subsets of a vector space V such that S1 ⊆ S2. If S1 is linearly dependent, then so is S2.

9. Let S1 and S2 be subsets of a vector space V such that S1 ⊆ S2. If S1 is linearly independent, then so is S2.

10. Suppose that S1 = {v1, . . . , vk} and S2 = {w1, . . . , wl} are subsets of a vector space V. If V = Sp(S1) and S2 is linearly independent, then l ≤ k.

In Exercises 11–19, give a brief answer.

11. Let W be a subspace of the vector space V. If u and v are elements of V such that u + v and u − v are in W, show that u and v are in W.

12. Let W be a subset of a vector space V that satisfies the following properties:
   i) θ is in W.
   ii) If x and y are in W and a is a scalar, then ax + y is in W.
Prove that W is a subspace of V.

13. If W is a subspace of a vector space V, show that Sp(W) = W.

14. Give examples of subsets S1 and S2 of a vector space V such that Sp(S1) ∩ Sp(S2) ≠ Sp(S1 ∩ S2).

15. If U and W are subspaces of a vector space V, then U + W = {u + w : u is in U and w is in W}.
   a) Prove that U + W is a subspace of V.
   b) Let S1 = {x1, . . . , xm} and S2 = {y1, . . . , yn} be subsets of V. Show that Sp(S1 ∪ S2) = Sp(S1) + Sp(S2).

16. Let B = {v1, . . . , vn} be a basis for a vector space V, and let v be a nonzero vector in V. Prove that there exists a vector vj in B, 1 ≤ j ≤ n, such that vj can be replaced by v and the resulting set, B′, is still a basis for V.

17. Let B = {v1, . . . , vn} be a basis for a vector space V, and let S : V → W and T : V → W be linear


transformations such that S(vi) = T(vi) for i = 1, 2, . . . , n. Show that S = T.

18. Let T : V → W be a linear transformation.
   a) If T is one to one, then show that T carries linearly independent subsets of V to linearly independent subsets of W.
   b) If T carries linearly independent subsets of V to linearly independent subsets of W, then prove that T is one to one.

19. Give an example of a linear transformation T : R^2 → R^2 such that N(T) = R(T).

MATLAB EXERCISES

In these exercises we expand on least-squares approximation of functions, an important topic introduced in Section 5.6. As an inner-product space, we use C[a, b] with an inner product given by

〈f, g〉 = ∫_a^b w(x) f(x) g(x) dx.

For the inner product just defined, y = w(x) denotes a function that is positive and continuous on (a, b); the function w is called a weight function.

Let y = f (x) denote a function we wish to approximate. Let y = p∗(x) denote the bestapproximation to f in Pn. In particular, if y = q(x) is any polynomial in Pn, then we have∫ b

a

w(x)[f (x)− p∗(x)]2 dx ≤∫ b

a

w(x)[f (x)− q(x)]2 dx. (1)

By Theorem 12, the best approximation p∗ is characterized by the condition:

〈f − p∗, q〉 = 0, for all q in Pn.

Let {qj}, j = 0, 1, . . . , n, be any basis for Pn. The preceding condition can be replaced by the set of n + 1 equations 〈f − p∗, qj〉 = 0, j = 0, 1, . . . , n. Equivalently, p∗ is characterized by

〈p∗, qj〉 = 〈f, qj〉, j = 0, 1, . . . , n.  (2)

Now, suppose that p∗ has the following representation in terms of the basis:

p∗(x) = a0q0(x) + a1q1(x) + · · · + anqn(x).

Inserting this representation into Eq. (2), we obtain a system of n + 1 equations in the n + 1 unknowns a0, a1, . . . , an:

a0〈q0, q0〉 + a1〈q1, q0〉 + · · · + an〈qn, q0〉 = 〈f, q0〉
a0〈q0, q1〉 + a1〈q1, q1〉 + · · · + an〈qn, q1〉 = 〈f, q1〉
   ...
a0〈q0, qn〉 + a1〈q1, qn〉 + · · · + an〈qn, qn〉 = 〈f, qn〉   (3)

The equations above are called the normal equations, and the coefficient matrix for the system is known as the Gram matrix. For notation, let us denote the system (3) by

Ga = f.  (4)


where

G = [ 〈q0, q0〉  〈q1, q0〉  · · ·  〈qn, q0〉
      〈q0, q1〉  〈q1, q1〉  · · ·  〈qn, q1〉
         ...
      〈q0, qn〉  〈q1, qn〉  · · ·  〈qn, qn〉 ],

a = [a0, a1, . . . , an]^T,  and  f = [〈f, q0〉, 〈f, q1〉, . . . , 〈f, qn〉]^T.

Thus, to find the best least-squares polynomial approximation to a function f, we can use the following algorithm:

1. Choose a basis for Pn.
2. Set up the Gram matrix G and the vector f, and then solve Eq. (4).

Note: The preceding process is not restricted to polynomial approximations of f. In particular, without loss of generality, we can replace the subspace Pn by any finite-dimensional subspace of C[a, b].
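To make the algorithm concrete, here is a minimal MATLAB sketch for the data of Exercise 1 below (f(x) = cos x on [0, 1], weight w(x) = 1, monomial basis qj(x) = x^j, and n = 2). The variable names are our own, and we use MATLAB's integral routine for the numerical inner products; the quad8 routine mentioned in the exercises plays the same role in older MATLAB releases.

% Least-squares approximation of f(x) = cos(x) on [0,1] by a quadratic:
% build the Gram matrix G and right-hand side of Eq. (4), then solve Ga = f.
n = 2;                              % degree of the approximating polynomial
G = zeros(n+1);                     % Gram matrix: G(i+1,j+1) = <x^j, x^i>
fvec = zeros(n+1, 1);               % right-hand side: fvec(i+1) = <cos x, x^i>
for i = 0:n
    for j = 0:n
        G(i+1, j+1) = 1/(i+j+1);    % <x^j, x^i> = 1/(i+j+1) by direct integration
    end
    fvec(i+1) = integral(@(x) cos(x).*x.^i, 0, 1);
end
a = G \ fvec;                       % coefficients of p*(x) = a(1) + a(2)x + a(3)x^2
% Note: G equals hilb(n+1), the Hilbert matrix; compare Exercise 4 below.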

1. Let f(x) = cos x, [a, b] = [0, 1], and w(x) = 1. Also, let n = 2 and qj(x) = x^j for j = 0, 1, 2. Find the best least-squares approximation to f by solving Eq. (4). In setting up the matrix G and the vector f, you can evaluate the inner products using an integral table or by using the MATLAB numerical integration routine quad8 to estimate them. If you use quad8, you might want to test the effects of using different tolerances.

2. In Example 6 of Section 5.6, the least-squares approximation problem in Exercise 1 was worked using a different basis for P2. Verify that you got the same polynomial p∗ in Exercise 1 even though the basis was different. On the same MATLAB plot, compare the graphs of y = cos x and y = p∗(x). Next, plot the difference function y = cos x − p∗(x) and use your graph to estimate the maximum error.

3. Repeat Exercise 1, only this time use the basis from Example 6: q0(x) = 1, q1(x) = x − 1/2, and q2(x) = x^2 − x + 1/6. What differences are there between the Gram matrix G in this exercise and the matrix G in Exercise 1?

4. If you did not already do so in Exercise 1, calculate by hand the ij-th entry of the Gram matrix for the basis of Exercise 1. Is the Gram matrix you find equal to the (3 × 3) Hilbert matrix? Suppose we were looking for an nth-degree polynomial approximation. Would the Gram matrix be the ((n + 1) × (n + 1)) Hilbert matrix? If we used an orthogonal basis in Eqs. (3) and (4), would the Gram matrix be a diagonal matrix? (Note that Eq. (4) is ill conditioned when G is the Hilbert matrix, but we would hope that it might be better conditioned when G is a diagonal matrix.)

5. Many applications of mathematics require the use of functions defined by integrals of the form

f(x) = ∫_0^x g(t) dt.  (5)


Quite often the integral defining f is not an elementary one and can only be evaluated numerically. Some examples are

a) f(x) = ∫_0^x e^(t^2) dt    b) f(x) = ∫_0^x (sin t)/t dt    c) f(x) = ∫_0^x cos t^2 dt.

These functions are, respectively, the error function, the sine integral, and the Fresnel integral. In each case, the integral defining f(x) must be evaluated numerically.

Rather than calling a numerical integration routine whenever we need the value f(x), we might consider approximating f by a polynomial. That idea is the theme of this exercise.

Now, if we are to approximate f by its best least-squares polynomial approximation p∗, we first have to choose a basis for Pn and then solve the normal equations (represented in Eq. (4) by Ga = f). As we can see from Eqs. (3) and (4), the vector f has components 〈f, q0〉, 〈f, q1〉, . . . , 〈f, qn〉. Since f itself must be evaluated numerically, we would have to do the same for each of the components 〈f, qk〉. However, using a numerical integration routine to estimate 〈f, qk〉 requires us to supply a formula of some sort for f(x). To avoid this requirement, we use integration by parts to replace evaluations of f by evaluations of f′.

In particular, suppose we want to approximate y = f(x) for x in [0, 1]. Let us choose y = ρ(x) to be an antiderivative of qk(x) with the property that ρ(1) = 0. Then, using integration by parts, we have

〈f, qk〉 = ∫_0^1 f(x)qk(x) dx
        = ρ(x)f(x)|_0^1 − ∫_0^1 ρ(x)f′(x) dx
        = −∫_0^1 ρ(x)g(x) dx.  (6)

To explain the preceding calculations: we used integration by parts with u = f(x), dv = qk(x) dx, v = ρ(x), and du = f′(x) dx. To obtain the final result, we used the fact that ρ(1) = 0 and f(0) = 0, and also the fact that f′(x) = g(x) by the fundamental theorem of calculus.

Let g(x) = cos x^2 and use the preceding ideas to find the best least-squares approximation to the Fresnel integral f(x). Use n = 2, 4, and 6. For ease of calculation, use the standard basis for Pn, qk(x) = x^k, k = 0, 1, . . . , n. (Note that this choice of basis means that the Gram matrix will be a Hilbert matrix. However, for the small values of n we are using, the Hilbert matrix is not that badly behaved. You can use the MATLAB command hilb(n+1) to create the ((n + 1) × (n + 1)) matrix G in Eq. (4).) In order to find the components of the vector f on the right-hand side of Eq. (4), use a numerical integration routine such as quad8. Because of Eq. (6), the components of f can be found by evaluating the following integral numerically:

〈f, qk〉 = −∫_0^1 [(x^(k+1) − 1)/(k + 1)] cos(x^2) dx.
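A minimal MATLAB sketch of the n = 2 case of this exercise, under the same assumptions as before (our own variable names; integral in place of quad8):

% Approximate the Fresnel integral f(x) = integral of cos(t^2), t from 0 to x,
% on [0,1]: Hilbert Gram matrix plus the right-hand side from Eq. (6).
n = 2;
G = hilb(n+1);                          % Gram matrix for q_k(x) = x^k
fvec = zeros(n+1, 1);
for k = 0:n
    rho = @(x) (x.^(k+1) - 1)/(k+1);    % antiderivative of x^k with rho(1) = 0
    fvec(k+1) = -integral(@(x) rho(x).*cos(x.^2), 0, 1);    % Eq. (6)
end
a = G \ fvec;                           % coefficients of p*(x)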


6  Determinants

This chapter may be covered at any time after Chapter 1.

Overview  In this chapter we introduce the idea of the determinant of a square matrix. We also investigate some of the properties of the determinant. For example, a square matrix is singular if and only if its determinant is zero.

We also consider applications of determinants in matrix theory. For instance, we describe Cramer's Rule for solving Ax = b, see how to express A−1 in terms of the adjoint matrix, and show how the Wronskian can be used as a device for determining linear independence of a set of functions.

Core Sections
6.2 Cofactor Expansions of Determinants
6.3 Elementary Operations and Determinants
6.4 Cramer's Rule
6.5 Applications of Determinants: Inverses and Wronskians


6.1 INTRODUCTION

Determinants have played a major role in the historical development of matrix theory, and they possess a number of properties that are theoretically pleasing. For example, in terms of linear algebra, determinants can be used to characterize nonsingular matrices, to express solutions of nonsingular systems Ax = b, and to calculate the dimension of subspaces. In analysis, determinants are used to express vector cross products, to express the conversion factor (the Jacobian) when a change of variables is needed to evaluate a multiple integral, to serve as a convenient test (the Wronskian) for linear independence of sets of functions, and so on. We explore the theory and some of the applications of determinants in this chapter.

The material in Sections 6.2 and 6.3 duplicates the material in Sections 4.2 and 4.3 in order to present a contiguous coverage of determinants. The treatment is slightly different because the material in Chapter 6 is self-contained, whereas Chapter 4 uses a result (Theorem 6.13) that is stated in Chapter 4 but actually proved in Chapter 6. Hence, the reader who has seen the results of Sections 4.2 and 4.3 might want to proceed directly to Section 6.4.

6.2 COFACTOR EXPANSIONS OF DETERMINANTS

If A is an (n × n) matrix, the determinant of A, denoted det(A), is a number that we associate with A. Determinants are usually defined either in terms of cofactors or in terms of permutations, and we elect to use the cofactor definition here. We begin with the definition of det(A) when A is a (2 × 2) matrix.

Definition 1 Let A = (aij ) be a (2× 2) matrix. The determinant of A is given by

det(A) = a11a22 − a12a21.

For notational purposes the determinant is often expressed by using vertical bars:

det(A) = | a11  a12 |
         | a21  a22 |.

Example 1  Find the determinants of the following matrices:

A = [  1  2        B = [ 4  1              C = [ 3  4
      −1  3 ],           2  1 ],  and            6  8 ].


Solution

det(A) = |  1  2 | = 1 · 3 − 2(−1) = 5;
         | −1  3 |

det(B) = | 4  1 | = 4 · 1 − 1 · 2 = 2;
         | 2  1 |

det(C) = | 3  4 | = 3 · 8 − 4 · 6 = 0.
         | 6  8 |

We now define the determinant of an (n × n) matrix as a weighted sum of [(n − 1) × (n − 1)] determinants. It is convenient to make a preliminary definition.

Definition 2  Let A = (aij) be an (n × n) matrix, and let Mrs denote the [(n − 1) × (n − 1)] matrix obtained by deleting the rth row and sth column from A. Then Mrs is called a minor matrix of A, and the number det(Mrs) is the minor of the (r, s)th entry, ars. In addition, the numbers

Aij = (−1)^(i+j) det(Mij)

are called cofactors (or signed minors).

Example 2  Determine the minor matrices M11, M23, and M32 for the matrix A given by

A = [ 1 −1  2
      2  3 −3
      4  5  1 ].

Also, calculate the cofactors A11, A23, and A32.

Solution  Deleting row 1 and column 1 from A, we obtain M11:

M11 = [ 3 −3
        5  1 ].

Similarly, the minor matrices M23 and M32 are

M23 = [ 1 −1              M32 = [ 1  2
        4  5 ]  and               2 −3 ].


The associated cofactors, Aij = (−1)^(i+j) det(Mij), are given by

A11 = (−1)^(1+1) | 3 −3 | = 3 + 15 = 18;
                 | 5  1 |

A23 = (−1)^(2+3) | 1 −1 | = −(5 + 4) = −9;
                 | 4  5 |

A32 = (−1)^(3+2) | 1  2 | = −(−3 − 4) = 7.
                 | 2 −3 |

We use cofactors in our definition of the determinant.

Definition 3  Let A = (aij) be an (n × n) matrix. Then the determinant of A is

det(A) = a11A11 + a12A12 + · · · + a1nA1n,

where A1j is the cofactor of a1j, 1 ≤ j ≤ n.

Determinants are defined only for square matrices. Note also the inductive nature of the definition. For example, if A is (3 × 3), then det(A) = a11A11 + a12A12 + a13A13, and the cofactors A11, A12, and A13 can be evaluated from Definition 1. Similarly, the determinant of a (4 × 4) matrix is the sum of four (3 × 3) determinants, where each (3 × 3) determinant is in turn the sum of three (2 × 2) determinants.
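The inductive definition translates directly into a short recursive program. The following MATLAB sketch (the function name cofactor_det is our own) implements Definition 3 literally; it is meant to illuminate the definition rather than to compete with the built-in det.

function d = cofactor_det(A)
% COFACTOR_DET  Determinant by cofactor expansion along the first row
% (Definition 3). A is assumed square.
n = size(A, 1);
if n == 1
    d = A(1,1);                        % base case: (1 x 1) determinant
else
    d = 0;
    for j = 1:n
        M1j = A(2:n, [1:j-1, j+1:n]);  % minor matrix: delete row 1, column j
        d = d + (-1)^(1+j) * A(1,j) * cofactor_det(M1j);
    end
end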

Example 3  Compute det(A), where

A = [ 3  2  1
      2  1 −3
      4  0  1 ].

Solution  The matrix A is (3 × 3). Using n = 3 in Definition 3, we have

det(A) = a11A11 + a12A12 + a13A13

       = 3 | 1 −3 | − 2 | 2 −3 | + 1 | 2  1 |
           | 0  1 |     | 4  1 |     | 4  0 |

       = 3(1) − 2(14) + 1(−4) = −29.


DETERMINANTS BY PERMUTATIONS  The determinant of an (n × n) matrix A can be defined in terms of permutations rather than cofactors. Specifically, let S = {1, 2, . . . , n} denote the set consisting of the first n positive integers. A permutation (j1, j2, . . . , jn) of the set S is just a rearrangement of the numbers in S. An inversion of this permutation occurs whenever a number jr is followed by a smaller number js. For example, the permutation (1, 3, 2) has one inversion, but (2, 3, 1) has two inversions. A permutation of S is called odd or even if it has an odd or even number of inversions.

It can be shown that det(A) is the sum of all possible terms of the form ±a1j1 a2j2 · · · anjn, where (j1, j2, . . . , jn) is a permutation of S and the sign is taken as + or −, depending on whether the permutation is even or odd. For instance,

| a11  a12 | = +a11a22 − a12a21;
| a21  a22 |

| a11  a12  a13 |
| a21  a22  a23 | = +a11a22a33 − a11a23a32 − a12a21a33 + a12a23a31 + a13a21a32 − a13a22a31.
| a31  a32  a33 |

Since there are n! different permutations when S = {1, 2, . . . , n}, you can see why this definition is not suitable for calculation. For example, calculating the determinant of a (10 × 10) matrix requires us to evaluate 10! = 3,628,800 different terms of the form ±a1j1 a2j2 · · · a10j10. The permutation definition is useful for theoretical purposes, however. For instance, the permutation definition gives immediately that det(A) = 0 when A has a row of zeros.
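For small n, the permutation definition can be verified directly. In this MATLAB sketch (our own variable names), perms(1:n) lists all n! permutations, and the parity of each permutation is found by counting its inversions:

% Determinant by the permutation definition (practical only for tiny n).
A = [3 2 1; 2 1 -3; 4 0 1];          % matrix from Example 3; det(A) = -29
n = size(A, 1);
P = perms(1:n);                      % all n! permutations of (1, ..., n)
d = 0;
for r = 1:size(P, 1)
    p = P(r, :);
    inv_count = 0;                   % count inversions to get the parity
    for i = 1:n-1
        inv_count = inv_count + sum(p(i) > p(i+1:n));
    end
    term = prod(A(sub2ind([n n], 1:n, p)));   % a_{1,p(1)} * ... * a_{n,p(n)}
    d = d + (-1)^inv_count * term;
end
disp(d)                              % prints -29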

Example 4  Compute det(A), where

A = [  1  2  0  2
      −1  2  3  1
      −3  2 −1  0
       2 −3 −2  1 ].

Solution  The matrix A is (4 × 4). Using n = 4 in Definition 3, we have

det(A) = a11A11 + a12A12 + a13A13 + a14A14 = A11 + 2A12 + 2A14.

The required cofactors, A11, A12, and A14, are calculated as in Example 3 (note that the cofactor A13 is not needed, since a13 = 0).


In detail,

A11 = |  2  3  1 |
      |  2 −1  0 |  =  2 | −1  0 | − 3 |  2  0 | + 1 |  2 −1 |  =  −15;
      | −3 −2  1 |       | −2  1 |     | −3  1 |     | −3 −2 |

A12 = − | −1  3  1 |
        | −3 −1  0 |  =  −( −1 | −1  0 | − 3 | −3  0 | + 1 | −3 −1 | )  =  −18;
        |  2 −2  1 |           | −2  1 |     |  2  1 |     |  2 −2 |

A14 = − | −1  2  3 |
        | −3  2 −1 |  =  −( −1 |  2 −1 | − 2 | −3 −1 | + 3 | −3  2 | )  =  −6.
        |  2 −3 −2 |           | −3 −2 |     |  2 −2 |     |  2 −3 |

Thus it follows that

det(A) = A11 + 2A12 + 2A14 = −15 − 36 − 12 = −63.

The definition of det(A) given in Definition 3 and used in Examples 3 and 4 is based on a cofactor expansion along the first row of A. In Section 6.5 (see Theorem 13), we prove that the value det(A) can be calculated from a cofactor expansion along any row or any column.

Also, note in Example 4 that the calculation of the (4 × 4) determinant was simplified because of the zero entry in the (1, 3) position. Clearly, if we had some procedure for creating zero entries, we could simplify the computation of determinants, since the cofactor of a zero entry need not be calculated. We will develop such simplifications in the next section.

Example 5  Compute the determinant of the lower-triangular matrix T, where

T = [ 3  0  0  0
      1  2  0  0
      2  3  2  0
      1  4  5  1 ].


Solution  We have det(T) = t11T11 + t12T12 + t13T13 + t14T14. Since t12 = 0, t13 = 0, and t14 = 0, the calculation simplifies to

det(T) = t11T11 = 3 | 2  0  0 |
                    | 3  2  0 |  =  3 · 2 | 2  0 |  =  3 · 2 · 2 · 1 = 12.
                    | 4  5  1 |           | 5  1 |

In Example 5, we saw that the determinant of the lower-triangular matrix T was the product of the diagonal entries, det(T) = t11t22t33t44. This simple relationship is valid for any lower-triangular matrix.

Theorem 1  Let T = (tij) be an (n × n) lower-triangular matrix. Then

det(T) = t11 · t22 · · · tnn.

Proof  If T is a (2 × 2) lower-triangular matrix, then

det(T) = | t11   0  | = t11t22.
         | t21  t22 |

Proceeding inductively, suppose that the theorem is true for any (k × k) lower-triangular matrix, where 2 ≤ k ≤ n − 1. If T is an (n × n) lower-triangular matrix, then

det(T) = | t11   0    0   · · ·   0  |
         | t21  t22   0   · · ·   0  |
         |  ...                      |
         | tn1  tn2  tn3  · · ·  tnn |  =  t11T11,

where

T11 = | t22   0   · · ·   0  |
      | t32  t33  · · ·   0  |
      |  ...                 |
      | tn2  tn3  · · ·  tnn |.

Clearly, T11 is the determinant of an [(n − 1) × (n − 1)] lower-triangular matrix, so T11 = t22t33 · · · tnn. Thus det(T) = t11t22 · · · tnn, and the theorem is proved.
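In computational terms, Theorem 1 says the determinant of a triangular matrix is just the product of its diagonal entries; a quick MATLAB check on the matrix of Example 5 (our own snippet):

T = [3 0 0 0; 1 2 0 0; 2 3 2 0; 1 4 5 1];   % lower triangular, from Example 5
disp(prod(diag(T)))                          % product of diagonal entries: 12
disp(det(T))                                 % built-in determinant agrees: 12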

Example 6  Let I denote the (n × n) identity matrix. Calculate det(I).

Solution  Since I is a lower-triangular matrix with diagonal entries equal to 1, we see from Theorem 1 that

det(I) = 1 · 1 · · · 1  (n factors)  = 1.

6.2 EXERCISES

In Exercises 1–8, evaluate the determinant of the given matrix. If the determinant is zero, find a nonzero vector x such that Ax = θ. (We will see later that det(A) = 0 if and only if A is singular.)

1. [ 1  3       2. [ 6  7
     2  1 ]          7  3 ]

3. [ 2  4       4. [ 1  3
     4  8 ]          0  2 ]

5. [ 4  3       6. [ 2 −1
     1  7 ]          1  1 ]

7. [  4  1      8. [ 1  3
     −2  1 ]         2  6 ]

In Exercises 9–14, calculate the cofactors A11, A12, A13, and A33 for the given matrix A.

9. A = [ 1  2  1       10. A = [ 1  4  0
         0  1  3                 1  0  2
         2  1  1 ]               3  1  2 ]

11. A = [  2 −1  3     12. A = [ 1  1  1
          −1  2  2               1  1  2
           3  2  1 ]             2  1  1 ]

13. A = [ −1  1 −1     14. A = [ 4  2  1
           2  1  0               4  3  1
           0  1  3 ]             0  0  2 ]

In Exercises 15–20, use the results of Exercises 9–14 to find det(A), where:

15. A is in Exercise 9. 16. A is in Exercise 10.

17. A is in Exercise 11. 18. A is in Exercise 12.

19. A is in Exercise 13. 20. A is in Exercise 14.

In Exercises 21–24, calculate det(A).

21. A = [ 2  1 −1  2       22. A = [ 1 −1  1  2
          3  0  0  1                 1  0  1  3
          2  1  2  0                 0  0  2  4
          3  1  1  2 ]               1  1 −1  1 ]

23. A = [ 2  0  2  0       24. A = [ 1  2  1  1
          1  3  1  2                 0  2  0  3
          0  1  2  1                 1  4  1  2
          0  3  1  4 ]               0  2  1  3 ]

In Exercises 25 and 26, show that the quantities det(A), a21A21 + a22A22 + a23A23, and a31A31 + a32A32 + a33A33 are all equal. (This is a special case of a general result given later in Theorem 13.)

25. A = [  1  3  2         26. A = [ 2  4  1
          −1  4  1                   3  1  3
           2  2  3 ]                 2  3  2 ]

In Exercises 27 and 28, show that a11A21 + a12A22 + a13A23 = 0 and a11A31 + a12A32 + a13A33 = 0. (This is a special case of a general result given later in the lemma to Theorem 14.)

27. A as in Exercise 25    28. A as in Exercise 26

In Exercises 29 and 30, form the (3 × 3) matrix of cofactors C, where cij = Aij, and then calculate BA, where B = C^T. How can you use this result to find A−1?

29. A as in Exercise 25    30. A as in Exercise 26

31. Verify that det(A) = 0 when

A = [ 0  a12  a13
      0  a22  a23
      0  a32  a33 ].

32. Use the result of Exercise 31 to prove that if U = (uij) is a (4 × 4) upper-triangular matrix, then det(U) = u11u22u33u44.

33. Let A = (aij) be a (2 × 2) matrix. Show that det(A^T) = det(A).

34. An (n × n) symmetric matrix A is called positive definite if x^T Ax > 0 for all x in R^n, x ≠ θ. Let A be a (2 × 2) symmetric matrix. Prove the following:
a) If A is positive definite, then a11 > 0 and det(A) > 0.
b) If a11 > 0 and det(A) > 0, then A is positive definite.
[Hint: For part a), consider x = e1. Then consider x = [u, v]^T and use the fact that A is symmetric.]

35. a) Let A be an (n × n) matrix. If n = 3, det(A) can be found by evaluating three (2 × 2) determinants. If n = 4, det(A) can be found by evaluating twelve (2 × 2) determinants. Give a formula, H(n), for the number of (2 × 2) determinants necessary to find det(A) for an arbitrary n.
b) Suppose you can perform additions, subtractions, multiplications, and divisions each at a rate of one per second. How long does it take to evaluate H(n) determinants of order (2 × 2) when n = 2, n = 5, and n = 10?

6.3 ELEMENTARY OPERATIONS AND DETERMINANTS

In this section we show how certain column operations simplify the calculation of determinants. In addition, the properties we develop will be used later to demonstrate some of the connections between determinant theory and linear algebra. We use three elementary column operations, which are analogous to the elementary row operations defined in Chapter 1. For a matrix A, the elementary column operations are as follows:

1. Interchange two columns of A.
2. Multiply a column of A by a scalar c, c ≠ 0.
3. Add a scalar multiple of one column of A to another column of A.

From Chapter 1, we know that row operations can be used to reduce a square matrix A to an upper-triangular matrix (that is, we know A can be reduced to echelon form, and a square matrix in echelon form is upper triangular). Similarly, it is easy to show that column operations can be used to reduce a square matrix to lower-triangular form. One reason for reducing an (n × n) matrix A to a lower-triangular matrix T is that det(T) is trivial to evaluate (see Theorem 1). Thus if we can calculate the effect that column operations have on the determinant, we can relate det(A) to det(T).

Before proceeding, we wish to make the following statement about elementary row and column operations. We will prove a succession of results dealing only with column operations. These results lead to a proof in Section 6.5 of the following theorem (see Theorem 12):

Theorem  If A is an (n × n) matrix, then

det(A^T) = det(A).  (1)

Once Eq. (1) is formally established, we will immediately know that the theorems for column operations are also valid for row operations. (Row operations on A are precisely mirrored by column operations on A^T.) Therefore the following theorems are stated in terms of elementary row operations, as well as elementary column operations, although the row results will not be truly established until Theorem 12 is proved.

Elementary Operations

Our purpose is to describe how the determinant of a matrix A changes when an elementary column operation is applied to A. The description will take the form of a series of


theorems. Because of the technical nature of the first three theorems, we defer their proofs to the end of the section.

Our first result relating to elementary operations is given in Theorem 2. This theorem asserts that a column interchange (or a row interchange) will change the sign of the determinant.

Theorem 2  Let A = [A1, A2, . . . , An] be an (n × n) matrix. If B is obtained from A by interchanging two columns (or rows) of A, then det(B) = −det(A).

The proof of Theorem 2 is given at the end of this section.

Example 1  Verify Theorem 2 for the (2 × 2) matrix

A = [ a11  a12
      a21  a22 ].

Solution  Let B denote the matrix obtained by interchanging the first and second columns of A. Thus B is given by

B = [ a12  a11
      a22  a21 ].

Now det(B) = a12a21 − a11a22, and det(A) = a11a22 − a12a21. Thus det(B) = −det(A).

Example 2  Let A be the (3 × 3) matrix

A = [ 1  3  1
      2  0  4
      1  2  3 ].

The determinant of A is −10. Use the fact that det(A) = −10 to find the determinants of B, C, and F, where

B = [ 3  1  1        C = [ 1  1  3              F = [ 1  1  3
      0  2  4              2  4  0                    4  2  0
      2  1  3 ],           1  3  2 ],  and            3  1  2 ].

Solution  If A is given in column form as A = [A1, A2, A3], then B = [A2, A1, A3], C = [A1, A3, A2], and F = [A3, A1, A2]. Since both B and C are obtained from A by a single column interchange, it follows from Theorem 2 that

det(B) = det(C) = −det(A) = 10.

We can obtain F from A by two column interchanges as follows:

A → G = [A2, A1, A3] → F = [A3, A1, A2].

From Theorem 2, det(G) = −det(A) and det(F) = −det(G). Therefore det(F) = −det(G) = −[−det(A)] = det(A) = −10.


By performing a sequence of column interchanges, we can produce any rearrangement of columns that we wish, and Theorem 2 can be used to find the determinant of the end result. For example, if A = [A1, A2, A3, A4] is a (4 × 4) matrix and B = [A4, A3, A1, A2], then we can relate det(B) to det(A) as follows: Form B1 = [A4, A2, A3, A1]; then form B2 = [A4, A3, A2, A1]; and then form B by interchanging the last two columns of B2. In this sequence, det(B1) = −det(A) and det(B2) = −det(B1), so det(B) = −det(B2) = det(B1) = −det(A).

Our next theorem shows that multiplying all entries in a column of A by a scalar c has the effect of multiplying the determinant by c.

Theorem 3  If A is an (n × n) matrix, and if B is the (n × n) matrix resulting from multiplying the kth column (or row) of A by a scalar c, then det(B) = c det(A).

Again, the proof of Theorem 3 is rather technical, so we defer it to the end of this section. The next example, however, verifies Theorem 3 for a (2 × 2) matrix A.

Example 3  Verify Theorem 3 for the (2 × 2) matrix

A = [ a11  a12
      a21  a22 ].

Solution  Consider the matrices A′ and A′′ given by

A′ = [ ca11  a12              A′′ = [ a11  ca12
       ca21  a22 ]  and              a21  ca22 ].

Clearly, det(A′) = ca11a22 − ca21a12 = c(a11a22 − a21a12) = c det(A). Similarly,

det(A′′) = ca11a22 − ca21a12 = c(a11a22 − a21a12) = c det(A).

These calculations prove Theorem 3 for a (2 × 2) matrix A.

We emphasize that Theorem 3 is valid when c = 0. That is, if A has a column of zeros, then det(A) = 0.

Example 4  Let A be the (3 × 3) matrix

A = [ 1  3  1
      2  0  4
      1  2  3 ].

The determinant of A is −10. Use the fact that det(A) = −10 to find the determinants of G, H, and J, where

G = [ 2  3  1        H = [ 2 −3  1              J = [ 2 −3  2
      4  0  4              4  0  4                    4  0  8
      2  2  3 ],           2 −2  3 ],  and            2 −2  6 ].

Solution  Let A = [A1, A2, A3]. Then

G = [2A1, A2, A3],  H = [2A1, −A2, A3],  and  J = [2A1, −A2, 2A3].

By Theorem 3, det(G) = 2 det(A) = −20.

Next, H is obtained from G by multiplying the second column of G by −1. Therefore, det(H) = −det(G) = 20. Finally, J is obtained from H by multiplying the third column of H by 2. Thus, det(J) = 2 det(H) = 40.
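The scaling rules of Theorem 3 are easy to confirm numerically; a small MATLAB check on the matrices of Example 4 (our own snippet):

A = [1 3 1; 2 0 4; 1 2 3];      % det(A) = -10
G = A;  G(:,1) = 2*A(:,1);      % scale column 1 by 2
H = G;  H(:,2) = -G(:,2);       % scale column 2 by -1
J = H;  J(:,3) = 2*H(:,3);      % scale column 3 by 2
disp([det(A), det(G), det(H), det(J)])   % prints -10  -20  20  40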

The following result is a corollary of Theorem 3:

Corollary  Let A be an (n × n) matrix and let c be a scalar. Then

det(cA) = c^n det(A).

We leave the proof of the corollary as Exercise 32.

Example 5  Find det(3A), where

A = [ 1  2
      4  1 ].

Solution  Clearly, det(A) = −7. Therefore, by the corollary, det(3A) = 3^2 det(A) = −63. As a check, note that the matrix 3A is given by

3A = [  3  6
       12  3 ].

Thus, det(3A) = 9 − 72 = −63, confirming the calculation above.

So far we have considered the effect of two elementary column operations: column interchanges and multiplication of a column by a scalar. We now wish to show that the addition of a constant multiple of one column to another column does not change the determinant. We need several preliminary results to prove this.

Theorem 4  If A, B, and C are (n × n) matrices that are equal except that the sth column (or row) of A is equal to the sum of the sth columns (or rows) of B and C, then det(A) = det(B) + det(C).

As before, the proof of Theorem 4 is somewhat technical and is deferred to the end of this section.

Example 6  Verify Theorem 4, where A, B, and C are (2 × 2) matrices.

Solution  Suppose that A, B, and C are (2 × 2) matrices such that the first column of A is equal to the sum of the first columns of B and C. Thus,

B = [ b1  α        C = [ c1  α              A = [ b1 + c1  α
      b2  β ],           c2  β ],  and            b2 + c2  β ].


Calculating det(A), we have

det(A) = (b1 + c1)β − α(b2 + c2)
       = (b1β − αb2) + (c1β − αc2)
       = det(B) + det(C).

The case in which A, B, and C have the same first column is left as an exercise.

Example 7  Given that det(B) = 22 and det(C) = 29, find det(A), where

A = [ 1  3  2        B = [ 1  1  2              C = [ 1  2  2
      0  4  7              0  2  7                    0  2  7
      2  1  8 ],           2  0  8 ],  and            2  1  8 ].

Solution In terms of column vectors, A1 = B1 = C1, A3 = B3 = C3, and A2 = B2 + C2. Thus,

det(A) = det(B)+ det(C) = 22+ 29 = 51.

Theorem 5  Let A be an (n × n) matrix. If the jth column (or row) of A is a multiple of the kth column (or row) of A, then det(A) = 0.

Proof  Let A = [A1, A2, . . . , Aj, . . . , Ak, . . . , An] and suppose that Aj = cAk. Define B to be the matrix B = [A1, A2, . . . , Ak, . . . , Ak, . . . , An] and observe that det(A) = c det(B). Now if we interchange the jth and kth columns of B, the matrix B remains the same, but the determinant changes sign (Theorem 2). This [det(B) = −det(B)] can happen only if det(B) = 0; and since det(A) = c det(B), then det(A) = 0.

Two special cases of Theorem 5 are particularly interesting. If A has two identical columns (c = 1 in the proof above), or if A has a zero column (c = 0 in the proof), then det(A) = 0.

Theorems 4 and 5 can be used to analyze the effect of the last elementary column operation.

Theorem 6  If A is an (n × n) matrix, and if a multiple of the kth column (or row) is added to the jth column (or row), then the determinant is not changed.

Proof  Let A = [A1, A2, . . . , Aj, . . . , Ak, . . . , An] and let B = [A1, A2, . . . , Aj + cAk, . . . , Ak, . . . , An]. By Theorem 4, det(B) = det(A) + det(Q), where Q = [A1, A2, . . . , cAk, . . . , Ak, . . . , An]. By Theorem 5, det(Q) = 0; so det(B) = det(A), and the theorem is proved.

As shown in the examples that follow, we can use elementary column operations to introduce zero entries into the first row of a matrix A. The analysis of how these operations affect the determinant allows us to relate this effect back to det(A).


Example 8  Use elementary column operations to simplify finding the determinant of the (4 × 4) matrix A:

A = [  1  2  0  2
      −1  2  3  1
      −3  2 −1  0
       2 −3 −2  1 ].

Solution  In Example 4 of Section 6.2, a laborious cofactor expansion showed that det(A) = −63. In column form, A = [A1, A2, A3, A4], and clearly we can introduce a zero into the (1, 2) position by replacing A2 by A2 − 2A1. Similarly, replacing A4 by A4 − 2A1 creates a zero in the (1, 4) entry. Moreover, by Theorem 6, the determinant is unchanged. The details are

det(A) = |  1  2  0  2 |      |  1  0  0  2 |      |  1  0  0  0 |
         | −1  2  3  1 |  =   | −1  4  3  1 |  =   | −1  4  3  3 |
         | −3  2 −1  0 |      | −3  8 −1  0 |      | −3  8 −1  6 |
         |  2 −3 −2  1 |      |  2 −7 −2  1 |      |  2 −7 −2 −3 |.

Thus it follows that det(A) is given by

det(A) = |  4  3  3 |
         |  8 −1  6 |
         | −7 −2 −3 |.

We now wish to create zeros in the (1, 2) and (1, 3) positions of this (3 × 3) determinant. To avoid using fractions, we multiply the second and third columns by 4 (using Theorem 3) and then add a multiple of −3 times column 1 to columns 2 and 3:

det(A) = |  4  3  3 |             |  4  12  12 |             |  4   0  0 |
         |  8 −1  6 |  =  (1/16) |  8  −4  24 |  =  (1/16) |  8 −28  0 |
         | −7 −2 −3 |             | −7  −8 −12 |             | −7  13  9 |.

Thus we again find det(A) = −63.
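Reducing to triangular form while tracking how each operation scales the determinant is, in essence, how determinants are computed in practice. The following MATLAB sketch is our own bare-bones version (using row operations, which Theorem 12 will justify): an interchange flips the sign, and adding a multiple of one row to another leaves the determinant unchanged.

function d = det_by_reduction(A)
% DET_BY_REDUCTION  Determinant via elementary operations (Theorems 2 and 6).
n = size(A, 1);
d = 1;
for k = 1:n-1
    [~, p] = max(abs(A(k:n, k)));  p = p + k - 1;   % pick a nonzero pivot
    if A(p, k) == 0, d = 0; return, end             % zero column: det = 0
    if p ~= k
        A([k p], :) = A([p k], :);                  % row interchange ...
        d = -d;                                     % ... changes the sign
    end
    for i = k+1:n
        A(i, :) = A(i, :) - (A(i,k)/A(k,k)) * A(k, :);  % det unchanged
    end
end
d = d * prod(diag(A));         % triangular: product of the diagonal entries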

Example 9  Use column operations to find det(A), where

A = [ 0  1  3  1
      1 −2 −2  2
      3  4  2 −2
      4  3 −1  1 ].

Solution  As in Gaussian elimination, column interchanges are sometimes desirable and serve to keep order in the computations. Consider

det(A) = | 0  1  3  1 |         |  1  0  3  1 |
         | 1 −2 −2  2 |  =  −   | −2  1 −2  2 |
         | 3  4  2 −2 |         |  4  3  2 −2 |
         | 4  3 −1  1 |         |  3  4 −1  1 |.

Use column 1 to introduce zeros along the first row:

det(A) = − |  1  0   0   0 |         | 1   4   4 |
           | −2  1   4   4 |  =  −   | 3 −10  −6 |
           |  4  3 −10  −6 |         | 4 −10  −2 |.
           |  3  4 −10  −2 |

Again column 1 can be used to introduce zeros:

det(A) = − | 1    0    0 |         | −22 −18 |         | −22  1 |
           | 3  −22  −18 |  =  −   | −26 −18 |  =  18  | −26  1 |,
           | 4  −26  −18 |

and we calculate the (2 × 2) determinant to find det(A) = 72.

Proof of Theorems 2, 3, and 4 (Optional)

We conclude this section with the proofs of Theorems 2, 3, and 4. Note that these proofs are very similar and fairly straightforward.

Proof of Theorem 2  The proof is by induction. The initial case (k = 2) was proved in Example 1.

Assuming the result is valid for any (k × k) matrix, 2 ≤ k ≤ n − 1, let B be obtained from A by interchanging the ith and jth columns. For 1 ≤ s ≤ n, let M1s and N1s denote minor matrices of A and B, respectively.

If s ≠ i or j, then N1s is the same as M1s except for a single column interchange. Hence, by the induction hypothesis,

det(N1s) = −det(M1s), s ≠ i or j.

For definiteness let us suppose that i > j. Note that N1i contains no entries from the original jth column. Furthermore, the columns of N1i can be rearranged to be the same as the columns of M1j by i − j − 1 successive interchanges of adjacent columns. By the induction hypothesis, each such interchange causes a sign change, and so

det(N1i) = (−1)^(i−j−1) det(M1j).


Therefore,

det(B) = Σ′ a1s(−1)^(1+s) det(N1s) + a1j(−1)^(1+i) det(N1i) + a1i(−1)^(1+j) det(N1j)

       = Σ′ a1s(−1)^(1+s)[−det(M1s)] + a1j(−1)^(1+i)(−1)^(i−j−1) det(M1j) + a1i(−1)^(1+j)(−1)^(i−j−1) det(M1i)

       = Σ(s = 1 to n) a1s(−1)^(2+s) det(M1s) = −det(A),

where Σ′ denotes the sum over s = 1, . . . , n with s ≠ i or j.

Proof of Theorem 3  Again, the proof is by induction. The case k = 2 was proved in Example 3.

Assuming the result is valid for (k × k) matrices, 2 ≤ k ≤ n − 1, let B be the (n × n) matrix

B = [A1, . . . , As−1, cAs, As+1, . . . , An].

Let M1j and N1j be minor matrices of A and B, respectively, for 1 ≤ j ≤ n.

If j ≠ s, then N1j = M1j except that one column of N1j is multiplied by c. By the induction hypothesis,

det(N1j) = c det(M1j), 1 ≤ j ≤ n, j ≠ s.

Moreover, N1s = M1s. Hence

det(B) = Σ(j ≠ s) a1j(−1)^(1+j) det(N1j) + ca1s(−1)^(1+s) det(N1s)

       = Σ(j ≠ s) a1j(−1)^(1+j) c det(M1j) + ca1s(−1)^(1+s) det(M1s)

       = c Σ(j = 1 to n) a1j(−1)^(1+j) det(M1j) = c det(A).

Proof of Theorem 4  We use induction, where the case k = 2 is done in Example 6. Assuming the result is true for (k × k) matrices, 2 ≤ k ≤ n − 1, let

A = [A1, A2, . . . , An],  B = [A1, . . . , As−1, Bs, As+1, . . . , An],  and
C = [A1, . . . , As−1, Cs, As+1, . . . , An],

where As = Bs + Cs, or

ais = bis + cis, for 1 ≤ i ≤ n.

Let M1j, N1j, and P1j be minor matrices of A, B, and C, respectively, for 1 ≤ j ≤ n. If j ≠ s, then M1j, N1j, and P1j are equal except in one column, which we designate as the rth column. Now the rth columns of N1j and P1j sum to the rth column of M1j. Hence, by the induction hypothesis,

det(M1j) = det(N1j) + det(P1j), 1 ≤ j ≤ n, j ≠ s.

Clearly, if j = s, then M1s = N1s = P1s. Hence

det(B) + det(C) = Σ(j ≠ s) a1j(−1)^(1+j) det(N1j) + b1s(−1)^(1+s) det(N1s)
                  + Σ(j ≠ s) a1j(−1)^(1+j) det(P1j) + c1s(−1)^(1+s) det(P1s)

                = Σ(j ≠ s) a1j(−1)^(1+j)[det(N1j) + det(P1j)] + (b1s + c1s)(−1)^(1+s) det(M1s)

                = Σ(j = 1 to n) a1j(−1)^(1+j) det(M1j) = det(A).

6.3 EXERCISES

In Exercises 1–6, use elementary column operations to create zeros in the last two entries in the first row and then calculate the determinant of the original matrix.

1. [ 1  2  1       2. [ 2  4 −2
     2  0  1            0  2  3
     1 −1  1 ]          1  1  2 ]

3. [ 0  1  2       4. [ 2  2  4
     3  1  2            1  0  1
     2  0  3 ]          2  1  2 ]

5. [ 0  1  3       6. [ 1  1  1
     2  1  2            2  1  2
     1  1  2 ]          3  0  2 ]

Suppose that A = [A1, A2, A3, A4] is a (4 × 4) matrix, where det(A) = 3. In Exercises 7–12, find det(B).

7. B = [2A1, A2, A4, A3]
8. B = [A2, 3A3, A1, −2A4]
9. B = [A1 + 2A2, A2, A3, A4]
10. B = [A1, A1 + 2A2, A3, A4]
11. B = [A1 + 2A2, A2 + 3A3, A3, A4]
12. B = [2A1 − A2, 2A2 − A3, A3, A4]

In Exercises 13–15, use only column interchanges to produce a triangular matrix and then give the determinant of the original matrix.

13. [ 1  0  0  0      14. [ 0  0  2  0      15. [ 0  1  0  0
      2  0  0  3           0  0  1  3           0  2  0  3
      1  1  0  1           0  4  1  3           2  1  0  6
      1  4  2  2 ]         2  1  5  6 ]         3  2  2  4 ]

In Exercises 16–18, use elementary column operations to create zeros in the (1, 2), (1, 3), (1, 4), (2, 3), and (2, 4) positions. Then evaluate the original determinant.

16. | 1  2  0  3 |      17. |  2  4 −2 −2 |      18. | 1  1  2  1 |
    | 2  5  1  1 |          |  1  3  1  2 |          | 0  1  4  1 |
    | 2  0  4  3 |          |  1  3  1  3 |          | 2  1  3  0 |
    | 0  1  6  2 |          | −1  2  1  2 |          | 2  2  1  2 |

19. Use elementary row operations on the determinant in Exercise 16 to create zeros in the (2, 1), (3, 1), (4, 1), (3, 2), and (4, 2) positions. Assuming the column results in this section also hold for rows, give the value of the original determinant to verify that it is the same as in Exercise 16.

20. Repeat Exercise 19, using the determinant in Exercise 17.

21. Repeat Exercise 19, using the determinant in Exercise 18.

22. Find a (2 × 2) matrix A and a (2 × 2) matrix B, where det(A + B) is not equal to det(A) + det(B). Find a different A and B, both nonzero, such that det(A + B) = det(A) + det(B).

23. For any real number a, a ≠ 0, show that

| a + 1  a + 4  a + 7 |        | a   4a  7a |            | a    a^4  a^7 |
| a + 2  a + 5  a + 8 | = 0,   | 2a  5a  8a | = 0,  and  | a^2  a^5  a^8 | = 0.
| a + 3  a + 6  a + 9 |        | 3a  6a  9a |            | a^3  a^6  a^9 |

24. Let A = [A1, A2, A3] be a (3 × 3) matrix and set

B = [ 2   0  0
      3  −1  0
      1   3  4 ].

a) Show that AB = [2A1 + 3A2 + A3, −A2 + 3A3, 4A3].
b) Use column operations to show that det(AB) = −8 det(A).
c) Conclude that det(AB) = det(A) det(B).

25. Let U be an (n × n) upper-triangular matrix and consider the cofactors U1j, 2 ≤ j ≤ n. Show that U1j = 0, 2 ≤ j ≤ n. [Hint: Some column of the corresponding minor matrix M1j is always the zero column.]

26. Use the result of Exercise 25 to prove inductively that det(U) = u11u22 . . . unn, where U = (uij) is an (n × n) upper-triangular matrix.

27. Let y = mx + b be the equation of the line through the points (x1, y1) and (x2, y2) in the plane. Show that the equation is given also by

| x   y   1 |
| x1  y1  1 | = 0.
| x2  y2  1 |

28. Let (x1, y1), (x2, y2), and (x3, y3) be the vertices of a triangle in the plane, where these vertices are numbered counterclockwise. Prove that the area of the triangle is given by

Area = (1/2) | x1  y1  1 |
             | x2  y2  1 |
             | x3  y3  1 |.

29. Let x and y be vectors in R^3, and let A = I + xy^T. Show that det(A) = 1 + y^T x. [Hint: If B = xy^T, B = [B1, B2, B3], then A = [B1 + e1, B2 + e2, B3 + e3]. Therefore, det(A) = det[B1, B2 + e2, B3 + e3] + det[e1, B2 + e2, B3 + e3]. Use Theorems 4 and 5 to show that the first determinant is equal to det[B1, e2, B3 + e3], and so on.]

30. Use column operations to prove that

| 1  a  a^2 |
| 1  b  b^2 | = (b − a)(c − a)(c − b).
| 1  c  c^2 |

31. Evaluate the (4 × 4) determinant

| 1  a  a^2  a^3 |
| 1  b  b^2  b^3 |
| 1  c  c^2  c^3 |
| 1  d  d^2  d^3 |.

[Hint: Proceed as in Exercise 30.]

32. Prove the corollary to Theorem 3.


6.4 CRAMER’S RULE

In Section 6.3, we saw how to calculate the effect that a column operation or a row operation has on a determinant. In this section, we use that information to analyze the relationships between determinants, nonsingular matrices, and solutions of systems Ax = b. We begin with the following lemma, which will be helpful in the proof of the subsequent theorems.

Lemma 1  Let A = [A1, A2, . . . , An] be an (n × n) matrix, and let b be any vector in R^n. For each i, 1 ≤ i ≤ n, let Bi be the (n × n) matrix

Bi = [A1, . . . , Ai−1, b, Ai+1, . . . , An].

If the system of equations Ax = b is consistent and xi is the ith component of a solution, then

xi det(A) = det(Bi).  (1)

Proof  To keep the notation simple, we give the proof of Eq. (1) only for i = 1. Since the system Ax = b is assumed to be consistent, there are values x1, x2, . . . , xn such that

x1A1 + x2A2 + · · · + xnAn = b.

Using the properties of determinants, we have

x1 det(A) = det[x1A1, A2, . . . , An]
          = det[b − x2A2 − · · · − xnAn, A2, . . . , An]
          = det[b, A2, . . . , An] − x2 det[A2, A2, . . . , An] − · · · − xn det[An, A2, . . . , An].

By Theorem 5, the last n − 1 determinants are zero, so we have

x1 det(A) = det[b, A2, . . . , An],

and this equality verifies Eq. (1) for i = 1. Clearly, the same argument is valid for any i.

As the following theorem shows, one consequence of Lemma 1 is that a singular matrix has determinant zero.

Theorem 7  If A is an (n × n) singular matrix, then det(A) = 0.

Proof  Since A is singular, Ax = θ has a nontrivial solution. Let xi be the ith component of a nontrivial solution, and choose i so that xi ≠ 0. By Lemma 1, xi det(A) = det(Bi), where Bi = [A1, . . . , Ai−1, θ, Ai+1, . . . , An]. It follows from Theorem 3 that det(Bi) = 0. Thus, xi det(A) = 0, and since xi ≠ 0, then det(A) = 0.

Theorem 9, stated later, establishes the converse of Theorem 7: If det(A) = 0, then A is a singular matrix. Theorem 9 will be an easy consequence of the product rule for determinants.


The Determinant of a Product

Theorem 8 states that if A and B are (n × n) matrices, then det(AB) = det(A) det(B). This result is somewhat surprising in view of the complexity of matrix multiplication. We also know that, in general, det(A + B) is distinct from det(A) + det(B).

Theorem 8  If A and B are (n × n) matrices, then

det(AB) = det(A) det(B).

Before sketching a proof of Theorem 8, note that if A is an (n × n) matrix, and if B is obtained from A by a sequence of elementary column operations, then, by the properties of determinants given in Theorems 2, 3, and 6, det(A) = k det(B), where the scalar k is completely determined by the elementary column operations. To illustrate, suppose that B is obtained by the following sequence of elementary column operations:

1. Interchange the first and third columns.
2. Multiply the second column by 3.
3. Add 2 times the second column to the first column.

It now follows from Theorems 2, 3, and 6 that det(B) = −3 det(A) or, equivalently, det(A) = (−1/3) det(B). Moreover, the scalar −1/3 is completely determined by the operations; that is, the scalar is independent of the matrices involved.

The proof of Theorem 8 is based on the previous observation and on the following lemma.

Lemma 2  Let A and B be (n × n) matrices and let C = AB. Let C′ denote the result of applying an elementary column operation to C and let B′ denote the result of applying the same column operation to B. Then C′ = AB′.

The proof of Lemma 2 is left to the exercises. The intent of the lemma is given schematically in Fig. 6.1.

[Figure 6.1  Schematic diagram of Lemma 2: applying a column operation to B and then forming AB′ produces the same matrix as forming the product AB and then applying the column operation.]

Lemma 2 tells us that the same result is produced whether we apply a column operation to the product AB or whether we apply the operation to B first (producing B′) and then form the product AB′. For example, suppose that A and B are (3 × 3) matrices. Consider the operation of interchanging column 1 and column 3:

B = [B1, B2, B3] → B′ = [B3, B2, B1];  AB′ = [AB3, AB2, AB1];
AB = [AB1, AB2, AB3] → (AB)′ = [AB3, AB2, AB1];  so (AB)′ = AB′.

Proof of Theorem 8  Suppose that A and B are (n × n) matrices. If B is singular, then Theorem 8 is immediate, for in this case AB is also singular. Thus, by Theorem 7, det(B) = 0 and det(AB) = 0. Consequently, det(AB) = det(A) det(B).

Next, suppose that B is nonsingular. In this case, B can be transformed to the (n × n) identity matrix I by a sequence of elementary column operations. (To see this, note that B^T is nonsingular by Theorem 17, property 4, of Section 1.9. It now follows from Theorem 16 of Section 1.9 that B^T can be reduced to I by a sequence of elementary row operations. But performing row operations on B^T is equivalent to performing column operations on B.) Therefore, det(B) = k det(I) = k, where k is determined entirely by the sequence of elementary column operations. By Lemma 2, the same sequence of operations reduces the matrix AB to the matrix AI = A. Thus, det(AB) = k det(A) = det(B) det(A) = det(A) det(B).

Example 1  Show by direct calculation that det(AB) = det(A) det(B) for the matrices

A = [ 2  1              B = [ −1   3
      1  3 ]  and              2  −2 ].

Solution  We have det(A) = 5 and det(B) = −4. Since AB is given by

AB = [ 0   4
       5  −3 ],

it follows that det(AB) = −20 = (5)(−4) = det(A) det(B).
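The product rule is easy to test numerically for matrices of any size; a short MATLAB experiment (our own snippet):

A = [2 1; 1 3];  B = [-1 3; 2 -2];              % matrices from Example 1
disp([det(A*B), det(A)*det(B)])                  % both print -20
% The identity holds for any square pair, e.g. random (5 x 5) matrices:
A = randn(5);  B = randn(5);
disp(abs(det(A*B) - det(A)*det(B)))              % ~0, up to roundoff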

The following theorem is now an easy consequence of Theorem 8.

Theorem 9  If the (n × n) matrix A is nonsingular, then det(A) ≠ 0. Moreover, det(A−1) = 1/det(A).

Proof  Since A is nonsingular, A−1 exists and AA−1 = I. By Theorem 8, 1 = det(I) = det(AA−1) = det(A) det(A−1). In particular, det(A) ≠ 0 and det(A−1) = 1/det(A).

Theorems 7 and 9 show that an (n × n) matrix A is singular if and only if det(A) = 0. This characterization of singular matrices is especially useful when we want to examine matrices that depend on a parameter. The next example illustrates one such application.

Example 2  Find all values λ such that the matrix B(λ) is singular, where

B(λ) = [ 2 − λ    0      0
           2    3 − λ    4
           1      2    1 − λ ].


Solution  By Theorems 7 and 9, B(λ) is singular if and only if det[B(λ)] = 0. The equation det[B(λ)] = 0 is determined by

0 = det[B(λ)]
  = (2 − λ)[(3 − λ)(1 − λ) − 8]
  = (2 − λ)[λ^2 − 4λ − 5]
  = (2 − λ)(λ − 5)(λ + 1).

Thus, B(λ) is singular if and only if λ is one of the values λ = 2, λ = 5, or λ = −1. The three matrices discovered by solving det[B(λ)] = 0 are listed next. As we can see, each of these matrices is singular:

B(2) = [ 0  0   0        B(5) = [ −3   0   0        B(−1) = [ 3  0  0
         2  1   4                  2  −2   4                  2  4  4
         1  2  −1 ],               1   2  −4 ],               1  2  2 ].
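A quick MATLAB confirmation that the three matrices above are singular (our own snippet):

B = @(lam) [2-lam 0 0; 2 3-lam 4; 1 2 1-lam];    % B as a function of lambda
disp([det(B(2)), det(B(5)), det(B(-1))])          % all three print 0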

Solving Ax = b with Cramer's Rule

A major result in determinant theory is Cramer's rule, which gives a formula for the solution of any system Ax = b when A is nonsingular.

Theorem 10 (Cramer's Rule)  Let A = [A1, A2, . . . , An] be a nonsingular (n × n) matrix, and let b be any vector in R^n. For each i, 1 ≤ i ≤ n, let Bi be the matrix Bi = [A1, . . . , Ai−1, b, Ai+1, . . . , An]. Then the ith component, xi, of the solution of Ax = b is given by

xi = det(Bi)/det(A).  (2)

Proof  Since A is nonsingular, det(A) ≠ 0. Formula (2) is now an immediate consequence of (1) in Lemma 1.

Example 3  Use Cramer's rule to solve the system

3x1 + 2x2 = 4
5x1 + 4x2 = 6.

Solution  To solve this system by Cramer's rule, we write the system as Ax = b, and we form B1 = [b, A2] and B2 = [A1, b]:

A = [ 3  2        B1 = [ 4  2        B2 = [ 3  4
      5  4 ],           6  4 ],            5  6 ].

Note that det(A) = 2, det(B1) = 4, and det(B2) = −2. Thus, from Eq. (2), the solution is

x1 = 4/2 = 2  and  x2 = −2/2 = −1.


Example 4  Use Cramer's rule to solve the system

x1 −  x2 +  x3 = 0
x1 +  x2 − 2x3 = 1
x1 + 2x2 +  x3 = 6.

Solution  Writing the system as Ax = b, we have

A = [ 1 −1  1        B1 = [ 0 −1  1
      1  1 −2              1  1 −2
      1  2  1 ],           6  2  1 ],

B2 = [ 1  0  1        B3 = [ 1 −1  0
       1  1 −2               1  1  1
       1  6  1 ],            1  2  6 ].

A calculation shows that det(A) = 9, det(B1) = 9, det(B2) = 18, and det(B3) = 9. Thus, by Eq. (2), the solution is

x1 = 9/9 = 1,  x2 = 18/9 = 2,  and  x3 = 9/9 = 1.
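Cramer's rule takes only a few lines of MATLAB. In the sketch below (the function name cramer is our own), each component is computed from formula (2); as the text notes next, this is a theoretical tool rather than a practical rival of Gaussian elimination.

function x = cramer(A, b)
% CRAMER  Solve Ax = b by Cramer's rule (Theorem 10). A must be square
% and nonsingular; each component is a ratio of two determinants.
n = size(A, 1);
dA = det(A);                % det(A) is nonzero since A is nonsingular
x = zeros(n, 1);
for i = 1:n
    Bi = A;
    Bi(:, i) = b;           % B_i: replace the ith column of A by b
    x(i) = det(Bi) / dA;    % Eq. (2)
end

For the system of Example 4, cramer([1 -1 1; 1 1 -2; 1 2 1], [0; 1; 6]) returns [1; 2; 1].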

As a computational tool, Cramer's rule is rarely competitive with Gaussian elimination. It is, however, a valuable theoretical tool. Three specific examples illustrating the use of Cramer's rule in theoretical applications are as follows.

1. The method of variation of parameters (see W. E. Boyce and R. C. DiPrima, Elementary Differential Equations and Boundary Value Problems, p. 277. New York: John Wiley and Sons, 1986).

2. The theory of continued fractions (see Peter Henrici, Applied and Computational Complex Analysis, Volume 2, pp. 520–521. New York: John Wiley and Sons, 1977).

3. Characterization of best approximations (see E. W. Cheney, Introduction to Approximation Theory, p. 74. New York: McGraw-Hill, 1966).

CRAMER'S RULE  In 1750, Gabriel Cramer (1704–1752) published a work in which, in the appendix, he stated the determinant procedure named after him for solving n linear equations in n unknowns. The first discoverer of this rule, however, was almost surely the Scottish mathematician Colin Maclaurin (1698–1746). It appeared in a paper of Maclaurin's, published in 1748, two years after his death. This perhaps compensates for the fact that the famous series named after Maclaurin was not first discovered by him. (Ironically, the Maclaurin series is a special case of a Taylor series, named after the English mathematician Brook Taylor. However, as with the Maclaurin series, Taylor was not the first discoverer of the Taylor series!)


6.4 EXERCISES

In Exercises 1–3, use column operations to reduce the given matrix A to lower-triangular form. Find the determinant of A.

1. A = [ 0  1  3       2. A = [ 1  2  1       3. A = [  2  2  4
         1  2  1               2  4  3                 1  3  4
         3  4  1 ]             2  1  3 ]              −1  2  1 ]

In Exercises 4–6, use column operations to reduce the given matrix A to the identity matrix. Find the determinant of A.

4. A = [ 1  0  1       5. A = [ 1  0 −2       6. A = [ 2  2  2
         2  1  1               3  1  3                4  3  4
         1  2  1 ]             0  1  2 ]              2  1  2 ]

7. Let A and B be (3 × 3) matrices such that det(A) = 2 and det(B) = 3. Find the value of each of the following.
a) det(AB)    b) det(AB^2)
c) det(A−1B)    d) det(2A−1)
e) [det(2A)]^(−1)

8. Show that the matrices

[ sin θ  −cos θ            [ sin θ  −cos θ  2
  cos θ   sin θ ]  and       cos θ   sin θ  3
                             0       0      1 ]

are nonsingular for all values of θ.

In Exercises 9–14, find all values λ such that the given matrix B(λ) is singular.

9. B(λ) = [ λ    0          10. B(λ) = [ λ  1        11. B(λ) = [ 2  λ
            3  2 − λ ]                   1  λ ]                   λ  2 ]

12. B(λ) = [ 1  λ  λ^2      13. B(λ) = [ λ  1  1     14. B(λ) = [ 2 − λ  0   3
             1  1  1                     1  λ  1                    2    λ   1
             1  3  9 ]                   1  1  λ ]                  1    0  −λ ]

In Exercises 15–21, use Cramer's rule to solve the given system.

15. x1 + x2 = 3           16. x1 + 3x2 = 4
    x1 − x2 = −1              x1 −  x2 = 0

17. x1 − 2x2 + x3 = −1    18. x1 +  x2 + x3 =  2
    x1       + x3 =  3        x1 + 2x2 + x3 =  2
    x1 − 2x2      =  0        x1 + 3x2 − x3 = −4

19. x1 + x2 + x3 −  x4 = 2
         x2 − x3 +  x4 = 1
              x3 −  x4 = 0
              x3 + 2x4 = 3

20. 2x1 − x2 + x3 = 3     21. x1 + x2 + x3 = a
     x1 + x2      = 3              x2 + x3 = b
          x2 − x3 = 1                   x3 = c

22. Suppose that A is an (n × n) matrix such that A^2 = I. Show that |det(A)| = 1.

23. Prove Lemma 2. [Hint: Let

B = [B1, B2, . . . , Bi, . . . , Bj, . . . , Bn]

and consider the matrix B′ produced by interchanging column i and column j. Also consider the matrix B′ produced by replacing Bi by Bi + aBj.]

24. We know that AB and BA are not usually equal. However, show that if A and B are (n × n), then det(AB) = det(BA).

25. Suppose that S is a nonsingular (n × n) matrix, and suppose that A and B are (n × n) matrices such that SAS−1 = B. Prove that det(A) = det(B).

26. Suppose that A is (n × n) and A^2 = A. What is det(A)?


27. If det(A) = 3, what is det(A^5)?

28. Let A be a nonsingular matrix and suppose that all the entries of both A and A−1 are integers. Prove that det(A) = ±1. [Hint: Use Theorem 9.]

29. Let A and C be square matrices, and let Q be a matrix of the form

Q = [ A  O
      B  C ].

Convince yourself that det(Q) = det(A) det(C). [Hint: Reduce C to lower-triangular form with column operations; then reduce A.]

30. Verify the result in Exercise 29 for the matrix

Q = [ 1  2  0  0  0
      2  1  0  0  0
      3  5  1  2  2
      7  2  3  5  1
      1  8  1  4  1 ].

6.5 APPLICATIONS OF DETERMINANTS: INVERSES AND WRONSKIANS

Now that we have det(AB) = det(A) det(B), we are ready to prove that det(A^T) = det(A) and to establish some other useful properties of determinants. First, however, we need the preliminary result stated in Theorem 11.

Theorem 11  Let A be an (n × n) matrix. Then there is a nonsingular (n × n) matrix Q such that AQ = L, where L is lower triangular. Moreover, det(Q^T) = det(Q).

The proof of Theorem 11 is based on the following fact: The result of any elementary column operation applied to A can be represented in matrix terms as AQi, where Qi is an elementary matrix. We discuss this fact and give the proof of Theorem 11 at the end of this section.

Theorem 11 can be used to prove the following important result.

Theorem 12  If A is an (n × n) matrix, then det(A^T) = det(A).

Proof  By Theorem 11, there is an (n × n) matrix Q such that AQ = L, where L is a lower-triangular matrix. Moreover, Q is nonsingular and det(Q^T) = det(Q). Now, given AQ = L, it follows that

Q^T A^T = L^T.

Applying Theorem 8 to AQ = L and to Q^T A^T = L^T, we obtain

det(A) det(Q) = det(L)
det(Q^T) det(A^T) = det(L^T).

Since L and L^T are triangular matrices with the same diagonal entries, it follows (see Theorem 1 of Section 6.2 and Exercise 26 of Section 6.3) that det(L) = det(L^T). Hence, from the two equalities above, we have

det(A) det(Q) = det(Q^T) det(A^T).

Finally, since det(Q) = det(Q^T) and det(Q) ≠ 0, we see that det(A) = det(A^T).


At this point we know that Theorems 2–6 of Section 6.3 are valid for rows as well as for columns. In particular, we can use row operations to reduce a matrix A to a triangular matrix T and conclude that det(A) = ± det(T).

Example 1 We return to the (4 × 4) matrix A in Example 8 of Section 6.3, where det(A) = −63:
\[
\det(A) = \begin{vmatrix} 1 & 2 & 0 & 2 \\ -1 & 2 & 3 & 1 \\ -3 & 2 & -1 & 0 \\ 2 & -3 & -2 & 1 \end{vmatrix}.
\]
By using row operations, we can reduce det(A) to
\[
\det(A) = \begin{vmatrix} 1 & 2 & 0 & 2 \\ 0 & 4 & 3 & 3 \\ 0 & 8 & -1 & 6 \\ 0 & -7 & -2 & -3 \end{vmatrix}.
\]
Now we switch rows 2 and 3 and then switch columns 2 and 3 in order to get the number −1 into the pivot position. Following this switch, we create zeros in the (2, 3) and (2, 4) positions with row operations; and we find
\[
\det(A) = \begin{vmatrix} 1 & 0 & 2 & 2 \\ 0 & -1 & 8 & 6 \\ 0 & 3 & 4 & 3 \\ 0 & -2 & -7 & -3 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 2 & 2 \\ 0 & -1 & 8 & 6 \\ 0 & 0 & 28 & 21 \\ 0 & 0 & -23 & -15 \end{vmatrix}.
\]
(The sign of the first determinant above is the same as det(A) because the first determinant is the result of two interchanges.) A quick calculation shows that the last determinant has the value −63.

The next theorem shows that we can evaluate det(A) by using an expansion along any row or any column we choose. Computationally, this ability is useful when some row or column contains a number of zero entries.

Theorem 13 Let A = (a_{ij}) be an (n × n) matrix. Then:
\[
\det(A) = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in} \qquad (1)
\]
\[
\det(A) = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}. \qquad (2)
\]

Proof We establish only Eq. (1), which is an expansion of det(A) along the ith row. Expansion of det(A) along the jth column in Eq. (2) is proved the same way.


Form a matrix B from A in the following manner: Interchange row i first with row i − 1 and then with row i − 2; continue until row i is the top row of B. In other words, bring row i to the top and push the other rows down so that they retain their same relative ordering. This procedure requires i − 1 interchanges; so det(A) = (−1)^{i−1} det(B). An inspection shows that the cofactors B_{11}, B_{12}, ..., B_{1n} are related to the cofactors A_{i1}, A_{i2}, ..., A_{in} by B_{1k} = (−1)^{i−1}A_{ik}. To see this relationship, one need only observe that if M is the minor of the (1, k) entry of B, then M is also the minor of the (i, k) entry of A. Therefore, B_{1k} = (−1)^{k+1}M and A_{ik} = (−1)^{i+k}M, which shows that B_{1k} = (−1)^{i−1}A_{ik}. With this equality and Definition 2 of Section 6.2,
\[
\det(B) = b_{11}B_{11} + b_{12}B_{12} + \cdots + b_{1n}B_{1n} = a_{i1}B_{11} + a_{i2}B_{12} + \cdots + a_{in}B_{1n} = (-1)^{i-1}(a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}).
\]
Since det(A) = (−1)^{i−1} det(B), formula (1) is proved.

The Adjoint Matrix and the Inverse

We next show how determinants can be used to obtain a formula for the inverse of a nonsingular matrix. We first prove a lemma, which is similar in appearance to Theorem 13. In words, the lemma states that the sum of the products of entries from the ith row with cofactors from the kth row is zero when i ≠ k (and by Theorem 13 this sum is the determinant when i = k).

Lemma If A is an (n × n) matrix and if i ≠ k, then a_{i1}A_{k1} + a_{i2}A_{k2} + · · · + a_{in}A_{kn} = 0.

Proof For i and k given, i ≠ k, let B be the (n × n) matrix obtained from A by deleting the kth row of A and replacing it with the ith row of A; that is, B has two equal rows, the ith and kth, and B is the same as A for all rows but the kth.

In this event it is clear that det(B) = 0, that the cofactor B_{kj} is equal to A_{kj}, and that the entry b_{kj} is equal to a_{ij}. Putting these together gives
\[
0 = \det(B) = b_{k1}B_{k1} + b_{k2}B_{k2} + \cdots + b_{kn}B_{kn} = a_{i1}A_{k1} + a_{i2}A_{k2} + \cdots + a_{in}A_{kn};
\]
thus the lemma is proved.

This lemma can be used to derive a formula for A^{-1}. In particular, let A be an (n × n) matrix, and let C denote the matrix of cofactors; C = (c_{ij}) is (n × n), and c_{ij} = A_{ij}. The adjoint matrix of A, denoted Adj(A), is equal to C^T. With these preliminaries, we prove Theorem 14.

Theorem 14 If A is an (n × n) nonsingular matrix, then
\[
A^{-1} = \frac{1}{\det(A)}\,\mathrm{Adj}(A).
\]

Proof Let B = (b_{ij}) be the matrix product of A and Adj(A). Then the ijth entry of B is
\[
b_{ij} = a_{i1}A_{j1} + a_{i2}A_{j2} + \cdots + a_{in}A_{jn},
\]


and by the lemma and Theorem 13, b_{ij} = 0 when i ≠ j, while b_{ii} = det(A). Therefore, B is equal to det(A) times I, and the theorem is proved.

Example 2 Let A be the matrix
\[
A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & -3 \\ 4 & 1 & 1 \end{bmatrix}.
\]
We calculate the nine required cofactors and find
\[
\begin{array}{lll} A_{11} = 4, & A_{12} = -14, & A_{13} = -2, \\ A_{21} = 3, & A_{22} = -7, & A_{23} = -5, \\ A_{31} = 1, & A_{32} = 7, & A_{33} = 3. \end{array}
\]
The adjoint matrix (the transpose of the cofactor matrix) is
\[
\mathrm{Adj}(A) = \begin{bmatrix} 4 & 3 & 1 \\ -14 & -7 & 7 \\ -2 & -5 & 3 \end{bmatrix}.
\]
A multiplication shows that the product of A and Adj(A) is
\[
\begin{bmatrix} 14 & 0 & 0 \\ 0 & 14 & 0 \\ 0 & 0 & 14 \end{bmatrix};
\]
so A^{-1} = (1/14) Adj(A), where of course det(A) = 14.
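For readers working toward the MATLAB exercises later in this chapter, the computation in Example 2 is easy to check numerically. The following fragment is a minimal sketch (the cofactor loop is one possible implementation written for this example, not a built-in MATLAB routine):

A = [1 -1 2; 2 1 -3; 4 1 1];
n = size(A,1);
C = zeros(n);                    % matrix of cofactors
for i = 1:n
    for j = 1:n
        M = A;                   % form the minor matrix by deleting row i, column j
        M(i,:) = [];
        M(:,j) = [];
        C(i,j) = (-1)^(i+j) * det(M);
    end
end
adjA = C';                       % Adj(A) is the transpose of the cofactor matrix
disp(A * adjA / det(A))          % Theorem 14: should display the identity matrix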

Theorem 14 is especially useful when we need to calculate the inverse of a matrix that contains variables. For instance, consider the (3 × 3) matrix
\[
A = \begin{bmatrix} a & 1 & b \\ 1 & 1 & 1 \\ b & 1 & a \end{bmatrix}. \qquad (3)
\]
Although A has some variable entries, we can calculate det(A) and Adj(A) and hence find A^{-1}.

Example 3 Let A be the (3 × 3) matrix displayed in (3). Find A^{-1}.

Solution Although we can do this calculation by hand, it is more convenient to use a computer algebra system. We used Derive and found A^{-1} as shown in Fig. 6.2.

Figure 6.2 Using Derive to find the inverse of a matrix with variable entries, as in Example 3. The output is
\[
A^{-1} = \begin{bmatrix}
\dfrac{a-1}{(a+b-2)(a-b)} & -\dfrac{1}{a+b-2} & \dfrac{1-b}{(a+b-2)(a-b)} \\[2mm]
-\dfrac{1}{a+b-2} & \dfrac{a+b}{a+b-2} & -\dfrac{1}{a+b-2} \\[2mm]
\dfrac{1-b}{(a+b-2)(a-b)} & -\dfrac{1}{a+b-2} & \dfrac{a-1}{(a+b-2)(a-b)}
\end{bmatrix}.
\]

The Wronskian

As a final application of determinant theory, we develop a simple test for the linear independence of a set of functions. Suppose that f_0(x), f_1(x), ..., f_n(x) are real-valued functions defined on an interval [a, b]. If there exist scalars a_0, a_1, ..., a_n (not all of which are zero) such that
\[
a_0f_0(x) + a_1f_1(x) + \cdots + a_nf_n(x) = 0 \qquad (4)
\]
for all x in [a, b], then {f_0(x), f_1(x), ..., f_n(x)} is a linearly dependent set of functions (see Section 5.4). If the only scalars for which Eq. (4) holds for all x in [a, b] are a_0 = a_1 = · · · = a_n = 0, then the set is linearly independent.

A test for linear independence can be formulated from Eq. (4) as follows: If a_0, a_1, ..., a_n are scalars satisfying Eq. (4) and if the functions f_i(x) are sufficiently differentiable, then we can differentiate both sides of the identity (4) and obtain a_0f_0^{(i)}(x) + a_1f_1^{(i)}(x) + · · · + a_nf_n^{(i)}(x) = 0, 1 ≤ i ≤ n. In matrix terms, these equations are
\[
\begin{bmatrix}
f_0(x) & f_1(x) & \cdots & f_n(x) \\
f_0'(x) & f_1'(x) & \cdots & f_n'(x) \\
\vdots & & & \vdots \\
f_0^{(n)}(x) & f_1^{(n)}(x) & \cdots & f_n^{(n)}(x)
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]
If we denote the coefficient matrix above as W(x), then det[W(x)] is called the Wronskian for {f_0(x), f_1(x), ..., f_n(x)}. If there is a point x_0 in [a, b] such that det[W(x_0)] ≠ 0, then the matrix W(x) is nonsingular at x = x_0, and the implication is that a_0 = a_1 = · · · = a_n = 0. In summary, if the Wronskian is nonzero at any point in [a, b], then {f_0(x), f_1(x), ..., f_n(x)} is a linearly independent set of functions. Note, however, that det[W(x)] = 0 for all x in [a, b] does not imply linear dependence (see Example 4).
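A Wronskian is also easy to evaluate numerically at a chosen point. The following MATLAB sketch tests the set {x, cos x, sin x} (examined in Example 4 below) at the sample point x0 = 1, a value chosen here only for illustration:

x0 = 1;
W = [x0, cos(x0),  sin(x0);      % the functions
      1, -sin(x0),  cos(x0);     % their first derivatives
      0, -cos(x0), -sin(x0)];    % their second derivatives
det(W)                           % equals x0 = 1; nonzero, so the set is linearly independent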


WRONSKIANS Wronskians are named after the Polish mathematician Josef Maria Hoëné-Wronski (1778–1853). Unfortunately, the violent character of his personal life often detracted from the respect he was due for his mathematical work. The Wronskian provides a partial test for linear independence. If the Wronskian is nonzero for some x_0 in [a, b], then f_0(x), f_1(x), ..., f_n(x) are linearly independent (see the first part of Example 4). If the Wronskian is zero for all x in [a, b], then the test gives no information (see the second part of Example 4).

The Wronskian does provide a complete test for linear independence, however, when f_0(x), f_1(x), ..., f_n(x) are solutions of an (n + 1)st-order linear differential equation of the form
\[
y^{(n+1)} + g_n(x)y^{(n)} + \cdots + g_1(x)y' + g_0(x)y = 0,
\]
where g_0(x), g_1(x), ..., g_n(x) are all continuous on (a, b). In this case, f_0(x), f_1(x), ..., f_n(x) are linearly independent if and only if the Wronskian is never zero for any x in (a, b).

Example 4 Let F_1 = {x, cos x, sin x} and F_2 = {sin² x, |sin x| sin x} for −1 ≤ x ≤ 1. The respective Wronskians are
\[
w_1(x) = \begin{vmatrix} x & \cos x & \sin x \\ 1 & -\sin x & \cos x \\ 0 & -\cos x & -\sin x \end{vmatrix} = x
\]
and
\[
w_2(x) = \begin{vmatrix} \sin^2 x & |\sin x|\sin x \\ \sin 2x & |\sin 2x| \end{vmatrix} = 0.
\]
Since w_1(x) ≠ 0 for x ≠ 0, F_1 is linearly independent. Even though w_2(x) = 0 for all x in [−1, 1], F_2 is also linearly independent, for if a_1 sin² x + a_2 |sin x| sin x = 0, then at x = 1, a_1 + a_2 = 0; and at x = −1, a_1 − a_2 = 0; so a_1 = a_2 = 0.

Elementary Matrices (Optional)

In this subsection, we observe that the result of applying a sequence of elementary column operations to a matrix A can be represented in matrix terms as multiplication of A by a sequence of elementary matrices. In particular, let I denote the (n × n) identity matrix, and let E be the matrix that results when an elementary column operation is applied to I. Such a matrix E is called an elementary matrix.

For example, consider the (3 × 3) matrices
\[
E_1 = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad E_2 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
As we can see, E_1 is obtained from I by adding 3 times the first column of I to the third column of I. Similarly, E_2 is obtained from I by interchanging the first and second columns of I. Thus E_1 and E_2 are specific examples of (3 × 3) elementary matrices.


The next theorem shows how elementary matrices can be used to represent elementary column operations as matrix products.

Theorem 15 Let E be the (n × n) elementary matrix that results from performing a certain column operation on the (n × n) identity. If A is any (n × n) matrix, then AE is the matrix that results when this same column operation is performed on A.

Proof We prove Theorem 15 only for the case in which the column operation is to add c times column i to column j. The rest of the proof is left to the exercises.

Let E denote the elementary matrix derived by adding c times the ith column of I to the jth column of I. Since I is given by I = [e_1, e_2, ..., e_i, ..., e_j, ..., e_n], we can represent the elementary matrix E in column form as
\[
E = [\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_i, \ldots, \mathbf{e}_j + c\mathbf{e}_i, \ldots, \mathbf{e}_n].
\]
Consequently, in column form, AE is the matrix
\[
AE = [A\mathbf{e}_1, A\mathbf{e}_2, \ldots, A\mathbf{e}_i, \ldots, A(\mathbf{e}_j + c\mathbf{e}_i), \ldots, A\mathbf{e}_n].
\]
Next, if A = [A_1, A_2, ..., A_n], then Ae_k = A_k, 1 ≤ k ≤ n. Therefore, AE has the form
\[
AE = [\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_i, \ldots, \mathbf{A}_j + c\mathbf{A}_i, \ldots, \mathbf{A}_n].
\]
From this column representation for AE, it follows that AE is the matrix that results when c times column i of A is added to column j.

We now use Theorem 15 to prove Theorem 11. Let A be an (n × n) matrix. Then A can be reduced to a lower-triangular matrix L by using a sequence of column operations. Equivalently, by Theorem 15, there is a sequence of elementary matrices E_1, E_2, ..., E_r such that
\[
AE_1E_2\cdots E_r = L. \qquad (5)
\]
In Eq. (5), an elementary matrix E_k represents either a column interchange or the addition of a multiple of one column to another. It can be shown that:

(a) If E_k represents a column interchange, then E_k is symmetric.
(b) If E_k represents the addition of a multiple of column i to column j, where i < j, then E_k is an upper-triangular matrix with all main diagonal entries equal to 1.

Now in Eq. (5), let Q denote the matrix Q = E_1E_2 · · · E_r and observe that Q is nonsingular because each E_k is nonsingular. To complete the proof of Theorem 11, we need to verify that det(Q^T) = det(Q).

From the remarks in (a) and (b) above, det(E_k^T) = det(E_k), 1 ≤ k ≤ r, since each matrix E_k is either symmetric or triangular. Thus
\[
\det(Q^T) = \det(E_r^T \cdots E_2^T E_1^T) = \det(E_r^T) \cdots \det(E_2^T)\det(E_1^T) = \det(E_r) \cdots \det(E_2)\det(E_1) = \det(Q).
\]

An illustration of the discussion above is provided by the next example.


Example 5 Let A be the (3 × 3) matrix
\[
A = \begin{bmatrix} 0 & 1 & 3 \\ 1 & 2 & 1 \\ 3 & 4 & 2 \end{bmatrix}.
\]
Display elementary matrices E_1, E_2, and E_3 such that AE_1E_2E_3 = L, where L is lower triangular.

Solution Matrix A can be reduced to a lower-triangular matrix by the following sequence of column operations:
\[
A = \begin{bmatrix} 0 & 1 & 3 \\ 1 & 2 & 1 \\ 3 & 4 & 2 \end{bmatrix}
\xrightarrow{C_1 \leftrightarrow C_2}
\begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 1 \\ 4 & 3 & 2 \end{bmatrix}
\xrightarrow{C_3 - 3C_1}
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & -5 \\ 4 & 3 & -10 \end{bmatrix}
\xrightarrow{C_3 + 5C_2}
\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 5 \end{bmatrix}.
\]
Therefore, AE_1E_2E_3 = L, where
\[
E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad\text{and}\quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Note that E_1 is symmetric and E_2 and E_3 are upper triangular.
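The computation in Example 5 can be verified in a few lines of MATLAB (a quick check, not one of the section's exercises):

A  = [0 1 3; 1 2 1; 3 4 2];
E1 = [0 1 0; 1 0 0; 0 0 1];
E2 = [1 0 -3; 0 1 0; 0 0 1];
E3 = [1 0 0; 0 1 5; 0 0 1];
L  = A*E1*E2*E3                  % lower triangular: [1 0 0; 2 1 0; 4 3 5]
Q  = E1*E2*E3;
det(Q) - det(Q')                 % zero, illustrating det(Q) = det(Q^T) in Theorem 11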

6.5 EXERCISES

In Exercises 1–4, use row operations to reduce the given determinant to upper-triangular form and determine the value of the original determinant.

1. $\begin{vmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ -1 & 4 & 1 \end{vmatrix}$

2. $\begin{vmatrix} 0 & 3 & 1 \\ 1 & 2 & 1 \\ 2 & -2 & 2 \end{vmatrix}$

3. $\begin{vmatrix} 0 & 1 & 3 \\ 1 & 2 & 2 \\ 3 & 1 & 0 \end{vmatrix}$

4. $\begin{vmatrix} 1 & 0 & 1 \\ 0 & 2 & 4 \\ 3 & 2 & 1 \end{vmatrix}$

In Exercises 5–10, find the adjoint matrix for the given matrix A. Next, use Theorem 14 to calculate the inverse of the given matrix.

5. $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$

6. $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$

7. $\begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 2 \\ 1 & 1 & 2 \end{bmatrix}$

8. $\begin{bmatrix} 2 & 1 & 0 \\ 3 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$

9. $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 3 & 1 \end{bmatrix}$

10. $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$

In Exercises 11–16, calculate the Wronskian. Also, determine whether the given set of functions is linearly independent on the interval [−1, 1].

11. {1, x, x²}
12. {e^x, e^{2x}, e^{3x}}
13. {1, cos² x, sin² x}
14. {1, cos x, cos 2x}
15. {x², x|x|}
16. {x², 1 + x², 2 − x²}


In Exercises 17–20, find elementary matrices E_1, E_2, and E_3 such that AE_1E_2E_3 = L, where L is lower triangular. Calculate the product Q = E_1E_2E_3 and verify that AQ = L and det(Q) = det(Q^T).

17. $A = \begin{bmatrix} 0 & 1 & 3 \\ 1 & 2 & 4 \\ 2 & 2 & 1 \end{bmatrix}$

18. $A = \begin{bmatrix} 0 & -1 & 2 \\ 1 & 3 & -1 \\ 1 & 2 & 1 \end{bmatrix}$

19. $A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 5 & 1 \\ 4 & 0 & 2 \end{bmatrix}$

20. $A = \begin{bmatrix} 2 & 4 & -6 \\ 1 & 1 & 1 \\ 3 & 2 & 1 \end{bmatrix}$

In Exercises 21–24, calculate det[A(x)] and show that the given matrix A(x) is nonsingular for any real value of x. Use Theorem 14 to find an expression for the inverse of A(x).

21. $A(x) = \begin{bmatrix} x & 1 \\ -1 & x \end{bmatrix}$

22. $A(x) = \begin{bmatrix} 1 & x \\ -x & 2 \end{bmatrix}$

23. $A(x) = \begin{bmatrix} 2 & x & 0 \\ -x & 2 & x \\ 0 & -x & 2 \end{bmatrix}$

24. $A(x) = \begin{bmatrix} \sin x & 0 & \cos x \\ 0 & 1 & 0 \\ -\cos x & 0 & \sin x \end{bmatrix}$

25. Let L and U be the (3 × 3) matrices
\[
L = \begin{bmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix}.
\]
Use Theorem 14 to show that L^{-1} is lower triangular and U^{-1} is upper triangular.

26. Let L be a nonsingular (4 × 4) lower-triangular matrix. Show that L^{-1} is also a lower-triangular matrix. [Hint: Consider a variation of Exercise 25.]

27. Let A be an (n × n) matrix, where det(A) = 1 and A contains only integer entries. Show that A^{-1} contains only integer entries.

28. Let E denote the (n × n) elementary matrix corresponding to an interchange of the ith and jth columns of I. Let A be any (n × n) matrix.
a) Show that the matrix AE is equal to the result of interchanging columns i and j of A.
b) Show that the matrix E is symmetric.

29. An (n × n) matrix A is called skew symmetric if A^T = −A. Show that if A is skew symmetric, then det(A) = (−1)^n det(A). If n is odd, show that A must be singular.

30. An (n × n) real matrix is orthogonal provided that A^T = A^{-1}. If A is an orthogonal matrix, prove that det(A) = ±1.

31. Let A be an (n × n) nonsingular matrix. Prove that det[Adj(A)] = [det(A)]^{n−1}. [Hint: Use Theorem 14.]

32. Let A be an (n × n) nonsingular matrix.
a) Show that
\[
[\mathrm{Adj}(A)]^{-1} = \frac{1}{\det(A)}A.
\]
[Hint: Use Theorem 14.]
b) Show that
\[
\mathrm{Adj}(A^{-1}) = \frac{1}{\det(A)}A.
\]
[Hint: Use Theorem 14 to obtain a formula for (A^{-1})^{-1}.]

SUPPLEMENTARY EXERCISES

1. Express
\[
\begin{vmatrix} a_{11} + b_{11} & a_{12} + b_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{vmatrix}
\]
as a sum of four determinants in which there are no sums in the entries.

2. Let A = [A_1, A_2, ..., A_n] be an (n × n) matrix and let B = [A_n, A_{n−1}, ..., A_1]. How are det(A) and det(B) related when n is odd? When n is even?

3. If A is an (n × n) matrix such that A³ = A, then list all possible values for det(A).

4. If A is a nonsingular (2 × 2) matrix and c is a scalar such that A^T = cA, what are the possible values


for c? If A is a nonsingular (3 × 3) matrix, what are the possible values for c?

5. Let A = (a_{ij}) be a (3 × 3) matrix such that det(A) = 2, and let A_{ij} denote the ijth cofactor of A. If
\[
B = \begin{bmatrix} A_{31} & A_{21} & A_{11} \\ A_{32} & A_{22} & A_{12} \\ A_{33} & A_{23} & A_{13} \end{bmatrix},
\]
then calculate AB.

6. Let A = (a_{ij}) be a (3 × 3) matrix with a_{11} = 1, a_{12} = 2, and a_{13} = −1. Let
\[
C = \begin{bmatrix} -7 & 5 & 4 \\ -4 & 3 & 2 \\ 9 & -7 & -5 \end{bmatrix}
\]
be the matrix of cofactors for A. (That is, A_{11} = −7, A_{12} = 5, and so on.) Find A.

7. Let b = [b_1, b_2, ..., b_n]^T.
a) For 1 ≤ i ≤ n, let A_i be the (n × n) matrix A_i = [e_1, ..., e_{i−1}, b, e_{i+1}, ..., e_n]. Apply Cramer's rule to the system I_n x = b to show that det(A_i) = b_i.
b) If B is the (n × n) matrix B = [b, ..., b], then use part a) and Theorem 4 to determine a formula for det(B + I).

8. If the Wronskian for {f_0(x), f_1(x), f_2(x)} is (x² + 1)e^x, then calculate the Wronskian for {xf_0(x), xf_1(x), xf_2(x)}.

CONCEPTUAL EXERCISES

In Exercises 1–8, answer true or false. Justify your answer by providing a counterexample if the statement is false or an outline of a proof if the statement is true.

1. If A, B, and C are (n × n) matrices such that AB = AC and det(A) ≠ 0, then B = C.
2. If A and B are (n × n) matrices, then det(AB) = det(BA).
3. If A is an (n × n) matrix and c is a scalar, then det(cI_n − A) = c^n − det(A).
4. If A is an (n × n) matrix and c is a scalar, then det(cA) = c det(A).
5. If A is an (n × n) matrix such that A^k = O for some positive integer k, then det(A) = 0.
6. If A_1, A_2, ..., A_m are (n × n) matrices such that B = A_1A_2 · · · A_m is nonsingular, then each A_i is nonsingular.
7. If the matrix A is symmetric, then so is Adj(A).
8. If A is an (n × n) matrix such that det(A) = 1, then Adj[Adj(A)] = A.

In Exercises 9–15, give a brief answer.

9. Show that A² + I = O is not possible if A is an (n × n) matrix and n is odd.
10. Let A and B be (n × n) matrices such that AB = I. Prove that BA = I. [Hint: Show that det(A) ≠ 0 and conclude that A^{-1} exists.]
11. If A is an (n × n) matrix and c is a scalar, show that det(A^T − cI) = det(A − cI).
12. Let A and B be (n × n) matrices such that B is nonsingular, and let c be a scalar.
a) Show that det(A − cI) = det(B^{-1}AB − cI).
b) Show that det(AB − cI) = det(BA − cI).
13. If A is a nonsingular (n × n) matrix, then prove that Adj(A) is also nonsingular. [Hint: Consider the product A[Adj(A)].]
14. a) If A and B are nonzero (n × n) matrices such that AB = O, then prove that both A and B are singular. [Hint: What would you conclude if either A or B were nonsingular?]
b) Use part a) to prove that if A is a singular (n × n) matrix, then Adj(A) is also a singular matrix. [Hint: Consider the product A[Adj(A)].]
15. If A = (a_{ij}) is an (n × n) orthogonal matrix (that is, A^T = A^{-1}), then prove that A_{ij} = a_{ij} det(A), where A_{ij} is the ijth cofactor of A. [Hint: Express A^{-1} in terms of Adj(A).]


MATLAB EXERCISES

Exercises 1–6 will illustrate some properties of the determinant and help you sharpen your skills using MATLAB to manipulate matrices and perform matrix surgery. These exercises also reinforce the theoretical properties of the determinant that you learned in Chapter 6.

1. Use the command A = round(20*rand(5,5) - 10*ones(5,5)) to generate a random (5 × 5) matrix A having integer entries selected from [−10, 10]. Use Definition 3 to calculate det(A), using the MATLAB det command to calculate the five cofactors A_{11}, A_{12}, ..., A_{15}. Use matrix surgery to create the five minor matrices M_{ij} (recall that the minor matrix is defined in Definition 2). Compare your result with the value of the determinant of A as calculated by the MATLAB command det(A).

2. Use matrix A from Exercise 1 (or a similarly randomly generated matrix) to illustrate Theorems 2 and 3 and the corollary to Theorem 3.

3. As in Exercise 2, use a randomly generated (5 × 5) matrix to illustrate Theorems 4, 5, and 6.

4. As in Exercise 2, use a randomly generated (5 × 5) matrix to illustrate Theorem 12.

5. As in Exercise 2, use a randomly generated (5 × 5) matrix and a randomly generated vector b to illustrate Cramer's Rule (Theorem 10).

6. As in Exercise 2, use a randomly generated (5 × 5) matrix A and a randomly generated (5 × 5) matrix B to illustrate Theorem 8.

7. How common are singular matrices? Because of the emphasis on singular matrices in matrix theory, it might seem that they are quite common. In this exercise, randomly generate 100 matrices, calculate the determinant of each, and then make a rough assessment as to how likely encountering a singular matrix would be.

The following MATLAB loop will generate the determinant values for 100 randomly chosen matrices:

determ = zeros(1,100);
for i = 1:100
    A = round(20*rand(5,5) - 10*ones(5,5));
    determ(1,i) = det(A);
end

After executing this loop, list the vector determ to display the 100 determinant values calculated. Are any of the 100 matrices singular? Repeat the experiment using 1000 randomly generated matrices instead of 100. Rather than listing the vector determ, use the command min(abs(determ)) to find the smallest determinant in absolute value. Did you encounter any singular matrices?

8. Generating integer matrices with integer inverses For certain simulations, it is convenient to have a collection of randomly generated matrices that have integer entries and


whose inverses also have integer entries. Argue, using Theorem 14, that an integer matrix with determinant equal to 1 or −1 will have an integer inverse.

One easy way to create an integer matrix A with determinant equal to 1 or −1 is to set A = LU, where L is a lower-triangular integer matrix with 1's and −1's on its diagonal and where U is an upper-triangular integer matrix with 1's and −1's on its diagonal. Then, since det(A) = det(L) det(U) = ±1, we see that both A and A^{-1} will be integer matrices.

Use these ideas to create a set of ten randomly generated (5 × 5) integer matrices with integer inverses. For each matrix A created, use the MATLAB inv command to generate the inverse for A. Note, because of roundoff error, that the MATLAB inverse for A is not always an integer matrix. To eliminate the roundoff error, you can use the command round(inv(A)) in order to round the entries of A^{-1} to the nearest integer. Check, by direct multiplication, that this will produce the inverse.
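One possible MATLAB sketch of this construction follows; the range [−5, 5] for the off-diagonal integers is an arbitrary choice made here for illustration:

n = 5;
L = tril(round(10*rand(n) - 5), -1) + diag(2*round(rand(n,1)) - 1);
U = triu(round(10*rand(n) - 5),  1) + diag(2*round(rand(n,1)) - 1);
A = L * U;                       % det(A) = det(L)*det(U) = 1 or -1
Ainv = round(inv(A));            % round away the floating-point error
disp(A * Ainv)                   % should display the (5 x 5) identity matrix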


7 Eigenvalues and Applications

Overview In this chapter we discuss a number of applications of eigenvalues. Sections 7.1 and 7.2 are independent and can be covered at any time. Section 7.3 is a prerequisite for Section 7.4, and Section 7.5 is a prerequisite for Section 7.6. Sections 7.3 and 7.5 are, however, independent. Sections 7.7 and 7.8 depend on Section 7.4.

Core Sections
7.1 Quadratic Forms
7.2 Systems of Differential Equations
7.3 Transformation to Hessenberg Form
7.4 Eigenvalues of Hessenberg Matrices
7.5 Householder Transformations
7.6 The QR Factorization and Least-Squares Solutions
7.7 Matrix Polynomials and the Cayley–Hamilton Theorem
7.8 Generalized Eigenvectors and Solutions of Systems of Differential Equations


7.1 QUADRATIC FORMS∗

An expression of the sort
\[
q(x, y) = ax^2 + bxy + cy^2
\]
is called a quadratic form in x and y. Similarly, the expression
\[
q(x, y, z) = ax^2 + by^2 + cz^2 + dxy + exz + fyz
\]
is a quadratic form in the variables x, y, and z.

In general, a quadratic form in the variables x_1, x_2, ..., x_n is an expression of the form
\[
q(\mathbf{x}) = q(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}x_ix_j. \qquad (1)
\]
In Eq. (1), the coefficients b_{ij} are given constants and, for simplicity, we assume that the b_{ij} are real constants.

The term form means homogeneous polynomial; that is, q(ax) = a^k q(x) for some fixed k. The adjective quadratic implies that the form is homogeneous of degree 2; that is, q(ax) = a²q(x). Quadratic forms occur naturally in applications such as mechanics, vibrations, geometry, optimization, and so on.

Matrix Representations for Quadratic Forms

As we see in Eq. (1), a quadratic form is nothing more than a polynomial in several variables, where each term of the polynomial has degree 2 exactly. It turns out that such polynomials can be represented in the form q(x) = x^T Ax, where A is a uniquely determined symmetric matrix. For example, consider the quadratic form
\[
q(x, y) = 2x^2 + 4xy - 3y^2.
\]
Using the properties of matrix multiplication, we can verify that
\[
q(x, y) = [x, y]\begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}.
\]

There is a simple procedure for finding the symmetric matrix A = (a_{ij}) such that q(x) = x^T Ax. In particular, consider the general quadratic form given in Eq. (1). The procedure, as applied to (1), is simply:

1. Define a_{ii} = b_{ii}, 1 ≤ i ≤ n.
2. Define a_{ij} = (b_{ij} + b_{ji})/2, 1 ≤ i, j ≤ n, i ≠ j.

When these steps are followed, the (n × n) matrix A = (a_{ij}) will be symmetric and will satisfy the condition q(x) = x^T Ax.
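In computational terms, the two-step procedure amounts to averaging a coefficient matrix with its transpose. A small MATLAB sketch, using the form q(x, y) = 2x² + 4xy − 3y² discussed above:

B = [2 4; 0 -3];        % one (nonsymmetric) matrix with x'*B*x = q(x)
A = (B + B')/2          % the unique symmetric representative: [2 2; 2 -3]
x = [1; 2];             % an arbitrary test vector
x'*A*x - x'*B*x         % zero: both matrices represent the same quadratic form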

Example 1 Represent the quadratic form in Eq. (2) as q(x) = x^T Ax:
\[
q(\mathbf{x}) = x_1^2 + x_2^2 + 3x_3^2 + 6x_1x_2 + 4x_1x_3 - 10x_2x_3. \qquad (2)
\]

∗The sections in this chapter need not be read in the order they are presented; see the Overview for details.


Solution In the context of Eq. (1), the constant b_{ij} is the coefficient of the term x_ix_j. With respect to Eq. (2), b_{11} = b_{22} = 1, b_{33} = 3, b_{12} = 6, b_{13} = 4, b_{23} = −10, and the other coefficients b_{ij} are zero. Following the simple two-step procedure, we obtain
\[
A = \begin{bmatrix} 1 & 3 & 2 \\ 3 & 1 & -5 \\ 2 & -5 & 3 \end{bmatrix}.
\]
A quick check shows that x^T Ax = q(x):
\[
[x_1, x_2, x_3]\begin{bmatrix} 1 & 3 & 2 \\ 3 & 1 & -5 \\ 2 & -5 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1^2 + x_2^2 + 3x_3^2 + 6x_1x_2 + 4x_1x_3 - 10x_2x_3.
\]

The procedure used in Example 1 is stated formally in the next theorem. Also, for brevity in Theorem 1, we have combined steps (1) and (2) using an equivalent formulation.

Theorem 1 For x in R^n, let q(x) denote the quadratic form
\[
q(\mathbf{x}) = \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}x_ix_j.
\]
Let A = (a_{ij}) be the (n × n) matrix defined by
\[
a_{ij} = (b_{ij} + b_{ji})/2, \quad 1 \le i, j \le n.
\]
The matrix A is symmetric and, moreover, q(x) = x^T Ax. In addition, there is no other symmetric matrix B such that x^T Bx = q(x).

Proof The fact that A is symmetric comes from the expression that defines a_{ij}. In particular, observe that a_{ij} and a_{ji} are given, respectively, by
\[
a_{ij} = \frac{b_{ij} + b_{ji}}{2} \quad\text{and}\quad a_{ji} = \frac{b_{ji} + b_{ij}}{2}.
\]
Thus, a_{ij} = a_{ji}, which shows that A is symmetric. Also note that a_{ii} = (b_{ii} + b_{ii})/2 = b_{ii}, which agrees with step (1) of the previously given two-step procedure.

The rest of the proof is left to the exercises.

In Theorem 1, if we relax the condition that A be symmetric, then we no longer have uniqueness. That is, there are many nonsymmetric matrices B such that x^T Bx = q(x).

Diagonalizing Quadratic Forms

Theorem 1 shows that a quadratic form q(x) can be represented as q(x) = x^T Ax, where A is a symmetric matrix. For many applications, however, it is useful to have the even simpler representation described in this subsection.

Recall that a real symmetric matrix A can be diagonalized with an orthogonal matrix. That is, there is a square matrix Q such that:


1. Q^T Q = I.
2. Q^T AQ = D, where D is diagonal.
3. The diagonal entries of D are the eigenvalues of A.

Now consider the quadratic form q(x) = x^T Ax, where A is (n × n) and symmetric. If we make the substitution x = Qy, we obtain
\[
q(\mathbf{x}) = q(Q\mathbf{y}) = (Q\mathbf{y})^T A(Q\mathbf{y}) = \mathbf{y}^T Q^T AQ\mathbf{y} = \mathbf{y}^T D\mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2. \qquad (3)
\]
The representation in Eq. (3) gives some qualitative information about q(x). For instance, suppose that the matrix A has only positive eigenvalues. In this case, if q(x) is evaluated at some specific vector x* in R^n, x* ≠ θ, then q(x*) will always be a positive number.

Example 2 Find the substitution x = Qy that diagonalizes the quadratic form
\[
q(\mathbf{x}) = q(r, s) = r^2 + 4rs - 2s^2.
\]
Solution Following Theorem 1, we first represent q(x) as q(x) = x^T Ax:
\[
q(\mathbf{x}) = [r, s]\begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} r \\ s \end{bmatrix}.
\]
For the preceding (2 × 2) matrix A, the eigenvalues and eigenvectors are (for a and b nonzero)
\[
\lambda = 2,\ \mathbf{w}_1 = a\begin{bmatrix} 2 \\ 1 \end{bmatrix} \quad\text{and}\quad \lambda = -3,\ \mathbf{w}_2 = b\begin{bmatrix} 1 \\ -2 \end{bmatrix}.
\]
An orthogonal matrix Q that diagonalizes A can be formed from normalized eigenvectors:
\[
Q = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}.
\]
The substitution x = Qy is given by
\[
\begin{bmatrix} r \\ s \end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix},
\]
or
\[
r = \frac{1}{\sqrt{5}}(u + 2v), \quad s = \frac{1}{\sqrt{5}}(-2u + v).
\]
Using the substitution above in the quadratic form, we obtain
\[
r^2 + 4rs - 2s^2 = \frac{1}{5}(u + 2v)^2 + \frac{4}{5}(u + 2v)(-2u + v) - \frac{2}{5}(-2u + v)^2 = \frac{1}{5}[-15u^2 + 10v^2] = 2v^2 - 3u^2.
\]
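The substitution in Example 2 can also be generated automatically: for a symmetric input, the MATLAB eig command returns an orthogonal diagonalizing matrix. A brief sketch (eig may order the eigenvalues, and hence the columns of Q, differently than the hand computation above):

A = [1 2; 2 -2];
[Q, D] = eig(A);        % Q orthogonal; D = diag(-3, 2)
disp(Q'*A*Q)            % diagonal, so q(Qy) = -3u^2 + 2v^2 in the new variables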


Classifying Quadratic Forms

We can think of a quadratic form as defining a function from R^n to R. Specifically, if q(x) = x^T Ax, where A is a real (n × n) symmetric matrix, then we can define a real-valued function with domain R^n by the rule
\[
y = q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}.
\]
The quadratic form is classified as:

(a) Positive definite if q(x) > 0 for all x in R^n, x ≠ θ.
(b) Positive semidefinite if q(x) ≥ 0 for all x in R^n, x ≠ θ.
(c) Negative definite if q(x) < 0 for all x in R^n, x ≠ θ.
(d) Negative semidefinite if q(x) ≤ 0 for all x in R^n, x ≠ θ.
(e) Indefinite if q(x) assumes both positive and negative values.

The diagonalization process shown in Eq. (3) allows us to classify any specific quadratic form q(x) = x^T Ax in terms of the eigenvalues of A. The details are given in the next theorem.

Theorem 2 Let q(x) be a quadratic form with representation q(x) = x^T Ax, where A is a symmetric (n × n) matrix. Let the eigenvalues of A be λ_1, λ_2, ..., λ_n. The quadratic form is:

(a) Positive definite if and only if λ_i > 0, for 1 ≤ i ≤ n.
(b) Positive semidefinite if and only if λ_i ≥ 0, for 1 ≤ i ≤ n.
(c) Negative definite if and only if λ_i < 0, for 1 ≤ i ≤ n.
(d) Negative semidefinite if and only if λ_i ≤ 0, for 1 ≤ i ≤ n.
(e) Indefinite if and only if A has both positive and negative eigenvalues.

The proof is based on the diagonalization shown in Eq. (3) and is left as an exercise.
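Theorem 2 translates directly into a numerical test. The MATLAB sketch below classifies a form from the signs of eig(A); the tolerance tol guards against roundoff, and its value is a judgment call rather than part of the theorem:

A = [3 -2; -2 3];                % the matrix used in Example 3 below
lam = eig(A);
tol = 1e-12;
if all(lam > tol)
    disp('positive definite')
elseif all(lam > -tol)
    disp('positive semidefinite')
elseif all(lam < -tol)
    disp('negative definite')
elseif all(lam < tol)
    disp('negative semidefinite')
else
    disp('indefinite')
end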

Example 3 Classify the quadratic form
\[
q(r, s) = 3r^2 - 4rs + 3s^2.
\]
Solution We first find the matrix representation for q(x) = q(r, s):
\[
q(\mathbf{x}) = q(r, s) = [r, s]\begin{bmatrix} 3 & -2 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} r \\ s \end{bmatrix} = \mathbf{x}^T A\mathbf{x}.
\]
By Theorem 2, the quadratic form can be classified once we know the eigenvalues of A. The characteristic polynomial is
\[
p(t) = \det(A - tI) = \begin{vmatrix} 3-t & -2 \\ -2 & 3-t \end{vmatrix} = t^2 - 6t + 5 = (t-5)(t-1).
\]


Thus, since all the eigenvalues of A are positive, q(x) is a positive-definite quadratic form.

Because the quadratic form in Example 3 is so simple, it is possible to show directly that the form is positive definite. In particular,
\[
q(r, s) = 3r^2 - 4rs + 3s^2 = 2(r^2 - 2rs + s^2) + r^2 + s^2 = 2(r-s)^2 + r^2 + s^2. \qquad (4)
\]
From Eq. (4), it follows that q(x) > 0 for every nonzero x = [r, s]^T.

Example 4 Verify that the following quadratic form is indefinite:
\[
q(\mathbf{x}) = q(r, s) = r^2 + 4rs - 2s^2.
\]
Also, find vectors x_1 and x_2 such that q(x_1) > 0 and q(x_2) < 0.

Solution Example 2 showed that q(x) = x^T Ax, where A has eigenvalues λ_1 = 2 and λ_2 = −3. By Theorem 2, q(x) is indefinite.

If x_1 is an eigenvector corresponding to λ_1 = 2, then q(x_1) is given by
\[
q(\mathbf{x}_1) = \mathbf{x}_1^T A\mathbf{x}_1 = \mathbf{x}_1^T(\lambda_1\mathbf{x}_1) = \mathbf{x}_1^T(2\mathbf{x}_1) = 2\mathbf{x}_1^T\mathbf{x}_1.
\]
Thus, q(x_1) > 0. Similarly, if Ax_2 = λ_2x_2, where λ_2 = −3, then q(x_2) is negative since q(x_2) = −3x_2^T x_2.

Note that, as in (4), the quadratic form in Example 4 can be seen to be indefinite by observing
\[
q(r, s) = r^2 + 4rs - 2s^2 = r^2 + 4rs + 4s^2 - 6s^2 = (r + 2s)^2 - 6s^2. \qquad (5)
\]
From Eq. (5), if r and s are numbers such that r + 2s = 0 with s nonzero, then q(r, s) is negative. On the other hand, q(r, 0) is positive for any nonzero value of r. Therefore, Eq. (5) confirms that q(r, s) takes on both positive and negative values. As specific instances, note that q(2, −1) = −6 and q(2, 0) = 4.

Conic Sections and Quadric Surfaces

The ideas associated with quadratic forms are useful when we want to describe the solution set of a quadratic equation in several variables. For example, consider this general quadratic equation in x and y:
\[
ax^2 + bxy + cy^2 + dx + ey + f = 0. \qquad (6)
\]
(In Eq. (6) we assume that at least one of a, b, or c is nonzero.)

As we will see, the theory associated with quadratic forms allows us to make a special change of variables that will eliminate the cross-product term in Eq. (6). In particular, there is always a change of variables of the form
\[
x = a_1u + a_2v, \quad y = b_1u + b_2v,
\]


which, when these expressions are substituted into Eq. (6), produces a new equation of the form
\[
a'u^2 + b'v^2 + c'u + d'v + e' = 0. \qquad (7)
\]
If Eq. (7) has solutions, then the pairs (u, v) that satisfy Eq. (7) will define a curve in the uv-plane. Recall from analytic geometry that the solution set of Eq. (7) defines the following curves in the uv-plane:

(a) An ellipse when a'b' > 0.
(b) A hyperbola when a'b' < 0.
(c) A parabola when a'b' = 0 and one of a' or b' is nonzero.

In terms of the original variables x and y, the solution set for Eq. (6) is a curve in the xy-plane. Because of the special nature of the change of variables, the solution set for Eq. (6) can be obtained simply by rotating the curve defined by Eq. (7).

To begin a study of the general quadratic equation in Eq. (6), we first rewrite Eq. (6) in the form
\[
\mathbf{x}^T A\mathbf{x} + \mathbf{a}^T\mathbf{x} + f = 0, \qquad (8)
\]
where
\[
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad A = \begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}, \quad\text{and}\quad \mathbf{a} = \begin{bmatrix} d \\ e \end{bmatrix}.
\]

Now if Q is an orthogonal matrix that diagonalizes A, then the substitution x = Qy will remove the cross-product term from Eq. (6). Specifically, suppose that Q^T AQ = D, where D is a diagonal matrix. In Eq. (8), the substitution x = Qy leads to
\[
\mathbf{y}^T D\mathbf{y} + \mathbf{a}^T Q\mathbf{y} + f = 0. \qquad (9)
\]
For y = [u, v]^T, Eq. (9) has the simple form
\[
\lambda_1 u^2 + \lambda_2 v^2 + c'u + d'v + f = 0. \qquad (10)
\]
In Eq. (10), λ_1 and λ_2 are the (1, 1) and (2, 2) entries of D, respectively. (Note that λ_1 and λ_2 are the eigenvalues of A.)

As we noted previously, if Eq. (10) has solutions, then the solution set will define an ellipse, a hyperbola, or a parabola in the uv-plane. Since the change of variables x = Qy is defined by an orthogonal matrix Q, the pairs x = [x, y]^T that satisfy Eq. (6) are obtained simply by rotating the pairs y = [u, v]^T that satisfy Eq. (7).

An example will illustrate these ideas.

Example 5 Describe and graph the solution set of
\[
x^2 + 4xy - 2y^2 + 2\sqrt{5}\,x + 4\sqrt{5}\,y - 1 = 0. \qquad (11)
\]
Solution The equation has the form x^T Ax + a^T x + f = 0, where
\[
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} 2\sqrt{5} \\ 4\sqrt{5} \end{bmatrix}, \quad\text{and}\quad f = -1.
\]


From Example 2, we know that Q^T AQ = D, where
\[
Q = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} -3 & 0 \\ 0 & 2 \end{bmatrix}.
\]
For y = [u, v]^T, we make the substitution x = Qy in Eq. (11), obtaining
\[
2v^2 - 3u^2 - 6u + 8v - 1 = 0.
\]
Completing the square, we can express the previous equation as
\[
2(v^2 + 4v + 4) - 3(u^2 + 2u + 1) = 6,
\]
or
\[
\frac{(v+2)^2}{3} - \frac{(u+1)^2}{2} = 1. \qquad (12)
\]
From analytic geometry, Eq. (12) defines a hyperbola in the uv-plane, where the center of the hyperbola has coordinates (−1, −2); Fig. 7.1 shows the hyperbola. (For reference, the vertices of the hyperbola have coordinates (−1, −2 ± √3) and the foci have coordinates (−1, −2 ± √5).)

Figure 7.1 The graph of the hyperbola (v + 2)²/3 − (u + 1)²/2 = 1

Finally, Fig. 7.2 shows the solution set of Eq. (11), plotted in the xy-plane. The hyperbola shown in Fig. 7.2 is a rotation of the hyperbola (12) shown in Fig. 7.1.
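As a quick check of Example 5, the type of conic can be read off from the eigenvalues of A without carrying out the full substitution. In MATLAB:

A = [1 2; 2 -2];
lam = eig(A)            % -3 and 2; the product is negative, so Eq. (11) is a hyperbola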

Note that, as Example 5 illustrated, if Eq. (6) has real solutions, then the solution set is easy to plot when a change of variables is used to eliminate any cross-product terms. In some circumstances, however, the solution set of quadratic equation (6) might consist of a single point, or there might be no real solutions at all. For instance, the solution set of x² + y² = 0 consists of the single pair (0, 0), whereas the equation x² + y² = −1 has no real solutions.

For quadratic equations involving more than two variables, all the cross-product terms can be eliminated by using the same technique employed with Eq. (6). For instance, consider the general quadratic equation in the variables x, y, and z:
\[
ax^2 + by^2 + cz^2 + dxy + exz + fyz + px + qy + rz + s = 0. \qquad (13)
\]

Figure 7.2 The graph of the hyperbola x² + 4xy − 2y² + 2√5 x + 4√5 y − 1 = 0

As with Eq. (6), we can express Eq. (13) in matrix-vector terms as
\[
\mathbf{x}^T A\mathbf{x} + \mathbf{a}^T\mathbf{x} + s = 0, \qquad (14)
\]
where
\[
\mathbf{x} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \quad A = \begin{bmatrix} a & d/2 & e/2 \\ d/2 & b & f/2 \\ e/2 & f/2 & c \end{bmatrix}, \quad\text{and}\quad \mathbf{a} = \begin{bmatrix} p \\ q \\ r \end{bmatrix}.
\]

If Q is an orthogonal matrix such that Q^T AQ = D, where D is diagonal, then the substitution x = Qy will reduce Eq. (14) to
\[
\mathbf{y}^T D\mathbf{y} + \mathbf{a}^T(Q\mathbf{y}) + s = 0. \qquad (15)
\]
For y = [u, v, w]^T, Eq. (15) has no cross-product terms and will have the form
\[
\lambda_1 u^2 + \lambda_2 v^2 + \lambda_3 w^2 + a'u + b'v + c'w + s = 0. \qquad (16)
\]
(Again, the scalars λ_1, λ_2, and λ_3 are the diagonal entries of D or, equivalently, the eigenvalues of A.)

If Eq. (16) has real solutions, then the triples (u, v, w) that satisfy (16) will define a surface in three-space. Such surfaces are called quadric surfaces, and detailed descriptions (along with graphs) can be found in most calculus books.

The geometric nature of a quadric surface depends on the λ_i and the scalars a', b', c', and s in Eq. (16). As a simple example, consider the equation
\[
\lambda_1 u^2 + \lambda_2 v^2 + \lambda_3 w^2 = d, \quad d > 0. \qquad (17)
\]
If the λ_i are all positive, then the surface defined by Eq. (17) is an ellipsoid. If one of the λ_i is negative and the other two are positive, then the surface is a hyperboloid of one sheet. If two of the λ_i are negative and the other is positive, the surface is a hyperboloid of two sheets. If the λ_i are all negative, then Eq. (17) has no real solutions. The various other surfaces associated with solution sets of Eq. (16) can be found in a calculus book.


The Principal Axis Theorem

The general quadratic equation in n variables has the form
\[
\sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}x_ix_j + \sum_{i=1}^{n} c_ix_i + e = 0. \qquad (18)
\]

As we know from the earlier discussions, Eq. (18) can be expressed in matrix-vector terms as
\[
\mathbf{x}^T A\mathbf{x} + \mathbf{a}^T\mathbf{x} + e = 0,
\]
where A is a real (n × n) symmetric matrix and a = [c_1, c_2, ..., c_n]^T.

The following theorem tells us that it is always possible to make a change of variables that will eliminate the cross-product terms in Eq. (18).

Theorem 3 The Principal Axis Theorem Let quadratic equation (18) be expressed as
\[
\mathbf{x}^T A\mathbf{x} + \mathbf{a}^T\mathbf{x} + e = 0,
\]
where A is a real (n × n) symmetric matrix and a = [c_1, c_2, ..., c_n]^T. Let Q be an orthogonal matrix such that Q^T AQ = D, where D is diagonal. For y = [y_1, y_2, ..., y_n]^T, the substitution x = Qy transforms Eq. (18) to an equation of the form
\[
\lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 + d_1y_1 + d_2y_2 + \cdots + d_ny_n + e = 0.
\]

7.1 EXERCISES

In Exercises 1–6, find a symmetric matrix A such that q(x) = x^T Ax.

1. q(x) = 2x² + 4xy − 3y²
2. q(x) = −x² + 6xy + y²
3. q(x) = x² − 4y² + 3z² + 2xy − 6xz + 8yz
4. q(x) = u² + 4w² − z² + 2uv + 10uw − 4uz + 4vw − 2vz + 6wz
5. $q(\mathbf{x}) = [x, y]\begin{bmatrix} 2 & 0 \\ 4 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$
6. $q(\mathbf{x}) = [x, y, z]\begin{bmatrix} 1 & 3 & 1 \\ 5 & 2 & 4 \\ 3 & 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix}$

In Exercises 7–12, find a substitution x = Qy that diagonalizes the given quadratic form, where Q is orthogonal. Also, use Theorem 2 to classify the form as positive definite, positive semidefinite, and so on.

7. q(x) = 2x² + 6xy + 2y²
8. q(x) = 5x² − 4xy + 5y²
9. q(x) = x² + y² + z² + 4(xy + xz + yz)
10. q(x) = x² + y² + z² + 2(xy + xz + yz)
11. q(x) = 3x² − 2xy + 3y²
12. q(x) = u² + v² + w² + z² − 2(uv + uw + uz + vw + vz + wz)

In Exercises 13–20, find a substitution x = Qy (where Q is orthogonal) that eliminates the cross-product term in the given equation. Sketch a graph of the transformed equation, where y = [u, v]^T.

13. 2x² + √3 xy + y² = 10
14. 3x² + 2xy + 3y² = 8
15. x² + 6xy − 7y² = 8
16. 3x² + 4xy + 5y² = 4
17. xy = 4
18. 3x² + 2√3 xy + y² + 4x = 4
19. 3x² − 2xy + 3y² = 16
20. x² + 2xy + y² = −1

21. Consider the quadratic form given by q(x) = x^T Ax, where A is an (n × n) symmetric matrix. Suppose


that C is any (n × n) symmetric matrix such that x^T Ax = x^T Cx for all x in R^n. Show that C = A. [Hint: Let x = e_i and verify that c_{ii} = a_{ii}, 1 ≤ i ≤ n. Next, consider x = e_r + e_s, where 1 ≤ r, s ≤ n.]

22. Consider the quadratic form q(x) = x^T Ax, where A is an (n × n) symmetric matrix. Let A have eigenvalues λ_1, λ_2, ..., λ_n.
a) Show that if the quadratic form is positive definite, then λ_i > 0 for 1 ≤ i ≤ n. [Hint: Choose x to be an eigenvector of A.]
b) Show that if λ_i > 0 for 1 ≤ i ≤ n, then the quadratic form is positive definite. [Hint: Recall Eq. (3).]
(Note: Exercise 22 proves property (a) of Theorem 2.)

23. Prove property (b) of Theorem 2.
24. Prove properties (c) and (d) of Theorem 2.
25. Prove property (e) of Theorem 2. [Note: The proof of property (e) is somewhat different from the proof of properties (a)–(d).]

26. Let A be an (n × n) symmetric matrix and consider the function R defined on R^n by
\[
R(\mathbf{x}) = \frac{\mathbf{x}^T A\mathbf{x}}{\mathbf{x}^T\mathbf{x}}, \quad \mathbf{x} \ne \theta.
\]
The number R(x) is called a Rayleigh quotient. Let A have eigenvalues λ_1, λ_2, ..., λ_n, where λ_1 ≤ λ_2 ≤ λ_3 ≤ · · · ≤ λ_n. Prove that for every x in R^n, λ_1 ≤ R(x) ≤ λ_n. [Hint: By the corollary to Theorem 23 in Section 4.7, R^n has an orthonormal basis {u_1, u_2, ..., u_n}, where Au_i = λ_iu_i, 1 ≤ i ≤ n. For a given x, x ≠ θ, we can express x as x = a_1u_1 + a_2u_2 + · · · + a_nu_n. Using this expansion, calculate x^T Ax and x^T x.]

27. Let A be an (n × n) symmetric matrix, as in Exercise 26. Let D denote the set of all vectors x in R^n such that ‖x‖ = 1, and consider the quadratic form q(x) = x^T Ax. Show that the maximum value of q(x), x in D, is λ_n and the minimum value of q(x), x in D, is λ_1. [Hint: Use the results of Exercise 26. Be sure to verify that the maximum and minimum values are attained.]

28. Let A be an (n × n) symmetric matrix, and let S be an (n × n) nonsingular matrix. Define the matrix B by B = S^T AS.
a) Verify that B is symmetric.
b) Consider the quadratic forms q_1(x) = x^T Ax and q_2(x) = x^T Bx. Show that q_1(x) is positive definite if and only if q_2(x) is positive definite.

7.2 SYSTEMS OF DIFFERENTIAL EQUATIONS

In Section 4.8, we provided a brief introduction to the problem of solving a system of differential equations:
\[
\begin{array}{rcl}
x_1'(t) &=& a_{11}x_1(t) + a_{12}x_2(t) + \cdots + a_{1n}x_n(t) \\
x_2'(t) &=& a_{21}x_1(t) + a_{22}x_2(t) + \cdots + a_{2n}x_n(t) \\
&\vdots& \\
x_n'(t) &=& a_{n1}x_1(t) + a_{n2}x_2(t) + \cdots + a_{nn}x_n(t).
\end{array} \qquad (1)
\]
A solution to system (1) is a set of functions x_1(t), x_2(t), ..., x_n(t) that simultaneously satisfy these equations.

In order to express system (1) in matrix terms, let us define the vector-valued function x(t) by
\[
\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}.
\]


With x(t) defined in these terms, we can write system (1) as
\[
\mathbf{x}'(t) = A\mathbf{x}(t), \qquad (2)
\]
where the vector x′(t) and the (n × n) matrix A are given by
\[
\mathbf{x}'(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \\ \vdots \\ x_n'(t) \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.
\]

The General Solution of x′ = Ax

As in Section 4.8, let us assume that x′ = Ax has a solution of the form
\[
\mathbf{x}(t) = e^{\lambda t}\mathbf{u}. \qquad (3)
\]
For x(t) = e^{λt}u, we have x′(t) = λe^{λt}u. Therefore, inserting the trial form (3) into x′ = Ax leads to the condition
\[
\lambda e^{\lambda t}\mathbf{u} = Ae^{\lambda t}\mathbf{u},
\]
which can be rewritten as
\[
e^{\lambda t}[A\mathbf{u} - \lambda\mathbf{u}] = \theta. \qquad (4)
\]
Since e^{λt} is never zero, we see from Eq. (4) that x(t) = e^{λt}u will be a nontrivial solution of x′ = Ax if and only if λ is an eigenvalue of A and u is a corresponding eigenvector.

In general, suppose that the (n × n) matrix A has eigenvalues λ_1, λ_2, ..., λ_n and corresponding eigenvectors u_1, u_2, ..., u_n. Then the vector-valued functions x_1(t) = e^{λ_1t}u_1, x_2(t) = e^{λ_2t}u_2, ..., x_n(t) = e^{λ_nt}u_n are all solutions of x′ = Ax. It is easy to verify, moreover, that any linear combination of x_1(t), x_2(t), ..., x_n(t) is also a solution. That is,
\[
\mathbf{x}(t) = a_1e^{\lambda_1 t}\mathbf{u}_1 + a_2e^{\lambda_2 t}\mathbf{u}_2 + \cdots + a_ne^{\lambda_n t}\mathbf{u}_n \qquad (5)
\]
will solve x′ = Ax for any choice of scalars a_1, a_2, ..., a_n.

The question then arises: "Are there solutions to x′ = Ax other than the ones listed in Eq. (5)?" The answer is: "If the eigenvectors u_1, u_2, ..., u_n are linearly independent, then every solution of x′ = Ax has the form (5)." A proof of this fact can be found in a differential equations text. Equivalently, we can summarize the preceding discussion as follows:

Let A be an (n × n) nondefective matrix with linearly independent eigenvectors u_1, u_2, ..., u_n. Then x(t) solves x′ = Ax if and only if x(t) has the form (5).

For A nondefective, the expression (5) is known as the general solution of x′ = Ax.

Example 1 Write the following system of differential equations in the form x′ = Ax, and find the general solution:
\[
\begin{array}{rcl}
u' &=& 3u + v - w \\
v' &=& 12u - 5w \\
w' &=& 4u + 2v - w.
\end{array}
\]


Solution This system has the form x′ = Ax, where
\[
\mathbf{x}(t) = \begin{bmatrix} u(t) \\ v(t) \\ w(t) \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 3 & 1 & -1 \\ 12 & 0 & -5 \\ 4 & 2 & -1 \end{bmatrix}.
\]
The eigenvalues are λ_1 = −1, λ_2 = 1, and λ_3 = 2. Corresponding eigenvectors are
\[
\mathbf{u}_1 = \begin{bmatrix} 1 \\ -2 \\ 2 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 3 \\ 1 \\ 7 \end{bmatrix}, \quad\text{and}\quad \mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.
\]
Therefore, the general solution is
\[
\mathbf{x}(t) = a_1e^{-t}\mathbf{u}_1 + a_2e^{t}\mathbf{u}_2 + a_3e^{2t}\mathbf{u}_3 = a_1e^{-t}\begin{bmatrix} 1 \\ -2 \\ 2 \end{bmatrix} + a_2e^{t}\begin{bmatrix} 3 \\ 1 \\ 7 \end{bmatrix} + a_3e^{2t}\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.
\]
In terms of the original variables, the general solution is u(t) = a_1e^{−t} + 3a_2e^{t} + a_3e^{2t}, v(t) = −2a_1e^{−t} + a_2e^{t} + a_3e^{2t}, w(t) = 2a_1e^{−t} + 7a_2e^{t} + 2a_3e^{2t}, where a_1, a_2, and a_3 are arbitrary.

In practice, we are often presented with an initial condition as well as a differential equation. Such a problem,
\[
\mathbf{x}'(t) = A\mathbf{x}(t), \quad \mathbf{x}(0) = \mathbf{x}_0, \qquad (6)
\]
is called an initial-value problem. That is, let x_0 be a given initial vector. Then, among all solutions of x′ = Ax, we want to identify that special solution that satisfies the initial condition x(0) = x_0.

When A is nondefective, it is easy to solve the initial-value problem (6). In particular, every solution of x′ = Ax has the form (5), and for x(t) as in (5) we have
\[
\mathbf{x}(0) = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + \cdots + a_n\mathbf{u}_n.
\]
Since the eigenvectors u_1, u_2, ..., u_n are linearly independent, we can always choose scalars α_1, α_2, ..., α_n such that x_0 = α_1u_1 + α_2u_2 + · · · + α_nu_n; therefore, x(t) = α_1e^{λ_1t}u_1 + α_2e^{λ_2t}u_2 + · · · + α_ne^{λ_nt}u_n is the unique solution of x′ = Ax, x(0) = x_0.

Example 2 Solve the initial-value problem x′ = Ax, x(0) = x_0, where
\[
A = \begin{bmatrix} 3 & 1 & -1 \\ 12 & 0 & -5 \\ 4 & 2 & -1 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_0 = \begin{bmatrix} 7 \\ -3 \\ 16 \end{bmatrix}.
\]
Solution From Example 1, the general solution of x′ = Ax is x(t) = a_1e^{−t}u_1 + a_2e^{t}u_2 + a_3e^{2t}u_3, where
\[
\mathbf{u}_1 = \begin{bmatrix} 1 \\ -2 \\ 2 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 3 \\ 1 \\ 7 \end{bmatrix}, \quad\text{and}\quad \mathbf{u}_3 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.
\]


Therefore, the condition x(0) = x_0 reduces to a_1u_1 + a_2u_2 + a_3u_3 = x_0, or
\[
a_1\begin{bmatrix} 1 \\ -2 \\ 2 \end{bmatrix} + a_2\begin{bmatrix} 3 \\ 1 \\ 7 \end{bmatrix} + a_3\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 7 \\ -3 \\ 16 \end{bmatrix}.
\]
Solving, we find a_1 = 2, a_2 = 2, and a_3 = −1. Thus the solution of the initial-value problem is
\[
\mathbf{x}(t) = 2e^{-t}\begin{bmatrix} 1 \\ -2 \\ 2 \end{bmatrix} + 2e^{t}\begin{bmatrix} 3 \\ 1 \\ 7 \end{bmatrix} - e^{2t}\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2e^{-t} + 6e^{t} - e^{2t} \\ -4e^{-t} + 2e^{t} - e^{2t} \\ 4e^{-t} + 14e^{t} - 2e^{2t} \end{bmatrix}.
\]
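The arithmetic of Example 2 can be automated with the MATLAB eig command. In the sketch below, eig normalizes the eigenvectors, so the coefficients in a differ from a_1 = 2, a_2 = 2, a_3 = −1 above, but the computed x(t) is the same; the sample time t = 1 is chosen only for illustration:

A  = [3 1 -1; 12 0 -5; 4 2 -1];
x0 = [7; -3; 16];
[S, D] = eig(A);                 % columns of S are eigenvectors of A
a = S \ x0;                      % solve S*a = x0 for the coefficients
t = 1;
x = S * (a .* exp(diag(D)*t))    % agrees with the closed-form solution above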

The problem of solving x′ = Ax when A is defective is discussed in Section 7.8. Also, see Exercises 9 and 10 at the end of this section.

Solution by Diagonalization

As noted in Eq. (5), if an (n × n) matrix A has n linearly independent eigenvectors, then the general solution of x′(t) = Ax(t) is given by
\[
\mathbf{x}(t) = b_1\mathbf{x}_1(t) + b_2\mathbf{x}_2(t) + \cdots + b_n\mathbf{x}_n(t) = b_1e^{\lambda_1 t}\mathbf{u}_1 + b_2e^{\lambda_2 t}\mathbf{u}_2 + \cdots + b_ne^{\lambda_n t}\mathbf{u}_n.
\]
Now, given that A has a set of n linearly independent eigenvectors, the solution of x′(t) = Ax(t) can also be described in terms of diagonalization. This alternative solution process has some advantages, especially for nonhomogeneous systems of the form x′(t) = Ax(t) + f(t).

Suppose that A is an (n × n) matrix with n linearly independent eigenvectors. As we know, A is then diagonalizable. In particular, suppose that
\[
S^{-1}AS = D, \quad D \text{ diagonal}.
\]
Next, consider the equation x′(t) = Ax(t). Let us make the substitution
\[
\mathbf{x}(t) = S\mathbf{y}(t).
\]
With this substitution, the equation x′(t) = Ax(t) becomes
\[
S\mathbf{y}'(t) = AS\mathbf{y}(t),
\]
or
\[
\mathbf{y}'(t) = S^{-1}AS\mathbf{y}(t),
\]
or
\[
\mathbf{y}'(t) = D\mathbf{y}(t). \qquad (7)
\]


Since D is diagonal, system (7) has the form
\[
\begin{bmatrix} y_1'(t) \\ y_2'(t) \\ \vdots \\ y_n'(t) \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{bmatrix}.
\]
Because D is diagonal, the equation above implies that the component functions y_i(t) are related by
\[
y_i'(t) = \lambda_i y_i(t), \quad 1 \le i \le n.
\]
Then, since the general solution of the scalar equation w′ = λw is given by w(t) = ce^{λt}, it follows that
\[
y_i(t) = c_ie^{\lambda_i t}, \quad 1 \le i \le n.
\]
Therefore, the general solution of y′(t) = Dy(t) in system (7) is given by
\[
\mathbf{y}(t) = \begin{bmatrix} c_1e^{\lambda_1 t} \\ c_2e^{\lambda_2 t} \\ \vdots \\ c_ne^{\lambda_n t} \end{bmatrix}, \qquad (8)
\]
where c_1, c_2, ..., c_n are arbitrary constants. In terms of x(t), we have x(t) = Sy(t) and x(0) = Sy(0), where y(0) = [c_1, c_2, ..., c_n]^T. For an initial-value problem x′(t) = Ax(t), x(0) = x_0, we would choose c_1, c_2, ..., c_n so that Sy(0) = x_0, or y(0) = S^{-1}x_0.

Example 3 Use the diagonalization procedure to solve the initial-value problem
\[
\begin{array}{ll}
u'(t) = -2u(t) + v(t) + w(t), & u(0) = 1 \\
v'(t) = u(t) - 2v(t) + w(t), & v(0) = 3 \\
w'(t) = u(t) + v(t) - 2w(t), & w(0) = -1.
\end{array}
\]
Solution First, we write the problem as x′(t) = Ax(t), x(0) = x_0, where
\[
\mathbf{x}(t) = \begin{bmatrix} u(t) \\ v(t) \\ w(t) \end{bmatrix}, \quad A = \begin{bmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix}, \quad\text{and}\quad \mathbf{x}_0 = \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}.
\]
The eigenvalues and eigenvectors of A are
\[
\lambda_1 = 0,\ \mathbf{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}; \quad \lambda_2 = -3,\ \mathbf{u}_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}; \quad \lambda_3 = -3,\ \mathbf{u}_3 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}.
\]
Thus we can construct a diagonalizing matrix S such that S^{-1}AS = D by choosing S = [u_1, u_2, u_3]:
\[
S = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \\ 1 & -1 & 0 \end{bmatrix}, \quad D = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -3 \end{bmatrix}.
\]


Next, solving y′(t) = Dy(t), we obtain
\[
\mathbf{y}(t) = \begin{bmatrix} c_1 \\ c_2e^{-3t} \\ c_3e^{-3t} \end{bmatrix}.
\]
From this, x(t) = Sy(t), or
\[
\mathbf{x}(t) = \begin{bmatrix} c_1 + c_2e^{-3t} + c_3e^{-3t} \\ c_1 - c_3e^{-3t} \\ c_1 - c_2e^{-3t} \end{bmatrix}.
\]
To satisfy the initial condition x(0) = x_0 = [1, 3, −1]^T, we choose c_1 = 1, c_2 = 2, and c_3 = −2. Thus, x(t) is given by
\[
\mathbf{x}(t) = \begin{bmatrix} u(t) \\ v(t) \\ w(t) \end{bmatrix} = \begin{bmatrix} 1 \\ 1 + 2e^{-3t} \\ 1 - 2e^{-3t} \end{bmatrix}.
\]
(Note: For large t, x(t) ≈ [1, 1, 1]^T.)
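The diagonalization steps of Example 3 take only a few lines of MATLAB. Note that eig will generally pick a different (orthonormal) basis for the repeated eigenvalue −3 than the vectors u_2 and u_3 used above, but the resulting solution x(t) is unchanged; the sample time t = 2 is an arbitrary choice for illustration:

A  = [-2 1 1; 1 -2 1; 1 1 -2];
x0 = [1; 3; -1];
[S, D] = eig(A);                 % S^{-1}*A*S = D
c = S \ x0;                      % y(0) = S^{-1}*x0
t = 2;
x = S * (c .* exp(diag(D)*t))    % approaches [1; 1; 1] as t grows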

Complex Solutions

As we have seen, solutions to x′(t) = Ax(t) are built up from functions of the form
\[
\mathbf{x}_i(t) = e^{\lambda_i t}\mathbf{u}_i. \qquad (9)
\]
In many applications, the function x_i(t) represents a particular state of the physical system modeled by x′(t) = Ax(t). Furthermore, for many applications, the state vector x_i(t) has component functions that are oscillatory in nature (for instance, see Example 5). Now, in general, a function of the form y(t) = e^{λt} has an oscillatory nature if and only if λ is a complex scalar. To explain this fact, we need to give the definition of e^{λt} when λ is complex. In advanced texts it is shown, for λ = a + ib, that
\[
e^{\lambda t} = e^{(a+ib)t} = e^{at}(\cos bt + i\sin bt). \qquad (10)
\]
An example is presented below of a system x′(t) = Ax(t), where A has complex eigenvalues.

Example 4  Solve the initial-value problem

$$\begin{aligned} u'(t) &= 3u(t) + v(t), & u(0) &= 2 \\ v'(t) &= -2u(t) + v(t), & v(0) &= 8. \end{aligned}$$

Solution  The system can be written as x′(t) = Ax(t), x(0) = x0, where

$$x(t) = \begin{bmatrix} u(t) \\ v(t) \end{bmatrix}, \quad A = \begin{bmatrix} 3 & 1 \\ -2 & 1 \end{bmatrix}, \quad\text{and}\quad x_0 = \begin{bmatrix} 2 \\ 8 \end{bmatrix}.$$


The eigenvalues are λ1 = 2 + i and λ2 = 2 − i, with corresponding eigenvectors

$$u_1 = \begin{bmatrix} 1+i \\ -2 \end{bmatrix} \quad\text{and}\quad u_2 = \begin{bmatrix} 1-i \\ -2 \end{bmatrix}.$$

The general solution of x′(t) = Ax(t) is given by

$$x(t) = a_1 e^{\lambda_1 t} u_1 + a_2 e^{\lambda_2 t} u_2. \tag{11}$$

From Eq. (10) it is clear that e^{λt} has the value 1 when t = 0, whether λ is complex or real. Thus to satisfy the condition x(0) = x0 = [2, 8]ᵀ, we need to choose a1 and a2 in Eq. (11) so that a1u1 + a2u2 = x0:

$$\begin{aligned} a_1(1+i) + a_2(1-i) &= 2 \\ a_1(-2) + a_2(-2) &= 8. \end{aligned}$$

Solving the preceding system by using Gaussian elimination, we obtain a1 = −2 − 3i and a2 = −2 + 3i.

Having the coefficients a1 and a2 in Eq. (11), some complex arithmetic calculations will give the functions u and v that satisfy the given initial-value problem. In particular, since λ1 = 2 + i, it follows that

$$e^{\lambda_1 t} = e^{(2+i)t} = e^{2t}(\cos t + i \sin t).$$

Similarly, from the fact that cos(−t) = cos t and sin(−t) = −sin t,

$$e^{\lambda_2 t} = e^{(2-i)t} = e^{2t}(\cos t - i \sin t).$$

Thus, x(t) is given by

$$\begin{aligned} x(t) &= e^{\lambda_1 t}(a_1 u_1) + e^{\lambda_2 t}(a_2 u_2) \\ &= e^{2t}(\cos t + i \sin t)\begin{bmatrix} 1-5i \\ 4+6i \end{bmatrix} + e^{2t}(\cos t - i \sin t)\begin{bmatrix} 1+5i \\ 4-6i \end{bmatrix} \\ &= \begin{bmatrix} e^{2t}(2\cos t + 10\sin t) \\ e^{2t}(8\cos t - 12\sin t) \end{bmatrix}. \end{aligned}$$

That is,

$$\begin{aligned} u(t) &= 2e^{2t}(\cos t + 5\sin t) \\ v(t) &= 4e^{2t}(2\cos t - 3\sin t). \end{aligned}$$
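As a numerical sanity check on Example 4 (ours, not the text's), one can compare the closed-form answer with x(t) = e^{At}x0 computed by a matrix exponential. The sketch below assumes SciPy's `scipy.linalg.expm` is available.

```python
import numpy as np
from scipy.linalg import expm

A  = np.array([[3., 1.], [-2., 1.]])
x0 = np.array([2., 8.])
t  = 0.7

# Closed form from Example 4.
u = 2 * np.exp(2*t) * (np.cos(t) + 5*np.sin(t))
v = 4 * np.exp(2*t) * (2*np.cos(t) - 3*np.sin(t))

# x(t) = e^{At} x0; expm handles the complex eigenvalues internally
# and returns a real answer.
print(np.allclose(expm(A*t) @ x0, [u, v]))   # True
```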

An example of a physical system that leads to a system of differential equations is illustrated in Fig. 7.3. This figure shows a spring–mass system, where y1 = 0 and y2 = 0 indicate the equilibrium position of the masses, and y1(t) and y2(t) denote the displacements at time t. For a single spring and mass as in Fig. 7.4, we can use Hooke's law and F = ma to deduce my″(t) = −ky(t); that is, the restoring force of the spring is proportional to the displacement, y(t), of the mass from the equilibrium position, y = 0. The constant of proportionality is the spring constant k, and the minus sign indicates that the force is directed toward equilibrium.

[Figure 7.3: A coupled spring–mass system. Figure 7.4: A single spring–mass system.]

In Fig. 7.3, the spring attached to m2 is stretched (or compressed) by the amount y2(t) − y1(t), so we can write m2y2″(t) = −k2[y2(t) − y1(t)]. The mass m1 is being pulled by two springs, so we have m1y1″(t) = −k1y1(t) + k2[y2(t) − y1(t)]. Thus the

motion of the physical system is governed by

$$\begin{aligned} y_1''(t) &= -\frac{k_1 + k_2}{m_1}\,y_1(t) + \frac{k_2}{m_1}\,y_2(t) \\ y_2''(t) &= \frac{k_2}{m_2}\,y_1(t) - \frac{k_2}{m_2}\,y_2(t). \end{aligned} \tag{12}$$

To solve these equations, we write them in matrix form as y″(t) = Ay(t), and we use a trial solution of the form y(t) = e^{ωt}u, where u is a constant vector. Since y″(t) = ω²e^{ωt}u, we will have a solution if

$$\omega^2 e^{\omega t} u - e^{\omega t} A u = \theta \tag{13}$$

or if (A − ω²I)u = θ. Thus to solve y″(t) = Ay(t), we solve (A − λI)u = θ and then choose ω so that ω² = λ. (It can be shown that λ will be negative and real, so ω must be a complex number.)

Example 5  Consider the spring–mass system illustrated in Fig. 7.3 and described mathematically in system (12). Suppose that m1 = m2 = 1, k1 = 3, and k2 = 2. Find y1(t) and y2(t) if the initial conditions are

$$y_1(0) = 0, \quad y_2(0) = 10, \quad y_1'(0) = 0, \quad y_2'(0) = 0.$$


Solution  System (12) has the form y″(t) = Ay(t), where

$$y(t) = \begin{bmatrix} y_1(t) \\ y_2(t) \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} -5 & 2 \\ 2 & -2 \end{bmatrix}.$$

By Eq. (13), a function x of the form x(t) = e^{βt}u will satisfy y″(t) = Ay(t) if β² is an eigenvalue of A.

The eigenvalues of A are λ1 = −1 and λ2 = −6, with corresponding eigenvectors

$$u_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad\text{and}\quad u_2 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}.$$

Thus four solutions of y″(t) = Ay(t) are

$$x_1(t) = e^{it}u_1, \quad x_2(t) = e^{-it}u_1, \quad x_3(t) = e^{\sqrt{6}\,it}u_2, \quad x_4(t) = e^{-\sqrt{6}\,it}u_2.$$

The general solution is

$$y(t) = a_1x_1(t) + a_2x_2(t) + a_3x_3(t) + a_4x_4(t). \tag{14}$$

To satisfy the initial conditions, we need to choose the aᵢ above so that

$$y(0) = \begin{bmatrix} 0 \\ 10 \end{bmatrix} \quad\text{and}\quad y'(0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

An evaluation of Eq. (14) shows that

$$\begin{aligned} y(0) &= (a_1 + a_2)u_1 + (a_3 + a_4)u_2 \\ y'(0) &= i(a_1 - a_2)u_1 + i\sqrt{6}(a_3 - a_4)u_2. \end{aligned}$$

Since y′(0) = θ, we see that a1 = a2 and a3 = a4. With this information in the condition y(0) = [0, 10]ᵀ, it follows that a1 = 2 and a3 = 1.

Finally, by Eq. (14), we obtain

$$y(t) = 2[x_1(t) + x_2(t)] + [x_3(t) + x_4(t)] = \begin{bmatrix} 4\cos t - 4\cos(\sqrt{6}\,t) \\ 8\cos t + 2\cos(\sqrt{6}\,t) \end{bmatrix}.$$

7.2 EXERCISES

In Exercises 1–8, write the given system of differential equations in the form x′(t) = Ax(t). Express the general solution in the form (5) and determine the particular solution that satisfies the given initial condition. (Note: Exercises 5 and 6 involve complex eigenvalues.)

1. u′(t) = 5u(t) − 2v(t)
   v′(t) = 6u(t) − 2v(t),  x0 = [5, 8]ᵀ

2. u′(t) = 2u(t) − v(t)
   v′(t) = −u(t) + 2v(t),  x0 = [2, −1]ᵀ


3. u′(t) = u(t) + v(t)
   v′(t) = 2u(t) + 2v(t),  x0 = [5, 1]ᵀ

4. u′(t) = 5u(t) − 6v(t)
   v′(t) = 3u(t) − 4v(t),  x0 = [3, 2]ᵀ

5. u′(t) = .5u(t) + .5v(t)
   v′(t) = −.5u(t) + .5v(t),  x0 = [4, 4]ᵀ

6. u′(t) = 6u(t) + 8v(t)
   v′(t) = −u(t) + 2v(t),  x0 = [8, 0]ᵀ

7. u′(t) = 4u(t) + w(t)
   v′(t) = −2u(t) + v(t)
   w′(t) = −2u(t) + w(t),  x0 = [−1, 1, 0]ᵀ

8. u′(t) = 3u(t) + v(t) − 2w(t)
   v′(t) = −u(t) + 2v(t) + w(t)
   w′(t) = 4u(t) + v(t) − 3w(t),  x0 = [−2, 4, −8]ᵀ

9. Consider the system
   u′(t) = u(t) − v(t)
   v′(t) = u(t) + 3v(t).
   a) Write this system in the form x′(t) = Ax(t) and observe that there is only one solution of the form x1(t) = e^{λt}u. What is the solution?
   b) Having λ and u, find a vector y0 for which x2(t) = te^{λt}u + e^{λt}y0 is a solution. [Hint: Substitute x2(t) into x′(t) = Ax(t) to determine y0. The vector y0 is called a generalized eigenvector. See Section 7.8.]
   c) Show that we can always choose constants c1 and c2 such that y(t) = c1x1(t) + c2x2(t) satisfies y(0) = x0 for any x0 in R².

10. Repeat Exercise 9 for the system
   u′(t) = 2u(t) − v(t)
   v′(t) = 4u(t) + 6v(t)
   and find the solution that satisfies u(0) = 1, v(0) = 1.

7.3 TRANSFORMATION TO HESSENBERG FORM

In order to find the eigenvalues of an (n × n) matrix A, we would like to find a matrix H that has the same eigenvalues as A but in which the eigenvalues of H are relatively easy to determine. We already know from Section 4.7 that similar matrices have the same eigenvalues, so we shall look for a matrix H such that

$$H = S^{-1}AS$$

and such that H has some special sort of form that facilitates finding the characteristic polynomial for H.

We might hope that we could choose H to be a diagonal or triangular matrix, since this choice would make the eigenvalue problem for H trivial. Unfortunately, we cannot expect easily to reduce an arbitrary matrix A to a similar matrix H, where H is triangular or diagonal. To see why, recall that if p(t) is any polynomial, then we can construct a matrix B for which p(t) is the characteristic polynomial of B (see Exercise 27 of Section 4.4). If it were easy to reduce B to a similar matrix H that was triangular or diagonal, then we would have an easy means of finding the roots of p(t) = 0. But as we have commented, Abel showed that finding the roots of a polynomial equation cannot be an easy problem. Since we cannot expect to find an efficient procedure to transform an (n × n) matrix A into a similar matrix H that is triangular, we ask for the next best thing—a way to transform A into an almost triangular or Hessenberg matrix.


In this section, we establish the details of reduction to Hessenberg form, and in the next section, we state an algorithm that can be used to find the characteristic polynomial of a Hessenberg matrix. We also prove that this algorithm is mathematically sound, and in the process we develop more of the theoretical foundation for the eigenvalue problem.

To begin, we say that an (n × n) matrix H = (hᵢⱼ) is a Hessenberg matrix if hᵢⱼ = 0 whenever i > j + 1. Thus H is a Hessenberg matrix if all the entries below the subdiagonal of H are zero, where the subdiagonal of H means the entries h21, h32, h43, . . . , h_{n,n−1}. For example, a (6 × 6) Hessenberg matrix has the form

$$H = \begin{bmatrix} \times & \times & \times & \times & \times & \times \\ \times & \times & \times & \times & \times & \times \\ 0 & \times & \times & \times & \times & \times \\ 0 & 0 & \times & \times & \times & \times \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}.$$

Note that the definition of a Hessenberg matrix insists only that the entries below the subdiagonal are zero; it is irrelevant whether the other entries are zero. Thus, for example, diagonal and upper-triangular matrices are in Hessenberg form; and as an extreme example, the (n × n) zero matrix is a Hessenberg matrix. Every (2 × 2) matrix is (trivially) a Hessenberg matrix since there are no entries below the subdiagonal. We will see shortly that Hessenberg form plays the same role for the eigenvalue problem as echelon form does for the problem of solving Ax = b.
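The defining condition hᵢⱼ = 0 for i > j + 1 is easy to test mechanically. Here is a small sketch of such a test (ours, not the text's) in Python with NumPy; `is_hessenberg` is a hypothetical helper name.

```python
import numpy as np

def is_hessenberg(H):
    """True if every entry below the subdiagonal is zero,
    i.e., H[i, j] == 0 whenever i > j + 1 (0-based indices)."""
    n = H.shape[0]
    return all(H[i, j] == 0
               for i in range(n) for j in range(n) if i > j + 1)

print(is_hessenberg(np.triu(np.ones((4, 4)), k=-1)))  # True
print(is_hessenberg(np.ones((4, 4))))                 # False: entry (3, 1) is nonzero
```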

Example 1 The following matrices are in Hessenberg form:

$$H_1 = \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 3 & 1 \\ 0 & 4 & 2 \end{bmatrix}, \quad H_3 = \begin{bmatrix} 1 & 2 & 0 & 3 \\ 2 & 0 & 1 & 4 \\ 0 & 1 & 3 & 2 \\ 0 & 0 & 0 & 5 \end{bmatrix}.$$

Our approach to finding the eigenvalues of A has two parts:

1. Find a Hessenberg matrix H that is similar to A.
2. Calculate the characteristic polynomial for H.

As we show below, both of these steps are (relatively) easy. Transforming A to Hessenberg form is accomplished by simple row and column operations that resemble the operations used previously to reduce a matrix to echelon form. Next, the characteristic polynomial for a Hessenberg matrix can be found simply by solving a triangular system of equations. The main theoretical result of this section is Theorem 4, which asserts that every (n × n) matrix is similar to a Hessenberg matrix. The proof is constructive and shows how the similarity transformation is made.


In order to make the (n × n) case easier to understand, we begin by showing how a (4 × 4) matrix can be reduced to Hessenberg form. Let A be the (4 × 4) matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}, \tag{1}$$

and suppose for the moment that a21 ≠ 0. Define the matrix Q1 by

$$Q_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -\dfrac{a_{31}}{a_{21}} & 1 & 0 \\ 0 & -\dfrac{a_{41}}{a_{21}} & 0 & 1 \end{bmatrix}, \tag{2a}$$

and observe that Q1⁻¹ is given by

$$Q_1^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & \dfrac{a_{31}}{a_{21}} & 1 & 0 \\ 0 & \dfrac{a_{41}}{a_{21}} & 0 & 1 \end{bmatrix}. \tag{2b}$$

(That is, Q1⁻¹ is obtained from Q1 by changing the sign of the off-diagonal entries of Q1; equivalently, Q1 + Q1⁻¹ = 2I.)

It is easy to see that forming the product Q1A has the effect of adding −a31/a21 times row 2 of A to row 3 and adding −a41/a21 times row 2 of A to row 4. Thus Q1A has zeros in the (3, 1) and (4, 1) positions. The matrix Q1AQ1⁻¹ is similar to A, and we note that the zeros in the (3, 1) and (4, 1) positions are not disturbed when the product (Q1A)Q1⁻¹ is formed. (This fact is easy to see since Q1⁻¹ has the form Q1⁻¹ = [e1, q, e3, e4]; so the first, third, and fourth columns of Q1A are not disturbed when (Q1A)Q1⁻¹ is formed.) In summary, when Q1 and Q1⁻¹ are defined by (2), then A1 = Q1AQ1⁻¹ has the form

$$A_1 = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ 0 & b_{32} & b_{33} & b_{34} \\ 0 & b_{42} & b_{43} & b_{44} \end{bmatrix}. \tag{3}$$

Matrix A1 is similar to A and represents the first step in Hessenberg reduction. As a point of interest, we note that there is an easy way to see how to construct Q1. That is, if we wished to create zeros in the (3, 1) and (4, 1) entries of A by using elementary row operations, we could multiply row 2 by −a31/a21 and add the result to row 3, and next multiply row 2 by −a41/a21 and add the result to row 4. The matrix Q1 is formed from the (4 × 4) identity I by performing these same row operations on I. (It is not usually possible to use row 1 to create zeros in the (2, 1), (3, 1), and (4, 1) positions and still produce a similar matrix.)

The next step in Hessenberg reduction is analogous to the first. We can introduce a zero into the (4, 2) position of A1 if we multiply row 3 of A1 by −b42/b32 and add the result to row 4. Following the discussion above, we define Q2 to be the matrix

$$Q_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\dfrac{b_{42}}{b_{32}} & 1 \end{bmatrix}, \tag{4a}$$

and we note as before that Q2⁻¹ is obtained from Q2 by changing the sign of the off-diagonal entries:

$$Q_2^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \dfrac{b_{42}}{b_{32}} & 1 \end{bmatrix}. \tag{4b}$$

By a direct multiplication, it is easy to see that H = Q2A1Q2⁻¹ is a Hessenberg matrix. Since H is similar to A1 and A1 is similar to A, we see that H is similar to A. In fact,

$$H = Q_2A_1Q_2^{-1} = Q_2(Q_1AQ_1^{-1})Q_2^{-1} = (Q_2Q_1)A(Q_2Q_1)^{-1}.$$

Except for the possibility that a21 = 0 and/or b32 = 0, this discussion shows how to reduce an arbitrary (4 × 4) matrix to Hessenberg form. We will describe how to handle zero pivot elements after an example.

Example 2  Reduce the (4 × 4) matrix A to Hessenberg form, where

$$A = \begin{bmatrix} 1 & -2 & 4 & 1 \\ 2 & 0 & 5 & 2 \\ 2 & -2 & 9 & 3 \\ -6 & -1 & -16 & -6 \end{bmatrix}.$$

Solution  Following (2a) and (2b), we define Q1 and Q1⁻¹ to be

$$Q_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 3 & 0 & 1 \end{bmatrix} \quad\text{and}\quad Q_1^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -3 & 0 & 1 \end{bmatrix}.$$


Given this definition,

$$Q_1A = \begin{bmatrix} 1 & -2 & 4 & 1 \\ 2 & 0 & 5 & 2 \\ 0 & -2 & 4 & 1 \\ 0 & -1 & -1 & 0 \end{bmatrix},$$

and

$$A_1 = Q_1AQ_1^{-1} = \begin{bmatrix} 1 & -1 & 4 & 1 \\ 2 & -1 & 5 & 2 \\ 0 & -1 & 4 & 1 \\ 0 & -2 & -1 & 0 \end{bmatrix}.$$

The final step of Hessenberg reduction is to use (4a) and (4b) to define Q2 and Q2⁻¹:

$$Q_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{bmatrix}, \quad Q_2^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 2 & 1 \end{bmatrix}.$$

We obtain H = Q2A1Q2⁻¹,

$$H = \begin{bmatrix} 1 & -1 & 6 & 1 \\ 2 & -1 & 9 & 2 \\ 0 & -1 & 6 & 1 \\ 0 & 0 & -13 & -2 \end{bmatrix};$$

and H is a Hessenberg matrix that is similar to A.

To complete our discussion of how to reduce a (4 × 4) matrix to Hessenberg form, we must show how to proceed when a21 = 0 in (1) or when b32 = 0 in (3). This situation is easily handled by using one of the permutation matrices (see Exercises 15–22 at the end of the section):

$$P_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad P_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}, \quad P_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

Each of these matrices is its own inverse: P1P1 = I, P2P2 = I, P3P3 = I. Thus, P1AP1 is similar to A, as are P2AP2 and P3AP3. The action of these similarity transformations is easy to visualize; for example, forming P1A has the effect of interchanging rows 2 and 3 of A, whereas forming (P1A)P1 switches columns 2 and 3 of P1A. In detail, P1AP1 is given by

$$P_1AP_1 = \begin{bmatrix} a_{11} & a_{13} & a_{12} & a_{14} \\ a_{31} & a_{33} & a_{32} & a_{34} \\ a_{21} & a_{23} & a_{22} & a_{24} \\ a_{41} & a_{43} & a_{42} & a_{44} \end{bmatrix}.$$

If a21 = 0 in (1), but a31 ≠ 0, then P1AP1 is a matrix similar to A with a nonzero entry in the (2, 1) position. We can clearly carry out the first stage of Hessenberg reduction on P1AP1. If a21 = a31 = 0 in (1), but a41 ≠ 0, then P2AP2 has a nonzero entry in the (2, 1) position; and we can now carry out the first stage of Hessenberg reduction. Finally, if a21 = a31 = a41 = 0, the first stage is not necessary. In (3), if b32 = 0, but b42 ≠ 0, then forming P3A1P3 will produce a similar matrix with a nonzero entry in the (3, 2) position. Moreover, the first column of A1 will be left unchanged, and so the second step of Hessenberg reduction can be executed. (In general, interchanging two rows of a matrix A and then interchanging the same two columns produces a matrix similar to A. Also note that the permutation matrices P1, P2, and P3 are derived from the identity matrix I by performing the desired row-interchange operations on I.)
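The whole reduction just described, elementary similarity operations together with row and column interchanges for zero pivots, fits in a short routine. The following is a sketch (ours, not the text's; `to_hessenberg` is a hypothetical name), assuming NumPy and exact zero tests as in the discussion above.

```python
import numpy as np

def to_hessenberg(A):
    """Reduce A to Hessenberg form with the elementary similarity
    transformations of this section: Gaussian-style eliminations,
    plus a row/column interchange whenever a pivot is zero.
    Returns H, which is similar to A."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for k in range(n - 2):
        # Find a nonzero pivot in column k below row k; swap it into
        # the (k+1, k) position with a row and column interchange.
        p = next((i for i in range(k + 1, n) if H[i, k] != 0), None)
        if p is None:
            continue                          # column already reduced
        H[[k + 1, p], :] = H[[p, k + 1], :]   # row interchange (P A)
        H[:, [k + 1, p]] = H[:, [p, k + 1]]   # column interchange (... P)
        # Zero the entries below the subdiagonal: H <- Q H Q^{-1}.
        for i in range(k + 2, n):
            m = H[i, k] / H[k + 1, k]
            H[i, :] -= m * H[k + 1, :]        # row operation   (Q A)
            H[:, k + 1] += m * H[:, i]        # column operation (A Q^{-1})
    return H

# The matrix from Example 2; the result matches the H computed there.
A = np.array([[1., -2., 4., 1.], [2., 0., 5., 2.],
              [2., -2., 9., 3.], [-6., -1., -16., -6.]])
print(to_hessenberg(A))
```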

The discussion above proves that every (4 × 4) matrix is similar to a Hessenberg matrix H and also shows how to construct H. The situation with respect to (n × n) matrices is exactly analogous, and we can now state the main result of this section.

Theorem 4  Let A be an (n × n) matrix. Then there is a nonsingular (n × n) matrix Q such that QAQ⁻¹ = H, where H is a Hessenberg matrix.

A proof of Theorem 4 can be constructed along the lines of the discussion for the (4 × 4) case. Since no new ideas are involved, we omit the proof.

Example 3 Reduce A to Hessenberg form, where

$$A = \begin{bmatrix} 1 & 1 & 8 & -2 \\ 0 & 3 & 5 & -1 \\ 1 & -1 & -3 & 2 \\ 3 & -1 & -4 & 9 \end{bmatrix}.$$

Solution  In A, the entry a21 is zero, so we want to interchange rows 2 and 3. We construct the appropriate permutation matrix P by interchanging rows 2 and 3 of I, obtaining

$$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Clearly, PP = I, so that P⁻¹ = P.


With this, B is similar to A, where B = PAP:

$$B = PAP = \begin{bmatrix} 1 & 8 & 1 & -2 \\ 1 & -3 & -1 & 2 \\ 0 & 5 & 3 & -1 \\ 3 & -4 & -1 & 9 \end{bmatrix}.$$

Next, we define a matrix Q1 and form A1 = Q1BQ1⁻¹, where

$$Q_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -3 & 0 & 1 \end{bmatrix} \quad\text{and}\quad A_1 = Q_1BQ_1^{-1} = \begin{bmatrix} 1 & 2 & 1 & -2 \\ 1 & 3 & -1 & 2 \\ 0 & 2 & 3 & -1 \\ 0 & 14 & 2 & 3 \end{bmatrix}.$$

Finally, we form a matrix Q2 and calculate H = Q2A1Q2⁻¹:

$$Q_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -7 & 1 \end{bmatrix} \quad\text{and}\quad H = Q_2A_1Q_2^{-1} = \begin{bmatrix} 1 & 2 & -13 & -2 \\ 1 & 3 & 13 & 2 \\ 0 & 2 & -4 & -1 \\ 0 & 0 & 51 & 10 \end{bmatrix}.$$

Computational Considerations

A variety of similarity transformations, besides the elementary ones we have described, have been developed to reduce a matrix to Hessenberg form. Particularly effective are Householder transformations, a sequence of explicitly defined transformations involving orthogonal matrices (Section 7.5).

Although a reduction process like transformation to Hessenberg form may seem quite tedious, we show in the next section that it is easy to calculate the characteristic polynomial of a Hessenberg matrix. Also, however tedious Hessenberg reduction may seem, the alternative of calculating the characteristic polynomial from p(t) = det(A − tI) is worse. To illustrate this point, we note that in order to gauge the efficiency of an algorithm (particularly an algorithm that will be implemented on a computer), operations counts are frequently used as a first approximation. By an operations count, we mean a count of the number of multiplications and additions that must be performed in order to execute the algorithm. Given an (n × n) matrix A, it is not hard to see that a total of approximately n³ multiplications and n³ additions are needed to reduce A to Hessenberg form and then to find the characteristic polynomial. By contrast, if A is (n × n), calculating p(t) from

$$p(t) = \det(A - tI)$$

requires on the order of n! multiplications and n! additions. In the language of computer science, reduction to Hessenberg form is a polynomial-time algorithm, whereas computing det(A − tI) is an exponential-time algorithm. In a polynomial-time algorithm, execution time grows at a rate proportional to nᵏ as n grows (where k is a constant); in an exponential-time algorithm, execution time grows at least as fast as bⁿ (where b is a constant larger than 1). The distinction is more than academic because exponential-time algorithms can be used on only the smallest problems, and the basic question is whether or not we can produce acceptable answers to practical problems in a reasonable amount of time. In fact, in some areas of application, the only known algorithms are exponential-time algorithms, and hence realistic problems cannot be solved except by an inspired guess. (An example of such a problem is the "traveling salesman's" problem, which arises in operations research.)

Table 7.1

 n      n³        n!
 3      27        6
 4      64        24
 5      125       120
 6      216       720
 7      343       5,040
 8      512       40,320
 9      729       362,880
10      1,000     3,628,800
11      1,331     39,916,800
12      1,728     479,001,600

Table 7.1 should illustrate the difference between polynomial time and exponential time for the problem of calculating the characteristic polynomial. We can draw some rough conclusions from this table. For instance, if an algorithm requiring n³ operations is used on a (12 × 12) matrix, and if the algorithm executes in 1 second, then we would expect any algorithm requiring n! operations to take on the order of 77 hours to execute when applied to the same (12 × 12) matrix. For larger values of n, the comparison between polynomial-time and exponential-time algorithms borders on the absurd. For example, if an algorithm requiring n³ operations executes in 1 second for n = 20, we would suspect that an algorithm requiring 20! operations would take something like 8 × 10¹⁰ hours, or approximately 9,000,000 years.
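The rough arithmetic behind these estimates is easy to reproduce (our illustration, not the text's):

```python
import math

# If an n^3 algorithm runs in 1 second for n = 20, scale up to
# estimate the running time of an n! algorithm on the same matrix.
n = 20
seconds = math.factorial(n) / n**3        # ~3.0e14 seconds
hours = seconds / 3600                    # ~8.4e10 hours
years = hours / (24 * 365)                # ~9.6e6 years
print(f"{hours:.1e} hours, about {years:,.0f} years")
```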

7.3 EXERCISES

In Exercises 1–10, reduce the given matrix to Hessenberg form by using similarity transformations. Display the matrices used in the similarity transformations.

$$1.\; \begin{bmatrix} -7 & 4 & 3 \\ 8 & -3 & 3 \\ 32 & -15 & 13 \end{bmatrix} \qquad 2.\; \begin{bmatrix} -6 & 3 & -14 \\ -1 & 2 & -2 \\ 2 & 0 & 5 \end{bmatrix}$$

$$3.\; \begin{bmatrix} 1 & 3 & 1 \\ 0 & 2 & 4 \\ 1 & 1 & 3 \end{bmatrix} \qquad 4.\; \begin{bmatrix} 1 & 2 & -1 \\ 3 & 2 & 1 \\ -6 & 1 & 3 \end{bmatrix}$$

$$5.\; \begin{bmatrix} 3 & -1 & -1 \\ 4 & -1 & -2 \\ -12 & 5 & 0 \end{bmatrix} \qquad 6.\; \begin{bmatrix} 4 & 0 & 3 \\ 0 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix}$$


$$7.\; \begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix} \qquad 8.\; \begin{bmatrix} 6 & 1 & 4 & 4 \\ 1 & 6 & 4 & 4 \\ 4 & 4 & 6 & 1 \\ 4 & 4 & 1 & 6 \end{bmatrix}$$

$$9.\; \begin{bmatrix} 1 & 2 & 1 & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 3 & 1 & 1 \\ 1 & 2 & 0 & 2 \end{bmatrix} \qquad 10.\; \begin{bmatrix} 2 & -2 & 0 & -1 \\ -1 & -1 & -2 & 1 \\ 2 & 2 & 1 & 4 \\ 1 & 1 & -3 & 9 \end{bmatrix}$$

11. Consider the general (4 × 4) Hessenberg matrix H, where a2, b3, and c4 are nonzero:

$$H = \begin{bmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ 0 & b_3 & c_3 & d_3 \\ 0 & 0 & c_4 & d_4 \end{bmatrix}. \tag{5}$$

Suppose that λ is any eigenvalue of H; for simplicity, we assume λ is real. Show that the geometric multiplicity of λ must be equal to 1. [Hint: Consider the columns of H − λI.]

12. Let H be a (4 × 4) Hessenberg matrix as in (5), but where a2, b3, and c4 are not necessarily nonzero. Suppose that H is similar to a symmetric matrix A. Let λ be an eigenvalue of A, where λ has an algebraic multiplicity greater than 1. Use Exercise 11 to conclude that at least one of a2, b3, or c4 must be zero.

13. Let A be the matrix in Exercise 7 and let H be the Hessenberg matrix found in Exercise 7. Determine the characteristic equation for H and solve the equation to find the eigenvalues of A. (Exercise 12 explains why some subdiagonal entry of H must be zero.)

14. Repeat Exercise 13 for the matrix in Exercise 8.

Exercises 15–22 deal with permutation matrices. Recall (Section 4.7) that an (n × n) matrix P is a permutation matrix if P is formed by rearranging the columns of the (n × n) identity matrix. For example, some (3 × 3) permutation matrices are P = [e3, e2, e1], P = [e2, e3, e1], and P = [e1, e3, e2]. By convention the identity matrix, P = [e1, e2, e3], is also considered a permutation matrix.

15. List, in column form, all the possible (3 × 3) permutation matrices (there are six).

16. List, in column form, all the possible (4 × 4) permutation matrices (there are 24).

17. How many different (n × n) permutation matrices are there? [Hint: How many positions can e1 occupy? Once e1 is fixed, how many rearrangements of the remaining n − 1 columns are there?]

18. Let P be an (n × n) permutation matrix. Verify that P is an orthogonal matrix. [Hint: Recall (5) in Section 4.7.]

19. Let A be an (n × n) matrix and P an (n × n) permutation matrix, P = [ei, ej, ek, . . . , er]. Show that AP = [Ai, Aj, Ak, . . . , Ar]; that is, forming AP rearranges the columns of A through the same rearrangement that produced P from I.

20. As in Exercise 19, show that PᵀA rearranges the rows of A in the same pattern as the columns of P. [Hint: Consider AᵀP.]

21. Let P and Q be two (n × n) permutation matrices. Show that PQ is also a permutation matrix.

22. Let P be an (n × n) permutation matrix. Show that there is a positive integer k such that Pᵏ = I. [Hint: Consider the sequence P, P², P³, . . . . Can there be infinitely many different matrices in this sequence?]

7.4 EIGENVALUES OF HESSENBERG MATRICES

In Section 7.3, we saw that an (n × n) matrix A is similar to a Hessenberg matrix H. In Section 7.7, we will prove a rather important result, the Cayley–Hamilton theorem (see the corollary to Theorem 15 in Section 7.7). In this section, we see that the Cayley–Hamilton theorem can be used as a tool for calculating the characteristic polynomial of Hessenberg matrices.

As we have noted, Hessenberg form plays somewhat the same role for the eigenvalue problem as echelon form plays with respect to solving Ax = b. For instance, a square matrix A, whether invertible or not, can always be reduced to echelon form by using a sequence of simple row operations. Likewise, a square matrix A, whether diagonalizable or not, can always be reduced to Hessenberg form by using a sequence of simple similarity transformations.

For the problem Ax = b, echelon form is easily achieved and reveals much about the possible solutions of Ax = b, even when A is not invertible. Similarly (see Section 7.8), Hessenberg form provides a convenient framework for discussing generalized eigenvectors. We will need this concept in the event that A is not diagonalizable.

Finally, once the system Ax = b is made row equivalent to Ux = c, where U is upper triangular (in echelon form), then it is fairly easy to complete the solution process. Likewise, if A is similar to H, where H is in Hessenberg form, then it is relatively easy to find the characteristic polynomial of H (recall that the similar matrices A and H have the same characteristic polynomial).

The Characteristic Polynomial of a Hessenberg Matrix

For hand calculations, an efficient method for determining the characteristic polynomial of a matrix A is as follows:

1. Reduce A to a Hessenberg matrix H, as in Section 7.3.
2. Find the characteristic polynomial for H according to the algorithm described in this subsection.

The algorithm referred to in step (2) is known as Krylov's method. We outline the steps for Krylov's method in Eqs. (3)–(5). For a general (n × n) matrix A, we note that Krylov's method can fail. For an (n × n) Hessenberg matrix H, however, the procedure is always effective.

Let H be an (n × n) Hessenberg matrix. In Section 4.4, we defined the characteristic polynomial for H by p(t) = det(H − tI). In this section, it will be more convenient to define p(t) by

$$p(t) = \det(tI - H). \tag{1}$$

Note that the properties of determinants show that

$$\det(tI - H) = \det[(-1)(H - tI)] = (-1)^n \det(H - tI).$$

Thus the zeros of p(t) in Eq. (1) are the eigenvalues of H. We will call p(t) the characteristic polynomial for H, even though it differs by a factor of (−1)ⁿ from our previous definition.

The algorithm described in Eqs. (3)–(5) is valid for Hessenberg matrices with nonzero subdiagonal elements. In this regard, a Hessenberg matrix H = (hᵢⱼ) is said to be unreduced if

$$h_{k,k-1} \ne 0, \quad k = 2, 3, \ldots, n. \tag{2}$$

If H has at least one subdiagonal entry that is zero, then H will be called reduced. For example, the Hessenberg matrix H1 is unreduced, whereas H2 is reduced:

$$H_1 = \begin{bmatrix} 2 & 1 & 3 & 4 \\ 1 & 3 & 2 & 1 \\ 0 & 4 & 2 & 5 \\ 0 & 0 & 1 & 3 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 4 & 2 & 5 & 1 \\ 3 & 6 & 4 & 2 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 4 & 7 \end{bmatrix}.$$

That is, H2 is reduced since H2 has a zero on the subdiagonal, in the (3, 2) position.


With these preliminaries, we can state the following algorithm for determining p(t) = det(tI − H).

Algorithm 1  Let H be an unreduced Hessenberg matrix, and let w0 denote the (n × 1) unit vector e1.

(a) Compute the vectors w1, w2, . . . , wn by

$$w_k = Hw_{k-1}, \quad k = 1, 2, \ldots, n. \tag{3}$$

(b) Solve the linear system

$$a_0w_0 + a_1w_1 + \cdots + a_{n-1}w_{n-1} = -w_n \tag{4}$$

for a0, a1, . . . , a_{n−1}.

(c) Use the values from (b) as coefficients in p(t):

$$p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0. \tag{5}$$

It will be shown in Section 7.7 that p(t) in Eq. (5) is the same as p(t) in Eq. (1). The theoretical basis for Algorithm 1 is discussed in Exercise 23.
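For readers who want to experiment, Algorithm 1 is only a few lines of NumPy. The sketch below is ours (`krylov_charpoly` is a hypothetical name); it builds the vectors of (3), solves the system (4), and returns the coefficients of p(t) in (5), assuming H is unreduced.

```python
import numpy as np

def krylov_charpoly(H):
    """Coefficients [a0, a1, ..., a_{n-1}] of
    p(t) = t^n + a_{n-1} t^{n-1} + ... + a1 t + a0,
    computed by Algorithm 1 for an unreduced Hessenberg matrix H."""
    n = H.shape[0]
    W = np.zeros((n, n))
    w = np.zeros(n)
    w[0] = 1.0                        # w0 = e1
    for k in range(n):
        W[:, k] = w                   # columns w0, w1, ..., w_{n-1}
        w = H @ w                     # w_{k+1} = H w_k
    return np.linalg.solve(W, -w)     # system (4); W is upper triangular

H = np.array([[5., -2.], [6., -2.]])
print(krylov_charpoly(H))             # [ 2. -3.]  ->  p(t) = t^2 - 3t + 2
```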

Example 1 Use Algorithm 1 to find the eigenvalues of

$$H = \begin{bmatrix} 5 & -2 \\ 6 & -2 \end{bmatrix}.$$

Solution  Note that h21 = 6 ≠ 0, so H is unreduced. With w0 = e1, Eq. (3) yields

$$w_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad w_1 = Hw_0 = \begin{bmatrix} 5 \\ 6 \end{bmatrix}, \quad\text{and}\quad w_2 = Hw_1 = \begin{bmatrix} 13 \\ 18 \end{bmatrix}.$$

From Eq. (4), we have

$$a_0\begin{bmatrix} 1 \\ 0 \end{bmatrix} + a_1\begin{bmatrix} 5 \\ 6 \end{bmatrix} = -\begin{bmatrix} 13 \\ 18 \end{bmatrix},$$

or

$$\begin{aligned} a_0 + 5a_1 &= -13 \\ 6a_1 &= -18. \end{aligned}$$

The solution is a1 = −3 and a0 = 2. Thus, by Eq. (5),

$$p(t) = t^2 - 3t + 2 = (t-2)(t-1).$$

Hence λ1 = 2 and λ2 = 1 are the eigenvalues of H. For the simple example above, the reader can easily check that

$$\det(tI - H) = \begin{vmatrix} t-5 & 2 \\ -6 & t+2 \end{vmatrix} = t^2 - 3t + 2.$$


Example 2 Use Algorithm 1 to find the eigenvalues of

$$H = \begin{bmatrix} 2 & 2 & -1 \\ -1 & -1 & 1 \\ 0 & 2 & 1 \end{bmatrix}.$$

Solution  Note that h21 and h32 are nonzero, so H is unreduced. With w0 = e1 = [1, 0, 0]ᵀ, Eq. (3) yields

$$w_1 = Hw_0 = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}, \quad w_2 = Hw_1 = \begin{bmatrix} 2 \\ -1 \\ -2 \end{bmatrix}, \quad\text{and}\quad w_3 = Hw_2 = \begin{bmatrix} 4 \\ -3 \\ -4 \end{bmatrix}.$$

The system a0w0 + a1w1 + a2w2 = −w3 is

$$\begin{aligned} a_0 + 2a_1 + 2a_2 &= -4 \\ -a_1 - a_2 &= 3 \\ -2a_2 &= 4. \end{aligned}$$

The solution is a2 = −2, a1 = −1, and a0 = 2. So from Eq. (5),

$$p(t) = t^3 - 2t^2 - t + 2 = (t+1)(t-1)(t-2).$$

Thus the eigenvalues of H are λ1 = 2, λ2 = 1, and λ3 = −1.

Example 3 Use Algorithm 1 to find the eigenvalues of

$$H = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 0 & 1 & 1 \\ 0 & -1 & -2 & -2 \\ 0 & 0 & 2 & 2 \end{bmatrix}.$$

Solution  Since h21, h32, and h43 are nonzero, H is unreduced. With w0 = e1 = [1, 0, 0, 0]ᵀ, Eq. (3) yields

$$w_1 = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}, \quad w_2 = \begin{bmatrix} 3 \\ 2 \\ -2 \\ 0 \end{bmatrix}, \quad w_3 = \begin{bmatrix} 3 \\ 4 \\ 2 \\ -4 \end{bmatrix}, \quad\text{and}\quad w_4 = \begin{bmatrix} 5 \\ 4 \\ 0 \\ -4 \end{bmatrix}.$$

The system a0w0 + a1w1 + a2w2 + a3w3 = −w4 is

$$\begin{aligned} a_0 + a_1 + 3a_2 + 3a_3 &= -5 \\ 2a_1 + 2a_2 + 4a_3 &= -4 \\ -2a_2 + 2a_3 &= 0 \\ -4a_3 &= 4. \end{aligned}$$

Hence a3 = −1, a2 = −1, a1 = 1, and a0 = 0; so

$$p(t) = t^4 - t^3 - t^2 + t = t(t-1)^2(t+1).$$


Thus the eigenvalues are λ1 = λ2 = 1, λ3 = 0, and λ4 = −1. This example illustrates that Algorithm 1 is effective even when H is singular (λ3 = 0).

As Examples 1–3 indicate, the system (4) that is solved to obtain the coefficients of p(t) is both triangular and nonsingular. This fact is proved in Theorem 5. Knowing that system (4) is nonsingular tells us that Algorithm 1 cannot fail.

Of course, the characteristic polynomial p(t) can also be obtained by expanding the determinant det(tI − H). Algorithm 1, however, is more efficient (requires fewer arithmetic operations) than a determinant expansion. Besides increased efficiency, we introduce this version of Krylov's method because the technique provides insight into matrix polynomials, generalized eigenvectors, and other important aspects of the eigenvalue problem.

Theorem 5  Let H be an unreduced (n × n) Hessenberg matrix, and let w0 denote the (n × 1) unit vector e1. Then the vectors w0, w1, . . . , w_{n−1}, defined by

$$w_i = Hw_{i-1}, \quad i = 1, 2, \ldots, n-1,$$

form a basis for Rⁿ.

Proof  Since any set of n linearly independent vectors in Rⁿ is a basis for Rⁿ, we can prove this theorem by showing that {w0, w1, . . . , w_{n−1}} is a linearly independent set of vectors. To prove this, we observe first that w0 and w1 are given by

$$w_0 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad\text{and}\quad w_1 = \begin{bmatrix} h_{11} \\ h_{21} \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Forming w2 = Hw1, we find that

$$w_2 = \begin{bmatrix} h_{11}h_{11} + h_{12}h_{21} \\ h_{21}h_{11} + h_{22}h_{21} \\ h_{32}h_{21} \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Since H was given as an unreduced Hessenberg matrix, the second component of w1 and the third component of w2 are nonzero.

In general (see Exercise 23), it can be shown that the ith component of w_{i−1} is the product h_{i,i−1}h_{i−1,i−2} · · · h32h21, and the kth component of w_{i−1} is zero for k = i + 1, i + 2, . . . , n. Thus the (n × n) matrix

$$W = [w_0, w_1, \ldots, w_{n-1}]$$


is upper triangular, and the diagonal entries of W are all nonzero. In light of this, we conclude that {w0, w1, . . . , w_{n−1}} is a set of n linearly independent vectors in Rⁿ and hence is a basis for Rⁿ.

Reduced Hessenberg Matrices

We now consider a reduced Hessenberg matrix H and illustrate that Algorithm 1 cannot be used on H.

Example 4 Demonstrate why Algorithm 1 fails on the reduced Hessenberg matrix

$$H = \begin{bmatrix} 1 & 2 & 1 & 3 \\ 2 & 1 & 1 & 1 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}.$$

Solution  Since h32 = 0, H is reduced. From Eq. (3), with w0 = e1,

$$w_1 = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix}, \quad w_2 = \begin{bmatrix} 5 \\ 4 \\ 0 \\ 0 \end{bmatrix}, \quad w_3 = \begin{bmatrix} 13 \\ 14 \\ 0 \\ 0 \end{bmatrix}, \quad\text{and}\quad w_4 = \begin{bmatrix} 41 \\ 40 \\ 0 \\ 0 \end{bmatrix}.$$

The vectors above are linearly dependent, and the solutions of a0w0 + a1w1 + a2w2 + a3w3 = −w4 are

$$\begin{aligned} a_0 &= -3a_2 - 6a_3 - 21 \\ a_1 &= -2a_2 - 7a_3 - 20, \end{aligned} \tag{6}$$

where a2 and a3 are arbitrary. The coefficients of the characteristic polynomial (a0 = −3, a1 = 7, a2 = 4, a3 = −5) are one of the solutions in (6), but in general it is impossible to discern this solution by inspection.

We now prove a result that shows how the eigenvalue problem for H uncouples into smaller problems when H is a reduced Hessenberg matrix. This theorem is based on the observation that a Hessenberg matrix that has a zero subdiagonal entry can be partitioned in a natural and useful way. To illustrate, we consider a (5 × 5) reduced Hessenberg matrix:

$$H = \begin{bmatrix} 2 & 1 & 3 & 5 & 7 \\ 6 & 2 & 1 & 3 & 8 \\ 0 & 1 & 2 & 1 & 3 \\ 0 & 0 & 0 & 4 & 1 \\ 0 & 0 & 0 & 1 & 6 \end{bmatrix}.$$


We can partition H into four submatrices, H11, H12, H22, and O, as indicated below:

$$H = \left[\begin{array}{ccc|cc} 2 & 1 & 3 & 5 & 7 \\ 6 & 2 & 1 & 3 & 8 \\ 0 & 1 & 2 & 1 & 3 \\ \hline 0 & 0 & 0 & 4 & 1 \\ 0 & 0 & 0 & 1 & 6 \end{array}\right] = \begin{bmatrix} H_{11} & H_{12} \\ O & H_{22} \end{bmatrix}. \tag{7}$$

For a matrix H partitioned as in Eq. (7), we can show that det(H) = det(H11) det(H22) and that det(tI − H) = det(tI − H11) det(tI − H22). This fact leads to Theorem 6, stated below. We provide a different proof of Theorem 6 in order to demonstrate how to find eigenvectors for a block matrix.

A matrix written in partitioned form, such as

$$H = \begin{bmatrix} H_{11} & H_{12} \\ O & H_{22} \end{bmatrix},$$

is usually called a block matrix—the entries in H are blocks, or submatrices, of H. In fact, H is called block upper triangular since the only block below the diagonal blocks is a zero block.

When some care is exercised to see that all the products are defined, the blocks in a block matrix can be treated as though they were scalars when forming the product of two block matrices. For example, suppose Q is a (5 × 5) matrix partitioned in the same fashion as H in Eq. (7):

$$Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix},$$

so that Q11 is (3 × 3), Q12 is (3 × 2), Q21 is (2 × 3), and Q22 is (2 × 2). Then it is not hard to show that the product HQ is also given in block form as

$$HQ = \begin{bmatrix} H_{11}Q_{11} + H_{12}Q_{21} & H_{11}Q_{12} + H_{12}Q_{22} \\ H_{22}Q_{21} & H_{22}Q_{22} \end{bmatrix}.$$

(Note that all the products make sense in the block representation of HQ.) With these preliminaries, we now state an important theorem.

Theorem 6  Let B be an (n × n) matrix of the form

$$B = \begin{bmatrix} B_{11} & B_{12} \\ O & B_{22} \end{bmatrix},$$

where B11 is (k × k), B12 is [k × (n − k)], O is the [(n − k) × k] zero matrix, and B22 is [(n − k) × (n − k)]. Then λ is an eigenvalue of B if and only if λ is an eigenvalue of B11 or B22.


Proof  Let x be any (n × 1) vector and write x in partitioned form as

$$x = \begin{bmatrix} u \\ v \end{bmatrix}, \tag{8}$$

where u is (k × 1) and v is [(n − k) × 1]. It is easy to see that the equation Bx = λx is equivalent to

$$\begin{aligned} B_{11}u + B_{12}v &= \lambda u \\ B_{22}v &= \lambda v. \end{aligned} \tag{9}$$

Suppose first that λ is an eigenvalue of B. Then there is a vector x, x ≠ θ, such that Bx = λx. If v ≠ θ in Eq. (8), then we see from (9) that λ is an eigenvalue of B22. On the other hand, if v = θ in (8), then we must have u ≠ θ; and (9) guarantees that λ is an eigenvalue of B11.

Conversely, if λ is an eigenvalue of B11, then there is a nonzero vector u1 such that B11u1 = λu1. In (8) we set u = u1 and v = θ to produce a solution of (9), and this result shows that any eigenvalue of B11 is also an eigenvalue of B. Finally, suppose that λ is not an eigenvalue of B11 but is an eigenvalue of B22. Then there is a nonzero vector v1 such that B22v1 = λv1; and so v1 satisfies the last equation in (9). To satisfy the first equation in (9), we must solve

$$(B_{11} - \lambda I)u = -B_{12}v_1.$$

But since λ is not an eigenvalue of B11, we know that B11 − λI is nonsingular, and so we can solve (9). Thus any eigenvalue of B22 is also an eigenvalue of B.

As another example, consider the (7 × 7) Hessenberg matrix

$$H = \begin{bmatrix} 2 & 3 & 1 & 6 & -1 & 3 & 8 \\ 5 & 7 & 2 & 8 & 2 & 2 & 1 \\ 0 & 0 & 4 & 1 & 3 & -5 & 2 \\ 0 & 0 & 6 & 1 & 2 & 4 & 3 \\ 0 & 0 & 0 & 4 & 1 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 & 6 & 5 \\ 0 & 0 & 0 & 0 & 0 & 7 & 3 \end{bmatrix}. \tag{10}$$

We first partition H as

$$H = \begin{bmatrix} H_{11} & H_{12} \\ O & H_{22} \end{bmatrix},$$

where H11 is the upper (2 × 2) block:

$$H_{11} = \begin{bmatrix} 2 & 3 \\ 5 & 7 \end{bmatrix}, \quad H_{22} = \begin{bmatrix} 4 & 1 & 3 & -5 & 2 \\ 6 & 1 & 2 & 4 & 3 \\ 0 & 4 & 1 & 2 & 1 \\ 0 & 0 & 0 & 6 & 5 \\ 0 & 0 & 0 & 7 & 3 \end{bmatrix}.$$


Now the eigenvalues of H are precisely the eigenvalues of H11 and H22. The block H11 is unreduced, so we can apply the algorithm to find the characteristic polynomial for H11. H22 is reduced, however, so we partition H22 as

$$H_{22} = \begin{bmatrix} C_{11} & C_{12} \\ O & C_{22} \end{bmatrix},$$

where C11 and C22 are

$$C_{11} = \begin{bmatrix} 4 & 1 & 3 \\ 6 & 1 & 2 \\ 0 & 4 & 1 \end{bmatrix} \quad\text{and}\quad C_{22} = \begin{bmatrix} 6 & 5 \\ 7 & 3 \end{bmatrix}.$$

The eigenvalues of H22 are precisely the eigenvalues of C11 and C22, and we can apply the algorithm to find the characteristic polynomials for C11 and C22. In summary, the eigenvalue problem for H has uncoupled into three eigenvalue problems for the unreduced Hessenberg matrices H11, C11, and C22.
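This uncoupling is easy to automate: split H at each zero subdiagonal entry and treat the diagonal blocks separately. The following is a sketch of that idea (ours, not the text's); it uses NumPy's general `eigvals` routine on each unreduced block as a stand-in for Algorithm 1.

```python
import numpy as np

def eigs_by_uncoupling(H):
    """Collect the eigenvalues of a Hessenberg matrix H by splitting it
    at every zero subdiagonal entry, as in the examples above."""
    n = H.shape[0]
    cuts = [0] + [k for k in range(1, n) if H[k, k - 1] == 0] + [n]
    eigs = []
    for a, b in zip(cuts[:-1], cuts[1:]):
        eigs.extend(np.linalg.eigvals(H[a:b, a:b]))   # unreduced block
    return np.sort_complex(np.array(eigs))

# The (5 x 5) reduced Hessenberg matrix partitioned in (7).
H = np.array([[2., 1., 3., 5., 7.], [6., 2., 1., 3., 8.],
              [0., 1., 2., 1., 3.], [0., 0., 0., 4., 1.],
              [0., 0., 0., 1., 6.]])
print(np.allclose(eigs_by_uncoupling(H),
                  np.sort_complex(np.linalg.eigvals(H))))   # True
```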

7.4 EXERCISES

In Exercises 1–8, use Algorithm 1 to find the characteristic polynomial for the given matrix.

$$1.\; \begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix} \qquad 2.\; \begin{bmatrix} 0 & 0 \\ 3 & 0 \end{bmatrix}$$

$$3.\; \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix} \qquad 4.\; \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & -1 \\ 0 & 1 & 2 \end{bmatrix}$$

$$5.\; \begin{bmatrix} 2 & 4 & 1 \\ 1 & 1 & 3 \\ 0 & 1 & 5 \end{bmatrix} \qquad 6.\; \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$

$$7.\; \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 2 & 1 \end{bmatrix} \qquad 8.\; \begin{bmatrix} 0 & 2 & 1 & 2 \\ 1 & 0 & 1 & -1 \\ 0 & 2 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{bmatrix}$$

In Exercises 9–12, partition the given matrix H into blocks, as in the proof of Theorem 6. Find the eigenvalues of the diagonal blocks, and for each distinct eigenvalue, find an eigenvector, as in Eq. (9).

$$9.\; H = \begin{bmatrix} 1 & -1 & 1 & 4 \\ 1 & 3 & -2 & 1 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \qquad 10.\; H = \begin{bmatrix} 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 3 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 1 & 4 \end{bmatrix}$$

$$11.\; H = \begin{bmatrix} -2 & 0 & -2 & 1 \\ -1 & 1 & -2 & 3 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 2 \end{bmatrix} \qquad 12.\; H = \begin{bmatrix} 2 & 3 & 1 & 4 \\ 3 & 2 & 0 & 1 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix}$$


13. Consider the block matrix B given by

$$B = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ 0 & 0 & w & x \\ 0 & 0 & y & z \end{bmatrix} = \begin{bmatrix} B_{11} & B_{12} \\ O & B_{22} \end{bmatrix}.$$

Verify, by expanding det(B), that det(B) = det(B11) det(B22).

14. Use the result of Exercise 13 to calculate det(H), where H is the matrix in Exercise 9.

15. There is one (3 × 3) permutation matrix P such that P is an unreduced Hessenberg matrix. List this permutation matrix in column form.

16. As in Exercise 15, there is a unique (4 × 4) permutation matrix P that is both unreduced and Hessenberg. List P in column form.

17. Give the column form for the unique (n × n) permutation matrix P that is unreduced and Hessenberg.

18. Apply Algorithm 1 to determine the characteristic polynomial for the (n × n) matrix P in Exercise 17. [Hint: Consider n = 3 and n = 4 to see the nature of system (4).]

19. Let H be an unreduced (n × n) Hessenberg matrix and let λ be an eigenvalue of H. Show that the geometric multiplicity of λ is equal to 1. [Hint: See Exercise 11 of Section 7.3.]

20. Let H be an unreduced (n × n) Hessenberg matrix and let λ be an eigenvalue of H. Use Exercise 19 to show that if H is symmetric, then H has n distinct eigenvalues.

21. Consider the (2 × 2) matrix H, where

$$H = \begin{bmatrix} a & b \\ b & c \end{bmatrix}.$$

Calculate the characteristic polynomial for H and use the quadratic formula to show that H has two distinct eigenvalues if H is an unreduced matrix.

22. Let H be an unreduced (n × n) Hessenberg matrix and suppose Hu = λu, u ≠ θ. Show that the nth component of u is nonzero.

23. Complete the proof of Theorem 5 by using induction to show that the ith component of w_{i−1} is nonzero.

7.5 HOUSEHOLDER TRANSFORMATIONS

In this section, we consider another method for reducing a matrix A to Hessenberg form, using Householder transformations. A Householder transformation (or Householder matrix) is a symmetric orthogonal matrix that has an especially simple form. As we will see, one reason for wanting to use Householder matrices in a similarity transformation is that symmetry is preserved.

Definition 1  Let u be a nonzero vector in Rⁿ and let I be the (n × n) identity matrix. The (n × n) matrix Q given by

$$Q = I - \frac{2}{u^Tu}\,uu^T \tag{1}$$

is called a Householder transformation or a Householder matrix.

Householder matrices are a basic tool for applied linear algebra and are widely used even in applications not directly involving eigenvalues. For instance, we will see in the next section that Householder matrices can be used to good effect in least-squares problems.

The following theorem shows that a Householder matrix is both symmetric and orthogonal.

Theorem 7  Let Q be a Householder matrix as in Eq. (1). Then:

(a) Qᵀ = Q.
(b) QᵀQ = I.

Proof  We leave the proof of property (a) to the exercises. To prove property (b), it is sufficient to show that QQ = I, since Qᵀ = Q.

To simplify the notation, let b denote the scalar 2/(uᵀu) in Eq. (1). Thus

$$Q = I - buu^T, \quad b = 2/(u^Tu).$$

Forming QQ, we have

$$\begin{aligned} QQ &= (I - buu^T)(I - buu^T) \\ &= I - 2buu^T + b^2(uu^T)(uu^T) \\ &= I - 2buu^T + b^2u(u^Tu)u^T \\ &= I - 2buu^T + b^2(u^Tu)(uu^T). \end{aligned} \tag{2}$$

(Note: We used the associativity of matrix multiplication to write (uuᵀ)(uuᵀ) = u(uᵀu)uᵀ.)

Next, observe that uᵀu is a scalar and that

$$b^2(u^Tu) = \frac{4}{(u^Tu)^2}(u^Tu) = \frac{4}{u^Tu} = 2b.$$

Thus from Eq. (2) it follows that QQ = I.

Operations with Householder Matrices

In practice it is neither necessary nor desirable to calculate explicitly the entries of a Householder matrix Q. In particular, if we need to form matrix products such as QA and AQ, or if we need to form a matrix–vector product Qx, then the result can be found merely by exploiting the form of Q.

For instance, consider the problem of calculating Qx, where Q is an (n × n) Householder matrix and x is in Rⁿ. As in the proof of Theorem 7, we write Q as

$$Q = I - buu^T, \quad b = 2/(u^Tu).$$

Now Qx is given by

$$Qx = (I - buu^T)x = x - b(uu^T)x. \tag{3}$$

In this expression, note that b(uuᵀ)x = bu(uᵀx) and that uᵀx is a scalar. Thus, from Eq. (3), Qx has the form x − γu, where γ is the scalar b(uᵀx):

$$Qx = x - \gamma u, \quad \gamma = 2u^Tx/(u^Tu). \tag{4}$$


Hence to form Qx we need only calculate the scalar γ and then perform the vector subtraction x − γu, as indicated by Eq. (4).

Similarly, if A is an (n × p) matrix, then we can form the product QA without actually having to calculate Q. Specifically, if A = [A1, A2, . . . , Ap] is an (n × p) matrix, then QA = [QA1, QA2, . . . , QAp]. As in Eq. (4), the columns of QA are found from

$$QA_k = A_k - \gamma_k u, \quad \gamma_k = 2u^TA_k/(u^Tu).$$

Example 1  Let Q denote the Householder matrix of the form (1), where u = [1, 2, 0, 1]ᵀ. Calculate Qx and QA, where

$$x = \begin{bmatrix} 1 \\ 1 \\ 4 \\ 3 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 6 \\ 2 & 0 \\ 1 & 5 \\ -2 & 3 \end{bmatrix}.$$

Solution  By Eq. (1), Q is the (4 × 4) matrix Q = I − buuᵀ, where b = 2/(uᵀu) = 2/6 = 1/3. In detail, Qx is given by

$$Qx = (I - (1/3)uu^T)x = x - \frac{u^Tx}{3}u = x - \frac{6}{3}u = x - 2u.$$

Thus Qx is the vector

$$Qx = x - 2u = \begin{bmatrix} 1 \\ 1 \\ 4 \\ 3 \end{bmatrix} - 2\begin{bmatrix} 1 \\ 2 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ -3 \\ 4 \\ 1 \end{bmatrix}.$$

The matrix QA is found by forming QA = [QA1, QA2]. Briefly, the calculations are

$$QA_1 = A_1 - \left(\frac{u^TA_1}{3}\right)u = A_1 - u, \qquad QA_2 = A_2 - \left(\frac{u^TA_2}{3}\right)u = A_2 - 3u.$$

Thus QA is the (4 × 2) matrix given by

$$QA = \begin{bmatrix} 0 & 3 \\ 0 & -6 \\ 1 & 5 \\ -3 & 0 \end{bmatrix}.$$


Householder Reduction to Hessenberg Form

In Section 7.3, we saw how an (n × n) matrix A could be reduced to Hessenberg form by using a sequence of similarity transformations

$$Q_{n-2}\cdots Q_2Q_1AQ_1^{-1}Q_2^{-1}\cdots Q_{n-2}^{-1} = H. \tag{5}$$

We will see that the matrices Qᵢ above can be chosen to be Householder matrices. In the next subsection, we will give the details of how these Householder matrices are constructed. First, however, we want to comment on the significance of using orthogonal matrices in a similarity transformation.

Let us define a matrix Q by

$$Q = Q_{n-2}\cdots Q_2Q_1,$$

where the Qᵢ are as in Eq. (5). Next, recall that Q⁻¹ is given by

$$Q^{-1} = Q_1^{-1}Q_2^{-1}\cdots Q_{n-2}^{-1}.$$

Thus Eq. (5) can be written compactly as QAQ⁻¹ = H, where H is a Hessenberg matrix.

Theorem 8  Let A be an (n × n) matrix and let Q = Q_{n−2} · · · Q2Q1, where the Qᵢ are (n × n) and nonsingular. Also, suppose that QAQ⁻¹ = H. If the matrices Qᵢ are orthogonal, 1 ≤ i ≤ n − 2, then:

(a) The product matrix Q is also orthogonal, so that QAQ⁻¹ = QAQᵀ = H.
(b) If A is symmetric, then H is also symmetric.

We leave the proof of Theorem 8 to the exercises.

As Theorem 8 indicates, a sequence of similarity transformations of the form (5) will preserve symmetry when the matrices Qᵢ are orthogonal. So if the Qᵢ in (5) are orthogonal and A is symmetric, then the Hessenberg matrix H in (5) is also symmetric. A symmetric Hessenberg matrix has a special form—it is "tridiagonal." The form of a general (6 × 6) tridiagonal matrix T is given in Fig. 7.5.

$$T = \begin{bmatrix} \times & \times & 0 & 0 & 0 & 0 \\ \times & \times & \times & 0 & 0 & 0 \\ 0 & \times & \times & \times & 0 & 0 \\ 0 & 0 & \times & \times & \times & 0 \\ 0 & 0 & 0 & \times & \times & \times \\ 0 & 0 & 0 & 0 & \times & \times \end{bmatrix}$$

Figure 7.5  A tridiagonal matrix (the three bands of × entries are the superdiagonal, the main diagonal, and the subdiagonal)


As its name suggests, a tridiagonal matrix has three diagonals: a subdiagonal, the main diagonal, and a superdiagonal. Of course, every tridiagonal matrix is a Hessenberg matrix. Moreover, every symmetric Hessenberg matrix is necessarily tridiagonal.

Once we see how to design the orthogonal matrices Qᵢ in Eq. (5), we will have the following result.

Let A be an (n × n) symmetric matrix. It is easy to construct an orthogonal matrix Q such that QAQᵀ = T, where T is tridiagonal.

Although we cannot diagonalize a symmetric matrix A without knowing all the eigenvalues of A, we can always reduce A to tridiagonal form by using a sequence of Householder transformations. In this sense, tridiagonal form represents the closest we can get to diagonal form without actually finding the eigenvalues of A.

Constructing Householder Matrices

We now return to the main objective of this section. Given a general (n × n) matrix A, find orthogonal matrices Q1, Q2, . . . , Q_{n−2} such that

$$Q_{n-2}\cdots Q_2Q_1AQ_1^TQ_2^T\cdots Q_{n-2}^T = H,$$

where H is a Hessenberg matrix.

As with the procedure described in Section 7.3, the product Q1AQ1ᵀ will have zeros in column 1, below the subdiagonal. Similarly, the product Q2(Q1AQ1ᵀ)Q2ᵀ will have zeros in columns 1 and 2, and so on. That is, we will be able to design Householder matrices Q1, Q2, . . . such that

$$Q_1AQ_1^T = \begin{bmatrix} \times & \times & \times & \times & \cdots & \times \\ \times & \times & \times & \times & \cdots & \times \\ 0 & \times & \times & \times & \cdots & \times \\ 0 & \times & \times & \times & \cdots & \times \\ \vdots & & & & & \vdots \\ 0 & \times & \times & \times & \cdots & \times \end{bmatrix}, \quad
Q_2Q_1AQ_1^TQ_2^T = \begin{bmatrix} \times & \times & \times & \times & \cdots & \times \\ \times & \times & \times & \times & \cdots & \times \\ 0 & \times & \times & \times & \cdots & \times \\ 0 & 0 & \times & \times & \cdots & \times \\ \vdots & & & & & \vdots \\ 0 & 0 & \times & \times & \cdots & \times \end{bmatrix}.$$

To accomplish each of the individual steps of the reduction process described above, we want to be able to design a Householder matrix that solves the following problem.


Problem  Let v be an (n × 1) vector, v = [v1, v2, v3, . . . , vn]ᵀ. Given an integer k, 1 ≤ k ≤ n, find a Householder matrix Q such that Qv = w, where w has the form

$$w = [v_1, v_2, \ldots, v_{k-1}, s, 0, 0, \ldots, 0]^T. \tag{6}$$

In words, the problem posed above is as follows: Given a vector v in Rⁿ, find a Householder matrix Q so that forming the product Qv results in a vector w = Qv, where w has zeros in the k + 1, k + 2, . . . , n components. Furthermore, w and v should agree in the first k − 1 components.

It is easy to form a Householder matrix Q such that the vector w = Qv has the form (6). Specifically, suppose that u is a vector and Q = I − buuᵀ, b = 2/(uᵀu). Since Qv is given by Qv = v − γu, we see that the form (6) can be achieved if u satisfies these conditions:

(a) γ = 1.
(b) u_{k+1} = v_{k+1}, u_{k+2} = v_{k+2}, . . . , u_n = v_n.
(c) u_1 = 0, u_2 = 0, . . . , u_{k−1} = 0.

The following algorithm will solve the problem posed above.


Algorithm 2  Given an integer k, 1 ≤ k ≤ n, and a vector v = [v1, v2, . . . , vn]ᵀ, construct u = [u1, u2, . . . , un]ᵀ as follows:

1. u1 = u2 = · · · = u_{k−1} = 0.
2. u_k = v_k − s, where

$$s = \pm\sqrt{v_k^2 + v_{k+1}^2 + \cdots + v_n^2}.$$

3. uᵢ = vᵢ for i = k + 1, k + 2, . . . , n.

In step (2), choose the sign of s so that v_k s ≤ 0.

For the vector u defined by Algorithm 2, the Householder matrix Q = I − buuᵀ, b = 2/(uᵀu), has the property that the product Qv is of the desired form (6). In detail, Qv is given by

$$Qv = v - u = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{k-1} \\ v_k \\ v_{k+1} \\ \vdots \\ v_n \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ v_k - s \\ v_{k+1} \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_{k-1} \\ s \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{7}$$

To verify Eq. (7), it is necessary only to calculate Qv according to Eq. (4). Thus

$$Qv = v - \gamma u, \quad \gamma = 2u^Tv/(u^Tu). \tag{8}$$

From the definition of u, it is clear that

$$\begin{aligned} u^Tu &= (v_k - s)^2 + (v_{k+1})^2 + \cdots + (v_n)^2 \\ &= v_k^2 - 2sv_k + s^2 + v_{k+1}^2 + \cdots + v_n^2 \\ &= 2s^2 - 2sv_k. \end{aligned}$$

Similarly,

$$u^Tv = (v_k - s)v_k + (v_{k+1})^2 + \cdots + (v_n)^2 = s^2 - sv_k.$$

Therefore, from Eq. (8),

$$\gamma = \frac{2u^Tv}{u^Tu} = \frac{2(s^2 - sv_k)}{2s^2 - 2sv_k} = 1.$$

So, since γ = 1, the calculation in Eq. (7) follows from Eq. (8).
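Algorithm 2 itself is equally short in code. The following is a sketch (ours; the index k is 1-based to match the text, and `householder_vector` is a hypothetical name):

```python
import numpy as np

def householder_vector(v, k):
    """Algorithm 2: build u so that Q = I - (2/(u^T u)) u u^T zeros
    components k+1, ..., n of v and fixes components 1, ..., k-1.
    The index k is 1-based, matching the text."""
    v = np.asarray(v, dtype=float)
    s = np.sqrt(np.sum(v[k - 1:] ** 2))
    if v[k - 1] > 0:                  # choose the sign so that v_k * s <= 0
        s = -s
    u = np.zeros_like(v)
    u[k - 1] = v[k - 1] - s
    u[k:] = v[k:]
    return u, s

u, s = householder_vector([1., 12., 3., 4.], k=2)
print(u, s)                           # [ 0. 25.  3.  4.]  -13.0  (Example 2a)
```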


Example 2  Let v = [1, 12, 3, 4]ᵀ. Use Algorithm 2 to determine Householder matrices Q1 and Q2, where:

$$\text{(a)}\; Q_1v = \begin{bmatrix} 1 \\ s_1 \\ 0 \\ 0 \end{bmatrix}; \qquad \text{(b)}\; Q_2v = \begin{bmatrix} 1 \\ 12 \\ s_2 \\ 0 \end{bmatrix}.$$

Solution

(a) Q1 is defined by selecting a vector u according to Algorithm 2. Since k = 2, we calculate

$$s = \pm\sqrt{v_2^2 + v_3^2 + v_4^2} = \pm\sqrt{144 + 9 + 16} = \pm 13.$$

Choosing the sign of s so that sv2 ≤ 0, we have s = −13. Thus Q1 is defined by the vector u = [0, 25, 3, 4]ᵀ.

(b) Here k = 3, and we find

$$s = \pm\sqrt{v_3^2 + v_4^2} = \pm\sqrt{9 + 16} = \pm 5.$$

Choosing s = −5, the vector u that defines Q2 is given by u = [0, 0, 8, 4]ᵀ.

Example 3  Let x = [4, 2, 5, 5]ᵀ. Find the product Q2x, where Q2 is the Householder matrix in Example 2.

Solution  As we know from Eq. (4), the product Qx is given by

$$Qx = x - \gamma u, \quad \gamma = 2u^Tx/(u^Tu).$$

From Example 2, u = [0, 0, 8, 4]ᵀ. Thus

$$\gamma = \frac{2u^Tx}{u^Tu} = \frac{120}{80} = \frac{3}{2}.$$

Therefore,

$$Qx = x - \left(\frac{3}{2}\right)u = \begin{bmatrix} 4 \\ 2 \\ 5 \\ 5 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \\ 12 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \\ -7 \\ -1 \end{bmatrix}.$$

For a given vector v in Rⁿ, Algorithm 2 tells us how to construct a Householder matrix Q so that Qv has zeros in its last n − k components. We now indicate how the algorithm can be applied to reduce a matrix A to Hessenberg form. As in Section 7.3, we illustrate the process for a (4 × 4) matrix and merely note that the process extends to (n × n) matrices in an obvious fashion.

Let A be the (4 × 4) matrix A = [A1, A2, A3, A4]. Construct a Householder matrix Q so that the vector QA1 has zeros in its last two components. Thus, forming QA = [QA1, QA2, QA3, QA4] will produce a matrix of the form

$$QA = \begin{bmatrix} c_{11} & c_{12} & c_{13} & c_{14} \\ c_{21} & c_{22} & c_{23} & c_{24} \\ 0 & c_{32} & c_{33} & c_{34} \\ 0 & c_{42} & c_{43} & c_{44} \end{bmatrix}.$$

Next, form B = QAQ and note that B is similar to A since Q is a Householder matrix. Also (see Exercise 25), it can be shown that forming B = (QA)Q will not disturb the zero entries in the (3, 1) and (4, 1) positions. Thus, B = QAQ has the form

$$B = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \\ 0 & b_{32} & b_{33} & b_{34} \\ 0 & b_{42} & b_{43} & b_{44} \end{bmatrix}. \tag{9}$$

For the matrix above, B = [B1, B2, B3, B4], choose a Householder matrix S such that the vector SB2 has a zero in its last component. It can be shown (see Exercise 25) that SB1 = B1. Thus, SB = [SB1, SB2, SB3, SB4] has the form

$$SB = \begin{bmatrix} b_{11} & d_{12} & d_{13} & d_{14} \\ b_{21} & d_{22} & d_{23} & d_{24} \\ 0 & d_{32} & d_{33} & d_{34} \\ 0 & 0 & d_{43} & d_{44} \end{bmatrix}. \tag{10}$$

Finally, the matrix SBS is similar to B and hence to A. Moreover (see Exercise 25), forming (SB)S does not disturb the zero entries in (10). Therefore, SBS = S(QAQ)S has the desired Hessenberg form

$$SBS = SQAQS = \begin{bmatrix} h_{11} & h_{12} & h_{13} & h_{14} \\ h_{21} & h_{22} & h_{23} & h_{24} \\ 0 & h_{32} & h_{33} & h_{34} \\ 0 & 0 & h_{43} & h_{44} \end{bmatrix}. \tag{11}$$

The next example illustrates the final stage of reduction to Hessenberg form for a (4 × 4) matrix, the process of going from Eq. (9) to Eq. (11) above.


Example 4 Find a Householder matrix S such that SBS = H , where H is a Hessenberg matrix andwhere B is given by

B =

1 2 4 23 3 −4 20 3 9 −10 −4 −2 8

.

Also, calculate the matrix SB.

Solution We seek a Householder matrix S such that the vector SB2 has a zero in the fourthcomponent, where B2 = [2, 3, 3,−4]T .

Using k = 3 in Algorithm 2, we define a vector u, where

u =

008−4

.

The appropriate Householder matrix is S = I − buuT , b = 2/(uTu) = 1/40.Next, we calculate the matrix SB by using SB = [SB1, SB2, SB3, SB4], where

SBi = Bi − γiu; γi = 2uTBi/(uTu) = uTBi/40.

The details are:

(a) γ1 = uTB1/40 = 0, so SB1 = B1 = [1, 3, 0, 0]T .(b) γ2 = uTB2/40 = 1, so SB2 = B2 − u = [2, 3,−5, 0]T .(c) γ3 = uTB3/40 = 2, so SB3 = B3 − 2u = [4,−4,−7, 6]T .(d) γ4 = uTB4/40 = −1, so SB4 = B4 + u = [2, 2, 7, 4]T .

Thus the matrix SB is given by

SB =

1 2 4 23 3 −4 20 −5 −7 70 0 6 4

.

(Note: The Householder matrix S does not disturb the first column ofB, since uTB1 = 0and hence γ1 = 0; see Exercise 25.)

In order to complete the similarity transformation begun in Example 4, we need tocalculate the matrix (SB)S. Although we know how to form QA and Qx when Q is aHouseholder matrix, we have not yet discussed how to form the product AQ.

The easiest way to form AQ is to proceed as follows:

1. CalculateM = QAT .2. FormMT = (QAT )T = AQT = AQ.

(Note: In step 2, AQT = AQ since a Householder matrixQ is symmetric.)

Page 491: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 47 Page number 529 cyan black

7.5 Householder Transformations 529

Example 5 For the matrix SB in Example 4, calculate H , where H = SBS.

Solution Following the two-step procedure above, we first calculate S(SB)T. From Example 4 wehave

(SB)T =

1 3 0 02 3 −5 04 −4 −7 62 2 7 4

.

For notation, let the columns of (SB)T be denoted as Ri , so (SB)T = [R1,R2,R3,R4].As in Example 4, the matrix S(SB)T has column vectors SRi , where

SRi = Ri − γiu, γi = uTRi/40, 1 ≤ i ≤ 4.

With u from Example 4, the scalars are

γ1 = 3/5, γ2 = −1, γ3 = −21/10, and γ4 = 4/5.

Therefore, the matrix S(SB)T is given by

S(SB)T =

1.0 3.0 0.0 0.02.0 3.0 −5.0 0.0−.8 4.0 9.8 −.44.4 −2.0 −1.4 7.2

.

The transpose of the matrix above is the Hessenberg matrix H , where H = SBS.

7.5 EXERCISES

Let Q = I − buuT be the Householder matrix definedby (1), where u = [1,−1, 1,−1]T . In Exercises 1–8,calculate the indicated product.

1. Qx, for x =

3258

2. Qx, for x =

0118

3. QA, for A =

2 16 34 22 4

4. QA, for A =

0 1 22 2 11 4 33 7 2

5. xTQ, for x =

3225

6. xTQ, for x =

1322

7. AQ, for A =[

2 1 2 11 0 1 4

]

8. BQ, where B is the (4× 4) matrix in Example 4.

Page 492: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 48 Page number 530 cyan black

530 Chapter 7 Eigenvalues and Applications

For the given vectors v and w in Exercises 9–14, deter-mine a vector u such that (I − buuT )v = w.

9. v =

1221

, w =

1a

00

10. v =

1111

, w =

a

000

11. v =

2143

, w =

21a

0

12. v =

20−2

21

, w =

20a

00

13. v =

000−3

4

, w =

000a

0

14. v =

11400

, w =

11a

00

In Exercises 15–20, find a Householder matrix Q suchthat QAQ = H , with H a Hessenberg matrix. List thevector u that definesQ and gives the matrix H .

15. A =

1 3 43 1 14 1 1

16. A =

1 0 50 2 15 1 2

17. A =

0 −4 3−4 0 1

3 1 2

18. A =

1 2 0 02 1 3 40 3 1 10 4 1 1

19. A =

2 1 1 23 4 0 10 −3 1 10 4 2 3

20. A =

1 2 3 04 1 2 30 0 2 10 1 3 2

21. LetQ denote the Householder matrix defined by (1).Verify thatQ is symmetric.

22. LetQ be the Householder matrix defined by (1) andcalculate the productQu. If v is any vector orthogo-nal to u, what is the result of formingQv?

23. Consider the (n × n) Householder matrix Q =I − buuT , b = 2/(uTu). Show that Q has eigen-values λ = −1 and λ = 1. [Hint: Use the Gram–Schmidt process to argue that Rn has an orthogo-nal basis {u,w2,w3, . . . ,wn}. Also, recall Exer-cise 22.]

24. Prove Theorem 8.25. Consider a (4 × 4) matrix B of the form shown in

(9), where b31 = 0 and b41 = 0. Let u be a vector ofthe form u = [0, a, b, c]T , and let Q = I − buuTbe the associated Householder matrix.a) Show that forming the product BQ does not

change the first column of B. [Hint: Form BQby using the two-step procedure illustrated inExample 5.]

b) Let u be a vector of the form u = [0, 0, a, b]T ,and letQ = I − buuT be the associatedHouseholder matrix. Show that forming QBQdoes not alter the first column of B.

Page 493: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 49 Page number 531 cyan black

7.6 The QR Factorization and Least-Squares Solutions 531

7.6 THE QR FACTORIZATION ANDLEAST-SQUARES SOLUTIONS

The Householder transformations of Section 7.5 can be used effectively to constructan algorithm to find the least-squares solution of an overdetermined linear system,Ax = b. This construction also yields a useful way of expressing A as a productof two other matrices, called the QR factorization. The QR factorization is a principalinstrument in many of the software packages for numerical linear algebra.

Reduction to Trapezoidal FormThe following theorem is proved by construction, and hence its proof serves as analgorithm for the desired factorization.

Theorem 9 Let A be an (m× n) matrix with m ≥ n. There exists an (m×m) orthogonal matrix Ssuch that

SA =[R

O

],

where R is an (n× n) upper-triangular matrix and O is the [(m− n)× n] zero matrix.(If m = n, SA = R.)

Proof LetA = [A1,A2, . . . ,An], where the column vectorsAi are inRm. Let S1 be the (m×m)Householder matrix such that

S1A1 = [s1, 0, 0, . . . , 0]T .Thus the product S1A = [S1A1, S1A2, . . . , S1An] has the form

S1A =

s1 c12 · · · c1n

0 c22 · · · c2n

0 c32 · · · c3n...

...

0 cm2 · · · cmn

.

For notation, let B = S1A and write B as B = [B1,B2, . . . ,Bn]. Next, choose theHouseholder S2 such that

S2B2 = [c12, s2, 0, 0, . . . , 0]T .As in reduction to Hessenberg form, notice that S2B1 = B1. Thus the product S2B =S2S1A has the form

S2S1A =

s1 c12 d13 · · · d1n

0 s2 d23 · · · d2n

0 0 d33 d3n...

......

...

0 0 dm3 · · · dmn

.

Page 494: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 50 Page number 532 cyan black

532 Chapter 7 Eigenvalues and Applications

Continuing in this fashion, we ultimately find Householder matrices S1, S2, . . . , Sn suchthat the product Sn · · · S2S1A has the form

Sn · · · S2S1A =

× × × × · · · ×0 × × × · · · ×0 0 × × · · · ×0 0 0 × · · · ×...

0 0 0 0 · · · ×0 0 0 0 · · · 0...

0 0 0 0 · · · 0

←− row n

←− row m.↑

column nEquivalently, with S = Sn · · · S2S1, we find that S is orthogonal and

SA =[R

O

], where

{R is (n× n) upper triangularO is the [(m− n)× n] zero matrix.

(Note: The block matrix

[R

O

]in Theorem 9 is called an upper-trapezoidal matrix.

Also note that we are not interested in preserving similarity in Theorem 9. Thus we donot form S1AS1 or S2S1AS1S2 in the construction described in the proof of Theorem 9.)

Example 1 Following the proof of Theorem 9, find Householder matrices S1 and S2 such that S2S1A

is in trapezoidal form, where

A =

1 −2/3−1 3

0 −2−1 1

1 0

.

Solution Following Algorithm 2 in Section 7.5, we define a vector u by

u =

3−1

0−1

1

.

The first Householder matrix S1 is then S1 = I − buuT , where b = 2/(uTu) = 1/6.We next calculate S1A = [S1A1, S1A2], where S1Ai = Ai − γiu, γi = uTAi/6.

The scalars γi are γ1 = 1 and γ2 = −1. Thus the matrix S1A has columns A1 − u

Page 495: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 51 Page number 533 cyan black

7.6 The QR Factorization and Least-Squares Solutions 533

and A2 + u:

S1A =

−2 7/30 20 −20 00 1

.

We now define the second Householder matrix S2, where S2 is designed so that S2(S1A)

has zeros in positions (3, 2), (4, 2), and (5, 2).Following Algorithm 2, define a vector v by

v =

05−2

01

and set S2 = I − bvvT , b = 2/(vTv) = 1/15. Forming S2(S1A), we obtain

S2S1A =

−2 7/30 −30 00 00 0

=[R

O

]; R =

[ −2 7/30 −3

].

Least-Squares SolutionsSuppose that Ax = b represents a system of m linear equations in n unknowns, wherem ≥ n. If the system is inconsistent, it is often necessary to find a “least-squaressolution” to Ax = b. By a least-squares solution (recall Section 3.8), we mean a vectorx∗ in Rn such that

‖Ax∗ − b‖ ≤ ‖Ax − b‖, for all x in Rn. (1)

In Section 3.8, we saw a simple procedure for solving Eq. (1). That is, x∗ can be obtainedby solving the normal equation:

ATAx = AT b.In this subsection, we consider an alternative procedure. The alternative is not so efficientfor hand calculations, but it is the preferred procedure for machine calculations. Thereason is based on the observation that the matrix ATA is often “ill-conditioned.” Thusit is frequently difficult to compute numerically an accurate solution to ATAx = AT b.

Recall from Section 4.7 that orthogonal matrices preserve the length of a vectorunder multiplication. That is, if y is any vector in Rm and Q is an (m×m) orthogonalmatrix, then

‖Qy‖ = ‖y‖.

Page 496: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 52 Page number 534 cyan black

534 Chapter 7 Eigenvalues and Applications

In the context of (1), letQ be an (m×m) orthogonal matrix. Also, suppose that x∗is a vector in Rn such that

‖Ax∗ − b‖ ≤ ‖Ax − b‖, for all x in Rn.

Then, since ‖Ax∗ − b‖ = ‖Q(Ax∗ − b)‖ = ‖QAx∗ −Qb‖ and ‖Ax− b‖ = ‖Q(Ax−b)‖ = ‖QAx −Qb‖, we have

‖QAx∗ −Qb‖ ≤ ‖QAx −Qb‖, for all x in Rn. (2)

Similarly, if a vector x∗ satisfies Eq. (2), then that same vector also satisfies Eq. (1).In other words:

If Q is an orthogonal matrix, then finding the least-squares solution of AQx =Qb is equivalent to finding the least-squares solution of Ax = b.

Now, using the construction in Theorem 9, we can form an orthogonal matrixQ so thatthe least-squares solution ofQAx = Qb is easy to find.

In particular, for an (m×n)matrixA, let S be an orthogonal matrix such that SA is intrapezoid form. Consider the problem of finding the least-squares solution of SAx = Sb,where

SA =[R

O

], Sb =

[cd

], where c is in Rn and d is in Rm−n.

For any vector x in Rn, we have

SAx − Sb =[Rxθ

]−[

cd

]=[Rx − c−d

].

Thus, ‖SAx − Sb‖ can be found from the relationship

‖SAx − Sb‖2 = ‖Rx − c‖2 + ‖d‖2. (3)

By Eq. (3), a vector x∗ inRn minimizes ‖SAx−Sb‖ if and only if x∗minimizes ‖Rx−c‖.As an example to illustrate these ideas, consider the (5× 3) trapezoidal matrix SA,

where

SA =

1 2 10 2 40 0 30 0 00 0 0

=[R

O

]. (4)

In Eq. (4), R is the upper (3× 3) block of SA and O is the (2× 3) zero matrix.

Page 497: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 53 Page number 535 cyan black

7.6 The QR Factorization and Least-Squares Solutions 535

Now for x in R3 and SA given by Eq. (4), note that SAx has the form

SAx =

1 2 10 2 40 0 30 0 00 0 0

x1

x2

x3

=

x1 + 2x2 + x3

2x2 + 4x3

3x3

00

=[Rxθ

]. (5)

In Eq. (5) the vector Rx is three-dimensional and θ is the two-dimensional zero vector.In general, as noted above, a vector x∗ inRn minimizes ‖SAx−Sb‖ if and only if x∗

minimizes ‖Rx− c‖ in Eq. (3). Since R is upper triangular, the problem of minimizing‖Rx−c‖ is fairly easy. In particular, ifR is nonsingular, then there is a unique minimizer,x∗, and x∗ is the solution of Rx = c. The nonsingular case is summarized in the nexttheorem.

Theorem 10 Let A be an (m × n) matrix, and suppose that the column vectors of A are linearlyindependent. Let S be an orthogonal matrix such that SA is upper trapezoidal. Given avector b in Rm, let SA and Sb be denoted as

SA =[R

O

]and Sb =

[cd

].

Then:

(a) R is nonsingular.(b) There is a unique least-squares solution of Ax = b, x∗.(c) The vector x∗ satisfies Rx∗ = c.

Proof In Exercise 19, the reader is asked to show that R is nonsingular when the columns of Aare linearly independent. Then, since there is a unique vector x∗ such that Rx∗ = c, therest of the conclusions in Theorem 10 follow from Eq. (3).

Example 2 Use Theorem 10 to find the least-squares solution of Ax = b, where

A =

1 −2/3−1 3

0 −2−1 1

1 0

and b =

13−4−3

3

.

Solution In Example 1, we found Householder matrices S1 and S2 such that S2S1A is in trapezoidalform. The matrices S1 and S2 were defined by vectors u and v (respectively), where

Page 498: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 54 Page number 536 cyan black

536 Chapter 7 Eigenvalues and Applications

u =

3−1

0−1

1

and v =

05−2

01

.

The vector Sb = S2S1b is found from

S1b = b− u =

−24−4−2

2

and S2(S1b) = S1b− 2v =

−2−6

0−2

0

=[

cd

].

By Example 1, the matrix SA is given by

SA =

−2 7/30 −30 00 00 0

=[R

O

].

Thus the least-squares solution, x∗, is found by solving Rx = c, where c = [−2,−6]T :

−2x1 + (7/3)x2 = −2−3x2 = −6.

The solution of Rx = c is x∗ = [10/3, 2]T .

The QR FactorizationThe main result of this subsection is the following theorem.

Theorem 11 Let A be an (m × n) matrix with m ≥ n, where A has rank n. There is an (m × n)matrixQ with orthonormal column vectors such that

A = QR.Moreover, in the factorizationA = QR, the matrixR is upper triangular and nonsingular.

Proof From Theorem 9, we know there is an orthogonal matrix S such that

SA =[R

O

],

where R is an (n× n) upper-triangular matrix and R is nonsingular.

Page 499: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 55 Page number 537 cyan black

7.6 The QR Factorization and Least-Squares Solutions 537

Since S is orthogonal, the reduction displayed above is equivalent to

A = ST[R

O

]. (6)

To simplify the notation, we let B denote the (m×m)matrix ST so that Eq. (6) becomes

A = B[R

O

]= [B1,B2, . . . ,Bn, . . . ,Bm]

[R

O

].

Examination of this product on the right-hand side shows that

A = [B1,B2, . . . ,Bn]R. (7)

That is, the column vectorsBn+1, . . . ,Bm are multiplied by the zero entries ofO. Hence,Eq. (7) yields a factorization of A that is different from Eq. (6) but is still valid.

The proof of Theorem 11 is complete once we define the (m × n) matrix Q to begiven by

Q = [B1,B2, . . . ,Bn].That is, Q consists of the first n columns of the matrix ST , where S is the (m × m)orthogonal matrix defined in Theorem 9.

Note that if m = n so that A is a square matrix, then the factors Q and R are alsosquare, andQ is an orthogonal matrix. This feature is illustrated in the next example.

Example 3 Find a QR factorization for the matrix A:

A =

3 1 20 3 −14 8 6

.

Solution Following the construction shown in the proof of Theorem 9, we use Householder ma-trices to reduce A to upper-triangular form.

First, define a Householder matrix S1 from S1 = I − buuT , where u = [8, 0, 4]T .Then we have

S1A =−5 −7 −6

0 3 −10 4 2

.

Next, define S2 = I − buuT , where u = [0, 8, 4]T . Forming S2(S1A), we obtain

S2S1A =−5 −7 −6

0 −5 −10 0 2

= R.

With the above, the desired QR factorization is given by

A = S1S2R = QR, Q = S1S2.

Page 500: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 56 Page number 538 cyan black

538 Chapter 7 Eigenvalues and Applications

If we wished to do so, we could form the product S1S2 and list the matrixQ explic-itly. For most applications, however, there is no need to know the individual entriesofQ.

Example 4 Use the QR factorization found in Example 3 to solve Ax = b1 and Ax = b2, where

b1 =

188

and b2 =

10−410

.

Solution The factorization found in Example 3 states that Ax = bk can be written as (QR)x =bk, k = 1, 2. Equivalently,

(S1S2R)x = bk or Rx = S2S1bk, k = 1, 2.

Since S1 and S2 are Householder matrices, it is easy to form the vectors S2S1bk, k = 1, 2.We find that

S2S1b1 =−7−8−4

and S2S1b2 =

−14

42

.

Solving Rx = S2S1bk , we obtain the solutions to Ax = bk:

x =

12−2

and x =

3−1

1

.

The QR AlgorithmIn practice, the eigenvalues of an (n× n)matrix A are usually found by transforming Ato Hessenberg form H and then applying some version of the QR algorithm to H . Thesimplest and most basic version is given next.

The QR AlgorithmGiven an (n× n)matrix B, let B(1) = B. For each positive integer k, find the QRfactorization of B(k). That is, B(k) = Q(k)R(k), whereQ(k) is orthogonal and R(k)is upper triangular. Then set B(k+1) = R(k)Q(k) and repeat the process.

Since R(k) = [Q(k)]T B(k), it follows that

B(k+1) = [Q(k)]T B(k)Q(k).Hence each B(k) is similar to B. If all the eigenvalues of B have distinct absolute values,the QR iterates, B(k), converge to an upper-triangular matrix T with the eigenvalues ofB on its diagonal. Under other conditions, the iterates converge to other forms whoseeigenvalues are discernible.

Page 501: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 57 Page number 539 cyan black

7.6 The QR Factorization and Least-Squares Solutions 539

Example 5 Perform one step of the QR algorithm on matrix A in Example 3.

Solution Let A(1) = A and let Q(1) and R(1) be the orthogonal and upper-triangular matrices,respectively, that were computed in Example 3.

If we form the product A(2) = R(1)Q(1) by using the two-step method illustrated inExample 5 of Section 7.5, we find

A(2) = R(1)Q(1)

=

7.8 3.88 5.84.8 3.48 3.64

−1.6 −.96 .72

.

(Note: We can draw no conclusions from just one iteration, but already in A(2) wecan see that the size of the (2, 1), (3, 1), and (3, 2) entries begins to diminish.)

7.6 EXERCISES

In Exercises 1–4, use Theorem 10 to find a vector x∗such that ‖Ax∗ − b‖ ≤ ‖Ax − b‖ for all x.

1. A =

1 20 10 0

, b =

313

2. A =

2 30 10 0

, b =

1−1

2

3. A =

1 2 10 1 30 0 20 0 0

, b =

674−1

4. A =

2 0 30 1 20 0 30 0 0

, b =

5435

In Exercises 5–10, find a Householder matrix S suchthat SA = R, with R upper triangular. List R and thevector u that defines S = I − buuT .

5. A =[

3 54 10

]6. A =

[0 31 5

]

7. A =[

0 24 6

]8. A =

[ −4 203 −10

]

9. A =

1 2 10 0 60 1 8

10. A =

3 1 20 3 50 4 10

In Exercises 11–14, use Householder matrices to reducethe given matrix A to upper-trapezoidal form.

11. A =

1 −5/32 122 84 15

12. A =

1 21 31 31 6

13. A =

2 40 30 00 4

14. A =

3 50 20 10 2

In Exercises 15–18, use Theorem 10 to find the least-squares solution to problem Ax = b.

15. A in Exercise 11, b =

110

01

Page 502: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 58 Page number 540 cyan black

540 Chapter 7 Eigenvalues and Applications

16. A in Exercise 12, b =

5021

17. A in Exercise 13, b =

28

168

18. A in Exercise 14, b =

5302

19. Prove property (a) of Theorem 10.

7.7 MATRIX POLYNOMIALS AND THECAYLEY–HAMILTON THEOREM

The objective of this section is twofold. First, we wish to give a partial justification ofthe algorithm (presented in Section 7.4) for finding the characteristic polynomial of aHessenberg matrix. Second, we want to lay some of the necessary foundation for thematerial in Section 7.8, which describes how a basis for Rn can be constructed by usingeigenvectors and generalized eigenvectors. These ideas are indispensable if we want tosolve a difference equation xk = Axk−1 or a differential equation x′(t) = Ax(t), whereA is defective.

Matrix PolynomialsTo complete our discussion of the algorithm presented in Section 7.4, it is convenient tointroduce the idea of a matrix polynomial. By way of example, consider the polynomial

q(t) = t2 + 3t − 2.

If A is an (n× n)matrix, then we can define a matrix expression corresponding to q(t):

q(A) = A2 + 3A− 2I,

where I is the (n× n) identity. In effect, we have inserted A for t in q(t) = t2 + 3t − 2and defined A0 by A0 = I . In general, if q(t) is the kth-degree polynomial

q(t) = bktk + · · · + b2t2 + b1t + b0,

and if A is an (n× n) matrix, we define q(A) by

q(A) = bkAk + · · · + b2A2 + b1A+ b0I,

where I is the (n × n) identity matrix. Since q(A) is obviously an (n × n) matrix, wemight ask for the eigenvalues and eigenvectors of q(A). It is easy to show that if λ isan eigenvalue of A, then q(λ) is an eigenvalue of q(A). (Note that q(λ) is the scalarobtained by substituting the value t = λ into q(t).)

Theorem 12 Suppose that q(t) is a kth-degree polynomial and that A is an (n× n) matrix such thatAx = λx, where x �= θ . Then q(A)x = q(λ)x.

Page 503: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 59 Page number 541 cyan black

7.7 Matrix Polynomials and the Cayley–Hamilton Theorem 541

Proof Suppose that Ax = λx, where x �= θ . As we know, a consequence is that A2x = λ2x,and in general

Aix = λix, i = 2, 3, . . . .

Therefore, if q(t) = bktk + · · · + b2t2 + b1t + b0, then

q(A)x = (bkAk + · · · + b2A2 + b1A+ b0I )x

= bkAkx + · · · + b2A2x + b1Ax + b0x

= bkλkx + · · · + b2λ2x + b1λx + b0x

= q(λ)x.Thus if λ is an eigenvalue of A, then q(λ) is an eigenvalue of q(A).

The next example provides an illustration of Theorem 12.

Example 1 Let q(t) denote the polynomial q(t) = t2 + 5t + 4. Find the eigenvalues and eigenvec-tors for q(A), where A is the matrix given by

A =[

2 03 1

].

Solution The eigenvalues of A are λ = 2 and λ = 1. Therefore, by Theorem 12, the eigenvaluesof q(A) are given by

q(2) = 18 and q(1) = 10.

By way of verification, we calculate q(A):

q(A) = A2 + 5A+ 4I =[

4 09 1

]+[

10 015 5

]+[

4 00 4

]=[

18 024 10

].

As the calculation above confirms, q(A) has eigenvalues λ = q(2) = 18 and λ =q(1) = 10.

An interesting special case of Theorem 12 is provided when q(t) is the characteristicpolynomial for A. In particular, suppose that λ is an eigenvalue of A and that p(t) isthe characteristic polynomial for A so that p(λ) = 0. Since p(λ) is an eigenvalue ofp(A) and p(λ) = 0, we conclude that zero is an eigenvalue for p(A); that is, p(A) is asingular matrix. In fact, we will be able to prove more than this; we will show that p(A)is the zero matrix [p(A) = O is the conclusion of the Cayley–Hamilton theorem].

Example 2 Calculate the matrix p(A), where

A =[

1 −22 3

]

and where p(t) is the characteristic polynomial for A.

Page 504: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 60 Page number 542 cyan black

542 Chapter 7 Eigenvalues and Applications

Solution The characteristic polynomial for A is given by p(t) = det(A − tI ) = t2 − 4t + 7.Therefore, the matrix p(A) is given by

p(A) = A2 − 4A+ 7I =[ −3 −8

8 5

]−[

4 −88 12

]+[

7 00 7

]=[

0 00 0

].

Thus Example 4 provides a particular instance of the Cayley–Hamilton theorem: Ifp(t) = det(A− tI ), then p(A) = O.

The theorems that follow show that the algorithm given in Section 7.4 leads to apolynomial p(t) whose zeros are the eigenvalues of H . In the process of verifying this,we will prove an interesting version of the Cayley–Hamilton theorem that is applicableto an unreduced Hessenberg matrix. Before beginning, we make an observation aboutthe sequence of vectors w0,w1,w2, . . . defined in Algorithm 1:

wi = Hwi−1, i = 1, 2, . . . .

Since w0 = e1, then w1 = Hw0 = He1. Given that w1 = He1, we see that

w2 = Hw1 = H(He1) = H 2e1;and in general wk = Hke1. Thus we can interpret the sequence w0,w1,w2, . . . ,wn asbeing given by e1, He1, H

2e1, . . . , Hne1.

With this interpretation, we rewrite the equation a0w0 + a1w1 + · · · + an−1wn−1 +wn = θ given in Algorithm 1 as

a0e1 + a1He1 + a2H2e1 + · · · + an−1H

n−1e1 +Hne1 = θ; (1)

or by regrouping, (1) is the same as

(a0I + a1H + a2H2 + · · · + an−1H

n−1 +Hn)e1 = θ . (2)

Now Theorem 5 asserts that if H is an unreduced (n× n) Hessenberg matrix, then thevectors e1, He1, H

2e1, . . . , Hn−1e1 are linearly independent, and that there is a unique

set of scalars a0, a1, . . . , an−1 that satisfy (1). Defining p(t) from (1) as

p(t) = a0 + a1t + a2t2 + · · · + an−1t

n−1 + tn,we see from Eq. (2) that p(H)e1 = θ . With these preliminaries, we prove the followingresult.

Theorem 13 Let H be an (n × n) unreduced Hessenberg matrix; let a0, a1, . . . , an−1 be the uniquescalars satisfying

a0e1 + a1He1 + a2H2e1 + · · · + an−1H

n−1e1 +Hne1 = θ;and let p(t) = a0 + a1t + a2t

2 + · · · + an−1tn−1 + tn. Then:

(a) p(H) is the zero matrix.(b) If q(t) = b0 + b1t + b2t

2 + · · · + bk−1tk−1 + tk is any monic kth-degree

polynomial, and if q(H) is the zero matrix, then k ≥ n. Moreover, if k = n,then q(t) ≡ p(t).

Page 505: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 61 Page number 543 cyan black

7.7 Matrix Polynomials and the Cayley–Hamilton Theorem 543

Proof For property (a), since {e1, He1, H2e1, . . . , H

n−1e1} is a basis for Rn, we can expressany vector y in Rn as a linear combination:

y = c0e1 + c1He1 + c2H2e1 + · · · + cn−1H

n−1e1.

Therefore, p(H)y is the vector

p(H)y = c0p(H)e1 + c1p(H)He1 + · · · + cn−1p(H)Hn−1e1. (3)

Now although matrix products do not normally commute, it is easy to see thatp(H)H i =Hip(H). Therefore, from Eq. (3), we can represent p(H)y as

p(H)y = c0p(H)e1 + c1Hp(H)e1 + · · · + cn−1Hn−1p(H)e1;

and since p(H)e1 = θ (see (1) and (2)), then p(H)y = θ for any y in Rn. In particular,p(H)ej = θ for j = 1, 2, . . . , n; and sincep(H)ej is the j th column ofp(H), it followsthat p(H) = O.

For the proof of property (b), suppose that q(H) is the zero matrix, where

q(H) = b0 + b1H + b2H2 + · · · + bk−1H

k−1 +Hk.

Then q(H)y = θ for every y in Rn, and in particular for y = e1 we have q(H)e1 = θ ,or

b0e1 + b1He1 + b2H2e1 + · · · + bk−1H

k−1e1 +Hke1 = θ . (4)

However, the vectors e1, He1, . . . , Hke1 are linearly independent when k ≤ n − 1; so

Eq. (4) can hold only if k ≥ n (recall that the leading coefficient of q(t) is 1; so we areexcluding the possibility that q(t) is the zero polynomial). Moreover, if k = n, we cansatisfy Eq. (4) only with the choice b0 = a0, b1 = a1, . . . , bn−1 = an−1 by Theorem 5;so if k = n, then q(t) ≡ p(t).

Since it was shown above that p(H) is the zero matrix whenever p(t) is the poly-nomial defined by the algorithm of Section 7.4, it is now an easy matter to show that thezeros of p(t) are precisely the eigenvalues of H .

Theorem 14 LetH be an (n×n) unreduced Hessenberg matrix, and letp(t) be the polynomial definedby Algorithm 1. Then λ is a root of p(t) = 0 if and only if λ is an eigenvalue of H .

Proof We show first that every eigenvalue of H is a zero of p(t). Thus we suppose thatHx = λx, where x �= θ . By Theorem 12, we know that

p(H)x = p(λ)x;and since p(H) is the zero matrix of Theorem 13, we must also conclude that

θ = p(λ)x.But since x �= θ , the equality θ = p(λ)x implies that p(λ) = 0. Thus every eigenvalueof H is a zero of p(t).

Conversely, suppose that λ is a zero of p(t). Then we can write p(t) in the form

p(t) = (t − λ)q(t), (5)

Page 506: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 62 Page number 544 cyan black

544 Chapter 7 Eigenvalues and Applications

where q(t) is a monic polynomial of degree n − 1. Now equating coefficients of likepowers shows that if u(t) = r(t)s(t), where u, r , and s are polynomials, then we alsohave a corresponding matrix identity

u(A) = r(A)s(A)for any square matrix A. Thus from Eq. (5) we can assert that

p(H) = (H − λI)q(H). (6)

If H − λI were nonsingular, we could rewrite Eq. (6) as

(H − λI)−1p(H) = q(H);and since p(H) is the zero matrix by property (a) of Theorem 13, then q(H) would bethe zero matrix also. However, q(t) is a monic polynomial of degree n− 1, so property(b) of Theorem 13 assures us that q(H) is not the zero matrix. Thus, if λ is a root ofp(t) = 0, we know that H − λI must be singular, and hence λ is an eigenvalue of H .

We conclude this section by outlining a proof of the Cayley–Hamilton theorem foran arbitrary matrix. If H is a Hessenberg matrix of the form

H =

H11 H12 · · · H1r

O H22 · · · H2r...

O O · · · Hrr

, (7)

where H11, H22, . . . , Hrr are unreduced Hessenberg blocks, then we define p(t) to bethe characteristic polynomial for H , where

p(t) = p1(t)p2(t) · · ·pr(t)and where pi(t) is the characteristic polynomial for Hii, 1 ≤ i ≤ r .

Theorem 15 If p(t) is the characteristic polynomial for a Hessenberg matrixH , then p(H) is the zeromatrix.

Proof We sketch the proof for the case r = 2. If H has the form

H =[H11 H12

O H22

],

where H11 and H22 are square blocks, then it can be shown that Hk is a block matrix ofthe form

Hk =[Hk

11 Vk

O Hk22

].

Given this, it follows that if q(t) is any polynomial, then q(H) is a block matrix of theform

q(H) =[q(H11) W

O q(H22)

].

Page 507: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 63 Page number 545 cyan black

7.7 Matrix Polynomials and the Cayley–Hamilton Theorem 545

From these preliminaries, if H11 and H22 are unreduced blocks, then

p(H) = p1(H)p2(H) =[p1(H11) R

O p1(H22)

][p2(H11) S

O p2(H22)

];

and since p1(H11) and p2(H22) are zero blocks, it is easy to see that p(H) is the zeromatrix. This argument can be repeated inductively to show that p(H) is the zero matrixwhen H has the form (7) for r > 2.

Finally, we note that the essential features of polynomial expressions are preservedby similarity transformations. For example, ifH = SAS−1 and if q(t) is any polynomial,then (Exercise 5, Section 7.7)

q(H) = Sq(A)S−1.

Thus if A is similar toH and if p(H) is the zero matrix, then p(A) is the zero matrix aswell.

The remarks made above allow us to state the Cayley–Hamilton theorem as a corol-lary of Theorem 15.

Corollary TheCayley–HamiltonTheorem Ifp(t) is the characteristic polynomial for an (n×n)matrix A, then p(A) is the zero matrix.

7.7 EXERCISES

1. Let q(t) = t2 − 4t + 3. Calculate the matricesq(A), q(B), and q(C).

A =[

1 −11 3

], B =

[2 −1−1 2

],

C =−2 0 −2−1 1 −2

0 1 −1

2. The polynomial p(t) = (t − 1)3 = t3 − 3t2 + 3t − 1is the characteristic polynomial for A,B,C, and I .

A =

1 0 01 1 00 0 1

, B =

1 0 00 1 00 1 1

,

C =

1 0 01 1 00 1 1

, I =

1 0 00 1 00 0 1

a) Verify that p(A), p(B), p(C), and p(I) are eachthe zero matrix.

b) For A and B, find a quadratic polynomial q(t)such that q(A) = q(B) = O.

3. Suppose that q(t) is any polynomial and p(t) is thecharacteristic polynomial for a matrixA. If we dividep(t) into q(t), we obtain an identity

q(t) = s(t)p(t)+ r(t),where the degree of r(t) is less than the degree ofp(t). From this result, it can be shown that q(A) =s(A)p(A) + r(A); and since p(A) = O, q(A) =r(A).a) Let p(t) = t2 − 4t + 3 and q(t) = t5 − 4t4 +

4t3 − 5t2 + 8t − 1. Find s(t) and r(t) so thatq(t) = s(t)p(t)+ r(t).

b) Observe that p(t) is the characteristic polynomialfor the matrix B in Exercise 1. Calculate thematrix q(B) without forming the powers B5, B4,and so on.

4. Consider the (7 × 7) Hessenberg matrix H givenin (10) of Section 7.4, where H is partitionedwith three unreduced diagonal blocks, H11, H22, andH33. Verify that det(H − tI ) = −p1(t)p2(t)p3(t),

Page 508: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 64 Page number 546 cyan black

546 Chapter 7 Eigenvalues and Applications

where p1(t), p2(t), and p3(t) are the characteris-tic polynomials for H11, H22, and H33 as given byAlgorithm 1.

5. Suppose that H = SAS−1 and q(t) is any polyno-mial. Show that q(H) = Sq(A)S−1. [Hint: Showthat Hk = SAkS−1 by direct multiplication.]

Exercises 6–8 give another proof that a symmetric ma-trix is diagonalizable.6. Let A be an (n × n) symmetric matrix. Let λ1 andλ2 be distinct eigenvalues of A with correspondingeigenvectors u1 and u2. Prove that uT1 u2 = 0. [Hint:Given that Au1 = λ1u1 and Au2 = λ2u2, show thatuT1 Au2 = uT2 Au1.]

7. LetW be a subspace ofRn, where dim(W) = d, d ≥1. Let A be an (n × n) matrix, and suppose that Axis inW whenever x is inW .a) Let x0 be any fixed vector inW . Prove thatAjx0 is inW for j = 1, 2, . . . . There is asmallest value k for which the set of vectors{x0, Ax0, A

2x0, . . . , Akx0} is linearly

dependent; and thus there are unique scalarsa0, a1, . . . , ak−1 such that

a0x0 + a1Ax0 + · · · + ak−1Ak−1x0 + Akx0 = θ .

Use these scalars to define the polynomial m(t),where m(t) = tk + ak−1t

k−1 + · · ·+ a1t + a0.Observe that m(A)x0 = θ ; m(t) is called theminimal annihilating polynomial for x0. Byconstruction there is no monic polynomial q(t),

where q(t) has degree less than k andq(A)x0 = θ .

b) Let r be a root of m(t) = 0 so that m(t) =(t − r)s(t). Prove that r is an eigenvalue of A.[Hint: Is the vector s(A)x0 nonzero?] Note thatpart (b) shows that every root of m(t) = 0 isan eigenvalue of A. If A is symmetric, thenm(t) = 0 has only real roots, so s(A)x0 isinW .

8. Exercise 6 shows that eigenvectors of a symmetricmatrix belonging to distinct eigenvalues are orthogo-nal. We now show that if A is a symmetric (n × n)matrix, thenA has a set of n orthogonal eigenvectors.Let {u1, u2, . . . ,uk} be a set of k eigenvectors for A,1 ≤ k < n, where uTi uj = 0, i �= j . Let W be thesubset of Rn defined by

W = {x: xTui = 0, i = 1, 2, . . . , k}From the Gram–Schmidt theorem, the subsetW con-tains nonzero vectors.a) Prove thatW is a subspace of Rn.b) Suppose that A is (n× n) and symmetric.

Prove that Ax is inW whenever x is inW .From Exercise 7, A has an eigenvector, u, inW . If we label u as uk+1, then by construction{u1, u2, . . . ,uk, uk+1} is a set of orthogonaleigenvectors for A. It follows that A has a set ofn orthogonal eigenvectors, u1, u2, . . . ,un. Usingthese, we can formQ so thatQTAQ = D, whereD is diagonal andQTQ = I .

7.8 GENERALIZED EIGENVECTORS AND SOLUTIONSOF SYSTEMS OF DIFFERENTIAL EQUATIONS

In this section, we develop the idea of a generalized eigenvector in order to give thecomplete solution to the system of differential equations x′ = Ax. When an (n × n)matrix A has real eigenvalues, the eigenvectors and generalized eigenvectors of A forma basis for Rn. We show how to construct the complete solution of x′ = Ax from thisspecial basis. (When some of the eigenvalues of A are complex, a few modifications arenecessary to obtain the complete solution of x′ = Ax in a real form. In any event, theeigenvectors and generalized eigenvectors of A form a basis for Cn, where Cn denotesthe set of all n-dimensional vectors with real or complex components.)

To begin, let A be an (n × n) matrix. The problem we wish to solve is called aninitial-value problem and is formulated as follows: Given a vector x0 in Rn, find a

Page 509: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 65 Page number 547 cyan black

7.8 Generalized Eigenvectors and Solutions of Systems of Differential Equations 547

function x(t) such thatx(0) = x0

x′(t) = Ax(t) for all t.(1)

If we can find n functions x1(t), x2(t), . . . , xn(t) that satisfyx′(t) = Ax1(t), x′2(t) = Ax2(t), . . . , x′n(t) = Axn(t)

and such that {x1(0), x2(0), . . . , xn(0)} is linearly independent, then we can always solve(1). To show why, we merely note that there must be constants c1, c2, . . . , cn such that

x0 = c1x1(0)+ c2x2(0)+ · · · + cnxn(0)and then note that the function

y(t) = c1x1(t)+ c2x2(t)+ · · · + cnxn(t)satisfies the requirements of (1). Thus to solve x′ = Ax, x(0) = x0, we are led to searchfor n solutions x1(t), x2(t), . . . , xn(t) of x′ = Ax for which {x1(0), x2(0), . . . , xn(0)} islinearly independent.

If A has a set of k linearly independent eigenvectors {u1, u2, . . . ,uk}, whereAui = λiui , i = 1, 2, . . . , k,

then, as in Section 7.2, we can immediately construct k solutions to x′ = Ax, namely,x1(t) = eλ1tu1, x2(t) = eλ2tu2, . . . , xk(t) = eλktuk.

Also, since xi (0) = ui , it follows that {x1(0), x2(0), . . . , xk(0)} is a linearly independentset. The difficulty arises when k < n, for then we must produce an additional set ofn − k solutions of x′ = Ax. In this connection, recall that an (n × n) matrix A iscalled defective ifA has fewer than n linearly independent eigenvectors. (Note: Distincteigenvalues give rise to linearly independent eigenvectors; so A can be defective only ifthe characteristic equation p(t) = 0 has fewer than n distinct roots.)

Generalized EigenvectorsA complete analysis of the initial-value problem is simplified considerably if we assumeA is a Hessenberg matrix. If A is not a Hessenberg matrix, then a simple change ofvariables can be used to convert x′ = Ax to an equivalent problem y′ = Hy, where His a Hessenberg matrix. In particular, suppose that QAQ−1 = H and let y(t) = Qx(t).Therefore, we see that x(t) = Q−1y(t) and x′(t) = Q−1y′(t). Thus, x′(t) = Ax(t) isthe same as

Q−1y′(t) = AQ−1y(t).Multiplying both sides by Q, we obtain the related equation y′ = Hy, where H =QAQ−1 is a Hessenberg matrix. Given that we can always make this change of variables,we will focus for the remainder of this section on the problem of solving

x′(t) = Hx(t), x(0) = x0. (2)

As we know, ifH is (n×n) and has n linearly independent eigenvectors, we can alwayssolve (2). To see how to solve (2) when H is defective, let us suppose that p(t) is thecharacteristic polynomial for H . If we write p(t) in factored form as

p(t) = (t − λ1)m1(t − λ2)

m2 · · · (t − λk)mk,

Page 510: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 66 Page number 548 cyan black

548 Chapter 7 Eigenvalues and Applications

where m1 + m2 + · · · + mk = n, then we say that the eigenvalue λi has algebraicmultiplicity mi . Given λi , we want to construct mi solutions of x′ = Hx that areassociated with λi . For example, suppose that λ is an eigenvalue of H of algebraicmultiplicity 2. We have one solution of x′ = Hx, namely, x(t) = eλtu, whereHu = λu;and we would like another solution. To find this additional solution, we note that thetheory from elementary differential equations suggests that we look for another solutionto x′ = Hx that is of the form x(t) = teλta + eλtb, where a �= θ , b �= θ . To see whatconditions a and b must satisfy, we calculate

x′(t) = tλeλta + eλta + λeλtbHx(t) = teλtHa + eλtHb.

After we equate x′(t) with Hx(t) and group like powers of t , our guess leads to theconditions

tλeλta = teλtHaeλt (a + λb) = eλtHb. (3)

If (3) is to hold for all t , we will need

λa = Haa + λb = Hb,

or equivalently,

(H − λI)a = θ(H − λI)b = a,

(4)

where a and b are nonzero vectors. From (4) we see that a is an eigenvector and that(H − λI)2b = θ , but (H − λI)b �= θ . We will call b a generalized eigenvector oforder 2. If we can find vectors a and b that satisfy (4), then we have two solutions ofx′ = Hx associated with λ, namely,

x1(t) = eλtax2(t) = teλta + eλtb.

Moreover, x1(0) = a, x2(0) = b, and it is easy to see that x1(0) and x2(0) are linearlyindependent. (If c1a+ c2b = θ , then (H − λI)(c1a+ c2b) = θ . Since (H − λI)a = θ ,it follows that c2(H − λI)b = c2a = θ , which shows that c2 = 0. Finally, if c2 = 0,then c1a = θ , which means that c1 = 0.)

Example 1 Solve the initial-value problem x′(t) = Ax(t), x(0) = x0, where

A =[

1 −11 3

]and x0 =

[5−7

].

Solution For matrix A, the characteristic polynomial p(t) = det(A− tI ) is given by

p(t) = (t − 2)2.

Page 511: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 67 Page number 549 cyan black

7.8 Generalized Eigenvectors and Solutions of Systems of Differential Equations 549

Thus the only eigenvalue of A is λ = 2. The only eigenvectors for λ = 2 are thosevectors u of the form

u = a[

1−1

], a �= 0.

Since A is defective, we look for a generalized eigenvector associated with λ = 2. Thatis, as in (4) we look for a vector v such that

(A− 2I )v = u, u =[

1−1

].

In detail, the equation (A− 2I )v = u is given by[ −1 −11 1

][v1

v2

]=[

1−1

].

Now, although matrixA−2I is singular, the equation above does have a solution, namely,v1 = 1, v2 = −2.

Thus we have found two solutions to x′(t) = Ax(t), x1(t) and x2(t), where

x1(t) = e2tu and x2(t) = te2tu+ e2tv.

The general solution of x′(t) = Ax(t) is

x(t) = a1e2tu+ a2(te

2tu+ e2tv).

To satisfy the initial condition x(0) = x0 = [5,−7]T , we need a1 and a2 so that

x(0) = a1u+ a2v = x0.

Solving for a1 and a2, we find a1 = 3 and a2 = 2.Therefore, the solution is

x(t) = 3e2tu+ 2(te2tu+ e2tv)

=[

5e2t + 2te2t

−7e2t − 2te2t

].

In this section, we will see that the solution procedure illustrated in Example 1 canbe applied to an unreduced Hessenberg matrix. To formalize the procedure, we need adefinition.

Definition 2 Let A be an (n× n) matrix. A nonzero vector v such that

(A− λI)jv = θ(A− λI)j−1v �= θ

is called a generalized eigenvector of order j corresponding to λ.

Note that an eigenvector can be regarded as a generalized eigenvector of order 1.

Page 512: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 68 Page number 550 cyan black

550 Chapter 7 Eigenvalues and Applications

If a matrix H has a generalized eigenvector vm of order m corresponding to λ, thenthe following sequence of vectors can be defined:

(H − λI)vm = vm−1

(H − λI)vm−1 = vm−2...

...

(H − λI)v2 = v1.

(5)

It is easy to show that each vector vr in (5) is a generalized eigenvector of order r andthat {v1, v2, . . . , vm} is a linearly independent set (see Exercise 6). In addition, eachgeneralized eigenvector vr leads to a solution xr (t) of x′ = Hx, where

xr (t) = eλt(vr + tvr−1 + · · · + t r−1

(r − 1)!v1

)(6)

(see Exercise 7).We begin the analysis by proving two theorems that show that an (n×n) unreduced

Hessenberg matrix H has a set of n linearly independent eigenvectors and generalizedeigenvectors. Then, following several examples, we comment on the general case.

Theorem 16 Let H be an (n × n) unreduced Hessenberg matrix, and let λ be an eigenvalue of H ,where λ has algebraic multiplicity m. Then H has a generalized eigenvector of order mcorresponding to λ.

Proof Let p(t) = (t − λ)mq(t) be the characteristic polynomial for H , where q(λ) �= 0.Let vm be the vector vm = q(H)e1. By Theorem 13, (H − λI)m−1q(H)e1 �= θ , so(H − λI)m−1vm �= θ . Also by Theorem 13, (H − λI)mvm = (H − λI)mq(H)e1 =p(H)e1 = θ , so we see that vm is a generalized eigenvector of order m.

Theorem 16 is an existence result that is quite valuable. If we know that an unreducedHessenberg matrixH has an eigenvalue of multiplicitym, then we know that the sequenceof vectors in (5) is defined. Therefore, we can start with an eigenvector v1, then find v2,then find v3, and so on. (If H is a reduced Hessenberg matrix, the sequence (5) mightnot exist.)

Example 2 Consider the unreduced Hessenberg matrix H , where

H =

1 0 01 1 00 1 1

.

Note that the eigenvalue λ = 1 has algebraic multiplicity 3. Find generalized eigenvec-tors of orders 2 and 3.

Solution We work backward up the chain of vectors in (5), starting with an eigenvector v1. Nowall eigenvectors corresponding to λ = 1 have the form u = a[0, 0, 1]T , a �= 0. If wechoose v1 = [0, 0, 1]T , the equation (H − I )v2 = v1 has the form

0 0 01 0 00 1 0

x1

x2

x3

=

001

.

Page 513: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 69 Page number 551 cyan black

7.8 Generalized Eigenvectors and Solutions of Systems of Differential Equations 551

The solution to the equation above is v2 = [0, 1, a]T , where a is arbitrary. For simplicitywe choose a = 0 and obtain v2 = [0, 1, 0]T .

Next, we need to solve (H − I )v3 = v2:

0 0 01 0 00 1 0

x1

x2

x3

=

010

.

The solution to this equation is v3 = [1, 0, a]T , where a is arbitrary. One solution isv3 = [1, 0, 0]T .

To summarize, an eigenvector and two generalized eigenvectors for H are

v1 =

001

, v2 =

010

, v3 =

100

.

(Note: For i = 1, 2, 3, vi is a generalized eigenvector of order i.)

Example 2 illustrates a situation in which a (3 × 3) unreduced Hessenberg matrixhas a set of eigenvalues and generalized eigenvectors that form a basis for R3. As thenext theorem demonstrates, Example 2 is typical.

Theorem 17 Let H be an (n × n) unreduced Hessenberg matrix. There is a set {u1, u2, . . . ,un}of linearly independent vectors in which each ui is an eigenvector or a generalizedeigenvector of H .

Proof Suppose that H has eigenvalues λ1, λ2, . . . , λk , where λi has multiplicity mi . Thus thecharacteristic polynomial has the form

p(t) = (t − λ1)m1(t − λ2)

m2 · · · (t − λk)mk ,where m1 +m2 + · · · +mk = n. By Theorem 16, each eigenvalue λi has an associatedgeneralized eigenvector of ordermi . For each eigenvalue λi , we can use (5) to generate aset ofmi generalized eigenvectors having order 1, 2, . . . , mi . Let us denote this collectionof n generalized eigenvectors as

v1, v2, . . . , vm1 ,w1,w2, . . . ,wr , (7)

where m1 + r = n. In (7), vj is a generalized eigenvector of order j corresponding tothe eigenvalue λi , whereas each of the vectors wj is a generalized eigenvector for oneof λ2, λ3, . . . , λk .

To show that the vectors in (7) are linearly independent, consider

a1v1 + a2v2 + · · · + am1vm1 + b1w1 + b2w2 + · · · + brwr = θ . (8)

Now for q(t) = (t − λ2)m2 · · · (t − λk)mk and for 1 ≤ j ≤ r , we note that

q(H)wj = θ ,sincewj is a generalized eigenvector of ordermi or less corresponding to some λi . (Thatis, (H −λiI )miwj = θ for some λi and (H −λiI )mi is one of the factors of q(H). Thus,q(H)wj = θ for any j, 1 ≤ j ≤ r .)

Page 514: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 70 Page number 552 cyan black

552 Chapter 7 Eigenvalues and Applications

Now multiplying both sides of (8) by q(H), we obtain

a1q(H)v1 + a2q(H)v2 + · · · + am1q(H)vm1 = θ . (9)

Finally, we can use (5) to show that a1, a2, . . . , am1 are all zero in Eq. (9) (seeExercise 8). Since we could have made this argument for any of the eigenvaluesλ2, λ3, . . . , λk , it follows that all the coefficients bj in Eq. (8) are also zero.

Example 3 Find the general solution of x′ = Hx, where

H =

1 0 01 3 00 1 1

.

Solution The characteristic polynomial isp(t) = (t−1)2(t−3); so λ = 1 is an eigenvalue of mul-tiplicity 2, whereas λ = 3 is an eigenvalue of multiplicity 1. Eigenvectors correspondingto λ = 1 and λ = 3 are (respectively)

v1 =

001

and w1 =

021

.

Thus we have two solutions of x′ = Hx, namely, x(t) = etv1 and x(t) = e3tw1. Weneed one more solution to solve the initial-value problem for any x0 inR3. To find a thirdsolution, we need a vector v2 that is a generalized eigenvector of order 2 correspondingto λ = 1. According to the previous remarks, we solve (H − I )x = v1 and obtain

v2 =−2

10

.

By Eq. (6), a third solution to x′ = Hx is given by x(t) = et (v2 + tv1). Clearly{v1, v2,w1} is a basis for R3; so if x0 = c1v1 + c2v2 + c3w1, then

x(t) = c1etv1 + c2e

t (v2 + tv1)+ c3e3tw1

will satisfy x′ = Hx, x(0) = x0.

Example 4 Find the general solution of x′ = Ax, where

A =−1 −8 1−1 −3 2−4 −16 7

.

Solution Reducing A to Hessenberg form, we have H = QAQ−1, where H,Q, andQ−1 are

H =−1 −4 1−1 5 2

0 −8 −1

, Q =

1 0 00 1 00 −4 1

, and Q−1 =

1 0 00 1 00 4 1

.

Page 515: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 71 Page number 553 cyan black

7.8 Generalized Eigenvectors and Solutions of Systems of Differential Equations 553

The change of variables y(t) = Qx(t) converts x′ = Ax, x(0) = x0 to the problemy′ = Hy, y(0) = Qx0.

The characteristic polynomial for H is p(t) = (t − 1)3, so λ = 1 is an eigenvalueof multiplicity 3 of H . Up to a scalar multiple, the only eigenvector of H is

v1 =

4−1

4

.

We obtain two generalized eigenvectors for H by solving (H − I )x = v1 to get v2, and(H − I )x = v2 to get v3. These generalized eigenvectors are

v2 =

1−1

2

and v3 =

3−1

3

.

Thus the general solution of y′ = Hy is

y(t) = et[c1v1 + c2(v2 + tv1)+ c3

(v3 + tv2 + t

2

2v1

)],

and we can recover x(t) from x(t) = Q−1y(t).If H is an (n× n) reduced Hessenberg matrix, it can be shown that Rn has a basis

consisting of eigenvectors and generalized eigenvectors of H . This general result isfairly difficult to establish, however, and we do not do so here.

7.8 EXERCISES

1. Find a full set of eigenvectors and generalized eigen-vectors for each of the following.

a)

[1 −11 3

]b)

−2 0 −2−1 1 −2

0 1 −1

c)

−6 31 −14−1 6 −2

0 2 1

2. Find a full set of eigenvectors and generalized eigen-vectors for the following. (Note: λ = 2 is the onlyeigenvalue of B.)

A =

1 0 0 01 1 0 00 1 1 00 0 1 1

, B =

2 3 −21 −32 7 −41 −50 1 −5 −10 0 4 4

3. Solve x′ = Ax, x(0) = x0 by transforming A to Hes-senberg form, where

x0 =−1−1

1

, and

a) A =

8 −6 211 −1 3−3 2 −8

,

b) A =

2 1 −1−3 −1 1

9 3 −4

,

c) A =

1 1 −1−3 −2 1

9 3 −5

.

4. Give the general solution of x′ = Ax, where A is fromExercise 2.

Page 516: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 72 Page number 554 cyan black

554 Chapter 7 Eigenvalues and Applications

5. Repeat Exercise 4, where A is in part (c) of Exer-cise 1.

6. Prove that each vector vr in (5) is a generalized eigen-vector of order r and that {v1, v2, . . . , vm} is linearlyindependent.

7. Prove that the functions xr (t) defined in (6) are solu-tions of x′ = Hx.

8. Prove that the coefficients a1, a2, . . . in (9), are allzero. [Hint: Multiply (9) by (H − λ1I )

m1−1.]

SUPPLEMENTARY EXERCISES

1. Consider the quadratic form q(x) = x21 +3x1x2+x2

2 .Describe all possible real (2×2)matricesA such thatq(x) = xTAx.

2. Let

A =[

2 6+ aa 2

],

where a is a real constant. For what values a is Adefective?

3. Let

B =[

2 60 2

],

and consider the quadratic form defined by q(x) =xT Bx.a) Find a vector x such that q(x) < 0.

b) Note that B has only positive eigenvalues. Whydoes this fact not contradict Theorem 2 inSection 7.1?

4. Let q(t) = t2 + 3t + 2 and let A be a nonsingular(n×n)matrix. Show that q(A) and q(A−1) commutein the sense that q(A)q(A−1) = q(A−1)q(A).

5. A positive definite matrix A can be factored as A =LLT , where L is a nonsingular lower-triangular ma-trix. (Such a factorization is called the Cholesky de-composition.) Find the Cholesky decomposition foreach of the following.

a) A =[

4 66 10

]b) A =

1 3 13 13 71 7 6

CONCEPTUAL EXERCISES

1. Let A be a (3 × 3) nonsingular matrix. Use theCayley–Hamilton theorem to show that A−1 can berepresented as A−1 = aI + bA+ cA2.

2. Let A and B be similar (n× n) matrices and let p(t)denote a kth-degree polynomial. Show that p(A) andp(B) are also similar.

3. Let A be an (n × n) symmetric matrix and supposethat the quadratic form q(x) = xTAx is positive def-inite. Show that the diagonal entries of A, aii for1 ≤ i ≤ n, are all positive.

4. Let A be a (3× 3) matrix.a) Use the Cayley–Hamilton theorem to show thatA4 can be represented as

A4 = aI + bA+ cA2.

b) Make an informal argument that Ak can berepresented as a quadratic polynomial in A fork = 5, 6, . . . .

Page 517: June20,2001 14:01 i56-frontmatter Sheetnumber1 Pagenumberi ...math.sjtu.edu.cn/faculty/tyaglov/courses/linear algebra/The_book.pdf · June20,2001 14:01 i56-frontmatter Sheetnumber8

May 31, 2001 14:36 i56-ch07 Sheet number 73 Page number 555 cyan black

MATLAB Exercises 555

MATLAB EXERCISES

This exercise gives a concrete illustration of the Cayley–Hamilton theorem (the corollary toTheorem 15).

1. The Cayley–Hamilton theorem Begin by generating a randomly selected (4 × 4) ma-trix with integer entries. (As was shown in Chapter 6, the following simple MAT-LAB command will create such a matrix: A = round(20*rand(4,4) - 10*ones(4, 4)).) Next, use the MATLAB command poly(A)to obtain the coefficients of thecharacteristic polynomial forA; the vector produced by the poly(A)command gives thecoefficients of y = p(t), beginning with the coefficient of t4 and ending with the constantterm (the coefficient of t0).a) Calculate the matrix polynomial p(A) and verify that it is indeed the (4× 4) zero

matrix.b) The Cayley–Hamilton theorem can be used to find the inverse of a nonsingular

matrix. To see how, suppose that p(t) = t4 + a3t3 + a2t

2 + a1t + a0. Then, as youillustrated in part a),

A4 + a3A3 + a2A

2 + a1A+ a0I = 0. (1)

If we now multiply Eq. (1) by A−1 and solve for A−1, we will have a simple formulafor A−1 in terms of powers of A. Carry out this idea using MATLAB and verify thematrix you form using powers of A is indeed the inverse of A.

c) Equation (1) can also be used to generate high powers of A without actually formingthese high powers. For example, we can solve Eq. (1) for A4 in terms of I , A, A2, andA3. Multiplying this equation by A, we obtain a formula for A5 in terms of A, A2, A3,and A4; but, since we already have A4 represented in terms of I , A, A2, and A3, wecan clearly represent A5 in terms of I , A, A2, and A3. Use this idea to form the matrixA6 as a linear combination of I , A, A2, and A3. Check your calculation by usingMATLAB to form A6 directly.

2. Krylov's method for finding the characteristic polynomial  Algorithm 1 (Krylov's method) is presented in Section 7.4, where the method begins with an unreduced Hessenberg matrix. The original version of Krylov's method, however, does not insist on beginning with an unreduced Hessenberg matrix or with the starting vector w0 = e1. If we begin with an arbitrary square matrix A and with an arbitrary starting vector w0, however, we cannot guarantee that the algorithm will produce the characteristic polynomial (the algorithm will always produce a polynomial factor of the characteristic polynomial).

As in Exercise 1, choose a randomly generated (4 × 4) matrix with integer entries. Begin with the vector w0 = [1, 1, 1, 1]T and generate vectors w1, w2, w3, and w4, as in Eq. (3) of Algorithm 1. Then solve the linear system (4) displayed in Algorithm 1. Verify that the coefficients you obtain are the coefficients of the characteristic polynomial for A. Note that the only way the algorithm can fail is if the vectors w0, . . . , w3 are linearly dependent.
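A sketch of this computation in MATLAB (the array names W and a are ours; by the Cayley–Hamilton theorem, p(A)w0 = 0 gives w4 + a3w3 + a2w2 + a1w1 + a0w0 = 0, which is the system solved below):

% Sketch for Exercise 2: Krylov's method with an arbitrary start vector.
A = round(20*rand(4,4) - 10*ones(4,4));
W = zeros(4,5);
W(:,1) = [1; 1; 1; 1];            % w0
for k = 1:4
    W(:,k+1) = A*W(:,k);          % w_k = A*w_(k-1)
end
a = W(:,1:4) \ (-W(:,5));         % solve [w0 w1 w2 w3]*a = -w4
disp([1 a(4) a(3) a(2) a(1)])     % [1 a3 a2 a1 a0] ...
disp(poly(A))                     % ... should match poly(A)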

Answers to Selected Odd-Numbered Exercises*

*Many of the problems have answers that contain parameters or answers that can be written in a variety of forms. For problems of this sort, we have presented one possible form of the answer. Your solution may have a different form and still be correct. You can frequently check your solution by inserting it in the original problem or by showing that two different forms for the answer are equivalent.

CHAPTER 1

Exercises 1.1, p. 12
1. Linear    3. Linear    5. Nonlinear
7. x1 + 3x2 = 7
   4x1 − x2 = 2
9. x1 + x2 = 0
   3x1 + 4x2 = −1
   −x1 + 2x2 = −3
11. [Figure: the lines x − y = 1 and 2x + y = 5; there is a unique solution.]
13. [Figure: the lines 3x + 2y = 6 and −6x − 4y = −12 coincide; there are infinitely many solutions.]
17. x1 = −3t + 4, x2 = 2t − 1, x3 = t
19. A = [2 1 6; 4 3 8]
21. Q = [1 4 −3; 2 1 1; 3 2 1]
23. 2x1 + x2 = 6    and    x1 + 4x2 = −3
    4x1 + 3x2 = 8           2x1 + x2 = 1
                            3x1 + 2x2 = 1
25. A = [1 1 −1; 2 0 −1], B = [1 1 −1 2; 2 0 −1 1]
27. A = [1 1 2; 3 4 −1; −1 1 1], B = [1 1 2 6; 3 4 −1 5; −1 1 1 2]
29. A = [1 1 1; 2 3 1; 1 −1 3], B = [1 1 1 1; 2 3 1 2; 1 −1 3 2]
31. x1 + 2x2 − x3 = 1
    − x2 + 3x3 = 1
    5x2 − 2x3 = 6
33. x1 + x2 = 9
    − 2x2 = −2
    − 2x2 = −21
35. x1 + 2x2 − x3 + x4 = 1
    x2 + x3 − x4 = 3
    3x2 + 6x3 = 1

Exercises 1.2, p. 26
1. a) The matrix is in echelon form.
   b) The operation R1 − 2R2 yields reduced echelon form [1 0; 0 1].
3. a) The operations R2 − 2R1, (1/2)R1, R2 − 4R1, (1/5)R2 yield echelon form [1 3/2 1/2; 0 1 2/5].
5. a) The operations R1 ↔ R2, (1/2)R1, (1/2)R2 yield echelon form [1 0 1/2 2; 0 0 1 3/2].
7. a) The matrix is in echelon form.
   b) The operations R1 − 2R3, R2 − 4R3, R1 − 3R2 yield reduced echelon form [1 0 0 5; 0 1 0 −2; 0 0 1 1].
9. a) The operation (1/2)R2 yields echelon form [1 2 −1 −2; 0 1 −1 −3/2; 0 0 0 1].
11. x1 = 0, x2 = 0
13. x1 = −2 + 5x3, x2 = 1 − 3x3, x3 arbitrary
15. The system is inconsistent.
17. x1 = x3 = x4 = 0; x2 arbitrary
19. The system is inconsistent.
21. x1 = −1 − (1/2)x2 + (1/2)x4, x3 = 1 − x4, x2 and x4 arbitrary, x5 = 0
23. Inconsistent    25. x1 = 2 − x2, x2 arbitrary
27. x1 = 2 − x2 + x3, x2 and x3 arbitrary
29. x1 = 3 − 2x3, x2 = −2 + 3x3, x3 arbitrary
31. x1 = 3 − (7x4 − 16x5)/2, x2 = (x4 + 2x5)/2, x3 = −2 + (5x4 − 12x5)/2, x4 and x5 arbitrary
33. Inconsistent    35. Inconsistent
37. All values of a except a = 8
39. a = 3 or a = −3
41. α = π/3 or α = 5π/3; β = π/6 or β = 5π/6
45. [1 × ×; 0 1 ×], [1 × ×; 0 0 1], [1 × ×; 0 0 0], [0 1 ×; 0 0 1], [0 1 ×; 0 0 0], [0 0 1; 0 0 0], [0 0 0; 0 0 0]
47. The operations R2 − 2R1, R1 + 2R2, −R2 transform B to I. The operations R2 − 3R1, R1 + R2, (−1/2)R2 reduce C to I, so the operations −2R2, R1 − R2, R2 + 3R1 transform I to C. Thus the operations R2 − 2R1, R1 + 2R2, −R2, −2R2, R1 − R2, R2 + 3R1 transform B to C.
49. N = 135
51. The amounts were $39, $21, and $12.
53. Let A denote the number of adults, S the number of students, and C the number of children. Possible solutions are: A = 5k, S = 67 − 11k, C = 12 + 6k, where k = 0, 1, . . . , 6.
55. n(n + 1)/2
57. n(n + 1)(2n + 1)(3n² + 3n − 1)/30

Exercises 1.3, p. 37
1. [1 1 0 5/6; 0 0 1 2/3; 0 0 0 0; 0 0 0 0]; n = 3, r = 2; x2 is the independent variable.
3. [1 0 4 0 13/2; 0 1 −1 0 −3/2; 0 0 0 1 1/2]; n = 4, r = 3; x3 is the independent variable.
5. r = 2, r = 1, r = 0
7. Infinitely many solutions
9. Infinitely many solutions, a unique solution, or no solution
11. A unique solution or infinitely many solutions
13. Infinitely many solutions
15. A unique solution or infinitely many solutions
17. Infinitely many solutions
19. There are nontrivial solutions.
21. There is only the trivial solution.
23. a = 1
25. a) [1 0 0; 0 1 0; 0 0 1; 0 0 0]
27. 7x + 2y − 30 = 0
29. −3x² + 3xy + y² − 54y + 113 = 0

Exercises 1.4, p. 44
1. a) x1 + x4 = 1200
      x1 + x2 = 1000
      x3 + x4 = 600
      x2 + x3 = 400
   b) x1 = 1100, x2 = −100, x3 = 500;
   c) The minimum value is x1 = 600 and the maximum value is x1 = 1000.
3. x2 = 800, x3 = 400, x4 = 200
5. I1 = 0.05, I2 = 0.6, I3 = 0.55
7. I1 = 35/13, I2 = 20/13, I3 = 15/13

Exercises 1.5, p. 58
1. a) [2 0; 2 6];  b) [0 4; 2 4];  c) [0 −6; 6 18];  d) [−6 8; 4 6]
3. [−2 −2; 0 0]    5. [−1 −1; 0 0]
7. a) [3; −3];  b) [3; 4];  c) [0; 0]
9. a) [2; 1];  b) [0; 1];  c) [17; 14]
11. a) [2; 3];  b) [20; 16]
13. a1 = 11/3, a2 = −4/3
15. a1 = −2, a2 = 0    17. No solution
19. a1 = 4, a2 = −3/2    21. w2 = [1; 3]
23. w3 = [−1; 2]    25. [−4 6; 2 12]
27. [4 12; 4 10]    29. [0 0; 0 0]
31. AB = [5 16; 5 18], BA = [4 11; 6 19]
33. Au = [11; 13], vA = [8, 22]
35. 66    37. [5 10; 8 12; 15 20; 8 17]
39. [27; 28; 43; 47]
41. (BA)u = B(Au) = [37; 63]
43. x = [x1; x2; x3] = [−2; 3; 0] + x3[1; −2; 1]
45. x = [x1; x2; x3; x4; x5] = x3[1; −2; 1; 0; 0] + x5[1; −1; 0; −1; 1]
47. x = [x1; x2; x3; x4; x5] = x3[1; −2; 1; 0; 0] + x4[2; −3; 0; 1; 0] + x5[3; −4; 0; 0; 1]
49. x = [x1; x2; x3; x4; x5] = x2[1; 1; 0; 0; 0] + x4[2; 0; −2; 1; 0]
51. C(A(Bu))   (CA)(Bu)   (C(AB))u   C((AB)u)
        12         16         20         16
53. a) AB is (2 × 4); BA is undefined.
    b) Neither is defined.
    c) AB is undefined; BA is (6 × 7).
    d) AB is (2 × 2); BA is (3 × 3).
    e) AB is (3 × 1); BA is undefined.
    f) Both are (2 × 4).
    g) AB is (4 × 4); BA is (1 × 1).
61. a) For (i), A = [2 −1; 1 1], x = [x1; x2], b = [3; 3]; for (ii), A = [1 −3 1; 1 −2 1; 0 1 −1], x = [x1; x2; x3], b = [1; 2; −1].
    b) For (i), x1[2; 1] + x2[−1; 1] = [3; 3]; for (ii), x1[1; 1; 0] + x2[−3; −2; 1] + x3[1; 1; −1] = [1; 2; −1].
    c) For (i), b = 2A1 + A2; for (ii), b = 2A1 + A2 + 2A3.
63. B = [2 −1; −1 1]
65. a) B = [−1 6; 1 0];  b) Not possible;  c) B = [−2a −2b; a b], a and b arbitrary

Exercises 1.6, p. 69
1. (DE)F = D(EF) = [23 23; 29 29]
3. DE = [8 15; 11 18], ED = [12 27; 7 14]
5. Fu = Fv = [0; 0]
7. [3 4 2; 1 7 6]    9. [5 5; 9 9]
11. [0 0]    13. −6    15. 36    17. 2    19. √2
21. √29    23. 0    25. 2√5
29. D and F are symmetric.
31. AB is symmetric if and only if AB = BA.
33. xTDx = x1² + 3x2² + (x1 + x2)² > 0
35. [−3 3; 3 −3]    37. [−27 −9; 27 9]
39. [−12 18 24; 18 −27 −36; 24 −36 −48]
41. a) x = [−10; 8];  b) x = [6; −2]
57. n = 5, m = 7    59. n = 4, m = 6    61. n = 5, m = 5

Exercises 1.7, p. 78
1. Linearly independent
3. Linearly dependent, v5 = 3v1
5. Linearly dependent, v3 = 2v1
7. Linearly dependent, u4 = 4u5
9. Linearly independent
11. Linearly dependent, u4 = 4u5
13. Linearly dependent, u4 = (16/5)u0 + (12/5)u1 − (4/5)u2
15. Those in Exercises 5, 6, 13, and 14
17. Singular; x1 = −2x2    19. Singular; x1 = −2x2
21. Singular; x1 = x2 = 0, x3 arbitrary
23. Nonsingular
25. Singular; x2 = x3 = 0, x1 arbitrary
27. Nonsingular    29. a = 6
31. b(a − 2) = 4    33. c − ab = 0
35. v3 = A2    37. v2 = (C1 + C2)/2
39. u3 = (−8F1 − 2F2 + 9F3)/3
41. b = −11v1 + 7v2    43. b = 0v1 + 0v2
45. b = −3v1 + 2v2
47. a) Any value a    b) Any value a

Exercises 1.8, p. 90
1. p(t) = (−1/2)t² + (9/2)t − 1
3. p(t) = 2t + 3
5. p(t) = 2t³ − 2t² + 3t + 1
7. y = 2e^(2x) + e^(3x)
9. y = 3e^(−x) + 4e^x + e^(2x)
11. ∫ f(t) dt from 0 to 3h ≈ (3h/2)[f(h) + f(2h)]
13. ∫ f(t) dt from 0 to 3h ≈ (3h/8)[f(0) + 3f(h) + 3f(2h) + f(3h)]
15. ∫ f(t) dt from 0 to h ≈ (h/2)[−f(−h) + 3f(0)]
17. f′(0) ≈ [−f(0) + f(h)]/h
19. f′(0) ≈ [−3f(0) + 4f(h) − f(2h)]/(2h)
21. f″(0) ≈ [f(−h) − 2f(0) + f(h)]/h²
27. p(t) = t³ + 2t² + 3t + 2
29. p(t) = t³ + t² + 4t + 3
35. f′(a) ≈ (1/(12h))[f(a − 2h) − 8f(a − h) + 8f(a + h) − f(a + 2h)]
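These difference formulas are easy to sanity-check numerically; a minimal MATLAB sketch for the second-derivative formula in 21 (the test function and step size are our choices):

% Sketch: check f''(0) ~ [f(-h) - 2*f(0) + f(h)]/h^2 with f(x) = exp(x),
% whose exact second derivative at 0 is 1.
f = @(x) exp(x);
h = 1e-3;
approx = (f(-h) - 2*f(0) + f(h))/h^2;
disp(approx - 1)     % error is O(h^2), roughly 8e-8 here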

Exercises 1.9, p. 102
5. x1 = −3, x2 = 1.5
7. x1 = 14, x2 = −20, x3 = 8
9. If B = (bij) is a (3 × 3) matrix such that AB = I, then 0b11 + 0b21 + 0b31 = 1. Since this is impossible, no such matrix exists.
13. [3 −1; −2 1]    15. [−1/3 2/3; 2/3 −1/3]
17. [1 0 0; −2 1 0; 5 −4 1]
19. [1 −2 0; 3 −3 −1; −6 7 2]
21. [−1/2 −2/3 −1/6 7/6; 1 1/3 1/3 −4/3; 0 −1/3 −1/3 1/3; −1/2 1 1/2 1/2]
23. A−1 = (1/10)[3 2; −2 2]
25. A has no inverse    27. λ = 2 and λ = −2
29. x1 = 6, x2 = −8    31. x1 = 18, x2 = 13
33. x1 = 5/2, x2 = 5/2
35. Q−1 = C−1A−1 = [−3 1; 3 5]
37. Q−1 = (A−1)T = [3 0; 1 2]
39. Q−1 = (A−1)T(C−1)T = [−3 3; 1 5]
41. Q−1 = BC−1 = [1 5; −1 4]
43. Q−1 = (1/2)A−1 = [3/2 1/2; 0 1]
45. Q−1 = B(C−1A−1) = [3 11; −3 7]
47. B = [1 10; 15 12; 3 3]; C = [13 12 8; 2 3 5]
49. (AB)−1 = B−1A−1 = [2 35 1; 14 35 34; 23 12 70],
    (3A)−1 = (1/3)A−1 = [1/3 2/3 5/3; 1 1/3 2; 2/3 8/3 1/3],
    (AT)−1 = (A−1)T = [1 3 2; 2 1 8; 5 6 1]
63. b0 = −5, b1 = 2    65. b0 = −7, b1 = 0

CHAPTER 2

Exercises 2.1, p. 126
1. [Figure: vectors u and v in the xy-plane.] For vector AB the x-component is −4 − 0 = −4 and the y-component is 3 − (−2) = 5. For vector CD the x-component is 1 − 5 = −4 and the y-component is 4 − (−1) = 5. The vectors are equal.
3. [Figure: vectors u and v in the xy-plane.] For vector AB the x-component is 0 − (−4) = 4 and the y-component is 1 − (−2) = 3. For vector CD the x-component is 3 − 0 = 3 and the y-component is 2 − (−2) = 4. The vectors are not equal.
5. a) For u: ‖u‖ = √((2 − (−3))² + (2 − 5)²) = √34. For v: ‖v‖ = √((−2 − 3)² + (7 − 4)²) = √34. Therefore, ‖u‖ = ‖v‖.
   b) Segment AB has slope (2 − 5)/(2 − (−3)) = −3/5. Segment CD has slope (7 − 4)/(−2 − 3) = 3/(−5).
   c) For vector AB the x-component is 2 − (−3) = 5 and the y-component is 2 − 5 = −3. For vector CD the x-component is −2 − 3 = −5 and the y-component is 7 − 4 = 3. The vectors are not equal.
   d) [Figure: vectors u and v in the xy-plane.]
7. D = (−2, 5)    9. D = (−1, 1)
11. v1 = 5, v2 = 3    13. v1 = −6, v2 = 5
15. B = (3, 3)    17. A = (2, 4)
19. a) B = (3, 2), C = (5, 0)
    b) [Figure: u, v, and u + v.]
21. a) Q = (7, 1)
    b) [Figure: u, v, u + v, and the point Q.]
23. a) B = (−1, 4), C = (0, −1)
    b) [Figure: u, v, and u − v.]
25. a) B = (3, 3), C = (6, 1)
    b) [Figure: v and 2v.]
27. a) D = (6, −3)
    b) [Figure: v and 2v.]
29. (1/5)[3; 4]
31. (3/√13)i − (2/√13)j
33. B = (1, −2)    35. B = (1/3, −7)
37. u + v = [2; 4], u − 3v = [−6; 8]
39. u + v = 4i + j, u − 3v = −4i − 7j

Exercises 2.2, p. 134
1. [Figure: the points P = (1, 2, 1) and Q = (0, 2, 2).]
   d(P, Q) = √((0 − 1)² + (2 − 2)² + (2 − 1)²) = √2
3. [Figure: the points P = (1, 0, 0) and Q = (0, 0, 1).]
   d(P, Q) = √((0 − 1)² + (0 − 0)² + (1 − 0)²) = √2
5. M = (1, 4, 4); d(M, O) = √((0 − 1)² + (0 − 4)² + (0 − 4)²) = √33
7. B = (0, 3/2, −3/2), C = (1, 3, 0), D = (2, 9/2, 3/2)
9. line    11. plane
19. a) v = [3; 2; −3];  b) D = (2, 4, −2); v = [3; 2; −3]
21. a) v = [0; 5; −7];  b) D = (−1, 7, −6)
23. λ = 2    25. A = (−2, 3, −1)
27. a) u + 2v = [7; 7; 10];  b) ‖u − v‖ = 3;  c) w = [−1; 1/2; −1]
29. a) u + 2v = [−1; 1; 2];  b) ‖u − v‖ = √150;  c) w = [7/2; −5; 1/2]
31. u = 2k
33. u = −(5/3)v = [5/3; −10/3; −10/3]
35. u = [−2; −4; −1]

Exercises 2.3, p. 146
1. −2    3. −1
5. cos θ = 11/√290    7. cos θ = 1/6
9. θ = π/6    11. θ = π/2
13. u = i + 3j + 4k    15. u = 3i + 4k
17. u = −i + 3j + k
19. R = (33/10, 11/10)  [Figure: the vectors u, w, and the angle θ.]
21. R = (−3, −1)  [Figure: the vectors u, w, and the angle θ.]
23. u1 = [5; 5], u2 = [2; −2]
25. u1 = [2; 4; 2], u2 = [4; 0; −4]
33. [−2; 2; 8]    35. 3i − j − 5k
37. [−2; −5; 4]    39. [2; −3; 1]
41. [1; 1; −4]
43. 4√6 square units    45. 3√11 square units
47. 24 cubic units    49. not coplanar

Exercises 2.4, p. 157
1. x = 2 + 3t, y = 4 + 2t, z = −3 + 4t
3. x = t, y = 4 − 2t, z = 1 + 3t
5. The lines are parallel.
7. The lines are not parallel.
9. x = 1 + 3t, y = 2 + 4t, z = 1 − t
11. The line intersects the plane at P = (−1, 4, 1).
13. The line intersects the plane at P = (−8, −13, 36).
15. 6x + y − z = 16    17. −7x − y + 4z = 5
19. 2x − 7y − 3z = 1
21. n = [2/3; 1/3; −2/3]
23. x + 2y − 2z = 17
25. x = 4 − t, y = 5 + t, z = t

CHAPTER 3

Exercises 3.1, p. 166
1. [Figure: u = (3, 1) and −u = (−3, −1).]
3. [Figure: u = (3, 1) and −3u = (−9, −3).]
5. [Figure: u = (3, 1), v = (1, 2), and u + v = (4, 3).]
7. [Figure: u = (3, 1), v = (1, 2), −v, and u − v = (2, −1).]
9. [Figure: x = (0, 1, 3) and 2x = (0, 2, 6).]
11. [Figure: x = (0, 1, 3), x + y = (2, 2, 3), and −y, where y = (2, 1, 0).]
13. [Figure: the line x + 3y = 0.]
15. [Figure: the line x + y = 0.]
17. [Figure: the circle x² + y² = 4.]
19. W is the plane with equation x + y + 2z = 0.
21. W is the set of points on the upper half of the sphere x² + y² + z² = 1.
23. W = {[a; 0] : a any real number}
25. W = {[a; 2] : a any real number}
27. W = {[x1; x2; x3] : x1 + x2 − 2x3 = 0}
29. W = {[0; x2; x3] : x2, x3 any real number}

Exercises 3.2, p. 174
1. W is a subspace. W is the set of points on the line with equation x = 2y.
3. W is not a subspace.
5. W is the subspace consisting of the points on the y-axis.
7. W is not a subspace.
9. W is the subspace consisting of the points on the plane 2x − y − z = 0.
11. W is not a subspace.
13. W is not a subspace.
15. W is the subspace consisting of the points on the line with parametric equations x = 2t, y = −t, z = t.
17. W is the subspace consisting of the points on the x-axis.
19. W is the set of points on the plane x + 2y + 3z = 0.
23. W is the line formed by the two intersecting planes x + 2y + 2z = 0 and x + 3y = 0. The line has parametric equations x = −6t, y = 2t, z = t.
25. W is the set of points on the plane x − z = 0.

Exercises 3.3, p. 186
1. Sp(S) = {x: x1 + x2 = 0}; Sp(S) is the line with equation x + y = 0.
3. Sp(S) = {θ}; Sp(S) is the point (0, 0).
5. Sp(S) = R2
7. Sp(S) = {x: 3x1 + 2x2 = 0}; Sp(S) is the line with equation 3x + 2y = 0.
9. Sp(S) = R2
11. Sp(S) = {x: x1 + x2 = 0}; Sp(S) is the line with equation x + y = 0.
13. Sp(S) = {x: x2 + x3 = 0 and x1 = 0}; Sp(S) is the line through (0, 0, 0) and (0, −1, 1). The parametric equations for the line are x = 0, y = −t, z = t.
15. Sp(S) = {x: 2x1 − x2 + x3 = 0}; Sp(S) is the plane with equation 2x − y + z = 0.
17. Sp(S) = R3
19. Sp(S) = {x: x2 + x3 = 0}; Sp(S) is the plane with equation y + z = 0.
21. The vectors u in b), c), and e) are in Sp(S); for b), u = x; for c), u = v; for e), u = 3v − 4x.
23. d and e    25. x and y
27. N(A) = {x in R2: −x1 + 3x2 = 0}; R(A) = {x in R2: 2x1 + x2 = 0}
29. N(A) = {θ}; R(A) = R2
31. N(A) = {x in R3: x1 + 2x2 = 0 and x3 = 0}; R(A) = R2
33. N(A) = {x in R2: x2 = 0}; R(A) = {x in R3: x2 = 2x1 and x3 = 3x1}
35. N(A) = {x in R3: x1 = −7x3 and x2 = 2x3}; R(A) = {x in R3: −4x1 + 2x2 + x3 = 0}
37. N(A) = {θ}; R(A) = R3
39. a) The vectors b in ii), v), and vi) are in R(A).
    b) For ii), x = [1, 0]T is one choice; for v), x = [0, 1]T is one choice; for vi), x = [0, 0]T is one choice.
    c) For ii), b = A1; for v), b = A2; for vi), b = 0A1 + 0A2.
41. a) The vectors b in i), iii), v), and vi) are in R(A).
    b) For i), x = [−1, 1, 0]T is one choice; for iii), x = [−2, 3, 0]T is one choice; for v), x = [−2, 1, 0]T is one choice; for vi), x = [0, 0, 0]T is one choice.
    c) For i), b = −A1 + A2; for iii), b = −2A1 + 3A2; for v), b = −2A1 + A2; for vi), b = 0A1 + 0A2 + 0A3.
47. w1 = [−2, 1, 3]T, w2 = [0, 3, 2]T
49. w1 = [1, 2, 2]T, w2 = [0, 3, 1]T

Exercises 3.4, p. 200
1. {[1, 0, 1, 0]T, [−1, 1, 0, 1]T}
3. {[1, 1, 0, 0]T, [−1, 0, 1, 0]T, [3, 0, 0, 1]T}
5. {[−1, 1, 0, 0]T, [0, 0, 1, 0]T, [0, 0, 0, 1]T}
7. {[2, 1, −1, 0]T, [−1, 0, 0, 1]T}
9. a) x = 2[1; 0; 1; 0] + [−1; 1; 0; 1];  b) x is not in W;
   c) x = −3[−1; 1; 0; 1];  d) x = 2[1; 0; 1; 0]
11. a) B = [1 0 1 1; 0 1 1 −1; 0 0 0 0]
    b) A basis for N(A) is {[−1, −1, 1, 0]T, [−1, 1, 0, 1]T}.
    c) {A1, A2} is a basis for the column space of A; A3 = A1 + A2 and A4 = A1 − A2.
    d) {[1, 0, 1, 1], [0, 1, 1, −1]} is a basis for the row space of A.
13. a) B = [1 0 −1 2; 0 1 1 −1; 0 0 0 0; 0 0 0 0]
    b) A basis for N(A) is {[1, −1, 1, 0]T, [−2, 1, 0, 1]T}.
    c) {A1, A2} is a basis for the column space of A; A3 = −A1 + A2 and A4 = 2A1 − A2.
    d) {[1, 0, −1, 2], [0, 1, 1, −1]} is a basis for the row space of A.
15. a) B = [1 2 0; 0 0 1; 0 0 0]
    b) A basis for N(A) is {[−2, 1, 0]T}.
    c) {A1, A3} is a basis for the column space of A; A2 = 2A1.
    d) {[1, 2, 0], [0, 0, 1]} is a basis for the row space of A.
17. {[1, 3, 1]T, [0, −1, −1]T} is a basis for R(A).
19. {[1, 2, 2, 0]T, [0, 1, −2, 1]T} is a basis for R(A).
21. a) {[1, 2]T};  b) {[1, 2]T}
23. a) {[1, 2, 1]T, [2, 5, 0]T};  b) {[1, 2, 1]T, [0, 1, −2]T}
25. a) {[0, 1, 0]T};  b) {[−1, 1, 0]T, [0, 0, 1]T};  c) {[−1, 1, 0]T}
27. −2v1 − 3v2 + v3 = θ, so S is linearly dependent. Since v3 = 2v1 + 3v2, if v = a1v1 + a2v2 + a3v3 is in Sp{v1, v2, v3}, then v = (a1 + 2a3)v1 + (a2 + 3a3)v2. Therefore v is in Sp{v1, v2}.
29. The subsets are {v1, v2, v3}, {v1, v2, v4}, {v1, v3, v4}.
33. S is not a basis.    35. S is not a basis.

Exercises 3.5, p. 212
1. S does not span R2.
3. S is linearly dependent.
5. S is linearly dependent and does not span R2.
7. S does not span R3.
9. S is linearly dependent.
11. S is a basis.    13. S is not a basis.
15. dim(W) = 3    17. dim(W) = 2    19. dim(W) = 1
21. {[−2, 1]T} is a basis for N(A); nullity(A) = 1; rank(A) = 1.
23. {[−5, −2, 1]T} is a basis for N(A); nullity(A) = 1; rank(A) = 2.
25. {[1, −1, 1]T, [0, 2, 3]T} is a basis for R(A); rank(A) = 2; nullity(A) = 1.
27. a) {[1, 1, −2]T, [0, −1, 1]T, [0, 0, 1]T} is a basis for W; dim(W) = 3.
    b) {[1, 2, −1, 1]T, [0, 1, −1, 1]T, [0, 0, −1, 4]T} is a basis for W; dim(W) = 3.
29. dim(W) = 2
33. a) rank(A) ≤ 3 and nullity(A) ≥ 0.
    b) rank(A) ≤ 3 and nullity(A) ≥ 1.
    c) rank(A) ≤ 4 and nullity(A) ≥ 0.

Exercises 3.6, p. 224
5. u1Tu3 = 0 requires a + b + c = 0; u2Tu3 = 0 requires 2a + 2b − 4c = 0; therefore c = 0 and a + b = 0.
7. u1Tu2 = 0 forces a = 3; then u2Tu3 = 0 requires −8 − b + 3c = 0, while u1Tu3 = 0 requires 4 + b + c = 0; therefore b = −5, c = 1.
9. v = (2/3)u1 − (1/2)u2 + (1/6)u3
11. v = 3u1
13. u1 = [0; 0; 1; 0], u2 = [1; 1; 0; 1], u3 = [1/3; −2/3; 0; 1/3]
15. u1 = [1; 1; 0], u2 = [1; −1; −1], u3 = [2; −2; 4]
17. u1 = [0; 1; 0; 1], u2 = [−1; −1; 0; 1], u3 = [−2/3; 1/3; 1; −1/3]
19. For the null space: [−3; −1; 1; 0], [7/11; −27/11; −6/11; 1]; for the range space: [1; 2; 1], [−11/6; 8/6; −5/6]
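Answers 13–19 come from the Gram–Schmidt process. A sketch of that computation in MATLAB, without normalizing (so it matches the vectors above only up to the scalar multiples the text sometimes uses to clear fractions):

% Sketch: classical Gram-Schmidt on the columns of V (no normalization).
% Save as gs.m; the columns of V are the vectors to orthogonalize.
function U = gs(V)
U = zeros(size(V));
for j = 1:size(V,2)
    u = V(:,j);
    for i = 1:j-1
        % subtract the projection of V(:,j) onto the accepted u_i
        u = u - (U(:,i)'*V(:,j))/(U(:,i)'*U(:,i)) * U(:,i);
    end
    U(:,j) = u;
end
end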

Exercises 3.7, p. 239
1. a) [0; 0];  b) [−1; 0];  c) [1; −1];  d) [−2; 1]
3. c) is not, but a), b), and d) are.
9. F is a linear transformation.
11. F is not a linear transformation.
13. F is a linear transformation.
15. F is a linear transformation.
17. F is not a linear transformation.
19. a) [3; 1; −1];  b) [0; −1; −2];  c) [7; 2; −3]
21. T([x1; x2]) = [x1 + x2; x1 − 2x2]
23. T([x1; x2; x3]) = [−(1/2)x1 − (1/2)x2 + (1/2)x3; (1/2)x1 + (1/2)x2 + (1/2)x3]
25. A = [1 3; 2 1]; N(T) = {θ}; R(T) = R2; rank(T) = 2; nullity(T) = 0
27. A = [3 2]; N(T) = {x in R2: 3x1 + 2x2 = 0}; R(T) = R1; rank(T) = 1; nullity(T) = 1
29. A = [1 −1 0; 0 1 −1]; N(T) = {x in R3: x1 = x3 and x2 = x3}; R(T) = R2; rank(T) = 2; nullity(T) = 1

Exercises 3.8, p. 254
1. x* = [−5/13; 7/13]
3. x* = [(28/74) − 3x3; (27/74) + x3; x3], x3 arbitrary
5. x* = [2x2 + 26/7; −x2], x2 arbitrary
7. y = 1.3t + 1.1    9. y = 1.5t
11. y = 0.5t² + 0.1t    13. y = 0.25t² + 2.15t + 0.45
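The fits in 7–13 are least-squares solutions; a sketch of how such a line fit can be checked in MATLAB (the data vectors t and y below are placeholders, not the exercise's data):

% Sketch: least-squares line fit y ~ c(1)*t + c(2), solved with backslash.
t = [0; 1; 2; 3];           % hypothetical data
y = [1.0; 2.5; 3.7; 5.0];
A = [t ones(size(t))];
c = A\y;                    % minimizes norm(A*c - y)
disp(c')                    % slope and intercept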

Exercises 3.9, p. 266
1. w* = [1/2; 3; 11/2]
3. w* = [1; 1; 1]
5. w* = [4; 2; 2]
7. w* = [3; −1; 2]
9. w* = [0; 1; −1]
11. w* = (4/5)[2; 1; 0] + (11/2)[−1/5; 2/5; 1]
13. w* = [1; 1; 0] + 4[1/2; −1/2; 1]
15. w* = 2[1; −1; 1] + [1; 1; 0]

CHAPTER 4

Exercises 4.1, p. 279
1. λ = 1, x = a[−1; 1], a ≠ 0;  λ = 3, x = a[0; 1], a ≠ 0
3. λ = 1, x = a[1; 1], a ≠ 0;  λ = 3, x = a[−1; 1], a ≠ 0
5. λ = 1, x = a[−1; 1], a ≠ 0;  λ = 3, x = a[1; 1], a ≠ 0
7. λ = 1, x = a[0; 1], a ≠ 0
9. λ = 0, x = a[−1; 1], a ≠ 0;  λ = 5, x = a[2; 3], a ≠ 0
11. λ = 2, x = a[−1; 1], a ≠ 0

Exercises 4.2, p. 288
1. M11 = [1 3 −1; 2 4 1; 2 0 −2]; A11 = 18
3. M31 = [−1 3 1; 1 3 −1; 2 0 −2]; A31 = 0
5. M34 = [2 −1 3; 4 1 3; 2 2 0]; A34 = 0
7. det(A) = 0
9. det(A) = 0; A is singular.
11. det(A) = −1; A is nonsingular.
13. det(A) = 6; A is nonsingular.
15. det(A) = 20; A is nonsingular.
17. det(A) = 6; A is nonsingular.
19. det(A) = 36; A is nonsingular.
21. y = 2x − 1    27. 5    29. 3/5

Exercises 4.3, p. 296
1. det(A) = det[1 2 1; 3 0 2; −1 1 3]; the operations R2 − 3R1, R3 + R1 give det(A) = det[1 2 1; 0 −6 −1; 0 3 4] = det[−6 −1; 3 4] = −21.
3. det(A) = det[3 6 9; 2 0 2; 1 2 0] = (3)(2) det[1 2 3; 1 0 1; 1 2 0]; the operations R2 − R1, R3 − R1 give det(A) = 6 det[1 2 3; 0 −2 −2; 0 0 −3] = 6 det[−2 −2; 0 −3] = 36.
5. det(A) = det[2 4 −3; 3 2 5; 2 3 4] = (1/2) det[2 4 −3; 6 4 10; 2 3 4]; the operations R2 − 3R1, R3 − R1 give det(A) = (1/2) det[2 4 −3; 0 −8 19; 0 −1 7] = det[−8 19; −1 7] = −37.
7. det[1 0 0 0; 2 0 0 3; 1 1 0 1; 1 4 2 2] = (−1) det[1 0 0 0; 2 3 0 0; 1 1 0 1; 1 2 2 4] = det[1 0 0 0; 2 3 0 0; 1 1 1 0; 1 2 4 2] = 6
9. det[0 0 2 0; 0 0 1 3; 0 4 1 3; 2 1 5 6] = (−1) det[2 0 0 0; 1 0 0 3; 1 4 0 3; 5 1 2 6] = det[2 0 0 0; 1 3 0 0; 1 3 0 4; 5 6 2 1] = (−1) det[2 0 0 0; 1 3 0 0; 1 3 4 0; 5 6 1 2] = −48
11. det[0 0 1 0; 0 2 6 3; 2 4 1 5; 0 0 0 4] = (−1) det[2 4 1 5; 0 2 6 3; 0 0 1 0; 0 0 0 4] = −16
13. det(B) = 3 det(A) = 6
15. det(B) = −det(A) = −2
17. det(B) = −2 det(A) = −4
19. R4 − (1/2)R1 gives det[2 4 2 6; 1 3 2 1; 2 1 2 3; 0 0 0 −2] = (−2) det[2 4 2; 1 3 2; 2 1 2] = (−2) det[2 4 2; 0 1 1; 0 −3 0] = (−2)(2) det[1 1; −3 0] = −12.
21. R4 − 2R3 gives det[0 4 1 3; 0 2 2 1; 1 3 1 2; 0 −4 −1 0] = det[4 1 3; 2 2 1; −4 −1 0] = det[4 1 3; 2 2 1; 0 0 3] = 3 det[4 1; 2 2] = 18.

Exercises 4.4, p. 305
1. p(t) = (1 − t)(3 − t); λ = 1, λ = 3
3. p(t) = t² − 4t + 3 = (t − 3)(t − 1); λ = 1, λ = 3
5. p(t) = t² − 4t + 4 = (t − 2)²; λ = 2, algebraic multiplicity 2
7. p(t) = −t³ + t² + t − 1 = −(t − 1)²(t + 1); λ = 1, algebraic multiplicity 2; λ = −1, algebraic multiplicity 1
9. p(t) = −t³ + 2t² + t − 2 = −(t − 2)(t − 1)(t + 1); λ = 2, λ = 1, λ = −1
11. p(t) = −t³ + 6t² − 12t + 8 = −(t − 2)³; λ = 2, algebraic multiplicity 3
13. p(t) = t⁴ − 18t³ + 97t² − 180t + 100 = (t − 1)(t − 2)(t − 5)(t − 10); λ = 1, λ = 2, λ = 5, λ = 10

Exercises 4.5, p. 314
1. x is an eigenvector if and only if x = [−x2; x2], x2 ≠ 0; a basis consists of [−1; 1]; the algebraic and geometric multiplicities are 1.
3. x is an eigenvector if and only if x = [−x2; x2], x2 ≠ 0; a basis consists of [−1; 1]; the algebraic multiplicity is 2 and the geometric multiplicity is 1.
5. x is an eigenvector if and only if x = [a; −a; 2a], a ≠ 0; a basis consists of [1; −1; 2]; the algebraic and geometric multiplicities are 1.
7. x is an eigenvector if and only if x = [x4; −x4; −x4; x4], x4 ≠ 0; a basis consists of [1; −1; −1; 1]; the algebraic and geometric multiplicities are 1.
9. x is an eigenvector if and only if x = [x4; x4; x4; x4], x4 ≠ 0; a basis consists of [1; 1; 1; 1]; the algebraic and geometric multiplicities are 1.
11. x is an eigenvector if and only if x = [−x2 − x3 − x4; x2; x3; x4]; a basis consists of [−1; 1; 0; 0], [−1; 0; 1; 0], [−1; 0; 0; 1]; the algebraic and geometric multiplicities are 3.
13. For λ = 2, x = [x1; −2x3; x3] = x1[1; 0; 0] + x3[0; −2; 1]; for λ = 3, x = [x2; x2; 0]; the matrix is not defective.
15. For λ = 2, x = [x1; x2; 0] = x1[1; 0; 0] + x2[0; 1; 0]; for λ = 1, x = [−3x3; −x3; x3]; the matrix is not defective.
17. For λ = 2, x = [x1; −x1; 2x1]; for λ = 1, x = [−3x2; x2; −7x2]; for λ = −1, x = [x1; 2x1; 2x1]; the matrix is not defective.
19. For λ = 1, eigenvectors are u1 = [1; 0; 0] and u2 = [0; 1; 2]; for λ = 2, u3 = [1; 2; 3]. Therefore x = u1 + 2u2 + u3, and thus A^10 x = u1 + 2u2 + 2^10 u3 = [1025; 2050; 3076].

Exercises 4.6, p. 324
1. 3 + 2i    3. 7 − 3i
5. 6    7. 17
9. −5 + 5i    11. 17 − 6i
13. (10 − 11i)/17    15. (3 + i)/2
17. 1
19. λ = 4 + 2i, x = a[4; −1 + i]; λ = 4 − 2i, x = a[4; −1 − i]
21. λ = i, x = a[−2 + i; 5]; λ = −i, x = a[−2 − i; 5]
23. λ = 2, x = a[−1; 0; 1]; λ = 2 + 3i, x = a[−5 + 3i; 3 + 3i; 2]; λ = 2 − 3i, x = a[−5 − 3i; 3 − 3i; 2]
25. x = 2 − i, y = 3 − 2i
27. √6    29. 4

Exercises 4.7, p. 336
1. A is symmetric, so A is diagonalizable. For S = [1 −1; 1 1], S−1AS = [1 0; 0 3] and S−1A5S = [1 0; 0 243]; therefore A5 = [122 −121; −121 122].
3. A is not diagonalizable; λ = −1 is the only eigenvalue and x = a[1; 1], a ≠ 0, are the only eigenvectors.
5. A is diagonalizable since A has distinct eigenvalues. For S = [−1 0; 10 1], S−1AS = [1 0; 0 2] and S−1A5S = [1 0; 0 32]; therefore A5 = [1 0; 310 32].
7. A is not diagonalizable; λ = 1 is the only eigenvalue and it has geometric multiplicity 2. A basis for the eigenspace consists of [1, 1, 0]T and [2, 0, 1]T.
9. A is diagonalizable since A has distinct eigenvalues. For S = [−3 −1 1; 1 1 2; −7 −2 2], S−1AS = [1 0 0; 0 2 0; 0 0 −1]. Therefore A5 = [163 −11 −71; −172 10 75; 324 −22 −141].
11. A is not diagonalizable; λ = 1 has algebraic multiplicity 2 and geometric multiplicity 1.
13. Q is orthogonal.
15. Q is not orthogonal since the columns are not orthonormal.
17. Q is orthogonal.
19. α = 1/√2, β = 1/√6, a = −1/√3, b = 1/√3, c = 1/√3
33. λ = 2, u = (1/√2)[1; −1], v = (1/√2)[1; 1], Q = [u, v], QTAQ = [2 −2; 0 2]
35. λ = 1, u = (1/√2)[1; 1], v = (1/√2)[1; −1], Q = [u, v], QTAQ = [1 0; 0 3]
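The computation behind answers 1, 5, and 9 (Ak = S·Dk·S−1 once S−1AS = D) is easy to reproduce in MATLAB; a sketch using the data from answer 1:

% Sketch for answer 1: A^5 via diagonalization, A = S*D*inv(S).
S = [1 -1; 1 1];
D = diag([1 3]);
A = S*D/S;          % recovers the (symmetric) matrix being diagonalized
A5 = S*D^5/S;       % D^5 is diagonal, so the power is cheap
disp(A5)            % [122 -121; -121 122], matching the answer above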

Exercises 4.8, p. 349
1. x1 = [4; 2], x2 = [2; 4], x3 = [4; 2], x4 = [2; 4]
3. x1 = [80; 112], x2 = [68; 124], x3 = [65; 127], x4 = [64.25; 127.75]
5. x1 = [7; 1], x2 = [11; 8], x3 = [43; 19], x4 = [119; 62]
7. xk = 3(1)^k[1; 1] + (−1)^k[−1; 1] = [3 + (−1)^(k+1); 3 + (−1)^k]; x4 = [2; 4], x10 = [2; 4]; the sequence {xk} has no limit, but ‖xk‖ ≤ √20.
9. xk = 64(1)^k[1; 2] − 64(1/4)^k[−1; 1] = 64[1 + (1/4)^k; 2 − (1/4)^k]; x4 = [64.25; 127.75], x10 = [64.00006; 127.99994]; the sequence {xk} converges to [64, 128]T.
11. xk = (3/4)(3)^k[2; 1] + (5/4)(−1)^k[−2; 1] = (1/4)[6(3)^k − 10(−1)^k; 3(3)^k + 5(−1)^k]; x4 = [119; 62]; x10 = [88571; 44288]; the sequence {xk} has no limit and ‖xk‖ → ∞.
13. xk = −2(1)^k[−3; 1; −7] − 2(2)^k[−1; 1; −2] − 5(−1)^k[1; 2; 2] = [6 + 2(2)^k − 5(−1)^k; −2 − 2(2)^k − 10(−1)^k; 14 + 4(2)^k − 10(−1)^k]; x4 = [33; −44; 68]; x10 = [2049; −2060; 4100]; the sequence {xk} has no limit and ‖xk‖ → ∞.
15. x(t) = 3e^(2t)[2; 1] − 2e^(−t)[1; 1]
17. x(t) = [0; −2; 2] − e^(2t)[−2; −3; 1] + e^(3t)[1; 2; 0]
21. α = −.18; xk = (161/118)(1)^k[3; 10] + (7/118)(−.18)^k[10; −6]; the limit is (161/118)[3; 10].
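Answers 7–13 rest on writing x0 in an eigenvector basis, so that xk = c1λ1^k u1 + c2λ2^k u2. A sketch of a numerical cross-check in MATLAB (the A and x0 here are illustrative, not the exercise's):

% Sketch: compare the closed form xk = sum of ci*(li^k)*ui with direct
% iteration x_(k+1) = A*x_k. A and x0 are hypothetical.
A = [0 1; 2 1];                    % eigenvalues 2 and -1
x0 = [1; 3];
[U, L] = eig(A);                   % columns of U are eigenvectors
c = U\x0;                          % coordinates of x0 in that basis
k = 10;
xk_closed = U*(diag(L).^k .* c);
xk_iter = x0;
for j = 1:k, xk_iter = A*xk_iter; end
disp(norm(xk_closed - xk_iter))    % ~0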

CHAPTER 5

Exercises 5.2, p. 366
1. [0 −7 5; −11 −3 −12], [12 −22 38; −50 −6 −15], [7 −21 28; −42 −7 −14]
3. e^x − 2 sin x,  e^x − 2 sin x + 3√(x² + 1),  −2e^x − sin x + 3√(x² + 1)
5. c1 = −2 + c3, c2 = 3 − c3, c3 arbitrary
7. Not a vector space    9. Not a vector space
11. Not a vector space    13. A vector space
15. Not a vector space    25. A vector space
27. Not a vector space    29. A vector space

Exercises 5.3, p. 373
1. Not a subspace    3. A subspace
5. A subspace    7. Not a subspace
9. A subspace    11. Not a subspace
13. A subspace    15. Not a subspace
17. p(x) = −p1(x) + 3p2(x) − 2p3(x)
19. A = (−1 − 2x)B1 + (2 + 3x)B2 + xB3 − 3B4, x arbitrary
21. cos 2x = −sin² x + cos² x
23. W = Sp{1, x²}
25. In Exercise 2, W = Sp{[1 1 0; 0 0 0], [−2 0 1; 0 0 0], [0 0 0; 1 0 0], [0 0 0; 0 1 0], [0 0 0; 0 0 1]}; in Exercise 3, W = Sp{[−1 −1 1; 0 0 0], [0 0 0; 1 0 0], [0 0 0; 0 1 0]}; in Exercise 5, W = Sp{−1 + x, −2 + x²}; in Exercise 6, W = Sp{1, −4x + x²}; in Exercise 8, W = Sp{1 − x², x}
27. W = Sp{B1, B2, E12, E13, E21, E23, E31, E32}, where B1 = [−1 0 0; 0 1 0; 0 0 0] and B2 = [−1 0 0; 0 0 0; 0 0 1]
29. A = B + C, where B = (A + AT)/2 and C = (A − AT)/2
31. a) W = Sp{E12, E13, E23};
    b) W = Sp{[−1 0 0; 0 1 0; 0 0 0], [−1 0 0; 0 0 0; 0 0 1], [0 −1 0; 0 0 1; 0 0 0], [0 0 1; 0 0 0; 0 0 0]};
    c) W = Sp{[1 1 0; 0 0 0; 0 0 0], [0 0 1; 0 0 1; 0 0 0], [0 0 0; 0 1 0; 0 0 1]};
    d) W = Sp{[1 0 0; 0 1 0; 0 0 1], [0 −1 1; 0 0 1; 0 0 0], [0 0 1; 0 0 0; 0 0 0]}
33. x1 = −6a + 5b + 37c + 15d; x2 = 3a − 2b − 17c − 7d; x3 = −a + b + 5c + 2d; x4 = 2c + d; C = −12B1 + 6B2 − B3 − B4; D = 8B1 − 3B2 + B3 + B4

Exercises 5.4, p. 386
1. {[−1 1; 0 0], [−1 0; 1 0], [−1 0; 0 1]}
3. {E12, E21, E22}    5. {1 + x², x − 2x²}
7. {x, x²}
9. {−9x + 3x² + x³, 8x − 6x² + x⁴}
13. a) [2 −1 3 2]T;  b) [1 0 −1 1]T;  c) [2 3 0 0]T
15. Linearly independent
17. Linearly dependent
19. Linearly dependent
21. Linearly independent
23. {p1(x), p2(x)}    25. {A1, A2, A3}
27. [−4 11 −3]T
31. [a + b − 2c + 7d, −b + 2c − 4d, c − 2d, d]T
38. The set {u, v} is linearly dependent if and only if one of the vectors is a scalar multiple of the other.
    a) Linearly independent;  b) Linearly independent;  c) Linearly dependent;  d) Linearly dependent;  e) Linearly dependent

Exercises 5.5, p. 390
1. b) {E11, E21, E22, E31, E32, E33} is a basis for V1. {E11, E12, E13, E22, E23, E33} is a basis for V2.
   c) dim(V) = 9, dim(V1) = 6, dim(V2) = 6
3. V1 ∩ V2 = {[a11 0 0; 0 a22 0; 0 0 a33] : a11, a22, a33 arbitrary real numbers}; dim(V1 ∩ V2) = 3
5. dim(W) = 3    7. dim(W) = 3
9. iii) The set S is linearly dependent.
11. ii) The set S does not span V.
13. iii) The set S is linearly dependent.
21. a) A = [1 −1 1; 0 1 −2; 0 0 1];  b) [5 2 1]T
23. A−1 = [1 1 1; 0 1 2; 0 0 1];  a) p(x) = 6 + 11x + 7x²;  b) p(x) = 4 + 2x − x²;  c) p(x) = 5 + x;  d) p(x) = 8 − 2x − x²

Exercises 5.6, p. 401
9. 〈x, y〉 = −3, ‖x‖ = √5, ‖y‖ = √2, ‖x − y‖ = √13
11. 〈p, q〉 = 52, ‖p‖ = 3√6, ‖q‖ = 3√6, ‖p − q‖ = 2
13. For 〈x, y〉 = xTy, the graph of S is the circle with equation x² + y² = 1. For 〈x, y〉 = 4x1y1 + x2y2, the graph of S is the ellipse with equation 4x² + y² = 1.
15. a1 = 7, a2 = 4
17. q = (−5/3)p0 − 5p1 − 4p2
19. p0 = 1, p1 = x, p2 = x² − 2, p3 = x³ − (17/5)x, p4 = x⁴ − (31/7)x² + 72/35
25. p*(x) = (3/2)x² − (3/5)x + 1/20
27. p*(x) ≅ 0.841471p0(x) − 0.467544p1(x) − 0.430920p2(x) + 0.07882p3(x)
29. d) T2(x) = 2x² − 1, T3(x) = 4x³ − 3x, T4(x) = 8x⁴ − 8x² + 1, T5(x) = 16x⁵ − 20x³ + 5x

Exercises 5.7, p. 410
1. Not a linear transformation
3. A linear transformation
5. A linear transformation
7. Not a linear transformation
9. a) 11 + x² + 6x³;
   b) T(a0 + a1x + a2x²) = (a0 + 2a2) + (a0 + a1)x² + (−a1 + a2)x³
11. a) 8 + 14x − 9x²;
    b) T([a b; c d]) = (a + b + 2d) + (−a + b + 2c + d)x + (b − c − 2d)x²
13. a) {2, 6x, 12x²} is a basis for R(T).
    b) Nullity(T) = 2
    c) T[(a0/2)x² + (a1/6)x³ + (a2/12)x⁴] = a0 + a1x + a2x²
15. N(T) = {a0 + a1x + a2x²: a0 + 2a1 + 4a2 = 0}; R(T) = R1
17. b) N(I) = {θ}; R(I) = V
19. rank(T):    3  2  1  0
    nullity(T): 2  3  4  5
    T cannot be one-to-one.
21. rank(T):    3  2  1  0
    nullity(T): 0  1  2  3
    R(T) = P3 is not possible.
27. b) Nullity(T) = 0; rank(T) = 4

Exercises 5.8, p. 418
1. (S + T)(p) = p′(0) + (x + 2)p(x); (S + T)(x) = 1 + 2x + x²; (S + T)(x²) = 2x² + x³
3. (H ◦ T)(p) = p(x) + (x + 2)p′(x) + 2p(0); (H ◦ T)(x) = 2x + 2
5. b) There is no polynomial p in P3 such that T(p) = x, so T−1(x) is not defined.
7. T−1(e^x) = e^x; T−1(e^(2x)) = (1/2)e^(2x); T−1(e^(3x)) = (1/3)e^(3x); T−1(ae^x + be^(2x) + ce^(3x)) = ae^x + (b/2)e^(2x) + (c/3)e^(3x)
9. T−1(A) = AT
11. c) T([a b; c d]) = a + bx + cx² + dx³

Exercises 5.9, p. 429
1. [0 1 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 0; 0 0 0 0]
3. [2 1 0 0; 1 2 0 0; 0 1 2 0; 0 0 1 2; 0 0 0 1]
5. [1 1 0 0 0; 0 0 2 0 0; 0 0 0 3 0; 0 0 0 0 4]
7. [2 2 0 0 0; 1 1 4 0 0; 0 0 2 6 0; 0 0 0 3 8; 0 0 0 0 4]
9. a) [p]B = [a0; a1; a2; a3], [T(p)]C = [2a0; a0 + 2a1; a1 + 2a2; a2 + 2a3; a3]
11. a) Q = [1 0 0; 0 2 0; 0 0 3];  b) P = [1 0 0; 0 1/2 0; 0 0 1/3]
13. a) Q = [1 0 0 0; 0 0 1 0; 0 1 0 0; 0 0 0 1]
15. [3 6 0; 3 3 0; −1 −1 3; 0 0 0]
17. [1 0 0; 0 3 6; 0 1 4]
19. [0 0 1 1; 1 0 1 0; 0 1 0 0; 0 0 0 3]
21. [−4 −2 0; 3 3 0; −1 2 3]
23. [2 0 0; 0 −3 0; 0 0 3]
31. T(a0 + a1x + a2x² + a3x³) = (a0 + 2a2) + (a1 + a3)x + (−a0 + a1 − a3)x²

Exercises 5.10, p. 438
1. T(u1) = u1, T(u2) = 3u2; the matrix is [1 0; 0 3].
3. T(A1) = 2A1, T(A2) = −2A2, T(A3) = 3A3, T(A4) = −3A4; the matrix is [2 0 0 0; 0 −2 0 0; 0 0 3 0; 0 0 0 −3].
5. [1 −1 −1; 1 −1 0; −1 2 1];
   p(x) = (1 + x − x²) + (1 + x²);
   q(x) = −5(1 + x − x²) − 3(1 + x²) + 7(1 + x);
   s(x) = −2(1 + x − x²) − (1 + x²) + 2(1 + x);
   r(x) = (a0 − a1 − a2)(1 + x − x²) + (a0 − a1)(1 + x²) + (−a0 + 2a1 + a2)(1 + x)
7. [1/3 5/3; 1/3 −1/3]
9. [−1 1 2 3; 1 0 0 −3; 0 0 1 0; 0 0 0 1];
   p(x) = −7x + 2(x + 1) + (x² − 2x);
   q(x) = 13x − 4(x + 1) + (x³ + 3);
   r(x) = −7x + 3(x + 1) − 2(x² − 2x) + (x³ + 3)
11. The matrix of T with respect to B is Q1 = [2 1; 1 2]. The transition matrix from C to B is P = [−1 1; 1 1]. The matrix of T with respect to C is Q2, where Q2 = P−1Q1P = [1 0; 0 3].
13. The matrix of T with respect to B is Q1 = [−3 0 0 5; 0 3 −5 0; 0 0 −2 0; 0 0 0 2]. The transition matrix from C to B is P = [1 0 0 1; 0 1 1 0; 0 1 0 0; 1 0 0 0]. The matrix of T with respect to C is Q2, where Q2 = P−1Q1P = [2 0 0 0; 0 −2 0 0; 0 0 3 0; 0 0 0 −3].
15. a) Q = [1 1 0; 0 2 4; 0 0 3];
    b) S = [1 1 2; 0 1 4; 0 0 1]; R = [1 0 0; 0 2 0; 0 0 3];
    c) C = {1, 1 + x, 2 + 4x + x²};
    d) P = [1 −1 2; 0 1 −4; 0 0 1];
    e) T(w1) = −1 + 18x + 3x²; T(w2) = 5 + 4x + 3x²; T(w3) = 1 + 2x + 6x²

CHAPTER 6

Exercises 6.2, p. 453
1. −5    3. 0, x = [−2; 1]
5. 25    7. 6
9. A11 = −2, A12 = 6, A13 = −2, A33 = 1
11. A11 = −2, A12 = 7, A13 = −8, A33 = 3
13. A11 = 3, A12 = −6, A13 = 2, A33 = −3
15. 8    17. −35    19. −11    21. −9    23. 22
29. C = [10 5 −10; −5 −1 4; −5 −3 7]
35. a) H(n) = n!/2;
    b) 3 seconds for n = 2; 180 seconds for n = 5; 5,443,200 seconds for n = 10

Exercises 6.3, p. 463
1. det[1 2 1; 2 0 1; 1 −1 1] = det[1 0 0; 2 −4 −1; 1 −3 0] = det[−4 −1; −3 0] = −3
3. det[0 1 2; 3 1 2; 2 0 3] = −det[1 0 2; 1 3 2; 0 2 3] = −det[1 0 0; 1 3 0; 0 2 3] = −det[3 0; 2 3] = −9
5. det[0 1 3; 2 1 2; 1 1 2] = −det[1 0 3; 1 2 2; 1 1 2] = −det[1 0 0; 1 2 −1; 1 1 −1] = −det[2 −1; 1 −1] = 1
7. −6    9. 3    11. 3
13. Use the column interchanges [C1, C2, C3, C4] → [C1, C4, C3, C2] → [C1, C4, C2, C3]; the determinant is 6.
15. Use the column interchanges [C1, C2, C3, C4] → [C2, C1, C3, C4] → [C2, C4, C3, C1] → [C2, C4, C1, C3]; the determinant is −12.
17. det[2 4 −2 −2; 1 3 1 2; 1 3 1 3; −1 2 1 2] = det[2 0 0 0; 1 1 2 3; 1 1 2 4; −1 4 0 1] = 2 det[1 2 3; 1 2 4; 4 0 1] = 2 det[1 0 0; 1 0 1; 4 −8 −11] = 2 det[0 1; −8 −11] = 16
19. det[1 2 0 3; 2 5 1 1; 2 0 4 3; 0 1 6 2] = det[1 2 0 3; 0 1 1 −5; 0 −4 4 −3; 0 1 6 2] = det[1 1 −5; −4 4 −3; 1 6 2] = det[1 1 −5; 0 8 −23; 0 5 7] = det[8 −23; 5 7] = 171
21. det[1 1 2 1; 0 1 4 1; 2 1 3 0; 2 2 1 2] = det[1 1 2 1; 0 1 4 1; 0 −1 −1 −2; 0 0 −3 0] = det[1 4 1; −1 −1 −2; 0 −3 0] = det[1 4 1; 0 3 −1; 0 −3 0] = det[3 −1; −3 0] = −3

Exercises 6.4, p. 470
1. A → [1 0 3; 2 1 1; 4 3 1] → [1 0 0; 2 1 −5; 4 3 −11] → [1 0 0; 2 1 0; 4 3 4]; det(A) = −4.
3. A → [2 0 0; 1 2 2; −1 3 3] → [2 0 0; 1 2 0; −1 3 0]; det(A) = 0.
5. A → [1 0 0; 3 1 9; 0 1 2] → [1 0 0; 3 1 0; 0 1 −7] → [1 0 0; 0 1 0; 0 0 −7] → I; det(A) = −7.
7. a) 6;  b) 18;  c) 3/2;  d) 4;  e) 1/16
9. det[B(λ)] = −λ² + 2λ; λ = 0 and λ = 2
11. det[B(λ)] = 4 − λ²; λ = 2 and λ = −2
13. det[B(λ)] = (λ + 2)(λ − 1)²; λ = −2 and λ = 1
15. det(A) = −2, det(B1) = −2, det(B2) = −4; x1 = 1, x2 = 2
17. det(A) = −2, det(B1) = −8, det(B2) = −4, det(B3) = 2; x1 = 4, x2 = 2, x3 = −1
19. det(A) = det(B1) = det(B2) = det(B3) = det(B4) = 3; x1 = x2 = x3 = x4 = 1
21. det(A) = 1, det(B1) = a − b, det(B2) = b − c, det(B3) = c; x1 = a − b, x2 = b − c, x3 = c
27. det(A5) = [det(A)]⁵ = 3⁵ = 243
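Answers 15–21 use Cramer's rule, xi = det(Bi)/det(A), where Bi is A with its ith column replaced by b. A MATLAB sketch (the A and b below are placeholders, not the exercise's):

% Sketch: Cramer's rule x(i) = det(B_i)/det(A). A and b are hypothetical.
A = [2 1; 1 3];
b = [5; 10];
x = zeros(2,1);
for i = 1:2
    B = A;  B(:,i) = b;    % replace column i of A by b
    x(i) = det(B)/det(A);
end
disp(x')                   % same as (A\b)'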

Exercises 6.5, p. 478
1. det[1 2 1; 2 3 2; −1 4 1] = det[1 2 1; 0 −1 0; 0 6 2] = det[1 2 1; 0 −1 0; 0 0 2] = −2
3. det[0 1 3; 1 2 2; 3 1 0] = −det[1 2 2; 0 1 3; 3 1 0] = −det[1 2 2; 0 1 3; 0 −5 −6] = −det[1 2 2; 0 1 3; 0 0 9] = −9
5. Adj(A) = [4 −2; −3 1]; A−1 = −(1/2)Adj(A)
7. Adj(A) = [0 1 −1; −2 1 0; 1 −1 1]; A−1 = Adj(A)
9. Adj(A) = [−4 2 0; 1 0 −1; 1 −2 1]; A−1 = −(1/2)Adj(A)
11. For all x, w(x) = 2; therefore, the set is linearly independent.
13. For all x, w(x) = 0; the Wronskian gives no information; the set is linearly dependent since cos² x + sin² x = 1.
15. For all x, w(x) = 0; the Wronskian gives no information; the set is linearly independent.
17. L = [1 0 0; 2 1 0; 2 2 −1], E1 = [0 1 0; 1 0 0; 0 0 1], E2 = [1 0 −3; 0 1 0; 0 0 1], E3 = [1 0 0; 0 1 2; 0 0 1]
19. L = [1 0 0; 3 −1 0; 4 −8 −26], E1 = [1 −2 0; 0 1 0; 0 0 1], E2 = [1 0 1; 0 1 0; 0 0 1], E3 = [1 0 0; 0 1 4; 0 0 1]
21. det[A(x)] = x² + 1; [A(x)]−1 = (1/(x² + 1))[x −1; 1 x]
23. det[A(x)] = 4(2 + x²); [A(x)]−1 = (1/(4(2 + x²)))[4 + x² −2x x²; 2x 4 −2x; x² 2x 4 + x²]

CHAPTER 7

Exercises 7.1, p. 492
1. A = [2 2; 2 −3]
3. A = [1 1 −3; 1 −4 4; −3 4 3]
5. A = [2 2; 2 1]
7. Q = (1/√2)[1 1; 1 −1]; the form is indefinite with eigenvalues λ = 5 and λ = −1.
9. Q = (1/√6)[√2 √3 −1; √2 −√3 −1; √2 0 2]; the form is indefinite with eigenvalues λ = 5 and λ = −1.
11. Q = (1/√2)[1 1; 1 −1]; the form is positive definite with eigenvalues λ = 2 and λ = 4.
13. Q = [1/2 √3/2; −√3/2 1/2]; the graph corresponds to the ellipse u²/20 + v²/4 = 1.
15. Q = (1/√10)[−1 3; 3 1]; the graph corresponds to the hyperbola v²/4 − u² = 1.
17. Q = (1/√2)[1 −1; 1 1]; the graph corresponds to the hyperbola u²/4 − v²/4 = 1.
19. Q = (1/√2)[1 1; −1 1]; the graph corresponds to the ellipse u²/4 + v²/8 = 1.

Exercises 7.2, p. 501
1. x′(t) = Ax(t), A = [5 −2; 6 −2], x(t) = [u(t); v(t)]; x(t) = b1e^t[1; 2] + b2e^(2t)[2; 3]; x(t) = e^t[1; 2] + 2e^(2t)[2; 3] = [e^t + 4e^(2t); 2e^t + 6e^(2t)]
3. x′(t) = Ax(t), A = [1 1; 2 2], x(t) = [u(t); v(t)]; x(t) = b1[1; −1] + b2e^(3t)[1; 2]; x(t) = 3[1; −1] + 2e^(3t)[1; 2] = [3 + 2e^(3t); −3 + 4e^(3t)]
5. x′(t) = Ax(t), A = [.5 .5; −.5 .5], x(t) = [u(t); v(t)]; x(t) = b1e^((1+i)t/2)[1; i] + b2e^((1−i)t/2)[1; −i]; b1 = 2 − 2i and b2 = 2 + 2i, or x(t) = 4e^(t/2)[cos(t/2) + sin(t/2); cos(t/2) − sin(t/2)]
7. x′(t) = Ax(t), A = [4 0 1; −2 1 0; −2 0 1], x(t) = [u(t); v(t); w(t)]; x(t) = b1e^t[0; 1; 0] + b2e^(2t)[−1; 2; 2] + b3e^(3t)[−1; 1; 1]; x(t) = e^t[0; 1; 0] − e^(2t)[−1; 2; 2] + 2e^(3t)[−1; 1; 1] = [e^(2t) − 2e^(3t); e^t − 2e^(2t) + 2e^(3t); −2e^(2t) + 2e^(3t)]
9. a) x′(t) = Ax(t), A = [1 −1; 1 3], x(t) = [u(t); v(t)]; x1(t) = b1e^(2t)[1; −1]
   b) The vector y0 is determined by the equation (A − 2I)y = u, where u = [1; −1]. One choice is y0 = [−2; 1]. Thus, x2(t) = te^(2t)[1; −1] + e^(2t)[−2; 1] is another solution of x′(t) = Ax(t).
   c) Note that x1(0) = [1; −1] and x2(0) = [−2; 1]. Thus, {x1(0), x2(0)} is a basis for R2.
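Each answer above can be cross-checked numerically, since the initial-value problem x′ = Ax, x(0) = x0 has solution x(t) = e^(At)x0. A MATLAB sketch using the data from answer 1 (the check time t = 1 is our choice):

% Sketch: check answer 1 at t = 1 with the matrix exponential expm.
A  = [5 -2; 6 -2];
x0 = [1; 2] + 2*[2; 3];            % b1 = 1, b2 = 2, so x0 = [5; 8]
t  = 1;
closed = [exp(t) + 4*exp(2*t); 2*exp(t) + 6*exp(2*t)];
disp(norm(expm(A*t)*x0 - closed))  % ~0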

Exercises 7.3, p. 509
1. H = Q1AQ1−1 = [−7 16 3; 8 9 3; 0 1 1]; Q1 = [1 0 0; 0 1 0; 0 −4 1]
3. H = Q1AQ1−1 = [1 1 3; 1 3 1; 0 4 2]; Q1 = [1 0 0; 0 0 1; 0 1 0]
5. H = Q1AQ1−1 = [3 2 −1; 4 5 −2; 0 20 −6]; Q1 = [1 0 0; 0 1 0; 0 3 1]
7. H = Q1AQ1−1 = [1 −3 −1 −1; −1 −1 −1 −1; 0 0 2 0; 0 0 0 2]; Q1 = [1 0 0 0; 0 1 0 0; 0 −1 1 0; 0 −1 0 1]
9. H = Q2Q1AQ1−1Q2−1 = [1 3 5 2; 1 2 4 2; 0 1 7 3; 0 0 −11 −5]; Q1 = [1 0 0 0; 0 0 0 1; 0 0 1 0; 0 1 0 0], Q2 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 −2 1]
13. The characteristic polynomial is p(t) = (t + 2)(t − 2)³.
15. [e1, e2, e3], [e1, e3, e2], [e2, e1, e3], [e2, e3, e1], [e3, e1, e2], [e3, e2, e1]
17. n!

Exercises 7.4, p. 518
1. The system (4) is
   a0 + 2a1 = −4
   a1 = −3;
   p(t) = t² − 3t + 2
3. The system (4) is
   a0 + a1 + a2 = −3
   2a1 + 4a2 = −6
   2a2 = −8;
   p(t) = t³ − 4t² + 5t − 4
5. The system (4) is
   a0 + 2a1 + 8a2 = −29
   a1 + 3a2 = −14
   a2 = −8;
   p(t) = t³ − 8t² + 10t + 15
7. The system (4) is
   a0 + a2 + 2a3 = −8
   a1 + 2a2 + 6a3 = −18
   a2 + 2a3 = −8
   2a3 = −6;
   p(t) = t⁴ − 3t³ − 2t² + 4t
9. The blocks are B11 = [1 −1; 1 3] and B22 = [2 −1; −1 2]. The only eigenvalue of B11 is λ = 2; the eigenvalues of B22 are λ = 1 and λ = 3. The eigenvectors are: λ = 2, u = [1, −1, 0, 0]T; λ = 1, u = [−9, 5, 1, 1]T; and λ = 3, u = [3, −9, 1, −1]T.
11. The blocks are B11 = [−2 0 −2; −1 1 −2; 0 1 −1] and B22 = [2]. The eigenvalues of B11 are λ = 0 and λ = −1; the eigenvalue of B22 is λ = 2. The eigenvectors are: λ = 0, u = [−1, 1, 1, 0]T; λ = −1, u = [2, 0, −1, 0]T; and λ = 2, u = [1, 15, 1, 6]T.
15. P = [e2, e3, e1]

Exercises 7.5, p. 529
1. Qx = x + u = [4, 1, 6, 7]T
3. QA1 = A1 + u, QA2 = A2 + 2u; therefore, QA = [3 3; 5 1; 5 4; 1 2]
5. xTQ = xT + uT = [4, 1, 3, 4]
7. AQ = [1 2 1 2; 2 −1 2 3]
9. u = [0; 5; 2; 1]    11. u = [0; 0; 9; 3]
13. u = [0; 0; 0; −8; 4]
15. u = [0; 8; 4]    17. u = [0; −9; 3]
19. u = [0; 0; −8; 4]
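A Householder matrix built from u (assuming the standard form Q = I − (2/(uTu))uuT) never needs to be formed explicitly to be applied; a MATLAB sketch (the test vector x is hypothetical):

% Sketch: apply Q = I - (2/(u'*u))*u*u' to x without forming Q.
u = [0; 5; 2; 1];
x = [3; 1; 2; 4];              % hypothetical vector
Qx = x - (2*(u'*x)/(u'*u))*u;  % O(n) work instead of O(n^2)
Q  = eye(4) - (2/(u'*u))*(u*u');
disp(norm(Qx - Q*x))           % ~0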

Exercises 7.6, p. 539
1. x* = [1; 1]
3. x* = [2; 1; 2]
5. R = [−5 −11; 0 2], u = [8; 4]
7. R = [−4 −6; 0 −2], u = [4; 4]
9. R = [1 2 1; 0 −1 −8; 0 0 −6], u = [0; 1; 1]
11. Q2Q1A = [−5 −59/3; 0 −7; 0 0; 0 0], where u1 = [6; 2; 2; 4] and u2 = [0; 13; 2; 3]
13. Q1A = [2 4; 0 −5; 0 0; 0 0], where u1 = [0; 8; 0; 4]
15. x* = [−38/21; 15/21]
17. x* = [−87/25; 56/25]
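Answers 15 and 17 are least-squares solutions obtained from a QR factorization; a MATLAB sketch of that route (the data matrix and right-hand side are placeholders):

% Sketch: least squares via QR. With A = Q*R (R upper triangular), the
% minimizer of norm(A*x - b) solves the triangular system below.
A = [1 0; 1 1; 1 2];           % hypothetical overdetermined system
b = [1; 2; 4];
[Q, R] = qr(A);
y = Q'*b;
x = R(1:2,1:2)\y(1:2);
disp(x')                       % same as (A\b)'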

Exercises 7.7, p. 545
1. q(A) = [−1 0; 0 −1]; q(B) = [0 0; 0 0]; q(C) = [15 −2 14; 5 −2 10; −1 −4 6]
3. a) q(t) = (t³ + t − 1)p(t) + t + 2;
   b) q(B) = B + 2I = [4 −1; −1 4]

Exercises 7.8, p. 553
1. a) p(t) = (t − 2)²; v1 = [1; −1], v2 = [1; −2]
   b) p(t) = t(t + 1)²; for λ = −1, v1 = [−2; 0; 1], v2 = [0; 1; 1]; for λ = 0, v1 = [−1; 1; 1]
   c) p(t) = t(t − 1)²(t + 1); for λ = 1, v1 = [−2; 0; 1], v2 = [5/2; 1/2; 0]; for λ = −1, v1 = [−9; −1; 1]
3. a) QAQ−1 = H, where Q = [1 0 0; 0 1 0; 0 3 1], H = [8 −69 21; 1 −10 3; 0 −4 1], Q−1 = [1 0 0; 0 1 0; 0 −3 1]. The characteristic polynomial for H is p(t) = (t + 1)²(t − 1), and the eigenvectors and generalized eigenvectors are: λ = −1, v1 = [3; 1; 2], v2 = [−7/2; −1/2; 0]; λ = 1, w1 = [−3; 0; 1]. The general solution of y′ = Hy is y(t) = c1e^(−t)v1 + c2e^(−t)(v2 + tv1) + c3e^t w1; and the initial condition y(0) = Qx0 can be met by choosing c1 = 0, c2 = 2, c3 = −2. Finally x(t) = Q−1y(t), or x(t) = [e^(−t)(6t − 7) + 6e^t; e^(−t)(2t − 1); e^(−t)(−2t + 3) − 2e^t].
   b) QAQ−1 = H, where Q = [1 0 0; 0 1 0; 0 3 1], H = [2 4 −1; −3 −4 1; 0 3 −1], Q−1 = [1 0 0; 0 1 0; 0 −3 1]. The characteristic polynomial is p(t) = (t + 1)³, and v1 = [1; 0; 3], v2 = [0; 1; 3], v3 = [0; 1; 4]. The general solution of y′ = Hy is y(t) = e^(−t)[c1v1 + c2(v2 + tv1) + c3(v3 + tv2 + (t²/2)v1)]; and the initial condition y(0) = Qx0 is met with c1 = −1, c2 = −5, c3 = 4. Finally Q−1y(t) solves x′ = Ax, x(0) = x0.
   c) QAQ−1 = H, where Q = [1 0 0; 0 1 0; 0 3 1], H = [1 4 −1; −3 −5 1; 0 3 −2], Q−1 = [1 0 0; 0 1 0; 0 −3 1]. The characteristic polynomial is p(t) = (t + 2)³, and v1 = [1; 0; 3], v2 = [0; 1; 3], v3 = [0; 1; 4]. The general solution of y′ = Hy is y(t) = e^(−2t)[c1v1 + c2(v2 + tv1) + c3(v3 + tv2 + (t²/2)v1)]; and the initial condition y(0) = Qx0 is met with c1 = −1, c2 = −5, c3 = 4. Finally Q−1y(t) solves x′ = Ax, x(0) = x0.
5. x(t) = c1e^t[−2; 0; 1] + c2e^t([5/2; 1/2; 0] + t[−2; 0; 1]) + c3e^(−t)[−9; −1; 1]