
    Notes on Linear Algebra

    Peter J. Cameron


    Preface

Linear algebra has two aspects. Abstractly, it is the study of vector spaces over fields, and their linear maps and bilinear forms. Concretely, it is matrix theory: matrices occur in all parts of mathematics and its applications, and everyone working in the mathematical sciences and related areas needs to be able to diagonalise a real symmetric matrix. So in a course of this kind, it is necessary to touch on both the abstract and the concrete aspects, though applications are not treated in detail.

On the theoretical side, we deal with vector spaces, linear maps, and bilinear forms. Vector spaces over a field K are particularly attractive algebraic objects, since each vector space is completely determined by a single number, its dimension (unlike groups, for example, whose structure is much more complicated). Linear maps are the structure-preserving maps or homomorphisms of vector spaces.

On the practical side, the subject is really about one thing: matrices. If we need to do some calculation with a linear map or a bilinear form, we must represent it by a matrix. As this suggests, matrices represent several different kinds of things. In each case, the representation is not unique, since we have the freedom to change bases in our vector spaces; so many different matrices represent the same object. This gives rise to several equivalence relations on the set of matrices, summarised in the following table:

    Relation                Same object represented                        Matrix relation
    Equivalence             same linear map α : V → W                      A′ = Q⁻¹AP   (P, Q invertible)
    Similarity              same linear map α : V → V                      A′ = P⁻¹AP   (P invertible)
    Congruence              same bilinear form b on V                      A′ = PᵀAP    (P invertible)
    Orthogonal similarity   same self-adjoint map α : V → V                A′ = P⁻¹AP   (P orthogonal)
                            w.r.t. an orthonormal basis

The power of linear algebra in practice stems from the fact that we can choose bases so as to simplify the form of the matrix representing the object in question. We will see several such canonical form theorems in the notes.


These lecture notes correspond to the course Linear Algebra II, as given at Queen Mary, University of London, in the first semester 2005–6. The course description reads as follows:

This module is a mixture of abstract theory, with rigorous proofs, and concrete calculations with matrices. The abstract component builds on the notions of subspaces and linear maps to construct the theory of bilinear forms, i.e. functions of two variables which are linear in each variable, dual spaces (which consist of linear mappings from the original space to the underlying field) and determinants. The concrete applications involve ways to reduce a matrix of some specific type (such as symmetric or skew-symmetric) to as near diagonal form as possible.

In other words, students on this course have met the basic concepts of linear algebra before. Of course, some revision is necessary, and I have tried to make the notes reasonably self-contained. If you are reading them without the benefit of a previous course on linear algebra, you will almost certainly have to do some work filling in the details of arguments which are outlined or skipped over here.

The notes for the prerequisite course, Linear Algebra I, by Dr Francis Wright, are currently available from

http://centaur.maths.qmul.ac.uk/Lin Alg I/

I have by-and-large kept to the notation of these notes. For example, a general field is called K, vectors are represented as column vectors, and linear maps (apart from zero and the identity) are represented by Greek letters.

I have included in the appendices some extra-curricular applications of linear algebra, including some special determinants, the method for solving a cubic equation, the proof of the Friendship Theorem and the problem of deciding the winner of a football league, as well as some worked examples.

    Peter J. Cameron

    September 5, 2008


    Contents

1 Vector spaces
  1.1 Definitions
  1.2 Bases
  1.3 Row and column vectors
  1.4 Change of basis
  1.5 Subspaces and direct sums

2 Matrices and determinants
  2.1 Matrix algebra
  2.2 Row and column operations
  2.3 Rank
  2.4 Determinants
  2.5 Calculating determinants
  2.6 The Cayley–Hamilton Theorem

3 Linear maps between vector spaces
  3.1 Definition and basic properties
  3.2 Representation by matrices
  3.3 Change of basis
  3.4 Canonical form revisited

4 Linear maps on a vector space
  4.1 Projections and direct sums
  4.2 Linear maps and matrices
  4.3 Eigenvalues and eigenvectors
  4.4 Diagonalisability
  4.5 Characteristic and minimal polynomials
  4.6 Jordan form
  4.7 Trace

5 Linear and quadratic forms
  5.1 Linear forms and dual space
    5.1.1 Adjoints
    5.1.2 Change of basis
  5.2 Quadratic forms
    5.2.1 Quadratic forms
    5.2.2 Reduction of quadratic forms
    5.2.3 Quadratic and bilinear forms
    5.2.4 Canonical forms for complex and real forms

6 Inner product spaces
  6.1 Inner products and orthonormal bases
  6.2 Adjoints and orthogonal linear maps

7 Symmetric and Hermitian matrices
  7.1 Orthogonal projections and orthogonal decompositions
  7.2 The Spectral Theorem
  7.3 Quadratic forms revisited
  7.4 Simultaneous diagonalisation

8 The complex case
  8.1 Complex inner products
  8.2 The complex Spectral Theorem
  8.3 Normal matrices

9 Skew-symmetric matrices
  9.1 Alternating bilinear forms
  9.2 Skew-symmetric and alternating matrices
  9.3 Complex skew-Hermitian matrices

A Fields and vector spaces
B Vandermonde and circulant matrices
C The Friendship Theorem
D Who is top of the league?
E Other canonical forms
F Worked examples


    Chapter 1

    Vector spaces

These notes are about linear maps and bilinear forms on vector spaces, how we represent them by matrices, how we manipulate them, and what we use this for.

    1.1 Definitions

Definition 1.1 A field is an algebraic system consisting of a non-empty set K equipped with two binary operations + (addition) and · (multiplication) satisfying the conditions:

(A) (K, +) is an abelian group with identity element 0 (called zero);

(M) (K \ {0}, ·) is an abelian group with identity element 1;

(D) the distributive law

    a(b + c) = ab + ac

holds for all a, b, c ∈ K.

If you don't know what an abelian group is, then you can find it spelled out in detail in Appendix A. In fact, the only fields that I will use in these notes are Q, the field of rational numbers; R, the field of real numbers; C, the field of complex numbers; and Fp, the field of integers mod p, where p is a prime number. I will not stop to prove that these structures really are fields. You may have seen Fp referred to as Zp.
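For readers who like to experiment, here is a tiny computational sketch (mine, not part of the original notes) which checks a couple of the field axioms for Fp with p = 5; the helper names are invented for the illustration.

```python
# Arithmetic in F_p (the integers mod p), here with p = 5.
p = 5
elements = range(p)

add = lambda a, b: (a + b) % p
mul = lambda a, b: (a * b) % p

# Axiom (M): every non-zero element has a multiplicative inverse.
for a in range(1, p):
    inverse = next(b for b in range(1, p) if mul(a, b) == 1)
    print(f"{a}^(-1) = {inverse} in F_{p}")

# Axiom (D): a(b + c) = ab + ac for all a, b, c.
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
           for a in elements for b in elements for c in elements)
print("distributive law holds in F_5")
```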



[Figure 1.1: The parallelogram law]

Not only is this definition much simpler, but it is much easier to check that the rules for a vector space are really satisfied! For example, we check the law a(v + w) = av + aw. Let v = (x1, y1) and w = (x2, y2). Then we have

    a(v + w) = a((x1, y1) + (x2, y2))
             = a(x1 + x2, y1 + y2)
             = (ax1 + ax2, ay1 + ay2)
             = (ax1, ay1) + (ax2, ay2)
             = av + aw.

In the algebraic definition, we say that the operations of addition and scalar multiplication are coordinatewise: this means that we add two vectors coordinate by coordinate, and similarly for scalar multiplication.

Using coordinates, this example can be generalised.

Example 1.2 Let n be any positive integer and K any field. Let V = Kn, the set of all n-tuples of elements of K. Then V is a vector space over K, where the operations are defined coordinatewise:

    (a1, a2, . . . , an) + (b1, b2, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn),
    c(a1, a2, . . . , an) = (ca1, ca2, . . . , can).
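As a quick illustration of the coordinatewise operations (a sketch added here, not part of the notes; K is taken to be Q via Python's Fraction type):

```python
# Coordinatewise operations on n-tuples, as in Example 1.2.
from fractions import Fraction

def vec_add(u, v):
    # (a1, ..., an) + (b1, ..., bn) = (a1 + b1, ..., an + bn)
    return tuple(a + b for a, b in zip(u, v))

def scalar_mul(c, u):
    # c(a1, ..., an) = (ca1, ..., can)
    return tuple(c * a for a in u)

u = (Fraction(1), Fraction(2), Fraction(3))
v = (Fraction(4), Fraction(5), Fraction(6))
print(vec_add(u, v))                  # (5, 7, 9)
print(scalar_mul(Fraction(1, 2), u))  # (1/2, 1, 3/2)
```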

    1.2 Bases

This example is much more general than it appears: Every finite-dimensional vector space looks like Example 1.2. Here's why.

Definition 1.3 Let V be a vector space over the field K, and let v1, . . . , vn be vectors in V.


(a) The vectors v1, v2, . . . , vn are linearly independent if, whenever we have scalars c1, c2, . . . , cn satisfying

    c1v1 + c2v2 + · · · + cnvn = 0,

then necessarily c1 = c2 = · · · = cn = 0.

(b) The vectors v1, v2, . . . , vn are spanning if, for every vector v ∈ V, we can find scalars c1, c2, . . . , cn ∈ K such that

    v = c1v1 + c2v2 + · · · + cnvn.

In this case, we write V = ⟨v1, v2, . . . , vn⟩.

(c) The vectors v1, v2, . . . , vn form a basis for V if they are linearly independent and spanning.

Remark Linear independence is a property of a list of vectors. A list containing the zero vector is never linearly independent. Also, a list in which the same vector occurs more than once is never linearly independent.

I will say "Let B = (v1, . . . , vn) be a basis for V" to mean that the list of vectors v1, . . . , vn is a basis, and to refer to this list as B.

Definition 1.4 Let V be a vector space over the field K. We say that V is finite-dimensional if we can find vectors v1, v2, . . . , vn ∈ V which form a basis for V.

Remark In these notes we are only concerned with finite-dimensional vector spaces. If you study Functional Analysis, Quantum Mechanics, or various other subjects, you will meet vector spaces which are not finite-dimensional.

Proposition 1.1 The following three conditions are equivalent for the vectors v1, . . . , vn of the vector space V over K:

(a) v1, . . . , vn is a basis;

(b) v1, . . . , vn is a maximal linearly independent set (that is, if we add any vector to the list, then the result is no longer linearly independent);

(c) v1, . . . , vn is a minimal spanning set (that is, if we remove any vector from the list, then the result is no longer spanning).
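Over K = R these conditions can be tested numerically: n vectors in Rᵐ are linearly independent exactly when the matrix having them as columns has rank n, and spanning exactly when its rank is m. The following sketch (an illustration of mine using NumPy, not part of the notes) checks the conditions of Proposition 1.1 in this way.

```python
import numpy as np

def is_independent(vectors):
    # Columns are linearly independent iff the rank equals the number of vectors.
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[1]

def is_spanning(vectors):
    # The columns span R^m iff the rank equals m.
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[0]

def is_basis(vectors):
    return is_independent(vectors) and is_spanning(vectors)

v1, v2 = np.array([1.0, 1.0]), np.array([2.0, 3.0])
print(is_basis([v1, v2]))            # True: two independent vectors in R^2
print(is_basis([v1, v2, v1 + v2]))   # False: adding a vector destroys independence
```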

    The next theorem helps us to understand the properties of linear independence.


Theorem 1.2 (The Exchange Lemma) Let V be a vector space over K. Suppose that the vectors v1, . . . , vn are linearly independent, and that the vectors w1, . . . , wm are linearly independent, where m > n. Then we can find a number i with 1 ≤ i ≤ m such that the vectors v1, . . . , vn, wi are linearly independent.

    In order to prove this, we need a lemma about systems of equations.

Lemma 1.3 Given a system (∗)

    a11x1 + a12x2 + · · · + a1mxm = 0,
    a21x1 + a22x2 + · · · + a2mxm = 0,
        ...
    an1x1 + an2x2 + · · · + anmxm = 0

of homogeneous linear equations, where the number n of equations is strictly less than the number m of variables, there exists a non-zero solution (x1, . . . , xm) (that is, x1, . . . , xm are not all zero).

Proof This is proved by induction on the number of variables. If the coefficients a11, a21, . . . , an1 of x1 are all zero, then putting x1 = 1 and the other variables zero gives a solution. If one of these coefficients is non-zero, then we can use the corresponding equation to express x1 in terms of the other variables, obtaining n − 1 equations in m − 1 variables. By hypothesis, n − 1 < m − 1. So by the induction hypothesis, these new equations have a non-zero solution. Computing the value of x1 gives a solution to the original equations.

Now we turn to the proof of the Exchange Lemma. Let us argue for a contradiction, by assuming that the result is false: that is, assume that none of the vectors wi can be added to the list (v1, . . . , vn) to produce a larger linearly independent list. This means that, for all i, the list (v1, . . . , vn, wi) is linearly dependent. So there are coefficients c1, . . . , cn, d, not all zero, such that

    c1v1 + · · · + cnvn + dwi = 0.

We cannot have d = 0; for this would mean that we had a linear combination of v1, . . . , vn equal to zero, contrary to the hypothesis that these vectors are linearly independent. So we can divide the equation through by d, and take wi to the other side, to obtain (changing notation slightly)

    wi = a1iv1 + a2iv2 + · · · + anivn = Σ_{j=1}^{n} ajivj.


We do this for each value of i = 1, . . . , m.

Now take a non-zero solution to the set of equations (∗) above: that is,

    Σ_{i=1}^{m} ajixi = 0   for j = 1, . . . , n.

Multiplying the formula for wi by xi and adding, we obtain

    x1w1 + · · · + xmwm = Σ_{j=1}^{n} ( Σ_{i=1}^{m} ajixi ) vj = 0.

But the coefficients xi are not all zero, so this means that the vectors w1, . . . , wm are not linearly independent, contrary to hypothesis.

So the assumption that no wi can be added to (v1, . . . , vn) to get a linearly independent set must be wrong, and the proof is complete.

    The Exchange Lemma has some important consequences:

Corollary 1.4 Let V be a finite-dimensional vector space over a field K. Then

    (a) any two bases of V have the same number of elements;

    (b) any linearly independent set can be extended to a basis.

The number of elements in a basis is called the dimension of the vector space V. We will say "an n-dimensional vector space" instead of "a finite-dimensional vector space whose dimension is n". We denote the dimension of V by dim(V).

Proof Let us see how the corollary follows from the Exchange Lemma.

(a) Let (v1, . . . , vn) and (w1, . . . , wm) be two bases for V. Suppose, for a contradiction, that they have different numbers of elements; say that n < m, without loss of generality. Both lists of vectors are linearly independent; so, according to the Exchange Lemma, we can add some vector wi to the first list to get a larger linearly independent list. This means that v1, . . . , vn was not a maximal linearly independent set, and so (by Proposition 1.1) not a basis, contradicting our assumption. We conclude that m = n, as required.

(b) Let (v1, . . . , vn) be linearly independent and let (w1, . . . , wm) be a basis. Necessarily n ≤ m, since otherwise we could add one of the v's to (w1, . . . , wm) to get a larger linearly independent set, contradicting maximality. But now we can add some w's to (v1, . . . , vn) until we obtain a basis.


Remark We allow the possibility that a vector space has dimension zero. Such a vector space contains just one vector, the zero vector 0; a basis for this vector space consists of the empty set.

Now let V be an n-dimensional vector space over K. This means that there is a basis v1, v2, . . . , vn for V. Since this list of vectors is spanning, every vector v ∈ V can be expressed as

    v = c1v1 + c2v2 + · · · + cnvn

for some scalars c1, c2, . . . , cn ∈ K. The scalars c1, . . . , cn are the coordinates of v (with respect to the given basis), and the coordinate representation of v is the n-tuple

    (c1, c2, . . . , cn) ∈ Kn.

Now the coordinate representation is unique. For suppose that we also had

    v = c1′v1 + c2′v2 + · · · + cn′vn

for scalars c1′, c2′, . . . , cn′. Subtracting these two expressions, we obtain

    0 = (c1 − c1′)v1 + (c2 − c2′)v2 + · · · + (cn − cn′)vn.

Now the vectors v1, v2, . . . , vn are linearly independent; so this equation implies that c1 − c1′ = 0, c2 − c2′ = 0, . . . , cn − cn′ = 0; that is, c1 = c1′, c2 = c2′, . . . , cn = cn′.

Now it is easy to check that, when we add two vectors in V, we add their coordinate representations in Kn (using coordinatewise addition); and when we multiply a vector v ∈ V by a scalar c, we multiply its coordinate representation by c. In other words, addition and scalar multiplication in V translate to the same operations on their coordinate representations. This is why we only need to consider vector spaces of the form Kn, as in Example 1.2.

    Here is how the result would be stated in the language of abstract algebra:

Theorem 1.5 Any n-dimensional vector space over a field K is isomorphic to the

    vector space Kn.
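Concretely, finding the coordinate representation of a vector v of Rⁿ relative to a basis amounts to solving the linear system whose coefficient columns are the basis vectors. A short NumPy sketch (added here for illustration only):

```python
import numpy as np

def coordinates(v, basis):
    # Solve c1*v1 + ... + cn*vn = v for the coordinates (c1, ..., cn).
    M = np.column_stack(basis)    # matrix whose columns are the basis vectors
    return np.linalg.solve(M, v)

basis = [np.array([1.0, 1.0]), np.array([2.0, 3.0])]
v = np.array([4.0, 7.0])
print(coordinates(v, basis))      # [-2.  3.], since -2*(1,1) + 3*(2,3) = (4,7)
```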

    1.3 Row and column vectors

    The elements of the vector space Kn are all the n-tuples of scalars from the field

    K. There are two different ways that we can represent an n-tuple: as a row, or as


a column. Thus, the vector with components 1, 2 and −3 can be represented as a row vector

    [ 1 2 −3 ]

or as a column vector

    [  1 ]
    [  2 ]
    [ −3 ].

(Note that we use square brackets, rather than round brackets or parentheses. But you will see the notation (1, 2, −3) and the equivalent for columns in other books!)

Both systems are in common use, and you should be familiar with both. The choice of row or column vectors makes some technical differences in the statements of the theorems, so care is needed.

There are arguments for and against both systems. Those who prefer row vectors would argue that we already use (x, y) or (x, y, z) for the coordinates of a point in 2- or 3-dimensional Euclidean space, so we should use the same for vectors. The most powerful argument will appear when we consider representing linear maps by matrices.

Those who prefer column vectors point to the convenience of representing, say, the linear equations

    2x + 3y = 5,
    4x + 5y = 9

in matrix form

    [ 2 3 ] [ x ]   [ 5 ]
    [ 4 5 ] [ y ] = [ 9 ].

    Statisticians also prefer column vectors: to a statistician, a vector often represents

    data from an experiment, and data are usually recorded in columns on a datasheet.

    I will use column vectors in these notes. So we make a formal definition:

Definition 1.5 Let V be a vector space with a basis B = (v1, v2, . . . , vn). If v = c1v1 + c2v2 + · · · + cnvn, then the coordinate representation of v relative to the basis B is

    [v]B = [ c1 ]
           [ c2 ]
           [ ⋮  ]
           [ cn ].

In order to save space on the paper, we often write this as

    [v]B = [ c1 c2 . . . cn ]ᵀ.

The symbol ᵀ is read "transpose".


    1.4 Change of basis

    The coordinate representation of a vector is always relative to a basis. We now

    have to look at how the representation changes when we use a different basis.

Definition 1.6 Let B = (v1, . . . , vn) and B′ = (v1′, . . . , vn′) be bases for the n-dimensional vector space V over the field K. The transition matrix P from B to B′ is the n × n matrix whose jth column is the coordinate representation [vj′]B of the jth vector of B′ relative to B. If we need to specify the bases, we write PB,B′.

Proposition 1.6 Let B and B′ be bases for the n-dimensional vector space V over the field K. Then, for any vector v ∈ V, the coordinate representations of v with respect to B and B′ are related by

    [v]B = P [v]B′.

Proof Let pij be the (i, j) entry of the matrix P. By definition, we have

    vj′ = Σ_{i=1}^{n} pij vi.

Take an arbitrary vector v ∈ V, and let

    [v]B = [ c1, . . . , cn ]ᵀ,   [v]B′ = [ d1, . . . , dn ]ᵀ.

This means, by definition, that

    v = Σ_{i=1}^{n} ci vi = Σ_{j=1}^{n} dj vj′.

Substituting the formula for vj′ into the second equation, we have

    v = Σ_{j=1}^{n} dj ( Σ_{i=1}^{n} pij vi ).

Reversing the order of summation, we get

    v = Σ_{i=1}^{n} ( Σ_{j=1}^{n} pij dj ) vi.

Now we have two expressions for v as a linear combination of the vectors vi. By the uniqueness of the coordinate representation, they are the same: that is,

    ci = Σ_{j=1}^{n} pij dj.


In matrix form, this says

    [ c1 ]       [ d1 ]
    [ ⋮  ]  = P  [ ⋮  ],
    [ cn ]       [ dn ]

or in other words

    [v]B = P [v]B′,

as required.

    In this course, we will see four ways in which matrices arise in linear algebra.

    Here is the first occurrence: matrices arise as transition matrices between bases

    of a vector space.

The next corollary summarises how transition matrices behave. Here I denotes the identity matrix, the matrix having 1s on the main diagonal and 0s everywhere else. Given a matrix P, we denote by P⁻¹ the inverse of P, the matrix Q satisfying PQ = QP = I. Not every matrix has an inverse: we say that P is invertible or non-singular if it has an inverse.

Corollary 1.7 Let B, B′, B′′ be bases of the vector space V.

(a) PB,B = I.

(b) PB′,B = (PB,B′)⁻¹.

(c) PB,B′′ = PB,B′ PB′,B′′.

This follows from the preceding Proposition. For example, for (b) we have

    [v]B = PB,B′ [v]B′,   [v]B′ = PB′,B [v]B,

so

    [v]B = PB,B′ PB′,B [v]B.

By the uniqueness of the coordinate representation, we have PB,B′ PB′,B = I.

Corollary 1.8 The transition matrix between any two bases of a vector space is invertible.

    This follows immediately from (b) of the preceding Corollary.

Remark We see that, to express the coordinate representation w.r.t. the new basis in terms of that w.r.t. the old one, we need the inverse of the transition matrix:

    [v]B′ = (PB,B′)⁻¹ [v]B.


Example Consider the vector space R2, with the two bases

    B = ( [ 1 0 ]ᵀ, [ 0 1 ]ᵀ ),   B′ = ( [ 1 1 ]ᵀ, [ 2 3 ]ᵀ ).

The transition matrix is

    PB,B′ = [ 1 2 ]
            [ 1 3 ],

whose inverse is calculated to be

    PB′,B = [  3 −2 ]
            [ −1  1 ].

So the theorem tells us that, for any x, y ∈ R, we have

    [ x y ]ᵀ = x [ 1 0 ]ᵀ + y [ 0 1 ]ᵀ = (3x − 2y) [ 1 1 ]ᵀ + (−x + y) [ 2 3 ]ᵀ,

as is easily checked.
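The example is easy to verify numerically. The sketch below (mine, not part of the notes) builds the transition matrix from the definition and checks the statement of Proposition 1.6.

```python
import numpy as np

# Bases B (the standard basis) and B' of R^2, with basis vectors as columns.
B  = np.array([[1.0, 0.0], [0.0, 1.0]])
Bp = np.array([[1.0, 2.0], [1.0, 3.0]])

# P_{B,B'}: the jth column is the B-coordinate representation of the jth vector of B'.
P = np.linalg.solve(B, Bp)                    # here simply Bp, since B is the identity
print(P)                                      # [[1. 2.] [1. 3.]]

# Proposition 1.6: [v]_B = P [v]_{B'} for any v.
v = np.array([5.0, -2.0])
coords_B  = np.linalg.solve(B, v)             # [v]_B
coords_Bp = np.linalg.solve(Bp, v)            # [v]_{B'}
print(np.allclose(coords_B, P @ coords_Bp))   # True

# The remark above: [v]_{B'} = P^{-1} [v]_B.
print(np.linalg.inv(P))                       # [[ 3. -2.] [-1.  1.]]
```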

    1.5 Subspaces and direct sums

Definition 1.7 A non-empty subset of a vector space is called a subspace if it contains the sum of any two of its elements and any scalar multiple of any of its elements. We write U ≤ V to mean "U is a subspace of V".

    A subspace of a vector space is a vector space in its own right.

    Subspaces can be constructed in various ways:

(a) Let v1, . . . , vn ∈ V. The span of (v1, . . . , vn) is the set

    {c1v1 + c2v2 + · · · + cnvn : c1, . . . , cn ∈ K}.

This is a subspace of V. Moreover, (v1, . . . , vn) is a spanning set in this subspace. We denote the span of v1, . . . , vn by ⟨v1, . . . , vn⟩.

(b) Let U1 and U2 be subspaces of V. Then

- the intersection U1 ∩ U2 is the set of all vectors belonging to both U1 and U2;

- the sum U1 + U2 is the set {u1 + u2 : u1 ∈ U1, u2 ∈ U2} of all sums of vectors from the two subspaces.

Both U1 ∩ U2 and U1 + U2 are subspaces of V.


    The next result summarises some properties of these subspaces. Proofs are left

    to the reader.

Proposition 1.9 Let V be a vector space over K.

(a) For any v1, . . . , vn ∈ V, the dimension of ⟨v1, . . . , vn⟩ is at most n, with equality if and only if v1, . . . , vn are linearly independent.

(b) For any two subspaces U1 and U2 of V, we have

    dim(U1 ∩ U2) + dim(U1 + U2) = dim(U1) + dim(U2).
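The dimension formula in (b) can be checked on explicit subspaces. In the sketch below (my own illustration using SymPy, not part of the notes) each subspace of R⁴ is given by a basis of columns; dim(U1 + U2) is the rank of the combined columns, and, because the columns of B1 are independent, dim(U1 ∩ U2) equals the dimension of the null space of the block matrix [B1 | −B2] (a null vector (x, y) encodes an element B1x = B2y of the intersection).

```python
import sympy as sp

B1 = sp.Matrix([[1, 0], [0, 1], [0, 0], [0, 0]])   # U1 = <e1, e2>
B2 = sp.Matrix([[0, 0], [1, 0], [0, 1], [0, 0]])   # U2 = <e2, e3>

dim_U1, dim_U2 = B1.rank(), B2.rank()
dim_sum = B1.row_join(B2).rank()                   # dim(U1 + U2)
dim_int = len(B1.row_join(-B2).nullspace())        # dim(U1 ∩ U2), see note above

print(dim_U1, dim_U2, dim_sum, dim_int)            # 2 2 3 1
print(dim_int + dim_sum == dim_U1 + dim_U2)        # True, as Proposition 1.9(b) predicts
```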

An important special case occurs when U1 ∩ U2 is the zero subspace {0}. In this case, the sum U1 + U2 has the property that each of its elements has a unique expression in the form u1 + u2, for u1 ∈ U1 and u2 ∈ U2. For suppose that we had two different expressions for a vector v, say

    v = u1 + u2 = u1′ + u2′,   u1, u1′ ∈ U1, u2, u2′ ∈ U2.

Then

    u1 − u1′ = u2′ − u2.

But u1 − u1′ ∈ U1, and u2′ − u2 ∈ U2; so this vector is in U1 ∩ U2, and by hypothesis it is equal to 0, so that u1 = u1′ and u2 = u2′; that is, the two expressions are not different after all! In this case we say that U1 + U2 is the direct sum of the subspaces U1 and U2, and write it as U1 ⊕ U2. Note that

    dim(U1 ⊕ U2) = dim(U1) + dim(U2).

The notion of direct sum extends to more than two summands, but is a little complicated to describe. We state a form which is sufficient for our purposes.

Definition 1.8 Let U1, . . . , Ur be subspaces of the vector space V. We say that V is the direct sum of U1, . . . , Ur, and write

    V = U1 ⊕ · · · ⊕ Ur,

if every vector v ∈ V can be written uniquely in the form v = u1 + · · · + ur with ui ∈ Ui for i = 1, . . . , r.

Proposition 1.10 If V = U1 ⊕ · · · ⊕ Ur, then

(a) dim(V) = dim(U1) + · · · + dim(Ur);

(b) if Bi is a basis for Ui for i = 1, . . . , r, then B1 ∪ · · · ∪ Br is a basis for V.


    Chapter 2

    Matrices and determinants

    You have certainly seen matrices before; indeed, we met some in the first chapter

    of the notes. Here we revise matrix algebra, consider row and column operations

    on matrices, and define the rank of a matrix. Then we define the determinant of

    a square matrix axiomatically and prove that it exists (that is, there is a unique

    determinant function satisfying the rules we lay down), and give some methods

of calculating it and some of its properties. Finally we prove the Cayley–Hamilton

    Theorem: every matrix satisfies its own characteristic equation.

    2.1 Matrix algebra

Definition 2.1 A matrix of size m × n over a field K, where m and n are positive integers, is an array with m rows and n columns, where each entry is an element of K. For 1 ≤ i ≤ m and 1 ≤ j ≤ n, the entry in row i and column j of A is denoted by Aij, and referred to as the (i, j) entry of A.

Example 2.1 A column vector in Kn can be thought of as an n × 1 matrix, while a row vector is a 1 × n matrix.

Definition 2.2 We define addition and multiplication of matrices as follows.

(a) Let A and B be matrices of the same size m × n over K. Then the sum A + B is defined by adding corresponding entries:

    (A + B)ij = Aij + Bij.

(b) Let A be an m × n matrix and B an n × p matrix over K. Then the product AB is the m × p matrix whose (i, j) entry is obtained by multiplying each element in the ith row of A by the corresponding element in the jth column of B and summing:

    (AB)ij = Σ_{k=1}^{n} Aik Bkj.

Remark Note that we can only add or multiply matrices if their sizes satisfy appropriate conditions. In particular, for a fixed value of n, we can add and multiply n × n matrices. It turns out that the set Mn(K) of n × n matrices over K is a ring with identity: this means that it satisfies conditions (A0)–(A4), (M0)–(M2) and (D) of Appendix 1. The zero matrix, which we denote by O, is the matrix with every entry zero, while the identity matrix, which we denote by I, is the matrix with entries 1 on the main diagonal and 0 everywhere else. Note that matrix multiplication is not commutative: BA is usually not equal to AB.

We already met matrix multiplication in Section 1 of the notes: recall that if PB,B′ denotes the transition matrix between two bases of a vector space, then

    PB,B′ PB′,B′′ = PB,B′′.

    2.2 Row and column operations

Given an m × n matrix A over a field K, we define certain operations on A called row and column operations.

Definition 2.3 Elementary row operations There are three types:

Type 1 Add a multiple of the jth row to the ith, where j ≠ i.

Type 2 Multiply the ith row by a non-zero scalar.

Type 3 Interchange the ith and jth rows, where j ≠ i.

Elementary column operations There are three types:

Type 1 Add a multiple of the jth column to the ith, where j ≠ i.

Type 2 Multiply the ith column by a non-zero scalar.

Type 3 Interchange the ith and jth columns, where j ≠ i.

By applying these operations, we can reduce any matrix to a particularly simple form:


Theorem 2.1 Let A be an m × n matrix over the field K. Then it is possible to change A into B by elementary row and column operations, where B is a matrix of the same size satisfying Bii = 1 for 1 ≤ i ≤ r, where r ≤ min{m, n}, and all other entries of B are zero.

If A can be reduced to two matrices B and B′ both of the above form, where the numbers of non-zero elements are r and r′ respectively, by different sequences of elementary operations, then r = r′, and so B = B′.

Definition 2.4 The number r in the above theorem is called the rank of A; while a matrix of the form described for B is said to be in the canonical form for equivalence. We can write the canonical form matrix in block form as

    B = [ Ir  O ]
        [ O   O ],

where Ir is an r × r identity matrix and O denotes a zero matrix of the appropriate size (that is, r × (n − r), (m − r) × r, and (m − r) × (n − r) respectively for the three Os). Note that some or all of these Os may be missing: for example, if r = m, we just have [ Im  O ].

Proof We outline the proof that the reduction is possible. To prove that we always get the same value of r, we need a different argument.

The proof is by induction on the size of the matrix A: in other words, we assume as inductive hypothesis that any smaller matrix can be reduced as in the theorem. Let the matrix A be given. We proceed in steps as follows:

- If A = O (the all-zero matrix), then the conclusion of the theorem holds, with r = 0; no reduction is required. So assume that A ≠ O.

- If A11 ≠ 0, then skip this step. If A11 = 0, then there is a non-zero element Aij somewhere in A; by swapping the first and ith rows, and the first and jth columns, if necessary (Type 3 operations), we can bring this entry into the (1, 1) position.

- Now we can assume that A11 ≠ 0. Multiplying the first row by A11⁻¹ (row operation Type 2), we obtain a matrix with A11 = 1.

- Now by row and column operations of Type 1, we can assume that all the other elements in the first row and column are zero. For if A1j ≠ 0, then subtracting A1j times the first column from the jth gives a matrix with A1j = 0. Repeat this until all non-zero elements have been removed.


- Now let B be the matrix obtained by deleting the first row and column of A. Then B is smaller than A and so, by the inductive hypothesis, we can reduce B to canonical form by elementary row and column operations. The same sequence of operations applied to A now finishes the job.

Example 2.2 Here is a small example. Let

    A = [ 1 2 3 ]
        [ 4 5 6 ].

We have A11 = 1, so we can skip the first three steps. Subtracting twice the first column from the second, and three times the first column from the third, gives the matrix

    [ 1  0  0 ]
    [ 4 −3 −6 ].

Now subtracting four times the first row from the second gives

    [ 1  0  0 ]
    [ 0 −3 −6 ].

From now on, we have to operate on the smaller matrix [ −3 −6 ], but we continue to apply the operations to the large matrix.

Multiply the second row by −1/3 to get

    [ 1 0 0 ]
    [ 0 1 2 ].

Now subtract twice the second column from the third to obtain

    [ 1 0 0 ]
    [ 0 1 0 ].

We have finished the reduction, and we conclude that the rank of the original matrix A is equal to 2.
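The reduction in the proof of Theorem 2.1 is easy to mechanise. The following sketch (my own, over the rationals) follows the steps above literally: bring a non-zero entry to the (1, 1) position, scale it to 1, clear the rest of the first column and row, then recurse on the smaller matrix; the number of leading 1s produced is the rank.

```python
from fractions import Fraction

def rank_by_reduction(A):
    """Rank of a matrix (a list of lists of Fractions), computed by the
    row/column reduction of Theorem 2.1."""
    if not A or not A[0]:
        return 0
    # Type 3 operations: bring a non-zero entry to position (1, 1).
    pos = next(((i, j) for i, row in enumerate(A) for j, x in enumerate(row) if x != 0), None)
    if pos is None:                       # A = O, so the rank is 0
        return 0
    i, j = pos
    A[0], A[i] = A[i], A[0]
    for row in A:
        row[0], row[j] = row[j], row[0]
    # Type 2 operation: scale the first row so that A11 = 1.
    scale = A[0][0]
    A[0] = [x / scale for x in A[0]]
    # Type 1 row operations: clear the rest of the first column.
    for r in range(1, len(A)):
        factor = A[r][0]
        A[r] = [A[r][c] - factor * A[0][c] for c in range(len(A[0]))]
    # Type 1 column operations: clear the rest of the first row.
    for c in range(1, len(A[0])):
        factor = A[0][c]
        for r in range(len(A)):
            A[r][c] = A[r][c] - factor * A[r][0]
    # Delete the first row and column and recurse.
    return 1 + rank_by_reduction([row[1:] for row in A[1:]])

A = [[Fraction(1), Fraction(2), Fraction(3)],
     [Fraction(4), Fraction(5), Fraction(6)]]
print(rank_by_reduction(A))               # 2, as in Example 2.2
```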

We finish this section by describing the elementary row and column operations in a different way.

For each elementary row operation on an n-rowed matrix A, we define the corresponding elementary matrix by applying the same operation to the n × n identity matrix I. Similarly we represent elementary column operations by elementary matrices obtained by applying the same operations to the m × m identity matrix (where m is the number of columns of A).

We don't have to distinguish between rows and columns for our elementary matrices. For example, the matrix

    [ 1 2 0 ]
    [ 0 1 0 ]
    [ 0 0 1 ]


    corresponds to the elementary column operation of adding twice the first column

to the second, or to the elementary row operation of adding twice the second row to the first. For the other types, the matrices for row operations and column

    operations are identical.

    Lemma 2.2 The effect of an elementary row operation on a matrix is the same as

    that of multiplying on the left by the corresponding elementary matrix. Similarly,

    the effect of an elementary column operation is the same as that of multiplying on

    the right by the corresponding elementary matrix.

The proof of this lemma is a somewhat tedious calculation.

Example 2.3 We continue our previous example. In order, here is the list of elementary matrices corresponding to the operations we applied to A. (Here 2 × 2 matrices are row operations while 3 × 3 matrices are column operations.)

    [ 1 −2 0 ]   [ 1 0 −3 ]   [ 1 0 ]   [ 1  0   ]   [ 1 0  0 ]
    [ 0  1 0 ],  [ 0 1  0 ],  [−4 1 ],  [ 0 −1/3 ],  [ 0 1 −2 ]
    [ 0  0 1 ]   [ 0 0  1 ]                          [ 0 0  1 ]

So the whole process can be written as a matrix equation:

    [ 1  0   ] [ 1 0 ]     [ 1 −2 0 ] [ 1 0 −3 ] [ 1 0  0 ]
    [ 0 −1/3 ] [−4 1 ]  A  [ 0  1 0 ] [ 0 1  0 ] [ 0 1 −2 ]  = B,
                           [ 0  0 1 ] [ 0 0  1 ] [ 0 0  1 ]

or more simply

    [ 1    0   ]     [ 1 −2  1 ]
    [ 4/3 −1/3 ]  A  [ 0  1 −2 ]  = B,
                     [ 0  0  1 ]

where, as before,

    A = [ 1 2 3 ]        B = [ 1 0 0 ]
        [ 4 5 6 ],           [ 0 1 0 ].

An important observation about the elementary operations is that each of them can have its effect undone by another elementary operation of the same kind, and hence every elementary matrix is invertible, with its inverse being another elementary matrix of the same kind. For example, the effect of adding twice the first column to the second is undone by adding −2 times the first column to the second, so that

    [ 1 2 ]⁻¹   [ 1 −2 ]
    [ 0 1 ]   = [ 0  1 ].

Since the product of invertible matrices is invertible, we can state the above theorem in a more concise form. First, one more definition:


Definition 2.5 The m × n matrices A and B are said to be equivalent if B = PAQ, where P and Q are invertible matrices of sizes m × m and n × n respectively.

Theorem 2.3 Given any m × n matrix A, there exist invertible matrices P and Q of sizes m × m and n × n respectively, such that PAQ is in the canonical form for equivalence.

Remark The relation "equivalence" defined above is an equivalence relation on the set of all m × n matrices; that is, it is reflexive, symmetric and transitive.

When mathematicians talk about a "canonical form" for an equivalence relation, they mean a set of objects which are representatives of the equivalence classes: that is, every object is equivalent to a unique object in the canonical form. We have shown this for the relation of equivalence defined earlier, except for the uniqueness of the canonical form. This is our job for the next section.

    2.3 Rank

We have the unfinished business of showing that the rank of a matrix is well defined; that is, no matter how we do the row and column reduction, we end up with the same canonical form. We do this by defining two further kinds of rank, and proving that all three are the same.

Definition 2.6 Let A be an m × n matrix over a field K. We say that the column rank of A is the maximum number of linearly independent columns of A, while the row rank of A is the maximum number of linearly independent rows of A. (We regard columns or rows as vectors in Km and Kn respectively.)

    Now we need a sequence of four lemmas.

Lemma 2.4 (a) Elementary column operations don't change the column rank of a matrix.

(b) Elementary row operations don't change the column rank of a matrix.

(c) Elementary column operations don't change the row rank of a matrix.

(d) Elementary row operations don't change the row rank of a matrix.

    Proof (a) This is clear for Type 3 operations, which just rearrange the vectors.

    For Types 1 and 2, we have to show that such an operation cannot take a linearly

    independent set to a linearly dependent set; the vice versa statement holds because

    the inverse of an elementary operation is another operation of the same kind.


So suppose that v1, . . . , vn are linearly independent. Consider a Type 1 operation involving adding c times the jth column to the ith; the new columns are v1′, . . . , vn′, where vk′ = vk for k ≠ i, while vi′ = vi + cvj. Suppose that the new vectors are linearly dependent. Then there are scalars a1, . . . , an, not all zero, such that

    0 = a1v1′ + · · · + anvn′
      = a1v1 + · · · + ai(vi + cvj) + · · · + ajvj + · · · + anvn
      = a1v1 + · · · + aivi + · · · + (aj + cai)vj + · · · + anvn.

Since v1, . . . , vn are linearly independent, we conclude that

    a1 = 0, . . . , ai = 0, . . . , aj + cai = 0, . . . , an = 0,

from which we see that all the ak are zero, contrary to assumption. So the new columns are linearly independent.

The argument for Type 2 operations is similar but easier.

(b) It is easily checked that, if an elementary row operation is applied, then the new vectors satisfy exactly the same linear relations as the old ones (that is, the same linear combinations are zero). So the linearly independent sets of vectors don't change at all.

    (c) Same as (b), but applied to rows.

    (d) Same as (a), but applied to rows.

    Theorem 2.5 For any matrix A, the row rank, the column rank, and the rank are

    all equal. In particular, the rank is independent of the row and column operations

    used to compute it.

Proof Suppose that we reduce A to canonical form B by elementary operations, where B has rank r. These elementary operations don't change the row or column rank, by our lemma; so the row ranks of A and B are equal, and their column ranks are equal. But it is trivial to see that, if

    B = [ Ir  O ]
        [ O   O ],

then the row and column ranks of B are both equal to r. So the theorem is proved.

We can get an extra piece of information from our deliberations. Let A be an invertible n × n matrix. Then the canonical form of A is just I: its rank is equal to n. This means that there are matrices P and Q, each a product of elementary matrices, such that

    PAQ = In.


From this we deduce that

    A = P⁻¹ In Q⁻¹ = P⁻¹ Q⁻¹;

in other words,

    Corollary 2.6 Every invertible square matrix is a product of elementary matrices.

In fact, we learn a little bit more. We observed, when we defined elementary matrices, that they can represent either elementary column operations or elementary row operations. So, when we have written A as a product of elementary matrices, we can choose to regard them as representing column operations, and we see that A can be obtained from the identity by applying elementary column operations. If we now apply the inverse operations in the other order, they will turn A into the identity (which is its canonical form). In other words, the following is true:

Corollary 2.7 If A is an invertible n × n matrix, then A can be transformed into the identity matrix by elementary column operations alone (or by elementary row operations alone).

2.4 Determinants

The determinant is a function defined on square matrices; its value is a scalar. It has some very important properties: perhaps most important is the fact that a matrix is invertible if and only if its determinant is not equal to zero.

We denote the determinant function by det, so that det(A) is the determinant of A. For a matrix written out as an array, the determinant is denoted by replacing the square brackets by vertical bars:

    det [ 1 2 ]  =  | 1 2 |
        [ 3 4 ]     | 3 4 |.

You have met determinants in earlier courses, and you know the formula for the determinant of a 2 × 2 or 3 × 3 matrix:

    | a b |
    | c d | = ad − bc,

    | a b c |
    | d e f | = aei + bfg + cdh − afh − bdi − ceg.
    | g h i |

Our first job is to define the determinant for square matrices of any size. We do this in an axiomatic manner:


Definition 2.7 A function D defined on n × n matrices is a determinant if it satisfies the following three conditions:

(D1) For 1 ≤ i ≤ n, D is a linear function of the ith column: this means that, if A and A′ are two matrices which agree everywhere except the ith column, and if A′′ is the matrix whose ith column is c times the ith column of A plus c′ times the ith column of A′, but agreeing with A and A′ everywhere else, then

    D(A′′) = c D(A) + c′ D(A′).

(D2) If A has two equal columns, then D(A) = 0.

(D3) D(In) = 1, where In is the n × n identity matrix.

We show the following result:

Theorem 2.8 There is a unique determinant function on n × n matrices, for any n.

Proof First, we show that applying elementary column operations to A has a well-defined effect on D(A).

(a) If B is obtained from A by adding c times the jth column to the ith, then D(B) = D(A).

(b) If B is obtained from A by multiplying the ith column by a non-zero scalar c, then D(B) = cD(A).

(c) If B is obtained from A by interchanging two columns, then D(B) = −D(A).

For (a), let A′ be the matrix which agrees with A in all columns except the ith, which is equal to the jth column of A. By rule (D2), D(A′) = 0. By rule (D1),

    D(B) = D(A) + cD(A′) = D(A).

Part (b) follows immediately from rule (D1).

To prove part (c), we observe that we can interchange the ith and jth columns by the following sequence of operations:

- add the ith column to the jth;
- multiply the ith column by −1;
- add the jth column to the ith;
- subtract the ith column from the jth.


In symbols,

    (ci, cj) → (ci, cj + ci) → (−ci, cj + ci) → (cj, cj + ci) → (cj, ci).

The first, third and fourth steps don't change the value of D, while the second multiplies it by −1.

Now we take the matrix A and apply elementary column operations to it, keeping track of the factors by which D gets multiplied according to rules (a)–(c). The overall effect is to multiply D(A) by a certain non-zero scalar c, depending on the operations.

- If A is invertible, then we can reduce A to the identity, so that cD(A) = D(I) = 1, whence D(A) = c⁻¹.

- If A is not invertible, then its column rank is less than n. So the columns of A are linearly dependent, and one column can be written as a linear combination of the others. Applying axiom (D1), we see that D(A) is a linear combination of values D(A′), where A′ are matrices with two equal columns; so D(A′) = 0 for all such A′, whence D(A) = 0.

    This proves that the determinant function, if it exists, is unique. We show its

    existence in the next section, by giving a couple of formulae for it.

Given the uniqueness of the determinant function, we now denote it by det(A) instead of D(A). The proof of the theorem shows an important corollary:

Corollary 2.9 A square matrix is invertible if and only if det(A) ≠ 0.

    Proof See the case division at the end of the proof of the theorem.

    One of the most important properties of the determinant is the following.

Theorem 2.10 If A and B are n × n matrices over K, then det(AB) = det(A) det(B).

Proof Suppose first that B is not invertible. Then det(B) = 0. Also, AB is not invertible. (For, suppose that (AB)⁻¹ = X, so that XAB = I. Then XA is the inverse of B.) So det(AB) = 0, and the theorem is true.

In the other case, B is invertible, so we can apply a sequence of elementary column operations to B to get to the identity. The effect of these operations is to multiply the determinant by a non-zero factor c (depending on the operations), so that c det(B) = det(I) = 1, or c = (det(B))⁻¹. Now these operations are represented by elementary matrices; so we see that BQ = I, where Q is a product of elementary matrices.


If we apply the same sequence of elementary operations to AB, we end up with the matrix (AB)Q = A(BQ) = AI = A. The determinant is multiplied by the same factor, so we find that c det(AB) = det(A). Since c = (det(B))⁻¹, this implies that det(AB) = det(A) det(B), as required.

    Finally, we have defined determinants using columns, but we could have used

    rows instead:

Proposition 2.11 The determinant is the unique function D of n × n matrices which satisfies the conditions

(D1) for 1 ≤ i ≤ n, D is a linear function of the ith row;

(D2) if two rows of A are equal, then D(A) = 0;

(D3) D(In) = 1.

The proof of uniqueness is almost identical to that for columns. To see that D(A) = det(A): if A is not invertible, then D(A) = det(A) = 0; but if A is invertible, then it is a product of elementary matrices (which can represent either row or column operations), and the determinant is the product of the factors associated with these operations.

Corollary 2.12 If Aᵀ denotes the transpose of A, then det(Aᵀ) = det(A).

For, if D denotes the determinant computed by row operations, then det(Aᵀ) = D(A) = det(A), since row operations on A correspond to column operations on Aᵀ.

    2.5 Calculating determinants

    We now give a couple of formulae for the determinant. This finishes the job we

    left open in the proof of the last theorem, namely, showing that a determinant

    function actually exists!

    The first formula involves some background notation.

Definition 2.8 A permutation of {1, . . . , n} is a bijection from the set {1, . . . , n} to itself. The symmetric group Sn consists of all permutations of the set {1, . . . , n}. (There are n! such permutations.) For any permutation π ∈ Sn, there is a number sign(π) = ±1, computed as follows: write π as a product of disjoint cycles; if there are k cycles (including cycles of length 1), then sign(π) = (−1)^(n−k). A transposition is a permutation which interchanges two symbols and leaves all the others fixed. Thus, if τ is a transposition, then sign(τ) = −1.


The last fact holds because a transposition has one cycle of size 2 and n − 2 cycles of size 1, so n − 1 cycles altogether; so sign(τ) = (−1)^(n−(n−1)) = −1.

We need one more fact about signs: if π is any permutation and τ is a transposition, then sign(τπ) = −sign(π), where τπ denotes the composition of π and τ (apply first π, then τ).

Definition 2.9 Let A be an n × n matrix over K. The determinant of A is defined by the formula

    det(A) = Σ_{π∈Sn} sign(π) A1π(1) A2π(2) · · · Anπ(n).

Proof In order to show that this is a good definition, we need to verify that it satisfies our three rules (D1)–(D3).

(D1) According to the definition, det(A) is a sum of n! terms. Each term, apart from a sign, is the product of n elements, one from each row and column. If we look at a particular column, say the ith, it is clear that each product is a linear function of that column; so the same is true for the determinant.

(D2) Suppose that the ith and jth columns of A are equal. Let τ be the transposition which interchanges i and j and leaves the other symbols fixed, and pair each permutation π with τπ. For every k, the factors Akπ(k) and Akτπ(k) are equal: they are identical if π(k) is neither i nor j, and otherwise they come from the ith and jth columns of A, whose elements are the same. So the products A1π(1) A2π(2) · · · Anπ(n) and A1τπ(1) A2τπ(2) · · · Anτπ(n) are equal. But sign(τπ) = −sign(π). So the corresponding terms in the formula for the determinant cancel one another. The elements of Sn can be divided up into n!/2 pairs of the form {π, τπ}. As we have seen, the terms in each pair cancel out. We conclude that det(A) = 0. Thus (D2) holds.

(D3) If A = In, then the only permutation which contributes to the sum is the identity permutation: any other permutation π satisfies π(i) ≠ i for some i, so that Aiπ(i) = 0. The sign of the identity is +1, and all the terms Aiπ(i) = Aii are equal to 1; so det(A) = 1, as required.

This gives us a nice mathematical formula for the determinant of a matrix. Unfortunately, it is a terrible formula in practice, since it involves working out n! terms, each a product of matrix entries, and adding them up with + and − signs. For n of moderate size, this will take a very long time! (For example, 10! = 3628800.)
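Still, the permutation formula is easy to write down in code, which makes the n! growth visible. A sketch of my own (not part of the notes), with the sign computed by counting cycles as in Definition 2.8:

```python
from itertools import permutations

def sign(perm):
    # Sign of a permutation of {0, ..., n-1}: (-1)^(n - k), with k the number of cycles.
    n, seen, cycles = len(perm), set(), 0
    for start in range(n):
        if start in seen:
            continue
        cycles += 1
        j = start
        while j not in seen:
            seen.add(j)
            j = perm[j]
    return (-1) ** (n - cycles)

def det_by_permutations(A):
    # det(A) = sum over pi in S_n of sign(pi) * A[0][pi(0)] * ... * A[n-1][pi(n-1)]
    n, total = len(A), 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

print(det_by_permutations([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3
```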

Here is a second formula, which is also theoretically important but very inefficient in practice.


Definition 2.10 Let A be an n × n matrix. For 1 ≤ i, j ≤ n, we define the (i, j) minor of A to be the (n−1) × (n−1) matrix obtained by deleting the ith row and jth column of A. Now we define the (i, j) cofactor of A to be (−1)^(i+j) times the determinant of the (i, j) minor. (These signs have a chessboard pattern, starting with sign + in the top left corner.) We denote the (i, j) cofactor of A by Kij(A). Finally, the adjugate of A is the n × n matrix Adj(A) whose (i, j) entry is the (j, i) cofactor Kji(A) of A. (Note the transposition!)

Theorem 2.13 (a) For 1 ≤ j ≤ n, we have

    det(A) = Σ_{i=1}^{n} Aij Kij(A).

(b) For 1 ≤ i ≤ n, we have

    det(A) = Σ_{j=1}^{n} Aij Kij(A).

    This theorem says that, if we take any column or row of A, multiply each

    element by the corresponding cofactor, and add the results, we get the determinant

of A.

Example 2.4 Using a cofactor expansion along the first column, we see that

    | 1 2 3  |
    | 4 5 6  |  =  | 5 6  | − 4 | 2 3  | + 7 | 2 3 |
    | 7 8 10 |     | 8 10 |     | 8 10 |     | 5 6 |

                =  (5 · 10 − 6 · 8) − 4(2 · 10 − 3 · 8) + 7(2 · 6 − 3 · 5)
                =  2 + 16 − 21
                =  −3,

using the standard formula for a 2 × 2 determinant.
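Theorem 2.13 can be turned directly into a recursive procedure: form the minors, attach the chessboard signs, and recurse. A short sketch of my own (expanding along the first row, so it illustrates part (b)):

```python
def minor(A, i, j):
    # The (i, j) minor: delete row i and column j.
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det_by_cofactors(A):
    # Cofactor expansion along the first row:
    # det(A) = sum_j (-1)^j * A[0][j] * det(minor(A, 0, j)).
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det_by_cofactors(minor(A, 0, j))
               for j in range(n))

print(det_by_cofactors([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3, as in Example 2.4
```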

Proof We prove (a); the proof for (b) is a simple modification, using rows instead of columns. Let D(A) be the function defined by the right-hand side of (a) in the theorem, using the jth column of A. We verify rules (D1)–(D3).

(D1) It is clear that D(A) is a linear function of the jth column. For k ≠ j, the cofactors are linear functions of the kth column (since they are determinants), and so D(A) is linear.


(D2) If the kth and lth columns of A are equal, then each cofactor is the determinant of a matrix with two equal columns, and so is zero. The harder case is when the jth column is equal to another, say the kth. Using induction, each cofactor can be expressed as a sum of elements of the kth column times (n−2) × (n−2) determinants. In the resulting sum, it is easy to see that each such determinant occurs twice with opposite signs and multiplied by the same factor. So the terms all cancel.

(D3) Suppose that A = I. The only non-zero cofactor in the jth column is Kjj(I), which is equal to (−1)^(j+j) det(In−1) = 1. So D(I) = 1.

    By the main theorem, the expression D(A) is equal to det(A).

At first sight, this looks like a simple formula for the determinant, since it is just the sum of n terms, rather than n! as in the first case. But each term is an (n−1) × (n−1) determinant. Working down the chain we find that this method is just as labour-intensive as the other one.

    But the cofactor expansion has further nice properties:

Theorem 2.14 For any n × n matrix A, we have

    A · Adj(A) = Adj(A) · A = det(A) I.

Proof We calculate the matrix product. Recall that the (i, j) entry of Adj(A) is Kji(A).

Now the (i, i) entry of the product A · Adj(A) is

    Σ_{k=1}^{n} Aik (Adj(A))ki = Σ_{k=1}^{n} Aik Kik(A) = det(A),

by the cofactor expansion. On the other hand, if i ≠ j, then the (i, j) entry of the product is

    Σ_{k=1}^{n} Aik (Adj(A))kj = Σ_{k=1}^{n} Aik Kjk(A).

This last expression is the cofactor expansion of the matrix A′ which is the same as A except for the jth row, which has been replaced by the ith row of A. (Note that changing the jth row of a matrix has no effect on the cofactors of elements in this row.) So the sum is det(A′). But A′ has two equal rows, so its determinant is zero.

Thus A · Adj(A) has entries det(A) on the diagonal and 0 everywhere else; so it is equal to det(A) I.

The proof for the product the other way around is the same, using columns instead of rows.


Corollary 2.15 If the n × n matrix A is invertible, then its inverse is equal to

    (det(A))⁻¹ Adj(A).

So how can you work out a determinant efficiently? The best method in practice is to use elementary operations.

Apply elementary operations to the matrix, keeping track of the factor by which the determinant is multiplied by each operation. If you want, you can reduce all the way to the identity, and then use the fact that det(I) = 1. Often it is simpler to stop at an earlier stage when you can recognise what the determinant is. For example, if the matrix A has diagonal entries a1, . . . , an, and all off-diagonal entries are zero, then det(A) is just the product a1 · · · an.

    Example 2.5 Let

        A = [ 1 2  3 ]
            [ 4 5  6 ]
            [ 7 8 10 ].

    Subtracting twice the first column from the second, and three times the first column from the third (these operations don't change the determinant) gives

        [ 1  0   0 ]
        [ 4 −3  −6 ]
        [ 7 −6 −11 ].

    Now the cofactor expansion along the first row gives

        det(A) = | −3  −6 |
                 | −6 −11 |  =  33 − 36  =  −3.

    (At the last step, it is easiest to use the formula for the determinant of a 2 × 2 matrix rather than do any further reduction.)
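    The same strategy is easy to carry out mechanically; here is a rough sketch in Python (our own illustration, using row rather than column operations, and assuming numpy is available):

        import numpy as np

        def det_by_elimination(A):
            """Determinant via elementary row operations, tracking the factor involved."""
            A = np.array(A, dtype=float)
            n = A.shape[0]
            factor = 1.0
            for j in range(n):
                # Find a non-zero pivot; a row swap multiplies the determinant by -1.
                pivot = next((i for i in range(j, n) if A[i, j] != 0), None)
                if pivot is None:
                    return 0.0
                if pivot != j:
                    A[[j, pivot]] = A[[pivot, j]]
                    factor *= -1
                # Adding a multiple of one row to another leaves the determinant unchanged.
                for i in range(j + 1, n):
                    A[i] -= (A[i, j] / A[j, j]) * A[j]
            # The determinant of a triangular matrix is the product of its diagonal entries.
            return factor * np.prod(np.diag(A))

        print(det_by_elimination([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3.0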

    2.6 The Cayley–Hamilton Theorem

    Since we can add and multiply matrices, we can substitute them into a polynomial.

    For example, if

        A = [  0 1 ]
            [ −2 3 ],

    then the result of substituting A into the polynomial x^2 − 3x + 2 is

        A^2 − 3A + 2I = [ −2 3 ]  +  [ 0 −3 ]  +  [ 2 0 ]  =  [ 0 0 ]
                        [ −6 7 ]     [ 6 −9 ]     [ 0 2 ]     [ 0 0 ].


    We say that the matrix A satisfies the equation x^2 − 3x + 2 = 0. (Notice that for the constant term 2 we substituted 2I.)

    It turns out that, for every n × n matrix A, we can calculate a polynomial equation of degree n satisfied by A.

    Definition 2.11 Let A be an n × n matrix. The characteristic polynomial of A is the polynomial

        c_A(x) = det(xI − A).

    This is a polynomial in x of degree n.

    For example, if

        A = [  0 1 ]
            [ −2 3 ],

    then

        c_A(x) = | x    −1    |
                 | 2   x − 3  |  =  x(x − 3) + 2  =  x^2 − 3x + 2.

    Indeed, it turns out that this is the polynomial we want in general:

    Theorem 2.16 (Cayley–Hamilton Theorem) Let A be an n × n matrix with characteristic polynomial c_A(x). Then c_A(A) = O.

    Example 2.6 Let us just check the theorem for 2 × 2 matrices. If

        A = [ a b ]
            [ c d ],

    then

        c_A(x) = | x − a    −b    |
                 |  −c     x − d  |  =  x^2 − (a + d)x + (ad − bc),

    and so

        c_A(A) = [ a^2 + bc   ab + bd  ]  −  (a + d) [ a b ]  +  (ad − bc) [ 1 0 ]  =  O,
                 [ ac + cd    bc + d^2 ]             [ c d ]              [ 0 1 ]

    after a small amount of calculation.

    Proof We use the theorem

        A · Adj(A) = det(A) I.

    In place of A, we put the matrix xI − A into this formula:

        (xI − A) · Adj(xI − A) = det(xI − A) I = c_A(x) I.


    Now it is very tempting just to substitute x = A into this formula: on the right we have c_A(A) I = c_A(A), while on the left there is a factor AI − A = O. Unfortunately this is not valid; it is important to see why. The matrix Adj(xI − A) is an n × n matrix whose entries are determinants of (n − 1) × (n − 1) matrices with entries involving x. So the entries of Adj(xI − A) are polynomials in x, and if we try to substitute A for x the size of the matrix will be changed!

    Instead, we argue as follows. As we have said, Adj(xI − A) is a matrix whose entries are polynomials, so we can write it as a sum of powers of x times matrices, that is, as a polynomial whose coefficients are matrices. For example,

        [ x^2 + 1   2x    ]  =  x^2 [ 1 0 ]  +  x [ 0 2 ]  +  [  1 0 ]
        [ 3x − 4    x + 2 ]         [ 0 0 ]      [ 3 1 ]      [ −4 2 ].

    The entries in Adj(xI − A) are (n − 1) × (n − 1) determinants, so the highest power of x that can arise is x^{n−1}. So we can write

        Adj(xI − A) = x^{n−1} B_{n−1} + x^{n−2} B_{n−2} + · · · + x B_1 + B_0,

    for suitable n × n matrices B_0, . . . , B_{n−1}. Hence

        c_A(x) I = (xI − A) · Adj(xI − A)
                 = (xI − A)(x^{n−1} B_{n−1} + x^{n−2} B_{n−2} + · · · + x B_1 + B_0)
                 = x^n B_{n−1} + x^{n−1}(−A B_{n−1} + B_{n−2}) + · · · + x(−A B_1 + B_0) − A B_0.

    So, if we let

        c_A(x) = x^n + c_{n−1} x^{n−1} + · · · + c_1 x + c_0,

    then we read off that

        B_{n−1} = I,
        −A B_{n−1} + B_{n−2} = c_{n−1} I,
        . . .
        −A B_1 + B_0 = c_1 I,
        −A B_0 = c_0 I.

    We take this system of equations, multiply the first by A^n, the second by A^{n−1}, . . . , and the last by A^0 = I, and add the results. What happens? On the left, all the terms cancel in pairs: we have

        A^n B_{n−1} + A^{n−1}(−A B_{n−1} + B_{n−2}) + · · · + A(−A B_1 + B_0) + I(−A B_0) = O.

    On the right, we have

        A^n + c_{n−1} A^{n−1} + · · · + c_1 A + c_0 I = c_A(A).

    So c_A(A) = O, as claimed.
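    The theorem is easy to check numerically for any particular matrix; here is a short sketch of our own in Python (assuming numpy is available; np.poly returns the coefficients of the characteristic polynomial of a square matrix):

        import numpy as np

        A = np.array([[0., 1], [-2, 3]])     # the 2 x 2 example above
        coeffs = np.poly(A)                  # [1, -3, 2], i.e. x^2 - 3x + 2

        # Evaluate c_A(A) by Horner's rule, using matrix products throughout
        # (the constant term is multiplied by the identity matrix, as in the text).
        result = np.zeros_like(A)
        for c in coeffs:
            result = result @ A + c * np.eye(2)

        print(np.allclose(result, 0))        # True: c_A(A) = O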


    Chapter 3

    Linear maps between vector spaces

    We return to the setting of vector spaces in order to define linear maps between

    them. We will see that these maps can be represented by matrices, decide when

    two matrices represent the same linear map, and give another proof of the canonical form for equivalence.

    3.1 Definition and basic properties

    Definition 3.1 Let V and W be vector spaces over a field K. A function α from V to W is a linear map if it preserves addition and scalar multiplication, that is, if

        α(v_1 + v_2) = α(v_1) + α(v_2) for all v_1, v_2 ∈ V;
        α(cv) = cα(v) for all v ∈ V and c ∈ K.

    Remarks 1. We can combine the two conditions into one as follows:

        α(c_1 v_1 + c_2 v_2) = c_1 α(v_1) + c_2 α(v_2).

    2. In other literature the term "linear transformation" is often used instead of "linear map".

    Definition 3.2 Let α : V → W be a linear map. The image of α is the set

        Im(α) = {w ∈ W : w = α(v) for some v ∈ V},

    and the kernel of α is

        Ker(α) = {v ∈ V : α(v) = 0}.


    Proposition 3.1 Let α : V → W be a linear map. Then the image of α is a subspace of W and the kernel of α is a subspace of V.

    Proof We have to show that each is closed under addition and scalar multiplication. For the image, if w_1 = α(v_1) and w_2 = α(v_2), then

        w_1 + w_2 = α(v_1) + α(v_2) = α(v_1 + v_2),

    and if w = α(v) then cw = cα(v) = α(cv).

    For the kernel, if α(v_1) = α(v_2) = 0 then

        α(v_1 + v_2) = α(v_1) + α(v_2) = 0 + 0 = 0,

    and if α(v) = 0 then α(cv) = cα(v) = c0 = 0.

    Definition 3.3 We define the rank of α to be ρ(α) = dim(Im(α)), and the nullity of α to be ν(α) = dim(Ker(α)). (We use the Greek letters rho and nu here to avoid confusing the rank of a linear map with the rank of a matrix, though they will turn out to be closely related!)

    Theorem 3.2 (Rank–Nullity Theorem) Let α : V → W be a linear map. Then

        ρ(α) + ν(α) = dim(V).

    Proof Choose a basis u_1, u_2, . . . , u_q for Ker(α), where q = dim(Ker(α)) = ν(α). The vectors u_1, . . . , u_q are linearly independent vectors of V, so we can add further vectors to get a basis for V, say u_1, . . . , u_q, v_1, . . . , v_s, where q + s = dim(V).

    We claim that the vectors α(v_1), . . . , α(v_s) form a basis for Im(α). We have to show that they are linearly independent and spanning.

    Linearly independent: Suppose that c_1 α(v_1) + · · · + c_s α(v_s) = 0. Then α(c_1 v_1 + · · · + c_s v_s) = 0, so that c_1 v_1 + · · · + c_s v_s ∈ Ker(α). But then this vector can be expressed in terms of the basis for Ker(α):

        c_1 v_1 + · · · + c_s v_s = a_1 u_1 + · · · + a_q u_q,

    whence

        −a_1 u_1 − · · · − a_q u_q + c_1 v_1 + · · · + c_s v_s = 0.

    But the u's and v's form a basis for V, so they are linearly independent. So this equation implies that all the a's and c's are zero. The fact that c_1 = · · · = c_s = 0 shows that the vectors α(v_1), . . . , α(v_s) are linearly independent.


    Spanning: Take any vector in Im(α), say w. Then w = α(v) for some v ∈ V. Write v in terms of the basis for V:

        v = a_1 u_1 + · · · + a_q u_q + c_1 v_1 + · · · + c_s v_s

    for some a_1, . . . , a_q, c_1, . . . , c_s. Applying α, we get

        w = α(v)
          = a_1 α(u_1) + · · · + a_q α(u_q) + c_1 α(v_1) + · · · + c_s α(v_s)
          = c_1 w_1 + · · · + c_s w_s,

    since α(u_i) = 0 (as u_i ∈ Ker(α)) and α(v_i) = w_i. So the vectors w_1, . . . , w_s span Im(α).

    Thus ρ(α) = dim(Im(α)) = s. Since ν(α) = q and q + s = dim(V), the theorem is proved.
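    The theorem is easy to illustrate in coordinates (a sketch of our own; scipy.linalg.null_space returns an orthonormal basis of the kernel):

        import numpy as np
        from scipy.linalg import null_space

        A = np.array([[1., 2, 3], [4, 5, 6], [5, 7, 9]])   # third row = sum of the first two

        rank = np.linalg.matrix_rank(A)       # rho: dimension of the image
        nullity = null_space(A).shape[1]      # nu: dimension of the kernel

        print(rank, nullity, rank + nullity == A.shape[1])   # 2 1 True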

    3.2 Representation by matrices

    We come now to the second role of matrices in linear algebra: they represent

    linear maps between vector spaces.

    Let α : V → W be a linear map, where dim(V) = m and dim(W) = n. As we saw in the first section, we can take V and W in their coordinate representation: V = K^m and W = K^n (the elements of these vector spaces being represented as column vectors). Let e_1, . . . , e_m be the standard basis for V (so that e_i is the vector with ith coordinate 1 and all other coordinates zero), and f_1, . . . , f_n the standard basis for W. Then for i = 1, . . . , m, the vector α(e_i) belongs to W, so we can write it as a linear combination of f_1, . . . , f_n.

    Definition 3.4 The matrix representing the linear map α : V → W relative to the bases B = (e_1, . . . , e_m) for V and C = (f_1, . . . , f_n) for W is the n × m matrix whose (i, j) entry is a_{ij}, where

        α(e_i) = ∑_{j=1}^{n} a_{ji} f_j    for i = 1, . . . , m.

    In practice this means the following. Take α(e_i) and write it as a column vector [ a_{1i} a_{2i} · · · a_{ni} ]^T. This vector is the ith column of the matrix representing α. So, for example, if m = 3, n = 2, and

        α(e_1) = f_1 + f_2,   α(e_2) = 2f_1 + 5f_2,   α(e_3) = 3f_1 − f_2,


    then the vectors α(e_i) as column vectors are

        α(e_1) = [ 1 ]     α(e_2) = [ 2 ]     α(e_3) = [  3 ]
                 [ 1 ],             [ 5 ],             [ −1 ],

    and so the matrix representing α is

        [ 1 2  3 ]
        [ 1 5 −1 ].

    Now the most important thing about this representation is that the action of α is now easily described:

    Proposition 3.3 Let α : V → W be a linear map. Choose bases for V and W and let A be the matrix representing α. Then, if we represent vectors of V and W as column vectors relative to these bases, we have

        α(v) = Av.

    Proof Let e_1, . . . , e_m be the basis for V, and f_1, . . . , f_n for W. Take v = ∑_{i=1}^{m} c_i e_i ∈ V, so that in coordinates

        v = [ c_1 · · · c_m ]^T.

    Then

        α(v) = ∑_{i=1}^{m} c_i α(e_i) = ∑_{i=1}^{m} ∑_{j=1}^{n} c_i a_{ji} f_j,

    so the jth coordinate of α(v) is ∑_{i=1}^{m} a_{ji} c_i, which is precisely the jth coordinate in the matrix product Av.

    In our example, if v = 2e_1 + 3e_2 + 4e_3 = [ 2 3 4 ]^T, then

        α(v) = Av = [ 1 2  3 ] [ 2 ]    [ 20 ]
                    [ 1 5 −1 ] [ 3 ]  = [ 13 ].
                               [ 4 ]
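    The same calculation can be done by machine; here is a small sketch in Python (our own illustration, assuming numpy is available): the columns of the representing matrix are the images of the standard basis vectors, and applying the map is the same as multiplying by that matrix.

        import numpy as np

        # The linear map of the example, written directly in coordinates.
        def alpha(v):
            x, y, z = v
            return np.array([x + 2*y + 3*z, x + 5*y - z])

        # Column i of the representing matrix is alpha applied to the ith standard basis vector.
        A = np.column_stack([alpha(e) for e in np.eye(3)])
        print(A)                        # [[ 1.  2.  3.]
                                        #  [ 1.  5. -1.]]
        v = np.array([2., 3, 4])
        print(alpha(v), A @ v)          # both give [20. 13.]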

    Addition and multiplication of linear maps correspond to addition and multiplication of the matrices representing them.

    Definition 3.5 Let α and β be linear maps from V to W. Define their sum α + β by the rule

        (α + β)(v) = α(v) + β(v)

    for all v ∈ V. It is easy to check that α + β is a linear map.

  • 8/8/2019 Notes on Linear Algebra-Peter J Cameron

    42/124

    3.3. CHANGE OF BASIS 37

    Proposition 3.4 If α and β are linear maps represented by matrices A and B respectively, then α + β is represented by the matrix A + B.

    The proof of this is not difficult: just use the definitions.

    Definition 3.6 Let U, V, W be vector spaces over K, and let α : U → V and β : V → W be linear maps. The product βα is the function U → W defined by the rule

        (βα)(u) = β(α(u))

    for all u ∈ U. Again it is easily checked that βα is a linear map. Note that the order is important: we take a vector u ∈ U, apply α to it to get a vector in V, and then apply β to get a vector in W. So βα means "apply α, then β".

    Proposition 3.5 If α : U → V and β : V → W are linear maps represented by matrices A and B respectively, then βα is represented by the matrix BA.

    Again the proof is tedious but not difficult. Of course it follows that a linear

    map is invertible (as a map; that is, there is an inverse map) if and only if it is

    represented by an invertible matrix.

    Remark Let l = dim(U), m = dim(V) and n = dim(W); then A is m × l, and B is n × m; so the product BA is defined, and is n × l, which is the right size for a matrix representing a map from an l-dimensional to an n-dimensional space.

    The significance of all this is that the strange rule for multiplying matrices is

    chosen so as to make Proposition 3.5 hold. The definition of multiplication of

    linear maps is the natural one (composition), and we could then say: what definition of matrix multiplication should we choose to make the Proposition valid? We

    would find that the usual definition was forced upon us.
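    A tiny numerical check of Proposition 3.5 (our own sketch, with arbitrarily chosen matrices): composing "apply α, then β" corresponds to the product BA, in that order.

        import numpy as np

        A = np.array([[1., 2], [0, 1], [3, 1]])   # represents alpha : K^2 -> K^3  (3 x 2)
        B = np.array([[1., 0, 2], [2, 1, 0]])     # represents beta  : K^3 -> K^2  (2 x 3)

        u = np.array([1., 4])
        print(B @ (A @ u))          # apply alpha, then beta: [23. 22.]
        print((B @ A) @ u)          # the single map represented by BA gives the same: [23. 22.]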

    3.3 Change of basis

    The matrix representing a linear map depends on the choice of bases we used to

    represent it. Now we have to discuss what happens if we change the basis.

    Remember the notion of transition matrix from Chapter 1. If B = (v_1, . . . , v_m) and B' = (v'_1, . . . , v'_m) are two bases for a vector space V, the transition matrix P_{B,B'} is the matrix whose jth column is the coordinate representation of v'_j in the basis B. Then we have

        [v]_B = P_{B,B'} [v]_{B'},


    where [v]_B is the coordinate representation of an arbitrary vector v in the basis B, and similarly for B'. The inverse of P_{B,B'} is P_{B',B}. Let p_{ij} be the (i, j) entry of P = P_{B,B'}.

    Now let C = (w_1, . . . , w_n) and C' = (w'_1, . . . , w'_n) be two bases for a space W, with transition matrix Q_{C,C'} and inverse Q_{C',C}. Let Q = Q_{C,C'} and let R = Q_{C',C} be its inverse, with (i, j) entry r_{ij}.

    Let α be a linear map from V to W. Then α is represented by a matrix A using the bases B and C, and by a matrix A' using the bases B' and C'. What is the relation between A and A'?

    We just do it and see. To get A', we have to represent the vectors α(v'_j) in the basis C'. We have

        v'_j = ∑_{i=1}^{m} p_{ij} v_i,

    so

        α(v'_j) = ∑_{i=1}^{m} p_{ij} α(v_i)
                = ∑_{i=1}^{m} ∑_{k=1}^{n} p_{ij} A_{ki} w_k
                = ∑_{i=1}^{m} ∑_{k=1}^{n} ∑_{l=1}^{n} p_{ij} A_{ki} r_{lk} w'_l.

    This means, on turning things around, that

        (A')_{lj} = ∑_{k=1}^{n} ∑_{i=1}^{m} r_{lk} A_{ki} p_{ij},

    so, according to the rules of matrix multiplication,

        A' = RAP = Q^{−1}AP.

    Proposition 3.6 Let α : V → W be a linear map represented by the matrix A relative to the bases B for V and C for W, and by the matrix A' relative to the bases B' for V and C' for W. If P = P_{B,B'} and Q = Q_{C,C'} are the transition matrices from the unprimed to the primed bases, then

        A' = Q^{−1}AP.

    This is rather technical; you need it for explicit calculations, but for theoretical

    purposes the importance is the following corollary. Recall that two matrices A and

    B are equivalent if B is obtained from A by multiplying on the left and right by

    invertible matrices. (It makes no difference that we said B = PAQ before and B = Q^{−1}AP here, of course.)
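    A small numerical sketch of Proposition 3.6 (our own illustration in numpy, with arbitrarily chosen map and bases): the columns of the transition matrices are the primed basis vectors written in the standard bases, and we compare the two ways of computing the C'-coordinates of α(v).

        import numpy as np

        A = np.array([[1., 0, 2], [2, 1, 1]])             # alpha : K^3 -> K^2 in the standard bases

        # Transition matrices: columns are the primed basis vectors in the standard bases.
        P = np.array([[1., 1, 0], [0, 1, 1], [1, 0, 1]])  # basis B' of K^3
        Q = np.array([[2., 1], [1, 1]])                   # basis C' of K^2

        A_prime = np.linalg.inv(Q) @ A @ P

        c = np.array([1., -2, 3])                         # B'-coordinates of a vector v
        v = P @ c                                         # the same v in standard coordinates
        print(np.allclose(A_prime @ c, np.linalg.inv(Q) @ (A @ v)))   # True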


    Proposition 3.7 Two matrices represent the same linear map with respect to different bases if and only if they are equivalent.

    This holds because:

    – transition matrices are always invertible (the inverse of P_{B,B'} is the matrix P_{B',B} for the transition in the other direction); and

    – any invertible matrix can be regarded as a transition matrix: for, if the n × n matrix P is invertible, then its rank is n, so its columns are linearly independent, and form a basis B' for K^n; and then P = P_{B,B'}, where B is the standard basis.

    3.4 Canonical form revisited

    Now we can give a simpler proof of Theorem 2.3 about canonical form for equivalence. First, we make the following observation.

    Theorem 3.8 Let α : V → W be a linear map of rank r = ρ(α). Then there are bases for V and W such that the matrix representing α is, in block form,

        [ I_r  O ]
        [ O    O ].

    Proof As in the proof of Theorem 3.2, choose a basis u_1, . . . , u_s for Ker(α), and extend to a basis u_1, . . . , u_s, v_1, . . . , v_r for V. Then α(v_1), . . . , α(v_r) is a basis for Im(α), and so can be extended to a basis α(v_1), . . . , α(v_r), x_1, . . . , x_t for W. Now we will use the bases

        v_1, . . . , v_r, v_{r+1} = u_1, . . . , v_{r+s} = u_s for V,
        w_1 = α(v_1), . . . , w_r = α(v_r), w_{r+1} = x_1, . . . , w_{r+t} = x_t for W.

    We have

        α(v_i) = { w_i  if 1 ≤ i ≤ r,
                 { 0    otherwise;

    so the matrix of α relative to these bases is

        [ I_r  O ]
        [ O    O ]

    as claimed.


    We recognise the matrix in the theorem as the canonical form for equivalence.

    Combining Theorem 3.8 with Proposition 3.7, we see:

    Theorem 3.9 A matrix of rank r is equivalent to the matrix

        [ I_r  O ]
        [ O    O ].

    We also see, by the way, that the rank of a linear map (that is, the dimension

    of its image) is equal to the rank of any matrix which represents it. So all our

    definitions of rank agree!

    The conclusion is that

    two matrices are equivalent if and only if they have the same rank.

    So how many equivalence classes of m × n matrices are there, for given m and n? The rank of such a matrix can take any value from 0 up to the minimum of m and n; so the number of equivalence classes is min{m, n} + 1.
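    The bases constructed in the proof of Theorem 3.8 can be produced numerically; here is a rough sketch of our own (assuming numpy and scipy are available): the columns of P are a basis of the row space of A followed by a basis of its kernel, the columns of Q are the images of the former under A followed by a basis of the orthogonal complement of the column space, and then Q^{−1}AP is the canonical form.

        import numpy as np
        from scipy.linalg import null_space, orth

        A = np.array([[1., 2, 3], [2, 4, 6]])      # 2 x 3, rank 1

        V = orth(A.T)                              # r columns, complementing Ker(A) in K^m
        N = null_space(A)                          # s columns, a basis of Ker(A)
        P = np.hstack([V, N])                      # an invertible m x m matrix

        W = A @ V                                  # a basis of Im(A)
        X = null_space(A.T)                        # completes it to a basis of K^n
        Q = np.hstack([W, X])                      # an invertible n x n matrix

        # The canonical form [[1, 0, 0], [0, 0, 0]], up to rounding error.
        print(np.round(np.linalg.inv(Q) @ A @ P, 10))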


    Chapter 4

    Linear maps on a vector space

    In this chapter we consider a linear map α from a vector space V to itself. If dim(V) = n then, as in the last chapter, we can represent α by an n × n matrix relative to any basis for V. However, this time we have less freedom: instead of

    having two bases to choose, there is only one. This makes the theory much more

    interesting!

    4.1 Projections and direct sums

    We begin by looking at a particular type of linear map whose importance will be clear later on.

    Definition 4.1 The linear map π : V → V is a projection if π^2 = π (where, as usual, π^2 is defined by π^2(v) = π(π(v))).

    Proposition 4.1 If π : V → V is a projection, then V = Im(π) ⊕ Ker(π).

    Proof We have two things to do:

    Im(π) + Ker(π) = V: Take any vector v ∈ V, and let w = π(v) ∈ Im(π). We claim that v − w ∈ Ker(π). This holds because

        π(v − w) = π(v) − π(w) = π(v) − π(π(v)) = π(v) − π^2(v) = 0,

    since π^2 = π. Now v = w + (v − w) is the sum of a vector in Im(π) and one in Ker(π).

    Im(π) ∩ Ker(π) = {0}: Take v ∈ Im(π) ∩ Ker(π). Then v = π(w) for some vector w; and

        0 = π(v) = π(π(w)) = π^2(w) = π(w) = v,

    as required (the first equality holding because v ∈ Ker(π)).


    It goes the other way too: if V = U ⊕ W, then there is a projection π : V → V with Im(π) = U and Ker(π) = W. For, every vector v ∈ V can be uniquely written as v = u + w, where u ∈ U and w ∈ W; we define π by the rule that π(v) = u. Now the assertions are clear.

    The diagram in Figure 4.1 shows geometrically what a projection is. It moves any vector v in a direction parallel to Ker(π) to a vector lying in Im(π).

    [Figure 4.1: A projection. The picture shows the subspaces Im(π) and Ker(π), with a vector v carried parallel to Ker(π) onto its image π(v) in Im(π).]

    We can extend this to direct sums with more than two terms. First, notice that if π is a projection and π' = I − π (where I is the identity map, satisfying I(v) = v for all vectors v), then π' is also a projection, since

        (π')^2 = (I − π)^2 = I − 2π + π^2 = I − 2π + π = I − π = π';

    and π + π' = I; also ππ' = π(I − π) = π − π^2 = O. Finally, we see that Ker(π) = Im(π'); so V = Im(π) ⊕ Im(π'). In this form the result extends:

    Proposition 4.2 Suppose that π_1, π_2, . . . , π_r are projections on V satisfying

    (a) π_1 + π_2 + · · · + π_r = I, where I is the identity transformation;

    (b) π_i π_j = O for i ≠ j.

    Then V = U_1 ⊕ U_2 ⊕ · · · ⊕ U_r, where U_i = Im(π_i).

    Proof We have to show that any vector v can be uniquely written in the form v = u_1 + u_2 + · · · + u_r, where u_i ∈ U_i for i = 1, . . . , r. We have

        v = I(v) = π_1(v) + π_2(v) + · · · + π_r(v) = u_1 + u_2 + · · · + u_r,


    where u_i = π_i(v) ∈ Im(π_i) for i = 1, . . . , r. So any vector can be written in this form. Now suppose that we have any expression

        v = u'_1 + u'_2 + · · · + u'_r,

    with u'_i ∈ U_i for i = 1, . . . , r. Since u'_i ∈ U_i = Im(π_i), we have u'_i = π_i(v_i) for some v_i; then

        π_i(u'_i) = π_i^2(v_i) = π_i(v_i) = u'_i.

    On the other hand, for j ≠ i, we have

        π_i(u'_j) = π_i π_j(v_j) = 0,

    since π_i π_j = O. So applying π_i to the expression for v, we obtain

        π_i(v) = π_i(u'_1) + π_i(u'_2) + · · · + π_i(u'_r) = π_i(u'_i) = u'_i,

    since all terms in the sum except the ith are zero. So the only possible expression is given by u_i = π_i(v), and the proof is complete.

    Conversely, if V = U_1 ⊕ U_2 ⊕ · · · ⊕ U_r, then we can find projections π_1, π_2, . . . , π_r satisfying the conditions of the above Proposition. For, any vector v ∈ V has a unique expression as

        v = u_1 + u_2 + · · · + u_r

    with u_i ∈ U_i for i = 1, . . . , r; then we define π_i(v) = u_i.

    The point of this is that projections give us another way to recognise and describe direct sums.
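    A concrete example in coordinates (our own sketch, assuming numpy is available): the matrix below is a projection of R^2 onto the x-axis along the line y = x, and every vector splits accordingly.

        import numpy as np

        Pi = np.array([[1., -1], [0, 0]])      # projection onto the x-axis along y = x

        print(np.allclose(Pi @ Pi, Pi))        # True: Pi^2 = Pi

        v = np.array([3., 5])
        w = Pi @ v                             # the component of v in Im(Pi)
        print(w, v - w)                        # [-2.  0.] and [5. 5.]
        print(np.allclose(Pi @ (v - w), 0))    # True: v - Pi(v) lies in Ker(Pi)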

    4.2 Linear maps and matrices

    Let α : V → V be a linear map. If we choose a basis v_1, . . . , v_n for V, then V can be written in coordinates as K^n, and α is represented by a matrix A, say, where

        α(v_i) = ∑_{j=1}^{n} a_{ji} v_j.

    Then just as in the last section, the action of α on V is represented by the action of A on K^n: α(v) is represented by the product Av. Also, as in the last chapter, sums and products (and hence arbitrary polynomials) of linear maps are represented by sums and products of the representing matrices: that is, for any polynomial f(x), the map f(α) is represented by the matrix f(A).

    What happens if we change the basis? This also follows from the formula we

    worked out in the last chapter. However, there is only one basis to change.


    Proposition 4.3 Let α be a linear map on V which is represented by the matrix A relative to a basis B, and by the matrix A' relative to a basis B'. Let P = P_{B,B'} be the transition matrix between the two bases. Then

        A' = P^{−1}AP.

    Proof This is just Proposition 3.6, since P and Q are the same here.

    Definition 4.2 Two n × n matrices A and B are said to be similar if B = P^{−1}AP for some invertible matrix P.

    Thus similarity is an equivalence relation, and

    two matrices are similar if and only if they represent the same linear

    map with respect to different bases.

    There is no simple canonical form for similarity like the one for equivalence

    that we met earlier. For the rest of this section we look at a special class of matrices or linear maps, the diagonalisable ones, where we do have a nice simple

    representative of the similarity class. In the final section we give without proof a

    general result for the complex numbers.

    4.3 Eigenvalues and eigenvectors

    Definition 4.3 Let α be a linear map on V. A vector v ∈ V is said to be an eigenvector of α, with eigenvalue λ ∈ K, if v ≠ 0 and α(v) = λv. The set {v : α(v) = λv}, consisting of the zero vector and the eigenvectors with eigenvalue λ, is called the λ-eigenspace of α.

    Note that we require that v ≠ 0; otherwise the zero vector would be an eigenvector for any value of λ. With this requirement, each eigenvector has a unique eigenvalue: for if α(v) = λv = λ'v, then (λ − λ')v = 0, and so (since v ≠ 0) we have λ = λ'.

    The name "eigenvalue" is a mixture of German and English; it means "characteristic value" or "proper value" (here "proper" is used in the sense of "property"). Another term used in older books is "latent root". Here "latent" means "hidden": the idea is that the eigenvalue is somehow hidden in a matrix representing α, and we have to extract it by some procedure. We'll see how to do this soon.


    Example Let

        A = [  −6   6 ]
            [ −12  11 ].

    The vector v = [ 3 4 ]^T satisfies

        [  −6   6 ] [ 3 ]  =  2 [ 3 ]
        [ −12  11 ] [ 4 ]       [ 4 ],

    so is an eigenvector with eigenvalue 2. Similarly, the vector w = [ 2 3 ]^T is an eigenvector with eigenvalue 3.

    If we knew that, for example, 2 is an eigenvalue of A, then we could find a corresponding eigenvector [ x y ]^T by solving the linear equations

        [  −6   6 ] [ x ]  =  2 [ x ]
        [ −12  11 ] [ y ]       [ y ].

    In the next-but-one section, we will see how to find the eigenvalues, and the fact that there cannot be more than n of them for an n × n matrix.
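    In coordinates this is just a null-space computation; a quick sketch of our own in Python (assuming scipy is available):

        import numpy as np
        from scipy.linalg import null_space

        A = np.array([[-6., 6], [-12, 11]])

        # Eigenvectors for the eigenvalue 2 are the non-zero solutions of (A - 2I)x = 0.
        E2 = null_space(A - 2 * np.eye(2))
        print(E2)                              # one column, proportional to [3, 4]
        print(np.allclose(A @ E2, 2 * E2))     # True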

    4.4 Diagonalisability

    Some linear maps have a particularly simple representation by matrices.

    Definition 4.4 The linear map α on V is diagonalisable if there is a basis of V relative to which the matrix representing α is a diagonal matrix.

    Suppose that v_1, . . . , v_n is such a basis showing that α is diagonalisable. Then α(v_i) = a_{ii} v_i for i = 1, . . . , n, where a_{ii} is the ith diagonal entry of the diagonal matrix A. Thus, the basis vectors are eigenvectors. Conversely, if we have a basis of eigenvectors, then the matrix representing α is diagonal. So:

    Proposition 4.4 The linear map α on V is diagonalisable if and only if there is a basis of V consisting of eigenvectors of α.


    Example The matrix

        [ 1 2 ]
        [ 0 1 ]

    is not diagonalisable. It is easy to see that its only eigenvalue is 1, and the only eigenvectors are scalar multiples of [ 1 0 ]^T. So we cannot find a basis of eigenvectors.

    Theorem 4.5 Let α : V → V be a linear map. Then the following are equivalent:

    (a) α is diagonalisable;

    (b) V is the direct sum of eigenspaces of α;

    (c) α = λ_1 π_1 + · · · + λ_r π_r, where λ_1, . . . , λ_r are the distinct eigenvalues of α, and π_1, . . . , π_r are projections satisfying π_1 + · · · + π_r = I and π_i π_j = O for i ≠ j.

    Proof Let λ_1, . . . , λ_r be the distinct eigenvalues of α, and let v_{i1}, . . . , v_{im_i} be a basis for the λ_i-eigenspace of α. Then α is diagonalisable if and only if the union of these bases is a basis for V. So (a) and (b) are equivalent.

    Now suppose that (b) holds. Proposition 4.2 and its converse show that there are projections π_1, . . . , π_r satisfying the conditions of (c), where Im(π_i) is the λ_i-eigenspace. Now in this case it is easily checked that α and ∑ λ_i π_i agree on every vector in V, so they are equal. So (b) implies (c).

    Finally, if α = ∑ λ_i π_i, where the π_i satisfy the conditions of (c), then V is the direct sum of the spaces Im(π_i), and Im(π_i) is the λ_i-eigenspace. So (c) implies (b), and we are done.

    Example Our matrix

        A = [  −6   6 ]
            [ −12  11 ]

    is diagonalisable, since the eigenvectors [ 3 4 ]^T and [ 2 3 ]^T are linearly independent, and so form a basis for R^2. Indeed, we see that

        [  −6   6 ] [ 3 2 ]  =  [ 3 2 ] [ 2 0 ]
        [ −12  11 ] [ 4 3 ]     [ 4 3 ] [ 0 3 ],

    so that P^{−1}AP is diagonal, where P is the matrix whose columns are the eigenvectors of A.

    Furthermore, one can find two projection matrices whose column spaces are

    the eigenspaces, namely

        P_1 = [  9  −6 ]     P_2 = [  −8   6 ]
              [ 12  −8 ],          [ −12   9 ].


    Check directly that P_1^2 = P_1, P_2^2 = P_2, P_1 P_2 = P_2 P_1 = O, P_1 + P_2 = I, and 2P_1 + 3P_2 = A.

    This expression for a diagonalisable matrix A in terms of projections is useful in calculating powers of A, or polynomials in A.
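    The checks just asked for are each a one-liner in numpy (our own sketch):

        import numpy as np

        A  = np.array([[-6., 6], [-12, 11]])
        P1 = np.array([[9., -6], [12, -8]])
        P2 = np.array([[-8., 6], [-12, 9]])

        print(np.allclose(P1 @ P1, P1), np.allclose(P2 @ P2, P2))   # True True
        print(np.allclose(P1 @ P2, 0), np.allclose(P2 @ P1, 0))     # True True
        print(np.allclose(P1 + P2, np.eye(2)))                      # True
        print(np.allclose(2 * P1 + 3 * P2, A))                      # True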

    Proposition 4.6 Let

        A = ∑_{i=1}^{r} λ_i P_i

    be the expression for the diagonalisable matrix A in terms of projections P_i satisfying the conditions of Theorem 4.5, that is, ∑_{i=1}^{r} P_i = I and P_i P_j = O for i ≠ j. Then

    (a) for any positive integer m, we have

        A^m = ∑_{i=1}^{r} λ_i^m P_i;

    (b) for any polynomial f(x), we have

        f(A) = ∑_{i=1}^{r} f(λ_i) P_i.

    Proof (a) The proof is by induction on m, the case m = 1 being the given expression. Suppose that the result holds for m = k − 1. Then

        A^k = A^{k−1} A = ( ∑_{i=1}^{r} λ_i^{k−1} P_i ) ( ∑_{i=1}^{r} λ_i P_i ).

    When we multiply out this product, all the terms P_i P_j with i ≠ j are zero, and we obtain simply ∑_{i=1}^{r} λ_i^{k−1} λ_i P_i = ∑_{i=1}^{r} λ_i^k P_i, as required. So the induction goes through.

    (b) If f(x) = ∑ a_m x^m, we obtain the result by multiplying the equation of part (a) by a_m and summing over m. (Note that, for m = 0, we use the fact that

        A^0 = I = ∑_{i=1}^{r} P_i = ∑_{i=1}^{r} λ_i^0 P_i,

    that is, part (a) holds also for m = 0.)
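    Continuing the running example, here is a short numerical illustration of the proposition (our own sketch in numpy):

        import numpy as np

        A  = np.array([[-6., 6], [-12, 11]])
        P1 = np.array([[9., -6], [12, -8]])      # projection onto the 2-eigenspace
        P2 = np.array([[-8., 6], [-12, 9]])      # projection onto the 3-eigenspace

        m = 5
        print(np.allclose(np.linalg.matrix_power(A, m), 2**m * P1 + 3**m * P2))   # True

        # A polynomial in A, evaluated directly and via part (b) of the proposition.
        f = lambda t: t**2 - 4*t + 1
        print(np.allclose(A @ A - 4*A + np.eye(2), f(2) * P1 + f(3) * P2))        # True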


    4.5 Characteristic and minimal polynomials

    We defined the determinant of a square matrix A. Now we want to define the determinant of a linear map α. The obvious way to do this is to take the determinant of any matrix representing α. For this to be a good definition, we need to show that it doesn't matter which matrix we take; in other words, that det(A') = det(A) if A and A' are similar. But, if A' = P^{−1}AP, then

        det(P^{−1}AP) = det(P^{−1}) det(A) det(P) = det(A),

    since det(P^{−1}) det(P) = 1. So our plan will succeed:

    Definition 4.5 (a) The determinant det(α) of a linear map α : V → V is the determinant of any matrix representing α.

    (b) The characteristic polynomial c_α(x) of a linear map α : V → V is the characteristic polynomial of any matrix representing α.

    (c) The minimal polynomial m_α(x) of a linear map α : V → V is the monic polynomial of smallest degree which is satisfied by α.

    The second part of the definition is OK, by the same reasoning as the first (since c_A(x) is just a determinant). But the third part also creates a bit of a problem: how do we know that α satisfies any polynomial? The Cayley–Hamilton Theorem tells us that c_A(A) = O for any matrix A representing α. Now c_A(A) represents c_A(α), and c_A = c_α by definition; so c_α(α) = O. Indeed, the Cayley–Hamilton Theorem can be stated in the following form:

    Proposition 4.7 For any linear map α on V, its minimal polynomial m_α(x) divides its characteristic polynomial c_α(x) (as polynomials).

    Proof Suppose not; then we can divide c_α(x) by m_α(x), getting a quotient q(x) and non-zero remainder r(x); that is,

        c_α(x) = m_α(x) q(x) + r(x).

    Substituting α for x, and using the fact that c_α(α) = m_α(α) = O, we find that r(α) = O. But the degree of r is less than the degree of m_α, so this contradicts the definition of m_α as the polynomial of least degree satisfied by α.
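    A small illustration with sympy (our own sketch; the 3 × 3 matrix below is our own choice, picked so that the minimal polynomial is a proper divisor of the characteristic polynomial):

        from sympy import Matrix, symbols, factor, eye, zeros

        x = symbols('x')
        A = Matrix([[2, 1, 0],
                    [0, 2, 0],
                    [0, 0, 2]])

        print(factor(A.charpoly(x).as_expr()))      # (x - 2)**3

        # The minimal polynomial is a monic divisor of this, so it is (x - 2)**k for
        # the least k with (A - 2I)^k = O; here that turns out to be k = 2.
        for k in range(1, 4):
            if (A - 2 * eye(3)) ** k == zeros(3, 3):
                print((x - 2) ** k)                 # prints (x - 2)**2
                break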

    Theorem 4.8 Let α be a linear map on V. Then the following conditions are equivalent for an element λ of K:

    (a) λ is an eigenvalue of α;

    (b) λ is