
  • Physical Mathematics: What Physicists and Engineers Need to Know

    Kevin Cahill

    Department of Physics and Astronomy

    University of New Mexico, Albuquerque, NM 87131-1156

  • Copyright © 2004-2010 Kevin Cahill

  • For Marie, Mike, Sean, Peter, Mia, and James, and
    in honor of Muntader al-Zaidi and Julian Assange.

  • Contents

    Preface

    1. Linear Algebra

    2. Fourier Series

    3. Fourier and Laplace Transforms

    4. Infinite Series

    5. Complex-Variable Theory

    6. Differential Equations

    7. Integral Equations

    8. Legendre Polynomials

    9. Bessel Functions

    10. Group Theory

    11. Tensors and Local Symmetries

    12. Forms

    13. Probability and Statistics

    14. Monte Carlos

    15. Chaos and Fractals

    16. Functional Derivatives

    17. Path Integrals

    18. The Renormalization Group

    19. Finance

    20. Strings

  • Preface

    A word to students: You will find lots of physical examples crammed in
    amongst the mathematics of this book. Don't let them bother you. As you
    master the mathematics, you will learn some of the physics by osmosis, just
    as people learn a foreign language by living in a foreign country.

    This book has two goals. One is to teach mathematics in the context of
    physics. Students of physics and engineering can learn both physics and
    mathematics when they study mathematics with the help of physical examples
    and problems. The other goal is to explain succinctly those concepts of
    mathematics that are simple and that help one understand physics. Linear
    dependence and analyticity are simple and helpful. Elaborate convergence
    tests for infinite series and exhaustive lists of the properties of special
    functions are not. This mathematical triage does not always work: Whitney's
    embedding theorem is helpful but not simple.

    The book is intended to support a one- or two-semester course for graduate

    students and advanced undergraduates. One could teach the first seven,

    eight, or nine chapters in the first semester, and the other chapters in the

    second semester.

    Several friends and colleagues, especially Bernard Becker, Steven Boyd,

    Robert Burckel, Colston Chandler, Vageli Coutsias, David Dunlap, Daniel

    Finley, Franco Giuliani, Igor Gorelov, Dinesh Loomba, Michael Malik,
    Sudhakar Prasad, Randy Reeder, and Dmitri Sergatskov have given me valuable

    advice.

    The students in the courses in which I have developed this book have
    improved it by asking questions, contributing ideas, suggesting topics, and
    correcting mistakes. I am particularly grateful to Mss. Marie Cahill and
    Toby Tolley and to Messrs. Chris Cesare, Robert Cordwell, Amo-Kwao
    Godwin, Aram Gragossian, Aaron Hankin, Tyler Keating, Joshua Koch,
    Akash Rakholia, Ravi Raghunathan, and Daniel Young for ideas and
    questions, and to Mss. Tiffany Hayes and Sheng Liu and Messrs. Thomas
    Beechem, Charles Cherqui, Aaron Hankin, Ben Oliker, Boleszek Osinski,
    Ravi Raghunathan, Christopher Vergien, Zhou Yang, and Daniel Zirzow
    for pointing out several typos.

  • 1 Linear Algebra

    1.1 Numbers

    The natural numbers are the positive integers, with or without zero. Rational
    numbers are ratios of integers. An irrational number x is one whose decimal
    digits d_n

    x = \sum_{n=m_x}^{\infty} \frac{d_n}{10^n}    (1.1)

    do not repeat. Thus, the repeating decimals 1/2 = 0.50000... and
    1/3 = 0.33333... are rational, while \pi = 3.141592654... is not.
    Incidentally, decimal arithmetic was invented in India over 1500 years ago
    but was not widely adopted in Europe until the seventeenth century.

    The real numbers R include the rational numbers and the irrational numbers;
    they correspond to all the points on an infinite line called the real line.

    The complex numbers C are the real numbers with one new number i
    whose square is -1. A complex number z is a linear combination of a real
    number x and a real multiple iy of i

    z = x + iy.    (1.2)

    Here x = Re z is said to be the real part of z, and y the imaginary part,
    y = Im z. One adds complex numbers by adding their real and imaginary
    parts

    z_1 + z_2 = x_1 + iy_1 + x_2 + iy_2 = x_1 + x_2 + i(y_1 + y_2).    (1.3)

    Since i^2 = -1, the product of two complex numbers is

    z_1 z_2 = (x_1 + iy_1)(x_2 + iy_2) = x_1 x_2 - y_1 y_2 + i(x_1 y_2 + y_1 x_2).    (1.4)

    The polar representation z = r e^{i\theta} of a complex number z = x + iy is

    z = x + iy = r e^{i\theta} = r(\cos\theta + i\sin\theta)    (1.5)

    in which r is the modulus of z

    r = |z| = \sqrt{x^2 + y^2}    (1.6)

    and \theta is its argument

    \theta = \arctan(y/x).    (1.7)

    Since \exp(2\pi i) = 1, there is an inevitable ambiguity in the definition of the
    argument of any complex number: the argument \theta + 2\pi n gives the same z
    as \theta.

    There are two common notations \bar{z} and z^* for the complex conjugate of
    a complex number z = x + iy

    \bar{z} = z^* = x - iy.    (1.8)

    The square of the modulus of a complex number z = x + iy is

    |z|^2 = x^2 + y^2 = (x + iy)(x - iy) = \bar{z} z = z^* z.    (1.9)

    The inverse of a complex number z = x + iy is

    z^{-1} = (x + iy)^{-1} = \frac{x - iy}{(x - iy)(x + iy)} = \frac{x - iy}{x^2 + y^2} = \frac{z^*}{z^* z} = \frac{z^*}{|z|^2}.    (1.10)
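
    The relations (1.5-1.10) are easy to check numerically. Here is a minimal
    Python sketch (an editorial illustration, not part of the text) that verifies
    the polar form, the conjugate, and the inverse for one sample complex number.

      import cmath

      z = 3.0 + 4.0j
      r, theta = abs(z), cmath.phase(z)                    # modulus (1.6) and argument (1.7)
      assert abs(r * cmath.exp(1j * theta) - z) < 1e-12    # polar form (1.5)
      assert abs(z.conjugate() * z - r**2) < 1e-12         # |z|^2 = z* z   (1.9)
      assert abs(1 / z - z.conjugate() / r**2) < 1e-12     # z^{-1} = z*/|z|^2   (1.10)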

    Grassmann numbers \theta_i are anti-commuting numbers, i.e., the anti-commutator
    of any two Grassmann numbers vanishes

    \{\theta_i, \theta_j\} \equiv [\theta_i, \theta_j]_+ \equiv \theta_i \theta_j + \theta_j \theta_i = 0.    (1.11)

    In particular, the square of any Grassmann number is zero

    \theta_i^2 = 0.    (1.12)

    One may show that any power series in N Grassmann numbers \theta_i is a
    polynomial whose highest term is proportional to the product \theta_1 \theta_2 \cdots \theta_N.
    For instance, the most complicated power series in two Grassmann numbers

    f(\theta_1, \theta_2) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} f_{nm}\, \theta_1^n \theta_2^m    (1.13)

    is just

    f(\theta_1, \theta_2) = f_0 + f_1 \theta_1 + f_2 \theta_2 + f_{12}\, \theta_1 \theta_2.    (1.14)


    1.2 Arrays

    An array is an ordered set of numbers. Arrays play big roles in computer

    science, physics, and mathematics. They can be of any (integral) dimension.

    A one-dimensional array (a1, a2, . . . , an) is variously called an n-tuple,

    a row vector when written horizontally, a column vector when written

    vertically, or an n-vector. The numbers ak are its entries or components.

    A two-dimensional array a_{ik} with i running from 1 to n and k from 1 to
    m is an n \times m matrix. The numbers a_{ik} are called its entries, elements, or
    matrix elements. One can think of a matrix as a stack of row vectors or as
    a queue of column vectors. The entry a_{ik} is in the ith row and kth column.

    One can add together arrays of the same dimension and shape by adding
    their entries. Two n-tuples add as

    (a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)    (1.15)

    and two n \times m matrices a and b add as

    (a + b)_{ik} = a_{ik} + b_{ik}.    (1.16)

    One can multiply arrays by numbers: Thus z times the three-dimensional
    array a_{ijk} is the array with entries z a_{ijk}.

    One can multiply two arrays together no matter what their shapes and
    dimensions. The outer product of an n-tuple a and an m-tuple b is an
    n \times m matrix with elements

    (a b)_{ik} = a_i b_k    (1.17)

    or an m \times n matrix with entries (b a)_{ki} = b_k a_i. If a and b are complex,
    then one also can form the outer products

    (a^* b)_{ik} = a_i^* b_k    (1.18)

    and (b^* a)_{ki} = b_k^* a_i. The (outer) product of a matrix a_{ik} and a
    three-dimensional array b_{j\ell m} is a five-dimensional array

    (a b)_{ik j\ell m} = a_{ik} b_{j\ell m}.    (1.19)

    An inner product is possible when two arrays are of the same size in
    one of their dimensions. Thus the inner product (a, b) \equiv \langle a|b \rangle or dot
    product a \cdot b of two real n-tuples a and b is

    (a, b) = \langle a|b \rangle = a \cdot b = (a_1, \ldots, a_n) \cdot (b_1, \ldots, b_n) = a_1 b_1 + \cdots + a_n b_n.    (1.20)

    The inner product of two complex n-tuples is defined as

    (a, b) = \langle a|b \rangle = a^* \cdot b = (a_1, \ldots, a_n)^* \cdot (b_1, \ldots, b_n) = a_1^* b_1 + \cdots + a_n^* b_n    (1.21)

    or as its complex conjugate

    (a, b)^* = \langle a|b \rangle^* = (a^* \cdot b)^* = (b, a) = \langle b|a \rangle = b^* \cdot a    (1.22)

    so that (a, a) \geq 0.

    The product of an m \times n matrix a_{ik} times an n-tuple b_k is the m-tuple b'
    whose ith component is

    b'_i = a_{i1} b_1 + a_{i2} b_2 + \cdots + a_{in} b_n = \sum_{k=1}^{n} a_{ik} b_k    (1.23)

    or simply b' = a b in matrix notation.

    If the size of the second dimension of a matrix a matches that of the first
    dimension of a matrix b, then their product a b is the matrix with entries

    (a b)_{i\ell} = a_{i1} b_{1\ell} + \cdots + a_{in} b_{n\ell}.    (1.24)
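
    Numerical array libraries implement exactly these operations. The short
    Python/numpy sketch below (an editorial illustration, not part of the text)
    forms the outer product (1.17), the dot product (1.20), and the matrix
    product (1.24).

      import numpy as np

      a = np.array([1.0, 2.0, 3.0])        # an n-tuple, n = 3
      b = np.array([4.0, 5.0])             # an m-tuple, m = 2
      outer_ab = np.outer(a, b)            # (ab)_{ik} = a_i b_k, a 3 x 2 matrix   (1.17)

      u = np.array([1.0, 2.0, 3.0])
      v = np.array([6.0, 5.0, 4.0])
      inner_uv = np.dot(u, v)              # u . v = u_1 v_1 + ... + u_n v_n       (1.20)

      A = np.arange(6.0).reshape(2, 3)     # a 2 x 3 matrix
      B = np.arange(12.0).reshape(3, 4)    # a 3 x 4 matrix
      C = A @ B                            # (AB)_{il} = sum_k A_{ik} B_{kl}       (1.24)
      print(outer_ab.shape, inner_uv, C.shape)    # (3, 2) 28.0 (2, 4)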

    1.3 Matrices

    Apart from n-tuples, the most important arrays in linear algebra are the

    two-dimensional arrays called matrices.

    The trace of an n \times n matrix a is the sum of its diagonal elements

    Tr a = tr a = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii}.    (1.25)

    The trace of the product of two matrices is independent of their order

    Tr(a b) = \sum_{i=1}^{n} \sum_{k=1}^{n} a_{ik} b_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{n} b_{ki} a_{ik} = Tr(b a).    (1.26)

    It follows that the trace is cyclic

    Tr(a b \ldots z) = Tr(b \ldots z\, a).    (1.27)

    (Here we take for granted that the elements of these matrices are ordinary
    numbers that commute with each other.)

    The transpose of an n \times \ell matrix a is the \ell \times n matrix a^T with entries

    (a^T)_{ij} = a_{ji}.    (1.28)

    Some mathematicians use a prime to mean transpose, as in a' = a^T, but
    physicists tend to use primes to label different objects. One may show that

    (a b)^T = b^T a^T.    (1.29)

    A matrix that is equal to its transpose

    a = a^T    (1.30)

    is symmetric.

    The (hermitian) adjoint of a matrix is the complex conjugate of its transpose
    (Charles Hermite, 1822-1901). That is, the (hermitian) adjoint a^\dagger of
    an N \times L complex matrix a is the L \times N matrix with entries

    (a^\dagger)_{ij} = (a_{ji})^* = a_{ji}^*.    (1.31)

    One may show that

    (a b)^\dagger = b^\dagger a^\dagger.    (1.32)

    A matrix that is equal to its adjoint

    (a^\dagger)_{ij} = (a_{ji})^* = a_{ji}^* = a_{ij}    (1.33)

    (and which therefore must be a square matrix) is said to be hermitian or
    self adjoint

    a = a^\dagger.    (1.34)

    Example: The three Pauli matrices

    \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \text{and} \quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}    (1.35)

    are all hermitian (Wolfgang Pauli, 1900-1958). A real hermitian matrix is
    symmetric. If a matrix a is hermitian, then the quadratic form

    \langle v|a|v \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} v_i^* a_{ij} v_j \in \mathbb{R}    (1.36)

    is real for all complex n-tuples v.

    The Kronecker delta \delta_{ik} is defined to be unity if i = k and zero if i \neq k
    (Leopold Kronecker, 1823-1891). In terms of it, the n \times n identity matrix
    I is the matrix with entries I_{ik} = \delta_{ik}.

    The inverse a^{-1} of an n \times n matrix a is a matrix that satisfies

    a^{-1} a = a a^{-1} = I    (1.37)

    in which I is the n \times n identity matrix.

    So far we have been writing n-tuples and matrices and their elements with
    lower-case letters. It is equally common to use capital letters, and we will
    do so for the rest of this section.

    A matrix U whose adjoint U^\dagger is its inverse

    U^\dagger U = U U^\dagger = I    (1.38)

    is unitary. Unitary matrices are square.

    A real unitary matrix O is orthogonal and obeys the rule

    O^T O = O O^T = I.    (1.39)

    Orthogonal matrices are square.

    An N \times N hermitian matrix A is said to be non-negative

    A \geq 0    (1.40)

    if for all complex vectors V the quadratic form

    \langle V|A|V \rangle = \sum_{i=1}^{N} \sum_{j=1}^{N} V_i^* A_{ij} V_j \geq 0    (1.41)

    is non-negative. A similar rule

    \langle V|A|V \rangle > 0    (1.42)

    for all |V\rangle defines a positive or a positive-definite matrix (A > 0),
    although people often use these terms to describe non-negative matrices.

    Examples: The non-symmetric, non-hermitian 2 \times 2 matrix

    \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}    (1.43)

    is positive on the space of all real 2-vectors but not on the space of all
    complex 2-vectors.

    The 2 \times 2 matrix

    \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}    (1.44)

    provides a representation of i since

    \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = -I.    (1.45)

    The 2 \times 2 matrix

    \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}    (1.46)

    provides a representation of a Grassmann number since

    \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = 0.    (1.47)

    To represent two Grassmann numbers one needs 4 \times 4 matrices, such as

    \theta_1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \theta_2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}.    (1.48)
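
    A two-line numpy check (an editorial sketch, not in the original text)
    confirms that these 4 x 4 matrices square to zero and anticommute, as
    Grassmann numbers must.

      import numpy as np

      t1 = np.zeros((4, 4)); t1[0, 1] = 1; t1[2, 3] = 1     # theta_1 of (1.48)
      t2 = np.zeros((4, 4)); t2[2, 0] = 1; t2[3, 1] = -1    # theta_2 of (1.48)
      assert not np.any(t1 @ t1)                # theta_1^2 = 0          (1.12)
      assert not np.any(t2 @ t2)                # theta_2^2 = 0
      assert not np.any(t1 @ t2 + t2 @ t1)      # {theta_1, theta_2} = 0 (1.11)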

    1.4 Vectors

    Vectors are things that can be multiplied by numbers and added together

    to form other vectors in the same vector space. So if U and V are vectors

    in a vector space S over a set F of numbers and x and y are numbers in F ,

    then

    W = xU + yV (1.49)

    is also a vector in the vector space S.

    A basis for a vector space S is a set of vectors Bk for k = 1 . . . N in terms

    of which every vector U in S can be expressed as a linear combination

    U = u_1 B_1 + u_2 B_2 + \cdots + u_N B_N    (1.50)

    with numbers u_k in F. These numbers u_k are the components of the vector

    U in the basis Bk.

    Example: Suppose the vector W represents a certain kind of washer and

    the vector N represents a certain kind of nail. Then if n and m are natural

    numbers, the vector

    H = nW +mN (1.51)

    would represent a possible inventory of a very simple hardware store. The

    vector space of all such vectors H would include all possible inventories of

    the store. That space is a two-dimensional vector space over the natural

    numbers, and the two vectors W and N form a basis for it.

    Example: The complex numbers are a vector space. Two of its vectors

    are the number 1 and the number i; the vector space of complex numbers is

    then the set of all linear combinations

    z = x1 + yi = x+ iy. (1.52)


    So the complex numbers form a two-dimensional vector space over the real

    numbers, and the vectors 1 and i form a basis for it.

    The complex numbers also form a one-dimensional vector space over the

    complex numbers. Here any non-zero real or complex number, for instance

    the number 1 can be a basis consisting of the single vector 1. This one-

    dimensional vector space is the set of all z = z1 for arbitrary complex z.

    Example: Ordinary flat two-dimensional space is the set of all linear
    combinations

    r = x \hat{x} + y \hat{y}    (1.53)

    in which x and y are real numbers and \hat{x} and \hat{y} are perpendicular vectors of
    unit length (unit vectors). This vector space, called R^2, is a 2-d space over
    the reals.

    Note that the same vector r can be described either by the basis vectors
    \hat{x} and \hat{y} or by any other set of basis vectors, such as -\hat{y} and \hat{x}

    r = x \hat{x} + y \hat{y} = -y(-\hat{y}) + x \hat{x}.    (1.54)

    So the components of the vector r are (x, y) in the \{\hat{x}, \hat{y}\} basis and (-y, x)
    in the \{-\hat{y}, \hat{x}\} basis. Each vector is unique, but its components depend
    upon the basis.

    Example: Ordinary flat three-dimensional space is the set of all linear
    combinations

    r = x \hat{x} + y \hat{y} + z \hat{z}    (1.55)

    in which x, y, and z are real numbers. It is a 3-d space over the reals.

    Example: Arrays of a given dimension and size can be added and multiplied
    by numbers, and so they form a vector space. For instance, all complex
    three-dimensional arrays a_{ijk} in which 1 \leq i \leq 3, 1 \leq j \leq 4, and 1 \leq k \leq 5
    form a vector space over the complex numbers.

    Example: Derivatives are vectors; so are partial derivatives. For
    instance, the linear combinations of x and y partial derivatives taken at
    x = y = 0

    a \frac{\partial}{\partial x} + b \frac{\partial}{\partial y}    (1.56)

    form a vector space.

    Example: The space of all linear combinations of a set of functions f_i(x)
    defined on an interval [a, b]

    f(x) = \sum_i z_i f_i(x)    (1.57)

    is a vector space over the space of the numbers \{z_i\}.

    Example: In quantum mechanics, a state is represented by a vector,
    often written as \psi or in Dirac's notation as |\psi\rangle. If c_1 and c_2 are complex
    numbers, and |\psi_1\rangle and |\psi_2\rangle are any two states, then the linear combination

    |\psi\rangle = c_1 |\psi_1\rangle + c_2 |\psi_2\rangle    (1.58)

    also is a possible state of the system.

    1.5 Linear Operators

    A linear operator A is a map that takes any vector U in its domain into

    another vector U' = A(U) \equiv AU in a way that is linear. So if U and V are
    two vectors in the domain of the linear operator A and b and c are two real

    or complex numbers, then

    A(bU + cV ) = bA(U) + cA(V ) = bAU + cAV. (1.59)

    In the most important case, the operator A maps vectors in a vector space

    S into vectors in the same space S. In this case, A maps each basis vector

    B_i for the space S into a linear combination of these basis vectors B_k

    A B_i = a_{1i} B_1 + a_{2i} B_2 + \cdots + a_{Ni} B_N = \sum_{k=1}^{N} a_{ki} B_k.    (1.60)

    The square matrix a_{ki} represents the linear operator A in the B_k basis.
    The effect of A on any vector U = u_1 B_1 + u_2 B_2 + \cdots + u_N B_N in S then is

    A U = A\left( \sum_{i=1}^{N} u_i B_i \right) = \sum_{i=1}^{N} u_i A B_i = \sum_{i=1}^{N} u_i \sum_{k=1}^{N} a_{ki} B_k = \sum_{k=1}^{N} \left( \sum_{i=1}^{N} a_{ki} u_i \right) B_k.    (1.61)

    Thus the kth component u'_k of the vector U' = AU is

    u'_k = a_{k1} u_1 + a_{k2} u_2 + \cdots + a_{kN} u_N = \sum_{i=1}^{N} a_{ki} u_i.    (1.62)

    Thus the column vector u' of the components u'_k of the vector U' = AU
    is the product u' = a u of the matrix with elements a_{ki} that represents the
    linear operator A in the B_k basis with the column vector with components
    u_i that represents the vector U in that basis. So in a given basis, vectors
    and linear operators can be identified with column vectors and matrices.

    Each linear operator is unique, but its matrix depends upon the
    basis. Suppose we change from the B_k basis to another basis B'_k

    B_k = \sum_{\ell=1}^{N} u_{\ell k} B'_\ell    (1.63)

    in which the N \times N matrix u_{\ell k} has an inverse matrix u^{-1}_{ki} so that

    \sum_{k=1}^{N} u^{-1}_{ki} B_k = \sum_{k=1}^{N} u^{-1}_{ki} \sum_{\ell=1}^{N} u_{\ell k} B'_\ell = \sum_{\ell=1}^{N} \left( \sum_{k=1}^{N} u_{\ell k} u^{-1}_{ki} \right) B'_\ell = \sum_{\ell=1}^{N} \delta_{\ell i} B'_\ell = B'_i.    (1.64)

    Then the other basis vectors are given by

    B'_i = \sum_{k=1}^{N} u^{-1}_{ki} B_k    (1.65)

    and one may show (problem 3) that the action of the linear operator A on
    this basis vector is

    A B'_i = \sum_{j,k,\ell=1}^{N} u_{\ell j} a_{jk} u^{-1}_{ki} B'_\ell    (1.66)

    which shows that the matrix a' that represents A in the B' basis is related
    to the matrix a that represents it in the B basis by

    a'_{\ell i} = \sum_{j,k=1}^{N} u_{\ell j} a_{jk} u^{-1}_{ki}    (1.67)

    which in matrix notation is simply a' = u a u^{-1}.

    Example: Suppose the action of the linear operator A on the basis

    \{B_1, B_2\} is A B_1 = B_2 and A B_2 = 0. If the column vectors

    b_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad b_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}    (1.68)

    represent the two basis vectors B_1 and B_2, then the matrix

    a = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}    (1.69)

    would represent the linear operator A. But if we let the column vectors

    b'_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad b'_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}    (1.70)

    represent the basis vectors

    B'_1 = \frac{1}{\sqrt{2}} (B_1 + B_2) \quad \text{and} \quad B'_2 = \frac{1}{\sqrt{2}} (B_1 - B_2)    (1.71)

    then the vectors

    b_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad b_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}    (1.72)

    would represent B_1 and B_2, and so the matrix

    a' = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}    (1.73)

    would represent the linear operator A.
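
    A short numpy sketch (an editorial illustration using the same matrices as
    above) verifies the change-of-basis rule a' = u a u^{-1} of (1.67).

      import numpy as np

      a = np.array([[0.0, 0.0], [1.0, 0.0]])    # A in the {B_1, B_2} basis   (1.69)
      # Columns of u hold the components of B_1 and B_2 in the primed basis (1.63).
      u = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
      a_prime = u @ a @ np.linalg.inv(u)
      print(a_prime)                            # 0.5 * [[1, 1], [-1, -1]], i.e. (1.73)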

    A linear operator A also may map a vector space S with basis Bk into a

    different vector space T with a different basis Ck. In this case, A maps the

    basis vector Bi into a linear combination of the basis vectors Ck

    A B_i = \sum_{k=1}^{M} a_{ki} C_k    (1.74)

    and an arbitrary vector U = u_1 B_1 + \cdots + u_N B_N into

    A U = \sum_{k=1}^{M} \left( \sum_{i=1}^{N} a_{ki} u_i \right) C_k.    (1.75)

    We'll return to this point in Sections 1.14 and 1.15.

    1.6 Inner Products

    Most, but not all, of the vector spaces used by physicists have an inner

    product. An inner product is a function that associates a number (f, g)

    with every ordered pair of vectors f & g in the vector space in such a way

    as to satisfy these rules:

    (f, g) = (g, f)^*    (1.76)
    (f, z_1 g_1 + z_2 g_2) = z_1 (f, g_1) + z_2 (f, g_2)    (1.77)
    (z_1 f_1 + z_2 f_2, g) = z_1^* (f_1, g) + z_2^* (f_2, g)    (1.78)
    (f, f) \geq 0    (1.79)

    in which the f's and g's are vectors and the z's are numbers. The first two
    rules require that the inner product be linear in the second vector of the

    rules require that the inner product be linear in the second vector of the

    from the first two.) If, in addition, the only vector f that has a vanishing
    inner product with itself is the zero vector

    (f, f) = 0 if and only if f = 0    (1.80)

    then the inner product is hermitian or non degenerate; otherwise it is
    semi-definite or degenerate.

    The inner product of a vector f with itself is the square of the norm
    |f| = \|f\| of the vector

    |f|^2 = \|f\|^2 = (f, f)    (1.81)

    and so by (1.79), the norm is well-defined as

    \|f\| = \sqrt{(f, f)}.    (1.82)

    The distance between two vectors f and g is the norm of their difference

    \|f - g\|.    (1.83)

    Example: The space of real vectors V with N components V_i forms an

    N-dimensional vector space over the real numbers with inner product

    (U, V) = \sum_{i=1}^{N} U_i V_i.    (1.84)

    If the inner product (U, V) is zero, then the two vectors are orthogonal. If
    (U, U) = 0, then

    (U, U) = \sum_{i=1}^{N} U_i^2 = 0    (1.85)

    which implies that all U_i = 0, so the vector U = 0. So this inner product is
    hermitian or non degenerate.

    Example: The space of complex vectors V with N components V_i forms an
    N-dimensional vector space over the complex numbers with inner product

    (U, V) = \sum_{i=1}^{N} U_i^* V_i.    (1.86)

    If the inner product (U, V) is zero, then the two vectors are orthogonal. If
    (U, U) = 0, then

    (U, U) = \sum_{i=1}^{N} U_i^* U_i = \sum_{i=1}^{N} |U_i|^2 = 0    (1.87)

    which implies that all Ui = 0, and so the vector U is zero. So this inner

    product is hermitian or non degenerate.

    Example: For the vector space of N \times L complex matrices A, B, ..., the
    trace of the product of the adjoint (1.31) of A times B is a natural inner
    product

    (A, B) = Tr(A^\dagger B) = \sum_{i=1}^{N} \sum_{j=1}^{L} (A^\dagger)_{ji} B_{ij} = \sum_{i=1}^{N} \sum_{j=1}^{L} A_{ij}^* B_{ij}.    (1.88)

    Note that (A, A) is positive

    (A, A) = Tr(A^\dagger A) = \sum_{i=1}^{N} \sum_{j=1}^{L} A_{ij}^* A_{ij} = \sum_{i=1}^{N} \sum_{j=1}^{L} |A_{ij}|^2 \geq 0    (1.89)

    and zero only when A = 0.

    Two examples of degenerate or semi-definite inner products are given in

    the section (1.41) on correlation functions.

    Mathematicians call a vector space with an inner product (1.76-1.79) an
    inner-product space, a metric space, and a pre-Hilbert space.

    A sequence of vectors f_n is a Cauchy sequence if for every \epsilon > 0 there
    is an integer N(\epsilon) such that \|f_n - f_m\| < \epsilon whenever both n and m exceed
    N(\epsilon). A sequence of vectors f_n converges to a vector f if for every \epsilon > 0
    there is an integer N(\epsilon) such that \|f - f_n\| < \epsilon whenever n exceeds N(\epsilon). An
    inner-product space with a norm defined as in (1.82) is complete if each of
    its Cauchy sequences converges to a vector in that space. A Hilbert space
    is a complete inner-product space. Every finite-dimensional inner-product
    space is complete and so is a Hilbert space. But the term Hilbert space
    more often is used to describe infinite-dimensional complete inner-product
    spaces, such as the space of all square-integrable functions (David Hilbert,
    1862-1943).

    Example 1.1 (The Hilbert Space of Square-Integrable Functions) For the
    vector space of functions (1.57), a natural inner product is

    (f, g) = \int_a^b dx\, f^*(x) g(x).    (1.90)

    The squared norm \|f\|^2 of a function f(x) is

    \|f\|^2 = \int_a^b dx\, |f(x)|^2.    (1.91)

    A function is said to be square integrable if its norm is finite. The space of


    all square-integrable functions is an inner-product space; it is also complete

    and so is a Hilbert space.

    1.7 Schwarz Inequality

    Since by (1.79) the inner product of a vector with itself cannot be negative,
    it follows that for any vectors f and g and any complex number z = x + iy
    the inner product

    P(x, y) = (f + zg, f + zg) = (f, f) + z^* z (g, g) + z^* (g, f) + z (f, g) \geq 0    (1.92)

    is positive or zero. It even is non-negative at its minimum, which we may
    find by differentiation

    0 = \frac{\partial P(x, y)}{\partial x} = \frac{\partial P(x, y)}{\partial y}    (1.93)

    to be at

    x = -\mathrm{Re}(f, g)/(g, g) \quad \text{and} \quad y = \mathrm{Im}(f, g)/(g, g)    (1.94)

    as long as (g, g) > 0. If we substitute these values into Eq. (1.92), then we
    arrive at the relation

    (f, f)(g, g) \geq |(f, g)|^2    (1.95)

    which is called variously the Cauchy-Schwarz inequality and the Schwarz
    inequality. Equivalently

    \|f\| \, \|g\| \geq |(f, g)|.    (1.96)

    If the inner product is degenerate and (g, g) = 0, then the non-negativity of
    (f + zg, f + zg) implies that (f, g) = 0, in which case the Schwarz inequality
    is trivially satisfied.

    Example: For the dot product of two real 3-vectors r and R, the
    Cauchy-Schwarz inequality is

    (r \cdot r)(R \cdot R) \geq (r \cdot R)^2 = (r \cdot r)(R \cdot R) \cos^2\theta    (1.97)

    where \theta is the angle between r and R.

    Example: For two real n-vectors x and y, the Schwarz inequality is

    (x \cdot x)(y \cdot y) \geq (x \cdot y)^2 = (x \cdot x)(y \cdot y) \cos^2\theta    (1.98)

    and it implies (problem 5) that

    \|x + y\| \leq \|x\| + \|y\|.    (1.99)

    Example: For two complex n-vectors u and v, the Schwarz inequality is

    (u^* \cdot u)(v^* \cdot v) \geq |u^* \cdot v|^2 = (u^* \cdot u)(v^* \cdot v) \cos^2\theta    (1.100)

    and it implies (problem 6) that

    \|u + v\| \leq \|u\| + \|v\|.    (1.101)

    Example: For the inner product (1.90) of two complex functions f and
    g, the Schwarz inequality is

    \int_a^b dx\, |f(x)|^2 \int_a^b dx\, |g(x)|^2 \geq \left| \int_a^b dx\, f^*(x) g(x) \right|^2.    (1.102)
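
    A quick numerical spot-check of (1.100) and (1.101) for random complex
    vectors, using numpy (an editorial sketch, not part of the text):

      import numpy as np

      rng = np.random.default_rng(0)
      u = rng.normal(size=5) + 1j * rng.normal(size=5)
      v = rng.normal(size=5) + 1j * rng.normal(size=5)
      lhs = np.vdot(u, u).real * np.vdot(v, v).real     # (u* . u)(v* . v)
      rhs = abs(np.vdot(u, v)) ** 2                     # |u* . v|^2
      assert lhs >= rhs                                 # the Schwarz inequality (1.100)
      assert np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)   # (1.101)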

    1.8 Linear Independence and Completeness

    A set of N vectors V_i is linearly dependent if there exist numbers c_i, not
    all zero, such that the linear combination \sum_i c_i V_i vanishes

    \sum_{i=1}^{N} c_i V_i = 0.    (1.103)

    A set of vectors V_i is linearly independent if it is not linearly dependent.

    A set \{V_i\} of linearly independent vectors is maximal in a vector space
    S if the addition of any other vector V in S to the set \{V_i\} makes the set
    \{V, V_i\} linearly dependent.

    A set \{V_i\} of N linearly independent vectors that is maximal in a vector
    space S spans that space. For if V is any vector in S (and not one of the
    vectors V_i), then the set \{V, V_i\} is linearly dependent. Thus there are numbers
    c, c_i, not all zero, that make the sum

    c V + \sum_{i=1}^{N} c_i V_i = 0    (1.104)

    vanish. Now if c were 0, then the set \{V_i\} would be linearly dependent.
    Thus c \neq 0, and so we may divide by it and express the arbitrary vector V
    as a linear combination of the vectors V_i

    V = -\frac{1}{c} \sum_{i=1}^{N} c_i V_i.    (1.105)

    So the set of vectors \{V_i\} spans the space S; it is a complete set of vectors
    in the space S.

    A set of vectors \{V_i\} that is complete in a vector space S is said to provide
    a basis for that space because the set affords a way to expand an arbitrary
    vector in S as a linear combination of the basis vectors \{V_i\}. If the vectors
    of a basis are linearly dependent, then at least one of them is superfluous;
    thus it is convenient to have the vectors of a basis be linearly independent.

    1.9 Dimension of a Vector Space

    Suppose \{V_i \,|\, i = 1 \ldots N\} and \{U_i \,|\, i = 1 \ldots M\} are two sets of N and M
    maximally linearly independent vectors in a space S. Then N = M.

    Suppose M < N. Since the U's are complete, as explained in Sec. 1.8, we
    may express each of the N vectors V_i in terms of the M vectors U_j

    V_i = \sum_{j=1}^{M} A_{ij} U_j.    (1.106)

    Let A_j be the vector with components A_{ij}; there are M < N such vectors,
    and each has N > M components. So it is always possible to find a non-zero
    N-dimensional vector C with components c_i that is orthogonal to all
    M vectors A_j:

    \sum_{i=1}^{N} c_i A_{ij} = 0.    (1.107)

    But then the linear combination

    \sum_{i=1}^{N} c_i V_i = \sum_{i=1}^{N} \sum_{j=1}^{M} c_i A_{ij} U_j = 0    (1.108)

    vanishes, which would imply that the N vectors V_i were linearly dependent.
    Since these vectors are by assumption linearly independent, it follows that
    N \leq M. Similarly, one may show that M \leq N. Thus M = N.

    The number N of vectors in a maximal linearly independent set of a vector
    space S is the dimension of the vector space. Any N linearly independent
    vectors in an N-dimensional space form a basis for it.

    1.10 Orthonormal Vectors

    Suppose the vectors \{V_i \,|\, i = 1 \ldots N\} are linearly independent. Then we may
    make out of them a set of N vectors U_i that are orthonormal

    (U_i, U_j) = \delta_{ij}.    (1.109)

    Procedure (Gram-Schmidt): We set

    U_1 = \frac{V_1}{\sqrt{(V_1, V_1)}}    (1.110)

    so the first vector U_1 is normalized.

    Next we set

    u_2 = V_2 + c_{12} U_1    (1.111)

    and require that u_2 be orthogonal to U_1

    0 = (U_1, u_2) = (U_1, c_{12} U_1 + V_2) = c_{12} + (U_1, V_2)    (1.112)

    whence c_{12} = -(U_1, V_2), and so

    u_2 = V_2 - (U_1, V_2) U_1.    (1.113)

    The normalized vector U_2 then is

    U_2 = \frac{u_2}{\sqrt{(u_2, u_2)}}.    (1.114)

    Similarly, we set

    u_3 = V_3 + c_{13} U_1 + c_{23} U_2    (1.115)

    and ask that u_3 be orthogonal both to U_1

    0 = (U_1, u_3) = (U_1, c_{13} U_1 + c_{23} U_2 + V_3) = c_{13} + (U_1, V_3)    (1.116)

    and to U_2

    0 = (U_2, u_3) = (U_2, c_{13} U_1 + c_{23} U_2 + V_3) = c_{23} + (U_2, V_3)    (1.117)

    whence c_{i3} = -(U_i, V_3) for i = 1 and 2, and so

    u_3 = V_3 - (U_1, V_3) U_1 - (U_2, V_3) U_2.    (1.118)

    The normalized vector U_3 then is

    U_3 = \frac{u_3}{\sqrt{(u_3, u_3)}}.    (1.119)

    We may continue in this way until we reach the last of the N linearly
    independent vectors. We require the kth unnormalized vector u_k

    u_k = V_k + \sum_{i=1}^{k-1} c_{ik} U_i    (1.120)

    to be orthogonal to the k - 1 vectors U_i and find that c_{ik} = -(U_i, V_k), so
    that

    u_k = V_k - \sum_{i=1}^{k-1} (U_i, V_k) U_i.    (1.121)

    The normalized vector then is

    U_k = \frac{u_k}{\sqrt{(u_k, u_k)}}.    (1.122)

    In general, a basis is more useful if it is composed of orthonormal vectors.
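
    The procedure translates directly into code. Here is a bare-bones
    Python/numpy version of (1.120-1.122) (an editorial sketch assuming the
    complex inner product (1.86); it is not part of the original text).

      import numpy as np

      def gram_schmidt(vectors):
          """Build orthonormal vectors U_k from linearly independent vectors V_k."""
          basis = []
          for v in vectors:
              u = v - sum(np.vdot(b, v) * b for b in basis)   # u_k = V_k - sum_i (U_i, V_k) U_i
              basis.append(u / np.sqrt(np.vdot(u, u).real))   # U_k = u_k / sqrt((u_k, u_k))
          return basis

      V = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
      U = gram_schmidt(V)
      print(np.round([[np.vdot(a, b) for b in U] for a in U], 12))   # the identity matrix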

    1.11 Outer Products

    From any two vectors f and g, we may make an operator A that takes any

    vector h into the vector f with coefficient (g, h)

    Ah = f(g, h). (1.123)

    It is easy to show that A is linear, that is that

    A(zh+ we) = zAh+ wAe (1.124)

    for any vectors e, h and numbers z, w.

    Example: If f and g are vectors with components f_i and g_i, and h has
    components h_i, then the linear transformation is

    (A h)_i = \sum_{j=1}^{N} A_{ij} h_j = f_i \sum_{j=1}^{N} g_j^* h_j    (1.125)

    so A is a matrix with entries

    A_{ij} = f_i g_j^*.    (1.126)

    The matrix A is the outer product of the vectors f and g.

    1.12 Dirac Notation

    Such outer products are important in quantum mechanics, and so Dirac

    invented a notation for linear algebra that makes them easy to write. In his

    notation, the outer product A of Eqs. (1.123-1.126) is

    A = |f\rangle\langle g|    (1.127)

    and the inner product (g, h) is

    (g, h) = \langle g|h \rangle.    (1.128)

    He called \langle g| a bra and |h\rangle a ket, so that \langle g|h \rangle is a bracket. In his
    notation Eq. (1.123) reads

    A|h\rangle = |f\rangle\langle g|h\rangle.    (1.129)

    The new thing in Dirac's notation is the bra \langle f|. If the ket |f\rangle is represented
    by the vector

    |f\rangle = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{pmatrix}    (1.130)

    then the bra \langle f| is represented by the adjoint of that vector

    \langle f| = (z_1^*, z_2^*, z_3^*, z_4^*).    (1.131)

    In the standard notation, bras are implicit in the definition of the inner
    product, but they do not appear explicitly.

    In Dirac's notation, the rules (1.76-1.80) that a hermitian inner product
    satisfies are:

    \langle f|g \rangle = \langle g|f \rangle^*    (1.132)
    \langle f|z_1 g_1 + z_2 g_2 \rangle = z_1 \langle f|g_1 \rangle + z_2 \langle f|g_2 \rangle    (1.133)
    \langle z_1 f_1 + z_2 f_2|g \rangle = z_1^* \langle f_1|g \rangle + z_2^* \langle f_2|g \rangle    (1.134)
    \langle f|f \rangle \geq 0    (1.135)
    \langle f|f \rangle = 0 \text{ if and only if } f = 0.    (1.136)

    Usually, however, states in Dirac notation are labeled |\psi\rangle or by their quantum
    numbers |n, l, m\rangle, and so one rarely sees plus signs or complex numbers
    or operators inside bras or kets. But one should.

    Dirac's notation allows us to write outer products clearly and simply.

    Example: If the vectors f = |f\rangle and g = |g\rangle are

    |f\rangle = \begin{pmatrix} a \\ b \\ c \end{pmatrix} \quad \text{and} \quad |g\rangle = \begin{pmatrix} z \\ w \end{pmatrix}    (1.137)

    then their outer products are

    |f\rangle\langle g| = \begin{pmatrix} a z^* & a w^* \\ b z^* & b w^* \\ c z^* & c w^* \end{pmatrix} \quad \text{and} \quad |g\rangle\langle f| = \begin{pmatrix} z a^* & z b^* & z c^* \\ w a^* & w b^* & w c^* \end{pmatrix}    (1.138)

    as well as

    |f\rangle\langle f| = \begin{pmatrix} a a^* & a b^* & a c^* \\ b a^* & b b^* & b c^* \\ c a^* & c b^* & c c^* \end{pmatrix} \quad \text{and} \quad |g\rangle\langle g| = \begin{pmatrix} z z^* & z w^* \\ w z^* & w w^* \end{pmatrix}.    (1.139)
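
    In numpy, these outer products are one-liners (an editorial sketch, not part
    of the text; the sample components are arbitrary).

      import numpy as np

      f = np.array([1.0 + 2.0j, 3.0, 4.0j])     # the ket |f> with components a, b, c
      g = np.array([2.0, 1.0 - 1.0j])           # the ket |g> with components z, w
      fg = np.outer(f, g.conj())                # (|f><g|)_{ik} = f_i g_k^*, a 3 x 2 matrix
      gf = np.outer(g, f.conj())                # (|g><f|)_{ki} = g_k f_i^*, a 2 x 3 matrix
      assert np.allclose(gf, fg.conj().T)       # |g><f| is the adjoint of |f><g|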

    Example: In Dirac notation, formula (1.121) is

    |u_k\rangle = |V_k\rangle - \sum_{i=1}^{k-1} |U_i\rangle\langle U_i|V_k\rangle    (1.140)

    or

    |u_k\rangle = \left( I - \sum_{i=1}^{k-1} |U_i\rangle\langle U_i| \right) |V_k\rangle    (1.141)

    and (1.122) is

    |U_k\rangle = \frac{|u_k\rangle}{\sqrt{\langle u_k|u_k\rangle}}.    (1.142)

    1.13 Identity Operators

    Dirac notation provides a neat way of representing the identity operator I in

    terms of a complete set of orthonormal vectors. First, in standard notation,

    the expansion of an arbitrary vector f in a space S in terms of a complete

    set of N orthonormal vectors ei

    (e_j, e_i) = \delta_{ij}    (1.143)

    is

    f = \sum_{i=1}^{N} c_i e_i    (1.144)

    from which we conclude that

    (e_j, f) = \left( e_j, \sum_{i=1}^{N} c_i e_i \right) = \sum_{i=1}^{N} c_i (e_j, e_i) = \sum_{i=1}^{N} c_i \delta_{ij} = c_j    (1.145)

    whence

    f = \sum_{i=1}^{N} (e_i, f)\, e_i = \sum_{i=1}^{N} e_i\, (e_i, f).    (1.146)

    The derivation stops here because there is no explicit expression for a bra.

    But in Dirac notation, these equations read

    \langle e_j|e_i \rangle = \delta_{ij}    (1.147)

    |f\rangle = \sum_{i=1}^{N} c_i |e_i\rangle    (1.148)

    \langle e_j|f \rangle = \Big\langle e_j \Big| \sum_{i=1}^{N} c_i e_i \Big\rangle = \sum_{i=1}^{N} c_i \langle e_j|e_i \rangle = \sum_{i=1}^{N} c_i \delta_{ij} = c_j    (1.149)

    |f\rangle = \sum_{i=1}^{N} \langle e_i|f \rangle\, |e_i\rangle = \sum_{i=1}^{N} |e_i\rangle \langle e_i|f \rangle.    (1.150)

    We now rewrite the last equation as

    |f\rangle = \left( \sum_{i=1}^{N} |e_i\rangle\langle e_i| \right) |f\rangle.    (1.152)

    Since this equation holds for every vector |f\rangle, the quantity inside the parentheses
    must be the identity operator

    I = \sum_{i=1}^{N} |e_i\rangle\langle e_i|.    (1.153)

    Because one always may insert an identity operator anywhere, and because

    the formula is true for every complete set of orthonormal vectors, the reso-

    lution (1.153) of the identity operator is extremely useful.
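
    The resolution (1.153) is easy to verify numerically for any orthonormal
    basis. The numpy sketch below (an editorial illustration with an arbitrary
    orthonormal basis of C^3, chosen real for simplicity) sums the outer
    products |e_i><e_i| and recovers the identity matrix.

      import numpy as np

      basis = [np.array([1.0, 0.0, 0.0]),
               np.array([0.0, 1.0, 1.0]) / np.sqrt(2.0),
               np.array([0.0, 1.0, -1.0]) / np.sqrt(2.0)]
      I = sum(np.outer(e, e.conj()) for e in basis)    # sum_i |e_i><e_i|   (1.153)
      assert np.allclose(I, np.eye(3))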

    By twice inserting the identity operator (1.153), one may convert a general
    inner product (g, A f) = \langle g|A|f \rangle into an expression involving a matrix A_{ij}
    that represents the linear operator A

    \langle g|A|f \rangle = \langle g|I A I|f \rangle = \sum_{i,j=1}^{N} \langle g|e_i \rangle \langle e_i|A|e_j \rangle \langle e_j|f \rangle.    (1.154)

    In the basis \{|e_k\rangle\}, the matrix A_{ij} that represents the linear operator
    A is

    A_{ij} = \langle e_i|A|e_j \rangle    (1.155)

    and the components of the vectors |f\rangle and |g\rangle are

    f_i = \langle e_i|f \rangle \quad \text{and} \quad g_i = \langle e_i|g \rangle.    (1.156)

    In this basis, the inner product (g, A f) = \langle g|A|f \rangle takes the form

    \langle g|A|f \rangle = \sum_{i,j=1}^{N} g_i^* A_{ij} f_j.    (1.157)

    1.14 Vectors and Their Components

    Usually, the components v_k of a vector |v\rangle are the inner products

    v_k = \langle k|v \rangle    (1.158)

    of the vector |v\rangle with a set of orthonormal basis vectors |k\rangle. Thus the
    components v_k of a vector |v\rangle depend on both the vector and the basis. A
    vector is independent of the basis used to compute its components,
    but its components depend upon the chosen basis.

    If the basis is orthonormal and so provides for the identity operator I the
    expansion

    I = \sum_{k=1}^{N} |k\rangle\langle k|    (1.159)

    then the components v_k of the vector |v\rangle are the coefficients in its expansion
    in terms of the basis vectors |k\rangle

    |v\rangle = I|v\rangle = \sum_{k=1}^{N} |k\rangle\langle k|v \rangle = \sum_{k=1}^{N} v_k |k\rangle.    (1.160)

    1.15 Linear Operators and Their Matrices

    A linear operator A maps vectors into vectors linearly as in Eq. (1.59)

    A(bU + cV ) = bA(U) + cA(V ) = bAU + cAV. (1.161)

    In the simplest and most important case, the linear operator A maps the
    vectors of a vector space S into vectors in the same space S. If the space S
    is N-dimensional, then it maps the vectors |i\rangle of any basis \{|i\rangle\} for S into
    vectors A|i\rangle \equiv |Ai\rangle that can be expanded in terms of the same basis \{|k\rangle\}

    A|i\rangle = |Ai\rangle = \sum_{k=1}^{N} A_{ki} |k\rangle.    (1.162)

    The N \times N matrix with entries A_{ki} represents the linear operator A
    in the basis \{|i\rangle\}. Because A is linear, its action on an arbitrary vector
    |C\rangle = \sum_{i=1}^{N} C_i |i\rangle in S is

    A|C\rangle = A\left( \sum_{i=1}^{N} C_i |i\rangle \right) = \sum_{i=1}^{N} C_i A|i\rangle = \sum_{k=1}^{N} \sum_{i=1}^{N} A_{ki} C_i |k\rangle.    (1.163)

    Thus the coefficients (AC)_k of the vector A|C\rangle \equiv |AC\rangle in the expansion

    A|C\rangle = |AC\rangle = \sum_{k=1}^{N} (AC)_k |k\rangle    (1.164)

    are given by the matrix multiplication of the vector C with elements C_i by
    the matrix A with entries A_{ki}

    (AC)_k = \sum_{i=1}^{N} A_{ki} C_i.    (1.165)

    Both the elements C_i of the vector C and the entries A_{ki} of the
    matrix A depend upon the basis \{|i\rangle\} one chooses to use.

    If the vectors \{|i\rangle\} are orthonormal, then the elements C_\ell and A_{\ell i} are

    \langle \ell|C \rangle = \sum_{i=1}^{N} C_i \langle \ell|i \rangle = \sum_{i=1}^{N} C_i \delta_{\ell i} = C_\ell
    \qquad
    \langle \ell|A|i \rangle = \sum_{k=1}^{N} A_{ki} \langle \ell|k \rangle = \sum_{k=1}^{N} A_{ki} \delta_{\ell k} = A_{\ell i}.    (1.166)

    In the more general case, the linear operator A maps vectors in a vector
    space S into vectors in a different vector space S'. Now A maps an orthonormal
    basis \{|i\rangle\} for S into vectors A|i\rangle that may be expanded in terms of an
    orthonormal basis \{|k\rangle'\} for S'

    A|i\rangle = \sum_{k=1}^{N'} A_{ki} |k\rangle'.    (1.167)

    If the N vectors A|i\rangle are linearly independent, then N' = N, but if they
    are linearly dependent or if some of them are zero, then N' < N. The
    elements A_{\ell i} of the matrix that represents the linear operator A
    now are

    '\langle \ell|A|i \rangle = \sum_{k=1}^{N'} A_{ki}\, '\langle \ell|k\rangle' = \sum_{k=1}^{N'} A_{ki} \delta_{\ell k} = A_{\ell i}.    (1.168)

    They depend on both bases \{|i\rangle\} and \{|k\rangle'\}. So although the linear
    operator is basis independent, the matrices that represent it vary with the
    chosen bases.

    So far we have mostly been talking about linear operators that act on

    finite-dimensional vector spaces and that can be represented by matrices.

    But infinite-dimensional vector spaces and the linear operators that act on

    them play central roles in electrodynamics and quantum mechanics. For
    instance, the Hilbert space H of all wave functions \psi(x, t) that are square
    integrable over three-dimensional space at all times t is of (very) infinite
    dimension. An example in one space dimension of a linear operator that
    maps (a subspace of) H to H is the hamiltonian H for a non-relativistic
    particle of mass m in a potential V

    H = -\frac{\hbar^2}{2m} \frac{d^2}{dx^2} + V(x).    (1.169)

    It maps the state vector |\psi\rangle with components \langle x|\psi \rangle = \psi(x) into the vector
    H|\psi\rangle with components

    \langle x|H|\psi \rangle = H\psi(x) = -\frac{\hbar^2}{2m} \frac{d^2\psi(x)}{dx^2} + V(x)\psi(x)    (1.170)

    where \hbar = 1.05 \times 10^{-34} J s. Translations in space and time

    U_T(a, b)\psi(x, t) = \psi(x + a, t + b)    (1.171)

    and rotations in space

    U_R(\theta)\psi(x, t) = \psi(R(\theta)x, t)    (1.172)

    are also represented by linear operators acting on vector spaces of infinite
    dimension. As we'll see in what follows, these linear operators are unitary.

    We may think of linear operators that act on vector spaces of infinite
    dimension as infinite-dimensional matrices or as matrices of continuously
    infinite dimension, the latter really being integral operators like

    H = \int dp \int dp' \, |p\rangle\langle p|H|p'\rangle\langle p'|.    (1.173)

    Thus we may carry over to spaces of infinite dimension most of our intuition
    about matrices, as long as we use common sense and keep in mind that
    infinite sums and integrals do not always converge to finite numbers.

    1.16 Determinants

    The determinant of a 2 \times 2 matrix A is

    \det A = |A| = A_{11} A_{22} - A_{21} A_{12}.    (1.174)

    In terms of the antisymmetric matrix e_{ij} = -e_{ji} (which implies that
    e_{11} = e_{22} = 0) with e_{12} = 1, this determinant is

    \det A = \sum_{i=1}^{2} \sum_{j=1}^{2} e_{ij} A_{i1} A_{j2}.    (1.175)

    It's also true that

    e_{k\ell} \det A = \sum_{i=1}^{2} \sum_{j=1}^{2} e_{ij} A_{ik} A_{j\ell}.    (1.176)

    These definitions and results extend to any square matrix. If A is a 3 \times 3
    matrix, then its determinant is

    \det A = \sum_{i,j,k=1}^{3} e_{ijk} A_{i1} A_{j2} A_{k3}    (1.177)

    in which e_{ijk} is totally antisymmetric with e_{123} = 1 and the sums over i, j,
    and k run from 1 to 3. More explicitly, this determinant is

    \det A = \sum_{i,j,k=1}^{3} e_{ijk} A_{i1} A_{j2} A_{k3} = \sum_{i=1}^{3} A_{i1} \sum_{j,k=1}^{3} e_{ijk} A_{j2} A_{k3}
           = A_{11} (A_{22} A_{33} - A_{32} A_{23}) + A_{21} (A_{32} A_{13} - A_{12} A_{33}) + A_{31} (A_{12} A_{23} - A_{22} A_{13}).    (1.178)

    This sum involves the 2 \times 2 determinants of the matrices that result when
    we strike out column 1 and row i, which are called minors, multiplied by
    (-1)^{1+i}

    \det A = A_{11}(-1)^2 (A_{22} A_{33} - A_{32} A_{23}) + A_{21}(-1)^3 (A_{12} A_{33} - A_{32} A_{13})
           + A_{31}(-1)^4 (A_{12} A_{23} - A_{22} A_{13})    (1.179)
           = \sum_{i=1}^{3} A_{i1} C_{i1}.    (1.180)

    These minors multiplied by (-1)^{1+i} are called cofactors:

    C_{11} = A_{22} A_{33} - A_{23} A_{32}
    C_{21} = -(A_{12} A_{33} - A_{32} A_{13})
    C_{31} = A_{12} A_{23} - A_{22} A_{13}.    (1.181)

    This way of computing determinants is due to Laplace.

    Example: The determinant of a 3 \times 3 matrix is the dot product of the
    vector of its first row with the cross-product of the vectors of its second and
    third rows:

    \begin{vmatrix} U_1 & U_2 & U_3 \\ V_1 & V_2 & V_3 \\ W_1 & W_2 & W_3 \end{vmatrix}
    = \sum_{i,j,k=1}^{3} e_{ijk} U_i V_j W_k = \sum_{i=1}^{3} U_i (V \times W)_i = U \cdot (V \times W).    (1.182)

    Totally antisymmetric quantities e_{i_1 i_2 \ldots i_N} with N indices and with
    e_{12\ldots N} = 1 provide a definition of the determinant of an N \times N matrix A as

    \det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N} A_{i_1 1} A_{i_2 2} \ldots A_{i_N N}    (1.183)

    in which the sums over i_1 \ldots i_N run from 1 to N. The general form of
    Laplace's expansion of this determinant is

    \det A = \sum_{i=1}^{N} A_{ik} C_{ik} = \sum_{k=1}^{N} A_{ik} C_{ik}    (1.184)

    in which the first sum is over the row index i but not the (arbitrary) column
    index k, and the second sum is over the column index k but not the
    (arbitrary) row index i. The cofactor C_{ik} is (-1)^{i+k} M_{ik} in which the minor
    M_{ik} is the determinant of the (N-1) \times (N-1) matrix A without its ith
    row and kth column.

    Incidentally, it's also true that

    e_{k_1 k_2 \ldots k_N} \det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N} A_{i_1 k_1} A_{i_2 k_2} \ldots A_{i_N k_N}.    (1.185)
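
    Laplace's expansion (1.184) can be coded directly as a recursion. The Python
    sketch below (an editorial illustration; the sample matrix is arbitrary)
    expands along the first column and agrees with numpy's determinant. It is
    far slower than the modern methods mentioned at the end of this section,
    but it mirrors the formula exactly.

      import numpy as np

      def det_laplace(a):
          a = np.asarray(a, dtype=float)
          n = a.shape[0]
          if n == 1:
              return a[0, 0]
          d = 0.0
          for i in range(n):
              minor = np.delete(np.delete(a, i, 0), 0, 1)     # strike out row i and column 1
              # cofactor C_{i1} = (-1)^{i+1} M_{i1} with i counted from 1; here i is 0-based
              d += (-1) ** i * a[i, 0] * det_laplace(minor)
          return d

      a = np.array([[2.0, 1.0, 3.0], [0.0, -1.0, 4.0], [5.0, 2.0, 6.0]])
      print(det_laplace(a), np.linalg.det(a))   # the two values agree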

    The key feature of a determinant is that it is an antisymmetric combination
    of products of the elements A_{ik} of a matrix A. One implication of this
    antisymmetry is that the interchange of any two rows or any two columns
    changes the sign of the determinant. Another is that if one adds a multiple
    of one column to another column, for example a multiple x A_{i2} of column 2
    to column 1, then the determinant

    \det A' = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N} (A_{i_1 1} + x A_{i_1 2}) A_{i_2 2} \ldots A_{i_N N}    (1.186)

    is unchanged. The reason is that the extra term \delta \det A vanishes

    \delta \det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} x\, e_{i_1 i_2 \ldots i_N} A_{i_1 2} A_{i_2 2} \ldots A_{i_N N} = 0    (1.187)

    because it is proportional to a sum of products of a factor e_{i_1 i_2 \ldots i_N} that is
    antisymmetric in i_1 and i_2 and a factor A_{i_1 2} A_{i_2 2} that is symmetric in these
    indices. For instance, when i_1 and i_2 are 5 & 7 and 7 & 5, the two terms
    cancel

    e_{57\ldots i_N} A_{52} A_{72} \ldots A_{i_N N} + e_{75\ldots i_N} A_{72} A_{52} \ldots A_{i_N N} = 0    (1.188)

    because e_{57\ldots i_N} = -e_{75\ldots i_N}.

    because e57...iN = e75...iN .By repeated additions of x2Ai2, x3Ai3, etc. to Ai1, we can change the

    first column of the matrix A to a nearly arbitrary linear combination of all

    the columns

    Ai1 Ai1 +Nk=2

    xkAik (1.189)

    without changing detA. This linear combination is not completely arbitrary

    because the coefficient of Ai1 remains unity. The analogous operation

    Ai` Ai` +N

    k=1,k 6=`ykAik (1.190)

    replaces the `th column by a nearly arbitrary linear combination of all the

    columns without changing detA.

    The key concepts of linear dependence and independence were explained
    in Sec. 1.8. Suppose that the columns of an N \times N matrix A are linearly
    dependent, so that for some coefficients y_k not all zero the linear combination

    \sum_{k=1}^{N} y_k A_{ik} = 0 \quad \forall\, i    (1.191)

    vanishes for all i (the upside-down A means "for all"). Suppose y_1 \neq 0. Then
    by adding suitable linear combinations of columns 2 through N to column
    1, we could make all the elements A_{i1} of column 1 vanish without changing
    \det A. But then \det A as given by (1.183) would vanish. It follows that
    the determinant of any matrix whose columns are linearly dependent must
    vanish.

    The converse also is true: if the columns of a matrix are linearly independent,
    then the determinant of that matrix can not vanish. To see why, let us
    recall, as explained in Sec. 1.8, that a maximal set of linearly independent
    vectors is complete; the N linearly independent columns of an N \times N matrix
    form such a maximal set. Thus if the columns of a matrix A are linearly
    independent and therefore complete, some linear combination of all columns
    2 through N when added to column 1 will convert column 1 into a (non-zero)
    multiple of the N-dimensional column vector (1, 0, 0, \ldots, 0), say (c_1, 0, 0, \ldots, 0).
    Similar operations will convert column 2 into a (non-zero) multiple of the
    column vector (0, 1, 0, \ldots, 0), say (0, c_2, 0, \ldots, 0). Continuing in this way, we
    may convert the matrix A to a matrix with non-zero entries along the main
    diagonal and zeros everywhere else. The determinant \det A is then the
    product of the non-zero diagonal entries c_1 c_2 \ldots c_N \neq 0, and so \det A can
    not vanish.

    We may extend these arguments to the rows of a matrix. The addition
    to row k of a linear combination of the other rows

    A_{ki} \to A_{ki} + \sum_{\ell=1, \ell \neq k}^{N} z_\ell A_{\ell i}    (1.192)

    does not change the value of the determinant. In this way, one may show
    that the determinant of a matrix vanishes if and only if its rows are linearly
    dependent. The reason why these results apply to the rows as well as to
    the columns is that the determinant of a matrix A may be defined either in
    terms of the columns as in definitions (1.183 & 1.185) or in terms of the
    rows:

    \det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N} A_{1 i_1} A_{2 i_2} \ldots A_{N i_N}    (1.193)

    e_{k_1 k_2 \ldots k_N} \det A = \sum_{i_1 i_2 \ldots i_N = 1}^{N} e_{i_1 i_2 \ldots i_N} A_{k_1 i_1} A_{k_2 i_2} \ldots A_{k_N i_N}.    (1.194)

    These and many other properties of determinants follow from a study of
    permutations, which are discussed in Section 10.13. Detailed proofs can
    be found in the book by Aitken (Aitken, 1959).

    By comparing the column definitions (1.183 & 1.185) with the row definitions
    (1.193 & 1.194) of determinants, we see that the determinant of the transpose
    of a matrix is the same as the determinant of the matrix itself:

    \det A^T = \det A.    (1.195)

    Let us return for a moment to Laplace's expansion (1.184) for the determinant
    \det A of an N \times N matrix A as a sum of A_{ik} C_{ik} over the row index
    i with the column index k held fixed

    \det A = \sum_{i=1}^{N} A_{ik} C_{ik}    (1.196)

    in order to prove that

    \delta_{k\ell} \det A = \sum_{i=1}^{N} A_{ik} C_{i\ell}.    (1.197)

    For k = \ell, this formula just repeats Laplace's expansion (1.196). But for
    k \neq \ell, it is Laplace's expansion for the determinant of a matrix A' that
    is the same as A but with its \ell th column replaced by its kth one. Since
    the matrix A' has two identical columns, its determinant vanishes, which
    explains (1.197) for k \neq \ell.

    The rule (1.197) therefore provides a formula for the inverse of a matrix A
    whose determinant does not vanish. Such matrices are called nonsingular.
    The inverse A^{-1} of an N \times N nonsingular matrix A is the transpose of the
    matrix of cofactors divided by \det A

    (A^{-1})_{\ell i} = \frac{C_{i\ell}}{\det A} \quad \text{or} \quad A^{-1} = \frac{C^T}{\det A}.    (1.198)

    To verify this formula, we use it for A^{-1} in the product A^{-1} A and note that
    by (1.197) the \ell kth entry of the product A^{-1} A is just \delta_{\ell k}

    (A^{-1} A)_{\ell k} = \sum_{i=1}^{N} (A^{-1})_{\ell i} A_{ik} = \sum_{i=1}^{N} \frac{C_{i\ell}}{\det A} A_{ik} = \delta_{\ell k}    (1.199)

    as required.

    Example: Let's apply our formula (1.198) to find the inverse of the
    general 2 \times 2 matrix

    A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.    (1.200)

    We find then

    A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}    (1.201)

    which is the correct inverse.
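
    A brief numpy check of the cofactor formula for a particular 2 x 2 matrix
    (an editorial sketch; the entries are arbitrary):

      import numpy as np

      A = np.array([[1.0, 2.0], [3.0, 4.0]])                   # a, b, c, d = 1, 2, 3, 4
      a, b, c, d = A.ravel()
      A_inv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)   # the 2 x 2 inverse (1.201)
      assert np.allclose(A_inv @ A, np.eye(2))
      assert np.allclose(A_inv, np.linalg.inv(A))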

    The simple example of matrix multiplication

    \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}
    \begin{pmatrix} 1 & x & y \\ 0 & 1 & z \\ 0 & 0 & 1 \end{pmatrix}
    = \begin{pmatrix} a & xa + b & ya + zb + c \\ d & xd + e & yd + ze + f \\ g & xg + h & yg + zh + i \end{pmatrix}    (1.202)

    shows that the operations (1.190) on columns that don't change the value
    of the determinant can be written as matrix multiplication from the right
    by a matrix that has unity on its main diagonal and zeros below.

    Now imagine that A and B are N \times N matrices and consider the 2N \times 2N
    matrix product

    \begin{pmatrix} A & 0 \\ -I & B \end{pmatrix} \begin{pmatrix} I & B \\ 0 & I \end{pmatrix}
    = \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}    (1.203)

    in which I is the N \times N identity matrix, and 0 is the N \times N matrix of all zeros.
    The second matrix on the left-hand side has unity on its main diagonal and
    zeros below, and so it does not change the value of the determinant of the
    matrix to its left, which thus is equal to that of the matrix on the right-hand
    side:

    \det \begin{pmatrix} A & 0 \\ -I & B \end{pmatrix} = \det \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}.    (1.204)

    By using Laplace's expansion (1.184) along the first column to evaluate the
    determinant on the left-hand side (LHS) and Laplace's expansion (1.184)
    along the last row to compute the determinant on the right-hand side (RHS),
    one may derive the general and important rule that the determinant of
    the product of two matrices is the product of the determinants

    \det A \, \det B = \det AB.    (1.205)

    Example: The case in which the matrices A and B are both 2 \times 2 is
    easy to understand. The LHS of Eq. (1.204) gives

    \det \begin{pmatrix} A & 0 \\ -I & B \end{pmatrix}
    = \det \begin{pmatrix} a_{11} & a_{12} & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ -1 & 0 & b_{11} & b_{12} \\ 0 & -1 & b_{21} & b_{22} \end{pmatrix}
    = a_{11} a_{22} \det B - a_{21} a_{12} \det B = \det A \, \det B    (1.206)

    while its RHS comes to

    \det \begin{pmatrix} A & AB \\ -I & 0 \end{pmatrix}
    = \det \begin{pmatrix} a_{11} & a_{12} & (ab)_{11} & (ab)_{12} \\ a_{21} & a_{22} & (ab)_{21} & (ab)_{22} \\ -1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}
    = (-1) C_{42} = (-1)(-1) \det AB = \det AB.    (1.207)

    Often one uses an absolute-value notation to denote a determinant, |A| =


    detA. In this more compact notation, the obvious generalization of the

    product rule is

    |ABC \ldots Z| = |A| |B| \ldots |Z|.    (1.208)

    The product rule (1.208) implies that the determinant of A^{-1} is the inverse
    of |A| since

    1 = |I| = |A A^{-1}| = |A| |A^{-1}|.    (1.209)

    Incidentally, Gauss, Jordan, and others have developed much faster ways

    of computing determinants and matrix inverses than those (1.184 & 1.198)

    due to Laplace. Octave, Matlab, Maple, and Mathematica use these more

    modern techniques, which also are freely available as programs in C and

    fortran from www.netlib.org/lapack.

    Numerical Example: Adding multiples of rows to other rows does not

    change the value of a determinant, and interchanging two rows only changes

    a determinant by a minus sign. So we can use these operations, which leave

    determinants invariant, to make a matrix upper triangular, a form in

    which its determinant is just the product of the factors on its diagonal. For

    instance, to make the matrix

    A = \begin{pmatrix} 1 & 2 & 1 \\ -2 & -6 & 3 \\ 4 & 2 & -5 \end{pmatrix}    (1.210)

    upper triangular, we add twice the first row to the second row

    \begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 4 & 2 & -5 \end{pmatrix}    (1.211)

    and then subtract four times the first row from the third

    \begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & -6 & -9 \end{pmatrix}.    (1.212)

    Next, we subtract three times the second row from the third

    \begin{pmatrix} 1 & 2 & 1 \\ 0 & -2 & 5 \\ 0 & 0 & -24 \end{pmatrix}.    (1.213)

    We now find as the determinant of A the product of its diagonal elements:

    |A| = 1 \cdot (-2) \cdot (-24) = 48.    (1.214)

    The Matlab command is d = det(A).
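
    The same check in Python (an editorial sketch; numpy.linalg.det plays the
    role of Matlab's det):

      import numpy as np

      A = np.array([[1.0, 2.0, 1.0], [-2.0, -6.0, 3.0], [4.0, 2.0, -5.0]])
      print(np.linalg.det(A))   # 48.0, up to rounding error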


    1.17 Systems of Linear Equations

    Suppose we wish to solve the system of linear equations

    \sum_{k=1}^{N} A_{ik} x_k = y_i    (1.215)

    for the N unknowns x_k. In matrix notation, with A an N \times N matrix and
    x and y N-vectors, this system of equations is

    A x = y.    (1.216)

    If the matrix A is non-singular, that is, if \det A \neq 0, then it has an
    inverse A^{-1} given by (1.198), and we may multiply both sides of (1.216) by
    A^{-1} and so find x as

    x = I x = A^{-1} A x = A^{-1} y.    (1.217)

    When A is non-singular, this is the unique solution to (1.215).
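
    In practice one solves (1.216) numerically without forming A^{-1} explicitly.
    A minimal Python illustration (an editorial sketch with an arbitrary 2 x 2
    system; numpy.linalg.solve factors A instead of inverting it):

      import numpy as np

      A = np.array([[3.0, 1.0], [1.0, 2.0]])
      y = np.array([9.0, 8.0])
      x = np.linalg.solve(A, y)            # the unique solution when det A != 0
      print(x, np.allclose(A @ x, y))      # [2. 3.] True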

    When A is singular, det(A) = 0, and so its columns are linearly dependent

    as explained in Sec. 1.16. In this case, the linear dependence of the columns

    of A implies that Az = 0 for some non-zero vector z, and so if x is a

    solution, then Ax = y implies that x + cz for all c is also a solution since

    A(x + cz) = Ax + cAz = Ax = y. So if det(A) = 0, then there may be

    solutions, but there can be no unique solution. Whether equation (1.215)

    has any solutions when det(A) = 0 depends on whether the vector y can be

    expressed as a linear combination of the columns of A. Since these columns

    are linearly dependent, they span a subspace of fewer than N dimensions,

    and so (1.215) has solutions only when the N -vector y lies in that subspace.

    A system of M equations

    \sum_{k=1}^{N} A_{ik} x_k = y_i \quad \text{for} \quad i = 1, 2, \ldots, M    (1.218)

    in N , more than M , unknowns is under-determined. As long as at least

    M of the N columns Aik of the matrix A are linearly independent, such a

    system always has solutions, but they are not unique.

    1.18 Linear Least Squares

    Suppose we are confronted with a system of M equations

    \sum_{k=1}^{N} A_{ik} x_k = y_i \quad \text{for} \quad i = 1, 2, \ldots, M    (1.219)

    in fewer unknowns N < M. This problem is over-determined. In general,
    it has no solution, but it does have an approximate solution due to Carl
    Gauss (1777-1855).

    If the matrix A and the vector y are real, then Gauss's solution is the N
    values x_k that minimize the sum of the squares of the errors

    E = \sum_{i=1}^{M} \left( y_i - \sum_{k=1}^{N} A_{ik} x_k \right)^2.    (1.220)

    The minimizing values x_\ell make the N derivatives of E vanish

    \frac{\partial E}{\partial x_\ell} = 0 = \sum_{i=1}^{M} 2 \left( y_i - \sum_{k=1}^{N} A_{ik} x_k \right) (-A_{i\ell})    (1.221)

    so in matrix notation

    A^T y = A^T A x.    (1.222)

    Since A is real, the matrix A^T A is non-negative (1.41); if it also
    is positive (1.42), then it has an inverse, and our least-squares solution
    is

    x = \left( A^T A \right)^{-1} A^T y.    (1.223)

    If the matrix A and the vector y are complex, and if the matrix A^\dagger A is
    positive, then one may show (problem 16) that minimization of the sum of
    the squares of the absolute values of the errors gives

    x = \left( A^\dagger A \right)^{-1} A^\dagger y.    (1.224)
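
    For real data, the normal-equation solution (1.223) can be compared with a
    library least-squares routine. A short Python sketch (an editorial
    illustration with made-up data for a straight-line fit, M = 4 and N = 2):

      import numpy as np

      A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # columns: intercept, slope
      y = np.array([0.1, 0.9, 2.1, 2.9])
      x = np.linalg.inv(A.T @ A) @ A.T @ y                # x = (A^T A)^{-1} A^T y   (1.223)
      x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)     # the library routine gives the same x
      print(x, np.allclose(x, x_lstsq))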

    Example from biophysics: If the wavelength of visible light were a
    nanometer, microscopes would yield much sharper images. Each photon
    from a (single-molecule) fluorophore entering the lens of a microscope would
    follow ray optics and be focused within a tiny circle of about a nanometer
    on a detector. Instead, a photon that should arrive at x = (x_1, x_2) arrives
    at y_i = (y_{1i}, y_{2i}) according to an approximately gaussian probability
    distribution

    P(y_i) = c\, e^{-(y_i - x)^2 / (2\sigma^2)}    (1.225)

    in which c is a normalization constant and \sigma is about 150 nm. What to do?
    Keith Lidke and his merry band of biophysicists collect about N = 500

    Figure 1.1 Conventional (left, fuzzy) and STORM (right, sharp) images
    of microtubules. The tubulin is labeled with a fluorescent anti-tubulin
    antibody. The white rectangles are 1 micron in length. Images courtesy of
    Keith Lidke.

    points y_i and determine the point x that maximizes the joint probability of
    the ensemble of image points

    P = \prod_{i=1}^{N} P(y_i) = c^N \prod_{i=1}^{N} e^{-(y_i - x)^2 / (2\sigma^2)}
      = c^N \exp\left[ -\sum_{i=1}^{N} (y_i - x)^2 / (2\sigma^2) \right]    (1.226)

    by solving for k = 1 and 2 the equations

    \frac{\partial P}{\partial x_k} = 0 = P \frac{\partial}{\partial x_k} \left[ -\sum_{i=1}^{N} (y_i - x)^2 / (2\sigma^2) \right]
      = \frac{P}{\sigma^2} \sum_{i=1}^{N} (y_{ik} - x_k).    (1.227)

    Thus this maximum likelihood estimate of the image point x is the
    average of the observed points y_i

    x = \frac{1}{N} \sum_{i=1}^{N} y_i.    (1.228)

    Their stochastic optical reconstruction microscopy (STORM) is more

    complicated because they also account for the finite accuracy of their detec-

    tor.

    Microtubules are long hollow tubes made of the protein tubulin. They

    are 25 nm in diameter and typically have one end attached to a centrosome.

    Together with actin and intermediate filaments, they form the cytoskeleton

    of a eukaryotic cell. Fig. 1.1 shows conventional (left, fuzzy) and STORM

    (right, sharp) images of microtubules. The fluorophore attaches at a random
    point on an anti-tubulin antibody of finite size, which binds to the tubulin
    of a microtubule. This spatial uncertainty and the motion of the molecules
    of living cells limit the improvement in resolution to a factor of 10 to 20.

    1.19 The Adjoint of an Operator

The adjoint A^† of a linear operator A is defined by

    (g, A f) = (A^† g, f) = (f, A^† g)^*.  (1.229)

Equivalent expressions in Dirac notation are

    ⟨g|Af⟩ = ⟨g|A|f⟩ = ⟨A^†g|f⟩ = ⟨f|A^†g⟩^* = ⟨f|A^†|g⟩^*.  (1.230)

So if the vectors {e_i} are orthonormal and complete in a space S, then with f = e_j and g = e_i, the definition (1.229) or (1.230) of the adjoint A^† of a linear operator A implies

    ⟨e_i|A^†|e_j⟩ = ⟨e_j|A|e_i⟩^*  (1.231)

or

    (A^†)_{ij} = (A_{ji})^* = A^*_{ji}  (1.232)

in agreement with our definition (1.31) of the adjoint of a matrix as the transpose of its complex conjugate

    A^† = (A^*)^T.  (1.233)

Since both (A^*)^* = A and (A^T)^T = A, it follows that

    (A^†)^† = ((A^*)^T)^{*T} = A  (1.234)

so the adjoint of an adjoint is the original operator.

By applying this rule (1.234) to the definition (1.229) of the adjoint, we find the related rule

    (g, A^† f) = ((A^†)^† g, f) = (A g, f).  (1.235)
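In a finite-dimensional space these rules are statements about the conjugate transpose, and one can check them numerically. A Python sketch with an arbitrary random complex matrix:

import numpy as np

rng = np.random.default_rng(2)

A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
f = rng.normal(size=4) + 1j * rng.normal(size=4)
g = rng.normal(size=4) + 1j * rng.normal(size=4)

A_dag = A.conj().T                   # adjoint = transpose of the complex conjugate (1.233)
inner = lambda u, v: np.vdot(u, v)   # (u, v) = sum_k u_k^* v_k

# (g, A f) = (A^† g, f) = (f, A^† g)^*   ... definition (1.229)
assert np.isclose(inner(g, A @ f), inner(A_dag @ g, f))
assert np.isclose(inner(g, A @ f), inner(f, A_dag @ g).conjugate())

# (A^†)^† = A   ... rule (1.234)
assert np.allclose(A_dag.conj().T, A)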

    1.20 Self-Adjoint or Hermitian Linear Operators

    An operator A that is equal to its adjoint

    A^† = A  (1.236)


is self adjoint or hermitian. In view of definition (1.229), a self-adjoint linear operator A satisfies

    (g, A f) = (A g, f) = (f, A g)^*  (1.237)

or equivalently

    ⟨g|A|f⟩ = ⟨Ag|f⟩ = ⟨f|Ag⟩^* = ⟨f|A|g⟩^*.  (1.238)

By Eq. (1.232), a hermitian operator A that acts on a finite-dimensional vector space is represented in an orthonormal basis by a matrix that is equal to the transpose of its complex conjugate

    A_{ij} = (A^†)_{ij} = (A_{ji})^* = A^*_{ji}.  (1.239)

    Such matrices are said to be hermitian. Conversely, a linear operator that

    is represented by a hermitian matrix in an orthonormal basis is self adjoint

    (problem 17).

    A matrix Aij that is real and symmetric or imaginary and anti-

    symmetric is hermitian. But a self-adjoint linear operator A that is rep-

    resented by a matrix Aij that is real and symmetric (or imaginary and

    anti-symmetric) in one orthonormal basis will not in general be represented

    by a matrix that is real and symmetric (or imaginary and anti-symmetric)

    in a different orthonormal basis, but it will be represented by a hermitian

    matrix in every orthonormal basis.

As we'll see in section 1.30, hermitian matrices have real eigenvalues

    and complete sets of orthonormal eigenvectors. Hermitian operators and

    matrices represent physical variables in quantum mechanics.
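The basis dependence just described is easy to see numerically. In the Python sketch below, an arbitrary random unitary matrix U stands in for the change of orthonormal basis; the real, symmetric matrix A generally is no longer real and symmetric in the new basis, but it is still hermitian.

import numpy as np

rng = np.random.default_rng(3)

# A real, symmetric (hence hermitian) matrix.
S = rng.normal(size=(3, 3))
A = S + S.T

# A random unitary matrix from the QR decomposition of a complex matrix.
Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
U, _ = np.linalg.qr(Z)

# The same operator represented in the new orthonormal basis.
A_new = U.conj().T @ A @ U

print(np.allclose(A_new.imag, 0))            # False in general: no longer real and symmetric
print(np.allclose(A_new, A_new.conj().T))    # True: still hermitian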

    1.21 Real, Symmetric Linear Operators

    In quantum mechanics, we usually consider complex vector spaces, that is,

spaces in which the vectors |f⟩ are complex linear combinations

    |f⟩ = ∑_{i=1}^N z_i |i⟩  (1.240)

of complex orthonormal basis vectors |i⟩.

But real vector spaces also are of interest. A real vector space is a vector space in which the vectors |f⟩ are real linear combinations

    |f⟩ = ∑_{n=1}^N x_n |n⟩  (1.241)


of real orthonormal basis vectors, x_n = x_n^* and |n⟩ = |n⟩^*.

A real linear operator A on a real vector space

    A = ∑_{n,m=1}^N |n⟩⟨n|A|m⟩⟨m| = ∑_{n,m=1}^N |n⟩ A_{nm} ⟨m|  (1.242)

is represented by a real matrix A_{nm} = A^*_{nm}. A real linear operator A that is self adjoint on a real vector space satisfies the condition (1.237) of hermiticity but with the understanding that complex conjugation has no effect

    (g, A f) = (A g, f) = (f, A g)^* = (f, A g).  (1.243)

Thus, its matrix elements are symmetric: ⟨g|A|f⟩ = ⟨f|A|g⟩. Since A is hermitian as well as real, the matrix A_{nm} that represents it (in a real basis) is real and hermitian, and so is symmetric

    A_{nm} = A^*_{mn} = A_{mn}.  (1.244)

    1.22 Unitary Operators

    A unitary operator U is one whose adjoint is its inverse

    U^† U = U U^† = I.  (1.245)

In general, the unitary operators we'll consider also are linear, that is

    U (z|ψ⟩ + w|φ⟩) = z U|ψ⟩ + w U|φ⟩  (1.246)

for all states or vectors |ψ⟩ and |φ⟩ and all complex numbers z and w.

In standard notation, U^†U = I implies that for any vectors f and g

    (g, f) = (g, U^†U f) = (Ug, Uf)  (1.247)

as well as

    (g, f) = (g, U U^† f) = (U^†g, U^†f).  (1.248)

In Dirac notation, these equations are

    ⟨g|f⟩ = ⟨g|U^†U|f⟩ = ⟨Ug|U|f⟩ = ⟨Ug|Uf⟩  (1.249)

and

    ⟨g|f⟩ = ⟨g|U U^†|f⟩ = ⟨U^†g|U^†|f⟩ = ⟨U^†g|U^†f⟩.  (1.250)

Suppose the states {|n⟩} form an orthonormal basis for a given vector

space. Then if U is any unitary operator, the relations (1.247–1.250) show


that the states {U|n⟩} also form an orthonormal basis. The orthonormality of the image states {U|n⟩} follows from that of the basis states {|n⟩}

    δ_{nm} = ⟨n|m⟩ = ⟨Un|Um⟩ = ⟨n|U^†U|m⟩.  (1.251)

The completeness relation for the basis states {|n⟩} is that the sum of their dyadics is the identity operator

    ∑_n |n⟩⟨n| = I  (1.252)

and it implies that the image states {U|n⟩} also are complete

    ∑_n U|n⟩⟨n|U^† = U I U^† = U U^† = I.  (1.253)

So a unitary matrix U maps an orthonormal basis into another orthonormal basis. In fact, any linear map from one orthonormal basis {|n⟩} to another {|n′⟩} must be unitary. Such an operator will be of the form

    U = ∑_{n=1}^N |n′⟩⟨n|  (1.254)

with

    ⟨n|m⟩ = δ_{nm}  and  ⟨n′|m′⟩ = δ_{nm}.  (1.255)

The unitarity of such a sum is evident:

    U^†U = ∑_{n=1}^N |n⟩⟨n′| ∑_{m=1}^N |m′⟩⟨m| = ∑_{n=1}^N ∑_{m=1}^N |n⟩ δ_{nm} ⟨m| = ∑_{n=1}^N |n⟩⟨n| = I.  (1.256)

The product U U^† similarly collapses to unity.

Unitary matrices have unimodular determinants. To show this, we use the definition (1.245), that is, U^†U = I, and the product rule for determinants (1.208) to write

    1 = |I| = |U^†U| = |U^†| |U| = |(U^*)^T| |U| = |U|^* |U|.  (1.257)

A unitary matrix that is real is said to be orthogonal. An orthogonal matrix O satisfies

    O O^T = O^T O = I.  (1.258)
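A short numerical illustration of (1.245), (1.251), and (1.257): the QR decomposition of an arbitrary complex matrix supplies a unitary matrix U, which one can test with numpy.

import numpy as np

rng = np.random.default_rng(4)

Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(Z)                      # U is unitary

I = np.eye(4)
assert np.allclose(U.conj().T @ U, I)       # U^† U = I   (1.245)
assert np.allclose(U @ U.conj().T, I)       # U U^† = I

# U maps an orthonormal basis into an orthonormal basis (1.251):
basis = I                                   # the columns e_1, ..., e_4
images = U @ basis                          # the columns U e_k
assert np.allclose(images.conj().T @ images, I)

# Unimodular determinant (1.257):
assert np.isclose(abs(np.linalg.det(U)), 1.0)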


    1.23 Antiunitary, Antilinear Operators

Certain maps on states ψ → ψ′, such as those involving time reversal, are implemented by operators K that are antilinear

    K (zψ + wφ) = K (z|ψ⟩ + w|φ⟩) = z^* K|ψ⟩ + w^* K|φ⟩ = z^* Kψ + w^* Kφ  (1.259)

and antiunitary

    (Kψ, Kφ) = ⟨Kψ|Kφ⟩ = (ψ, φ)^* = ⟨ψ|φ⟩^* = ⟨φ|ψ⟩ = (φ, ψ).  (1.260)

Don't feel bad if you find such operators spooky. I do too.
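The simplest antilinear, antiunitary map is complex conjugation of the components in a fixed basis; a brief Python check of (1.259) and (1.260) for that map (the vectors and coefficients are arbitrary):

import numpy as np

rng = np.random.default_rng(5)

K = np.conjugate            # K acts by conjugating the components in this basis

psi = rng.normal(size=3) + 1j * rng.normal(size=3)
phi = rng.normal(size=3) + 1j * rng.normal(size=3)
z, w = 1.0 + 2.0j, -0.5j

# Antilinearity (1.259): K(z psi + w phi) = z^* K psi + w^* K phi
assert np.allclose(K(z * psi + w * phi),
                   z.conjugate() * K(psi) + w.conjugate() * K(phi))

# Antiunitarity (1.260): <K psi | K phi> = <phi | psi>
assert np.isclose(np.vdot(K(psi), K(phi)), np.vdot(phi, psi))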

    1.24 Symmetry in Quantum Mechanics

In quantum mechanics, a symmetry is a map of states f → f′, g → g′ that preserves the absolute values of their inner products

    |⟨f′|g′⟩|^2 = |⟨f|g⟩|^2  (1.261)

and so their predicted probabilities; the primed and unprimed vectors give the same physical predictions.

Eugene Wigner (1902–1995) has shown that every symmetry in quantum

    mechanics can be represented either by an operator U that is linear and

    unitary or by an operator K that is anti-linear and anti-unitary. The anti-

    linear, anti-unitary case seems to occur only when the symmetry involves

    time-reversal; most symmetries are represented by operators U that are lin-

    ear and unitary. So unitary operators are of great importance in quantum

    mechanics. They are used to represent rotations, translations, Lorentz trans-

formations, and internal-symmetry transformations: just about all symmetries

    not involving time-reversal.

    1.25 Lagrange Multipliers

The maxima and minima of a function f(x) of several variables x_1, x_2, . . . , x_n are among the points at which its gradient vanishes

    ∇f(x) = 0.  (1.262)

These are the stationary points of f.

Example 1.2 (Minimum) For instance, if f(x) = x_1^2 + 2x_2^2 + 3x_3^2, then its minimum is at

    ∇f(x) = (2x_1, 4x_2, 6x_3) = 0  (1.263)


that is, at x_1 = x_2 = x_3 = 0.

    But how do we find the extrema of f(x) if x must satisfy k constraints

c_1(x) = 0, c_2(x) = 0, . . . , c_k(x) = 0? We use Lagrange multipliers (Joseph-Louis Lagrange, 1736–1813).

In the case of one constraint c(x) = 0, we no longer expect the gradient ∇f(x) to vanish, but its projection ∇f(x)·dx must vanish in those directions dx that preserve the constraint. So ∇f(x)·dx = 0 for all dx that make ∇c(x)·dx = 0. This means that ∇f(x) and ∇c(x) must be parallel. Thus, the extrema of f(x) subject to the constraint c(x) = 0 satisfy the two equations

    ∇f(x) = λ ∇c(x)  and  c(x) = 0.  (1.264)

These equations define the extrema of the unconstrained function

    L(x, λ) = f(x) - λ c(x)  (1.265)

of the n + 1 variables x_1, . . . , x_n, λ:

    ∇L(x, λ) = ∇f(x) - λ ∇c(x) = 0  and  ∂L(x, λ)/∂λ = -c(x) = 0.  (1.266)

The extra variable λ is a Lagrange multiplier.

In the case of k constraints c_1(x) = 0, . . . , c_k(x) = 0, the projection ∇f·dx must vanish in those directions dx that preserve all the constraints. So ∇f(x)·dx = 0 for all dx that make all ∇c_j(x)·dx = 0 for j = 1, . . . , k. The gradient ∇f will satisfy this requirement if it's a linear combination

    ∇f = λ_1 ∇c_1 + · · · + λ_k ∇c_k  (1.267)

of the k gradients because then ∇f·dx will vanish if ∇c_j·dx = 0 for j = 1, . . . , k. The extrema also must satisfy the constraints

    c_1(x) = 0, . . . , c_k(x) = 0.  (1.268)

Equations (1.267 & 1.268) define the extrema of the unconstrained function

    L(x, λ) = f(x) - λ_1 c_1(x) - · · · - λ_k c_k(x)  (1.269)

of the n + k variables x and λ:

    ∇L(x, λ) = ∇f(x) - λ_1 ∇c_1(x) - · · · - λ_k ∇c_k(x) = 0  (1.270)

and

    ∂L(x, λ)/∂λ_j = -c_j(x) = 0,  j = 1, . . . , k.  (1.271)


    Example 1.3 (Constrained Extrema and Eigenvectors) Suppose we want

to find the extrema of a real, symmetric quadratic form f(x) = x^T A x subject to the constraint c(x) = x·x - 1 = 0, which says that the vector x is of unit length.

We form the function

    L(x, λ) = x^T A x - λ (x·x - 1)  (1.272)

and since the matrix A is real and symmetric, we find its unconstrained extrema as

    ∇L(x, λ) = 2Ax - 2λx = 0  and  x·x = 1.  (1.273)

The extrema of f(x) = x^T A x subject to the constraint c(x) = x·x - 1 = 0 are the normalized eigenvectors

    Ax = λx  and  x·x = 1  (1.274)

of the real, symmetric matrix A.
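One can confirm Example 1.3 numerically by maximizing x^T A x on the unit sphere with a constrained optimizer and comparing with the eigenvectors that numpy returns. The sketch below uses scipy's SLSQP minimizer on an arbitrary random symmetric matrix; generically the optimizer lands on the eigenvector of the largest eigenvalue.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
S = rng.normal(size=(3, 3))
A = S + S.T                                           # real, symmetric

f = lambda x: -x @ A @ x                              # minimize -x^T A x, i.e. maximize x^T A x
con = {"type": "eq", "fun": lambda x: x @ x - 1.0}    # constraint c(x) = x.x - 1 = 0
res = minimize(f, rng.normal(size=3), constraints=[con])

vals, vecs = np.linalg.eigh(A)
x_opt = res.x / np.linalg.norm(res.x)
print(-res.fun, vals[-1])              # constrained maximum should match the largest eigenvalue
print(abs(x_opt @ vecs[:, -1]))        # ~1: the optimizer found the corresponding eigenvector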

    1.26 Eigenvectors and Invariant Subspaces

Let A be a linear operator that maps vectors |v⟩ in a vector space S into vectors in the same space. If T ⊂ S is a subspace of S, and if the vector A|u⟩ is in T whenever |u⟩ is in T, then T is an invariant subspace of S. The whole space S is a trivial invariant subspace of S, as is the null set ∅.

If T ⊂ S is a one-dimensional invariant subspace of S, then A maps each vector |u⟩ ∈ T into another vector λ|u⟩ ∈ T, that is

    A|u⟩ = λ|u⟩.  (1.275)

In this case, we say that |u⟩ is an eigenvector of A with eigenvalue λ. (The German adjective eigen means own, proper, singular.)

Example: The matrix equation

    (  cos θ   sin θ ) ( 1  )             ( 1  )
    ( -sin θ   cos θ ) ( ±i )  =  e^{±iθ} ( ±i )      (1.276)

tells us that the eigenvectors of this 2×2 orthogonal matrix are the 2-tuples (1, ±i) with eigenvalues e^{±iθ}.

    Problem 18 is to show that the eigenvalues of a unitary (and hence of

an orthogonal) matrix are unimodular, |λ| = 1.


    Example: Let us consider the eigenvector equation

    ∑_{k=1}^N A_{ik} V_k = λ V_i  (1.277)

for a matrix A that is anti-symmetric, A_{ik} = -A_{ki}. The anti-symmetry of A implies that

    ∑_{i,k=1}^N V_i A_{ik} V_k = 0.  (1.278)

Thus the last two relations imply that

    0 = ∑_{i,k=1}^N V_i A_{ik} V_k = λ ∑_{i=1}^N V_i^2.  (1.279)

Thus either the eigenvalue λ or the dot-product of the eigenvector with itself

    vanishes.

    Problem 19 is to show that the sum of the eigenvalues of an anti-symmetric

    matrix vanishes.
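A small numerical illustration of these statements for an arbitrary real antisymmetric matrix: its eigenvalues are purely imaginary and sum to zero, and for every eigenvector with λ ≠ 0 the sum of the squares of its components vanishes.

import numpy as np

rng = np.random.default_rng(7)
S = rng.normal(size=(4, 4))
A = S - S.T                          # real, antisymmetric: A_ik = -A_ki

vals, vecs = np.linalg.eig(A)
print(np.allclose(vals.real, 0))     # the eigenvalues are purely imaginary
print(np.isclose(vals.sum(), 0))     # ... and they sum to zero (problem 19)

# For lambda != 0, the "dot product" V.V without complex conjugation vanishes (1.279).
for lam, V in zip(vals, vecs.T):
    if abs(lam) > 1e-12:
        assert np.isclose(V @ V, 0)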

    1.27 Eigenvalues of a Square Matrix

Let A be an N×N matrix with complex entries A_{ik}. A non-zero N-dimensional vector V with entries V_k is an eigenvector of the matrix A with eigenvalue λ if

    A|V⟩ = λ|V⟩,  that is,  A V = λ V  or  ∑_{k=1}^N A_{ik} V_k = λ V_i.  (1.280)

Every N×N matrix A has N eigenvectors V^(ℓ) and eigenvalues λ_ℓ

    A V^(ℓ) = λ_ℓ V^(ℓ)  (1.281)

for ℓ = 1 . . . N. To see why, we write the top equation (1.280) as

    ∑_{k=1}^N (A_{ik} - λ δ_{ik}) V_k = 0  (1.282)

or in matrix notation as

    (A - λI) V = 0  (1.283)

in which I is the N×N matrix with entries I_{ik} = δ_{ik}. These equivalent


equations (1.282 & 1.283) say that the columns of the matrix A - λI, considered as vectors, are linearly dependent, as defined in section 1.8. We saw in section 1.16 that the columns of a matrix A - λI are linearly dependent if and only if the determinant |A - λI| vanishes. Thus a non-zero solution of the eigenvalue equation (1.280) exists if and only if the determinant

    det (A - λI) = |A - λI| = 0  (1.284)

vanishes. This requirement that the determinant of A - λI vanish is called the characteristic equation. For an N×N matrix A, it is a polynomial equation of the Nth degree in the unknown eigenvalue λ

    |A - λI| ≡ P(λ, A) = |A| + · · · + (-1)^{N-1} λ^{N-1} Tr A + (-1)^N λ^N = ∑_{k=0}^N p_k λ^k = 0  (1.285)

in which p_0 = |A|, p_{N-1} = (-1)^{N-1} Tr A, and p_N = (-1)^N. (All the p_k's are basis independent.) By the fundamental theorem of algebra, proved in Sec. 5.9, the characteristic equation always has N roots or solutions λ_ℓ lying somewhere in the complex plane. Thus, the characteristic polynomial has the factored form

    P(λ, A) = (λ_1 - λ)(λ_2 - λ) . . . (λ_N - λ).  (1.286)

For every root λ_ℓ, there is a non-zero eigenvector V^(ℓ) whose components V^(ℓ)_k are the coefficients that make the N vectors A_{ik} - λ_ℓ δ_{ik} that are the columns of the matrix A - λ_ℓ I sum to zero in (1.282). Thus, every N×N matrix has N eigenvalues λ_ℓ and N eigenvectors V^(ℓ).

Setting λ = 0 in the factored form (1.286) of P(λ, A) and in the characteristic equation (1.285), we see that the determinant of every N×N matrix is the product of its N eigenvalues

    P(0, A) = |A| = p_0 = λ_1 λ_2 . . . λ_N.  (1.287)

These N roots usually are all different, and when they are, the eigenvectors V^(ℓ) are linearly independent. This result is trivially true for N = 1. Let's assume its validity for N - 1 and deduce it for the case of N eigenvectors. If it were false for N eigenvectors, then there would be N numbers c_ℓ, not all zero, such that

    ∑_{ℓ=1}^N c_ℓ V^(ℓ) = 0.  (1.288)


    We now multiply this equation from the left by the linear operator A and

    use the eigenvalue equation (1.281)

    A ∑_{ℓ=1}^N c_ℓ V^(ℓ) = ∑_{ℓ=1}^N c_ℓ A V^(ℓ) = ∑_{ℓ=1}^N c_ℓ λ_ℓ V^(ℓ) = 0.  (1.289)

On the other hand, the product of equation (1.288) multiplied by λ_N is

    ∑_{ℓ=1}^N c_ℓ λ_N V^(ℓ) = 0.  (1.290)

When we subtract (1.290) from (1.289), the terms with ℓ = N cancel, leaving

    ∑_{ℓ=1}^{N-1} c_ℓ (λ_ℓ - λ_N) V^(ℓ) = 0  (1.291)

in which all the factors (λ_ℓ - λ_N) are different from zero since by assumption all the eigenvalues are different. But this last equation says that N - 1 eigenvectors with different eigenvalues are linearly dependent, which contradicts our assumption that the result holds for N - 1 eigenvectors. This contradiction tells us that if the N eigenvectors of an N×N square matrix have different eigenvalues, then they are linearly independent.

An eigenvalue λ_ℓ that is a single root of the characteristic equation (1.285) is associated with a single eigenvector; it is called a simple eigenvalue. An eigenvalue λ_ℓ that is an nth root of the characteristic equation is associated with n eigenvectors; it is said to be an n-fold degenerate eigenvalue or to have algebraic multiplicity n. Its geometric multiplicity is the number n′ ≤ n of linearly independent eigenvectors with eigenvalue λ_ℓ. A

Example: The 2×2 matrix

    ( 0  1 )
    ( 0  0 )      (1.292)

has only one linearly independent eigenvector, (1, 0)^T, and so is defective.

Suppose A is an N×N matrix that is not defective. We may use its N linearly independent eigenvectors V^(ℓ) = |ℓ⟩ to define the columns of an N×N matrix S as

    S_{kℓ} = ⟨k, 0|ℓ⟩  (1.293)

in which the vectors |k, 0⟩ are the basis in which A_{ik} = ⟨i, 0|A|k, 0⟩. The


inner product of the eigenvalue equation A V^(ℓ) = λ_ℓ V^(ℓ) with the bra ⟨i, 0| is

    ⟨i, 0|A|ℓ⟩ = ⟨i, 0|A ∑_{k=1}^N |k, 0⟩⟨k, 0|ℓ⟩ = ∑_{k=1}^N A_{ik} S_{kℓ} = λ_ℓ S_{iℓ}.  (1.294)

Since the columns of S are linearly independent, the determinant of S does not vanish, the matrix S is nonsingular, and so its inverse S^{-1} is well-defined by (1.198). It follows that

    ∑_{i,k=1}^N (S^{-1})_{ni} A_{ik} S_{kℓ} = ∑_{i=1}^N λ_ℓ (S^{-1})_{ni} S_{iℓ} = λ_ℓ δ_{nℓ}  (1.295)

or in matrix notation

    S^{-1} A S = A^(d)  (1.296)

in which A^(d) is the diagonal form of the matrix A with its eigenvalues λ_ℓ arranged along its main diagonal and zeros elsewhere. Equation (1.296) is a similarity transformation. Any nondefective square matrix can be diagonalized by a similarity transformation

    A = S A^(d) S^{-1}.  (1.297)

By using the product rule (1.208), we see that the determinant of any nondefective square matrix is the product of its eigenvalues

    |A| = |S A^(d) S^{-1}| = |S| |A^(d)| |S^{-1}| = |S S^{-1}| |A^(d)| = |A^(d)| = ∏_{ℓ=1}^N λ_ℓ  (1.298)

which is a special case of (1.287).
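The sketch below carries out this diagonalization with numpy for an arbitrary random matrix (almost surely non-defective), checking (1.296), (1.297), and (1.298).

import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(4, 4))                 # generic, hence almost surely non-defective

vals, S = np.linalg.eig(A)                  # the columns of S are the eigenvectors V^(l)
A_d = np.linalg.inv(S) @ A @ S              # S^{-1} A S   (1.296)

assert np.allclose(A_d, np.diag(vals))      # diagonal, with the eigenvalues on the diagonal
assert np.allclose(A, S @ np.diag(vals) @ np.linalg.inv(S))   # A = S A^(d) S^{-1}   (1.297)
assert np.isclose(np.linalg.det(A), np.prod(vals))            # |A| = product of eigenvalues (1.298)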

    1.28 A Matrix Obeys Its Characteristic Equation

    Every square matrix obeys its characteristic equation (1.285). That is, the

    characteristic equation

    P(λ, A) = |A - λI| = ∑_{k=0}^N p_k λ^k = 0  (1.299)

remains true when the matrix A replaces the unknown variable λ

    P(A, A) = ∑_{k=0}^N p_k A^k = 0.  (1.300)


    To see why, we recall the formula (1.198) for the inverse of the matrix

A - λI

    (A - λI)^{-1} = C(λ, A)^T / |A - λI|  (1.301)

in which C(λ, A)^T is the transpose of the matrix of cofactors of the matrix A - λI. Since the determinant |A - λI| is the characteristic polynomial P(λ, A), we have, rearranging,

    (A - λI) C(λ, A)^T = P(λ, A) I.  (1.302)

The transpose of the matrix of cofactors of the matrix A - λI is a polynomial in λ with matrix coefficients

    C(λ, A)^T = C_0 + C_1 λ + · · · + C_{N-1} λ^{N-1}.  (1.303)

The LHS of equation (1.302) is then

    (A - λI) C(λ, A)^T = A C_0 + (A C_1 - C_0) λ + (A C_2 - C_1) λ^2 + . . .
                         + (A C_{N-1} - C_{N-2}) λ^{N-1} - C_{N-1} λ^N.  (1.304)

Equating equal powers of λ on both sides of (1.302) and using (1.299) and (1.304), we have

    A C_0 = p_0 I
    A C_1 - C_0 = p_1 I
    A C_2 - C_1 = p_2 I
    . . .                                   (1.305)
    A C_{N-1} - C_{N-2} = p_{N-1} I
    -C_{N-1} = p_N I.

    We now multiply on the left the first of these equations by I, the second

by A, the third by A^2, . . . , and the last by A^N and then add the resulting

    equations. All the terms on the left-hand sides cancel, while the sum of

those on the right gives P(A, A). Thus the matrix A obeys its characteristic

    equation

    0 = ∑_{k=0}^N p_k A^k = |A| I + p_1 A + · · · + (-1)^{N-1}(Tr A) A^{N-1} + (-1)^N A^N  (1.306)

a result known as the Cayley-Hamilton theorem (Arthur Cayley, 1821–1895, and William Hamilton, 1805–1865). This derivation is due to Israel Gelfand (1913–2009) (Gelfand, 1961, pp. 89–90).


Because every N×N matrix A obeys its characteristic equation, its Nth power A^N can be expressed as a linear combination of its lesser powers

    A^N = (-1)^{N-1} [ |A| I + p_1 A + p_2 A^2 + · · · + (-1)^{N-1}(Tr A) A^{N-1} ].  (1.307)

Thus the square A^2 of every 2×2 matrix is given by

    A^2 = (Tr A) A - |A| I.  (1.308)
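A quick numerical check of the Cayley-Hamilton theorem and of (1.308) in Python; np.poly returns the coefficients of the monic polynomial det(λI - A), highest power first, which A annihilates just as it annihilates P(λ, A) (the two differ at most by an overall sign).

import numpy as np

rng = np.random.default_rng(9)
N = 3
A = rng.normal(size=(N, N))

# Coefficients of det(lambda I - A), highest power of lambda first.
c = np.poly(A)

# Evaluate the characteristic polynomial at the matrix A itself.
P_of_A = sum(ck * np.linalg.matrix_power(A, N - k) for k, ck in enumerate(c))
print(np.allclose(P_of_A, 0))        # True: A obeys its characteristic equation

# For N = 2, equation (1.308): A^2 = (Tr A) A - |A| I
B = rng.normal(size=(2, 2))
assert np.allclose(B @ B, np.trace(B) * B - np.linalg.det(B) * np.eye(2))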

Example 1.4 (Spin-one-half rotation matrix) If θ is a real 3-vector and σ is the 3-vector of Pauli matrices (1.35), then the square of the traceless 2×2 matrix A = θ·σ is

    (θ·σ)^2 = - | θ_3          θ_1 - iθ_2 |
                | θ_1 + iθ_2      -θ_3    | I  =  θ^2 I      (1.309)

in which θ^2 = θ·θ. One may use this identity to show (problem 20) that

    exp(-i θ·σ/2) = cos(θ/2) I - i θ̂·σ sin(θ/2)  (1.310)

in which θ̂ = θ/θ is a unit 3-vector. This matrix represents a right-handed rotation of θ radians about the axis θ̂ for a spin-one-half object.
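One may confirm the identity (1.310) with scipy's matrix exponential; the rotation vector below is an arbitrary choice.

import numpy as np
from scipy.linalg import expm

# Pauli matrices.
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

theta = 0.7 * np.array([1.0, -2.0, 0.5])   # arbitrary rotation vector
t = np.linalg.norm(theta)                  # rotation angle theta = |theta|
n = theta / t                              # unit axis
n_dot_sigma = n[0] * s1 + n[1] * s2 + n[2] * s3

lhs = expm(-0.5j * t * n_dot_sigma)        # exp(-i theta.sigma / 2)
rhs = np.cos(t / 2) * np.eye(2) - 1j * np.sin(t / 2) * n_dot_sigma   # right side of (1.310)
assert np.allclose(lhs, rhs)

# (theta.sigma)^2 = theta^2 I   ... (1.309)
assert np.allclose((t * n_dot_sigma) @ (t * n_dot_sigma), t**2 * np.eye(2))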

    1.29 Functions of Matrices

What sense can we make of a function f of an N×N matrix A? And how would we compute it? One way is to use the characteristic equation (1.307) to express every power of A in terms of I, A, . . . , A^{N-1} and the coefficients p_0 = |A|, p_1, p_2, . . . , p_{N-2}, and p_{N-1} = (-1)^{N-1} Tr A. Then if f(x) is a polynomial or a function with a convergent power series

    f(x) = ∑_{k=0}^∞ c_k x^k  (1.311)

in principle we may express f(A) in terms of N functions f_k(p) of the coefficients p ≡ (p_0, . . . , p_{N-1}) as

    f(A) = ∑_{k=0}^{N-1} f_k(p) A^k.  (1.312)

The identity (1.310) for exp(-i θ·σ/2) is an example of this technique for N = 2; the technique can become challenging for N > 3.

Example: In problem 21, one finds the characteristic equation (1.306) for the 3×3 matrix -i θ·J in which the generators are (J_k)_{ij} = i ε_{ikj} and ε_{ijk} is totally antisymmetric with ε_{123} = 1. These generators satisfy the


commutation relations [J_i, J_j] = i ε_{ijk} J_k in which sums over repeated indices from 1 to 3 are understood. In problem 22, one uses this characteristic equation to show that the 3×3 real orthogonal matrix exp(-i θ·J), which represents a right-handed rotation by θ radians about the axis θ̂, is

    exp(-i θ·J) = cos θ I - i θ̂·J sin θ + (1 - cos θ) θ̂ (θ̂)^T  (1.313)

or in terms of indices

    exp(-i θ·J)_{ij} = δ_{ij} cos θ - sin θ ε_{ijk} θ̂_k + (1 - cos θ) θ̂_i θ̂_j.  (1.314)

Direct use of the characteristic equation can become unwieldy for larger

    values of N . Fortunately, another trick is available if A is a non-defective

    square matrix, and if the power series (1.311) for f(x) converges. For then

    A is related to its diagonal form A(d) by a similarity transformation (1.297),

    and we may define f(A) as

    f(A) = S f(A^(d)) S^{-1}  (1.315)

in which f(A^(d)) is the diagonal matrix with entries f(a_ℓ)

    f(A^(d)) = | f(a_1)    0      . . .    0     |
               |   0     f(a_2)   . . .    0     |
               |  ...      ...     ...    ...    |
               |   0       0      . . .  f(a_N)  |      (1.316)

the a_ℓ's being the eigenvalues of the matrix A.

This definition makes sense because we'd expect f(A) to be

    f(A) = ∑_{n=0}^∞ c_n A^n = ∑_{n=0}^∞ c_n (S A^(d) S^{-1})^n.  (1.317)

But since S^{-1} S = I, we have (S A^(d) S^{-1})^n = S (A^(d))^n S^{-1} and so

    f(A) = S [ ∑_{n=0}^∞ c_n (A^(d))^n ] S^{-1} = S f(A^(d)) S^{-1}  (1.318)

    which is (1.315).
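For f = exp, the recipe (1.315) is easy to test against scipy's general-purpose matrix exponential; the random matrix below is an arbitrary (almost surely non-defective) example.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(10)
A = rng.normal(size=(4, 4))                 # generic, hence almost surely non-defective

vals, S = np.linalg.eig(A)                  # A = S A^(d) S^{-1}
f_of_A = S @ np.diag(np.exp(vals)) @ np.linalg.inv(S)   # f(A) = S f(A^(d)) S^{-1}   (1.315)

assert np.allclose(f_of_A, expm(A))         # agrees with the power-series definition of exp(A)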

    Example: In quantum mechanics, the time-evolution operator is taken

to be the exponential exp(-iHt/ℏ) in which H = H^† is a hermitian linear operator, the hamiltonian, named after William Rowan Hamilton (1805–1865), and ℏ = h/(2π) ≈ 10^{-34} J s where h is the constant named after Max Planck (1858–1947). As we'll see in the next section, hermitian operators


    are never defective, so H can be diagonalized by a similarity transformation

    H = S H^(d) S^{-1}.  (1.319)

    The diagonal elements of the diagonal matrix H(d) are the energies E` of

    the states of the system described by the hamiltonian H. The time-evolution

    operator U(t) then is

    U(t) = S exp(-i H^(d) t/ℏ) S^{-1}.  (1.320)

If the system has three states, then U(t) is

    U(t) = S | e^{-iω_1 t}       0             0       |
             |      0       e^{-iω_2 t}        0       |  S^{-1}      (1.321)
             |      0            0        e^{-iω_3 t}  |

in which the angular frequencies are ω_ℓ = E_ℓ/ℏ.

Example: For a system described by the density operator ρ, the entropy

    S is defined as the trace

    S = -k Tr (ρ ln ρ)  (1.322)

in which k = 1.38 × 10^{-23} J/K is the constant named after Ludwig Boltzmann (1844–1906). The density operator ρ is hermitian, non-negative, and of unit trace. Since ρ is hermitian, the matrix that represents it is never defective, and so that matrix can be diagonalized by a similarity transformation

    ρ = S ρ^(d) S^{-1}.  (1.323)

Thus since the trace is cyclic (1.27), we may compute the entropy as

    S = -k Tr ( S ρ^(d) S^{-1} S ln(ρ^(d)) S^{-1} ) = -k Tr ( ρ^(d) ln(ρ^(d)) ).  (1.324)

A vanishing eigenvalue ρ^(d)_k = 0 contributes nothing to this trace since lim_{x→0} x ln x = 0. If the system has three states, populated with probabilities ρ_i, the elements of ρ^(d), then the entropy is

    S = -k (ρ_1 ln ρ_1 + ρ_2 ln ρ_2 + ρ_3 ln ρ_3)
      = k [ρ_1 ln (1/ρ_1) + ρ_2 ln (1/ρ_2) + ρ_3 ln (1/ρ_3)].  (1.325)
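A short Python sketch of this computation: build an arbitrary density matrix, diagonalize it, and sum -ρ_i ln ρ_i over its eigenvalues (Boltzmann's constant is set to 1 here, so the entropy comes out in natural units).

import numpy as np

rng = np.random.default_rng(11)

# An arbitrary 3x3 density matrix: hermitian, non-negative, and of unit trace.
Z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = Z @ Z.conj().T
rho /= np.trace(rho).real

p = np.linalg.eigvalsh(rho)          # the populations rho_i >= 0, summing to 1
k = 1.0                              # Boltzmann's constant, set to 1 for natural units

# S = -k Tr(rho ln rho) = -k sum_i rho_i ln rho_i, dropping any zero eigenvalues.
p = p[p > 1e-15]
S = -k * np.sum(p * np.log(p))
print(S, np.log(3))                  # the entropy of three states is at most k ln 3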


    1.30 Hermitian Matrices

    Hermitian matrices have especially nice properties. By definition (1.33),

    a hermitian matrix A is square and unchanged by hermitian conjugation

A^† = A. Since it is square, the results of section 1.27 ensure that an N×N hermitian matrix A has N eigenvectors |n⟩ with eigenvalues a_n

    A|n⟩ = a_n|n⟩.  (1.326)

In fact, these eigenvalues are all real. To see why, we form the adjoint of equation (1.326)

    ⟨n|A^† = a_n^* ⟨n|  (1.327)

and use the property A^† = A to find

    ⟨n|A^† = ⟨n|A = a_n^* ⟨n|.  (1.328)

We now form the inner product of both sides of this equation with the ket |n⟩ and use the eigenvalue equation (1.326) to get

    ⟨n|A|n⟩ = a_n ⟨n|n⟩ = a_n^* ⟨n|n⟩  (1.329)

which tells us that the eigenvalues are real

    a_n^* = a_n.  (1.330)

Since A^† = A, the matrix elements of A between two of its eigenvectors satisfy

    a_m^* ⟨m|n⟩ = (a_m ⟨n|m⟩)^* = ⟨n|A|m⟩^* = ⟨m|A^†|n⟩ = ⟨m|A|n⟩ = a_n ⟨m|n⟩  (1.331)

which implies that

    (a_m^* - a_n) ⟨m|n⟩ = 0.  (1.332)

But since all the eigenvalues of the hermitian matrix A are real, we have

    (a_m - a_n) ⟨m|n⟩ = 0.  (1.333)

This equation tells us that when the eigenvalues are different, then the eigen-

vectors are orthogonal. In the absence of a symmetry, all N eigenvalues usu-

    ally are different, and so the eigenvectors usually are mutually orthogonal.

When two or more eigenvectors |n⟩ of a hermitian matrix have the same eigenvalue a_n, their eigenvalues are said to be degenerate. In this case, any


    linear combination of the degenerate eigenvectors will also be an eigenvector

with the same eigenvalue a_n

    A ( ∑_{n∈D} c_n |n⟩ ) = a_n ( ∑_{n∈D} c_n |n⟩ )  (1.334)

where D is the set of labels of the eigenvectors with the same eigenvalue.

If the degenerate eigenvectors |n⟩ are linearly independent, then we may use the Gram-Schmidt procedure (1.110–1.122) to choose the coefficients c_n so as to construct degenerate eigenvectors that are orthogonal to each other and to the non-degenerate eigenvectors. We then may normalize these mutually orthogonal eigenvectors.

But two related questions arise: Are the degenerate eigenvectors |n⟩ linearly independent? And if so, what orthonormal linear combinations of them should we choose for a given physical problem? Let's consider the

    second question first.

    We saw in Sec. 1.22 that unitary transformations preserve the orthonor-

    mality of a basis. Any unitary transformation that commutes with the

    matrix A

    [A, U] = 0  (1.335)

    maps each set of orthonormal degenerate eigenvectors of A into another

    set of orthonormal degenerate eigenvectors of A with the same eigenvalue

    because

    A U|n⟩ = U A|n⟩ = a_n U|n⟩.  (1.336)

So there's a huge spectrum of choices for the orthonormal degenerate eigen-

    vectors of A with the same eigenvalue. What is the right set for a given

    physical problem?

    A sensible way to proceed is to add to the matrix A a second hermitian

matrix B multiplied by a tiny, real scale factor ε

    A(ε) = A + ε B.  (1.337)

    The matrix B must completely break whatever symmetry led to the degen-

    eracy in the eigenvalues of A. Ideally, the matrix B should be one that

    represents a modification of A that is physically plausible and relevant to

the problem at hand. The hermitian matrix A(ε) then will have N different eigenvalues a_n(ε) and N orthonormal non-degenerate eigenvectors

    A(ε)|n, ε⟩ = a_n(ε)|n, ε⟩.  (1.338)


These eigenvectors |n, ε⟩ of A(ε) are orthogonal to each other

    ⟨n, ε|n′, ε⟩ = δ_{n n′}  (1.339)

and to the eigenvectors of A(ε) with other eigenvalues, and they remain so as we take the limit

    |n⟩ = lim_{ε→0} |n, ε⟩.  (1.340)

    We may choose them as the orthogonal degenerate eigenvectors of A. Since

one always may find a crooked hermitian matrix B that breaks any particular symmetry, it follows that every N×N hermitian matrix A possesses N orthonormal eigenvectors, which are complete in the vector space in which

    A acts. (Any N linearly independent vectors span their N -dimensional

    vector space, as explained in section 1.9.)

Now let's return to the first question and show that an N×N hermitian matrix has N orthogonal eigenvectors. To do this, we'll first show that the space of vectors orthogonal to an eigenvector |n⟩ of a hermitian operator A

    A|n⟩ = λ|n⟩  (1.341)

is invariant under the action of A. We must show that if |y⟩ is any vector orthogonal to the eigenvector |n⟩

    ⟨n|y⟩ = 0  (1.342)

then A|y⟩ also is orthogonal to |n⟩, that is, ⟨n|A|y⟩ = 0. We use successively the definition of A^†, the hermiticity of A, the eigenvector equation (1.341), the definition of the inner product, and the reality of the eigenvalues of a hermitian matrix:

    ⟨n|A|y⟩ = ⟨A^†n|y⟩ = ⟨An|y⟩ = ⟨λn|y⟩ = λ^*⟨n|y⟩ = λ⟨n|y⟩ = 0.  (1.343)

Thus the space of vectors orthogonal to an eigenvector of a hermitian oper-

    ator is invariant under it.

    Now a hermitian operator A acting on an N -dimensional vector space S

is represented by an N×N hermitian matrix, and so it has at least one eigenvector |1⟩. The subspace of S consisting of all vectors orthogonal to |1⟩ is an (N-1)-dimensional vector space S_{N-1} that is invariant under the action of A. On this space S_{N-1}, the operator A is represented by an (N-1)×(N-1) hermitian matrix A_{N-1}. This matrix has at least one eigenvector |2⟩. The subspace of S_{N-1} consisting of all vectors orthogonal to |2⟩ is an (N-2)-dimensional vector space S_{N-2} that is invariant under the action of A. On S_{N-2}, the operator A is represented by an (N-2)×(N-2) hermitian matrix A_{N-2}, which has at least one eigenvector |3⟩. By construction, the


vectors |1⟩, |2⟩, and |3⟩ are mutually orthogonal. Continuing in this way, we see that A has N orthogonal eigenvectors |k⟩ for k = 1, 2, . . . , N.

The N orthogonal eigenvectors |k⟩ of an N×N matrix A can be normalized and used to write the N×N identity operator I as

    I = ∑_{k=1}^N |k⟩⟨k|.  (1.344)

On multiplying from the left by the matrix A, we find

    A = A I = A ∑_{k=1}^N |k⟩⟨k| = ∑_{k=1}^N a_k |k⟩⟨k|  (1.345)

    which is the diagonal form of the hermitian matrix A. This expansion of A as

    a sum over outer products of its eigenstates multiplied by their eigenvalues

    is important in quantum mechanics. The expansion represents the possible

    selective, non-destructive measurements of the physical quantity represented

    by the matrix A.

The hermitian matrix A is diagonal in the basis provided by its eigenstates |k⟩

    A_{kj} = ⟨k|A|j⟩ = a_k δ_{kj}.  (1.346)

But in any other basis |ℓ, o⟩, the matrix A appears as

    A_{iℓ} = ⟨i, o|A|ℓ, o⟩ = ∑_{k=1}^N ⟨i, o|k⟩ a_k ⟨k|ℓ, o⟩.  (1.347)

    The linear operator

    U = ∑_{k=1}^N |k⟩⟨k, o|  (1.348)

is unitary because it maps the arbitrary orthonormal basis |k, o⟩ into the orthonormal basis of eigenstates |k⟩. In the |k, o⟩ basis, U is the matrix whose nth column is the N-tuple ⟨i, o|n⟩ that represents |n⟩ in the basis |i, o⟩

    U_{in} = ⟨i, o|U|n, o⟩ = ⟨i, o|n⟩.  (1.349)

So equation (1.347) tells us that an arbitrary N×N hermitian matrix A can be diagonalized by a unitary transformation

    A = U A^(d) U^†.  (1.350)

Here A^(d) is the diagonal matrix A^(d)_{nm} = a_m δ_{nm}.
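numpy's eigh returns the ingredients of (1.350): real eigenvalues a and a unitary matrix U whose columns are orthonormal eigenvectors. A quick check on an arbitrary random hermitian matrix:

import numpy as np

rng = np.random.default_rng(12)
Z = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = Z + Z.conj().T                               # hermitian

a, U = np.linalg.eigh(A)                         # real eigenvalues, eigenvector columns in U

assert np.allclose(U.conj().T @ U, np.eye(4))    # the eigenvectors are orthonormal
assert np.allclose(A, U @ np.diag(a) @ U.conj().T)   # A = U A^(d) U^†   (1.350)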


    A matrix that is real and symmetric is hermitian; so is one that is

    imaginary and antisymmetric.

    A real, symmetric matrix R can be diagonalized by an orthogonal trans-

    formation

    R = O R^(d) O^T  (1.351)

    in which the matrix O is a real unitary matrix, that is, an orthogonal matrix

    (1.258).

    Example: Suppose we wish to find the eigenvalues of the real, symmetric

    mass matrix

    M = ( 0   m )
        ( m   M )      (1.352)

in which m is an ordinary mass and M is a huge mass. The eigenvalues μ of this hermitian mass matrix satisfy the equation

    det (M - μ I) = μ (μ - M) - m^2 = 0  (1.353)

with solutions

    μ_± = (1/2) ( M ± √(M^2 + 4m^2) ).  (1.354)

The larger mass μ_+ is approximately the huge mass M

    μ_+ ≈ M + m^2/M  (1.355)

and the smaller mass μ_- is very tiny

    μ_- ≈ - m^2