PHY120 Unit 6: Matrices - University of...

PHY120

Unit 6: Matrices

J D W Weston and Dr M Mears

Department of Physics & Astronomy.

Introduction to the Unit

In this unit you will be introduced to matrix algebra, how to use it, and how it can be applied to a variety of fields. If you have not seen this before, then do not worry as we will go from the foundations right through to application.

This lecture course has been designed alongside the core textbook, "Introductory Maths for Physicists and Astronomers, Volume One", compiled by Professor David Mowbray (2016, Pearson Custom Publishing). Relevant parts of the textbook are referenced in these additional notes by page number. In particular, do make note of the Key Point boxes. When you look at the practice questions in this book, attempt only the Exercises and End of Block Exercises; do not attempt the Computer and Calculator Exercises at this point, although you may find these useful to refer to later in your studies, such as when you do any subsequent programming modules.

There are also a number of other textbooks available through the library to get more problems to work

through, both for matrices and for other topics.

If at any point in the lecture course you are struggling to understand something, you should first refer to the textbooks and try some problems. However, if you are still having difficulties, lecturers can be available during office hours; alternatively you can speak to those assisting in problem classes or your academic tutors. Do try the set textbook first and check other standard texts, as often the answers can be there. You should expect to do some independent study before seeking the simple answer.

This part of the module will be delivered in seven lectures through the autumn semester of the academic year, together with two set homeworks, two problems classes and online exercises.

Even if you have seen much of the material before, it is absolutely vital that you support the material taught in the lectures for this module by going through examples in the textbook as highlighted here, the problem classes and any online questions. These actions will reinforce your mathematical skills, not just for this course and the exams: you will be expected to be able to carry out these operations throughout your degree and further studies. If you do not master them now, you will not have as much space and time to make up for lost practice later and will struggle for the rest of your degree. This is seen every year with students who did not take the necessity of mathematical skills seriously enough from the start.

Contents

Introduction to the Unit

Chapter 1. Introduction to Matrices
1.1. Definitions
1.2. Basic Matrix Operators
1.3. Scaling
1.4. Multiplication

Chapter 2. Determinants and Inverse Matrix
2.1. Determinant of a 2 x 2 matrix
2.2. Determinant of a 3 x 3 matrix
2.3. Inverse matrix

Chapter 3. Transformations using Matrices
3.1. Scaling
3.2. Rotation
3.3. Reflection
3.4. Shear
3.5. Translation
3.6. Composite transformations

Chapter 4. Using matrices to solve equations
4.1. Using the Inverse Matrix
4.2. Cramer's Rule
4.3. Gaussian Elimination

Chapter 5. Eigenvalues and Eigenvectors
5.1. Solutions of Equations
5.2. Eigenvalues and the Characteristic Equation
5.3. Eigenvectors
5.4. Relevance of Eigenvectors

CHAPTER 1

Introduction to Matrices

1.1. Definitions

A matrix is an array of numbers or expressions which behaves as required by matrix algebra. The term matrix (the plural is matrices) was first written about over 150 years ago¹, and there are many references to the origins of matrix algebra in the textbooks that you can follow up if you wish.

Each number / expression within the matrix is known as an element. Much of matrix algebra is based

on understanding how we manipulate combinations of the elements of matrices.

Matrices are normally written as the individual elements enclosed in brackets, either round $(\;)$ or square $[\;]$, and are denoted by a capital letter, $A$; elements are denoted by lower case letters, $a$.

The size of a matrix is expressed by $n \times m$, where $n$ is the number of rows and $m$ the number of columns. A square matrix has size $n \times n$. The number of rows is always quoted before the number of columns (down, then across). This allows us to identify specific elements of the matrix: the convention used is a subscript of $i$ for rows and $j$ for columns. As such, matrix $A$ has elements $a_{ij}$, and the top left element (row 1, column 1) is $a_{11}$.

The main diagonal runs from the top left element $a_{11}$ down through elements $a_{22}$, $a_{33}$ etc.

\[
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},
\qquad
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]

¹ J. J. Sylvester (1850), "Additions to the articles in the September number of this journal, 'On a new class of theorems,' and on Pascal's theorem", The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, 37: 363-370.

This is easy for square matrices, but take care for non-square matrices. The main diagonal always

goes through the same elements, regardless of matrix shape. Do not simply draw a line from top left

to bottom right elements.

The unit (or identity) matrix $I$ is the matrix such that multiplying any matrix $A$ by the identity matrix gives the original matrix $A$ (the concept of matrix multiplication is covered below in section 1.4). In practice, it is always a square matrix with 1s on the main diagonal and 0s elsewhere, i.e.

\[
I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

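The transpose of a matrix, $A^T$, is found by interchanging its rows and columns.

For a square matrix $A$ this is a simple mirroring of the elements along the main diagonal, and the size $n \times n$ is trivially maintained:

\[
\text{if } A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\text{ then } A^T = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}
\]

For a non-square matrix $B$ of size $n \times m$ the transpose $B^T$ has size $m \times n$:

\[
\text{if } B = \begin{pmatrix} b_{11} & b_{12} & b_{13} & b_{14} \\ b_{21} & b_{22} & b_{23} & b_{24} \end{pmatrix}
\text{ then } B^T = \begin{pmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \\ b_{13} & b_{23} \\ b_{14} & b_{24} \end{pmatrix}
\]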

Pages 225 and 226
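As a quick check of the transpose rule, here is a minimal Python sketch. Python is an assumption of this illustration (the notes do not prescribe a language), the matrix is stored as a list of rows, and the helper name `transpose` is hypothetical.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    # zip(*matrix) collects the first entry of every row, then the second, ...
    # so each original column becomes a row of the result.
    return [list(column) for column in zip(*matrix)]


B = [[1, 2, 3, 4],
     [5, 6, 7, 8]]      # size 2 x 4

B_T = transpose(B)      # size 4 x 2
print(B_T)              # [[1, 5], [2, 6], [3, 7], [4, 8]]
```

Transposing twice returns the original matrix, which is a useful sanity check on any hand calculation.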

1.2. Basic Matrix Operators

1.2.1. Addition and Subtraction. When adding or subtracting matrices, they must be of the same size. If they are not, you cannot add or subtract them. Each element in the resultant matrix is found by adding or subtracting the corresponding elements from the originals. In other words

\[
\text{if } A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\text{ and } B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
\text{ then } A \pm B = \begin{pmatrix} a_{11} \pm b_{11} & a_{12} \pm b_{12} \\ a_{21} \pm b_{21} & a_{22} \pm b_{22} \end{pmatrix}
\]

Practice: Show that

\[ A + B = B + A \]

\[ (A + B)^T = A^T + B^T \]

Page 226-228

1.3. Scaling

If a matrix is multiplied by a number $k$, then simply multiply all elements by that number.

\[
kA = \begin{pmatrix} ka_{11} & ka_{12} \\ ka_{21} & ka_{22} \end{pmatrix}
\]

Practice: Show that

\[ A + A = 2A \]

\[ (kA)^T = kA^T \]

Pages 228 -230
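The practice identities in sections 1.2 and 1.3 can be spot-checked numerically. The sketch below is only an illustration for one pair of matrices (it is not a proof), written in Python with hypothetical helper names `add`, `scale` and `transpose`.

```python
def add(A, B):
    """Element-wise sum of two matrices of the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must be the same size to be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(k, A):
    """Multiply every element of A by the number k."""
    return [[k * a for a in row] for row in A]


def transpose(A):
    """Interchange rows and columns."""
    return [list(column) for column in zip(*A)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

print(add(A, B) == add(B, A))                                      # True
print(transpose(add(A, B)) == add(transpose(A), transpose(B)))     # True
print(add(A, A) == scale(2, A))                                    # True
print(transpose(scale(3, A)) == scale(3, transpose(A)))            # True
```

A numerical check like this does not replace the algebraic argument asked for in the practice questions, but it will catch a slip in a hand calculation.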

1.4. Multiplication

1.4.1. Condition for multiplication. In order to multiply two matrices together, the number of columns of the pre-multiplier must equal the number of rows in the post-multiplier (you will see the need for the specific use of these terms in the next section, 1.4.2). So if $A$ has size $n \times m$ and $B$ has size $p \times q$, then the multiplication $AB$ is valid iff (if and only if)

\[ m = p \]

Also, using the above sizes for matrices $A$ and $B$ we can determine the size of the final matrix as the number of rows of the pre-multiplier and the number of columns of the post-multiplier. In other words, $AB$ has size $n \times q$.

A good way to start matrix multiplication is by writing out a size equation:

\[ A \times B = C \]

\[ (n \times \underbrace{m) \times (p}_{m = p?} \times q) = n \times q \]

Page 233 and 234

1.4.2. Non-commutative properties. In the previous section I discussed multiplication in terms of pre- and post-multipliers. In non-matrix multiplication the order of multiplication is not important ($ab = ba$), but it may already be apparent from the size equation that multiplication order is important for matrices. Matrix multiplication is non-commutative, meaning that in general $AB \neq BA$. Hence it is important to be aware of which matrix is the pre-multiplier and which the post-multiplier.

Task: Use the size equation to find the size of matrix $AB$ if $A$ has size $3 \times 2$ and $B$ has size $2 \times 4$. Show that $AB$ is a valid multiplication, but $BA$ is not a valid multiplication.

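1.4.3. General form of multiplication. The multiplication of $A$ and $B$ gives another matrix $C$ with elements $c_{ij}$. To calculate these elements we use row $i$ from $A$ and column $j$ from $B$, multiply the individual elements (first in row × first in column, etc) and then add them together. This is easier to see mathematically.

\[
\text{if } A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}
\text{ and } B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}
\text{ then the product } AB = C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}
\]

where

\[
\begin{aligned}
c_{11} &= a_{11}b_{11} + a_{12}b_{21} \\
c_{12} &= a_{11}b_{12} + a_{12}b_{22} \\
c_{21} &= a_{21}b_{11} + a_{22}b_{21} \\
c_{22} &= a_{21}b_{12} + a_{22}b_{22}
\end{aligned}
\]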

Task 1: Check whether $BA$ is a valid multiplication and, if so, determine the matrix $D = BA$.

Task 2: If $A = \begin{pmatrix} 4 & 1 \\ -2 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} -1 & 0 & 3 \\ 2 & 2 & 1 \end{pmatrix}$, calculate $AB$ and $BA$ (remember to check first using the size equation...).

Task 3: In section 1.1 I introduced the identity matrix as "the matrix equivalent of 1". Show that this is true by multiplying any matrix by $I$.

Pages 235-238
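The size equation and the row-times-column rule can be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed method; the names `shape` and `matmul` are hypothetical, and the example matrices are chosen here and are not the ones from the tasks above.

```python
def shape(M):
    """Return (rows, columns) of a matrix stored as a list of rows."""
    return len(M), len(M[0])


def matmul(A, B):
    """Return the product AB, checking the size equation first."""
    n, m = shape(A)
    p, q = shape(B)
    if m != p:
        raise ValueError(f"cannot multiply {n}x{m} by {p}x{q}: need m = p")
    # c_ij = (row i of A) . (column j of B); the result has size n x q
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(q)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

print(matmul(A, B))   # [[2, 1], [4, 3]]
print(matmul(B, A))   # [[3, 4], [1, 2]]
```

The two printed results differ, which is the non-commutative property of section 1.4.2 in action. Passing matrices whose sizes fail the condition $m = p$ raises an error instead of returning a meaningless answer.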

CHAPTER 2

Determinants and Inverse Matrix

All square matrices possess a determinant which is needed for calculating the inverse matrix (the

matrix equivalent to division) as well as other operations.

2.1. Determinant of a 2 x 2 matrix

If $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then the determinant is

\[
\det A = |A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc
\]

A matrix is singular if $\det A = 0$.

Task: Show that

\[ |A| = |A^T| \]

Pages 242-245 (excluding Q8 on p245)

2.2. Determinant of a 3 x 3 matrix

Calculating the determinant of a 3 x 3 matrix requires a few steps, and it is easy to get lost. We will

consider a matrix of the general form

\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]

Step 1: Find a suitable row or column to expand the matrix by. If you choose carefully at this step you can save yourself some work, for example by choosing a row or column containing a zero.

Step 2: Calculate the minor for each element of the row/column you are expanding the matrix by. I am going to expand by the first row, meaning I shall find the minors (and then the cofactors) of $a_{11}$, $a_{12}$, and $a_{13}$.

The minor of element $a_{ij}$ is the determinant of the $2 \times 2$ matrix found by excluding row $i$ and column $j$. The minors for the elements chosen above are

\[
\text{minor of } a_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32}
\]

\[
\text{minor of } a_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21}a_{33} - a_{23}a_{31}
\]

\[
\text{minor of } a_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21}a_{32} - a_{22}a_{31}
\]

Step 3: Calculate the cofactor of each expanded element. This step is often forgotten, so practise plenty until it is fixed in your minds!

The cofactor of each element is simply the minor multiplied by $(-1)^{i+j}$, meaning that some minors will change sign (when $i + j$ is odd). A visual method of remembering this relation is by picturing the following matrix

\[
\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}
\]

so using the expansion choice above the first minor keeps its sign, the second changes sign, and the third also keeps its sign.

Step 4: Multiply each expanded element by its cofactor, and add them all together to find the determinant. That is,

\[
\begin{aligned}
|A| &= (a_{11} \times \text{cofactor of } a_{11}) + (a_{12} \times \text{cofactor of } a_{12}) + (a_{13} \times \text{cofactor of } a_{13}) \\
    &= a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
     - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
     + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
\end{aligned}
\]

Make note of the minus sign appearing in the middle term!

As with all flavours of mathematics it is something that can only be learnt and memorised with practice. Pick any set of numbers in a 3 x 3 matrix, and work out the determinant; or look at Q1 and Q2 on page 258 of the textbook.

Pages 246-260
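The four steps above can be sketched as a short recursive Python function. This is a minimal illustration, assuming the matrix is stored as a list of rows and always expanding along the first row (the notes point out that a cleverly chosen row or column can save work; the sketch does not attempt that). The names `minor_matrix` and `det` are hypothetical.

```python
def minor_matrix(M, i, j):
    """Return M with row i and column j removed (indices start at 0)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]


def det(M):
    """Determinant of a square matrix by cofactor expansion along row 0."""
    n = len(M)
    if any(len(row) != n for row in M):
        raise ValueError("only square matrices have a determinant")
    if n == 1:
        return M[0][0]
    if n == 2:
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]
    total = 0
    for j in range(n):
        # (-1)**j reproduces the + - + sign pattern along the first row
        cofactor = (-1) ** j * det(minor_matrix(M, 0, j))
        total += M[0][j] * cofactor
    return total


A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]

print(det(A))   # 39
```

By hand: $2(3 \times 4 - (-1) \times 5) - 0 + 1(1 \times 5 - 3 \times 0) = 34 + 5 = 39$, in agreement with the code.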

2.3. Inverse matrix

We can draw on a comparison with non-matrix mathematics to understand what the inverse matrix is. For any given non-zero variable $a$ the inverse is given by $1/a$, such that $a \times 1/a = 1$. In the same way, if the product of two matrices returns the identity matrix, then one matrix is the inverse of the other. In other words, if $AB = BA = I$ then $B = A^{-1}$. Note that if $|A| = 0$ the matrix $A$ does not have an inverse.

2.3.1. Inverse of a 2 x 2 matrix. For the general matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ the inverse matrix is given by

\[
A^{-1} = \frac{1}{|A|} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}
\]

Task: Check that this is true by calculating $AA^{-1}$.

Page 261-264

2.3.2. Inverse of an n x n matrix.

Step 1: Calculate $A^T$ (see section 1.1 if you need a reminder).

Step 2: Replace every element of $A^T$ by its cofactor, resulting in the adjugate matrix $\operatorname{adj}(A)$ (may also be referred to as the adjoint¹ matrix, as in the textbook).

Step 3: Divide by $\det A$.

¹ In the past these terms were effectively synonymous and you will probably find them interchangeable in your use of matrix algebra. The term adjoint now more normally refers to linear operators in functional analysis.

The inverse matrix is thus defined by

\[
A^{-1} = \frac{\operatorname{adj}(A)}{|A|}
\]

Consider a general $3 \times 3$ matrix. Stick with it: the maths is not hard, but it is easy to get lost! (Note that letters $i$ and $j$ are missing from the general matrix. This is to avoid confusion with their use in defining $a_{ij}$.)

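\[
A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & k \end{pmatrix}
\]

So, step 1 calculates the transpose

\[
A^T = \begin{pmatrix} a & d & g \\ b & e & h \\ c & f & k \end{pmatrix}
\]

Next, replace each element by its cofactor (the minor with the sign pattern from section 2.2 applied)

\[
\operatorname{adj}(A) =
\begin{pmatrix}
+\begin{vmatrix} e & h \\ f & k \end{vmatrix} & -\begin{vmatrix} b & h \\ c & k \end{vmatrix} & +\begin{vmatrix} b & e \\ c & f \end{vmatrix} \\[2ex]
-\begin{vmatrix} d & g \\ f & k \end{vmatrix} & +\begin{vmatrix} a & g \\ c & k \end{vmatrix} & -\begin{vmatrix} a & d \\ c & f \end{vmatrix} \\[2ex]
+\begin{vmatrix} d & g \\ e & h \end{vmatrix} & -\begin{vmatrix} a & g \\ b & h \end{vmatrix} & +\begin{vmatrix} a & d \\ b & e \end{vmatrix}
\end{pmatrix}
\]

\[
\therefore \operatorname{adj}(A) =
\begin{pmatrix}
(ek - hf) & (hc - bk) & (bf - ec) \\
(gf - dk) & (ak - gc) & (dc - af) \\
(dh - ge) & (gb - ah) & (ae - bd)
\end{pmatrix}
\]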

The final step is to divide this adjugate matrix by the determinant of the original matrix $A$. You should remember how to calculate the determinant of a $3 \times 3$ matrix (see section 2.2) so I will not go over it step-by-step here. This gives the inverse matrix of a general $3 \times 3$ matrix

\[
A^{-1} = \frac{1}{a(ek - fh) - b(dk - fg) + c(dh - eg)}
\begin{pmatrix}
(ek - hf) & (hc - bk) & (bf - ec) \\
(gf - dk) & (ak - gc) & (dc - af) \\
(dh - ge) & (gb - ah) & (ae - bd)
\end{pmatrix}
\]

Not the nicest looking of equations, but this is because I have given a general expression for the inverse matrix. In reality you are likely to encounter matrices made up of numbers rather than unknown variables, meaning you may be able to simplify before moving on to the next step.

Pages 264-269
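The transpose, cofactor and divide steps of section 2.3.2 can be sketched in Python as follows. This is a hedged illustration: it assumes integer entries (so that `fractions.Fraction` keeps the arithmetic exact), a matrix of size $2 \times 2$ or larger, and hypothetical helper names throughout.

```python
from fractions import Fraction


def minor_matrix(M, i, j):
    """Return M with row i and column j removed (indices start at 0)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor_matrix(M, 0, j))
               for j in range(len(M)))


def inverse(A):
    """Inverse of a square integer matrix (2 x 2 or larger) via adj(A)/|A|."""
    d = det(A)
    if d == 0:
        raise ValueError("singular matrix: no inverse exists")
    n = len(A)
    A_T = [list(column) for column in zip(*A)]             # Step 1: transpose
    adj = [[(-1) ** (i + j) * det(minor_matrix(A_T, i, j))  # Step 2: cofactors
            for j in range(n)] for i in range(n)]
    return [[Fraction(x, d) for x in row] for row in adj]   # Step 3: divide


def matmul(A, B):
    """Matrix product, used here only to check the result."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
A_inv = inverse(A)
print(matmul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # True
```

Multiplying the result back onto $A$ and recovering the identity matrix is the same check suggested in the task of section 2.3.1.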

CHAPTER 3

Transformations using Matrices

Before we can start using matrices to transform graphs (and thus data in general), we need to note that a single point on the $xy$ plane is expressed by the $2 \times 1$ matrix $X = \begin{pmatrix} x \\ y \end{pmatrix}$. If you have multiple points, add more columns to give a $2 \times n$ matrix where $n$ is the number of points.

It may not be too surprising that moving into three dimensions only requires adding an extra row into the point matrix, i.e. $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$, though for the purposes of clarity I will stick to two dimensions.

We shall consider a single point in a two dimensional plane, and the general transformation matrix given by

\[
T = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]

This transformation matrix pre-multiplies our point as follows

\[
TX = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}
\]

In other words the transformation matrix gives a new point whose co-ordinates are given by $x \to ax + by$ and $y \to cx + dy$. So when we want a matrix to perform a particular function we need to consider the changes to the $x$ and $y$ co-ordinates. In the following sections you will see five different types of transformation, all of which you should be able to remember and perform without breaking a sweat.

Page 270-273

3.1. Scaling

Scaling is the process of multiplying all points by a set value, but in two dimensions it is possible to scale by two different factors: $\alpha$ defines horizontal scaling and $\beta$ defines vertical scaling, giving new co-ordinates

\[
\begin{pmatrix} \alpha x \\ \beta y \end{pmatrix}
\]

If we compare these new co-ordinates to the general equations above we can see that $a = \alpha$, $b = c = 0$, $d = \beta$. Putting these values into the general transformation matrix gives the form responsible for scaling as

\[
T_{scale} = \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}
\]

Let us consider an example. We have four points, $(0, 0)$, $(0, 1)$, $(1, 0)$, and $(1, 1)$, and we want to scale our $x$ values by a factor of 3 and $y$ by a factor of 2. In matrix form the data points are expressed as a single matrix $A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}$ and the transformation matrix is $T_{scale} = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}$.

The transformation is therefore given by

\[
T_{scale}A = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 3 & 3 \\ 0 & 2 & 0 & 2 \end{pmatrix}
\]

and so the new co-ordinates are $(0, 0)$, $(0, 2)$, $(3, 0)$, and $(3, 2)$.

If we plot the initial co-ordinates (see below) we get a square, and after transformation we have a rectangle that is 3 times as long along the $x$ axis and 2 times as long along the $y$ axis.

Page 274-275
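The scaling example of section 3.1 can be reproduced with a short Python sketch. The helper names `matmul` and `scaling_matrix` are hypothetical, and the points are stored one per column exactly as in the matrix $A$ of that example.

```python
def matmul(A, B):
    """Matrix product of A (n x m) and B (m x q), as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def scaling_matrix(alpha, beta):
    """Scale x by alpha and y by beta."""
    return [[alpha, 0],
            [0, beta]]


# Unit square: one point per column, x values on top, y values below
points = [[0, 0, 1, 1],
          [0, 1, 0, 1]]

scaled = matmul(scaling_matrix(3, 2), points)
print(scaled)   # [[0, 0, 3, 3], [0, 2, 0, 2]]
```

Reading the result column by column gives the new co-ordinates $(0, 0)$, $(0, 2)$, $(3, 0)$ and $(3, 2)$ found in the worked example.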

3.2. Rotation

We begin with a convention: the angle of rotation $\theta$ is positive for anticlockwise rotations, and the rotation is about the origin.

3.2.1. Proof. Consider an initial point with co-ordinates $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$ that we want to rotate about the origin to co-ordinates $\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$ through an angle $\theta$. As this is a rotation, the distance from the origin to each point is the same, $R$. The line connecting the origin and the initial point makes an angle $\phi$ with the $x$-axis, whereas the line connecting the origin and the final point makes an angle $\gamma$. This is all shown in the diagram below.

The first step is to recall some fairly basic trigonometry. We can express the $x$ and $y$ co-ordinates in terms of $R$ and the angles as

\[ x_1 = R\cos\phi \quad \text{and} \quad y_1 = R\sin\phi \]

\[ x_2 = R\cos\gamma \quad \text{and} \quad y_2 = R\sin\gamma \]

Next, it becomes clear from the above diagram that the two angles used to define the co-ordinates can be combined with the angle between the two lines, i.e. the angle of rotation $\theta$. This combination tells us that $\gamma = \theta + \phi$, which allows us to rewrite the equations for the final co-ordinates as

\[ x_2 = R\cos(\theta + \phi) \quad \text{and} \quad y_2 = R\sin(\theta + \phi) \]

Using the sum and difference identities these become

\[ x_2 = R\cos\phi\cos\theta - R\sin\phi\sin\theta \quad \text{and} \quad y_2 = R\cos\phi\sin\theta + R\sin\phi\cos\theta \]

and with substitution of the equations for $x_1$ and $y_1$ we find

\[ x_2 = x_1\cos\theta - y_1\sin\theta \]

\[ y_2 = x_1\sin\theta + y_1\cos\theta \]

A simple comparison of these equations with the general matrix given at the start of this chapter provides us with the final transformation matrix

\[
T_{rot} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\]

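3.2.2. Example. Take the same initial set of points from the previous section, but this time we will rotate them by 45°. So our transformation is

\[
T_{rot}A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & -\sin\theta & \cos\theta & \cos\theta - \sin\theta \\ 0 & \cos\theta & \sin\theta & \sin\theta + \cos\theta \end{pmatrix}
\]

This can of course be simplified by putting in the value of $\theta$ and by remembering that $\sin 45^\circ = \cos 45^\circ = \frac{1}{\sqrt{2}}$. So in this example we get

\[
T_{rot}A = \begin{pmatrix} 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \sqrt{2} \end{pmatrix}
\]

which, when plotted, gives what we expect: a square rotated anticlockwise by 45° about the origin.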

Page 276-278
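The rotation of the unit square can be checked numerically with the sketch below. It is an illustration only: the helper names are hypothetical, angles are converted to radians because Python's `math` functions expect them, and the results are floating-point numbers, so they are compared with a tolerance rather than exactly.

```python
import math


def matmul(A, B):
    """Matrix product of A (n x m) and B (m x q), as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rotation_matrix(theta_degrees):
    """Anticlockwise rotation about the origin by theta (in degrees)."""
    theta = math.radians(theta_degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


points = [[0, 0, 1, 1],
          [0, 1, 0, 1]]

rotated = matmul(rotation_matrix(45), points)

# Expected result from the worked example, with r = 1/sqrt(2)
r = 1 / math.sqrt(2)
expected = [[0, -r, r, 0],
            [0, r, r, math.sqrt(2)]]

matches = all(math.isclose(got, want, abs_tol=1e-12)
              for got_row, want_row in zip(rotated, expected)
              for got, want in zip(got_row, want_row))
print(matches)   # True
```

A rotation should not change the distance of any point from the origin, which gives a second, independent check on the output.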

3.3. Reflection

Whilst it is possible to reflect a data set about any line, it is easier to consider four simple cases before jumping into the general form for reflection.

3.3.1. Reflection about x-axis. When reflecting about the $x$-axis, the $x$ value of a point remains unchanged but the $y$ value changes sign. So, we need a matrix that, when we multiply it with a data set, gives $(x, y) \to (x, -y)$. In terms of the general matrix at the start of the chapter, we want $a = 1$, $b = c = 0$, $d = -1$. The matrix for the job is

\[
T_{Ref} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\]

Task: Check this works by calculating $T_{Ref}X$. You should do the same for each of the reflection cases in this section.

3.3.2. Reflection about y-axis. Given what we saw in the previous section, it should be of no surprise to learn that the matrix responsible for reflection about the $y$-axis (i.e. change signs on $x$ values but not on $y$ values) is

\[
T_{Ref} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}
\]

3.3.3. Reflection about y = x. This is slightly more tricky, though easier when you think about what is happening to the individual co-ordinates. When reflected about the line $y = x$ we find that the co-ordinates change places: $x \to y$ and $y \to x$. Referring again to the general transformation matrix requires $a = d = 0$ and $b = c = 1$. The transformation matrix that performs this operation is

\[
T_{Ref} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\]

and once again it is useful to see how this works by trying it out on a few matrices.

3.3.4. Reflection about y = −x. This case is similar to the previous one, but in addition to interchanging the co-ordinates we must also change their sign. This gives a transformation matrix that is similar to the previous one, though with the extra sign component.

\[
T_{Ref} = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix}
\]

3.3.5. General form. Whilst the matrix used to reflect about any line passing through the origin will be given here, I will not go through the derivation as it is beyond the level of this course and (from a pragmatic physicist's point of view) it is the end result that is of use.

We first need to define a vector that travels along the line of reflection. In two dimensions this vector is $\mathbf{u} = u_x \mathbf{i} + u_y \mathbf{j}$ (you will get more familiar with vectors later in your studies).

The transformation matrix is given by

\[
T_{Ref} = \frac{1}{\|\mathbf{u}\|^2} \begin{pmatrix} u_x^2 - u_y^2 & 2u_xu_y \\ 2u_xu_y & u_y^2 - u_x^2 \end{pmatrix}
\]

Try not to worry about this general form too much. I have included it for completeness as it may be something you need to use in the future.

Page 278-282
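One way to gain confidence in the general form is to check that it reproduces the four simple cases of sections 3.3.1 to 3.3.4. The Python sketch below does this; it assumes integer components for $\mathbf{u}$ so that `fractions.Fraction` keeps the arithmetic exact, and the function name `reflection_matrix` is hypothetical.

```python
from fractions import Fraction


def reflection_matrix(ux, uy):
    """Reflection about the line through the origin along u = (ux, uy)."""
    norm_sq = ux * ux + uy * uy            # ||u||^2
    if norm_sq == 0:
        raise ValueError("u must be a non-zero vector")
    return [[Fraction(ux * ux - uy * uy, norm_sq), Fraction(2 * ux * uy, norm_sq)],
            [Fraction(2 * ux * uy, norm_sq), Fraction(uy * uy - ux * ux, norm_sq)]]


print(reflection_matrix(1, 0) == [[1, 0], [0, -1]])     # x-axis:  True
print(reflection_matrix(0, 1) == [[-1, 0], [0, 1]])     # y-axis:  True
print(reflection_matrix(1, 1) == [[0, 1], [1, 0]])      # y = x:   True
print(reflection_matrix(1, -1) == [[0, -1], [-1, 0]])   # y = -x:  True
```

The length of $\mathbf{u}$ does not matter, only its direction: the factor $1/\|\mathbf{u}\|^2$ cancels any overall scaling of the vector.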

3.4. Shear

Shear transformations take every point and displace it parallel to a defined line, by an amount proportional to its distance from that line. So, if we want to shear something in the $x$ direction, points that lie on the $x$-axis are unchanged, but other points move depending on their $y$ component; the further away from the $x$-axis, the more they are displaced.

Take a look at the image below. We start with a square and apply two different shears, both with the same proportionality constant (in this case, 2) but one along $x$ and the other along $y$. Shearing in the $x$ direction leaves the "base" of the box in place but moves the other points to the right.

Now consider this transformation process. When shearing in the $x$ direction we want the new $x$ co-ordinate to increase by an amount proportional to its $y$ co-ordinate, while the $y$ co-ordinate remains unchanged. In other words, $(x, y) \to (x + \lambda y, y)$ where $\lambda$ is some constant. Referring back to the equations for a general transformation matrix we see that $a = d = 1$, $b = \lambda$, $c = 0$. The same process can be applied for shearing along the $y$ axis, resulting in a general transformation matrix with $a = d = 1$, $c = \lambda$, $b = 0$.

We can combine these into a single general transformation matrix for shearing along the $x$ and $y$ axes with proportionality constants $\lambda_x$ and $\lambda_y$ respectively

\[
T_{shear} = \begin{pmatrix} 1 & \lambda_x \\ \lambda_y & 1 \end{pmatrix}
\]

Page 282-284
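The two shears of the unit square described in section 3.4 (proportionality constant 2, once along $x$ and once along $y$) can be sketched in Python as follows. The helper names are hypothetical and the points are stored one per column.

```python
def matmul(A, B):
    """Matrix product of A (n x m) and B (m x q), as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def shear_matrix(lambda_x, lambda_y):
    """Shear along x by lambda_x and along y by lambda_y."""
    return [[1, lambda_x],
            [lambda_y, 1]]


# Unit square: (0, 0), (0, 1), (1, 0), (1, 1)
square = [[0, 0, 1, 1],
          [0, 1, 0, 1]]

print(matmul(shear_matrix(2, 0), square))   # [[0, 2, 1, 3], [0, 1, 0, 1]]
print(matmul(shear_matrix(0, 2), square))   # [[0, 0, 1, 1], [0, 1, 2, 3]]
```

In the first result the two points on the $x$-axis, $(0, 0)$ and $(1, 0)$, are unchanged while the top edge moves two units to the right, as the description of the $x$ shear says it should.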

3.5. Translation

It may seem odd to leave translation until after these other, somewhat more complicated transformations. But there is a good reason for this. All of the previous transformations are multiplicative; we take our original co-ordinates, multiply them by some matrix, and get the desired result. This is not the case for transformations where we want to take every point and translate them by the same amount. When considered in two dimensions we want new co-ordinates $x^*$ and $y^*$ given by

\[ x^* = x + t_x \]

\[ y^* = y + t_y \]

It is clear from the general transformation matrix equations that we cannot do this with the current $2 \times 2$ matrix; there is no way to get the above equations (the closest you can get is the shear equations, but in that case the extra term added on is a function of $x$ or $y$, whereas we require a constant). To achieve this, we need to think bigger and remember that our $(x, y)$ co-ordinate system can be described as a plane in three dimensional space, but with a constant value for $z$.

So, we have our points in the three dimensional form $X = \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$ and a general $3 \times 3$ transformation matrix

\[
T = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]

which in turn gives

\[
TX = \begin{pmatrix} a_{11}x + a_{12}y + a_{13} \\ a_{21}x + a_{22}y + a_{23} \\ a_{31}x + a_{32}y + a_{33} \end{pmatrix}
\]

By comparison with the equations for the new co-ordinates at the start of this section it is possible to define some of these variables. We see that $a_{11} = a_{22} = a_{33} = 1$ and $a_{12} = a_{21} = a_{31} = a_{32} = 0$, but also that $a_{13} = t_x$ and $a_{23} = t_y$. Substituting these back in to the $3 \times 3$ general transformation matrix results in the final form of the translation matrix given by

\[
T_{trans} = \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}
\]

Page 284-287
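A short Python sketch shows the extra row of 1s doing its job. The helper names are hypothetical, and the translation of $(2, -1)$ is an arbitrary choice for illustration.

```python
def matmul(A, B):
    """Matrix product of A (n x m) and B (m x q), as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def translation_matrix(tx, ty):
    """3 x 3 matrix that shifts every point by (tx, ty)."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]


# Unit square in the three dimensional form: x row, y row, then a row of 1s
square = [[0, 0, 1, 1],
          [0, 1, 0, 1],
          [1, 1, 1, 1]]

moved = matmul(translation_matrix(2, -1), square)
print(moved)   # [[2, 2, 3, 3], [-1, 0, -1, 0], [1, 1, 1, 1]]
```

Every $x$ value has increased by 2 and every $y$ value has decreased by 1, while the bottom row of 1s is unchanged and so is ready for any further transformation.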

3.6. Composite transformations

All of the transformations seen so far perform a single operation and, in some cases, only perform it about a specific reference point (for example, rotations occur about the origin). What if we want to perform multiple transformations? What if I want to rotate data about some point other than the origin?

The answer is simple. If you want to do multiple transformations, pre-multiply the data matrix by each transformation matrix in turn. You must be careful to preserve the order of multiplication: $T_{rot}T_{trans}X$ translates the data then rotates it.

Composite transformations have another property that is particularly useful when performing transformations on multiple data sets. So long as the order of multiplication is preserved, the following is valid

\[
T_{rot}(T_{trans}X) \equiv (T_{rot}T_{trans})X
\]

This may seem obvious and pointless, but it has real use in data analysis and computer programming (particularly in computer graphics). The left hand side of the equation states that I translate the data and then rotate it, but as each data point is different I would need to perform these two steps for every point. If, however, I calculate $T_{rot\,trans} = T_{rot}T_{trans}$ I have a single matrix that I can use to pre-multiply every data point. In this way, very complicated transformations can be reduced to a single matrix before it is applied to any data set. Nifty.

Pages 287-290
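The sketch below illustrates both points of this section: the combined matrix gives the same answer as applying the steps one at a time, and swapping the order gives a different answer. It makes one assumption beyond the text: to multiply a rotation with the $3 \times 3$ translation matrix, the $2 \times 2$ rotation is placed in the top left of a $3 \times 3$ matrix with 1 in the bottom right corner. A 90° rotation is used so that all entries are whole numbers; the names are hypothetical.

```python
def matmul(A, B):
    """Matrix product of A (n x m) and B (m x q), as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def translation_matrix(tx, ty):
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]


# Rotation by 90 degrees anticlockwise (cos = 0, sin = 1), embedded in 3 x 3
# form. This embedding is an assumption of the sketch, not from the notes.
T_rot = [[0, -1, 0],
         [1, 0, 0],
         [0, 0, 1]]
T_trans = translation_matrix(2, 0)

# Two points, (1, 0) and (0, 1), in (x, y, 1) form, one per column
X = [[1, 0],
     [0, 1],
     [1, 1]]

step_by_step = matmul(T_rot, matmul(T_trans, X))   # translate, then rotate
T_combined = matmul(T_rot, T_trans)                # computed once
all_at_once = matmul(T_combined, X)
other_order = matmul(matmul(T_trans, T_rot), X)    # rotate, then translate

print(step_by_step)   # [[0, -1], [3, 2], [1, 1]]
print(all_at_once)    # [[0, -1], [3, 2], [1, 1]]
print(other_order)    # [[2, 1], [1, 0], [1, 1]]
```

With many data points the saving comes from computing `T_combined` once and reusing it, rather than applying two matrices to every point.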

CHAPTER 4

Using matrices to solve equations

The previous chapter explored the use of matrices when manipulating sets of data and graphs. The same matrix methods can be used in a different context to solve systems of linear algebraic equations. We shall consider three different approaches to solving linear equations using matrix methods, each having its own strengths (and weaknesses) as a technique.

However, before we jump into solving equations in this manner, we need to be able to express linear algebraic equations in matrix form. In fact, you have already seen it in the introduction to chapter 3 but in reverse (that is, going from matrix to linear algebraic form).

If we have the following equations

\[ a_{11}x + a_{12}y = k_1 \]

\[ a_{21}x + a_{22}y = k_2 \]

then these two equations can be written in the following form

\[
\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} k_1 \\ k_2 \end{pmatrix}
\]

You will have seen this already as a means of describing the transformation between co-ordinate systems as part of the Special Relativity notes in the PHY101 module; this is where we prove that this is both a valid and a useful way to handle such equation forms within physics. From the previous notes (section 3.4), you may realise that the Galilean transformation is analogous to a shear mapping on the co-ordinate vector $\begin{pmatrix} t \\ x \end{pmatrix}$, where the offset for the new $x$ co-ordinate depends on time.

4.1. Using the Inverse Matrix

We can call upon an interesting property of matrix multiplication that you have already seen in section 3.6; so long as the order of multiplication is preserved, $(AB)C \equiv A(BC)$. The other property we need is that the product of a matrix and its inverse gives an identity matrix. So, if

\[ AX = B \]

then

\[ A^{-1}AX = A^{-1}B \]

\[ IX = X = A^{-1}B \]

This only becomes clearer with an example, as general terms just lead to a messy set of equations that cannot be cancelled down. So, example time.

We can write the two simultaneous equations

\[ 7x + 2y = 12 \]

\[ 3x + y = 5 \]

in matrix form as

\[
\begin{pmatrix} 7 & 2 \\ 3 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 12 \\ 5 \end{pmatrix}
\]

The first step is to calculate the inverse of the $2 \times 2$ matrix, which should be second nature to you by now (if not, go back and practise!). Here $|A| = 7 \times 1 - 2 \times 3 = 1$, so we find that

\[
A^{-1} = \begin{pmatrix} 1 & -2 \\ -3 & 7 \end{pmatrix}
\]

and so we can now solve the simultaneous equations using $X = A^{-1}B$:

\[
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ -3 & 7 \end{pmatrix} \begin{pmatrix} 12 \\ 5 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}
\]

Therefore, we have the solutions $x = 2$, $y = -1$. Putting these back into the original simultaneous equations shows that these are indeed the correct solutions (reassuring, really).

Pages 305-310
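The worked example can be reproduced with a short Python sketch of $X = A^{-1}B$ for the $2 \times 2$ case. It assumes integer coefficients so that `fractions.Fraction` gives exact answers; the function names are hypothetical.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix using the formula of section 2.3.1."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: no inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def solve_with_inverse(A, B):
    """Solve AX = B for a 2 x 2 matrix A and right hand side B = [k1, k2]."""
    A_inv = inverse_2x2(A)
    return [sum(A_inv[i][j] * B[j] for j in range(2)) for i in range(2)]


# 7x + 2y = 12 and 3x + y = 5
A = [[7, 2],
     [3, 1]]
B = [12, 5]

x, y = solve_with_inverse(A, B)
print(x, y)   # 2 -1
```

If the determinant is zero the function raises an error, because a singular matrix has no inverse and this method cannot be used.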

4.2. Cramer's Rule

4.2.1. Proof for 2 x 2 matrix. We start by considering two general simultaneous equations

\[ a_{11}x + a_{12}y = k_1 \]

\[ a_{21}x + a_{22}y = k_2 \]

where the $a$ and $k$ terms are all constants. Rearrange the first equation to express $x$ in terms of $y$

\[ x = \frac{k_1 - a_{12}y}{a_{11}} \]

and substitute this in to the second equation

\[
\begin{aligned}
a_{21}\left(\frac{k_1 - a_{12}y}{a_{11}}\right) + a_{22}y &= k_2 \\
\frac{a_{21}k_1}{a_{11}} - \frac{a_{21}a_{12}y}{a_{11}} + a_{22}y &= k_2 \\
\frac{a_{21}k_1}{a_{11}} + y\left(\frac{a_{22}a_{11} - a_{12}a_{21}}{a_{11}}\right) &= k_2 \\
y\left(\frac{a_{22}a_{11} - a_{12}a_{21}}{a_{11}}\right) &= \frac{a_{11}k_2 - a_{21}k_1}{a_{11}} \\
y &= \frac{a_{11}k_2 - a_{21}k_1}{a_{22}a_{11} - a_{12}a_{21}}
\end{aligned}
\]

\[
\therefore y = \frac{\begin{vmatrix} a_{11} & k_1 \\ a_{21} & k_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}
\]

Doing the same process but rearranging the first equation to express $y$ in terms of $x$ gives

\[
x = \frac{\begin{vmatrix} k_1 & a_{12} \\ k_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}
\]

These two expressions are known as Cramer's rule. Notice that to find $x$, the (determinant of the) numerator has the first column (that is, the one representing the coefficients of $x$) replaced by the column matrix on the right hand side. Similarly, to find $y$ the second column of the numerator is replaced. You should note that the (determinant of the) denominator is the same in both cases, so you only have to work it out once.

Cramer's rule cannot always be used to solve linear equations. If $\det A = 0$ the rule cannot be applied, as you would be dividing by zero, which is undefined. In this case, you would need to use another method.

4.2.2. Extension to 3 x 3 matrices. Cramer's rule does not apply only to 2 x 2 matrices, but the proof for matrices of other sizes is time consuming and beyond the level of this course. It will be no surprise that for a 3 x 3 matrix we do the same process as above, but with the addition that to calculate $z$ we replace the third column in the matrix $A$ with the column from the right hand side. The general form of Cramer's rule for 3 variables is

\[
x = \frac{\begin{vmatrix} k_1 & a_{12} & a_{13} \\ k_2 & a_{22} & a_{23} \\ k_3 & a_{32} & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}
; \quad
y = \frac{\begin{vmatrix} a_{11} & k_1 & a_{13} \\ a_{21} & k_2 & a_{23} \\ a_{31} & k_3 & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}
; \quad
z = \frac{\begin{vmatrix} a_{11} & a_{12} & k_1 \\ a_{21} & a_{22} & k_2 \\ a_{31} & a_{32} & k_3 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}
\]

Pages 301-304
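Cramer's rule translates almost directly into code: replace one column at a time with the right hand side and divide the two determinants. The Python sketch below assumes integer coefficients (so that `fractions.Fraction` is exact) and uses hypothetical names. The two test systems are the ones used for the worked examples in sections 4.1 and 4.3.2 of these notes.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def cramer(A, k):
    """Solve AX = k by Cramer's rule; A is square, k is the right hand side."""
    d = det(A)
    if d == 0:
        raise ValueError("det A = 0: Cramer's rule cannot be applied")
    solution = []
    for col in range(len(A)):
        # Replace column `col` of A with the right hand side column
        replaced = [row[:col] + [k[r]] + row[col + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(replaced), d))
    return solution


print(*cramer([[7, 2], [3, 1]], [12, 5]))                            # 2 -1
print(*cramer([[2, 1, -1], [1, 2, -1], [4, 2, 4]], [1, 6, 8]))       # -1 4 1
```

The denominator `d` is computed once and reused for every unknown, mirroring the remark in section 4.2.1 that you only have to work it out once.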

4.3. Gaussian Elimination

The previous two methods take a series of linear algebraic equations and change them into a matrix equivalent of simultaneous equations. An alternative method is to produce a single matrix that contains all of the elements of the initial equations. We start with the initial general equations

\[ a_{11}x + a_{12}y = k_1 \]

\[ a_{21}x + a_{22}y = k_2 \]

and convert this into a single matrix, known as the augmented matrix

\[
\begin{pmatrix} a_{11} & a_{12} & k_1 \\ a_{21} & a_{22} & k_2 \end{pmatrix}
\]

The Gaussian elimination method requires us to manipulate the augmented matrix so that it is given in the row-echelon form, in which the main diagonal contains 1s and any element under the main diagonal is zero. For example, $\begin{pmatrix} 1 & -3 & 7 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{pmatrix}$ is in row-echelon form. In order to reduce matrices to row-echelon form there are three rules that you can use:

(1) Rows can be swapped around

(2) Multiply any row by a non-zero constant

(3) Add a multiple of one row to another, or subtract it from another

Further manipulation gives an augmented matrix that explicitly gives the solutions to the initial equations. There is no single prescribed sequence of operations that will work every time: we are aiming to reduce some elements to 1s and 0s, and the steps needed to do that depend on the particular matrix. So, to demonstrate we shall look at two worked examples, a 2x2 and a 3x3. It should be easy to see that applying exactly the same manipulations to other matrices will not succeed.

4.3.1. Gaussian Elimination for 2x2 matrix. We start with the simultaneous equations

3x− y = 7

2x + 4y = 14

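which gives the augmented matrix

$$\begin{pmatrix} 3 & -1 & 7 \\ 2 & 4 & 14 \end{pmatrix}$$

First, swap the rows around, $R_1 \leftrightarrow R_2$:

$$\begin{pmatrix} 2 & 4 & 14 \\ 3 & -1 & 7 \end{pmatrix}$$

Divide row 1 by two, $R_1/2$:

$$\begin{pmatrix} 1 & 2 & 7 \\ 3 & -1 & 7 \end{pmatrix}$$

Subtract 3 × row 1 from row 2, $R_2 - 3R_1$:

$$\begin{pmatrix} 1 & 2 & 7 \\ 0 & -7 & -14 \end{pmatrix}$$

Divide row 2 by −7, $R_2/(-7)$:

$$\begin{pmatrix} 1 & 2 & 7 \\ 0 & 1 & 2 \end{pmatrix}$$

Finally, subtract 2 × row 2 from row 1, $R_1 - 2R_2$:

$$\begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \end{pmatrix}$$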

From this we have the solution x = 3, y = 2. Putting these into the original equations as a sanity check

shows that they are indeed correct.
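The same five row operations can be replayed in code as a check on the arithmetic. This is a small Python sketch under the assumption that rows are stored as lists; the helper names `swap`, `scale` and `add_multiple` are made up for illustration and simply mirror the three rules of section 4.3.

```python
from fractions import Fraction


def swap(M, i, j):
    """Rule 1: swap rows i and j."""
    M[i], M[j] = M[j], M[i]


def scale(M, i, c):
    """Rule 2: multiply row i by the non-zero constant c."""
    M[i] = [c * v for v in M[i]]


def add_multiple(M, i, j, c):
    """Rule 3: replace row i by row i + c * row j."""
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]


# Augmented matrix for 3x - y = 7, 2x + 4y = 14 (rows are indexed from 0).
M = [[Fraction(3), Fraction(-1), Fraction(7)],
     [Fraction(2), Fraction(4), Fraction(14)]]

swap(M, 0, 1)                  # R1 <-> R2
scale(M, 0, Fraction(1, 2))    # R1 / 2
add_multiple(M, 1, 0, -3)      # R2 - 3 R1
scale(M, 1, Fraction(-1, 7))   # R2 / (-7)
add_multiple(M, 0, 1, -2)      # R1 - 2 R2

x, y = M[0][2], M[1][2]
print(x, y)
```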

As you can see, I have taken this step-by-step and only performed one operation at any time. This can

be somewhat laborious, so when you get better practised you can do multiple steps. This is particularly

useful for larger size matrices, as you will now see.

4.3.2. Gaussian Elimination for 3x3 matrix. We start with three simultaneous equations

2x + y − z = 1

x + 2y − z = 6

4x + 2y + 4z = 8

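The augmented matrix for these equations is

$$\begin{pmatrix} 2 & 1 & -1 & 1 \\ 1 & 2 & -1 & 6 \\ 4 & 2 & 4 & 8 \end{pmatrix}$$

$R_1 \leftrightarrow R_2$, $\dfrac{R_3}{2}$:

$$\begin{pmatrix} 1 & 2 & -1 & 6 \\ 2 & 1 & -1 & 1 \\ 2 & 1 & 2 & 4 \end{pmatrix}$$

$R_2 - 2R_1$, $R_3 - 2R_1$:

$$\begin{pmatrix} 1 & 2 & -1 & 6 \\ 0 & -3 & 1 & -11 \\ 0 & -3 & 4 & -8 \end{pmatrix}$$

$R_3 - R_2$:

$$\begin{pmatrix} 1 & 2 & -1 & 6 \\ 0 & -3 & 1 & -11 \\ 0 & 0 & 3 & 3 \end{pmatrix}$$

$\dfrac{R_3}{3}$, $R_2 - \dfrac{R_3}{3}$:

$$\begin{pmatrix} 1 & 2 & -1 & 6 \\ 0 & -3 & 0 & -12 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

$-\dfrac{R_2}{3}$:

$$\begin{pmatrix} 1 & 2 & -1 & 6 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

$R_1 - 2R_2 + R_3$:

$$\begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 4 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$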

And so we have our solutions, x = −1, y = 4, z = 1. Check these in the original equations.
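For checking hand calculations, the reduction can also be carried out systematically. The sketch below is one possible Python version, not the method of the notes: the function name `gauss_jordan` is an assumption, and the routine chooses its row operations from the entries it finds at each step, so the intermediate matrices differ from the hand-worked ones above even though the final solution is the same.

```python
from fractions import Fraction


def gauss_jordan(aug):
    """Reduce an n x (n+1) augmented matrix and return the solution column."""
    M = [[Fraction(v) for v in row] for row in aug]
    n = len(M)
    for col in range(n):
        # Rule 1: swap in a row whose entry in this column is non-zero.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the equations have no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Rule 2: divide the row so the diagonal element becomes 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Rule 3: subtract multiples of this row to zero the rest of the column.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[-1] for row in M]


solution = gauss_jordan([[2, 1, -1, 1],
                         [1, 2, -1, 6],
                         [4, 2, 4, 8]])
print(solution)
```

Calling the same function on the 2x2 augmented matrix of section 4.3.1 returns the values x = 3, y = 2 found there.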

As you have seen, there is nothing difficult in terms of the level of mathematics, but it is very easy to get lost or misplace a term. Like everything else in this module, the only way to crack it is to practise! There are a lot of suitable exercises in the relevant block in the textbook.

Pages 313-323.

CHAPTER 5

Eigenvalues and Eigenvectors

This subject will at first seem somewhat abstract, but once you have a good understanding of the

process for determining eigenvalues and eigenvectors we will move on to looking at applications and

how eigenvectors relate to the matrices you have already seen earlier in this unit.

5.1. Solutions of Equations

We will start by considering solutions of the following simultaneous equations

ax + by = 0

cx + dy = 0

where a, b, c, d are constants. These equations have the trivial solution x = 0, y = 0, but we will consider other solutions where $x \neq 0$ and $y \neq 0$, known as the non-trivial solutions. Rearranging the above equations to express y in terms of x gives

$$y = -\frac{a}{b}x$$

$$y = -\frac{c}{d}x$$

and for non-trivial solutions we see that $\frac{a}{b} = \frac{c}{d}$. We introduce an additional parameter t which may at first seem like an unnecessary step, but it allows us to later determine the eigenvectors without having x depend on y and vice versa. In this case, we have the solutions

$$x = t, \quad y = -\frac{a}{b}t$$

$$x = t, \quad y = -\frac{c}{d}t$$

It is tempting to look at this and assume that a = c and b = d, but whilst this is a solution it is not the only solution. Other solutions are a = 2c & b = 2d, a = −13c & b = −13d, a = (π/2)c & b = (π/2)d... In fact,


there are an infinite number of non-trivial solutions when the second of the simultaneous equations is some multiple of the first, or when

$$a = \alpha c, \quad b = \alpha d$$

where α is some constant. Looking at the condition for non-trivial solutions (i.e. $\frac{a}{b} = \frac{c}{d}$) allows us to determine the matrix form of the condition as follows:

$$\text{If } \frac{a}{b} = \frac{c}{d}, \text{ then } ad = bc, \text{ or } ad - bc = 0$$

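By noting that the simultaneous equations presented at the start of this section can be written in matrix form as

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{or} \quad AX = 0$$

we can see that the condition for non-trivial solutions to exist is $ad - bc = \det A = 0$.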
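As a quick illustration of this condition, the following Python sketch tests whether det A = 0 and, if so, builds a non-trivial solution from the parameter t. The function name `nontrivial_solution` and the example numbers are assumptions made for illustration, and the sketch assumes b is non-zero so that y can be written in terms of x.

```python
from fractions import Fraction


def nontrivial_solution(a, b, c, d, t=1):
    """Return (x, y) solving ax + by = 0 and cx + dy = 0, or None.

    None means det A is non-zero, so only x = y = 0 satisfies both equations.
    Assumes b != 0.
    """
    if a * d - b * c != 0:
        return None
    x = Fraction(t)
    y = -Fraction(a, b) * t   # from ax + by = 0 with x = t
    return x, y


print(nontrivial_solution(2, 4, 1, 2))   # second equation is half the first
print(nontrivial_solution(3, 2, 1, 4))   # det A = 10, trivial solution only
```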

Page 326-329

5.2. Eigenvalues and the Characteristic Equation

We shall now consider the more general simultaneous equations

ax + by = λx

cx + dy = λy

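or in matrix form

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \lambda\begin{pmatrix} x \\ y \end{pmatrix}$$

$$AX = \lambda X$$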


This equation can be rearranged to determine whether these equations have non-trivial solutions, but

simple rearrangement will not work. (A− λ)X is invalid because, as you have seen in section 1.2.1,

addition and subtraction of matrices can only be performed if they are of the same size. As A is a

matrix and λ is a constant, we must do an additional step so that the subtraction can be performed.

This additional step is multiplying λ by 1, or in matrix terms we multiply λ by the identity matrix

(we have already established that multiplication by the identity matrix returns the original matrix).

Thus

AX = λIX

AX − λIX = 0

(A− λI)X = 0

and so to have non-trivial solutions we require that

|A− λI| = 0

At this point, it is easier to continue with a specific example rather than using general equations as

we can simplify when numbers are introduced in place of unknown variables. Take the equations

3x + 2y = λx

1x + 4y = λy

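which can be rewritten in matrix form as

$$\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \lambda\begin{pmatrix} x \\ y \end{pmatrix}$$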

In order for these equations to have non-trivial solutions we require that

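$$|A - \lambda I| = 0$$

$$\left|\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}\right| = 0$$

$$\begin{vmatrix} 3-\lambda & 2 \\ 1 & 4-\lambda \end{vmatrix} = 0$$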


This determinant can easily be calculated and reduced to the simplest form known as the characteristic equation, that is

$$(3-\lambda)(4-\lambda) - (2 \times 1) = 0$$

$$12 - 4\lambda - 3\lambda + \lambda^2 - 2 = 0$$

$$\lambda^2 - 7\lambda + 10 = 0$$

We can solve the characteristic equation $\lambda^2 - 7\lambda + 10 = 0$ using the quadratic formula, but in this case it is easier to factorise the equation to give

$$(\lambda - 5)(\lambda - 2) = 0$$

$$\therefore \lambda = 5 \text{ and } \lambda = 2$$

These values of λ are the eigenvalues of the initial equations, and substitution of these eigenvalues

into the original equations gives a system that has non-trivial solutions. In the next section we will

determine these solutions and explore what relevance they have to the matrix systems and applications

seen previously in this unit.
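The quadratic-formula route can be sketched in a few lines of Python. Expanding $|A - \lambda I| = 0$ for a general 2x2 matrix gives $\lambda^2 - (a + d)\lambda + (ad - bc) = 0$, which is what the code solves. The function name `eigenvalues_2x2` is illustrative, and the sketch only handles the case of real roots.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Real roots of lambda^2 - (a + d) lambda + (ad - bc) = 0."""
    trace = a + d
    det = a * d - b * c
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        raise ValueError("the eigenvalues are complex")
    root = math.sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2


print(eigenvalues_2x2(3, 2, 1, 4))   # (5.0, 2.0)
```

For the example above the coefficients are a + d = 7 and ad − bc = 10, matching the characteristic equation obtained by hand.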

Pages 330-334

5.3. Eigenvectors

We found in the previous section that the equations

3x + 2y = λx

1x + 4y = λy

have eigenvalues λ = 5 and λ = 2. For each of these eigenvalues there exists an eigenvector which is the non-trivial solution to the equations. As there were two eigenvalues in the example given, we will need to solve twice to find each corresponding non-trivial solution; if there were three eigenvalues we would need to solve it three times, and so on. Be aware though that in some circumstances the solutions may be identical.

Solution 1: λ = 5


We start with the initial equation AX = λX shown in section 5.2, but now substitute in the value of

the eigenvalue. Thus

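$$(A - \lambda I)X = 0$$

$$\left[\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} - \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix}\right]\begin{pmatrix} x \\ y \end{pmatrix} = 0$$

$$\begin{pmatrix} -2 & 2 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0$$

which can be written as individual equations

−2x + 2y = 0

x − y = 0

It should hopefully be clear that the first equation is simply a multiple of the second (namely, multiplied by −2), which means there is an infinite number of solutions so long as x = y. As previously discussed in section 5.1 we introduce another unknown parameter t which gives solutions

x = t, y = t

that can be written in matrix form

$$X = \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} t \\ t \end{pmatrix} = t\begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

such that the eigenvector corresponding to λ = 5 is $t\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ for any non-zero number t.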

Solution 2: λ = 2


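$$(A - \lambda I)X = 0$$

$$\left[\begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} - \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}\right]\begin{pmatrix} x \\ y \end{pmatrix} = 0$$

$$\begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0$$

Both rows give the same equation, x + 2y = 0, and therefore

x = t, y = −t/2

thus

$$X = t\begin{pmatrix} 1 \\ -\tfrac{1}{2} \end{pmatrix}$$

or equivalently any multiple of $\begin{pmatrix} 2 \\ -1 \end{pmatrix}$.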
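A reliable sanity check on any eigenvector is to substitute it back into AX = λX. The Python sketch below does this for the example matrix; the function names `eigenvector_2x2` and `is_eigenvector` are illustrative, and the construction uses the first row of (A − λI)X = 0 with x = t, assuming the top-right element b is non-zero.

```python
from fractions import Fraction


def eigenvector_2x2(A, lam, t=1):
    """Eigenvector (x, y) of the 2x2 matrix A for eigenvalue lam, with x = t.

    Solves (a - lam) x + b y = 0 for y; assumes b != 0.
    """
    (a, b), (c, d) = A
    x = Fraction(t)
    y = -Fraction(a - lam, b) * t
    return x, y


def is_eigenvector(A, lam, X):
    """True if A X equals lam X exactly."""
    (a, b), (c, d) = A
    x, y = X
    return (a * x + b * y, c * x + d * y) == (lam * x, lam * y)


A = [[3, 2], [1, 4]]
for lam in (5, 2):
    X = eigenvector_2x2(A, lam)
    print(lam, X, is_eigenvector(A, lam, X))
```

Changing t rescales both components together, so the check passes for every non-zero t.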

Pages 335-339.

5.4. Relevance of Eigenvectors

After all of this rather abstract mathematics, it is useful to find out what eigenvectors can be used for. The first thing you must recall (particularly from chapter 3) is that a matrix describes an operation on a data set or series of co-ordinates. The eigenvectors are vectors that do not change direction after the operation has been performed. These directions are characteristic of the matrix, which is where the use of the German word eigen (meaning "own" or "characteristic") comes from. This is a useful process in computer games design, but also in vector field analysis, flow dynamics, meteorology, and many others.

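We will revisit one of the transformation matrices to highlight this. Consider the shear matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ that we have already seen will shear along the x-axis with a shear factor of 1. First we calculate the eigenvalues

$$|A - \lambda I| = 0$$

$$\left|\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}\right| = 0$$

$$\begin{vmatrix} 1-\lambda & 1 \\ 0 & 1-\lambda \end{vmatrix} = 0$$

$$(1-\lambda)^2 = 0$$

$$\lambda^2 - 2\lambda + 1 = 0$$

which gives eigenvalues λ = 1 (twice). From this the eigenvector can be determined

$$\left[\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right]\begin{pmatrix} x \\ y \end{pmatrix} = 0$$

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0$$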

meaning that y = 0 but x can take any value. Thus the eigenvector of a shear matrix along the x-axis is $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and so any vector that originally points in the x-direction remains so after the shear transformation. This should not be too surprising if you picture the process of shearing, but we shall enlist the help of a smiling lady to emphasise the point. In the picture below, the left image is the original and the right has had a horizontal shear applied. Note that the horizontal vector arrow remains horizontal, but the vertical vector in the initial image points in a different direction after shearing.
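The behaviour of the two arrows can be reproduced numerically by applying the shear matrix to a horizontal and a vertical unit vector. This is a minimal Python sketch; the helper name `apply` is an assumption made for illustration.

```python
def apply(A, v):
    """Multiply the 2x2 matrix A by the column vector v = (x, y)."""
    (a, b), (c, d) = A
    x, y = v
    return (a * x + b * y, c * x + d * y)


shear = [[1, 1], [0, 1]]

print(apply(shear, (1, 0)))   # (1, 0): the horizontal vector is unchanged
print(apply(shear, (0, 1)))   # (1, 1): the vertical vector now leans over
```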
