ECON 381 Coursebook 2013


  • Economics 381: Foundations of Economic Analysis

    2013 Semester Two

    Table of Contents

    1. Course outline

    2. Additional Course Information

    3. Course Notes


    Course Outline 2013

    ECON 381: FOUNDATIONS OF ECONOMIC ANALYSIS (15 POINTS)

    Semester 2 (1135)

    Course Prescription

    A grounding in the quantitative methods of economic analysis with application to

    commonly used formal models in microeconomics, macroeconomics and econometrics. The emphasis will be on the unifying structure of the theory with a systematic treatment

    of the mathematical techniques involved. Preparation for continuing study in economic

    theory and econometrics.

    Programme and Course Advice

    Prerequisite: ECON 201 Microeconomics

    ECON 381 is a prerequisite for entry into the Honours and Master's programmes in Economics.

    Goals of the Course

The goal of the course is to familiarise students with the most fundamental theoretical models and methods employed in economic analysis. It is intended that students emerge with a solid preparation for further study in economic theory and econometrics.

    Learning Outcomes

    By the end of this course it is expected that the student will:

    1. be familiar with the most commonly used quantitative methods of economic analysis;

    2. know the definition of a number of mathematical concepts central to economic

    analysis;

    3. understand the ideas behind these concepts;

    4. be able to apply these methods and concepts in a number of standard economic settings;

    5. be well prepared to undertake graduate study in the core areas of economics.

    Content Outline

    The course will develop and apply basic techniques for economic analysis. Applications will be in the context of a variety of economic models, including models from single agent

    microeconomic theory, partial and general equilibrium theory, econometrics and macroeconomics.

    The following topics will be covered.

    Week 1: Logic and Set Theory, Functions and Binary Relations

    Week 2: Introduction to Mathematical Analysis: Sets, Spaces, and Topological Structure

    Week 3: Linear Algebra

    Week 4: Linear Algebra (cont)

    Week 5: Linear Algebra (cont)

    Week 6: Convexity


    Content Outline continued

    Week 7: Constrained optimisation: Basic Theory and Lagrangians

    Week 8: Macroeconomic Applications

    Week 9: Constrained optimisation: Caveats and Extensions

    Week 10: Constrained optimisation: The Implicit Function Theorem

    Week 11: Constrained optimisation: The Envelope Theorem

    Week 12: Microeconomic Applications

    Learning and Teaching

This course will be taught in the second semester. There will be 3 hours of lectures per week (Monday 3-5pm and Friday 8-10am) plus a one-hour tutorial which students are expected to attend. In some weeks there will be an optional full-class tutorial in the unused lecture hour. There will be regular homework which students are expected to complete.

    Teaching Staff

    Associate Professor John Hillas, Course Coordinator, Room 6111, 6th floor, Owen G.

    Glenn building, Telephone: 923 7349, email: [email protected]

    Dr Ping Yu, Room 6103, 6th floor, Owen G. Glenn building, Telephone: 923 8312,

    email: [email protected]

    Learning Resources

    There is no prescribed text for this course but there is a Coursebook containing notes on

    the material. This coursebook will be available for purchase and online. For students who

    would like a text to accompany the course, we recommend the following:

    C.P. Simon and L. Blume, Mathematics for Economists, 1994, W.W. Norton and Co.

    This book covers everything that we do in ECON 381, and lots more besides. It would be a good investment for students planning to continue to postgraduate study in Economics. It

    also contains many useful exercises on the ECON 381 material. A copy is available from the General Library's Short Loan collection.

    The following supplementary references will also be useful, and all are available on Short

    Loan.

    A.K. Dixit, Optimization in Economic Theory, 2nd edition, 1990, Oxford University Press.

G.A. Jehle and P.J. Reny, Advanced Microeconomic Theory, 2nd edition, 2001, Addison-Wesley. (1st edition, 1998 is also suitable.)

    R. Garnier and J. Taylor, 100% Mathematical Proof, 1996, John Wiley and Sons.

    Assessment

Assessment will be based on two components: Coursework worth 40% of the total mark (one Test worth 30% and two Assignments each worth 5%), and a Final Examination worth 60%. Plussage does NOT apply.

Learning Outcome   Assignment 1   Assignment 2   Test   Final Examination
1                       X              X           X            X
2                       X              X           X            X
3                       X              X           X            X
4                       X              X           X            X
5                       X              X           X            X

  • Economics 381: Foundations of Economic Analysis

    2013 Semester Two Additional Course Information

    INCLUSIVE LEARNING

    Students are urged to discuss privately any impairment-related requirements face-to-face

    and/or in written form with the course convenor/lecturer and/or tutor.

    STUDENT FEEDBACK

    Student feedback is encouraged in this course. During the semester, students may

    directly submit their feedback to the lecturer through a face-to-face appointment, or they

    may wish to submit feedback through the class representative.

    Class representatives

At the beginning of each semester, you will elect a class representative for the paper.[1] The role of the class representative is to gather feedback from students in the course and bring this to the lecturer and/or the Department. Class representatives' email addresses are posted on Cecil and you are encouraged to contact them with feedback relating to the course. You are also welcome to talk to the class representatives in person.

    Staff-Student Consultative Committee

    Class representatives also submit feedback to the Department of Economics Staff Student

    Consultative Committee (SSCC), which meets up to three times per semester to gain

    feedback regarding the course. Only class representatives may attend the SSCC

    meetings, and they will ask the class for feedback before the SSCC meeting.

    Course and teaching evaluations

    At the end of the semester, you will have the opportunity to submit an evaluation of the

    course in a formative feedback questionnaire.

[1] An election will not take place if the number of applicants for the class representative positions equals the number of positions available.

  • ECON 381 SC Foundations Of Economic Analysis

    2012

    John Hillas

    University of Auckland

    Dmitriy Kvasov

    University of Adelaide

  • Contents

Chapter 1. Logic, Sets, Functions, and Spaces 1
  1. Logic 1
  2. Proofs 3
  3. Sets 4
  4. Binary Relations 6
  5. Functions 7
  6. Spaces 9
  7. Metric Spaces and Continuous Functions 10
  8. Open sets, Compact Sets, and the Weierstrass Theorem 11
  9. Sequences and Subsequences 12
  10. Linear Spaces 16

Chapter 2. Linear Algebra 17
  1. The Space Rn 17
  2. Linear Functions from Rn to Rm 19
  3. Matrices and Matrix Algebra 20
  4. Matrices as Representations of Linear Functions 21
  5. Linear Functions from Rn to Rn and Square Matrices 24
  6. Inverse Functions and Inverse Matrices 24
  7. Changes of Basis 25
  8. The Trace and the Determinant 27
  9. Calculating and Using Determinants 29
  10. Eigenvalues and Eigenvectors 33

Chapter 3. Convex Sets 37
  1. Definition and Basic Properties 37
  2. Support and Separation 40

Chapter 4. Constrained Optimisation 45
  1. Constrained Maximisation 45
  2. Applications to Macroeconomic Theory 49
  3. Nonlinear Programming 53
  4. The Implicit Function Theorem 56
  5. The Theorem of the Maximum 58
  6. The Envelope Theorem 60
  7. Applications to Microeconomic Theory 64


  • CHAPTER 1

    Logic, Sets, Functions, and Spaces

    1. Logic

All the aspects of logic that we describe in this section are part of what is called propositional (or sentential) logic.

We start by supposing that we have a number of atomic statements, which we denote by lower case letters, p, q, r. Examples of such statements might be

Consumer 1 is a utility maximiser.
The apple is green.
The price of good 3 is 17.

We assume that each atomic statement is either true or false. Given these atomic statements we can form other statements using logical connectives. If p is a statement then ¬p, read not p, is the statement that is true precisely when p is false. If both p and q are statements then (p ∧ q), read p and q, is the statement that is true when both p and q are true and false otherwise. If both p and q are statements then (p ∨ q), read p or q, is the statement that is true when either p or q is true, that is, the statement that is false only if both p and q are false.

We could make do with these three symbols, together with brackets to group symbols and tell us what to do first. For example we could have the complicated statement (((p ∧ q) ∨ (p ∧ r)) ∨ ¬s). This means that at least one of two statements is true. The first is that either both p and q are true or both p and r are true. The second is that s is not true.

Exercise 1. Think about the meaning of the statement we have just considered. Can you see a more straightforward statement that would mean the same thing?

While we don't strictly need any more symbols it is certainly convenient to have at least a couple more. (In fact we don't actually need all the ones we have defined. If we have ¬ and ∧ we can define ∨ in terms of them. Similarly if we have ¬ and ∨ we can define ∧ in terms of them.) If both p and q are statements then (p → q), read if p then q or p implies q or p is sufficient for q or q is necessary for p, is the statement that is false when p is true and q is false and is true otherwise. Many people find this a bit nonintuitive. In particular, one might wonder about the truth of this statement when p is false and q is true. A simple (and correct) answer is that this is a definition. It is simply what we mean by the symbol and there isn't any point in arguing about definitions. However there is a sense in which the definition is what is implied by the informal statements. When we say if p then q we are saying that in any situation or state in which p is true then q is also true. We are not making any claim about what might or might not be the case when p is not true. So, in states in which p is not true we make no claim about q and so our statement is true whether q is true or false. Instead of (p → q) we can write (q ← p). In this case we are most likely to read the statement as q if p or q is necessary for p.

Exercise 2. We claimed above that if we have ¬ and ∧ we can define ∨ in terms of them and if we have ¬ and ∨ we can define ∧ in terms of them. Show how we would do this.

If (p → q) and (p ← q), that is (q → p), then we say that p if and only if q or p is necessary and sufficient for q and write (p ↔ q).

One powerful method of analysing logical relationships is by means of truth tables. A truth table lists all possible combinations of the truth values of the atomic statements and the associated truth values of the compound statements. If we have two atomic statements then the following table gives the four possible combinations of truth values.

p q
T T
F T
T F
F F

Now, we can add a column that would, for each combination of truth values of p and q, give the truth value of p → q, just as described above.

p q  p→q
T T   T
F T   T
T F   F
F F   T

Such truth tables allow us to see the logical relationship between various statements. Suppose we have two compound statements A and B and we form a truth table showing the truth values of A and B for each possible profile of truth values of the atomic statements that constitute A and B. If in each row in which A is true B is also true then statement A implies statement B. If statements A and B have the same truth value in each row then statements A and B are logically equivalent. For example I claim that the statement p → q we have just considered is logically equivalent to ¬p ∨ q. We can see this by adding columns to the truth table we have just considered. Let me add a column for ¬p and then one for ¬p ∨ q. (We only add the column for ¬p to make it easier.)

p q  p→q  ¬p  ¬p∨q
T T   T    F    T
F T   T    T    T
T F   F    F    F
F F   T    T    T

Since the third column and the fifth column contain exactly the same truth values we see that the two statements, p → q and ¬p ∨ q, are indeed logically equivalent.
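This column-by-column comparison is mechanical enough to automate. As an illustration only (not part of the coursebook), here is a short Python sketch that enumerates the four truth-value profiles and compares the two columns:

```python
from itertools import product

def implies(p, q):
    # The defining truth table of "p -> q": false only when
    # p is true and q is false.
    return not (p and not q)

# All four profiles of truth values for (p, q).
rows = list(product([True, False], repeat=2))

col_implies = [implies(p, q) for p, q in rows]
col_not_p_or_q = [(not p) or q for p, q in rows]

# The columns agree in every row, so the two statements are
# logically equivalent.
print(col_implies == col_not_p_or_q)  # True
```

This is exactly the truth-table argument above, just with the row-by-row comparison done by the machine.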

Exercise 3. Construct the truth table for the statement ¬(p ↔ q). Is it possible to write this statement using fewer logical connectives? Hint: why not start with just one?

Exercise 4. Prove that the following statements are equivalent:

(i) (p → q), ((¬p) ∨ q), and (q ← p),
(ii) p → q and (¬q) → (¬p).


In part (ii) the second statement is called the contrapositive of the first statement. Often if you are asked to prove that p implies q it will be easier to show the contrapositive, that is, that not q implies not p.

Exercise 5. Prove that the following statements are equivalent:

(i) ¬(p ∧ q) and (¬p) ∨ (¬q),
(ii) ¬(p ∨ q) and (¬p) ∧ (¬q).

These two equivalences are known as De Morgan's Laws.
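Both laws can be confirmed mechanically by the truth-table method described above. A Python sketch (illustrative only) that checks every profile of truth values:

```python
from itertools import product

# Check De Morgan's Laws on all four truth-value profiles.
for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))  # law (i)
    assert (not (p or q)) == ((not p) and (not q))  # law (ii)

print("Both equivalences hold in every row.")
```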

A tautology is a statement that is necessarily true. For example if the statements A and B are logically equivalent then the statement A ↔ B is a tautology. If A logically implies B then A → B is a tautology. We can check whether a compound statement is a tautology by writing a truth table for this statement. If the statement is a tautology then its truth value should be T in each row of its truth table.

A contradiction is a statement that is necessarily false, that is, a statement A such that ¬A is a tautology. Again, we can see whether a statement is a contradiction by writing a truth table for the statement.

    2. Proofs

We shall not give a systematic development of this topic. Rather we shall just collect a number of practical points about reading proofs and writing your own simple proofs.

When we are asked to prove something we have a set of assumptions or premises and are asked to prove a conclusion. Often the division between premises and conclusion is not obviously given. For example, if we have say two premises P1 and P2 and want to prove a conclusion C, it would be essentially the same result if we had premise P1 and conclusion P2 → C, or indeed no premise (or, say, take as premise only the basic axioms of mathematics, which are usually left implicit) and the conclusion (P1 ∧ P2) → C.

Having said that, let us consider the problem which asks us, from the premise P, to prove the conclusion C. We could start with the premise P, draw some intermediate claims, and eventually conclude that C is true. This is called a direct proof. Alternatively, we could start by assuming ¬C, draw some intermediate claims, and eventually conclude ¬P. This is called proving the contrapositive. Finally we could start by assuming P ∧ ¬C and prove that this leads to a logical contradiction. This is called a proof by contradiction.

2.1. Proof by Mathematical Induction. The previous comments, while informal, properly belong in a treatment of logic. We now turn to another common method of proof that essentially depends on the definition of the natural or counting numbers, that is, the numbers 1, 2, 3, . . . . Often, in mathematics, and even reasonably often in economics, we want to prove, not a single proposition but some general class of propositions. For example, rather than wanting to prove that a particular specification of an economy has a competitive equilibrium we might want to prove that all economies that satisfy some conditions have equilibria. Often in such cases some details of size or dimension will not be specified. For example we may not specify the number of consumers, or the number of commodities. In these circumstances what we want to prove is a class of propositions P1, P2, P3, . . . . And it might be quite difficult to directly prove the general case Pn where n can be any natural number 1, 2, 3, . . . . It is often easier, sometimes much easier, to prove, say, P1. And knowing P1 it might be quite easy to then prove P2. And knowing P2 to prove P3, and so on. Thus, if we wanted to prove P17, for example, we could do so, though it might be a bit tedious. We still would not have proved the general case that we wanted to, though it would seem to be obviously true.

The Principle of Mathematical Induction says that if one has a list of propositions P1, P2, P3, . . . and P1 is true and Pn implies Pn+1 for any n = 1, 2, 3, . . . , then Pn is true for any n = 1, 2, 3, . . . .
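As a worked illustration of how the Principle is used (this example is not drawn from the coursebook's exercises), take Pn to be the claim that 1 + 2 + ⋯ + n = n(n+1)/2. The base case P1 asserts 1 = (1·2)/2, which is true. For the inductive step, assume Pn; then

```latex
\begin{align*}
1 + 2 + \cdots + n + (n+1)
  &= \frac{n(n+1)}{2} + (n+1) && \text{(by } P_n\text{)}\\
  &= \frac{n(n+1) + 2(n+1)}{2}\\
  &= \frac{(n+1)(n+2)}{2},
\end{align*}
```

which is exactly Pn+1. By the Principle of Mathematical Induction, Pn is true for every n = 1, 2, 3, . . . .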

Exercise 6. To use the Principle of Mathematical Induction we will need to first prove that P1 is true. This is often quite easy. We then need to assume that Pn is true and prove that Pn+1 is true. This is often much harder. Sometimes it is easier if we assume not only that Pn is true but also that P1, . . . , Pn are all true. That is, we may sometimes want to use a more general principle that if P1 is true and P1, . . . , Pn together imply Pn+1 for any n = 1, 2, 3, . . . , then Pn is true for any n = 1, 2, 3, . . . . Show that this more general principle follows from the stated Principle of Mathematical Induction. Hint: Let Qn = P1 ∧ P2 ∧ ⋯ ∧ Pn and apply the Principle of Mathematical Induction to the propositions Q1, Q2, Q3, . . . .

As we said in the previous exercise the difficulty in using the Principle of Mathematical Induction is often in proving that Pn+1 is true. As noted in that exercise it sometimes helps if in addition to Pn we also assume P1, . . . , Pn−1, and this is quite legitimate. If even this does not help it sometimes helps to strengthen what we are trying to prove. In most proofs this will only make the problem harder since we are trying to prove more. However in using the Principle of Mathematical Induction when we strengthen what we are trying to prove we not only strengthen the conclusion, we also strengthen, in the place where we usually have to work hardest, the premises. Suppose that we are trying to prove P1, P2, P3, . . . and that for each n the proposition Qn is stronger than Pn. It might well be that it is easier to prove, using the Principle of Mathematical Induction, Q1, Q2, Q3, . . . than P1, P2, P3, . . . . We would first need to prove Q1. This might be a little harder, but this is, as we have said, often not where the difficulty lies. Of course, Q1 should be true. We are not claiming that any strengthening of the Ps will make things easier but just that it is sometimes useful to consider strengthening the propositions if you are having difficulty with the original propositions.

The next step is proving that Qn implies Qn+1 for any n = 1, 2, 3, . . . . Now it's certainly easier to prove Pn+1 than Qn+1 from the same set of premises. But we don't have the same set of premises. In our original attempt we had the premise Pn. Now we have the stronger premise Qn and this might possibly make our life easier. In order to see what the right form for Q is it is often useful to prove P1 and then use P1 to prove P2 and use P2 to prove P3. If you examine exactly what you are doing you may find that there are things that are true in the case of n = 1 and n = 2 that are not explicitly part of P1 and P2 and that you are using.

We use the Principle of Mathematical Induction in at least one place in what follows, and you will find many uses of it in articles in economics.

    3. Sets

Set theory was developed in the second half of the 19th century and is at the very foundation of modern mathematics. But we shall not be concerned here with the development of the theory. Rather we shall only give the basic language of set theory and outline some of the very basic operations on sets.

We start by defining a set to be a collection of objects or elements. We will usually denote sets by capital letters and their elements by lower case letters. If the element a is in the set A we write a ∈ A. If every element of the set B is also in the set A we call B a subset of the set A and write B ⊆ A. We shall also say that A contains B. If A and B have exactly the same elements then we say they are equal or identical. Alternatively we could say A = B if and only if A ⊆ B and B ⊆ A. If B ⊆ A and B ≠ A then we say that B is a proper subset of A or that A strictly contains B.

Exercise 7. How many subsets does a set with N elements have?

In order to avoid paradoxes (such as Russell's paradox, which arises if we allow objects like the set of all sets that are not elements of themselves) we shall always assume that in whatever situation we are discussing there is some given set U, called the universal set, which contains all of the sets with which we shall deal.

We customarily enclose our specification of a set by braces. In order to specify a set one may simply list the elements. For example to specify the set D which contains the numbers 1, 2, and 3 we may write D = {1, 2, 3}. Alternatively we may define the set by specifying a property that identifies the elements. For example we may specify the same set D by D = {x | x is an integer and 0 < x < 4}. Notice that this second method is more powerful. We could not, for example, list all the integers. (Since there are an infinite number of them we would die before we finished.)

For any two sets A and B we define the union of A and B to be the set which contains exactly all of the elements of A and all the elements of B. We denote the union of A and B by A ∪ B. Similarly we define the intersection of A and B to be the set which contains exactly those elements which are in both A and B. We denote the intersection of A and B by A ∩ B. Thus we have

A ∪ B = {x | x ∈ A or x ∈ B}
A ∩ B = {x | x ∈ A and x ∈ B}.

Exercise 8. The oldest mathematician among chess players and the oldest chess player among mathematicians: is it the same person or (possibly) different ones?

Exercise 9. The best mathematician among chess players and the best chess player among mathematicians: is it the same person or (possibly) different ones?

Exercise 10. Every tenth mathematician is a chess player and every fourth chess player is a mathematician. Are there more mathematicians or chess players, and by how many times?

Exercise 11. Prove the distributive laws for the operations of union and intersection.

(i) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(ii) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)

Just as the number zero is extremely useful, so the concept of a set that has no elements is extremely useful also. This set we call the empty set or the null set and denote by ∅. To see one use of the empty set notice that having such a concept allows the intersection of two sets to be well defined whether or not the sets have any elements in common.

We also introduce the concept of a Cartesian product. If we have two sets, say A and B, the Cartesian product, A × B, is the set of all ordered pairs (a, b) such that a is an element of A and b is an element of B. Symbolically we write

A × B = {(a, b) | a ∈ A and b ∈ B}.
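These operations map directly onto Python's built-in set type; a purely illustrative sketch (the particular sets A and B here are made up):

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}

union        = A | B                # A ∪ B
intersection = A & B                # A ∩ B
empty        = A & {5, 6}           # no common elements: the empty set
cartesian    = set(product(A, B))   # A × B as a set of ordered pairs

print(union)                # {1, 2, 3, 4}
print(intersection)         # {3}
print(empty)                # set()
print((1, 4) in cartesian)  # True
```

Note that the intersection with {5, 6} is well defined even though the sets share no elements, which is exactly the point of having ∅.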


    4. Binary Relations

There are a number of ways of formulating the notion of a binary relation. We shall pursue one, defining a binary relation on a set X simply as a subset of X × X, the Cartesian product of X with itself.

Definition 1. A binary relation R on the set X is a subset of X × X. If the pair (x, y) ∈ R we shall often write xRy instead of (x, y) ∈ R.

Since we have already defined the notions of Cartesian product and subset, there is really nothing new here. However the structure and properties of binary relations that we shall now study are motivated by the informal notion of a relation between the elements of X.

Example 1. Suppose that X is a set of boys and girls and the relation xSy is "x is a sister of y".

Example 2. Suppose that X is the set of natural numbers X = {1, 2, 3, . . .}. There are the binary relations >, ≥, and =.

Example 3. Suppose that X is the set of natural numbers X = {1, 2, 3, . . .}. The relations R, P, and I are defined by

xRy if and only if x + 1 ≥ y,
xPy if and only if x > y + 1, and
xIy if and only if −1 ≤ x − y ≤ 1.

Definition 2. The following properties of binary relations have been defined and found to be useful.

(BR1) Reflexivity: For all x in X, xRx.
(BR2) Irreflexivity: For all x in X, not xRx.
(BR3) Completeness: For all x and y in X, either xRy or yRx (or both).¹
(BR4) Transitivity: For all x, y, and z in X, if xRy and yRz then xRz.
(BR5) Negative Transitivity: For all x, y, and z in X, if xRy then either xRz or zRy (or both).
(BR6) Symmetry: For all x and y in X, if xRy then yRx.
(BR7) Anti-Symmetry: For all x and y in X, if xRy and yRx then x = y.
(BR8) Asymmetry: For all x and y in X, if xRy then not yRx.
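On a finite set these properties can be checked exhaustively. The sketch below (illustrative only) encodes the relation ≥ on X = {1, 2, 3} as a set of ordered pairs, in line with Definition 1, and tests four of the properties:

```python
# The relation x >= y on X = {1, 2, 3}, stored as a set of ordered pairs.
X = {1, 2, 3}
R = {(x, y) for x in X for y in X if x >= y}

# (BR1) reflexivity, (BR3) completeness, (BR4) transitivity, (BR6) symmetry.
reflexive  = all((x, x) in R for x in X)
complete   = all((x, y) in R or (y, x) in R for x in X for y in X)
transitive = all((x, z) in R for (x, y) in R for (w, z) in R if y == w)
symmetric  = all((y, x) in R for (x, y) in R)

print(reflexive, complete, transitive, symmetric)  # True True True False
```

As expected, ≥ is reflexive, complete, and transitive but not symmetric (2 ≥ 1 holds while 1 ≥ 2 does not).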

Exercise 12. Show that completeness implies reflexivity, that asymmetry implies anti-symmetry, and that asymmetry implies irreflexivity.

Exercise 13. Which properties does the relation described in Example 1 satisfy?

Exercise 14. Which properties do the relations described in Example 2 satisfy?

Exercise 15. Which properties do the relations described in Example 3 satisfy?

    We now define a few particularly important classes of binary relations.

Definition 3. A weak order is a binary relation that satisfies transitivity and completeness.

Definition 4. A strict partial order is a binary relation that satisfies transitivity and asymmetry.

¹We shall always implicitly include "or both" when we say "either . . . or".


Definition 5. An equivalence is a binary relation that satisfies transitivity and symmetry.

You have almost certainly already met examples of such binary relations in your study of Economics. We normally assume that the weak preference, strict preference, and indifference of a consumer are, respectively, a weak order, a strict partial order, and an equivalence, though we actually typically assume a little more about the strict preference.

The following construction is also motivated by the idea of preference. Let us consider some binary relation R which we shall informally think of as a weak preference relation, though we shall not, for the moment, make any assumptions about the properties of R. Consider the relation P defined by xPy if and only if xRy and not yRx, and the relation I defined by xIy if and only if xRy and yRx.

    Exercise 16. Show that if R is a weak order then P is a strict partial orderand I is an equivalence.

We could also think of starting with a strict preference P and defining the weak preference R in terms of P. We could do so either by defining R as xRy if and only if not yPx, or by defining R as xRy if and only if either xPy or not yPx.

    Exercise 17. Show that these two definitions of R coincide if P is asymmetric.

Exercise 18. Show by example that P may be a strict partial order (so, by the previous result, the two definitions of R coincide) but R not a weak order. [Hint: If you cannot think of another example consider the binary relations defined in Example 3.]

Exercise 19. Show that if P is asymmetric and negatively transitive then

(i) P is transitive (and hence a strict partial order), and
(ii) R is a weak order.

    5. Functions

Let X and Y be two sets. A function (or a mapping) f from the set X to the set Y is a rule that assigns to each x in X a unique element in Y, denoted by f(x). The notation

f : X → Y

is standard. The set X is called the domain of f and the set Y is called the codomain of f. The set of all values taken by f, i.e. the set

{y ∈ Y | there exists x in X such that y = f(x)}

is called the range of f. The range of a function need not coincide with its codomain Y.

There are several useful ways of visualising functions. A function can be thought of as a machine that operates on elements of the set X and transforms an input x into a unique output f(x). Note that the machine is not required to produce different outputs from different inputs. This analogy helps to distinguish between the function itself, f, and its particular value, f(x). The former is the machine, the latter is the output². One of the reasons for this confusion is that in practice, to avoid being verbose, people often say things like "consider a function U(x, y) = xy" instead of saying "consider a function defined for every pair (x, y) in R² by the equation U(x, y) = xy".

²Mathematician Robert Bartle put it as follows: "Only a fool would confuse a sausage-grinder with a sausage; however, enough people have confused functions with their values..."


A function can also be thought of as a transformation, or a mapping, of the set X into the set Y. In line with this interpretation is the common terminology: it is said that f(x) is the image of x under the function f. Again, it is important to remember that there may be points of Y which are the images of no point of X and that there may be different points of X which have the same images in Y. What is absolutely prohibited, however, is for a point from X to have several images in Y.

Part of the definition of a function is the specification of its domain. However, in applications, functions are quite often defined by an algebraic formula, without explicit specification of the domain. For example, a function may be defined as

f(x) = sin x + 145x².

The function f is then the rule that assigns the value sin x + 145x² to each value of x. The convention in such cases is that the domain of f is the set of all values of x for which the formula gives a unique value. Thus, if you come across, for instance, the function f(x) = 1/x you should assume that its domain is (−∞, 0) ∪ (0, ∞), unless specified otherwise.

For any subset A of X, the subset f(A) of Y consisting of those y such that y = f(x) for some x in A is called the image of A under f, that is,

f(A) = {y ∈ Y | there exists x in A such that y = f(x)}.

Thus, the range of f can be written as f(X). Similarly, one can define the inverse image. For any subset B of Y, the inverse image f⁻¹(B) of B is the set of x in X such that f(x) is in B, that is,

f⁻¹(B) = {x ∈ X | f(x) ∈ B}.

A function f is called a function onto Y (or a surjection) if the range of f is Y, i.e., if for every y ∈ Y there is (at least) one x ∈ X such that y = f(x). In other words, each element of Y is the image of (at least) one element of X. A function f is called one-to-one (or an injection) if f(x₁) = f(x₂) implies x₁ = x₂, that is, for every element y of f(X) there is a unique element x of X such that y = f(x). In other words, a one-to-one function maps different elements of X into different elements of Y. When a function f : X → Y is both onto and one-to-one it is called a bijection.
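For functions between small finite sets, being onto or one-to-one can be checked directly. A sketch (illustrative only; the sets and the function are invented for the example), with the function stored as a Python dict:

```python
# f : X -> Y stored as a dict mapping each x to f(x).
X = {1, 2, 3}
Y = {"a", "b"}
f = {1: "a", 2: "b", 3: "a"}

range_f = set(f.values())               # the range f(X)

surjective = (range_f == Y)             # onto: range equals codomain
injective  = (len(range_f) == len(X))   # one-to-one: no two x share an image

print(surjective, injective)  # True False
```

Here f is onto (both "a" and "b" are hit) but not one-to-one, since 1 and 3 have the same image.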

Exercise 20. Suppose that a set X has m elements and a set Y has n elements. How many different functions are there from X to Y? From Y to X? How many of them are surjective? How many are injective? How many are bijective?

Exercise 21. Find a function f : N → N which is

(i) surjective but not injective,
(ii) injective but not surjective,
(iii) neither surjective nor injective,
(iv) bijective.

If a function f is a bijection then it is possible to define a function g : Y → X such that g(y) = x where y = f(x). Thus, to each element y of Y is assigned the element x in X whose image under f is y. Since f is onto, g is defined for every y of Y, and since f is one-to-one, g(y) is unique. The function g is called the inverse of f and is usually written f⁻¹. In that case, however, it's not immediately clear what f⁻¹(B) means. Is it the inverse image of B under f or the image of B under f⁻¹? Happily enough they are the same if f⁻¹ exists.

Exercise 22. Prove that when the function f⁻¹ exists it is both onto and one-to-one and that the inverse of f⁻¹ is the function f itself.


If f : X → Y and g : Y → Z, then the function h : X → Z defined by h(x) = g(f(x)) is called the composition of g with f and is denoted g ∘ f. Note that even if f ∘ g is well-defined it is, usually, different from g ∘ f.

Exercise 23. Let f : X → Y. Prove that there exist a surjection g : X → A, where A ⊆ X, and an injection h : A → Y such that f = h ∘ g. In other words, prove that any function can be written as the composition of a surjection and an injection.

The set G ⊆ X × Y of ordered pairs (x, f(x)) is called the graph of the function f.³ Of course, the fact that something is called a graph does not necessarily mean that it can be drawn.

    6. Spaces

    Sets are reasonably interesting mathematical objects to study. But to makethem even more interesting (and useful for applications) sets are usually endowedwith some additional properties, or structures. These new objects are called spaces.The structures are often modeled after the familiar properties of space we live in andreflect (in axiomatic form) such notions as order, distance, addition, multiplication,and so on.

    Probably one of the most intuitive spaces is the space of the real numbers, R.We will briefly look at the axiomatic way of describing some of its properties.

Given the set of real numbers R, the operation of addition is the function + : R × R → R that maps any two elements x and y in R to an element denoted by x + y and called the sum of x and y. This addition operation satisfies the following axioms. For all real numbers x, y, and z:

A1: x + y = y + x.
A2: (x + y) + z = x + (y + z).
A3: There exists an element, denoted by 0, such that x + 0 = x.
A4: For each x there exists an element, denoted by −x, such that x + (−x) = 0.

All the remaining properties of addition can be proven using these axioms.

Note also that we can define another operation, x − y, as x + (−y), and call it subtraction.

Exercise 24. Prove that the axioms for addition imply the following statements.

(i) The element 0 is unique.
(ii) If x + y = x + z then y = z (a cancellation law).
(iii) −(−x) = x.

The operation of multiplication can be axiomatised in a similar way. Given the set of real numbers R, the operation of multiplication is the function · : R × R → R that maps any two elements x and y in R to an element denoted by x · y and called the product of x and y. Multiplication satisfies the following axioms for all real numbers x, y, and z.

A5: x · y = y · x.
A6: (x · y) · z = x · (y · z).
A7: There exists an element, denoted by 1, such that x · 1 = x.
A8: For each x ≠ 0 there exists an element, denoted by x⁻¹, such that x · x⁻¹ = 1.

³Some people like the idea of the graph of a function so much that they define a function to be its graph.


One more axiom (a distributive law) brings these two operations, addition and multiplication⁴, together.

A9: x(y + z) = xy + xz for all x, y, and z in R.

Another structure possessed by the real numbers has to do with the fact that the real numbers are ordered. The notion of x being less than y can be axiomatised as follows: for any two distinct elements x and y, either x < y or y < x; and, in addition, if x < y and y < z then x < z.

Another example of a space (a very important and useful one) is n-dimensional real space.⁵ Given a natural number n, define Rⁿ to be the set of all possible ordered n-tuples of n real numbers, with generic element denoted by x = (x1, . . . , xn). Thus, the space Rⁿ is the n-fold Cartesian product of the set R with itself. The real numbers x1, . . . , xn are called the coordinates of the vector x. Two vectors x and y are equal if and only if x1 = y1, . . . , xn = yn. The operation of addition of two vectors is defined as

x + y = (x1 + y1, . . . , xn + yn).

    Exercise 25. Prove that the addition of vectors in Rn satisfies the axioms ofaddition.

The role of multiplication in this space is played by the operation of multiplication by a real number, defined for all x in Rⁿ and all λ in R by

λx = (λx1, . . . , λxn).

Exercise 26. Prove that multiplication by a real number satisfies a distributive law.

    7. Metric Spaces and Continuous Functions

The notion of a metric is a generalisation of the notion of distance between two real numbers.

Let X be a set and d : X × X → R a function. The function d is called a metric if it satisfies the following properties for all x, y, and z in X.

1. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y,
2. d(x, y) = d(y, x),
3. d(x, y) ≤ d(x, z) + d(z, y).

The set X together with the function d is called a metric space, elements of X are usually called points, and the number d(x, y) is called the distance between x and y. The last property of a metric is called the triangle inequality.

Exercise 27. Let X be a non-empty set and d : X × X → R a function that satisfies the following two properties for all x, y, and z in X.

(i) d(x, y) = 0 if and only if x = y,
(ii) d(x, y) ≤ d(x, z) + d(y, z).

Prove that d is a metric.

Exercise 28. Prove that d(x, y) + d(w, z) ≤ d(x, w) + d(x, z) + d(y, w) + d(y, z) for all x, y, w, and z in X, where d is some metric on X.

An obvious example of a metric space is the set of real numbers, R, together with the usual distance, d(x, y) = |x − y|. Another example is the n-dimensional Euclidean space Rⁿ with metric

d(x, y) = √((x1 − y1)² + · · · + (xn − yn)²).
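The metric axioms are easy to spot-check numerically. The sketch below (Python, with made-up sample points) implements the Euclidean metric and the taxicab metric defined in the next paragraph, and verifies the three axioms on every triple of the sample points:

```python
import itertools
import math

# Euclidean and taxicab metrics on R^n (a simple sketch).
def d_euclid(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_taxicab(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# Spot-check the metric axioms on a few points of R^2.
points = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]
for d in (d_euclid, d_taxicab):
    for x, y, z in itertools.product(points, repeat=3):
        assert d(x, y) >= 0                              # non-negativity
        assert d(x, y) == d(y, x)                        # symmetry
        assert d(x, y) <= d(x, z) + d(z, y) + 1e-12      # triangle inequality
```

Such a check is not a proof, of course, but it makes the axioms concrete: the two metrics give different distances (for example, from (0, 0) to (3, 4) the Euclidean distance is 5 while the taxicab distance is 7), yet both satisfy the same three properties.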

⁴From now on, to go easy on notation, we will follow the standard convention of not writing the symbol for multiplication, that is, of writing xy instead of x · y, etc.

⁵We haven't defined what the word dimension means yet, so just treat it as a (fancy) name.


Note that the same set can be endowed with different metrics, thus resulting in different metric spaces. For example, the set of all n-tuples of real numbers can be made into a metric space by use of the (non-Euclidean) metric

dT(x, y) = |x1 − y1| + · · · + |xn − yn|,

which gives a metric space different from Rⁿ. This metric is sometimes called the Manhattan (or taxicab) metric. Another curious metric is the so-called French railroad metric, defined by

dF(x, y) = 0 if x = y, and dF(x, y) = d(x, P) + d(y, P) if x ≠ y,

where P is a particular point of Rⁿ (called Paris) and the function d is the Euclidean distance.

    Exercise 29. Prove that the French railroad metric dF is a metric.

Exercise 30. Let X be a non-empty set and d : X × X → R the function defined by

d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.

Prove that d is a metric. (This metric is called the discrete metric.)

    Using the notion of metric it is possible to generalise the idea of continuousfunction.

Suppose (X, dX) and (Y, dY) are metric spaces, x0 ∈ X, and f : X → Y is a function. Then f is continuous at x0 if for every ε > 0 there exists a δ > 0 such that

dY(f(x0), f(x)) < ε

for all points x ∈ X for which dX(x0, x) < δ. The function f is continuous on X if f is continuous at every point of X.

Let's prove that the function f(x) = x is continuous on R using the above definition. For all x0 ∈ R, we have |f(x0) − f(x)| = |x0 − x| < ε as long as |x0 − x| < δ = ε. That is, given any ε > 0 we are always able to find a δ, namely δ = ε, such that all points which are closer to x0 than δ will have images which are closer to f(x0) than ε.
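The ε–δ argument above can be illustrated numerically. The sketch below (an illustrative check, not a proof) takes the δ = ε found in the proof and confirms that sample points within δ of x0 have images within ε of f(x0):

```python
# Numerical illustration of the epsilon-delta argument for f(x) = x:
# given eps, the choice delta = eps works at any point x0.
def f(x):
    return x

def check_continuity_at(x0, eps, samples=1000):
    delta = eps  # the delta exhibited in the proof above
    for k in range(samples):
        # sample points x with |x - x0| < delta
        x = x0 + (2 * k / samples - 1) * 0.999 * delta
        assert abs(f(x) - f(x0)) < eps
    return True

assert check_continuity_at(0.0, 0.5)
assert check_continuity_at(3.0, 1e-3)
```

For f(x) = x the check is almost tautological, since |f(x) − f(x0)| = |x − x0|; for a steeper function the same test would force a smaller δ relative to ε.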

Exercise 31. Let f : R → R be the function defined by

f(x) = 1/x if x ≠ 0, and f(x) = 0 if x = 0.

Prove that f is continuous at every point of R, with the exception of 0.

    8. Open sets, Compact Sets, and the Weierstrass Theorem

Let x be a point in a metric space X and r > 0. The open ball B(x, r) of radius r centred at x is the set of all y ∈ X such that d(x, y) < r. Thus, the open ball is the set of all points whose distance from the centre is strictly less than r. The ball is closed if the inequality is weak, d(x, y) ≤ r.

A set S in a metric space is open if for all x ∈ S there exists r ∈ R, r > 0, such that B(x, r) ⊆ S. A set S is closed if its complement

Sᶜ = {x ∈ X : x ∉ S}

is open.

    Exercise 32. Prove that an open ball is an open set.

Exercise 33. Prove that the intersection of any finite number of open sets is an open set.


A set S is bounded if there exists a closed ball of finite radius that contains it. Formally, S is bounded if there exists a closed ball B(x, r) such that S ⊆ B(x, r).

Exercise 34. Prove that the set S is bounded if and only if there exists a real number p > 0 such that d(x, x′) ≤ p for all x and x′ in S.

    Exercise 35. Prove that the union of two bounded sets is a bounded set.

A collection (possibly infinite) of open sets U1, U2, . . . in a metric space is an open cover of the set S if S is contained in their union.

A set S is compact if every open cover of S has a finite subcover. That is, from any open cover we can select a finite number of sets Ui that still cover S.

Note that the definition does not say that a set is compact if there is a finite open cover. That wouldn't be a good definition, as you can cover any set with the whole space, which is just one open set.

Let's see how to use this definition to show that something is not compact. Consider the set (0, 1) ⊆ R. To prove that it is not compact we need to find an open cover of (0, 1) from which we cannot select a finite subcover. The collection of open intervals (1/n, 1) for all integers n ≥ 2 is an open cover of (0, 1), because for any point x ∈ (0, 1) it is always possible to find an integer n such that n > 1/x, and thus x ∈ (1/n, 1). But no finite subcover will do: if (1/N, 1) is the maximal interval in a candidate subcover, then it is always possible to find a point x ∈ (0, 1) such that N < 1/x, that is, a point not covered by the subcover.
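The first half of that argument, exhibiting a covering interval for any given point, is completely constructive. A small sketch (Python; the function name is ours) computes a suitable n for any x in (0, 1) and verifies that x indeed lies in (1/n, 1):

```python
import math

# The intervals (1/n, 1), n >= 2, cover (0, 1): for each x in (0, 1)
# we can exhibit an n with 1/n < x, so that x lies in (1/n, 1).
def covering_index(x):
    assert 0 < x < 1
    n = max(2, math.floor(1 / x) + 1)   # an integer n with n > 1/x
    assert 1 / n < x < 1                # so x is in the interval (1/n, 1)
    return n
```

No single such n works for every x at once, which is exactly why no finite subcollection of these intervals covers all of (0, 1): points close to 0 keep forcing larger and larger n.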

While this definition of compactness is quite useful for showing that the set in question is not compact, it is less useful for verifying that a set is indeed compact. A much more convenient characterisation of compact sets in finite-dimensional Euclidean space, Rⁿ, is given by the following theorem.

    Theorem 1. Any closed and bounded subset of Rn is compact.

But why are we interested in compactness at all? Because of the following extremely important theorem, the first version of which was proved by Karl Weierstrass around 1860.

Theorem 2. Let S be a compact set in a metric space and f : S → R a continuous function. Then the function f attains its maximum and minimum on S.

And why is this theorem important for us? Because many economic problems are concerned with finding a maximal (or minimal) value of a function on some set. The Weierstrass theorem provides conditions under which such a search is meaningful. This theorem and its implications will be much dwelt upon later in the notes, so we give here just one example. The consumer's utility maximisation problem is the problem of finding the maximum of a utility function subject to the budget constraint. According to the Weierstrass theorem, this problem has a solution if the utility function is continuous and the budget set is compact.

    9. Sequences and Subsequences

    Let us consider again some metric space (X, d). An infinite sequence of pointsin (X, d) is simply a list

    x1, x2, x3, . . . ,

where ". . ." indicates that the list continues forever. We can be a bit more formal about this. We first consider the set of natural numbers (or counting numbers) 1, 2, 3, . . . , which we denote N. We can now define an infinite sequence in the following way.

    Definition 6. An infinite sequence of elements of X is a function from N toX .


Notation. If we look at the previous definition we see that we might have a sequence s : N → X which would define s(1), s(2), s(3), . . . , or in other words would define s(n) for any natural number n. Typically, when we are referring to sequences we use subscripts (or sometimes superscripts) instead of parentheses and write s1, s2, s3, . . . and sn instead of s(1), s(2), s(3), . . . and s(n). Also, rather than saying that s : N → X is a sequence, we say that {sn} is a sequence, or even that {sn}, n = 1, 2, . . . , is a sequence.

    Lets now examine a few examples.

Example 4. Suppose that (X, d) is R, the real numbers with the usual metric d(x, y) = |x − y|. Then {n}, {−n}, and {1/n} are sequences.

Example 5. Again, suppose that (X, d) is R, the real numbers with the usual metric d(x, y) = |x − y|. Consider the sequence {xn} where

xn = 1 if n is odd, and xn = 0 if n is even.

We see that {n} and {−n} get arbitrarily large (in absolute value) as n gets larger, while in the last example xn bounces back and forth between 0 and 1 as n gets larger. However, for {1/n} the elements of the sequence get closer and closer to 0 (and indeed arbitrarily close to 0). We say, in this case, that the sequence converges to zero, or that the sequence has limit 0. This is a particularly important concept, and so we shall give a formal definition.

Definition 7. Let {xn} be a sequence of points in (X, d). We say that the sequence converges to x0 ∈ X if for any ε > 0 there is N ∈ N such that if n > N then d(xn, x0) < ε.

    Informally we can describe this by saying that if n is large then the distancefrom xn to x0 is small.

If the sequence {xn} converges to x0, then we often write xn → x0 as n → ∞, or limn→∞ xn = x0.

Exercise 36. Show that if the sequence {xn} converges to x0 then it does not converge to any other value. Another way of saying this is that if a sequence converges then its limit is unique.

We have now seen a number of examples of sequences. In some, the sequence runs off to infinity; in others it bounces around; while in others it converges to a limit. Could a sequence do anything else? Could a sequence, for example, settle down, each element getting closer and closer to all future elements in the sequence, but not converge to any particular limit? In fact, depending on what the space X is, this is indeed possible.

First let us recall the notion of a rational number. A rational number is a number that can be expressed as the ratio of two integers; that is, r is rational if r = a/b with a and b integers and b ≠ 0. We usually denote the set of all rational numbers by Q (since we have already used R for the real numbers). We now consider an example in which the underlying space X is Q. Consider the sequence of rational numbers defined in the following way:

x1 = 1,
xn+1 = (xn + 2)/(xn + 1).

This kind of definition is called a recursive definition. Rather than writing what xn is as a function of n, we write what x1 is and then what xn+1 is as a function


of what xn is. We can obviously find any element of the sequence that we need, as long as we sequentially calculate each previous element. In our case we'd have

x1 = 1
x2 = (1 + 2)/(1 + 1) = 3/2 = 1.5
x3 = (3/2 + 2)/(3/2 + 1) = 7/5 = 1.4
x4 = (7/5 + 2)/(7/5 + 1) = 17/12 ≈ 1.416667
x5 = (17/12 + 2)/(17/12 + 1) = 41/29 ≈ 1.413793
x6 = (41/29 + 2)/(41/29 + 1) = 99/70 ≈ 1.414286
...
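The hand computation is easy to reproduce. The sketch below (Python, using exact rational arithmetic from the standard fractions module, so every term really is an element of Q) generates the sequence:

```python
from fractions import Fraction

# The recursion x1 = 1, x_{n+1} = (x_n + 2)/(x_n + 1),
# computed with exact rational arithmetic.
def sequence(n_terms):
    x = Fraction(1)            # x1 = 1
    terms = [x]
    for _ in range(n_terms - 1):
        x = (x + 2) / (x + 1)  # each term stays a rational number
        terms.append(x)
    return terms

terms = sequence(6)
# The first terms match the hand computation: 1, 3/2, 7/5, 17/12, ...
assert terms[1] == Fraction(3, 2) and terms[2] == Fraction(7, 5)
# The terms approach sqrt(2), which is not itself rational.
assert abs(float(sequence(20)[-1]) - 2 ** 0.5) < 1e-12
```

Twenty terms already agree with √2 to more than twelve decimal places, even though, as the text goes on to argue, no rational number is the limit.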

We see that the sequence goes up and down but that it seems to be converging. What is it converging to? Let's suppose that it's converging to some value x0. Recall that

xn+1 = (xn + 2)/(xn + 1).

We'll see later that if f is a continuous function then limn→∞ f(xn) = f(limn→∞ xn). In this case that means that

x0 = limn→∞ xn+1 = limn→∞ (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).

Thus we have

x0 = (x0 + 2)/(x0 + 1),

and if we solve this we obtain x0² = 2, that is, x0 = ±√2. Clearly if xn > 0 then xn+1 > 0, so our sequence can't be converging to −√2; we must have x0 = √2. But √2 is not in Q. Thus we have a sequence of elements of Q that are getting very close to each other but are not converging to any element of Q. (Of course the sequence is converging to a point in R. In fact, one construction of the real number system is in terms of such sequences in Q.)

Definition 8. Let {xn} be a sequence of points in (X, d). We say that the sequence is a Cauchy sequence if for any ε > 0 there is N ∈ N such that if n, m > N then d(xn, xm) < ε.

Exercise 37. Show that if {xn} converges then {xn} is a Cauchy sequence.

A metric space (X, d) in which every Cauchy sequence converges to a limit in X is called a complete metric space. The space of real numbers R is a complete metric space, while the space of rationals Q is not.

Exercise 38. Is N, the space of natural or counting numbers with metric d given by d(x, y) = |x − y|, a complete metric space?

    In Section 7 we defined the notion of a function being continuous at a point.It is possible to give that definition in terms of sequences.


Definition 9. Suppose (X, dX) and (Y, dY) are metric spaces, x0 ∈ X, and f : X → Y is a function. Then f is continuous at x0 if for every sequence {xn} that converges to x0 in (X, dX), the sequence {f(xn)} converges to f(x0) in (Y, dY).

Exercise 39. Show that the function f(x) = (x + 2)/(x + 1) is continuous at any point x ≠ −1. Show that this means that if xn → x0 as n → ∞ then

limn→∞ (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).

    We can also define the concept of a closed set (and hence the concepts of opensets and compact sets) in terms of sequences.

Definition 10. Let (X, d) be a metric space. A set S ⊆ X is closed if for any convergent sequence {xn} such that xn ∈ S for all n, we have limn→∞ xn ∈ S. A set is open if its complement is closed.

Given a sequence {xn}, we can define a new sequence by taking only some of the elements of the original sequence. In the example we considered earlier, in which xn was 1 if n was odd and 0 if n was even, we could take only the odd n and thus obtain a sequence that did converge. The new sequence is called a subsequence of the old sequence.

Definition 11. Let {xn} be some sequence in (X, d). Let {nj}, j = 1, 2, . . . , be a sequence of natural numbers such that for each j we have nj < nj+1, that is, n1 < n2 < n3 < . . . . The sequence {xnj}, j = 1, 2, . . . , is called a subsequence of the original sequence.

    The notion of a subsequence is often useful. We often use it in the way thatwe briefly referred to above. We initially have a sequence that may not converge,but we are able to take a subsequence that does converge. Such a subsequence iscalled a convergent subsequence.

    Definition 12. A subset of a metric space with the property that every se-quence in the subset has a convergent subsequence is called sequentially compact.

    Theorem 3. In any metric space any compact set is sequentially compact.

If we restrict attention to finite-dimensional Euclidean spaces, the situation is even better behaved.

    Theorem 4. Any subset of Rn is sequentially compact if and only if it iscompact.

Exercise 40. Verify the following limits.

(i) limn→∞ n/(n + 1) = 1
(ii) limn→∞ (n + 3)/(n² + 1) = 0
(iii) limn→∞ (√(n + 1) − √n) = 0
(iv) limn→∞ (aⁿ + bⁿ)^(1/n) = max{a, b}, for a, b > 0
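Before proving these limits it can be reassuring to check them numerically at a large n. The sketch below is a sanity check only, not a verification in the sense the exercise asks for:

```python
# Numerically checking the limits in Exercise 40 at a large n.
n = 10 ** 6
assert abs(n / (n + 1) - 1) < 1e-5               # (i)
assert abs((n + 3) / (n ** 2 + 1)) < 1e-5        # (ii)
assert abs((n + 1) ** 0.5 - n ** 0.5) < 1e-2     # (iii): about 1/(2*sqrt(n))

a, b = 2.0, 3.0                                  # (iv), with a, b > 0
m = 200
assert abs((a ** m + b ** m) ** (1 / m) - max(a, b)) < 1e-2
```

Limit (iv) uses a smaller exponent because bⁿ overflows floating point for very large n; the smaller term aⁿ is already negligible at m = 200, which is the idea behind the limit itself.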

Exercise 41. Consider a sequence {xn} in R. What can you say about the sequence if it converges and, for each n, xn is an integer?

Exercise 42. Consider the sequence

1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, . . . .

For which values z ∈ R is there a subsequence converging to z?
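The pattern of the sequence, for each denominator q = 2, 3, 4, . . . list the fractions p/q for p = 1, . . . , q − 1, is easy to turn into a generator, which can help with experimenting before answering the exercise (this sketch only generates terms; it does not give the answer away):

```python
from fractions import Fraction
from itertools import islice

# The sequence 1/2, 1/3, 2/3, 1/4, 2/4, 3/4, ... from Exercise 42:
# for each denominator q = 2, 3, 4, ... yield p/q for p = 1, ..., q-1.
def exercise_sequence():
    q = 2
    while True:
        for p in range(1, q):
            yield Fraction(p, q)
        q += 1

first = list(islice(exercise_sequence(), 10))
assert first[0] == Fraction(1, 2) and first[-1] == Fraction(4, 5)
```

Note that Fraction automatically reduces, so the term written 2/4 in the text is stored as the equal value 1/2; as points of R the two are the same.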


    Exercise 43. Prove that if a subsequence of a Cauchy sequence converges toa limit z then so does the original Cauchy sequence.

    Exercise 44. Prove that any subsequence of a convergent sequence converges.

    Finally one somewhat less trivial exercise.

Exercise 45. Prove that if limn→∞ xn = z then

limn→∞ (x1 + · · · + xn)/n = z.

    10. Linear Spaces

The notion of a linear space is an axiomatic way of looking at the familiar linear operations: addition and multiplication. A trivial example of a linear space is the set of real numbers, R.

What is the operation of addition? One way of answering the question is to say that the operation of addition is just the list of its properties. So, we will define the addition of elements of some set X as an operation that satisfies the following four axioms.

A1: x + y = y + x for all x and y in X.
A2: x + (y + z) = (x + y) + z for all x, y, and z in X.
A3: There exists an element, denoted by 0, such that x + 0 = x for all x in X.
A4: For every x in X there exists an element y in X, called the inverse of x, such that x + y = 0.

And, to make things more interesting, we will also introduce the operation of multiplication by a number, by adding two more axioms.

A5: 1x = x for all x in X.
A6: α(βx) = (αβ)x for all x in X and for all α and β in R.

Finally, two more axioms relating addition and multiplication:

A7: α(x + y) = αx + αy for all x and y in X and for all α in R.
A8: (α + β)x = αx + βx for all x in X and for all α and β in R.

Elements x, y, . . . , w are linearly dependent if there exist real numbers α, β, . . . , ω, not all of them equal to zero, such that

αx + βy + · · · + ωw = 0.

Otherwise, the elements x, y, . . . , w are linearly independent.

If in a space L it is possible to find n linearly independent elements, but any n + 1 elements are linearly dependent, then we say that the space L has dimension n.

A nonempty subset L′ of a linear space L is called a linear subspace if L′ forms a linear space in itself. In other words, L′ is a linear subspace of L if for any x′ and y′ in L′ and all α and β in R,

αx′ + βy′ ∈ L′.

  • CHAPTER 2

    Linear Algebra

    1. The Space Rn

In the previous chapter we introduced the concept of a linear space, or vector space. We shall now examine in some detail one example of such a space. This is the space of all ordered n-tuples (x1, x2, . . . , xn) where each xi is a real number. We call this space n-dimensional real space and denote it Rⁿ.

Remember from the previous chapter that to define a vector space we not only need to define the points in that space but also how we add such points and how we multiply such points by scalars. In the case of Rⁿ we do this element by element in the n-tuple, or vector. That is,

(x1, x2, . . . , xn) + (y1, y2, . . . , yn) = (x1 + y1, x2 + y2, . . . , xn + yn)

and

λ(x1, x2, . . . , xn) = (λx1, λx2, . . . , λxn).

    Let us consider the case that n = 2, that is, the case of R2. In this case we canvisualise the space as in the following diagram. The vector (x1, x2) is representedby the point that is x1 units along from the point (0, 0) in the horizontal directionand x2 units up from (0, 0) in the vertical direction.

[Figure 1: the vector (1, 2) plotted in the (x1, x2) plane.]

Let us for the moment continue our discussion in R². Notice that we are implicitly writing a vector (x1, x2) as a sum x1 · v1 + x2 · v2, where v1 is the unit vector in the first direction and v2 is the unit vector in the second direction. Suppose that instead we considered the vectors u1 = (2, 1) = 2 · v1 + 1 · v2 and u2 = (1, 2) = 1 · v1 + 2 · v2. We could have written any vector (x1, x2) instead as z1 · u1 + z2 · u2, where z1 = (2x1 − x2)/3 and z2 = (2x2 − x1)/3. That is, any vector in R² can be written uniquely in terms of u1 and u2. Is there anything special about u1 and u2 that allows us to make this claim? There must be, since we can easily find other vectors for which this would not have been true. (For example, (1, 2) and (2, 4).)

The property of the pair of vectors u1 and u2 is that they are independent. That is, we cannot write either as a multiple of the other. More generally, in n dimensions we would say that we cannot write any of the vectors as a linear combination of the others, or equivalently, as in the following definition.

Definition 13. The vectors x1, . . . , xk, all in Rⁿ, are linearly independent if it is not possible to find scalars λ1, . . . , λk, not all zero, such that

λ1x1 + · · · + λkxk = 0.

Notice that we do not, as a matter of definition, require that k = n or even that k ≤ n. We state as a result that if k > n then the collection x1, . . . , xk cannot be linearly independent. (In a real maths course we would, of course, have proved this.)
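In practice, linear independence can be checked by computing the rank of the matrix whose columns are the given vectors: the vectors are independent exactly when the rank equals their number. A sketch, assuming numpy is available (the vectors are the ones from the discussion above):

```python
import numpy as np

# Vectors are linearly independent iff the matrix with those vectors
# as columns has rank equal to the number of vectors.
def independent(vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

assert independent([(2, 1), (1, 2)])              # u1, u2 from the text
assert not independent([(1, 2), (2, 4)])          # one is a multiple of the other
assert not independent([(1, 0), (0, 1), (1, 1)])  # k = 3 > n = 2, so dependent
```

The last line illustrates the stated result: three vectors in R² can never be linearly independent.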

Comment 1. If you examine the definition above you will notice that there is nowhere that we actually need to assume that our vectors are in Rⁿ. We can in fact apply the same definition of linear independence to any vector space. This allows us to define the concept of the dimension of an arbitrary vector space as the maximal number of linearly independent vectors in that space. In the case of Rⁿ we obtain that the dimension is in fact n.

Exercise 46. Suppose that x1, . . . , xk, all in Rⁿ, are linearly independent and that the vector y in Rⁿ is equal to α1x1 + · · · + αkxk. Show that this is the only way that y can be expressed as a linear combination of the xi's. (That is, show that if y = β1x1 + · · · + βkxk then β1 = α1, . . . , βk = αk.)

The set of all vectors that can be written as a linear combination of the vectors x1, . . . , xk is called the span of those vectors. If x1, . . . , xk are linearly independent and if the span of x1, . . . , xk is all of Rⁿ, then the collection {x1, . . . , xk} is called a basis for Rⁿ. (Of course, in this case we must have k = n.) Any vector in Rⁿ can be uniquely represented as a linear combination of the vectors x1, . . . , xk. We shall later see that it can sometimes be useful to choose a particular basis in which to represent the vectors with which we deal.

It may be that we have a collection of vectors {x1, . . . , xk} whose span is not all of Rⁿ. In this case we call the span of {x1, . . . , xk} a linear subspace of Rⁿ. Alternatively, we say that X ⊆ Rⁿ is a linear subspace of Rⁿ if X is closed under vector addition and scalar multiplication, that is, if for all x, y ∈ X the vector x + y is also in X, and for all x ∈ X and λ ∈ R the vector λx is in X. If the span of x1, . . . , xk is X and if x1, . . . , xk are linearly independent, then we say that these vectors are a basis for the linear subspace X. In this case the dimension of the linear subspace X is k. In general, the dimension of the span of x1, . . . , xk is equal to the maximum number of linearly independent vectors among x1, . . . , xk.

Finally, we comment that Rⁿ is a metric space with metric d : Rⁿ × Rⁿ → R₊ defined by

d((x1, . . . , xn), (y1, . . . , yn)) = √((x1 − y1)² + · · · + (xn − yn)²).

There are many other metrics we could define on this space, but this is the standard one.


    2. Linear Functions from Rn to Rm

In the previous section we introduced the space Rⁿ. Here we shall discuss functions from one such space to another (possibly of a different dimension). The concept of continuity that we introduced for metric spaces is immediately applicable here. We shall, however, be mainly concerned with an even narrower class of functions, namely the linear functions.

Definition 14. A function f : Rⁿ → Rᵐ is said to be a linear function if it satisfies the following two properties.

(1) f(x + y) = f(x) + f(y) for all x, y ∈ Rⁿ, and
(2) f(λx) = λf(x) for all x ∈ Rⁿ and λ ∈ R.

Comment 2. When considering functions of a single real variable, that is, functions from R to R, functions of the form f(x) = ax + b, where a and b are fixed constants, are sometimes called linear functions. It is easy to see that if b ≠ 0 then such functions do not satisfy the conditions given above. We shall call such functions affine functions. More generally, we shall call a function g : Rⁿ → Rᵐ an affine function if it is the sum of a linear function f : Rⁿ → Rᵐ and a constant b ∈ Rᵐ, that is, if for every x ∈ Rⁿ, g(x) = f(x) + b.

Let us now suppose that we have two linear functions f : Rⁿ → Rᵐ and g : Rⁿ → Rᵐ. It is straightforward to show that the function (f + g) : Rⁿ → Rᵐ defined by (f + g)(x) = f(x) + g(x) is also a linear function. Similarly, if we have a linear function f : Rⁿ → Rᵐ and a constant λ ∈ R, the function (λf) : Rⁿ → Rᵐ defined by (λf)(x) = λf(x) is a linear function. If f : Rⁿ → Rᵐ and g : Rᵐ → Rᵏ are linear functions, then the composite function g ∘ f : Rⁿ → Rᵏ defined by (g ∘ f)(x) = g(f(x)) is again a linear function. Finally, if f : Rⁿ → Rⁿ is not only linear, but also one-to-one and onto, so that it has an inverse f⁻¹ : Rⁿ → Rⁿ, then the inverse function is also a linear function.

    Exercise 47. Prove the facts stated in the previous paragraph.

Recall that in the previous section we defined the notion of a linear subspace. A linear function f : Rⁿ → Rᵐ defines two important subspaces: the image of f, denoted Im(f) ⊆ Rᵐ, and the kernel of f, denoted Ker(f) ⊆ Rⁿ. The image of f is the set of all vectors in Rᵐ to which f maps some vector in Rⁿ, that is,

Im(f) = {y ∈ Rᵐ | there exists x ∈ Rⁿ such that y = f(x)}.

The kernel of f is the set of all vectors in Rⁿ that are mapped by the function f to the zero vector in Rᵐ, that is,

Ker(f) = {x ∈ Rⁿ | f(x) = 0}.

The kernel of f is sometimes called the null space of f.

It is intuitively clear that the dimension of Im(f) is no more than n. (It is of course no more than m, since it is contained in Rᵐ.) In general it may be less than n, for example if m < n or if f maps all points in Rⁿ to the zero vector in Rᵐ. (You should satisfy yourself that this last function is indeed a linear function.) However, if the dimension of Im(f) is indeed less than n, it means that the function has mapped the n-dimensional space Rⁿ into a linear space of lower dimension, and in the process some dimensions have been lost. The linearity of f means that a linear subspace of dimension equal to the number of dimensions that have been lost must have been collapsed to the zero vector (and that translates of this linear subspace have been collapsed to single points). Thus we can say that

    dim(Im(f)) + dim(Ker(f)) = n.
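This identity is easy to illustrate numerically. In the sketch below (assuming numpy is available, and with a made-up matrix), the linear function is f(x) = Ax, so dim(Im(f)) is the rank of A and dim(Ker(f)) is n minus the rank:

```python
import numpy as np

# For the linear map f(x) = A x from R^3 to R^2:
# dim Im(f) = rank(A), dim Ker(f) = n - rank(A), and they sum to n.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])     # 2 x 3 matrix; rows are proportional

n = A.shape[1]                      # dimension of the domain, here 3
rank = np.linalg.matrix_rank(A)     # dim Im(f)
nullity = n - rank                  # dim Ker(f)

assert rank + nullity == n
assert rank == 1                    # proportional rows: only one dimension survives
```

Here two of the three dimensions of R³ are "lost": a two-dimensional subspace of R³ is collapsed to the zero vector of R², exactly as the paragraph above describes.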


    In the following section we shall introduce the notion of a matrix and definevarious operations on matrices. If you are like me when I first came across matrices,these definitions may seem somewhat arbitrary and mysterious. However, we shallsee that matrices may be viewed as representations of linear functions and that whenviewed in this way the operations we define on matrices are completely natural.

    3. Matrices and Matrix Algebra

A matrix is defined as a rectangular array of numbers. If the matrix contains m rows and n columns it is called an m × n matrix (read "m by n matrix"). The element in the ith row and the jth column is called the ijth element. We typically enclose a matrix in square brackets [ ] and write it as

[ a11 . . . a1n ]
[  .   . .   .  ]
[ am1 . . . amn ]

In the case that m = n we call the matrix a square matrix. If m = 1 the matrix contains a single row and we call it a row vector. If n = 1 the matrix contains a single column and we call it a column vector. For most purposes we do not distinguish between a 1 × 1 matrix [a] and the scalar a.

Just as we defined the operation of vector addition and the multiplication of a vector by a scalar, we define similar operations for matrices. In order to be able to add two matrices we require that the matrices be of the same dimension. That is, if matrix A is of dimension m × n, we shall be able to add the matrix B to it if and only if B is also of dimension m × n. If this condition is met, then we add matrices simply by adding the corresponding elements of each matrix to obtain the new m × n matrix A + B. That is,

[ a11 . . . a1n ]   [ b11 . . . b1n ]   [ a11 + b11 . . . a1n + b1n ]
[  .   . .   .  ] + [  .   . .   .  ] = [     .      . .      .     ]
[ am1 . . . amn ]   [ bm1 . . . bmn ]   [ am1 + bm1 . . . amn + bmn ]

    We can see that this definition of matrix addition satisfies many of the sameproperties of the addition of scalars. If A, B, and C are all m n matrices then

    (1) A+B = B +A,(2) (A+B) + C = A+ (B + C),(3) there is a zero matrix 0 such that for any mn matrix A we have A+0 =

    0 +A = A, and(4) there is a matrix A such that A+ (A) = (A) +A = 0.Of course, the zero matrix referred to in 3 is simply the mn matrix consisting

    of all zeros (this is called a null matrix ) and the matrix A referred to in 4 is thematrix obtained from A by replacing each element of A by its negative, that is,

\[
-\begin{bmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \dots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
-a_{11} & \dots & -a_{1n} \\
\vdots & \ddots & \vdots \\
-a_{m1} & \dots & -a_{mn}
\end{bmatrix}.
\]

Now, given a scalar λ in R and an m × n matrix A, we define the product of λ and A, which we write λA, to be the matrix in which each element of A is replaced by λ times that element, that is,

\[
\lambda
\begin{bmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \dots & a_{mn}
\end{bmatrix}
=
\begin{bmatrix}
\lambda a_{11} & \dots & \lambda a_{1n} \\
\vdots & \ddots & \vdots \\
\lambda a_{m1} & \dots & \lambda a_{mn}
\end{bmatrix}.
\]


So far the definitions of matrix operations have all seemed the most natural ones. We now come to defining matrix multiplication. Perhaps here the definition seems somewhat less natural. However, in the next section we shall see that the definition we shall give is in fact very natural when we view matrices as representations of linear functions.

We define matrix multiplication of A times B, written as AB, where A is an m × n matrix and B is a p × q matrix, only when n = p. In this case the product AB is defined to be an m × q matrix in which the element in the ith row and jth column is \(\sum_{k=1}^{n} a_{ik} b_{kj}\). That is, to find the term to go in the ith row and the jth column of the product matrix AB we take the ith row of the matrix A, which will be a row vector with n elements, and the jth column of the matrix B, which will be a column vector with n elements. We then multiply each element of the first vector by the corresponding element of the second and add all these products. Thus

\[
\begin{bmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \dots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & \dots & b_{1q} \\
\vdots & \ddots & \vdots \\
b_{n1} & \dots & b_{nq}
\end{bmatrix}
=
\begin{bmatrix}
\sum_{k=1}^{n} a_{1k}b_{k1} & \dots & \sum_{k=1}^{n} a_{1k}b_{kq} \\
\vdots & \ddots & \vdots \\
\sum_{k=1}^{n} a_{mk}b_{k1} & \dots & \sum_{k=1}^{n} a_{mk}b_{kq}
\end{bmatrix}.
\]

For example
\[
\begin{bmatrix}
a & b & c \\
d & e & f
\end{bmatrix}
\begin{bmatrix}
p & q \\
r & s \\
t & v
\end{bmatrix}
=
\begin{bmatrix}
ap + br + ct & aq + bs + cv \\
dp + er + ft & dq + es + fv
\end{bmatrix}.
\]
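The definition can be transcribed directly: the (i, j) entry of AB is the sum over k of \(a_{ik} b_{kj}\). The following sketch (our own illustration, not from the coursebook) represents matrices as lists of rows.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and a p x q matrix B, defined when n = p."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    assert n == p, "AB is defined only when A has as many columns as B has rows"
    # Entry (i, j) is the sum over k of A[i][k] * B[k][j].
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[1, 0], [0, 1], [1, 1]]        # 3 x 2
print(mat_mul(A, B))                # [[4, 5], [10, 11]]
```

Note that the result is 2 × 2: the product of an m × n and an n × q matrix is m × q, as the definition requires.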

We define the identity matrix of order n to be the n × n matrix that has 1s on its main diagonal and zeros elsewhere, that is, whose ijth element is 1 if i = j and zero if i ≠ j. We denote this matrix by In or, if the order is clear from the context, simply I. That is,

\[
I =
\begin{bmatrix}
1 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & 1
\end{bmatrix}.
\]

It is easy to see that if A is an m × n matrix then AIn = A and ImA = A. In fact, we could equally well define the identity matrix to be the matrix that satisfies these properties for all such matrices A, in which case it would be easy to show that there is a unique matrix satisfying this property, namely, the matrix we defined above.

Consider an m × n matrix A. The columns of A are m-dimensional vectors, that is, elements of Rm, and the rows of A are elements of Rn. Thus we can ask if the n columns are linearly independent, and similarly if the m rows are linearly independent. In fact we ask: what is the maximum number of linearly independent columns of A? It turns out that this is the same as the maximum number of linearly independent rows of A. We call this number the rank of the matrix A.

    4. Matrices as Representations of Linear Functions

Let us suppose that we have a particular linear function f : Rn → Rm. We suggested in the previous section that such a function can necessarily be represented as multiplication by some matrix. We shall now show that this is true. Moreover, we shall do so by explicitly constructing the appropriate matrix.


    Let us write the n-dimensional vector x as a column vector

\[
x =
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}.
\]

Now, notice that we can write the vector x as a sum \(\sum_{i=1}^{n} x_i e_i\), where ei is the ith unit vector, that is, the vector with 1 in the ith place and zeros elsewhere. That is,

\[
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}
= x_1
\begin{bmatrix}
1 \\ 0 \\ \vdots \\ 0
\end{bmatrix}
+ x_2
\begin{bmatrix}
0 \\ 1 \\ \vdots \\ 0
\end{bmatrix}
+ \dots + x_n
\begin{bmatrix}
0 \\ 0 \\ \vdots \\ 1
\end{bmatrix}.
\]

    Now from the linearity of the function f we can write

\[
f(x) = f\Big(\sum_{i=1}^{n} x_i e_i\Big)
= \sum_{i=1}^{n} f(x_i e_i)
= \sum_{i=1}^{n} x_i f(e_i).
\]

But what is f(ei)? Remember that ei is a unit vector in Rn and that f maps vectors in Rn to vectors in Rm. Thus f(ei) is the image in Rm of the vector ei. Let us write f(ei) as

\[
\begin{bmatrix}
a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi}
\end{bmatrix}.
\]

    Thus

\[
\begin{aligned}
f(x) &= \sum_{i=1}^{n} x_i f(e_i) \\
&= x_1
\begin{bmatrix}
a_{11} \\ a_{21} \\ \vdots \\ a_{m1}
\end{bmatrix}
+ x_2
\begin{bmatrix}
a_{12} \\ a_{22} \\ \vdots \\ a_{m2}
\end{bmatrix}
+ \dots + x_n
\begin{bmatrix}
a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn}
\end{bmatrix} \\
&=
\begin{bmatrix}
\sum_{i=1}^{n} a_{1i} x_i \\
\sum_{i=1}^{n} a_{2i} x_i \\
\vdots \\
\sum_{i=1}^{n} a_{mi} x_i
\end{bmatrix}
\end{aligned}
\]

    and this is exactly what we would have obtained had we multiplied the matrices

\[
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}.
\]

Thus we have not only shown that a linear function is necessarily represented by multiplication by a matrix, we have also shown how to find the appropriate matrix.


It is precisely the matrix whose n columns are the images under the function of the n unit vectors in Rn.
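The construction just described can be coded directly: apply f to each unit vector and use the results as columns. This is our own sketch; the helper `matrix_of` and the example map f(x1, x2) = (x1 + 2x2, 3x1) are hypothetical illustrations, not from the coursebook.

```python
def matrix_of(f, n):
    """Return the m x n matrix (as a list of rows) representing the linear map f,
    whose jth column is the image f(e_j) of the jth unit vector."""
    cols = [f([1 if j == i else 0 for j in range(n)]) for i in range(n)]
    m = len(cols[0])
    # Transpose the list of columns into a list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(m)]

f = lambda x: [x[0] + 2 * x[1], 3 * x[0]]
print(matrix_of(f, 2))   # [[1, 2], [3, 0]]
```

Here f(e1) = (1, 3) and f(e2) = (2, 0), and these are indeed the columns of the matrix produced.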

Exercise 48. Find the matrices that represent the following linear functions from R2 to R2.

(1) a clockwise rotation of π/2 (90°),
(2) a reflection in the x1 axis,
(3) a reflection in the line x2 = x1 (that is, the 45° line),
(4) a counterclockwise rotation of π/4 (45°), and
(5) a reflection in the line x2 = x1 followed by a counterclockwise rotation of π/4.

Recall that in Section 2 we defined, for any f, g : Rn → Rm and λ ∈ R, the functions (f + g) and (λf). In Section 3 we defined the sum of two m × n matrices A and B, and the product of a scalar λ with the matrix A. Let us instead define the sum of A and B as follows.

Let f : Rn → Rm be the linear function represented by the matrix A and g : Rn → Rm be the linear function represented by the matrix B. Now define the matrix (A + B) to be the matrix that represents the linear function (f + g). Similarly, let the matrix λA be the matrix that represents the linear function (λf).

Exercise 49. Prove that the matrices (A + B) and λA defined in the previous paragraph coincide with the matrices defined in Section 3.

We can also see that the definition we gave of matrix multiplication is precisely the right definition if we take multiplication of matrices to mean the composition of the linear functions that the matrices represent. To be more precise, let f : Rn → Rm and g : Rm → Rk be linear functions and let A and B be the m × n and k × m matrices that represent them. Let (g ∘ f) : Rn → Rk be the composite function defined in Section 2. Now let us define the product BA to be the matrix that represents the linear function (g ∘ f).

    Now since the matrix A represents the function f and B represents g we have

\[
\begin{aligned}
(g \circ f)(x) &= g(f(x)) \\
&= g\left(
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}
\right) \\
&= g\left(
\begin{bmatrix}
\sum_{i=1}^{n} a_{1i} x_i \\
\sum_{i=1}^{n} a_{2i} x_i \\
\vdots \\
\sum_{i=1}^{n} a_{mi} x_i
\end{bmatrix}
\right) \\
&=
\begin{bmatrix}
b_{11} & b_{12} & \dots & b_{1m} \\
b_{21} & b_{22} & \dots & b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{k1} & b_{k2} & \dots & b_{km}
\end{bmatrix}
\begin{bmatrix}
\sum_{i=1}^{n} a_{1i} x_i \\
\sum_{i=1}^{n} a_{2i} x_i \\
\vdots \\
\sum_{i=1}^{n} a_{mi} x_i
\end{bmatrix} \\
&=
\begin{bmatrix}
\sum_{j=1}^{m} b_{1j} \sum_{i=1}^{n} a_{ji} x_i \\
\sum_{j=1}^{m} b_{2j} \sum_{i=1}^{n} a_{ji} x_i \\
\vdots \\
\sum_{j=1}^{m} b_{kj} \sum_{i=1}^{n} a_{ji} x_i
\end{bmatrix} \\
&=
\begin{bmatrix}
\sum_{i=1}^{n} \sum_{j=1}^{m} b_{1j} a_{ji} x_i \\
\sum_{i=1}^{n} \sum_{j=1}^{m} b_{2j} a_{ji} x_i \\
\vdots \\
\sum_{i=1}^{n} \sum_{j=1}^{m} b_{kj} a_{ji} x_i
\end{bmatrix} \\
&=
\begin{bmatrix}
\sum_{j=1}^{m} b_{1j} a_{j1} & \sum_{j=1}^{m} b_{1j} a_{j2} & \dots & \sum_{j=1}^{m} b_{1j} a_{jn} \\
\sum_{j=1}^{m} b_{2j} a_{j1} & \sum_{j=1}^{m} b_{2j} a_{j2} & \dots & \sum_{j=1}^{m} b_{2j} a_{jn} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_{j=1}^{m} b_{kj} a_{j1} & \sum_{j=1}^{m} b_{kj} a_{j2} & \dots & \sum_{j=1}^{m} b_{kj} a_{jn}
\end{bmatrix}
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}.
\end{aligned}
\]

And this last is the product of the matrix we defined in Section 3 to be BA with the column vector x. As we have claimed, the definition of matrix multiplication we gave in Section 3 was not arbitrary but rather was forced on us by our decision to regard the multiplication of two matrices as corresponding to the composition of the linear functions the matrices represent.
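A small numerical illustration of this correspondence (the matrices are our own examples): applying A to x and then B to the result gives the same vector as applying the single matrix BA to x.

```python
def mat_mul(X, Y):
    """Matrix product, with matrices represented as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [0, 1], [3, 0]]       # 3 x 2, represents f : R^2 -> R^3
B = [[1, 0, 2], [0, 1, 1]]         # 2 x 3, represents g : R^3 -> R^2
x = [[5], [7]]                     # a column vector in R^2

step_by_step = mat_mul(B, mat_mul(A, x))   # g(f(x))
one_matrix   = mat_mul(mat_mul(B, A), x)   # (BA) x
print(step_by_step == one_matrix)          # True
```

Of course the equality here is just associativity of the matrix product, B(Ax) = (BA)x, which is exactly what the derivation above establishes.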

Recall that the columns of the matrix A that represents the linear function f : Rn → Rm are precisely the images of the unit vectors in Rn under f. The linearity of f means that the image of any point in Rn is in the span of the images of these unit vectors, and similarly that any point in the span of the images is the image of some point in Rn. Thus Im(f) is equal to the span of the columns of A. Now, the dimension of the span of the columns of A is equal to the maximum number of linearly independent columns in A, that is, to the rank of A.

    5. Linear Functions from Rn to Rn and Square Matrices

In the remainder of this chapter we look more closely at an important subclass of linear functions and the matrices that represent them, viz the functions that map Rn to itself. From what we have already said we see immediately that the matrix representing such a linear function will have the same number of rows as it has columns. We call such a matrix a square matrix.

If the linear function f : Rn → Rn is one-to-one and onto then the function f has an inverse f−1. In Exercise 47 you showed that this function too is linear. A matrix that represents a linear function that is one-to-one and onto is called a nonsingular matrix. Alternatively, we can say that an n × n matrix is nonsingular if the rank of the matrix is n. To see that these two statements are equivalent, note first that if f is one-to-one then Ker(f) = {0}. (This is the trivial direction of Exercise 50.) But this means that dim(Ker(f)) = 0 and so dim(Im(f)) = n. And, as we argued at the end of the previous section, this is the same as the rank of the matrix that represents f.

Exercise 50. Show that the linear function f : Rn → Rm is one-to-one if and only if Ker(f) = {0}.

Exercise 51. Show that the linear function f : Rn → Rn is one-to-one if and only if it is onto.

    6. Inverse Functions and Inverse Matrices

In the previous section we discussed briefly the idea of the inverse of a linear function f : Rn → Rn. This allows us a very easy definition of the inverse of a square matrix A. The inverse of A is the matrix that represents the linear function that is the inverse of the linear function that A represents. We write the inverse of the matrix A as A−1. Thus a matrix will have an inverse if and only if the linear function that the matrix represents has an inverse, that is, if and only if the linear function is one-to-one and onto. We saw in the previous section that


this will occur if and only if the kernel of the function is {0}, which in turn occurs if and only if the image of f is of full dimension, that is, is all of Rn. This is the same as the matrix being of full rank, that is, of rank n.

As with the ideas we have discussed earlier, we can express the idea of a matrix inverse purely in terms of matrices, without reference to the linear functions that they represent. Given an n × n matrix A we define the inverse of A to be a matrix B such that BA = In, where In is the n × n identity matrix discussed in Section 3. Such a matrix B will exist if and only if the matrix A is nonsingular. Moreover, if such a matrix B exists then it is also true that AB = In, that is, (A−1)−1 = A.

In Section 9 we shall see one method for calculating inverses of general n × n matrices. Here we shall simply describe how to calculate the inverse of a 2 × 2 matrix. Suppose that we have the matrix

\[
A =
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}.
\]

The inverse of this matrix is
\[
\left( \frac{1}{ad - bc} \right)
\begin{bmatrix}
d & -b \\
-c & a
\end{bmatrix}.
\]
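The 2 × 2 formula can be sketched as follows. This is our own illustration: the helper name `inverse_2x2` and the use of exact `Fraction` arithmetic are choices made for the example, not part of the coursebook.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the formula (1/(ad - bc)) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "A is singular: ad - bc = 0"
    s = Fraction(1, det)
    return [[s * d, -s * b], [-s * c, s * a]]

A = [[3, 1], [1, 2]]                 # here ad - bc = 5
inv = inverse_2x2(A)
# Multiply A by its inverse; the result should be the 2 x 2 identity.
check = [[sum(A[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
print(check == [[1, 0], [0, 1]])     # True
```

The assertion on `det` reflects the fact, noted below, that A has an inverse only when ad − bc ≠ 0.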

Exercise 52. Show that the matrix A is of full rank if and only if ad − bc ≠ 0.

Exercise 53. Check that the matrix given is, in fact, the inverse of A.

    7. Changes of Basis

We have until now implicitly assumed that there is no ambiguity when we speak of the vector (x1, x2, . . . , xn). Sometimes there may indeed be an obvious meaning to such a vector. However, when we define a linear space all that are really specified are what "straight lines" are and where zero is. In particular, we do not necessarily have defined in an unambiguous way where the axes are or what a unit length along each axis is. In other words, we may not have a set of basis vectors specified.

Even when we do have, or have decided on, a set of basis vectors, we may wish to redefine our description of the linear space with which we are dealing so as to use a different set of basis vectors. Let us suppose that we have an n-dimensional space, even Rn say, with a given set of basis vectors v1, v2, . . . , vn, and that we wish instead to describe the space in terms of the linearly independent vectors b1, b2, . . . , bn where

\[
b_i = b_{1i} v_1 + b_{2i} v_2 + \dots + b_{ni} v_n.
\]

Now, if we had the description of a point in terms of the new coordinate vectors, e.g., as
\[
z_1 b_1 + z_2 b_2 + \dots + z_n b_n
\]

then we can easily convert this to a description in terms of the original basis vectors. We would simply substitute the formula for bi in terms of the vj's into the previous formula, giving
\[
\Big( \sum_{i=1}^{n} b_{1i} z_i \Big) v_1
+ \Big( \sum_{i=1}^{n} b_{2i} z_i \Big) v_2
+ \dots
+ \Big( \sum_{i=1}^{n} b_{ni} z_i \Big) v_n
\]

or, in our previous notation,
\[
\begin{bmatrix}
\sum_{i=1}^{n} b_{1i} z_i \\
\sum_{i=1}^{n} b_{2i} z_i \\
\vdots \\
\sum_{i=1}^{n} b_{ni} z_i
\end{bmatrix}.
\]


    But this is simply the product

\[
\begin{bmatrix}
b_{11} & b_{12} & \dots & b_{1n} \\
b_{21} & b_{22} & \dots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \dots & b_{nn}
\end{bmatrix}
\begin{bmatrix}
z_1 \\ z_2 \\ \vdots \\ z_n
\end{bmatrix}.
\]

That is, if we are given an n-tuple of real numbers that describes a vector in terms of the new basis vectors b1, b2, . . . , bn and we wish to find the n-tuple that describes the vector in terms of the original basis vectors, we simply multiply the n-tuple we are given, written as a column vector, by the matrix whose columns are the new basis vectors b1, b2, . . . , bn. We shall call this matrix B. We see, among other things, that changing the basis is a linear operation.

Now, if we were given the information in terms of the original basis vectors and wanted to write it in terms of the new basis vectors, what should we do? Since we don't have the original basis vectors written in terms of the new basis vectors this is not immediately obvious. However, we do know that if we were to do it and then were to carry out the operation described in the previous paragraph, we would be back with what we started. Further, we know that the operation is a linear operation that maps n-tuples to n-tuples and so is represented by multiplication by an n × n matrix. That is, we multiply the n-tuple, written as a column vector, by the matrix that when multiplied by B gives the identity matrix, that is, the matrix B−1. If we are given a vector of the form

\[
x_1 v_1 + x_2 v_2 + \dots + x_n v_n
\]
and we wish to express it in terms of the vectors b1, b2, . . . , bn, we calculate

\[
\begin{bmatrix}
b_{11} & b_{12} & \dots & b_{1n} \\
b_{21} & b_{22} & \dots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \dots & b_{nn}
\end{bmatrix}^{-1}
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}.
\]

Suppose now that we consider a linear function f : Rn → Rn and that we have originally described Rn in terms of the standard basis vectors e1, e2, . . . , en, where ei is the vector with 1 in the ith place and zeros elsewhere. Suppose that with these basis vectors f is represented by the matrix

\[
A =
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{bmatrix}.
\]

If we now describe Rn in terms of the vectors b1, b2, . . . , bn, how will the linear function f be represented? Let us think of what we want. We shall be given a vector described in terms of the basis vectors b1, b2, . . . , bn and we shall want to know what the image of this vector under the linear function f is, where we shall again want our answer in terms of the basis vectors b1, b2, . . . , bn. We know how to do this when we are given the description in terms of the vectors e1, e2, . . . , en. Thus the first thing we shall do with our vector is to convert it from a description in terms of b1, b2, . . . , bn to a description in terms of e1, e2, . . . , en. We do this by multiplying the n-tuple by the matrix B. Thus if we call our original n-tuple z we shall now have a description of the vector in terms of e1, e2, . . . , en, viz Bz. Given this description we can find the image of the vector in question under f by multiplying by the matrix A. Thus we shall have A(Bz) = (AB)z. Remember, however, this will have given us the image vector in terms of the basis


vectors e1, e2, . . . , en. In order to convert this to a description in terms of the vectors b1, b2, . . . , bn we must multiply by the matrix B−1. Thus our final n-tuple will be (B−1AB)z.

Recapitulating, suppose that we know that the linear function f : Rn → Rn is represented by the matrix A when we describe Rn in terms of the standard basis vectors e1, e2, . . . , en and that we have a new set of basis vectors b1, b2, . . . , bn. Then when Rn is described in terms of these new basis vectors the linear function f will be represented by the matrix B−1AB.
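This recipe can be checked numerically. In the sketch below (our own example matrices, not from the coursebook), applying the three steps in sequence (convert with B, apply A, convert back with B−1) agrees with applying the single matrix B−1AB.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Matrix product, with matrices represented as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1], [0, 3]]                       # f in the standard basis
B = [[1, 1], [0, 1]]                       # columns are the new basis vectors
B_inv = [[Fraction(1), Fraction(-1)],      # inverse of B (its determinant is 1)
         [Fraction(0), Fraction(1)]]

z = [[4], [5]]                             # coordinates in the new basis
via_steps  = mat_mul(B_inv, mat_mul(A, mat_mul(B, z)))   # B^{-1}(A(Bz))
via_single = mat_mul(mat_mul(B_inv, mat_mul(A, B)), z)   # (B^{-1}AB)z
print(via_steps == via_single)             # True
```

Exact `Fraction` entries are used for B−1 so the comparison is not disturbed by floating-point rounding.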

Exercise 54. Let f : Rn → Rm be a linear function. Suppose that with the standard bases for Rn and Rm the function f is represented by the matrix A. Let b1, b2, . . . , bn be a new set of basis vectors for Rn and c1, c2, . . . , cm be a new set of basis vectors for Rm. What is the matrix that represents f when the linear spaces are described in terms of the new basis vectors?

Exercise 55. Let f : R2 → R2 be a linear function. Suppose that with the standard basis for R2 the function f is represented by the matrix
\[
\begin{bmatrix}
3 & 1 \\
1 & 2
\end{bmatrix}.
\]
Let
\[
\begin{bmatrix} 3 \\ 2 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
be a new set of basis vectors for R2. What is the matrix that represents f when R2 is described in terms of the new basis vectors?

Properties of a square matrix that depend only on the linear function that the matrix represents, and not on the particular choice of basis vectors for the linear space, are called invariant properties. We have already seen one example of an invariant property, the rank of a matrix. The rank of a matrix is equal to the dimension of the image space of the function that the matrix represents, which clearly depends only on the function and not on the choice of basis vectors for the linear space.

The idea of a property being invariant can also be expressed in terms only of matrices, without reference to the idea of linear functions. A property is invariant if whenever an n × n matrix A has the property then for any nonsingular n × n matrix B the matrix B−1AB also has the property. We might think of rank as a function that associates to any square matrix a nonnegative integer. We shall say that such a function is an invariant if the property of having the function take a particular value is invariant for all particular values we may choose.

Two particularly important invariants are the trace of a square matrix and the determinant of a square matrix. We examine these in more detail in the following section.

    8. The Trace and the Determinant

In this section we define two important real valued functions on the space of n × n matrices, the trace and the determinant. Both of these concepts have geometric interpretations. However, while the trace is easy to calculate (much easier than the determinant), its geometric interpretation is rather hard to see. Thus we shall not go into it. On the other hand the determinant, while being somewhat harder to calculate, has a very clear geometric interpretation. In Section 9 we shall examine in some detail how to calculate determinants. In this section we shall be content to discuss one definition and the geometric intuition of the determinant.


Given an n × n matrix A, the trace of A, written tr(A), is the sum of the elements on the main diagonal, that is,

\[
\operatorname{tr}
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{bmatrix}
= \sum_{i=1}^{n} a_{ii}.
\]

Exercise 56. For the matrices given in Exercise 55, confirm that tr(A) = tr(B−1AB).

It is easy to see that the trace is a linear function on the space of all n × n matrices, that is, that for all n × n matrices A and B and for all λ ∈ R

(1) tr(A + B) = tr(A) + tr(B), and
(2) tr(λA) = λ tr(A).

We can also see that if A and B are both n × n matrices then tr(AB) = tr(BA). In fact, if A is an m × n matrix and B is an n × m matrix this is still true. This will often be extremely useful in calculating the trace of a product.

Exercise 57. From the definition of matrix multiplication show that if A is an m × n matrix and B is an n × m matrix then tr(AB) = tr(BA). [Hint: Look at the definition of matrix multiplication in Section 3. Then write the trace of the product matrix using summation notation. Finally, change the order of summation.]
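A quick numerical spot-check of the claim, with our own example matrices, where A is 2 × 3 and B is 3 × 2 so that AB is 2 × 2 while BA is 3 × 3 (such a check does not, of course, substitute for the proof asked for above):

```python
def mat_mul(X, Y):
    """Matrix product, with matrices represented as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    """Sum of the main-diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[1, 0], [2, 1], [0, 3]]        # 3 x 2
print(trace(mat_mul(A, B)), trace(mat_mul(B, A)))   # 28 28
```

Note that AB and BA are of different sizes, yet their traces agree.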

The determinant, unlike the trace, is not a linear function of the matrix. It does however have some linear structure. If we fix all columns of the matrix except one and look at the determinant as a function of only this column then the determinant is linear in this single column. Moreover, this is true whichever column we choose. Let us write the determinant of the n × n matrix A as det(A). Let us also write the matrix A as [a1, a2, . . . , an] where ai is the ith column of the matrix A. Thus our claim is that for all n × n matrices A, for all i = 1, 2, . . . , n, for all n-vectors b, and for all λ ∈ R

\[
\det([a_1, \dots, a_{i-1}, a_i + b, a_{i+1}, \dots, a_n])
= \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n])
+ \det([a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n])
\tag{3}
\]

and

\[
\det([a_1, \dots, a_{i-1}, \lambda a_i, a_{i+1}, \dots, a_n])
= \lambda \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]).
\tag{4}
\]

We express this by saying that the determinant is a multilinear function. Also, the determinant is such that any n × n matrix that is not of full rank, that is, of rank n, has a zero determinant. In fact, given that the determinant is a multilinear function, if we simply say that any matrix in which one column is the same as one of its neighbours has a zero determinant, this implies the stronger statement that we made. We already see one use of calculating determinants: a matrix is nonsingular if and only if its determinant is nonzero.

The two properties of being multilinear and zero whenever two neighbouring columns are the same already almost uniquely identify the determinant. Notice, however, that if the determinant satisfies these two properties then so does any constant times the determinant. To uniquely define the determinant we tie down this constant by assuming that det(I) = 1.

Though we haven't proved that it is so, these three properties uniquely define the determinant. That is, there is one and only one function with these three properties. We call this function the determinant. In Section 9 we shall discuss a


number of other useful properties of the determinant. Remember that these additional properties are not really additional facts about the determinant. They can all be derived from the three properties we have given here.

Let us now look at the geometric interpretation of the determinant. Let us first think about what linear transformations can do to the space Rn. Since we have already said that a linear transformation that is not onto is represented by a matrix with a zero determinant, let us think about linear transformations that are onto, that is, that do not map Rn into a linear space of lower dimension. Such transformations can rotate the space around zero. They can stretch the space in different directions. And they can flip the space over. In the latter case all objects will become mirror images of themselves. We call linear transformations that make such a mirror image orientation reversing and those that don't orientation preserving. A matrix that represents an orientation preserving linear function has a positive determinant, while a matrix that represents an orientation reversing linear function has a negative determinant. Thus we have a geometric interpretation of the sign of the determinant.

The absolute size of the determinant represents how much bigger or smaller the linear function makes objects. More precisely, it gives the "volume" of the image of the unit hypercube under the transformation. The word volume is in quotes because it is the volume with which we are familiar only when n = 3. If n = 2 then it is area, while if n > 3 it is the full dimensional analog in Rn of volume in R3.
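For n = 2 this interpretation is easy to check numerically. In the sketch below (our own example matrix), the image of the unit square is the parallelogram spanned by the matrix's columns, whose area we compute with the shoelace formula and compare with the determinant.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

def parallelogram_area(u, v):
    """Shoelace-formula area of the parallelogram with vertices 0, u, u+v, v."""
    pts = [(0, 0), u, (u[0] + v[0], u[1] + v[1]), v]
    s = sum(pts[i][0] * pts[(i + 1) % 4][1] - pts[(i + 1) % 4][0] * pts[i][1]
            for i in range(4))
    return abs(s) / 2

M = [[2, 1], [1, 3]]
cols = [(2, 1), (1, 3)]     # columns of M: the images of the unit vectors
print(det2(M), parallelogram_area(*cols))    # 5 5.0
```

Here the determinant is positive, matching the fact that this map preserves orientation; for an orientation reversing map the area would equal the absolute value of a negative determinant.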

Exercise 58. Consider the matrix
\[
\begin{bmatrix}
3 & 1 \\
1 & 2
\end{bmatrix}.
\]
In a diagram show the image under the linear function that this matrix represents of the unit square, that is, the square whose corners are the points (0,0), (1,0), (0,1), and (1,1). Calculate the area of that image. Do the same for the matrix
\[
\begin{bmatrix}
4 & 1 \\
1 & 1
\end{bmatrix}.
\]

    In