Foundations Of Economic Analysis


ECON 381 SC Foundations Of Economic Analysis

2009

John Hillas and Dmitriy Kvasov

University of Auckland


Contents

Chapter 1. Logic, Sets, Functions, and Spaces
  1. Logic
  2. Sets
  3. Binary Relations
  4. Functions
  5. Spaces
  6. Metric Spaces and Continuous Functions
  7. Open Sets, Compact Sets, and the Weierstrass Theorem
  8. Sequences and Subsequences
  9. Linear Spaces

Chapter 2. Linear Algebra
  1. The Space Rn
  2. Linear Functions from Rn to Rm
  3. Matrices and Matrix Algebra
  4. Matrices as Representations of Linear Functions
  5. Linear Functions from Rn to Rn and Square Matrices
  6. Inverse Functions and Inverse Matrices
  7. Changes of Basis
  8. The Trace and the Determinant
  9. Calculating and Using Determinants
  10. Eigenvalues and Eigenvectors

Chapter 3. Consumer Behaviour: Optimisation Subject to the Budget Constraint
  1. Constrained Maximisation
  2. The Implicit Function Theorem
  3. The Theorem of the Maximum
  4. The Envelope Theorem
  5. Applications to Microeconomic Theory

Chapter 4. Topics in Convex Analysis
  1. Convexity
  2. Support and Separation


CHAPTER 1

Logic, Sets, Functions, and Spaces

1. Logic

All the aspects of logic that we describe in this section are part of what is called propositional logic.

We start by supposing that we have a number of atomic statements, which we denote by lower case letters, p, q, r. Examples of such statements might be

"Consumer 1 is a utility maximiser,"
"the apple is green,"
"the price of good 3 is 17."

We assume that each atomic statement is either true or false. Given these atomic statements we can form other statements using logical connectives.

If p is a statement then ¬p, read not p, is the statement that is true precisely when p is false. If both p and q are statements then p ∧ q, read p and q, is the statement that is true when both p and q are true and false otherwise. If both p and q are statements then p ∨ q, read p or q, is the statement that is true when either p or q is true, that is, the statement that is false only if both p and q are false.

We could make do with these three symbols together with brackets to group symbols and tell us what to do first. For example we could have the complicated statement ((p ∧ q) ∨ (p ∧ r)) ∨ ¬s. This means that at least one of two statements is true. The first is that either both p and q are true or both p and r are true. The second is that s is not true.

Exercise 1. Think about the meaning of the statement we have just considered. Can you see a more straightforward statement that would mean the same thing?

While we don’t strictly need any more symbols it is certainly convenient to have at least a couple more. If both p and q are statements then p ⇒ q, read if p then q or p implies q or p is sufficient for q or q is necessary for p, is the statement that is false when p is true and q is false and is true otherwise. Many people find this a bit nonintuitive. In particular, one might wonder about the truth of this statement when p is false and q is true. A simple (and correct) answer is that this is a definition. It is simply what we mean by the symbol and there isn’t any point in arguing about definitions. However there is a sense in which the definition is what is implied by the informal statements. When we say “if p then q” we are saying that in any situation or state in which p is true then q is also true. We are not making any claim about what might or might not be the case when p is not true. So, in states in which p is not true we make no claim about q and so our statement is true whether q is true or false. Instead of p ⇒ q we can write q ⇐ p. In this case we are most likely to read the statement as q if p.



If p ⇒ q and p ⇐ q (that is, q ⇒ p) then we say that p if and only if q or p is necessary and sufficient for q and write p ⇔ q.

One powerful method of analysing logical relationships is by means of truth tables. A truth table lists all possible combinations of the truth values of the atomic statements and the associated truth values of the compound statements. If we have two atomic statements then the following table gives the four possible combinations of truth values.

p q
T T
F T
T F
F F

Now, we can add a column that would, for each combination of truth values of p and q, give the truth value of p ⇒ q, just as described above.

p q p ⇒ q
T T   T
F T   T
T F   F
F F   T

Such truth tables allow us to see the logical relationship between various statements. Suppose we have two compound statements A and B and we form a truth table showing the truth values of A and B for each possible profile of truth values of the atomic statements that constitute A and B. If in each row in which A is true B is also true then statement A implies statement B. If statements A and B have the same truth value in each row then statements A and B are logically equivalent. For example I claim that the statement p ⇒ q we have just considered is logically equivalent to ¬p ∨ q. We can see this by adding columns to the truth table we have just considered. Let me add a column for ¬p and then one for ¬p ∨ q. (We only add the column for ¬p to make the calculation easier.)

p q p ⇒ q ¬p ¬p ∨ q
T T   T    F    T
F T   T    T    T
T F   F    F    F
F F   T    T    T

Since the third column and the fifth column contain exactly the same truth values we see that the two statements, p ⇒ q and ¬p ∨ q, are indeed logically equivalent.
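Row-by-row checks of this kind are easy to mechanise. A minimal Python sketch (the function name `implies` is our own, not notation from the notes) that rebuilds the table above and confirms the equivalence:

```python
from itertools import product

def implies(p, q):
    # The defining truth condition of p => q:
    # false exactly when p is true and q is false.
    return not (p and not q)

# Enumerate the four truth-value profiles, as in the truth table above.
for p, q in product([True, False], repeat=2):
    # In every row, p => q has the same truth value as (not p) or q.
    assert implies(p, q) == ((not p) or q)
    print(p, q, implies(p, q))
```

The same loop, with more variables passed to `product`, handles compound statements like the one in Exercise 2.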

Exercise 2. Construct the truth table for the statement ¬(¬p ∨ ¬q). Is it possible to write this statement using fewer logical connectives? Hint: why not start with just one?

Exercise 3. Prove that the following statements are equivalent:
(i) (p ∨ ¬q) ⇒ ((¬p) ∧ q) and ¬(q ⇒ p),
(ii) p ⇒ q and ¬q ⇒ ¬p.

In part (ii) the second statement is called the contrapositive of the first statement. Often if you are asked to prove that p implies q it will be easier to show the contrapositive, that is, that not q implies not p.

Exercise 4. Prove that the following statements are equivalent:
(i) ¬(p ∧ q) and ¬p ∨ ¬q,
(ii) ¬(p ∨ q) and ¬p ∧ ¬q.


These two equivalences are known as De Morgan’s Laws.

A tautology is a statement that is necessarily true. For example if the statements A and B are logically equivalent then the statement A ⇔ B is a tautology. If A logically implies B then A ⇒ B is a tautology. We can check whether a compound statement is a tautology by writing a truth table for this statement. If the statement is a tautology then its truth value should be T in each row of its truth table.

A contradiction is a statement that is necessarily false, that is, a statement A such that ¬A is a tautology. Again, we can see whether a statement is a contradiction by writing a truth table for the statement.

2. Sets

Set theory was developed in the second half of the 19th century and is at the very foundation of modern mathematics. But we shall not be concerned here with the development of the theory. Rather we shall only give the basic language of set theory and outline some of the very basic operations on sets.

We start by defining a set to be a collection of objects or elements. We will usually denote sets by capital letters and their elements by lower case letters. If the element a is in the set A we write a ∈ A. If every element of the set B is also in the set A we call B a subset of the set A and write B ⊂ A. We shall also say that A contains B. If A and B have exactly the same elements then we say they are equal or identical. Alternatively we could say A = B if and only if A ⊂ B and B ⊂ A. If B ⊂ A and B ≠ A then we say that B is a proper subset of A or that A strictly contains B.

Exercise 5. How many subsets does a set with N elements have?

In order to avoid the paradoxes that can arise in naive set theory we shall always assume that in whatever situation we are discussing there is some given set U called the universal set which contains all of the sets with which we shall deal.

We customarily enclose our specification of a set by braces. In order to specify a set one may simply list the elements. For example to specify the set D which contains the numbers 1, 2, and 3 we may write D = {1, 2, 3}. Alternatively we may define the set by specifying a property that identifies the elements. For example we may specify the same set D by D = {x | x is an integer and 0 < x < 4}. Notice that this second method is more powerful. We could not, for example, list all the integers. (Since there are an infinite number of them we would die before we finished.)

For any two sets A and B we define the union of A and B to be the set which contains exactly all of the elements of A and all the elements of B. We denote the union of A and B by A ∪ B. Similarly we define the intersection of A and B to be that set which contains exactly those elements which are in both A and B. We denote the intersection of A and B by A ∩ B. Thus we have

A ∪ B = {x | x ∈ A or x ∈ B}
A ∩ B = {x | x ∈ A and x ∈ B}.

Exercise 6. The oldest mathematician among chess players and the oldest chess player among mathematicians: are they the same person or (possibly) different people?

Exercise 7. The best mathematician among chess players and the best chess player among mathematicians: are they the same person or (possibly) different people?


Exercise 8. Every tenth mathematician is a chess player and every fourth chess player is a mathematician. Are there more mathematicians or chess players, and how many times more?

Exercise 9. Prove the distributive laws for the operations of union and intersection.

(i) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
(ii) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)

Just as the number zero is extremely useful, so the concept of a set that has no elements is extremely useful also. This set we call the empty set or the null set and denote by ∅. To see one use of the empty set notice that having such a concept allows the intersection of two sets to be well defined whether or not the sets have any elements in common.

We also introduce the concept of a Cartesian product. If we have two sets, say A and B, the Cartesian product, A × B, is the set of all ordered pairs, (a, b), such that a is an element of A and b is an element of B. Symbolically we write

A × B = {(a, b) | a ∈ A and b ∈ B}.
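Python's built-in sets mirror these operations directly. A small sketch (the sets A and B are our own illustrative choices, not from the notes):

```python
# The set D specified by a property, as in D = {x | x is an integer and 0 < x < 4}.
D = {x for x in range(-10, 10) if 0 < x < 4}

A = {1, 2, 3}
B = {3, 4}

union = A | B                               # A ∪ B
intersection = A & B                        # A ∩ B
cartesian = {(a, b) for a in A for b in B}  # A × B, the set of ordered pairs

print(sorted(D))             # [1, 2, 3]
print(sorted(union))         # [1, 2, 3, 4]
print(sorted(intersection))  # [3]
print(len(cartesian))        # 6, since A has 3 elements and B has 2
```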

3. Binary Relations

There are a number of ways of formulating the notion of a binary relation. We shall pursue one, defining a binary relation on a set X simply as a subset of X × X, the Cartesian product of X with itself.

Definition 1. A binary relation R on the set X is a subset of X × X. If the point (x, y) ∈ R we shall often write xRy instead of (x, y) ∈ R.

Since we have already defined the notions of Cartesian product and subset, there is really nothing new here. However the structure and properties of binary relations that we shall now study are motivated by the informal notion of a “relation” between the elements of X.

Example 1. Suppose that X is a set of boys and girls and the relation xSy is “x is a sister of y.”

Example 2. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.There are binary relations >, ≥, and =.

Example 3. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }. The relations R, P, and I are defined by

xRy if and only if x + 1 ≥ y,
xPy if and only if x > y + 1, and
xIy if and only if −1 ≤ x − y ≤ 1.

Definition 2. The following properties of binary relations have been defined and found to be useful.

(BR1) Reflexivity: For all x in X, xRx.
(BR2) Irreflexivity: For all x in X, not xRx.
(BR3) Completeness: For all x and y in X, either xRy or yRx (or both).1
(BR4) Transitivity: For all x, y, and z in X, if xRy and yRz then xRz.
(BR5) Negative Transitivity: For all x, y, and z in X, if xRy then either xRz or zRy (or both).
(BR6) Symmetry: For all x and y in X, if xRy then yRx.
(BR7) Anti-Symmetry: For all x and y in X, if xRy and yRx then x = y.
(BR8) Asymmetry: For all x and y in X, if xRy then not yRx.

1We shall always implicitly include “or both” when we say “either. . . or.”


Exercise 10. Show that completeness implies reflexivity, that asymmetry implies anti-symmetry, and that asymmetry implies irreflexivity.

Exercise 11. Which properties does the relation described in Example 1 satisfy?

Exercise 12. Which properties do the relations described in Example 2 satisfy?

Exercise 13. Which properties do the relations described in Example 3 satisfy?
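For relations on a finite set, the properties of Definition 2 can be checked mechanically. A Python sketch for the relations of Example 3, restricted to {1, . . . , 10} (the helper names are ours; a finite check illustrates, but does not prove, the answers to Exercises 12 and 13):

```python
X = range(1, 11)  # a finite initial segment of the natural numbers

# The relations of Example 3.
def R(x, y): return x + 1 >= y
def P(x, y): return x > y + 1
def I(x, y): return -1 <= x - y <= 1

# Properties from Definition 2, checked by brute force over X.
def reflexive(rel):  return all(rel(x, x) for x in X)
def complete(rel):   return all(rel(x, y) or rel(y, x) for x in X for y in X)
def symmetric(rel):  return all(rel(y, x) for x in X for y in X if rel(x, y))
def transitive(rel): return all(rel(x, z) for x in X for y in X for z in X
                                if rel(x, y) and rel(y, z))

print(reflexive(R), complete(R), transitive(R))  # True True False
print(symmetric(I), transitive(I))               # True False
print(transitive(P))                             # True
```

Note, for instance, that R is complete but fails transitivity: 1R2 and 2R3 hold while 1R3 does not.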

We now define a few particularly important classes of binary relations.

Definition 3. A weak order is a binary relation that satisfies transitivity and completeness.

Definition 4. A strict partial order is a binary relation that satisfies transitivity and asymmetry.

Definition 5. An equivalence is a binary relation that satisfies transitivity and symmetry.

You have almost certainly already met examples of such binary relations in your study of Economics. We normally assume that weak preference, strict preference, and indifference of a consumer are weak orders, strict partial orders, and equivalences, though we actually typically assume a little more about the strict preference.

The following construction is also motivated by the idea of preference. Let us consider some binary relation R which we shall informally think of as a weak preference relation, though we shall not, for the moment, make any assumptions about the properties of R. Consider the relation P defined by xPy if and only if xRy and not yRx, and the relation I defined by xIy if and only if xRy and yRx.

Exercise 14. Show that if R is a weak order then P is a strict partial orderand I is an equivalence.

We could also think of starting with a strict preference P and defining the weak preference R in terms of P. We could do so either by defining R as xRy if and only if not yPx or by defining R as xRy if and only if either xPy or not yPx.

Exercise 15. Show that these two definitions of R coincide if P is asymmetric.

Exercise 16. Show by example that P may be a strict partial order (so, by the previous result, the two definitions of R coincide) but R not a weak order. [Hint: If you cannot think of another example consider the binary relations defined in Example 3.]

Exercise 17. Show that if P is asymmetric and negatively transitive then
(i) P is transitive (and hence a strict partial order), and
(ii) R is a weak order.

4. Functions

Let X and Y be two sets. A function (or a mapping) f from the set X to the set Y is a rule that assigns to each x in X a unique element in Y, denoted by f(x). The notation

f : X → Y


is standard. The set X is called the domain of f and the set Y is called the codomain of f. The set of all values taken by f, i.e. the set

{y ∈ Y | there exists x in X such that y = f(x)}

is called the range of f. The range of a function need not coincide with its codomain Y.

There are several useful ways of visualising functions. A function can be thought of as a machine that operates on elements of the set X and transforms an input x into a unique output f(x). Note that the machine is not required to produce different outputs from different inputs. This analogy helps to distinguish between the function itself, f, and its particular value, f(x). The former is the machine, the latter is the output!2 One of the reasons for this confusion is that in practice, to avoid being verbose, people often say things like ‘consider a function U(x, y) = x^α y^β’ instead of saying ‘consider a function defined for every pair (x, y) in R2 by the equation U(x, y) = x^α y^β’.

A function can also be thought of as a transformation, or a mapping, of the set X into the set Y. In line with this interpretation is the common terminology: it is said that f(x) is the image of x under the function f. Again, it is important to remember that there may be points of Y which are the images of no point of X and that there may be different points of X which have the same images in Y. What is absolutely prohibited, however, is for a point from X to have several images in Y!

Part of the definition of a function is the specification of its domain. However, in applications, functions are quite often defined by an algebraic formula, without explicit specification of the domain. For example, a function may be defined as

f(x) = sin x + 145x^2.

The function f is then the rule that assigns the value sin x + 145x^2 to each value of x. The convention in such cases is that the domain of f is the set of all values of x for which the formula gives a unique value. Thus, if you come across, for instance, the function f(x) = 1/x you should assume that its domain is (−∞, 0) ∪ (0, ∞), unless specified otherwise.

For any subset A of X, the subset f(A) of Y consisting of those y such that y = f(x) for some x in A is called the image of A under f, that is,

f(A) = {y ∈ Y | there exists x in A such that y = f(x)}.

Thus, the range of f can be written as f(X). Similarly, one can define the inverse image. For any subset B of Y, the inverse image f⁻¹(B) of B is the set of x in X such that f(x) is in B, that is,

f⁻¹(B) = {x ∈ X | f(x) ∈ B}.

A function f is called a function onto Y (or a surjection) if the range of f is Y, i.e., if for every y ∈ Y there is (at least) one x ∈ X such that y = f(x). In other words, each element of Y is the image of (at least) one element of X. A function f is called one-to-one (or an injection) if f(x1) = f(x2) implies x1 = x2, that is, for every element y of f(X) there is a unique element x of X such that y = f(x). In other words, a one-to-one function maps different elements of X into different elements of Y. When a function f : X → Y is both onto and one-to-one it is called a bijection.
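For functions between small finite sets these definitions can be checked directly. A sketch using Python dicts as the rule of assignment (the encoding and the example functions are our own):

```python
def is_surjective(f, Y):
    # The range of f equals the codomain Y.
    return set(f.values()) == set(Y)

def is_injective(f):
    # No two inputs share an image.
    vals = list(f.values())
    return len(vals) == len(set(vals))

X = [1, 2, 3]
Y = ['a', 'b', 'c']

f = {1: 'a', 2: 'b', 3: 'c'}  # a bijection from X to Y
g = {1: 'a', 2: 'a', 3: 'b'}  # neither injective nor onto Y

print(is_surjective(f, Y), is_injective(f))  # True True
print(is_surjective(g, Y), is_injective(g))  # False False
```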

Exercise 18. Suppose that a set X has m elements and a set Y has n ≥ m elements. How many different functions are there from X to Y? From Y to X? How many of them are surjective? How many injective? How many bijective?

2Mathematician Robert Bartle put it as follows: “Only a fool would confuse a sausage-grinder with a sausage; however, enough people have confused functions with their values...”


Exercise 19. Find a function f : N→ N which is

(i) surjective but not injective,
(ii) injective but not surjective,
(iii) neither surjective nor injective,
(iv) bijective.

If a function f is a bijection then it is possible to define a function g : Y → X such that g(y) = x where y = f(x). Thus, to each element y of Y is assigned the element x in X whose image under f is y. Since f is onto, g is defined for every y of Y, and since f is one-to-one, g(y) is unique. The function g is called the inverse of f and is usually written as f⁻¹. In that case, however, it’s not immediately clear what f⁻¹(x) means. Is it the inverse image of x under f or the image of x under f⁻¹? Happily enough they are the same if f⁻¹ exists!

Exercise 20. Prove that when a function f⁻¹ exists it is both onto and one-to-one and that the inverse of f⁻¹ is the function f itself.

If f : X → Y and g : Y → Z, then the function h : X → Z, defined as h(x) = g(f(x)), is called the composition of g with f and denoted by g ◦ f. Note that even if f ◦ g is well-defined it is usually different from g ◦ f.

Exercise 21. Let f : X → Y. Prove that there exist a surjection g : X → A, where A ⊆ X, and an injection h : A → Y such that f = h ◦ g. In other words, prove that any function can be written as a composition of a surjection and an injection.

The set G ⊂ X × Y of ordered pairs (x, f(x)) is called the graph of the function f.3 Of course, the fact that something is called a graph does not necessarily mean that it can be drawn!

5. Spaces

Sets are reasonably interesting mathematical objects to study. But to make them even more interesting (and useful for applications) sets are usually endowed with some additional properties, or structures. These new objects are called spaces. The structures are often modeled after the familiar properties of the space we live in and reflect (in axiomatic form) such notions as order, distance, addition, multiplication, etc.

Probably one of the most intuitive spaces is the space of the real numbers, R. We will briefly look at the axiomatic way of describing some of its properties.

Given the set of real numbers R, the operation of addition is the function + : R × R → R that maps any two elements x and y in R to an element denoted by x + y and called the sum of x and y. Addition satisfies the following axioms for all real numbers x, y, and z.

A1: x + y = y + x.
A2: (x + y) + z = x + (y + z).
A3: There exists an element, denoted by 0, such that x + 0 = x.
A4: For each x there exists an element, denoted by −x, such that x + (−x) = 0.

All the remaining properties of addition can be proven using these axioms. Note also that we can define another operation x − y as x + (−y) and call it subtraction.

3Some people like the idea of the graph of a function so much that they define a function to be its graph.


Exercise 22. Prove that the axioms for addition imply the following statements.

(i) The element 0 is unique.
(ii) If x + y = x + z then y = z (a cancellation law).
(iii) −(−x) = x.

The operation of multiplication can be axiomatised in a similar way. Given the set of real numbers, R, the operation of multiplication is the function · : R × R → R that maps any two elements x and y in R to an element denoted by x · y and called the product of x and y. Multiplication satisfies the following axioms for all real numbers x, y, and z.

A5: x · y = y · x.
A6: (x · y) · z = x · (y · z).
A7: There exists an element, denoted by 1, such that x · 1 = x.
A8: For each x ≠ 0 there exists an element, denoted by x⁻¹, such that x · x⁻¹ = 1.

One more axiom (a distributive law) brings these two operations, addition and multiplication4, together.

A9: x(y + z) = xy + xz for all x, y, and z in R.

Another structure possessed by the real numbers has to do with the fact that the real numbers are ordered. The notion of x less than y can be axiomatised as follows. For any two distinct elements x and y either x < y or y < x and, in addition, if x < y and y < z then x < z.

Another example of a space (a very important and useful one) is n-dimensional real space5. Given the natural number n, define Rn to be the set of all possible ordered n-tuples of real numbers, with generic element denoted by x = (x1, . . . , xn). Thus, the space Rn is the n-fold Cartesian product of the set R with itself. The real numbers x1, . . . , xn are called the coordinates of the vector x. Two vectors x and y are equal if and only if x1 = y1, . . . , xn = yn. The operation of addition of two vectors is defined as

x + y = (x1 + y1, . . . , xn + yn).

Exercise 23. Prove that the addition of vectors in Rn satisfies the axioms ofaddition.

The role of multiplication in this space is played by the operation of multiplication by a real number, defined for all x in Rn and all α in R by

αx = (αx1, . . . , αxn).
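These coordinatewise operations are straightforward to write down. A minimal sketch with vectors represented as tuples (our own representation, not from the notes):

```python
def add(x, y):
    # Coordinatewise addition in R^n: x + y = (x1 + y1, ..., xn + yn).
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(alpha, x):
    # Multiplication by a real number: alpha x = (alpha x1, ..., alpha xn).
    return tuple(alpha * xi for xi in x)

x, y = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
print(add(x, y))      # (5.0, 7.0, 9.0)
print(scale(2.0, x))  # (2.0, 4.0, 6.0)

# A distributive law, as in Exercise 24 (exact here because these
# floats are small integers, so no rounding occurs).
assert scale(2.0, add(x, y)) == add(scale(2.0, x), scale(2.0, y))
```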

Exercise 24. Prove that multiplication by a real number satisfies a distributive law.

6. Metric Spaces and Continuous Functions

The notion of a metric is a generalisation of the notion of the distance between two real numbers.

Let X be a set and d : X × X → R a function. The function d is called a metric if it satisfies the following properties for all x, y, and z in X.

(1) d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y,
(2) d(x, y) = d(y, x),
(3) d(x, y) ≤ d(x, z) + d(z, y).

4From now on, to go easy on notation, we will follow the standard convention of not writing the symbol for multiplication, that is, to write xy instead of x · y, etc.

5We haven’t defined what the word dimension means yet, so just treat it as a (fancy) name.


The set X together with the function d is called a metric space, elements of X are usually called points, and the number d(x, y) is called the distance between x and y. The last property of a metric is called the triangle inequality.

Exercise 25. Let X be a non-empty set and d : X × X → R be a function that satisfies the following two properties for all x, y, and z in X.

(i) d(x, y) = 0 if and only if x = y,
(ii) d(x, y) ≤ d(x, z) + d(y, z).

Prove that d is a metric.

Exercise 26. Prove that d(x, y) + d(w, z) ≤ d(x, w) + d(x, z) + d(y, w) + d(y, z) for all x, y, w, and z in X, where d is some metric on X.

An obvious example of a metric space is the set of real numbers, R, together with the ‘usual’ distance, d(x, y) = |x − y|. Another example is the n-dimensional Euclidean space Rn with metric

d(x, y) = √((x1 − y1)² + · · · + (xn − yn)²).

Note that the same set can be endowed with different metrics, thus resulting in different metric spaces! For example, the set of all n-tuples of real numbers can be made into a metric space by use of the (non-Euclidean) metric

dT(x, y) = |x1 − y1| + · · · + |xn − yn|,

which gives a metric space different from Euclidean Rn. This metric is sometimes called the Manhattan (or taxicab) metric. Another curious metric is the so-called French railroad metric, defined by

dF(x, y) = 0 if x = y, and dF(x, y) = d(x, P) + d(y, P) if x ≠ y,

where P is a particular point of Rn (called Paris) and the function d is the Euclidean distance.
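The three metrics just discussed can be put side by side. A Python sketch (the point P, “Paris”, is an arbitrary choice of ours):

```python
import math

def d_euclid(x, y):
    # Euclidean metric on R^n.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_taxicab(x, y):
    # Manhattan (taxicab) metric.
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

P = (0.0, 0.0)  # "Paris"

def d_french(x, y):
    # French railroad metric: to get from x to y, travel via Paris.
    return 0.0 if x == y else d_euclid(x, P) + d_euclid(y, P)

x, y = (1.0, 0.0), (0.0, 1.0)
print(d_euclid(x, y))   # 1.4142... (the square root of 2)
print(d_taxicab(x, y))  # 2.0
print(d_french(x, y))   # 2.0
```

The same pair of points is at different distances under different metrics, which is the sense in which the same underlying set yields different metric spaces.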

Exercise 27. Prove that the French railroad metric dF is a metric.

Exercise 28. Let X be a non-empty set and d : X × X → R be the function defined by

d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.

Prove that d is a metric. (This metric is called the discrete metric.)

Using the notion of a metric it is possible to generalise the idea of a continuous function.

Suppose (X, dX) and (Y, dY) are metric spaces, x0 ∈ X, and f : X → Y is a function. Then f is continuous at x0 if for every ε > 0 there exists a δ > 0 such that

dY (f(x0), f(x)) < ε

for all points x ∈ X for which dX(x0, x) < δ.

The function f is continuous on X if f is continuous at every point of X.

Let us prove that the function f(x) = x is continuous on R using the above definition.

For all x0 ∈ R, we have |f(x0) − f(x)| = |x0 − x| < ε as long as |x0 − x| < δ = ε. That is, given any ε > 0 we are always able to find a δ, namely δ = ε, such that all points which are closer to x0 than δ will have images which are closer to f(x0) than ε.


Exercise 29. Let f : R → R be the function defined by

f(x) = 1/x if x ≠ 0, and f(0) = 0.

Prove that f is continuous at every point of R, with the exception of 0.

7. Open sets, Compact Sets, and the Weierstrass Theorem

Let x be a point in a metric space and r > 0. The open ball B(x, r) of radius r centred at x is the set of all y ∈ X such that d(x, y) < r. Thus, the open ball is the set of all points whose distance from the centre is strictly less than r. The ball is closed if the inequality is weak, d(x, y) ≤ r.

A set S in a metric space is open if for all x ∈ S there exists r ∈ R, r > 0, such that B(x, r) ⊂ S. A set S is closed if its complement

S^C = {x ∈ X | x ∉ S}

is open.

Exercise 30. Prove that an open ball is an open set.

Exercise 31. Prove that the intersection of any finite number of open sets is an open set.

A set S is bounded if there exists a closed ball of finite radius that contains it. Formally, S is bounded if there exists a closed ball B(x, r) such that S ⊂ B(x, r).

Exercise 32. Prove that the set S is bounded if and only if there exists a real number p > 0 such that d(x, x′) ≤ p for all x and x′ in S.

Exercise 33. Prove that the union of two bounded sets is a bounded set.

A collection (possibly infinite) of open sets U1, U2, . . . in a metric space is an open cover of the set S if S is contained in its union.

A set S is compact if every open cover of S has a finite subcover. That is, from any open cover we can select a finite number of sets Ui that still cover S.

Note that the definition does not say that a set is compact if there is a finite open cover! That wouldn’t be a good definition, as you can cover any set with the whole space, which is just one open set.

Let’s see how to use this definition to show that something is not compact. Consider the set (0, 1) ⊂ R. To prove that it is not compact we need to find an open cover of (0, 1) from which we cannot select a finite subcover. The collection of open intervals (1/n, 1) for all integers n ≥ 2 is an open cover of (0, 1), because for any point x ∈ (0, 1) it is always possible to find an integer n such that n > 1/x, and thus x ∈ (1/n, 1). But no finite subcover will do! If (1/N, 1) is the largest interval in a candidate finite subcover, then it is always possible to find a point x ∈ (0, 1) such that N < 1/x, and no interval in the subcover contains such a point.
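The argument that the intervals (1/n, 1) cover (0, 1) is constructive: given x, any integer n > 1/x exhibits a member of the cover containing x. A small numeric sketch (the function name is ours):

```python
import math

def cover_index(x):
    # The smallest integer n strictly greater than 1/x;
    # then x lies in the cover member (1/n, 1).
    n = math.floor(1 / x) + 1
    assert n >= 2 and 1 / n < x < 1
    return n

for x in [0.9, 0.5, 0.01, 0.003]:
    print(x, "lies in (1/%d, 1)" % cover_index(x))
```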

While this definition of compactness is quite useful for showing that a given set is not compact, it is less useful for verifying that a set is indeed compact. A much more convenient characterisation of compact sets in finite-dimensional Euclidean space, Rn, is given by the following theorem.

Theorem 1. Any closed and bounded subset of Rn is compact.

But why are we interested in compactness at all? Because of the following extremely important theorem, the first version of which was proved by Karl Weierstrass around 1860.

Theorem 2. Let S be a compact set in a metric space and f : S → R be a continuous function. Then f attains its maximum and minimum on S.


And why is this theorem important for us? Because many economic problems are concerned with finding a maximal (or a minimal) value of a function on some set. The Weierstrass theorem provides conditions under which such a search is meaningful! This theorem and its implications will be much dwelt upon later in the notes, so we just give here one example. The consumer’s utility maximisation problem is the problem of finding the maximum of a utility function subject to the budget constraint. According to the Weierstrass theorem, this problem has a solution if the utility function is continuous and the budget set is compact.

8. Sequences and Subsequences

Let us consider again some metric space (X, d). An infinite sequence of points in (X, d) is simply a list

x1, x2, x3, . . . ,

where . . . indicates that the list continues “forever.”

We can be a bit more formal about this. We first consider the set of natural numbers (or counting numbers) 1, 2, 3, . . . , which we denote N. We can now define an infinite sequence in the following way.

Definition 6. An infinite sequence of elements of X is a function from N to X.

Notation. If we look at the previous definition we see that we might have a sequence s : N → X which would define s(1), s(2), s(3), . . . or in other words would define s(n) for any natural number n. Typically when we are referring to sequences we use subscripts (or sometimes superscripts) instead of parentheses and write s1, s2, s3, . . . and sn instead of s(1), s(2), s(3), . . . and s(n). Also, rather than saying that s : N → X is a sequence we say that {sn} is a sequence or even that {s_n}_{n=1}^∞ is a sequence.

Let’s now examine a few examples.

Example 4. Suppose that (X, d) is R, the real numbers with the usual metric d(x, y) = |x − y|. Then {n}, {√n}, and {1/n} are sequences.

Example 5. Again, suppose that (X, d) is R, the real numbers with the usual metric d(x, y) = |x − y|. Consider the sequence {xn} where

xn = 1 if n is odd, and xn = 0 if n is even.

We see that {n} and {√n} get arbitrarily large as n gets larger, while in the last example xn “bounces” back and forth between 0 and 1 as n gets larger. However for {1/n} the elements of the sequence get closer and closer to 0 (and indeed arbitrarily close to 0). We say, in this case, that the sequence converges to zero or that the sequence has limit 0. This is a particularly important concept and so we shall give a formal definition.

Definition 7. Let {xn} be a sequence of points in (X, d). We say that the sequence converges to x0 ∈ X if for any ε > 0 there is N ∈ N such that if n > N then d(xn, x0) < ε.

Informally we can describe this by saying that if n is large then the distance from xn to x0 is small.

If the sequence {xn} converges to x0, then we often write xn → x0 as n → ∞, or lim_{n→∞} xn = x0.


Exercise 34. Show that if the sequence {xn} converges to x0 then it does not converge to any other value. Another way of saying this is that if a sequence converges then its limit is unique.

We have now seen a number of examples of sequences. In some the sequence “runs off to infinity;” in others it “bounces around;” while in others it converges to a limit. Could a sequence do anything else? Could a sequence, for example, settle down, each element getting closer and closer to all future elements in the sequence, but not converge to any particular limit? In fact, depending on what the space X is, this is indeed possible.

First let us recall the notion of a rational number. A rational number is a number that can be expressed as the ratio of two integers, that is, r is rational if r = a/b with a and b integers and b ≠ 0. We usually denote the set of all rational numbers Q (since we have already used R for the real numbers). We now consider an example in which the underlying space X is Q. Consider the sequence of rational numbers defined in the following way:

x1 = 1

x_{n+1} = (xn + 2)/(xn + 1).

This kind of definition is called a recursive definition. Rather than writing what xn is as a function of n, we write what x1 is and then what x_{n+1} is as a function of what xn is. We can obviously find any element of the sequence that we need, as long as we sequentially calculate each previous element. In our case we’d have

x1 = 1

x2 = (1 + 2)/(1 + 1) = 3/2 = 1.5

x3 = (3/2 + 2)/(3/2 + 1) = 7/5 = 1.4

x4 = (7/5 + 2)/(7/5 + 1) = 17/12 ≈ 1.416667

x5 = (17/12 + 2)/(17/12 + 1) = 41/29 ≈ 1.413793

x6 = (41/29 + 2)/(41/29 + 1) = 99/70 ≈ 1.414286

...

We see that the sequence goes up and down but that it seems to be “converging.” What is it converging to? Let’s suppose that it’s converging to some value x0. Recall that

x_{n+1} = (xn + 2)/(xn + 1).

We’ll see later that if f is a continuous function then lim_{n→∞} f(xn) = f(lim_{n→∞} xn). In this case that means that

x0 = lim_{n→∞} x_{n+1} = lim_{n→∞} (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).

Thus we have

x0 = (x0 + 2)/(x0 + 1)


and if we solve this we obtain x0 = ±√2. Clearly if xn > 0 then x_{n+1} > 0, so our sequence can’t be converging to −√2, so we must have x0 = √2. But √2 is not in Q. Thus we have a sequence of elements in Q that are getting very close to each other but are not converging to any element of Q. (Of course the sequence is converging to a point in R. In fact one construction of the real number system is in terms of such sequences in Q.)
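The recursion above is easy to run numerically. Here is a minimal sketch (the function name is our own), working in floating point rather than in Q:

```python
import math

def rational_recursion(n_terms):
    """First n_terms of x1 = 1, x_{n+1} = (x_n + 2)/(x_n + 1)."""
    terms = [1.0]
    for _ in range(n_terms - 1):
        x = terms[-1]
        terms.append((x + 2) / (x + 1))
    return terms

terms = rational_recursion(20)
# Every exact term of the recursion is a rational number, yet the
# terms approach the irrational limit sqrt(2).
assert abs(terms[-1] - math.sqrt(2)) < 1e-10
```

That every exact term is a ratio of integers while the limit √2 is irrational is precisely the failure of completeness of Q described above.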

Definition 8. Let {xn} be a sequence of points in (X, d). We say that the sequence is a Cauchy sequence if for any ε > 0 there is N ∈ N such that if n, m > N then d(xn, xm) < ε.

Exercise 35. Show that if {xn} converges then {xn} is a Cauchy sequence.

A metric space (X, d) in which every Cauchy sequence converges to a limit in X is called a complete metric space. The space of real numbers R is a complete metric space, while the space of rationals Q is not.

Exercise 36. Is N, the space of natural or counting numbers with metric d given by d(x, y) = |x − y|, a complete metric space?

In Section 6 we defined the notion of a function being continuous at a point. It is possible to give that definition in terms of sequences.

Definition 9. Suppose (X, dX) and (Y, dY) are metric spaces, x0 ∈ X, and f : X → Y is a function. Then f is continuous at x0 if for every sequence {xn} that converges to x0 in (X, dX) the sequence {f(xn)} converges to f(x0) in (Y, dY).

Exercise 37. Show that the function f(x) = (x + 2)/(x + 1) is continuous at any point x ≠ −1. Show that this means that if xn → x0 as n → ∞ then

lim_{n→∞} (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).

We can also define the concept of a closed set (and hence the concepts of open sets and compact sets) in terms of sequences.

Definition 10. Let (X, d) be a metric space. A set S ⊂ X is closed if for any convergent sequence {xn} such that xn ∈ S for all n we have lim_{n→∞} xn ∈ S. A set is open if its complement is closed.

Given a sequence {xn} we can define a new sequence by taking only some of the elements of the original sequence. In the example we considered earlier, in which xn was 1 if n was odd and 0 if n was even, we could take only the odd n and thus obtain a sequence that did converge. The new sequence is called a subsequence of the old sequence.

Definition 11. Let {xn} be some sequence in (X, d). Let {n_j}_{j=1}^∞ be a sequence of natural numbers such that for each j we have n_j < n_{j+1}, that is, n_1 < n_2 < n_3 < . . . . The sequence {x_{n_j}}_{j=1}^∞ is called a subsequence of the original sequence.

The notion of a subsequence is often useful. We often use it in the way that we briefly referred to above. We initially have a sequence that may not converge, but we are able to take a subsequence that does converge. Such a subsequence is called a convergent subsequence.

Definition 12. A subset of a metric space with the property that every sequence in the subset has a convergent subsequence is called sequentially compact.

Theorem 3. In any metric space any compact set is sequentially compact.


If we restrict attention to finite dimensional Euclidean spaces the situation is even better behaved.

Theorem 4. Any subset of Rn is sequentially compact if and only if it is compact.

Exercise 38. Verify the following limits.

(i) lim_{n→∞} n/(n + 1) = 1

(ii) lim_{n→∞} (n + 3)/(n² + 1) = 0

(iii) lim_{n→∞} (√(n + 1) − √n) = 0

(iv) lim_{n→∞} (a^n + b^n)^{1/n} = max{a, b}
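These limits can be checked numerically, at least informally, by evaluating each sequence at a large n. A rough sketch (function names are our own; in (iv) we factor out max{a, b} so nothing overflows):

```python
import math

def seq_i(n):   return n / (n + 1)
def seq_ii(n):  return (n + 3) / (n**2 + 1)
def seq_iii(n): return math.sqrt(n + 1) - math.sqrt(n)

def seq_iv(n, a, b):
    # n-th root of a^n + b^n; pulling out m = max(a, b) keeps the
    # powers bounded by 1, so the computation never overflows.
    m = max(a, b)
    return m * ((a / m) ** n + (b / m) ** n) ** (1.0 / n)

N = 10**6
assert abs(seq_i(N) - 1.0) < 1e-5
assert abs(seq_ii(N) - 0.0) < 1e-5
assert abs(seq_iii(N) - 0.0) < 1e-3
assert abs(seq_iv(N, 2.0, 3.0) - 3.0) < 1e-5
```

Of course this is evidence, not a proof; the exercise asks for the ε–N arguments.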

Exercise 39. Consider a sequence {xn} in R. What can you say about the sequence if it converges and, for each n, xn is an integer?

Exercise 40. Consider the sequence

1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, . . . .

For which values z ∈ R is there a subsequence converging to z?

Exercise 41. Prove that if a subsequence of a Cauchy sequence converges to a limit z then so does the original Cauchy sequence.

Exercise 42. Prove that any subsequence of a convergent sequence converges.

Finally, one somewhat less trivial exercise.

Exercise 43. Prove that if lim_{n→∞} xn = z then

lim_{n→∞} (x1 + · · · + xn)/n = z.
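A hedged numerical illustration of this claim (not a proof): the running averages of a convergent sequence converge to the same limit, though typically more slowly. The function name is our own.

```python
def cesaro_means(seq):
    """Running averages (x1 + ... + xn)/n of a finite list."""
    means, total = [], 0.0
    for n, x in enumerate(seq, start=1):
        total += x
        means.append(total / n)
    return means

# x_n = 1/n converges to 0; so do its running averages, though more slowly.
xs = [1.0 / n for n in range(1, 100_001)]
assert abs(cesaro_means(xs)[-1]) < 1e-3
```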

9. Linear Spaces

The notion of a linear space is the axiomatic way of looking at the familiar linear operations: addition and multiplication by numbers. A trivial example of a linear space is the set of real numbers, R.

What is the operation of addition? One way of answering the question is to say that the operation of addition is just the list of its properties. So, we will define the addition of elements from some set X as the operation that satisfies the following four axioms.

A1: x + y = y + x for all x and y in X.
A2: x + (y + z) = (x + y) + z for all x, y, and z in X.
A3: There exists an element, denoted by 0, such that x + 0 = x for all x in X.
A4: For every x in X there exists an element y in X, called the inverse of x, such that x + y = 0.

And, to make things more interesting, we will also introduce the operation of ‘multiplication by a number’ by adding two more axioms.

A5: 1x = x for all x in X.
A6: α(βx) = (αβ)x for all x in X and for all α and β in R.

Finally, two more axioms relating addition and multiplication.

A7: α(x + y) = αx + αy for all x and y in X and for all α in R.
A8: (α + β)x = αx + βx for all x in X and for all α and β in R.


Elements x, y, . . . , w are linearly dependent if there exist real numbers α, β, . . . , λ, not all of them equal to zero, such that

αx + βy + · · · + λw = 0.

Otherwise, the elements x, y, . . . , w are linearly independent.

If in a space L it is possible to find n linearly independent elements, but any n + 1 elements are linearly dependent, then we say that the space L has dimension n.

A nonempty subset L′ of a linear space L is called a linear subspace if L′ forms a linear space in itself. In other words, L′ is a linear subspace of L if for any x and y in L′ and all α and β in R

αx + βy ∈ L′.


CHAPTER 2

Linear Algebra

1. The Space Rn

In the previous chapter we introduced the concept of a linear space or a vector space. We shall now examine in some detail one example of such a space. This is the space of all ordered n-tuples (x1, x2, . . . , xn) where each xi is a real number. We call this space n-dimensional real space and denote it Rn.

Remember from the previous chapter that to define a vector space we not only need to define the points in that space but also to define how we add such points and how we multiply such points by scalars. In the case of Rn we do this element by element in the n-tuple or vector. That is,

(x1, x2, . . . , xn) + (y1, y2, . . . , yn) = (x1 + y1, x2 + y2, . . . , xn + yn)

and

α(x1, x2, . . . , xn) = (αx1, αx2, . . . , αxn).
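In code, these componentwise operations might look like the following sketch (the function names are our own):

```python
def vec_add(x, y):
    """Componentwise addition of two n-tuples in R^n."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def scal_mult(alpha, x):
    """Multiply each component of the n-tuple x by the scalar alpha."""
    return tuple(alpha * xi for xi in x)

assert vec_add((1, 2, 3), (4, 5, 6)) == (5, 7, 9)
assert scal_mult(2, (1, 2, 3)) == (2, 4, 6)
```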

Let us consider the case that n = 2, that is, the case of R2. In this case we can visualise the space as in the following diagram. The vector (x1, x2) is represented by the point that is x1 units along from the point (0, 0) in the horizontal direction and x2 units up from (0, 0) in the vertical direction.

[Figure 1: the vector (1, 2) plotted in R2, with horizontal axis x1 and vertical axis x2.]

Let us for the moment continue our discussion in R2. Notice that we are implicitly writing a vector (x1, x2) as a sum x1 × v1 + x2 × v2 where v1 is the unit vector in the first direction and v2 is the unit vector in the second direction. Suppose that instead we considered the vectors u1 = (2, 1) = 2 × v1 + 1 × v2 and


u2 = (1, 2) = 1 × v1 + 2 × v2. We could have written any vector (x1, x2) instead as z1 × u1 + z2 × u2 where z1 = (2x1 − x2)/3 and z2 = (2x2 − x1)/3. That is, for any vector in R2 we can uniquely write that vector in terms of u1 and u2. Is there anything that is special about u1 and u2 that allows us to make this claim? There must be, since we can easily find other vectors for which this would not have been true. (For example, (1, 2) and (2, 4).)

The property of the pair of vectors u1 and u2 is that they are independent. That is, we cannot write either as a multiple of the other. More generally, in n dimensions we would say that we cannot write any of the vectors as a linear combination of the others, or equivalently as in the following definition.

Definition 13. The vectors x1, . . . , xk, all in Rn, are linearly independent if it is not possible to find scalars α1, . . . , αk, not all zero, such that

α1x1 + · · · + αkxk = 0.

Notice that we do not as a matter of definition require that k = n or even that k ≤ n. We state as a result that if k > n then the collection x1, . . . , xk cannot be linearly independent. (In a real maths course we would, of course, have proved this.)
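One practical way to test linear independence is to stack the vectors into a matrix and compute its rank, as in this sketch (assuming NumPy is available; the function name is our own):

```python
import numpy as np

def independent(vectors):
    """Vectors are linearly independent iff the matrix with these
    vectors as rows has rank equal to the number of vectors."""
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

assert independent([(2, 1), (1, 2)])              # u1, u2 from the text
assert not independent([(1, 2), (2, 4)])          # one is a multiple of the other
assert not independent([(1, 0), (0, 1), (1, 1)])  # k > n in R^2: never independent
```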

Comment 1. If you examine the definition above you will notice that there is nowhere that we actually need to assume that our vectors are in Rn. We can in fact apply the same definition of linear independence to any vector space. This allows us to define the concept of the dimension of an arbitrary vector space as the maximal number of linearly independent vectors in that space. In the case of Rn we obtain that the dimension is in fact n.

Exercise 44. Suppose that x1, . . . , xk, all in Rn, are linearly independent and that the vector y in Rn is equal to β1x1 + · · · + βkxk. Show that this is the only way that y can be expressed as a linear combination of the xi’s. (That is, show that if y = γ1x1 + · · · + γkxk then β1 = γ1, . . . , βk = γk.)

The set of all vectors that can be written as a linear combination of the vectors x1, . . . , xk is called the span of those vectors. If x1, . . . , xk are linearly independent and if the span of x1, . . . , xk is all of Rn then the collection {x1, . . . , xk} is called a basis for Rn. (Of course, in this case we must have k = n.) Any vector in Rn can be uniquely represented as a linear combination of the vectors x1, . . . , xk. We shall later see that it can sometimes be useful to choose a particular basis in which to represent the vectors with which we deal.

It may be that we have a collection of vectors {x1, . . . , xk} whose span is not all of Rn. In this case we call the span of {x1, . . . , xk} a linear subspace of Rn. Alternatively, we say that X ⊂ Rn is a linear subspace of Rn if X is closed under vector addition and scalar multiplication. That is, if for all x, y ∈ X the vector x + y is also in X, and for all x ∈ X and α ∈ R the vector αx is in X. If the span of x1, . . . , xk is X and if x1, . . . , xk are linearly independent then we say that these vectors are a basis for the linear subspace X. In this case the dimension of the linear subspace X is k. In general the dimension of the span of x1, . . . , xk is equal to the maximum number of linearly independent vectors in x1, . . . , xk.

Finally, we comment that Rn is a metric space with metric d : R2n → R+ defined by

d((x1, . . . , xn), (y1, . . . , yn)) = √((x1 − y1)² + · · · + (xn − yn)²).

There are many other metrics we could define on this space but this is the standardone.


2. Linear Functions from Rn to Rm

In the previous section we introduced the space Rn. Here we shall discuss functions from one such space to another (possibly of different dimension). The concept of continuity that we introduced for metric spaces is immediately applicable here. We shall be mainly concerned here with an even narrower class of functions, namely, the linear functions.

Definition 14. A function f : Rn → Rm is said to be a linear function if it satisfies the following two properties.

(1) f(x + y) = f(x) + f(y) for all x, y ∈ Rn, and
(2) f(αx) = αf(x) for all x ∈ Rn and α ∈ R.

Comment 2. When considering functions of a single real variable, that is, functions from R to R, functions of the form f(x) = ax + b, where a and b are fixed constants, are sometimes called linear functions. It is easy to see that if b ≠ 0 then such functions do not satisfy the conditions given above. We shall call such functions affine functions. More generally, we shall call a function g : Rn → Rm an affine function if it is the sum of a linear function f : Rn → Rm and a constant b ∈ Rm. That is, if for any x ∈ Rn, g(x) = f(x) + b.

Let us now suppose that we have two linear functions f : Rn → Rm and g : Rn → Rm. It is straightforward to show that the function (f + g) : Rn → Rm defined by (f + g)(x) = f(x) + g(x) is also a linear function. Similarly, if we have a linear function f : Rn → Rm and a constant α ∈ R, the function (αf) : Rn → Rm defined by (αf)(x) = αf(x) is a linear function. If f : Rn → Rm and g : Rm → Rk are linear functions then the composite function g ◦ f : Rn → Rk defined by g ◦ f(x) = g(f(x)) is again a linear function. Finally, if f : Rn → Rn is not only linear, but also one-to-one and onto, so that it has an inverse f−1 : Rn → Rn, then the inverse function is also a linear function.

Exercise 45. Prove the facts stated in the previous paragraph.

Recall that in the previous section we defined the notion of a linear subspace. A linear function f : Rn → Rm defines two important subspaces, the image of f, denoted Im(f) ⊂ Rm, and the kernel of f, denoted Ker(f) ⊂ Rn. The image of f is the set of all vectors in Rm such that f maps some vector in Rn to that vector, that is,

Im(f) = { y ∈ Rm | ∃x ∈ Rn such that y = f(x) }.

The kernel of f is the set of all vectors in Rn that are mapped by the function f to the zero vector in Rm, that is,

Ker(f) = { x ∈ Rn | f(x) = 0 }.

The kernel of f is sometimes called the null space of f.

It is intuitively clear that the dimension of Im(f) is no more than n. (It is of course no more than m since it is contained in Rm.) Of course, in general it may be less than n, for example if m < n or if f maps all points in Rn to the zero vector in Rm. (You should satisfy yourself that this function is indeed a linear function.) However, if the dimension of Im(f) is indeed less than n it means that the function has mapped the n-dimensional space Rn into a linear space of lower dimension and that in the process some dimensions have been lost. The linearity of f means that a linear subspace of dimension equal to the number of dimensions that have been lost must have been collapsed to the zero vector (and that translates of this linear subspace have been collapsed to single points). Thus we can say that

dim(Im(f)) + dim(Ker(f)) = n.
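This identity can be checked numerically for a concrete matrix representing a linear map (a sketch assuming NumPy; the example matrix is our own): dim Im(f) is the rank, and dim Ker(f) is counted from the near-zero singular values.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # represents a linear f : R^3 -> R^2
n = A.shape[1]

dim_im = np.linalg.matrix_rank(A)       # dim(Im(f)) = rank of A
s = np.linalg.svd(A, compute_uv=False)  # singular values of A
dim_ker = n - int(np.sum(s > 1e-10))    # dimensions "lost" by f

assert dim_im == 1              # rows are proportional: the image is a line
assert dim_im + dim_ker == n    # the identity dim(Im(f)) + dim(Ker(f)) = n
```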


In the following section we shall introduce the notion of a matrix and define various operations on matrices. If you are like me when I first came across matrices, these definitions may seem somewhat arbitrary and mysterious. However, we shall see that matrices may be viewed as representations of linear functions and that when viewed in this way the operations we define on matrices are completely natural.

3. Matrices and Matrix Algebra

A matrix is defined as a rectangular array of numbers. If the matrix contains m rows and n columns it is called an m × n matrix (read “m by n” matrix). The element in the ith row and the jth column is called the ijth element. We typically enclose a matrix in square brackets [ ] and write it as

    [ a11 ... a1n ]
    [  ⋮   ⋱   ⋮  ]
    [ am1 ... amn ].

In the case that m = n we call the matrix a square matrix. If m = 1 the matrix contains a single row and we call it a row vector. If n = 1 the matrix contains a single column and we call it a column vector. For most purposes we do not distinguish between a 1 × 1 matrix [a] and the scalar a.

Just as we defined the operation of vector addition and the multiplication of a vector by a scalar, we define similar operations for matrices. In order to be able to add two matrices we require that the matrices be of the same dimension. That is, if matrix A is of dimension m × n we shall be able to add the matrix B to it if and only if B is also of dimension m × n. If this condition is met then we add matrices simply by adding the corresponding elements of each matrix to obtain the new m × n matrix A + B. That is,

    [ a11 ... a1n ]   [ b11 ... b1n ]   [ a11 + b11 ... a1n + b1n ]
    [  ⋮   ⋱   ⋮  ] + [  ⋮   ⋱   ⋮  ] = [     ⋮      ⋱      ⋮     ]
    [ am1 ... amn ]   [ bm1 ... bmn ]   [ am1 + bm1 ... amn + bmn ].

We can see that this definition of matrix addition satisfies many of the same properties as the addition of scalars. If A, B, and C are all m × n matrices then

(1) A + B = B + A,
(2) (A + B) + C = A + (B + C),
(3) there is a zero matrix 0 such that for any m × n matrix A we have A + 0 = 0 + A = A, and
(4) there is a matrix −A such that A + (−A) = (−A) + A = 0.

Of course, the zero matrix referred to in (3) is simply the m × n matrix consisting of all zeros (this is called a null matrix) and the matrix −A referred to in (4) is the matrix obtained from A by replacing each element of A by its negative, that is,

      [ a11 ... a1n ]   [ −a11 ... −a1n ]
    − [  ⋮   ⋱   ⋮  ] = [   ⋮    ⋱    ⋮  ]
      [ am1 ... amn ]   [ −am1 ... −amn ].

Now, given a scalar α in R and an m × n matrix A, we define the product of α and A, which we write αA, to be the matrix in which each element of A is replaced by α times that element, that is,

      [ a11 ... a1n ]   [ αa11 ... αa1n ]
    α [  ⋮   ⋱   ⋮  ] = [   ⋮    ⋱    ⋮  ]
      [ am1 ... amn ]   [ αam1 ... αamn ].


So far the definitions of matrix operations have all seemed the most natural ones. We now come to defining matrix multiplication. Perhaps here the definition seems somewhat less natural. However, in the next section we shall see that the definition we shall give is in fact very natural when we view matrices as representations of linear functions.

We define matrix multiplication of A times B, written AB, where A is an m × n matrix and B is a p × q matrix, only when n = p. In this case the product AB is defined to be an m × q matrix in which the element in the ith row and jth column is ∑_{k=1}^n aik bkj. That is, to find the term to go in the ith row and the jth column of the product matrix AB we take the ith row of the matrix A, which will be a row vector with n elements, and the jth column of the matrix B, which will be a column vector with n elements. We then multiply each element of the first vector by the corresponding element of the second and add all these products. Thus

    [ a11 ... a1n ]   [ b11 ... b1q ]   [ ∑_{k=1}^n a1k bk1 ... ∑_{k=1}^n a1k bkq ]
    [  ⋮   ⋱   ⋮  ] × [  ⋮   ⋱   ⋮  ] = [          ⋮          ⋱          ⋮         ]
    [ am1 ... amn ]   [ bn1 ... bnq ]   [ ∑_{k=1}^n amk bk1 ... ∑_{k=1}^n amk bkq ].

For example,

    [ a b c ]   [ p q ]   [ ap + br + ct  aq + bs + cv ]
    [ d e f ] × [ r s ] = [ dp + er + ft  dq + es + fv ].
                [ t v ]
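The formula for the ijth element translates directly into code. A small sketch (the function name and the example numbers are our own):

```python
def mat_mult(A, B):
    """Multiply an m x n matrix A (list of rows) by an n x q matrix B
    using (AB)_ij = sum_k a_ik * b_kj, as in the text."""
    m, n, q = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

# A 2x3 matrix times a 3x2 matrix gives a 2x2 matrix:
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
assert mat_mult(A, B) == [[58, 64], [139, 154]]
```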

We define the identity matrix of order n to be the n × n matrix that has 1’s on its main diagonal and zeros elsewhere, that is, whose ijth element is 1 if i = j and zero if i ≠ j. We denote this matrix by In or, if the order is clear from the context, simply I. That is,

    I = [ 1 0 ... 0 ]
        [ 0 1 ... 0 ]
        [ ⋮ ⋮  ⋱  ⋮ ]
        [ 0 0 ... 1 ].

It is easy to see that if A is an m × n matrix then AIn = A and ImA = A. In fact, we could equally well define the identity matrix to be the matrix that satisfies these properties for all such matrices A, in which case it would be easy to show that there is a unique matrix satisfying this property, namely, the matrix we defined above.

Consider an m × n matrix A. The columns of A are m-dimensional vectors, that is, elements of Rm, and the rows of A are elements of Rn. Thus we can ask if the n columns are linearly independent and similarly if the m rows are linearly independent. In fact we ask: what is the maximum number of linearly independent columns of A? It turns out that this is the same as the maximum number of linearly independent rows of A. We call this number the rank of the matrix A.


4. Matrices as Representations of Linear Functions

Let us suppose that we have a particular linear function f : Rn → Rm. We have suggested in the previous section that such a function can necessarily be represented as multiplication by some matrix. We shall now show that this is true. Moreover, we shall do so by explicitly constructing the appropriate matrix.

Let us write the n-dimensional vector x as a column vector

    x = [ x1 ]
        [ x2 ]
        [ ⋮  ]
        [ xn ].

Now, notice that we can write the vector x as a sum ∑_{i=1}^n xi ei, where ei is the ith unit vector, that is, the vector with 1 in the ith place and zeros elsewhere. That is,

    [ x1 ]      [ 1 ]      [ 0 ]              [ 0 ]
    [ x2 ] = x1 [ 0 ] + x2 [ 1 ] + · · · + xn [ 0 ]
    [ ⋮  ]      [ ⋮ ]      [ ⋮ ]              [ ⋮ ]
    [ xn ]      [ 0 ]      [ 0 ]              [ 1 ].

Now from the linearity of the function f we can write

    f(x) = f(∑_{i=1}^n xi ei) = ∑_{i=1}^n f(xi ei) = ∑_{i=1}^n xi f(ei).

But, what is f(ei)? Remember that ei is a unit vector in Rn and that f maps vectors in Rn to vectors in Rm. Thus f(ei) is the image in Rm of the vector ei. Let us write f(ei) as

    [ a1i ]
    [ a2i ]
    [  ⋮  ]
    [ ami ].

Thus

    f(x) = ∑_{i=1}^n xi f(ei)

                [ a11 ]      [ a12 ]              [ a1n ]   [ ∑_{i=1}^n a1i xi ]
           = x1 [ a21 ] + x2 [ a22 ] + · · · + xn [ a2n ] = [ ∑_{i=1}^n a2i xi ]
                [  ⋮  ]      [  ⋮  ]              [  ⋮  ]   [         ⋮        ]
                [ am1 ]      [ am2 ]              [ amn ]   [ ∑_{i=1}^n ami xi ]


and this is exactly what we would have obtained had we multiplied the matrices

    [ a11 a12 ... a1n ] [ x1 ]
    [ a21 a22 ... a2n ] [ x2 ]
    [  ⋮   ⋮   ⋱   ⋮  ] [ ⋮  ]
    [ am1 am2 ... amn ] [ xn ].

Thus we have not only shown that a linear function is necessarily represented by multiplication by a matrix, we have also shown how to find the appropriate matrix. It is precisely the matrix whose n columns are the images under the function of the n unit vectors in Rn.
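This construction can be written down directly: apply f to each unit vector and use the results as columns. A sketch (the function name and the example map are our own):

```python
def matrix_of(f, n):
    """Matrix (as a list of rows) representing a linear f : R^n -> R^m;
    its columns are the images f(e_i) of the n unit vectors."""
    cols = [f(tuple(1.0 if j == i else 0.0 for j in range(n)))
            for i in range(n)]
    m = len(cols[0])
    return [[cols[i][row] for i in range(n)] for row in range(m)]

# Example: f(x1, x2) = (x1 + 2*x2, 3*x1), a linear map from R^2 to R^2.
f = lambda x: (x[0] + 2 * x[1], 3 * x[0])
assert matrix_of(f, 2) == [[1.0, 2.0], [3.0, 0.0]]
```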

Exercise 46. Find the matrices that represent the following linear functions from R2 to R2.

(1) a clockwise rotation of π/2 (90◦),
(2) a reflection in the x1 axis,
(3) a reflection in the line x2 = x1 (that is, the 45◦ line),
(4) a counter clockwise rotation of π/4 (45◦), and
(5) a reflection in the line x2 = x1 followed by a counter clockwise rotation of π/4.

Recall that in Section 2 we defined, for any f, g : Rn → Rm and α ∈ R, the functions (f + g) and (αf). In Section 3 we defined the sum of two m × n matrices A and B, and the product of a scalar α with the matrix A. Let us instead define the sum of A and B as follows.

Let f : Rn → Rm be the linear function represented by the matrix A and g : Rn → Rm be the linear function represented by the matrix B. Now define the matrix (A + B) to be the matrix that represents the linear function (f + g). Similarly, let the matrix αA be the matrix that represents the linear function (αf).

Exercise 47. Prove that the matrices (A+B) and αA defined in the previousparagraph coincide with the matrices defined in Section 3.

We can also see that the definition we gave of matrix multiplication is precisely the right definition if we take multiplication of matrices to mean the composition of the linear functions that the matrices represent. To be more precise, let f : Rn → Rm and g : Rm → Rk be linear functions and let A and B be the m × n and k × m matrices that represent them. Let (g ◦ f) : Rn → Rk be the composite function defined in Section 2. Now let us define the product BA to be the matrix that represents the linear function (g ◦ f).


Now since the matrix A represents the function f and B represents g we have

    (g ◦ f)(x) = g(f(x))

         ( [ a11 a12 ... a1n ] [ x1 ] )       [ ∑_{i=1}^n a1i xi ]
     = g ( [ a21 a22 ... a2n ] [ x2 ] )  = g  [ ∑_{i=1}^n a2i xi ]
         ( [  ⋮   ⋮   ⋱   ⋮  ] [ ⋮  ] )       [         ⋮        ]
         ( [ am1 am2 ... amn ] [ xn ] )       [ ∑_{i=1}^n ami xi ]

       [ b11 b12 ... b1m ] [ ∑_{i=1}^n a1i xi ]
     = [ b21 b22 ... b2m ] [ ∑_{i=1}^n a2i xi ]
       [  ⋮   ⋮   ⋱   ⋮  ] [         ⋮        ]
       [ bk1 bk2 ... bkm ] [ ∑_{i=1}^n ami xi ]

       [ ∑_{j=1}^m b1j (∑_{i=1}^n aji xi) ]   [ ∑_{i=1}^n (∑_{j=1}^m b1j aji) xi ]
     = [ ∑_{j=1}^m b2j (∑_{i=1}^n aji xi) ] = [ ∑_{i=1}^n (∑_{j=1}^m b2j aji) xi ]
       [                ⋮                 ]   [                ⋮                 ]
       [ ∑_{j=1}^m bkj (∑_{i=1}^n aji xi) ]   [ ∑_{i=1}^n (∑_{j=1}^m bkj aji) xi ]

       [ ∑_{j=1}^m b1j aj1  ∑_{j=1}^m b1j aj2  ...  ∑_{j=1}^m b1j ajn ] [ x1 ]
     = [ ∑_{j=1}^m b2j aj1  ∑_{j=1}^m b2j aj2  ...  ∑_{j=1}^m b2j ajn ] [ x2 ]
       [          ⋮                  ⋮          ⋱           ⋮         ] [ ⋮  ]
       [ ∑_{j=1}^m bkj aj1  ∑_{j=1}^m bkj aj2  ...  ∑_{j=1}^m bkj ajn ] [ xn ].

And this last is the product of the matrix we defined in Section 3 to be BA with the column vector x. As we have claimed, the definition of matrix multiplication we gave in Section 3 was not arbitrary but rather was forced on us by our decision to regard the multiplication of two matrices as corresponding to the composition of the linear functions the matrices represent.
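A quick numerical check of this claim for small concrete matrices (a sketch assuming NumPy; the matrices are our own example): composing the two maps step by step gives the same result as multiplying by the product BA.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])       # represents f : R^2 -> R^3
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])  # represents g : R^3 -> R^2
x = np.array([1.0, -1.0])

# g(f(x)) computed in two steps equals multiplication by the single matrix BA.
step_by_step = B @ (A @ x)
product = (B @ A) @ x
assert np.allclose(step_by_step, product)
```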

Recall that the columns of the matrix A that represented the linear function f : Rn → Rm were precisely the images of the unit vectors in Rn under f. The linearity of f means that the image of any point in Rn is in the span of the images of these unit vectors, and similarly that any point in the span of the images is the image of some point in Rn. Thus Im(f) is equal to the span of the columns of A. Now, the dimension of the span of the columns of A is equal to the maximum number of linearly independent columns in A, that is, to the rank of A.

5. Linear Functions from Rn to Rn and Square Matrices

In the remainder of this chapter we look more closely at an important subclass of linear functions and the matrices that represent them, viz the functions that map Rn to itself. From what we have already said we see immediately that the matrix representing such a linear function will have the same number of rows as it has columns. We call such a matrix a square matrix.


If the linear function f : Rn → Rn is one-to-one and onto then the function f has an inverse f−1. In Exercise 45 you showed that this function too is linear. A matrix that represents a linear function that is one-to-one and onto is called a nonsingular matrix. Alternatively, we can say that an n × n matrix is nonsingular if the rank of the matrix is n. To see that these two statements are equivalent, note first that if f is one-to-one then Ker(f) = {0}. (This is the trivial direction of Exercise 48.) But this means that dim(Ker(f)) = 0 and so dim(Im(f)) = n. And, as we argued at the end of the previous section, this is the same as the rank of the matrix that represents f.

Exercise 48. Show that the linear function f : Rn → Rm is one-to-one if and only if Ker(f) = {0}.

Exercise 49. Show that the linear function f : Rn → Rn is one-to-one if and only if it is onto.

6. Inverse Functions and Inverse Matrices

In the previous section we discussed briefly the idea of the inverse of a linear function f : Rn → Rn. This allows us a very easy definition of the inverse of a square matrix A. The inverse of A is the matrix that represents the linear function that is the inverse of the linear function that A represents. We write the inverse of the matrix A as A−1. Thus a matrix will have an inverse if and only if the linear function that the matrix represents has an inverse, that is, if and only if the linear function is one-to-one and onto. We saw in the previous section that this will occur if and only if the kernel of the function is {0}, which in turn occurs if and only if the image of f is of full dimension, that is, is all of Rn. This is the same as the matrix being of full rank, that is, of rank n.

As with the ideas we have discussed earlier, we can express the idea of a matrix inverse purely in terms of matrices, without reference to the linear functions that they represent. Given an n × n matrix A we define the inverse of A to be a matrix B such that BA = In, where In is the n × n identity matrix discussed in Section 3. Such a matrix B will exist if and only if the matrix A is nonsingular. Moreover, if such a matrix B exists then it is also true that AB = In, that is, (A−1)−1 = A.

In Section 9 we shall see one method for calculating inverses of general n × n matrices. Here we shall simply describe how to calculate the inverse of a 2 × 2 matrix. Suppose that we have the matrix

    A = [ a b ]
        [ c d ].

The inverse of this matrix is

    A−1 = (1/(ad − bc)) [  d  −b ]
                        [ −c   a ].
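The 2 × 2 inverse formula is short enough to code directly. A sketch (the function name is our own), which fails exactly when ad − bc = 0:

```python
def inv2(a, b, c, d):
    """Inverse of the 2 x 2 matrix [[a, b], [c, d]], assuming ad - bc != 0."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det],
            [-c / det, a / det]]

# The inverse of [[2, 1], [1, 2]] is (1/3) * [[2, -1], [-1, 2]].
B = inv2(2.0, 1.0, 1.0, 2.0)
assert B == [[2/3, -1/3], [-1/3, 2/3]]
```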

Exercise 50. Show that the matrix A is of full rank if and only if ad − bc ≠ 0.

Exercise 51. Check that the matrix given is, in fact, the inverse of A.

7. Changes of Basis

We have until now implicitly assumed that there is no ambiguity when we speak of the vector (x1, x2, . . . , xn). Sometimes there may indeed be an obvious meaning to such a vector. However, when we define a linear space all that is really specified is “what straight lines are” and “where zero is.” In particular, we do not necessarily have defined in an unambiguous way “where the axes are” or “what


a unit length along each axis is.” In other words we may not have a set of basisvectors specified.

Even when we do have, or have decided on, a set of basis vectors, we may wish to redefine our description of the linear space with which we are dealing so as to use a different set of basis vectors. Let us suppose that we have an n-dimensional space, Rn say, with a given set of basis vectors v1, v2, . . . , vn and that we wish instead to describe the space in terms of the linearly independent vectors b1, b2, . . . , bn where

bi = b1iv1 + b2iv2 + · · · + bnivn.

Now, if we had the description of a point in terms of the new coordinate vectors, e.g., as

z1b1 + z2b2 + · · · + znbn,

then we can easily convert this to a description in terms of the original basis vectors. We would simply substitute the formula for bi in terms of the vj’s into the previous formula, giving

    (∑_{i=1}^n b1i zi) v1 + (∑_{i=1}^n b2i zi) v2 + · · · + (∑_{i=1}^n bni zi) vn

or, in our previous notation,

    [ ∑_{i=1}^n b1i zi ]
    [ ∑_{i=1}^n b2i zi ]
    [         ⋮        ]
    [ ∑_{i=1}^n bni zi ].

But this is simply the product

    [ b11 b12 ... b1n ] [ z1 ]
    [ b21 b22 ... b2n ] [ z2 ]
    [  ⋮   ⋮   ⋱   ⋮  ] [ ⋮  ]
    [ bn1 bn2 ... bnn ] [ zn ].

That is, if we are given an n-tuple of real numbers that describes a vector in terms of the new basis vectors b1, b2, . . . , bn and we wish to find the n-tuple that describes the vector in terms of the original basis vectors, we simply multiply the n-tuple we are given, written as a column vector, by the matrix whose columns are the new basis vectors b1, b2, . . . , bn. We shall call this matrix B. We see, among other things, that changing the basis is a linear operation.

Now, if we were given the information in terms of the original basis vectors and wanted to write it in terms of the new basis vectors what should we do? Since we don’t have the original basis vectors written in terms of the new basis vectors this is not immediately obvious. However we do know that if we were to do it and then were to carry out the operation described in the previous paragraph we would be back with what we started. Further, we know that the operation is a linear operation that maps n-tuples to n-tuples and so is represented by multiplication by an n×n matrix. That is, we multiply the n-tuple, written as a column vector, by the matrix that when multiplied by B gives the identity matrix, that is, the matrix B−1. If we are given a vector of the form
\[
x_1 v_1 + x_2 v_2 + \dots + x_n v_n
\]


and we wish to express it in terms of the vectors b1, b2, . . . , bn we calculate
\[
\begin{pmatrix}
b_{11} & b_{12} & \dots & b_{1n} \\
b_{21} & b_{22} & \dots & b_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
b_{n1} & b_{n2} & \dots & b_{nn}
\end{pmatrix}^{-1}
\begin{pmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{pmatrix}.
\]
Suppose now that we consider a linear function f : Rn → Rn and that we have

originally described Rn in terms of the standard basis vectors e1, e2, . . . , en where ei is the vector with 1 in the ith place and zeros elsewhere. Suppose that with these basis vectors f is represented by the matrix
\[
A =
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{pmatrix}.
\]
If we now describe Rn in terms of the vectors b1, b2, . . . , bn how will the linear function f be represented? Let us think of what we want. We shall be given a vector described in terms of the basis vectors b1, b2, . . . , bn and we shall want to know what the image of this vector under the linear function f is, where we shall again want our answer in terms of the basis vectors b1, b2, . . . , bn. We know how to do this when we are given the description in terms of the vectors e1, e2, . . . , en. Thus the first thing we shall do with our vector is to convert it from a description in terms of b1, b2, . . . , bn to a description in terms of e1, e2, . . . , en. We do this by multiplying the n-tuple by the matrix B. Thus if we call our original n-tuple z we shall now have a description of the vector in terms of e1, e2, . . . , en, viz Bz. Given this description we can find the image of the vector in question under f by multiplying by the matrix A. Thus we shall have A(Bz) = (AB)z. Remember however that this will have given us the image vector in terms of the basis vectors e1, e2, . . . , en. In order to convert this to a description in terms of the vectors b1, b2, . . . , bn we must multiply by the matrix B−1. Thus our final n-tuple will be (B−1AB)z.

Recapitulating, suppose that we know that the linear function f : Rn → Rn is represented by the matrix A when we describe Rn in terms of the standard basis vectors e1, e2, . . . , en and that we have a new set of basis vectors b1, b2, . . . , bn. Then when Rn is described in terms of these new basis vectors the linear function f will be represented by the matrix B−1AB.
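This recapitulation is easy to check numerically. The following sketch uses NumPy with an arbitrarily chosen matrix A and basis matrix B (both are illustrative assumptions, not taken from the text):

```python
import numpy as np

# A represents f in the standard basis; the columns of B are the new basis vectors.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
B = np.array([[1.0, 1.0],
              [1.0, 2.0]])  # b1 = (1, 1), b2 = (1, 2); linearly independent

A_new = np.linalg.inv(B) @ A @ B  # representation of f in the basis b1, b2

# Check on one vector: take new-basis coordinates z and map them two ways.
z = np.array([1.0, -2.0])
image_std = A @ (B @ z)   # convert to standard coordinates, then apply f
image_new = A_new @ z     # apply f directly in new-basis coordinates
assert np.allclose(B @ image_new, image_std)  # same vector either way
```

The assertion confirms that converting to the new basis, applying B−1AB, and converting back agrees with applying A directly.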

Exercise 52. Let f : Rn → Rm be a linear function. Suppose that with the standard bases for Rn and Rm the function f is represented by the matrix A. Let b1, b2, . . . , bn be a new set of basis vectors for Rn and c1, c2, . . . , cm be a new set of basis vectors for Rm. What is the matrix that represents f when the linear spaces are described in terms of the new basis vectors?

Exercise 53. Let f : R2 → R2 be a linear function. Suppose that with the standard basis for R2 the function f is represented by the matrix
\[
\begin{pmatrix}
3 & 1 \\
1 & 2
\end{pmatrix}.
\]
Let
\[
\begin{pmatrix} 3 \\ 2 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 1 \\ 1 \end{pmatrix}
\]
be a new set of basis vectors for R2. What is the matrix that represents f when R2 is described in terms of the new basis vectors?


Properties of a square matrix that depend only on the linear function that the matrix represents, and not on the particular choice of basis vectors for the linear space, are called invariant properties. We have already seen one example of an invariant property, the rank of a matrix. The rank of a matrix is equal to the dimension of the image space of the function that the matrix represents, which clearly depends only on the function and not on the choice of basis vectors for the linear space.

The idea of a property being invariant can also be expressed in terms only of matrices, without reference to the idea of linear functions. A property is invariant if whenever an n × n matrix A has the property then for any nonsingular n × n matrix B the matrix B−1AB also has the property. We might think of rank as a function that associates to any square matrix a nonnegative integer. We shall say that such a function is an invariant if the property of having the function take a particular value is invariant for all particular values we may choose.

Two particularly important invariants are the trace of a square matrix and the determinant of a square matrix. We examine these in more detail in the following section.

8. The Trace and the Determinant

In this section we define two important real valued functions on the space of n × n matrices, the trace and the determinant. Both of these concepts have geometric interpretations. However, while the trace is easy to calculate (much easier than the determinant) its geometric interpretation is rather hard to see. Thus we shall not go into it. On the other hand the determinant, while being somewhat harder to calculate, has a very clear geometric interpretation. In Section 9 we shall examine in some detail how to calculate determinants. In this section we shall be content to discuss one definition and the geometric intuition of the determinant.

Given an n × n matrix A the trace of A, written tr(A), is the sum of the elements on the main diagonal, that is,
\[
\operatorname{tr}
\begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn}
\end{pmatrix}
= \sum_{i=1}^{n} a_{ii}.
\]

Exercise 54. For the matrices given in Exercise 53 confirm that tr(A) = tr(B−1AB).

It is easy to see that the trace is a linear function on the space of all n × n matrices, that is, that for all n × n matrices A and B and for all α ∈ R

(1) tr(A+B) = tr(A) + tr(B),

and

(2) tr(αA) = αtr(A).

We can also see that if A and B are both n × n matrices then tr(AB) = tr(BA). In fact, if A is an m × n matrix and B is an n × m matrix this is still true. This will often be extremely useful in calculating the trace of a product.
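The claim for non-square factors can be illustrated numerically; here is a sketch with randomly generated matrices (the sizes 2 × 3 and 3 × 2 are chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))  # an m x n matrix (m = 2, n = 3)
B = rng.standard_normal((3, 2))  # an n x m matrix

# AB is 2 x 2 while BA is 3 x 3, yet their traces agree.
t1 = np.trace(A @ B)
t2 = np.trace(B @ A)
assert np.isclose(t1, t2)
```

Note that the two products have different sizes, so the equality of traces is not obvious from symmetry alone.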

Exercise 55. From the definition of matrix multiplication show that if A is an m × n matrix and B is an n × m matrix then tr(AB) = tr(BA). [Hint: Look at the definition of matrix multiplication in Section 2. Then write the trace of the product matrix using summation notation. Finally change the order of summation.]


The determinant, unlike the trace, is not a linear function of the matrix. It does however have some linear structure. If we fix all columns of the matrix except one and look at the determinant as a function of only this column then the determinant is linear in this single column. Moreover this is true whichever column we choose. Let us write the determinant of the n × n matrix A as det(A). Let us also write the matrix A as [a1, a2, . . . , an] where ai is the ith column of the matrix A. Thus our claim is that for all n × n matrices A, for all i = 1, 2, . . . , n, for all n-vectors b, and for all α ∈ R
\[
\begin{aligned}
\det([a_1, \dots, a_{i-1}, a_i + b, a_{i+1}, \dots, a_n]) ={}& \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]) \\
&+ \det([a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n])
\end{aligned} \tag{3}
\]
and
\[
\det([a_1, \dots, a_{i-1}, \alpha a_i, a_{i+1}, \dots, a_n]) = \alpha \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]). \tag{4}
\]

We express this by saying that the determinant is a multilinear function. Also the determinant is such that any n × n matrix that is not of full rank, that is, not of rank n, has a zero determinant. In fact, given that the determinant is a multilinear function, if we simply say that any matrix in which one column is the same as one of its neighbours has a zero determinant, this implies the stronger statement that we made. We already see one use of calculating determinants: a matrix is nonsingular if and only if its determinant is nonzero.

The two properties of being multilinear and zero whenever two neighbouring columns are the same already almost uniquely identify the determinant. Notice however that if the determinant satisfies these two properties then so does any constant times the determinant. To uniquely define the determinant we “tie down” this constant by assuming that det(I) = 1.

Though we haven’t proved that it is so, these three properties uniquely define the determinant. That is, there is one and only one function with these three properties. We call this function the determinant. In Section 9 we shall discuss a number of other useful properties of the determinant. Remember that these additional properties are not really additional facts about the determinant. They can all be derived from the three properties we have given here.

Let us now look to the geometric interpretation of the determinant. Let us first think about what linear transformations can do to the space Rn. Since we have already said that a linear transformation that is not onto is represented by a matrix with a zero determinant, let us think about linear transformations that are onto, that is, that do not map Rn into a linear space of lower dimension. Such transformations can rotate the space around zero. They can “stretch” the space in different directions. And they can “flip” the space over. In the latter case all objects will become “mirror images” of themselves. We call linear transformations that make such a mirror image orientation reversing and those that don’t orientation preserving. A matrix that represents an orientation preserving linear function has a positive determinant while a matrix that represents an orientation reversing linear function has a negative determinant. Thus we have a geometric interpretation of the sign of the determinant.

The absolute size of the determinant represents how much bigger or smaller the linear function makes objects. More precisely it gives the “volume” of the image of the unit hypercube under the transformation. The word volume is in quotes because it is the volume with which we are familiar only when n = 3. If n = 2 then it is area, while if n > 3 then it is the full dimensional analog in Rn of volume in R3.
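Both interpretations can be illustrated numerically. In the sketch below the two 2 × 2 matrices are examples chosen here (not from the text): one stretches the plane and preserves orientation, the other is a reflection that reverses it.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # orientation preserving: det > 0
F = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # a "flip" (reflection across the diagonal): det < 0

# |det| is the area of the image of the unit square.
assert np.isclose(np.linalg.det(A), 6.0)   # unit square maps to a parallelogram of area 6
assert np.isclose(np.linalg.det(F), -1.0)  # area preserved, orientation reversed
```

The reflection F swaps the two standard basis vectors, which is exactly the "mirror image" behaviour associated with a negative determinant.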


Exercise 56. Consider the matrix
\[
\begin{pmatrix}
3 & 1 \\
1 & 2
\end{pmatrix}.
\]
In a diagram show the image under the linear function that this matrix represents of the unit square, that is, the square whose corners are the points (0, 0), (1, 0), (0, 1), and (1, 1). Calculate the area of that image. Do the same for the matrix
\[
\begin{pmatrix}
4 & 1 \\
-1 & 1
\end{pmatrix}.
\]
In the light of Exercise 53, comment on the answers you calculated.

9. Calculating and Using Determinants

We have already used the concepts of the inverse of a matrix and the determinant of a matrix. The purpose of this section is to cover some of the “cookbook” aspects of calculating inverses and determinants.

Suppose that we have an n × n matrix
\[
A =
\begin{pmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \dots & a_{nn}
\end{pmatrix}.
\]
Then we shall use |A| or
\[
\begin{vmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \dots & a_{nn}
\end{vmatrix}
\]
as an alternative notation for det(A). Always remember that this expression is not a matrix but rather a real number. For the case n = 2 we define
\[
\det(A) =
\begin{vmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{vmatrix}
= a_{11}a_{22} - a_{21}a_{12}.
\]
It is possible also to give a convenient formula for the determinant of a 3 × 3 matrix. However, rather than doing this, we shall immediately consider the case of an n × n matrix.

By the minor of an element of the matrix A we mean the determinant (remember, a real number) of the matrix obtained from the matrix A by deleting the row and column containing the element in question. We denote the minor of the element aij by the symbol |Mij |. Thus, for example,
\[
|M_{11}| =
\begin{vmatrix}
a_{22} & \dots & a_{2n} \\
\vdots & \ddots & \vdots \\
a_{n2} & \dots & a_{nn}
\end{vmatrix}.
\]
Exercise 57. Write out the minors of a general 3 × 3 matrix.

We now define the cofactor of an element to be either plus or minus the minor of the element, being plus if the sum of the indices of the element is even and minus if it is odd. We denote the cofactor of the element aij by the symbol |Cij |. Thus |Cij | = |Mij | if i + j is even and |Cij | = −|Mij | if i + j is odd. Or,
\[
|C_{ij}| = (-1)^{i+j}|M_{ij}|.
\]


We now define the determinant of an n × n matrix A,
\[
\det(A) = |A| =
\begin{vmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \dots & a_{nn}
\end{vmatrix},
\]
to be
\[
\sum_{j=1}^{n} a_{1j}|C_{1j}|.
\]
This is the sum of n terms, each one of which is the product of an element of the first row of the matrix and the cofactor of that element.

Exercise 58. Define the determinant of the 1 × 1 matrix [a] to be a. (What else could we define it to be?) Show that the definition given above corresponds with the definition we gave earlier for 2 × 2 matrices.

Exercise 59. Calculate the determinants of the following 3 × 3 matrices.
\[
\text{(a)}
\begin{pmatrix}
1 & 2 & 3 \\
3 & 6 & 9 \\
4 & 5 & 7
\end{pmatrix}
\quad
\text{(b)}
\begin{pmatrix}
1 & 5 & 2 \\
1 & 4 & 3 \\
0 & 1 & 2
\end{pmatrix}
\quad
\text{(c)}
\begin{pmatrix}
1 & 1 & 0 \\
5 & 4 & 1 \\
2 & 3 & 2
\end{pmatrix}
\quad
\text{(d)}
\begin{pmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{pmatrix}
\quad
\text{(e)}
\begin{pmatrix}
2 & 5 & 2 \\
1 & 5 & 3 \\
0 & 1 & 3
\end{pmatrix}
\]

Exercise 60. Show that the determinant of the identity matrix, det(In), is 1 for all values of n. [Hint: Show that it is true for I2. Then show that if it is true for In−1 then it is true for In.]

One might ask what was special about the first row, that we took the elements of that row, multiplied them by their cofactors, and added them up. Why not the second row, or the first column? It will follow from a number of properties of determinants we list below that in fact we could have used any row or column and we would have arrived at the same answer.

Exercise 61. Expand the matrix given in Exercise 59(b) in terms of the 2nd and 3rd rows and in terms of each column and check that the resulting answer agrees with the answer you obtained originally.

We now have a way of calculating the determinant of any matrix. To find the determinant of an n × n matrix we have to calculate n determinants of size (n − 1) × (n − 1). This is clearly a fairly computationally costly procedure. However there are often ways to economise on the computation.

Exercise 62. Evaluate the determinants of the following matrices
\[
\text{(a)}
\begin{pmatrix}
1 & 8 & 0 & 7 \\
2 & 3 & 4 & 6 \\
1 & 6 & 0 & -1 \\
0 & -5 & 0 & 8
\end{pmatrix}
\quad
\text{(b)}
\begin{pmatrix}
4 & 7 & 0 & 4 \\
5 & 6 & 1 & 8 \\
0 & 0 & 9 & 0 \\
1 & -3 & 1 & 4
\end{pmatrix}
\]
[Hint: Think carefully about which column or row to use in the expansion.]

We shall now list a number of properties of determinants. These properties imply that, as we stated above, it does not matter which row or column we use to expand the determinant. Further, these properties will give us a series of transformations we may perform on a matrix without altering its determinant. This will allow us to calculate a determinant by first transforming the matrix to one whose determinant is easier to calculate and then calculating the determinant of the easier matrix.


Property 1. The determinant of a matrix equals the determinant of its transpose:
\[
|A| = |A'|.
\]

Property 2. Interchanging two rows (or two columns) of a matrix changes the sign of its determinant but not its absolute value. For example,
\[
\begin{vmatrix}
c & d \\
a & b
\end{vmatrix}
= cb - ad = -(ad - cb) = -
\begin{vmatrix}
a & b \\
c & d
\end{vmatrix}.
\]
Property 3. Multiplying one row (or column) of a matrix by a constant λ will change the value of the determinant λ-fold. For example,
\[
\begin{vmatrix}
\lambda a_{11} & \dots & \lambda a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \dots & a_{nn}
\end{vmatrix}
= \lambda
\begin{vmatrix}
a_{11} & \dots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \dots & a_{nn}
\end{vmatrix}.
\]
Exercise 63. Check Property 3 for the cases n = 2 and n = 3.

Corollary 1. |λA| = λ^n |A| (where A is an n × n matrix).

Corollary 2. |−A| = |A| if n is even; |−A| = −|A| if n is odd.

Property 4. Adding a multiple of any row (column) to any other row (column) does not alter the value of the determinant.

Exercise 64. Check that
\[
\begin{vmatrix}
1 & 5 & 2 \\
1 & 4 & 3 \\
0 & 1 & 2
\end{vmatrix}
=
\begin{vmatrix}
1 & 5 + 3\times 2 & 2 \\
1 & 4 + 3\times 3 & 3 \\
0 & 1 + 3\times 2 & 2
\end{vmatrix}
=
\begin{vmatrix}
1 + (-2)\times 1 & 5 + (-2)\times 4 & 2 + (-2)\times 3 \\
1 & 4 & 3 \\
0 & 1 & 2
\end{vmatrix}.
\]
Property 5. If one row (or column) is a constant times another row (or column) then the determinant of the matrix is zero.

Exercise 65. Show that Property 5 follows from Properties 3 and 4.

We can strengthen Property 5 to obtain the following.

Property 5′. The determinant of a matrix is zero if and only if the matrix is not of full rank.

Exercise 66. Explain why Property 5′ is a strengthening of Property 5, that is, why 5′ implies 5.

These properties allow us to calculate determinants more easily. Given an n × n matrix A the basic strategy one follows is to use the above properties, particularly Property 4, to find a matrix with the same determinant as A in which one row (or column) has only one non-zero element. Then, rather than calculating n determinants of size (n − 1) × (n − 1), one only needs to calculate one. One then does the same thing for the (n − 1) × (n − 1) determinant that needs to be calculated, and so on.
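Carried to its conclusion, this strategy is just Gaussian elimination. The sketch below (an illustration, with exact rational arithmetic via the standard library's `Fraction`) applies Properties 4, 2, and 5′ to reduce the matrix to triangular form and then multiplies the diagonal entries:

```python
from fractions import Fraction

def det_by_elimination(A):
    """Reduce A to triangular form using Property 4 (adding a multiple of
    one row to another leaves the determinant unchanged) and Property 2
    (a row swap flips the sign), then multiply the diagonal entries."""
    M = [[Fraction(x) for x in row] for row in A]
    n, sign = len(M), 1
    for c in range(n):
        # Find a row at or below c with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)           # Property 5': not of full rank
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign                 # Property 2
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]  # Property 4
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]
    return result

assert det_by_elimination([[2, 0, 1], [1, 3, 0], [0, 1, 4]]) == 25
```

Unlike cofactor expansion, this takes on the order of n³ arithmetic operations rather than n! terms.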

There are a number of reasons we are interested in determinants. One is that they give us one method of calculating the inverse of a nonsingular matrix. (Recall that there is no inverse of a singular matrix.) They also give us a method, known as Cramer’s Rule, for solving systems of linear equations. Before proceeding with this it is useful to state one further property of determinants.


Property 6. If one expands a matrix in terms of one row (or column) and the cofactors of a different row (or column) then the answer is always zero. That is,
\[
\sum_{j=1}^{n} a_{ij}|C_{kj}| = 0
\]
whenever i ≠ k. Also
\[
\sum_{i=1}^{n} a_{ij}|C_{ik}| = 0
\]
whenever j ≠ k.

Exercise 67. Verify Property 6 for the matrix
\[
\begin{pmatrix}
4 & 1 & 2 \\
5 & 2 & 1 \\
1 & 0 & 3
\end{pmatrix}.
\]
Let us define the matrix of cofactors C to be the matrix [|Cij |] whose ijth element is the cofactor of the ijth element of A. Now we define the adjoint matrix of A to be the transpose of the matrix of cofactors of A. That is,
\[
\operatorname{adj}(A) = C'.
\]

It is straightforward to see (using Property 6) that A adj(A) = |A|In = adj(A)A. That is,
\[
A^{-1} = \frac{1}{|A|}\operatorname{adj}(A).
\]
Notice that this is well defined if and only if |A| ≠ 0. We now have a method of finding the inverse of any nonsingular square matrix.

Exercise 68. Use this method to find the inverses of the following matrices
\[
\text{(a)}
\begin{pmatrix}
3 & -1 & 2 \\
1 & 0 & 3 \\
4 & 0 & 2
\end{pmatrix}
\quad
\text{(b)}
\begin{pmatrix}
4 & -2 & 1 \\
7 & 3 & 3 \\
2 & 0 & 1
\end{pmatrix}
\quad
\text{(c)}
\begin{pmatrix}
1 & 5 & 2 \\
1 & 4 & 3 \\
0 & 1 & 2
\end{pmatrix}.
\]

Knowing how to invert matrices we thus know how to solve a system of n linear equations in n unknowns. For we can express the n equations in matrix notation as Ax = b where A is an n × n matrix of coefficients, x is an n × 1 vector of unknowns, and b is an n × 1 vector of constants. Thus we can solve the system of equations as x = A−1Ax = A−1b.

Sometimes, particularly if we are not interested in all of the x’s, it is convenient to use another method of solving the equations. This method is known as Cramer’s Rule. Let us suppose that we wish to solve the above system of equations, that is, Ax = b. Let us define the matrix Ai to be the matrix obtained from A by replacing the ith column of A by the vector b. Then the solution is given by

\[
x_i = \frac{|A_i|}{|A|}.
\]

Exercise 69. Derive Cramer’s Rule. [Hint: We know that the solution to the system of equations is given by x = (1/|A|) adj(A)b. This gives a formula for xi. Show that this formula is the same as that given by xi = |Ai|/|A|.]
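The rule itself is short to implement. A sketch (the 2 × 2 system below is an illustrative example chosen here, not one of the exercises):

```python
from fractions import Fraction

def det(A):
    # Cofactor expansion along the first row.
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j+1:] for r in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b via x_i = |A_i| / |A|, where A_i is A with its
    ith column replaced by b (requires |A| != 0)."""
    d = det(A)
    n = len(A)
    return [Fraction(det([row[:i] + [b[k]] + row[i+1:]
                          for k, row in enumerate(A)]), d)
            for i in range(n)]

# 2x1 + x2 = 5 and x1 + 3x2 = 5 give x1 = 2, x2 = 1.
assert cramer([[2, 1], [1, 3]], [5, 5]) == [2, 1]
```

Note that to find a single unknown xi only two determinants are needed, which is the convenience the text mentions.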

Exercise 70. Solve the following system of equations (i) by matrix inversion and (ii) by Cramer’s Rule
\[
\text{(a)}
\begin{aligned}
2x_1 - x_2 &= 2 \\
3x_2 + 2x_3 &= 16 \\
5x_1 + 3x_3 &= 21
\end{aligned}
\qquad
\text{(b)}
\begin{aligned}
-x_1 + x_2 + x_3 &= 1 \\
x_1 - x_2 + x_3 &= 1 \\
x_1 + x_2 + x_3 &= 1
\end{aligned}
\]


Exercise 71. Recall that we claimed that the determinant was an invariant. Confirm this by calculating (directly) det(A) and det(B−1AB) where
\[
B =
\begin{pmatrix}
1 & 0 & 1 \\
1 & -1 & 2 \\
2 & 1 & -1
\end{pmatrix}
\quad\text{and}\quad
A =
\begin{pmatrix}
1 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 3
\end{pmatrix}.
\]

Exercise 72. An nth order determinant of the form
\[
\begin{vmatrix}
a_{11} & 0 & 0 & \dots & 0 \\
a_{21} & a_{22} & 0 & \dots & 0 \\
a_{31} & a_{32} & a_{33} & \dots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn}
\end{vmatrix}
\]
is called triangular. Evaluate this determinant. [Hint: Expand the determinant in terms of its first row. Expand the resulting (n − 1) × (n − 1) determinant in terms of its first row, and so on.]

10. Eigenvalues and Eigenvectors

Suppose that we have a linear function f : Rn → Rn. When we look at how f deforms Rn one natural question is: Where does f send some linear subspace? In particular we might ask if there are any linear subspaces that f maps to themselves. We call such linear subspaces invariant linear subspaces. Of course the space Rn itself and the zero dimensional space {0} are invariant linear subspaces. The real question is whether there are any others. Clearly, for some linear transformations there are no other invariant subspaces. For example, a clockwise rotation of π/4 in R2 has no invariant subspaces other than R2 itself and {0}.

A particularly important class of invariant linear subspaces are the one dimensional ones. A one dimensional linear subspace is specified by one nonzero vector, say x. Then the subspace is {λx | λ ∈ R}. Let us call this subspace L(x). If L(x) is an invariant linear subspace of f then for each nonzero x in L(x) there is some value λ such that f(x) = λx. Moreover the value of λ for which this is true will be the same whatever value of x we choose in L(x).

Now if we fix the set of basis vectors, and thus the matrix A that represents f, we have that if x is in a one dimensional invariant linear subspace of f then there is some λ ∈ R such that

Ax = λx.

Again we can define this notion without reference to linear functions. Given a matrix A, if we can find a pair x, λ with x ≠ 0 that satisfy the above equation we call x an eigenvector of the matrix A and λ the associated eigenvalue. (Sometimes these are called characteristic vectors and values.)

Exercise 73. Show that the eigenvalues of a matrix are an invariant, that is, that they depend only on the linear function the matrix represents and not on the choice of basis vectors. Show also that the eigenvectors of a matrix are not an invariant. Explain why the dependence of the eigenvectors on the particular basis is exactly what we would expect and argue that in some sense they are indeed invariant.

Now we can rewrite the equation Ax = λx as

(A− λIn)x = 0.


If x, λ solve this equation and x ≠ 0 then we have a nonzero linear combination of the columns of A − λIn equal to zero. This means that the columns of A − λIn are not linearly independent and so det(A − λIn) = 0, that is,
\[
\det
\begin{pmatrix}
a_{11} - \lambda & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} - \lambda & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \dots & a_{nn} - \lambda
\end{pmatrix}
= 0.
\]

Now, the left hand side of this last equation is a polynomial of degree n in λ, that is, a polynomial in λ in which n is the highest power of λ that appears with nonzero coefficient. It is called the characteristic polynomial and the equation is called the characteristic equation. Now this equation may, or may not, have a solution in real numbers. In general, by the fundamental theorem of algebra, the equation has n solutions, perhaps not all distinct, in the complex numbers. If the matrix A happens to be symmetric (that is, if aij = aji for all i and j) then all of its eigenvalues are real. If the eigenvalues are all distinct (that is, different from each other) then we are in a particularly well behaved situation. As a prelude we state the following result.
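A small numerical check ties these ideas together. For the symmetric 2 × 2 matrix chosen below (an illustrative example), the characteristic polynomial is det(A − λI) = (2 − λ)² − 1 = λ² − 4λ + 3, and its roots coincide with the eigenvalues, both real as the text predicts for symmetric matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, so all eigenvalues are real

roots = np.roots([1.0, -4.0, 3.0])  # roots of the characteristic polynomial
eigvals = np.linalg.eigvalsh(A)     # eigenvalues computed directly
assert np.allclose(sorted(roots), sorted(eigvals))
```

Here the eigenvalues 1 and 3 are distinct, which (by the result below) guarantees linearly independent eigenvectors.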

Theorem 5. Given an n × n matrix A suppose that we have m eigenvectors of A, x1, x2, . . . , xm, with corresponding eigenvalues λ1, λ2, . . . , λm. If λi ≠ λj whenever i ≠ j then x1, x2, . . . , xm are linearly independent.

An implication of this theorem is that an n × n matrix cannot have more than n eigenvectors with distinct eigenvalues. Further, this theorem allows us to see that if an n × n matrix has n distinct eigenvalues then it is possible to find a basis for Rn in which the linear function that the matrix represents is represented by a diagonal matrix. Equivalently we can find a matrix B such that B−1AB is a diagonal matrix.

To see this let b1, b2, . . . , bn be n linearly independent eigenvectors with associated eigenvalues λ1, λ2, . . . , λn. Let B be the matrix whose columns are the vectors b1, b2, . . . , bn. Since these vectors are linearly independent the matrix B has an inverse. Now
\[
\begin{aligned}
B^{-1}AB &= B^{-1}[Ab_1 \; Ab_2 \; \dots \; Ab_n] \\
&= B^{-1}[\lambda_1 b_1 \; \lambda_2 b_2 \; \dots \; \lambda_n b_n] \\
&= [\lambda_1 B^{-1}b_1 \; \lambda_2 B^{-1}b_2 \; \dots \; \lambda_n B^{-1}b_n] \\
&=
\begin{pmatrix}
\lambda_1 & 0 & \dots & 0 \\
0 & \lambda_2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \lambda_n
\end{pmatrix},
\end{aligned}
\]
since B−1bi is ei, the ith standard basis vector.


CHAPTER 3

Consumer Behaviour: Optimisation Subject to theBudget Constraint

1. Constrained Maximisation

1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks to distribute his income across the purchase of the two goods that he consumes, subject to the constraint that he spends no more than his total income. Let us denote the amount of the first good that he buys x1 and the amount of the second good x2, the prices of the two goods p1 and p2, and the consumer’s income y. The utility that the consumer obtains from consuming x1 units of good 1 and x2 units of good 2 is denoted u(x1, x2). Thus the consumer’s problem is to maximise u(x1, x2) subject to the constraint that p1x1 + p2x2 ≤ y. (We shall soon write p1x1 + p2x2 = y, i.e., we shall assume that the consumer must spend all of his income.) Before discussing the solution of this problem let’s write it in a more “mathematical” way.

\[
\max_{x_1, x_2} \; u(x_1, x_2) \quad\text{subject to}\quad p_1 x_1 + p_2 x_2 = y \tag{5}
\]
We read this “Choose x1 and x2 to maximise u(x1, x2) subject to the constraint that p1x1 + p2x2 = y.”

Let us assume, as usual, that the indifference curves (i.e., the sets of points (x1, x2) for which u(x1, x2) is a constant) are convex to the origin. Let us also assume that the indifference curves are nice and smooth. Then the point (x∗1, x∗2) that solves the maximisation problem (5) is the point at which the indifference curve is tangent to the budget line as given in Figure 1.

One thing we can say about the solution is that at the point (x∗1, x∗2) it must be true that the marginal utility with respect to good 1 divided by the price of good 1 must equal the marginal utility with respect to good 2 divided by the price of good 2. For if this were not true then the consumer could, by decreasing the consumption of the good for which this ratio was lower and increasing the consumption of the other good, increase his utility. Marginal utilities are, of course, just the partial derivatives of the utility function. Thus we have

\[
\frac{\frac{\partial u}{\partial x_1}(x_1^*, x_2^*)}{p_1}
=
\frac{\frac{\partial u}{\partial x_2}(x_1^*, x_2^*)}{p_2}. \tag{6}
\]

The argument we have just made seems very “economic.” It is easy to give an alternative argument that does not explicitly refer to the economic intuition. Let xu2 be the function that defines the indifference curve through the point (x∗1, x∗2), i.e.,
\[
u(x_1, x_2^u(x_1)) \equiv u \equiv u(x_1^*, x_2^*).
\]

Now, totally differentiating this identity gives
\[
\frac{\partial u}{\partial x_1}(x_1, x_2^u(x_1))
+ \frac{\partial u}{\partial x_2}(x_1, x_2^u(x_1)) \frac{dx_2^u}{dx_1}(x_1) = 0.
\]


[Figure 1: the indifference curve u(x1, x2) = u tangent to the budget line p1x1 + p2x2 = y at the point (x∗1, x∗2), with x1 on the horizontal axis and x2 on the vertical axis.]

That is,
\[
\frac{dx_2^u}{dx_1}(x_1) = -\frac{\frac{\partial u}{\partial x_1}(x_1, x_2^u(x_1))}{\frac{\partial u}{\partial x_2}(x_1, x_2^u(x_1))}.
\]

Now xu2(x∗1) = x∗2. Thus the slope of the indifference curve at the point (x∗1, x∗2) is
\[
\frac{dx_2^u}{dx_1}(x_1^*) = -\frac{\frac{\partial u}{\partial x_1}(x_1^*, x_2^*)}{\frac{\partial u}{\partial x_2}(x_1^*, x_2^*)}.
\]

Also, the slope of the budget line is −p1/p2. Combining these two results again gives result (6).

Since we also have another equation that (x∗1, x∗2) must satisfy, viz
\[
p_1 x_1^* + p_2 x_2^* = y, \tag{7}
\]

we have two equations in two unknowns and we can (if we know what the utility function is and what p1, p2, and y are) go happily away and solve the problem. (This isn’t quite true but we shall not go into that at this point.) What we shall develop is a systematic and useful way to obtain the conditions (6) and (7). Let us first denote the common value of the ratios in (6) by λ. That is,

\[
\frac{\frac{\partial u}{\partial x_1}(x_1^*, x_2^*)}{p_1}
= \lambda =
\frac{\frac{\partial u}{\partial x_2}(x_1^*, x_2^*)}{p_2}
\]

and we can rewrite this and (7) as
\[
\begin{aligned}
\frac{\partial u}{\partial x_1}(x_1^*, x_2^*) - \lambda p_1 &= 0 \\
\frac{\partial u}{\partial x_2}(x_1^*, x_2^*) - \lambda p_2 &= 0 \\
y - p_1 x_1^* - p_2 x_2^* &= 0.
\end{aligned} \tag{8}
\]


Now we have three equations in x∗1, x∗2, and the new artificial or auxiliary variable λ. Again we can, perhaps, solve these equations for x∗1, x∗2, and λ. Consider the following function
\[
L(x_1, x_2, \lambda) = u(x_1, x_2) + \lambda(y - p_1 x_1 - p_2 x_2). \tag{9}
\]
This function is known as the Lagrangian. Now, if we calculate ∂L/∂x1, ∂L/∂x2, and ∂L/∂λ, and set the results equal to zero, we obtain exactly the equations given in (8). We now describe this technique in a somewhat more general way.

Suppose that we have the following maximisation problem
\[
\max_{x_1, \dots, x_n} \; f(x_1, \dots, x_n) \quad\text{subject to}\quad g(x_1, \dots, x_n) = c \tag{10}
\]

and we let
\[
L(x_1, \dots, x_n, \lambda) = f(x_1, \dots, x_n) + \lambda(c - g(x_1, \dots, x_n)), \tag{11}
\]

then if (x∗1, . . . , x∗n) solves (10) there is a value of λ, say λ∗, such that
\[
\frac{\partial L}{\partial x_i}(x_1^*, \dots, x_n^*, \lambda^*) = 0 \qquad i = 1, \dots, n \tag{12}
\]
\[
\frac{\partial L}{\partial \lambda}(x_1^*, \dots, x_n^*, \lambda^*) = 0. \tag{13}
\]

Notice that the conditions (12) are precisely the first order conditions for choosing x1, . . . , xn to maximise L, once λ∗ has been chosen. This provides an intuition into this method of solving the constrained maximisation problem. In the constrained problem we have told the decision maker that he must satisfy g(x1, . . . , xn) = c and that he should choose among all points that satisfy this constraint the point at which f(x1, . . . , xn) is greatest. We arrive at the same answer if we tell the decision maker to choose any point he wishes but that for each unit by which he violates the constraint g(x1, . . . , xn) = c we shall take away λ units from his payoff. Of course we must be careful to choose λ to be the correct value. If we choose λ too small the decision maker may choose to violate his constraint—e.g., if we made the penalty for spending more than the consumer’s income very small the consumer would choose to consume more goods than he could afford and to pay the penalty in utility terms. On the other hand if we choose λ too large the decision maker may violate his constraint in the other direction, e.g., the consumer would choose not to spend any of his income and just receive λ units of utility for each unit of his income.
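To make the recipe concrete, take the (illustrative, not from the text) utility function u(x1, x2) = a ln x1 + (1 − a) ln x2. Conditions (8) can then be solved by hand, giving x∗1 = ay/p1, x∗2 = (1 − a)y/p2, and λ∗ = 1/y, which a short script can verify:

```python
# Illustrative parameter values (assumptions chosen for this sketch).
a, p1, p2, y = 0.3, 2.0, 5.0, 100.0

# Closed-form solution of the first order conditions (8) for
# u(x1, x2) = a*ln(x1) + (1 - a)*ln(x2).
x1 = a * y / p1
x2 = (1 - a) * y / p2
lam = 1 / y

# Check the three equations in (8): du/dx1 = a/x1 and du/dx2 = (1 - a)/x2.
assert abs(a / x1 - lam * p1) < 1e-12        # du/dx1 - lambda*p1 = 0
assert abs((1 - a) / x2 - lam * p2) < 1e-12  # du/dx2 - lambda*p2 = 0
assert abs(y - p1 * x1 - p2 * x2) < 1e-12    # budget constraint holds
```

Note that here λ∗ = 1/y is exactly the marginal utility of an extra unit of income, matching the penalty interpretation above.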

It is possible to give a more general statement of this technique, allowing for multiple constraints. (Of course, we should always have fewer constraints than we have variables.) Suppose we have more than one constraint. Consider the problem

\[
\begin{aligned}
\max_{x_1, \dots, x_n} \;& f(x_1, \dots, x_n) \\
\text{subject to}\;& g_1(x_1, \dots, x_n) = c_1 \\
& \quad\vdots \\
& g_m(x_1, \dots, x_n) = c_m.
\end{aligned}
\]

Again we construct the Lagrangian
\[
\begin{aligned}
L(x_1, \dots, x_n, \lambda_1, \dots, \lambda_m) ={}& f(x_1, \dots, x_n) \\
&+ \lambda_1(c_1 - g_1(x_1, \dots, x_n)) + \dots + \lambda_m(c_m - g_m(x_1, \dots, x_n))
\end{aligned} \tag{14}
\]


and again if (x∗1, . . . , x∗n) solves this problem there are values of λ, say λ∗1, . . . , λ∗m, such that
\[
\begin{aligned}
\frac{\partial L}{\partial x_i}(x_1^*, \dots, x_n^*, \lambda_1^*, \dots, \lambda_m^*) &= 0 \qquad i = 1, \dots, n \\
\frac{\partial L}{\partial \lambda_j}(x_1^*, \dots, x_n^*, \lambda_1^*, \dots, \lambda_m^*) &= 0 \qquad j = 1, \dots, m.
\end{aligned} \tag{15}
\]

1.2. Caveats and Extensions. Notice that we have been referring to the set of conditions which a solution to the maximisation problem must satisfy. (We call such conditions necessary conditions.) So far we have not even claimed that there necessarily is a solution to the maximisation problem. There are many examples of maximisation problems which have no solution. One example of an unconstrained problem with no solution is

\[
\max_{x} \; 2x, \tag{16}
\]

maximise over the choice of x the function 2x. Clearly the greater we make x the greater is 2x, and so, since there is no upper bound on x, there is no maximum. Thus we might want to restrict maximisation problems to those in which we choose x from some bounded set. Again, this is not enough. Consider the problem

\[
\max_{0 \le x \le 1} \; 1/x. \tag{17}
\]

The smaller we make x the greater is 1/x, and yet at zero 1/x is not even defined. We could define the function to take on some value at zero, say 7. But then the function would not be continuous. Or we could leave zero out of the feasible set for x, say 0 < x ≤ 1. Then the set of feasible x is not closed. Since there would obviously still be no solution to the maximisation problem in these cases we shall want to restrict maximisation problems to those in which we choose x to maximise some continuous function from some closed and (because of the previous example) bounded set. (We call a set of numbers, or more generally a set of vectors, that is both closed and bounded a compact set.) Is there anything else that could go wrong? No! The following result says that if the function to be maximised is continuous and the set over which we are choosing is both closed and bounded, i.e., is compact, then there is a solution to the maximisation problem.

Theorem 6 (The Weierstrass Theorem). Let S be a compact set. Let f be a continuous function that takes each point in S to a real number. (We usually write: let f : S → R be continuous.) Then there is some x∗ in S at which the function is maximised. More precisely, there is some x∗ in S such that f(x∗) ≥ f(x) for any x in S.
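The theorem can be seen at work numerically. The sketch below is an illustration added here, not part of the original text: it maximises the continuous function f(x) = x(1 − x) over the compact set [0, 1] by searching a fine grid; the particular function and the grid resolution are arbitrary choices.

```python
import numpy as np

# A continuous function on the compact set [0, 1]:
# the Weierstrass theorem guarantees a maximiser exists.
f = lambda x: x * (1 - x)

grid = np.linspace(0.0, 1.0, 100001)   # fine grid over the compact set
values = f(grid)
x_star = grid[np.argmax(values)]       # approximate maximiser

# The true maximiser is x* = 1/2 with f(x*) = 1/4.
print(x_star, f(x_star))
```

By contrast, no such grid search can settle down for 1/x on (0, 1]: the supremum only grows as the grid is refined toward zero, which is exactly why compactness of the feasible set matters.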

Notice that in defining such compact sets we typically use inequalities, such as x ≥ 0. However, in Section 1 we did not consider such constraints, but rather considered only equality constraints. Yet even in the example of utility maximisation at the beginning of Section 5.6, there were implicitly constraints on x1 and x2 of the form
\[
x_1 \ge 0, \qquad x_2 \ge 0.
\]

A truly satisfactory treatment would make such constraints explicit. It is possible to explicitly treat the maximisation problem with inequality constraints, at the price of a little additional complexity. We shall return to this question later in the book.

Also, notice that had we wished to solve a minimisation problem we could have transformed the problem into a maximisation problem by simply multiplying the objective function by −1. That is, if we wish to minimise f(x) we could do so by maximising −f(x). As an exercise write out the conditions analogous to the conditions (8) for the case that we wanted to minimise u(x). Notice that if x∗1, x∗2, and λ satisfy the original equations then x∗1, x∗2, and −λ satisfy the new equations. Thus we cannot tell whether there is a maximum at (x∗1, x∗2) or a minimum. This corresponds to the fact that in the case of a function of a single variable over an unconstrained domain, at a maximum we require the first derivative to be zero, but to know for sure that we have a maximum we must look at the second derivative. We shall not develop the analogous conditions for the constrained problem with many variables here. However, again, we shall return to it later in the book.

2. The Implicit Function Theorem

In the previous section we said things like: “Now we have three equations in x∗1, x∗2, and the new artificial or auxiliary variable λ. Again we can, perhaps, solve these equations for x∗1, x∗2, and λ.” In this section we examine the question of when we can solve a system of n equations to give n of the variables in terms of the others. Let us suppose that we have n endogenous variables x1, . . . , xn, m exogenous variables or parameters b1, . . . , bm, and n equations or equilibrium conditions

\[
(18)\qquad
\begin{aligned}
f_1(x_1,\dots,x_n,b_1,\dots,b_m) &= 0\\
f_2(x_1,\dots,x_n,b_1,\dots,b_m) &= 0\\
&\;\;\vdots\\
f_n(x_1,\dots,x_n,b_1,\dots,b_m) &= 0,
\end{aligned}
\]

or, using vector notation,
\[
f(x,b) = 0,
\]
where f : Rn+m → Rn, x ∈ Rn (that is, x is an n-vector), b ∈ Rm, and 0 ∈ Rn. When can we solve this system to obtain functions giving each xi as a function of b1, . . . , bm? As we'll see below, we give only an incomplete answer to this question, but first let's look at the case in which the function f is linear.

Suppose that our equations are
\[
\begin{aligned}
a_{11}x_1 + \dots + a_{1n}x_n + c_{11}b_1 + \dots + c_{1m}b_m &= 0\\
a_{21}x_1 + \dots + a_{2n}x_n + c_{21}b_1 + \dots + c_{2m}b_m &= 0\\
&\;\;\vdots\\
a_{n1}x_1 + \dots + a_{nn}x_n + c_{n1}b_1 + \dots + c_{nm}b_m &= 0.
\end{aligned}
\]

We can write this, in matrix notation, as
\[
[A \mid C]\begin{bmatrix} x \\ b \end{bmatrix} = 0,
\]
where A is an n × n matrix, C is an n × m matrix, x is an n × 1 (column) vector, and b is an m × 1 vector.

This we can rewrite as
\[
Ax + Cb = 0,
\]
and solve this to give
\[
x = -A^{-1}Cb.
\]
And we can do this as long as the matrix A can be inverted, that is, as long as the matrix A is of full rank.
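As a numerical sketch (the matrices A and C and the parameter vector b below are made-up examples, not from the text), the solution x = −A⁻¹Cb can be computed with NumPy; using `np.linalg.solve` avoids forming the inverse explicitly.

```python
import numpy as np

# Solve the linear system A x + C b = 0 for x, given parameters b.
# A must be n x n and of full rank (invertible) for this to work.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # n x n, full rank
C = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])     # n x m
b = np.array([1.0, 2.0, 0.5])       # m-vector of parameters

x = -np.linalg.solve(A, C @ b)      # x = -A^{-1} C b without forming A^{-1}

# Check that (x, b) actually satisfies the system.
print(A @ x + C @ b)                # residual, ~ [0, 0]
```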

Our answer to the general question, in which the function f may not be linear, is this: if there are some values (x̄, b̄) for which f(x̄, b̄) = 0, and if, when we take a linear approximation to f, we can solve the approximate linear system as we did above, then we can solve the true nonlinear system, at least in a neighbourhood of (x̄, b̄). By this last phrase we mean that if b is not close to b̄ we may not be able to solve the system, and that for a particular value of b there may be many values of x that solve the system, but there is only one close to x̄.

To see why we can’t, in general, do better than this, consider the equation f : R2 → R given by f(x, b) = g(x) − b, where the function g is graphed in Figure 2. Notice that the values (x̄, b̄) satisfy the equation f(x̄, b̄) = 0. For all values of b close to b̄ we can find a unique value of x close to x̄ such that f(x, b) = 0. However, (1) for each such value of b there are other values of x far away from x̄ that also satisfy f(x, b) = 0, and (2) there are values of b (such as the value marked in Figure 2) for which there are no values of x that satisfy f(x, b) = 0.

[Figure 2: the graph of g(x), with x on the horizontal axis and b on the vertical axis; the point (x̄, b̄) is marked on the curve.]

Let us consider again the system of equations (18). We say that the function f is C1 on some open set A ⊂ Rn+m if f has partial derivatives everywhere in A and these partial derivatives are continuous on A.

Theorem 7. Suppose that f : Rn+m → Rn is a C1 function on an open set A ⊂ Rn+m and that (x̄, b̄) in A is such that f(x̄, b̄) = 0. Suppose also that
\[
\frac{\partial f(\bar x,\bar b)}{\partial x} =
\begin{bmatrix}
\frac{\partial f_1(\bar x,\bar b)}{\partial x_1} & \cdots & \frac{\partial f_1(\bar x,\bar b)}{\partial x_n}\\
\vdots & & \vdots\\
\frac{\partial f_n(\bar x,\bar b)}{\partial x_1} & \cdots & \frac{\partial f_n(\bar x,\bar b)}{\partial x_n}
\end{bmatrix}
\]
is of full rank. Then there are open sets A1 ⊂ Rn and A2 ⊂ Rm with x̄ in A1 and b̄ in A2 and A1 × A2 ⊂ A such that for each b in A2 there is exactly one g(b) in A1 such that f(g(b), b) = 0. Moreover, g : A2 → A1 is a C1 function and
\[
\frac{\partial g(b)}{\partial b} = -\left[\frac{\partial f(g(b),b)}{\partial x}\right]^{-1}\left[\frac{\partial f(g(b),b)}{\partial b}\right].
\]


Exercise 74. Consider the general utility maximisation problem
\[
(19)\qquad \max_{x_1,\dots,x_n} u(x_1,\dots,x_n) \quad\text{subject to}\quad p_1x_1 + p_2x_2 + \dots + p_nx_n = w.
\]
Suppose that for some price vector p̄ the maximisation problem has a utility maximising bundle x̄. Find conditions on the utility function such that in a neighbourhood of (x̄, p̄) we can solve for the demand functions x(p). Find the derivatives of the demand functions, ∂x/∂p.

Exercise 75. Now suppose that there are only two goods and the utility function is given by
\[
u(x_1,x_2) = x_1^{1/3}\, x_2^{2/3}.
\]
Solve this utility maximisation problem, as you learned to do in Section 1 of this Chapter, and then differentiate the demand functions that you find to obtain the partial derivatives with respect to p1, p2, and w of each demand function. Also find the same derivatives using the method of the previous exercise.

3. The Theorem of the Maximum

Often in economics we are not so much interested in what the solution to a particular maximisation problem is but rather wish to know how the solution to a parameterised problem depends on the parameters. Thus in our first example of utility maximisation we might be interested not so much in what the solution to the maximisation problem is when p1 = 2, p2 = 7, and y = 25, but rather in how the solution depends on p1, p2, and y. (That is, we might be interested in the demand function.) Sometimes we shall also be interested in how the maximised function depends on the parameters: in the example, how the maximised utility depends on p1, p2, and y.

This raises a number of questions. In order for us to speak meaningfully of a demand function it should be the case that the maximisation problem has a unique solution. Further, we would like to know if the “demand” function is continuous, or even if it is differentiable. Consider again the problem (14), but this time let us explicitly add some parameters.

\[
(20)\qquad \max_{x_1,\dots,x_n} f(x_1,\dots,x_n,a_1,\dots,a_k)
\quad\text{subject to}\quad
\begin{aligned}
g_1(x_1,\dots,x_n,a_1,\dots,a_k) &= c_1\\
&\;\;\vdots\\
g_m(x_1,\dots,x_n,a_1,\dots,a_k) &= c_m
\end{aligned}
\]

In order to be able to say whether or not the problem has a unique solution it is useful to know something about the shape or curvature of the functions f and g. We say a function is concave if for any two points in the domain of the function the value of the function at a weighted average of the two points is at least as great as the weighted average of the values of the function at the two points. We say the function is convex if the value of the function at the average is no greater than the average of the values. The following definition makes this a little more explicit. (In both definitions x = (x1, . . . , xn) is a vector.)

Definition 15. A function f is concave if for any x and x′ with x ≠ x′ and for any t such that 0 < t < 1 we have f(tx + (1 − t)x′) ≥ tf(x) + (1 − t)f(x′). The function is strictly concave if f(tx + (1 − t)x′) > tf(x) + (1 − t)f(x′).

A function f is convex if for any x and x′ with x ≠ x′ and for any t such that 0 < t < 1 we have f(tx + (1 − t)x′) ≤ tf(x) + (1 − t)f(x′). The function is strictly convex if f(tx + (1 − t)x′) < tf(x) + (1 − t)f(x′).
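The defining inequality is easy to spot-check numerically. The sketch below is our own illustration: it samples random points and weights and verifies the strict-concavity inequality for f(x) = √x, which is strictly concave on the positive reals.

```python
import random

# Spot-check the defining inequality of strict concavity for f(x) = sqrt(x):
# f(t x + (1-t) x') > t f(x) + (1-t) f(x') whenever x != x' and 0 < t < 1.
f = lambda x: x ** 0.5

random.seed(0)                       # deterministic sample
for _ in range(1000):
    x, xp = random.uniform(0.01, 10), random.uniform(0.01, 10)
    t = random.uniform(0.01, 0.99)
    if abs(x - xp) > 1e-3:           # keep points apart so the gap is visible
        assert f(t * x + (1 - t) * xp) > t * f(x) + (1 - t) * f(xp)
print("strict concavity inequality holds at all sampled points")
```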


The result we are about to give is most conveniently stated when our statement of the problem is in terms of inequality constraints rather than equality constraints. As mentioned earlier we shall examine this kind of problem later in this course. However, for the moment, in order to proceed with our discussion of the problem involving equality constraints, we shall assume that all of the functions with which we are dealing are increasing in the x variables. (See Exercise 76 for a formal definition of what it means for a function to be increasing.) In this case if f is strictly concave and gj is convex for each j then the problem has a unique solution. In fact the concepts of concavity and convexity are somewhat stronger than is required. We shall see later in the course that they can be replaced by the concepts of quasi-concavity and quasi-convexity. In some sense these latter concepts are the “right” concepts for this result.

Theorem 8. Suppose that f and gj are increasing in (x1, . . . , xn). If f is strictly concave in (x1, . . . , xn) and gj is convex in (x1, . . . , xn) for j = 1, . . . ,m, then for each value of the parameters (a1, . . . , ak), if problem (20) has a solution (x∗1, . . . , x∗n), that solution is unique.

Now let v(a1, . . . , ak) be the maximised value of f when the parameters are (a1, . . . , ak). Let us suppose that the problem is such that the solution is unique and that (x∗1(a1, . . . , ak), . . . , x∗n(a1, . . . , ak)) are the values that maximise the function f when the parameters are (a1, . . . , ak). Then
\[
(21)\qquad v(a_1,\dots,a_k) = f(x_1^*(a_1,\dots,a_k),\dots,x_n^*(a_1,\dots,a_k),a_1,\dots,a_k).
\]
(Notice however that the function v is uniquely defined even if there is not a unique maximiser.)

The Theorem of the Maximum gives conditions on the problem under which the function v and the functions x∗1, . . . , x∗n are continuous. The constraints in the problem (20) define a set of feasible vectors x over which the function f is to be maximised. Let us call this set G(a1, . . . , ak), i.e.,
\[
(22)\qquad G(a_1,\dots,a_k) = \{(x_1,\dots,x_n) \mid g_j(x_1,\dots,x_n,a_1,\dots,a_k) = c_j \;\forall j\}.
\]

Now we can restate the problem as
\[
(23)\qquad \max_{x_1,\dots,x_n} f(x_1,\dots,x_n,a_1,\dots,a_k) \quad\text{subject to}\quad (x_1,\dots,x_n)\in G(a_1,\dots,a_k).
\]

Notice that both the function f and the feasible set G depend on the parameters a, i.e., both may change as a changes. The Theorem of the Maximum requires both that the function f be continuous as a function of x and a and that the feasible set G(a1, . . . , ak) change continuously as a changes. We already know (or should know) what it means for f to be continuous, but the notion of what it means for a set to change continuously is less elementary. We call G a set valued function or a correspondence. G associates with any vector (a1, . . . , ak) a subset of the vectors (x1, . . . , xn). The following two definitions define what we mean by a correspondence being continuous. First we define what it means for two sets to be close.

Definition 16. Two sets of vectors A and B are within ε of each other if forany vector x in one set there is a vector x′ in the other set such that x′ is within εof x.

We can now define the continuity of the correspondence G in essentially the same way that we define the continuity of a single valued function.

Page 48: Foundations Of Economic Analysis

4. THE ENVELOPE THEOREM 45

Definition 17. The correspondence G is continuous at (a1, . . . , ak) if for any ε > 0 there is δ > 0 such that if (a′1, . . . , a′k) is within δ of (a1, . . . , ak) then G(a′1, . . . , a′k) is within ε of G(a1, . . . , ak).

It is, unfortunately, not the case that the continuity of the functions gj necessarily implies the continuity of the feasible set. (Exercise 77 asks you to construct a counterexample.)

Remark 1. It is possible to define two weaker notions of continuity, which we call upper hemicontinuity and lower hemicontinuity. A correspondence is in fact continuous in the way we have defined it if it is both upper hemicontinuous and lower hemicontinuous.

We are now in a position to state the Theorem of the Maximum. We assume that f is a continuous function, that G is a continuous correspondence, and that for any (a1, . . . , ak) the set G(a1, . . . , ak) is compact. The Weierstrass Theorem thus guarantees that there is a solution to the maximisation problem (23) for any (a1, . . . , ak).

Theorem 9 (Theorem of the Maximum). Suppose that f(x1, . . . , xn, a1, . . . , ak) is continuous (in (x1, . . . , xn, a1, . . . , ak)), that G(a1, . . . , ak) is a continuous correspondence, and that for any (a1, . . . , ak) the set G(a1, . . . , ak) is compact. Then
(1) v(a1, . . . , ak) is continuous, and
(2) if (x∗1(a1, . . . , ak), . . . , x∗n(a1, . . . , ak)) are (single valued) functions then they are also continuous.

Later in the course we shall see how the Implicit Function Theorem allows us to identify conditions under which the functions v and x∗ are differentiable.
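As a numerical illustration (our own, with an arbitrary objective): take f(x, a) = ax − x² and the fixed compact feasible set [0, 1], so the correspondence G is trivially continuous. For 0 ≤ a ≤ 2 the maximiser is x∗(a) = a/2 and v(a) = a²/4, both continuous in a, just as the theorem predicts.

```python
import numpy as np

# v(a) = max over x in [0, 1] of f(x, a), with f(x, a) = a*x - x**2.
# f is continuous and the feasible set [0, 1] is compact and does not
# move with a, so the Theorem of the Maximum says v is continuous.
xs = np.linspace(0.0, 1.0, 10001)

def v(a):
    return np.max(a * xs - xs ** 2)

def x_star(a):
    return xs[np.argmax(a * xs - xs ** 2)]

# For 0 <= a <= 2 the analytic answers are x*(a) = a/2 and v(a) = a^2/4.
print(v(1.0), x_star(1.0))
```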

Exercises.

Exercise 76. We say that the function f(x1, . . . , xn) is nondecreasing if x′i ≥ xi for each i implies that f(x′1, . . . , x′n) ≥ f(x1, . . . , xn), is increasing if x′i > xi for each i implies that f(x′1, . . . , x′n) > f(x1, . . . , xn), and is strictly increasing if x′i ≥ xi for each i and x′j > xj for at least one j implies that f(x′1, . . . , x′n) > f(x1, . . . , xn). Show that if f is nondecreasing and strictly concave then it must be strictly increasing. [Hint: This is very easy.]

Exercise 77. Show by example that even if the functions gj are continuous the correspondence G may not be continuous. [Hint: Use the case n = m = k = 1.]

4. The Envelope Theorem

In this section we examine a theorem that is particularly useful in the study of consumer and producer theory. There is in fact nothing mysterious about this theorem. You will see that the proof of this theorem is simply calculation and a number of substitutions. Moreover the theorem has a very clear intuition. It is this: suppose we are at a maximum (in an unconstrained problem) and we change the data of the problem by a very small amount. Now both the solution of the problem and the value at the maximum will change. However, at a maximum the function is flat (the first derivative is zero). Thus when we want to know by how much the maximised value has changed it does not matter (very much) whether or not we take account of how the maximiser changes. See Figure 2. The intuition for a constrained problem is similar and only a little more complicated.

[Figure 2: the functions f(·, a) and f(·, a′) plotted against x, with the maximisers x∗(a) and x∗(a′) and the values f(x∗(a), a), f(x∗(a′), a′), and f(x∗(a), a′) marked.]

To motivate our discussion of the Envelope Theorem we will first consider a particular case, viz., the relation between short and long run average cost curves. Recall that, in general, we assume that the average cost of producing some good is a function of the amount of the good to be produced. The short run average cost function is defined to be the function which for any quantity, Q, gives the average cost of producing that quantity, taking as given the scale of operation, i.e., the size and number of plants and other fixed capital which we assume cannot be changed in the short run (whatever that is). The long run average cost function on the other hand gives, as a function of Q, the average cost of producing Q units of the good, with the scale of operation selected to be the optimal scale for that level of production.

That is, if we let the scale of operation be measured by a single variable k, say, and we let the short run average cost of producing Q units when the scale is k be given by SRAC(Q, k) and the long run average cost of producing Q units by LRAC(Q), then we have
\[
\mathrm{LRAC}(Q) = \min_k \mathrm{SRAC}(Q,k).
\]
Let us denote, for a given value Q, the optimal level of k by k(Q). That is, k(Q) is the value of k that minimises the right hand side of the above equation.

Graphically, for any fixed level of k the short run average cost function can be represented by a curve (normally assumed to be U-shaped) drawn in two dimensions with quantity on the horizontal axis and cost on the vertical axis. Now think about drawing one short run average cost curve for each of the (infinite) possible values of k. One way of thinking about the long run average cost curve is as the “bottom” or envelope of these short run average cost curves. Suppose that we consider a point on this long run or envelope curve. What can be said about the slope of the long run average cost curve at this point? A little thought should convince you that it should be the same as the slope of the short run curve through the same point. (If it were not then that short run curve would come below the long run curve, a contradiction.) That is,
\[
\frac{d\,\mathrm{LRAC}(Q)}{dQ} = \frac{\partial\,\mathrm{SRAC}(Q,k(Q))}{\partial Q}.
\]

See Figure 3.

[Figure 3: the long run average cost curve LRAC as the lower envelope of the short run curves SRAC, with quantity Q on the horizontal axis and cost on the vertical axis; at the marked quantity Q̄ the two curves are tangent and LRAC(Q̄) = SRAC(Q̄, k(Q̄)).]
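The tangency can be checked numerically. The functional form below, SRAC(Q, k) = k + Q²/k, is a made-up example chosen for tractability (it is not U-shaped in Q): minimising over k gives k(Q) = Q and LRAC(Q) = 2Q, so both slopes at the tangency should equal 2.

```python
import numpy as np

# Hypothetical short-run average cost SRAC(Q, k) = k + Q**2 / k.
# Minimising over the scale k gives k(Q) = Q and LRAC(Q) = 2*Q.
SRAC = lambda Q, k: k + Q ** 2 / k

ks = np.linspace(0.1, 20.0, 200000)          # grid of scales k

def LRAC(Q):
    return np.min(SRAC(Q, ks))               # lower envelope at Q

Q0, h = 3.0, 1e-5
lr_slope = (LRAC(Q0 + h) - LRAC(Q0 - h)) / (2 * h)      # d LRAC / dQ
k_opt = ks[np.argmin(SRAC(Q0, ks))]                     # ~ Q0
sr_slope = (SRAC(Q0 + h, k_opt) - SRAC(Q0 - h, k_opt)) / (2 * h)
print(lr_slope, sr_slope)                               # both ~ 2
```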

The envelope theorem is a general statement of the result of which this is a special case. We will consider not only cases in which Q and k are vectors, but also cases in which the maximisation or minimisation problem includes some constraints.

Let us consider again the maximisation problem (20). Recall:
\[
\max_{x_1,\dots,x_n} f(x_1,\dots,x_n,a_1,\dots,a_k)
\quad\text{subject to}\quad
\begin{aligned}
g_1(x_1,\dots,x_n,a_1,\dots,a_k) &= c_1\\
&\;\;\vdots\\
g_m(x_1,\dots,x_n,a_1,\dots,a_k) &= c_m
\end{aligned}
\]

Again let L(x1, . . . , xn, λ1, . . . , λm; a1, . . . , ak) be the Lagrangian function:
\[
(24)\qquad \mathcal{L}(x_1,\dots,x_n,\lambda_1,\dots,\lambda_m;a_1,\dots,a_k) = f(x_1,\dots,x_n,a_1,\dots,a_k) + \sum_{j=1}^{m}\lambda_j\bigl(c_j - g_j(x_1,\dots,x_n,a_1,\dots,a_k)\bigr).
\]

Let (x∗1(a1, . . . , ak), . . . , x∗n(a1, . . . , ak)) and (λ1(a1, . . . , ak), . . . , λm(a1, . . . , ak)) be the values of x and λ that solve this problem. Now let
\[
(25)\qquad v(a_1,\dots,a_k) = f(x_1^*(a_1,\dots,a_k),\dots,x_n^*(a_1,\dots,a_k),a_1,\dots,a_k).
\]
That is, v(a1, . . . , ak) is the maximised value of the function f when the parameters are (a1, . . . , ak). The envelope theorem says that the derivative of v is equal to the derivative of L at the maximising values of x and λ. Or, more precisely:


Theorem 10 (The Envelope Theorem). If all functions are defined as above and the problem is such that the functions x∗ and λ are well defined, then
\[
\begin{aligned}
\frac{\partial v}{\partial a_h}(a_1,\dots,a_k) &= \frac{\partial\mathcal{L}}{\partial a_h}\bigl(x_1^*(a_1,\dots,a_k),\dots,x_n^*(a_1,\dots,a_k),\lambda_1(a_1,\dots,a_k),\dots,\lambda_m(a_1,\dots,a_k),a_1,\dots,a_k\bigr)\\
&= \frac{\partial f}{\partial a_h}\bigl(x_1^*(a_1,\dots,a_k),\dots,x_n^*(a_1,\dots,a_k),a_1,\dots,a_k\bigr)\\
&\qquad - \sum_{j=1}^{m}\lambda_j(a_1,\dots,a_k)\,\frac{\partial g_j}{\partial a_h}\bigl(x_1^*(a_1,\dots,a_k),\dots,x_n^*(a_1,\dots,a_k),a_1,\dots,a_k\bigr)
\end{aligned}
\]
for all h.

In order to show the advantages of using matrix and vector notation we shall restate the theorem in that notation before returning to give a proof of the theorem. (In proving the theorem we shall return to using mainly scalar notation.)

Theorem 10 (The Envelope Theorem). Under the same conditions as above
\[
\frac{\partial v}{\partial a}(a) = \frac{\partial\mathcal{L}}{\partial a}(x^*(a),\lambda(a),a) = \frac{\partial f}{\partial a}(x^*(a),a) - \lambda(a)\,\frac{\partial g}{\partial a}(x^*(a),a).
\]

Proof. From the definition of the function v we have
\[
(26)\qquad v(a_1,\dots,a_k) = f(x_1^*(a_1,\dots,a_k),\dots,x_n^*(a_1,\dots,a_k),a_1,\dots,a_k).
\]

Thus
\[
(27)\qquad \frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a),a) + \sum_{i=1}^{n}\frac{\partial f}{\partial x_i}(x^*(a),a)\,\frac{\partial x_i^*}{\partial a_h}(a).
\]

Now, from the first order conditions (12) we have
\[
\frac{\partial f}{\partial x_i}(x^*(a),a) - \sum_{j=1}^{m}\lambda_j(a)\,\frac{\partial g_j}{\partial x_i}(x^*(a),a) = 0,
\]

or
\[
(28)\qquad \frac{\partial f}{\partial x_i}(x^*(a),a) = \sum_{j=1}^{m}\lambda_j(a)\,\frac{\partial g_j}{\partial x_i}(x^*(a),a).
\]

Also, since x∗(a) satisfies the constraints we have, for each j,
\[
g_j(x_1^*(a),\dots,x_n^*(a),a_1,\dots,a_k) \equiv c_j.
\]

And, since this holds as an identity, we may differentiate both sides with respect to ah, giving
\[
\sum_{i=1}^{n}\frac{\partial g_j}{\partial x_i}(x^*(a),a)\,\frac{\partial x_i^*}{\partial a_h}(a) + \frac{\partial g_j}{\partial a_h}(x^*(a),a) = 0,
\]

or
\[
(29)\qquad \sum_{i=1}^{n}\frac{\partial g_j}{\partial x_i}(x^*(a),a)\,\frac{\partial x_i^*}{\partial a_h}(a) = -\frac{\partial g_j}{\partial a_h}(x^*(a),a).
\]

Substituting (28) into (27) gives
\[
\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a),a) + \sum_{i=1}^{n}\Bigl[\sum_{j=1}^{m}\lambda_j(a)\,\frac{\partial g_j}{\partial x_i}(x^*(a),a)\Bigr]\frac{\partial x_i^*}{\partial a_h}(a).
\]


Changing the order of summation gives
\[
(30)\qquad \frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a),a) + \sum_{j=1}^{m}\lambda_j(a)\Bigl[\sum_{i=1}^{n}\frac{\partial g_j}{\partial x_i}(x^*(a),a)\,\frac{\partial x_i^*}{\partial a_h}(a)\Bigr].
\]

And now substituting (29) into (30) gives
\[
\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a),a) - \sum_{j=1}^{m}\lambda_j(a)\,\frac{\partial g_j}{\partial a_h}(x^*(a),a),
\]
which is the required result. ∎
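The result is easy to verify numerically on a toy problem (ours, not the text's): maximise −(x₁ − a)² − (x₂ − a)² subject to x₁ + x₂ = 1. The maximiser is x₁ = x₂ = 1/2 for every a, so dv/da should equal ∂f/∂a evaluated there, namely 4(1/2 − a).

```python
import numpy as np

# Envelope-theorem check on a toy problem:
#   max -(x1 - a)^2 - (x2 - a)^2   subject to   x1 + x2 = 1.
# Substituting x2 = 1 - x1 lets us maximise over x1 alone.
f = lambda x1, x2, a: -(x1 - a) ** 2 - (x2 - a) ** 2

x1s = np.linspace(-2.0, 3.0, 500001)

def solve(a):
    vals = f(x1s, 1.0 - x1s, a)
    i = np.argmax(vals)
    return vals[i], x1s[i]          # v(a), x1*(a)

a0, h = 0.8, 1e-5
dv = (solve(a0 + h)[0] - solve(a0 - h)[0]) / (2 * h)   # dv/da, numerically

v0, x1 = solve(a0)
x2 = 1.0 - x1
df_da = 2 * (x1 - a0) + 2 * (x2 - a0)   # dL/da = df/da at the maximiser
print(dv, df_da)                        # the two numbers should agree
```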

Exercises.

Exercise 78. Rewrite this proof using matrix notation. Go through your proof and identify the dimension of each of the vectors or matrices you use. For example, fx is a 1 × n vector and gx is an m × n matrix.

5. Applications to Microeconomic Theory

5.1. Utility Maximisation. Let us again consider the problem given in (31):
\[
\max_{x_1,x_2} u(x_1,x_2) \quad\text{subject to}\quad p_1x_1 + p_2x_2 - y = 0.
\]

Let v(p1, p2, y) be the maximised value of u when prices and income are p1, p2, and y. Let us consider the effect of a change in y with p1 and p2 remaining constant. By the Envelope Theorem
\[
\frac{\partial v}{\partial y} = \frac{\partial}{\partial y}\bigl\{u(x_1,x_2) + \lambda(y - p_1x_1 - p_2x_2)\bigr\} = 0 + \lambda\cdot 1 = \lambda.
\]
This is the familiar result that λ is the marginal utility of income.

5.2. Expenditure Minimisation. Let us consider the problem of minimising expenditure subject to attaining a given level of utility, i.e.,
\[
\min_{x_1,\dots,x_n} \sum_{i=1}^{n} p_i x_i \quad\text{subject to}\quad u(x_1,\dots,x_n) - u^0 = 0.
\]
Let the minimised value of expenditure be denoted by e(p1, . . . , pn, u0). Then by the Envelope Theorem we obtain
\[
\frac{\partial e}{\partial p_i} = \frac{\partial}{\partial p_i}\Bigl\{\sum_{i=1}^{n} p_i x_i + \lambda\bigl(u^0 - u(x_1,\dots,x_n)\bigr)\Bigr\} = x_i - \lambda\cdot 0 = x_i
\]
when evaluated at the point which solves the minimisation problem, which we write as hi(p1, . . . , pn, u0) to distinguish this (compensated) value of the demand for good i as a function of prices and utility from the (uncompensated) value of the demand for good i as a function of prices and income. This result is known as Hotelling’s Theorem.


5.3. The Hicks-Slutsky Equations. It can be shown that the compensated demand at utility u0, i.e., hi(p1, . . . , pn, u0), is equal to the uncompensated demand at income e(p1, . . . , pn, u0), i.e., xi(p1, . . . , pn, e(p1, . . . , pn, u0)). (This result is known as the duality theorem.) Thus totally differentiating the identity
\[
x_i(p_1,\dots,p_n,e(p_1,\dots,p_n,u^0)) \equiv h_i(p_1,\dots,p_n,u^0)
\]
with respect to pk we obtain
\[
\frac{\partial x_i}{\partial p_k} + \frac{\partial x_i}{\partial y}\frac{\partial e}{\partial p_k} = \frac{\partial h_i}{\partial p_k},
\]
which by Hotelling’s Theorem gives
\[
\frac{\partial x_i}{\partial p_k} + \frac{\partial x_i}{\partial y}h_k = \frac{\partial h_i}{\partial p_k}.
\]
So
\[
\frac{\partial x_i}{\partial p_k} = \frac{\partial h_i}{\partial p_k} - h_k\frac{\partial x_i}{\partial y}
\]
for all i, k = 1, . . . , n. These are the Hicks-Slutsky equations.

5.4. The Indirect Utility Function. Again let v(p1, . . . , pn, y) be the indirect utility function, that is, the maximised value of utility as described in Section 5.1. Then by the Envelope Theorem
\[
\frac{\partial v}{\partial p_i} = \frac{\partial u}{\partial p_i} - \lambda x_i(p_1,\dots,p_n,y) = -\lambda x_i(p_1,\dots,p_n,y),
\]
since ∂u/∂pi = 0. Now, since we have already shown that λ = ∂v/∂y (in Section 5.1) we have
\[
x_i(p_1,\dots,p_n,y) = -\frac{\partial v/\partial p_i}{\partial v/\partial y}.
\]
This is known as Roy’s Theorem.

5.5. Profit functions. Now consider the problem of a firm that maximises profits subject to technology constraints. Let x = (x1, . . . , xn) be a vector of netputs, i.e., xi is positive if the firm is a net supplier of good i, negative if the firm is a net user of that good. Let us assume that we can write the technology constraints as F(x) = 0. Thus the firm’s problem is
\[
\max_{x_1,\dots,x_n} \sum_{i=1}^{n} p_i x_i \quad\text{subject to}\quad F(x_1,\dots,x_n) = 0.
\]
Let ϕi(p) be the value of xi that solves this problem, i.e., the net supply of commodity i when prices are p. (Here p is a vector.) We call the maximised value the profit function, which is given by
\[
\Pi(p) = \sum_{i=1}^{n} p_i\varphi_i(p),
\]
and so by the Envelope Theorem
\[
\frac{\partial\Pi}{\partial p_i} = \varphi_i(p).
\]
This result is known as Hotelling’s lemma.
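Hotelling's lemma can be checked numerically on a hypothetical one-input, one-output firm (our example, not the text's): with production y = √z, the profit function is Π(p, w) = max over z of p√z − wz = p²/(4w), and the net supply of output is y∗(p, w) = p/(2w), which should equal ∂Π/∂p.

```python
import numpy as np

# Hypothetical firm producing y = sqrt(z) from input z:
# Pi(p, w) = max_z p*sqrt(z) - w*z = p^2/(4w), with y*(p, w) = p/(2w).
zs = np.linspace(0.0, 50.0, 1000001)     # grid of input levels

def profit(p, w):
    vals = p * np.sqrt(zs) - w * zs
    i = np.argmax(vals)
    return vals[i], np.sqrt(zs[i])       # Pi(p, w), y*(p, w)

p0, w0, h = 4.0, 1.0, 1e-4
dPi_dp = (profit(p0 + h, w0)[0] - profit(p0 - h, w0)[0]) / (2 * h)
y_star = profit(p0, w0)[1]
print(dPi_dp, y_star)                    # both ~ p0/(2*w0) = 2
```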


5.6. Cobb-Douglas Example. We consider a particular Cobb-Douglas example of the utility maximisation problem:
\[
(31)\qquad \max_{x_1,x_2} \sqrt{x_1}\sqrt{x_2} \quad\text{subject to}\quad p_1x_1 + p_2x_2 = w.
\]

The Lagrangean is
\[
(32)\qquad \mathcal{L}(x_1,x_2,\lambda) = \sqrt{x_1}\sqrt{x_2} + \lambda(w - p_1x_1 - p_2x_2)
\]
and the first order conditions are
\[
(33)\qquad \frac{\partial\mathcal{L}}{\partial x_1} = \tfrac12 x_1^{-\frac12}x_2^{\frac12} - p_1\lambda = 0
\]
\[
(34)\qquad \frac{\partial\mathcal{L}}{\partial x_2} = \tfrac12 x_1^{\frac12}x_2^{-\frac12} - p_2\lambda = 0
\]
\[
(35)\qquad \frac{\partial\mathcal{L}}{\partial\lambda} = w - p_1x_1 - p_2x_2 = 0.
\]

If we divide equation (33) by equation (34) we obtain
\[
x_1^{-1}x_2 = p_1/p_2,
\]
or
\[
p_1x_1 = p_2x_2,
\]
and if we substitute this into equation (35) we obtain
\[
w - p_1x_1 - p_1x_1 = 0,
\]
or
\[
(36)\qquad x_1 = \frac{w}{2p_1}.
\]
Similarly,
\[
(37)\qquad x_2 = \frac{w}{2p_2}.
\]

Substituting equations (36) and (37) into the utility function gives
\[
(38)\qquad v(p_1,p_2,w) = \sqrt{\frac{w^2}{4p_1p_2}} = \frac{w}{2\sqrt{p_1p_2}}.
\]

As a check here we can verify some known properties of the indirect utility function. For example, it is homogeneous of degree zero; that is, if we multiply p1, p2, and w by the same positive constant, say α, we do not change the value of v. You should confirm that this is the case.
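The suggested confirmation takes a few lines of code (a sketch of the check, with arbitrary numerical values):

```python
# Confirm that v(p1, p2, w) = w / (2 * sqrt(p1*p2)) is homogeneous of
# degree zero: scaling all prices and income by alpha leaves v unchanged.
v = lambda p1, p2, w: w / (2 * (p1 * p2) ** 0.5)

p1, p2, w = 2.0, 7.0, 25.0
for alpha in (0.5, 2.0, 10.0):
    assert abs(v(alpha * p1, alpha * p2, alpha * w) - v(p1, p2, w)) < 1e-9
print("v is homogeneous of degree zero at the sampled points")
```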

We now calculate the optimal value of λ from the first order conditions by substituting equations (36) and (37) into (33), giving
\[
\frac12\left(\frac{w}{2p_1}\right)^{-\frac12}\left(\frac{w}{2p_2}\right)^{\frac12} - p_1\lambda = 0,
\]
or
\[
\frac12\sqrt{\frac{2p_1}{w}}\sqrt{\frac{w}{2p_2}} = p_1\lambda,
\]
or
\[
\frac12\,\frac{\sqrt{p_1}}{\sqrt{p_2}}\cdot\frac{1}{p_1} = \lambda,
\]
or
\[
\lambda = \frac{1}{2\sqrt{p_1p_2}}.
\]


Our first application of the Envelope Theorem told us that this value of λ could be found as the derivative of the indirect utility function with respect to w. We confirm this by differentiating the function we found above with respect to w:
\[
\frac{\partial v}{\partial w} = \frac{\partial}{\partial w}\,\frac{w}{2\sqrt{p_1p_2}} = \frac{1}{2\sqrt{p_1p_2}},
\]
as we had found directly above.

Now let us, for the same utility function, consider the expenditure minimisation problem
\[
\min_{x_1,x_2}\; p_1x_1 + p_2x_2 \quad\text{subject to}\quad \sqrt{x_1}\sqrt{x_2} = u.
\]

The Lagrangian is
\[
(39)\qquad \mathcal{L}(x_1,x_2,\lambda) = p_1x_1 + p_2x_2 + \lambda(u - \sqrt{x_1}\sqrt{x_2})
\]
and the first order conditions are
\[
(40)\qquad \frac{\partial\mathcal{L}}{\partial x_1} = p_1 - \lambda\tfrac12 x_1^{-\frac12}x_2^{\frac12} = 0
\]
\[
(41)\qquad \frac{\partial\mathcal{L}}{\partial x_2} = p_2 - \lambda\tfrac12 x_1^{\frac12}x_2^{-\frac12} = 0
\]
\[
(42)\qquad \frac{\partial\mathcal{L}}{\partial\lambda} = u - \sqrt{x_1}\sqrt{x_2} = 0.
\]

Dividing equation (40) by equation (41) gives
\[
\frac{p_1}{p_2} = \frac{x_2}{x_1},
\]
or
\[
(43)\qquad x_2 = \frac{p_1x_1}{p_2}.
\]

And, if we substitute equation (43) into equation (42) we obtain
\[
u - x_1\sqrt{\frac{p_1}{p_2}} = 0,
\]
or
\[
x_1 = u\sqrt{\frac{p_2}{p_1}}.
\]

Similarly,
\[
x_2 = u\sqrt{\frac{p_1}{p_2}},
\]
and if we substitute these values back into the objective function we obtain the expenditure function
\[
e(p_1,p_2,u) = p_1u\sqrt{\frac{p_2}{p_1}} + p_2u\sqrt{\frac{p_1}{p_2}} = 2u\sqrt{p_1p_2}.
\]

Hotelling’s Theorem tells us that if we differentiate this expenditure function with respect to pi we should obtain the Hicksian demand function hi:
\[
\frac{\partial e(p_1,p_2,u)}{\partial p_1} = \frac{\partial}{\partial p_1}\,2u\sqrt{p_1p_2} = 2u\cdot\frac12\sqrt{\frac{p_2}{p_1}} = u\sqrt{\frac{p_2}{p_1}},
\]
as we had already found. And similarly for h2.


Let us summarise what we have found so far. The Marshallian demand functions are
\[
x_1(p_1,p_2,w) = \frac{w}{2p_1}, \qquad x_2(p_1,p_2,w) = \frac{w}{2p_2}.
\]
The indirect utility function is
\[
v(p_1,p_2,w) = \frac{w}{2\sqrt{p_1p_2}}.
\]
The Hicksian demand functions are
\[
h_1(p_1,p_2,u) = u\sqrt{\frac{p_2}{p_1}}, \qquad h_2(p_1,p_2,u) = u\sqrt{\frac{p_1}{p_2}},
\]
and the expenditure function is
\[
e(p_1,p_2,u) = 2u\sqrt{p_1p_2}.
\]

We now look at the third application, concerning the Hicks-Slutsky decomposition. First let us confirm that if we substitute the expenditure function for w in the Marshallian demand function we do obtain the Hicksian demand function:
\[
x_1(p_1,p_2,e(p_1,p_2,u)) = \frac{e(p_1,p_2,u)}{2p_1} = \frac{2u\sqrt{p_1p_2}}{2p_1} = u\sqrt{\frac{p_2}{p_1}},
\]
as required.

Similarly, if we plug the indirect utility function v into the Hicksian demand function hi we obtain the Marshallian demand function xi. Confirmation of this is left as an exercise. [You should do this exercise. If you understand properly it is very easy. If you understand a bit then doing the exercise will solidify your understanding. If you can’t do it then it is a message to get some further explanation.]

Let us now check the Hicks-Slutsky decomposition for the effect of a change in the price of good 2 on the demand for good 1. The Hicks-Slutsky decomposition tells us that
\[
\frac{\partial x_1}{\partial p_2} = \frac{\partial h_1}{\partial p_2} - h_2\frac{\partial x_1}{\partial w}.
\]

Calculating these partial derivatives we have
\[
\frac{\partial x_1}{\partial p_2} = 0, \qquad
\frac{\partial x_1}{\partial w} = \frac{1}{2p_1}, \qquad
\frac{\partial h_1}{\partial p_2} = \frac{u}{\sqrt{p_1}}\cdot\frac12\cdot\frac{1}{\sqrt{p_2}} = \frac{u}{2\sqrt{p_1p_2}},
\]
and
\[
h_2 = u\sqrt{\frac{p_1}{p_2}}.
\]

Substituting into the right hand side of the Hicks-Slutsky equation above gives
\[
RHS = \frac{u}{2\sqrt{p_1p_2}} - u\sqrt{\frac{p_1}{p_2}}\cdot\frac{1}{2p_1} = 0,
\]
which is exactly what we had found for the left hand side of the Hicks-Slutsky equation.

Finally we check Roy’s Theorem, which tells us that the Marshallian demand for good 1 can be found as
\[
x_1(p_1,p_2,w) = \frac{-\partial v/\partial p_1}{\partial v/\partial w}.
\]

In this case we obtain
\[
x_1(p_1,p_2,w) = \frac{-\frac{w}{2}\cdot\frac{1}{\sqrt{p_2}}\cdot\left(-\frac12\right)p_1^{-\frac32}}{\frac12\sqrt{\frac{1}{p_1p_2}}} = \frac{w}{2p_1},
\]
as required.
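All of the formulas collected in this section can be spot-checked in a few lines. The sketch below is ours; the point p₁ = 2, p₂ = 7, w = 25 echoes the numbers used earlier in the chapter, and derivatives are taken by central finite differences rather than symbolically.

```python
# Numerical spot-check of the Cobb-Douglas formulas derived above.
sqrt = lambda t: t ** 0.5
x1 = lambda p1, p2, w: w / (2 * p1)             # Marshallian demands
x2 = lambda p1, p2, w: w / (2 * p2)
v  = lambda p1, p2, w: w / (2 * sqrt(p1 * p2))  # indirect utility
e  = lambda p1, p2, u: 2 * u * sqrt(p1 * p2)    # expenditure function
h1 = lambda p1, p2, u: u * sqrt(p2 / p1)        # Hicksian demands
h2 = lambda p1, p2, u: u * sqrt(p1 / p2)

p1v, p2v, wv = 2.0, 7.0, 25.0
uv = v(p1v, p2v, wv)

# Duality: e(p, v(p, w)) = w and x_i(p, e(p, u)) = h_i(p, u).
assert abs(e(p1v, p2v, uv) - wv) < 1e-9
assert abs(x1(p1v, p2v, e(p1v, p2v, uv)) - h1(p1v, p2v, uv)) < 1e-9

# Slutsky: dx1/dp2 = dh1/dp2 - h2 * dx1/dw, with dx1/dp2 = 0 here.
d = 1e-6
dh1_dp2 = (h1(p1v, p2v + d, uv) - h1(p1v, p2v - d, uv)) / (2 * d)
dx1_dw  = (x1(p1v, p2v, wv + d) - x1(p1v, p2v, wv - d)) / (2 * d)
assert abs(dh1_dp2 - h2(p1v, p2v, uv) * dx1_dw) < 1e-6
print("duality and Slutsky identities confirmed at this point")
```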

Exercises.

Exercise 79. Consider the direct utility function
\[
u(x) = \sum_{i=1}^{n}\beta_i\log(x_i - \gamma_i),
\]
where βi and γi, i = 1, . . . , n, are, respectively, positive and nonpositive parameters.
(1) Derive the indirect utility function and show that it is decreasing in its arguments.
(2) Verify Roy’s Theorem.
(3) Derive the expenditure function and show that it is homogeneous of degree one and nondecreasing in prices.
(4) Verify Hotelling’s Theorem.

Exercise 80. For the utility function defined in Exercise 79,
(1) Derive the Slutsky equation.
(2) Let di(p, y) be the demand for good i derived from the above utility function. Goods i and j are said to be gross substitutes if ∂di(p, y)/∂pj > 0 and gross complements if ∂di(p, y)/∂pj < 0. For this utility function are the various goods gross substitutes, gross complements, or can we not say?

(The two previous exercises are taken from R. Robert Russell and Maurice Wilkinson, Microeconomics: A Synthesis of Modern and Neoclassical Theory, New York, John Wiley & Sons, 1979.)

Exercise 81. An electric utility has two generating plants in which total costs per hour are c1 and c2 respectively, where
\[
c_1 = 80 + 2x_1 + 0.001b\,x_1^2, \qquad b > 0,
\]
\[
c_2 = 90 + 1.5x_2 + 0.002x_2^2,
\]
and where xi is the quantity generated in the i-th plant. If the utility is required to produce 2000 megawatts in a particular hour, how should it allocate this load between the plants so as to minimise costs? Use the Lagrangian method and interpret the multiplier. How do total costs vary as b changes? (That is, what is the derivative of the minimised cost with respect to b?)


CHAPTER 4

Topics in Convex Analysis

1. Convexity

Convexity is one of the most important mathematical properties in economics. For example, without convexity of preferences, demand and supply functions are not continuous, and so competitive markets generally do not have equilibrium points. The economic interpretation of convex preference sets in consumer theory is diminishing marginal rates of substitution; the interpretation of convex production sets is constant or decreasing returns to scale. Considerably less is known about general equilibrium models that allow non-convex production sets (e.g., economies of scale) or non-convex preferences (e.g., the consumer prefers a pint of beer or a shot of vodka alone to any mixture of the two).

Another set of mathematical results closely connected to the notion of convexity is the so-called separation and support theorems. These theorems are frequently used in economics to obtain a price system that leads consumers and producers to choose a Pareto-efficient allocation. That is, given the prices, producers are maximizing profits, and given those profits as income, consumers are maximizing utility subject to their budget constraints.

1.1. Convex Sets. Given two points x, y ∈ Rn, a point z = ax + (1 − a)y, where 0 ≤ a ≤ 1, is called a convex combination of x and y.

The set of all possible convex combinations of x and y, denoted by [x, y], is called the interval with endpoints x and y (or the line segment connecting x and y):

[x, y] = {ax + (1 − a)y : 0 ≤ a ≤ 1}.

Definition 18. A set S ⊆ Rn is convex iff for any x, y ∈ S the interval [x, y] ⊆ S.

In words: a set is convex if it contains the line segment connecting any two of its points; or, more loosely speaking, a set is convex if along with any two points it contains all points between them.

Convex sets in R2 include the interiors of triangles, squares, circles, ellipses, and hosts of other sets. Note also that, for example in R3, while the interior of a cube is a convex set, its boundary is not. The quintessential convex set in Euclidean space Rn for any n > 1 is the n-dimensional open ball SR(a) of radius R > 0 about a point a ∈ Rn, given by

SR(a) = {x : x ∈ Rn, |x − a| < R}.

More examples of convex sets:

1. Is the empty set convex? Is a singleton convex? Is Rn convex?

There are also several standard ways of forming convex sets from convex sets:

2. Let A, B ⊆ Rn be sets. The Minkowski sum A + B ⊆ Rn is defined as

A+B = {x+ y : x ∈ A, y ∈ B} .

When B = {b} is a singleton, the set A + b is called a translation of A. Prove that A + B is convex if A and B are convex.
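The claim can be spot-checked numerically before proving it (a check of instances, not a proof). For the intervals A = [0, 1] and B = [2, 5] in R, the Minkowski sum is A + B = [2, 6], and random convex combinations of points of A + B stay inside it:

```python
# Spot-check that A + B = [0, 1] + [2, 5] = [2, 6] behaves like a convex set.
import random

def in_interval(z, lo, hi):
    return lo <= z <= hi

random.seed(0)
for _ in range(1000):
    # two points of A + B, each formed as a sum of a point of A and a point of B
    u = random.uniform(0, 1) + random.uniform(2, 5)
    v = random.uniform(0, 1) + random.uniform(2, 5)
    a = random.random()
    z = a * u + (1 - a) * v       # a convex combination of u and v
    assert in_interval(z, 2, 6)   # it stays in A + B
```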


3. Let A ⊆ Rn be a set and α ∈ R be a number. The scaling αA ⊆ Rn is defined as

αA = {αx : x ∈ A}.

When α > 0, the set αA is called a dilation of A. Prove that αA is convex if A is convex.

4. Prove that the intersection ∩i∈I Si of any number of convex sets is convex.
5. Show by example that the union of convex sets need not be convex.

It is also possible to define a convex combination of an arbitrary (but finite) number of points.

Definition 19. Let x1, ..., xk be a finite set of points from Rn. A point

x = α1x1 + ... + αkxk,

where αi ≥ 0 for i = 1, ..., k and α1 + ... + αk = 1, is called a convex combination of x1, ..., xk.

Note that the definition of a convex combination of two points is a special case of this definition. (Prove it.)
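Definition 19 translates directly into a small helper that also enforces the weight conditions (the function name is illustrative, not from the text):

```python
# A convex combination of finitely many points of R^n (Definition 19),
# with the conditions on the weights checked explicitly.

def convex_combination(points, weights):
    if len(points) != len(weights):
        raise ValueError("one weight per point")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    n = len(points[0])
    return tuple(sum(w * p[i] for w, p in zip(weights, points)) for i in range(n))

# the two-point case recovers a*x + (1 - a)*y
x, y, a = (0.0, 0.0), (4.0, 2.0), 0.25
assert convex_combination([x, y], [a, 1 - a]) == (3.0, 1.5)
```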

Can we generate ‘superconvex’ sets using Definition 19? No, as the following lemma shows.

Lemma 1. A set S ⊆ Rn is convex iff every convex combination of points of S is in S.

Proof. If a set contains all convex combinations of its points it is obviously convex, because in particular it contains the convex combinations of all pairs of its points. Thus, we need to show that a convex set contains every convex combination of its points. The proof is by induction on the number of points in the combination. By definition, a convex set contains all convex combinations of any two of its points. Suppose that S contains every convex combination of n or fewer of its points, and consider a combination of n + 1 points, x = α1x1 + ... + αn+1xn+1. Since the αi sum to 1, not all of them equal 1, and we can relabel so that αn+1 < 1. Then

x = (1 − αn+1)(α1/(1 − αn+1) x1 + ... + αn/(1 − αn+1) xn) + αn+1xn+1
  = (1 − αn+1)y + αn+1xn+1.

The coefficients αi/(1 − αn+1), i = 1, ..., n, are non-negative and sum to 1, so y ∈ S by the induction hypothesis (as a convex combination of n points of S) and, as a result, so is x, being a convex combination of the two points y and xn+1 of S. □

But, using Definition 19, we can generate convex sets from non-convex sets! This operation is very useful, so the resulting set deserves a special name.

Definition 20. Given a set S ⊆ Rn, the set of all convex combinations of points from S, denoted convS, is called the convex hull of S.

Note: convince yourself that the adjective ‘convex’ in the term ‘convex hull’ is well-deserved by proving that the convex hull is indeed convex! Now Lemma 1 can be written more succinctly: S = convS iff S is convex.

1.2. Convex Hulls. The next theorem deals with the following interesting property of convex hulls: the convex hull of a set S is the intersection of all convex sets containing S. Thus, in a natural sense, the convex hull of a set S is the ‘smallest’ convex set containing S. In fact, many authors define convex hulls in that way and then prove our Definition 20 as a theorem.


Theorem 11. Let S ⊆ Rn be a set. Then any convex set containing S also contains convS.

Proof. Let A be a convex set such that S ⊆ A. By Lemma 1, A contains all convex combinations of its points and, in particular, all convex combinations of points of its subset S; the set of the latter is exactly convS. □

The next property is quite obvious and, again, frustrates attempts to generate ‘superconvex’ sets, this time by trying to take convex hulls of convex hulls.

1. Prove that conv convS = convS for any S.
2. Prove that if A ⊂ B then convA ⊂ convB.

The next property relates the operation of taking convex hulls to that of taking Minkowski sums. It does not matter in which order you apply these operations.

3. Prove that conv(A + B) = (convA) + (convB).
4. Prove that conv(A ∩ B) ⊆ (convA) ∩ (convB).
5. Prove that (convA) ∪ (convB) ⊆ conv(A ∪ B).

1.3. Caratheodory’s Theorem. Definition 20 implies that any point x in the convex hull of S is representable as a convex combination of finitely many points of S, but it places no restrictions on the number of points of S required to make the combination. Caratheodory’s Theorem puts an upper bound on the number of points required: in Rn the number of points never has to be more than n + 1.

Theorem 12 (Caratheodory, 1907). Let S ⊆ Rn be a non-empty set. Then every x ∈ convS can be represented as a convex combination of (at most) n + 1 points from S.

Note that the theorem does not ‘identify’ the points used in the representation; their choice depends on x.

Show by example that the constant n + 1 in Caratheodory’s theorem cannot be improved. That is, exhibit a set S ⊆ Rn and a point x ∈ convS that cannot be represented as a convex combination of fewer than n + 1 points from S.
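In R2 the theorem says three points always suffice. For a finite set S this can be verified by brute force: test every triple of points for containing x, using barycentric coordinates. The sketch below is illustrative only (the helper names are mine, and the search over triples is not efficient in general):

```python
# Caratheodory in R^2: any x in conv S is a convex combination of at most
# three points of S. Brute-force search over triples of a finite S.
from itertools import combinations

def barycentric(p, a, b, c):
    # solve p = u*a + v*b + w*c with u + v + w = 1; None if a, b, c are collinear
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if abs(det) < 1e-12:
        return None
    u = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    v = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return u, v, 1 - u - v

def caratheodory_triple(p, S):
    for a, b, c in combinations(S, 3):
        coords = barycentric(p, a, b, c)
        if coords is not None and all(t >= -1e-9 for t in coords):
            return (a, b, c), coords   # p lies in the triangle a, b, c
    return None

S = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]
triple, coords = caratheodory_triple((1.0, 1.0), S)
assert abs(sum(coords) - 1) < 1e-9 and all(t >= -1e-9 for t in coords)
```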

1.4. Polytopes. The simplest convex sets are those which are convex hulls of a finite set of points, that is, sets of the form S = conv{x1, x2, ..., xm}. The convex hull of a finite set of points in Rn is called a polytope.

1. Prove that the set

∆ = {x ∈ Rn+1 : x1 + ... + xn+1 = 1 and xi ≥ 0 for any i}

is a polytope. This polytope is called the standard n-dimensional simplex.

2. Prove that the set

C = {x ∈ Rn : 0 ≤ xi ≤ 1 for any i}

is a polytope. This polytope is called an n-dimensional cube.

3. Prove that the set

O = {x ∈ Rn : |x1| + ... + |xn| ≤ 1}

is a polytope. This polytope is called a (hyper)octahedron.

1.5. Topology of Convex Sets.
(1) The closure of a convex set is a convex set.
(2) The interior of a convex set (possibly empty) is convex.


1.6. Aside: Helly’s Theorem. While there are not many applications of Helly’s theorem in economics (in fact, I am aware of only one paper that uses Helly’s theorem in an economic context), it is definitely one of the most famous results in convexity.

Theorem 13 (Helly, 1913). Let A1, A2, ..., Am ⊆ Rn be a finite family of convex sets with m ≥ n + 1. Suppose that every n + 1 of the sets have a nonempty intersection. Then all the sets have a nonempty intersection.

To prove Helly’s theorem with elegance we first need to formulate a very useful result obtained by J. Radon.

Theorem 14 (Radon, 1921). Let S ⊆ Rn be a set of at least n + 2 points. Then there are two non-intersecting subsets R ⊂ S (‘red points’) and B ⊂ S (‘blue points’) such that

convR ∩ convB ≠ ∅.

Proof. Let x1, ..., xm be m ≥ n + 2 distinct points from S. Consider the system of n + 1 homogeneous linear equations in the variables γ1, ..., γm:

γ1x1 + ... + γmxm = 0 and γ1 + ... + γm = 0.

(The first, vector, equation stands for n scalar equations.) Since m ≥ n + 2, there are more unknowns than equations, so the system has a nontrivial solution. Let

R = {xi : γi > 0} and B = {xi : γi < 0}.

Then R ∩ B = ∅. Let β = ∑i:γi>0 γi. Then β > 0 and ∑i:γi<0 γi = −β, since the γ’s sum to zero. Moreover,

∑i:γi>0 γixi = ∑i:γi<0 (−γi)xi

since ∑ γixi = 0. Let

x = ∑i:γi>0 (γi/β) xi = ∑i:γi<0 (−γi/β) xi.

Thus x is a convex combination of points from R and a convex combination of points from B. □

Example: For any set of four points in the plane (R2), either one of the points lies within the convex hull of the other three, or the points can be split into two pairs whose convex hulls (line segments) intersect. Both cases illustrate Radon’s theorem.
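For the four vertices of the unit square the construction in the proof can be verified directly. The nontrivial solution γ = (1, −1, −1, 1) below is hand-picked, not computed; it satisfies both homogeneous equations, and both colour classes yield the same point (1/2, 1/2):

```python
# Radon's construction for the vertices of the unit square in R^2,
# with the hand-picked nontrivial solution gamma = (1, -1, -1, 1).

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
gamma = [1.0, -1.0, -1.0, 1.0]

# both homogeneous equations hold: sum gamma_i * x_i = 0 and sum gamma_i = 0
assert sum(gamma) == 0
assert all(abs(sum(g * p[k] for g, p in zip(gamma, pts))) < 1e-12 for k in range(2))

beta = sum(g for g in gamma if g > 0)
red = [(g / beta, p) for g, p in zip(gamma, pts) if g > 0]    # gamma_i > 0
blue = [(-g / beta, p) for g, p in zip(gamma, pts) if g < 0]  # gamma_i < 0

# the same point is a convex combination of the red and of the blue points
x_red = tuple(sum(w * p[k] for w, p in red) for k in range(2))
x_blue = tuple(sum(w * p[k] for w, p in blue) for k in range(2))
assert x_red == x_blue == (0.5, 0.5)
```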

Proof of Helly’s theorem. (Due to Radon, 1921.) The proof is by induction on the number of sets, m, starting with m = n + 1, where there is nothing to prove. Suppose that m > n + 1. Then, by the induction hypothesis, for every i = 1, ..., m there is a point pi in the intersection A1 ∩ ... ∩ Ai−1 ∩ Ai+1 ∩ ... ∩ Am (Ai is omitted). Altogether there are m > n + 1 points, each of which belongs to all the sets except perhaps Ai.

If two of the points are the same then that point belongs to all the Ai’s. Otherwise, by Radon’s Theorem, there are non-intersecting subsets R = {pi : i ∈ I} and B = {pj : j ∈ J} such that there is a point p ∈ convR ∩ convB. We claim that p is a common point of all the Ai’s. All the points of R belong to the sets Ai with i ∉ I, and all the points of B belong to the sets Aj with j ∉ J. Since the Ai’s are convex, every point of convR belongs to Ai for i ∉ I and every point of convB belongs to Aj for j ∉ J. Therefore

p ∈ ∩i∉I Ai and p ∈ ∩j∉J Aj.

Since I ∩ J = ∅, every index lies outside I or outside J, so p ∈ A1 ∩ ... ∩ Am. □


2. Support and Separation

2.1. Hyperplanes. The concept of a hyperplane in Rn is a straightforward generalisation of the notion of a line in R2 and of a plane in R3. A line in R2 can be described by an equation

p1x1 + p2x2 = α

where p = (p1, p2) is some non-zero vector and α is some scalar. A plane in R3 can be described by an equation

p1x1 + p2x2 + p3x3 = α

where p = (p1, p2, p3) is some non-zero vector and α is some scalar. Similarly, a hyperplane in Rn can be described by an equation

p1x1 + ... + pnxn = α

where p = (p1, p2, ..., pn) is some non-zero vector in Rn and α is some scalar. It can be written in a more concise way using scalar (also known as inner, or dot) product notation.

Definition 21. A hyperplane is the set

H(p, α) = {x ∈ Rn : p · x = α}

where p ∈ Rn is a non-zero vector and α is a scalar. The vector p is called the normal to the hyperplane H.

Suppose that there are two points x∗, y∗ ∈ H(p, α). Then by definition p · x∗ = α and p · y∗ = α. Hence p · (x∗ − y∗) = 0. In other words, the vector p is orthogonal to the vector x∗ − y∗, and hence to H(p, α).

Given a hyperplane H ⊂ Rn, points in Rn can be classified according to their positions relative to the hyperplane. A (closed) half-space determined by the hyperplane H(p, α) is either the set of points ‘below’ H or the set of points ‘above’ H, i.e., either the set {x ∈ Rn : p · x ≤ α} or the set {x ∈ Rn : p · x ≥ α}. Open half-spaces are defined by strict inequalities. Prove that a closed half-space is closed and an open half-space is open.

A straightforward economic example of a half-space is the budget set {x ∈ Rn : p · x ≤ α} of a consumer with income α facing the vector of prices p. (It was rather neat to call the normal vector p, wasn’t it?) By the way, hyperplanes and half-spaces are convex sets (Prove it).
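As a quick illustration, membership in the budget half-space is a single inner product (the prices and income below are made up):

```python
# Is a bundle x in the budget set {x : p . x <= alpha}?

def affordable(bundle, prices, income):
    return sum(pi * xi for pi, xi in zip(prices, bundle)) <= income

p, alpha = (2.0, 5.0), 100.0
assert affordable((10, 15), p, alpha)       # 2*10 + 5*15 = 95 <= 100
assert not affordable((10, 17), p, alpha)   # 2*10 + 5*17 = 105 > 100
```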

2.2. Support Functions. In this section we give a description of what is called a dual structure. Consider the set of all closed convex subsets of Rn. We will show that to each such set S we can associate an extended-real valued function µS : Rn → R ∪ {−∞}, that is, a function that maps each vector in Rn to either a real number or to −∞. Not all such functions can be arrived at in this way. In fact we shall show that any such function must be concave and homogeneous of degree 1. But once we restrict attention to functions that can be arrived at as a “support function” for some such closed convex set, we have another set of objects that we can analyse, and perhaps make useful arguments about the original sets in which we were interested.

In fact, we shall define the function µS for any subset of Rn, not just the closed and convex ones. However, if the original set S is not closed and convex we shall lose some information about S in going to µS. In particular, µS depends only on the closed convex hull of S; that is, if two sets have the same closed convex hull they will lead to the same function µS.

We define µS : Rn → R ∪ {−∞} as

µS(p) = inf{p · x | x ∈ S},


where inf denotes the infimum or greatest lower bound. It is a property of the real numbers that any non-empty set of real numbers that is bounded below has an infimum; if the set is unbounded below we take the infimum to be −∞. Thus µS(p) is well defined for any non-empty set S. If the minimum exists, for example if the set S is compact, then the infimum is the minimum. In other cases the minimum may not exist. To take a simple one-dimensional example, suppose that the set S is the subset of R consisting of the numbers 1/n for n = 1, 2, ..., and that p = 2. Then clearly p · x = px does not have a minimum on the set S. However, 0 is less than px = 2x for every x in S, but for any number a greater than 0 there is a value of x in S such that px < a. Thus 0 is in this case the infimum of the set {p · x | x ∈ S}.
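When S is finite the infimum is attained, so µS can be computed by direct enumeration. A sketch for the four vertices of the unit square; since µS depends only on the closed convex hull, these are also the values of the support function of the solid square:

```python
# Support function mu_S(p) = inf{p . x : x in S} for a finite S,
# where the infimum is simply a minimum.

def mu(S, p):
    return min(sum(pi * xi for pi, xi in zip(p, x)) for x in S)

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
assert mu(square, (1, 1)) == 0     # minimised at the vertex (0, 0)
assert mu(square, (-1, 0)) == -1   # minimised where x1 = 1
assert mu(square, (1, -2)) == -2   # minimised at the vertex (0, 1)
```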

Recall that we have not assumed that S is convex. However, if we do assume that S is both convex and closed then the function µS contains all the information needed to reconstruct S.

Given any extended-real valued function µ : Rn → R ∪ {−∞}, let us define the set Sµ as

Sµ = {x ∈ Rn | p · x ≥ µ(p) for every p ∈ Rn}.

That is, for each p we define the closed half-space

{x ∈ Rn | p · x ≥ µ(p)}.

Notice that if µ(p) = −∞ then p · x ≥ µ(p) for any x, and so the above set will be Rn rather than a half-space. The set Sµ is the intersection of all these closed half-spaces. Since the intersection of convex sets is convex and the intersection of closed sets is closed, the set Sµ is, for any function µ, a closed convex set.

Suppose that we start with a set S, define µS as above, and then use µS to define the set SµS. If the set S was a closed convex set then SµS will be exactly equal to S. Since we have seen that SµS is a closed convex set, it must be that if S is not a closed convex set it will not be equal to SµS. However, S will always be a subset of SµS, and indeed SµS will be the smallest closed convex set containing S, that is, SµS is the closed convex hull of S.

2.3. Separation. We now consider the notion of ‘separating’ two sets by a hyperplane.

Definition 22. A hyperplane H separates sets A and B if A is contained in one closed half-space and B is contained in the other. A hyperplane H strictly separates sets A and B if A is contained in one open half-space and B is contained in the other.

It is clear that strict separation requires the two sets to be disjoint. For example, consider two (externally) tangent circles in a plane. Their common tangent line separates them but does not separate them strictly. On the other hand, although it is necessary for two sets to be disjoint in order to strictly separate them, this condition is not sufficient, even for closed convex sets. Let A = {x ∈ R2 : x1 > 0 and x1x2 ≥ 1} and B = {x ∈ R2 : x1 ≥ 0 and x2 = 0}; then A and B are disjoint closed convex sets but they cannot be strictly separated by a hyperplane (a line in R2). Thus the problem of the existence of a separating hyperplane is more involved than it may appear to be at first.

We start with separation of a set and a point.

Theorem 15. Let S ⊆ Rn be a convex set and x0 ∉ S a point. Then S and x0 can be separated. If S is closed then S and x0 can be strictly separated.

Idea of proof. The proof proceeds in two steps. The first step establishes the existence of a point a in the closure of S which is closest to x0. The second step constructs the separating hyperplane using the point a.


STEP 1. There exists a point a in the closure of S such that d(x0, a) ≤ d(x0, x) for all x ∈ S, and d(x0, a) > 0.

Let B(x0) be a closed ball with centre x0 that intersects the closure of S, and let A be the (nonempty) intersection of B(x0) with the closure of S. The set A is closed and bounded (hence compact). According to Weierstrass’s theorem, the continuous distance function d(x0, x) achieves its minimum over A at some point a; that is, d(x0, a) ≤ d(x0, x) for all x ∈ S. Note that d(x0, a) > 0.

STEP 2. There exists a hyperplane H(p, α) = {x ∈ Rn : p · x = α} such that p · x ≥ α for all x ∈ S and p · x0 < α.

Construct the hyperplane which goes through the point a and has normal p = a − x0. The proof that this hyperplane is the separating one is by contradiction. Suppose there exists a point y ∈ S which is strictly on the same side of H as x0. Consider the point y′ ∈ [a, y] such that the vector y′ − x0 is orthogonal to y − a. Since d(x0, y) ≥ d(x0, a), the point y′ lies between a and y. Thus y′ belongs to the closure of S (which contains the segment [a, y]) and d(x0, y′) < d(x0, a), which contradicts the choice of a. When S is closed, the separation can be made strict by passing the hyperplane through a point strictly between a and x0 instead of through a. This is always possible because d(x0, a) > 0. □
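Step 2 can be checked numerically in a concrete case: take S to be the closed unit disc and x0 = (3, 0), so the closest point is a = (1, 0). The check below samples points of the disc; it is a verification of one instance, not a proof:

```python
# The separating hyperplane of Theorem 15 for the closed unit disc and
# the outside point x0 = (3, 0): normal p = a - x0, level alpha = p . a.
import math
import random

x0 = (3.0, 0.0)
a = (1.0, 0.0)                      # the point of the disc closest to x0
p = (a[0] - x0[0], a[1] - x0[1])    # p = (-2, 0)
alpha = p[0] * a[0] + p[1] * a[1]   # alpha = -2

assert p[0] * x0[0] + p[1] * x0[1] < alpha   # x0 is strictly on one side

random.seed(1)
for _ in range(1000):
    t, r = random.uniform(0, 2 * math.pi), math.sqrt(random.random())
    x = (r * math.cos(t), r * math.sin(t))   # a random point of the disc
    assert p[0] * x[0] + p[1] * x[1] >= alpha - 1e-12   # S on the other side
```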

Theorem 15 is very useful because separation of a pair of sets can always be reduced to separation of a set and a point.

Lemma 2. Let A and B be non-empty sets. A and B can be separated (strictly separated) iff A − B and 0 can be separated (strictly separated).

Proof. Note that if A and B are convex then A − B is convex; that if A is compact and B is closed then A − B is closed; and that 0 ∉ A − B iff A ∩ B = ∅. □

Theorem 16 (Minkowski, 1911). Let A and B be non-empty convex sets with A ∩ B = ∅. Then A and B can be separated. If A is compact and B is closed then A and B can be strictly separated.

2.4. Support. Closely (though not in the topological sense) related to the notion of a separating hyperplane is the notion of a supporting hyperplane.

Definition 23. The hyperplane H supports the set S at the point x0 ∈ S if x0 ∈ H and S is a subset of one of the half-spaces determined by H.

A convex set can be supported at any of its boundary points; this is an immediate consequence of Theorem 16. To prove it, consider the sets A and B = {x0}, where x0 is a boundary point of A.

Theorem 17. Let S ⊆ Rn be a convex set with nonempty interior and let x0 ∈ S be a boundary point of S. Then there exists a supporting hyperplane for S at x0.

Note that if the boundary of a convex set is smooth (‘differentiable’) at the given point x0 then the supporting hyperplane is unique and is just the tangent hyperplane. If, however, the boundary is not smooth then there can be many supporting hyperplanes passing through the given point. It is important to note that conceptually the supporting theorems are connected to calculus, but they are more powerful (they do not require smoothness), more direct, and more set-theoretic.

Certain points on the boundary of a convex set carry a lot of information about the set.

Definition 24. A point x of a convex set S is an extreme point of S if x isnot an interior point of any line segment in S.


The extreme points of a closed ball in R3 are its boundary points, and the extreme points of a closed cube in R3 are its eight vertices. A half-space has no extreme points even if it is closed.

An interesting property of extreme points is that an extreme point can be deleted from the set without destroying the convexity of the set. That is, a point x in a convex set S is an extreme point iff the set S\{x} is convex.

The next theorem is a finite-dimensional version of a quite general and powerful result by M. G. Krein and D. P. Milman.

Theorem 18 (Krein & Milman, 1940). Let S ⊆ Rn be convex and compact. Then S is the convex hull of its extreme points.
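For the unit square the theorem is completely explicit: the extreme points are the four vertices, and the bilinear weights below express any point of the square as a convex combination of them (a direct verification; the helper name is mine):

```python
# Krein-Milman for the unit square: every (x, y) in [0,1]^2 is a convex
# combination of the four vertices, with explicit bilinear weights.

def as_vertex_combination(x, y):
    # weights on (0,0), (1,0), (0,1), (1,1); non-negative and summing to 1
    return [(1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y]

verts = [(0, 0), (1, 0), (0, 1), (1, 1)]
x, y = 0.3, 0.8
w = as_vertex_combination(x, y)
assert abs(sum(w) - 1) < 1e-9 and all(t >= 0 for t in w)
assert abs(sum(wi * v[0] for wi, v in zip(w, verts)) - x) < 1e-9
assert abs(sum(wi * v[1] for wi, v in zip(w, verts)) - y) < 1e-9
```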