Chapter 3 - Multivariate Calculus∗

Justin Leduc†

These lecture notes are meant to be used by students entering the University of Mannheim

Master program in Economics. They constitute the base for a pre-course in mathematics;

that is, they summarize elementary concepts with which all of our econ grad students must

be familiar. More advanced concepts will be introduced later on in the regular coursework.

A thorough knowledge of these basic notions will be assumed in later

coursework. No prerequisites beyond high school mathematics are required.

Although the wording is my own, the definitions of concepts and the ways to approach them

are strongly inspired by various sources, which are mentioned explicitly in the text or at the

end of the chapter.

∗ This Version: September 9th, 2015

† Center for Doctoral Studies in Economic and Social Sciences. Contact: [email protected]

Contents

1 Introduction

1.1 What are derivatives?

1.2 Why do we use derivatives?

2 Multivariate Functions, Generalized Derivatives

2.1 Partial Derivatives and the Gradient

2.2 Directional Derivatives

2.3 Multivariate Derivative

2.4 Functions Mapping to High-Dimensional Spaces

2.5 Higher Order Partial Derivatives and the Taylor Approximation Theorems

3 Convexity, Concavity, and Multivariate Real-Valued Functions

3.1 Definition

3.2 Useful Characterizations

3.3 Quasi-concavity, Quasi-convexity

A Appendix - Homogeneity

1 Introduction

Wikipedia provides a good explanation of what calculus actually is about:

“Calculus is the mathematical study of change, in the same way that geometry is

the study of shape and algebra is the study of operations and their application

to solving equations. It has two major branches, differential calculus (concerning

rates of change and slopes of curves), and integral calculus (concerning accumulation

of quantities and the areas under and between curves); these two branches

are related to each other by the fundamental theorem of calculus, [...] [which]

states that differentiation and integration are inverse operations.”

In this chapter, we will focus on the first branch of calculus, that is, differential calculus.

Integration is obviously important too, but is technically more involved, and would require a

whole course to carefully cover the very basics of it. If you are not familiar with integration,

it is therefore probably more adequate – for the moment! – to simply look at some cookbook

style lecture notes about integrals and how to perform integration, rather than to dig into

the difficult theory of measure and integration. Although I will not cover this topic here,

please note that knowing how to do basic integration and especially simple integration by

parts will come in handy during the year.¹

1.1 What are derivatives?

In this section, we seek to understand what is precisely meant by “the mathematical study

of change”. We consider the very simple case of real-valued functions of a single variable.

Namely, let f be a function with domain X in R and codomain Y in R, let x̄ ∈ X, and

assume we are asked what the instantaneous rate of change, or slope, of f at x = x̄ is. The

question isn’t particularly easy to answer. Well, a general piece of advice when one doesn’t

know how to solve a mathematical problem is the following: first try to look for a slightly

different problem which you know how to solve. With a bit of luck, answering this second

problem will help you find a solution to the first one. In the present case, for instance, one

could try to answer first the following question: what is the average rate of change of f from

x̄ to x̄ + h, h ∈ R, h ≠ 0? This second question is more familiar to us: we just compute the ratio

of the change in f to the change in x:

$$\frac{\Delta f}{\Delta x} = \frac{f(\bar{x}+h) - f(\bar{x})}{h}$$

Could one use the answer to the second question as an answer for the first question? Well,

in many cases, if h is large, it would be hard to claim that the average rate of change is

a satisfactory answer to the first question. If h is set equal to zero, the situation is even

worse: we end up with the undefined expression 0/0. Second piece of advice when facing a

tough mathematical problem: in low-dimensional spaces, geometry can help us! Let us try

to picture graphically what happens as we shrink h. For instance, in the following figure, we

started with a large h_1 and then considered a smaller h_2:

Figure 1: Average Rate of Change – an Answer to Instantaneous Rate of Change?

¹ A very good website on which you can speed up acquisition of such knowledge is that of Paul Dawkins: http://tutorial.math.lamar.edu/

Intuitively, the smaller h becomes, the more one can feel satisfied with using the average rate

of change as an approximate answer for the question of the instantaneous rate of change.

And yet, h should never reach zero. This is reminiscent of our limit concept! When asked

about the slope of a function f at a given point x̄, one should try to see whether the following

limit is well defined:

$$\lim_{h \to 0} \frac{f(\bar{x}+h) - f(\bar{x})}{h}$$

If the limit exists, we denote it f ′(x̄) and call it the derivative of f at x̄. f is then said to be

differentiable at x̄. The derivative of f at x̄, provided it exists, is the best answer one has to

the question about the instantaneous rate of change of f at x̄. A function differentiable at

every point of its domain is simply called differentiable.
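To make the limit concrete, here is a minimal Python sketch. The helper name `average_rate_of_change` and the choice f(x) = x^2 are illustrative assumptions, not part of these notes. It evaluates the average rate of change at x̄ = 1 for shrinking h; for this particular f the quotient equals 2 + h, so the printed values should approach f′(1) = 2.

```python
def average_rate_of_change(f, x_bar, h):
    """Average rate of change of f between x_bar and x_bar + h (h must be nonzero)."""
    if h == 0:
        raise ValueError("h must be nonzero: the quotient is undefined at h = 0")
    return (f(x_bar + h) - f(x_bar)) / h


def square(x):
    """Illustrative function f(x) = x^2, whose derivative is f'(x) = 2x."""
    return x * x


if __name__ == "__main__":
    x_bar = 1.0
    # Shrink h and watch the quotient settle near the derivative.
    for h in (1.0, 0.1, 0.01, 0.001):
        print(h, average_rate_of_change(square, x_bar, h))
```

Note that h is never set to zero: the code only ever evaluates the quotient for small nonzero h, which is exactly the spirit of the limit.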

Remark: Note that derivatives, provided they exist, are unique. Indeed, it is easy to show that,

in a metric space, limits are unique whenever they exist: start by assuming that two limits

exist, then use the triangle inequality to show that they must be the same! This remark will

also apply to all more general concepts of derivative we will study in this chapter.

1.2 Why do we use derivatives?

Why would we care about the possibility to formally define the instantaneous rate of change

of a function f? The answer to this question is linked to the idea that motivated the introduction

of vector spaces and is twofold: (i) most functions we investigate have a domain (and

also sometimes a codomain) that lies in a high-dimensional vector space, i.e., a vector space

that cannot be pictured geometrically; (ii) the instantaneous rate of change, if defined, gives

us detailed information about the local behavior of the functions, and thereby allows us, in

such spaces, to analytically see what we cannot geometrically see². As a result, while

a rigorous, analytical definition of the instantaneous rate of change appears redundant when

working with low-dimensional functions – i.e., functions with domain and image within R^n,

n ≤ 3 – it becomes our only tool in high-dimensional spaces.

The best way to convince you is to tell you what I precisely mean by “the derivative gives

us detailed information about the local behavior of a function”. Let f be a function with

domain and codomain in R. The existence and value of the derivative of a function f give

us three important pieces of information about f :

• Let f be differentiable at x̄, then f is continuous at x̄.

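Proof: By definition,

$$f'(\bar{x}) := \lim_{x \to \bar{x}} \frac{f(x) - f(\bar{x})}{x - \bar{x}}.$$

Thus, f′(x̄) well-defined ⇒ ∃ε > 0 such that, ∀x ∈ Bε(x̄) with x ≠ x̄, the quotient (f(x) − f(x̄))/(x − x̄) is well defined.

Let x ∈ Bε(x̄), x ≠ x̄, then

$$f(x) - f(\bar{x}) = \frac{f(x) - f(\bar{x})}{x - \bar{x}} \, (x - \bar{x}).$$

Let us take the limit of this expression:

$$\lim_{x \to \bar{x}} [f(x) - f(\bar{x})] = \lim_{x \to \bar{x}} \left[\frac{f(x) - f(\bar{x})}{x - \bar{x}}\right] \lim_{x \to \bar{x}} [x - \bar{x}] = f'(\bar{x}) \times 0 = 0.$$

Which is just what is required for continuity, as defined in the last chapter.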

• Let f be differentiable at x̄, then there exists a good “linear approximation”

of f in a neighborhood³ of x̄.

We like linear functions because they are simple and we know how they work. Unfortunately,

it is not likely that the functions involved in our applications are linear. A

good solution, then, is often to choose a function that has the properties required for

the application and that is differentiable on the interval of interest. Such functions are

“locally linear”, in the sense that we can locally assimilate their behavior to that of

linear functions.

² Providing a definition of “analytical” isn’t straightforward. The message I wish to convey here is the following: symbols and equations are our only eyes in high-dimensional spaces!

³ A “neighborhood” of x̄ is simply defined as a subset that contains an open ε-ball centered at x̄ for some ε > 0.

More formally, the tangent of a function f : X ⊆ R → R at x̄ is the following linear

(strictly speaking, affine) function:

$$x \mapsto f(\bar{x}) + f'(\bar{x})(x - \bar{x})$$

Note that it is defined everywhere on R, as x̄, f(x̄) and f ′(x̄) are real numbers. To

see why the tangent is indeed a good approximation of f in a neighborhood of x̄,

let x ∈ Bε(x̄) and let ε(x) be our approximation error when using the tangent to

approximate f at x. Then

$$\frac{\varepsilon(x)}{x - \bar{x}} := \frac{f(x) - f(\bar{x}) - f'(\bar{x})(x - \bar{x})}{x - \bar{x}} = \frac{f(x) - f(\bar{x})}{x - \bar{x}} - f'(\bar{x})$$

Therefore, by definition of the derivative,

$$\lim_{x \to \bar{x}} \left[\frac{\varepsilon(x)}{x - \bar{x}}\right] = 0$$

In words, when x gets close to x̄ the approximation error becomes insignificant, in the

sense that it is much smaller even than the discrepancy between x and x̄. Geometrically,

the tangent may be seen as the limit of secants⁴ of f :

Figure 2: The Tangent as the Limit of Secants

• Let a and b be real numbers such that a < b. If f is differentiable (and hence

continuous) on (a,b), then:

(i) f ′(x) = 0 for all x ∈ (a,b) iff f is constant on (a,b).

(ii) f ′(x) ≤ 0 for all x ∈ (a,b) iff f is (weakly) decreasing on (a,b).

(iii) f ′(x) ≥ 0 for all x ∈ (a,b) iff f is (weakly) increasing on (a,b).

Moreover, f ′(x) < 0 (resp. f ′(x) > 0) for all x ∈ (a,b) implies that f is strictly decreasing

(resp. strictly increasing) on (a,b). The converse fails: x ↦ x^3 is strictly increasing on

(−1, 1), yet its derivative vanishes at 0.

Think again of our tangent equation. Replacing f ′(x) by any null, negative, or positive

number at all points of (a, b) should convince you of the above fact. Picture it

geometrically!

⁴ A secant is simply a line going through two specified points of a curve.
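The claim that the tangent is a good local approximation can be checked numerically. The sketch below is only an illustration under my own assumptions: the helper names `tangent` and `relative_error` are hypothetical, and the exponential function (which is its own derivative) at x̄ = 0 is an arbitrary choice. It computes the ratio ε(x)/(x − x̄) discussed above, which should shrink as x approaches x̄.

```python
import math


def tangent(f, f_prime, x_bar):
    """Return the tangent of f at x_bar: x -> f(x_bar) + f'(x_bar) * (x - x_bar)."""
    def line(x):
        return f(x_bar) + f_prime(x_bar) * (x - x_bar)
    return line


def relative_error(f, f_prime, x_bar, x):
    """Approximation error eps(x) = f(x) - tangent(x), divided by (x - x_bar)."""
    if x == x_bar:
        raise ValueError("x must differ from x_bar")
    return (f(x) - tangent(f, f_prime, x_bar)(x)) / (x - x_bar)


if __name__ == "__main__":
    # exp is its own derivative, so we pass math.exp for both f and f'.
    for x in (0.1, 0.01, 0.001):
        print(x, relative_error(math.exp, math.exp, 0.0, x))
```

The error is divided by the distance x − x̄ before being compared with zero: this is what "much smaller even than the discrepancy between x and x̄" means in practice.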

Each of these geometrical insights is of great importance when optimizing functions with

domain or codomain in a high-dimensional space. Only because of them can we claim that

analytical formulas serve as substitute eyes in such spaces. But before they do, we must

make sure we generalize them in a proper way. Therefore, the purpose of this chapter

is to generalize the concept of a derivative in a manner that preserves precisely

these pieces of information that the derivative of f at a point delivers to us.

Although such a generalization exists for infinite-dimensional spaces (e.g. function spaces),

and happens to be very close to the generalization for finite-dimensional spaces (e.g. R^n,

n ∈ N), we restrict ourselves to finite-dimensional vector spaces. The reason is twofold: (i)

you will not be expected to be able to work with infinite-dimensional spaces in the master’s

curriculum, (ii) if you grasp the generalization for finite-dimensional spaces, then you will

easily grasp that for infinite-dimensional spaces as it is presented in books⁵.

Remark 1: In some situations, it is more convenient to not only have the instantaneous rate

of change but directly an approximation of the level of change. For this reason, we define

another function df : X × R → R, called the total differential, as follows:

$$df(x, dx) = f'(x)\,dx$$

where dx denotes the change in x⁶. It is important to note that the total differential refers

to a function of two variables and not to a single-variable function like the derivative. Its

output is the approximate level of change in f when x moves by dx away from x̄. Geometrically, we

can evaluate the total differential at x̄ and plot the resulting one-variable function, which is

exactly the tangent to f at x̄, drawn with (x̄, f(x̄)) as the origin.
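As a small illustration of how the total differential approximates the level of change, consider the following Python sketch. The function f(x) = x^3 and the names `cube`, `cube_prime`, and `total_differential` are assumptions made for this example only.

```python
def cube(x):
    """Illustrative function f(x) = x^3."""
    return x ** 3


def cube_prime(x):
    """Its derivative f'(x) = 3 x^2."""
    return 3.0 * x * x


def total_differential(f_prime, x, dx):
    """df(x, dx) = f'(x) * dx: a function of the point x AND of the change dx."""
    return f_prime(x) * dx


if __name__ == "__main__":
    x, dx = 2.0, 0.01
    approx = total_differential(cube_prime, x, dx)   # linear prediction of the change
    actual = cube(x + dx) - cube(x)                  # true change in f
    print(approx, actual, actual - approx)
```

The two arguments of `total_differential` mirror the two variables of df: changing dx rescales the predicted change, while changing x changes the slope used for the prediction.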

Remark 2: The following notation is sometimes observed:

$$\frac{df}{dx}(x) := f'(x)$$

It is important not to confuse this df/dx with the total differential of f . The above notation is

an alternative to the notation of a derivative and not to that of the differential. Thus, we

have:

$$df(x, dx) = \frac{df}{dx}(x)\,dx$$

where the dx terms cannot be canceled in the last expression! (One is just a notation, the

other is a variable of its own!)

⁵ For studying extensions of optimization techniques to infinite-dimensional vector spaces, and the theory of optimization by vector space methods in general, I strongly recommend Luenberger’s classic [4].

⁶ dx replaces the previous notation h.

2 Multivariate Functions, Generalized Derivatives

Before we start the process, let us make sure that our vocabulary is aligned. In this chapter,

the term vector is to be understood in its classical sense, i.e., as a column vector composed

of real numbers, unless otherwise specified. The use of a row vector will thus be explicitly

mentioned via a transposition symbol such as a′. A vector of dimension n × 1 is said to be

of length n. A vector of length 1 is a scalar, i.e., a real number.

I required as a prerequisite that you be clear with the concept of a function, f , its domain,

X, its codomain, Y , and its image, f(X). If Y ⊆ R, i.e., if f maps into the real line, we say

that f is a real valued function. If X ⊆ R, i.e., if f takes real numbers as inputs, we call f

a function of a single variable, or univariate function. Finally, if X ⊆ R^n, n > 1, i.e., if f

takes as input vectors of length n, n > 1, we call f a multivariate function. We will derive

the insights of the generalization using real-valued multivariate functions, i.e., functions with

domain in R^n, n > 1, and codomain in R. Then, we will discuss functions going from R^n to

R^m, with n > 1 and m > 1.

Throughout the discussion, we will try to exploit geometric intuition as much as we can.

The most common way to geometrically represent a function is to plot its graph. Let f

be a function with domain X, codomain Y , and image f(X). Analytically, the graph of a

function is a collection of ordered pairs (x, f(x)), with x ∈ X and f(x) ∈ Y . Hence, the

graph is a subset of the Cartesian product X × Y , a space with dimension dim(X × Y ) =

dim(X) + dim(Y ). In the next chapter, we will discuss a different geometrical representation

of functions which is slightly less demanding in terms of dimension.

Finally, let us come back to the promised discussion on inverse functions. Let f be a

function with domain X, codomain Y , and image f(X). In the preliminary chapter, I said

that, under some conditions, one can define an inverse function f^{-1} mapping from Y to X,

and such that f^{-1}(f(x)) = x. What precisely are these conditions? There are two of them,

both quite intuitive. The first one is that we want our inverse function to be well defined over the

codomain Y , i.e., we want it to be defined for any element y in Y . This will not be the

case if the image of f , f(X), is a proper subset of Y , for then we would not know what

value to associate to f^{-1}(y), y ∈ Y \ f(X). A first natural condition, then, is to require

that the image and the codomain of f coincide, i.e., f(X) = Y . If a function satisfies this

condition, it is said to be onto⁷. In other words, if f is onto, then, for any element, y,

that we pick in the codomain, Y , of f , there exists an x, in the domain, X, of f , such that

f(x) = y. It is a condition on the codomain of f ⁸. The second requirement is intuitive

too. Namely, we know that for any function f , if we take an element x in its domain X,

then, by definition, there exists a unique y in the image, f(X), such that y = f(x). The

converse, however, need not be true, i.e., take an element y in f(X) and consider its set of

antecedents, i.e., the set of x such that f(x) = y; there is no a priori reason that this set

be a singleton⁹. But if the converse is not true, then, once we consider the inverse mapping

f^{-1}, we will end up having several images associated to a single y in f(X). That, in turn,

would disqualify our inverse map for the label of “function”. The second condition, then, is

that the converse holds, i.e., that every distinct element x of the domain X maps into a distinct

element y of the image f(X). If it is the case, we say that a function f is one-to-one¹⁰.

A function possesses an inverse function if and only if it is one-to-one and onto¹¹.

See exercises 3 and 4.

⁷ a.k.a. surjective.

⁸ Note that this is not a very demanding condition, as it suffices to properly choose the codomain of our function.
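For functions between finite sets, the two conditions can be checked mechanically. The Python sketch below is a toy illustration only: representing a function as a dictionary and the helper names `is_one_to_one`, `is_onto`, and `inverse` are my assumptions, not notation used elsewhere in these notes.

```python
def is_one_to_one(mapping):
    """True if distinct domain elements map to distinct images (injective)."""
    values = list(mapping.values())
    return len(values) == len(set(values))


def is_onto(mapping, codomain):
    """True if the image of the mapping coincides with the codomain (surjective)."""
    return set(mapping.values()) == set(codomain)


def inverse(mapping, codomain):
    """Return the inverse mapping, or raise if the two conditions fail."""
    if not is_one_to_one(mapping):
        raise ValueError("not one-to-one: some y has several antecedents")
    if not is_onto(mapping, codomain):
        raise ValueError("not onto: some y in the codomain has no antecedent")
    return {y: x for x, y in mapping.items()}


if __name__ == "__main__":
    f = {1: "a", 2: "b", 3: "c"}
    print(inverse(f, {"a", "b", "c"}))        # inverse exists
    print(is_one_to_one({1: "a", 2: "a"}))    # a constant function is not one-to-one
    print(is_onto(f, {"a", "b", "c", "d"}))   # enlarging the codomain breaks "onto"
```

The last line echoes the point about the codomain: being onto depends on which codomain we declare, whereas being one-to-one is a property of the mapping itself.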

2.1 Partial Derivatives and the Gradient

Let f : X ⊆ R^n → R be a function of n independent variables¹². The simplest way to

proceed with our generalization is to simply transform that function into functions of one

variable and to use our usual single-variable notion of derivation. We do this by “consciously

forgetting” that actually n variables could be changing at the same time and instead focus

on what happens when only one of the n variables is changing at a time. This process is

formally stated in the following definition.

Definition: (Partial Derivative)

Let X ⊆ R^n and suppose f : X → R. If x̄ is an interior point of X, then, the partial

derivative of f with respect to x_i at x̄ is defined as

$$\frac{\partial f(\bar{x})}{\partial x_i} := \lim_{h \to 0} \frac{f(\bar{x}_1, \dots, \bar{x}_{i-1}, \bar{x}_i + h, \bar{x}_{i+1}, \dots, \bar{x}_n) - f(\bar{x}_1, \dots, \bar{x}_i, \dots, \bar{x}_n)}{h}$$

with h ∈ R, whenever the limit exists. Another common notation for the partial derivative

of f with respect to x_i at x̄ is f_i(x̄).

⁹ A singleton is a set that contains a single element.

¹⁰ a.k.a. injective. Note that this is a much more demanding requirement. Think, for instance, of the constant function. There is no way you will ever turn it into a one-to-one map.

¹¹ a.k.a. bijective.

¹² I distinguish here three cases: Independent variables are variables which can move in any direction without having any impact on each other. For instance, two independently distributed random variables in an experiment. Indirectly dependent variables are variables which cannot freely move without impacting the other. For instance, the amount of goods in an economy and the happiness of individuals. Finally, directly dependent variables are two variables for which the move of one totally determines the move of the other. Directly dependent variables are not interesting for us though, because we easily get rid of them by expressing one as a function of the other and reducing the dimension of our problem.

Hence, the idea behind the partial derivative with respect to x_i at x̄ is to consider all (n−1)

x_j’s, j ≠ i, as fixed and proceed as if f(.) were only a function of x_i. There are thus – at most! – n

partial derivatives, one for each of the variables. Note that a partial derivative, if defined,

is defined at a given point: x̄. Moreover, not only does the value of x̄_i matter, but also

those at which one fixes the x_j, j ≠ i. This is illustrated by the following example: let

f(x_1, x_2) = x_1 x_2, then

$$\frac{\partial f(\bar{x})}{\partial x_1} = \bar{x}_2$$

which obviously depends on where we fix x_2.

Figure 3: Partial derivative of f(x1, x2) with respect to x1 at x̄

Remark: In the above figure, the partial derivative depends on the second variable and not

on the first one (that is why I could draw the tangent line without mentioning which x̄1 I

was picking). It may as well depend on all variables, or on none of them, as in the following

example: compute the partial derivative of f(x1, x2) = x1 + x2 with respect to x2 at x̄:

$$\frac{\partial f(\bar{x})}{\partial x_2} = 1$$

If the partial derivative of f with respect to xi is defined at x̄, we say that f is partially

differentiable at x̄ with respect to xi. If the partial derivative of f with respect to xi is defined

at every point of f ’s domain, we say that f is partially differentiable with respect to xi. In

such a case, as is conventional with univariate derivatives, the partial derivative can be seen

as a function from X to R.

To conclude the section – and sprinkle a bit of mystery! – consider the row vector whose

entries are the partial derivatives of f at x̄:

$$\nabla f(\bar{x}) := \begin{pmatrix} f_1(\bar{x}) & f_2(\bar{x}) & \cdots & f_n(\bar{x}) \end{pmatrix}$$

Such a vector is called the gradient of f at x̄. It is an extremely important vector; section

2.3 will make you understand why!

See exercise 5.
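Partial derivatives and the gradient are easy to approximate numerically. The sketch below is an illustration under stated assumptions: the names `partial_derivative`, `gradient`, and `product` are hypothetical, indices are 0-based as usual in Python, and I use a symmetric (central) difference quotient rather than the one-sided quotient of the definition, simply because it tends to be more accurate for a fixed small h.

```python
def partial_derivative(f, x_bar, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f w.r.t. x_i at x_bar.

    Only coordinate i is moved; all other coordinates stay fixed at x_bar.
    """
    up = list(x_bar)
    down = list(x_bar)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2.0 * h)


def gradient(f, x_bar, h=1e-6):
    """Vector of all partial derivatives of f at x_bar."""
    return [partial_derivative(f, x_bar, i, h) for i in range(len(x_bar))]


def product(x):
    """The example from the text: f(x_1, x_2) = x_1 * x_2."""
    return x[0] * x[1]


if __name__ == "__main__":
    # At x_bar = (2, 3) the exact gradient is (x_2, x_1) = (3, 2).
    print(gradient(product, [2.0, 3.0]))
```

As in the example above, the first entry depends only on where x_2 is fixed, and the second only on where x_1 is fixed.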

2.2 Directional Derivatives

Now, it is important to realize that we did not proceed without loss of generality when

generalizing the concept of derivation to multivariate functions. Remember, we “consciously

forgot” that all variables could move at the same time! That is, our generalization could

only claim to be proper for independent variables. In practice, variables often are indirectly

dependent or simply happen to change simultaneously, and we may wish to evaluate the

instantaneous rate of change of f if several of our variables change simultaneously. Let us

consider first the geometric intuition. In the case of a derivative for univariate functions it

was possible to move away from the point in any desired direction, for the simple reason

that there was a unique direction along which to move, namely, that of the real line¹³! Then,

the derivative was indicating the rate of change in our function as we move away from x

along the real line, i.e., all directions we could think of moving along. Yet, as soon as we

have two or more dimensions, there are actually infinitely many possible directions along

which one could move¹⁴! Depending on the “geography” of our multivariate function, which

direction we focus on could matter! In the above definition, by requiring that all x_j’s, j ≠ i,

be fixed, we have imposed that the direction along which to evaluate the rate of change

should be that of the x_i axis (as drawn in Figure 3, where we picked

the direction of x_1!).

Arguably, the real generalization of our derivative concept for univariate functions should

not impose any restriction on the direction along which to evaluate the rate of change. This

generalization exists and is called a multivariate derivative or, more simply, derivative. But,

before we get there, we should first look at an intermediate concept, called the directional

derivative, which, if it exists, will give us the rate of change in any specified direction –

and not only the direction induced by only one of our variables. We already know from the

previous chapter that a vector consists of a direction, a sense, and a magnitude. Thus, any

nonzero vector z ∈ R^n specifies a direction along which we could try to evaluate the rate of change

of our multivariate real-valued function. Yet, as an infinity of vectors could specify the same

direction, we need to impose a bit of consistency, and convention suggests taking a unit vector

z, i.e., a z such that ‖z‖ = 1. From there, we can proceed with a rather intuitive definition:

¹³ Please do not confuse direction and sense! A direction simply indicates a line along which we move; the sense in which we move, however, indicates toward which side of the line we are moving.

¹⁴ If you have trouble picturing that, simply imagine a single point within a 2-dimensional space (i.e., a simple plane, like a piece of paper), and start counting the number of lines that you can draw going through this point. The moment when you stop should come right after that when you realize there is an infinity of such lines ;)

Definition: (Directional Derivative)

Let X ⊆ R^n and suppose f : X → R. If x̄ is an interior point of X, then, the rate

of change of f(x) at x̄ in the direction of the unit vector z = (z_1, ..., z_n) ∈ R^n is called the

directional derivative, is denoted D_z f(x̄), and is defined as

$$D_z f(\bar{x}) = \lim_{h \to 0} \frac{f(\bar{x} + hz) - f(\bar{x})}{h}$$

with h ∈ R, whenever the limit exists and whenever x̄ + hz ∈ X.

In particular, if z is going in the direction of x_i, i.e., z = (0, ..., 0, 1, 0, ..., 0) ∈ R^n, where

the only nonzero component is that in the direction of x_i, then the directional derivative

coincides with the partial derivative with respect to x_i. To illustrate the concept, let us

consider again the case f(x_1, x_2) = x_1 x_2:

Figure 4: Directional derivative of f(x1, x2) in direction of (1, 1) at x̄

In practice, it is not a very convenient definition: choosing a vector, finding the unitary

vector with the same direction, computing a limit... Luckily, there exists an important

characterization of the directional derivative which emphasizes its link with partial derivatives

and which is much more convenient for applications. I present it in the next section.
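The three steps just listed (choose a vector, normalize it, take a limit) can be sketched in Python as follows. This is only an approximation of the definition under my own assumptions: the names `unit_vector`, `directional_derivative`, and `product` are hypothetical, and the limit is replaced by a difference quotient with one small fixed h.

```python
import math


def unit_vector(v):
    """Rescale v to norm 1, keeping its direction and sense."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("the zero vector does not specify a direction")
    return [c / norm for c in v]


def directional_derivative(f, x_bar, direction, h=1e-6):
    """Difference-quotient estimate of D_z f(x_bar), with z = direction / ||direction||."""
    z = unit_vector(direction)
    shifted = [x + h * zi for x, zi in zip(x_bar, z)]
    return (f(shifted) - f(x_bar)) / h


def product(x):
    """The running example f(x_1, x_2) = x_1 * x_2."""
    return x[0] * x[1]


if __name__ == "__main__":
    x_bar = [2.0, 3.0]
    # Direction (1, 1): the exact value is (x_1 + x_2) / sqrt(2) = 5 / sqrt(2).
    print(directional_derivative(product, x_bar, [1.0, 1.0]))
    # Direction (1, 0): coincides with the partial derivative w.r.t. x_1, i.e. 3.
    print(directional_derivative(product, x_bar, [1.0, 0.0]))
```

The exact value quoted in the first comment follows from expanding f(x̄ + hz) − f(x̄) by hand for this particular f; for a general f one would rely on the estimate only.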

2.3 Multivariate Derivative

Both generalizations introduced above preserve the second and third geometrical insights

mentioned in the introduction. Given a point and a specified direction, one can locally

assimilate a function’s behavior to that of a linear function, provided the directional derivative

exists. Yet, these generalizations could seem insufficient in the following sense: they

do not preserve the first important geometrical insight. Indeed, given a point x̄, neither the

existence of all partial derivatives, nor the existence of all directional derivatives is sufficient

to guarantee continuity of the function at x̄ (see exercise 6). For this reason, one introduces

the concept of (multivariate) derivative. This concept is the one that should be perceived as

a complete generalization of a univariate derivative.
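To see how directional derivatives can all exist without continuity, one can experiment with a standard textbook counterexample, g(x, y) = x^2 y / (x^4 + y^2) with g(0, 0) = 0. This particular function is chosen here for illustration and need not be the one used in exercise 6; the helper names are hypothetical. Along any straight line through the origin the difference quotient settles down (to a^2/b for a unit direction (a, b) with b ≠ 0, and to 0 if b = 0), yet along the parabola y = x^2 the function is constantly equal to 1/2, so it cannot be continuous at the origin.

```python
def g(x, y):
    """Counterexample: g(x, y) = x^2 y / (x^4 + y^2), with g(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x * x * y) / (x ** 4 + y * y)


def quotient_at_origin(a, b, t):
    """Difference quotient of g at the origin along the unit vector (a, b)."""
    return (g(t * a, t * b) - g(0.0, 0.0)) / t


if __name__ == "__main__":
    for t in (1e-1, 1e-2, 1e-3):
        along_line = quotient_at_origin(0.6, 0.8, t)   # approaches 0.36 / 0.8 = 0.45
        along_axis = quotient_at_origin(1.0, 0.0, t)   # identically 0
        on_parabola = g(t, t * t)                      # stays at 0.5, not g(0, 0) = 0
        print(t, along_line, along_axis, on_parabola)
```

The straight lines never "see" the parabola, which is why looking direction by direction is not enough to guarantee continuity.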

Let Df(x̄) denote the generalized version of the derivative at x̄, i.e., f ’s instantaneous rate

of change at x̄. Continuity, as defined in the previous chapter, is about preservation of

neighborhoods. As we are looking for continuity, we shall not specify any direction along

which x moves. Rather, we specify the distance that separates the new x from our starting

point x̄ and allow for movement in any direction within that distance. In the univariate case,

f is differentiable at x̄ if and only if there exists a real number f ′(x̄) such that:

$$\lim_{h \to 0} \left[\frac{f(\bar{x}+h) - f(\bar{x})}{h} - f'(\bar{x})\right] = 0$$

And because the norm is a continuous function, this implies:

$$\lim_{\|h\| \to 0} \frac{\|f(\bar{x}+h) - f(\bar{x}) - f'(\bar{x})h\|}{\|h\|} = 0$$

Using this expression to generalize f ′(x̄) ensures that continuity will be preserved.

Definition: (Multivariate Derivative)

Let X ⊆ R^n and suppose f : X → R. If x̄ is an interior point of X, then f is differentiable

at x̄ if and only if there exists a row vector Df(x̄) such that

$$\lim_{\|h\| \to 0} \frac{\|f(\bar{x}+h) - f(\bar{x}) - Df(\bar{x}) \cdot h\|}{\|h\|} = 0$$

where h is a vector in R^n. If such a vector Df(x̄) exists, we interpret it as the derivative

of f at x̄.

Remark: As in the univariate case, for any x in X, we define the total differential at x̄,

denoted df(x̄, x), as:

df(x̄, x) := Df(x̄) · x

Df(x̄) is to be interpreted as the generalization of our univariate derivative f ′(x̄)

at x̄. Yet, defined as it is, it is a bit difficult to exactly see what it is! The following result,

by relating it to the geometrically intuitive partial and directional derivatives, gives useful

insights.

Theorem: (Total Differential, Directional Derivative, and Gradient)

Let X ⊆ R^n and suppose f : X → R. If x̄ is an interior point of X, and if f is

differentiable at x̄, then:

(i) all directional derivativesᵃ of f exist at x̄ and,

(ii) ∀ z ∈ R^n, ‖z‖ = 1: df(x̄, z) := Df(x̄) · z = D_z f(x̄) = ∇f(x̄) · z

ᵃ Reminder: partial derivatives are only a special case of directional derivatives!

In words, Df(x̄), the true generalization of the univariate derivative, is nothing else than

the vector of partial derivatives of f at x̄, i.e., the gradient of f at x̄, ∇f(x̄)! That is, the

instantaneous rate of change of f in any direction is given by the gradient of f !

IN SHORT, IF YOU UNDERSTOOD WHAT THE GRADIENT IS, THEN YOU almost

RULE THE WORLD!

Proof: As f is differentiable at x̄, we know that there exists Df(x̄) in Rn such that:

lim‖h‖→0

‖f(x̄+ h)− f(x̄)−Df(x̄) · h‖‖h‖

= 0

Where h is a vector in Rn. Alternatively, let h = tz, where z is any unit vector and t a scalar

going to zero, i.e.,

limt→0

‖f(x̄+ h)− f(x̄)−Df(x̄) · tz‖‖tz‖

= 0

Using that ‖z‖ = 1 and rearranging a bit, one has:

$$\lim_{t \to 0} \frac{\left\| f(\bar{x} + tz) - f(\bar{x}) \right\|}{|t|} = \left\| Df(\bar{x}) \cdot z \right\|$$

On each side, the signs of the terms inside the norms always coincide¹⁵, and one has:

$$\lim_{t \to 0} \frac{f(\bar{x} + tz) - f(\bar{x})}{t} = Df(\bar{x}) \cdot z$$

that is,

$$D_z f(\bar{x}) = Df(\bar{x}) \cdot z$$

¹⁵ Realize that if f increases in the direction of z, then both the left hand term and the right hand term inside the norms are positive. If f decreases in the direction of z, then both the left hand term and the right hand term inside the norms are negative.

Hence, all directional derivatives are well defined and so must be the partial derivatives in

particular. To prove the second part, observe that:

$$\lim_{\|h\| \to 0} \frac{\left\| f(\bar{x} + h) - f(\bar{x}) - Df(\bar{x}) \cdot h \right\|}{\|h\|} = 0$$

Df(x̄) is a vector here, and you may denote its entries by (a_i)_{i=1,...,n}. Let h = (h_1, ..., h_n);
the limit then writes:

$$\lim_{\|h\| \to 0} \frac{\left\| f(\bar{x} + h) - f(\bar{x}) - \sum_{i=1}^{n} a_i h_i \right\|}{\|h\|} = 0$$

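Now, set every coordinate of h but one, say coordinate j, equal to zero, so that h = h_j e_j,
where e_j denotes the j-th unit vector of Rn. We get:

$$\lim_{h_j \to 0} \frac{\left| f(\bar{x} + h_j e_j) - f(\bar{x}) - a_j h_j \right|}{|h_j|} = 0$$

and an argument similar to that above allows us to take the a_j out:

$$\lim_{h_j \to 0} \frac{f(\bar{x} + h_j e_j) - f(\bar{x})}{h_j} = a_j$$

or, otherwise stated:

$$f_j(\bar{x}) = a_j$$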

As argued earlier, the converse of this theorem need not hold. However, one can show

the following:

Theorem: (Partial Differentiability and Differentiability)

Let X ⊆ Rn, suppose f : X → R, and let x̄ be an interior point of X. If all the partial
derivatives of f exist in a neighborhood of x̄ and are continuous at x̄, then f is differentiable at x̄.

Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.4.

And, as we wished, we also have our link to continuity:

Theorem: (Differentiability and Continuity)

Let X ⊆ Rn, suppose f : X → R, and let x̄ be an interior point of X. If f is differentiable

at x̄, then f is continuous at x̄.

Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.2.


2.4 Functions Mapping to High-Dimensional Spaces

Consider now the more general case of a function f with domain X in Rn and codomain
Y in Rm, with n and m greater than or equal to 2. The easiest way to proceed is to realize that
this high-dimensional function can be seen as a vector of real-valued multivariate functions.
Namely,

$$f = \begin{pmatrix} f^1 \\ f^2 \\ \vdots \\ f^m \end{pmatrix}$$

where each f^i is a real-valued multivariate function mapping from X ⊆ Rn to R. We can then
extend our previous definition in a very natural way:

Definition: (Multivariate Derivative)

Let X ⊆ Rn and suppose f : X → Rm. If x̄ is an interior point of X, then f is

differentiable at x̄ if and only if there exists a matrix Df(x̄) such that

$$\lim_{\|h\| \to 0} \frac{\left\| f(\bar{x} + h) - f(\bar{x}) - Df(\bar{x}) \cdot h \right\|}{\|h\|} = 0$$

where h is a vector in Rn. If such a matrix Df(x̄) exists, we interpret it as the derivative

of f at x̄.

In words, our gradient is generalized by a matrix of first order derivatives, Df(x̄). Such

a matrix is called the Jacobian of f , sometimes denoted Jf (x̄). Further, the following result

holds:

Theorem: (Multivariate Derivative and Gradient of the Component Functions)

Let X ⊆ Rn, suppose f : X → Rm, and let x̄ be an interior point of X. Then, f is
differentiable at x̄ if and only if each of its component functions is differentiable at x̄.

Moreover, if f is differentiable at x̄, then:

(i) all directional derivatives of the component functions exist at x̄, and

(ii) the derivative of f at x̄ is the matrix of partial derivatives of the component

functions at x̄:

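$$J_f(\bar{x}) := Df(\bar{x}) = \begin{pmatrix} \nabla f^1(\bar{x}) \\ \vdots \\ \nabla f^m(\bar{x}) \end{pmatrix} = \begin{pmatrix} \frac{\partial f^1}{\partial x_1}(\bar{x}) & \cdots & \frac{\partial f^1}{\partial x_n}(\bar{x}) \\ \vdots & \ddots & \vdots \\ \frac{\partial f^m}{\partial x_1}(\bar{x}) & \cdots & \frac{\partial f^m}{\partial x_n}(\bar{x}) \end{pmatrix} \in \mathbb{R}^{m \times n}$$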


Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.3.
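To make the structure of the Jacobian concrete, here is a minimal sketch that approximates it by central finite differences. The map f from R2 to R3 and the evaluation point are assumptions chosen for illustration; row i of the result approximates the gradient of the component function f^i, so the matrix has m = 3 rows and n = 2 columns.

```python
def f(x):
    # Illustrative map from R^2 to R^3 (an assumption): components f^1, f^2, f^3
    x1, x2 = x
    return [x1 * x2, x1 ** 2 + x2, 3 * x2]

def jacobian(func, x, step=1e-6):
    # Entry (i, j) approximates the partial derivative of f^i with respect to x_j
    m = len(func(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_plus[j] += step
        x_minus = list(x)
        x_minus[j] -= step
        f_plus, f_minus = func(x_plus), func(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * step)
    return J

J = jacobian(f, [2.0, 5.0])
for row in J:
    print(row)
```

For this particular f, the exact Jacobian at (2, 5) has rows (5, 2), (4, 1), and (0, 3), which the finite-difference estimate should reproduce up to rounding.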

The partial converse to this theorem as well as the result on continuity also extend to
the high-dimensional case. The proofs indicated for them take the multidimensionality of the
codomain into account. Let us conclude with one last result: a generalized chain rule
in high-dimensional space. Once one has introduced the concept of a multivariate derivative,
the multivariate chain rule is a straightforward extension of the univariate one.

Proposition: (Multivariate Chain Rule)

Let X ⊆ Rn and suppose g : X → Y, where Y ⊆ Rm. Further, suppose f : Y → Z, where
Z ⊆ Rp. If x̄ is an interior point of X, g(x̄) an interior point of Y, and g and f are
differentiable at x̄ and g(x̄), respectively, then f ◦ g is differentiable at x̄ and:

$$D[f \circ g](\bar{x}) = Df(g(\bar{x})) \, Dg(\bar{x})$$

Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.5.

See exercises 6 and 7.
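As a quick sanity check of the multivariate chain rule, the sketch below multiplies the row vector Df(g(x̄)) by the matrix Dg(x̄) and compares the result with a finite-difference gradient of the composite function. The functions g (from R2 to R2) and f (from R2 to R) are assumptions made up for the example.

```python
def g(x):
    # Illustrative inner function (an assumption): g(x1, x2) = (x1 * x2, x1 + x2)
    x1, x2 = x
    return [x1 * x2, x1 + x2]

def f(y):
    # Illustrative outer function (an assumption): f(y1, y2) = y1^2 + 3 * y2
    y1, y2 = y
    return y1 ** 2 + 3 * y2

def Dg(x):
    # Jacobian of g, a 2 x 2 matrix
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0]]

def Df(y):
    # Derivative of f, a row vector with 2 entries
    y1, y2 = y
    return [2 * y1, 3.0]

def chain_rule(x):
    # Row vector Df(g(x)) times matrix Dg(x)
    row, mat = Df(g(x)), Dg(x)
    return [sum(row[k] * mat[k][j] for k in range(len(row))) for j in range(len(x))]

def numeric_gradient(func, x, step=1e-6):
    # Central finite-difference gradient of a real-valued function
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += step
        x_minus = list(x)
        x_minus[j] -= step
        grad.append((func(x_plus) - func(x_minus)) / (2 * step))
    return grad

x_bar = [1.0, 2.0]
by_chain_rule = chain_rule(x_bar)
by_differences = numeric_gradient(lambda x: f(g(x)), x_bar)
print(by_chain_rule, by_differences)
```

Here the composite is x1²x2² + 3x1 + 3x2, so both computations should return approximately (11, 7) at x̄ = (1, 2).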

2.5 Higher Order Partial Derivatives and the Taylor Approximation Theorems

Let f : X ⊆ Rn → R be a function of n variables. Also, assume X is open¹⁶. Earlier
on, we suggested that, if the n partial derivatives of f are defined at each point of the open
domain, then the partial derivatives could themselves be perceived as functions from X to R.
Hence, one may attempt to compute their partial derivatives! If such derivatives are defined,
we call them second order partial derivatives. Repeating the reasoning, one can attempt to
define third, fourth, fifth order derivatives, and so on. Notationally, f_i(x) denotes first order
partial derivatives, f_{i,j}(x) := ∂f_i(x)/∂x_j denotes second order partial derivatives, and so on.

Let X ⊆ Rn be open, suppose f : X → R, and let x̄ be an element of X. If all second

order partial derivatives of f are defined at x̄, then, in the same way as the first order

partial derivatives could be gathered in a vector – the gradient –, all the second order partial

derivatives can be gathered in a matrix. Such a matrix, denoted Hf (x̄), is called the Hessian

of f at x̄, is square, and should be thought of as a generalized second order derivative for

multivariate real valued functions.

¹⁶ A convenient way to stop worrying about interiority! But if in your application X is not open, do not forget to check interiority!!


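$$H_f(\bar{x}) = \begin{pmatrix} \nabla f_1(\bar{x}) \\ \nabla f_2(\bar{x}) \\ \vdots \\ \nabla f_n(\bar{x}) \end{pmatrix} = \begin{pmatrix} f_{1,1}(\bar{x}) & f_{1,2}(\bar{x}) & \cdots & f_{1,n}(\bar{x}) \\ f_{2,1}(\bar{x}) & f_{2,2}(\bar{x}) & \cdots & f_{2,n}(\bar{x}) \\ \vdots & \vdots & \ddots & \vdots \\ f_{n,1}(\bar{x}) & f_{n,2}(\bar{x}) & \cdots & f_{n,n}(\bar{x}) \end{pmatrix}$$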

Remark: The intermediate equality makes clear why the Hessian is an n × n matrix: it is

the derivative of the gradient of f at x̄, which is a function from X ⊆ Rn to Rn. Hence, the

Hessian is a Jacobian, but the converse is not true! Do not confuse both concepts!

Before we go on, let me introduce the concept of Ck-differentiability. Ck-differentiable
functions constitute the main class of functions economists work with, as they have many
nice properties. For instance, as the next theorem will state, Ck-differentiability constitutes
a sufficient condition for permuting the order of differentiation without consequence.

Definition: (Function of class Ck)

Let X ⊆ Rn be an open set, Y ⊆ R, and suppose f : X → Y. f is said to be of class Ck
on X, denoted f ∈ Ck(X, Y)ᵃ, if all partial derivatives of order less than or equal to k exist
and are continuous on X.

ᵃ Y could be equal to R. As real valued functions are heavily used in applications, the short notation
Ck(X) is taken as a substitute for Ck(X, R).

Remark: It is common to call a C1(X) function a continuously differentiable function.

Theorem: (Schwarz’s Theorem / Young’s Theorem)

If f ∈ Ck(X), then the order in which the derivatives up to order k are taken can be

permuted.

Proof: See e.g., De la Fuente [1], Chapter 4, Theorem 2.6.

For instance, let X ⊆ Rn be open and suppose f : X → R. If f ∈ C2(X), then

$$f_{i,j}(x) = \frac{\partial f_i(x)}{\partial x_j} = \frac{\partial f_j(x)}{\partial x_i} = f_{j,i}(x)$$

Therefore, if f is C2-differentiable, then its Hessian matrix is symmetric!
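The symmetry of the cross partial derivatives can be illustrated numerically. In the sketch below, the function f(x1, x2) = x1³x2² + x1 sin(x2) is an assumption chosen for the example; its first order partials f_1 and f_2 are coded by hand, and each is then differentiated numerically with respect to the other variable.

```python
import math

def f1(x):
    # Partial derivative of the illustrative f with respect to x1
    x1, x2 = x
    return 3 * x1 ** 2 * x2 ** 2 + math.sin(x2)

def f2(x):
    # Partial derivative of the illustrative f with respect to x2
    x1, x2 = x
    return 2 * x1 ** 3 * x2 + x1 * math.cos(x2)

def partial(func, x, j, step=1e-6):
    # Central finite-difference partial derivative of func with respect to x_j
    x_plus = list(x)
    x_plus[j] += step
    x_minus = list(x)
    x_minus[j] -= step
    return (func(x_plus) - func(x_minus)) / (2 * step)

x_bar = [1.0, 2.0]
f12 = partial(f1, x_bar, 1)  # differentiate f_1 with respect to x2
f21 = partial(f2, x_bar, 0)  # differentiate f_2 with respect to x1
print(f12, f21)
```

Since this f is C2, both numbers should approximate the same value, 6x1²x2 + cos(x2) evaluated at (1, 2), so the off-diagonal entries of the Hessian coincide.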

We now have the equipment to discuss Taylor approximations¹⁷! Many of the results we
will see in the second half of this course and in the whole next chapter rely on Taylor
approximations of the second order. Further, they are heavily used in macroeconomic analysis.
Despite their rather heavy notation, they formalize a rather simple idea. Namely, in the same
way that first order derivatives provide valuable information, so do higher order derivatives.
Therefore, more precise, polynomial (rather than simply linear), approximations can be built
in small neighborhoods of a point of interest. When the function under study is a function
of a single variable, it is possible to exploit Taylor expansions of high orders.

¹⁷ a.k.a. Taylor expansions.

Theorem: (nth Order Univariate Taylor Approximation)

Let X ⊆ R be an open set and consider f ∈ C^{n+1}(X). Then f is best approximated to the
nth order around x̄ by the nth order Taylor expansion:

$$f(\bar{x} + h) \approx f(\bar{x}) + \sum_{k=1}^{n} \frac{f^{(k)}(\bar{x})}{k!} h^k$$

where h ∈ R is such that x̄ + h ∈ X and f^{(k)}(x̄) denotes f's derivative of order k at x̄.
The error of approximation, also known as the remainder of the Taylor approximation,
is given by the following formula:

$$R_n(h \mid \bar{x}) := f(\bar{x} + h) - f(\bar{x}) - \sum_{k=1}^{n} \frac{f^{(k)}(\bar{x})}{k!} h^k = \frac{f^{(n+1)}(\bar{x} + \lambda h)}{(n+1)!} h^{n+1}$$

for some λ ∈ (0, 1).

Proof: See e.g., De la Fuente [1], Chapter 4, Theorem 1.9.

Remark: Note that, as h goes to zero, the remainder approaches zero at a faster rate than h^n itself.
This is what guarantees the quality of the approximation!
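A small numerical illustration may help here. The sketch below assumes f = exp and x̄ = 0 (choices made for the example, not taken from the notes) and compares the approximation error of the Taylor expansions of order 1, 2, and 3 with the bound implied by the remainder formula.

```python
import math

def taylor_exp(x_bar, h, order):
    # Every derivative of exp is exp itself, so f^(k)(x_bar) = exp(x_bar) for all k.
    # The k = 0 term is f(x_bar); the terms k = 1..order form the Taylor sum.
    return sum(math.exp(x_bar) * h ** k / math.factorial(k) for k in range(order + 1))

def remainder_bound(x_bar, h, order):
    # For h > 0, exp is increasing, so f^(order+1)(x_bar + lambda * h) <= exp(x_bar + h)
    # for any lambda in (0, 1); plugging this into the remainder gives an upper bound.
    return math.exp(x_bar + h) * h ** (order + 1) / math.factorial(order + 1)

x_bar = 0.0
for h in (0.5, 0.1):
    for order in (1, 2, 3):
        error = abs(math.exp(x_bar + h) - taylor_exp(x_bar, h, order))
        bound = remainder_bound(x_bar, h, order)
        print(f"h={h}, order={order}, error={error:.2e}, bound={bound:.2e}")
```

The error shrinks both when the order increases and when h gets smaller, and it stays below the bound obtained from the remainder.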

When the function under study is a multivariate function, it is computationally very
demanding¹⁸ and conceptually non-straightforward to exploit derivatives of order higher than
2. Therefore, we usually stick to the first and second orders.

Theorem: (First Order Multivariate Taylor Approximation)

Let X ⊆ Rn be an open set and consider f ∈ C2(X). Then f is best linearly
approximated around x̄ by the first order Taylor expansion:

$$f(\bar{x} + h) \approx f(\bar{x}) + \nabla f(\bar{x}) \cdot h$$

where h ∈ Rn is such that x̄ + h ∈ X. The error of approximation, also known as the
remainder of the Taylor approximation, is given by the following formula:

$$R_1(h \mid \bar{x}) := f(\bar{x} + h) - f(\bar{x}) - \nabla f(\bar{x}) \cdot h = \frac{1}{2} h' \cdot H_f(\bar{x} + \lambda h) \, h$$

for some λ ∈ (0, 1).

Proof: See e.g., De la Fuente [1], Solutions to Chapter 4, Exercise 4.5.

¹⁸ The first order generalized derivative of a real valued multivariate function is a vector, the second order generalized derivative a matrix, the third order generalized derivative a matrix with multiple layers,...


Theorem: (Second Order Multivariate Taylor Approximation)

Let X ⊆ Rn be an open set and consider f ∈ C3(X). Then f is best approximated to the
second order around x̄ by the second order Taylor expansion:

$$f(\bar{x} + h) \approx f(\bar{x}) + \nabla f(\bar{x}) \cdot h + \frac{1}{2} h' \cdot H_f(\bar{x}) \, h$$

where h ∈ Rn is such that x̄ + h ∈ X. As ‖h‖ approaches zero, the remainder approaches
zero at a faster rate than ‖h‖² itself.
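The gain from adding the Hessian term can be seen on a small example. The sketch below assumes the function f(x1, x2) = exp(x1 + 2x2), the point x̄ = (0, 0), and the displacement h = (0.1, 0.05); all three are illustrative choices, and the gradient and Hessian are coded by hand for this particular f.

```python
import math

def f(x):
    # Illustrative function (an assumption): f(x1, x2) = exp(x1 + 2 * x2)
    return math.exp(x[0] + 2 * x[1])

def gradient(x):
    # Gradient of the illustrative f: (f, 2f)
    value = f(x)
    return [value, 2 * value]

def hessian(x):
    # Hessian of the illustrative f: [[f, 2f], [2f, 4f]]
    value = f(x)
    return [[value, 2 * value], [2 * value, 4 * value]]

def taylor(x_bar, h, order):
    # First order expansion, plus the quadratic term 0.5 * h' H h when order == 2
    approx = f(x_bar) + sum(g * hi for g, hi in zip(gradient(x_bar), h))
    if order == 2:
        H = hessian(x_bar)
        approx += 0.5 * sum(h[i] * H[i][j] * h[j] for i in range(2) for j in range(2))
    return approx

x_bar = [0.0, 0.0]
h = [0.1, 0.05]
exact = f([x_bar[0] + h[0], x_bar[1] + h[1]])
error_first = abs(exact - taylor(x_bar, h, 1))
error_second = abs(exact - taylor(x_bar, h, 2))
print(exact, error_first, error_second)
```

In this example the first order expansion gives 1.2 and the second order expansion gives 1.22, against an exact value of exp(0.2), so the second order error is markedly smaller.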

3 Convexity, Concavity, and Multivariate Real-Valued Functions

3.1 Definition

Our last effort in this chapter is devoted to the notions of convexity and concavity of
functions. Their importance stems from optimization and will thus be emphasized in the next
chapter. For now, allow me to simply proceed with the formal discussion.

Definition: (Convex Real Valued Function)

Let X ⊆ Rn. A function f : X → R is convex if and only if X is a convex set and for

any two x, y ∈ X and λ ∈ [0, 1] we have

$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$$

Moreover, if this statement holds strictly whenever y ≠ x and λ ∈ (0, 1), we say that f
is strictly convex.

Remark: Remember the epigraph story? And Jensen’s inequality? (Chapter 2, Exercise 3) If

yes, you probably understand that this definition of convex functions implies that the epigraph

of any convex function is convex. Show it!

Definition: (Concave Real-Valued Function)

Let X ⊆ Rn. A function f : X → R is concave if and only if −f is convex. Similarly f

is strictly concave if and only if −f is strictly convex.

Remarks: (i) Note that the definition of a concave real-valued function also requires that the
function be defined on a convex domain! (ii) Note that all affine functions are both convex
and concave! (iii) A concave function has a convex hypograph; a special case of this has
also been seen in Chapter 2, Exercise 3!

3.2 Useful Characterizations

You already know the geometrical meaning of convexity and concavity for univariate
real-valued functions: if one picks at random an x and a y in the domain of f and draws the
line segment between (x, f(x)) and (y, f(y)), it will lie weakly below or weakly above the
graph of f between x and y¹⁹. This segment will lie weakly above f if and only if f is
convex; it will lie weakly below if and only if f is concave. This is illustrated in Figure 5:

Figure 5: Concavity and convexity of univariate real-valued functions

Ideally, our generalized definition should preserve this geometrical insight. Does it? Let
us have a look at a simple convex function defined on X ⊂ R2, say, f(x1, x2) = x1² + x2². The
graph of f, i.e., the collection of ordered pairs (x, f(x)) for all x in the domain of f, lies in R3
(see Figure 6). If one draws the line segment that connects two points chosen at random on
that surface, somehow, it does appear to lie “above” the graph of the function. Yet, as a line
exhausts only one out of three dimensions, there are, in fact, two dimensions left free, and the
notion of “above” I just used isn’t as clear as that which I used when we were considering a
graph on the real plane²⁰. Therefore, if we wish to formalize the multidimensional geometric
intuition, we must first specify a plane, i.e., a subspace of dimension 2, and make sure that,
on that plane, the line lies above the graph, in the classical sense of the term “above”.

¹⁹ If you checked the prerequisite on basic logic, you have probably noted that I use or in the mathematical sense of the term, i.e., both may hold at once. It will lie weakly below and weakly above if and only if f is affine!

²⁰ In the real plane, a line is a hyperplane, i.e. an affine subspace whose dimension differs by only one unit from that of the universal space. “Above” is necessarily well defined, as only one dimension is left free outside of the line, and a simple convention suffices to distinguish an “above” from a “below”.


Figure 6: Lying “above” in 3 or more dimensions

Remember that, independently of the dimension of the input vectors x, y ∈ Rn, f is
assumed to be real-valued and, therefore, f(λx + (1 − λ)y) as well as λf(x) + (1 − λ)f(y)
belong to R. Therefore, the definition only makes a statement about some specific planes
of R^{n+1} (R3 in our example). Let e denote the vector in R^{n+1} which spans the output dimension.
The definition tells us the following: consider any plane with a basis containing e; then, the
restriction of f to that plane is a real-valued univariate convex function. More practically, we
are asked to consider any two points x and y in the domain of f. Also, the domain is required
to be convex. Therefore, the whole segment between x and y lies in the domain, and, further,
the vector z = y − x spans a line, {x + tz | t ∈ R}, that passes through both x and y. Combined
with e (the vector spanning the vertical axis), z generates a plane in the (n+1)-dimensional
space in which the graph of the restriction of f to that line, i.e., of t ↦ f(x + tz) on
{t ∈ R | x + tz ∈ X}, lies. If one focuses on this plane, one finds oneself
back in the simple situation of considering the graph of a univariate real-valued function in
the real plane, and the definition of convexity requires that this function be convex.

Figure 7: Multivariate Concavity

In fact, the converse can be shown to hold. That is, consider a multivariate real-valued
function f with convex domain X. If, for any line passing through X, the restriction of f to
this line is convex, then f is convex. This is stated formally in the following theorem.

Theorem: (Multivariate Convexity)

Let X be a convex subset of Rn. A real-valued function f : X → R is (strictly) convex

if and only if, for every x ∈ X and every z ∈ Rn that is different from zero, the function

g(t) = f(x+ tz) is (strictly) convex on {t ∈ R | x+ tz ∈ X}.

And since it must hold for every z ∈ Rn \ {0}, and each such z is in any case scaled by
a scalar t, we can, without loss of generality, require that it only holds for any unit vector
z ∈ Rn \ {0}²¹.

Corollary: (Multivariate Convexity bis)

Let X be a convex subset of Rn. A real-valued function f : X → R is (strictly) convex
if and only if, for every x ∈ X and every z ∈ {z ∈ Rn \ {0} | ‖z‖ = 1}, the function
g(t) = f(x + tz) is (strictly) convex on {t ∈ R | x + tz ∈ X}.

Endowed with this geometrical insight, one can derive a convenient characterization of
convexity which applies to functions of class C2(X). Therefore, assume from now on that f is
twice continuously differentiable. By considering the restriction of f to the line spanned by z,
we implicitly defined a univariate function g of t:

g(t) = f(x+ tz)

where t ∈ R and x, z ∈ Rn such that ‖z‖ = 1. Thus, by definition of univariate derivatives,
we have that:

$$g'(t) = \lim_{h \to 0} \frac{g(t + h) - g(t)}{h}$$

Further, if t = 0, g(0) = f(x) and if t ≠ 0, then g(t) evaluates f at a distance |t| from x in the
direction specified by z. This should remind you of the directional derivative, for, indeed,
g′(t) evaluated at 0 coincides with the directional derivative at x in the direction of z:

$$g'(0) = \lim_{t \to 0} \frac{g(t) - g(0)}{t} = \lim_{t \to 0} \frac{f(x + tz) - f(x)}{t}$$

Or, more concisely,

g′(0) = Dzf(x)

In terms of generalized derivatives, this is equivalent to g′(0) = Df(x) · z (= ∇f (x) · z).

²¹ Put differently, what matters for us in the z is the direction that it defines, not its sense nor its magnitude, for both of these aspects are subsumed in the multiplication of z by a scalar t!


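From there, one can reevaluate the requirement that g, i.e., the restriction of f to a specified
line, be (strictly) convex. More precisely, g is convex if and only if ∀ t ∈ {t ∈ R | x + tz ∈ X},
g′′(t) ≥ 0 (and a strict inequality for all such t is sufficient for strict convexity), where:

$$g''(t) = \frac{d}{dt} g'(t) = \frac{d}{dt} \left[ \nabla f(x + tz) \cdot z \right] = z' \cdot H_f(x + tz) \cdot z$$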

Evaluating at zero, and recalling that x itself is arbitrary, we find that what we are requiring
for convexity is simply that:

∀ x ∈ Int(X) ∀ z ∈ {z ∈ Rn \ {0} | ‖z‖ = 1}, z′ · Hf(x) · z (>) ≥ 0.

And, again, realizing that scaling z by any nonzero scalar will not affect the sign of this
expression, we have that this requirement is equivalent to the following one:

∀ x ∈ Int(X) ∀ z ∈ Rn \ {0}, z′ · Hf(x) · z (>) ≥ 0

This suggests a third characterization of convexity:

Corollary: (Multivariate Convexity ter)

Let X be a convex subset of Rn. A real-valued function f : X → R that is also an element

of C2(X) is convex if and only if, Hf (x) is positive semidefinite for all x ∈ Int(X).

Further, if Hf (x) is positive definite for all x ∈ Int(X), then f is strictly convex.
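For a function of two variables, the Hessian criterion is easy to check by hand or by machine. The sketch below assumes two illustrative functions with constant Hessians, x1² + x2² and x1² − x2², and tests positive semidefiniteness through the closed-form eigenvalues of a symmetric 2 × 2 matrix; the helper names are hypothetical.

```python
import math

def eigenvalues_2x2(H):
    # Closed-form eigenvalues of a symmetric 2 x 2 matrix [[a, b], [b, d]]
    a, b, d = H[0][0], H[0][1], H[1][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius

def is_positive_semidefinite(H, tol=1e-12):
    # A symmetric matrix is positive semidefinite iff its smallest eigenvalue is >= 0
    return eigenvalues_2x2(H)[0] >= -tol

def quadratic_form(H, z):
    # Computes z' H z
    return sum(z[i] * H[i][j] * z[j] for i in range(2) for j in range(2))

H_bowl = [[2.0, 0.0], [0.0, 2.0]]     # Hessian of x1^2 + x2^2 (the same at every x)
H_saddle = [[2.0, 0.0], [0.0, -2.0]]  # Hessian of x1^2 - x2^2 (the same at every x)

print(is_positive_semidefinite(H_bowl))      # convex
print(is_positive_semidefinite(H_saddle))    # not convex
print(quadratic_form(H_saddle, [0.0, 1.0]))  # a direction z along which z' H z < 0
```

When the Hessian varies with x, the same check would have to hold at every point of the interior of the domain, which a finite sample of points can only suggest, not prove.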

Remark 1: Please note that the characterization only applies to C2 functions. Not every

convex function is differentiable though! For instance, the absolute value, a convex function,

is not differentiable at 0.

Remark 2: We will not show it here, but a convex real-valued function defined on a
finite-dimensional convex domain X is continuous on Int(X).

Symmetric characterizations can be derived for concave functions.

3.3 Quasi-concavity, Quasi-convexity

In some settings, concavity or convexity are just too demanding. For instance, in some cases,
the only reasonable assumption one can make is that the function of interest is increasing or
decreasing. Yet, being increasing or decreasing guarantees neither convexity nor concavity.
Further, concavity and convexity need not be preserved by monotonic transformations, which
constitutes an undesirable feature when we think that most of economists’ work is based on
ordinal preferences. Therefore, it is interesting to define classes of functions which (i) preserve
the most important properties of concave and convex functions and (ii) include all increasing
and decreasing functions and have properties which are preserved under monotonic
transformations. Such functions are known respectively as quasi-concave and quasi-convex functions.


As you will see in the next chapter, the convexity of the upper-level set (for concave functions)
and convexity of the lower-level set (for convex functions)²² are the specific characteristics
of concave and convex functions one would wish to preserve. As multivariate convexity and
concavity can be reduced to univariate ones, let me illustrate these concepts for the univariate case.
If one considers a convex function and draws a horizontal line (a “level” line) through it,
the set of elements x in the domain with an image below this line is called a lower-level set
of the function and is convex. Similarly, if one considers a concave function and draws a
horizontal line (a “level” line) through it, the set of elements x in the domain with an image
above this line is called an upper-level set of the function and is convex.

Figure 8: Convexity and Concavity via lower- and upper-level sets

Quasiconvexity and quasiconcavity are defined so as to preserve precisely these two qualities:

Definition: (Quasiconvexity, Quasiconcavity)

Let X be a convex subset of Rn. A real-valued function f : X → R is quasiconvex if
and only if, for every c ∈ R, the set

$$L^-_c := \{x \mid x \in X, f(x) \le c\},$$

also referred to as a lower-level set of f, is convex. If f is such that, for every c ∈ R,

$$L^+_c := \{x \mid x \in X, f(x) \ge c\},$$

also referred to as an upper-level set of f, is convex, then f is said to be quasiconcave.

In Exercise 11, you are asked to show that the following useful characterization holds:

²² I will introduce level sets formally in the next chapter only, for now you just have to grasp the geometric intuition. PLEASE DO NOT CONFUSE UPPER-LEVEL SET AND EPIGRAPH, NOR LOWER-LEVEL SET AND HYPOGRAPH! Level sets lie in the domain! Graphs lie in the Cartesian product of the domain and the codomain!


Definition: (Quasiconvexity, Quasiconcavity)

Let X be a convex subset of Rn. A real-valued function f : X → R is quasiconvex if
and only if

$$\forall x, y \in X \ \ \forall \lambda \in [0, 1] \quad f(\lambda x + (1 - \lambda) y) \le \max\{f(x), f(y)\}$$

If f is such that

$$\forall x, y \in X \ \ \forall \lambda \in [0, 1] \quad f(\lambda x + (1 - \lambda) y) \ge \min\{f(x), f(y)\}$$

then f is said to be quasiconcave.
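The max-characterization lends itself to a brute-force check on a grid. The sketch below is only suggestive, since a finite grid can reveal a violation but cannot prove quasiconvexity; the two test functions, √|x| (quasiconvex but not convex) and x⁴ − 2x² (a hump between two valleys), as well as the helper name, are assumptions chosen for illustration.

```python
import math

def violates_quasiconvexity(func, points, lambdas, tol=1e-12):
    # Searches the grid for x, y, lambda with f(lambda*x + (1-lambda)*y) > max{f(x), f(y)}
    for x in points:
        for y in points:
            for lam in lambdas:
                mid = lam * x + (1 - lam) * y
                if func(mid) > max(func(x), func(y)) + tol:
                    return True
    return False

def root_abs(x):
    # Quasiconvex on R, but not convex
    return math.sqrt(abs(x))

def hump(x):
    # Two valleys at x = -1 and x = 1 with a hump at x = 0: not quasiconvex
    return x ** 4 - 2 * x ** 2

points = [i / 10 for i in range(-20, 21)]
lambdas = [i / 10 for i in range(11)]

print(violates_quasiconvexity(root_abs, points, lambdas))  # no violation found on the grid
print(violates_quasiconvexity(hump, points, lambdas))      # violation found
```

For the second function, taking x = −1, y = 1, and λ = 1/2 gives f(0) = 0 > −1 = max{f(−1), f(1)}, which is exactly the kind of violation the search detects.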

This generalization allows us to include some non-convex and non-concave functions while
ruling out overly messy functions, such as “camel backs” (see Figure 9). All convex functions
are quasi-convex. All concave functions are quasi-concave. Linear functions are both
quasi-concave and quasi-convex, but they are not the only functions with this property: functions
that are both quasi-convex and quasi-concave are called quasi-linear functions²³. Monotonic
functions of a single variable are another instance of quasi-linear functions.

Figure 9: Quasiconvexity and Quasiconcavity

Remark: Beware! Contrary to convexity and concavity, which imply continuity on the

interior of the domain, quasi-concave and quasi-convex functions need not be continuous!

The floor function is an example of a discontinuous quasi-convex function! However, if a

quasi-convex (quasi-concave) function is continuous, then the lower- (respectively upper-)

level set is not only convex but also closed!

²³ An unfortunate term for economists, because it should absolutely not be confused with quasi-linear preferences / quasi-linear utility functions!!


A Appendix - Homogeneity

Homogeneous functions constitute an important class of functions in economics, and
especially in microeconomics. For instance, in consumer theory, you will see that homothetic
preferences (see def. 3.B.6 in MWG), which happen to be very useful in applications, must
admit a utility representation that is homogeneous (see Exercise 3.C.5 in MWG²⁴), that the
Walrasian demand function is homogeneous of degree 0 in its two inputs, namely, price and
income, etc. Similarly, in macroeconomics, the neoclassical growth model assumes a CRS
production technology, i.e., a production function that exhibits constant returns to scale.
Again, this CRS property is mathematically interpreted as a requirement of homogeneity
of degree one, also called linear homogeneity. Hence, we shall conclude this chapter with a
presentation of the concept of homogeneity together with the associated Euler theorem.

Definition: (Homogeneous Function)

Let X be an open coneᵃ in Rn. Let f : X → R be of class C1. Then, f is homogeneous
of degree k in X if and only if

$$f(\lambda x) = \lambda^k f(x) \quad \forall x \in X, \ \forall \lambda > 0$$

where k ∈ N.

ᵃ A cone preserves conic combinations, i.e. if X is a cone, then, for any x ∈ X and λ ∈ R+, λx ∈ X.
Cf. Chap. 2, Ex. 5.

Hence, a homogeneous function is a function that displays a property that is sometimes
called “scale invariance”: if we simultaneously change all of its inputs by a common factor
λ, then the output is simply scaled by some given factor, namely, λ^k.

Remark 1: Note that any linear function is by definition homogeneous of degree one!

Remark 2: As we now deal with multivariate functions, we may sometimes want to express

homogeneity in only a subset of the inputs. It is possible to do so and I do it here for a

simple example. The Hicksian demand function, denoted h(p, u) where p denotes the price

vector, and u the minimum utility level to be achieved, is known to be homogeneous of degree

zero in prices, that is:

$$\forall \lambda > 0 \quad h(\lambda p, u) = \lambda^0 h(p, u) = h(p, u)$$

To conclude, the Euler theorem is an important characterization of homogeneous
functions that will either help you check the homogeneity of a function or help you draw some
interesting consequences out of homogeneity.

²⁴ By the way, notice a typo in that exercise, it need only have a homogeneous representation, not necessarily one that is homogeneous of degree one. This makes sense: in mathematics a function is called homothetic if it is an increasing transformation of a homogeneous function.


Theorem: (Euler’s Theorem)

Let X be a cone in Rn. A real valued function f : X → R of class C1 is
homogeneous of degree k in X if and only if

$$k f(x) = \nabla f(x) \cdot x \quad \forall x \in X$$

where k ∈ N. Moreover, if f is homogeneous of degree k, then its partial derivatives are
themselves homogeneous of degree (k − 1).

Proof: See exercise 8.

Remark 1: Note that for linearly homogeneous functions, this implies:

f(x) = ∇f (x) · x
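Euler's theorem can be illustrated with a constant returns to scale production function. The sketch below assumes a Cobb-Douglas form f(K, L) = K^0.3 · L^0.7, which is homogeneous of degree one; the functional form, the exponent, and the evaluation point are illustrative assumptions, not taken from the notes.

```python
def f(x, alpha=0.3):
    # Illustrative Cobb-Douglas function (an assumption): f(K, L) = K^alpha * L^(1 - alpha)
    capital, labor = x
    return capital ** alpha * labor ** (1 - alpha)

def gradient(x, alpha=0.3):
    # Analytical partial derivatives of the Cobb-Douglas function above
    capital, labor = x
    return [alpha * capital ** (alpha - 1) * labor ** (1 - alpha),
            (1 - alpha) * capital ** alpha * labor ** (-alpha)]

x = [4.0, 9.0]
lam = 2.5
x_scaled = [lam * xi for xi in x]

scaled_output = f(x_scaled)                                 # f(lambda * x)
euler_rhs = sum(g * xi for g, xi in zip(gradient(x), x))    # gradient(x) . x

print(scaled_output, lam * f(x))          # homogeneity of degree one
print(euler_rhs, f(x))                    # Euler's theorem with k = 1
print(gradient(x_scaled), gradient(x))    # partials are homogeneous of degree zero
```

Each printed pair should coincide up to rounding: scaling both inputs by λ scales output by λ, the gradient times x returns f(x), and the partial derivatives are unaffected by the scaling.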


References

[1] De la Fuente, A. Mathematical Methods and Models for Economists. Cambridge University Press, 2000.

[2] Gross, H. Res.18.006 Calculus Revisited: Single Variable Calculus (MIT OpenCourseWare: Massachusetts Institute of Technology), 2010.

[3] Jehle, G. A., and Reny, P. J. Advanced Microeconomic Theory (third edition). Pearson Education, 2011.

[4] Luenberger, D. G. Optimization by Vector Space Methods. John Wiley & Sons, 1969.

[5] Simmons, G. F. Introduction to Topology and Modern Analysis, vol. 3. McGraw-Hill New York, 1963.
