Chapter 3 - Multivariate Calculus∗
Justin Leduc†
These lecture notes are meant to be used by students entering the University of Mannheim
Master program in Economics. They constitute the base for a pre-course in mathematics;
that is, they summarize elementary concepts with which all of our econ grad students must
be familiar. More advanced concepts will be introduced later on in the regular coursework.
A thorough knowledge of these basic notions will be assumed in later
coursework. No prerequisites beyond high school mathematics are required.
Although the wording is my own, the definitions of concepts and the ways to approach them
are strongly inspired by various sources, which are mentioned explicitly in the text or at the
end of the chapter.
∗This Version: September 9th, 2015
†Center for Doctoral Studies in Economic and Social Sciences. Contact: [email protected]
Contents
1 Introduction
1.1 What are derivatives?
1.2 Why do we use derivatives?
2 Multivariate Functions, Generalized Derivatives
2.1 Partial Derivatives and the Gradient
2.2 Directional Derivatives
2.3 Multivariate Derivative
2.4 Functions Mapping to High-Dimensional Spaces
2.5 Higher Order Partial Derivatives and the Taylor Approximation Theorems
3 Convexity, Concavity, and Multivariate Real-Valued Functions
3.1 Definition
3.2 Useful Characterizations
3.3 Quasi-concavity, Quasi-convexity
A Appendix - Homogeneity
1 Introduction
Wikipedia provides a good explanation of what calculus actually is about:
“Calculus is the mathematical study of change, in the same way that geometry is
the study of shape and algebra is the study of operations and their application
to solving equations. It has two major branches, differential calculus (concerning
rates of change and slopes of curves), and integral calculus (concerning accumu-
lation of quantities and the areas under and between curves); these two branches
are related to each other by the fundamental theorem of calculus, [...] [which]
states that differentiation and integration are inverse operations.”
In this chapter, we will focus on the first branch of calculus, that is, differential calculus.
Integration is obviously important too, but it is technically more involved, and a whole
course would be needed to carefully cover even its basics. If you are not familiar with
integration, it is therefore probably more appropriate – for the moment! – to simply look at
some cookbook-style lecture notes about integrals and how to perform integration, rather
than to dig into the difficult theory of measure and integration. Although I will not cover
this topic here, please note that knowing how to do basic integration, and especially simple
integration by parts, will come in handy during the year1.
1.1 What are derivatives?
In this section, we seek to understand what is precisely meant by ”the mathematical study
of change”. We consider the very simple case of real valued functions of a single variable.
Namely, let f be a function with domain X in R and codomain Y in R, let x̄ ∈ R, and
assume we are asked what the instantaneous rate of change, or slope, of f at x = x̄ is. The
question isn’t particularly easy to answer. Well, a general piece of advice when one doesn’t
know how to solve a mathematical problem is the following: first try to look for a slightly
different problem which you know how to solve. With a bit of luck, answering this second
problem will help you find a solution to the first one. In the present case, for instance, one
could try to answer first the following question: what is the average rate of change of f from
x̄ to x̄+ h, h ∈ R? This second question is more familiar to us: we just compute the ratio
of the change in f to the change in x:

∆f/∆x = [f(x̄+ h)− f(x̄)] / h
Could one use the answer to the second question as an answer for the first question? Well,
in many cases, if h is large, it would be hard to claim that the average rate of change is
a satisfactory answer to the first question. If h is set equal to zero, the situation is even
1A very good website on which you can speed up acquisition of such knowledge is that of Paul Dawkins: http://tutorial.math.lamar.edu/
worse: we end up with the indeterminate form 0/0. A second piece of advice when facing a
tough mathematical problem: in low-dimensional spaces, geometry can help us! Let us try
to picture graphically what happens as we shrink h. For instance, in the following figure, we
started with a large h1 and then considered a smaller h2:
Figure 1: Average Rate of Change – an Answer to Instantaneous Rate of Change?
Intuitively, the smaller h becomes, the more one can feel satisfied with using the average rate
of change as an approximate answer for the question of the instantaneous rate of change.
And yet, h should never reach zero. This is reminiscent of our limit concept! When asked
about the slope of a function f at a given point x̄, one should try to see whether the following
limit is well defined:
lim_{h→0} [f(x̄+ h)− f(x̄)] / h
If the limit exists, we denote it f ′(x̄) and call it the derivative of f at x̄. f is then said to be
differentiable at x̄. The derivative of f at x̄, provided it exists, is the best answer one has to
the question about the instantaneous rate of change of f at x̄. A function differentiable at
every point of its domain is simply called differentiable.
Remark: Note that derivatives, provided they exist, are unique. Indeed, it is easy to show
that, in a metric space, limits are unique whenever they exist: start by assuming that two
limits exist, then use the triangle inequality to show that they must be the same! This
remark also applies to all the more general concepts of derivative we will study in this chapter.
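The limit above is easy to watch numerically. The sketch below is my own illustration (the function f(x) = x² and the point x̄ = 3 are assumed for the example, so f′(x̄) = 6): as h shrinks, the average rate of change approaches the derivative.

```python
# Illustration (assumed example, not from the text): the average rate of
# change [f(xbar + h) - f(xbar)] / h approaches the derivative as h shrinks.
# Here f(x) = x**2 at xbar = 3, so the true derivative is 2*3 = 6.
def f(x):
    return x ** 2

xbar = 3.0
for h in [1.0, 0.1, 0.01, 0.001]:
    avg_rate = (f(xbar + h) - f(xbar)) / h
    print(h, avg_rate)  # the second column approaches 6
```

Note that h never reaches zero in the loop; the code only exhibits the convergence that the limit formalizes.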
1.2 Why do we use derivatives?
Why would we care about the possibility to formally define the instantaneous rate of change
of a function f? The answer to this question is linked to the idea that motivated the
introduction of vector spaces and is twofold: (i) most functions we investigate have a domain
(and sometimes also a codomain) that lies in a high-dimensional vector space, i.e., a vector
space that cannot be pictured geometrically; (ii) the instantaneous rate of change, if defined,
gives us detailed information about the local behavior of the function, and thereby allows us,
in such spaces, to analytically see what we cannot geometrically see2. As a result, while
a rigorous, analytical definition of the instantaneous rate of change appears redundant when
working with low-dimensional functions – i.e., functions with domain and image within Rn,
n ≤ 3 – it becomes our only tool in high dimensional spaces.
The best way to convince you is to tell you what I precisely mean by “the derivative gives
us detailed information about the local behavior of a function”. Let f be a function with
domain and codomain in R. The existence and value of the derivative of a function f gives
us three important pieces of information about f :
• Let f be differentiable at x̄; then f is continuous at x̄.
Proof:

f ′(x̄) := lim_{x→x̄} [f(x)− f(x̄)] / (x− x̄)

Thus, f ′(x̄) well-defined ⇒ ∃ε > 0 such that, ∀x ∈ Bε(x̄), [f(x)− f(x̄)] / (x− x̄) is
well defined. Let x ∈ Bε(x̄); then f(x)− f(x̄) = ([f(x)− f(x̄)] / (x− x̄)) · (x− x̄).
Let us take the limit of this expression:

lim_{x→x̄} [f(x)− f(x̄)] = lim_{x→x̄} [(f(x)− f(x̄)) / (x− x̄)] · lim_{x→x̄} [x− x̄] = f ′(x̄)× 0 = 0,

which is just what is required for continuity, as defined in the last chapter.
• Let f be differentiable at x̄, then there exists a good “linear approximation”
of f in a neighborhood3 of x̄.
We like linear functions because they are simple and we know how they work.
Unfortunately, the functions involved in our applications are unlikely to be linear. A
good solution, then, is often to choose a function that has the properties required for
the application and that is differentiable on the interval of interest. Such functions are
2Providing a definition of “analytical” isn’t straightforward. The message I wish to convey here is the following: symbols and equations are our only eyes in high-dimensional spaces!
3A “neighborhood” of x̄ is simply defined as a subset that contains an ε-open ball centered at x̄ for some ε > 0.
“locally linear”, in the sense that we can locally assimilate their behavior to that of
linear functions.
More formally, the tangent of a function f : X ⊆ R → R at x̄ is the following linear
function:
x ↦ f(x̄) + f ′(x̄)(x− x̄)
Note that it is defined everywhere on R, as x̄, f(x̄) and f ′(x̄) are real numbers. To
see why the tangent is indeed a good approximation of f in a neighborhood of x̄,
let x ∈ Bε(x̄) and let ε(x) be our approximation error when using the tangent to
approximate f at x. Then
ε(x) / (x− x̄) := [f(x)− f(x̄)− f ′(x̄)(x− x̄)] / (x− x̄) = [f(x)− f(x̄)] / (x− x̄) − f ′(x̄)

Therefore, by definition of the derivative,

lim_{x→x̄} [ε(x) / (x− x̄)] = 0
In words, when x gets close to x̄, the approximation error becomes insignificant, in the
sense that it is much smaller than even the discrepancy between x and x̄. Geometrically,
the tangent may be seen as the limit of secants4 of f :
Figure 2: The Tangent as the Limit of Secants
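The claim that the error vanishes faster than (x − x̄) can be watched numerically. A minimal sketch, assuming f(x) = x³ and x̄ = 1 (so f′(x̄) = 3) as my own example:

```python
# Sketch (assumed example): the tangent-line error
# eps(x) = f(x) - [f(xbar) + f'(xbar)(x - xbar)] vanishes faster than (x - xbar),
# i.e., the ratio eps(x) / (x - xbar) goes to zero. Here f(x) = x**3, xbar = 1.
def f(x):
    return x ** 3

xbar, fprime = 1.0, 3.0
for x in [1.5, 1.1, 1.01, 1.001]:
    eps = f(x) - (f(xbar) + fprime * (x - xbar))
    print(x, eps / (x - xbar))  # the ratio shrinks toward 0
```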
• Let a and b be real numbers such that a < b. If f is continuous and differ-
entiable on (a,b), then:
(i) f ′(x) = 0 for all x ∈ (a,b) iff f is constant on (a,b).
(ii) if f ′(x) < 0 for all x ∈ (a,b), then f is decreasing on (a,b).
(iii) if f ′(x) > 0 for all x ∈ (a,b), then f is increasing on (a,b).
(The converses of (ii) and (iii) hold only with weak inequalities: f(x) = x3 is
increasing on any interval although f ′(0) = 0.)
4A secant is simply a line going through two specified points of a curve.
Think again of our tangent equation. Replacing f ′(x) by any null, negative, or pos-
itive number at all points of (a, b) should convince you of the above fact. Picture it
geometrically!
Each of these geometrical insights is of great importance when optimizing functions with
domain or codomain in a high-dimensional space. Only because of them can we claim that
analytical formulas serve as substitute eyes in such spaces. But before they do, we must
make sure we generalize them in a proper way. Therefore, the purpose of this chapter
is to generalize the concept of a derivative in a manner that preserves precisely
these pieces of information that the derivative of f at a point delivers to us.
Although such a generalization exists for infinite-dimensional spaces (e.g. function spaces),
and happens to be very close to the generalization for finite-dimensional spaces (e.g. Rn,
n ∈ N), we restrict ourselves to finite-dimensional vector spaces. The reason is twofold: (i)
you will not be expected to be able to work with infinite dimensional spaces in the master’s
curriculum, (ii) if you grasp the generalization for finite dimensional spaces, then you will
easily grasp that for infinite dimensional spaces as it is presented in books5.
Remark 1: In some situations, it is more convenient to not only have the instantaneous rate
of change but directly an approximation of the level of change. For this reason, we define
another function df : X × R→ R, called the total differential as follows:
df(x, dx) = f ′(x)dx
where dx denotes the change in x6. It is important to note that the total differential is
a function of two variables, not a single-variable function like the derivative. Its output
approximates the change in f when the argument moves by dx away from x. Geometrically,
we can evaluate the total differential at x̄ and plot the resulting one-variable function of dx,
which is exactly the tangent to f at x̄.
Remark 2: The following notation is sometimes observed:

(df/dx)(x) := f ′(x)

It is important not to confuse this df/dx with the total differential of f . The above is an
alternative notation for the derivative, not for the differential. Thus, we have:

df(x, dx) = (df/dx)(x) dx

where the dx terms cannot be canceled in the last expression! (One is just a notation, the
other is a variable of its own!)
5For studying extensions of optimization techniques to infinite dimensional vector spaces, and the theory of optimization by vector space methods in general, I strongly recommend Luenberger’s classic [4].
6dx replaces the previous notation h.
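Remark 1 is easy to make concrete. In the sketch below (the choice f(x) = eˣ, with f′(x) = eˣ, is my own example), the total differential f′(x) dx approximates the actual change f(x + dx) − f(x), and the approximation improves as dx shrinks:

```python
import math

# Sketch (assumed example): total differential df(x, dx) = f'(x) dx versus
# the actual change f(x + dx) - f(x), for f(x) = exp(x), so f'(x) = exp(x).
def f(x):
    return math.exp(x)

x = 0.0
for dx in [0.5, 0.1, 0.01]:
    df = f(x) * dx                # the total differential evaluated at (x, dx)
    actual = f(x + dx) - f(x)     # the true change in f
    print(dx, df, actual)         # df and actual get closer as dx shrinks
```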
2 Multivariate Functions, Generalized Derivatives
Before we start, let us make sure our vocabulary is aligned. In this chapter,
the term vector is to be understood in its classical sense, i.e., as a column vector composed
of real numbers, unless otherwise specified. The use of a row vector will thus be explicitly
mentioned via a transposition symbol such as a ′. A vector of dimension n× 1 is said to be
of length n. A vector of length 1 is a scalar, i.e., a real number.
I required as a prerequisite that you be comfortable with the concept of a function, f , its domain,
X, its codomain, Y , and its image, f(X). If Y ⊆ R, i.e., if f maps into the real line, we say
that f is a real valued function. If X ⊆ R, i.e., if f takes real numbers as inputs, we call f
a function of a single variable, or univariate function. Finally, if X ⊆ Rn, n > 1, i.e., if f
takes as input vectors of length n, n > 1, we call f a multivariate function. We will derive
the insights of the generalization using real-valued multivariate functions, i.e., functions with
domain Rn, n > 1 and codomain in R. Then, we will discuss functions going from Rn to
Rm, with n > 1 and m > 1.
Throughout the discussion, we will try to exploit geometric intuition as much as we can.
The most common way to geometrically represent a function is to plot its graph. Let f
be a function with domain X, codomain Y , and image f(X). Analytically, the graph of a
function is the collection of ordered pairs (x, f(x)), with x ∈ X and f(x) ∈ Y . Hence, the
graph is a subset of the cartesian product X × Y , a space with dimension dim(X × Y ) =
dim(X)+dim(Y ). In the next chapter, we will discuss a different geometrical representation
of functions which is slightly less demanding in terms of dimension.
Finally, let us come back to the promised discussion on inverse functions. Let f be a
function with domain X, codomain Y , and image f(X). In the preliminary chapter, I said,
that, under some conditions, one can define an inverse function f−1 mapping from Y to X,
and such that f−1(f(x)) = x. What precisely are these conditions? There are two of them,
both quite intuitive. The first one is that we want the inverse to be well defined over the
codomain Y , i.e., we want it to be defined for every element y in Y . This will not be the
case if the image of f , f(X), is a proper subset of Y , for then we would not know what
value to associate to f−1(y) for y ∈ Y \ f(X). A first natural condition, then, is to require
that the image and the codomain of f coincide, i.e., f(X) = Y . If a function satisfies this
condition, it is said to be onto7. In other words, if f is onto, then, for any element, y,
that we pick in the codomain, Y , of f , there exists an x, in the domain, X, of f , such that
f(x) = y. It is a condition on the codomain of f 8. The second requirement is intuitive
too. Namely, we know that for any function f , if we take an element x in its domain X,
7a.k.a. surjective.
8Note that this is not a very demanding condition, as it suffices to properly choose the codomain of our function.
then, by definition, there exists a unique y in the image, f(X), such that y = f(x). The
converse, however, need not be true: take an element y in f(X) and consider its set of
antecedents, i.e., the set of x such that f(x) = y; there is no a priori reason for this set
to be a singleton9. But if the converse is not true, then, once we consider the inverse mapping
f−1, we will end up having several images associated to a single y in f(X). That, in turn,
would disqualify our inverse map from the label of “function”. The second condition, then,
is that the converse holds, i.e., that every distinct element x of the domain X maps into a
distinct element y of the image f(X). If that is the case, we say that the function f is one-to-one10.
A function possesses an inverse function if and only if it is one-to-one and onto11.
See exercises 3 and 4.
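For finite sets, the two conditions can be checked mechanically. A toy sketch (the particular sets and mapping are my own choices): an inverse exists exactly when the map is one-to-one and onto.

```python
# Toy example (my own choice of sets): check one-to-one and onto on finite
# sets, then build the inverse function when both conditions hold.
domain = [1, 2, 3]
codomain = ["a", "b", "c"]
f = {1: "a", 2: "b", 3: "c"}            # the function, as a lookup table

image = {f[x] for x in domain}
one_to_one = len(image) == len(domain)  # distinct inputs give distinct outputs
onto = image == set(codomain)           # the image fills the codomain

if one_to_one and onto:
    f_inv = {y: x for x, y in f.items()}
    print(all(f_inv[f[x]] == x for x in domain))  # f_inv(f(x)) = x → True
```

Replacing the codomain by ["a", "b", "c", "d"] breaks the onto condition, and mapping two inputs to "a" breaks one-to-one: in either case no inverse function exists.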
2.1 Partial Derivatives and the Gradient
Let f : X ⊆ Rn → R be a function of n independent variables12. The simplest way to
proceed with our generalization is to simply transform that function into functions of one
variable and to use our usual single-variable notion of derivation. We do this by “consciously
forgetting” that actually n variables could be changing at the same time and instead focus
on what happens when only one of the n variables is changing at a time. This process is
formally stated in the following definition.
Definition: (Partial Derivative)
Let X ⊆ Rn and suppose f : X → R. If x̄ is an interior point of X, then, the partial
derivative of f with respect to xi at x̄ is defined as
∂f(x̄)/∂xi := lim_{h→0} [f(x̄1, ..., x̄i−1, x̄i + h, x̄i+1, ..., x̄n)− f(x̄1, ..., x̄i, ..., x̄n)] / h
with h ∈ R, whenever the limit exists. Another common notation for the partial deriva-
tive of f with respect to xi at x̄ is fi(x̄).
9A singleton is a set that contains a single element.
10a.k.a. injective. Note that this is a much more demanding requirement. Think, for instance, of the constant function. There is no way you will ever turn it into a one-to-one map.
11a.k.a. bijective.
12I distinguish here three cases. Independent variables are variables which can move in any direction without having any impact on each other – for instance, two independently distributed random variables in an experiment. Indirectly dependent variables are variables which cannot freely move without impacting the other – for instance, the amount of goods in an economy and the happiness of individuals. Finally, directly dependent variables are two variables for which the move of one totally determines the move of the other. Directly dependent variables are not interesting for us, though, because we can easily get rid of them by expressing one as a function of the other, reducing the dimension of our problem.
Hence, the idea behind the partial derivative with respect to xi at x̄ is to consider all (n−1)
xj’s, j ≠ i, as fixed and to treat f(·) as a function of xi alone. There are thus – at most! – n
partial derivatives, one for each of the variables. Note that a partial derivative, if defined,
is defined at a given point, x̄. Moreover, not only does the value of x̄i matter, but also
the values at which one fixes the xj, j ≠ i. This is illustrated by the following example: let
f(x1, x2) = x1x2; then
∂f(x̄)/∂x1 = x̄2
which obviously depends on where we fix x2.
Figure 3: Partial derivative of f(x1, x2) with respect to x1 at x̄
Remark: In the above figure, the partial derivative depends on the second variable and not
on the first one (that is why I could draw the tangent line without mentioning which x̄1 I
was picking). It may as well depend on all variables, or on none of them, as in the following
example: compute the partial derivative of f(x1, x2) = x1 + x2 with respect to x2 at x̄:
∂f(x̄)/∂x2 = 1
If the partial derivative of f with respect to xi is defined at x̄, we say that f is partially
differentiable at x̄ with respect to xi. If the partial derivative of f with respect to xi is defined
at every point of f ’s domain, we say that f is partially differentiable with respect to xi. In
such a case, as is conventional with univariate derivatives, the partial derivative can be seen
as a function from X to R.
To conclude the section – and sprinkle a bit of mystery! – consider the row vector whose
entries are the partial derivatives of f at x̄:

∇f (x̄) := (f1(x̄) f2(x̄) · · · fn(x̄))

Such a vector is called the gradient of f at x̄. It is an extremely important vector; section
2.3 will make you understand why!
See exercise 5.
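The definition is easy to apply numerically. A sketch (my own, reusing the example f(x1, x2) = x1x2 from above): each partial derivative is a one-variable difference quotient with the other variable held fixed, and the two together form the gradient.

```python
# Sketch: partial derivatives of f(x1, x2) = x1 * x2 (the example from the
# text) via one-variable difference quotients, assembled into the gradient.
def f(x1, x2):
    return x1 * x2

def gradient(f, x1, x2, h=1e-6):
    df_dx1 = (f(x1 + h, x2) - f(x1, x2)) / h  # x2 held fixed
    df_dx2 = (f(x1, x2 + h) - f(x1, x2)) / h  # x1 held fixed
    return (df_dx1, df_dx2)

print(gradient(f, 2.0, 5.0))  # approximately (5.0, 2.0), i.e. (x̄2, x̄1)
```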
2.2 Directional Derivatives
Now, it is important to realize that we did not proceed without loss of generality when
generalizing the concept of derivation to multivariate functions. Remember, we “consciously
forgot” that all variables could move at the same time! That is, our generalization could
only claim to be proper for independent variables. In practice, variables often are indirectly
dependent or simply happen to change simultaneously, and we may wish to evaluate the
instantaneous rate of change of f if several of our variables change simultaneously. Let us
consider first the geometric intuition. In the case of a derivative for univariate functions it
was possible to move away from the point in any desirable direction, for the simple reason
that there was a unique direction along which to move, namely, that of the real line13! Then,
the derivative was indicating the rate of change in our function as we move away from x
along the real line, i.e., all directions we could think of moving along. Yet, as soon as we
have two or more dimensions, there are actually infinitely many possible directions along
which one could move14! Depending on the “geography” of our multivariate function, which
direction we focus on could matter! In the definition of the partial derivative, by requiring
that all xj’s, j ≠ i, be fixed, we imposed that the direction along which to evaluate the rate
of change be the one specified by the xi-axis (as drawn in Figure 3, where we picked the
direction of x1!).
Arguably, the real generalization of our derivative concept for univariate functions should
not impose any restriction on the direction along which to evaluate the rate of change. This
generalization exists and is called a multivariate derivative or more simply, derivative. But,
before we get there, we should first look at an intermediate concept, called the directional
derivative, which, if it exists, will give us the rate of change in any specified direction –
13Please do not confuse direction and sense! A direction simply indicates a line along which we move; the sense in which we move, however, indicates toward which side of the line we are moving.
14If you have trouble picturing that, simply imagine a single point within a 2-dimensional space (i.e., a simple plane, like a piece of paper), and start counting the number of lines that you can draw going through this point. The moment when you stop should come right after you realize there is an infinity of such lines ;)
and not only the direction induced by one of our variables. We already know from the
previous chapter that a vector consists of a direction, a sense, and a magnitude. Thus, any
vector z ∈ Rn specifies a direction along which we could try to evaluate the rate of change
of our multivariate real-valued function. Yet, as infinitely many vectors can specify the same
direction, we need to impose a bit of consistency, and convention suggests taking a unit
z, i.e., a z such that ‖z‖ = 1. From there, we can proceed with a rather intuitive definition:
Definition: (Directional Derivative)
Let X ⊆ Rn and suppose f : X → R. If x̄ is an interior point of X, then, the rate
of change of f(x) at x̄ in the direction of unit vector z = (z1, ..., zn) ∈ Rn is called the
directional derivative, is denoted Dzf(x̄), and is defined as
Dzf(x̄) = lim_{h→0} [f(x̄+ hz)− f(x̄)] / h
with h ∈ R, whenever the limit exists and whenever x̄+ hz ∈ X.
In particular, if z goes in the direction of xi, i.e., z = (0, ..., 0, 1, 0, ..., 0) ∈ Rn, where
the only nonzero component is the i-th, then the directional derivative coincides with
the partial derivative with respect to xi. To illustrate the concept, let us consider again
the case f(x1, x2) = x1x2:
Figure 4: Directional derivative of f(x1, x2) in direction of (1, 1) at x̄
In practice, this is not a very convenient definition: choosing a vector, finding the unit
vector with the same direction, computing a limit... Luckily, there exists an important
characterization of the directional derivative which emphasizes its link with partial derivatives
and which is much more convenient for applications. I present it in the next section.
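The definition can nevertheless be applied directly with a small h. A numerical sketch (my own, with f(x1, x2) = x1x2 at x̄ = (2, 5) and the unit vector z = (1/√2, 1/√2) as assumed choices):

```python
import math

# Sketch of the definition: D_z f(xbar) as the limit of
# [f(xbar + h z) - f(xbar)] / h, with z a unit vector.
# Assumed example: f(x1, x2) = x1 * x2, xbar = (2, 5), z = (1/√2, 1/√2).
def f(x1, x2):
    return x1 * x2

xbar = (2.0, 5.0)
z = (1 / math.sqrt(2), 1 / math.sqrt(2))  # unit vector: ||z|| = 1

h = 1e-6
Dz = (f(xbar[0] + h * z[0], xbar[1] + h * z[1]) - f(*xbar)) / h
print(Dz)  # approximately (5 + 2) / sqrt(2) ≈ 4.95
```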
2.3 Multivariate Derivative
Both generalizations introduced above preserve the second and third geometrical insights
mentioned in the introduction. Given a point and a specified direction, one can locally
assimilate a function’s behavior to that of a linear function, provided the directional deriva-
tive exists. Yet, these generalizations could seem insufficient in the following sense: they
do not preserve the first important geometrical insight. Indeed, given a point x̄, neither the
existence of all partial derivatives, nor the existence of all directional derivatives is sufficient
to guarantee continuity of the function at x̄ (see exercise 6). For this reason, one introduces
the concept of (multivariate) derivative. This concept is the one that should be perceived as
a complete generalization of a univariate derivative.
Let Df(x̄) denote the generalized version of the derivative at x̄, i.e., f ’s instantaneous rate
of change at x̄. Continuity, as defined in the previous chapter, is about preservation of
neighborhoods. As we are looking for continuity, we shall not specify any direction along
which x moves. Rather, we specify the distance that separates the new x from our starting
point x̄ and allow for movement in any direction within that distance. In the univariate case,
f is differentiable at x̄ if and only if there exists a real number f ′(x̄) such that:

lim_{h→0} [f(x̄+ h)− f(x̄)] / h − f ′(x̄) = 0

And because the norm is a continuous function, this implies:

lim_{‖h‖→0} ‖f(x̄+ h)− f(x̄)− f ′(x̄)h‖ / ‖h‖ = 0
Using this expression to generalize f ′(x̄) ensures that continuity will be preserved.
Definition: (Multivariate Derivative)
Let X ⊆ Rn and suppose f : X → R. If x̄ is an interior point of X, then f is differentiable
at x̄ if and only if there exists a row vector Df(x̄) such that
lim_{‖h‖→0} ‖f(x̄+ h)− f(x̄)−Df(x̄) · h‖ / ‖h‖ = 0
where h is a vector in Rn. If such a vector Df(x̄) exists, we interpret it as the derivative
of f at x̄.
Remark: As in the univariate case, for any x in X, we define the total differential at x̄,
denoted df(x̄, x), as:
df(x̄, x) := Df(x̄) · x
Df(x̄) is to be interpreted as the generalization of our univariate derivative f ′(x̄)
at x̄. Yet, defined as it is, it is a bit difficult to see exactly what it is! The following result,
by relating it to the geometrically intuitive partial and directional derivatives, gives useful
insights.
Theorem: (Total Differential, Directional Derivative, and Gradient)
Let X ⊆ Rn and suppose f : X → R. If x̄ is an interior point of X, and if f is
differentiable at x̄, then:
(i) all directional derivativesa of f exist at x̄, and
(ii) ∀ z ∈ Rn with ‖z‖ = 1: df(x̄, z) := Df(x̄) · z = Dzf(x̄) = ∇f (x̄) · z
aReminder: partial derivatives are only a special case of directional derivatives!
In words, Df(x̄), the true generalization of the univariate derivative, is nothing else than
the vector of partial derivatives of f at x̄, i.e., the gradient of f at x̄, ∇f (x̄)! That is, the
instantaneous rate of change of f in any direction can be read off the gradient of f !
IN SHORT, IF YOU UNDERSTOOD WHAT THE GRADIENT IS, THEN YOU almost
RULE THE WORLD!
Proof: As f is differentiable at x̄, we know that there exists Df(x̄) in Rn such that:
lim_{‖h‖→0} ‖f(x̄+ h)− f(x̄)−Df(x̄) · h‖ / ‖h‖ = 0
where h is a vector in Rn. Now let h = tz, where z is any unit vector and t a scalar going
to zero, i.e.,

lim_{t→0} ‖f(x̄+ tz)− f(x̄)−Df(x̄) · tz‖ / ‖tz‖ = 0
Using that ‖z‖= 1 and arranging a bit, one has:
lim_{t→0} ‖f(x̄+ tz)− f(x̄)‖ / |t| = ‖Df(x̄) · z‖
On each side, the signs of the terms inside the norms always coincide15, and one has:
lim_{t→0} [f(x̄+ tz)− f(x̄)] / t = Df(x̄) · z
that is,
15Realize that if f increases in the direction of z, then both the left-hand term and the right-hand term inside the norms are positive. If f decreases in the direction of z, then both are negative.
Dzf(x̄) = Df(x̄) · z
Hence, all directional derivatives are well defined and so must be the partial derivatives in
particular. To prove the second part, observe that:
lim_{‖h‖→0} ‖f(x̄+ h)− f(x̄)−Df(x̄) · h‖ / ‖h‖ = 0
Df(x̄) is a row vector here; you may denote its entries by (ai)i=1,...,n. Let h = (h1, ..., hn);
the limit then reads:

lim_{‖h‖→0} ‖f(x̄+ h)− f(x̄)− ∑_{i=1}^{n} ai hi‖ / ‖h‖ = 0
Now, let every coordinate of h except coordinate j be zero, i.e., h = (0, ..., 0, hj, 0, ..., 0).
We get:

lim_{hj→0} ‖f(x̄+ h)− f(x̄)− aj hj‖ / |hj| = 0

and an argument similar to that above allows us to take aj outside the norm:

lim_{hj→0} [f(x̄+ h)− f(x̄)] / hj = aj
or, otherwise stated:
fj(x̄) = aj
As argued earlier, the converse of this theorem need not hold. However, one can show
the following:
Theorem: (Partial Differentiablility and Differentiability)
Let X ⊆ Rn, suppose f : X → R, and let x̄ be an interior point of X. If all the partial
derivatives of f exist and are continuous in a neighborhood of x̄, then f is differentiable at x̄.
Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.4.
And, as we wished, we also have our link to continuity:
Theorem: (Partial Differentiablility and Continuity)
Let X ⊆ Rn, suppose f : X → R, and let x̄ be an interior point of X. If f is differentiable
at x̄, then f is continuous at x̄.
Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.2.
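The theorem relating the derivative, the gradient, and the directional derivative can be checked numerically. In this sketch (the function f(x1, x2) = x1² + 3x2, the point x̄ = (1, 2), and the unit vector z = (3/5, 4/5) are all my own choices), the directional derivative computed from the limit definition matches the dot product ∇f(x̄) · z:

```python
# Numerical check (assumed example): for differentiable f, the directional
# derivative D_z f(xbar) equals gradient(xbar) · z.
# Here f(x1, x2) = x1**2 + 3*x2, so the analytic gradient is (2*x1, 3).
def f(x1, x2):
    return x1 ** 2 + 3 * x2

xbar = (1.0, 2.0)
grad = (2 * xbar[0], 3.0)   # analytic gradient at xbar: (2, 3)
z = (3 / 5, 4 / 5)          # unit vector: (3/5)**2 + (4/5)**2 = 1

h = 1e-6
Dz = (f(xbar[0] + h * z[0], xbar[1] + h * z[1]) - f(*xbar)) / h
dot = grad[0] * z[0] + grad[1] * z[1]
print(Dz, dot)  # both approximately 2*(3/5) + 3*(4/5) = 3.6
```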
2.4 Functions Mapping to High-Dimensional Spaces
Consider now the more general case of a function f with domain X in Rn and codomain
Y in Rm, with n and m greater than or equal to 2. The easiest way to proceed is to realize that
this high-dimensional function can be seen as a vector of real-valued multivariate functions.
Namely,
f = (f 1, f 2, ..., f m)′
where each f i is a real-valued multivariate function mapping from Rn to R. We can then
extend our previous definition in a very natural way:
Definition: (Multivariate Derivative)
Let X ⊆ Rn and suppose f : X → Rm. If x̄ is an interior point of X, then f is
differentiable at x̄ if and only if there exists a matrix Df(x̄) such that
lim_{‖h‖→0} ‖f(x̄+ h)− f(x̄)−Df(x̄) · h‖ / ‖h‖ = 0
where h is a vector in Rn. If such a matrix Df(x̄) exists, we interpret it as the derivative
of f at x̄.
In words, our gradient is generalized by a matrix of first order derivatives, Df(x̄). Such
a matrix is called the Jacobian of f , sometimes denoted Jf (x̄). Further, the following result
holds:
Theorem: (Multivariate Derivative and Gradient of the Component Functions)
Let X ⊆ Rn, suppose f : X → Rm, and let x̄ be an interior point of X. Then, f is
differentiable at x̄ if and only if each of its component functions is differentiable at x̄.
Moreover, if f is differentiable at x̄, then:
(i) all directional derivatives of the component functions exist at x̄, and
(ii) the derivative of f at x̄ is the matrix of partial derivatives of the component
functions at x̄:
Jf (x̄) := Df(x̄) =

⎡ ∇f 1(x̄) ⎤   ⎡ ∂f 1/∂x1(x̄) · · · ∂f 1/∂xn(x̄) ⎤
⎢    ...    ⎥ = ⎢       ...        · · ·       ...        ⎥ ∈ Rm×n
⎣ ∇f m(x̄) ⎦   ⎣ ∂f m/∂x1(x̄) · · · ∂f m/∂xn(x̄) ⎦
Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.3.
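The theorem suggests a direct way to assemble the Jacobian numerically, one row per component function. A sketch (the map f(x1, x2) = (x1x2, x1 + x2) is my own example):

```python
# Sketch (assumed example): the Jacobian of f: R^2 -> R^2 assembled row by
# row from difference quotients of the component functions.
def f(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2]

def jacobian(f, x, h=1e-6):
    fx = f(x)
    J = []
    for i in range(len(fx)):        # one row per component function f^i
        row = []
        for j in range(len(x)):     # one column per variable x_j
            xh = list(x)
            xh[j] += h
            row.append((f(xh)[i] - fx[i]) / h)
        J.append(row)
    return J

print(jacobian(f, [2.0, 5.0]))  # approximately [[5, 2], [1, 1]]
```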
The partial converse to this theorem, as well as the result on continuity, also extends to
the high-dimensional case; the proofs indicated for them take the multidimensionality of the
codomain into account. Let us conclude with the following remark: there is a generalized
chain rule in high-dimensional spaces. Once one has introduced the concept of a multivariate
derivative, the multivariate chain rule is a straightforward extension of the univariate one.
Proposition: (Multivariate Chain Rule)
Let X ⊆ R^n and suppose g : X → Y , where Y ⊆ R^m. Further, suppose f : Y → Z, where
Z ⊆ Rp. If x̄ is an interior point of X, g(x̄) an interior point of Y , and g and f are
differentiable at x̄ and g(x̄), respectively, then f ◦ g is differentiable at x̄ and:
D[f ◦ g](x̄) = Df(g(x̄)) · Dg(x̄)
Proof: See e.g. De la Fuente [1], Chapter 4, Theorem 3.5.
See exercises 6 and 7.
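The chain rule can likewise be checked numerically. In this sketch (Python with NumPy; the maps g : R^2 → R^2 and f : R^2 → R are illustrative choices of mine), both sides of D[f ◦ g](x̄) = Df(g(x̄)) · Dg(x̄) are approximated by finite differences and compared:

```python
import numpy as np

# Hypothetical maps: g maps R^2 to R^2, f maps R^2 to R, so f∘g maps R^2 to R
g = lambda x: np.array([x[0] + x[1]**2, x[0] * x[1]])
f = lambda y: y[0]**2 + 3*y[1]

def jac_fd(F, x, eps=1e-6):
    """Finite-difference Jacobian of F at x (works for scalar- or vector-valued F)."""
    Fx = np.atleast_1d(F(x))
    J = np.zeros((Fx.size, x.size))
    for j in range(x.size):
        h = np.zeros_like(x)
        h[j] = eps
        J[:, j] = (np.atleast_1d(F(x + h)) - Fx) / eps
    return J

x = np.array([0.5, -1.0])
lhs = jac_fd(lambda t: f(g(t)), x)      # D[f∘g](x), a 1x2 matrix
rhs = jac_fd(f, g(x)) @ jac_fd(g, x)    # Df(g(x)) · Dg(x)
assert np.allclose(lhs, rhs, atol=1e-3)
```

Note that the right-hand side is a genuine matrix product, which is why the dimensions of the two Jacobians must match up: 1×2 times 2×2 gives 1×2.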
2.5 Higher Order Partial Derivatives and the Taylor Approximation Theorems
Let f : X ⊆ Rn → R be a function of n variables. Also, assume X is open16. Earlier
on, we suggested that, if the n partial derivatives of f are defined at each point of the open
domain, then the partial derivatives could themselves be perceived as functions from X to R.
Hence, one may attempt to compute their partial derivatives! If such derivatives are defined,
we call them second order partial derivatives. Repeating the reasoning, one can attempt to
define third, fourth, fifth order derivatives, and so on. Notationally, fi(x) denotes a first order
partial derivative, fi,j(x) := ∂fi(x)/∂xj a second order partial derivative, and so on.
Let X ⊆ Rn be open, suppose f : X → R, and let x̄ be an element of X. If all second
order partial derivatives of f are defined at x̄, then, in the same way as the first order
partial derivatives could be gathered in a vector – the gradient –, all the second order partial
derivatives can be gathered in a matrix. Such a matrix, denoted Hf (x̄), is called the Hessian
of f at x̄, is square, and should be thought of as a generalized second order derivative for
multivariate real valued functions.
16A convenient way to stop worrying about interiority! But if, in your application, X is not open, do not forget to check interiority!
H_f(x̄) =

⎛ ∇f_1(x̄) ⎞   ⎛ f_{1,1}(x̄)  f_{1,2}(x̄)  · · ·  f_{1,n}(x̄) ⎞
⎜ ∇f_2(x̄) ⎟   ⎜ f_{2,1}(x̄)  f_{2,2}(x̄)  · · ·  f_{2,n}(x̄) ⎟
⎜    ⋮     ⎟ = ⎜      ⋮            ⋮         ⋱         ⋮      ⎟
⎝ ∇f_n(x̄) ⎠   ⎝ f_{n,1}(x̄)  f_{n,2}(x̄)  · · ·  f_{n,n}(x̄) ⎠
Remark: The intermediate equality makes clear why the Hessian is an n × n matrix: it is
the derivative of the gradient of f at x̄, which is a function from X ⊆ Rn to Rn. Hence, the
Hessian is a Jacobian, but the converse is not true! Do not confuse the two concepts!
Before we go on, let me introduce the concept of Ck-differentiability. Ck-differentiable
functions constitute the main class of functions economists work with, as they have many
nice properties. For instance, as the next theorem states, Ck-differentiability is a sufficient
condition for permuting the order of differentiation without consequence.
Definition: (Function of class Ck)
Let X ⊆ R^n be an open set, Y ⊆ R, and suppose f : X → Y . f is said to be of class Ck
on X, denoted f ∈ Ck(X, Y)a, if all partial derivatives of order less than or equal to k exist
and are continuous on X.

aY could be equal to R. As real-valued functions are heavily used in applications, the short notation
f ∈ Ck(X) is taken as a substitute for f ∈ Ck(X, R).
Remark: It is common to call a C1(X) function a continuously differentiable function.
Theorem: (Schwarz’s Theorem / Young’s Theorem)
If f ∈ Ck(X), then the order in which the derivatives up to order k are taken can be
permuted.
Proof: See e.g., De la Fuente [1], Chapter 4, Theorem 2.6.
For instance, let X ⊆ Rn be open and suppose f : X → R. If f ∈ C2(X), then
fi,j(x) = ∂fi(x)/∂xj = ∂fj(x)/∂xi = fj,i(x)
Therefore, if f is C2-differentiable, then its Hessian matrix is symmetric!
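Schwarz's theorem is easy to illustrate numerically. The following Python sketch (NumPy; the C2 function f is my own example) approximates all second order partial derivatives by central differences and checks that the resulting Hessian is symmetric:

```python
import numpy as np

def f(x):
    # A C^2 function of two variables (my own example, not from the text)
    return x[0]**3 * x[1] + np.exp(x[0] * x[1])

def hessian_fd(f, x, eps=1e-4):
    """Central-difference approximation of all f_{i,j}(x)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            # Standard 4-point stencil for the mixed partial f_{i,j}
            H[i, j] = (f(x+ei+ej) - f(x+ei-ej)
                       - f(x-ei+ej) + f(x-ei-ej)) / (4 * eps**2)
    return H

H = hessian_fd(f, np.array([0.7, 1.2]))
assert np.allclose(H, H.T, atol=1e-5)   # Schwarz: f_{1,2} = f_{2,1}
```

Since f is C2, the off-diagonal entries agree up to the discretization error, exactly as the theorem predicts.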
We now have the equipment to discuss Taylor approximations17! Many of the results we
will see in the second half of this course, and in the whole next chapter, rely on second
order Taylor approximations. Further, they are heavily used in macroeconomic analysis.
Despite their rather heavy notation, they formalize a rather simple idea. Namely, in the same
17a.k.a. Taylor expansions.
way that first order derivatives provide valuable information, so do higher order derivatives.
Therefore, more precise, polynomial (rather than simply linear) approximations can be built
in small neighborhoods of a point of interest. When the function under study is a function
of a single variable, it is possible to exploit Taylor expansions of high orders.
Theorem: (nth Order Univariate Taylor Approximation)
Let X ⊆ R be an open set and consider f ∈ Cn+1(X). Then f can be best nth order
approximated around x̄ by the nth order Taylor expansion:
f(x̄ + h) ≈ f(x̄) + Σ_{k=1}^{n} [f^(k)(x̄) / k!] · h^k
where h ∈ R is such that x̄ + h ∈ X and f^(k)(x̄) denotes f’s derivative of order k at x̄.
The error of approximation, also known as the remainder of the Taylor approximation,
is given by the following formula:
Rn(h | x̄) := f(x̄ + h) − f(x̄) − Σ_{k=1}^{n} [f^(k)(x̄) / k!] · h^k = [f^(n+1)(x̄ + λh) / (n+1)!] · h^(n+1)
for some λ ∈ (0, 1).
Proof: See e.g., De la Fuente [1], Chapter 4, Theorem 1.9.
Remark: Note that the remainder approaches zero at a faster rate than h itself. This is what
guarantees the quality of the approximation!
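As an illustration — a sketch of my own, not from De la Fuente — one can compute the Taylor expansion of the exponential function around x̄ = 0 in Python and verify that the remainder is bounded by the Lagrange formula above:

```python
import math

def taylor_exp(x_bar, h, n):
    """n-th order Taylor expansion of exp around x_bar.

    Every derivative of exp is exp itself, so f^(k)(x_bar) = exp(x_bar)."""
    return sum(math.exp(x_bar) * h**k / math.factorial(k) for k in range(n + 1))

x_bar, h = 0.0, 0.1
true_value = math.exp(x_bar + h)
for n in (1, 2, 3):
    remainder = abs(true_value - taylor_exp(x_bar, h, n))
    # Lagrange form: |R_n| <= max_{lam in (0,1)} e^{x_bar+lam*h} h^{n+1}/(n+1)!
    bound = math.exp(x_bar + h) * h**(n + 1) / math.factorial(n + 1)
    assert remainder <= bound + 1e-15
```

As n grows, the remainder shrinks by roughly a factor h/(n+1) per additional order, which is the "faster than h" behavior the remark refers to.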
When the function under study is a multivariate function, it is computationally very demanding18
and conceptually non-straightforward to exploit derivatives of order higher than
2. Therefore, we usually stick to the first and second orders.
Theorem: (First Order Multivariate Taylor Approximation)
Let X ⊆ Rn be an open set and consider f ∈ C2(X). Then f can be best linearly
approximated around x̄ by the first order Taylor expansion:
f(x̄ + h) ≈ f(x̄) + ∇f(x̄) · h
where h ∈ Rn is such that x̄ + h ∈ X. The error of approximation, also known as the
remainder of the Taylor approximation, is given by the following formula:
R1(h | x̄) := f(x̄ + h) − f(x̄) − ∇f(x̄) · h = (1/2) h′ · Hf(x̄ + λh) · h
for some λ ∈ (0, 1).
Proof: See e.g., De la Fuente [1], Solutions to Chapter 4, Exercise 4.5.
18The first order generalized derivative of a real-valued multivariate function is a vector, the second order generalized derivative a matrix, the third order generalized derivative a matrix with multiple layers, and so on.
Theorem: (Second Order Multivariate Taylor Approximation)
Let X ⊆ Rn be an open set and consider f ∈ C3(X). Then f can be best second order
approximated around x̄ by the second order Taylor expansion:
f(x̄ + h) ≈ f(x̄) + ∇f(x̄) · h + (1/2) h′ · Hf(x̄) · h
where h ∈ Rn is such that x̄+h ∈ X. As ‖h‖ approaches zero, the remainder approaches
zero at a faster rate than h itself.
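The gain from adding the quadratic term can be seen numerically. In this Python sketch (the function, its gradient, and its Hessian below are my own hand-computed example), the second order expansion beats the first order one, and the advantage grows as ‖h‖ shrinks — the first order error is of order ‖h‖², the second order error of order ‖h‖³:

```python
import numpy as np

# f(x1, x2) = exp(x1) * cos(x2), with gradient and Hessian computed by hand
f = lambda x: np.exp(x[0]) * np.cos(x[1])
grad = lambda x: np.array([np.exp(x[0]) * np.cos(x[1]),
                           -np.exp(x[0]) * np.sin(x[1])])
hess = lambda x: np.array([[ np.exp(x[0])*np.cos(x[1]), -np.exp(x[0])*np.sin(x[1])],
                           [-np.exp(x[0])*np.sin(x[1]), -np.exp(x[0])*np.cos(x[1])]])

x_bar = np.array([0.0, 0.0])
for scale in (1e-1, 1e-2):
    h = scale * np.array([1.0, 2.0])
    first = f(x_bar) + grad(x_bar) @ h                 # first order expansion
    second = first + 0.5 * h @ hess(x_bar) @ h         # add the quadratic term
    # the second order approximation is strictly more accurate
    assert abs(f(x_bar + h) - second) < abs(f(x_bar + h) - first)
```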
3 Convexity, Concavity, and Multivariate Real-Valued
Functions
3.1 Definition
Our last effort in this chapter is devoted to the notions of convexity and concavity of func-
tions. Their importance stems from optimization and will thus be emphasized in the next
chapter. For now, allow me to simply proceed with the formal discussion.
Definition: (Convex Real Valued Function)
Let X ⊆ Rn. A function f : X → R is convex if and only if X is a convex set and for
any two x, y ∈ X and λ ∈ [0, 1] we have
f(λx+ (1− λ)y) ≤ λf(x) + (1− λ)f(y)
Moreover, if this statement holds strictly whenever y ≠ x and λ ∈ (0, 1), we say that f
is strictly convex.
Remark: Remember the epigraph story? And Jensen’s inequality? (Chapter 2, Exercise 3) If
yes, you probably understand that this definition of convex functions implies that the epigraph
of any convex function is convex. Show it!
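The defining inequality is easy to test numerically. Here is a quick Python check (a sketch of my own) on the convex function f(x1, x2) = x1² + x2², sampling random points x, y and random weights λ:

```python
import random

# f(x1, x2) = x1^2 + x2^2, the convex example used later in the text
f = lambda x: x[0]**2 + x[1]**2

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5), random.uniform(-5, 5)]
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    lam = random.random()
    z = [lam*x[0] + (1-lam)*y[0], lam*x[1] + (1-lam)*y[1]]
    # the defining inequality: f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
    assert f(z) <= lam*f(x) + (1-lam)*f(y) + 1e-12
```

Of course, passing a random sample proves nothing by itself — but a single violating triple (x, y, λ) would disprove convexity, which makes such checks a handy sanity test.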
Definition: (Concave Real-Valued Function)
Let X ⊆ Rn. A function f : X → R is concave if and only if −f is convex. Similarly f
is strictly concave if and only if −f is strictly convex.
Remarks: (i) Note that the definition of a concave real-valued function also requires that the
function be defined on a convex domain! (ii) Note that all affine functions are both convex
and concave! (iii) A concave function has a convex hypograph, a special case of this has
also been seen in Chapter 2 Exercise 3!
3.2 Useful Characterizations
You already know the geometric meaning of convexity and concavity for univariate
real-valued functions: if one picks at random an x and a y in the domain of f and draws the
line segment between (x, f(x)) and (y, f(y)), it will lie weakly below or weakly above the
graph of f between x and y19. This segment will lie weakly above the graph if and only if f is
convex; it will lie weakly below if and only if f is concave. This is illustrated in Figure 5:
Figure 5: Concavity and convexity of univariate real-valued functions
Ideally, our generalized definition should preserve this geometrical insight. Does it? Let
us have a look at a simple convex function defined on X ⊂ R^2, say, f(x1, x2) = x1² + x2². The
graph of f , i.e., the collection of ordered pairs (x, f(x)) for all x in the domain of f , lies in R3
(see Figure 6). If one draws the line segment that connects two points chosen at random on
that surface, it does appear, somehow, to lie “above” the graph of the function. Yet, as a line
exhausts only one out of three dimensions, there are, in fact, two dimensions left free, and the
notion of “above” I just used isn’t as clear as the one I used when we were considering a
graph in the real plane20. Therefore, if we wish to formalize the multidimensional geometric
intuition, we must first specify a plane, i.e., a subspace of dimension 2, and make sure that,
on that plane, the line lies above the graph, in the classical sense of the term “above”.
19If you checked the prerequisite on basic logic, you have probably noted that I use “or” in its mathematical sense, i.e., both may hold at once. The segment will lie both weakly below and weakly above if and only if f is affine!
20In the real plane, a line is a hyperplane, i.e., an affine subspace whose dimension differs by only one unit from that of the universal space. “Above” is then necessarily well defined, as only one dimension is left free outside of the line, and a simple convention suffices to distinguish an “above” from a “below”.
Figure 6: Lying “above” in 3 or more dimensions
Remember that, independently of the dimension of the input vectors x, y ∈ Rn, f is
assumed to be real-valued and, therefore, f(λx + (1 − λ)y) as well as λf(x) + (1 − λ)f(y)
belong to R. Therefore, the definition only makes a statement about some specific planes
of R^(n+1) (R^3 in our example). Let e denote the vector in R^(n+1) which spans the output
dimension. The definition tells us the following: consider any plane with a basis containing e;
then, the restriction of
f to that plane is a real-valued univariate convex function. More practically, we are asked
to consider any two points x and y in the domain of f. The domain is required to be
convex, so the segment between x and y lies in it. The vector z = y − x spans a line
through both x and y, whose portion inside the domain corresponds to the parameter set
{t ∈ R | x + tz ∈ X}. Combined with e (the vector spanning the vertical axis), z generates
a plane in the (n+1)-dimensional space in which the restriction of f to this line lies. If one
focuses on this plane, one finds oneself back in the simple situation of considering the graph
of a univariate real-valued function in the real plane, and the definition of convexity requires
that this function be convex.
Figure 7: Multivariate Concavity
In fact, the converse can be shown to hold. That is, given a multivariate real-valued
function f with convex domain X: if, for any line lying in X, the restriction of f to this line
is convex, then f is convex. This is stated formally in the following theorem.
Theorem: (Multivariate Convexity)
Let X be a convex subset of Rn. A real-valued function f : X → R is (strictly) convex
if and only if, for every x ∈ X and every z ∈ Rn that is different from zero, the function
g(t) = f(x+ tz) is (strictly) convex on {t ∈ R | x+ tz ∈ X}.
And since the condition must hold for every z ∈ R^n \ {0}, and each such z gets scaled by
a scalar t anyway, we can, without loss of generality, require only that it holds for every unit
vector z21.
Corollary: (Multivariate Convexity bis)
Let X be a convex subset of R^n. A real-valued function f : X → R is (strictly) convex
if and only if, for every x ∈ X and every z ∈ R^n with ‖z‖ = 1, the function
g(t) = f(x + tz) is (strictly) convex on {t ∈ R | x + tz ∈ X}.
Endowed with this geometrical insight, one can derive a convenient characterization of con-
vexity which applies to functions of class C2(X). Therefore, assume from now on that f is
twice differentiable. By considering the restriction of f to the line spanned by z we implicitly
defined a univariate function g of t:
g(t) = f(x+ tz)
where t ∈ R and x, z ∈ R^n are such that ‖z‖ = 1. Thus, by definition of univariate derivatives,
we have that:
g′(t) = lim_{h→0} [g(t + h) − g(t)] / h
Further, if t = 0, g(0) = f(x); and if t ≠ 0, then g(t) evaluates f at a distance t from x in the
direction specified by z. This should remind you of the directional derivative, for, indeed,
g′(t) evaluated at 0 coincides with the directional derivative at x in direction of z:
g′(0) = lim_{t→0} [g(t) − g(0)] / t = lim_{t→0} [f(x + tz) − f(x)] / t
Or, more concisely,
g′(0) = Dzf(x)
In terms of generalized derivatives, this is equivalent to g′(0) = Df(x) · z (= ∇f (x) · z).
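This identity, g′(0) = ∇f(x) · z, can be verified numerically. In the Python sketch below (the C1 function f and its gradient are an example of my own), a finite-difference quotient for g′(0) is compared with the inner product of the gradient and the unit direction z:

```python
import numpy as np

# A sample C^1 function of two variables, with its hand-computed gradient
f = lambda x: x[0]**2 * x[1] + x[1]**3
grad = lambda x: np.array([2*x[0]*x[1], x[0]**2 + 3*x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
z = z / np.linalg.norm(z)        # normalize to a unit direction

g = lambda t: f(x + t * z)       # restriction of f to the line through x along z
t = 1e-6
g_prime_0 = (g(t) - g(0)) / t    # finite-difference estimate of g'(0)
assert abs(g_prime_0 - grad(x) @ z) < 1e-4   # matches the directional derivative
```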
21Put differently, what matters for us in z is the direction that it defines, not its sense nor its magnitude, for both of these aspects are subsumed in the multiplication of z by the scalar t!
From there, one can reevaluate the requirement that g, i.e., the restriction of f to a specified
line, be (strictly) convex. More precisely, this is true if and only if g′′(t) (>) ≥ 0 for all
t ∈ {t ∈ R | x + tz ∈ X}, where:
g′′(t) = (d/dt) g′(t) = (d/dt) [∇f(x + tz) · z] = z′ · Hf(x + tz) · z
Evaluating at zero, we find that what we are requiring for convexity is simply that:

∀ x ∈ Int(X), ∀ z ∈ R^n with ‖z‖ = 1:  z′ · Hf(x) · z (>) ≥ 0.
And, again, realizing that scaling z by any nonzero scalar will not affect the sign of this
expression, we have that this requirement is equivalent to the following one:

∀ x ∈ Int(X), ∀ z ∈ R^n \ {0}:  z′ · Hf(x) · z (>) ≥ 0
This suggests a third characterization of convexity:
Corollary: (Multivariate Convexity ter)
Let X be a convex subset of Rn. A real-valued function f : X → R that is also an element
of C2(X) is convex if and only if, Hf (x) is positive semidefinite for all x ∈ Int(X).
Further, if Hf (x) is positive definite for all x ∈ Int(X), then f is strictly convex.
Remark 1: Please note that the characterization only applies to C2 functions. Not every
convex function is differentiable though! For instance, the absolute value, a convex function,
is not differentiable at 0.
Remark 2: We will not show it here, but a convex real-valued function defined on a finite-
dimensional convex domain X is continuous on Int(X).
Symmetric characterizations can be derived for concave functions.
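For a concrete check of this corollary — a sketch under my own choice of function — take the quadratic form f(x) = x′Ax with A symmetric positive definite. Its Hessian is the constant matrix 2A, whose eigenvalues are positive, and the definition-based convexity inequality indeed holds on random points:

```python
import numpy as np

# f(x) = x' A x with A symmetric positive definite is strictly convex
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])         # symmetric, both eigenvalues positive
H = 2 * A                          # Hessian of f (constant in x here)
eigenvalues = np.linalg.eigvalsh(H)
assert np.all(eigenvalues > 0)     # positive definite => strictly convex

# Cross-check against the defining inequality on random points
rng = np.random.default_rng(0)
f = lambda x: x @ A @ x
for _ in range(200):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lam = rng.random()
    assert f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) + 1e-10
```

Using `eigvalsh` (the routine for symmetric matrices) is natural here precisely because, for a C2 function, Schwarz's theorem guarantees the Hessian is symmetric.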
3.3 Quasi-concavity, Quasi-convexity
In some settings, concavity or convexity are just too demanding. For instance, in some cases,
the only reasonable assumption one can make is that the function of interest is increasing or
decreasing. Yet, being increasing or decreasing guarantees neither convexity nor concavity.
Further, concavity and convexity need not be preserved by monotonic transformations, which
is an undesirable feature given that most of economists’ work is based on ordinal preferences.
Therefore, it is interesting to define classes of functions which (i) preserve the most important
properties of concave and convex functions and (ii) include all increasing and decreasing
functions and have properties which are preserved under monotonic transformations. Such
functions are known respectively as quasi-concave and quasi-convex functions.
As you will see in the next chapter, the convexity of the upper-level sets (for concave functions)
and the convexity of the lower-level sets (for convex functions)22 are the specific characteristics
of concave and convex functions one would wish to preserve. As multivariate convexity and
concavity can be reduced to univariate ones, let me illustrate these concepts in the univariate
case. If one considers a convex function and draws a horizontal line (a “level” line) through it,
the set of elements x in the domain with an image below this line is called a lower-level set
of the function, and it is convex. Similarly, if one considers a concave function and draws a
horizontal line (a “level” line) through it, the set of elements x in the domain with an image
above this line is called an upper-level set of the function, and it is convex.
Figure 8: Convexity and Concavity via lower- and upper-level sets
Quasiconvexity and quasiconcavity are defined so as to preserve precisely these two qualities:
Definition: (Quasiconvexity, Quasiconcavity)
Let X be a convex subset of R^n. A real-valued function f : X → R is quasiconvex if
and only if, for every c ∈ R, the set

L⁻_c := {x ∈ X | f(x) ≤ c},

also referred to as f’s lower-level set, is convex. If, for every c ∈ R, the set

L⁺_c := {x ∈ X | f(x) ≥ c},

also referred to as f’s upper-level set, is convex, then f is said to be quasiconcave.
In Exercise 11, you are asked to show that the following useful characterization holds:
22I will introduce level sets formally in the next chapter only; for now you just have to grasp the geometric intuition. PLEASE DO NOT CONFUSE UPPER-LEVEL SET AND EPIGRAPH, NOR LOWER-LEVEL SET AND HYPOGRAPH! Level sets lie in the domain! Graphs lie in the Cartesian product of the domain and the codomain!
Definition: (Quasiconvexity, Quasiconcavity)
Let X be a convex subset of Rn. A real-valued function f : X → R is quasiconvex if
and only if
∀ x, y ∈ X, ∀ λ ∈ [0, 1]:  f(λx + (1 − λ)y) ≤ max{f(x), f(y)}

If f is such that

∀ x, y ∈ X, ∀ λ ∈ [0, 1]:  f(λx + (1 − λ)y) ≥ min{f(x), f(y)}

then f is said to be quasiconcave.
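A standard example: f(x) = x³ is increasing, hence both quasiconvex and quasiconcave, yet not convex. The Python sketch below (my own) checks the max/min inequalities on random points and exhibits a violation of the convexity inequality:

```python
import random

f = lambda x: x**3   # increasing, hence quasiconvex AND quasiconcave, not convex

random.seed(1)
for _ in range(1000):
    x = random.uniform(-3, 3)
    y = random.uniform(-3, 3)
    lam = random.random()
    mid = f(lam*x + (1-lam)*y)
    assert mid <= max(f(x), f(y)) + 1e-9   # quasiconvexity inequality
    assert mid >= min(f(x), f(y)) - 1e-9   # quasiconcavity inequality

# x^3 fails the convexity inequality, e.g. at x = -2, y = 0, lam = 1/2:
# f(-1) = -1 > 0.5*f(-2) + 0.5*f(0) = -4
assert f(0.5*(-2) + 0.5*0) > 0.5*f(-2) + 0.5*f(0)
```

This illustrates the point made above: for a monotone function, f(λx + (1 − λ)y) always lies between f(x) and f(y), which is exactly what the max/min inequalities demand.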
This generalization allows us to include some non-convex and non-concave functions while
ruling out overly messy functions, such as “camel backs” (see Figure 9). All convex functions
are quasi-convex. All concave functions are quasi-concave. Linear functions are both quasi-
concave and quasi-convex, but they are not the only functions with this property: functions
that are both quasi-convex and quasi-concave are called quasi-linear functions23. Monotonic
functions are another instance of quasi-linear functions.
Figure 9: Quasiconvexity and Quasiconcavity
Remark: Beware! Contrary to convexity and concavity, which imply continuity on the
interior of the domain, quasi-concave and quasi-convex functions need not be continuous!
The floor function is an example of a discontinuous quasi-convex function! However, if a
quasi-convex (quasi-concave) function is continuous, then the lower- (respectively upper-)
level set is not only convex but also closed!
23An unfortunate term for economists, because it should absolutely not be confused withquasi-linear preferences/ quasi-linear utility functions!!
A Appendix - Homogeneity
Homogeneous functions constitute an important class of functions in economics, and espe-
cially in microeconomics. For instance, in consumer theory, you will see that homothetic
preferences (see def. 3.B.6 in MWG), which happen to be very useful in applications, must
admit a utility representation that is homogeneous (see Exercise 3.C.5 in MWG24), that the
Walrasian demand function is homogeneous of degree 0 in its two inputs, namely, price and
income, etc. Similarly, in macroeconomics, the neoclassical growth model assumes a CRS
production technology, i.e., a production function that exhibits constant returns to scale.
Again, this CRS property is mathematically interpreted as a requirement of homogeneity
of degree one, also called linear homogeneity. Hence, we shall conclude this chapter with a
presentation of the concept of homogeneity together with the associated Euler theorem.
Definition: (Homogeneous Function)
Let X be an open conea in Rn. Let f : X → R be of class C1. Then, f is homogeneous
of degree k in X if and only if
f(λx) = λ^k f(x)  ∀ λ > 0

where k ∈ N.

aA cone is closed under conic combinations, i.e., if X is a cone, then, for any x ∈ X and λ ∈ R+, λx ∈ X.
Cf. Chap. 2, Ex. 5.
Hence, a homogeneous function is a function that displays a property sometimes called
“scale invariance”: if we simultaneously scale all of its inputs by a common factor
λ, then the output is simply scaled by a given factor, namely, λ^k.
Remark 1: Note that any linear function is by definition homogeneous of degree one!
Remark 2: As we now deal with multivariate functions, we may sometimes want to express
homogeneity in only a subset of the inputs. It is possible to do so and I do it here for a
simple example. The Hicksian demand function, denoted h(p, u) where p denotes the price
vector, and u the minimum utility level to be achieved, is known to be homogeneous of degree
zero in prices, that is:
∀ λ > 0 h(λp, u) = λ0h(p, u) = h(p, u)
To conclude, the Euler theorem is an important characterization of homogeneous func-
tions that will either help you check the homogeneity of a function or help you draw some
interesting consequences out of homogeneity.
24By the way, notice a typo in that exercise: it need only have a homogeneous representation, not necessarily one that is homogeneous of degree one. This makes sense: in mathematics, a function is called homothetic if it is an increasing transformation of a homogeneous function.
Theorem: (Euler’s Theorem)
Let X be a cone in R^n. A real-valued function f : X → R of class C1 is homogeneous
of degree k in X if and only if

k f(x) = ∇f(x) · x  ∀ x ∈ X

where k ∈ N. Moreover, the partial derivatives of f are themselves homogeneous of
degree (k − 1).
Proof: See exercise 8.
Remark 1: Note that for linearly homogeneous functions, this implies:
f(x) = ∇f (x) · x
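Euler’s theorem is easy to verify on a concrete homogeneous function. In this Python sketch (the function f(x1, x2) = x1²·x2, homogeneous of degree k = 3, is my own example), all three claims — scale invariance, the Euler identity, and the degree-(k − 1) homogeneity of the partials — are checked numerically:

```python
import numpy as np

# f(x1, x2) = x1^2 * x2 is homogeneous of degree k = 3
f = lambda x: x[0]**2 * x[1]
grad = lambda x: np.array([2*x[0]*x[1], x[0]**2])   # hand-computed gradient
k = 3

x = np.array([1.5, 0.8])
# Scale invariance: f(lam * x) = lam^k * f(x) for every lam > 0
for lam in (0.5, 2.0, 7.0):
    assert np.isclose(f(lam * x), lam**k * f(x))
# Euler's identity: k * f(x) = grad f(x) . x
assert np.isclose(k * f(x), grad(x) @ x)
# The partial derivatives are homogeneous of degree k - 1 = 2
assert np.allclose(grad(2.0 * x), 2.0**(k - 1) * grad(x))
```

Note how the Euler identity can be used both ways: to certify homogeneity of a candidate function, or to derive consequences (such as product exhaustion under constant returns to scale) once homogeneity is assumed.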
References
[1] De la Fuente, A. Mathematical Methods and Models for Economists. Cambridge University Press, 2000.

[2] Gross, H. RES.18-006 Calculus Revisited: Single Variable Calculus. MIT OpenCourseWare, Massachusetts Institute of Technology, 2010.

[3] Jehle, G. A., and Reny, P. J. Advanced Microeconomic Theory, 3rd ed. Pearson Education, 2011.

[4] Luenberger, D. G. Optimization by Vector Space Methods. John Wiley & Sons, 1969.

[5] Simmons, G. F. Introduction to Topology and Modern Analysis. McGraw-Hill, New York, 1963.