Chapter 2 - Introduction to Vector Spaces∗

Justin Leduc†

These lecture notes are meant to be used by students entering the University of Mannheim

Master program in Economics. They constitute the base for a pre-course in mathematics;

that is, they summarize elementary concepts with which all of our econ grad students must

be familiar. More advanced concepts will be introduced later on in the regular coursework.

A thorough knowledge of these basic notions will be assumed in later

coursework. No prerequisites beyond high school mathematics are required.

Although the wording is my own, the definitions of concepts and the ways to approach them

are strongly inspired by various sources, which are mentioned explicitly in the text or at the

end of the chapter.

∗ This version: September 9th 2015

† Center for Doctoral Studies in Economic and Social Sciences. Contact: [email protected]

Contents

1 Introduction

2 From Classical Vectors to Vector Spaces

3 Subspaces, Linear Combinations, and Linear Dependence

4 Affine and Convex Sets

4.1 Affine Sets

4.2 Convex Sets

4.3 Important Examples: Hyperplanes and Half Spaces

5 Normed Vector Spaces and Continuity

5.1 Distance and Norm in a Vector Space

5.2 Open sets, Closed sets, Compact sets

5.3 Continuity

A Appendix 1: The Separating and Supporting Hyperplane Theorems

1 Introduction

In the previous chapter, I devoted a whole section to vectors. The reason is that they

constitute an extremely important benchmark to get into the topic of vector spaces, as

the name of the topic itself indicates. The main objective of the theory of vector spaces

is sometimes¹ described as follows: Geometrical insights at hand with 2- or 3-dimensional

real vectors are really helpful. Can we, in some way, generalize these insights to other

mathematical objects, for which a geometric picture is not available? By “insights”, we here

think of our ability to identify low dimensional real vectors with geometrical entities, and,

thereby to guide our thoughts. For instance, various vector operations are easily plotted:

Figure 1: 2D Geometrical Interpretation of Vector Operations

And, in turn, these geometrical representations give us simple and fundamental principles

that help us solve various problems. Think, for instance, of the projection theorem in two

dimensions, which states that the shortest path from a point x to a line l is that which lies on

the perpendicular to l. While geometrically very intuitive, this result would probably have

been very hard to find out, had we not had a geometrical representation for vectors. And

yet, it is a fundamental result for optimization problems such as the least squares problems

you may have studied as an undergraduate. Its generalization to n-dimensional real vectors

allows, among other things, the application of the least squares method in situations where

no picture can guide the thoughts. What this chapter will try to convince you of is

that the generalization applies to many more mathematical objects than n-tuples of real

numbers².

IN THIS CHAPTER, BY VECTOR WE NEED NOT MEAN AN N-TUPLE OF REAL

NUMBERS, BUT MAY REFER TO MANY MORE OBJECTS (FUNCTIONS,

SEQUENCES, MATRICES, ...)!

¹ See e.g. Luenberger [4] or Gross [3].

² For instance, functions and matrices!

Now, generalizing something need not be easy. Indeed, one has to pin down the exact

features of the simpler concepts which guarantee the validity of the insights they bring to

us. In our case, these features are precisely the algebraic structure which we have defined for

matrices (and hence, for vectors) in our previous chapter. Therefore, in the sequel, we find it

important to distinguish simple sets from “algebraically structured sets”, i.e. sets equipped

with an algebraic structure. In order to mark the change in emphasis when we manipulate

such “structured sets”, we give them a different name, namely, we call them spaces. A vector

space, then, is a set of mathematical objects³ equipped with the same algebraic structure as

that of vectors of real numbers.

2 From Classical Vectors to Vector Spaces

Let us try to summarize the algebraic structure we introduced in the last chapter. Remember

that, by vector, unless otherwise stated, we meant a column vector, i.e. an n-tuple of

real numbers. This notion of a vector will now be referred to as the classical notion of a vector.

Definition: (The Vector Space of Classical Vectors Vn)

Let Vn denote the set collecting all n-tuples of real numbers. Associate to this set two

operations: the vector addition, which associates to any x and y in Vn the n × 1 vector

x + y, and the scalar multiplication, which associates to any scalar λ and any x in Vn the

n × 1 vector λx. Denote by Vn the space (Vn, +, ×). The following properties hold:

(i) Vector addition is commutative: ∀x,y ∈ Vn x + y = y + x

(ii) Vector addition is associative: ∀x,y, z ∈ Vn x + (y + z) = (x + y) + z

(iii) There exists a null vector 0n in Vn such that: ∀x ∈ Vn x + 0n = x

(iv) Scalar multiplication is associative: ∀λ, µ ∈ R ∀x ∈ Vn λ(µx) = (λµ)x

(v) Scalar multiplication is distributive over vector and scalar additions:

∀λ ∈ R ∀x,y ∈ Vn λ(x + y) = λx + λy

∀ λ, µ ∈ R ∀ x ∈ Vn (λ+ µ)x = λx + µx

(vi) If 1 denotes the scalar multiplicative identity and 0 the scalar zero, then:

∀x ∈ Vn 1x = x and 0x = 0n

³ For instance, vectors in the classical sense of the term (i.e. n-tuples of real numbers), but not necessarily! Functions, matrices, sequences, ... can all fit in this new, more general, concept of a vector! Beware: do not take this to mean that those mathematical objects can be identified with n-tuples of real numbers. Rather, it means that our concept of a vector now includes the concept of vector you studied in high school, while not being itself included in that concept.

Remark: I did not call my set of classical vectors of length n Rn. Any idea why? Observe that

I have been silent about another vector operation: the scalar product! It is conventional to

forget about the scalar product when defining vector spaces. By doing so, one includes in

the definition of vector spaces all these sets which can be equipped with the above algebraic

structure but may not be equipped with a scalar product. The absence of a scalar product is

sometimes emphasized by calling the vector space under consideration a linear space. Rn,

however, is itself a notation for a space of n-tuples of real numbers, but a space with a nicer

algebraic structure than Vn: one that includes the scalar product! A vector space that possesses

a well defined scalar product is called a pre-Hilbert space⁴.

As mentioned in the introduction, in order to define general vector spaces, we take all these

algebraic properties and use them as defining axioms. Note the change of emphasis here.

A property is the consequence of some set of axioms, i.e., initial assumptions. Using the

algebraic properties of classical vectors as defining axioms means that we now understand as

vectors not only n-tuples of real numbers, but all mathematical objects which are members

of a space endowed with these algebraic laws.

Definition: (Real Vector Space)

Let X := (X,+,×) be a set of elements, which we will from now on call vectors, together

with two operations. Namely, the vector addition, which associates to any x and y in X

the vector x + y in X, and the scalar multiplication, which associates to any scalar λ and

any x in X the vector λx in X. X is called a vector space if the following properties

hold:

(i) Vector addition is commutative: ∀x,y ∈ X x + y = y + x

(ii) Vector addition is associative: ∀x,y, z ∈ X x + (y + z) = (x + y) + z

(iii) There exists a null element 0 in X such that: ∀x ∈ X x + 0 = x

(iv) Scalar multiplication is associative: ∀λ, µ ∈ R ∀x ∈ X λ(µx) = (λµ)x

(v) Scalar multiplication is distributive over vector and scalar additions:

∀λ ∈ R ∀x,y ∈ X λ(x + y) = λx + λy

∀ λ, µ ∈ R ∀ x ∈ X (λ+ µ)x = λx + µx

(vi) If 1 denotes the scalar multiplicative identity and 0 the scalar zero, then:

∀x ∈ X 1x = x and 0x = 0

⁴ Actually, Rn possesses a further important, yet non-algebraic, property: that of completeness. It therefore qualifies for the name of Hilbert space!

Remark: the attribute “real” in our definition is simply here to express the fact that our

scalars are elements of the real line. The concept of vector spaces can naturally be extended

to that of vector space over a different set of numbers, such as for instance C, the set of

complex numbers. Yet, we will focus on real vector spaces in what follows.

It is a good exercise to verify that the following sets, endowed with proper operations, can

now also be considered as vector spaces:

• V = {f : dom(f) = [a, b]}, the set of real valued functions with domain [a, b],

• $S = \{x \mid x = \{\zeta_k\}_{k=1}^{\infty}\}$, the set of infinite sequences of real numbers,

• The set Cn(X) of all bounded and continuous real valued functions with domain X ⊆ Rn,

• Mm×n, the space of m× n matrices,

• ... And many others!
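As a hedged illustration of the first item in the list above, the following Python sketch treats real valued functions on [a, b] as vectors, with addition and scalar multiplication defined pointwise. The helper names (add, scale, zero, agree) and the choice of sin and exp are assumptions made for this example only; the checks are spot checks on a finite grid, not a proof of the axioms.

```python
import math

# Vectors here are real valued functions on [a, b]; both operations are
# defined pointwise. All helper names are illustrative.

def add(f, g):
    """Vector addition: (f + g)(t) = f(t) + g(t)."""
    return lambda t: f(t) + g(t)

def scale(lam, f):
    """Scalar multiplication: (lam f)(t) = lam * f(t)."""
    return lambda t: lam * f(t)

def zero(t):
    """The null element: the function equal to 0 everywhere."""
    return 0.0

def agree(f, g, points, tol=1e-9):
    """Compare two functions on a finite grid (a spot check, not a proof)."""
    return all(abs(f(t) - g(t)) <= tol for t in points)

a, b = 0.0, 1.0
grid = [a + (b - a) * i / 10 for i in range(11)]
f, g = math.sin, math.exp
lam, mu = 2.0, -3.0

commutative = agree(add(f, g), add(g, f), grid)                    # axiom (i)
has_null = agree(add(f, zero), f, grid)                            # axiom (iii)
distributive = agree(scale(lam + mu, f),
                     add(scale(lam, f), scale(mu, f)), grid)       # axiom (v)
zero_scalar = agree(scale(0.0, f), zero, grid)                     # axiom (vi)
```

The point of the sketch is that nothing in the axioms requires a vector to be a tuple of numbers: a function, together with pointwise operations, behaves in the same way.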

Elementary but important properties of vector spaces are the cancellation laws.

Proposition: (Cancellation laws)

Let X := (X,+,×) be a real vector space, x, y and z belong to X, and λ and γ belong

to R. Then we have the following Cancellation laws :

(i) If x + y = x + z, then y = z

(ii) If λx = λy and λ ≠ 0, then x = y

(iii) If λx = γx and x ≠ 0, then λ = γ

Proof: See Exercise 1

To conclude this section, let us generalize another point with which you have become familiar

when working with classical vector spaces: the relation between R, R2, R3, and so on... This

relation is made formal through the concept of Cartesian product:

Definition: (Cartesian product)

Let X := (X,+,×) and Y := (Y,+,×) be two real vector spaces. We define the Cartesian

product of X and Y, denoted X×Y, as the collection of ordered pairs (x, y) with x an element

of X and y element of Y together with two operations: addition and scalar multiplication,

defined respectively as (x1, y1) + (x2, y2) = (x1 + x2, y1 + y2) and λ(x, y) = (λx, λy).

Remark: You may want to verify that the Cartesian product of two real vector spaces is itself

a vector space.
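A minimal sketch of the two operations of the Cartesian product, assuming X = Y = R so that X × Y is R2. The function names pair_add and pair_scale are hypothetical and serve only this illustration.

```python
# Elements of X x Y are ordered pairs (x, y). Here X = Y = R, so each
# component is a plain float and the product space is R^2.

def pair_add(p, q):
    """(x1, y1) + (x2, y2) = (x1 + x2, y1 + y2)."""
    return (p[0] + q[0], p[1] + q[1])

def pair_scale(lam, p):
    """lam (x, y) = (lam x, lam y)."""
    return (lam * p[0], lam * p[1])

u, v = (1.0, 2.0), (3.0, -1.0)
total = pair_add(u, v)        # componentwise sum
scaled = pair_scale(2.0, u)   # componentwise scaling
```

Each component is handled by the operations of its own space, which is why the axioms of X and Y carry over to X × Y.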

3 Subspaces, Linear Combinations, and Linear Dependence

Considering subsets of a “universal” or “ambient” set often proves useful in mathematics.

For instance, we may like to consider only the integers and not the whole real line. It is possible

to generalize the concept of a subset to the context of spaces (i.e. algebraically structured

sets). If we are to do so, however, we do not want to lose the very structure we looked

for when moving from the notion of a set to that of a space⁵. I first introduce a concept

that will help us proceed along this line, and then turn to the notion of vector subspace.

Definition: (Closure Under an Operation)

Let X := (X,+,×) be a real vector space. We say that Y ⊆ X is closed under

addition if and only if, for any two elements y1 and y2 in Y, we have that y1 + y2

belongs to Y. Similarly, Y is closed under scalar multiplication if and only if, for any y in Y and any scalar λ, λy belongs to Y.

Definition: (Vector Subspace)

Let X := (X,+,×) be a real vector space and Y a non empty subset of X. We say

that Y is a subspace of X if and only if Y is closed under vector addition and scalar

multiplication.

Notation: In what follows, I will generally denote a real vector space (X,+,×) by X.

Remark: If you understood properly the idea of closure, you should be able to conclude that a

simple way to check whether Y is a subspace of X or not is to verify or falsify the following

statement:

∀y1,y2 ∈ Y ∀λ, µ ∈ R λy1 + µy2 ∈ Y

This means that a subspace of a real vector space is a non-empty subset that contains every linear

combination of any two of its elements! Keep that in mind!

Any subspace of a vector space is itself a vector space. Further, note that the entire space

X is a subspace of X as X is by definition a subset of itself. In the same way that we call

a subset a proper subset whenever the inclusion is strict, a subspace not equal to the entire

space is called a proper subspace. For instance, we may think of the space of convergent

infinite real sequences as a proper subspace of that of infinite real sequences mentioned above.

⁵ Remember, the structure will guarantee the valid extension of our geometrical insights!

Proposition: (Intersection and Addition of Subspaces)

Let M and N be subspaces of a real vector space X. Then:

(i) their intersection, M ∩ N, is a subspace of X.

(ii) their sum, M + N, is a subspace of X.

Proof: See Exercise 2

Remark: Note that nothing is said about the union!! (Counterexample?)
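One possible counterexample for the union, sketched in Python under the assumption that the ambient space is R2 and that M and N are the two coordinate axes. The membership functions in_M, in_N and in_union are illustrative names.

```python
# Two subspaces of R^2: the horizontal axis M and the vertical axis N.

def in_M(v):
    return v[1] == 0

def in_N(v):
    return v[0] == 0

def in_union(v):
    return in_M(v) or in_N(v)

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

m = (1, 0)   # belongs to M, hence to the union
n = (0, 1)   # belongs to N, hence to the union
s = add(m, n)                # the sum (1, 1)
union_closed = in_union(s)   # (1, 1) lies on neither axis
```

Since m and n both belong to M ∪ N while their sum does not, the union is not closed under addition and is therefore not a subspace.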

Summing up, we have defined vector spaces, vector subspaces, and argued that any linear

combination of vectors in a vector (sub)space also lies in that (sub)space. The next result

establishes a converse proposition: linear combinations can be used to construct a subspace

from an arbitrary set of vectors in a vector space.

Proposition: (Generated Subspace (a.k.a. Span))

Let Y be a subset of a real vector space X. Then, the set Span(Y), which consists of all

vectors in X that can be expressed as linear combinations of vectors in Y, is a subspace of

X. It is called the subspace generated by Y or span of Y and it is the smallest subspace

which contains Y.

Example: In chapter 1, section 7.4, we have seen that the equation Ax = b could be understood

as a linear combination of A’s columns. We have argued that, if the n columns of a square matrix A are

linearly independent, then, for all b in Rn, the equation Ax = b has a unique solution. The

reason is that n linearly independent vectors will span the whole of Rn! Had only m < n

columns been independent, the space spanned by the columns of A, a.k.a. the column space

of A, would have been a proper subspace of dimension m. The b’s in Rn but outside the

column space of A would have been such that the equation Ax = b has no solution.

Actually, the vector space terminology allows us to define more precisely the concept of linear

independence that we have used only informally so far.

Definition: (Linear Dependence, Linear Independence)

Let x be an element of a real vector space X. x is said to be linearly dependent upon

a set S of vectors of X if it can be expressed as a linear combination of vectors from

S. Equivalently, x is linearly dependent upon S if and only if x ∈ Span(S). If that is

not the case, the vector x is said to be linearly independent of the set S. Finally, a set

of vectors is said to be a linearly independent set if each vector of the set is linearly

independent of the remainder of the set.

Thus, two vectors are linearly independent if they do not lie on a common line through the

origin, three vectors are linearly independent if they do not lie in a plane through the origin,

etc...

Remark: Clearly, 0 is dependent on any given vector x (Why?). By convention, the set consisting

of 0 only is understood to be a dependent set and a set consisting of a single nonzero

vector an independent set.

Theorem: (Testing Linear Independence)

A necessary and sufficient condition for the set of vectors x1, x2, ..., xn to be linearly

independent is that:

$$\text{if } \sum_{k=1}^{n} \lambda_k x_k = 0, \text{ then } \lambda_k = 0 \quad \forall k = 1, 2, \dots, n$$

Proof: (Necessary part; By contradiction)

Let the set of vectors x1, x2, ..., xn be linearly independent and assume there is a λk different from zero in the above sum. For simplicity, name the vectors so that this λk is the

one corresponding to the nth vector. Then,

$$\sum_{k=1}^{n} \lambda_k x_k = 0 \iff \sum_{k=1}^{n-1} \lambda_k x_k = -\lambda_n x_n$$

Therefore, since λn ≠ 0, the following holds:

$$x_n = -\sum_{k=1}^{n-1} \frac{\lambda_k}{\lambda_n} x_k$$

That is, xn is a linear combination of the other vectors, which contradicts our initial assumption.
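For classical vectors, the condition of the theorem can be checked mechanically. The sketch below is one way to do so, not the method of the text: it uses the standard fact that the only solution of λ1x1 + ... + λnxn = 0 is the zero solution exactly when the matrix whose rows are the xk has rank n. The function names rank and linearly_independent are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of equal-length vectors, by Gaussian elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in col.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def linearly_independent(vectors):
    """True when no nontrivial linear combination of the vectors is zero."""
    return rank(vectors) == len(vectors)

canonical = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
collinear = [(1, 2), (2, 4)]                  # second vector is twice the first
coplanar = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # third = 2 * second - first
```

Here linearly_independent(canonical) holds, while the other two sets fail the test because one vector can be written as a linear combination of the others.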

Finally, the following definition will come in handy if you are to use tools of advanced calculus.

Definition: (Basis and Space Dimension)

A finite set S of linearly independent vectors is said to be a basis for the space X if S

generates X. A vector space having a finite basis is said to be finite dimensional. All

other vector spaces are said to be infinite dimensional.

Example: Consider R3 with coordinate axes x, y, and z. Then any 3-tuple of independent

3-dimensional vectors spans the whole space R3. A particular example, known as the canonical

basis, is the following set of three vectors: {(1, 0, 0); (0, 1, 0); (0, 0, 1)}

Theorem: (Uniqueness of the Dimension)

Any two bases for a finite dimensional vector space contain the same number of elements.

Proof: (The idea only)

The result is quite intuitive. Assume it is not the case, i.e., one basis, say, basis 1, has

n elements and another basis, say, basis 2, has m elements, with m ≠ n. Without loss of

generality, assume m < n. Because basis 1 is a basis, you may express all elements of basis

2 as linear combinations of the elements of basis 1. And because the elements of basis 2 are

linearly independent, you can, one by one, substitute m elements of basis 1 by the

m elements of basis 2 so expressed, and still generate the whole space. But the m elements of basis 2 alone already suffice to generate the

whole space, so the n − m remaining elements of basis 1 would be redundant, and it must be that n = m.

Example: Any pair of independent vectors in Rn generates a plane, which is thus a finite

dimensional space of dimension two. No finite collection of vectors will suffice to generate

Cn(X), the set of all bounded and continuous real valued functions with domain X ⊆ Rn. It

is thus an infinite dimensional space.

4 Affine and Convex Sets

The concept of a subspace is fundamental. Yet, in many applications, this remains, “geometrically”

speaking, too restrictive a concept, and we need to make a trade-off. While,

on the one hand, we would be happy to keep as much as possible of the introduced algebraic

structure, on the other hand, we must frequently work with subsets of vector spaces which

are not themselves subspaces and which we are not willing to replace with their span. For

instance, in optimization, we most often face constraints on our vectors, constraints which

define the subset of the ambient space over which we are maximizing. Of course, because

these constraints arise from our real environment, there is a priori no reason that the region

they define constitute a subspace, i.e. that it be closed under arbitrary linear combinations.

And yet substituting the subspace spanned by the constraint set to the constraint set would

precisely nullify our attempts to model the constraints imposed by the environment we try

to model. In this section, we present two types of subsets which provide, in many a situation,

a satisfactory answer to this trade-off. Note: because we lose on the algebraic dimension,

we drop the label of “space” and come back to that of “set”.

I hope you perceived that, in the previous section, we established a strong relation between

linear combinations and subspaces. I will now present the concepts of affine combination,

which is fundamental to understanding what affine sets are⁶, and that of convex combination,

which relates to convex sets. Both impose algebraic restrictions on the already discussed

concept of linear combination. Each restriction is of course a loss on the algebraic dimension.

But keep in mind that it is also the only way to gain in the “geometrical” dimension, that is,

to enlarge our collection of “workable” sets. Indeed, instead of requiring from our sets that

they contain all linear combinations of their elements, we will ask from them to contain only

a specific subset of the linear combinations of their elements. Such an algebraic requirement

being less demanding, our class of “satisfying” sets will be larger.

4.1 Affine Sets

It is sometimes suggested⁷ that a second advantage of vector spaces, stemming from the

focus on the algebraic structure, is to “free ourselves from the coordinate system”. Instead

of a specific coordinate system, vectors exist in their own right and become the main object of

study. The next concept, which is sometimes described as an attempt to “forget about the

origin of the vector space”, pushes the idea even further as it simply drops the requirement

that our subset contains the origin of our ambient space (requirement (iii) in our definition of

vector spaces).

Definition: (Affine Sets)

The translationᵃ of a subspace is called an affine set.

ᵃ A translation consists of moving an object from one point to another. The direction, sense and magnitude of this move are usually specified by means of a vector.

This definition conveys the concept clearly (see Figure 2). However, the following characterization

via affine combinations proves more useful in applications:

Definition: (Affine Combination)

An affine combination of the vectors x1, x2,...,xn is a linear combination of the vectors,

i.e., a sum $\sum_{i=1}^{n} \lambda_i x_i$ with $\lambda_i \in \mathbb{R}$, such that the following additional requirement holds:

$$\sum_{i=1}^{n} \lambda_i = 1$$

⁶ Also often referred to as affine subspaces or linear varieties. Yet, as explained above, this is not strictly speaking a space anymore and we should therefore be wary of the first alternative denomination. The second alternative denomination could be misleading as well, as it uses the term “linear” while we will, in fact, be talking of sets which preserve affine (and not linear) combinations.

⁷ See e.g. Gross [3].

Proposition: (Characterization of Affine Sets)

Let Y be a subset of a vector space X. Y is affine if and only if it contains every affine

combination of its elements.

Proof: (I just give the idea)

Let Y be a set that contains every affine combination of its elements. Consider any element

y0 of Y. It can be shown that Y − y0 := {y − y0 | y ∈ Y} is a subspace. Further, it

can be shown that the subspace thus defined does not depend on which y0 you picked! Said

otherwise, a set closed under affine combinations is the translation of a subspace.

Figure 2: To every affine set is associated a unique subspace

The other direction is straightforward: any affine set is a translated subspace, so the only

property of subspaces that it may fail to fulfill is that of containing the origin.

Remark: Hence, to verify whether a set is affine or not, one has to make sure that any line

going through two different points of the set is fully contained in it!

Clearly, affine sets preserve most of the properties of subspaces, and this probably explains

why the terminology affine spaces is popular. Further, many interesting sets of constraints

actually define an affine set as our optimization region. Linear equality constraints constitute

such an example, in the sense that the solution set of a system of linear equations, XA = {x | Ax = b}, where x belongs to X ⊆ Rn, A belongs to Mm×n and b belongs to Rm, is an affine set whenever it is non-empty. Technically, they

are even more than a simple example. Rather, they constitute a third characterization of

affine sets, as it can be shown that every affine set can be expressed as the solution set of a

system of linear equations.
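The link between linear equations and affine combinations can be spot checked numerically. The following sketch assumes a single equation x1 + x2 + x3 = 3 in R3; the matrix A, the vector b, the chosen solutions and the helper names are all illustrative.

```python
def mat_vec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def affine_combination(points, weights):
    """Weighted sum of points; the weights must add up to 1."""
    if sum(weights) != 1:
        raise ValueError("weights of an affine combination must sum to 1")
    n = len(points[0])
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(n)]

A = [[1, 1, 1]]   # one linear equation: x1 + x2 + x3 = 3
b = [3]
solutions = [[3, 0, 0], [0, 3, 0], [1, 1, 1]]   # three points solving Ax = b

weights = [2, -3, 2]   # sum to 1; negative weights are allowed
combo = affine_combination(solutions, weights)
still_solves = mat_vec(A, combo) == b

# A linear combination whose weights do not sum to 1 leaves the set:
doubled = [2 * c for c in solutions[0]]
leaves_set = mat_vec(A, doubled) != b
```

The affine combination still solves the system because A(Σ λi xi) = Σ λi Axi = (Σ λi) b = b, whereas doubling a solution gives 2b. The solution set is thus closed under affine combinations but, for b ≠ 0, not under general linear combinations.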

4.2 Convex Sets

The strong relation between affine sets and sets of linear constraints directly points towards

some limitations. For instance, constraints are often non-linear. But, as we will see in the

next chapter, a very important class of non-linear constraints can be locally approximated by

linear constraints, so that non-linearity need not be too big a problem. However, even when

constraints are linear, a more serious issue is the existence of inequality (rather than, or in

addition to, equality) constraints. The concept of convex sets helps us deal with that issue,

at least in many relevant cases. It is associated with the concept of convex combination and

arises naturally when the constraints of our optimization problem are convex functions⁸.

Definition: (Convex Combination)

A convex combination of the vectors x1, x2,...,xn is a linear combination of the vectors,

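i.e., a sum $\sum_{i=1}^{n} \lambda_i x_i$ with $\lambda_i \in \mathbb{R}$, such that the following additional requirements hold:

$$\sum_{i=1}^{n} \lambda_i = 1 \quad \text{and} \quad \forall i \;\; \lambda_i \in [0, 1]$$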

Proposition: (Characterization of Convex Sets)

Let Y be a subset of a vector space X. Y is convex if and only if every convex combination

of any two of its elements is contained in Y.

Remark: Hence, to verify whether a set is convex or not, one has to make sure that the line

segment joining any two points of the set is fully contained in it!
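The segment test in the remark above can be turned into a small search for counterexamples. The sketch below is only a finite spot check under assumed sets: the closed unit disc and an annulus in R2, sampled on a rational grid with exact arithmetic. Finding no violation does not prove convexity, whereas finding one does disprove it. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

def in_disc(p):
    """Closed unit disc in R^2."""
    return p[0] ** 2 + p[1] ** 2 <= 1

def in_annulus(p):
    """Unit disc with an open hole of radius 1/2 removed."""
    return Fraction(1, 4) <= p[0] ** 2 + p[1] ** 2 <= 1

def convex_combination(x, y, lam):
    """lam * x + (1 - lam) * y, computed coordinate by coordinate."""
    return tuple(lam * xi + (1 - lam) * yi for xi, yi in zip(x, y))

GRID = [Fraction(k, 4) for k in range(-4, 5)]
POINTS = list(product(GRID, GRID))
LAMBDAS = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

def find_violation(member):
    """Return (x, y, lam) with x, y in the set but their combination outside,
    or None if no such triple exists among the sampled points."""
    inside = [p for p in POINTS if member(p)]
    for x, y in product(inside, inside):
        for lam in LAMBDAS:
            if not member(convex_combination(x, y, lam)):
                return x, y, lam
    return None

disc_violation = find_violation(in_disc)        # none expected: the disc is convex
annulus_violation = find_violation(in_annulus)  # the hole breaks convexity
```

For the annulus, the midpoint of (−1, 0) and (1, 0) is the origin, which lies in the hole, so a violation necessarily exists among the sampled points.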

Figure 3: Convex and non-convex sets

⁸ We’ll define what that means in the next chapter ;)

If a set is not convex, we may “convexify” it by adding the smallest possible amount of

elements such that the resulting set is convex. This idea is very close to that of the span.

When a subset is not a subspace, the points I have to add for it to become

a subspace are those which guarantee the closure of the set under linear combinations. To

make a non-convex subset convex, I simply add those elements which guarantee the closure

of the set under convex combinations.

Definition: (Convex Hull)

Let Y be a subset of a vector space X. The convex hull of Y, denoted Co(Y), is the smallest

convex set containing Y.

The convex hull of Y may also be expressed as the set of all possible convex combinations

of the elements of Y:

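$$\mathrm{Co}(Y) = \left\{ x \in X : \exists\, y_1, y_2, \dots, y_n \in Y \text{ and } \lambda \in [0, 1]^n \text{ s.t. } \sum_{i=1}^{n} \lambda_i = 1 \text{ and } x = \sum_{i=1}^{n} \lambda_i y_i \right\}$$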

Figure 4: Convex Hulls

To conclude the subsection, here are some important properties that will, in some cases,

help you decide on the convexity of a given set:

Proposition: (Operations which preserve convexity)

Let C be the collection of all convex sets in the ambient space X. Let Ca be an arbitrary

collection of convex sets. Then:

(i) ∀K ∈ C ∀α ∈ R αK ∈ C ᵃ

(ii) ∀K, G ∈ C K + G ∈ C

(iii) ∩_{K ∈ Ca} K ∈ C

ᵃ For all k in K, let kα := αk. Then let αK := {kα | k ∈ K}.

Proof: See Exercise 4.

4.3 Important Examples: Hyperplanes and Half Spaces

Please keep in mind that every subspace is an affine set, that every affine set is a convex

set, but that the converses of these two statements are not true!!

In this section we shall display two families of sets which play an important role in convex

optimization theory. The first type of sets, called hyperplanes, is a family of affine sets and

is thus also a family of convex sets. The second type, called halfspaces, constitutes only a

family of convex sets.

Formally, a hyperplane should be defined as a maximal proper affine set, that is, an affine

set Y that is a strict subset of the ambient space X and is such that if any affine set V ≠ Y

contains Y, then V = X. However, economists generally use a specific characterization of

hyperplanes to define them, which is valid for classical vector spaces (i.e., Euclidean spaces).

This characterization is very important in its own right and I use it too to define hyperplanes:

Definition: (Hyperplane)

Let X be a subspace of Rn. Then, a hyperplane of X is a set of the form:

$H_a^b := \{x \in X \mid a'x = b\}$

where a is an element of Rn that is different from 0 and b is an element of R.

Figure 5: Hyperplanes in R and R2

Geometrically, the hyperplane $H_a^b := \{x \in X \mid a'x = b\}$ can be interpreted as the set of

points whose inner product with a given vector a is constant, or, put otherwise, as a plane

with normal vector a⁹. The constant b determines the offset of the hyperplane from the

origin. A hyperplane of a space of dimension n necessarily is a set of dimension (n− 1), as

its characterizing equation restricts only one degree of freedom: that spanned by the vector

a. Further, one may wish to define an “above” and a “below” along the dimension spanned

by a. This is the job of half-spaces.

Definition: (Halfspace)

Let X be a subspace of Rn. Then, a halfspace of X is a set of the form:

$H_a^{b-} := \{x \in X \mid a'x \le b\}$

or

$H_a^{b+} := \{x \in X \mid a'x \ge b\}$

where a is an element of Rn that is different from 0 and b is an element of R.

Put differently, a hyperplane incidentally defines two halfspaces. The halfspace determined

by a′x ≥ b is the halfspace extending in the direction of a and the halfspace determined

by a′x ≤ b is the halfspace extending in the direction of −a.
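A short sketch of how a point can be located relative to a hyperplane, assuming the illustrative values a = (1, 2) and b = 4 in R2. The function names inner and side are hypothetical. Points on the hyperplane belong to both closed halfspaces; the labels below only report strict positions.

```python
def inner(a, x):
    """Inner product a'x of two vectors of equal length."""
    return sum(ai * xi for ai, xi in zip(a, x))

def side(a, b, x):
    """Locate x relative to the hyperplane {x | a'x = b}."""
    value = inner(a, x)
    if value == b:
        return "on hyperplane"
    # a'x > b: strictly inside the halfspace extending in the direction of a.
    return "direction of a" if value > b else "direction of -a"

a, b = (1, 2), 4

on_plane = side(a, b, (2, 1))   # 1*2 + 2*1 = 4
above = side(a, b, (3, 3))      # 1*3 + 2*3 = 9 > 4
below = side(a, b, (0, 0))      # 0 < 4

# The difference of two points of the hyperplane is orthogonal to a.
p, q = (2, 1), (4, 0)
difference = (p[0] - q[0], p[1] - q[1])
orthogonal = inner(a, difference) == 0
```

The last check illustrates why a is called the normal vector: moving within the hyperplane never changes the value of a′x.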

Figure 6: Halfspaces in R2

⁹ a′x = b ⇔ a′x − b = 0 ⇔ a′(x − y) = 0, where y is such that a′y = b and thus also belongs to the hyperplane! Reminder: a zero inner product indicates orthogonality!

5 Normed Vector Spaces and Continuity

You may remember from your previous classes, if they involved some optimization problems,

that you were usually considering continuous – and, actually, differentiable – functions. The

reason for that is as follows. Assume one wants to find out the maximum of a given function.

An obvious, yet inconvenient, way to proceed is to take every single element of the function’s

domain, to plug it into the function, to look at the output and to compare it to that associated

with the other elements. Needless to say, no one seriously considers this procedure!

So we have to find some kind of “tools” or “tricks” that will spare us all that tedious work.

But, of course, a tool is generally designed for a specific kind of object. Certainly, you would

not claim to efficiently open a bottle of wine with a beer opener. Well, the same applies to

our optimization tools. They will work very well, but only on a specific class of functions.

Namely, those which are continuous and differentiable.

Therefore, it is important for us to know what continuity precisely means. If you remember

the preliminary chapter, we associated continuity with the requirement that the images of

two nearby points should not stand “too far” apart from one another. The purpose of this

section is to make this statement formal, within the context of vector spaces. As the algebraic

concepts we just discussed do not say anything specific about how “close” two objects are

in the space, we introduce the concepts of distance and norm in a vector space.

5.1 Distance and Norm in a Vector Space

Many basic mathematical concepts are very intuitive. For sure, the concept of a distance

function is one of the most intuitive that could be. Consider two objects that stand nearby

you, and ask yourself what properties you would like a distance function to have if it were

to give you the distance between these two objects. You would wish to set a conventional

minimal distance, for cases where the two objects considered lie at the same place (i.e., are

the same object). Thus, a first wish could be that the function always yields a non-negative

output. Second, it seems natural to consider only functions which give an output robust

to changes in the direction of measurement: whether one starts from object 1 and measures all

the way to object 2, or whether one proceeds the other way around, the output should be

the same. Finally, a third natural requirement is the following: when asked (i) to measure

the distance between object 1 and 2 and (ii) to measure the distance between object 1 and 2

while being required to pass through object 3, one should expect the outcome from (i) to be

no larger than the outcome from (ii). As the following formal definition will show,

these three properties are exactly what defines, in the eyes of mathematicians, a distance

function.

Definition: (Metric Space)

Let X be a vector space. If we can define a real-valued function d(., .) which maps any

two elements x and y in X into a real number d(x, y), and if that function is such that:

(i) ∀x, y ∈ X, d(x, y) ≥ 0, d(x, y) = 0 if and only if x = y, (non-negativity)

(ii) ∀x, y ∈ X, d(x, y) = d(y, x), (symmetry)

(iii) ∀x, y, z ∈ X d(x, y) ≤ d(x, z) + d(z, y), (triangle inequality)

then d(., .) is called a distance function (a.k.a. metric) for X and (X, d(., .)) a metric space.

Remark: Note that the possibility to define such a function isn’t guaranteed under all circumstances,

which implies a loss of generality. But do not worry too much about that, most

economic applications can make use of metric spaces – and if not, then generalizations exist!

Example: An important example of a distance function in R is the absolute value of the

difference: d(x, y) = |x− y|.
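
To make the three axioms concrete, the following minimal Python sketch checks them numerically for d(x, y) = |x − y| on a handful of sample points. The helper name `is_metric_on`, the comparison function `squared_distance`, and the sample points are illustrative assumptions, not part of these notes. Keep in mind that a finite check can only refute a candidate distance function, never prove that it is one.

```python
from itertools import product

def is_metric_on(d, points, tol=1e-12):
    """Check the three metric axioms on a finite sample of points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                        # (i) non-negativity
            return False
        if (d(x, y) <= tol) != (x == y):       # (i) d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:       # (ii) symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:  # (iii) triangle inequality
            return False
    return True

def abs_distance(x, y):
    """The example from the text: absolute value of the difference."""
    return abs(x - y)

def squared_distance(x, y):
    """A tempting candidate that is NOT a distance function."""
    return (x - y) ** 2

sample = [-2.0, -0.5, 0.0, 1.0, 3.5]
print(is_metric_on(abs_distance, sample))
print(is_metric_on(squared_distance, sample))
```

The squared difference fails the triangle inequality: with x = −2, y = 3.5 and z = 0, one gets d(x, y) = 30.25 while d(x, z) + d(z, y) = 16.25.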

The attentive reader may claim to have been fooled. Indeed, I consciously passed over

another important property which we would like distance functions to possess. Assume

one picks the two objects you investigated earlier on and translates them by 1 meter in the

same direction, i.e., moves both of them by 1 meter in a parallel fashion.

Would you expect the distance between the two objects to have changed? Certainly not.

Well, this requirement is not imposed in the above definition, and there is thus no reason

that it be fulfilled. (We will see an example at the end of this section!) One strategy to im-

pose this further requirement is to define the distance via a norm. Let us detail that strategy.

Definition: (Normed Space)

Let X be a vector space. If we can define a real-valued function ‖.‖ which maps each

element x in X into a real number ‖x‖, and if that function is such that:

(i) ∀x ∈ X, ‖x‖≥ 0, ‖x‖= 0 if and only if x = 0, (non-negativity)

(ii) ∀x, y ∈ X, ‖x+ y‖≤ ‖x‖+‖y‖, (triangle inequality)

(iii) ∀x ∈ X ∀λ ∈ R, ‖λx‖ = |λ| ‖x‖. (absolute homogeneity)

Then ‖.‖ is called a norm for X and (X, ‖.‖) a normed space.

Note: You’ll understand in chapter 3 why property (iii) is called this way. Keep the name in mind! ;)

Remark: From point (ii) it is easy to derive the following fact (do it!):

‖x− y‖≥ ‖x‖−‖y‖
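
One possible route to this fact, sketched here in the notation of the definition above, is to write x as (x − y) + y and apply the triangle inequality (ii):

```latex
\|x\| = \|(x - y) + y\| \le \|x - y\| + \|y\|
\quad\Longrightarrow\quad
\|x - y\| \ge \|x\| - \|y\| .
```

Exchanging the roles of x and y, and using (iii) with λ = −1 to get ‖y − x‖ = ‖x − y‖, also gives ‖x − y‖ ≥ ‖y‖ − ‖x‖.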

An important example for us is the Euclidean norm in Rn:

∀x = (x_1, ..., x_n) ∈ R^n, ‖x‖ := ( ∑_{i=1}^{n} x_i^2 )^{1/2} = (x′x)^{1/2}

But this is certainly not the only one. We can define, for instance, the norm of a matrix as

follows:

∀A ∈ M_{m×n}, ‖A‖ := max_{x ∈ R^n, ‖x‖ = 1} ‖Ax‖
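
As a rough numerical illustration of both norms, the sketch below computes the Euclidean norm with the standard library and approximates the matrix norm of a matrix with two columns by sampling unit vectors on the circle. The function names and the sampling resolution are assumptions made for this example, and the sampled maximum is only an approximation from below of the true maximum.

```python
import math

def euclidean_norm(x):
    """Euclidean norm: square root of the sum of squared coordinates."""
    return math.sqrt(sum(xi * xi for xi in x))

def mat_vec(A, x):
    """Product of a matrix (given as a list of rows) with a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def approx_matrix_norm_2d(A, samples=3600):
    """Approximate the max of ||Ax|| over unit vectors x in R^2 by sampling angles."""
    best = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        x = [math.cos(theta), math.sin(theta)]   # unit vector, ||x|| = 1
        best = max(best, euclidean_norm(mat_vec(A, x)))
    return best

print(euclidean_norm([3.0, 4.0]))    # 5.0
A = [[3.0, 0.0], [0.0, 1.0]]
print(approx_matrix_norm_2d(A))      # close to 3, attained at x = (1, 0)
```

For the diagonal matrix used here, the maximum stretch is along the first coordinate axis, which is why the result is the largest diagonal entry.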

Norms relate to distance functions as follows: on a given vector space, every norm defines a

distance function! In other words, the existence of a norm is slightly more demanding than

that of the distance function and, as a consequence, any normed vector space necessarily is a

metric space (while the converse need not be true!). To get a bit of intuition, let us consider

a distance function between any x in X and the zero element of X (i.e. the origin). By the

above definition of a distance, we have:

(i) ∀x ∈ X, d(x, 0) ≥ 0, d(x, 0) = 0 if and only if x = 0,

(ii) ∀x ∈ X, d(x, 0) = d(0, x),

(iii) ∀x, y ∈ X d(x, y) ≤ d(x, 0) + d(0, y).

Now the two concepts look even more similar! It is thus worth asking the following

question: does d(., 0) define a norm in X? Clearly, the non-negativity property is fulfilled.

Yet, nothing here guarantees the absolute homogeneity property. Nor does (iii) by itself

deliver the norm's triangle inequality, which would read d(x + y, 0) ≤ d(x, 0) + d(y, 0).

Thus, unless we add some more requirements, d(., 0) need not define a norm

in X. However, the following result can actually be shown:

Proposition: (Norm vs. Distance)

Let (X, ‖.‖) be a normed vector space. Then, the function

d(x, y) = ‖x − y‖ ∀x, y ∈ X

is a distance function in X. Further, d(., .) exhibits the following extra properties:

(i) ∀x, y ∈ X ∀λ ∈ R d(λx, λy) = |λ|d(x, y), (absolute homogeneity)

(ii) ∀x, y, z ∈ X d(x+ z, y + z) = d(x, y). (translation invariance)

Conversely, a metric d(., .) that exhibits the extra properties of absolute homogeneity

and translation invariance defines a norm by defining the norm of an element x as its

distance from the origin.
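
The two extra properties can be observed numerically. The sketch below builds the distance induced by the Euclidean norm on R2 and checks absolute homogeneity and translation invariance on one choice of vectors. The vectors, the scalar, and the helper names are arbitrary assumptions for illustration, and one instance is of course not a proof.

```python
import math

def norm(x):
    """Euclidean norm (any other norm would do for this illustration)."""
    return math.sqrt(sum(xi * xi for xi in x))

def d(x, y):
    """Distance induced by the norm: d(x, y) = ||x - y||."""
    return norm([xi - yi for xi, yi in zip(x, y)])

def scale(lam, x):
    return [lam * xi for xi in x]

def add(x, z):
    return [xi + zi for xi, zi in zip(x, z)]

x, y, z, lam = [1.0, 2.0], [4.0, 6.0], [10.0, -3.0], -2.0

print(d(x, y))                                                           # 5.0
print(math.isclose(d(scale(lam, x), scale(lam, y)), abs(lam) * d(x, y)))  # homogeneity
print(math.isclose(d(add(x, z), add(y, z)), d(x, y)))                    # translation
```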

Example: (The French Railway Metric is not translation invariant)

Consider the following example in R2:

d(x, y) = ‖x‖ + ‖y‖ if x and y are linearly independent,

d(x, y) = ‖x − y‖ if x and y are collinear,

with the bars denoting the Euclidean norm. You should verify that it satisfies the three

properties defining a distance function. Yet, it is not translation invariant: take, for

instance, a strictly positive vector z in R2 and two independent strictly positive vectors

x, y such that x + z and y + z are also independent. Then

d(x + z, y + z) = ‖x + z‖ + ‖y + z‖ > ‖x‖ + ‖y‖ = d(x, y),

since adding a strictly positive vector to a strictly positive vector strictly increases its Euclidean norm.

For those who want a bit of intuition here, this distance function simply forces one to pass

through the origin when measuring the distance between two points that are not contained in

a single ray from the origin. It is called the French Railway Metric because it used to be

almost true that, in France, if you were to travel between two cities that are not contained

in a single ray from Paris, then you had to travel through Paris. See, for instance,

the figure below, where to go from T (Toulouse) to B (Barcelona) you can proceed without

going through Paris, while to go from T to B’ (Bordeaux), you need to go through Paris.

Then, if one translates the origin to, say, Mannheim, the distance between Toulouse and

Barcelona, or between Toulouse and Bordeaux, as measured by the French Railway metric,

will change!

Figure 7: The French Railway Metric is not Translation Invariant
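
The failure of translation invariance is easy to reproduce on a computer. The following sketch implements the French Railway Metric on R2 as defined in the example and compares the distance between two points before and after a common translation. The specific vectors and the helper names (`railway`, `collinear`, `shift`) are assumptions chosen for illustration.

```python
import math

def norm(x):
    """Euclidean norm on R^2."""
    return math.hypot(x[0], x[1])

def collinear(x, y, tol=1e-12):
    """Two vectors of R^2 are collinear iff their determinant vanishes."""
    return abs(x[0] * y[1] - x[1] * y[0]) <= tol

def railway(x, y):
    """French Railway Metric: direct distance on a common line through the
    origin, otherwise the trip goes through the origin."""
    if collinear(x, y):
        return norm((x[0] - y[0], x[1] - y[1]))
    return norm(x) + norm(y)

def shift(x, z):
    return (x[0] + z[0], x[1] + z[1])

x, y, z = (2.0, 1.0), (1.0, 2.0), (1.0, 1.0)

before = railway(x, y)                       # ||(2, 1)|| + ||(1, 2)||
after = railway(shift(x, z), shift(y, z))    # ||(3, 2)|| + ||(2, 3)||
print(before, after, after > before)
```

Here the translated points are still linearly independent, so both distances are measured through the origin, and the second one is strictly larger.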

5.2 Open sets, Closed sets, Compact sets

All the statements of this subsection assume the ambient space to be a metric space, so

of course they do hold in normed vector spaces!

In our quest for a formal definition of continuity, the concepts of open and closed sets

are the next necessary stop. So let us proceed.

Definition: (ε-Open Ball)

Let (X, d(., .)) be a metric space, x0 be an element of X, and ε be a strictly positive real

number. The ε-open ball Bε(x0) centered at x0 is the set of points whose distance from

x0 is strictly smaller than ε, that is:

Bε(x0) = {x|x ∈ X, d(x, x0) < ε}

Definition: (ε-Closed Ball)

Let (X, d(., .)) be a metric space, x0 be an element of X, and ε be a strictly positive real

number. The ε-closed ball Bε[x0] centered on x0 is the set of points whose distance from

x0 is smaller than or equal to ε, that is:

Bε[x0] = {x|x ∈ X, d(x, x0) ≤ ε}

Example: You can see any open interval (a, b) with a, b ∈ R and a < b as a (b − a)/2-open

ball centered around (a + b)/2 in R. Similarly, any closed interval [a, b] with a, b ∈ R and

a < b can be seen as a closed ball in R. Note that neither can be seen as closed or open balls

if the universal space has a dimension higher than that of R!
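
The interval example can be checked directly. In the sketch below, membership in an open or closed ball is tested for the interval with endpoints a = 1 and b = 5, seen as a ball of radius (b − a)/2 centered at (a + b)/2 in R. The function names and the numerical values are assumptions made for this illustration.

```python
def in_open_ball(x, center, eps, d):
    """x belongs to the open ball iff d(x, center) < eps."""
    return d(x, center) < eps

def in_closed_ball(x, center, eps, d):
    """x belongs to the closed ball iff d(x, center) <= eps."""
    return d(x, center) <= eps

def abs_distance(x, y):
    """Usual distance on the real line."""
    return abs(x - y)

a, b = 1.0, 5.0
center, eps = (a + b) / 2, (b - a) / 2        # center 3.0, radius 2.0

print(in_open_ball(4.9, center, eps, abs_distance))    # True: 4.9 lies in (1, 5)
print(in_open_ball(b, center, eps, abs_distance))      # False: 5 is not in (1, 5)
print(in_closed_ball(b, center, eps, abs_distance))    # True: 5 lies in [1, 5]
```

The only difference between the two balls is the treatment of the points at distance exactly ε from the center, here the endpoints 1 and 5.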

Coming back to geometrical intuition, wherein a set corresponds to a “geographical area”

in the universal space, we can informally grasp the concepts of open and closed sets. Namely,

an area may or may not have a boundary. Assume it has one. If that boundary is considered

to belong to the area, then we call the area a closed area. If, on the contrary, the boundary

is not considered to belong to the area, we call the area an open area. If part of the boundary

belongs to the area while some other part does not, then the area is neither closed nor open!

Finally, by convention, if no boundaries exist, as for the empty set or the universal set, the

area is said to be both closed and open. The next definitions use the concepts of open and

closed balls to formalize these ideas.

Definition: (Interior Point, Interior)

Let A be a subset of a metric space X. The point a in A is said to be an interior point

of A if and only if there exists ε > 0 such that the ε-open ball centered at a lies entirely

inside A. The collection of all interior points of A is called the interior of A, denoted

Int(A) or A°.

Definition: (Open Set)

Let A be a subset of a metric space X. A is said to be an open set if and only if A = Int(A).

Hence, any set A contains its interior, but the converse is true if and only if A is open.

Definition: (Closure Point, Closure)

Let A be a subset of a metric space X. The point x in X is said to be a closure point

of A if and only if, for every ε > 0, the ε-open ball centered at x contains at least one

point a that belongs to A. The collection of all closure points of A is called the closure

of A, denoted Ā.

Definition: (Closed Set)

Let A be a subset of a metric space X. A is said to be a closed set if and only if A = Ā.

Hence, any set A is included in its closure, but the converse is true if and only if A is

closed. We now can characterize the boundary as the set of elements such that, if they all

belong to A, then A is closed, and, if none of them belong to A, then A is open.

Definition: (Boundary Point, Boundary)

Let A be a subset of a metric space X. The point x in X is said to be a boundary point

of A if and only if, for every ε > 0, the ε-open ball centered on x contains at least one

point a that belongs to A and at least one point a^c that belongs to the complement of

A, A^C. The collection of all boundary points of A is called the boundary of A and

denoted ∂A

We may now rephrase our concepts of open and closed sets as follows:

• Let A be a subset of a metric space X. Then A is open if and only if none of the

boundary points of A lie in A: A ∩ ∂A = ∅.

• Let A be a subset of a metric space X. Then A is closed if and only if all the boundary

points of A lie in A: A ∩ ∂A = ∂A.

Figure 8: Closed sets, open sets, boundary
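
As a small worked example of these notions, consider the half-open interval [0, 1) in R with the usual distance. This example is an illustration chosen here, not one taken from the figure. Applying the definitions of interior, closure, and boundary gives:

```latex
\begin{align*}
A &= [0, 1) \subset \mathbb{R}, \qquad d(x, y) = |x - y|, \\
\operatorname{Int}(A) &= (0, 1), \qquad \bar{A} = [0, 1], \qquad \partial A = \{0, 1\}, \\
A \cap \partial A &= \{0\}.
\end{align*}
```

Since A ∩ ∂A is neither empty nor equal to ∂A, the set A is neither open nor closed: part of its boundary (the point 0) belongs to it, while another part (the point 1) does not.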

Three last results will help you determine whether some set is open or closed. Some

intuition will be given for them via examples in the exercises.

Theorem: (Properties of Open sets)

Let (X, d(., .)) be a metric space. Then

(i) ∅ and X are open in X.

(ii) A set A is open if and only if its complement is closed.

(iii) The union of an arbitrary (possibly infinite) collection of open sets is open.

(iv) The intersection of a finite collection of open sets is open.

Theorem: (Properties of Closed sets)

Let (X, d(., .)) be a metric space. Then

(i) ∅ and X are closed in X.

(ii) A set A is closed if and only if its complement is open.

(iii) The union of a finite collection of closed sets is closed.

(iv) The intersection of an arbitrary (possibly infinite) collection of closed sets is closed.
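
To see why finiteness matters for intersections of open sets, consider the open intervals (−1/n, 1/n) in R for n = 1, 2, 3, and so on. Every x ≠ 0 is eventually excluded from these intervals, so their infinite intersection is the single point {0}, which is not open since no ε-open ball around 0 fits inside it. The sketch below, with a hypothetical helper name, finds the first interval that excludes a given x.

```python
def first_interval_excluding(x):
    """Smallest n >= 1 such that x is NOT in the open interval (-1/n, 1/n).

    Only defined for x != 0: the point 0 belongs to every such interval.
    """
    if x == 0:
        raise ValueError("0 lies in (-1/n, 1/n) for every n")
    n = 1
    while abs(x) < 1.0 / n:
        n += 1
    return n

print(first_interval_excluding(0.3))     # 4, since 1/4 <= 0.3 < 1/3
print(first_interval_excluding(-2.0))    # 1, since -2 is outside (-1, 1)
```

By contrast, any finite intersection of these intervals is just the smallest of them, which is again an open interval.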

Fact: (Interior and Closure of Convex Sets)

Let A be a subset of a metric space X. If A is a convex set, then so are Ā and Int(A).

5.3 Continuity

We are finally ready to formally define continuity, one of the most fundamental requirements

for the application of our optimization toolbox.

Fact: (Continuous Function)

A function f mapping from a metric space (X, dX(., .)) to a metric space (Y, dY(., .)) is

continuous at x0 ∈ X if and only if, for every ε > 0, there is a δ > 0 such that if

dX(x, x0) < δ, then dY(f(x), f(x0)) < ε. A function that is continuous at every point of

its domain is said to be continuous.

According to this definition, a function is continuous at some point x0 if, for any element

contained in a δ-open ball around x0, we can be sure that the images of both this element

and x0 are (i) well defined, and (ii) contained in an ε-open ball around f(x0), where ε can be

chosen as small as we wish by just selecting a small enough δ. In fact, one can also generalize

the useful characterization of continuity that we have seen in the preliminary chapter, i.e.,

the characterization in terms of converging sequences. This is done by first generalizing the

notion of convergence to the context of metric spaces, which is achieved rather easily by

noting that, independently of our space’s dimension, the distance function maps into the

real line!
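
Returning to the ε-δ definition for a moment, here is a minimal sketch for the function f(x) = 2x + 1 on R with the usual distance, an example chosen here for illustration. Because |f(x) − f(x0)| = 2|x − x0|, the choice δ = ε/2 works at every point. The code only samples finitely many points inside the δ-open ball, so it illustrates the definition rather than proving continuity.

```python
def f(x):
    return 2.0 * x + 1.0

def delta_for(eps):
    """For f(x) = 2x + 1, |f(x) - f(x0)| = 2|x - x0|, so delta = eps / 2 works."""
    return eps / 2.0

def check_continuity_at(x0, eps, samples=1000):
    """Sample points with |x - x0| < delta and verify |f(x) - f(x0)| < eps."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples          # 0 < offset < delta
        for x in (x0 - offset, x0 + offset):
            if abs(f(x) - f(x0)) >= eps:
                return False
    return True

print(check_continuity_at(x0=1.0, eps=0.5))
print(check_continuity_at(x0=-3.0, eps=1e-3))
```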

Definition: (Convergence)

Let X be a metric space. The infinite sequence of vectors {xn}n∈N is said to converge to

a vector x if the sequence {d(xn, x)}n∈N of real numbers converges to 0. That is,

∀ε > 0 ∃N ∈ N ∀n > N d(xn, x) < ε

In this case, we write xn → x.
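
As an illustration of this definition, take the sequence xn = (1/n, 1 + 1/n) in R2 with the Euclidean distance, a hypothetical example not taken from the notes. Its distance to x = (0, 1) equals √2/n, a sequence of real numbers converging to 0. The sketch computes, for a given ε, an index N beyond which all sampled terms stay within ε of the limit.

```python
import math

def d(x, y):
    """Euclidean distance in R^2."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

def x_n(n):
    """n-th term of the sequence x_n = (1/n, 1 + 1/n), for n >= 1."""
    return (1.0 / n, 1.0 + 1.0 / n)

limit = (0.0, 1.0)

def index_for(eps):
    """An N such that d(x_n, limit) < eps for all n > N (here d = sqrt(2)/n)."""
    return math.ceil(math.sqrt(2.0) / eps)

eps = 0.01
N = index_for(eps)
print(N)
print(all(d(x_n(n), limit) < eps for n in range(N + 1, N + 1000)))
```

The point of the definition is visible in `d`: whatever the dimension of the space, the convergence question is reduced to a sequence of real numbers.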

The fact that the limit of a converging sequence is unique in a metric space is easily shown

using the nonnegativity and triangle inequality properties of a metric. Suppose xn → x and

xn → y, then ∀n ∈ N 0 ≤ d(x, y) ≤ d(x, xn) + d(xn, y) and the convergence of the left and

right-hand sides to 0 forces d(x, y) = 0, i.e., x = y (squeeze theorem, a.k.a. sandwich theorem; see footnote 10).

Proposition (Characterization of Continuity)

A function f mapping from a metric space (X, dX(., .)) to a metric space (Y, dY(., .)) is

continuous at x0 ∈ X if and only if, for every sequence {xn}n∈N in X, xn → x0 implies f(xn) → f(x0).
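
The sequential characterization is often the quickest way to detect a discontinuity. In the sketch below, the sequence xn = 1/n converges to x0 = 0. Along it, the images under a step function stay at distance 1 from the image of x0, whereas the images under the identity function get arbitrarily close. Both functions and the sequence are examples assumed for illustration.

```python
def step(x):
    """Step function: 0 for x <= 0 and 1 for x > 0 (discontinuous at 0)."""
    return 1.0 if x > 0 else 0.0

def identity(x):
    return x

x0 = 0.0
sequence = [1.0 / n for n in range(1, 10001)]     # x_n = 1/n converges to x0 = 0

# Distance between f(x_n) and f(x0) at the last computed term of the sequence.
gap_step = abs(step(sequence[-1]) - step(x0))
gap_identity = abs(identity(sequence[-1]) - identity(x0))
print(gap_step)        # 1.0: f(x_n) stays at 1 while f(x0) = 0
print(gap_identity)    # 0.0001: shrinks to 0 as n grows
```

A single sequence along which the images fail to converge is enough to rule out continuity at x0, while establishing continuity requires the implication to hold for every sequence converging to x0.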

Footnote 10: If you have a sandwich inequality a ≤ x ≤ b and a and b converge to the same limit, then x converges to that same limit too. Please note that I approve much more of the French name of that theorem: the cops’ theorem!

A Appendix 1: The Separating and Supporting Hyperplane Theorems

With the concepts of hyperplane, halfspace, and convex set are associated two extremely

important results for convex optimization. Both of these results are geometrically very intuitive,

and their proofs are way beyond the scope of this lecture. Therefore, I only illustrate them.

Theorem: (Separating Hyperplane Theorem V.1)

Let C be a convex set in a metric space X, with non-empty interior, and x be an element

of X not in Int(C). Then, there is a hyperplane in X containing x but no interior point

of C.

Figure 9: Separating Hyperplane Theorem

Definition (Supporting Hyperplane)

Let C be a subset of a metric space X and x0 be a point in its boundary ∂C. If a ≠ 0 is an

element of X that satisfies a′x ≤ a′x0 for all x in C, then the hyperplane {x | a′x = a′x0}

is called a supporting hyperplane to C at x0.

Figure 10: Supporting Hyperplane Theorem

Theorem (Supporting Hyperplane Theorem)

Let C be a convex subset of a metric space X. If Int(C) is non-empty and x0 is a point

in ∂C, then there exists a supporting hyperplane at x0.
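
For a concrete instance, take C to be the closed unit disc in R2, an example assumed here for illustration. At a boundary point x0, the vector a = x0 itself satisfies a′x ≤ a′x0 = 1 for every x in the disc, so {x | a′x = a′x0} is a supporting hyperplane (here, the tangent line) at x0. The sketch checks the inequality on sampled points of the disc.

```python
import math

def dot(a, x):
    return a[0] * x[0] + a[1] * x[1]

def supports_unit_disc(a, x0, samples=720, tol=1e-12):
    """Check a'x <= a'x0 on sampled points of the closed unit disc in R^2."""
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        for r in (0.0, 0.5, 1.0):                      # radii inside the disc
            x = (r * math.cos(theta), r * math.sin(theta))
            if dot(a, x) > dot(a, x0) + tol:
                return False
    return True

x0 = (math.cos(math.pi / 4), math.sin(math.pi / 4))   # boundary point of the disc
print(supports_unit_disc(a=x0, x0=x0))                # True: a = x0 works
print(supports_unit_disc(a=(1.0, 0.0), x0=x0))        # False: wrong direction
```

The second call shows that not every nonzero vector a will do: the hyperplane through x0 with normal (1, 0) cuts through the disc instead of supporting it.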

Remark: There is also a partial converse to the supporting hyperplane theorem. Namely, if

a set is closed, has a non-empty interior, and has a supporting hyperplane at every point in

its boundary, then it is convex.

Figure 11: A nonempty closed set is convex if and only if it possesses a supporting hyperplane at every point of its boundary!

The supporting hyperplane theorem allows us to state a new version of the separating hy-

perplane theorem, which will actually prove most useful.

Theorem: (Separating Hyperplane Theorem V.2)

Let C and D be two convex sets in a metric space X. Further, assume C∩D = ∅. Then,

there exists a ≠ 0 in Rn and b in R such that for all x in C a′x ≤ b and for all x in D

a′x ≥ b. In other words, the affine function a′x− b is nonpositive on C and nonnegative

on D. The hyperplane {x ∈ X | a′x = b} is called a separating hyperplane for the sets

C and D.

Figure 12: Separating Hyperplane Theorem
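
A simple numerical illustration, under assumptions made here: let C and D be the closed discs of radius 1 centered at (−2, 0) and (2, 0) in R2. They are convex and disjoint, and the vertical line through the origin, i.e. a = (1, 0) and b = 0, separates them. The sketch verifies both inequalities on sampled points of each disc.

```python
import math

def dot(a, x):
    return a[0] * x[0] + a[1] * x[1]

def disc_points(center, radius, samples=360):
    """Sample the center and boundary points of a closed disc in R^2."""
    pts = [center]
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        pts.append((center[0] + radius * math.cos(theta),
                    center[1] + radius * math.sin(theta)))
    return pts

C = disc_points(center=(-2.0, 0.0), radius=1.0)
D = disc_points(center=(2.0, 0.0), radius=1.0)
a, b = (1.0, 0.0), 0.0                      # candidate hyperplane {x : a'x = b}

print(all(dot(a, x) <= b for x in C))       # a'x - b is nonpositive on C
print(all(dot(a, x) >= b for x in D))       # a'x - b is nonnegative on D
```

Checking boundary points is enough in this example because a linear function on a disc attains its extreme values on the boundary.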

Remark: Please note that the converse of this theorem is not true unless some further re-

quirements are added! That is, the existence of a separating hyperplane between two convex

sets C and D does not imply that C and D do not intersect. (Consider for instance the

degenerate case C = D = {0}.)

References

[1] Boyd, S., and Vandenberghe, L. Convex optimization. Cambridge university press,

2009.

[2] Gross, H. Res.18.006 calculus revisited: Single variable calculus (mit opencourseware:

Massachusetts institute of technology), 2010.

[3] Gross, H. Res.18-008 calculus revisited: Complex variables, differential equations, and

linear algebra. (mit opencourseware: Massachusetts institute of technology), 2011.

[4] Luenberger, D. G. Optimization by vector space methods. John Wiley & Sons, 1969.
