
UNIVERSITY OF TARTU

INSTITUTE OF MATHEMATICAL STATISTICS

Martingales

Lecture notes

Fall 2016

Kalev Parna

J. Liivi 2, 50409 Tartu,

Email: [email protected]


These lecture notes give an overview of martingales and their use in financial mathematics. They assume that the reader has passed a measure-theoretic course in probability. The course starts by introducing the concept of conditional expectation, followed by a treatment of discrete-time martingales. Then continuous-time martingales are covered, including Brownian motion. The stochastic integral is defined and Itô's formula is proved. The theory is applied to the pricing of options in the classical Black–Scholes model.

The notes are mainly based on the following books.

D. Williams. Probability with Martingales. Cambridge University Press, 1995. (Chapters 9, 10, 11, 12.)

A. Etheridge. A Course in Financial Calculus. Cambridge University Press, 2004.

D. Lamberton, B. Lapeyre. Introduction to Stochastic Calculus Applied to Finance. Chapman & Hall, 1996.


English–Estonian dictionary

probability space (Ω, F, P) - tõenäosusruum
random variable - juhuslik suurus
conditional probability - tinglik tõenäosus
expectation (mean value) - keskväärtus
variance - dispersioon
conditional expectation - tinglik keskväärtus
G-measurable - G-mõõtuv
filtration - filtratsioon
adapted to filtration - filtratsioonile kohandatud
martingale - martingaal
submartingale - submartingaal
supermartingale - supermartingaal
fair game - õiglane mäng, aus mäng
unfair game - ebaõiglane mäng
previsible process - ennustatav protsess (ettenähtav protsess)
gamble - hasartmäng
stopping time - peatumisaeg
stopped martingale - peatatud martingaal
stake on game n - panus mängus n
martingale transform - martingaalteisendus
stochastic integral - stohhastiline integraal
convergence theorem - koondumisteoreem
upcrossing - ülestõus (lõigu läbimine altpoolt üles)
forward convergence theorem - ettepoole koondumise teoreem
bounded - tõkestatud
Doob decomposition - Doobi lahutus
quadratic variation - ruutvariatsioon
uniform integrability - ühtlane integreeruvus
Brownian motion - Browni liikumine (= Wieneri protsess)
random walk - juhuslik ekslemine
central limit theorem - tsentraalne piirteoreem
stationary increments - statsionaarsed juurdekasvud
drift - triiv, trend
trajectory (path) - trajektoor
martingales with continuous time - pideva ajaga martingaalid
variation - variatsioon
semimartingale - semimartingaal
cross-variation - ristvariatsioon
change of measure - mõõdu vahetus
option - optsioon
maturity - täitmisaeg
payoff function - maksefunktsioon
option pricing - optsiooni hindamine (hinnastamine)
volatility - volatiilsus
stock price - aktsia hind
discounted price - diskonteeritud hind


1 Introduction

Martingales in simple words

A martingale is a mathematical model for a 'fair' game. What is a fair game? Consider the following example. A die is thrown and you earn 1 euro if the result is 1, 2, or 3, and you lose the same amount if the result is 4, 5, or 6. Your expected win is therefore 0 (= 1 · 1/2 − 1 · 1/2), which means that you cannot earn any additional money systematically (random gains, both positive and negative, are still possible). Denote by Yn your total win after game n, Yn = X1 + X2 + . . . + Xn, where Xi is the win in game i (each Xi = +1 or −1). The random process Yn obtained in this way is a martingale. (Such a process is also known under the name 'simple random walk'.)
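This simple random walk is easy to simulate. The following sketch (illustrative code, not part of the original notes) generates paths Yn and checks empirically that the average total win stays near 0, as the fair-game property predicts.

```python
import random

def random_walk(n_steps, rng):
    """Simple random walk: Y_n = X_1 + ... + X_n with X_i = +1 or -1,
    each with probability 1/2 (the 'fair die' game above)."""
    y, path = 0, [0]
    for _ in range(n_steps):
        y += rng.choice((-1, 1))
        path.append(y)
    return path

rng = random.Random(42)
# Average final win over many independent 100-game sessions:
finals = [random_walk(100, rng)[-1] for _ in range(20000)]
mean_win = sum(finals) / len(finals)
# mean_win is close to 0, although individual sessions fluctuate widely.
```

Individual paths wander far from 0 (their typical size is of order √n), but their average stays at 0; this is exactly the 'no systematic gain' property.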

What is typical for a martingale? As in the example above, a martingale makes steps whose average length is equal to 0. The consequence is that, given the current value Yn, the next value Yn+1 is on average the same as the current value Yn. Mathematically, this is expressed by writing E(Yn+1|Yn) = Yn. The expression on the left-hand side of the last equality is called the 'conditional expectation' of Yn+1 given Yn, one of the main concepts of our course.

How can one benefit from martingales? Suppose the process we are interested in is a martingale. Then, if we know its value today, we also know something very important about its value tomorrow: namely, tomorrow's mean value is exactly the same as today's value (which we know!). In financial markets, any information about the future is very useful. Therefore, in our course we will try to identify martingales in our market models, in order to solve important pricing problems (e.g. option pricing).

Martingales have been intensively studied since the 1940s (Doob; Itô's stochastic calculus). They have been used in financial economics since the 1970s, due to the pioneering works of F. Black and M. Scholes (the Nobel Prize was awarded to R. Merton and M. Scholes for a new method to determine the value of derivatives).


Basic concepts of probability used in this course

A random experiment (trial) is an action whose outcome is not predetermined.

The space of elementary events (sample space) Ω is the set of all possible outcomes of the random trial. Each element ω of Ω is called an elementary event (sample point).

A σ-algebra of events (event space) F is a collection of subsets of Ω satisfying:

1. ∅, Ω ∈ F

2. if A1, A2, . . . ∈ F, then also ∪_i Ai ∈ F

3. if A ∈ F, then also the complement A^c ∈ F.

Remark: From 1.–3. it follows that F is also closed w.r.t. intersections of its elements: if Ai ∈ F, then also ∩_i Ai ∈ F.

All elements of the event space F are called events.

Ex. 1. Let us throw a die. Then Ω = {1, 2, . . . , 6} and we can define F as the collection of all possible subsets of Ω, i.e. F = 2^Ω, which contains 2^6 = 64 elements.

Ex. 2. In the previous example, suppose you are betting on the result of a die throw: you win or lose 1 euro depending on whether the outcome is an odd number (1, 3, 5) or an even number (2, 4, 6). Then you can use a simpler event space, namely F = {∅, Ω, {1, 3, 5}, {2, 4, 6}}.

In the latter example, we say that the event space F is generated by the event A = {1, 3, 5}, and we write F = σ(A). This is the smallest σ-algebra which contains the subset A. σ-algebras can also be generated in a more sophisticated way (see below: σ-algebras induced by random variables).


Probability (probability measure) P is a function on F satisfying:

1. P(A) ≥ 0 for each A ∈ F

2. P(∅) = 0, P(Ω) = 1

3. if A1, A2, . . . are pairwise disjoint and each Ai ∈ F, then P(∪_i Ai) = ∑_i P(Ai).

The triple (Ω,F ,P) is called a probability space.

Borel sets and Borel σ-algebra:

Consider the special case where Ω is the real line R (imagine that we throw a random point onto the real line, or that we measure the yearly profit or loss of an enterprise). Consider all possible intervals [a, b], a < b. The collection of all such intervals is not a σ-algebra by itself (why?), and we have to add other necessary subsets in order to get requirements 1–3 fulfilled. For example, we have to add all unions of intervals, their complements, etc. Finally, we see that open intervals (a, b) must also be included, together with their unions, etc. The smallest σ-algebra which contains all the intervals above is called the Borel σ-algebra, and it is denoted by B. Each element B of B is called a Borel set. We can say that the Borel σ-algebra B is generated by the class K of all intervals, and we write B = σ(K).

Random variables

Let (Ω, F, P) be a given probability space and let X be a function X : Ω → R. For a Borel set B we can consider its inverse image X^{-1}(B) = {ω : X(ω) ∈ B}. Depending on the function X, such an inverse image may be an event (i.e. X^{-1}(B) ∈ F) or not. The function X is called measurable (w.r.t. F) if for each Borel set B ∈ B its inverse image X^{-1}(B) ∈ F. Measurable functions are also called random variables.

Example. The function X defined by

X(ω) = +1, if ω ∈ {1, 3, 5};  X(ω) = −1, if ω ∈ {2, 4, 6}

is measurable w.r.t. the σ-algebra F1 = {∅, Ω, {1, 3, 5}, {2, 4, 6}}, but it is not measurable w.r.t. F2 = {∅, Ω, {1, 2, 3}, {4, 5, 6}}.

Remark: We see that X is measurable w.r.t. F exactly when X is constant on the 'atoms' of F, i.e. its value changes only on the borders of the subsets of F.

Each random variable X generates its σ-algebra σ(X), which is the collection of all subsets of the form X^{-1}(B), where B is a Borel subset:

σ(X) = {X^{-1}(B) : B ∈ B}.

Note that always σ(X) ⊆ F (a sub-σ-algebra).

Example: Let Ω = {all students in the classroom}. Let X(ω) = 0 if ω is a male student, and X(ω) = 1 if ω is a female student. Then σ(X) consists of 4 subsets (which ones?).

The distribution function of X is defined as F(t) = P{X ≤ t} ≡ P{ω : X(ω) ≤ t}.

Expectation (or expected value)

There are two important special classes of RVs: discrete and continuous RVs.

A discrete random variable has at most countably many different values x1, x2, . . ., with respective probabilities p1, p2, . . .. The expectation of a discrete RV is calculated as

EX = ∑_i xi pi.

A continuous random variable has a density function fX(x) = F′(x); its expectation is calculated as

EX = ∫_{−∞}^{∞} x fX(x) dx.

These two formulas are special cases of the general formula of expectation,

EX = ∫_Ω X(ω) P(dω) = ∫_Ω X dP,

which is called the Lebesgue integral (here X is an arbitrary RV). An alternative notation is EX = ∫ x dF(x), the Stieltjes integral.


2 Conditional expectation

2.1 Conditional expectation with respect to a discrete random variable

Let (Ω, F, P) be a probability space and X, Z two discrete random variables with values x1, x2, . . . , xm and z1, z2, . . . , zn, respectively. From elementary probability theory the conditional probability is known to be

P(X = xi|Z = zj) = P(X = xi, Z = zj) / P(Z = zj),

and the conditional expectation of X with respect to the random event {Z = zj} is

E(X|Z = zj) = ∑_i xi P(X = xi|Z = zj).

The conditional expectation of a random variable X with respect to another random variable Z is defined as a (new) random variable Y = E(X|Z), given by the rule

if Z(ω) = zj, then Y(ω) = E(X|Z = zj) := yj.
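For discrete variables this definition is directly computable. The sketch below (illustrative code with a made-up joint distribution) builds the table zj ↦ E(X|Z = zj) from a joint probability mass function.

```python
from collections import defaultdict

def cond_expectation_table(joint):
    """Given joint[(x, z)] = P(X = x, Z = z), return the table
    {z_j: E(X | Z = z_j)} = {z_j: sum_i x_i P(X = x_i | Z = z_j)}."""
    p_z = defaultdict(float)    # marginal P(Z = z_j)
    num = defaultdict(float)    # sum_i x_i P(X = x_i, Z = z_j)
    for (x, z), p in joint.items():
        p_z[z] += p
        num[z] += x * p
    return {z: num[z] / p_z[z] for z in p_z}

# A made-up joint distribution of (X, Z):
joint = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.2}
table = cond_expectation_table(joint)   # E(X|Z=0) = 0.75, E(X|Z=1) = 1/3
```

The random variable Y = E(X|Z) then takes the constant value table[zj] on each set {Z = zj}.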

It is useful to look at this in a new way. The random variable Z generates (creates) a σ-algebra G = σ(Z) consisting of the sets of the form {Z ∈ B}, B ∈ B,¹ where B is the Borel σ-algebra. Denote Gj = {Z = zj}. Since in our case Z is discrete, each set {Z ∈ B} is a union of some of the Gj's, and G = σ(Z) consists of all possible unions of the sets Gj (the total number of such unions is 2^n, including the empty set ∅). Since the conditional expectation Y (similarly to Z) is constant on the subsets Gj, for each B ∈ B also Y^{-1}(B) ∈ G or, in other words,

Y is G-measurable.  (1)

¹ By the notation {Z ∈ B} we mean the set {ω : Z(ω) ∈ B} := Z^{-1}(B), i.e. the inverse image of a Borel subset B.


Next, since Y takes the constant value yj on the subset Gj, we have

∫_{Gj} Y dP = yj P(Z = zj) = ∑_i xi P(X = xi|Z = zj) P(Z = zj) = ∑_i xi P(X = xi, Z = zj) = ∫_{Gj} X dP.

Since each G ∈ G is a union of certain Gj's, then, by summing up the respective integrals, we get

∫_G Y dP = ∫_G X dP,  ∀G ∈ G.  (2)

Results (1) and (2) suggest the following central definition of modern probability.

2.2 Conditional expectation

Theorem 1. (Kolmogorov, 1933) Let (Ω, F, P) be a probability space and X a random variable with E(|X|) < ∞. Let G ⊆ F be a sub-σ-algebra. Then there exists a random variable Y such that

(a) Y is G-measurable,

(b) E(|Y|) < ∞,

(c) ∫_G Y dP = ∫_G X dP,  ∀G ∈ G.

If Y′ is another random variable with these properties, then Y′ = Y a.s., that is, P(Y = Y′) = 1. Each random variable Y with properties (a)–(c) is called the conditional expectation of X w.r.t. G, and we write Y = E(X|G) a.s.

Remark. The random variables with properties (a)–(c) are also called versions of the conditional expectation E(X|G). It follows that all versions are pairwise equal a.s.

Notation. If G = σ(Z), then we usually write E(X|Z) instead of E[X|σ(Z)].


Proof of Theorem 1. We apply the Radon–Nikodym theorem: if µ and ν are σ-finite measures and ν ≪ µ, then there exists a non-negative function f (density) such that ν(A) = ∫_A f dµ for each A ∈ F. If g is another such density, then µ{f ≠ g} = 0.

Suppose first that X ≥ 0. Denote ν(G) := ∫_G X dP, G ∈ G. Then ν ≪ P and by the R–N theorem there exists a (G-measurable) function Y ≥ 0 such that ∫_G Y dP = ν(G) ≡ ∫_G X dP. If X can also take negative values, we use the decomposition X = X^+ − X^−, apply R–N separately to the two parts, thus obtaining Y^+ and Y^−, and define Y = Y^+ − Y^−.

Interpretation of conditional expectation. A random experiment is performed. Suppose we do not know the outcome ω exactly but, instead, we know for each event G ∈ G whether it occurred or not (i.e. whether ω ∈ G or not). Then Y(ω) = E(X|G)(ω) is the expected value of X given such information. If G is the trivial σ-algebra {∅, Ω}, it does not give any additional information about ω (since ∅ never occurs and Ω always occurs), and then the conditional expectation reduces to the (ordinary) expected value: E(X|G)(ω) = E(X) for all ω.

Conditional expectation as the best G-measurable prediction of X. If E(X^2) < ∞, then the conditional expectation Y = E(X|G) is the orthogonal projection of X onto the space L^2(Ω, G, P). In other words, Y minimizes the expression E(X − Y)^2 over all G-measurable random variables, i.e. amongst all predictors which can be computed from the available information G.
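This projection property can be checked numerically. The sketch below (a toy example with invented values) compares the mean squared error of E(X|Z) with that of randomly chosen Z-measurable predictors; none of them does better.

```python
import random

omega = range(6)                       # six equally likely outcomes
X = [1.0, -1.0, 2.0, 0.0, -2.0, 1.0]   # an arbitrary random variable
Z = [w % 2 for w in omega]             # Z reveals only the parity of omega

def mse(pred):
    """E(X - pred)^2 under the uniform measure on omega."""
    return sum((X[w] - pred[w]) ** 2 for w in omega) / 6

# E(X|Z): the average of X over each level set {Z = z}
cond = {z: sum(X[w] for w in omega if Z[w] == z) / 3 for z in (0, 1)}
best = [cond[Z[w]] for w in omega]

# Any other Z-measurable predictor g(Z) has at least as large an error:
rng = random.Random(1)
worse = True
for _ in range(100):
    g = (rng.uniform(-3, 3), rng.uniform(-3, 3))   # an arbitrary function of Z
    other = [g[Z[w]] for w in omega]
    worse = worse and (mse(best) <= mse(other) + 1e-12)
```

The reason is elementary: on each level set of Z, the mean of X minimizes the sum of squared deviations.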


2.3 Properties of conditional expectation

Assume that E(|X|) < ∞ and that G, H are sub-σ-algebras of F, i.e. G, H ⊆ F.

(a) If Y is any version of E(X|G), then E(Y) = E(X).

(b) If X is G-measurable, then E(X|G) = X a.s.

(c) (Linearity) E(a1X1 + a2X2|G) = a1E(X1|G) + a2E(X2|G) a.s.

(d) (Positivity) If X ≥ 0, then E(X|G) ≥ 0 a.s.

(e) (cMON) If 0 ≤ Xn ↑ X, then E(Xn|G) ↑ E(X|G) a.s.

(f) (cFATOU) If Xn ≥ 0, then E(lim inf Xn|G) ≤ lim inf E(Xn|G) a.s.

(g) (cDOM) If |Xn(ω)| ≤ V(ω) ∀n, EV < ∞ and Xn → X a.s., then E(Xn|G) → E(X|G) a.s.

(h) (cJENSEN) If c : R → R is convex and E|c(X)| < ∞, then

E[c(X)|G] ≥ c(E[X|G]) a.s.

Corollary: ‖E(X|G)‖_p ≤ ‖X‖_p for p ≥ 1.

(i) (Tower property) If H is a sub-σ-algebra of G, then

E[E(X|G)|H] = E[X|H] a.s.

(j) (Take out what is known) If Z is G-measurable and bounded, then

E(ZX|G) = Z E(X|G) a.s.

This also holds if p > 1, p^{-1} + q^{-1} = 1, X ∈ L^p(Ω, F, P) and Z ∈ L^q(Ω, G, P), or if X ∈ (mF)^+, Z ∈ (mG)^+, EX < ∞ and E(ZX) < ∞.

(k) (Independence) If H is independent of σ(σ(X), G), then

E[X|σ(G, H)] = E[X|G] a.s.

In particular, if X is independent of H, then E(X|H) = E(X) a.s.


2.4 Proofs of properties of conditional expectation

The proofs are based on the definition of conditional expectation and the corresponding properties of the expected value (integral). The proofs of properties (a)–(c) are left to the reader.

Property (d). Let X ≥ 0 and Y = E(X|G). Suppose, on the contrary, that P(Y < 0) > 0. Then there exists an n such that the set G = {Y < −1/n} has positive probability, and hence²

0 ≤ E(X; G) = E(Y; G) ≤ −(1/n) P(G) < 0.

We have reached a contradiction.

Property (e). Let Yn = E(Xn|G). By positivity of conditional expectation we then have 0 ≤ Y1 ≤ Y2 ≤ . . . a.s. Define Y = lim sup Yn. Then Y is G-measurable and Yn ↑ Y a.s. We show that Y = E(X|G). For that, we apply the monotone convergence theorem to both sides of the equality

E(Yn; G) = E(Xn; G),  G ∈ G.

The result is E(Y; G) = E(X; G), ∀G ∈ G.

Property (f). Denote

Yn = inf_{m≥n} Xm,  Y = lim_{n→∞} Yn ≡ lim inf Xn.

Then Yn ↑ Y and by (e) we have E(Yn|G) ↑ E(Y|G). Therefore

E(lim inf Xn|G) = E(Y|G) = lim E(Yn|G) ≤ lim inf E(Xn|G) a.s.,

where the inequality comes from property (d), since Yn ≤ Xn.

² E(X; G) denotes the integral ∫_G X dP.


Property (g). Apply property (f) to the sequences Xn + V and V − Xn.

Property (h).

Property (i) follows immediately from the definition.

Property (j).


3 Martingales

3.1 Filtered spaces. Adapted processes

Let (Ω,F ,P) be a probability space.

Definition 1. An increasing family of sub-σ-algebras {Fn : n ≥ 0} satisfying

F0 ⊆ F1 ⊆ . . . ⊆ F

is called a filtration. Denote

F∞ := σ(∪_n Fn) ⊆ F.

Intuitive idea. The concept of filtration is used to express the growth of infor-

mation in time. At time n the information about ω consists precisely of the values

of all Fn-measurable functions Z(ω). Usually Fn is the natural filtration

Fn = σ(W0,W1, . . . ,Wn)

of some random (stochastic) process Wn, and then the information about ω which we have at time n is the current history of the process, i.e.

W0(ω), W1(ω), . . . , Wn(ω)

(from which ω cannot, as a rule, be determined uniquely).

Definition 2. We say that a random process X = (Xn : n ≥ 0) is adapted to the filtration {Fn} if, for each n, Xn is Fn-measurable.

Intuitive idea. If X is adapted, then its value Xn(ω) is known to us at time n. Usually, Fn = σ(W0, W1, . . . , Wn) and Xn = fn(W0, W1, . . . , Wn) for some measurable function fn.


3.2 Martingale, supermartingale, submartingale

Definition 3. A process X is called a martingale (relative to {Fn}) if

1. X is adapted,

2. E|Xn| < ∞,

3. E(Xn|Fn−1) = Xn−1 a.s. (n ≥ 1).

A supermartingale is defined similarly, except that 3. is replaced by

E(Xn|Fn−1) ≤ Xn−1 a.s. (n ≥ 1),

and a submartingale is defined with 3. replaced by

E(Xn|Fn−1) ≥ Xn−1 a.s. (n ≥ 1).

A supermartingale decreases on average, a submartingale increases on average in time.

Note that X is a martingale if and only if it is both a supermartingale and a submartingale. Assuming that X0 ∈ L^1(Ω, F, P), X is a martingale if and only if X − X0 is; the same is true for supermartingales and submartingales. So we can focus attention on processes with X0 = 0.

If X is, for example, a supermartingale, then the Tower Property (i) shows that for m < n,

E(Xn|Fm) = E(E(Xn|Fn−1)|Fm) ≤ E(Xn−1|Fm) ≤ . . . ≤ Xm a.s.

3.3 Examples of martingales

Example 1. Sums of independent zero-mean RVs.

Let X1, X2, . . . be a sequence of independent RVs with E|Xk| < ∞ and EXk = 0, ∀k. Define S0 = 0, F0 = {∅, Ω} and

Sn = ∑_{i=1}^n Xi,  Fn = σ(X1, X2, . . . , Xn),  n ≥ 1.

Then S = (Sn : n ≥ 0) is a martingale relative to the filtration {Fn} (show that!).

Example 2. Products of non-negative independent RVs of mean 1.

Let X1, X2, . . . be a sequence of independent non-negative RVs with EXk = 1, ∀k. Define M0 = 1, F0 = {∅, Ω} and

Mn = ∏_{i=1}^n Xi,  Fn = σ(X1, X2, . . . , Xn).

Then the process M = (Mn : n ≥ 0) is a martingale relative to {Fn} (show that!).
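Example 2 can be explored by simulation. The sketch below is illustrative; the factor distribution taking values 0.5 and 1.5 with equal probability is an arbitrary choice with mean 1.

```python
import random

def product_path(n, rng):
    """M_n = X_1 * ... * X_n with independent factors X_k in {0.5, 1.5},
    each with probability 1/2, so that E X_k = 1 (Example 2)."""
    m, path = 1.0, [1.0]
    for _ in range(n):
        m *= rng.choice((0.5, 1.5))
        path.append(m)
    return path

rng = random.Random(7)
finals = [product_path(20, rng)[-1] for _ in range(50000)]
mean_final = sum(finals) / len(finals)               # stays near E M_20 = 1
frac_small = sum(1 for f in finals if f < 0.1) / len(finals)
```

Note the striking picture: the mean stays at 1, yet a large fraction of paths end below 0.1, because E log Xk = (1/2) ln 0.75 < 0. This is consistent with Corollary 2 later in the notes: a non-negative (super)martingale converges a.s., and here the a.s. limit turns out to be 0.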

Example 3. Accumulating data about a random variable.

Let {Fn}n≥0 be our filtration and let ξ ∈ L^1(Ω, F, P) (meaning that ξ is integrable, E|ξ| < ∞). Define Mn = E(ξ|Fn) (some version of the CE). The random variable Mn is a 'coarse' (Fn-measurable) version of ξ, i.e. the best prediction of ξ given the information available to us at time n, which becomes more and more precise as n → ∞. Let us show that the process M = (Mn : n ≥ 0) is a martingale. By the Tower Property of CE, we have (a.s.)

E(Mn|Fn−1) = E(E(ξ|Fn)|Fn−1) = E(ξ|Fn−1) = Mn−1.

Hence M is a martingale. Later we will be able to show that

Mn → M∞ := E(ξ|F∞) a.s.

3.4 Fair and unfair games

Let Xn − Xn−1 be your net winnings per unit stake in game n (n ≥ 1) in a series of games played at times n = 1, 2, . . .. There is no game at time 0. A simple example is obtained by a series of coin tosses where the outcome of the toss at time k is

∆k = +1, if heads;  −1, if tails,

and Xn = ∑_{k=1}^n ∆k.

In the martingale case

(a) E(Xn − Xn−1|Fn−1) = 0 (the game series is fair),

and in the supermartingale case

(b) E(Xn − Xn−1|Fn−1) ≤ 0 (the game series is unfavorable to you).

Note that we have case (a) if the coin is symmetric, and case (b) if −1 is more probable than +1.

3.5 Predictable process, gambling strategy

Definition 4. A process C = (Cn : n ≥ 1) is called predictable³ if Cn is Fn−1-measurable (n ≥ 1).

One can think of Cn as your stake on game n. You have to decide on the value of Cn based on the history up to (and including) time n − 1. This is the intuitive meaning of the predictable character of C. Your winnings on game n are Cn · (Xn − Xn−1), and your total winnings up to time n are

Yn = ∑_{i=1}^n Ci(Xi − Xi−1) =: (C • X)n.  (3)

Note that Y0 = (C • X)0 := 0 and Yn − Yn−1 = Cn(Xn − Xn−1).

³ also called previsible


The process C •X is called the martingale transform of X by C. This is the

discrete analogue of the stochastic integral∫C dX. Stochastic-integral theory is

one of the greatest achievements of the modern theory of probability.
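A short sketch of the martingale transform (hypothetical code). As a concrete strategy C we take the classic 'double after a loss' gamble, which is predictable, since the stake for game n uses only the first n − 1 outcomes, and check that its average total winnings remain 0, in line with the 'you can't beat the system' theorem that follows.

```python
import random

def transform(C, X):
    """(C . X)_n = sum_{i=1..n} C_i (X_i - X_{i-1}), equation (3)."""
    y, out = 0.0, [0.0]
    for i in range(1, len(X)):
        y += C[i - 1] * (X[i] - X[i - 1])   # C[i-1] is the stake on game i
        out.append(y)
    return out

rng = random.Random(3)
total, n_paths = 0.0, 40000
for _ in range(n_paths):
    steps = [rng.choice((-1, 1)) for _ in range(20)]
    X = [0]
    for s in steps:
        X.append(X[-1] + s)
    # Predictable 'double after each loss' stakes, capped at 8 units:
    C = [1.0]
    for i in range(1, 20):
        C.append(min(2 * C[-1], 8.0) if steps[i - 1] < 0 else 1.0)
    total += transform(C, X)[-1]
mean_win = total / n_paths   # near 0: the strategy earns nothing on average
```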

Theorem 2. (You can't beat the system)

(i) If C is a bounded (|Cn| ≤ K < ∞, ∀n) predictable process and X is a martingale, then C • X is a martingale null at 0.

(ii) If, in addition, C is non-negative and X is a supermartingale, then C • X is a supermartingale null at 0.

Remark. Assertions (i) and (ii) remain valid when the boundedness of C is replaced by the condition that E|Cn(Xn − Xn−1)| < ∞, ∀n ≥ 1.

Proof. Let C be bounded and previsible and let X be adapted. Let Y = C • X. Then, by property (j) ('take out what is known') of conditional expectation, we have

E[Yn − Yn−1 | Fn−1] = E[Cn(Xn − Xn−1) | Fn−1] = Cn E[Xn − Xn−1 | Fn−1].  (4)

If X is a martingale, then E(Xn − Xn−1 | Fn−1) = 0, and by equality (4) Y = C • X is also a martingale. If Cn ≥ 0 and X is a supermartingale, then from (4) it follows that

E[Yn − Yn−1 | Fn−1] ≤ 0,

and thus C • X is also a supermartingale.


4 Stopping times

Definition 5. A map T : Ω → {0, 1, . . .} ∪ {∞} is called a stopping time if

{T ≤ n} ∈ Fn  ∀n ∈ {0, 1, . . . , ∞}

or, equivalently,

{T = n} ∈ Fn  ∀n ∈ {0, 1, . . . , ∞}.

Exercise. Show that the two conditions above are equivalent. Note that T can be ∞.

Intuitive idea. T is a time at which you can decide to stop playing the game. Whether or not you stop immediately after the nth game depends only on the history up to (and including) time n, i.e. {T = n} ∈ Fn.

Example. Let A be an adapted process and let B be a Borel set. Then

T = inf{n : An ∈ B}

– the time of the first entry into the set B – is a stopping time. By convention inf ∅ = ∞, so T = ∞ means that the process A never enters the set B.

Example. Let Xn be a series of die throws, and let Sn = X1 + X2 + . . . + Xn. Convince yourself that

L := sup{n : Sn ≤ 100}

is NOT a stopping time.


4.1 Stopped martingales

Let X be a random process (a (super)martingale, for example) and let T be a stopping time. For brevity we denote

T ∧ n = min{T, n}.

The process X^T := (X_{T∧n} : n ≥ 0) is called the stopped process. The stopped process remains constant after the stopping time T. We show that stopping does not destroy the martingale (or supermartingale) property of X.

Theorem 3. If X is a martingale (supermartingale) and T is a stopping time, then the stopped process X^T = (X_{T∧n} : n ≥ 0) is again a martingale (supermartingale).

Proof. The proof is based on Theorem 2. For that, we present the stopped process as a total winnings process. Suppose you always bet 1 unit and quit playing at (immediately after) time T; this means your 'stake process' is C^T, where

C^T_n(ω) = 1, if n ≤ T(ω);  0, otherwise.

Now consider the 'winnings process' C^T • X with value at time n equal to

(C^T • X)n = ∑_{i=1}^n C^T_i (Xi − Xi−1) = X_{T∧n} − X0.

We see that, up to the constant X0, the stopped process X_{T∧n} can indeed be regarded as a winnings process. Clearly C^T is bounded (by 1) and non-negative. Moreover, C^T is previsible, because for all n, {C^T_n = 0} = {T ≤ n − 1} ∈ Fn−1. Now Theorem 2 part (i) applies, (X_{T∧n} − X0 : n ≥ 0) is a martingale, and hence so is X^T. If the initial process X is a supermartingale, part (ii) of the same theorem gives the result.
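The stopping construction is mechanical enough to code directly. The sketch below (illustrative) freezes a path at a stopping time, here the first visit to level 2 truncated at the horizon, and confirms that the mean of the stopped process stays at EX0 = 0.

```python
import random

def stop(path, T):
    """The stopped path (X_{T^n})_n: frozen at the value X_T from time T on."""
    return [path[min(T, n)] for n in range(len(path))]

rng = random.Random(5)
trials, horizon, total = 20000, 12, 0.0
for _ in range(trials):
    X = [0]
    for _ in range(horizon):
        X.append(X[-1] + rng.choice((-1, 1)))
    # stopping time: first visit to level 2 (horizon if no visit occurs;
    # the truncation does not change X_{T ^ horizon})
    T = next((n for n, x in enumerate(X) if x == 2), horizon)
    total += stop(X, T)[-1]
mean_stopped = total / trials      # near 0 = E X_0, as Theorem 3 predicts
```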


4.2 Doob’s Optional Stopping Theorem

It is important to know whether the martingale property EXn = EX0 (or respec-

tive supermartingale property EXn ≤ EX0) remains true when n is replaced by a

stopping time T. The main difference between two cases is that n is constant but

T is a random variable which depends on trajectory (on ω). The expected value

EXn is calculated over all trajectories at the same time n, while in case of EXT

each trajectory is taken into account at its individual (random) time moment

T = T (ω).

Theorem 4. (Doob's Optional Stopping Theorem)

a) Let T be a stopping time and let X be a supermartingale. Then each of the following conditions ensures that XT is integrable and EXT ≤ EX0:

(i) T is bounded (T ≤ K);

(ii) X is bounded (∃K : |Xn| ≤ K, ∀n ≥ 0) and T < ∞ a.s.;

(iii) ET < ∞ and, for some K > 0, |Xn − Xn−1| ≤ K, ∀n ≥ 1;

(iv) X ≥ 0 and T < ∞ a.s.

b) If T is a stopping time, X is a martingale and at least one of the conditions (i)–(iii) is satisfied, then EXT = EX0.

Proof.

Part a). Under condition (i) the proof is simplest. Consider the stopped process X^T = (X_{T∧n} : n ≥ 0). As the trajectories of X^T remain constant after the stopping time T, and as T ≤ K, we have XT = X^T_T = X^T_K, from which EXT = EX^T_K. At the same time, by Theorem 3, X^T is a supermartingale, and therefore EX^T_K ≤ EX^T_0 ≡ EX0. Hence also EXT ≤ EX0, as needed.

Under each of the conditions (ii)–(iv) we have T < ∞ a.s., which ensures that the stopped process X^T converges:

X^T_n = X_{T∧n} → XT a.s.,  (5)


as n → ∞. To proceed, in case (ii) use Lebesgue's dominated convergence theorem (DOM), which gives

lim_n EX^T_n = EXT.  (6)

Again, as X^T is a supermartingale, for each n we have EX^T_n ≤ EX0. Then, due to (6), we also have EXT ≤ EX0, as needed.

In case (iii) we first use

|X_{T∧n} − X0| = |∑_{k=1}^{T∧n} (Xk − Xk−1)| ≤ KT,

showing that X^T has an integrable bound. Then the Lebesgue theorem applies again, and the rest is the same as in case (ii).

In case (iv) apply Fatou's Lemma, giving

E(lim inf_n X^T_n) ≤ lim inf_n (EX^T_n).

Due to (5), the LHS is equal to EXT, and the RHS is at most EX0 (since X^T is a supermartingale). Therefore we have EXT ≤ EX0 again.

Part b) follows from a) (applied twice), since if X is a martingale then both X and −X are supermartingales.
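A classical application of part b) (the gambler's-ruin setup is standard; the code is an illustrative sketch): start a simple random walk at 0 and stop at T, the first visit to −a or b. The increments are bounded and ET < ∞ (Lemma 1 below applies), so case (iii) gives EXT = 0, which forces P(XT = b) = a/(a + b).

```python
import random

def run_until_exit(a, b, rng):
    """Simple random walk from 0, stopped at T = first visit to -a or b."""
    x = 0
    while -a < x < b:
        x += rng.choice((-1, 1))
    return x                        # this is X_T

rng = random.Random(11)
a, b, trials = 3, 5, 20000
results = [run_until_exit(a, b, rng) for _ in range(trials)]
mean_XT = sum(results) / trials                     # near 0 = E X_0
p_b = sum(1 for r in results if r == b) / trials    # near a/(a+b) = 3/8
```

Some condition really is needed: for the one-sided time T = first visit to 1 alone, T < ∞ a.s. but ET = ∞, and EXT = 1 ≠ 0.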

Corollary 1. Let M be a martingale such that for some K1 > 0 we have |Mn − Mn−1| ≤ K1, ∀n ≥ 1. Let C be a predictable process such that |Cn| ≤ K2 for some K2 > 0. If T is a stopping time such that ET < ∞, then

E(C • M)T = 0.

Comment. The corollary says that if you play with bounded stakes and the win in a single game is also bounded, you quit the game with zero average winnings.

Proof. By Theorem 2 the process

(C • M)n = ∑_{i=1}^n Ci(Mi − Mi−1)

is a martingale. By Theorem 3, the stopped process (C • M)^T_n is also a martingale, and its increments are bounded:

|Ci(Mi − Mi−1)| ≤ |Ci| |Mi − Mi−1| ≤ K2 · K1.

Now Theorem 4, part b), case (iii), gives E(C • M)T = E(C • M)0 = 0.

In order to use the Optional Stopping Theorem one has to show that T < ∞ a.s. and sometimes that ET < ∞. The following lemma can be used for this purpose.

Lemma 1. Let T be a stopping time such that for some N ∈ N and ε > 0 the following inequality holds:

P(T ≤ n + N | Fn) ≡ E(I_{T≤n+N} | Fn) > ε a.s., ∀n ≥ 1.

Then ET < ∞.

Sketch of the proof. First we show, by induction, that P(T > kN) ≤ (1 − ε)^k. Indeed, assume that it holds for k − 1, i.e. P(T > (k − 1)N) ≤ (1 − ε)^{k−1}. Then

P(T > kN) = P(T > kN | T > (k − 1)N) · P(T > (k − 1)N) ≤ (1 − ε)(1 − ε)^{k−1} = (1 − ε)^k,

since the assumption of the lemma gives P(T > kN | T > (k − 1)N) ≤ 1 − ε. Now, using the fact that the probabilities P(T > k) decrease as k grows, and grouping them by N, the expected value of T can be estimated as

ET = ∑_{k=0}^∞ P(T > k) ≤ ∑_{k=0}^∞ N P(T > kN) ≤ N ∑_{k=0}^∞ (1 − ε)^k = N/ε < ∞.


5 The Convergence Theorem

Many applications of martingales are based on their convergence properties. Let X be a random process and let a < b be real numbers. We denote by UN[a, b](ω) the largest k such that there exist 0 ≤ s0 < t0 < s1 < t1 < . . . < sk < tk ≤ N for which

X_{si} < a,  X_{ti} > b,  ∀i ∈ {0, 1, . . . , k}.

The number UN[a, b](ω) is called the number of upcrossings of [a, b] made by the trajectory Xn(ω) by time N. The function UN[a, b] is FN-measurable (why?).

Lemma 2. (Doob's Upcrossing Lemma) Let X be a supermartingale. Then

(b − a) E UN[a, b] ≤ E[(XN − a)^−].

Proof. The idea is to use Theorem 2 ('You can't beat the system!'). For that, we define 0–1 stakes by C1 = I_{X0<a} and

Ci = I_{{Ci−1=0} ∩ {Xi−1<a}} + I_{{Ci−1=1} ∩ {Xi−1≤b}},  i ≥ 2.

The process C is previsible, bounded and non-negative, so by Theorem 2 the process Y = C • X with YN = ∑_{i=1}^N Ci(Xi − Xi−1) is also a supermartingale. Therefore we have E(YN) ≤ E(Y0) = 0.


[Figure: two aligned sample paths.]

Figure 1: Above is the initial process X (with the levels a, b and the final deficit (XN − a)^− marked), below is Y = C • X. White circles correspond to stakes Ci = 0, black circles correspond to Ci = 1.

However, YN satisfies

YN ≥ (b − a) UN[a, b] − (XN − a)^−

(see Figure 1: each completed upcrossing contributes at least b − a to YN, and the last, possibly incomplete, upcrossing costs at most (XN − a)^−). Taking expectations gives

(b − a) E(UN[a, b]) − E[(XN − a)^−] ≤ E(YN) ≤ E(Y0) = 0.

Thus we have

(b − a) E(UN[a, b]) ≤ E[(XN − a)^−].
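The upcrossing count is easy to compute pathwise, and the lemma can then be checked by simulation. The sketch below is illustrative, with an arbitrary choice of a, b and a symmetric random walk (a martingale, hence in particular a supermartingale).

```python
import random

def upcrossings(path, a, b):
    """Number of completed upcrossings of [a, b]: climbs from below a to above b."""
    count, below = 0, False
    for x in path:
        if x < a:
            below = True
        elif x > b and below:
            count += 1
            below = False
    return count

rng = random.Random(2)
a, b, N, trials = -2, 2, 50, 10000
lhs = rhs = 0.0
for _ in range(trials):
    X = [0]
    for _ in range(N):
        X.append(X[-1] + rng.choice((-1, 1)))
    lhs += (b - a) * upcrossings(X, a, b)
    rhs += max(a - X[-1], 0.0)          # (X_N - a)^- = max(a - X_N, 0)
# Empirically (b - a) * (mean upcrossings) <= mean of (X_N - a)^-, as Lemma 2 states.
```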


Define U∞[a, b] = lim_{N→∞} UN[a, b].

We say that a process X is bounded in L^1 if there exists a constant K such that E|Xn| ≤ K for each n = 1, 2, . . ., or, equivalently, K := sup_n E(|Xn|) < ∞.

Lemma 3. Let X be a supermartingale bounded in L^1 (i.e. sup_n E(|Xn|) < ∞). Then for any a, b ∈ R, a < b, we have

P(U∞[a, b] = ∞) = 0.

Proof. By Lemma 2 we have

(b − a) E UN[a, b] ≤ E[(XN − a)^−] ≤ |a| + E(|XN|) ≤ |a| + K < ∞,  ∀N ≥ 1.

Letting N → ∞ and using the Monotone Convergence Theorem (MON), we get E U∞[a, b] < ∞. But then it must be that P(U∞[a, b] = ∞) = 0.

Theorem 5. (Doob's Forward Convergence Theorem) Let X be a supermartingale bounded in L^1 (i.e. K = sup_n E|Xn| < ∞). Then, almost surely, the limit X∞ := lim_{n→∞} Xn exists and is finite.

Proof. Note that

A = {ω : Xn(ω) does not converge to a limit in [−∞, +∞]}
= {ω : lim inf_n Xn(ω) < lim sup_n Xn(ω)}
= ∪_{a,b∈Q: a<b} {ω : lim inf_n Xn(ω) < a < b < lim sup_n Xn(ω)}
⊆ ∪_{a,b∈Q: a<b} {ω : U∞[a, b](ω) = ∞}.

Since A is thus contained in a countable union of subsets {ω : U∞[a, b](ω) = ∞} of zero probability (the previous lemma applies!), we see that P(A) = 0, whence

lim_n Xn exists a.s. in [−∞, +∞].

We now show that −∞ and +∞ are in fact excluded, i.e. lim_n Xn can only take finite values a.s. Indeed, by Fatou's Lemma we have

E(|X∞|) = E(lim inf_n |Xn|) ≤ lim inf_n E(|Xn|) ≤ K < ∞,

so that P(|X∞| < ∞) = 1.

Corollary 2. If X is a non-negative supermartingale, then, almost surely, the limit X∞ := lim_{n→∞} Xn exists and is finite.

Proof. A non-negative supermartingale X is bounded in L^1, since E|Xn| = EXn ≤ EX0 for each n. Now Theorem 5 applies.


6 Martingales bounded in L^2

One of the easiest ways of proving that a martingale M is bounded in L^1 (needed in Doob's Convergence Theorem!) is to prove that it is bounded in L^2, in the sense that sup_n E(M_n^2) < ∞. (This is based on the simple fact that |Mn| ≤ M_n^2 + 1, which shows that sup_n E(M_n^2) < ∞ implies sup_n E|Mn| < ∞.)

Let M = (Mn : n ≥ 0) be a L2-martingale. Then for any time moments

0 ≤ s ≤ t ≤ u ≤ v the increments Mt − Ms and Mv − Mu are uncorrelated

(orthogonal in L2):

E[(Mt −Ms)(Mv −Mu)] = E[E((Mt −Ms)(Mv −Mu) | Fu)]

= E[(Mt −Ms)E(Mv −Mu) | Fu)] = 0,

since E(Mv −Mu | Fu) = 0 in the martingale case. Now the simple formula

Mn = M0 + ∑_{i=1}^n (Mi − Mi−1)    (7)

expresses Mn as the sum of uncorrelated terms, and therefore we obtain

E(Mn²) = E(M0²) + ∑_{i=1}^n E(Mi − Mi−1)².    (8)

Note that all terms on the right-hand side are non-negative, so that the second
moment (and also the variance) of Mn grows monotonically with n.
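Formula (8) can be checked by simulation. The sketch below (plain Python; the function name and parameters are ours, not from the notes) estimates E(Mn²) for the simple symmetric random walk, for which each E(Mi − Mi−1)² = 1 and hence E(Mn²) = n:

```python
import random

random.seed(0)

def simulate_second_moment(n_steps, n_paths=20000):
    """Monte Carlo estimate of E(M_n^2) for a simple symmetric random walk
    M (a martingale with M_0 = 0 and independent +/-1 steps)."""
    total = 0.0
    for _ in range(n_paths):
        m = sum(random.choice((-1, 1)) for _ in range(n_steps))
        total += m * m
    return total / n_paths

# By (8): E(M_n^2) = sum of E(M_i - M_{i-1})^2 = n for the SRW.
for n in (5, 10, 20):
    est = simulate_second_moment(n)
    assert abs(est - n) / n < 0.1, (n, est)
```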

Theorem 6. Let M be an L2-martingale. Then

(a) M is bounded in L2 if and only if

∑_{k=1}^∞ E(Mk − Mk−1)² < ∞.    (9)

(b) If M is bounded in L2, then Mn → M∞ a.s. and in L2.

Proof. (a) By (8), the L2-boundedness of M is equivalent to (9).

(b) Suppose now that (9) holds. Then M is bounded in L2, hence also in L1. By
Doob's Convergence Theorem we have that Mn → M∞ := lim inf Mn a.s. We now
show the convergence in L2, meaning that limn E(M∞ − Mn)² = 0. By the
orthogonality of increments we have

E(Mn+r − Mn)² = ∑_{k=n+1}^{n+r} E(Mk − Mk−1)².

Letting r → ∞ and applying Fatou's Lemma, we obtain

E(M∞ − Mn)² = E(lim infr (Mn+r − Mn)²) ≤ lim infr E(Mn+r − Mn)²
= ∑_{k=n+1}^∞ E(Mk − Mk−1)²,

which tends to 0 as n → ∞ (the tail of a convergent series!). Hence

limn E(M∞ − Mn)² = 0,

so that Mn → M∞ in L2.


7 Doob Decomposition. Quadratic Variation

Theorem 7. (Doob decomposition)

(a) Let X be an adapted process with Xn ∈ L1, ∀n. Then X can be represented
in the form

X = X0 + M + A,

where M is a martingale null at 0 and A is a previsible process null at 0. More-
over, this decomposition is unique up to a.s. equivalence in the sense that
if X = X0 + M′ + A′ is another such decomposition, then
P(Mn = M′n, An = A′n, ∀n) = 1.

(b) The process X is a submartingale if and only if the process A is increasing
a.s.

Proof. (a) Define

A0 = 0, An = ∑_{i=1}^n E(Xi − Xi−1 | Fi−1), n ≥ 1,

and

Mn = Xn − X0 − An.

We show that A and M satisfy the conditions stated in the theorem. Obviously, A
is previsible (An is Fn−1-measurable). Check that M is a martingale:

E(Mn − Mn−1 | Fn−1) = E(Xn − Xn−1 − (An − An−1) | Fn−1)    (10)
= E(Xn − Xn−1 | Fn−1) − E(Xn − Xn−1 | Fn−1)    (11)
= 0.    (12)

Thus M is a martingale. (Uniqueness follows by noting that in any such
decomposition the increments An − An−1 must equal E(Xn − Xn−1 | Fn−1) a.s.)

(b) If X is a submartingale, then E(Xi − Xi−1 | Fi−1) ≥ 0, ∀i, and we have

An = ∑_{i=1}^n E(Xi − Xi−1 | Fi−1) ≥ An−1,    (13)

so that A is increasing.

Example. Consider the asymmetric random walk Sn = ∑_{i=1}^n Xi with
P(Xi = +1) = 0.6 and P(Xi = −1) = 0.4. Assume that S0 = 0. The process S is a
submartingale. It can be decomposed as Sn = S0 + Mn + An, where An = 0.2·n and
Mn is a martingale given by Mn = Sn − An = ∑_{i=1}^n Xi − 0.2·n
= ∑_{i=1}^n (Xi − 0.2) = ∑_{i=1}^n X′i, where the new 'steps' X′i = Xi − 0.2
take values 0.8 and −1.2 with probabilities 0.6 and 0.4, as before. (Check that
the new steps have zero mean!)
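The decomposition in this example is easy to verify numerically. A minimal Monte Carlo sketch (function and variable names ours) checks that the martingale part Mn = Sn − 0.2n has mean approximately 0:

```python
import random

random.seed(1)

def decompose_path(n_steps, p_up=0.6):
    """One path of the asymmetric walk S_n together with its Doob
    decomposition S_n = M_n + A_n, where A_n = 0.2*n is previsible."""
    s, path = 0, []
    for k in range(1, n_steps + 1):
        s += 1 if random.random() < p_up else -1
        a = 0.2 * k            # compensator: summed E(X_i | F_{i-1}) = 0.2 per step
        path.append((s, s - a, a))
    return path

# The martingale part M_n = S_n - 0.2 n should have mean ~ 0.
n, n_paths = 50, 20000
mean_m = sum(decompose_path(n)[-1][1] for _ in range(n_paths)) / n_paths
assert abs(mean_m) < 0.2
```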

Let now M be an L2-martingale (i.e. E(Mn²) < ∞, ∀n) starting from zero.
Then, by Jensen's inequality (see property (h) of CE), we have
E(Mn² | Fn−1) ≥ [E(Mn | Fn−1)]² = Mn−1², and hence

M² is a submartingale.

Thus, by Theorem 7, M² has the Doob decomposition

M² = N + A,    (14)

where N is a martingale and A is a previsible increasing process, both N and A
being null at 0.

The process A is often written 〈M〉 and is called the quadratic variation of M.

Example. Consider a simple symmetric random walk Sn = X1 + . . . + Xn,
where the Xi are independent random variables taking values +1 or −1 with equal
probabilities 1/2. We know that the processes Sn and Sn² − n are both martingales.
Hence the Doob decomposition (14) of the submartingale Sn² is given by

Sn² = (Sn² − n) + n,

with martingale part Nn = Sn² − n and previsible increasing part (i.e. quadratic
variation) 〈S〉n = n.

Define the limit

〈M〉∞ = limn→∞ 〈M〉n.

The following two observations are useful:

(i) M is bounded in L2 if and only if E〈M〉∞ < ∞.

Indeed, since E(Mn²) = E(〈M〉n) is increasing in n, we have
supn E(Mn²) = E〈M〉∞.

(ii) 〈M〉n − 〈M〉n−1 = E(Mn² − Mn−1² | Fn−1) = E[(Mn − Mn−1)² | Fn−1].

Indeed, since E(Mn−1Mn | Fn−1) = Mn−1 E(Mn | Fn−1) = Mn−1², we can write

E(Mn² − Mn−1² | Fn−1) = E(Mn² − 2Mn−1Mn + 2Mn−1Mn − Mn−1² | Fn−1)
= E(Mn² − 2Mn−1Mn + Mn−1² | Fn−1)
= E[(Mn − Mn−1)² | Fn−1]

– the conditional variance of the increment of M.
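Combining the two examples above, observation (ii) can be illustrated numerically: for the martingale Mn = Sn − 0.2n of the asymmetric walk, each conditional increment variance is Var(X′i) = 0.6·0.8² + 0.4·1.2² = 0.96, so 〈M〉n = 0.96n and, by the Doob decomposition of M², E(Mn²) = 0.96n. A Monte Carlo sketch (setup ours, not from the notes):

```python
import random

random.seed(2)

# Martingale from the asymmetric-walk example: M_n = S_n - 0.2 n,
# with steps X'_i = X_i - 0.2 of variance 0.6*0.8^2 + 0.4*1.2^2 = 0.96,
# so <M>_n = 0.96 n and E(M_n^2) = E<M>_n = 0.96 n.

def sample_m(n):
    s = sum(1 if random.random() < 0.6 else -1 for _ in range(n))
    return s - 0.2 * n

n, n_paths = 40, 40000
second_moment = sum(sample_m(n) ** 2 for _ in range(n_paths)) / n_paths
assert abs(second_moment - 0.96 * n) / (0.96 * n) < 0.05
```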

8 Brownian motion

8.1 Transition to continuous time. Brownian motion

So far, we have studied martingales with discrete time, t = 0, 1, 2, . . .. The
simplest example was the Simple Random Walk (SRW). In finance, however, it is
also useful to consider processes with continuous time, where the time parameter
t ∈ R+ = [0, ∞). For example, stock prices can change at any time instant within
a business day. Among random processes with continuous time one of the most
important is the so-called Brownian motion. Brownian motion is a simple continuous
stochastic process that is widely used in physics and finance for modeling random
behavior that evolves over time. Examples of such behavior are the random
movements of a molecule of gas or fluctuations in an asset's price.

Brownian motion gets its name from the botanist Robert Brown, who observed
in 1827 that tiny particles of pollen suspended in water moved erratically on a
microscopic scale, but was not able to determine the mechanism that caused this
motion. The physicist Albert Einstein published a paper in 1905 explaining that
the motion was caused by water molecules randomly bombarding the particle of
pollen, thus helping to firmly establish the atomic theory of matter. Later on,
starting in 1918, the American mathematician Norbert Wiener created a precise
mathematical model for this phenomenon. This is what we will study now.

In order to define Brownian motion, we start from the Simple Random Walk
(SRW) and let its step sizes (both the time step and the space step) go to 0.
Denote

Xt = ∆x · (X1 + X2 + . . . + X_{[t/∆t]}),

where

Xi = +1 with probability 1/2, and Xi = −1 with probability 1/2,

∆x is the space step size, ∆t is the time step size, and [t/∆t] is the number of
time steps in the time interval [0, t].

Since X1, X2, . . . , X_{[t/∆t]} are independent identically distributed (IID)
random variables with mean EXi = 0 and variance DXi = 1, for each ∆x and ∆t we
have EXt = 0 and DXt = (∆x)²[t/∆t]. When ∆t → 0 and ∆x → 0, the number of
summands [t/∆t] increases unboundedly and, according to the Central Limit
Theorem,

Xt / (∆x √[t/∆t]) → N(0, 1) in distribution.

By choosing the relationship between ∆x and ∆t such that (∆x)²/∆t = const =: C²,
we get, in the limit, that Xt ∼ N(0, C√t). The limiting process Xt preserves
some important features of the SRW:

(i) The increments of Xt are independent, i.e. for 0 ≤ s ≤ t ≤ u ≤ v the
increments Xt − Xs and Xv − Xu are independent r.v. (the same is valid for
any n time intervals).

(ii) The increments of Xt are stationary, i.e. the distribution of Xs+t − Xs
depends only on t (and not on s).

These properties suggest the following definition.

Definition 6 (Brownian motion). The random process {Wt, t ≥ 0} is called a
Brownian motion (Wiener process) if

(i) W(0) = 0,

(ii) for all t > 0 the r.v. Wt ∼ N(0, C√t), where C > 0 is a constant,

(iii) the increments of Wt are independent and stationary,

(iv) the paths of Wt are a.s. continuous (in t).

From the definition of BM it follows that the increments of BM are also normally
distributed: by the stationarity of increments

Wt − Ws =(D) Wt−s − W0 = Wt−s ∼ N(0, C√(t − s)),

where =(D) is to be read as "has the same distribution as".

Note that, in fact, property (iv) can be deduced from properties (i)–(iii).
BM is a mathematical model widely used in physics (diffusions), economics (price
models), etc.

If C = 1, the BM is called Standard Brownian Motion (SBM).

Figure 2: A trajectory of standard Brownian motion

The process

Wt + µt, where µ is a real number, is called Brownian motion with drift (µ is
called the drift coefficient). The mean value of BM with positive (resp. negative)
drift increases (resp. decreases) in time.
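The scaling construction above translates directly into a simulation. A minimal sketch (function names ours) builds an approximate Brownian path with ∆x = √∆t, so that (∆x)²/∆t = 1 (C = 1, SBM), and checks that W1 is approximately N(0, 1):

```python
import random

random.seed(3)

def brownian_path(t_max=1.0, dt=1e-3, mu=0.0):
    """Approximate a Brownian path (with optional drift mu) by a scaled
    random walk with space step dx = sqrt(dt)."""
    dx = dt ** 0.5
    w, path = 0.0, [0.0]
    for _ in range(int(t_max / dt)):
        w += dx * random.choice((-1, 1)) + mu * dt
        path.append(w)
    return path

# W_1 should be roughly N(0, 1): check mean and variance over many paths.
finals = [brownian_path()[-1] for _ in range(2000)]
mean = sum(finals) / len(finals)
var = sum((x - mean) ** 2 for x in finals) / len(finals)
assert abs(mean) < 0.1 and abs(var - 1.0) < 0.15
```

Setting `mu` nonzero gives a path of Brownian motion with drift, whose sample mean grows (or decays) linearly in time.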


8.2 Some properties of Brownian motion

1) Finite-dimensional distributions of SBM

The joint distribution of (Wt1, Wt2, . . . , Wtn), where 0 < t1 < t2 < . . . < tn,
can easily be calculated. For each ti the density of Wti is

fWti(x) = (1/√(2πti)) e^{−x²/(2ti)},

provided that C = 1. Since the equalities

Wt1 = x1, Wt2 = x2, . . . , Wtn = xn

are equivalent to the equalities

Wt1 = x1, Wt2 − Wt1 = x2 − x1, . . . , Wtn − Wtn−1 = xn − xn−1,

and since the increments Wt1, Wt2 − Wt1, . . . , Wtn − Wtn−1 are independent, we
have

fWt1,...,Wtn(x1, . . . , xn)
= fWt1,Wt2−Wt1,...,Wtn−Wtn−1(x1, x2 − x1, . . . , xn − xn−1)
= fWt1(x1) · fWt2−Wt1(x2 − x1) · . . . · fWtn−Wtn−1(xn − xn−1)
= (1/√(2πt1)) e^{−x1²/(2t1)} · (1/√(2π(t2−t1))) e^{−(x2−x1)²/(2(t2−t1))} · . . .
  · (1/√(2π(tn−tn−1))) e^{−(xn−xn−1)²/(2(tn−tn−1))}.

The formula obtained can be used for many purposes.

2) Conditional distribution

Let us use the formula above to solve one particular problem. Suppose we know
that at time t BM has taken the value Wt = B. Let s be an earlier time, s < t.
What is the conditional distribution of Ws given the event Wt = B? Since the
conditional density is the ratio of the joint density and the density of the
condition, we can calculate

fWs|Wt(x|B) = fWs,Wt(x, B) / fWt(B)
= [ (1/√(2πs)) e^{−x²/(2s)} · (1/√(2π(t−s))) e^{−(B−x)²/(2(t−s))} ]
  / [ (1/√(2πt)) e^{−B²/(2t)} ]
= . . . = (1/√(2π (s/t)(t−s))) e^{−t(x − B s/t)²/(2s(t−s))}.

Hence the conditional distribution of Ws is a normal distribution with mean B·s/t
and variance s(t−s)/t.
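Since (Ws, Wt) is jointly normal, the conditional mean B·s/t and variance s(t−s)/t can also be recovered numerically: the linear-regression slope Cov(Ws, Wt)/DWt equals s/t, and the residual variance equals s(t−s)/t. A Monte Carlo sketch (parameters ours):

```python
import random

random.seed(4)

s, t, n = 1.0, 4.0, 40000
pairs = []
for _ in range(n):
    ws = random.gauss(0.0, s ** 0.5)
    wt = ws + random.gauss(0.0, (t - s) ** 0.5)   # independent increment
    pairs.append((ws, wt))

# Regression of W_s on W_t recovers the conditional mean B*s/t ...
cov = sum(a * b for a, b in pairs) / n
var_t = sum(b * b for _, b in pairs) / n
slope = cov / var_t
assert abs(slope - s / t) < 0.02          # s/t = 0.25

# ... and the residual variance equals s(t-s)/t.
resid_var = sum((a - slope * b) ** 2 for a, b in pairs) / n
assert abs(resid_var - s * (t - s) / t) < 0.05   # s(t-s)/t = 0.75
```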

3) First passage time

Let W0 = 0 and a > 0. We are interested in the time which elapses before BM
attains the level a. We call it the first passage time to the point a and denote
Ta = inf{t : Wt = a}. Ta is a random variable, since its value depends on the
path of BM (on ω). Let us find its distribution function FTa(t) = P{Ta ≤ t}.

Using the formula of total probability, we have

P{Wt ≥ a} = P{Wt ≥ a | Ta ≤ t} · P{Ta ≤ t} + P{Wt ≥ a | Ta > t} · P{Ta > t}.

By the symmetry of the normal distribution P{Wt ≥ a | Ta ≤ t} = 1/2. At the same
time, obviously P{Wt ≥ a | Ta > t} = 0. Therefore P{Wt ≥ a} = (1/2) P{Ta ≤ t},
and

P{Ta ≤ t} = 2 P{Wt ≥ a}.

Since Wt ∼ N(0, √t), we have

FTa(t) = P{Ta ≤ t} = 2 · (1/√(2πt)) ∫_a^∞ e^{−x²/(2t)} dx = 2[1 − Φ(a/√t)].

If a < 0, then by symmetry P{Ta ≤ t} = 2[1 − Φ(|a|/√t)]. If a = 0, then T0 = 0.
Taking all together, we have that, for any a,

P{Ta ≤ t} = 2[1 − Φ(|a|/√t)].    (15)

By differentiating the distribution function above, one gets the density function
of Ta. For a > 0 it is

fTa(t) = F′Ta(t) = −2φ(a/√t) · (−a/2) · t^{−3/2} = (a/√(2πt³)) e^{−a²/(2t)}.

This distribution is called the inverse Gaussian distribution (also the Wald
distribution). From (15) it is also seen that, as t → ∞,

P(Ta < ∞) = lim_{t→∞} P(Ta ≤ t) = 1.    (16)

4) Maxima of Brownian motion

If a > 0, then P{max_{0≤s≤t} Ws ≥ a} = P{Ta ≤ t} = 2[1 − Φ(a/√t)].

If a < 0, then P{max_{0≤s≤t} Ws ≥ a} = 1.

5) Brownian motion between two boundaries

Let A > 0, B > 0. Let us find the probability that, starting from 0, BM reaches
the level A before −B. Recall that in the case of SRW the answer to the same
question is B/(A+B). As the same answer remains true for any time and space step
sizes, we have

P{Wt reaches A before −B} = P{TA < T−B} = B/(A+B).

Exercise 1. Let Wt be a standard Brownian motion. Assume that W1 = 2. Find
the probability that W5 < 0.

Without the given condition we would have P{W5 < 0} = 1/2, since
Wt ∼ N(0, √t). However, under the condition W1 = 2 the increment
W5 − W1 ∼ N(0, √4) – this distribution depends only on the length of the time
interval and not on its location. Therefore we have

P{W5 < 0 | W1 = 2} = P{W5 − W1 < −2 | W1 = 2}
= P{W5 − W1 < −2} = P{W4 < −2}
= P{N(0, √4) < −2} = Φ(−2/2) = Φ(−1) ≈ 0.16.


9 Martingales with continuous time

Many important concepts and results that are known for martingales with discrete
time can be transferred to continuous time without any major changes.

Let (Ω, F, P) be a probability space and let t be a real-valued parameter inter-
preted as time. Most often, t takes values from the half-line R+ or a finite
interval [0, T].

Definition 7. A family of σ-algebras {Ft, t ≥ 0} is called a filtration if
1) all its members Ft are sub-σ-algebras of F, and 2) for s < t one has Fs ⊆ Ft.

As in the case of discrete time, we are mainly interested in the natural
filtration {FXt, t ≥ 0} generated by a random process X. As before, FXt contains
the information induced by the random process X within the time interval [0, t].
This means that an event A ∈ FXt if and only if one can decide whether A occurred
or not on the basis of the trajectory {Xs, 0 ≤ s ≤ t} that the process X
generates by time t.

Definition 8. If {Yt, t ≥ 0} is a random process such that for each t the random
variable Yt is Ft-measurable, then the process Y is said to be adapted to the
filtration {Ft, t ≥ 0}.

Examples:

1. The random process Zt = ∫₀ᵗ Xs ds is adapted to the filtration {FXt, t ≥ 0},
since knowing the path of X within the time interval [0, t] is sufficient to
determine Zt.

2. The process Mt = max_{0≤s≤t} Ws is adapted to the filtration {FWt, t ≥ 0}.

3. The process Zt = Wt+1² − Wt² is not adapted to the filtration {FWt, t ≥ 0}.

Similarly to discrete time, we define martingales with continuous time.

Definition 9. Let (Ω, F, P) be a probability space endowed with a filtration
{Ft, t ≥ 0}. A random process {Mt, t ≥ 0} is called a martingale if

1. M is adapted to the filtration {Ft, t ≥ 0},

2. E|Mt| < ∞, ∀t,

3. for any s ≤ t we have E(Mt | Fs) = Ms a.s.

If the equality in 3. is replaced by the inequality ≤ (or ≥), then we speak about
a supermartingale (respectively, a submartingale).

Remark: Similarly, a martingale can be defined on a finite time interval [0, T].

In the following, mostly the filtration {FWt, t ≥ 0} generated by a standard
Brownian motion W is used.

Very important continuous time martingales are related to Brownian motion.

Lemma 4. Let Wt be a standard Brownian motion and let {Ft, t ≥ 0} be the
filtration induced by W. Then

1. Wt is a martingale,

2. Wt² − t is a martingale,

3. exp(σWt − (σ²/2)t) is a martingale (called an exponential martingale).

Proof: The proofs resemble each other. Consider, for example, the process
Mt = Wt² − t. Obviously, E|Mt| < ∞. Now find

E(Wt² − Ws² | Fs) = E[(Wt − Ws)² + 2Ws(Wt − Ws) | Fs]
= E[(Wt − Ws)² | Fs] + 2Ws E[(Wt − Ws) | Fs]
= t − s,

so that

E(Wt² − t | Fs) = E(Wt² − Ws² + Ws² − (t − s) − s | Fs)
= (t − s) + Ws² − (t − s) − s = Ws² − s,

and thus the process Wt² − t is a martingale.

Definition 10. A random variable T is called a stopping time with respect to the
filtration {Ft, t ≥ 0} if for each t ≥ 0 the event {T ≤ t} ∈ Ft.

Let us introduce the more general concept of a local martingale.

Definition 11. A process {Xt, t ≥ 0} is a local martingale if there exists a
sequence of stopping times Tn, n = 1, 2, . . ., such that the stopped process
{Xt∧Tn, t ≥ 0} is a martingale for each n and

P[ lim_{n→∞} Tn = ∞ ] = 1.

Every martingale is a local martingale (how to choose Tn?); however, a local
martingale need not be a martingale. This is the reason why we will assume
some boundedness conditions to be satisfied in the following.

It is very useful to have a 'continuous' version of Doob's theorems. However,
for that the trajectories of the martingale must be 'nice' enough. In all
our examples below the trajectories are right continuous and have left limits
or, in short, are cadlag (in French: continue à droite, limites à gauche).
For example, continuous functions (like the trajectories of Brownian motion) are
automatically cadlag.

Theorem 8. (Stopping theorem in continuous time)

If {Mt, t ≥ 0} is a cadlag martingale w.r.t. the filtration {Ft, t ≥ 0} and
τ1, τ2 are bounded stopping times, τ1 ≤ τ2 ≤ K, then

E|Mτ2| < ∞

and

E(Mτ2 | Fτ1) = Mτ1, P-a.s.    (17)

Note: Let τ be a bounded stopping time. Then, by choosing τ1 = 0, τ2 = τ, and
taking expectations on both sides of (17), we have EMτ = EM0.

We next consider an interesting application of the stopping theorem.

Lemma 5. Let Wt be a standard Brownian motion and let Ta be the first passage
time of the level a, Ta = inf{t ≥ 0 : Wt = a}. Then for any θ > 0

E[e^{−θTa}] = e^{−√(2θ)|a|}.    (18)

Proof. Assume that a ≥ 0 (the case a < 0 can be handled by symmetry).
Consider the martingale Mt = exp(σWt − σ²t/2), σ > 0. As the stopping time Ta is
not bounded, Theorem 8 does not apply directly. Instead, let us consider the
bounded stopping times τ1 = 0 and τ2 = Ta ∧ n. Then, by Theorem 8, we have

E(MTa∧n) = EM0 = 1, ∀n.

We now show that EMTa = limn E(MTa∧n). First recall that Ta < ∞ a.s. (cf.
formula (16)); hence, starting from some value of n, we have Ta < n and therefore

MTa∧n → MTa a.s.

At the same time the stopped martingale MTa∧n is bounded from both sides:
indeed, since WTa∧n ≤ a, we have

0 ≤ MTa∧n = exp(σWTa∧n − (σ²/2)(Ta ∧ n)) ≤ e^{σa}.

Therefore the bounded convergence theorem applies, giving us

EMTa = limn E(MTa∧n) = 1.

On the other hand, by the definition of Mt, the expectation EMTa can be written
as

EMTa = E[e^{σa − σ²Ta/2}].

Hence we have

1 = E[e^{σa − σ²Ta/2}],

and by choosing σ² = 2θ we obtain the required formula (18).

The Lemma above can be used, for example, for the calculation of ETa: the
expected value of Ta can be obtained from the transform m(θ) = E(e^{−θTa}),
namely ETa = −m′(0+). Try to show that ETa = ∞!
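A rough Monte Carlo check of (18), using a discretized path and a capped time horizon (the cap contributes at most e^{−θ·t_max}, negligible here; all names and parameters ours):

```python
import math
import random

random.seed(7)

def hit_time(a, t_max=20.0, dt=2e-3):
    """First passage of a discretized Brownian path to level a (capped)."""
    w, t, sd = 0.0, 0.0, dt ** 0.5
    while t < t_max:
        w += random.gauss(0.0, sd)
        t += dt
        if w >= a:
            return t
    return t_max   # exp(-theta*t_max) is negligible for theta = 1

theta, a, n = 1.0, 1.0, 1000
est = sum(math.exp(-theta * hit_time(a)) for _ in range(n)) / n
theory = math.exp(-math.sqrt(2.0 * theta) * abs(a))   # formula (18), ~0.243
assert abs(est - theory) < 0.05
```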


10 Stochastic integral

Our aim here is to give a meaning to the integral ∫₀ᵗ Xs dMs, where X is an
adapted process and M is a martingale, e.g. Brownian motion. This new integral
is rather different from the classical Stieltjes integral. In fact, we have
already done a useful piece of work in this direction when considering discrete
time martingales. For discrete time martingales we defined the martingale
transform as the process

Yn = ∑_{i=1}^n Ci(Xi − Xi−1) =: (C • X)n,

which was interpreted as the total winnings of a player after game n (recall
that Ci is the stake of the player on game i and Xi − Xi−1 is the net winnings
per unit stake in game i). Most importantly, we have shown (Theorem 2) that
if the process C is previsible and X is a martingale, then the process Y is
again a martingale. At the same time, the formula of the martingale transform
above looks like an integral sum. Our next task is to define a similar concept
(the stochastic integral) for continuous time martingales. The stochastic
integral is an efficient tool for solving problems in various areas, including
option pricing.

We first show that it is not possible to integrate with respect to continuous
time martingales in the traditional manner, i.e. path-wise (as a
Riemann–Stieltjes integral).

10.1 Variation. Quadratic variation of Brownian motion

Let us consider a function f of a real variable. The variability of the function
f within the interval [a, b] can be described by dividing the interval into
smaller subintervals using cutting points a = t0 < t1 < . . . < tn = b and by
summing up the absolute values of the increments |f(ti) − f(ti−1)|. Instead of
absolute values, one could also use squares, cubes, etc. In order to avoid the
dependence of the result on the partition of the interval, we let the number of
cut-points n approach infinity. We thus reach the following definition.

Definition 12. The p-variation of a function f over the interval [a, b] is
defined as

Varp(f; a, b) = lim sup_{‖πn‖→0} ∑_{i=1}^n |f(ti) − f(ti−1)|^p,

where πn is a partition of [a, b] by cutting points a = t0 < t1 < . . . < tn = b,
and ‖πn‖ is the length of the longest subinterval of πn.

It is easy to see that when f is continuous and the partition is fine enough, so
that the increments |f(ti) − f(ti−1)| are small numbers (smaller than 1), the
higher the order p, the smaller the result. Thus for p > q the p-variation is
less than the q-variation.

At the same time, while studying martingales, we have used the term 'quadratic
variation': the quadratic variation of a martingale M was a previsible process A
such that the difference M² − A is again a martingale. For example, for a
standard Brownian motion W the process Wt² − t is a martingale, and hence t is
the quadratic variation of standard Brownian motion. The question arises whether
such a coincidence of terminology is justified. The positive answer is provided
by the following lemma. Consider a partition of the interval [0, t],

πn : 0 = t0 < t1 < . . . < tn = t,

and let us denote

∆i = ti − ti−1, i = 1, 2, . . . , n,
∆Wi = Wti − Wti−1,
Qn(t) = ∑_{i=1}^n (Wti − Wti−1)² = ∑_{i=1}^n (∆Wi)².

Lemma 6. The following (mean-square) convergence takes place:

E[Qn(t) − t]² → 0, as ‖πn‖ → 0.

Proof. The increments ∆Wi of Brownian motion are independent and
∆Wi ∼ N(0, √∆i). Therefore

EQn(t) = ∑_{i=1}^n E(∆Wi)² = ∑_{i=1}^n ∆i = t.

At the same time the variance

D(Qn(t)) = ∑_{i=1}^n D((∆Wi)²) = ∑_{i=1}^n [E(∆Wi)⁴ − ∆i²].

It is well known that the 4th order moment of an N(0, 1)-distributed random
variable is 3, thus E(W1⁴) = 3. Hence, by taking into account the stationarity
of increments, we have

E(∆Wi)⁴ = E(Wti − Wti−1)⁴ = E W⁴_{ti−ti−1} = E W⁴_{∆i} = E(√∆i W1)⁴ = 3∆i²,

from which

D(Qn(t)) = 2 ∑_{i=1}^n ∆i².

Therefore, if ‖πn‖ = max ∆i → 0, then

D(Qn(t)) ≤ 2‖πn‖ · ∑_{i=1}^n ∆i = 2t‖πn‖ → 0.

Since

D(Qn(t)) = E(Qn(t) − t)²,

the proof is complete.
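Lemma 6 is easy to see in simulation: for a fine uniform partition of [0, t], D(Qn(t)) = 2∑∆i² = 2t²/n, so individual samples of Qn(t) already concentrate near t. A sketch (names ours):

```python
import random

random.seed(8)

def quadratic_variation(t=1.0, n=1000):
    """Q_n(t): sum of squared Brownian increments over a uniform partition."""
    dt = t / n
    sd = dt ** 0.5
    return sum(random.gauss(0.0, sd) ** 2 for _ in range(n))

# Q_n(t) -> t in mean square; D(Q_n) = 2 t^2 / n for the uniform partition.
t = 1.0
samples = [quadratic_variation(t, 2000) for _ in range(200)]
mean = sum(samples) / len(samples)
assert abs(mean - t) < 0.02
assert max(abs(q - t) for q in samples) < 0.2
```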

Now it is not difficult to show that the variation (1-variation) of Brownian
motion is unbounded.

Corollary. The trajectories of Brownian motion have a.s. unbounded variation,
i.e. Var1(W; 0, t) = ∞.

Proof. Obviously, the following inequalities are valid:

Qn(t) = ∑_{i=1}^n (∆Wi)² ≤ max_i |∆Wi| · ∑_{i=1}^n |∆Wi|
≤ max_i |∆Wi| · Var1(W; 0, t).    (19)

We now let ‖πn‖ = max ∆i → 0. Since the trajectories of Brownian motion are
uniformly continuous on the interval [0, t], we have the convergence
max_i |∆Wi| → 0. Suppose now, on the contrary, that Var1(W; 0, t) < ∞. Then the
product (19) also tends to zero, while the smaller quantity Qn(t) converges to
t – a contradiction. Therefore, it must be that Var1(W; 0, t) = ∞.

The last fact makes it more complicated to integrate with respect to Brownian
motion (and other martingales), as it becomes impossible to integrate along
individual trajectories. Indeed, it is known that the Riemann–Stieltjes integral
∫₀¹ f(t) dg(t) exists if both f and g have bounded variation and their
discontinuity points do not overlap.⁴ It is therefore necessary to introduce a
new type of integral (called the stochastic integral), which we first do in the
form of a simple but important example.

10.2 First example of stochastic integral

Next, we try to give meaning to the integral ∫₀ᵗ Ws dWs. From the discussion
above, it is clear that this integral cannot exist in the Riemann–Stieltjes
sense, because of the unbounded variation of Wt. Thus, for each separate
trajectory, the integral sums

Sn = ∑_{i=1}^n Wti−1 ∆Wi = ∑_{i=1}^n Wti−1 (Wti − Wti−1)    (20)

do not converge as an ordinary number sequence when n → ∞. However, by using
(Wti − Wti−1)² = Wti² + Wti−1² − 2WtiWti−1, it is possible (after a little
algebra) to represent Sn in the form

Sn = (1/2)Wt² − (1/2)Qn(t).

By Lemma 6, the mean-square convergence (L2-convergence)
Sn → (1/2)Wt² − (1/2)t takes place. The limit (1/2)Wt² − (1/2)t =: It is called
the stochastic integral and we write

∫₀ᵗ Ws dWs = (1/2)Wt² − (1/2)t.    (21)

We see that, compared to the Riemann–Stieltjes integral, an additional term −t/2
has appeared. Let us summarize the discussion above:

⁴ Recent studies, however, have shown that for the existence of the
Riemann–Stieltjes integral it is sufficient that f has bounded p-variation and g
has bounded q-variation, where p, q > 0 satisfy 1/p + 1/q > 1.


• For each separate trajectory Wt(ω) the integral sum Sn(ω) does not converge.

• At the same time, the average of Sn (over all ω) is equal to
It = (1/2)Wt² − (1/2)t, so that E(Sn − It) = 0 for each n.

• Sn fluctuates around the process It and its deviation from It vanishes as n
increases to infinity: for each t we have E(Sn − It)² → 0 (convergence in L2).

Thus the stochastic integral It is a random process which is not necessarily
close to the integral sum Sn for each separate trajectory, but it works well (as
well as possible) for all trajectories simultaneously.
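These points can be illustrated by simulating the integral sums (20) along many paths and comparing them with It = (1/2)Wt² − (1/2)t. A sketch (names and partition ours):

```python
import random

random.seed(9)

def integral_sum_and_limit(t=1.0, n=5000):
    """Compute S_n = sum W_{t_{i-1}} (W_{t_i} - W_{t_{i-1}}) along one
    simulated Brownian path, together with I_t = W_t^2/2 - t/2."""
    dt = t / n
    sd = dt ** 0.5
    w, s = 0.0, 0.0
    for _ in range(n):
        dw = random.gauss(0.0, sd)
        s += w * dw            # left-endpoint evaluation, as in (20)
        w += dw
    return s, 0.5 * w * w - 0.5 * t

errors = []
for _ in range(200):
    s, limit = integral_sum_and_limit()
    errors.append((s - limit) ** 2)
mse = sum(errors) / len(errors)
# S_n - I_t = (t - Q_n(t))/2, so E(S_n - I_t)^2 = t^2/(2n) -> 0.
assert mse < 1e-3
```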

It is important to note that the integral sum
Sn = ∑_{i=1}^n Wti−1(Wti − Wti−1) is a martingale transform and therefore (by
Theorem 2) it is a martingale. We also know that the limit process
It = (1/2)Wt² − t/2 is a martingale (see Lemma 4, assertion 2).

Comment: If we define Sn slightly differently from (20), namely
Sn = ∑_{i=1}^n Wt̄i ∆Wi, using the midpoints t̄i = (ti−1 + ti)/2 (instead of
ti−1), the process Sn ceases to be a martingale. The integral obtained by such
an alternative construction is called the Stratonovich integral and it has
useful technical applications.

We now proceed to a more general definition of the stochastic integral. In doing
so, we do not restrict ourselves to integration merely w.r.t. Brownian motion.

10.3 Definition of stochastic integral

In order to define the stochastic integral in the general case, we follow the
standard scheme:

1) the stochastic integral is first defined for 'simple' processes (it resembles
a martingale transform),

2) a general process is represented as the limit of simple processes,

3) the stochastic integral of a general process is defined as the limit of the
integrals of the respective simple processes.

10.3.1 Stochastic integral of simple process

Let T = [0, ∞) and let {Ft, t ∈ T} be a filtration. Let M be a continuous square
integrable martingale w.r.t. the filtration {Ft, t ∈ T}.

Definition 13. The process η is called a simple process if there exists a finite
number of time instances 0 = t0 < t1 < · · · < tn = ∞ and random variables
ξi, i = 1, 2, . . . , n, with finite variances, such that ξi is
Fti−1-measurable for each i and

ηt = ∑_{i=1}^n I_{[ti−1, ti)}(t) ξi.

Definition 14. The stochastic integral of a simple process η with respect to a
martingale M is defined as the process

Intt = ∫₀ᵗ ηs dMs := ∑_{i=1}^{nt} ξi(Mti − Mti−1) + ξnt+1(Mt − Mtnt)
≡ ∑_{i=1}^n ξi(Mt∧ti − Mt∧ti−1),

where nt is the integer such that tnt ≤ t < tnt+1.

Notice that the last formula has the structure of a martingale transform (3).
Therefore, by Theorem 2, the process Intt is a martingale as well.

It is easy to see that by choosing η ≡ 1 we obtain the useful formula

∫₀ᵗ dMs = Mt − M0,

as in the case of the classical R-S integral.

10.3.2 Stochastic integral of continuous process

Let M be a martingale and let Z be an adapted and continuous process such that

E(∫₀ᵗ Zs² d〈M〉s) < ∞, ∀t > 0.

Definition 15. The stochastic integral of the process Z with respect to the
martingale M is defined as the limit (in L2)

Intt = ∫₀ᵗ Zs dMs = lim_{‖πn‖→0} ∑_{i=1}^n Zti−1(Mti − Mti−1),

where πn is a partition of [0, t] into n subintervals:
0 = t0 < t1 < · · · < tn = t.

Theorem 9. The stochastic integral has the following properties:

(i) The process (Intt)_{t∈[0,∞)} is a square integrable martingale.

(ii) The quadratic variation of Int is the process

〈Int〉t = ∫₀ᵗ Zs² d〈M〉s.

Proof: The properties are relatively easy to prove for simple processes. The
second property is known as the isometry.

Generally speaking, the definition above is not suitable for immediate calcu-
lations, except for simple processes. (The same is true for the classical Riemann
integral!) Therefore certain rules have been worked out to calculate stochastic
integrals in practice. Here the central role is played by the so-called Ito
formula.

11 Ito formula and its applications

Consider a stochastic process Xt which can be decomposed as

Xt = X0 + Vt + Mt,    (22)

where

– Mt is a martingale with continuous trajectories that have non-zero quadratic
variation, Var2 > 0 (such trajectories are called 'rough', e.g. Brownian motion),

– Vt is an adapted process with continuously differentiable (or 'smooth')
trajectories that have finite variation, Var1 < ∞, and zero quadratic variation,
Var2 = 0.

Such processes Xt are called semimartingales.

We are interested in an expression for the increment f(Xt) − f(X0) for some
functions f.

In classical analysis it is known that if Xt is a 'common' process (e.g. with
continuously differentiable trajectories), then the Newton–Leibniz formula gives

f(Xt) − f(X0) = ∫₀ᵗ f′(Xs) dXs.

However, for more general processes (22) the answer is different, as the Ito
formula below shows.

11.1 Ito formula

Let f be a twice continuously differentiable function. Then

f(Xt) − f(X0) = ∫₀ᵗ f′(Xs) dVs + ∫₀ᵗ f′(Xs) dMs
+ (1/2) ∫₀ᵗ f′′(Xs) d〈M〉s.    (23)

The second integral on the right-hand side of the Ito formula is a stochastic
integral, whereas the other two are traditional (Riemann–Stieltjes) integrals.
Hence, the stochastic integral can be expressed in terms of usual integrals.
This is why the Ito formula is so important.

Example 1. Let Wt be a standard Brownian motion and Xt = Wt (i.e. the
decomposition (22) contains only the martingale part, while X0 = Vt = 0). Using
the Ito formula, show that ∫₀ᵗ Ws dWs = (1/2)Wt² − (1/2)t (which, in fact, we
already know from (21)).

Hint: Take f(x) = x².
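A worked version of the hint (our reconstruction of the intended computation): with f(x) = x², f′(x) = 2x and f′′(x) = 2, the Ito formula for Xt = Wt gives

```latex
% Ito formula with X_t = W_t, d<W>_s = ds, f(x) = x^2:
\begin{aligned}
W_t^2 - W_0^2 &= \int_0^t 2W_s\,dW_s + \tfrac{1}{2}\int_0^t 2\,ds
               = 2\int_0^t W_s\,dW_s + t, \\
\text{whence}\qquad \int_0^t W_s\,dW_s &= \tfrac{1}{2}W_t^2 - \tfrac{1}{2}t .
\end{aligned}
```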

The proof of Ito formula. We present only the basic idea of the proof. For
simplicity, let Vt = 0 and let Mt = Wt, a standard Brownian motion. Since the
quadratic variation is 〈W〉s = s, we need to show that

f(Xt) − f(X0) = ∫₀ᵗ f′(Xs) dWs + (1/2) ∫₀ᵗ f′′(Xs) ds.    (24)

Let us consider a partition of the interval [0, t] by points
0 = t0 < t1 < . . . < tn = t. Then

f(Xt) − f(X0) ≡ ∑_{i=1}^n [f(Xti) − f(Xti−1)].

In each subinterval we apply the usual Taylor formula:

f(Xt) − f(X0) = ∑_{i=1}^n f′(Xti−1)(Wti − Wti−1)
+ (1/2) ∑_{i=1}^n f′′(Xξi)(Wti − Wti−1)²,

where ξi ∈ [ti−1, ti]. When the partition gets finer, so that the maximum length
of the subintervals tends to zero, the first sum converges (in mean square) to
the stochastic integral ∫₀ᵗ f′(Xs) dWs. At the same time, in the second term,
the square of the increment of the Brownian motion (Wti − Wti−1)² can be
approximated by its mean value ∆i = ti − ti−1 (related calculations were made in
the proof of Lemma 6, where it came out that the variance
D((Wti − Wti−1)²) = 2∆i² → 0). Therefore, the second term can be approximated by
the expression (1/2) ∑_{i=1}^n f′′(Xξi)∆i, which converges to the usual integral
(1/2) ∫₀ᵗ f′′(Xs) ds.

Example 2. A widely used model in financial mathematics to describe the
behaviour of a stock price St is the following:

dSt = St (µ dt + σ dWt),    (25)

where

– µ shows the relative change of the price per time unit (a constant in this
model),

– dWt is the random part of the price change, which within a short time interval
∆t behaves like an increment of a Brownian motion,

– σ shows the importance of the random component in the development of the price
(the volatility parameter).

The relationship above is, in fact, a short notation for the following equation:

St − S0 = ∫₀ᵗ Ss µ ds + ∫₀ᵗ Ss σ dWs.    (26)

The equation (25) is called a stochastic differential equation (SDE). Our aim is
to solve this SDE for St. It turns out that the easiest way to do that is first
to find ln(St/S0).

Hint: apply the Ito formula to f(x) = ln x.
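A worked version of the hint (our reconstruction): apply the differential form of the Ito formula to f(x) = ln x, using d〈M〉t = σ²St² dt for the martingale part of (25):

```latex
% f(x) = \ln x, f'(x) = 1/x, f''(x) = -1/x^2:
\begin{aligned}
d\ln S_t &= \frac{1}{S_t}\,dS_t - \frac{1}{2}\,\frac{1}{S_t^2}\,\sigma^2 S_t^2\,dt
          = \mu\,dt + \sigma\,dW_t - \frac{\sigma^2}{2}\,dt, \\
\ln(S_t/S_0) &= \Bigl(\mu - \frac{\sigma^2}{2}\Bigr)t + \sigma W_t, \qquad
S_t = S_0\,\exp\!\Bigl(\Bigl(\mu - \frac{\sigma^2}{2}\Bigr)t + \sigma W_t\Bigr).
\end{aligned}
```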


11.2 Ito formula in differential form

df(Xt) = f′(Xt) dXt + (1/2) f′′(Xt) d〈M〉t.    (27)

The formula (27) follows immediately from the Ito formula (23), being simply
its shorter (and more convenient) notation. Here the expression d〈M〉t is the
usual differential of the function 〈M〉t; however, dXt is a symbol (the so-called
stochastic differential) whose precise meaning is given by the notion of the
stochastic integral ∫₀ᵗ f′(Xs) dXs.

Example 3. Solve the problem in Example 2 using the differential form of the
Ito formula.

11.3 A generalization of Ito formula

Let the function f also depend on time, f = f(Xt, t), where the process Xt

is still of the form (22). Assuming that the function f(x, t) is smooth enough

(sufficiently times differentiable), the following formula is valid:

df(Xt, t) =∂f

∂t(Xt, t) dt+

∂f

∂x(Xt, t) dXt +

1

2

∂2f

∂x2(Xt, t) d〈M〉t . (28)

Example 4. Assume that a stock price St behaves in accordance with the following model:

dSt = St (rdt + σdWt),

where r is the risk-free interest rate and σ is the price volatility. Show that the discounted price process e−rtSt is then a martingale.

Hint: use f(x, t) = e−rtx.
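A martingale keeps its mean value constant, so here E(e−rTST) must equal S0 for every T. A Monte Carlo sketch of this check, using the closed-form solution ST = S0 exp((r − σ²/2)T + σWT) of the model above (parameter values arbitrary):

```python
import math
import random

random.seed(3)
S0, r, sigma, T = 100.0, 0.05, 0.2, 1.0
n_paths = 200_000

total = 0.0
for _ in range(n_paths):
    WT = random.gauss(0.0, math.sqrt(T))   # W_T ~ N(0, T)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * WT)
    total += math.exp(-r * T) * ST         # discounted terminal price

mean_discounted = total / n_paths
print(mean_discounted)   # close to S0 = 100
```

The sample mean fluctuates around S0 with standard error of order S0·σ/√n_paths, so the agreement tightens as more paths are drawn.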


Example 5. Find d(e−rtV(St, t)), where V(s, t) is a known twice continuously differentiable function and the price process St is driven by the equation dSt = St (rdt + σdWt).

Hint: take f(s, t) = e−rtV(s, t).


The result is the following stochastic differential equation:

d(e−rtV(St, t)) = e−rt [−rV(St, t) + ∂V/∂t + rSt ∂V/∂s + ½ St²σ² ∂²V/∂s²] dt + e−rtσSt ∂V/∂s dWt.

Corollary: If the function V = V(s, t) satisfies the relationship

∂V/∂t + rs ∂V/∂s + ½ s²σ² ∂²V/∂s² − rV = 0,   ∀s, t,                     (29)

then the coefficient of the term dt becomes zero and the result is the martingale

d(e−rtV(St, t)) = e−rtσSt ∂V/∂s dWt.

We have discovered a relationship between random processes (SDEs) and differential equations (PDEs). The formula (29) is called the Black-Scholes equation.

The fact that, under some circumstances, e−rtV(St, t) is a martingale can be used to solve several problems in financial mathematics. This rests on exploiting the martingale's property of keeping its mean value over time. Let us consider an example of this type.
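Equation (29) can also be checked numerically: the classical Black-Scholes call price V(s, t) = sΦ(d1) − Ke^{−r(T−t)}Φ(d2) should satisfy it. A finite-difference sketch (the closed-form price below is the standard Black-Scholes formula; parameter values arbitrary):

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, t, K, r, sigma, T):
    """Black-Scholes price of a European call at time t and stock price s."""
    tau = T - t
    d1 = (math.log(s / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Plug V = bs_call into (29): V_t + r s V_s + (1/2) s^2 sigma^2 V_ss - r V = 0
K, r, sigma, T = 100.0, 0.05, 0.2, 1.0
s, t, h = 95.0, 0.3, 1e-4

V    = bs_call(s, t, K, r, sigma, T)
V_t  = (bs_call(s, t + h, K, r, sigma, T) - bs_call(s, t - h, K, r, sigma, T)) / (2 * h)
V_s  = (bs_call(s + h, t, K, r, sigma, T) - bs_call(s - h, t, K, r, sigma, T)) / (2 * h)
V_ss = (bs_call(s + h, t, K, r, sigma, T) - 2 * V + bs_call(s - h, t, K, r, sigma, T)) / h**2

residual = V_t + r * s * V_s + 0.5 * s**2 * sigma**2 * V_ss - r * V
print(abs(residual))   # close to zero
```

The residual is nonzero only because of the finite-difference approximation; it shrinks (down to round-off) as h is tuned.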

Example 6. Option pricing


11.4 Ito formula for functions of several variables

Consider a function of several variables of the form f(x1, x2, . . . , xn, t), for example ∑ⁿᵢ₌₁ xi + t. Our aim is to find the stochastic differential df(X1t, X2t, . . . , Xnt, t). We first introduce the concept of cross-variation.

Definition 16. The cross-variation of martingales M and N is the process

〈M, N〉t = ¼ (〈M + N〉t − 〈M − N〉t).

The most important properties of cross-variation are:

1. 〈M, M〉t = 〈M〉t. (It suffices to take M = N in the definition.)

2. If I¹t = ∫₀ᵗ ξs dMs and I²t = ∫₀ᵗ ηs dNs, then

〈I¹, I²〉t = ∫₀ᵗ ξsηs d〈M, N〉s.

3. For independent Brownian motions, 〈W1, W2〉t = 0.
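Property 3 can be illustrated by simulation: for two independent Brownian motions the sum ∑ ∆W1∆W2 over a fine partition vanishes, while ∑ (∆W1)² converges to t. A sketch (parameter values arbitrary):

```python
import math
import random

random.seed(4)
t, n = 1.0, 100_000
dt = t / n

cross, quad = 0.0, 0.0
for _ in range(n):
    dW1 = random.gauss(0.0, math.sqrt(dt))   # increments of two
    dW2 = random.gauss(0.0, math.sqrt(dt))   # independent Brownian motions
    cross += dW1 * dW2                       # approximates <W1, W2>_t
    quad  += dW1 * dW1                       # approximates <W1>_t = t

print(cross, quad)   # cross is near 0, quad is near t = 1
```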

Ito formula for functions of several variables

If f is a twice continuously differentiable function, then

df(X1t, X2t, . . . , Xnt, t) = ∂f/∂t (X⃗t, t) dt + ∑ⁿᵢ₌₁ ∂f/∂xi (X⃗t, t) dXit + ½ ∑ⁿᵢ₌₁ ∑ⁿⱼ₌₁ ∂²f/∂xi∂xj (X⃗t, t) d〈Mi, Mj〉t,

where Xit is of the form Xit = Xi0 + Vit + Mit, V is a term with 'nice' properties (having finite variation), and M is a martingale.

Example 7.


12 Change of measure. Girsanov’s theorem

12.1 Options

An option is a contract between a buyer and a seller that gives the buyer of

the option the right, but not the obligation, to buy or to sell a specified asset

(underlying) on or before the option’s expiration time, at an agreed price, the

strike price. In return for granting the option, the seller collects a payment (the

premium) from the buyer. Granting the option is also referred to as ”selling” or

”writing” the option. A call option gives the buyer of the option the right but

not the obligation to buy the underlying at the strike price. A put option gives

the buyer of the option the right but not the obligation to sell the underlying at

the strike price.

If the buyer chooses to exercise this right, the seller is obliged to sell or buy the asset at the agreed price. If the option may only be exercised at the expiration time T, then it is called a European option. An American option, by contrast, may be exercised on any trading day on or before expiration.

The price of the underlying St may by time T rise above the strike price K. In such a case the owner of the option at time T exercises his right to buy the shares at price K and sells them at the market price, earning ST − K per share. At the same time, the seller of the option can lose the same amount of money, provided he does not undertake anything during the option period. However, as we will see below, the seller can take measures to avoid the loss.

Since the option never results in a negative value (more exactly, at time T its value is C(ST) := max(ST − K, 0)), it is natural that in return for granting the option, the seller collects a payment (the premium) from the buyer. We see two problems related to options:

• What is the fair price of the option (at the time of writing it but also at

any later time since we may want to sell the option on)?


• How can the seller (writer) of the option avoid possible loss caused by an

unfavorable change of the price of the underlying?

12.2 Option pricing

The basis for pricing an option is the so-called parity principle. According to that principle, the fair price is the amount of money X0 such that the seller, starting to trade with that amount at time 0 and using an appropriate trading strategy, can reach the same value XT at time T as the option itself, i.e. XT = C(ST) = max(ST − K, 0).

It is not difficult to show that if the option is priced differently from X0, there will be a possibility of arbitrage, where one side of the agreement (the buyer or the seller) can earn a risk-free profit.

In the following we assume that the stock price St is driven by the model

d St = St (µdt+ σ dWt), (30)

where µ is a constant component of the return, called drift, and σ is a constant

showing the importance of the random component of the return, called volatility.

Now consider an investor (= the seller of the option) who invests the amount X0 at time 0 partly in shares and puts the rest of the money into a bank account with risk-free interest rate r. He then keeps trading the same stock, putting the money earned into the bank account and taking the money necessary for the purchase of shares from the same account (a self-financing portfolio).

Let η(t) be the number of shares owned by the investor at time t. Then the value Xt of the portfolio at time t consists of two parts: the value of the shares, η(t)·St, plus the money in the bank, Xt − η(t)St. Let us find the increment of the value of such a portfolio during a short time interval dt:

dXt = r [Xt − η(t)St] dt + η(t) dSt.


By using the Ito formula and the market model (30), one can also calculate the change of the discounted wealth:

d(e−rtXt) = η(t) d(e−rtSt)

(show this!). From here it is seen that as soon as the discounted price process e−rtSt is a martingale, so is the discounted wealth e−rtXt. However, any martingale keeps its mean value over time. Hence, its (non-random) initial value X0 is equal to the mean value at any later time epoch, including time T:

X0 = e−r·0X0 = E(e−rTXT) = E(e−rTC(ST)).                     (31)
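For the "show this!" step, the computation is short: apply the product rule (Ito with f(x, t) = e^{−rt}x, noting that e^{−rt} has finite variation) and substitute the self-financing dynamics dXt = r[Xt − η(t)St]dt + η(t)dSt:

```latex
\begin{aligned}
d(e^{-rt}X_t) &= -r e^{-rt} X_t\,dt + e^{-rt}\,dX_t \\
 &= -r e^{-rt} X_t\,dt + e^{-rt}\bigl(r[X_t - \eta(t)S_t]\,dt + \eta(t)\,dS_t\bigr) \\
 &= \eta(t)\bigl(-r e^{-rt} S_t\,dt + e^{-rt}\,dS_t\bigr) \\
 &= \eta(t)\,d(e^{-rt}S_t).
\end{aligned}
```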

Only one question remains: is the discounted price process e−rtSt a martingale?

In order to decide that, we will find its stochastic differential, using the Ito formula:

d(e−rtSt) = e−rtSt σ (((µ − r)/σ) dt + dWt).

The result will definitely be a martingale if µ = r, since then the right-hand side reduces to a stochastic integral w.r.t. Wt, which is, as we know, a martingale. However, it turns out that even when µ ≠ r the whole expression in the brackets in the last formula, W̃t = ((µ − r)/σ) t + Wt, can be regarded as a martingale. For that, it is only necessary to redefine appropriately the probabilities of the trajectories of the initial Brownian motion Wt. In other words, the measure P must be changed in such a way that W̃t is a martingale with respect to the new measure Q. How can this be achieved? This possibility is granted by Girsanov's Theorem (also called the Cameron-Martin Theorem), which also gives the relationship between the old and the new measures.

Theorem 10. (Girsanov's Theorem)

Let Wt be a standard Brownian motion on a filtered probability space (Ω, F, {Ft}t≥0, P). Let W̃t = ∫₀ᵗ θs ds + Wt, where the process θ is adapted to the filtration Ft and satisfies the requirement E(e^{½∫₀ᵀ θs² ds}) < ∞.

Then there exists a probability measure Q such that W̃t is a standard Brownian motion w.r.t. that measure. The measure Q is related to P via the formula Q(A) = ∫_A MT(ω) P(dω), where

Mt = exp(−∫₀ᵗ θs dWs − ½ ∫₀ᵗ θs² ds).

The measure Q is called the risk-neutral or martingale measure. Let us apply Girsanov's Theorem in our case, where θt ≡ (µ − r)/σ. The assumption E(e^{½∫₀ᵀ θs² ds}) < ∞ is clearly fulfilled. Thus the existence of the martingale measure Q is guaranteed, and the option price is the expected value w.r.t. that new measure Q:

X0 = e−r·0X0 = E_Q(e−rTC(ST)).                     (32)

Finally, let us explain the term "risk-neutral measure". As

dWt = dW̃t − ((µ − r)/σ) dt,

the market model (30) reduces (after simple calculations) to

dSt = St (rdt + σ dW̃t).                     (33)

This formula differs from the model (30) only in that µ is replaced by the risk-free interest rate r. We see that under the measure Q the average return of the stock is equal to the risk-free interest rate r. Hence the term "risk-neutral measure".

Some remarks on how to calculate the option price as the expected value X0 = E_Q(e−rTC(ST)). Basically, there are two possibilities:

1. Simulation, where a large number of price trajectories is generated in accordance with the market model (33). For each trajectory C(ST) is found; the results are averaged and multiplied by the discounting factor e−rT.

2. Analytical calculation of the expectation. This is only possible for some simple pay-off functions, such as those of European options. In the latter case the result is the well-known Black-Scholes pricing formula (a good exercise!).
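The two approaches can be put side by side in a few lines of code: a Monte Carlo estimate of E_Q(e−rT max(ST − K, 0)) under model (33), against the analytic Black-Scholes formula for a European call. A sketch (parameter values arbitrary):

```python
import math
import random

def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(5)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

# 1. Simulation: sample S_T under the risk-neutral model (33), average payoffs
n = 200_000
payoff_sum = 0.0
for _ in range(n):
    WT = random.gauss(0.0, math.sqrt(T))
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * WT)
    payoff_sum += max(ST - K, 0.0)
mc_price = math.exp(-r * T) * payoff_sum / n

# 2. Analytical: the Black-Scholes pricing formula for a European call
d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
d2 = d1 - sigma * math.sqrt(T)
bs_price = S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

print(mc_price, bs_price)   # both close to 10.45
```

Note that the simulation draws ST directly from its closed-form distribution; for more complicated (path-dependent) payoffs one would instead simulate whole trajectories of (33) step by step.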
