Post on 04-Jun-2018
8/13/2019 Stochastic Processes Applications Lecturenotes
1/102
Stochastic Processes
Selective Topics and Applications
Nguyen V.M. Man, Ph.D.
January 15, 2013
Keywords. probabilistic model, random process, linear algebra, computational algebra, statistical inference and modeling
Copyright 2013 by
Lecturer: Nguyen V. M. Man, Ph.D.
Faculty: Computer Science and Engineering
Institution: University of Technology of HCMC - HCMUT
Address: 268 Ly Thuong Kiet, Dist. 10, HCMC, Vietnam
Email: mnguyen@cse.hcmut.edu.vn
E-home: www.cse.hcmut.edu.vn/mnguyen
The Author
Man Nguyen conducted his Ph.D. research in Applied Mathematics and Industrial Statistics after following a master program in Computational Lie Algebras at HCMC University of Science.
His Ph.D. work, on Factorial Experiment Designs using computer-algebraic methods and Discrete Mathematics, was carried out at the Eindhoven University of Technology, the Netherlands, in 2001-2005.
His current research interests include
* Algebraic Statistics and Experimental Designs, and
* Mathematical & Statistical Modeling of practical problems.
For more information, you are welcome to visit his e-home at
www.cse.hcmut.edu.vn/mnguyen
Contents
1 Background
1.1 Introductory Stochastic Processes
1.2 Generating Functions
1.2.1 Introduction
1.2.2 Elementary results of Generating Functions
1.2.3 Convolutions
1.2.4 Compound distributions
2 Markov Chains & Modeling
2.1 Homogeneous Markov chains
2.2 Classification of States
2.3 Markov Chain Decomposition
2.4 Limiting probabilities & Stationary distributions
2.5 Theory of stochastic matrix for MC
2.6 Spectral Theorem for Diagonalizable Matrices
2.7 Markov Chains with Absorbing States
2.7.1 Theory
2.8 Chapter Review and Discussion
3 Random walks & Wiener process
3.1 Introduction to Random Walks
3.2 Random Walk - a mathematical realization
3.3 Wiener process
4 Arrival-Type processes
4.1 Introduction
4.2 The Bernoulli process
4.2.1 Basic facts
4.2.2 Random Variables Associated with the Bernoulli Process
4.3 The Poisson process
4.3.1 Poisson distribution
4.3.2 Poisson process
4.4 Course Review and Discussion
5 Probability Modeling and Mathematical Finance
5.1 Martingales
5.1.1 History
5.1.2 Conditional expectation
5.1.3 Key properties of Conditional expectation
5.1.4 Filtration
5.1.5 Martingale
5.1.6 Martingale examples
5.1.7 Stopping time
5.2 Stochastic Calculus
5.2.1 A Simple Model for Asset Prices
5.2.2 Stochastic differential equation
6 Part III: Practical Applications of SP
6.1 Statistical Parameter Estimation
6.2 Inventory Control in Logistics
6.3 Epidemic processes
6.4 Statistical Models in Risk Management
6.5 Optimization Methods for Portfolio Risk Management
Introduction
We present a few specific probabilistic techniques used in mathematically modeling complex phenomena in biology, service systems and financial activities. The notes are aimed at graduates in Applied Mathematics and Statistics.
The aims of the course
The course introduces basic techniques of Stochastic Processes theory, including:
Markov chains and processes (discrete and continuous parameters),
Random walks, fluctuation theory,
Stationary processes, spectral analysis,
Diffusion processes,
Applications in finance and transportation.
The structure of the course. The course consists of three parts:
Part I: Motivating topics for studying Stochastic Processes
Part II: Fundamental setting of Stochastic Processes
Part III: Connections and research projects
Part I: Motivating topics and Background
Service systems: mathematical model of queueing systems.
Introductory Stochastic Processes: basic concepts
Part II: Basic Stochastic Processes
We will discuss the following:
Markov Chains and processes
Random walks and Wiener process
Arrival-Type processes
Martingales and Stochastic Calculus
Part III: New applications of SP
We investigate the following applications:
Statistical Models and Simulation in Risk Management
Mathematical and Statistical Model in Transportation Science
Motivating topics of SP
Service systems
Over the last few years the Processor Sharing scheme has attracted renewed attention as a convenient and efficient approach for studying bandwidth-sharing mechanisms such as TCP, or any process requiring resource sharing.
Understanding and computing such processes in order to produce a high-performance system with limited resources is a very difficult task. A few typical aspects of resource allocation are:
1. many classes of jobs (clients) enter a system at distinct rates, which demands a wise policy to get them through efficiently;
2. measuring the performance of a system through many different parameters (metrics) is hard and requires complex mathematical models.
Evolutionary Dynamics
Keywords: critical lineages, virus mutants, mutation, reproductive ratio, invasion, escape, ecology, vaccine.
Introductory Invasion and Escape. Some realistic biological phenomena occur in nature, such as: (a) a parasite infecting a new host, (b) a species trying to invade a new ecological niche, (c) cancer cells escaping from chemotherapy and (d) viruses evading anti-microbial therapy.
Typical problems. Imagine a virus of one host species that is transferred to another host species (HIV, SARS). In the new host, the virus has a basic reproductive ratio R less than one. Some mutation may be required to generate a virus mutant that can invade the new host and lead to an epidemic in the new host species. A few crucial concerns are:
1. How to calculate the probability that such an invasion attempt succeeds?
2. Suppose a successful and effective vaccine is found, but some mutants can break through the protective immunity of the vaccine. How to calculate the probability that a virus quasispecies contains an escape mutant that establishes an infection and thereby causes vaccine failure?
Summary. We need a theory to calculate the probability of non-extinction/escape for lineages starting from single individuals.
Computing Software
OpenModelica, ScalaLab and R.
Introductory R, a statistical language
R is a language and environment for statistical computing and graphics.
It is similar to the S language and environment which was developed
at Bell Laboratories (formerly AT&T, now Lucent Technologies). The R distribution contains functionality for a large number of statistical procedures. Among these are: linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, and so on.
There is also a large set of functions which provide a flexible graphical environment for creating various kinds of data presentations.
One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is an
integrated suite of software facilities for data manipulation, calculation
and graphical display. It includes
* an effective data handling and storage facility,
* a suite of operators for calculations on arrays, in particular matrices,
* a large, coherent, integrated collection of intermediate tools for data
analysis,
* graphical facilities for data analysis and display
* a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
Note: most classical statistics and much of the latest methodology are available for use with R, but users may need to be prepared to do a little work to find them.
Chapter 1
Background
1.1 Introductory Stochastic Processes
The concept. A stochastic process is a collection (usually infinite) of random variables, denoted Xt or X(t), where the parameter t often represents time. The state space of a stochastic process consists of all realizations x of Xt; i.e., Xt = x says the random process is in state x at time t. Stochastic processes can generally be subdivided into four distinct categories, depending on whether t and Xt are discrete or continuous:
1. Discrete processes: both are discrete, such as the Bernoulli process (die rolling) or discrete time Markov chains.
2. Continuous time discrete state processes: the state space of Xt is discrete and the index set (e.g. time set) T of t is continuous, such as an interval of the reals R.
Poisson process: the number of clients X(t) who have entered
ACB from the time it opened until time t. X(t) has the Poisson distribution with mean E[X(t)] = λt (λ being the arrival rate).
Continuous time Markov chain.
Queuing process: people not only enter but also leave the bank, so we also need the distribution of service time (the time a client spends in ACB).
3. Continuous processes: both Xt and t are continuous, such as diffusion processes (Brownian motion).
4. Discrete time continuous state processes: Xt is continuous and t is discrete; these are the so-called TIME SERIES, such as
monthly fluctuations of the inflation rate of Vietnam,
daily fluctuations of a stock market.
Examples
1. Discrete processes: the random walk model consisting of positions Xt of an object (a drunkard) at discrete time points t during 24 hours, whose directional distance from a particular point 0 is measured in integer units. Here T = {0, 1, 2, . . . , 24}.
2. Continuous time discrete state processes: Xt is the number of births in a given population during the time period [0, t]. Here T = R+ = [0, ∞) and the state space is {0, 1, 2, . . .}. The sequence of failure times of a machine is a specific instance.
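The drunkard's-walk example above is easy to simulate. The notes name R as the course software; the sketch below is in Python for self-containedness, and the helper name random_walk is our own:

```python
import random

def random_walk(steps=24, seed=42):
    """Simulate a drunkard's walk on the integers: at each time step the
    position moves +1 or -1 with equal probability, starting from X_0 = 0."""
    rng = random.Random(seed)
    path = [0]                                   # X_0 = 0
    for _ in range(steps):
        path.append(path[-1] + rng.choice([-1, 1]))
    return path

path = random_walk()                             # positions X_0, X_1, ..., X_24
print(len(path), path[0])
```

Each run produces one realization (sample path) of the process; the state space here is the set of integers reachable in at most 24 unit steps.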
1. STATIONARY property: X(t) and X(t + τ) have the same distributions. For the first-order distribution,
FX(x; t) = FX(x; t + τ) = FX(x); and fX(x; t) = fX(x).
These processes are found in Arrival-Type Processes, in which we are interested in occurrences that have the character of an arrival, such as message receptions at a receiver, job completions in a manufacturing cell, customer purchases at a store, etc. We will focus on models in which the interarrival times (the times between successive arrivals) are independent random variables.
The case where arrivals occur in discrete time and the interarrival times are geometrically distributed is the Bernoulli process.
The case where arrivals occur in continuous time and the interarrival times are exponentially distributed is the Poisson process.
The Bernoulli process and the Poisson process will be investigated next.
2. MARKOVIAN (memoryless) property: Many processes with the memoryless property arise from experiments that evolve in time and in which the future evolution exhibits a probabilistic dependence on the past.
As an example, the future daily prices of a stock are typically dependent on past prices. However, in a Markov process, we assume a very special type of dependence: the next value depends on past values only through the current value; that is, Xi+1 depends only on Xi, and not on any previous values.
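The Poisson process mentioned above is simple to simulate from its exponentially distributed interarrival times. A Python sketch (poisson_arrivals is our own helper name, not from the notes):

```python
import random

def poisson_arrivals(rate, horizon, seed=1):
    """Arrival times of a Poisson process on [0, horizon], built by summing
    i.i.d. exponential interarrival times with mean 1/rate."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)   # next exponential interarrival time
        if t > horizon:
            return arrivals
        arrivals.append(t)

times = poisson_arrivals(rate=2.0, horizon=10.0)
print(len(times))    # N(10), Poisson distributed with mean 2.0 * 10 = 20
```

The Bernoulli process is the discrete-time analogue: replace the exponential interarrival times by geometric ones.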
1.2 Generating Functions
1.2.1 Introduction
Probabilistic models often involve several random variables of interest. For example, in a medical diagnosis context, the results of several tests may be significant, or in a networking context, the workloads of several gateways may be of interest. All of these random variables are associated with the same experiment, sample space, and probability law, and their values may relate in interesting ways. Mathematically, a random variable is a mapping.
Definition 1. A random variable X is a mapping (function) from a sample space S to the reals R. For any j ∈ R, the preimage A := X^{-1}(j) = {w : X(w) = j} ⊆ S is an event, and we understand
P{X = j} = P(A) = Σ_{w ∈ A} P(w).
For a finite sample space S with equally likely outcomes, obviously
P{X = j} = P(A) = |A| / |S|.
A discrete random variable X is one having a countable range Range(X), described by the probability mass function (pmf), determined by P{X = j} = p_j. We must have
p_j ≥ 0, and Σ_j p_j = 1.
A continuous random variable X is one having an uncountable range Range(X), described by a probability density function (pdf) f(x) that satisfies
f(t) ≥ 0, and ∫_{Range(X)} f(t) dt = 1.
Generating functions are important in handling stochastic processes involving integral-valued random variables.
Multiple random variables. We consider probabilities involving simultaneously the numerical values of several random variables and investigate their mutual couplings. In this section, we extend the concepts of pmf and expectation developed so far to multiple random variables.
Consider two discrete random variables X, Y : S → R associated with the same experiment. The joint pmf of X and Y is defined by
pX,Y(x, y) = P(X = x, Y = y)
for all pairs of numerical values (x, y) that X and Y can take. We will use the abbreviated notation P(X = x, Y = y) instead of the more precise notations P({X = x} ∩ {Y = y}) or P({X = x} and {Y = y}). That is,
P(X = x, Y = y) = P({X = x} ∩ {Y = y}) = P({X = x} and {Y = y}).
For the pair of random variables X, Y, we say
Definition 2. X and Y are independent if for all x, y ∈ R we have
pX,Y(x, y) = pX(x) pY(y),
or in terms of conditional probability
P({X = x} | {Y = y}) = P({X = x}).
This can be extended to the so-called mutual independence of a finite number n of random variables.
Definition 3. The expectation operator defines the expected value of a random variable X as
E(X) = Σ_{x ∈ Range(X)} P{X = x} · x.
If X is a function from a sample space S to the naturals N, then
E(X) = Σ_{i=0}^∞ P{X > i}. (Why?)
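The tail-sum identity E(X) = Σ_{i ≥ 0} P{X > i} behind the (Why?) above can be checked numerically on a small pmf. A Python sketch (the helper names are ours):

```python
def mean_direct(pmf):
    """E(X) = sum_x x * P{X = x}, for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def mean_tail_sum(pmf):
    """E(X) = sum_{i >= 0} P{X > i}, for a nonnegative integer-valued X."""
    top = max(pmf)
    return sum(sum(p for x, p in pmf.items() if x > i) for i in range(top))

die = {x: 1 / 6 for x in range(1, 7)}          # a fair die roll
print(mean_direct(die), mean_tail_sum(die))    # both close to 3.5
```

The identity holds because each value x contributes P{X = x} to exactly x of the tails P{X > 0}, ..., P{X > x - 1}.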
Functions of Multiple Random Variables. When there are multiple random variables of interest, it is possible to generate new random variables by considering functions involving several of these random variables. In particular, a function Z = g(X, Y) of the random variables X and Y defines another random variable. Its pmf can be calculated from the joint pmf pX,Y according to
pZ(z) = Σ_{(x,y) : g(x,y) = z} pX,Y(x, y).
Furthermore, the expected value rule for functions naturally extends and takes the form
E[g(X, Y)] = Σ_{(x,y)} g(x, y) pX,Y(x, y).
Theorem 4. We have two important results of expectation.
Linearity: E(X + Y) = E(X) + E(Y) for any pair of random variables X, Y.
Independence: E(X Y) = E(X) E(Y) for any pair of independent random variables X, Y.
Mean, variance and moments of the probability distribution P{X = j} = p_j:
m = E(X) = Σ_{j=0}^∞ j p_j = P'(1) = Σ_{j=0}^∞ q_j = Q(1) (why!?)
Recall that the variance of the probability distribution p_j is
σ² = E(X(X − 1)) + E(X) − [E(X)]²,
so we need to know
E(X(X − 1)) = Σ_{j=0}^∞ j(j − 1) p_j = P''(1) = 2Q'(1)?
Therefore, σ² = ?
Exercise: Find the formula of the r-th factorial moment
μ[r] = E(X(X − 1)(X − 2) · · · (X − r + 1)).
1.2.2 Elementary results of Generating Functions
Suppose we have a sequence of real numbers a0, a1, a2, . . . Introducing the dummy variable x, we may define a function
A(x) = a0 + a1 x + a2 x² + · · · = Σ_{j=0}^∞ aj x^j. (1.2.1)
If the series converges in some real interval −x0 < x < x0, the function A(x) is called the generating function of the sequence {aj}.
Fact 1.1. If the sequence {aj} is bounded by some constant K, then A(x) converges at least for |x| < 1. [Prove it!]
Fact 1.2. In case the sequence {aj} represents probabilities, we introduce the restriction
aj ≥ 0, Σ_{j=0}^∞ aj = 1.
The corresponding function A(x) is then called a probability-generating function. We consider the (point) probability distribution and the tail probability of a random variable X, given by
P{X = j} = pj, P{X > j} = qj,
so the usual distribution function is P{X ≤ j} = 1 − qj. The probability-generating function now is
P(x) = Σ_{j=0}^∞ pj x^j = E(x^X), E being the expectation operator.
Also we can define a generating function for the tail probabilities:
Q(x) = Σ_{j=0}^∞ qj x^j.
Q(x) is not a probability-generating function, however.
Fact 1.3.
a/ P(1) = Σ_{j=0}^∞ pj 1^j = 1 and |P(x)| ≤ Σ_{j=0}^∞ |pj x^j| ≤ Σ_{j=0}^∞ pj ≤ 1 if |x| ≤ 1. So P(x) is absolutely convergent at least for |x| ≤ 1.
b/ Q(x) is absolutely convergent at least for |x| < 1.
c/ Connection between P(x) and Q(x): (check this!)
(1 − x) Q(x) = 1 − P(x), or P(x) + Q(x) = 1 + x Q(x).
Finding a generating function from a recurrence: multiply both sides by x^n and sum over n. For example, for the Fibonacci sequence,
fn = fn−1 + fn−2 yields F(x) = x + x F(x) + x² F(x).
Finding a recurrence from a generating function: whenever you know F(x), we find its power series P; the coefficients of P before x^n are the Fibonacci numbers. How? Just remember how to find a partial fractions expansion of F(x), in particular the basic expansion
1 / (1 − x) = 1 + x + x² + · · ·
In general, if G(x) is the generating function of a sequence (gn), then
G^(n)(0) = n! gn.
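Coefficient extraction can also be done mechanically: the power-series coefficients of a rational generating function num(x)/den(x) satisfy a convolution recurrence, which recovers the Fibonacci numbers from F(x) = x/(1 − x − x²). A Python sketch (series_coeffs is our own helper, not from the notes):

```python
def series_coeffs(num, den, n):
    """First n power-series coefficients of num(x)/den(x), with num and den
    given as coefficient lists (den[0] != 0).  Equating coefficients in
    num = den * c gives c_k = (num_k - sum_{j>=1} den_j c_{k-j}) / den_0."""
    c = []
    for k in range(n):
        nk = num[k] if k < len(num) else 0
        s = sum(den[j] * c[k - j] for j in range(1, min(k, len(den) - 1) + 1))
        c.append((nk - s) / den[0])
    return c

# F(x) = x / (1 - x - x^2) generates the Fibonacci numbers
print(series_coeffs([0, 1], [1, -1, -1], 10))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34] (as floats)
```

The same routine evaluates the basic expansion above: series_coeffs([1], [1, -1], 5) returns the all-ones coefficients of 1/(1 − x).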
1.2.3 Convolutions
Now we consider two nonnegative independent integral-valued random variables X and Y, having the probability distributions
P{X = j} = aj, P{Y = k} = bk. (1.2.2)
The joint probability of the event (X = j, Y = k) is aj bk, by independence. We form a new random variable S = X + Y; then the event S = r comprises the mutually exclusive events
(X = 0, Y = r), (X = 1, Y = r − 1), . . . , (X = r, Y = 0).
Fact 1.4. The probability distribution of the sum S then is
P{S = r} = cr = a0 br + a1 br−1 + · · · + ar b0.
Proof.
pS(r) = P(X + Y = r) = Σ_{(x,y) : x+y=r} P(X = x and Y = y) = Σ_x pX(x) pY(r − x).
This method of compounding two sequences of numbers (not necessarily probabilities) is called convolution. The notation
{cj} = {aj} ∗ {bj}
will be used.
Fact 1.5. Define the generating functions of the sequences {aj}, {bj} and {cj} by
A(x) = Σ_{j=0}^∞ aj x^j, B(x) = Σ_{j=0}^∞ bj x^j, C(x) = Σ_{j=0}^∞ cj x^j;
it follows that C(x) = A(x) B(x). [check this!]
In practical applications, the sum of several independent integral-valued random variables Xi can be defined:
Sn = X1 + X2 + · · · + Xn, n ∈ Z+.
If the Xi have a common probability distribution given by pj, with probability-generating function P(x), then the probability-generating function of Sn is P(x)^n. Clearly, the distribution of Sn is the n-fold convolution
{pj} ∗ {pj} ∗ · · · ∗ {pj} (n factors) = {pj}^∗n.
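Fact 1.5 is easy to verify numerically: convolving the pmf of a fair die with itself gives the pmf of the sum of two dice, and C(x) = A(x)B(x) holds at any point. A Python sketch (convolve and gf are our own helper names):

```python
def convolve(a, b):
    """Convolution {c_r} = {a_j} * {b_k}: c_r = sum_j a_j * b_{r-j}."""
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

def gf(coeffs, x):
    """Evaluate the generating function A(x) = sum_j a_j x^j."""
    return sum(p * x ** j for j, p in enumerate(coeffs))

die = [0] + [1 / 6] * 6        # pmf of one die: P{X = j} for j = 0..6
two = convolve(die, die)       # pmf of the sum S of two dice
print(two[7])                  # P{S = 7} = 6/36
assert abs(gf(two, 0.5) - gf(die, 0.5) ** 2) < 1e-12   # C(x) = A(x)B(x)
```

Repeating convolve n − 1 times gives the n-fold convolution {pj}^∗n of the last display.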
1.2.4 Compound distributions
In our discussion so far of sums of random variables, we have always assumed that the number of variables in the sum is known and fixed, i.e., it is nonrandom. We now generalize the previous concept of convolution to the case where the number N of random variables Xk contributing to the sum is itself a random variable! In particular, we consider the sum SN = X1 + X2 + · · · + XN, where
P{Xk = j} = fj, P{N = n} = gn, P{SN = l} = hl. (1.2.3)
The probability-generating functions of X, N and S are
F(x) = Σ_j fj x^j, G(x) = Σ_n gn x^n, H(x) = Σ_l hl x^l. (1.2.4)
Compute H(x) with respect to F(x) and G(x). Prove that
H(x) = G(F(x)).
Example 1.1. A remote village has three gas stations, and each one of them is open on any given day with probability 1/2, independently of the others. The amount of gas available in each gas station is unknown and is uniformly distributed between 0 and 1000 gallons. We wish to characterize the distribution of the total amount of gas available at the gas stations that are open.
The number N of open gas stations is a binomial random variable with p = 1/2, and the corresponding transform is
GN(x) = (1 − p + p e^x)^3 = (1/8)(1 + e^x)^3.
The transform (here a moment generating function, since the amount of gas is continuous) FX(x) associated with the amount of gas available in an open gas station is
FX(x) = (e^{1000x} − 1) / (1000x).
The transform HS(x) associated with the total amount S of gas available at the three gas stations of the village that are open is the same as GN(x), except that each occurrence of e^x is replaced with FX(x), i.e.,
HS(x) = G(F(x)) = (1/8)(1 + FX(x))^3.
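A quick Monte Carlo check of this example: the standard compound-sum identity E[SN] = E[N] E[X] predicts an expected total of 1.5 × 500 = 750 gallons. A Python sketch (simulate_total_gas is our own helper name):

```python
import random

def simulate_total_gas(trials=200_000, seed=7):
    """Monte Carlo for Example 1.1: N ~ Binomial(3, 1/2) stations are open,
    each holding an independent Uniform(0, 1000) amount of gas; return the
    sample mean of the total S_N = X_1 + ... + X_N."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        n_open = sum(rng.random() < 0.5 for _ in range(3))      # N
        total += sum(rng.uniform(0, 1000) for _ in range(n_open))
    return total / trials

print(simulate_total_gas())   # close to E[N] E[X] = 1.5 * 500 = 750
```

The identity itself follows by differentiating H(x) = G(F(x)) once and evaluating at the appropriate point.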
The next chapter will discuss fundamental stochastic processes.
Chapter 2
Markov Chains & Modeling
We discuss the concept of discrete time Markov Chains, or just Markov Chains (MC), in this chapter. Suppose we have a sequence M of consecutive trials, numbered n = 0, 1, 2, . . . The outcome of the nth trial is represented by the random variable Xn, which we assume to be discrete and to take one of the values j in a finite set Q of discrete outcomes/states
{e1, e2, e3, . . . , es}.
M is called a (discrete time) Markov chain if, while occupying Q states at each of the unit time points 0, 1, 2, 3, . . . , n − 1, n, n + 1, . . ., M satisfies the following property, called the
Markov property or Memoryless property:
P(Xn+1 = j | Xn = i, . . . , X0 = a) = P(Xn+1 = j | Xn = i),
for all n = 0, 1, 2, . . .
(In each time step n to n + 1, the process can stay in the same state ei (at both n and n + 1) or move to another state ej (at n + 1), subject to the memoryless rule, which says the future behavior of the system depends only on the present and not on its past history.)
Definition 5 (One-step transition probability).
Denote the absolute probability of outcome j at the nth trial by
pj(n) = P(Xn = j). (2.0.1)
The one-step transition probability, denoted
pij(n + 1) = P(Xn+1 = j | Xn = i),
is defined as the conditional probability that the process is in state j at time n + 1 given that the process was in state i at the previous time n, for all i, j ∈ Q.
2.1 Homogeneous Markov chains
If the state transition probabilities pij(n + 1) in a Markov chain M are independent of time n, they are said to be stationary, time homogeneous or just homogeneous. The state transition probability in a homogeneous chain can then be written without mention of the time point n:
pij = P(Xn+1 = j | Xn = i). (2.1.1)
Unless stated otherwise, we assume and will work with homogeneous Markov chains M. The one-step transition probabilities given by Equation (2.1.1)
of these Markov chains must satisfy:
Σ_{j=1}^s pij = 1 for each i = 1, 2, . . . , s, and pij ≥ 0.
Transition Probability Matrix. In practice, we are likely given the initial distribution (the probability distribution of the starting position of the concerned object at time point 0) and the transition probabilities, and we want to determine the probability distribution of the position Xn for any time point n > 0. The Markov property, quantitatively described through transition probabilities, is represented in the state transition matrix P = [pij]:
P =
p11 p12 p13 . . . p1s
p21 p22 p23 . . . p2s
p31 p32 p33 . . . p3s
 .   .   .  . . .  .
(2.1.2)
Briefly, we have
Definition 6. A (homogeneous) Markov chain M is a triple (Q, p, P) in which:
Q is a finite set of states (identified with an alphabet),
p = p(0) are the initial probabilities (at the initial time point n = 0),
P = [pij] are the state transition probabilities, given by
pij = P(Xn+1 = j | Xn = i),
and such that the memoryless property is satisfied, i.e.,
P(Xn+1 = j | Xn = i, . . . , X0 = a) = P(Xn+1 = j | Xn = i), for all n.
In practice, the initial probabilities p(0) are obtained at the current time (the beginning of a study), and the transition probability matrix P is found from empirical observations in the past. In most cases, the major concern is using P and p(0) to predict the future.
Example 2.1. The Coopmart chain (denoted C) in SG currently controls 60% of the daily processed-food market; their rivals Maximart and other brands (denoted M) take the other share. Data from the previous years (2006 and 2007) show that 88% of C's customers remained loyal to C, while 12% switched to rival brands. In addition, 85% of M's customers remained loyal to M, while the other 15% switched to C. Assuming that these trends continue, determine C's share of the market (a) in 5 years and (b) over the long run.
Proposed solution. Suppose that the brand attraction is time homogeneous. For a sample of large enough size n, we denote the customers' preference in year n by a random variable Xn. The market share probability of the whole population can then be approximated by using the sample statistics, e.g.
P(Xn = C) = |{x : Xn(x) = C}| / n, and P(Xn = M) = 1 − P(Xn = C).
Setting n = 0 for the current time, the initial probabilities then are
p(0) = [0.6, 0.4] = [P(X0 = C), P(X0 = M)].
Obviously we want to know the market share probabilities p(n) = [P(Xn = C), P(Xn = M)] at any year n > 0. We now introduce a transition probability matrix with rows and columns labeled C and M:
      C     M
C   0.88  0.12
M   0.15  0.85
that is,
P = [ 1 − a   a ; b   1 − b ] = [ 0.88  0.12 ; 0.15  0.85 ], (2.1.3)
where a = pCM = P[Xn+1 = M | Xn = C] = 0.12 and b = pMC = P[Xn+1 = C | Xn = M] = 0.15.
Higher-order transition probabilities.
The aim: find the absolute probabilities at any stage n. We write
p(n)ij = P(Xn+m = j | Xm = i), with p(1)ij = pij, (2.1.4)
for the n-step transition probability, which is independent of m ∈ N; see Equation (2.1.1). The n-step transition matrix is denoted P(n) = (p(n)ij).
For the case n = 0, we have
p(0)ij = δij = 1 if i = j, and 0 if i ≠ j.
Chapman-Kolmogorov equations. The Chapman-Kolmogorov equations relate the n-step transition probabilities to the k-step and (n − k)-step transition probabilities:
p(n)ij = Σ_{h=1}^s p(n−k)ih p(k)hj, 0 < k < n.
This results in the matrix notation
P(n) = P(n−k) P(k).
Since P(1) = P, we get P(2) = P², and in general P(n) = P^n.
Let p(n) denote the row-vector form of the probability mass distribution (pmf, or absolute probability distribution) associated with Xn of a Markov process, that is,
p(n) = [p1(n), p2(n), p3(n), . . . , ps(n)],
where each pi(n) is defined as in (2.0.1).
Proposition 7. The absolute probability distribution p(n) at any stage n of a Markov chain is given in matrix form by
p(n) = p(0) P^n, where p(0) = p is the initial probability vector. (2.1.5)
Proof. We employ two facts:
* P(n) = P^n, and
* the absolute probability distribution p(n+1) at any stage n + 1 (associated with Xn+1) can be found from the one-step transition matrix P = [pij] and the distribution
p(n) = [p1(n), p2(n), p3(n), . . . , ps(n)]
at stage n (associated with Xn):
pj(n + 1) = Σ_{i=1}^s pij pi(n), or in matrix notation p(n+1) = p(n) P.
Then just do the induction: p(n+1) = p(n) P = p(n−1) P² = · · · = p(0) P^{n+1}.
Example 2.2 (The Coopmart chain, cont.). (a) C's share of the market in 5 years can be computed by
p(5) = [pC(5), pM(5)] = p(0) P^5.
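Part (a) is a small matrix computation. A Python sketch using plain lists and the row-vector update, in which each step applies p(n+1) = p(n) P; the helper names mat_mul and distribution_at are ours. For part (b), the standard two-state long-run share b/(a + b) = 0.15/0.27 gives a value to compare against (stationary distributions are treated later in this chapter):

```python
def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def distribution_at(p0, P, n):
    """p(n) from the initial distribution p0 by n one-step updates p <- p P."""
    p = [list(p0)]                 # 1 x s row vector
    for _ in range(n):
        p = mat_mul(p, P)
    return p[0]

P = [[0.88, 0.12],                 # from C: stay with C, switch to M
     [0.15, 0.85]]                 # from M: switch to C, stay with M
p5 = distribution_at([0.6, 0.4], P, 5)
print(p5[0])                       # C's share in 5 years, about 0.565
print(0.15 / (0.12 + 0.15))        # long-run share b/(a+b), about 0.556
```

Iterating one step at a time avoids forming P^n explicitly; for a chain with s states, each step costs O(s²).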
Practical Problem 1. A state transition diagram of a finite-state Markov chain is a line diagram with a vertex corresponding to each state and a directed line between vertices i and j if pij > 0. In such a diagram, if one can move from i to j by a path following the arrows, then i → j.
The diagram is useful for determining whether a finite-state Markov chain is irreducible or not, or for checking periodicities.
Draw the state transition diagrams and classify the states of the MCs with the following transition probability matrices:
P1 =
0   0.5 0.5
0.5 0   0.5
0.5 0.5 0

P2 =
0 0 0.5 0.5
1 0 0   0
0 1 0   0
0 1 0   0

P3 =
0.3 0.4 0 0   0.3
0   1   0 0   0
0   0   0 0.6 0.4
0   0   1 0   0
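The accessibility relation i → j used in this problem is just graph reachability on the state transition diagram, so irreducibility can be checked mechanically. A Python sketch (helper names are ours; P3 is omitted because its last row is cut off in these notes):

```python
def reachable(P, i):
    """Set of states accessible from i: follow arcs with p_jk > 0 (a
    depth-first search on the transition diagram).  Taking N = 0 steps
    makes every state accessible from itself."""
    seen, stack = {i}, [i]
    while stack:
        j = stack.pop()
        for k, p in enumerate(P[j]):
            if p > 0 and k not in seen:
                seen.add(k)
                stack.append(k)
    return seen

def is_irreducible(P):
    """A finite MC is irreducible iff all states communicate."""
    n = len(P)
    return all(reachable(P, i) == set(range(n)) for i in range(n))

P1 = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
P2 = [[0, 0, 0.5, 0.5], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(is_irreducible(P1), is_irreducible(P2))   # True True
```

Two states communicate exactly when each lies in the other's reachable set, so the same routine also yields the communicating classes.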
2.2 Classification of States
A) Accessible states.
State j is said to be accessible from state i if p(N)ij > 0 for some N ≥ 0, and we write i → j. Two states i and j accessible to each other are said to communicate, and we write i ↔ j. If all states communicate with each other, then we say that the Markov chain is irreducible. Formally, irreducibility means
for all i, j ∈ Q there exists N ≥ 0 such that p(N)ij > 0.
B) Recurrent/persistence states and Transient states.
Let A(i) be the set of states that are accessible from i. We say that i is recurrent if, from any future state, there is always some probability of returning to i and, given enough time, this is certain to happen. By repeating this argument, if a recurrent state is visited once, it will be revisited an infinite number of times.
A state is called transient if it is not recurrent. In particular, there are states j ∈ A(i) such that i is not accessible from j. After each visit to state i, there is positive probability that the state enters such a j. Given enough time, this will happen, and state i cannot be visited after that. Thus, a transient state will only be visited a finite number of times.
We now formalize the concepts of recurrent/persistent state and transient state.
Let the first return time Tj indicate the first time, or the number of steps, at which the chain is first in state j after leaving j at time 0 (if j is never reached, set Tj = ∞). It is a discrete r.v., taking values in {1, 2, 3, . . .}. For any two states i, j and n > 0, let f(n)i,j be the conditional probability that the chain is first in state j after n steps, given that it was in state i at time 0:
f(n)i,j := P[Tj = n | X0 = i] = P[Xn = j, Xk ≠ j, k = 1, 2, . . . , n − 1 | X0 = i],
and f(0)i,j = 0 since Tj ≥ 1. Then clearly
f(1)i,j = P[X1 = j | X0 = i] = pi,j.
State j is said to be transient (or nonrecurrent) if

f_{j,j} = P[T_j < ∞ | X_0 = j] < 1.
2.3 Markov Chain Decomposition
Fact 2.1. In any Markov chain, the following hold.

- It can be decomposed into one or more recurrent classes (equivalence classes), plus possibly some transient states. Each equivalence class contains those states that communicate with each other.
- A recurrent state is accessible from all states in its class, but is not accessible from recurrent states in other classes.
- A transient state is not accessible from any recurrent state. But at least one, possibly more, recurrent states are accessible from a given transient state.

For the purpose of understanding the long-term behavior of a Markov chain, it is important to analyze chains that consist of a single recurrent class. Such a Markov chain is called an irreducible chain.

For the purpose of understanding short-term behavior, it is also important to analyze the mechanism by which any particular class of recurrent states is entered starting from a given transient state.
C) Periodic states.
In a finite Markov chain M = (Q, π, P) (i.e. one having a finite number of states), a periodic state i is a state to which an agent can return only at positive integer time points t_0, 2t_0, 3t_0, ... (multiples of an integer period t_0 > 1). Here t_0 is called the period of i, being the greatest common divisor of the integers {t > 0 : p_{i,i}^{(t)} > 0}.
A Markov chain is aperiodic if there is no such periodic state; in other words, if the period of each state i ∈ Q is 1.

For example, we can check that a MC with the transition matrix

P =
0    0    0.6  0.4
0    0    0.3  0.7
0.5  0.5  0    0
0.2  0.8  0    0

is periodic. Indeed, if the Markovian random variable (agent) starts at time 0 in state E1, then at time 1 it must be in state E3 or E4, and at time 2 it must be in state E1 or E2. Therefore it generally can visit E1 only at times 2, 4, 6, ... Summarizing, we have
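The period of a state can be computed directly from powers of P as the gcd in the definition above; a sketch, using the periodic example matrix just given (the cutoff t_max is an arbitrary practical bound):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, t_max=50):
    """Period of state i: gcd of all t <= t_max with (P^t)[i, i] > 0."""
    return_times = []
    Pt = np.eye(P.shape[0])
    for t in range(1, t_max + 1):
        Pt = Pt @ P
        if Pt[i, i] > 1e-12:
            return_times.append(t)
    return reduce(gcd, return_times) if return_times else 0

P = np.array([[0.0, 0.0, 0.6, 0.4],
              [0.0, 0.0, 0.3, 0.7],
              [0.5, 0.5, 0.0, 0.0],
              [0.2, 0.8, 0.0, 0.0]])

print([period(P, i) for i in range(4)])  # [2, 2, 2, 2]: the chain is periodic
```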
Definition 9. A finite Markov chain M = (Q, π, P) is

1. irreducible iff it has only one single recurrent class, i.e. any state is accessible from all other states;
2. aperiodic iff the period of each state i ∈ Q is 1, i.e. it has no periodic state;
3. ergodic if it is positive recurrent and aperiodic.

It can be shown that recurrence, transience, and periodicity are all class properties; that is, if state i is recurrent (positive recurrent, null recurrent, transient, periodic), then all other states in the same class as state i inherit the same property.
D) Absorbing states and Absorption probabilities.
State j is said to be an absorbing state if p_{jj} = 1; that is, once state j is reached, it is never left.

- If there is a unique absorbing state k, its steady-state probability is 1 (because all other states are transient and have zero steady-state probability), and it will be reached with probability 1, starting from any initial state.
- If there are multiple absorbing states, the probability that one of them will eventually be reached is still 1, but the identity of the absorbing state to be entered is random, and the associated probabilities may depend on the starting state.

Can we determine precisely the absorption probabilities for all the absorbing states of a MC in the generic case?
Consider a Markov chain X(n) = {X_n, n ≥ 0} with finite state space E = {1, 2, ..., N} and transition probability matrix P.

Theorem 10. Let A = {1, ..., m} be the set of absorbing states and B = {m+1, ..., N} be the set of nonabsorbing states. Then the transition probability matrix P can be expressed as

P =
[ I  O ]
[ R  Q ]

where I is an m × m identity matrix, O is an m × (N - m) zero matrix, the elements of R are the one-step transition probabilities from nonabsorbing to absorbing states, and the elements of Q are the one-step transition probabilities among the nonabsorbing states.
Let U = [u_{k,j}] be an (N - m) × m matrix whose elements are the absorption probabilities for the various absorbing states,

u_{k,j} = P[X_n = j ∈ A | X_0 = k ∈ B].

We have

U = (I - Q)^{-1} R = Φ R,

where Φ = (I - Q)^{-1} is called the fundamental matrix of the Markov chain X(n).
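A numeric sketch of this recipe, with hypothetical R and Q blocks for a 4-state chain in canonical form (states 0 and 1 absorbing, 2 and 3 transient):

```python
import numpy as np

# Hypothetical canonical-form blocks: P = [[I, O], [R, Q]].
R = np.array([[0.3, 0.1],      # one-step transient -> absorbing
              [0.0, 0.2]])
Q = np.array([[0.4, 0.2],      # one-step transient -> transient
              [0.5, 0.3]])

Phi = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix (I - Q)^(-1)
U = Phi @ R                          # absorption probabilities u_{k,j}

print(U)
print(U.sum(axis=1))   # each row sums to 1: absorption is certain
```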
2.4 Limiting probabilities & Stationary distributions
From now on we assume that all MCs are finite, aperiodic and irreducible. The irreducibility assumption implies that any state can eventually be reached from any other state. Both the irreducibility and aperiodicity assumptions hold for essentially all practical applications of MCs (in bioinformatics, ...) except for the case of MCs with absorbing states.

Definition 11. A vector p = (p_1, p_2, ..., p_s) is called the stationary distribution of a Markov chain {X_n, n ≥ 0} with state transition matrix P if:

p P = p.

This equation indicates that a stationary distribution p is a left eigenvector of P with eigenvalue 1. In general, we wish to know the limiting probabilities p^(∞) obtained by taking n → ∞ in the equation

p^(n) = p^(0) P^n.
We need some general results to determine the stationary distribution p and the limiting probabilities p^(∞) of a Markov chain. For the specific class of MCs described below, a stationary distribution exists.

Lemma 12. If M = (Q, π, P) is a finite, aperiodic and irreducible Markov chain, then some power of P is strictly positive.

See a proof in [7], page 79. Such matrices P (for which there exists a natural m such that P^m > 0) are called regular matrices.
Theorem 13 (Equilibrium distribution). Given a finite, aperiodic and irreducible Markov chain M = (Q, π, P), where Q consists of s states, there exist stationary probabilities

p_i := lim_{t→∞} p_i(t),

where the p_i form a unique solution to the conditions:

Σ_{i=1}^{s} p_i = 1, where each p_i ≥ 0;
p_j = Σ_{i=1}^{s} p_i p_{i,j}.
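The conditions of the theorem form a linear system that can be solved directly. A sketch in NumPy (the 2×2 matrix below is a hypothetical example; the solver stacks the balance equations p P = p with the normalization Σ p_i = 1):

```python
import numpy as np

def stationary(P):
    """Solve p P = p with sum(p) = 1 as a least-squares linear system."""
    s = P.shape[0]
    A = np.vstack([P.T - np.eye(s), np.ones(s)])
    b = np.zeros(s + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

P = np.array([[0.75, 0.25],
              [0.50, 0.50]])
p = stationary(P)
print(p)        # approximately [2/3, 1/3]
print(p @ P)    # equals p: the distribution is stationary
```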
See the proof in Theorem 19. We discuss here two particular cases: s = 2 and s > 2.

A) Markov chains that have two states.

At first we investigate the case of Markov chains that have two states, say Q = {e_1, e_2}. Let a = p_{e_1 e_2} and b = p_{e_2 e_1} be the state transition probabilities between the distinct states; the state transition matrix of a two-state Markov chain is

P =
[ p_11  p_12 ]   [ 1 - a    a   ]
[ p_21  p_22 ] = [   b    1 - b ]

where 0 < a, b < 1.
Proposition 14.
a) The n-step transition probability matrix is given by

P(n) = P^n = 1/(a + b) *
[ b  a ]
[ b  a ]
+ (1 - a - b)^n / (a + b) *
[  a  -a ]
[ -b   b ]

b) Find the limit matrix when n → ∞.

To prove this basic Proposition 14 (computing the transition probability matrix of two-state Markov chains), we use a fundamental result of Linear Algebra that is recalled in Section 2.6.
Proof. The eigenvalues of the state transition matrix P, found by solving the equation

c(λ) = |λI - P| = 0,

are λ_1 = 1 and λ_2 = 1 - a - b. The spectral decomposition of a square matrix says that P can be decomposed into two constituent matrices E_1, E_2 (since only two eigenvalues were found):

E_1 = 1/(λ_1 - λ_2) [P - λ_2 I],   E_2 = 1/(λ_2 - λ_1) [P - λ_1 I].

That means E_1, E_2 are mutually orthogonal, i.e. E_1 E_2 = 0 = E_2 E_1, and

P = λ_1 E_1 + λ_2 E_2;   E_1^2 = E_1, E_2^2 = E_2.

Hence,

P^n = λ_1^n E_1 + λ_2^n E_2 = E_1 + (1 - a - b)^n E_2,

or
P(n) = P^n = 1/(a + b) *
[ b  a ]
[ b  a ]
+ (1 - a - b)^n / (a + b) *
[  a  -a ]
[ -b   b ]

b) The limit matrix when n → ∞ (note |1 - a - b| < 1):

lim_{n→∞} P^n = 1/(a + b) *
[ b  a ]
[ b  a ]
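The closed form of Proposition 14 can be checked numerically against direct matrix powers; a and b below are hypothetical values:

```python
import numpy as np

a, b, n = 0.3, 0.5, 7   # hypothetical transition probabilities and step count

P = np.array([[1 - a, a],
              [b, 1 - b]])

# Closed form from Proposition 14:
limit = np.array([[b, a], [b, a]]) / (a + b)
E2 = np.array([[a, -a], [-b, b]]) / (a + b)
Pn_formula = limit + (1 - a - b) ** n * E2

assert np.allclose(np.linalg.matrix_power(P, n), Pn_formula)
print(np.round(limit, 4))   # the limit matrix lim P^n
```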
B) Markov chains that have more than two states.
For s > 2 it is cumbersome to compute the constituent matrices E_i of P, so we can instead employ the so-called regularity property.

Definition 15. A Markov chain is regular if there exists m ∈ N such that

P(m) = P^m > 0

(i.e. every matrix entry is positive).

In summary, for a DTMC M that has more than two states, we have 4 cases:
Fact 2.2.

1. M is irreducible and positive recurrent, but has periodic states. The i-th component of the stationary distribution vector must be understood as the long-run proportion of time that the process is in state i.
2. M has several closed, positive recurrent classes. In this case, the transition matrix of the DTMC takes a block form. In contrast to the irreducible ergodic DTMC, where the limiting distribution is independent of the initial state, a DTMC with several closed, positive recurrent classes has a limiting distribution that depends on the initial state.

3. M has both recurrent and transient classes. In this situation, we often seek the probabilities that the chain is eventually absorbed by the different recurrent classes. See the well-known gambler's ruin problem.

4. M is an irreducible DTMC with null recurrent or transient states. This case is only possible when the state space is infinite, since any finite-state, irreducible DTMC must be positive recurrent. In this case, neither the limiting distribution nor the stationary distribution exists. A well-known example of this case is the random walk model.
Practical Problem 3. Consider a Markov chain with state space {0, 1, 2} and transition probability matrix

P =
0   0.5  0.5
1   0    0
1   0    0

Show that state 0 is periodic with period 2.
Practical Problem 4 (The Gambler's Ruin problem). Let two gamblers, A and B, initially have k dollars and m dollars, respectively. Suppose that at each round of their game, A wins one dollar from B with probability p and loses one dollar to B with probability q = 1 - p. Assume that A and B play until one of them has no money left. Let X_n be A's capital after round n, where n = 0, 1, 2, ... and X_0 = k.

(a) Show that X(n) = {X_n, n ≥ 0} is a Markov chain with absorbing states.
(b) Find its transition probability matrix P. Realize P when p = q = 1/2 and N = 4.
(c*) What is the probability of A's losing all his money?
2.5 Theory of stochastic matrix for MC
A stochastic matrix is a nonnegative matrix for which each row sum equals one. If the column sums also equal one, the matrix is called doubly stochastic. Hence the transition probability matrix P = [p_{ij}] is a stochastic matrix.

Proposition 16. Every stochastic matrix K has

- 1 as an eigenvalue (possibly with multiplicity), and
- no eigenvalue exceeding 1 in absolute value; that is, all eigenvalues λ_i satisfy |λ_i| ≤ 1.
Proof.
The spectral radius ρ(K) of any square matrix K is defined as

ρ(K) = max_i { |λ_i| : λ_i an eigenvalue of K }.

When K is stochastic, ρ(K) = 1. Note that if P is a transition matrix for a finite-state Markov chain (so P is stochastic), the multiplicity of the eigenvalue ρ(P) = 1 is equal to the number of recurrent classes associated with P.

Fact 2.3. If K is a stochastic matrix then K^m is a stochastic matrix.

Proof. Let e = [1, 1, ..., 1]^t be the all-one vector, and use the fact that Ke = e. Prove that K^m e = e.

Let A = [a_{ij}] > 0 denote that every element a_{ij} of A satisfies the condition a_{ij} > 0.
Definition 17.

- A stochastic matrix P = [p_{ij}] is ergodic if lim_{m→∞} P^m = L (say) exists, that is, each p_{ij}^{(m)} has a limit as m → ∞.
- A stochastic matrix P is regular if there exists a natural m such that P^m > 0. In our context, a Markov chain with transition probability matrix P is called regular if there exists an m > 0 such that P^m > 0, i.e. there is a finite positive integer m such that after m time-steps, every state has a nonzero chance of being occupied, no matter what the initial state.
Example 2.3. Is the matrix

P =
0.88  0.12
0.15  0.85

regular? ergodic? Calculate the limit matrix L = lim_{m→∞} P^m.
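Example 2.3 can be explored numerically: this P has all entries positive, so it is regular (hence ergodic), and a large power of P approximates the limit matrix L. A sketch:

```python
import numpy as np

P = np.array([[0.88, 0.12],
              [0.15, 0.85]])

# P^m for large m approximates L; convergence is geometric with
# rate |1 - 0.12 - 0.15| = 0.73.
L = np.linalg.matrix_power(P, 200)
print(np.round(L, 4))   # both rows approach (5/9, 4/9)
```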
(p is called a stationary distribution of the MC). Your final task is proving that L's rows are identical and equal to the stationary distribution p, i.e.: L = [p; ...; p].

Corollary 20. A few important remarks: (a) for a regular MC, the long-term behavior does not depend on the initial state distribution probabilities p(0); (b) in general, the limiting distributions are influenced by the initial distribution p(0) whenever the stochastic matrix P = [p_{ij}] is ergodic but not regular. (See more at problem D.)
Example 2.4. Consider a Markov chain with two states and transition probability matrix

P =
3/4  1/4
1/2  1/2

(a) Find the stationary distribution p of the chain. (b) Evaluate P^n. (c) Find lim_{n→∞} P^n.
2.6 Spectral Theorem for Diagonalizable Matrices

Consider a square matrix P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k} consisting of its eigenvalues. Then:

- If {(λ_1, x_1), (λ_2, x_2), ..., (λ_k, x_k)} are eigenpairs for P, then S = {x_1, ..., x_k} is a linearly independent set. If B_i is a basis for the null space N(P - λ_i I), then B = B_1 ∪ B_2 ∪ ... ∪ B_k is a linearly independent set.
- P is diagonalizable if and only if P possesses a complete set of eigenvectors (i.e. a set of s linearly independent eigenvectors). Moreover, H^{-1} P H = D = diag(λ_1, λ_2, ..., λ_s) if and only if the columns of H constitute a complete set of eigenvectors and the λ_j's are the associated eigenvalues, i.e. each (λ_j, H[:, j]) is an eigenpair for P.
Spectral Theorem for Diagonalizable Matrices. A square matrix P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k} consisting of eigenvalues is diagonalizable if and only if there exist constituent matrices {E_1, E_2, ..., E_k} (called the spectral set) such that

P = λ_1 E_1 + λ_2 E_2 + ... + λ_k E_k,   (2.6.1)

where the E_i's have the following properties:

- E_i E_j = 0 whenever i ≠ j, and E_i^2 = E_i for all i = 1..k;
- E_1 + E_2 + ... + E_k = I.
In practice we employ the decomposition (2.6.1) in two ways:

Way 1: if we know the decomposition (2.6.1) explicitly, then we can compute powers

P^m = λ_1^m E_1 + λ_2^m E_2 + ... + λ_k^m E_k, for any integer m > 0.   (2.6.2)

Way 2: if we know P is diagonalizable, then we find the constituent matrices E_i by:

* finding the nonsingular matrix H = (x_1 | x_2 | ... | x_k), where each x_i is a basis (right) eigenvector of the null space

N(P - λ_i I) = {v : (P - λ_i I)v = 0, i.e. P v = λ_i v};

** then P = H D H^{-1} = (x_1 | x_2 | ... | x_k) D H^{-1}, where D = diag(λ_1, ..., λ_k) is the diagonal matrix, and

H^{-1} = K =
[ y_1^t ]
[ y_2^t ]
[  ...  ]
[ y_k^t ]

(i.e. K^t = (y_1 | y_2 | ... | y_k)). Here each y_i is a basis (left) eigenvector, i.e. y_i^t P = λ_i y_i^t, equivalently y_i ∈ N(P^t - λ_i I).

The constituent matrices are E_i = x_i y_i^t.
Example 2.5. Diagonalize the following matrix and provide its spectral decomposition.

P =
 1   -4   -4
 8  -11   -8
-8    8    5
The characteristic equation is

p(λ) = det(P - λI) = -(λ^3 + 5λ^2 + 3λ - 9) = -(λ - 1)(λ + 3)^2 = 0.

So λ = 1 is a simple eigenvalue, and λ = -3 is repeated twice (its algebraic multiplicity is 2). Any set of vectors x satisfying

x ∈ N(P - λI) ⟺ (P - λI)x = 0

can be taken as a basis of the eigenspace (or null space) N(P - λI). Bases for the eigenspaces are:

N(P - 1·I) = span{ [1, 2, -2]^t };  N(P + 3I) = span{ [1, 1, 0]^t, [1, 0, 1]^t }.
It is easy to check that these three eigenvectors x_i form a linearly independent set, so P is diagonalizable. The nonsingular matrix (also called the similarity transformation matrix)

H = (x_1 | x_2 | x_3) =
 1  1  1
 2  1  0
-2  0  1

will diagonalize P, and since P = H D H^{-1} we have

H^{-1} P H = D = diag(λ_1, λ_2, λ_3) = diag(1, -3, -3) =
1   0   0
0  -3   0
0   0  -3

Here,

H^{-1} =
 1  -1  -1
-2   3   2
 2  -2  -1

implies that
y_1^t = [1, -1, -1], y_2^t = [-2, 3, 2], y_3^t = [2, -2, -1]. Therefore, the constituent matrices are

E_1 = x_1 y_1^t =
 1  -1  -1
 2  -2  -2
-2   2   2

E_2 = x_2 y_2^t =
-2  3  2
-2  3  2
 0  0  0

E_3 = x_3 y_3^t =
 2  -2  -1
 0   0   0
 2  -2  -1

Obviously,

P = λ_1 E_1 + λ_2 E_2 + λ_3 E_3 =
 1   -4   -4
 8  -11   -8
-8    8    5
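The worked example can be verified numerically. The matrix, the eigenvector matrix H, and the eigenvalues 1, -3, -3 are those of Example 2.5 (with minus signs restored, since the extraction dropped them):

```python
import numpy as np

P = np.array([[1, -4, -4],
              [8, -11, -8],
              [-8, 8, 5]], dtype=float)

H = np.array([[1, 1, 1],      # columns x1, x2, x3: right eigenvectors
              [2, 1, 0],
              [-2, 0, 1]], dtype=float)
D = np.diag([1.0, -3.0, -3.0])
K = np.linalg.inv(H)          # rows y1^t, y2^t, y3^t: left eigenvectors

E = [np.outer(H[:, i], K[i]) for i in range(3)]   # constituent matrices

assert np.allclose(H @ D @ K, P)                  # P = H D H^-1
assert np.allclose(E[0] - 3*E[1] - 3*E[2], P)     # P = 1*E1 - 3*E2 - 3*E3
assert np.allclose(sum(E), np.eye(3))             # E1 + E2 + E3 = I
print("spectral decomposition verified")
```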
2.7 Markov Chains with Absorbing States
2.7.1 Theory
Two questions:

1/ if there are at least two absorbing states, what is the probability that a specific absorbing state is the one eventually entered?
2/ what is the mean time until an absorbing state is eventually entered?

Question 1. The probability that a specific absorbing state is the one eventually entered.
Theorem 21. Consider a Markov chain X(n) = {X_n, n ≥ 0} with finite state space E = {1, 2, ..., N} and transition probability matrix P. Let A = {1, ..., m} be the set of absorbing states and B = {m+1, ..., N} be the set of nonabsorbing states.
We could equivalently check that absorption of X(n) in one or another of the absorbing states is certain. Formally, you could prove

Lemma 22.

lim_{n→∞} P[X_n ∈ B] = 0, or equivalently lim_{n→∞} P[X_n ∈ A] = 1.
Question 2. The mean time until an absorbing state is eventually entered.

Let T_k denote the total number of time units (or steps) to absorption from state k (meaning X_0 = k), where k = m+1, ..., N. Let

T = [T_{m+1}, T_{m+2}, ..., T_N].

Then it can be shown that the mean time E(T_k) to absorption from state k is

E(T_k) = Σ_{i=m+1}^{N} Φ[k, i],

where Φ[k, i] is the (k, i)-th element of the fundamental matrix Φ = (I - Q)^{-1}.

Proof. Let W = [n_{j,k}], where n_{j,k} is the number of times state k ∈ B is occupied until absorption takes place when X_n starts in state j ∈ B. Then

T_j = Σ_{k=m+1}^{N} n_{j,k};

then calculate E(n_{j,k}).
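As a numeric sketch of the E(T_k) formula: with a hypothetical transient-to-transient block Q, the mean absorption times are simply the row sums of the fundamental matrix Φ = (I - Q)^{-1}:

```python
import numpy as np

# Hypothetical transient block Q of a chain in canonical form.
Q = np.array([[0.4, 0.2],
              [0.5, 0.3]])

Phi = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix
mean_times = Phi.sum(axis=1)         # E(T_k) = sum over i of Phi[k, i]
print(mean_times)
```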
Example 2.6. Consider a simple random walk X(n) with absorbing barriers at state 0 and state N = 3 = m_A + m_B, as in the Gambler's Ruin problem, where m_A = 2 USD is A's capital and m_B = 1 USD is B's capital at round 0. Can you write out

a/ the transition probability matrix P, knowing that p = P[A wins] in each round, where 0 < p < 1?
2.8 Chapter Review and Discussion
Application in Large Deviation theory. We are interested in a practical situation in the insurance industry, studied from 1932 on by F. Esscher (see Notices of the AMS, Feb. 2008).

Problem: if too many claims are made against the insurance company, the total claim amount may exceed the reserve fund set aside for paying these claims.

Our aim: to compute the probability of this event.

Modeling. Each individual claim is a random variable, and we assume some distribution for it; the total claim is then the sum S of a large number of (independent or not) random variables. The probability that this sum exceeds a certain reserve amount is the tail probability of the sum S.

The Large Deviation approach pioneered by Esscher requires the calculation of moment generating functions. If your random variables are independent, then the moment generating function of the sum is the product of the individual ones; but if they are not (as in a Markov chain), then there is no longer just one moment generating function!

Research project: study Large Deviation theory to solve this problem.
Practical Problem 5 (Brand switching model for consumer behavior). Suppose there are several brands of a product competing in a market (for example, competing brands of soft drinks). Assume that every week a consumer buys one of three brands, labeled 1, 2, and 3. In each week, a consumer may either buy the same brand he bought the previous week or switch to a different brand. A consumer's preference can be influenced by many factors, such as brand loyalty and brand pressure (i.e., a consumer is persuaded to purchase the same brand). To gauge consumer behavior, sample surveys are frequently conducted. Suppose that one such survey identifies the following consumer behavior:

              Following week
Current week  Brand 1  Brand 2  Brand 3
Brand 1       0.51     0.35     0.14
Brand 2       0.12     0.80     0.08
Brand 3       0.03     0.05     0.92
The market share of a brand during a period is defined as the average proportion of people who buy the brand during the period. Our questions are:

a/ What is the market share of a specific brand in the short run (say 3 months) or in the long run (say 3 years)?
b/ How does repeat business, due to brand loyalty and brand pressure, affect a company's market share and profitability?
c/ What is the expected number of weeks that a consumer stays with a particular brand?
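A hedged computational sketch for this problem (one way to attack parts a/ and c/, not the official solution): the long-run market shares are the rows of lim P^m, and the number of consecutive weeks a consumer stays with brand i is geometric with mean 1/(1 - p_ii):

```python
import numpy as np

P = np.array([[0.51, 0.35, 0.14],
              [0.12, 0.80, 0.08],
              [0.03, 0.05, 0.92]])

# All entries are positive, so P is regular: a large power of P has
# identical rows equal to the long-run market shares.
shares = np.linalg.matrix_power(P, 1000)[0]
print(np.round(shares, 4))

# Expected consecutive weeks with brand i: geometric holding time.
print(np.round(1.0 / (1.0 - np.diag(P)), 2))
```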
Chapter 3
Random walks & Wiener process
Random walks are special cases of Markov chains, and thus can be studied by Markov chain methods.
3.1 Introduction to Random Walks
We use random walks to supply the mathematical basis for BLAST. BLAST is a procedure often employed in Biomatics that

- searches for high-scoring local alignments between two sequences, and
- then tests for significance of the scores found via P-values.
Example3.1. Consider a simple case of the two aligned DNA sequences
ggagactgtagacagctaatgctata
gaacgccctagccacgagcccttatc
Suppose we give
- a score +1 if the two nucleotides in corresponding positions are the same, and
- a score -1 if they are different.

When we compare the two sequences from left to right, the accumulated score performs a random walk; more precisely, a simple random walk in one dimension. The following theory treats the generic case, but we will use this example and BLAST as a running example.
3.2 Random Walk - a mathematical realization

Let Z_1, Z_2, ... be independent identically distributed r.v.s with

P(Z_n = 1) = p and P(Z_n = -1) = q = 1 - p

for all n. Let

X_n = Σ_{i=1}^{n} Z_i, n = 1, 2, ..., and X_0 = 0.

The collection of r.v.s {X_n, n ≥ 0} is a random process, called the simple random walk in one dimension.

(a) Describe the simple random walk X(n).
(b) Construct a typical sample sequence (or realization) of X(n).
(c) Find the probability that X(n) = -2 after four steps.
(d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = -2 after four steps.
(e) Find the mean and variance of the simple random walk X(n). Find the autocorrelation function R_X(n, m) of the simple random walk X(n).
(f) Show that the simple random walk X(n) is a Markov chain.
(g) Find its one-step transition probabilities.
(h) Derive the first-order probability distribution of the random walk X(n).
Solution.

(a) Describe the simple random walk. X(n) is a discrete-parameter (or discrete-time), discrete-state random process. The state space is E = {..., -2, -1, 0, 1, 2, ...}, and the index parameter set is T = {0, 1, 2, ...}.

(b) Typical sample sequence. A sample sequence x(n) of a simple random walk X(n) can be produced by tossing a coin every second and letting x(n) increase by unity if a head H appears and decrease by unity if a tail T appears. Thus, for instance, we have a small realization of X(n) in Table 3.1.

The sample sequence x(n) obtained above is plotted in the (n, x(n))-plane. The simple random walk X(n) specified in this problem is said to be unrestricted because there are no bounds on the possible values of X_n. The simple random walk process is often used in Game Theory or Biomatics.
n            | 0  1  2  3  4  5  6  7  8  9  10
Coin tossing |    H  T  T  H  H  H  T  H  H  T
x_n          | 0  1  0 -1  0  1  2  1  2  3  2

Table 3.1: Simple random walk from coin tossing
Remark 3.1. We define the ladder points to be the points in the walk lower than any previously reached point. An excursion in a walk is the part of the walk from a ladder point to the highest point attained before the next ladder point.

BLAST theory focuses on the maximum heights achieved by these excursions.
(c) The probability that X(n) = -2 after four steps.
We compute the first-order probability distribution of the random walk X(n):

p_n(k) = P(X_n = k), with boundary conditions p_0(0) = 1, and p_n(k) = 0 if n < |k|.
Let A denote the number of +1 steps among the first n. When X(n) = k, we see that

A = (n + k)/2,

which is a binomial r.v. with parameters (n, p). Conclude: the probability distribution of X(n) is given by

p_n(k) = P(X_n = k) = C(n, (n+k)/2) p^{(n+k)/2} q^{(n-k)/2},   (3.2.1)

in which n ≥ |k|, and n, k must be both even or both odd.

Set k = -2 and n = 4 in (3.2.1) to get the desired probability p_4(-2) = C(4, 1) p q^3 = 4pq^3 that X(4) = -2.

(d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = -2 after four steps. DIY!

(e) The mean and variance of the simple random walk X(n). Use the fact that

P(Z_n = +1) = p and P(Z_n = -1) = 1 - p.
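For part (e), E(X_n) = n(2p - 1) and Var(X_n) = 4np(1 - p); a quick Monte Carlo sketch checks these (the sample size and seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, trials = 0.5, 100, 20000

# Each row is one realization of n independent +1/-1 steps.
steps = rng.choice([1, -1], size=(trials, n), p=[p, 1 - p])
X_n = steps.sum(axis=1)

print(X_n.mean())   # close to n(2p - 1) = 0
print(X_n.var())    # close to 4np(1 - p) = 100
```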
3.3 Wiener process
Counting process. A random process {X(t), t ≥ 0} is said to be a counting process if X(t) represents the total number of events that have occurred in the interval (0, t). From this definition, we see that a counting process X(t) must satisfy the following conditions:

- X(t) ≥ 0 and X(0) = 0.
- X(t) is integer valued.
- X(s) ≤ X(t) if s < t.
- X(t) - X(s) equals the number of events that have occurred in the interval (s, t).
Independent increments and stationary increments. A counting process X(t) is said to possess independent increments if the numbers of events which occur in disjoint time intervals are independent.

A counting process X(t) is said to possess stationary increments if X(t + h) - X(s + h) (the number of events in the interval (s + h, t + h)) has the same distribution as X(t) - X(s) (the number of events in the interval (s, t)), for all s < t and h > 0.
Wiener process. A random process {X(t), t ≥ 0} is called a Wiener process if

1. X(t) has stationary independent increments;
2. the increment X(t) - X(s) (t > s) is normally distributed;
3. E[X(t)] = 0; and
4. X(0) = 0.

The Wiener process is also known as the Brownian motion process, since it originates as a model for Brownian motion, the motion of particles suspended in a fluid.

Definition 23. A random process {X(t), t ≥ 0} is called a Wiener process with drift coefficient μ if

1. X(t) has stationary independent increments;
2. X(t) is normally distributed with mean E[X(t)] = μt; and
3. X(0) = 0.
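A Wiener-process path can be simulated directly from its independent normal increments (here with variance parameter σ² = 1 per unit time, a modeling choice not fixed by the definition above):

```python
import numpy as np

rng = np.random.default_rng(1)
T, steps, sigma2 = 1.0, 1000, 1.0
dt = T / steps

# Stationary independent increments: X(t + dt) - X(t) ~ N(0, sigma2 * dt).
increments = rng.normal(0.0, np.sqrt(sigma2 * dt), size=steps)
X = np.concatenate([[0.0], np.cumsum(increments)])   # X(0) = 0

print(X[-1])   # one sample of X(T), distributed N(0, sigma2 * T)
```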
Chapter 4
Arrival-Type processes
4.1 Introduction
In stochastic processes, we are interested in a few distinct properties:

(a) the dependencies in the sequence of values generated by the process. For example, how do future prices of a stock depend on past values?
(b) long-term averages, involving the entire sequence of generated values. For example, what is the fraction of time that a machine is idle?
(c) the likelihood or frequency of certain boundary events. For example, what is the probability that within a given hour all circuits of some telephone system become simultaneously busy?
In this chapter, we will discuss the first major category of stochastic processes, Arrival-Type Processes. We are interested in occurrences that have the character of an arrival, such as
- message receptions at a receiver,
- job completions in a manufacturing cell,
- customer purchases at a store, etc.

We will focus on models in which the interarrival times (the times between successive arrivals) are independent random variables.

First, we consider the case where arrivals occur in discrete time and the interarrival times are geometrically distributed: this is the Bernoulli process. Then we consider the case where arrivals occur in continuous time and the interarrival times are exponentially distributed: this is the Poisson process.
4.2 The Bernoulli process
4.2.1 Basic facts
The Bernoulli process can be visualized as a sequence of independent coin tosses, where the probability of heads in each toss is a fixed number p in the range 0 < p < 1. In general, the Bernoulli process consists of a sequence of Bernoulli trials, where each trial produces
- a 1 (a success) with probability p, and
- a 0 (a failure) with probability 1 - p, independently of what happens in other trials.
There are many realizations of the Bernoulli process. Coin tossing is just a paradigm involving a sequence of independent binary outcomes. The sequence Z_1, Z_2, ... of independent identically distributed r.v.s in Chapter 3 is another paradigm for the same phenomenon.

In practice, a Bernoulli process is often used to model systems involving arrivals of customers or jobs at service centers. Here, time is discretized into periods, and a success at the k-th trial is associated with the arrival of at least one customer at the service center during the k-th period. In fact, we will often use the term arrival in place of success when this is justified by the context.

Given an arrival process, one is often interested in random variables such as the number of arrivals within a certain time period, or the time until the first arrival. For the case of a Bernoulli process, some answers are already available from earlier chapters. Here is a summary of the main facts.

The Bernoulli distribution B(p) describes a random variable that can take only two possible values, i.e. X ∈ {0, 1}. The distribution is described by a probability function

p(1) = P(X = 1) = p,  p(0) = P(X = 0) = 1 - p, for some p ∈ [0, 1].

It is easy to check that E(X) = p, Var(X) = p(1 - p).
4.2.2 Random Variables Associated with the
Bernoulli Process
The Binomial distribution B(n, p) describes a random variable X that is the number of successes in n independent Bernoulli trials with probability of success p.

In other words, X is a sum of n independent Bernoulli r.v.s. Therefore, X takes values in {0, 1, ..., n} and the distribution is given by the probability function

p(k) = P(X = k) = C(n, k) p^k (1 - p)^{n-k}.

It is easy to check that E(X) = np, Var(X) = np(1 - p).
4.3 The Poisson process
4.3.1 Poisson distribution
It is another discrete probability distribution, used to determine the probability of a designated number of successes per unit of time, when the successes/events are independent and the average number of successes per unit of time remains constant. The Poisson distribution is

p(x) = e^{-λ} λ^x / x!,  x = 0, 1, 2, ...   (4.3.1)

where x is the designated number of successes, e ≈ 2.718 is the natural base,
and λ > 0 is a constant: the average number of successes per unit of time period. The Poisson distribution's mean and variance are

μ = λ;  σ² = λ.
Example 4.1 (Poisson distribution usage). We often model the number of defects or non-conformities that occur in a unit of product (unit area, volume, and most frequently unit of time ...), say a semiconductor device, by a Poisson distribution. Suppose the number of wire-bonding defects per unit X is Poisson distributed with parameter λ = 4. Compute the probability that a randomly selected semiconductor device will contain two or fewer wire-bonding defects.

This probability is

P(X ≤ 2) = p(0) + p(1) + p(2) = Σ_{x=0}^{2} e^{-4} 4^x / x! = 0.2381.
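The computation in Example 4.1 can be reproduced directly from (4.3.1):

```python
import math

lam = 4.0   # mean number of wire-bonding defects per unit

def poisson_pmf(x, lam):
    """Poisson probability p(x) = e^(-lam) lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))
print(round(p_at_most_2, 4))   # 0.2381
```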
4.3.2 Poisson process
The Poisson process can be viewed as a continuous-time analog of the Bernoulli process and applies to situations where there is no natural way of dividing time into discrete periods. We consider an arrival process that evolves in continuous time, in the sense that any real number t is a possible arrival time.

Definition 24. A counting process X(t) is said to be a Poisson (counting) process with positive rate (or intensity) λ if

- X(0) = 0, and X(t) has independent increments;
The number of events in any interval of length t is Poisson dis-
tributed with mean λt; that is, for all s, t > 0,
P[X(t + s) − X(s) = n] = e^(−λt) (λt)^n / n!, n = 0, 1, 2, . . . (4.3.2)
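An equivalent characterization builds the arrivals from i.i.d. Exponential(λ) interarrival gaps. The simulation sketch below (the rate, horizon, and trial count are illustrative choices) counts arrivals in [0, t] this way and compares the empirical mean count with λt:

```python
import random

def poisson_count(lam, t, rng):
    """Number of arrivals in [0, t] when interarrival times are Exp(lam)."""
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n

rng = random.Random(0)
lam, t, trials = 2.0, 3.0, 20000          # illustrative parameters
counts = [poisson_count(lam, t, rng) for _ in range(trials)]
print(sum(counts) / trials)               # close to lam * t = 6
```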
4.4 Course Review and Discussion
Practical Problem 6.
1. Prove that a Poisson process X(t) with positive rate λ has stationary increments, and
E[X(t)] = λt, Var[X(t)] = λt.
2. Practice. Patients arrive at the doctor's office according to a Pois-
son process with rate λ = 1/10 per minute. The doctor will not see a
patient until at least three patients are in the waiting room.
a/ Find the expected waiting time until the first patient is admitted
to see the doctor.
b/ What is the probability that nobody is admitted to see the doctor
in the first hour?
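One way to check answers to this exercise is by Monte Carlo simulation; the sketch below (helper names and trial counts are ours) uses the rate λ = 1/10 per minute and the three-patient threshold:

```python
import random

rng = random.Random(1)
lam = 1 / 10            # arrivals per minute
trials = 50000

# (a) waiting time until the third arrival: sum of three Exp(lam) gaps
third = [sum(rng.expovariate(lam) for _ in range(3)) for _ in range(trials)]
print(sum(third) / trials)          # close to 3 / lam = 30 minutes

# (b) probability that fewer than three patients arrive in the first hour
def arrivals_in(t):
    n, clock = 0, rng.expovariate(lam)
    while clock <= t:
        n += 1
        clock += rng.expovariate(lam)
    return n

hits = sum(arrivals_in(60) < 3 for _ in range(trials))
print(hits / trials)   # close to P(N(60) <= 2) = e^(-6)(1 + 6 + 18)
```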
Theorem 25. If every eigenvalue of a matrix P yields linearly indepen-
dent left eigenvectors in number equal to its multiplicity, then
1. there exists a nonsingular matrix M whose rows are left eigenvec-
tors of P, such that
2. D = M P M^(−1) is a diagonal matrix whose diagonal elements are the
eigenvalues of P, repeated according to multiplicity.
Practical Problem 7 (MC for Business Intelligence). Consider a case
study of the mobile phone industry in VN. According to a recent survey,
there are four big mobile producers/sellers N, S, M and L, and their
market distribution in 2007 is given by the stochastic matrix:
P =
       N     M     L     S
  N    1     0     0     0
  M    0.4   0     0.6   0
  L    0.2   0     0.1   0.7
  S    0     0     0     1
Is P regular? Ergodic?
Find the long-term distribution matrix L = lim_{m→∞} P^m.
What is your conclusion?
(Remark that the states N and S are called absorbing states.)
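The limit can be checked numerically by repeated squaring of P; a pure-Python sketch (the code and its layout are ours, and the limits it prints can be confirmed by first-step analysis of the two absorbing states):

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# rows/columns ordered N, M, L, S as in the statement
P = [[1.0, 0.0, 0.0, 0.0],
     [0.4, 0.0, 0.6, 0.0],
     [0.2, 0.0, 0.1, 0.7],
     [0.0, 0.0, 0.0, 1.0]]

Pm = P
for _ in range(7):          # P^(2^7) = P^128, far past convergence
    Pm = matmul(Pm, Pm)

print([round(x, 4) for x in Pm[1]])  # row M -> [0.5333, 0.0, 0.0, 0.4667]
print([round(x, 4) for x in Pm[2]])  # row L -> [0.2222, 0.0, 0.0, 0.7778]
```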
Chapter 5
Probability Modeling and
Mathematical Finance
Probability modeling in finance provides instruments to rationalize the
unknown by embedding it into a coherent framework. Three key com-
ponents should be distinguished: randomness, uncertainty and chaos.
Kolmogorov defined randomness in terms of non-uniqueness and non-
regularity (as with a die with six faces, or the expansion of π). Kalman defined
chaos as randomness without probability.
A few areas that employ much probability modeling include weather fore-
casting, biology and financial forecasting. In general, in order to model
uncertainty we seek to distinguish the known from the unknown and find
some mechanisms (theories, intuition, common sense...) to reconcile our
knowledge with our lack of it.
5.1 Martingales
5.1.1 History
Girolamo Cardano, in his book The Book on Games of Chance (1565),
proposed the notion of a fair game. He stated: "The most fundamental
principle of all in gambling is simply equal conditions, ...". This is the
essence of the martingale; however, it was not until 1900, in Bachelier's thesis, that
a mathematical model of a fair game, or martingale, was proposed.
Nowadays, we understand that the concept of a fair game, or martingale, in
money terms, states that the expected profit at a given time, given the
total past capital, is null with probability one.
Throughout this chapter we assume that (Ω, F, P) is a fixed probability
space, where
Ω is a sample space representing the set of all possible outcomes,
F is a σ-algebra of subsets of Ω representing the events to which
we can assign probabilities, and
P is a probability measure on (Ω, F).
The expectation with respect to P will be denoted by E[·].
5.1.2 Conditional expectation
Let X and Z be two r.v.s on the same (Ω, F, P)-space. Suppose X has
range {x1, x2, . . . , xm} and Z has range {z1, z2, . . . , zn}. We know that
P[X = xi | Z = zj] := P[X = xi, Z = zj] / P[Z = zj]
and also
E[X | Z = zj] = Σ_i xi P[X = xi | Z = zj].
Definition 26. The random variable Y = E[X|Z], the conditional ex-
pectation of X given Z, is defined as follows:
(a) if Z(ω) = zj, then Y(ω) := E[X | Z = zj] =: yj (say).
Justification. In this way we partition the sample space Ω into the
Z-atoms {Z = zj}, on which Z is constant. The σ-algebra G = σ(Z)
generated by Z consists of the sets {Z ∈ B}, B ∈ B, the Borel sets. Therefore
G = σ(Z) consists precisely of the 2^n possible unions of the n Z-atoms.
Note from (a) that Y is constant on Z-atoms, so better we say:
(b) Y is G-measurable.
Theorem 27 (Kolmogorov 1933). Let (Ω, F, P) be a probability space
and X a random variable with E[|X|] < ∞. Let G be a sub-σ-algebra of
F. Then there exists a random variable Y such that
a) Y is G-measurable,
b) E[|Y|] < ∞,
c) for every G ∈ G we have
∫_G Y dP = ∫_G X dP.
Moreover, if Y1 is another random variable with these properties then
Y1 = Y almost surely (a.s.), that is P[Y1 = Y] = 1.
A random variable Y with properties a)-c) is called a version of the
conditional expectation E[X|G] of X given G, and we write Y = E[X|G]
a.s.
Proof. Since G is generated by Z, any G ∈ G is a union of the n
Z-atoms, so we first prove that
∫_{Z=zj} Y dP = yj P[Z = zj] = . . . = ∫_{Z=zj} X dP.
Writing Gj = {Z = zj}, this equation means E[Y I_{Gj}] = E[X I_{Gj}] . . .
Note 5.1. We often write
E[X|Z] for E[X|G] = E[X|σ(Z)]; and
E[X|Z1, Z2, . . .] for E[X|σ(Z1, Z2, . . .)].
Fact 5.2. If U is a non-negative bounded r.v., then
E[U|G] ≥ 0, a.s.
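The atom-by-atom construction in Definition 26 is easy to make concrete on a finite sample space. In the sketch below the probability space and the two variables are invented for illustration; it builds Y = E[X|Z] as a variable constant on each Z-atom and checks the defining property c) on every atom G = {Z = z}:

```python
from collections import defaultdict

# a toy finite probability space: outcome -> probability (illustrative)
prob = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
X = {0: 5.0, 1: 1.0, 2: 2.0, 3: 2.0}   # a random variable
Z = {0: 'a', 1: 'a', 2: 'b', 3: 'b'}   # Z is constant on its atoms

# group outcomes into Z-atoms and compute E[X | Z = z] on each
atoms = defaultdict(list)
for w, z in Z.items():
    atoms[z].append(w)
cond = {z: sum(X[w] * prob[w] for w in ws) / sum(prob[w] for w in ws)
        for z, ws in atoms.items()}

Y = {w: cond[Z[w]] for w in prob}       # Y = E[X|Z], constant on atoms

# check the defining property: integrals of Y and X agree on each atom
for z, ws in atoms.items():
    lhs = sum(Y[w] * prob[w] for w in ws)
    rhs = sum(X[w] * prob[w] for w in ws)
    assert abs(lhs - rhs) < 1e-12
print(cond)
```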
5.1.3 Key properties of Conditional expectation
See textbook.
5.1.4 Filtration
A filtration is a family {Ft, t = 0, 1, . . . , T} of sub-σ-algebras indexed by
t = 0, 1, . . . , T such that
F0 ⊆ F1 ⊆ F2 ⊆ . . . ⊆ FT;
that is, the family is increasing with time. Intuitively, for each t =
0, 1, . . . , T, the σ-algebra Ft tells us which events may be observed by
time t.
If the sample space Ω is a finite set, often the σ-algebra F0 is trivial,
consisting simply of the empty set ∅ and the whole sample space Ω. We
also often write just {Ft} instead of the lengthy {Ft, t = 0, 1, . . . , T},
and can assume that FT = F (since we shall be considering only random
variables that are FT-measurable).
Definition 28. We call the quadruple (Ω, F, {Ft}, P) a filtered probabil-
ity space.
We fix a filtered probability space (Ω, F, {Ft}, P) from now on. Given
d ∈ N, a d-dimensional stochastic process with time index set {0, 1, . . . , T},
defined on the provided filtered probability space, is a collection
X = {Xt, t = 0, 1, . . . , T}
where each Xt is a d-dimensional random vector, i.e. a function
Xt : Ω → R^d such that
Xt^(−1)(B) ≡ {ω ∈ Ω : Xt(ω) ∈ B} ∈ F
for each Borel subset B of R^d.
The process X = {Xt, t ≥ 0} is called adapted (to the filtration
{Ft}) if for each t, Xt is Ft-measurable, i.e.
if Xt^(−1)(B) ∈ Ft for each Borel set B of R^d and for each t = 0, 1, . . . , T.
We often write Xt ∈ Ft as shorthand for Xt^(−1)(B) ∈ Ft for all Borel sets
B in R^d.
Two d-dimensional stochastic processes Y = {Yt} and Z = {Zt}
are modifications of one another if P(Yt = Zt) = 1 for each t =
0, 1, . . . , T.
5.1.5 Martingale
A collection/process M = {Mt, Ft, t = 0, 1, . . . , T}, where each Mt is a
real-valued random variable, is called a martingale if the following three
conditions hold:
1. E[|Mt|] < ∞ for t = 0, 1, . . . , T,
2. Mt is Ft-measurable for t = 0, 1, . . . , T [i.e. the process M is
adapted],
3. the conditional expectation
E[Mt | Ft−1] = Mt−1 for t = 1, . . . , T.
In our discrete-time setting, condition 3. can be equivalently re-
placed by
3'. E[Mt | Fs] = Ms for all s < t in {0, 1, . . . , T}.
We call M a sub-martingale if the "=" in condition 3. or 3'. is replaced
by "≥"; we call M a super-martingale if the "=" in condition 3. or 3'. is
replaced by "≤".
When describing (sub/super)martingales we will sometimes omit the fil-
tration Ft from the notation for M when it is understood.
Interpretation of Martingale in Finance
The martingale is considered to be a necessary condition for an efficient
asset market, one in which the information contained in past prices is
instantly, fully and perpetually reflected in the asset's current price. We
identify
M = {Mt = pt, the asset's price at t},
and denote by Φt = {p0, p1, . . . , pt} the filtration for the asset price history
at time t = 0, 1, 2, . . ., expressing the relevant information we have at this
time regarding the time series. Then we could think that in a martingale
process each process event (such as a new price)
is independent and can be summed (or is integrable); and
has the property that its conditional expectation remains the same
(i.e. is time-invariant).
Hence, M = {Mt = pt} is a martingale iff the expected next-period price
is equal to the current price:
E[pt+1 | p0, p1, . . . , pt] = pt, or equivalently E[pt+1 | Φt] = pt for any time t.
If instead asset prices decrease (or increase) in expectation over time, we
have a super-martingale (sub-martingale):
E[pt+1 | Φt] ≤ pt (respectively, E[pt+1 | Φt] ≥ pt).
Observation 1. Martingales may also be defined with respect to other
processes.
If, for example, P = {pt, t ≥ 0} is a price process and Y = {yt, t ≥ 0}
is an interest rate process, we can say that P is a martingale with respect
to Y if
E[|pt|] < ∞, and E[pt+1 | y0, y1, . . . , yt] = pt, for all t.
Fact 5.3. By induction, a martingale implies an invariant mean:
E[pt+1] = E[pt] = · · · = E[p0].
5.1.6 Martingale examples
Example 5.1. Sum of independent zero-mean r.v.s. Let X1, X2, . . . be
a sequence of independent r.v.s with E[|Xn|] < ∞ for all n and E[Xn] = 0.
Define S0 = 0, F0 = {∅, Ω} and
Sn := X1 + X2 + X3 + · · · + Xn,
Fn := σ(X1, X2, X3, . . . , Xn).
Then you can prove for n ≥ 1 that
E[Sn | Fn−1] = Sn−1 a.s.
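One observable consequence of this martingale property is the invariant mean of Fact 5.3: E[Sn] = E[S0] = 0 for every n. A simulation sketch (the choice of Xi uniform on {−1, +1} and the sample sizes are illustrative):

```python
import random

rng = random.Random(2)
steps, trials = 50, 20000

# X_i uniform on {-1, +1}: independent with zero mean (illustrative choice)
finals = []
for _ in range(trials):
    s = 0
    for _ in range(steps):
        s += rng.choice((-1, 1))
    finals.append(s)

print(sum(finals) / trials)   # close to E[S_50] = E[S_0] = 0
```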
Example 5.2. Geometric random walks and a specific case.
The essential idea underlying the random walk for real processes is the
assumption of mutually independent increments of the same order of magni-
tude for each point in time. However, economic time series in particular
do not satisfy the latter assumption. Seasonal fluctuations of monthly
sales figures, for example, are in absolute terms significantly greater if the
yearly average sales figure is high. By contrast, the relative or percent-
age changes are stable over time and do not depend on the current
level of Xt.
Analogously to the random walk Xt = Σ_{i=0}^{t} Zi with i.i.d. absolute
increments Zt = Xt − Xt−1, a geometric random walk {Xt; t ≥ 0} is
assumed to have i.i.d. relative increments
Rt = Xt / Xt−1 for t = 1, 2, . . .
For a specific case, the geometric binomial random walk
Xt = Rt Xt−1 = X0 ∏_{k=1}^{t} Rk,
where X0, R1, R2, . . . are mutually independent, each Rk is Bernoulli, and
for u > 1 (up), d
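A short simulation of such a geometric binomial random walk; the factors u = 1.05, d = 0.95 and the up-probability p = 0.5 below are illustrative choices of ours, not values from the text:

```python
import random

def geometric_binomial_walk(x0, u, d, p, steps, rng):
    """X_t = X_0 times a product of t i.i.d. factors, each u w.p. p else d."""
    path = [x0]
    for _ in range(steps):
        r = u if rng.random() < p else d
        path.append(path[-1] * r)
    return path

rng = random.Random(3)
path = geometric_binomial_walk(100.0, 1.05, 0.95, 0.5, 10, rng)
print([round(x, 2) for x in path])
```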
Example 5.4. Product of non-negative independent r.v.s of mean 1.
Let X1, X2, . . . be a sequence of independent non-negative r.v.s with
E[Xn] = 1 for all n. Define M0 = 1, F0 = {∅, Ω} and
Mn := X1 X2 X3 . . . Xn, Fn := σ(X1, X2, X3, . . . , Xn).
The process M is a martingale. (Why?)
5.1.7 Stopping time
Definition 30. A (discrete) stopping time is a function
τ : Ω → {0, 1, . . . , T} ∪ {∞}
such that
{τ = t} ∈ Ft for t = 0, 1, . . . , T. (*)
Obviously for such a stopping time we see:
{τ = ∞} = Ω \ (∪_{t=0}^{T} {τ = t}) ∈ FT.
For convenience we define F∞ = FT, and then (*) also holds with
t = ∞.
Justification. Intuitively, τ is a time when you can decide to stop
playing our game. Whether or not you stop immediately after the n-
th game depends only on the history up to (and including) time n:
{τ = n} = {ω : τ(ω) = n} ∈ Fn.
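A standard concrete instance is a first-passage (hitting) time: whether {τ = n} has occurred is decided by the path up to time n alone. The sketch below (the random walk and the level b = 5 are illustrative) computes τ = min{n : S_n = b} for one simulated path:

```python
import random

def hitting_time(path, b):
    """First n with path[n] == b, or None if the level is never reached --
    a decision that uses only the history up to each time n."""
    for n, s in enumerate(path):
        if s == b:
            return n
    return None

rng = random.Random(4)
path = [0]
for _ in range(200):
    path.append(path[-1] + rng.choice((-1, 1)))

tau = hitting_time(path, 5)
print(tau)   # first time the walk reaches level 5 (None if not in 200 steps)
```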
Fact 5.4. With any (discrete) stopping time τ, there is a σ-algebra de-
fined by
Fτ = {A ∈ F : A ∩ {τ = t} ∈ Ft for t = 0, 1, . . . , T}.
Lemma 31. If σ and τ are two stopping times, then
σ ∧ τ = min(σ, τ) and σ ∨ τ = max(σ, τ)
both also are stopping times.
5.2 Stochastic Calculus
Our basic assumption is that we do not know and cannot predict tomorrow's
values of asset prices. The past history of the asset value is there as a
financial time series for us to examine as much as we want, but we can
n