
    Stat333 Lecture Notes

    Applied Probability Theory

    Jiahua Chen

    Department of Statistics and Actuarial Science

    University of Waterloo

© Jiahua Chen

    Fall, 2003


    Course Outline

    Stat333

Review of basic probability. Generating functions and their applications. Simple random walk, branching process and renewal events. Discrete time Markov chain. Poisson process and continuous time Markov chain. Queueing theory and renewal processes.


Contents

1 Introduction
1.1 Probability Model
1.2 Conditional Probabilities and Independence
1.3 Bayes Formula
1.4 Key Facts
1.5 Problems

2 Random Variables
2.1 Random Variable
2.2 Discrete Random Variables
2.3 Continuous Random Variables
2.4 Expectations
2.5 Joint Distribution
2.6 Independence
2.7 Formulas for Expectations
2.8 Key Results and Concepts
2.9 Problems

3 Conditional Expectation
3.1 Introduction
3.2 Formulas
3.3 Comment
3.4 Problems

4 Generating Functions
4.1 Introduction
4.2 Probability Generating Functions
4.3 Convolution
4.3.1 Key Facts
4.4 The Simple Random Walk
4.4.1 First Passage Times
4.4.2 Returns to Origin
4.4.3 Some Key Results in the Simple Random Walk
4.5 The Branching Process
4.5.1 Mean and Variance of Z_n
4.5.2 Probability of Extinction
4.5.3 Some Key Results in the Branching Process
4.6 Problems

5 Renewal Events
5.1 Introduction
5.2 The Renewal and Lifetime Sequences
5.3 Some Properties
5.4 Delayed Renewal Events
5.5 Summary
5.6 Problems

6 Discrete Time MC
6.1 Introduction
6.2 Chapman-Kolmogorov Equations
6.3 Classification of States
6.4 Limiting Probabilities
6.5 Mean Time Spent in Transient States
6.6 Problems

7 Exponential and Poisson
7.1 Definition and Some Properties
7.2 Properties of Exponential Distribution
7.3 The Poisson Process
7.3.1 Inter-arrival and Waiting Time Distributions
7.4 Further Properties
7.5 Conditional Distribution of the Arrival Times
7.6 Problems

8 Continuous Time Markov Chain
8.1 Birth and Death Process
8.2 Kolmogorov Differential Equations
8.3 Limiting Probabilities
8.4 Problems

9 Queueing Theory
9.1 Cost Equations
9.2 Steady-State Probabilities
9.3 Exponential Model
9.4 Single Server
9.5 Network of Queues
9.5.1 Open System
9.5.2 Closed Systems
9.6 Problems

10 Renewal Process
10.1 Distribution of N(t)
10.2 Limiting Theorems and Their Applications
10.3 Problems

11 Sample Exam Papers
11.1 Quiz 1: Winter 2003
11.2 Quiz 2: Winter 2003
11.3 Final Exam: Winter 2003


    Chapter 1

    Introduction

    1.1 Probability Model

A probability model consists of three parts: a sample space, a collection of events, and a probability measure.

Assume an experiment is to be done. The set of all possible outcomes is called the Sample Space. For example, if we roll a die, $\{1, 2, 3, 4, 5, 6\}$ is the sample space. We use the notation S for the sample space. Every element of S is called a sample point. Mathematically, the sample space is merely an arbitrary set; there is no need for a corresponding experiment.

Roughly speaking, every subset of S is an event. The collection of events is then all possible subsets of S. In some cases, however, we only admit a specific class of subsets of S as events. We do not discuss this point in this course.

    For every event, we assign a probability to it. To make it meaningful,

    we have to maintain some internal consistency. That is, the assignment is

    required to have some properties. The following conditions are placed on

    assigning probabilities.

Axioms of Probability Measure

A probability measure P is a function of events such that:

1. $0 \le P(E) \le 1$ for any event E;


    2. P(S) = 1;

3. $P(\cup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} P(E_i)$ for any mutually exclusive events $E_i$, $i = 1, 2, \ldots$; i.e., $E_i E_j = \emptyset$ for all $i \neq j$.

Mathematically, the above definition does not depend on the hypothetical experiment. A probability model consists of a sample space S, a $\sigma$-algebra B (a collection of subsets of S with some properties), and a probability measure P.

The axioms for a probability model imply that the probability measure has many other properties not explicitly stated as axioms. For example, since $P(\emptyset) = P(\emptyset \cup \emptyset) = P(\emptyset) + P(\emptyset)$, we must have $P(\emptyset) = 0$.

Let $E^c$ be the complement of the event E, which consists of all sample points that do not belong to E. Axioms 2 and 3 imply that

$1 = P(S) = P(E \cup E^c) = P(E) + P(E^c).$

Hence, $P(E^c) = 1 - P(E)$. For any two events $E_1$ and $E_2$, we have

$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 E_2).$

In general,

$P(\cup_{i=1}^{n} E_i) = \sum_i P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n).$

Example 1.1

(The matching problem.) Suppose n letters are placed at random into n envelopes, one letter per envelope. What is the probability that no letter is placed into its own envelope? Let $A_i$ be the event that the ith letter is placed into its own envelope.


Using the classical definition of the probability measure (which satisfies the three Axioms), $P(A_1) = \frac{(n-1)!}{n!}$, $P(A_1 A_2) = \frac{(n-2)!}{n!}$, and so on. We get

$P(\cup_i A_i) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}.$

The answer to the question is then

$1 - P(\cup_i A_i) = 1 - \left[1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}\right].$

The limit as $n \to \infty$ is then $\exp(-1)$.
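This limit is easy to check numerically. Below is a minimal Python sketch (added for illustration; the function names are my own) that estimates the no-match probability by simulation and compares it with the inclusion-exclusion value above and with $\exp(-1)$:

```python
import math
import random

def no_match_probability(n, trials=100_000):
    """Estimate P(no letter ends up in its own envelope) by shuffling."""
    count = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        if all(perm[i] != i for i in range(n)):
            count += 1
    return count / trials

def no_match_exact(n):
    """Exact value from the inclusion-exclusion formula: sum (-1)^k / k!."""
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

print(no_match_probability(10))   # close to 0.3679
print(no_match_exact(10))         # 0.3678794...
print(math.exp(-1))               # 0.3678794...
```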

    1.2 Conditional Probabilities and Independence

Two events A and B are independent if and only if

$P(AB) = P(A)P(B).$

Some people may have a probabilistic instinct for why this relationship describes independence, and why our notion of independence implies this relationship. However, once the notion of independence is defined as above, this relationship serves as our gold standard. We always try to verify it, whether we work on assignment problems or on applications of the concept of independence. For instance, to test whether being a smoker is independent of having heart disease, we check whether the above relationship is true by collecting data on these incidents.

A sequence of events $A_1, \ldots, A_n$ are independent of each other if and only if

$P(\cap_{i \in I} A_i) = \prod_{i \in I} P(A_i)$

for all subsets I of $\{1, 2, \ldots, n\}$. We would like to emphasize that pairwise independence does not imply overall independence. For example, toss two fair coins and let A = {the first coin shows heads}, B = {the second coin shows heads}, C = {the two coins show the same face}. Any two of these events are independent, yet $P(ABC) = 1/4 \neq 1/8 = P(A)P(B)P(C)$.


Let E and F be two events with P(F) > 0. We define the conditional probability of E given F by

$P(E|F) = P(EF)/P(F).$

As already defined, two events E and F are independent if and only if $P(EF) = P(E)P(F)$. When the events E and F are independent, we find

$P(E|F) = P(E)$

when P(F) > 0. However, we should not use this relationship as the definition of independence. When P(F) = 0, the conditional probability is not defined, but E and F can still be two independent events.

    1.3 Bayes Formula

Let $F_i$, $i = 1, 2, \ldots, n$ be mutually exclusive events such that $\cup_i F_i = S$, and let P(E) > 0. Then

$P(F_k|E) = \frac{P(EF_k)}{P(E)} = \frac{P(E|F_k)P(F_k)}{\sum_i P(E|F_i)P(F_i)}.$

The Bayes formula is a mathematical consequence of the definition of conditional probability. However, this formula has generated a lot of thinking in statistics. We can think of E as an event (a subset of the sample space) of some experiment to be done, and of the $F_i$'s as classifying the sample points of the same experiment according to a possibly different rule (than the rule of E). Somehow, E is readily observed, but the $F_i$'s are not. Before the experiment is done, we may have some prior information on what the probabilities of the $F_i$'s are. When the experiment is done and the outcome (the sample point) is known to belong to E, but its membership in the $F_i$'s remains unknown, the Bayes formula allows us to update our assessment of the chance of each $F_i$ in view of the occurrence of E. For example, before we toss a die, it is known that the chance of observing a 2 is 1/6. After the die is tossed, if you are told that the outcome is an even number, then the conditional probability becomes 1/3.

    Here is a less straightforward example.


    Example 1.2

There are three coins in a box: 1. two-headed; 2. fair; 3. biased with P(H) = 0.75.

When one of the coins is selected at random and flipped, it shows heads. What is the probability that it is the two-headed coin?

Solution: Let $C_1$, $C_2$ and $C_3$ represent the events that the two-headed, fair or biased coin is selected, respectively. We want to find $P(C_1|H)$:

$P(C_1|H) = \frac{P(H|C_1)P(C_1)}{\sum_{i=1}^{3} P(H|C_i)P(C_i)}.$

The answer is 4/9.

Remark: It is not so important to memorize the Bayes formula as it is to remember the definition of conditional probability. Once you understand conditional probability, you can work out the formula easily.
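The computation in Example 1.2 is easy to verify. A minimal Python sketch (added for illustration):

```python
# Uniform prior over the three coins; P(H | coin) is 1, 0.5 and 0.75.
priors = [1/3, 1/3, 1/3]
likelihoods = [1.0, 0.5, 0.75]   # P(H | C1), P(H | C2), P(H | C3)

evidence = sum(p * l for p, l in zip(priors, likelihoods))
posterior = [p * l / evidence for p, l in zip(priors, likelihoods)]
print(posterior[0])   # 0.4444... = 4/9
```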

    1.4 Key Facts

A probability space consists of three components: the sample space, the collection of events, and the probability measure. The probability measure satisfies three Axioms, from which we introduce the concepts of conditional probability and independence. The Bayes theorem is a simple consequence of manipulating the idea of conditional probability. However, the result incited philosophical debate in statistics.

    1.5 Problems

1. Suppose that in an experiment, a fair die is rolled twice. Let A = {the first outcome is even}, B = {the total score is 4}, C = the total score, D = the absolute difference between the two scores.

(a) Which of A, B, C, D are events? Which of them are random variables?

(b) Which of the following make sense? Which of them do not?

(i) $A \cup B$, (ii) P(C), (iii) E(A), (iv) Var(D).


2. Let S be the sample space of a particular experiment, A and B be events, and P be a probability measure. Which of the following are Axioms, which are definitions, and which are formulas?

(i) $P(A \cup B) = P(A) + P(B) - P(AB)$.
(ii) P(S) = 1.
(iii) $P(A|B) = P(AB)/P(B)$ when $P(B) \neq 0$.

3. Using only the axioms of probability, show that

1) $P(A \cup B) = P(A) + P(B) - P(AB)$;
2) $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$.

4. a) Prove that $P(ABC) = P(A|BC)P(B|C)P(C)$.
b) Prove that if A and B are independent, then so are $A^c$ and $B^c$.

5. Let A and B be two events.

(a) Show that, in general, if A and B are mutually exclusive, then they are not necessarily independent.

    (b) Find a particular pair of events A and B such that they are both

    mutually exclusive and independent.

6. Prove Boole's inequalities:

(a) $P(\cup_{i=1}^{n} A_i) \le \sum_{i=1}^{n} P(A_i)$, (b) $P(\cap_{i=1}^{n} A_i) \ge 1 - \sum_{i=1}^{n} P(A_i^c)$.

7. Let $A_1 \supseteq A_2 \supseteq \cdots$ be a sequence of events. If $\cap_{i=1}^{\infty} A_i = \emptyset$ (empty), show that

$\lim_{n \to \infty} P(A_n) = 0.$


    Chapter 2

    Random Variables

    2.1 Random Variable

In practice, we may describe the outcomes of an experiment by any terminology. For example, if Mary and Paul compete in a game, the outcomes can be: Mary wins; Mary loses; it is a draw.

However, it is more convenient in mathematics to code the outcomes by numbers. For example, we may define the outcome as 1 if Mary wins, as $-1$ if Mary loses, and as 0 if it is a draw. That is, we can transform the outcomes in S into numbers. There are many ways to transform the outcomes.

In probability theory, we call the mechanism of transforming sample points into numbers a Random Variable. More formally, we define a random variable as a function on the sample space S.

    We use capital letters X, Y, and so on for random variables.

In most applications, we focus mainly on the values of the function (the random variable). That is why it appears that random variables are numbers, rather than mechanisms for transforming sample points into numbers. As a function, a random variable is totally deterministic. There is nothing random about it. However, the input of this function is random, and this implies that the outcome of the transformation is random. This is how we get the notion that random variables are random.

    Example 2.1


Let S be the set of ordered outcomes of rolling two fair dice. Define X to be the sum of the two outcomes. If $\omega = (2, 5)$, which is a sample point, then $X(\omega) = 7$. Nothing is random here. Since in a specific experiment we are not certain in advance whether the two outcomes will be $\omega = (2, 5)$, we hence do not know whether the outcome of X will be 7. This gives us the illusion of X being random. Its randomness is inherited from the randomness of the outcome in S.

When we use the notation X = 7, we often do not mean that the outcome of X is 7 in a specific experiment. Rather, we define

{X = 7} = the set of sample points which make X equal 7.

Hence, in this example,

$\{X = 7\} = \{(1, 6), (2, 5), \ldots, (6, 1)\},$

which is a subset of S. Consequently, it is an event. When the dice are fair, the classical definition assigns a probability of 1/6 to this event.

If the dice are not fair, we usually assign a different value to it, or we may not know what value is most suitable in this application. However, we believe that a suitable value exists, and this has no effect on the definition of X.

There is another excuse for not focusing on the fact that a random variable X is a function. We care more about the probabilities associated with events of the form $\{X \le x\}$ than about how X maps S into the real numbers. Once $P(X \le x)$ is available for all real numbers x, we classify X according to the form of this function, and ignore X itself.

    Example 2.2

Toss a coin until the first head appears. Suppose in each trial, P(H) = p, and trials are independent. Define X = the number of tosses when the experiment is completed.

In this experiment, the sample space is

$S = \{H, TH, TTH, \ldots\}.$


The corresponding values of X are

$\{1, 2, 3, \ldots\}.$

We find

$P(X = n) = p(1 - p)^{n-1}$

for all $n \ge 1$. Once this is done, we say X has the geometric distribution. How this X is defined becomes irrelevant.
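A small Python sketch of Example 2.2 (added for illustration) compares the empirical frequencies with $p(1-p)^{n-1}$:

```python
import random
from collections import Counter

def tosses_until_first_head(p):
    """Run the experiment of Example 2.2 once and return X."""
    n = 1
    while random.random() >= p:   # tail with probability 1 - p
        n += 1
    return n

p, trials = 0.3, 100_000
counts = Counter(tosses_until_first_head(p) for _ in range(trials))
for n in range(1, 6):
    print(n, counts[n] / trials, p * (1 - p) ** (n - 1))
```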

If X is a random variable, we call

$F(x) = P(X \le x)$

the cumulative distribution function (c.d.f.). It is known that F(x) is a c.d.f. of some random variable in some probability model if and only if

1. $F(x) \ge 0$;
2. $F(\infty) = 1$, $F(-\infty) = 0$;
3. F(x) is monotone increasing and right continuous.

That is, we can construct a sample space together with a probability measure and a random variable, so that the cumulative distribution function of this random variable is given by F(x).

    2.2 Discrete Random Variables

    If the set of all possible outcomes of a random variable X is countable, then

    we say that the random variable X is discrete.

For example, if a random variable can only take values $\{0.2, 0.5, 2, \pi\}$, it is discrete. The discrete random variables more commonly seen in our textbooks take integer values. However, we should remember that a discrete random variable can take any values, as long as the set of possible values remains countable.

By the way, the notion of countable needs to be clarified. If we can find a one-to-one map from a set into the set of integers, then this set is countable. The set of all even numbers is countable. The set of numbers


$\{1, 0.1, 0.01, 0.001, \ldots\}$ is also countable. Being countable implies that we can arrange the elements of the set into a sequence. We often represent a countable set of real numbers as $\{x_1, x_2, \ldots\}$. If $\{t_1, t_2, t_3, \ldots\}$ is the set of possible outcomes of X, we call the function

$f(t_i) = P(X = t_i)$

the probability (mass) function (p.m.f.) of X.

Note that in this definition, I used the notation $t_i$ for the possible values of the random variable X. Although it is common practice to use the $x_i$'s for the possible values of X, this is not a requirement. It is very important for us to make a distinction between (the notation for) the possible values of X, and X itself.

    2.3 Continuous Random Variables

If the c.d.f. of a random variable, $F(x) = P(X \le x)$, can be written as

$F(x) = \int_{-\infty}^{x} f(t)\,dt$

for some non-negative f(t), we say X is absolutely continuous. We have $f(x) = dF(x)/dx$ (for almost all x), and f(x) is called the density function of X.

We classify random variables according to their cumulative distribution functions, probability functions or density functions. We usually do not mind how these random variables are defined.

    Example 2.3

1. X has the Binomial (n, p) distribution if $f(i) = P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}$ for $i = 0, 1, \ldots, n$.

2. X has the Poisson ($\lambda$) distribution if

$f(i) = P(X = i) = \frac{\lambda^i}{i!} \exp(-\lambda)$

for $i = 0, 1, 2, \ldots$.


3. X has the uniform [0, 1] distribution if $F(x) = P(X \le x) = x$ for $x \in [0, 1]$, or $f(x) = 1$ for $x \in [0, 1]$.

4. X has the exponential distribution with mean parameter $\theta$ if its c.d.f. is given by $F(x) = 1 - \exp(-x/\theta)$, or if its p.d.f. is given by $f(x) = \theta^{-1} \exp(-x/\theta)$ for $x \ge 0$.

Note that we do not have to specify the sample space, the probability measure, or how the random variables are defined in the above example.

Two basic types of random variables have been introduced. In theory, there is a third type of random variable. However, the third type is usually not discussed in elementary probability courses. Notice that the sum of two random variables is clearly another random variable. When we add a continuous random variable to a discrete random variable, the new random variable is neither discrete nor continuous. That is, we cannot always classify a random variable into one of the three possible types. A measure theory result states, however, that any random variable can be written as a linear combination of three random variables, one of each type.

2.4 Expectations

A proper definition of the expectation of a random variable needs advanced knowledge of real analysis. We give a handicapped definition as follows.

If X is discrete with possible values $\{x_0, x_1, x_2, \ldots\}$, then we calculate its expectation as

$E(X) = \sum_{i=0}^{\infty} x_i P(X = x_i)$

when the summation converges absolutely.

If X is (absolutely) continuous with density function f(x), then we calculate its expectation as

$E(X) = \int_{-\infty}^{\infty} t f(t)\,dt$

when the integration converges absolutely.


When the convergence does not hold, we say the expectation does not exist.

To calculate the expectation of any random variable, we should pay close attention to the "if" part of the definition before starting. Many students lose the thread because they ignore this part of the definition.

Example 2.4

Calculate the expectations of Binomial and Exponential random variables.
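No solution is worked out here in the notes; as a numerical sketch (added for illustration; the helper names are my own, and the exponential uses the mean parametrization $\theta$ from Example 2.3), one can check that E(X) = np for the Binomial and E(X) = $\theta$ for the Exponential:

```python
import math

def binomial_mean(n, p):
    """E(X) = sum_i i * P(X = i) for X ~ Binomial(n, p)."""
    return sum(i * math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1))

def exponential_mean(theta, upper=100.0, steps=100_000):
    """E(X) = integral of t * f(t) dt with f(t) = exp(-t/theta)/theta,
    approximated by a Riemann sum on [0, upper]."""
    h = upper / steps
    return sum((k * h) * math.exp(-(k * h) / theta) / theta * h
               for k in range(steps))

print(binomial_mean(10, 0.3))   # 3.0 = np
print(exponential_mean(2.0))    # close to 2.0 = theta
```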

2.5 Joint Distribution

Let X and Y be two random variables. Note that it is possible to define two functions on the same sample space. For example, suppose our sample space is $[0, 1] \times [0, 1]$, the unit square. Every sample point can be represented as $(w_1, w_2)$. Let

$X(w_1, w_2) = w_1, \quad Y(w_1, w_2) = w_2,$

and assume the probability measure on $[0, 1] \times [0, 1]$ is uniform. Then both X and Y have the uniform distribution. We find

$P(X \le s, Y \le t) = st$

when $(s, t) \in [0, 1] \times [0, 1]$.

If Z is another random variable such that

$Z(w_1, w_2) = 1 - w_1,$

we find Z also has the uniform distribution. However,

$P(X \le s, Z \le t) \neq st$

in general.

The moral of this example is: knowing the individual distributions of X, Y and Z is not enough to tell their joint behavior. The joint random behavior of two random variables X and Y is characterized by their joint c.d.f., defined as

$F(x, y) = P(X \le x, Y \le y).$


The joint c.d.f. of more than two random variables is defined similarly.

Let us point out again that the lower case letters x, y are notation for dummy variables. They do not have to be associated with the random variables X and Y. That is, we may use

$F(s, t) = P(X \le s, Y \le t)$

to represent exactly the same joint c.d.f. It is the appearance of X, Y in the definition that makes F the joint c.d.f. of X and Y.

The marginal c.d.f. of X or Y can be obtained by taking limits:

$F_X(s) = P(X \le s) = \lim_{t \to \infty} F(s, t), \quad F_Y(y) = P(Y \le y) = \lim_{s \to \infty} F(s, y).$

Note that I used (s, t, y) on purpose. It is certainly not a good practice, but the point is, X does not have to be linked with x.

When both X and Y are discrete, it is more convenient to work with the joint probability (mass) function:

$f(x, y) = P(X = x, Y = y).$

When there exists a non-negative function f(x, y) such that

$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\,ds\,dt$

for all real numbers (x, y), we say that X and Y are jointly (absolutely) continuous and f(x, y) is their joint density function.

The marginal probability function (for the discrete case) can be obtained as

$f_X(x) = \sum_{y} f(x, y).$

The marginal density function (for the continuous case) can be obtained as

$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy.$


    2.6 Independence

If the joint c.d.f. of X and Y satisfies $F(x, y) = F_X(x)F_Y(y)$ for all x, y, then we say X and Y are independent.

When both X and Y are discrete, independence is equivalent to

$f(x, y) = f_X(x)f_Y(y)$

for all (x, y), where f(x, y) is the joint probability function. When X and Y are jointly continuous, independence is equivalent to

$f(x, y) = f_X(x)f_Y(y)$

for almost all (x, y), where f(x, y) is the joint density function.

    2.7 Formulas for Expectations

Let X and Y be two random variables. We define

$Var(X) = E(X - E(X))^2 = E(X^2) - (EX)^2;$

$Cov(X, Y) = E[(X - EX)(Y - EY)].$

It is known that

$E(aX + bY) = aEX + bEY;$

$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y),$

where a, b are two real numbers (constants).

Let Z = X + Y be a newly created random variable. Its c.d.f. can be derived from the joint c.d.f. of X and Y. This task is not always simple. There are two special cases.

First, assume X and Y are independent and jointly continuous. Assume that X has density function f(x) and Y has density function g(y). Then we know that the joint density function is $f(x, y) = f(x)g(y)$. The density function of Z = X + Y is given by

$f_Z(t) = \int_{-\infty}^{\infty} f(t - y)g(y)\,dy.$


Second, assume X and Y are independent, take non-negative integer values only, with probability functions f(x) and g(y). (Note the notation looks the same as before.) The probability function of Z = X + Y is

$P(Z = n) = \sum_{i=0}^{n} f(i)g(n - i).$

Example 2.5 Derivation of the distribution of X + Y.

1. Both X and Y have the exponential distribution with common density $f(x) = \exp(-x)$ for $x \ge 0$.

2. Both X and Y have the Poisson distribution with means $\lambda_1$ and $\lambda_2$.
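Case 2 can be checked directly with the discrete convolution formula above. The following minimal Python sketch (added for illustration; names are my own) confirms that the convolution of Poisson($\lambda_1$) and Poisson($\lambda_2$) probability functions matches the Poisson($\lambda_1 + \lambda_2$) probability function:

```python
import math

def poisson_pmf(lam, i):
    return lam**i * math.exp(-lam) / math.factorial(i)

def convolution(f, g, n):
    """P(Z = n) = sum_i f(i) g(n - i) for independent non-negative
    integer-valued X and Y with probability functions f and g."""
    return sum(f(i) * g(n - i) for i in range(n + 1))

lam1, lam2 = 1.5, 2.5
for n in range(5):
    direct = convolution(lambda i: poisson_pmf(lam1, i),
                         lambda j: poisson_pmf(lam2, j), n)
    print(n, direct, poisson_pmf(lam1 + lam2, n))  # the two columns agree
```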

    2.8 Key Results and Concepts

Random variables are real valued functions defined on the sample space. Their randomness is the consequence of the randomness of the outcome from the sample space. We classify them according to their cumulative distribution functions or, equivalently, their probability mass functions or probability density functions.

A discrete random variable takes at most a countable number of possible values. An absolutely continuous random variable has a cumulative distribution function which can be obtained from a density function by integration. (Or roughly, its cumulative distribution function is differentiable.) The third type of random variable is not discussed.

A random variable has, say, the Poisson distribution if its probability function has the form

$\frac{\lambda^n}{n!}\exp(-\lambda), \quad n = 0, 1, 2, \ldots.$

In general, the distribution of a random variable is named after the form of its cumulative distribution function.

    The mean, variance, moments of a random variable are determined by

    its distribution. In many examples, they can be obtained by summation or


integration easily (at least for some students). In other examples, the mean and variance of a random variable can be obtained via its relationship to other random variables. Thus, memorizing some formulas is useful.

    2.9 Problems

1. If X and Y are two random variables, what do we mean by

(i) "F(x) is the cumulative distribution function of X"?
(ii) "$\{X \le 4\}$ is independent of $\{Y \le 2\}$"?

2. Let X be a random variable with the Binomial distribution with parameters n = 3, p = 0.4, i.e.

$p_X(k) = \binom{3}{k}(0.4)^k(1 - 0.4)^{3-k}$ when k = 0, 1, 2, 3.

Let $Y = (X - 1)^2$.

(i) Let $F_X(x)$ be the cumulative distribution function of X. Calculate $F_X(2.4)$.
(ii) Tabulate the probability function of Y.
(iii) Tabulate the probability function of X given Y = 1.
(iv) Tabulate E(X|Y).

3. A random number N of fair dice is thrown, with $P(N = n) = 2^{-n}$, $n \ge 1$. Let S be the sum of the scores. Find the probability that

a) N = 2 given S = 4;
b) S = 4 given N = 2;
c) S = 4 given N is even;
d) the largest number shown by any die is r.

4. A coupon is selected at random from a series of k coupons and placed in each box of cereal. A house-husband has bought N boxes of cereal.


What is the probability that all k coupons are obtained? (Hint: Consider the event that the ith coupon is not obtained. The answer is in a nice summation format.)

5. If birthdays are equally likely to fall in each of the twelve months of the year, find the probability that all twelve months are represented among the birthdays of 20 people selected at random.

(Hint: let $A_i$ be the event that the ith month is not included and consider $A_1 \cup A_2 \cup \cdots \cup A_{12}$.)

6. Let X be a random variable and $g(\cdot)$ be a real valued function.

(a) What do we mean by "X is discrete"?
(b) If X is a discrete random variable, argue that g(X) is also a random variable and is discrete.
(c) If X is a continuous random variable, is g(X) necessarily a continuous random variable? Why?

7. Let a and b be independent random variables uniformly distributed in (0, 1). What is the probability that $x^2 + ax + b = 0$ has no real roots?

8. Express the distribution functions of

$X^+ = \max\{0, X\}, \quad X^- = -\min\{0, X\}, \quad |X| = X^+ + X^-, \quad -X$

in terms of the distribution function F of the random variable X.

    9. Is it generally true that E(1/X) = 1/E(X)? Is it ever true that

    E(1/X) = 1/E(X)?

10. Suppose that 10 cards, of which 5 are red and 5 are green, are put at random into 10 envelopes, of which 7 are red and 3 are green, so that each envelope will contain a card. Determine the probability that exactly k envelopes will contain a card with a matching color (k = 0, 1, ..., 10).


    Chapter 3

Conditional Distribution and Expectations

    3.1 Introduction

Suppose both X and Y are discrete and hence have a joint probability function f(x, y). Then, we have

$P(X = x|Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x, y)}{f_Y(y)}.$

Of course, this is meaningful only if $P(Y = y) = f_Y(y) > 0$.

When we pay no attention to the part Y = y, this is a function of x only. However, this function (or the way of crunching the number x and reporting a number called a probability) is determined by X, Y and the number y jointly. As a function of x, it is a probability function. Since it is determined by X and Y = y, we say it is the conditional probability function of X given Y = y. A commonly used notation is $f_{X|Y}(x|y)$.

    Example 3.1

There are two urns. The first contains 4 white and 6 black balls, and the second contains 2 white balls and 8 black balls. An urn is selected randomly, and then we randomly pick 5 balls from the urn (with replacement). Define


X = the number of white balls selected. What is the probability function of X?

Solution: Consider the situations when different urns are selected. Define Y = i if the ith urn is selected.

    Let us work on the conditional probability functions first.

$P(X = j|Y = 1) = \binom{5}{j}(0.4)^j(0.6)^{5-j}$

and

$P(X = j|Y = 2) = \binom{5}{j}(0.2)^j(0.8)^{5-j}$

for $j = 0, 1, \ldots, 5$. The marginal probability function of X is given by

$P(X = j) = (0.5)\left[\binom{5}{j}(0.4)^j(0.6)^{5-j}\right] + (0.5)\left[\binom{5}{j}(0.2)^j(0.8)^{5-j}\right].$

As we have noticed, when Y = 1 is given, X has the binomial distribution with n = 5, p = 0.4. This distribution has expectation 2. We use the notation $E(X|Y = 1) = 2$. In general, we define

$E(X|Y = y) = \sum_{x} x P(X = x|Y = y),$

where the sum is over all possible values of X.

Remark: Again, we should always first determine whether X is discrete. If it is, we should then determine what the possible values of X are before this formula is applied.

When both X and Y are discrete, $E(X|Y = y)$ is well defined. There are several components in this definition. Whenever we use a new value y, the outcome will probably change. In the last example,

$E(X|Y = 1) = 2, \quad E(X|Y = 2) = 1.$
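A short Python sketch (added for illustration) reproduces these numbers for Example 3.1 and also checks the weighted-average identity derived next:

```python
import math

def binom_pmf(n, p, j):
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

# Conditional probability functions of Example 3.1 (5 draws with replacement).
f_given_1 = [binom_pmf(5, 0.4, j) for j in range(6)]   # urn 1
f_given_2 = [binom_pmf(5, 0.2, j) for j in range(6)]   # urn 2

E_X_given_1 = sum(j * f_given_1[j] for j in range(6))  # 2.0
E_X_given_2 = sum(j * f_given_2[j] for j in range(6))  # 1.0

# Marginal probability function and expectation of X.
marginal = [0.5 * f_given_1[j] + 0.5 * f_given_2[j] for j in range(6)]
E_X = sum(j * marginal[j] for j in range(6))

print(E_X_given_1, E_X_given_2)                      # 2.0 1.0
print(0.5 * E_X_given_1 + 0.5 * E_X_given_2, E_X)    # both 1.5
```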


When we focus on the value of Y in this expression, we find we have a function of y defined as

$\psi(y) = E(X|Y = y).$

Just as for a function such as $g(y) = y^2$, we know that $\psi(Y)$ is also a random variable. Thus, we might want to know the expectation of this new random variable. It turns out that

$E[\psi(Y)] = \sum_{y} \psi(y)P(Y = y) = \sum_{y} E(X|Y = y)P(Y = y) = \sum_{y}\left[\sum_{x} x P(X = x|Y = y)\right]P(Y = y) = \sum_{x, y} x P(X = x, Y = y) = E(X).$

To be more concrete, we do not use $\psi(Y)$ in textbooks, but write it as E(X|Y) and call it the conditional expectation of X given Y. For those with mathematical curiosity, we may write

$E(X|Y) = E[X|Y = y]|_{y = Y}.$

Hence, the above identity can be stated as

$E[E(X|Y)] = E(X).$

One intuitive interpretation of this result is: the grand average is the weighted average of the sub-averages. To find the average mark of the students in Stat230, we may first calculate the average in each of its 6 sections. Hence, we obtain 6 conditional expectations (conditioning on which section a student is in). We then calculate the weighted average of the section averages according to the size of each section. This is the second expectation being applied on the left hand side of the above formula.

    It turns out that this concept applies to continuous random variables too.

    If (X, Y) are jointly continuous, we define the conditional density function


of X given Y = y as

$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)},$

where f(x, y) is the joint density, $f_X$ and $f_Y$ are the marginal density functions, and we assume that $f_Y(y)$ is larger than zero.

The conditional expectation will then be defined as

$E(X|Y = y) = \int x\,\frac{f(x, y)}{f_Y(y)}\,dx,$

which is again a function of y. The same argument implies we could define E(X|Y) in exactly the same way as before. It is easy to verify that

$E[E(X|Y)] = E(X).$

In fact, this equality is true regardless of the type of the random variables (after they are properly defined). The only restriction is: all relevant quantities exist.

    3.2 Formulas

Most formulas for ordinary expectation remain valid for the conditional expectation. For example,

$E(aX + bY|Z) = aE(X|Z) + bE(Y|Z).$

If $g(\cdot)$ is a function, we have

$E[g(Y)X|Y] = g(Y)E[X|Y],$

as g(Y) is regarded as non-random once Y is given.

Finally, we define

$Var(X|Y) = E[(X - E(X|Y))^2|Y].$

Then

$Var(X) = E[Var(X|Y)] + Var[E(X|Y)].$


To show this, notice that

$E[Var(X|Y)] = E\{E[(X - E(X|Y))^2|Y]\} = E\{E(X^2|Y) - [E(X|Y)]^2\} = EX^2 - E[E(X|Y)]^2,$

and

$Var(E(X|Y)) = E[E(X|Y)]^2 - [E\{E(X|Y)\}]^2 = E[E(X|Y)]^2 - [E(X)]^2.$

Adding them up, we get the conclusion.

    Example 3.2

    A miner is trapped in a mine with 3 doors. If he uses the first door, he will

    be free 2 hours later. If he uses the second, he will be back to the same spot

    3 hours later. If he uses the third door, he will be back to the same spot 5

    hours later. Assume that he does not have memory and will always pick a

    door at random. What is the expected time it takes for him to get free?

Solution: Let X be the number of hours it takes until he gets free. We are asked to calculate E(X).

It seems that the expectation is simpler if we know which door he selected in the first place. For this reason, we define the random variable Y to be the door he selects in the first try.

Now it is simple to write down

$E(X|Y = 1) = 2.$

However, we only have

$E(X|Y = 2) = 3 + E(X), \quad E(X|Y = 3) = 5 + E(X).$

Even though this does not directly answer our question, we do have

$E(X) = E[E(X|Y)] = \frac{1}{3}[2 + (3 + EX) + (5 + EX)].$


This is a simple linear equation; we find E(X) = 10.

Can we use the same idea to calculate Var(X)? It is seen that

$Var(X|Y = 1) = 0; \quad Var(X|Y = 2) = Var(X|Y = 3) = Var(X).$

Hence,

$E[Var(X|Y)] = \frac{2}{3}Var(X),$

$Var[E(X|Y)] = \frac{1}{3}[2^2 + 13^2 + 15^2] - 10^2 = \frac{98}{3}.$

Consequently, we find

$Var(X) = \frac{2}{3}Var(X) + \frac{98}{3},$

and hence Var(X) = 98.
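The answers E(X) = 10 and Var(X) = 98 are easy to confirm by simulation. A minimal Python sketch (added for illustration) of the memoryless miner:

```python
import random

def time_to_freedom():
    """One run of Example 3.2: doors take 2 (free), 3 (back) or 5 (back) hours."""
    hours = 0
    while True:
        door = random.randint(1, 3)
        if door == 1:
            return hours + 2
        hours += 3 if door == 2 else 5

trials = 200_000
samples = [time_to_freedom() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)   # close to E(X) = 10 and Var(X) = 98
```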

Remark: We certainly do not believe that the miner is memoryless. Such an example might be more useful for modeling a trapped mouse. We might be able to infer whether a mouse learns after repeating this experiment a number of times: we could compare the observed average with this theoretical average under the memoryless assumption. Any discrepancy may point to the possibility that the mouse is in fact learning.

    3.3 Comment

It could be claimed that probability theory is a special case of measure theory in mathematics. However, the concepts of independence and conditional expectation allow probability theory to be a separate scientific discipline.

    Our subsequent developments depend heavily on the use of conditional

    expectation.


    3.4 Problems

1. Let X be a random variable such that

$P(X = n) = p(1 - p)^n, \quad n = 0, 1, 2, \ldots$

is its probability function and 0 < p < 1.

2. Let X and Y be independent random variables with common probability density function

$f(t) = e^{-t}, \quad t \ge 0.$

(i) Calculate P(X > 5|X > 3).
(ii) Calculate $P(X + Y \le 1)$.

3. There are two TAs for a certain course. For a particular assignment

handed in, if it were marked by the first TA, the mark would be random with mean 75% and variance $(0.1)^2$; while if it were marked by the second TA, the mark would be random with mean 70% and variance $(0.05)^2$. The first TA has a 40% chance of marking any single assignment. Let X be the mark of the particular assignment. Calculate the mean and variance of X.

4. Let $X_1, X_2, X_3, \ldots$ be independently distributed random variables such that $X_n$ has probability mass function

$f_n(k) = P(X_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n.$


(a) Find the probability generating function of $X_n$.

(b) Find the probability generating function of $X_1 + X_2 + X_3$.

(c) Let N be a positive integer valued random variable with probability generating function G(s), and assume it is independent of $X_1, X_2, \ldots$. Find the probability generating function of $X_N$.

(d) Continuing (c), find the probability generating function of $X_N + X_{N+1}$.

5. An integer N is chosen from the geometric distribution with probability function

$f_N(n) = \alpha(1 - \alpha)^{n-1}, \quad n = 1, 2, \ldots.$

Given N = n, X has the uniform distribution on $1, 2, \ldots, n$.

a) Find the joint p.f. of X and N.
b) Find the conditional p.f. of N given X = x.

6. The number of fish that Elise catches in a day is a Poisson random variable with mean 30. However, on average, Elise tosses back two out of every three fish she catches. What is the probability that, on a given day, Elise takes home n fish? What are the mean and variance of

(a) the number of fish she catches,
(b) the number of fish she takes home?

(What independence assumptions have you made?)

7. Let $X_1, X_2, X_3$ be independent random variables taking values in the positive integers and having probability functions given by $P(X_i = x) = (1 - p_i)p_i^{x-1}$ for $x = 1, 2, \ldots$ and $i = 1, 2, 3$.

(a) Show that

$P(X_1 < X_2 < X_3) = \frac{(1 - p_1)(1 - p_2)p_2 p_3^2}{(1 - p_2 p_3)(1 - p_1 p_2 p_3)}.$

(b) Find $P(X_1 \le X_2 \le X_3)$.


8. Suppose that 13 cards are selected at random from a regular deck of 52 playing cards. (a) If it is known that at least one ace has been selected, what is the probability that at least two aces have been selected? (b) If it is known that the ace of hearts has been selected, what is the probability that at least two aces have been selected?

9. The number of children N in a randomly chosen family has mean $\mu$ and variance $\sigma^2$. Each child is male with probability p, independently, and X represents the number of male children in a randomly chosen family. Find the mean and variance of X.

10. Suppose we have ten coins such that if the ith one is flipped then heads will appear with probability i/10, i = 1, 2, ..., 10. When one of the coins is randomly selected and flipped, it shows heads. What is the conditional probability that it was the fifth coin?


    Chapter 4

Generating functions and their applications

    4.1 Introduction

Suppose that $\{a_j\} = \{a_0, a_1, \ldots\}$ is a sequence of real numbers. If

$A(s) = \sum_{j=0}^{\infty} a_j s^j = a_0 + a_1 s + a_2 s^2 + \cdots$ (4.1)

converges in some interval $|s| \le s_0$ where $s_0 > 0$, then A(s) is called the generating function of the sequence $\{a_j\}_0^{\infty}$. The generating function provides a convenient summary of a real number sequence. In many examples, simple and explicit expressions of A(s) can be obtained. This enables us to study the properties of $\{a_j\}_0^{\infty}$ conveniently.

    Example 4.1

The Fibonacci sequence $\{f_j\}$ is defined by $f_0 = 0$, $f_1 = 1$ and the recursive relationship

$f_j = f_{j-1} + f_{j-2}, \quad j = 2, 3, \ldots.$ (4.2)

We use the tool of generating functions to find explicit expressions for $f_j$.


Solution: Multiplying by $s^j$ and summing over j gives

$\sum_{j=2}^{\infty} f_j s^j = \sum_{j=2}^{\infty} f_{j-1} s^j + \sum_{j=2}^{\infty} f_{j-2} s^j.$ (4.3)

Note the summation starts from j = 2 because (4.2) is valid only when $j = 2, 3, \ldots$. By defining $F(s) = \sum_{j=0}^{\infty} f_j s^j$, we get

$\sum_{j=2}^{\infty} f_j s^j = \sum_{j=0}^{\infty} f_j s^j - f_0 - f_1 s = F(s) - s.$

With similar treatment on the right hand side of (4.3), we obtain

$F(s) - s = sF(s) + s^2 F(s).$ (4.4)

Ignoring the convergence issue for the moment, we find

$F(s) = \frac{s}{1 - s - s^2}.$

This is surely a simple and explicit generating function. To study other properties of the sequence, let us note that, in general, a generating function has the Maclaurin series expansion

$A(s) = A(0) + A'(0)s + A''(0)s^2/2! + \cdots,$

which by comparison with (4.1) gives

$a_j = \frac{A^{(j)}(0)}{j!}.$

This, of course, requires the function to be analytic at 0, which is true when A(s) converges in a neighbourhood of 0. An obvious conclusion is: real number sequences and generating functions have a one-to-one correspondence when the convergence and the analytic properties hold.

Now let us get back to the example. F(s) clearly converges at least for $|s| \le 0.5$. This allows us to look for its Maclaurin series expansion. Note that

$1 - s - s^2 = \left(1 - \frac{1 + \sqrt{5}}{2}s\right)\left(1 - \frac{1 - \sqrt{5}}{2}s\right)$


and by the method of partial fractions

$F(s) = \frac{1}{\sqrt{5}}\left[\sum_{j=0}^{\infty} \left(\frac{1 + \sqrt{5}}{2}\right)^j s^j - \sum_{j=0}^{\infty} \left(\frac{1 - \sqrt{5}}{2}\right)^j s^j\right].$

Recalling the one-to-one correspondence property,

$f_j = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^j - \left(\frac{1 - \sqrt{5}}{2}\right)^j\right], \quad j = 0, 1, 2, \ldots.$

It is interesting to note that

$\lim_{j \to \infty} f_j/f_{j-1} = (1 + \sqrt{5})/2,$

which is the golden ratio, to which the ancient Egyptians attributed many mystical qualities.
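As a quick check of the closed form, the following Python sketch (added for illustration) compares the recursion (4.2) with the expression obtained from the generating function:

```python
import math

def fib_recursive(n):
    """f_0 = 0, f_1 = 1, f_j = f_{j-1} + f_{j-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(j):
    """The expression read off from the generating function F(s)."""
    r5 = math.sqrt(5)
    return ((1 + r5) / 2) ** j / r5 - ((1 - r5) / 2) ** j / r5

for j in range(10):
    print(j, fib_recursive(j), round(fib_closed_form(j)))  # identical columns
```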

In this example, the generating function has been used as a tool for solving the difference equation (4.2). Generating functions will be seen to be far more useful than just this. For example, if A(s) converges in $|s| \le s_0$ with $s_0 > 1$, then

$A(1) = \sum_{j=0}^{\infty} a_j, \quad A'(1) = \sum_{j=1}^{\infty} j a_j,$

and so on.

    Example 4.2

Consider the following series:

$a_j = 1, \quad j = 0, 1, 2, \ldots;$
$b_j = 1/j!, \quad j = 0, 1, 2, \ldots;$
$c_0 = 0, \quad c_j = 1/j, \quad j = 1, 2, \ldots.$

Easy calculation shows their corresponding generating functions are $A(s) = (1 - s)^{-1}$, $B(s) = e^s$ and $C(s) = -\log(1 - s)$, where the regions of convergence are $|s| < 1$, $|s| < \infty$ and $|s| < 1$, respectively.


    4.2 Probability Generating Functions

Let X be a random variable taking non-negative integer values with probability function $\{p_j\}$, where

$p_j = P\{X = j\}, \quad j = 0, 1, 2, \ldots.$

The generating function of $\{p_j\}$ is called the probability generating function of X, and we write

$G(s) = G_X(s) = E\{s^X\} = p_0 + p_1 s + p_2 s^2 + \cdots.$ (4.5)

Of course, this function provides a convenient summary of the probability function of X. Note that it converges at least for $|s| \le 1$ since, for s in this interval,

$\sum_{j=0}^{\infty} p_j |s|^j \le \sum_{j=0}^{\infty} p_j = 1.$

Using some mathematical tools, we can easily find

$G'(1) = E(X) = \sum_{j=0}^{\infty} j p_j, \quad G^{(r)}(1) = E(X^{(r)}) = \sum_{j=0}^{\infty} j^{(r)} p_j$

whenever the corresponding quantities exist. Otherwise, $G^{(r)}(1)$ has to be replaced by $\lim_{s \uparrow 1} G^{(r)}(s)$, and an infinite outcome is allowed. Note $j^{(r)} = j(j-1)\cdots(j-r+1)$, and $E(X^{(r)})$ is the rth factorial moment of X. The variance of X can be expressed as

$Var(X) = E(X^{(2)}) + E(X) - [E(X)]^2 = G''(1) + G'(1) - [G'(1)]^2.$
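These identities are easy to check numerically. Here is a minimal Python sketch (added for illustration, with an arbitrary finite-support probability function of my own choosing) that approximates G'(1) and G''(1) by finite differences and compares them with the directly computed mean and variance:

```python
p = [0.1, 0.3, 0.4, 0.15, 0.05]   # p_j = P(X = j), j = 0, ..., 4

def G(s):
    """Probability generating function G(s) = sum_j p_j s^j."""
    return sum(pj * s**j for j, pj in enumerate(p))

h = 1e-5
G1 = (G(1) - G(1 - h)) / h                           # one-sided difference, G'(1)
G2 = (G(1) - 2 * G(1 - h) + G(1 - 2 * h)) / h**2     # second difference, G''(1)

mean = sum(j * pj for j, pj in enumerate(p))
var  = sum(j * j * pj for j, pj in enumerate(p)) - mean**2

print(G1, mean)                  # E(X) = G'(1), approximately
print(G2 + G1 - G1**2, var)      # Var(X) = G''(1) + G'(1) - [G'(1)]^2
```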

    Example 4.3

Suppose X has the geometric distribution with parameter p, so that

$p_j = P(X = j) = p(1 - p)^j, \quad j = 0, 1, 2, \ldots.$

The probability generating function of X is

$G(s) = E(s^X) = \sum_{j=0}^{\infty} p(1 - p)^j s^j = p[1 - (1 - p)s]^{-1}$


for $|s| < (1 - p)^{-1}$.

Define the tail probabilities

$q_j = P(X > j) = p_{j+1} + p_{j+2} + \cdots, \quad j = 0, 1, 2, \ldots.$

Let $Q(s) = \sum_{j=0}^{\infty} q_j s^j$ be the corresponding generating function and note that, since $q_j \le 1$ for all j, it follows that

$\sum_{j=0}^{\infty} q_j s^j \le \sum_{j=0}^{\infty} s^j = (1 - s)^{-1}$

for $|s| < 1$.


Since G(1) = 1, it follows from (4.7) and the Mean Value Theorem in calculus that, for given $|s| < 1$, $Q(s) = \frac{1 - G(s)}{1 - s} = G'(\xi)$ for some $\xi$ between s and 1; letting $s \uparrow 1$ then gives $\lim_{s \uparrow 1} Q(s) = G'(1) = E(X)$.


4.4 The Simple Random Walk

Let $Z_1, Z_2, \ldots$ be independent steps with $P(Z_i = 1) = p$ and $P(Z_i = -1) = q = 1 - p$, and define the simple random walk by $X_0 = 0$ and $X_n = Z_1 + \cdots + Z_n$.

Here, we use generating functions to examine properties of the process $\{X_n\}$. Some quantities to be investigated are

$u_n = P(X_n = 0)$
$f_n = P(X_1 \neq 0, \ldots, X_{n-1} \neq 0, X_n = 0)$
$\lambda_n = P(X_1 < 1, \ldots, X_{n-1} < 1, X_n \ge 1)$
$\lambda_n^{(r)} = P(X_1 < r, \ldots, X_{n-1} < r, X_n \ge r)$
$\lambda_n^{(-r)} = P(X_1 > -r, \ldots, X_{n-1} > -r, X_n \le -r)$

for $n = 1, 2, \ldots$ and $r = 1, 2, \ldots$.

For convenience, we define $u_0 = 1$ and $f_0 = \lambda_0 = \lambda_0^{(r)} = \lambda_0^{(-r)} = 0$. In the simple random walk as presented, $Z_n$ can be either 1 or $-1$. Thus, it is impossible for $X_{n-1} < r$, $X_n > r$ to occur for any n. We insist on using $X_{n-1} < r$, $X_n \ge r$ instead of $X_{n-1} < r$, $X_n = r$ in the definitions of $\lambda_n^{(r)}$. It has the advantage of retaining the same definition for more general random walks.

Each of these quantities represents the probability of a particular outcome of the simple random walk after n trials. We summarize them in the following table.

Symbol            Probability of
$u_n$             return to 0 at trial n
$f_n$             first return to 0 at trial n
$\lambda_n$       first passage through 1 at trial n
$\lambda_n^{(r)}$ first passage through r at trial n


4.4.1 First Passage Times

A first passage through 2 must pass through 1 first. Decomposing according to the trial of the first passage through 1 gives

$\lambda_n^{(2)} = \sum_{k=0}^{n} \lambda_k \lambda_{n-k},$

since $\lambda_0 = 0$. Note this identity is still true even when n = 0. Therefore, we have found $\{\lambda_n^{(2)}\} = \{\lambda_n\} * \{\lambda_n\}$ (a convolution) and $\Lambda^{(2)}(s) = [\Lambda(s)]^2$. In like manner,

$\Lambda^{(r)}(s) = [\Lambda(s)]^r, \quad r = 2, 3, \ldots.$

Although the above relationship is neat, we cannot yet solve it to obtain an explicit expression for the $\lambda_n$'s. Let us work on another relationship between $\{\lambda_n^{(2)}\}$ and $\{\lambda_n\}$. It is obvious that $\lambda_1 = p$. If the first passage through 1 occurs at trial n with n > 1, it requires $Z_1 = X_1 = -1$. After that, it requires the simple random walk to gain a value of 2 in exactly $n - 1$ steps. Thus

$\lambda_n = q\lambda_{n-1}^{(2)}, \quad n = 2, 3, \ldots.$ (4.8)

Multiplying both sides of (4.8) by $s^n$ and summing over n with care over its range, we have

$\sum_{n=2}^{\infty} \lambda_n s^n = q \sum_{n=2}^{\infty} \lambda_{n-1}^{(2)} s^n.$

We find

$\Lambda(s) - ps = qs\Lambda^{(2)}(s) = qs[\Lambda(s)]^2$

from the first relationship.

It is easy to find the two possible forms:

$\Lambda(s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2qs}.$

When $s \to 0$, we should have $\Lambda(s) \to 0$, so we must have

$\Lambda(s) = \frac{1 - \sqrt{1 - 4pqs^2}}{2qs} = -(2qs)^{-1} \sum_{j=1}^{\infty} \binom{1/2}{j} (-4pqs^2)^j,$

where the binomial expansion has been used. From this we find $\lambda_{2n} = 0$ and

$\lambda_{2n-1} = -(2q)^{-1} \binom{1/2}{n} (-4pq)^n = (2n-1)^{-1} \binom{2n-1}{n} p^n q^{n-1}, \quad n = 1, 2, \ldots.$

The generating function $\Lambda(s)$ will tell us more about the simple random walk. Since

$\Lambda(s) = \sum_{n=0}^{\infty} \lambda_n s^n,$


$\Lambda(1) = \lambda_0 + \lambda_1 + \lambda_2 + \cdots = P(\text{first passage through 1 ever occurs}) = (1 - \sqrt{1 - 4p + 4p^2})/2q = (1 - |p - q|)/2q = \begin{cases} 1 & p \ge q, \\ p/q & p < q. \end{cases}$

The walk is certain to pass through 1 when p > q, or even when p = q = 1/2.

If $p \ge q$, we may define the random variable N, the waiting time until the first passage through 1 occurs. That is,

$N = \min\{n : X_n = 1\},$

and we know, in this case, that $P(N < \infty) = 1$ and

$E(N) = \Lambda'(1) = \begin{cases} (p - q)^{-1} & p > q, \\ \infty & p = q. \end{cases}$

Can we still define N when p < q?

If the walk is used to model gambling, the above conclusions amount to saying: the gambler is certain to have a positive net winning at some time if $p \ge 1/2$. If, however, $p < 1/2$, the chance that his net winning ever becomes positive is only p/q.


4.4.2 Returns to Origin

Consider the first return to the origin at trial n. Let A be the event that the first return to 0 occurs at trial n with $X_1 = -1$, and B the event that the first return to 0 occurs at trial n with $X_1 = 1$, so that $f_n = P(A) + P(B)$.

Note $\{-X_n\}$ is also a simple random walk, with steps equal to 1 with probability q rather than p. Hence the event B has a similar structure to the event A. Let

$\lambda_n^{(-1)} = P(-X_1 < 1, -X_2 < 1, \ldots, -X_{n-1} < 1, -X_n = 1).$

Then $\{\lambda_n^{(-1)}\}$ has the same generating function as that of $\{\lambda_n\}$ except with p and q switched. In addition, $P(B) = P(X_1 = 1)\lambda_{n-1}^{(-1)}$ and therefore, for $n \ge 1$,

$f_n = P(A) + P(B) = p\lambda_{n-1}^{(-1)} + q\lambda_{n-1}.$

Equivalently,

$F(s) = ps\Lambda^{(-1)}(s) + qs\Lambda(s) = ps\,\frac{1 - \sqrt{1 - 4pqs^2}}{2ps} + qs\,\frac{1 - \sqrt{1 - 4pqs^2}}{2qs} = 1 - \sqrt{1 - 4pqs^2}.$

The probability that the process ever returns to the origin is

$F(1) = \sum_{n=0}^{\infty} f_n = 1 - |p - q|,$

and so a return is certain only if p = q = 1/2. In this case, the mean time to return is

$F'(1) = \lim_{s \to 1} \frac{d}{ds}\left[1 - \sqrt{1 - s^2}\right] = \infty.$

Thus, if the game is fair and you have lost some money at the moment, we have good news for you: the chance that you will win back all your money is 1. The bad news is, the above result also tells you that, on average, you may not live that long to see it.

    4.4.3 Some Key Results in the Simple Random Walk

Symbol             Expression                                   Generating function
$u_{2n}$           $\binom{2n}{n}(pq)^n$                        $U(s) = (1 - 4pqs^2)^{-1/2}$
$f_{2n}$           $(2n-1)^{-1}\binom{2n}{n}(pq)^n$             $F(s) = 1 - (1 - 4pqs^2)^{1/2}$
$\lambda_{2n-1}$   $(2n-1)^{-1}\binom{2n-1}{n}p^n q^{n-1}$      $\Lambda(s) = (2qs)^{-1}[1 - (1 - 4pqs^2)^{1/2}]$


The following are the key steps in deriving the results in the above table.

$qs[\Lambda(s)]^2 - \Lambda(s) + ps = 0;$
$F(s) = 1 - [U(s)]^{-1};$
$F(s) = ps\Lambda^{(-1)}(s) + qs\Lambda(s);$
$\Lambda^{(2)}(s) = [\Lambda(s)]^2.$
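The table entries can be checked by simulation. The following Python sketch (added for illustration; the parameter choices are arbitrary) estimates $u_{2n}$, $f_{2n}$ and $\lambda_{2n-1}$ for n = 2 and compares them with the closed forms above:

```python
import math
import random

p = 0.6
q = 1 - p
N = 200_000   # number of simulated walks
n = 2         # check the n = 2 rows of the table (trials 4 and 3)

def walk(steps):
    """Return the path X_1, ..., X_steps of a simple random walk."""
    x, path = 0, []
    for _ in range(steps):
        x += 1 if random.random() < p else -1
        path.append(x)
    return path

u = f = lam = 0
for _ in range(N):
    path = walk(2 * n)
    if path[2 * n - 1] == 0:                               # X_{2n} = 0
        u += 1
        if all(x != 0 for x in path[:2 * n - 1]):          # first return
            f += 1
    if path[2 * n - 2] == 1 and all(x < 1 for x in path[:2 * n - 2]):
        lam += 1                                           # first passage at 2n-1

print(u / N, math.comb(2 * n, n) * (p * q) ** n)                           # u_{2n}
print(f / N, math.comb(2 * n, n) * (p * q) ** n / (2 * n - 1))             # f_{2n}
print(lam / N, math.comb(2 * n - 1, n) * p**n * q**(n - 1) / (2 * n - 1))  # lambda_{2n-1}
```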

    4.5 The Branching Process

Now let us study the second example of simple stochastic processes. Here we have particles that are capable of producing particles of like kind. Assume that all such particles act independently of one another, and that each particle has a probability $p_j$ of producing exactly j new particles, $j = 0, 1, 2, \ldots$, with $\sum_j p_j = 1$. For simplicity, we assume the 0th generation consists of a single particle, and the direct descendants of that particle form the first generation. Similarly, the direct descendants of the nth generation form the (n + 1)th generation.

(Figure: a family tree with $Z_0 = 1$, $Z_1 = 4$, $Z_2 = 5$, $Z_3 = 9$.)

Let $Z_n$ be the population of the nth generation, so that $Z_0 = 1$ and $P(Z_1 = j) = p_j$, $j = 0, 1, 2, \ldots$. Let $X_{ni}$ be the number of direct descendants of the ith individual in the nth generation. Hence, we have

$Z_{n+1} = \sum_{i=1}^{Z_n} X_{ni}$

for all $n \ge 1$. In addition, all $X_{ni}$ are independent and have the same distribution as that of $Z_1$.


4.5.1 Mean and Variance of $Z_n$

Let $H_n(s)$ be the probability generating function of $Z_n$ and let $\mu_n = E(Z_n)$; conditioning on the first generation gives $H_n(s) = G(H_{n-1}(s))$, where G is the family size generating function. Differentiating, and from (4.10), it follows that

$\mu_n = G'(H_{n-1}(1))H_{n-1}'(1) = \mu\mu_{n-1}, \quad n = 1, 2, \ldots$ (4.11)

where $\mu = G'(1)$ is the mean family size and we have used $H_{n-1}(1) = 1$. Since $\mu_0 = 1$, it follows from (4.11) that $\mu_n = \mu^n$. Thus, if $\mu > 1$, the average population size increases exponentially; if $\mu < 1$, it decreases exponentially.


4.5.2 Probability of Extinction

Let $q_n = P(Z_n = 0)$ be the probability that the population is extinct by the nth generation; conditioning on the first generation gives

$q_n = G(q_{n-1}).$ (4.12)

Note that $q_0 \le q_1 \le q_2 \le \cdots$ and $q_j \le 1$ for all j. Thus,

$q = \lim_{n \to \infty} q_n$

exists and represents the probability that the population ever becomes extinct. From (4.12), it follows that q is a fixed point of the probability generating function G(s); that is,

$q = G(q).$

This gives us the idea that we need only solve the equation $G(s) - s = 0$ to obtain the probability of extinction. However, when the equation has more than one solution, we need to know which one gives the probability of extinction.

Theorem 4.2

Let $\{Z_n\}_{n=0}^{\infty}$ be a branching process as specified in this section such that $Z_0 = 1$, and the family size generating function is given by G(s). Then the probability of extinction for this branching process, q, is the smallest solution of the equation

$s = G(s)$

in the interval [0, 1].

Proof: Assume that the smallest solution in [0, 1] is $q^*$; we want to show that $q = q^*$.

Let $q_n = P(Z_n = 0)$ for $n = 0, 1, \ldots$. Clearly, $q_0 = 0 \le q^*$. Assume that $q_k \le q^*$ for some k. Note that G(s) is an increasing function for $s \in [0, 1]$. Hence, $G(q_k) \le G(q^*) = q^*$. Consequently, $q_{k+1} = G(q_k) \le q^*$. This implies $q_n \le q^*$ for all n. Letting $n \to \infty$, we obtain $q \le q^*$. Since q is also a solution in [0, 1], and $q^*$ is the smallest such solution, we must have $q = q^*$.

In many situations, we do not have to solve the equation to determine the value of q. Let $h(s) = G(s) - s$. One obvious solution in [0, 1] is s = 1. Note that $h'(s) = G'(s) - 1$ and $h''(s) = G''(s) = \sum_{j=2}^{\infty} j(j-1)p_j s^{j-2} \ge 0$ when $s \in [0, 1]$. Thus, h(s) is a convex function. There are several possibilities:

1. If $h'(1) = G'(1) - 1 = \mu - 1 > 0$, the curve of h(s) goes down from s = 0 and then comes up to hit 0 at s = 1. Since $h(0) = P(X_{01} = 0) \ge 0$, the

  • 8/10/2019 Applied Probability Theory - J. Chen

    51/177

    46 CHAPTER 4. GENERATING FUNCTIONS

    curve crosses 0 line exactly once before s= 1. Since q is the smallest

    solution in [0, 1]. We must have q 0, then we are at the

    same situation as in case 2; On the other hand,h(0) =P(X01= 0) = 0

    implies the family size is fixed at 1, hence q= 0.

Remark: Despite the above summary, most students tend always to solve the equation to find the probability of ultimate extinction. This is often more work than is needed.

    Example 4.5


Lotka (see Feller 1968, page 294) showed that, to a reasonable approximation, the distribution of the number of male offspring in an American family was described by

$p_0 = 0.4825, \qquad p_k = (0.2126)(0.5893)^{k-1}, \quad k \ge 1,$

which is a geometric distribution with a modified first term. The corresponding probability generating function is

$G(s) = \dfrac{0.4825 - 0.0717s}{1 - 0.5893s}$

and $G'(1) = 1.261$. Thus, for example, in the 16th generation, the average population of male descendants of a single root ancestor is

$\mu_{16} = (1.261)^{16} = 40.685.$

The probability of extinction, however, is the smallest solution of

$q = \dfrac{0.4825 - 0.0717q}{1 - 0.5893q}.$

Thus, we find $q = 0.8197$. This suggests that for those names that do survive to the 16th generation, the average size is very much more than 40.685. (All the calculations are subject to the original round-off errors.)
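As a quick check of the arithmetic (a sketch, not part of the original notes), the same fixed-point iteration can be applied to Lotka's generating function:

    # Lotka's generating function; iterate q <- G(q) from 0, as in Theorem 4.2.
    G = lambda s: (0.4825 - 0.0717*s) / (1 - 0.5893*s)

    q = 0.0
    for _ in range(10_000):
        q = G(q)
    print(q)  # approximately 0.82, in agreement with the value above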

Example 4.6

From the point of view of epidemiology, it is more important to control the spread of a disease than to cure the infected patients. Suppose that the spread of a disease can be modeled by a branching process. Then it is very important to make sure that the average number of people infected by a patient is less than 1. If so, the probability of extinction will be one. However, even if the average number of people infected is larger than one, there is still a positive chance that the disease will become extinct.

A scientist at Health Canada analyzed the data from the SARS (severe acute respiratory syndrome) epidemic of 2003. It was noticed that many interesting phenomena could be partially explained by results from branching processes.


First, many countries imported SARS patients, but these did not cause epidemics. This can be explained by the fact that the probability of extinction is not small, even when the average number of people infected by a single patient is larger than 1.

Second, a few patients were nicknamed super-spreaders. They might simply correspond to the realizations of the branching process that do not become extinct.

Third, after government intervention, the average number of people infected by a single patient was substantially reduced. When it fell below 1, the epidemic was doomed to die out.

Finally, it was not cost effective to screen all airplane passengers; it was more effective to take strict and quick quarantine measures on new and old cases. Once the average number of people infected by a single patient falls below one, the disease will be controlled with probability one.

4.5.3 Some Key Results in the Branching Process

For simplicity, we assumed that the population starts from a single individual: $Z_0 = 1$; we also assumed that the numbers of offspring of the various individuals are independent and have the same distribution.

Under these assumptions, we have shown that

$\mu_n = \mu^n$

and

$\sigma_n^2 = \sigma^2 \mu^{n-1}\, \dfrac{\mu^n - 1}{\mu - 1}$ (for $\mu \ne 1$; $\sigma_n^2 = n\sigma^2$ when $\mu = 1$),

where $\mu$ and $\sigma^2$ are the mean and the variance of the family size, and $\mu_n$ and $\sigma_n^2$ are the mean and the variance of the size of the $n$th generation.

We have shown that the probability of extinction, $q$, is the smallest non-negative solution to

$G(s) = s$

where $G(s)$ is the probability generating function of the family size. Further, it is known that $q = 1$ when $\mu \le 1$ (excluding the degenerate case where the family size is identically 1), and $q < 1$ when $\mu > 1$.


These results can all be derived from the fact that

$H_n(s) = H_{n-1}(G(s))$

where $H_n(s)$ is the probability generating function of the population size of the $n$th generation.
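When the family size has finite support, this composition rule can be applied literally by composing polynomial coefficient arrays, which yields the whole distribution of $Z_n$. A minimal sketch (not from the notes; the family-size pmf is a hypothetical example):

    import numpy as np

    def compose(H, G):
        # Coefficients of H(G(s)), for polynomials stored as coefficient arrays [c0, c1, ...].
        result = np.array([0.0])
        power = np.array([1.0])            # holds G(s)^k, starting with k = 0
        for c in H:
            n = max(len(result), len(power))
            result = np.pad(result, (0, n - len(result))) + c * np.pad(power, (0, n - len(power)))
            power = np.convolve(power, G)  # G(s)^(k+1)
        return result

    G = np.array([0.25, 0.25, 0.5])  # hypothetical family size: p0, p1, p2
    H = np.array([0.0, 1.0])         # H_0(s) = s, since Z_0 = 1
    for _ in range(3):
        H = compose(H, G)            # H_{n+1}(s) = H_n(G(s))
    print(H[0])                      # P(Z_3 = 0), the probability of extinction by generation 3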

    4.6 Problems

1. Find the mean and variance of $X$ when

(a) $X$ has the Poisson distribution with $p(x) = \dfrac{\lambda^x}{x!} e^{-\lambda}$, $x = 0, 1, \ldots$.

(b) $X$ has the exponential distribution with $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$.

2. (a) If $X$ and $Y$ are exponentially distributed with rate $\lambda = 1$ and independent of each other, find the density function of $X + Y$.

(b) If $X$ and $Y$ are geometrically distributed with parameter $p$ and independent of each other, find the probability mass function of $X + Y$.

(c) Pick a typical discrete distribution and a typical continuous distribution (not discussed in class) and repeat questions (a) and (b).

3. Suppose that, given $N = n$, $X$ has the binomial distribution with parameters $n$ and $p$. Suppose also that $N$ has the Poisson distribution with parameter $\lambda$. Use the technique of generating functions to find

(a) the marginal distribution of $X$;

(b) the distribution of $N - X$.

4. Let $X_1, X_2, X_3, \ldots$ be independent and identically distributed random variables such that $X_1$ has probability mass function

$f(k) = P(X_1 = k) = p(1-p)^k, \qquad k = 0, 1, 2, \ldots.$

(a) Find the probability generating function of $X_1$.


(b) Let $I_n = 1$ if $X_n \ge n$ and $I_n = 0$ if $X_n < n$, for $n = 0, 1, 2, \ldots$. That is, $I_n$ is an indicator random variable. Show that the probability generating function of $I_n$ is given by

$H_n(s) = 1 + (s - 1)(1 - p)^n.$

(c) Let $N$ be a random variable with probability generating function $G(s)$ and assume it is independent of $X_1, X_2, \ldots$. Let $I_N = I_n$ when $N = n$, where $I_n$ is the indicator random variable defined in (b). Show that

$E[s^{I_N} \mid N] = H_N(s) = 1 + (s - 1)(1 - p)^N.$

Find the probability generating function of $I_N$.

5. A coin is tossed repeatedly, heads appearing with probability $p = 2/3$ on each toss.

(a) Let $X$ be the number of tosses until the first occasion by which two heads have appeared successively. Write down a difference equation for $f(k) = P(X = k)$. Assume that $f(0) = 0$.

(b) Show that the generating function of $f(k)$ is given by

$F(s) = \dfrac{4}{27} s^2 \left[ \dfrac{2}{1 - \frac{2}{3}s} + \dfrac{1}{1 + \frac{1}{3}s} \right].$

(c) Find an explicit expression for $f(k)$ and calculate $E(X)$.

6. Let $X$ and $Y$ be independent random variables with the negative binomial distribution and probability function

$p_i = \binom{-k}{i} p^k (p - 1)^i, \qquad i = 0, 1, \ldots.$

(a) Show that the probability generating function of $X$ is given by $G(s) = \dfrac{p^k}{(1 + (p-1)s)^k}$.

(b) Find the probability function of $X + Y$.

(c) Calculate $E(e^X)$ and $\mathrm{Var}(e^X)$; what condition on the size of $p$ is needed?


7. Give the sequences generated by the following:

1) $A(s) = (1 - s)^{-1.5}$;
2) $B(s) = (s^2 - s - 12)^{-1}$;
3) $C(s) = s \log(1 - \theta s^2) / \log(1 - \theta)$;
4) $D(s) = s/(5 + 3s)$;
5) $E(s) = (3 + 2s)/(s^2 - 3s - 4)$;
6) $F(s) = (p + qs)^n$.

8. Turn the following equation systems into equations in generating functions.

1) $b_0 = 1$; $b_j = b_{j-1} + 2a_j$, $j = 1, 2, \ldots$; $a_0 = 0$.

2) $b_0 = 0$, $b_1 = p$, $b_n = q \sum_{r=1}^{n-1} b_r b_{n-1-r}$, $n = 2, 3, \ldots$.

9. 1) Find the generating function of the sequence $a_j = j(j+1)$, $j = 0, 1, 2, \ldots$.

2) Find the generating function of the sequence $a_j = j/(j+1)$, $j = 0, 1, 2, \ldots$.

3) Let $X$ be a non-negative integer valued random variable and define $r_j = P(X \ge j)$. Find the generating function of $\{r_j\}$ in terms of the probability generating function of $X$.

10. 1) Negative binomial:

$p_j = \binom{-k}{j} (-p)^j (1 - p)^k, \qquad j = 0, 1, \ldots$

where $k > 0$ and $0 < p < 1$.


11. Find the probability generating function of the following distributions:

1. Discrete uniform on $0, 1, \ldots, N$.
2. Geometric.
3. Binomial.
4. Poisson.

12. Let $\{a_n\}$ be a sequence with generating function $A(s)$, $|s| < R$, $R > 0$. Find the generating functions of

1) $\{c + a_n\}$ where $c$ is a real number.
2) $\{c a_n\}$ where $c$ is a real number.
3) $\{a_n + a_{n+2}\}$.
4) $\{(n+1) a_n\}$.
5) $\{a_{2n}\} = \{a_0, 0, a_2, 0, a_4, \ldots\}$.
6) $\{a_{3n}\} = \{a_0, 0, 0, a_3, 0, 0, a_6, \ldots\}$.

13. Consider a usual branching process: let the population size of the $n$th generation be $X_n$ and the family size of the $i$th family in the $n$th generation be $Z_{n,i}$. Thus, $X_n = \sum_{i=1}^{X_{n-1}} Z_{n,i}$ and $X_0 = 1$. Assume the $Z_{n,i}$ are independent and identically distributed, and

$P(Z_{1,1} = 0) = \dfrac{1}{2} + a; \quad P(Z_{1,1} = 1) = \dfrac{1}{4} - 2a; \quad P(Z_{1,1} = 3) = \dfrac{1}{4} + a,$

for some $a$.

(a) Find the probability generating function of the family size. When $a = 1/8$, find the probability generating function of $X_2$.

(b) Find the range of $a$ such that the probability of extinction is less than 1.

(c) When $a = 1/8$, find the expectation and variance of the population size of the 5th generation and the probability of extinction.

14. For a branching process with family size distribution given by

$P_0 = 1/6, \quad P_2 = 1/3, \quad P_3 = 1/2,$

calculate the probability generating function of $Z_2$ given $Z_0 = 1$, where $Z_2$ is the population of the second generation. Find also the mean and variance of $Z_2$ and the probability of extinction. Repeat the same calculation when $Z_0 = 3$ and

$P_0 = 1/6, \quad P_1 = 1/2, \quad P_3 = 1/3.$

15. Let the probability $p_n$ that a family has exactly $n$ children be $\alpha p^n$ when $n \ge 1$, and $p_0 = 1 - \alpha p(1 + p + p^2 + \cdots)$. Assume that all $2^n$ sex sequences in a family of $n$ children have probability $2^{-n}$. Show that for $k \ge 1$, the probability that a family has exactly $k$ boys is $2\alpha p^k/(2 - p)^{k+1}$. Given that a family includes at least one boy, what is the probability that there are two or more boys?

16. Let $X_i$, $i \ge 1$, be independent uniform $(0, 1)$ random variables, and define $N$ by

$N = \min\{n : X_n < X_{n+1}\}$

where $X_0 = x$. Let $f(x) = E(N)$.

(a) Derive an integral equation for $f(x)$ by conditioning on $X_1$.

(b) Differentiate both sides of the equation derived in (a).

(c) Solve the resulting equation obtained in (b).

17. Consider a sequence defined by $r_0 = 0$, $r_1 = 1$ and $r_j = r_{j-1} + 2r_{j-2}$, $j \ge 2$. Find the generating function $R(s)$ of $\{r_j\}$ and determine $r_{25}$. For what region of $s$ values does the series for $R(s)$ converge?

18. Let $X_1, X_2, \ldots$ be independent random variables with common p.g.f. $G(s) = E(s^{X_i})$. Let $N$ be a random variable with p.g.f. $H(s)$, independent of the $X_i$'s. Let $T$ be defined as 0 if $N = 0$ and $\sum_{i=1}^{N} X_i$ if $N > 0$. Show that the p.g.f. of $T$ is given by $H(G(s))$. Hence find $E(T)$ and $\mathrm{Var}(T)$ in terms of $E(X)$, $\mathrm{Var}(X)$, $E(N)$ and $\mathrm{Var}(N)$.

19. Consider a branching process in which the family size distribution is Poisson with mean $\lambda$.


(a) Under what condition will the probability of extinction of the process be less than 1?

(b) Find the extinction probability when $\lambda = 2.5$ numerically.

(c) When $\lambda = 2.5$, find the expected size of the 10th generation, and the probability of extinction by the 5th generation. Comment on the relationship between this second number and the ultimate extinction probability obtained in (b).

20. Consider a branching process in which the family size distribution is geometric with parameter $p$. (The geometric distribution has p.m.f. $p_j = p(1-p)^j$, $j = 0, 1, \ldots$.)

(a) Under what condition will the probability of extinction of the process be less than 1?

(b) Find the probability of extinction when $p = 1/3$.

(c) When $p = 1/3$, find the expectation and variance of the size of the 10th generation and the probability of extinction by the 5th generation.

21. Let $\{Z_n\}_{n=0}^{\infty}$ be a usual branching process with $Z_0 = 1$. It is known that $P_0 = p$, $P_1 = pq$, $P_2 = q^2$ with $0 \le p \le 1$ and $q = 1 - p$.

1) Find a condition on the size of $p$ such that the probability of extinction is 0.

2) Find the range of $p$ such that the probability of extinction is smaller than 1. Calculate the probability of extinction when $p = 1/2$.

3) Calculate the mean and the variance of $Z_n$ when $p = 1/2$.

22. Let $X_1, X_2, \ldots$ be independent random variables with common p.g.f. $G(s) = E(s^{X_i})$. Let $N$ be a random variable with p.g.f. $H(s)$. Show that

$T = \begin{cases} \sum_{i=1}^{N} X_i & N \ge 1 \\ 0 & N = 0 \end{cases}$

has p.g.f. $H(G(s))$. Hence, find the mean and variance of $T$ in terms of the means and variances of $X_i$ and $N$. Remark: can you see the relevance of this problem to the usual branching process?


(b) Let $f_n = P(X_1 \ne 0, X_2 \ne 0, \ldots, X_{n-1} \ne 0, X_n = 0 \mid X_0 = 1)$ for $n = 1, 2, \ldots$ and $f_0 = 0$. It is known that the generating function of $f_n$ is given by

$F(s) = \dfrac{1 - \sqrt{1 - 4pqs^2}}{2ps}$

and that $\sqrt{1 - 4pq} = |p - q|$. Find the probability that 0 will ever be reached.

(c) Find the range of $p$ such that state 0 is recurrent.


    Chapter 5

    Renewal Events and Discrete

    Renewal Processes

    5.1 Introduction

Consider a sequence of trials that are not necessarily independent, and let $\varepsilon$ represent some property which, on the basis of the outcomes of the first $n$ trials, can be said unequivocally to occur or not to occur at trial $n$. By convention, we suppose that $\varepsilon$ has just occurred at trial 0, and $E_n$ represents the event that $\varepsilon$ occurs at trial $n$, $n = 1, 2, \ldots$.

We call $\varepsilon$ an event in renewal theory. However, it is not an event in the sense of probability models, in which events are subsets of the sample space. Taking the simple random walk $\{X_n\}$ as an example, we regard $X_n$ as the outcome of the $n$th trial. Thus, $\{X_n\}$ themselves are outcomes of a sequence of trials. An event $\varepsilon_1$ can be used to describe: the outcome $X_n$ is 0. That is, "$\varepsilon_1$ has just occurred at trial $n$" is the event

$E_n = \{X_n = 0\}$

for a given $n$. Similarly, another possible event $\varepsilon_2$ can be defined such that "$\varepsilon_2$ has just occurred at trial $n$" is the event

$E_n = \{X_n - X_{n-1} = 1, \; X_{n-1} - X_{n-2} = 1\}, \qquad n = 2, 3, \ldots.$


The events $E_0$ and $E_1$ have to be defined separately.

In general, if we have a well defined event $\varepsilon$, then we can easily describe the event $E_n$ for every $n$. If we have a complete description of the event $E_n$ for every $n$, the event $\varepsilon$ is then well defined. It is convenient to define $f_0 = 0$ and, for $n \ge 1$, $f_n = P(E_1^c E_2^c \cdots E_{n-1}^c E_n)$. Thus, $f_n$ is the probability that $\varepsilon$ occurs for the first time at trial $n$ (after trial 0).

We say that $\varepsilon$ is a renewal event if each time $\varepsilon$ occurs, the process undergoes a renewal or regeneration. That is to say, at the point when $\varepsilon$ occurs, the outcomes of the successive trials have the same stochastic properties as the outcomes of the successive trials started at time 0. In particular, the probability that $\varepsilon$ will next occur after $n$ additional trials is $f_n$, $n = 1, 2, \ldots$. Mathematically, it means

1. $P(E_{n+m} \mid E_n) = P(E_m \mid E_0)$;

2. $P(E_{n+m} E_{n+m-1}^c \cdots E_{n+2}^c E_{n+1}^c \mid E_n) = P(E_m E_{m-1}^c \cdots E_2^c E_1^c \mid E_0)$.

Another simple (but not rigorous) way to define a renewal event is: independently of the previous outcomes of the trials, once $\varepsilon$ occurs, the waiting time for the next occurrence of $\varepsilon$ has the same fixed distribution.

    Example 5.1

Consider a sequence of Bernoulli trials in which $P(S) = p$ and $P(F) = q$ with $p + q = 1$. Let $\varepsilon$ represent the event that trials $n-2$, $n-1$ and $n$ result respectively in $F$, $S$ and $S$. We shall say that $\varepsilon$ is the event $FSS$. It is clear that $\varepsilon$ is a renewal event. If $\varepsilon$ occurs at $n$, the process regenerates and the waiting time for its next occurrence has the same distribution as had the waiting time for the first occurrence.

Example 5.2

In the same situation as above, let $\varepsilon$ represent the event $SS$. That is, $\varepsilon$ is said to occur at trial $n$ if trials $n-1$ and $n$ both give $S$ as the outcome. In this case, $\varepsilon$ is not a renewal event; the occurrence of $\varepsilon$ does not constitute a renewal of the process. The reason is that if $\varepsilon$ has occurred at trial $n$, the chance it will recur at trial $n+1$ is $p$, but the chance that $\varepsilon$ occurs on the first trial is 0.


    Example 5.3

In most situations, the event of record breaking is not a renewal event. Consider, for instance, the record high temperature. The record always gets higher and so becomes harder to break. Thus, the waiting time for the next occurrence tends to become longer. Hence, it cannot be a renewal event.

    Example 5.4

The simple random walk provides a rich source of examples of renewal events. As before, we assume $X_0 = 0$ and $X_n = X_{n-1} + Z_n$, where $Z_n = +1$ or $-1$ with respective probabilities $p$ and $q$, independently, $n = 1, 2, \ldots$.

a) Let $\varepsilon$ represent return to the origin. Then $\varepsilon$ is a renewal event. In fact, the notation that we used in our analysis of the simple random walk will motivate our choice of notation for recurrent events as introduced in the next section.

b) Let $\varepsilon$ represent a ladder point in the walk. By this we mean that $\varepsilon$ occurs at trial $n$ if

$X_n = \max\{X_0, X_1, \ldots, X_{n-1}\} + 1$

and we assume $\varepsilon$ to have occurred at trial 0. Thus, the first occurrence of $\varepsilon$ corresponds to first passage through 1, the second occurrence of $\varepsilon$ corresponds to first passage through 2, and so on. Here again, $\varepsilon$ is a renewal event, since each ladder point corresponds to a regeneration of the process.

c) As a final example, suppose that $\varepsilon$ is said to occur at trial $n$ if the number of positive values in $Z_1, \ldots, Z_n$ is exactly twice the number of negative values. Equivalently, $\varepsilon$ occurs at trial $n$ if and only if $X_n = n/3$.

    5.2 The Renewal and Lifetime Sequences

Let $\varepsilon$ represent a renewal event and, as before, define the lifetime sequence $\{f_n\}$ where $f_0 = 0$ and

$f_n = P\{\varepsilon \text{ occurs for the first time at trial } n\}, \qquad n = 1, 2, \ldots.$


In like manner, we define the renewal sequence $\{u_n\}$, where $u_0 = 1$ and

$u_n = P\{\varepsilon \text{ occurs at trial } n\}, \qquad n = 1, 2, \ldots.$

Let $F(s) = \sum f_n s^n$ and $U(s) = \sum u_n s^n$ be the generating functions of $\{f_n\}$ and $\{u_n\}$. Note that

$f = \sum f_n = F(1) \le 1,$

since $f$ has the interpretation of the probability that $\varepsilon$ recurs at some time in the sequence. Since the event may not occur at all, it is possible for $f$ to be less than 1. Clearly, $1 - f$ represents the probability that $\varepsilon$ never recurs in the infinite sequence of trials.

If $f < 1$, the renewal event is called transient. When $f = 1$, $\{f_n\}$ is a proper probability distribution (the distribution of the time between successive occurrences), and $F(s)$ defined earlier is the corresponding probability generating function. A renewal event with $f = 1$ is called recurrent.

For a recurrent event, $F(s)$ is a probability generating function. The mean inter-occurrence time is

$\mu = F'(1) = \sum_{n=0}^{\infty} n f_n.$

If $\mu < \infty$, we say that $\varepsilon$ is positive recurrent. If $\mu = \infty$, we say that $\varepsilon$ is null recurrent.

Finally, if $\varepsilon$ can occur only at $n = t, 2t, 3t, \ldots$ for some positive integer $t > 1$, we say that $\varepsilon$ is periodic with period $t$. More formally, let $t = \text{g.c.d.}\{n : f_n > 0\}$ (g.c.d. stands for the greatest common divisor). If $t > 1$, the recurrent event $\varepsilon$ is said to be periodic with period $t$. If $t = 1$, $\varepsilon$ is said to be aperiodic.

Note that even if the first few $f_n$ values are zero, the renewal event can still be aperiodic. Many students believe that if $f_1 = f_2 = 0$, the period of the renewal event must be at least 3. This is wrong. The renewal event can still be aperiodic if, say, $f_8 > 0$ and $f_{11} > 0$: the greatest common divisor of 8 and 11 is one, and no additional information is needed.

Another remark: suppose $f_i > 0$ and $f_j > 0$ for some integers $i$ and $j$. If, in addition, $i$ and $j$ are mutually prime, then the greatest common divisor of the set $\{i, j, \text{any additional numbers}\}$ is 1. That is, we already know that the period is 1; there is no need to look further.

To show that the greatest common divisor is some $t$ larger than 1, we have to make sure that $f_n = 0$ whenever $n$ is not a multiple of $t$. This is much harder in general.

In the simple random walk, the renewal event of returning to zero has period 2. This is because $f_n > 0$ only if you lose and win an equal number of games in a total of $n$ games; thus $n$ must be even when $f_n > 0$. The period is $t = 2$, rather than anything larger, because $f_2 > 0$, so the greatest common divisor cannot be larger than 2.
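A quick way to check the period from the support of $\{f_n\}$ (a sketch, not part of the original notes):

    from math import gcd
    from functools import reduce

    def period(support):
        # Period of a renewal event: the g.c.d. of {n : f_n > 0}.
        return reduce(gcd, support)

    print(period([8, 11]))       # 1: aperiodic, as in the remark above
    print(period([2, 4, 6, 8]))  # 2: e.g. returns to the origin in the simple random walk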


    5.3 Some Properties

For a renewal event $\varepsilon$ to occur at trial $n \ge 1$, either $\varepsilon$ occurs for the first time at $n$ with probability $f_n = f_n u_0$, or $\varepsilon$ occurs for the first time at some intermediate trial $k < n$ and then occurs again at $n$. The probability of this latter event is $f_k u_{n-k}$. Noticing that $f_0 = 0$, we therefore have

$u_n = f_0 u_n + f_1 u_{n-1} + \cdots + f_{n-1} u_1 + f_n u_0, \qquad n = 1, 2, \ldots.$

This equation is called the renewal equation.

Using the typical generating function methodology, we get

$U(s) - 1 = F(s) U(s).$

Hence

$U(s) = \dfrac{1}{1 - F(s)} \qquad \text{or} \qquad F(s) = 1 - \dfrac{1}{U(s)}.$

Recall that when we discussed the simple random walk, we found in that context

$U(s) = (1 - 4pqs^2)^{-1/2}, \qquad F(s) = 1 - \sqrt{1 - 4pqs^2}.$

It is simple to see that this relationship holds between them.
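The renewal equation also gives a direct way to compute $\{u_n\}$ from $\{f_n\}$ numerically. A minimal sketch (not from the notes; the lifetime distribution below is a hypothetical example):

    def renewal_sequence(f, N):
        # u_0 = 1 and u_n = sum_{k=1}^{n} f_k u_{n-k}, the renewal equation.
        u = [1.0]
        for n in range(1, N + 1):
            u.append(sum(f.get(k, 0.0) * u[n - k] for k in range(1, n + 1)))
        return u

    # Hypothetical lifetime distribution: f_1 = f_2 = 1/2, so mu = 1.5.
    u = renewal_sequence({1: 0.5, 2: 0.5}, 20)
    print(u[-1])  # approaches 1/mu = 2/3, as the renewal theorem below predicts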

The concepts defined in the last section are all related to the $\{u_n\}$ sequence, and we summarize this in the following.

    Theorem 5.1

The renewal event $\varepsilon$ is

1. transient if and only if $u = \sum u_n = U(1) < \infty$;

2. recurrent if and only if $u = \infty$;

3. periodic with period $t$ if $t = \text{g.c.d.}\{n : u_n > 0\}$ is greater than 1, and aperiodic if $t = 1$;

4. null recurrent if and only if $\sum u_n = \infty$ and $u_n \to 0$ as $n \to \infty$.


Proof. 1 and 2:

$u = \sum_{n=0}^{\infty} u_n = \lim_{s \uparrow 1} U(s) = \lim_{s \uparrow 1} [1 - F(s)]^{-1}.$

It follows that $u < \infty$ when $f = F(1) < 1$ and $u = \infty$ when $f = 1$. The event is transient in the former case and recurrent in the latter.

3. If $\varepsilon$ has period $d > 1$, then $F(s) = \sum f_n s^n$ contains only powers of $s^d$. Since

$U(s) = [1 - F(s)]^{-1} = 1 + F(s) + F^2(s) + \cdots,$

it follows that $U(s) = \sum u_n s^n$ contains only powers of $s^d$, and so $t = \text{g.c.d.}\{n : u_n > 0\} = \text{g.c.d.}\{md : u_{md} > 0\}$ is such that $d \mid t$. But since $u_n = 0$ implies that $f_n = 0$, it follows that $t \mid d$. Hence $t = d$.

4. This result will follow from the renewal theorem.

The following is the famous renewal theorem.

Theorem 5.2 (The renewal theorem)

Let $\varepsilon$ be a recurrent and aperiodic renewal event and let

$\mu = \sum n f_n = F'(1)$

be the mean inter-occurrence time. Then

$\lim_{n \to \infty} u_n = \dfrac{1}{\mu}.$

Proof: See Feller (1968, page 335).

When $\mu = \infty$, which implies $\varepsilon$ is null recurrent, then $1/\mu = 0$. This proves 4) in the last theorem.
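For the symmetric simple random walk, returns to the origin illustrate null recurrence concretely: recall that $u_{2n} = \binom{2n}{n}(pq)^n$, which with $p = q = 1/2$ tends to 0 even though $\sum u_n = \infty$. A small numerical sketch (not from the notes):

    from math import comb

    # Symmetric simple random walk: u_{2n} = C(2n, n) / 4^n -> 0 while sum u_n diverges,
    # so the event "return to the origin" is null recurrent (Theorem 5.1, part 4).
    for n in (10, 100, 1000):
        print(2*n, comb(2*n, n) / 4**n)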

For a recurrent periodic renewal event $\varepsilon$, we might be able to re-scale the time unit and then make use of this theorem. Suppose that $\varepsilon$ has period $d > 1$. We can define a new sequence of trials so that each new trial is a combination of $d$ original trials. That is, if the outcomes of the original trials are $X_1, X_2, \ldots$, define

$Y_{m+1} = (X_{md+1}, X_{md+2}, \ldots, X_{(m+1)d}).$

The new sequence $\{Y_1, Y_2, \ldots\}$ can also be used to define the renewal event $\varepsilon$. However, in this case, $\varepsilon$ becomes aperiodic and the theorem can then be applied.


    5.4 Delayed Renewal Events

In a simple random walk with $X_0 = 0$, the event of the walk returning to 0 is a renewal event. When $X_n = 0$ for some $n$, the process renews itself: it behaves as if we have just observed $X_0 = 0$, and we can re-set the clock back to 0. More specifically, if $X_{10} = 0$, then $\{X_{10} = 0, X_{10+1}, X_{10+2}, \ldots\}$ is stochastically the same as $\{X_0 = 0, X_1, X_2, \ldots\}$. However, let $\varepsilon$ be the event $X_n = 1$; then $\varepsilon$ is not a renewal event. When $X_5 = 1$, $\{X_5, X_6, \ldots\}$ does not behave the same as $\{X_0, X_1, \ldots\}$. Hence, we cannot re-set the clock back to 0 and pretend that nothing happened.

If we observe that $X_5 = 1$ and $X_{19} = 1$, then $\{X_{19+0} = 1, X_{19+1}, \ldots\}$ will have the same stochastic properties as the system $\{X_5 = 1, X_{5+1}, X_{5+2}, \ldots\}$. Hence, the event does not renew itself when it first occurs, but after its first occurrence, each future occurrence of $\varepsilon$ renews the process to the time when it first occurred. Such events are called delayed renewal events.

The main difference between delayed renewal events and the usual renewal events is that the waiting time for the first occurrence of $\varepsilon$ has a different distribution from the distribution of the inter-occurrence times. An informal way to describe a delayed renewal event is: we missed the beginning and started from the middle of the sequence.

Suppose that $\varepsilon$ is a delayed renewal event. Let us define some quantities:

1) $\{b_n\}$: the probability that $\varepsilon$ first occurs on trial $n$, $n = 0, 1, 2, \ldots$;

2) $\{f_n\}$: the probability that $\varepsilon$ first occurs again $n$ trials after an occurrence of $\varepsilon$;

3) $\{u_n\}$: the probability that $\varepsilon$ occurs on trial $n$, given that $\varepsilon$ occurred on trial 0;

4) $\{v_n\}$: the unconditional probability that $\varepsilon$ occurs on trial $n$.

By convention, we suppose that $f_0 = 0$, but we do allow $b_0 > 0$, so that $\varepsilon$ may occur for the first time at trial 0. Let $B(s)$, $F(s)$, $U(s)$ and $V(s)$ be the corresponding generating functions. We have

$U(s) = [1 - F(s)]^{-1}, \qquad |s| < 1.$


$\varepsilon$ is recurrent if $f = \sum f_n = 1$ and transient if $f < 1$. Periodicities are determined by examining $\text{g.c.d.}\{n : f_n > 0\}$. Note that it is possible for $\varepsilon$ to be a recurrent event and yet have a non-zero probability of never occurring; but once it does occur, it then occurs infinitely often.

To find $V(s)$, note that when $\varepsilon$ occurs at trial $n \ge 1$, either $\varepsilon$ occurs for the first time at $n$ with probability $b_n = b_n u_0$, or $\varepsilon$ occurs for the first time at some intermediate trial $k < n$ and then occurs again at $n$. Thus,

$v_n = b_0 u_n + b_1 u_{n-1} + \cdots + b_n u_0, \qquad n = 0, 1, 2, \ldots.$

We recognize the right side as the convolution of $\{b_n\}$ with $\{u_n\}$, and so

$V(s) = B(s) U(s), \qquad |s| < 1.$
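Numerically, $\{v_n\}$ is just this convolution. A minimal sketch reusing the renewal_sequence helper from the sketch in Section 5.3 (the delay and lifetime distributions below are hypothetical examples):

    def delayed_occurrence(b, f, N):
        # v_n = sum_{k=0}^{n} b_k u_{n-k}, with u computed from the ordinary renewal equation.
        u = renewal_sequence(f, N)  # helper defined in the Section 5.3 sketch
        return [sum(b.get(k, 0.0) * u[n - k] for k in range(n + 1)) for n in range(N + 1)]

    # Hypothetical delay distribution b and lifetime distribution f.
    v = delayed_occurrence({0: 0.2, 1: 0.8}, {1: 0.5, 2: 0.5}, 20)
    print(v[-1])  # also approaches 1/mu = 2/3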


    5.5 Summary

Table 5.1: Summary of some concepts

Event: a property of a stochastic process; its occurrence or non-occurrence can be determined after $n$ trials.

Renewal Event: when this type of event occurs, the stochastic process undergoes a renewal: the random behavior of the process from this point on is the same as that of the process from time zero.

Delayed Renewal Event: from its second occurrence onward, each occurrence of this type of event renews the process: the random behavior of the process from this point on is the same as that of the process from the time of the first occurrence.

Recurrent: the renewal event will occur with probability 1.

Transient: the renewal event may never occur.

Positive Recurrent: the renewal event is recurrent and the expected waiting time for the next occurrence is finite.

Null Recurrent: the renewal event is recurrent but the expected waiting time for the next occurrence is infinite.

Period: the greatest common divisor of the numbers of trials after which the renewal event can occur.

Aperiodic: the period of the renewal event is 1.

    5.6 Problems

1. A fair die is rolled repeatedly. We keep a record of the score of each roll