
    Stat333 Lecture Notes

    Applied Probability Theory

    Jiahua Chen

    Department of Statistics and Actuarial Science

    University of Waterloo

© Jiahua Chen

    Fall, 2003


    Course Outline

    Stat333

Review of basic probability. Generating functions and their applications. Simple random walk, branching process and renewal events. Discrete time Markov chain. Poisson process and continuous time Markov chain. Queueing theory and renewal processes.


Contents

1 Introduction
1.1 Probability Model
1.2 Conditional Probabilities and Independence
1.3 Bayes Formula
1.4 Key Facts
1.5 Problems

2 Random Variables
2.1 Random Variable
2.2 Discrete Random Variables
2.3 Continuous Random Variables
2.4 Expectations
2.5 Joint Distribution
2.6 Independence
2.7 Formulas for Expectations
2.8 Key Results and Concepts
2.9 Problems

3 Conditional Expectation
3.1 Introduction
3.2 Formulas
3.3 Comment
3.4 Problems

4 Generating Functions
4.1 Introduction
4.2 Probability Generating Functions
4.3 Convolution
4.3.1 Key Facts
4.4 The Simple Random Walk
4.4.1 First Passage Times
4.4.2 Returns to Origin
4.4.3 Some Key Results in the Simple Random Walk
4.5 The Branching Process
4.5.1 Mean and Variance of Z_n
4.5.2 Probability of Extinction
4.5.3 Some Key Results in the Branching Process
4.6 Problems

5 Renewal Events
5.1 Introduction
5.2 The Renewal and Lifetime Sequences
5.3 Some Properties
5.4 Delayed Renewal Events
5.5 Summary
5.6 Problems

6 Discrete Time MC
6.1 Introduction
6.2 Chapman-Kolmogorov Equations
6.3 Classification of States
6.4 Limiting Probabilities
6.5 Mean Time Spent in Transient States
6.6 Problems

7 Exponential and Poisson
7.1 Definition and Some Properties
7.2 Properties of Exponential Distribution
7.3 The Poisson Process
7.3.1 Inter-arrival and Waiting Time Distributions
7.4 Further Properties
7.5 Conditional Distribution of the Arrival Times
7.6 Problems

8 Continuous Time Markov Chain
8.1 Birth and Death Process
8.2 Kolmogorov Differential Equations
8.3 Limiting Probabilities
8.4 Problems

9 Queueing Theory
9.1 Cost Equations
9.2 Steady-State Probabilities
9.3 Exponential Model
9.4 Single Server
9.5 Network of Queues
9.5.1 Open System
9.5.2 Closed Systems
9.6 Problems

10 Renewal Process
10.1 Distribution of N(t)
10.2 Limiting Theorems and Their Applications
10.3 Problems

11 Sample Exam Papers
11.1 Quiz 1: Winter 2003
11.2 Quiz 2: Winter 2003
11.3 Final Exam: Winter 2003


    Chapter 1

    Introduction

    1.1 Probability Model

A probability model consists of three parts: a sample space, a collection of events, and a probability measure.

Assume an experiment is to be done. The set of all possible outcomes is called the Sample Space. For example, if we roll a die, $\{1, 2, 3, 4, 5, 6\}$ is the sample space. We use the notation S for the sample space. Every element of S is called a sample point. Mathematically, the sample space is merely an arbitrary set; there is no need for a corresponding experiment.

Roughly speaking, every subset of S is an event. The collection of events is then all possible subsets of S. In some cases, however, we only admit a specific class of subsets of S as events. We do not discuss this point in this course.

    For every event, we assign a probability to it. To make it meaningful,

    we have to maintain some internal consistency. That is, the assignment is

    required to have some properties. The following conditions are placed on

    assigning probabilities.

Axioms of Probability Measure

A probability measure P is a function of events such that:

1. $0 \le P(E) \le 1$ for any event E;


    2. P(S) = 1;

3. $P(\cup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} P(E_i)$ for any mutually exclusive events $E_i$, $i = 1, 2, \ldots$; i.e., $E_i E_j = \emptyset$ for all $i \neq j$.

Mathematically, the above definition does not depend on the hypothetical experiment. A probability model consists of a sample space S, a $\sigma$-algebra B (a collection of subsets of S with some properties), and a probability measure P.

The axioms for a probability model imply that the probability measure has many other properties not explicitly stated as axioms. For example, since $P(\emptyset) = P(\emptyset \cup \emptyset) = P(\emptyset) + P(\emptyset)$, we must have $P(\emptyset) = 0$.

Let $E^c$ be the complement of the event E, which consists of all sample points that do not belong to E. Axioms 2 and 3 imply that

$1 = P(S) = P(E \cup E^c) = P(E) + P(E^c).$

Hence, $P(E^c) = 1 - P(E)$. For any two events $E_1$ and $E_2$, we have

$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 E_2).$

In general,

$P(\cup_{i=1}^{n} E_i) = \sum_i P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots + (-1)^{n+1} P(E_1 E_2 \cdots E_n).$

Example 1.1

(The matching problem.) Suppose n letters are placed at random into n envelopes, one letter per envelope. What is the probability that no letter is placed into its own envelope? Let $A_i$ be the event that the ith letter is placed into its own envelope.


Using the classical definition of the probability measure (which satisfies the three Axioms), $P(A_1) = \frac{(n-1)!}{n!}$, $P(A_1 A_2) = \frac{(n-2)!}{n!}$, and so on. We get

$P(\cup_i A_i) = 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}.$

The answer to the question is then

$1 - P(\cup_i A_i) = 1 - \left[1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n+1}\frac{1}{n!}\right].$

The limit as $n \to \infty$ is then $\exp(-1)$.
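This limit is easy to check numerically. Below is a minimal Python sketch (added for illustration; the function names are my own) that estimates the no-match probability by simulation and compares it with the inclusion-exclusion value above and with $\exp(-1)$:

```python
import math
import random

def no_match_probability(n, trials=100_000):
    """Estimate P(no letter ends up in its own envelope) by shuffling."""
    count = 0
    for _ in range(trials):
        perm = list(range(n))
        random.shuffle(perm)
        if all(perm[i] != i for i in range(n)):
            count += 1
    return count / trials

def no_match_exact(n):
    """Exact value from the inclusion-exclusion formula: sum (-1)^k / k!."""
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

print(no_match_probability(10))   # close to 0.3679
print(no_match_exact(10))         # 0.3678794...
print(math.exp(-1))               # 0.3678794...
```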

    1.2 Conditional Probabilities and Independence

Two events A and B are independent if and only if

$P(AB) = P(A)P(B).$

Some people may have a probabilistic instinct for why this relationship describes independence, and why our notion of independence implies this relationship. However, once the notion of independence is defined as above, this relationship serves as our gold standard. We always try to verify it, whether we work on assignment problems or on applications of the concept of independence. For instance, to test whether being a smoker is independent of having heart disease, we check whether the above relationship is true by collecting data on these incidents.

A sequence of events $A_1, \ldots, A_n$ are independent of each other if and only if

$P(\cap_{i \in I} A_i) = \prod_{i \in I} P(A_i)$

for all subsets I of $\{1, 2, \ldots, n\}$. We would like to emphasize that pairwise independence does not imply overall independence. For example, toss two fair coins and let A = {the first coin shows heads}, B = {the second coin shows heads}, C = {the two coins show the same face}. Any two of these events are independent, yet $P(ABC) = 1/4 \neq 1/8 = P(A)P(B)P(C)$.


Let E and F be two events with P(F) > 0. We define the conditional probability of E given F by

$P(E|F) = P(EF)/P(F).$

As already defined, two events E and F are independent if and only if $P(EF) = P(E)P(F)$. When the events E and F are independent, we find

$P(E|F) = P(E)$

when P(F) > 0. However, we should not use this relationship as the definition of independence. When P(F) = 0, the conditional probability is not defined, but E and F can still be two independent events.

    1.3 Bayes Formula

Let $F_i$, $i = 1, 2, \ldots, n$ be mutually exclusive events such that $\cup_i F_i = S$, and let P(E) > 0. Then

$P(F_k|E) = \frac{P(EF_k)}{P(E)} = \frac{P(E|F_k)P(F_k)}{\sum_i P(E|F_i)P(F_i)}.$

The Bayes formula is a mathematical consequence of the definition of conditional probability. However, this formula has generated a lot of thinking in statistics. We can think of E as an event (a subset of the sample space) of some experiment to be done, and of the $F_i$'s as classifying the sample points of the same experiment according to a possibly different rule (than the rule of E). Somehow, E is readily observed, but the $F_i$'s are not. Before the experiment is done, we may have some prior information on what the probabilities of the $F_i$'s are. When the experiment is done and the outcome (the sample point) is known to belong to E, but its membership in the $F_i$'s remains unknown, the Bayes formula allows us to update our assessment of the chance of each $F_i$ in view of the occurrence of E. For example, before we toss a die, it is known that the chance of observing a 2 is 1/6. After the die is tossed, if you are told that the outcome is an even number, then the conditional probability becomes 1/3.

    Here is a less straightforward example.


    Example 1.2

There are three coins in a box: 1. two-headed; 2. fair; 3. biased with P(H) = 0.75.

When one of the coins is selected at random and flipped, it shows heads. What is the probability that it is the two-headed coin?

Solution: Let $C_1$, $C_2$ and $C_3$ represent the events that the two-headed, fair or biased coin is selected, respectively. We want to find $P(C_1|H)$:

$P(C_1|H) = \frac{P(H|C_1)P(C_1)}{\sum_{i=1}^{3} P(H|C_i)P(C_i)}.$

The answer is 4/9.

Remark: It is not so important to memorize the Bayes formula as it is to remember the definition of conditional probability. Once you understand conditional probability, you can work out the formula easily.
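The computation in Example 1.2 is easy to verify. A minimal Python sketch (added for illustration):

```python
# Uniform prior over the three coins; P(H | coin) is 1, 0.5 and 0.75.
priors = [1/3, 1/3, 1/3]
likelihoods = [1.0, 0.5, 0.75]   # P(H | C1), P(H | C2), P(H | C3)

evidence = sum(p * l for p, l in zip(priors, likelihoods))
posterior = [p * l / evidence for p, l in zip(priors, likelihoods)]
print(posterior[0])   # 0.4444... = 4/9
```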

    1.4 Key Facts

A probability space consists of three components: the sample space, the collection of events, and the probability measure. The probability measure satisfies three Axioms, from which we introduce the concepts of conditional probability and independence. The Bayes theorem is a simple consequence of manipulating the idea of conditional probability. However, the result incited philosophical debate in statistics.

    1.5 Problems

1. Suppose that in an experiment, a fair die is rolled twice. Let A = {the first outcome is even}, B = {the total score is 4}, C = the total score, D = the absolute difference between the two scores.

(a) Which of A, B, C, D are events? Which of them are random variables?

(b) Which of the following make sense? Which of them do not?

(i) $A \cup B$, (ii) P(C), (iii) E(A), (iv) Var(D).


2. Let S be the sample space of a particular experiment, A and B be events, and P be a probability measure. Which of the following are Axioms, which are definitions, and which are formulas?

(i) $P(A \cup B) = P(A) + P(B) - P(AB)$.
(ii) P(S) = 1.
(iii) $P(A|B) = P(AB)/P(B)$ when $P(B) \neq 0$.

3. Using only the axioms of probability, show that

1) $P(A \cup B) = P(A) + P(B) - P(AB)$;
2) $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC)$.

4. a) Prove that $P(ABC) = P(A|BC)P(B|C)P(C)$.
b) Prove that if A and B are independent, then so are $A^c$ and $B^c$.

5. Let A and B be two events.

(a) Show that, in general, if A and B are mutually exclusive, then they are not necessarily independent.

    (b) Find a particular pair of events A and B such that they are both

    mutually exclusive and independent.

6. Prove Boole's inequalities:

(a) $P(\cup_{i=1}^{n} A_i) \le \sum_{i=1}^{n} P(A_i)$, (b) $P(\cap_{i=1}^{n} A_i) \ge 1 - \sum_{i=1}^{n} P(A_i^c)$.

7. Let $A_1 \supseteq A_2 \supseteq \cdots$ be a sequence of events. If $\cap_{i=1}^{\infty} A_i = \emptyset$ (empty), show that

$\lim_{n \to \infty} P(A_n) = 0.$


    Chapter 2

    Random Variables

    2.1 Random Variable

In practice, we may describe the outcomes of an experiment by any terminology. For example, if Mary and Paul compete in a game, the outcomes can be: Mary wins; Mary loses; it is a draw.

However, it is more convenient in mathematics to code the outcomes by numbers. For example, we may define the outcome as 1 if Mary wins, as $-1$ if Mary loses, and as 0 if it is a draw. That is, we can transform the outcomes in S into numbers. There are many ways to transform the outcomes.

In probability theory, we call the mechanism of transforming sample points into numbers a Random Variable. More formally, we define a random variable as a function on the sample space S.

    We use capital letters X, Y, and so on for random variables.

In most applications, we focus mainly on the values of the function (the random variable). That is why it appears that random variables are numbers, rather than mechanisms for transforming sample points into numbers. As a function, a random variable is totally deterministic. There is nothing random about it. However, the input of this function is random, and this implies that the outcome of the transformation is random. This is how we get the notion that random variables are random.

    Example 2.1


Let S be the set of ordered outcomes of rolling two fair dice. Define X to be the sum of the two outcomes. If $\omega = (2, 5)$, which is a sample point, then $X(\omega) = 7$. Nothing is random here. Since in a specific experiment we are not certain in advance whether the two outcomes will be $\omega = (2, 5)$, we hence do not know whether the outcome of X will be 7. This gives us the illusion of X being random. Its randomness is inherited from the randomness of the outcome in S.

When we use the notation X = 7, we often do not mean that the outcome of X is 7 in a specific experiment. Rather, we define

{X = 7} = the set of sample points which make X equal 7.

Hence, in this example,

$\{X = 7\} = \{(1, 6), (2, 5), \ldots, (6, 1)\},$

which is a subset of S. Consequently, it is an event. When the dice are fair, the classical definition assigns a probability of 1/6 to this event.

If the dice are not fair, we usually assign a different value to it, or we may not know what value is most suitable in this application. However, we believe that a suitable value exists, and this has no effect on the definition of X.

There is another excuse for not focusing on the fact that a random variable X is a function. We care more about the probabilities associated with events of the form $\{X \le x\}$ than about how X maps S into the real numbers. Once $P(X \le x)$ is available for all real numbers x, we classify X according to the form of this function, and ignore X itself.

    Example 2.2

Toss a coin until the first head appears. Suppose in each trial, P(H) = p, and trials are independent. Define X = the number of tosses when the experiment is completed.

In this experiment, the sample space is

$S = \{H, TH, TTH, \ldots\}.$


The corresponding values of X are

$\{1, 2, 3, \ldots\}.$

We find

$P(X = n) = p(1 - p)^{n-1}$

for all $n \ge 1$. Once this is done, we say X has the geometric distribution. How this X is defined becomes irrelevant.
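A small Python sketch of Example 2.2 (added for illustration) compares the empirical frequencies with $p(1-p)^{n-1}$:

```python
import random
from collections import Counter

def tosses_until_first_head(p):
    """Run the experiment of Example 2.2 once and return X."""
    n = 1
    while random.random() >= p:   # tail with probability 1 - p
        n += 1
    return n

p, trials = 0.3, 100_000
counts = Counter(tosses_until_first_head(p) for _ in range(trials))
for n in range(1, 6):
    print(n, counts[n] / trials, p * (1 - p) ** (n - 1))
```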

If X is a random variable, we call

$F(x) = P(X \le x)$

the cumulative distribution function (c.d.f.). It is known that F(x) is a c.d.f. of some random variable in some probability model if and only if

1. $F(x) \ge 0$;
2. $F(\infty) = 1$, $F(-\infty) = 0$;
3. F(x) is monotone increasing and right continuous.

That is, we can construct a sample space together with a probability measure and a random variable, so that the cumulative distribution function of this random variable is given by F(x).

    2.2 Discrete Random Variables

    If the set of all possible outcomes of a random variable X is countable, then

    we say that the random variable X is discrete.

For example, if a random variable can only take values $\{0.2, 0.5, 2, \pi\}$, it is discrete. The discrete random variables more commonly seen in our textbooks take integer values. However, we should remember that a discrete random variable can take any values, as long as the set of possible values remains countable.

By the way, the notion of countable needs to be clarified. If we can find a one-to-one map from a set into the set of integers, then this set is countable. The set of all even numbers is countable. The set of numbers


$\{1, 0.1, 0.01, 0.001, \ldots\}$ is also countable. Being countable implies that we can arrange the elements of the set into a sequence. We often represent a countable set of real numbers as $\{x_1, x_2, \ldots\}$. If $\{t_1, t_2, t_3, \ldots\}$ is the set of possible outcomes of X, we call the function

$f(t_i) = P(X = t_i)$

the probability (mass) function (p.m.f.) of X.

Note that in this definition, I used the notation $t_i$ for the possible values of the random variable X. Although it is common practice to use the $x_i$'s for the possible values of X, this is not a requirement. It is very important for us to make a distinction between (the notation for) the possible values of X, and X itself.

    2.3 Continuous Random Variables

If the c.d.f. of a random variable, $F(x) = P(X \le x)$, can be written as

$F(x) = \int_{-\infty}^{x} f(t)\,dt$

for some non-negative f(t), we say X is absolutely continuous. We have $f(x) = dF(x)/dx$ (for almost all x), and f(x) is called the density function of X.

We classify random variables according to their cumulative distribution functions, probability functions or density functions. We usually do not mind how these random variables are defined.

    Example 2.3

1. X has the Binomial (n, p) distribution if $f(i) = P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}$ for $i = 0, 1, \ldots, n$.

2. X has the Poisson ($\lambda$) distribution if

$f(i) = P(X = i) = \frac{\lambda^i}{i!} \exp(-\lambda)$

for $i = 0, 1, 2, \ldots$.


3. X has the uniform [0, 1] distribution if $F(x) = P(X \le x) = x$ for $x \in [0, 1]$, or $f(x) = 1$ for $x \in [0, 1]$.

4. X has the exponential distribution with mean parameter $\theta$ if its c.d.f. is given by $F(x) = 1 - \exp(-x/\theta)$, or if its p.d.f. is given by $f(x) = \theta^{-1} \exp(-x/\theta)$ for $x \ge 0$.

Note that we do not have to specify the sample space, the probability measure, or how the random variables are defined in the above example.

Two basic types of random variables have been introduced. In theory, there is a third type of random variable. However, the third type is usually not discussed in elementary probability courses. Notice that the sum of two random variables is clearly another random variable. When we add a continuous random variable to a discrete random variable, the new random variable is neither discrete nor continuous. That is, we cannot always classify a random variable into one of the three possible types. A measure theory result states, however, that any random variable can be written as a linear combination of three random variables, one of each type.

2.4 Expectations

A proper definition of the expectation of a random variable needs advanced knowledge of real analysis. We give a handicapped definition as follows.

If X is discrete with possible values $\{x_0, x_1, x_2, \ldots\}$, then we calculate its expectation as

$E(X) = \sum_{i=0}^{\infty} x_i P(X = x_i)$

when the summation converges absolutely.

If X is (absolutely) continuous with density function f(x), then we calculate its expectation as

$E(X) = \int_{-\infty}^{\infty} t f(t)\,dt$

when the integration converges absolutely.


When the convergence does not hold, we say the expectation does not exist.

To calculate the expectation of any random variable, we should pay close attention to the "if" part of the definition before starting. Many students lose the thread because they ignore this part of the definition.

Example 2.4

Calculate the expectations of Binomial and Exponential random variables.
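No solution is worked out here in the notes; as a numerical sketch (added for illustration; the helper names are my own, and the exponential uses the mean parametrization $\theta$ from Example 2.3), one can check that E(X) = np for the Binomial and E(X) = $\theta$ for the Exponential:

```python
import math

def binomial_mean(n, p):
    """E(X) = sum_i i * P(X = i) for X ~ Binomial(n, p)."""
    return sum(i * math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1))

def exponential_mean(theta, upper=100.0, steps=100_000):
    """E(X) = integral of t * f(t) dt with f(t) = exp(-t/theta)/theta,
    approximated by a Riemann sum on [0, upper]."""
    h = upper / steps
    return sum((k * h) * math.exp(-(k * h) / theta) / theta * h
               for k in range(steps))

print(binomial_mean(10, 0.3))   # 3.0 = np
print(exponential_mean(2.0))    # close to 2.0 = theta
```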

2.5 Joint Distribution

Let X and Y be two random variables. Note that it is possible to define two functions on the same sample space. For example, suppose our sample space is $[0, 1] \times [0, 1]$, the unit square. Every sample point can be represented as $(w_1, w_2)$. Let

$X(w_1, w_2) = w_1, \quad Y(w_1, w_2) = w_2,$

and assume the probability measure on $[0, 1] \times [0, 1]$ is uniform. Then both X and Y have the uniform distribution. We find

$P(X \le s, Y \le t) = st$

when $(s, t) \in [0, 1] \times [0, 1]$.

If Z is another random variable such that

$Z(w_1, w_2) = 1 - w_1,$

we find Z also has the uniform distribution. However,

$P(X \le s, Z \le t) \neq st$

in general.

The moral of this example is: knowing the individual distributions of X, Y and Z is not enough to tell their joint behavior. The joint random behavior of two random variables X and Y is characterized by their joint c.d.f., defined as

$F(x, y) = P(X \le x, Y \le y).$


The joint c.d.f. of more than two random variables is defined similarly.

Let us point out again that the lower case letters x, y are notation for dummy variables. They do not have to be associated with the random variables X and Y. That is, we may use

$F(s, t) = P(X \le s, Y \le t)$

to represent exactly the same joint c.d.f. It is the appearance of X, Y in the definition that makes F the joint c.d.f. of X and Y.

The marginal c.d.f. of X or Y can be obtained by taking limits:

$F_X(s) = P(X \le s) = \lim_{t \to \infty} F(s, t), \quad F_Y(y) = P(Y \le y) = \lim_{s \to \infty} F(s, y).$

Note that I used (s, t, y) on purpose. It is certainly not a good practice, but the point is, X does not have to be linked with x.

When both X and Y are discrete, it is more convenient to work with the joint probability (mass) function:

$f(x, y) = P(X = x, Y = y).$

When there exists a non-negative function f(x, y) such that

$F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(s, t)\,ds\,dt$

for all real numbers (x, y), we say that X and Y are jointly (absolutely) continuous and f(x, y) is their joint density function.

The marginal probability function (for the discrete case) can be obtained as

$f_X(x) = \sum_{y} f(x, y).$

The marginal density function (for the continuous case) can be obtained as

$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy.$


    2.6 Independence

If the joint c.d.f. of X and Y satisfies $F(x, y) = F_X(x)F_Y(y)$ for all x, y, then we say X and Y are independent.

When both X and Y are discrete, independence is equivalent to

$f(x, y) = f_X(x)f_Y(y)$

for all (x, y), where f(x, y) is the joint probability function. When X and Y are jointly continuous, independence is equivalent to

$f(x, y) = f_X(x)f_Y(y)$

for almost all (x, y), where f(x, y) is the joint density function.

    2.7 Formulas for Expectations

Let X and Y be two random variables. We define

$Var(X) = E(X - E(X))^2 = E(X^2) - (EX)^2;$

$Cov(X, Y) = E[(X - EX)(Y - EY)].$

It is known that

$E(aX + bY) = aEX + bEY;$

$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y),$

where a, b are two real numbers (constants).

Let Z = X + Y be a newly created random variable. Its c.d.f. can be derived from the joint c.d.f. of X and Y. This task is not always simple. There are two special cases.

First, assume X and Y are independent and jointly continuous. Assume that X has density function f(x) and Y has density function g(y). Then we know that the joint density function is $f(x, y) = f(x)g(y)$. The density function of Z = X + Y is given by

$f_Z(t) = \int_{-\infty}^{\infty} f(t - y)g(y)\,dy.$


Second, assume X and Y are independent, take non-negative integer values only, with probability functions f(x) and g(y). (Note the notation looks the same as before.) The probability function of Z = X + Y is

$P(Z = n) = \sum_{i=0}^{n} f(i)g(n - i).$

Example 2.5 Derivation of the distribution of X + Y.

1. Both X and Y have the exponential distribution with common density $f(x) = \exp(-x)$ for $x \ge 0$.

2. Both X and Y have the Poisson distribution with means $\lambda_1$ and $\lambda_2$.
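Case 2 can be checked directly with the discrete convolution formula above. The following minimal Python sketch (added for illustration; names are my own) confirms that the convolution of Poisson($\lambda_1$) and Poisson($\lambda_2$) probability functions matches the Poisson($\lambda_1 + \lambda_2$) probability function:

```python
import math

def poisson_pmf(lam, i):
    return lam**i * math.exp(-lam) / math.factorial(i)

def convolution(f, g, n):
    """P(Z = n) = sum_i f(i) g(n - i) for independent non-negative
    integer-valued X and Y with probability functions f and g."""
    return sum(f(i) * g(n - i) for i in range(n + 1))

lam1, lam2 = 1.5, 2.5
for n in range(5):
    direct = convolution(lambda i: poisson_pmf(lam1, i),
                         lambda j: poisson_pmf(lam2, j), n)
    print(n, direct, poisson_pmf(lam1 + lam2, n))  # the two columns agree
```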

    2.8 Key Results and Concepts

Random variables are real valued functions defined on the sample space. Their randomness is the consequence of the randomness of the outcome from the sample space. We classify them according to their cumulative distribution functions or, equivalently, their probability mass functions or probability density functions.

A discrete random variable takes at most a countable number of possible values. An absolutely continuous random variable has a cumulative distribution function which can be obtained from a density function by integration. (Or roughly, its cumulative distribution function is differentiable.) The third type of random variable is not discussed.

A random variable has, say, the Poisson distribution if its probability function has the form

$\frac{\lambda^n}{n!}\exp(-\lambda), \quad n = 0, 1, 2, \ldots.$

In general, the distribution of a random variable is named after the form of its cumulative distribution function.

    The mean, variance, moments of a random variable are determined by

    its distribution. In many examples, they can be obtained by summation or


integration easily (at least for some students). In other examples, the mean and variance of a random variable can be obtained via its relationship to other random variables. Thus, memorizing some formulas is useful.

    2.9 Problems

1. If X and Y are two random variables, what do we mean by

(i) "F(x) is the cumulative distribution function of X"?
(ii) "$\{X \le 4\}$ is independent of $\{Y \le 2\}$"?

2. Let X be a random variable with the Binomial distribution with parameters n = 3, p = 0.4, i.e.

$p_X(k) = \binom{3}{k}(0.4)^k(1 - 0.4)^{3-k}$ when k = 0, 1, 2, 3.

Let $Y = (X - 1)^2$.

(i) Let $F_X(x)$ be the cumulative distribution function of X. Calculate $F_X(2.4)$.
(ii) Tabulate the probability function of Y.
(iii) Tabulate the probability function of X given Y = 1.
(iv) Tabulate E(X|Y).

3. A random number N of fair dice is thrown, with $P(N = n) = 2^{-n}$, $n \ge 1$. Let S be the sum of the scores. Find the probability that

a) N = 2 given S = 4;
b) S = 4 given N = 2;
c) S = 4 given N is even;
d) the largest number shown by any die is r.

4. A coupon is selected at random from a series of k coupons and placed in each box of cereal. A house-husband has bought N boxes of cereal.


What is the probability that all k coupons are obtained? (Hint: Consider the event that the ith coupon is not obtained. The answer is in a nice summation format.)

5. If birthdays are equally likely to fall in each of the twelve months of the year, find the probability that all twelve months are represented among the birthdays of 20 people selected at random.

(Hint: let $A_i$ be the event that the ith month is not included and consider $A_1 \cup A_2 \cup \cdots \cup A_{12}$.)

6. Let X be a random variable and $g(\cdot)$ be a real valued function.

(a) What do we mean by "X is discrete"?
(b) If X is a discrete random variable, argue that g(X) is also a random variable and is discrete.
(c) If X is a continuous random variable, is g(X) necessarily a continuous random variable? Why?

7. Let a and b be independent random variables uniformly distributed in (0, 1). What is the probability that $x^2 + ax + b = 0$ has no real roots?

8. Express the distribution functions of

$X^+ = \max\{0, X\}, \quad X^- = -\min\{0, X\}, \quad |X| = X^+ + X^-, \quad -X$

in terms of the distribution function F of the random variable X.

    9. Is it generally true that E(1/X) = 1/E(X)? Is it ever true that

    E(1/X) = 1/E(X)?

10. Suppose that 10 cards, of which 5 are red and 5 are green, are put at random into 10 envelopes, of which 7 are red and 3 are green, so that each envelope will contain a card. Determine the probability that exactly k envelopes will contain a card with a matching color (k = 0, 1, ..., 10).


    Chapter 3

Conditional Distribution and Expectations

    3.1 Introduction

Suppose both X and Y are discrete and hence have a joint probability function f(x, y). Then, we have

$P(X = x|Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x, y)}{f_Y(y)}.$

Of course, this is meaningful only if $P(Y = y) = f_Y(y) > 0$.

When we pay no attention to the part Y = y, this is a function of x only. However, this function (or the way of crunching the number x and reporting a number called a probability) is determined by X, Y and the number y jointly. As a function of x, it is a probability function. Since it is determined by X and Y = y, we say it is the conditional probability function of X given Y = y. A commonly used notation is $f_{X|Y}(x|y)$.

    Example 3.1

There are two urns. The first contains 4 white and 6 black balls, and the second contains 2 white balls and 8 black balls. An urn is selected randomly, and then we randomly pick 5 balls from the urn (with replacement). Define


X = the number of white balls selected. What is the probability function of X?

Solution: Consider the situations when different urns are selected. Define Y = i if the ith urn is selected.

    Let us work on the conditional probability functions first.

$P(X = j|Y = 1) = \binom{5}{j}(0.4)^j(0.6)^{5-j}$

and

$P(X = j|Y = 2) = \binom{5}{j}(0.2)^j(0.8)^{5-j}$

for $j = 0, 1, \ldots, 5$. The marginal probability function of X is given by

$P(X = j) = (0.5)\left[\binom{5}{j}(0.4)^j(0.6)^{5-j}\right] + (0.5)\left[\binom{5}{j}(0.2)^j(0.8)^{5-j}\right].$

As we have noticed, when Y = 1 is given, X has the binomial distribution with n = 5, p = 0.4. This distribution has expectation 2. We use the notation $E(X|Y = 1) = 2$. In general, we define

$E(X|Y = y) = \sum_{x} x P(X = x|Y = y),$

where the sum is over all possible values of X.

Remark: Again, we should always first determine whether X is discrete. If it is, we should then determine what the possible values of X are before this formula is applied.

When both X and Y are discrete, $E(X|Y = y)$ is well defined. There are several components in this definition. Whenever we use a new value y, the outcome will probably change. In the last example,

$E(X|Y = 1) = 2, \quad E(X|Y = 2) = 1.$
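A short Python sketch (added for illustration) reproduces these numbers for Example 3.1 and also checks the weighted-average identity derived next:

```python
import math

def binom_pmf(n, p, j):
    return math.comb(n, j) * p**j * (1 - p)**(n - j)

# Conditional probability functions of Example 3.1 (5 draws with replacement).
f_given_1 = [binom_pmf(5, 0.4, j) for j in range(6)]   # urn 1
f_given_2 = [binom_pmf(5, 0.2, j) for j in range(6)]   # urn 2

E_X_given_1 = sum(j * f_given_1[j] for j in range(6))  # 2.0
E_X_given_2 = sum(j * f_given_2[j] for j in range(6))  # 1.0

# Marginal probability function and expectation of X.
marginal = [0.5 * f_given_1[j] + 0.5 * f_given_2[j] for j in range(6)]
E_X = sum(j * marginal[j] for j in range(6))

print(E_X_given_1, E_X_given_2)                      # 2.0 1.0
print(0.5 * E_X_given_1 + 0.5 * E_X_given_2, E_X)    # both 1.5
```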


When we focus on the value of Y in this expression, we find we have a function of y defined as

$\psi(y) = E(X|Y = y).$

Just as for a function such as $g(y) = y^2$, we know that $\psi(Y)$ is also a random variable. Thus, we might want to know the expectation of this new random variable. It turns out that

$E[\psi(Y)] = \sum_{y} \psi(y)P(Y = y) = \sum_{y} E(X|Y = y)P(Y = y) = \sum_{y}\left[\sum_{x} x P(X = x|Y = y)\right]P(Y = y) = \sum_{x, y} x P(X = x, Y = y) = E(X).$

To be more concrete, we do not use $\psi(Y)$ in textbooks, but write it as E(X|Y) and call it the conditional expectation of X given Y. For those with mathematical curiosity, we may write

$E(X|Y) = E[X|Y = y]|_{y = Y}.$

Hence, the above identity can be stated as

$E[E(X|Y)] = E(X).$

One intuitive interpretation of this result is: the grand average is the weighted average of the sub-averages. To find the average mark of the students in Stat230, we may first calculate the average in each of its 6 sections. Hence, we obtain 6 conditional expectations (conditioning on which section a student is in). We then calculate the weighted average of the section averages according to the size of each section. This is the second expectation being applied on the left hand side of the above formula.

    It turns out that this concept applies to continuous random variables too.

    If (X, Y) are jointly continuous, we define the conditional density function


of X given Y = y as

$f_{X|Y}(x|y) = \frac{f(x, y)}{f_Y(y)},$

where f(x, y) is the joint density, $f_X$ and $f_Y$ are the marginal density functions, and we assume that $f_Y(y)$ is larger than zero.

The conditional expectation will then be defined as

$E(X|Y = y) = \int x\,\frac{f(x, y)}{f_Y(y)}\,dx,$

which is again a function of y. The same argument implies we could define E(X|Y) in exactly the same way as before. It is easy to verify that

$E[E(X|Y)] = E(X).$

In fact, this equality is true regardless of the type of the random variables (after they are properly defined). The only restriction is: all relevant quantities exist.

    3.2 Formulas

Most formulas for ordinary expectation remain valid for the conditional expectation. For example,

$E(aX + bY|Z) = aE(X|Z) + bE(Y|Z).$

If $g(\cdot)$ is a function, we have

$E[g(Y)X|Y] = g(Y)E[X|Y],$

as g(Y) is regarded as non-random once Y is given.

Finally, we define

$Var(X|Y) = E[(X - E(X|Y))^2|Y].$

Then

$Var(X) = E[Var(X|Y)] + Var[E(X|Y)].$


To show this, notice that

$E[Var(X|Y)] = E\{E[(X - E(X|Y))^2|Y]\} = E\{E(X^2|Y) - [E(X|Y)]^2\} = EX^2 - E[E(X|Y)]^2,$

and

$Var(E(X|Y)) = E[E(X|Y)]^2 - [E\{E(X|Y)\}]^2 = E[E(X|Y)]^2 - [E(X)]^2.$

Adding them up, we get the conclusion.

    Example 3.2

    A miner is trapped in a mine with 3 doors. If he uses the first door, he will

    be free 2 hours later. If he uses the second, he will be back to the same spot

    3 hours later. If he uses the third door, he will be back to the same spot 5

    hours later. Assume that he does not have memory and will always pick a

    door at random. What is the expected time it takes for him to get free?

Solution: Let X be the number of hours it takes until he gets free. We are asked to calculate E(X).

It seems that the expectation is simpler if we know which door he selected in the first place. For this reason, we define the random variable Y to be the door he selects in the first try.

Now it is simple to write down

$E(X|Y = 1) = 2.$

However, we only have

$E(X|Y = 2) = 3 + E(X), \quad E(X|Y = 3) = 5 + E(X).$

Even though this does not directly answer our question, we do have

$E(X) = E[E(X|Y)] = \frac{1}{3}[2 + (3 + EX) + (5 + EX)].$


This is a simple linear equation; we find E(X) = 10.

Can we use the same idea to calculate Var(X)? It is seen that

$Var(X|Y = 1) = 0; \quad Var(X|Y = 2) = Var(X|Y = 3) = Var(X).$

Hence,

$E[Var(X|Y)] = \frac{2}{3}Var(X),$

$Var[E(X|Y)] = \frac{1}{3}[2^2 + 13^2 + 15^2] - 10^2 = \frac{98}{3}.$

Consequently, we find

$Var(X) = \frac{2}{3}Var(X) + \frac{98}{3},$

and hence Var(X) = 98.
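The answers E(X) = 10 and Var(X) = 98 are easy to confirm by simulation. A minimal Python sketch (added for illustration) of the memoryless miner:

```python
import random

def time_to_freedom():
    """One run of Example 3.2: doors take 2 (free), 3 (back) or 5 (back) hours."""
    hours = 0
    while True:
        door = random.randint(1, 3)
        if door == 1:
            return hours + 2
        hours += 3 if door == 2 else 5

trials = 200_000
samples = [time_to_freedom() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)   # close to E(X) = 10 and Var(X) = 98
```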

Remark: We certainly do not believe that the miner is memoryless. Such an example might be more useful for modeling a trapped mouse. We might be able to infer whether a mouse learns after repeating this experiment a number of times: we could compare the observed average with this theoretical average under the memoryless assumption. Any discrepancy may point to the possibility that the mouse is in fact learning.

    3.3 Comment

It could be claimed that probability theory is a special case of measure theory in mathematics. However, the concepts of independence and conditional expectation allow probability theory to be a separate scientific discipline.

    Our subsequent developments depend heavily on the use of conditional

    expectation.


    3.4 Problems

1. Let X be a random variable such that

$P(X = n) = p(1 - p)^n, \quad n = 0, 1, 2, \ldots$

is its probability function and 0 < p < 1.

2. Let X and Y be independent random variables with common probability density function

$f(t) = e^{-t}, \quad t \ge 0.$

(i) Calculate P(X > 5|X > 3).
(ii) Calculate $P(X + Y \le 1)$.

3. There are two TAs for a certain course. For a particular assignment

handed in, if it were marked by the first TA, the mark would be random with mean 75% and variance $(0.1)^2$; while if it were marked by the second TA, the mark would be random with mean 70% and variance $(0.05)^2$. The first TA has a 40% chance of marking any single assignment. Let X be the mark of the particular assignment. Calculate the mean and variance of X.

4. Let $X_1, X_2, X_3, \ldots$ be independently distributed random variables such that $X_n$ has probability mass function

$f_n(k) = P(X_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n.$


(a) Find the probability generating function of $X_n$.

(b) Find the probability generating function of $X_1 + X_2 + X_3$.

(c) Let N be a positive integer valued random variable with probability generating function G(s), and assume it is independent of $X_1, X_2, \ldots$. Find the probability generating function of $X_N$.

(d) Continuing (c), find the probability generating function of $X_N + X_{N+1}$.

5. An integer N is chosen from the geometric distribution with probability function

$f_N(n) = \alpha(1 - \alpha)^{n-1}, \quad n = 1, 2, \ldots.$

Given N = n, X has the uniform distribution on $1, 2, \ldots, n$.

a) Find the joint p.f. of X and N.
b) Find the conditional p.f. of N given X = x.

6. The number of fish that Elise catches in a day is a Poisson random variable with mean 30. However, on average, Elise tosses back two out of every three fish she catches. What is the probability that, on a given day, Elise takes home n fish? What are the mean and variance of

(a) the number of fish she catches,
(b) the number of fish she takes home?

(What independence assumptions have you made?)

7. Let $X_1, X_2, X_3$ be independent random variables taking values in the positive integers and having probability functions given by $P(X_i = x) = (1 - p_i)p_i^{x-1}$ for $x = 1, 2, \ldots$ and $i = 1, 2, 3$.

(a) Show that

$P(X_1 < X_2 < X_3) = \frac{(1 - p_1)(1 - p_2)p_2 p_3^2}{(1 - p_2 p_3)(1 - p_1 p_2 p_3)}.$

(b) Find $P(X_1 \le X_2 \le X_3)$.


8. Suppose that 13 cards are selected at random from a regular deck of 52 playing cards. (a) If it is known that at least one ace has been selected, what is the probability that at least two aces have been selected? (b) If it is known that the ace of hearts has been selected, what is the probability that at least two aces have been selected?

9. The number of children N in a randomly chosen family has mean $\mu$ and variance $\sigma^2$. Each child is male with probability p, independently, and X represents the number of male children in a randomly chosen family. Find the mean and variance of X.

10. Suppose we have ten coins such that if the ith one is flipped then heads will appear with probability i/10, i = 1, 2, ..., 10. When one of the coins is randomly selected and flipped, it shows heads. What is the conditional probability that it was the fifth coin?


    Chapter 4

Generating functions and their applications

    4.1 Introduction

Suppose that $\{a_j\} = \{a_0, a_1, \ldots\}$ is a sequence of real numbers. If

$A(s) = \sum_{j=0}^{\infty} a_j s^j = a_0 + a_1 s + a_2 s^2 + \cdots$ (4.1)

converges in some interval $|s| \le s_0$ where $s_0 > 0$, then A(s) is called the generating function of the sequence $\{a_j\}_0^{\infty}$. The generating function provides a convenient summary of a real number sequence. In many examples, simple and explicit expressions of A(s) can be obtained. This enables us to study the properties of $\{a_j\}_0^{\infty}$ conveniently.

    Example 4.1

The Fibonacci sequence $\{f_j\}$ is defined by $f_0 = 0$, $f_1 = 1$ and the recursive relationship

$f_j = f_{j-1} + f_{j-2}, \quad j = 2, 3, \ldots.$ (4.2)

We use the tool of generating functions to find explicit expressions for $f_j$.


Solution: Multiplying by $s^j$ and summing over j gives

$\sum_{j=2}^{\infty} f_j s^j = \sum_{j=2}^{\infty} f_{j-1} s^j + \sum_{j=2}^{\infty} f_{j-2} s^j.$ (4.3)

Note the summation starts from j = 2 because (4.2) is valid only when $j = 2, 3, \ldots$. By defining $F(s) = \sum_{j=0}^{\infty} f_j s^j$, we get

$\sum_{j=2}^{\infty} f_j s^j = \sum_{j=0}^{\infty} f_j s^j - f_0 - f_1 s = F(s) - s.$

With similar treatment on the right hand side of (4.3), we obtain

$F(s) - s = sF(s) + s^2 F(s).$ (4.4)

Ignoring the convergence issue for the moment, we find

$F(s) = \frac{s}{1 - s - s^2}.$

This is surely a simple and explicit generating function. To study other properties of the sequence, let us note that, in general, a generating function has the Maclaurin series expansion

$A(s) = A(0) + A'(0)s + A''(0)s^2/2! + \cdots,$

which by comparison with (4.1) gives

$a_j = \frac{A^{(j)}(0)}{j!}.$

This, of course, requires the function to be analytic at 0, which is true when A(s) converges in a neighbourhood of 0. An obvious conclusion is: real number sequences and generating functions have a one-to-one correspondence when the convergence and the analytic properties hold.

Now let us get back to the example. F(s) clearly converges at least for $|s| \le 0.5$. This allows us to look for its Maclaurin series expansion. Note that

$1 - s - s^2 = \left(1 - \frac{1 + \sqrt{5}}{2}s\right)\left(1 - \frac{1 - \sqrt{5}}{2}s\right)$


and by the method of partial fractions

$F(s) = \frac{1}{\sqrt{5}}\left[\sum_{j=0}^{\infty} \left(\frac{1 + \sqrt{5}}{2}\right)^j s^j - \sum_{j=0}^{\infty} \left(\frac{1 - \sqrt{5}}{2}\right)^j s^j\right].$

Recalling the one-to-one correspondence property,

$f_j = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^j - \left(\frac{1 - \sqrt{5}}{2}\right)^j\right], \quad j = 0, 1, 2, \ldots.$

It is interesting to note that

$\lim_{j \to \infty} f_j/f_{j-1} = (1 + \sqrt{5})/2,$

which is the golden ratio, to which the ancient Egyptians attributed many mystical qualities.
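As a quick check of the closed form, the following Python sketch (added for illustration) compares the recursion (4.2) with the expression obtained from the generating function:

```python
import math

def fib_recursive(n):
    """f_0 = 0, f_1 = 1, f_j = f_{j-1} + f_{j-2}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(j):
    """The expression read off from the generating function F(s)."""
    r5 = math.sqrt(5)
    return ((1 + r5) / 2) ** j / r5 - ((1 - r5) / 2) ** j / r5

for j in range(10):
    print(j, fib_recursive(j), round(fib_closed_form(j)))  # identical columns
```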

In this example, the generating function has been used as a tool for solving the difference equation (4.2). Generating functions will be seen to be far more useful than just this. For example, if A(s) converges in $|s| \le s_0$ with $s_0 > 1$, then

$A(1) = \sum_{j=0}^{\infty} a_j, \quad A'(1) = \sum_{j=1}^{\infty} j a_j,$

and so on.

    Example 4.2

Consider the following series:

$a_j = 1, \quad j = 0, 1, 2, \ldots;$
$b_j = 1/j!, \quad j = 0, 1, 2, \ldots;$
$c_0 = 0, \quad c_j = 1/j, \quad j = 1, 2, \ldots.$

Easy calculation shows their corresponding generating functions are $A(s) = (1 - s)^{-1}$, $B(s) = e^s$ and $C(s) = -\log(1 - s)$, where the regions of convergence are $|s| < 1$, $|s| < \infty$ and $|s| < 1$, respectively.


    4.2 Probability Generating Functions

Let X be a random variable taking non-negative integer values with probability function $\{p_j\}$, where

$p_j = P\{X = j\}, \quad j = 0, 1, 2, \ldots.$

The generating function of $\{p_j\}$ is called the probability generating function of X, and we write

$G(s) = G_X(s) = E\{s^X\} = p_0 + p_1 s + p_2 s^2 + \cdots.$ (4.5)

Of course, this function provides a convenient summary of the probability function of X. Note that it converges at least for $|s| \le 1$ since, for s in this interval,

$\sum_{j=0}^{\infty} p_j |s|^j \le \sum_{j=0}^{\infty} p_j = 1.$

Using some mathematical tools, we can easily find

$G'(1) = E(X) = \sum_{j=0}^{\infty} j p_j, \quad G^{(r)}(1) = E(X^{(r)}) = \sum_{j=0}^{\infty} j^{(r)} p_j$

whenever the corresponding quantities exist. Otherwise, $G^{(r)}(1)$ has to be replaced by $\lim_{s \uparrow 1} G^{(r)}(s)$, and an infinite outcome is allowed. Note $j^{(r)} = j(j-1)\cdots(j-r+1)$, and $E(X^{(r)})$ is the rth factorial moment of X. The variance of X can be expressed as

$Var(X) = E(X^{(2)}) + E(X) - [E(X)]^2 = G''(1) + G'(1) - [G'(1)]^2.$
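These identities are easy to check numerically. Here is a minimal Python sketch (added for illustration, with an arbitrary finite-support probability function of my own choosing) that approximates G'(1) and G''(1) by finite differences and compares them with the directly computed mean and variance:

```python
p = [0.1, 0.3, 0.4, 0.15, 0.05]   # p_j = P(X = j), j = 0, ..., 4

def G(s):
    """Probability generating function G(s) = sum_j p_j s^j."""
    return sum(pj * s**j for j, pj in enumerate(p))

h = 1e-5
G1 = (G(1) - G(1 - h)) / h                           # one-sided difference, G'(1)
G2 = (G(1) - 2 * G(1 - h) + G(1 - 2 * h)) / h**2     # second difference, G''(1)

mean = sum(j * pj for j, pj in enumerate(p))
var  = sum(j * j * pj for j, pj in enumerate(p)) - mean**2

print(G1, mean)                  # E(X) = G'(1), approximately
print(G2 + G1 - G1**2, var)      # Var(X) = G''(1) + G'(1) - [G'(1)]^2
```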

    Example 4.3

Suppose X has the geometric distribution with parameter p, so that

$p_j = P(X = j) = p(1 - p)^j, \quad j = 0, 1, 2, \ldots.$

The probability generating function of X is

$G(s) = E(s^X) = \sum_{j=0}^{\infty} p(1 - p)^j s^j = p[1 - (1 - p)s]^{-1}$


for $|s| < (1 - p)^{-1}$.

Define the tail probabilities

$q_j = P(X > j) = p_{j+1} + p_{j+2} + \cdots, \quad j = 0, 1, 2, \ldots.$

Let $Q(s) = \sum_{j=0}^{\infty} q_j s^j$ be the corresponding generating function and note that, since $q_j \le 1$ for all j, it follows that

$\sum_{j=0}^{\infty} q_j s^j \le \sum_{j=0}^{\infty} s^j = (1 - s)^{-1}$

for $|s| < 1$.


Since G(1) = 1, it follows from (4.7) and the Mean Value Theorem in calculus that, for given $|s| < 1$, $Q(s) = \frac{1 - G(s)}{1 - s} = G'(\xi)$ for some $\xi$ between s and 1; letting $s \uparrow 1$ then gives $\lim_{s \uparrow 1} Q(s) = G'(1) = E(X)$.


4.4 The Simple Random Walk

Let $Z_1, Z_2, \ldots$ be independent steps with $P(Z_i = 1) = p$ and $P(Z_i = -1) = q = 1 - p$, and define the simple random walk by $X_0 = 0$ and $X_n = Z_1 + \cdots + Z_n$.

Here, we use generating functions to examine properties of the process $\{X_n\}$. Some quantities to be investigated are

$u_n = P(X_n = 0)$
$f_n = P(X_1 \neq 0, \ldots, X_{n-1} \neq 0, X_n = 0)$
$\lambda_n = P(X_1 < 1, \ldots, X_{n-1} < 1, X_n \ge 1)$
$\lambda_n^{(r)} = P(X_1 < r, \ldots, X_{n-1} < r, X_n \ge r)$
$\lambda_n^{(-r)} = P(X_1 > -r, \ldots, X_{n-1} > -r, X_n \le -r)$

for $n = 1, 2, \ldots$ and $r = 1, 2, \ldots$.

For convenience, we define $u_0 = 1$ and $f_0 = \lambda_0 = \lambda_0^{(r)} = \lambda_0^{(-r)} = 0$. In the simple random walk as presented, $Z_n$ can be either 1 or $-1$. Thus, it is impossible for $X_{n-1} < r$, $X_n > r$ to occur for any n. We insist on using $X_{n-1} < r$, $X_n \ge r$ instead of $X_{n-1} < r$, $X_n = r$ in the definitions of $\lambda_n^{(r)}$. It has the advantage of retaining the same definition for more general random walks.

Each of these quantities represents the probability of a particular outcome of the simple random walk after n trials. We summarize them in the following table.

Symbol            Probability of
$u_n$             return to 0 at trial n
$f_n$             first return to 0 at trial n
$\lambda_n$       first passage through 1 at trial n
$\lambda_n^{(r)}$ first passage through r at trial n


4.4.1 First Passage Times

A first passage through 2 must pass through 1 first. Decomposing according to the trial of the first passage through 1 gives

$\lambda_n^{(2)} = \sum_{k=0}^{n} \lambda_k \lambda_{n-k},$

since $\lambda_0 = 0$. Note this identity is still true even when n = 0. Therefore, we have found $\{\lambda_n^{(2)}\} = \{\lambda_n\} * \{\lambda_n\}$ (a convolution) and $\Lambda^{(2)}(s) = [\Lambda(s)]^2$. In like manner,

$\Lambda^{(r)}(s) = [\Lambda(s)]^r, \quad r = 2, 3, \ldots.$

Although the above relationship is neat, we cannot yet solve it to obtain an explicit expression for the $\lambda_n$'s. Let us work on another relationship between $\{\lambda_n^{(2)}\}$ and $\{\lambda_n\}$. It is obvious that $\lambda_1 = p$. If the first passage through 1 occurs at trial n with n > 1, it requires $Z_1 = X_1 = -1$. After that, it requires the simple random walk to gain a value of 2 in exactly $n - 1$ steps. Thus

$\lambda_n = q\lambda_{n-1}^{(2)}, \quad n = 2, 3, \ldots.$ (4.8)

Multiplying both sides of (4.8) by $s^n$ and summing over n with care over its range, we have

$\sum_{n=2}^{\infty} \lambda_n s^n = q \sum_{n=2}^{\infty} \lambda_{n-1}^{(2)} s^n.$

We find

$\Lambda(s) - ps = qs\Lambda^{(2)}(s) = qs[\Lambda(s)]^2$

from the first relationship.

It is easy to find the two possible forms:

$\Lambda(s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2qs}.$

When $s \to 0$, we should have $\Lambda(s) \to 0$, so we must have

$\Lambda(s) = \frac{1 - \sqrt{1 - 4pqs^2}}{2qs} = -(2qs)^{-1} \sum_{j=1}^{\infty} \binom{1/2}{j} (-4pqs^2)^j,$

where the binomial expansion has been used. From this we find $\lambda_{2n} = 0$ and

$\lambda_{2n-1} = -(2q)^{-1} \binom{1/2}{n} (-4pq)^n = (2n-1)^{-1} \binom{2n-1}{n} p^n q^{n-1}, \quad n = 1, 2, \ldots.$

The generating function $\Lambda(s)$ will tell us more about the simple random walk. Since

$\Lambda(s) = \sum_{n=0}^{\infty} \lambda_n s^n,$


$\Lambda(1) = \lambda_0 + \lambda_1 + \lambda_2 + \cdots = P(\text{first passage through 1 ever occurs}) = (1 - \sqrt{1 - 4p + 4p^2})/2q = (1 - |p - q|)/2q = \begin{cases} 1 & p \ge q, \\ p/q & p < q. \end{cases}$

The walk is certain to pass through 1 when p > q, or even when p = q = 1/2.

If $p \ge q$, we may define the random variable N, the waiting time until the first passage through 1 occurs. That is,

$N = \min\{n : X_n = 1\},$

and we know, in this case, that $P(N < \infty) = 1$ and

$E(N) = \Lambda'(1) = \begin{cases} (p - q)^{-1} & p > q, \\ \infty & p = q. \end{cases}$

Can we still define N when p < q?

If the walk is used to model gambling, the above conclusions amount to saying: the gambler is certain to have a positive net winning at some time if $p \ge 1/2$. If, however, $p < 1/2$, the chance that his net winning ever becomes positive is only p/q.


4.4.2 Returns to Origin

Consider the first return to the origin at trial n. Let A be the event that the first return to 0 occurs at trial n with $X_1 = -1$, and B the event that the first return to 0 occurs at trial n with $X_1 = 1$, so that $f_n = P(A) + P(B)$.

Note $\{-X_n\}$ is also a simple random walk, with steps equal to 1 with probability q rather than p. Hence the event B has a similar structure to the event A. Let

$\lambda_n^{(-1)} = P(-X_1 < 1, -X_2 < 1, \ldots, -X_{n-1} < 1, -X_n = 1).$

Then $\{\lambda_n^{(-1)}\}$ has the same generating function as that of $\{\lambda_n\}$ except with p and q switched. In addition, $P(B) = P(X_1 = 1)\lambda_{n-1}^{(-1)}$ and therefore, for $n \ge 1$,

$f_n = P(A) + P(B) = p\lambda_{n-1}^{(-1)} + q\lambda_{n-1}.$

Equivalently,

$F(s) = ps\Lambda^{(-1)}(s) + qs\Lambda(s) = ps\,\frac{1 - \sqrt{1 - 4pqs^2}}{2ps} + qs\,\frac{1 - \sqrt{1 - 4pqs^2}}{2qs} = 1 - \sqrt{1 - 4pqs^2}.$

The probability that the process ever returns to the origin is

$F(1) = \sum_{n=0}^{\infty} f_n = 1 - |p - q|,$

and so a return is certain only if p = q = 1/2. In this case, the mean time to return is

$F'(1) = \lim_{s \to 1} \frac{d}{ds}\left[1 - \sqrt{1 - s^2}\right] = \infty.$

Thus, if the game is fair and you have lost some money at the moment, we have good news for you: the chance that you will win back all your money is 1. The bad news is, the above result also tells you that, on average, you may not live that long to see it.

    4.4.3 Some Key Results in the Simple Random Walk

Symbol             Expression                                   Generating function
$u_{2n}$           $\binom{2n}{n}(pq)^n$                        $U(s) = (1 - 4pqs^2)^{-1/2}$
$f_{2n}$           $(2n-1)^{-1}\binom{2n}{n}(pq)^n$             $F(s) = 1 - (1 - 4pqs^2)^{1/2}$
$\lambda_{2n-1}$   $(2n-1)^{-1}\binom{2n-1}{n}p^n q^{n-1}$      $\Lambda(s) = (2qs)^{-1}[1 - (1 - 4pqs^2)^{1/2}]$


The following are the key steps in deriving the results in the above table.

$qs[\Lambda(s)]^2 - \Lambda(s) + ps = 0;$
$F(s) = 1 - [U(s)]^{-1};$
$F(s) = ps\Lambda^{(-1)}(s) + qs\Lambda(s);$
$\Lambda^{(2)}(s) = [\Lambda(s)]^2.$
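The table entries can be checked by simulation. The following Python sketch (added for illustration; the parameter choices are arbitrary) estimates $u_{2n}$, $f_{2n}$ and $\lambda_{2n-1}$ for n = 2 and compares them with the closed forms above:

```python
import math
import random

p = 0.6
q = 1 - p
N = 200_000   # number of simulated walks
n = 2         # check the n = 2 rows of the table (trials 4 and 3)

def walk(steps):
    """Return the path X_1, ..., X_steps of a simple random walk."""
    x, path = 0, []
    for _ in range(steps):
        x += 1 if random.random() < p else -1
        path.append(x)
    return path

u = f = lam = 0
for _ in range(N):
    path = walk(2 * n)
    if path[2 * n - 1] == 0:                               # X_{2n} = 0
        u += 1
        if all(x != 0 for x in path[:2 * n - 1]):          # first return
            f += 1
    if path[2 * n - 2] == 1 and all(x < 1 for x in path[:2 * n - 2]):
        lam += 1                                           # first passage at 2n-1

print(u / N, math.comb(2 * n, n) * (p * q) ** n)                           # u_{2n}
print(f / N, math.comb(2 * n, n) * (p * q) ** n / (2 * n - 1))             # f_{2n}
print(lam / N, math.comb(2 * n - 1, n) * p**n * q**(n - 1) / (2 * n - 1))  # lambda_{2n-1}
```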

    4.5 The Branching Process

Now let us study the second example of simple stochastic processes. Here we have particles that are capable of producing particles of like kind. Assume that all such particles act independently of one another, and that each particle has a probability $p_j$ of producing exactly j new particles, $j = 0, 1, 2, \ldots$, with $\sum_j p_j = 1$. For simplicity, we assume the 0th generation consists of a single particle, and the direct descendants of that particle form the first generation. Similarly, the direct descendants of the nth generation form the (n + 1)th generation.

(Figure: a family tree with $Z_0 = 1$, $Z_1 = 4$, $Z_2 = 5$, $Z_3 = 9$.)

Let $Z_n$ be the population of the nth generation, so that $Z_0 = 1$ and $P(Z_1 = j) = p_j$, $j = 0, 1, 2, \ldots$. Let $X_{ni}$ be the number of direct descendants of the ith individual in the nth generation. Hence, we have

$Z_{n+1} = \sum_{i=1}^{Z_n} X_{ni}$

for all $n \ge 1$. In addition, all $X_{ni}$ are independent and have the same distribution as that of $Z_1$.


4.5.1 Mean and Variance of $Z_n$

Let $H_n(s)$ be the probability generating function of $Z_n$ and let $\mu_n = E(Z_n)$; conditioning on the first generation gives $H_n(s) = G(H_{n-1}(s))$, where G is the family size generating function. Differentiating, and from (4.10), it follows that

$\mu_n = G'(H_{n-1}(1))H_{n-1}'(1) = \mu\mu_{n-1}, \quad n = 1, 2, \ldots$ (4.11)

where $\mu = G'(1)$ is the mean family size and we have used $H_{n-1}(1) = 1$. Since $\mu_0 = 1$, it follows from (4.11) that $\mu_n = \mu^n$. Thus, if $\mu > 1$, the average population size increases exponentially; if $\mu < 1$, it decreases exponentially.


4.5.2 Probability of Extinction

Let $q_n = P(Z_n = 0)$ be the probability that the population is extinct by the nth generation; conditioning on the first generation gives

$q_n = G(q_{n-1}).$ (4.12)

Note that $q_0 \le q_1 \le q_2 \le \cdots$ and $q_j \le 1$ for all j. Thus,

$q = \lim_{n \to \infty} q_n$

exists and represents the probability that the population ever becomes extinct. From (4.12), it follows that q is a fixed point of the probability generating function G(s); that is,

$q = G(q).$

This gives us the idea that we need only solve the equation $G(s) - s = 0$ to obtain the probability of extinction. However, when the equation has more than one solution, we need to know which one gives the probability of extinction.

Theorem 4.2

Let $\{Z_n\}_{n=0}^{\infty}$ be a branching process as specified in this section such that $Z_0 = 1$, and the family size generating function is given by G(s). Then the probability of extinction for this branching process, q, is the smallest solution of the equation

$s = G(s)$

in the interval [0, 1].

Proof: Assume that the smallest solution in [0, 1] is $q^*$; we want to show that $q = q^*$.

Let $q_n = P(Z_n = 0)$ for $n = 0, 1, \ldots$. Clearly, $q_0 = 0 \le q^*$. Assume that $q_k \le q^*$ for some k. Note that G(s) is an increasing function for $s \in [0, 1]$. Hence, $G(q_k) \le G(q^*) = q^*$. Consequently, $q_{k+1} = G(q_k) \le q^*$. This implies $q_n \le q^*$ for all n. Letting $n \to \infty$, we obtain $q \le q^*$. Since q is also a solution in [0, 1], and $q^*$ is the smallest such solution, we must have $q = q^*$.

In many situations, we do not have to solve the equation to determine the value of q. Let $h(s) = G(s) - s$. One obvious solution in [0, 1] is s = 1. Note that $h'(s) = G'(s) - 1$ and $h''(s) = G''(s) = \sum_{j=2}^{\infty} j(j-1)p_j s^{j-2} \ge 0$ when $s \in [0, 1]$. Thus, h(s) is a convex function. There are several possibilities:

1. If $h'(1) = G'(1) - 1 = \mu - 1 > 0$, the curve of h(s) goes down from s = 0 and then comes up to hit 0 at s = 1. Since $h(0) = P(X_{01} = 0) \ge 0$, the

  • 8/10/2019 Applied Probability Theory - J. Chen

    51/177

    46 CHAPTER 4. GENERATING FUNCTIONS

    curve crosses 0 line exactly once before s= 1. Since q is the smallest

    solution in [0, 1]. We must have q 0, then we are at the

    same situation as in case 2; On the other hand,h(0) =P(X01= 0) = 0

    implies the family size is fixed at 1, hence q= 0.

Remark: Despite the above summary, most students tend always to solve the equation to find the probability of ultimate extinction. This is often more work than is needed.

    Example 4.5


Lotka (see Feller 1968, page 294) showed that, to a reasonable approximation, the distribution of the number of male offspring in an American family was described by

$p_0 = 0.4825, \qquad p_k = (0.2126)(0.5893)^{k-1}, \quad k \ge 1,$

which is a geometric distribution with a modified first term. The corresponding probability generating function is

$G(s) = \dfrac{0.4825 - 0.0717s}{1 - 0.5893s}$

and $G'(1) = 1.261$. Thus, for example, in the 16th generation, the average population of male descendants of a single root ancestor is

$\mu_{16} = (1.261)^{16} = 40.685.$

The probability of extinction, however, is the smallest solution of

$q = \dfrac{0.4825 - 0.0717q}{1 - 0.5893q}.$

Thus, we find $q = 0.8197$. This suggests that for those names that do survive to the 16th generation, the average size is very much more than 40.685. (All the calculations are subject to the original round-off errors.)
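As a quick check of the arithmetic (a sketch, not part of the original notes), the same fixed-point iteration can be applied to Lotka's generating function:

    # Lotka's generating function; iterate q <- G(q) from 0, as in Theorem 4.2.
    G = lambda s: (0.4825 - 0.0717*s) / (1 - 0.5893*s)

    q = 0.0
    for _ in range(10_000):
        q = G(q)
    print(q)  # approximately 0.82, in agreement with the value above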

Example 4.6

From the point of view of epidemiology, it is more important to control the spread of a disease than to cure the infected patients. Suppose that the spread of a disease can be modeled by a branching process. Then it is very important to make sure that the average number of people infected by a patient is less than 1. If so, the probability of extinction will be one. However, even if the average number of people infected is larger than one, there is still a positive chance that the disease will become extinct.

A scientist at Health Canada analyzed the data from the SARS (severe acute respiratory syndrome) epidemic of 2003. It was noticed that many interesting phenomena could be partially explained by results from branching processes.


First, many countries imported SARS patients, but these did not cause epidemics. This can be explained by the fact that the probability of extinction is not small, even when the average number of people infected by a single patient is larger than 1.

Second, a few patients were nicknamed super-spreaders. They might simply correspond to the realizations of the branching process that do not become extinct.

Third, after government intervention, the average number of people infected by a single patient was substantially reduced. When it fell below 1, the epidemic was doomed to die out.

Finally, it was not cost effective to screen all airplane passengers; it was more effective to take strict and quick quarantine measures on new and old cases. Once the average number of people infected by a single patient falls below one, the disease will be controlled with probability one.

4.5.3 Some Key Results in the Branching Process

For simplicity, we assumed that the population starts from a single individual: $Z_0 = 1$; we also assumed that the numbers of offspring of the various individuals are independent and have the same distribution.

Under these assumptions, we have shown that

$\mu_n = \mu^n$

and

$\sigma_n^2 = \sigma^2 \mu^{n-1}\, \dfrac{\mu^n - 1}{\mu - 1}$ (for $\mu \ne 1$; $\sigma_n^2 = n\sigma^2$ when $\mu = 1$),

where $\mu$ and $\sigma^2$ are the mean and the variance of the family size, and $\mu_n$ and $\sigma_n^2$ are the mean and the variance of the size of the $n$th generation.

We have shown that the probability of extinction, $q$, is the smallest non-negative solution to

$G(s) = s$

where $G(s)$ is the probability generating function of the family size. Further, it is known that $q = 1$ when $\mu \le 1$ (excluding the degenerate case where the family size is identically 1), and $q < 1$ when $\mu > 1$.


These results can all be derived from the fact that

$H_n(s) = H_{n-1}(G(s))$

where $H_n(s)$ is the probability generating function of the population size of the $n$th generation.
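When the family size has finite support, this composition rule can be applied literally by composing polynomial coefficient arrays, which yields the whole distribution of $Z_n$. A minimal sketch (not from the notes; the family-size pmf is a hypothetical example):

    import numpy as np

    def compose(H, G):
        # Coefficients of H(G(s)), for polynomials stored as coefficient arrays [c0, c1, ...].
        result = np.array([0.0])
        power = np.array([1.0])            # holds G(s)^k, starting with k = 0
        for c in H:
            n = max(len(result), len(power))
            result = np.pad(result, (0, n - len(result))) + c * np.pad(power, (0, n - len(power)))
            power = np.convolve(power, G)  # G(s)^(k+1)
        return result

    G = np.array([0.25, 0.25, 0.5])  # hypothetical family size: p0, p1, p2
    H = np.array([0.0, 1.0])         # H_0(s) = s, since Z_0 = 1
    for _ in range(3):
        H = compose(H, G)            # H_{n+1}(s) = H_n(G(s))
    print(H[0])                      # P(Z_3 = 0), the probability of extinction by generation 3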

    4.6 Problems

1. Find the mean and variance of $X$ when

(a) $X$ has the Poisson distribution with $p(x) = \dfrac{\lambda^x}{x!} e^{-\lambda}$, $x = 0, 1, \ldots$.

(b) $X$ has the exponential distribution with $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$.

2. (a) If $X$ and $Y$ are exponentially distributed with rate $\lambda = 1$ and independent of each other, find the density function of $X + Y$.

(b) If $X$ and $Y$ are geometrically distributed with parameter $p$ and independent of each other, find the probability mass function of $X + Y$.

(c) Pick a typical discrete distribution and a typical continuous distribution (not discussed in class) and repeat questions (a) and (b).

3. Suppose that, given $N = n$, $X$ has the binomial distribution with parameters $n$ and $p$. Suppose also that $N$ has the Poisson distribution with parameter $\lambda$. Use the technique of generating functions to find

(a) the marginal distribution of $X$;

(b) the distribution of $N - X$.

4. Let $X_1, X_2, X_3, \ldots$ be independent and identically distributed random variables such that $X_1$ has probability mass function

$f(k) = P(X_1 = k) = p(1-p)^k, \qquad k = 0, 1, 2, \ldots.$

(a) Find the probability generating function of $X_1$.


(b) Let $I_n = 1$ if $X_n \ge n$ and $I_n = 0$ if $X_n < n$, for $n = 0, 1, 2, \ldots$. That is, $I_n$ is an indicator random variable. Show that the probability generating function of $I_n$ is given by

$H_n(s) = 1 + (s - 1)(1 - p)^n.$

(c) Let $N$ be a random variable with probability generating function $G(s)$ and assume it is independent of $X_1, X_2, \ldots$. Let $I_N = I_n$ when $N = n$, where $I_n$ is the indicator random variable defined in (b). Show that

$E[s^{I_N} \mid N] = H_N(s) = 1 + (s - 1)(1 - p)^N.$

Find the probability generating function of $I_N$.

5. A coin is tossed repeatedly, heads appearing with probability $p = 2/3$ on each toss.

(a) Let $X$ be the number of tosses until the first occasion by which two heads have appeared successively. Write down a difference equation for $f(k) = P(X = k)$. Assume that $f(0) = 0$.

(b) Show that the generating function of $f(k)$ is given by

$F(s) = \dfrac{4}{27} s^2 \left[ \dfrac{2}{1 - \frac{2}{3}s} + \dfrac{1}{1 + \frac{1}{3}s} \right].$

(c) Find an explicit expression for $f(k)$ and calculate $E(X)$.

6. Let $X$ and $Y$ be independent random variables with the negative binomial distribution and probability function

$p_i = \binom{-k}{i} p^k (p - 1)^i, \qquad i = 0, 1, \ldots.$

(a) Show that the probability generating function of $X$ is given by $G(s) = \dfrac{p^k}{(1 + (p-1)s)^k}$.

(b) Find the probability function of $X + Y$.

(c) Calculate $E(e^X)$ and $\mathrm{Var}(e^X)$; what condition on the size of $p$ is needed?


7. Give the sequences generated by the following:

1) $A(s) = (1 - s)^{-1.5}$;
2) $B(s) = (s^2 - s - 12)^{-1}$;
3) $C(s) = s \log(1 - \theta s^2) / \log(1 - \theta)$;
4) $D(s) = s/(5 + 3s)$;
5) $E(s) = (3 + 2s)/(s^2 - 3s - 4)$;
6) $F(s) = (p + qs)^n$.

8. Turn the following equation systems into equations in generating functions.

1) $b_0 = 1$; $b_j = b_{j-1} + 2a_j$, $j = 1, 2, \ldots$; $a_0 = 0$.

2) $b_0 = 0$, $b_1 = p$, $b_n = q \sum_{r=1}^{n-1} b_r b_{n-1-r}$, $n = 2, 3, \ldots$.

9. 1) Find the generating function of the sequence $a_j = j(j+1)$, $j = 0, 1, 2, \ldots$.

2) Find the generating function of the sequence $a_j = j/(j+1)$, $j = 0, 1, 2, \ldots$.

3) Let $X$ be a non-negative integer valued random variable and define $r_j = P(X \ge j)$. Find the generating function of $\{r_j\}$ in terms of the probability generating function of $X$.

10. 1) Negative binomial:

$p_j = \binom{-k}{j} (-p)^j (1 - p)^k, \qquad j = 0, 1, \ldots$

where $k > 0$ and $0 < p < 1$.


11. Find the probability generating function of the following distributions:

1. Discrete uniform on $0, 1, \ldots, N$.
2. Geometric.
3. Binomial.
4. Poisson.

12. Let $\{a_n\}$ be a sequence with generating function $A(s)$, $|s| < R$, $R > 0$. Find the generating functions of

1) $\{c + a_n\}$ where $c$ is a real number.
2) $\{c a_n\}$ where $c$ is a real number.
3) $\{a_n + a_{n+2}\}$.
4) $\{(n+1) a_n\}$.
5) $\{a_{2n}\} = \{a_0, 0, a_2, 0, a_4, \ldots\}$.
6) $\{a_{3n}\} = \{a_0, 0, 0, a_3, 0, 0, a_6, \ldots\}$.

13. Consider a usual branching process: let the population size of the $n$th generation be $X_n$ and the family size of the $i$th family in the $n$th generation be $Z_{n,i}$. Thus, $X_n = \sum_{i=1}^{X_{n-1}} Z_{n,i}$ and $X_0 = 1$. Assume the $Z_{n,i}$ are independent and identically distributed, and

$P(Z_{1,1} = 0) = \dfrac{1}{2} + a; \quad P(Z_{1,1} = 1) = \dfrac{1}{4} - 2a; \quad P(Z_{1,1} = 3) = \dfrac{1}{4} + a,$

for some $a$.

(a) Find the probability generating function of the family size. When $a = 1/8$, find the probability generating function of $X_2$.

(b) Find the range of $a$ such that the probability of extinction is less than 1.

(c) When $a = 1/8$, find the expectation and variance of the population size of the 5th generation and the probability of extinction.

14. For a branching process with family size distribution given by

$P_0 = 1/6, \quad P_2 = 1/3, \quad P_3 = 1/2,$

calculate the probability generating function of $Z_2$ given $Z_0 = 1$, where $Z_2$ is the population of the second generation. Find also the mean and variance of $Z_2$ and the probability of extinction. Repeat the same calculation when $Z_0 = 3$ and

$P_0 = 1/6, \quad P_1 = 1/2, \quad P_3 = 1/3.$

15. Let the probability $p_n$ that a family has exactly $n$ children be $\alpha p^n$ when $n \ge 1$, and $p_0 = 1 - \alpha p(1 + p + p^2 + \cdots)$. Assume that all $2^n$ sex sequences in a family of $n$ children have probability $2^{-n}$. Show that for $k \ge 1$, the probability that a family has exactly $k$ boys is $2\alpha p^k/(2 - p)^{k+1}$. Given that a family includes at least one boy, what is the probability that there are two or more boys?

16. Let $X_i$, $i \ge 1$, be independent uniform $(0, 1)$ random variables, and define $N$ by

$N = \min\{n : X_n < X_{n+1}\}$

where $X_0 = x$. Let $f(x) = E(N)$.

(a) Derive an integral equation for $f(x)$ by conditioning on $X_1$.

(b) Differentiate both sides of the equation derived in (a).

(c) Solve the resulting equation obtained in (b).

17. Consider a sequence defined by $r_0 = 0$, $r_1 = 1$ and $r_j = r_{j-1} + 2r_{j-2}$, $j \ge 2$. Find the generating function $R(s)$ of $\{r_j\}$ and determine $r_{25}$. For what region of $s$ values does the series for $R(s)$ converge?

18. Let $X_1, X_2, \ldots$ be independent random variables with common p.g.f. $G(s) = E(s^{X_i})$. Let $N$ be a random variable with p.g.f. $H(s)$, independent of the $X_i$'s. Let $T$ be defined as 0 if $N = 0$ and $\sum_{i=1}^{N} X_i$ if $N > 0$. Show that the p.g.f. of $T$ is given by $H(G(s))$. Hence find $E(T)$ and $\mathrm{Var}(T)$ in terms of $E(X)$, $\mathrm{Var}(X)$, $E(N)$ and $\mathrm{Var}(N)$.

19. Consider a branching process in which the family size distribution is Poisson with mean $\lambda$.


(a) Under what condition will the probability of extinction of the process be less than 1?

(b) Find the extinction probability when $\lambda = 2.5$ numerically.

(c) When $\lambda = 2.5$, find the expected size of the 10th generation, and the probability of extinction by the 5th generation. Comment on the relationship between this second number and the ultimate extinction probability obtained in (b).

20. Consider a branching process in which the family size distribution is geometric with parameter $p$. (The geometric distribution has p.m.f. $p_j = p(1-p)^j$, $j = 0, 1, \ldots$.)

(a) Under what condition will the probability of extinction of the process be less than 1?

(b) Find the probability of extinction when $p = 1/3$.

(c) When $p = 1/3$, find the expectation and variance of the size of the 10th generation and the probability of extinction by the 5th generation.

21. Let $\{Z_n\}_{n=0}^{\infty}$ be a usual branching process with $Z_0 = 1$. It is known that $P_0 = p$, $P_1 = pq$, $P_2 = q^2$ with $0 \le p \le 1$ and $q = 1 - p$.

1) Find a condition on the size of $p$ such that the probability of extinction is 0.

2) Find the range of $p$ such that the probability of extinction is smaller than 1. Calculate the probability of extinction when $p = 1/2$.

3) Calculate the mean and the variance of $Z_n$ when $p = 1/2$.

22. Let $X_1, X_2, \ldots$ be independent random variables with common p.g.f. $G(s) = E(s^{X_i})$. Let $N$ be a random variable with p.g.f. $H(s)$. Show that

$T = \begin{cases} \sum_{i=1}^{N} X_i & N \ge 1 \\ 0 & N = 0 \end{cases}$

has p.g.f. $H(G(s))$. Hence, find the mean and variance of $T$ in terms of the means and variances of $X_i$ and $N$. Remark: can you see the relevance of this problem to the usual branching process?


(b) Let $f_n = P(X_1 \ne 0, X_2 \ne 0, \ldots, X_{n-1} \ne 0, X_n = 0 \mid X_0 = 1)$ for $n = 1, 2, \ldots$ and $f_0 = 0$. It is known that the generating function of $f_n$ is given by

$F(s) = \dfrac{1 - \sqrt{1 - 4pqs^2}}{2ps}$

and that $\sqrt{1 - 4pq} = |p - q|$. Find the probability that 0 will ever be reached.

(c) Find the range of $p$ such that state 0 is recurrent.


    Chapter 5

    Renewal Events and Discrete

    Renewal Processes

    5.1 Introduction

Consider a sequence of trials that are not necessarily independent, and let $\varepsilon$ represent some property which, on the basis of the outcomes of the first $n$ trials, can be said unequivocally to occur or not to occur at trial $n$. By convention, we suppose that $\varepsilon$ has just occurred at trial 0, and $E_n$ represents the event that $\varepsilon$ occurs at trial $n$, $n = 1, 2, \ldots$.

We call $\varepsilon$ an event in renewal theory. However, it is not an event in the sense of probability models, in which events are subsets of the sample space. Taking the simple random walk $\{X_n\}$ as an example, we regard $X_n$ as the outcome of the $n$th trial. Thus, $\{X_n\}$ themselves are outcomes of a sequence of trials. An event $\varepsilon_1$ can be used to describe: the outcome $X_n$ is 0. That is, "$\varepsilon_1$ has just occurred at trial $n$" is the event

$E_n = \{X_n = 0\}$

for a given $n$. Similarly, another possible event $\varepsilon_2$ can be defined such that "$\varepsilon_2$ has just occurred at trial $n$" is the event

$E_n = \{X_n - X_{n-1} = 1, \; X_{n-1} - X_{n-2} = 1\}, \qquad n = 2, 3, \ldots.$


The events $E_0$ and $E_1$ have to be defined separately.

In general, if we have a well defined event $\varepsilon$, then we can easily describe the event $E_n$ for every $n$. If we have a complete description of the event $E_n$ for every $n$, the event $\varepsilon$ is then well defined. It is convenient to define $f_0 = 0$ and, for $n \ge 1$, $f_n = P(E_1^c E_2^c \cdots E_{n-1}^c E_n)$. Thus, $f_n$ is the probability that $\varepsilon$ occurs for the first time at trial $n$ (after trial 0).

We say that $\varepsilon$ is a renewal event if each time $\varepsilon$ occurs, the process undergoes a renewal or regeneration. That is to say, at the point when $\varepsilon$ occurs, the outcomes of the successive trials have the same stochastic properties as the outcomes of the successive trials started at time 0. In particular, the probability that $\varepsilon$ will next occur after $n$ additional trials is $f_n$, $n = 1, 2, \ldots$. Mathematically, it means

1. $P(E_{n+m} \mid E_n) = P(E_m \mid E_0)$;

2. $P(E_{n+m} E_{n+m-1}^c \cdots E_{n+2}^c E_{n+1}^c \mid E_n) = P(E_m E_{m-1}^c \cdots E_2^c E_1^c \mid E_0)$.

Another simple (but not rigorous) way to define a renewal event is: independently of the previous outcomes of the trials, once $\varepsilon$ occurs, the waiting time for the next occurrence of $\varepsilon$ has the same fixed distribution.

    Example 5.1

Consider a sequence of Bernoulli trials in which $P(S) = p$ and $P(F) = q$ with $p + q = 1$. Let $\varepsilon$ represent the event that trials $n-2$, $n-1$ and $n$ result respectively in $F$, $S$ and $S$. We shall say that $\varepsilon$ is the event $FSS$. It is clear that $\varepsilon$ is a renewal event. If $\varepsilon$ occurs at $n$, the process regenerates and the waiting time for its next occurrence has the same distribution as had the waiting time for the first occurrence.

Example 5.2

In the same situation as above, let $\varepsilon$ represent the event $SS$. That is, $\varepsilon$ is said to occur at trial $n$ if trials $n-1$ and $n$ both give $S$ as the outcome. In this case, $\varepsilon$ is not a renewal event; the occurrence of $\varepsilon$ does not constitute a renewal of the process. The reason is that if $\varepsilon$ has occurred at trial $n$, the chance it will recur at trial $n+1$ is $p$, but the chance that $\varepsilon$ occurs on the first trial is 0.


    Example 5.3

In most situations, the event of record breaking is not a renewal event. Consider, for instance, the record high temperature. The record always gets higher and so becomes harder to break. Thus, the waiting time for the next occurrence tends to become longer. Hence, it cannot be a renewal event.

    Example 5.4

The simple random walk provides a rich source of examples of renewal events. As before, we assume $X_0 = 0$ and $X_n = X_{n-1} + Z_n$, where $Z_n = +1$ or $-1$ with respective probabilities $p$ and $q$, independently, $n = 1, 2, \ldots$.

a) Let $\varepsilon$ represent return to the origin. Then $\varepsilon$ is a renewal event. In fact, the notation that we used in our analysis of the simple random walk will motivate our choice of notation for recurrent events as introduced in the next section.

b) Let $\varepsilon$ represent a ladder point in the walk. By this we mean that $\varepsilon$ occurs at trial $n$ if

$X_n = \max\{X_0, X_1, \ldots, X_{n-1}\} + 1$

and we assume $\varepsilon$ to have occurred at trial 0. Thus, the first occurrence of $\varepsilon$ corresponds to first passage through 1, the second occurrence of $\varepsilon$ corresponds to first passage through 2, and so on. Here again, $\varepsilon$ is a renewal event, since each ladder point corresponds to a regeneration of the process.

c) As a final example, suppose that $\varepsilon$ is said to occur at trial $n$ if the number of positive values in $Z_1, \ldots, Z_n$ is exactly twice the number of negative values. Equivalently, $\varepsilon$ occurs at trial $n$ if and only if $X_n = n/3$.

    5.2 The Renewal and Lifetime Sequences

Let $\varepsilon$ represent a renewal event and, as before, define the lifetime sequence $\{f_n\}$ where $f_0 = 0$ and

$f_n = P\{\varepsilon \text{ occurs for the first time at trial } n\}, \qquad n = 1, 2, \ldots.$


In like manner, we define the renewal sequence $\{u_n\}$, where $u_0 = 1$ and

$u_n = P\{\varepsilon \text{ occurs at trial } n\}, \qquad n = 1, 2, \ldots.$

Let $F(s) = \sum f_n s^n$ and $U(s) = \sum u_n s^n$ be the generating functions of $\{f_n\}$ and $\{u_n\}$. Note that

$f = \sum f_n = F(1) \le 1,$

since $f$ has the interpretation of the probability that $\varepsilon$ recurs at some time in the sequence. Since the event may not occur at all, it is possible for $f$ to be less than 1. Clearly, $1 - f$ represents the probability that $\varepsilon$ never recurs in the infinite sequence of trials.

If $f < 1$, the renewal event is called transient. When $f = 1$, $\{f_n\}$ is a proper probability distribution (the distribution of the time between successive occurrences), and $F(s)$ defined earlier is the corresponding probability generating function. A renewal event with $f = 1$ is called recurrent.

For a recurrent event, $F(s)$ is a probability generating function. The mean inter-occurrence time is

$\mu = F'(1) = \sum_{n=0}^{\infty} n f_n.$

If $\mu < \infty$, we say that $\varepsilon$ is positive recurrent. If $\mu = \infty$, we say that $\varepsilon$ is null recurrent.

Finally, if $\varepsilon$ can occur only at $n = t, 2t, 3t, \ldots$ for some positive integer $t > 1$, we say that $\varepsilon$ is periodic with period $t$. More formally, let $t = \text{g.c.d.}\{n : f_n > 0\}$ (g.c.d. stands for the greatest common divisor). If $t > 1$, the recurrent event $\varepsilon$ is said to be periodic with period $t$. If $t = 1$, $\varepsilon$ is said to be aperiodic.

Note that even if the first few $f_n$ values are zero, the renewal event can still be aperiodic. Many students believe that if $f_1 = f_2 = 0$, the period of the renewal event must be at least 3. This is wrong. The renewal event can still be aperiodic if, say, $f_8 > 0$ and $f_{11} > 0$: the greatest common divisor of 8 and 11 is one, and no additional information is needed.

Another remark: suppose $f_i > 0$ and $f_j > 0$ for some integers $i$ and $j$. If, in addition, $i$ and $j$ are mutually prime, then the greatest common divisor of the set $\{i, j, \text{any additional numbers}\}$ is 1. That is, we already know that the period is 1; there is no need to look further.

To show that the greatest common divisor is some $t$ larger than 1, we have to make sure that $f_n = 0$ whenever $n$ is not a multiple of $t$. This is much harder in general.

In the simple random walk, the renewal event of returning to zero has period 2. This is because $f_n > 0$ only if you lose and win an equal number of games in a total of $n$ games; thus $n$ must be even when $f_n > 0$. The period is $t = 2$, rather than anything larger, because $f_2 > 0$, so the greatest common divisor cannot be larger than 2.
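A quick way to check the period from the support of $\{f_n\}$ (a sketch, not part of the original notes):

    from math import gcd
    from functools import reduce

    def period(support):
        # Period of a renewal event: the g.c.d. of {n : f_n > 0}.
        return reduce(gcd, support)

    print(period([8, 11]))       # 1: aperiodic, as in the remark above
    print(period([2, 4, 6, 8]))  # 2: e.g. returns to the origin in the simple random walk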


    5.3 Some Properties

For a renewal event $\varepsilon$ to occur at trial $n \ge 1$, either $\varepsilon$ occurs for the first time at $n$ with probability $f_n = f_n u_0$, or $\varepsilon$ occurs for the first time at some intermediate trial $k < n$ and then occurs again at $n$. The probability of this latter event is $f_k u_{n-k}$. Noticing that $f_0 = 0$, we therefore have

$u_n = f_0 u_n + f_1 u_{n-1} + \cdots + f_{n-1} u_1 + f_n u_0, \qquad n = 1, 2, \ldots.$

This equation is called the renewal equation.

Using the typical generating function methodology, we get

$U(s) - 1 = F(s) U(s).$

Hence

$U(s) = \dfrac{1}{1 - F(s)} \qquad \text{or} \qquad F(s) = 1 - \dfrac{1}{U(s)}.$

Recall that when we discussed the simple random walk, we found in that context

$U(s) = (1 - 4pqs^2)^{-1/2}, \qquad F(s) = 1 - \sqrt{1 - 4pqs^2}.$

It is simple to see that this relationship holds between them.
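The renewal equation also gives a direct way to compute $\{u_n\}$ from $\{f_n\}$ numerically. A minimal sketch (not from the notes; the lifetime distribution below is a hypothetical example):

    def renewal_sequence(f, N):
        # u_0 = 1 and u_n = sum_{k=1}^{n} f_k u_{n-k}, the renewal equation.
        u = [1.0]
        for n in range(1, N + 1):
            u.append(sum(f.get(k, 0.0) * u[n - k] for k in range(1, n + 1)))
        return u

    # Hypothetical lifetime distribution: f_1 = f_2 = 1/2, so mu = 1.5.
    u = renewal_sequence({1: 0.5, 2: 0.5}, 20)
    print(u[-1])  # approaches 1/mu = 2/3, as the renewal theorem below predicts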

The concepts defined in the last section are all related to the $\{u_n\}$ sequence, and we summarize this in the following.

    Theorem 5.1

The renewal event $\varepsilon$ is

1. transient if and only if $u = \sum u_n = U(1) < \infty$;

2. recurrent if and only if $u = \infty$;

3. periodic with period $t$ if $t = \text{g.c.d.}\{n : u_n > 0\}$ is greater than 1, and aperiodic if $t = 1$;

4. null recurrent if and only if $\sum u_n = \infty$ and $u_n \to 0$ as $n \to \infty$.


Proof. 1 and 2:

$u = \sum_{n=0}^{\infty} u_n = \lim_{s \uparrow 1} U(s) = \lim_{s \uparrow 1} [1 - F(s)]^{-1}.$

It follows that $u < \infty$ when $f = F(1) < 1$ and $u = \infty$ when $f = 1$. The event is transient in the former case and recurrent in the latter.

3. If $\varepsilon$ has period $d > 1$, then $F(s) = \sum f_n s^n$ contains only powers of $s^d$. Since

$U(s) = [1 - F(s)]^{-1} = 1 + F(s) + F^2(s) + \cdots,$

it follows that $U(s) = \sum u_n s^n$ contains only powers of $s^d$, and so $t = \text{g.c.d.}\{n : u_n > 0\} = \text{g.c.d.}\{md : u_{md} > 0\}$ is such that $d \mid t$. But since $u_n = 0$ implies that $f_n = 0$, it follows that $t \mid d$. Hence $t = d$.

4. This result will follow from the renewal theorem.

The following is the famous renewal theorem.

Theorem 5.2 (The renewal theorem)

Let $\varepsilon$ be a recurrent and aperiodic renewal event and let

$\mu = \sum n f_n = F'(1)$

be the mean inter-occurrence time. Then

$\lim_{n \to \infty} u_n = \dfrac{1}{\mu}.$

Proof: See Feller (1968, page 335).

When $\mu = \infty$, which implies $\varepsilon$ is null recurrent, then $1/\mu = 0$. This proves 4) in the last theorem.
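For the symmetric simple random walk, returns to the origin illustrate null recurrence concretely: recall that $u_{2n} = \binom{2n}{n}(pq)^n$, which with $p = q = 1/2$ tends to 0 even though $\sum u_n = \infty$. A small numerical sketch (not from the notes):

    from math import comb

    # Symmetric simple random walk: u_{2n} = C(2n, n) / 4^n -> 0 while sum u_n diverges,
    # so the event "return to the origin" is null recurrent (Theorem 5.1, part 4).
    for n in (10, 100, 1000):
        print(2*n, comb(2*n, n) / 4**n)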

For a recurrent periodic renewal event $\varepsilon$, we might be able to re-scale the time unit and then make use of this theorem. Suppose that $\varepsilon$ has period $d > 1$. We can define a new sequence of trials so that each new trial is a combination of $d$ original trials. That is, if the outcomes of the original trials are $X_1, X_2, \ldots$, define

$Y_{m+1} = (X_{md+1}, X_{md+2}, \ldots, X_{(m+1)d}).$

The new sequence $\{Y_1, Y_2, \ldots\}$ can also be used to define the renewal event $\varepsilon$. However, in this case, $\varepsilon$ becomes aperiodic and the theorem can then be applied.


    5.4 Delayed Renewal Events

In a simple random walk with $X_0 = 0$, the event of the walk returning to 0 is a renewal event. When $X_n = 0$ for some $n$, the process renews itself: it behaves as if we have just observed $X_0 = 0$, and we can re-set the clock back to 0. More specifically, if $X_{10} = 0$, then $\{X_{10} = 0, X_{10+1}, X_{10+2}, \ldots\}$ is stochastically the same as $\{X_0 = 0, X_1, X_2, \ldots\}$. However, let $\varepsilon$ be the event $X_n = 1$; then $\varepsilon$ is not a renewal event. When $X_5 = 1$, $\{X_5, X_6, \ldots\}$ does not behave the same as $\{X_0, X_1, \ldots\}$. Hence, we cannot re-set the clock back to 0 and pretend that nothing happened.

If we observe that $X_5 = 1$ and $X_{19} = 1$, then $\{X_{19+0} = 1, X_{19+1}, \ldots\}$ will have the same stochastic properties as the system $\{X_5 = 1, X_{5+1}, X_{5+2}, \ldots\}$. Hence, the event does not renew itself when it first occurs, but after its first occurrence, each future occurrence of $\varepsilon$ renews the process to the time when it first occurred. Such events are called delayed renewal events.

The main difference between delayed renewal events and the usual renewal events is that the waiting time for the first occurrence of $\varepsilon$ has a different distribution from the distribution of the inter-occurrence times. An informal way to describe a delayed renewal event is: we missed the beginning and started from the middle of the sequence.

Suppose that $\varepsilon$ is a delayed renewal event. Let us define some quantities:

1) $\{b_n\}$: the probability that $\varepsilon$ first occurs on trial $n$, $n = 0, 1, 2, \ldots$;

2) $\{f_n\}$: the probability that $\varepsilon$ first occurs again $n$ trials after an occurrence of $\varepsilon$;

3) $\{u_n\}$: the probability that $\varepsilon$ occurs on trial $n$, given that $\varepsilon$ occurred on trial 0;

4) $\{v_n\}$: the unconditional probability that $\varepsilon$ occurs on trial $n$.

By convention, we suppose that $f_0 = 0$, but we do allow $b_0 > 0$, so that $\varepsilon$ may occur for the first time at trial 0. Let $B(s)$, $F(s)$, $U(s)$ and $V(s)$ be the corresponding generating functions. We have

$U(s) = [1 - F(s)]^{-1}, \qquad |s| < 1.$


$\varepsilon$ is recurrent if $f = \sum f_n = 1$ and transient if $f < 1$. Periodicities are determined by examining $\text{g.c.d.}\{n : f_n > 0\}$. Note that it is possible for $\varepsilon$ to be a recurrent event and yet have a non-zero probability of never occurring; but once it does occur, it then occurs infinitely often.

To find $V(s)$, note that when $\varepsilon$ occurs at trial $n \ge 1$, either $\varepsilon$ occurs for the first time at $n$ with probability $b_n = b_n u_0$, or $\varepsilon$ occurs for the first time at some intermediate trial $k < n$ and then occurs again at $n$. Thus,

$v_n = b_0 u_n + b_1 u_{n-1} + \cdots + b_n u_0, \qquad n = 0, 1, 2, \ldots.$

We recognize the right side as the convolution of $\{b_n\}$ with $\{u_n\}$, and so

$V(s) = B(s) U(s), \qquad |s| < 1.$
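Numerically, $\{v_n\}$ is just this convolution. A minimal sketch reusing the renewal_sequence helper from the sketch in Section 5.3 (the delay and lifetime distributions below are hypothetical examples):

    def delayed_occurrence(b, f, N):
        # v_n = sum_{k=0}^{n} b_k u_{n-k}, with u computed from the ordinary renewal equation.
        u = renewal_sequence(f, N)  # helper defined in the Section 5.3 sketch
        return [sum(b.get(k, 0.0) * u[n - k] for k in range(n + 1)) for n in range(N + 1)]

    # Hypothetical delay distribution b and lifetime distribution f.
    v = delayed_occurrence({0: 0.2, 1: 0.8}, {1: 0.5, 2: 0.5}, 20)
    print(v[-1])  # also approaches 1/mu = 2/3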


    5.5 Summary

Table 5.1: Summary of some concepts

Event: a property of a stochastic process; its occurrence or non-occurrence can be determined after $n$ trials.

Renewal Event: when this type of event occurs, the stochastic process undergoes a renewal: the random behavior of the process from this point on is the same as that of the process from time zero.

Delayed Renewal Event: from its second occurrence onward, each occurrence of this type of event renews the process: the random behavior of the process from this point on is the same as that of the process from the time of the first occurrence.

Recurrent: the renewal event will occur with probability 1.

Transient: the renewal event may never occur.

Positive Recurrent: the renewal event is recurrent and the expected waiting time for the next occurrence is finite.

Null Recurrent: the renewal event is recurrent but the expected waiting time for the next occurrence is infinite.

Period: the greatest common divisor of the numbers of trials after which the renewal event can occur.

Aperiodic: the period of the renewal event is 1.

    5.6 Problems

1. A fair die is rolled repeatedly. We keep a record of the score of each roll