Appendix B


Transcript of Appendix B

  • Appendix B1

    Measure and Measurability

    σ-algebra

    Let Ω denote a set of elements of interest, which is referred to as a space, or a
    sample space in the statistical context.

    A set Ω is said to be countable if its elements can be listed as a sequence:
    Ω = {ω_1, ω_2, ...}. Otherwise Ω is uncountable. In particular, any finite set is
    countable. Any interval [a, b] with a < b is uncountable.

    A space Ω is said to be discrete if it is countable, or continuous if it is an
    interval or a product of intervals, such as:

      Ω = R = (−∞, ∞),  Ω = [0, ∞),  Ω = [0, 1],
      Ω = R^n = {(x_1, ..., x_n) : x_1, ..., x_n ∈ R},
      Ω = [0, ∞)^2 = {(x, y) : x, y ≥ 0}, or
      Ω = [0, ∞) × [0, 1] = {(x, y) : x ≥ 0, 0 ≤ y ≤ 1}, etc.

    A collection F of subsets of a space Ω is called a σ-algebra or σ-field if it
    satisfies the following axioms:

      (i) The empty set ∅ ∈ F;
      (ii) If E ∈ F, then its complement E^c ∈ F;
      (iii) If E_i ∈ F, i = 1, 2, ..., then ∪_{i=1}^∞ E_i ∈ F.


  • The three axioms above further imply:

      (iv) The state space Ω ∈ F;
      (v) If E ∈ F and F ∈ F, then E ∪ F ∈ F, E ∩ F ∈ F, and E − F = E ∩ F^c ∈ F;
      (vi) If E_i ∈ F, i = 1, 2, ..., then ∩_{i=1}^∞ E_i ∈ F.

    In summary, a σ-algebra is nonempty and closed under finite or countable operations
    of unions, intersections and complements. In other words, it is a self-contained
    collection of subsets.

    Given a collection G of subsets of Ω, the smallest σ-algebra F such that G ⊆ F is
    called the σ-algebra generated by G, and is denoted by F = σ(G).

    The smallest σ-algebra is F = σ({∅}) = {∅, Ω}. The largest σ-algebra is the
    collection of all subsets of Ω, denoted by 2^Ω.

    Given E ⊆ Ω (E ≠ ∅, Ω), the smallest σ-algebra that includes E as a member is
    F = σ({E}) = {∅, E, E^c, Ω}.

    Obviously, if G_1 ⊆ G_2, then σ(G_1) ⊆ σ(G_2).

    If Ω is a discrete space, then every subset E of Ω can be expressed as a countable
    union of single-element sets:

      E = ∪_{ω ∈ E} {ω}

    As a result, if a σ-algebra F on a discrete space Ω includes all single-point sets
    {ω} for ω ∈ Ω, then F = 2^Ω.
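    The generated σ-algebra can be made concrete on a small finite space. The following
    sketch (not from the original notes; the function name and example sets are mine)
    closes a collection G of subsets of a finite Ω under complements and pairwise
    unions, which on a finite space is enough to produce σ(G); for G = {E} it returns
    {∅, E, E^c, Ω} as stated above.

      from itertools import combinations

      def sigma_algebra(omega, generators):
          # Close {∅, Ω} ∪ G under complements and pairwise unions; on a finite
          # space this finite closure already yields sigma(G).
          omega = frozenset(omega)
          sets = {frozenset(), omega} | {frozenset(g) for g in generators}
          while True:
              new = {omega - a for a in sets}
              new |= {a | b for a, b in combinations(sets, 2)}
              if new <= sets:
                  return sets
              sets |= new

      omega = {1, 2, 3, 4}
      F = sigma_algebra(omega, [{1, 2}])
      print([sorted(s) for s in F])   # the four sets: ∅, {1, 2}, {3, 4}, Ω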


  • Measurable sets

    Given a space Ω and a σ-algebra F on Ω, a subset E of Ω is said to be measurable
    with respect to F if and only if E ∈ F.

    By the definition of a σ-algebra, any countable union, intersection or complement
    of measurable sets is also measurable.

    Borel field and sets

    If Ω = R and G = {[a, b] : a, b ∈ R}, then B = σ(G) is called the Borel algebra or
    Borel field on R. Equivalently, the intervals [a, b] can be replaced by (a, b),
    (a, b], [a, b), (−∞, b], etc.

    Similarly we can define the Borel field on [0, ∞) or on any other interval, as well
    as on R^n via the products of intervals:

      B = σ({ ∏_{i=1}^n [a_i, b_i] : a_i, b_i ∈ R, i = 1, ..., n })

    Any A ∈ B is called a Borel set, or said to be Borel measurable. In particular, any
    single-point set {a} = [a, a] is a Borel set, and consequently every countable
    subset of R is a Borel set as well.

    The Borel field B on an interval I must include all intervals contained in I and
    all countable unions, intersections and complements of such intervals. It is quite
    large and includes all subsets we need to define probability, but it does not
    include every subset of an interval; in other words, B ≠ 2^I. There exist many
    non-Borel sets, but it is not easy to exhibit an explicit example.


  • Measure

    Given a space Ω and a σ-algebra F on Ω, a set function m(·) defined on F is said to
    be a measure if it satisfies the following two axioms:

      (i) m(E) ≥ 0 for any E ∈ F;
      (ii) If E_1, E_2, ... ∈ F are disjoint, or mutually exclusive, in the sense that
           E_i ∩ E_j = ∅ for i ≠ j, then

             m(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ m(E_i)

    A measure defined on the Borel field is called a Borel measure. Among the most
    useful Borel measures is the Lebesgue measure, which assigns measure

      m([a, b]) = m((a, b)) = m((a, b]) = m([a, b)) = b − a

    to any interval in R. In particular, for any a ∈ R,

      m({a}) = m([a, a]) = a − a = 0

    Consequently, every countable subset of R has measure zero.

    The Lebesgue measure is defined on a σ-algebra F larger than the Borel field B.
    There exist, however, subsets of an interval I to which even the Lebesgue measure
    cannot be assigned. That explains why 2^I is too large to be a useful σ-algebra.

    For the purpose of probability, the Borel field together with the Lebesgue measure
    is sufficient.
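    A small numerical illustration of countable additivity (a sketch of mine, not part
    of the notes): the disjoint intervals (1/(n+1), 1/n], n = 1, 2, ..., partition
    (0, 1], and the partial sums of their lengths approach m((0, 1]) = 1; a countable
    set, in contrast, can be covered by intervals of arbitrarily small total length.

      # Countable additivity: the lengths of the disjoint intervals (1/(n+1), 1/n]
      # sum (telescopically) to m((0, 1]) = 1.
      total = 0.0
      for n in range(1, 100000):
          total += 1.0 / n - 1.0 / (n + 1)      # length of (1/(n+1), 1/n]
      print(total)                              # ~0.99999, approaching 1

      # A countable set has measure zero: cover its k-th point by an interval of
      # length eps / 2**k, so the total covering length is at most eps, for any eps.
      eps = 1e-6
      print(sum(eps / 2 ** k for k in range(1, 60)))   # <= eps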


  • Measurable functions

    A real-valued function g(x) defined on R is said to be measurable if
    g^{-1}(A) = {x : g(x) ∈ A} is a Borel set for every Borel set A.

    In particular, the indicator function g(x) = I{x ∈ B} of any B ∈ B is measurable,
    since g^{-1}(A) = ∅, B, B^c or R for any A ∈ B.

    A real-valued function g(x) defined on R is said to be Riemann integrable if the
    integral ∫ g(x) dx is well defined in the usual sense of calculus. This integral
    is referred to as the Riemann integral.

    A property is said to hold almost everywhere if it holds on a Borel set B such that
    m(B^c) = 0 under the Lebesgue measure.

    A bounded function on a finite interval is Riemann integrable if and only if it is
    continuous almost everywhere.

    An almost everywhere continuous function g(x) is measurable. Consequently, all
    analytic, continuous, piecewise continuous, and Riemann integrable functions are
    measurable. In fact, all functions of practical interest are measurable.

    A simple example of a measurable function that is not Riemann integrable is the
    indicator of the set Q of rational numbers:

      I_Q(x) = I{x ∈ Q} = 1 if x is a rational number, and 0 if x is irrational.

    Since Q is countable, it is a Borel set and so I_Q(x) is measurable. It is,
    however, nowhere continuous, hence not Riemann integrable.


  • Lebesgue integral

    The Lebesgue integral is defined for measurable functions. If g(x) is a Riemann
    integrable function, its Lebesgue integral coincides with its Riemann integral.
    Hence we will use the same notation for Lebesgue and Riemann integrals.

    If g(x) = α_1 I_{A_1}(x) + ··· + α_k I_{A_k}(x) is a linear combination of
    indicators of Borel sets, its Lebesgue integral is defined by

      ∫ g(x) dx = Σ_{i=1}^k α_i m(A_i)

    where m(·) is the Lebesgue measure. In particular, if Q is the set of rational
    numbers, then ∫ I_Q(x) dx = m(Q) = 0.

    For a measurable function g(x) ≥ 0, there exists a sequence of linear combinations
    {g_n(x)} of indicators of Borel sets such that g_n(x) ↑ g(x). The Lebesgue integral
    of g(x) is then defined by

      ∫ g(x) dx = lim_{n→∞} ∫ g_n(x) dx

    A measurable function g(x) is said to be Lebesgue integrable if

      ∫ |g(x)| dx = ∫ g⁺(x) dx + ∫ g⁻(x) dx < ∞

    where g⁺(x) = max(g(x), 0) and g⁻(x) = max(−g(x), 0), in which case its Lebesgue
    integral is ∫ g(x) dx = ∫ g⁺(x) dx − ∫ g⁻(x) dx.
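    For a simple function the definition above is easy to evaluate directly. The sketch
    below (my illustration; the helper names and example sets are not from the notes)
    computes Σ α_i m(A_i) when each A_i is a finite union of disjoint intervals.

      def lebesgue_measure(intervals):
          """Total length of a list of disjoint intervals [(a, b), ...]."""
          return sum(b - a for a, b in intervals)

      def integrate_simple(terms):
          """terms: list of (alpha_i, A_i), with A_i a list of disjoint intervals."""
          return sum(alpha * lebesgue_measure(A) for alpha, A in terms)

      # g(x) = 2 on [0, 1] and 5 on (3, 4] ∪ (6, 7]; a countable set such as Q
      # would contribute nothing, since its Lebesgue measure is 0.
      g = [(2.0, [(0, 1)]), (5.0, [(3, 4), (6, 7)])]
      print(integrate_simple(g))    # 2*1 + 5*2 = 12.0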

  • Probability space and random variable

    For a continuous state space S, by a measurable set A ⊆ S we mean A ∈ B; that is,
    A is Borel measurable.

    Let Ω be a sample space (of all possible outcomes) and F a σ-algebra of subsets of
    Ω. Each E ∈ F is called an event.

    A probability (measure) is a measure Pr(·) defined on F such that Pr(Ω) = 1. The
    probability Pr(E) is defined only for E ∈ F. The triplet (Ω, F, Pr(·)) is called a
    probability space.

    A real-valued function X = X(ω) of ω ∈ Ω is said to be a random variable if and
    only if {X ∈ A} ∈ F for every Borel set A.

    If X is a random variable and g(x) is a measurable function, then g(X) is also a
    random variable.

    If X is a continuous random variable, then its state space S is an interval and
    Pr(X ∈ A) is defined only for Borel sets A ⊆ S.

    Given a cdf F(x), we can define Pr(X ≤ x) = F(x) and extend it to Pr(X ∈ A) for any
    Borel set A via the axioms and properties of probability, such as

      Pr(X ∈ (a, b]) = Pr(a < X ≤ b) = Pr(X ≤ b) − Pr(X ≤ a),
      Pr(X ∈ (−∞, b)) = Pr(X < b) = lim_{n→∞} Pr(X ≤ b − n^{-1}),

    and so on, since the Borel field B is generated by {(−∞, x] : x ∈ R}. Such
    extensions may not be possible if A ∉ B.

    This explains why a cdf can determine a probability distribution.
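    As a concrete illustration (mine, not from the notes) of extending a cdf to
    probabilities of Borel sets, the snippet below uses the Exp(1) cdf
    F(x) = 1 − e^{−x}: Pr(X ∈ (a, b]) is computed as F(b) − F(a), and Pr(X < b) is
    approached through the limit of Pr(X ≤ b − 1/n).

      import math

      F = lambda x: (1.0 - math.exp(-x)) if x >= 0 else 0.0   # Exp(1) cdf

      a, b = 0.5, 2.0
      print(F(b) - F(a))                     # Pr(X in (a, b])

      # Pr(X < b) as a limit of Pr(X <= b - 1/n); for a continuous cdf the two agree.
      for n in (10, 1000, 100000):
          print(n, F(b - 1.0 / n))
      print(F(b))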


  • Appendix B2

    Stationary Distribution

    Existence

    Let {X_n : n = 0, 1, ...} be a Markov chain with a finite state space
    S = {1, 2, ..., N} and transition matrix P = (p_ij)_{N×N}.

    Since p_i1 + ··· + p_iN = 1 for i = 1, ..., N, the matrix

      I − P = [ 1 − p_11    −p_12    ...    −p_1N   ]
              [  −p_21    1 − p_22   ...    −p_2N   ]
              [    ...        ...    ...      ...   ]
              [  −p_N1      −p_N2    ...   1 − p_NN ]

    has a zero sum of elements in each row.

    Thus the N columns of I − P are linearly dependent, implying Rank(I − P) < N.
    Consequently, the equation π(I − P) = 0, or π = πP, has at least one solution
    π ≠ 0.

    For convenience, we write π > 0 for a vector π = (π_1, ..., π_N) (either a row or a
    column) if π_j ≥ 0 for all j = 1, ..., N and π_j > 0 for some j. We also write
    π < 0 if −π > 0.

    We now prove the existence of a row vector π > 0 such that π = πP by induction on
    the number N of states.

    Start with N = 1. In this case we must have P = 1, hence π = 1 obviously satisfies
    π = πP.


  • For N > 1, assume that such a π > 0 exists for any chain with k states, where
    1 ≤ k < N. We then need to prove the case with N states.

    Let π ≠ 0 be a solution to π = πP. If neither π > 0 nor π < 0, we can (relabelling
    the states if necessary) write π = [π_1 π_2] and

      P = [ P_11  P_12 ]
          [ P_21  P_22 ]                                                    (B2.1)

    where π_1 is a vector of k negative elements (1 ≤ k < N), π_2 > 0 has N − k
    elements, and P_11 is a k × k matrix.

    If P_12 = 0, then P_11 is a k × k transition matrix (the elements of each of its
    rows add to 1). Hence, by the induction assumption, there exists π* > 0 such that
    π* = π* P_11. Consequently,

      [π* 0] P = [π* 0] [ P_11   0   ]  =  [π* P_11  0]  =  [π* 0]
                        [ P_21  P_22 ]

    Thus π = [π* 0] > 0 and π = πP.

    If P_12 ≠ 0, then p_ij > 0 for some i ≤ k and j > k, so that

      p_i1 + ··· + p_ik < p_i1 + ··· + p_ik + p_ij ≤ 1.

    Let 1_k = [1 1 ··· 1]^T denote the k × 1 vector with all elements equal to 1. Then

      (I − P_11) 1_k = [ 1 − p_11 − ··· − p_1k ]
                       [          ...          ]  > 0
                       [ 1 − p_k1 − ··· − p_kk ]

    Thus π_1(I − P_11) 1_k < 0, since all elements of π_1 are negative.


  • On the other hand, by (B2.1),

      π = πP  ⇒  π_1 = π_1 P_11 + π_2 P_21  ⇒  π_1(I − P_11) = π_2 P_21

    This together with π_2 > 0 leads to a contradiction:

      0 > π_1(I − P_11) 1_k = π_2 P_21 1_k ≥ 0

    It follows that when P_12 ≠ 0, either π > 0 or π < 0. Thus we can take either π
    (if π > 0) or −π (if π < 0) as a vector π* > 0 satisfying π* = π*P.

    We have shown the existence of π > 0 such that π = πP with N states. By the
    principle of mathematical induction, this holds for all N = 1, 2, ....

    Let π* = (π*_1, ..., π*_N) > 0 be a row vector such that π* = π*P. Then
    π*_1 + ··· + π*_N > 0. Take π = [π_1 ··· π_N] with

      π_j = π*_j / (π*_1 + ··· + π*_N),   j = 1, 2, ..., N.

    Then π > 0, π_1 + ··· + π_N = 1 and

      π(I − P) = π*(I − P) / (π*_1 + ··· + π*_N) = 0,   i.e.   π = πP.

    Thus π is a stationary distribution of {X_n}. This shows that a Markov chain with a
    finite state space must have at least one stationary distribution.
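    In practice a stationary distribution of a finite chain can be computed directly by
    solving π(I − P) = 0 together with the normalisation Σ_j π_j = 1. The sketch below
    (my illustration; the example matrix and function name are not from the notes) does
    this with a least-squares solve.

      import numpy as np

      def stationary_distribution(P):
          # Solve (I - P)^T x = 0 together with the normalisation sum(x) = 1,
          # i.e. the transposed form of pi(I - P) = 0.
          N = P.shape[0]
          A = np.vstack([np.eye(N) - P.T, np.ones((1, N))])
          b = np.zeros(N + 1)
          b[-1] = 1.0
          pi, *_ = np.linalg.lstsq(A, b, rcond=None)
          return pi

      P = np.array([[0.9, 0.1, 0.0],
                    [0.2, 0.7, 0.1],
                    [0.0, 0.3, 0.7]])
      pi = stationary_distribution(P)
      print(pi)               # nonnegative entries summing to 1
      print(pi @ P - pi)      # ~ 0, i.e. pi = pi P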


  • Uniqueness

    Suppose there exist two stationary distributions π ≠ π′. Then π* = π − π′ ≠ 0 and

      π*P = (π − π′)P = πP − π′P = π − π′ = π*

    Thus π* is a non-zero solution to the equation π* = π*P.

    Since the elements of π and of π′ each sum to 1, the elements of π* sum to 0.
    Hence neither π* > 0 nor π* < 0.

    Then, by the argument following (B2.1), the partition there must have P_12 = 0
    (otherwise the same contradiction arises), so that

      P = [ P_11   0   ]
          [ P_21  P_22 ]

    where P_11 is a k × k matrix with 1 ≤ k < N.

    It follows that the transition matrix at time n has the form

      P(n) = P^n = [ P_11^n       0      ]
                   [ P_21^(n)   P_22^n   ]                                  (B2.2)

    where P_21^(n) and P_22^n are matrices of orders (N − k) × k and
    (N − k) × (N − k), respectively.

    From (B2.2) we see that p_1N(n) = 0 for all n. This shows that the Markov chain
    with transition matrix P is reducible.

    Therefore, two distinct stationary distributions are possible only for a reducible
    Markov chain. Consequently, an irreducible Markov chain with finitely many states
    must have a unique stationary distribution.


  • Appendix B3

    Convergence of Markov Chain

    For any time-homogeneous Markov chain with N states and transition matrices
    P = (p_ij)_{N×N} and P^n = (p_ij(n))_{N×N}, it has been proved theoretically that
    lim_{n→∞} p_ij(n) exists (i.e., p_ij(n) converges) for every aperiodic state j.
    Thus if P is aperiodic (every state of P is aperiodic), then P^n converges.

    If P is irreducible, then P^∞ = lim_{n→∞} P^n exists if and only if P is aperiodic.
    This is because each row of P^∞ is the unique stationary distribution
    π = (π_1, ..., π_N) of P, with π_j > 0 for at least one state j. Therefore,

      lim_{n→∞} P^n = P^∞  ⇒  lim_{n→∞} p_jj(n) = π_j > 0  ⇒  p_jj(n) > 0

    for all sufficiently large n. This means that state j is aperiodic (a periodic
    state j must have p_jj(n) = 0 for infinitely many n), and so P is aperiodic since
    it is irreducible.

    If all states of P are periodic, then P^n must diverge. To see this, let P_11 be an
    irreducible block of P that is a transition matrix itself. Then P_11^n diverges
    since P_11 is periodic, and hence P^n diverges.

    However, individual p_ij(n) may converge for some (i, j) even if every state of P
    is periodic. A trivial example is a reducible P that has some states i, j such that
    p_ij(n) = 0 for all n.

    If P has both periodic and aperiodic states (such a P must be reducible), then P^n
    may or may not converge.


  • A simple example with divergent P^n is

      P = [ 0  1  0 ]
          [ 1  0  0 ]     ⇒   P^n = I_3 for even n and P^n = P for odd n.
          [ 0  0  1 ]

    Here states 1 and 2 are periodic with d = 2, whereas state 3 is aperiodic. As P^n
    oscillates between I_3 and P, it diverges.

    The next example shows that P^n may converge:

      P = [ 0    0.5  0.5 ]
          [ 0.5  0    0.5 ]
          [ 0    0    1   ]

      ⇒  P^n = [ 0.5^n   0       1 − 0.5^n ]
               [ 0       0.5^n   1 − 0.5^n ]   for even n, and
               [ 0       0       1         ]

         P^n = [ 0       0.5^n   1 − 0.5^n ]
               [ 0.5^n   0       1 − 0.5^n ]   for odd n.
               [ 0       0       1         ]

    Thus states 1 and 2 are periodic (d = 2) and state 3 is aperiodic. P^n obviously
    converges in this case:

      P^∞ = lim_{n→∞} P^n = [ 0  0  1 ]
                            [ 0  0  1 ]
                            [ 0  0  1 ]

    where each row of P^∞ is the unique stationary distribution of P.

    This example also shows that lim_{n→∞} p_ij(n) = 0 can exist for periodic states
    j = 1, 2 beyond the trivial case p_ij(n) = 0 for all n.

    Even if P^n diverges, π(n) = π(0)P^n may converge for some π(0) that satisfies
    certain conditions. This is demonstrated in the following example.
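    Both examples are easy to check numerically. The snippet below (my illustration)
    raises the two matrices to consecutive powers: the first keeps oscillating between
    I_3 and P, while the second settles to the matrix with rows (0, 0, 1).

      import numpy as np

      P1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
      P2 = np.array([[0, 0.5, 0.5], [0.5, 0, 0.5], [0, 0, 1]], dtype=float)

      for n in (20, 21):
          print(np.linalg.matrix_power(P1, n))   # I_3 for n = 20, P1 for n = 21
          print(np.linalg.matrix_power(P2, n))   # both close to rows (0, 0, 1)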


  • Consider a transition matrix of the form

      P = [ A  B ]            with   D = [ 0  1 ]          (N > 2)
          [ 0  D ]_{N×N}                 [ 1  0 ]_{2×2}

    Assume that A^n → 0 as n → ∞. Then (I − A^2)^{-1} exists.

    Note that

      P^2 = [ A^2   AB + BD ]
            [ 0       I_2   ]

    has aperiodic states N − 1 and N. Hence

      lim_{m→∞} P^{2m} = lim_{m→∞} (P^2)^m = [ 0   Q  ]   for some Q        (B3.1)
                                             [ 0  I_2 ]

    (since (A^2)^m → 0 and p_ij(2m) converges for the aperiodic states j = N − 1
    and N).

    Since P^{2m+2} has the same limit as P^{2m}, (B3.1) implies

      [ 0   Q  ] = P^2 [ 0   Q  ] = [ A^2   AB + BD ] [ 0   Q  ]
      [ 0  I_2 ]       [ 0  I_2 ]   [ 0       I_2   ] [ 0  I_2 ]

      ⇒  Q = A^2 Q + AB + BD  ⇒  (I − A^2) Q = AB + BD
      ⇒  Q = (I − A^2)^{-1} (AB + BD)                                        (B3.2)

    It also follows from (B3.1) that

      lim_{m→∞} P^{2m+1} = [ 0   Q  ] P = [ 0  QD ]                          (B3.3)
                           [ 0  I_2 ]     [ 0   D ]

    (B3.1) and (B3.3) show that P^n diverges, as D ≠ I_2.

    Let π_1 be a 1 × (N − 2) vector of nonnegative elements with sum no more than 0.5,
    and let π_2 = [0.5 0.5] − π_1 Q. Then

      π_1 Q + π_2 = [0.5 0.5] = [0.5 0.5] D = (π_1 Q + π_2) D                (B3.4)


  • It follows from (B3.1), (B3.3) and (B3.4) that

      lim_{m→∞} [π_1 π_2] P^{2m} = [ 0  π_1 Q + π_2 ] = [ 0  π_1 QD + π_2 D ]
                                 = lim_{m→∞} [π_1 π_2] P^{2m+1}              (B3.5)

    where 0 is a 1 × (N − 2) vector of zeros. This shows that the limit
    lim_{n→∞} [π_1 π_2] P^n exists.

    Since each row of the matrix Q must have sum equal to 1, it is not difficult to see
    that π(0) = [π_1 π_2] provides an initial distribution such that

      lim_{n→∞} π(n) = lim_{n→∞} π(0) P^n = lim_{n→∞} [π_1 π_2] P^n = [ 0  0.5  0.5 ]

    Therefore, if π_1 and π_2 have nonnegative elements and satisfy the condition
    π_1 Q + π_2 = [0.5 0.5], then π(n) converges with π(0) = [π_1 π_2], although P^n
    diverges.

    The domain for π(0) = [π_1 π_2] to meet the above conditions has dimension N − 2,
    which is one less than the dimension N − 1 for π(0) without such conditions.

    The above example can also show that lim_{n→∞} p_ij(n) > 0 is possible for a
    periodic state j. To see this, take N = 3, A = 0.8 and B = [0.1 0.1]. Then by
    (B3.2),

      Q = (1 − 0.8^2)^{-1} (0.8 [0.1 0.1] + [0.1 0.1] D)
        = (1/0.36) [0.18 0.18] = [0.5 0.5] = QD

    Hence, by (B3.1) and (B3.3), lim_{n→∞} p_1j(n) = 0.5 > 0 for the periodic states
    j = 2, 3.
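    The N = 3 case is easy to verify numerically (my illustration; the particular
    initial distribution is just one choice satisfying the condition above): with
    A = 0.8 and B = [0.1 0.1] we have Q = [0.5 0.5], so π_1 = 0.4 and π_2 = [0.3 0.3]
    satisfy π_1 Q + π_2 = [0.5 0.5].

      import numpy as np

      # P for N = 3, A = 0.8, B = [0.1, 0.1], D = [[0, 1], [1, 0]].
      P = np.array([[0.8, 0.1, 0.1],
                    [0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])

      for n in (50, 51):
          print(np.linalg.matrix_power(P, n))    # rows 2-3 keep oscillating (P^n diverges),
                                                 # but row 1 tends to [0, 0.5, 0.5]

      pi0 = np.array([0.4, 0.3, 0.3])             # pi1 = 0.4, pi2 = [0.3, 0.3]
      print(pi0 @ np.linalg.matrix_power(P, 50))  # ~ [0, 0.5, 0.5]
      print(pi0 @ np.linalg.matrix_power(P, 51))  # ~ [0, 0.5, 0.5]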


  • Appendix B4

    Properties of Poisson Processes

    Partition of a Poisson process

    Let N(t) be a Poisson process with rate λ, which counts the number of events that
    have occurred by time t.

    Each event is classified into one of k types, independently of N(t), with
    Pr(Type j) = p_j, j = 1, ..., k, where p_1 + ··· + p_k = 1.

    Let N_j(t) be the number of type j events that have occurred by time t,
    j = 1, ..., k. Conditional on N(t) = n, (N_1(t), ..., N_k(t)) has a multinomial
    distribution:

      Pr(N_j(t) = n_j, j = 1, ..., k | N(t) = n)
        = [n! / (n_1! ··· n_k!)] p_1^{n_1} ··· p_k^{n_k}

    where n_1, ..., n_k satisfy n_1 + ··· + n_k = n.

    Consequently,

      Pr(N_1(t) = n_1, ..., N_k(t) = n_k)
        = Pr(N_j(t) = n_j, j = 1, ..., k | N(t) = n) Pr(N(t) = n)
        = [n! / (n_1! ··· n_k!)] p_1^{n_1} ··· p_k^{n_k} e^{−λt} (λt)^n / n!
        = [(λt)^{n_1+···+n_k} / (n_1! ··· n_k!)] p_1^{n_1} ··· p_k^{n_k} e^{−(p_1+···+p_k)λt}
        = [(λt p_1)^{n_1} / n_1!] e^{−λp_1 t} ··· [(λt p_k)^{n_k} / n_k!] e^{−λp_k t}

    Thus N_1(t), N_2(t), ..., N_k(t) are independent Poisson processes with rates
    λp_1, ..., λp_k, respectively.
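    A quick simulation check (my illustration; the parameter values are arbitrary):
    draw N(t) ~ Poisson(λt), classify each event independently as type 1 with
    probability p_1, and confirm that the two type counts have Poisson-like means and
    variances λp_j t and are essentially uncorrelated.

      import numpy as np

      rng = np.random.default_rng(0)
      lam, t, p1, reps = 2.0, 5.0, 0.3, 20000

      N = rng.poisson(lam * t, size=reps)     # total counts N(t)
      N1 = rng.binomial(N, p1)                # type-1 counts given N(t)
      N2 = N - N1                             # type-2 counts

      print(N1.mean(), N1.var(), lam * p1 * t)          # all ~ 3.0
      print(N2.mean(), N2.var(), lam * (1 - p1) * t)    # all ~ 7.0
      print(np.corrcoef(N1, N2)[0, 1])                  # ~ 0, consistent with independence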


  • Transform of a multivariate density

    Before deriving the joint distribution of arrival times, we first review the
    transform of a multivariate density function.

    Let f_X(x_1, ..., x_n) denote the joint density function of random variables
    X_1, ..., X_n.

    Assume that a one-to-one transform between (X_1, ..., X_n) and (Y_1, ..., Y_n) is
    given by X_i = g_i(Y_1, ..., Y_n), i = 1, ..., n.

    Then, by multivariate calculus, the joint density of Y_1, ..., Y_n is given by

      f_Y(y_1, ..., y_n) = f_X(x_1, ..., x_n) |J|                            (B4.1)

    where x_i = g_i(y_1, ..., y_n), i = 1, ..., n, and

      J = det( ∂x_i/∂y_j )_{n×n} = | ∂x_1/∂y_1   ∂x_1/∂y_2   ...   ∂x_1/∂y_n |
                                   | ∂x_2/∂y_1   ∂x_2/∂y_2   ...   ∂x_2/∂y_n |
                                   |     ...         ...     ...       ...   |
                                   | ∂x_n/∂y_1   ∂x_n/∂y_2   ...   ∂x_n/∂y_n |

    which is called the Jacobian of the transform from (x_1, ..., x_n) to
    (y_1, ..., y_n); |J| in (B4.1) is its absolute value.

    If the transform is linear, X = AY, where X = (X_1, ..., X_n)^T,
    Y = (Y_1, ..., Y_n)^T, and A = (a_ij)_{n×n} is a constant matrix, then

      ∂x_i/∂y_j = ∂(Σ_{k=1}^n a_ik y_k)/∂y_j = a_ij,   i, j = 1, ..., n.

    Hence J = |A| (the determinant of A).
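    A small symbolic check (my sketch, using sympy; the cumulative-sum transform here
    anticipates the arrival-time change of variables on the next slide): for
    a_k = t_1 + ··· + t_k the matrix of partial derivatives is lower triangular with
    ones on the diagonal, so the Jacobian determinant is 1.

      import sympy as sp

      t1, t2, t3 = sp.symbols('t1 t2 t3')
      a = sp.Matrix([t1, t1 + t2, t1 + t2 + t3])   # a_k = t_1 + ... + t_k
      J = a.jacobian([t1, t2, t3])
      print(J)          # Matrix([[1, 0, 0], [1, 1, 0], [1, 1, 1]])
      print(J.det())    # 1, so |J| = 1 in (B4.1)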


  • Arrival times

    Given a Poisson process N(t) with rate λ counting the number of events, the arrival
    time of the kth event is given by

      A_k = T_1 + T_2 + ··· + T_k,   k = 1, 2, ...

    where T_1, T_2, ... are i.i.d. exponentially distributed with rate λ.

    Let (a_1, ..., a_n) and (t_1, ..., t_n) denote the values of the random vectors
    (A_1, ..., A_n) and (T_1, ..., T_n), respectively. Then

      [ a_1 ]   [ 1  0  ...  0 ] [ t_1 ]
      [ a_2 ] = [ 1  1  ...  0 ] [ t_2 ]                                     (B4.2)
      [ ... ]   [ .. .. ... .. ] [ ... ]
      [ a_n ]   [ 1  1  ...  1 ] [ t_n ]

    subject to the restrictions a_1 < ··· < a_n and t_1 > 0, ..., t_n > 0.

    Let f_A and f_T denote the joint densities of (A_1, ..., A_n) and (T_1, ..., T_n),
    respectively.

    Since the matrix in (B4.2) has determinant equal to 1, we have J = 1 for the
    transform in (B4.2). Hence, by (B4.1),

      f_A(a_1, ..., a_n | N(t) = n) = f_T(t_1, ..., t_n | N(t) = n)
        = Pr(N(t) = n | t_1, ..., t_n) f_T(t_1, ..., t_n) / Pr(N(t) = n)
        = Pr(N(t) − N(a_n) = 0) λe^{−λt_1} ··· λe^{−λt_n} / Pr(N(t) = n)
        = e^{−λ(t − a_n)} λ^n e^{−λ(t_1+···+t_n)} / [e^{−λt} (λt)^n / n!]
        = n! / t^n                                                           (B4.3)

    if 0 ≤ a_1 < ··· < a_n ≤ t, and 0 otherwise.


  • Let X_1, ..., X_n be i.i.d. random variables with a common density f_X(x), and let
    X_(1) < ··· < X_(n) be their order statistics.

    The joint density of X_1, ..., X_n at (x_1, ..., x_n) is given by

      f(x_1, ..., x_n) = f_X(x_1) ··· f_X(x_n) = f_X(x_(1)) ··· f_X(x_(n)).

    Given x_(1) < x_(2) < ··· < x_(n), there are n! unordered n-tuples (x_1, ..., x_n)
    whose ordered values are equal to x_(1), ..., x_(n), each with density
    f_X(x_(1)) ··· f_X(x_(n)).

    Therefore, the density of X_(1), ..., X_(n) is given by

      f_(n)(x_(1), ..., x_(n)) = n! ∏_{i=1}^n f_X(x_(i))                     (B4.4)

    For example, when n = 3 and (x_(1), x_(2), x_(3)) = (1, 2, 3),

      f_(3)(1, 2, 3) = f(1, 2, 3) + f(1, 3, 2) + f(2, 1, 3)
                       + f(2, 3, 1) + f(3, 1, 2) + f(3, 2, 1)
                     = 3! f_X(1) f_X(2) f_X(3)

    In particular, if f_X(x) = t^{-1} I{0 ≤ x ≤ t} is uniform over [0, t], then

      f_(n)(x_(1), ..., x_(n)) = n! (1/t)^n = n! / t^n                       (B4.5)

    if 0 ≤ x_(1) < ··· < x_(n) ≤ t, and 0 otherwise.

    Comparing (B4.5) with (B4.3), we see that the conditional joint distribution of the
    arrival times A_1 < ··· < A_n given N(t) = n is the same as that of the order
    statistics of n independent uniform random variables over the interval [0, t].
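    This can be checked by simulation (my illustration; the rate, horizon and n are
    arbitrary choices): among simulated paths with exactly n arrivals in [0, t], the
    average arrival times should match the means k t/(n+1) of the order statistics of
    n i.i.d. Uniform(0, t) variables.

      import numpy as np

      rng = np.random.default_rng(1)
      lam, t, n, reps = 1.0, 10.0, 8, 50000

      arrivals = []
      for _ in range(reps):
          gaps = rng.exponential(1.0 / lam, size=2 * n + 20)
          times = np.cumsum(gaps)
          if np.searchsorted(times, t) == n:      # exactly n arrivals in [0, t]
              arrivals.append(times[:n])
      arrivals = np.array(arrivals)

      uniforms = np.sort(rng.uniform(0, t, size=(len(arrivals), n)), axis=1)
      print(arrivals.mean(axis=0))    # conditional means of A_1, ..., A_n
      print(uniforms.mean(axis=0))    # means of the uniform order statistics, k*t/(n+1)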


  • The time-inhomogeneous case

    If the Poisson process N(t) is time-inhomogeneous with intensity function λ(t) and
    cumulative intensity Λ(t) = ∫_0^t λ(s) ds, then by the independent increments,

      Pr(T_k > t | T_1 = t_1, ..., T_{k−1} = t_{k−1})
        = Pr(N(a_{k−1} + t) − N(a_{k−1}) = 0) = e^{−[Λ(t + a_{k−1}) − Λ(a_{k−1})]}

    where a_0 = t_0 = 0 and a_k = t_1 + ··· + t_k, k = 1, 2, ....

    Thus the density of T_k given {T_1 = t_1, ..., T_{k−1} = t_{k−1}} is

      f_k(t | t_1, ..., t_{k−1}) = λ(t + a_{k−1}) e^{−[Λ(t + a_{k−1}) − Λ(a_{k−1})]}

    and the joint density of T_1, ..., T_n is given by

      f(t_1, ..., t_n) = ∏_{k=1}^n f_k(t_k | t_1, ..., t_{k−1})
                       = ∏_{k=1}^n λ(a_k) e^{−[Λ(a_k) − Λ(a_{k−1})]}
                       = λ(a_1) ··· λ(a_n) e^{−Λ(a_n)}

    Then, similarly to (B4.3), for 0 ≤ a_1 < ··· < a_n ≤ t,

      f_A(a_1, ..., a_n | N(t) = n)
        = e^{−[Λ(t) − Λ(a_n)]} λ(a_1) ··· λ(a_n) e^{−Λ(a_n)} / [e^{−Λ(t)} [Λ(t)]^n / n!]
        = n! λ(a_1) ··· λ(a_n) / [Λ(t)]^n = n! ∏_{j=1}^n [λ(a_j) / Λ(t)]     (B4.6)

    Comparing (B4.6) with (B4.4), we see that given N(t) = n, the arrival times
    A_1 < ··· < A_n are distributed as the order statistics of i.i.d. X_1, ..., X_n
    with common density

      f_X(x) = λ(x) / Λ(t),   0 ≤ x ≤ t.
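    Simulation sketch (mine, not from the notes): take λ(x) = 2x on [0, t], so
    Λ(x) = x². Paths are generated by thinning a homogeneous process of rate 2t;
    conditional on exactly n arrivals in [0, t], the arrival times should match the
    order statistics of i.i.d. draws with density λ(x)/Λ(t) = 2x/t², which can be
    sampled by inversion as X = t√U.

      import numpy as np

      rng = np.random.default_rng(2)
      t, n, reps = 3.0, 5, 50000
      lam_max = 2 * t                                  # bound on lambda(x) = 2x over [0, t]

      kept = []
      for _ in range(reps):
          m = rng.poisson(lam_max * t)                 # candidate points of a rate lam_max process
          cand = np.sort(rng.uniform(0, t, size=m))
          acc = cand[rng.uniform(size=m) < 2 * cand / lam_max]   # keep with prob lambda(x)/lam_max
          if len(acc) == n:
              kept.append(acc)
      kept = np.array(kept)

      xs = np.sort(t * np.sqrt(rng.uniform(size=(len(kept), n))), axis=1)
      print(kept.mean(axis=0))    # conditional mean arrival times
      print(xs.mean(axis=0))      # means of the order statistics with density 2x/t^2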


  • Defining a Poisson process by its inter-arrival times

    Let {N_t, t ≥ 0} be a counting process and let T_1, T_2, ... be the inter-arrival
    times of N_t. If T_1, T_2, ... is a sequence of i.i.d. exponential random variables
    with common rate λ, we can show that N_t is a Poisson process with rate λ in the
    following steps.

    First, N_t can be defined through T_1, T_2, ... as follows:

      N_t = min{k ≥ 0 : A_{k+1} > t},                                        (B4.6′)

    where A_k = T_1 + ··· + T_k is the arrival time of the kth event (with A_0 = 0);
    that is, N_t is the number of arrivals up to time t. Then (B4.6′) implies

      {N_t ≤ n} = {A_{n+1} > t},   {N_t = n} = {A_n ≤ t < A_{n+1}}           (B4.7)

    Next, it is not difficult to show that N_t is Poisson distributed with mean λt
    (a tutorial exercise).

    Then we show that N_t has the Markov property. Let n_t ≥ 0 be integer-valued and
    nondecreasing in t ≥ 0. Then T_{n_s+1}, ..., T_{n_t} are independent of
    A_{n_u} = T_1 + ··· + T_{n_u} for any 0 ≤ u ≤ s < t, and of T_{n_u+1} if n_u < n_s.
    These together with (B4.7) imply

      Pr(N_t < n_t | N_u = n_u, u ≤ s, A_{n_s} = a_s)
        = Pr(A_{n_t} > t | A_{n_u} ≤ u < A_{n_u} + T_{n_u+1}, u ≤ s, A_{n_s} = a_s)
        = Pr(a_s + T_{n_s+1} + ··· + T_{n_t} > t | a_s + T_{n_s+1} > s, A_{n_s} = a_s)
        = Pr(N_t < n_t | N_s = n_s, A_{n_s} = a_s)                           (B4.8)

    for any 0 ≤ s < t and 0 ≤ a_s ≤ s.
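    The construction, and the distributional claims that follow, can be sanity-checked
    by simulation (my illustration; parameters arbitrary): build N_t from i.i.d.
    Exp(λ) gaps, then check that N_t has Poisson mean and variance λt and that the
    increment N_{t+u} − N_t behaves like an independent Poisson(λu) count.

      import numpy as np

      rng = np.random.default_rng(3)
      lam, t, u, reps = 1.5, 4.0, 2.0, 100000

      gaps = rng.exponential(1.0 / lam, size=(reps, 40))   # 40 gaps easily cover t + u = 6
      arrival = np.cumsum(gaps, axis=1)
      Nt = (arrival <= t).sum(axis=1)
      inc = (arrival <= t + u).sum(axis=1) - Nt            # N_{t+u} - N_t

      print(Nt.mean(), Nt.var(), lam * t)       # mean ~ variance ~ lambda * t
      print(inc.mean(), inc.var(), lam * u)     # mean ~ variance ~ lambda * u
      print(np.corrcoef(Nt, inc)[0, 1])         # ~ 0, consistent with independent increments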


  • Note that the value a_s of A_{n_s} must satisfy 0 ≤ a_s ≤ s, since n_s is the
    number of arrivals no later than time s by (B4.6′) and (B4.7).

    It follows from (B4.8) that

      Pr(N_t < n_t | N_u = n_u, u ≤ s) = Pr(N_t < n_t | N_s = n_s)

    for any 0 ≤ s < t. This shows the Markov property of N_t.

    It remains to show that N_{t+u} − N_t is Poisson distributed with mean λu and is
    independent of N_t for any t, u > 0. By (B4.7),

      Pr(N_{t+u} − N_t ≤ n | N_t = m) = Pr(N_{t+u} ≤ m + n | N_t = m)
        = Pr(A_{m+n+1} > t + u | A_m ≤ t < A_{m+1})

    Hence for any 0 ≤ a_m ≤ t, by the memoryless property of the exponential
    distribution and the independence between A_m and T_{m+1},

      Pr(N_{t+u} − N_t ≤ 0 | N_t = m, A_m = a_m)
        = Pr(N_{t+u} ≤ m | N_t = m, A_m = a_m)
        = Pr(A_{m+1} > t + u | A_m = a_m ≤ t < A_{m+1})
        = Pr(a_m + T_{m+1} > t + u | a_m + T_{m+1} > t)
        = Pr(T_{m+1} > t + u − a_m | T_{m+1} > t − a_m)
        = Pr(T_{m+1} > u) = e^{−λu}                                          (B4.9)

    As e^{−λu} does not depend on m and a_m, (B4.9) shows that

      Pr(N_{t+u} − N_t ≤ 0 | N_t = m) = Pr(N_{t+u} − N_t ≤ 0)
        = e^{−λu} = Pr(N_u = 0) = Pr(N_u ≤ 0)                                (B4.10)


  • For n ≥ 1, let X = T_{m+2} + ··· + T_{m+n+1} ~ Gamma(n, λ) and define the event
    E = E(x) = {A_m = a_m ≤ t, X = x}. Then an argument similar to that of (B4.9)
    leads to

      Pr(N_{t+u} − N_t ≤ n | N_t = m, E) = Pr(N_{t+u} ≤ m + n | N_t = m, E)
        = Pr(a_m + T_{m+1} + x > t + u | a_m + T_{m+1} > t, E)
        = Pr(T_{m+1} > u − x) = e^{−λ(u−x)} I{x ≤ u} + I{x > u}              (B4.11)

    Multiplying (B4.11) by the density f_X(x) of X ~ Gamma(n, λ) and integrating over
    x ≥ 0, we get

      Pr(N_{t+u} − N_t ≤ n | N_t = m) = ∫_0^∞ Pr(T_{m+1} > u − x) f_X(x) dx
        = ∫_0^u e^{−λ(u−x)} [λ^n x^{n−1} e^{−λx} / (n−1)!] dx
          + ∫_u^∞ [λ^n x^{n−1} e^{−λx} / (n−1)!] dx
        = e^{−λu} λ^n ∫_0^u [x^{n−1} / (n−1)!] dx + Pr(A_n > u)
        = e^{−λu} (λu)^n / n! + Pr(N_u ≤ n − 1) = Pr(N_u ≤ n)                (B4.12)

    It follows from (B4.10) and (B4.12) that N_{t+u} − N_t has a Poisson distribution
    with mean λu and is independent of N_t. Thus N_t is a Poisson process with rate λ.

    Remarks

    (i) (B4.6′) and (B4.7) summarise the relationship between a counting process and
    its arrival and inter-arrival times.

    (ii) A process N_t defined by (B4.6′) is a Markov process provided that
    T_1, T_2, ... are independent (they need not be i.i.d. or exponential).
