STAT:5100 (22S:193) Statistical Inference I Homework Assignments

    Luke Tierney

    Fall 2015


    Assignment 1

    Due on Monday, August 31, 2015.

    1. For each of the following experiments, describe a reasonable sample space:

    (a) Toss a coin four times.

    (b) Count the number of insect-damaged leaves on a plant.

    (c) Measure the lifetime (in hours) of a particular brand of light bulb.

    (d) Three people arrive at an airport checkpoint. Two of the three are randomly chosen to complete a survey.

    2. The set-theoretic difference A\B = A ∩ B^c is the set of all elements in A that are not in B. The symmetric difference A∆B = (A\B) ∪ (B\A) is the set of all elements in either A or B but not both. Verify the following identities:

    (a) A\B = A\(A ∩ B)

    (b) A∆B = A^c∆B^c

    (c) A ∪ B = A ∪ (B\A)

    (d) B = (B ∩ A) ∪ (B ∩ A^c)

    3. Problem 1.4 in the textbook

    4. Problem 1.5 in the textbook

    5. Problem 1.13 in the textbook

    Solutions

    1. (a) Toss coin 4 times:

    {(H,H,H,H), . . .} = {(x1, x2, x3, x4) : xi ∈ {H,T}}

    or {0, 1, 2, 3, 4} (recording only the number of heads)

    (b) Count number of insect-damaged leaves:

    {0, 1, . . . , N}, where N = # leaves (or an upper bound), or
    {0, 1, 2, . . .} if no upper bound is available


    (c) Measure lifetime in hours:

    {0, 1, 2, . . .} if lifetimes are rounded (an upper limit can be included), or
    [0, ∞) if fractional hours are allowed

    (d) Two out of three people chosen to complete a survey: Suppose the people are labeled A, B, and C. One possible sample space is the collection of all subsets of size 2 that can be chosen from the set {A,B,C}:

    {{A,B}, {A,C}, {B,C}}.

    Another possibility is the collection of all ordered pairs that can be formed:

    {(A,B), (B,A), (A,C), (C,A), (B,C), (C,B)}.

    2. (a) A\B is defined as A ∩ B^c. To see that A\B = A\(A ∩ B):

        A\(A ∩ B) = A ∩ (A ∩ B)^c
                  = A ∩ (A^c ∪ B^c)           (De Morgan's law)
                  = (A ∩ A^c) ∪ (A ∩ B^c)     (distributive law)
                  = ∅ ∪ (A ∩ B^c)
                  = A\B

    (b) A∆B = A^c∆B^c: For any two sets A and B,

        A\B = A ∩ B^c
            = B^c ∩ A          (commutative law)
            = B^c ∩ (A^c)^c
            = B^c\A^c

    So

        A∆B = (A\B) ∪ (B\A)
            = (B^c\A^c) ∪ (A^c\B^c)
            = A^c∆B^c

    (c) A ∪ B = A ∪ (B\A):

        A ∪ B = A ∪ ((B ∩ A) ∪ (B ∩ A^c))     (by part (d))
              = (A ∪ (B ∩ A)) ∪ (B ∩ A^c)     (associative law)
              = A ∪ (B ∩ A^c)


    (d) B = (B ∩ A) ∪ (B ∩ A^c):

        (B ∩ A) ∪ (B ∩ A^c) = B ∩ (A ∪ A^c)     (distributive law)
                            = B ∩ S
                            = B

    3. (a) A or B or both:

    P (A ∪B) = P (A) + P (B)− P (A ∩B)

    (b) A or B but not both:

        A∆B = (A ∪ B)\(A ∩ B) = (A ∩ B^c) ∪ (B ∩ A^c)

    and the sets A ∩ B^c and B ∩ A^c are disjoint. Since A ∩ B and A ∩ B^c are disjoint and their union is A,

        P(A) = P(A ∩ B) + P(A ∩ B^c)

    and therefore P(A ∩ B^c) = P(A) − P(A ∩ B). Similarly P(B ∩ A^c) = P(B) − P(A ∩ B), and therefore

        P(A∆B) = P(A) + P(B) − 2P(A ∩ B)

    (c) At least one of A or B: A ∪ B.

    (d) At most one of A or B = not (A ∩ B) = (A ∩ B)^c:

        P((A ∩ B)^c) = 1 − P(A ∩ B)

    4. The event A ∩ B ∩ C is the event that the birth results in identical twins who are female. The proportion of births satisfying this description is

        P(A ∩ B ∩ C) = (1/90) × (1/3) × (1/2) = 1/540 ≈ 0.001852.

    5. If A and B are disjoint then

        P(A ∪ B) = P(A) + P(B) ≤ 1,

    but

        P(A) + P(B) = P(A) + 1 − P(B^c) = 1/3 + 1 − 1/4 = 13/12 > 1,

    so A and B cannot be disjoint if P(A) = 1/3 and P(B^c) = 1/4.

    Another way to see that this is not possible: If A and B are disjoint then A ⊂ B^c and therefore P(A) ≤ P(B^c), but the opposite is true here.


    Assignment 2

    Due on Wednesday, September 9, 2015.

    1. Problem 1.14 from the textbook

    2. Suppose n balls are placed at random in n cells; cells can contain more than one ball.

    (a) Show that the probability that exactly one cell remains empty is

        n(n − 1) C(n, 2) (n − 2)! / n^n = C(n, 2) n! / n^n

    (b) The R function defined as

    sim1


    You can show it algebraically using the binomial theorem

        (a + b)^n = ∑_{k=0}^{n} C(n, k) a^k b^{n−k}

    taking a = b = 1.

    You can also use one of several counting arguments. One such argument is inductive and starts with the number of subsets of the empty set, which is x_0 = 1. Now let x_n be the number of subsets of a set of n items A_n = {a_1, a_2, . . . , a_n}. A subset of A_{n+1} = {a_1, a_2, . . . , a_{n+1}} is either a subset of A_n or it is the union of a subset of A_n with {a_{n+1}}. There are x_n subsets of each type, so the number of subsets of A_{n+1} is x_{n+1} = 2 x_n.

    Another counting argument uses the fact that each subset of a set A_n = {a_1, . . . , a_n} corresponds to an ordered list [x_1, . . . , x_n] of n 0's and 1's, with

    x_i = 1 if a_i is in the subset;

    x_i = 0 if a_i is not in the subset.

    For example, if A4 = {a1, a2, a3, a4}, then

    (1, 0, 0, 1)↔ {a1, a4}

    There are 2^n such lists.

    2. (a) The n balls are assigned at random to the n cells. The sample space S has n^n elements, since multiple balls per cell are allowed.

    Assume equally likely outcomes; equivalently, assume the balls are assigned independently.

    Exactly one cell empty means:

    • one cell is empty
    • one cell contains two balls
    • n − 2 cells contain exactly one ball

    Choices:

    • n for the empty cell
    • n − 1 for the cell with two balls
    • C(n, 2) for the balls to use for the two-ball cell
    • (n − 2)! arrangements for the other balls in their cells

    So the number of ways to get exactly one empty cell is

        n(n − 1) C(n, 2) (n − 2)! = C(n, 2) n!

    and the probability of this arrangement is

        C(n, 2) n! / n^n.


    (b) We can define a function sim1empty to compute the probability of one empty cell by simulation using N simulation replicates:
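    The original definition of sim1empty is cut off at the page break; the following is a minimal sketch of one way such a function could be written (the argument names n and N are assumptions, not the original code):

        sim1empty <- function(n, N = 10000) {
            ## estimate P(exactly one empty cell) when n balls are placed
            ## independently and uniformly at random into n cells
            mean(replicate(N, {
                counts <- tabulate(sample(n, n, replace = TRUE), nbins = n)
                sum(counts == 0) == 1
            }))
        }

    For example, sim1empty(6) should be close to the exact value choose(6, 2) * factorial(6) / 6^6.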


    > f(20)

    [1] 137846528820

    > choose(40, 20)

    [1] 137846528820
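    The definition of f is cut off at the preceding page break; a plausible reconstruction consistent with the printed output (an assumption, not the original listing) is:

        f <- function(n) sum(choose(n, 0:n)^2)   # sum of squared binomial coefficients

    With this definition f(20) equals choose(40, 20), as required by the identity verified next.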

    The identity

        ∑_{k=0}^{n} C(n, k)^2 = C(2n, n)

    can be verified using induction, but a counting argument is simpler: Consider a box containing n red and n blue balls and consider selecting a sample of n balls. There are C(2n, n) such samples. A sample has to contain some number k of red balls and n − k blue balls. The number of samples with k red and n − k blue balls is

        C(n, k) C(n, n − k) = C(n, k)^2.

    So the total number of samples of size n from a set of size 2n satisfies

        C(2n, n) = ∑_{k=0}^{n} C(n, k) C(n, n − k) = ∑_{k=0}^{n} C(n, k)^2.

    Another approach to counting the number of outcomes in which the players have the same number of heads: We have 2n slots that are to be filled in with a head or a tail, the first n corresponding to player A and the second n to player B. Choose n of these slots; this can be done in C(2n, n) ways. Some number, say k, of the chosen slots will be in the first half. Make these heads, and the remaining n − k slots in the first half tails. For the n − k chosen slots in the second half, make those tails and the remaining k heads. Then the result contains k heads in the first half and k heads in the second half. Every assignment of heads and tails with the same number of heads in each half corresponds in this way to a unique selection of n out of 2n slots, so there are the same number of such assignments as there are ways to choose n slots out of 2n slots, C(2n, n).

    5. There are 6^k possible outcomes for the k rolls. Outcomes with the m-th 6 on roll k must have a 6 on roll k (one choice) and m − 1 sixes in the first k − 1 rolls; there are C(k − 1, m − 1) ways to choose the positions for these sixes, and, given these positions, there are 5^{k−m} ways to choose the results for the remaining rolls. So the probability of the m-th 6 on roll k is

        C(k − 1, m − 1) 5^{k−m} / 6^k = C(k − 1, m − 1) (1/6)^m (1 − 1/6)^{k−m}.


    Assignment 3

    Due on Monday, September 14, 2015.

    1. Problem 1.12 (b) from the textbook

    2. Problem 1.24 from the textbook

    3. Problem 1.34 from the textbook

    4. Problem 1.36 from the textbook

    5. An urn contains 11 balls numbered 0, 1, . . . , 10. A ball is selected at random. Suppose the number on the selected ball is k. A second urn is filled with k red balls and 10 − k blue balls. Five balls are selected at random with replacement from the second urn.

    (a) Find the probability that the sample from the second urn consists of three red and two blue balls.

    (b) Given that the sample from the second urn consists of three red and two blue balls, find the conditional probability that the ball selected from the first urn had the number k = 6.

    Solutions

    1. Let A1, A2, . . . be pairwise disjoint. For each i, let

        B_i = ∪_{j=i}^{∞} A_j.

    Then for each n,

        ∪_{i=1}^{∞} A_i = A_1 ∪ · · · ∪ A_n ∪ B_{n+1}.

    Since A_1, A_2, . . . , A_n, B_{n+1} are pairwise disjoint,

        P(∪_{i=1}^{∞} A_i) = ∑_{i=1}^{n} P(A_i) + P(B_{n+1})

    for every n by finite additivity. But

        B_1 ⊃ B_2 ⊃ · · ·

    and ∩ B_i = ∅. So by continuity, P(B_{n+1}) ↓ 0. So

        ∑_{i=1}^{n} P(A_i) = P(∪_{i=1}^{∞} A_i) − P(B_{n+1}) → P(∪_{i=1}^{∞} A_i)

    and thus

        ∑_{i=1}^{∞} P(A_i) = P(∪_{i=1}^{∞} A_i).

    An alternative approach is to show, using finite additivity and the continuity axiom as stated, that for any sequence of events B_1, B_2, . . . satisfying B_1 ⊂ B_2 ⊂ B_3 ⊂ . . . the identity

        P(∪_{i=1}^{∞} B_i) = lim_{n→∞} P(B_n)

    is true. Then let

        B_i = ∪_{j=1}^{i} A_j.

    These B_i satisfy B_1 ⊂ B_2 ⊂ B_3 ⊂ . . . and

        ∪_{i=1}^{∞} A_i = ∪_{i=1}^{∞} B_i.

    By finite additivity

        P(B_i) = ∑_{j=1}^{i} P(A_j)

    and by the result just stated

        ∑_{j=1}^{i} P(A_j) = P(B_i) → P(∪_{i=1}^{∞} B_i) = P(∪_{i=1}^{∞} A_i).

    2. Let E_i be the event that the first head appears on toss i and let A be the event that player A wins. Then

        A = E_1 ∪ E_3 ∪ E_5 ∪ . . .

    The E_i are pairwise disjoint, and P(E_i) = (1 − p)^{i−1} p, so

        P(A) = p + (1 − p)^2 p + (1 − p)^4 p + . . .
             = p ∑_{k=1}^{∞} [(1 − p)^2]^{k−1}
             = p / (1 − (1 − p)^2)
             = p / (1 − (1 − 2p + p^2))
             = p / (2p − p^2)
             = 1 / (2 − p).

    So if p = 1/2 then P(A) = 2/3, and for all p

        P(A) = 1/(2 − p) ≥ 1/2.

    An alternative approach to deriving the formula for P(A) is to condition on the results of the first two tosses. For the first toss,

        P(A) = p P(A|E_1) + (1 − p) P(A|E_1^c)
             = p + (1 − p) P(A|E_1^c).

    For the second toss,

        P(A|E_1^c) = p P(A|E_2 ∩ E_1^c) + (1 − p) P(A|E_2^c ∩ E_1^c)
                   = (1 − p) P(A|E_2^c ∩ E_1^c).

    Since the tosses are independent, the game starts over if the first two tosses are tails, i.e.

        P(A|E_2^c ∩ E_1^c) = P(A).

    So we have an equation in P(A):

        P(A) = p + (1 − p)^2 P(A)

    and the solution is

        P(A) = 1/(2 − p) ≥ 1/2.

    3. The two litters are

        I : B, B, G          P(B|I) = 2/3
        II : B, B, B, G, G   P(B|II) = 3/5

    with P(I) = P(II) = 1/2 (a litter is chosen at random).

    a. P(B) = P(B|I)P(I) + P(B|II)P(II) = (2/3)(1/2) + (3/5)(1/2) = 1/3 + 3/10 = 19/30

    b. P(I|B) = P(I ∩ B)/P(B) = [(2/3)(1/2)] / (19/30) = (1/3)/(19/30) = 10/19

    4. The probabilities of no hits and exactly one hit are

        P(not hit) = (4/5)^10 ≈ 0.1074
        P(hit once) = 10 × (1/5) × (4/5)^9 ≈ 0.2684

    Therefore

        P(at least twice) = 1 − P(not hit) − P(hit once)
                          = 1 − (4/5)^10 − 10 × (1/5) × (4/5)^9 ≈ 0.6242

    and

        P(at least twice | at least once) = P(at least twice) / P(at least once)
            = [1 − (4/5)^10 − 10 × (1/5) × (4/5)^9] / [1 − (4/5)^10] ≈ 0.6993

    5. (a) Given that ball k is chosen from the first urn, the probability of choosing three red and two blue balls from the second urn when sampling with replacement is the binomial probability

        P(three red | ball k) = C(5, 3) (k/10)^3 (1 − k/10)^2.

    The probability of choosing ball k from the first urn and three red balls from the second is therefore

        P(three red and ball k) = P(three red | ball k) P(ball k)
                                = C(5, 3) (k/10)^3 (1 − k/10)^2 × (1/11),

    and the unconditional probability of choosing three red balls from the second urn is

        P(three red) = ∑_{k=0}^{10} P(three red and ball k)
                     = ∑_{k=0}^{10} C(5, 3) (k/10)^3 (1 − k/10)^2 × (1/11)
                     ≈ 0.1515.

    This can be computed in R as

    > sum(dbinom(3, 5, (0 : 10) / 10) / 11)

    [1] 0.1515

    (b) The conditional probability that the chosen ball from the first urn was numbered k = 6, given that three red balls were chosen from the second, is

        P(ball k = 6 | three red) = P(three red and ball k = 6) / P(three red)
                                  = [C(5, 3) (6/10)^3 (1 − 6/10)^2 × (1/11)] / P(three red)
                                  ≈ 0.03142 / 0.1515
                                  ≈ 0.2074

    This can be computed as

    > (dbinom(3, 5, 6 / 10) / 11) / (sum(dbinom(3, 5, (0 : 10) / 10)) / 11)

    [1] 0.2073807

    The probabilities for all values of k can be computed and graphed as sketched below.
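    The original plotting code and figure are truncated in the transcript; a minimal sketch of one way to compute and plot the posterior probabilities over k (variable names are illustrative) is:

        k <- 0:10
        post <- dbinom(3, 5, k / 10) / 11        # joint probabilities P(three red and ball k)
        post <- post / sum(post)                 # condition on the event "three red"
        plot(k, post, type = "h", xlab = "k", ylab = "P(ball k | three red)")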


    Assignment 4

    Due on Monday, September 21, 2015.

    1. A coin has probability p of coming up heads and 1 − p of tails, with 0 < p < 1. An experiment is conducted with the following steps:

    1. Flip the coin.

    2. Flip the coin a second time.

    3. If both flips land on heads or both land on tails return to step 1.

    4. Otherwise let the result of the experiment be the result of the last flip at step 2.

    Assume flips are independent.

    (a) The R function

    sim1
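    The definition of sim1 is cut off at the page break. A minimal sketch consistent with how the function is called in the solution (one argument p, returning 1 for a head and 0 for a tail) is an assumption, not the original code:

        sim1 <- function(p) {
            ## run the experiment once: flip pairs of coins until the two flips
            ## differ, then report the second flip (1 = heads, 0 = tails)
            repeat {
                flips <- rbinom(2, 1, p)
                if (flips[1] != flips[2]) return(flips[2])
            }
        }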


    Solutions

    1. (a) One possible approach:

    > sapply(seq(0.2, 0.9, by = 0.2),

    function(p) mean(replicate(10000, sim1(p))))

    [1] 0.4913 0.4965 0.5034 0.4991

    This suggests that the probability of heads may be 0.5 for any p.

    (b) Let A be the event that the process returns a head, and let B be the event that the process ends after the first two flips. Then

        P(A) = P(A ∩ B) + P(A|B^c)P(B^c).

    Now A ∩ B is the event that the first toss is a tail and the second toss is a head, so P(A ∩ B) = (1 − p)p. B is the event that either the first toss is a head and the second a tail, or the first is a tail and the second is a head; so P(B) = 2p(1 − p) and P(B^c) = 1 − 2p(1 − p). If the process does not end with the first two tosses then it starts over again independently, so P(A|B^c) = P(A). Therefore P(A) satisfies

        P(A) = p(1 − p) + P(A)(1 − 2p(1 − p))

    and thus

        P(A) = p(1 − p) / (2p(1 − p)) = 1/2,

    as the simulation in part (a) suggests. The requirement that p > 0 and p < 1 ensures that the denominator is positive and that the process is guaranteed to end.

    2. (i) P_X(A) = P(X ∈ A) ≥ 0 since P is a probability.

    (ii) P_X(R) = P(X ∈ R) = P(S) = 1 since P is a probability.

    (iii) For Borel sets A_1, A_2, . . . that are pairwise disjoint,

        P_X(∪ A_i) = P(X ∈ ∪ A_i)
                   = P(∪ {X ∈ A_i})
                   = ∑ P(X ∈ A_i)
                   = ∑ P_X(A_i)

    since the events B_i = {X ∈ A_i} are pairwise disjoint.

    3. All functions in (a)–(d) are continuous and therefore right continuous. We therefore only need to check that they are nondecreasing and have the right limits at ±∞.


    (a) For all x ∈ R,

        d/dx [1/2 + (1/π) tan^{-1}(x)] = (1/π) · 1/(1 + x^2) > 0,

    so F(x) is increasing, and

        lim_{x→−∞} [1/2 + (1/π) tan^{-1}(x)] = 1/2 + (1/π)(−π/2) = 0
        lim_{x→∞} [1/2 + (1/π) tan^{-1}(x)] = 1/2 + (1/π)(π/2) = 1.

    So F(x) is a CDF.

    (b) For all x ∈ R,

        d/dx (1 + e^{−x})^{−1} = e^{−x} / (1 + e^{−x})^2 > 0,

    so F(x) is increasing, and

        lim_{x→−∞} (1 + e^{−x})^{−1} = (1 + ∞)^{−1} = 0
        lim_{x→∞} (1 + e^{−x})^{−1} = (1 + 0)^{−1} = 1.

    So F(x) is a CDF.

    (c) For all x ∈ R,

        d/dx e^{−e^{−x}} = e^{−x} e^{−e^{−x}} > 0,

    so F(x) is increasing, and

        lim_{x→−∞} e^{−e^{−x}} = e^{−∞} = 0
        lim_{x→∞} e^{−e^{−x}} = e^0 = 1.

    So F(x) is a CDF.

    (d) F(x) = 0 for x ≤ 0. For x > 0,

        d/dx (1 − e^{−x}) = e^{−x} > 0,

    so F(x) is nondecreasing, and

        lim_{x→∞} (1 − e^{−x}) = 1 − 0 = 1.

    So F(x) is a CDF.


    (e) The function is continuous everywhere except possibly at the origin, and at the origin it is right continuous because of the placement of the equality sign in the definition. The function is increasing for y < 0 and for y > 0 from part (b). The function value at the origin is

        F(0) = ε + (1 − ε)/2 > (1 − ε)/2 = F(0−),

    so F is increasing everywhere. Using the limit results from part (b),

        lim_{y→−∞} F(y) = (1 − ε) × 0 = 0
        lim_{y→∞} F(y) = ε + (1 − ε) × 1 = 1.

    So F(y) is a CDF.

    4. The set of possible values is X = {0, 1, 2, 3, 4}, and the PMF is given by

        f_X(x) = C(5, x) C(25, 4 − x) / C(30, 4).

    A table of the probabilities is

        x    0        1        2        3        4
        p    0.4616   0.4196   0.1095   0.0091   0.0002

    and a plot of the CDF is given in the figure "CDF for Number of Defectives" (horizontal axis x from −1 to 5, vertical axis from 0.0 to 1.0).

    The R code used to create the table and plot the CDF is available at

    http://www.stat.uiowa.edu/~luke/classes/193/1-51.R
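    The linked script is not reproduced here; as a quick check (a sketch, not the original code), the table can be recomputed from the hypergeometric PMF:

        x <- 0:4
        p <- dhyper(x, m = 5, n = 25, k = 4)     # 5 defectives, 25 good items, sample of 4
        round(p, 4)                              # 0.4616 0.4196 0.1095 0.0091 0.0002
        plot(stepfun(x, c(0, cumsum(p))), main = "CDF for Number of Defectives",
             xlab = "x", ylab = "F(x)")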


    5. The function

        g(x) = { f(x)/(1 − F(x_0))   x ≥ x_0
               { 0                   x < x_0

    represents the conditional density of X given that X ≥ x_0. Since f(x) ≥ 0 we also have g(x) ≥ 0. Furthermore,

        ∫_{−∞}^{∞} g(x) dx = ∫_{x_0}^{∞} g(x) dx = [∫_{x_0}^{∞} f(x) dx] / (1 − F(x_0))
                           = P(X > x_0) / (1 − F(x_0)) = (1 − F(x_0)) / (1 − F(x_0)) = 1.

    So g(x) is a PDF.


    Assignment 5

    Due on Monday, September 28, 2015.

    1. Let X be a non-negative, integer-valued random variable with probability mass function p_n = P(X = n) for n = 0, 1, . . . . The probability generating function of X is defined as

        G(t) = ∑_{n=0}^{∞} t^n p_n

    for |t| ≤ 1.

    (a) Show that p_n can be recovered from the value of the n-th derivative of G(t) at t = 0. (The zero-th derivative of G(t) is G(t) itself.)

    (b) Suppose X is the number of heads in n independent flips of a biased coin with probability of heads equal to p. X has a binomial distribution. Find the probability generating function of X.

    (c) Suppose Y is the number of independent tosses of a biased coin with probability p of heads needed until the first head is obtained. Y has a geometric distribution. Find the probability generating function of Y.

    2. Problem 2.2 from the textbook

    3. Problem 2.6 from the textbook

    4. Problem 2.8 from the textbook

    Solutions

    1. (a) The derivatives are

        G′(t) = ∑_{n=1}^{∞} n t^{n−1} p_n
        G″(t) = ∑_{n=2}^{∞} n(n − 1) t^{n−2} p_n
        ...
        G^{(k)}(t) = ∑_{n=k}^{∞} [n!/(n − k)!] t^{n−k} p_n.

    At t = 0 all terms except the first are zero, so

        G(0) = p_0
        G′(0) = p_1
        G″(0) = 2 p_2
        ...
        G^{(k)}(0) = k! p_k.

    So p_k = G^{(k)}(0)/k!. This is the reason G is called the probability generating function.

    (b) For the binomial distribution

        G(t) = ∑_{k=0}^{n} t^k C(n, k) p^k (1 − p)^{n−k} = ∑_{k=0}^{n} C(n, k) (tp)^k (1 − p)^{n−k} = (tp + 1 − p)^n

    by the binomial theorem.

    (c) For the geometric distribution

        G(t) = ∑_{n=1}^{∞} t^n p (1 − p)^{n−1} = tp ∑_{n=1}^{∞} [t(1 − p)]^{n−1} = tp / (1 − t(1 − p)).

    2. The change of variables formula for smooth monotone transformations can be applied in all three cases.

    (a) Y = [0, 1], g^{−1}(y) = √y, and for y ∈ [0, 1]

        f_Y(y) = 1/(2√y).

    (b) Y = (0, ∞), g^{−1}(y) = e^{−y}, and for y > 0

        f_Y(y) = [(n + m + 1)!/(n! m!)] e^{−yn} (1 − e^{−y})^m |−e^{−y}|
               = [(n + m + 1)!/(n! m!)] e^{−y(n+1)} (1 − e^{−y})^m.

    (c) Y = (1, ∞), g^{−1}(y) = log y, and for y > 1

        f_Y(y) = (1/σ^2) (log y / y) e^{−((log y)/σ)^2/2}.

    3. All three fit into the framework of Theorem 2.1.8.

    (a) Let A_1 = (−∞, 0) and A_2 = (0, ∞). On A_1, g_1(x) = |x|^3 = −x^3, and on A_2, g_2(x) = |x|^3 = x^3. The range is Y = (0, ∞). So for y > 0

        f_Y(y) = (1/2) e^{−y^{1/3}} (1/3)|−y^{−2/3}| + (1/2) e^{−y^{1/3}} (1/3) y^{−2/3} = (1/3) y^{−2/3} e^{−y^{1/3}}.


    (b) Let A_1 = (−1, 0) and A_2 = (0, 1). The range of Y is Y = (0, 1). Then g_1(x) = 1 − x^2, g_2(x) = 1 − x^2, g_1^{−1}(y) = −√(1 − y), and g_2^{−1}(y) = √(1 − y). So for y ∈ (0, 1)

        f_Y(y) = (3/8)(1 − √(1 − y))^2 (1/2) 1/√(1 − y) + (3/8)(1 + √(1 − y))^2 (1/2) 1/√(1 − y)
               = (3/8)(1 − y)^{−1/2} + (3/8)(1 − y)^{1/2}.

    (c) Y, A_1, A_2, and g_1 are as in the previous part; g_2(x) = 1 − x and g_2^{−1}(y) = 1 − y. So for y ∈ (0, 1)

        f_Y(y) = (3/8)(1 − √(1 − y))^2 (1/2) 1/√(1 − y) + (3/8)(1 + 1 − y)^2
               = (3/16)(1 − √(1 − y))^2 / √(1 − y) + (3/8)(2 − y)^2.

    4. F^{−1}(y) = inf{x : F(x) ≥ y}.

    (a)
        F(x) = { 0            x < 0
               { 1 − e^{−x}   x ≥ 0

        F^{−1}(y) = { −∞             y = 0
                    { −log(1 − y)    0 < y ≤ 1

    (b)
        F(x) = { (1/2) e^x           x < 0
               { 1/2                 0 ≤ x < 1
               { 1 − (1/2) e^{1−x}   x ≥ 1

        F^{−1}(y) = { log(2y)              0 ≤ y ≤ 1/2
                    { 1 − log(2(1 − y))    1/2 < y ≤ 1

    (c)
        F(x) = { (1/4) e^x          x < 0
               { 1 − (1/4) e^{−x}   x ≥ 0

        F^{−1}(y) = { log(4y)            0 ≤ y < 1/4
                    { 0                  1/4 ≤ y < 3/4
                    { −log(4(1 − y))     3/4 ≤ y ≤ 1


    Assignment 6

    Due on Monday, October 5, 2015.

    1. Problem 2.11 from the textbook

    2. Problem 2.13 from the textbook

    3. Let X be a non-negative random variable with CDF F. Show that

        E[X] = ∫_0^∞ (1 − F(t)) dt.

    Hint: Argue that you can write X = ∫_0^∞ 1{t < X} dt.


    Using the PDF of Y = X^2:

        E[X^2] = E[Y] = ∫_0^∞ y (1/√(2π)) (1/√y) e^{−y/2} dy
               = (1/√(2π)) ∫_0^∞ √y e^{−y/2} dy
               = (2^{3/2}/√(2π)) ∫_0^∞ z^{3/2−1} e^{−z} dz
               = (2^{3/2}/√(2π)) Γ(3/2) = (2/√π) Γ(3/2)
               = (2/√π) Γ(1/2) (1/2) = Γ(1/2)/√π = 1.

    (b) The density of Y = |X| is

        f_Y(y) = f_X(−y) + f_X(y) = 2 e^{−y^2/2} (1/√(2π)) = √(2/π) e^{−y^2/2}

    for y ≥ 0, and f_Y(y) = 0 for y < 0. The mean is

        E[Y] = E[|X|] = 2 ∫_0^∞ y e^{−y^2/2} (1/√(2π)) dy
             = 2 [−(1/√(2π)) e^{−y^2/2}]_0^∞
             = 2/√(2π) = √(2/π).

    The second noncentral moment is

        E[Y^2] = E[X^2] = 1,

    so the variance is

        Var(Y) = 1 − 2/π.

    2. The possible values of X are X = {1, 2, 3, . . . }. The probability mass function of X is

        f_X(x) = P(X = x)
               = P(first x are H, followed by a T) + P(first x are T, followed by an H)
               = p^x (1 − p) + (1 − p)^x p

    for x ∈ X. So the mean is

        E[X] = ∑_{x=1}^{∞} x (p^x (1 − p) + (1 − p)^x p)
             = [∑_{x=1}^{∞} x p^x (1 − p)] + [∑_{x=1}^{∞} x (1 − p)^x p]
             = p [∑_{x=1}^{∞} x p^{x−1} (1 − p)] + (1 − p) [∑_{x=1}^{∞} x (1 − p)^{x−1} p].

    The sums in square brackets are the means of geometric random variables with success probabilities 1 − p and p, respectively, so

        E[X] = p/(1 − p) + (1 − p)/p = (p^2 + (1 − p)^2) / (p(1 − p)).

    3. For a non-negative random variable X we can write

        X = ∫_0^X 1 dt = ∫_0^∞ 1{t < X} dt.

    Taking expectations and interchanging expectation and integration (the integrand is non-negative) gives

        E[X] = ∫_0^∞ P(X > t) dt = ∫_0^∞ (1 − F(t)) dt.

    4. Using the result of Problem 3,

        E[X] = ∫_0^∞ P(X > t) dt = a ∫_0^∞ e^{−λt} dt + (1 − a) ∫_0^∞ e^{−µt} dt
             = (a/λ) ∫_0^∞ λ e^{−λt} dt + ((1 − a)/µ) ∫_0^∞ µ e^{−µt} dt
             = a/λ + (1 − a)/µ.

    5. The n-th moment of this density is

        E[X^n] = ∫_1^∞ x^n (α/x^{α+1}) dx = ∫_1^∞ α x^{n−α−1} dx
               = { [α/(n − α)] x^{n−α} |_1^∞   for α ≠ n
                 { [α log x] |_1^∞             for α = n
               = { ∞             for α ≤ n
                 { α/(α − n)     for α > n.

    So

        E[X] = { ∞             for α ≤ 1
               { α/(α − 1)     for α > 1

    and

        E[X^2] = { ∞             for α ≤ 2
                 { α/(α − 2)     for α > 2.

    The variance of X is therefore infinite if α ≤ 2, and is

        Var(X) = E[X^2] − E[X]^2 = α/(α − 2) − (α/(α − 1))^2 = α / ((α − 2)(α − 1)^2)

    for α > 2.


    Assignment 7

    Due on Monday, October 12, 2015.

    1. Problem 2.17 from the textbook

    2. Problem 2.24 from the textbook

    3. Problem 2.32 from the textbook

    4. Problem 2.33 from the textbook

    5. Problem 2.38 from the textbook

    6. Problem 2.40 from the textbook

    Solutions

    1. (a) Over the range x ∈ [0, 1] the CDF

        F(x) = ∫_0^x 3y^2 dy = x^3

    is strictly increasing, so there is a unique median m that solves

        F(m) = m^3 = 1/2.

    The solution is m = (1/2)^{1/3} ≈ 0.7937.

    (b) The density (a Cauchy density) is symmetric around the origin, so

        P(X ≤ 0) = P(X ≥ 0) = 1/2

    and therefore m = 0 is a median. Since the density is positive the CDF is strictly increasing and the median is unique.

    2. (a) f(x) = a x^{a−1}, 0 < x < 1, a > 0.

        E[X] = ∫_0^1 a x^a dx = a/(a + 1)
        E[X^2] = ∫_0^1 a x^{a+1} dx = a/(a + 2)
        Var(X) = a/(a + 2) − (a/(a + 1))^2 = a / ((a + 2)(a + 1)^2)


    (b) f(x) = 1/n, x = 1, 2, . . . , n.

        E[X] = ∑_{i=1}^{n} i/n = (1/n) n(n + 1)/2 = (n + 1)/2
        E[X^2] = ∑_{i=1}^{n} i^2/n = (1/n) n(n + 1)(2n + 1)/6 = (n + 1)(2n + 1)/6
        Var(X) = (n + 1)(2n + 1)/6 − ((n + 1)/2)^2
               = (n + 1)[(2n + 1)/6 − (n + 1)/4]
               = (n + 1)(4n + 2 − 3n − 3)/12
               = (n + 1)(n − 1)/12 = (n^2 − 1)/12

    (c) f(x) = (3/2)(x − 1)^2, 0 < x < 2.

        E[X] = ∫_0^2 x (3/2)(x − 1)^2 dx = 1
        Var(X) = E[(X − 1)^2] = ∫_0^2 (3/2)(x − 1)^4 dx = (3/2) (x − 1)^5/5 |_0^2 = (3/2)(2/5) = 3/5

    3. The first derivative of S(t) is

        S′(t) = d/dt log M_X(t) = M′_X(t)/M_X(t).

    So

        S′(t)|_{t=0} = M′_X(0)/M_X(0) = E[X]/1 = E[X].

    By the quotient rule the second derivative of S(t) is

        S″(t) = [M_X(t) M″_X(t) − M′_X(t)^2] / M_X(t)^2

    and therefore

        S″(t)|_{t=0} = [M_X(0) M″_X(0) − M′_X(0)^2] / M_X(0)^2 = E[X^2] − E[X]^2 = Var(X).

    4. (a) Done in class.


    (b) Variation on a geometric distribution.

        M(t) = ∑_{x=0}^{∞} e^{tx} p (1 − p)^x = { p / (1 − e^t(1 − p))   t < −log(1 − p)
                                                { ∞                     otherwise

        M′(t) = p(1 − p) e^t / (1 − e^t(1 − p))^2

        M″(t) = [(1 − e^t(1 − p))^2 p(1 − p) e^t + 2(1 − e^t(1 − p)) e^t p(1 − p)^2 e^t] / (1 − e^t(1 − p))^4

        E[X] = (1 − p)/p

        E[X^2] = [p^3(1 − p) + 2p^2(1 − p)^2] / p^4 = [p(1 − p) + 2(1 − p)^2] / p^2

        Var(X) = [p(1 − p) + (1 − p)^2] / p^2 = (1 − p)/p^2

    (c)

        M(t) = ∫ e^{tx} (1/(√(2π)σ)) e^{−(x−µ)^2/(2σ^2)} dx
             = ∫ (1/(√(2π)σ)) exp{−x^2/(2σ^2) + xµ/σ^2 − µ^2/(2σ^2) + tx} dx
             = ∫ (1/(√(2π)σ)) exp{−x^2/(2σ^2) + (x/σ^2)(µ + σ^2 t) − µ^2/(2σ^2)} dx
             = exp{−µ^2/(2σ^2) + (µ^2 + 2µσ^2 t + σ^4 t^2)/(2σ^2)}
             = exp{µt + σ^2 t^2/2}

        K(t) = log M(t) = µt + σ^2 t^2/2
        K′(t) = µ + σ^2 t
        K″(t) = σ^2
        E[X] = K′(0) = µ
        Var(X) = K″(0) = σ^2

    5. (a) From 2.30(d),

        M_X(t) = { (p / (1 − e^t(1 − p)))^r   t < −log(1 − p)
                 { ∞                          otherwise

    (b)

        M_Y(t) = E[e^{tY}] = E[e^{2ptX}] = M_X(2pt)
               = { (p / (1 − e^{2pt}(1 − p)))^r   2pt < −log(1 − p)
                 { ∞                              otherwise

    Now by L'Hospital's rule

        lim_{p→0} −log(1 − p)/(2p) = lim_{p→0} [1/(1 − p)] / lim_{p→0} 2 = 1/2

    and

        lim_{p→0} p / (1 − e^{2pt}(1 − p)) = lim_{p→0} 1 / lim_{p→0} (−2t e^{2pt}(1 − p) + e^{2pt}) = 1/(1 − 2t).

    So for t < 1/2,

        M_Y(t) → (1/(1 − 2t))^r = (1/(1 − 2t))^{2r/2}.

    This is the MGF of a χ^2_{2r} distribution.

    6. The result holds only for x = 0, . . . , n − 1. For x = 0 the left hand side is (1 − p)^n and the right hand side is

        n ∫_0^{1−p} t^{n−1} dt = t^n |_{t=0}^{t=1−p} = (1 − p)^n.

    So the claim holds for x = 0. Suppose the claim is true for y = 0, . . . , x − 1 and x < n. Integration by parts produces

        (n − x) C(n, x) ∫_0^{1−p} t^{n−x−1} (1 − t)^x dt
            = C(n, x) ∫_0^{1−p} ((n − x) t^{n−x−1}) (1 − t)^x dt
            = C(n, x) [t^{n−x} (1 − t)^x |_{t=0}^{t=1−p} + ∫_0^{1−p} x t^{n−x} (1 − t)^{x−1} dt]
            = C(n, x) p^x (1 − p)^{n−x} + x C(n, x) ∫_0^{1−p} t^{n−(x−1)−1} (1 − t)^{x−1} dt
            = C(n, x) p^x (1 − p)^{n−x} + (n − (x − 1)) C(n, x − 1) ∫_0^{1−p} t^{n−(x−1)−1} (1 − t)^{x−1} dt.

    By the induction hypothesis the second term is

        (n − (x − 1)) C(n, x − 1) ∫_0^{1−p} t^{n−(x−1)−1} (1 − t)^{x−1} dt = ∑_{k=0}^{x−1} C(n, k) p^k (1 − p)^{n−k}.

    Thus the result holds for all x = 0, . . . , n − 1.

    Several other approaches are possible.

    • Differentiating both sides produces a telescoping series on the left hand side.

    • Both sides are polynomials of degree n in p. The polynomials are equal if and only if the coefficients of the powers of p are equal, and these can be calculated by differentiating multiple times and evaluating the derivatives at zero.

    • The simplest approach uses a property of the distribution of order statistics that we will learn about in Chapter 5: If U_1, . . . , U_n are independent standard uniforms and N_p is the number of these uniforms that are less than or equal to p, then N_p is Binomial(n, p), and

        P(N_p ≤ x) = P(N_p < x + 1) = P(U_{(x+1)} > p)

    where U_{(k)} is the k-th order statistic of the sample. Now

        P(U_{(x+1)} > p) = P(1 − U_{(x+1)} < 1 − p) = P(V_{(n−x)} < 1 − p)

    where V_{(k)} is the k-th order statistic of the sample 1 − U_1, . . . , 1 − U_n. The result now follows from the fact that the k-th uniform order statistic V_{(k)} for a sample of size n has a Beta(k, n − k + 1) distribution.


    Assignment 8

    Due on Monday, October 19, 2015.

    1. Problem 3.7 from the textbook

    2. Problem 3.12 from the textbook

    3. Problem 3.25 from the textbook

    4. Problem 3.26 from the textbook

    5. Problem 3.28 from the textbook

    6. Problem 3.30 from the textbook

    Solutions

    1. P (X ≥ 2) = 0.99 means

    P (X = 0) + P (X = 1) = e−λ + λe−λ = 0.01

    The solution is around 6.6. This can be determined graphically or numerically, for example using the R function uniroot.
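    A minimal sketch of the numerical solution with uniroot (not the original code):

        ## solve exp(-lambda) * (1 + lambda) = 0.01 for lambda
        uniroot(function(lambda) exp(-lambda) * (1 + lambda) - 0.01,
                interval = c(1, 20))$root        # approximately 6.64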

    2. X is Binomial(n, p) and Y is negative binomial(r, p) (zero based, so Y counts the number of failures).

        F_X(r − 1) = P(r − 1 or fewer successes in n trials)
                   = 1 − P(r or more successes in n trials)
                   = 1 − P(r-th success on or before n-th trial)
                   = 1 − P(number of failures before r-th success ≤ n − r)
                   = 1 − F_Y(n − r)

    3.

        h_T(t) = lim_{δ↓0} P(t ≤ T < t + δ | T > t) / δ
               = lim_{δ↓0} (1/δ) [F(t + δ) − F(t)] / [1 − F(t)]
               = F′(t)/(1 − F(t)) = f(t)/(1 − F(t))

    and

        −(d/dt) log(1 − F(t)) = f(t)/(1 − F(t)).

    The quantity

        H_T(t) = −log(1 − F_T(t)) = ∫_0^t h_T(u) du

    is called the cumulative hazard function, and

        F_T(t) = 1 − exp{−H_T(t)}.

    4. (a) For an Exponential(β) distribution

        f_T(t) = (1/β) e^{−t/β}
        h_T(t) = [(1/β) e^{−t/β}] / e^{−t/β} = 1/β.

    (b) For a Weibull distribution,

        f_T(t) = (γ/β) t^{γ−1} e^{−t^γ/β}
        F_T(t) = P(X^{1/γ} ≤ t) = P(X ≤ t^γ) = 1 − e^{−t^γ/β}
        h_T(t) = (γ/β) t^{γ−1}.

    (c) For the logistic distribution,

        F_T(t) = 1/(1 + e^{−(t−µ)/β})
        f_T(t) = [1/(1 + e^{−(t−µ)/β})^2] (1/β) e^{−(t−µ)/β} = (1/β) F_T(t)(1 − F_T(t)).

    So h_T(t) = (1/β) F_T(t).

    5. (a) The normal family can be written as

        f(x|µ, σ) = (1/(√(2π)σ)) exp{−(x − µ)^2/(2σ^2)}
                  = c(θ) exp{w_1(θ) t_1(x) + w_2(θ) t_2(x)} h(x)

    with

        c(θ) = (1/(√(2π)σ)) e^{−µ^2/(2σ^2)},  w_1(θ) = 1/(2σ^2),  t_1(x) = −x^2,
        w_2(θ) = µ/σ^2,  t_2(x) = x,  h(x) = 1.

    (b) The Gamma family with both parameters unknown can be written as

        f(x|α, β) = [1/(Γ(α)β^α)] x^{α−1} e^{−x/β} 1_{(0,∞)}(x)
                  = c(θ) exp{w_1(θ) t_1(x) + w_2(θ) t_2(x)} h(x)

    with

        c(θ) = 1/(Γ(α)β^α),  w_1(θ) = α − 1,  t_1(x) = log x,
        w_2(θ) = 1/β,  t_2(x) = −x,  h(x) = 1_{(0,∞)}(x).

    If α is known then the first term in the exponent becomes part of h(x); if β is known the second term in the exponent becomes part of h(x).

    (c) The Beta family with both parameters unknown can be written as

        f(x|α, β) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1} 1_{[0,1]}(x)
                  = c(θ) exp{w_1(θ) t_1(x) + w_2(θ) t_2(x)} h(x)

    with

        c(θ) = Γ(α + β)/(Γ(α)Γ(β)),  w_1(θ) = α − 1,  t_1(x) = log x,
        w_2(θ) = β − 1,  t_2(x) = log(1 − x),  h(x) = 1_{[0,1]}(x).

    Again, if either α or β is known the corresponding term in the exponent becomes part of h(x).

    (d) The Poisson family can be written as

        f(x|λ) = (λ^x/x!) e^{−λ} = h(x) c(λ) exp{w(λ) t(x)}

    with h(x) = 1/x!, c(λ) = e^{−λ}, t(x) = x, and w(λ) = log λ.

    (e) The negative binomial family with r known can be written as

        f(x|r, p) = C(r + x − 1, x) p^r (1 − p)^x = h(x) c(p) exp{w(p) t(x)}

    with h(x) = C(r + x − 1, x), c(p) = p^r, t(x) = x, and w(p) = log(1 − p).

    6. (a) For the binomial, w(p) = log(p/(1 − p)), c(p) = (1 − p)^n, and t(x) = x. The variance Var(t(X)) = Var(X) satisfies

        (w′(p))^2 Var(X) = −(d^2/dp^2) log c(p) − w″(p) E[X].

    Now

        w′(p) = 1/p + 1/(1 − p) = 1/(p(1 − p))
        w″(p) = −1/p^2 + 1/(1 − p)^2
        (d^2/dp^2) log c(p) = −n/(1 − p)^2.

    So

        [1/(p(1 − p))]^2 Var(X) = n/(1 − p)^2 − (−1/p^2 + 1/(1 − p)^2) np
                                = n (1/(1 − p) + 1/p)
                                = n/(p(1 − p))

    and thus Var(X) = np(1 − p).

    (b) For the Beta distribution t_1(x) = log x and t_2(x) = log(1 − x). The function f(x) = x cannot be expressed as a linear combination of t_1(x) and t_2(x), so the identities in Theorem 3.4.2 cannot be used to find the mean and variance of X.

    If X ∼ Poisson(λ) then t(x) = x, w(λ) = log λ, and c(λ) = e^{−λ}. So

        w′(λ) = 1/λ,  w″(λ) = −1/λ^2,
        (∂/∂λ) log c(λ) = −1,  (∂^2/∂λ^2) log c(λ) = 0.

    So Theorem 3.4.2 produces the equations

        E[X/λ] = 1
        Var(X/λ) = E[X/λ^2]

    with solutions E[X] = λ and Var(X) = λ.


    Assignment 9

    Due on Monday, October 26, 2015.

    1. Problem 4.1 from the textbook

    2. Problem 4.4 from the textbook

    3. Problem 4.5 from the textbook

    4. Problem 4.10 from the textbook

    5. Let X1, X2, and V be independent random variables with

        E[X1] = µ,  E[X2] = µ,  E[V] = 0,
        Var(X1) = σ^2,  Var(X2) = σ^2,  Var(V) = τ^2.

    Let Y1 = X1 + V and Y2 = X2 + V.

    (a) Find the means and variances of Y1 and Y2.

    (b) Find Cov(Y1, Y2) and Cov(Y1, V).

    6. In the generalized birthday problem discussed in Week 2 an urn contains m balls and a sample of size n is drawn from the urn with replacement. Let X be the number of balls that do not appear in the sample. Find the mean and variance of X. [Hint: Express X as a sum of suitable Bernoulli random variables.]

    Solutions

    1. The joint density is

        f(x, y) = { 1/4   −1 ≤ x, y ≤ 1
                  { 0     otherwise

    (a) Since the unit circle is contained in the supporting square,

        P(X^2 + Y^2 < 1) = (area of circle)/(total area of square) = π/4.

    (b) The line y = 2x splits the support into two parts of equal area, so P(2X − Y > 0) = 1/2. Alternatively,

        P(Y < 2X) = ∫_{−1}^{1} ∫_{y/2}^{1} (1/4) dx dy = ∫_{−1}^{1} (1/4)(1 − y/2) dy
                  = [y/4 − y^2/16]_{−1}^{1} = 3/16 + 5/16 = 1/2.

    (c) All points in the interior of the square satisfy |x + y| < 2, so

        P(|X + Y| < 2) = 1.

    2. (a) The integral of the density is

        1 = ∫_0^1 ∫_0^2 C(x + 2y) dx dy = ∫_0^1 C(2 + 4y) dy = C(2 + 2) = 4C.

    So the normalizing constant is C = 1/4.

    (b) The marginal density of X is

        f_X(x) = [∫_0^1 (1/4)(x + 2y) dy] 1_{[0,2]}(x) = (1/4)(x + 1) 1_{[0,2]}(x).

    (c) The joint CDF is

        F(x, y) = ∫_0^y ∫_0^x (1/4)(u + 2v) du dv
                = ∫_0^y (1/4)(x^2/2 + 2vx) dv
                = (1/4)(x^2 y/2 + y^2 x)
                = x^2 y/8 + y^2 x/4

    for 0 < x < 2 and 0 < y < 1.

    (d) Since Z depends only on X this is a one-dimensional transformation. The transformation is smooth and monotone, so

        Z = 9/(X + 1)^2
        X = 3/√Z − 1
        |dx/dz| = (3/2) z^{−3/2}
        f_Z(z) = (1/4)(3/√z)(3/2) z^{−3/2} = 9/(8 z^2)

    for z ∈ [1, 9].

    3. (a)

        P(X > √Y) = P(Y < X^2) = ∫_0^1 ∫_0^{x^2} (x + y) dy dx
                  = ∫_0^1 (x^3 + x^4/2) dx = 1/4 + 1/10 = 7/20 = 0.35

    (b)

        P(X^2 < Y < X) = ∫_0^1 ∫_{x^2}^{x} 2x dy dx
                       = ∫_0^1 2x(x − x^2) dx = ∫_0^1 (2x^2 − 2x^3) dx
                       = 2/3 − 2/4 = 1/6

    4. (a) The marginal probabilities f_X(2) and f_Y(3) are non-zero and therefore f_X(2) f_Y(3) is non-zero, but the joint probability f_{X,Y}(2, 3) = 0. So the joint PMF is not the product of the marginals and thus X, Y are dependent.

    (b) The marginals are

        x        1     2     3
        f_X(x)   1/4   1/2   1/4

    and

        y        2     3     4
        f_Y(y)   1/3   1/3   1/3

    The joint probability table

                      X
                  1      2      3
            2    1/12   1/6    1/12
        Y   3    1/12   1/6    1/12
            4    1/12   1/6    1/12

    obtained as g_{X,Y}(x, y) = f_X(x) f_Y(y) has the same marginals, and under g the variables X, Y are independent.

    5. (a) The means are

        E[Y_i] = E[X_i + V] = E[X_i] + E[V] = µ.

    Since the X_i are independent of V the variances are

        Var(Y_i) = Var(X_i + V) = Var(X_i) + Var(V) = σ^2 + τ^2.

    (b) The covariance of Y1 and Y2 is

        Cov(Y1, Y2) = Cov(X1 + V, X2 + V)
                    = Cov(X1, X2 + V) + Cov(V, X2 + V)
                    = Cov(X1, X2) + Cov(X1, V) + Cov(V, X2) + Cov(V, V)
                    = 0 + 0 + 0 + Var(V) = τ^2.

    The covariance of Y_i and V is

        Cov(Y_i, V) = Cov(X_i + V, V)
                    = Cov(X_i, V) + Cov(V, V)
                    = 0 + Var(V) = τ^2.

    6. Let Y_i = 1 if ball i is not in the sample and Y_i = 0 otherwise. Then X = Y_1 + · · · + Y_m. The Y_i are Bernoulli random variables with success probability

        p_m = ((m − 1)/m)^n.

    So the mean of X is

        E[X] = m p_m = m ((m − 1)/m)^n.

    The Y_i are correlated, so to calculate Var(X) we need their covariances. Now for i ≠ j,

        E[Y_i Y_j] = E[Y_1 Y_2] = P(balls 1 and 2 are not in the sample) = ((m − 2)/m)^n.

    So

        Cov(Y_i, Y_j) = ((m − 2)/m)^n − p_m^2 = ((m − 2)/m)^n − ((m − 1)/m)^{2n}.

    The variance of X is therefore

        Var(X) = ∑ Var(Y_i) + ∑∑_{i≠j} Cov(Y_i, Y_j)
               = m p_m (1 − p_m) + m(m − 1) [((m − 2)/m)^n − p_m^2].
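    As a quick sanity check (a sketch, not part of the original solutions; the values of m, n, and N below are arbitrary), the formulas can be compared with simulated values:

        m <- 365; n <- 50; N <- 10000
        x <- replicate(N, sum(tabulate(sample(m, n, replace = TRUE), nbins = m) == 0))
        pm <- ((m - 1) / m)^n
        c(mean(x), m * pm)                                      # simulated vs exact mean
        c(var(x), m * pm * (1 - pm) +
                  m * (m - 1) * (((m - 2) / m)^n - pm^2))       # simulated vs exact variance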


    Assignment 10

    Due on Monday, November 2, 2015.

    1. Problem 4.15 from the textbook

    2. Problem 4.16 (a) and (c) from the textbook (these geometrics count failures!)

    3. Problem 4.17 from the textbook

    4. Problem 4.21 from the textbook

    5. Problem 4.27 from the textbook

    6. Problem 4.36 from the textbook

    Solutions

    1. X, Y are independent,

        X ∼ Poisson(θ),  M_X(t) = exp{θ(e^t − 1)}
        Y ∼ Poisson(λ),  M_Y(t) = exp{λ(e^t − 1)},

    so

        M_{X+Y}(t) = exp{(θ + λ)(e^t − 1)}

    and thus X + Y is Poisson(θ + λ). Now

        f_{X|X+Y}(x|z) = f_X(x) f_Y(z − x) / f_{X+Y}(z)
                       = [(θ^x/x!) e^{−θ} (λ^{z−x}/(z − x)!) e^{−λ}] / [((θ + λ)^z/z!) e^{−(θ+λ)}]
                       = [z!/(x!(z − x)!)] (θ/(θ + λ))^x (1 − θ/(θ + λ))^{z−x}

    for 0 ≤ x ≤ z. So X | X + Y = z is Binomial(z, θ/(θ + λ)).

    2. X, Y are geometric, starting at zero (counting failures),

        f(x) = p(1 − p)^x

    for x = 0, 1, 2, . . ., and are independent.

    (a)

        U = min(X, Y)   range: 0, 1, . . .
        V = X − Y       range: all integers

    The joint PMF of U, V is

        f_{U,V}(u, v) = P(min(X, Y) = u, X − Y = v)
                      = { P(Y = u, X = u + v)   if v ≥ 0
                        { P(X = u, Y = u − v)   if v < 0
                      = { p^2 (1 − p)^u (1 − p)^{u+v}   v ≥ 0
                        { p^2 (1 − p)^u (1 − p)^{u−v}   v < 0
                      = [p^2 (1 − p)^{2u}] [(1 − p)^{|v|}].

    So U, V are independent.

    (b) Z takes on all possible rational values in [0, 1]. Let q be a rational in [0, 1] and write q = m/n where m and n have no common factors. Then for m > 0

        P(Z = q) = P(Z = m/n)
                 = P(X = mk, Y = (n − m)k for some k ≥ 1)
                 = ∑_{k=1}^{∞} p^2 (1 − p)^{mk} (1 − p)^{(n−m)k} = ∑_{k=1}^{∞} p^2 (1 − p)^{nk}
                 = p^2 (1 − p)^n / (1 − (1 − p)^n).

    For m = 0, P(Z = 0) = P(X = 0) = p.

    (c) Let Z = X + Y. For x = 0, 1, . . . and z = x, x + 1, . . . the joint PMF of X and Z is

        f(x, z) = P(X = x, Y = z − x) = p(1 − p)^x p(1 − p)^{z−x} = p^2 (1 − p)^z.

    For all other (x, z) pairs f(x, z) = 0.

    3. (a) For y = 1, 2, . . .,

        f_Y(y) = P(y − 1 < X < y) = e^{−(y−1)} − e^{−y} = e^{−(y−1)}(1 − e^{−1}).

    So Y is geometric(p = 1 − e^{−1}).

    (b)

        P(X − 4 > x | Y ≥ 5) = P(X − 4 > x | X ≥ 4)
                             = { 1                    x ≤ 0
                               { e^{−(x+4)}/e^{−4}    x > 0
                             = { 1        x ≤ 0
                               { e^{−x}   x > 0

    This is an exponential distribution. For any t, X − t | X ≥ t is Exponential(1).


    4. R^2 ∼ χ^2_2 = Gamma(1, 2) = Exponential(2) and θ ∼ Uniform(0, 2π).

        X = √(R^2) cos θ
        Y = √(R^2) sin θ
        A = (0, ∞) × (0, 2π)
        B = ℝ^2

    The joint density of R^2, θ is

        f_{R^2,θ}(a, b) = (1/2) e^{−a/2} (1/(2π))

    for (a, b) ∈ A. The inverse transformation is

        R^2 = X^2 + Y^2
        θ = { cos^{−1}(X/√(X^2 + Y^2))         Y > 0
            { 2π − cos^{−1}(X/√(X^2 + Y^2))    Y < 0

    This is messy to differentiate; instead, compute

        J^{−1} = det( (1/(2√(R^2))) cos θ    −√(R^2) sin θ
                      (1/(2√(R^2))) sin θ     √(R^2) cos θ )
               = (1/2) cos^2 θ + (1/2) sin^2 θ = 1/2.

    So J = 2, and

        f_{X,Y}(x, y) = (1/(2π)) e^{−(x^2+y^2)/2}.

    Thus X, Y are independent standard normal random variables.
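    A small illustration of this construction (a sketch, not part of the original solutions) generates standard normal pairs from an Exponential(2) radius squared and a uniform angle:

        n <- 10000
        r2 <- rexp(n, rate = 1 / 2)              # Exponential with mean 2, i.e. chi-squared(2)
        theta <- runif(n, 0, 2 * pi)
        x <- sqrt(r2) * cos(theta)
        y <- sqrt(r2) * sin(theta)
        c(mean(x), sd(x), cor(x, y))             # approximately 0, 1, 0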

    5. Approach from class: Let Z1, Z2 be independent standard normals and let

        X = µ + σZ1
        Y = γ + σZ2.

    Then

        U = X + Y = µ + γ + σZ1 + σZ2
        V = X − Y = µ − γ + σZ1 − σZ2.

    So

        C = B Bᵀ = ( 2σ^2   0
                     0      2σ^2 )

    and

        f_{U,V}(u, v) = (1/(2π · 2σ^2)) exp{−(u − (µ + γ))^2/(4σ^2) − (v − (µ − γ))^2/(4σ^2)} = f_U(u) f_V(v)

    where U ∼ N(µ + γ, 2σ^2), V ∼ N(µ − γ, 2σ^2), and U, V are independent.

    6. (a) See (b).

    (b) Suppose the P_i are independent random variables with values in the unit interval and common mean µ. Since the P_i are independent and each X_i only depends on P_i, the X_i are marginally independent as well. Each X_i takes on only the values 0 and 1, so the marginal distributions of the X_i are Bernoulli with success probability

        P(X_i = 1) = E[P(X_i = 1 | P_i)] = E[P_i] = µ.

    So the X_i are independent Bernoulli(µ) random variables and therefore Y = ∑_{i=1}^{n} X_i is Binomial(n, µ). If the P_i have a Beta(α, β) distribution then µ = α/(α + β) and therefore

        E[Y] = nµ = nα/(α + β)
        Var(Y) = nµ(1 − µ) = nαβ/(α + β)^2.

    (c) For each i = 1, . . . , k,

        E[X_i] = E[E[X_i | P_i]] = E[n_i P_i] = n_i E[P_i] = n_i α/(α + β)

        Var(X_i) = E[Var(X_i | P_i)] + Var(E[X_i | P_i])
                 = E[n_i P_i(1 − P_i)] + Var(n_i P_i)
                 = n_i E[P_i(1 − P_i)] + n_i^2 Var(P_i)
                 = n_i ∫_0^1 [Γ(α + β)/(Γ(α)Γ(β))] p^{α+1−1}(1 − p)^{β+1−1} dp + n_i^2 αβ/((α + β)^2(α + β + 1))
                 = n_i [Γ(α + β)/(Γ(α)Γ(β))] [Γ(α + 1)Γ(β + 1)/Γ(α + β + 2)] + n_i^2 αβ/((α + β)^2(α + β + 1))
                 = n_i αβ/((α + β)(α + β + 1)) + n_i^2 αβ/((α + β)^2(α + β + 1))
                 = [n_i αβ/((α + β)(α + β + 1))] (1 + n_i/(α + β))
                 = n_i αβ(α + β + n_i)/((α + β)^2(α + β + 1)).

    Again the X_i are marginally independent, so

        E[Y] = ∑ E[X_i] = [α/(α + β)] ∑_{i=1}^{k} n_i
        Var(Y) = ∑ Var(X_i) = ∑_{i=1}^{k} n_i αβ(α + β + n_i)/((α + β)^2(α + β + 1)).

    The marginal distribution of X_i is called a beta-binomial distribution. The density of P_i is

        f_P(p) = [Γ(α + β)/(Γ(α)Γ(β))] p^{α−1}(1 − p)^{β−1}

    for 0 < p < 1. So the PMF of X_i is

        P(X_i = x) = E[P(X_i = x | P_i)] = E[C(n_i, x) P_i^x (1 − P_i)^{n_i−x}]
                   = ∫_0^1 C(n_i, x) p^x (1 − p)^{n_i−x} [Γ(α + β)/(Γ(α)Γ(β))] p^{α−1}(1 − p)^{β−1} dp
                   = C(n_i, x) [Γ(α + β)/(Γ(α)Γ(β))] [Γ(α + x)Γ(β + n_i − x)/Γ(α + β + n_i)].


    Assignment 11

    Due on Monday, November 9, 2015.

    1. Problem 4.28 (a) and (b) from the textbook

    2. Problem 4.30 from the textbook

    3. Problem 4.39 from the textbook

    4. Problem 4.40 from the textbook

    Solutions

    1. (a)

        U = X/(X + Y)     A = ℝ^2
        V = X + Y         B = ℝ^2
        X = UV
        Y = V − UV = (1 − U)V

    So

        |J(u, v)| = |det( v      u
                          −v   1 − u )| = |v(1 − u) + uv| = |v|.

    Thus

        f_{U,V}(u, v) = f_{X,Y}(uv, (1 − u)v) |v| = (1/(2π)) e^{−(1/2)u^2 v^2 − (1/2)(1−u)^2 v^2} |v|

    and

        f_U(u) = ∫ f_{U,V}(u, v) dv
               = ∫_{−∞}^{∞} (1/(2π)) e^{−(1/2)u^2 v^2 − (1/2)(1−u)^2 v^2} |v| dv
               = 2 ∫_0^∞ (1/(2π)) exp{−(v^2/2)(1 + 2u^2 − 2u)} v dv
               = 1/(π(1 + 2u^2 − 2u)) = 1/(π(1/2 + 2(u − 1/2)^2)) = 2/(π(1 + 4(u − 1/2)^2)).

    This is a Cauchy(1/2, 1/2) density.


    (b)

        U = X/|Y|
        V = Y
        X = U|V|
        Y = V

    with A = B = ℝ^2. So

        |J(u, v)| = |det( |v|   ±u
                          0     1 )| = |v|

    and

        f_{U,V}(u, v) = (1/(2π)) |v| exp{−(1/2)u^2 v^2 − (1/2)v^2}
        f_U(u) = 1/(π(1 + u^2)).

    2. (a) The mean of Y is

        E[Y] = E[E[Y|X]] = E[X] = 1/2.

    The variance is

        Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) = E[X^2] + Var(X) = 1/3 + 1/12 = 5/12.

    The covariance is

        Cov(X, Y) = E[(Y − µ_Y)(X − µ_X)] = E[E[Y − µ_Y | X](X − µ_X)] = E[(X − µ_X)^2] = Var(X) = 1/12.

    (b) The conditional distribution of Z = Y/X, given X = x, is N(1, 1). Since this conditional distribution does not depend on x, Z and X are independent.

    3. For each j, X_j counts the number of the m independent trials that fall in category j. It therefore has a Binomial(m, p_j) distribution.

    Let Y = m − X_i − X_j. Then by a similar argument the joint marginal distribution of (X_i, X_j, Y) is Multinomial(m, p_i, p_j, 1 − p_i − p_j). So

        P(X_i = x_i | X_j = x_j) = P(X_i = x_i, X_j = x_j) / P(X_j = x_j)
            = P(X_i = x_i, X_j = x_j, Y = m − x_i − x_j) / P(X_j = x_j)
            = [m!/(x_i! x_j! (m − x_i − x_j)!) p_i^{x_i} p_j^{x_j} (1 − p_i − p_j)^{m−x_i−x_j}]
              / [m!/(x_j! (m − x_j)!) p_j^{x_j} (1 − p_j)^{m−x_j}]
            = [(m − x_j)!/(x_i! (m − x_i − x_j)!)] p_i^{x_i} (1 − p_i − p_j)^{m−x_i−x_j} / (1 − p_j)^{m−x_j}
            = C(m − x_j, x_i) (p_i/(1 − p_j))^{x_i} (1 − p_i/(1 − p_j))^{m−x_j−x_i}

    for x_i = 0, . . . , m − x_j. This is the PMF of a Binomial(m − x_j, p_i/(1 − p_j)) distribution.

    Using these results,

        E[X_i X_j] = E[X_j E[X_i | X_j]] = E[X_j (m − X_j) p_i/(1 − p_j)]
                   = (m^2 p_j − E[X_j^2]) p_i/(1 − p_j)
                   = (m^2 p_j − Var(X_j) − E[X_j]^2) p_i/(1 − p_j)
                   = (m^2 p_j − m p_j(1 − p_j) − m^2 p_j^2) p_i/(1 − p_j)
                   = (m^2 p_j(1 − p_j) − m p_j(1 − p_j)) p_i/(1 − p_j)
                   = (m^2 − m) p_i p_j

    and therefore

        Cov(X_i, X_j) = E[X_i X_j] − E[X_i] E[X_j] = (m^2 − m) p_i p_j − m^2 p_i p_j = −m p_i p_j.

    An alternative approach for deriving the covariance is to use indicator functions of whether the k-th trial falls in category i.

    4. (a)

    (b) The marginal density of X is

        f_X(x) = ∫_0^{1−x} C x^{a−1} y^{b−1} (1 − x − y)^{c−1} dy
               = C x^{a−1} (1 − x)^{b+c−1} ∫_0^1 u^{b−1} (1 − u)^{c−1} du
               = C x^{a−1} (1 − x)^{b+c−1} Γ(b)Γ(c)/Γ(b + c)

    for 0 < x < 1, using the change of variables u = y/(1 − x). This is a Beta(a, b + c) density, and

        1 = ∫_0^1 f_X(x) dx = C [Γ(b)Γ(c)/Γ(b + c)] [Γ(a)Γ(b + c)/Γ(a + b + c)] = C Γ(a)Γ(b)Γ(c)/Γ(a + b + c).

    So

        C = Γ(a + b + c)/(Γ(a)Γ(b)Γ(c)).

    Because of the symmetric roles of x and y, the marginal distribution of Y is Beta(b, a + c).

    (c) The conditional distribution of Y | X = x has density

        f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x)
                     ∝ x^{a−1} y^{b−1} (1 − x − y)^{c−1} / [x^{a−1} (1 − x)^{b+c−1}]
                     = (y/(1 − x))^{b−1} (1 − y/(1 − x))^{c−1} · 1/(1 − x)

    for 0 < y < 1 − x. The conditional density of U = Y/(1 − X) given X = x is therefore

        f_{U|X}(u|x) ∝ u^{b−1} (1 − u)^{c−1}

    for 0 < u < 1, which is a Beta(b, c) density. As this does not depend on x, U and X are independent.

    (d) The expected product is

        E[XY] = E[X E[Y|X]] = [b/(b + c)] E[X(1 − X)]
              = [b/(b + c)] [Γ(a + b + c)/(Γ(a)Γ(b + c))] ∫_0^1 x^a (1 − x)^{b+c} dx
              = [b/(b + c)] [Γ(a + b + c)/(Γ(a)Γ(b + c))] [Γ(a + 1)Γ(b + c + 1)/Γ(a + b + c + 2)]
              = [b/(b + c)] [a(b + c)/((a + b + c)(a + b + c + 1))]
              = ab/((a + b + c)(a + b + c + 1)).

    The covariance is therefore

        Cov(X, Y) = E[XY] − E[X]E[Y]
                  = ab/((a + b + c)(a + b + c + 1)) − ab/(a + b + c)^2
                  = −ab/((a + b + c)^2(a + b + c + 1)).


    Assignment 12

    Due on Monday, November 16, 2015.

    1. Problem 4.47 from the textbook

    2. Problem 4.55 from the textbook

    3. Problem 5.2 from the textbook

    4. Problem 5.8 from the textbook. You can simplify calculations somewhat by arguing that you can assume without loss of generality that θ_1 = E[X_i] = 0.

    5. Problem 5.15 from the textbook.

    6. Let U1, . . . , Un be a random sample from the Uniform[0, 1] distribution with order statistics U_(1) ≤ · · · ≤ U_(n), and let R = U_(n) − U_(1) be the sample range. Find the marginal density of R.

    Solutions

    1. (a) For z < 0

        P(Z ≤ z) = P(X ≤ z and Y < 0) + P(−X ≤ z and Y > 0).

    Since X, Y are independent, continuous, and have distributions symmetric about the origin,

        P(X ≤ z and Y < 0) = P(X ≤ z)P(Y < 0) = Φ(z) (1/2)

    and

        P(−X ≤ z and Y > 0) = P(−X ≤ z)P(Y > 0) = Φ(z) (1/2),

    where Φ(z) is the CDF of the standard normal distribution. So

        P(Z ≤ z) = Φ(z) (1/2) + Φ(z) (1/2) = Φ(z).

    By symmetry, for z > 0

        P(Z ≥ z) = P(Z ≤ −z) = Φ(−z) = 1 − Φ(z),

    and therefore the CDF of Z is equal to Φ for all z and Z has a standard normal distribution.


    (b) If Y > 0 then Z = |X| and Z > 0. Similarly, if Y < 0 then Z = −|X| and Z < 0. So Z and Y have the same sign and therefore the joint distribution of Z, Y assigns zero probability to the second and fourth quadrants. Since X, Y are jointly continuous and the bivariate normal distribution assigns positive probability to all open sets, this means Z, Y cannot be jointly normal.

    2. Let L be the system lifetime and let X1, X2, X3 be the component lifetimes. Then

        P(L ≤ x) = P(all three components fail by x)
                 = P(X1 ≤ x, X2 ≤ x, X3 ≤ x) = P(X1 ≤ x)P(X2 ≤ x)P(X3 ≤ x)
                 = P(X1 ≤ x)^3 = (1 − e^{−x/λ})^3

    for x > 0.

    3. (a) Condition on X1 = x:

        P(Y > y | X1 = x) = { 1        if y ≤ 0
                            { F(x)^y   if y ≥ 1,

    i.e. Y | X1 = x is geometric with p = 1 − F(x). So for y ≥ 1

        P(Y > y) = E[F(X1)^y] = ∫_0^1 u^y du = 1/(y + 1)

    since F(X1) is uniform on [0, 1] by the probability integral transform. So for y = 1, 2, . . . ,

        P(Y = y) = 1/y − 1/(y + 1) = 1/(y(y + 1)).

    Alternative argument: For y = 1, 2, . . . ,

        P(Y > y) = P(X1 > max{X2, . . . , X_{y+1}})
                 = P(argmax(X1, . . . , X_{y+1}) = 1)
                 = 1/(y + 1)

    by symmetry.


    (b) Using ⌊y⌋ to denote the largest integer less than or equal to y, we have P(Y > y) = 1 − F_Y(y) = 1/(⌊y⌋ + 1) for all y ≥ 0. So

        E[Y] = ∫_0^∞ (1 − F_Y(y)) dy = ∫_0^∞ 1/(⌊y⌋ + 1) dy
             ≥ ∫_0^∞ 1/(y + 1) dy = ∞.

    4. (a)

        ∑(X_i − X̄)^2 = ∑ X_i^2 − (1/n)(∑ X_i)(∑ X_j)
                      = (1/(2n)) (2 ∑_i ∑_j X_i^2 − 2 ∑_i ∑_j X_i X_j)
                      = (1/(2n)) (∑_i ∑_j X_i^2 − 2 ∑_i ∑_j X_i X_j + ∑_i ∑_j X_j^2)
                      = (1/(2n)) ∑_i ∑_j (X_i − X_j)^2

    (b) Assume, without loss of generality, that E[X_i] = θ_1 = 0. Then

        E[S^2] = σ^2 = θ_2

    and

        E[S^4] = [1/(4n^2(n − 1)^2)] ∑_i ∑_j ∑_k ∑_ℓ E[(X_i − X_j)^2 (X_k − X_ℓ)^2].

    If i = j or k = ℓ, then E[(X_i − X_j)^2 (X_k − X_ℓ)^2] = 0. If all of i, j, k, ℓ are different, then

        E[(X_i − X_j)^2 (X_k − X_ℓ)^2] = E[(X_1 − X_2)^2]^2 = (2σ^2)^2 = 4θ_2^2.

    If {i, j} ∩ {k, ℓ} contains exactly one element, say k = i, then

        E[(X_i − X_j)^2 (X_i − X_ℓ)^2]
            = E[(X_i^2 − 2X_iX_j + X_j^2)(X_i^2 − 2X_iX_ℓ + X_ℓ^2)]
            = E[X_i^4 − 2X_i^3X_j + X_i^2X_j^2 − 2X_i^3X_ℓ + 4X_i^2X_jX_ℓ − 2X_iX_j^2X_ℓ
                + X_i^2X_ℓ^2 − 2X_iX_jX_ℓ^2 + X_j^2X_ℓ^2]
            = θ_4 + 3θ_2^2.

    If {i, j} = {k, ℓ}, then

        E[(X_i − X_j)^2 (X_k − X_ℓ)^2] = E[(X_i − X_j)^4]
            = E[X_i^4 − 4X_i^3X_j + 6X_i^2X_j^2 − 4X_iX_j^3 + X_j^4]
            = 2θ_4 + 6θ_2^2.

    So

        E[S^4] = [1/(4n^2(n − 1)^2)] [n(n − 1)(n − 2)(n − 3) 4θ_2^2
                 + 4n(n − 1)(n − 2)(θ_4 + 3θ_2^2) + 2n(n − 1)(2θ_4 + 6θ_2^2)]
               = [1/(4n(n − 1))] [4(n − 2)(n − 3)θ_2^2 + 4(n − 2)(θ_4 + 3θ_2^2) + 4(θ_4 + 3θ_2^2)]
               = [1/(n(n − 1))] [(n − 1)θ_4 + ((n − 2)(n − 3) + 3(n − 2) + 3)θ_2^2]
               = [1/(n(n − 1))] [(n − 1)θ_4 + (n^2 − 2n + 3)θ_2^2].

    So

        Var(S^2) = E[S^4] − [n(n − 1)/(n(n − 1))] θ_2^2
                 = [1/(n(n − 1))] [(n − 1)θ_4 + (n^2 − 2n + 3 − n^2 + n)θ_2^2]
                 = [1/(n(n − 1))] [(n − 1)θ_4 − (n − 3)θ_2^2]
                 = (1/n) [θ_4 − ((n − 3)/(n − 1)) θ_2^2].

    (c) Still assume θ_1 = 0. Then

        E[X̄ S^2] = [1/(2n^2(n − 1))] ∑_i ∑_j ∑_k E[(X_i − X_j)^2 X_k]
                  = [1/(2n^2(n − 1))] 2n(n − 1) E[(X_1 − X_2)^2 X_1]
                  = (1/n) E[X_1^3 − 2X_1^2X_2 + X_1X_2^2]
                  = (1/n) E[X_1^3] = θ_3/n.

    So X̄ and S^2 are uncorrelated if and only if θ_3 = 0.


    5. (a) For the mean,

        X̄_{n+1} = [1/(n + 1)] ∑_{i=1}^{n+1} X_i
                 = [1/(n + 1)] ∑_{i=1}^{n} X_i + [1/(n + 1)] X_{n+1}
                 = [n/(n + 1)] X̄_n + [1/(n + 1)] X_{n+1}.

    (b) For the variance,

        n S^2_{n+1} = ∑_{i=1}^{n+1} (X_i − X̄_{n+1})^2
                    = ∑_{i=1}^{n+1} (X_i − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2
                    = ∑_{i=1}^{n} (X_i − X̄_n)^2 + (X_{n+1} − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2
                    = (n − 1) S^2_n + (X_{n+1} − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2.

    From the result for sample means

        X̄_{n+1} − X̄_n = [1/(n + 1)] (X_{n+1} − X̄_n)

    and therefore

        (X_{n+1} − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2 = (X_{n+1} − X̄_n)^2 − [1/(n + 1)](X_{n+1} − X̄_n)^2
                                                      = [n/(n + 1)] (X_{n+1} − X̄_n)^2,

    which completes the proof.

    6. To simplify notation let X = U_(n) and Y = U_(1). From the general form of the joint density of two order statistics, the joint density of X and Y is

        f_{XY}(x, y) = { n(n − 1)(x − y)^{n−2}   for 0 < y < x < 1
                       { 0                       otherwise.

    Let R = X − Y and V = Y. This is a one-to-one transformation with inverse

        x = r + v
        y = v,

    Jacobian determinant

        J(r, v) = det( 1  1
                       0  1 ) = 1,

    and range

        B = {(r, v) : 0 < v < r + v < 1} = {(r, v) : 0 < r < 1 and 0 < v < 1 − r}.

    The joint density of R and V is therefore

        f_{RV}(r, v) = f_{XY}(r + v, v) = { n(n − 1) r^{n−2}   if 0 < r < 1 and 0 < v < 1 − r
                                          { 0                  otherwise,

    and the marginal density of R is

        f_R(r) = ∫_0^{1−r} n(n − 1) r^{n−2} dv = n(n − 1) r^{n−2}(1 − r)

    for 0 < r < 1 and zero otherwise. This is a Beta(n − 1, 2) density.


    Assignment 13

    Due on Monday, November 30, 2015.

    1. Problem 5.24 from the textbook

    2. Problem 5.32 from the textbook

    3. Problem 5.40 from the textbook

    4. Let X have a Gamma(α, 1) distribution and let Y = (X − α)/√α.

    (a) Find the density, mean, and variance of Y .

    (b) Plot the density fα(y) of Y for α = 2, 10, 100.

    (c) Show that for every y the density f_α(y) converges to the standard normal density at y as α tends to infinity. Hints:

        i. Use Stirling's approximation for the gamma function, which can be written as

            Γ(α) = e^{−α} α^{α−1/2} √(2π) (1 + O(α^{−1}))

        as α → ∞.

        ii. Work with log densities and use the fact that

            log(1 + x) = x − x^2/2 + O(x^3)

        as x → 0.

    Solutions

    1. Assume, without loss of generality, that θ = 1. Then for 0 < u < v < 1

        f_{X_(1),X_(n)}(u, v) = [n!/(0!(n − 2)!0!)] f(u) f(v) F(u)^0 (F(v) − F(u))^{n−2} (1 − F(v))^0
                              = n(n − 1) f(u) f(v) (F(v) − F(u))^{n−2}
                              = n(n − 1)(v − u)^{n−2}.

    Let

        Y = X_(1)/X_(n)
        Z = X_(n).

    The range of (Y, Z) is B = [0, 1] × [0, 1], and the inverse transformation is

        X_(1) = YZ
        X_(n) = Z.

    The Jacobian determinant is

        J(y, z) = det( z  y
                       0  1 ) = z.

    So for 0 < z < 1 and 0 < y < 1

        f_{Y,Z}(y, z) = n(n − 1)(z − yz)^{n−2} z = n z^{n−1} × (n − 1)(1 − y)^{n−2}.

    Thus Y and Z are independent.

    This proof first removes θ from consideration since it is just a scale parameter. An alternative approach is to note that X_(n) is minimal sufficient, X_(1)/X_(n) is ancillary, and use Basu's theorem from Chapter 6.

    2. a. Suppose f is continuous at a and X_n converges in probability to a constant a. Fix ε > 0. Then there exists a δ > 0 such that

        |f(x) − f(a)| < ε

    whenever |x − a| < δ. So

        P(|f(X_n) − f(a)| < ε) ≥ P(|X_n − a| < δ) → 1.

    So f(X_n) converges in probability to f(a). The result follows with f(x) = √x or f(x) = 1/x and a > 0.

    b. f(x) = σ/√x is continuous at x = σ^2 if σ > 0.

    3. a. For any t and any ε > 0, if X_n > t and |X_n − X| < ε, then X > t − ε. So X ≤ t − ε implies that either X_n ≤ t or |X_n − X| ≥ ε, i.e.

        {X ≤ t − ε} ⊂ {X_n ≤ t} ∪ {|X_n − X| ≥ ε}.

    So

        P(X ≤ t − ε) ≤ P(X_n ≤ t) + P(|X_n − X| ≥ ε)

    or

        P(X ≤ t − ε) − P(|X_n − X| ≥ ε) ≤ P(X_n ≤ t).

    b. Similarly (reversing the roles of X and X_n and replacing t by t + ε),

        P(X_n ≤ t) ≤ P(X ≤ t + ε) + P(|X_n − X| ≥ ε).

    c. Suppose the CDF of X is continuous at t. From the previous two parts, since X_n → X in probability we have

        P(X ≤ t − ε) ≤ lim inf_{n→∞} P(X_n ≤ t) ≤ lim sup_{n→∞} P(X_n ≤ t) ≤ P(X ≤ t + ε)

    for any ε > 0. Since t is a continuity point of the distribution of X, lim_{ε↓0} P(X ≤ t − ε) = lim_{ε↓0} P(X ≤ t + ε) = P(X ≤ t), and therefore

        P(X ≤ t) ≤ lim inf_{n→∞} P(X_n ≤ t) ≤ lim sup_{n→∞} P(X_n ≤ t) ≤ P(X ≤ t).

    So lim_{n→∞} P(X_n ≤ t) exists and is equal to P(X ≤ t). Thus X_n → X in distribution.

    4. (a) The mean and variance of X are E[X] = α and Var(X) = α, so

        E[Y] = (E[X] − α)/√α = 0
        Var(Y) = Var(X)/α = 1.

    The inverse transformation is x = √α y + α with derivative dx/dy = √α, so the density of Y is

        f_Y(y) = √α f_X(√α y + α) = [√α/Γ(α)] (√α y + α)^{α−1} e^{−√α y − α}.

    (b) One way to produce the plots in R is sketched below.
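    The original listing is cut off at the page break; the following sketch (variable names are illustrative, not the original code) would produce the requested plots:

        y <- seq(-4, 4, length.out = 200)
        fY <- function(y, alpha)
            sqrt(alpha) * dgamma(sqrt(alpha) * y + alpha, shape = alpha)
        plot(y, fY(y, 2), type = "l", ylab = "density")
        lines(y, fY(y, 10), lty = 2)
        lines(y, fY(y, 100), lty = 3)
        lines(y, dnorm(y), col = "gray")         # standard normal for comparison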


    The logarithm of the remainder of the density is

        (α − 1) log(y/√α + 1) − √α y = (α − 1)[y/√α − y^2/(2α) + O(α^{−3/2})] − √α y
                                     = −y/√α − [(α − 1)/α] y^2/2 + O(α^{−1/2})
                                     → −y^2/2

    as α → ∞. So f_Y(y) converges pointwise to a standard normal density.


    Assignment 14

    Due on Monday, December 7, 2015.

    1. Let Xn have a χ2n distribution.

    (a) Find an approximating normal distribution for Xn.

    (b) Find an approximating normal distribution for Yn =√Xn.

    (c) Find an approximating normal distribution for Zn = logXn.

    (d) How good are the approximations in parts (a), (b), and (c) for n = 5, 10, 20, 100?

    2. The height H and radius R of a cylinder are measured with error; the measurements are independent, normally distributed, and

        µ_H = 75 cm, σ_H = 2 cm, µ_R = 10 cm, σ_R = 1 cm.

    The estimated volume of the cylinder is V = πR^2H. Find a normal approximation to the distribution of V.

    3. Let X1, . . . , Xn be a random sample from an exponential distribution with mean θ.

    (a) Find a normal approximation to the distribution of the sample mean X̄_n.

    (b) Let Y_n = g(X̄_n) where g is differentiable. Find a normal approximation to the distribution of Y_n.

    (c) Can you find a function g such that the variance of the normal approximation in (b) does not depend on θ?

    4. Let X1, . . . , Xn be a random sample from a Poisson distribution with mean λ > 0. Let X̄_n be the sample average and let U_n = √n(X̄_n − λ)/√(X̄_n). Find the limiting distribution of U_n as n tends to infinity.

    Solutions

    1. (a) A χ^2_n random variable X_n has the same distribution as ∑_{i=1}^{n} U_i with the U_i i.i.d. χ^2_1 random variables. So the central limit theorem gives

        X_n ∼ AN(n, 2n)

    as n → ∞.


    (b) Let V_n = √(X_n/n) = f(X_n/n). Now X_n/n converges in probability to 1, X_n/n ∼ AN(1, 2/n), f(1) = 1, and f′(1) = 1/2. So

        V_n ∼ AN(1, (1/4)(2/n)) = AN(1, 1/(2n))

    and thus

        Y_n = √n V_n ∼ AN(√n, 1/2).

    (c) Let W_n = log(X_n/n) = f(X_n/n). Now X_n/n converges in probability to 1, X_n/n ∼ AN(1, 2/n), f(1) = 0, and f′(1) = 1. So

        W_n ∼ AN(0, 2/n)

    and thus

        Z_n = W_n + log n ∼ AN(log n, 2/n).

    (d) The exact CDFs and PDFs of Y_n and Z_n are

        F_{Y_n}(y) = F_{X_n}(y^2),    f_{Y_n}(y) = f_{X_n}(y^2) 2y
        F_{Z_n}(z) = F_{X_n}(e^z),    f_{Z_n}(z) = f_{X_n}(e^z) e^z.

    You can look at graphs of the densities or CDFs, or at quantile plots, or at numerical measures of the discrepancies of the exact and approximate CDFs.
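    One way to examine the quality of these approximations (a sketch, not the original code) is to overlay the exact and approximate CDFs of Y_n = √(X_n) for each n:

        compare_sqrt <- function(n) {
            y <- seq(0, sqrt(qchisq(0.999, n)), length.out = 200)
            plot(y, pchisq(y^2, n), type = "l", ylab = "CDF", main = paste("n =", n))
            lines(y, pnorm(y, mean = sqrt(n), sd = sqrt(1 / 2)), lty = 2)   # approximation
        }
        ## par(mfrow = c(2, 2)); for (n in c(5, 10, 20, 100)) compare_sqrt(n)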

    2. V = f(R, H) with f(r, h) = πr^2h. The gradient of f is

        ∇f(r, h) = (2πrh, πr^2).

    So

        V ≈ f(10, 75) + [∂f/∂r](10, 75)(R − 10) + [∂f/∂h](10, 75)(H − 75)
          = π × 7500 + π × 1500 × (R − 10) + π × 100 × (H − 75)
          ∼ N(π × 7500, (π × 1500 × 1)^2 + (π × 100 × 2)^2)
          ≈ N(23561.94, (4754.09)^2).
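    A quick simulation check of this approximation (a sketch, not part of the original solutions):

        N <- 100000
        R <- rnorm(N, 10, 1); H <- rnorm(N, 75, 2)
        V <- pi * R^2 * H
        c(mean(V), pi * 7500)                                   # simulated vs approximate mean
        c(sd(V), sqrt((pi * 1500)^2 + (pi * 200)^2))            # simulated vs approximate sd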

    3. (a) The variance of the exponential distribution with mean θ is θ^2. So by the CLT X̄_n ∼ AN(θ, θ^2/n).

    (b) By the delta method Y_n = g(X̄_n) ∼ AN(g(θ), (g′(θ))^2 θ^2/n).

    (c) The approximate variance is constant in θ if g′(θ) = 1/θ, that is, g(θ) = log θ. This is an example of a variance stabilizing transformation.


    4. By the weak law of large numbers X̄_n converges in probability to E[X_1] = λ. By the continuous mapping theorem T_n = √(X̄_n) converges in probability to √λ. Using the strong law of large numbers and basic continuity shows that the convergence also holds almost surely.

    Since Var(X_1) = λ, the central limit theorem implies that X̄_n ∼ AN(λ, λ/n). Since the square root function f(x) = √x is differentiable at positive x, the delta method implies that

        T_n ∼ AN(f(λ), (f′(λ))^2 λ/n) = AN(√λ, (1/(2√λ))^2 λ/n) = AN(√λ, 1/(4n)).

    Combining these results with Slutsky's theorem,

        U_n = √n(X̄_n − λ)/√(X̄_n) = [√n(X̄_n − λ)/√λ] √(λ/X̄_n)

    converges in distribution to N(0, 1), since the first factor converges in distribution to N(0, 1) by the CLT and the second factor converges in probability to 1.
