STAT:5100 (22S:193) Statistical Inference I Homework Assignments

    Luke Tierney

    Fall 2015


    Assignment 1

    Due on Monday, August 31, 2015.

    1. For each of the following experiments, describe a reasonable sample space:

    (a) Toss a coin four times.

    (b) Count the number of insect-damaged leaves on a plant.

    (c) Measure the lifetime (in hours) of a particular brand of light bulb.

    (d) Three people arrive at an airport checkpoint. Two of the three are randomly chosen to complete a survey.

    2. The set-theoretic difference A\B = A ∩ B^c is the set of all elements in A that are not in B. The symmetric difference A∆B = (A\B) ∪ (B\A) is the set of all elements in either A or B but not both. Verify the following identities:

    (a) A\B = A\(A ∩ B)

    (b) A∆B = A^c∆B^c

    (c) A ∪ B = A ∪ (B\A)

    (d) B = (B ∩ A) ∪ (B ∩ A^c)

    3. Problem 1.4 in the textbook

    4. Problem 1.5 in the textbook

    5. Problem 1.13 in the textbook

    Solutions

    1. (a) Toss coin 4 times:

    {(H,H,H,H), . . .} = {(x1, x2, x3, x4) : xi ∈ {H,T}}

    or {0, 1, 2, 3, 4} (recording only the number of heads)

    (b) Count number of insect-damaged leaves:

    {0, 1, . . . , N}, where N = # leaves (or an upper bound), or
    {0, 1, 2, . . .} if no upper bound is available


    (c) Measure lifetime in hours:

    {0, 1, 2, . . .} if lifetimes are rounded (an upper limit can be included), or
    [0, ∞) if fractional hours are allowed

    (d) Two out of three people chosen to complete a survey: Suppose the people are labeled A, B, and C. One possible sample space is the collection of all subsets of size 2 that can be chosen from the set {A,B,C}:

    {{A,B}, {A,C}, {B,C}}.

    Another possibility is the collection of all ordered pairs that can be formed:

    {(A,B), (B,A), (A,C), (C,A), (B,C), (C,B)}.

    2. (a) A\B is defined as A ∩ B^c. To see that A\B = A\(A ∩ B):

        A\(A ∩ B) = A ∩ (A ∩ B)^c
                  = A ∩ (A^c ∪ B^c)           (De Morgan's law)
                  = (A ∩ A^c) ∪ (A ∩ B^c)     (distributive law)
                  = ∅ ∪ (A ∩ B^c)
                  = A\B

    (b) A∆B = A^c∆B^c: For any two sets A and B,

        A\B = A ∩ B^c
            = B^c ∩ A          (commutative law)
            = B^c ∩ (A^c)^c
            = B^c\A^c

    So

        A∆B = (A\B) ∪ (B\A)
            = (B^c\A^c) ∪ (A^c\B^c)
            = A^c∆B^c

    (c) A ∪ B = A ∪ (B\A):

        A ∪ B = A ∪ ((B ∩ A) ∪ (B ∩ A^c))     (by part (d))
              = (A ∪ (B ∩ A)) ∪ (B ∩ A^c)     (associative law)
              = A ∪ (B ∩ A^c)


    (d) B = (B ∩ A) ∪ (B ∩ A^c):

        (B ∩ A) ∪ (B ∩ A^c) = B ∩ (A ∪ A^c)     (distributive law)
                            = B ∩ S
                            = B

    3. (a) A or B or both:

    P (A ∪B) = P (A) + P (B)− P (A ∩B)

    (b) A or B but not both:

        A∆B = (A ∪ B)\(A ∩ B) = (A ∩ B^c) ∪ (B ∩ A^c)

    and the sets A ∩ B^c and B ∩ A^c are disjoint. Since A ∩ B and A ∩ B^c are disjoint and their union is A,

        P(A) = P(A ∩ B) + P(A ∩ B^c)

    and therefore P(A ∩ B^c) = P(A) − P(A ∩ B). Similarly P(B ∩ A^c) = P(B) − P(A ∩ B), and therefore

        P(A∆B) = P(A) + P(B) − 2P(A ∩ B)

    (c) At least one of A or B: A ∪ B.

    (d) At most one of A or B = not (A ∩ B) = (A ∩ B)^c:

        P((A ∩ B)^c) = 1 − P(A ∩ B)

    4. The event A ∩ B ∩ C is the event that the birth results in identical twins who are female. The proportion of births satisfying this description is

        P(A ∩ B ∩ C) = (1/90) × (1/3) × (1/2) = 1/540 ≈ 0.001852.

    5. If A and B are disjoint then

        P(A ∪ B) = P(A) + P(B) ≤ 1,

    but

        P(A) + P(B) = P(A) + 1 − P(B^c) = 1/3 + 1 − 1/4 = 13/12 > 1,

    so A and B cannot be disjoint if P(A) = 1/3 and P(B^c) = 1/4.

    Another way to see that this is not possible: If A and B are disjoint then A ⊂ B^c and therefore P(A) ≤ P(B^c), but the opposite is true here.


    Assignment 2

    Due on Wednesday, September 9, 2015.

    1. Problem 1.14 from the textbook

    2. Suppose n balls are placed at random in n cells; cells can contain more than one ball.

    (a) Show that the probability that exactly one cell remains empty is

        n(n − 1) C(n, 2) (n − 2)! / n^n = C(n, 2) n! / n^n

    (b) The R function defined as

    sim1


    You can show it algebraically using the binomial theorem

        (a + b)^n = ∑_{k=0}^{n} C(n, k) a^k b^{n−k}

    taking a = b = 1.

    You can also use one of several counting arguments. One such argument is inductive and starts with the number of subsets of the empty set, which is x_0 = 1. Now let x_n be the number of subsets of a set of n items A_n = {a_1, a_2, . . . , a_n}. A subset of A_{n+1} = {a_1, a_2, . . . , a_{n+1}} is either a subset of A_n or it is the union of a subset of A_n with {a_{n+1}}. There are x_n subsets of each type, so the number of subsets of A_{n+1} is x_{n+1} = 2 x_n.

    Another counting argument uses the fact that each subset of a set A_n = {a_1, . . . , a_n} corresponds to an ordered list [x_1, . . . , x_n] of n 0's and 1's, with

    x_i = 1 if a_i is in the subset;

    x_i = 0 if a_i is not in the subset.

    For example, if A4 = {a1, a2, a3, a4}, then

    (1, 0, 0, 1)↔ {a1, a4}

    There are 2^n such lists.

    2. (a) The n balls are assigned at random to the n cells. The sample space S has n^n elements, since multiple balls per cell are allowed.

    Assume equally likely outcomes; equivalently, assume the balls are assigned independently.

    Exactly one cell empty means:

    • one cell is empty
    • one cell contains two balls
    • n − 2 cells contain exactly one ball

    Choices:

    • n for the empty cell
    • n − 1 for the cell with two balls
    • C(n, 2) for the balls to use for the two-ball cell
    • (n − 2)! arrangements for the other balls in their cells

    So the number of ways to get exactly one empty cell is

        n(n − 1) C(n, 2) (n − 2)! = C(n, 2) n!

    and the probability of this arrangement is

        C(n, 2) n! / n^n.


    (b) We can define a function sim1empty to compute the probability of one empty cell by simulation using N simulation replicates:
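    The original definition of sim1empty is cut off at the page break; the following is a minimal sketch of one way such a function could be written (the argument names n and N are assumptions, not the original code):

        sim1empty <- function(n, N = 10000) {
            ## estimate P(exactly one empty cell) when n balls are placed
            ## independently and uniformly at random into n cells
            mean(replicate(N, {
                counts <- tabulate(sample(n, n, replace = TRUE), nbins = n)
                sum(counts == 0) == 1
            }))
        }

    For example, sim1empty(6) should be close to the exact value choose(6, 2) * factorial(6) / 6^6.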


    > f(20)

    [1] 137846528820

    > choose(40, 20)

    [1] 137846528820
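    The definition of f is cut off at the preceding page break; a plausible reconstruction consistent with the printed output (an assumption, not the original listing) is:

        f <- function(n) sum(choose(n, 0:n)^2)   # sum of squared binomial coefficients

    With this definition f(20) equals choose(40, 20), as required by the identity verified next.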

    The identity

        ∑_{k=0}^{n} C(n, k)^2 = C(2n, n)

    can be verified using induction, but a counting argument is simpler: Consider a box containing n red and n blue balls and consider selecting a sample of n balls. There are C(2n, n) such samples. A sample has to contain some number k of red balls and n − k blue balls. The number of samples with k red and n − k blue balls is

        C(n, k) C(n, n − k) = C(n, k)^2.

    So the total number of samples of size n from a set of size 2n satisfies

        C(2n, n) = ∑_{k=0}^{n} C(n, k) C(n, n − k) = ∑_{k=0}^{n} C(n, k)^2.

    Another approach to counting the number of outcomes in which the players have the same number of heads: We have 2n slots that are to be filled in with a head or a tail, the first n corresponding to player A and the second n to player B. Choose n of these slots; this can be done in C(2n, n) ways. Some number, say k, of the chosen slots will be in the first half. Make these heads, and the remaining n − k slots in the first half tails. For the n − k chosen slots in the second half, make those tails and the remaining k heads. Then the result contains k heads in the first half and k heads in the second half. Every assignment of heads and tails with the same number of heads in each half corresponds in this way to a unique selection of n out of 2n slots, so there are the same number of such assignments as there are ways to choose n slots out of 2n slots, C(2n, n).

    5. There are 6^k possible outcomes for the k rolls. Outcomes with the m-th 6 on roll k must have a 6 on roll k (one choice) and m − 1 sixes in the first k − 1 rolls; there are C(k − 1, m − 1) ways to choose the positions for these sixes, and, given these positions, there are 5^{k−m} ways to choose the results for the remaining rolls. So the probability of the m-th 6 on roll k is

        C(k − 1, m − 1) 5^{k−m} / 6^k = C(k − 1, m − 1) (1/6)^m (1 − 1/6)^{k−m}.


    Assignment 3

    Due on Monday, September 14, 2015.

    1. Problem 1.12 (b) from the textbook

    2. Problem 1.24 from the textbook

    3. Problem 1.34 from the textbook

    4. Problem 1.36 from the textbook

    5. An urn contains 11 balls numbered 0, 1, . . . , 10. A ball is selected at random. Suppose the number on the selected ball is k. A second urn is filled with k red balls and 10 − k blue balls. Five balls are selected at random with replacement from the second urn.

    (a) Find the probability that the sample from the second urn consists of three red and two blue balls.

    (b) Given that the sample from the second urn consists of three red and two blue balls, find the conditional probability that the ball selected from the first urn had the number k = 6.

    Solutions

    1. Let A1, A2, . . . be pairwise disjoint. For each i, let

        B_i = ∪_{j=i}^{∞} A_j.

    Then for each n,

        ∪_{i=1}^{∞} A_i = A_1 ∪ · · · ∪ A_n ∪ B_{n+1}.

    Since A_1, A_2, . . . , A_n, B_{n+1} are pairwise disjoint,

        P(∪_{i=1}^{∞} A_i) = ∑_{i=1}^{n} P(A_i) + P(B_{n+1})

    for every n by finite additivity. But

        B_1 ⊃ B_2 ⊃ · · ·

    and ∩ B_i = ∅. So by continuity, P(B_{n+1}) ↓ 0. So

        ∑_{i=1}^{n} P(A_i) = P(∪_{i=1}^{∞} A_i) − P(B_{n+1}) → P(∪_{i=1}^{∞} A_i)

    and thus

        ∑_{i=1}^{∞} P(A_i) = P(∪_{i=1}^{∞} A_i).

    An alternative approach is to show, using finite additivity and the continuity axiom as stated, that for any sequence of events B_1, B_2, . . . satisfying B_1 ⊂ B_2 ⊂ B_3 ⊂ . . . the identity

        P(∪_{i=1}^{∞} B_i) = lim_{n→∞} P(B_n)

    is true. Then let

        B_i = ∪_{j=1}^{i} A_j.

    These B_i satisfy B_1 ⊂ B_2 ⊂ B_3 ⊂ . . . and

        ∪_{i=1}^{∞} A_i = ∪_{i=1}^{∞} B_i.

    By finite additivity

        P(B_i) = ∑_{j=1}^{i} P(A_j)

    and by the result just stated

        ∑_{j=1}^{i} P(A_j) = P(B_i) → P(∪_{i=1}^{∞} B_i) = P(∪_{i=1}^{∞} A_i).

    2. Let E_i be the event that the first head appears on toss i and let A be the event that player A wins. Then

        A = E_1 ∪ E_3 ∪ E_5 ∪ . . .

    The E_i are pairwise disjoint, and P(E_i) = (1 − p)^{i−1} p, so

        P(A) = p + (1 − p)^2 p + (1 − p)^4 p + . . .
             = p ∑_{k=1}^{∞} [(1 − p)^2]^{k−1}
             = p / (1 − (1 − p)^2)
             = p / (1 − (1 − 2p + p^2))
             = p / (2p − p^2)
             = 1 / (2 − p).

    So if p = 1/2 then P(A) = 2/3, and for all p

        P(A) = 1/(2 − p) ≥ 1/2.

    An alternative approach to deriving the formula for P(A) is to condition on the results of the first two tosses. For the first toss,

        P(A) = p P(A|E_1) + (1 − p) P(A|E_1^c)
             = p + (1 − p) P(A|E_1^c).

    For the second toss,

        P(A|E_1^c) = p P(A|E_2 ∩ E_1^c) + (1 − p) P(A|E_2^c ∩ E_1^c)
                   = (1 − p) P(A|E_2^c ∩ E_1^c).

    Since the tosses are independent, the game starts over if the first two tosses are tails, i.e.

        P(A|E_2^c ∩ E_1^c) = P(A).

    So we have an equation in P(A):

        P(A) = p + (1 − p)^2 P(A)

    and the solution is

        P(A) = 1/(2 − p) ≥ 1/2.

    3. The two litters are

        I : B, B, G          P(B|I) = 2/3
        II : B, B, B, G, G   P(B|II) = 3/5

    with P(I) = P(II) = 1/2 (a litter is chosen at random).

    a. P(B) = P(B|I)P(I) + P(B|II)P(II) = (2/3)(1/2) + (3/5)(1/2) = 1/3 + 3/10 = 19/30

    b. P(I|B) = P(I ∩ B)/P(B) = [(2/3)(1/2)] / (19/30) = (1/3)/(19/30) = 10/19

    4. The probabilities of no hits and exactly one hit are

        P(not hit) = (4/5)^10 ≈ 0.1074
        P(hit once) = 10 × (1/5) × (4/5)^9 ≈ 0.2684

    Therefore

        P(at least twice) = 1 − P(not hit) − P(hit once)
                          = 1 − (4/5)^10 − 10 × (1/5) × (4/5)^9 ≈ 0.6242

    and

        P(at least twice | at least once) = P(at least twice) / P(at least once)
            = [1 − (4/5)^10 − 10 × (1/5) × (4/5)^9] / [1 − (4/5)^10] ≈ 0.6993

    5. (a) Given that ball k is chosen from the first urn, the probability of choosing three red and two blue balls from the second urn when sampling with replacement is the binomial probability

        P(three red | ball k) = C(5, 3) (k/10)^3 (1 − k/10)^2.

    The probability of choosing ball k from the first urn and three red balls from the second is therefore

        P(three red and ball k) = P(three red | ball k) P(ball k)
                                = C(5, 3) (k/10)^3 (1 − k/10)^2 × (1/11),

    and the unconditional probability of choosing three red balls from the second urn is

        P(three red) = ∑_{k=0}^{10} P(three red and ball k)
                     = ∑_{k=0}^{10} C(5, 3) (k/10)^3 (1 − k/10)^2 × (1/11)
                     ≈ 0.1515.

    This can be computed in R as

    > sum(dbinom(3, 5, (0 : 10) / 10) / 11)

    [1] 0.1515

    (b) The conditional probability that the chosen ball from the first urn was numbered k = 6, given that three red balls were chosen from the second, is

        P(ball k = 6 | three red) = P(three red and ball k = 6) / P(three red)
                                  = [C(5, 3) (6/10)^3 (1 − 6/10)^2 × (1/11)] / P(three red)
                                  ≈ 0.03142 / 0.1515
                                  ≈ 0.2074

    This can be computed as

    > (dbinom(3, 5, 6 / 10) / 11) / (sum(dbinom(3, 5, (0 : 10) / 10)) / 11)

    [1] 0.2073807

    The probabilities for all values of k can be computed and graphed as sketched below.
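    The original plotting code and figure are truncated in the transcript; a minimal sketch of one way to compute and plot the posterior probabilities over k (variable names are illustrative) is:

        k <- 0:10
        post <- dbinom(3, 5, k / 10) / 11        # joint probabilities P(three red and ball k)
        post <- post / sum(post)                 # condition on the event "three red"
        plot(k, post, type = "h", xlab = "k", ylab = "P(ball k | three red)")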


    Assignment 4

    Due on Monday, September 21, 2015.

    1. A coin has probability p of coming up heads and 1 − p of tails, with 0 < p < 1. An experiment is conducted with the following steps:

    1. Flip the coin.

    2. Flip the coin a second time.

    3. If both flips land on heads or both land on tails return to step 1.

    4. Otherwise let the result of the experiment be the result of the last flip at step 2.

    Assume flips are independent.

    (a) The R function

    sim1
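    The definition of sim1 is cut off at the page break. A minimal sketch consistent with how the function is called in the solution (one argument p, returning 1 for a head and 0 for a tail) is an assumption, not the original code:

        sim1 <- function(p) {
            ## run the experiment once: flip pairs of coins until the two flips
            ## differ, then report the second flip (1 = heads, 0 = tails)
            repeat {
                flips <- rbinom(2, 1, p)
                if (flips[1] != flips[2]) return(flips[2])
            }
        }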


    Solutions

    1. (a) One possible approach:

    > sapply(seq(0.2, 0.9, by = 0.2),

    function(p) mean(replicate(10000, sim1(p))))

    [1] 0.4913 0.4965 0.5034 0.4991

    This suggests that the probability of heads may be 0.5 for any p.

    (b) Let A be the event that the process returns a head, and let B be the event that the process ends after the first two flips. Then

        P(A) = P(A ∩ B) + P(A|B^c)P(B^c).

    Now A ∩ B is the event that the first toss is a tail and the second toss is a head, so P(A ∩ B) = (1 − p)p. B is the event that either the first toss is a head and the second a tail, or the first is a tail and the second is a head; so P(B) = 2p(1 − p) and P(B^c) = 1 − 2p(1 − p). If the process does not end with the first two tosses then it starts over again independently, so P(A|B^c) = P(A). Therefore P(A) satisfies

        P(A) = p(1 − p) + P(A)(1 − 2p(1 − p))

    and thus

        P(A) = p(1 − p) / (2p(1 − p)) = 1/2,

    as the simulation in part (a) suggests. The requirement that p > 0 and p < 1 ensures that the denominator is positive and that the process is guaranteed to end.

    2. (i) P_X(A) = P(X ∈ A) ≥ 0 since P is a probability.

    (ii) P_X(R) = P(X ∈ R) = P(S) = 1 since P is a probability.

    (iii) For Borel sets A_1, A_2, . . . that are pairwise disjoint,

        P_X(∪ A_i) = P(X ∈ ∪ A_i)
                   = P(∪ {X ∈ A_i})
                   = ∑ P(X ∈ A_i)
                   = ∑ P_X(A_i)

    since the events B_i = {X ∈ A_i} are pairwise disjoint.

    3. All functions in (a)–(d) are continuous and therefore right continuous. We therefore only need to check that they are nondecreasing and have the right limits at ±∞.


    (a) For all x ∈ R,

        d/dx [1/2 + (1/π) tan^{-1}(x)] = (1/π) · 1/(1 + x^2) > 0,

    so F(x) is increasing, and

        lim_{x→−∞} [1/2 + (1/π) tan^{-1}(x)] = 1/2 + (1/π)(−π/2) = 0
        lim_{x→∞} [1/2 + (1/π) tan^{-1}(x)] = 1/2 + (1/π)(π/2) = 1.

    So F(x) is a CDF.

    (b) For all x ∈ R,

        d/dx (1 + e^{−x})^{−1} = e^{−x} / (1 + e^{−x})^2 > 0,

    so F(x) is increasing, and

        lim_{x→−∞} (1 + e^{−x})^{−1} = (1 + ∞)^{−1} = 0
        lim_{x→∞} (1 + e^{−x})^{−1} = (1 + 0)^{−1} = 1.

    So F(x) is a CDF.

    (c) For all x ∈ R,

        d/dx e^{−e^{−x}} = e^{−x} e^{−e^{−x}} > 0,

    so F(x) is increasing, and

        lim_{x→−∞} e^{−e^{−x}} = e^{−∞} = 0
        lim_{x→∞} e^{−e^{−x}} = e^0 = 1.

    So F(x) is a CDF.

    (d) F(x) = 0 for x ≤ 0. For x > 0,

        d/dx (1 − e^{−x}) = e^{−x} > 0,

    so F(x) is nondecreasing, and

        lim_{x→∞} (1 − e^{−x}) = 1 − 0 = 1.

    So F(x) is a CDF.


    (e) The function is continuous everywhere except possibly at the origin, and at the origin it is right continuous because of the placement of the equality sign in the definition. The function is increasing for y < 0 and for y > 0 from part (b). The function value at the origin is

        F(0) = ε + (1 − ε)/2 > (1 − ε)/2 = F(0−),

    so F is increasing everywhere. Using the limit results from part (b),

        lim_{y→−∞} F(y) = (1 − ε) × 0 = 0
        lim_{y→∞} F(y) = ε + (1 − ε) × 1 = 1.

    So F(y) is a CDF.

    4. The set of possible values is X = {0, 1, 2, 3, 4}, and the PMF is given by

        f_X(x) = C(5, x) C(25, 4 − x) / C(30, 4).

    A table of the probabilities is

        x    0        1        2        3        4
        p    0.4616   0.4196   0.1095   0.0091   0.0002

    and a plot of the CDF is given in the figure "CDF for Number of Defectives" (horizontal axis x from −1 to 5, vertical axis from 0.0 to 1.0).

    The R code used to create the table and plot the CDF is available at

    http://www.stat.uiowa.edu/~luke/classes/193/1-51.R
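    The linked script is not reproduced here; as a quick check (a sketch, not the original code), the table can be recomputed from the hypergeometric PMF:

        x <- 0:4
        p <- dhyper(x, m = 5, n = 25, k = 4)     # 5 defectives, 25 good items, sample of 4
        round(p, 4)                              # 0.4616 0.4196 0.1095 0.0091 0.0002
        plot(stepfun(x, c(0, cumsum(p))), main = "CDF for Number of Defectives",
             xlab = "x", ylab = "F(x)")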


    5. The function

        g(x) = { f(x)/(1 − F(x_0))   x ≥ x_0
               { 0                   x < x_0

    represents the conditional density of X given that X ≥ x_0. Since f(x) ≥ 0 we also have g(x) ≥ 0. Furthermore,

        ∫_{−∞}^{∞} g(x) dx = ∫_{x_0}^{∞} g(x) dx = [∫_{x_0}^{∞} f(x) dx] / (1 − F(x_0))
                           = P(X > x_0) / (1 − F(x_0)) = (1 − F(x_0)) / (1 − F(x_0)) = 1.

    So g(x) is a PDF.


    Assignment 5

    Due on Monday, September 28, 2015.

    1. Let X be a non-negative, integer-valued random variable with probability mass function p_n = P(X = n) for n = 0, 1, . . . . The probability generating function of X is defined as

        G(t) = ∑_{n=0}^{∞} t^n p_n

    for |t| ≤ 1.

    (a) Show that p_n can be recovered from the value of the n-th derivative of G(t) at t = 0. (The zero-th derivative of G(t) is G(t) itself.)

    (b) Suppose X is the number of heads in n independent flips of a biased coin with probability of heads equal to p. X has a binomial distribution. Find the probability generating function of X.

    (c) Suppose Y is the number of independent tosses of a biased coin with probability p of heads needed until the first head is obtained. Y has a geometric distribution. Find the probability generating function of Y.

    2. Problem 2.2 from the textbook

    3. Problem 2.6 from the textbook

    4. Problem 2.8 from the textbook

    Solutions

    1. (a) The derivatives are

        G′(t) = ∑_{n=1}^{∞} n t^{n−1} p_n
        G″(t) = ∑_{n=2}^{∞} n(n − 1) t^{n−2} p_n
        ...
        G^{(k)}(t) = ∑_{n=k}^{∞} [n!/(n − k)!] t^{n−k} p_n.

    At t = 0 all terms except the first are zero, so

        G(0) = p_0
        G′(0) = p_1
        G″(0) = 2 p_2
        ...
        G^{(k)}(0) = k! p_k.

    So p_k = G^{(k)}(0)/k!. This is the reason G is called the probability generating function.

    (b) For the binomial distribution

        G(t) = ∑_{k=0}^{n} t^k C(n, k) p^k (1 − p)^{n−k} = ∑_{k=0}^{n} C(n, k) (tp)^k (1 − p)^{n−k} = (tp + 1 − p)^n

    by the binomial theorem.

    (c) For the geometric distribution

        G(t) = ∑_{n=1}^{∞} t^n p (1 − p)^{n−1} = tp ∑_{n=1}^{∞} [t(1 − p)]^{n−1} = tp / (1 − t(1 − p)).

    2. The change of variables formula for smooth monotone transformations can be applied in all three cases.

    (a) Y = [0, 1], g^{−1}(y) = √y, and for y ∈ [0, 1]

        f_Y(y) = 1/(2√y).

    (b) Y = (0, ∞), g^{−1}(y) = e^{−y}, and for y > 0

        f_Y(y) = [(n + m + 1)!/(n! m!)] e^{−yn} (1 − e^{−y})^m |−e^{−y}|
               = [(n + m + 1)!/(n! m!)] e^{−y(n+1)} (1 − e^{−y})^m.

    (c) Y = (1, ∞), g^{−1}(y) = log y, and for y > 1

        f_Y(y) = (1/σ^2) (log y / y) e^{−((log y)/σ)^2/2}.

    3. All three fit into the framework of Theorem 2.1.8.

    (a) Let A_1 = (−∞, 0) and A_2 = (0, ∞). On A_1, g_1(x) = |x|^3 = −x^3, and on A_2, g_2(x) = |x|^3 = x^3. The range is Y = (0, ∞). So for y > 0

        f_Y(y) = (1/2) e^{−y^{1/3}} (1/3)|−y^{−2/3}| + (1/2) e^{−y^{1/3}} (1/3) y^{−2/3} = (1/3) y^{−2/3} e^{−y^{1/3}}.


    (b) Let A_1 = (−1, 0) and A_2 = (0, 1). The range of Y is Y = (0, 1). Then g_1(x) = 1 − x^2, g_2(x) = 1 − x^2, g_1^{−1}(y) = −√(1 − y), and g_2^{−1}(y) = √(1 − y). So for y ∈ (0, 1)

        f_Y(y) = (3/8)(1 − √(1 − y))^2 (1/2) 1/√(1 − y) + (3/8)(1 + √(1 − y))^2 (1/2) 1/√(1 − y)
               = (3/8)(1 − y)^{−1/2} + (3/8)(1 − y)^{1/2}.

    (c) Y, A_1, A_2, and g_1 are as in the previous part; g_2(x) = 1 − x and g_2^{−1}(y) = 1 − y. So for y ∈ (0, 1)

        f_Y(y) = (3/8)(1 − √(1 − y))^2 (1/2) 1/√(1 − y) + (3/8)(1 + 1 − y)^2
               = (3/16)(1 − √(1 − y))^2 / √(1 − y) + (3/8)(2 − y)^2.

    4. F^{−1}(y) = inf{x : F(x) ≥ y}.

    (a)
        F(x) = { 0            x < 0
               { 1 − e^{−x}   x ≥ 0

        F^{−1}(y) = { −∞             y = 0
                    { −log(1 − y)    0 < y ≤ 1

    (b)
        F(x) = { (1/2) e^x           x < 0
               { 1/2                 0 ≤ x < 1
               { 1 − (1/2) e^{1−x}   x ≥ 1

        F^{−1}(y) = { log(2y)              0 ≤ y ≤ 1/2
                    { 1 − log(2(1 − y))    1/2 < y ≤ 1

    (c)
        F(x) = { (1/4) e^x          x < 0
               { 1 − (1/4) e^{−x}   x ≥ 0

        F^{−1}(y) = { log(4y)            0 ≤ y < 1/4
                    { 0                  1/4 ≤ y < 3/4
                    { −log(4(1 − y))     3/4 ≤ y ≤ 1


    Assignment 6

    Due on Monday, October 5, 2015.

    1. Problem 2.11 from the textbook

    2. Problem 2.13 from the textbook

    3. Let X be a non-negative random variable with CDF F. Show that

        E[X] = ∫_0^∞ (1 − F(t)) dt.

    Hint: Argue that you can write X = ∫_0^∞ 1{t < X} dt.


    Using the PDF of Y = X^2:

        E[X^2] = E[Y] = ∫_0^∞ y (1/√(2π)) (1/√y) e^{−y/2} dy
               = (1/√(2π)) ∫_0^∞ √y e^{−y/2} dy
               = (2^{3/2}/√(2π)) ∫_0^∞ z^{3/2−1} e^{−z} dz
               = (2^{3/2}/√(2π)) Γ(3/2) = (2/√π) Γ(3/2)
               = (2/√π) Γ(1/2) (1/2) = Γ(1/2)/√π = 1.

    (b) The density of Y = |X| is

        f_Y(y) = f_X(−y) + f_X(y) = 2 e^{−y^2/2} (1/√(2π)) = √(2/π) e^{−y^2/2}

    for y ≥ 0, and f_Y(y) = 0 for y < 0. The mean is

        E[Y] = E[|X|] = 2 ∫_0^∞ y e^{−y^2/2} (1/√(2π)) dy
             = 2 [−(1/√(2π)) e^{−y^2/2}]_0^∞
             = 2/√(2π) = √(2/π).

    The second noncentral moment is

        E[Y^2] = E[X^2] = 1,

    so the variance is

        Var(Y) = 1 − 2/π.

    2. The possible values of X are X = {1, 2, 3, . . . }. The probability mass function of X is

        f_X(x) = P(X = x)
               = P(first x are H, followed by a T) + P(first x are T, followed by an H)
               = p^x (1 − p) + (1 − p)^x p

    for x ∈ X. So the mean is

        E[X] = ∑_{x=1}^{∞} x (p^x (1 − p) + (1 − p)^x p)
             = [∑_{x=1}^{∞} x p^x (1 − p)] + [∑_{x=1}^{∞} x (1 − p)^x p]
             = p [∑_{x=1}^{∞} x p^{x−1} (1 − p)] + (1 − p) [∑_{x=1}^{∞} x (1 − p)^{x−1} p].

    The sums in square brackets are the means of geometric random variables with success probabilities 1 − p and p, respectively, so

        E[X] = p/(1 − p) + (1 − p)/p = (p^2 + (1 − p)^2) / (p(1 − p)).

    3. For a non-negative random variable X we can write

        X = ∫_0^X 1 dt = ∫_0^∞ 1{t < X} dt.

    Taking expectations and interchanging expectation and integration (the integrand is non-negative) gives

        E[X] = ∫_0^∞ P(X > t) dt = ∫_0^∞ (1 − F(t)) dt.

    4. Using the result of Problem 3,

        E[X] = ∫_0^∞ P(X > t) dt = a ∫_0^∞ e^{−λt} dt + (1 − a) ∫_0^∞ e^{−µt} dt
             = (a/λ) ∫_0^∞ λ e^{−λt} dt + ((1 − a)/µ) ∫_0^∞ µ e^{−µt} dt
             = a/λ + (1 − a)/µ.

    5. The n-th moment of this density is

        E[X^n] = ∫_1^∞ x^n (α/x^{α+1}) dx = ∫_1^∞ α x^{n−α−1} dx
               = { [α/(n − α)] x^{n−α} |_1^∞   for α ≠ n
                 { [α log x] |_1^∞             for α = n
               = { ∞             for α ≤ n
                 { α/(α − n)     for α > n.

    So

        E[X] = { ∞             for α ≤ 1
               { α/(α − 1)     for α > 1

    and

        E[X^2] = { ∞             for α ≤ 2
                 { α/(α − 2)     for α > 2.

    The variance of X is therefore infinite if α ≤ 2, and is

        Var(X) = E[X^2] − E[X]^2 = α/(α − 2) − (α/(α − 1))^2 = α / ((α − 2)(α − 1)^2)

    for α > 2.


    Assignment 7

    Due on Monday, October 12, 2015.

    1. Problem 2.17 from the textbook

    2. Problem 2.24 from the textbook

    3. Problem 2.32 from the textbook

    4. Problem 2.33 from the textbook

    5. Problem 2.38 from the textbook

    6. Problem 2.40 from the textbook

    Solutions

    1. (a) Over the range x ∈ [0, 1] the CDF

        F(x) = ∫_0^x 3y^2 dy = x^3

    is strictly increasing, so there is a unique median m that solves

        F(m) = m^3 = 1/2.

    The solution is m = (1/2)^{1/3} ≈ 0.7937.

    (b) The density (a Cauchy density) is symmetric around the origin, so

        P(X ≤ 0) = P(X ≥ 0) = 1/2

    and therefore m = 0 is a median. Since the density is positive the CDF is strictly increasing and the median is unique.

    2. (a) f(x) = a x^{a−1}, 0 < x < 1, a > 0.

        E[X] = ∫_0^1 a x^a dx = a/(a + 1)
        E[X^2] = ∫_0^1 a x^{a+1} dx = a/(a + 2)
        Var(X) = a/(a + 2) − (a/(a + 1))^2 = a / ((a + 2)(a + 1)^2)


    (b) f(x) = 1/n, x = 1, 2, . . . , n.

        E[X] = ∑_{i=1}^{n} i/n = (1/n) n(n + 1)/2 = (n + 1)/2
        E[X^2] = ∑_{i=1}^{n} i^2/n = (1/n) n(n + 1)(2n + 1)/6 = (n + 1)(2n + 1)/6
        Var(X) = (n + 1)(2n + 1)/6 − ((n + 1)/2)^2
               = (n + 1)[(2n + 1)/6 − (n + 1)/4]
               = (n + 1)(4n + 2 − 3n − 3)/12
               = (n + 1)(n − 1)/12 = (n^2 − 1)/12

    (c) f(x) = (3/2)(x − 1)^2, 0 < x < 2.

        E[X] = ∫_0^2 x (3/2)(x − 1)^2 dx = 1
        Var(X) = E[(X − 1)^2] = ∫_0^2 (3/2)(x − 1)^4 dx = (3/2) (x − 1)^5/5 |_0^2 = (3/2)(2/5) = 3/5

    3. The first derivative of S(t) is

        S′(t) = d/dt log M_X(t) = M′_X(t)/M_X(t).

    So

        S′(t)|_{t=0} = M′_X(0)/M_X(0) = E[X]/1 = E[X].

    By the quotient rule the second derivative of S(t) is

        S″(t) = [M_X(t) M″_X(t) − M′_X(t)^2] / M_X(t)^2

    and therefore

        S″(t)|_{t=0} = [M_X(0) M″_X(0) − M′_X(0)^2] / M_X(0)^2 = E[X^2] − E[X]^2 = Var(X).

    4. (a) Done in class.


    (b) Variation on a geometric distribution.

        M(t) = ∑_{x=0}^{∞} e^{tx} p (1 − p)^x = { p / (1 − e^t(1 − p))   t < −log(1 − p)
                                                { ∞                     otherwise

        M′(t) = p(1 − p) e^t / (1 − e^t(1 − p))^2

        M″(t) = [(1 − e^t(1 − p))^2 p(1 − p) e^t + 2(1 − e^t(1 − p)) e^t p(1 − p)^2 e^t] / (1 − e^t(1 − p))^4

        E[X] = (1 − p)/p

        E[X^2] = [p^3(1 − p) + 2p^2(1 − p)^2] / p^4 = [p(1 − p) + 2(1 − p)^2] / p^2

        Var(X) = [p(1 − p) + (1 − p)^2] / p^2 = (1 − p)/p^2

    (c)

        M(t) = ∫ e^{tx} (1/(√(2π)σ)) e^{−(x−µ)^2/(2σ^2)} dx
             = ∫ (1/(√(2π)σ)) exp{−x^2/(2σ^2) + xµ/σ^2 − µ^2/(2σ^2) + tx} dx
             = ∫ (1/(√(2π)σ)) exp{−x^2/(2σ^2) + (x/σ^2)(µ + σ^2 t) − µ^2/(2σ^2)} dx
             = exp{−µ^2/(2σ^2) + (µ^2 + 2µσ^2 t + σ^4 t^2)/(2σ^2)}
             = exp{µt + σ^2 t^2/2}

        K(t) = log M(t) = µt + σ^2 t^2/2
        K′(t) = µ + σ^2 t
        K″(t) = σ^2
        E[X] = K′(0) = µ
        Var(X) = K″(0) = σ^2

    5. (a) From 2.30(d),

        M_X(t) = { (p / (1 − e^t(1 − p)))^r   t < −log(1 − p)
                 { ∞                          otherwise

    (b)

        M_Y(t) = E[e^{tY}] = E[e^{2ptX}] = M_X(2pt)
               = { (p / (1 − e^{2pt}(1 − p)))^r   2pt < −log(1 − p)
                 { ∞                              otherwise

    Now by L'Hospital's rule

        lim_{p→0} −log(1 − p)/(2p) = lim_{p→0} [1/(1 − p)] / lim_{p→0} 2 = 1/2

    and

        lim_{p→0} p / (1 − e^{2pt}(1 − p)) = lim_{p→0} 1 / lim_{p→0} (−2t e^{2pt}(1 − p) + e^{2pt}) = 1/(1 − 2t).

    So for t < 1/2,

        M_Y(t) → (1/(1 − 2t))^r = (1/(1 − 2t))^{2r/2}.

    This is the MGF of a χ^2_{2r} distribution.

    6. The result holds only for x = 0, . . . , n − 1. For x = 0 the left hand side is (1 − p)^n and the right hand side is

        n ∫_0^{1−p} t^{n−1} dt = t^n |_{t=0}^{t=1−p} = (1 − p)^n.

    So the claim holds for x = 0. Suppose the claim is true for y = 0, . . . , x − 1 and x < n. Integration by parts produces

        (n − x) C(n, x) ∫_0^{1−p} t^{n−x−1} (1 − t)^x dt
            = C(n, x) ∫_0^{1−p} ((n − x) t^{n−x−1}) (1 − t)^x dt
            = C(n, x) [t^{n−x} (1 − t)^x |_{t=0}^{t=1−p} + ∫_0^{1−p} x t^{n−x} (1 − t)^{x−1} dt]
            = C(n, x) p^x (1 − p)^{n−x} + x C(n, x) ∫_0^{1−p} t^{n−(x−1)−1} (1 − t)^{x−1} dt
            = C(n, x) p^x (1 − p)^{n−x} + (n − (x − 1)) C(n, x − 1) ∫_0^{1−p} t^{n−(x−1)−1} (1 − t)^{x−1} dt.

    By the induction hypothesis the second term is

        (n − (x − 1)) C(n, x − 1) ∫_0^{1−p} t^{n−(x−1)−1} (1 − t)^{x−1} dt = ∑_{k=0}^{x−1} C(n, k) p^k (1 − p)^{n−k}.

    Thus the result holds for all x = 0, . . . , n − 1.

    Several other approaches are possible.

    • Differentiating both sides produces a telescoping series on the left hand side.

    • Both sides are polynomials of degree n in p. The polynomials are equal if and only if the coefficients of the powers of p are equal, and these can be calculated by differentiating multiple times and evaluating the derivatives at zero.

    • The simplest approach uses a property of the distribution of order statistics that we will learn about in Chapter 5: If U_1, . . . , U_n are independent standard uniforms and N_p is the number of these uniforms that are less than or equal to p, then N_p is Binomial(n, p), and

        P(N_p ≤ x) = P(N_p < x + 1) = P(U_{(x+1)} > p)

    where U_{(k)} is the k-th order statistic of the sample. Now

        P(U_{(x+1)} > p) = P(1 − U_{(x+1)} < 1 − p) = P(V_{(n−x)} < 1 − p)

    where V_{(k)} is the k-th order statistic of the sample 1 − U_1, . . . , 1 − U_n. The result now follows from the fact that the k-th uniform order statistic V_{(k)} for a sample of size n has a Beta(k, n − k + 1) distribution.


    Assignment 8

    Due on Monday, October 19, 2015.

    1. Problem 3.7 from the textbook

    2. Problem 3.12 from the textbook

    3. Problem 3.25 from the textbook

    4. Problem 3.26 from the textbook

    5. Problem 3.28 from the textbook

    6. Problem 3.30 from the textbook

    Solutions

    1. P (X ≥ 2) = 0.99 means

    P (X = 0) + P (X = 1) = e−λ + λe−λ = 0.01

    The solution is around 6.6. This can be determined graphically or numerically, for example using the R function uniroot.
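    A minimal sketch of the numerical solution with uniroot (not the original code):

        ## solve exp(-lambda) * (1 + lambda) = 0.01 for lambda
        uniroot(function(lambda) exp(-lambda) * (1 + lambda) - 0.01,
                interval = c(1, 20))$root        # approximately 6.64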

    2. X is Binomial(n, p) and Y is negative binomial(r, p) (zero based, so Y counts the number of failures).

        F_X(r − 1) = P(r − 1 or fewer successes in n trials)
                   = 1 − P(r or more successes in n trials)
                   = 1 − P(r-th success on or before n-th trial)
                   = 1 − P(number of failures before r-th success ≤ n − r)
                   = 1 − F_Y(n − r)

    3.

        h_T(t) = lim_{δ↓0} P(t ≤ T < t + δ | T > t) / δ
               = lim_{δ↓0} (1/δ) [F(t + δ) − F(t)] / [1 − F(t)]
               = F′(t)/(1 − F(t)) = f(t)/(1 − F(t))

    and

        −(d/dt) log(1 − F(t)) = f(t)/(1 − F(t)).

    The quantity

        H_T(t) = −log(1 − F_T(t)) = ∫_0^t h_T(u) du

    is called the cumulative hazard function, and

        F_T(t) = 1 − exp{−H_T(t)}.

    4. (a) For an Exponential(β) distribution

        f_T(t) = (1/β) e^{−t/β}
        h_T(t) = [(1/β) e^{−t/β}] / e^{−t/β} = 1/β.

    (b) For a Weibull distribution,

        f_T(t) = (γ/β) t^{γ−1} e^{−t^γ/β}
        F_T(t) = P(X^{1/γ} ≤ t) = P(X ≤ t^γ) = 1 − e^{−t^γ/β}
        h_T(t) = (γ/β) t^{γ−1}.

    (c) For the logistic distribution,

        F_T(t) = 1/(1 + e^{−(t−µ)/β})
        f_T(t) = [1/(1 + e^{−(t−µ)/β})^2] (1/β) e^{−(t−µ)/β} = (1/β) F_T(t)(1 − F_T(t)).

    So h_T(t) = (1/β) F_T(t).

    5. (a) The normal family can be written as

        f(x|µ, σ) = (1/(√(2π)σ)) exp{−(x − µ)^2/(2σ^2)}
                  = c(θ) exp{w_1(θ) t_1(x) + w_2(θ) t_2(x)} h(x)

    with

        c(θ) = (1/(√(2π)σ)) e^{−µ^2/(2σ^2)},  w_1(θ) = 1/(2σ^2),  t_1(x) = −x^2,
        w_2(θ) = µ/σ^2,  t_2(x) = x,  h(x) = 1.

    (b) The Gamma family with both parameters unknown can be written as

        f(x|α, β) = [1/(Γ(α)β^α)] x^{α−1} e^{−x/β} 1_{(0,∞)}(x)
                  = c(θ) exp{w_1(θ) t_1(x) + w_2(θ) t_2(x)} h(x)

    with

        c(θ) = 1/(Γ(α)β^α),  w_1(θ) = α − 1,  t_1(x) = log x,
        w_2(θ) = 1/β,  t_2(x) = −x,  h(x) = 1_{(0,∞)}(x).

    If α is known then the first term in the exponent becomes part of h(x); if β is known the second term in the exponent becomes part of h(x).

    (c) The Beta family with both parameters unknown can be written as

        f(x|α, β) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1} 1_{[0,1]}(x)
                  = c(θ) exp{w_1(θ) t_1(x) + w_2(θ) t_2(x)} h(x)

    with

        c(θ) = Γ(α + β)/(Γ(α)Γ(β)),  w_1(θ) = α − 1,  t_1(x) = log x,
        w_2(θ) = β − 1,  t_2(x) = log(1 − x),  h(x) = 1_{[0,1]}(x).

    Again, if either α or β is known the corresponding term in the exponent becomes part of h(x).

    (d) The Poisson family can be written as

        f(x|λ) = (λ^x/x!) e^{−λ} = h(x) c(λ) exp{w(λ) t(x)}

    with h(x) = 1/x!, c(λ) = e^{−λ}, t(x) = x, and w(λ) = log λ.

    (e) The negative binomial family with r known can be written as

        f(x|r, p) = C(r + x − 1, x) p^r (1 − p)^x = h(x) c(p) exp{w(p) t(x)}

    with h(x) = C(r + x − 1, x), c(p) = p^r, t(x) = x, and w(p) = log(1 − p).

    6. (a) For the binomial, w(p) = log(p/(1 − p)), c(p) = (1 − p)^n, and t(x) = x. The variance Var(t(X)) = Var(X) satisfies

        (w′(p))^2 Var(X) = −(d^2/dp^2) log c(p) − w″(p) E[X].

    Now

        w′(p) = 1/p + 1/(1 − p) = 1/(p(1 − p))
        w″(p) = −1/p^2 + 1/(1 − p)^2
        (d^2/dp^2) log c(p) = −n/(1 − p)^2.

    So

        [1/(p(1 − p))]^2 Var(X) = n/(1 − p)^2 − (−1/p^2 + 1/(1 − p)^2) np
                                = n (1/(1 − p) + 1/p)
                                = n/(p(1 − p))

    and thus Var(X) = np(1 − p).

    (b) For the Beta distribution t_1(x) = log x and t_2(x) = log(1 − x). The function f(x) = x cannot be expressed as a linear combination of t_1(x) and t_2(x), so the identities in Theorem 3.4.2 cannot be used to find the mean and variance of X.

    If X ∼ Poisson(λ) then t(x) = x, w(λ) = log λ, and c(λ) = e^{−λ}. So

        w′(λ) = 1/λ,  w″(λ) = −1/λ^2,
        (∂/∂λ) log c(λ) = −1,  (∂^2/∂λ^2) log c(λ) = 0.

    So Theorem 3.4.2 produces the equations

        E[X/λ] = 1
        Var(X/λ) = E[X/λ^2]

    with solutions E[X] = λ and Var(X) = λ.


    Assignment 9

    Due on Monday, October 26, 2015.

    1. Problem 4.1 from the textbook

    2. Problem 4.4 from the textbook

    3. Problem 4.5 from the textbook

    4. Problem 4.10 from the textbook

    5. Let X1, X2, and V be independent random variables with

        E[X1] = µ,  E[X2] = µ,  E[V] = 0,
        Var(X1) = σ^2,  Var(X2) = σ^2,  Var(V) = τ^2.

    Let Y1 = X1 + V and Y2 = X2 + V.

    (a) Find the means and variances of Y1 and Y2.

    (b) Find Cov(Y1, Y2) and Cov(Y1, V).

    6. In the generalized birthday problem discussed in Week 2 an urn contains m balls and a sample of size n is drawn from the urn with replacement. Let X be the number of balls that do not appear in the sample. Find the mean and variance of X. [Hint: Express X as a sum of suitable Bernoulli random variables.]

    Solutions

    1. The joint density is

        f(x, y) = { 1/4   −1 ≤ x, y ≤ 1
                  { 0     otherwise

    (a) Since the unit circle is contained in the supporting square,

        P(X^2 + Y^2 < 1) = (area of circle)/(total area of square) = π/4.

    (b) The line y = 2x splits the support into two parts of equal area, so P(2X − Y > 0) = 1/2. Alternatively,

        P(Y < 2X) = ∫_{−1}^{1} ∫_{y/2}^{1} (1/4) dx dy = ∫_{−1}^{1} (1/4)(1 − y/2) dy
                  = [y/4 − y^2/16]_{−1}^{1} = 3/16 + 5/16 = 1/2.

    (c) All points in the interior of the square satisfy |x + y| < 2, so

        P(|X + Y| < 2) = 1.

    2. (a) The integral of the density is

        1 = ∫_0^1 ∫_0^2 C(x + 2y) dx dy = ∫_0^1 C(2 + 4y) dy = C(2 + 2) = 4C.

    So the normalizing constant is C = 1/4.

    (b) The marginal density of X is

        f_X(x) = [∫_0^1 (1/4)(x + 2y) dy] 1_{[0,2]}(x) = (1/4)(x + 1) 1_{[0,2]}(x).

    (c) The joint CDF is

        F(x, y) = ∫_0^y ∫_0^x (1/4)(u + 2v) du dv
                = ∫_0^y (1/4)(x^2/2 + 2vx) dv
                = (1/4)(x^2 y/2 + y^2 x)
                = x^2 y/8 + y^2 x/4

    for 0 < x < 2 and 0 < y < 1.

    (d) Since Z depends only on X this is a one-dimensional transformation. The transformation is smooth and monotone, so

        Z = 9/(X + 1)^2
        X = 3/√Z − 1
        |dx/dz| = (3/2) z^{−3/2}
        f_Z(z) = (1/4)(3/√z)(3/2) z^{−3/2} = 9/(8 z^2)

    for z ∈ [1, 9].

    3. (a)

        P(X > √Y) = P(Y < X^2) = ∫_0^1 ∫_0^{x^2} (x + y) dy dx
                  = ∫_0^1 (x^3 + x^4/2) dx = 1/4 + 1/10 = 7/20 = 0.35

    (b)

        P(X^2 < Y < X) = ∫_0^1 ∫_{x^2}^{x} 2x dy dx
                       = ∫_0^1 2x(x − x^2) dx = ∫_0^1 (2x^2 − 2x^3) dx
                       = 2/3 − 2/4 = 1/6

    4. (a) The marginal probabilities f_X(2) and f_Y(3) are non-zero and therefore f_X(2) f_Y(3) is non-zero, but the joint probability f_{X,Y}(2, 3) = 0. So the joint PMF is not the product of the marginals and thus X, Y are dependent.

    (b) The marginals are

        x        1     2     3
        f_X(x)   1/4   1/2   1/4

    and

        y        2     3     4
        f_Y(y)   1/3   1/3   1/3

    The joint probability table

                      X
                  1      2      3
            2    1/12   1/6    1/12
        Y   3    1/12   1/6    1/12
            4    1/12   1/6    1/12

    obtained as g_{X,Y}(x, y) = f_X(x) f_Y(y) has the same marginals, and under g the variables X, Y are independent.

    5. (a) The means are

        E[Y_i] = E[X_i + V] = E[X_i] + E[V] = µ.

    Since the X_i are independent of V the variances are

        Var(Y_i) = Var(X_i + V) = Var(X_i) + Var(V) = σ^2 + τ^2.

    (b) The covariance of Y1 and Y2 is

        Cov(Y1, Y2) = Cov(X1 + V, X2 + V)
                    = Cov(X1, X2 + V) + Cov(V, X2 + V)
                    = Cov(X1, X2) + Cov(X1, V) + Cov(V, X2) + Cov(V, V)
                    = 0 + 0 + 0 + Var(V) = τ^2.

    The covariance of Y_i and V is

        Cov(Y_i, V) = Cov(X_i + V, V)
                    = Cov(X_i, V) + Cov(V, V)
                    = 0 + Var(V) = τ^2.

    6. Let Y_i = 1 if ball i is not in the sample and Y_i = 0 otherwise. Then X = Y_1 + · · · + Y_m. The Y_i are Bernoulli random variables with success probability

        p_m = ((m − 1)/m)^n.

    So the mean of X is

        E[X] = m p_m = m ((m − 1)/m)^n.

    The Y_i are correlated, so to calculate Var(X) we need their covariances. Now for i ≠ j,

        E[Y_i Y_j] = E[Y_1 Y_2] = P(balls 1 and 2 are not in the sample) = ((m − 2)/m)^n.

    So

        Cov(Y_i, Y_j) = ((m − 2)/m)^n − p_m^2 = ((m − 2)/m)^n − ((m − 1)/m)^{2n}.

    The variance of X is therefore

        Var(X) = ∑ Var(Y_i) + ∑∑_{i≠j} Cov(Y_i, Y_j)
               = m p_m (1 − p_m) + m(m − 1) [((m − 2)/m)^n − p_m^2].
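    As a quick sanity check (a sketch, not part of the original solutions; the values of m, n, and N below are arbitrary), the formulas can be compared with simulated values:

        m <- 365; n <- 50; N <- 10000
        x <- replicate(N, sum(tabulate(sample(m, n, replace = TRUE), nbins = m) == 0))
        pm <- ((m - 1) / m)^n
        c(mean(x), m * pm)                                      # simulated vs exact mean
        c(var(x), m * pm * (1 - pm) +
                  m * (m - 1) * (((m - 2) / m)^n - pm^2))       # simulated vs exact variance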


    Assignment 10

    Due on Monday, November 2, 2015.

    1. Problem 4.15 from the textbook

    2. Problem 4.16 (a) and (c) from the textbook (these geometrics count failures!)

    3. Problem 4.17 from the textbook

    4. Problem 4.21 from the textbook

    5. Problem 4.27 from the textbook

    6. Problem 4.36 from the textbook

    Solutions

    1. X, Y are independent,

        X ∼ Poisson(θ),  M_X(t) = exp{θ(e^t − 1)}
        Y ∼ Poisson(λ),  M_Y(t) = exp{λ(e^t − 1)},

    so

        M_{X+Y}(t) = exp{(θ + λ)(e^t − 1)}

    and thus X + Y is Poisson(θ + λ). Now

        f_{X|X+Y}(x|z) = f_X(x) f_Y(z − x) / f_{X+Y}(z)
                       = [(θ^x/x!) e^{−θ} (λ^{z−x}/(z − x)!) e^{−λ}] / [((θ + λ)^z/z!) e^{−(θ+λ)}]
                       = [z!/(x!(z − x)!)] (θ/(θ + λ))^x (1 − θ/(θ + λ))^{z−x}

    for 0 ≤ x ≤ z. So X | X + Y = z is Binomial(z, θ/(θ + λ)).

    2. X, Y are geometric, starting at zero (counting failures),

        f(x) = p(1 − p)^x

    for x = 0, 1, 2, . . ., and are independent.

    (a)

        U = min(X, Y)   range: 0, 1, . . .
        V = X − Y       range: all integers

    The joint PMF of U, V is

        f_{U,V}(u, v) = P(min(X, Y) = u, X − Y = v)
                      = { P(Y = u, X = u + v)   if v ≥ 0
                        { P(X = u, Y = u − v)   if v < 0
                      = { p^2 (1 − p)^u (1 − p)^{u+v}   v ≥ 0
                        { p^2 (1 − p)^u (1 − p)^{u−v}   v < 0
                      = [p^2 (1 − p)^{2u}] [(1 − p)^{|v|}].

    So U, V are independent.

    (b) Z takes on all possible rational values in [0, 1]. Let q be a rational in [0, 1] and write q = m/n where m and n have no common factors. Then for m > 0

        P(Z = q) = P(Z = m/n)
                 = P(X = mk, Y = (n − m)k for some k ≥ 1)
                 = ∑_{k=1}^{∞} p^2 (1 − p)^{mk} (1 − p)^{(n−m)k} = ∑_{k=1}^{∞} p^2 (1 − p)^{nk}
                 = p^2 (1 − p)^n / (1 − (1 − p)^n).

    For m = 0, P(Z = 0) = P(X = 0) = p.

    (c) Let Z = X + Y. For x = 0, 1, . . . and z = x, x + 1, . . . the joint PMF of X and Z is

        f(x, z) = P(X = x, Y = z − x) = p(1 − p)^x p(1 − p)^{z−x} = p^2 (1 − p)^z.

    For all other (x, z) pairs f(x, z) = 0.

    3. (a) For y = 1, 2, . . .,

        f_Y(y) = P(y − 1 < X < y) = e^{−(y−1)} − e^{−y} = e^{−(y−1)}(1 − e^{−1}).

    So Y is geometric(p = 1 − e^{−1}).

    (b)

        P(X − 4 > x | Y ≥ 5) = P(X − 4 > x | X ≥ 4)
                             = { 1                    x ≤ 0
                               { e^{−(x+4)}/e^{−4}    x > 0
                             = { 1        x ≤ 0
                               { e^{−x}   x > 0

    This is an exponential distribution. For any t, X − t | X ≥ t is Exponential(1).


    4. R^2 ∼ χ^2_2 = Gamma(1, 2) = Exponential(2) and θ ∼ Uniform(0, 2π).

        X = √(R^2) cos θ
        Y = √(R^2) sin θ
        A = (0, ∞) × (0, 2π)
        B = ℝ^2

    The joint density of R^2, θ is

        f_{R^2,θ}(a, b) = (1/2) e^{−a/2} (1/(2π))

    for (a, b) ∈ A. The inverse transformation is

        R^2 = X^2 + Y^2
        θ = { cos^{−1}(X/√(X^2 + Y^2))         Y > 0
            { 2π − cos^{−1}(X/√(X^2 + Y^2))    Y < 0

    This is messy to differentiate; instead, compute

        J^{−1} = det( (1/(2√(R^2))) cos θ    −√(R^2) sin θ
                      (1/(2√(R^2))) sin θ     √(R^2) cos θ )
               = (1/2) cos^2 θ + (1/2) sin^2 θ = 1/2.

    So J = 2, and

        f_{X,Y}(x, y) = (1/(2π)) e^{−(x^2+y^2)/2}.

    Thus X, Y are independent standard normal random variables.
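    A small illustration of this construction (a sketch, not part of the original solutions) generates standard normal pairs from an Exponential(2) radius squared and a uniform angle:

        n <- 10000
        r2 <- rexp(n, rate = 1 / 2)              # Exponential with mean 2, i.e. chi-squared(2)
        theta <- runif(n, 0, 2 * pi)
        x <- sqrt(r2) * cos(theta)
        y <- sqrt(r2) * sin(theta)
        c(mean(x), sd(x), cor(x, y))             # approximately 0, 1, 0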

    5. Approach from class: Let Z1, Z2 be independent standard normals and let

        X = µ + σZ1
        Y = γ + σZ2.

    Then

        U = X + Y = µ + γ + σZ1 + σZ2
        V = X − Y = µ − γ + σZ1 − σZ2.

    So

        C = B Bᵀ = ( 2σ^2   0
                     0      2σ^2 )

    and

        f_{U,V}(u, v) = (1/(2π · 2σ^2)) exp{−(u − (µ + γ))^2/(4σ^2) − (v − (µ − γ))^2/(4σ^2)} = f_U(u) f_V(v)

    where U ∼ N(µ + γ, 2σ^2), V ∼ N(µ − γ, 2σ^2), and U, V are independent.

    6. (a) See (b).

    (b) Suppose the P_i are independent random variables with values in the unit interval and common mean µ. Since the P_i are independent and each X_i only depends on P_i, the X_i are marginally independent as well. Each X_i takes on only the values 0 and 1, so the marginal distributions of the X_i are Bernoulli with success probability

        P(X_i = 1) = E[P(X_i = 1 | P_i)] = E[P_i] = µ.

    So the X_i are independent Bernoulli(µ) random variables and therefore Y = ∑_{i=1}^{n} X_i is Binomial(n, µ). If the P_i have a Beta(α, β) distribution then µ = α/(α + β) and therefore

        E[Y] = nµ = nα/(α + β)
        Var(Y) = nµ(1 − µ) = nαβ/(α + β)^2.

    (c) For each i = 1, . . . , k,

        E[X_i] = E[E[X_i | P_i]] = E[n_i P_i] = n_i E[P_i] = n_i α/(α + β)

        Var(X_i) = E[Var(X_i | P_i)] + Var(E[X_i | P_i])
                 = E[n_i P_i(1 − P_i)] + Var(n_i P_i)
                 = n_i E[P_i(1 − P_i)] + n_i^2 Var(P_i)
                 = n_i ∫_0^1 [Γ(α + β)/(Γ(α)Γ(β))] p^{α+1−1}(1 − p)^{β+1−1} dp + n_i^2 αβ/((α + β)^2(α + β + 1))
                 = n_i [Γ(α + β)/(Γ(α)Γ(β))] [Γ(α + 1)Γ(β + 1)/Γ(α + β + 2)] + n_i^2 αβ/((α + β)^2(α + β + 1))
                 = n_i αβ/((α + β)(α + β + 1)) + n_i^2 αβ/((α + β)^2(α + β + 1))
                 = [n_i αβ/((α + β)(α + β + 1))] (1 + n_i/(α + β))
                 = n_i αβ(α + β + n_i)/((α + β)^2(α + β + 1)).

    Again the X_i are marginally independent, so

        E[Y] = ∑ E[X_i] = [α/(α + β)] ∑_{i=1}^{k} n_i
        Var(Y) = ∑ Var(X_i) = ∑_{i=1}^{k} n_i αβ(α + β + n_i)/((α + β)^2(α + β + 1)).

    The marginal distribution of X_i is called a beta-binomial distribution. The density of P_i is

        f_P(p) = [Γ(α + β)/(Γ(α)Γ(β))] p^{α−1}(1 − p)^{β−1}

    for 0 < p < 1. So the PMF of X_i is

        P(X_i = x) = E[P(X_i = x | P_i)] = E[C(n_i, x) P_i^x (1 − P_i)^{n_i−x}]
                   = ∫_0^1 C(n_i, x) p^x (1 − p)^{n_i−x} [Γ(α + β)/(Γ(α)Γ(β))] p^{α−1}(1 − p)^{β−1} dp
                   = C(n_i, x) [Γ(α + β)/(Γ(α)Γ(β))] [Γ(α + x)Γ(β + n_i − x)/Γ(α + β + n_i)].


    Assignment 11

    Due on Monday, November 9, 2015.

    1. Problem 4.28 (a) and (b) from the textbook

    2. Problem 4.30 from the textbook

    3. Problem 4.39 from the textbook

    4. Problem 4.40 from the textbook

    Solutions

    1. (a)

        U = X/(X + Y)     A = ℝ^2
        V = X + Y         B = ℝ^2
        X = UV
        Y = V − UV = (1 − U)V

    So

        |J(u, v)| = |det( v      u
                          −v   1 − u )| = |v(1 − u) + uv| = |v|.

    Thus

        f_{U,V}(u, v) = f_{X,Y}(uv, (1 − u)v) |v| = (1/(2π)) e^{−(1/2)u^2 v^2 − (1/2)(1−u)^2 v^2} |v|

    and

        f_U(u) = ∫ f_{U,V}(u, v) dv
               = ∫_{−∞}^{∞} (1/(2π)) e^{−(1/2)u^2 v^2 − (1/2)(1−u)^2 v^2} |v| dv
               = 2 ∫_0^∞ (1/(2π)) exp{−(v^2/2)(1 + 2u^2 − 2u)} v dv
               = 1/(π(1 + 2u^2 − 2u)) = 1/(π(1/2 + 2(u − 1/2)^2)) = 2/(π(1 + 4(u − 1/2)^2)).

    This is a Cauchy(1/2, 1/2) density.


    (b)

        U = X/|Y|
        V = Y
        X = U|V|
        Y = V

    with A = B = ℝ^2. So

        |J(u, v)| = |det( |v|   ±u
                          0     1 )| = |v|

    and

        f_{U,V}(u, v) = (1/(2π)) |v| exp{−(1/2)u^2 v^2 − (1/2)v^2}
        f_U(u) = 1/(π(1 + u^2)).

    2. (a) The mean of Y is

        E[Y] = E[E[Y|X]] = E[X] = 1/2.

    The variance is

        Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) = E[X^2] + Var(X) = 1/3 + 1/12 = 5/12.

    The covariance is

        Cov(X, Y) = E[(Y − µ_Y)(X − µ_X)] = E[E[Y − µ_Y | X](X − µ_X)] = E[(X − µ_X)^2] = Var(X) = 1/12.

    (b) The conditional distribution of Z = Y/X, given X = x, is N(1, 1). Since this conditional distribution does not depend on x, Z and X are independent.

    3. For each j, X_j counts the number of the m independent trials that fall in category j. It therefore has a Binomial(m, p_j) distribution.

    Let Y = m − X_i − X_j. Then by a similar argument the joint marginal distribution of (X_i, X_j, Y) is Multinomial(m, p_i, p_j, 1 − p_i − p_j). So

        P(X_i = x_i | X_j = x_j) = P(X_i = x_i, X_j = x_j) / P(X_j = x_j)
            = P(X_i = x_i, X_j = x_j, Y = m − x_i − x_j) / P(X_j = x_j)
            = [m!/(x_i! x_j! (m − x_i − x_j)!) p_i^{x_i} p_j^{x_j} (1 − p_i − p_j)^{m−x_i−x_j}]
              / [m!/(x_j! (m − x_j)!) p_j^{x_j} (1 − p_j)^{m−x_j}]
            = [(m − x_j)!/(x_i! (m − x_i − x_j)!)] p_i^{x_i} (1 − p_i − p_j)^{m−x_i−x_j} / (1 − p_j)^{m−x_j}
            = C(m − x_j, x_i) (p_i/(1 − p_j))^{x_i} (1 − p_i/(1 − p_j))^{m−x_j−x_i}

    for x_i = 0, . . . , m − x_j. This is the PMF of a Binomial(m − x_j, p_i/(1 − p_j)) distribution.

    Using these results,

        E[X_i X_j] = E[X_j E[X_i | X_j]] = E[X_j (m − X_j) p_i/(1 − p_j)]
                   = (m^2 p_j − E[X_j^2]) p_i/(1 − p_j)
                   = (m^2 p_j − Var(X_j) − E[X_j]^2) p_i/(1 − p_j)
                   = (m^2 p_j − m p_j(1 − p_j) − m^2 p_j^2) p_i/(1 − p_j)
                   = (m^2 p_j(1 − p_j) − m p_j(1 − p_j)) p_i/(1 − p_j)
                   = (m^2 − m) p_i p_j

    and therefore

        Cov(X_i, X_j) = E[X_i X_j] − E[X_i] E[X_j] = (m^2 − m) p_i p_j − m^2 p_i p_j = −m p_i p_j.

    An alternative approach for deriving the covariance is to use indicator functions of whether the k-th trial falls in category i.

    4. (a)

    (b) The marginal density of X is

        f_X(x) = ∫_0^{1−x} C x^{a−1} y^{b−1} (1 − x − y)^{c−1} dy
               = C x^{a−1} (1 − x)^{b+c−1} ∫_0^1 u^{b−1} (1 − u)^{c−1} du
               = C x^{a−1} (1 − x)^{b+c−1} Γ(b)Γ(c)/Γ(b + c)

    for 0 < x < 1, using the change of variables u = y/(1 − x). This is a Beta(a, b + c) density, and

        1 = ∫_0^1 f_X(x) dx = C [Γ(b)Γ(c)/Γ(b + c)] [Γ(a)Γ(b + c)/Γ(a + b + c)] = C Γ(a)Γ(b)Γ(c)/Γ(a + b + c).

    So

        C = Γ(a + b + c)/(Γ(a)Γ(b)Γ(c)).

    Because of the symmetric roles of x and y, the marginal distribution of Y is Beta(b, a + c).

    (c) The conditional distribution of Y | X = x has density

        f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x)
                     ∝ x^{a−1} y^{b−1} (1 − x − y)^{c−1} / [x^{a−1} (1 − x)^{b+c−1}]
                     = (y/(1 − x))^{b−1} (1 − y/(1 − x))^{c−1} · 1/(1 − x)

    for 0 < y < 1 − x. The conditional density of U = Y/(1 − X) given X = x is therefore

        f_{U|X}(u|x) ∝ u^{b−1} (1 − u)^{c−1}

    for 0 < u < 1, which is a Beta(b, c) density. As this does not depend on x, U and X are independent.

    (d) The expected product is

        E[XY] = E[X E[Y|X]] = [b/(b + c)] E[X(1 − X)]
              = [b/(b + c)] [Γ(a + b + c)/(Γ(a)Γ(b + c))] ∫_0^1 x^a (1 − x)^{b+c} dx
              = [b/(b + c)] [Γ(a + b + c)/(Γ(a)Γ(b + c))] [Γ(a + 1)Γ(b + c + 1)/Γ(a + b + c + 2)]
              = [b/(b + c)] [a(b + c)/((a + b + c)(a + b + c + 1))]
              = ab/((a + b + c)(a + b + c + 1)).

    The covariance is therefore

        Cov(X, Y) = E[XY] − E[X]E[Y]
                  = ab/((a + b + c)(a + b + c + 1)) − ab/(a + b + c)^2
                  = −ab/((a + b + c)^2(a + b + c + 1)).


    Assignment 12

    Due on Monday, November 16, 2015.

    1. Problem 4.47 from the textbook

    2. Problem 4.55 from the textbook

    3. Problem 5.2 from the textbook

    4. Problem 5.8 from the textbook. You can simplify calculations somewhat by arguing that you can assume without loss of generality that θ_1 = E[X_i] = 0.

    5. Problem 5.15 from the textbook.

    6. Let U1, . . . , Un be a random sample from the Uniform[0, 1] distribution with order statistics U_(1) ≤ · · · ≤ U_(n), and let R = U_(n) − U_(1) be the sample range. Find the marginal density of R.

    Solutions

    1. (a) For z < 0

        P(Z ≤ z) = P(X ≤ z and Y < 0) + P(−X ≤ z and Y > 0).

    Since X, Y are independent, continuous, and have distributions symmetric about the origin,

        P(X ≤ z and Y < 0) = P(X ≤ z)P(Y < 0) = Φ(z) (1/2)

    and

        P(−X ≤ z and Y > 0) = P(−X ≤ z)P(Y > 0) = Φ(z) (1/2),

    where Φ(z) is the CDF of the standard normal distribution. So

        P(Z ≤ z) = Φ(z) (1/2) + Φ(z) (1/2) = Φ(z).

    By symmetry, for z > 0

        P(Z ≥ z) = P(Z ≤ −z) = Φ(−z) = 1 − Φ(z),

    and therefore the CDF of Z is equal to Φ for all z and Z has a standard normal distribution.


    (b) If Y > 0 then Z = |X| and Z > 0. Similarly, if Y < 0 then Z = −|X| and Z < 0. So Z and Y have the same sign and therefore the joint distribution of Z, Y assigns zero probability to the second and fourth quadrants. Since X, Y are jointly continuous and the bivariate normal distribution assigns positive probability to all open sets, this means Z, Y cannot be jointly normal.

    2. Let L be the system lifetime and let X1, X2, X3 be the component lifetimes. Then

        P(L ≤ x) = P(all three components fail by x)
                 = P(X1 ≤ x, X2 ≤ x, X3 ≤ x) = P(X1 ≤ x)P(X2 ≤ x)P(X3 ≤ x)
                 = P(X1 ≤ x)^3 = (1 − e^{−x/λ})^3

    for x > 0.

    3. (a) Condition on X1 = x:

        P(Y > y | X1 = x) = { 1        if y ≤ 0
                            { F(x)^y   if y ≥ 1,

    i.e. Y | X1 = x is geometric with p = 1 − F(x). So for y ≥ 1

        P(Y > y) = E[F(X1)^y] = ∫_0^1 u^y du = 1/(y + 1)

    since F(X1) is uniform on [0, 1] by the probability integral transform. So for y = 1, 2, . . . ,

        P(Y = y) = 1/y − 1/(y + 1) = 1/(y(y + 1)).

    Alternative argument: For y = 1, 2, . . . ,

        P(Y > y) = P(X1 > max{X2, . . . , X_{y+1}})
                 = P(argmax(X1, . . . , X_{y+1}) = 1)
                 = 1/(y + 1)

    by symmetry.


    (b) Using ⌊y⌋ to denote the largest integer less than or equal to y, we have P(Y > y) = 1 − F_Y(y) = 1/(⌊y⌋ + 1) for all y ≥ 0. So

        E[Y] = ∫_0^∞ (1 − F_Y(y)) dy = ∫_0^∞ 1/(⌊y⌋ + 1) dy
             ≥ ∫_0^∞ 1/(y + 1) dy = ∞.

    4. (a)

        ∑(X_i − X̄)^2 = ∑ X_i^2 − (1/n)(∑ X_i)(∑ X_j)
                      = (1/(2n)) (2 ∑_i ∑_j X_i^2 − 2 ∑_i ∑_j X_i X_j)
                      = (1/(2n)) (∑_i ∑_j X_i^2 − 2 ∑_i ∑_j X_i X_j + ∑_i ∑_j X_j^2)
                      = (1/(2n)) ∑_i ∑_j (X_i − X_j)^2

    (b) Assume, without loss of generality, that E[X_i] = θ_1 = 0. Then

        E[S^2] = σ^2 = θ_2

    and

        E[S^4] = [1/(4n^2(n − 1)^2)] ∑_i ∑_j ∑_k ∑_ℓ E[(X_i − X_j)^2 (X_k − X_ℓ)^2].

    If i = j or k = ℓ, then E[(X_i − X_j)^2 (X_k − X_ℓ)^2] = 0. If all of i, j, k, ℓ are different, then

        E[(X_i − X_j)^2 (X_k − X_ℓ)^2] = E[(X_1 − X_2)^2]^2 = (2σ^2)^2 = 4θ_2^2.

    If {i, j} ∩ {k, ℓ} contains exactly one element, say k = i, then

        E[(X_i − X_j)^2 (X_i − X_ℓ)^2]
            = E[(X_i^2 − 2X_iX_j + X_j^2)(X_i^2 − 2X_iX_ℓ + X_ℓ^2)]
            = E[X_i^4 − 2X_i^3X_j + X_i^2X_j^2 − 2X_i^3X_ℓ + 4X_i^2X_jX_ℓ − 2X_iX_j^2X_ℓ
                + X_i^2X_ℓ^2 − 2X_iX_jX_ℓ^2 + X_j^2X_ℓ^2]
            = θ_4 + 3θ_2^2.

    If {i, j} = {k, ℓ}, then

        E[(X_i − X_j)^2 (X_k − X_ℓ)^2] = E[(X_i − X_j)^4]
            = E[X_i^4 − 4X_i^3X_j + 6X_i^2X_j^2 − 4X_iX_j^3 + X_j^4]
            = 2θ_4 + 6θ_2^2.

    So

        E[S^4] = [1/(4n^2(n − 1)^2)] [n(n − 1)(n − 2)(n − 3) 4θ_2^2
                 + 4n(n − 1)(n − 2)(θ_4 + 3θ_2^2) + 2n(n − 1)(2θ_4 + 6θ_2^2)]
               = [1/(4n(n − 1))] [4(n − 2)(n − 3)θ_2^2 + 4(n − 2)(θ_4 + 3θ_2^2) + 4(θ_4 + 3θ_2^2)]
               = [1/(n(n − 1))] [(n − 1)θ_4 + ((n − 2)(n − 3) + 3(n − 2) + 3)θ_2^2]
               = [1/(n(n − 1))] [(n − 1)θ_4 + (n^2 − 2n + 3)θ_2^2].

    So

        Var(S^2) = E[S^4] − [n(n − 1)/(n(n − 1))] θ_2^2
                 = [1/(n(n − 1))] [(n − 1)θ_4 + (n^2 − 2n + 3 − n^2 + n)θ_2^2]
                 = [1/(n(n − 1))] [(n − 1)θ_4 − (n − 3)θ_2^2]
                 = (1/n) [θ_4 − ((n − 3)/(n − 1)) θ_2^2].

    (c) Still assume θ_1 = 0. Then

        E[X̄ S^2] = [1/(2n^2(n − 1))] ∑_i ∑_j ∑_k E[(X_i − X_j)^2 X_k]
                  = [1/(2n^2(n − 1))] 2n(n − 1) E[(X_1 − X_2)^2 X_1]
                  = (1/n) E[X_1^3 − 2X_1^2X_2 + X_1X_2^2]
                  = (1/n) E[X_1^3] = θ_3/n.

    So X̄ and S^2 are uncorrelated if and only if θ_3 = 0.


    5. (a) For the mean,

        X̄_{n+1} = [1/(n + 1)] ∑_{i=1}^{n+1} X_i
                 = [1/(n + 1)] ∑_{i=1}^{n} X_i + [1/(n + 1)] X_{n+1}
                 = [n/(n + 1)] X̄_n + [1/(n + 1)] X_{n+1}.

    (b) For the variance,

        n S^2_{n+1} = ∑_{i=1}^{n+1} (X_i − X̄_{n+1})^2
                    = ∑_{i=1}^{n+1} (X_i − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2
                    = ∑_{i=1}^{n} (X_i − X̄_n)^2 + (X_{n+1} − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2
                    = (n − 1) S^2_n + (X_{n+1} − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2.

    From the result for sample means

        X̄_{n+1} − X̄_n = [1/(n + 1)] (X_{n+1} − X̄_n)

    and therefore

        (X_{n+1} − X̄_n)^2 − (n + 1)(X̄_{n+1} − X̄_n)^2 = (X_{n+1} − X̄_n)^2 − [1/(n + 1)](X_{n+1} − X̄_n)^2
                                                      = [n/(n + 1)] (X_{n+1} − X̄_n)^2,

    which completes the proof.

    6. To simplify notation let X = U_(n) and Y = U_(1). From the general form of the joint density of two order statistics, the joint density of X and Y is

        f_{XY}(x, y) = { n(n − 1)(x − y)^{n−2}   for 0 < y < x < 1
                       { 0                       otherwise.

    Let R = X − Y and V = Y. This is a one-to-one transformation with inverse

        x = r + v
        y = v,

    Jacobian determinant

        J(r, v) = det( 1  1
                       0  1 ) = 1,

    and range

        B = {(r, v) : 0 < v < r + v < 1} = {(r, v) : 0 < r < 1 and 0 < v < 1 − r}.

    The joint density of R and V is therefore

        f_{RV}(r, v) = f_{XY}(r + v, v) = { n(n − 1) r^{n−2}   if 0 < r < 1 and 0 < v < 1 − r
                                          { 0                  otherwise,

    and the marginal density of R is

        f_R(r) = ∫_0^{1−r} n(n − 1) r^{n−2} dv = n(n − 1) r^{n−2}(1 − r)

    for 0 < r < 1 and zero otherwise. This is a Beta(n − 1, 2) density.


    Assignment 13

    Due on Monday, November 30, 2015.

    1. Problem 5.24 from the textbook

    2. Problem 5.32 from the textbook

    3. Problem 5.40 from the textbook

    4. Let X have a Gamma(α, 1) distribution and let Y = (X − α)/√α.

    (a) Find the density, mean, and variance of Y .

    (b) Plot the density fα(y) of Y for α = 2, 10, 100.

    (c) Show that for every y the density f_α(y) converges to the standard normal density at y as α tends to infinity. Hints:

        i. Use Stirling's approximation for the gamma function, which can be written as

            Γ(α) = e^{−α} α^{α−1/2} √(2π) (1 + O(α^{−1}))

        as α → ∞.

        ii. Work with log densities and use the fact that

            log(1 + x) = x − x^2/2 + O(x^3)

        as x → 0.

    Solutions

    1. Assume, without loss of generality, that θ = 1. Then for 0 < u < v < 1

        f_{X_(1),X_(n)}(u, v) = [n!/(0!(n − 2)!0!)] f(u) f(v) F(u)^0 (F(v) − F(u))^{n−2} (1 − F(v))^0
                              = n(n − 1) f(u) f(v) (F(v) − F(u))^{n−2}
                              = n(n − 1)(v − u)^{n−2}.

    Let

        Y = X_(1)/X_(n)
        Z = X_(n).

    The range of (Y, Z) is B = [0, 1] × [0, 1], and the inverse transformation is

        X_(1) = YZ
        X_(n) = Z.

    The Jacobian determinant is

        J(y, z) = det( z  y
                       0  1 ) = z.

    So for 0 < z < 1 and 0 < y < 1

        f_{Y,Z}(y, z) = n(n − 1)(z − yz)^{n−2} z = n z^{n−1} × (n − 1)(1 − y)^{n−2}.

    Thus Y and Z are independent.

    This proof first removes θ from consideration since it is just a scale parameter. An alternative approach is to note that X_(n) is minimal sufficient, X_(1)/X_(n) is ancillary, and use Basu's theorem from Chapter 6.

    2. a. Suppose f is continuous at a and X_n converges in probability to a constant a. Fix ε > 0. Then there exists a δ > 0 such that

        |f(x) − f(a)| < ε

    whenever |x − a| < δ. So

        P(|f(X_n) − f(a)| < ε) ≥ P(|X_n − a| < δ) → 1.

    So f(X_n) converges in probability to f(a). The result follows with f(x) = √x or f(x) = 1/x and a > 0.

    b. f(x) = σ/√x is continuous at x = σ^2 if σ > 0.

    3. a. For any t and any ε > 0, if X_n > t and |X_n − X| < ε, then X > t − ε. So X ≤ t − ε implies that either X_n ≤ t or |X_n − X| ≥ ε, i.e.

        {X ≤ t − ε} ⊂ {X_n ≤ t} ∪ {|X_n − X| ≥ ε}.

    So

        P(X ≤ t − ε) ≤ P(X_n ≤ t) + P(|X_n − X| ≥ ε)

    or

        P(X ≤ t − ε) − P(|X_n − X| ≥ ε) ≤ P(X_n ≤ t).

    b. Similarly (reversing the roles of X and X_n and replacing t by t + ε),

        P(X_n ≤ t) ≤ P(X ≤ t + ε) + P(|X_n − X| ≥ ε).

    c. Suppose the CDF of X is continuous at t. From the previous two parts, since X_n → X in probability we have

        P(X ≤ t − ε) ≤ lim inf_{n→∞} P(X_n ≤ t) ≤ lim sup_{n→∞} P(X_n ≤ t) ≤ P(X ≤ t + ε)

    for any ε > 0. Since t is a continuity point of the distribution of X, lim_{ε↓0} P(X ≤ t − ε) = lim_{ε↓0} P(X ≤ t + ε) = P(X ≤ t), and therefore

        P(X ≤ t) ≤ lim inf_{n→∞} P(X_n ≤ t) ≤ lim sup_{n→∞} P(X_n ≤ t) ≤ P(X ≤ t).

    So lim_{n→∞} P(X_n ≤ t) exists and is equal to P(X ≤ t). Thus X_n → X in distribution.

    4. (a) The mean and variance of X are E[X] = α and Var(X) = α, so

        E[Y] = (E[X] − α)/√α = 0
        Var(Y) = Var(X)/α = 1.

    The inverse transformation is x = √α y + α with derivative dx/dy = √α, so the density of Y is

        f_Y(y) = √α f_X(√α y + α) = [√α/Γ(α)] (√α y + α)^{α−1} e^{−√α y − α}.

    (b) One way to produce the plots in R is sketched below.
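    The original listing is cut off at the page break; the following sketch (variable names are illustrative, not the original code) would produce the requested plots:

        y <- seq(-4, 4, length.out = 200)
        fY <- function(y, alpha)
            sqrt(alpha) * dgamma(sqrt(alpha) * y + alpha, shape = alpha)
        plot(y, fY(y, 2), type = "l", ylab = "density")
        lines(y, fY(y, 10), lty = 2)
        lines(y, fY(y, 100), lty = 3)
        lines(y, dnorm(y), col = "gray")         # standard normal for comparison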


    The logarithm of the remainder of the density is

        (α − 1) log(y/√α + 1) − √α y = (α − 1)[y/√α − y^2/(2α) + O(α^{−3/2})] − √α y
                                     = −y/√α − [(α − 1)/α] y^2/2 + O(α^{−1/2})
                                     → −y^2/2

    as α → ∞. So f_Y(y) converges pointwise to a standard normal density.


    Assignment 14

    Due on Monday, December 7, 2015.

    1. Let Xn have a χ2n distribution.

    (a) Find an approximating normal distribution for Xn.

    (b) Find an approximating normal distribution for Yn =√Xn.

    (c) Find an approximating normal distribution for Zn = logXn.

    (d) How good are the approximations in parts (a), (b), and (c) for n = 5, 10, 20, 100?

    2. The height H and radius R of a cylinder are measured with error; the measurements are independent, normally distributed, and

        µ_H = 75 cm, σ_H = 2 cm, µ_R = 10 cm, σ_R = 1 cm.

    The estimated volume of the cylinder is V = πR^2H. Find a normal approximation to the distribution of V.

    3. Let X1, . . . , Xn be a random sample from an exponential distribution with mean θ.

    (a) Find a normal approximation to the distribution of the sample mean X̄_n.

    (b) Let Y_n = g(X̄_n) where g is differentiable. Find a normal approximation to the distribution of Y_n.

    (c) Can you find a function g such that the variance of the normal approximation in (b) does not depend on θ?

    4. Let X1, . . . , Xn be a random sample from a Poisson distribution with mean λ > 0. Let X̄_n be the sample average and let U_n = √n(X̄_n − λ)/√(X̄_n). Find the limiting distribution of U_n as n tends to infinity.

    Solutions

    1. (a) A χ^2_n random variable X_n has the same distribution as ∑_{i=1}^{n} U_i with the U_i i.i.d. χ^2_1 random variables. So the central limit theorem gives

        X_n ∼ AN(n, 2n)

    as n → ∞.


    (b) Let V_n = √(X_n/n) = f(X_n/n). Now X_n/n converges in probability to 1, X_n/n ∼ AN(1, 2/n), f(1) = 1, and f′(1) = 1/2. So

        V_n ∼ AN(1, (1/4)(2/n)) = AN(1, 1/(2n))

    and thus

        Y_n = √n V_n ∼ AN(√n, 1/2).

    (c) Let W_n = log(X_n/n) = f(X_n/n). Now X_n/n converges in probability to 1, X_n/n ∼ AN(1, 2/n), f(1) = 0, and f′(1) = 1. So

        W_n ∼ AN(0, 2/n)

    and thus

        Z_n = W_n + log n ∼ AN(log n, 2/n).

    (d) The exact CDFs and PDFs of Y_n and Z_n are

        F_{Y_n}(y) = F_{X_n}(y^2),    f_{Y_n}(y) = f_{X_n}(y^2) 2y
        F_{Z_n}(z) = F_{X_n}(e^z),    f_{Z_n}(z) = f_{X_n}(e^z) e^z.

    You can look at graphs of the densities or CDFs, or at quantile plots, or at numerical measures of the discrepancies of the exact and approximate CDFs.
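    One way to examine the quality of these approximations (a sketch, not the original code) is to overlay the exact and approximate CDFs of Y_n = √(X_n) for each n:

        compare_sqrt <- function(n) {
            y <- seq(0, sqrt(qchisq(0.999, n)), length.out = 200)
            plot(y, pchisq(y^2, n), type = "l", ylab = "CDF", main = paste("n =", n))
            lines(y, pnorm(y, mean = sqrt(n), sd = sqrt(1 / 2)), lty = 2)   # approximation
        }
        ## par(mfrow = c(2, 2)); for (n in c(5, 10, 20, 100)) compare_sqrt(n)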

    2. V = f(R, H) with f(r, h) = πr^2h. The gradient of f is

        ∇f(r, h) = (2πrh, πr^2).

    So

        V ≈ f(10, 75) + [∂f/∂r](10, 75)(R − 10) + [∂f/∂h](10, 75)(H − 75)
          = π × 7500 + π × 1500 × (R − 10) + π × 100 × (H − 75)
          ∼ N(π × 7500, (π × 1500 × 1)^2 + (π × 100 × 2)^2)
          ≈ N(23561.94, (4754.09)^2).
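    A quick simulation check of this approximation (a sketch, not part of the original solutions):

        N <- 100000
        R <- rnorm(N, 10, 1); H <- rnorm(N, 75, 2)
        V <- pi * R^2 * H
        c(mean(V), pi * 7500)                                   # simulated vs approximate mean
        c(sd(V), sqrt((pi * 1500)^2 + (pi * 200)^2))            # simulated vs approximate sd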

    3. (a) The variance of the exponential distribution with mean θ is θ^2. So by the CLT X̄_n ∼ AN(θ, θ^2/n).

    (b) By the delta method Y_n = g(X̄_n) ∼ AN(g(θ), (g′(θ))^2 θ^2/n).

    (c) The approximate variance is constant in θ if g′(θ) = 1/θ, that is, g(θ) = log θ. This is an example of a variance stabilizing transformation.


    4. By the weak law of large numbers X̄_n converges in probability to E[X_1] = λ. By the continuous mapping theorem T_n = √(X̄_n) converges in probability to √λ. Using the strong law of large numbers and basic continuity shows that the convergence also holds almost surely.

    Since Var(X_1) = λ, the central limit theorem implies that X̄_n ∼ AN(λ, λ/n). Since the square root function f(x) = √x is differentiable at positive x, the delta method implies that

        T_n ∼ AN(f(λ), (f′(λ))^2 λ/n) = AN(√λ, (1/(2√λ))^2 λ/n) = AN(√λ, 1/(4n)).

    Combining these results with Slutsky's theorem,

        U_n = √n(X̄_n − λ)/√(X̄_n) = [√n(X̄_n − λ)/√λ] √(λ/X̄_n)

    converges in distribution to N(0, 1), since the first factor converges in distribution to N(0, 1) by the CLT and the second factor converges in probability to 1.
