1 Point and Interval Estimates Examples with z and t distributions Single sample; two samples...

Date posted: 18-Dec-2015

Page 1: 1 Point and Interval Estimates Examples with z and t distributions Single sample; two samples Result: Sums (and differences) of normally distributed RV.


Point and Interval Estimates
• Examples with z and t distributions
• Single sample; two samples
• Result: Sums (and differences) of independent normally distributed RVs are normally distributed.
• Determining the variance of the difference between means for two independent samples
• Pooled estimates of the variance (when two independent estimates are available)
• Degrees of freedom for the variance of the difference between the means of two independent samples (equal/not equal variances)
• Estimating the variance for use with proportions, and CI with proportions
• Bayesian Credibility Intervals
  – Prior Distribution
  – Joint Distribution of prior and data
  – Posterior Distribution


Introduction to Biostatistics (PUBHLTH 540)

Examples of Point and Interval Estimates + Credibility Intervals

Examples from Seasons Study
• Assumptions: Subjects are an SRS from the population.
• Assume different groups are independent SRSs from different strata (i.e., gender).

Details:
• Use the t-distribution for interval estimates when sample sizes are small (unless the estimate is of a proportion)
  – requires an assumption that the underlying random variable is normally distributed
• When the response is binary (yes/no), we estimate the population mean by the sample mean (equal to the sample proportion $\hat{p}$), and the sample variance by $\hat{\sigma}^2 = \hat{p}(1-\hat{p})$


Examples: Point and Interval Estimate of Wt

Examples from Seasons Study (see ejs09b540p34.sas).
What is a 95% Confidence Interval for Weight?

(see: http://dostat.stat.sc.edu/prototype/calculators/index.php3?dist=T to get t-percentiles)

Figure 1. Histogram of weight in kg for n=291 (x-axis: Wt (kg) (formerly cc5a); y-axis: Percent)
Source: ejs09b540p34.sas 10/20/2009 by ejs

Weight         Value    Lower 95   Upper 95
n              291
Mean           77.62    75.6       79.7
Std            17.79
df             290
t statistic    1.968

The mean weight is estimated as 77.6 kg, with a 95% CI of (75.6, 79.7).

$\bar{Y} \pm t_{df=n-1,\,0.975}\sqrt{\frac{S^2}{n}}$

$t_{df=n-1,\,0.975} = t_{290,\,0.975} = 1.968$

$77.6 \pm 1.968\,\frac{17.79}{\sqrt{291}}$

Use applets to get t value
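The interval above can be checked with a few lines of Python. This is a minimal sketch, assuming only the summary statistics on the slide; the function name `t_interval` is illustrative, and the t-percentile 1.968 is copied from the slide rather than computed.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)        # standard error of the sample mean
    half_width = t_crit * se
    return mean - half_width, mean + half_width

# Summary statistics and t-percentile as reported on the slide
lower, upper = t_interval(mean=77.62, sd=17.79, n=291, t_crit=1.968)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

The t-percentile is passed in as an argument so that it can come from the applet, a table, or a statistics library.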


Examples: Point and Interval Estimate of Wt

• Suppose we assume the Seasons study subjects were an SRS from people in the US. What is a point and interval estimate of weight for the US population?

Answer: Same as before. The mean weight is estimated as 77.6 kg, with a 95% CI of (75.6, 79.7).


Examples: Point and Interval Estimate of Wt, separately for men and women

Examples from Seasons Study (ejs09b540p34.sas)

(see: http://dostat.stat.sc.edu/prototype/calculators/index.php3?dist=T to get t-percentiles)

For men, the mean weight is estimated as 85.9 kg (95% CI (83.3, 88.5)), while for women, the mean weight is 69.7 kg (95% CI (67.2, 72.3)).

Table 3. Description of weight by gender
Analysis Variable: wt Wt (kg) (formerly cc5a)

Gender       N     Mean    Std Dev   Variance   Std Error
Male(0)      142   85.90   15.82     250.32     1.33
Female(1)    149   69.73   15.92     253.42     1.30

Source: ejs09b540p34.sas 10/20/2009 by ejs

$\bar{Y} \pm t_{df=n-1,\,0.975}\, S_{\bar{Y}}$

Use Applet (men): $t_{141,\,0.975} = 1.977$
Use Applet (women): $t_{148,\,0.975} = 1.976$
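The two intervals can be reproduced from Table 3. This is a sketch only: the dictionary names are illustrative, the standard error is recomputed as Std Dev / sqrt(N), and the t-percentiles are the applet values quoted on the slide.

```python
import math

# (N, mean, std dev, t-percentile) copied from Table 3 and the slide
groups = {
    "men": (142, 85.90, 15.82, 1.977),
    "women": (149, 69.73, 15.92, 1.976),
}

intervals = {}
for name, (n, mean, sd, t_crit) in groups.items():
    se = sd / math.sqrt(n)                      # standard error of the mean
    intervals[name] = (mean - t_crit * se, mean + t_crit * se)
    lo, hi = intervals[name]
    print(f"{name}: mean {mean:.1f} kg, 95% CI ({lo:.1f}, {hi:.1f})")
```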


Examples: Point and Interval Estimate of Wt, adjusting for gender in US population

• Suppose we assume the Seasons study male subjects were an SRS from males in the US, and similarly, the female subjects were an independent SRS from females in the US. In 2000, there were 138.05 million males and 143.37 million females in the U.S. Using the Seasons study estimates, what is a point and interval estimate of weight for the US population?

Males: $c_M = \frac{138.05}{138.05 + 143.37} = 0.49$

Females: $c_F = \frac{143.37}{138.05 + 143.37} = 0.51$

$\hat{Z} = c_M \bar{Y} + c_F \bar{X}$


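Example: Linear Combinations of Random variables

$\hat{Z} = c_M \bar{Y} + c_F \bar{X} = \begin{pmatrix} c_M & c_F \end{pmatrix} \begin{pmatrix} \bar{Y} \\ \bar{X} \end{pmatrix}$

$\mathrm{var}\begin{pmatrix} \bar{Y} \\ \bar{X} \end{pmatrix} = \begin{pmatrix} \mathrm{var}(\bar{Y}) & 0 \\ 0 & \mathrm{var}(\bar{X}) \end{pmatrix} = \begin{pmatrix} \sigma_y^2 / n_y & 0 \\ 0 & \sigma_x^2 / n_x \end{pmatrix}$

Estimate:

$\begin{pmatrix} S_Y^2 / n_y & 0 \\ 0 & S_X^2 / n_x \end{pmatrix} = \begin{pmatrix} 250.32/142 & 0 \\ 0 & 253.42/149 \end{pmatrix}$, with $c_M = 0.49$, $c_F = 0.51$

$\sigma_{\hat{z}}^2 = \mathrm{var}(\hat{Z}) = \begin{pmatrix} c_M & c_F \end{pmatrix} \mathrm{var}\begin{pmatrix} \bar{Y} \\ \bar{X} \end{pmatrix} \begin{pmatrix} c_M \\ c_F \end{pmatrix} = c_M^2\, \mathrm{var}(\bar{Y}) + c_F^2\, \mathrm{var}(\bar{X})$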


Example: Linear Combinations of Random variables

$\hat{Z} = 0.49\,(85.9) + 0.51\,(69.73)$

$\hat{\sigma}_{\hat{z}}^2 = 0.49^2\,\frac{250.32}{142} + 0.51^2\,\frac{253.42}{149}$

What are the DF for the t-dist?

If variances are equal, use df=n1+n2-2, and replace individual variance estimates by a pooled variance.

If variances are not equal, see p270-271 in text for df approximation.
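The point estimate and its variance can be evaluated directly. This is a minimal sketch under the slides' assumptions (independent samples, weights rounded to two decimals as on the slides); the variable names are illustrative.

```python
# Population sizes in millions (year 2000) and Table 3 summaries
pop_m, pop_f = 138.05, 143.37
c_m = round(pop_m / (pop_m + pop_f), 2)   # weight for males, rounded as on the slide
c_f = round(pop_f / (pop_m + pop_f), 2)   # weight for females, rounded as on the slide

mean_m, var_m, n_m = 85.90, 250.32, 142
mean_f, var_f, n_f = 69.73, 253.42, 149

# Linear combination of the two independent sample means
z_hat = c_m * mean_m + c_f * mean_f
# Independence means there is no covariance term in the variance
var_z = c_m**2 * var_m / n_m + c_f**2 * var_f / n_f

print(f"c_M = {c_m}, c_F = {c_f}")
print(f"Z-hat = {z_hat:.2f} kg, var(Z-hat) = {var_z:.4f}")
```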


Note: Common estimate of a variance: Pooled Estimate

If we assume the population variance in weight is equal for males and females, we can estimate a pooled (common) variance (see p267 in text):

$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)}$

More generally:

$S_p^2 = \frac{\sum_{g=1}^{G} (n_g - 1) S_g^2}{\sum_{g=1}^{G} (n_g - 1)}$

for Wt:

$S_p^2 = \frac{(142 - 1)\,250.32 + (149 - 1)\,253.42}{(142 - 1) + (149 - 1)} = 251.9$

$\hat{\sigma}_{\hat{z}}^2 = 0.49^2\,\frac{251.9}{142} + 0.51^2\,\frac{251.9}{149}$
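A short sketch of the general pooled-variance formula, applied to the weight data. The function name `pooled_variance` is illustrative; it takes any number of groups.

```python
def pooled_variance(sizes, variances):
    """Pooled variance: each group's variance weighted by its degrees of freedom."""
    numerator = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    denominator = sum(n - 1 for n in sizes)
    return numerator / denominator

# Males and females from Table 3
sp2 = pooled_variance([142, 149], [250.32, 253.42])

# Variance of the weighted mean, using the pooled variance for both groups
var_z_pooled = 0.49**2 * sp2 / 142 + 0.51**2 * sp2 / 149

print(f"Pooled variance = {sp2:.1f}")
print(f"var(Z-hat) with pooled variance = {var_z_pooled:.4f}")
```

With nearly equal sample variances, the pooled and unpooled variance estimates of the weighted mean are almost identical here.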


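Example: Linear Combinations of Random variables

$\hat{Z} = 0.49\,(85.9) + 0.51\,(69.73)$

$\hat{\sigma}_{\hat{z}}^2 = 0.49^2\,\frac{15.82^2}{142} + 0.51^2\,\frac{15.92^2}{149}$

Assuming not equal: from p270-271 in text,

$df = \frac{\left(\frac{S_M^2}{n_m} + \frac{S_F^2}{n_f}\right)^2}{\frac{\left(S_M^2 / n_m\right)^2}{n_m - 1} + \frac{\left(S_F^2 / n_f\right)^2}{n_f - 1}} = \frac{\left(\frac{15.82^2}{142} + \frac{15.92^2}{149}\right)^2}{\frac{\left(15.82^2 / 142\right)^2}{141} + \frac{\left(15.92^2 / 149\right)^2}{148}}$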
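The df approximation can be evaluated numerically. This sketch implements the formula exactly as written on the slide (without the weights c_M and c_F); the function name `approx_df` is illustrative.

```python
def approx_df(s1, n1, s2, n2):
    """Approximate df for two independent samples with unequal variances."""
    v1 = s1**2 / n1      # estimated variance of the first sample mean
    v2 = s2**2 / n2      # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = approx_df(15.82, 142, 15.92, 149)
print(f"Approximate df = {df:.1f}")
```

Because the two sample variances are almost equal, the result is very close to the equal-variance value n1+n2-2 = 289.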


Example: Linear Combinations of Random variables

Wt Mean       constants   Estimate   Var(Mean)
Male          0.49        85.9       1.76
Female        0.51        69.7       1.70
Wt Average                77.6       0.865

df            289
Approx df     288.5
t-0.975       1.968
95% CI        (75.8, 79.5)

Wt is estimated as 77.6 kg with a 95% CI of (75.8, 79.5).
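The interval in the table follows from the weighted estimate, its variance, and the t-percentile. This is a sketch using the tabulated values; the variable names are illustrative.

```python
import math

z_hat = 0.49 * 85.9 + 0.51 * 69.73   # weighted mean weight (kg)
var_z = 0.865                        # Var(Mean) for the weighted average, from the table
t_crit = 1.968                       # t-0.975 from the table

half_width = t_crit * math.sqrt(var_z)
lower, upper = z_hat - half_width, z_hat + half_width
print(f"Estimate {z_hat:.1f} kg, 95% CI ({lower:.1f}, {upper:.1f})")
```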


Examples: Proportion of Subjects who are obese (BMI>30) (see p327 text)

• Estimate the proportion of subjects obese, and a 95% CI
• Create 0/1 variable: 1=obese, 0=normal wt
• Use Z-dist for CI (since np>5)
• Variance estimate: $\hat{\sigma}^2 = \hat{p}(1-\hat{p})$

Obese         Value          Lower 95   Upper 95
No            221
Yes           70
Total         291
P_hat         0.240549828    0.1914     0.2897
var (phat)    0.000627786

See: ejs09b540p34.sas

$\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
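The tabulated values can be reproduced with the normal-approximation interval. This is a minimal sketch; the function name `proportion_interval` is illustrative, and z = 1.96 is assumed for the 0.975 percentile.

```python
import math

def proportion_interval(x, n, z=1.96):
    """Point estimate, var(p-hat), and normal-approximation CI for a proportion."""
    p_hat = x / n
    var_p_hat = p_hat * (1 - p_hat) / n      # estimated variance of the sample proportion
    half_width = z * math.sqrt(var_p_hat)
    return p_hat, var_p_hat, p_hat - half_width, p_hat + half_width

p_hat, var_p_hat, lower, upper = proportion_interval(x=70, n=291)
print(f"p-hat = {p_hat:.4f}, var = {var_p_hat:.9f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```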


Examples: Proportion of Subjects who are obese (BMI>30) (see p327 text)

• A single random variable (0/1) is called a Bernoulli random variable.
• Variance is estimated using the maximum likelihood estimator (biased): $\hat{\sigma}^2 = \hat{p}(1-\hat{p})$
• Usual estimate of the variance (used in other settings) is: $S^2 = \frac{n}{n-1}\,\hat{p}(1-\hat{p})$
• Normal approximation is commonly used when nP>5 and n(1-P)>5 (NOT t-dist)

Example: Sample finds 4 of 10 subjects obese

$\hat{p} = \frac{4}{10} = 0.4$

Note: nP is not large enough here for the normal approximation to be “good”.

95% CI:

$\hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4 \pm 1.96\sqrt{\frac{0.4\,(0.6)}{10}} = (0.10, 0.70)$
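The two variance estimates and the interval for the 4-of-10 example can be compared directly. A sketch only, with illustrative variable names and z = 1.96 assumed.

```python
import math

x, n = 4, 10
p_hat = x / n

var_mle = p_hat * (1 - p_hat)        # maximum likelihood (biased) variance estimate
s2 = n / (n - 1) * var_mle           # usual (unbiased) sample variance

# Normal-approximation interval using the maximum likelihood variance
half_width = 1.96 * math.sqrt(var_mle / n)
lower, upper = p_hat - half_width, p_hat + half_width

print(f"MLE variance = {var_mle:.4f}, usual S^2 = {s2:.4f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

With n=10 the factor n/(n-1) changes the variance estimate by about 11%; the difference shrinks as n grows.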


Examples: Credibility Intervals, Bayesian Approach

Recall that we could estimate the mean using Maximum Likelihood

Example: We select an SRS with replacement of n=10 and observe x=4. What is p?

Solution 1: Use the sample mean:

$\hat{p} = \frac{x}{n} = 0.4$

Solution 2: Use the value of the parameter p that maximizes the likelihood, given the data.

$P(X = 4 \mid p, n = 10) = 210\,p^4 (1-p)^6$

Likelihood: $L(p) = 210\,p^4 (1-p)^6$

The likelihood is a function of p. We can think of a set of possible values of p, e.g. 0, 0.1, 0.2, …, 0.8, 0.9, 1. The maximum likelihood estimate is the value of p where the likelihood is largest.


Binomial Distribution: Likelihood

We select an SRS with replacement of n=10 and observe x=4. What is p?

Parameter p   L(p)      Parameter p   L(p)
0.05          0.001     0.55          0.1596
0.10          0.0112    0.60          0.1115
0.15          0.0401    0.65          0.0689
0.20          0.0881    0.70          0.0368
0.25          0.1460    0.75          0.0162
0.30          0.2001    0.80          0.0055
0.35          0.2377    0.85          0.0012
0.40          0.2508    0.90          0.0001
0.45          0.2384    0.95          0.0000
0.50          0.2051    1.00          0.0000
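The likelihood table can be regenerated by evaluating the binomial likelihood on the grid of p values. This is a minimal sketch; `binom_likelihood` and the grid construction are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binom_likelihood(p, x=4, n=10):
    """Binomial likelihood of p given x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Grid of candidate values 0.05, 0.10, ..., 1.00
grid = [k / 20 for k in range(1, 21)]

for p in grid:
    print(f"{p:.2f}  {binom_likelihood(p):.4f}")

# Maximum likelihood estimate restricted to the grid
p_mle = max(grid, key=binom_likelihood)
print(f"Grid value with the largest likelihood: {p_mle}")
```

The grid search is only an illustration; for the binomial, the maximum is at x/n exactly.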


Binomial Distribution: Maximum Likelihood

Likelihood: $L(p) = 210\,p^4 (1-p)^6$

p       L(p)      p       L(p)
0.05    0.001     0.40    0.2508
0.10    0.0112    0.45    0.2384
0.15    0.0401    0.50    0.2051
0.20    0.0881    0.55    0.1596
0.25    0.1460    0.60    0.1115
0.30    0.2001    0.65    0.0689
0.35    0.2377    etc

[Figure: plot of L(p) against p; the maximum likelihood estimate is at $\hat{p} = \frac{x}{n} = 0.4$]


Examples: Credibility Intervals, Bayesian Approach: Prior

Suppose we assume each parameter value is equally likely. This is called a uniform prior distribution.

Prior distribution: $\pi(p)$

Parameter p   Prior Prob.   Parameter p   Prior Prob.
0.05          0.05          0.55          0.05
0.10          0.05          0.60          0.05
0.15          0.05          0.65          0.05
0.20          0.05          0.70          0.05
0.25          0.05          0.75          0.05
0.30          0.05          0.80          0.05
0.35          0.05          0.85          0.05
0.40          0.05          0.90          0.05
0.45          0.05          0.95          0.05
0.50          0.05          1.00          0.05


Examples: Credibility Intervals, Bayesian Approach: Data|p

We select an SRS with replacement of n=10 and observe x=4. The likelihood is Pr(Data|p):

$P(x \mid p) \propto p^4 (1-p)^6$

Parameter p   L(p|x)    Parameter p   L(p|x)
0.05          0.001     0.55          0.1596
0.10          0.0112    0.60          0.1115
0.15          0.0401    0.65          0.0689
0.20          0.0881    0.70          0.0368
0.25          0.1460    0.75          0.0162
0.30          0.2001    0.80          0.0055
0.35          0.2377    0.85          0.0012
0.40          0.2508    0.90          0.0001
0.45          0.2384    0.95          0.0000
0.50          0.2051    1.00          0.0000


Examples: Credibility Intervals, Bayesian Approach: Posterior

Combining the likelihood and the prior, we have the joint probabilities

$P(p, x) = \pi(p)\,P(x \mid p)$

We sum these probabilities over all possible values of p, and divide each joint probability by this sum to form the posterior probabilities:

$P(p \mid x) = \frac{\pi(p)\,P(x \mid p)}{\sum_{p} \pi(p)\,P(x \mid p)}$


Examples: Credibility Intervals, Bayesian Approach: Posterior

Credibility Intervals are like Confidence Intervals for parameters in the Posterior Distribution (Uniform Prior)

n = 10, x = 4 successes

Columns: prior probability pi(p); P(Success) p; likelihood L(x|p); joint pi(p)*L(x|p); normalized joint (posterior); cumulative posterior.

pi(p)   p      L(x|p)    pi(p)*L(x|p)   Posterior   Cumulative
0.05    0.05   0.00000   0.00000        0.00053     0.00053
0.05    0.1    0.00005   0.00000        0.00614     0.00667
0.05    0.15   0.00019   0.00001        0.02205     0.02872
0.05    0.2    0.00042   0.00002        0.04844     0.07717
0.05    0.25   0.00070   0.00003        0.08030     0.15746
0.05    0.3    0.00095   0.00005        0.11007     0.26753
0.05    0.35   0.00113   0.00006        0.13072     0.39825
0.05    0.4    0.00119   0.00006        0.13795     0.53620
0.05    0.45   0.00114   0.00006        0.13110     0.66730
0.05    0.5    0.00098   0.00005        0.11279     0.78010
0.05    0.55   0.00076   0.00004        0.08776     0.86786
0.05    0.6    0.00053   0.00003        0.06131     0.92917
0.05    0.65   0.00033   0.00002        0.03790     0.96707
0.05    0.7    0.00018   0.00001        0.02022     0.98729
0.05    0.75   0.00008   0.00000        0.00892     0.99621
0.05    0.8    0.00003   0.00000        0.00303     0.99924
0.05    0.85   0.00001   0.00000        0.00069     0.99992
0.05    0.9    0.00000   0.00000        0.00008     1.00000
0.05    0.95   0.00000   0.00000        0.00000     1.00000
0.05    1      0.00000   0.00000        0.00000     1.00000
Totals  1                0.00043        1.00000

Credible Interval: (0.15, 0.70), probability 0.96
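The posterior columns can be reproduced in a few lines. This is a sketch under stated assumptions: the likelihood is taken as p^4(1-p)^6 without the constant 210 (which cancels when normalizing), and the reported 0.96 is read as the cumulative posterior at 0.70 minus the cumulative posterior at 0.15, since that difference matches the table. The function name `grid_posterior` is illustrative.

```python
def grid_posterior(prior, grid, x=4, n=10):
    """Posterior probabilities on a grid: normalize prior * likelihood."""
    likelihood = [p**x * (1 - p) ** (n - x) for p in grid]   # binomial kernel
    joint = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

grid = [k / 20 for k in range(1, 21)]      # 0.05, 0.10, ..., 1.00
prior = [0.05] * len(grid)                 # uniform prior
posterior = grid_posterior(prior, grid)

# Cumulative posterior, as in the last column of the table
cumulative = []
running = 0.0
for w in posterior:
    running += w
    cumulative.append(running)

# grid[2] is 0.15 and grid[13] is 0.70
prob = cumulative[13] - cumulative[2]
print(f"Posterior at p=0.40: {posterior[7]:.5f}")
print(f"Cumulative(0.70) - Cumulative(0.15) = {prob:.2f}")
```

Note that this difference excludes the posterior mass at p=0.15 itself; including it would give a somewhat larger probability.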


Examples: Credibility Intervals, Bayesian Approach: Posterior

Credibility Intervals are like Confidence Intervals for parameters in the Posterior Distribution (Symmetric Prior)

n = 10, x = 4 successes

Columns: prior probability pi(p); P(Success) p; likelihood L(x|p); joint pi(p)*L(x|p); normalized joint (posterior); cumulative posterior.

pi(p)      p          L(x|p)     pi(p)*L(x|p)   Posterior   Cumulative
0.050000   0.050000   0.000005   0.000000       0.000499    0.000499
0.100000   0.100000   0.000053   0.000005       0.011541    0.012040
0.200000   0.150000   0.000191   0.000038       0.082926    0.094965
0.300000   0.200000   0.000419   0.000126       0.273251    0.368217
0.200000   0.250000   0.000695   0.000139       0.301952    0.670169
0.100000   0.300000   0.000953   0.000095       0.206945    0.877114
0.050000   0.350000   0.001132   0.000057       0.122886    1.000000
0.000000   0.400000   0.001194   0.000000       0.000000    1.000000
0.000000   0.450000   0.001135   0.000000       0.000000    1.000000
0.000000   0.500000   0.000977   0.000000       0.000000    1.000000
0.000000   0.550000   0.000760   0.000000       0.000000    1.000000
0.000000   0.600000   0.000531   0.000000       0.000000    1.000000
0.000000   0.650000   0.000328   0.000000       0.000000    1.000000
0.000000   0.700000   0.000175   0.000000       0.000000    1.000000
0.000000   0.750000   0.000077   0.000000       0.000000    1.000000
0.000000   0.800000   0.000026   0.000000       0.000000    1.000000
0.000000   0.850000   0.000006   0.000000       0.000000    1.000000
0.000000   0.900000   0.000001   0.000000       0.000000    1.000000
0.000000   0.950000   0.000000   0.000000       0.000000    1.000000
0.000000   1.000000   0.000000   0.000000       0.000000    1.000000
Totals     1.000000              0.000460       1.000000

Credible Interval: (0.15, 0.35), probability 0.91


Examples: Credibility Intervals, Bayesian Approach: Posterior

Credibility Intervals are like Confidence Intervals for parameters in the Posterior Distribution (Tiered Prior)

n = 10, x = 4 successes

Columns: prior probability pi(p); P(Success) p; likelihood L(x|p); joint pi(p)*L(x|p); normalized joint (posterior); cumulative posterior.

pi(p)     p         L(x|p)    pi(p)*L(x|p)   Posterior   Cumulative
0.01000   0.05000   0.00000   0.00000        0.00010     0.00010
0.10000   0.10000   0.00005   0.00001        0.01105     0.01115
0.20000   0.15000   0.00019   0.00004        0.07941     0.09056
0.20000   0.20000   0.00042   0.00008        0.17444     0.26500
0.20000   0.25000   0.00070   0.00014        0.28914     0.55414
0.10000   0.30000   0.00095   0.00010        0.19817     0.75231
0.03000   0.35000   0.00113   0.00003        0.07060     0.82291
0.02000   0.40000   0.00119   0.00002        0.04967     0.87258
0.02000   0.45000   0.00114   0.00002        0.04721     0.91979
0.02000   0.50000   0.00098   0.00002        0.04062     0.96041
0.01000   0.55000   0.00076   0.00001        0.01580     0.97621
0.01000   0.60000   0.00053   0.00001        0.01104     0.98725
0.01000   0.65000   0.00033   0.00000        0.00682     0.99407
0.01000   0.70000   0.00018   0.00000        0.00364     0.99771
0.01000   0.75000   0.00008   0.00000        0.00161     0.99932
0.01000   0.80000   0.00003   0.00000        0.00055     0.99986
0.01000   0.85000   0.00001   0.00000        0.00012     0.99999
0.01000   0.90000   0.00000   0.00000        0.00001     1.00000
0.01000   0.95000   0.00000   0.00000        0.00000     1.00000
0.01000   1.00000   0.00000   0.00000        0.00000     1.00000
Totals    1.00000             0.00048        1.00000

Credible Interval: (0.15, 0.55), probability 0.89


Examples: Credibility Intervals, Bayesian Approach: Conclusions

Credibility Intervals (for the same data) depend on the Prior Distribution

Prior         Credibility Interval   Confidence
Uniform       (0.15, 0.70)           0.96
Symmetric     (0.15, 0.35)           0.91
Tiered        (0.15, 0.55)           0.89

Frequentist 95% Confidence Interval based on Normal Approximation: (0.10, 0.70)

$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Credibility Interval: intuitive interpretation; the probability that the parameter is in the interval is the confidence.

Frequentist Confidence Interval: awkward interpretation; the interval includes the parameter for 95% of samples, if sampling is repeated.
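The dependence on the prior can be seen by running the same grid calculation with each of the three priors. This is a sketch, not the original spreadsheet: the prior values are copied from the posterior tables, the names `grid_posterior` and `interval_mass` are illustrative, and the "Confidence" column is assumed to be cumulative posterior at the upper limit minus cumulative posterior at the lower limit, which is what matches the reported figures.

```python
def grid_posterior(prior, grid, x=4, n=10):
    """Posterior probabilities on a grid: normalize prior * likelihood."""
    joint = [pr * p**x * (1 - p) ** (n - x) for pr, p in zip(prior, grid)]
    total = sum(joint)
    return [j / total for j in joint]

def interval_mass(posterior, grid, lower, upper):
    """Posterior mass on lower < p <= upper (cumulative difference)."""
    eps = 1e-9   # guards against floating-point noise at the endpoints
    return sum(w for p, w in zip(grid, posterior) if lower + eps < p <= upper + eps)

grid = [k / 20 for k in range(1, 21)]      # 0.05, 0.10, ..., 1.00

priors = {
    "Uniform": [0.05] * 20,
    "Symmetric": [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05] + [0.0] * 13,
    "Tiered": [0.01, 0.10, 0.20, 0.20, 0.20, 0.10, 0.03, 0.02, 0.02, 0.02] + [0.01] * 10,
}
intervals = {
    "Uniform": (0.15, 0.70),
    "Symmetric": (0.15, 0.35),
    "Tiered": (0.15, 0.55),
}

mass = {}
for name, prior in priors.items():
    posterior = grid_posterior(prior, grid)
    lower, upper = intervals[name]
    mass[name] = interval_mass(posterior, grid, lower, upper)
    print(f"{name:10s} ({lower:.2f}, {upper:.2f})  {mass[name]:.2f}")
```

The data (n=10, x=4) are identical in all three runs; only the prior changes, and with it both the interval and its probability.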