Inferential Statistics Estimation...


Transcript of Inferential Statistics Estimation...

Estimation

Estimation - 1

1

Estimation

Confidence Intervals for Means

2

Inferential Statistics

1. Type of Inference:

Estimation

Hypothesis Testing

2. Purpose

Make Decisions

about Population

Characteristics

Population?

3

Estimation Process

Population: $X \sim N(\mu, \sigma)$; the mean, $\mu$, is unknown.

Random sample: $X_1, X_2, \ldots, X_n$, each $X_i \sim N(\mu, \sigma)$.

Sample mean: $\bar{X} = 98$.

4

Random Sample

A random sample is a set of independent and identically distributed (i.i.d.) random variables.

5

Theorem (Distribution of $\bar{X}$)

If $X_1, X_2, \ldots, X_n$ are observations of a random sample of size $n$ from the normal distribution $N(\mu, \sigma^2)$, then the distribution of the sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is $N(\mu, \sigma^2/n)$.

6

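Theorem (Distribution of $\bar{X}$)

If $X_1, X_2, \ldots, X_n$ are observations of a random sample of size $n$ from a distribution that has a mean $\mu$ and a finite variance $\sigma^2$, then the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sqrt{n}\,\sigma}$$

approaches $N(0, 1)$ as $n \to \infty$, and the distribution of the sample mean $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$ for large $n$.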
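The limiting statement can be checked numerically. The sketch below is only an illustration under stated assumptions: the population is taken to be Exponential(1) (mean 1, variance 1), and the sample size, replication count, and seed are arbitrary choices that do not come from the slides.

```python
import random
import statistics

def simulate_sample_means(n, reps, rate=1.0, seed=0):
    """Draw `reps` samples of size n from Exponential(rate); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n))
            for _ in range(reps)]

# Assumed population: Exponential(1), so mu = 1 and sigma = 1 (clearly non-normal).
n, reps = 50, 4000
means = simulate_sample_means(n, reps)

# Standardise each sample mean: Z = (xbar - mu) / (sigma / sqrt(n)).
z_values = [(m - 1.0) / (1.0 / n ** 0.5) for m in means]

z_mean = statistics.fmean(z_values)
z_sd = statistics.stdev(z_values)
share = sum(abs(z) <= 1.96 for z in z_values) / reps

print("mean of Z:", round(z_mean, 3))             # should be near 0
print("std dev of Z:", round(z_sd, 3))            # should be near 1
print("share with |Z| <= 1.96:", round(share, 3)) # should be near 0.95
```

If the theorem applies, the standardised means should behave roughly like draws from $N(0, 1)$ even though the individual observations are skewed.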


7

Statistics Used to Estimate Population Parameters

Statistics (Estimators) | Parameters

Sample Mean, $\bar{x}$ | population mean, $\mu$

Sample Variance, $s^2$ | population variance, $\sigma^2$

Sample Proportion, $\hat{p}$ | population proportion, $p$

8

Sampling Distribution

Theoretical Probability Distribution of the Sample Statistic.

Example: The distribution of the sample mean $\bar{X}$ of a random sample from $N(\mu, \sigma^2)$ is $N(\mu, \sigma^2/n)$.

9

Disadvantage of Point Estimation

1. Provides Single Value

Based on Observations from 1 Sample.

* Sample Mean $\bar{X} = 98$ Is a Point Estimate of the Unknown Population Mean.

2. Gives No Information about How Close Value Is

to the Unknown Population Parameter

Which of the following statistics do you prefer? a. 32% b. 32% with a margin of error 3%

10

Estimation

You’re interested in finding the average body temperature of healthy adults in Northeastern Ohio (the population). What would you do?

How can we estimate this average with a measure of reliability?

98 ± 1 °F    98 ± .5 °F    98 ± .2 °F

11

Interval Estimation

Margin of Error Gives Information about How Close Value Is to the Unknown Population Parameter.

12

Sampling Error

$\bar{x}$: sample statistic (point estimate)

Sampling Error = $|\bar{x} - \mu|$


13

Key Elements of Interval Estimation

[Figure: a confidence interval drawn around the sample statistic (point estimate), running from the lower confidence limit to the upper confidence limit.]

Confidence Level: A probability that the population parameter falls somewhere within the interval.

$\bar{x} \pm$ Margin of Error

98 ± 1 °F

14

Confidence Interval Estimation

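$$P(\mu - z_{\alpha/2}\,\sigma_{\bar{x}} < \bar{X} < \mu + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$$

$$P(\bar{X} - z_{\alpha/2}\,\sigma_{\bar{x}} < \mu < \bar{X} + z_{\alpha/2}\,\sigma_{\bar{x}}) = 1 - \alpha$$

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

Confidence interval: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$; central area $1 - \alpha$ between $\mu - z_{\alpha/2}\sigma_{\bar{x}}$ and $\mu + z_{\alpha/2}\sigma_{\bar{x}}$, with area $\alpha/2$ in each tail.]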

15

Confidence Interval Estimation

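$$P(\mu - 1.96\,\sigma_{\bar{x}} < \bar{X} < \mu + 1.96\,\sigma_{\bar{x}}) = .95$$

$$P(\bar{X} - 1.96\,\sigma_{\bar{x}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{x}}) = .95$$

$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$

Confidence interval: $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$

[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$; central area .95 between $\mu - 1.96\sigma_{\bar{x}}$ and $\mu + 1.96\sigma_{\bar{x}}$, with area .025 in each tail.]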

16

The Confidence Interval

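[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$. 95% of sample means fall between $\mu - 1.96\sigma_{\bar{x}}$ and $\mu + 1.96\sigma_{\bar{x}}$.]

Confidence Level: $1 - \alpha = .95$

$\alpha/2 = .025$ in each tail

$1.96 = z_{.025}$

Confidence Interval => from $\bar{x} - 1.96\sigma_{\bar{x}}$ to $\bar{x} + 1.96\sigma_{\bar{x}}$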

17

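Confidence Interval for Mean ($\sigma$ Known)

$(1-\alpha)\cdot 100\%$ Confidence Interval Estimate for the mean of a normal population:

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Margin of Error: $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

“$\sigma$ Known” may mean that we have a very good estimate of $\sigma$.

It is not practical to assume that we know $\sigma$.

18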

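Confidence Interval of Mean ($\sigma$ Unknown and $n \ge 30$)

$(1-\alpha)\cdot 100\%$ Confidence Interval Estimate for the mean of a population when the sample size is relatively large:

$$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right) \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$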


19

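The Confidence Interval

[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$; 95% of samples give a mean between $\mu - 1.96\sigma_{\bar{x}}$ and $\mu + 1.96\sigma_{\bar{x}}$.]

95% Confidence Interval => from $\bar{x} - 1.96\sigma_{\bar{x}}$ to $\bar{x} + 1.96\sigma_{\bar{x}}$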

20

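The Confidence Interval

[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$; 95% of samples in the centre and 2.5% in each tail, with the interval from each sample drawn underneath.]

95% of intervals contain $\mu$. 5% do not.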
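The claim that about 95% of intervals contain $\mu$ can be illustrated by repeated sampling. The following is a minimal sketch, assuming a normal population with known $\sigma$; the values of `mu`, `sigma`, `n`, `reps`, and `seed` are illustrative assumptions and are not taken from the slides.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=98.0, sigma=1.0, n=25, reps=5000, confidence=0.95, seed=1):
    """Fraction of simulated z-intervals (sigma known) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

observed = coverage()
print("observed coverage:", observed)  # expected to be close to 0.95
```

Each simulated sample yields a different interval; the confidence level describes how often the procedure captures $\mu$ over many samples.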

21

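Factors Affecting Interval Width

1. Data Dispersion

Measured by $\sigma$

2. Sample Size

Affects standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

3. Level of Confidence $(1 - \alpha)$

Affects $z_{\alpha/2}$

$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$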

22

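Size of Interval

[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$.]

90% Samples: $\mu \pm 1.65\sigma_{\bar{x}}$

95% Samples: $\mu \pm 1.96\sigma_{\bar{x}}$

99% Samples: $\mu \pm 2.58\sigma_{\bar{x}}$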
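The multipliers 1.65, 1.96, and 2.58 are rounded upper-tail quantiles of the standard normal distribution. As a small sketch, they can be recovered with Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

To three decimals the values are 1.645, 1.960, and 2.576, which the slide rounds to two decimals.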

23

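Estimation Example

Mean ($\sigma$ Known)

The average weight of a random sample of n = 25 subjects is $\bar{X} = 140$. Set up a 95% confidence interval estimate for $\mu$ if $\sigma = 10$. (Assume Normal population.)

$1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$, $z_{.025} = 1.96$.

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(140 - 1.96\cdot\frac{10}{\sqrt{25}},\ 140 + 1.96\cdot\frac{10}{\sqrt{25}}\right) = (136.08,\ 143.92)$$

or

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 140 \pm 1.96\cdot\frac{10}{\sqrt{25}} = 140 \pm 3.92$$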
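The arithmetic of this example can be reproduced in a few lines. This is a sketch only; the function name `z_interval` and its return shape are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided z-interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                        # margin of error
    return xbar - margin, xbar + margin, margin

lower, upper, margin = z_interval(xbar=140, sigma=10, n=25)
print(round(lower, 2), round(upper, 2), round(margin, 2))  # 136.08 143.92 3.92
```

Here the critical value is computed rather than read from a table, so it is 1.95996 instead of the rounded 1.96; the result agrees with the slide to two decimals.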

24

Interpretation

We can be 95% confident that the population mean is in (136.08, 143.92).

We can be 95% confident that the sampling error in using the sample mean to estimate the population mean is at most 3.92.


25

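Confidence Interval of Mean ($\sigma$ Unknown and $n \ge 30$)

$(1-\alpha)\cdot 100\%$ Confidence Interval Estimate for the mean of a population when the sample size is relatively large:

$$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right) \quad\text{or}\quad \bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$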

26

Thinking Challenge

Example: A city uses a certain noise index to monitor the noise pollution at a certain area of the city. A random sample of 100 observations from randomly selected days around noon showed an average index value of x = 1.99 and standard deviation s = 0.05. Find the 90% confidence interval estimate of the average noise index at noon.

27

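Confidence Interval Solution*

$1 - \alpha = .90$, $\alpha = .1$, $\alpha/2 = .05$, $z_{\alpha/2} = z_{.05} = 1.64$

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 1.99 \pm 1.64\cdot\frac{.05}{\sqrt{100}} = 1.99 \pm 0.008$$

$$(1.982,\ 1.998)$$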

28

Interval Estimation for Mean

In a survey of a random sample of 64 individuals who gambled in Las Vegas, the average amount of money won on the day the survey was done was –$25.50, with a standard deviation of $100. Find the 95% confidence interval estimate for the average amount of money won by people who gambled in Las Vegas that day.
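One way to work this kind of exercise is the large-sample formula $\bar{x} \pm z_{\alpha/2}\, s/\sqrt{n}$. The sketch below assumes that formula applies because n = 64 is at least 30; the function name `large_sample_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_interval(xbar, s, n, confidence):
    """Large-sample z-interval for a mean, with s standing in for sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

# Las Vegas survey: n = 64, mean winnings -25.50, standard deviation 100.
vegas = large_sample_interval(xbar=-25.50, s=100, n=64, confidence=0.95)
print([round(v, 1) for v in vegas])   # [-50.0, -1.0]

# Noise-index example: n = 100, mean 1.99, standard deviation 0.05, 90% level.
noise = large_sample_interval(xbar=1.99, s=0.05, n=100, confidence=0.90)
print([round(v, 3) for v in noise])   # [1.982, 1.998]
```

Under these assumptions the Las Vegas interval runs from about –$50.00 to –$1.00, and the noise-index call reproduces the (1.982, 1.998) result of the earlier solution.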

29

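Finding Sample Sizes for Estimating $\mu$

I don’t want to sample too much or too little!

C.I.: $\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

Margin of Error: $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

$$n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$$

B = Margin of Error or Bound (written $E$ in the formulas)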

30

Sample Size Example

What sample size is needed to be 90% confident of being correct within ± 5? A pilot study suggested that the standard deviation is 45.

$$n = \frac{z_{.05}^{2}\,\sigma^{2}}{E^{2}} = \frac{1.645^{2}\cdot 45^{2}}{5^{2}} = 219.2 \approx 220$$


31

Thinking Challenge

You plan to survey residents in your county to find the average health insurance premium that they are paying. You want to be 95% confident that the sample mean is within ± $50.

A pilot study showed that $\sigma$ was about $400. What sample size should you use?

32

Sample Size Solution*

$$n = \frac{z_{.025}^{2}\,\sigma^{2}}{E^{2}} = \frac{1.96^{2}\cdot 400^{2}}{50^{2}} = 245.86 \approx 246$$
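The sample-size formula $n = z_{\alpha/2}^2 \sigma^2 / E^2$ is easy to wrap in a helper that always rounds up. This is a sketch; the function name `sample_size_for_mean` is an assumption, and the inputs are the two pilot-study examples from the slides.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)  # round up: n must be a whole number

n_pilot_45 = sample_size_for_mean(sigma=45, margin=5, confidence=0.90)
n_premium = sample_size_for_mean(sigma=400, margin=50, confidence=0.95)
print(n_pilot_45, n_premium)  # 220 246
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.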

33

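Confidence Interval Mean ($\sigma$ Unknown & n < 30)

1. Assumptions

Population Standard Deviation Is Unknown

Population Must Be Normally Distributed

2. Use Student’s t Distribution

3. Confidence Interval Estimate

$$\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) \quad\text{or}\quad \bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$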

34

Student’s t Distribution

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

[Figure: density curves centred at 0 for the Standard Normal (Z), t (df = 13), and t (df = 5).]

Bell-Shaped

Symmetric

‘Fatter’ Tails

35

Theorem (Distribution of $\bar{X}$ and $S^2$)

If $X_1, X_2, \ldots, X_n$ are observations of a random sample of size $n$ from the normal distribution $N(\mu, \sigma^2)$, then the statistics, sample mean

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

and sample variance

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2,$$

are independent, and

$$\frac{(n-1)S^2}{\sigma^2} \ \text{is}\ \chi^2(n-1).$$

36

Student’s t Distribution

Let $Z$ be a random variable that is $N(0, 1)$, and $U$ be a random variable that is $\chi^2(r)$, and $Z$ and $U$ are independent. Then, the random variable

$$T = \frac{Z}{\sqrt{U/r}}$$

has a t-distribution with degrees of freedom $r$.


37

Student’s t Distribution

If $X_1, X_2, \ldots, X_n$ are observations of a random sample of size $n$ from the normal distribution $N(\mu, \sigma^2)$, then the statistics, sample mean, $\bar{X}$, and sample variance, $S^2$, are independent and

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with d.f. $n - 1$.

38

t-statistic

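$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{Z}{\sqrt{U/(\text{d.f. of } U)}}$$

where $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ and $U = \dfrac{(n-1)S^2}{\sigma^2}$.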

39 40

Student’s t Table

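t values or percentiles in the t-distribution

For a 90% C.I.:

n = 3

df = n - 1 = 2

$\alpha$ = .10

$\alpha/2$ = .05

$t_{\alpha/2}$ = ? From the table, $t_{.05,\,2}$ = 2.920

[Figure: t-distribution with upper-tail area .05 to the right of t = 2.920.]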

41

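Estimation Example

Mean ($\sigma$ Unknown)

A random sample of weights of 25 subjects has a sample mean of 140 and a sample standard deviation of 8. Set up a 95% confidence interval estimate for $\mu$.

$1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$, df = 24, $t_{\alpha/2,\,df} = t_{.025,\,24} = 2.064$

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 140 \pm 2.064\cdot\frac{8}{\sqrt{25}} = 140 \pm 3.30$$

$$(136.70,\ 143.30)$$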

42

Thinking Challenge

The number of community hospital beds per 1000 population available in the different regions of the country is normally distributed. A random sample of 6 regions was selected, and the recorded rates of beds per 1000 were 3.6, 4.2, 4.0, 3.5, 3.8, 3.1.

Find the 90% confidence interval estimate of the mean bed-rate in the country.


43

Confidence Interval Solution*

$\bar{x}$ = 3.7

s = 0.38987

$\frac{s}{\sqrt{n}} = \frac{.38987}{\sqrt{6}} = .1592$

(use 90% confidence level)

n = 6, df = n − 1 = 6 − 1 = 5

$t_{.05,\,5}$ = 2.015

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

( 3.7 - (2.015)(0.1592), 3.7 + (2.015)(0.1592) )

( 3.379, 4.021 )
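The same computation can be done from the raw data. In this sketch the critical value 2.015 is passed in as read from a t table, because Python's standard library has no t quantile function; the function name `t_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """t-interval for a mean; t_crit is t_{alpha/2, n-1} looked up in a t table."""
    n = len(data)
    xbar = mean(data)
    std_err = stdev(data) / sqrt(n)   # stdev() uses the n - 1 divisor
    return xbar - t_crit * std_err, xbar + t_crit * std_err

beds = [3.6, 4.2, 4.0, 3.5, 3.8, 3.1]             # beds per 1000, six regions
lower, upper = t_interval(beds, t_crit=2.015)     # t_{.05, 5} for a 90% interval
print(round(lower, 3), round(upper, 3))           # 3.379 4.021
```

Swapping in a different data list and table value gives the interval for any other small normal sample.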

44

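Confidence interval with z-score:

The $(1 - \alpha)\cdot 100\%$ confidence interval estimate for population mean $\mu$:

Assumption: If sampled from a normal population with known standard deviation, $\sigma$,

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Assumption: If the sample is large and the variance is unknown, $s$ replaces $\sigma$,

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$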

45

Confidence interval with t-score:

The $(1 - \alpha)\cdot 100\%$ confidence interval estimate for population mean $\mu$:

Assumption: If sampled from a normal population with unknown standard deviation, $\sigma$,

$$\bar{x} \pm t_{\alpha/2,\,df = n-1}\frac{s}{\sqrt{n}}$$

(If the sample size is large, the normality assumption is insignificant.) $t \to z$ as the sample becomes large.

46

Average Weight for Ten-Year-Old Female Children in the US

Info. from a random sample: n = 10, $\bar{x}$ = 80 lb, s = 18.05 lb. Assuming weight is normally distributed, find the 95% confidence interval estimate for the average weight.

Data: 73.80 50.00 101.40 67.20 102.20 97.80 81.00 93.40 63.20 70.00

How do we know whether the normality assumption is OK?

47

Tests of Normality

Variable | Kolmogorov-Smirnov(a): Statistic, df, Sig. | Shapiro-Wilk: Statistic, df, Sig.

weight (pounds) of participant | .171, 10, .200* | .930, 10, .452

*. This is a lower bound of the true significance.

a. Lilliefors Significance Correction

Both are greater than 0.05, normality assumption is acceptable.

48

Average Weight for Ten-Year-Old Female Children in the US

Info. from a random sample: n = 10, $\bar{x}$ = 80 lb, s = 18.05 lb. Assuming weight is normally distributed, find the 95% confidence interval estimate for the average weight.

$t_{\alpha/2} = t_{.05/2} = t_{0.025}$, d.f. = 10 – 1 = 9, $t_{0.025,\,9} = 2.262$

$$\bar{x} \pm t_{\alpha/2,\,df=9}\frac{s}{\sqrt{n}} = 80 \pm 2.262\cdot\frac{18.05}{\sqrt{10}} = 80 \pm 12.91$$

$$(67.09,\ 92.91)$$


49

Descriptives: weight (pounds) of participant, by “What is your sex?”

Statistic | female | male

Mean | 80.0000 (Std. Error 5.70840) | 86.8600 (Std. Error 3.96048)

95% Confidence Interval for Mean, Lower Bound | 67.0867 | 77.9008

95% Confidence Interval for Mean, Upper Bound | 92.9133 | 95.8192

5% Trimmed Mean | 80.4333 | 86.5222

Median | 77.4000 | 82.8000

Variance | 325.858 | 156.854

Std. Deviation | 18.05153 | 12.52413

Minimum | 50.00 | 73.80

Maximum | 102.20 | 106.00

Range | 52.20 | 32.20

Interquartile Range | 32.5000 | 25.5500

Skewness | -.148 (Std. Error .687) | .553 (Std. Error .687)

Kurtosis | -1.229 (Std. Error 1.334) | -1.422 (Std. Error 1.334)

80 ± 12.91: Weight for Ten Year Old

50

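One-Sided C. I.

Z C.I.:

Lower interval: $\left(-\infty,\ \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}\right)$

Upper interval: $\left(\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}},\ \infty\right)$

t C.I.:

Lower interval: $\left(-\infty,\ \bar{x} + t_{\alpha}\frac{s}{\sqrt{n}}\right)$

Upper interval: $\left(\bar{x} - t_{\alpha}\frac{s}{\sqrt{n}},\ \infty\right)$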

51

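Size of Interval

[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$; 95% of samples (area .95) fall between $\mu - 1.96\sigma_{\bar{x}}$ and $\mu + 1.96\sigma_{\bar{x}}$, with area 0.025 in each tail.]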

52

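Lower Interval

[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{x}}$; 95% of samples (area .95) fall below $\mu + 1.64\sigma_{\bar{x}}$, with area 0.05 in the upper tail.]

$z_{\alpha} = z_{.05}$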

53

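Estimation Example

Mean ($\sigma$ Known)

The mean of a random sample of weights of 25 subjects is $\bar{x} = 140$. Set up an upper 95% confidence interval estimate for $\mu$ if $\sigma = 10$.

$1 - \alpha = .95$, $\alpha = .05$, $z_{.05} = 1.64$.

$$\left(\bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}},\ \infty\right) = \left(140 - 1.64\cdot\frac{10}{\sqrt{25}},\ \infty\right) = (136.72,\ \infty)$$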
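A one-sided bound places all of $\alpha$ in a single tail, so the multiplier is $z_{\alpha}$ rather than $z_{\alpha/2}$. The sketch below computes both one-sided bounds for this example; the function name `one_sided_bounds` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_bounds(xbar, sigma, n, confidence=0.95):
    """Return (lower bound of the upper interval, upper bound of the lower interval)."""
    z = NormalDist().inv_cdf(confidence)   # z_alpha: all of alpha in one tail
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lower_bound, upper_bound = one_sided_bounds(xbar=140, sigma=10, n=25)
print(round(lower_bound, 2), round(upper_bound, 2))  # 136.71 143.29
```

The computed $z_{.05}$ is 1.645, so the lower bound comes out as 136.71; the slide rounds the critical value to 1.64, which gives 136.72.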

54

Estimation

Confidence Intervals for Proportions


55

Proportion Estimation

Parameter: Population Proportion $p$ (or $\pi$)

(Percentage of people who have no health insurance)

Statistic: Sample Proportion $\hat{p} = \frac{x}{n}$

x is the number of successes

n is the sample size

Remark: If data is coded as 1 or 0, the sample mean is the same as the sample proportion of 1’s.

Data: 1, 0, 0, 1, 0

$$\bar{x} = \frac{1 + 0 + 0 + 1 + 0}{5} = \frac{2}{5} = .4 = \hat{p}$$
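The remark about 0/1 coding can be confirmed directly on the five data values. A minimal sketch:

```python
from statistics import mean

data = [1, 0, 0, 1, 0]            # 1 = success, 0 = failure

x = sum(data)                     # number of successes
n = len(data)                     # sample size
p_hat = x / n                     # sample proportion of 1's
x_bar = mean(data)                # ordinary sample mean of the coded data

print(p_hat, x_bar)               # 0.4 0.4
```

Because every success contributes 1 to the sum and every failure contributes 0, the sum of the coded data is the count of successes, which is why the two quantities agree.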

56

Confidence Interval

Proportion

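1. Assumptions

Two Categorical Outcomes

Normal Approximation Can Be Used If $np$ and $n(1 - p)$ are both greater than 5.

2. Confidence Interval Estimate (for large sample)

$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) \quad\text{or}\quad \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$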

57

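Parameters of Sample Proportion

$X \sim \text{Binomial}(n, p)$,

$E[X] = np$, $\mathrm{Var}[X] = np(1 - p)$

$\hat{p} = \frac{X}{n} \sim\ ?$

$E[X/n] = ?$, $\mathrm{Var}[X/n] = ?$

Answers: $E[X/n] = p$, $\mathrm{Var}[X/n] = \frac{p(1-p)}{n}$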
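Both answers follow from the binomial moments stated on this slide, using linearity of expectation and the rule $\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$. A short derivation:

```latex
\begin{align*}
E[\hat{p}] &= E\!\left[\frac{X}{n}\right] = \frac{1}{n}\,E[X] = \frac{np}{n} = p, \\
\mathrm{Var}[\hat{p}] &= \mathrm{Var}\!\left[\frac{X}{n}\right]
  = \frac{1}{n^{2}}\,\mathrm{Var}[X] = \frac{np(1-p)}{n^{2}} = \frac{p(1-p)}{n}.
\end{align*}
```

The square root of the variance, $\sqrt{p(1-p)/n}$, is the standard error that appears in the large-sample interval for a proportion, with $\hat{p}$ substituted for the unknown $p$.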

58

Estimation Example

Proportion

A random sample of 400 from a large community showed that 32 have diabetes. Set up a 95% confidence interval estimate for p, the percentage of people that have diabetes.

$\hat{p} = \frac{32}{400} = .08$, $n = 400$, $z_{\alpha/2} = z_{.025} = 1.96$

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

59

Estimation Example

Proportion

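The 95% C.I. for p, the percentage of people that have diabetes:

$\hat{p} = \frac{32}{400} = .08$, $n = 400$

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .08 \pm 1.96\sqrt{\frac{.08(1 - .08)}{400}} = .08 \pm .027 = 8\% \pm 2.7\%$$

$$(.053,\ .107)$$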
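The large-sample interval for a proportion can be sketched as below. The function name `proportion_interval` is an illustrative assumption, and the check that both counts exceed 5 applies the normal-approximation rule to the sample counts, a common practical stand-in for $np$ and $n(1-p)$.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Large-sample (asymptotic) interval for a proportion from x successes in n trials."""
    if x <= 5 or n - x <= 5:
        raise ValueError("normal approximation is doubtful: need more than 5 of each outcome")
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = proportion_interval(x=32, n=400)
print(round(lower, 3), round(upper, 3))  # 0.053 0.107
```

The result matches the (.053, .107) interval for the diabetes example.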

60

Thinking Challenge

A member of a health department wishes to see what percentage of people in a community will support an environmental policy. Of 200 survey forms sent and received, 35 respondents said that they support the policy and the rest said that they do not.

Find a 90% confidence interval estimate of the percentage of the population in this community that supports the policy.


61

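Confidence Interval Solution*

$\hat{p} = \frac{35}{200} = .175$, $n = 200$, $z_{\alpha/2} = 1.645$

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .175 \pm 1.645\sqrt{\frac{.175(.825)}{200}} = .175 \pm .0442 = 17.5\% \pm 4.42\%$$

$$(13.08\%,\ 21.92\%)$$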

62

Example:

Researchers wish to estimate the percentage of

hospital employees infected by SARS in a

certain country. Out of 500 randomly chosen

hospital employees, 14 were infected. Find the

95% confidence interval estimate for

percentage of hospital employees infected by

SARS in this country.

63

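Sample Size

C.I.: $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Margin of Error: $E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

$$n = \frac{z_{\alpha/2}^{2}}{E^{2}}\,\hat{p}(1-\hat{p})$$

if pilot study is done,

or

$$n = \frac{z_{\alpha/2}^{2}}{E^{2}}\times 0.25$$

to get the largest sample to achieve the goal.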

64

Sample Size (No prior information on p)

Sample Size Example: If one wishes to do a

survey to estimate the population proportion

with 95% confidence and a margin of error of

3%, how large a sample is needed?

$z_{\alpha/2}$ = 1.96; E = .03

n = (1.96² / .03²) × .25 = 1067.11

A sample of size 1068 is needed.

65

Sample Size (With prior information on p)

Sample Size Example: If one wishes to estimate the percentage of people infected with West Nile in a population with 95% confidence and a margin of error of 3%, how large a sample is needed? (A pilot study has been done, and the sample proportion was 6%.)

$z_{\alpha/2}$ = 1.96; E = .03

n = (1.96² / .03²) × .06 × (1 – .06) = 240.7

A sample of size 241 is needed.
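Both sample-size cases, with and without a pilot estimate, fit one small helper. This is a sketch; the function name `sample_size_for_proportion` and the parameter name `p_guess` are illustrative. It defaults to 0.5 because $p(1-p)$ is largest there, which is the 0.25 factor used when nothing is known about p.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n so that z_{alpha/2} * sqrt(p(1-p)/n) <= margin for the guessed p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

n_no_prior = sample_size_for_proportion(margin=0.03)                 # worst case p = 0.5
n_with_pilot = sample_size_for_proportion(margin=0.03, p_guess=0.06) # pilot estimate 6%
print(n_no_prior, n_with_pilot)  # 1068 241
```

A pilot estimate far from 0.5 reduces the required sample considerably, here from 1068 to 241.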

How large a sample was used for pilot study?

66

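Wilson’s method (Unbiased Estimation)

$$\frac{\hat{p} + \dfrac{z_{\alpha/2}^{2}}{2n} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{z_{\alpha/2}^{2}}{4n^{2}}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}}$$

Asymptotic Method

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$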

For x = 16, n = 100, 95% CI is (0.121, 0.287)

For x = 16, n = 100, 95% CI is (0.101, 0.244)
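The two formulas can be compared numerically for x = 16, n = 100. The following is a sketch of the standard Wilson score interval and of the plain asymptotic (Wald) interval; the function names are illustrative. Under these inputs the Wilson function reproduces the (0.101, 0.244) interval quoted above.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(x, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def asymptotic_interval(x, n, confidence=0.95):
    """Plain large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

wilson = wilson_interval(16, 100)
wald = asymptotic_interval(16, 100)
print("Wilson:    ", [round(v, 3) for v in wilson])
print("Asymptotic:", [round(v, 3) for v in wald])
```

The Wilson interval is centred slightly toward 0.5 instead of at $\hat{p}$, which is the main practical difference between the two methods for moderate n.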