Lecture 4 • Chapter 11 wrap-up • Chapter 12.2 - Inference about the mean when the s.d. is unknown • Chapter 12.3 – Inference about a population proportion
Date posted: 20-Dec-2015

Lecture 4

• Chapter 11 wrap-up

• Chapter 12.2 - Inference about the mean when the s.d. is unknown

• Chapter 12.3 – Inference about a population proportion

Hypothesis Testing – Basic Steps

1. Set up alternative and null hypotheses

2. Calculate test statistic, e.g. z-score

3. Find critical values and compare the test statistic to critical value (rejection region method) or find p-value (p-value method)

4. Make substantive conclusions.

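Right-, Left-, and Two-Sided Tests

• Right-sided: $H_0: \mu = \mu_0$; $H_1: \mu > \mu_0$; rej.: $\bar{x} > \bar{x}_L$, where $\bar{x}_L = \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$

• Left-sided: $H_0: \mu = \mu_0$; $H_1: \mu < \mu_0$; rej.: $\bar{x} < \bar{x}_L$, where $\bar{x}_L = \mu_0 - z_\alpha \frac{\sigma}{\sqrt{n}}$

• 2-sided: $H_0: \mu = \mu_0$; $H_1: \mu \neq \mu_0$; rej.: $\bar{x} < \mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ or $\bar{x} > \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ (a lower and an upper $\bar{x}_L$)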

Summary: Steps in Testing

• Determine $H_0$ ($\mu = \mu_0$) and $H_1$ (right // left // 2-sided), and decide on a significance level $\alpha$.

• Rejection region method: calculate $\bar{x}_L$ and reject $H_0$ if $\bar{x} > \bar{x}_L$ // $\bar{x} < \bar{x}_L$ // $\bar{x}$ falls below the lower or above the upper $\bar{x}_L$.

• P-value method: calculate $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ => P(Z>z) // P(Z<z) // P(|Z|>|z|) from z-tables, or “Prob>z” // ”Prob<z” // “Prob>|z|” from JMP; reject $H_0$ if p-value $< \alpha$.

• Interpret the result and tell a story.
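The testing steps summarized above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `z_test`, its arguments, and the numbers in the usage line are hypothetical and do not come from the lecture, and the standard normal table is replaced by `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alpha=0.05, side="right"):
    """One-sample z-test with known sigma (hypothetical helper).

    Returns the z statistic, the p-value, and the decision reached by
    the rejection region method and by the p-value method.
    """
    std_normal = NormalDist()          # standard normal, replaces the z-table
    se = sigma / sqrt(n)               # standard error of the sample mean
    z = (xbar - mu0) / se

    if side == "right":
        x_L = mu0 + std_normal.inv_cdf(1 - alpha) * se
        reject_by_region = xbar > x_L
        p_value = 1 - std_normal.cdf(z)            # P(Z > z)
    elif side == "left":
        x_L = mu0 - std_normal.inv_cdf(1 - alpha) * se
        reject_by_region = xbar < x_L
        p_value = std_normal.cdf(z)                # P(Z < z)
    elif side == "two":
        half = std_normal.inv_cdf(1 - alpha / 2) * se
        reject_by_region = xbar < mu0 - half or xbar > mu0 + half
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # P(|Z| > |z|)
    else:
        raise ValueError("side must be 'right', 'left' or 'two'")

    return {
        "z": z,
        "p_value": p_value,
        "reject_by_region": reject_by_region,
        "reject_by_p_value": p_value < alpha,
    }


# Made-up numbers, for illustration only.
result = z_test(xbar=52.0, mu0=50.0, sigma=10.0, n=100, alpha=0.05, side="right")
```

Both methods should always lead to the same decision; the sketch returns both so that this can be checked.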

Relationship Between CIs and Hypothesis Tests

• There is a duality between confidence intervals and hypothesis tests

• We can construct a level $\alpha$ hypothesis test of $H_0: \mu = \mu_0$ based on a level $100(1-\alpha)\%$ confidence interval by rejecting $H_0$ if and only if $\mu_0$ is not in the confidence interval

• We can construct a level $100(1-\alpha)\%$ confidence interval based on a level $\alpha$ hypothesis test by including $\mu_0$ in the confidence interval if and only if the test does not reject $H_0: \mu = \mu_0$
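The duality can be checked numerically for the two-sided z-test with known standard deviation. The sketch below is an illustration only: the helper names and the sample values are hypothetical, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """100(1 - alpha)% confidence interval for the mean, sigma known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def two_sided_z_test_rejects(xbar, mu0, sigma, n, alpha=0.05):
    """True if the level-alpha two-sided z-test rejects H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


# Made-up sample summary, for illustration only.
xbar, sigma, n = 52.0, 10.0, 100
low, high = z_confidence_interval(xbar, sigma, n)

# For each candidate mu0, "outside the interval" and "test rejects" coincide.
agreement = [
    (mu0, not (low <= mu0 <= high), two_sided_z_test_rejects(xbar, mu0, sigma, n))
    for mu0 in (49.0, 50.0, 51.0, 52.0, 53.0, 54.0, 55.0)
]
```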

Calculation of Type II error

1. State alternative for which you want to find P(Type II error).

2. Find rejection region in terms of unstandardized statistic (sample mean)

3. Find the probability of the sample mean falling outside the rejection region if the alternative under consideration is true (use standardization relative to the alternative hypothesis mean to calculate this probability).

Summary: Power Calculations

• Works only for the rejection region method, and we don’t do it for 2-sided tests.

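• Calculate $\bar{x}_L$ for the level-$\alpha$ test.

• Right-sided: P(Z<z) from z-table; $\beta$ = P(Z<z)

• Left-sided: P(Z>z) from z-table; $\beta$ = P(Z>z)

• In both cases $z = \frac{\bar{x}_L - \mu_1}{\sigma/\sqrt{n}}$, where $\mu_1$ is the alternative mean under consideration; the power is $1 - \beta$.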
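The Type II error calculation for one-sided tests might be sketched as follows. The function `type_2_error` and the example values are hypothetical; `statistics.NormalDist` stands in for the z-table.

```python
from math import sqrt
from statistics import NormalDist


def type_2_error(mu0, mu1, sigma, n, alpha=0.05, side="right"):
    """P(Type II error) of a one-sided z-test when the true mean is mu1.

    Step 1: find the rejection limit x_L on the scale of the sample mean.
    Step 2: standardize x_L relative to the alternative mean mu1.
    Step 3: beta is the probability of landing outside the rejection region.
    """
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_alpha = std_normal.inv_cdf(1 - alpha)

    if side == "right":
        x_L = mu0 + z_alpha * se
        z = (x_L - mu1) / se
        return std_normal.cdf(z)          # beta = P(Z < z)
    if side == "left":
        x_L = mu0 - z_alpha * se
        z = (x_L - mu1) / se
        return 1 - std_normal.cdf(z)      # beta = P(Z > z)
    raise ValueError("side must be 'right' or 'left'")


# Made-up values, for illustration only.
beta = type_2_error(mu0=50.0, mu1=53.0, sigma=10.0, n=100, alpha=0.05, side="right")
power = 1 - beta
```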

Frequent $\alpha$-values

$\alpha$:      0.10   0.05   0.025   0.01   0.005

$z_\alpha$:    1.28   1.64   1.96    2.33   2.58

Practice Problems

• 11.68, 11.84, 12.40, 12.46

Chapter 12

• In this chapter we utilize the approach developed before to describe a population.

– Identify the parameter to be estimated or tested.

– Specify the parameter’s estimator and its sampling distribution.

– Construct a confidence interval estimator or perform a hypothesis test.

12.2 Inference About a Population Mean When the Population Standard Deviation Is Unknown

Recall that when $\sigma$ is known we use the following statistic to estimate and test a population mean:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

When $\sigma$ is unknown, we use its point estimator s, and the z-statistic is then replaced by the t-statistic.

t-Statistic

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

• When the sampled population is normally distributed, the t statistic is Student t distributed with n-1 degrees of freedom.

• Confidence Interval: $\bar{x} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}}$, where $t_{\alpha/2,\,n-1}$ is the upper $\alpha/2$ quantile of the Student t-distribution with n-1 degrees of freedom.


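The t-Statistic

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ becomes $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ when s replaces $\sigma$.

The t distribution is mound-shaped, and symmetrical around zero.

The “degrees of freedom” (a function of the sample size) determine how spread out the distribution is (compared to the normal distribution).

[Figure: t densities centered at 0 for d.f. = v1 and d.f. = v2, with v1 < v2.]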

$t_A$ is the value of the t distribution with upper-tail area A (the figure marks A = .05).

Degrees of Freedom   t.100   t.05    t.025    t.01     t.005

1                    3.078   6.314   12.706   31.821   63.657

2                    1.886   2.920   4.303    6.965    9.925

...                  ...     ...     ...      ...      ...

20                   1.325   1.725   2.086    2.528    2.845

...                  ...     ...     ...      ...      ...

200                  1.286   1.653   1.972    2.345    2.601

∞                    1.282   1.645   1.960    2.326    2.576

Testing when $\sigma$ is unknown

• Example 12.1

– In order to determine the number of workers required to meet demand, the productivity of newly hired trainees is studied.

– It is believed that trainees can process and distribute more than 450 packages per hour within one week of hiring.

– Fifty trainees were observed for one hour. In this sample of 50 trainees, the mean number of packages processed is 460.38 and s=38.82.

– Can we conclude that the belief is correct, based on the productivity observation of 50 trainees?
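A possible worked calculation for Example 12.1, using the sample figures given above (n = 50, mean 460.38, s = 38.82) and the right-sided hypotheses $H_0: \mu = 450$, $H_1: \mu > 450$. The helper functions are hypothetical: since the Python standard library has no Student t distribution, the upper-tail probability is approximated here by Simpson's rule on the t density. The 5% significance level is an assumption, as the example does not state one.

```python
from math import sqrt, lgamma, exp, log, pi


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) by Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_a = total * h / 3
    upper = 0.5 - area_0_to_a           # symmetry: P(T > 0) = 0.5
    return upper if t >= 0 else 1 - upper


# Figures from Example 12.1.
n, xbar, s, mu0 = 50, 460.38, 38.82, 450.0
alpha = 0.05                            # assumed significance level

t_stat = (xbar - mu0) / (s / sqrt(n))   # test statistic with n - 1 = 49 d.f.
p_value = t_upper_tail(t_stat, df=n - 1)
reject_h0 = p_value < alpha
```

Under these assumptions the p-value falls below 0.05, so the data support the belief that trainees process more than 450 packages per hour.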

Checking the required conditions

• In deriving the test and confidence interval, we have made two assumptions: (i) the sample is a random sample from the population; (ii) the distribution of the population is normal.

• The t test is robust – the results are still approximately valid as long as the population is not extremely nonnormal. Also if the sample size is large, the results are approximately valid.

• A rough graphical approach to examining normality is to look at the sample histogram.

[Figure: JMP Distributions histogram of Packages, horizontal axis from 350 to 550.]

JMP Example

• Problem 12.45: Companies that sell groceries over the Internet are called e-grocers. Customers enter their orders, pay by credit card, and receive delivery by truck. A potential e-grocer analyzed the market and determined that to be profitable the average order would have to exceed $85. To determine whether an e-grocer would be profitable in one large city, she offered the service and recorded the size of the order for a random sample of customers. Can we infer from the data that e-grocery will be profitable in this city at significance level 0.05?

12.3 Inference About a Population Variance

• Sometimes we are interested in making inference about the variability of processes.

• Examples:

– The consistency of a production process for quality control purposes.

– Investors use variance as a measure of risk.

• To draw inference about variability, the parameter of interest is $\sigma^2$.

• The sample variance $s^2$ is an unbiased, consistent and efficient point estimator for $\sigma^2$.

• The statistic $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ has a distribution called Chi-squared with d.f. = n-1, if the population is normally distributed.

[Figure: Chi-squared densities for d.f. = 5 and d.f. = 10.]


Confidence Interval for Population Variance

• From the following probability statement

$P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$

we have (by substituting $\chi^2 = [(n-1)s^2]/\sigma^2$)

$\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
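The algebra behind this interval is short but easy to get backwards, because taking reciprocals reverses the inequalities. A sketch of the intermediate steps, using the same symbols as above:

```latex
\begin{align*}
\chi^2_{1-\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2}
  &\iff \frac{1}{\chi^2_{\alpha/2}} < \frac{\sigma^2}{(n-1)s^2} < \frac{1}{\chi^2_{1-\alpha/2}} \\
  &\iff \frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}
\end{align*}
```

The larger critical value $\chi^2_{\alpha/2}$ therefore gives the lower confidence limit, and the smaller one, $\chi^2_{1-\alpha/2}$, gives the upper limit.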

• Example 12.3 (operation management application)

– A container-filling machine is believed to fill 1 liter containers so consistently that the variance of the filling will be less than 1 cc (.001 liter).

– To test this belief a random sample of 25 1-liter fills was taken, and the results recorded (Xm12-03). $s^2 = 0.8659$.

– Do these data support the belief that the variance is less than 1 cc at the 5% significance level?

– Find a 99% confidence interval for the variance of fills.

Testing the Population Variance

JMP implementation of two-sided test

[Figure: JMP Distributions plot of Fills, axis from 998.0 to 1001.5.]

Test Standard Deviation=value

Hypothesized Value   1

Actual Estimate      0.93054

df                   24

Test: ChiSquare

Test Statistic       20.7816

Prob > |ChiSq|       0.6969

Prob < ChiSq         0.3484

Prob > ChiSq         0.6516
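The JMP figures for Example 12.3 can be reproduced by hand. The sketch below computes the test statistic $(n-1)s^2/\sigma_0^2$ and the left-tail probability for $H_1: \sigma^2 < 1$. It is an illustration under assumptions: the function `chi2_cdf_even_df` is a hypothetical helper that uses the closed-form Chi-squared CDF, which holds only for an even number of degrees of freedom (here d.f. = 24), so it is not a general-purpose routine.

```python
from math import exp


def chi2_cdf_even_df(x, df):
    """P(ChiSq <= x) for an EVEN number of degrees of freedom.

    For df = 2m the CDF has the closed form
        1 - exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only works for even, positive df")
    half = x / 2
    term = 1.0      # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1 - exp(-half) * total


# Figures from Example 12.3.
n, s_squared, sigma0_squared = 25, 0.8659, 1.0
alpha = 0.05

chi2_stat = (n - 1) * s_squared / sigma0_squared   # JMP "Test Statistic"
p_left = chi2_cdf_even_df(chi2_stat, df=n - 1)     # JMP "Prob < ChiSq"
reject_h0 = p_left < alpha
```

The left-tail probability is well above 0.05, so these data do not provide enough evidence that the variance is less than 1.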

12.4 Inference About a Population Proportion

• When the population consists of nominal data (e.g., does the customer prefer Pepsi or Coke), the only inference we can make is about the proportion of occurrence of a certain value.

• When there are two categories (success and failure), the parameter p describes the proportion of successes in the population. The probability of obtaining X successes in a random sample of size n from a large population can be calculated using the binomial distribution.

• Statistic and sampling distribution

– The statistic used when making inference about p is:

$\hat{p} = \frac{x}{n}$, where x = the number of successes and n = sample size.

– Under certain conditions, [np > 5 and n(1-p) > 5], $\hat{p}$ is approximately normally distributed, with $\mu = p$ and $\sigma^2 = p(1-p)/n$.


Testing and Estimating the Proportion

• Test statistic for p

$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$, where $np > 5$ and $n(1-p) > 5$

• Interval estimator for p ($1-\alpha$ confidence level)

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, provided $n\hat{p} > 5$ and $n(1-\hat{p}) > 5$

• Example 12.5 (Predicting the winner on election day)

– Voters are asked by a certain network to participate in an exit poll in order to predict the winner on election day.

– The exit poll consists of 765 voters. 407 say that they voted for the Republican candidate.

– The polls close at 8:00. Should the network announce at 8:01 that the Republican candidate will win?

Testing the Proportion
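One way to work Example 12.5 is sketched below, with the poll counts given above (407 of 765). The hypotheses $H_0: p = 0.5$ against $H_1: p > 0.5$ and the 5% significance level are assumptions made for this illustration, since the transcript does not preserve the worked solution.

```python
from math import sqrt
from statistics import NormalDist

# Exit poll figures from Example 12.5.
n, x = 765, 407
p0 = 0.5          # assumed null value: the race is tied
alpha = 0.05      # assumed significance level

p_hat = x / n

# Normal approximation conditions under H0: n*p0 > 5 and n*(1 - p0) > 5.
conditions_ok = n * p0 > 5 and n * (1 - p0) > 5

# Test statistic: the standard error uses the hypothesized p0, not p_hat.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)     # right-sided test: P(Z > z)
reject_h0 = p_value < alpha
```

Under these assumptions the p-value is below 0.05, which supports announcing the Republican candidate as the winner, with the usual risk of a Type I error.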

Selecting the Sample Size to Estimate the Proportion

• Recall: The confidence interval for the proportion is

$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

• Thus, to estimate the proportion to within W, we can write

$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

• The required sample size is:

$n = \left(\frac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2$

• Example

– Suppose we want to estimate the proportion of customers who prefer our company’s brand to within .03 with 95% confidence.

– Find the sample size needed.

– Solution: W = .03; $1 - \alpha$ = .95, therefore $\alpha/2$ = .025, so $z_{.025}$ = 1.96

$n = \left(\frac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2$

Since the sample has not yet been taken, the sample proportion is still unknown.

We proceed using either one of the following two methods:

Sample Size to Estimate the Proportion

• Method 1:

– There is no knowledge about the value of $\hat{p}$.

• Let $\hat{p} = .5$. This results in the largest possible n needed for a $1-\alpha$ confidence interval of the form $\hat{p} \pm .03$.

• If the sample proportion does not equal .5, the actual W will be narrower than .03 with the n obtained by the formula below.

$n = \left(\frac{1.96\sqrt{.5(1-.5)}}{.03}\right)^2 = 1{,}068$

• Method 2:

– There is some idea about what $\hat{p}$ will turn out to be.

• Use a probable value of $\hat{p}$ to calculate the sample size

$n = \left(\frac{1.96\sqrt{.2(1-.2)}}{.03}\right)^2 = 683$
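Both sample size calculations can be reproduced with a small helper. The function name `sample_size_for_proportion` is hypothetical; it uses the rounded value z = 1.96 from the example and rounds up, since n must be a whole number at least as large as the formula's result.

```python
from math import ceil, sqrt


def sample_size_for_proportion(p_guess, width, z=1.96):
    """Smallest n so that z * sqrt(p(1-p)/n) <= width for the guessed p."""
    return ceil((z * sqrt(p_guess * (1 - p_guess)) / width) ** 2)


n_method_1 = sample_size_for_proportion(0.5, 0.03)   # no prior knowledge: worst case
n_method_2 = sample_size_for_proportion(0.2, 0.03)   # probable value of p-hat is .2
```

Method 1 is the conservative choice because $\hat{p}(1-\hat{p})$ is largest at $\hat{p} = .5$, so any other value leads to a smaller required n.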

Practice Problems

• 12.40, 12.46, 12.58, 12.77, 12.98