EM561 Lecture Notes - Part 3 of 3[1]


Transcript of EM561 Lecture Notes - Part 3 of 3[1]

Page 1: EM561 Lecture Notes - Part 3 of 3[1]

1

EM 561 GW DePuy 286

8-1 Introduction

• In the previous chapter we illustrated how a

parameter can be estimated from sample data.

• Estimate mean length of adult foot. Measure 28

students and find X = 26.322 cm. Is this a good estimate

for μ?

• Do we believe μ EXACTLY equals 26.322 cm (μ=X)?

• Rather than estimate μ with a single value (called

point estimate) let's develop range or interval of

values (called interval estimate) we think

contains true value of μ.

• Now rather than saying μ = 26.322 cm, we say 25.77 ≤

μ ≤ 26.87 cm

EM 561 GW DePuy 287

Page 2: EM561 Lecture Notes - Part 3 of 3[1]

2

8-1 Introduction

• A confidence interval for an unknown parameter

θ is an interval that contains a set of plausible or

believable values of the parameter.

• It is associated with a confidence level, 1-α,

which measures the probability that the confidence

interval actually contains the unknown parameter.

• A 100(1-α) percent confidence interval on the

unknown parameter θ:

P(lower limit ≤ θ ≤ upper limit) = 1-α

EM 561 GW DePuy 288

8-1 Introduction

• For example, a 99% confidence interval on

the parameter θ

• Find the lower and upper limit such that:

P(lower limit ≤ θ ≤ upper limit) = 0.99

• Common 100(1-α)% CI:

• 99% CI → 1-α = 0.99 → α = 0.01

• 95% CI → 1-α = 0.95 → α = 0.05

• 90% CI → 1-α = 0.90 → α = 0.10

IE 360 GW DePuy 289

Page 3: EM561 Lecture Notes - Part 3 of 3[1]

3

8-1 Introduction

• We use a sample statistic to estimate the

population parameter

• X to estimate μ, S2 to estimate σ2

• General procedure to form confidence interval

– Take a sample

– Find the sample statistic of interest (i.e. X or S2)

– Form interval around sample statistic that we think contains true population parameter value

• Calculate upper and lower CI limit

EM 561 GW DePuy 290

8-1 Introduction

• How to calculate upper and lower CI limits?

• Using the sampling distribution of the

appropriate sample statistic, choose CI such that:

P(lower limit ≤ θ ≤ upper limit) = 1-α

• Use Normal distribution to find CI for μ when n≥40

• Use Student t distribution to find CI for μ when

n<40

• Use Chi-square distribution to find CI for σ2

IE 360 GW DePuy 291

Page 4: EM561 Lecture Notes - Part 3 of 3[1]

4

8-1 Introduction

• How to calculate upper and lower CI limits?P(lower limit ≤ θ ≤ upper limit) = 1-α

• As we will soon see, width (upper limit – lower

limit) of a CI is a function of:

• Confidence level, 1-α

• As confidence level ↑, CI width ↑

• Sample size (i.e. # observations)

• As sample size↑, CI width ↓

• Variance of data (as measured by s2)

• As variance ↑, CI width ↑

IE 360 GW DePuy 292

8-1 Introduction

• The width of a confidence interval is a measure of the

precision of estimation.

• The width of the confidence interval is a measure of the

quality of information obtained from the sample.

• The wider the confidence interval, the more confident we

are that the interval actually contains the true value of θ.

– Remember: as confidence level ↑, CI width ↑

• However, the wider the confidence interval the less

information we have about the true value of θ

• In an ideal situation, we obtain a relatively narrow interval

with high confidence. This is possible when sample size is

large and/or variance is small.

EM 561 GW DePuy 293

Page 5: EM561 Lecture Notes - Part 3 of 3[1]

5

8-1 Introduction

• Precision of CI

• Do not want CI that is too narrow

• Low probability it will contain true population parameter

• Do not want CI that is too wide

• Not useful to make decisions

• Mean length of foot example

• 50% CI: 26.22 ≤ μ ≤ 26.42 cm

• 99% CI: 16.0 ≤ μ ≤ 36.0 cm

• 99% CI: 25.92 ≤ μ ≤ 26.72 cm

IE 360 GW DePuy 294

8-1 Introduction

• Two-sided confidence interval specifies both a

lower and upper limit on θ

P(lower limit ≤ θ ≤ upper limit) = 1-α

• One-sided confidence interval may be more

appropriate for some applications.

• One-sided lower confidence interval on θ

P(lower limit ≤ θ) = 1-α

• One-sided upper confidence interval on θ

P(θ ≤ upper limit) = 1-α

IE 360 GW DePuy 295

Page 6: EM561 Lecture Notes - Part 3 of 3[1]

6

8-1 Introduction

• We will find confidence intervals for several

population parameters:

– Mean, μ, of any distribution, large sample size

(n ≥ 40)

– Mean, μ, of normal distribution, small sample

size (n < 40)

– Variance, σ2, of normal distribution

– Population proportion, p

EM 561 GW DePuy 296

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• Want to estimate the population mean, μ, of data from any distribution with unknown variance, σ²

• Take a sample of size n ≥ 40 (Note: data can have

any distribution)

• Find the sample mean, X̄, and sample variance, S².

• X̄ will be Normally distributed with mean μ and variance σ²/n.

– How do I know this? Remember the Central Limit Theorem from Chpt 7. As the sample size, n, gets large, the distribution of sample means approaches a normal distribution.

EM 561 GW DePuy 297

Page 7: EM561 Lecture Notes - Part 3 of 3[1]

7

8-2 Confidence Interval on the Mean, μ,

for Large Sample

[Distribution plot: density curves of the data distribution, X ~ ??(μ, σ²), and of the sampling distribution of the sample mean, X̄ ~ N(μ, σ²/n), both centered at μ]

• Data from an UNKNOWN distribution with UNKNOWN mean, μ, and UNKNOWN variance, σ².

• By the CLT, sample means are normally distributed with mean μ and variance σ²/n.

• Both distributions have the same mean, μ. What is it? Form an interval we think contains μ.

EM 561 GW DePuy 298

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• Use the sample mean, X̄, to estimate μ.

• The sample mean, X̄, will be the center of the CI for μ.

• Calculate the upper and lower CI limits using X̄ ~ N(μ, σ²/n) and 1-α such that

P(lower limit ≤ μ ≤ upper limit) = 1-α

• How to calculate the lower limit and upper limit?

EM 561 GW DePuy 299

Page 8: EM561 Lecture Notes - Part 3 of 3[1]

8

8-2 Confidence Interval on the Mean, μ,

for Large Sample

[Plot: density f(x) of X̄ ~ N(μ, σ²/n), centered at μ, with one sample mean X̄ marked on the axis]

• Our sample mean is a single draw from this distribution.

• We do not know which part of the curve our sample mean was drawn from, so we do not know how close the sample mean is to μ.

IE 360 GW DePuy 300

8-2 Confidence Interval on the Mean, μ,

for Large Sample

[Plot: density of X̄ ~ N(μ, σ²/n) with several possible sample means X̄ marked on the axis]

• We want the CI we form around our sample mean to have a high (1-α) probability of including μ.

• All samples from this distribution will have the same width CI. What should that width be?

IE 360 GW DePuy 301

Page 9: EM561 Lecture Notes - Part 3 of 3[1]

9

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• We choose the CI limits such that

P(lower limit ≤ μ ≤ upper limit) = 1-α

• We choose upper and lower limits such that repeated sampling will result in 100(1-α)% of the CI containing μ

– 95% CI example: choose upper and lower limit such that when we repeatedly take a sample of size n and form a CI, 95% of these CIs will contain μ

EM 561 GW DePuy 302

8-2 Confidence Interval on the Mean, μ,

for Large Sample

Figure 8-1 Repeated construction of a confidence interval for μ.

How wide to

make this

interval so

that it

includes μ

100(1-α)% of

the time?

EM 561 GW DePuy 303

Page 10: EM561 Lecture Notes - Part 3 of 3[1]

10

8-2 Confidence Interval on the Mean, μ,

for Large Sample

[Plot: density of X̄ ~ N(μ, σ²/n) with the middle 1-α of the area shaded and α/2 in each tail]

• 100(1-α)% of our samples will have a sample mean in this range.

• If we choose the upper and lower CI limits so that these samples' CIs include μ, then 100(1-α)% of the CIs will include μ.

EM 561 GW DePuy 304

8-2 Confidence Interval on the Mean, μ,

for Large Sample

[Plot: density of X̄ ~ N(μ, σ²/n) with the middle 95% of the area shaded, 0.025 in each tail, and the most extreme sample means X̄ marked with their Lower Limit and Upper Limit]

• For example, 95% of samples will have a sample mean in this range.

• Choose the CI width so that all sample means in this range will include μ. Consider the most extreme cases.

EM 561 GW DePuy 305

Page 11: EM561 Lecture Notes - Part 3 of 3[1]

11

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• Using the standard normal distribution, N(0,1), we find 95% of the area under the curve lies within ± 1.96 standard deviations of the mean.

[Plot: standard normal density Z ~ N(0,1) with area 0.95 between -1.96 and 1.96 and 0.025 in each tail]

• So, for X̄ ~ N(μ, σ²/n), 95% of the area under the curve lies within ± 1.96 s.d. of the mean, where s.d. = σ/√n.

[Plot: density of X̄ ~ N(μ, σ²/n) with area 0.95 between μ - 1.96·σ/√n and μ + 1.96·σ/√n and 0.025 in each tail]

• So our 95% CI on μ has a width of 2(1.96)(σ/√n).

EM 561 GW DePuy 306

8-2 Confidence Interval on the Mean, μ,

for Large Sample

95% CI:

[Plot: density of X̄ ~ N(μ, σ²/n); an interval of half-width 1.96·σ/√n centered at any sample mean X̄ in the middle 95% of the distribution contains μ]

X̄ - 1.96·σ/√n ≤ μ ≤ X̄ + 1.96·σ/√n

100(1-α)% CI:

[Plot: the same picture with the middle 1-α of the distribution shaded and α/2 in each tail]

X̄ - Z_{α/2}·σ/√n ≤ μ ≤ X̄ + Z_{α/2}·σ/√n

IE 360 GW DePuy 307

Page 12: EM561 Lecture Notes - Part 3 of 3[1]

12

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• 100(1-α)% Confidence Interval on μ (2-sided):

X̄ - Z_{α/2}·σ/√n ≤ μ ≤ X̄ + Z_{α/2}·σ/√n

• But we have unknown variance, σ².

• With a large sample size (n ≥ 40), we can replace σ with the sample standard deviation, S.

• 100(1-α)% Confidence Interval on μ (2-sided):

X̄ - Z_{α/2}·S/√n ≤ μ ≤ X̄ + Z_{α/2}·S/√n

EM 561 GW DePuy 308

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• 100(1-α)% Confidence Interval on μ (2-sided):

X̄ - Z_{α/2}·S/√n ≤ μ ≤ X̄ + Z_{α/2}·S/√n

• where Z_{α/2} is the upper 100(α/2) percentage point of the standard normal distribution

• Z_{α/2} = z for which P(Z > z) = α/2

• For a 90% CI, α = 0.10 and Z0.05 = 1.645

• For a 95% CI, α = 0.05 and Z0.025 = 1.96

• For a 99% CI, α = 0.01 and Z0.005 = 2.576

• These Z values come from the N(0,1) table.

IE 360 GW DePuy 309

Page 13: EM561 Lecture Notes - Part 3 of 3[1]

13

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• 100(1-α)% Confidence Interval on μ (2-sided):

X̄ - Z_{α/2}·S/√n ≤ μ ≤ X̄ + Z_{α/2}·S/√n

• Notice: the CI is centered around the sample mean.

• Notice: the CI width is a function of the confidence level, sample size, and variance.

EM 561 GW DePuy 310

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• Kellogg's is interested in estimating the mean fill weight of

their corn flakes cereal boxes.

• They take a sample of 60 boxes and calculate the sample

mean to be 12.05 oz. and the sample standard deviation to

be 0.14 oz.

1. Find a 90% CI for the mean fill weight, μ

• Believable μ = 12.00 oz? μ = 12.10 oz? μ = 12.025 oz?

EM 561 GW DePuy 311
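The 90% CI here can be checked with a few lines of code by plugging the summary statistics into X̄ ± Z_{α/2}·S/√n. A minimal Python sketch (not part of the original notes; it assumes scipy is installed):

from math import sqrt
from scipy.stats import norm

n, xbar, s = 60, 12.05, 0.14         # Kellogg's sample: size, mean, std dev
alpha = 0.10                          # 90% confidence
z = norm.ppf(1 - alpha / 2)           # Z_{0.05} = 1.645
half_width = z * s / sqrt(n)
print(xbar - half_width, xbar + half_width)   # about (12.02, 12.08)

Since 12.00 and 12.10 fall outside this interval and 12.025 falls inside, only μ = 12.025 oz is believable at 90% confidence.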

Page 14: EM561 Lecture Notes - Part 3 of 3[1]

14

8-2 Confidence Interval on the Mean, μ,

for Large Sample

One-sided Confidence Intervals

• Often only interested in upper or lower CI limit (not both)

– Lower limit of concrete strength

– Upper limit on automated fill volume

• One-sided lower confidence interval on θ

P(lower limit ≤ θ) = 1-α

• One-sided upper confidence interval on θ

P(θ ≤ upper limit) = 1-α

EM 561 GW DePuy 312

8-2 Confidence Interval on the Mean, μ,

for Large Sample

One-sided lower confidence interval on μ

• We choose the Lower CI limit such that

P(lower limit ≤ μ) = 1-α

• We choose the lower limit such that repeated sampling will result in 100(1-α)% of the one-sided lower CI containing μ

• 95% CI example: choose lower limit such that when we repeatedly take a sample of size n and form a one-sided lower CI, 95% of these Lower CIs will contain μ

• Only those 5% of sample means that are very large will not include μ in their Lower CI

IE 360 GW DePuy 313

Page 15: EM561 Lecture Notes - Part 3 of 3[1]

15

8-2 Confidence Interval on the Mean, μ,

for Large Sample

Example of 95% Lower CI

[Plot: density of X̄ ~ N(μ, σ²/n) with area 0.95 below the cutoff and 0.05 in the upper tail; the most extreme sample mean X̄ and its Lower Limit are marked]

• 95% of samples will have a sample mean in this range.

• Choose the CI lower limit so that all sample means in this range will include μ. Consider the most extreme case.

IE 360 GW DePuy 314

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• Using the standard normal distribution, N(0,1), we find 95% of the area under the curve is less than +1.645 standard deviations.

[Plot: standard normal density Z ~ N(0,1) with area 0.95 below 1.645 and 0.05 above it]

• So, for X̄ ~ N(μ, σ²/n), 95% of the area under the curve is less than 1.645 s.d. above the mean, where s.d. = σ/√n.

[Plot: density of X̄ ~ N(μ, σ²/n) with area 0.95 below μ + 1.645·σ/√n and 0.05 above it]

IE 360 GW DePuy 315

Page 16: EM561 Lecture Notes - Part 3 of 3[1]

16

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• 100(1-α)% Lower Confidence Bound on μ (1-sided):

X̄ - Z_α·S/√n ≤ μ

• One-sided Upper CI is calculated in much the same way as the one-sided Lower CI.

• 100(1-α)% Upper Confidence Bound on μ (1-sided):

μ ≤ X̄ + Z_α·S/√n

IE 360 GW DePuy 316

8-2 Confidence Interval on the Mean, μ,

for Large Sample

• Kellogg's is interested in estimating the mean fill weight of

their corn flakes cereal boxes.

• They take a sample of 60 boxes and calculate the sample

mean to be 12.05 oz. and the sample standard deviation to

be 0.14 oz.

2. Find a 90% Lower Confidence Bound for the mean fill

weight, μ

• Believable μ > 12.00 oz? μ > 12.10 oz?

EM 561 GW DePuy 317
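For the one-sided question, the 90% lower confidence bound X̄ - Z_α·S/√n can be computed the same way. A minimal Python sketch (again an illustration only, assuming scipy):

from math import sqrt
from scipy.stats import norm

n, xbar, s = 60, 12.05, 0.14
z = norm.ppf(0.90)                    # one-sided: Z_{0.10} = 1.282
print(xbar - z * s / sqrt(n))         # about 12.027

Because the lower bound (about 12.03 oz) is above 12.00, μ > 12.00 oz is believable; the data do not support μ > 12.10 oz.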

Page 17: EM561 Lecture Notes - Part 3 of 3[1]

17

8-2 Confidence Interval on the Mean, μ,

for Large Sample

Summary of Procedure

• Data Distribution: Any i.e. X~ ??(μ, σ2)

• Population parameters:

– Mean, μ: unknown → to be estimated

– Variance, σ2: unknown → estimate using S2

• Sample Size: large i.e. n ≥ 40

• Sample Statistic: Sample Mean

• Sampling Distribution: Normal Distribution

• CIs:

2-sided CI:       X̄ - Z_{α/2}·S/√n ≤ μ ≤ X̄ + Z_{α/2}·S/√n

Upper 1-sided CI: μ ≤ X̄ + Z_α·S/√n

Lower 1-sided CI: X̄ - Z_α·S/√n ≤ μ

EM 561 GW DePuy 318

8-2 Confidence Interval on the Mean, μ, for

Large Sample

How to do CI on μ, for large sample, in

Minitab?

• Can either enter each observation in a

worksheet column or input summarized data

Stat

Basic Statistics

1-sample Z

EM 561 GW DePuy 319

Page 18: EM561 Lecture Notes - Part 3 of 3[1]

18

8-2 Confidence Interval on the Mean, μ, for

Large Sample

How to do CI on μ, for large sample, in Minitab?

[Minitab 1-Sample Z dialog: enter n, σ, and X̄ for summarized data]

• Default is a 95% two-sided CI

• 'Options' to change the confidence level and one-sided CI

EM 561 GW DePuy 320

8-2 Confidence Interval on the Mean, μ,

for Large Sample

One-Sample Z

The assumed standard deviation = 100

N Mean SE Mean 95% CI

17 215.0 24.3 (167.5, 262.5)

EM 561 GW DePuy 321

Page 19: EM561 Lecture Notes - Part 3 of 3[1]

19

8-2 Confidence Interval on the Mean,

μ, for known population variance, σ2

• Use Normal distribution to find CI for μ when

population variance, σ2, known

EM 561 GW DePuy 322

2-sided CI:       X̄ - Z_{α/2}·σ/√n ≤ μ ≤ X̄ + Z_{α/2}·σ/√n

Upper 1-sided CI: μ ≤ X̄ + Z_α·σ/√n

Lower 1-sided CI: X̄ - Z_α·σ/√n ≤ μ

8-3 Confidence Interval on the Mean, μ, of

Normal Distribution with Small Sample

• Want to estimate the population mean, μ, of data that is

Normally distributed with unknown variance (i.e. small sample)

• Take a sample of size n from Normally distributed data (Note:

n < 40)

• Find the sample mean, X̄.

• T = (X̄ - μ)/(S/√n) will be Student-t distributed with n-1 degrees of freedom.

– How do I know this? Remember from Chpt 7: the Student t distribution is the sampling distribution of the (standardized) sample mean when the sample size, n, is small and the underlying distribution is Normal (or close to Normal).

• Calculate the upper and lower CI limits using the Student-t distribution and 1-α such that P(lower limit ≤ μ ≤ upper limit) = 1-α

EM 561 GW DePuy 323

Page 20: EM561 Lecture Notes - Part 3 of 3[1]

20

8-3 Confidence Interval on the Mean, μ, of

Normal Distribution with Small Sample

• Remember: Student-t distribution similar to

Standard Normal Distribution, N(0,1)

– Both symmetric, bell-shaped curves

– Both have a mean of zero, μT =0, μZ = 0

– We use tables (t-tables or Z-table) to evaluate

the area under the curve for standardized

values

• Example: standardized sample means

Z = (X̄ - μ)/(σ/√n)        T = (X̄ - μ)/(S/√n)

EM 561 GW DePuy 324

8-3 Confidence Interval on the Mean, μ, of

Normal Distribution with Small Sample

• Using the Student-t distribution, we find 100(1-α)% of the area under the curve lies within ± t_{α/2,n-1} of the mean.

[Plot: Student-t density with area 1-α between -t_{α/2,n-1} and t_{α/2,n-1} and α/2 in each tail]

• So for X̄, 100(1-α)% of the area under the curve lies within ± t_{α/2,n-1} s.d. of the mean, where s.d. = S/√n.

[Plot: density of X̄ with area 1-α between μ - t_{α/2,n-1}·S/√n and μ + t_{α/2,n-1}·S/√n and α/2 in each tail]

EM 561 GW DePuy 325

Page 21: EM561 Lecture Notes - Part 3 of 3[1]

21

8-3 Confidence Interval on the Mean, μ,

of Normal Distribution with Small Sample

• 100(1-α)% Confidence Interval on μ (2-sided):

X̄ - t_{α/2,n-1}·S/√n ≤ μ ≤ X̄ + t_{α/2,n-1}·S/√n

• where t_{α/2,n-1} is the upper 100(α/2) percentage point of the Student-t distribution with n-1 d.f.

– Use the t table in the back of the book to find the appropriate t value for n and α.

EM 561 GW DePuy 326

8-3 Confidence Interval on the Mean, μ,

of Normal Distribution with Small Sample

• 100(1-α)% Upper Confidence Bound on μ (1-sided):

μ ≤ X̄ + t_{α,n-1}·S/√n

• 100(1-α)% Lower Confidence Bound on μ (1-sided):

X̄ - t_{α,n-1}·S/√n ≤ μ

EM 561 GW DePuy 327

Page 22: EM561 Lecture Notes - Part 3 of 3[1]

22

8-3 Confidence Interval on the Mean, μ,

of Normal Distribution with Small Sample

• Suppose I measure my systolic blood pressure 5 times and

get the following values:

98, 117, 132, 105, 121

• Assume my systolic blood pressure is normally distributed

1. Find a 95% CI on my mean systolic blood pressure

• Believable my mean systolic BP is 100?

EM 561 GW DePuy 328
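This small-sample CI, X̄ ± t_{α/2,n-1}·S/√n, can be checked with a short Python sketch (an illustration only, assuming scipy; the blood pressure values are taken from the slide):

from statistics import mean, stdev
from math import sqrt
from scipy.stats import t

bp = [98, 117, 132, 105, 121]
n, xbar, s = len(bp), mean(bp), stdev(bp)      # 5, 114.6, about 13.39
half_width = t.ppf(0.975, n - 1) * s / sqrt(n)
print(xbar - half_width, xbar + half_width)    # about (97.97, 131.23)

This matches the Minitab One-Sample T output shown later, and since 100 lies inside the interval, a mean systolic BP of 100 is believable.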

8-3 Confidence Interval on the Mean, μ,

of Normal Distribution with Small Sample

2. Believable my mean systolic BP is less than 100? Find

appropriate 95% CI.

EM 561 GW DePuy 329

Page 23: EM561 Lecture Notes - Part 3 of 3[1]

23

Summary of Procedure

• Data Distribution: Normal i.e. X~ N(μ, σ2)

• Population parameters:

– Mean, μ: unknown → to be estimated

– Variance, σ2: unknown → estimate using S2

• Sample Size: small (i.e. n< 40)

• Sample Statistic: Sample Mean

• Sampling Distribution: Student-t Distribution

• CIs:

2-sided CI:       X̄ - t_{α/2,n-1}·S/√n ≤ μ ≤ X̄ + t_{α/2,n-1}·S/√n

Upper 1-sided CI: μ ≤ X̄ + t_{α,n-1}·S/√n

Lower 1-sided CI: X̄ - t_{α,n-1}·S/√n ≤ μ

EM 561 GW DePuy 330

8-3 Confidence Interval on the Mean, μ, of

Normal Distribution with Small Sample

How to do CI on μ, with n small,

in Minitab?

• Can either enter each

observation in a worksheet

column or input summarized

data

Stat

Basic Statistics

1-sample t

EM 561 GW DePuy 331

Page 24: EM561 Lecture Notes - Part 3 of 3[1]

24

8-3 Confidence Interval on the Mean, μ, of

Normal Distribution with Small Sample

How to do CI on μ, with n small, in Minitab?

Default is 95%

two-sided CI

'Options' to

change

confidence level

and one-sided CI

EM 561 GW DePuy 332

8-3 Confidence Interval on the Mean, μ, of

Normal Distribution with Small Sample

One-Sample T: BP

Variable N Mean StDev SE Mean 95% CI

BP 5 114.60 13.39 5.99 (97.97, 131.23)

EM 561 GW DePuy 333

Page 25: EM561 Lecture Notes - Part 3 of 3[1]

25

Confidence Intervals on μ

• We discussed several different CI on μ – how to know which CI is appropriate for a particular problem?

• Depends on data distribution, whether population variance, σ2, is known and sample size

• 2-sided or 1-sided CI?

IE 360 GW DePuy 334

data distribution       σ²        sample size    CI distribution

any distribution        unknown   large          Z

Normal distribution     unknown   small          t

Normal distribution     known     any            Z

Stop here on Saturday?

• Turn in HW#1

• Review HW#1 solutions

• Test #1 on Monday!

IE 360 GW DePuy 335

Page 26: EM561 Lecture Notes - Part 3 of 3[1]

26

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

• Want to estimate the population variance, σ2, of data that is

Normally distributed

• Take a sample of size n from Normally distributed data (Note:

n is any size)

• Find sample variance, S2

• S2 will be Chi-squared distributed (multiplied by a constant)

with n-1 degrees of freedom

– How do I know this? Remember from Chpt 7: the Chi-square distribution is the sampling distribution of the sample variance, S².

• Calculate the upper and lower CI limits using S² ~ (σ²/(n-1))·χ²_{n-1} and 1-α such that P(lower limit ≤ σ² ≤ upper limit) = 1-α

EM 561 GW DePuy 336

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

• Using the Chi-squared distribution, we find 100(1-α)% of the area under the curve in the range χ²_{1-α/2,n-1} to χ²_{α/2,n-1}.

• 100(1-α)% of our samples will have a Chi-squared value, χ² = (n-1)S²/σ², in this range.

[Plot: Chi-squared density of (n-1)S²/σ² ~ χ²_{n-1} with area 1-α between χ²_{1-α/2,n-1} and χ²_{α/2,n-1} and α/2 in each tail]

EM 561 GW DePuy 337

Page 27: EM561 Lecture Notes - Part 3 of 3[1]

27

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

• 100(1-α)% Confidence Interval on σ² (2-sided):

(n-1)S²/χ²_{α/2,n-1} ≤ σ² ≤ (n-1)S²/χ²_{1-α/2,n-1}

• Notice: the CI width is a function of the confidence level, sample size, and sample variance.

• Use the Chi-squared table in the back of the book to find the appropriate Chi-squared values for n and α.

• To find a CI on σ, take the square root of both CI limits.

IE 360 GW DePuy 338

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

• 100(1-α)% Upper Confidence Bound on σ² (1-sided):

σ² ≤ (n-1)S²/χ²_{1-α,n-1}

• 100(1-α)% Lower Confidence Bound on σ² (1-sided):

(n-1)S²/χ²_{α,n-1} ≤ σ²

• To find a CI on σ, take the square root of the CI limits.

EM 561 GW DePuy 339

Page 28: EM561 Lecture Notes - Part 3 of 3[1]

28

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

• A rivet is to be inserted into a hole. A random sample of n= 15 parts is selected, and the hole diameter is measured. The sample standard deviation of the hole diameter measurements is S = 0.008 mm. If the variance of the hole diameter is too large, an unacceptable proportion of rivets will not fit properly in the hole. Assume the hole diameter is normally distributed.

• Construct a 99% upper confidence bound for σ2

• Believable σ < 0.010 mm? σ < 0.015 mm?

EM 561 GW DePuy 340
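A sketch of this 99% upper bound, (n-1)S²/χ²_{1-α,n-1}, in Python (an illustration only, assuming scipy is available):

from math import sqrt
from scipy.stats import chi2

n, s, alpha = 15, 0.008, 0.01
chi2_lower = chi2.ppf(alpha, n - 1)           # chi-square value with area 0.99 above it
upper_var = (n - 1) * s**2 / chi2_lower
print(upper_var, sqrt(upper_var))             # about 0.000192 and 0.0139

So σ ≤ 0.0139 mm with 99% confidence: σ < 0.015 mm is believable, but σ < 0.010 mm is not established.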

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

Summary of Procedure

• Data Distribution: Normal i.e. X~ N(μ, σ2)

• Population parameters:

– Variance, σ2: unknown → to be estimated

• Sample Size: any

• Sample Statistic: Sample variance

• Sampling Distribution: Chi-squared Distribution

• CIs:

2-sided CI:       (n-1)S²/χ²_{α/2,n-1} ≤ σ² ≤ (n-1)S²/χ²_{1-α/2,n-1}

Upper 1-sided CI: σ² ≤ (n-1)S²/χ²_{1-α,n-1}

Lower 1-sided CI: (n-1)S²/χ²_{α,n-1} ≤ σ²

IE 360 GW DePuy 341

Page 29: EM561 Lecture Notes - Part 3 of 3[1]

29

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

How to do CI on σ2 in Minitab?

• Can either enter each observation in a

worksheet column or input summarized data

Stat

Basic Statistics

1 Variance

EM 561 GW DePuy 342

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

How to do CI on σ2, in Minitab?

[Minitab 1 Variance dialog: enter n and S for summarized data; choose to enter S or S²]

• Default is a 95% two-sided CI

• 'Options' to change the confidence level and one-sided CI

EM 561 GW DePuy 343

Page 30: EM561 Lecture Notes - Part 3 of 3[1]

30

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

How to do CI on σ2, in Minitab?

[Minitab Options dialog: change to a 99% CI and change to a 1-sided upper bound CI]

EM 561 GW DePuy 344

Test and CI for One Standard Deviation

Statistics

N StDev Variance

15 0.00800 0.000064

99% One-Sided Confidence Intervals

Upper Bound Upper Bound

Method for StDev for Variance

Standard 0.01387 0.000192

8-4 Confidence Interval on the Variance,

σ2, of a Normal Distribution

EM 561 GW DePuy 345

Page 31: EM561 Lecture Notes - Part 3 of 3[1]

31

8-5 Large Sample Confidence

Interval for a Population Proportion

• Suppose random sample of size n has been taken

from a large population and X observations belong

to a class of interest.

– Example: take sample of 500 people. Count the

number with long hair

• The sample proportion, p̂ = X/n, is a point estimator of the proportion of the population that belongs to this class.

• We can form a CI around the sample proportion, p̂.

EM 561 GW DePuy 346

8-5 Large Sample Confidence Interval for

a Population Proportion

• Want to estimate the population proportion, p, of data from any

distribution

• Take a large sample of size n from the population (Note: n ≥ 40)

• Find the sample proportion, p̂ = X/n.

• p̂ will be approximately normally distributed.

– How do I know this? See Chpt 4: Normal Approximation to the Binomial Distribution. As the sample size gets large, the binomial distribution can be approximated with a Normal distribution.

• Calculate the upper and lower CI limits using the Normal distribution and 1-α such that P(lower limit ≤ p ≤ upper limit) = 1-α

EM 561 GW DePuy 347

Page 32: EM561 Lecture Notes - Part 3 of 3[1]

32

8-5 Large Sample Confidence Interval for

a Population Proportion

• 100(1-α)% Confidence Interval on p (2-sided):

p̂ - Z_{α/2}·√(p̂(1-p̂)/n) ≤ p ≤ p̂ + Z_{α/2}·√(p̂(1-p̂)/n)

• where Z_{α/2} is the upper 100(α/2) percentage point of the standard normal distribution

• Z_{α/2} = z for which P(Z > z) = α/2

• For a 90% CI, α = 0.10 and Z0.05 = 1.645

• For a 95% CI, α = 0.05 and Z0.025 = 1.96

• For a 99% CI, α = 0.01 and Z0.005 = 2.576

• These Z values come from the N(0,1) table.

IE 360 GW DePuy 348

8-5 Large Sample Confidence Interval for

a Population Proportion

• 100(1-α)% Confidence Interval on p (2-sided):

p̂ - Z_{α/2}·√(p̂(1-p̂)/n) ≤ p ≤ p̂ + Z_{α/2}·√(p̂(1-p̂)/n)

• Notice: the CI is centered around the sample proportion, p̂ = X/n.

• Notice: the CI width is a function of the confidence level, sample size, and variance.

– the variance of the binomial is n·p̂(1-p̂)

IE 360 GW DePuy 349

Page 33: EM561 Lecture Notes - Part 3 of 3[1]

33

8-5 Large Sample Confidence Interval for

a Population Proportion

• 100(1-α)% Upper Confidence Bound on p (1-sided):

p ≤ p̂ + Z_α·√(p̂(1-p̂)/n)

• 100(1-α)% Lower Confidence Bound on p (1-sided):

p̂ - Z_α·√(p̂(1-p̂)/n) ≤ p

EM 561 GW DePuy 350

8-5 Large Sample Confidence

Interval for a Population Proportion

A manufacturer of electronic calculators takes a random sample

of 1200 calculators and finds there are 8 defective units.

a) Construct a 95% confidence interval on the population

proportion

• Believable that 1% defective?

EM 561 GW DePuy 351
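A sketch of the large-sample (normal approximation) interval p̂ ± Z_{α/2}·√(p̂(1-p̂)/n) for this example (an illustration only, assuming scipy; the Minitab 1 Proportion output shown later uses a different, exact method, so its limits differ slightly):

from math import sqrt
from scipy.stats import norm

x, n = 8, 1200
phat = x / n                                   # about 0.00667
half_width = norm.ppf(0.975) * sqrt(phat * (1 - phat) / n)
print(phat - half_width, phat + half_width)    # about (0.0021, 0.0113)

Since 0.01 lies inside the interval, a 1% defective rate is believable.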

Page 34: EM561 Lecture Notes - Part 3 of 3[1]

34

8-5 Large Sample Confidence

Interval for a Population Proportion

b) Is there evidence to support a claim that the fraction of

defective units produced is 1% or less?

EM 561 GW DePuy 352

8-5 Large Sample Confidence Interval

for a Population Proportion

Summary of Procedure

• Data Distribution: Any i.e. X~ ??(μ, σ2)

• Population parameters:

– Proportion, p: unknown → to be estimated

• Sample Size: large (n ≥ 40)

• Sample Statistic: Sample proportion

• Sampling Distribution: Normal Distribution

• CIs:

2-sided CI:       p̂ - Z_{α/2}·√(p̂(1-p̂)/n) ≤ p ≤ p̂ + Z_{α/2}·√(p̂(1-p̂)/n)

Upper 1-sided CI: p ≤ p̂ + Z_α·√(p̂(1-p̂)/n)

Lower 1-sided CI: p̂ - Z_α·√(p̂(1-p̂)/n) ≤ p

IE 360 GW DePuy 353

Page 35: EM561 Lecture Notes - Part 3 of 3[1]

35

8-5 Large Sample Confidence Interval for a

Population Proportion

How to do CI on p, in Minitab?

• Can either enter each observation in a

worksheet column or input summarized data

Stat

Basic Statistics

1 Proportion

EM 561 GW DePuy 354

8-5 Large Sample Confidence Interval for a

Population Proportion

How to do CI on p, in Minitab?

[Minitab 1 Proportion dialog: enter X and n for summarized data]

• Default is a 95% two-sided CI

• 'Options' to change the confidence level and one-sided CI

EM 561 GW DePuy 355

Page 36: EM561 Lecture Notes - Part 3 of 3[1]

36

8-5 Large Sample Confidence Interval

for a Population Proportion

Test and CI for One Proportion

Sample X N Sample p 95% CI

1 8 1200 0.006667 (0.002882, 0.013094)

EM 561 GW DePuy 356

Confidence Intervals

• We discussed several different CI – how to know which CI is appropriate for a particular problem?

• What population parameter is being estimated? μ? σ2? p?

– If μ being estimated, is σ2 known? Sample size?

• If σ2 known, then use Z

• If σ2 unknown and large sample, then use Z

• If σ2 unknown and small sample, then use t

– If σ2 being estimated, use Chi-squared

– If p being estimated, use Z

• 2-sided or 1-sided CI?

EM 561 GW DePuy 357

Page 37: EM561 Lecture Notes - Part 3 of 3[1]

37

More Chapter 8 Examples

• A drink machine is adjusted to release a certain amount of

syrup into a chamber where it is mixed with carbonated

water. A random sample of 25 drinks were found to have a

mean syrup content of 1.10 fluid ounces and a standard

deviation of 0.015 fluid ounces.

• Believable that the mean syrup dispensed is 1.15 fl oz?

1.05 fl oz? Find a 95% CI.

EM 561 GW DePuy 358
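With n = 25 (< 40) this is a small-sample t interval. A minimal Python sketch of the calculation (illustration only, assuming scipy):

from math import sqrt
from scipy.stats import t

n, xbar, s = 25, 1.10, 0.015
half_width = t.ppf(0.975, n - 1) * s / sqrt(n)
print(xbar - half_width, xbar + half_width)    # about (1.094, 1.106)

Both 1.15 and 1.05 fl oz fall outside the interval, so neither is believable.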

More Chapter 8 Examples

• The percentage of titanium in an alloy used in aerospace

castings is measured in 51 randomly selected parts. The

sample standard deviation is S = 0.37 percent.

• Believable that the standard deviation is 0.50 percent? Find a

95% CI.

• Believable that the variance is 0.13 percent? Find a 95% CI.

EM 561 GW DePuy 359

Page 38: EM561 Lecture Notes - Part 3 of 3[1]

38

IE 360 GW DePuy 360

More Chapter 8 Examples

• Need more Chapter 8 examples?

• READ THE BOOK!

• We covered all sections 8-1 thru 8-6 (not 8-7)

IE 360 GW DePuy 360

IE 360 GW DePuy 361

Page 39: EM561 Lecture Notes - Part 3 of 3[1]

39

IE 360 GW DePuy 362

9-1 Hypothesis Testing

• In previous chapter we learned to construct CI

estimate of population parameter from sample

data

• In this chapter we'll learn how to accept or reject a

statement or hypothesis about a parameter

– Useful since we can formulate many types of decision-

making problems, tests, or experiments in the

engineering world as hypothesis-testing problems

EM 561 GW DePuy 363

Page 40: EM561 Lecture Notes - Part 3 of 3[1]

40

CI and Hypothesis Testing

• Both CI (Chpt 8) and Hypothesis testing (Chpt 9)

are used to make a statistical inference

– Use sample statistic to 'guess' population parameter

• In CI we formed an interval around sample

statistic and used the interval to answer

questions

– Believable that μ=12? Believable that σ<3.2?

• In hypothesis testing we are making a statement

about the population parameter then using the

sample statistic to decide whether the statement

is believable or not

EM 561 GW DePuy 364

9-1 Hypothesis Testing

• Suppose we are interested in the average length of an adult's foot

• Take sample

• Can use sample to form CI we think contains population

mean, μ

• OR can use sample to decide whether a statement or

hypothesis about the population mean is believable

• e.g. Hypothesis is μ = 27.0 cm

• In both CI and hypothesis testing we use the sample

statistic (e.g. sample mean) to make inference about the

unknown population statistic (e.g. μ)

EM 561 GW DePuy 365

Page 41: EM561 Lecture Notes - Part 3 of 3[1]

41

9-1 Hypothesis Testing

• State our hypothesis in terms of Null Hypothesis, H0

and Alternative Hypothesis, H1

Two-sided hypothesis

H0: μ = 27.0 cm

H1: μ ≠ 27.0 cm

One-sided hypotheses

H0: μ ≥ 27.0 cm H0: μ ≤ 27.0 cm

H1: μ < 27.0 cm OR H1: μ > 27.0 cm

EM 561 GW DePuy 366

9-1 Hypothesis Testing

• General hypotheses for population mean μ

Two-sided hypothesis

H0: μ = μ0

H1: μ ≠ μ0

One-sided hypotheses

H0: μ ≥ μ0 H0: μ ≤ μ0

H1: μ < μ0 OR H1: μ > μ0

EM 561 GW DePuy 367

Page 42: EM561 Lecture Notes - Part 3 of 3[1]

42

9-1 Hypothesis Testing

• Use sample statistics to decide which statement, H0 or H1,

is more believable

Example H0: μ = 27.0 cm

H1: μ ≠ 27.0 cm

• If sample mean = 26.8 cm I would believe H0: μ = 27.0 cm

• If sample mean = 23.1 cm I would believe H1: μ ≠ 27.0 cm

EM 561 GW DePuy 368

9-1 Hypothesis Testing

• How close does sample mean need to be to 27.0

cm for me to believe H0: μ = 27.0 cm?

• How far away from 27 cm does the sample mean

need to be for me to believe H1: μ ≠ 27.0 cm?

• Depends on:

• Variance, σ2

• Sample size, n

• Significance level, α

EM 561 GW DePuy 369

Page 43: EM561 Lecture Notes - Part 3 of 3[1]

43

9-1 Hypothesis Testing

• Example: H0: μ = 27.0 H1: μ ≠ 27.0

• If 26.0 ≤ sample mean ≤ 28.0, Accept H0

• If sample mean < 26.0 or sample mean > 28.0, Reject H0

[Number line for X̄: Reject H0 (μ ≠ 27.0) below 26.0, Accept H0 (μ = 27.0) from 26.0 to 28.0, Reject H0 (μ ≠ 27.0) above 28.0]

EM 561 GW DePuy 370

9-1 Hypothesis Testing

• Example: H0: μ = 27.0 H1: μ ≠ 27.0

[Number line for X̄ with 26.0, 27.0, 28.0 marked: the middle interval (26.0 to 28.0) is the Acceptance Region (Accept H0: μ = 27.0); the outer intervals are Rejection Regions (Reject H0: μ ≠ 27.0); 26.0 and 28.0 are the critical values]

• How to choose the critical values?

EM 561 GW DePuy 371

Page 44: EM561 Lecture Notes - Part 3 of 3[1]

44

9-1 Hypothesis Testing

• How to choose critical values?

• Use the sampling distribution of the appropriate

sample statistic to determine critical values and

therefore whether to accept or reject H0

• Use Normal distribution to find CI for μ when n≥40

• Use Student t distribution to find CI for μ when

n<40

• Use Chi-square distribution to find CI for σ2

EM 561 GW DePuy 372

9-1 Hypothesis Testing

• We either Accept or Reject H0

• Always in terms of accepting/rejecting H0 , NOT H1

• When we reject H0 we are saying H0 is not

plausible or believable

• When we accept H0 we are saying H0 is plausible

or believable. However accepting H0 does not

prove H0 to be true.

– There may be other plausible H0

EM 561 GW DePuy 373

Page 45: EM561 Lecture Notes - Part 3 of 3[1]

45

9-1 Hypothesis Testing

• The “strongest” inference is available when the null hypothesis is rejected.

• This point is important when an experimenter decides what should be the null hypothesis and what should be the alternative hypothesis for one-sided problems.

• In order to “prove” or establish a statement (say μ> μ0) it is necessary to make it the alternative hypothesis (H1).

• For one-sided hypothesis test, make H1 whatever you are trying to 'prove'

EM 561 GW DePuy 374

9-1 Hypothesis Testing

• We either Accept or Reject H0 based on our

sample statistic

• Our decision to accept or reject H0 will either be

right or wrong.

• We'll never know for certain whether we're right or wrong (because we do not know the population parameter), but we can find the probability we're wrong

• Two ways we could be wrong:

– Reject H0 when H0 is true (type I error)

– Accept H0 when H0 is false (type II error)

EM 561 GW DePuy 375

Page 46: EM561 Lecture Notes - Part 3 of 3[1]

46

9-1 Hypothesis Testing

• Probability of Type I error = α

• α also called significance level

• Probability of Type II error =β

• 1-β called power of the hypothesis test

EM 561 GW DePuy 376

9-1 Hypothesis Testing

• Power, 1-β, can be interpreted as the probability of

correctly rejecting a false null hypothesis.

• We often compare statistical tests by comparing

their power properties.

EM 561 GW DePuy 377

Page 47: EM561 Lecture Notes - Part 3 of 3[1]

47

9-1 Hypothesis Testing

General procedure for hypothesis test

1. From the problem context, identify the parameter of interest.

2. State the null hypothesis, H0 .

3. Specify an appropriate alternative hypothesis, H1.

4. Choose a significance level, α.

5. Determine an appropriate test statistic.

6. State the rejection region for the statistic.

7. Calculate test statistic.

8. Decide to accept or reject H0 and report conclusion in the

context of the problem

EM 561 GW DePuy 378

9-1 Hypothesis Testing

• We will conduct hypothesis tests for

several population parameters:

– Mean, μ, of any distribution, large sample

size (n ≥ 40)

– Mean, μ, of normal distribution, small

sample size (n < 40)

– Variance, σ2, of normal distribution

– Population proportion, p

EM 561 GW DePuy 379

Page 48: EM561 Lecture Notes - Part 3 of 3[1]

48

9-2 Hypothesis Test on the Mean, μ,

for Large Sample

Null Hypothesis: H0: μ = μ0

Test Statistic: Z0 = (X̄ - μ0)/(S/√n)

Alternative Hypothesis    Rejection criteria

H1: μ ≠ μ0                |Z0| > Z_{α/2}

H1: μ > μ0                Z0 > Z_α

H1: μ < μ0                Z0 < -Z_α

(Note: the rejection criteria in the box on p.307 of the text are incorrect.)

• State hypothesis

• Take sample

• Calculate test stat

• Compare to appropriate rejection criteria

• Accept or reject H0

EM 561 GW DePuy 380

9-2 Hypothesis Test on the Mean, μ,

for Large Sample

• Calculate Test Statistic (i.e. standardize sample

mean), Z0

a) Reject H0: μ=μ0 (i.e. believe H1: μ≠μ0) if test statistic very large or very small

b) Reject H0: μ=μ0 (i.e. believe H1: μ>μ0) if test statistic very large

c) Reject H0: μ=μ0 (i.e. believe H1: μ<μ0) if test statistic very small

EM 561 GW DePuy 381

Page 49: EM561 Lecture Notes - Part 3 of 3[1]

49

How does this Hypothesis Test Work?

• Suppose H0: μ = 10.0 H1:μ ≠ 10.0

• We want to find critical values for sample of

size 25 with σ=2

[Number line for X̄: Rejection Region (μ ≠ 10) below the lower critical value (?), Acceptance Region (μ = 10) between the two critical values, Rejection Region (μ ≠ 10) above the upper critical value (?); the critical values are still to be found]

EM 561 GW DePuy 382

How does this Hypothesis Test Work?

• For α = 0.05, the critical values are ± 1.96 standard deviations from μ0 = 10.0.

• Remember from the CLT: the standard deviation of X̄ is σ/√n.

• So the critical values are

10 - 1.96·(2/√25) = 9.216 and 10 + 1.96·(2/√25) = 10.784

EM 561 GW DePuy 383

Page 50: EM561 Lecture Notes - Part 3 of 3[1]

50

How does this Hypothesis Test Work?

• Take a sample. If sample mean between 9.216

and 10.784 then we believe μ=10 at α=0.05

[Number line for X̄: Rejection Region below 9.216, Acceptance Region (Accept H0: μ = 10) from 9.216 to 10.784, Rejection Region above 10.784; 9.216 and 10.784 are the critical values]

EM 561 GW DePuy 384

How does this Hypothesis Test Work?

• Similarly we can work with the standardized values.

H0: μ = 10   is equivalent to   H0: Z0 = 0

H1: μ ≠ 10                      H1: Z0 ≠ 0

Z0 = (X̄ - μ0)/(σ/√n)   (known population st. dev.)

Z0 = (X̄ - μ0)/(S/√n)   (large sample size, n ≥ 40)

EM 561 GW DePuy 385

Page 51: EM561 Lecture Notes - Part 3 of 3[1]

51

How does this Hypothesis Test Work?

• Take a sample. Find test statistic, Z0 (i.e. standardize

sample mean). If Z0 between -1.96 and 1.96 then we

believe Z0=0 or equivalently μ=10 at α=0.05

[Number line for Z0: Rejection Region below -1.96, Acceptance Region (Accept H0) from -1.96 to 1.96, Rejection Region above 1.96; -1.96 and 1.96 are the critical values]

EM 561 GW DePuy 386

9-2 Hypothesis Test on the Mean, μ,

for Large Sample

• The percent yield of a chemical process is being studied.

• 75 observations on yield have been taken and their sample

mean is 90.68% and sample variance 11.24%

• Is it believable the true mean yield is 90.0%? Use α=0.05

EM 561 GW DePuy 387
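A sketch of this two-sided large-sample Z test in Python (illustration only, assuming scipy):

from math import sqrt
from scipy.stats import norm

n, xbar, s2, mu0 = 75, 90.68, 11.24, 90.0
z0 = (xbar - mu0) / sqrt(s2 / n)      # test statistic, about 1.76
z_crit = norm.ppf(0.975)              # 1.96 for alpha = 0.05
print(z0, abs(z0) > z_crit)           # about 1.76, False -> accept H0

Since |Z0| is about 1.76 < 1.96, we accept H0: a true mean yield of 90.0% is believable at α = 0.05.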

Page 52: EM561 Lecture Notes - Part 3 of 3[1]

52

Stop here Tuesday?

• Chapter 8 extra credit

EM 561 GW DePuy 388

9-2 Hypothesis Test on the Mean, μ,

for Large Sample

How to do hypothesis test on μ, with σ2

known, in Minitab?

• Can either enter each observation in a

worksheet column or input summarized data

Stat

Basic Statistics

1-sample Z

EM 561 GW DePuy 389

Page 53: EM561 Lecture Notes - Part 3 of 3[1]

53

9-2 Hypothesis Test on the Mean, μ,

for Large Sample

How to do a hypothesis test on μ, with σ2 known, in Minitab?

[Minitab 1-Sample Z dialog: enter n, σ, X̄, and μ0 for summarized data]

• Default is H1: μ ≠ μ0

• 'Options' to change to H1: μ > μ0 or H1: μ < μ0

EM 561 GW DePuy 390

9-2 Hypothesis Test on the Mean, μ,

for Large Sample

One-Sample Z

Test of mu = 90 vs not = 90

The assumed standard deviation = 3

N Mean SE Mean 95% CI Z P

5 90.68 1.34 (88.05, 93.31) 0.51 0.612

EM 561 GW DePuy 391

Page 54: EM561 Lecture Notes - Part 3 of 3[1]

54

P-value

• The hypothesis test can quickly show

decisions (accept or reject) for a variety of

significance levels, α

• We can calculate the p-value associated

with a hypothesis test

• P-value is the smallest level of significance

(i.e. α) that would lead to rejection of the null

hypothesis H0 with the given data.

EM 561 GW DePuy 392

P-value

• The P-value is the probability that the test

statistic will take on a value that is at least as

extreme as the observed value of the statistic

when the null hypothesis H0 is true.

• Thus, a P-value conveys much information

about the weight of evidence against H0, and

so a decision maker can draw a conclusion at

any specified level of significance.

• P-value indicates how strongly we believe H0

or H1

EM 561 GW DePuy 393

Page 55: EM561 Lecture Notes - Part 3 of 3[1]

55

P-value

• We reject H0 when p-value < α

• Suppose p-value = 0.0316

• At α=0.05 accept or reject H0?

• At α=0.01 accept or reject H0?

• So when p-value very small (i.e. near 0) we

STRONGLY reject H0.

– We REALLY do not believe H0

– We REALLY do believe H1

EM 561 GW DePuy 394
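For a two-sided Z test the p-value is 2·P(Z > |Z0|). A sketch using the test statistic from the yield example above (illustration only, assuming scipy):

from scipy.stats import norm

z0 = 1.76                                  # test statistic from the yield example
p_value = 2 * (1 - norm.cdf(abs(z0)))      # about 0.078
print(p_value)

Since 0.078 > 0.05 (and also > 0.01), we accept H0 at both α = 0.05 and α = 0.01.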

Relationship between

Hypothesis Test and CI

Close relationship between CI and hypothesis test

• H0: μ = μ0

• H1: μ ≠ μ0

• For the same sample and significance level, α, H0 will be rejected if and only if μ0 is NOT in the 100(1-α)% CI on μ

• This relationship between hypothesis test and CI is

same whether population parameter is μ,σ2, or p

EM 561 GW DePuy 395

Page 56: EM561 Lecture Notes - Part 3 of 3[1]

56

Relationship between

Hypothesis Test and CI

• For a specific significance level, α, a CI is

more informative than performing a

hypothesis test.

• The decision made by the hypothesis test can

be deduced from the confidence interval by

looking to see whether μ0 is inside the

confidence interval or not.

• Thus the confidence interval portrays the

decisions made by hypothesis test for all

possible values of μ0 (for a given α)

EM 561 GW DePuy 396

Relationship between

Hypothesis Test and CI

• However, for a specific sample, a hypothesis

test is more informative than performing a CI

because the hypothesis test can quickly

show decisions (accept or reject) for a

variety of significance levels, α.

EM 561 GW DePuy 397

Page 57: EM561 Lecture Notes - Part 3 of 3[1]

57

9-3 Hypothesis Test on the Mean, μ, of

Normal Distribution with Small Sample

Null Hypothesis: H0: μ = μ0

Test Statistic: T0 = (X̄ - μ0)/(S/√n)

Alternative Hypothesis    Rejection criteria

H1: μ ≠ μ0                |T0| > t_{α/2,n-1}

H1: μ > μ0                T0 > t_{α,n-1}

H1: μ < μ0                T0 < -t_{α,n-1}

• State hypothesis

• Take sample

• Calculate test stat

• Compare to appropriate rejection criteria

• Accept or reject H0

EM 561 GW DePuy 398

9-3 Hypothesis Test on the Mean, μ, of

Normal Distribution with Small Sample

Figure 9-8 The reference distribution for H0: μ = μ0 with critical region for (a) H1: μ ≠ μ0, (b) H1: μ > μ0, and (c) H1: μ < μ0.

a) Reject H0: μ=μ0 (i.e. believe H1: μ≠μ0) if test statistic very large or

very small

b) Reject H0: μ=μ0 (i.e. believe H1: μ>μ0) if test statistic very large

c) Reject H0: μ=μ0 (i.e. believe H1: μ<μ0) if test statistic very small

EM 561 GW DePuy 399

Page 58: EM561 Lecture Notes - Part 3 of 3[1]

58

9-3 Hypothesis Test on the Mean, μ, of

Normal Distribution with Small Sample

• The tar content in cigars is being studied. A sample of 30

cigars is taken and the sample mean is 1.529mg and

S=0.0566mg. Assume the tar content is normally distributed.

• Can you support a claim the mean tar content exceeds

1.5mg? Use α=0.05

EM 561 GW DePuy 400
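A sketch of this one-sided t test, H0: μ = 1.5 vs H1: μ > 1.5 (illustration only, assuming scipy):

from math import sqrt
from scipy.stats import t

n, xbar, s, mu0 = 30, 1.529, 0.0566, 1.5
t0 = (xbar - mu0) / (s / sqrt(n))      # about 2.81
t_crit = t.ppf(0.95, n - 1)            # t_{0.05,29} = 1.699
print(t0, t0 > t_crit)                 # True -> reject H0

Since T0 is about 2.81 > 1.699, we reject H0, so the claim that the mean tar content exceeds 1.5 mg is supported.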

How can we test to see if data is normally distributed?

9-4 Hypothesis Test on the Variance,

σ2, of a Normal Distribution

Null Hypothesis: H0: σ² = σ0²

Test Statistic: χ0² = (n-1)S²/σ0²

Alternative Hypothesis    Rejection criteria

H1: σ² ≠ σ0²              χ0² > χ²_{α/2,n-1} or χ0² < χ²_{1-α/2,n-1}

H1: σ² > σ0²              χ0² > χ²_{α,n-1}

H1: σ² < σ0²              χ0² < χ²_{1-α,n-1}

(Note: the rejection criteria in the box on p.323 of the text are incorrect.)

• State hypothesis

• Take sample

• Calculate test stat

• Compare to appropriate rejection criteria

• Accept or reject H0

EM 561 GW DePuy 401

Page 59: EM561 Lecture Notes - Part 3 of 3[1]

59

9-4 Hypothesis Test on the Variance,

σ2, of a Normal Distribution

a) Reject H0: σ² = σ0² (i.e. believe H1: σ² ≠ σ0²) if the test statistic is very large or very small

b) Reject H0: σ² = σ0² (i.e. believe H1: σ² > σ0²) if the test statistic is very large

c) Reject H0: σ² = σ0² (i.e. believe H1: σ² < σ0²) if the test statistic is very small

EM 561 GW DePuy 402

9-4 Hypothesis Test on the Variance,

σ2, of a Normal Distribution

• An engineer is testing the tire life for a new rubber

compound. 16 tires were tested to end-of-life on a road test.

The sample mean is 60139.7 km and S = 3645.94 km.

Assume the tire life is normally distributed.

• Can you conclude that the standard deviation of tire life

exceeds 3500 km? Use α=0.05

EM 561 GW DePuy 403
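A sketch of this one-sided chi-square test, H0: σ² = 3500² vs H1: σ² > 3500² (illustration only, assuming scipy):

from scipy.stats import chi2

n, s, sigma0 = 16, 3645.94, 3500.0
chi2_0 = (n - 1) * s**2 / sigma0**2    # about 16.3
chi2_crit = chi2.ppf(0.95, n - 1)      # chi-square_{0.05,15} = 25.0
print(chi2_0, chi2_0 > chi2_crit)      # False -> accept H0

Since 16.3 < 25.0, we accept H0: we cannot conclude that the standard deviation of tire life exceeds 3500 km.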

Page 60: EM561 Lecture Notes - Part 3 of 3[1]

60

9-5 Hypothesis Test on a

Population Proportion

Null Hypothesis: H0: p = p0

Test Statistic: Z0 = (X - n·p0)/√(n·p0·(1-p0))

Alternative Hypothesis    Rejection criteria

H1: p ≠ p0                |Z0| > Z_{α/2}

H1: p > p0                Z0 > Z_α

H1: p < p0                Z0 < -Z_α

(Note: the rejection criteria in the box on p.326 of the text are incorrect.)

• State hypothesis

• Take sample

• Calculate test stat

• Compare to appropriate rejection criteria

• Accept or reject H0

EM 561 GW DePuy 404

9-5 Hypothesis Test on a

Population Proportion

• The fraction of defective integrated circuits is being studied.

A random sample of 300 circuits is tested, revealing 13

defectives.

• Do the data support the claim the fraction of defective units

produced is less than 0.05? Use α=0.05

EM 561 GW DePuy 405
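A sketch of this one-sided proportion test, H0: p = 0.05 vs H1: p < 0.05 (illustration only, assuming scipy):

from math import sqrt
from scipy.stats import norm

x, n, p0 = 13, 300, 0.05
z0 = (x - n * p0) / sqrt(n * p0 * (1 - p0))    # about -0.53
z_crit = -norm.ppf(0.95)                       # -1.645
print(z0, z0 < z_crit)                         # False -> accept H0

Since Z0 is about -0.53, which is not below -1.645, we accept H0: the data do not support the claim that the fraction defective is less than 0.05.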

Page 61: EM561 Lecture Notes - Part 3 of 3[1]

61

Hypothesis Testing

• We discussed several different hypothesis tests –how to know which test is appropriate for a particular problem?

• What population parameter is being estimated? μ? σ2? p?

– If μ being estimated, is σ2 known?

• If σ2 known or large sample, then use Z test statistic

• If σ2 unknown and small sample, then use t test statistic

– If σ2 being estimated, use Chi-squared test statistic

– If p being estimated, use Z test statistic

• 2-sided or 1-sided hypothesis test?

EM 561 GW DePuy 406

9-7 Testing for Goodness of Fit

Is a particular distribution a good fit to the data?

• H0: data is well fit by proposed distribution

• H1: data is not well fit by proposed distribution

1. Divide proposed distribution into intervals

a. # intervals ≥ 5

b. # expected observations in each interval ≥ 5

c. Therefore # observations ≥ 25

2. Count number of actual observations in each interval

3. Find number of expected observations in each interval for

proposed distribution

4. Do hypothesis test to determine goodness of fit

EM 561 GW DePuy 407

Page 62: EM561 Lecture Notes - Part 3 of 3[1]

62

9-7 Testing for Goodness of Fit

• The test is based on the Chi-square distribution.

• Assume there is a sample of size n from a

population whose probability distribution is

unknown.

• Let Oi be the observed frequency in the ith class

interval.

• Let Ei be the expected frequency in the ith class

interval.

• Test statistic:

χ0² = Σ_{i=1}^{k} (Oi - Ei)²/Ei

EM 561 GW DePuy 408

9-7 Testing for Goodness of Fit

• The test statistic is

χ0² = Σ_{i=1}^{k} (Oi - Ei)²/Ei

• Reject H0 if χ0² > χ²_{α,k-p-1}

• Where k = # intervals

p = # distribution parameters estimated by sample statistics

EM 561 GW DePuy 409

Page 63: EM561 Lecture Notes - Part 3 of 3[1]

63

9-7 Testing for Goodness of Fit

• Is it believable the following data (n=50) came

from a U(36,78) distribution?

77.72 45.64 45.90 72.69 43.69

58.19 57.30 59.59 76.35 69.98

43.12 61.87 74.61 57.66 45.00

44.22 77.51 66.22 74.85 68.82

43.96 36.19 76.19 46.39 55.71

57.95 55.96 43.93 63.46 54.54

59.95 76.79 54.92 63.73 55.46

36.93 62.64 63.31 46.34 67.13

48.90 61.37 67.64 57.10 76.26

57.02 42.78 42.89 56.96 62.77

EM 561 GW DePuy 410

9-7 Testing for Goodness of Fit

H0: data well fit by U(36,78)

H1: data not well fit by U(36,78)

1. Divide U(36,78) into intervals

2. Count # of actual observations in each interval

3. Find expected # observations in each interval for

U(36,78) distribution

4. Do Goodness-of-Fit Hypothesis test

EM 561 GW DePuy 411

Page 64: EM561 Lecture Notes - Part 3 of 3[1]

64

9-7 Testing for Goodness of Fit

1. Divide U(36,78) into intervals

2. Count # of actual observations in each interval

interval actual # obs

[36.0, 44.4) 9

[44.4, 52.8) 6

[52.8, 61.2) 14

[61.2, 69.6) 11

[69.6, 78.0] 10

EM 561 GW DePuy 412

9-7 Testing for Goodness of Fit

3. Find expected # observations in each interval for

U(36,78) distribution

• P(36.0<X<44.4) = (44.4 – 36.0)(1/42) = 0.20

• So, E = nP = 50(0.20) = 10

• Repeat for each interval.

[Plot: U(36,78) density of height 1/42 with the interval from 36.0 to 44.4 shaded]

EM 561 GW DePuy 413

Page 65: EM561 Lecture Notes - Part 3 of 3[1]

65

9-7 Testing for Goodness of Fit

interval        actual # obs (O)    expected # obs (E)

[36.0, 44.4)    9                   10

[44.4, 52.8)    6                   10

[52.8, 61.2)    14                  10

[61.2, 69.6)    11                  10

[69.6, 78.0]    10                  10

χ0² = (9-10)²/10 + (6-10)²/10 + (14-10)²/10 + (11-10)²/10 + (10-10)²/10 = 3.4

EM 561 GW DePuy 414

9-7 Testing for Goodness of Fit

• χ0² = 3.4

• Critical value (at α = 0.05) = χ²_{0.05,k-p-1} = χ²_{0.05,5-0-1} = 9.49

• Note: 0 distribution parameters estimated from data –

we were told which distribution to try, U(36,78)

• Since test statistic = 3.4 < critical value = 9.49, we

ACCEPT H0

• We do believe a U(36,78) distribution is a good fit to

this data

EM 561 GW DePuy 415
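A sketch of the same goodness-of-fit calculation in Python (illustration only, assuming scipy; the observed counts are taken from the table above):

from scipy.stats import chi2

observed = [9, 6, 14, 11, 10]          # actual counts in the five intervals
expected = [10] * 5                    # n*P = 50*0.20 for U(36,78)
chi2_0 = sum((o - e)**2 / e for o, e in zip(observed, expected))   # 3.4
chi2_crit = chi2.ppf(0.95, 5 - 0 - 1)                              # 9.49
print(chi2_0, chi2_0 > chi2_crit)      # 3.4, False -> accept H0: U(36,78) fits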

Page 66: EM561 Lecture Notes - Part 3 of 3[1]

66

9-7 Testing for Goodness of Fit

• Number of customers arriving at an ATM is

hypothesized to follow a Poisson distribution.

• Data is collected over several days and the

number of customers to arrive at the ATM each

minute is summarized as follows

# arrivals per minute    Observed Frequency

0                        146

1                        132

2                        72

3                        36

4                        11

5                        3

EM 561 GW DePuy 416

9-7 Testing for Goodness of Fit

• Estimate mean of hypothesized Poisson Distribution

from the observed data

• E(X) = λ = 1.1075 Remember: E(X)=∑xP(x)

# arrivals per minute, x    Observed Frequency    Observed P(x)

0                           146                   0.3650

1                           132                   0.3300

2                           72                    0.1800

3                           36                    0.0900

4                           11                    0.0275

5                           3                     0.0075

Total                       400                   E(X) = 1.1075

EM 561 GW DePuy 417

Page 67: EM561 Lecture Notes - Part 3 of 3[1]

67

9-7 Testing for Goodness of Fit

• H0: Data is well fit by Poisson with λ=1.1075

• H1: Data is not well fit by Poisson with λ=1.1075

• Now calculate expected frequency (for Poisson

with λ=1.1075) in each interval

# arrivals per minute, x    P(x) for Poisson with λ=1.1075    Expected Frequency with n=400

0                           0.3304                            132.15

1                           0.3659                            146.36

2                           0.2026                            81.05

3                           0.0748                            29.92

4                           0.0207                            8.28

≥5                          0.0056                            2.24

EM 561 GW DePuy 418

9-7 Testing for Goodness of Fit

• Since last interval has < 5 expected observations,

combine it with previous interval

# arrivals per minute, x    P(x) for Poisson with λ=1.1075    Expected Frequency with n=400

0                           0.3304                            132.15

1                           0.3659                            146.36

2                           0.2026                            81.05

3                           0.0748                            29.92

≥4                          0.0263                            10.52

EM 561 GW DePuy 419

Page 68: EM561 Lecture Notes - Part 3 of 3[1]

68

9-7 Testing for Goodness of Fit

• Now perform the hypothesis test

• H0: Data is well fit by Poisson with λ=1.1075

• H1: Data is not well fit by Poisson with λ=1.1075

• Calculate the test stat, χ0²

• Compare to the appropriate rejection criteria

• Accept or reject H0

# arrivals per minute    Observed Frequency, O    Expected Frequency, E

0                        146                      133.82

1                        132                      146.53

2                        72                       80.22

3                        36                       29.28

≥4                       14                       10.15

EM 561 GW DePuy 420

9-7 Testing for Goodness of Fit

• χ0² = Σ (Oi - Ei)²/Ei = 6.3950

• Critical value (at α = 0.05) = χ²_{0.05,k-p-1} = χ²_{0.05,5-1-1} = 7.81

# arrivals per minute    Observed Frequency, O    Expected Frequency, E    (O-E)²/E

0                        146                      133.82                   1.1094

1                        132                      146.53                   1.4405

2                        72                       80.22                    0.8431

3                        36                       29.28                    1.5413

≥4                       14                       10.15                    1.4606

Total                                                                      6.3950

EM 561 GW DePuy 421

Page 69: EM561 Lecture Notes - Part 3 of 3[1]

69

9-7 Testing for Goodness of Fit

• Since the test statistic χ0² = 6.3950 < the critical value χ²_{0.05,5-1-1} = 7.81, we ACCEPT H0

• We do believe a Poisson λ=1.1075

distribution is a good fit to this data

EM 561 GW DePuy 422

More Chapter 9 examples

• A particular type of gasoline is supposed to have a mean

octane rating greater than 90%. Five measurements are

taken of the octane rating as follows: 90.8%, 88.4%,

89.2%, 91.6%, 92.1%

• Can you conclude that the mean octane rating is greater

than 90%? Use α=0.10

EM 561 GW DePuy 423

Page 70: EM561 Lecture Notes - Part 3 of 3[1]

70

More Chapter 9 examples

• A rivet is to be inserted into a hole. A random sample of n= 15 parts is selected, and the hole diameter is measured.

• The sample standard deviation of the hole diameter measurements is S = 0.008 mm. If the variance of the hole diameter is too large, an unacceptable proportion of rivets will not fit properly in the hole. Assume the hole diameter is normally distributed.

• Believable the population standard deviation less than 0.012 mm? use α=0.01

EM 561 GW DePuy 424

More Chapter 9 examples• A new concrete mix is being designed to provide adequate

compressive strength for concrete blocks. The specs call

for the blocks to have a mean compressive strength

greater than 1350 kPa. A sample of 100 blocks is

produced and tested. The mean strength of the sample is

1360 kPa and the standard deviation is 70 kPa.

• Is it believable the blocks meet spec? Use α=0.05

EM 561 GW DePuy 425

Page 71: EM561 Lecture Notes - Part 3 of 3[1]

71

Stop here Wednesday

• Chapter 9 extra credit

EM 561 GW DePuy 426

IE 360 GW DePuy 427

Page 72: EM561 Lecture Notes - Part 3 of 3[1]

72

IE 360 GW DePuy 428

Regression Models

• An experimenter is often interested in how a particular variable depends upon one or more of the other variables.

• Modeling is often performed by finding a functional relationship between the expected value of a dependent variable, Y, and a set of independent variables, Xi. How does X affect Y? What is the effect of X on Y?

• Linear Regression is a modeling technique in which the expected value of a dependent variable is modeled as a linear combination of a set of independent variables.

• Many problems in engineering and science involve exploring the relationships between two or more variables.

IE 360 GW DePuy 429

Page 73: EM561 Lecture Notes - Part 3 of 3[1]

73

Regression Analysis

Example

• A company that makes paper grocery bags is interested in improving the tensile strength of their bags. Specifically they are interested in the relationship between the hardwood concentration in the pulp and the tensile strength of the bag.

• Regression analysis can be used to build a model to relate tensile strength (Y) to hardwood concentration (X).

EM 561 GW DePuy 430

Linear Regression

Fit a simple linear regression model to the paper bag example.

[Scatterplot of strength vs hardwood: strength (y-axis, 5 to 25) against hardwood concentration (x-axis, 5.0 to 20.0)]

EM 561 GW DePuy 431

Page 74: EM561 Lecture Notes - Part 3 of 3[1]

74

11-1 Empirical Models

Based on the scatter diagram, it is probably reasonable to

assume that the variable Y is related to X by the following

simple linear regression model :

where the slope, β1, and intercept, β0, of the line are called regression coefficients and where ε is the random error term. We assume the mean and variance of ε are 0 and σ².

Y = β0 + β1X + ε

EM 561 GW DePuy 432

11-2 Simple Linear Regression

• The case of simple linear regression considers

a single regressor or predictor x and a

dependent or response variable Y.

• How to fit line to data? How to find values for

slope, β1, and intercept, β0?

• Find the slope, β1, and intercept, β0, of the best fitting line.

EM 561 GW DePuy 433

Page 75: EM561 Lecture Notes - Part 3 of 3[1]

75

Simple Linear Regression

• Used when demand is linearly increasing or

decreasing over time

• Find „best‟ fitting line to data

[Scatterplot of Demand against Time with a fitted line: Demand (y-axis, roughly 25 to 95) vs Time (x-axis, 0 to 25)]

EM 561 GW DePuy 434

11-2 Simple Linear Regression

• The method of least squares is used to estimate the parameters, β0 and β1, by minimizing the sum of the squares of the vertical deviations in Figure 11-3.

Figure 11-3

Deviations of the

data from the

estimated

regression model.

EM 561 GW DePuy 435

Page 76: EM561 Lecture Notes - Part 3 of 3[1]

76

11-2 Simple Linear Regression

• The sum of the squares of the deviations of the observations from the true regression line is

L = Σ (yi - β0 - β1·xi)²

• How to find β0 and β1 that minimize L?

EM 561 GW DePuy 436

Simple Linear Regression

Fitted or estimated regression line is ŷ = β̂0 + β̂1x

EM 561 GW DePuy 437
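The least-squares estimates themselves appear only as an image in the original slide; the following is reconstructed from the standard result (and is consistent with the worked calculation two slides below):

β̂1 = ( Σxy - (Σx)(Σy)/n ) / ( Σx² - (Σx)²/n )

β̂0 = ȳ - β̂1·x̄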

Page 77: EM561 Lecture Notes - Part 3 of 3[1]

77

Simple Linear Regression

Fit a simple linear regression model to paper bag example

Hardwood (x)   Strength (y)   x²   y²   xy

5 7 25 49 35

5 8 25 64 40

5 15 25 225 75

5 11 25 121 55

5 9 25 81 45

5 10 25 100 50

10 12 100 144 120

10 17 100 289 170

10 13 100 169 130

10 18 100 324 180

10 19 100 361 190

10 15 100 225 150

15 14 225 196 210

15 18 225 324 270

15 19 225 361 285

15 17 225 289 255

15 16 225 256 240

15 18 225 324 270

20 19 400 361 380

20 25 400 625 500

20 22 400 484 440

20 23 400 529 460

20 18 400 324 360

20 20 400 400 400

Total 300 383 4500 6625 5310

Avg 12.5 15.96

EM 561 GW DePuy 438

Simple Linear Regression

β̂1 = ( 5310 - (383)(300)/24 ) / ( 4500 - (300)²/24 ) = 0.697

β̂0 = 15.96 - 0.697(12.5) = 7.25

• Y = 7.25 + 0.697*X

• Strength = 7.25 + 0.697*Hardwood

EM 561 GW DePuy 439
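As a cross-check of the hand calculation above, the following short Python sketch (not part of the original slides) reproduces the slope and intercept from the column totals:

# Least-squares fit of strength on hardwood from the column totals above
n = 24
sum_x, sum_y = 300, 383            # hardwood, strength
sum_x2, sum_xy = 4500, 5310

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)   # slope, about 0.697
b0 = sum_y / n - b1 * sum_x / n                                 # intercept, about 7.25

print(f"strength = {b0:.2f} + {b1:.3f} * hardwood")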

Page 78: EM561 Lecture Notes - Part 3 of 3[1]

78

Regression Analysis in Minitab

• Data in 2 columns: dependent (Y) & independent (X)

Stat > Regression > Regression

IE 360 GW DePuy 440

Regression Analysis in Minitab

Regression Analysis: strength versus hardwood

The regression equation is

strength = 7.25 + 0.697 hardwood

Predictor      Coef   SE Coef     T      P
Constant      7.250     1.301  5.57  0.000
hardwood    0.69667   0.09501  7.33  0.000

S = 2.60201 R-Sq = 71.0% R-Sq(adj) = 69.6%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  364.01  364.01  53.76  0.000
Residual Error  22  148.95    6.77
Total           23  512.96

EM 561 GW DePuy 441

Page 79: EM561 Lecture Notes - Part 3 of 3[1]

79

Finding the regression equation in

Excel

• Put data in 2 columns

– dependent (Y)

– Independent (X)

EM 561 GW DePuy 442

Regression in Excel

• In Excel

> Data Analysis

>Regression

EM 561 GW DePuy 443

Page 80: EM561 Lecture Notes - Part 3 of 3[1]

80

Regression in Excel

strength = 7.25 + 0.697 hardwood

EM 561 GW DePuy 444

Prediction of New Observations

If x0 is the value of the regressor variable of interest,

Ŷ0 = β̂0 + β̂1·x0

is the point estimator of the new or future value of
the response, Y0.

Predict bag strength for a hardwood concentration of 8%

Strength = 7.25 + 0.697(8) = 12.826

EM 561 GW DePuy 445

How did I know to use 8 and not 0.08?

Page 81: EM561 Lecture Notes - Part 3 of 3[1]

81

Prediction of New Observations

A 100(1-α)% prediction interval on a future
observation Y0 at the value x0 is given by

Ŷ0 ± t(α/2, N-2) · σ̂ · √[ 1 + 1/N + (x0 - x̄)² / (Σx² - (Σx)²/N) ]

where Ŷ0 = β̂0 + β̂1·x0

and where σ̂² = ( Σy² - β̂0·Σy - β̂1·Σxy ) / (N - 2) = MSE

EM 561 GW DePuy 446

Prediction of New Observations

Find a 95% PI for the tensile strength at a
hardwood concentration of x0 = 8%

Ŷ0 = 12.826

σ̂² = ( 6625 - 7.25(383) - 0.697(5310) ) / (24 - 2) = 6.69

12.826 ± 2.074 · √[ 6.69 · ( 1 + 1/24 + (8 - 12.5)² / (4500 - 300²/24) ) ]

95% PI: (7.28, 18.37)

EM 561 GW DePuy 447
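A small Python sketch (again, not part of the original slides; it assumes SciPy is available) that reproduces this prediction interval from the summary statistics:

from scipy.stats import t

n = 24
sum_x, sum_y, sum_x2, sum_y2, sum_xy = 300, 383, 4500, 6625, 5310
b0, b1 = 7.25, 0.697
x_bar = sum_x / n

sigma2_hat = (sum_y2 - b0 * sum_y - b1 * sum_xy) / (n - 2)    # about 6.69 (MSE)
x0 = 8
y0_hat = b0 + b1 * x0                                          # about 12.826
half_width = t.ppf(0.975, n - 2) * (sigma2_hat * (1 + 1 / n
             + (x0 - x_bar) ** 2 / (sum_x2 - sum_x ** 2 / n))) ** 0.5

print(f"95% PI: ({y0_hat - half_width:.2f}, {y0_hat + half_width:.2f})")   # about (7.28, 18.37)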

Page 82: EM561 Lecture Notes - Part 3 of 3[1]

82

Regression Analysis in Minitab

• Prediction intervals in Minitab

– „Options‟ menu within Regression

– Enter X0 value for PI

IE 360 GW DePuy 448

Predicted Values for New Observations

New

Obs Fit SE Fit 95% CI 95% PI

1 12.823 0.682 (11.409, 14.237) (7.245, 18.402)

Values of Predictors for New Observations

New

Obs hardwood

1 8.00

Regression Analysis in Minitab

Difference between CI and PI:

CI refers to the true mean response at X0. The CI is based only on the data used
to fit the regression model.

PI is for a future observation, which is independent of the observations used to
develop the regression model. There is therefore more error (fitted-model error plus
the error associated with predicting a new observation), and the interval is wider.

EM 561 GW DePuy 449

Page 83: EM561 Lecture Notes - Part 3 of 3[1]

83

Linear Regression

A few questions to answer about this linear

regression analysis

• How good a fit is this line to the data?

• Does the independent variable have a significant

effect on the dependent variable?

– Does X have a significant effect on Y?

– Does hardwood concentration have a significant effect

on paper tensile strength?

EM 561 GW DePuy 450

Adequacy of the Regression Model

Coefficient of Determination (R2)

• The quantity R2 is called the coefficient of

determination and is often used to judge the fit of

a regression model

• 0 ≤ R² ≤ 1 or 0% ≤ R² ≤ 100%

• We often refer to R2 as the amount of variability in

the data explained or accounted for by the

regression model.

EM 561 GW DePuy 451

Page 84: EM561 Lecture Notes - Part 3 of 3[1]

84

R2

• Higher R2 values indicate a better fit of line to data

• We often refer to R2 as the amount of variability in the data explained or accounted for by the regression model

• No universal cut-off or threshold R² value to define a 'good fitting' model versus a 'bad fitting' model

[Two example scatterplots of Y vs X, one labeled Higher R² and one labeled Lower R²]

EM 561 GW DePuy 452

Regression Analysis: strength versus hardwood

The regression equation is

strength = 7.25 + 0.697 hardwood

Predictor      Coef   SE Coef     T      P
Constant      7.250     1.301  5.57  0.000
hardwood    0.69667   0.09501  7.33  0.000

S = 2.60201 R-Sq = 71.0% R-Sq(adj) = 69.6%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  364.01  364.01  53.76  0.000
Residual Error  22  148.95    6.77
Total           23  512.96
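The reported R-Sq can be reproduced directly from the ANOVA table: R² = SS(Regression) / SS(Total) = 364.01 / 512.96 ≈ 0.710, i.e. 71.0%.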

Regression Analysis in Minitab

EM 561 GW DePuy 453

Page 85: EM561 Lecture Notes - Part 3 of 3[1]

85

Regression in Excel

EM 561 GW DePuy 454

Significance of Regression

An important part of assessing the adequacy of a linear

regression model is testing statistical hypotheses about the

model parameters.

We can form a CI on both the slope, β1, and the intercept, β0

The most important CI is on the slope, β1, since this CI tests

the significance of regression (i.e. Does the independent

variable have a significant effect on the dependent

variable?).

A slope of 0 indicates there is no linear relationship between

X and Y.

EM 561 GW DePuy 455

Page 86: EM561 Lecture Notes - Part 3 of 3[1]

86

Significance of Regression

Figure 11-5: β1 = 0, indicating no linear relationship

between X and Y.

EM 561 GW DePuy 456

Significance of Regression

Figure 11-6: β1 ≠ 0, indicating a linear relationship

between X and Y.

EM 561 GW DePuy 457

Page 87: EM561 Lecture Notes - Part 3 of 3[1]

87


Significance of Regression

H0: β1 = 0

H1: β1 ≠ 0

In Excel

• given CI for slope. Believe β1 ≠ 0 (i.e. X DOES have

significant effect on Y) if 0 not in CI.

• given t test statistic for slope. Reject H0 (i.e. X DOES have

significant effect on Y) if |T| > tα/2,n-2 or if P-value < α

In Minitab

• given t test statistic for slope. Reject H0 (i.e. X DOES have

significant effect on Y) if |T| > tα/2,n-2 or if P-value < α

IE 360 GW DePuy 458

Regression in Excel

β1 ≠ 0? Remember: when β1 ≠ 0, X significantly affects Y

Use CI - does CI contain 0?

If CI does not contain 0, then we believe X significantly affects Y

IE 360 GW DePuy 459
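A short Python sketch (not from the slides; it assumes SciPy) showing the slope test and CI computed from the reported coefficient and standard error for the paper bag example:

from scipy.stats import t

n = 24
coef, se = 0.69667, 0.09501                 # hardwood slope and its standard error
T = coef / se                               # about 7.33
p_value = 2 * (1 - t.cdf(abs(T), n - 2))
t_crit = t.ppf(0.975, n - 2)
ci = (coef - t_crit * se, coef + t_crit * se)

print(f"T = {T:.2f}, p = {p_value:.4f}, 95% CI on slope = ({ci[0]:.3f}, {ci[1]:.3f})")
# 0 is not in the CI and p < 0.05, so hardwood DOES have a significant effect on strength.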

Page 88: EM561 Lecture Notes - Part 3 of 3[1]

88

Regression Analysis: strength versus hardwood

The regression equation is

strength = 7.25 + 0.697 hardwood

Predictor      Coef   SE Coef     T      P
Constant      7.250     1.301  5.57  0.000
hardwood    0.69667   0.09501  7.33  0.000

S = 2.60201 R-Sq = 71.0% R-Sq(adj) = 69.6%

Analysis of Variance
Source          DF      SS      MS      F      P
Regression       1  364.01  364.01  53.76  0.000
Residual Error  22  148.95    6.77
Total           23  512.96

Regression Analysis in Minitab

IE 360 GW DePuy 460

Hardwood concentration DOES have a significant effect on

bag strength

Another Example

• How does weight (in lbs) affect time to run 100

yards?

• dependent (Y)?

• Independent (X)?

weight run time

120 16

142 37

137 27

129 22

137 31

112 12

148 39

132 26

126 17

EM 561 GW DePuy 461

Page 89: EM561 Lecture Notes - Part 3 of 3[1]

89

Plot data in Excel

• Straight line good fit?

• Do Excel Scatter plot

[Scatterplot of run time (sec) vs weight (lbs)]

EM 561 GW DePuy 462

Find Regression Equation

• By Hand

• In Excel

> Tools

> Data Analysis

>Regression

• In Minitab

EM 561 GW DePuy 463

Page 90: EM561 Lecture Notes - Part 3 of 3[1]

90

Excel Regression Output

Time = -82.1 + 0.82(weight)

Line a good fit? R2=0.94

95% CI on β1

SUMMARY OUTPUT

Regression Statistics

Multiple R 0.969596224

R Square 0.940116838

Adjusted R Square 0.931562101

Standard Error 2.453310621

Observations 9

ANOVA

df SS MS F Significance F

Regression 1 661.4244245 661.4244 109.8943 1.5663E-05

Residual 7 42.13113102 6.018733

Total 8 703.5555556

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept -82.0970885 10.2700248 -7.993855 9.16E-05 -106.38184 -57.81234

weight 0.816461366 0.077883967 10.48305 1.57E-05 0.63229505 1.0006277

EM 561 GW DePuy 464

Excel Regression Output

• Is this line a good fit? R2=0.94

• Does weight significantly affect run time?

Is weight useful in predicting run time?

• 95% CI on β1 = (0.63, 1.00)

• Since CI does not contain 0, we believe β1

≠ 0, therefore weight does significantly

affect run time.

EM 561 GW DePuy 465
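The same fit can be reproduced with a few lines of Python (not part of the original slides; assumes SciPy):

from scipy.stats import linregress

weight   = [120, 142, 137, 129, 137, 112, 148, 132, 126]
run_time = [16, 37, 27, 22, 31, 12, 39, 26, 17]

fit = linregress(weight, run_time)
print(f"time = {fit.intercept:.1f} + {fit.slope:.2f} * weight")              # time = -82.1 + 0.82 * weight
print(f"R^2 = {fit.rvalue ** 2:.2f}, p-value for slope = {fit.pvalue:.1e}")  # R^2 = 0.94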

Page 91: EM561 Lecture Notes - Part 3 of 3[1]

91

Predictions

• Predict run time for weight of 135 lbs.

• Predict run time for weight of 50 lbs.

EM 561 GW DePuy 466
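One way to answer, using the rounded equation above: for 135 lbs, time = -82.1 + 0.82(135) ≈ 28.6 sec. A weight of 50 lbs, however, lies far below the observed range of the data (112 to 148 lbs), so the prediction -82.1 + 0.82(50) ≈ -41 sec is meaningless; as the next slide cautions, the model should not be extrapolated that far.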

IE 563 Dr. G.W. DePuy, UofL

467

Predictions

• Predictions only meaningful in general

range of original data.

• Be very careful extrapolating regression

model

[Sketch of Y vs X: data cover only a limited range of X, illustrating the risk of extrapolating the regression model]

Page 92: EM561 Lecture Notes - Part 3 of 3[1]

92

468

Another Example

• How does outdoor temperature affect time

(in min) to unload truck?

• dependent (Y)?

• Independent (X)?

obs # temp unload time

7 40 37

6 40 29

12 40 35

2 50 39

13 50 40

5 50 44

16 60 47

11 60 44

1 60 46

9 70 49

4 70 53

14 70 55

3 80 46

10 80 47

18 80 43

15 90 36

17 90 37

8 90 34

IE 563 Dr. G.W. DePuy, UofL

469

Truck Unload Example

Regression Analysis: unload versus temp

The regression equation is

unload = 36.8 + 0.0848 temp

Predictor Coef SE Coef T P

Constant 36.768 6.442 5.71 0.000

temp 0.08476 0.09586 0.88 0.390

S = 6.94574 R-Sq = 4.7% R-Sq(adj) = 0.0%

Good regression model for this data? Why?

Does outdoor temperature affect time (in min) to

unload truck?

Page 93: EM561 Lecture Notes - Part 3 of 3[1]

93

IE 563 Dr. G.W. DePuy, UofL

470

Truck Unload Example

• Plot data – straight line a good fit?

[Scatterplot of unload vs temp]

471

Truck Unload Example

• Include higher order terms in model e.g.

temp2 term

Unload time = β0 + β1·temp + β2·temp²

• Now regression model has more than one

term – called Multiple Linear Regression

Page 94: EM561 Lecture Notes - Part 3 of 3[1]

94

472

Multiple Linear Regression

• General multiple linear regression model with k

regressors

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

• For our truck unload example

• X1 = temp

• X2 = temp2

• Model a linear function of parameters β0, β1, β2, …, βk

• Testing for significance of regression now becomes:

H0: β1 = β2 = … = βk = 0

H1: βk ≠0 for at least one k

473

Multiple Linear Regression

• Do not add too many terms to model

• Adding terms will always make R2 increase

• If we had 8 data points, a model with 7 higher

order terms will fit perfectly, i.e. R2 = 100%

Y = β0 + β1X + β2X² + β3X³ + … + β7X⁷

• But too many terms will be difficult to interpret and

will not give good predictions

• So keep model as low order as possible

Page 95: EM561 Lecture Notes - Part 3 of 3[1]

95

474

Multiple Linear Regression

• For Multiple Linear Regression use Adjusted R2

to measure model adequacy

• Adjusted R2 will not automatically increase as

terms added to model.

• R2 adjusted by the number of terms in model

• Where n = # data points and p = # terms in

model (including constant)

R²adj = 1 - [ (n - 1) / (n - p) ] · (1 - R²)
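As a check against the truck unload model fit a few slides below (n = 18 observations, p = 3 terms including the constant, R² = 0.811): R²adj = 1 - (17/15)(1 - 0.811) ≈ 0.786, which matches the 78.6% reported by Minitab and Excel.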

IE 563 Dr. G.W. DePuy, UofL

475

Truck Unload Example

• Include temp2 term

• Add temp2 column to Minitab worksheet

Calc > Calculator

Page 96: EM561 Lecture Notes - Part 3 of 3[1]

96

IE 563 Dr. G.W. DePuy, UofL

476

Truck Unload Example

IE 563 Dr. G.W. DePuy, UofL

477

Truck Unload Example

• Include temp2 term in Regression model

Page 97: EM561 Lecture Notes - Part 3 of 3[1]

97

478

Truck Unload Example

Regression Analysis: unload versus temp, temp^2

The regression equation is
unload = - 55.7 + 3.14 temp - 0.0235 temp^2

Predictor        Coef    SE Coef      T      P
Constant       -55.71      12.22  -4.56  0.000
temp           3.1413     0.3945   7.96  0.000
temp^2      -0.023512   0.003015  -7.80  0.000

S = 3.19108 R-Sq = 81.1% R-Sq(adj) = 78.6%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  656.87  328.43  32.25  0.000
Residual Error  15  152.75   10.18
Total           17  809.61

Model a good fit

to data?

Does temperature

affect truck

unload time?

H0: β1 = β2 = 0

479

Truck Unload Example

• Predict unload time for temperature of 55

Unload time = -55.7 + 3.14(55) - 0.0235(55²) = 45.94 min

• 95% PI on unload time for temperature of 55

• In Minitab, 'Options' menu of the Regression menu, input values
for each model term:

• temp = 55

• temp^2 = 55² = 3025

Page 98: EM561 Lecture Notes - Part 3 of 3[1]

98

480

Truck Unload Example

Predicted Values for New Observations

New

Obs Fit SE Fit 95% CI 95% PI

1 45.937 1.046 (43.708, 48.166) (38.779, 53.094)

Values of Predictors for New Observations

New

Obs temp temp^2

1 55.0 3025
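The quadratic fit and this prediction can also be reproduced with a small Python sketch (not part of the original slides; assumes NumPy):

import numpy as np

temp   = np.array([40, 40, 40, 50, 50, 50, 60, 60, 60, 70, 70, 70, 80, 80, 80, 90, 90, 90])
unload = np.array([37, 29, 35, 39, 40, 44, 47, 44, 46, 49, 53, 55, 46, 47, 43, 36, 37, 34])

X = np.column_stack([np.ones_like(temp), temp, temp ** 2])   # columns: constant, temp, temp^2
b, *_ = np.linalg.lstsq(X, unload, rcond=None)

print(b)                           # about [-55.71, 3.141, -0.0235]
print(b @ [1, 55, 55 ** 2])        # predicted unload time at temp = 55, about 45.9 min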

Truck Unload Example in Excel

• Columns for temp, temp2,

unload time

• Include columns for both

temp & temp2

EM 561 GW DePuy 481

Page 99: EM561 Lecture Notes - Part 3 of 3[1]

99

Truck Unload Example in Excel

EM 561 GW DePuy 482

SUMMARY OUTPUT

Regression Statistics

Multiple R 0.90074139

R Square 0.811335052

Adjusted R Square 0.786179726

Standard Error 3.191083809

Observations 18

ANOVA

df SS MS F Significance F

Regression 2 656.865873 328.4329365 32.25301233 3.69559E-06

Residual 15 152.7452381 10.18301587

Total 17 809.6111111

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept -55.7119048 12.22389608 -4.55762258 0.000377434 -81.7665224 -29.6572871

temp 3.141309524 0.394454127 7.96368782 9.10675E-07 2.300550459 3.982068589

temp^2 -0.0235119 0.003015291 -7.79755802 1.179E-06 -0.02993884 -0.01708496

Model a good fit

to data?

Does temperature

affect truck

unload time?

483

Multiple Linear Regression

Another example

• A pastry chef is interested in

how the number of eggs and

the amount of milk affect cake

height.

• Dependent (Y)?

• Independent (X)?

eggs milk height

1 0.5 2.3

1 0.5 2.1

1 0.5 2.5

2 0.5 3.4

2 0.5 3.3

2 0.5 3

3 0.5 4.2

3 0.5 3.9

3 0.5 4.3

1 1 2.4

1 1 2.7

1 1 2.3

2 1 2.8

2 1 2.9

2 1 2.5

3 1 2.9

3 1 3

3 1 3.2

Page 100: EM561 Lecture Notes - Part 3 of 3[1]

100

Multiple Linear Regression

EM 561 GW DePuy 484

• Include terms for

Eggs, Milk, and

Eggs*Milk

• Columns for all terms

and response in

Minitab

Multiple Linear Regression

Regression Analysis: height versus eggs, milk, eggs*milk

The regression equation is

height = 0.600 + 1.55 eggs + 1.58 milk - 1.27 eggs*milk

Predictor Coef SE Coef T P

Constant 0.6000 0.3630 1.65 0.121

eggs 1.5500 0.1680 9.22 0.000

milk 1.5778 0.4592 3.44 0.004

eggs*milk -1.2667 0.2126 -5.96 0.000

S = 0.184089 R-Sq = 93.2% R-Sq(adj) = 91.8%

EM 561 GW DePuy 485

What terms

significantly

affect cake

height?

Is this model a good fit?

Page 101: EM561 Lecture Notes - Part 3 of 3[1]

101

Multiple Linear Regression

• Predict cake height for 1.5 eggs and ¾ cup

of milk.

• height = 0.600 + 1.55 eggs + 1.58 milk -

1.27 eggs*milk

• height = 0.600 + 1.55 *(1.5) + 1.58 *(.75) -

1.27 *(1.5)*(.75) = 2.6833”

EM 561 GW DePuy 486
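A matching Python sketch for the cake model, including the eggs*milk interaction column (not part of the original slides; assumes NumPy):

import numpy as np

eggs   = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=float)
milk   = np.array([0.5] * 9 + [1.0] * 9)
height = np.array([2.3, 2.1, 2.5, 3.4, 3.3, 3.0, 4.2, 3.9, 4.3,
                   2.4, 2.7, 2.3, 2.8, 2.9, 2.5, 2.9, 3.0, 3.2])

X = np.column_stack([np.ones_like(eggs), eggs, milk, eggs * milk])
b, *_ = np.linalg.lstsq(X, height, rcond=None)

print(b)                                      # about [0.600, 1.550, 1.578, -1.267]
print(b @ [1, 1.5, 0.75, 1.5 * 0.75])         # predicted height for 1.5 eggs, 3/4 cup milk, about 2.68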

Multiple Linear Regression

• Interaction of #eggs and milk

EM 561 GW DePuy 487

[Interaction Plot for height (Fitted Means): mean height vs eggs (1, 2, 3), with separate lines for milk = 0.5 and milk = 1.0]

Page 102: EM561 Lecture Notes - Part 3 of 3[1]

102

Multiple Linear Regression in Excel

EM 561 GW DePuy 488

• Columns for eggs, milk,

eggs*milk, height

• Include columns for eggs, milk,

eggs*milk

Multiple Linear Regression in Excel

EM 561 GW DePuy 489

SUMMARY OUTPUT

Regression Statistics

Multiple R 0.96564146

R Square 0.93246342

Adjusted R Square 0.9179913

Standard Error 0.18408935

Observations 18

ANOVA

df SS MS F Significance F

Regression 3 6.550555556 2.18351852 64.431694 1.9533E-08

Residual 14 0.474444444 0.03388889

Total 17 7.025

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 0.6 0.363029095 1.65276009 0.12062076 -0.17861997 1.37861997

eggs 1.55 0.168049816 9.22345549 2.5181E-07 1.18956899 1.91043101

milk 1.57777778 0.459199518 3.43593082 0.00401543 0.59289277 2.56266279

eggs*milk -1.26666667 0.212568072 -5.95887546 3.4936E-05 -1.72257984 -0.8107535

What terms significantly

affect cake height?

Is this model a good fit?

Page 103: EM561 Lecture Notes - Part 3 of 3[1]

103

Test #2 here

• Take home exam

• You may use book, notes, calculator

• You may not discuss test with anyone other

than me.

• Due

• Covers Chapters 6, 7, 8, 9, 11

EM 561 GW DePuy 490

Good-Bye!

• Thanks!

• Keep working hard – you will be successful!!

• Good luck!!

EM 561 GW DePuy 491