MDP 308 Quality Management 2021. 2. 2. · Central limit theorem “ If X 1, X 2…X n is a random...

MDP 308

Quality Management

Lecture #3

Statistical Inference and confidence intervals

Today’s lecture

Statistical inference

Confidence intervals

Selecting a probability distribution

Thinking Challenge

Suppose you’re interested in the average amount of time (in minutes) that students at FECU (the population) spend watching TV each day. How would you find out?


“The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These methods utilize the information contained in a sample from the population in drawing conclusions.”


A population consists of the totality of the observations with which we are concerned.

A sample is a subset of observations selected from a population.

A statistic is any function of the observations in a random sample.

A random sample is a sample collected by making sure that each individual in the population has the same probability of being selected.

Statistical Methods

• Descriptive Statistics: summarize the sample data
• Inferential Statistics: use the data to learn about the population
   – Estimation
   – Hypothesis Testing

Take a random sample, then use statistical methods to treat the collected data.

Estimation Methods

• Point Estimation
• Interval Estimation

Point estimation of parameters

A point estimate of some population parameter $\theta$ is a single numerical value $\hat{\theta}$ of a statistic $\hat{\Theta}$. The statistic $\hat{\Theta}$ is called the point estimator.

Estimation problems occur frequently in engineering. We often need to estimate:

• The mean $\mu$ of a single population
• The variance $\sigma^2$ (or standard deviation $\sigma$) of a single population
• The proportion $p$ of items in a population that belong to a class of interest
• The difference in means of two populations, $\mu_1 - \mu_2$
• The difference in two population proportions, $p_1 - p_2$

Point Estimation

1. Provides a single value
   • Based on observations from one sample
2. Gives no information about how close the value is to the unknown population parameter
3. Example: the sample mean $\bar{x} = 3$ is the point estimate of the unknown population mean $\mu$

Unbiased estimator

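[Figure: distribution of an estimator centered on the true value (no bias) versus one centered away from the true value (bias)]

• An estimator should be “close” in some sense to the true value of the unknown parameter.
• Formally, we say that $\hat{\Theta}$ is an unbiased estimator of $\theta$ if the expected value of $\hat{\Theta}$ is equal to $\theta$, that is, $E(\hat{\Theta}) = \theta$.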
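One way to see this definition at work is a Monte Carlo sketch. It draws many samples from a known normal population and averages each estimator over all of those samples. The sketch is illustrative only. The population values (μ = 10, σ = 2, n = 5), the seed, and the helper name `average_estimates` are arbitrary assumptions. The divisor-n variance is included purely as a contrasting, biased estimator.

```python
import random
import statistics


def average_estimates(mu=10.0, sigma=2.0, n=5, trials=20000, seed=1):
    """Average three estimators over many samples of size n from N(mu, sigma^2).

    If an estimator is unbiased, its average over many samples should be
    close to the parameter it estimates.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, var_n_minus_1, var_n = [], [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        means.append(xbar)                  # sample mean, estimates mu
        var_n_minus_1.append(ss / (n - 1))  # sample variance S^2, estimates sigma^2
        var_n.append(ss / n)                # divisor-n variance, for contrast
    return (statistics.fmean(means),
            statistics.fmean(var_n_minus_1),
            statistics.fmean(var_n))


avg_mean, avg_s2, avg_biased = average_estimates()
print(f"average x-bar: {avg_mean:.3f}  (mu = 10)")
print(f"average S^2 (divisor n-1): {avg_s2:.3f}  (sigma^2 = 4)")
print(f"average divisor-n variance: {avg_biased:.3f}")
```

With these settings, the average of the sample means should land close to 10, and the average of $S^2$ should land close to 4. The divisor-n version should settle near $4(n-1)/n = 3.2$, which shows its bias.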

Interval Estimator

An interval estimator (or confidence interval) is a formula that tells us how to use the sample data to calculate an interval that estimates the target parameter.

Interval Estimation

1. Provides a range of values
   • Based on observations from one sample
2. Gives information about closeness to the unknown population parameter
   • Stated in terms of probability
3. Example: the unknown population mean lies between 50 and 70 with 95% confidence

Estimation Process

[Figure: estimation process. A random sample with mean $\bar{x} = 50$ is drawn from a population whose mean $\mu$ is unknown. This leads to the statement “I am 95% confident that $\mu$ is between 40 & 60.”]

Key Elements of Interval Estimation

[Figure: a confidence interval extending from the lower confidence limit (L) to the upper confidence limit (U), with the sample statistic (point estimate) inside it]

A confidence interval provides a range of plausible values for the population parameter.

Central limit theorem

“If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ taken from a population (either finite or infinite) with mean $\mu$ and finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

as $n \to \infty$ is the standard normal distribution.”

Furthermore, if the variance $\sigma^2$ is unknown and the sample size $n$ is large, the quantity

$$\frac{\bar{X} - \mu}{S/\sqrt{n}},$$

where $S^2$ is the sample variance, has an approximately standard normal distribution.

Central limit theorem

Suppose we take many samples and look at the distribution of the means calculated for each sample. These sample means cluster around the actual population mean, as indicated in the following figure.

Even if a population distribution is strongly non-normal, the sampling distribution of the sample mean will be approximately normal for large sample sizes ($n \ge 30$). The mean of this sampling distribution equals $\mu$, so the sample mean is an unbiased estimator of the population mean.
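The claim can be checked with a small simulation. This sketch is an illustration, not part of the lecture material. It assumes an exponential population with rate 1 (so μ = σ = 1) as the strongly non-normal case, n = 30, and an arbitrary seed. `clt_demo` is a hypothetical helper name. The function reports four values: the mean and standard deviation of the sample means, the theoretical standard error, and the fraction of standardized means falling within ±1.96.

```python
import random
import statistics


def clt_demo(n=30, num_samples=5000, rate=1.0, seed=2):
    """Sample means from a strongly non-normal (exponential) population.

    For an exponential population with this rate, mu = sigma = 1 / rate.
    Returns the mean and standard deviation of the sample means, the
    theoretical standard error sigma / sqrt(n), and the fraction of
    standardized means Z = (xbar - mu) / (sigma / sqrt(n)) within +/- 1.96.
    """
    rng = random.Random(seed)
    mu = sigma = 1.0 / rate
    means = [statistics.fmean([rng.expovariate(rate) for _ in range(n)])
             for _ in range(num_samples)]
    se = sigma / n ** 0.5
    inside = sum(abs((m - mu) / se) <= 1.96 for m in means) / num_samples
    return statistics.fmean(means), statistics.stdev(means), se, inside


mean_of_means, sd_of_means, se, frac = clt_demo()
print(mean_of_means, sd_of_means, se, frac)
```

The expected results are approximate. The first value should be near μ = 1 and the second near σ/√n ≈ 0.183. The fraction should be near 0.95, as it would be for a standard normal variable. The exact numbers depend on the seed.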

Confidence interval

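Confidence interval on the mean of a normal distribution, variance known.

Suppose that $x_1, x_2, \ldots, x_n$ is a random sample from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. We know that

$$\bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$

A confidence interval estimate for $\mu$ has lower and upper limits

$$L, U = \bar{x} \mp z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

The confidence coefficient $1 - \alpha$ is the probability of selecting a sample whose interval contains the true value of $\mu$.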

Confidence interval

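To find the lower and upper confidence limits, start from the standardized sample mean:

$$P\left\{-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha$$

Rearranging the inequalities to isolate $\mu$ gives

$$P\left\{\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$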
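The known-σ interval $\bar{x} \mp z_{\alpha/2}\,\sigma/\sqrt{n}$ translates directly into code. Below is a minimal sketch that assumes σ is known and the population is normal. `z_interval` is a hypothetical helper. The standard library's `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$ in place of a normal table.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided 100(1 - alpha)% confidence interval for mu, sigma known.

    Implements xbar -/+ z_{alpha/2} * sigma / sqrt(n) for a normal population.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2)  # z_{alpha/2}: area alpha/2 to its right
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


lower, upper = z_interval(50, 10, 25)  # illustrative values only
print(f"{lower:.2f} <= mu <= {upper:.2f}")  # 46.08 <= mu <= 53.92
```

The inputs x̄ = 50, σ = 10, and n = 25 are made up for illustration. Lowering `confidence` shrinks the interval because $z_{\alpha/2}$ gets smaller.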

Confidence interval

Interpreting a CI

We cannot say: “with probability $(1 - \alpha)$ the parameter $\mu$ lies in the confidence interval” computed from a particular sample. Once the sample is drawn, that interval either contains $\mu$ or it does not.

We can say: if an infinite number of random samples are collected and a $100(1 - \alpha)\%$ CI for $\mu$ is computed from each sample, $100(1 - \alpha)\%$ of these intervals will contain the true value of $\mu$.

If our confidence level is 95%, then in the long run, 95% of our confidence intervals will contain $\mu$ and 5% will not.
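The long-run interpretation can be simulated directly. This sketch assumes a normal population with illustrative values μ = 50, σ = 10, and n = 25, plus a fixed seed; all of these are arbitrary choices. `coverage` is a hypothetical helper. It builds the known-σ interval for each of many simulated samples and counts how often the interval contains μ.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=50.0, sigma=10.0, n=25, confidence=0.95, trials=10000, seed=3):
    """Fraction of known-sigma intervals (one per simulated sample) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


print(coverage())  # close to 0.95, but not exactly
```

The returned fraction should be close to the chosen confidence level. It will not be exactly 0.95 because the number of trials is finite.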

Effect of Confidence Level

For a confidence coefficient of 95%, the area in the two tails is 0.05. To choose a different confidence coefficient, we increase or decrease the area (call it $\alpha$) assigned to the tails. If we place $\alpha/2$ in each tail and $z_{\alpha/2}$ is the corresponding z-value, the confidence interval with coefficient $(1 - \alpha)$ is

$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$$
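To see the effect numerically, the short sketch below computes $z_{\alpha/2}$ for three common confidence levels using the standard library's `statistics.NormalDist`. `z_half_alpha` is a hypothetical helper name. A larger confidence coefficient gives a larger $z_{\alpha/2}$ and therefore a wider interval.

```python
from statistics import NormalDist


def z_half_alpha(confidence):
    """z_{alpha/2}: the standard normal value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: z = {z_half_alpha(conf):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```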

Conditions Required for a Valid Large-Sample Confidence Interval for $\mu$

1. A random sample is selected from the target population.
2. The sample size $n$ is large (i.e., $n \ge 30$). By the Central Limit Theorem, this condition ensures that the sampling distribution of $\bar{x}$ is approximately normal. Also, for large $n$, $s$ will be a good estimator of $\sigma$.

Large-Sample $100(1 - \alpha)\%$ Confidence Interval for $\mu$

$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Here $z_{\alpha/2}$ is the z-value with an area $\alpha/2$ to its right, and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The parameter $\sigma$ is the standard deviation of the sampled population, and $n$ is the sample size.

Note: When $\sigma$ is unknown and $n$ is large ($n \ge 30$), the confidence interval is approximately equal to

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $s$ is the sample standard deviation.
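Here is a sketch of the σ-unknown version. It assumes the raw measurements are available as a Python list. `large_sample_interval` is a hypothetical helper. Its n ≥ 30 check simply enforces the rule of thumb stated in condition 2 above.

```python
import math
from statistics import NormalDist, fmean, stdev


def large_sample_interval(data, confidence=0.95):
    """Approximate 100(1 - alpha)% CI for mu when sigma is unknown and n >= 30.

    Uses xbar -/+ z_{alpha/2} * s / sqrt(n), with s the sample standard deviation.
    """
    n = len(data)
    if n < 30:
        raise ValueError("large-sample interval assumes n >= 30")
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width
```

Note that `statistics.stdev` uses the n − 1 divisor, so it returns the sample standard deviation $s$ that appears in the formula. For smaller samples from a normal population, the t-based interval is the appropriate choice instead.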

Thinking Challenge

You’re a Q/C inspector for FruitTree. The population standard deviation $\sigma$ for 0.33-liter cans is 0.005 liters. A random sample of 100 cans showed $\bar{x} = 0.329$ liters. What is the 90% confidence interval estimate of the true mean amount in 0.33-liter cans?

Confidence Interval Solution

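$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

$$0.329 - 1.645\frac{0.005}{\sqrt{100}} \le \mu \le 0.329 + 1.645\frac{0.005}{\sqrt{100}}$$

Here $z_{0.05} = 1.645$, and the half-width is $1.645 \times 0.0005 \approx 0.00082$ liters, so

$$0.32818 \le \mu \le 0.32982$$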
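The arithmetic can be double-checked in a few lines. This sketch recomputes the interval using the exact $z_{0.05}$ from `statistics.NormalDist` rather than the rounded table value 1.645. Both versions round to the same five decimals here.

```python
import math
from statistics import NormalDist

# FruitTree data from the Thinking Challenge
sigma = 0.005      # known population standard deviation (liters)
n = 100            # number of cans sampled
xbar = 0.329       # sample mean (liters)
confidence = 0.90

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{0.05}, about 1.645
half_width = z * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(f"{lower:.5f} <= mu <= {upper:.5f}")  # 0.32818 <= mu <= 0.32982
```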

Confidence interval for small sample

Assume that the measured quality characteristic of the population is normally distributed. Then the random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t distribution with $n - 1$ degrees of freedom. Therefore, the confidence interval is given by

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$
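Here is a minimal sketch of the small-sample interval. It assumes the critical value $t_{\alpha/2,n-1}$ is read from the t table and passed in, which avoids depending on a statistics package such as SciPy. The helper name `t_interval` and the ten measurements are hypothetical, chosen only to illustrate the calculation. For n = 10 at 95% confidence, the table value is $t_{0.025,9} = 2.262$.

```python
import math
from statistics import fmean, stdev


def t_interval(data, t_crit):
    """Two-sided CI for mu from a small sample of a normal population.

    t_crit is t_{alpha/2, n-1}, read from the t table for the desired
    confidence level and n - 1 degrees of freedom.
    """
    n = len(data)
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical measurements: n = 10, so df = 9; t_{0.025,9} = 2.262 gives 95%
sample = [10, 12, 11, 9, 13, 10, 11, 12, 10, 12]
lower, upper = t_interval(sample, t_crit=2.262)
print(f"{lower:.3f} <= mu <= {upper:.3f}")  # 10.108 <= mu <= 11.892
```

For this made-up sample, $\bar{x} = 11$ and $s = \sqrt{14/9} \approx 1.247$. The half-width is therefore about $2.262 \times 1.247/\sqrt{10} \approx 0.892$.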

Student t Distribution…

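Here the letter t is used to represent the random variable, hence the name. The density function for the Student t distribution is as follows:

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}, \quad -\infty < t < \infty$$

$\nu$ (nu) is called the degrees of freedom, and $\Gamma$ is the Gamma function. For a positive integer $k$, $\Gamma(k) = (k-1)(k-2)\cdots(2)(1) = (k-1)!$. The density also involves half-integer arguments; for these, use $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(k + 1) = k\,\Gamma(k)$.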

Student t Distribution…[1 parameter]

In much the same way that $\mu$ and $\sigma$ define the normal distribution [2 parameters], $\nu$ [1 parameter], the degrees of freedom, defines the Student t distribution.

As the number of degrees of freedom increases, the t distribution approaches the standard normal distribution.
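The convergence can be checked numerically. This sketch codes the standard Student t density with ν degrees of freedom. It uses `math.lgamma`, so large ν does not overflow `math.gamma`. The sketch then measures the largest gap between the t density and the standard normal density on a grid over [-5, 5]. The helper names and the grid are illustrative assumptions.

```python
import math


def t_pdf(x, nu):
    """Student t density with nu degrees of freedom (log-gamma avoids overflow)."""
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(nu * math.pi)
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def max_gap(nu):
    """Largest |t density - normal density| on a grid over [-5, 5]."""
    grid = [i / 100 for i in range(-500, 501)]
    return max(abs(t_pdf(x, nu) - normal_pdf(x)) for x in grid)


for nu in (1, 5, 30, 100):
    print(nu, round(max_gap(nu), 4))  # the gap shrinks as nu grows
```

With ν = 1, the density at zero is $1/\pi$, noticeably below the normal's $1/\sqrt{2\pi}$. By ν = 100, the two curves are nearly indistinguishable.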

Using the t table (Table 4) for values…

For example, suppose we want the value of t with 10 degrees of freedom such that the area under the Student t curve to its right is 0.05. Read the table as follows:

• Area under the curve to the right of the value (the subscript A in $t_A$): COLUMN
• Degrees of freedom: ROW

$t_{.05,10} = 1.812$

Student t Probabilities and Values

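Excel can calculate Student t distribution probabilities and values. Warning: Excel’s TINV function returns the value of t where $\alpha$ is the total area in BOTH tails. The one-tailed table value $t_{.05,10}$ therefore corresponds to a two-tailed area of 0.1:

=TINV(0.1,10) = 1.812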
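Both conventions can be reproduced without a table. This is a stdlib-only sketch, not how Excel computes TINV internally. It integrates the t density numerically with the composite Simpson rule, then finds the quantile by bisection. The helper names (`t_pdf`, `t_cdf`, `t_upper`, `tinv_two_tailed`), the step count, and the bisection bracket are illustrative assumptions.

```python
import math


def t_pdf(x, nu):
    """Student t density with nu degrees of freedom."""
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    return math.exp(log_coef) / math.sqrt(nu * math.pi) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """P(T <= x): 0.5 plus the integral of the density from 0 to x.

    The integral uses the composite Simpson rule (steps must be even);
    the 0.5 comes from the symmetry of the t density about zero.
    """
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


def t_upper(alpha, nu, hi=100.0):
    """t_{alpha, nu}: the value with area alpha to its RIGHT (t table convention).

    Found by bisection on [0, hi], so it assumes 0 < alpha < 0.5.
    """
    lo = 0.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, nu) > alpha:
            lo = mid  # still too much area to the right: move up
        else:
            hi = mid
    return (lo + hi) / 2


def tinv_two_tailed(prob, nu):
    """Same convention as Excel's TINV: prob is the total area in BOTH tails."""
    return t_upper(prob / 2, nu)


print(round(t_upper(0.05, 10), 3))         # 1.812, the table value t_{.05,10}
print(round(tinv_two_tailed(0.1, 10), 3))  # 1.812, the equivalent of =TINV(0.1,10)
```

Halving the two-tailed probability before the lookup is the whole difference between the two conventions.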

Selecting a probability distribution

In statistical process control, most quality characteristics are random variables.

The determination of the confidence intervals above is based on the assumption that the population distribution is normal.

If that is not the case, we need to test the hypothesis that a particular distribution we select will be satisfactory.

This is done by first collecting numerical values from the real system under study.

Probability plots can be used as a first guess at the probability distribution that suits the collected data.

Then, a goodness-of-fit test can be used for further verification, based on a detailed numerical comparison between the collected data and the selected probability distribution.

Chapter 3, Statistical Quality Control, 7th Edition by Douglas C. Montgomery.

Copyright (c) 2013 John Wiley & Sons, Inc.

3.4 Probability Plots

• Determining if a sample of data might reasonably be assumed to come from a specific distribution
• Probability plots are available for various distributions
• Easy to construct with computer software (MINITAB)
• Subjective interpretation

Normal Probability Plot


The Normal Probability Plot on Standard Graph Paper
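On standard graph paper, a normal probability plot pairs each ordered observation $x_{(j)}$ with the standard normal quantile $z_j = \Phi^{-1}\left((j - 0.5)/n\right)$. If the points fall close to a straight line, the normal model is reasonable. The sketch below assumes the $(j - 0.5)/n$ plotting position; some texts use slightly different positions. It relies on the standard library's `NormalDist.inv_cdf`. `normal_plot_points` is a hypothetical helper, and the plotting itself is left to whatever software is at hand (e.g., MINITAB).

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Coordinates for a normal probability plot on standard graph paper.

    Returns (x_(j), z_j) pairs, where x_(j) is the j-th smallest observation
    and z_j is the standard normal quantile of the plotting position (j - 0.5) / n.
    """
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((j - 0.5) / n))
            for j, x in enumerate(sorted(data), start=1)]


# Hypothetical observations, for illustration only
for x, z in normal_plot_points([197, 176, 209, 188, 201, 214, 193, 185]):
    print(f"{x:>5}  {z:+.3f}")
```

Plotting $z_j$ against $x_{(j)}$ and judging straightness by eye is the subjective step. A goodness-of-fit test can then back up that judgment.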





Other Probability Plots

What is a reasonable choice as a probability model for these data?



3.5 Some Useful Approximations
