MDP 308 Quality Management 2021. 2. 2. · Central limit theorem “ If X 1, X 2…X n is a random...
MDP 308
Quality Management
Lecture #3
Statistical Inference and confidence intervals
Today’s lecture
Statistical inference
Confidence intervals
Selecting probability distribution
Thinking Challenge
Suppose you’re interested in
the average amount of time
(in minutes) the students at
FECU (the population)
spend daily on watching TV.
How would you find out?
Statistical inference
“The field of statistical inference consists of those methods used to make decisions
or to draw conclusions about a population. These methods utilize the information
contained in a sample from the population in drawing conclusions.”
Statistical inference
A population consists of the totality of the observations
with which we are concerned.
A sample is a subset of observations selected from a
population.
A statistic is any function of the observations in a
random sample.
A random sample is a sample collected by making sure
that each individual in the population has the same
probability of being selected.
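The definitions above can be illustrated with a minimal sketch. The population below is made-up data standing in for the daily TV minutes of the students in the Thinking Challenge; the population size, the seed, and the sample size of 50 are all assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: daily TV minutes for 5,000 students (made-up data).
rng = random.Random(308)
population = [max(0.0, rng.gauss(90, 40)) for _ in range(5000)]

# Simple random sample: every student has the same chance of being selected.
sample = rng.sample(population, 50)

# A statistic is any function of the sample observations, e.g. the sample mean.
sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)  # unknown in a real study

print(f"sample mean = {sample_mean:.1f} min, population mean = {population_mean:.1f} min")
```

In a real study only the sample would be observed; the population mean is computed here just to show how close the statistic lands.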
Statistical Methods
• Descriptive statistics: summarize the sample data
• Inferential statistics: use the data to learn about the population
  • Estimation
  • Hypothesis testing
Take a random sample, then use statistical methods to treat the collected data
Estimation Methods
• Point estimation
• Interval estimation
Point estimation of parameters
A point estimate of some population parameter θ is a single numerical value θ̂ of a statistic Θ̂. The statistic Θ̂ is called the point estimator.
Estimation problems occur frequently in engineering. We often need to estimate:
• The mean μ of a single population
• The variance σ² (or standard deviation σ) of a single population
• The proportion p of items in a population that belong to a class of interest
• The difference in means of two populations, μ1 − μ2
• The difference in two population proportions, p1 − p2
Point Estimation
1. Provides a single value
• Based on observations from one sample
2. Gives no information about how close the value is to
the unknown population parameter
3. Example: Sample mean x̄ = 3 is the point
estimate of the unknown population mean μ
Unbiased estimator
[Figure: two panels, “No bias” and “Bias”, each marked with the true value]
• An estimator should be “close” in some sense to the
true value of the unknown parameter.
• Formally, we say that Θ̂ is an unbiased estimator of θ if
the expected value of Θ̂ is equal to θ, that is, E(Θ̂) = θ.
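Unbiasedness can be checked empirically by averaging an estimator over many repeated samples. The sketch below is only an illustration under assumed values (a normal population with μ = 10 and σ = 2, samples of size 5); it compares the sample mean, the sample variance with the n − 1 divisor, and the variance with the n divisor, which is a standard example of a biased estimator.

```python
import random
import statistics

rng = random.Random(3)
mu, sigma, n, reps = 10.0, 2.0, 5, 10000   # assumed, illustrative values

means, var_n_minus_1, var_n = [], [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))
    var_n_minus_1.append(statistics.variance(sample))   # divides by n - 1
    var_n.append(statistics.pvariance(sample))          # divides by n

# Averages over many samples approximate the expected value of each estimator.
avg_mean = statistics.mean(means)
avg_var_unbiased = statistics.mean(var_n_minus_1)
avg_var_biased = statistics.mean(var_n)

print(f"average sample mean       = {avg_mean:.3f} (mu = {mu})")
print(f"average variance (n - 1)  = {avg_var_unbiased:.3f} (sigma^2 = {sigma**2})")
print(f"average variance (n)      = {avg_var_biased:.3f}")
```

The first two averages should settle near μ and σ², while the n-divisor version settles near (n − 1)/n · σ², below the true value.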
Interval Estimator
An interval estimator (or confidence interval) is
a formula that tells us how to use the sample data to
calculate an interval that estimates the target parameter.
Interval Estimation
1. Provides a range of values
• Based on observations from one sample
2. Gives information about closeness to unknown
population parameter
• Stated in terms of probability
3. Example: Unknown population mean lies between 50
and 70 with 95% confidence
Estimation Process
Population: the mean, μ, is unknown.
Random sample: sample mean x̄ = 50.
Interval estimate: I am 95% confident that μ is between 40 and 60.
Key Elements of Interval Estimation
• Sample statistic (point estimate)
• Confidence interval
• Confidence limit (lower, L)
• Confidence limit (upper, U)
A confidence interval provides a range of
plausible values for the population parameter.
Central limit theorem
“If X1, X2, …, Xn is a random sample of size n taken from the
population (either finite or infinite) with mean μ and
variance σ², and if X̄ is the sample mean, the limiting form of
the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
as n → ∞, is the standard normal distribution.”
Furthermore, if the variance σ² is unknown and the sample
size n is large, the quantity
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$
where S² is the sample variance, has an approximate standard normal distribution.
Central limit theorem
By taking more than one sample and looking at the distribution of means
calculated for each sample, we can see that this calculated mean
approaches the actual population mean as indicated in the following figure.
Even if a population distribution is strongly non-normal, its sampling
distribution of means will be approximately normal for large sample sizes
(n ≥ 30), and the mean of a sampling distribution of means is an unbiased
estimator of the population mean.
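A small simulation can make this behaviour visible. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed choice made only for illustration) and draws many samples of size n = 30; the number of repetitions and the seed are also assumptions.

```python
import random
import statistics

rng = random.Random(30)
n, reps = 30, 5000            # sample size and number of repeated samples (assumed)

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Standardize each mean with Z = (x_bar - mu) / (sigma / sqrt(n)).
z_values = [(m - 1.0) / (1.0 / n ** 0.5) for m in sample_means]
# Fraction of Z values inside +/-1.96; near 0.95 if Z is roughly standard normal.
inside = sum(abs(z) <= 1.96 for z in z_values) / reps

print(f"mean of sample means = {mean_of_means:.3f} (population mean = 1)")
print(f"sd of sample means   = {sd_of_means:.3f} (sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
print(f"fraction with |Z| <= 1.96 = {inside:.3f}")
```

The mean of the sample means should sit close to the population mean, and their spread close to σ/√n, even though individual observations are far from normal.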
Confidence interval
Confidence interval on the mean of a normal distribution,
variance known.
Suppose that x1, x2, ..., xn is a random sample from a normal
distribution with unknown μ and known σ2 .
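We know that the sample mean is normally distributed with mean μ and standard deviation σ/√n:
$$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
A confidence interval estimate for μ is an interval L ≤ μ ≤ U built from the standardized statistic
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
The confidence level is the probability of selecting a sample whose interval [L, U] contains the true value of µ.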
Confidence interval
In order to find lower and upper confidence limits:
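$$P\left\{-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right\} = 1 - \alpha$$
$$P\left\{\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$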
Confidence interval
Interpreting a CI
We cannot say: "with probability (1 − α) the parameter μ lies in the
confidence interval."
We can say that: if an infinite number of random samples are collected and
a 100(1−α)% CI for µ is computed from each sample, 100(1−α)% of these
intervals will contain the true value of µ.
If our confidence level is 95%, then in the long run, 95% of
our confidence intervals will contain µ and 5% will not.
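The long-run interpretation can be checked with a simulation in which the true mean is known. The sketch below assumes a normal population with μ = 50, σ = 10 and samples of size 25 (all illustrative values), builds a 95% interval from each sample, and counts how often the interval contains μ.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(95)
mu, sigma, n = 50.0, 10.0, 25       # assumed "true" values, known only in a simulation
alpha, reps = 0.05, 4000
z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value with area alpha/2 to its right
half_width = z * sigma / n ** 0.5

covered = 0
for _ in range(reps):
    x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
    # Each sample gives a different interval; mu itself never changes.
    if x_bar - half_width <= mu <= x_bar + half_width:
        covered += 1

coverage = covered / reps
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The printed fraction should be close to 0.95. Any single interval either contains μ or does not; the 95% describes the procedure over many samples.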
Effect of Confidence Level
For a confidence coefficient of 95%, the area in the two
tails is .05. To choose a different confidence coefficient
we increase or decrease the area (call it α) assigned
to the tails. If we place α/2 in each tail and z_{α/2} is
the z-value with area α/2 to its right, the confidence
interval with coefficient (1 – α) is
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$$
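To see how the z-value changes with the confidence coefficient, the following sketch uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical and introduced here only for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z-value with area alpha/2 to its right, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%} confidence: z = {z_critical(confidence):.3f}")
```

The z-values come out at about 1.645, 1.960 and 2.576, so a higher confidence coefficient gives a wider interval for the same data.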
Conditions Required for a Valid Large-Sample
Confidence Interval for µ
1. A random sample is selected from the target
population.
2. The sample size n is large (i.e., n ≥ 30). Due to the
Central Limit Theorem, this condition guarantees
that the sampling distribution of x̄ is approximately
normal. Also, for large n, s will be a good estimator
of σ.
Large-Sample 100(1 – α)% Confidence
Interval for µ
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where z_{α/2} is the z-value with an area α/2 to its right and σ_x̄ = σ/√n. The parameter σ is the standard deviation of the sampled population, and n is the sample size.
Note: When σ is unknown and n is large (n ≥ 30), the confidence interval is approximately equal to
$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
where s is the sample standard deviation.
Thinking Challenge
You’re a Q/C inspector for
FruitTree. The σ for 0.33-liter
cans is .005 liters. A random
sample of 100 cans showed x̄
= 0.329 liters. What is the 90%
confidence interval estimate of
the true mean amount in 0.33-liter
cans?
Confidence Interval Solution
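$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$0.329 - 1.645\,\frac{0.005}{\sqrt{100}} \le \mu \le 0.329 + 1.645\,\frac{0.005}{\sqrt{100}}$$
$$0.32818 \le \mu \le 0.32982$$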
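The same calculation can be reproduced in a few lines. This is a minimal sketch using only the standard library; the function name `z_interval` is hypothetical, and the inputs are the values given in the Thinking Challenge (x̄ = 0.329, σ = 0.005, n = 100, 90% confidence).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (or n is large)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645 for 90%
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_interval(x_bar=0.329, sigma=0.005, n=100, confidence=0.90)
print(f"{lower:.5f} <= mu <= {upper:.5f}")
```

The half-width is 1.645 × 0.005/√100 ≈ 0.00082 liters, so the limits sit symmetrically about 0.329.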
Confidence interval for small sample
By assuming that the measured characteristic of the
population is normally distributed, the random
variable
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with n − 1 degrees of freedom.
Therefore, the confidence interval is given by:
$$\bar{x} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$
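As a rough sketch of how the small-sample interval is computed, the example below uses ten made-up fill volumes (hypothetical data, not from the lecture) and a critical value read from a t table for α/2 = 0.025 and 9 degrees of freedom. The function name `t_interval` is also an assumption for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Small-sample confidence interval for the mean of a normal population."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                     # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical measurements (made-up data); n = 10, so n - 1 = 9 degrees of freedom.
fill_volumes = [0.331, 0.328, 0.330, 0.327, 0.329, 0.332, 0.328, 0.330, 0.329, 0.326]
T_025_9 = 2.262                         # t table value for alpha/2 = 0.025, 9 d.o.f.

lower, upper = t_interval(fill_volumes, T_025_9)
print(f"95% CI: {lower:.4f} <= mu <= {upper:.4f}")
```

Passing the critical value in explicitly keeps the sketch free of third-party libraries; with a statistics package available, the table lookup would normally be replaced by its t quantile function.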
Student t Distribution…
Here the letter t is used to represent the random
variable, hence the name. The density function for the
Student t distribution is as follows…
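$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
ν (nu) is called the degrees of freedom, and
Γ (Gamma function) is Γ(k) = (k−1)(k−2)…(2)(1) for a positive integer k.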
Student t Distribution…[1 parameter]
In much the same way that μ and σ define the normal distribution [2 parameters], ν, the degrees of freedom, defines the Student t distribution [1 parameter].
As the number of degrees of freedom increases, the t distribution approaches the standard normal distribution.
Using the t table (Table 4) for values…
For example, if we want the value of t with 10 degrees of
freedom such that the area under the Student t curve to its right is .05:
Area under the curve: COLUMN
Degrees of freedom: ROW
t.05,10 = 1.812
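The table value can be cross-checked numerically. The sketch below is one possible approach, not the method behind the printed table: it evaluates the standard Student t density, integrates it with Simpson's rule, and finds the critical value by bisection. All function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, nu):
    """Student t density with nu degrees of freedom (lgamma avoids overflow)."""
    c = exp(lgamma((nu + 1) / 2) - lgamma(nu / 2)) / sqrt(nu * pi)
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def upper_tail_area(t, nu, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's rule applied on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_critical(alpha, nu):
    """Value of t with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, nu) > alpha:
            lo = mid        # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

for nu in (10, 30, 120):
    print(f"t(.05, {nu}) = {t_critical(0.05, nu):.3f}")
```

For 10 degrees of freedom this reproduces 1.812, and the values shrink toward the standard normal value 1.645 as the degrees of freedom grow.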
Student t Probabilities and Values
Excel can calculate Student t distribution probabilities and
values. Warning: Excel will give you the value for “t”
where α is the area in “BOTH” tails:
=TINV(0.1,10) returns 1.812
Selecting a probability distribution
In statistical process control, most of the quality characteristics are random variables.
The determination of confidence intervals is based on the assumption that the population distribution is Normal.
If that is not the case, we need to test the hypothesis that a particular distribution we select will be satisfactory.
This is done by first collecting numerical values from the real system under study.
Probability plots can be used as a first guess of the probability distribution function that can suit the collected data.
Then, a goodness-of-fit test can be used for further verification based on detailed numerical comparison between the collected data and the selected probability distribution.
Chapter 3, Statistical Quality Control, 7th Edition by Douglas C. Montgomery.
Copyright (c) 2013 John Wiley & Sons, Inc.
3.4 Probability Plots
• Determining if a sample of data might reasonably be assumed to come from a specific distribution
• Probability plots are available for various distributions
• Easy to construct with computer software (MINITAB)
• Subjective interpretation
Normal Probability Plot
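The coordinates of a normal probability plot can be computed without plotting software. The sketch below assumes the common plotting position (j − 0.5)/n for the j-th smallest observation and uses ten made-up measurements; the function names and the data are hypothetical. The correlation between the two coordinates is added only as a rough numerical companion to the visual, subjective judgment of straightness.

```python
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return (theoretical z, ordered observation) pairs for a normal probability plot."""
    ordered = sorted(data)
    n = len(ordered)
    # Plotting position (j - 0.5)/n for the j-th smallest value (an assumed, common choice).
    return [(NormalDist().inv_cdf((j - 0.5) / n), x)
            for j, x in enumerate(ordered, start=1)]

def correlation(points):
    """Pearson correlation between the two plot coordinates."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical measurements (made-up data).
data = [9.8, 10.4, 10.1, 9.5, 10.9, 10.0, 9.7, 10.3, 10.6, 9.9]
points = normal_plot_points(data)
r = correlation(points)

for z, x in points:
    print(f"z = {z:6.3f}   x = {x:5.1f}")
print(f"correlation of plot coordinates = {r:.3f}")
```

If the points fall close to a straight line (correlation near 1), the normal distribution is a plausible model for the data; strong curvature suggests trying another distribution.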
The Normal Probability Plot on Standard
Graph Paper
Other Probability Plots
What is a reasonable choice as a probability model
for these data?
3.5 Some Useful Approximations