CS433 Modeling and Simulation Lecture 05 Statistical Analysis Tools
CS433 Modeling and Simulation Lecture 12 Output Analysis Large-Sample Estimation Theory Dr. Anis...
-
Upload
alban-reeves -
Category
Documents
-
view
227 -
download
2
Transcript of CS433 Modeling and Simulation Lecture 12 Output Analysis Large-Sample Estimation Theory Dr. Anis...
CS433Modeling and Simulation
Lecture 12
Output AnalysisLarge-Sample Estimation
Theory
Dr. Anis Koubâa
http://10.2.230.10:4040/akoubaa/cs433/
10 Jan 2009
Al-Imam Mohammad Ibn Saud UniversityAl-Imam Mohammad Ibn Saud University
Goals of Today
Understand the problem of confidence in simulation results
Learn how to determine of range of value with a certain confidence a certain stochastic simulation result
Understand the concept of Margin of Error Confidence Interval with a certain level
of confidence
Reading
Required Lemmis Park, Discrete Event Simulation - A
First Course, Chapter 8: Output Analysis
Optional Harry Perros, Computer Simulation Technique
- The Definitive Introduction, 2007Chapter 5
Problem Statement
For a deterministic simulation model one run will be sufficient to determine the output.
A stochastic simulation model will not give the same result when run repetitively with independent random seed. One run is not sufficient to obtain confident
simulation results from one sample. Statistical Analysis of Simulation Result:
multiple runs to estimate the metric of interest with a certain confidence
Example:
The estimation of the mean value of the response time of M/M/1 Queue (from a simulation or experiments) May vary from one run to another (depending
on the seed) Depends on the number of samples/size of
samples Objective
For a given large sample output, determine what is the mean value with a certain confidence on the result.
M. Peter JurkatUNM/MPJ CS452/Mgt532 V. Output Analysis
6
Stochastic Process Simulation
Each stochastic variable (e.g., time between arrivals) is generated by a stream of random numbers from a RNG beginning with a particular seed value need multiple runs and statistical analysis for stochastic
models (i.e., sampling) The same RNG with the same seed, x0, will always
generate the same sequence of pseudo-random numbers repeated simulation with the same inputs (e.g. time,
parameters) and the same seed will result in identical outputs
For independent replications, we need a different seed for each replication (تكرار)
7
Simulation Sampling
Each simulation run may yield only one value of each simulation output (e.g., average waiting time, proportion of time server is idle)
Need replications to gain statistical stability and significance (confidence) in output distribution
M. Peter JurkatUNM/MPJ CS452/Mgt532 V. Output Analysis
8
Simulation Termination (Simulation Time)
Terminating Simulation: Runs for some duration of time TE, where E is a specified event that stops the simulation. Bank example: Opens at 8:30 am (time 0) with no customers
present and 8 of the 11 teller working (initial conditions), and closes at 4:30 pm (Time TE = 480 minutes).
Non-Terminating Simulation: Runs continuously, or at least over a very long period of time. Examples: simulating telephone systems, or a computer
network Main Objective: Study the steady-state (long-run) properties
of the system, properties that are not influenced by the initial conditions of the model ⇨ Collecting a Large Sample
9
Experimental Design Issues
What types of parameters to estimate?
In general, a stochastic variable is described by their probability distributions and parameters. For quantitative random variables, the
distributions are described by the mean and variance
For a binomial random variables, the location and shape are determined by p.
If the values of parameters are unknown, we make inferences about them using sample information.
Types of Inference
Estimation:Estimation: Estimating or predicting the value of the parameter from simulation results.
“What is (are) the most likely values of or p?”
ت?ْن=باط استدالل ا?س=
Types of Inference - Example
Examples:Examples: A consumer wants to estimate the
average price of similar homes in his city before putting his home on the market.Estimation:Estimation: Estimate , the average home price.
A engineer wants to estimate the average waiting time in a queue obtained by a simulation.
Estimation:Estimation: Estimate , the average waiting time in the queue.
Estimators
Definitions In statistics, an estimator is
a function of the observable sample data that is
used to estimate an unknown population parameter (which
is called the estimand);
an estimate is the result from the actual application of the function to a particular sample of data.
Many different estimators are possible for any given parameter. Some criterion is used to choose between the estimators,
although it is often the case that a criterion cannot be used
to clearly pick one estimator over another.
http://en.wikipedia.org/wiki/Estimator
Estimation Procedure
To estimate a parameter of interest (e.g., a population mean, a binomial proportion, a difference between two population means, or a ratio of two population standard deviation), the usual procedure is as follows:
1. Select a random sample from the population of interest
(simulation output, experimental measures, random
variables, etc).
2. Calculate the point estimate of the parameter (i.e. the mean
value of the sample).
3. Calculate a measure of its variability, often a confidence
interval.
4. Associate with this estimate a measure of variability.
http://en.wikipedia.org/wiki/Estimator
Definitions There are two types of estimators:
Point estimatorPoint estimator: : It is a single number calculated to estimate the parameter. Example: average value of a sample.
Interval estimatorInterval estimator:: Two numbers are calculated to create an interval within which the parameter is expected to lie. An interval estimation uses the sample data to
calculate an interval of possible values of an unknown parameter.
The most known forms of interval estimation are: Confidence intervals (a Frequentist Method) Credible intervals (a Bayesian Method).
Point Estimators
Properties of Point Estimators
The estimator depends on the sampleThe estimator depends on the sample: Since an
estimator is calculated from sample values, it varies
from sample to sample according to its sampling sampling
distributiondistribution..
An unbiased estimatorunbiased estimator is an estimator where the
mean of its sampling distribution equals the real real
(expected) mean value(expected) mean value of the parameter of interest.
It does not systematically overestimate or underestimate
the target parameter.
Properties of Point Estimators Among all the unbiasedunbiased estimators, we
prefer the estimator whose sampling distribution has the smallest smallest spreadspread (or variabilityvariability).
Measuring the Goodness of an Estimator
Error of estimation (Error of estimation (or or Bias) Bias) is the distance between an estimate and the true value of the parameter.
The distance between the bullet and the bull’s-eye.
The distance between the bullet and the bull’s-eye.
Because of the Central Limit Theorem.
Because of the Central Limit Theorem.
In this chapter, the sample sizes are large, so that our unbiased estimators will have normal distributions..
The Margin of Error FACT. For unbiasedunbiased estimators with normal
sampling distributions, 95% of all point estimates will lie within 1.96 standard deviations of the parameter of interest.
estimator theoferror std96.1 estimator theoferror std96.1
Margin of error: The maximum error of estimation, calculated as
Estimating Means and Proportions
For a quantitative population,
n
sn
xμ
96.1 :)30(error ofMargin
:mean population ofestimator Point
n
sn
xμ
96.1 :)30(error ofMargin
:mean population ofestimator Point
For a binomial population,
n
qpn
x/npp
ˆˆ96.1 :)30(error ofMargin
ˆ : proportion population ofestimator Point
n
qpn
x/npp
ˆˆ96.1 :)30(error ofMargin
ˆ : proportion population ofestimator Point
Example 1
Point estimator of : 252,000
15,000Margin of error : 1.96 1.96 3,675
64
μ x
s
n
Point estimator of : 252,000
15,000Margin of error : 1.96 1.96 3,675
64
μ x
s
n
A homeowner randomly samples 64 homes similar to
his own and finds that the average selling price is
252,000 SAR with a standard deviation of 15,000 SAR.
Question: Estimate the average selling price for all
similar homes in the city.
A quality control technician wants to estimate the
proportion of soda bottles that are under-filled. He
randomly samples 200 bottles of soda and finds 10
under-filled cans.
What is the estimation of the proportion of under-
filled cans?200 proportion of underfilled cans
ˆPoint estimator of : 10 / 200 .05
ˆ ˆ (.05)(.95)Margin of error: 1.96 1.96 .03
200
n p
p p x/n
pq
n
200 proportion of underfilled cans
ˆPoint estimator of : 10 / 200 .05
ˆ ˆ (.05)(.95)Margin of error: 1.96 1.96 .03
200
n p
p p x/n
pq
n
Example 2
Interval Estimators
Confidence Interval
Interval Estimation
• Create an interval (a, b) so that you are fairly sure that the parameter lies between these two values.
• “Fairly sure” means “with high probability”, measured using the confidence coefficient, 1-confidence coefficient, 1-..
• Suppose 1- = 0.95 and that the estimator has a normal distribution.
Parameter 1.96SEParameter 1.96SE
Usually, 1- = 0.90, 0.95, 0.98, 0.99
Interval Estimation
• Since we don’t know the value of the parameter, consider
which has a variable center.
• Only if the estimator falls in the tail areas will the interval fail to enclose the parameter. This happens only 5% of the time.
Estimator 1.96SEEstimator 1.96SE
WorkedWorkedWorked
Failed
APPLETAPPLETMY
To Change the Confidence Level
• To change to a general confidence level, 1-, pick a value of z that puts area 1- in the center of the z-distribution (i.e. Normal Distribution N(0,1).
100(1-)% Confidence Interval: Estimator zSE100(1-)% Confidence Interval: Estimator zSE
Tail area
/2
Confidence Level z/2
0.05 0.1 90% 1.645
0.025 0.05 95% 1.96
0.01 0.02 98% 2.33
0.005 0.01 99% 2.58
Confidence Intervals for Means and Proportions
For a quantitative population
n
szx
μ
2/
:mean population afor interval Confidence
n
szx
μ
2/
:mean population afor interval Confidence
For a binomial population
n
qpzp
p
ˆˆˆ
: proportion population afor interval Confidence
2/n
qpzp
p
ˆˆˆ
: proportion population afor interval Confidence
2/
Example 1A random sample of n = 50 males
showed a mean average daily intake of dairy products equal to 756 grams with a standard deviation of 35 grams. Find a 95% confidence interval for the population average m.
n
sx 96.1
50
3596.17 56 70.97 56
grams. 65.70 746.30or 7
Example 1 Find a 99% confidence interval for m,
the population average daily intake of dairy products for men.
n
sx 58.2
50
3558.27 56 77.127 56
grams. 7 743.23or 77.68 The interval must be wider to provide for the increased confidence that is does indeed enclose the true value of .
Example 2
Of a random sample of n = 150 college students, 104 of the students said that they had played on a soccer team during their K-12 years. Estimate the proportion of college students who played soccer in their youth with a 98% confidence interval.
n
qpp
ˆˆ33.2ˆ
150
)31(.69.33.2
104
150
09.. 69 .60or .78. p
Estimate the Difference between two means
Estimating the Difference between Two Means
Sometimes we are interested in comparing the means of two populations.
The average growth of plants fed using two different nutrients. The average scores for students taught with two different teaching methods.
To make this comparison,
. varianceand mean with 1 population
fromdrawn size of sample randomA 211
1
μ
n
. varianceand mean with 1 population
fromdrawn size of sample randomA 211
1
μ
n
. varianceand mean with 2 population
fromdrawn size of sample randomA 222
2
μ
n
. varianceand mean with 2 population
fromdrawn size of sample randomA 222
2
μ
n
Estimating the Difference between Two Means
We compare the two averages by making inferences about , the difference in the two population averages.
If the two population averages are the same, then = 0. The best estimate of is the difference in the two sample means,
21 xx 21 xx
The Sampling Distribution of 1 2x x
.SE as
estimated becan SE and normal,ely approximat is of
ondistributi sampling thelarge, are sizes sample theIf .3
.SE is ofdeviation standard The 2.
means. population the
in difference the, is ofmean The 1.
2
22
1
21
21
2
22
1
21
21
2121
n
s
n
s
xx
nnxx
xx
.SE as
estimated becan SE and normal,ely approximat is of
ondistributi sampling thelarge, are sizes sample theIf .3
.SE is ofdeviation standard The 2.
means. population the
in difference the, is ofmean The 1.
2
22
1
21
21
2
22
1
21
21
2121
n
s
n
s
xx
nnxx
xx
Estimating 1-
For large samples, point estimates and their margin of error as well as confidence intervals are based on the standard normal distribution (z-distribution).
2
22
1
21
2121
1.96 :Error ofMargin
:-for estimatePoint
n
s
n
s
xx
2
22
1
21
2121
1.96 :Error ofMargin
:-for estimatePoint
n
s
n
s
xx
2
22
1
21
2/21
21
)(
:-for interval Confidence
n
s
n
szxx
2
22
1
21
2/21
21
)(
:-for interval Confidence
n
s
n
szxx
Example
Compare the average daily intake of dairy products of men and women using a 95% confidence interval.
78.126
.78.6 18.78-or 21
Avg Daily Intakes
Men Women
Sample size 50 50
Sample mean 756 762
Sample Std Dev 35 30
2
22
1
21
21 96.1)(n
s
n
sxx
2 235 30(756 762) 1.96
50 50
Example, continued
• Could you conclude, based on this confidence interval, that there is a difference in the average daily intake of dairy products for men and women?
• The confidence interval contains the value 11--= =
0.0. Therefore, it is possible that 11 = = .You would not want to conclude that there is a difference in average daily intake of dairy products for men and women.
78.6 18.78- 21 78.6 18.78- 21
Estimating the Difference between Two Proportions
Sometimes we are interested in comparing the proportion of “successes” in two binomial populations.
The proportion of male and female voters who favor a particular candidate.
To make this comparison,
.parameter with 1 population binomial
fromdrawn size of sample randomA
1
1
p
n.parameter with 1 population binomial
fromdrawn size of sample randomA
1
1
p
n
.parameter with 2 population binomial
fromdrawn size of sample randomA
2
2
p
n.parameter with 2 population binomial
fromdrawn size of sample randomA
2
2
p
n
Estimating the Difference between Two Means
We compare the two proportions by making inferences about p1-p2, the difference in the two population proportions.
If the two population proportions are the same, then p1-p2 = 0. The best estimate of p1-p2 is the difference in the two sample proportions,
2
2
1
121 ˆˆ
n
x
n
xpp
2
2
1
121 ˆˆ
n
x
n
xpp
The Sampling Distribution of 1 2ˆ ˆp p
.ˆˆˆˆ
SE as
estimated becan SE and normal,ely approximat is ˆˆ of
ondistributi sampling thelarge, are sizes sample theIf .3
.SE is ˆˆ ofdeviation standard The 2.
s.proportion population the
in difference the, is ˆˆ ofmean The 1.
2
22
1
11
21
2
22
1
1121
2121
n
qp
n
qp
pp
n
qp
n
qppp
pppp
.ˆˆˆˆ
SE as
estimated becan SE and normal,ely approximat is ˆˆ of
ondistributi sampling thelarge, are sizes sample theIf .3
.SE is ˆˆ ofdeviation standard The 2.
s.proportion population the
in difference the, is ˆˆ ofmean The 1.
2
22
1
11
21
2
22
1
1121
2121
n
qp
n
qp
pp
n
qp
n
qppp
pppp
Estimating p1-p
2
22
1
11
2121
ˆˆˆˆ1.96 :Error ofMargin
ˆˆ :for estimatePoint
n
qp
n
qp
pp-pp
2
22
1
11
2121
ˆˆˆˆ1.96 :Error ofMargin
ˆˆ :for estimatePoint
n
qp
n
qp
pp-pp
2
22
1
112/21
21
ˆˆˆˆ)ˆˆ(
:for interval Confidence
n
qp
n
qpzpp
pp
2
22
1
112/21
21
ˆˆˆˆ)ˆˆ(
:for interval Confidence
n
qp
n
qpzpp
pp
For large samples, point estimates and their margin of error as well as confidence intervals are based on the standard normal distribution (z-distribution).
Example
Compare the proportion of male and female college students who said that they had played sport in a team during their K-12 years using a 99% confidence interval.
2
22
1
1121
ˆˆˆˆ58.2)ˆˆ(
n
qp
n
qppp
70
)44(.56.
80
)19(.81.58.2)
70
39
80
65( 19.52.
.44. .06or 21 pp
Youth Soccer
Male Female
Sample size 80 70
Played soccer
65 39
Example, continued
• Could you conclude, based on this confidence interval, that there is a difference in the proportion of male and female college students who said that they had played sport in a team during their K-12 years?
• The confidence interval does not contains the value p1-p2 = 0. Therefore, it is not likely that p1= p2. You would conclude that there is a difference in the proportions for males and females.
44. .06 21 pp 44. .06 21 pp
A higher proportion of males than females played soccer in their youth.
One Sided Confidence Bounds
Confidence intervals are by their nature two-sided two-sided since they produce upper and lower bounds for the parameter.
One-sided bounds One-sided bounds can be constructed simply by using a value of z that puts rather than /2 in the tail of the z distribution.
Estimator) ofError Std(Estimator :UCB
Estimator) ofError Std(Estimator :LCB
z
zEstimator) ofError Std(Estimator :UCB
Estimator) ofError Std(Estimator :LCB
z
z
How to Choose the Sample Size?
Choosing the Sample Size
The total amount of relevant information in a sample is controlled by two factors:- The sampling plansampling plan or experimental experimental designdesign: the procedure for collecting the information- The sample size sample size nn: the amount of information you collect.
In a statistical estimation problem, the accuracy of the estimation is measured by the margin of errormargin of error or the width of the width of the confidence intervalconfidence interval..
1. Determine the size of the margin of error, B, that you are willing to tolerate.
2. Choose the sample size by solving for n or n n 1 n2 in the inequality: 1.96 SE B, where SE is a function of the sample size n.
3. For quantitative populations, estimate the population standard deviation using a previously calculated value of ss or the range approximation Range / 4.Range / 4.
4. For binomial populations, use the conservative approach and approximate p using the value pp .5 .5.
Choosing the Sample Size
Example
A producer of PVC pipe wants to survey wholesalers who buy his product in order to estimate the proportion of wholesalers who plan to increase their purchases next year. What sample size is required if he wants his estimate to be within .04 of the actual proportion with probability equal to .95?
04.96.1 n
pq04.
)5(.5.96.1
n
5.2404.
)5(.5.96.1 n 25.6005.24 2 n
He should survey at least 601 wholesalers.
Key Concepts
I. Types of EstimatorsI. Types of Estimators1. Point estimator: a single number is calculated to estimate the population parameter.2. Interval estimatorInterval estimator: two numbers are calculated to form an interval that contains the parameter.
II. Properties of Good EstimatorsII. Properties of Good Estimators1. Unbiased: the average value of the estimator equals the parameter to be estimated.2. Minimum variance: of all the unbiased estimators, the best estimator has a sampling distribution with the smallest standard error.3. The margin of error measures the maximum distance between the estimator and the true value of the parameter.
III. Large-Sample Point EstimatorsIII. Large-Sample Point Estimators
To estimate one of four population parameters when the sample sizes are large, use the following point estimators with the appropriate margins of error.
Key Concepts
Key Concepts
IV. Large-Sample Interval EstimatorsIV. Large-Sample Interval Estimators
To estimate one of four population parameters when the sample sizes are large, use the following interval estimators.
Key Concepts
1. All values in the interval are possible values for the unknown population parameter.
2. Any values outside the interval are unlikely to be the value of the unknown parameter.
3. To compare two population means or proportions, look for the value 0 in the confidence interval. If 0 is in the interval, it is possible that the two population means or proportions are equal, and you should not declare a difference. If 0 is not in the interval, it is unlikely that the two means or proportions are equal, and you can confidently declare a difference.
V. One-Sided Confidence BoundsV. One-Sided Confidence BoundsUse either the upper () or lower () two-sided bound, with the critical value of z changed from z / 2 to z.