1614 probability-models and concepts
dr-fereidoun-dejahang -
PROBABILITY: MODELS & CONCEPTS
Chance, Consequence & Strategy: Likelihood or Probability
Since there is little in life that occurs with absolute certainty, probability theory has found application in virtually every field of human endeavor.
Why Probability Theory?
As we observe the universe about us, wonderful Craftsmanship can be seen.
As we examine the elements of this creation we discover that there is incredible order, but also variation therein.
Probability theory seeks to describe the variation or randomness within order so that underlying order may be better understood.
Once understood, strategies can be more effectively formulated and their risks evaluated.
Objective Assessment: A Priori & A Posteriori Probability
A priori means “before the fact”; probability assessments of this sort typically rely on a study of the traits of the phenomenon under consideration. They are based on theory.
A posteriori means “after the fact”. This approach to likelihood assessment is also called the “relative frequency” approach. It is based on repeated observation.
Likelihood Concepts: Events
• As we observe a phenomenon, we generally note that varying, and sometimes even “identical”, conditions do not always give rise to identical results. As a phenomenon is repeatedly observed, the various possible results can be thought of as “events”.
Mutually Exclusive Events
• Any number of events are said to be mutually exclusive if they have no overlap or commonality.
[Figure: two non-overlapping regions, A and B. A and B are mutually exclusive.]
Mutually Exclusive & Exhaustive Events
• A collection of events is exhaustive if, taken in totality, they account for all possible results or outcomes.
“Nothing is impossible, Mario; improbable, unlikely maybe, but not impossible.” Luigi Mario speaking to his brother, Mario Mario, in the movie “Super Mario Bros.”
Intersection & Union of Events
The intersection of two or more events is like the intersection of two streets: it is the property they share in common.
The intersection of events A and B is symbolized by AB (also written A ∩ B).
The union of two or more events is the totality of results captured by these events. The union of two events A and B is symbolized by A∪B.
Notation & Definitions
The probability of the event A is given by: P(A)
The probability of AB is P(AB) = P(A) + P(B) - P(A∪B), where P(A∪B) is the probability of the union of events A and B.
The conditional probability of the event A given that the event B has occurred is: P(A|B) = P(AB)/P(B)
DEPENDENCE & INDEPENDENCE
Two events A and B are said to be independent if and only if: P(A|B) = P(A) and P(B|A) = P(B)
It follows from this that if A and B are independent then P(AB) = P(A)*P(B)
This is the multiplication rule for independent events.
A Service Sector Example: Fast Food Clientele
A leading fast food restaurant chain routinely and randomly surveys its customers in an effort to continually improve its ability to serve its clientele. Two primary questions on the survey address frequency of customer patronage and primary reason for this patronage. Results of last month’s survey of 1,000 customers are recorded in the following table.
Survey of 1000 Customers: Frequency of and Reason for Patronage
Reason               occasional   moderate   frequent   TOTALS
menu/food                    60        120         30      210
customer relations           75        180         45      300
value/cost                   35        200         40      275
location/access              60         80         25      165
other reason                 20         20         10       50
TOTALS                      250        600        150     1000
Marginal Probability
• Marginal probabilities can be thought of as the probabilities of being in the various margins of the table. For example, the marginal probability of a customer patronizing the restaurant chain due to menu, regardless of frequency of patronage, is:
• P(menu) = 210/1000 = .21
• The various marginal probabilities for this example are determined and represented graphically as follows. The graphs are “marginal probability distributions”.
Frequency of Patronage: Marginal Probability Distribution
• Occasional Patrons: P(occasional) = 250/1000 = .25
• Moderate Patrons: P(moderate) = 600/1000 = .60
• Frequent Patrons: P(frequent) = 150/1000 = .15
[Chart: marginal probability (vertical axis, 0 to 0.6) for occasional, moderate, and frequent patrons]
Reasons for Patronage: Marginal Probability Distribution
• Menu: P(Menu) = 210/1000 = .21
• C.Rel.: P(CR) = 300/1000 = .30
• Value: P(Value) = 275/1000 = .275
• Location: P(Loc) = 165/1000 = .165
• Other: P(Other) = 50/1000 = .05
[Chart: marginal probability (vertical axis, 0 to 0.3) for menu, cust. rel., value, location, and other]
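The marginal probabilities listed for the survey can be reproduced directly from the table of counts. The following is a minimal Python sketch; the names `COUNTS`, `FREQUENCIES`, `p_reason` and `p_frequency` are illustrative choices, not part of the original slides.

```python
from fractions import Fraction

# Survey counts copied from the table: one row per reason,
# columns ordered as occasional, moderate, frequent.
FREQUENCIES = ["occasional", "moderate", "frequent"]
COUNTS = {
    "menu/food":          [60, 120, 30],
    "customer relations": [75, 180, 45],
    "value/cost":         [35, 200, 40],
    "location/access":    [60, 80, 25],
    "other reason":       [20, 20, 10],
}

n = sum(sum(row) for row in COUNTS.values())  # grand total of customers surveyed

# Marginal distribution of reason: row total divided by grand total.
p_reason = {reason: Fraction(sum(row), n) for reason, row in COUNTS.items()}

# Marginal distribution of frequency: column total divided by grand total.
p_frequency = {
    freq: Fraction(sum(row[j] for row in COUNTS.values()), n)
    for j, freq in enumerate(FREQUENCIES)
}

for name, p in {**p_reason, **p_frequency}.items():
    print(f"P({name}) = {float(p):.3f}")
```

Exact fractions are used so that each marginal distribution sums to exactly 1.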
Joint Probability
• Consider the cross-tabulation relating the two traits:
– frequency of patronage, and
– primary reason for patronage
• Joint probabilities are probabilities of intersections of the categories (or events) of two traits. As an example, the joint probability that a customer is moderate in their patronage and their primary reason for patronage is the menu is given by
– P(moderate ∩ menu) = P(AB) = 120/1000 = .120
• A graphical representation of the complete joint probability distribution follows.
Reasons & Frequency of Patronage: Joint Probabilities
[Chart: joint probability (vertical axis, 0 to 0.2) by frequency of patronage (occasional, moderate, frequent), with one series per reason (menu, cust. rel., value, location, other)]
Conditional Probability
Conditional probability can be thought of as probability determined in the mode of either “what if” or “given that”.
For example, we might ask, “what is the probability that a customer’s primary reason for patronage is the value (A), given that the customer is frequent (B) in their patronage?”
This is symbolized by P(A|B) and is calculated as P(AB)/P(B) where the vertical line, “|” is read as “given that”.
Thus a “conditional probability” is equal to the probability of the appropriate intersection, divided by the marginal probability of the given.
Reasons for Frequent Patrons: Conditional Probabilities
[Chart: conditional probability (vertical axis, 0 to 0.4) of each reason (menu, cust. rel., value, location, other), given frequent patronage]
• P(value | frequent) = P(value ∩ frequent)/P(frequent) = (40/1000) / (150/1000) = .04/.15 = .267
• This is represented by the “red” bar in the chart. The entire “conditional probability distribution of reasons for patronage by frequent customers” is displayed in the chart.
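The calculation P(A|B) = P(AB)/P(B) can be checked for every reason at once. This is a small Python sketch using the survey counts; the helper names `joint`, `marginal_frequency` and `conditional` are hypothetical and introduced only for illustration.

```python
from fractions import Fraction

FREQUENCIES = ["occasional", "moderate", "frequent"]
COUNTS = {
    "menu/food":          [60, 120, 30],
    "customer relations": [75, 180, 45],
    "value/cost":         [35, 200, 40],
    "location/access":    [60, 80, 25],
    "other reason":       [20, 20, 10],
}
TOTAL = sum(sum(row) for row in COUNTS.values())

def joint(reason, frequency):
    """P(reason and frequency): one cell of the table over the grand total."""
    return Fraction(COUNTS[reason][FREQUENCIES.index(frequency)], TOTAL)

def marginal_frequency(frequency):
    """P(frequency): a column total over the grand total."""
    j = FREQUENCIES.index(frequency)
    return Fraction(sum(row[j] for row in COUNTS.values()), TOTAL)

def conditional(reason, frequency):
    """P(reason | frequency) = P(reason and frequency) / P(frequency)."""
    return joint(reason, frequency) / marginal_frequency(frequency)

# Conditional distribution of reasons among frequent patrons.
given_frequent = {reason: conditional(reason, "frequent") for reason in COUNTS}
for reason, p in given_frequent.items():
    print(f"P({reason} | frequent) = {float(p):.3f}")
```

Because the conditioning event is fixed, the five conditional probabilities form a distribution of their own and sum to 1.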
Independence & Dependence
Recall that two events, A & B, are independent if and only if P(A|B) = P(A) and P(B|A) = P(B)
Are the events A & B independent where:
A = Primary patronage reason is customer relations
B = Customer is frequent in patronage
Recall that P(A) = .30, that P(B) = .15 and that P(AB) = 45/1000 = .045, so that
P(A|B) = P(AB)/P(B) = .045/.15 = .30 = P(A)
P(B|A) = P(AB)/P(A) = .045/.30 = .15 = P(B)
Indeed, A & B are independent.
Independence - Key Concept
If two events, A & B, are independent then the occurrence of one of the two events does not change the LIKELIHOOD or probability that the other of the two events will occur.
Occurrence of one of the two events does alter the MANNER in which the other of the two events may occur.
Dependence
If two events A & B are dependent then P(A|B) will not equal P(A) and, similarly, P(B|A) will not equal P(B).
Let A = primary reason for patronage is menu.
Let B = frequency of patronage is moderate.
We have P(A|B) = 120/600 = .20, which is not equal to P(A) = 210/1000 = .21.
P(B|A) = 120/210 = .57, which is not equal to P(B) = 600/1000 = .60.
In this case, even though the values are comparable, they are not equal => dependence.
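Both survey comparisons (customer relations with frequent patronage, and menu with moderate patronage) can be checked with exact arithmetic, which avoids judging equality from rounded decimals. A minimal sketch; the function name `check_independence` is an assumption made for this illustration.

```python
from fractions import Fraction

TOTAL = 1000  # customers surveyed

def check_independence(count_a, count_b, count_ab, total=TOTAL):
    """Compare P(A|B) with P(A) and P(B|A) with P(B) using exact fractions."""
    p_a = Fraction(count_a, total)
    p_b = Fraction(count_b, total)
    p_ab = Fraction(count_ab, total)
    p_a_given_b = p_ab / p_b
    p_b_given_a = p_ab / p_a
    independent = (p_a_given_b == p_a) and (p_b_given_a == p_b)
    return p_a_given_b, p_b_given_a, independent

# A = customer relations (300), B = frequent (150), A and B together = 45
print(check_independence(300, 150, 45))

# A = menu (210), B = moderate (600), A and B together = 120
print(check_independence(210, 600, 120))
```

The first pair satisfies P(AB) = P(A)*P(B) exactly (.045 = .30 * .15); the second does not (.120 versus .21 * .60 = .126).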
Dependence - Key Concept
If two events A & B are dependent, then occurrence of one of the two events will alter the likelihood and the manner in which the other of the two events may occur.
In the case of mutually exclusive events, occurrence of one of the two events will preclude occurrence of the other event.
Mutually exclusive events with nonzero probabilities are always dependent.
Probability Models
Probability models are mathematical descriptions of the behavior of one or more variables. The ability to somewhat anticipate the behavior of a variable can be useful in risk assessment and strategy formulation.
Three commonly used models, the binomial, Poisson, and normal models, are introduced.
Random variables described by these models may be either ‘discrete’ (binomial, Poisson) or ‘continuous’ (normal).
Mean, Variance and Standard Deviation of a Random Variable
The mean of a random variable (r.v.) Y is denoted by $\mu_Y$. For a discrete r.v. Y this is calculated as: $\mu_Y = \sum_i y_i P(y_i)$
This is the weighted average of the values of Y. For continuous random variables, integration replaces summation.
The variance and standard deviation of the r.v. Y are represented by $\sigma_Y^2$ and $\sigma_Y$, respectively. For a discrete r.v. Y, these are:
$\sigma_Y^2 = \sum_i P(y_i)(y_i - \mu_Y)^2$ and $\sigma_Y = \sqrt{\sigma_Y^2}$
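The weighted-sum definitions of the mean, variance and standard deviation translate almost line for line into code. The sketch below uses a small hypothetical distribution (values 0, 1, 2 with probabilities .2, .5, .3) that does not come from the slides; it is chosen only so the arithmetic is easy to follow by hand.

```python
import math

def mean_variance_sd(pmf):
    """pmf maps each value y_i to P(y_i); the probabilities must sum to 1."""
    if not math.isclose(sum(pmf.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(y * p for y, p in pmf.items())               # weighted average of the values
    var = sum(p * (y - mu) ** 2 for y, p in pmf.items())  # weighted squared deviations
    return mu, var, math.sqrt(var)

# Hypothetical distribution, for illustration only.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
mu, var, sd = mean_variance_sd(pmf)
print(f"mean = {mu:.2f}, variance = {var:.2f}, sd = {sd:.2f}")
```

Here the mean is 0(.2) + 1(.5) + 2(.3) = 1.1 and the variance is .2(1.21) + .5(.01) + .3(.81) = .49, so the standard deviation is .7.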
The Poisson Model
Napoleon had a problem: many of his men were killed when kicked in the head by their own horse or mule.
Napoleon had to plan for this problem.
The Poisson model helped him to do so.
Poisson Conditions
The Poisson model (or distribution) is commonly applicable when:
• We are modeling events which occur only “rarely”, where “rare” means “rare relative to opportunity for occurrence”.
• Our random variable will be the “number of occurrences of the event over the region of opportunity for occurrence”.
Poisson Conditions: Region of Opportunity
Examples of region of opportunity include:
• number of customers arriving per minute (or any other time unit);
• number of phone calls arriving at a switchboard per unit time;
• number of scars on the surface of a compact disk.
Generally “region of opportunity” is defined either temporally or spatially.
The Poisson Model is Integral to the Study of
Queueing Theory
The Poisson Model
• Defining our random variable as Y = “number of occurrences of the event over the region of opportunity”, y = 0, 1, 2, 3, ..., we have the Poisson probability model:
• $P(y) = \mu^y e^{-\mu} / y!$ for y = 0, 1, 2, 3, ...
• where $\mu$ is the mean or average number of occurrences of the event over the region of opportunity and e = 2.7183 is the natural base.
Estimation of the Process Mean, $\mu$
• The mean of the Poisson process is $\mu$.
• The variance of the process is also $\mu$, that is, $\sigma^2 = \mu$,
• so that the standard deviation is $\sigma = \sqrt{\mu}$.
• In the following example we proceed as though $\mu$ is of known value. When this is not the case we simply estimate $\mu$ with $\bar{X}$, the mean of the sample.
First Federal Bank of Centreville: A Queueing Example
First Federal Bank (FFB) of Centreville has an automatic teller machine (ATM) near the entrance of the bank.
Long lines at the ATM have sometimes led to congestion and perhaps a diminishing clientele. With a view toward improved customer service, FFB is considering the addition of one or more ATMs or, possibly, relocation of the current ATM.
During peak hours ATM users arrive in a manner described by a Poisson distribution with a mean of 1.7 customers per minute.
First Federal Bank of Centreville: The Probability Distribution
• What is the probability that no customers arrive in one minute during a peak business period?
• Solution: $P(0) = 1.7^0 e^{-1.7}/0! = .1827$
• Similarly, $P(1) = 1.7^1 e^{-1.7}/1! = .3106$
• Determine probabilities for 2, 3, ..., 9 customers.
The probability distribution appears on the next slide.
FFB of Centreville: Probability Distribution
Poisson probabilities and Poisson cumulative probabilities with µ = 1.7
 x    P(X = x)    P(X ≤ x)
 0     0.1827      0.1827
 1     0.3106      0.4932
 2     0.2640      0.7572
 3     0.1496      0.9068
 4     0.0636      0.9704
 5     0.0216      0.9920
 6     0.0061      0.9981
 7     0.0015      0.9996
 8     0.0003      0.9999
 9     0.0001      1.0000
10     0.0000
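The Poisson probabilities and their running totals for a mean of 1.7 can be regenerated with a few lines of Python. This is a sketch under the stated model only; `poisson_pmf` is a helper name chosen for this illustration.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) = mu^y * e^(-mu) / y! for y = 0, 1, 2, ..."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

MU = 1.7  # mean arrivals per minute during peak hours (from the example)

print(" x   P(X = x)   P(X <= x)")
cumulative = 0.0
for x in range(11):
    p = poisson_pmf(x, MU)
    cumulative += p  # the cdf is the running sum of the probabilities
    print(f"{x:2d}   {p:8.4f}   {cumulative:9.4f}")
```

The cumulative column is obtained simply by adding each new probability to the running total.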
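First Federal Bank of Centreville: ATM Customer Probabilities
[Chart: P(X = x) (vertical axis, 0 to 0.35) for x = 0 to 9 customers per minute]
First Federal Bank of Centreville: CDF Graph
[Chart: cumulative probability P(X ≤ x) (vertical axis, 0 to 1) for x = 0 to 9 customers per minute]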
The cdf graph above was constructed by adding the appropriate Poisson probabilities.
First Federal Bank of Centreville: Key Considerations
Key factors that FFB should address prior to making a decision include:
• What is the service rate (how quickly do customers complete their ATM transactions)?
• If long lines are forming during peak hours, the service rate may be less than the customer arrival rate and addition of one or more ATMs may be necessary.
• If the problem is congestion, rather than excessive wait to use the ATM, the solution may be to simply move the ATM.
Model Adequacy: Chi-Square Goodness of Fit Testing
DOES THIS MODEL FIT? Chi-Square Goodness-of-Fit Tests
The purpose of $\chi^2$ goodness-of-fit tests is to evaluate whether a particular probability distribution does an adequate job of modeling the behavior of the process under consideration. This sort of test can be applied to any model.
A “skeleton” or template for the chi-square goodness-of-fit test follows.
$\chi^2$ Goodness of Fit Test - General Layout
1) H0: $p_1 = p_{10}$, $p_2 = p_{20}$, ... , $p_k = p_{k0}$
HA: at least one $p_i \neq p_{i0}$
2) n = _______ $\alpha$ = _______
3) DR (decision rule): Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit}$ = ___. Otherwise, FTR (fail to reject) H0.
4) $\chi^2_{calc} = \sum_i (O_i - n p_{i0})^2 / (n p_{i0}) = \sum_i (O_i - E_i)^2 / E_i$
5) Interpretation: Should relate to whether the hypothesized model adequately describes behavior of the process under consideration.
Generic Example: A computer manufacturer produces a disk drive which has three major causes of failure (A, B, C) and a variety of minor failure causes (D).
Suppose that historic failure rates are:
Due to A: .20   Due to B: .35   Due to C: .30   Due to D: .15
The manufacturer has worked on A, B, and C and believes that failures due to these causes have been reduced, so that, while fewer failures will occur, it is more likely that when one occurs, it will be due to D. To examine this claim the manufacturer will sample 200 failed disk drives manufactured since process changes were made. IF THE CHANGES HAD NO IMPACT then the EXPECTED numbers of these failed drives due to causes A, B, C, and D would be:
$E_A = n p_{A0} = 200(.20) = 40$   $E_B = n p_{B0} = 200(.35) = 70$
$E_C = n p_{C0} = 200(.30) = 60$   $E_D = n p_{D0} = 200(.15) = 30$
Upon observation, suppose that we had $O_A = 28$, $O_B = 66$, $O_C = 46$, $O_D = 60$. Test the appropriate hypothesis at the $\alpha$ = .05 level.
Failure Mode Profile Example - Continued
1) H0: $p_A = .20$, $p_B = .35$, $p_C = .30$, $p_D = .15$
HA: at least one $p_i \neq p_{i0}$ for i = A, B, C, D
2) n = 200, $\alpha$ = .05
3) DR: Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit} = 7.8147$. Otherwise, FTR H0. Note: There are (k-1) = 3 degrees of freedom.
4) $\chi^2_{calc} = \sum_i (O_i - n p_{i0})^2/(n p_{i0}) = \sum_i (O_i - E_i)^2/E_i$
= $(28-40)^2/40 + (66-70)^2/70 + (46-60)^2/60 + (60-30)^2/30$ = 3.6000 + 0.2286 + 3.2667 + 30.0000 = 37.0953
5) Interpretation: Since $\chi^2_{calc}$ exceeds $\chi^2_{crit}$, we can conclude that the historic failure mode distribution no longer applies (reject H0 in favor of HA). So how has the distribution changed? The answer is embedded in the individual category contributions to $\chi^2_{calc}$ ... larger contributions indicate where the changes have occurred: reductions in A and C, no obvious change in B, and the various failures that make up D now comprise a (proportionally) larger share of the failures.
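The five-step layout reduces to a short computation once the observed counts and hypothesized proportions are in hand. The sketch below reworks the disk drive example; the function name `chi_square_statistic` is an assumption, and the critical value 7.8147 is simply the table value quoted in the example rather than something the code derives.

```python
def chi_square_statistic(observed, probabilities):
    """Return expected counts, per-category contributions and their sum."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, contributions, sum(contributions)

causes = ["A", "B", "C", "D"]
observed = [28, 66, 46, 60]             # failures observed after the process changes
historic = [0.20, 0.35, 0.30, 0.15]     # H0 proportions
CRITICAL_VALUE = 7.8147                 # chi-square table value, alpha = .05, 3 df

expected, contributions, chi2_calc = chi_square_statistic(observed, historic)

for cause, o, e, c in zip(causes, observed, expected, contributions):
    print(f"{cause}: O = {o:3d}  E = {e:5.1f}  contribution = {c:7.4f}")
print(f"chi-square = {chi2_calc:.4f}")
print("Reject H0" if chi2_calc > CRITICAL_VALUE else "Fail to reject H0")
```

Printing the individual contributions makes the interpretation step concrete: category D alone contributes 30 of the roughly 37.1 total.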
Chi-Square Goodness of Fit Test for the Poisson Distribution
A sample of 120 minutes selected during rush periods at FFB gave the following number of customers arriving during each of those 120 minutes. Are these data consistent with a Poisson distribution with a mean of 1.7 customers per minute, as previously stated? Test the appropriate hypothesis at the $\alpha$ = .10 level of significance.
Number of Customers    0    1    2    3    4 or more
Frequency             25   42   35    9    9
FFB of Centreville: Poisson Goodness of Fit Test
Customers/minute   Prob.    Obs (O)   Exp (E)   (O-E)^2/E
0                  0.1827      25     21.924     0.4316
1                  0.3106      42     37.272     0.5998
2                  0.2640      35     31.680     0.3479
3                  0.1496       9     17.952     4.4640
≥ 4                0.0932       9     11.184     0.4265
Total              1.00       120     120        6.2698 = $\chi^2_{calc}$
With $\alpha$ = .10 and (k-1) = 4 df, the critical value is 7.7794.
FFB of Centreville - Continued
1) H0: the number of customers arriving per minute is Poisson distributed with a mean of 1.7, OR p(0) = .1827, p(1) = .3106, p(2) = .2640, p(3) = .1496, p(4+) = .0932
HA: the number of customers arriving per minute is not Poisson with $\mu$ = 1.7
2) n = 120 and $\alpha$ = .10
3) DR: Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit} = 7.7794$. Otherwise, FTR H0. (NOTE - THERE ARE 4 DF)
4) $\chi^2_{calc} = 6.2698$ (calculations on previous slide)
5) FTR H0. In this case, the number of customers arriving per minute during the business rush at FFB of Centreville is reasonably well-modeled by a Poisson distribution with a mean of 1.7.
As a modification: if we had not had information about the mean number of customers arriving per minute, we would have had to estimate this value with the sample mean and then determine the estimated probabilities. This would have cost an additional degree of freedom (i.e. df = (k-1) - 1 = 3).
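The Poisson goodness-of-fit calculation for the 120 sampled minutes can be sketched as follows. The code uses unrounded Poisson probabilities, so its statistic (about 6.27) differs slightly in the third decimal from the hand calculation based on four-decimal probabilities. The names are illustrative, and the critical value is the quoted table value, not computed here.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) = mu^y * e^(-mu) / y!"""
    return mu ** y * math.exp(-mu) / math.factorial(y)

MU = 1.7                               # hypothesized mean, given rather than estimated
observed = [25, 42, 35, 9, 9]          # minutes with 0, 1, 2, 3, and 4 or more arrivals
n = sum(observed)

# Category probabilities under H0; the last category is the upper tail P(Y >= 4).
probs = [poisson_pmf(y, MU) for y in range(4)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

estimated_parameters = 0               # would be 1 if mu were estimated by the sample mean
df = (len(observed) - 1) - estimated_parameters
CRITICAL_VALUE = 7.7794                # chi-square table value, alpha = .10, 4 df

print(f"chi-square = {chi2_calc:.4f} with {df} df")
print("Reject H0" if chi2_calc > CRITICAL_VALUE else "Fail to reject H0")
```

Setting `estimated_parameters = 1` reproduces the degree-of-freedom adjustment described for the case where the mean must be estimated; a different critical value would then be needed.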
Binomial Conditions
Suppose that:
• there are two possible outcomes to an experiment which are mutually exclusive and exhaustive (refer to these generically as “success” and “failure”);
• there is a predetermined sample size, n;
• the probability of “success” is a constant, p, and the probability of “failure” is a constant, (1-p);
• the condition of one item is not influenced by the condition of any other item (this is called independence).
Collectively, these are the binomial conditions.
Binomial Probability Model
• When the binomial conditions are present, and the random variable Y is defined as the number of “successes” out of n items sampled, then the model which determines probabilities for the various values of Y is given by:
• $P(Y = y) = {}_nC_y \, p^y (1-p)^{n-y}$
• where ${}_nC_y = n!/[y!(n-y)!]$ is read as the number of combinations of n things selected y-at-a-time,
• with x! being x(x-1)(x-2)...(1) for any positive integer x (and 0! = 1),
• so that, for example, 5! = 5(4)(3)(2)(1) = 120
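The binomial formula maps directly onto Python's standard library, where `math.comb(n, y)` gives the number of combinations and `math.factorial` gives x!. The example values n = 4 and p = .5 below are hypothetical, chosen only because the result is easy to check by hand.

```python
import math

def binomial_pmf(y, n, p):
    """P(Y = y) = nCy * p^y * (1-p)^(n-y)."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

print(math.factorial(5))   # 5! = 5(4)(3)(2)(1)
print(math.comb(6, 4))     # 6!/(4! 2!)

# Hypothetical illustration: 4 trials with success probability .5.
n, p = 4, 0.5
distribution = {y: binomial_pmf(y, n, p) for y in range(n + 1)}
for y, prob in distribution.items():
    print(f"P(Y = {y}) = {prob:.4f}")
```

With n = 4 and p = .5, P(Y = 2) = 6(.25)(.25) = .375, and the five probabilities sum to 1.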
Binomial Mean, Variance and Standard Deviation
• Although the formulas previously presented can be used to determine the values of $\mu_Y$, $\sigma_Y^2$ and $\sigma_Y$, the following results are more easily applied in the binomial case:
$\mu_Y = np$
$\sigma_Y^2 = np(1-p)$
$\sigma_Y = \sqrt{np(1-p)}$
Estimation of p
The binomial parameter, p, is thought of as the “probability that any single item sampled is identified as a ‘success’”.
Frequently this value will be unknown and will need to be estimated from sample information.
p is estimated as simply x/n where x is the number of ‘successes’ in the sample of n items. This estimate is often denoted by $\hat{p}$. Similarly, the estimate of (1-p) is $(1-\hat{p})$.
The Electronix Store
In a competitive local retail electronics market, the probability that a randomly selected “customer” browsing in The Electronix Store will make a purchase is .2.
If 6 “customers” are randomly selected, what is the probability that exactly 2 of these individuals will make a purchase?
This and similar questions can be addressed via the binomial distribution.
The Electronix Store
We identify:
n = 6 customers
p = .2 = probability that a customer buys
Y = number of the six customers who buy
Thus we see that:
$\mu_Y = np = 6(.2) = 1.2$ customers
$\sigma_Y^2 = np(1-p) = 6(.2)(.8) = .96$
$\sigma_Y = \sqrt{.96} = .98$ customers
The Electronix Store
• We have:
– P(0) = .2621   P(1) = .3932
– P(2) = .2458   P(3) = .0819
– P(4) = .0154 = $\{6!/[4!2!]\}(.2)^4(.8)^2$ = 15(.0016)(.64)
– P(5) = .0015   P(6) = .0001 (or .000064)
The Electronix Store: Customer Purchase Probabilities
[Chart: P(Y = y) (vertical axis, 0 to 0.4) for y = 0 to 6 customers making a purchase]
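The Electronix Store probabilities, and the agreement between the shortcut results np and np(1-p) and the general weighted-sum definitions, can be confirmed with a short sketch. Variable names are illustrative only.

```python
import math

N, P = 6, 0.2   # six customers, each buying with probability .2

# Binomial probabilities for 0 through 6 purchases.
pmf = {y: math.comb(N, y) * P ** y * (1 - P) ** (N - y) for y in range(N + 1)}

# Mean and variance from the general definitions for a discrete r.v.
mean = sum(y * p for y, p in pmf.items())
variance = sum(p * (y - mean) ** 2 for y, p in pmf.items())

for y, p in pmf.items():
    print(f"P({y}) = {p:.4f}")
print(f"mean = {mean:.2f} (np = {N * P:.2f})")
print(f"variance = {variance:.2f} (np(1-p) = {N * P * (1 - P):.2f})")
print(f"standard deviation = {math.sqrt(variance):.2f}")
```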
The Electronix Store
We may require answers to such questions as:
• “What is the probability that no more than two of six customers make a purchase?”
• “What is the probability that at least four of six customers make a purchase?”
• “How many cash registers are needed?”
Answers to these and similar questions can be investigated through study such as we have undertaken.
The Electronix Store: Cumulative Probabilities
[Chart: cumulative probability P(Y ≤ y) (vertical axis, 0 to 1) for y = 0 to 6 customers making a purchase]
• A tabulation of the “less than or equal to” probabilities is called a “cumulative distribution function” or cdf. The Electronix Store cdf appears in the chart.
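The “no more than two” and “at least four” questions for the Electronix Store are both read off the cdf. A minimal sketch, assuming the same n = 6 and p = .2 as in the example:

```python
import math
from itertools import accumulate

N, P = 6, 0.2

pmf = [math.comb(N, y) * P ** y * (1 - P) ** (N - y) for y in range(N + 1)]
cdf = list(accumulate(pmf))          # cdf[y] = P(Y <= y), the running sum of pmf

p_at_most_two = cdf[2]               # P(Y <= 2)
p_at_least_four = 1 - cdf[3]         # P(Y >= 4) = 1 - P(Y <= 3)

print(f"P(no more than two buy) = {p_at_most_two:.4f}")
print(f"P(at least four buy)    = {p_at_least_four:.4f}")
```

An “at least” probability is the complement of a “less than or equal to” probability one step lower, which is why `cdf[3]` appears in the second calculation.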
The Electronix Store: Strategy
Application of this information might spark discussion on:
• staffing decisions,
• sales representative specialization,
• focus on merchandise, value, and customer service.
$\chi^2$ Goodness-of-Fit Test: Binomial Example
Oil & Gas Exploration is both expensive and risky. The average cost of a “dry hole” is in excess of $20 million. New technologies are always under development in an effort to reduce the likelihood of drilling a “dry hole”, with the result being increased profitability. Suppose an experimental technology has been developed that claims to have an 80% success rate (i.e. only 20% dry holes). This technology was tested by drilling four holes in each of 100 fields and counting the number of productive wells in each field. The data are recorded below:
Number of productive wells    0    1    2    3    4
Observed frequency            3    6   22   41   28
Test the appropriate hypothesis at the $\alpha$ = .01 level of significance.
Oil & Gas Exploration Example
1) H0: the new technology delivers success according to a binomial distribution with p = .8, or ... p(0 or 1) = .0272, p(2) = .1536, p(3) = .4096, p(4) = .4096 (NOTE - SEE NEXT PAGE FOR THESE VALUES). The categories 0 and 1 are combined because their expected counts are small.
HA: the new technology does not deliver success according to a binomial distribution with p = .8.
2) n = 100 and $\alpha$ = .01
3) DR: Reject H0 in favor of HA iff $\chi^2_{calc} > \chi^2_{crit} = 11.3449$. Otherwise, FTR H0.
4) $\chi^2_{calc} = 21.4705$ (calculations on next slide)
5) Reject H0 in favor of HA. In this case, note that “O” tends to be greater than “E” for lower numbers of successful wells, and the reverse for higher numbers of successful wells ... this indicates that the success rate of the new technology is LESS THAN THE CLAIMED 80% rate.
Hits   Prob     Count   Expected
0      0.0016      3      0.16
1      0.0256      6      2.56
2      0.1536     22     15.36
3      0.4096     41     40.96
4      0.4096     28     40.96
Combined   C-Prob   C-Count   C-Expect   (O-E)^2/E
0-1        0.0272       9       2.72      14.4994
2          0.1536      22      15.36       2.8704
3          0.4096      41      40.96       0.0000
4          0.4096      28      40.96       4.1006
X^2calc = 21.4705
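The binomial goodness-of-fit table, including the pooling of the 0 and 1 categories, can be reproduced with the sketch below. Names are illustrative, and the critical value 11.3449 is the quoted table value for $\alpha$ = .01 and 3 df, not something the code derives.

```python
import math

def binomial_pmf(y, n, p):
    """P(Y = y) = nCy * p^y * (1-p)^(n-y)."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

WELLS_PER_FIELD, P_CLAIMED = 4, 0.8
observed_by_hits = [3, 6, 22, 41, 28]    # fields with 0, 1, 2, 3, 4 productive wells

probs_by_hits = [binomial_pmf(y, WELLS_PER_FIELD, P_CLAIMED)
                 for y in range(WELLS_PER_FIELD + 1)]

# Pool the 0 and 1 categories, as in the worked table.
observed = [observed_by_hits[0] + observed_by_hits[1]] + observed_by_hits[2:]
probs = [probs_by_hits[0] + probs_by_hits[1]] + probs_by_hits[2:]

n = sum(observed)
expected = [n * p for p in probs]
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_calc = sum(contributions)
CRITICAL_VALUE = 11.3449                 # chi-square table value, alpha = .01, 3 df

for label, o, e, c in zip(["0-1", "2", "3", "4"], observed, expected, contributions):
    print(f"{label:>3}: O = {o:2d}  E = {e:6.2f}  contribution = {c:7.4f}")
print(f"chi-square = {chi2_calc:.4f}")
print("Reject H0" if chi2_calc > CRITICAL_VALUE else "Fail to reject H0")
```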
Modified Oil & Gas Exploration Example (still binomial)
If p were unknown, then it would have to be estimated from the data. There is a cost to this: a lost degree of freedom. In general df = (k - 1) - m, where
k = number of categories,
-1 because the probabilities across all categories add to one (lacking only one probability, we can determine the other),
m = the number of parameters that must be estimated.
In this case, the estimate of p is obtained as follows: a total of 400 wells were drilled (100 fields at 4 wells each). The number of productive wells was (3*0 + 6*1 + 28*2 + 41*3 + 22*4) = 273, so that our estimate of p is 273/400 = .6825. (This calculation, like the table that follows, takes 28 fields with two productive wells and 22 fields with four, the reverse of the frequencies tabulated for the original example.) The modified calculations follow.
Modified Oil & Gas Exploration Example
MTB > pdf;
SUBC> binomial n=4 p=.6825.
BINOMIAL WITH N = 4 P = 0.682500
K                 P(X = K)   Observed   Expected   (O-E)^2/E
0                 0.0102
1                 0.0874
0-1 (combined)    0.0976         9        9.76      0.0592
2                 0.2817        28       28.17      0.0010
3                 0.4037        41       40.37      0.0098
4                 0.2170        22       21.70      0.0041
                                                    0.0742 = calculated value of $\chi^2$
MTB > invcdf .99;
SUBC> chis 2.
0.9900 9.2103 = critical value
Clearly we would FTR H0. So if you combine the information, you have not rejected the binomial distribution altogether, though you did reject the binomial distribution with p = .8. The binomial distribution with p = .6825 does an excellent job of modeling the performance of this new oil & gas exploration technology.
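The modified test, with p estimated from the data and one degree of freedom given up for it, can be sketched as follows. The counts are those used in the modified calculation itself (28 fields with two productive wells, 22 with four). Because the code carries unrounded probabilities, its statistic (about 0.073) differs slightly from the 0.0742 obtained from rounded expected counts. The critical value is the quoted Minitab result.

```python
import math

# Counts as used in the modified calculation: fields by number of productive wells.
observed_by_hits = {0: 3, 1: 6, 2: 28, 3: 41, 4: 22}
WELLS_PER_FIELD = 4

fields = sum(observed_by_hits.values())
productive = sum(hits * count for hits, count in observed_by_hits.items())
p_hat = productive / (fields * WELLS_PER_FIELD)       # estimate of p from the sample

pmf = [math.comb(WELLS_PER_FIELD, y) * p_hat ** y * (1 - p_hat) ** (WELLS_PER_FIELD - y)
       for y in range(WELLS_PER_FIELD + 1)]

# Pool the 0 and 1 categories, then compute the statistic.
observed = [observed_by_hits[0] + observed_by_hits[1]] + [observed_by_hits[y] for y in (2, 3, 4)]
probs = [pmf[0] + pmf[1]] + pmf[2:]
expected = [fields * p for p in probs]
chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k = len(observed)          # number of categories after pooling
m = 1                      # one parameter (p) estimated from the data
df = (k - 1) - m
CRITICAL_VALUE = 9.2103    # chi-square, alpha = .01, 2 df (from the Minitab output)

print(f"p-hat = {p_hat:.4f}, chi-square = {chi2_calc:.4f}, df = {df}")
print("Reject H0" if chi2_calc > CRITICAL_VALUE else "Fail to reject H0")
```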
The Normal Probability Model
The “normal” or “Gaussian” distribution is the most commonly used of all probability models. This distribution is known perhaps most familiarly as the “bell curve”. The normal distribution serves as the assumed model of behavior for various phenomena, generally as an approximation. It is also foundational to the development of numerous commonly used statistical methods.
The Normal Distribution
• The normal distribution is described by the mathematical expression:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
X is a random variable with mean $\mu$ and standard deviation $\sigma$; exp denotes e = 2.7183, the natural base, raised to the power expressed in the ( ). As will be seen, we need not work with the formula above.
[Histogram: bell-shaped bars, vertical axis 0 to 0.2]
A histogram representation of the normal distribution might appear as this one.
The normal distribution is symmetric about its mean, $\mu$. It is also well-tabled as the “standard normal distribution” with $\mu$ = 0 and $\sigma$ = 1.
The Normal Probability Model: Table Use - Relationships
Since the normal distribution is
• a probability distribution, with total area under the curve equal to 1, and
• symmetric about its mean, µ,
we have:
P(Z > Z*) = .5 - A(Z*) where Z* > 0 and A(Z*) denotes the tabled area under the curve between 0 and Z*
A(-Z*) = A(Z*) by symmetry.
Knowing these few relationships, any needed probabilities can be found. Only positive values of Z need be tabled.
Z Table Use Examples
• Using available Z tables determine:
• A(1.33) and A(-1.33)
• The probability of being between Z = -1.33 and +1.33
• The probability that Z is at most 1.33
• The probability that Z is at least 1.33
• The probability that Z is at most -1.33
• The probability that Z is between -.75 and +1.2
• The probability that Z is between +.50 and +1.2
[Figure: standard normal curve with Z = -1.33, 0, .5, .75, 1.2 and 1.33 marked on the horizontal axis]
Z Table - Selected Portions
Z     0.00   0.01   0.02   0.03   0.04   0.05   .........   0.09
0.0   .0000  .0040  .0080  .0120  .0160  .0199  .........   .0359
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .........   .2224
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .........   .2852
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .........   .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .........   .4177
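Where a printed table is not at hand, the tabled areas can be computed from the error function in Python's standard library, using the identity that the area between 0 and z equals 0.5*erf(z/√2). This is a sketch for checking the table-use examples; the helper names `phi`, `area_zero_to` and `prob_between` are assumptions made for this illustration, and results may differ from table-based answers in the fourth decimal because the table entries are rounded.

```python
import math

def phi(z):
    """Standard normal cdf, P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_zero_to(z):
    """A(z): area between 0 and z; symmetric, so A(-z) = A(z)."""
    return abs(phi(z) - 0.5)

def prob_between(a, b):
    """P(a <= Z <= b)."""
    return phi(b) - phi(a)

print(f"A(1.33) = {area_zero_to(1.33):.4f}, A(-1.33) = {area_zero_to(-1.33):.4f}")
print(f"P(-1.33 <= Z <= 1.33) = {prob_between(-1.33, 1.33):.4f}")
print(f"P(Z <= 1.33)  = {phi(1.33):.4f}")
print(f"P(Z >= 1.33)  = {1 - phi(1.33):.4f}")
print(f"P(Z <= -1.33) = {phi(-1.33):.4f}")
print(f"P(-.75 <= Z <= 1.2) = {prob_between(-0.75, 1.2):.4f}")
print(f"P(.50 <= Z <= 1.2)  = {prob_between(0.5, 1.2):.4f}")
```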
Inverse Use of the Z Table
In application, there are two common variations requiring opposite use of tables of the standard normal distribution.
We have illustrated the first variation where, given one or more values of Z, we can determine the needed area under the curve (e.g. the needed probability).
The “inverse” situation is one in which an area under the curve is designated, and the corresponding value(s) of Z are obtained.
The inverse approach is to:
• locate the appropriate area or probability in the body of the table,
• then move to the corresponding top and left table margins to identify the appropriate value(s) of Z.
From this we have $X = \mu + Z\sigma$
[Figure: Application of the Inverse Normal. A normal curve with a known area A(Z) and the corresponding value of Z marked “?”]
The Normal Distribution in General
We can determine probabilities for any normally distributed process performance measure or PPM, X, by determining the corresponding value of Z, that is $Z = (X - \mu)/\sigma$
Inversely, given an area under the curve, we can determine a needed value of X as: $X = \mu + Z\sigma$
The SUPER Market
The SUPER Market, a major metropolitan area superstore chain, offers delivery service to addresses within a defined region.
The SUPER Market guarantees delivery within two hours of the time that the order is received. If this guarantee is not met, the customer receives a 10% discount for each 30 minutes late.
• Delivery time is approximately normally distributed with an average delivery time of 1 hour and 20 minutes and a standard deviation of 20 minutes. That is, $\mu$ = 80 min. and $\sigma$ = 20 min.
[Figure: advertisement reading “Guaranteed delivery within two hours!”]
The SUPER Market: Time to Delivery
• Inverse Problems
• Given a designated probability, what is the corresponding value of Z and, in turn, X = delivery time?
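Both directions of the normal calculation for the SUPER Market can be sketched with `statistics.NormalDist` from the Python standard library. The forward question (how often is the two-hour guarantee missed?) follows from the stated mean and standard deviation; the designated probability of .95 in the inverse question is a hypothetical value chosen for illustration and does not appear in the slides.

```python
from statistics import NormalDist

delivery = NormalDist(mu=80, sigma=20)   # delivery time in minutes (from the example)
GUARANTEE = 120                          # two-hour guarantee, in minutes

# Forward use: standardize, then find the upper-tail probability.
z_guarantee = (GUARANTEE - delivery.mean) / delivery.stdev
p_late = 1 - delivery.cdf(GUARANTEE)

# Inverse use: start from a probability, find Z, then convert back to X.
TARGET = 0.95                            # hypothetical designated probability
z_target = NormalDist().inv_cdf(TARGET)  # Z with area TARGET to its left
x_target = delivery.mean + z_target * delivery.stdev

print(f"Z for the guarantee = {z_guarantee:.2f}")
print(f"P(delivery takes more than 2 hours) = {p_late:.4f}")
print(f"{TARGET:.0%} of deliveries arrive within {x_target:.1f} minutes")
```

The inverse step is exactly X = µ + Zσ, with Z obtained from the inverse cdf rather than by searching the body of a table.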
A Goodness of Fit Test for the Normal Distribution
IS DELIVERY TIME NORMAL? To determine whether delivery times for the SUPER Market are, within reason, normally distributed we would select a random sample of delivery times and apply any of a number of goodness of fit techniques.
While the chi-square goodness of fit test could be applied, a graphical procedure, the normal probability plot, will be illustrated. This is augmented by a more formal procedure, the Anderson-Darling test.
To proceed we will select a sample of, say, 40 delivery times. These appear below.
40 Sampled Delivery Times
56 89 123 97 68 79 80 96 74 108
86 65 102 96 90 88 67 87 58 71
72 83 90 59 76 73 82 88 63 114
86 54 109 43 69 47 90 96 52 117
Variable   N    Mean    Median   Std. Dev.
Del_Time   40   81.07   82.50    19.45
Normal Probability Plot: Sampled Delivery Times from the SUPER Market
[Normal probability plot: Del_Time (horizontal axis, 40 to 120) against cumulative probability (vertical axis, .001 to .999)]
Anderson-Darling Normality Test
A-Squared: 0.166
p-value: 0.934
Average: 81.075
Std Dev: 19.448
N of data: 40
Normally distributed values should plot VERY close to a straight line. While this is a judgment call, a more objective approach is to examine the p-value from the Anderson-Darling test: if the p-value is less than $\alpha$, then normality is questionable.
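The summary statistics for the 40 delivery times, and a numerical stand-in for the “close to a straight line” judgment, can be sketched in Python. Note the swap in technique: the code does not perform the Anderson-Darling test. Instead it computes the correlation between the ordered data and approximate normal scores (using the plotting positions (i - 0.375)/(n + 0.25), one common convention assumed here), which measures how straight a normal probability plot would be.

```python
import math
from statistics import NormalDist, mean, median, stdev

times = [56, 89, 123, 97, 68, 79, 80, 96, 74, 108,
         86, 65, 102, 96, 90, 88, 67, 87, 58, 71,
         72, 83, 90, 59, 76, 73, 82, 88, 63, 114,
         86, 54, 109, 43, 69, 47, 90, 96, 52, 117]

n = len(times)
avg, med, sd = mean(times), median(times), stdev(times)   # stdev uses the n-1 divisor

# Approximate normal scores for the ordered observations.
ordered = sorted(times)
scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation coefficient, written out to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(scores, ordered)

print(f"N = {n}, mean = {avg:.3f}, median = {med:.2f}, std dev = {sd:.3f}")
print(f"probability-plot correlation = {r:.4f}")
```

A correlation near 1 corresponds to points lying close to a straight line on the plot; it supplements, but does not replace, the Anderson-Darling p-value.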