WEEK 4: CHAPTER 4, SPECIAL...

38
EEOS 601 UMASS/Online Introduction to Probability & Applied Statistics Handout 7, Week 4 6/21 Tu - 6/27 M Revised: 2/10/11 WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONS TABLE OF CONTENTS Page: List of Figures ................................................................................... 1 List of Tables .................................................................................... 2 List of m.files .................................................................................... 2 Assignment ...................................................................................... 2 Required reading .......................................................................... 2 Understanding by Design Templates .................................................................. 3 Understanding By Design Stage 1 — Desired Results Week 4 .............................................. 3 Understanding by Design Stage II — Assessment Evidence Week 4 6/21-6/27 ................................. 3 Introduction ..................................................................................... 5 Annotated outline (with Matlab scripts) for Larsen & Marx Chapter 4 ........................................ 5 References ..................................................................................... 37 Index .......................................................................................... 38 List of Figures Figure 1. Table 4.2.1, Table 4.2.2 .................................................................... 7 Figure 2. Case study 4.2.4, Table 4.2.6-7, Figure 4.2.3 ................................................... 13 Figure 3. Figure 4.2.3 from Case study 4.2.4 .......................................................... 13 Figure 4. Figure 4.3.2 ............................................................................ 16 Figure 5. Example 4.3.2 Page 302 in Larsen & Marx (2006). 40 samples of 5 uniformly distributed random numbers after conversion to Z scores. The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers. ................................................. 19 Figure 6. Example 4.3.2 Page 302 in Larsen & Marx (2006). 1000 samples of 5 uniformly distributed random numbers after conversion to Z scores. The number of bins increased from 7 to 14 for this bar chart. The normal distribution provides an excellent fit even though each of 1000 means is based on only 5 cases drawn from a uniform distribution. The uniform distribution is symmetric, allowing a rapid convergence to the normal distribution. . . 20 Figure 7. Example 4.3.2 Page 302 in Larsen & Marx (2006). 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores. The number of bins increased from 7 to 14 for this bar chart. The fit of the normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is based. ....................................................................................... 21

Transcript of WEEK 4: CHAPTER 4, SPECIAL...

Page 1: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601UMASSOnlineIntroduction to Probabilityamp Applied StatisticsHandout 7 Week 4621 Tu - 627 MRevised 21011

WEEK 4 CHAPTER 4 SPECIAL DISTRIBUTIONS

TABLE OF CONTENTS Page

List of Figures 1

List of Tables 2

List of mfiles 2

Assignment 2

Required reading 2

Understanding by Design Templates 3

Understanding By Design Stage 1 mdash Desired Results Week 4 3

Understanding by Design Stage II mdash Assessment Evidence Week 4 621-627 3

Introduction 5

Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4 5

References 37

Index 38

List of Figures

Figure 1 Table 421 Table 422 7

Figure 2 Case study 424 Table 426-7 Figure 423 13

Figure 3 Figure 423 from Case study 424 13

Figure 4 Figure 432 16

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after

conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is

based on only 5 uniformly distributed numbers 19

Figure 6 Example 432 Page 302 in Larsen amp Marx (2006) 1000 samples of 5 uniformly distributed random numbers

after conversion to Z scores The number of bins increased from 7 to 14 for this bar chart The normal distribution

provides an excellent fit even though each of 1000 means is based on only 5 cases drawn from a uniform

distribution The uniform distribution is symmetric allowing a rapid convergence to the normal distribution 20

Figure 7 Example 432 Page 302 in Larsen amp Marx (2006) 1000 samples of 500 uniformly distributed random numbers

after conversion to Z scores The number of bins increased from 7 to 14 for this bar chart The fit of the normal

distribution is excellent as would be expected from the large number of cases (500) on which each mean is based

21

EEOS 601 Applied Statistics Week 4 P 2 of 38

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random

numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is

based on a mean of only 5 cases 21

Figure 9 Example 432 Page 302 in Larsen amp Marx (2006) 1000 samples of 1000 lognormally distributed (igrave=1 oacute=1)

random numbers after conversion to Z scores I used 14 bins for this bar chart Even for the highly asymmetrically

lognormally distributed samples the normal approximation is excellent because of the large number of cases

(1000) on which each mean was based 22

Figure 10 Figure 435 from Example 436 If a driverrsquos true blood alcohol level is 0095 and the breathalyzer has an

instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be erroneously

charged with DUI The problem can be analyzed using the pdf for the unscaled data or the data can be converted

to z scores (lower panel) by subtracting the mean and dividing by oacute The probability of an erroneous charge is

1056 the area to the right of the cutpoints shown in the normal and standard normal curves 27

Figure 11 Figure 436 from Example 437 A Mensa score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16 29

Figure 12 Figure 437 from Example 438 A Mensa score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16 30

Figure 13 Figure 471 An application of Matlabrsquos exponential distribution which is defined differently from Larson amp

Marx (2006) y=0145fyy=exppdf(y1lambda) 35

Figure 14 Figure 472 An application of Matlabrsquos normal pdf Page 334 in Larsen amp Marx (2006) page 334 35

Figure 15 Figure 476 The result of 100 Monte Carlo simulations of repair costs Page 336 in Larsen amp Marx (2006)

page 334 The max median and mean repair bills are $737 $142 and $161 15 of customers had no repair

bills 36

Figure 16 Figure 476 The result of 1 million Monte Carlo simulations of repair costs Page 336 in Larsen amp Marx

(2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

36

List of Tables

List of mfiles

Assignment

REQUIRED READING

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

Read Sections 41-43 We wonrsquot cover 44-47 on the geometric negative binomial and gamma

distributions in this course There are some applications of these distributions in environmental science particularly the use of the negative binomial to model aggregated plant distributions but there is insufficient time to cover everything I provide the Matlab code for the examples and case studies

EEOS 601 Applied Statistics Week 4 P 3 of 38

Understanding by Design Templates

Understanding By Design Stage 1 mdash Desired Results Week 4 LM Chapter 4 Special Distributions

G Established Goals bull Learn how to solve applied problems with Matlabrsquos pdf cdfrsquos and inverse functions

for the Poisson exponential and normal distributions bull Learn how to model individual behavior using the standard normal (Gaussian) curve bull Describe the effects of sampling error and rounding on the expected values and

variances of computed values from observations (An MCAS 10th grade standard) U Understand bull Most statistical inference mdash frequentist amp Bayesian mdash is based on the uniform

binomial Poisson amp Normal distributions bull The Poisson and Normal distributions were developed as approximations for the

binomial distribution but are useful themselves in modeling nature bull The Poisson model is the standard model for lsquorandomrsquo distributions of animals and

plants in nature bull Increased precision of measurement is one of the benchmarks of scientific advance

and propagation of error is the ruler for the benchmark Q Essential Questions bull Why is the Poisson distribution so useful in modeling random events in nature bull Is the normal distribution the foundation of all quantitative science bull Is estimating the variance of a climate change prediction more important than

estimating its expected value K Students will know how to define (in words or equations) bull Poisson exponential Monte Carlo simulation normal and standard normal

distributions propagation of error standard error z-score and z-transform S Students will be able to bull Write Matlab programs using the pdf cdf and inverse distribution functions for the

binomial Poisson exponential and normal distributions bull Provide examples from their own fields of the Poisson exponential amp normal

distributions bull Be able to state to a layman the meaning of p values based on the distribution

functions for the binomial Poisson amp normal distributions

EEOS 601 Applied Statistics Week 4 P 4 of 38

Understanding by Design Stage II mdash Assessment Evidence Week 4 621-627 LM Chapter 41-43 (pages 274-316) Special Distributions

bull Post in the discussion section by 62911 W bull Find an application from the primary literature in your own field or your own

work using the Poisson or normal distributions Hint use Google scholar eg search under lsquoCoral abundances Poissonrsquo) Describe the application and include a citation or link if it is from the scientific literature

bull HW 4 Problems due Wednesday 62911 W 10 PM bull Each problem must be solved using Matlab

bull Submit your Matlab code as part a Word Wordperfect or rtf document summarizing the answer I must be able to run your Matlab code after a simple cut-and-paste from your document to the Matlab editor

bull You must reach a 1-2 sentence conclusion about your result bull Submit to the course Vista 8 website

bull Basic problems (4 problems 10 points) bull Problem 422 P 280 Prescription errors

bull Compare the Poisson limit approximation with the exact probability from the binomial pdf

bull Use Case Study 421 as a model bull Problem 4212 P 287 Midwestern Skies Lost Bags

bull Use Case Study 422 as a model bull Matlab hints

bull Enter the data as BagsLost bull lambda=mean(BagsLost) bull k=[0max(BagsLost)+4] bull ObservedF=hist(BagsLostk) bull bar(kObservedF)figure(gcf)

bull Problem 4320 Nevada nuclear weapons test p 306-307 bull Use Example 435 (Memphis earthquakes as a model)

bull Problem 4322 P 314 Childrenrsquos IQrsquos and school cost bull Use Example 436 (Drunk Driving p 308-309) as a paradigm

bull Advanced problems (25 points each) bull Problem 4320 Nevada nuclear weapons Graph both the Poisson pdf

and superimposed normal pdf bull Use Example the graph in Gallagher Example 435 as a model

bull Problem 4322 p 314 Childrenrsquos IQrsquos and school cost bull Plot the pdf as in Figure 436 using Matlabrsquos fill function to fill

in the areas corresponding to special needs bull Use Example 437 as a model

bull Master problems (1 only 5 points) bull What is the probability of observing 2 or more magnitude 5 earthquakes

within 1 degree latitude amp longitude of Los Angeles next year bull httpearthquakeusgsgovearthquakeseqarchivesepicepic_rectphp

bull Search database 1173 to 123110 (38 years) 3305 to 3505 lat -11925 to -11725 longitude

EEOS 601 Applied Statistics Week 4 P 5 of 38

Introduction

Chapter 4 is my favorite chapter in Larsen amp Marx (2006) It also covers the Poisson distribution which is the null model for random distributions in nature The case studies and examples are interesting including the analysis of leukemia clusters in Niles Illinois (Case Study 421) and the probability that yoursquoll be eating 5 or more insect parts with your next peanut butter and jelly sandwich (Example 424)

Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4

4 Special distributionsLambert Adolphe Jacques Quetelet (1796-1874)

41 Introduction 411 To qualify as a probability model over a sample space S a function

4111 must be nonnegative for all outcomes in S an 4112 must integrate to 1

412 Not all functions qualify as a probability model 42 The Poisson distribution

4211 binomial distribution

4212 The Poisson limit

Poisson limit theorem (Larsen amp Marx 2001 Theorem 421 p 251) If n 4 and p 0 in such a way that euml = np remains constant then for any nonnegative integer k

Example 421 Convergence of the Poisson very good for n100

EEOS 601 Applied Statistics Week 4 P 6 of 38

LMex040201_4thm Example 421 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Table 421 amp Table 422 Larsen amp Marx (2006 p 277) Written by Eugene Gallagher for EEOS601 in2001 revised 1911 EugeneGallagherumbedu Comparing Poisson pdf and binomial pdf Figure 1 Figure from Table 421 With n=5

p=15 (euml=1) the Poisson approximation is just

n=5 adequate

k = [0n]lambda=1p_pdf = poisspdf(klambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(knp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[06]fprintf(Table 421n)fprintf(f10 53f 53fn[kdisp b_pdf p_pdf])

Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) axis([-6 66 0 05]) legend(Binomial pdfPoisson pdf) xlabel(kFontsize14) ylabel(FrequencyFontsize14) title(Table 421Fontsize16) figure(gcf)pause p=1100n=100 Figure 1 Figure from Table 422 With

ndisp=[09] n=100 (euml=1) the Poisson approximation is

k = [0n] very good

lambda=1p_pdf = poisspdf(ndisplambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(ndispnp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[010]

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 7 of 38

fprintf(Table 421n)fprintf(f10 53 53fn[kdisp b_pdf p_pdf]) Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) legend(Binomial pdfPoisson pdfFontSize16) axis([-6 106 0 04]) xlabel(kFontSize14) ylabel(FrequencyFontSize14) title(Table 422FontSize16) figure(gcf)pause

Table 421

Figure 1 Table 421 Table 422

Example 422 Shadyrest hospital LMex040202_4thm Example 422 Shady Rest Hospital cardiac monitoring machines p 277-288 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition There are 12000 residents in the area served by Shady Rest Hospital and the probability that a resident will have a heart attack and need a monitor is 18000 There are 3 monitors available What is the probability that the 3 monitors will be inadequate to meet the communitys needs That is what is the probability of 4 or more heart attacks Written by EugeneGallagherumbedu Fall 2010 fprintf(Solved as the sum of binomial probabilities 86fn 1-sum(binopdf(031200018000))) fprintf(Or solved using the binomial cumulative distribution function 86fn 1-sum(binocdf(31200018000))) fprintf(Or solved using the Poisson distribution 86fn 1-sum(poisspdf(03120008000))) fprintf(Or solved using the cumulative Poisson distribution 86fn

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 2: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 2 of 38

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random

numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is

based on a mean of only 5 cases 21

Figure 9 Example 432 Page 302 in Larsen amp Marx (2006) 1000 samples of 1000 lognormally distributed (igrave=1 oacute=1)

random numbers after conversion to Z scores I used 14 bins for this bar chart Even for the highly asymmetrically

lognormally distributed samples the normal approximation is excellent because of the large number of cases

(1000) on which each mean was based 22

Figure 10 Figure 435 from Example 436 If a driverrsquos true blood alcohol level is 0095 and the breathalyzer has an

instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be erroneously

charged with DUI The problem can be analyzed using the pdf for the unscaled data or the data can be converted

to z scores (lower panel) by subtracting the mean and dividing by oacute The probability of an erroneous charge is

1056 the area to the right of the cutpoints shown in the normal and standard normal curves 27

Figure 11 Figure 436 from Example 437 A Mensa score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16 29

Figure 12 Figure 437 from Example 438 A Mensa score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16 30

Figure 13 Figure 471 An application of Matlabrsquos exponential distribution which is defined differently from Larson amp

Marx (2006) y=0145fyy=exppdf(y1lambda) 35

Figure 14 Figure 472 An application of Matlabrsquos normal pdf Page 334 in Larsen amp Marx (2006) page 334 35

Figure 15 Figure 476 The result of 100 Monte Carlo simulations of repair costs Page 336 in Larsen amp Marx (2006)

page 334 The max median and mean repair bills are $737 $142 and $161 15 of customers had no repair

bills 36

Figure 16 Figure 476 The result of 1 million Monte Carlo simulations of repair costs Page 336 in Larsen amp Marx

(2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

36

List of Tables

List of mfiles

Assignment

REQUIRED READING

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

Read Sections 41-43 We wonrsquot cover 44-47 on the geometric negative binomial and gamma

distributions in this course There are some applications of these distributions in environmental science particularly the use of the negative binomial to model aggregated plant distributions but there is insufficient time to cover everything I provide the Matlab code for the examples and case studies

EEOS 601 Applied Statistics Week 4 P 3 of 38

Understanding by Design Templates

Understanding By Design Stage 1 mdash Desired Results Week 4 LM Chapter 4 Special Distributions

G Established Goals bull Learn how to solve applied problems with Matlabrsquos pdf cdfrsquos and inverse functions

for the Poisson exponential and normal distributions bull Learn how to model individual behavior using the standard normal (Gaussian) curve bull Describe the effects of sampling error and rounding on the expected values and

variances of computed values from observations (An MCAS 10th grade standard) U Understand bull Most statistical inference mdash frequentist amp Bayesian mdash is based on the uniform

binomial Poisson amp Normal distributions bull The Poisson and Normal distributions were developed as approximations for the

binomial distribution but are useful themselves in modeling nature bull The Poisson model is the standard model for lsquorandomrsquo distributions of animals and

plants in nature bull Increased precision of measurement is one of the benchmarks of scientific advance

and propagation of error is the ruler for the benchmark Q Essential Questions bull Why is the Poisson distribution so useful in modeling random events in nature bull Is the normal distribution the foundation of all quantitative science bull Is estimating the variance of a climate change prediction more important than

estimating its expected value K Students will know how to define (in words or equations) bull Poisson exponential Monte Carlo simulation normal and standard normal

distributions propagation of error standard error z-score and z-transform S Students will be able to bull Write Matlab programs using the pdf cdf and inverse distribution functions for the

binomial Poisson exponential and normal distributions bull Provide examples from their own fields of the Poisson exponential amp normal

distributions bull Be able to state to a layman the meaning of p values based on the distribution

functions for the binomial Poisson amp normal distributions

EEOS 601 Applied Statistics Week 4 P 4 of 38

Understanding by Design Stage II mdash Assessment Evidence Week 4 621-627 LM Chapter 41-43 (pages 274-316) Special Distributions

bull Post in the discussion section by 62911 W bull Find an application from the primary literature in your own field or your own

work using the Poisson or normal distributions Hint use Google scholar eg search under lsquoCoral abundances Poissonrsquo) Describe the application and include a citation or link if it is from the scientific literature

bull HW 4 Problems due Wednesday 62911 W 10 PM bull Each problem must be solved using Matlab

bull Submit your Matlab code as part a Word Wordperfect or rtf document summarizing the answer I must be able to run your Matlab code after a simple cut-and-paste from your document to the Matlab editor

bull You must reach a 1-2 sentence conclusion about your result bull Submit to the course Vista 8 website

bull Basic problems (4 problems 10 points) bull Problem 422 P 280 Prescription errors

bull Compare the Poisson limit approximation with the exact probability from the binomial pdf

bull Use Case Study 421 as a model bull Problem 4212 P 287 Midwestern Skies Lost Bags

bull Use Case Study 422 as a model bull Matlab hints

bull Enter the data as BagsLost bull lambda=mean(BagsLost) bull k=[0max(BagsLost)+4] bull ObservedF=hist(BagsLostk) bull bar(kObservedF)figure(gcf)

bull Problem 4320 Nevada nuclear weapons test p 306-307 bull Use Example 435 (Memphis earthquakes as a model)

bull Problem 4322 P 314 Childrenrsquos IQrsquos and school cost bull Use Example 436 (Drunk Driving p 308-309) as a paradigm

bull Advanced problems (25 points each) bull Problem 4320 Nevada nuclear weapons Graph both the Poisson pdf

and superimposed normal pdf bull Use Example the graph in Gallagher Example 435 as a model

bull Problem 4322 p 314 Childrenrsquos IQrsquos and school cost bull Plot the pdf as in Figure 436 using Matlabrsquos fill function to fill

in the areas corresponding to special needs bull Use Example 437 as a model

bull Master problems (1 only 5 points) bull What is the probability of observing 2 or more magnitude 5 earthquakes

within 1 degree latitude amp longitude of Los Angeles next year bull httpearthquakeusgsgovearthquakeseqarchivesepicepic_rectphp

bull Search database 1173 to 123110 (38 years) 3305 to 3505 lat -11925 to -11725 longitude

EEOS 601 Applied Statistics Week 4 P 5 of 38

Introduction

Chapter 4 is my favorite chapter in Larsen amp Marx (2006) It also covers the Poisson distribution which is the null model for random distributions in nature The case studies and examples are interesting including the analysis of leukemia clusters in Niles Illinois (Case Study 421) and the probability that yoursquoll be eating 5 or more insect parts with your next peanut butter and jelly sandwich (Example 424)

Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4

4 Special distributionsLambert Adolphe Jacques Quetelet (1796-1874)

41 Introduction 411 To qualify as a probability model over a sample space S a function

4111 must be nonnegative for all outcomes in S an 4112 must integrate to 1

412 Not all functions qualify as a probability model 42 The Poisson distribution

4211 binomial distribution

4212 The Poisson limit

Poisson limit theorem (Larsen amp Marx 2001 Theorem 421 p 251) If n 4 and p 0 in such a way that euml = np remains constant then for any nonnegative integer k

Example 421 Convergence of the Poisson very good for n100

EEOS 601 Applied Statistics Week 4 P 6 of 38

LMex040201_4thm Example 421 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Table 421 amp Table 422 Larsen amp Marx (2006 p 277) Written by Eugene Gallagher for EEOS601 in2001 revised 1911 EugeneGallagherumbedu Comparing Poisson pdf and binomial pdf Figure 1 Figure from Table 421 With n=5

p=15 (euml=1) the Poisson approximation is just

n=5 adequate

k = [0n]lambda=1p_pdf = poisspdf(klambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(knp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[06]fprintf(Table 421n)fprintf(f10 53f 53fn[kdisp b_pdf p_pdf])

Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) axis([-6 66 0 05]) legend(Binomial pdfPoisson pdf) xlabel(kFontsize14) ylabel(FrequencyFontsize14) title(Table 421Fontsize16) figure(gcf)pause p=1100n=100 Figure 1 Figure from Table 422 With

ndisp=[09] n=100 (euml=1) the Poisson approximation is

k = [0n] very good

lambda=1p_pdf = poisspdf(ndisplambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(ndispnp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[010]

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 7 of 38

fprintf(Table 421n)fprintf(f10 53 53fn[kdisp b_pdf p_pdf]) Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) legend(Binomial pdfPoisson pdfFontSize16) axis([-6 106 0 04]) xlabel(kFontSize14) ylabel(FrequencyFontSize14) title(Table 422FontSize16) figure(gcf)pause

Table 421

Figure 1 Table 421 Table 422

Example 422 Shadyrest hospital LMex040202_4thm Example 422 Shady Rest Hospital cardiac monitoring machines p 277-288 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition There are 12000 residents in the area served by Shady Rest Hospital and the probability that a resident will have a heart attack and need a monitor is 18000 There are 3 monitors available What is the probability that the 3 monitors will be inadequate to meet the communitys needs That is what is the probability of 4 or more heart attacks Written by EugeneGallagherumbedu Fall 2010 fprintf(Solved as the sum of binomial probabilities 86fn 1-sum(binopdf(031200018000))) fprintf(Or solved using the binomial cumulative distribution function 86fn 1-sum(binocdf(31200018000))) fprintf(Or solved using the Poisson distribution 86fn 1-sum(poisspdf(03120008000))) fprintf(Or solved using the cumulative Poisson distribution 86fn

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 3: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 3 of 38

Understanding by Design Templates

Understanding By Design Stage 1 mdash Desired Results Week 4 LM Chapter 4 Special Distributions

G Established Goals bull Learn how to solve applied problems with Matlabrsquos pdf cdfrsquos and inverse functions

for the Poisson exponential and normal distributions bull Learn how to model individual behavior using the standard normal (Gaussian) curve bull Describe the effects of sampling error and rounding on the expected values and

variances of computed values from observations (An MCAS 10th grade standard) U Understand bull Most statistical inference mdash frequentist amp Bayesian mdash is based on the uniform

binomial Poisson amp Normal distributions bull The Poisson and Normal distributions were developed as approximations for the

binomial distribution but are useful themselves in modeling nature bull The Poisson model is the standard model for lsquorandomrsquo distributions of animals and

plants in nature bull Increased precision of measurement is one of the benchmarks of scientific advance

and propagation of error is the ruler for the benchmark Q Essential Questions bull Why is the Poisson distribution so useful in modeling random events in nature bull Is the normal distribution the foundation of all quantitative science bull Is estimating the variance of a climate change prediction more important than

estimating its expected value K Students will know how to define (in words or equations) bull Poisson exponential Monte Carlo simulation normal and standard normal

distributions propagation of error standard error z-score and z-transform S Students will be able to bull Write Matlab programs using the pdf cdf and inverse distribution functions for the

binomial Poisson exponential and normal distributions bull Provide examples from their own fields of the Poisson exponential amp normal

distributions bull Be able to state to a layman the meaning of p values based on the distribution

functions for the binomial Poisson amp normal distributions

EEOS 601 Applied Statistics Week 4 P 4 of 38

Understanding by Design Stage II mdash Assessment Evidence Week 4 621-627 LM Chapter 41-43 (pages 274-316) Special Distributions

bull Post in the discussion section by 62911 W bull Find an application from the primary literature in your own field or your own

work using the Poisson or normal distributions Hint use Google scholar eg search under lsquoCoral abundances Poissonrsquo) Describe the application and include a citation or link if it is from the scientific literature

bull HW 4 Problems due Wednesday 62911 W 10 PM bull Each problem must be solved using Matlab

bull Submit your Matlab code as part a Word Wordperfect or rtf document summarizing the answer I must be able to run your Matlab code after a simple cut-and-paste from your document to the Matlab editor

bull You must reach a 1-2 sentence conclusion about your result bull Submit to the course Vista 8 website

bull Basic problems (4 problems 10 points) bull Problem 422 P 280 Prescription errors

bull Compare the Poisson limit approximation with the exact probability from the binomial pdf

bull Use Case Study 421 as a model bull Problem 4212 P 287 Midwestern Skies Lost Bags

bull Use Case Study 422 as a model bull Matlab hints

bull Enter the data as BagsLost bull lambda=mean(BagsLost) bull k=[0max(BagsLost)+4] bull ObservedF=hist(BagsLostk) bull bar(kObservedF)figure(gcf)

bull Problem 4320 Nevada nuclear weapons test p 306-307 bull Use Example 435 (Memphis earthquakes as a model)

bull Problem 4322 P 314 Childrenrsquos IQrsquos and school cost bull Use Example 436 (Drunk Driving p 308-309) as a paradigm

bull Advanced problems (25 points each) bull Problem 4320 Nevada nuclear weapons Graph both the Poisson pdf

and superimposed normal pdf bull Use Example the graph in Gallagher Example 435 as a model

bull Problem 4322 p 314 Childrenrsquos IQrsquos and school cost bull Plot the pdf as in Figure 436 using Matlabrsquos fill function to fill

in the areas corresponding to special needs bull Use Example 437 as a model

bull Master problems (1 only 5 points) bull What is the probability of observing 2 or more magnitude 5 earthquakes

within 1 degree latitude amp longitude of Los Angeles next year bull httpearthquakeusgsgovearthquakeseqarchivesepicepic_rectphp

bull Search database 1173 to 123110 (38 years) 3305 to 3505 lat -11925 to -11725 longitude

EEOS 601 Applied Statistics Week 4 P 5 of 38

Introduction

Chapter 4 is my favorite chapter in Larsen amp Marx (2006) It also covers the Poisson distribution which is the null model for random distributions in nature The case studies and examples are interesting including the analysis of leukemia clusters in Niles Illinois (Case Study 421) and the probability that yoursquoll be eating 5 or more insect parts with your next peanut butter and jelly sandwich (Example 424)

Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4

4 Special distributionsLambert Adolphe Jacques Quetelet (1796-1874)

41 Introduction 411 To qualify as a probability model over a sample space S a function

4111 must be nonnegative for all outcomes in S an 4112 must integrate to 1

412 Not all functions qualify as a probability model 42 The Poisson distribution

4211 binomial distribution

4212 The Poisson limit

Poisson limit theorem (Larsen amp Marx 2001 Theorem 421 p 251) If n 4 and p 0 in such a way that euml = np remains constant then for any nonnegative integer k

Example 421 Convergence of the Poisson very good for n100

EEOS 601 Applied Statistics Week 4 P 6 of 38

LMex040201_4thm Example 421 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Table 421 amp Table 422 Larsen amp Marx (2006 p 277) Written by Eugene Gallagher for EEOS601 in2001 revised 1911 EugeneGallagherumbedu Comparing Poisson pdf and binomial pdf Figure 1 Figure from Table 421 With n=5

p=15 (euml=1) the Poisson approximation is just

n=5 adequate

k = [0n]lambda=1p_pdf = poisspdf(klambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(knp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[06]fprintf(Table 421n)fprintf(f10 53f 53fn[kdisp b_pdf p_pdf])

Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) axis([-6 66 0 05]) legend(Binomial pdfPoisson pdf) xlabel(kFontsize14) ylabel(FrequencyFontsize14) title(Table 421Fontsize16) figure(gcf)pause p=1100n=100 Figure 1 Figure from Table 422 With

ndisp=[09] n=100 (euml=1) the Poisson approximation is

k = [0n] very good

lambda=1p_pdf = poisspdf(ndisplambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(ndispnp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[010]

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 7 of 38

fprintf(Table 421n)fprintf(f10 53 53fn[kdisp b_pdf p_pdf]) Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) legend(Binomial pdfPoisson pdfFontSize16) axis([-6 106 0 04]) xlabel(kFontSize14) ylabel(FrequencyFontSize14) title(Table 422FontSize16) figure(gcf)pause

Table 421

Figure 1 Table 421 Table 422

Example 422 Shadyrest hospital LMex040202_4thm Example 422 Shady Rest Hospital cardiac monitoring machines p 277-288 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition There are 12000 residents in the area served by Shady Rest Hospital and the probability that a resident will have a heart attack and need a monitor is 18000 There are 3 monitors available What is the probability that the 3 monitors will be inadequate to meet the communitys needs That is what is the probability of 4 or more heart attacks Written by EugeneGallagherumbedu Fall 2010 fprintf(Solved as the sum of binomial probabilities 86fn 1-sum(binopdf(031200018000))) fprintf(Or solved using the binomial cumulative distribution function 86fn 1-sum(binocdf(31200018000))) fprintf(Or solved using the Poisson distribution 86fn 1-sum(poisspdf(03120008000))) fprintf(Or solved using the cumulative Poisson distribution 86fn

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 4: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 4 of 38

Understanding by Design Stage II mdash Assessment Evidence Week 4 621-627 LM Chapter 41-43 (pages 274-316) Special Distributions

bull Post in the discussion section by 62911 W bull Find an application from the primary literature in your own field or your own

work using the Poisson or normal distributions Hint use Google scholar eg search under lsquoCoral abundances Poissonrsquo) Describe the application and include a citation or link if it is from the scientific literature

bull HW 4 Problems due Wednesday 62911 W 10 PM bull Each problem must be solved using Matlab

bull Submit your Matlab code as part a Word Wordperfect or rtf document summarizing the answer I must be able to run your Matlab code after a simple cut-and-paste from your document to the Matlab editor

bull You must reach a 1-2 sentence conclusion about your result bull Submit to the course Vista 8 website

bull Basic problems (4 problems 10 points) bull Problem 422 P 280 Prescription errors

bull Compare the Poisson limit approximation with the exact probability from the binomial pdf

bull Use Case Study 421 as a model bull Problem 4212 P 287 Midwestern Skies Lost Bags

bull Use Case Study 422 as a model bull Matlab hints

bull Enter the data as BagsLost bull lambda=mean(BagsLost) bull k=[0max(BagsLost)+4] bull ObservedF=hist(BagsLostk) bull bar(kObservedF)figure(gcf)

bull Problem 4320 Nevada nuclear weapons test p 306-307 bull Use Example 435 (Memphis earthquakes as a model)

bull Problem 4322 P 314 Childrenrsquos IQrsquos and school cost bull Use Example 436 (Drunk Driving p 308-309) as a paradigm

bull Advanced problems (25 points each) bull Problem 4320 Nevada nuclear weapons Graph both the Poisson pdf

and superimposed normal pdf bull Use Example the graph in Gallagher Example 435 as a model

bull Problem 4322 p 314 Childrenrsquos IQrsquos and school cost bull Plot the pdf as in Figure 436 using Matlabrsquos fill function to fill

in the areas corresponding to special needs bull Use Example 437 as a model

bull Master problems (1 only 5 points) bull What is the probability of observing 2 or more magnitude 5 earthquakes

within 1 degree latitude amp longitude of Los Angeles next year bull httpearthquakeusgsgovearthquakeseqarchivesepicepic_rectphp

bull Search database 1173 to 123110 (38 years) 3305 to 3505 lat -11925 to -11725 longitude

EEOS 601 Applied Statistics Week 4 P 5 of 38

Introduction

Chapter 4 is my favorite chapter in Larsen amp Marx (2006) It also covers the Poisson distribution which is the null model for random distributions in nature The case studies and examples are interesting including the analysis of leukemia clusters in Niles Illinois (Case Study 421) and the probability that yoursquoll be eating 5 or more insect parts with your next peanut butter and jelly sandwich (Example 424)

Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4

4 Special distributionsLambert Adolphe Jacques Quetelet (1796-1874)

41 Introduction 411 To qualify as a probability model over a sample space S a function

4111 must be nonnegative for all outcomes in S an 4112 must integrate to 1

412 Not all functions qualify as a probability model 42 The Poisson distribution

4211 binomial distribution

4212 The Poisson limit

Poisson limit theorem (Larsen amp Marx 2001 Theorem 421 p 251) If n 4 and p 0 in such a way that euml = np remains constant then for any nonnegative integer k

Example 421 Convergence of the Poisson very good for n100

EEOS 601 Applied Statistics Week 4 P 6 of 38

LMex040201_4thm Example 421 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Table 421 amp Table 422 Larsen amp Marx (2006 p 277) Written by Eugene Gallagher for EEOS601 in2001 revised 1911 EugeneGallagherumbedu Comparing Poisson pdf and binomial pdf Figure 1 Figure from Table 421 With n=5

p=15 (euml=1) the Poisson approximation is just

n=5 adequate

k = [0n]lambda=1p_pdf = poisspdf(klambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(knp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[06]fprintf(Table 421n)fprintf(f10 53f 53fn[kdisp b_pdf p_pdf])

Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) axis([-6 66 0 05]) legend(Binomial pdfPoisson pdf) xlabel(kFontsize14) ylabel(FrequencyFontsize14) title(Table 421Fontsize16) figure(gcf)pause p=1100n=100 Figure 1 Figure from Table 422 With

ndisp=[09] n=100 (euml=1) the Poisson approximation is

k = [0n] very good

lambda=1p_pdf = poisspdf(ndisplambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(ndispnp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[010]

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 7 of 38

fprintf(Table 421n)fprintf(f10 53 53fn[kdisp b_pdf p_pdf]) Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) legend(Binomial pdfPoisson pdfFontSize16) axis([-6 106 0 04]) xlabel(kFontSize14) ylabel(FrequencyFontSize14) title(Table 422FontSize16) figure(gcf)pause

Table 421

Figure 1 Table 421 Table 422

Example 422 Shadyrest hospital LMex040202_4thm Example 422 Shady Rest Hospital cardiac monitoring machines p 277-288 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition There are 12000 residents in the area served by Shady Rest Hospital and the probability that a resident will have a heart attack and need a monitor is 18000 There are 3 monitors available What is the probability that the 3 monitors will be inadequate to meet the communitys needs That is what is the probability of 4 or more heart attacks Written by EugeneGallagherumbedu Fall 2010 fprintf(Solved as the sum of binomial probabilities 86fn 1-sum(binopdf(031200018000))) fprintf(Or solved using the binomial cumulative distribution function 86fn 1-sum(binocdf(31200018000))) fprintf(Or solved using the Poisson distribution 86fn 1-sum(poisspdf(03120008000))) fprintf(Or solved using the cumulative Poisson distribution 86fn

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 5: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 5 of 38

Introduction

Chapter 4 is my favorite chapter in Larsen amp Marx (2006) It also covers the Poisson distribution which is the null model for random distributions in nature The case studies and examples are interesting including the analysis of leukemia clusters in Niles Illinois (Case Study 421) and the probability that yoursquoll be eating 5 or more insect parts with your next peanut butter and jelly sandwich (Example 424)

Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4

4 Special distributionsLambert Adolphe Jacques Quetelet (1796-1874)

41 Introduction 411 To qualify as a probability model over a sample space S a function

4111 must be nonnegative for all outcomes in S an 4112 must integrate to 1

412 Not all functions qualify as a probability model 42 The Poisson distribution

4211 binomial distribution

4212 The Poisson limit

Poisson limit theorem (Larsen amp Marx 2001 Theorem 421 p 251) If n 4 and p 0 in such a way that euml = np remains constant then for any nonnegative integer k

Example 421 Convergence of the Poisson very good for n100

EEOS 601 Applied Statistics Week 4 P 6 of 38

LMex040201_4thm Example 421 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Table 421 amp Table 422 Larsen amp Marx (2006 p 277) Written by Eugene Gallagher for EEOS601 in2001 revised 1911 EugeneGallagherumbedu Comparing Poisson pdf and binomial pdf Figure 1 Figure from Table 421 With n=5

p=15 (euml=1) the Poisson approximation is just

n=5 adequate

k = [0n]lambda=1p_pdf = poisspdf(klambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(knp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[06]fprintf(Table 421n)fprintf(f10 53f 53fn[kdisp b_pdf p_pdf])

Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) axis([-6 66 0 05]) legend(Binomial pdfPoisson pdf) xlabel(kFontsize14) ylabel(FrequencyFontsize14) title(Table 421Fontsize16) figure(gcf)pause p=1100n=100 Figure 1 Figure from Table 422 With

ndisp=[09] n=100 (euml=1) the Poisson approximation is

k = [0n] very good

lambda=1p_pdf = poisspdf(ndisplambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(ndispnp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[010]

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 7 of 38

fprintf(Table 421n)fprintf(f10 53 53fn[kdisp b_pdf p_pdf]) Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) legend(Binomial pdfPoisson pdfFontSize16) axis([-6 106 0 04]) xlabel(kFontSize14) ylabel(FrequencyFontSize14) title(Table 422FontSize16) figure(gcf)pause

Table 421

Figure 1 Table 421 Table 422

Example 422 Shadyrest hospital LMex040202_4thm Example 422 Shady Rest Hospital cardiac monitoring machines p 277-288 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition There are 12000 residents in the area served by Shady Rest Hospital and the probability that a resident will have a heart attack and need a monitor is 18000 There are 3 monitors available What is the probability that the 3 monitors will be inadequate to meet the communitys needs That is what is the probability of 4 or more heart attacks Written by EugeneGallagherumbedu Fall 2010 fprintf(Solved as the sum of binomial probabilities 86fn 1-sum(binopdf(031200018000))) fprintf(Or solved using the binomial cumulative distribution function 86fn 1-sum(binocdf(31200018000))) fprintf(Or solved using the Poisson distribution 86fn 1-sum(poisspdf(03120008000))) fprintf(Or solved using the cumulative Poisson distribution 86fn

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 6: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 6 of 38

LMex040201_4thm Example 421 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Table 421 amp Table 422 Larsen amp Marx (2006 p 277) Written by Eugene Gallagher for EEOS601 in2001 revised 1911 EugeneGallagherumbedu Comparing Poisson pdf and binomial pdf Figure 1 Figure from Table 421 With n=5

p=15 (euml=1) the Poisson approximation is just

n=5 adequate

k = [0n]lambda=1p_pdf = poisspdf(klambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(knp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[06]fprintf(Table 421n)fprintf(f10 53f 53fn[kdisp b_pdf p_pdf])

Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) axis([-6 66 0 05]) legend(Binomial pdfPoisson pdf) xlabel(kFontsize14) ylabel(FrequencyFontsize14) title(Table 421Fontsize16) figure(gcf)pause p=1100n=100 Figure 1 Figure from Table 422 With

ndisp=[09] n=100 (euml=1) the Poisson approximation is

k = [0n] very good

lambda=1p_pdf = poisspdf(ndisplambda)p_pdf=[p_pdf1-sum(p_pdf)]b_pdf= binopdf(ndispnp)b_pdf=[b_pdf1-sum(b_pdf)]kdisp=[010]

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 7 of 38

fprintf(Table 421n)fprintf(f10 53 53fn[kdisp b_pdf p_pdf]) Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) legend(Binomial pdfPoisson pdfFontSize16) axis([-6 106 0 04]) xlabel(kFontSize14) ylabel(FrequencyFontSize14) title(Table 422FontSize16) figure(gcf)pause

Table 421

Figure 1 Table 421 Table 422

Example 422 Shadyrest hospital LMex040202_4thm Example 422 Shady Rest Hospital cardiac monitoring machines p 277-288 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition There are 12000 residents in the area served by Shady Rest Hospital and the probability that a resident will have a heart attack and need a monitor is 18000 There are 3 monitors available What is the probability that the 3 monitors will be inadequate to meet the communitys needs That is what is the probability of 4 or more heart attacks Written by EugeneGallagherumbedu Fall 2010 fprintf(Solved as the sum of binomial probabilities 86fn 1-sum(binopdf(031200018000))) fprintf(Or solved using the binomial cumulative distribution function 86fn 1-sum(binocdf(31200018000))) fprintf(Or solved using the Poisson distribution 86fn 1-sum(poisspdf(03120008000))) fprintf(Or solved using the cumulative Poisson distribution 86fn

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 7: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 7 of 38

fprintf(Table 421n)fprintf(f10 53 53fn[kdisp b_pdf p_pdf]) Plot as a grouped histogram bar(kdisp[b_pdf p_pdf]grouped) legend(Binomial pdfPoisson pdfFontSize16) axis([-6 106 0 04]) xlabel(kFontSize14) ylabel(FrequencyFontSize14) title(Table 422FontSize16) figure(gcf)pause

Table 421

Figure 1 Table 421 Table 422

Example 422 Shadyrest hospital LMex040202_4thm Example 422 Shady Rest Hospital cardiac monitoring machines p 277-288 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition There are 12000 residents in the area served by Shady Rest Hospital and the probability that a resident will have a heart attack and need a monitor is 18000 There are 3 monitors available What is the probability that the 3 monitors will be inadequate to meet the communitys needs That is what is the probability of 4 or more heart attacks Written by EugeneGallagherumbedu Fall 2010 fprintf(Solved as the sum of binomial probabilities 86fn 1-sum(binopdf(031200018000))) fprintf(Or solved using the binomial cumulative distribution function 86fn 1-sum(binocdf(31200018000))) fprintf(Or solved using the Poisson distribution 86fn 1-sum(poisspdf(03120008000))) fprintf(Or solved using the cumulative Poisson distribution 86fn

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 8: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 8 of 38

1-sum(poisscdf(3120008000)))

Case Study 421 Leukemia in Niles Illinois LMcs040201_4thm Case Study 421 Leukemia cancer cluster Pp 278-279 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene D Gallagher for UMASSBostons EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1012010 Revised 1711 p=incidence of leukemia in 5 13 years in towns around Niles IL What is the probability of observing 8 or more childhood leukemia cases by chance alone Solve using the binomial pdf and the Poisson limit N=number of people in Niles ILN=7076p=2481e5X=87076X=Xbp=binopdf(XNp)BP=sum(bp) or it could be solved using the complementX2=07BP2=1-sum(binopdf(X2Np))fprintf(nFrom the binomial pdf the exact probability of observing n8 or more ) fprintf(deaths among 7076 children in 5 13 years is 63gnBP)fprintf(From the complement of the binomial pdf the exact p of observing n8 or)fprintf( more deaths among 7076 children in 5 13 years is 63gnBP2) Now solve using the Poisson pdf lambda is the expected number of cases among 7076 in Niles Illionoislambda=7076248e-5PP= 1-sum(poisspdf(X2lambda))fprintf(nFrom the Poisson pdf the approximate probability of observing n8 or ) fprintf(more deaths among 7076 children in 5 13 years with a Poisson n)fprintf(expected value of 63g is 63gnnlambdaPP)

Or one could use the Poisson cumulative distribution functionPCP = 1-poisscdf(7lambda)fprintf(From the cumulative Poisson pdf the approximate probability of n) fprintf(observing 8 or more deaths among 7076 children in 5 13 years )fprintf(with na Poisson expected value of 63g is 63gnn lambdaPCP) RE=abs(BP-PP)BP 100 fprintf( The relative error in the Poisson approximation is 41fnRE)

Questions p 279

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 9: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 9 of 38

4213 The Poisson Distribution 42131 First described by Poisson as a limit theorem and

then used in 1898 by Professor Ladislaus von Bortkiewicz to model the number the Prussian cavalry officers kicked to death by their horses

Larsen amp Marx 2001 Theorem 422 p 251 The random variable X is said to have a Poisson distribution if

4214 Fitting the Poisson Distribution to Data

Case Study 422 LMcs040202_4thm Case Study 422 alpha decay amp Geiger counter Page 282-284 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1142010 Revised 1711 Fitting the Poisson pdf to radioactive decay alpha-particlek=[011]Frequency=[57 203 383 525 532 408 273 139 45 27 10 6]sumFrequency=sum(Frequency)lambda=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnlambda)ObservedP=FrequencysumFrequencyExpectedP=poisspdf(klambda)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 423n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData)

Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(571)ones(2031)2ones(3831)3ones(5251)4ones(5321) 5ones(4081)6ones(2731)7ones(1391)8ones(451) 9ones(271)10ones(101)11ones(61)] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 10: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 10 of 38

Case Study 423 LMcs040203_4thm Fitting football fumbles to a Poisson pdf From Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu Revised 1142010 See LMcs040202_4thm (alpha decay) for a very similar programk=[07]Frequency=[8 24 27 20 17 10 3 1]sumFrequency=sum(Frequency)meank=sum(kFrequency)sumFrequencyfprintf(The Poisson parameter is 53fnmeank)ObservedP=FrequencysumFrequencyExpectedP=Poisspdf(kmeank)TableData=[k Frequency ObservedP ExpectedP]fprintf(ntttttTable 425n)fprintf(No Detected k tFrequency tProportion tp_x(k)n)fprintf( 20ftttt20fttt53ftt53fnTableData) Using concepts from Chapter 5 a MLE fit could be made of the data The only new feature added is a 95 CI for the binomial parameter See LMcs050201_4thm for another example of using Poisspdf to fit the Poisson distribution to dataDATA=[zeros(81)ones(241)2ones(271)3ones(201)4ones(171) 5ones(101)6ones(31)7] [LAMBDAHAT LAMBDACI] = poissfit(DATA005) fprintf(

nThe Poisson parameter =53f with 95 CI [53f 53f]nn LAMBDAHAT LAMBDACI)

4215 The Poisson Model the law of small numbers 421511 Assumptions of the Poisson model

4215111 The probability that 2 or more events occur in any given ltsmallgt subinterval is essentially 0

4215112 Events are independent 4215113 The probability that an event occurs

during a given subinterval is constant over the entire interval from 0 to T

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 11: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 11 of 38

42152 Calculating Poisson Probabilities 421521 ldquoCalculating Poisson probabilities is an

exercise in choosing T so that eumlT represents the expected number of occurrences in whatever ldquounitrdquo is associated with the random variable X

Example 423 LMex040203_4thm Example 423 Typographical Errors Page 285 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 172011 Revised 1711 Using the Poisson pdf ErrorRate=typographical Errors per page What is the probability that fewer than three typos will appear in a 16shy page edition X=Number of errorsErrorRate=04lambda=ErrorRate16fprintf(Using the Poisson cumulative distribution functionn)P=poisscdf(2lambda)fprintf(The probability that fewer than 3 typos will appear in a n)fprintf(16-p document with an error rate of 31f per page is 53fnlambdaP)fprintf(nUsing the Poisson probability distribution functionn)X=02P=sum(poisspdf(Xlambda))fprintf(If the Poisson parameter = 31f the probability of nlambda)fprintf(fewer than 3 typos = 53fnP)

Example 424 Insect parts in peanut butter

LMex040204_4thm Example 424 Eating bug parts Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu 121210 revised 11211 httpalphaesumbedufacultyedgfilesedgwebphtml This is a simple application of the Poisson cumulative distribution function and Poisson pdf The FDA Food Detection Action Limit for peanut butter is 30 insect fragments per 100 grams If your peanut butter snack weighs 20 grams what is the probability that youll ingest 5 or more insect parts

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 12: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 12 of 38

Calculate lambda the expected number of insect parts per 20 gramslambda=3010020 solve using the Poisson cumulative distribution functionP=1-poisscdf(4lambda)fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP) The problem could also be solved exactly using the Poisson pdfk=04P2=1-sum(poisspdf(klambda))fprintf(The probability that a 20-g snack contains 5 or more insect parts is 53fnP2)

SSum of Poisson processes is a Poisson process

Questions Page 287 4210 Prussian horse kicks 4212 Midwestern Skies Airlines bags lost

Sum of two Poisson variables is Poisson

422 Interval between events the PoissonExponential

Relationship

Theorem 423 P 290

Suppose a series of events satisfying the Poisson model are occurring at a rate of euml

per unit time Let the random variable Y denote the inteval between consecutive

events Then Y has the exponential distribution

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 13: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 13 of 38

Figure 2 Case study 424 Table 426-7 Figure 423

Case Study 424 LMcs040204_4thm Case Study 424 Mauna Loa eruptions amp exponential distribution P 290-291 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th Ed Written by Eugene Gallagher for EEOS601 Figure 3 Figure 423 from Case study 424

IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 14: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 14 of 38

EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Written 1711 Revised 1711 Fitting the exponential pdf to intervals between eruptons Data = interval between 37 consecutive eruptions of Mauna Loa DATA=[126 73 3 6 37 23 73 23 2 65 94 51 26 21 6 68 16 20 6 18 6 41 40 18 41 11 12 38 77 61 26 3 38 50 91 12] DATA=DATA() lambda=0027 fprintf(Eruptions occuring at 53f per month or once every 32f lambda1(12lambda)) fprintf(yearsn) histc is the Matlab counting function Plot the data using LMFig030402_4thm as a model ObservedI=histc(DATA020140) bar(020140ObservedI(sum(ObservedI)20)histc) set(get(gcaChildren)FaceColor[8 8 1]) axis([0 145 0 0033]) figure(gcf) xlabel(Interval between eruptions (in months)) ylabel(Density) ax1=gca get the handle for the bar charts axes pause y=0140 fy=lambdaexp(-lambday) h2=line(yfyColorrLineStyle--Linewidth2) set(ax1YTick[0001003]) text(340015f_y(y)=0027e^-0027y) title(Figure 423) figure(gcf) pause

Example 425 LMex040205_4thm Example 425 page 292 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagherumbedu for EEOS601 If the Perseid meteor shower has a rate of 40 per hour Poisson distributed calculate the probability that someone would have to wait 5 minutes before seeing another y= interval in minutes between consecutive sightings Using the symbolic math toolbox

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 15: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 15 of 38

fprintf(Exact P=n)syms y disp(int(4060exp(-4060y)5inf))P=eval(int(4060exp(-4060y)5inf))fprintf(If the Perseids have a Poisson parameter = 4060 per min n)fprintf(an observer would have p=64f of waiting 5 or more nP)fprintf(minutes between consecutive meteorsn)

43 THE NORMAL DISTRIBUTION

Figure 431 LMFig040301_4thm Page 293 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Generate 20 binomial random numbers p(success)=05 Written in 2001 by EugeneGallagherumbedu revised 1711 Use the statistical toolboxs binorndm X=020n=20p=05

Figure 431 Page 293 in Larsen amp Marx (2006)

Y = binopdf(Xnp)bar(XY1)axis([0 21 0 02])title(Figure 431FontSize14)set(get(gcaChildren)FaceColor[8 8 1])ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0220FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=npsigma_j=sqrt(np(1-p)) sigmaj is the standard deviationy_j=00220fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel()ylabel(Probability)title(Figure 431)figure(gcf)pause

IT
Stamp
IT
Stamp
IT
Stamp

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 16: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 16 of 38

431 Theorem 431 Let X be a binomial random variable defined on n independent trials for which p=P(success) For any numbers a and b

Comment The function is referred to as the standard normal (or

Gaussian) curve [or distribution] By convention any random variable whose probabilistic behavior is described by a standard normal curve is denoted by Z (rather than by X Y or W) E(Z)=0 and Var(Z)=1 Page 294 in Larsen amp Marx (2006)

432 Finding areas under the standard normal curve

Figure 432 LMFig040302_4thm Larsen amp Marx Figure 432 plotpage 295 Use LMex040308_4thm as amodel Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Page 267 Standard normal distributionusing ezplot Integration using the symbolicmath toolbox Figure 4 Figure 432 Written byEugeneGallagherumbedu October 1 2010clear allhold off clfsyms z Just for fun use symbolic math integration to find the integralINTP=eval(int(1sqrt(2pi)exp(-(z^22))-inf114)) or use the standard normal cumulative distribution functionintp=normcdf(114)

Plot Figure 432mu=0sigma=1X=-350135

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 17: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 17 of 38

Y = normpdf(Xmusigma) plot(XY--r) axis([-355 355 0 041]) title(Figure 432FontSize16) ax1=gca xlabel( ) ylabel(f_z(z)FontSize14) ax1=gca set(ax1XTick[-3 -2 -1 0 114 2 3] FontSize12

XTickLabel-3-2-1011423FontSize12) set(ax1ytick0104FontSize12) hold on xf=-3501114yf=normpdf(xfmusigma) fill([-35 xf 114][0 yf 0][8 8 1]) hold off

433 The continuity correction

Figure 5 Figure 433 the continuity correction

Example 431 LMex040301_4thm Example 431 Overbooking airline seats Application of the normal distribution and the continuity correction Page 297 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by EugeneGallagher httpwwwesumbeduedgwebphtm Written Fall 2010 Revised 1711 178 seats sold for 168 economy class seats available p=09 passenger will show up What is the probability that not everyone who shows up at the gate on time can be accomodated n=178p=09n2=169

IT
Stamp

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 18: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 18 of 38

P=normcdf((n+05-np)sqrt(np(1-p)))- normcdf((n2-05-np)sqrt(np(1-p))) fprintf( nP not all can be accomodated with continuity correction = 64fnP) Without the continuity correction P2=normcdf((n-np)sqrt(np(1-p)))- normcdf((n2-np)sqrt(np(1-p))) fprintf( P not all can be accomodated without continuity correction = 64fnP2) k=169178 Solve as a binomial pdf P3 = sum(binopdf(knp)) fprintf( P not all can be accomodated exact binomial = 64fnP3) P4 = binocdf(nnp)-binocdf(n2-1np) fprintf( P not all can be accomodated exact binomial cdf = 64fnP4)

Case Study 431 ESP LMcs040301_4thm Case Study 431 ESP P 298 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written by Eugene Gallagher for EEOS601 EugeneGallagherumbedu httpwwwesumbeduedgwebphtm Thirty two students correctly guessed 12489 times out of 60000 when the expected number of successes would be just 12000 What is the probability of this occuring by chancek=12489n=6e4p=15P1=1-binocdf(knp)fprintf(nUsing the binomial cdf the exact P = 97gnP1)X=(124885-np)sqrt(np(1-p))P2=1-normcdf(X)fprintf(Using the Normal approximation with the continuity n)fprintf(correction the exact P = 97gnP2)X=(12489-np)sqrt(np(1-p))P3=1-normcdf(X)fprintf(Using the Normal approximation without the continuity n)fprintf(correction the exact P = 97gnP3)

Questions Page 299

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 19: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 19 of 38

434 Central Limit Theorem (p 301) Central limit theorem Every binomial random variable X can be written as the sum of n independent Bernoulli random variables X X X where X =1 with probability P and 0 with probability 1-p but if 1 2 n i

X=X +X ++X Theorem 431 can be reexpressed as 1 2 n

The Russian Lyapunov made many of the key advances and in 1920 George Polya called it the central limit theorem Theorem 432 p 273 Let W W be an infinite sequence of of independent random variables 1 2

with the same distribution Suppose that the mean igrave and the variance oacute2 of f (w) are both finite w

For any numbers a and b

Comment The central limit theorem is often stated in terms of the average of W W and 1 2

W rather than the sum Theorem 432 can be stated in equivalent form n

Example 432 LMEx040302_4thm Larsen amp Marx Example 432 Example of the central limit theorem Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf

Figure 5 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 uniformly distributed random numbers after conversion to Z scores The fit to the normal distribution is quite good even though each of the 40 samples is based on only 5 uniformly distributed numbers

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 20: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 20 of 38

See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogrammultiple of 7 n=100samples=1000numbins=14 Numbins for thehistogram multiple of 7 n=1000samples=500numbins=14 For the histogram n=1000samples=40 numbins=7 For the histogramC=rand(samplesn) Figure 6 Example 432 Page 302 in Larsen amp MarxC6=sum(C) sums of n replicates (2006) 1000 samples of 5 uniformly distributed random Use symbolic math toolbox to numbers after conversion to Z scores The number of binscalculate the variance of a uniform increased from 7 to 14 for this bar chart The normal random distribution distribution provides an excellent fit even though each ofmu=05 1000 means is based on only 5 cases drawn from a uniformsyms f z y distribution The uniform distribution is symmetricdisp(Definite integral (01) of y^2) allowing a rapid convergence to the normal distributionint(y^201)EYsq=int(sym(y^2)01)VARY=EYsq-mu^2disp(Var of a uniformly distributed random number on the interval (01))disp(VARY)vary=eval(VARY) convert from symbolic to numeric The variance is 112 The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302) end Use the code on page 5-115 of Statistics toolbox to draw histogram binwidthscaling=7numbins This will produce 7 bins (35-(-35)) edges=-35binwidthscaling35N=histc(C7edges) bar(edgesNsum(N)binwidthscaling1) ax1=gca set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 21: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 21 of 38

axis([-36 36 -0005 053]) ylabel(DensityFontSize14)xlabel(Z-ratio) set(ax1xtickedgesFontSize12) set(ax1ytick00105FontSize12) figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is Figure 7 Example 432 Page 302 in Larsen amp Marx f(y_j)=1(sqrt(2pi)sigma_j)exp(-1

(2006) 1000 samples of 500 uniformly distributed random numbers after conversion to Z scores The number of bins

2((y_j-mu_j)sigma_j)^2) increased from 7 to 14 for this bar chart The fit of the mu_j=0 sigma_j=1 sigmaj is the standard

normal distribution is excellent as would be expected from the large number of cases (500) on which each mean is

deviation = 1 after Z transform basedy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)s=sprintf(Example 432 n=40f and samples=30fnsamples)title(s)figure(gcf)pause

LMex040302_logrnd_4thm Larsen amp Marx Example 432 Example of the central limit theorem using lognormally distributed variables instead of uniform Page 302-303 Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Integration to find the variance of a uniform pdf using the symbolic math toolbox

Figure 8 Example 432 Page 302 in Larsen amp Marx (2006) 40 samples of 5 lognormally distributed (igrave=1 oacute=1) random numbers after conversion to Z-scores The normal approximation is quite poor Because each of the 40 samples is based on a mean of only 5 cases

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 22: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 22 of 38

Written by EugeneGallagherumbedu as LMex040302_3rdm in 2001 revised 1811 See also LMEx040304_4thm on rounding for variance of uniform pdf See also LMEx040304_3rdm on rounding for variance of uniform pdf

Generate 40 sets of 5 uniform random numbers as in Table 432 p 302 Figure in Table 432 uses 7 bins and 40 samples Try 1000 samples and numbins=20 for a prettier plot n=5samples=40 numbins=7 Numbins for the histogram multiple of 7 n=5samples=1000 numbins=14 Numbins for the histogram multiple of 7 Figure 9 Example 432 Page 302 in Larsen amp Marx

n=100samples=1000 (2006) 1000 samples of 1000 lognormally distributed

numbins=14 Numbins for the (igrave=1 oacute=1) random numbers after conversion to Z scores I

histogram multiple of 7 used 14 bins for this bar chart Even for the highly

n=1000samples=1000 asymmetrically lognormally distributed samples the

numbins=14 For the histogram normal approximation is excellent because of the large

n=1000samples=40 numbins=7 number of cases (1000) on which each mean was based

For the histogram lognormal=0 This example calculates the z scores based on the parametric mean and variance for the underlying distributions For the uniform distribution the variance is calculated from the variance of the distribution to be 112 For the lognormal distribution lognstat is the built-in Matlab function that will covert the mean and variance from the log scale to the standard scale if lognormal mu=1var=1 C=lognrnd(11samplesn) [muvary]=lognstat(11)s=sprintf(Example 432 Lognormal n=40f and samples=30fnsamples) s2=sprintf(Example 432 Lognormal distribution 1st sample n=40fn) else C=rand(samplesn)

mu=05 Use symbolic math toolbox to calculate the variance of a uniform

random distribution syms f z y disp(Definite integral (01) of y^2) int(y^201) EYsq=int(sym(y^2)01)

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 23: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 23 of 38

VARY=EYsq-mu^2 disp( Var of a uniformly distributed random number on the interval (01)) disp(VARY) vary=eval(VARY) convert from symbolic to numeric The variance is 112 s=sprintf(Example 432 Uniform n=40f and samples=40fnsamples) s2=sprintf(Example 432 Uniform distribution 1st sample n=40fn)endC6=sum(C) sums of n replicates The Z ratios of the sums are in column C7 of Table 432C7=(C6-nmu)(sqrt(n)sqrt(vary))Table040302=[[1samples] C C6 C7]if n==5 Reproduce Table 432 fprintf( Table 432n) fprintf( y1 y2 y3 y4 y5 y Z-ration) fprintf(20f 74f 74f 74f 74f 74f 74f 74fnTable040302)end

Use the code on page 5-115 of Statistics toolbox to draw histogram Draw a histogram of one row of datahist(C(1))ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])ylabel(FrequencyFontSize16)xlabel(XFontSize16)title(s2FontSize20)figure(gcf)pause

histogram of the z scoresbinwidthscaling=7numbins This will produce 7 bins (35-(-35))edges=-35binwidthscaling35N=histc(C7edges)bar(edgesNsum(N)binwidthscaling1histc)ax1=gcaset(get(gcaChildren)FaceColor[8 8 1])axis([-36 36 -0005 105max(Nsum(N)binwidthscaling)])ylabel(DensityFontSize16)xlabel(Z-ratioFontSize16)set(ax1xtickedgesFontSize14)set(ax1ytick00107FontSize14)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) is f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=0

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 24: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 24 of 38

sigma_j=1 sigmaj is the standard deviation = 1 after Z transformy_j=-350135fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_j ColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)

title(sFontSize20)figure(gcf)pause

Example 433 LMex040303_4thm Example 433 Pp 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition An application of the symbolic math integration functions Typo in book f_y(y)=y 3(1-y)^2 not fy(y)=3(1-y)^2 syms y f_y(y)=y 3(1-y)^2 0lt=y=1 Let mean y = (115)Sum from 1 to 15 Yi int(y3(1-y)^201) muy=eval(int(y3(1-y)^201)) by definition of variance int(y^23(1-y)^201)-14^2 vary=eval(int(y^23(1-y)^201)-14^2) P=normcdf((38-muy) (sqrt(vary)sqrt(15)))- normcdf((18-muy) (sqrt(vary)sqrt(15)))

Example 434 LMEx040304_4thm Example 434 Effects of rounding analyzed with normal approximation Effect of rounding on the expected value and variance of rounded estimates P 304 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Using the central limit theorem what is the probability that the error on 100 transactions each rounded to the nearest $100 would exceed $500 What is the variance of a uniformly distributed random number over the interval -50 to 50 From definition of Variance Var(Y)=E[(Y-mu)^2]= integral from -50 to 50 (y-mu)^2 f_y(y) dy mu=0 f_y(y)=1 This is the height of the uniform pdf at each value of y for max(y)-min(y)=1 the value of f_y(y) is 1 for uniform random numbers spread over different ranges for max(y)-min(y)f_y(y) must equal 1 thus for a uniform random number distributed on the interval -100 to 100

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 25: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 25 of 38

f_y(y)=1200 Written by EugeneGallagherumbedu for EEOS601 httpwwwesumbeduedgwebphtm Written in 2001 revised 1911 The mfile goes through some applications of the symbolic math toolbox at the beginning to verify that the variance of 100 rounded transactions is 25003 format rat syms y mu int(y^2100-5050)Use the symbolic math toolbox to find the variance of a uniformly distributed variable on the interval -50 to 50 it is 25003 VarY=eval(int(sym(y^2100)-5050)) Variance of a uniform randomly distributed number on the interval -50 to 50 is 25003 Note how the variance remains 25003 with a shift in mean Var(Y)=E(Y^2)-mu^2 mu=50 int(y^21000100) EYsq=eval(int(sym(y^2100)0100)) The previous statement produces 100003 which produces an identical variance after subtracting the square of the expected value mu^2 as is appropriate from the definition of variance VARY=EYsq-mu^2 Note that Var(Y) is also defined as integral -inf to inf (y-mu)^2 f_y(y) dy We can create the definite integral of this equation and evaluate it to produce the variance syms y mu mu=50 VARY2=eval(int(sym((y-mu)^2100)y0100)) VARY2 should also be 25003 format using Theorem 31312 in Larsen amp Marx (2001 p 223) Calculate the variance of all 100 transactions (multiply the variance by 100) Note this is for 100 indepedently distributed random variates where the positive and negative errors among transactions are not correlated Varall=100VARY If you wanted to convert the variance calculated here in dollars to the variance in pennies youd have to use Theorem 3131 p 222 It would the variance of Varall in pennies would be a=100Var(aVarall)=a^2 Varall =10000 Varall calculate the standard deviation sigmaall=sqrt(Varall)

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 26: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 26 of 38

Calculate the Z statisticZ=zeros(21)Z(1)=(-500-0)sigmaallZ(2)=(500-0)sigmaall Use the cumulative normal probability function to find the probability of being more than $5 dollars off after 100 transactionsP=1-(normcdf(Z(2))-normcdf(Z(1)))fprintf(The probability of being more than $500 off after 100n)fprintf(transactions in which each transaction is rounded to the n)fprintf(nearest $100 is 64fnP)

Example 435 LMex040305_4thm Example 435 Memphis earthquakes and the Poisson distribution Pp 305-306 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Annual earthquakes registering 25 on the Richter scale and within 40 miles of downtown Memphis have a Poisson distribution with lambda=65 Calculate the exact probability that nine or more such earthquakes will strike next year using the Poisson distribution and compare that with the normal approximation based on the central limit theoremlambda=65N=9P=1-poisscdf(N-1lambda)fprintf(nThe exact p value using the Poisson cumulative distribution )fprintf(function with lambda=53f is 74fnlambdaP)P2=1-normcdf(((N-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP2)continuity=05 The continuity correction is appropriate whenever a discrete probability model is being approximated by the area under a curve)P3=1-normcdf(((N+continuity-1)-lambda)sqrt(lambda))fprintf(nThe approximate p value using the central limit theorem with )fprintf(the continuity correction is )fprintf(74fnP3)

Optional plot the Poisson pdf and the normal approximation Use LMFig040301_4thm as a modelX=018Y = poisspdf(Xlambda)bar(XY1)axis([-06 19 0 02])title(Example 435FontSize14)set(get(gcaChildren)FaceColor[8 8 1])

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 27: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 27 of 38

ylabel(ProbabilityFontSize14)ax1=gcaset(ax1xtick0218FontSize12)set(ax1ytick000502FontSize12)figure(gcf)pause Superimpose the normal probability pdf The normal probability equation is provided on p 293 This is for mean 0 and unit standard deviation The more general equation (Legendre amp Legendre 1998 p 147) f(y_j)=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2)mu_j=lambdasigma_j=sqrt(lambda) sigmaj is the standard deviationy_j=00218fy_j=1(sqrt(2pi)sigma_j)exp(-12((y_j-mu_j)sigma_j)^2) fyj=1(sqrt(2pi)sigmaj)exp(-12((y-muj)sigmaj)^2) Plot using ax1 handle saved aboveto save this graph on top of the previous graphh1=line(y_jfy_jColorrParentax1Linewidth2)set(h1linestyle--colorrlinewidth2)xlabel(Number of Memphis Earthquakes gt 25)figure(gcf)pause

Questions p 306

435 The Normal Curve as a Model for Individual Measurements

Definition 431 A random variable Y is said to be normally distributed with mean igrave and variance oacutesup2 if

Comment Areas under and ldquoarbitraryrdquo normal distribution f (y) y

are calculated by finding the equivalent area under the standard normal distribution f (z) z Figure 10 Figure 435 from Example 436 If a driverrsquos

true blood alcohol level is 0095 and the breathalyzer has an instrumental error oacute= 0004 (pdf shown in upper panel) What is the probability that a driver will be

The ratio is often referred to erroneously charged with DUI The problem can be as either a Z transformation or a Z analyzed using the pdf for the unscaled data or the data can score be converted to z scores (lower panel) by subtracting the

mean and dividing by oacute The probability of an erroneous Example 436 charge is 1056 the area to the right of the cutpoints LMex040306_4thm shown in the normal and standard normal curves

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 28: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 28 of 38

Example 436 Drunk driving pages 308-309 in Larsen amp Marx (2006) Based on LMex040305_4thm Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml A legally drunk driver in many states has a blood alcohol level of 010 but the breathalyzer has measurement error Repeated measurements indicate that the the estimate of the mean is unbiased and sigma is 0004 If a drivers true blood alcohol content is 0095 what is the probability that he will be incorrectly booked on a DUI chargemu=0095sigma=0004Observed=01P=1-normcdf((Observed-mu)sigma)fprintf(nThe approximate p value using the central limit theorem is )fprintf(74fnP)

Plot Figure 435subplot(211)X=0075000050115Y = normpdf(Xmusigma)plot(XY--r)axis([0075 0115 0 105])title(Figure 435FontSize14)ax1=gcaxlabel(Blood Alcohol Level)ylabel(f_y(y)FontSize14)ax1=gcaset(ax1xtick0080005011FontSize12)set(ax1ytick025100FontSize12)hold onxf=0100001115yf=normpdf(xfmusigma)fill([01 xf 0115][0 yf 0][8 8 1])hold off

subplot(212)X=-3500135Y = normpdf(X01)plot(XY--r)axis([-355 355 0 042])ax1=gcaxlabel(Z-score)ylabel(f_z(z)FontSize14)ax1=gca

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 29: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 29 of 38

set(ax1xtick-350535FontSize12)set(ax1ytick00204FontSize12)hold onxf=125000135yf=normpdf(xf01)fill([125 xf 35][0 yf 0][8 8 1])figure(gcf)pausehold offsubplot(111)

Example 437 Mensa p 309 LMex040307_4thm Example 437 Mensa amp IQ Pages309-310 in Larsen amp Marx (2006)Introduction to MathematicalStatistics 4th edition Based on LMex040306_4thm Written Fall 2010 by EugeneGallagher for EEOS601 Revised1911 EugeneGallagherumbedu httpalphaesumbedufacultyedgfi Figure 11 Figure 436 from Example 437 A Mensa

lesedgwebphtml score of 133 marks the upper 2 of IQrsquos if igrave=100 and

A person must be in the upper 2 oacute=16

of the population in IQ to join What is the minimum IQ that will qualify a person for membership Assume that the mean IQ is 100 and sigma is 16mu=100sigma=16 Use the inverse of the normal pdf to find the IQ cutoff for the upper 2 of the population 098=(1-002)IQmensa=norminv(09810016)fprintf(nThe lowest acceptable IQ for membership in Mensa is )fprintf(30fnceil(IQmensa))

Plot Figure 436X=50150Y = normpdf(Xmusigma)plot(XY--r)axis([45 155 0 003])title(Figure 436 Mensa amp IQFontSize20)ax1=gcaxlabel(IQFontSize16)ylabel(f_y(y)FontSize16)ax1=gcaset(ax1xtick[50 100 133 150]FontSize14)

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 30: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 30 of 38

set(ax1ytick001003FontSize14)hold onxf=ceil(IQmensa)155yf=normpdf(xfmusigma)fill([ceil(IQmensa) xf 155][0 yf 0][8 8 1])text(500007Area=098FontSize18)text(1350007Area=002FontSize18)figure(gcf)pausehold off

Example 438 Army Anti-Tank Missile LMex040308_4thm Example 438 Truck-launced antitank missile Pages 310-11 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised

Figure 12 Figure 437 from Example 438 A Mensa 1811 EugeneGallagherumbedu

score of 133 marks the upper 2 of IQrsquos if igrave=100 and oacute=16

httpalphaesumbedufacultyedgfilesedgwebphtml The army is soliciting proposlas requiring that the automatic sighting mechanism be sufficiently reliable to guarantee that 95 of the missiles fall no more than 50 feet short or 50 feet beyond the target What is the largest sigma compatible with that degree of precision Assume that Y the distance a missile travels is normally distributed with its mean (mu) equal to the the length of separation between the truck and the targetZvalue=norminv(0975)Zvalue=50sigma so sigma=50Zvaluefprintf(nThe largest acceptable sigma is 42f feetnsigma)

Plot Figure 437X=-7575mu=0Y = normpdf(Xmusigma)plot(XY--r)axis([-75 75 0 002])title(Figure 437 Tank Missile PrecisionFontSize20)ax1=gcaxlabel(Distance between truck and targetFontSize16)ylabel(f_y(y)FontSize16)ax1=gca

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 31: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 31 of 38

set(ax1xtick-505050FontSize14 XTickLabelmu-50mumu+50FontSize14)

set(ax1ytick001002FontSize14) hold on xf=-5050yf=normpdf(xfmusigma) fill([-50 xf 50][0 yf 0][8 8 1]) figure(gcf)pause hold off

Example 439 Deals with Moment generating functions not covered in EEOS601

Theorem 433 Let Y be a normally distributed random variable with mean igrave and1 1

variance oacutesup2 and let Y be a normally distributed random variable with mean igrave1 2 2

and variance oacutesup2 Define Y=Y +Y If Y and Y are independent Y is normally 2 1 2 1 2

distributed with mean igrave + igrave and variance oacutesup2 + oacutesup2 1 2 1 2

Corollary Let Y Y2 Y be a random sample of size n from a normal 1 n

distribution with mean igrave and variance oacutesup2 Then the sample mean

also normally distributed with mean igrave but with variance equal to oacutesup2n (which

implies that is a standard normal random variable Z)

is

Corollary Let Y Y2 Y be any set of independent normal random variables 1 n

with means igrave igrave igrave and variances oacutesup2 oacutesup2 oacutesup2 respectively Let a a a 1 2 n 1 2 n 1 2 n

be any set of constants Then Y = a Y + a Y + + a Y is normally distributed 1 1 2 2 n n

with mean and variance

Example 4310 LMex040310_4thm Example 4310 Swampwater Tech Elevator Pages 312-313 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml Swampwater Tech elevators have a maximum capacity of 2400 lb Suppose that ten Swampwater Tech football players enter the elevator and that the football player weights are normally distributed with mean = 220 lb and

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 32: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 32 of 38

standard deviation = 20 lb What is the probability that there will be 10 fewer football players at tomorrows practicemu=220sigma=20n=10P=1-normcdf(240mu20sqrt(n))fprintf(nThe probability of an elevator collapse is 42gnP)P2=1-normcdf(2400munsqrt(n)20)fprintf(nThe probability of an elevator collapse is 42gnP2) If 11 players baoarded the elevatorn=11P3=1-normcdf(2400munsqrt(n)20)fprintf(nIf 11 players entererd the elevator the expected weight n)fprintf(is 40f lb and the probability of collapse is 53fn mu11 P3)

Example 4311 LMex040311_4thm Example 4311 Aptitude tests for a corporation Pages 313-314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml The personnel department gives a verbal (Y1) and quantitative aptitude (Y2)test to all applicants Y1 is normally distributed with mu1=50 and sigma1=20 and Y2 has mu2=100 and sigma2=20 Scores are independent A composit score is assigned to each applicant Y=3Y1+2Y2 The company rejects any applicant whose score is less than 375 If six individuals apply what are the chances that fewer than half will fail the screening test mu1=50 sigma1=10 mu2=100 sigma2=20 cutoff=375 n=6 EY=3mu1+2mu2 VarY=3^2sigma1^2+2^2sigma2^2 fprintf(For a single applicant the expected score is 30f with EY) fprintf(variance = 40fnVarY) Zscore=((375-EY)sqrt(VarY)) P=normcdf(Zscore01) fprintf(The probability of a single applicant being rejected is 64fnP) fprintf(What is the probability that at most 2 of 6 will fail the testn) P2=binocdf(26P)

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 33: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 33 of 38

fprintf(The probability that fewer than half of 6 applicants will failn) fprintf(the test is 64fnP2)

Example 4312 LMex040312_4thm Example 4312 Pages 314 in Larsen amp Marx (2006) Introduction to Mathematical Statistics 4th edition Based on LMex040306_4thm Written Fall 2010 by Eugene Gallagher for EEOS601 Revised 1811 EugeneGallagherumbedu httpalphaesumbedufacultyedgfilesedgwebphtml An application of propagation of error Y=random sample of size 9 from a normal distribution with EY=2 SigmaY=2 nY=9 Ystar=independent random sample of size 4 from a normal distribution with EYstar=1 SigmaYstar=1 nYstar=4 Find Probability mean(Y)gt=mean(Ystar) E_YminusYstar=EY-EYstar Var_YminusYstar= VarY+VarYstar format rat Var_YminusYstar=SigmaY^2nY+SigmaYstar^2nYstar format P=1-normcdf((0-E_YminusYstar)sqrt(Var_YminusYstar)) fprintf(The probability that Y gt= Ystar = 64fnP)

44 The geometric distribution Not covered in Summer 2011 Example 441 Available on course website Theorem 441 Example 442 Available on course website Example 443 Not programmed (Not really a Matlab-type problem) Example 444 Not programmed Questions p 321

45 THE NEGATIVE BINOMIAL DISTRIBUTION Theorem 451 Example 451 Programmed in Matlab but difficult Questions p 326

46 THE GAMMA DISTRIBUTION p 296 Waiting time for the rth event to occur

Theorem 461

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 34: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 34 of 38

Example 461 Programmed as an example of the symbolic math toolbox 461 Generalizing the Waiting Time Distribution

Theorem 462 Definition 462 Theorem 463

462 Sums of Gamma Random Variables Theorem 464 Example 462 Theorem 465

47 TAKING A SECOND LOOK AT STATISTICS (MONTE CARLO SIMULATIONS) 471 Monte Carlo simulations are often used when the underlying probability

model is complex 472 Section 47 analyzes a compound probability problem involving a

Poissonexponential process and a normal pdf 473 A 2-year warranty is offered for a plasma TV 474 Lambda = 075 service calls per year at an average cost of $100 per repair

(oacute=20) 475 Is the $200 2-year warranty a good deal 476 Not mentioned by Larsen amp Marx (2006) but the expected repair cost =

275$100=$150 but what is the variance 4761 ldquoFor any particular customer the value of W [the warranty] will

depend on (1) the number of repairs needed in the 1st two years and (2) the cost of each repair Although we have reliability and cost assumptions that address(1) and (2) the 2-yr limit on the warranty introduces a complexity that goes beyond what we have learned in Chapters 3 and 4 What remains is the option of using random samples to simulate the repair costs that might accrue during those first two yearsrdquo [p 333]

LM0407_4thm A Monte Carlo simulation with graphics of the cost of a warranty A simulation combining the exponential pdf and Poisson pdf For a plasma TV screen the sreen is expected to require 075 service calls per year The cost of normally distributed and independently distributed service calls have mean=$100 and sigma=20 Should the buyer buy a warranty costing $200 for two years Written by EugeneGallagherumbedu for EEOS601 11011 Revised Theorem 423 (Larsen amp Marx 2006 P 290) implies that the interval between repairs will follow the exponential distribution

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 35: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 35 of 38

Figure 471lambda=075mu=100sigma=20y=0145fyy=exppdf(y1lambda) Notethat Matlabs exp pdf is defined differently from Larsen amp Marx Must enter 1lambda to observe same resultplot(yfyy--rLineWidth3) Figure 13 Figure 471 An application of Matlabrsquosaxis([0 44 0 08]) exponential distribution which is defined differently fromax1=gca Larson amp Marx (2006) y=0145fyy=exppdf(y1lambda)set(ax1xtick[04]FontSize14FontSize14)ax1=gcaxlabel(Interval between repair calls (yr)FontSize16)ylabel(f_y(y)FontSize16)title(Figure 471FontSize20)figure(gcf)pause

Plot 472 usingLMex040308_4thm as modelX=38162mu=100sigma=20Y = normpdf(Xmusigma)plot(XY--rLineWidth3)axis([35 165 0 0021])title(Figure 472 Cost of ScreenRepairFontSize20)ax1=gcaxlabel(CostFontSize16) Figure 14 Figure 472 An application of Matlabrsquos normal

ylabel(f_c(c)FontSize16) pdf Page 334 in Larsen amp Marx (2006) page 334

ax1=gcaset(ax1xtick[40 100 160]FontSize14

XTickLabel$40(mu-3 sigma)$100$160 (mu+3 sigma) FontSize14)

set(ax1ytick001002FontSize14) figure(gcf)pause

Program with nested for loop Trials=100

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 36: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

Figure 15 Figure 476 The result of 100 Monte Carlosimulations of repair costs Page 336 in Larsen amp Marx(2006) page 334 The max median and mean repair billsare $737 $142 and $161 15 of customers had no repairbills

EEOS 601 Applied Statistics Week 4 P 36 of 38

time=0 COSTS=zeros(Trials1) for i=1Trials

cost=0 time=0 while timelt=2 time=exprnd(1lambda)+time if timelt=2 cost=cost+normrnd(musigma)

end end

COSTS(i)=costendfprintf(The max median and mean repair bills are $40f $30f and $30fnmax(COSTS)median(COSTS)mean(COSTS))fprintf(41f of customers had no repair costssum(~COSTS)Trials100)

Use the code from page 5-115 of Statistics toolbox amp Ex040302_4thm to draw histogram Figure 476if max(COSTS)gt400

maxbin=ceil(max(COSTS)200)200else

maxbin=ceil(max(COSTS)100)100endedges=050maxbinN=histc(COSTSedges)bar(edgesN1histc)maxf=ceil(max(N)5)5ax1=gcaset(get(gcaChildren)FaceColor[88 1])axis([0 maxbin 0 105maxf])ylabel(FrequencyFontSize16)xlabel(Simulated repaircostsFontSize16)set(ax1xtickedgesFontSize14)if maxflt20

Figure 16 Figure 476 The result of 1 million Monte

set(ax1ytick05maxfFontSize14) Carlo simulations of repair costs Page 336 in Larsen amp

elseif maxflt50 Marx (2006) page 334 The max median and mean repair bills are $1037 $119 and $150 223 had no repair costs

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 37: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 37 of 38

set(ax1ytick010maxfFontSize14) elseif maxflt100 set(ax1ytick020maxfFontSize14) elseif maxflt500 set(ax1ytick0100maxfFontSize14) elseif maxflt1000 set(ax1ytick0200maxfFontSize14) elseif maxflt10000 set(ax1ytick01000maxfFontSize14) elseif maxflt100000 set(ax1ytick010000maxfFontSize14) end s=sprintf(Figure 476 Trials=70fTrials) title(sFontSize20) figure(gcf)pause

Appendix 4A1 Mintab applications Appendix 4A2 A proof of the central limit theorem

References

Larsen R J and M L Marx 2006 An introduction to mathematical statistics and its applications 4th edition Prentice Hall Upper Saddle River NJ 920 pp

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index
Page 38: WEEK 4: CHAPTER 4, SPECIAL DISTRIBUTIONSocw.umb.edu/environmental-earth-and-ocean-sciences/eeos...EEOS 601 Applied Statistics Week 4, P. 5 of 38 Introduction Chapter 4 is my favorite

EEOS 601 Applied Statistics Week 4 P 38 of 38

Index

Binomial distribution 3 5 33Central limit theorem 19 21 24 26 28 37complement 8 continuity correction 17 18 26Distributions

exponential 12 13 34 35gamma 33geometric 33lognormal 22negative binomial 33

33 31 27 19-21 16 15 3normal 26 10 9 7 5 3Poisson

Expected value 3 8 24 25independent trials 16Legendre 15 21 23 27

33 22 14 5 3 2Matlab Symbolic Math Toolbox 14 16 20 22 25 34

Median 36normal curve 16 27P-Value 26 28

15 9-11Parameter 34 26 15 14 5-12 3Poisson

Poisson limit theorem 5 Poisson model 3 10 12population 29

30 3Precision -26 24 23 21 17-19 15 14 10-12 8 7 5 3 1Probability 32-34 28

Probability function 26Propagation of error 3 33random sample 31 33Random variable 9 11 12 16 19 27 31

33 31 23 22 5sample Standard deviation 15 21 24 25 27 32Standard error 3 Statistic 26uniform distribution 20 22 23

31 27 25 19 16 12 11 variable 9 34 31 27 25-19 variance 3

Z transformation 27

  • List of Figures
  • List of Tables
  • List of mfiles
  • Assignment
    • Required reading
      • Understanding by Design Templates
      • Introduction
      • Annotated outline (with Matlab scripts) for Larsen amp Marx Chapter 4
      • References
      • Index