Continuous Random Variables and Probability Distributions


Page 1: Continuous Random Variables and  Probability Distributions

4 Continuous Random Variables and Probability Distributions

Page 2: Continuous Random Variables and  Probability Distributions

4.1 Probability Density Functions

Page 3: Continuous Random Variables and  Probability Distributions

3

Probability Density Functions

A discrete random variable is one whose possible values either constitute a finite set or else can be listed in an infinite sequence (a list in which there is a first element, a second element, etc.).

A random variable whose set of possible values is an entire interval of numbers is not discrete.

Page 4: Continuous Random Variables and  Probability Distributions

4

Probability Density Functions

A random variable is continuous if both of the following apply:

1. Its set of possible values consists either of all numbers in a single interval on the number line or all numbers in a disjoint union of such intervals (e.g., [0, 10] ∪ [20, 30]).

2. No possible value of the variable has positive probability, that is, P(X = c) = 0 for any possible value c.

Page 5: Continuous Random Variables and  Probability Distributions

5

Probability Distributions for Continuous Variables

Page 6: Continuous Random Variables and  Probability Distributions

6

Probability Distributions for Continuous Variables

Definition

Let X be a continuous rv. Then a probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b,

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Page 7: Continuous Random Variables and  Probability Distributions

7

Probability Distributions for Continuous Variables

That is, the probability that X takes on a value in the interval [a, b] is the area above this interval and under the graph of the density function, as illustrated in Figure 4.2.

The graph of f (x) is often referred to as the density curve.

P(a ≤ X ≤ b) = the area under the density curve between a and b

Figure 4.2

Page 8: Continuous Random Variables and  Probability Distributions

8

Probability Distributions for Continuous Variables

For f (x) to be a legitimate pdf, it must satisfy the following two conditions:

1. f(x) ≥ 0 for all x

2. ∫_{-∞}^{∞} f(x) dx = area under the entire graph of f(x) = 1

Page 9: Continuous Random Variables and  Probability Distributions

9

Example 4

The direction of an imperfection with respect to a reference line on a circular object such as a tire, brake rotor, or flywheel is, in general, subject to uncertainty.

Consider the reference line connecting the valve stem on a tire to the center point, and let X be the angle measured clockwise to the location of an imperfection. One possible pdf for X is

f(x) = 1/360 for 0 ≤ x < 360, and f(x) = 0 otherwise

Page 10: Continuous Random Variables and  Probability Distributions

10

Example 4

The pdf is graphed in Figure 4.3.

The pdf and probability from Example 4

Figure 4.3

cont’d

Page 11: Continuous Random Variables and  Probability Distributions

11

Example 4

Clearly f(x) ≥ 0. The area under the density curve

is just the area of a rectangle:

(height)(base) = (1/360)(360) = 1

The probability that the angle is between 90 and 180 is

P(90 ≤ X ≤ 180) = ∫_90^180 (1/360) dx = (180 – 90)/360 = 1/4 = .25
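As a quick numerical check of this slide (an addition to the transcript, not part of the original text), the same area calculation can be done in Python; scipy is assumed to be available.

```python
from scipy.integrate import quad

def f(x):
    """Uniform pdf for the imperfection angle X on [0, 360)."""
    return 1 / 360 if 0 <= x < 360 else 0.0

total, _ = quad(f, 0, 360)   # area under the entire density curve
prob, _ = quad(f, 90, 180)   # P(90 <= X <= 180)

print(round(total, 4))  # ~1.0
print(round(prob, 4))   # ~0.25
```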

cont’d

Page 12: Continuous Random Variables and  Probability Distributions

12

Probability Distributions for Continuous Variables

Because P(a ≤ X ≤ b) = (b – a)/360 whenever 0 ≤ a ≤ b ≤ 360 in Example 4, the probability depends only on the width b – a of the interval, and X is said to have a uniform distribution.

Definition

A continuous rv X is said to have a uniform distribution on the interval [A, B] if the pdf of X is

f(x; A, B) = 1/(B – A) for A ≤ x ≤ B, and f(x; A, B) = 0 otherwise

Page 13: Continuous Random Variables and  Probability Distributions

13

Probability Distributions for Continuous Variables

When X is a discrete random variable, each possible value is assigned positive probability.

This is not true of a continuous random variable (that is, the second condition of the definition is satisfied) because the area under a density curve that lies above any single value is zero:

P(X = c) = ∫_c^c f(x) dx = 0

Page 14: Continuous Random Variables and  Probability Distributions

14

Probability Distributions for Continuous Variables

The fact that P(X = c) = 0 when X is continuous has an important practical consequence: The probability that X lies in some interval between a and b does not depend on whether the lower limit a or the upper limit b is included in the probability calculation:

P(a ≤ X ≤ b) = P(a < X < b) = P(a < X ≤ b) = P(a ≤ X < b)     (4.1)

If X is discrete and both a and b are possible values (e.g., X is binomial with n = 20 and a = 5, b = 10), then all four of the probabilities in (4.1) are different.

Page 15: Continuous Random Variables and  Probability Distributions

15

Example 5

“Time headway” in traffic flow is the elapsed time between the time that one car finishes passing a fixed point and the instant that the next car begins to pass that point.

Let X = the time headway for two randomly chosen consecutive cars on a freeway during a period of heavy flow. The following pdf of X is essentially the one suggested in “The Statistical Properties of Freeway Traffic” (Transp. Res., vol. 11: 221–228):

f(x) = .15e^{–.15(x – .5)} for x ≥ .5, and f(x) = 0 otherwise

Page 16: Continuous Random Variables and  Probability Distributions

16

Example 5

The graph of f (x) is given in Figure 4.4; there is no density associated with headway times less than .5, and headway density decreases rapidly (exponentially fast) as x increases from .5.

The density curve for time headway in Example 5

Figure 4.4

cont’d

Page 17: Continuous Random Variables and  Probability Distributions

17

Example 5

Clearly, f(x) ≥ 0; to show that ∫_{-∞}^{∞} f(x) dx = 1, we use the calculus result

∫_a^∞ e^{–kx} dx = (1/k)e^{–k·a}

Then

∫_{-∞}^{∞} f(x) dx = ∫_{.5}^{∞} .15e^{–.15(x – .5)} dx = .15e^{.075} ∫_{.5}^{∞} e^{–.15x} dx = .15e^{.075} · (1/.15)e^{–(.15)(.5)} = 1

cont’d

Page 18: Continuous Random Variables and  Probability Distributions

18

Example 5

The probability that headway time is at most 5 sec is

P(X ≤ 5) = ∫_{-∞}^{5} f(x) dx

= ∫_{.5}^{5} .15e^{–.15(x – .5)} dx

= .15e^{.075} ∫_{.5}^{5} e^{–.15x} dx

= .15e^{.075} · [–(1/.15)e^{–.15x}]_{x=.5}^{x=5}

cont’d

Page 19: Continuous Random Variables and  Probability Distributions

19

Example 5

= e^{.075}(–e^{–.75} + e^{–.075})

= 1.078(–.472 + .928)

= .491

= P (less than 5 sec)

= P (X < 5)
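A minimal numerical sketch (not from the slides) that reproduces this headway probability, assuming scipy and numpy are available:

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    """Headway pdf from Example 5: shifted exponential starting at x = .5."""
    return 0.15 * np.exp(-0.15 * (x - 0.5)) if x >= 0.5 else 0.0

total, _ = quad(f, 0.5, np.inf)   # should be ~1.0
p_at_most_5, _ = quad(f, 0.5, 5)  # P(X <= 5)

print(round(total, 4))        # ~1.0
print(round(p_at_most_5, 4))  # ~0.491
```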

cont’d

Page 20: Continuous Random Variables and  Probability Distributions

20

The Cumulative Distribution Function

Page 21: Continuous Random Variables and  Probability Distributions

21

The Cumulative Distribution Function

The cumulative distribution function (cdf) F(x) for a discrete rv X gives, for any specified number x, the probability P(X ≤ x).

It is obtained by summing the pmf p(y) over all possible values y satisfying y ≤ x.

The cdf of a continuous rv gives the same probabilities P(X ≤ x) and is obtained by integrating the pdf f(y) between the limits –∞ and x.

Page 22: Continuous Random Variables and  Probability Distributions

22

The Cumulative Distribution Function

Definition

The cumulative distribution function F(x) for a continuous rv X is defined for every number x by

F(x) = P(X ≤ x) = ∫_{-∞}^{x} f(y) dy

For each x, F(x) is the area under the density curve to the left of x. This is illustrated in Figure 4.5, where F(x) increases smoothly as x increases.

Figure 4.5

A pdf and associated cdf

Page 23: Continuous Random Variables and  Probability Distributions

23

Example 6

Let X, the thickness of a certain metal sheet, have a uniform distribution on [A, B].

The density function is shown in Figure 4.6.

Figure 4.6

The pdf for a uniform distribution

Page 24: Continuous Random Variables and  Probability Distributions

24

Example 6

For x < A, F(x) = 0, since there is no area under the graph of the density function to the left of such an x.

For x ≥ B, F(x) = 1, since all the area is accumulated to the left of such an x. Finally, for A ≤ x ≤ B,

F(x) = ∫_{-∞}^{x} f(y) dy = ∫_{A}^{x} 1/(B – A) dy = (x – A)/(B – A)

cont’d

Page 25: Continuous Random Variables and  Probability Distributions

25

Example 6

The entire cdf is

F(x) = 0 for x < A,  F(x) = (x – A)/(B – A) for A ≤ x < B,  F(x) = 1 for x ≥ B

The graph of this cdf appears in Figure 4.7.

Figure 4.7

The cdf for a uniform distribution

cont’d

Page 26: Continuous Random Variables and  Probability Distributions

26

Using F(x) to Compute Probabilities

Page 27: Continuous Random Variables and  Probability Distributions

27

Using F (x) to Compute Probabilities

The importance of the cdf here, just as for discrete rv’s, is that probabilities of various intervals can be computed from a formula for or table of F (x).

Proposition

Let X be a continuous rv with pdf f (x) and cdf F (x). Then for

any number a,

P(X > a) = 1 – F(a)

and for any two numbers a and b with a < b,

P(a ≤ X ≤ b) = F(b) – F(a)
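As an illustration (added here, not in the original slides), these two identities can be checked for a uniform distribution using scipy's ready-made cdf; the endpoints A = 2 and B = 6 and the values a = 3, b = 5 are arbitrary choices for the sketch.

```python
from scipy.stats import uniform

A, B = 2.0, 6.0                      # hypothetical interval for the uniform rv
X = uniform(loc=A, scale=B - A)      # scipy parameterizes uniform by loc and width

a, b = 3.0, 5.0
print(1 - X.cdf(a))                  # P(X > a) = 1 - F(a)           -> 0.75
print(X.cdf(b) - X.cdf(a))           # P(a <= X <= b) = F(b) - F(a)  -> 0.5
```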

Page 28: Continuous Random Variables and  Probability Distributions

28

Using F (x) to Compute Probabilities

Figure 4.8 illustrates the second part of this proposition; the desired probability is the shaded area under the density curve between a and b, and it equals the difference between the two shaded cumulative areas.

This is different from what is appropriate for a discrete integer-valued random variable (e.g., binomial or Poisson):

P(a ≤ X ≤ b) = F(b) – F(a – 1) when a and b are integers.

Figure 4.8

Computing P(a ≤ X ≤ b) from cumulative probabilities

Page 29: Continuous Random Variables and  Probability Distributions

29

Example 7

Suppose the pdf of the magnitude X of a dynamic load on a bridge (in newtons) is

For any number x between 0 and 2,

Page 30: Continuous Random Variables and  Probability Distributions

30

Example 7

Thus

The graphs of f(x) and F(x) are shown in Figure 4.9.

Figure 4.9

The pdf and cdf for Example 4.7

cont’d

Page 31: Continuous Random Variables and  Probability Distributions

31

Example 7

The probability that the load is between 1 and 1.5 is

P(1 ≤ X ≤ 1.5) = F(1.5) – F(1)

The probability that the load exceeds 1 is P(X > 1) = 1 – P(X ≤ 1)

= 1 – F(1)

cont’d

Page 32: Continuous Random Variables and  Probability Distributions

32

Example 7

= 1 –

Once the cdf has been obtained, any probability involving X can easily be calculated without any further integration.

cont’d

Page 33: Continuous Random Variables and  Probability Distributions

33

Obtaining f(x) from F(x)

Page 34: Continuous Random Variables and  Probability Distributions

34

Obtaining f(x) from F(x)

For X discrete, the pmf is obtained from the cdf by taking the difference between two F(x) values. The continuous analog of a difference is a derivative.

The following result is a consequence of the Fundamental Theorem of Calculus.

Proposition

If X is a continuous rv with pdf f(x) and cdf F(x), then at every x at which the derivative F′(x) exists, F′(x) = f(x).

Page 35: Continuous Random Variables and  Probability Distributions

35

Example 8

When X has a uniform distribution, F(x) is differentiable except at x = A and x = B, where the graph of F(x) has sharp corners.

Since F(x) = 0 for x < A and F(x) = 1 for x > B, F′(x) = 0 = f(x) for such x.

For A < x < B,

F′(x) = d/dx [(x – A)/(B – A)] = 1/(B – A) = f(x)

Page 36: Continuous Random Variables and  Probability Distributions

36

Percentiles of a Continuous Distribution

Page 37: Continuous Random Variables and  Probability Distributions

37

Percentiles of a Continuous Distribution

When we say that an individual’s test score was at the 85th percentile of the population, we mean that 85% of all population scores were below that score and 15% were above.

Similarly, the 40th percentile is the score that exceeds 40% of all scores and is exceeded by 60% of all scores.

Page 38: Continuous Random Variables and  Probability Distributions

38

Percentiles of a Continuous Distribution

Proposition

Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by

p = F(η(p)) = ∫_{-∞}^{η(p)} f(y) dy     (4.2)

According to Expression (4.2), η(p) is that value on the measurement axis such that 100p% of the area under the graph of f(x) lies to the left of η(p) and 100(1 – p)% lies to the right.

Page 39: Continuous Random Variables and  Probability Distributions

39

Percentiles of a Continuous Distribution

Thus η(.75), the 75th percentile, is such that the area under the graph of f(x) to the left of η(.75) is .75.

Figure 4.10 illustrates the definition.

Figure 4.10

The (100p)th percentile of a continuous distribution

Page 40: Continuous Random Variables and  Probability Distributions

40

Example 9

The distribution of the amount of gravel (in tons) sold by a particular construction supply company in a given week is a continuous rv X with pdf

f(x) = (3/2)(1 – x²) for 0 ≤ x ≤ 1, and f(x) = 0 otherwise

The cdf of sales for any x between 0 and 1 is

F(x) = ∫_{0}^{x} (3/2)(1 – y²) dy = (3/2)(x – x³/3)

Page 41: Continuous Random Variables and  Probability Distributions

41

Example 9

The graphs of both f (x) and F(x) appear in Figure 4.11.

Figure 4.11

The pdf and cdf for Example 4.9

cont’d

Page 42: Continuous Random Variables and  Probability Distributions

42

Example 9

The (100p)th percentile of this distribution satisfies the equation

p = F(η(p)) = (3/2)[η(p) – (η(p))³/3]

that is,

(η(p))³ – 3η(p) + 2p = 0

For the 50th percentile, p = .5, and the equation to be solved is η³ – 3η + 1 = 0; the solution is η = η(.5) = .347. If the distribution remains the same from week to week, then in the long run 50% of all weeks will result in sales of less than .347 ton and 50% in more than .347 ton.
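The cubic above can be solved numerically; the short sketch below (added for illustration, assuming scipy is available) recovers the median ≈ .347 by root-finding on [0, 1].

```python
from scipy.optimize import brentq

def percentile_equation(eta, p):
    """Equation eta**3 - 3*eta + 2*p = 0 from Example 9."""
    return eta**3 - 3 * eta + 2 * p

# Median: p = .5; the root lies in [0, 1] since X takes values in [0, 1]
median = brentq(percentile_equation, 0, 1, args=(0.5,))
print(round(median, 3))  # ~0.347
```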

cont’d

Page 43: Continuous Random Variables and  Probability Distributions

43

Percentiles of a Continuous Distribution

Definition

The median of a continuous distribution, denoted by μ̃, is the 50th percentile, so μ̃ satisfies .5 = F(μ̃). That is, half the area under the density curve is to the left of μ̃ and half is to the right of μ̃.

A continuous distribution whose pdf is symmetric—the graph of the pdf to the left of some point is a mirror image of the graph to the right of that point—has median equal to the point of symmetry, since half the area under the curve lies to either side of this point.

Page 44: Continuous Random Variables and  Probability Distributions

44

Percentiles of a Continuous Distribution

Figure 4.12 gives several examples. The error in a measurement of a physical quantity is often assumed to have a symmetric distribution.

Figure 4.12

Medians of symmetric distributions

Page 45: Continuous Random Variables and  Probability Distributions

45

Expected Values

Page 46: Continuous Random Variables and  Probability Distributions

46

Expected Values

For a discrete random variable X, E(X) was obtained by summing x · p(x) over possible X values.

Here we replace summation by integration and the pmf by the pdf to get a continuous weighted average.

Definition

The expected or mean value of a continuous rv X with pdf f(x) is

μ_X = E(X) = ∫_{-∞}^{∞} x · f(x) dx

Page 47: Continuous Random Variables and  Probability Distributions

47

Example 10

The pdf of weekly gravel sales X was

f(x) = (3/2)(1 – x²) for 0 ≤ x ≤ 1, and f(x) = 0 otherwise

So

μ = E(X) = ∫_{-∞}^{∞} x f(x) dx = ∫_{0}^{1} x · (3/2)(1 – x²) dx = (3/2)(1/2 – 1/4) = 3/8 = .375

Page 48: Continuous Random Variables and  Probability Distributions

48

Expected Values

When the pdf f(x) specifies a model for the distribution of values in a numerical population, then μ is the population mean, which is the most frequently used measure of population location or center.

Often we wish to compute the expected value of some function h(X) of the rv X.

If we think of h(X) as a new rv Y, techniques from mathematical statistics can be used to derive the pdf of Y, and E(Y) can then be computed from the definition.

Page 49: Continuous Random Variables and  Probability Distributions

49

Expected Values

Fortunately, as in the discrete case, there is an easier way to compute E[h(X)].

Proposition

If X is a continuous rv with pdf f(x) and h(X) is any function of X, then

E[h(X)] = μ_{h(X)} = ∫_{-∞}^{∞} h(x) · f(x) dx

Page 50: Continuous Random Variables and  Probability Distributions

50

Example 11

Two species are competing in a region for control of a limited amount of a certain resource.

Let X = the proportion of the resource controlled by species 1 and suppose X has pdf

f(x) = 1 for 0 ≤ x ≤ 1, and f(x) = 0 otherwise

which is a uniform distribution on [0, 1]. (In her book Ecological Diversity, E. C. Pielou calls this the “broken-stick” model for resource allocation, since it is analogous to breaking a stick at a randomly chosen point.)

Page 51: Continuous Random Variables and  Probability Distributions

51

Example 11

Then the species that controls the majority of this resource controls the amount

h(X) = max(X, 1 – X) = 1 – X if 0 ≤ X < .5, and X if .5 ≤ X ≤ 1

The expected amount controlled by the species having majority control is then

E[h(X)] = ∫_{-∞}^{∞} max(x, 1 – x) · f(x) dx

cont’d

Page 52: Continuous Random Variables and  Probability Distributions

52

Example 11

= ∫_{0}^{1} max(x, 1 – x) · 1 dx

= ∫_{0}^{1/2} (1 – x) · 1 dx + ∫_{1/2}^{1} x · 1 dx

= 3/4 = .75
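A quick check of this expectation (added here as an illustration, not from the slides), both by numerical integration and by simulation:

```python
import numpy as np
from scipy.integrate import quad

# E[h(X)] for h(x) = max(x, 1 - x) when X is uniform on [0, 1], so f(x) = 1
expected, _ = quad(lambda x: max(x, 1 - x) * 1.0, 0, 1)
print(round(expected, 4))  # 0.75

# Monte Carlo check with a large uniform sample
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=1_000_000)
print(round(np.maximum(x, 1 - x).mean(), 3))  # ~0.75
```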

cont’d

Page 53: Continuous Random Variables and  Probability Distributions

53

Variance

For h(X), a linear function, E[h(X)] = E(aX + b) = aE(X) + b.

In the discrete case, the variance of X was defined as the expected squared deviation from and was calculated by summation. Here again integration replaces summation.

Definition

The variance of a continuous random variable X with pdf f(x) and mean value μ is

σ²_X = V(X) = ∫_{-∞}^{∞} (x – μ)² · f(x) dx = E[(X – μ)²]

The standard deviation (SD) of X is σ_X = √V(X).

Page 54: Continuous Random Variables and  Probability Distributions

54

Variance

The variance and standard deviation give quantitative measures of how much spread there is in the distribution or population of x values.

Again σ is roughly the size of a typical deviation from μ. Computation of σ² is facilitated by using the same shortcut formula employed in the discrete case.

Proposition

V(X) = E(X2) – [E(X)]2

Page 55: Continuous Random Variables and  Probability Distributions

55

Example 12

For weekly gravel sales, we computed E(X) = 3/8. Since

E(X²) = ∫_{-∞}^{∞} x² · f(x) dx

= ∫_{0}^{1} x² · (3/2)(1 – x²) dx

= (3/2) ∫_{0}^{1} (x² – x⁴) dx = (3/2)(1/3 – 1/5) = 1/5

Page 56: Continuous Random Variables and  Probability Distributions

56

Example 12

V(X) = E(X²) – [E(X)]² = 1/5 – (3/8)² = .059 and σ_X = .244
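These moments can also be checked numerically; the short sketch below (an addition, assuming scipy is available) reproduces the mean, variance, and standard deviation for the gravel-sales pdf.

```python
from math import sqrt
from scipy.integrate import quad

f = lambda x: 1.5 * (1 - x**2)                   # pdf of weekly gravel sales on [0, 1]

mean, _ = quad(lambda x: x * f(x), 0, 1)         # E(X)
ex2, _ = quad(lambda x: x**2 * f(x), 0, 1)       # E(X^2)
var = ex2 - mean**2                              # shortcut formula

print(round(mean, 4))       # 0.375
print(round(var, 4))        # ~0.0594
print(round(sqrt(var), 3))  # ~0.244
```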

When h(X) = aX + b, the expected value and variance of h(X) satisfy the same properties as in the discrete case:

E[h(X)] = aμ + b and V[h(X)] = a²σ².

cont’d

Page 57: Continuous Random Variables and  Probability Distributions

57

The Normal Distribution

Page 58: Continuous Random Variables and  Probability Distributions

58

The Normal Distribution

The normal distribution is the most important one in all of probability and statistics.

Many numerical populations have distributions that can be fit very closely by an appropriate normal curve.

Examples: heights, weights, measurement errors in scientific experiments, anthropometric measurements on fossils, reaction times in psychological experiments, measurements of intelligence and aptitude, scores on various tests, and numerous economic measures and indicators.

Page 59: Continuous Random Variables and  Probability Distributions

59

The Normal Distribution

Definition

A continuous rv X is said to have a normal distribution with parameters μ and σ (or μ and σ²), where –∞ < μ < ∞ and 0 < σ, if the pdf of X is

f(x; μ, σ) = (1/(σ√(2π))) e^{–(x – μ)²/(2σ²)},  –∞ < x < ∞     (4.3)

e = 2.71828… is the base of the natural logarithm and π = 3.14159…

Page 60: Continuous Random Variables and  Probability Distributions

60

The Normal Distribution

The statement that X is normally distributed with parameters μ and σ² is often abbreviated X ~ N(μ, σ²).

Clearly f(x; μ, σ) ≥ 0, but a somewhat complicated calculus argument must be used to verify that ∫_{-∞}^{∞} f(x; μ, σ) dx = 1.

It can be shown that E(X) = μ and V(X) = σ², so the parameters are the mean and the standard deviation of X.

Page 61: Continuous Random Variables and  Probability Distributions

61

The Normal Distribution

Figure 4.13 presents graphs of f(x; μ, σ) for several different (μ, σ) pairs.

Figure 4.13(a)

Two different normal density curves

Figure 4.13(b)

Visualizing μ and σ for a normal distribution

Page 62: Continuous Random Variables and  Probability Distributions

62

A family of density curves

Here, means are different (μ = 10, 15, and 20) while standard deviations are the same (σ = 3).

Here, means are the same (μ = 15) while standard deviations are different (σ = 2, 4, and 6).

Page 63: Continuous Random Variables and  Probability Distributions

63

The Standard Normal Distribution

Page 64: Continuous Random Variables and  Probability Distributions

64

The Standard Normal Distribution

The computation of P(a ≤ X ≤ b) when X is a normal rv with parameters μ and σ requires evaluating

∫_a^b (1/(σ√(2π))) e^{–(x – μ)²/(2σ²)} dx     (4.4)

None of the standard integration techniques can be used to accomplish this. Instead, for μ = 0 and σ = 1, Expression (4.4) has been calculated using numerical techniques and tabulated for certain values of a and b.

This table can also be used to compute probabilities for any other values of μ and σ under consideration.

Page 65: Continuous Random Variables and  Probability Distributions

65

The Standard Normal Distribution

Definition

The normal distribution with parameter values μ = 0 and σ = 1 is called the standard normal distribution. A random variable having a standard normal distribution is called a standard normal random variable and will be denoted by Z. The pdf of Z is

f(z; 0, 1) = (1/√(2π)) e^{–z²/2},  –∞ < z < ∞

The graph of f(z; 0, 1) is called the standard normal (or z) curve. Its inflection points are at 1 and –1. The cdf of Z is P(Z ≤ z) = ∫_{-∞}^{z} f(y; 0, 1) dy, which we will denote by Φ(z).

Page 66: Continuous Random Variables and  Probability Distributions

66

The Standard Normal Distribution

The standard normal distribution almost never serves as a model for a naturally arising population.

Instead, it is a reference distribution from which information about other normal distributions can be obtained.

Appendix Table A.3 gives Φ(z) = P(Z ≤ z), the area under the standard normal density curve to the left of z, for z = –3.49, –3.48, …, 3.48, 3.49.

Page 67: Continuous Random Variables and  Probability Distributions

67

The Standard Normal Distribution

Figure 4.14 illustrates the type of cumulative area (probability) tabulated in Table A.3. From this table, various other probabilities involving Z can be calculated.

Figure 4.14

Standard normal cumulative areas tabulated in Appendix Table A.3

Page 68: Continuous Random Variables and  Probability Distributions

68

Example 13

Let’s determine the following standard normal probabilities:

(a) P(Z ≤ 1.25), (b) P(Z > 1.25),

(c) P(Z ≤ –1.25), and (d) P(–.38 ≤ Z ≤ 1.25).

a. P(Z ≤ 1.25) = Φ(1.25), a probability that is tabulated in Appendix Table A.3 at the intersection of the row marked 1.2 and the column marked .05.

The number there is .8944, so P(Z ≤ 1.25) = .8944.

Page 69: Continuous Random Variables and  Probability Distributions

69

Example 13

Figure 4.15(a) illustrates this probability.

b. P(Z > 1.25) = 1 – P(Z ≤ 1.25) = 1 – Φ(1.25), the area under the z curve to the right of 1.25 (an upper-tail area). Then Φ(1.25) = .8944 implies that P(Z > 1.25) = .1056.

Figure 4.15(a)

Normal curve areas (probabilities) for Example 13

cont’d

Page 70: Continuous Random Variables and  Probability Distributions

70

Example 13

Since Z is a continuous rv, P(Z ≥ 1.25) = .1056. See Figure 4.15(b).

c. P(Z ≤ –1.25) = Φ(–1.25), a lower-tail area. Directly from Appendix Table A.3, Φ(–1.25) = .1056.

By symmetry of the z curve, this is the same answer as in part (b).

Figure 4.15(b)

Normal curve areas (probabilities) for Example 13

cont’d

Page 71: Continuous Random Variables and  Probability Distributions

71

Example 13

d. P(–.38 ≤ Z ≤ 1.25) is the area under the standard normal curve above the interval whose left endpoint is –.38 and whose right endpoint is 1.25.

From Section 4.2, if X is a continuous rv with cdf F(x), then P(a ≤ X ≤ b) = F(b) – F(a).

Thus P(–.38 ≤ Z ≤ 1.25) = Φ(1.25) – Φ(–.38)

= .8944 – .3520

= .5424
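The same four probabilities can be obtained without the table; this sketch (an addition to the transcript, assuming scipy is available) uses the standard normal cdf Φ directly.

```python
from scipy.stats import norm

Phi = norm.cdf   # standard normal cdf (mean 0, sd 1)

print(round(Phi(1.25), 4))               # (a) P(Z <= 1.25)          -> 0.8944
print(round(1 - Phi(1.25), 4))           # (b) P(Z > 1.25)           -> 0.1056
print(round(Phi(-1.25), 4))              # (c) P(Z <= -1.25)         -> 0.1056
print(round(Phi(1.25) - Phi(-0.38), 4))  # (d) P(-.38 <= Z <= 1.25)  -> 0.5424
```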

cont’d

Page 72: Continuous Random Variables and  Probability Distributions

72

Example 13

See Figure 4.16.

cont’d

Figure 4.16

P(–.38 ≤ Z ≤ 1.25) as the difference between two cumulative areas

Page 73: Continuous Random Variables and  Probability Distributions

73

Percentiles of the Standard Normal Distribution

Page 74: Continuous Random Variables and  Probability Distributions

74

Percentiles of the Standard Normal Distribution

For any p between 0 and 1, Appendix Table A.3 can be used to obtain the (100p)th percentile of the standard normal distribution.

Page 75: Continuous Random Variables and  Probability Distributions

75

Example 14

The 99th percentile of the standard normal distribution is that value on the horizontal axis such that the area under the z curve to the left of the value is .9900.

Appendix Table A.3 gives for fixed z the area under the standard normal curve to the left of z, whereas here we have the area and want the value of z. This is the “inverse” problem to P(Z ≤ z) = .9900,

so the table is used in an inverse fashion: Find .9900 in the middle of the table; the row and column in which it lies identify the 99th z percentile.

Page 76: Continuous Random Variables and  Probability Distributions

76

Example 14

Here .9901 lies at the intersection of the row marked 2.3 and column marked .03, so the 99th percentile is (approximately) z = 2.33.

(See Figure 4.17.)

cont’d

Figure 4.17

Finding the 99th percentile

Page 77: Continuous Random Variables and  Probability Distributions

77

Example 14

By symmetry, the first percentile is as far below 0 as the 99th is above 0, so it equals –2.33 (1% lies below the first and also above the 99th).

(See Figure 4.18.)

cont’d

Figure 4.18

The relationship between the 1st and 99th percentiles

Page 78: Continuous Random Variables and  Probability Distributions

78

Percentiles of the Standard Normal Distribution

In general, the (100p)th percentile is identified by the row and column of Appendix Table A.3 in which the entry p is found (e.g., the 67th percentile is obtained by finding .6700 in the body of the table, which gives z = .44).

If p does not appear, the number closest to it is often used, although linear interpolation gives a more accurate answer.

Page 79: Continuous Random Variables and  Probability Distributions

79

Percentiles of the Standard Normal Distribution

For example, to find the 95th percentile, we look for .9500 inside the table.

Although .9500 does not appear, both .9495 and .9505 do, corresponding to z = 1.64 and 1.65, respectively.

Since .9500 is halfway between the two probabilities that do appear, we will use 1.645 as the 95th percentile and –1.645 as the 5th percentile.
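For comparison (an addition to the transcript, assuming scipy is available), the inverse cdf gives these percentiles directly, without table lookup or interpolation:

```python
from scipy.stats import norm

print(round(norm.ppf(0.99), 2))   # 99th percentile -> 2.33
print(round(norm.ppf(0.95), 3))   # 95th percentile -> 1.645
print(round(norm.ppf(0.05), 3))   # 5th percentile  -> -1.645
```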

Page 80: Continuous Random Variables and  Probability Distributions

80

z Notation for z Critical Values

Page 81: Continuous Random Variables and  Probability Distributions

81

z Notation for z Critical Values

In statistical inference, we will need the values on the horizontal z axis that capture certain small tail areas under the standard normal curve.

Notation

z_α will denote the value on the z axis for which α of the area under the z curve lies to the right of z_α. (See Figure 4.19.)

Figure 4.19

z_α notation illustrated

Page 82: Continuous Random Variables and  Probability Distributions

82

z Notation for z Critical Values

For example, z.10 captures upper-tail area .10, and z.01 captures upper-tail area .01.

Since α of the area under the z curve lies to the right of z_α, 1 – α of the area lies to its left. Thus z_α is the 100(1 – α)th percentile of the standard normal distribution.

By symmetry the area under the standard normal curve to the left of –z_α is also α. The z_α's are usually referred to as z critical values.

Page 83: Continuous Random Variables and  Probability Distributions

83

z Notation for z Critical Values

Table 4.1 lists the most useful z percentiles and z_α critical values.

Table 4.1

Standard Normal Percentiles and Critical Values

Page 84: Continuous Random Variables and  Probability Distributions

84

Example 15

z.05 is the 100(1 – .05)th = 95th percentile of the standard normal distribution, so z.05 = 1.645.

The area under the standard normal curve to the left of –z.05 is also .05. (See Figure 4.20.)

Figure 4.20

Finding z.05

Page 85: Continuous Random Variables and  Probability Distributions

85

Nonstandard Normal Distributions

Page 86: Continuous Random Variables and  Probability Distributions

86

Nonstandard Normal Distributions

When X ~ N(μ, σ²), probabilities involving X are computed by “standardizing.” The standardized variable is (X – μ)/σ.

Subtracting μ shifts the mean from μ to zero, and then dividing by σ scales the variable so that the standard deviation is 1 rather than σ.

Proposition

If X has a normal distribution with mean μ and standard deviation σ, then

Z = (X – μ)/σ

Page 87: Continuous Random Variables and  Probability Distributions

87

Nonstandard Normal Distributions

has a standard normal distribution. Thus

P(a ≤ X ≤ b) = P((a – μ)/σ ≤ Z ≤ (b – μ)/σ) = Φ((b – μ)/σ) – Φ((a – μ)/σ)

P(X ≤ a) = Φ((a – μ)/σ)     P(X ≥ b) = 1 – Φ((b – μ)/σ)

Page 88: Continuous Random Variables and  Probability Distributions

88

Nonstandard Normal Distributions

The key idea of the proposition is that by standardizing, any probability involving X can be expressed as a probability involving a standard normal rv Z, so that Appendix Table A.3 can be used.

This is illustrated in Figure 4.21.

Figure 4.21

Equality of nonstandard and standard normal curve areas

Page 89: Continuous Random Variables and  Probability Distributions

89

Nonstandard Normal Distributions

The proposition can be proved by writing the cdf of Z = (X – μ)/σ as

P(Z ≤ z) = P(X ≤ σz + μ) = ∫_{-∞}^{σz + μ} f(x; μ, σ) dx

Using a result from calculus, this integral can be differentiated with respect to z to yield the desired pdf f(z; 0, 1).

Page 90: Continuous Random Variables and  Probability Distributions

90

Example 16

The time that it takes a driver to react to the brake lights on a decelerating vehicle is critical in helping to avoid rear-end collisions.

The article “Fast-Rise Brake Lamp as a Collision-Prevention Device” (Ergonomics, 1993: 391–395) suggests that reaction time for an in-traffic response to a brake signal from standard brake lights can be modeled with a normal distribution having mean value 1.25 sec and standard deviation .46 sec.

Page 91: Continuous Random Variables and  Probability Distributions

91

Example 16

What is the probability that reaction time is between 1.00 sec and 1.75 sec? If we let X denote reaction time, then standardizing gives

1.00 ≤ X ≤ 1.75

if and only if

(1.00 – 1.25)/.46 ≤ (X – 1.25)/.46 ≤ (1.75 – 1.25)/.46

Thus

P(1.00 ≤ X ≤ 1.75) = P((1.00 – 1.25)/.46 ≤ Z ≤ (1.75 – 1.25)/.46)

cont’d

Page 92: Continuous Random Variables and  Probability Distributions

92

Example 16

= P(–.54 ≤ Z ≤ 1.09) = Φ(1.09) – Φ(–.54)

= .8621 – .2946 = .5675

This is illustrated in Figure 4.22

cont’d

Figure 4.22

Normal curves for Example 16

Page 93: Continuous Random Variables and  Probability Distributions

93

Example 16

Similarly, if we view 2 sec as a critically long reaction time, the probability that actual reaction time will exceed this value is

P(X > 2) = P(Z > (2 – 1.25)/.46) = P(Z > 1.63) = 1 – Φ(1.63) = .0516
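Both reaction-time probabilities can be reproduced directly (an added sketch, assuming scipy is available); norm.cdf accepts loc and scale, so explicit standardizing is optional.

```python
from scipy.stats import norm

mu, sigma = 1.25, 0.46   # reaction-time mean and sd (sec)

p_between = norm.cdf(1.75, loc=mu, scale=sigma) - norm.cdf(1.00, loc=mu, scale=sigma)
p_exceeds_2 = 1 - norm.cdf(2.00, loc=mu, scale=sigma)

print(round(p_between, 4))    # ~0.568 (the table value .5675 uses rounded z scores)
print(round(p_exceeds_2, 4))  # ~0.052 (about .0516 in the text)
```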

cont’d

Page 94: Continuous Random Variables and  Probability Distributions

94

Percentiles of an Arbitrary Normal Distribution

Page 95: Continuous Random Variables and  Probability Distributions

95

Percentiles of an Arbitrary Normal Distribution

The (100p)th percentile of a normal distribution with mean μ and standard deviation σ is easily related to the (100p)th percentile of the standard normal distribution.

Proposition

(100p)th percentile for normal (μ, σ) = μ + [(100p)th percentile for standard normal] · σ

Another way of saying this is that if z is the desired percentile for the standard normal distribution, then the desired percentile for the normal (μ, σ) distribution is z standard deviations from μ.

Page 96: Continuous Random Variables and  Probability Distributions

96

Example 18

The amount of distilled water dispensed by a certain machine is normally distributed with mean value 64 oz and standard deviation .78 oz.

What container size c will ensure that overflow occurs only .5% of the time? If X denotes the amount dispensed, the desired condition is that P(X > c) = .005, or, equivalently, that P(X ≤ c) = .995.

Thus c is the 99.5th percentile of the normal distribution with μ = 64 and σ = .78.

Page 97: Continuous Random Variables and  Probability Distributions

97

Example 18

The 99.5th percentile of the standard normal distribution is 2.58, so

c = η(.995) = 64 + (2.58)(.78) = 64 + 2.0 = 66 oz

This is illustrated in Figure 4.23.

cont’d

Figure 4.23

Distribution of amount dispensed for Example 18
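The same container size follows from the inverse cdf of the nonstandard normal; the sketch below is an addition, assuming scipy is available.

```python
from scipy.stats import norm

mu, sigma = 64.0, 0.78   # ounces dispensed: mean and sd

# c such that P(X <= c) = .995, i.e. overflow occurs only .5% of the time
c = norm.ppf(0.995, loc=mu, scale=sigma)
print(round(c, 2))       # ~66.01 oz

# equivalently, mu + (99.5th standard normal percentile) * sigma
print(round(mu + norm.ppf(0.995) * sigma, 2))
```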

Page 98: Continuous Random Variables and  Probability Distributions

98

The Normal Distribution and Discrete Populations

Page 99: Continuous Random Variables and  Probability Distributions

99

The Normal Distribution and Discrete Populations

The normal distribution is often used as an approximation to the distribution of values in a discrete population.

In such situations, extra care should be taken to ensure that probabilities are computed in an accurate manner.

Page 100: Continuous Random Variables and  Probability Distributions

100

Example 19

IQ in a particular population (as measured by a standard test) is known to be approximately normally distributed with μ = 100 and σ = 15.

What is the probability that a randomly selected individual has an IQ of at least 125?

Letting X = the IQ of a randomly chosen person, we wish P(X ≥ 125).

The temptation here is to standardize X ≥ 125 as in previous examples. However, the IQ population distribution is actually discrete, since IQs are integer-valued.

Page 101: Continuous Random Variables and  Probability Distributions

101

Example 19

So the normal curve is an approximation to a discrete probability histogram, as pictured in Figure 4.24.

The rectangles of the histogram are centered at integers, so IQs of at least 125 correspond to rectangles beginning at 124.5, as shaded in Figure 4.24.

cont’d

Figure 4.24

A normal approximation to a discrete distribution

Page 102: Continuous Random Variables and  Probability Distributions

102

Example 19

Thus we really want the area under the approximating normal curve to the right of 124.5.

Standardizing this value gives P(Z ≥ 1.63) = .0516, whereas standardizing 125 results in P(Z ≥ 1.67) = .0475.

The difference is not great, but the answer .0516 is more accurate. Similarly, P(X = 125) would be approximated by the area between 124.5 and 125.5, since the area under the normal curve above the single value 125 is zero.
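A short check of the continuity correction (added for illustration, assuming scipy is available):

```python
from scipy.stats import norm

mu, sigma = 100, 15

# P(X >= 125) with and without the continuity correction
with_correction = 1 - norm.cdf(124.5, loc=mu, scale=sigma)
without_correction = 1 - norm.cdf(125, loc=mu, scale=sigma)

print(round(with_correction, 4))     # ~0.0512 (text gives .0516 from the rounded z = 1.63)
print(round(without_correction, 4))  # ~0.0478 (text gives .0475 from the rounded z = 1.67)
```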

cont’d

Page 103: Continuous Random Variables and  Probability Distributions

103

Example 19

The correction for discreteness of the underlying distribution in Example 19 is often called a continuity correction.

It is useful in the following application of the normal distribution to the computation of binomial probabilities.

cont’d

Page 104: Continuous Random Variables and  Probability Distributions

104

Approximating the Binomial Distribution

Page 105: Continuous Random Variables and  Probability Distributions

105

Approximating the Binomial Distribution

Recall that the mean value and standard deviation of a binomial random variable X are μ_X = np and σ_X = √(np(1 – p)), respectively.

Page 106: Continuous Random Variables and  Probability Distributions

106

Approximating the Binomial Distribution

Figure 4.25 displays a binomial probability histogram for the binomial distribution with n = 20, p = .6, for which μ = 20(.6) = 12 and σ = √(20(.6)(.4)) = 2.19.

Figure 4.25

Binomial probability histogram for n = 20, p = .6 with normal approximation curve superimposed

Page 107: Continuous Random Variables and  Probability Distributions

107

Approximating the Binomial Distribution

A normal curve with this μ and σ has been superimposed on the probability histogram.

Although the probability histogram is a bit skewed (because p ≠ .5), the normal curve gives a very good approximation, especially in the middle part of the picture.

The area of any rectangle (probability of any particular X value) except those in the extreme tails can be accurately approximated by the corresponding normal curve area.

Page 108: Continuous Random Variables and  Probability Distributions

108

Approximating the Binomial Distribution

For example, P(X = 10) = B(10; 20, .6) – B(9; 20, .6) = .117,

whereas the area under the normal curve between 9.5 and 10.5 is P(–1.14 ≤ Z ≤ –.68) = .1212.

More generally, as long as the binomial probability histogram is not too skewed, binomial probabilities can be well approximated by normal curve areas. It is then customary to say that X has approximately a normal distribution.

Page 109: Continuous Random Variables and  Probability Distributions

109

Approximating the Binomial Distribution

Proposition

Let X be a binomial rv based on n trials with success probability p. Then if the binomial probability histogram is not too skewed, X has approximately a normal distribution with μ = np and σ = √(np(1 – p)).

In particular, for x = a possible value of X,

P(X ≤ x) = B(x; n, p) ≈ (area under the normal curve to the left of x + .5) = Φ((x + .5 – np)/√(np(1 – p)))

Page 110: Continuous Random Variables and  Probability Distributions

110

Approximating the Binomial Distribution

In practice, the approximation is adequate provided that both np ≥ 10 and n(1 – p) ≥ 10, since there is then enough symmetry in the underlying binomial distribution.

A direct proof of this result is quite difficult. In the next chapter we’ll see that it is a consequence of a more general result called the Central Limit Theorem.

In all honesty, this approximation is not so important for probability calculation as it once was.

This is because software can now calculate binomial probabilities exactly for quite large values of n.

Page 111: Continuous Random Variables and  Probability Distributions

111

Example 20

Suppose that 25% of all students at a large public university receive financial aid.

Let X be the number of students in a random sample of size 50 who receive financial aid, so that p = .25. Then μ = 12.5 and σ = 3.06.

Since np = 50(.25) = 12.5 ≥ 10 and n(1 – p) = 37.5 ≥ 10, the approximation can safely be applied.

Page 112: Continuous Random Variables and  Probability Distributions

112

Example 20

The probability that at most 10 students receive aid is

P(X ≤ 10) = B(10; 50, .25) ≈ Φ((10.5 – 12.5)/3.06) = Φ(–.65) = .2578

Similarly, the probability that between 5 and 15 (inclusive) of the selected students receive aid is

P(5 ≤ X ≤ 15) = B(15; 50, .25) – B(4; 50, .25) ≈ Φ((15.5 – 12.5)/3.06) – Φ((4.5 – 12.5)/3.06) = Φ(.98) – Φ(–2.61) = .8365 – .0045 = .8320

cont’d

Page 113: Continuous Random Variables and  Probability Distributions

113

Example 20

The exact probabilities are .2622 and .8348, respectively, so the approximations are quite good.

In the last calculation, the probability P(5 ≤ X ≤ 15) is being approximated by the area under the normal curve between 4.5 and 15.5; the continuity correction is used for both the upper and lower limits.
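To see how close the approximation is, the sketch below (an addition, assuming scipy is available) compares the exact binomial probabilities with the normal approximation using the continuity correction.

```python
from math import sqrt
from scipy.stats import binom, norm

n, p = 50, 0.25
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 12.5 and ~3.06

# P(X <= 10): exact vs. normal approximation with continuity correction
print(round(binom.cdf(10, n, p), 4))             # exact  ~0.2622
print(round(norm.cdf((10.5 - mu) / sigma), 4))   # approx ~0.257

# P(5 <= X <= 15): exact vs. approximation
exact = binom.cdf(15, n, p) - binom.cdf(4, n, p)
approx = norm.cdf((15.5 - mu) / sigma) - norm.cdf((4.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))         # ~0.8348  ~0.832
```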

cont’d