Lesson 5 - Probability Distributions

Probability Distributions Jeremy G. Vicencio 2nd Semester, A.Y. 2010-2011

Transcript of Lesson 5 - Probability Distributions

Page 1: Lesson 5 - Probability Distributions

Probability Distributions

Jeremy G. Vicencio

2nd Semester, A.Y. 2010-2011

Page 2: Lesson 5 - Probability Distributions

Bio 180 (Biostatistics), Lesson 5: Probability Distributions, 3rd Week, January 2011

Probability Distribution

• A device which summarizes the relationship between the values of a random variable and the probabilities of their occurrence.

• It may be expressed in the form of a table, graph, or formula.

Page 3: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

• The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities.

Page 4: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

In an article in the American Journal of Obstetrics and Gynecology, Buitendijk and Bracken state that during the previous 25 years there had been an increasing awareness of the potentially harmful effects of drugs and chemicals on the developing fetus. The authors assessed the use of medication in a population of women who were delivered of infants at a large European hospital between 1980 and 1982, and studied the association of medication use with various maternal characteristics such as alcohol, tobacco, and illegal drug use. Their findings

Page 5: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

suggest that women who engage in risk-taking behavior during pregnancy are also more likely to use medications while pregnant. Table 5.1 shows the prevalence of prescription and nonprescription drug use in pregnancy among the study subjects.

We wish to construct the probability distribution of the discrete variable X = number of prescription and nonprescription drugs used by the study subjects.

Page 6: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

Number of Drugs    Frequency
0                  1425
1                  1351
2                  793
3                  348
4                  156
5                  58
6                  28
7                  15
8                  6
9                  3
10                 1
12                 1
Total              4185

Table 5.1 Prevalence of Prescription and Nonprescription Drug Use in Pregnancy Among Women Delivered of Infants at a Large European Hospital

SOURCE: Simone Buitendijk and Michael B. Bracken, “Medication in Early Pregnancy: Prevalence of Use and Relationship to Maternal Characteristics,” American Journal of Obstetrics and Gynecology, 165(1991), 33-40, as printed in Biostatistics: A Foundation for Analysis in the Health Sciences by Wayne W. Daniel (1995)

Page 7: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

Number of Drugs (x)    P(X = x)
0                      0.3405
1                      0.3228
2                      0.1895
3                      0.0832
4                      0.0373
5                      0.0139
6                      0.0067
7                      0.0036
8                      0.0014
9                      0.0007
10                     0.0002
12                     0.0002
Total                  1.0000

Table 5.2 Probability Distribution of Number of Prescription and Nonprescription Drugs Used in Pregnancy by Women Delivered of Infants at a Large European Hospital
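Each entry of Table 5.2 is simply a relative frequency from Table 5.1: the count for that value divided by the total of 4185 women. A minimal Python sketch, assuming the counts exactly as listed in Table 5.1 (the names `frequencies` and `distribution` are illustrative, not from the source):

```python
# Counts from Table 5.1: number of drugs used -> number of women
frequencies = {0: 1425, 1: 1351, 2: 793, 3: 348, 4: 156, 5: 58,
               6: 28, 7: 15, 8: 6, 9: 3, 10: 1, 12: 1}

total = sum(frequencies.values())  # 4185 women in all

# Relative frequency of each value, used as P(X = x)
distribution = {x: count / total for x, count in frequencies.items()}

for x, p in distribution.items():
    print(f"{x:>2}  {p:.4f}")
```

Printed to four decimals, these values should agree with Table 5.2.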

Page 8: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

Figure 5.1 Graphical representation of the probability distribution shown in Table 5.2 (vertical axis: Probability, 0.00 to 0.40; horizontal axis: x, number of drugs, 0 to 12)

Page 9: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

Two essential properties of the probability distribution of a discrete random variable:

1) 0 ≤ P(X=x) ≤ 1 for every value x

2) ∑P(X=x) = 1, where the sum runs over all possible values of x
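Both properties can be checked mechanically against Table 5.2. This Python sketch types in the rounded table values by hand; the names `table_5_2`, `property_1`, and `property_2` are hypothetical, and `math.isclose` only absorbs the tiny floating-point error in the sum.

```python
import math

# P(X = x) for each value x, as listed in Table 5.2
table_5_2 = {0: 0.3405, 1: 0.3228, 2: 0.1895, 3: 0.0832, 4: 0.0373, 5: 0.0139,
             6: 0.0067, 7: 0.0036, 8: 0.0014, 9: 0.0007, 10: 0.0002, 12: 0.0002}

# Property 1: every probability lies between 0 and 1
property_1 = all(0 <= p <= 1 for p in table_5_2.values())

# Property 2: the probabilities sum to 1; isclose absorbs floating-point error in the sum
property_2 = math.isclose(sum(table_5_2.values()), 1.0, abs_tol=1e-9)

print(property_1, property_2)
```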

Page 10: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

What is the probability that a randomly selected woman will be one who used seven prescription and nonprescription drugs?

Solution: The desired probability is P(X=7). From Table 5.2, the answer is 0.0036.

Page 11: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

What is the probability that a randomly selected woman used either two or three drugs?

Solution: Use the addition rule for mutually exclusive events. Using probability notation and the values from Table 5.2, the answer is

P(2 ∪ 3) = P(2) + P(3) = 0.1895 + 0.0832 = 0.2727
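Both of the last two questions reduce to table lookups. A brief Python sketch, again entering Table 5.2 by hand (variable names are illustrative): a single value is read directly, and probabilities of mutually exclusive values are added.

```python
# P(X = x) for each value x, as listed in Table 5.2
table_5_2 = {0: 0.3405, 1: 0.3228, 2: 0.1895, 3: 0.0832, 4: 0.0373, 5: 0.0139,
             6: 0.0067, 7: 0.0036, 8: 0.0014, 9: 0.0007, 10: 0.0002, 12: 0.0002}

# P(X = 7): a single value is read straight from the table
p_seven = table_5_2[7]

# P(X = 2 or X = 3): the events are mutually exclusive, so their probabilities add
p_two_or_three = table_5_2[2] + table_5_2[3]

print(p_seven, round(p_two_or_three, 4))
```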

Page 12: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

Cumulative probability distribution

The cumulative probability for xi is written as

F(xi) = P(X ≤ xi)

It gives the probability that X is less than or equal to a specified value, xi.
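The cumulative column is a running total of P(X = x), taken in increasing order of x. A small Python sketch using `itertools.accumulate` on the Table 5.2 values (the names `values`, `probs`, and `cumulative` are illustrative):

```python
from itertools import accumulate

# P(X = x) from Table 5.2, in increasing order of x
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
probs = [0.3405, 0.3228, 0.1895, 0.0832, 0.0373, 0.0139,
         0.0067, 0.0036, 0.0014, 0.0007, 0.0002, 0.0002]

# F(x_i) = P(X <= x_i) is the running total of the probabilities up to x_i
cumulative = dict(zip(values, accumulate(probs)))

for x, F in cumulative.items():
    print(f"{x:>2}  {F:.4f}")
```

Cumulating the rounded Table 5.2 probabilities reproduces the cumulative table. Cumulating the raw counts of Table 5.1 instead can differ in the fourth decimal place; for example, 4073/4185 ≈ 0.9732 for x = 4, against 0.9733 from the rounded values.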

Page 13: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

Number of Drugs (x)    Cumulative Probability P(X ≤ x)
0                      0.3405
1                      0.6633
2                      0.8528
3                      0.9360
4                      0.9733
5                      0.9872
6                      0.9939
7                      0.9975
8                      0.9989
9                      0.9996
10                     0.9998
12                     1.0000

Table 5.3 Cumulative Probability Distribution of Number of Prescription and Nonprescription Drugs Used in Pregnancy by Women Delivered of Infants at a Large European Hospital

Page 14: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

Figure 5.2 Cumulative probability distribution of number of prescription and nonprescription drugs used in pregnancy by women delivered of infants at a large European hospital

Page 15: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

What is the probability that a woman picked at random will be one who used three or fewer drugs?

Solution: The probability in question can be found directly in Table 5.3 by reading the cumulative probability opposite x = 3. Therefore,

P(X ≤ 3) = 0.9360

Page 16: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

What is the probability that a woman picked at random will be one who used fewer than 3 drugs?

Solution: Since a woman who used fewer than three drugs used either two, one, or no drugs, the answer is the cumulative probability for 2. That is,

P(X < 3) = P(X ≤ 2) = 0.8528

Page 17: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

What is the probability that a randomly selected woman used six or more drugs?

Solution: Use the concept of complementary probabilities: P(X ≥ 6) + P(X ≤ 5) = 1. Therefore,

P(X ≥ 6) = 1 – P(X ≤ 5) = 1 – 0.9872 = 0.0128

Page 18: Lesson 5 - Probability Distributions

Probability Distribution of a Discrete Random Variable

What is the probability that a randomly selected woman is one who used between two and five drugs inclusive?

Solution: P(X ≤ 5) = 0.9872 is the probability that a woman used between zero and five drugs. To get the probability that she used between two and five drugs, subtract from 0.9872 the probability that she used one or fewer:

P(2 ≤ X ≤ 5) = P(X ≤ 5) – P(X ≤ 1) = 0.9872 – 0.6633 = 0.3239
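The four preceding questions are all arithmetic on the cumulative probabilities. A Python sketch that rebuilds F from the Table 5.2 values and applies each rule (direct lookup, shifting the boundary for a count, complement, and subtraction); the variable names are hypothetical:

```python
from itertools import accumulate

# P(X = x) from Table 5.2, in increasing order of x
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
probs = [0.3405, 0.3228, 0.1895, 0.0832, 0.0373, 0.0139,
         0.0067, 0.0036, 0.0014, 0.0007, 0.0002, 0.0002]

F = dict(zip(values, accumulate(probs)))  # F[x] = P(X <= x), as in Table 5.3

answers = {
    "P(X <= 3)": F[3],              # read directly
    "P(X < 3)": F[2],               # X counts drugs, so X < 3 means X <= 2
    "P(X >= 6)": 1 - F[5],          # complement of X <= 5
    "P(2 <= X <= 5)": F[5] - F[1],  # remove X <= 1 from X <= 5
}

for question, p in answers.items():
    print(f"{question}: {p:.4f}")
```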

Page 19: Lesson 5 - Probability Distributions

Continuous Probability Distribution

Figure 5.3.1 Histogram of the Ages of 169 Subjects Who Participated in a Study of Sparteine and Mephenytoin Oxidation (horizontal axis: Age, in classes 10-19 through 60-69; vertical axis: Frequency, 0 to 70)

Page 20: Lesson 5 - Probability Distributions

Continuous Probability Distribution

Figure 5.3.2 Histogram of the Ages of 169 Subjects Who Participated in a Study of Sparteine and Mephenytoin Oxidation (horizontal axis: Age, in classes 15-19 through 60-64; vertical axis: Frequency, 0 to 45)

Page 21: Lesson 5 - Probability Distributions

Continuous Probability Distribution

Figure 5.3.3 Frequency Polygon of the Ages of 169 Subjects Who Participated in a Study of Sparteine and Mephenytoin Oxidation (horizontal axis: Age, 0 to 70; vertical axis: Frequency, 0 to 45)

Page 22: Lesson 5 - Probability Distributions

Continuous Probability Distribution

Figure 5.3.4 Continuous Probability Distribution of the Ages of Subjects Who Participate in Sparteine and Mephenytoin Oxidation Studies (horizontal axis: Age, 0 to 70; vertical axis: Frequency, 0 to 45)

Page 23: Lesson 5 - Probability Distributions

Continuous Probability Distribution

Figure 5.4.1 A histogram resulting from a large number of values and small class intervals.

Source: Daniel (1995)

Page 24: Lesson 5 - Probability Distributions

Continuous Probability Distribution

• In general, as the number of observations, n, approaches infinity, and the width of the class intervals approaches zero, the frequency polygon approaches a smooth curve.
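This limiting behavior can be illustrated by simulation. The sketch below is hypothetical: it draws values from a standard normal population (an assumption made only for illustration, not the age data above), scales each histogram bar so the bars have total area close to 1, and reports the largest vertical gap between the bar heights and the smooth curve. It uses Python's `random.gauss` and `statistics.NormalDist` (Python 3.8+); `max_gap` is an illustrative helper.

```python
import math
import random
from statistics import NormalDist

# Hypothetical population: a standard normal curve plays the role of the smooth limit
curve = NormalDist(mu=0, sigma=1)
random.seed(1)  # fixed seed so the run is repeatable


def max_gap(n, width):
    """Largest vertical gap between a density-scaled histogram of n draws and the curve.

    Bars cover -3 to 3; each bar's height is count / (n * width), so the bars'
    total area is the fraction of draws falling in that range (close to 1).
    """
    bins = round(6 / width)
    counts = [0] * bins
    for _ in range(n):
        k = math.floor((random.gauss(0, 1) + 3) / width)
        if 0 <= k < bins:
            counts[k] += 1
    gap = 0.0
    for k, count in enumerate(counts):
        midpoint = -3 + (k + 0.5) * width
        gap = max(gap, abs(count / (n * width) - curve.pdf(midpoint)))
    return gap


coarse = max_gap(n=100, width=1.0)     # few observations, wide classes
fine = max_gap(n=200_000, width=0.1)   # many observations, narrow classes
print(f"coarse: {coarse:.3f}  fine: {fine:.3f}")
```

With many observations and narrow classes the gap should be much smaller than with few observations and wide classes; the exact numbers depend on the random seed.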

Page 25: Lesson 5 - Probability Distributions

Continuous Probability Distribution

• The total area under the curve is equal to one.

Figure 5.4.2 Graphical representation of a continuous distribution

Source: Daniel (1995)

Page 26: Lesson 5 - Probability Distributions

Continuous Probability Distribution

• The relative frequency of occurrence of values between any two points on the x-axis is equal to the total area bounded by the curve, the x-axis, and the perpendicular lines erected at the two points on the x-axis.

Figure 5.4.3 Graph of a continuous distribution showing area between a and b

Source: Daniel (1995)

Page 27: Lesson 5 - Probability Distributions

Continuous Probability Distribution

• What is the probability of any specific value of the random variable?

Page 28: Lesson 5 - Probability Distributions

Continuous Probability Distribution

Finding area under a smooth curve

• Integral Calculus – to find the area under a smooth curve between any two points a and b, the density function is integrated from a to b.

• Density Function – a formula used to represent the distribution of a continuous random variable.

Page 29: Lesson 5 - Probability Distributions

Continuous Probability Distribution

Definition:

A nonnegative function f(x) is called a probability distribution (sometimes called a probability density function) of the continuous random variable X if the total area bounded by its curve and the x-axis is equal to 1 and if the subarea bounded by the curve, the x-axis, and perpendiculars erected at any two points a and b gives the probability that X is between the points a and b.
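The definition can be made concrete with a simple density. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1] (not from the source) and approximates areas with the trapezoidal rule, a numerical stand-in for integrating from a to b. The total area comes out as 1, the area between a = 0.25 and b = 0.5 is P(0.25 ≤ X ≤ 0.5) = 0.5² − 0.25² = 0.1875, and the area over a single point is 0.

```python
def f(x):
    """Hypothetical density for illustration: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def area(func, a, b, steps=10_000):
    """Approximate the area under func between a and b with the trapezoidal rule."""
    h = (b - a) / steps
    inner = sum(func(a + k * h) for k in range(1, steps))
    return h * (func(a) / 2 + inner + func(b) / 2)


total = area(f, 0, 1)        # total area under the curve
p_ab = area(f, 0.25, 0.5)    # P(0.25 <= X <= 0.5)
p_point = area(f, 0.5, 0.5)  # width zero, so no area: P(X = 0.5) = 0
print(round(total, 6), round(p_ab, 6), p_point)
```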

Page 30: Lesson 5 - Probability Distributions

Normal Distribution

Figure 5.6 Graph of a normal distribution

Source: Daniel (1995)

Page 31: Lesson 5 - Probability Distributions

Normal Distribution

Characteristics of the Normal Distribution:

1. It is symmetrical about its mean, µ.

2. The mean, median, and mode are all equal.

3. The total area under the curve above the x-axis is one square unit.

4. If we erect perpendiculars a distance of 1 SD from the mean in both directions, the area enclosed by these perpendiculars, the x-axis,

Page 32: Lesson 5 - Probability Distributions

Normal Distribution

and the curve will be approximately 68% of the total area (for 2 SD, approximately 95%; for 3 SD, approximately 99.7%).
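These areas can be computed rather than memorized. A Python sketch using `statistics.NormalDist` (Python 3.8+) for the standard normal curve; because any normal distribution can be standardized, the same areas hold for every µ and σ. The names `z` and `areas` are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

# Area between the perpendiculars at -k and +k standard deviations
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")
```

To four decimals the areas are about 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% quoted above.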

Figure 5.7 Subdivision of the Areas Under the Normal Curve

Page 33: Lesson 5 - Probability Distributions

Normal Distribution

5. The normal distribution is completely determined by the parameters µ and σ. In other words, a different normal distribution is specified for each different value of µ and σ. Different values of µ shift the graph of the distribution along the x-axis while different values of σ determine the degree of flatness or peakedness of the graph of the distribution.

Page 34: Lesson 5 - Probability Distributions

Normal Distribution

Figure 5.8.1 Three normal distributions with different means but the same amount of variability

Figure 5.8.2 Three normal distributions with different standard deviations but the same mean

Source: Daniel (1995)

Page 35: Lesson 5 - Probability Distributions

Normal Distribution

Standard Normal Distribution (Unit Normal Distribution)

• Has a mean of 0 and a standard deviation of 1

Figure 5.9 Graph of the Standard Normal Distribution

Source: Daniel (1995)

Page 36: Lesson 5 - Probability Distributions

Normal Distribution

• To find the area under the standard normal curve between z0 and z1, we need to evaluate the following integral:

$$\int_{z_0}^{z_1} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$$
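The integral of the standard normal density e^(−z²/2)/√(2π) has no elementary antiderivative, which is why tables or software are used. As a check, the Python sketch below approximates it with the trapezoidal rule and compares the result with the closed form Φ(z) = [1 + erf(z/√2)]/2 via `math.erf`. The function names and the interval z0 = 0, z1 = 2 are illustrative choices.

```python
import math


def density(z):
    """Standard normal density: exp(-z**2 / 2) / sqrt(2 * pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_numeric(z0, z1, steps=10_000):
    """Trapezoidal-rule approximation of the area under the density from z0 to z1."""
    h = (z1 - z0) / steps
    inner = sum(density(z0 + k * h) for k in range(1, steps))
    return h * (density(z0) / 2 + inner + density(z1) / 2)


def phi(z):
    """Area from -infinity to z, written with the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z0, z1 = 0.0, 2.0
print(f"numeric: {area_numeric(z0, z1):.5f}  closed form: {phi(z1) - phi(z0):.5f}")
```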

Page 37: Lesson 5 - Probability Distributions

Normal Distribution

• Given the standard normal distribution, find the area under the curve, above the z-axis between z = -∞ and z = 2 (0.9772)

Figure 5.10.1 Graph of the standard normal distribution showing area between z = - ∞ and z = 2

Source: Daniel (1995)

Page 38: Lesson 5 - Probability Distributions

Normal Distribution

The area can be interpreted in several ways:

• The probability that a z picked at random from a population of z’s will have a value between -∞ and 2.

• The relative frequency of occurrence (or proportion) of values of z between -∞ and 2, or we may say that 97.72% of the z’s have a value between -∞ and 2.

Page 39: Lesson 5 - Probability Distributions

Normal Distribution

• Instead of looking up the areas in the table, you can use Excel’s NORMSDIST function.

=NORMSDIST(z)
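Outside Excel, Python's standard library gives the same cumulative area. A sketch using `statistics.NormalDist` (Python 3.8+); this is an equivalent calculation, not a description of how Excel computes NORMSDIST, and `z_dist` is an illustrative name.

```python
from statistics import NormalDist

z_dist = NormalDist()  # defaults: mu=0.0, sigma=1.0, the standard normal

# Area from -infinity up to z: the quantity =NORMSDIST(z) returns in Excel
print(round(z_dist.cdf(2), 4))
```

This should give 0.9772, the area found for z = 2 earlier in the lesson.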

Page 40: Lesson 5 - Probability Distributions

Normal Distribution

• What is the probability that a z picked at random from the population of z’s will have a value between -2.55 and +2.55?

Figure 5.10.2 Standard normal curve showing P(-2.55 < z < 2.55)

Source: Daniel (1995)

Page 41: Lesson 5 - Probability Distributions

Normal Distribution

• What proportion of z-values are between -2.74 and 1.53?

Figure 5.10.3 Standard normal curve showing proportion of z values between z = -2.74 and z = 1.53

Source: Daniel (1995)

Page 42: Lesson 5 - Probability Distributions

Normal Distribution

• Given the standard normal distribution, find P(z ≥ 2.71)

Figure 5.10.4 Standard normal curve showing P(z ≥ 2.71)

Source: Daniel (1995)

Page 43: Lesson 5 - Probability Distributions

Normal Distribution

• Given the standard normal distribution, find P(0.84 ≤ z ≤ 2.45)

Figure 5.10.5 Standard normal curve showing P(0.84 ≤ z ≤ 2.45)

Source: Daniel (1995)
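The four standard normal questions above can all be answered with differences and complements of cumulative areas. A Python sketch using `statistics.NormalDist` (Python 3.8+); the names `Z` and `p_a` through `p_d` are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

p_a = Z.cdf(2.55) - Z.cdf(-2.55)  # P(-2.55 < z < 2.55)
p_b = Z.cdf(1.53) - Z.cdf(-2.74)  # proportion of z between -2.74 and 1.53
p_c = 1 - Z.cdf(2.71)             # P(z >= 2.71), via the complement
p_d = Z.cdf(2.45) - Z.cdf(0.84)   # P(0.84 <= z <= 2.45)

for p in (p_a, p_b, p_c, p_d):
    print(f"{p:.4f}")
```

To four decimals these come out to about 0.9892, 0.9339, 0.0034, and 0.1933; table lookups should agree up to rounding.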

Page 44: Lesson 5 - Probability Distributions

Normal Distribution

As part of a study of Alzheimer’s disease (AD), Dusheiko reported data that are compatible with the hypothesis that brain weights of victims of the disease are normally distributed. From the reported data, we may compute a mean of 1076.80 grams and an SD of 105.76 grams. If we assume that these results are applicable to all victims of Alzheimer’s disease, find the probability that a randomly selected victim of the disease will have a brain that weighs less than 800 grams.

Page 45: Lesson 5 - Probability Distributions

Normal Distribution

Figure 5.11.1 Normal distribution to approximate distribution of brain weights of patients with AD (mean and SD estimated)

Source: Daniel (1995)

Page 46: Lesson 5 - Probability Distributions

Normal Distribution

Figure 5.11.2 Normal distribution of brain weights (x) and the standard normal distribution (z)

Source: Daniel (1995)

Page 47: Lesson 5 - Probability Distributions

Normal Distribution

$$z = \frac{x - \mu}{\sigma}$$

• This formula transforms any value of x in any normal distribution (with mean µ and standard deviation σ) to the corresponding value of z in the standard normal distribution.
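For the Alzheimer's disease example, standardizing with z = (x − µ)/σ and then taking the cumulative area gives the probability of a brain weight under 800 grams. A Python sketch; `standardize` is a hypothetical helper that performs the same arithmetic as Excel's STANDARDIZE, and `statistics.NormalDist().cdf` plays the role of NORMSDIST.

```python
from statistics import NormalDist


def standardize(x, mean, sd):
    """Hypothetical helper: z = (x - mean) / sd."""
    return (x - mean) / sd


mean, sd = 1076.80, 105.76          # brain weights (grams) from the AD example
z = standardize(800, mean, sd)
p_below_800 = NormalDist().cdf(z)   # area to the left of z under the standard normal curve

print(round(z, 2), round(p_below_800, 4))
```

Here z ≈ −2.62, and the probability is about 0.0044.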

Page 48: Lesson 5 - Probability Distributions

Normal Distribution

Page 49: Lesson 5 - Probability Distributions

Normal Distribution

• Instead of using the formula, you may use Excel’s STANDARDIZE function

=STANDARDIZE(x, mean, standard_dev)

• Then, apply the NORMSDIST function, or…

Page 50: Lesson 5 - Probability Distributions

Normal Distribution

• You may use the NORMDIST function

=NORMDIST(x, mean, standard_dev, cumulative)

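• cumulative – if FALSE, returns the value of the probability density function at x (the height of the curve, not a probability; for a continuous variable the probability of any single value is zero); if TRUE, returns the probability that the value will be less than or equal to x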
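The two settings of `cumulative` return different kinds of quantity: TRUE gives an area (a probability), while FALSE gives the height of the density curve at x. A Python sketch of the two analogues with `statistics.NormalDist.cdf` and `.pdf` (Python 3.8+); the parameters (mean 70, SD 3, and a narrow curve with SD 0.1) are illustrative assumptions.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)  # illustrative parameters
narrow = NormalDist(mu=0, sigma=0.1)  # a tall, narrow curve for contrast

# cumulative=TRUE analogue: P(X <= x), an area between 0 and 1
print(heights.cdf(70))               # half the area lies below the mean

# cumulative=FALSE analogue: the height of the density curve at x
print(round(heights.pdf(70), 4))

# A density height can exceed 1, so it cannot be a probability
print(round(narrow.pdf(0), 2))
```

The narrow curve's height at its mean is about 3.99, which shows that a density value is not a probability.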

Page 51: Lesson 5 - Probability Distributions

Normal Distribution

• Suppose it is known that the heights of a certain population of individuals are approximately normally distributed with a mean of 70 inches and a standard deviation of 3 inches. What is the probability that a person picked at random from this group will be between 65 and 74 inches tall?
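The heights question needs two standardizations and a subtraction of cumulative areas. A Python sketch using `statistics.NormalDist` (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)  # inches

p_65_to_74 = heights.cdf(74) - heights.cdf(65)
z_low, z_high = (65 - 70) / 3, (74 - 70) / 3  # the standardized limits

print(round(z_low, 2), round(z_high, 2), round(p_65_to_74, 4))
```

This gives about 0.8610. Rounding z to two decimals first (−1.67 and 1.33), as a printed table requires, gives 0.9082 − 0.0475 = 0.8607.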

Page 52: Lesson 5 - Probability Distributions

Normal Distribution

Page 53: Lesson 5 - Probability Distributions

Normal Distribution

• In a population of 10,000 people described in the previous example, how many would you expect to be 6 feet 5 inches tall or taller?

Page 54: Lesson 5 - Probability Distributions

Normal Distribution

• In a population of 10,000 people described in the previous example, how many would you expect to be 6 feet 5 inches tall or taller?

Out of 10,000 people, we would expect 10,000 × 0.0099 = 99 to be 6 feet 5 inches (77 inches) tall or taller.
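The expected count is the population size times the upper-tail probability. A Python sketch comparing the unrounded z with the table-rounded z = 2.33, using `statistics.NormalDist` (Python 3.8+); the names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal
mean, sd, cutoff, population = 70, 3, 77, 10_000  # 6 ft 5 in = 77 in

z = (cutoff - mean) / sd               # 2.333...
p_exact_z = 1 - Z.cdf(z)               # tail area using the unrounded z
p_table_z = 1 - Z.cdf(round(z, 2))     # tail area using z rounded to 2.33, as in a table

print(round(population * p_exact_z), round(population * p_table_z))
```

With z rounded to 2.33 the tail area is about 0.0099, giving 99 people; with the unrounded z = 2.333… it is about 0.0098, or roughly 98 people. The difference comes only from rounding z.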