Chapter 3 Distributions. Continuous random variables Are numerical variables whose values fall...


Page 1:

Chapter 3 Distributions

Page 2:

Continuous random variables

• Are numerical variables whose values fall within a range or interval

• Are measurements

• Can be described by density curves

Page 3:

Density curves

• Are always on or above the horizontal axis

• Have an area exactly equal to one underneath them

• Often describe an overall distribution

• Describe what proportions of the observations fall within each range of values

Page 4:

Unusual density curves

• Can be any shape

• Are generic continuous distributions

• Probabilities are calculated by finding the area under the curve

Page 5:

[Figure: density curve; horizontal axis marked 1 to 5, vertical axis marked .25 and .5]

How do you find the area of a triangle?

P(X < 2) = .5(2)(.25) = .25

Page 6:

[Figure: density curve; horizontal axis marked 1 to 5, vertical axis marked .25 and .5]

P(X < 2) = .25

What is the area of a line segment?

P(X = 2) = 0

Page 7:

In continuous distributions, P(X < 2) & P(X ≤ 2) are the same answer.

Hmmmm…

Is this different from discrete distributions?

Page 8:

[Figure: density curve; horizontal axis marked 1 to 5, vertical axis marked .25 and .5]

Shape is a trapezoid. How long are the bases?

Area = (b1 + b2)h/2

P(X > 3) = .5(.375+.5)(1) = .4375

with b1 = .5, b2 = .375, h = 1

P(1 < X < 3) = .5(.125+.375)(2) = .5
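The trapezoid calculations can be checked with a short sketch. The density f(x) = x/8 on [0, 4] is an assumption inferred from the base lengths quoted on the slides (.125 at 1, .25 at 2, .375 at 3, .5 at 4). The original figure is not available, and the function names are illustrative only.

```python
def density(x):
    """Assumed density, inferred from the slide's base lengths: f(x) = x/8 on [0, 4]."""
    return x / 8 if 0 <= x <= 4 else 0.0


def trapezoid_area(lo, hi):
    """Area under a straight-line density between lo and hi: (b1 + b2) * h / 2."""
    b1, b2 = density(lo), density(hi)
    return (b1 + b2) * (hi - lo) / 2


total_area = trapezoid_area(0, 4)      # a density curve must enclose area 1
p_less_2 = trapezoid_area(0, 2)        # triangle: one base has length 0
p_greater_3 = trapezoid_area(3, 4)
p_1_to_3 = trapezoid_area(1, 3)

print(total_area, p_less_2, p_greater_3, p_1_to_3)
```

A triangle is handled as a trapezoid whose first base has length 0, so one formula covers every probability on these slides.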

Page 9:

[Figure: density curve; horizontal axis marked 1 to 4, vertical axis marked 0.25 and 0.50]

P(X > 1) = .75

Triangle: .5(2)(.25) = .25

Rectangle: (2)(.25) = .5

Page 10:

[Figure: density curve; horizontal axis marked 1 to 4, vertical axis marked 0.25 and 0.50]

P(0.5 < X < 1.5) = .28125

Trapezoid: .5(.25+.375)(.5) = .15625

Rectangle: (.5)(.25) = .125

Page 11:

Special Continuous Distributions

Page 12:

Uniform Distribution

• Is a continuous distribution that is evenly (or uniformly) distributed

• Has a density curve in the shape of a rectangle

• Probabilities are calculated by finding the area under the curve

μ = (a + b)/2

σ² = (b - a)²/12

Where: a & b are the endpoints of the uniform distribution

How do you find the area of a rectangle?

Page 13:

The Citrus Sugar Company packs sugar in bags labeled 5 pounds. However, the packaging isn’t perfect and the actual weights are uniformly distributed with a mean of 4.98 pounds and a range of .12 pounds.

a) Construct the uniform distribution.

[Figure: rectangle over the horizontal axis from 4.92 to 5.04, centered at 4.98, with height 1/.12]

What shape does a uniform distribution have?

How long is this rectangle?

What is the height of this rectangle? 1/.12

Page 14:

• What is the probability that a randomly selected bag will weigh more than 4.97 pounds?

[Figure: rectangle from 4.92 to 5.04, centered at 4.98, with height 1/.12]

What is the length of the shaded region?

P(X > 4.97) = .07(1/.12) = .5833

Page 15:

• Find the probability that a randomly selected bag weighs between 4.93 and 5.03 pounds.

[Figure: rectangle from 4.92 to 5.04, centered at 4.98, with height 1/.12]

What is the length of the shaded region?

P(4.93 < X < 5.03) = .1(1/.12) = .8333

Page 16:

The time it takes for students to drive to school is evenly distributed with a minimum of 5 minutes and a range of 35 minutes.

a) Draw the distribution

[Figure: rectangle starting at 5 on the horizontal axis]

Where should the rectangle end? 40

What is the height of the rectangle? 1/35

Page 17:

b) What is the probability that it takes less than 20 minutes to drive to school?

[Figure: rectangle from 5 to 40 with height 1/35]

P(X < 20) = (15)(1/35) = .4286

Page 18:

c) What are the mean and standard deviation of this distribution?

μ = (5 + 40)/2 = 22.5

σ² = (40 - 5)²/12 = 102.083

σ = 10.104
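The uniform-distribution calculations for the sugar bags and the drive times can be reproduced with a small sketch. The helper names `uniform_prob` and `uniform_mean_sd` are hypothetical, chosen only for this illustration.

```python
import math


def uniform_prob(a, b, lo, hi):
    """P(lo < X < hi) for X uniform on [a, b]: shaded length times height 1/(b - a)."""
    lo, hi = max(lo, a), min(hi, b)      # clip the interval to the rectangle
    return max(hi - lo, 0.0) / (b - a)


def uniform_mean_sd(a, b):
    """Mean (a + b)/2 and standard deviation sqrt((b - a)^2 / 12)."""
    return (a + b) / 2, math.sqrt((b - a) ** 2 / 12)


# Sugar bags: uniform on [4.92, 5.04]
sugar_over_497 = uniform_prob(4.92, 5.04, 4.97, 5.04)
sugar_between = uniform_prob(4.92, 5.04, 4.93, 5.03)

# Drive times: uniform on [5, 40]
drive_under_20 = uniform_prob(5, 40, 5, 20)
drive_mean, drive_sd = uniform_mean_sd(5, 40)

print(round(sugar_over_497, 4), round(sugar_between, 4), round(drive_under_20, 4))
print(drive_mean, round(drive_sd, 3))
```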

Page 19:

Density Curves

A density curve is similar to a histogram, but there are several important distinctions.

1. Obviously, a smooth curve is used to represent data rather than bars. However, a density curve describes the proportions of the observations that fall in each range rather than the actual number of observations.

2. The scale should be adjusted so that the total area under the curve is exactly 1. This represents the proportion 1 (or 100%).

Page 20:

Density Curves

3. While a histogram represents actual data (i.e., a sample set), a density curve represents an idealized sample or population distribution. (describes the proportion of the observations)

4. Always on or above the horizontal axis

5. We will still utilize mu for mean and sigma for standard deviation.

Page 21:

Density Curves: Mean & Median

Three points that have been previously made are especially relevant to density curves.

1. The median is the "equal areas" point. Likewise, the quartiles can be found by dividing the area under the curve into 4 equal parts.

2. The mean of the data is the "balancing" point.

3. The mean and median are the same for a symmetric density curve.

Page 22:

Shapes of Density Curves

• We have mostly discussed right skewed, left skewed, and roughly symmetric distributions that look like this:

Page 23:

Bimodal Distributions

We could have a bi-modal distribution. For instance, think of counting the number of tires owned by a two-person family. Most two-person families probably have 1 or 2 vehicles, and therefore own 4 or 8 tires. Some, however, have a motorcycle, or maybe more than 2 cars. Yet, the distribution will most likely have a “hump” at 4 and at 8, making it “bi-modal.”

Page 24:

Uniform Distributions

We could have a uniform distribution. Consider the number of cans in all six packs. Each pack uniformly has 6 cans. Or, think of repeatedly drawing a card from a complete deck. One-fourth of the cards should be hearts, one-fourth of the cards should be diamonds, etc.

Page 25:

Other Distributions

Many other distributions exist, and some do not clearly fall under a certain label. Frequently these are the most interesting, and we will discuss many of them.

#1 RULE – ALWAYS MAKE A PICTURE

It is the only way to see what is really going on!

Page 26:

Normal Distributions

• Symmetrical bell-shaped (unimodal) density curve

• Above the horizontal axis

• N(μ, σ)

• The transition points occur at μ ± σ

• Probability is calculated by finding the area under the curve

• As σ increases, the curve flattens & spreads out

• As σ decreases, the curve gets taller and thinner

How is this done mathematically?

Page 27:

Normal Curves

• Curves that are symmetric, single-peaked, and bell-shaped are often called normal curves and describe normal distributions.

• All normal distributions have the same overall shape. They may be "taller" or more spread out, but the idea is the same.

Page 28:

What does it look like?

Page 29:

Normal Curves: μ and σ

• The "control factors" are the mean μ and the standard deviation σ.

• Changing only μ will move the curve along the horizontal axis.

• The standard deviation σ controls the spread of the distribution. Remember that a large σ implies that the data is spread out.

Page 30:

Finding μ and σ

• You can locate the mean μ by finding the middle of the distribution. Because it is symmetric, the mean is at the peak.

• The standard deviation σ can be found by locating the points where the graph changes curvature (inflection points). These points are located a distance σ from the mean.

Page 31:

[Figure: two normal curves labeled A and B]

Do these two normal curves have the same mean? If so, what is it? YES, 6

Which normal curve has a standard deviation of 3? B

Which normal curve has a standard deviation of 1? A

Page 32:

The 68-95-99.7 (Empirical) Rule

In a NORMAL DISTRIBUTION with mean μ and standard deviation σ:

• About 68% of the observations are within σ of the mean μ.

• About 95% of the observations are within 2σ of the mean μ.

• About 99.7% of the observations are within 3σ of the mean μ.
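The three percentages in the rule are rounded areas under the standard normal curve. As a quick check, this sketch uses Python's standard-library `statistics.NormalDist` (a tool choice that is not part of the original slides) to compute the area within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and +k standard deviations for k = 1, 2, 3
within = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} sd of the mean: {area:.4f}")
```

The computed areas round to 68%, 95%, and 99.7%, which is where the rule gets its name.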

Page 33:

The 68-95-99.7 Rule

Page 34:

Why Use the Normal Distribution???

1. They occur frequently in large data sets (all SAT scores), repeated measurements of the same quantity, and in biological populations (lengths of roaches).

2. They are often good approximations to chance outcomes (like coin flipping).

3. We can apply things we learn in studying normal distributions to other distributions.

Page 35:

Heights of Young Women

• The distribution of heights of young women aged 18 to 24 is approximately normally distributed with mean = 64.5 inches and standard deviation = 2.5 inches.

Page 36:

The 68-95-99.7 Rule

Page 37:

Use the previous chart...

• Where do the middle 95% of heights fall?

• What percent of the heights are above 69.5 inches?

• A height of 62 inches is what percentile?

• What percent of the heights are between 62 and 67 inches?

• What percent of heights are less than 57 in.?
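The questions above are meant to be answered from the 68-95-99.7 chart. The sketch below computes the corresponding exact normal areas for comparison, assuming the N(64.5, 2.5) model for heights stated earlier. The use of `statistics.NormalDist` is an addition for illustration, not the slides' method.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)  # heights of young women, in inches

# Middle 95%: within 2 standard deviations of the mean
middle_95 = (heights.mean - 2 * heights.stdev, heights.mean + 2 * heights.stdev)

above_69_5 = 1 - heights.cdf(69.5)                  # rule of thumb: 2.5%
percentile_62 = heights.cdf(62)                     # rule of thumb: 16th percentile
between_62_67 = heights.cdf(67) - heights.cdf(62)   # rule of thumb: 68%
below_57 = heights.cdf(57)                          # rule of thumb: 0.15%

print(middle_95)
print(f"{above_69_5:.4f} {percentile_62:.4f} {between_62_67:.4f} {below_57:.5f}")
```

The exact areas differ slightly from the chart answers because the rule's percentages are rounded.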

Page 38:

Example

• Suppose, on average, it takes you 20 minutes to drive to school, with a standard deviation of 2 minutes. Suppose a normal model is appropriate for the distribution of driving times.

– How often will you arrive at school in less than 20 minutes?

– How often will it take you more than 24 minutes?

Page 39:

Suppose that the height of male students at BHS is normally distributed with a mean of 71 inches and standard deviation of 2.5 inches. What is the probability that the height of a randomly selected male student is more than 73.5 inches?

[Figure: normal curve centered at 71 with the middle 68% marked]

73.5 is one standard deviation above the mean, and 68% of heights fall within one standard deviation of the mean.

1 - .68 = .32, and half of that area lies in the upper tail.

P(X > 73.5) = .32/2 = 0.16

Page 40:

Suppose you take the SAT test and the ACT test. Not using the chart they provide, can you directly compare your SAT Math score to your ACT math score? Why or why not?

We need to standardize these scores so that we can compare them.

Page 41:

Standard Normal Density Curves

Always has μ = 0 & σ = 1

To standardize:

z = (x - μ)/σ

Must have this memorized!

Page 42:

Let’s explore . . .

Suppose the mean and standard deviation of a distribution are μ = 50 & σ = 5.

If the x-value is 55, what is the z-score? 1

If the x-value is 45, what is the z-score? -1

If the x-value is 60, what is the z-score? 2

So what does the z-score tell you?

Page 43:

What do these z scores mean?

-2.3: 2.3 standard deviations below the mean

1.8: 1.8 standard deviations above the mean

6.1: 6.1 standard deviations above the mean

-4.3: 4.3 standard deviations below the mean

Page 44:

Jonathan wants to work at Utopia Landfill. He must take a test to see if he is qualified for the job. The test has a normal distribution with μ = 45 and σ = 3.6. In order to qualify for the job, a person cannot score lower than 2.5 standard deviations (z-score) below the mean. Jonathan scores 35 on this test. Does he get the job?

z = (35 - 45)/3.6 = -2.78

No, he scored 2.78 SD below the mean.

Page 45:

Sally is taking two different math achievement tests with different means and standard deviations. The mean score on test A was 56 with a standard deviation of 3.5, while the mean score on test B was 65 with a standard deviation of 2.8. Sally scored a 62 on test A and a 69 on test B. On which test did Sally score better?

She did better on test A (z = 1.71 on test A versus z = 1.43 on test B).
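The standardizing formula z = (x - mean)/sd is all that is needed for the Jonathan and Sally problems. A minimal sketch follows, with `z_score` as an illustrative helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Jonathan: needs a z-score of at least -2.5 to qualify
jonathan = z_score(35, 45, 3.6)
jonathan_qualifies = jonathan >= -2.5

# Sally: compare the two tests on the common z-score scale
sally_a = z_score(62, 56, 3.5)
sally_b = z_score(69, 65, 2.8)

print(round(jonathan, 2), jonathan_qualifies)
print(round(sally_a, 2), round(sally_b, 2))
```

Sally's raw score is higher on test B, but her z-score is higher on test A, which is why standardizing changes the answer.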

Page 46:

Strategies for finding probabilities or proportions in normal distributions

1. State the probability statement

2. Draw a picture

3. Calculate the z-score

4. Look up the probability (proportion) in the table

Page 47:

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last less than 220 hours?

Write the probability statement: P(X < 220)

Draw & shade the curve

Calculate z-score: z = (220 - 200)/15 = 1.33

Look up z-score in table: P(X < 220) = .9082

Page 48:

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last more than 220 hours?

z = (220 - 200)/15 = 1.33

P(X > 220) = 1 - .9082 = .0918

Page 49:

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. How long must a battery last to be in the top 5%?

P(X > ?) = .05

[Figure: normal curve with area .95 to the left of x and .05 to the right]

Look up 0.95 in the table to find the z-score: z = 1.645

1.645 = (x - 200)/15

x = 224.675

Page 50:

The heights of the female students at PWSH are normally distributed with a mean of 65 inches. What is the standard deviation of this distribution if 18.5% of the female students are shorter than 63 inches?

P(X < 63) = .185

What is the z-score for the 63? -0.9

-0.9 = (63 - 65)/σ

σ = 2.22
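The same "work backwards from the area" idea can be sketched in code: find the z-score with area .185 to its left, then solve z = (63 - 65)/σ for σ. This uses `statistics.NormalDist.inv_cdf` in place of the printed table, so the z-score is not rounded to -0.9 and the result differs slightly from 2.22.

```python
from statistics import NormalDist

area_below_63 = 0.185

# z-score with 18.5% of the standard normal area to its left (about -0.9)
z = NormalDist().inv_cdf(area_below_63)

# Solve z = (63 - 65) / sigma for sigma
sigma = (63 - 65) / z

print(round(z, 3), round(sigma, 2))
```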

Page 51:

Will my calculator do any of this normal stuff?

• Normalpdf – use for graphing ONLY

• Normalcdf – will find the probability (area) from a lower bound to an upper bound

• Invnorm (inverse normal) – will find the z-score (or x-value, if a mean and standard deviation are entered) for a given probability to its left

Page 52:

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last less than 220 hours?

N(200,15)

P(X < 220) =

Normalcdf(-∞,220,200,15) = .9088

(The table value is .9082 because the z-score is rounded to 1.33 before the lookup.)

Page 53:

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last more than 220 hours?

N(200,15)

P(X > 220) =

Normalcdf(220,∞,200,15) = .0912

(The table value is .0918 because the z-score is rounded to 1.33 before the lookup.)

Page 54:

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. How long must a battery last to be in the top 5%?

P(X > ?) = .05

[Figure: normal curve with area .95 to the left of x and .05 to the right]

Invnorm(.95,200,15) = 224.67

(Using the table value z = 1.645 gives 224.675.)
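For readers without the calculator, the two commands can be approximated in Python. The functions `normalcdf` and `invnorm` below are hypothetical stand-ins built on the standard library's `statistics.NormalDist`. They are not the calculator's own implementation, but their argument order mirrors the calls used in the battery examples.

```python
from statistics import NormalDist

INF = float("inf")


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def invnorm(area, mean=0.0, sd=1.0):
    """Value with the given area to its left under the N(mean, sd) curve."""
    return NormalDist(mean, sd).inv_cdf(area)


p_less_220 = normalcdf(-INF, 220, 200, 15)
p_more_220 = normalcdf(220, INF, 200, 15)
top_5_cutoff = invnorm(0.95, 200, 15)

print(round(p_less_220, 4), round(p_more_220, 4), round(top_5_cutoff, 2))
```

Because no z-score is rounded along the way, these results can differ in the third or fourth decimal place from answers found with a printed table.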

Page 55:

The heights of female teachers at PWSH are normally distributed with mean of 65.5 inches and standard deviation of 2.25 inches. The heights of male teachers are normally distributed with mean of 70 inches and standard deviation of 2.5 inches.

• Describe the distribution of the differences in heights (male – female) of the teachers.

Normal distribution with μ = 70 - 65.5 = 4.5 & σ = √(2.25² + 2.5²) = 3.3634 (assuming the two heights are independent)

Page 56:

• What is the probability that a randomly selected male teacher is shorter than a randomly selected female teacher?

[Figure: normal curve of the differences (male – female), centered at 4.5]

P(X < 0) =

Normalcdf(-∞,0,4.5,3.3634) = .0905

(Rounding the z-score to -1.34 and using the table gives .0901.)
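The difference calculation can be sketched as follows. It assumes the male and female heights are independent, so the variances add. That assumption is implied by the slide's σ = 3.3634 but is not stated there.

```python
import math
from statistics import NormalDist

female = NormalDist(mu=65.5, sigma=2.25)
male = NormalDist(mu=70, sigma=2.5)

# Difference (male - female): means subtract, variances add (independence assumed)
diff_mean = male.mean - female.mean
diff_sd = math.sqrt(male.stdev ** 2 + female.stdev ** 2)
difference = NormalDist(diff_mean, diff_sd)

# The male teacher is shorter when the difference is negative
p_male_shorter = difference.cdf(0)

print(diff_mean, round(diff_sd, 4), round(p_male_shorter, 4))
```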

Page 57:

Ways to Assess Normality

• Use graphs (dotplots, boxplots, or histograms)

• Normal probability (quantile) plot

Page 58:

Normal Probability (Quantile) plots

• The observation (x) is plotted against known normal z-scores

• If the points on the quantile plot lie close to a straight line, then the data are approximately normally distributed

• Systematic deviations from a straight line indicate nonnormal data

• Points far away from the overall pattern indicate outliers

• Stacks of points (repeated observations of the same number) are called granularity

Page 59:

Consider a random sample with n = 5. To find the appropriate z-scores for a sample of size 5, divide the standard normal curve into 5 equal-area regions.

Why are these regions not the same width?

Page 60:

Consider a random sample with n = 5. Next, find the median z-score for each region.

-1.28   -.524   0   .524   1.28

Why is the median not in the “middle” of each region?

These would be the z-scores (from the standard normal curve) that we would use to plot our data against.
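The "median of each equal-area region" idea can be sketched in code. Region i of n covers the area from (i - 1)/n to i/n, so its median sits at cumulative area (i - 0.5)/n. The function name `normal_scores` is illustrative. Published tables of normal scores sometimes use a slightly different convention, so their values need not match this one exactly for other sample sizes.

```python
from statistics import NormalDist


def normal_scores(n):
    """Median z-score of each of n equal-area regions under the standard normal curve."""
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


scores = normal_scores(5)
print([round(s, 3) for s in scores])
```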

Page 61:

Let’s construct a normal probability plot. The values of the normal scores depend on the sample size n. The normal scores when n = 10 are below:

-1.539 -1.001 -0.656 -0.376 -0.123 0.123 0.376 0.656 1.001 1.539

Suppose we have the following observations of widths of contact windows in integrated circuit chips:

3.21 2.49 2.94 4.38 4.02 3.62 3.30 2.85 3.34 3.81

Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set & so on.

[Figure: scatterplot axes; horizontal axis: Widths of Contact Windows (1 to 5); vertical axis: Normal Scores (-1 to 1)]

What should happen if our data set is normally distributed?
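The pairing step can be sketched as below, using the ten normal scores and ten widths listed on this slide. The correlation coefficient at the end is an extra, assumed here as one simple numeric summary of how close the plotted points are to a straight line. The slides themselves judge linearity by eye.

```python
import math

normal_scores = [-1.539, -1.001, -0.656, -0.376, -0.123,
                 0.123, 0.376, 0.656, 1.001, 1.539]
widths = [3.21, 2.49, 2.94, 4.38, 4.02, 3.62, 3.30, 2.85, 3.34, 3.81]

# Pair the smallest observation with the smallest normal score, and so on.
pairs = list(zip(sorted(widths), normal_scores))


def pearson_r(points):
    """Pearson correlation of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(pairs)

for width, score in pairs:
    print(f"{width:5.2f}  {score:7.3f}")
print(f"correlation: {r:.3f}")
```

A correlation close to 1 corresponds to points that fall near a straight line, which is the pattern expected when the data are approximately normal.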

Page 62:

Notice that the boxplot is approximately symmetrical and that the normal probability plot is approximately linear.

Notice that the boxplot is approximately symmetrical except for the outlier and that the normal probability plot shows the outlier.

Notice that the boxplot is skewed left and that the normal probability plot shows this skewness.

Page 63:

Are these approximately normally distributed?

50 48 54 47 51 52 46 53 52 51 48 48 54 55 57 45 53 50 47 49 50 56 53 52

Both the histogram & boxplot are approximately symmetrical, so these data are approximately normal.

The normal probability plot is approximately linear, so these data are approximately normal.

What is this called?