Normal Distributions (aka Bell Curves, Gaussians)

30
Normal Distributions (aka Bell Curves, Gaussians) Spring 2010

description

Normal Distributions (aka Bell Curves, Gaussians). Spring 2010. You’ve probably all seen a bell curve…. The Normal distribution is common. Lots of real data follows a normal shape. For example 1) Many/most biometric measurements (heights, femur lengths, skull diameters, etc.) - PowerPoint PPT Presentation

Transcript of Normal Distributions (aka Bell Curves, Gaussians)

Page 1: Normal Distributions (aka Bell Curves, Gaussians)

Normal Distributions(aka Bell Curves, Gaussians)

Spring 2010

Page 2: Normal Distributions (aka Bell Curves, Gaussians)

You’ve probably all seen a bell curve…

Page 3: Normal Distributions (aka Bell Curves, Gaussians)

The Normal distribution is common

Lots of real data follows a normal shape. For example

1) Many/most biometric measurements (heights, femur lengths, skull diameters, etc.)

2) Scores on many standardized exams (IQ tests) are forced into a normal shape before reporting

3) Many quality control measurements, if you take the log first, have a normal shape.

Page 4: Normal Distributions (aka Bell Curves, Gaussians)

When sampling from a normal

Normal distributions are typically characterized by two numbers, their mean or “expected value” which corresponds to the peak, and their “standard deviation” which is the distance from the mean to the inflection point.

Large standard deviations result in “spread out” normals. Small standard deviations result in “strongly peaked” distributions.

Page 5: Normal Distributions (aka Bell Curves, Gaussians)

Mean (µ) and Standard deviation (σ) for a normal distribution

µ (here 25)

this distance is σ

height is about60% of peak here

Page 6: Normal Distributions (aka Bell Curves, Gaussians)

Two normals, corresponding to different standard deviations.

Mean=100, std.dev = 16 Mean=100, std.dev = 4

Page 7: Normal Distributions (aka Bell Curves, Gaussians)

The EDA of Normal distributions

The central location of a normal distribution is given by the mean μ. (The median is also μ).

The spread is given by the standard deviation σ. (The interquartile range is 1.35σ, but you do NOT need to know that)

Normal distributions are symmetric and typically have few, if any, outliers. If your data has a lot of outliers, but is otherwise symmetric and unimodal, it may have a “t” distribution (discussed later in class).

Page 8: Normal Distributions (aka Bell Curves, Gaussians)

Probabilities from a Normal distribution

Normal distributions have a nice property that, knowing the mean (μ) and standard deviation (σ), we can tell how much data will fall in any region.

Examples – the normal distribution is symmetric, so 50% of the data is smaller than μ and 50% is larger than μ.

Page 9: Normal Distributions (aka Bell Curves, Gaussians)

More Normal Probabilities

It is always true that about 68% of the data appears within 1 standard deviation of the mean (so about 68% of the data appears in the region μ±σ)

Page 10: Normal Distributions (aka Bell Curves, Gaussians)

Yet more normal probabilities

It is also true about 95% of the appears within 2 standard deviation of the mean, and about 99.7% of the data appear within 3 standard deviations of the mean (so it’s VERY rare to go beyond 3 standard deviations.

In quality control applications, one often is interested in “6-sigma”. Note 6 standard deviations includes 99.9999998% of the data.

Page 11: Normal Distributions (aka Bell Curves, Gaussians)

95% within 2 standard deviations, 99.7% within 3 standard deviations

Page 12: Normal Distributions (aka Bell Curves, Gaussians)

Computing more general probabilities

Suppose you want to know how much data appears within 1.5 standard deviations of the mean, or how much data appears between 1.3 and 1.7 standard deviations of the mean.

Page 13: Normal Distributions (aka Bell Curves, Gaussians)

Another way

There is another way of computing normal probabilities that is 1) the way it used to be done, back in pre-handy-computer days, 2) useful for understanding more about the normal distribution

The number of standard deviations an observation is from the mean is called the Z-score for that observation.

Page 14: Normal Distributions (aka Bell Curves, Gaussians)

Z-score examples

If μ=100 and σ=16 (this is true of IQ scores in the U.S.), then an observation X=125 is 25 points above the mean, which corresponds to 25/16 = 1.5625 standard deviations above the mean.

If general, a Z-score for an observation X is Z=(X-μ)/σ

Observations above the mean get positive Z-scores, observations below the mean get negative Z-scores.

Page 15: Normal Distributions (aka Bell Curves, Gaussians)

Computing probabilities with Z-scores

Fortunately, the Z-score is all you need to know to compute probabilities from a normal distribution.

The reason is that Z-scores map directly to percentiles.

For each Z-score Stata can provide the percentile. For example, if the Z-score is 1, the percentile is 84.13%. If the Z-score is 2.3, then the percentile is 98.93%

Page 16: Normal Distributions (aka Bell Curves, Gaussians)

Using a Z-table

Z-table shown separately in class and illustrated.

Page 17: Normal Distributions (aka Bell Curves, Gaussians)

Probabilities between Z-scores

Again, IQ scores are normally distributed with mean 100 and standard deviation 16.

How many people have IQ scores between 90 and 120?

Compute the corresponding Z-scores. For 90, the Z-score is (90-100)/16 = -0.625. For 120, the Z-score is (120-100)/16 = 1.25.

Find the corresponding percentiles (SAS). The percentile for Z=1.25 is 89.43%. The percentile for Z=(-0.625) is 26.6%.

The amount between these is 89.43 – 26.60 = 62.83%

Page 18: Normal Distributions (aka Bell Curves, Gaussians)

Comparing observations from different normal distributions

The central idea is that a Z-score corresponds to a percentile for the observations.

If you have observations from multiple normal distributions, you can compute the Z-score for each observations and compare which has the “better” score.

Page 19: Normal Distributions (aka Bell Curves, Gaussians)

Example

Suppose you have two students, one with a 23 on the ACT (mean 22 and standard deviation 3) and another with a 1220 on the SAT (mean 900 and standard deviation 250).

The Z-score for the student with the ACT is (23-22)/3 = 0.33 while the Z-score for the student with the SAT is (1220-900)/250 = 1.28.

The student with the SAT performed much better (relative to peers on the exam).

Page 20: Normal Distributions (aka Bell Curves, Gaussians)

Review

The mean and standard deviation of a normal is all you need to know to compute any percentage.

Z-scores map directly to percentiles. A “Z-table” provides these percentiles.

To find the probability below a value, compute the Z-score and look up the percentile.

To find the probability above a value, find the Z-score and percentile, then take 1 minus that percentile (e.g. if the probability of being below is 0.24, the probability above is 0.76)

To find the probability between two values, find the Z-scores and percentiles for each value, then take the difference.

Page 21: Normal Distributions (aka Bell Curves, Gaussians)

Example

Suppose you have data which is normally distribution with mean 70 and standard deviation 4 (this describes U.S. male heights in inches).

– What proportion of data is less than 70? Half the data is always below the mean, so this is 50%.

– What proportion of data is between 68 and 74? The Z-scores for 68 and 74 are (68-70)/4 = (-0.50) and (74-70)/4 = 1.00, corresponding to percentiles from the table of 84.13% and 30.85%. The difference between thesepercentiles is 84.13% - 30.85% = 53.28%

– What proportion of data is greater than 76? The Z-score for 76 is (76-70)/4 = 1.5 which corresponds to a percentile of 93.32%. Remember that percentiles are always in terms of “less than the value”. To get greater than, we subtract from 100% to get 6.68% as our answer.

Page 22: Normal Distributions (aka Bell Curves, Gaussians)

Example

One manufacturing plant (plant A) produces bolts whose lengths are normally distribution with mean 2 inches and standard deviation 0.009 inches. Another plant (plant B) produces bolts whose lengths are normally distributed with mean 1.99 inches and standard deviation 0.003 inches. For the bolts to be usable, their length must be between 1.98 and 2.02 inches. Which plant has the higher percentage of usable bolts? (this gets to the issue that spread may be more important than central location in some quality control examples).

Page 23: Normal Distributions (aka Bell Curves, Gaussians)

Example continued

Plant A is N(2.00,0.009) Plant B is N(1.99,0.003) For each plant, we want the percentage of

bolts between 1.98 and 2.02 inches. Thus, we need the Z-scores for 1.98 and

2.02 from each plant.

Page 24: Normal Distributions (aka Bell Curves, Gaussians)

Example continued

Plant A is N(2.00,0.009). Z-scores for 1.98 and 2.02 are Z=(1.98-2.00)/0.009=(-2.22) and Z=(2.02-2.00)/0.009)=2.22.

These correspond to percentiles of 0.0132 and 0.9868. Thus, the probability is 0.9868-0.0132 = 0.9736

Plant B is N(1.99,0.003). Z-scores for 1.98 and 2.02 are Z=(1.98-1.99)/0.003=(-3.33) and Z=(2.02-1.99)/0.003=10.

These correspond to percentiles of 0.0004 and 1.0000 (for values off the chart, use 0 or 1). Thus, the probability is 1.0000-0.0004 = 0.9996.

Thus, even though Plant A averages a “perfect bolt”, the increased spread makes them worse overall.

Page 25: Normal Distributions (aka Bell Curves, Gaussians)

Percentiles back to Z-scores

We can go both directions between Z-scores and percentiles. Each Z-score corresponds to a percentile. Similarly, each percentile has a corresponding Z-score.

For example, what IQ score has 80% of the people below it?

Page 26: Normal Distributions (aka Bell Curves, Gaussians)

Going from percents to Z-scores

In the U.S., IQ scores are normally distributed with mean 100 and standard deviation 16.

What is the 80th percentile of IQs? In other words, for what IQ do 80% of the people fall

below it? The first step is to find the Z-score corresponding to

80% (look for the number closest to 0.8000 in the BODY of the table, then find the corresponding Z-score)

That Z-score is Z=0.84

Page 27: Normal Distributions (aka Bell Curves, Gaussians)

Z-scores back to IQ scores

The Z-score corresponding to 80% is 0.84 Remember the Z-score is Z=(X-μ)/σ This formula can be reversed to find

X=σZ + μ Our Z=0.84, μ=100, and σ=16, so X = (0.84)(16) + 100 = 113.44 So 80% of people have an IQ score below

113.44

Page 28: Normal Distributions (aka Bell Curves, Gaussians)

New Example

What about the middle 50% of IQ scores? What percentiles does this correspond to?

To get the middle 50%, we need to stretch from the 25th percentile to the 75th percentile.

The 25th percentile corresponds to Z-score of (-0.67), while the 75th percentile corresponds to a Z-score of 0.67

These correspond to IQ’s of (-0.67)(16)+100 = 89.28 and (0.67)(16) + 100 = 110.72

Page 29: Normal Distributions (aka Bell Curves, Gaussians)

Related example

What about the middle 95% of values (removing 2.5% from each tail)

Quick answer is that we’ve already said going 2 standard deviations in either direction contains approximately 95% of the values, which would correspond to IQs between 68 and 132.

Exact answer, however, is that we need the 2.5% and 97.5% percentiles. These correspond to -1.96 and +1.96.

Going 1.96 standard deviations in either direction from the mean 100 means the middle 95% of IQ scores fall between 68.64 and 131.36

Page 30: Normal Distributions (aka Bell Curves, Gaussians)

Another example

What about the middle 99% of values? We need to find the 0.5th and 99.5th

percentiles (removing one half a percent from each tail, leaving 99% in the middle)

These are -2.576 and 2.576, which would correspond to IQ scores between (-2.576)(16) + 100 = 58.78 and (2.576)16 + 100 = 141.21