Probability and Statistics

Probability and Statistics 1.3 The Normal Distributions


Page 1: Probability and Statistics

Probability and Statistics

1.3 The Normal Distributions

Page 2: Probability and Statistics

Density Curve

A density curve is a smooth function meant to approximate a histogram.

The area under a density curve is one.

Since the density curve represents the entire distribution, the area under the curve on any interval represents the proportion of observations in that interval.
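The two area statements above can be checked numerically. The sketch below is only an illustration: it assumes the standard normal curve (mean 0, standard deviation 1) as the example density, and `normal_pdf` and `area_under` are hypothetical helper names, not anything from the slides.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve at x (assumed example density)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under(pdf, a, b, steps=10_000):
    """Approximate the area under pdf between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

# Practically all of the area lies between -10 and 10 for this curve.
total_area = area_under(normal_pdf, -10.0, 10.0)
# The area on an interval is the proportion of observations in that interval.
share_0_to_1 = area_under(normal_pdf, 0.0, 1.0)

print(f"total area: {total_area:.4f}")
print(f"proportion between 0 and 1: {share_0_to_1:.4f}")
```

The total comes out very close to 1, and the interval area reads directly as a proportion (about 34% of observations fall between 0 and 1 for this curve).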

Page 3: Probability and Statistics

Density Curve

Page 4: Probability and Statistics

Density Curves: Properties

Page 5: Probability and Statistics

Density Curves

The mean of a density curve is the point at which the curve would balance.

The median of a density curve is the equal-areas point. In other words, the areas under the curve on either side of the median are equal.

For symmetric density curves, the balance point (mean) and the equal-areas point (median) are the same.

Page 6: Probability and Statistics

Definitions

Symmetric: Data is symmetric if the left half of its histogram (or density curve) is roughly a mirror image of its right half.

Skewed: Data is skewed if its histogram (or density curve) is not symmetric and extends farther to one side than to the other.

Page 7: Probability and Statistics

Skewness

[Figure: three density curves showing the typical order of the mean, median, and mode.]

SYMMETRIC: Mode = Mean = Median

SKEWED LEFT (negatively): Mean < Median < Mode

SKEWED RIGHT (positively): Mode < Median < Mean
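A small numerical illustration of how skewness pulls the mean away from the median. The three data sets are made up for this sketch and are not from the slides; the pattern shown is typical rather than guaranteed for every data set.

```python
import statistics

# Hypothetical data sets chosen to show each shape.
symmetric = [1, 2, 3, 4, 5, 6, 7]
skewed_right = [1, 2, 2, 3, 3, 3, 4, 20]          # long tail of large values
skewed_left = [-20, -4, -3, -3, -3, -2, -2, -1]   # long tail of small values

for name, data in [("symmetric", symmetric),
                   ("skewed right", skewed_right),
                   ("skewed left", skewed_left)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:13s} mean = {mean:6.2f}   median = {median:6.2f}")
```

The tail drags the mean toward it, while the median stays near the bulk of the data.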

Page 8: Probability and Statistics

Characterization

A normal distribution is bell-shaped and symmetric.

The distribution is determined by its mean, μ (mu), and its standard deviation, σ (sigma).

The mean controls the center, and the standard deviation controls the spread.

Note: These two density curves have the same mean but different standard deviations.

Page 9: Probability and Statistics

68-95-99.7 Rule

For any normal curve with mean μ and standard deviation σ:

About 68 percent of the observations fall within 1 standard deviation of the mean. (μ – 1σ < x < μ + 1σ)

About 95 percent of the observations fall within 2 standard deviations of the mean. (μ – 2σ < x < μ + 2σ)

About 99.7 percent of the observations fall within 3 standard deviations of the mean. (μ – 3σ < x < μ + 3σ)
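The three percentages can be recovered from the normal curve itself. This sketch uses the identity P(|Z| < k) = erf(k / √2) for a standard normal Z; `within_k_sigma` is a hypothetical helper name.

```python
import math

def within_k_sigma(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The rule's 68, 95, and 99.7 are rounded versions of these values, and they do not depend on μ or σ.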

Page 10: Probability and Statistics

Waiting Times of Bank Customers at Different Banks (in minutes)

Jefferson Valley Bank    Bank of Providence
6.5                      4.2
6.6                      5.4
6.7                      5.8
6.8                      6.2
7.1                      6.7
7.3                      7.7
7.4                      7.7
7.7                      8.5
7.7                      9.3
7.7                      10.0

                         Mean    Median    Mode    Midrange
Jefferson Valley Bank    7.15    7.20      7.7     7.10
Bank of Providence       7.15    7.20      7.7     7.10

What is the standard deviation of the data from Jefferson Valley Bank? From Bank of Providence?
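One way to check the summary values and answer the standard deviation question is with Python's standard `statistics` module. The waiting times are the ones listed on this slide; `summarize` is a hypothetical helper, and the sample standard deviation (n - 1 divisor) is assumed because the data are treated as samples.

```python
import statistics

jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def summarize(times):
    """Center and spread measures for one bank's waiting times."""
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "mode": statistics.mode(times),
        "midrange": (min(times) + max(times)) / 2,
        "stdev": statistics.stdev(times),  # sample standard deviation
    }

for bank, times in [("Jefferson Valley Bank", jefferson_valley),
                    ("Bank of Providence", providence)]:
    summary = ", ".join(f"{k} = {v:.2f}" for k, v in summarize(times).items())
    print(f"{bank}: {summary}")
```

All four measures of center agree between the banks, but the standard deviations come out near 0.48 minutes for Jefferson Valley and 1.82 minutes for Providence, so only a measure of variation separates the two.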

Page 11: Probability and Statistics

Dotplots of Waiting Times

Visually, which one has the greater spread?

Page 12: Probability and Statistics

Measures of Variation

Range = highest value – lowest value

Page 13: Probability and Statistics

Measures of Variation

Standard Deviation: a measure of variation of the scores about the mean (a kind of average deviation from the mean)

Page 14: Probability and Statistics

Sample Standard Deviation Formula

$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

Calculators can compute the sample standard deviation of data.

Page 15: Probability and Statistics

Symbols for Standard Deviation

                                 Sample    Population
Textbook                         s         σ
Some graphics calculators        Sx        σx
Some non-graphics calculators    xσn-1     xσn

Articles in professional journals and reports often use SD for standard deviation and VAR for variance.

Page 16: Probability and Statistics

Understanding Standard Deviation

Spot the Jack Russell weighs 19 pounds. The mean weight for a Jack Russell Terrier is 16 pounds with a std dev of 1.5 pounds. Desdi the Maine Coon cat also weighs 19 pounds and frequently kicks Spot’s butt around the house. The mean weight for a Maine Coon is 17 pounds with a std dev of 0.75 pounds. Which animal is most in need of a diet?

Page 17: Probability and Statistics

Understanding Standard Deviation

The only way to compare values in different units is to standardize the deviations from the means. In other words, we first have to convert all of the values into similar units: standard deviations from their respective means. THEN we can compare them directly. This is done by applying a z-score:

$$ z = \frac{y - \bar{y}}{s} $$

where $y$ is the value of interest, $\bar{y}$ is the mean of the data, and $s$ is the standard deviation of the data.

Page 18: Probability and Statistics

Understanding Standard Deviation

A z-score

is unit-less, because the deviation from the mean and the standard deviation are measured in the same units, which cancel

represents the number of standard deviations a given value in the data is from the mean

Page 19: Probability and Statistics

Understanding Standard Deviation

Spot the Jack Russell weighs 19 pounds. The mean weight for a Jack Russell Terrier is 16 pounds with a std dev of 1.5 pounds. Desdi the Maine Coon cat also weighs 19 pounds and frequently kicks Spot’s butt around the house. The mean weight for a Maine Coon is 17 pounds with a std dev of 0.75 pounds. Which animal is most in need of a diet?

z-score for Spot: $z = \frac{19 - 16}{1.5} = 2$

z-score for Desdi: $z = \frac{19 - 17}{0.75} \approx 2.67$

Desdi is farther, in standard deviations, from the mean weight of her breed than Spot is from the mean weight of his.

What can you say about the spread of weights for the two breeds?

Can you think of any extraneous factor that could explain Desdi’s weight other than being overweight?
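The two z-scores on this slide can be reproduced with a one-line function. `z_score` is a hypothetical helper name; the weights, means, and standard deviations are the ones given in the problem.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

spot_z = z_score(19, 16, 1.5)     # Jack Russell: mean 16 lb, std dev 1.5 lb
desdi_z = z_score(19, 17, 0.75)   # Maine Coon: mean 17 lb, std dev 0.75 lb

print(f"Spot:  z = {spot_z:.2f}")
print(f"Desdi: z = {desdi_z:.2f}")
```

Both animals weigh 19 pounds, yet Desdi's z-score is larger because her breed's weights are less spread out.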

Page 20: Probability and Statistics

Understanding Standard Deviation

Spot: z = 2

Desdi: z = 2.67

What percent of Jack Russell terriers weigh less than Spot? more?

What percent of Maine Coon cats weigh less than Desdi? more?
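If we assume that weights within each breed are normally distributed (the problem does not state this, so treat it as an assumption of this sketch), the percentages follow from the normal cumulative distribution function. `statistics.NormalDist` is part of the Python standard library from version 3.8.

```python
from statistics import NormalDist

# Assumed normal models built from the breed means and standard deviations.
jack_russell = NormalDist(mu=16, sigma=1.5)
maine_coon = NormalDist(mu=17, sigma=0.75)

spot_below = jack_russell.cdf(19)    # proportion of Jack Russells lighter than Spot
desdi_below = maine_coon.cdf(19)     # proportion of Maine Coons lighter than Desdi

print(f"Jack Russells lighter than Spot: {spot_below:.1%}, heavier: {1 - spot_below:.1%}")
print(f"Maine Coons lighter than Desdi:  {desdi_below:.1%}, heavier: {1 - desdi_below:.1%}")
```

Under that assumption, roughly 97.7% of Jack Russells weigh less than Spot and roughly 99.6% of Maine Coons weigh less than Desdi.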

Page 21: Probability and Statistics

Using z-score and the normal distribution

Suppose it takes you 20 minutes on average to drive to school, with a standard deviation of 2 minutes.

• How often will you arrive at school in less than 22 minutes?
• How often will it take you more than 24 minutes?
• 75% of the time you will arrive in x minutes or less. Solve for x.
• 43% of the time you will arrive in y minutes or more. Solve for y.
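A possible way to work the four questions, assuming drive times follow a normal distribution with mean 20 and standard deviation 2 (the normal model is implied by the slide title rather than stated in the problem). The first two questions go from a value to a proportion with `cdf`; the last two go from a proportion back to a value with `inv_cdf`.

```python
from statistics import NormalDist

drive = NormalDist(mu=20, sigma=2)  # assumed normal model for drive time in minutes

p_under_22 = drive.cdf(22)        # z = (22 - 20) / 2 = 1
p_over_24 = 1 - drive.cdf(24)     # z = (24 - 20) / 2 = 2
x_75 = drive.inv_cdf(0.75)        # 75% of drives take x minutes or less
y_43 = drive.inv_cdf(1 - 0.43)    # 43% take y or more, so 57% take less than y

print(f"P(less than 22 min) = {p_under_22:.4f}")
print(f"P(more than 24 min) = {p_over_24:.4f}")
print(f"x = {x_75:.2f} minutes")
print(f"y = {y_43:.2f} minutes")
```

Note that the last question asks about the upper tail, so the proportion must be converted to the area to the left (0.57) before the lookup.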

Page 22: Probability and Statistics

Measures of Variation

Variance: standard deviation squared

Notation: $s^2$ or $\sigma^2$

Page 23: Probability and Statistics

Variance

Sample Variance:

$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

Population Variance:

$$ \sigma^2 = \frac{\sum (x - \mu)^2}{N} $$
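The only difference between the two formulas is the divisor. The sketch below writes both out directly and compares them with the standard library's `statistics.variance` and `statistics.pvariance`; the function names and the data set are hypothetical, chosen so the arithmetic is easy to follow.

```python
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def population_variance(data):
    """Sum of squared deviations from the population mean, divided by N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data: mean 5, squared deviations sum to 32

print(f"sample variance:     {sample_variance(values):.4f}")      # 32 / 7
print(f"population variance: {population_variance(values):.4f}")  # 32 / 8
print(f"library check:       {statistics.variance(values):.4f}, "
      f"{statistics.pvariance(values):.4f}")
```

Taking the square root of either result gives the corresponding standard deviation.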

Page 24: Probability and Statistics

Round-off Rule for measures of variation

Carry at least one more decimal place than is present in the original set of values.

Round only the final answer, never in the middle of a calculation.

Page 25: Probability and Statistics

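Estimation of Standard Deviation: Range Rule of Thumb

Range ≈ 4s, or

$$ s \approx \frac{\text{Range}}{4} = \frac{\text{highest value} - \text{lowest value}}{4} $$

[Figure: number line marking $\bar{x} - 2s$ (minimum usual value), $\bar{x}$, and $\bar{x} + 2s$ (maximum usual value).]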

Page 26: Probability and Statistics

Usual Sample Values

minimum ‘usual’ value ≈ (mean) - 2 (standard deviation)

minimum ≈ $\bar{x} - 2s$

maximum ‘usual’ value ≈ (mean) + 2 (standard deviation)

maximum ≈ $\bar{x} + 2s$
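A rough sketch of the range rule of thumb, applied to the Bank of Providence waiting times from earlier in the deck. The helper names are hypothetical, and the mean (7.15) and sample standard deviation (about 1.82) are computed from that same data.

```python
def range_rule_stdev(data):
    """Estimate the standard deviation as range / 4."""
    return (max(data) - min(data)) / 4

def usual_values(mean, std_dev):
    """Minimum and maximum 'usual' values: mean - 2s and mean + 2s."""
    return mean - 2 * std_dev, mean + 2 * std_dev

providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

estimate = range_rule_stdev(providence)   # (10.0 - 4.2) / 4
low, high = usual_values(7.15, 1.82)      # mean and sample std dev of this data

print(f"range-rule estimate of s: {estimate:.2f}")
print(f"usual values: {low:.2f} to {high:.2f} minutes")
```

The estimate (about 1.45) is in the neighborhood of the computed 1.82 but not equal to it, which is what a rule of thumb should be expected to deliver.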

Page 27: Probability and Statistics

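The Empirical Rule (applies to bell-shaped distributions)

68% of data are within 1 standard deviation of the mean ($\bar{x} - s$ to $\bar{x} + s$)

95% of data are within 2 standard deviations of the mean ($\bar{x} - 2s$ to $\bar{x} + 2s$)

99.7% of data are within 3 standard deviations of the mean ($\bar{x} - 3s$ to $\bar{x} + 3s$)

[Figure: bell curve split into bands one standard deviation wide. Moving outward from the mean on each side, the bands hold 34%, 13.5%, and 2.4% of the data, with 0.1% beyond 3 standard deviations.]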

Page 28: Probability and Statistics

Chebyshev’s Theorem

applies to distributions of any shape.

the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.

at least 3/4 (75%) of all values lie within 2 standard deviations of the mean.

at least 8/9 (89%) of all values lie within 3 standard deviations of the mean.
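The bound and the two special cases can be checked in a few lines. This is only a sketch: the function names are hypothetical, and the strongly skewed data set is invented to show that the guarantee does not depend on a bell shape.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of data within k sample standard deviations of the mean."""
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
    return sum(1 for x in data if abs(x - mean) <= k * s) / n

skewed = [1, 1, 1, 1, 2, 2, 3, 4, 8, 30]  # made-up, far from bell-shaped

for k in (2, 3):
    print(f"K = {k}: guaranteed at least {chebyshev_lower_bound(k):.3f}, "
          f"observed {fraction_within(skewed, k):.3f}")
```

For bell-shaped data the Empirical Rule gives much larger percentages (95% and 99.7%); Chebyshev's figures are weaker because they must hold for every distribution.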

Page 29: Probability and Statistics

Measures of Variation Summary

For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.

Page 30: Probability and Statistics

Assignment

Read Section 1.3

p. 64-66 1.62, 1.64-1.69, 1.71