Probability and Statistics
1.3 The Normal Distributions
Density Curve
A density curve is a smooth function meant to approximate a histogram.
The area under a density curve is one.
Since the density curve represents the entire distribution, the area under the curve on any interval represents the proportion of observations in that interval.
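As a rough numerical check of the two area properties above, the sketch below integrates a density curve with the trapezoid rule. The triangular density on [0, 2] and the helper names are made up for illustration; they do not come from the slides.

```python
def density(x):
    """Hypothetical triangular density on [0, 2] with its peak at x = 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def area_under(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


total_area = area_under(density, 0.0, 2.0)  # area under the whole curve
share = area_under(density, 0.5, 1.5)       # proportion of observations in [0.5, 1.5]
print(round(total_area, 4), round(share, 4))
```

For this particular density, the total area comes out at about 1 and the interval from 0.5 to 1.5 holds about three quarters of the observations.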
Density Curves: Properties
The mean of a density curve is the point at which the curve would balance.
The median of a density curve is the equal-areas point. In other words, the areas under the curve on either side of the median are equal.
For symmetric density curves, the balance point (mean) and the equal-areas point (median) are the same.
Definitions
Symmetric: Data is symmetric if the left half of its histogram (or density curve) is roughly a mirror image of its right half.
Skewed: Data is skewed if its histogram (or density curve) is not symmetric and extends more to one side than the other.
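Skewness
[Figure: three density curves. Symmetric: mode = mean = median. Skewed left (negatively): the tail extends to the left and the mean lies to the left of the median and mode. Skewed right (positively): the tail extends to the right and the mean lies to the right of the median and mode.]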
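A small illustration of how skew pulls the mean away from the median and mode. Both data sets are invented for this sketch and are not from the slides.

```python
from statistics import mean, median, mode

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]          # mirror image around 3
skewed_right = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]   # same data plus one large value

for name, data in (("symmetric", symmetric), ("skewed right", skewed_right)):
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```

In the symmetric set the three measures agree. Adding a single large value drags the mean to the right while the median and mode stay put.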
Characterization
A normal distribution is bell-shaped and symmetric.
The distribution is determined by the mean μ (mu) and the standard deviation σ (sigma).
The mean controls the center and the standard deviation controls the spread.
Note: These two density curves have the same mean but different standard deviations.
68-95-99.7 Rule
For any normal curve with mean μ and standard deviation σ:
About 68 percent of the observations fall within 1 standard deviation of the mean. (μ – 1σ < x < μ + 1σ)
About 95 percent of the observations fall within 2 standard deviations of the mean. (μ – 2σ < x < μ + 2σ)
About 99.7 percent of the observations fall within 3 standard deviations of the mean. (μ – 3σ < x < μ + 3σ)
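The three percentages can be checked against the standard normal curve. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of a normal distribution lying within k standard deviations of the mean.
coverage = {}
for k in (1, 2, 3):
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage[k]:.4f}")
```

The computed proportions are close to, but not exactly, 68, 95, and 99.7 percent, which is why the rule is stated with rounded values.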
Waiting Times of Bank Customers at Different Banks (in minutes)
Jefferson Valley Bank: 6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7
Bank of Providence: 4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0
Jefferson Valley Bank: mean 7.15, median 7.20, mode 7.7, midrange 7.10
Bank of Providence: mean 7.15, median 7.20, mode 7.7, midrange 7.10
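The four measures of center can be reproduced from the waiting times with the sketch below. `midrange` and `summarize` are helper names introduced here for illustration.

```python
from statistics import mean, median, mode

jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def midrange(values):
    """Halfway point between the lowest and highest value."""
    return (min(values) + max(values)) / 2


def summarize(values):
    """Collect the four measures of center shown on the slide."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "midrange": midrange(values),
    }


summary = {
    "Jefferson Valley Bank": summarize(jefferson_valley),
    "Bank of Providence": summarize(providence),
}
for bank, stats in summary.items():
    print(bank, {name: round(value, 2) for name, value in stats.items()})
```

Both banks produce the same four values, so measures of center alone cannot tell the two data sets apart.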
What is the standard deviation of the data from Jefferson Valley Bank? From Bank of Providence?
Dotplots of Waiting Times
Visually, which one has the greater spread?
Measures of Variation
Range
highest value – lowest value
Standard Deviation
a measure of variation of the scores about the mean (roughly, the average deviation from the mean)
Sample Standard Deviation Formula
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Calculators can compute the sample standard deviation of data.
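The sample standard deviation formula can be applied directly to the bank waiting times listed earlier. This is a sketch: `sample_std_dev` is a name chosen here, and the result is compared against `statistics.stdev` from the Python standard library as a cross-check.

```python
from math import sqrt
from statistics import stdev

jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def sample_std_dev(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))


s_jefferson = sample_std_dev(jefferson_valley)
s_providence = sample_std_dev(providence)
print(round(s_jefferson, 2), round(s_providence, 2))

# Cross-check against the standard library implementation.
print(round(stdev(jefferson_valley), 2), round(stdev(providence), 2))
```

Although the two banks share the same mean, the Bank of Providence standard deviation is several times larger.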
Symbols for Standard Deviation
Population: σ (textbook), σx (some graphics calculators), xσn (some non-graphics calculators)
Sample: s (textbook), Sx (some graphics calculators), xσn-1 (some non-graphics calculators)
Articles in professional journals and reports often use SD for standard deviation and VAR for variance.
Understanding Standard Deviation
Spot the Jack Russell weighs 19 pounds. The mean weight for a Jack Russell Terrier is 16 pounds with a standard deviation of 1.5 pounds. Desdi the Maine Coon cat also weighs 19 pounds and frequently kicks Spot’s butt around the house. The mean weight for a Maine Coon is 17 pounds with a standard deviation of 0.75 pounds. Which animal is more in need of a diet?
The only way to compare values from different distributions is to standardize the deviations from the means. In other words, we first have to convert all of the values into a common unit: standard deviations from their respective means. Then we can compare them directly. This is done through the application of a z-score:
\[ z = \frac{y - \bar{y}}{s} \]
where y is the value of interest, \bar{y} is the mean of the data, and s is the standard deviation of the data.
A z-score
represents the number of standard deviations a given value in the data is from the mean
is unit-less, because the deviation from the mean and the standard deviation are measured in the same units
z-score for Spot: z = (19 − 16) / 1.5 = 2
z-score for Desdi: z = (19 − 17) / 0.75 = 2.67
Desdi is farther from the mean weight for her breed than Spot is from the mean weight for his.
What can you say about the spread of weights for the two breeds?
Can you think of any extraneous factor that could explain Desdi’s weight other than being overweight?
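The two z-scores can be reproduced with a short function. This is a minimal sketch; `z_score` and its parameter names are chosen here for illustration.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


spot_z = z_score(19, mean=16, std_dev=1.5)     # Jack Russell Terrier
desdi_z = z_score(19, mean=17, std_dev=0.75)   # Maine Coon

print(f"Spot:  z = {spot_z:.2f}")
print(f"Desdi: z = {desdi_z:.2f}")
```

Both animals weigh the same, but Desdi's weight is more unusual for her breed because Maine Coon weights are less spread out.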
Spot: z = 2
Desdi: z = 2.67
What percent of Jack Russell terriers weigh less than Spot? more?
What percent of Maine Coon cats weigh less than Desdi? more?
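One way to approach these two questions, assuming the weights of each breed are approximately normally distributed (the slides do not state this explicitly), is to look up each z-score on the standard normal curve. The sketch uses `NormalDist` from the standard library; `percent_below` is an illustrative helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def percent_below(z):
    """Percent of a normal distribution lying below the given z-score."""
    return 100 * standard_normal.cdf(z)


spot_below = percent_below((19 - 16) / 1.5)     # Spot, z = 2
desdi_below = percent_below((19 - 17) / 0.75)   # Desdi, z is about 2.67

print(f"Jack Russells lighter than Spot: {spot_below:.2f}%, heavier: {100 - spot_below:.2f}%")
print(f"Maine Coons lighter than Desdi:  {desdi_below:.2f}%, heavier: {100 - desdi_below:.2f}%")
```

Under the normality assumption, roughly 2 percent of Jack Russells and well under 1 percent of Maine Coons would be heavier than these two animals.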
Using z-score and the normal distribution
Suppose it takes you 20 minutes on average to drive to school, with a standard deviation of 2 minutes, and that the drive times are normally distributed.
• How often will you arrive at school in less than 22 minutes?
• How often will it take you more than 24 minutes?
• 75% of the time you will arrive in x minutes or less. Solve for x.
• 43% of the time you will arrive in y minutes or more. Solve for y.
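The four drive-time questions can be worked with a normal model. This is a sketch under the assumption that drive times are normally distributed with mean 20 and standard deviation 2; the variable names are illustrative.

```python
from statistics import NormalDist

drive_time = NormalDist(mu=20, sigma=2)

# Proportion of trips shorter than 22 minutes (z = 1).
p_under_22 = drive_time.cdf(22)

# Proportion of trips longer than 24 minutes (z = 2).
p_over_24 = 1 - drive_time.cdf(24)

# 75% of trips take x minutes or less: x is the 75th percentile.
x_75 = drive_time.inv_cdf(0.75)

# 43% of trips take y minutes or more, so 57% take less: y is the 57th percentile.
y_43 = drive_time.inv_cdf(1 - 0.43)

print(f"P(time < 22) = {p_under_22:.4f}")
print(f"P(time > 24) = {p_over_24:.4f}")
print(f"x = {x_75:.2f} minutes")
print(f"y = {y_43:.2f} minutes")
```

The first two questions go from a value to a proportion (`cdf`), while the last two go from a proportion back to a value (`inv_cdf`).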
Measures of Variation
Variance
standard deviation squared
Notation: s² or σ²
Sample variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Population variance:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
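The difference between the two variance formulas is only the divisor. The sketch below applies both to the Jefferson Valley Bank waiting times using the standard library; treating the ten values as an entire population is done purely for comparison.

```python
from statistics import pvariance, stdev, variance

jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]

sample_var = variance(jefferson_valley)        # divides by n - 1
population_var = pvariance(jefferson_valley)   # divides by N

print(f"sample variance:     {sample_var:.4f}")
print(f"population variance: {population_var:.4f}")

# The variance is the standard deviation squared.
print(f"stdev squared:       {stdev(jefferson_valley) ** 2:.4f}")
```

Dividing by n - 1 instead of N always gives the slightly larger value.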
Round-off Rule for measures of variation
Carry at least one more decimal place than is present in the original set of values.
Round only the final answer, never in the middle of a calculation.
Estimation of Standard Deviation: Range Rule of Thumb
[Figure: number line marked x̄ − 2s (minimum usual value), x̄, and x̄ + 2s (maximum usual value); Range ≈ 4s]
\[ s \approx \frac{\text{range}}{4} = \frac{\text{highest value} - \text{lowest value}}{4} \]
Usual Sample Values
minimum ‘usual’ value ≈ (mean) − 2 (standard deviation), that is, minimum ≈ x̄ − 2s
maximum ‘usual’ value ≈ (mean) + 2 (standard deviation), that is, maximum ≈ x̄ + 2s
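The range rule of thumb can be tried on the bank waiting times from earlier in the section. This is a rough sketch; the function names are chosen here for illustration.

```python
from statistics import mean, stdev

jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def estimate_std_dev(values):
    """Range rule of thumb: s is roughly the range divided by 4."""
    return (max(values) - min(values)) / 4


def usual_values(values):
    """Minimum and maximum 'usual' values: mean minus and plus 2 standard deviations."""
    x_bar, s = mean(values), stdev(values)
    return x_bar - 2 * s, x_bar + 2 * s


est_jefferson = estimate_std_dev(jefferson_valley)
est_providence = estimate_std_dev(providence)
print(f"estimated s: {est_jefferson:.2f} and {est_providence:.2f}")
print(f"computed s:  {stdev(jefferson_valley):.2f} and {stdev(providence):.2f}")

low, high = usual_values(providence)
print(f"usual Bank of Providence waits: {low:.2f} to {high:.2f} minutes")
```

The estimates land in the right neighborhood but are not exact, which is what one would expect from a rule of thumb.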
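The Empirical Rule (applies to bell-shaped distributions)
About 68% of data are within 1 standard deviation of the mean (x̄ − s to x̄ + s): 34% on each side of the mean.
About 95% of data are within 2 standard deviations of the mean (x̄ − 2s to x̄ + 2s): a further 13.5% on each side.
About 99.7% of data are within 3 standard deviations of the mean (x̄ − 3s to x̄ + 3s): a further 2.4% on each side, leaving about 0.1% in each tail.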
Chebyshev’s Theorem
Applies to distributions of any shape.
The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1 − 1/K², where K is any number greater than 1.
For K = 2, at least 3/4 (75%) of all values lie within 2 standard deviations of the mean.
For K = 3, at least 8/9 (89%) of all values lie within 3 standard deviations of the mean.
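Chebyshev's bound is a guaranteed minimum, so real data usually does better. The sketch below computes the bound and compares it with the Bank of Providence waiting times from earlier in the section; the function names are illustrative.

```python
from statistics import mean, stdev

providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]


def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("K must be greater than 1")
    return 1 - 1 / k ** 2


def observed_proportion(values, k):
    """Actual proportion of values within k sample standard deviations of the mean."""
    x_bar, s = mean(values), stdev(values)
    inside = [x for x in values if abs(x - x_bar) <= k * s]
    return len(inside) / len(values)


for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    actual = observed_proportion(providence, k)
    print(f"K = {k}: at least {bound:.3f}, observed {actual:.3f}")
```

For this data set every value falls within 2 standard deviations of the mean, comfortably above the 75% minimum.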
Measures of Variation Summary
For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.
Assignment
Read Section 1.3
p. 64-66 1.62, 1.64-1.69, 1.71