Describing Distributions With Numbers Section 1.3 cont. (five number summary, boxplots, variance,...
Uploaded by: rylee-tallon
Describing Distributions With Numbers
Section 1.3 cont. (five number summary, boxplots, variance, standard deviation)
Target Goal: I can calculate a 5 number summary and construct a boxplot.
I can describe spread using the standard deviation of a distribution.
Hw: pg 71: 92, 93, 95, 96, 97, 103, 105, 107 - 110
Five-Number Summary
Data set consisting of smallest observation, first quartile, median, third quartile, and largest observation written in order.
Min Q1 M Q3 Max
It gives us a quick summary of both center and spread.
Bonds:
Min  Q1  M   Q3  Max
16   25  34  41  73
Box (and whiskers) Plot
A graph of a five-number summary of a distribution;
• best for side-by-side comparisons since they show less detail than histograms or stemplots;
• drawn either horizontally or vertically.
Modified Boxplot
Because the regular boxplot conceals outliers, we will use the modified boxplot.
• Plots outliers as isolated points
• Extend “whiskers” out to the largest and/or smallest data points that are not outliers
• Remember: label axis, title graph, scale axis.
[Figure: Regular (a) and modified (b) boxplots comparing Barry Bonds and Hank Aaron home runs. Labeled points: Min, Q1, M, Q3, Max, Outlier.]
Activity: Acing the First Test
Enter the scores of Mrs. Liao’s students on their first statistics test into L1 from page 71, ex. 92.
Sort Data (ascending): Inspire
Place cursor on column title
Select: Menu, 1:Actions, 6:Sort, sort by (a)
Inspire: Appendix A6
a. Find the five-number summary and verify your expectation from a.
Calculator activity
• Enter the scores into L1 from page 71.
• Calculator: 1 VAR STAT (L1)
Five-number summary: 43, 82, 87.75, 93, 98
mean = 2544/30 (or x̄) = 84.8
The median is greater than the mean.
b. What is the range of the middle half of the scores of the statistics students?
Between Q1 and Q3:
• Between 82 and 93
Acing the First Test Cont.
c. Construct by hand a modified boxplot of the stats students' scores.
First find potential outliers.
• IQR = 93 - 82 = 11
• Q1 - IQR x 1.5 = 82 - 16.5 = 65.5
• Q3 + IQR x 1.5 = 93 + 16.5 = 109.5
• Outliers: 43, 45
Graph: Mark a small x for the outlier(s), next lowest min, Q1, M, Q3, max. Draw box and whisker plot.
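The fence arithmetic in part c can be checked with a few lines of Python. This is only a sketch: it takes Q1 = 82 and Q3 = 93 from the calculator output above rather than recomputing them from the raw scores (which are not reproduced here), and the names `lower_fence`, `upper_fence`, and `is_outlier` are illustrative, not from the textbook.

```python
# Quartiles as reported in the activity (assumed correct; raw scores are on page 71).
q1, q3 = 82, 93

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # anything below this is a potential outlier
upper_fence = q3 + 1.5 * iqr     # anything above this is a potential outlier


def is_outlier(score):
    """Apply the 1.5 x IQR rule to a single score."""
    return score < lower_fence or score > upper_fence


print("IQR =", iqr)
print("fences =", lower_fence, upper_fence)
for score in (43, 45, 98):       # the two flagged scores and the maximum
    print(score, "outlier" if is_outlier(score) else "not an outlier")
```

Since the upper fence exceeds the maximum score of 98, only low scores can be outliers in this data set.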
Acing the First Test Cont.
d. On your calculator: First define Plot1 to be a modified boxplot using the list. Graph, trace and compare. Is there an outlier? If so, was it the same as in part a?
• Based on the boxplot, conjecture the shape of the corresponding histogram.
• Histogram shape: ______________________
Outliers: 43, 45
Acing the First Test Cont.
• Next, define Plot2 to be a histogram also using the same list. Trace and compare. Did you guess correctly? Roughly draw histogram below.
Important Note:
• If a distribution contains outliers, use the median and the IQR to describe the distribution.
The most common numerical description of a distribution is the standard deviation (s):
• measures spread by looking at how far the observations are from their mean
• The standard deviation (s) is the square root of the variance (s^2).
Variance (s^2) of a set of observations is the average of the squares of the deviations of the observations from their mean.
Note: Most of the time we will use calculator (STAT:CALC:1VAR STAT).
$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} $$
Why square the deviations?
• It makes them all nonnegative so that the observations far from the mean in either direction will have large positive squared deviation.
Properties of the Standard Deviation
• The sum of the deviations of the observations from their mean will always be zero.
• Choose s only when mean is chosen as the measure of center.
• s = 0 only when there is no spread (all observations have the same value).
• s, like the mean, is not resistant. Strong skewness or a few outliers can make s very large.
• As a rule of thumb, a value more than 2 standard deviations (2σ) from the mean may be flagged as a potential outlier.
Why divide by (n - 1)?
Degrees of freedom:
Since x̄ is the exact balancing point of the data, the data will almost always be closer to x̄, on average, than they will be to μ. The sum of the squared deviations from x̄ will underestimate the sum of the squared deviations from µ. To correct this we divide by n-1 instead of n.
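The claim that deviations from x̄ are the smallest possible can be illustrated numerically. The sketch below uses the Roger Maris home run counts from the example in this section and relies on the standard identity Σ(x − c)² = Σ(x − x̄)² + n(x̄ − c)², which holds for any center c. The candidate centers are hypothetical stand-ins for an unknown μ, chosen only for illustration.

```python
# Maris's home run counts (from the example in this section).
maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13]
n = len(maris)
x_bar = sum(maris) / n


def sum_sq_dev(data, center):
    """Sum of squared deviations of the data from an arbitrary center."""
    return sum((x - center) ** 2 for x in data)


about_mean = sum_sq_dev(maris, x_bar)

# Hypothetical values that the true mean (mu) might take.
candidate_centers = [20, 24, 26.1, 28, 32]
for c in candidate_centers:
    total = sum_sq_dev(maris, c)
    extra = n * (x_bar - c) ** 2      # the amount by which x_bar "undercounts"
    print(f"center {c}: sum of squares = {total:.2f} "
          f"(= {about_mean:.2f} + {extra:.2f})")
```

Every row is at least as large as the sum taken about x̄, which is why dividing that smaller sum by n would tend to understate the spread around μ.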
Example: Roger Maris
• New York Yankee Roger Maris held the single-season home run record from 1961 until 1998. Here are Maris’s home run counts for his 10 years in the American League:
14 28 16 39 61 33 23 26 8 13
a. Maris’s mean number of home runs is x̄ = 26.1. Find the standard deviation s from its definition (by hand).
$$ \sum (x_i - \bar{x})^2 = (14 - 26.1)^2 + (28 - 26.1)^2 + \dots = 2192.9 $$
$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{2192.9}{9} = 243.66 $$
$$ s = 15.609 $$
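The by-hand calculation in part a can be mirrored step by step in Python. This is a sketch for checking the arithmetic, not a replacement for the calculator procedure; the variable names are illustrative, and `statistics.stdev` (which divides by n − 1) is used only as a cross-check.

```python
import math
import statistics

# Maris's home run counts for his 10 American League seasons.
maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13]

n = len(maris)
x_bar = sum(maris) / n                          # mean

deviations = [x - x_bar for x in maris]         # x_i - x_bar
squared_devs = [d ** 2 for d in deviations]     # (x_i - x_bar)^2

sum_sq = sum(squared_devs)                      # numerator of the variance
variance = sum_sq / (n - 1)                     # divide by n - 1, not n
s = math.sqrt(variance)                         # standard deviation

print(f"mean = {x_bar:.1f}")
print(f"sum of deviations = {sum(deviations):.10f}")   # essentially zero
print(f"sum of squared deviations = {sum_sq:.1f}")
print(f"s^2 = {variance:.2f}")
print(f"s = {s:.3f}")
print(f"statistics.stdev agrees: {math.isclose(s, statistics.stdev(maris))}")
```

The second printed line also confirms the property that the deviations from the mean sum to zero, up to floating-point rounding.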
b. Use your calculator to verify your results. (STAT:CALC:1 var stat:L1)
• Then use your calculator to find the mean and s for the 9 observations that remain when you leave out any outlier(s).
Recall IQR x 1.5. Note: they choose 61 as an outlier while the upper bound is 61.5.
• Mean = 22.2
• Sx = 10.244
• How does leaving out the “outlier” affect the values of the mean and s?
It caused the values of both measures to decrease.
• Is s a resistant measure of spread?
Clearly, s is not a resistant measure of spread.
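The Maris results in part b, including the note that the upper bound is 61.5, can be reproduced with a short Python sketch. It assumes the quartile convention usually taught with this material (Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd); other software may use a different convention and give slightly different quartiles. The helper name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                                   # values below the median position
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]  # skip the median itself if n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13]

minimum, q1, median, q3, maximum = five_number_summary(maris)
upper_fence = q3 + 1.5 * (q3 - q1)
print("five-number summary:", minimum, q1, median, q3, maximum)
print("upper fence:", upper_fence)          # 61 falls just under this bound

# Leave out the 61-home-run season, as the exercise does.
without_61 = [x for x in maris if x != 61]
print(f"with 61:    mean = {statistics.mean(maris):.1f}, "
      f"s = {statistics.stdev(maris):.3f}")
print(f"without 61: mean = {statistics.mean(without_61):.1f}, "
      f"s = {statistics.stdev(without_61):.3f}")
```

Removing a single season lowers both the mean and s noticeably, which is what it means for s not to be resistant.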
Key Points of Chapter