1.2 Describing Distributions with Numbers
Is the mean a good measure of center?
Ex. Roger Maris’s yearly home run production:
8 13 14 16 23 26 28 33 39 61
Mean/Median (Centers)
Both measure center in different ways, but both are useful.
Use median if you want a “typical” number.
Mean = “Arithmetic Average Value”
Mean/Median of a symmetric distribution are close together.
If a distribution is exactly symmetric, mean = median.
In a skewed distribution, the mean is farther out in the long tail than the median.
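A quick check of the mean/median comparison on the Maris data listed above. This is only a sketch using Python's standard `statistics` module; the variable names are made up for illustration.

```python
from statistics import mean, median

# Roger Maris's yearly home run totals, as listed above
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

maris_mean = mean(maris)      # 26.1
maris_median = median(maris)  # 24.5

# Drop the 61-home-run season and recompute both centers
without_61 = [x for x in maris if x != 61]
mean_without = mean(without_61)      # about 22.2
median_without = median(without_61)  # 23

print(maris_mean, maris_median)
print(round(mean_without, 1), median_without)
```

The one large value pulls the mean up by almost 4 home runs, while the median moves by only 1.5.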
Measures of Spread
Range = Largest – Smallest Observations in a list. What’s the problem with this?
Better measure of spread: Quartiles.
Range, Quartiles, 5# Summary, Variance, Standard Deviation
Male/Female Surgeons (# of hysterectomies performed)
Put in ascending order (male doctors; odd # of observations):
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
Min Q1 M Q3 Max
Put in ascending order (female doctors; even # of observations):
5 7 10 14 18 19 25 29 31 33
Min Q1 M = 18.5 Q3 Max
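The five-number summaries for the two groups can be sketched as below. This assumes the median-of-halves rule for quartiles (the median itself is left out of both halves when the count is odd). Calculators and other software may use slightly different quartile rules. The function name `five_number_summary` is made up for this example.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # observations below the median's position
    upper = xs[(n + 1) // 2 :]  # observations above the median's position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

male = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
female = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]

print(five_number_summary(male))    # (20, 27, 34, 50, 86)
print(five_number_summary(female))  # (5, 10, 18.5, 29, 33)
```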
Boxplots
You can instantly see that female doctors perform fewer hysterectomies than male doctors. Also, there is less variation among female doctors.
Notes on boxplots
Best used for side-by-side comparisons of more than 1 distribution.
Less detail than histograms or stem plots.
Always include the numerical scale.
...\Simulations\Hotdog Data.xls
Travel Times to Work #1
How long does it take you to get from home to school? Here are the travel times from home to work in minutes for 15 workers in North Carolina, chosen at random by the Census Bureau:
30 20 10 40 25 20 10 60 15 40 5 30 12 10 10
The distribution…
Describe.
Is the longest travel time (60 minutes) an outlier?
How many of the travel times are larger than the mean?
If you leave out the large time, how does that change the mean?
The mean in this example is nonresistant because it is sensitive to the influence of extreme observations.
The mean is the arithmetic average, but it may not be a “typical” number!
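One way to check the questions about the mean, sketched in Python. It assumes the 15 North Carolina times are read as 30 20 10 40 25 20 10 60 15 40 5 30 12 10 10 (the fused "1540" is taken to be 15 and 40, which gives the stated 15 workers). Variable names are illustrative only.

```python
from statistics import mean, median

# North Carolina travel times in minutes (15 workers)
nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]

nc_mean = mean(nc_times)                           # 337 / 15
above_mean = [t for t in nc_times if t > nc_mean]  # times larger than the mean
trimmed = sorted(nc_times)[:-1]                    # leave out the largest time (60)
mean_without_max = mean(trimmed)                   # 277 / 14

print(round(nc_mean, 1), len(above_mean))  # 22.5 6
print(round(mean_without_max, 1))          # 19.8
print(median(nc_times), median(trimmed))   # 20 17.5
```

Dropping the single 60-minute trip lowers the mean by almost 3 minutes, and only 6 of the 15 times are above the mean.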
Travel Times to Work #2
Travel times to work in New York State are (on the average) longer than in North Carolina. Here are the travel times in minutes of 20 randomly chosen New York workers:
10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45
Interquartile Range (IQR)
Measures the spread of the middle ½ of the data.
An observation is an outlier if it is:
Less than Q1 – 1.5(IQR), or
Greater than Q3 + 1.5(IQR)
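The 1.5(IQR) rule can be sketched as follows, applied to both travel-time data sets. It assumes IQR = Q3 – Q1, quartiles found by the median-of-halves rule, and the fused values in the data rows read as separate numbers (15 40, 15 30, 15 60). The helper names `quartiles` and `outliers` are made up for this example.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    return median(xs[: n // 2]), median(xs[(n + 1) // 2 :])

def outliers(data):
    """Return the observations flagged by the 1.5 * IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print(outliers(nc_times))  # []   (60 equals the upper cutoff 30 + 1.5 * 20, so it is not flagged)
print(outliers(ny_times))  # [85] (upper cutoff is 42.5 + 1.5 * 27.5 = 83.75)
```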
Looking at the spread….
Quartiles show spread of middle ½ of data.
Spacing of the quartiles and extremes about the median gives an indication of the symmetry or skewness of the distribution.
Symmetric distributions: 1st/3rd quartiles equally distant from the median.
In right-skewed distributions: 3rd quartile will be farther above the median than the 1st quartile is below it.
Is there a difference between the number of programmed telephone numbers in girls’ cell phones and the number of programmed numbers in boys’ cell phones? Do you think there is a difference? If so, in what direction?
1) Count the number of programmed telephone numbers in your cell phone and write the total on a piece of paper.
2) Make a back-to-back stemplot of this information, then draw boxplots. When you test for outliers, how many do you find for males and how many do you find for females using the 1.5 X IQR test?
3) Find the 5# Summary for each group. Compare the two distributions (SOCS!).
4) It is important in any study that you have “data integrity” (the data is reported accurately and truthfully). Do you think this is the case here? Do you see any suspicious observations? Can you think of any reason someone may make up a response or stretch the truth? If you DO see a difference between the two groups, can you suggest a possible reason for this difference?
5) Do you think a study of cell phone programmed numbers for a sophomore algebra class would yield similar results? Why or why not?
Spring ’09 Student Data
Girls: 53 457 24 136 222 106 237 75 296 154 275 70 134
Boys: 298 65 81 95 35 141 247 60 176 33
Standard Deviation: A measure of spread
Standard deviation looks at how far observations are from their mean.
It’s the natural measure of spread for the Normal distribution
We like s instead of s-squared (variance) since the units of measurement are easier to work with (original scale)
The variance s² is the average of the squares of the deviations of the observations from their mean, dividing by n – 1: s² = Σ(x – x̄)² / (n – 1). The standard deviation s is the square root of the variance.
s, like the mean, is strongly influenced by extreme observations. A few outliers can make s very large.
Skewed distributions with a few observations in the single long tail = large s. (s is therefore not very helpful in this case.)
As the observations become more spread about the mean, s gets larger.
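To see how much one extreme observation can inflate s, here is a small sketch using the New York travel times from earlier (read as 20 separate values). `statistics.stdev` is the sample standard deviation, which divides by n – 1.

```python
from statistics import stdev  # sample standard deviation (divides by n - 1)

ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

s_all = stdev(ny_times)
# Same data with the single largest time (85 minutes) removed
s_without_85 = stdev([t for t in ny_times if t != 85])

print(round(s_all, 1), round(s_without_85, 1))
```

The second value printed is smaller: removing just the 85-minute trip pulls s down.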
Mean vs. Median; Standard Deviation vs. 5-Number Summary
The mean and standard deviation are more common than the median and the five-number summary as measures of center and spread.
No single # describes the spread well.
Remember: A graph gives the best overall picture of a distribution. ALWAYS PLOT YOUR DATA!
The choice of mean/median depends upon the shape of the distribution.
When dealing with a skewed distribution, use the median and the 5# summary.
When dealing with reasonably symmetric distributions, use the mean and standard deviation.
The variance and standard deviation are…
LARGE if observations are widely spread about the mean
SMALL if observations are close to the mean
Degrees of Freedom (n-1)
Definition: the number of independent pieces of information that are included in your measurement.
Calculated from the size of the sample. They are a measure of the amount of information from the sample data that has been used up. Every time a statistic is calculated from a sample, one degree of freedom is used up.
If the mean of 4 numbers is 250, we have degrees of freedom (4-1) = 3. Why?
____ ____ ____ ____ mean = 250
If we freely choose numbers for the first 3 blanks, the 4th number HAS to be a certain number in order to obtain the mean of 250.
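The "4th number is forced" idea in a few lines of Python. The three free choices below are hypothetical values picked only for illustration, and `forced_last_value` is a made-up helper name.

```python
def forced_last_value(free_values, target_mean):
    """Given n - 1 freely chosen values, return the one value that fixes the mean."""
    n = len(free_values) + 1
    return n * target_mean - sum(free_values)

free = [100, 300, 275]                 # hypothetical free choices for the first 3 blanks
fourth = forced_last_value(free, 250)  # 4 * 250 - 675
print(fourth)                          # 325
print(sum(free + [fourth]) / 4)        # 250.0
```

Whatever three numbers go in the first blanks, only one value for the last blank gives a mean of 250, so only 3 of the 4 numbers are free to vary.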
A person’s metabolic rate is the rate at which the body consumes energy. Metabolic rate is important in studies of weight gain, dieting, and exercise. Here are the metabolic rates of 7 men who took part in a study of dieting:
1792 1666 1362 1614 1460 1867 1439
Find the mean.
Column 1: Observations (x)
Column 2: Deviations
Column 3: Squared deviations
(TI-83: STAT/Calc/1-var-Stats L1 after entering list into L1)
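The three-column calculation can also be sketched in Python as a check on the hand or calculator work. It assumes the sample formulas (divide by n – 1) described in the standard deviation notes. Variable names are illustrative.

```python
from math import sqrt

# Metabolic rates of the 7 men in the dieting study
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(rates)
x_bar = sum(rates) / n  # 1600.0

# Column 1: observation, Column 2: deviation, Column 3: squared deviation
print(f"{'x':>6} {'x - mean':>9} {'(x - mean)^2':>13}")
for x in rates:
    d = x - x_bar
    print(f"{x:>6} {d:>9.0f} {d * d:>13.0f}")

sum_sq = sum((x - x_bar) ** 2 for x in rates)  # 214870.0
variance = sum_sq / (n - 1)                    # divide by degrees of freedom, n - 1 = 6
s = sqrt(variance)

print(round(variance, 2), round(s, 2))         # 35811.67 189.24
```

The deviations in Column 2 always add up to 0, which is a useful check before squaring.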
You do! (using 1 Var Stats)
During the years 1929-1939 of the Great Depression, the weekly average hours worked in manufacturing jobs were 45, 43, 41, 39, 39, 35, 37, 40, 39, 36, and 37. What are the variance and standard deviation?
Miami Heat Salaries
1) Suppose that each member receives a $100,000 bonus. How will this affect the center, shape, and spread?
2) Suppose that each player is offered a 10% increase in base salary. What happens to the centers and spread?
Player Salary ($ millions)
Shaq 27.7
Eddie Jones 13.46
Wade 2.83
Jones 2.5
Doleac 2.4
Butler 1.2
Wright 1.15
Woods 1.13
Laettner 1.10
Smith 1.10
Anderson .87
Dooling .75
Wang .75
Haslem .62
Mourning .33
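A sketch for checking answers to the two salary questions. It assumes the salaries in the table are in millions of dollars, so the $100,000 bonus is entered as 0.1. The helper `summarize` is a made-up name, and `stdev` is the sample standard deviation.

```python
from statistics import mean, median, stdev

# Salaries from the table, assumed to be in millions of dollars
salaries = [27.7, 13.46, 2.83, 2.5, 2.4, 1.2, 1.15, 1.13,
            1.10, 1.10, 0.87, 0.75, 0.75, 0.62, 0.33]

def summarize(xs):
    """Return center and spread measures for a list of numbers."""
    return {"mean": mean(xs), "median": median(xs), "stdev": stdev(xs)}

base = summarize(salaries)
with_bonus = summarize([x + 0.1 for x in salaries])   # 1) add $100,000 to everyone
with_raise = summarize([x * 1.10 for x in salaries])  # 2) 10% raise for everyone

for label, stats in [("base", base), ("bonus", with_bonus), ("raise", with_raise)]:
    print(label, {k: round(v, 2) for k, v in stats.items()})
```

Adding the same amount to every salary shifts the mean and median by that amount and leaves the standard deviation (and the shape) unchanged. Multiplying every salary by 1.10 multiplies the mean, median, and standard deviation by 1.10.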