1.2 Describing Distributions with Numbers

Is the mean a good measure of center?

Ex. Roger Maris’s yearly homerun production:

8 13 14 16 23 26 28 33 39 61

Mean/Median (Centers)

Both measure center in different ways, but both are useful.

Use median if you want a “typical” number.

Mean = “Arithmetic Average Value”

The mean and median of a symmetric distribution are close together. If a distribution is exactly symmetric, mean = median.

In a skewed distribution, the mean is farther out in the long tail than the median.
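As a quick check on the Maris example, here is a minimal sketch using Python's standard statistics module. The variable names are only illustrative. It computes both centers, then drops the largest value (61) to see which measure moves more.

```python
from statistics import mean, median

# Roger Maris's yearly home run totals, as listed above.
maris = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]

maris_mean = mean(maris)      # sum of the observations divided by their count
maris_median = median(maris)  # midpoint of the sorted list

# Leave out the largest value to see how much each center shifts.
without_61 = [x for x in maris if x != 61]
mean_shift = maris_mean - mean(without_61)
median_shift = maris_median - median(without_61)

print(f"mean = {maris_mean}, median = {maris_median}")
print(f"dropping 61 lowers the mean by {mean_shift:.2f} "
      f"and the median by {median_shift:.2f}")
```

The mean sits above the median here, which is what the long right tail would lead us to expect.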

Measures of Spread

Range = Largest – Smallest Observations in a list. What’s the problem with this?

Better measure of spread: Quartiles.

Range, Quartiles, 5-Number Summary, Variance, Standard Deviation

Male/Female Surgeons (# of hysterectomies performed)

Put in ascending order (male drs.): odd number of observations

20 25 25 27 28 31 33 34 36 37 44 50 59 85 86

Min Q1 M Q3 Max

Put in ascending order (female drs.): even number of observations

5 7 10 14 18 19 25 29 31 33

Min Q1 M = 18.5 Q3 Max
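The blanks above can be checked with a short sketch. It assumes the usual classroom convention: Q1 is the median of the observations below the median's position and Q3 is the median of those above it, with the middle value left out when n is odd. Calculators and software may use slightly different quartile rules, and the function name is only illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]           # observations below the median position
    upper = xs[half + n % 2:]   # observations above it (middle value skipped if n is odd)
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

male = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
female = [5, 7, 10, 14, 18, 19, 25, 29, 31, 33]

male_summary = five_number_summary(male)
female_summary = five_number_summary(female)
print("male:  ", male_summary)
print("female:", female_summary)
```

With this rule the female median comes out to 18.5, matching the value given above.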

Boxplots

You can instantly see that female doctors perform fewer hysterectomies than male doctors. Also, there is less variation among female doctors.

Notes on boxplots

Best used for side-by-side comparisons of more than 1 distribution.

Less detail than histograms or stem plots.

Always include the numerical scale.

...\Simulations\Hotdog Data.xls

Travel Times to Work #1

How long does it take you to get from home to school? Here are the travel times from home to work in minutes for 15 workers in North Carolina, chosen at random by the Census Bureau:

30 20 10 40 25 20 10 60 15 40 5 30 12 10 10

The distribution…

Describe the distribution.

Is the longest travel time (60 minutes) an outlier?

How many of the travel times are larger than the mean?

If you leave out the large time, how does that change the mean?

The mean in this example is nonresistant because it is sensitive to the influence of extreme observations. The mean is the arithmetic average, but it may not be a “typical” number!
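The questions about the mean can be explored with a small sketch. It reads the North Carolina list as 15 separate values, one per worker. The variable names are only illustrative.

```python
from statistics import mean

# Travel times (minutes) for the 15 North Carolina workers.
nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]

nc_mean = mean(nc_times)
above_mean = [t for t in nc_times if t > nc_mean]

# Leave out the single 60-minute commute and recompute the mean.
trimmed = [t for t in nc_times if t != 60]
trimmed_mean = mean(trimmed)

print(f"mean of all {len(nc_times)} times: {nc_mean:.1f}")
print(f"times above the mean: {len(above_mean)}")
print(f"mean without the 60-minute time: {trimmed_mean:.1f}")
```

Removing one observation out of 15 moves the mean by more than two and a half minutes, which is what "nonresistant" means in practice.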

Travel Times to Work #2

Travel times to work in New York State are (on the average) longer than in North Carolina. Here are the travel times in minutes of 20 randomly chosen New York workers:

10 30 5 25 40 20 10 15 30 20 15 20 85 15 65 15 60 60 40 45

Interquartile Range (IQR)

Measures the spread of the middle ½ of the data.

An observation is an outlier if it is:

Less than Q1 – 1.5(IQR), or

Greater than Q3 + 1.5(IQR)
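Here is a sketch of the 1.5(IQR) rule applied to both sets of travel times. It assumes quartiles are found as medians of the lower and upper halves of the sorted data (software may use other rules). The function names are only illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    return median(xs[:half]), median(xs[half + n % 2:])

def outliers_by_iqr(data):
    """Return the observations flagged by the 1.5(IQR) rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

nc_times = [30, 20, 10, 40, 25, 20, 10, 60, 15, 40, 5, 30, 12, 10, 10]
ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

print("NC outliers:", outliers_by_iqr(nc_times))
print("NY outliers:", outliers_by_iqr(ny_times))
```

Note that the rule says "greater than", so a value that lands exactly on the fence is not flagged.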

Looking at the spread….

Quartiles show the spread of the middle ½ of the data.

The spacing of the quartiles and extremes about the median gives an indication of the symmetry or skewness of the distribution.

Symmetric distributions: 1st/3rd quartiles equally distant from the median.

In right-skewed distributions: 3rd quartile will be farther above the median than the 1st quartile is below it.
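To see this on real numbers, the sketch below compares the two quartile distances for the New York travel times. It is only a rough check under the median-of-halves quartile rule, not a formal test of skewness, and the names are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    return (xs[0], median(xs[:half]), median(xs),
            median(xs[half + n % 2:]), xs[-1])

ny_times = [10, 30, 5, 25, 40, 20, 10, 15, 30, 20,
            15, 20, 85, 15, 65, 15, 60, 60, 40, 45]

low, q1, m, q3, high = five_number_summary(ny_times)
lower_gap = m - q1   # how far Q1 sits below the median
upper_gap = q3 - m   # how far Q3 sits above the median

if upper_gap > lower_gap:
    shape = "right-skewed"
elif upper_gap < lower_gap:
    shape = "left-skewed"
else:
    shape = "roughly symmetric"

print(f"Q1 = {q1}, M = {m}, Q3 = {q3}")
print(f"M - Q1 = {lower_gap}, Q3 - M = {upper_gap}: looks {shape}")
```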

Is there a difference between the number of programmed telephone numbers in girls’ cell phones and the number of programmed numbers in boys’ cell phones? Do you think there is a difference? If so, in what direction?

1) Count the number of programmed telephone numbers in your cell phone and write the total on a piece of paper.

2) Make a back-to-back stemplot of this information, then draw boxplots. When you test for outliers, how many do you find for males and how many do you find for females using the 1.5 X IQR test?

3) Find the 5# Summary for each group. Compare the two distributions (SOCS!).

4) It is important in any study that you have “data integrity” (the data is reported accurately and truthfully). Do you think this is the case here? Do you see any suspicious observations? Can you think of any reason someone may make up a response or stretch the truth? If you DO see a difference between the two groups, can you suggest a possible reason for this difference?

5) Do you think a study of cell phone programmed numbers for a sophomore algebra class would yield similar results? Why or why not?

Spring ’09 Student Data

Girls: 53 457 24 136 222 106 237 75 296 154 275 70 134

Boys: 298 65 81 95 35 141 247 60 176 33

Standard Deviation: A measure of spread

Standard deviation looks at how far observations are from their mean.

It’s the natural measure of spread for the Normal distribution

We like s instead of s-squared (variance) since the units of measurement are easier to work with (original scale)

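The variance (s-squared) is the average of the squares of the deviations of the observations from their mean, dividing by n – 1 rather than n. The standard deviation s is the square root of the variance.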

S, like the mean, is strongly influenced by extreme observations. A few outliers can make s very large.

Skewed distributions with a few observations in the single long tail = large s. (S is therefore not very helpful in this case)

As the observations become more spread about the mean, s gets larger.

Mean vs. Median

Standard Deviation vs. 5-Number Summary

The mean and standard deviation are more common than the median and the five-number summary as measures of center and spread.

No single # describes the spread well.

Remember: A graph gives the best overall picture of a distribution. ALWAYS PLOT YOUR DATA!

The choice of mean/median depends upon the shape of the distribution.

When dealing with a skewed distribution, use the median and the 5# summary.

When dealing with reasonably symmetric distributions, use the mean and standard deviation.

The variance and standard deviation are…

LARGE if observations are widely spread about the mean

SMALL if observations are close to the mean

Degrees of Freedom (n-1)

Definition: the number of independent pieces of information that are included in your measurement.

Calculated from the size of the sample. They are a measure of the amount of information from the sample data that has been used up. Every time a statistic is calculated from a sample, one degree of freedom is used up.

If the mean of 4 numbers is 250, we have degrees of freedom (4-1) = 3. Why?

____ ____ ____ ____ mean = 250

If we freely choose numbers for the first 3 blanks, the 4th number HAS to be a certain number in order to obtain the mean of 250.
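The fill-in-the-blank argument can be sketched in a few lines. The three "free" numbers below are hypothetical choices made up for illustration; any three would do. The helper name is also just illustrative.

```python
def forced_last_value(free_values, target_mean):
    """Given all but one observation and the mean, return the last observation."""
    n = len(free_values) + 1
    # The total must equal n * mean, so the last value is whatever is left over.
    return n * target_mean - sum(free_values)

first_three = [100, 300, 420]   # hypothetical free choices for the first 3 blanks
fourth = forced_last_value(first_three, 250)

values = first_three + [fourth]
deviations = [x - 250 for x in values]

print("4th number must be:", fourth)
print("deviations from the mean:", deviations, "sum =", sum(deviations))
```

The deviations always sum to zero, so once n – 1 of them are known the last one is fixed. That is why the variance divides by n – 1.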

A person’s metabolic rate is the rate at which the body consumes energy. Metabolic rate is important in studies of weight gain, dieting, and exercise. Here are the metabolic rates of 7 men who took part in a study of dieting:

1792 1666 1362 1614 1460 1867 1439

Find the mean.

Column 1: Observations (x)

Column 2: Deviations

Column 3: Squared deviations

(TI-83: STAT/Calc/1-var-Stats L1 after entering list into L1)
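The three-column table can be built with a short sketch that mirrors the by-hand steps. This is one way to lay it out, not a replacement for the calculator steps, and the variable names are illustrative.

```python
from math import sqrt

# Metabolic rates of the 7 men in the dieting study.
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(rates)
x_bar = sum(rates) / n

# Column 1: x, Column 2: deviation from the mean, Column 3: squared deviation.
print(f"{'x':>6} {'x - mean':>10} {'(x - mean)^2':>14}")
sum_sq = 0.0
for x in rates:
    dev = x - x_bar
    sum_sq += dev ** 2
    print(f"{x:>6} {dev:>10.0f} {dev ** 2:>14.0f}")

variance = sum_sq / (n - 1)   # divide by degrees of freedom, n - 1
s = sqrt(variance)            # standard deviation, back in the original units

print(f"mean = {x_bar:.0f}, variance = {variance:.2f}, s = {s:.2f}")
```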

You do! (By Hand)

Let X = 3, 7, 15, 23. What are the variance and standard deviation?

You do! (using 1 Var Stats)

During the years 1929-1939 of the Great Depression, the weekly average hours worked in manufacturing jobs were 45, 43, 41, 39, 39, 35, 37, 40, 39, 36, and 37. What are the variance and standard deviation?
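After working both exercises, you can check your answers with a rough software stand-in for 1-Var Stats. The sketch below uses Python's statistics module, whose variance and stdev functions use the sample (n – 1) versions.

```python
from statistics import mean, stdev, variance

x = [3, 7, 15, 23]
hours = [45, 43, 41, 39, 39, 35, 37, 40, 39, 36, 37]

# variance() and stdev() divide by n - 1, matching s-squared and s above.
for name, data in (("X", x), ("hours", hours)):
    print(f"{name}: n = {len(data)}, mean = {mean(data):.3f}, "
          f"s^2 = {variance(data):.3f}, s = {stdev(data):.3f}")
```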

Miami Heat Salaries

1) Suppose that each member receives a $100,000 bonus. How will this affect the center, shape, and spread?

2) Suppose that each player is offered a 10% increase in base salary. What happens to the centers and spread?

Player Salary (millions of $)

Shaq 27.7

Eddie Jones 13.46

Wade 2.83

Jones 2.5

Doleac 2.4

Butler 1.2

Wright 1.15

Woods 1.13

Laettner 1.10

Smith 1.10

Anderson .87

Dooling .75

Wang .75

Haslem .62

Mourning .33
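Both salary questions can be explored numerically. This sketch assumes the salaries are in millions of dollars, so the $100,000 bonus is entered as 0.1. The helper name describe is only illustrative.

```python
from statistics import mean, median, stdev

# Salaries from the table, assumed to be in millions of dollars.
salaries = [27.7, 13.46, 2.83, 2.5, 2.4, 1.2, 1.15, 1.13,
            1.10, 1.10, 0.87, 0.75, 0.75, 0.62, 0.33]

def describe(data):
    """Collect a few measures of center and spread."""
    return {
        "mean": mean(data),
        "median": median(data),
        "stdev": stdev(data),
        "range": max(data) - min(data),
    }

base = describe(salaries)
with_bonus = describe([s + 0.1 for s in salaries])    # everyone gets $100,000
with_raise = describe([s * 1.10 for s in salaries])   # everyone gets 10% more

for label, stats in (("base", base), ("bonus", with_bonus), ("raise", with_raise)):
    print(label, {k: round(v, 3) for k, v in stats.items()})
```

Adding a constant shifts the measures of center but leaves the measures of spread alone. Multiplying by a constant rescales both center and spread.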