MAT 1000 Mathematics in Today's World. Last Time 1.Three keys to summarize a collection of data:...

Post on 24-Dec-2015


MAT 1000

Mathematics in Today's World

Last Time

1. Three keys to summarize a collection of data: shape, center, spread.

2. The distribution of a data set: which values occur, and how often they occur

3. Graph the distribution to describe its shape

Today

Two useful ways to describe the center of a distribution: mean and median.

How are they calculated?

They can be either statistics or parameters.

Why have two ways to find the center?

Describing Center

There are different notions of the "center" of a distribution.

The three most common are:

•Mean

•Median

•Mode

Mean

The mean is just another word for the average.

How do we calculate the mean of a list of numbers?

If we have n numbers then we add them up and then divide by the number n.

Mean

Example: 3, 1, 5, 7, 20

The mean (= average) in this case is: (3 + 1 + 5 + 7 + 20) / 5 = 36 / 5 = 7.2

Formula for the mean

Given n numbers x_1, x_2, ..., x_n, the mean of these numbers is:

(x_1 + x_2 + ... + x_n) / n

(I will give you this formula on tests.)
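The rule "add the numbers up, then divide by n" can be sketched in a few lines of Python. This is only an illustration; the function name `mean` is chosen here and is not part of the course material.

```python
def mean(values):
    """Add up the numbers, then divide by how many there are."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / n


example = [3, 1, 5, 7, 20]   # the numbers from the earlier example
print(mean(example))         # 36 / 5 = 7.2
```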

Statistics and parameters

Recall that a statistic is a number that describes a sample, while a parameter is a number that describes a population.

The mean of a set of numbers can be either one, depending on where those numbers come from.

Statistics and parameters

Example

Suppose I want to know the average height of all Wayne State students.

I could measure every WSU student, add up their heights, and divide by the number of WSU students.

This is a parameter.

Statistics and parameters

Example

On the other hand, I could use a sample of Wayne State students, say the students in this class.

I could measure the height of all MAT 1000 students, add up those heights, and divide by the number of MAT 1000 students.

This would be a statistic.

Statistics and parameters

There are two commonly used abbreviations for a mean, depending on whether it is a parameter or a statistic.

Parameter: μ

(This is a Greek letter. It is pronounced “mu.”)

Statistic: x̄

(This is pronounced “x bar.”)

Statistics and parameters

Later I will talk more about the relationship between the parameter μ and the statistic x̄.

For now, it’s enough to know that μ is usually unknown, and we estimate it using x̄.

Unless it’s stated otherwise, for now you can assume in homework and test problems that you are finding the population mean μ.
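To see the difference between the two kinds of mean, here is a small sketch. The heights are made up for illustration (they are not real measurements), and the names `population`, `sample`, `mu`, and `x_bar` are chosen here.

```python
# Hypothetical heights in inches, invented for this illustration only.
population = [62, 64, 65, 66, 67, 68, 69, 70, 71, 73]   # "every student"

# A sample is just some of the population, here four of the ten values.
sample = [population[i] for i in (0, 2, 5, 8)]          # "this class"

mu = sum(population) / len(population)   # parameter: describes the population
x_bar = sum(sample) / len(sample)        # statistic: describes the sample

print(mu)      # 67.5
print(x_bar)   # 66.5
```

The calculation is the same in both cases. What changes is which list of numbers goes in, and the sample mean is close to the population mean but not equal to it.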

Mean

How can we estimate the mean from the graph of a distribution?

The mean represents the “balance point” of the distribution.
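One way to check the "balance point" idea with numbers: the distances of the values below the mean and the distances of the values above it cancel out exactly. The sketch below uses the earlier example 3, 1, 5, 7, 20 and exact fractions so that rounding does not get in the way; the variable names are chosen here.

```python
from fractions import Fraction

values = [3, 1, 5, 7, 20]
mean = Fraction(sum(values), len(values))   # exactly 36/5, that is 7.2

# Signed distance of each value from the mean
# (negative means left of the mean, positive means right of it).
deviations = [v - mean for v in values]

# The left-hand and right-hand distances cancel, so the list "balances".
print(sum(deviations))   # 0
```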

Mean

It’s easy to see where a symmetric distribution balances…

…right in the middle.

Mean

What about an asymmetric distribution?

The midpoint of the distribution is clearly not the balance point. Here, the balance point is further to the right.


Mean

In a right-skewed distribution, the mean will be to the left of the midpoint.

Mean

Example: Suppose we look at 10 people’s savings accounts. Nine have $1 in their accounts, and the tenth has $1,000,000.

The mean is (9 × $1 + $1,000,000) / 10 = $100,000.90.

Does this represent the “typical” account size among these 10 people?

The very large savings account is clearly an outlier in that data set, and it is also the cause of the large mean.

Mean

As a measure of center, the mean is “susceptible to outliers.”

This also means that if a distribution is strongly skewed, the long tail will tend to pull the mean in the same direction.

Sometimes it is better to have a measure of center which is not susceptible to outliers.

Median

The median of an ordered list of numbers is the number in the middle.

The numbers must first be put in order from smallest to largest.

If the number of data values is odd, there is a middle number, and this is the median.

Example

1 3 5 9 9

The median here is 5.

Median

If the number of data values is even, there is no middle number.

Example

1 3 5 9 9 10

In that case, the median is the mean of the middle pair.

So here the median is (5 + 9) / 2 = 7.

Notice the median doesn’t need to be in the data set.
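The two cases (odd and even number of values) can be written out as a short Python sketch. The function name `median` is chosen here; it simply follows the steps above: sort, then take the middle number or the mean of the middle pair.

```python
def median(values):
    """Middle value of the sorted list, or the mean of the middle pair."""
    ordered = sorted(values)          # smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:                    # odd count: there is a middle number
        return ordered[middle]
    # even count: take the mean of the middle pair
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 3, 5, 9, 9]))        # 5
print(median([1, 3, 5, 9, 9, 10]))    # 7.0
```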

Median

Just like with means, medians may either be parameters or statistics.

There is no commonly used notation to distinguish a median which is a statistic and a median which is a parameter.

We won’t worry about notation for medians.

Median

Let’s revisit those ten people and their savings accounts.

What is the median of this data set?

1 1 1 1 1 1 1 1 1 1,000,000

There are ten values in this set, so the median is the mean of the middle pair. Both middle values are 1, so the median is 1.
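The savings example can be checked with Python's standard `statistics` module, which provides `mean` and `median` functions. The variable name `savings` is chosen here.

```python
import statistics

# Nine accounts with $1 and one account with $1,000,000
savings = [1] * 9 + [1_000_000]

print(statistics.mean(savings))     # 100000.9
print(statistics.median(savings))   # 1.0
```

The one very large account moves the mean a long way, while the median stays at the value most of the accounts actually have.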

Median

Estimating the median from the graph of a distribution is harder than estimating the mean.

But because the median is less sensitive to outliers than the mean, we can get a general idea of where it lies.

Median

In a symmetric distribution the median (green) will be close to the mean (blue).

Median

In a left-skewed distribution the mean (blue) is smaller than the median (green).

The “long tail” pulls the mean to the left.

Median

In a right-skewed distribution, the mean (blue) will be larger than the median (green).

Here the tail pulls the mean to the right.
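The pattern can be seen with small numbers too. The two data sets below are made up purely for illustration: one has a long tail toward large values, the other a long tail toward small values.

```python
import statistics

# Hypothetical data, invented only to illustrate the pattern.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]         # long tail to the right
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]    # long tail to the left

print(statistics.mean(right_skewed), statistics.median(right_skewed))
# 4.75 3.0   (mean larger than median)

print(statistics.mean(left_skewed), statistics.median(left_skewed))
# 16.25 18.0   (mean smaller than median)
```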

Comparing the mean and median

Example

How much money does the typical American earn?

In 2004, the mean income was $60,528.

The median was $43,389.

Why the discrepancy?

The distribution of incomes is skewed to the right: you can’t have an income less than $0, but there is no upper limit on income.

The number of people earning very large incomes is relatively small, but those large incomes affect the mean.

Comparing the mean and median

Example

The famous biologist Stephen Jay Gould was diagnosed with a form of cancer that had a median survival time of 8 months.

He lived another 20 years, dying of a different, unrelated cancer.

The median tells us what happens about half the time.

If 30-40% of people with a disease can be completely cured, the mortality distribution will be skewed to the right.

Comparing the mean and median

Whether to use the mean or median depends on the shape of the distribution.

For a symmetric distribution with few outliers, the mean is a good measure of center.

If the distribution is asymmetric or has lots of outliers, the median is a better choice.

How do you determine the shape of a distribution?

Look at a histogram!
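If no graphing tool is at hand, even a rough text histogram shows the shape. The sketch below assumes whole-number data and equal-width bins; the function name `text_histogram`, the bin width, and the data are all made up for illustration.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text line per bin, with one star per data value."""
    # Assign each whole-number value to the starting point of its bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>7} | " + "*" * counts[start])
    return lines


# Hypothetical data with one large value far from the rest.
data = [1, 2, 2, 3, 3, 3, 4, 20]
for line in text_histogram(data, bin_width=5):
    print(line)
```

For this data, almost all the stars land in the first bin and a single star sits far away in the last one, which is the picture of a right-skewed distribution with an outlier.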