MAT 1000 Mathematics in Today's World. Last Time.

Post on 16-Jan-2016

MAT 1000

Mathematics in Today's World

Last Time

We learned how to calculate correlation (abbreviated $r$).

This is a number that tells us about the strength and direction of an association, when that association has the form of a straight line.

If $r$ is positive, the association has a positive direction; if $r$ is negative, so is the association.

The closer $r$ is to either 1 or -1, the stronger the association, and this means that a scatterplot of the data will look more like a straight line.
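To make the sign and size of $r$ concrete, here is a minimal sketch that computes the correlation for three small data sets. The data values are made up for illustration (they are not from the course), and the function name `correlation` is our own choice.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, chosen only to illustrate the three situations.
xs = [1, 2, 3, 4, 5]
rising = [2.1, 3.9, 6.2, 7.8, 10.1]   # close to a rising straight line
falling = [9.8, 8.1, 6.0, 4.2, 1.9]   # close to a falling straight line
scattered = [5, 1, 4, 2, 5]           # no clear straight-line pattern

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(round(r_rising, 3), round(r_falling, 3), round(r_scattered, 3))
```

The first value comes out close to 1, the second close to -1, and the third close to 0, matching the description above.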

Today

First: a warning about interpreting correlation.

We will also talk about least-squares regression.

This is a way to calculate the line that is the “best fit” for the data, in other words: a line that is a good approximation of the scatterplot.

The reason least-squares regression is important is that it allows us to make predictions where we don’t have any data—these predictions will be based on the pattern the data gives us.

“Correlation is not causation”

You may have heard this expression before. What does it mean?

Correlation can be evidence for a cause-and-effect relationship between two variables. If there is such a relationship and it is roughly linear, the variables will have a strong correlation.

On the other hand, variables can have a strong correlation even though there is no cause and effect relationship.

Example: Ice cream sales are correlated with drowning deaths.

This is obviously not a cause-and-effect relationship.

In this case the explanation is that ice cream sales and drowning deaths are both related to the weather. More ice cream is sold in the summer, and more people go swimming in the summer.

We call this relationship between ice cream sales and drowning deaths “mutual response.”
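A small simulation can show how this happens. In the sketch below every number is invented for illustration: the temperatures, the "wobble" terms, and the formulas linking temperature to sales and to drownings are all assumptions, not real data. Neither variable is computed from the other, yet their correlation comes out strong because both depend on temperature.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical monthly average temperatures, January to December.
temperature = [30, 34, 45, 57, 68, 77, 82, 80, 72, 60, 47, 35]

# Made-up month-to-month variation unrelated to temperature.
wobble_a = [3, -2, 1, -3, 2, 0, -1, 3, -2, 1, -3, 2]
wobble_b = [-1, 1, 0, 1, -1, 1, 0, -1, 1, 0, 1, -1]

# Each response depends on temperature only, never on the other response.
ice_cream_sales = [20 + 5 * t + a for t, a in zip(temperature, wobble_a)]
drowning_deaths = [0.2 * t + b for t, b in zip(temperature, wobble_b)]

r = correlation(ice_cream_sales, drowning_deaths)
print(round(r, 3))
```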

Correlation may not even be due to mutual response.

Example (The Pirate Effect)

The number of pirates is correlated with global average temperature: over the past few centuries the number of pirates has decreased, and global average temperatures have increased.

Is global warming caused by lack of pirates?

This is just a coincidence.

People call this kind of relationship a “nonsense correlation.”

For more nonsense correlations: www.tylervigen.com

Approximating scatterplots

Last time we calculated the correlation between the heights and weights of five male adults. Here is that same data as a scatterplot.

If you had to draw by hand a line that approximated the shape of this scatterplot, you could end up with any number of lines.

For example, maybe you would draw one line, or maybe a slightly different one.

But there is only one “least-squares regression line.”

Review of linear functions

The goal is to take a set of pairs of data and produce a line that approximates that data.

First, we need to review some facts about lines.

In mathematics we describe a line using a linear function.

Linear functions can be put into a special form, called slope-intercept form. This looks like:

$$y = mx + b$$

In the equation $y = mx + b$, the numbers $m$ and $b$ are called “constants.” This just means that they should have specific values.

Once we specify values for $m$ and $b$, we have the equation of a particular line. We never specify $x$ or $y$; both of these are “variables.”

We usually call the number $m$ the slope of the line, and $b$ is called the intercept.

But how does an equation of this form describe a line?

There are many pairs of numbers $x$ and $y$ that satisfy the equation.

How can we tell whether a given pair does?

Plug the first number in for $x$ and the second in for $y$. If the result is a true equation, the pair satisfies the equation.

Of course, not every pair of numbers satisfies the equation.

If plugging a pair in for $x$ and $y$ does not give a true equation, that pair does not satisfy it.

To find the graph of the line described by the equation, we need two pairs $x$ and $y$ that satisfy the equation.

We already have one, so let’s find one more.

What we can do is pick any number for $x$ and solve for $y$: plug the chosen value of $x$ into the equation and simplify to find what $y$ must be equal to.

Now we use these pairs to plot two points. The number $x$ gives the horizontal location of the point, and the number $y$ gives the vertical location.

Plot the first pair, add in the second pair, and then connect these two points with a line.
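The steps in this review can be sketched in a few lines of code. The line used here, with slope 2 and intercept 1, is a hypothetical example chosen for illustration; it is not necessarily the line used in the lecture, and the function names are our own.

```python
def satisfies(m, b, x, y):
    """Return True when the pair (x, y) satisfies y = m*x + b."""
    return y == m * x + b

def solve_for_y(m, b, x):
    """Pick any x; the matching y on the line is m*x + b."""
    return m * x + b

# Hypothetical constants: slope 2 and intercept 1.
m, b = 2, 1

on_line = satisfies(m, b, 1, 3)    # is 3 equal to 2*1 + 1?
off_line = satisfies(m, b, 2, 4)   # is 4 equal to 2*2 + 1?

# Two pairs that satisfy the equation are enough to draw the line.
first_point = (1, 3)
second_point = (4, solve_for_y(m, b, 4))

print(on_line, off_line, first_point, second_point)
```

Plotting `first_point` and `second_point` and connecting them gives the graph of this particular line.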

The least-squares regression line

A line will be determined by knowing the values of $m$ and $b$.

We will give formulas for finding each of these.

In the equation $y = mx + b$ we have two variables:

$x$ is the explanatory variable

$y$ is the response variable

Remember that our starting point in all of this is a collection of paired data.

Therefore we have two variables, $x$ and $y$.

We also have their means $\bar{x}$ and $\bar{y}$,

and their standard deviations $s_x$ and $s_y$.

We can also calculate their correlation $r$.

The formulas for the least-squares regression line use all of these numbers.

The slope of the least-squares regression line is

$$m = r \frac{s_y}{s_x}$$

The intercept is

$$b = \bar{y} - m \bar{x}$$

Notice the $m$ in the equation for the intercept: this is the slope (the same one we find with the previous formula).
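As a rough sketch of how these two formulas are used, the code below computes the slope from the correlation and the two standard deviations, and then the intercept from the two means and the slope. The data set is made up for illustration (it is not the height and weight data), and the function names are our own.

```python
from math import sqrt
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r of two equal-length lists of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Return (slope, intercept) using slope = r * s_y / s_x."""
    r = correlation(xs, ys)
    m = r * stdev(ys) / stdev(xs)    # slope
    b = mean(ys) - m * mean(xs)      # intercept uses the slope just found
    return m, b

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = least_squares_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")
```

For this made-up data the slope works out to about 0.6 and the intercept to about 2.2.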

Example: Last time we considered the heights and weights of five adult males. We found the following numbers:

So the least-squares regression line has slope

The intercept is

Example: Putting these together, the least-squares regression line is:

Note that none of the data actually lies on the line.

For a line to be the least-squares regression line, the sum of the squared vertical distances from the data points to the line must be as small as possible.

Nevertheless, the line need not (and usually does not) contain any of the data values.
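To see what “as small as possible” means, we can add up the squared vertical distances for a few candidate lines and compare. The sketch below uses made-up data, for which the least-squares line works out to slope 0.6 and intercept 2.2; the two “hand-drawn” lines are arbitrary guesses chosen for illustration.

```python
def sum_of_squared_errors(m, b, xs, ys):
    """Add up the squared vertical distances from each point to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# The least-squares line for this data: slope 0.6, intercept 2.2.
sse_least_squares = sum_of_squared_errors(0.6, 2.2, xs, ys)

# Two other lines someone might draw by hand (arbitrary guesses).
sse_guess_1 = sum_of_squared_errors(0.5, 2.5, xs, ys)
sse_guess_2 = sum_of_squared_errors(0.8, 1.5, xs, ys)

print(sse_least_squares, sse_guess_1, sse_guess_2)
```

Both guesses look reasonable on a scatterplot, but each gives a larger total than the least-squares line. None of the five points lies exactly on any of the three lines.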

Predictions

The most important application of least-squares regression lines is for making predictions.

If a scatterplot has a linear form, this suggests an underlying pattern.

Mathematically, we describe that pattern with the least-squares regression line.

We can then make predictions based on the pattern we see in the data we’ve collected.

Let’s use our least-squares regression line to make predictions.

What does our data predict will be the weight of a man who is 70 inches tall?

To find this, we plug 70 in for $x$ and find the corresponding $y$.

How about a man who is 76 inches tall?

In general, when you have paired data, you can find the least-squares regression line $y = mx + b$.

The variable $x$ always corresponds to the explanatory variable (if there is one), and $y$ is always the response variable.

You can use the regression line to predict values of the response variable for different values of the explanatory variable.

Just take the value of the explanatory variable, and substitute it for $x$ in the equation. The number you get for $y$ is the predicted response.

One danger in using least-squares regression for predictions is extrapolation.

Within the range of our data, the least-squares regression line should give reasonable predictions. But, if we plug in numbers too far outside that range, the predictions may no longer be reasonable.

In our original height and weight data, the heights range from 67 inches to 77 inches.

We can be confident that our least-squares regression line gives reasonable predictions for any height in this range.

What weight does our regression line predict for a man who is 5 feet tall (60 inches)?

105.8 pounds.

This is quite low: even 120 pounds is considered a low weight for a 5 foot tall man.

What about 50 inches?

28.5 pounds. This is obviously preposterous.
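The two predictions quoted above (105.8 pounds at 60 inches and 28.5 pounds at 50 inches) are enough to recover a line, since two points determine one. The sketch below does that and flags any height outside the 67 to 77 inch range of the data. Treat the recovered slope and intercept (about 7.73 and about -358) as an approximation implied by those rounded predictions, not as the exact values computed in the lecture. The function names are our own.

```python
def line_through(p1, p2):
    """Slope and intercept of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b

# The two predictions quoted in the notes: (height in inches, weight in pounds).
m, b = line_through((50, 28.5), (60, 105.8))

# Height range of the original data, as stated in the notes.
DATA_MIN, DATA_MAX = 67, 77

def predict_weight(height):
    """Return (predicted weight, True if the height is outside the data range)."""
    weight = m * height + b
    is_extrapolation = not (DATA_MIN <= height <= DATA_MAX)
    return weight, is_extrapolation

for h in (70, 76, 60, 50):
    w, extrapolated = predict_weight(h)
    note = " (extrapolation)" if extrapolated else ""
    print(f"{h} in -> {w:.1f} lb{note}")
```

Heights of 70 and 76 inches fall inside the data range, so those predictions can be taken more seriously than the ones for 60 and 50 inches.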