Wed, June 26, (Lecture 8-2). Nonlinearity. Significance ...jackd/Stat203_2011/Wk08_2.pdf ·...

Wed, June 26, (Lecture 8-2). Nonlinearity. Significance test for correlation R-squared, SSE, and SST. Correlation in SPSS.

Wed, June 26, (Lecture 8-2). Nonlinearity.

Significance test for correlation

R-squared, SSE, and SST.

Correlation in SPSS.

Last time, we looked at scatterplots, which show the relationship between two variables, and at correlation.

The correlation coefficient r measures how well the pairs of values fit on a line.

r is positive when the two values increase together.

r is negative when one value goes up as the other goes down.

However, correlation only shows the linear relation between

two variables.

The variables could still be related in a non-linear way and

have little or no correlation.
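
As a minimal sketch of this point (the data and the helper name `pearson_r` are made up for illustration, not taken from the lecture), a perfect parabola is fully determined by x yet has a Pearson correlation of essentially zero, because its downward and upward halves cancel out:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y depends entirely on x, but not in a linear way.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_parabola = pearson_r(xs, ys)
print(f"r for y = x^2 on symmetric x values: {r_parabola:.3f}")
```

A scatterplot of these points would show an obvious pattern, which is why it is worth looking at the plot and not only at r.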

In real-world contexts, the most common form of non-linear relationship is a curvilinear one. (SOURCE: GAPMINDER.ORG)

One common reason is a scaling issue, where a fixed change in one thing doesn’t mean a fixed change in another.

Life expectancy increases with the logarithm of income, not

with income. (SOURCE: GAPMINDER.ORG)

When we rescale income into a log-scale (a scale that shows

very small and very large numbers equally well), a line appears.
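
The effect of rescaling can be sketched with invented numbers (these are not Gapminder data; the incomes, the life expectancies, and the helper `pearson_r` are all assumptions for illustration). Here each doubling of income adds the same number of years, so the relationship is a curve against income but a straight line against log-income:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up incomes that double each step, and made-up life expectancies
# that rise by 5 years per doubling.
incomes = [500, 1000, 2000, 4000, 8000, 16000, 32000]
life_expectancy = [40 + 5 * math.log2(income / 500) for income in incomes]

r_raw = pearson_r(incomes, life_expectancy)
r_log = pearson_r([math.log(income) for income in incomes], life_expectancy)

print(f"r using income:      {r_raw:.3f}")
print(f"r using log(income): {r_log:.3f}")
```

The base of the logarithm does not matter for r, since changing the base only multiplies every log value by the same constant.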

Another reason for non-linearity could be two competing

factors.

In a too-easy course, nobody learns anything new.

In a too-hard course, nobody learns anything at all.

Spearman correlation is a measure that can handle curves as long as the trend doesn’t switch between increasing and decreasing. The only time we’ll be using this is as a check in SPSS.

Everything else we do in Ch. 10 and 11 uses the Pearson correlation, which is restricted to linear relationships. We use the Pearson correlation because it produces stronger results and the math is simpler.
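
To illustrate the difference, here is a rough sketch that computes a rank-based (Spearman) correlation as the Pearson correlation of the ranks. The data and the helper names `pearson_r`, `ranks`, and `spearman_r` are assumptions for illustration, and the ranking helper assumes there are no tied values:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_r(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical curved but always-increasing relationship.
xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]

r_pearson = pearson_r(xs, ys)
r_spearman = spearman_r(xs, ys)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because the curve never switches direction, the ranks line up perfectly even though the raw values do not fall on a line.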

Math: The ugly sweater around an otherwise pretty graph.

You can do hypothesis testing. We may be interested in

whether or not there is a correlation between two variables.

Since samples are random, the sample correlation between

two variables will show up as a little above or below zero by

chance.

How far from zero correlation does something have to be

before it’s significant?

This formula gives the t-score of a correlation:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

The null hypothesis is: true correlation = zero.

The alternative is: correlation not zero.

The t in this formula is the same t-score as in chapters 6 and 7.

This t-score gets compared to the critical values in the t-table at n-2 degrees of freedom.

The stronger the correlation, the farther r goes from zero.

As r gets farther from zero, the t-score gets farther from zero too.

So a stronger correlation gives you a larger t-score (in absolute value).

A stronger correlation means better evidence of a correlation.

t-score also increases with sample size. As usual, it’s under a

square root.

Having more data points makes it easier to detect correlations.
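
A small sketch of how the t-score responds to r and n, assuming the formula in question is the standard t-score for a correlation, t = r·sqrt(n − 2) / sqrt(1 − r²). The function name `correlation_t_score` and the sample sizes are illustrative choices:

```python
import math

def correlation_t_score(r, n):
    """t-score for testing 'true correlation = 0', with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs of values")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Same weak correlation, growing sample size: t grows like sqrt(n - 2).
for n in (10, 27, 102):
    print(f"r = 0.25, n = {n:3d}: t = {correlation_t_score(0.25, n):.3f}")

# Same sample size, stronger correlation: t grows as r moves away from zero.
for r in (0.25, 0.50, 0.75):
    print(f"r = {r:.2f}, n = 10: t = {correlation_t_score(r, 10):.3f}")
```

Going from n = 27 to n = 102 multiplies n − 2 by four, so the t-score only doubles. That is the square root at work.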

A larger t-score means more evidence against the null, just like before.

So a large t-score means more evidence of a correlation.

If there’s a weak correlation and a small sample, we might not

detect it. (Example: n=10, r=.25)

t* = 1.397, at 8 df, 0.20 significance.

t* = 2.306, at 8 df, 0.05 significance.

No significant evidence of a correlation. ________
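
As a worked check of the n = 10, r = 0.25 example, assuming the t-score formula meant here is the standard one for a correlation:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.25\sqrt{10-2}}{\sqrt{1-0.25^2}}
  \approx \frac{0.25 \times 2.828}{0.968}
  \approx 0.730
```

Since 0.730 is below even the 0.20-level critical value of 1.397 at 8 df, the conclusion of no significant evidence follows.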

What if we get a larger sample of this correlation?

(n=46, r=0.25)

We should get some evidence of a correlation, but not much.

t* = 1.684, at 44 df, 0.10 significance.

t* = 2.021, at 44 df, 0.05 significance.

Weak evidence of a correlation, ____________
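
The same worked check for n = 46, r = 0.25, again assuming the standard t-score formula for a correlation:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.25\sqrt{46-2}}{\sqrt{1-0.25^2}}
  \approx \frac{0.25 \times 6.633}{0.968}
  \approx 1.713
```

This falls between the two critical values quoted above (1.684 and 2.021), so the result clears the 0.10 level but not the 0.05 level. Those two critical values appear to match the 40 df row of a standard t-table, the nearest row below 44 df in most printed tables.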

What happens when you get a near perfect correlation?

(Example: n=10, r=.99).

Expectation: Very strong evidence of a correlation.

t* = 2.306, at 8 df, 0.05 significance.

t* = 5.041, at 8 df, 0.001 significance.

Reality: Very strong evidence of a correlation.

The bottom of the t-score formula gets very small as r gets close to 1, and dividing by a small number gives you something huge.

The same thing happens with a near-perfect negative correlation, but the t-score is negative and huge.
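
A quick sketch of this effect, assuming the standard formula t = r·sqrt(n − 2) / sqrt(1 − r²), so that "the bottom" is sqrt(1 − r²). The function name `correlation_t_score` is an illustrative choice:

```python
import math

def correlation_t_score(r, n):
    """t-score for a sample correlation r from n pairs (assumes -1 < r < 1)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n = 10
for r in (0.25, 0.90, 0.99, 0.999):
    bottom = math.sqrt(1 - r * r)  # denominator of the formula
    t = correlation_t_score(r, n)
    print(f"r = {r:<5}  bottom = {bottom:.4f}  t = {t:8.2f}")

# A near-perfect negative correlation gives the mirror-image t-score.
t_pos = correlation_t_score(0.99, n)
t_neg = correlation_t_score(-0.99, n)
print(f"t for r = +0.99: {t_pos:.2f}")
print(f"t for r = -0.99: {t_neg:.2f}")
```

For the n = 10, r = .99 example, the t-score comes out near 19.85, far beyond the 0.001-level critical value of 5.041 at 8 df.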

For interest: You can always put a line exactly through two

points.

With only two points, we have no idea what the true

correlation is. Points after the first two tell us about

correlation.

That’s why correlation has n-2 degrees of freedom.
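
A minimal sketch of the two-point case (the points and the helper name `pearson_r` are made up for illustration): any two points with different x values and different y values give r = 1 or r = -1, no matter what the true relationship is.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Two arbitrary points: the line through them fits exactly.
r_falling = pearson_r([1, 4], [10, 3])
r_rising = pearson_r([2, 7], [0.5, 0.6])

print(f"falling pair: r = {r_falling:.3f}")
print(f"rising pair:  r = {r_rising:.3f}")
```

With n = 2 there are n - 2 = 0 degrees of freedom, so the sample carries no information for testing the correlation.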

More math? More ugly sweaters!

Show your pet some love by forcing it into a tea cosy.

First, we need to set down a convention. We’re looking at two

variables of the same object. We call these variables x and y.

Example: If we were talking about dragons, X could be the

length and Y could be the width.

X is the independent/explanatory variable (the one we control or can measure more precisely).

Y is the dependent/response variable.

When x and y are correlated, we say that some of the variation in y is explained by x.

Meaning: Across all the x, the range of y can be large.

But if we only consider a particular x (or a small x-interval), the

range of y shrinks considerably.

Y varies less for a particular X.

Y has less variance when accounting for X.

r² is the proportion by which the variance of y is reduced when accounting for x.

r = 0.6 in this graph, so r² = 0.6² = 0.36.

36% of the variation in Y is explained by X.

The same proportion of variance is explained for a negative correlation of equal strength. A negative times itself is positive, so r² is always between 0 and 1.

In a perfect correlation, knowing x automatically gives you y as well. So there is no variation in y left to explain.

r = 1 or -1, so r² = 1.

100% of the variation in y is explained by x.

When two values are uncorrelated, using a linear function of x to guess at y is useless.

r = 0, so r² = 0.

0% of the variation in y is explained by x.

The total squared difference from the mean of y is called the Total Sum of Squares, or SST.

SST is the total squared length of all the vertical red lines.

If we fit a line through the middle of the points in the scatterplot (called a regression line, the subject of chapter 11), the vertical lines, now drawn from each point to the fitted line, get shorter on average.

The total squared length of these lines is the Sum of Squared Error, or SSE.
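
Written out as formulas, the two sums look like this. The notation is an assumption for illustration: $\bar{y}$ is the mean of y, and $\hat{y}_i$ is the height of the fitted line at $x_i$.

```latex
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2
\qquad
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
```

SST measures the vertical distances to the flat line at the mean of y, and SSE measures the vertical distances to the fitted line.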

The stronger the correlation, the shorter the vertical lines get.

In other words, our errors get smaller, and the Sum of Squared Error shrinks with them.

Here, the correlation is very strong, and there are barely any errors at all.

r² can also be expressed in terms of SSE and SST:

$$r^2 = 1 - \frac{SSE}{SST}$$

SST is the total amount of variation in Y.

SSE is the amount of variation in Y left unexplained by X.

When r² is zero, SSE is the same as SST.

When r² is one, SSE disappears completely.
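
The link between r², SSE, and SST can be checked numerically. This is a sketch on invented data, assuming the fitted line is the ordinary least-squares regression line; all variable names are illustrative:

```python
import math

# Hypothetical data with a clear but imperfect upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 5, 3, 8, 7, 10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Pearson correlation.
r = sxy / math.sqrt(sxx * syy)

# Least-squares regression line: y_hat = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# SST: squared distances from the mean of y.
sst = sum((y - mean_y) ** 2 for y in ys)
# SSE: squared distances from the fitted line.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

print(f"r^2         = {r ** 2:.4f}")
print(f"1 - SSE/SST = {1 - sse / sst:.4f}")
```

The two printed values agree, which is the sense in which r² is the share of the variation in y that the line accounts for.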

An ugly sweater for every occasion!

Even SPSS!

To find a correlation in SPSS, go to

____ ____ ____ ____ ____ ____ (Means two-variable)

Pick the variables you want to correlate, drag them right.

Pearson correlation coefficient MUST be selected.

Spearman coefficient is optional.

There is a correlation of r = .940 between weight and height.

It’s a significant correlation, with a p-value of less than .001

(shows up as Sig. (2-tailed) = .000)

Also, anything correlates with itself perfectly, so the correlation between length and length is r = 1.

To build a scatterplot, go to

________________________________

Choose Simple Scatter if it’s not already picked, and click

________.

Move the independent variable into the x-axis and the dependent variable into the y-axis, then click OK (way at the bottom).

Our result:

There is a definite upward trend, so the strong positive

correlation of r = 0.940 makes sense.

Next time: Residuals, Outliers and Influence, and the

assumption of constant variance.