Correlation - Pennsylvania State University

Page 1: Correlation - Pennsylvania State University

Correlation

Page 2: Correlation - Pennsylvania State University

Correlation

A statistical method for measuring the relationship between two variables

Three characteristics

Direction of the relationship

Form of the relationship

Strength/Consistency

Page 3: Correlation - Pennsylvania State University

Direction of the Relationship

Positive correlation

Variables moving in the same direction

Negative correlation

Variables moving in opposite directions

Page 4: Correlation - Pennsylvania State University

Form of the Relationship

Linear or non-linear

Predicting data

Page 5: Correlation - Pennsylvania State University

Strength/Consistency

How well do the data fit the specific form?

Measured by the distance between the actual data and the predicted data

Expressed by the absolute value of the correlation, which measures the goodness of fit

1: Perfect fit

0: No fit at all

Page 7: Correlation - Pennsylvania State University

Correlation Measures

The Pearson correlation

Linear relationship

The sign of the correlation: direction

The numerical value: the degree of the relationship

The Spearman correlation

For an ordinal scale of measurement

Both the X values and the Y values are ranks

Measures the consistency of the relationship

Not necessarily linear

The point-biserial correlation

Used to measure the correlation between a regular (numerical) variable and a dichotomous variable

Page 8: Correlation - Pennsylvania State University

The Pearson Correlation

$$ r = \frac{SP}{\sqrt{SS_X \, SS_Y}} $$

r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)

r = (covariability of X and Y) / (variability of X and Y separately)

Page 9: Correlation - Pennsylvania State University

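Definitional formulas:

$$ SS_X = \sum (X - M_X)^2 $$

$$ SP = \sum (X - M_X)(Y - M_Y) $$

Computational formulas:

$$ SS_X = \sum X^2 - \frac{(\sum X)^2}{n} $$

$$ SP = \sum XY - \frac{\sum X \sum Y}{n} $$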

Page 10: Correlation - Pennsylvania State University

Check the Result

Using the scatterplot of data

Drawing the envelope around all data points

Checking the direction and shape of the

envelope

[Figure: scatterplot of the data below; X axis from 0 to 12, Y axis from 0 to 5]

X Y

0 1

10 3

4 1

8 2

8 3
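The formulas for SS, SP, and r can be checked by hand on these five points. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative, and it uses only the definitional and computational formulas given earlier in the deck.

```python
from math import sqrt

# Data from the "Check the Result" slide.
X = [0, 10, 4, 8, 8]
Y = [1, 3, 1, 2, 3]

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """Definitional formula: SS = sum of (X - M_X)^2."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def sum_of_products(xs, ys):
    """Definitional formula: SP = sum of (X - M_X)(Y - M_Y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def sum_of_products_computational(xs, ys):
    """Computational formula: SP = sum(XY) - sum(X) * sum(Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y)."""
    return sum_of_products(xs, ys) / sqrt(sum_of_squares(xs) * sum_of_squares(ys))

ss_x = sum_of_squares(X)       # 64.0
ss_y = sum_of_squares(Y)       # 4.0
sp = sum_of_products(X, Y)     # 14.0
r = pearson_r(X, Y)            # 14 / sqrt(64 * 4) = 0.875
print(ss_x, ss_y, sp, r)
```

A positive r of 0.875 agrees with an envelope that slopes upward and is fairly narrow.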

Page 11: Correlation - Pennsylvania State University

Interpreting Correlations

Prediction

Correlation describes only the relationship between two variables. It does not necessarily imply causation!

Page 13: Correlation - Pennsylvania State University

Interpreting Correlations

Prediction

Correlation describes only the relationship between two variables. It does not necessarily imply causation!

The value can be greatly affected by the range of the data.

Page 14: Correlation - Pennsylvania State University

Data Range and Correlation

Page 15: Correlation - Pennsylvania State University

Interpreting Correlations

Prediction

Correlation describes only the relationship between two variables. It does not necessarily imply causation!

The value can be greatly affected by the range of the data.

Outliers can dramatically affect the value.

Page 16: Correlation - Pennsylvania State University

Outlier and Correlation
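To make the point concrete, the sketch below recomputes r for the five points from the "Check the Result" slide after adding one extra point. The added point (20, 0) is hypothetical, chosen only for illustration; it does not come from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Data from the "Check the Result" slide.
X = [0, 10, 4, 8, 8]
Y = [1, 3, 1, 2, 3]

# Hypothetical outlier, invented for this illustration only.
OUTLIER = (20, 0)

r_original = pearson_r(X, Y)
r_with_outlier = pearson_r(X + [OUTLIER[0]], Y + [OUTLIER[1]])

print(f"without outlier: r = {r_original:.3f}")
print(f"with outlier:    r = {r_with_outlier:.3f}")
```

With this particular made-up point, a strong positive correlation turns into a weak negative one, which is why a scatterplot should be inspected before r is interpreted.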

Page 17: Correlation - Pennsylvania State University

The Strength of Relationship

Page 18: Correlation - Pennsylvania State University

The Strength of Relationship

The coefficient of determination

The squared value of the correlation (r²)

How much of the variance in the dependent variable is accounted for by the independent variable

Plays a role similar to the effect-size measures used with z- and t-tests

Page 19: Correlation - Pennsylvania State University

Hypothesis Tests with the

Pearson Correlation

Pearson correlation is usually computed for

sample data, but used to test hypotheses

about the relationship in the population

Population correlation shown by Greek letter

rho (ρ)

Non-directional: H0: ρ = 0 and H1: ρ ≠ 0

Directional: H0: ρ ≤ 0 and H1: ρ > 0 or

Directional: H0: ρ ≥ 0 and H1: ρ < 0

Page 20: Correlation - Pennsylvania State University

Population vs. Sample

Page 21: Correlation - Pennsylvania State University

Correlation Hypothesis Test

Sample correlation r used to test population ρ

Hypothesis test can be computed using

either t or F

Use t table to find critical value

$$ t = \frac{r - \rho}{s_r}, \qquad s_r = \sqrt{\frac{1 - r^2}{df}} $$

Page 22: Correlation - Pennsylvania State University

About df

What should the df be?

Suppose the sample size is n

$$ df = n - 2 $$

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Page 23: Correlation - Pennsylvania State University

Example

α = .05

n = 30

r = 0.35

Two-tailed test: critical value ±2.048

Fail to reject the null hypothesis

One-tailed test: critical value 1.701

Reject the null hypothesis

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{0.35}{\sqrt{\dfrac{1 - 0.35^2}{28}}} \approx 1.97 $$
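The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch under the slide's own assumptions (H0: ρ = 0, df = n - 2); the critical values are copied from the slide rather than computed, and the function name is illustrative.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r / sqrt((1 - r ** 2) / df), df

t, df = t_from_r(r=0.35, n=30)

# Critical values quoted on the slide for df = 28 and alpha = .05.
CRIT_TWO_TAILED = 2.048
CRIT_ONE_TAILED = 1.701

reject_two_tailed = abs(t) > CRIT_TWO_TAILED   # False: fail to reject
reject_one_tailed = t > CRIT_ONE_TAILED        # True: reject

print(f"t({df}) = {t:.3f}")
print("two-tailed:", "reject" if reject_two_tailed else "fail to reject")
print("one-tailed:", "reject" if reject_one_tailed else "fail to reject")
```

The same sample correlation leads to different decisions because the one-tailed test places the whole 5% rejection region in one tail.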

Page 24: Correlation - Pennsylvania State University

Using r Directly …

Page 25: Correlation - Pennsylvania State University

Report Correlations

A correlation for the data revealed a

significant relationship between amount of

education and annual income, r(28) = 0.65,

p < .01, two-tailed.

Page 26: Correlation - Pennsylvania State University

Usually, Multiple Variables

Involved in Correlation Tests

Page 27: Correlation - Pennsylvania State University

Partial Correlation

Involvement of

other factors in

correlation?

Page 28: Correlation - Pennsylvania State University

Partial Correlation

Page 29: Correlation - Pennsylvania State University

Partial Correlation

A partial correlation measures the

relationship between two variables while

mathematically controlling the influence of a

third variable by holding it constant

$$ r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} $$

Page 30: Correlation - Pennsylvania State University

Example

Number of Churches (X)   Number of Crimes (Y)   Population (Z)
 1                        4                     1
 2                        3                     1
 3                        1                     1
 4                        2                     1
 5                        5                     1
 7                        8                     2
 8                       11                     2
 9                        9                     2
10                        7                     2
11                       10                     2
13                       15                     3
14                       14                     3
15                       16                     3
16                       17                     3
17                       13                     3

$$ r_{xy \cdot z} = 0 $$
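The result for this example can be verified numerically. The sketch below reads the flattened slide table as fifteen (X, Y, Z) triples, which is an interpretation of the extracted text, and applies the partial correlation formula from the previous slide. Function names are illustrative.

```python
from math import sqrt

# Slide data, read as (churches, crimes, population) triples.
churches   = [1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17]
crimes     = [4, 3, 1, 2, 5, 8, 11, 9, 7, 10, 15, 14, 16, 17, 13]
population = [1] * 5 + [2] * 5 + [3] * 5

def pearson_r(xs, ys):
    """Pearson correlation: r = SP / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_xy = pearson_r(churches, crimes)
r_xz = pearson_r(churches, population)
r_yz = pearson_r(crimes, population)
r_xy_z = partial_r(r_xy, r_xz, r_yz)

print(f"r_xy = {r_xy:.3f}, r_xz = {r_xz:.3f}, r_yz = {r_yz:.3f}")
print(f"partial r_xy.z = {r_xy_z:.3f}")
```

Churches and crimes correlate strongly (about 0.92), yet the partial correlation is zero up to floating-point rounding: once population is held constant, no relationship remains.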

Page 31: Correlation - Pennsylvania State University

What if the relationship looks

like this?

Page 32: Correlation - Pennsylvania State University

The Spearman Correlation

To measure the degree of consistency of direction

Not necessarily linear

One extra step before calculating the Pearson correlation

Ranking the X and Y values

Analyze the correlation of the ranked values

X (value)   Y (value)   X (rank)   Y (rank)
1           3           2          2
6           4           4          3
2           5           3          4
0           2           1          1

Page 34: Correlation - Pennsylvania State University

Ranking Tied Scores

Tied scores receive the same rank

First rank all scores, giving tied scores consecutive positions

Then assign each tied score the mean of those positions

X (value)   Y (value)   X (rank)   Y (position)   Y (final rank)
1           3           2          2              2.5
6           3           4          3              2.5
2           5           3          4              4
0           2           1          1              1
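The tie rule can be written as a short function. This is a minimal sketch, assuming rank 1 goes to the smallest score; the name average_ranks is illustrative and not from the slides.

```python
def average_ranks(values):
    """Rank scores from smallest (rank 1) to largest.

    Tied scores each receive the mean of the positions they occupy.
    """
    # Indices of the scores in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of scores tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions are 1-based, so the run occupies pos + 1 .. end + 1.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

x_ranks = average_ranks([1, 6, 2, 0])
y_ranks = average_ranks([3, 3, 5, 2])
print(x_ranks)  # [2.0, 4.0, 3.0, 1.0]
print(y_ranks)  # [2.5, 2.5, 4.0, 1.0]
```

The two Y scores of 3 occupy positions 2 and 3, so each gets the rank 2.5, matching the slide.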

Page 35: Correlation - Pennsylvania State University

Special Formula for

Spearman Correlation

$$ SS = \frac{n(n^2 - 1)}{12} $$

$$ r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} $$
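Here D is taken to be the difference between the X rank and the Y rank of each individual, which is the usual reading of this formula. The sketch below applies it to the untied four-point example from the earlier Spearman slide and compares the result with the Pearson correlation of the ranks. It assumes no tied scores; helper names are illustrative.

```python
from math import sqrt

# Untied example from "The Spearman Correlation" slide.
X = [1, 6, 2, 0]
Y = [3, 4, 5, 2]

def ranks(values):
    """Rank 1 = smallest score. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def sum_of_squares(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sp / sqrt(sum_of_squares(xs) * sum_of_squares(ys))

rx, ry = ranks(X), ranks(Y)        # [2, 4, 3, 1] and [2, 3, 4, 1]
n = len(X)

# SS of the ranks 1..n equals n(n^2 - 1) / 12 when there are no ties.
ss_shortcut = n * (n ** 2 - 1) / 12

# Special formula: r_s = 1 - 6 * sum(D^2) / (n(n^2 - 1)).
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Pearson correlation computed on the ranks.
rs_pearson = pearson_r(rx, ry)

print(ss_shortcut, sum_of_squares(rx))
print(rs_formula, rs_pearson)
```

Both routes give 0.8 here. With tied ranks the shortcut is no longer exact, and the Pearson formula applied to the ranks is the safer choice.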

Page 36: Correlation - Pennsylvania State University

The Point-Biserial Correlation

Just like the Pearson correlation

One variable has only two values

Gender, success/failure, college education or not, …

The value of correlation has nothing to do with the

values you used in study (1/0, 1/-1, etc.)

Page 38: Correlation - Pennsylvania State University

Point-Biserial Correlation vs. t

Test

t test

t = 4

p <.001

df = 18

Point-Biserial

r = 0.686
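The two results on this slide are linked by a standard identity that the slide does not state explicitly: for an independent-measures t test, r² = t² / (t² + df). The sketch below assumes that identity and recovers only the magnitude of r; the sign depends on how the two groups are coded.

```python
from math import sqrt

def point_biserial_from_t(t, df):
    """Magnitude of the point-biserial r implied by an independent-measures t.

    Uses r^2 = t^2 / (t^2 + df), an identity assumed here rather than
    taken from the slides.
    """
    return sqrt(t ** 2 / (t ** 2 + df))

r = point_biserial_from_t(t=4, df=18)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
```

With t = 4 and df = 18 this gives r ≈ 0.686, the value reported on the slide.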

Page 39: Correlation - Pennsylvania State University

If we know two variables are

linearly related, how can we

describe such a relationship?

Using a linear equation

y = bx + a

Page 40: Correlation - Pennsylvania State University

Regression

Page 41: Correlation - Pennsylvania State University

Goal of Regression

Determining two constants for a linear

equation: y=bx+a

b: slope

a: intercept

Methods

The least-squares solution

Page 42: Correlation - Pennsylvania State University

Distance = Y - Ŷ

Minimizing Σ(Y - Ŷ)²

Page 43: Correlation - Pennsylvania State University

Formula

$$ b = \frac{SP}{SS_X} $$

$$ a = M_Y - b\,M_X $$
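As a worked example of these two formulas, the sketch below uses the eight data points that appear on the later "In Excel" slide, reading that flattened table as (X, Y) pairs. The function name is illustrative.

```python
# Data from the "In Excel" slide, read as eight (X, Y) pairs.
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]

def least_squares(xs, ys):
    """Return slope b = SP / SS_X and intercept a = M_Y - b * M_X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    a = my - b * mx
    return b, a

b, a = least_squares(X, Y)   # SP = 56, SS_X = 28, M_X = 4, M_Y = 7
print(f"Y-hat = {b}X + ({a})")
```

For these data the best-fitting line is Ŷ = 2X - 1.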

Page 44: Correlation - Pennsylvania State University

Regression in Excel

Draw a scatterplot

Show the trendline

Page 45: Correlation - Pennsylvania State University

Linear Equations and

Regression

The Pearson correlation

measures a linear relationship

between two variables

This figure

Makes the relationship easier to

see

Shows the central tendency of

the relationship

Can be used for prediction

Page 46: Correlation - Pennsylvania State University

Linear Equations

General equation for a line

Equation: Y = bX + a

X and Y are variables

a and b are fixed constants

Page 47: Correlation - Pennsylvania State University

Regression

Regression is a method of finding an

equation describing the best-fitting line for a

set of data

Least squares

Minimizing errors of known data

Or the error of prediction

Page 48: Correlation - Pennsylvania State University

Error of Prediction

With a linear function from regression, we

can calculate the predicted value based on a

given X

Ŷ

Error of prediction: Y- Ŷ

Often squared

Page 50: Correlation - Pennsylvania State University

Standard Error of Estimate

Regression equation makes a prediction

Precision of the estimate is measured by the

standard error of estimate (SEoE)

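$$ \text{SEoE} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} $$

$$ SS_{residual} = \sum (Y - \hat{Y})^2, \qquad df = n - 2 $$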
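A short sketch of the calculation, using the eight (X, Y) pairs from the "In Excel" slide and the least-squares line fitted to them. Variable names are illustrative.

```python
from math import sqrt

# Data from the "In Excel" slide, read as eight (X, Y) pairs.
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]
n = len(X)

# Least-squares slope and intercept: b = SP / SS_X, a = M_Y - b * M_X.
mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
b = sp / ss_x
a = my - b * mx

# Predicted values and errors of prediction (Y - Y-hat).
predicted = [b * x + a for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

ss_residual = sum(e ** 2 for e in residuals)   # 44.0
df = n - 2                                     # 6
seoe = sqrt(ss_residual / df)

print(f"SS_residual = {ss_residual}, df = {df}, SEoE = {seoe:.3f}")
```

The standard error of estimate comes out near 2.71, meaning that predictions from the line miss the observed Y values by roughly that amount on average.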

Page 52: Correlation - Pennsylvania State University

Relationship Between

Correlation and Standard Error

of Estimate

$$ SS_{regression} = r^2 \, SS_Y $$

$$ SS_{residual} = (1 - r^2) \, SS_Y $$

$$ \text{SEoE} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2)\, SS_Y}{n - 2}} $$
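These relationships mean that SS_residual can be obtained from r alone, without computing any predicted values. The sketch below illustrates this with the eight (X, Y) pairs from the "In Excel" slide; variable names are illustrative.

```python
from math import sqrt

# Data from the "In Excel" slide, read as eight (X, Y) pairs.
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)   # 156.0

r = sp / sqrt(ss_x * ss_y)
r_squared = r ** 2

# Partition SS_Y into the part predicted by X and the leftover part.
ss_regression = r_squared * ss_y
ss_residual = (1 - r_squared) * ss_y

seoe = sqrt(ss_residual / (n - 2))

print(f"r^2 = {r_squared:.3f}")
print(f"SS_regression = {ss_regression:.1f}, SS_residual = {ss_residual:.1f}")
print(f"SEoE = {seoe:.3f}")
```

The two parts add up to SS_Y (112 + 44 = 156), so a larger r² leaves less residual variability and a smaller standard error of estimate.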

Page 53: Correlation - Pennsylvania State University

Testing Regression

Significance

Analysis of Regression

Similar to Analysis of Variance

Uses an F-ratio of two Mean Square values

Each MS is an SS divided by its df

H0: the slope of the regression line (b or beta)

is zero

no regression

Page 54: Correlation - Pennsylvania State University

Mean Squares and F-ratio

$$ MS_{residual} = \frac{SS_{residual}}{df_{residual}} $$

$$ MS_{regression} = \frac{SS_{regression}}{df_{regression}} $$

$$ F = \frac{MS_{regression}}{MS_{residual}} $$
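A sketch of the F-ratio for the eight (X, Y) pairs from the "In Excel" slide. It assumes df_regression = 1 (a single predictor) and df_residual = n - 2; these degrees of freedom are the standard choice for simple regression and are stated here as an assumption, since the slide that showed them did not survive extraction.

```python
from math import sqrt

# Data from the "In Excel" slide, read as eight (X, Y) pairs.
X = [5, 1, 4, 7, 6, 4, 3, 2]
Y = [10, 4, 5, 11, 15, 6, 5, 0]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))
ss_x = sum((x - mx) ** 2 for x in X)
ss_y = sum((y - my) ** 2 for y in Y)
r = sp / sqrt(ss_x * ss_y)

ss_regression = r ** 2 * ss_y
ss_residual = (1 - r ** 2) * ss_y

# Assumed degrees of freedom for simple (one-predictor) regression.
df_regression = 1
df_residual = n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# The t statistic for the correlation, for comparison with F.
t = r / sqrt((1 - r ** 2) / df_residual)

print(f"F({df_regression}, {df_residual}) = {f_ratio:.2f}")
print(f"t({df_residual}) = {t:.2f}, t^2 = {t ** 2:.2f}")
```

F equals t² here, which is why the correlation hypothesis test can be carried out with either statistic.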

Page 55: Correlation - Pennsylvania State University

SS and df in

Regression Analysis

Page 56: Correlation - Pennsylvania State University

SPSS Output Example

Page 58: Correlation - Pennsylvania State University

In Excel

X   Y
5   10
1   4
4   5
7   11
6   15
4   6
3   5
2   0

Page 59: Correlation - Pennsylvania State University

ANOVA and Regression

Basically the same method, but different

perspectives to look at the results

Main effect in ANOVA == a variable in

regression

Interaction between two factors ==

multiplication of two variables in regression

Regression not only tells whether there is a difference, but also

predicts by how much.

Multivariate regression

Page 60: Correlation - Pennsylvania State University

Linear or Non-Linear

Regression?

Linear models are usually good enough for

most research in IST.

If the true relationship is non-linear, how can you tell

that the linear model you have is not appropriate?

Look at the distribution of the residuals

Page 63: Correlation - Pennsylvania State University

In Summary

Correlation: the relationship between two

variables

Direction, form, degree

Three methods

For different purposes

Regression

Determining the linear equation that best fits the data

Slope and intercept

Page 64: Correlation - Pennsylvania State University

Homework

Three problems to solve.