CORRELATIONS: TESTING RELATIONSHIPS BETWEEN TWO METRIC VARIABLES Lecture 18:

Post on 13-Jan-2016



Agenda

Reminder about Lab 3

Brief Update on Data for Final

Correlations

Probability Revisited

To make a reasonable decision, we must know:

Probability Distribution: What would the distribution be like if it were only due to chance?

Decision Rule: What criteria do we need in order to determine whether an observation is just due to chance or not?

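Quick Recap of An Earlier Issue: Why N-1?

If we have a randomly distributed variable in a population, extreme cases (i.e., the tails) are less likely to be selected than common cases (i.e., within 1 SD of the mean). In addition, deviations in a sample are measured from the sample mean, which sits closer to the sampled values than the population mean does. One result of this: sample variance computed by dividing by n is, on average, lower than the actual population variance. Dividing by n-1 corrects this bias when calculating sample statistics.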
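The bias described above can be checked by simulation. The following is a minimal sketch, not part of the original lecture: the population (normal, SD = 10), the sample size (n = 5), the number of trials, and the function name `mean_variance_estimates` are all assumptions chosen for illustration.

```python
import random
import statistics


def mean_variance_estimates(pop_sd=10.0, n=5, trials=20000, seed=18):
    """Average the /n and /(n-1) variance estimates over many small samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    divide_by_n_total = 0.0
    divide_by_n_minus_1_total = 0.0
    for _ in range(trials):
        # Draw one small sample from the assumed normal population.
        sample = [rng.gauss(0.0, pop_sd) for _ in range(n)]
        divide_by_n_total += statistics.pvariance(sample)         # divides by n
        divide_by_n_minus_1_total += statistics.variance(sample)  # divides by n - 1
    return divide_by_n_total / trials, divide_by_n_minus_1_total / trials


biased, unbiased = mean_variance_estimates()
print(f"true population variance:  {10.0 ** 2:.1f}")
print(f"average of /n estimates:     {biased:.1f}")
print(f"average of /(n-1) estimates: {unbiased:.1f}")
```

Under these assumptions, the average of the divide-by-n estimates should fall noticeably below the true variance, while the divide-by-(n-1) average should land close to it.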

Checking for simple linear relationships

Pearson’s correlation coefficient

Measures the extent to which two metric or interval-type variables are linearly related.

The statistic is Pearson r, also called the linear or product-moment correlation.

Or, the correlation coefficient is the average of the cross products of the corresponding z-scores.

Correlations

$$r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i}$$

Ranges from -1 to 1, where -1 or 1 = perfect linear relationship between the two variables and 0 = no linear relationship. Negative values: negative relations. Positive values: positive relations.

Remember: correlation ONLY measures linear relationships, not all relationships!
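To make the z-score definition concrete, here is a minimal sketch that computes r as the sum of z-score cross products divided by N - 1 (using sample standard deviations, which is an assumption consistent with the N-1 recap earlier). The function name `pearson_r` and the three small data sets are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson r as the sum of z-score cross products divided by N - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sample standard deviations (n - 1 denominator).
    sd_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    # Multiply each pair of z-scores, then sum the products.
    cross_products = sum(
        ((a - mean_x) / sd_x) * ((b - mean_y) / sd_y) for a, b in zip(x, y)
    )
    return cross_products / (n - 1)


x = [-3, -2, -1, 0, 1, 2, 3]
r_linear = pearson_r(x, [2 * v + 1 for v in x])  # perfect positive line
r_inverse = pearson_r(x, [-v for v in x])        # perfect negative line
r_curved = pearson_r(x, [v ** 2 for v in x])     # exact relationship, but a curve

print(f"linear:  {r_linear:.3f}")
print(f"inverse: {r_inverse:.3f}")
print(f"curved:  {r_curved:.3f}")
```

The curved case is the point of the reminder above: y is completely determined by x, yet r comes out at essentially zero because the relationship is not linear.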

Interpretation

Recall that correlation is a precondition for causality, but by itself it is not sufficient to show causality (why?)

Correlation is a proportional (standardized) measure; it does not depend on the units in which the variables are measured.

Correlation interpretation:

Direction (+/-)

Magnitude of effect (-1 to 1); shown as r

Statistical significance (p<.05, p<.01, p<.001)
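One way to see that correlation does not depend on the specific measurement scale is to convert both variables to different units and recompute r. This is a small sketch with made-up height and weight values (hypothetical data, not from the lecture); `pearson_r` is a helper written here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson r from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical measurements, invented for this example only.
height_cm = [150, 160, 165, 172, 180, 191]
weight_kg = [52, 60, 63, 70, 82, 88]

# Same people, different units: inches and pounds.
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]

r_metric = pearson_r(height_cm, weight_kg)
r_imperial = pearson_r(height_in, weight_lb)
print(f"r (cm, kg): {r_metric:.6f}")
print(f"r (in, lb): {r_imperial:.6f}")
```

The two values should agree up to floating-point rounding, since converting units only rescales each variable by a positive constant.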

Correlation: Null and Alt Hypotheses

Null versus Alternative Hypothesis: H0 versus H1, H2, etc.

Test Statistics and Significance Level

Test statistic: Calculated from the data. Has a known probability distribution.

Significance level: Usually reported as a p-value (probability that a result would occur if the null hypothesis were true).

Example output (correlation coefficient, with its p-value on the row beneath):

              price       mpg

    price    1.0000

    mpg     -0.4686    1.0000
             0.0000
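The idea of a p-value can be illustrated with a permutation test, which builds the "only due to chance" distribution directly by shuffling one variable. This is a swapped-in technique for illustration, not necessarily the test that produced the output above, and the data below are hypothetical. The names `pearson_r` and `permutation_p_value` are assumptions of this sketch.

```python
import math
import random


def pearson_r(x, y):
    """Pearson r from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=5000, seed=18):
    """Two-sided p-value for H0: no linear association between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data: a negative trend plus random noise.
x = list(range(1, 21))
noise = random.Random(1)
y = [50 - 1.5 * v + noise.gauss(0, 4) for v in x]

r = pearson_r(x, y)
p = permutation_p_value(x, y)
print(f"r = {r:.3f}, permutation p = {p:.4f}")
```

Each shuffle simulates a world where the null hypothesis is true; the p-value is the share of those shuffles that produce a correlation at least as strong as the observed one.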

Factors which limit Correlation coefficient

Homogeneity of sample group

Non-linear relationships

Censored or limited scales

Unreliable measurement instrument

Outliers
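Two of the factors listed above, a homogeneous (range-restricted) sample and a single outlier, can be sketched with a small constructed data set. The values are invented for illustration: y follows x with a repeating +2, -2, -2, +2 disturbance, the "homogeneous" group keeps only x = 9 to 12, and the outlier (30, 30) is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson r from sums of squares and cross products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Full sample: x from 1 to 20, y tracks x with a repeating disturbance.
pattern = [2, -2, -2, 2]
x_full = list(range(1, 21))
y_full = [v + pattern[(v - 1) % 4] for v in x_full]

# Homogeneous group: only cases with x between 9 and 12.
pairs = [(a, b) for a, b in zip(x_full, y_full) if 9 <= a <= 12]
x_narrow = [a for a, _ in pairs]
y_narrow = [b for _, b in pairs]

# The same narrow group plus one extreme case.
x_outlier = x_narrow + [30]
y_outlier = y_narrow + [30]

r_full = pearson_r(x_full, y_full)
r_narrow = pearson_r(x_narrow, y_narrow)
r_outlier = pearson_r(x_outlier, y_outlier)
print(f"full range:          r = {r_full:.3f}")
print(f"homogeneous group:   r = {r_narrow:.3f}")
print(f"group + one outlier: r = {r_outlier:.3f}")
```

In this construction the underlying relationship never changes, yet restricting the range of x lowers r substantially and one extreme case pushes it back up, which is why both the sample composition and outliers need checking before interpreting a coefficient.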

Homogenous Groups [figure]

Homogenous Groups: Adding Groups [figure]

Homogenous Groups: Adding More Groups [figure]

Separate Groups (non-homogeneous) [figure]

Non-Linear Relationships [figure]

Censored or Limited Scales [figures, 2 slides]

Unreliable Instrument [figures, 3 slides]

Outliers [figures, 3 slides]

Examples with Real Data…