Basic Statistics Correlation Var Relationships Associations.

Post on 26-Mar-2015

Basic Statistics: Correlation

[Diagram: several variables (Var) linked to one another. Relationships and associations between variables carry information: do the variables covary?]

In Research

Independent variables: X1, X2, X3

Dependent variable: Y

The Concept of Correlation

Association or relationship between two variables

X and Y

Covary: go together

Co-relate: co-relation, denoted r

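Patterns of Covariation

Positive correlation: X and Y go together in the same direction (high X with high Y, low X with low Y)

Negative correlation: X and Y go in opposite directions (high X with low Y)

Zero or no correlation: X and Y do not covary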

Scatter plots allow us to visualize the relationships

Scatter Plots

The chief purpose of the scatter diagram is to study the nature of the relationship between two variables

Linear/curvilinear relationship

Direction of relationship

Magnitude (size) of relationship

Each point represents both the X score and the Y score of one case

Scatter Plot A: An illustration of a perfect positive correlation. [Figure: Variable Y against Variable X, both axes running from low to high; each X gives an exact value of Y.]

Scatter Plot B: An illustration of a positive correlation. [Figure: Variable Y against Variable X, both axes running from low to high; each X gives an estimated Y value.]

Scatter Plot C: An illustration of a perfect negative correlation. [Figure: Variable Y against Variable X, both axes running from low to high; each X gives an exact value of Y.]

Scatter Plot D: An illustration of a negative correlation. [Figure: Variable Y against Variable X, both axes running from low to high; each X gives an estimated Y value.]

Scatter Plot E: An illustration of a zero correlation. [Figure: Variable Y against Variable X, both axes running from low to high.]

Scatter Plot F: An illustration of a curvilinear relationship. [Figure: Variable Y against Variable X, both axes running from low to high.]
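A curvilinear pattern like Scatter Plot F is worth checking numerically, because Pearson's r only summarizes the linear part of a relationship. The sketch below uses made-up data (an inverted U, not taken from the slides) and a small helper, `pearson_r`, written here for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r computed from deviation scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical data: Y rises and then falls as X increases (inverted U).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [10 - (x - 4) ** 2 for x in xs]  # 1, 6, 9, 10, 9, 6, 1

r_curved = pearson_r(xs, ys)
print(round(r_curved, 2))  # prints 0.0
```

Y depends completely on X in this example, yet r is zero, which is why the scatter plot should be inspected before r is interpreted.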

The Measurement of Correlation

The degree of correlation between two variables can be described by such terms as “strong,” “low,” “positive,” or “moderate,” but these terms are not very precise.

If a correlation coefficient is computed between two sets of scores, the relationship can be described more accurately.

The Correlation Coefficient

A statistical summary of the degree and direction of relationship or association between two variables can be computed

Pearson’s Product-Moment Correlation Coefficient r

Scale of r: -1.00, -.50, 0, +.50, +1.00 (negative correlation below 0, no relationship at 0, positive correlation above 0)

Direction of relationship: sign (+ or -)

Magnitude: 0 through +1 or 0 through -1

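$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}}$$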

The Pearson Product-Moment Correlation Coefficient

Recall that the formula for a variance is:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum (X - \bar{X})(X - \bar{X})}{n - 1}$$

If we replaced the second X that was squared with a second variable, Y, it would be:

$$S_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{n - 1}$$

This is called a covariance and is an index of the relationship between X and Y.
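The step from variance to covariance can be sketched in a few lines of Python. The function names and the four pairs of scores below are illustrative assumptions, not part of the slides; both functions divide by n - 1, as in the variance formula recalled here.

```python
def variance(xs):
    """Sample variance: sum of (X - mean)(X - mean), divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    return sum((x - mean_x) * (x - mean_x) for x in xs) / (n - 1)

def covariance(xs, ys):
    """Sample covariance: the second (X - mean) factor is replaced by (Y - mean)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical scores, for illustration only.
xs = [2, 4, 6, 8]
ys = [1, 3, 2, 6]

print(round(variance(xs), 3))        # 20 / 3
print(round(covariance(xs, ys), 3))  # 14 / 3
print(covariance(xs, xs) == variance(xs))
```

The last line checks that the covariance of a variable with itself is its variance, which is the substitution the slide describes run in reverse.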

Conceptual Formula for Pearson r

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
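As a rough sketch, the conceptual formula for r translates almost term by term into Python. The function name `pearson_r_conceptual` and the five pairs of scores are assumptions made for this illustration.

```python
from math import sqrt

def pearson_r_conceptual(xs, ys):
    """Pearson r from deviation scores, following the conceptual formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]  # X_i minus the mean of X
    dev_y = [y - mean_y for y in ys]  # Y_i minus the mean of Y
    numerator = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    denominator = sqrt(sum(dx ** 2 for dx in dev_x) * sum(dy ** 2 for dy in dev_y))
    return numerator / denominator

# Hypothetical scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r_conceptual(xs, ys)
r_line = pearson_r_conceptual(xs, [2 * x + 1 for x in xs])  # points on a rising line
print(round(r, 4))       # 0.7746
print(round(r_line, 4))  # 1.0
```

Points that fall exactly on a rising straight line give r = 1, matching the perfect positive case in the scatter plots.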

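Calculation of Pearson r

This formula may be rewritten to reflect the actual method of calculation:

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}}$$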

Notice that this formula is simply the sum of squares for the covariance (SSxy) divided by the square root of the product of the sums of squares for X and Y (SSx and SSy).

Formulae for Sums of Squares

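$$SS_x = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

$$SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$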

Calculation of r Using Sums of Squares

Therefore, the formula for calculating r may be rewritten as:

$$r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}}$$
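The sums-of-squares route to r can be sketched as two small functions. The names `sums_of_squares` and `pearson_r` and the sample scores are illustrative assumptions; only raw sums are needed, with no deviation scores.

```python
from math import sqrt

def sums_of_squares(xs, ys):
    """Return (SSx, SSy, SSxy) computed from raw sums."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_x, ss_y, ss_xy

def pearson_r(xs, ys):
    """r = SSxy / sqrt(SSx * SSy)."""
    ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / sqrt(ss_x * ss_y)

# Hypothetical scores, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
r = pearson_r(xs, ys)
print(ss_x, ss_y, ss_xy)  # 10.0 6.0 6.0
print(round(r, 4))        # 0.7746
```

This form is convenient for hand calculation because each quantity comes straight from the column totals of a table of X, Y, X², Y², and XY.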

An Example

Suppose that a college statistics professor is interested in how the number of hours that students spend studying is related to the number of errors they make on the mid-term examination. To determine the relationship, the professor collects the following data:

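The Stats Professor’s Data

| Student | Hours Studied (X) | Errors (Y) | X² | Y² | XY |
|---|---|---|---|---|---|
| 1 | 4 | 15 | 16 | 225 | 60 |
| 2 | 4 | 12 | 16 | 144 | 48 |
| 3 | 5 | 9 | 25 | 81 | 45 |
| 4 | 6 | 10 | 36 | 100 | 60 |
| 5 | 7 | 8 | 49 | 64 | 56 |
| 6 | 7 | 4 | 49 | 16 | 28 |
| 7 | 7 | 6 | 49 | 36 | 42 |
| 8 | 9 | 2 | 81 | 4 | 18 |
| 9 | 9 | 4 | 81 | 16 | 36 |
| 10 | 12 | 3 | 144 | 9 | 36 |
| Total | ΣX = 70 | ΣY = 73 | ΣX² = 546 | ΣY² = 695 | ΣXY = 429 |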

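The Data Needed to Calculate the Sums of Squares

ΣX = 70, ΣY = 73, ΣX² = 546, ΣY² = 695, ΣXY = 429, n = 10

$$SS_x = \sum X^2 - \frac{(\sum X)^2}{n} = 546 - \frac{70^2}{10} = 546 - 490 = 56$$

$$SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 695 - \frac{73^2}{10} = 695 - 532.9 = 162.1$$

$$SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 429 - \frac{(70)(73)}{10} = 429 - 511 = -82$$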
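The arithmetic above can be double-checked from the raw scores. The sketch below re-enters the ten students' data from the professor's table; the variable names are chosen here for illustration.

```python
# Data from the professor's table: hours studied (X) and errors (Y).
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]
n = len(hours)

# Column totals of the table.
sum_x = sum(hours)                                  # 70
sum_y = sum(errors)                                 # 73
sum_x2 = sum(x * x for x in hours)                  # 546
sum_y2 = sum(y * y for y in errors)                 # 695
sum_xy = sum(x * y for x, y in zip(hours, errors))  # 429

# Sums of squares from the raw-score formulae.
ss_x = sum_x2 - sum_x ** 2 / n
ss_y = sum_y2 - sum_y ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

print(ss_x, round(ss_y, 1), ss_xy)  # 56.0 162.1 -82.0
```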

Calculating the Correlation Coefficient

$$r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{-82}{\sqrt{(56)(162.1)}} = -0.86$$

Thus, the correlation between hours studied and errors made on the mid-term examination is -0.86, indicating that more time spent studying is related to fewer errors on the mid-term examination. Hopefully an obvious conclusion, but now a statistically supported one!
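As a cross-check on the result, r can also be computed from deviation scores rather than from the column totals. This is a minimal sketch using the same ten pairs of scores; the variable names are illustrative.

```python
from math import sqrt

# Data from the professor's table: hours studied (X) and errors (Y).
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]
n = len(hours)

mean_x = sum(hours) / n   # 7.0
mean_y = sum(errors) / n  # 7.3

# Deviation-score (conceptual) form of Pearson r.
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, errors))
ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_y = sum((y - mean_y) ** 2 for y in errors)

r = sp / sqrt(ss_x * ss_y)
print(round(r, 2))  # -0.86
```

Both routes give the same value, as expected, since the raw-score formula is an algebraic rearrangement of the deviation-score one.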

Pearson Product-Moment Correlation Coefficient r

Scale of r: -1 (perfect negative correlation), between -1 and 0 (negative correlation), 0 (zero correlation), between 0 and +1 (positive correlation), +1 (perfect positive correlation)

Numerical values

[Figure: scale of r running from negative correlation through zero correlation to positive correlation, marked with the example values -.35 and .73 and the strength labels Perfect, Strong, and Moderate.]

The Pearson r and Marginal Distribution

The marginal distribution of X is simply the frequency distribution of the X’s; the marginal distribution of Y is the frequency distribution of the Y’s.

Bivariate Normal Distribution

[Figure: bivariate relationship between the X variable and the Y variable, with the marginal distribution of each shown along its axis.]

The marginal distributions of X and Y are precisely the same shape.

Interpreting r, the Correlation Coefficient

Recall that r includes two types of information:

The direction of the relationship (+ or -)

The magnitude of the relationship (0 to 1)

However, there is a more precise way to use the correlation coefficient, r, to interpret the magnitude of a relationship: the square of the correlation coefficient, r².

The square of r tells us what proportion of the variance of Y can be explained by X or vice versa.
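One way to see the "proportion of variance explained" reading is to predict Y from X with a least-squares line and compare the leftover variation with the total. The slides do not spell out this regression step, so the sketch below is an assumption about how to illustrate the claim; it reuses the hours-studied and errors data from the worked example.

```python
from math import sqrt

# Data from the professor's table: hours studied (X) and errors (Y).
hours = [4, 4, 5, 6, 7, 7, 7, 9, 9, 12]
errors = [15, 12, 9, 10, 8, 4, 6, 2, 4, 3]
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(errors) / n

ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_y = sum((y - mean_y) ** 2 for y in errors)  # total variation in Y
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, errors))
r = ss_xy / sqrt(ss_x * ss_y)

# Least-squares line for estimating Y from X (an added illustration).
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in hours]

# Variation in Y left unexplained by the line.
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(errors, predicted))
proportion_explained = 1 - ss_residual / ss_y

print(round(r ** 2, 2), round(proportion_explained, 2))  # 0.74 0.74
```

For these data about 74% of the variance in errors is accounted for by hours studied, and the remaining 26% is free to vary.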

How does correlation explain variance?

Suppose you wish to estimate Y for a given value of X.

[Figure: scatter plot of Variable Y against Variable X, both axes running from low to high. An illustration of how the squared correlation accounts for variance in X, r = .7, r² = .49: 49% of the variance is explained; the remainder is free to vary.]

Now, let’s look at some correlation coefficients and their corresponding scatter plots.

What is your estimate of r?

[Scatter plot: Current Salary (0 to 120000) against Beginning Salary (0 to 70000).]

r = .87, r² = .76 = 76%

What is your estimate of r?

[Scatter plot: Y (Current Salary axis) against X (Beginning Salary axis).]

r = -1.00, r² = 1.00 = 100%

What is your estimate of r?

[Scatter plot: Y (Current Salary axis) against X (Beginning Salary axis).]

r = +1.00, r² = 1.00 = 100%

What is your estimate of r?

[Scatter plot: Beginning Salary (0 to 70000) against Months since Hire (60 to 100).]

r = .04, r² = .002 = .2%

What is your estimate of r?

[Scatter plot: Vehicle Weight (lbs.) against Time to Accelerate from 0 to 60 mph (sec).]

r = -.44, r² = .19 = 19%