What is a Pearson Product Moment Correlation (independence)?

Post on 20-Jun-2015


Pearson Product Moment Correlation for Independence Questions

The Pearson Product Moment Correlation is the most widely used statistic for determining the relationship between, or independence of, two continuous variables.


By continuous we mean a variable that can take any value between two points. Here is an example:

Suppose the fire department mandates that all firefighters must weigh between 150 and 250 pounds. The weight of a firefighter would be an example of a continuous variable, since a firefighter's weight could take on any value between 150 and 250 pounds.

The Pearson Product Moment Correlation will indicate either a strong relationship between two variables (Variable A and Variable B) or a weak, even nonexistent, relationship.

A nonexistent relationship is what is meant by independence!

The Pearson Product Moment Correlation, or simply the Pearson Correlation, takes values ranging from -1.0 to +1.0:

-1.0 ---------- 0.0 ---------- +1.0

A Pearson Correlation of +1.0 indicates a perfect positive relationship. Note two qualities here:

(1) direction
(2) strength

A +1.0 Pearson Correlation's direction is positive and its strength is perfect (the strongest possible).
A -1.0 Pearson Correlation's direction is negative and its strength is also perfect.
A 0.0 Pearson Correlation has no direction and no strength. This would be evidence of independence between the two variables (strictly speaking, it shows the absence of a linear relationship).
A +0.3 Pearson Correlation's direction is positive and its strength is moderately weak.
A -0.1 Pearson Correlation's direction is negative and its strength is very weak.
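To make direction and strength concrete, here is a minimal Python sketch (not part of the original slides). It computes r as the average cross product of z-scores with population standard deviations, the same formula used in the worked example, and splits the result into a direction (the sign) and a strength (the absolute value). The helper names `pearson_r` and `describe` and the sample data are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson r as the average cross product of z-scores (population SDs)."""
    mx, sx = mean(x), pstdev(x)
    my, sy = mean(y), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)


def describe(r):
    """Split r into its direction (the sign) and its strength (the size)."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)


x = [1, 2, 3, 4, 5]              # hypothetical data
rising = [2 * v + 3 for v in x]  # points on a rising straight line
falling = [10 - v for v in x]    # points on a falling straight line

print(describe(pearson_r(x, rising)))   # positive direction, strength ~1.0
print(describe(pearson_r(x, falling)))  # negative direction, strength ~1.0
print(describe(-0.1))                   # ('negative', 0.1): very weak
```

Both perfect cases have the same strength; only the sign tells them apart.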

There is another quality as well. With a Pearson Correlation you are considering the relationship between, or independence of, only two variables.

Three’s a crowd: a single Pearson Correlation cannot describe three or more variables at once.

Bottom line: The Pearson Correlation is used only when exploring the relationship between, or independence of, two variables.

Let’s look at a fictitious problem to illustrate how the Pearson Correlation is calculated.

Imagine you are working for a company that is trying to convince patrons that ice cream is a dessert for all seasons. They ask you to conduct a study to determine whether average daily temperature and average daily ice cream sales are independent of one another.

Imagine the data set looks like this:

Ave Daily Temp    Ave Daily Ice Cream Sales
90°               560
80°               480
70°               350
60°               320
50°               230

Notice how as one variable (temperature) goes up, the other variable (ice cream sales) increases as well.

One way to look at this relationship is to rank order the values of both variables like so:

Ave Daily Temp    Rank    Ave Daily Ice Cream Sales    Rank
90°               1st     560                          1st
80°               2nd     480                          2nd
70°               3rd     350                          3rd
60°               4th     320                          4th
50°               5th     230                          5th

Notice how their rank orders are identical: the hottest day has the highest sales, the second-hottest day the second-highest sales, and so on. This is a very strong positive relationship; for these data the Pearson Correlation is about +0.99, very close to +1.0. (Identical rank orders alone do not guarantee a Pearson Correlation of exactly +1.0; that also requires the points to fall on a straight line.)

This means that higher values for one variable are associated with higher values for the other variable, or, equivalently, that lower values for one variable are associated with lower values for the other variable.
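Identical rank orders and a Pearson Correlation of exactly +1.0 are related but not the same thing. The minimal Python sketch below, assuming the temperatures are 90°, 80°, 70°, 60° and 50° (as the z-score step of the worked example implies), computes Pearson's r once on the ranks (which, with no ties, is Spearman's rank correlation) and once on the raw values. The helper names `pearson_r` and `ranks` are hypothetical.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Average cross product of z-scores (population SDs)."""
    mx, sx = mean(x), pstdev(x)
    my, sy = mean(y), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)


def ranks(values):
    """Rank 1 = largest value; assumes there are no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


temp = [90, 80, 70, 60, 50]
sales = [560, 480, 350, 320, 230]

print(ranks(temp), ranks(sales))             # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
print(pearson_r(ranks(temp), ranks(sales)))  # on the ranks: 1.0 up to float rounding
print(round(pearson_r(temp, sales), 3))      # on the raw values: 0.987
```

The ranks agree perfectly, but the raw values do not lie exactly on a straight line, so Pearson's r on the raw data is slightly below +1.0.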

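What would a perfectly negative correlation (-1.0) look like?

Ave Daily Temp    Rank    Ave Daily Ice Cream Sales    Rank
90°               1st     230                          5th
80°               2nd     320                          4th
70°               3rd     350                          3rd
60°               4th     480                          2nd
50°               5th     560                          1st

Here the rank orders are exactly reversed, meaning that higher values for one variable are associated with lower values for the other variable. (For these particular numbers the Pearson Correlation is about -0.99; it would be exactly -1.0 only if the points fell on a straight line.)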
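A minimal Python sketch of the negative case, pairing the same temperatures (assumed to be 90° down to 50°) with the sales figures in reversed order, as in the table above. The `pearson_r` helper is a hypothetical implementation of the average cross product of z-scores.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Average cross product of z-scores (population SDs)."""
    mx, sx = mean(x), pstdev(x)
    my, sy = mean(y), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)


temp = [90, 80, 70, 60, 50]
sales_reversed = [230, 320, 350, 480, 560]  # hottest day now has the lowest sales

r = pearson_r(temp, sales_reversed)
print(round(r, 3))  # -0.987: strongly negative, but not exactly -1.0
```

Reversing the sales column flips the sign of r without changing its size: the same data in the original order give about +0.987.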

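What would a zero correlation (0.0) look like?

Ave Daily Temp: 90° (1st), 80° (2nd), 70° (3rd), 60° (4th), 50° (5th)
Ave Daily Ice Cream Sales: 560 (1st), 480 (2nd), 350 (3rd), 320 (4th), 230 (5th)

[Figure: rank pairings between the two variables for a zero correlation]

If this were the result, then we could conclude that temperature and ice cream sales are independent of one another (strictly, that they show no linear relationship).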
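What a zero correlation looks like can also be checked numerically. The pairing below is a hypothetical re-ordering of the same five sales figures (not data from the original slides), chosen so that the positive and negative cross products cancel out; temperatures are assumed to be 90° down to 50°, and `pearson_r` is a hypothetical helper.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Average cross product of z-scores (population SDs)."""
    mx, sx = mean(x), pstdev(x)
    my, sy = mean(y), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)


temp = [90, 80, 70, 60, 50]
sales_scrambled = [230, 560, 480, 320, 350]  # hypothetical pairing, same five values

r = pearson_r(temp, sales_scrambled)
print(f"r = {r:.3f}")  # zero, apart from floating-point rounding
```

The columns contain exactly the same numbers as before; only the pairing changed, and with it the sign and size of each cross product.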

The Pearson Product Moment Correlation (PPMC) is calculated as the average cross product of the z-scores of two variables for a single group of people. Here is the equation for the PPMC:

$$ r = \frac{\sum (Z_X \cdot Z_Y)}{n} $$

where Z_X and Z_Y are the z-scores of the two variables and n is the number of observations.
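The equation translates almost line for line into code. This is a minimal Python sketch with hypothetical function names; it assumes the z-scores use the population standard deviation (dividing by n), which is what makes the average of the cross products equal r. If sample standard deviations (dividing by n - 1) were used instead, the sum would have to be divided by n - 1.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize: subtract the mean, then divide by the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """r = sum(Z_X * Z_Y) / n: the average cross product of z-scores."""
    if len(x) != len(y):
        raise ValueError("x and y need the same number of observations")
    cross_products = [zx * zy for zx, zy in zip(z_scores(x), z_scores(y))]
    return sum(cross_products) / len(x)


print(pearson_r([1, 2, 3], [2, 4, 6]))  # approximately 1.0 (an exact straight line)
```

A variable with no spread (all values equal) has a standard deviation of zero, so its z-scores, and therefore r, are undefined; this sketch raises `ZeroDivisionError` in that case.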

Let’s calculate the Pearson Correlation for the following data set:

Ave Daily Temp    Ave Daily Ice Cream Sales
90°               560
80°               480
70°               350
60°               320
50°               230

It is important to note that the Pearson Correlation can be computed in a matter of seconds using statistical software. The next set of slides is designed to help you see what is happening conceptually as well as computationally with the Pearson Correlation.

When computing a Pearson Correlation you will normally have two variables that DO NOT USE THE SAME METRIC: the metric for average daily temperature is degrees, while the metric for average daily ice cream sales is the number of ice cream sales.

So we have to get these two variables onto the same metric. This is done by calculating the z-scores, or standardized scores, for the values of each variable.
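Because z-scores strip away the original metric, r does not depend on which units were used. The minimal Python sketch below assumes, for illustration only, that the temperatures are in degrees Fahrenheit (the slides do not say), converts them to Celsius, and shows that the z-scores and r are unchanged. The helper names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Subtract the mean, then divide by the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Average cross product of z-scores."""
    return sum(a * b for a, b in zip(z_scores(x), z_scores(y))) / len(x)


temp_f = [90, 80, 70, 60, 50]                # assumed to be degrees Fahrenheit
temp_c = [(f - 32) * 5 / 9 for f in temp_f]  # the same days in degrees Celsius
sales = [560, 480, 350, 320, 230]

print(z_scores(temp_f))  # the z-scores in Fahrenheit ...
print(z_scores(temp_c))  # ... match the z-scores in Celsius, up to float rounding
print(pearson_r(temp_f, sales), pearson_r(temp_c, sales))  # same r either way
```

Any change of units that shifts and positively rescales a variable leaves its z-scores, and hence r, the same.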

So these raw score values, in separate metrics, are transformed into standardized values, which puts them on the same metric:

Different metric (raw scores):

Ave Daily Temp    Ave Daily Ice Cream Sales
90°               560
80°               480
70°               350
60°               320
50°               230

Same metric (z or standard scores):

Ave Daily Temp    Ave Daily Ice Cream Sales
+1.4              +1.5
+0.7              +0.8
 0.0              -0.3
-0.7              -0.6
-1.4              -1.3

• Note: this is done by subtracting the mean from each value (e.g., 90° minus 70° = 20°) and dividing the result by the standard deviation (e.g., 20 / 14.1 = 1.4).
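The standardization step can be checked with a short Python sketch. It assumes the temperatures are 90°, 80°, 70°, 60° and 50° and uses the population standard deviation (dividing by n), which reproduces the 14.1 used above; `z_scores` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Subtract the mean from each value, then divide by the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


temp = [90, 80, 70, 60, 50]  # assumed degrees
sales = [560, 480, 350, 320, 230]

print(round(pstdev(temp), 1), [round(z, 1) for z in z_scores(temp)])
# 14.1 [1.4, 0.7, 0.0, -0.7, -1.4]
print(round(pstdev(sales), 1), [round(z, 1) for z in z_scores(sales)])
# 117.5 [1.5, 0.8, -0.3, -0.6, -1.3]
```

The rounded output matches the z-score columns in the table.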


• Once the values are standardized, we multiply them:

$$ r = \frac{\sum (Z_X \cdot Z_Y)}{n} $$

Ave Daily Temp (z)       Ave Daily Ice Cream Sales (z)       Cross Products
+1.4             ×       +1.5                     =          2.1
+0.7             ×       +0.8                     =          0.6
 0.0             ×       -0.3                     =          0.0
-0.7             ×       -0.6                     =          0.4
-1.4             ×       -1.3                     =          1.9

(The cross products were computed from the unrounded z-scores and then rounded to one decimal place.)

These are called cross products because we are multiplying across two values.

Then we sum the cross products: 2.1 + 0.6 + 0.0 + 0.4 + 1.9 = 5.0 (about 4.93 before rounding).
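A short Python sketch of the multiply-and-sum step, assuming temperatures of 90° down to 50°. It works from unrounded z-scores, which is why the sum comes out at about 4.93 rather than exactly 5.0; `z_scores` is a hypothetical helper.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Subtract the mean, then divide by the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


temp = [90, 80, 70, 60, 50]
sales = [560, 480, 350, 320, 230]

# Multiply "across" each row: one cross product per observation.
cross_products = [zx * zy for zx, zy in zip(z_scores(temp), z_scores(sales))]

print([round(cp, 1) for cp in cross_products])  # approximately 2.1, 0.6, 0.0, 0.4, 1.9
print(round(sum(cross_products), 2))            # 4.93
```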

• Finally, divide that number (5.0) by the number of observations (in this case 5):

$$ r = \frac{\sum (Z_X \cdot Z_Y)}{5} = \frac{5.0}{5} \approx +1.0 $$

This is the Pearson Correlation, which in this case shows a very strong positive relationship. Note that the +1.0 comes from rounding: with unrounded z-scores the cross products sum to about 4.93, so r ≈ 4.93 / 5 ≈ +0.99, a nearly (but not exactly) perfect positive relationship.
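As a cross-check, this minimal Python sketch computes r three ways, assuming temperatures of 90° down to 50°: from unrounded z-scores, from the one-decimal cross products in the worked example, and from the equivalent raw-score formula r = Σ(dx·dy) / √(Σdx² · Σdy²), where dx and dy are deviations from the means. It shows how much of the "+1.0" is due to rounding.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(values):
    """Subtract the mean, then divide by the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


temp = [90, 80, 70, 60, 50]
sales = [560, 480, 350, 320, 230]
n = len(temp)

# 1) Unrounded: average cross product of the exact z-scores.
r_exact = sum(a * b for a, b in zip(z_scores(temp), z_scores(sales))) / n

# 2) Rounded: the one-decimal cross products from the worked example.
r_rounded = sum([2.1, 0.6, 0.0, 0.4, 1.9]) / n

# 3) Raw-score formula using deviations from the means (no z-scores needed).
mx, my = mean(temp), mean(sales)
sxy = sum((a - mx) * (b - my) for a, b in zip(temp, sales))
r_raw = sxy / sqrt(sum((a - mx) ** 2 for a in temp) * sum((b - my) ** 2 for b in sales))

print(round(r_exact, 3))    # 0.987
print(round(r_rounded, 1))  # 1.0
print(round(r_raw, 3))      # 0.987
```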

In summary: The Pearson Product Moment Correlation can range from -1.0 through 0.0 to +1.0.

-1.0 ---------- 0.0 ---------- +1.0

A correlation of 0.0 indicates no linear association between the variables of interest, which is evidence of independence (though not proof, since the Pearson Correlation detects only straight-line relationships).
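One caution on reading 0.0 as independence: r measures only linear association. In this minimal Python sketch with hypothetical data, y is completely determined by x (y = x²), yet r is 0 because the relationship is curved rather than straight. The `pearson_r` helper is a hypothetical z-score implementation.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Average cross product of z-scores (population SDs)."""
    mx, sx = mean(x), pstdev(x)
    my, sy = mean(y), pstdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)


x = [-2, -1, 0, 1, 2]   # hypothetical data
y = [v * v for v in x]  # y depends entirely on x, but not linearly

r = pearson_r(x, y)
print(abs(r) < 1e-12)  # True: no linear association despite total dependence
```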