Chapter 3

06/20/22 Chapter 3: Scatterplots and Correlation


Page 1: Chapter 3

04/20/23 Chapter 3 1

Chapter 3

Scatterplots and Correlation

Page 2: Chapter 3

Both Ch 3 and Ch 4

• Relationships between two quantitative variables
  – X = explanatory variable
  – Y = response variable

• Illustrative Example: What is the relationship between “per capita gross domestic product” (X) and “life expectancy” (Y)?

Page 3: Chapter 3

Data: n = 10, “GDP & life expectancy”

Country           Per Capita GDP (X)    Life Expectancy (Y)
Austria           21.4                  77.48
Belgium           23.2                  77.53
Finland           20.0                  77.32
France            22.7                  78.63
Germany           20.8                  77.17
Ireland           18.6                  76.39
Italy             21.5                  78.51
Netherlands       22.0                  78.15
Switzerland       23.8                  78.99
United Kingdom    21.2                  77.37

Page 4: Chapter 3

Scatterplot: Life_Exp vs. GDP

[Figure: scatterplot with GDP on the horizontal axis (ticks 18 to 24) and LIFE_EXP on the vertical axis (ticks 76.0 to 79.5). The labeled data point is Switzerland (23.8, 78.99).]

Page 5: Chapter 3

Interpreting Scatterplots

1. Form: Straight? Curved?

2. Outliers: Deviations from overall pattern

3. Direction of association:
   – Positive association (upward)
   – Negative association (downward)
   – No association (flat)

4. Strength: Extent to which points adhere to predicted trend line (next slide)

Page 6: Chapter 3

[Figure: six example scatterplots, labeled: no association; moderate positive association; strong positive association; weak negative association; strong negative association; very strong negative association.]

Page 7: Chapter 3

Interpretation: life expectancy example

[Figure: scatterplot of LIFE_EXP (vertical axis, 76.0 to 79.5) against GDP (horizontal axis, 18 to 24), with the Switzerland data point (23.8, 78.99) labeled.]

• Form: linear
• Outliers: none
• Direction: positive
• Strength: hard to tell by eye

Page 8: Chapter 3

Example #2

Interpretation
• Form: linear
• Outliers: none
• Direction: positive
• Strength: looks strong

Page 9: Chapter 3

Example #3

• Form: linear
• Outliers: none
• Direction: negative
• Strength: weak(?)

Page 10: Chapter 3

Example #4 (Age & Health)

• Form: linear(?)
• Outliers: none
• Direction: negative(?)
• Strength: weak

Page 11: Chapter 3

Example #5 (Physical & Mental Health)

• Form: U-shaped
• Outliers: (?)
• Direction: down then up
• Strength: (?)

Page 12: Chapter 3

Strength is Difficult to Judge by Eye Alone

• These two figures display the same data set with different axis scaling, but the bottom figure looks “stronger” (an optical illusion)

• To overcome this difficulty: calculate the correlation coefficient r

Page 13: Chapter 3

Correlation Coefficient r

• Notation: r ≡ Pearson’s correlation coefficient
• Always between −1 and +1
  – r = +1: all points lie on an upward-sloping line
  – r = −1: all points lie on a downward-sloping line
  – r = 0: no linear trend (no line, or a horizontal line)
• The closer r is to +1 or −1, the stronger the correlation
• The positive or negative sign indicates the direction of the correlation

Page 14: Chapter 3

Guidelines for interpreting “strength” via r

• 0.0 ≤ |r| < 0.3 “weak”

• 0.3 ≤ |r| < 0.7 “moderate”

• 0.7 ≤ |r| < 1.0 “strong”
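The guideline bands above can be written as a small lookup. This is only a sketch: the function name `strength_label` is made up for illustration, and treating |r| = 1.0 as "strong" is an assumption, since the bands themselves stop just short of 1.0.

```python
def strength_label(r: float) -> str:
    """Return the guideline label ("weak", "moderate", "strong") for r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends only on the magnitude, not the sign
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    # Assumption: |r| = 1.0 (a perfect line) is also labeled "strong".
    return "strong"


for r in (0.10, 0.36, -0.94):
    direction = "positive" if r > 0 else "negative"
    print(f"r = {r:+.2f}: {strength_label(r)} {direction} correlation")
```

The sign is reported separately because it describes direction, while the label describes strength.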

Page 15: Chapter 3

Examples

• Husband’s age / Wife’s age
  – r = 0.94 (strong positive correlation)

• Husband’s height / Wife’s height
  – r = 0.36 (moderate positive correlation)

• Distance of golf putt / percent success
  – r = −0.94 (strong negative correlation)

Page 16: Chapter 3

Calculating r by hand

• Calculate mean and standard deviation of X

• Calculate mean and standard deviation of Y

• Turn all X values into “z scores”

• Turn all Y values into “z scores”

• Calculate r

\[ z_X = \frac{x_i - \bar{x}}{s_x} \]

\[ z_Y = \frac{y_i - \bar{y}}{s_y} \]

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} z_X z_Y \]
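The by-hand steps can be sketched in a few lines of Python. The function names `z_scores` and `correlation_r` and the small data set are hypothetical, chosen only to illustrate the procedure; `statistics.stdev` is used because it is the sample standard deviation (divisor n − 1), which matches the 1/(n − 1) factor in the formula for r.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with n >= 2")
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data, not taken from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation_r(x, y)
print(round(r, 3))  # 0.775
```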

Page 17: Chapter 3

What is a z score?

• z ≡ “standardized value”

• Tells you how many standard deviations a value lies above or below the mean

Examples:
• A z score of 1 indicates the value is 1 standard deviation above the mean
• A z score of –1 indicates the value is 1 standard deviation below the mean
• A z score of 0 indicates the value is equal to the mean
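A quick numerical check of these three cases, as a sketch: the helper `z_score` and the numbers used (a mean of 70 and a standard deviation of 5) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `value` lies from `mean`."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Hypothetical distribution: mean 70, standard deviation 5.
print(z_score(75, 70, 5))  # 1.0  -> one SD above the mean
print(z_score(65, 70, 5))  # -1.0 -> one SD below the mean
print(z_score(70, 70, 5))  # 0.0  -> equal to the mean
```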


Page 18: Chapter 3

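Calculating r by hand (Example)

X    Y    ZX    ZY    ZX ∙ ZY
1    8    -1     1    -1
2    6     0     0     0
3    4     1    -1    -1

\[ \bar{x} = \frac{6}{3} = 2, \qquad s_x = 1 \ \text{[by calculator]} \]

\[ \bar{y} = \frac{18}{3} = 6, \qquad s_y = 2 \ \text{[by calculator]} \]

For the first observation (x = 1, y = 8):

\[ z_X = \frac{1 - 2}{1} = -1, \qquad z_Y = \frac{8 - 6}{2} = 1, \qquad z_X z_Y = (-1)(1) = -1 \]

\[ \sum_{i=1}^{n} z_X z_Y = -1 + 0 + (-1) = -2 \]

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} z_X z_Y = \frac{1}{3-1}(-2) = -1 \]

[Figure: scatterplot of the three data points; X axis from 0 to 4, Y axis from 0 to 9.]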

Page 19: Chapter 3

r by hand can be tedious! (Life expectancy data)

X       Y        ZX        ZY        ZX ∙ ZY
21.4    77.48    -0.078    -0.345     0.027
23.2    77.53     1.097    -0.282    -0.309
20.0    77.32    -0.992    -0.546     0.542
22.7    78.63     0.770     1.102     0.849
20.8    77.17    -0.470    -0.735     0.345
18.6    76.39    -1.906    -1.716     3.271
21.5    78.51    -0.013     0.951    -0.012
22.0    78.15     0.313     0.498     0.156
23.8    78.99     1.489     1.555     2.315
21.2    77.37    -0.209    -0.483     0.101

x-bar = 21.52, sx = 1.532

y-bar = 77.754, sy = 0.795

\[ \sum_{i=1}^{n} z_X z_Y = 7.285 \]

Page 20: Chapter 3

Example: Calculating r

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} z_X z_Y = \frac{1}{10-1}(7.285) = 0.809 \]

r = 0.809: strong positive correlation
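The same result can be reproduced from the ten GDP and life expectancy values given earlier. This is a sketch using Python's standard `statistics` module rather than a calculator; the variable names are illustrative.

```python
from statistics import mean, stdev

# Per capita GDP (X) and life expectancy (Y) for the ten countries.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(gdp)
x_bar, s_x = mean(gdp), stdev(gdp)            # stdev divides by n - 1
y_bar, s_y = mean(life_exp), stdev(life_exp)

# z scores for each variable, then the sum of their products.
z_x = [(x - x_bar) / s_x for x in gdp]
z_y = [(y - y_bar) / s_y for y in life_exp]
total = sum(a * b for a, b in zip(z_x, z_y))

r = total / (n - 1)
print(f"sum of products = {total:.3f}, r = {r:.3f}")
# sum of products = 7.285, r = 0.809
```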

Page 21: Chapter 3

Calculating r

Use your calculator in 2-var mode!

[Figure: TI two-variable calculator]

Page 22: Chapter 3


Beware!

• r applies to linear relations only

• Outliers have large influences on r

• Association ≠ causation

Page 23: Chapter 3

Nonlinear relation (mpg vs. speed)

[Figure: two scatterplots of mpg (vertical axis, 0 to 35) against speed (horizontal axis, 0 to 100); the relationship is curved, with r = 0.]

Strong non-linear relationships can show r = 0
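To see how a strong curved pattern can give r = 0, consider a sketch with made-up numbers: the mpg values below are hypothetical (not read from the slide's plot) and rise to a peak at a speed of 50 before falling off symmetrically.

```python
from statistics import mean, stdev


def correlation_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical values: mpg climbs up to speed 50, then drops symmetrically.
speed = [20, 30, 40, 50, 60, 70, 80]
mpg = [24, 28, 31, 32, 31, 28, 24]

r = correlation_r(speed, mpg)
print(f"|r| = {abs(r):.3f}")
```

The upward half and the downward half cancel in the sum of products, so r is 0 (up to floating-point rounding) even though mpg is clearly related to speed.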

Page 24: Chapter 3

Outliers Have Undue Influence

With the outlier, r ≈ 0

Without the outlier, r ≈ 0.8
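The effect can be imitated with a small hypothetical data set (these points are invented for illustration and are not the ones plotted on the slide): five points with a clear upward trend, plus one far-off point at (9, 3).

```python
from statistics import mean, stdev


def correlation_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical points with an upward trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
outlier = (9, 3)  # hypothetical point far to the right and low

r_without = correlation_r(xs, ys)
r_with = correlation_r(xs + [outlier[0]], ys + [outlier[1]])

print(f"without the outlier: r = {r_without:.2f}")  # about 0.77
print(f"with the outlier:    r = {r_with:.2f}")     # about 0.06
```

A single point is enough to pull r from roughly 0.8 down to nearly 0, which is why it helps to look at the scatterplot before trusting r.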
