Chapter 3
04/20/23 Chapter 3 1
Chapter 3
Scatterplots and Correlation
Both Ch 3 and Ch 4
• Relationships between two quantitative variables
  – X: explanatory variable
  – Y: response variable
• Illustrative Example: What is the relationship between “per capita gross domestic product” (X) and “life expectancy” (Y)?
Data: n = 10, “GDP & life expectancy”
Country          Per Capita GDP (X)   Life Expectancy (Y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37
Scatterplot: Life_Exp vs. GDP
[Figure: scatterplot of LIFE_EXP (vertical axis, 76.0 to 79.5) against GDP (horizontal axis, 18 to 24). The labeled data point is Switzerland (23.8, 78.99).]
Interpreting Scatterplots
1. Form: Straight? Curved?
2. Outliers: Deviations from overall pattern
3. Direction of association:
   – Positive association (upward)
   – Negative association (downward)
   – No association (flat)
4. Strength: Extent to which points adhere to predicted trend line (next slide)
[Figure: six example scatterplots, titled: No association; Moderate positive assn.; Strong positive assn.; Weak negative assn.; Strong negative assn.; Very strong negative assn.]
Interpretation: life expectancy example
[Figure: scatterplot of LIFE_EXP against GDP, with the Switzerland data point (23.8, 78.99) labeled.]
• Form: linear
• Outliers: none
• Direction: positive
• Strength: hard to tell by eye
Example #2
Interpretation
• Form: linear
• Outliers: none
• Direction: positive
• Strength: looks strong
Example #3
• Form: linear
• Outliers: none
• Direction: negative
• Strength: weak(?)
Example #4 (Age & Health)
• Form: linear(?)
• Outliers: none
• Direction: negative(?)
• Strength: weak
Example #5 (Physical & Mental Health)
• Form: U-shaped
• Outliers: (?)
• Direction: down then up
• Strength: (?)
Strength is Difficult to Judge by Eye Alone
• These two figures display the same data set with different axis scaling, but the bottom figure looks “stronger” (optical illusion)
• To overcome this difficulty: calculate correlation coefficient r
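One way to see why r sidesteps the axis-scaling illusion is to check that it does not change when the units of X or Y change. The sketch below uses the ten GDP and life expectancy pairs from the data slide; the helper name `pearson_r` and the particular unit changes (X multiplied by 1000, Y multiplied by 12) are illustrative assumptions, not part of the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Data from the "GDP & life expectancy" slide.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

r_original = pearson_r(gdp, life_exp)

# Hypothetical change of units: stretch both axes by constant factors.
gdp_rescaled = [1000 * x for x in gdp]
life_rescaled = [12 * y for y in life_exp]
r_rescaled = pearson_r(gdp_rescaled, life_rescaled)

print(round(r_original, 3), round(r_rescaled, 3))
```

Both values agree, because z-scores remove the units before the products are summed.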
Correlation Coefficient r
• Notation: r ≡ Pearson’s correlation coefficient
• Always between −1 and +1
  – r = +1: all points on upward sloping line
  – r = −1: all points on downward sloping line
  – r = 0: no line or horizontal line
• The closer r is to +1 or −1, the stronger the correlation
• Positive or negative sign indicates direction of correlation
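The two extreme cases can be checked numerically. This is a minimal sketch that assumes the z-score formula for r given later in these slides; the helper name `pearson_r` and the two made-up lines are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
upward = [2 * x + 1 for x in xs]     # hypothetical points on an upward-sloping line
downward = [10 - 3 * x for x in xs]  # hypothetical points on a downward-sloping line

r_up = pearson_r(xs, upward)
r_down = pearson_r(xs, downward)
print(round(r_up, 6), round(r_down, 6))
```

For an exactly horizontal line the standard deviation of Y is 0, so this formula would divide by zero; the function above does not handle that case.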
Guidelines for interpreting “strength” via r
• 0.0 ≤ |r| < 0.3 “weak”
• 0.3 ≤ |r| < 0.7 “moderate”
• 0.7 ≤ |r| < 1.0 “strong”
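The guideline cut-offs above can be written as a small lookup. This is a sketch only: the function name `strength_label` is hypothetical, and labeling |r| = 1.0 as "strong" is an assumption, since the guideline does not list that case.

```python
def strength_label(r):
    """Label the strength of a correlation using the 0.3 and 0.7 cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength ignores the sign; the sign gives direction
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"  # assumption: |r| = 1.0 is also labeled "strong"

for r in (0.94, 0.36, -0.94):
    direction = "positive" if r > 0 else "negative"
    print(r, strength_label(r), direction)
```

The three values in the loop are the r values quoted in the Examples slide of this chapter.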
Examples
• Husband’s age / Wife’s age: r = .94 (strong positive correlation)
• Husband’s height / Wife’s height: r = .36 (moderate positive correlation)
• Distance of golf putt / percent success: r = −.94 (strong negative correlation)
Calculating r by hand
• Calculate mean and standard deviation of X
• Calculate mean and standard deviation of Y
• Turn all X values into “z scores”
• Turn all Y values into “z scores”
• Calculate r
$z_X = \frac{x_i - \bar{x}}{s_x}$
$z_Y = \frac{y_i - \bar{y}}{s_y}$
$r = \frac{1}{n-1} \sum_{i=1}^{n} z_X z_Y$
What is a z score?
• z ≡ “standardized value”
• Tells you how far a value lies above or below the mean, in standard deviation units
Examples:
• A z score of 1 indicates the value is 1 standard deviation above the mean
• A z score of −1 indicates the value is 1 standard deviation below the mean
• A z score of 0 indicates the value is equal to the mean
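As an illustration, the GDP values from the data slide can be standardized in a few lines. The function name `z_scores` is hypothetical, and the standard deviation is assumed to be the sample version (divisor n − 1), which matches the sx = 1.532 reported for these data later in the slides.

```python
from math import sqrt

def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Per capita GDP (X) for the ten countries, in slide order.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
z_gdp = z_scores(gdp)

print("Austria:", round(z_gdp[0], 3))      # slightly below the mean
print("Switzerland:", round(z_gdp[8], 3))  # about 1.5 standard deviations above the mean
```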
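Calculating r by hand (Example)
X    Y    ZX    ZY    ZX ∙ ZY
1    8    −1     1    −1
2    6     0     0     0
3    4     1    −1    −1
$\bar{x} = \frac{1}{3}(6) = 2$, $s_x = 1$ [by calculator]
$\bar{y} = \frac{1}{3}(18) = 6$, $s_y = 2$ [by calculator]
For the first observation:
$z_X = \frac{1 - 2}{1} = -1$
$z_Y = \frac{8 - 6}{2} = 1$
$z_X \cdot z_Y = (-1)(1) = -1$
$\sum_{i=1}^{n} z_X z_Y = -1 + 0 + (-1) = -2$
$r = \frac{1}{n-1} \sum_{i=1}^{n} z_X z_Y = \frac{1}{3-1}(-2) = -1$
[Figure: scatterplot of the three points (1, 8), (2, 6), (3, 4); horizontal axis 0 to 4, vertical axis 0 to 9.]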
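The three-point example (X = 1, 2, 3; Y = 8, 6, 4) can be replayed step by step as a check on the hand calculation. The variable names are illustrative; the steps follow the "Calculating r by hand" list.

```python
from math import sqrt

xs = [1, 2, 3]
ys = [8, 6, 4]
n = len(xs)

# Step 1 and 2: means and sample standard deviations (divisor n - 1).
mean_x = sum(xs) / n
mean_y = sum(ys) / n
s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))

# Step 3 and 4: turn every value into a z score.
z_x = [(x - mean_x) / s_x for x in xs]
z_y = [(y - mean_y) / s_y for y in ys]

# Step 5: sum the products of paired z scores and divide by n - 1.
products = [zx * zy for zx, zy in zip(z_x, z_y)]
r = sum(products) / (n - 1)

print(mean_x, s_x, mean_y, s_y)
print(products, r)
```

The result r = −1 matches the picture: all three points lie exactly on a downward-sloping line.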
r by hand can be tedious! (Life expectancy data)
X      Y       ZX       ZY       ZX ∙ ZY
21.4   77.48   −0.078   −0.345    0.027
23.2   77.53    1.097   −0.282   −0.309
20.0   77.32   −0.992   −0.546    0.542
22.7   78.63    0.770    1.102    0.849
20.8   77.17   −0.470   −0.735    0.345
18.6   76.39   −1.906   −1.716    3.271
21.5   78.51   −0.013    0.951   −0.012
22.0   78.15    0.313    0.498    0.156
23.8   78.99    1.489    1.555    2.315
21.2   77.37   −0.209   −0.483    0.101
x-bar = 21.52, sx = 1.532
y-bar = 77.754, sy = 0.795
$\sum_{i=1}^{n} z_X z_Y = 7.285$
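The table above is the kind of bookkeeping that is easy to hand to a program. The sketch below recomputes the z scores, their products, and the sum for the ten countries; the variable names are illustrative, and small differences in the last decimal place are possible because the slide rounds each z score before multiplying.

```python
from math import sqrt

gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life_exp = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]
n = len(gdp)

mean_x, mean_y = sum(gdp) / n, sum(life_exp) / n
s_x = sqrt(sum((x - mean_x) ** 2 for x in gdp) / (n - 1))
s_y = sqrt(sum((y - mean_y) ** 2 for y in life_exp) / (n - 1))

total = 0.0
for x, y in zip(gdp, life_exp):
    z_x = (x - mean_x) / s_x
    z_y = (y - mean_y) / s_y
    total += z_x * z_y
    # One row of the table: X, Y, ZX, ZY, ZX * ZY.
    print(f"{x:5.1f} {y:6.2f} {z_x:7.3f} {z_y:7.3f} {z_x * z_y:7.3f}")

r = total / (n - 1)
print(f"sum of products = {total:.3f}, r = {r:.3f}")  # the slides report 7.285 and .809
```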
Example: Calculating r
$r = \frac{1}{n-1} \sum z_X z_Y = \frac{1}{10-1}(7.285) = 0.809$
r = .809: strong positive correlation
Calculating r
Use your calculator in 2-var mode!
[Figure: TI two-variable calculator]
Beware!
• r applies to linear relations only
• Outliers have a large influence on r
• Association ≠ causation
Nonlinear relation (mpg vs. speed)
[Figure: scatterplot of mpg against speed; horizontal axis (speed) from 0 to 100, vertical axis from 0 to 35.]
Strong non-linear relationships can show r = 0
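A small numerical check makes the point concrete. The speeds and mpg values below are invented for illustration (they are not the data behind the slide's figure): mpg rises, peaks at a middle speed, then falls symmetrically, so Y is completely determined by X and yet r comes out at 0. The helper name `pearson_r` is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: an inverted U with its peak at speed 50.
speed = [20, 30, 40, 50, 60, 70, 80]
mpg = [30 - (s - 50) ** 2 / 100 for s in speed]

r = pearson_r(speed, mpg)
print(mpg)
print(f"|r| = {abs(r):.3f}")
```

The upward half and the downward half cancel in the sum of z-score products, which is why a scatterplot should be inspected before r is trusted.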
Outliers Have Undue Influence
With the outlier, r ≈ 0
Without the outlier, r ≈ .8
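The effect is easy to reproduce with made-up numbers. In the sketch below, six hypothetical points follow a clear upward trend, and a single added point far to the right and low on the Y axis pulls r down to roughly zero. The data and the helper name `pearson_r` are assumptions for illustration, not the values behind the slide's figure.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical core data with an upward trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
r_without = pearson_r(xs, ys)  # about 0.83

# Add one hypothetical outlier: large X, small Y.
r_with = pearson_r(xs + [12], ys + [1.5])  # close to 0

print(f"without outlier: {r_without:.2f}")
print(f"with outlier:    {abs(r_with):.2f} (in absolute value)")
```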