Quantitative Methods Using more than one explanatory variable.
1
Association Variables
–Response – an outcome variable whose values exhibit variability.
–Explanatory – a variable that we use to try to explain the variability in the response.
2
Association
There is an association between two variables if values of one variable are more likely to occur with certain values of a second variable.
3
Picturing Association
Two Categorical (Qualitative).
– Cross-tabs table, mosaic plot.
Two Numerical (Quantitative).
– Scatter diagram.
4
Categorical Data
Who?
– Students in a statistics class at Penn State University.
What?
– “With whom is it easiest to make friends?” Opposite sex, same sex, no difference.
– Gender. Male, female.
5
Cross-tabs Table
With whom is it easiest to make friends?
          Same Sex   Opposite Sex   No Diff   Total
Female       16           58           63      137
Male         13           15           40       68
Total        29           73          103      205
6
Bar Graph
With whom is it easiest to make friends?
[Figure: bar graph of Count by Answer. No Diff 50.2%, Opposite 35.6%, Same 14.1%.]
Frequencies
Level      Count   Prob
No Diff     103    0.50244
Opposite     73    0.35610
Same         29    0.14146
Total       205    1.00000
N Missing 0, 3 Levels
7
Percentages
With whom is it easiest to make friends?
Count (Row %)   Same Sex     Opposite Sex   No Diff      Total
Female          16 (11.7%)   58 (42.3%)     63 (46.0%)   137 (100%)
Male            13 (19.1%)   15 (22.1%)     40 (58.8%)    68 (100%)
Total           29           73            103           205
8
Mosaic Plot
[Figure: mosaic plot of Answer (No Diff, Opposite, Same) by Gender (Female, Male); vertical axis from 0.00 to 1.00.]
9
Interpretation
More than 50% of males (58.8%) say no difference, while less than 50% of females (46.0%) say no difference.
Females are about twice as likely as males to say opposite sex (42.3% versus 22.1%).
Males are about 1.6 times as likely as females to say same sex (19.1% versus 11.7%).
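The row percentages behind these comparisons can be recomputed from the counts in the cross-tabs table. The following is a minimal Python sketch; the dictionary layout and the function name `row_percentages` are illustrative choices, not part of the original slides.

```python
# Counts from the cross-tabs table: rows are gender, columns are answers.
counts = {
    "Female": {"Same Sex": 16, "Opposite Sex": 58, "No Diff": 63},
    "Male": {"Same Sex": 13, "Opposite Sex": 15, "No Diff": 40},
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {answer: 100 * n / total for answer, n in row.items()}
    return result


row_pct = row_percentages(counts)
for group, row in row_pct.items():
    cells = ", ".join(f"{answer}: {pct:.1f}%" for answer, pct in row.items())
    print(f"{group}: {cells}")

# Ratios behind the comparisons between females and males.
opposite_ratio = row_pct["Female"]["Opposite Sex"] / row_pct["Male"]["Opposite Sex"]
same_ratio = row_pct["Male"]["Same Sex"] / row_pct["Female"]["Same Sex"]
print(f"Female/Male ratio for Opposite Sex: {opposite_ratio:.2f}")
print(f"Male/Female ratio for Same Sex: {same_ratio:.2f}")
```

Comparing row percentages rather than raw counts matters here because the class has about twice as many females (137) as males (68).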
10
Scatter Plot
Statistics is about … variation.
Recognize, quantify and try to explain variation.
Variation in two quantitative variables is displayed in a scatter plot.
11
Scatter Plot
Numerical variable on the vertical axis, y, is the response variable.
Numerical variable on the horizontal axis, x, is the explanatory variable.
12
Scatter Plot
Example: Body mass (kg) and Bite force (N) for Canidae.
– y, Response: Bite force (N)
– x, Explanatory: Body mass (kg)
– Cases: 28 species of Canidae.
13
[Figure: Bivariate Fit of BFca (N) By Body Mass (kg). Scatter plot; vertical axis BFca (N), 0 to 500; horizontal axis Body Mass (kg), 0 to 40.]
14
Positive Association
– Above average values of Bite force are associated with above average values of Body mass.
– Below average values of Bite force are associated with below average values of Body mass.
15
Scatter Plot
Example: Outside temperature and amount of natural gas used.
– Response: Natural gas used (1000 ft³).
– Explanatory: Outside temperature (°C).
– Cases: 26 days.
16
[Figure: scatter plot of Gas, 0 to 10, versus Temp, -5.0 to 15.0.]
17
Negative Association
– Above average values of gas are associated with below average temperatures.
– Below average values of gas are associated with above average temperatures.
18
Association
Positive
– As x goes up, y tends to go up.
Negative
– As x goes up, y tends to go down.
19
Correlation
Linear Association
– How closely do the points on the scatter plot follow a straight line?
– The correlation coefficient gives the direction of and quantifies the strength of the linear association between two quantitative variables.
20
Correlation
Standardize y: $z_y = \frac{y - \bar{y}}{s_y}$
Standardize x: $z_x = \frac{x - \bar{x}}{s_x}$
21
[Figure: Bite Force vs Body Mass of Canidae. Scatter plot of Standardized Bite Force versus Standardized Body Mass; both axes from -1 to 3.]
22
Correlation Coefficient
$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$
$r = \frac{\sum z_x z_y}{n - 1}$
23
Correlation Coefficient
Body mass and Bite force
$r = \frac{\sum z_x z_y}{n - 1} = \frac{26.4796}{27} = 0.9807$
24
Correlation Coefficient
There is a very strong positive correlation, linear association, between the body mass and bite force for the various species of Canidae.
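To make the calculation concrete, here is a minimal Python sketch of the correlation coefficient as the sum of the products of z-scores divided by n - 1. The data values are hypothetical, chosen only for illustration; they are not the Canidae measurements, which this transcript does not list. The function names `z_scores` and `correlation` are likewise illustrative.

```python
from statistics import mean, stdev

# Hypothetical (x, y) data for illustration only; not the Canidae measurements.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [35.0, 70.0, 90.0, 140.0, 160.0]


def z_scores(values):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """r = sum(z_x * z_y) / (n - 1)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


r = correlation(x, y)
print(f"r = {r:.4f}")
```

`statistics.stdev` uses the n - 1 divisor, which matches the n - 1 in the formula for r.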
25
JMP
Analyze – Multivariate methods – Multivariate
Y, Columns
– Body mass
– BF ca (Bite force at the canine)
26
Multivariate
Correlations
                 Body Mass (kg)   BFca (N)
Body Mass (kg)       1.0000        0.9807
BFca (N)             0.9807        1.0000
[Figure: Scatterplot Matrix of Body Mass (kg), 5 to 40, and BFca (N), 100 to 500.]
27
Correlation Properties
– The sign of r indicates the direction of the association.
– The value of r is always between –1 and +1.
– Correlation has no units.
– Correlation is not affected by changes of center or scale.
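The last property can be checked numerically. The sketch below, in Python, converts a set of temperatures from Celsius to Fahrenheit (a change of both center and scale) and confirms that r is unchanged. The temperature and gas values are hypothetical and chosen only for illustration; they are not the 26 days of data from the example.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient as the sum of z-score products over n - 1."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys))
    return total / (len(xs) - 1)


# Hypothetical temperatures (degrees C) and gas use; illustration only.
temp_c = [-3.0, 0.0, 4.0, 9.0, 13.0]
gas = [8.1, 6.9, 5.2, 3.4, 1.8]

# Change of center and scale: Celsius to Fahrenheit.
temp_f = [1.8 * t + 32 for t in temp_c]

r_celsius = correlation(temp_c, gas)
r_fahrenheit = correlation(temp_f, gas)
print(f"r with Celsius:    {r_celsius:.6f}")
print(f"r with Fahrenheit: {r_fahrenheit:.6f}")
```

The two values agree because standardizing removes the center and the scale before the products are summed. Note that multiplying one variable by a negative number would flip the sign of r while leaving its size unchanged.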
28
Algebra Review
The equation of a straight line: y = mx + b
– m is the slope – the change in y over the change in x – or rise over run.
– b is the y-intercept – the value where the line cuts the y axis.
29
[Figure: graph of the line y = 3x + 2; x axis from -5 to 5, y axis from -15 to 15.]
30
Review
y = 3x + 2
– x = 0, y = 2 (y-intercept)
– x = 3, y = 11
– Change in y (+9) divided by the change in x (+3) gives the slope, 3.
31
Linear Regression
Example: Body mass (kg) and Bite force (N) for Canidae.
– y, Response: Bite force (N)
– x, Explanatory: Body mass (kg)
– Cases: 28 species of Canidae.
32
Correlation Coefficient
Body mass and Bite force
$r = \frac{\sum z_x z_y}{n - 1} = \frac{26.4796}{27} = 0.9807$
33
Correlation Coefficient
There is a strong correlation, linear association, between the body mass and bite force for the various species of Canidae.
34
Linear Model
The linear model is the equation of a straight line through the data.
A point on the straight line through the data gives a predicted value of y, denoted $\hat{y}$.
35
Residual
The difference between the observed value of y and the predicted value of y, $\hat{y}$, is called the residual.
Residual = $y - \hat{y}$
36
[Figure: Regression Plot of BFca (N), 0 to 500, versus Body mass (kg), 0 to 35, with the fitted line and a residual marked.]
37
Line of “Best Fit”
There are lots of straight lines that go through the data.
The line of “best fit” is the line for which the sum of squared residuals is the smallest – the least squares line.
38
Line of “Best Fit”
Some positive and some negative residuals but they sum to zero.
Passes through the point $(\bar{x}, \bar{y})$.
39
Line of “Best Fit”
$\hat{y} = a + bx$
Least squares slope: $b = r \frac{s_y}{s_x}$
Least squares intercept: $a = \bar{y} - b\bar{x}$
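As a rough check of these formulas, the Python sketch below computes the slope as r times s_y over s_x and the intercept as y-bar minus b times x-bar, then confirms the two properties of the least squares line: the residuals sum to zero and the line passes through the point of means. The data are hypothetical and chosen only for illustration; they are not the Canidae measurements.

```python
from statistics import mean, stdev

# Hypothetical data for illustration only; not the Canidae measurements.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 6.0, 8.0, 12.0, 16.0]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)

# Correlation coefficient from the deviations about the means.
n = len(x)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Least squares slope and intercept.
b = r * s_y / s_x
a = y_bar - b * x_bar

# Residual = observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(f"y-hat = {a:.3f} + {b:.3f} x")
print(f"sum of residuals = {sum(residuals):.10f}")
print(f"prediction at x-bar = {a + b * x_bar:.3f}, y-bar = {y_bar:.3f}")
```

The sum of residuals may print as a tiny nonzero number because of floating-point rounding.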
40
Least Squares Estimates
Body mass, x: $\bar{x} = 9.207$ kg, $s_x = 8.016$ kg
Bite force, y: $\bar{y} = 154.029$ N, $s_y = 109.760$ N
$r = 0.9807$
41
Least Squares Estimates
$b = 0.9807 \times \frac{109.760}{8.016} = 13.428$
$a = 154.029 - 13.428(9.207) = 30.397$
$\hat{y} = 30.397 + 13.428x$
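The arithmetic can be reproduced from the summary statistics alone (r = 0.9807, mean body mass 9.207 kg, standard deviation 8.016 kg, mean bite force 154.029 N, standard deviation 109.760 N). The Python sketch below follows the slide's rounding, which is an assumption about how the slide values were obtained: the slope is rounded to three decimals before it is used for the intercept.

```python
# Summary statistics from the Least Squares Estimates slide.
r = 0.9807                      # correlation coefficient
x_bar, s_x = 9.207, 8.016       # body mass (kg): mean, standard deviation
y_bar, s_y = 154.029, 109.760   # bite force (N): mean, standard deviation

# Slope, rounded to three decimals before it is reused (matches the slide).
b = round(r * s_y / s_x, 3)

# Intercept from the rounded slope.
a = round(y_bar - b * x_bar, 3)

print(f"b = {b} N per kg")
print(f"a = {a} N")
print(f"y-hat = {a} + {b} x")
```

Carrying the unrounded slope through instead changes the intercept slightly in the third decimal place.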
42
Interpretation
Slope – for a 1 kg increase in body mass, the bite force increases, on average, by 13.428 N.
Intercept – there is not a reasonable interpretation of the intercept in this context because one wouldn’t see a member of Canidae with a body mass of 0 kg.
43
[Figure: Bite Force vs Body Mass. Scatter plot of BFca (N), 0 to 500, versus Body mass (kg), 0 to 35, with the least squares line $\hat{y} = 30.397 + 13.428x$.]
44
Prediction
Least squares line: $\hat{y} = 30.397 + 13.428x$
For $x = 25$: $\hat{y} = 30.397 + 13.428(25) = 366.1$ N
Residual
Body mass, x = 25 kg
Bite force, y = 351.5 N
Predicted, $\hat{y}$ = 366.1 N
Residual, $y - \hat{y}$ = 351.5 – 366.1 = –14.6 N
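The prediction and residual for the 25 kg case can be reproduced with a short Python sketch. The function name `predict_bite_force` is a hypothetical helper introduced for illustration; the coefficients and the observed value are taken from the slides.

```python
def predict_bite_force(body_mass_kg, intercept=30.397, slope=13.428):
    """Predicted bite force (N) from the fitted line y-hat = a + b x."""
    return intercept + slope * body_mass_kg


body_mass = 25.0   # kg
observed = 351.5   # N, observed bite force from the slide

predicted = predict_bite_force(body_mass)
residual = observed - predicted   # observed minus predicted

print(f"predicted = {predicted:.1f} N")
print(f"residual  = {residual:.1f} N")
```

A negative residual means the observed point lies below the line: the model overpredicts the bite force for this species.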
46
Residuals
Residuals help us see if the linear model makes sense.
Plot residuals versus the explanatory variable.
– If the plot is a random scatter of points, then the linear model is the best we can do.
47
[Figure: Plot of Residuals vs Body Mass. Residual, -30 to 60, versus Body mass (kg), 0 to 35.]
48
Interpretation of the Plot
The residuals are scattered randomly. This indicates that the linear model is an appropriate model for the relationship between body mass and bite force for Canidae.
49
$r^2$ or $R^2$
The square of the correlation coefficient gives the proportion of the variation in y that is accounted for, or explained, by the linear relationship with x.
50
Body mass and Bite force
r = 0.9807
$r^2 = (0.9807)^2 = 0.962$ or 96.2%
96.2% of the variation in bite force can be explained by the linear relationship with body mass.
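One way to see why the squared correlation measures explained variation is to compare it with one minus the ratio of leftover variation (the sum of squared residuals) to total variation in y. The Python sketch below does this on hypothetical data chosen only for illustration; the names `sse` and `explained_fraction` are not from the slides. The slope is computed as the sum of cross-products over the sum of squares in x, which is algebraically the same as r times s_y over s_x.

```python
from statistics import mean

# Hypothetical data for illustration only; not the Canidae measurements.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 6.0, 8.0, 12.0, 16.0]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y

# Correlation coefficient and least squares line.
r = s_xy / (s_xx * s_yy) ** 0.5
b = s_xy / s_xx
a = y_bar - b * x_bar

# Variation left over after fitting the line (sum of squared residuals).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
explained_fraction = 1 - sse / s_yy

print(f"r^2 = {r ** 2:.4f}")
print(f"1 - SSE/SST = {explained_fraction:.4f}")
```

The two printed values agree for a least squares line with one explanatory variable.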
51
Regression Conditions
– Quantitative variables – both variables should be quantitative.
– Linear model – does the scatter diagram show a reasonably straight line?
– Outliers – watch out for outliers as they can be very influential.
52
Regression Cautions
– Beware of extraordinary points.
– Don’t extrapolate beyond the data.
– Don’t infer x causes y just because there is a good linear model relating the two variables.
53
Extraordinary Points
54
Don’t Extrapolate
Explanatory (x) – Average outdoor temperature (°C).
Response (y) – Amount of natural gas used (1000 cu ft).
$\hat{y} = 6.85 - 0.393x$
55
Don’t Extrapolate
[Figure: scatter plot of Gas, 0 to 10, versus Temp, -5 to 20.]
56
Don’t Extrapolate
Explanatory (x = 20) – Average outdoor temperature (°C).
Response (y) – Amount of natural gas used (1000 cu ft).
$\hat{y} = 6.85 - 0.393(20) = -1.01$
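A negative amount of gas is impossible, which is what goes wrong when the line is used at 20 °C, beyond the observed temperatures. One way to guard against this is to flag predictions outside the range of the data, as in the Python sketch below. The function `predict_gas` is a hypothetical helper, and the range of -5 to 15 °C is an assumption read off the axis of the earlier scatter plot, not the exact range of the 26 observed days.

```python
def predict_gas(temp_c, observed_range=(-5.0, 15.0)):
    """Predict gas use (1000 cu ft) from y-hat = 6.85 - 0.393 x.

    observed_range is an assumption read off the scatter plot axis,
    not the exact range of the 26 observed days.
    Returns (prediction, is_extrapolation).
    """
    low, high = observed_range
    prediction = 6.85 - 0.393 * temp_c
    return prediction, not (low <= temp_c <= high)


for temp in (10.0, 20.0):
    gas, extrapolated = predict_gas(temp)
    note = " (extrapolation: outside the observed temperatures)" if extrapolated else ""
    print(f"{temp:.0f} C -> {gas:.2f} thousand cu ft{note}")
```

Returning the flag alongside the prediction leaves the decision to the caller, who can warn, refuse, or report the value with a caveat.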
57
Correlation ≠ Causation
Don’t confuse correlation with causation.
– There is a strong positive correlation between the number of crimes committed in communities and the number of 2nd graders in those communities.
Beware of lurking variables.