Scatter plots & Association
1
Scatter plots & Association
Statistics is about … variation.
Recognize, quantify and try to explain variation.
– Variation in contents of cola cans can be explained, in part, by the type of cola in the cans.
2
Scatter plots & Association
Response variable – variable of primary interest.
Explanatory variable – variable used to try to explain variation in the response.
3
Scatter plots & Association
When both the response and the explanatory variables are quantitative, display them both in a scatter plot.
Look for a general pattern of association.
4
Scatter plots & Association
Example: Tar (mg) and carbon monoxide (mg) in cigarettes.
– y, Response: CO (mg).
– x, Explanatory: Tar (mg).
– Cases: 25 brands of cigarettes.
5
Scatter plot
6
Positive Association
Above average values of CO are associated with above average values of Tar.
Below average values of CO are associated with below average values of Tar.
7
Scatter plots & Association
Example: Outside temperature and amount of natural gas used.
– Response: Natural gas (1000 ft³).
– Explanatory: Outside temperature (°C).
– Cases: 26 days.
8
Negative Association
[Scatter plot: Gas versus Temp; Temp axis from -5.0 to 15.0, Gas axis from 0 to 10]
9
Negative Association
Above average values of gas are associated with below average temperatures.
Below average values of gas are associated with above average temperatures.
10
Correlation
Linear Association
– How closely do the points on the scatter plot represent a straight line?
– The correlation coefficient gives the direction of the linear association and quantifies the strength of the linear association between two quantitative variables.
11
Correlation
Standardize y: $z_y = \dfrac{y - \bar{y}}{s_y}$
Standardize x: $z_x = \dfrac{x - \bar{x}}{s_x}$
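The standardization step can be sketched in a few lines of Python. This is a minimal illustration, assuming the sample standard deviation (dividing by n - 1) is the one intended; the function name `standardize` and the numbers are hypothetical and are not the cigarette data.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    center = mean(values)
    spread = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - center) / spread for v in values]

# Hypothetical illustrative numbers, not data from the slides.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
z_x = standardize(x)
print([round(z, 3) for z in z_x])  # [-1.265, -0.632, 0.0, 0.632, 1.265]
```

After standardizing, the values have mean 0 and standard deviation 1, and they no longer carry units.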
12
$z_x z_y > 0$ for points where x and y are both above average, and for points where x and y are both below average.
13
Correlation Coefficient
$r = \dfrac{\sum z_x z_y}{n - 1}$
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$
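Both forms of the correlation coefficient, the average product of z-scores and the sum of deviation products divided by (n - 1) s_x s_y, should give the same number. The sketch below computes r both ways on a small hypothetical data set; the function names are illustrative and the data are not from the slides.

```python
from statistics import mean, stdev

def correlation_from_z(x, y):
    """r = sum(z_x * z_y) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    z_x = [(v - x_bar) / s_x for v in x]
    z_y = [(v - y_bar) / s_y for v in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

def correlation_from_deviations(x, y):
    """r = sum((x - x_bar) * (y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cross / ((n - 1) * stdev(x) * stdev(y))

# Hypothetical illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_z = correlation_from_z(x, y)
r_dev = correlation_from_deviations(x, y)
print(round(r_z, 4), round(r_dev, 4))  # 0.7746 0.7746
```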
14
Correlation Conditions
Correlation applies only to quantitative variables.
Correlation measures the strength of linear association.
Outliers can distort the value of the correlation coefficient.
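The outlier condition can be made concrete with a small sketch. The data here are hypothetical: five perfectly linear points, then the same points plus one outlier. The helper `correlation` is an illustrative implementation of the usual formula, not code from the course.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Correlation coefficient from sums of deviation products."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: five points lying exactly on a line.
x_clean = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [1.0, 2.0, 3.0, 4.0, 5.0]

# The same points plus a single outlier at (6, 0).
x_outlier = x_clean + [6.0]
y_outlier = y_clean + [0.0]

r_clean = correlation(x_clean, y_clean)
r_outlier = correlation(x_outlier, y_outlier)
print(round(r_clean, 4), round(r_outlier, 4))  # 1.0 0.1429
```

One point out of six is enough to take r from 1 to about 0.14, which is why it helps to look at the scatter plot before trusting the number.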
15
Correlation Coefficient
Tar and CO
$r = \dfrac{\sum z_x z_y}{n - 1} = \dfrac{22.9796}{24} = 0.9575$
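The arithmetic for the tar and CO example can be checked in a couple of lines. This sketch assumes the slide's figures read as a sum of z-score products of 22.9796 divided by n - 1 = 24 (from 25 brands); the variable names are illustrative.

```python
n = 25                 # cases: 25 brands of cigarettes
sum_zx_zy = 22.9796    # sum of z_x * z_y, as read from the slide (assumption)

r = sum_zx_zy / (n - 1)
print(round(r, 4))  # 0.9575
```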
16
Correlation Coefficient
There is a strong positive correlation, linear association, between the tar content and carbon monoxide content of the various cigarette brands.
17
JMP
Analyze – Multivariate methods – Multivariate
Y, Columns
– Tar (mg)
– CO (mg)
18
Multivariate
Correlations
            Tar (mg)   CO (mg)
Tar (mg)    1.0000     0.9575
CO (mg)     0.9575     1.0000
[Scatterplot Matrix: Tar (mg) and CO (mg), both axes from 0 to 30]
19
Correlation Properties
The sign of r indicates the direction of the association.
The value of r is always between –1 and +1.
Correlation has no units.
Correlation is not affected by changes of center or scale.
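The last property can be checked numerically: converting temperature from °C to °F changes both center and scale, and rescaling gas from thousands of cubic feet to cubic feet changes scale, yet r stays the same. The values below are hypothetical, not the 26 days of data from the example, and `correlation` is an illustrative helper.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Correlation coefficient from sums of deviation products."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical values for illustration only.
temp_c = [-5.0, 0.0, 5.0, 10.0, 15.0]   # outside temperature, degrees C
gas_k = [10.0, 8.0, 7.0, 4.0, 2.0]      # natural gas, 1000 cubic feet

# Change of center and scale: Celsius to Fahrenheit.
temp_f = [1.8 * c + 32.0 for c in temp_c]
# Change of scale only: 1000 cubic feet to cubic feet.
gas_ft3 = [1000.0 * g for g in gas_k]

r_original = correlation(temp_c, gas_k)
r_rescaled = correlation(temp_f, gas_ft3)
print(round(r_original, 4), round(r_rescaled, 4))
```

The two printed values agree, and the negative sign matches the negative association between temperature and gas use.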
20
Correlation Cautions
“Correlation” and “Association” are different.
– Correlation – specific (linear).
– Association – vague (trend).
Don’t correlate categorical variables.
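The difference between association and correlation shows up clearly with a curved pattern. In this sketch, built on hypothetical data, y is completely determined by x (y = x squared), so the association is as strong as it can be, yet the correlation coefficient is 0 because the pattern is not linear. The helper `correlation` is illustrative.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Correlation coefficient from sums of deviation products."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: a symmetric, perfectly curved relationship.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v ** 2 for v in x]

r_curved = correlation(x, y)
print(round(abs(r_curved), 4))  # 0.0
```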
21
Correlation Cautions
Don’t confuse correlation with causation.
– There is a strong positive correlation between the number of crimes committed in communities and the number of 2nd graders in those communities.
Beware of lurking variables.