anna-osborne -
CHAPTER 5
Summarizing Bivariate Data
What conclusions can be made when considering the effect of one treatment on another?
5.1 SCATTERPLOTS
What is a scatterplot and what can be determined from it?
TYPES OF DATA
Univariate—one list
Bivariate—two lists
Multivariate—multiple lists
SCATTERPLOT
The most important graphical representation of bivariate data
Plotted on a Cartesian coordinate system
5.1 HOMEWORK
Page 150-151 2, 4, 6, 8
5.2 CORRELATION
WHAT IS MEANT BY CORRELATION?
Strong Negative Correlation
As x increases, y decreases
Strong Positive Correlation
As x increases, y increases
No Correlation
x and y do not appear to be related
Correlation coefficient—
Indicates the strength of the relationship of bivariate data.
Pearson’s correlation coefficient is the most commonly used and often called simply THE correlation coefficient.
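To calculate Pearson’s correlation coefficient by hand
1) Find $\bar{x}$ and $s_x$ (average of x, sd of x)
2) Find $\bar{y}$ and $s_y$ (average of y, sd of y)
3) Find $z_x$ (calculate the z-score for each $x_i$)
4) Find $z_y$ (calculate the z-score for each $y_i$)
5) Multiply $z_x z_y$ (multiply each $z_x$ by its $z_y$)
6) Calculate $r = \frac{\sum z_x z_y}{n - 1}$
Remember: $-1 \le r \le 1$
Use a chart with the columns X, Y, $z_x$, $z_y$, $z_x z_y$ to help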
To calculate Pearson’s correlation coefficient by calculator
1) Enter the data in L1, L2
2) Turn on the diagnostics
3) Find the linear regression for the data
Correlation values:
-1 to -.8 and .8 to 1: strong
-.8 to -.5 and .5 to .8: moderate
-.5 to .5: weak
EXAMPLE 1
observation               1   2   3   4   5   6   7   8   9  10
crisis management score  20  13  27  18  19  21   0  21  21  11
family strength score    50  60  67  57  49  72  50  68  60  58
Find the correlation coefficient for crisis management vs family strength, using both the calculator and Excel.
Repeat, switching L1 and L2 on the calculator. What does this indicate?
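A minimal Python sketch of the by-hand z-score method, applied to the Example 1 scores. The function name `pearson_r` and the variable names are illustrative assumptions, not part of the original slides. It uses the sample standard deviation (divisor n - 1), which is what the formula r = (sum of z_x z_y) / (n - 1) requires.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # One z_x * z_y product per observation
    z_products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)

crisis = [20, 13, 27, 18, 19, 21, 0, 21, 21, 11]
family = [50, 60, 67, 57, 49, 72, 50, 68, 60, 58]

r_xy = pearson_r(crisis, family)
r_yx = pearson_r(family, crisis)  # lists switched, as with L1 and L2
print(round(r_xy, 4), round(r_yx, 4))
```

Switching the two lists leaves r unchanged, which is the point of the calculator exercise: r does not depend on which variable is labeled x.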
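Alternate method:
$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\sum x^2 - \frac{(\sum x)^2}{n}} \sqrt{\sum y^2 - \frac{(\sum y)^2}{n}}}$
Listed on formula sheet
Will only be used if they give you summary statistics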
Properties of r
Does not depend on the unit of measurement
Does not depend on which variable is labeled x
Is always between -1 and 1
r = 1 indicates a perfect positive linear relationship
r = 0 indicates no linear relationship
r = -1 indicates a perfect negative linear relationship
Measures the extent to which x and y have a linear relationship
r is the correlation coefficient for the sample
Correlation DOES NOT imply causation
Often two items have a high correlation not because they impact each other but because they are strongly related to a third item.
EX. Among elementary students, there is a strong positive correlation between vocabulary size and the number of cavities. WHY?
They are both related to age.
Spearman’s Rank Correlation Coefficient
Not as affected by “outliers”
Rank the x’s low to high
Rank the y’s low to high
Keep the original x and y together
EX
rank of x   2   1   3   4
X           3  -2   5   7
Y           6   9   4  12
rank of y   2   3   1   4
Use the calculator as before OR
$r_s = \frac{\sum (\text{rank } x)(\text{rank } y) - \frac{n(n+1)^2}{4}}{\frac{n(n-1)(n+1)}{12}}$
$-1 \le r_s \le 1$
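As a check on the rank formula, here is a small Python sketch that ranks the four example points and evaluates r_s = [sum of (rank x)(rank y) - n(n+1)^2/4] / [n(n-1)(n+1)/12]. The helper names `ranks` and `spearman_rs` are made up for illustration, and the ranking helper assumes there are no tied values.

```python
def ranks(values):
    """Rank values from low (1) to high (n); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(xs, ys):
    """Spearman's rank correlation from the rank-product formula."""
    n = len(xs)
    rank_x, rank_y = ranks(xs), ranks(ys)
    numerator = (sum(rx * ry for rx, ry in zip(rank_x, rank_y))
                 - n * (n + 1) ** 2 / 4)
    denominator = n * (n - 1) * (n + 1) / 12
    return numerator / denominator

x = [3, -2, 5, 7]
y = [6, 9, 4, 12]

print(ranks(x))  # [2, 1, 3, 4]
print(ranks(y))  # [2, 3, 1, 4]
rs = spearman_rs(x, y)
print(rs)        # 0.2
```

The rank products sum to 26, so r_s = (26 - 25) / 5 = 0.2, a weak positive association for these four points.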
5.2 HOMEWORK
P 163 5.9, 5.10, 5.12, 5.13,
5.14, 5.16, 5.18, 5.22
5.3 FITTING A LINE TO BIVARIATE DATA
How do you fit a line to linear data?
Activation:
Given the following points, find the equation
X    Y
-2    2
 0   -2
 2   -6
VARIABLES DEFINED
X = the independent or explanatory variable
Y = the dependent or response variable
Stat version of the linear regression (#8): y = a + bx
Algebra and calculus version (#4): y = ax + b
Both forms give the same slope and y-intercept, but the letters trade roles: in y = a + bx the slope is b and the y-intercept is a. Stat prefers y = a + bx.
REGRESSION LINE FORMED BY THE PRINCIPLE OF LEAST SQUARES
Determine the vertical distance from each point to the line that is supposed to represent the overall pattern of the data
If y = a + bx, then
the predicted points are $(x_1, \hat{y}_1), (x_2, \hat{y}_2), (x_3, \hat{y}_3)$, etc., where $\hat{y}_i = a + bx_i$
the vertical distance is
$y_i - (a + bx_i)$
if this is positive, $y_i$ is above the prediction line
if this is negative, $y_i$ is below the prediction line
LEAST SQUARES REGRESSION LINE
The least squares regression line is the one that minimizes
$\sum [y_i - (a + bx_i)]^2$
The formula for the least squares line is
$\hat{y} = a + bx$
a and b can be calculated by (on the AP STAT formula sheet)
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
$a = \bar{y} - b\bar{x}$
CALCULATING BY HAND
$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
$a = \bar{y} - b\bar{x}$
These values can be calculated straight from the data. This formula is not on the formula sheet and is only used when the summary values are given.
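The two ways of computing a and b can be compared in a short Python sketch. It uses the three activation points (-2, 2), (0, -2), (2, -6) from the start of this section; the function names are illustrative assumptions, not anything defined in the slides.

```python
def least_squares_from_data(xs, ys):
    """Slope and intercept from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

def least_squares_from_summary(n, sum_x, sum_y, sum_xy, sum_x2):
    """Same line, computed only from summary statistics."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n
    return a, b

xs = [-2, 0, 2]
ys = [2, -2, -6]

a1, b1 = least_squares_from_data(xs, ys)
a2, b2 = least_squares_from_summary(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
    sum_x2=sum(x * x for x in xs),
)
print(a1, b1)  # -2.0 -2.0
print(a2, b2)  # -2.0 -2.0
```

Both routes give a = -2 and b = -2, so the line is y-hat = -2 - 2x. Because these three points lie exactly on a line, the least squares line passes through all of them.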
LEAST SQUARES REGRESSION LINE
USE for INTERPOLATION not EXTRAPOLATION
Interpolation: data values between the given values
Extrapolation: data values beyond the given values
If you are asked to extrapolate, always state that the values may not be accurate due to extrapolation
EXAMPLE
Age in months   Height in inches
19              22
21              23
23              ?
24              25
27              28
29              31
31              28
34              32
38              34
43              39
50              45
72              48
84              54
?               58
120             62
128             ?
Find the linear regression line for the given data, then find the values for the missing data (marked ?).
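One way to work this example is sketched below in Python. It rests on assumptions that the slides do not state outright: that the lone entries 23 and 128 are ages with missing heights, that the lone 58 is a height with a missing age, and that the missing age may be estimated by solving the fitted line for x. All names are illustrative.

```python
# Complete (age, height) pairs only; rows with a missing value are left out.
ages = [19, 21, 24, 27, 29, 31, 34, 38, 43, 50, 72, 84, 120]
heights = [22, 23, 25, 28, 31, 28, 32, 34, 39, 45, 48, 54, 62]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(heights) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, heights))
     / sum((x - x_bar) ** 2 for x in ages))
a = y_bar - b * x_bar

def predict_height(age):
    """Predicted height (inches) from the fitted line y-hat = a + b x."""
    return a + b * age

def age_for_height(height):
    """Assumption: invert the fitted line to estimate an age."""
    return (height - a) / b

print(f"y-hat = {a:.2f} + {b:.4f} x")
for age in (23, 128):
    # Ages outside the observed range are extrapolations.
    inside = min(ages) <= age <= max(ages)
    label = "interpolation" if inside else "extrapolation, may not be accurate"
    print(f"age {age}: predicted height {predict_height(age):.1f} ({label})")
print(f"height 58: estimated age {age_for_height(58):.1f}")
```

Age 128 lies beyond the largest observed age (120), so its prediction is an extrapolation and should be reported with that caveat.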
MINITAB INFO
$\hat{y} = 61.54 + 3.407x$
The regression equation is
Chollevl = 61.5 + 3.41 perchgwt

Predictor   Coef     Stdev   t-ratio   p
Constant    61.537   2.268   27.13     0.000    (value of a)
Perchgwt    3.407    1.028    3.31     0.007    (value of b, the slope)

[Scatterplot: cholesterol level vs. % weight change]
Should only be used to predict cholesterol from % weight change, and only % weight changes from -5 to 3 should be used with any certainty.
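USING PEARSON’S CORRELATION COEFFICIENT AND ALGEBRAIC MANIPULATION:
Given $b = r\frac{s_y}{s_x}$ and $\hat{y} = \bar{y} + r\frac{s_y}{s_x}(x - \bar{x})$
1) If $x = \bar{x}$
Then $\hat{y} = \bar{y}$
2) If r = 1, then $\hat{y} = \bar{y} + \frac{s_y}{s_x}(x - \bar{x})$
if $x = \bar{x} + 1s_x$ then $\hat{y} = \bar{y} + s_y$
if $x = \bar{x} + 2s_x$ then $\hat{y} = \bar{y} + 2s_y$
3) If it is not a perfect correlation, let r = .5
Then substituting, $\hat{y} = \bar{y} + .5\frac{s_y}{s_x}(x - \bar{x})$
if $x = \bar{x} + 1s_x$ then $\hat{y} = \bar{y} + .5s_y$
This means that $\hat{y}$ will be r standard deviations from $\bar{y}$ for each standard deviation that x is from $\bar{x}$
Hence it pulls (regresses) y back toward the mean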
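The relationship b = r (s_y / s_x) can be checked numerically. The sketch below reuses the Example 1 scores from section 5.2 and confirms that moving one standard deviation in x moves the prediction r standard deviations in y. Variable names are illustrative assumptions.

```python
import math

x = [20, 13, 27, 18, 19, 21, 0, 21, 21, 11]   # crisis management score
y = [50, 60, 67, 57, 49, 72, 50, 68, 60, 58]  # family strength score

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

s_x = math.sqrt(sxx / (n - 1))  # sample sd of x
s_y = math.sqrt(syy / (n - 1))  # sample sd of y
r = sxy / math.sqrt(sxx * syy)

b_least_squares = sxy / sxx     # slope from the least squares formula
b_from_r = r * s_y / s_x        # slope from r and the two sds
a = y_bar - b_least_squares * x_bar

# Prediction one standard deviation above the mean of x
y_hat = a + b_least_squares * (x_bar + s_x)
print(b_least_squares, b_from_r)
print(y_hat - y_bar, r * s_y)
```

Each printed pair agrees up to rounding, so a point one s_x above the mean of x is predicted to be only r s_y above the mean of y.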
5.3 HOMEWORK
Page 174-176 26, 27, 28, 31, 32, 34
5.4 ASSESSING THE FIT OF A LINE
How do you assess how well a line fits the data?
3 CHECKS FOR FIT
1) Is a line an appropriate way to summarize the data (does the shape appear to be linear)?
2) Are there any unusual aspects of the data that need to be considered before making predictions?
3) How accurate can we expect these predictions to be?
FINDING RESIDUALS
The distance from the actual or observed value to the predicted value
(HINT: this is an AP class; a residual is Actual – Predicted)
residual = $y_i - \hat{y}_i$
Using the calculator to find residuals
L1 = x
L2 = y
L3 = predicted: in L3 enter vars, stat (5), EQ, RegEQ, then replace the X in RegEQ with L1
L4 = residuals: in L4 type L2 – L3
PLOTTING RESIDUALS
$(x, y - \hat{y})$ OR $(\hat{y}, y - \hat{y})$
There are two types of residual plots. Each gives us a picture that can be examined.
Residuals for a good fit should have no particular pattern and should lie in a band not too far from zero.
WHAT TO LOOK FOR
Removal of the data causing a single large residual has a minimal impact on the regression line
Removal of a single influential point has a large impact on the regression line.
An influential point is one whose x value is far from the rest of the x values.
THE COEFFICIENT OF DETERMINATION
Gives the proportion of variation in y that is attributed to the approximate linear relationship between x and y.
$r^2 = 1 - \frac{SSResid}{SSTo}$
$r^2$: amount actually attributed to the linear relationship
SSTo: possible amount explained by a linear relationship
SSResid: amount not attributed to a linear relationship
SSTo AND SSResid CALCULATIONS
SSTo
Total sum of squares: squared variation from the mean of y
$SSTo = \sum (y_i - \bar{y})^2$
SSResid
The amount of variation not attributed to a linear relationship
Referred to as the error sum of squares
$SSResid = \sum (y_i - \hat{y}_i)^2$
SSResid ≤ SSTo
Easy Computational Formulas
$SSTo = \sum y^2 - \frac{(\sum y)^2}{n}$
$SSResid = \sum y^2 - a\sum y - b\sum xy$
All items can be obtained from the regression line and the 2 variable stats function, including the coefficient of determination
$r^2 = 1 - \frac{SSResid}{SSTo}$
STANDARD DEVIATION ABOUT THE LEAST SQUARES LINE
Denoted $s_e$: the standard deviation of the error
$s_e = \sqrt{\frac{SSResid}{n - 2}}$
n - 2 relates to degrees of freedom (to be discussed later)
For a truly good fit $r^2$ must be larger than .5 and $s_e$ should be low
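The quantities in this section fit together as in the following Python sketch, which computes SSTo and SSResid both from their definitions and from the computational formulas, then r^2 and s_e. It reuses the Example 1 scores from section 5.2 purely as sample data; the variable names are illustrative assumptions.

```python
import math

x = [20, 13, 27, 18, 19, 21, 0, 21, 21, 11]   # crisis management score
y = [50, 60, 67, 57, 49, 72, 50, 68, 60, 58]  # family strength score

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Least squares line y-hat = a + b x
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n
y_bar = sum_y / n
y_hat = [a + b * xi for xi in x]

# Definitions
ssto = sum((yi - y_bar) ** 2 for yi in y)
ssresid = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

# Computational formulas
ssto_comp = sum_y2 - sum_y ** 2 / n
ssresid_comp = sum_y2 - a * sum_y - b * sum_xy

r_squared = 1 - ssresid / ssto
s_e = math.sqrt(ssresid / (n - 2))

print(f"SSTo = {ssto:.2f} (computational: {ssto_comp:.2f})")
print(f"SSResid = {ssresid:.2f} (computational: {ssresid_comp:.2f})")
print(f"r^2 = {r_squared:.4f}, s_e = {s_e:.3f}")
```

By the rule of thumb above, an r^2 well below .5 for these scores signals that the line is not a truly good fit, even though the correlation is positive.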
MINITAB AND CORRELATION
Page 179
EXAMPLE
Use data from 5.44
1) Use the calculator to:
a)draw a scatterplot
b) find the regression line
c) find the correlation coefficient
d) calculate the predicted values
e) calculate the residuals
f) graph the residuals
X Y
92 1.7
92 2.3
96 1.9
100 2.0
102 1.5
102 1.7
106 1.6
106 1.8
121 1.0
143 0.3
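For readers without the calculator, parts b) through e) can be sketched in plain Python as below. Plotting (parts a and f) is left out to keep the block dependency-free, and every name here is an illustrative assumption.

```python
import math

x = [92, 92, 96, 100, 102, 102, 106, 106, 121, 143]
y = [1.7, 2.3, 1.9, 2.0, 1.5, 1.7, 1.6, 1.8, 1.0, 0.3]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# b) regression line y-hat = a + b x
b = sxy / sxx
a = y_bar - b * x_bar

# c) correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# d) predicted values and e) residuals (actual - predicted)
predicted = [a + b * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]

print(f"y-hat = {a:.3f} + ({b:.4f})x, r = {r:.3f}")
for xi, yi, yh, res in zip(x, y, predicted, residuals):
    print(f"x={xi:4d}  y={yi:4.1f}  predicted={yh:6.3f}  residual={res:7.3f}")
```

The residuals of a least squares line always sum to zero, which is a quick check on the arithmetic. The point at x = 143 sits far from the other x values, so it is worth asking how the line changes if it is removed.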
5.4 HOMEWORK
Page 188-191 37, 38, 39, 41, 42, 43, 48, 51 c&d
5.5 NONLINEAR RELATIONSHIPS AND TRANSFORMATION
How are nonlinear relationships explained?
TRANSFORMATIONS
DO NOT mean moved from the parent function
DO mean adjusting x and/or y values so that the new points appear linear
Common transformations are sq. roots, logs, and reciprocals
[Figure: original data compared with an algebraic transformation]
QUADRATIC AND CUBIC FUNCTIONS
Use a graphing calculator or a STAT package such as minitab or fathom
Quadratic equations can be done by hand although it is not recommended
$R^2 = 1 - \frac{SSResid}{SSTo} = 1 - \frac{\sum (y - \hat{y})^2}{SSTo}$
UNDOING A TRANSFORMATION
y’ = 1.14 – 1.92x where y’ = log(y)
log y = 1.14 – 1.92x
$10^{\log y} = 10^{1.14 - 1.92x}$
$y = 10^{1.14 - 1.92x}$
$y = (10^{1.14})(10^{-1.92x})$
$y = 13.8038(10^{-1.92x})$
Undoing a transformation yields a curve that fits the data, but is not a least squares line.
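The algebra of undoing the log transformation can be confirmed with a few lines of Python. This is only a numerical check of the steps on the slide; the function names and the x values tried are arbitrary choices for illustration.

```python
import math

def y_from_log_fit(x):
    """Undo y' = log10(y) directly: y = 10^(1.14 - 1.92x)."""
    return 10 ** (1.14 - 1.92 * x)

coefficient = 10 ** 1.14  # the constant factor, about 13.8038

def y_factored(x):
    """Factored form: y = (10^1.14)(10^(-1.92x))."""
    return coefficient * 10 ** (-1.92 * x)

print(round(coefficient, 4))
for x in (0.0, 0.5, 1.0):
    # Both forms should agree, and log10 of y should recover 1.14 - 1.92x
    print(x, y_from_log_fit(x), y_factored(x), math.log10(y_from_log_fit(x)))
```

Taking log10 of the back-transformed y returns 1.14 – 1.92x, which is the fitted line on the transformed scale.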
DETERMINING WHICH TRANSFORMATION TO USE
[Figure: four numbered curve shapes (1 to 4) arranged around axes labeled +y, -y, -x, +x]
If the curve resembles one of the numbered curves, then to achieve a linear transformation move up (+) or down (-) the power chart as indicated by the closest part of the x or y axis.
Power   Function      Name
3       $x^3$         Cube
2       $x^2$         Square
1       $x$           No transformation
1/2     $\sqrt{x}$    Sq. Root
1/3     $\sqrt[3]{x}$ Cube Root
0       $\log x$      Log
-1      $1/x$         Reciprocal
EXAMPLE
frying time (x)   moisture (y)
5                 16.3
10                9.7
15                8.1
20                4.2
25                3.4
30                2.9
45                1.9
60                1.3
#3 curve, therefore x and/or y down
frying time (x)   moisture (y)   transformation log(y)
5                 16.3           1.212187604
10                9.7            0.986771734
15                8.1            0.908485019
20                4.2            0.62324929
25                3.4            0.531478917
30                2.9            0.462397998
45                1.9            0.278753601
60                1.3            0.113943352
Is the transformed data linear?
Find the linear regression on the transformation
Check the residual pattern. Try a different transformation. Plot this residual pattern. Which one looks better? Which has a better r value?
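A hedged Python sketch of this comparison: it computes r for the original frying-time data, for the log(y) transformation from the table, and for log(x) as one possible "different transformation". The choice of log(x) as the second candidate and all names are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from sums of squared deviations and cross products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

frying_time = [5, 10, 15, 20, 25, 30, 45, 60]
moisture = [16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 1.9, 1.3]

log_moisture = [math.log10(y) for y in moisture]  # y moved down the chart
log_time = [math.log10(x) for x in frying_time]   # x moved down the chart

r_raw = pearson_r(frying_time, moisture)
r_log_y = pearson_r(frying_time, log_moisture)
r_log_x = pearson_r(log_time, moisture)

print(f"original data:  r = {r_raw:.4f}")
print(f"x vs log(y):    r = {r_log_y:.4f}")
print(f"log(x) vs y:    r = {r_log_x:.4f}")
```

A value of r closer to -1 after transforming suggests a more nearly linear pattern, but the residual plot should still be checked before settling on a transformation.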
5.5 HOMEWORK
Page 206-207 52, 53, 59
5.6 INTERPRETING THE RESULTS OF STATISTICAL ANALYSIS
Read pages 208-209
REVIEW
Page 210-213 61, 63, 64, 66, 68, 69