Finding Areas with Calc

1. ShadeNorm
• Consider WISC data from before: N(100, 15). Suppose we want to find the percent of children whose scores are above 125.
• Specify window: X [55, 145] with Xscl = 15, and Y [-.008, .028] with Yscl = .01.
• Press 2nd VARS (DISTR), then choose Draw and 1: ShadeNorm(.
• Complete the command ShadeNorm(125, 1E99, 100, 15), or ShadeNorm(0, 85, 100, 15) for the area below 85.
• If using Z scores instead of raw scores, the mean 0 and SD 1 will be understood so ShadeNorm (1,2) will give you area from z-score of 1.0 to 2.0. How does this compare to the 68-95-99.7 rule?
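These areas can be double-checked off the calculator; here is a minimal sketch using Python's standard-library `statistics.NormalDist` (not part of the original slides):

```python
from statistics import NormalDist

wisc = NormalDist(mu=100, sigma=15)

# Area above 125, i.e. ShadeNorm(125, 1E99, 100, 15)
above_125 = 1 - wisc.cdf(125)

# Area from z = 1 to z = 2, i.e. ShadeNorm(1, 2) on the standard normal
std = NormalDist()  # mean 0, SD 1
z1_to_z2 = std.cdf(2) - std.cdf(1)

print(round(above_125, 4))  # 0.0478
print(round(z1_to_z2, 4))   # 0.1359, close to (95 - 68)/2 = 13.5%
```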
2. NormalCdf
• Advantage: quicker. Disadvantage: no picture.
• 2nd Vars (DISTR) choose 2: normalcdf(.
• Complete command (125, 1E99, 100, 15) and press enter. You get .0477 or approx. 5%.
• If you have Z scores: normalcdf (-1, 1) = .6827 aka 68%...like our rule!
InvNorm
• InvNorm calculates the raw score or z-score value corresponding to a known area under the curve.
• 2nd Vars (DISTR), choose 3: invNorm(.
• Complete the command invNorm(.9, 100, 15) and press Enter. You get 119.223, so this is the score corresponding to the 90th percentile.
• Compare this with command invNorm (.9) you get 1.28. This is the Z score.
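The same two invNorm calls can be sketched with the stdlib `NormalDist.inv_cdf` (an off-calculator check, not part of the original slides):

```python
from statistics import NormalDist

# invNorm(.9, 100, 15): raw score at the 90th percentile of N(100, 15)
score_90th = NormalDist(mu=100, sigma=15).inv_cdf(0.9)

# invNorm(.9): the corresponding z-score on the standard normal
z_90th = NormalDist().inv_cdf(0.9)

print(round(score_90th, 3))  # 119.223
print(round(z_90th, 2))      # 1.28
```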
Describing Bivariate Relationships
Chapter 3 Summary, YMS3e
AP Stats at RHS, Ms. Nichols
Bivariate Relationships
What is bivariate data? When exploring/describing a bivariate (x, y) relationship:
• Determine the explanatory and response variables
• Plot the data in a scatterplot
• Note the strength, direction, and form
• Note the mean and standard deviation of x and the mean and standard deviation of y
• Calculate and interpret the correlation, r
• Calculate and interpret the least squares regression line in context
• Assess the appropriateness of the LSRL by constructing a residual plot
3.1 Response vs. Explanatory Variables
• A response variable measures an outcome of a study; an explanatory variable helps explain or influence changes in a response variable (like independent vs. dependent).
• Calling one variable explanatory and the other response doesn’t necessarily mean that changes in one CAUSE changes in the other.
• Ex: Alcohol and body temp: one effect of alcohol is a drop in body temperature. To test this, researchers give several amounts of alcohol to mice and measure each mouse's body temperature change. What are the explanatory and response variables?
Scatterplots
• Scatterplot shows the relationship between two quantitative variables measured on the same individuals.
• Explanatory variables go along the X axis, response variables along the Y axis.
• Each individual in the data appears as a point in the plot, fixed by the values of both variables for that individual.
• Example:
Interpreting Scatterplots
• Direction: in previous example, the overall pattern moves from upper left to lower right. We call this a negative association.
• Form: The form is slightly curved and there are two distinct clusters. What explains the clusters? (ACT States)
• Strength: The strength is determined by how closely the points follow a clear form. The example is only moderately strong.
• Outliers: Do we see any deviations from the pattern? (Yes, West Virginia, where 20% of HS seniors take the SAT but the mean math score is only 511).
Association
Introducing Categorical Variables
Calculator Scatterplot
• Enter the beer consumption values in L1 and the BAC values in L2.
• Next, specify a scatterplot in the StatPlot menu (first graph type), with Xlist L1 and Ylist L2 (explanatory and response).
• Use ZoomStat.
• Notice that there are no scales on the axes and they aren't labeled. If you are copying your graph to your paper, make sure you scale and label the axes (use Trace).
Student: 1    2    3    4    5    6     7    8    9    10   11   12   13    14   15   16
Beers:   5    2    9    8    3    7     3    5    3    5    4    6    5     7    1    4
BAC:     0.10 0.03 0.19 0.12 0.04 0.095 0.07 0.06 0.02 0.05 0.07 0.10 0.085 0.09 0.01 0.05
Correlation
• Caution: our eyes can be fooled! Our eyes are not good judges of how strong a linear relationship is. The two scatterplots depict the same data, but drawn with different scales. Because of this we need a numerical measure to supplement the graph.
r
• The Correlation measures the direction and strength of the linear relationship between 2 variables.
• Formula (you don't need to memorize or use it): r = (1/(n-1)) * Σ(z_x * z_y), the average product of the standardized x and y values.
• In Calc: go to Catalog (2nd, zero button), go to DiagnosticOn, Enter, Enter. You only have to do this ONCE! Once this is done:
• Enter the data in L1 and L2 (you can do CALC, 2-Var Stats if you want the mean and SD of each).
• CALC, LinReg(a+bx), Enter.
Interpreting r
• The absolute value of r tells you the strength of the association (0 means no association; 1 is the strongest possible association).
• The sign tells you whether it's a positive or a negative association. So r ranges from -1 to +1.
• Note- it makes no difference which variable you call x and which you call y when calculating correlation, but stay consistent!
• Because r uses standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. (Ex: Measuring height in inches vs. ft. won’t change correlation with weight)
• Values of -1 and +1 occur ONLY in the case of a perfect linear relationship, when the variables lie exactly along a straight line.
Examples
1. Correlation requires that both variables be quantitative.
2. Correlation measures the strength of only LINEAR relationships, not curved...no matter how strong they are!
3. Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot
4. Correlation is not a complete summary of two-variable data, even when the relationship is linear: always give the means and standard deviations of both x and y along with the correlation.
3.2 Least-Squares Regression
The slope here, b = -0.00344, tells us that fat gained goes down by 0.00344 kg for each added calorie of NEA, according to this linear model. The slope of our regression equation is the predicted RATE OF CHANGE in the response y as the explanatory variable x changes.
The y intercept, a = 3.505 kg, is the fat gain estimated by this model if NEA does not change when a person overeats.
Prediction
• We can use a regression line to predict the response y for a specific value of the explanatory variable x.
LSRL
• In most cases, no line will pass exactly through all the points in a scatterplot, and different people will draw different regression lines by eye.
• Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot.
• A good regression line makes the vertical distances of the points from the line as small as possible.
• Error = observed response - predicted response
LSRL Cont.
Equation of the LSRL
• Example 3.36: The Sanchez household is about to install solar panels to reduce the cost of heating their house. In order to know how much the panels help, they record their consumption of natural gas before the panels are installed. Gas consumption is higher in cold weather, so the relationship between outside temp and gas consumption is important.
• Describe the direction, form, and strength of the relationship
• Positive, linear, and very strong
• About how much gas does the regression line predict that the family will use in a month that averages 20 degree-days per day?
• 500 cubic feet per day
• How well does the least-squares line fit the data?
• The errors of our predictions, or vertical distances from predicted y to observed y, are called residuals because they are the "left-over" variation in the response.
Residuals
One subject's NEA rose by 135 calories. That subject gained 2.7 kg of fat. The predicted gain for 135 calories is
y-hat = 3.505 - 0.00344(135) = 3.04 kg
The residual for this subject is
y - y-hat = 2.7 - 3.04 = -0.34 kg
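The same arithmetic as a quick sketch (the model coefficients are the ones given above for the NEA data):

```python
# Predicted fat gain from the NEA model: y-hat = 3.505 - 0.00344 * NEA
nea_rise = 135       # calories
observed_gain = 2.7  # kg

predicted = 3.505 - 0.00344 * nea_rise
residual = observed_gain - predicted  # residual = observed - predicted

print(round(predicted, 2))  # 3.04
print(round(residual, 2))   # -0.34
```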
Residual Plot
• The sum of the least-squares residuals is always zero, so the mean of the residuals is always zero.
• The horizontal line at zero in the figure helps orient us; this "residual = 0" line corresponds to the regression line.
Residuals List on Calc
• If you want all your residuals listed in L3, highlight L3 (the name of the list, at the top), go to 2nd STAT (LIST), choose RESID, then press Enter and Enter. The list that appears gives the residual for each individual in the corresponding L1 and L2. (If you create a normal scatterplot using this list as your Ylist, so Xlist L1 and Ylist L3, you get exactly the same thing as a residual plot defined with Xlist L1 and Ylist RESID, as we had been doing.)
• This is a helpful list to have for checking your work when asked to calculate an individual's residual.
Examining the Residual Plot
• The residual plot should show no obvious pattern. A curved pattern shows that the relationship is not linear and a straight line may not be the best model.
• Residuals should be relatively small in size. A regression line that fits the data well should "come close" to most of the points.
• A commonly used measure of this is the standard deviation of the residuals, given by:
s = sqrt( Σ(residuals²) / (n - 2) )
For the NEA and fat gain data, s = sqrt(7.663 / 14) = 0.740.
Residual Plot on Calc
• Produce the scatterplot and regression line from the data (let's use the BAC data if it's still entered).
• Turn all plots off.
• Create a new scatterplot with Xlist as your explanatory variable and Ylist as the residuals (2nd STAT, RESID).
• ZoomStat.
R Squared: Coefficient of Determination
If all the points fall directly on the least-squares line, r squared = 1. Then all the variation in y is explained by the linear relationship with x.
So if r squared = .606, that means that about 61% of the variation in y among individual subjects is due to the straight-line relationship with x. The other 39% is "not explained."
r squared is a measure of how successful the regression was in explaining the response.
Facts about Least-Squares Regression
• The distinction between explanatory and response variables is essential in regression. If we reverse the roles, we get a different least-squares regression line.
• There is a close connection between correlation and the slope of the LSRL: slope b = r(Sy/Sx). This says that a change of one standard deviation in x corresponds to a change of r standard deviations in y. When the variables are perfectly correlated (r = +/- 1), the change in the predicted response y-hat is the same (in standard-deviation units) as the change in x.
• The LSRL will always pass through the point (x-bar, y-bar).
• r squared is the fraction of variation in values of y explained by the x variable.
3.3 Influences
• Correlation r is not resistant: one unusual point in the scatterplot can greatly affect the value of r. The LSRL is also not resistant. Extrapolation is not very reliable.
• A point that is extreme in the x direction with no other points near it pulls the line toward itself. This point is influential.
Lurking Variables: Beware!
• Example: A College Board study of HS grads found a strong correlation between the amount of math minority students took in high school and their later success in college. News articles quoted the College Board as saying that "math is the gatekeeper for success in college."
• But minority students from middle-class homes with educated parents no doubt take more high school math courses. They are also more likely to have a stable family, parents who emphasize education, money to pay for college, etc. These students would likely succeed in college even if they took fewer math courses. The family background of students is a lurking variable that probably explains much of the relationship between math courses and college success.
Beware Correlations Based on Averages
• Correlations based on averages are usually too high when applied to individuals.
• Example: if we plot the average height of young children against their age in months, we will see a very strong positive association with correlation near 1. But individual children of the same age vary a great deal in height. A plot of height against age for individual children will show much more scatter and lower correlation than the plot of average height against age.
Chapter Example: Corrosion and Strength
Consider the following data from the article "The Carbonation of Concrete Structures in the Tropical Environment of Singapore" (Magazine of Concrete Research (1996): 293-300), which discusses how the corrosion of steel (caused by carbonation) is the biggest problem affecting concrete strength:
x = carbonation depth in concrete (mm)
y = strength of concrete (MPa)

x: 8    20   20   30   35   40   50   55   65
y: 22.8 17.1 21.5 16.1 13.4 12.4 11.4 9.7  6.8

Define the explanatory and response variables. Plot the data and describe the relationship.
Corrosion and Strength
[Scatterplot: strength (MPa) vs. depth (mm)]
There is a strong, negative, linear relationship between depth of corrosion and concrete strength. As the depth increases, the strength decreases at a constant rate.
Corrosion and Strength
[Scatterplot: strength (MPa) vs. depth (mm)]
The mean depth of corrosion is 35.89 mm with a standard deviation of 18.53 mm. The mean strength is 14.58 MPa with a standard deviation of 5.29 MPa.
Corrosion and Strength
Find the equation of the least squares regression line (LSRL) that models the relationship between corrosion and strength.
[Scatterplot with fitted line: strength (MPa) vs. depth (mm)]
y-hat = 24.52 + (-0.28)x
strength = 24.52 + (-0.28)depth
r = -0.96
Corrosion and Strength
[Scatterplot with fitted line: strength (MPa) vs. depth (mm)]
y-hat = 24.52 + (-0.28)x; strength = 24.52 + (-0.28)depth; r = -0.96
What does "r" tell us? There is a strong, negative, LINEAR relationship between depth of corrosion and strength of concrete.
What does "b = -0.28" tell us? For every increase of 1 mm in depth of corrosion, we predict a 0.28 MPa decrease in the strength of the concrete.
Corrosion and Strength
Use the prediction model (LSRL) to determine the following:
• What is the predicted strength of concrete with a corrosion depth of 25 mm?
strength = 24.52 + (-0.28)(25) ≈ 17.59 MPa (using the calculator's unrounded slope, about -0.277)
• What is the predicted strength of concrete with a corrosion depth of 40 mm?
strength = 24.52 + (-0.28)(40) ≈ 13.44 MPa
How does this prediction compare with the observed strength at a corrosion depth of 40 mm?
Residuals
Note: the predicted strength when corrosion = 40 mm is 13.44 MPa. The observed strength when corrosion = 40 mm is 12.4 MPa.
• The prediction did not match the observation. That is, there was an "error" or "residual" between our prediction and the actual observation.
• RESIDUAL = observed y - predicted y
• The residual when corrosion = 40 mm is: residual = 12.4 - 13.44 = -1.04
Assessing the Model
Is the LSRL the most appropriate prediction model for strength? r suggests it will provide strong predictions... can we do better? To determine this, we need to study the residuals generated by the LSRL:
• Make a residual plot.
• Look for a pattern.
• If no pattern exists, the LSRL may be our best bet for predictions.
• If a pattern exists, a better prediction model may exist...
Residual Plot
Construct a residual plot for the (depth, strength) LSRL.
[Residual plot: residuals vs. depth (mm)]
There appears to be no pattern in the residual plot... therefore, the LSRL may be our best prediction model.
Coefficient of Determination
We know what "r" tells us about the relationship between depth and strength... what about r squared?
[Scatterplot: strength (MPa) vs. depth (mm)]
93.75% of the variability in strength can be explained by the LSRL on depth.
Summary
When exploring a bivariate relationship:
Make and interpret a scatterplot:
Strength, Direction, Form
Describe x and y:
Mean and Standard Deviation in Context
Find the Least Squares Regression Line.
Write in context.
Construct and Interpret a Residual Plot.
Interpret r and r2 in context.
Use the LSRL to make predictions...
Examining Relationships: Regression Review
Regression Basics
When describing a bivariate relationship:
Make a Scatterplot
Strength, Direction, Form
Model: y-hat=a+bx
Interpret slope in context
Make Predictions
Residual = Observed-Predicted
Assess the Model
Interpret “r”
Residual Plot
[The Endangered Manatee scatterplots: manatees killed vs. powerboat registrations (thousands), fitted line Killed = 0.125(Registrations) - 41.4 with r squared = 0.89, and the corresponding residual plot]
Reading Minitab Output

Regression Analysis: Fat gain versus NEA

The regression equation is
FatGain = ****** + ******(NEA)

Predictor   Coef        SE Coef     T       P
Constant    3.5051      0.3036      11.54   0.000
NEA         -0.0034415  0.00074141  -4.04   0.000

S = 0.739853   R-Sq = 60.6%   R-Sq(adj) = 57.8%

Regression equations aren't always as easy to spot as they are on your TI-84. Can you find the slope and intercept above?
Outliers / Influential Points
Does the age of a child's first word predict his/her mental ability? Consider the following data on (age of first word, Gesell Adaptive Score) for 21 children.
Age at First Word and Gesell Score

Child   Age (months)   Score
1       15              95
2       26              71
3       10              83
4        9              91
5       15             102
6       20              87
7       18              93
8       11             100
9        8             104
10      20              94
11       7             113
12       9              96
13      10              83
14      11              84
15      11             102
16      10             100
17      12             105
18      42              57
19      17             121
20      11              86
21      10             100
[Scatterplot: Gesell score vs. age at first word (months), with fitted line Score = -1.13(Age) + 110, r squared = 0.41]
[The same scatterplot with one extreme point highlighted]
Does the highlighted point markedly affect the equation of the LSRL? If so, it is "influential".
Test by removing the point and finding the new LSRL. Influential?
[Scatterplot with that point removed: Score = -0.779(Age) + 106, r squared = 0.11]
Explanatory vs. Response
The distinction between explanatory and response variables is essential in regression. Switching the distinction results in a different least-squares regression line.
[Hubble 1929 data scatterplots: v = 454r - 38 with r squared = 0.63; with the axes swapped, r = 0.00139v + 0.39, r squared = 0.63]
Note: The correlation value, r, does NOT depend on the distinction between Explanatory and Response.
Correlation
The correlation, r, describes the strength of the straight-line relationship between x and y.
Ex: There is a strong, positive, LINEAR relationship between # of beers and BAC.
There is a weak, positive, linear relationship between x and y. However, there is a strong nonlinear relationship.
r measures the strength of linearity...
[Scatterplot: Beer and Blood Alcohol, BAC = 0.0180(Beers) - 0.013, r squared = 0.80]
[Scatterplot of a strongly curved relationship: fitted line y = x - 10, r squared = 0.14]
Coefficient of Determination
The coefficient of determination, r squared, describes the percent of variability in y that is explained by the linear regression on x.
71% of the variability in death rates due to heart disease can be explained by the LSRL on alcohol consumption.
That is, alcohol consumption provides us with a fairly good prediction of death rate due to heart disease, but other factors contribute to this rate, so our prediction will be off somewhat.
[Scatterplot: Wine Consumption and Heart Disease, DeathRate = -23.0(Alcohol) + 260, r squared = 0.71]
Cautions
Correlation and regression are NOT RESISTANT to outliers and influential points!
Correlations based on “averaged data” tend to be higher than correlations based on all raw data.
Extrapolating beyond the observed data can result in predictions that are unreliable.
Correlation vs. Causation
Consider the following historical data:
Collection 1

Year   Ministers   Rum
1860    63          8376
1865    48          6406
1870    53          7005
1875    64          8486
1880    72          9595
1885    80         10643
1890    85         11265
1895    76         10071
1900    80         10547
1905    83         11008
1910   105         13885
1915   140         18559
[Scatterplot: barrels of rum vs. number of ministers, y = 132x + 33, r squared = 1.00]
There is an almost perfect linear relationship between x and y. (r=0.999997)
x = # Methodist Ministers in New England
y = # of Barrels of Rum Imported to Boston
CORRELATION DOES NOT IMPLY CAUSATION!
Summary
[The Endangered Manatee scatterplots repeated from above: Killed = 0.125(Registrations) - 41.4, r squared = 0.89, with the residual plot]