AP Statistics - Ds Mathdsmath.weebly.com/uploads/2/2/3/6/22368992/3.1_two... · AP Statistics...

AP Statistics Two-Variable Data Analysis

Key Ideas

Scatterplots

Lines of Best Fit

The Correlation Coefficient

Least Squares Regression Line

Coefficient of Determination

Residuals

Outliers and Influential Points

Transformations to Achieve Linearity

Two-Variable Data

When looking at two-variable data, we are primarily interested in whether or not two variables have a linear relationship and how changes in one variable can predict changes in the other variable.

Two-variable data is also called bivariate data.

Response, Explanatory

A response variable measures an outcome of a study.

An explanatory variable helps explain or influences changes in a response variable.

Exercises: page 173 problems 3.1 – 3.4

Example

STUDENT   HOURS STUDIED   SCORE ON EXAM
A         0.5             65
B         2.5             80
C         3.0             77
D         1.5             60
E         1.25            68
F         0.75            70
G         4.0             83
H         2.25            85
I         1.5             70
J         6.0             96
K         3.25            84
L         2.5             84
M         0.0             51
N         1.75            63
O         2.0             71

Example, cont.

A teacher wanted to know if additional studying resulted in higher grades. In other words, does studying have an effect on test performance?

Draw a scatterplot – putting one variable on the horizontal axis and the other on the vertical axis. If we have an explanatory variable, it should go on the horizontal axis and the response variable on the vertical axis.
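
As a rough illustration of the axis convention, the sketch below plots the study-time example as a text grid, with hours studied (explanatory) across and exam score (response) up. The function name ascii_scatter and the grid size are illustrative choices, not part of the slides.

```python
# Example data from the slides: hours studied (explanatory), exam score (response).
hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 1.5, 6.0, 3.25, 2.5, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 70, 96, 84, 84, 51, 63, 71]


def ascii_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows; x runs left to right, y runs bottom to top.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 of the grid is the top of the plot, so flip the vertical index.
        grid[height - 1 - row][col] = "*"
    return ["".join(cells) for cells in grid]


rows = ascii_scatter(hours, scores)
for line in rows:
    print("|" + line)
print("+" + "-" * 40 + "  hours studied")
```

Points drifting from the lower left to the upper right suggest a positive association between hours studied and score.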

Interpreting a Scatterplot

Direction, form and strength of the relationship

Striking deviations

Outliers

Association?

If the variable on the vertical axis tends to increase as the variable on the horizontal axis increases, we say that the two variables are positively associated.

If one of them decreases as the other increases, we say they are negatively associated.

Calculator Tip

To draw a scatterplot on your calculator, enter the data in two lists (L1 and L2).

Go to STAT PLOT and choose the scatterplot icon.

Enter L1 for Xlist and L2 for Ylist.

Do ZOOM: ZoomStat.

See Technology Toolbox on page 183

Exercises

Page 179, problems 3.5 – 3.10

Correlation

We are primarily interested in determining the extent to which two variables are linearly associated.

The first statistic we use to assess a linear relationship is the Pearson product-moment correlation, or more simply, the correlation coefficient, denoted by the letter r.

The correlation coefficient is a measure of the strength of the linear relationship between two variables as well as an indicator of the direction of the linear relationship.

Formula

If we have a sample of size n of paired data, say (x,y), and assuming that we have computed summary statistics for x and y (means and standard deviations), the correlation coefficient r is defined as follows:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Find r for the previous example.
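
The definition of r (the sum of the products of the standardized x and y values, divided by n - 1) can be sketched in Python for the study-time example. The helper names mean, sample_sd and correlation are illustrative, and the standard deviations are the sample versions with n - 1 in the divisor.

```python
from math import sqrt

# Example data from the slides: hours studied and exam score for students A to O.
hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 1.5, 6.0, 3.25, 2.5, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 70, 96, 84, 84, 51, 63, 71]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n - 1) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(hours, scores)
print(round(r, 3))
```

For this data r comes out at roughly 0.86, a fairly strong positive linear association.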

Properties of r

If r is positive, it indicates that the variables are positively associated. If r is negative, the variables are negatively associated.

If r = 0, it indicates that there is no linear association that would allow us to predict y from x. It doesn’t mean that there is no relationship – just not a linear one.

It doesn’t matter which variable you call x and which one you call y.

r doesn’t depend on the units of measurement.

r is not resistant to extreme values because it is based on the mean.

-1 ≤ r ≤ 1
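
Several of these properties can be checked numerically. The sketch below reuses the study-time data and adds a small made-up curved data set (x from -2 to 2 with y = x squared) that is not from the slides; the correlation helper is an illustrative implementation of the definition of r.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r from sums of squared deviations and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 1.5, 6.0, 3.25, 2.5, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 70, 96, 84, 84, 51, 63, 71]

# Swapping the roles of x and y leaves r unchanged.
r_xy = correlation(hours, scores)
r_yx = correlation(scores, hours)

# Changing units (hours to minutes) leaves r unchanged.
minutes = [60 * h for h in hours]
r_minutes = correlation(minutes, scores)

# Hypothetical curved data: y is completely determined by x, yet r is 0.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curved = correlation(curve_x, curve_y)

print(r_xy, r_yx, r_minutes, r_curved)
```

The curved case shows why r = 0 rules out only a linear relationship, not every relationship.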

Guidelines

There are no hard and fast rules about how strong a relationship is based on the numerical value of r.

VALUE OF r                                STRENGTH OF RELATIONSHIP
-1 < r < -0.8 or 0.8 < r < 1              strong
-0.8 < r < -0.5 or 0.5 < r < 0.8          moderate
-0.5 < r < 0.5                            weak
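
The guideline table can be written as a small lookup. This is only a sketch of a rule of thumb: the function name strength is illustrative, and because the table's strict inequalities leave the cut-offs 0.5 and 0.8 unassigned, placing them in the weaker category is an assumption made here.

```python
def strength(r):
    """Label the strength of a linear relationship using the guideline table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign of r gives direction, not strength
    if size > 0.8:
        return "strong"
    if size > 0.5:
        # Assumption: a value of exactly 0.8 is labelled moderate.
        return "moderate"
    # Assumption: a value of exactly 0.5 is labelled weak.
    return "weak"


for value in (0.86, -0.6, 0.2):
    print(value, strength(value))
```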

Calculator Tip

You will first need to turn diagnostics on. Find DiagnosticOn in CATALOG, choose it, and press ENTER twice.

Enter values in L1 and L2.

STAT: CALC: LinReg(a + bx) then ENTER

Enter L1, L2 and press ENTER

Exercises

Page 188-189, problems 3.13, 3.14

Correlation and Causation

Association does not imply causation!

Just because two things seem to go together does not mean that one caused the other.

Some third variable, called a lurking variable, may be influencing them both.

YEAR   Number of Methodist Ministers in New England   Number of barrels of Cuban rum imported to Boston
1860   63                                             8,376
1865   48                                             6,406
1870   53                                             7,005
1875   64                                             8,486
1880   72                                             9,595
1885   80                                             10,643
1890   85                                             11,265
1895   76                                             10,071
1900   80                                             10,547
1905   83                                             11,008
1910   105                                            13,885
1915   140                                            18,559
1920   175                                            23,024
1925   183                                            24,185
1930   192                                            25,434
1935   221                                            29,238
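
To see how strongly these two series move together, r can be computed from the table. The values below are transcribed from the table with thousands separators dropped, and the correlation helper is an illustrative implementation of the definition of r.

```python
from math import sqrt

# Number of Methodist ministers in New England, 1860 to 1935 in five-year steps.
ministers = [63, 48, 53, 64, 72, 80, 85, 76, 80, 83, 105, 140, 175, 183, 192, 221]
# Barrels of Cuban rum imported to Boston in the same years.
rum_barrels = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071,
               10547, 11008, 13885, 18559, 23024, 24185, 25434, 29238]


def correlation(xs, ys):
    """Correlation coefficient r from sums of squared deviations and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r_rum = correlation(ministers, rum_barrels)
print(r_rum)
```

The result is very close to 1, yet nobody would claim that ministers cause rum imports. A lurking variable that grew over the same years, such as population, is a more plausible explanation for both series.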

Cautions

Correlation requires that both variables be quantitative.

Correlation does not describe curved relationships between variables, no matter how strong they are.

Correlation is not a complete summary of two-variable data.

Line of Best Fit

Once we have determined that two variables have a strong linear relationship, we can find a line of best fit so that we can predict values.

This is called linear regression. In this situation, it matters which variable we call x and which one we call y.

The line we are looking for is called the least squares regression line.
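
As a sketch of what fitting that line involves, the code below computes the least squares slope and intercept for the study-time example. The slides do not give the formulas at this point; the standard ones are used (slope b = S_xy / S_xx, which equals r times s_y / s_x, and intercept a = y-bar minus b times x-bar), and the names least_squares_line and predict are illustrative.

```python
# Example data from the slides: hours studied (x) and exam score (y).
hours = [0.5, 2.5, 3.0, 1.5, 1.25, 0.75, 4.0, 2.25, 1.5, 6.0, 3.25, 2.5, 0.0, 1.75, 2.0]
scores = [65, 80, 77, 60, 68, 70, 83, 85, 70, 96, 84, 84, 51, 63, 71]


def least_squares_line(xs, ys):
    """Return (a, b) for the line y-hat = a + b * x that minimizes squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x-bar, y-bar)
    return a, b


a, b = least_squares_line(hours, scores)


def predict(x):
    """Predicted exam score for x hours of study."""
    return a + b * x


print(round(a, 2), round(b, 2))
print(round(predict(3.0), 1))

# Swapping x and y gives a different line, which is why the roles matter here.
a_swapped, b_swapped = least_squares_line(scores, hours)
```

For this data the fitted line is roughly score = 59.0 + 6.8 × hours. This matches the a and b that LinReg(a+bx) reports on the calculator, since both use the form a + bx.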

Exercises

Page 193-195, problems 3.15-3.20

Section 3.1 Exercises: page 196-199, problems 3.21-3.28