Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Section 4.1

Scatter Diagrams and Linear Correlation

4.1 / 1

Page 2: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Scatter Diagram

• A scatter diagram is a graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y.

• We call x the explanatory variable.

• We call y the response variable.

4.1 / 2

Page 3: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Paired data

• x = phosphorus concentration at inlet

• y = phosphorus concentration at outlet

4.1 / 3

Page 4: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Scatter Diagram

Linear Correlation

The general trend of the points seems to follow a straight line segment.

4.1 / 4

Page 5: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Non-Linear Correlation

4.1 / 5

Page 6: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

No Linear Correlation

4.1 / 6

Page 7: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

High Linear Correlation

Points lie close to a straight line.

4.1 / 7

Page 8: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Moderate Linear Correlation

4.1 / 8

Page 9: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Low Linear Correlation

4.1 / 9

Page 10: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Perfect Linear Correlation

4.1 / 10

Page 11: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Positive Linear Correlation

4.1 / 11

Page 12: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Negative Linear Correlation

4.1 / 12

Page 13: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Little or No Linear Correlation

4.1 / 13

Page 14: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Questions Arising

• Can we find a relationship between x and y?

• How strong is the relationship?

• The answer is that there is a mathematical measurement that describes the strength of the linear association between two variables. This measure is the sample correlation coefficient r.

4.1 / 14

Page 15: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

The Correlation Coefficient (r)

• A numerical measurement that assesses the strength of a linear relationship between two variables x and y.

4.1 / 15

Page 16: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• Also called the Pearson product-moment correlation coefficient, r is a unitless measurement between −1 and 1.

• That is, −1 ≤ r ≤ 1.

4.1 / 16

Page 17: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• If r = 1, there is a perfect positive correlation.

4.1 / 17

Page 18: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• If r = −1, there is a perfect negative correlation.

4.1 / 18

Page 19: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• If r = 0, there is no linear correlation.

4.1 / 19

Page 20: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• Positive values of r imply that as x increases, y tends to increase.

4.1 / 20

Page 21: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• Negative values of r imply that as x increases, y tends to decrease.

4.1 / 21

Page 22: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• The closer r is to −1 or +1, the better a line describes the relationship between the two variables x and y.

• The value of r does not change when either variable is converted to different units.

4.1 / 22

Page 23: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Properties of the Correlation Coefficient r

• The value of r is the same regardless of which variable is the explanatory variable and which variable is the response variable. In other words, the value of r is the same for the pairs (x, y) as for the pairs (y, x).
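The unit-invariance and symmetry properties can be checked numerically. The following Python sketch is only an illustration: the helper name `pearson_r` and the small data set are made up for this purpose and do not come from the slides. It computes r from deviations about the means, then recomputes it after converting x from miles to kilometers and after swapping the roles of x and y.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Illustrative values only (not taken from the slides).
miles = [1.0, 2.0, 4.0, 5.0, 7.0]
minutes = [3.0, 7.0, 8.0, 12.0, 15.0]
kilometers = [1.609344 * m for m in miles]  # same distances in different units

r_original = pearson_r(miles, minutes)
r_converted = pearson_r(kilometers, minutes)  # x converted to other units
r_swapped = pearson_r(minutes, miles)         # pairs (y, x) instead of (x, y)

print(r_original, r_converted, r_swapped)
```

All three printed values should agree up to floating-point rounding.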

4.1 / 23

Page 24: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Computing the Correlation Coefficient r

• Obtain a random sample of n data pairs (x, y).

• Using the data pairs, compute Σx, Σy, Σx², Σy², and Σxy.

• Use the following formula:

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

4.1 / 24

Page 25: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Example: Computing r

x (Miles)   y (Min.)   x²     y²     xy
2           6          4      36     12
5           9          25     81     45
12          23         144    529    276
7           18         49     324    126
7           15         49     225    105
15          28         225    784    420
10          19         100    361    190

Σx = 58     Σy = 118   Σx² = 596   Σy² = 2340   Σxy = 1174

4.1 / 25

Page 26: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

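Computing r

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{7(1174) - (58)(118)}{\sqrt{7(596) - 58^2}\,\sqrt{7(2340) - 118^2}} \approx 0.975$$

Interpretation of r:

An r value of 0.975 indicates a strong positive correlation between the variables x and y.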
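The worked example can be reproduced in a few lines of Python. This is a minimal sketch: the function name `r_from_sums` and the variable names are hypothetical, while the data are the miles and minutes pairs from the table.

```python
from math import sqrt

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Computational formula for r, using only n and the five sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Data pairs from the example table.
miles = [2, 5, 12, 7, 7, 15, 10]
minutes = [6, 9, 23, 18, 15, 28, 19]

n = len(miles)
sum_x = sum(miles)
sum_y = sum(minutes)
sum_x2 = sum(x * x for x in miles)
sum_y2 = sum(y * y for y in minutes)
sum_xy = sum(x * y for x, y in zip(miles, minutes))

r = r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 3))
```

The intermediate sums should match the bottom row of the table before r is computed.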

4.1 / 26

Page 27: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

GUIDED EXERCISE

In one of the Boston city parks, there has been a problem with muggings in the summer months. A police officer took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

x   10   15   16   1   4   6   18   12   14   7
y   5    2    1    9   7   8   1    5    3    6

4.1 / 27

Page 28: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

GUIDED EXERCISE Cont.

a. Construct a scatter diagram of the x and y values.

Plot the (x, y) points.

b. From the scatter diagram, r will be negative. The general trend is that large x values are associated with small y values and vice versa. From left to right, the least-squares line goes down.

4.1 / 28

Page 29: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

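GUIDED EXERCISE Cont.

c. Verify that Σx = 103, Σy = 47, Σx² = 1347, Σy² = 295, and Σxy = 343.

Use a calculator.

d. Compute r. Alternatively, find the value of r directly by using a calculator.

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{10(343) - (103)(47)}{\sqrt{10(1347) - 103^2}\,\sqrt{10(295) - 47^2}} \approx \frac{-1411}{(53.49)(27.22)} \approx -0.969$$

4.1 / 29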
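Parts c and d can be checked with a short Python sketch. The variable names below are hypothetical; the data are the officer and mugging counts from the exercise. The sign of the numerator alone already shows that r is negative.

```python
from math import sqrt

# x = police officers on duty, y = reported muggings (from the exercise).
officers = [10, 15, 16, 1, 4, 6, 18, 12, 14, 7]
muggings = [5, 2, 1, 9, 7, 8, 1, 5, 3, 6]

n = len(officers)
sums = {
    "x": sum(officers),
    "y": sum(muggings),
    "x2": sum(x * x for x in officers),
    "y2": sum(y * y for y in muggings),
    "xy": sum(x * y for x, y in zip(officers, muggings)),
}

# Numerator and the two square-root factors of the computational formula.
numerator = n * sums["xy"] - sums["x"] * sums["y"]
root_x = sqrt(n * sums["x2"] - sums["x"] ** 2)
root_y = sqrt(n * sums["y2"] - sums["y"] ** 2)
r_park = numerator / (root_x * root_y)

print(sums)
print(numerator, round(root_x, 2), round(root_y, 2), round(r_park, 3))
```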

Page 30: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Sample compared to Population Correlation

• Sample correlation coefficient = r

• Population correlation coefficient = ρ

• ρ is the Greek letter rho.

4.1 / 30

Page 31: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

A Caution

• The correlation coefficient measures the strength of the linear relationship between two variables.

• A strong correlation does not imply a cause and effect relationship.

• A correlation between two variables may be caused by other (either known or unknown) variables called lurking variables.

4.1 / 31

Page 32: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Lurking Variable

• A lurking variable is neither an explanatory nor a response variable.

• A lurking variable may be responsible for changes in both x and y.

4.1 / 32

Page 33: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Example: Correlation does not equal Causation!

You were given data on the weight of cars in pounds along with their highway gas mileage. You found a linear regression equation and determined that your model was a good fit.

Car Weight (pounds)   Gas Mileage (MPG)
3489                  28
3955                  25
3345                  27
3085                  29
4915                  18
4159                  21
4289                  20
3992                  26

4.1 / 33

Page 34: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Example cont.: Correlation does not equal Causation!

• So, you now state for the whole world to hear that heavier cars get lower gas mileage. Right???

• Not necessarily. Your statement may be correct for this particular set of data, but it may not be a universal truth.

• It may also be true that the weight of the car has nothing to do with the gas mileage. Perhaps some other factor is affecting the gas mileage.

• Just because a correlation exists does not guarantee that the change in one of your variables is causing the change in the other variable.

4.1 / 34

Page 35: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Example: Cause-Effect Relationship

During the months of March and April, the weekly weight increases of a puppy in New York were collected. For the same time frame, the retail price increases of snowshoes in Alaska were collected.

Weekly Data Collection

Weight of a growing puppy in New York (pounds)   Retail price of snowshoes in Alaska
8                                                $32.45
8.5                                              $32.95
9                                                $33.45
9.6                                              $34.00
10.1                                             $34.50
10.7                                             $35.10
11.5                                             $35.63

4.1 / 35

Page 36: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Example Cause-Effect Relationship cont.

• The data was examined and was found to have a very strong linear correlation. So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase. Of course this is not true!

• The moral of this example is: "be careful what you infer from your statistical analyses." Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a cause-effect relationship.
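To see how strong this meaningless correlation is, r can be computed for the seven weekly pairs. The Python sketch below is an illustration only; the helper `pearson_r` is a hypothetical name, and the data are the puppy weights and snowshoe prices from the example. A value of r close to 1 here says nothing about cause and effect; both quantities simply rose over the same weeks.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Weekly data from the example.
puppy_weight_lb = [8, 8.5, 9, 9.6, 10.1, 10.7, 11.5]
snowshoe_price_usd = [32.45, 32.95, 33.45, 34.00, 34.50, 35.10, 35.63]

r_spurious = pearson_r(puppy_weight_lb, snowshoe_price_usd)
print(r_spurious)
```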

4.1 / 36

Page 37: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Scatter Plots (calc)

• A scatter plot is a graph used to determine whether there is a relationship between paired data.

• In many real-life situations, scatter plots follow patterns that are approximately linear. If y tends to increase as x increases, the paired data are said to have a positive correlation. If y tends to decrease as x increases, the paired data are said to have a negative correlation. If the points show no linear pattern, the paired data are said to have relatively no correlation.

To set up a scatter plot:

Clear (or deactivate) any entries in Y= before you begin.

1. Enter the X data values in L1. Enter the Y data values in L2, being careful that each X data value and its matching Y data value are entered on the same horizontal line.

4.1 / 37

Page 38: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Scatter Plots cont. (calc)

2. Activate the scatter plot. Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the scatter plot icon is highlighted, the list of the X data values is next to Xlist, and the list of the Y data values is next to Ylist. Choose any of the three marks.

3. To see the scatter plot, press ZOOM and #9 ZoomStat. Hitting TRACE and the right arrow will move along the data points.

4. To turn the scatter plot off when you are finished with this problem:

Method 1: Go to the Y= screen. Arrow up onto the PLOT highlighted at the top of the screen. Press ENTER to turn it off.

Method 2: Go to STAT PLOT (above Y=). Choose your PLOT location. Arrow to OFF. Press ENTER to turn it off.

4.1 / 38

Page 39: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Scatter Plots cont. (calc)

• Follow-up:

* At this point, the graph may be observed for the existence of a positive, negative or no correlation between the data.

* A line of best fit can be calculated "manually".

1. Select two points that you feel would give a line that fits the data.
2. Using your knowledge of equations of lines and slope, write the equation of your line.
3. Enter this equation into Y1 and graph.
4. How well does the line "fit" the data?
5. Use your line to make predictions.

* Or a line of best fit can be calculated "using the calculator". See Line of Best Fit.

4.1 / 39

Page 40: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Line of Best Fit (calc)

• A line of best fit (or "trend" line) is a straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points.

• You can examine lines of best fit with:

1. paper and pencil only
2. a combination of graphing calculator and paper and pencil
3. or solely with the graphing calculator

4.1 / 40

Page 41: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Line of Best Fit cont. (calc)

• Example: Is there a relationship between the fat grams and the total calories in fast food?

Sandwich                      Total Fat (g)   Total Calories
Hamburger                     9               260
Cheeseburger                  13              320
Quarter Pounder               21              420
Quarter Pounder with Cheese   30              530
Big Mac                       31              560
Arch Sandwich Special         31              550
Arch Special with Bacon       34              590
Crispy Chicken                25              500
Fish Fillet                   28              560
Grilled Chicken               20              440
Grilled Chicken Light         5               300

4.1 / 41

Page 42: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Line of Best Fit cont. (calc)

Paper and Pencil Solution:

1. Prepare a scatter plot of the data on graph paper.

2. Find two points that you think will be on the "best-fit" line. Perhaps you chose the points (9, 260) and (30, 530). Different people may choose different points.

3. Calculate the slope of the line through your two points (rounded to three decimal places).

$$m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{530 - 260}{30 - 9} = \frac{270}{21} \approx 12.857$$

4.1 / 42

Page 43: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Line of Best Fit cont. (calc)

4. Write the equation of the line. This equation can now be used to predict information that was not plotted in the scatter plot. For example, you can use the equation to find the total calories based upon 22 grams of fat.

Equation:

$$y - y_1 = m(x - x_1)$$
$$y - 260 = 12.857(x - 9)$$
$$y = 12.857(x - 9) + 260$$

Prediction based on 22 grams of fat:

$$y = 12.857(22 - 9) + 260$$
$$y = 12.857(13) + 260$$
$$y = 427.141$$

• Different people may choose different points and arrive at different equations. All of them are "correct", but which one is actually the "best"? To determine the actual "best" fit, we will use a graphing calculator.

4.1 / 43
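The paper-and-pencil steps can be sketched in Python as follows. The function name `line_through` is hypothetical, and the two points are the ones chosen in the example; a different choice of points would give a different line. The slope is rounded to three decimal places to mirror the hand calculation.

```python
def line_through(p1, p2, digits=3):
    """Return the rounded slope and a predictor for the line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    slope = round((y2 - y1) / (x2 - x1), digits)

    def predict(x):
        # Point-slope form: y = m(x - x1) + y1
        return slope * (x - x1) + y1

    return slope, predict

# Points chosen in the example: (fat grams, total calories).
slope, predict = line_through((9, 260), (30, 530))
calories_at_22 = predict(22)

print(slope)
print(round(calories_at_22, 3))
```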

Page 44: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Line of Best Fit cont. (calc)

Graphing Calculator Solution:

1. Enter the data in the calculator lists. Place the data in L1 and L2.

STAT, #1 Edit, type values into the lists.

2. Prepare a scatter plot of the data. Set up for the scatter plot: 2nd StatPlot, choose the first icon.

Choose ZOOM #9 ZoomStat.

4.1 / 44

Page 45: Section 4.1 Scatter Diagrams and Linear Correlation 4.1 / 1.

Line of Best Fit cont. (calc)

3. Have the calculator determine the line of best fit.

STAT → CALC, #4 LinReg(ax+b). Include the parameters L1, L2, Y1. (Y1 comes from VARS → YVARS, #Function, Y1.)

You now have the values of a and b needed to write the equation of the actual line of best fit.

y = 11.73128088x + 193.8521475

4. Graph the line of best fit. Simply hit GRAPH. To get a predicted value within the window, hit TRACE, up arrow, and type the desired value.

The screen shows x = 22.
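The calculator's result can be cross-checked away from the calculator. The Python sketch below assumes that LinReg(ax+b) performs an ordinary least-squares fit and uses the standard least-squares formulas for the slope a and intercept b, which are not stated on the slides. The function name `least_squares_line` is hypothetical; the data are the fat and calorie values from the sandwich table.

```python
def least_squares_line(xs, ys):
    """Slope a and intercept b of the ordinary least-squares line y = ax + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b

# Total fat (g) and total calories from the sandwich table.
fat_grams = [9, 13, 21, 30, 31, 31, 34, 25, 28, 20, 5]
calories = [260, 320, 420, 530, 560, 550, 590, 500, 560, 440, 300]

a, b = least_squares_line(fat_grams, calories)
predicted_at_22 = a * 22 + b  # predicted calories for 22 grams of fat

print(a, b)
print(round(predicted_at_22, 2))
```

The fitted line predicts a different calorie value at 22 grams of fat than the two-point line from the paper-and-pencil solution, since the least-squares line uses all eleven data pairs.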

4.1 / 45