Investigating Linear Patterns in Data · 2019. 1. 19. · Investigating Linear Patterns in Data...

Investigating Linear Patterns in Data

Bill Gillam

Linear Regression

"The Adventure of Wisteria Lodge" Sherlock Holmes. Sir Arthur Conan Doyle, 20 April 1988. Granada Television. Access: https://www.youtube.com/watch?v=vY7jswaefbw 4:31/9:50

“If he was all on the same scale as his foot, he must certainly have been a giant.” - Sherlock Holmes

The Adventure of Wisteria Lodge.

Question: Is the length of a person’s foot a useful predictor of his/her height?

How can we gather and use data to tell?

What kind of evidence would convince us that the two variables, footlength and height, are, to use Holmes’ words, “on the same scale”?

One way to do this is to create a mathematical model of the supposed relationship from data, and then measure how closely the data actually matches our model.

We will use linear regression to find our model, and we will use the correlation coefficient to measure how closely our data follow that model.

Linear Regression: vocabulary in context

We are going to perform simple linear regression in this example. That means we assume a linear relationship and use only one explanatory (independent, manipulated, input) variable. We will let footlength be our explanatory variable and height be our response (dependent, responding, output) variable.

Linear Regression

Here is our plan:

Step One: Gather data.

Step Two: See if the data trends linearly.

Step Three: Use linear regression to find the line of best fit.

Step Four: Find and interpret the correlation coefficient.

Step One: Gather some data.

Here are some data I gathered from one of my classes:

(EXCEL)

footlength (cm)   height (cm)
22                178
25                173
20                150
24                169
22                166
29                170
22                163
23                169
29                179
28                167
27                178
20                157
23                161
27                163
29                172
28                163
25                156
29                175
25                152
23                160
22                170

Studying numerical patterns (table)

[Scatter plot of height (cm) against footlength (cm)]

Our model (y = ax + b): Height = 1.2712(footlength) + 134

Slope: 1.2712/1 = rise/run

Our model says that for each additional 1 cm of footlength, predicted height increases by 1.2712 cm.
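
As a rough illustration of where such a model can come from, here is a minimal sketch of an ordinary least-squares fit on the class data from Step One. The slides appear to rely on a spreadsheet for this step, so the function name `fit_line` and the pure-Python approach are assumptions made for illustration only. On this data the sketch should give a slope close to 1.2712 and an intercept of roughly 134.6.

```python
# Footlength (cm) and height (cm) pairs from the class data in Step One.
FOOT = [22, 25, 20, 24, 22, 29, 22, 23, 29, 28, 27,
        20, 23, 27, 29, 28, 25, 29, 25, 23, 22]
HEIGHT = [178, 173, 150, 169, 166, 170, 163, 169, 179, 167, 178,
          157, 161, 163, 172, 163, 156, 175, 152, 160, 170]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = a*x + b.

    Hypothetical helper written for this example.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(FOOT, HEIGHT)
print(f"Height = {slope:.4f} * footlength + {intercept:.2f}")
```

The slope is the quantity interpreted as rise over run: the change in predicted height per 1 cm change in footlength.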

Our model (y = ax + b): Height = 1.2712(footlength) + 134

Don’t extrapolate outside your data. Notice that our model has no problem predicting that a person with no foot (a footlength of 0 cm) is 134 cm tall, even though the footlengths we measured range only from 20 cm to 29 cm.
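
One way to make the warning about extrapolation concrete is to have the prediction step refuse inputs outside the observed footlengths. The sketch below is only an illustration: the function name `predict_height` is hypothetical, the coefficients are copied from the slide rather than refitted, and the 20 cm to 29 cm limits are the smallest and largest footlengths in the class data.

```python
# Model coefficients as stated on the slide (not refitted here).
SLOPE = 1.2712
INTERCEPT = 134.0
# Smallest and largest footlength (cm) in the class data.
FOOT_MIN, FOOT_MAX = 20, 29


def predict_height(footlength_cm):
    """Predict height (cm), refusing to extrapolate beyond observed footlengths.

    Hypothetical helper written for this example.
    """
    if not FOOT_MIN <= footlength_cm <= FOOT_MAX:
        raise ValueError(
            f"footlength {footlength_cm} cm is outside the observed range "
            f"{FOOT_MIN}-{FOOT_MAX} cm"
        )
    return SLOPE * footlength_cm + INTERCEPT


# Inside the observed range, the model gives a prediction.
print(round(predict_height(25), 2))

# Outside the observed range (a "person with no foot"), it refuses.
try:
    predict_height(0)
except ValueError as err:
    print("refused:", err)
```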

We are given R² = 0.2186.

From this we can take a square root to get Pearson’s correlation coefficient. The slope of our line is positive, so we take the positive root: R = 0.47.

Interpreting Correlation

A correlation with |R| greater than 0.8 is generally described as strong, whereas one with |R| less than 0.5 is generally described as weak.

R = 0.47 indicates a weak to moderate positive correlation.

R² (R-squared): 0.2186

R² is the proportion of the variation in the response variable that is explained by the model, usually quoted as a percentage.

In our case, 21.86% of the variability in height is explained by the linear relationship with footlength.
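
To see what "explained by the model" means numerically, the following sketch computes R² as one minus the ratio of the residual sum of squares to the total sum of squares, using the class data from Step One. It is an illustration under the assumption of an ordinary least-squares fit; the variable names are chosen for this example. It then recovers r by taking the square root of R² and giving it the sign of the slope.

```python
import math

# Footlength (cm) and height (cm) pairs from the class data in Step One.
FOOT = [22, 25, 20, 24, 22, 29, 22, 23, 29, 28, 27,
        20, 23, 27, 29, 28, 25, 29, 25, 23, 22]
HEIGHT = [178, 173, 150, 169, 166, 170, 163, 169, 179, 167, 178,
          157, 161, 163, 172, 163, 156, 175, 152, 160, 170]

n = len(FOOT)
x_bar = sum(FOOT) / n
y_bar = sum(HEIGHT) / n

# Least-squares slope and intercept.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(FOOT, HEIGHT))
         / sum((x - x_bar) ** 2 for x in FOOT))
intercept = y_bar - slope * x_bar

# Variation left over after the model (residual sum of squares).
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(FOOT, HEIGHT))
# Total variation of height around its mean.
ss_tot = sum((y - y_bar) ** 2 for y in HEIGHT)

r_squared = 1 - ss_res / ss_tot
# r carries the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"R^2 = {r_squared:.4f}, r = {r:.2f}")
```

With this data the result should agree with the values quoted on the slides, R² of about 0.2186 and r of about 0.47.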

Pearson’s Correlation Coefficient

Pearson’s Correlation Coefficient: how it is calculated

For each ordered pair, multiply the z-score of x by the z-score of y. Sum these products and divide by n - 1.

x     y     x - xbar   y - ybar   zx*zy = (x - xbar)/sx * (y - ybar)/sy
6     5     -8         -2         0.7604
10    3     -4         -4         0.7604
14    7      0          0         0.0000
19    8      5          1         0.2376
21    12     7          5         1.6634

xbar = 14, ybar = 7, sx = 6.2, sy = 3.4, n = 5
Sum of zx*zy = 3.4218
r = sum/(n - 1) = 3.4218/4 = 0.8554
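
The z-score recipe can be sketched in a few lines of Python for the five ordered pairs in the table above. This is an illustration only: the helper name `pearson_r` is hypothetical, and it uses the sample standard deviation (dividing by n - 1), which matches sx = 6.2 and sy = 3.4. Because the code keeps full precision for sx and sy, its result of about 0.855 can differ in the third decimal place from hand or spreadsheet work that rounds intermediate values.

```python
import statistics

# The five ordered pairs from the worked table.
xs = [6, 10, 14, 19, 21]
ys = [5, 3, 7, 8, 12]


def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score products divided by n - 1.

    Hypothetical helper written for this example.
    """
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```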

Pearson’s Correlation Coefficient

Conditions for using the correlation coefficient:

• Both variables are quantitative.

• The relationship is linear.

• Outliers are not distorting the correlation.

Non-linear Data

Sometimes we can make non-linear data linear.

This is the data for f-stops for a camera. Here we square the data to make it linear. Another transformation that statisticians sometimes use is taking logs to straighten exponential data.
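
The f-stop values themselves are not reproduced in this transcript, so the sketch below illustrates the log transformation instead, using synthetic data. The values y = 3 · 2^x are invented purely for illustration and are not from the slides, and the helper name `pearson_r` is hypothetical. The idea is that if y grows exponentially in x, then log(y) is a linear function of x, so the correlation after the transformation should be essentially 1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from sums of deviations (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic exponential data (an assumption for illustration): y = 3 * 2**x.
xs = list(range(8))
ys = [3 * 2 ** x for x in xs]

# Taking logs turns y = 3 * 2**x into log(y) = log(3) + x * log(2),
# which is a straight line in x.
log_ys = [math.log(y) for y in ys]

r_raw = pearson_r(xs, ys)
r_log = pearson_r(xs, log_ys)
print(f"r before transform: {r_raw:.3f}")
print(f"r after log transform: {r_log:.3f}")
```

Squaring, as on the slide, works the same way: apply the function to each data point, then check whether the transformed data trend linearly.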

Correlation vs. Causation

Correlation does not imply causation. Often the correlation is caused by lurking variables. A lurking variable is a variable that is not included as an explanatory or response variable but could strongly affect the correlation.

Another mistake we can make is to reverse the explanatory and response variables from their actual relationship. The stork story is an example of this.

Bad conclusions:

• The number of AP courses a student takes and SAT performance strongly correlate, so we should enroll all students in more AP courses.

• Studies show that the number of police in an area positively correlates with the amount of gang activity, so we should reduce the number of policemen to reduce gang activity.

• Completed homework and student performance have a positive correlation, so all teachers should assign more homework every night in every course.

• The amount of damage caused by a fire has a positive correlation with the number of firemen at the scene. To reduce the amount of damage due to fires, we should send fewer firemen.

Discuss possible lurking variables or reversed axes.

What have we learned?

• We can use linear regression to find a linear model for appropriate data

• We can find the correlation coefficient, r, to see how well our model fits

• We can “straighten” some data that is not linear by applying a mathematical function, such as squaring or taking the log, to each data point

• Correlation is not necessarily causation, and we should watch for lurking variables.

Munroe, R. (n.d.). Correlation. Retrieved January 18, 2019, from https://xkcd.com/552/