Correlation and Regression Basic Concepts. An Example We can hypothesize that the value of a house...


Correlation and Regression

Basic Concepts


An Example

• We can hypothesize that the value of a house increases as its size increases.

• Said differently, size and house value “covary” or “co-relate.”

• Further, we can hypothesize that the relationship is a simple linear one, i.e., that house value increases by a constant amount for each unit increase in size.

• Hence we can use the simple linear equation, y = a + bx, to describe the relationship.


We Ask Two Questions…

• Is there a relationship and how strong is it?

• And what is the relationship?

• We answer the first with a new statistic, a “correlation” coefficient.

• We answer the second with a linear regression model.


Two Questions

• We started with correlation.

• We continue with Regression.


Terms

• Independent and Dependent variables

• Scatterplots

• Correlation, correlation coefficient, r

• Regression, regression coefficient, b

• Regression, regression constant, a

• Ordinary Least Squares (OLS) equation: y = a + bx + e


Issues

• Defining relationships
– Nature of the relationship: for the moment, linear
– Strength of the relationship (using r)
– Direction of the relationship (using r and b)
– Calculation of the relationship: y = a + bx + e


Illustration

• Case A: x = 2.5, y = 2

• Case B: x = 8, y = 7


Linear Trend
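
With only two cases, the linear trend is fully determined: exactly one line y = a + bx passes through both. A minimal sketch of that calculation, assuming the two cases from the illustration slide (the helper name `line_through` is hypothetical, introduced only for this example):

```python
# Two cases from the illustration slide, as (x, y) pairs.
case_a = (2.5, 2.0)
case_b = (8.0, 7.0)

def line_through(p, q):
    """Return (a, b) for the line y = a + b*x passing through points p and q."""
    (x1, y1), (x2, y2) = p, q
    b = (y2 - y1) / (x2 - x1)   # slope: change in y per unit change in x
    a = y1 - b * x1             # constant: value of y where x = 0
    return a, b

a, b = line_through(case_a, case_b)
print(f"y = {a:.3f} + {b:.3f}x")
```

With two points there is no error term to worry about; the question of a "best" line only arises once there are more cases than the line can pass through.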


What if there are lots of data points?

[Figure: scatterplot of PROPVALU (vertical axis, 0 to 12000) against SIZE (horizontal axis, 0 to 5000)]


If there are more data points, how do we summarize the relationships in the data?


Solution: Least Squares Regression, The Best Linear Fit

[Figure: cases A, B, and C plotted with Independent Variable (2 to 9) on the horizontal axis and Dependent Variable (1 to 8) on the vertical axis]


Some Theory

• Knowing nothing else, the best estimate of a variable is its mean.
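
One way to read "best" here is in the least squares sense: among all single-number guesses, the mean gives the smallest sum of squared deviations. A small sketch of that idea, assuming the three Y values listed in the deck's data table (the function name `sum_sq_dev` and the list of candidate guesses are hypothetical choices for illustration):

```python
# Y values assumed from the deck's data table.
ys = [2.0, 7.0, 7.0]

def sum_sq_dev(values, guess):
    """Sum of squared deviations of the values from a single guessed number."""
    return sum((v - guess) ** 2 for v in values)

mean_y = sum(ys) / len(ys)

# Compare the mean against a few nearby guesses.
candidates = [mean_y - 1.0, mean_y - 0.5, mean_y, mean_y + 0.5, mean_y + 1.0]
for guess in candidates:
    print(f"guess = {guess:.3f}  sum of squared deviations = {sum_sq_dev(ys, guess):.3f}")
```

The smallest value in the printed list should belong to the mean itself, which is the baseline the regression line is later compared against.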

[Figure: cases A, B, and C with lines labeled Mean Y and Linear Trend; horizontal axis: Independent Variable (2 to 9); vertical axis: Dependent Variable (1 to 8)]


The Regression Model does better…

• Deviation from the mean of y = yi – ymean

[Figure: cases A, B, and C with lines labeled Mean Y and Linear Trend; horizontal axis: Independent Variable (2 to 9); vertical axis: Dependent Variable (1 to 8)]


A Regression equation…

• Measures the nature of the relationship between x and y using a linear model

• Measures the direction of the relationship

• An accompanying statistic, for the time being r, measures the strength of the relationship.


Understanding the Improvement: Measuring the Deviations from the Mean

[Figure: lines labeled Mean y and Linear Trend; horizontal axis: Independent Variable (2 to 9); vertical axis: Dependent Variable (1 to 8)]


More Terms

• Yi – the value of y for a particular case

• Y mean – the mean value of y

• Y hat – the predicted value of y, written as y with a ^ above it: ŷ

• (Yi – Ymean) = total deviation of Yi from Y mean

• (Yhat – Ymean) = explained deviation of Yi from Y mean

• (Yi – Yhat) = unexplained deviation of Yi from Y mean
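
The three deviations fit together: for every case, the total deviation is the sum of the explained and unexplained parts. A sketch of that decomposition, assuming the three (X, Y) pairs from the deck's data table and a least squares line fitted to them (the variable names are illustrative, not from the slides):

```python
# (X, Y) pairs assumed from the deck's data table.
xs = [2.5, 4.0, 8.0]
ys = [2.0, 7.0, 7.0]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Least squares slope and constant for y = a + b*x.
b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
a = y_mean - b * x_mean

rows = []
for x, y in zip(xs, ys):
    y_hat = a + b * x
    total = y - y_mean          # (Yi - Ymean)
    explained = y_hat - y_mean  # (Yhat - Ymean)
    unexplained = y - y_hat     # (Yi - Yhat)
    rows.append((total, explained, unexplained))
    print(f"x = {x}: total = {total:.3f}, explained = {explained:.3f}, "
          f"unexplained = {unexplained:.3f}")
```

Each printed row should satisfy total = explained + unexplained, and the unexplained deviations should sum to roughly zero across the cases.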


Bivariate Regression

• Relationships are modeled using the equation, y = a + bx + e

• Translation: The values of an interval-level dependent variable, y, can be “predicted” or “modeled” by adding a constant, a, to the product of a slope coefficient, b, and the value of the independent variable, x, plus an error term, e.


Estimating the Equation, y = a + bx + e

• The regression equation is calculated by finding the equation that minimizes the sum of the squared deviations between the data points, the y’s, and the predicted y’s, also called y hat.
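
To make "minimizes the sum of the squared deviations" concrete, the sketch below tries many candidate lines and keeps the one with the smallest sum. This brute-force grid search is only an illustration, not how regression software computes the estimates; the data are assumed from the deck's data table, and the search ranges and step size (0.01) are arbitrary choices for this example.

```python
# (X, Y) pairs assumed from the deck's data table.
xs = [2.5, 4.0, 8.0]
ys = [2.0, 7.0, 7.0]

def sse(a, b):
    """Sum of squared deviations between each y and its predicted value a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

best = None
for ai in range(0, 401):        # candidate constants a from 0.00 to 4.00
    for bi in range(0, 201):    # candidate slopes b from 0.00 to 2.00
        a, b = ai / 100, bi / 100
        err = sse(a, b)
        if best is None or err < best[0]:
            best = (err, a, b)

best_err, best_a, best_b = best
mean_y = sum(ys) / len(ys)

print(f"best line on the grid: y = {best_a:.2f} + {best_b:.2f}x")
print(f"sum of squared deviations for that line: {best_err:.3f}")
print(f"sum of squared deviations for the mean alone: {sse(mean_y, 0.0):.3f}")
```

The last two lines of output compare the best line found with the flat line at the mean of y (slope 0), which shows how much the regression line improves on the mean.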


$\bar{y}$ = mean of y

$\hat{y}$ = “y hat,” or predicted y

$y = a + bx + e$


Correlation Coefficient: r

• A measure of the strength of a linear relationship between two interval variables, x and y

• Ranges from –1 to +1

• The larger the absolute value of r (i.e., the closer to –1 or +1), the stronger the relationship between x and y


Correlation Coefficient calculation

• r = the covariance of x and y divided by the product of the standard deviation of x and the standard deviation of y

• Covariance is the sum, across cases, of the products of the deviations of x and y from their means, divided by N.
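
The verbal definition can be followed step by step in code. This is a sketch assuming the three (X, Y) pairs from the deck's data table, with the covariance and both standard deviations divided by N as described; the variable names are illustrative.

```python
import math

# (X, Y) pairs assumed from the deck's data table.
xs = [2.5, 4.0, 8.0]
ys = [2.0, 7.0, 7.0]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Covariance: sum of the products of the deviations, divided by N.
cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n

# Standard deviations, also dividing by N.
sd_x = math.sqrt(sum((x - x_mean) ** 2 for x in xs) / n)
sd_y = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / n)

r = cov_xy / (sd_x * sd_y)
print(f"covariance = {cov_xy:.3f}, sd_x = {sd_x:.3f}, sd_y = {sd_y:.3f}, r = {r:.3f}")
```

Dividing by N – 1 instead of N would change the covariance and the standard deviations but not r, because the divisor cancels as long as it is used consistently.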


Equations...

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

where r = correlation coefficient


Calculating a and b

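$$b = \frac{\sum XY - N\bar{X}\bar{Y}}{\sum X^2 - N\bar{X}^2}$$

$$a = \bar{Y} - b\bar{X} = \frac{\sum Y - b\sum X}{N}$$

$$R^2 = r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$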
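
As a worked example, the usual computational formulas, b = (ΣXY – N·X̄·Ȳ) / (ΣX² – N·X̄²) and a = Ȳ – b·X̄, can be applied directly, with R² taken as the explained sum of squares over the total sum of squares. This is a sketch assuming the three (X, Y) pairs from the deck's data table; the variable names are illustrative.

```python
# (X, Y) pairs assumed from the deck's data table.
xs = [2.5, 4.0, 8.0]
ys = [2.0, 7.0, 7.0]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of X*Y
sum_x2 = sum(x * x for x in xs)               # sum of X squared

# Slope and constant from the computational formulas.
b = (sum_xy - n * x_mean * y_mean) / (sum_x2 - n * x_mean ** 2)
a = y_mean - b * x_mean

# R squared: explained sum of squares over total sum of squares.
y_hats = [a + b * x for x in xs]
explained_ss = sum((y_hat - y_mean) ** 2 for y_hat in y_hats)
total_ss = sum((y - y_mean) ** 2 for y in ys)
r_squared = explained_ss / total_ss

print(f"b = {b:.4f}, a = {a:.4f}, R squared = {r_squared:.4f}")
```

For these three cases, the fitted line explains roughly half of the variation in Y around its mean.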


X     Y
2.5   2
4     7
8     7