Regression Regression: Mathematical method for determining the best equation that reproduces a data...


Regression

• Regression: Mathematical method for determining the best equation that reproduces a data set

• Linear Regression: Regression method applied with a linear model (straight line)

• Uses

– Prediction of new X,Y values

– Understanding data behavior

– Verification of hypotheses/physical laws


Regression

• The Linear Model

Y = mX + b

Y = Dependent variable

X = Independent variable

m = slope = ΔY/ΔX

b = y-intercept (the point where the line crosses the y-axis, at x = 0)

[Figure: straight line on axes X (0 to 25) and Y (0 to 12), with ΔY and ΔX marked between the labeled points X1=1, Y1=2.4 and X2=20, Y2=10]
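As a quick check of the slope definition, the two points labeled in the plot give ΔY/ΔX directly. This sketch assumes both labeled points lie on the plotted line; the function name `line_through` is hypothetical.

```python
def line_through(x1, y1, x2, y2):
    """Return (m, b) for the straight line through (x1, y1) and (x2, y2)."""
    m = (y2 - y1) / (x2 - x1)  # slope = ΔY/ΔX
    b = y1 - m * x1            # solve Y = mX + b for b at the first point
    return m, b


m, b = line_through(1, 2.4, 20, 10)
print(f"Y = {m:.2f}X + {b:.2f}")
```

With X1=1, Y1=2.4 and X2=20, Y2=10, ΔY = 7.6 and ΔX = 19, so m = 0.4 and b = 2.4 − 0.4 × 1 = 2.0.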


Regression

• Fitting the data: finding the equation for the straight line that does the best job of reproducing the data.

[Figure: Average Income versus % with a College Degree (by State). X axis: Percentage of Population with College Degree or Higher; Y axis: Average Income Level ($ per year)]


Regression

• Residual: Difference between measured and calculated Y-values

[Figure: Average Income versus % with a College Degree (by State), close-up for 15% to 20%. X axis: Percentage of Population with College Degree or Higher; Y axis: Average Income Level ($ per year)]
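A minimal sketch of the residual as defined above: the measured y minus the y calculated from the line at the same x. The line and the measurements are hypothetical, chosen only for illustration. A positive residual means the data point lies above the line; a negative residual means it lies below.

```python
def residuals(xs, ys, m, b):
    """Residual at each point: measured y minus calculated y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical line Y = 2X + 1 and three hypothetical measured points.
xs = [1, 2, 3]
ys = [3.2, 4.9, 7.1]

for x, r in zip(xs, residuals(xs, ys, m=2.0, b=1.0)):
    print(f"x = {x}: residual = {r:+.2f}")
```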


Regression Analysis

• Use the least-squares method to “best fit” a straight line through the data points.

• A straight line is described by its slope and “y”-intercept in an x-y plot.

• Need to determine the numerical values of the slope and the “y”-intercept from the data.

• This is equivalent to adding a linear trendline to your scatter plot in Excel.


Regression Analysis

• The least-squares method first defines the residual: the vertical difference between a data point and the regression line at the measured “x” value.

• Then add up the squared residuals for all data points.

• Finally, adjust the slope and the “y”-intercept of the regression line so that this sum of squared residuals, called the regression error, is as small as possible.
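To make the procedure concrete, the sketch below computes the regression error for a candidate line and then tries every slope and intercept on a coarse grid, keeping the pair with the smallest error. The data points, grid limits, step size, and function names are hypothetical. Brute-force search is used here only to illustrate "adjusting" the line; the minimum can instead be found in closed form from the covariance and variance.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Regression error: sum of squared residuals (measured y - calculated y)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def grid_search_line(xs, ys, lo=0.0, hi=3.0, step=0.01):
    """Try every (m, b) on a grid and keep the pair with the smallest error."""
    n = int(round((hi - lo) / step))
    best = None  # (error, m, b)
    for i in range(n + 1):
        m = lo + i * step
        for j in range(n + 1):
            b = lo + j * step
            err = sum_squared_residuals(xs, ys, m, b)
            if best is None or err < best[0]:
                best = (err, m, b)
    return best


# Hypothetical measurements for illustration only.
xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]

err, m, b = grid_search_line(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}, regression error = {err:.3f}")
```

For these points the grid minimum (m = 1.94, b = 1.15) coincides with the exact least-squares line only because that line happens to fall on the 0.01 grid; in general a grid search just approximates the minimum.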


Regression Analysis

• The covariance appears in the calculation of the correlation coefficient between the measurements of two variables.

• Let us denote the two variables as “x” and “y”.

• Their measurements are the “x” data set and the “y” data set.
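A minimal sketch of the covariance of two data sets and of the correlation coefficient r = cov(x, y)/(s_x·s_y) built from it. The data sets and function names are hypothetical. The sketch uses the sample (n − 1) convention; that factor cancels in r, so the choice does not change the correlation coefficient.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equal-length data sets (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Correlation coefficient r = cov(x, y) / (s_x * s_y); var(x) = cov(x, x)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical "x" and "y" data sets.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 2.8, 3.6, 4.5, 5.1]

print(f"cov(x, y) = {covariance(xs, ys):.3f}")
print(f"r = {correlation(xs, ys):.4f}")
```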


Regression Analysis

• The slope of the regression line is the ratio of the covariance of the “x” and “y” data sets to the variance of the “x” data set.

• Then use the equation of the line to determine the “y”-intercept: b = ȳ − m·x̄, where x̄ and ȳ are the means of the “x” and “y” data sets. You MUST use the mean of x and the mean of y here, because the least-squares line always passes through the point (x̄, ȳ), while individual data points are generally not on the regression line.
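The two steps on this slide can be sketched as follows: the slope is the covariance of x and y divided by the variance of x, and the intercept comes from the means. The function name and the data are hypothetical. Because the 1/(n − 1) factors in the covariance and the variance cancel, the sketch works with the raw sums.

```python
def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line Y = mX + b."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # (n-1) * cov(x, y)
    sxx = sum((x - mx) ** 2 for x in xs)                     # (n-1) * var(x)
    m = sxy / sxx
    b = my - m * mx  # the fitted line passes through (mean of x, mean of y)
    return m, b


# Hypothetical data sets.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 2.8, 3.6, 4.5, 5.1]

m, b = least_squares_line(xs, ys)
print(f"Y = {m:.2f}X + {b:.2f}")
```

For these points the covariance is 1.875 and the variance of x is 2.5, giving m = 0.75 and b = 3.64 − 0.75 × 3 = 1.39.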


Regression Analysis

• Once we have determined the slope and the “y”-intercept of the regression line, we have a mathematical relation that ties the “x” variable to the “y” variable.

• We can use this relation to predict the value of “y” for an “x” value that is not in the data set.


Regression Analysis

• Interpolation – the process by which we use the regression line to predict a value of the “y” variable for a value of the “x” variable that is not one of the data points but is within the range of the data set.

• The predicted (“x”, “y”) point lies on the regression line.


Regression Analysis

• Extrapolation – the process by which we use the regression line to predict a value of the “y” variable for a value of the “x” variable that is outside of the range of the data set.

• The predicted point also lies on the regression line, but outside the range of the data set.
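A minimal sketch of using a fitted line for prediction and labeling each prediction as interpolation or extrapolation by comparing the new “x” value with the range of the measured “x” values. The slope, intercept, data, and function names are hypothetical.

```python
def predict(m, b, x):
    """Point on the regression line Y = mX + b at the given x."""
    return m * x + b


def prediction_kind(x, xs):
    """'interpolation' if x lies within the measured x range, else 'extrapolation'."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"


# Hypothetical measured x values and fitted line.
xs = [1, 2, 3, 4, 5]
m, b = 0.75, 1.39

for x_new in (2.5, 8.0):
    y_new = predict(m, b, x_new)
    print(f"x = {x_new}: y = {y_new:.3f} ({prediction_kind(x_new, xs)})")
```

Both predicted points lie on the same line; the label only records whether the line is used inside or outside the range where it was fitted. An extrapolated value is only as reliable as the assumption that the linear relation continues beyond the data.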


Tricks of the Trade

• A curve can be partitioned into sections, with a different curve “best” fitted to each section.

• Use scaling as a means to increase the accuracy of the “fitted” curve.
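One way to apply the first trick is a piecewise fit: split the data at a chosen breakpoint and fit a separate least-squares line to each section. The sketch assumes straight-line sections and a breakpoint picked by hand; the data, the breakpoint, and the function names are hypothetical. Each section needs at least two distinct x values.

```python
def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line Y = mX + b."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    m = sxy / sxx
    return m, my - m * mx


def piecewise_fit(xs, ys, breakpoint):
    """Fit one line to points with x <= breakpoint and another to x >= breakpoint.

    A point exactly at the breakpoint is shared by both sections.
    """
    left = [(x, y) for x, y in zip(xs, ys) if x <= breakpoint]
    right = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint]
    return [least_squares_line(*zip(*section)) for section in (left, right)]


# Hypothetical data whose trend changes slope at x = 3.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0, 1, 2, 3, 5, 7, 9]

for m, b in piecewise_fit(xs, ys, breakpoint=3):
    print(f"slope = {m:.2f}, intercept = {b:.2f}")
```

Here the two fitted lines happen to meet at x = 3, but separately fitted sections need not join at the breakpoint in general.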


Multivariate Analysis: Regression

• Prediction: Once the best-fit line has been determined, the equation can be used to predict new values of Y for any given X, and vice versa (interpolation/extrapolation).

y = 772.03x + 10810

If a state's percentage of the population with a college degree is 20%, it can expect an average income level of

y = 772.03(20) + 10810 = 26,250.6 ≈ $26,251

If a state's average income level is $30,000, what percentage of its population has a college degree?

x = (30,000 − 10810)/772.03 ≈ 24.9%
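The two calculations above can be reproduced directly from the fitted equation y = 772.03x + 10810; the function names are hypothetical. The second calculation simply inverts the same line, as the slide does. Regressing x on y would generally give a somewhat different line unless the data are perfectly correlated, so this inverse rests on the fitted y-on-x relation.

```python
SLOPE = 772.03      # dollars per percentage point, from the fitted line
INTERCEPT = 10810   # dollars


def income_from_degree_pct(pct):
    """Predicted average income ($/year) for a given % with a college degree."""
    return SLOPE * pct + INTERCEPT


def degree_pct_from_income(income):
    """Invert y = mx + b to get x = (y - b) / m."""
    return (income - INTERCEPT) / SLOPE


print(f"${income_from_degree_pct(20):,.1f}")      # $26,250.6
print(f"{degree_pct_from_income(30_000):.1f}%")   # 24.9%
```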


Multivariate Analysis

• Excel Functions and Tools

– SLOPE() - Returns the slope when passed X, Y data.

– INTERCEPT() - Returns the intercept when passed X, Y data.

– LINEST() - Returns the slope and intercept when passed X, Y data.

– TREND() - Returns predicted values along a linear trend when passed X, Y data.

– Trendline (from the Chart menu) - Returns the trendline, its equation, and the R² value (the square of the correlation coefficient) for a set of X,Y data.