Statistics and Data Analysis

Part 17: Regression Residuals. Statistics and Data Analysis. Professor William Greene, Stern School of Business, IOMS Department, Department of Economics.


Page 1: Statistics and Data Analysis

Statistics and Data Analysis

Professor William Greene, Stern School of Business

IOMS Department, Department of Economics

Page 2: Statistics and Data Analysis

Statistics and Data Analysis

Part 17 – The Linear Regression Model

Page 3: Statistics and Data Analysis

Regression Modeling

Theory behind the regression model
Computing the regression statistics
Interpreting the results
Application: Statistical Cost Analysis

Page 4: Statistics and Data Analysis

A Linear Regression

Predictor: Box Office = -14.36 + 72.72 Buzz

Page 5: Statistics and Data Analysis

Data and Relationship

We suggested the relationship between box office sales and internet buzz is Box Office = -14.36 + 72.72 Buzz

Box Office is not exactly equal to -14.36 + 72.72 × Buzz. How do we reconcile the equation with the data?
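One way to see the gap between the fitted line and the data is to compute a fitted value and compare it with an observed outcome. The sketch below uses the coefficients quoted on the slide; the function names and the example movie (Buzz = 0.5, Box Office = 30) are hypothetical and are not taken from the data set.

```python
# Fitted line quoted on the slide: Box Office = -14.36 + 72.72 Buzz
A = -14.36
B = 72.72

def predict_box_office(buzz):
    """Return the fitted value a + b * buzz."""
    return A + B * buzz

def residual(box_office, buzz):
    """Observed outcome minus the fitted value."""
    return box_office - predict_box_office(buzz)

# Hypothetical movie, invented for illustration only.
example_buzz = 0.5
example_box_office = 30.0
print(predict_box_office(example_buzz))            # the part the line explains
print(residual(example_box_office, example_buzz))  # the part the line misses
```

The residual is the piece of the observed outcome that the line leaves unexplained, which is what the model on the next slides treats as noise.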

Page 6: Statistics and Data Analysis

Modeling the Underlying Process

A model that explains the process that produces the data that we observe:
Observed outcome = the sum of two parts
(1) Explained: The regression line
(2) Unexplained (noise): The remainder.

Internet Buzz is not the only thing that explains Box Office, but it is the only variable in the equation.

Regression model
The “model” is the statement that part (1) is the same process from one observation to the next.

Page 7: Statistics and Data Analysis

The Population Regression

THE model:
(1) Explained: Explained Box Office = α + β Buzz
(2) Unexplained: The rest is “noise,” ε. Random ε has certain characteristics.

Model statement
Box Office = α + β Buzz + ε
Box Office is related to Buzz, but is not exactly equal to α + β Buzz.
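The model statement can be illustrated by generating data from it. This is only a sketch: the values of α, β, and σ below are made up (the true ones are unknown), the function name `simulate` is hypothetical, and drawing ε from a normal distribution is an extra assumption that the slides do not make.

```python
import random

def simulate(alpha, beta, sigma, x_values, seed=0):
    """Generate y = alpha + beta * x + epsilon, with epsilon ~ Normal(0, sigma).

    Normality is an assumption of this sketch, not of the slides.
    """
    rng = random.Random(seed)
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in x_values]

# Assumed parameter values, for illustration only.
ALPHA, BETA, SIGMA = -14.0, 72.0, 13.0
x = [0.2, 0.4, 0.6, 0.8]
y = simulate(ALPHA, BETA, SIGMA, x)

for xi, yi in zip(x, y):
    explained = ALPHA + BETA * xi
    # Columns: x, explained part, noise
    print(xi, round(explained, 2), round(yi - explained, 2))
```

Each simulated outcome splits into the explained part, which follows the same rule for every observation, and a noise term that differs from one observation to the next.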

Page 8: Statistics and Data Analysis

The Data Include the Noise

Page 9: Statistics and Data Analysis

What explains the noise? What explains the variation in fuel bills?

[Figure: Scatterplot of FUELBILL vs ROOMS. Horizontal axis: ROOMS (2 to 11); vertical axis: FUELBILL (200 to 1400).]

Page 10: Statistics and Data Analysis

Noisy Data?

What explains the variation in milk production other than number of cows?

Page 11: Statistics and Data Analysis

Assumptions

(Regression) The equation linking “Box Office” and “Buzz” is stable

E[Box Office | Buzz] = α + β Buzz

Another sample of movies, say 2012, would obey the same fundamental relationship.

Page 12: Statistics and Data Analysis

Model Assumptions

yi = α + β xi + εi
α + β xi is the “regression function.”
εi is the “disturbance.” It is the unobserved random component.

The Disturbance is Random Noise
Mean zero. The regression is the mean of yi. εi is the deviation from the regression.
Variance σ².
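Written compactly, the assumptions on this slide amount to the following. The conditional notation follows the earlier statement E[Box Office | Buzz] = α + β Buzz; reading "Variance σ²" as a variance that is the same for every observation is an interpretation, not something the slide spells out.

```latex
\begin{aligned}
y_i &= \alpha + \beta x_i + \varepsilon_i \\
E[\varepsilon_i \mid x_i] &= 0
  \quad\Longrightarrow\quad E[y_i \mid x_i] = \alpha + \beta x_i \\
\operatorname{Var}[\varepsilon_i \mid x_i] &= E[\varepsilon_i^2 \mid x_i] = \sigma^2
\end{aligned}
```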

Page 13: Statistics and Data Analysis

We will use the data to estimate α and β

Sample: a + b Buzz

Page 14: Statistics and Data Analysis

We also want to estimate σ = √E[εi²]

Sample: a + b Buzz
e = y - a - b Buzz

Page 15: Statistics and Data Analysis

Standard Deviation of the Residuals

Standard deviation of εi = yi - α - βxi is σ
σ = √E[εi²] (Mean of εi is zero)
Sample a and b estimate α and β
Residual ei = yi - a - bxi estimates εi
Use √(Σei² / (N-2)) to estimate σ.

$$s_e = \sqrt{\frac{\sum_{i=1}^{N} e_i^2}{N-2}} = \sqrt{\frac{\sum_{i=1}^{N} (y_i - a - b x_i)^2}{N-2}}$$

Why N-2? Relates to the fact that two parameters (α,β) were estimated. Same reason N-1 was used to compute a sample variance.
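The estimator of σ can be sketched in a few lines. The data below are invented for illustration and the function names are hypothetical. The slope and intercept are computed with the usual least-squares formulas.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def residual_std(x, y):
    """s_e = sqrt(sum of squared residuals / (N - 2))."""
    a, b = fit_line(x, y)
    residuals = [yi - a - b * xi for xi, yi in zip(x, y)]
    return math.sqrt(sum(e * e for e in residuals) / (len(x) - 2))

# Made-up data, not from the box office sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(fit_line(x, y))
print(residual_std(x, y))
```

Dividing by N - 2 rather than N matters most in small samples like this one; with N = 62 the two versions differ only slightly.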

Page 16: Statistics and Data Analysis

Residuals

Page 17: Statistics and Data Analysis

Summary: Regression Computations

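The same 5 statistics (with N) are still needed: N = 62 complete observations.

$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i = 20.721$$

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 0.48242$$

$$\operatorname{Var}(x) = s_x^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 = 0.02453$$

$$\operatorname{Var}(y) = s_y^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y})^2 = 305.985$$

$$\operatorname{Cov}(x,y) = s_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = 1.784$$

$$b = \frac{s_{xy}}{s_x^2} = 72.72$$

$$a = \bar{y} - b\bar{x} = -14.36$$

$$s_e = \sqrt{\frac{(N-1)\,(s_y^2 - b^2 s_x^2)}{N-2}} = 13.386$$

(for later...)

$$R^2 = \frac{b^2 s_x^2}{s_y^2} = 0.424$$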
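The regression results on this slide follow from the five summary statistics alone. A short sketch that recomputes them, using the rounded values reported on the slide as inputs (the function name is hypothetical):

```python
import math

def regression_from_summary(n, x_bar, y_bar, var_x, var_y, cov_xy):
    """Slope, intercept, residual std. deviation, and R^2 from summary statistics."""
    b = cov_xy / var_x
    a = y_bar - b * x_bar
    s_e = math.sqrt((n - 1) * (var_y - b ** 2 * var_x) / (n - 2))
    r_squared = b ** 2 * var_x / var_y
    return a, b, s_e, r_squared

# Inputs are the rounded statistics from the slide (N = 62).
a, b, s_e, r_squared = regression_from_summary(
    n=62, x_bar=0.48242, y_bar=20.721,
    var_x=0.02453, var_y=305.985, cov_xy=1.784,
)
print(round(a, 2), round(b, 2), round(s_e, 3), round(r_squared, 3))
```

Because the inputs are rounded, the recomputed values agree with the slide only to about two decimal places; the slide's figures were presumably computed from unrounded statistics.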

Page 18: Statistics and Data Analysis

Using se to identify outliers

Remember the empirical rule: about 95% of observations will lie within mean ± 2 standard deviations. (We show (a + bx) ± 2se below.)

This point is 2.2 standard deviations from the regression. Only 3.2% of the 62 observations (2 of them) lie outside the bounds. (We will refine this later.)
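The ± 2se screen can be sketched as a small function. The coefficients and se are the ones reported in these slides; the three data points are hypothetical, chosen so that one of them falls outside the band, and `flag_outliers` is an illustrative name.

```python
def flag_outliers(x, y, a, b, s_e, k=2.0):
    """Return (index, standardized residual) for points outside (a + b x) ± k * s_e."""
    flagged = []
    for i, (xi, yi) in enumerate(zip(x, y)):
        z = (yi - a - b * xi) / s_e  # residual measured in units of s_e
        if abs(z) > k:
            flagged.append((i, z))
    return flagged

# Estimates reported in the slides.
A, B, S_E = -14.36, 72.72, 13.386

# Hypothetical observations, not from the data set.
buzz = [0.3, 0.5, 0.7]
box_office = [10.0, 52.0, 35.0]
print(flag_outliers(buzz, box_office, A, B, S_E))
```

A flagged point is only unusual relative to the line; whether it is an error or an informative observation is a separate judgment.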

Page 19: Statistics and Data Analysis

Page 20: Statistics and Data Analysis

Linear Regression

Sample Regression Line

Page 21: Statistics and Data Analysis

Page 22: Statistics and Data Analysis

Page 23: Statistics and Data Analysis

Results to Report

Page 24: Statistics and Data Analysis

The Reported Results

Page 25: Statistics and Data Analysis

Estimated equation

Page 26: Statistics and Data Analysis

Estimated coefficients a and b

Page 27: Statistics and Data Analysis

S = se = estimated std. deviation of ε

Page 28: Statistics and Data Analysis

Square of the sample correlation between x and y

Page 29: Statistics and Data Analysis

N-2 = degrees of freedom
N-1 = sample size minus 1

Page 30: Statistics and Data Analysis

Sum of squared residuals, Σi ei²

Page 31: Statistics and Data Analysis

S² = se²

Page 32: Statistics and Data Analysis

Total Variation

$$\text{Total Variation} = \sum_{i=1}^{N} (y_i - \bar{y})^2$$

Page 33: Statistics and Data Analysis

Coefficient of Determination

$$R^2 = \frac{b^2 \sum_{i=1}^{N} (x_i - \bar{x})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = \frac{\text{Regression SS}}{\text{Total SS}}$$
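The coefficient of determination can be computed either as Regression SS / Total SS or as the square of the sample correlation between x and y, and the two agree. A minimal check on invented data (the function name and the numbers are hypothetical):

```python
def r_squared_two_ways(x, y):
    """Return R^2 as (Regression SS / Total SS, squared sample correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)   # Total SS
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = s_xy / s_xx
    regression_ss = b ** 2 * s_xx
    from_sums_of_squares = regression_ss / s_yy
    from_correlation = s_xy ** 2 / (s_xx * s_yy)
    return from_sums_of_squares, from_correlation

# Made-up data for illustration.
print(r_squared_two_ways([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0]))
```

The agreement is algebraic: substituting b = s_xy / s_xx into b² s_xx / s_yy gives s_xy² / (s_xx s_yy), which is the squared correlation.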

Page 34: Statistics and Data Analysis

The Model

Constructed to provide a framework for interpreting the observed data
What is the meaning of the observed relationship (assuming there is one)?

How it’s used
Prediction: What reason is there to assume that we can use sample observations to predict outcomes?
Testing relationships

Page 35: Statistics and Data Analysis

A Cost Model

Electricity.mpj
Total cost in $Million
Output in Million KWH
N = 123 American electric utilities
Model: Cost = α + β KWH + ε

Page 36: Statistics and Data Analysis

Cost Relationship

[Figure: Scatterplot of Cost vs Output. Horizontal axis: Output (0 to 80000); vertical axis: Cost (0 to 500).]

Page 37: Statistics and Data Analysis

Sample Regression

Page 38: Statistics and Data Analysis

Interpreting the Model

Cost = 2.44 + 0.00529 Output + e
Cost is $Million, Output is Million KWH.

Fixed Cost = Cost when output = 0
Fixed Cost = $2.44 Million

Marginal cost = Change in cost / change in output
= 0.00529 × $Million / Million KWH
= 0.00529 $/KWH
= 0.529 cents/KWH.
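The interpretation of the estimated cost equation can be sketched as code. The coefficients are the ones on the slide; the function names and the output level of 10,000 million KWH are hypothetical, chosen only to show the calculation.

```python
FIXED_COST = 2.44        # $ million: intercept, the cost when output = 0
MARGINAL_COST = 0.00529  # $ million per million KWH: slope

def predicted_cost(output_million_kwh):
    """Fitted cost in $ million for a given output in million KWH."""
    return FIXED_COST + MARGINAL_COST * output_million_kwh

def marginal_cost_cents_per_kwh():
    """($ million / million KWH) equals $/KWH; multiply by 100 for cents."""
    return MARGINAL_COST * 100.0

# Hypothetical output level, for illustration only.
print(predicted_cost(10_000))
print(marginal_cost_cents_per_kwh())
```

The millions cancel in the slope, which is why the marginal cost can be read directly in dollars per KWH.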

Page 39: Statistics and Data Analysis

Summary

Linear regression model
  Assumptions of the model
  Residuals and disturbances

Estimating the parameters of the model
  Regression parameters
  Disturbance standard deviation

Computation of the estimated model