Multiple regression models

Source: amath.colorado.edu/sites/default/files/2014/04/1009752375/Week11.pdf

Multiple Regression Analysis

Definition: The multiple regression model equation is

Y = β0 + β1x1 + β2x2 + ... + βkxk + ε

where E(ε) = 0 and V(ε) = σ².

In addition, for purposes of testing hypotheses and calculating CIs or PIs, it is assumed that ε is normally distributed.

This is no longer a regression line, but a regression surface.

Let x1*, ..., xk* be particular values of x1, ..., xk. Then

E(Y | x1*, ..., xk*) = β0 + β1x1* + ... + βkxk*

Thus, just as β0 + β1x describes the mean Y value as a function of x in simple linear regression, the true (or population) regression function β0 + β1x1 + ... + βkxk gives the expected value of Y as a function of x1, ..., xk.

The βi's are the true (or population) regression coefficients.

The regression coefficient β1 is interpreted as the expected change in Y associated with a 1-unit increase in x1 while x2, ..., xk are held fixed.

Analogous interpretations hold for β2, ..., βk.

These coefficients are called partial or adjusted regression coefficients.


Estimating Parameters

The data in simple linear regression consists of n pairs (x1, y1), ..., (xn, yn). Suppose that a multiple regression model contains two predictor variables, x1 and x2.

Then the data set will consist of n triples (x11, x21, y1), (x12, x22, y2), ..., (x1n, x2n, yn). Here the first subscript on x refers to the predictor and the second to the observation number.

More generally, with k predictors, the data consists of n (k + 1)-tuples (x11, x21, ..., xk1, y1), (x12, x22, ..., xk2, y2), ..., (x1n, x2n, ..., xkn, yn), where xij is the value of the ith predictor xi associated with the observed value yj.

The observations are assumed to have been obtained independently of one another according to the model

Yj = β0 + β1x1j + β2x2j + ... + βkxkj + εj

To estimate the parameters β0, β1, ..., βk using the principle of least squares, form the sum of squared deviations:

f(β0, β1, ..., βk) = Σj [yj – (β0 + β1x1j + β2x2j + ... + βkxkj)]²

The least squares estimates are those values of the βi's that minimize f(β0, ..., βk).

Taking the partial derivative of f with respect to each βi (i = 0, 1, ..., k) and equating all partials to zero yields the following system of normal equations:

β0 n + β1 Σx1j + β2 Σx2j + ... + βk Σxkj = Σyj

and, for each i = 1, ..., k,

β0 Σxij + β1 Σx1j xij + β2 Σx2j xij + ... + βk Σxkj xij = Σxij yj

These equations are linear in the unknowns β0, β1, ..., βk. Solving yields the least squares estimates β̂0, β̂1, ..., β̂k.

This is best done by utilizing a statistical software package.
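As a concrete illustration of what the software does, here is a minimal Python sketch (numpy, with made-up data) that solves the least squares problem for a two-predictor model; the variable names and numbers are purely hypothetical.

import numpy as np

# Hypothetical data with two predictors; X is the design matrix with a
# leading column of ones for the intercept.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.5, 4.0, 3.5, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x1), x1, x2])

# Solving the normal equations (X'X) b = X'y is equivalent to minimizing
# f(beta_0, ..., beta_k); lstsq does this numerically and stably.
beta_hat, sse, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # estimated coefficients beta_0_hat, beta_1_hat, beta_2_hat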

Example – shear strength

An experiment was carried out to assess the impact of the variables

• x1 = force (gm),
• x2 = power (mW),
• x3 = temperature (°C), and
• x4 = time (msec)

on y = bond shear strength (gm).

A computer package gave the following least squares estimates:

β̂0 = –37.48, β̂1 = .2117, β̂2 = .4983, β̂3 = .1297, β̂4 = .2583

.1297 gm is the estimated average change in strength associated with a 1-degree increase in temperature when the other three predictors are held fixed. (The other estimated coefficients are interpreted in a similar manner.)

How do you write the estimated regression equation?

A point prediction of strength resulting from a force of 35 gm, power of 75 mW, temperature of 200 °C, and time of 20 msec is…?

Fitted values

The predicted value ŷ1 results from substituting the values of the various predictors from the first observation into the estimated regression function:

ŷ1 = β̂0 + β̂1x11 + β̂2x21 + ... + β̂kxk1

The remaining predicted values ŷ2, ..., ŷn come from substituting values of the predictors from the 2nd, 3rd, ..., and finally nth observations into the estimated function.

For example, the values of the 4 predictors for the last observation in the previous example are x1,30 = 35, x2,30 = 75, x3,30 = 200, x4,30 = 20, so

ŷ30 = –37.48 + .2117(35) + .4983(75) + .1297(200) + .2583(20) = 38.41

The residuals y1 – ŷ1, ..., yn – ŷn are the differences between the observed and predicted values.
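The arithmetic above is easy to reproduce directly from the reported coefficients; the short Python sketch below simply redoes that computation (the coefficient values are those quoted in the example, everything else is illustrative).

import numpy as np

# Coefficients reported in the shear strength example (beta_0_hat ... beta_4_hat)
beta_hat = np.array([-37.48, 0.2117, 0.4983, 0.1297, 0.2583])

# Predictor values for the last observation: force, power, temperature, time
x_new = np.array([1.0, 35.0, 75.0, 200.0, 20.0])   # leading 1 for the intercept

y_hat = beta_hat @ x_new
print(round(y_hat, 2))   # 38.41

# If the observed strength for this run were y_obs, the residual would be
# y_obs - y_hat.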

Models with Interaction and Quadratic Predictors

Example: an investigator has obtained observations on y, x1, and x2. One possible model is:

Y = β0 + β1x1 + β2x2 + ε.

However, other models can be constructed by forming predictors that are mathematical functions of x1 and/or x2.

For example, with x3 = x1² and x4 = x1x2, the model

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

also has the general MR form.

Sometimes, mathematical functions of the predictors may result in a more successful model for explaining variation in y. This is still "linear regression", even though the relationship between the outcome and the predictors may not be linear.

For example, the model

Y = β0 + β1x + β2x² + ε

is still MLR with k = 2, x1 = x, and x2 = x².

Example – travel time

The article "Estimating Urban Travel Times: A Comparative Study" (Trans. Res., 1980: 173–175) described a study relating the dependent variable y = travel time between locations in a certain city and the independent variable x2 = distance between locations.

Two types of vehicles, passenger cars and trucks, were used in the study.

Let x1 = 1 if the vehicle is a truck and x1 = 0 if it is a passenger car.

One possible multiple regression model is

Y = β0 + β1x1 + β2x2 + ε

The mean value of travel time depends on whether a vehicle is a car or a truck:

mean time = β0 + β2x2 when x1 = 0 (cars)

mean time = β0 + β1 + β2x2 when x1 = 1 (trucks)

The coefficient β1 is the difference in mean times between trucks and cars with distance held fixed; if β1 > 0, then on average a truck takes β1 longer than a car to traverse any particular distance.

A second possibility is a model with an interaction predictor:

Y = β0 + β1x1 + β2x2 + β3x1x2 + ε

Now the mean times for the two types of vehicles are…? Does it make sense to have different intercepts for cars and trucks?

For each model, the graph of mean time versus distance is a straight line for either type of vehicle, as illustrated in the figure.

[Figure: regression functions for models with one dummy variable (x1) and one quantitative variable (x2); (a) no interaction, (b) interaction]

Models with Predictors for Categorical Variables

You might think that the way to handle a three-category situation is to define a single numerical variable with coded values such as 0, 1, and 2 corresponding to the three categories.

This is generally incorrect, because it imposes an ordering on the categories that may not exist in reality. It can be acceptable for genuinely ordered categories such as education level (HS = 1, BS = 2, Grad = 3), but not for an unordered variable such as ethnicity.

The correct approach to incorporating three unordered categories is to define two different indicator variables.

Suppose, for example, that y is the lifetime of a certain tool and that there are 3 brands of tool being investigated.

Let:
x1 = 1 if a brand A tool is used, and 0 otherwise,
x2 = 1 if a brand B tool is used, and 0 otherwise,
x3 = 1 if a brand C tool is used, and 0 otherwise.

Then, if an observation is on a
brand A tool: x1 = 1, x2 = 0, and x3 = 0,
brand B tool: x1 = 0, x2 = 1, and x3 = 0,
brand C tool: x1 = 0, x2 = 0, and x3 = 1.

Using regression to do mean testing

If we wish to model the average lifetime of the tool as a function of brand (A, B, and C), the model to use is:

Y = β0 + β2x2 + β3x3 + ε

We only need 2 out of the 3 indicator variables - WHY?

Mean lifetimes for brands A, B, and C are written as?

Question: How do you test whether the average lifetime is the same for brand A and brand B?
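A minimal sketch of this indicator coding in Python (numpy), with made-up lifetimes; the brand labels and numbers are purely illustrative and are not from the slides.

import numpy as np

# Hypothetical lifetimes (hours) for tools of brands A, B, and C
brand = np.array(["A", "A", "B", "B", "C", "C"])
y     = np.array([21.0, 23.0, 18.0, 17.0, 25.0, 27.0])

# Two indicator predictors; brand A is the reference category
x2 = (brand == "B").astype(float)
x3 = (brand == "C").astype(float)
X  = np.column_stack([np.ones(len(y)), x2, x3])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_hat[0]               -> estimated mean lifetime for brand A
# beta_hat[0] + beta_hat[1] -> estimated mean lifetime for brand B
# beta_hat[0] + beta_hat[2] -> estimated mean lifetime for brand C
print(beta_hat)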

R² and σ̂²

The closer the residuals are to 0, the better the job our estimated regression function is doing in making predictions corresponding to observations in the sample.

The error or residual sum of squares is SSE = Σ(yi – ŷi)². It is again interpreted as a measure of how much variation in the observed y values is not explained by (not attributed to) the model relationship.

The number of df associated with SSE is n – (k + 1), because k + 1 df are lost in estimating the k + 1 β coefficients.

The total sum of squares, a measure of total variation in the observed y values, is SST = Σ(yi – ȳ)².

The regression sum of squares SSR = Σ(ŷi – ȳ)² = SST – SSE is a measure of explained variation.

Then the coefficient of multiple determination R² is

R² = 1 – SSE/SST = SSR/SST

The positive square root of R² is called the multiple correlation coefficient and is denoted by R.
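These sums of squares are simple to compute once fitted values are available; the following Python sketch (with made-up numbers) shows the relationships among SSE, SST, SSR, and R².

import numpy as np

# Illustrative only: y and y_hat would come from a fitted model such as
# the one sketched earlier; the numbers here are made up.
y     = np.array([3.1, 3.9, 7.2, 7.8, 10.1])
y_hat = np.array([3.0, 4.2, 7.0, 8.1,  9.8])

sse = np.sum((y - y_hat) ** 2)            # unexplained variation
sst = np.sum((y - y.mean()) ** 2)         # total variation
ssr = sst - sse                           # explained variation

r2 = 1.0 - sse / sst                      # coefficient of multiple determination
print(sse, sst, ssr, r2)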

Because there is no preliminary picture of multiple regression data analogous to a scatter plot for bivariate data, the coefficient of multiple determination is our first indication of whether the chosen model is successful in explaining y variation.

Unfortunately, there is a problem with R²: its value can be inflated by adding lots of predictors into the model even if most of these predictors are rather frivolous.

There is no such thing as "negative information" from the predictors – they can either help or do nothing with respect to prediction.

Example: suppose y is the sale price of a house. Then sensible predictors include
x1 = the interior size of the house,
x2 = the size of the lot on which the house sits,
x3 = the number of bedrooms,
x4 = the number of bathrooms, and
x5 = the house's age.

Now suppose we add in
x6 = the diameter of the doorknob on the coat closet,
x7 = the thickness of the cutting board in the kitchen,
x8 = the thickness of the patio slab.

Unless we are very unlucky in our choice of predictors, using n – 1 predictors (one fewer than the sample size) will yield an R² of almost 1.

The objective in multiple regression is not simply to explain most of the observed y variation, but to do so using a model with relatively few predictors that are easily interpreted.

It is thus desirable to adjust R² to take account of the size of the model:

adjusted R² = 1 – [(n – 1)/(n – (k + 1))] · SSE/SST

Because the ratio in front of SSE/SST exceeds 1, adjusted R² is smaller than R². Furthermore, the larger the number of predictors k relative to the sample size n, the smaller adjusted R² will be relative to R².

Adjusted R² can even be negative, whereas R² itself must be between 0 and 1. A value of adjusted R² that is substantially smaller than R² itself is a warning that the model may contain too many predictors.

It can be shown that R is the sample correlation coefficient calculated from the (ŷi, yi) pairs.

SSE is also the basis for estimating the remaining model parameter:

σ̂² = s² = SSE / (n – (k + 1))

Example

Investigators carried out a study to see how various characteristics of concrete are influenced by x1 = % limestone powder and x2 = water-cement ratio, resulting in the accompanying data ("Durability of Concrete with Addition of Limestone Powder," Magazine of Concrete Research, 1996: 131–137).

Consider compressive strength as the dependent variable y.

Using computer software, fitting the first-order model results in:

ŷ = 84.82 + .1643x1 – 79.67x2,   SSE = 72.52 (df = 6), R² = .741, adjusted R² = .654

If we include an interaction predictor, we get:

ŷ = 6.22 + 5.779x1 + 51.33x2 – 9.357x1x2,   SSE = 29.35 (df = 5), R² = .895, adjusted R² = .831


Model selection

Three questions:

• Model utility: is the set of predictors, taken together, significantly related to our y? (Is our model worthless?)

• Does any particular predictor matter?

• Which predictor subset matters the most? (Which among all possible models is the "best"?)

SSE

SSE is the left-over variation in the observations after fitting the regression model. The better the model fit, the lower SSE will be.

SSE is the basis for estimating the error variance:

σ̂² = SSE / (n – (k + 1))

Adjusted R² is one way of looking at how well the model fits the data:

adjusted R² = 1 – [(n – 1)/(n – (k + 1))] · SSE/SST

A Model Utility Test

The model utility test in simple linear regression involves the null hypothesis H0: β1 = 0, according to which there is no useful linear relation between y and the predictor x.

In MLR we test the hypothesis H0: β1 = 0, β2 = 0, ..., βk = 0, which says that there is no useful linear relationship between y and any of the k predictors. If at least one of these β's is not 0, the model is deemed useful.

We could test each β separately, but that would take time and be very conservative (if a Bonferroni correction is used).

A better test is a joint test, based on a statistic that has an F distribution when H0 is true.

Null hypothesis: H0: β1 = β2 = … = βk = 0

Alternative hypothesis: Ha: at least one βi ≠ 0 (i = 1, ..., k)

Test statistic value: f = (SSR/k) / (SSE/(n – (k + 1))) = MSR/MSE

SSR = regression sum of squares = SST – SSE

Rejection region for a level α test: f ≥ Fα, k, n – (k + 1)
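Assuming the sums of squares are available, the F statistic and its p-value can be computed directly; the Python sketch below (scipy.stats for the F distribution, made-up numbers) is illustrative only.

import numpy as np
from scipy import stats

# Illustrative numbers: SST and SSE as they might come out of a fit
# with n observations and k predictors.
n, k = 30, 4
sst, sse = 1200.0, 350.0
ssr = sst - sse

f_stat = (ssr / k) / (sse / (n - (k + 1)))
p_val  = stats.f.sf(f_stat, k, n - (k + 1))   # P(F >= f_stat) under H0
print(f_stat, p_val)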

Example – bond shear strength

Returning to the bond shear strength data, a model with k = 4 predictors was fit, so the relevant hypotheses are

H0: β1 = β2 = β3 = β4 = 0   versus   Ha: at least one of these four β's is not 0

Usually, output from a statistical package will look something like this:

[Software output for the four-predictor bond shear strength fit; not reproduced in this transcript]

The null hypothesis should be rejected at any reasonable significance level.

We conclude that there is a useful linear relationship between y and at least one of the four predictors in the model.

This does not mean that all four predictors are useful; we will focus on that next.

Inference in Multiple Regression

All standard statistical software packages compute and show the estimated standard deviations (standard errors) of the regression coefficients, s_β̂i.

Inference concerning a single βi is based on the standardized variable

T = (β̂i – βi) / S_β̂i

which has a t distribution with n – (k + 1) df.

A 100(1 – α)% CI for βi is

β̂i ± tα/2, n – (k + 1) · s_β̂i
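A minimal Python sketch of these calculations, assuming a design matrix X with a leading column of ones and made-up data; the standard errors come from s²(X'X)⁻¹ and the t critical value from scipy.stats.

import numpy as np
from scipy import stats

# Illustrative: X is an n x (k+1) design matrix, y the response; numbers are made up.
X = np.column_stack([np.ones(6),
                     [1., 2., 3., 4., 5., 6.],
                     [2., 1., 4., 3., 6., 5.]])
y = np.array([2.9, 4.2, 7.1, 7.9, 11.2, 10.8])
n, p = X.shape                      # p = k + 1

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)        # sigma^2 estimate, SSE / (n - (k + 1))

cov_beta = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov_beta))     # standard errors s_beta_hat_i

t_crit = stats.t.ppf(0.975, df=n - p)       # for a 95% CI
for b, s in zip(beta_hat, se):
    print(b, (b - t_crit * s, b + t_crit * s))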

A single-predictor hypothesis has the form H0: βi = 0 for a particular i.

For example, after fitting the four-predictor model in the bond shear strength example, the investigator might wish to test H0: β4 = 0.

According to H0, as long as the predictors x1, x2, and x3 remain in the model, x4 contains no useful information about y.

The test statistic value is the t ratio t = β̂i / s_β̂i.

We also see that as long as power, temperature, and time are retained in the model, the predictor x1 = force can be deleted.

An F Test for a Group of Predictors. The model utility F test was appropriate for testing whether there is useful information about the dependent variable in any of the k predictors (i.e., whether β1 = ... = βk = 0).

In many situations, one first builds a model containing k predictors and then wishes to know whether any of the predictors in a particular subset provide useful information about Y.

Note that with k predictors there are 2^k possible models. When k is moderate, we can check all possible subsets. However, for large k (such as 100), that is a gigantic number of possible models.

The relevant hypotheses are as follows:

H0: βl+1 = βl+2 = ... = βk = 0
(so the "reduced" model with only l predictors, Y = β0 + β1x1 + ... + βlxl + ε, is the correct model)

versus

Ha: at least one among βl+1, ..., βk is not 0
(so the "full" model Y = β0 + β1x1 + ... + βkxk + ε, with k predictors, is the correct model)

The test is carried out by fitting both the full and reduced models. Because the full model contains not only the predictors of the reduced model but also some extra predictors, it should fit the data at least as well as the reduced model.

That is, if we let SSEk be the sum of squared residuals for the full model and SSEl be the corresponding sum for the reduced model, then SSEk ≤ SSEl.

Intuitively, if SSEk is a great deal smaller than SSEl, the full model provides a much better fit than the reduced model; the appropriate test statistic should then depend on the reduction SSEl – SSEk in unexplained variation.

SSEk = unexplained variation for the full model

SSEl = unexplained variation for the reduced model

Test statistic value: f = [(SSEl – SSEk)/(k – l)] / [SSEk/(n – (k + 1))]

Rejection region: f ≥ Fα, k – l, n – (k + 1)
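A small illustration of the group test, with made-up SSE values for hypothetical full and reduced models (Python, scipy.stats).

import numpy as np
from scipy import stats

# Illustrative numbers: SSE from a reduced model with l predictors and
# from a full model with k predictors, n observations.
n, k, l = 30, 5, 2
sse_full, sse_reduced = 210.0, 340.0

f_stat = ((sse_reduced - sse_full) / (k - l)) / (sse_full / (n - (k + 1)))
p_val  = stats.f.sf(f_stat, k - l, n - (k + 1))
print(f_stat, p_val)   # a small p_val argues against the reduced model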

Example – fuel

The fuel consumption dataset was collected by C. Bingham from the American Almanac for 1974 to study fuel consumption across different states. The variables are:

name   variable label
-----  ----------------------------------------------------
state  State
tax    1972 motor fuel tax rate, cents/gal
inc    1972 per capita income, thousands of dollars
road   1971 federal-aid highways, thousands of miles
dlic   Percent of population with driver's license
fuel   Motor fuel consumption, gal/person

We want to look at the relationship between fuel consumption and fuel tax (which is measured in cents per gallon).

To ensure that we work with comparable quantities, we only look at per-capita versions of the other variables that affect fuel consumption.

For the initial analysis, we will look at the graphical relationship of the following variables, which we think might be important in explaining fuel consumption: tax, inc, road, dlic.

[Figure: scatterplot matrix of fuel (motor fuel consumption, gal/person), tax (1972 motor fuel tax rate, cents/gal), road (1971 federal-aid highways, thousands of miles), inc (1972 per capita income, thousands of dollars), and dlic (percent of population with a driver's license)]

From the scatterplot matrix we can examine the relationship between per-capita fuel consumption and the other variables, as well as the relationships among the variables themselves.

This inter-predictor relationship is known as collinearity. There is always some of it in the model.

From the scatterplots in the top row (those describing the relationships between fuel and taxes, road length, per-capita income, and the proportion of driver's licenses), we can see that per-capita fuel consumption seems to be linearly related to only two variables, tax and dlic, while the relationships with the other variables (road, inc) seem rather weak.

Hence, for now, we choose to examine only tax and dlic and their effect on fuel:

fuel = β0 + β1 dlic + β2 tax + ε

But before we go for this big model, let us consider what adding a single new predictor to a simple linear regression model would do. Specifically, let us consider what would happen if we added tax to the SLR model relating fuel to dlic.

The main idea in adding the tax variable to the model is to explain the part of fuel that hasn't already been explained by the dlic variable. This fitting of one predictor after another is the central point in multiple regression.

[Regression output not reproduced in this transcript]

[Regression output not reproduced in this transcript]

We need to combine these two regressions and end up with a model that contains both tax and dlic.

What can we say about the MLR model based on these two simple regressions?

The R² for the combined model will have to be bigger than 48.9%, the larger of the two individual R²'s. The total R² will be additive, i.e. 48.9% + 20.4% = 69.3%, only if the two variables tax and dlic are uncorrelated and tell us completely separate information about fuel consumption. This almost never happens in the real world.

In order to understand what one variable contributes on top of the other already in the model, we need to understand that overlap. We do that by running a regression of one on the other.

Let's say dlic is already in the model. What information can tax contribute that is not already captured by dlic?

That would be the residuals from the regression of tax on dlic.

What the residuals from that regression can explain in the variation of the residuals from the regression of fuel on dlic is the contribution of tax to the model.
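The following Python sketch illustrates this residual-on-residual idea with made-up stand-ins for dlic, tax, and fuel; the slope of the final regression matches the tax coefficient that the combined MLR fit would produce (the Frisch–Waugh–Lovell result).

import numpy as np

def slr_residuals(x, y):
    # Residuals from the simple linear regression of y on x (with intercept)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Made-up stand-ins for the fuel data (values are illustrative only)
dlic = np.array([52., 55., 58., 61., 64., 67., 70.])
tax  = np.array([9.0, 8.5, 8.0, 7.5, 7.0, 7.5, 6.5])
fuel = np.array([410., 440., 470., 500., 540., 520., 580.])

# Part of tax not captured by dlic, and part of fuel not explained by dlic
tax_resid  = slr_residuals(dlic, tax)
fuel_resid = slr_residuals(dlic, fuel)

# Regressing fuel_resid on tax_resid shows what tax adds on top of dlic
slope = (tax_resid @ fuel_resid) / (tax_resid @ tax_resid)
print(slope)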

Proceeding in the same way, we obtain the full model:

. reg fuelc dlic tax inc road

      Source |        SS       df        MS          Number of obs =     48
-------------+------------------------------        F(4, 43)      =  21.67
       Model |   141774183     4   35443545.8       Prob > F      = 0.0000
    Residual |  70317692.3    43   1635295.17       R-squared     = 0.6685
-------------+------------------------------        Adj R-squared = 0.6376
       Total |   212091875    47   4512593.09       Root MSE      = 1278.8

       fuelc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dlic |   -109.906    37.08678   -2.96   0.005    -184.6986   -35.11339
         tax |   74.44481    250.1444    0.30   0.767    -430.0194     578.909
         inc |   1270.123    332.1401    3.82   0.000     600.2992    1939.948
        road |   416.0281    65.36389    6.36   0.000     284.2092    547.8469
       _cons |   247.5998    3578.363    0.07   0.945    -6968.856    7464.056

Tax does not seem significant. Re-running the model without it gives:

. reg fuelc dlic inc road

      Source |        SS       df        MS          Number of obs =     48
-------------+------------------------------        F(3, 44)      =  29.48
       Model |   141629345     3   47209781.6       Prob > F      = 0.0000
    Residual |  70462530.7    44   1601421.15       R-squared     = 0.6678
-------------+------------------------------        Adj R-squared = 0.6451
       Total |   212091875    47   4512593.09       Root MSE      = 1265.5

       fuelc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dlic |   -114.217    33.78536   -3.38   0.002     -182.307   -46.12713
         inc |   1281.629    326.4479    3.93   0.000     623.7166    1939.542
        road |   404.9094    53.07603    7.63   0.000     297.9417    511.8771
       _cons |   1077.415    2219.442    0.49   0.630    -3395.576    5550.406


Standardizing Variables

For multiple regression, especially when values of the variables are large in magnitude, it is advantageous to carry this coding one step further.

Let x̄i and si be the sample average and sample standard deviation of the xij's (j = 1, …, n).

Now code each variable xi by xi′ = (xi – x̄i)/si. The coded variable xi′ simply re-expresses any xi value in units of standard deviations above or below the mean.
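A one-line version of this coding in Python (numpy), with made-up raw predictor values.

import numpy as np

# Illustrative: standardize each predictor column of a data matrix.
# X_raw holds the raw predictor values (one column per predictor).
X_raw = np.array([[2900., 55.],
                  [3100., 60.],
                  [2500., 52.],
                  [3400., 66.]])

means = X_raw.mean(axis=0)
sds   = X_raw.std(axis=0, ddof=1)       # sample standard deviations
X_coded = (X_raw - means) / sds         # x_i' = (x_i - xbar_i) / s_i

print(X_coded)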

Thus if x̄i = 100 and si = 20, xi = 130 becomes xi′ = 1.5, because 130 is 1.5 standard deviations above the mean of the values of xi.

For example, the coded full second-order model with two independent variables has regression function

E(Y) = β0 + β1x1′ + β2x2′ + β3(x1′)² + β4(x2′)² + β5x1′x2′

The benefits of coding are
(1) increased numerical accuracy in all computations, and
(2) more accurate estimation, because the individual parameters of the coded model characterize the behavior of the regression function near the center of the data rather than near the origin.

Example 20

The article "The Value and the Limitations of High-Speed Turbo-Exhausters for the Removal of Tar-Fog from Carburetted Water-Gas" (Soc. Chemical Industry J., 1946: 166–168) presents data on y = tar content (grains/100 ft³) of a gas stream as a function of x1 = rotor speed (rpm) and x2 = gas inlet temperature (°F).

The data is also considered in the article "Some Aspects of Nonorthogonal Data Analysis" (J. of Quality Tech., 1973: 67–79), which suggests using the coded model described previously.

The means and standard deviations are x̄1 = 2991.13, s1 = 387.81, x̄2 = 58.468, and s2 = 6.944, so x1′ = (x1 – 2991.13)/387.81 and x2′ = (x2 – 58.468)/6.944.

With x3′ = (x1′)², x4′ = (x2′)², and x5′ = x1′x2′, fitting the full second-order model yielded β̂0 = 40.2660, β̂1 = –13.4041, β̂2 = 10.2553, β̂3 = 2.3313, β̂4 = –2.3405, and β̂5 = 2.5978.

The estimated regression equation is then

ŷ = 40.27 – 13.40x1′ + 10.26x2′ + 2.33x3′ – 2.34x4′ + 2.60x5′

Thus if x1 = 3200 and x2 = 57.0, then x1′ = .539 and x2′ = –.211. What is the fitted value?


Variable Selection

Suppose an experimenter has obtained data on a response variable y as well as on p candidate predictors x1, …, xp.

How can a best (in some sense) model involving a subset of these predictors be selected?

We know that as predictors are added one by one into a model, SSE cannot increase (a larger model cannot explain less variation than a smaller one) and will usually decrease, albeit perhaps by a small amount.

There is no mystery as to which model gives the largest R² value: it is the one containing all p predictors. What we'd really like is a model involving relatively few predictors that is easy to interpret and use, yet explains a relatively large amount of the observed y variation.

For any fixed number of predictors (e.g., 5), it is reasonable to identify the best model of that size as the one with the largest R² value, equivalently, the smallest value of SSE.

The more difficult issue concerns selection of a criterion that will allow for comparison of models of different sizes.

Use the subscript k to denote a quantity computed from a model containing k predictors (e.g., SSEk).

Three different criteria, each one a simple function of SSEk, are widely used.

1. R²k, the coefficient of multiple determination for a k-predictor model. Because R²k will virtually always increase as k does (and can never decrease), we are not interested in the k that maximizes R²k.

Instead, we wish to identify a small k for which R²k is nearly as large as the R² for the model containing all the predictors.

2. MSEk = SSEk / (n – k – 1), the mean squared error for a k-predictor model.

This is often used in place of R²k, because although R²k never decreases with increasing k, a small decrease in SSEk obtained with one extra predictor can be more than offset by a decrease of 1 in the denominator of MSEk.

The objective is then to find the model having minimum MSEk. Since adjusted R²k = 1 – MSEk/MST, where MST = SST/(n – 1) is constant in k, examination of adjusted R²k is equivalent to consideration of MSEk.

3. The rationale for the third criterion, Ck, is more difficult to understand, but the criterion is widely used by data analysts.

Suppose the true regression model is specified by m predictors, that is,

Y = β0 + β1x1 + … + βmxm + ε,   V(ε) = σ²

so that

E(Y) = β0 + β1x1 + … + βmxm

Consider fitting a model by using a subset of k of these m predictors; for simplicity, suppose we use x1, x2, …, xk.

Then by solving the system of normal equations, estimates β̂0, β̂1, …, β̂k are obtained (but not, of course, estimates of any βi corresponding to predictors not in the fitted model).

The true expected value E(Y) can then be estimated by

Ŷ = β̂0 + β̂1x1 + … + β̂kxk

Now consider the normalized expected total error of estimation

Γk = ( Σi E[(Ŷi – E(Yi))²] ) / σ² = E(SSEk)/σ² + 2(k + 1) – n

The second equality must be taken on faith because it requires a tricky expected-value argument.

A particular subset is then appealing if its Γk value is small. Unfortunately, though, E(SSEk) and σ² are not known.

To remedy this, let s² denote the estimate of σ² based on the model that includes all predictors for which data is available, and define

Ck = SSEk/s² + 2(k + 1) – n
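Given SSEk for a candidate submodel and s² from the full model, Ck is a one-line computation; the sketch below uses made-up numbers.

import numpy as np

def mallows_ck(sse_k, k, s2_full, n):
    # C_k for a k-predictor submodel, using s2_full from the model
    # containing all candidate predictors.
    return sse_k / s2_full + 2 * (k + 1) - n

# Illustrative numbers only
n = 32
s2_full = 6.2
for k, sse_k in [(1, 250.0), (2, 190.0), (3, 170.0)]:
    print(k, mallows_ck(sse_k, k, s2_full, n))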

A desirable model is then specified by a subset of predictors for which Ck is small.

The total number of models that can be created from k predictors in the candidate pool is 2^k (because each predictor can be included in or left out of any particular model; one of these is the model that contains no predictors).

If k ≤ 5, then it would not be too tedious to examine all possible regression models involving these predictors using any good statistical software package.

But the computational effort required to fit all possible models becomes prohibitive as the size of the candidate pool increases.

Several software packages have incorporated algorithms that will sift through models of various sizes in order to identify the best one or more models of each particular size.

Minitab, for example, will do this for k ≤ 31 and allows the user to specify the number of models of each size (1, 2, 3, 4, or 5) that will be identified as having the best criterion values.

Why would we want to go beyond the best single model of each size?

Example 21

The review article by Ron Hocking listed in the chapter bibliography reports on an analysis of data taken from the 1974 issues of Motor Trend magazine.

The dependent variable y was gas mileage, there were n = 32 observations, and the predictors for which data was obtained were:
x1 = engine shape (1 = straight and 0 = V),
x2 = number of cylinders,
x3 = transmission type (1 = manual and 0 = auto),
x4 = number of transmission speeds,
x5 = engine size,
x6 = horsepower,

x7 = number of carburetor barrels,
x8 = final drive ratio,
x9 = weight, and
x10 = quarter-mile time.

In the table, we present summary information from the analysis.

[Table 13.10: Best Subsets for the Gas Mileage Data of Example 21]

The table describes for each k the subset having minimum SSEk; reading down the variables column indicates which variable is added in going from k to k + 1 (going from k = 2 to k = 3, x3 and x10 are added, and x2 is deleted).

The figure contains plots of R²k, adjusted R²k, and Ck against k; these plots are an important visual aid in selecting a subset.

[Figure: R²k, adjusted R²k, and Ck plots for the gas mileage data]

The estimate of σ² is s² = 6.24, which is MSE10. A simple model that rates highly according to all criteria is the one containing the predictors x3, x9, and x10.

Generally speaking, when a subset of k predictors (k < m) is used to fit a model, the estimators β̂0, β̂1, …, β̂k will be biased for β0, β1, …, βk, and Ŷ will also be a biased estimator for the true E(Y) (all this because m – k predictors are missing from the fitted model).

However, as measured by the total normalized expected error Γk, estimates based on a subset can provide more precision than would be obtained using all possible predictors; essentially, this greater precision is obtained at the price of introducing bias in the estimators.

A value of k for which Ck ≈ k + 1 indicates that the bias associated with this k-predictor model would be small.

An alternative to this procedure is forward selection (FS). FS starts with no predictors in the model and considers fitting in turn the model with only x1, only x2, …, and finally only xm.

The variable that, when fit, yields the largest absolute t ratio enters the model, provided that the ratio exceeds the specified constant t_in.

Suppose x1 enters the model. Then models with (x1, x2), (x1, x3), …, (x1, xm) are considered in turn.

The largest absolute t ratio among these then specifies the entering predictor, provided that this maximum also exceeds t_in.

This continues until at some step no absolute t ratios exceed t_in. The entered predictors then specify the model.

The value t_in = 2 is often used for the same reason that t_out = 2 is used in BE (backward elimination).
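A minimal Python sketch of forward selection with t_in = 2, written from the description above; the helper functions and data are illustrative, not from the slides.

import numpy as np

def t_ratio_of_last(X, y):
    # t ratio of the last column's coefficient in an OLS fit of y on X
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1] / se

def forward_selection(X_cand, y, t_in=2.0):
    # X_cand has one column per candidate predictor; returns selected column indices
    n, m = X_cand.shape
    selected, remaining = [], list(range(m))
    current = np.ones((n, 1))                      # intercept only
    while remaining:
        ratios = [(abs(t_ratio_of_last(np.hstack([current, X_cand[:, [j]]]), y)), j)
                  for j in remaining]
        best_t, best_j = max(ratios)
        if best_t <= t_in:                         # no candidate exceeds t_in: stop
            break
        selected.append(best_j)
        remaining.remove(best_j)
        current = np.hstack([current, X_cand[:, [best_j]]])
    return selected

# Usage with made-up data; columns 1 and 3 carry the signal here
rng = np.random.default_rng(0)
X_cand = rng.normal(size=(40, 5))
y = 2.0 + 3.0 * X_cand[:, 1] - 1.5 * X_cand[:, 3] + rng.normal(size=40)
print(forward_selection(X_cand, y))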

For the tar content data, FS resulted in the sequence of models given in Steps 5, 4, …, 1 in the table. This will not always be the case.

[Table 13.11: Backward Elimination Results for the Data of Example 20]

The stepwise procedure most widely used is a combination of FS and BE, denoted by FB. This procedure starts as does forward selection, by adding variables to the model, but after each addition it examines those variables previously entered to see whether any is a candidate for elimination.

For example, if there are eight predictors under consideration and the current set consists of x2, x3, x5, and x6, with x5 having just been added, the t ratios for x2, x3, and x6 are examined.

If the smallest absolute t ratio is less than t_out, then the corresponding variable is eliminated from the model (some software packages base decisions on f = t²).

The idea behind FB is that, with forward selection, a single variable may be more strongly related to y than either of two or more other variables individually, but the combination of those variables may make the single variable subsequently redundant.

This actually happened with the gas-mileage data discussed previously, with x2 entering and subsequently leaving the model.

Although in most situations these automatic selection procedures will identify a good model, there is no guarantee that the best or even a nearly best model will result.

Close scrutiny should be given to data sets for which there appear to be strong relationships among some of the potential predictors; we will say more about this shortly.

Identification of Influential Observations

In simple linear regression, it is easy to spot an observation whose x value is much larger or much smaller than the other x values in the sample.

Such an observation may have a great impact on the estimated regression equation (whether it actually does depends on how far the point (x, y) falls from the line determined by the other points in the scatter plot).

In multiple regression, it is also desirable to know whether the values of the predictors for a particular observation are such that it has the potential for exerting great influence on the estimated equation.

One method for identifying potentially influential observations relies on the fact that, because each β̂i is a linear function of y1, …, yn, each predicted y value of the form ŷ = β̂0 + β̂1x1 + … + β̂kxk is also a linear function of the yj's. In particular, the predicted values corresponding to the sample observations can be written as follows:

ŷj = hj1 y1 + hj2 y2 + … + hjn yn    (j = 1, …, n)

Each coefficient hij is a function only of the xij's in the sample and not of the yj's. It can be shown that hij = hji and that 0 ≤ hjj ≤ 1.

Focus on the "diagonal" coefficients h11, h22, …, hnn. The coefficient hjj is the weight given to yj in computing the corresponding predicted value ŷj.

This quantity can also be expressed as a measure of the distance between the point (x1j, …, xkj) in k-dimensional space and the center of the data (x̄1, …, x̄k).

It is therefore natural to characterize an observation whose hjj is relatively large as one that has potentially large influence.

Unless there is a perfect linear relationship among the k predictors, Σj hjj = k + 1, so the average of the hjj's is (k + 1)/n.

Some statisticians suggest that if hjj > 2(k + 1)/n, the jth observation should be cited as being potentially influential; others use 3(k + 1)/n as the dividing line.
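The hjj's are the diagonal entries of the hat matrix H = X(X'X)⁻¹X'; a small Python sketch with made-up predictor values follows.

import numpy as np

# Illustrative: compute the hat-matrix diagonals h_jj and flag points
# exceeding 2(k + 1)/n. The data are made up; the last row is an
# unusual predictor combination.
X_pred = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.0], [2.5, 2.2],
                   [1.5, 1.8], [2.2, 2.5], [3.1, 2.8], [1.8, 2.1],
                   [2.7, 3.2], [9.0, 8.5]])
n, k = X_pred.shape
X = np.column_stack([np.ones(n), X_pred])        # add intercept column

H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix
h_diag = np.diag(H)

cutoff = 2 * (k + 1) / n
print(h_diag)
print(np.where(h_diag > cutoff)[0])              # indices of potentially influential rows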

Example 24

The accompanying data appeared in the article "Testing for the Inclusion of Variables in Linear Regression by a Randomization Technique" (Technometrics, 1966: 695–699) and was reanalyzed in Hoaglin and Welsch, "The Hat Matrix in Regression and ANOVA" (Amer. Statistician, 1978: 17–23).

The hij's (with elements below the diagonal omitted by symmetry) follow the data.

Here k = 2, so (k + 1)/n = 3/10 = .3; since h44 = .604 > 2(.3), the fourth data point is identified as potentially influential.

Another technique for assessing the influence of the jth observation that takes into account yj as well as the predictor values involves deleting the jth observation from the data set and performing a regression based on the remaining observations.

If the estimated coefficients from the "deleted observation" regression differ greatly from the estimates based on the full data, the jth observation has clearly had a substantial impact on the fit.

One way to judge whether the estimated coefficients change greatly is to express each change relative to the estimated standard deviation of the coefficient:

(β̂i – β̂i(j)) / s_β̂i,   where β̂i(j) is the estimate of βi computed with the jth observation deleted.

There exist efficient computational formulas that allow all this information to be obtained from the "no-deletion" regression, so that the additional n regressions are unnecessary.