Multiple Regression Last Part – Model Reduction

Model Reduction
More of an art form than a science. As stated earlier, we're trying to create a model predicting a DV that explains as much of the variance in that DV as possible, while at the same time it:
- Meets the assumptions of MLR
- Best manages the other aforementioned issues: sample size, outliers, multicollinearity
- Is parsimonious

Model Reduction
The more variables, the higher the R²; conversely, our R² will decrease every time we remove a variable from the model. So, if we're reducing our R², we want to make sure that we're making progress relative to the assumptions, sample size, multicollinearity, & parsimony.

MLR Model Reduction Example
Let's use the March Madness homework data. RQ: How effectively can we create a model to predict teams' success in the NCAA Division I Men's Basketball Tournament? Alpha = .05 a priori for all tests.

MLR Model Reduction Example
Use SPSS to find the Cook's distance for the data:
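The deck does this through the SPSS menus; for anyone following along outside SPSS, here is a minimal sketch of the same Cook's distance request in Python with statsmodels. The file name and the tourney_wins column are hypothetical stand-ins for the March Madness data.

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("march_madness.csv")    # hypothetical file name
y = df["tourney_wins"]                   # hypothetical DV name
X = sm.add_constant(df.drop(columns=["tourney_wins"]))

model = sm.OLS(y, X).fit()
cooks_d = model.get_influence().cooks_distance[0]

# Rule of thumb from the slides: Cook's distance > 1 flags an
# influential case worth considering for removal.
print("Largest Cook's distance:", cooks_d.max())
print("Cases with Cook's distance > 1:", int((cooks_d > 1).sum()))
```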


MLR Model Reduction Example
Output from the Cook's distance request:


MLR Model Reduction Example
The largest Cook's distance is smaller than 1, so no problem (a Cook's distance > 1 signifies a problem with an influential data point, so you should consider eliminating it).


MLR Model Reduction Example
Examine the correlation matrix to see which variables are correlating with the DV, and for multicollinearity among IVs (matrix on next slide). Correlations above .5 are somewhat concerning; those above .7, or particularly .8, are larger concerns. I count eight pairwise correlations (not involving the DV) that are .7+.
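A sketch of the same pairwise screening in Python, reusing the hypothetical df and tourney_wins DV from the Cook's distance sketch; the .7 threshold is the one named on the slide.

```python
import numpy as np

corr = df.corr(numeric_only=True)
ivs = corr.drop(index="tourney_wins", columns="tourney_wins")

# Flag IV pairs at or above the .7 concern threshold (upper triangle
# only, so each pair is reported once).
rows, cols = np.where(np.triu(np.abs(ivs.values), k=1) >= 0.7)
for i, j in zip(rows, cols):
    print(f"{ivs.index[i]} vs {ivs.columns[j]}: r = {ivs.values[i, j]:.2f}")
```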


MLR Model Reduction Example
What does this tell us?

Sample size concerns
Recall Tabachnick & Fidell (1996): n > 50 + 8k. Here k (# of predictors) = 13 and n = 192. Since 50 + (8 * 13) = 50 + 104 = 154, the inequality is satisfied here. It could still be improved by losing some predictors.
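The rule is simple enough to encode directly; this sketch just restates the slide's arithmetic.

```python
def n_required(k: int) -> int:
    """Tabachnick & Fidell (1996) rule of thumb: need n > 50 + 8k."""
    return 50 + 8 * k

k, n = 13, 192
print(n_required(k))       # 154
print(n > n_required(k))   # True: 192 > 154, so the inequality holds
```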

MLR Model Reduction Example
Am I satisfied with this model, or should I examine another model by reducing via IV elimination? Because of some serious multicollinearity problems, it seems we can create a better model via reduction.

MLR Model Reduction Example
So, what variables do we drop? In examining variables to drop, look at (see the sketch below):
- Pairwise correlation with the DV (higher is good)
- Multicollinearity with other IVs (lower is good)
- Prediction strength in the model (ideal to have no non-significant IVs in the model)
- Common sense: make your decisions based on BOTH statistical and practical grounds

This is an important slide, folks!
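To make the statistical criteria concrete, a sketch that gathers them into one table, reusing the hypothetical df, model, and variable names from the earlier sketches; the multicollinearity criterion is handled by the correlation and VIF sketches elsewhere in the deck.

```python
import pandas as pd

screen = pd.DataFrame({
    "r_with_DV": df.corr(numeric_only=True)["tourney_wins"].drop("tourney_wins"),
    "t_in_model": model.tvalues.drop("const"),
    "p_in_model": model.pvalues.drop("const"),
})
# Weakest candidates for removal float to the top: low |r| with the DV
# and a non-significant t in the full model.
print(screen.reindex(screen["r_with_DV"].abs().sort_values().index))
```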

MLR Model Reduction Example
Wins, losses, and winning % are all obviously highly correlated with one another. Of the three, wins has the highest pairwise correlation with the DV and the highest t-score in the model, so let's keep it but drop the other two.

Example Model #2
So, let's re-run the analysis without those 2 variables & see what we get.


Example Model #2
Compare from one model to the next:
- R²
- F-statistic
- IVs in the model
So, how did we do? Happy with this model?
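A sketch of that comparison, dropping two hypothetical column names (losses, win_pct) standing in for losses and winning %, and reusing X, y, and model from the earlier sketches.

```python
# Refit without the two dropped variables and compare the headline numbers.
X2 = X.drop(columns=["losses", "win_pct"])   # hypothetical column names
model2 = sm.OLS(y, X2).fit()

for label, m in [("Model #1", model), ("Model #2", model2)]:
    print(f"{label}: R2 = {m.rsquared:.3f}, "
          f"F = {m.fvalue:.1f}, IVs = {m.df_model:.0f}")
```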

Example Model #3
Let's try to clear up a couple of other multicollinearity problems:
- Top 50 wins vs. Top 50 win %
- Strength of schedule vs. RPI vs. conference membership
Let's drop Top 50 win % & SOS. Also, let's get rid of # of wins in last ten games and Top 50 losses, as they haven't been significant anywhere.


Example Model #3


Example Model #3/4
How did we do this time? A fourth model should perhaps get rid of automatic bid & conference affiliation.

MLR Model Reduction
As you can see, this trial-and-error process can continue at some length. The goal is to create a highly predictive, parsimonious model with as few problems with assumptions & multicollinearity as possible.

Finis

Some more notes on multicollinearity: VIF & tolerance
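These closing slides point to VIF and tolerance as the formal multicollinearity diagnostics. A minimal sketch of that check with statsmodels' variance_inflation_factor, run on the same hypothetical design matrix X from the earlier sketches.

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Compute VIF (and tolerance = 1/VIF) for each IV in the design matrix X.
for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    # Common rule of thumb: VIF > 10 (tolerance < .10) is a serious red flag.
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.3f}")
```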