Regression Analysis - spada.uns.ac.id

74
Regression Analysis Multiple Regression [ Cross-Sectional Data ]

Transcript of Regression Analysis - spada.uns.ac.id

Page 1: Regression Analysis - spada.uns.ac.id

Regression Analysis

Multiple Regression[ Cross-Sectional Data ]

Page 2: Regression Analysis - spada.uns.ac.id

Learning ObjectivesuExplain the linear multiple regression

model [for cross-sectional data]u Interpret linear multiple regression

computer outputuExplain multicollinearityuDescribe the types of multiple regression

models

Page 3: Regression Analysis - spada.uns.ac.id

Regression Modeling Steps

uDefine problem or questionuSpecify modeluCollect datauDo descriptive data analysisuEstimate unknown parametersuEvaluate modeluUse model for prediction

Page 4: Regression Analysis - spada.uns.ac.id

Simple vs. Multiple represents the

unit change in Y per unit change in X .

uDoes not take into account any other variable besides single independent variable.

i represents the unit change in Y per unit change in Xi.

uTakes into account the effect of other

i s.u “Net regression

coefficient.”

Page 5: Regression Analysis - spada.uns.ac.id

AssumptionsuLinearity - the Y variable is linearly related

to the value of the X variable. uIndependence of Error - the error

(residual) is independent for each value of X.uHomoscedasticity - the variation around

the line of regression be constant for all values of X.

uNormality - the values of Y be normally distributed at each value of X.

Page 6: Regression Analysis - spada.uns.ac.id

GoalDevelop a statistical model that can predict the values of a dependent ( ) variable based upon the values of the independent ( ) variables.

Page 7: Regression Analysis - spada.uns.ac.id

Simple Regression

A statistical model that utilizes one independent variable “X” to predict the

dependent variable “Y.”

Page 8: Regression Analysis - spada.uns.ac.id

Multiple RegressionA statistical model that utilizes two or more quantitative and qualitative explanatory variables (x1,..., xp) to predict a dependent variable Y.

Caution: have at least two or more quantitative explanatory variables (rule of thumb)

Page 9: Regression Analysis - spada.uns.ac.id

Multiple Regression Model

X2

X1

Y

e

Page 10: Regression Analysis - spada.uns.ac.id

Hypotheses

u H0: 1 = 2 = 3 = ... = P = 0

u H1: At least one regression coefficient is not equal to zero

Page 11: Regression Analysis - spada.uns.ac.id

Hypotheses (alternate format)

H0: = 0

H1: 0

Page 12: Regression Analysis - spada.uns.ac.id

Types of ModelsuPositive linear relationshipuNegative linear relationshipuNo relationship between X and YuPositive curvilinear relationshipuU-shaped curvilinearuNegative curvilinear relationship

Page 13: Regression Analysis - spada.uns.ac.id

Multiple Regression Models

Page 14: Regression Analysis - spada.uns.ac.id

Multiple Regression Equations

This is too complicated! You’ve got to

be kiddin’!

Page 15: Regression Analysis - spada.uns.ac.id

Multiple Regression Models

Page 16: Regression Analysis - spada.uns.ac.id

Linear ModelRelationship between one dependent & two

or more independent variables is a linear function

Page 17: Regression Analysis - spada.uns.ac.id

Method of Least SquaresuThe straight line that best fits the data.

uDetermine the straight line for which the differences between the actual values (Y) and the values that would be predicted from the fitted line of regression (Y-hat) are as small as possible.

Page 18: Regression Analysis - spada.uns.ac.id

Measures of VariationuExplained variation (sum of

squares due to regression)uUnexplained variation (error sum

of squares)

uTotal sum of squares

Page 19: Regression Analysis - spada.uns.ac.id

Coefficient of Multiple Determination

When null hypothesis is rejected, a relationship between Y and the X variables exists.Strength measured by R2 [ several types ]

Page 20: Regression Analysis - spada.uns.ac.id

Coefficient of Multiple Determination

R2y.123- - -P

The proportion of Y that is explained by the set of explanatory variables selected

Page 21: Regression Analysis - spada.uns.ac.id

Standard Error of the Estimate

the measure of variability around the line of regression

Page 22: Regression Analysis - spada.uns.ac.id

Confidence interval estimates

»True meanY.X

»IndividualY-hati

Page 23: Regression Analysis - spada.uns.ac.id

Interval Bands [from simple regression]

Page 24: Regression Analysis - spada.uns.ac.id

Multiple Regression EquationY-hat = 0 + 1x1 + 2x2 + ... + PxP + where:

0 = y-intercept {a constant value}

= slope of Y with variable x1 holding the variables x2, x3, ..., xP effects constant

P = slope of Y with variable xP holding all other variables’ effects constant

Page 25: Regression Analysis - spada.uns.ac.id

Who is in Charge?

Page 26: Regression Analysis - spada.uns.ac.id

Mini-CasePredict the consumption of home heating oil during January for homes located around Screne Lakes. Two explanatory variables are selected - - average daily atmospheric temperature (oF) and the amount of attic insulation (“).

Page 27: Regression Analysis - spada.uns.ac.id

O il (G a l) T e m p In su la tio n275.30 40 3363.80 27 3164.30 40 1040.80 73 694.30 64 6

230.90 34 6366.70 9 6300.60 8 10237.80 23 10121.40 63 331.40 65 10

203.50 41 6441.10 21 3323.00 38 352.50 58 10

Mini-Case(0F)Develop a model for

estimating heating oil used for a single family home in the month of January based on average temperature and amount of insulation in inches.

Page 28: Regression Analysis - spada.uns.ac.id

Mini-CaseuWhat preliminary conclusions can home

owners draw from the data?

uWhat could a home owner expect heating oil consumption (in gallons) to be if the outside temperature is 15 oF when the attic insulation is 10 inches thick?

Page 29: Regression Analysis - spada.uns.ac.id

Multiple Regression Equation[mini-case]

Dependent variable: Gallons Consumed ------------------------------------------------------------------------------------- Standard T Parameter Estimate Error Statistic P-Value -------------------------------------------------------------------------------------- CONSTANT 562.151 21.0931 26.6509 0.0000 Insulation -20.0123 2.34251 -8.54313 0.0000 Temperature -5.43658 0.336216 -16.1699 0.0000 -------------------------------------------------------------------------------------- R-squared = 96.561 percent

R-squared (adjusted for d.f.) = 95.9879 percent Standard Error of Est. = 26.0138+

Page 30: Regression Analysis - spada.uns.ac.id

Multiple Regression Equation[mini-case]

where: = temperature [degrees F] = insulation [inches]

Page 31: Regression Analysis - spada.uns.ac.id

Multiple Regression Equation[mini-case]

u For a home with zero inches of attic

insulation and an outside temperature of 0 oF, 562.15 gallons of heating oil would be consumed.

+

Page 32: Regression Analysis - spada.uns.ac.id

Extrapolation

Page 33: Regression Analysis - spada.uns.ac.id

Multiple Regression Equation[mini-case]

u For a home with zero attic insulation and an outside

temperature of zero, 562.15 gallons of heating oil would be consumed.

u For each incremental increase in

degree F of temperature, heating oil

consumption drops 5.44 gallons.

+

Page 34: Regression Analysis - spada.uns.ac.id

Multiple Regression Equation[mini-case]

u For a home with zero attic insulation and an outside temperature of zero, 562 gallons of heating oil would be consumed.

u For each incremental increase in degree F of temperature, for a given amount of attic insulation, heating oil consumption drops 5.44 gallons.

uFor each incremental increase in inches of attic insulation, heating oil consumption drops 20.01 gallons.

Page 35: Regression Analysis - spada.uns.ac.id

Multiple Regression Prediction[mini-case]

with x1 = 15oF and x2 = 10 inches

Y-hat = 562.15 - 5.44(15) - 20.01(10) = 280.45 gallons consumed

Page 36: Regression Analysis - spada.uns.ac.id

Coefficient of Multiple Determination [mini-case]

R2y.12 = .9656

96.56 percent of the variation in heating oil can be explained by the variation in temperature andandand insulation.

Page 37: Regression Analysis - spada.uns.ac.id

Coefficient of Multiple Determination

uProportion of variation in Y ‘explained’ by all X variables taken together

uR2Y.12 = Explained variation = SSR

Total variation SSTuNever decreases when new X variable is added

to model– Only Y values determine SST– Disadvantage when comparing models

Page 38: Regression Analysis - spada.uns.ac.id

uProportion of variation in Y ‘explained’ by all X variables taken together

uReflects– Sample size– Number of independent variables

uSmaller [more conservative] than R2Y.12

uUsed to compare models

Coefficient of Multiple Determination Adjusted

Page 39: Regression Analysis - spada.uns.ac.id

Coefficient of Multiple Determination (adjusted)

R2(adj) y.123- - -P

The proportion of Y that is explained by the set of independent [explanatory] variables selected, adjusted for the number of independent variables and the sample size.

Page 40: Regression Analysis - spada.uns.ac.id

Coefficient of Multiple Determination (adjusted) [Mini-Case]

R2adj = 0.9599

95.99 percent of the variation in heating oil consumption can be explained by the model - adjusted for number of independent variables and the sample size

Page 41: Regression Analysis - spada.uns.ac.id

Coefficient of Partial Determination

uProportion of variation in Y ‘explained’ by variable XP holding all others constant

uMust estimate separate modelsuDenoted R2

Y1.2 in two X variables case– Coefficient of partial determination of X1 with Y

holding X2 constantuUseful in selecting X variables

Page 42: Regression Analysis - spada.uns.ac.id

Coefficient of Partial Determination [p. 878]

R2y1.234 --- P

The coefficient of partial variation of variable Y with x1 holding constant

the effects of variables x2, x3, x4, ... xP.

Page 43: Regression Analysis - spada.uns.ac.id

Coefficient of Partial Determination [Mini-Case]

R2y1.2 = 0.9561

For a fixed (constant) amount of insulation, 95.61 percent of the variation in heating oil can be explained by the variation in average atmospheric temperature. [p. 879]

Page 44: Regression Analysis - spada.uns.ac.id

Coefficient of Partial Determination [Mini-Case]

R2y2.1 = 0.8588

For a fixed (constant) temperature, 85.88 percent of the variation in heating oil can be explained by the variation in amount of insulation.

Page 45: Regression Analysis - spada.uns.ac.id

Testing Overall SignificanceuShows if there is a linear relationship

between all X variables together & YuUses p-valueuHypotheses

– H0: 1 = 2 = ... = P = 0 » No linear relationship

– H1: At least one coefficient is not 0 » At least one X variable affects Y

Page 46: Regression Analysis - spada.uns.ac.id

uExamines the contribution of a set of X variables to the relationship with Y

uNull hypothesis:– Variables in set do not improve significantly

the model when all other variables are included uMust estimate separate modelsuUsed in selecting X variables

Testing Model Portions

Page 47: Regression Analysis - spada.uns.ac.id

Diagnostic CheckinguH0 retain or reject

If reject - {p-value 0.05}uR2

adj

uCorrelation matrixuPartial correlation matrix

Page 48: Regression Analysis - spada.uns.ac.id

Multicollinearity

uHigh correlation between X variablesuCoefficients measure combined effectuLeads to unstable coefficients depending on

X variables in modeluAlways exists; matter of degreeuExample: Using both total number of rooms

and number of bedrooms as explanatory variables in same model

Page 49: Regression Analysis - spada.uns.ac.id

Detecting Multicollinearity

uExamine correlation matrix– Correlations between pairs of X variables are

more than with Y variableuFew remedies

– Obtain new sample data– Eliminate one correlated X variable

Page 50: Regression Analysis - spada.uns.ac.id

Evaluating Multiple Regression Model Steps

uExamine variation measuresuDo residual analysisuTest parameter significance

– Overall model– Portions of model – Individual coefficients

uTest for multicollinearity

Page 51: Regression Analysis - spada.uns.ac.id

Multiple Regression Models

Page 52: Regression Analysis - spada.uns.ac.id

Dummy-Variable Regression Model

u Involves categorical X variable with two levels– e.g., female-male, employed-not employed, etc.

Page 53: Regression Analysis - spada.uns.ac.id

Dummy-Variable Regression Model

u Involves categorical X variable with two levels– e.g., female-male, employed-not employed, etc.

uVariable levels coded 0 & 1

Page 54: Regression Analysis - spada.uns.ac.id

Dummy-Variable Regression Model

u Involves categorical X variable with two levels– e.g., female-male, employed-not employed, etc.

uVariable levels coded 0 & 1uAssumes only intercept is different

– Slopes are constant across categories

Page 55: Regression Analysis - spada.uns.ac.id

Dummy-Variable Model Relationships

Same slopes b1

Females

Males

Page 56: Regression Analysis - spada.uns.ac.id

Dummy Variables

u Permits use of qualitative data (e.g.: seasonal, class standing, location, gender).

u 0, 1 coding (nominative data)

u As part of Diagnostic Checking;

incorporate outliers (i.e.: large residuals)

and influence measures.

Page 57: Regression Analysis - spada.uns.ac.id

Multiple Regression Models

Page 58: Regression Analysis - spada.uns.ac.id

Interaction Regression Model

uHypothesizes interaction between pairs of X variables– Response to one X variable varies at different

levels of another X variableuContains two-way cross product terms Y = 0 + 1x1 + 2x2 + 3x1x2 +

uCan be combined with other models e.g. dummy variable models

Page 59: Regression Analysis - spada.uns.ac.id

Effect of Interaction

uGiven:

uWithout interaction term, effect of X1 on Y is measured by 1

uWith interaction term, effect of X1 onY is measured by 1 + 3X2– Effect increases as X2i increases

Page 60: Regression Analysis - spada.uns.ac.id

Interaction Example

Page 61: Regression Analysis - spada.uns.ac.id

Interaction Example

Page 62: Regression Analysis - spada.uns.ac.id

Interaction Example

Page 63: Regression Analysis - spada.uns.ac.id

Interaction Example

Page 64: Regression Analysis - spada.uns.ac.id

Multiple Regression Models

Page 65: Regression Analysis - spada.uns.ac.id

Inherently Linear ModelsuNon-linear models that can be expressed in

linear form– Can be estimated by least square in linear form

uRequire data transformation

Page 66: Regression Analysis - spada.uns.ac.id

Curvilinear Model Relationships

Page 67: Regression Analysis - spada.uns.ac.id

Logarithmic Transformation

Y = + 1 lnx1 + 2 lnx2 +

Page 68: Regression Analysis - spada.uns.ac.id

Square-Root Transformation

Page 69: Regression Analysis - spada.uns.ac.id

Reciprocal Transformation

Page 70: Regression Analysis - spada.uns.ac.id

Exponential Transformation

Page 71: Regression Analysis - spada.uns.ac.id

OverviewuExplained the linear multiple regression

modelu Interpreted linear multiple regression

computer outputuExplained multicollinearityuDescribed the types of multiple regression

models

Page 72: Regression Analysis - spada.uns.ac.id

Source of Elaborate Slides

Prentice Hall, IncLevine, et. all, First Edition

Page 73: Regression Analysis - spada.uns.ac.id

Regression Analysis[Multiple Regression]

*** End of Presentation ***Questions?

Page 74: Regression Analysis - spada.uns.ac.id