Regression Analysis - spada.uns.ac.id
Regression Analysis
Multiple Regression [Cross-Sectional Data]
Learning Objectives
- Explain the linear multiple regression model [for cross-sectional data]
- Interpret linear multiple regression computer output
- Explain multicollinearity
- Describe the types of multiple regression models
Regression Modeling Steps
- Define problem or question
- Specify model
- Collect data
- Do descriptive data analysis
- Estimate unknown parameters
- Evaluate model
- Use model for prediction
Simple vs. Multiple
- Simple: β represents the unit change in Y per unit change in X; it does not take into account any other variable besides the single independent variable.
- Multiple: βi represents the unit change in Y per unit change in Xi, taking into account the effect of the other X variables (hence "net regression coefficient").
Assumptions
- Linearity: the Y variable is linearly related to the value of the X variable.
- Independence of errors: the error (residual) is independent for each value of X.
- Homoscedasticity: the variation around the line of regression is constant for all values of X.
- Normality: the values of Y are normally distributed at each value of X.
Goal
Develop a statistical model that can predict the values of a dependent (Y) variable based upon the values of the independent (X) variables.
Simple Regression
A statistical model that utilizes one independent variable “X” to predict the
dependent variable “Y.”
Multiple Regression
A statistical model that utilizes two or more quantitative and qualitative explanatory variables (x1, ..., xp) to predict a dependent variable Y.
Caution: have at least two quantitative explanatory variables (rule of thumb).
Multiple Regression Model
[Path diagram: independent variables X1 and X2 jointly predict Y, with random error e]
Hypotheses
- H0: β1 = β2 = β3 = ... = βP = 0
- H1: At least one regression coefficient is not equal to zero
Hypotheses (alternate format)
- H0: βi = 0
- H1: βi ≠ 0
Types of Models
- Positive linear relationship
- Negative linear relationship
- No relationship between X and Y
- Positive curvilinear relationship
- U-shaped curvilinear
- Negative curvilinear relationship
Multiple Regression Models
Multiple Regression Equations
This is too complicated! You've got to be kiddin'!
Multiple Regression Models
Linear Model
The relationship between one dependent and two or more independent variables is a linear function.
Method of Least Squares
- The straight line that best fits the data.
- Determine the straight line for which the differences between the actual values (Y) and the values that would be predicted from the fitted line of regression (Y-hat) are as small as possible.
Measures of Variation
- Explained variation (sum of squares due to regression, SSR)
- Unexplained variation (error sum of squares, SSE)
- Total sum of squares (SST = SSR + SSE)
Coefficient of Multiple Determination
When the null hypothesis is rejected, a relationship between Y and the X variables exists. Strength is measured by R² [several types].
Coefficient of Multiple Determination
R²Y.12...P
The proportion of variation in Y that is explained by the set of explanatory variables selected.
Standard Error of the Estimate
The measure of variability around the line of regression.
Confidence Interval Estimates
- True mean μY.X
- Individual Y-hat_i
Interval Bands [from simple regression]
Multiple Regression Equation
Y-hat = b0 + b1x1 + b2x2 + ... + bPxP
where:
b0 = Y-intercept {a constant value}
b1 = slope of Y with variable x1, holding the effects of variables x2, x3, ..., xP constant
bP = slope of Y with variable xP, holding all other variables' effects constant
Who is in Charge?
Mini-Case
Predict the consumption of home heating oil during January for homes located around Screne Lakes. Two explanatory variables are selected: average daily atmospheric temperature (°F) and the amount of attic insulation (inches).
Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40           3
363.80      27           3
164.30      40          10
 40.80      73           6
 94.30      64           6
230.90      34           6
366.70       9           6
300.60       8          10
237.80      23          10
121.40      63           3
 31.40      65          10
203.50      41           6
441.10      21           3
323.00      38           3
 52.50      58          10
Mini-Case
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
Mini-Case
- What preliminary conclusions can home owners draw from the data?
- What could a home owner expect heating oil consumption (in gallons) to be if the outside temperature is 15 °F and the attic insulation is 10 inches thick?
Multiple Regression Equation[mini-case]
Dependent variable: Gallons Consumed

Parameter     Estimate    Standard Error   T Statistic   P-Value
-----------------------------------------------------------------
CONSTANT      562.151     21.0931           26.6509      0.0000
Insulation    -20.0123     2.34251          -8.54313     0.0000
Temperature    -5.43658    0.336216        -16.1699      0.0000
-----------------------------------------------------------------
R-squared = 96.561 percent
R-squared (adjusted for d.f.) = 95.9879 percent
Standard Error of Est. = 26.0138
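The output above can be checked by refitting the model with ordinary least squares; a minimal sketch with NumPy, using the data transcribed from the mini-case table:

```python
# Refit the mini-case model by least squares to reproduce the reported
# coefficients, then predict consumption for 15 degrees F / 10 inches.
import numpy as np

oil = np.array([275.30, 363.80, 164.30, 40.80, 94.30, 230.90, 366.70,
                300.60, 237.80, 121.40, 31.40, 203.50, 441.10, 323.00, 52.50])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63,
                 65, 41, 21, 38, 58], dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3,
                  10, 6, 3, 3, 10], dtype=float)

# Design matrix with an intercept column: Y-hat = b0 + b1*temp + b2*insul
X = np.column_stack([np.ones_like(oil), temp, insul])
b, *_ = np.linalg.lstsq(X, oil, rcond=None)
b0, b_temp, b_insul = b  # approx 562.151, -5.43658, -20.0123

# Prediction for 15 degrees F and 10 inches of attic insulation
y_hat = b0 + b_temp * 15 + b_insul * 10  # approx 280.5 gallons
```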
Multiple Regression Equation [mini-case]
Y-hat = 562.151 - 5.43658 x1 - 20.0123 x2
where: x1 = temperature [degrees F], x2 = insulation [inches]
Multiple Regression Equation [mini-case]
- For a home with zero inches of attic insulation and an outside temperature of 0 °F, 562.15 gallons of heating oil would be consumed. (Caution: this is an extrapolation; no sampled home had a temperature of 0 °F or zero insulation.)
- For each incremental increase in degrees F of temperature, for a given amount of attic insulation, heating oil consumption drops 5.44 gallons.
- For each incremental increase in inches of attic insulation, for a given temperature, heating oil consumption drops 20.01 gallons.
Multiple Regression Prediction [mini-case]
With x1 = 15 °F and x2 = 10 inches:
Y-hat = 562.15 - 5.44(15) - 20.01(10) = 280.45 gallons consumed
Coefficient of Multiple Determination [mini-case]
R²Y.12 = 0.9656
96.56 percent of the variation in heating oil consumption can be explained by the variation in temperature and insulation.
Coefficient of Multiple Determination
- Proportion of variation in Y 'explained' by all X variables taken together
- R²Y.12 = Explained variation / Total variation = SSR / SST
- Never decreases when a new X variable is added to the model
  – Only Y values determine SST
  – Disadvantage when comparing models
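For the mini-case, the decomposition SST = SSR + SSE can be computed directly; a sketch (repeating the mini-case data so the block is self-contained):

```python
# Variation decomposition for the mini-case: R-squared = SSR / SST.
import numpy as np

oil = np.array([275.30, 363.80, 164.30, 40.80, 94.30, 230.90, 366.70,
                300.60, 237.80, 121.40, 31.40, 203.50, 441.10, 323.00, 52.50])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63,
                 65, 41, 21, 38, 58], dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3,
                  10, 6, 3, 3, 10], dtype=float)

X = np.column_stack([np.ones_like(oil), temp, insul])
b, *_ = np.linalg.lstsq(X, oil, rcond=None)
fitted = X @ b

sst = np.sum((oil - oil.mean()) ** 2)   # total variation
sse = np.sum((oil - fitted) ** 2)       # unexplained variation
ssr = sst - sse                         # explained variation
r2 = ssr / sst                          # approx 0.9656, as reported
```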
Coefficient of Multiple Determination (adjusted)
- Proportion of variation in Y 'explained' by all X variables taken together
- Reflects
  – Sample size
  – Number of independent variables
- Smaller [more conservative] than R²Y.12
- Used to compare models
Coefficient of Multiple Determination (adjusted)
R²adj Y.12...P
The proportion of variation in Y that is explained by the set of independent [explanatory] variables selected, adjusted for the number of independent variables and the sample size.
Coefficient of Multiple Determination (adjusted) [Mini-Case]
R²adj = 0.9599
95.99 percent of the variation in heating oil consumption can be explained by the model, adjusted for the number of independent variables and the sample size.
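The reported value follows from the standard adjustment formula (not shown on the slides), with n = 15 homes and p = 2 independent variables:

```latex
R^2_{adj} = 1 - \left(1 - R^2\right)\frac{n-1}{n-p-1}
          = 1 - (1 - 0.96561)\,\frac{14}{12} \approx 0.9599
```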
Coefficient of Partial Determination
- Proportion of variation in Y 'explained' by variable XP, holding all others constant
- Must estimate separate models
- Denoted R²Y1.2 in the two-X-variable case
  – Coefficient of partial determination of X1 with Y, holding X2 constant
- Useful in selecting X variables
Coefficient of Partial Determination [p. 878]
R²Y1.234...P
The coefficient of partial determination of variable Y with x1, holding constant the effects of variables x2, x3, x4, ..., xP.
Coefficient of Partial Determination [Mini-Case]
R²Y1.2 = 0.9561
For a fixed (constant) amount of insulation, 95.61 percent of the variation in heating oil consumption can be explained by the variation in average atmospheric temperature. [p. 879]
Coefficient of Partial Determination [Mini-Case]
R²Y2.1 = 0.8588
For a fixed (constant) temperature, 85.88 percent of the variation in heating oil consumption can be explained by the variation in the amount of insulation.
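Both partial coefficients can be recovered by comparing the error sums of squares of the full model against the one-variable models; a sketch with the mini-case data:

```python
# Coefficients of partial determination via extra sums of squares:
# R^2_Y1.2 = (SSE(x2) - SSE(x1, x2)) / SSE(x2), and symmetrically for x2.
import numpy as np

oil = np.array([275.30, 363.80, 164.30, 40.80, 94.30, 230.90, 366.70,
                300.60, 237.80, 121.40, 31.40, 203.50, 441.10, 323.00, 52.50])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63,
                 65, 41, 21, 38, 58], dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3,
                  10, 6, 3, 3, 10], dtype=float)

def sse(X, y):
    """Error sum of squares of a least-squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

ones = np.ones_like(oil)
sse_full = sse(np.column_stack([ones, temp, insul]), oil)
sse_insul = sse(np.column_stack([ones, insul]), oil)  # insulation only
sse_temp = sse(np.column_stack([ones, temp]), oil)    # temperature only

r2_y1_2 = (sse_insul - sse_full) / sse_insul  # approx 0.9561
r2_y2_1 = (sse_temp - sse_full) / sse_temp    # approx 0.8588
```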
Testing Overall Significance
- Shows if there is a linear relationship between all X variables together and Y
- Uses the p-value
- Hypotheses
  – H0: β1 = β2 = ... = βP = 0 (no linear relationship)
  – H1: At least one coefficient is not 0 (at least one X variable affects Y)
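For the mini-case, the overall F statistic can be computed from the reported R² alone, using the standard identity F = (R²/p) / ((1 - R²)/(n - p - 1)) for p predictors and n observations:

```python
# Overall F statistic from the reported R-squared (mini-case: n = 15, p = 2).
r2, n, p = 0.96561, 15, 2
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))  # approx 168.5
# This far exceeds the 5% critical value of F with (2, 12) d.f. (about 3.89),
# so H0 is rejected: at least one regression coefficient is nonzero.
```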
Testing Model Portions
- Examines the contribution of a set of X variables to the relationship with Y
- Null hypothesis
  – The variables in the set do not significantly improve the model when all other variables are included
- Must estimate separate models
- Used in selecting X variables
Diagnostic Checking
- H0: retain or reject (reject if p-value ≤ 0.05)
- R²adj
- Correlation matrix
- Partial correlation matrix
Multicollinearity
- High correlation between X variables
- Coefficients measure combined effect
- Leads to unstable coefficients, depending on which X variables are in the model
- Always exists; a matter of degree
- Example: using both total number of rooms and number of bedrooms as explanatory variables in the same model
Detecting Multicollinearity
- Examine the correlation matrix
  – Correlations between pairs of X variables are higher than their correlations with the Y variable
- Few remedies
  – Obtain new sample data
  – Eliminate one of the correlated X variables
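Applying this check to the mini-case data is reassuring; a sketch computing the correlation matrix (the temperature-insulation correlation turns out to be near zero):

```python
# Correlation matrix for the mini-case variables (oil, temp, insulation).
import numpy as np

oil = np.array([275.30, 363.80, 164.30, 40.80, 94.30, 230.90, 366.70,
                300.60, 237.80, 121.40, 31.40, 203.50, 441.10, 323.00, 52.50])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63,
                 65, 41, 21, 38, 58], dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3,
                  10, 6, 3, 3, 10], dtype=float)

corr = np.corrcoef([oil, temp, insul])
r_temp_insul = corr[1, 2]  # correlation between the two X variables
# |r_temp_insul| is near zero, far below each X's correlation with oil,
# so multicollinearity is not a concern in this model.
```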
Evaluating Multiple Regression Model Steps
- Examine variation measures
- Do residual analysis
- Test parameter significance
  – Overall model
  – Portions of model
  – Individual coefficients
- Test for multicollinearity
Multiple Regression Models
Dummy-Variable Regression Model
- Involves a categorical X variable with two levels
  – e.g., female-male, employed-not employed, etc.
- Variable levels coded 0 & 1
- Assumes only the intercept is different
  – Slopes are constant across categories
Dummy-Variable Model Relationships
[Figure: parallel regression lines with the same slope b1 and different intercepts for females and males]
Dummy Variables
- Permit use of qualitative data (e.g., seasonal, class standing, location, gender)
- 0, 1 coding (nominal data)
- As part of diagnostic checking, incorporate outliers (i.e., large residuals) and influence measures
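A minimal sketch of 0/1 dummy coding; the salary figures and the gender coding below are made up for illustration (they are not from the slides):

```python
# Dummy-variable regression: one common slope, shifted intercept per group.
import numpy as np

years = np.array([1, 2, 3, 4, 5, 6], dtype=float)   # experience (hypothetical)
female = np.array([0, 1, 0, 1, 0, 1], dtype=float)  # dummy: 1 = female
salary = np.array([30, 33, 34, 37, 38, 41], dtype=float)  # contrived exact fit

X = np.column_stack([np.ones_like(years), years, female])
b, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, slope, shift = b
# slope is common to both groups; shift moves only the intercept,
# giving the parallel-lines picture of the dummy-variable model.
```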
Multiple Regression Models
Interaction Regression Model
- Hypothesizes interaction between pairs of X variables
  – Response to one X variable varies at different levels of another X variable
- Contains two-way cross-product terms: Y = β0 + β1x1 + β2x2 + β3x1x2 + ε
- Can be combined with other models, e.g. dummy-variable models
Effect of Interaction
- Given: Y = β0 + β1x1 + β2x2 + β3x1x2 + ε
- Without the interaction term, the effect of X1 on Y is measured by β1
- With the interaction term, the effect of X1 on Y is measured by β1 + β3X2
  – The effect changes as X2 changes (it increases with X2 when β3 > 0)
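The slides do not fit an interaction model, but the mechanics can be sketched on the mini-case data by adding a temperature-by-insulation cross-product column:

```python
# Adding an interaction term x1*x2 to the mini-case model.
import numpy as np

oil = np.array([275.30, 363.80, 164.30, 40.80, 94.30, 230.90, 366.70,
                300.60, 237.80, 121.40, 31.40, 203.50, 441.10, 323.00, 52.50])
temp = np.array([40, 27, 40, 73, 64, 34, 9, 8, 23, 63,
                 65, 41, 21, 38, 58], dtype=float)
insul = np.array([3, 3, 10, 6, 6, 6, 6, 10, 10, 3,
                  10, 6, 3, 3, 10], dtype=float)

def r_squared(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

ones = np.ones_like(oil)
X_base = np.column_stack([ones, temp, insul])
X_int = np.column_stack([ones, temp, insul, temp * insul])

r2_base = r_squared(X_base, oil)  # approx 0.9656
r2_int = r_squared(X_int, oil)
# R-squared can only rise when a term is added; whether the interaction is
# worthwhile requires a significance test, not just a larger R-squared.
```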
Interaction Example
[Figures not reproduced in this transcript]
Multiple Regression Models
Inherently Linear Models
- Non-linear models that can be expressed in linear form
  – Can be estimated by least squares in linear form
- Require data transformation
Curvilinear Model Relationships
Logarithmic Transformation
Y = β0 + β1 ln x1 + β2 ln x2 + ε
Square-Root Transformation
Y = β0 + β1 √x1 + β2 √x2 + ε
Reciprocal Transformation
Y = β0 + β1 (1/x1) + β2 (1/x2) + ε
Exponential Transformation
ln Y = β0 + β1x1 + β2x2 + ln ε
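A sketch of the transform-then-fit idea for the logarithmic model; the data and the coefficients 2, 3, 0.5 are made up, and noise is omitted so the fit is exact:

```python
# An "inherently linear" model: Y = b0 + b1*ln(x1) + b2*ln(x2),
# fit by ordinary least squares after transforming the predictors.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 10.0, 50)
x2 = rng.uniform(1.0, 10.0, 50)
y = 2.0 + 3.0 * np.log(x1) + 0.5 * np.log(x2)  # hypothetical, noiseless

X = np.column_stack([np.ones_like(y), np.log(x1), np.log(x2)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b recovers [2.0, 3.0, 0.5], since the model is exactly linear in ln(x).
```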
Overview
- Explained the linear multiple regression model
- Interpreted linear multiple regression computer output
- Explained multicollinearity
- Described the types of multiple regression models
Source of Elaborate Slides
Prentice Hall, Inc.; Levine, et al., First Edition
Regression Analysis [Multiple Regression]
*** End of Presentation ***
Questions?