Regression Analysis in Theory and Practice. DON’T WRITE THE FORMULAS AHEAD!!!
REGRESSION ANALYSIS
Formula for simple regression:
$\hat{Y} = a + bX$
where $\hat{Y}$ is the predicted value of Y on the regression line.
Do you remember y = mx + b?
Same thing!
The dependence of Y on X can be of two types: “deterministic” or “probabilistic”.
The classic case of a deterministic relationship is that between the Fahrenheit and Celsius measures of temperature:
°F = 32 + (9/5)°C
Here a, the intercept, is 32°, so when C = 0, F = 32. The slope b is 9/5, or 1.8. X is C, degrees Celsius, and Y is F, degrees Fahrenheit.
$\hat{Y} = a + bX$
So for every one-degree change in Celsius, Fahrenheit goes up by 1.8 degrees, starting at 32 degrees.
So when C = 0°, F = 32 + (9/5)(0) = 32°
When C = 100°, F = 32 + (9/5)(100) = 212°
Note: 1.8 = 9/5
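As a quick check of the deterministic case, here is a minimal sketch in Python. The function name `celsius_to_fahrenheit` and its parameters `a` and `b` are illustrative choices, not part of the slides; they simply mirror the intercept and slope in Y = a + bX.

```python
def celsius_to_fahrenheit(c, a=32.0, b=9 / 5):
    """Deterministic line Y = a + bX: intercept a = 32, slope b = 9/5."""
    return a + b * c


# Every Celsius value maps to exactly one Fahrenheit value; there is no error term.
for c in (0, 100):
    print(c, celsius_to_fahrenheit(c))

# The change in F for a one-degree change in C is the slope (about 1.8).
print(celsius_to_fahrenheit(1) - celsius_to_fahrenheit(0))
```

Because the relationship is deterministic, the same input always gives the same output, which is what separates it from the probabilistic case.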
Probabilistic Regression
Not perfectly predictive. On average, we expect a certain amount of change in Y for a certain change in X.
Regression Example
Judges are advised to give longer sentences to repeat offenders than to first-time offenders. Does it really happen?
Hypothesis: In comparing criminals, those who have been convicted before will receive longer prison sentences than those with no prior convictions.
We collect data for 10 convicted criminals
Data and Formula:
X (convictions)    Y (sentence length)
 0                 12
 3                 13
 1                 15
 0                 19
 6                 26
 5                 27
 3                 29
 4                 31
10                 40
 8                 48
ΣX = 40            ΣY = 260
$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$\bar{X} = 4$, $\bar{Y} = 26$
$X - \bar{X}$    $Y - \bar{Y}$
-4               -14
-1               -13
-3               -11
-4                -7
 2                 0
 1                 1
-1                 3
 0                 5
 6                14
 4                22
Continued:
$(X - \bar{X})(Y - \bar{Y})$    $(X - \bar{X})^2$
56                              16
13                               1
33                               9
28                              16
 0                               4
 1                               1
-3                               1
 0                               0
84                              36
88                              16
Σ = 300                         Σ = 100
$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{300}{100}$
$\bar{X} = 4$, $\bar{Y} = 26$
b = 3
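The slope calculation can be checked with a short sketch in plain Python. The data are the ten (X, Y) pairs from the example; the variable names (`x`, `y`, `dx`, `dy`, `sum_cross`, `sum_sq`) are illustrative and not from the slides.

```python
# Data from the example: prior convictions (X) and sentence length (Y).
x = [0, 3, 1, 0, 6, 5, 3, 4, 10, 8]
y = [12, 13, 15, 19, 26, 27, 29, 31, 40, 48]

x_bar = sum(x) / len(x)  # 40 / 10 = 4
y_bar = sum(y) / len(y)  # 260 / 10 = 26

# Deviations of each observation from its mean.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

# Numerator: sum of cross-products. Denominator: sum of squared X deviations.
sum_cross = sum(u * v for u, v in zip(dx, dy))
sum_sq = sum(u * u for u in dx)

b = sum_cross / sum_sq
print(sum_cross, sum_sq, b)  # 300.0 100.0 3.0
```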
Now Calculate “a”
$a = \bar{Y} - b\bar{X}$
a = 26 - (3)(4)
a = 26 - 12
a = 14
$\hat{Y} = 14 + 3X$
Interpret the Equation
Y = 14 + 3*X
Interpret 14
Interpret 3
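One way to work toward the interpretation is to plug values into the fitted line. The sketch below assumes the coefficients computed in the example (a = 14, b = 3); the function name `predict` is illustrative. The slides do not state the unit of sentence length, so none is assumed here.

```python
def predict(x, a=14, b=3):
    """Predicted sentence length for x prior convictions, using Y-hat = a + bX."""
    return a + b * x


# Intercept: the predicted sentence length for someone with no prior convictions.
print(predict(0))  # 14

# Slope: the expected change in sentence length for one additional prior conviction.
print(predict(1) - predict(0))  # 3

# The fitted line passes through the point of means (X-bar = 4, Y-bar = 26).
print(predict(4))  # 26
```

Because the regression is probabilistic, these are average expectations; individual sentences in the data sit above or below the line.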
Scatterplot
[Figure: var2 and fitted values plotted against var1; var1 runs from 0 to 10, var2 from 12 to 48.]
Multiple Regression - 1
The mathematics of how the computer calculates regression coefficients in multiple regression is very complicated. Fortunately, there is an intuitive process that generates the correct answers and is much easier to understand. Let’s see how the computer obtained the value of -.644 for the impact of senator conservatism on the degree to which a senator voted for tax changes primarily benefitting households at, or below, the median income.
Multiple Regression - 2
Our “main equation” is: $Y = a_1 + b_1X_1 + b_2X_2 + b_3X_3 + e_1$
Y = percentage support for tax changes benefitting households with incomes at, or below, the median
$X_1$ = senator conservatism
$X_2$ = senator party affiliation
$X_3$ = state median household income
Our goal is to estimate $b_1$
Multiple Regression - 3
$X_1 = a_2 + b_4X_2 + b_5X_3 + e_2$
In the above equation $e_2$ represents that portion of a senator’s conservatism that CANNOT be explained by either their party affiliation or the median household income in their state.
Multiple Regression - 4
$Y = a_3 + b_6X_2 + b_7X_3 + e_3$
In the above equation $e_3$ represents that portion of a senator’s degree of support for tax changes favorable to households with incomes at, or below, the median that CANNOT be explained by either their party affiliation or the median household income in their state.
Multiple Regression - 5
$e_3 = a_4 + b_8e_2 + e_4$
In the above equation $b_8$ represents the impact of that portion of a senator’s conservatism that CANNOT be explained by party and state median income on the percentage of times the senator voted in favor of tax changes primarily benefitting households at, or below, the median income that CANNOT be explained by either their party affiliation or the median income in their state. Thus, $b_8$ in the above equation = $b_1$ in the “main equation” (i.e., -.644).
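The residual-on-residual procedure above (often called "partialling out", or the Frisch-Waugh-Lovell result) can be checked numerically. The sketch below uses made-up synthetic data, not the senate data, so the coefficient will not be -.644; the point is only that $b_8$ and $b_1$ agree. The helper names `ols` and `residuals` and all the numbers in the data-generating lines are assumptions for illustration.

```python
import random


def ols(columns, y):
    """Least-squares coefficients [intercept, slope_1, ...] via the normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented system (X'X | X'y).
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


def residuals(columns, y):
    """The part of y that the given columns (plus an intercept) cannot explain."""
    coef = ols(columns, y)
    return [yi - coef[0] - sum(b * col[i] for b, col in zip(coef[1:], columns))
            for i, yi in enumerate(y)]


# Synthetic stand-ins (assumed values): x1 = conservatism, x2 = party, x3 = income.
rng = random.Random(0)
n = 200
x2 = [float(rng.random() < 0.5) for _ in range(n)]
x3 = [rng.gauss(50, 10) for _ in range(n)]
x1 = [20 + 30 * p + 0.2 * inc + rng.gauss(0, 10) for p, inc in zip(x2, x3)]
y = [90 - 0.6 * c - 10 * p - 0.1 * inc + rng.gauss(0, 5)
     for c, p, inc in zip(x1, x2, x3)]

b1 = ols([x1, x2, x3], y)[1]    # main equation: coefficient on x1
e2 = residuals([x2, x3], x1)    # conservatism not explained by x2 and x3
e3 = residuals([x2, x3], y)     # support not explained by x2 and x3
b8 = ols([e2], e3)[1]           # regress e3 on e2

print(b1, b8)  # the two estimates agree up to floating-point rounding
```

The normal-equations solver is used only to keep the sketch dependency-free; a statistics package would normally do this step.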
Maximum Likelihood Estimation