Math 344: Regression Analysis (New Jersey Institute of Technology)
Chapter 1 — Linear Regression with One Predictor Variable
Wenge Guo
January 19, 2012
Why regression?
- We want to model a functional relationship between a "predictor variable" (input, independent variable, etc.) and a "response variable" (output, dependent variable, etc.)
- Examples?
- But the real world is noisy: there is no exact "f = ma"-style law
  - Observation noise
  - Process noise
- Two distinct goals
  - (Estimation) Understanding the relationship between predictor variables and response variables
  - (Prediction) Predicting the future response given newly observed predictors
History
- Sir Francis Galton, 19th century
  - Studied the relation between heights of parents and children and noted that the children "regressed" to the population mean
- "Regression" stuck as the term to describe statistical relations between variables
Example Applications
Trend lines, e.g., Google over 6 months.
Others
- Epidemiology
  - Relating lifespan to obesity, smoking habits, etc.
- Science and engineering
  - Relating physical inputs to physical outputs in complex systems
- Brain
Aims for the course
- Given something you would like to predict and some number of covariates
  - What kind of model should you use?
  - Which variables should you include?
  - Which transformations of variables and interaction terms should you use?
- Given a model and some data
  - How do you fit the model to the data?
  - How do you express confidence in the values of the model parameters?
  - How do you regularize the model to avoid over-fitting and other related issues?
Data for Regression Analysis
- Observational data
  - Example: relation between age of employee (X) and number of days of illness last year (Y). The predictor cannot be controlled!
- Experimental data
  - Example: an insurance company wishes to study the relation between productivity of its analysts in processing claims (Y) and length of training (X).
  - Treatment: the length of training
  - Experimental units: the analysts included in the study
- Completely randomized design: the most basic type of statistical design
  - Example: same as above, but every experimental unit has an equal chance to receive any one of the treatments.
Simple Linear Regression
- We want to find parameters for a function of the form

  Yi = β0 + β1Xi + εi

- The distribution of the error random variable is not specified
Quick Example - Scatter Plot
(Scatter plot of simulated data: variable "a" on the horizontal axis, roughly −2 to 2; variable "b" on the vertical axis, roughly −2 to 8. The plotted points are not reproduced here.)
Formal Statement of Model
Yi = β0 + β1Xi + εi

- Yi is the value of the response variable in the i-th trial
- β0 and β1 are parameters
- Xi is a known constant, the value of the predictor variable in the i-th trial
- εi is a random error term with mean E(εi) = 0 and variance Var(εi) = σ²
- i = 1, …, n
Properties
- The response Yi is the sum of two components
  - Constant term β0 + β1Xi
  - Random term εi
- The expected response is

  E(Yi) = E(β0 + β1Xi + εi)
        = β0 + β1Xi + E(εi)
        = β0 + β1Xi
Expectation Review
- Definition

  E(X) = ∫ X P(X) dX,  X ∈ ℝ

- Linearity property

  E(aX) = a E(X)
  E(aX + bY) = a E(X) + b E(Y)

- Obvious from the definition
Example Expectation Derivation
P(X) = 2X, 0 ≤ X ≤ 1

Expectation:

  E(X) = ∫_0^1 X P(X) dX
       = ∫_0^1 2X² dX
       = (2X³/3) |_0^1
       = 2/3
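As a quick sanity check of the derivation above (a sketch added here, not part of the original slides), a midpoint Riemann sum can approximate E(X) = ∫_0^1 X · 2X dX numerically:

```python
# Numerical check of the expectation derivation for the density
# P(X) = 2X on [0, 1]. A midpoint Riemann sum approximates the
# integral E(X) = ∫_0^1 X * 2X dX, which should come out to 2/3.

def density(x: float) -> float:
    """Density P(X) = 2X on [0, 1]."""
    return 2.0 * x

def expectation(f, n_bins: int = 100_000) -> float:
    """Midpoint Riemann sum of ∫_0^1 x * f(x) dx."""
    dx = 1.0 / n_bins
    return sum((i + 0.5) * dx * f((i + 0.5) * dx) * dx for i in range(n_bins))

e_x = expectation(density)
print(e_x)  # close to 2/3 ≈ 0.6667
```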
Expectation of a Product of Random Variables
If X, Y are random variables with joint distribution P(X,Y), then the expectation of the product is given by

  E(XY) = ∫∫ XY P(X,Y) dX dY.
Expectation of a product of random variables
What if X and Y are independent? If X and Y are independent with density functions f and g respectively, then

  E(XY) = ∫∫ XY f(X) g(Y) dX dY
        = ∫_X ∫_Y XY f(X) g(Y) dY dX
        = ∫_X X f(X) [∫_Y Y g(Y) dY] dX
        = ∫_X X f(X) E(Y) dX
        = E(X) E(Y)
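The factorization above can be checked by simulation (a sketch, not from the original slides; the uniform distributions below are arbitrary illustrative choices, and independence comes from drawing the two samples separately):

```python
import random

# Monte Carlo check that E(XY) = E(X)E(Y) for independent X and Y.
# X ~ Uniform(0, 1) and Y ~ Uniform(0, 2) are arbitrary choices here.

random.seed(42)
n = 200_000
xs = [random.random() for _ in range(n)]
ys = [2.0 * random.random() for _ in range(n)]

e_x = sum(xs) / n                                 # ≈ 1/2
e_y = sum(ys) / n                                 # ≈ 1
e_xy = sum(x * y for x, y in zip(xs, ys)) / n

print(e_xy, e_x * e_y)  # the two values should nearly agree
```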
Regression Function
- The response Yi comes from a probability distribution with mean

  E(Yi) = β0 + β1Xi

- This means the regression function is

  E(Y) = β0 + β1X

  since the regression function relates the means of the probability distributions of Y, for a given X, to the level of X.
Error Terms
- The response Yi in the i-th trial exceeds or falls short of the value of the regression function by the error term amount εi
- The error terms εi are assumed to have constant variance σ²
Response Variance
The responses Yi have the same constant variance:

  Var(Yi) = Var(β0 + β1Xi + εi)
          = Var(εi)
          = σ²
Variance (2nd central moment) Review
- Continuous distribution

  Var(X) = E((X − E(X))²) = ∫ (X − E(X))² P(X) dX,  X ∈ ℝ

- Discrete distribution

  Var(X) = E((X − E(X))²) = Σ_i (Xi − E(X))² P(Xi),  X ∈ ℤ
Alternative Form for Variance
Var(X) = E((X − E(X))²)
       = E(X² − 2X E(X) + E(X)²)
       = E(X²) − 2 E(X) E(X) + E(X)²
       = E(X²) − 2 E(X)² + E(X)²
       = E(X²) − E(X)².
Example Variance Derivation
P(X) = 2X, 0 ≤ X ≤ 1

  Var(X) = E((X − E(X))²) = E(X²) − E(X)²
         = ∫_0^1 2X · X² dX − (2/3)²
         = (2X⁴/4) |_0^1 − 4/9
         = 1/2 − 4/9
         = 1/18
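The same Riemann-sum idea checks this variance derivation numerically (a sketch added here, not from the original slides):

```python
# Numerical check of the variance derivation for P(X) = 2X on [0, 1]:
# Var(X) = E(X²) − E(X)² should equal 1/18.

def moment(k: int, n_bins: int = 100_000) -> float:
    """Midpoint Riemann sum of E(X^k) = ∫_0^1 x^k · 2x dx."""
    dx = 1.0 / n_bins
    return sum(((i + 0.5) * dx) ** k * 2.0 * (i + 0.5) * dx * dx
               for i in range(n_bins))

var_x = moment(2) - moment(1) ** 2
print(var_x)  # close to 1/18 ≈ 0.0556
```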
Variance Properties
Var(aX) = a² Var(X)

Var(aX + bY) = a² Var(X) + b² Var(Y)   if X ⊥ Y

Var(a + cX) = c² Var(X)   if a and c are both constants

More generally,

  Var(Σ aᵢXᵢ) = Σ_i Σ_j aᵢaⱼ Cov(Xᵢ, Xⱼ)
Covariance
- The covariance between two real-valued random variables X and Y, with expected values E(X) = µ and E(Y) = ν, is defined as

  Cov(X,Y) = E((X − µ)(Y − ν))

- which can be rewritten as

  Cov(X,Y) = E(XY − νX − µY + µν)
           = E(XY) − ν E(X) − µ E(Y) + µν
           = E(XY) − µν.
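Both forms of the covariance can be compared directly on a toy discrete joint distribution (a sketch, not from the original slides; the probability table below is made up for illustration):

```python
# Check of the identity Cov(X, Y) = E(XY) − µν on a small discrete
# joint distribution (the table below is hypothetical).

joint = {  # (x, y): P(X = x, Y = y)
    (0, 0): 0.3, (0, 1): 0.2,
    (1, 0): 0.1, (1, 1): 0.4,
}

mu = sum(x * p for (x, y), p in joint.items())        # E(X)
nu = sum(y * p for (x, y), p in joint.items())        # E(Y)
e_xy = sum(x * y * p for (x, y), p in joint.items())  # E(XY)

cov_def = sum((x - mu) * (y - nu) * p for (x, y), p in joint.items())
cov_short = e_xy - mu * nu

print(cov_def, cov_short)  # identical up to floating point
```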
Covariance of Independent Variables
If X and Y are independent, then their covariance is zero. This follows because, under independence,

  E(XY) = E(X) E(Y) = µν,

and then

  Cov(X,Y) = µν − µν = 0.
Formal Statement of Model
Yi = β0 + β1Xi + εi

- Yi is the value of the response variable in the i-th trial
- β0 and β1 are parameters
- Xi is a known constant, the value of the predictor variable in the i-th trial
- εi is a random error term with mean E(εi) = 0 and variance Var(εi) = σ²
- i = 1, …, n
Example
An experimenter gave three subjects a very difficult task. Data onthe age of the subject (X ) and on the number of attempts toaccomplish the task before giving up (Y ) follow:
| Subject i | 1 | 2 | 3 |
|---|---|---|---|
| Age Xi | 20 | 55 | 30 |
| Number of Attempts Yi | 5 | 12 | 10 |
Least Squares Linear Regression
- Seek to minimize

  Q = Σ_{i=1}^n (Yi − (β0 + β1Xi))²

- Choose b0 and b1 as estimators for β0 and β1; b0 and b1 will minimize the criterion Q for the given sample observations (X1, Y1), (X2, Y2), …, (Xn, Yn).
Guess #1

Guess #2
Function maximization
- Important technique to remember!
  - Take the derivative
  - Set the result equal to zero and solve
  - Test the second derivative at that point
- Question: does this always give you the maximum?
- Going further: multiple variables, convex optimization
Function Maximization
Find the value of x that maximizes the function

  f(x) = −x² + ln(x),  x > 0

1. Take the derivative, set it equal to zero, and solve:

  d/dx (−x² + ln(x)) = 0    (1)
  −2x + 1/x = 0             (2)
  2x² = 1                   (3)
  x² = 1/2                  (4)
  x = √2/2                  (5)
2. Check the second-order derivative:

  d²/dx² (−x² + ln(x)) = d/dx (−2x + 1/x)    (6)
                       = −2 − 1/x²            (7)
                       < 0                    (8)

Then we have found the maximum: x* = √2/2, f(x*) = −(1/2)(1 + ln(2)).
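A quick numerical check of this maximization (a sketch added here, not part of the original slides) compares f at the claimed maximizer against a coarse grid search over (0, 2]:

```python
import math

# Numerical sanity check: f(x) = −x² + ln(x) should attain its
# maximum on x > 0 at x* = √2/2, with f(x*) = −(1/2)(1 + ln 2).

def f(x: float) -> float:
    return -x * x + math.log(x)

x_star = math.sqrt(2) / 2
# Grid search over x = 0.001, 0.002, ..., 2.000.
best = max(f(0.001 + 0.001 * k) for k in range(2000))

print(x_star, f(x_star))  # f(x_star) should be at least as large as best
```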
Least Squares Max(min)imization
- Function to minimize w.r.t. b0 and b1 (b0 and b1 are called point estimators of β0 and β1, respectively):

  Q = Σ_{i=1}^n (Yi − (b0 + b1Xi))²

- Minimize this by maximizing −Q
- Either way, find the partials and set both equal to zero:

  dQ/db0 = 0
  dQ/db1 = 0
Normal Equations
- The equations that result from this step are called the normal equations. b0 and b1 are called point estimators of β0 and β1, respectively.

  Σ Yi = n b0 + b1 Σ Xi
  Σ XiYi = b0 Σ Xi + b1 Σ Xi²

- This is a system of two equations in two unknowns. The solution is given by...
Solution to Normal Equations
After a lot of algebra one arrives at

  b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

  b0 = Ȳ − b1 X̄

where

  X̄ = (Σ Xi)/n,  Ȳ = (Σ Yi)/n
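These closed-form formulas are easy to apply directly. A minimal sketch (added here, not from the original slides), using the age/attempts data from the earlier example slide (X = 20, 55, 30; Y = 5, 12, 10):

```python
# Closed-form least squares fit, following the formulas above.

def least_squares(xs, ys):
    """Return (b0, b1) from the closed-form solution of the normal equations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
         sum((x - x_bar) ** 2 for x in xs)
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = least_squares([20, 55, 30], [5, 12, 10])
print(b0, b1)  # b1 = 115/650 ≈ 0.1769, b0 = 9 − 35·b1 ≈ 2.8077
```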
Least Squares Fit
Questions to Ask
- Is the relationship really linear?
- What is the distribution of the "errors"?
- Is the fit good?
- How much of the variability of the response is accounted for by including the predictor variable?
- Is the chosen predictor variable the best one?
Goals for First Half of Course
- How to do linear regression
  - Self-familiarization with software tools
- How to interpret standard linear regression results
- How to derive tests
- How to assess and address deficiencies in regression models
Estimators for β0, β1, σ²

- We want to establish properties of estimators for β0, β1, and σ² so that we can construct hypothesis tests and so forth
- We will start by establishing some properties of the regression solution.
Properties of Solution
- The i-th residual is defined to be

  ei = Yi − Ŷi

- The sum of the residuals is zero:

  Σ_i ei = Σ (Yi − b0 − b1Xi)
         = Σ Yi − n b0 − b1 Σ Xi
         = Σ Yi − n(Ȳ − b1X̄) − b1 Σ Xi
         = 0
Properties of Solution
The sum of the observed values Yi equals the sum of the fitted values Ŷi:

  Σ_i Ŷi = Σ_i (b1Xi + b0)
         = Σ_i (b1Xi + Ȳ − b1X̄)
         = b1 Σ_i Xi + nȲ − b1 n X̄
         = b1 n X̄ + Σ_i Yi − b1 n X̄
         = Σ_i Yi
Properties of Solution
The sum of the weighted residuals is zero when the residual in the i-th trial is weighted by the level of the predictor variable in the i-th trial:

  Σ_i Xi ei = Σ Xi (Yi − b0 − b1Xi)
            = Σ_i XiYi − b0 Σ Xi − b1 Σ Xi²
            = 0

The last step is due to the second normal equation.
Properties of Solution
The sum of the weighted residuals is zero when the residual in the i-th trial is weighted by the fitted value of the response variable in the i-th trial:

  Σ_i Ŷi ei = Σ_i (b0 + b1Xi) ei
            = b0 Σ_i ei + b1 Σ_i Xi ei
            = 0
Properties of Solution
The regression line always goes through the point (X̄, Ȳ).

Using the alternative form of the linear regression model

  Yi = β0* + β1(Xi − X̄) + εi

the least squares estimator b1 for β1 remains the same as before, while the least squares estimator for β0* = β0 + β1X̄ becomes

  b0* = b0 + b1X̄ = (Ȳ − b1X̄) + b1X̄ = Ȳ

Hence the estimated regression function is

  Ŷ = Ȳ + b1(X − X̄)

Apparently, the regression line always goes through the point (X̄, Ȳ).
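All of these algebraic identities can be verified numerically on a fitted line. A sketch (added here, not from the original slides) using the age/attempts example data (X = 20, 55, 30; Y = 5, 12, 10):

```python
# Numerical check of the residual properties: Σeᵢ = 0, ΣYᵢ = ΣŶᵢ,
# ΣXᵢeᵢ = 0, and ΣŶᵢeᵢ = 0, for a least squares fit.

xs, ys = [20, 55, 30], [5, 12, 10]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in xs]
resid = [y - yh for y, yh in zip(ys, fitted)]

print(sum(resid))                                   # ≈ 0
print(sum(ys), sum(fitted))                         # equal sums
print(sum(x * e for x, e in zip(xs, resid)))        # ≈ 0
print(sum(yh * e for yh, e in zip(fitted, resid)))  # ≈ 0
```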
Estimating Error Term Variance σ²

- Review estimation in a non-regression setting.
- Show estimation results for the regression setting.
Estimation Review
- An estimator is a rule that tells how to calculate the value of an estimate based on the measurements contained in a sample
- e.g., the sample mean

  Ȳ = (1/n) Σ_{i=1}^n Yi
Point Estimators and Bias
- Point estimator

  θ̂ = f({Y1, …, Yn})

- Unknown quantity / parameter

  θ

- Definition: bias of an estimator

  B(θ̂) = E(θ̂) − θ
Example
- Samples from a Normal(µ, σ²) distribution

  Yi ∼ Normal(µ, σ²)

- Estimate the population mean

  θ = µ,  θ̂ = Ȳ = (1/n) Σ_{i=1}^n Yi
Sampling Distribution of the Estimator
- First moment

  E(θ̂) = E((1/n) Σ_{i=1}^n Yi)
       = (1/n) Σ_{i=1}^n E(Yi) = nµ/n = θ

- This is an example of an unbiased estimator:

  B(θ̂) = E(θ̂) − θ = 0
Variance of Estimator
- Definition: variance of an estimator

  Var(θ̂) = E([θ̂ − E(θ̂)]²)

- Remember:

  Var(cY) = c² Var(Y)

  Var(Σ_{i=1}^n Yi) = Σ_{i=1}^n Var(Yi)

  where the latter holds only if the Yi are independent with finite variance
Example Estimator Variance
- For the N(0, 1) mean estimator

  Var(θ̂) = Var((1/n) Σ_{i=1}^n Yi)
         = (1/n²) Σ_{i=1}^n Var(Yi) = nσ²/n² = σ²/n

- Note the assumptions
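The σ²/n result can be seen empirically (a simulation sketch added here, not part of the original slides; the sample sizes below are arbitrary choices):

```python
import random

# Simulation check of Var(sample mean) = σ²/n: draw many samples of
# size n from N(µ, σ²) and compare the empirical variance of the
# sample means against σ²/n.

random.seed(0)
mu, sigma, n, reps = 0.0, 1.0, 25, 20_000

means = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

grand = sum(means) / reps
emp_var = sum((m - grand) ** 2 for m in means) / reps
print(emp_var, sigma ** 2 / n)  # both near 0.04
```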
Bias Variance Trade-off
- The mean squared error of an estimator is

  MSE(θ̂) = E([θ̂ − θ]²)

- It can be re-expressed as

  MSE(θ̂) = Var(θ̂) + (B(θ̂))²
MSE = Var + Bias²

Proof:

  MSE(θ̂) = E((θ̂ − θ)²)
          = E(([θ̂ − E(θ̂)] + [E(θ̂) − θ])²)
          = E([θ̂ − E(θ̂)]²) + 2 E([E(θ̂) − θ][θ̂ − E(θ̂)]) + E([E(θ̂) − θ]²)
          = Var(θ̂) + 2 (E(θ̂) − θ) E(θ̂ − E(θ̂)) + (B(θ̂))²
          = Var(θ̂) + 2 (E(θ̂) − θ) · 0 + (B(θ̂))²
          = Var(θ̂) + (B(θ̂))²

The cross term vanishes because E(θ̂) − θ is a constant and E(θ̂ − E(θ̂)) = 0.
Trade-off
- Think of variance as confidence and bias as correctness; the usual intuitions (largely) apply.
- Sometimes choosing a biased estimator can result in an overall lower MSE if it exhibits lower variance.
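The trade-off can be demonstrated concretely (a sketch added here, not from the original slides): a shrinkage estimator c·Ȳ with 0 < c < 1 is biased but has lower variance, and for suitable constants it beats the sample mean in MSE. The constants below are arbitrary illustrative choices.

```python
import random

# Bias-variance trade-off demo: compare the MSE of the sample mean
# against a shrunken mean c·Ȳ, estimated over many simulated samples.

random.seed(1)
mu, sigma, n, reps, c = 2.0, 3.0, 10, 50_000, 0.8

mse_mean, mse_shrunk = 0.0, 0.0
for _ in range(reps):
    y_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    mse_mean += (y_bar - mu) ** 2
    mse_shrunk += (c * y_bar - mu) ** 2
mse_mean /= reps
mse_shrunk /= reps

# Theory: MSE(Ȳ) = σ²/n = 0.9; MSE(c·Ȳ) = c²σ²/n + ((c−1)µ)² = 0.736.
print(mse_mean, mse_shrunk)
```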
s² estimator for σ² for a single population

Sum of squares:

  Σ_{i=1}^n (Yi − Ȳ)²

Sample variance estimator:

  s² = Σ_{i=1}^n (Yi − Ȳ)² / (n − 1)

- s² is an unbiased estimator of σ².
- The sum of squares has n − 1 "degrees of freedom" associated with it; one degree of freedom is lost by using Ȳ as an estimate of the unknown population mean µ.
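Unbiasedness of s² can be checked by simulation (a sketch added here, not from the original slides; the parameters are arbitrary choices):

```python
import random

# Simulation check that s² with the n − 1 divisor averages out to σ²
# over many samples (dividing by n instead would be biased low).

random.seed(7)
sigma2, n, reps = 4.0, 5, 100_000

avg_s2 = 0.0
for _ in range(reps):
    ys = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    y_bar = sum(ys) / n
    avg_s2 += sum((y - y_bar) ** 2 for y in ys) / (n - 1)
avg_s2 /= reps

print(avg_s2)  # near σ² = 4
```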
Estimating Error Term Variance σ2 for Regression Model
- Regression model
- The variance of each observation Y_i is σ² (the same as for the error term ε_i)
- Each Y_i comes from a different probability distribution, with a mean that depends on the level X_i
- The deviation of an observation Y_i must therefore be calculated around its own estimated mean.
s2 estimator for σ2
s² = MSE = SSE / (n − 2) = ∑(Y_i − Ŷ_i)² / (n − 2) = ∑ e_i² / (n − 2)
- MSE is an unbiased estimator of σ²: E(MSE) = σ²
- The sum of squares SSE has n − 2 "degrees of freedom" associated with it.
- Cochran's theorem (later in the course) tells us where degrees of freedom come from and how to calculate them.
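The computation can be checked on a toy data set (values invented for illustration); b₀ and b₁ are the least squares estimates, and MSE divides SSE by n − 2 because two parameters were estimated from the data:

```python
# Least squares fit on a tiny data set, then s^2 = MSE = SSE / (n - 2).
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 3.0, 5.0, 6.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) \
     / sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
mse = sse / (n - 2)
print(b0, b1, mse)  # roughly 0.5, 1.4, 0.1
```

Here the residuals are about ±0.1 and ±0.3, so SSE ≈ 0.2 and MSE ≈ 0.1.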
Normal Error Regression Model
- No matter how the error terms ε_i are distributed, the least squares method provides unbiased point estimators of β₀ and β₁
- that also have minimum variance among all unbiased linear estimators
- To set up interval estimates and make tests we need to specify the distribution of the ε_i
- We will assume that the ε_i are normally distributed.
Normal Error Regression Model
Y_i = β₀ + β₁X_i + ε_i

- Y_i is the value of the response variable in the i-th trial
- β₀ and β₁ are parameters
- X_i is a known constant, the value of the predictor variable in the i-th trial
- ε_i ∼iid N(0, σ²) (note this is different: now we know the distribution)
- i = 1, …, n
Notational Convention
- When you see ε_i ∼iid N(0, σ²)
- It is read as: ε_i is distributed independently and identically according to a normal distribution with mean 0 and variance σ²
- Examples:
  - θ ∼ Poisson(λ)
  - z ∼ G(θ)
Maximum Likelihood Principle
The method of maximum likelihood chooses as estimates those values of the parameters that are most consistent with the sample data.
Likelihood Function
If X_i ∼ F(Θ), i = 1, …, n, then the likelihood function is

L({X_i}_{i=1}^{n}, Θ) = ∏_{i=1}^{n} F(X_i; Θ)
Maximum Likelihood Estimation
- The likelihood function can be maximized w.r.t. the parameter(s) Θ; doing so yields estimators for the parameters as well.

L({X_i}_{i=1}^{n}, Θ) = ∏_{i=1}^{n} F(X_i; Θ)

- To do this, find solutions (analytically or by following the gradient) to

dL({X_i}_{i=1}^{n}, Θ) / dΘ = 0
Important Trick
Never (well, almost never) maximize the likelihood function itself; maximize the log-likelihood function instead.

log L({X_i}_{i=1}^{n}, Θ) = log ∏_{i=1}^{n} F(X_i; Θ) = ∑_{i=1}^{n} log F(X_i; Θ)

Quite often the log of the density is easier to work with mathematically.
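A hedged sketch of the trick in action, using a Poisson model on made-up data: the log turns the product of densities into a sum, setting d/dλ ∑(x_i log λ − λ) = 0 gives λ̂ = x̄, and a numeric check confirms the analytic maximizer beats nearby values.

```python
import math

# Log-likelihood of a Poisson(lam) sample: the product of densities
# becomes a sum of log densities, which is easy to evaluate and maximize.
x = [2, 3, 4, 1, 5]

def loglik(lam):
    # sum of log p(x_i | lam) = x_i*log(lam) - lam - log(x_i!)
    return sum(xi * math.log(lam) - lam - math.lgamma(xi + 1) for xi in x)

lam_hat = sum(x) / len(x)  # analytic maximizer: the sample mean, 3.0
print(loglik(lam_hat) >= loglik(2.5), loglik(lam_hat) >= loglik(3.5))
```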
ML Normal Regression
Likelihood function
L(β₀, β₁, σ²) = ∏_{i=1}^{n} (2πσ²)^{−1/2} exp( −(Y_i − β₀ − β₁X_i)² / (2σ²) )
             = (2πσ²)^{−n/2} exp( −∑_{i=1}^{n} (Y_i − β₀ − β₁X_i)² / (2σ²) )

which, if you maximize (how?) w.r.t. the parameters, gives…
Maximum Likelihood Estimator(s)
- β₀: b₀, same as in the least squares case
- β₁: b₁, same as in the least squares case
- σ²:

  σ̂² = ∑_i (Y_i − Ŷ_i)² / n

- Note that the ML estimator σ̂² is biased, since s² is unbiased and

  s² = MSE = (n / (n − 2)) σ̂²
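The bias relationship can be verified numerically on a toy fit (data invented for illustration): the ML estimator divides SSE by n, the unbiased s² divides by n − 2, and s² = (n/(n − 2)) σ̂² holds exactly.

```python
# ML variance estimate vs. the unbiased MSE on a small regression fit.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 3.0, 5.0, 6.0]
n = len(X)
Xbar, Ybar = sum(X) / n, sum(Y) / n
b1 = sum((x - Xbar) * (y - Ybar) for x, y in zip(X, Y)) \
     / sum((x - Xbar) ** 2 for x in X)
b0 = Ybar - b1 * Xbar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
sigma2_ml = sse / n   # biased ML estimator
s2 = sse / (n - 2)    # unbiased estimator (MSE)
print(abs(s2 - n / (n - 2) * sigma2_ml) < 1e-12)  # True
```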
Comments
- Least squares minimizes the squared error between the prediction and the true output
- The normal distribution is fully characterized by its first two central moments (mean and variance)