Generalized Linear Models
SJSU, November 17, 2016
(iamrandom.com/sites/default/files/GLM-1.pdf)


GLM Overview

Analysis methods

ANOVA: discrete x, continuous y

Regression: discrete/continuous x, continuous y

GLM: discrete/continuous x, discrete/continuous y

Generalized linear models are the broadest of these categories


GLM Overview

GLM types

Logistic regression

Normal regression: needs the three usual assumptions (independence, normality, constant variance)

Poisson regression

Negative binomial regression (maybe)


GLM Overview

GLM types

Response (y) type:

Normally distributed

Categorical: disease present/absent

Categorical: disease low/medium/high

Integer-valued: number of chocolate chips


Logistic Regression (GLM)

Logistic Regression

Each response $y_i$ comes with one or more covariates $x_i$

The response is binary (two possible outcomes):

male-female, healthy-sick, alive-dead, success-failure, win-loss

To simplify notation, we code the response as 0 or 1

We are interested in the probability $p = P(y = 1)$


Logistic Regression (GLM)

Logistic Regression

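Binary response data (Binomial): $F^{-1}(p_i) = \beta_0 + \beta_1 x_i$

Standard regression: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, so with $\mu_i = E(y_i)$ the link is the identity, $F^{-1}(\mu_i) = \mu_i = \beta_0 + \beta_1 x_i$ (the standard link)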


Logistic Regression (GLM)

Link Functions

Link function:
- links the probability (p) to the covariates
- F is a monotone cumulative distribution function
- common choices: logit (logistic) link, probit link, clog-log link

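Logit link: $F^{-1}(p_i) = \operatorname{logit}(p_i) = \log\frac{p_i}{1-p_i}$

$\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i$

$p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$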
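Since each link is the inverse of a CDF F, the inverse link is F itself. The minimal R sketch below assumes hypothetical coefficients (b0 = -1, b1 = 0.5) and a small grid of x values. It writes the logit link and its inverse by hand, checks them against base R's logistic functions `qlogis`/`plogis`, and evaluates the probit and clog-log inverse links (`pnorm(eta)` and `1 - exp(-exp(eta))`) on the same linear predictor.

```r
# Each link is F^{-1}(p) for a CDF F; the inverse link maps eta back to p.
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

# Hypothetical coefficients and covariate values, for illustration only
b0 <- -1
b1 <- 0.5
x  <- c(-2, 0, 2, 4)
eta <- b0 + b1 * x                  # linear predictor: -2, -1, 0, 1

p_logit   <- inv_logit(eta)         # logistic CDF
p_probit  <- pnorm(eta)             # standard normal CDF
p_cloglog <- 1 - exp(-exp(eta))     # CDF behind the complementary log-log link

# The hand-written logit agrees with base R's logistic functions
stopifnot(isTRUE(all.equal(p_logit, plogis(eta))))
stopifnot(isTRUE(all.equal(logit(p_logit), qlogis(p_logit))))

round(rbind(eta, p_logit, p_probit, p_cloglog), 3)
```

All three inverse links map any real linear predictor into (0, 1); they differ mainly in the tails.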


Logistic Regression (GLM)

Logistic Regression

When multiple measurements correspond to the same covariate value $x_i$, turn them into binomial counts:

$x_i$: covariate value

$n_i$: number of responses at $x_i$

$y_i$: number of responses equal to 1

$\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i$
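The grouping step can be sketched in R as follows, assuming a small hypothetical set of 0/1 responses in which several responses share each x value. It collapses them to $(n_i, y_i)$ counts and fits the same logistic model both ways.

```r
# Hypothetical 0/1 responses; several responses share each covariate value x
x <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)
y <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 1)

# Collapse to binomial counts at each distinct x_i
n_i <- as.vector(tapply(y, x, length))  # n_i: number of responses at x_i
y_i <- as.vector(tapply(y, x, sum))     # y_i: number of responses equal to 1
x_i <- sort(unique(x))                  # tapply orders the groups by sorted x

fit_bernoulli <- glm(y ~ x, family = binomial)                       # one row per response
fit_binomial  <- glm(cbind(y_i, n_i - y_i) ~ x_i, family = binomial)  # one row per x_i

rbind(bernoulli = coef(fit_bernoulli), binomial = coef(fit_binomial))
```

The coefficient estimates agree (up to convergence tolerance). The residual deviances do not, because the saturated model is different for grouped and ungrouped data.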


Logistic Regression (GLM)

Solving for βs

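Use the Newton-Raphson algorithm (e.g., the same method used to find the parameters of a Gamma distribution) to solve the non-linear maximum-likelihood equations for the βs in

$\log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i$

Once the βs are found, solve for $p_i$:

$p_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)}$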
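Below is a from-scratch Newton-Raphson sketch for binomial counts, run on the Example 1 C-section data that appear later in these slides. It is an illustration, not how `glm()` is implemented internally. `glm()` uses iteratively reweighted least squares, which for the canonical logit link performs the same Newton update. The function name `newton_logit`, the starting value β = 0, and the tolerance are choices made for this sketch.

```r
# Example 1 (C-section) data from the slides; group 4 has n = 0 and is dropped
infection <- c(1, 11, 0, 0, 28, 23, 8, 0)
total     <- c(18, 98, 2, 0, 58, 26, 40, 9)
noplan    <- c(0, 1, 0, 1, 0, 1, 0, 1)
riskfac   <- c(1, 1, 0, 0, 1, 1, 0, 0)
antibio   <- c(1, 1, 1, 1, 0, 0, 0, 0)
keep <- total > 0

X <- cbind("(Intercept)" = 1, noplan, riskfac, antibio)[keep, ]  # design matrix
y <- infection[keep]                                           # y_i: number of 1s
n <- total[keep]                                               # n_i: trials

newton_logit <- function(X, y, n, tol = 1e-10, max_iter = 50) {
  beta <- rep(0, ncol(X))                        # start at beta = 0, i.e. p = 0.5
  for (iter in seq_len(max_iter)) {
    p     <- plogis(drop(X %*% beta))            # current fitted probabilities
    score <- crossprod(X, y - n * p)             # gradient of the log-likelihood
    info  <- crossprod(X, X * (n * p * (1 - p))) # X' W X (negative Hessian)
    step  <- drop(solve(info, score))            # Newton-Raphson step
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  setNames(beta, colnames(X))
}

b_newton <- newton_logit(X, y, n)
fit <- glm(cbind(infection, total - infection) ~ noplan + riskfac + antibio,
           family = binomial, subset = total > 0)
round(rbind(newton = b_newton, glm = coef(fit)), 4)
```

Both rows should match the estimates reported in the Example 1 output (about -1.8926, 1.0720, 2.0299, -3.2544).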


Logistic Regression (GLM)

Assumptions

Independence still needs to be checked

There is no Normality assumption

There is no constant variance assumption

* the variance is a function of the mean

$E(y_i) = p_i$ and $V(y_i) = E(y_i)\,(1 - E(y_i)) = p_i(1 - p_i)$
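A quick simulation shows that the variance is a function of the mean: for 0/1 data the sample variance tracks p(1 - p), which is largest at p = 0.5. The seed, the sample size, and the p values are arbitrary choices for this sketch.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
p <- c(0.1, 0.3, 0.5, 0.9)                    # hypothetical success probabilities
draws <- lapply(p, function(pr) rbinom(1e5, size = 1, prob = pr))  # Bernoulli draws

sim_mean <- sapply(draws, mean)               # should be close to p
sim_var  <- sapply(draws, var)                # should be close to p * (1 - p)
round(cbind(p, sim_mean, sim_var, theory = p * (1 - p)), 4)
```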


Logistic Regression (GLM)

Example 1 - logistic regression

C-section infection data: a C-section is major surgery to deliver a baby, and it can cause excessive bleeding, blood clots, infection, pain, longer hospital stays, and longer recovery. The data are from example 17.1 and concern infection after a C-section. The response variable (y) is the occurrence or non-occurrence of infection. There are three covariates (x), each at two levels:

x1 noplan: planned = 0, unplanned = 1

x2 riskfac: risk factors (diabetes, overweight, previous C-section) present = 1, not present = 0

x3 antibio: antibiotics given = 1, not given = 0


Logistic Regression (GLM)

Example 1 - Data

                              Planned                  No plan
                              Infection                Infection
                              yes   no   total         yes   no   total
Antibiotics      Risk (yes)     1   17      18          11   87      98
                 Risk (no)      0    2       2           0    0       0
No antibiotics   Risk (yes)    28   30      58          23    3      26
                 Risk (no)      8   32      40           0    9       9


Logistic Regression (GLM)

Example 1 - Code

$\log(p_{\text{infection}}/p_{\text{no infection}}) = \beta_0 + \beta_1\,\text{noplan} + \beta_2\,\text{riskfac} + \beta_3\,\text{antibio}$

infection=c(1,11,0,0,28,23,8,0)
total=c(18,98,2,0,58,26,40,9)
proportion=infection/total
noplan=c(0,1,0,1,0,1,0,1)
riskfac=c(1,1,0,0,1,1,0,0)
antibio=c(1,1,1,1,0,0,0,0)

reg1=glm(proportion ~ noplan+riskfac+antibio, family="binomial", weights=total)
summary(reg1)


Logistic Regression (GLM)

proportion = $y_i/n_i$, weights = $n_i$

reg1=glm(proportion ~ noplan+riskfac+antibio, family="binomial", weights=ni)
summary(reg1)


Logistic Regression (GLM)

Example 1 - Code

             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.8926     0.4124  -4.589 4.45e-06 ***
noplan         1.0720     0.4254   2.520   0.0117 *
riskfac        2.0299     0.4553   4.459 8.25e-06 ***
antibio       -3.2544     0.4813  -6.761 1.37e-11 ***

Null deviance: 83.491 on 6 degrees of freedom
Residual deviance: 10.997 on 3 degrees of freedom
(1 observation deleted due to missingness)
AIC: 36.178

When antibiotics are given (with noplan and riskfac held fixed), the odds P(infection)/P(no infection) are multiplied by exp(-3.25) = 0.0388; equivalently, the odds decrease by a factor of 1/0.0388 = 25.77.
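The odds-ratio arithmetic can be reproduced from the fitted model by exponentiating the coefficients. The sketch below refits Example 1 exactly as in the code slide. Using the unrounded estimate -3.2544 instead of -3.25 gives a factor of about 25.9 rather than 25.77. The Wald intervals from `confint.default` are one choice of interval, added here for illustration.

```r
# Example 1 data and model, as on the code slide
infection  <- c(1, 11, 0, 0, 28, 23, 8, 0)
total      <- c(18, 98, 2, 0, 58, 26, 40, 9)
proportion <- infection / total
noplan  <- c(0, 1, 0, 1, 0, 1, 0, 1)
riskfac <- c(1, 1, 0, 0, 1, 1, 0, 0)
antibio <- c(1, 1, 1, 1, 0, 0, 0, 0)
reg1 <- glm(proportion ~ noplan + riskfac + antibio,
            family = "binomial", weights = total)

odds_ratio <- exp(coef(reg1))        # multiplicative change in the odds per covariate
odds_ratio["antibio"]                # about 0.0386
1 / odds_ratio["antibio"]            # about 25.9

# Wald 95% intervals, transformed to the odds-ratio scale
exp(confint.default(reg1))
```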


Logistic Regression (GLM)

Example 1

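Observed proportions (there is no observed proportion for group 4, since 0/0 = NaN):
proportion = 0.0555 0.112 0.000 NaN 0.4827 0.8846 0.200 0.000

Model-predicted proportions:
$\log(p_{\text{infection}}/p_{\text{no infection}}) = \beta_0 + \beta_1\,\text{noplan} + \beta_2\,\text{riskfac} + \beta_3\,\text{antibio}$
$p_{\text{infection}}/p_{\text{no infection}} = \exp(\beta_0)\cdot\exp(\beta_1\,\text{noplan})\cdot\exp(\beta_2\,\text{riskfac})\cdot\exp(\beta_3\,\text{antibio})$
predict(reg1, type="response") (group 4 is dropped from the fit and shown here as NA):
0.0424 0.1145 0.00578 NA 0.534 0.770 0.1309 0.3056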
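To check the predictions by hand, the sketch below uses the rounded estimates copied from the summary output. It builds the odds as a product of exponentiated terms and converts them to probabilities with p = odds / (1 + odds). The model also gives a prediction for group 4 (about 0.017), even though that group has no observations.

```r
# Rounded estimates copied from summary(reg1)
b <- c(-1.8926, 1.0720, 2.0299, -3.2544)
noplan  <- c(0, 1, 0, 1, 0, 1, 0, 1)
riskfac <- c(1, 1, 0, 0, 1, 1, 0, 0)
antibio <- c(1, 1, 1, 1, 0, 0, 0, 0)

# The odds are a product of exponentiated terms, not a sum
odds  <- exp(b[1]) * exp(b[2] * noplan) * exp(b[3] * riskfac) * exp(b[4] * antibio)
p_hat <- odds / (1 + odds)

# Same result via the inverse logit of the linear predictor
eta <- b[1] + b[2] * noplan + b[3] * riskfac + b[4] * antibio
stopifnot(isTRUE(all.equal(p_hat, plogis(eta))))
round(p_hat, 4)
```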


Logistic Regression (GLM)

Deviance score

The deviance of the model measures goodness of fit. Output:
Null deviance: 83.491 on 6 degrees of freedom
Residual deviance: 10.997 on 3 degrees of freedom
(1 observation deleted due to missingness)
AIC: 36.178

Compare the residual deviance with a χ² distribution with 3 degrees of freedom: 7 observations - 4 estimated parameters (βs) = 3
Residual deviance = 10.9967
p-value = 1-pchisq(10.9967, 3) = 0.0117
Reject H0 (the model fits adequately) and conclude that the model does not fit well
Include some interactions?


Logistic Regression (GLM)

Plot Data and Model

plot(1:8, proportion, pch=15, col="dark blue")
fit=predict(reg1, type="response")
points(as.numeric(names(fit)), fit, pch=19, col="dark green")
legend(1, 0.9, legend=c("Obs. Proportions", "Logistic Fit"), pch=c(15,19), col=c("dark blue", "dark green"))

x-axis: group index, with (noplan, riskfac, antibio) = 1:(0,1,1), 2:(1,1,1), 3:(0,0,1), 4:(1,0,1), 5:(0,1,0), 6:(1,1,0), 7:(0,0,0), 8:(1,0,0)


Logistic Regression (GLM)

Logistic Regression

Goodness of fit:

There is no R², F, or MSE (use a pseudo-R² instead)

Deviance: $D = -2\log\frac{\text{likelihood of the fitted model}}{\text{likelihood of the saturated model}}$

$y_i$ is the number of 1s and $n_i - y_i$ is the number of 0s; $\hat y_i = n_i \hat p_i$ is the fitted number of 1s

$D = -2\sum_{i=1}^{k}\left\{ y_i \log\frac{\hat y_i}{y_i} + (n_i - y_i)\log\frac{n_i - \hat y_i}{n_i - y_i} \right\}$

Output:
Null deviance: 83.491 on 6 degrees of freedom
Residual deviance: 10.997 on 3 degrees of freedom
(1 observation deleted due to missingness)
AIC: 36.178
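The deviance formula and the χ² goodness-of-fit test can both be checked against R's output for Example 1. The sketch computes D directly from the observed and fitted counts, using the convention 0·log(·) = 0 for empty cells, and compares it with `deviance(reg1)`. The pseudo-R² shown, 1 - residual deviance / null deviance, is only one of several definitions in use.

```r
# Example 1 data and model from the earlier slides
infection  <- c(1, 11, 0, 0, 28, 23, 8, 0)
total      <- c(18, 98, 2, 0, 58, 26, 40, 9)
proportion <- infection / total
noplan  <- c(0, 1, 0, 1, 0, 1, 0, 1)
riskfac <- c(1, 1, 0, 0, 1, 1, 0, 0)
antibio <- c(1, 1, 1, 1, 0, 0, 0, 0)
reg1 <- glm(proportion ~ noplan + riskfac + antibio,
            family = "binomial", weights = total)

keep  <- total > 0                  # group 4 (0/0) is dropped from the fit
y     <- infection[keep]            # number of 1s
n     <- total[keep]
y_hat <- n * fitted(reg1)           # fitted number of 1s, n_i * p_hat_i

# Treat 0 * log(...) as 0 for cells with no 1s (or no 0s)
xlogy <- function(a, b) ifelse(a == 0, 0, a * log(b))
D <- -2 * sum(xlogy(y, y_hat / y) + xlogy(n - y, (n - y_hat) / (n - y)))

p_value   <- pchisq(D, df.residual(reg1), lower.tail = FALSE)  # = 1 - pchisq(D, 3)
pseudo_r2 <- 1 - deviance(reg1) / reg1$null.deviance            # deviance-based pseudo-R^2
c(D = D, p_value = p_value, pseudo_r2 = pseudo_r2)
```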


Logistic Regression (GLM)

Example 2

Arrhythmia: patients who undergo coronary artery bypass graft surgery (CABG) have approximately a 19 to 40% chance of developing atrial fibrillation (AF). AF is a quivering, chaotic motion in the upper chambers of the heart, known as the atria. AF can lead to the formation of blood clots, causing greater in-hospital mortality, strokes, and longer hospital stays. While AF can be prevented with drugs, prevention is very expensive and sometimes dangerous if not warranted. Ideally, identifying risk factors that indicate an increased risk of developing AF in this population could save lives and money by showing which patients need pharmacological intervention. Researchers collected data from CABG patients during their hospital stay, such as demographics (age, sex), heart rate, cholesterol, operation time, etc. They then recorded which patients developed AF during their hospital stay. The goal was to evaluate the probability of AF given the measured demographic and risk factors.


Logistic Regression (GLM)

Example 2: Data

Y fibrillation

X1 age
X2 aortic cross clamp time
X3 cardiopulmonary bypass time
X4 intensive care unit (ICU) time
X5 average heart rate
X6 left ventricle ejection fraction
X7 anamnesis of hypertension
X8 gender (1=female, 0=male)
X9 anamnesis of diabetes
X10 previous MI


Logistic Regression (GLM)

R Code - Binomial

proportion = $y_i/n_i$, weights = $n_i$

reg1=glm(proportion ~ x1, family="binomial", weights=ni)
summary(reg1)

OR

If Y = 0 or 1 (arrhythmia), then $n_i = 1$, so the weights are implicitly 1:
reg1=glm(Y ~ x1, family="binomial")

OR

ni=die+live
reg1=glm(cbind(die, live) ~ x1, family="binomial")
reg1=glm(die/ni ~ x1, family="binomial", weights=ni)
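The last two forms specify the same binomial model, as the sketch below checks on hypothetical grouped data (five levels of x1 with made-up death and survival counts). `cbind(successes, failures)` and "proportion with weights = trials" give the same coefficients and deviance.

```r
# Hypothetical grouped data: deaths and survivors at five levels of x1
x1   <- c(1, 2, 3, 4, 5)
die  <- c(2, 4, 9, 13, 18)
live <- c(18, 16, 11, 7, 2)
ni   <- die + live

fit_cbind <- glm(cbind(die, live) ~ x1, family = "binomial")          # successes, failures
fit_prop  <- glm(die / ni ~ x1, family = "binomial", weights = ni)     # proportion + weights

rbind(cbind_form = coef(fit_cbind), proportion_form = coef(fit_prop))
c(deviance(fit_cbind), deviance(fit_prop))
```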


Logistic Regression (GLM)

Example 2

reg1=glm(Y ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10, family=binomial)
summary(reg1)

              Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.952752   4.539527  -2.413 0.015833 *
X1            0.153628   0.044021   3.490 0.000483 ***
X2            0.024800   0.023960   1.035 0.300635
X3           -0.016837   0.015594  -1.080 0.280272
X4           -0.129457   0.086554  -1.496 0.134737
X5            0.007144   0.029105   0.245 0.806109
X6            0.020674   0.025727   0.804 0.421647
X7           -0.537703   0.613750  -0.876 0.380979
X8           -0.263754   0.631467  -0.418 0.676178
X9            1.093606   0.633264   1.727 0.084179 .
X10           0.341597   0.641249   0.533 0.594237

(Dispersion parameter for binomial family taken to be 1)
Residual deviance: 78.252 on 70 degrees of freedom
AIC: 100.25


Logistic Regression (GLM)

Example 2 plot

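There are too many X variables to plot against directly, so the x-axis is the linear predictor $\hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_{10} x_{10}$
The y-axis shows the observed data: Y=1 fibrillation, Y=0 no fibrillation
The y-axis also shows the fitted probabilities from the GLM (logistic) regression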


Logistic Regression (GLM)

Example 2

XX=cbind(rep(1,length(X1)),X1,X2,X3,X4,X5,X6,X7,X8,X9,X10)
linpred=drop(XX%*%coef(reg1))
plot(linpred, Y, col="dark blue")
fit=predict(reg1, type="response")
points(linpred, fit, pch=19, col="dark green")
legend(-5, 0.9, legend=c("Obs. Proportions","Logistic Fit"), pch=c(1,19), col=c("dark blue", "dark green"))
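The hand-built linear predictor is what `predict(reg1, type = "link")` returns, and applying the inverse logit to it gives the fitted probabilities. Since the arrhythmia data are not reproduced in these slides, the sketch assumes simulated data with three hypothetical covariates.

```r
set.seed(3)                                   # arbitrary seed
n  <- 80
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n)  # hypothetical covariates
Y  <- rbinom(n, 1, plogis(-0.5 + X1 - 0.5 * X2))
reg1 <- glm(Y ~ X1 + X2 + X3, family = "binomial")

XX <- cbind(1, X1, X2, X3)                    # column of 1s for the intercept
linpred <- drop(XX %*% coef(reg1))            # beta0 + beta1*X1 + beta2*X2 + beta3*X3

# predict() gives the same values on the link and response scales
stopifnot(isTRUE(all.equal(linpred, predict(reg1, type = "link"), check.attributes = FALSE)))
stopifnot(isTRUE(all.equal(plogis(linpred), fitted(reg1), check.attributes = FALSE)))
```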


Logistic Regression (GLM)

Example 2

X correlation check:
round(cor(XX[,2:11]),2)

       X1    X2    X3    X4    X5    X6    X7    X8    X9   X10
X1   1.00 -0.03  0.01 -0.01 -0.18  0.14  0.15  0.21 -0.11 -0.11
X2  -0.03  1.00  0.85  0.34 -0.07 -0.23  0.10 -0.26  0.14 -0.06
X3   0.01  0.85  1.00  0.32  0.13 -0.10  0.12 -0.19  0.15 -0.11
X4  -0.01  0.34  0.32  1.00  0.11 -0.23  0.14 -0.11  0.00  0.04
X5  -0.18 -0.07  0.13  0.11  1.00 -0.13 -0.08 -0.06 -0.02  0.18
X6   0.14 -0.23 -0.10 -0.23 -0.13  1.00  0.03  0.25 -0.18 -0.42
X7   0.15  0.10  0.12  0.14 -0.08  0.03  1.00  0.08  0.18 -0.12
X8   0.21 -0.26 -0.19 -0.11 -0.06  0.25  0.08  1.00 -0.08 -0.09
X9  -0.11  0.14  0.15  0.00 -0.02 -0.18  0.18 -0.08  1.00 -0.10
X10 -0.11 -0.06 -0.11  0.04  0.18 -0.42 -0.12 -0.09 -0.10  1.00

The highest correlation is 0.85, between X2 (clamp time) and X3 (bypass time); remove one of them
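Scanning a large correlation matrix by eye is error-prone. The small helper below returns the pair of columns with the largest absolute off-diagonal correlation. The helper name `max_cor_pair` is hypothetical, and the demo data are simulated so that X3 is strongly correlated with X2.

```r
# Return the pair of columns with the largest absolute off-diagonal correlation
max_cor_pair <- function(X) {
  R <- cor(X)
  R[upper.tri(R, diag = TRUE)] <- NA          # keep each pair once (lower triangle)
  idx <- which(abs(R) == max(abs(R), na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(vars = c(colnames(X)[idx[2]], colnames(X)[idx[1]]),
       r = R[idx[1], idx[2]])
}

# Hypothetical data: X3 is built from X2, X5 is independent noise
set.seed(1)
X2 <- rnorm(100)
X3 <- X2 + rnorm(100, sd = 0.5)
X5 <- rnorm(100)
max_cor_pair(cbind(X2, X3, X5))
```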


Logistic Regression (GLM)

Example 2

Remove X5 average heart rate
Remove X8 gender
Remove X10 previous MI
Remove X2 clamp time
Remove X3 bypass time
Remove X6 left ejection fraction
Remove X7 hypertension
Remove X9 diabetes

Keep X1 age and X4 ICU time


Logistic Regression (GLM)

AIC and Parsimony

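AIC = 100.25: X1, ..., X10 (full model)
AIC = 92.46: X1, X4, X6, X7, X9
AIC = 90.58: X1, X4, X7, X9
AIC = 89.44: X1, X4, X9
AIC = 89.48: X1 and X4 (small model)

The smallest AIC indicates the *best* model
AIC (= -2 log-likelihood + 2 × number of parameters) penalizes including too many Xs
Or you can compare residual deviances: a smaller residual deviance means a better fit, but it never increases when Xs are added, so judge the drop in deviance with a χ² test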
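AIC-based pruning can be automated with `step()` from the stats package. The sketch assumes simulated data in which only x1 affects the response and x2, x3 are noise. It also checks that for 0/1 data the AIC equals the residual deviance plus 2k, since the saturated log-likelihood is 0. (For the arrhythmia model, 78.252 + 2 × 11 = 100.25.) `step()` will not necessarily remove exactly the same variables as the manual sequence above.

```r
set.seed(42)                                  # arbitrary seed
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)                                # pure noise in this simulation
x3 <- rnorm(n)                                # pure noise in this simulation
y  <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1))
full <- glm(y ~ x1 + x2 + x3, family = binomial)

# For 0/1 data the saturated log-likelihood is 0, so AIC = deviance + 2k
k <- length(coef(full))
c(AIC = AIC(full), deviance_plus_2k = deviance(full) + 2 * k)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log
small <- step(full, direction = "backward", trace = 0)
formula(small)
```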


Logistic Regression (GLM)

Other links

Logit, probit, or clog-log links:
reg1=glm(Y ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10, family=binomial(logit))
reg1=glm(Y ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10, family=binomial(probit))
reg1=glm(Y ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10, family=binomial(cloglog))

Very small differences in results
The complementary log-log link is useful when y=1 is rare
Bayesian algorithms prefer the probit link (see lab 5)
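The three links can be compared directly by fitting them to the same data. The sketch below assumes simulated data generated from a logit model (so this is not the arrhythmia data). It compares AIC across links and measures how far the logit and probit fitted probabilities are from each other.

```r
set.seed(7)                                    # arbitrary seed
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + x))              # data generated from a logit model

fams <- list(logit   = binomial(link = "logit"),
             probit  = binomial(link = "probit"),
             cloglog = binomial(link = "cloglog"))
fits <- lapply(fams, function(f) glm(y ~ x, family = f))

sapply(fits, AIC)                              # compare fit across links
p <- sapply(fits, fitted)                      # fitted probabilities, one column per link
max(abs(p[, "logit"] - p[, "probit"]))         # typically small
```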


Poisson Regression (GLM)

Poisson Regression

Response (y) is count data (Poisson): y ∈ {0, 1, 2, 3, ...}

The counts tend to come from rare events in a large number of trials, e.g. accidents, incidents of a rare disease, device failures in a time interval


Poisson Regression (GLM)

Poisson Regression

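$E(y) = \lambda$ and $V(y) = \lambda$

$y_i \sim \text{Pois}(\lambda_i)$

$\log(\lambda_i) = \beta_0 + \beta_1 x_i$, or with multiple x's, $\log(\lambda_i) = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \dots$

$\lambda_i = E(y_i) = \exp(\beta_0 + \beta_1 x_i)$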
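A short simulation illustrates the model: counts are drawn with $\log(\lambda_i) = 0.5 + 0.8 x_i$ (hypothetical true values), and `glm(..., family = poisson)`, whose default link is the log, recovers the coefficients. The last lines check that a Poisson variable's mean and variance coincide.

```r
set.seed(10)                                   # arbitrary seed
n <- 2000
x <- runif(n, 0, 2)
lambda <- exp(0.5 + 0.8 * x)                   # true log(lambda_i) = 0.5 + 0.8 x_i
y <- rpois(n, lambda)

fit <- glm(y ~ x, family = poisson)            # log link is the poisson default
coef(fit)                                      # should be near 0.5 and 0.8

# Mean equals variance for a Poisson variable (here lambda = 4)
z <- rpois(1e5, 4)
c(mean = mean(z), var = var(z))
```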


Poisson Regression (GLM)

Poisson Regression

Model checking and Goodness-of-fit:

Deviance: $D = 2\sum_{i=1}^{n}\left[ y_i \log\frac{y_i}{\hat y_i} - (y_i - \hat y_i) \right]$, where $\hat y_i = \hat\lambda_i$ is the fitted mean

Deviance residuals

Pearson residuals

Freeman-Tukey residuals

Plot the residuals
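The deviance and the three residual types can be computed by hand. The sketch uses the Example 3 data from the following slides. R's `residuals()` provides the deviance and Pearson types. It has no Freeman-Tukey type, so that residual is computed using the common form $\sqrt{y} + \sqrt{y+1} - \sqrt{4\hat\lambda + 1}$; treat that formula as an assumption of this sketch.

```r
# Example 3 data and the interaction model from the slides
y   <- c(11,18,20,39,22,38,52,69,31,68,69,128,102,171,180,193)
tfn <- c(0,0,0,0,1,1,1,1,10,10,10,10,100,100,100,100)
ifn <- c(0,4,20,100,0,4,20,100,0,4,20,100,0,4,20,100)
reg1 <- glm(y ~ tfn * ifn, family = poisson)
mu <- fitted(reg1)                                   # fitted means lambda_hat_i

# Per-observation deviance contributions; y * log(y / mu) is taken as 0 when y = 0
dev_i <- 2 * (ifelse(y == 0, 0, y * log(y / mu)) - (y - mu))
D <- sum(dev_i)

r_dev  <- sign(y - mu) * sqrt(pmax(dev_i, 0))        # deviance residuals
r_pear <- (y - mu) / sqrt(mu)                        # Pearson residuals
r_ft   <- sqrt(y) + sqrt(y + 1) - sqrt(4 * mu + 1)   # Freeman-Tukey residuals

# Hand-computed values agree with R's built-in ones
all.equal(D, deviance(reg1))
all.equal(unname(r_dev), unname(residuals(reg1, type = "deviance")))
round(cbind(y, mu, r_dev, r_pear, r_ft), 2)

# plot(mu, r_dev); abline(h = 0)   # residuals versus fitted values
```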


Poisson Regression (GLM)

Example 3

Cellular differentiation (when a cell becomes a more specialized cell). This is a study of TNF (tumor necrosis factor) and IFN (interferon) used to induce cell differentiation. The number of cells that exhibited markers of differentiation after exposure to TNF or IFN was recorded. There were 16 dose combinations of TNF/IFN, and 200 cells were examined.
reg1=glm(y ~ tfn*ifn, family=poisson)


Poisson Regression (GLM)

Example 3 - Data

y=c(11,18,20,39,22,38,52,69,31,68,69,128,102,171,180,193)
tfn=c(0,0,0,0,1,1,1,1,10,10,10,10,100,100,100,100)
ifn=c(0,4,20,100,0,4,20,100,0,4,20,100,0,4,20,100)


Poisson Regression (GLM)

Example 3

summary(reg1)

              Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.436e+00  6.377e-02  53.877  < 2e-16 ***
tfn          1.553e-02  8.308e-04  18.689  < 2e-16 ***
ifn          8.946e-03  9.669e-04   9.253  < 2e-16 ***
tfn:ifn     -5.670e-05  1.348e-05  -4.205 2.61e-05 ***

(Dispersion parameter for poisson family taken to be 1)
Null deviance: 707.03 on 15 degrees of freedom
Residual deviance: 142.39 on 12 degrees of freedom
AIC: 243.69
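The fit above can be reproduced from the data slide. The formula `tfn*ifn` expands to both main effects plus their interaction. A residual deviance of 142.39 on 12 degrees of freedom (a ratio of about 11.9) is far above what a well-fitting Poisson model would give, which foreshadows the overdispersion discussion later. The deviance-to-df ratio is a rough diagnostic used here for illustration, not a formal test.

```r
y   <- c(11,18,20,39,22,38,52,69,31,68,69,128,102,171,180,193)
tfn <- c(0,0,0,0,1,1,1,1,10,10,10,10,100,100,100,100)
ifn <- c(0,4,20,100,0,4,20,100,0,4,20,100,0,4,20,100)
reg1 <- glm(y ~ tfn * ifn, family = poisson)

colnames(model.matrix(reg1))          # "(Intercept)" "tfn" "ifn" "tfn:ifn"
signif(coef(reg1), 4)

# Residual deviance relative to its degrees of freedom
D  <- deviance(reg1)
df <- df.residual(reg1)               # 16 observations - 4 parameters = 12
c(ratio = D / df, p_value = pchisq(D, df, lower.tail = FALSE))
```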


Poisson Regression (GLM)

Example 3: More code

confint(reg1)

                    2.5 %        97.5 %
(Intercept)  3.308307e+00  3.558360e+00
tfn          1.390603e-02  1.716434e-02
ifn          7.043823e-03  1.083599e-02
tfn:ifn     -8.318686e-05 -3.031362e-05


Poisson Regression (GLM)

Example 3

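What we expect to see: fitted values from predict(..., type="response"). Note that these numbers match the main-effects model y ~ tfn + ifn, not the interaction model reg1 = glm(y ~ tfn*ifn, family=poisson):
35.62746 36.47161 40.05305 63.97916
36.09877 36.95410 40.58291 64.82554
40.63133 41.59405 45.67849 72.96502
132.60092 135.74276 149.07241 238.12240

Data:
y=c(11,18,20,39,22,38,52,69,31,68,69,128,102,171,180,193)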
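To see which model produced a set of fitted values, the sketch refits Example 3 with and without the tfn:ifn term and prints both sets of fitted counts next to y. On these data, the main-effects column reproduces the listed values (35.63, 36.47, ...). The interaction model gives different fitted counts.

```r
y   <- c(11,18,20,39,22,38,52,69,31,68,69,128,102,171,180,193)
tfn <- c(0,0,0,0,1,1,1,1,10,10,10,10,100,100,100,100)
ifn <- c(0,4,20,100,0,4,20,100,0,4,20,100,0,4,20,100)

fit_int <- glm(y ~ tfn * ifn, family = poisson)   # with the tfn:ifn interaction
fit_add <- glm(y ~ tfn + ifn, family = poisson)   # main effects only

# Fitted counts from both models next to the data
round(cbind(y, interaction = fitted(fit_int), additive = fitted(fit_add)), 2)
```

With a log link and an intercept, the Poisson fitted values of either model sum to the observed total (1211 here), so that total alone cannot distinguish the two models. The individual fitted values do.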


Poisson Regression (GLM)

Example 3

residuals(reg1, type="deviance")


Poisson Regression (GLM)

Example 4

Overdispersion/underdispersion and the quasi-Poisson model
The Poisson distribution has a single parameter for both the mean and the variance (the dispersion parameter is fixed at 1)

There is a strict assumption that the mean equals the variance

What if that is not the case?


Poisson Regression (GLM)

Example 4

Y is the count of faults in the manufacturing of rolls of fabric. X is the length of the roll.

The Poisson model is: $\log(\lambda_i) = \beta_0 + \beta_1 x_i$, with $\lambda_i = E(y_i)$
glm(y ~ x, family=poisson)


Poisson Regression (GLM)

Example 4

Standard Poisson: glm(y ~ x, family=poisson)

              Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.9717506  0.2124693   4.574 4.79e-06 ***
x            0.0019297  0.0003063   6.300 2.97e-10 ***

(Dispersion parameter for poisson family taken to be 1)
Null deviance: 103.714 on 31 degrees of freedom
Residual deviance: 61.758 on 30 degrees of freedom
AIC: 189.06


Poisson Regression (GLM)

Example 4

Quasi-Poisson: glm(y ~ x, family=quasipoisson)

              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.9717506  0.3095033   3.140 0.003781 **
x            0.0019297  0.0004462   4.325 0.000155 ***

(Dispersion parameter for quasipoisson family taken to be 2.121965)
Null deviance: 103.714 on 31 degrees of freedom
Residual deviance: 61.758 on 30 degrees of freedom
AIC: NA


Poisson Regression (GLM)

Example 4

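Poisson versus quasi-Poisson

Same β0 and β1

Same deviance

Dispersion parameter: 1 (Poisson) versus 2.12 (quasi-Poisson)

Different standard errors and p-values: the quasi-Poisson standard errors are the Poisson ones multiplied by √2.12 ≈ 1.457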
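The comparison above can be verified mechanically. The fabric-fault data of Example 4 are not listed in these slides, so the sketch assumes the Example 3 counts instead. It shows that the Poisson and quasi-Poisson fits share coefficients and deviance, that the quasi-Poisson dispersion is the Pearson χ² statistic divided by the residual degrees of freedom, and that the standard errors scale by the square root of the dispersion. The final line applies that scaling to the Example 4 numbers.

```r
# Example 3 counts used as stand-in data (Example 4's data are not listed)
y   <- c(11,18,20,39,22,38,52,69,31,68,69,128,102,171,180,193)
tfn <- c(0,0,0,0,1,1,1,1,10,10,10,10,100,100,100,100)
ifn <- c(0,4,20,100,0,4,20,100,0,4,20,100,0,4,20,100)

pois  <- glm(y ~ tfn * ifn, family = poisson)
quasi <- glm(y ~ tfn * ifn, family = quasipoisson)

# Same point estimates and the same deviance
stopifnot(isTRUE(all.equal(coef(pois), coef(quasi))))
stopifnot(isTRUE(all.equal(deviance(pois), deviance(quasi))))

# Dispersion estimate: Pearson chi-square divided by residual degrees of freedom
phi <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)
stopifnot(isTRUE(all.equal(phi, summary(quasi)$dispersion)))

# Quasi-Poisson standard errors = Poisson standard errors * sqrt(phi)
se_pois  <- sqrt(diag(vcov(pois)))
se_quasi <- sqrt(diag(vcov(quasi)))
round(cbind(se_pois, se_quasi, ratio = se_quasi / se_pois, sqrt_phi = sqrt(phi)), 5)

# Example 4 check: 0.2124693 * sqrt(2.121965) is about 0.3095
0.2124693 * sqrt(2.121965)
```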


Other GLM

Other GLM

Exponential family response (y)

Normal

Binomial/Bernoulli

Poisson

Gamma
