L7 GLM Binomial

  • 8/3/2019 L7 GLM Binomial

    Lecture 7

    GLMs II: Binomial Family

    Olivier MISSA

    Advanced Research Skills


    Outline

    Continue our Introduction to Generalized Linear Models.

    In this lecture:

    Illustrate the use of GLMs for proportion and binary data.

    Reminder

    Binary and proportion data tend to follow the Binomial distribution:

    $\mathrm{Mean} = n p, \qquad \mathrm{Var} = n p (1 - p)$

    The canonical link of this GLM family is the logit function:

    $\mathrm{logit}(p) = \log\left(\frac{p}{1 - p}\right)$

    The variance reaches a maximum for intermediate values of p and a minimum at either 0% or 100%.
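The shape of the binomial variance and the logit link can be checked numerically. This is a minimal sketch: the batch size `n = 20` and the grid of `p` values are chosen only for illustration.

```r
# Binomial variance n * p * (1 - p) over a grid of p (n = 20 is an illustrative choice)
n <- 20
p <- seq(0, 1, by = 0.01)
v <- n * p * (1 - p)

p_at_max <- p[which.max(v)]   # value of p where the variance peaks
range(v)                      # smallest and largest variance on the grid

# The logit link log(p / (1 - p)) is available in base R as qlogis()
p_mid         <- c(0.1, 0.5, 0.9)
logit_manual  <- log(p_mid / (1 - p_mid))
logit_builtin <- qlogis(p_mid)
```

The variance is zero at both ends of the grid and largest in the middle, which is why a constant-variance model is a poor fit for proportions.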

    Three ways to work with binary data

    In R, binary/proportion data can be entered into a model as a response in three different ways:

    as a numeric vector (holding 0/1 values, or the proportion of successes with the number of trials given as weights).

    as a logical vector or a factor (TRUE, or any factor level other than the first, is considered a success).

    as a two-column matrix (the first column holding the number of successes and the second column the number of failures).
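The three response formats can be compared on the same data. The sketch below uses a hypothetical toy data set (3 groups of 10 trials each, with made-up success counts); the object names are illustrative only.

```r
# Hypothetical toy data: 3 groups of 10 trials each
x    <- c(1, 2, 3)
succ <- c(2, 5, 9)
fail <- 10 - succ

# 1. Two-column matrix: successes first, then failures
m_matrix <- glm(cbind(succ, fail) ~ x, family = binomial)

# 2. Numeric proportion, with the number of trials supplied as weights
m_prop <- glm(succ / 10 ~ x, family = binomial, weights = rep(10, 3))

# 3. One logical value per trial (TRUE = success)
y  <- rep(rep(c(TRUE, FALSE), times = 3), times = c(rbind(succ, fail)))
xb <- rep(x, each = 10)
m_binary <- glm(y ~ xb, family = binomial)

# Coefficient estimates from the three fits, side by side
rbind(matrix = coef(m_matrix), proportion = coef(m_prop), binary = coef(m_binary))
```

The coefficient estimates agree across the three fits. The residual deviance of the per-trial fit differs from the grouped fits because its saturated model is different.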

    1st Example

    Toxicity to tobacco budworm (moth) of different doses of trans-cypermethrin. Batches of 20 moths (of each sex) were put in contact for three days with increasing doses of the pyrethroid.

    Number of dead moths out of 20 tested

              Dose (micrograms)
    Sex       1    2    4    8   16   32
    Male      1    4    9   13   18   20
    Female    0    2    6   10   12   16

    > (dose <- rep(2^(0:5), 2))
    > numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
    > (sex <- factor(rep(c("M", "F"), c(6, 6))))
    > SF <- cbind(numdead, numalive = 20 - numdead)

    1st Example

    > modb <- glm(SF ~ sex * dose, family = binomial)
    > summary(modb)

    Coefficients:

    Estimate Std. Error z value Pr(>|z|)

    (Intercept) -1.71578 0.32233 -5.323 1.02e-07 ***

    sexM -0.21194 0.51523 -0.411 0.68082

    dose 0.11568 0.02379 4.863 1.16e-06 ***

    sexM:dose 0.18156 0.06692 2.713 0.00666 **

    ---

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.876 on 11 degrees of freedom

    Residual deviance: 18.164 on 8 degrees of freedom

    AIC: 56.275

    What is modelled is the proportion of successes

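    $D = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{\hat{y}_i}\right) + (n_i - y_i) \ln\left(\frac{n_i - y_i}{n_i - \hat{y}_i}\right) \right]$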
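The binomial deviance can be recomputed by hand and compared with what `glm()` reports. In this sketch, `y` is the observed number of dead moths, `yhat` the fitted number and `n` the batch size. The helper `xlogy()` is a hypothetical name introduced here to handle the `y = 0` case, where the term is taken as 0.

```r
# Budworm data as tabulated in the 1st example
dose    <- rep(2^(0:5), 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
n       <- rep(20, 12)                      # batch size
SF      <- cbind(numdead, numalive = n - numdead)

fit  <- glm(SF ~ sex * dose, family = binomial)
yhat <- n * fitted(fit)                     # fitted number of deaths per batch

# y * log(y / mu), taking the limiting value 0 when y = 0 (illustrative helper)
xlogy <- function(y, mu) ifelse(y == 0, 0, y * log(y / mu))

D_manual <- 2 * sum(xlogy(numdead, yhat) + xlogy(n - numdead, n - yhat))
c(manual = D_manual, glm = deviance(fit))
```

The two numbers should agree, confirming that the residual deviance printed by `summary()` is this sum over the 12 batches.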

    1st Example

    > ldose <- log2(dose)
    > modb2 <- glm(SF ~ sex * ldose, family = binomial)
    > summary(modb2)

    Coefficients:

    Estimate Std. Error z value Pr(>|z|)

    (Intercept) -2.9935 0.5527 -5.416 6.09e-08 ***

    sexM 0.1750 0.7783 0.225 0.822

    ldose 0.9060 0.1671 5.422 5.89e-08 ***

    sexM:ldose 0.3529 0.2700 1.307 0.191

    ---

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.8756 on 11 degrees of freedom

    Residual deviance: 4.9937 on 8 degrees of freedom

    AIC: 43.104

    1st Example

    > drop1(modb2, test="Chisq")

    Single term deletions

    Model:

    SF ~ sex * ldose

    Df Deviance AIC LRT Pr(Chi)

    <none> 4.994 43.104

    sex:ldose 1 6.757 42.867 1.763 0.1842

    > modb3 <- glm(SF ~ sex + ldose, family = binomial)
    > summary(modb3)

    Coefficients:

    Estimate Std. Error z value Pr(>|z|)

    (Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***

    sexM 1.1007 0.3558 3.093 0.00198 **

    ldose 1.0642 0.1311 8.119 4.70e-16 ***

    ---

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 124.876 on 11 degrees of freedom

    Residual deviance: 6.757 on 9 degrees of freedom

    AIC: 42.867
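The likelihood-ratio test reported by `drop1()` can be reproduced from the two residual deviances alone. This is a small sketch using the values printed on this slide; the variable names are illustrative.

```r
# Residual deviances as printed above
dev_full    <- 4.994   # SF ~ sex * ldose, 8 residual df
dev_reduced <- 6.757   # SF ~ sex + ldose, 9 residual df

lrt     <- dev_reduced - dev_full          # likelihood-ratio statistic
df      <- 9 - 8                           # one parameter dropped
p_value <- pchisq(lrt, df = df, lower.tail = FALSE)

# AIC changes by (LRT - 2 * df): the reduced model has the lower AIC when LRT < 2 * df
aic_change <- lrt - 2 * df
round(c(LRT = lrt, p = p_value, AIC_change = aic_change), 4)
```

Both criteria point the same way here: the interaction costs more in complexity than it gains in fit.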

    1st Example

    > drop1(modb3, test="Chisq")

    Single term deletions

    Model:

    SF ~ sex + ldose

    Df Deviance AIC LRT Pr(Chi)

    <none> 6.757 42.867

    sex 1 16.984 51.094 10.227 0.001384 **

    ldose 1 118.799 152.909 112.042 < 2.2e-16 ***

    > shapiro.test(residuals(modb3, type="deviance"))

    Shapiro-Wilk normality test

    data: residuals(modb3, type = "deviance")

    W = 0.9666, p-value = 0.8725

    1st Example

    > par(mfrow=c(2,2))

    > plot(modb3)

    1st Example

    > plot( c(0,1) ~ c(1,32), type="n", log="x",
        xlab="dose", ylab="Probability")
    > text(dose, numdead/20, labels=as.character(sex) )
    > ld <- seq(1, 32, by=0.1)  ## grid of doses (spacing assumed)
    > lines (ld, predict(modb3, data.frame(ldose=log2(ld),
        sex=factor(rep("M", length(ld)), levels=levels(sex))),
        type="response") )
    > lines (ld, predict(modb3, data.frame(ldose=log2(ld),
        sex=factor(rep("F", length(ld)), levels=levels(sex))),
        type="response"), lty=2, col="red" )

    1st Example

    > modbp <- …
    > AIC(modbp)

    [1] 41.87836

    > modbc <- …
    > AIC(modbc)

    [1] 43.8663

    > AIC(modb3)

    [1] 42.86747
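The definitions of `modbp` and `modbc` are not shown here. One common use of such an AIC comparison is to try alternative link functions with the same predictors. The sketch below assumes probit and complementary log-log links purely for illustration, so its values need not match the ones printed above for `modbp` and `modbc`.

```r
# Budworm data as tabulated in the 1st example
ldose   <- rep(0:5, 2)                       # log2 of the doses 1 to 32
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)

# Same predictors, three links (probit and cloglog are assumed alternatives)
m_logit   <- glm(SF ~ sex + ldose, family = binomial(link = "logit"))
m_probit  <- glm(SF ~ sex + ldose, family = binomial(link = "probit"))
m_cloglog <- glm(SF ~ sex + ldose, family = binomial(link = "cloglog"))

aic <- c(logit = AIC(m_logit), probit = AIC(m_probit), cloglog = AIC(m_cloglog))
aic
```

Differences in AIC of about one unit, as seen on this slide, are usually too small to prefer one model strongly over another.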

    1st Example

    > summary(modb3)

    Coefficients:

    Estimate Std. Error z value Pr(>|z|)

    (Intercept) -3.4732 0.4685 -7.413 1.23e-13 ***

    sexM 1.1007 0.3558 3.093 0.00198 **

    ldose 1.0642 0.1311 8.119 4.70e-16 ***

    ---

    The coefficients are on the logit scale: $\log\left(\frac{p}{1 - p}\right)$

    > exp(modb3$coeff) ## careful, it may be misleading

    (Intercept) sexM ldose

    0.031019 3.006400 2.898560 ## odds, p / (1-p), for the intercept; odds ratios for sexM and ldose

    > exp(modb3$coeff[1]+modb3$coeff[2]) ## odds for males at ldose = 0 (dose of 1 microgram)

    (Intercept)

    0.09325553

    Every doubling of the dose will lead to an increase in the odds of dying over surviving by a factor of 2.899.
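Odds are easier to interpret once converted back to probabilities. The sketch below starts from the rounded coefficients printed by `summary(modb3)`, so results differ slightly in the last digits from those computed on the fitted object. The variable names are illustrative.

```r
# Coefficients as printed by summary(modb3), on the logit scale
b <- c(intercept = -3.4732, sexM = 1.1007, ldose = 1.0642)

# Odds and probability of death for males at ldose = 0 (dose of 1 microgram)
odds_male_dose1 <- exp(b[["intercept"]] + b[["sexM"]])
p_male_dose1    <- odds_male_dose1 / (1 + odds_male_dose1)

# plogis() is the inverse logit, so it gives the same probability directly
p_check <- plogis(b[["intercept"]] + b[["sexM"]])

# Odds multiplier per doubling of the dose (one unit of log2 dose)
odds_ratio_ldose <- exp(b[["ldose"]])

# Predicted probability of death for males at the six tested doses (1 to 32)
p_male <- plogis(b[["intercept"]] + b[["sexM"]] + b[["ldose"]] * 0:5)
round(p_male, 3)
```

The odds are multiplied by a constant factor at each doubling, but the probability is not. It rises along an S-shaped curve and levels off near 1.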

    2nd Example

    Erythrocyte Sedimentation Rate (ESR) in a group of patients.

    Two groups: ESR < 20 mm/hour or ESR > 20 mm/hour (ill).

    Q: Is it related to the globulin and fibrinogen levels in the blood?

    > data("plasma", package="HSAUR")

    > str(plasma)

    'data.frame': 32 obs. of 3 variables:

    $ fibrinogen: num 2.52 2.56 2.19 2.18 3.41 2.46 3.22 2.21 ...

    $ globulin : int 38 31 33 31 37 36 38 37 39 41 ...

    $ ESR : Factor w/ 2 levels "ESR < 20","ESR > 20": 1 1 ...

    > summary(plasma)

    fibrinogen globulin ESR

    Min. :2.090 Min. :28.00 ESR < 20:26

    1st Qu.:2.290 1st Qu.:31.75 ESR > 20: 6

    Median :2.600 Median :36.00

    Mean :2.789 Mean :35.66

    3rd Qu.:3.167 3rd Qu.:38.00

    Max. :5.060 Max. :46.00

    2nd Example

    > stripchart(globulin ~ ESR, vertical=T, data=plasma,

    xlab="Erythrocyte Sedimentation Rate (mm/hr)",

    ylab="Globulin blood level", method="jitter" )

    > stripchart(fibrinogen ~ ESR, vertical=T, data=plasma,

    xlab="Erythrocyte Sedimentation Rate (mm/hr)",

    ylab="Fibrinogen blood level", method="jitter" )

    2nd Example

    > mod1 <- glm(ESR ~ fibrinogen, data=plasma, family=binomial)
    > summary(mod1)

    Coefficients:

    Estimate Std. Error z value Pr(>|z|)

    (Intercept) -6.8451 2.7703 -2.471 0.0135 *

    fibrinogen 1.8271 0.9009 2.028 0.0425 *

    ---

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 30.885 on 31 degrees of freedom

    Residual deviance: 24.840 on 30 degrees of freedom

    AIC: 28.840

    > mod2 <- glm(ESR ~ fibrinogen + globulin, data=plasma, family=binomial)  ## the response ESR is a factor
    > AIC(mod2)

    [1] 28.97111

    2nd Example

    > anova(mod1, mod2, test="Chisq")

    Analysis of Deviance Table

    Model 1: ESR ~ fibrinogen

    Model 2: ESR ~ fibrinogen + globulin

    Resid. Df Resid. Dev Df Deviance P(>|Chi|)

    1 30 24.8404

    2 29 22.9711 1 1.8692 0.1716

    > summary(mod2)

    Coefficients:

    Estimate Std. Error z value Pr(>|z|)

    (Intercept) -12.7921 5.7963 -2.207 0.0273 *

    fibrinogen 1.9104 0.9710 1.967 0.0491 *

    globulin 0.1558 0.1195 1.303 0.1925

    ---

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 30.885 on 31 degrees of freedom

    Residual deviance: 22.971 on 29 degrees of freedom

    AIC: 28.971

    The difference in terms of Deviance between these models is not significant, which leads us to select the least complex model.
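The AIC values tell the same story as the deviance test. For an ungrouped 0/1 response the log-likelihood equals minus half the residual deviance, so AIC is the residual deviance plus twice the number of coefficients. A short sketch using the numbers printed in the two summaries (variable names are illustrative):

```r
# Residual deviances and coefficient counts as printed in the summaries
dev <- c(mod1 = 24.840, mod2 = 22.971)
k   <- c(mod1 = 2,      mod2 = 3)       # intercept + predictors

aic <- dev + 2 * k                      # AIC for an ungrouped binary response
aic

# Deviance explained by adding globulin, to set against the penalty of 2
dev_gain <- unname(dev["mod1"] - dev["mod2"])
dev_gain
```

Adding globulin lowers the deviance by less than the penalty of 2 for the extra parameter, so `mod1` keeps the lower AIC.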

    2nd Example

    > shapiro.test(residuals(mod1, type="deviance"))

    Shapiro-Wilk normality test

    data: residuals(mod1, type = "deviance")

    W = 0.6863, p-value = 5.465e-07

    > par(mfrow=c(2,2))

    > plot(mod1)
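With a 0/1 response, each deviance residual is positive when the event occurred and negative otherwise, so the residuals fall into two clusters. This may be one reason why a normality test on them is hard to interpret. The sketch below recomputes deviance residuals by hand on simulated data (not the plasma data); the seed, sample size and coefficients are arbitrary choices for illustration.

```r
set.seed(1)                                   # simulated data, for illustration only
x <- rnorm(40)
y <- rbinom(40, size = 1, prob = plogis(-1 + x))

fit <- glm(y ~ x, family = binomial)
p   <- fitted(fit)                            # fitted probabilities

# Deviance residual of a 0/1 observation:
# signed square root of its contribution to the residual deviance
d_manual <- sign(y - p) * sqrt(-2 * (y * log(p) + (1 - y) * log(1 - p)))
d_glm    <- residuals(fit, type = "deviance")

# Range of the residuals within each outcome group (0 and 1)
tapply(d_manual, y, range)
```

The squared residuals sum to the residual deviance of the fit, which gives a quick consistency check.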