Precept 4 - More GLMs: Models of Binary and Lognormal Outcomes
Soc 504: Advanced Social Statistics
Ian Lundberg1
Princeton University
March 2, 2017
¹These slides owe an enormous debt to generations of TFs in Gov 2001 at Harvard. Many slides are directly adapted from those by Brandon Stewart and Stephen Pettigrew.
Outline
1 GLMs
  General Structure of GLMs
  Procedure for Running a GLM
2 Complementary log-log
3 Quantities of Interest
Replication Paper
Any thoughts or issues to discuss?
Generalized Linear Models

All of the models we've talked about belong to a class called generalized linear models (GLMs).

Three elements of a GLM:
A distribution for Y (stochastic component)
A linear predictor Xβ (systematic component)
A link function that relates the linear predictor to a parameter of the distribution (systematic component)
1. Specify a distribution for Y
Assume our data were generated from some distribution.
Examples:
Continuous and Unbounded: Normal
Binary: Bernoulli
Event Count: Poisson
Duration: Exponential
Ordered Categories: Normal with observation mechanism
Unordered Categories: Multinomial
2. Specify a linear predictor

We are interested in allowing some parameter of the distribution θ to vary as a (linear) function of covariates. So we specify a linear predictor:

Xβ = β0 + x1β1 + x2β2 + ··· + xkβk
3. Specify a link function

The link function relates the linear predictor to some parameter θ of the distribution for Y (almost always the mean).

Let g(·) be the link function and let E(Y) = θ be the mean of the distribution for Y.

g(θ) = Xβ
θ = g⁻¹(Xβ)

This is the systematic component that we've been talking about all along.
Example Link Functions

Identity:
Link: µ = Xβ

Inverse:
Link: λ⁻¹ = Xβ
Inverse Link: λ = (Xβ)⁻¹

Logit:
Link: ln( π / (1 − π) ) = Xβ
Inverse Link: π = 1 / (1 + e^(−Xβ))

Probit:
Link: Φ⁻¹(π) = Xβ
Inverse Link: π = Φ(Xβ)

Log:
Link: ln(λ) = Xβ
Inverse Link: λ = exp(Xβ)
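A quick way to internalize these pairs is to check numerically in R that applying the link to the output of the inverse link returns the linear predictor. This is a minimal sketch; the value xb is an arbitrary, hypothetical linear predictor:

```r
# Sketch: each inverse link should undo its link (up to floating point)
xb <- 0.7                          # hypothetical value of X %*% beta
pi.logit <- 1 / (1 + exp(-xb))     # logit inverse link
log(pi.logit / (1 - pi.logit))     # link applied back: recovers xb
pi.probit <- pnorm(xb)             # probit inverse link
qnorm(pi.probit)                   # recovers xb
lambda <- exp(xb)                  # log inverse link
log(lambda)                        # recovers xb
```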
4. Estimate Parameters via ML
Use optim() to estimate the parameters, just as we have all along.
5. Quantities of Interest
1 Simulate parameters from multivariate normal.
2 Run Xβ through inverse link function to get expected values.
3 Draw from distribution of Y for predicted values.
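These three steps can be sketched in R. This assumes an optim() result opt fit with hessian = TRUE (as in the estimation code later in these slides), a cloglog inverse link, and a hypothetical covariate profile x.profile:

```r
library(MASS)  # for mvrnorm
# 1. Simulate parameters from a multivariate normal centered at the MLE,
#    with variance from the inverted negative Hessian
sims <- mvrnorm(1000, mu = opt$par, Sigma = -solve(opt$hessian))
# 2. Run X*beta through the inverse link to get expected values
x.profile <- c(1, 1, 0, 0, 0, 2)  # hypothetical design-matrix row:
                                  # intercept, married, race dummies, income
ev <- 1 - exp(-exp(sims %*% x.profile))
# 3. Draw from the distribution of Y for predicted values
pv <- rbinom(n = length(ev), size = 1, prob = ev)
```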
Outline
1 GLMs
  General Structure of GLMs
  Procedure for Running a GLM
2 Complementary log-log
3 Quantities of Interest
We will use data from the Fragile Families and Child Wellbeing Study to study the cumulative risk of eviction over childhood for children born in large American cities.
Research question and data
What is the probability of eviction in a given year for a child with a given set of covariates?
ffEviction.csv is the data that we use.
Fragile Families data
What’s in the data?
> head(d)
idnum income married cm1ethrace ev
1 0001 1.5 0 Hispanic 0
2 0002 1.6 0 Black 0
3 0003 2.7 0 White 0
4 0004 1.0 0 Hispanic 0
5 0006 0.2 0 Black 0
6 0007 1.3 0 Hispanic 0
> summary(d)
idnum income married cm1ethrace ev
Length:12298 Min. :0.000 Min. :0.0000 White :2709 Min. :0.00000
Class :character 1st Qu.:0.500 1st Qu.:0.0000 Black :5911 1st Qu.:0.00000
Mode :character Median :1.200 Median :0.0000 Hispanic:3225 Median :0.00000
Mean :1.666 Mean :0.2502 Other : 453 Mean :0.02301
3rd Qu.:2.400 3rd Qu.:1.0000 3rd Qu.:0.00000
Max. :5.000 Max. :1.0000 Max. :1.00000
Fragile Families data

ev: dependent variable; was this child evicted in a given year?
income: family income / poverty line at age 1
married: were the parents married at the birth?
cm1ethrace: mother's race/ethnicity
Outline
1 GLMs
  General Structure of GLMs
  Procedure for Running a GLM
2 Complementary log-log
3 Quantities of Interest
Binary Dependent Variable
Our outcome variable is whether or not a child was evicted
What’s the first question we should ask ourselves when we start tomodel this dependent variable?
1. Specify a distribution for Y

Yi ∼ Bernoulli(πi)

p(y | π) = ∏_{i=1}^{n} πi^yi (1 − πi)^(1−yi)

2. Specify a linear predictor:

Xiβ
3. Specify a link (or inverse link) function.

Complementary Log-log (cloglog):

log(− log(1 − πi)) = Xiβ
πi = 1 − exp(−exp(Xiβ))

We could also have chosen several other options:

Probit: πi = Φ(Xiβ)
Logit: πi = 1 / (1 + e^(−Xiβ))
Scobit: πi = (1 + e^(−Xiβ))^(−α)
Our model
In the notation of Unifying Political Methodology, this is the model we've just defined:

Yi ∼ Bernoulli(πi)
πi = 1 − exp(−exp(Xiβ))
Log-likelihood of the c-loglog

ℓ(β | Y) = log(L(β | Y))
         = log(p(Y | β))
         = log( ∏_{i=1}^{n} p(Yi | β) )
         = log( ∏_{i=1}^{n} [1 − exp(−exp(Xiβ))]^Yi [exp(−exp(Xiβ))]^(1−Yi) )
         = ∑_{i=1}^{n} ( Yi log(1 − exp(−exp(Xiβ))) + (1 − Yi) log[exp(−exp(Xiβ))] )
         = ∑_{i=1}^{n} ( Yi log(1 − exp(−exp(Xiβ))) − (1 − Yi) exp(Xiβ) )
Coding our log likelihood function

∑_{i=1}^{n} ( Yi log(1 − exp(−exp(Xiβ))) − (1 − Yi) exp(Xiβ) )

cloglog.loglik <- function(par, X, y) {
  # par: coefficient vector; X: design matrix; y: binary outcome vector
  beta <- par
  log.lik <- sum(y * log(1 - exp(-exp(X %*% beta))) -
                   (1 - y) * exp(X %*% beta))
  return(log.lik)
}
Finding the MLE

X <- model.matrix(~ married + cm1ethrace + income,
                  data = d)
opt <- optim(par = rep(0, ncol(X)),
             fn = cloglog.loglik,
             X = X,
             y = d$ev,
             control = list(fnscale = -1),  # maximize rather than minimize
             hessian = TRUE,
             method = "BFGS")

Point estimate of the MLE:

opt$par
[1] -2.704 -1.211 -0.526 -0.620 -0.272 -0.348
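One way to check the hand-coded MLE is against R's built-in glm(), which supports the cloglog link; its coefficients should agree closely with opt$par:

```r
# Cross-check the optim() result against glm() with a cloglog link
fit <- glm(ev ~ married + cm1ethrace + income,
           data = d,
           family = binomial(link = "cloglog"))
coef(fit)  # should closely match opt$par
```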
Standard errors of the MLE
Recall that the standard errors are defined as the square roots of the diagonal of

−[∂²ℓ/∂β²]⁻¹

where ∂²ℓ/∂β² is the Hessian of the log-likelihood.²

²Credit to Stephen Pettigrew for including this figure in slides.
Standard errors of the MLE
Variance-covariance matrix:
-solve(opt$hessian)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.026 -0.002 -0.021 -0.021 -0.019 -0.006
[2,] -0.002 0.063 0.004 0.002 -0.001 -0.004
[3,] -0.021 0.004 0.026 0.019 0.018 0.002
[4,] -0.021 0.002 0.019 0.033 0.018 0.002
[5,] -0.019 -0.001 0.018 0.018 0.128 0.002
[6,] -0.006 -0.004 0.002 0.002 0.002 0.004
Standard errors:
sqrt(diag(-solve(opt$hessian)))
[1] 0.162 0.252 0.160 0.183 0.358 0.064
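The arithmetic behind sqrt(diag(-solve(opt$hessian))) is easy to check by hand on a small example. A Python sketch with a made-up diagonal 2 x 2 Hessian, chosen so the inverse can be taken entry by entry:

```python
import math

# Sketch of sqrt(diag(-solve(hessian))) for a hypothetical diagonal
# Hessian: negate, invert, take the diagonal, take square roots.
hessian = [[-4.0, 0.0],
           [0.0, -25.0]]
# For a diagonal matrix, the diagonal of -H^{-1} is 1 / (-h_ii).
var = [1.0 / -hessian[i][i] for i in range(2)]  # diag of -H^{-1}
se = [math.sqrt(v) for v in var]                # standard errors
print(se)  # [0.5, 0.2]
```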
Outline
1 GLMs
  General Structure of GLMs
  Procedure for Running a GLM
2 Complementary log-log
3 Quantities of Interest
Interpreting c-loglog coefficients
Here’s a nicely formatted table with the regression results from our model:

Variable                 Coefficient    SE
Intercept                  -2.70       0.16
Married                    -1.21       0.25
Black                      -0.53       0.16
Hispanic                   -0.62       0.18
Other                      -0.27       0.36
Income / poverty line      -0.35       0.06
But what does this table even mean?
Interpreting c-loglog results
What does it even mean for the coefficient for married to be -1.21?
All else constant, children of married parents have a 1.21-point lower log rate of eviction.

And what are log rates? Nobody thinks in terms of log odds, or probit coefficients, or exponential rates.
If there’s one thing you take away from this class, it should be this:
When you present results, always present your findings in terms of something that has substantive meaning to the reader.

For binary outcome models, that often means turning your results into predicted probabilities, which is what we’ll do now.

If there’s a second thing you should take away, it’s this:

Always account for all types of uncertainty when you present your results.
We’ll spend the rest of today looking at how to do that.
Getting Quantities of Interest
How to present results in a better format than just coefficients and standard errors:

1 Write out your model and estimate β̂MLE and the Hessian
2 Simulate from the sampling distribution of β̂MLE to incorporate estimation uncertainty
3 Multiply these simulated β̃s by some covariates in the model to get X̃β̃
4 Plug X̃β̃ into the inverse link function, g−1(X̃β̃), to put it on the same scale as the parameter(s) in your stochastic function
5 Use the transformed g−1(X̃β̃) to take thousands of draws from your stochastic function and incorporate fundamental uncertainty
6 Store the mean of these simulations, E[y|X]
7 Repeat steps 2 through 6 thousands of times
8 Use the results to make fancy graphs and informative tables
Simulate from the sampling distribution of β̂MLE
By the central limit theorem, we assume that β̂MLE ∼ mvnorm(β̂, V̂(β̂))

β̂ is the vector of our estimates for the parameters, opt$par

V̂(β̂) is the variance-covariance matrix, -solve(opt$hessian)

We hope that the β̂s we estimated are good estimates of the true βs, but we know that they aren’t exactly perfect because of estimation uncertainty.

So we account for this uncertainty by simulating βs from the multivariate normal distribution defined above.
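Under the hood, drawing from mvnorm(β̂, V̂(β̂)) amounts to Cholesky-factoring the covariance matrix and transforming standard normal draws. A minimal two-dimensional Python sketch with a made-up mean and covariance:

```python
import math
import random

# Sketch of one multivariate normal draw: x = mu + L z, where L is the
# Cholesky factor of the covariance matrix and z is standard normal.
# The mean, covariance, and seed here are made up for illustration.
mu = [0.0, 0.0]
cov = [[1.0, 0.5],
       [0.5, 1.0]]

# Cholesky factor of a 2x2 covariance matrix, computed by hand
l11 = math.sqrt(cov[0][0])
l21 = cov[1][0] / l11
l22 = math.sqrt(cov[1][1] - l21 ** 2)

random.seed(504)
z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
draw = [mu[0] + l11 * z[0],
        mu[1] + l21 * z[0] + l22 * z[1]]
print(draw)
```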
Simulate from the sampling distribution of β̂MLE
Simulate one draw from mvnorm(β̂, V̂ (β̂))
Install the mvtnorm package if you need to
install.packages("mvtnorm")
require(mvtnorm)
sim.betas <- rmvnorm(n = 1,
mean = opt$par,
sigma = -solve(opt$hessian))
sim.betas
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -2.825 -1.424 -0.478 -0.358 -0.123 -0.237
Untransform Xβ
Now we need to choose some values of the covariates that we want predictions about.

Let’s make predictions for one white child born to married parents with family income at the poverty line. Recall that our predictors (in order) are:
> colnames(X)
[1] "(Intercept)" "married" "cm1ethraceBlack" "cm1ethraceHispanic"
[5] "cm1ethraceOther" "income"
We can set the values of X as:
setX <- c(1, 1, 0, 0, 0, 1)
Untransform Xβ
Now we multiply our covariates of interest by our simulated parameters:
setX %*% t(sim.betas)
[,1]
[1,] -4.486287
If we stopped right here we’d be making two mistakes.
1 -4.486287 is not the predicted probability (obviously - it’s negative!); it’s the predicted log rate
2 We haven’t done a very good job of accounting for the uncertainty in the model
Untransform Xβ
To turn X̃β̃ into a predicted probability we need to plug it back into the inverse link function, g−1(X̃β̃) = 1 − exp(−exp(X̃β̃))
> sim.p <- 1 - exp(-exp(setX %*% t(sim.betas)))
> sim.p
[,1]
[1,] 0.0111992
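As a hand-checkable example (not from the slides), plugging the point estimates opt$par themselves, rather than a simulated draw, into the same inverse link with the setX profile above:

```python
import math

# Point estimates from opt$par and the covariate profile setX
# (white child, married parents, income at the poverty line).
beta = [-2.704, -1.211, -0.526, -0.620, -0.272, -0.348]
setX = [1, 1, 0, 0, 0, 1]

xb = sum(x * b for x, b in zip(setX, beta))  # linear predictor: -4.263
p = 1 - math.exp(-math.exp(xb))              # inverse c-log-log link
print(round(p, 4))  # 0.014
```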
Simulate from the stochastic function
Now we have to account for fundamental uncertainty by simulating from the original stochastic function, Bernoulli
> draws <- rbinom(n = 10000, size = 1, prob = sim.p)
> mean(draws)
[1] 0.0117
Is 0.0117 our best guess at the predicted probability of eviction? Nope
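The rbinom() step can be sketched in Python: draw many Bernoulli outcomes at the simulated probability and average them. The seed here is arbitrary; the average lands near sim.p but not exactly on it:

```python
import random

# Sketch of rbinom(n = 10000, size = 1, prob = sim.p): 10,000 Bernoulli
# draws at the predicted probability from the single simulated beta draw.
sim_p = 0.0112
random.seed(2017)  # arbitrary seed, for reproducibility only
draws = [1 if random.random() < sim_p else 0 for _ in range(10000)]
print(sum(draws) / len(draws))  # close to sim_p
```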
Store and repeat
Remember, we only took 1 draw of our βs from the multivariate normal distribution.

To fully account for estimation uncertainty, we need to take tons of draws of β̃.

To do this we’d need to loop over all the steps I just went through and get the full distribution of predicted probabilities for this case.
Speeding up the process
Or, instead of using a loop, let’s just vectorize our code:
sim.betas <- rmvnorm(n = 10000,
mean = opt$par,
sigma = -solve(opt$hessian))
head(sim.betas)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] -2.800 -0.955 -0.545 -0.531 -0.217 -0.319
[2,] -2.410 -0.879 -0.772 -0.896 -0.301 -0.485
[3,] -2.715 -1.672 -0.483 -0.741 -0.365 -0.333
[4,] -2.718 -0.993 -0.588 -0.545 -0.094 -0.389
[5,] -2.479 -1.161 -0.721 -0.794 -0.601 -0.413
[6,] -2.625 -1.227 -0.654 -0.586 -0.496 -0.351
dim(sim.betas)
[1] 10000 6
Speeding up the process
Now multiply the 10,000 x 6 β̃ matrix by your 1 x 6 vector X̃ of interest:
pred.xb <- setX %*% t(sim.betas)
And untransform them
> pred.prob <- 1 - exp(-exp(pred.xb))
> pred.prob[,1:5]
[1] 0.016858874 0.022674923 0.008876607 0.016426627 0.017223131
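The untransform step above is worth a quick sanity check: 1 − exp(−exp(·)) really is the inverse of the complementary log-log link (the value of xb here is made up):

```r
xb <- -4.2                                   # hypothetical linear predictor value
p  <- 1 - exp(-exp(xb))                      # inverse complementary log-log
stopifnot(all.equal(log(-log(1 - p)), xb))   # cloglog(p) recovers xb
```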
Look at Our Results in Tabular Form
[Figure: density plot of the simulated probability of eviction; x-axis runs from about 0.01 to 0.03]
Look at Our Results
> mean(pred.prob)
[1] 0.01446521
> quantile(pred.prob, prob = c(.025, .975))
2.5% 97.5%
0.00844883 0.02295435
> mean(pred.prob > .02)
[1] 0.0828
Different QOI
What if our QOI was the chance of any eviction from birth to age 9?

P(Ever evicted) = 1 − P(Never evicted)
                = 1 − ∏_{i=1}^{9} (1 − P(Evicted at age i))
                = 1 − (1 − p)^9

All that will change is the very last step.³

³Note: We assume independence between eviction in each year, and a constant risk over time. This corresponds to an exponential survival model.
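The algebra above can be checked numerically with a made-up yearly risk p; multiplying nine survival probabilities agrees with the closed form:

```r
p <- 0.0145                            # hypothetical constant yearly eviction risk
never <- prod(rep(1 - p, 9))           # P(never evicted) over nine years
stopifnot(all.equal(1 - never, 1 - (1 - p)^9))
1 - never                              # about 0.12
```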
Different QOI
We already estimated the sampling distribution of p and stored samples from this distribution in the vector pred.prob. Now we can just transform them!

P(Ever evicted_i) = 1 − (1 − p_i)^9

> cum.prob <- 1 - (1 - pred.prob) ^ 9
> mean(cum.prob)
[1] 0.1224441

12%! The probability of eviction looks much higher than we had thought!
Look at Our Results For Both QOIs
[Figures: density of the probability of eviction (x-axis about 0.01 to 0.03) and density of the cumulative probability of eviction by age 9 (x-axis about 0.1 to 0.3)]
Quantities of interest matter in continuous cases as well
Example: Modeling log income
Modeling log income
We want to model the effect of college (D) on earnings (Y), net of age (X).
Let's get some data! http://cps.ipums.org
Causal identification
We state an ignorability assumption:

{Y(0), Y(1)} ⊥⊥ D | X

[DAG: X = Age points to both D and Y; D points to Y]

This is our identification strategy, but it says nothing about estimation.
Estimation via GLMs
1. Specify a distribution for Y ... LogNormal!
   Y ∼ LogNormal(µ, σ²)
2. Specify a linear predictor: Xβ
3. Specify a link function: log(µ) = Xβ
4. Estimate parameters via maximum likelihood
5. Simulate quantities of interest
Likelihood
L(β, σ² | Y) ∝ f(Y | β, σ²)
            = ∏_{i=1}^{n} f(Y_i | β, σ²)
            = ∏_{i=1}^{n} (1 / (Y_i σ √(2π))) exp( −(ln(Y_i) − µ)² / (2σ²) ),

where µ = X_i β from the link function.
Log likelihood
ℓ(β, σ² | Y) = ln[ ∏_{i=1}^{N} f(Y_i | β, σ²) ]

             = ln[ ∏_{i=1}^{N} (1 / (Y_i σ √(2π))) exp( −(ln(Y_i) − X_i β)² / (2σ²) ) ]

             = ∑_{i=1}^{N} ln[ (1 / (Y_i σ √(2π))) exp( −(ln(Y_i) − X_i β)² / (2σ²) ) ]

             = ∑_{i=1}^{N} ( −ln(Y_i) − ln(σ) − ln(√(2π)) + ln[ exp( −(ln(Y_i) − X_i β)² / (2σ²) ) ] )

             = ∑_{i=1}^{N} ( −ln(Y_i) − ln(σ) − ln(√(2π)) − (ln(Y_i) − X_i β)² / (2σ²) )

             ≐ ∑_{i=1}^{N} ( −ln(σ) − (ln(Y_i) − X_i β)² / (2σ²) ),

where the last step drops terms that involve neither β nor σ, so they do not affect the maximizer.
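A small check with made-up numbers confirms that the simplified kernel in the last line differs from the full lognormal log density only by that additive constant:

```r
y  <- c(20000, 35000, 61000)          # made-up earnings
mu <- log(30000); sigma <- 1          # made-up parameter values
full   <- sum(dlnorm(y, meanlog = mu, sdlog = sigma, log = TRUE))
kernel <- sum(-log(sigma) - (log(y) - mu)^2 / (2 * sigma^2))
const  <- sum(-log(y)) - length(y) * log(sqrt(2 * pi))
stopifnot(all.equal(full, kernel + const))
```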
Coding our log likelihood function
∑_{i=1}^{N} ( −ln(σ) − (ln(Y_i) − X_i β)² / (2σ²) )

logNormal.log.lik <- function(par, X, y) {
  beta <- par[-length(par)]         # all but the last element: coefficients
  sigma2 <- exp(par[length(par)])   # last element is log(sigma^2), so exponentiate
  log.lik <- sum(-log(sqrt(sigma2)) -
                   ((log(y) - X %*% beta) ^ 2) /
                   (2 * sigma2))
  return(log.lik)
}
Finding the MLE
X <- model.matrix(~ college + age, data = d)
y <- d$incwage
opt <- optim(par = rep(0, ncol(X) + 1),    # starting values: betas, then log(sigma^2)
             fn = logNormal.log.lik,
             y = y,
             X = X,
             control = list(fnscale = -1), # fnscale = -1 tells optim to maximize
             method = "BFGS",
             hessian = TRUE)
Extract the MLE
> opt$par
[1] 8.84550213 0.72904517 0.03270657 -0.03216774

See that it matches what we get with lm():
> lm.fit <- lm(log(incwage) ~ college + age,
+              data = d)
> coef(lm.fit)
(Intercept) collegeTRUE         age
 8.84550213  0.72904517  0.03270657

Why is that last term negative? Because it's the log of σ²!
See how σ² matches
> summary(lm.fit)

Call:
lm(formula = log(incwage) ~ college + age, data = d)
.......other output....
Residual standard error: 0.9841 on 68932 degrees of freedom

We estimated σ² = e^γ = e^(−0.03216774):
> exp(opt$par[4])
[1] 0.9683441

It matches: √0.9683 ≈ 0.984, the residual standard error.
Variance-covariance matrix matches
We know how to calculate the variance-covariance matrix: the inverse of the negative Hessian!
> vcov.optim <- -solve(opt$hessian)
> vcov.optim
[,1] [,2] [,3] [,4]
[1,] 1.938105e-04 -6.974235e-06 -4.762674e-06 1.171351e-13
[2,] -6.974235e-06 6.311970e-05 -4.031325e-07 1.714362e-14
[3,] -4.762674e-06 -4.031325e-07 1.316824e-07 2.028694e-14
[4,] 1.171351e-13 1.714362e-14 2.028694e-14 2.901283e-05
And we can compare that to the canned version...
> vcov(lm.fit)
(Intercept) collegeTRUE age
(Intercept) 1.938189e-04 -6.974537e-06 -4.762880e-06
collegeTRUE -6.974537e-06 6.312244e-05 -4.031500e-07
age -4.762880e-06 -4.031500e-07 1.316881e-07
....which matches!
What is the effect of college on earnings?
Our model can easily estimate the effect of college on log earnings. But we want the effect of college on earnings.

One could try to interpret the lognormal regression coefficients. We estimated β_College = 0.729.

This means that having a college degree, holding age fixed, multiplies earnings by a factor of e^0.729 ≈ 2.07.
Student: Can't we just exponentiate things?
No! We run into Jensen's inequality:

E(Y) = E(e^(log(Y))) ≥ e^(E(log(Y)))

So we can't just exponentiate predicted values. What can we do instead? Simulation!
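A quick simulation with made-up parameter values shows how large the Jensen gap can be for lognormal data:

```r
set.seed(504)
log.y <- rnorm(1e5, mean = 10, sd = 1)   # hypothetical log earnings
mean(exp(log.y))                         # E(Y): near exp(10 + 1/2), about 36,000
exp(mean(log.y))                         # exp(E(log Y)): near exp(10), about 22,000
```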
How wrong can you be?
[If time allows, we can walk through this in R]

Using the naive approach of exponentiating things, we would find an effect of college on earnings of $25,181.27.

Using simulation, the actual average treatment effect is $43,266.67!

Why the discrepancy? (Draw on the board.)
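One way to see the discrepancy: the mean of a lognormal outcome is

E(Y | X) = exp(Xβ + σ²/2),

so exponentiating Xβ alone omits the exp(σ²/2) term. With our estimate σ̂² ≈ 0.968, exp(σ̂²/2) ≈ 1.62, so naive exponentiated predictions understate expected earnings by roughly that factor; averaging effects over the covariate distribution (rather than plugging in mean covariates) widens the gap further.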
Student: Ok. But since there were no interactions in the model, we don't have to average over the population, right?

No! Effect sizes depend on the values chosen for the rest of the covariates.
Different effect sizes for different groups!

                  Effect       2.5%      97.5%
20-year-olds   23,276.88  19,466.92  27,494.30
50-year-olds   62,319.15  51,498.74  73,129.50

Since 50-year-olds have higher predicted earnings to begin with, multiplying by a factor of e^0.729 ≈ 2.07 increases their earnings by more dollars.
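A rough sketch of where these group effects come from, using only the point estimates from earlier slides (this ignores estimation uncertainty, which the full simulation incorporates):

```r
# Point estimates from the slides; sigma2 is exp(opt$par[4]), about 0.968
b0 <- 8.8455; b.college <- 0.729; b.age <- 0.0327; sigma2 <- 0.968

# Lognormal mean: E(Y | X) = exp(X beta + sigma2 / 2)
effect.at.age <- function(age) {
  exp(b0 + b.age * age + sigma2 / 2) * (exp(b.college) - 1)
}
effect.at.age(20)    # about 23,000, matching the table
effect.at.age(50)    # about 62,000
```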
Student: Ok, so now I see that I need to carefully specify and simulate my quantity of interest, involving both estimation uncertainty and fundamental uncertainty.

Which of those goes away as the sample size grows?
Estimation uncertainty disappears in large samples
Each row section below shows the difference in earnings (college − noncollege) for 50-year-olds.

$`N = 100`
                                          2.5%     97.5%
Difference in any two 50-year-olds   -186146.3  890301.0
Average difference                     53387.9  249905.7

$`N = 1,000`
                                          2.5%     97.5%
Difference in any two 50-year-olds  -169635.63 479623.19
Average difference                    51505.21  85801.36

$`N = 50,000`
                                          2.5%     97.5%
Difference in any two 50-year-olds  -177369.77 459907.68
Average difference                    52190.08  73289.31

The expected difference is more precisely estimated with larger sample sizes, but fundamental uncertainty makes it always difficult to make precise predictions about the difference between any two actual individuals.
Conclusion: Getting Quantities of Interest
How to present results in a better format than just coefficients and standard errors:

1 Write out your model and estimate β̂_MLE and the Hessian
2 Simulate from the sampling distribution of β̂_MLE to incorporate estimation uncertainty
3 Multiply these simulated β̃s by some covariates in the model to get X̃β
4 Plug X̃β into your inverse link function, g⁻¹(X̃β), to put it on the same scale as the parameter(s) in your stochastic function
5 Use the transformed g⁻¹(X̃β) to take thousands of draws from your stochastic function and incorporate fundamental uncertainty
6 Store the mean of these simulations, E[y|X]
7 Repeat steps 2 through 6 thousands of times
8 Use the results to make fancy graphs and informative tables
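For the lognormal model, the recipe above can be sketched in a few lines (assuming the opt object from optim() and a 1 x k covariate row x.tilde, as in the earlier slides):

```r
library(mvtnorm)

# Step 2: simulate parameters to capture estimation uncertainty
sims   <- rmvnorm(1000, mean = opt$par, sigma = -solve(opt$hessian))
betas  <- sims[, -ncol(sims)]        # coefficient draws
sigma2 <- exp(sims[, ncol(sims)])    # last column is log(sigma^2)

# Steps 3-4: linear predictor on the log scale
mu <- betas %*% t(x.tilde)

# Steps 5-7: thousands of draws from the stochastic component, then average
ev <- sapply(seq_len(nrow(sims)), function(s) {
  mean(rlnorm(1000, meanlog = mu[s], sdlog = sqrt(sigma2[s])))
})

mean(ev)                             # expected earnings at x.tilde
quantile(ev, c(.025, .975))          # interval reflecting estimation uncertainty
```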