Multilevel logistic regression with outcome uncertainty
Leo Bastos
Scientific Computing Program (PROCC), Oswaldo Cruz Foundation (Fiocruz)
Rio de Janeiro, Brazil
Funding: CNPq and FAPERJ
April 8, 2014
Leo Bastos (PROCC-Fiocruz) Multilevel logistic with outcome uncertainty UFRJ 1 / 37
Outline
1 Motivation
    Hard-to-reach populations
    National crack users survey
2 Model
    Inference, priors and implementation
3 Applications
    Detention among crack users in Rio de Janeiro
    HIV among crack users in Rio de Janeiro
4 Work in progress
Motivation
Our group studies hard-to-reach populations, such as men who have sex with men (MSM), female sex workers (FSW), heavy drug users, and crack users.
How many are there?
    Obtain indirect information from the general population.
Who are they?
    Sample directly from these populations.
How many are there?
Sample from the general population.
Ask "how many Xs do you know?" for several populations X.
Network degree can be estimated.
Prevalence of hard-to-reach populations can be estimated.
Dealing with complex samples (?) (Si, Patel, and Gelman, 2015)
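The steps above match what is often called the network scale-up approach. The sketch below shows the basic scale-up estimator in Python as an illustration only: the slides do not say which estimator the group uses, and the function names, respondent counts, and population sizes are all hypothetical.

```python
def estimate_degrees(known_counts, known_sizes, total_population):
    """Estimate each respondent's personal network size (degree).

    known_counts[i][k]: members of known population k that respondent i reports knowing.
    known_sizes[k]: size of known population k (for example, from a census).
    """
    total_known = sum(known_sizes)
    return [total_population * sum(row) / total_known for row in known_counts]


def scale_up_size(hidden_counts, degrees, total_population):
    """Basic scale-up estimate of the size of the hidden population."""
    return total_population * sum(hidden_counts) / sum(degrees)


# Hypothetical toy survey: 3 respondents, 2 known populations.
N = 1_000_000
known_sizes = [10_000, 20_000]
known_counts = [[2, 1], [4, 2], [0, 3]]
hidden_counts = [1, 0, 2]  # reported acquaintances in the hard-to-reach group

degrees = estimate_degrees(known_counts, known_sizes, N)
hidden_size = scale_up_size(hidden_counts, degrees, N)
prevalence = hidden_size / N
print(degrees, hidden_size, prevalence)
```

The estimator ignores the survey weights, which is exactly the complex-sample issue flagged in the last bullet.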
Who are they?
Usual sampling methods cannot be used.
Alternative sampling methods:
Respondent-driven sampling (RDS)
    Chain-referral sampling; network degree is reported.
    Established point estimators.
    There is no model-based approach for RDS.
Time-location sampling (TLS)
    Complex probabilistic sample.
    Requires prior knowledge about the population's behaviour.
    Hierarchical modelling can be done.
Motivation
In epidemiology, logistic regression is a standard tool.
Coefficients are directly interpretable: exponentiated, they are odds ratios.
It is usually applied to survey data.
However:
    Data may lack accuracy
        Rapid disease tests
        Self-reported responses
    Complex samples (cluster and/or stratified samples)
    Natural "levels" in the study
National crack users survey
Crack cocaine is the freebase form of cocaine that can be smoked.
The Brazilian crack users survey is the largest survey of crack users (n = 7381 users in crack scenes).
SENAD/MJ (Ministry of Justice) and Fiocruz
Complex sample for a hard-to-reach population
    41 geographical strata: capitals (26+1), metropolitan regions (9), rest of Brazil by region (5)
    Time-location sampling
        Time: days of the week (7) and shifts (3)
        Location: crack scenes previously mapped
        Users: inverse probability scheme
Data
    Socio-demographic data and risk behaviour for infectious diseases
    Rapid tests for HIV, hepatitis C, and TB
    Crime, access to the health system, drug use behaviour, etc.
The first results were published in a freely available book.
Multilevel logistic regression with outcome uncertainty
$Y_i \in \{0, 1\}$ is the result of a diagnostic test for patient $i$.
$Z_i \in \{0, 1\}$ is the true disease status (unknown).
$\gamma_s$ and $\gamma_e$ are the sensitivity and specificity of the diagnostic test.
Hence, the probability of a positive test result is
\[ P(Y_i = 1) = \pi_i = \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e), \]
where $\theta_i = P(Z_i = 1)$ is the probability that patient $i$ truly has the disease.
We observe $Y_i$ but want to make inferences about $Z_i$, i.e. about $\theta_i$.
Usually, there is some reliable information about the pair $(\gamma_s, \gamma_e)$.
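To see what the misclassification formula implies in practice, the following minimal Python sketch evaluates the probability of a positive test and, via Bayes' rule, the probability that a patient is truly diseased given the test result. The function names and the numerical values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def observed_positive_prob(theta, sens, spec):
    """P(Y = 1): true positives plus false positives."""
    return theta * sens + (1.0 - theta) * (1.0 - spec)


def prob_diseased_given_result(theta, sens, spec, y):
    """P(Z = 1 | Y = y) by Bayes' rule."""
    pi = observed_positive_prob(theta, sens, spec)
    if y == 1:
        return theta * sens / pi
    return theta * (1.0 - sens) / (1.0 - pi)


# Hypothetical values: 5% true prevalence, sensitivity 0.95, specificity 0.90.
theta, sens, spec = 0.05, 0.95, 0.90
pi = observed_positive_prob(theta, sens, spec)
ppv = prob_diseased_given_result(theta, sens, spec, y=1)
print(pi, ppv)
```

With these numbers about 14% of tests come back positive although only 5% of patients are diseased, and only about one positive result in three is a true case. This is why treating the test result as the true status can badly distort a logistic regression at low prevalence.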
The model can be written as
\[ Y_i \sim \mathrm{Bernoulli}(\pi_i) \]
\[ \pi_i = \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e) \]
\[ \operatorname{logit}(\theta_i) = \alpha_{j[i]} + x_i^T \beta_{j[i]} \]
using the multilevel notation of Gelman and Hill (2007).
Outcome uncertainty: Magder and Hughes (1997), McInturff et al. (2004)
Depending on the group-level structure, the distribution of the group-level parameters can encode
    Independence between groups (iid)
    Temporal dependence (RW, ARMA, DLM)
    Spatial dependence (CAR)
    You name it...
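A small simulation can make the generative structure of the model concrete. The Python sketch below covers only the simplest case listed above (iid group intercepts) and simplifies further by using a single binary covariate with a common slope rather than group-specific slopes. All names and parameter values are hypothetical.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_groups, n_per_group, mu_alpha, sigma_alpha, beta, sens, spec, seed=1):
    """Simulate (group, x, z, y) tuples; z is the true status, y the test result."""
    rng = random.Random(seed)
    # iid group-level intercepts: alpha_j ~ Normal(mu_alpha, sigma_alpha^2)
    alphas = [rng.gauss(mu_alpha, sigma_alpha) for _ in range(n_groups)]
    rows = []
    for j in range(n_groups):
        for _ in range(n_per_group):
            x = rng.randint(0, 1)                    # one binary covariate
            theta = inv_logit(alphas[j] + beta * x)  # P(Z = 1)
            z = 1 if rng.random() < theta else 0     # true status (unobserved in practice)
            p_pos = sens if z == 1 else 1.0 - spec   # P(Y = 1 | Z = z)
            y = 1 if rng.random() < p_pos else 0     # observed test result
            rows.append((j, x, z, y))
    return rows


rows = simulate(n_groups=10, n_per_group=50, mu_alpha=-2.0, sigma_alpha=0.5,
                beta=1.0, sens=0.95, spec=0.90)
print(sum(y for _, _, _, y in rows) / len(rows))  # observed positive rate
print(sum(z for _, _, z, _ in rows) / len(rows))  # true prevalence in the sample
```

Drawing the true status first and then the test result gives the same marginal probability of a positive test as the mixture formula for the positive-test probability in the model.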
Inference, priors and implementation
Frequentist solution
    For the classical logistic model with sensitivity and specificity both known, van den Hout et al. (2007) propose a simple implementation based on GLM.
    Analogously, a multilevel model can be adapted (the iid case works in lme4).
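To illustrate the frequentist case with known sensitivity and specificity, the Python sketch below writes down the log-likelihood under the modified link and, for the special case of a single binary covariate, obtains the maximum likelihood estimates in closed form. This is not the GLM-based implementation of van den Hout et al. (2007), whose details are not given in the slides; the function names and the toy data are hypothetical.

```python
import math


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def log_likelihood(alpha, beta, data, sens, spec):
    """Log-likelihood of logit(theta) = alpha + beta * x for (x, y) pairs with binary y."""
    total = 0.0
    for x, y in data:
        theta = inv_logit(alpha + beta * x)
        pi = theta * sens + (1.0 - theta) * (1.0 - spec)  # P(Y = 1)
        total += math.log(pi) if y == 1 else math.log(1.0 - pi)
    return total


def closed_form_mle(data, sens, spec):
    """MLE for one binary covariate: correct each group's positive rate, then take logits.

    Valid only when both corrected rates fall strictly between 0 and 1.
    """
    corrected = {}
    for level in (0, 1):
        ys = [y for x, y in data if x == level]
        p_obs = sum(ys) / len(ys)
        corrected[level] = (p_obs - (1.0 - spec)) / (sens + spec - 1.0)
    alpha = logit(corrected[0])
    beta = logit(corrected[1]) - alpha
    return alpha, beta


# Hypothetical toy data: 15/100 positives when x = 0, 30/100 when x = 1.
data = [(0, 1)] * 15 + [(0, 0)] * 85 + [(1, 1)] * 30 + [(1, 0)] * 70
sens, spec = 0.90, 0.95
alpha_hat, beta_hat = closed_form_mle(data, sens, spec)
print(alpha_hat, beta_hat, log_likelihood(alpha_hat, beta_hat, data, sens, spec))
```

With more covariates there is no closed form, and the same log-likelihood would have to be maximised numerically.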
Bayesian approach
    The key feature of the Bayesian approach is the priors.
    Gelman et al. (2008) proposed weakly informative priors for logistic models.
    Fong et al. (2010) described how to elicit weakly informative priors for random effects.
    The difficulty lies in the specificity and sensitivity parameters (Fox et al. 2005, Chu et al. 2006).
Inference is very sensitive to the specificity and sensitivity parameters.
Either fix their values or set (very) informative priors:
\[ \gamma_s \sim \mathrm{Beta}(a_{\gamma_s}, b_{\gamma_s}), \qquad \gamma_e \sim \mathrm{Beta}(a_{\gamma_e}, b_{\gamma_e}) \]
On the bright side, we have information about $\gamma_s$ and $\gamma_e$.
Elicitation:
    Fix a prior point estimate $m^* = E[\gamma] = a/(a + b)$;
    Fix a prior sample size $n^* = a + b$;
    Solve for $(a, b)$ in terms of $(m^*, n^*)$: $a = m^* n^*$, $b = (1 - m^*) n^*$.
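The elicitation steps above amount to two lines of arithmetic. A minimal Python sketch, with a hypothetical function name and hypothetical input values (a prior guess of 0.95 worth 200 prior observations):

```python
import math


def beta_from_mean_and_size(m_star, n_star):
    """Return Beta parameters (a, b) with mean m_star and prior sample size n_star = a + b."""
    if not 0.0 < m_star < 1.0:
        raise ValueError("m_star must lie strictly between 0 and 1")
    if n_star <= 0.0:
        raise ValueError("n_star must be positive")
    return m_star * n_star, (1.0 - m_star) * n_star


def beta_sd(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1.0)))


a, b = beta_from_mean_and_size(m_star=0.95, n_star=200)
print(a, b, beta_sd(a, b))
```

Here the prior is roughly Beta(190, 10), with a standard deviation of about 0.015. Increasing the prior sample size tightens the prior around the point estimate, which is how a "(very) informative" prior is obtained.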
Inference
All inference is based on the posterior distribution
\[ p(\alpha, \beta, \gamma_s, \gamma_e, \psi \mid Y, x) \tag{1} \]
Inference methods
    Sampling via MCMC (Gelfand & Smith, 1990)
        Numerical approximation of the joint posterior;
        Computationally intensive;
        BUGS and Stan may help.
    Approximate posterior marginals via INLA (Rue et al., 2009)
        INLA is fast and accurate;
        Feasible for large data sets.
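As a toy illustration of the sampling route, the Python sketch below runs a random-walk Metropolis sampler for a single prevalence with known sensitivity and specificity and a uniform prior. This is one simple MCMC scheme, not the samplers used by BUGS or Stan, and it leaves out covariates and group effects entirely. The counts, test characteristics, and tuning values are hypothetical.

```python
import math
import random


def log_post(phi, y, n, sens, spec):
    """Log posterior of phi = logit(theta), up to a constant, with a uniform prior on theta."""
    theta = 1.0 / (1.0 + math.exp(-phi))
    pi = theta * sens + (1.0 - theta) * (1.0 - spec)
    log_lik = y * math.log(pi) + (n - y) * math.log(1.0 - pi)
    # Jacobian of theta -> phi, log(theta * (1 - theta)), in a numerically stable form
    log_jac = -abs(phi) - 2.0 * math.log1p(math.exp(-abs(phi)))
    return log_lik + log_jac


def metropolis(y, n, sens, spec, n_iter=20000, burn_in=2000, step=0.3, seed=42):
    """Random-walk Metropolis on the logit scale; returns post-burn-in draws of theta."""
    rng = random.Random(seed)
    phi = -2.0
    lp = log_post(phi, y, n, sens, spec)
    draws = []
    for it in range(n_iter):
        proposal = phi + rng.gauss(0.0, step)
        lp_prop = log_post(proposal, y, n, sens, spec)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            phi, lp = proposal, lp_prop
        if it >= burn_in:
            draws.append(1.0 / (1.0 + math.exp(-phi)))
    return draws


# Hypothetical data: 40 positive tests out of 500.
draws = metropolis(y=40, n=500, sens=0.95, spec=0.98)
post_mean = sum(draws) / len(draws)
print(round(post_mean, 3))
```

The posterior mean should sit near the misclassification-corrected rate, (0.08 - 0.02) / 0.93, rather than the raw positive rate of 0.08. Even this one-parameter sampler needs thousands of likelihood evaluations, which hints at why MCMC becomes computationally intensive for the full multilevel model.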
INLA in a nutshell
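Integrated nested Laplace approximation (INLA)
Works for the class of latent Gaussian models, with likelihood $[y \mid \theta, \phi]$. Here $\theta$ is the latent Gaussian field and $\phi$ the hyperparameters, and $\pi(\cdot)$ denotes a density; they are not the disease probability and positive-test probability of the model slides.
Let the posterior distribution be $\pi(\theta, \phi \mid y)$.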
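Posterior marginal distributions
\[ \pi(\theta_i \mid y) = \int \pi(\theta_i \mid \phi, y)\, \pi(\phi \mid y)\, d\phi \]
\[ \pi(\phi_k \mid y) = \int \pi(\phi \mid y)\, d\phi_{-k} \]
Approximated by
\[ \pi(\theta_i \mid y) \approx \int \tilde{\pi}(\theta_i \mid \phi, y)\, \tilde{\pi}(\phi \mid y)\, d\phi \]
\[ \pi(\phi_k \mid y) \approx \int \tilde{\pi}(\phi \mid y)\, d\phi_{-k} \]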
INLA details
The sensitivity-specificity model has been implemented (with support from Håvard Rue).
Bernoulli problem in INLA (Ferkingstad & Rue, 2015, March)
WAIC has been implemented, but is not used here.
Benchmark against MCMC to be done via simulation studies.
Detention among crack users in Rio de Janeiro
Goal: find risk factors for detention (in the past 12 months) among crack users in Rio de Janeiro.
Question asked: "Have you been detained (stayed less than one day in a police station) in the past year? (Yes, No)"
Explanatory variables
    Polydrug user: {yes, no}
    Rehab: {yes, no}
    Illicit money to obtain drugs: {yes, no}
    Homeless: {yes, no}
    Years of study: {<8, ≥8}
    Race: {White, Black, "Pardo", others}
    Gender: {male, female}
    Age: {<31, 31+}
The survey design was accounted for in the model through random effects for strata (capital or MR) and for the TLS level (crack scene and day/shift).
Multilevel model assumptions
    The time and location clusters were chosen by simple random sampling.
    The sampling design is ignorable at the user level.
Multilevel logistic regression
The complete model is
\[ \mathrm{Det}_i \sim \mathrm{Bernoulli}(\theta_i) \]
\[ \operatorname{logit}(\theta_i) = \alpha_{geo[i]} + \alpha^{*}_{tls[i]} + x_i \beta \]
Priors (weakly informative)
    Coefficients: Cauchy(0, 2.5) (Gelman et al., 2008)
    Random effects: Cauchy(0, 10) (Fong et al., 2010)
930 crack users were interviewed in Rio de Janeiro: 544 in the capital and 386 in the metropolitan region (MR).
17 users were excluded due to missing data (15 in the capital and 2 in the MR).
[Figure: Multilevel logistic regression model]
DIC - Deviance information criterion
Model            DIC
Crude            1139.827
Stratum          1136.454
Stratum + TLS    1092.895
HIV among crack users in Rio de Janeiro
Goal: find risk factors for HIV among crack users in Rio de Janeiro.
Explanatory variables
    Gender: {male, female}
    Age: {<31, 31+}
    Received money or drugs in exchange for sex (last 30 days): {yes, no}
930 crack users were interviewed in Rio de Janeiro: 544 in the capital and 386 in the MR.
345 users either refused the test or had inconclusive results and were excluded.

Status    Capital    MR
HIV-      377        178
HIV+      22         8
Total     399        186
The survey design was accounted for in the model through random effects for strata (capital or MR) and for the TLS level (crack scene and day/shift).
The effect of "received money or drugs in exchange for sex (last 30 days)" may vary between the capital and the MR, so a random coefficient was considered.
Multilevel logistic regression with outcome uncertainty
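The complete model is
\[ \mathrm{HIV}_i \sim \mathrm{Bernoulli}(\pi_i) \]
\[ \pi_i = \theta_i \gamma_s + (1 - \theta_i)(1 - \gamma_e) \]
\[ \operatorname{logit}(\theta_i) = \alpha_{geo[i]} + \alpha^{*}_{tls[i]} + \eta_i \]
\[ \eta_i = \beta_1 \mathrm{GenderFem}_i + \beta_2 \mathrm{Age31p}_i + \beta_{geo[i]} \mathrm{Money4SexYes}_i \]
$\gamma_s = 0.9999$ and $\gamma_e = 0.989$, taken from the rapid HIV test instructions.
Priors (weakly informative)
    Coefficients: Cauchy(0, 2.5)
    Random effects: Cauchy(0, 10)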
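As a rough sanity check on how much the test error matters for these data, the Python sketch below applies a crude pooled correction, often called the Rogan-Gladen estimator, which inverts the positive-test probability equation for a single overall prevalence. It is not the multilevel model above: it pools the capital and the MR and ignores covariates and the sampling design. The function name is hypothetical; the counts and test characteristics are the ones reported in the slides.

```python
def rogan_gladen(n_positive, n_tested, sens, spec):
    """Correct an apparent prevalence for imperfect sensitivity and specificity."""
    apparent = n_positive / n_tested
    corrected = (apparent - (1.0 - spec)) / (sens + spec - 1.0)
    # Truncate to [0, 1]: with few positives the raw correction can go negative.
    return apparent, min(1.0, max(0.0, corrected))


# Counts from the HIV table: 22 + 8 positives out of 399 + 186 conclusive tests.
apparent, corrected = rogan_gladen(n_positive=22 + 8, n_tested=399 + 186,
                                   sens=0.9999, spec=0.989)
print(f"apparent: {apparent:.4f}, corrected: {corrected:.4f}")
```

The apparent prevalence is about 5.1%, and the corrected value about 4.1%. Even a specificity of 0.989 implies roughly six expected false positives among the 30 positive tests, which is a sizeable share when the outcome is this rare.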
[Figure: Logistic regression model]
[Figure: Multilevel logistic regression model]
DIC - Deviance information criterion
Model          LR        MLR
Crude          166.97    168.27
Design         157.03    148.77
SS + Design    156.38    150.40
Work in progress
Multilevel modelling allows the study design to be included in the model.
Outcome uncertainty can also be dealt with.
Already implemented in the INLA testing version:
> inla(model.equation, family="testbinomial1",...)
Benchmark against standard MCMC implemented in Stan (Hoffman & Gelman, 2012).
Estimates for $\gamma_s$ and $\gamma_e$ depend heavily on the prior choice, thus demanding a more detailed prior sensitivity assessment.
Investigate performance under various scenarios with a comprehensive simulation study.
Methodological paper in progress...