Algorithms for sampling from the Bayesian posterior distribution for Four Parameter Logistic and Sigmoid Emax models
Nitin Patel*, Chris Jennisonǂ, Jane Templeǂ, Charles Liu*
* Cytel, Inc., USA ǂ University of Bath, UK
Outline
• Motivation
• A Metropolis-Hastings algorithm
• A Gibbs Sampling algorithm currently being implemented in a software product (Compass)
• A Direct Monte Carlo algorithm
• Extensions and work in progress
In this talk we will focus on 4PL models, as algorithms and results for Sigmoid Emax models are very similar.
ICSA 2012 Applied Statistics Symposium, Boston
Motivation
• 4 Parameter Logistic (4PL) and Sigmoid Emax models are very popular for modeling dose response in clinical trials
• Existing algorithms in Cytel’s Bayesian dose finding trial design software (Compass and CytelSim) for computing posterior distributions for these models use a Metropolis-Hastings method developed by Scott Berry (Berry Consultants). The algorithm has slow convergence to steady state.
• Could it be improved for both speed and ease of use?
– The number of samples from the posterior distributions required for the work described by Jim Bolognese was around 250 million random draws.
• Having different methods for posterior computations facilitates validation of results by comparing answers from distinct algorithms
4PL Model for Dose Response
Four parameter logistic (4PL)
D is dose (or log dose)
Y is response of a subject on dose D
Parameters:
• Minimum response (β)
• Response range (δ)
• Median effective dose (θ)
• Slope parameter (τ), τ > 0
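The model equation on this slide did not survive extraction. A minimal sketch of the mean response, assuming the form implied by the later definition of xj (that is, mean = β + δ / (1 + exp((θ − D)/τ))), might look as follows. The function name `four_pl_mean` is hypothetical, and the parameter values are those of the example given later in the talk.

```python
import numpy as np

def four_pl_mean(dose, beta, delta, theta, tau):
    """Assumed 4PL mean response: beta + delta / (1 + exp((theta - dose) / tau))."""
    dose = np.asarray(dose, dtype=float)
    return beta + delta / (1.0 + np.exp((theta - dose) / tau))

# Doses 0..8 with the "true" parameter values from the example slide.
doses = np.arange(9)
mean_response = four_pl_mean(doses, beta=0.0, delta=1.1, theta=4.0, tau=0.5)
```

Under this assumed form the curve rises from β towards β + δ, and passes through β + δ/2 at D = θ, which matches the description of θ as the median effective dose.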
4PL Model for Dose Response
Very flexible
• Fits diverse range of monotonic dose response curves
• Includes linear, concave, convex and sigmoidal shapes
MCMC: Metropolis-Hastings
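No convenient conjugate prior, so we use Markov chain Monte Carlo (MCMC).
Random-walk Metropolis-Hastings algorithm:
- sample a new point via a random walk;
- if the posterior density at the new point is higher than at the current point, move to the new point;
- else move to the new point with probability equal to the ratio of the new to the current posterior density, and otherwise stay at the current point.
In theory, the chain converges to the posterior distribution, but there are difficulties in practice…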
[Figure: posterior density against parameter value, marking the current point, a higher density proposal and a lower density proposal]
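As an illustration only, a random-walk Metropolis sampler for a single parameter can be sketched as below. It uses the standard acceptance rule (accept with probability min(1, density ratio), so higher density proposals are always accepted). The target here is a standard normal stand-in, not the 4PL posterior, and the names `random_walk_metropolis`, `step_sd` and `standard_normal_log_density` are hypothetical.

```python
import numpy as np

def random_walk_metropolis(log_post, start, step_sd, n_samples, rng):
    """Random-walk Metropolis for one parameter; returns samples and acceptance rate."""
    current = float(start)
    current_lp = log_post(current)
    samples = np.empty(n_samples)
    n_accepted = 0
    for i in range(n_samples):
        # Propose a new point by a Gaussian random-walk step.
        proposal = current + step_sd * rng.normal()
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior density ratio).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            n_accepted += 1
        samples[i] = current  # a rejected proposal repeats the current point
    return samples, n_accepted / n_samples

def standard_normal_log_density(x):
    # Stand-in target (assumption): log density of N(0, 1) up to a constant.
    return -0.5 * x * x

rng = np.random.default_rng(0)
draws, acceptance_rate = random_walk_metropolis(
    standard_normal_log_density, start=5.0, step_sd=1.0, n_samples=5000, rng=rng)
```

The deliberately poor starting value (5.0) shows why early samples are usually discarded as burn-in, and `step_sd` is the random-walk tuning parameter.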
Metropolis-Hastings for independent Normal priors for β, δ, θ, τ and Inverse Gamma for σ²
At each iteration of the MCMC chain, the parameters β, δ, θ, τ are sampled successively from their univariate conditional distributions using a random-walk Metropolis step. Thus this is a ‘Metropolis within Gibbs’ algorithm (Marin and Roberts).
The conditional distribution for σ² is Inverse Gamma, so it is sampled directly using a Gibbs step.
MCMC: Metropolis-Hastings
Disadvantages:
- High autocorrelation between Monte Carlo samples
- Dependence on starting points; requires discarding many burn-in samples
- Long runs to ensure convergence (parameter space is explored adequately)
- Requires “tuning” of proposal density (SD of random walk). Inefficient if too many samples are rejected or accepted in the random walk
Example: True dose response
Parameter Value
β 0
δ 1.1
θ 4
τ 0.5
σ 2 (known)
Example: one trial simulation
Doses N Obs. Mean Resp.
0 30 -0.04
1 30 -0.08
2 30 0.06
3 30 -1.02
4 30 0.52
5 30 1.34
6 30 0.9
7 30 0.89
8 30 1.79
[Figure: true and observed dose response]
MCMC: Convergence issues
[Figure: Metropolis-Hastings trace plot; (500) burn-in samples discarded]
Dependence on starting points
Autocorrelation
Autocorrelation function of samples
Metropolis-Hastings (Beta)
MCMC: Gibbs
Gibbs Sampling
- Unlike Metropolis-Hastings, all samples are accepted.
- Requires sampling from known full conditional posterior distributions.
Our Gibbs algorithm generates samples in 3 blocks:
1. Sample (β, δ | θ, τ, σ²) ~ MVN(µ, Σ)
2. Sample (θ, τ | β, δ, σ²) ~ grid(θ, τ) *
3. Sample (σ² | β, δ, θ, τ) ~ IG(1/a, b) (same as in MH algorithm)
- back to block 1…
Advantages:
- Lower autocorrelation; less burn-in; no "tuning"; no samples rejected
* “Griddy” Gibbs sampler (Tanner, 1996).
Gibbs Sampling
We assume the following independent priors:
β ~ Normal(μβ, σβ²),
δ ~ Normal(μδ, σδ²),
θ ~ Discrete Uniform(θL, θL+1, ⋯, θU),
Notation:
Let ȳ be the vector of observed mean responses at the D doses
Let W be a diagonal matrix with diagonal elements 1/nj, where nj is the number of subjects on dose Dj
Let xj = {1 + exp((θ − Dj)/τ)}⁻¹ and x = (x1, x2, …, xD)ᵀ
Let X denote the D×2 matrix [1, x]
Posterior Conditional Distributions
Sampling the (β, δ | θ, τ, σ²) block
Sample (β, δ | θ, τ, σ²) from this Bivariate Normal Distribution
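The bivariate normal formula on this slide was lost in extraction. A sketch of this block under the stated notation, assuming ȳ ~ N(X(β, δ)ᵀ, σ²W) and the standard conjugate update for a normal linear model (posterior precision = prior precision + XᵀW⁻¹X/σ²), could look like this. The function name `sample_beta_delta` and the prior values in the usage lines are hypothetical; the data are the observed means from the example trial.

```python
import numpy as np

def sample_beta_delta(y_bar, n_per_dose, doses, theta, tau, sigma2,
                      prior_mean, prior_var, rng):
    """Draw (beta, delta) from its conditional bivariate normal distribution."""
    x = 1.0 / (1.0 + np.exp((theta - doses) / tau))
    X = np.column_stack([np.ones_like(x), x])      # D x 2 design matrix [1, x]
    w_inv = n_per_dose                             # W = diag(1/n_j), so W^-1 = diag(n_j)
    prior_precision = np.diag(1.0 / prior_var)     # independent Normal priors
    precision = prior_precision + (X.T * w_inv) @ X / sigma2
    cov = np.linalg.inv(precision)
    mean = cov @ (prior_precision @ prior_mean + (X.T * w_inv) @ y_bar / sigma2)
    return rng.multivariate_normal(mean, cov), mean, cov

rng = np.random.default_rng(0)
doses = np.arange(9, dtype=float)
n_per_dose = np.full(9, 30.0)
y_bar = np.array([-0.04, -0.08, 0.06, -1.02, 0.52, 1.34, 0.90, 0.89, 1.79])
# Prior mean and variances below are illustrative placeholders, not values from the talk.
draw, post_mean, post_cov = sample_beta_delta(
    y_bar, n_per_dose, doses, theta=4.0, tau=0.5, sigma2=4.0,
    prior_mean=np.zeros(2), prior_var=np.array([100.0, 100.0]), rng=rng)
```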
Conditional Distribution of (θ, τ | β, δ, σ²)
Compute the likelihood for each support point (θk, τl) in the discrete prior.
Since each point is equally likely in the prior, the joint conditional distribution of (θ, τ) is given by the likelihood normalized over all points in the grid.
Sample (θ, τ | β, δ, σ²) from this bivariate discrete distribution
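The grid step can be sketched as follows, assuming a normal likelihood for the dose means (ȳj with variance σ²/nj) and a uniform prior over the grid. The function name `sample_theta_tau` and the grid limits in the usage lines are hypothetical choices for illustration.

```python
import numpy as np

def sample_theta_tau(y_bar, n_per_dose, doses, beta, delta, sigma2,
                     theta_grid, tau_grid, rng):
    """Draw (theta, tau) from the likelihood normalized over the grid."""
    # x has shape (n_theta, n_tau, n_doses): one 4PL curve per grid point.
    x = 1.0 / (1.0 + np.exp(
        (theta_grid[:, None, None] - doses) / tau_grid[None, :, None]))
    resid = y_bar - (beta + delta * x)
    # Log-likelihood up to a constant that does not depend on (theta, tau).
    log_lik = -0.5 * np.sum(n_per_dose * resid**2, axis=-1) / sigma2
    prob = np.exp(log_lik - log_lik.max())         # subtract max for numerical stability
    prob /= prob.sum()                             # uniform prior: posterior is normalized likelihood
    flat_index = rng.choice(prob.size, p=prob.ravel())
    k, l = np.unravel_index(flat_index, prob.shape)
    return theta_grid[k], tau_grid[l], prob

rng = np.random.default_rng(0)
doses = np.arange(9, dtype=float)
n_per_dose = np.full(9, 30.0)
y_bar = np.array([-0.04, -0.08, 0.06, -1.02, 0.52, 1.34, 0.90, 0.89, 1.79])
theta_grid = np.linspace(1.0, 7.0, 30)             # illustrative 30 x 30 grid
tau_grid = np.linspace(0.1, 3.0, 30)
theta, tau, prob = sample_theta_tau(
    y_bar=y_bar, n_per_dose=n_per_dose, doses=doses, beta=0.0, delta=1.1,
    sigma2=4.0, theta_grid=theta_grid, tau_grid=tau_grid, rng=rng)
```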
Conditional Distribution of (σ² | θ, τ, β, δ)
Sample 1/σ² from the Gamma Distribution with parameters α + n/2 and ψ + SSQ/2
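A short sketch of this step, assuming the two Gamma parameters are a shape and a rate, and that SSQ is the sum of squared residuals of the n individual responses about the fitted curve (neither is stated explicitly on the slide). The function name `sample_sigma2` and the numeric inputs are hypothetical.

```python
import numpy as np

def sample_sigma2(ssq, n, alpha, psi, rng, size=None):
    """Draw sigma^2 by sampling the precision 1/sigma^2 from a Gamma distribution."""
    shape = alpha + n / 2.0
    rate = psi + ssq / 2.0
    # NumPy parameterizes the Gamma by shape and scale, where scale = 1 / rate.
    precision = rng.gamma(shape, 1.0 / rate, size=size)
    return 1.0 / precision

rng = np.random.default_rng(0)
# Illustrative inputs: 270 subjects and a residual sum of squares of 1080.
sigma2_draws = sample_sigma2(ssq=1080.0, n=270, alpha=1.0, psi=1.0, rng=rng, size=20000)
```

With these inputs the draws centre near SSQ/n = 4, which is consistent with σ = 2 in the example.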
MCMC: Gibbs
[Figure: Gibbs trace plot]
Autocorrelation function Gibbs (Beta)
Autocorrelation functions for Beta
[Figure: Metropolis-Hastings and Gibbs autocorrelation functions]
Disadvantage of Gibbs: needs starting values; computation time can be slow; burn-in requires trial-and-error
Effective Sample Size
The advantage of Gibbs over MH can be quantified by the effective sample size: the equivalent number of i.i.d. samples. For MCMC (MH & Gibbs), the sampling error is inflated due to autocorrelation.
M: effective sample size
n: original sample size
ρk: autocorrelation at lag k
There are many methods for approximating effective sample size (e.g., batch means).
- Here, we used the effectiveSize() function in the R package coda.
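The formula relating M, n and ρk did not survive extraction; the standard relation is M = n / (1 + 2 Σk ρk), and the sketch below assumes that is what the slide showed. It estimates the sum from sample autocorrelations, truncating at the first non-positive value, which is a simple heuristic and not the spectral method used by effectiveSize() in coda. The function name `effective_sample_size` is hypothetical.

```python
import numpy as np

def effective_sample_size(chain):
    """Estimate M = n / (1 + 2 * sum_k rho_k) from one chain of samples."""
    chain = np.asarray(chain, dtype=float)
    n = chain.size
    centered = chain - chain.mean()
    variance = np.dot(centered, centered) / n
    rho_sum = 0.0
    for k in range(1, n):
        rho_k = np.dot(centered[:-k], centered[k:]) / (n * variance)
        if rho_k <= 0.0:       # heuristic: stop at the first non-positive autocorrelation
            break
        rho_sum += rho_k
    return n / (1.0 + 2.0 * rho_sum)

rng = np.random.default_rng(1)
n = 20000
iid_chain = rng.normal(size=n)                 # independent draws
ar_chain = np.empty(n)                         # strongly autocorrelated AR(1) chain
ar_chain[0] = rng.normal()
for t in range(1, n):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + rng.normal()

ess_iid = effective_sample_size(iid_chain)
ess_ar = effective_sample_size(ar_chain)
```

For the AR(1) chain with coefficient 0.9 the theoretical value is n(1 − 0.9)/(1 + 0.9), roughly 5% of n, whereas the independent chain should give a value close to n.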
Effective Sampling Speed
Example design: 1000 simulated trials; 1 cohort of 270 patients; 8 doses; 1000 steady state samples per trial; grid size: 30 x 30.
Effective sampling speed: rate of generating equivalent number of i.i.d. samples.
Gibbs Sampling is 20 times faster than Metropolis-Hastings
Algorithm            Effective sample size (N)   Computing time (seconds)   Effective sampling speed (seconds/N)
Metropolis-Hastings  18                          118                        6.5
Gibbs                840                         273                        0.325
Ease of Use vs. M-H
• Gibbs sampling does not require tuning of the random walk parameter
• Does require selecting:
– Starting values for (θ, τ)
– Grid values (θmin, θmax), (τmin, τmax) and number of grid points. (In most cases we have found a 30x30 grid is adequate.)
Marginal posterior distribution of (θ, τ | σ²)
It can be shown that, for a uniform discrete prior on θ and τ* :
* Can be easily extended to other discrete priors, e.g. discretized Normal
Direct Monte Carlo (for known σ2)
Direct posterior probability calculations (not a Markov chain, hence avoids MCMC convergence issues altogether; computation is also much faster!):
1. The marginal posterior distribution Pr(θ, τ | D) is calculated for each grid point of (θ, τ) space. From this joint distribution it is easy to obtain the marginal posterior distribution of τ. Sample τ from this distribution using inverse sampling.
2. Calculate the conditional distribution of θ | τ and sample θ from this distribution using inverse sampling.
3. Sample (β, δ | θ, τ, σ²) ~ MVN(µ, Σ) as in Gibbs.
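Steps 1 and 2 can be sketched as below for a joint probability table that has already been computed on the grid. The marginal posterior formula itself did not survive extraction, so `joint_prob` here is a made-up placeholder table, and the function names `inverse_cdf_sample` and `direct_sample_theta_tau` are hypothetical.

```python
import numpy as np

def inverse_cdf_sample(weights, rng):
    """Inverse sampling from a discrete distribution given (possibly unnormalized) weights."""
    cdf = np.cumsum(weights)
    u = rng.uniform() * cdf[-1]                    # scaling by cdf[-1] normalizes implicitly
    index = int(np.searchsorted(cdf, u, side="right"))
    return min(index, len(cdf) - 1)                # guard against floating-point edge cases

def direct_sample_theta_tau(joint_prob, theta_grid, tau_grid, rng):
    """joint_prob[k, l] is the posterior probability of (theta_k, tau_l)."""
    tau_marginal = joint_prob.sum(axis=0)          # step 1: marginal of tau
    l = inverse_cdf_sample(tau_marginal, rng)
    theta_given_tau = joint_prob[:, l]             # step 2: theta | tau, up to a constant
    k = inverse_cdf_sample(theta_given_tau, rng)
    return theta_grid[k], tau_grid[l]

rng = np.random.default_rng(0)
theta_grid = np.linspace(1.0, 7.0, 30)
tau_grid = np.linspace(0.1, 3.0, 30)
# Placeholder joint distribution (assumption): a smooth bump centred at theta=4, tau=0.5.
weights = np.exp(-0.5 * ((theta_grid[:, None] - 4.0) ** 2 + (tau_grid[None, :] - 0.5) ** 2))
joint_prob = weights / weights.sum()
samples = [direct_sample_theta_tau(joint_prob, theta_grid, tau_grid, rng) for _ in range(1000)]
```

Each draw is independent of the previous one, so there are no starting values, no burn-in and no autocorrelation. Step 3 would then draw (β, δ) from its bivariate normal distribution given the sampled (θ, τ).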
Notation
β̂uv, δ̂uv are WLS estimates of intercept and slope in linear regression of ȳ on xuv
Direct Monte Carlo
[Figure: Direct Monte Carlo trace plot]
No dependence on starting points
No autocorrelation
No need for burn-in
Effective Sampling Speed
Example design: 1000 simulated trials; 1 cohort of 270 patients; 8 doses; 1000 steady state samples per trial; grid size: 30 x 30.
Effective sampling speed: rate of generating equivalent number of i.i.d. samples.
Direct sampling is 5 times faster than Gibbs, and 100 times faster than Metropolis-Hastings!
Algorithm            Effective sample size (N)   Computing time (seconds)   Effective sampling speed (seconds/N)
Metropolis-Hastings  18                          118                        6.5
Gibbs                840                         273                        0.325
Direct               1000                        66                         0.066
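The speed-up factors quoted on this slide follow directly from the table. A small check, using only the tabulated effective sample sizes and computing times (variable names are illustrative):

```python
# Effective sample size and computing time (seconds) from the table.
results = {
    "Metropolis-Hastings": {"ess": 18, "seconds": 118},
    "Gibbs": {"ess": 840, "seconds": 273},
    "Direct": {"ess": 1000, "seconds": 66},
}

# Seconds needed per effective (i.i.d.-equivalent) sample.
seconds_per_effective_sample = {
    name: r["seconds"] / r["ess"] for name, r in results.items()
}

gibbs_vs_mh = (seconds_per_effective_sample["Metropolis-Hastings"]
               / seconds_per_effective_sample["Gibbs"])
direct_vs_gibbs = (seconds_per_effective_sample["Gibbs"]
                   / seconds_per_effective_sample["Direct"])
direct_vs_mh = (seconds_per_effective_sample["Metropolis-Hastings"]
                / seconds_per_effective_sample["Direct"])
```

The ratios come out at roughly 20, 5 and 100 respectively, in line with the rounded figures stated on the slides.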
Autocorrelation function
Direct (Beta)
Ease of Use: Direct vs. Gibbs
• Direct sampling does not require selecting starting values, burn-in length or steady state sampling length to account for autocorrelation in samples
• Does require:
– Grid values (θmin, θmax), (τmin, τmax) and number of grid points.
– If grid is too small: we miss significant posterior parameter values.
– If grid is too large: we have computational inefficiency. (In most cases we have found a 30x30 grid is adequate.)
• Number of points in grid should be chosen to approximate the continuous distribution of (θ, τ) reasonably well
Design Simulation vs. Data Analysis
Design simulation:
• We can use true values to center the grid and as starting values; this also facilitates selection of a reasonable range
• Need to simulate data for many trials within multiple scenarios to evaluate operating characteristics of design
Data analysis:
• Need to estimate likely range of true data-generating values before specifying limits
• Actual observed data used, so computation is for just one data set
Summary and future work
• The Gibbs sampling method outperforms Metropolis-Hastings for posterior samples for 4PL and Sigmoid Emax models in speed and also requires less effort in specification by the user
• The Direct sampling method is easy to use as it does not require tuning parameters and convergence assessment. It is better than Gibbs sampling for known σ but the method needs to be extended to handle unknown σ (work in progress . . .)
• Automating grid selection will make the Direct sampling method very straightforward to use (work in progress . . .)
• We are extending the Direct sampling algorithm to multivariate observations for a PK/PD application with multiple endpoints
Thank you!
Extra Slides
Automating Grid Limits
Work in Progress: In the direct algorithm, how can we automate the location of θ, τ grid values?
Automating Grid Limits (theta)
For fixed β, δ , τ, what happens as θ varies?
Upper bound θmax : θ where [(E(Y|dmax) – β) –fmin]/δ = ε.
Lower bound θmin : θ where [δ – (E(Y|dmin)–β)]/δ = ε.
Ignore very “flat” curves: Pr[θ < θmin or θ > θmax] = 0.
Automating Grid Limits (theta)
Cytel Tools for Dose-Finding
CytelSim (cont.)
CytelSim (binary)
Compass™ (cont.)
Compass™ (binary)
Up&down (1 or 2 targets) √ √ √ √
T-stat (1 or 2 targets) √ √ √ √
2-stage (isotonic) √
2-stage (R-function based) √
2-stage (Hochberg) √
4-param logistic Bayesian √ √ √ √
Umbrella (Maximizing) √ √ √ √
NDLM Bayesian √ √
Emax Bayesian √ √
CRM √ √
MCP-mod *
Ivanova 2-stage Bayesian *
Ph2 -> Ph3 -> PoS & NPV √
* coming soon
Number of posterior samples in NP simulations
• The work described by Jim Bolognese required simulation of approximately 500 design scenarios. With at least 500 trial simulations for each scenario, and each simulation requiring at least 1000 random draws from the posterior distribution, this resulted in the generation of over 250 million random draws.