
Algorithms for sampling from the Bayesian posterior distribution for Four Parameter Logistic and Sigmoid Emax models

Nitin Patel*, Chris Jennisonǂ, Jane Templeǂ, Charles Liu*

* Cytel, Inc., USA   ǂ University of Bath, UK

Outline

•  Motivation
•  A Metropolis-Hastings algorithm
•  A Gibbs Sampling algorithm currently being implemented in a software product (Compass)
•  A Direct Monte Carlo algorithm
•  Extensions and work in progress

In this talk we will focus on 4PL models as algorithms and results for Sigmoid Emax models are very similar

ICSA 2012 Applied Statistics Symposium, Boston

Motivation

•  4 Parameter Logistic (4PL) and Sigmoid Emax models are very popular for modeling dose response in clinical trials
•  Existing algorithms in Cytel's Bayesian dose finding trial design software (Compass and CytelSim) for computing posterior distributions for these models use a Metropolis-Hastings method developed by Scott Berry (Berry Consultants). The algorithm has slow convergence to steady state.
•  Could it be improved for both speed and ease of use?
–  The number of samples from the posterior distributions required for the work described by Jim Bolognese was around 250 million random draws.
•  Having different methods for posterior computations facilitates validation of results by comparing answers from distinct algorithms


4PL Model for Dose Response

Four parameter logistic (4PL)
D is dose (or log dose)
Y is response of a subject on dose D
Parameters:
•  Minimum response (β)
•  Response range (δ)
•  Median effective dose (θ)
•  Slope parameter (τ), τ > 0
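The model equation itself did not survive in this transcript. A minimal sketch of the mean response, assuming the form E(Y | D) = β + δ / (1 + exp((θ − D)/τ)) implied by the definition of xj on the Gibbs notation slide; the function name `four_pl_mean` is hypothetical, and the parameter values are the ones used in the example later in the talk.

```python
import math

def four_pl_mean(dose, beta, delta, theta, tau):
    """Assumed 4PL mean response: beta + delta / (1 + exp((theta - dose) / tau))."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    return beta + delta / (1.0 + math.exp((theta - dose) / tau))

# Values from the talk's example: beta = 0, delta = 1.1, theta = 4, tau = 0.5
doses = list(range(9))
curve = [four_pl_mean(d, 0.0, 1.1, 4.0, 0.5) for d in doses]
```

Under this form the response at D = θ is β + δ/2, which matches the description of θ as the median effective dose.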


4PL Model for Dose Response

Very flexible
•  Fits a diverse range of monotonic dose response curves
•  Includes linear, concave, convex and sigmoidal shapes


MCMC: Metropolis-Hastings

No convenient conjugate prior, so Markov chain Monte Carlo (MCMC)
Random-walk Metropolis-Hastings algorithm
- sample new point via random walk;
- if posterior density at new point > posterior density at current point, move to new point;
- else move to new point with probability equal to the ratio of the two densities, otherwise stay at current point.
In theory, converges to posterior distribution, but difficulties in practice…

[Figure: posterior density against parameter value, showing the current point, a higher density proposal and a lower density proposal]
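A minimal sketch of a random-walk Metropolis sampler for a one-dimensional target, not the Compass/CytelSim implementation. In the standard algorithm a higher-density proposal is always accepted and a lower-density proposal is accepted with probability equal to the density ratio. The standard Normal target, the starting point and the step size are illustrative assumptions.

```python
import math
import random

def random_walk_metropolis(log_density, start, step_sd, n_samples, rng):
    """Draw n_samples from a 1-D target using a Gaussian random-walk proposal."""
    current = start
    current_lp = log_density(current)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, density ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        samples.append(current)  # a rejected proposal repeats the current point
    return samples, accepted / n_samples

def log_std_normal(x):
    """Log density of a standard Normal, up to an additive constant."""
    return -0.5 * x * x

rng = random.Random(2012)
samples, acceptance_rate = random_walk_metropolis(log_std_normal, 5.0, 1.0, 20000, rng)
```

The chain deliberately starts far from the mode, so the early samples show the dependence on starting points that burn-in is meant to remove; `step_sd` is the tuning parameter of the random walk.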


Metropolis-Hastings for independent Normal priors for β, δ, θ, τ and Inverse Gamma for σ²

At each iteration of the MCMC chain, the parameters β, δ, θ, τ are sampled successively from their univariate conditional distributions using a random-walk Metropolis step. Thus this is a 'Metropolis within Gibbs' algorithm (Marin and Roberts).

The conditional distribution for σ² is Inverse Gamma, so it is sampled directly using a Gibbs step.


MCMC: Metropolis-Hastings

Disadvantages:
- High autocorrelation between Monte Carlo samples
- Dependence on starting points; requires discarding many burn-in samples
- Long runs to ensure convergence (parameter space is explored adequately)
- Requires "tuning" of proposal density (SD of random walk). Inefficient if too many samples are rejected or accepted in the random walk


Example: True dose response

Parameter   Value
β           0
δ           1.1
θ           4
τ           0.5
σ           2 (known)

Example: one trial simulation

Doses   N    Obs. Mean Resp.
0       30   -0.04
1       30   -0.08
2       30   0.06
3       30   -1.02
4       30   0.52
5       30   1.34
6       30   0.9
7       30   0.89
8       30   1.79

[Figure: true and observed dose response]

MCMC: Convergence issues

[Figure: Metropolis-Hastings trace plot; (500) burn-in samples discarded]
•  Dependence on starting points
•  Autocorrelation


Autocorrelation function of samples

[Figure: Metropolis-Hastings (Beta)]


MCMC: Gibbs

Gibbs Sampling
- Unlike Metropolis-Hastings, all samples are accepted.
- Requires sampling from known full conditional posterior distributions.

Our Gibbs algorithm generates samples in 3 blocks:
1. Sample (β, δ | θ, τ, σ²) ~ MVN(µ, Σ)
2. Sample (θ, τ | β, δ, σ²) ~ grid(θ, τ) *
3. Sample (σ² | β, δ, θ, τ) ~ IG(1/a, b) (same as in MH algorithm)
- back to block 1…

Advantages:
- Lower autocorrelation; less burn-in; no "tuning"; no samples rejected

* "Griddy" Gibbs sampler (Tanner, 1996).


Gibbs Sampling

We assume the following independent priors:
β ~ Normal(μβ, σβ²), δ ~ Normal(μδ, σδ²), θ ~ Discrete Uniform(θL, θL+1, ⋯, θU),

Notation:
Let ȳ be the vector of observed mean responses at the D doses
Let W be a diagonal matrix with diagonal elements 1/nj, where nj is the number of subjects on dose Dj
Let $x_j = \{1 + \exp((\theta - D_j)/\tau)\}^{-1}$ and $x = (x_1, x_2, \ldots, x_D)^T$
Let X denote the D×2 matrix [1, x]

Sampling the (β, δ | θ, τ, σ²) block

Posterior Conditional Distributions

Sample (β, δ | θ, τ, σ²) from this Bivariate Normal Distribution
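The formula for this bivariate Normal did not survive in the transcript. A minimal sketch, assuming the standard conjugate result for ȳ ~ N(Xb, σ²W) with b = (β, δ) and independent Normal priors: precision = XᵀW⁻¹X/σ² + prior precision. The function names and the prior values (mean 0, SD 10) are hypothetical; the data are the observed means from the one-trial example.

```python
import math
import random

def beta_delta_conditional(ybar, n, x, sigma2, mu_beta, sd_beta, mu_delta, sd_delta):
    """Mean and covariance of (beta, delta) given theta, tau, sigma2 (assumed conjugate form)."""
    # Precision = X' W^{-1} X / sigma2 + prior precision, with W = diag(1/n_j).
    p11 = sum(n) / sigma2 + 1.0 / sd_beta**2
    p12 = sum(nj * xj for nj, xj in zip(n, x)) / sigma2
    p22 = sum(nj * xj * xj for nj, xj in zip(n, x)) / sigma2 + 1.0 / sd_delta**2
    b1 = sum(nj * yj for nj, yj in zip(n, ybar)) / sigma2 + mu_beta / sd_beta**2
    b2 = sum(nj * xj * yj for nj, xj, yj in zip(n, x, ybar)) / sigma2 + mu_delta / sd_delta**2
    det = p11 * p22 - p12 * p12
    cov = ((p22 / det, -p12 / det), (-p12 / det, p11 / det))  # inverse of the 2x2 precision
    mean = (cov[0][0] * b1 + cov[0][1] * b2, cov[1][0] * b1 + cov[1][1] * b2)
    return mean, cov

def sample_bivariate_normal(mean, cov, rng):
    """Draw from N(mean, cov) using a hand-written 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2

# Illustrative use with the observed means from the one-trial example.
doses = list(range(9))
ybar = [-0.04, -0.08, 0.06, -1.02, 0.52, 1.34, 0.9, 0.89, 1.79]
n = [30] * 9
theta, tau, sigma2 = 4.0, 0.5, 4.0  # sigma = 2 is known in the example
x = [1.0 / (1.0 + math.exp((theta - d) / tau)) for d in doses]
mean, cov = beta_delta_conditional(ybar, n, x, sigma2, 0.0, 10.0, 0.0, 10.0)  # hypothetical priors
beta, delta = sample_bivariate_normal(mean, cov, random.Random(1))
```

With very diffuse priors the conditional mean reduces to the weighted least squares fit of ȳ on x.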

Conditional Distribution of (θ, τ | β, δ, σ²)

Compute the likelihood for each support point (θk, τl) in the discrete prior.
Since each point is equally likely in the prior, the joint distribution of (θ, τ) is given by the likelihood normalized over all points in the grid.

Sample (θ, τ | β, δ, σ²) from this bivariate discrete distribution
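The grid step can be sketched as follows, assuming Normal errors with mean responses ȳj ~ N(β + δ·xj, σ²/nj). The function name `sample_theta_tau` and the grid limits are hypothetical; the 30 x 30 grid size is the one quoted in the talk.

```python
import math
import random

def sample_theta_tau(ybar, n, doses, beta, delta, sigma2, theta_grid, tau_grid, rng):
    """Draw (theta, tau) from the likelihood normalized over the grid."""
    points, log_lik = [], []
    for theta in theta_grid:
        for tau in tau_grid:
            ll = 0.0
            for yj, nj, dj in zip(ybar, n, doses):
                mean = beta + delta / (1.0 + math.exp((theta - dj) / tau))
                ll -= 0.5 * nj * (yj - mean) ** 2 / sigma2
            points.append((theta, tau))
            log_lik.append(ll)
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_lik)
    weights = [math.exp(ll - top) for ll in log_lik]
    # random.choices normalizes the weights internally.
    return rng.choices(points, weights=weights, k=1)[0]

# Illustrative 30 x 30 grid; the limits are assumptions, not values from the talk.
theta_grid = [8.0 * k / 29 for k in range(30)]
tau_grid = [0.1 + 2.9 * k / 29 for k in range(30)]
doses = list(range(9))
ybar = [-0.04, -0.08, 0.06, -1.02, 0.52, 1.34, 0.9, 0.89, 1.79]
theta, tau = sample_theta_tau(ybar, [30] * 9, doses, 0.0, 1.1, 4.0,
                              theta_grid, tau_grid, random.Random(3))
```

Very small τ values relative to the dose range would overflow `math.exp`, which is one reason the lower grid limit for τ is kept away from zero here.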

Conditional Distribution of (σ² | θ, τ, β, δ)

Sample 1/σ² from the Gamma Distribution with parameters α + n/2 and ψ + SSQ/2
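A minimal sketch of this step, assuming the two parameters are the shape and the rate of the Gamma distribution (the transcript does not say which parameterization is meant) and that SSQ is the residual sum of squares. The function name and the numeric inputs are hypothetical.

```python
import random

def sample_sigma2(alpha, psi, n_obs, ssq, rng):
    """Draw sigma^2 by sampling 1/sigma^2 ~ Gamma(shape, rate) and inverting."""
    shape = alpha + n_obs / 2.0
    rate = psi + ssq / 2.0
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate.
    precision = rng.gammavariate(shape, 1.0 / rate)
    return 1.0 / precision

rng = random.Random(7)
# Hypothetical inputs: alpha = psi = 2, 270 subjects, residual sum of squares 1080.
draws = [sample_sigma2(2.0, 2.0, 270, 1080.0, rng) for _ in range(5000)]
```

Under the shape/rate reading, the draws of σ² follow an Inverse Gamma distribution with mean rate/(shape − 1).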


MCMC: Gibbs

[Figure: Gibbs trace plot]


Autocorrelation function
[Figure: Gibbs (Beta)]

Autocorrelation functions for Beta
[Figure: panels for Metropolis-Hastings and Gibbs]

Disadvantages of Gibbs:
- Needs starting values
- Computation time can be slow
- Burn-in requires trial-and-error

Effective Sample Size

The advantage of Gibbs over MH can be quantified by effective sample size: the equivalent number of i.i.d. samples. For MCMC (MH & Gibbs), the sampling error is inflated due to autocorrelation.

M: effective sample size
n: original sample size
ρk: autocorrelation at lag k

Many methods exist for approximating effective sample size (e.g., batch means).
- Here, we used the effectiveSize() function in R package CODA.
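The formula these symbols belong to did not survive in the transcript. The standard definition of effective sample size, which is assumed to be what the slide showed, is:

```latex
M = \frac{n}{1 + 2\sum_{k=1}^{\infty} \rho_k}
```

With no autocorrelation (all ρk = 0) this gives M = n; positive autocorrelation makes M smaller than n.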


Effective Sampling Speed

Example design: 1000 simulated trials; 1 cohort of 270 patients; 8 doses; 1000 steady state samples per trial; grid size: 30 x 30.
Effective sampling speed: rate of generating equivalent number of i.i.d. samples.
Gibbs Sampling is 20 times faster than Metropolis-Hastings

Algorithm             Effective sample size (N)   Computing time (seconds)   Effective sampling speed (seconds/N)
Metropolis-Hastings   18                          118                        6.5
Gibbs                 840                         273                        0.325


Ease of Use vs. M-H

•  Gibbs sampling does not require tuning of the random walk parameter
•  Does require selecting:
–  Starting values for (θ, τ)
–  Grid values (θmin, θmax), (τmin, τmax) and number of grid points. (In most cases we have found a 30x30 grid is adequate.)


Marginal posterior distribution of (θ, τ | σ²)

It can be shown that, for a uniform discrete prior on θ and τ*:

* Can be easily extended to other discrete priors, e.g. discretized Normal

Direct Monte Carlo (for known σ²)

Direct posterior probability calculations (not a Markov chain, hence avoids MCMC convergence issues altogether; computation is also much faster!):
1. The marginal posterior distribution Pr(θ, τ | D) is calculated for each grid point of (θ, τ) space. From this joint distribution it is easy to obtain the marginal posterior distribution of τ. Sample τ from this distribution using inverse sampling
2. Calculate the conditional distribution of θ|τ and sample θ from this distribution using inverse sampling
3. Sample (β, δ | θ, τ, σ²) ~ MVN(µ, Σ) as in Gibbs
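Steps 1 and 2 can be sketched as inverse sampling from a grid of posterior probabilities. The sketch below takes the joint probabilities as given, since the marginal posterior formula did not survive in the transcript; the function names and the small 2 x 2 grid are hypothetical.

```python
import bisect
import itertools
import random

def inverse_cdf_index(probs, rng):
    """Inverse sampling: index of the first cumulative probability exceeding u."""
    cdf = list(itertools.accumulate(probs))
    u = rng.random() * cdf[-1]  # scaling by the total handles unnormalized weights
    return min(bisect.bisect_right(cdf, u), len(probs) - 1)

def draw_tau_then_theta(joint, theta_grid, tau_grid, rng):
    """joint[i][j] is Pr(theta_grid[i], tau_grid[j] | data) on the grid."""
    # Step 1: marginal of tau, obtained by summing the joint over theta.
    tau_marginal = [sum(row[j] for row in joint) for j in range(len(tau_grid))]
    j = inverse_cdf_index(tau_marginal, rng)
    # Step 2: theta given tau is proportional to column j of the joint.
    theta_given_tau = [row[j] for row in joint]
    i = inverse_cdf_index(theta_given_tau, rng)
    return theta_grid[i], tau_grid[j]

# Hypothetical 2 x 2 grid of posterior probabilities.
theta_grid = [3.0, 4.0]
tau_grid = [0.5, 1.0]
joint = [[0.1, 0.2], [0.3, 0.4]]
rng = random.Random(25)
draws = [draw_tau_then_theta(joint, theta_grid, tau_grid, rng) for _ in range(20000)]
```

Each draw uses fresh random numbers and does not depend on the previous draw, which is why this approach needs no starting values or burn-in.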

Notation


$\hat{\beta}_{uv}, \hat{\delta}_{uv}$ are WLS estimates of intercept and slope in the linear regression of $\bar{y}$ on $x_{uv}$

Direct Monte Carlo

[Figure: Direct Monte Carlo trace plot]
•  No dependence on starting points
•  No autocorrelation
•  No need for burn-in


Effective Sampling Speed

Example design: 1000 simulated trials; 1 cohort of 270 patients; 8 doses; 1000 steady state samples per trial; grid size: 30 x 30.
Effective sampling speed: rate of generating equivalent number of i.i.d. samples.
Direct sampling is 5 times faster than Gibbs, and 100 times faster than Metropolis-Hastings!

Algorithm             Effective sample size (N)   Computing time (seconds)   Effective sampling speed (seconds/N)
Metropolis-Hastings   18                          118                        6.5
Gibbs                 840                         273                        0.325
Direct                1000                        66                         0.066
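The speed-up factors quoted on this slide follow from the table by dividing computing time by effective sample size. A short check of that arithmetic, using only the numbers in the table (variable names are illustrative):

```python
# (effective sample size N, computing time in seconds) from the table
results = {
    "Metropolis-Hastings": (18, 118),
    "Gibbs": (840, 273),
    "Direct": (1000, 66),
}

# Seconds needed per effective (i.i.d.-equivalent) sample; lower is better.
seconds_per_sample = {name: secs / ess for name, (ess, secs) in results.items()}

gibbs_vs_mh = seconds_per_sample["Metropolis-Hastings"] / seconds_per_sample["Gibbs"]
direct_vs_gibbs = seconds_per_sample["Gibbs"] / seconds_per_sample["Direct"]
direct_vs_mh = seconds_per_sample["Metropolis-Hastings"] / seconds_per_sample["Direct"]
```

The ratios come out at roughly 20, 5 and 100, in line with the rounded figures on the slides.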


Autocorrelation function

[Figure: Direct (Beta)]


Ease of Use: Direct vs. Gibbs

•  Direct sampling does not require selecting starting values, burn-in length or steady state sampling length to account for autocorrelation in samples
•  Does require:
–  Grid values (θmin, θmax), (τmin, τmax) and number of grid points.
–  If the grid is too small, we miss significant posterior parameter values.
–  If the grid is too large, we have computational inefficiency. (In most cases we have found a 30x30 grid is adequate.)
•  Number of points in grid should be chosen to approximate the continuous distribution of (θ, τ) reasonably well


Design Simulation vs. Data Analysis

Design simulation:
•  We can use true values to center the grid and as starting values; this also facilitates selection of a reasonable range
•  Need to simulate data for many trials within multiple scenarios to evaluate operating characteristics of design

Data analysis:
•  Need to estimate likely range of true data-generating values before specifying limits
•  Actual observed data used, so computation is for just one data set


Summary and future work

•  The Gibbs sampling method outperforms Metropolis-Hastings for posterior samples for 4PL and Sigmoid Emax models in speed and also requires less effort in specification by the user
•  The Direct sampling method is easy to use as it does not require tuning parameters or convergence assessment. It is better than Gibbs sampling for known σ, but the method needs to be extended to handle unknown σ (work in progress . . .)
•  Automating grid selection will make the Direct sampling method very straightforward to use (work in progress . . .)
•  We are extending the Direct sampling algorithm to multivariate observations for a PK/PD application with multiple endpoints

Thank you!


Extra Slides

Automating Grid Limits

Work in Progress: In the direct algorithm, how can we automate the location of θ, τ grid values?


Automating Grid Limits (theta)

For fixed β, δ, τ, what happens as θ varies?

Upper bound θmax: θ where [(E(Y|dmax) – β) – fmin]/δ = ε.

Lower bound θmin: θ where [δ – (E(Y|dmin) – β)]/δ = ε.

Ignore very "flat" curves: Pr[θ < θmin or θ > θmax] = 0.




Cytel Tools for Dose-Finding

Columns: CytelSim (cont.), CytelSim (binary), Compass™ (cont.), Compass™ (binary)

Up&down (1 or 2 targets) √ √ √ √
T-stat (1 or 2 targets) √ √ √ √
2-stage (isotonic) √
2-stage (R-function based) √
2-stage (Hochberg) √
4-param logistic Bayesian √ √ √ √
Umbrella (Maximizing) √ √ √ √
NDLM Bayesian √ √
Emax Bayesian √ √
CRM √ √
MCP-mod *
Ivanova 2-stage Bayesian *
Ph2 -> Ph3 -> PoS & NPV √

* coming soon

Number of posterior samples in NP simulations

•  The work described by Jim Bolognese required simulation of approximately 500 design scenarios. With at least 500 trial simulations for each scenario, and each simulation requiring at least 1000 random draws from the posterior distribution, this resulted in the generation of over 250 million random draws (500 x 500 x 1000).
