
Statistics and Computing (1995) 5, 311-315

Generating random variates from D-distributions via substitution sampling

STEPHEN WALKER

Department of Mathematics, Imperial College, London, SW7 2BZ, UK

Received March 1994 and accepted May 1995

Laud et al. (1993) describe a method for random variate generation from D-distributions. In this paper an alternative method using substitution sampling is given. An algorithm for random variate generation from SD-distributions is also given.

Keywords: D-distributions, SD-distributions, substitution sampling

1. Introduction

Laud et al. (1993) describe how D-distributions arise in Bayesian non-parametric problems and give a method for generating random variates from these distributions. Their method involves a complicated procedure of splitting the domain into three parts and designing envelope functions for each of them. In Section 2 an alternative approach is described, introducing a latent variable and performing substitution sampling. An example is given in Section 2.1 and the marginal distribution of the latent variable is considered in Section 2.2.

Damien et al. (1995) also describe SD-distributions, of which the D-distributions are a special class. An algorithm for random variate generation from SD-distributions is described in Section 3.

2. D-distributions

The class of D-distributions was introduced by Laud (1977). A random variable $Z$ on $(0, \infty)$ is said to have a D-distribution with parameters $\delta, \beta > 0$, $k = 0, 1, 2, \ldots$ if its density function is, up to a constant of proportionality, defined by

$$p_Z(z) \propto z^{\delta-1} \exp(-\beta z)(1 - \exp(-z))^k. \qquad (1)$$

Note for $k = 0$ that $Z$ has a gamma distribution and for $\delta = 1$ that $1 - \exp(-Z)$ has a beta distribution with parameter values $(k + 1, \beta)$. For $k > 0$ and $\delta \neq 1$ the approach of this paper to the generation of variates from such distributions rests upon the introduction of a random variable $U = (U_1, \ldots, U_k)$. The $U_i$ are defined on $(0, 1)$ and are mutually independent given $Z$. The joint density function of $Z$ and $U$, $p_{Z,U}(z, u)$, is defined up to a constant of proportionality by

$$p_{Z,U}(z, u) \propto z^{\delta-1} \exp(-\beta z) \prod_{i=1}^{k} I_{(\exp(-z),\, 1)}(u_i),$$

where

$$I_{(a,b)}(u) = \begin{cases} 1 & u \in (a, b) \\ 0 & \text{otherwise.} \end{cases}$$

Clearly this joint density function has marginal density function for $Z$ given by (1). Random variates of interest, $\{Z^{(t)}; t = 1, \ldots, T\}$ for some $T$, are generated via substitution sampling (Gelfand and Smith, 1990). Briefly, starting with $u_1^{(0)}, \ldots, u_k^{(0)}$ generated independently from the uniform distribution on $(0, 1)$, if $z^{(t)}$ is the sampled variate at the $t$th iteration of the simulated Markov chain, then $u^{(t)}$ is generated from $p_{U|Z}(\cdot \mid z^{(t)})$ and $z^{(t+1)}$ from $p_{Z|U}(\cdot \mid u^{(t)})$. Under mild regularity conditions $Z^{(t)} \to_d Z$ as $t \to \infty$; see Smith and Roberts (1993), for example. The conditional distributions are given by

$$p_{U|Z}(u \mid z) = \prod_{i=1}^{k} p_{U_i|Z}(u_i \mid z),$$

where, for each $i \in \{1, \ldots, k\}$, $U_i$ given $Z = z$ is from the uniform distribution on $(\exp(-z), 1)$, and

$$p_{Z|U}(z \mid u) \propto z^{\delta-1} \exp(-\beta z)\, I_{(-\log(u),\, \infty)}(z),$$

where $u = \min(u_1, \ldots, u_k)$. Sampling random variates from both of these conditional distributions is straightforward. For sampling from a truncated gamma distribution see Devroye (1986, p. 39), for example.
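As an illustration, here is a minimal sketch of this substitution sampler in Python; the function name and the use of NumPy/SciPy are assumptions for exposition, not part of the original paper (whose implementation was in FORTRAN). The truncated gamma draw is done by inverting the gamma CDF, one of several possible methods alongside those in Devroye (1986).

```python
# A minimal sketch (illustrative, not the author's code) of the
# substitution sampler for the D-distribution.
import numpy as np
from scipy.stats import gamma

def sample_d_distribution(delta, beta, k, n_iter=8000, burn_in=500, seed=None):
    """Substitution sampler for p(z) prop. to z^(delta-1) e^(-beta z) (1-e^(-z))^k."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=k)  # u^(0): independent U(0,1) starts
    draws = []
    for t in range(n_iter):
        # z | u ~ gamma(delta, beta) truncated to (-log(min u), infinity),
        # drawn by inverting the gamma CDF (cf. Devroye, 1986, p. 39).
        a = -np.log(u.min())
        lo = gamma.cdf(a, delta, scale=1.0 / beta)
        z = gamma.ppf(rng.uniform(lo, 1.0), delta, scale=1.0 / beta)
        # u_i | z ~ Uniform(exp(-z), 1), independently for i = 1, ..., k.
        u = rng.uniform(np.exp(-z), 1.0, size=k)
        if t >= burn_in:
            draws.append(z)
    return np.array(draws)

# e.g. samples = sample_d_distribution(delta=0.5, beta=1.0, k=30)
```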


2.1. Example

An example from the D-distribution with $\delta = 0.5$, $\beta = 1$ and $k = 30$ is given here. A chain was first run in order to assess at what stage convergence was obtained. By considering the ergodic average as the chain progressed, this was found to occur at about the 500th iteration, as shown in Fig. 1.

The simulated Markov chain was run for 8000 iterations with the first 500 generations being discarded. From the remaining samples a histogram was constructed and compared with the exact density, shown in Fig. 2. The exact density was computed by evaluating the normalizing constant of (1) given in Laud et al. (1993).

Running time was 35 s on a SPARC workstation. This example was also run, with 7500 samples generated, using the algorithm of Laud et al. (1993). The running time on the same machine was 70 s. Both algorithms were coded in FORTRAN.

The samples generated via the substitution sampling algorithm are dependent. In order to obtain independent samples from a single chain, an autocorrelation plot is constructed. This indicates the gap that should be left between accepted variates in order to provide an independent sample. The autocorrelation plot is given in Fig. 3.

It is evident from Fig. 3 that by taking every 15th random variate generated from the chain, (approximately) independent samples will be obtained.

In practical applications an inbuilt convergence diagnostic would be useful, instead of running preliminary chains each time a sample is required from a particular D-distribution. Let

$$\bar{z}_T = \frac{1}{T} \sum_{t=1}^{T} z^{(t)}.$$

Fig. 2. Exact density (line) and density approximation based on a sample of 7500 (histogram) from the D-distribution with parameters $\delta = 0.5$, $\beta = 1$ and $k = 30$. Single chain (horizontal axis: $z$).

From the ergodic theorem, as $T \to \infty$,

$$\bar{z}_T \xrightarrow{\text{a.s.}} \mu_{\delta,\beta,k},$$

where $\mu_{\delta,\beta,k}$ is the expected value of the D-distribution with parameters $(\delta, \beta, k)$ and is given by $C(\delta + 1, \beta, k)/C(\delta, \beta, k)$, where $C$ denotes the normalizing constant for (1) and

$$C(\delta, \beta, k) = \sum_{j=0}^{k} (-1)^j \binom{k}{j} \frac{\Gamma(\delta)}{(\beta + j)^{\delta}},$$

where $\Gamma(\cdot)$ denotes the gamma function. Let $\epsilon > 0$ and assume that suitable convergence has been obtained, at $T_c$, once

$$|\bar{z}_T - \mu_{\delta,\beta,k}| < \epsilon$$

for all $T \geq T_c$. This criterion can be straightforwardly implemented in the sampling algorithm.
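As a sketch of how this criterion might be implemented (the function names and the use of SciPy are assumptions, not the paper's code):

```python
# Hedged sketch: closed-form mean of the D-distribution via the constant C,
# plus a check that the running ergodic average is within epsilon of it.
import numpy as np
from scipy.special import comb, gammaln

def C(delta, beta, k):
    # C(delta, beta, k) = sum_j (-1)^j C(k, j) Gamma(delta) / (beta + j)^delta
    j = np.arange(k + 1)
    return np.sum(((-1.0) ** j) * comb(k, j)
                  * np.exp(gammaln(delta) - delta * np.log(beta + j)))

def has_converged(z_draws, delta, beta, k, eps=0.01):
    mu = C(delta + 1.0, beta, k) / C(delta, beta, k)       # E[Z]
    zbar = np.cumsum(z_draws) / np.arange(1, len(z_draws) + 1)
    return abs(zbar[-1] - mu) < eps                        # current ergodic average
```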

Fig. 1. Progressive ergodic average for $Z$ over a single chain from the D-distribution with parameters $\delta = 0.5$, $\beta = 1$ and $k = 30$ (horizontal axis: iteration number).

Fig. 3. Autocorrelation plot of sample from a single chain from the D-distribution with parameters $\delta = 0.5$, $\beta = 1$ and $k = 30$ (horizontal axis: lag, 0-30).


An alternative strategy would be to obtain independent samples by simulating multiple chains with independent starting values. The starts for each of the chains can be taken by sampling $U_1^{(0)}, \ldots, U_k^{(0)}$ iid $\mathcal{U}(0, 1)$. (Here $\mathcal{U}(a, b)$ for $a < b$ represents the uniform distribution on the interval $(a, b)$.) This approach was performed with the same parameter values as in the example, and 7500 samples were gathered by taking the variate generated at the 20th iteration of each chain. A histogram representation is given in Fig. 4.

Additionally, in this case three chains were started at different $z$ values (1, 4 and 9). The first 100 samples generated from each chain are plotted in Fig. 5.

The small number of iterations required is due to the independent uniform distributions being a good approximation to the marginal distribution of $U$. For other parameter values this would not be the case in general, and in such situations a single chain is to be preferred.
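A minimal sketch of this multiple-chains strategy (again illustrative, not the paper's FORTRAN):

```python
# Run many short independent chains and keep the 20th iterate of each,
# giving exactly independent draws across chains.
import numpy as np
from scipy.stats import gamma

def one_chain_draw(delta, beta, k, n_steps=20, seed=None):
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=k)       # independent U(0,1) starts
    for _ in range(n_steps):
        a = -np.log(u.min())
        lo = gamma.cdf(a, delta, scale=1.0 / beta)
        z = gamma.ppf(rng.uniform(lo, 1.0), delta, scale=1.0 / beta)
        u = rng.uniform(np.exp(-z), 1.0, size=k)
    return z

samples = [one_chain_draw(0.5, 1.0, 30, seed=i) for i in range(7500)]
```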

Fig. 4. Exact density (line) and density approximation based on a sample of 7500 (histogram) from the D-distribution with parameters $\delta = 0.5$, $\beta = 1$ and $k = 30$. Multiple chains (horizontal axis: $z$).


Fig. 5. Sampled variates from three chains with different start values (1, 4 and 9) (horizontal axis: iteration, 0-100).

2.2. Investigating the latent variable

In this section the marginal distribution of the latent variable $U$ is given, as is the marginal distribution of the variable $U = \min(U_1, \ldots, U_k)$.

It is possible to show that the marginal density function for U is given, up to a constant of proportionality, by

$$p_U(u) \propto IG(-\beta \log(u);\, \delta)\, J(u),$$

where IG(.) represents the incomplete gamma function, given by

$$IG(x; \delta) = \frac{1}{\Gamma(\delta)} \int_x^{\infty} s^{\delta-1} \exp(-s)\, ds,$$

$u = \min(u_1, \ldots, u_k)$ and $J(u) = I(u \in (0, 1)^k)$. From here it is possible to show that the density function for $U$ is given, up to a constant of proportionality, by

$$p_U(u) \propto (1 - u)^{k-1}\, IG(-\beta \log(u);\, \delta)\, I_{(0,1)}(u).$$

This is an interesting family of distributions on $(0, 1)$ in its own right if $k$ is allowed to take positive non-integral values. It (almost) generalizes the beta distribution, since if $\delta = 1$ then $U \sim \text{beta}(\beta + 1, k)$.

To sample from $p_U(u)$ a latent variable, $X$, defined on $(0, \infty)$, is introduced such that the joint density function with $U$ is given, up to a constant of proportionality, by

$$p_{U,X}(u, x) \propto (1 - u)^{k-1} x^{\delta-1} \exp(-\beta x)\, I_{(-\log(u),\, \infty)}(x).$$

Clearly the marginal distribution for $X$ is a D-distribution with parameters $(\delta, \beta, k)$ and the conditional distributions are given by

$$p_{U|X}(u \mid x) \propto (1 - u)^{k-1} I_{(\exp(-x),\, 1)}(u),$$

a truncated beta(l, k) distribution, and

$$p_{X|U}(x \mid u) \propto x^{\delta-1} \exp(-\beta x)\, I_{(-\log(u),\, \infty)}(x),$$


a truncated gamma$(\delta, \beta)$ distribution as given in Section 2. This provides an alternative scheme for sampling from D-distributions using substitution sampling, when it is more convenient to sample one truncated beta distribution rather than $k$ independent uniform variables. This is also the algorithm to use when sampling from D-distributions with positive non-integral values of $k$ is required.
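A sketch of this alternative two-variable scheme, inverting the beta$(1, k)$ CDF in closed form so that non-integral $k$ is handled directly (names and libraries are assumptions, not the paper's code):

```python
# Alternate a truncated beta(1, k) draw for U with a truncated
# gamma(delta, beta) draw for X; the marginal of X is the D-distribution.
import numpy as np
from scipy.stats import gamma

def sample_d_via_min(delta, beta, k, n_iter=8000, burn_in=500, seed=None):
    rng = np.random.default_rng(seed)
    x = rng.gamma(delta, 1.0 / beta)         # start from the gamma part
    draws = []
    for t in range(n_iter):
        # U | X = x ~ beta(1, k) truncated to (exp(-x), 1); the beta(1, k)
        # CDF is F(u) = 1 - (1 - u)^k, invertible in closed form.
        f_lo = 1.0 - (1.0 - np.exp(-x)) ** k
        u = 1.0 - (1.0 - rng.uniform(f_lo, 1.0)) ** (1.0 / k)
        # X | U = u ~ gamma(delta, beta) truncated to (-log u, infinity).
        lo = gamma.cdf(-np.log(u), delta, scale=1.0 / beta)
        x = gamma.ppf(rng.uniform(lo, 1.0), delta, scale=1.0 / beta)
        if t >= burn_in:
            draws.append(x)
    return np.array(draws)
```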

3. SD-distributions

The class of SD-distributions is described in Damien et al. (1995); these distributions also arise in Bayesian non-parametric problems. The density functions are defined, up to a constant of proportionality, by

$$p_Z(z) \propto z^{\delta-1} \exp(-\beta z)(1 + z)^{\lambda-1}(1 - \exp(-a - bz))^k, \qquad (2)$$

with $\delta, \beta > 0$, $k = 0, 1, 2, \ldots$, $a \geq 0$, $b > 0$ and $\lambda = 0$, 1 or 2. Note that the D-distributions arise when $\lambda = 1$, $a = 0$ and $b = 1$, and a mixture of D-distributions arises when $\lambda = 2$, $a = 0$ and $b = 1$. Algorithms for the generation of random variates from such distributions are now given. As in Section 2, latent variables are introduced and substitution sampling, or Gibbs sampling (Gelfand and Smith, 1990), is utilized to obtain the relevant random variates $z$. In the following, let $\mathcal{E}(z)$ for $z > 0$ represent the exponential distribution with mean $1/z$.

Case 1: $\lambda = 0$. Here latent variables $Y$, defined on $(0, \infty)$, and $U = (U_1, \ldots, U_k)$, with each $U_i$ defined on $(0, \exp(a))$, are introduced so that the joint density with $Z$ is given, up to a constant of proportionality, by

$$p_{Z,U,Y}(z, u, y) \propto z^{\delta-1} \exp(-(\beta + y)z - y) \prod_{i=1}^{k} I_{(\exp(-bz),\, \exp(a))}(u_i).$$

The conditional distributions are then given by

$$u_1, \ldots, u_k \mid z, y \sim \mathcal{U}(\exp(-bz), \exp(a)),$$

$$p_{Z|U,Y}(z \mid u, y) \propto z^{\delta-1} \exp(-z(\beta + y))\, I_{(\max\{0,\, -\log(u)/b\},\, \infty)}(z)$$

and

$$y \mid z, u \sim \mathcal{E}(1 + z),$$

and, as before, $u = \min(u_1, \ldots, u_k)$.
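A hedged sketch of the Case 1 sampler (an assumed implementation, not the author's code):

```python
# Three-block Gibbs for the SD-distribution with lambda = 0:
# u | z, y uniform; z | u, y truncated gamma; y | z exponential.
import numpy as np
from scipy.stats import gamma

def sample_sd_case1(delta, beta, k, a, b, n_iter=8000, burn_in=500, seed=None):
    rng = np.random.default_rng(seed)
    z = rng.gamma(delta, 1.0 / beta)
    draws = []
    for t in range(n_iter):
        y = rng.exponential(1.0 / (1.0 + z))               # y | z ~ E(1 + z)
        u = rng.uniform(np.exp(-b * z), np.exp(a), size=k) # u_i | z, y
        lower = max(0.0, -np.log(u.min()) / b)             # truncation point
        scale = 1.0 / (beta + y)                           # gamma(delta, beta + y)
        lo = gamma.cdf(lower, delta, scale=scale)
        z = gamma.ppf(rng.uniform(lo, 1.0), delta, scale=scale)
        if t >= burn_in:
            draws.append(z)
    return np.array(draws)
```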

Case 2: $\lambda = 1$. Here the latent variable $U$, defined as in Case 1, is introduced so that the joint density with $Z$ is given, up to a constant of proportionality, by

$$p_{Z,U}(z, u) \propto z^{\delta-1} \exp(-\beta z) \prod_{i=1}^{k} I_{(\exp(-bz),\, \exp(a))}(u_i).$$

The conditional distributions are given by

$$u_1, \ldots, u_k \mid z \sim \mathcal{U}(\exp(-bz), \exp(a))$$

and

$$p_{Z|U}(z \mid u) \propto z^{\delta-1} \exp(-\beta z)\, I_{(\max\{0,\, -\log(u)/b\},\, \infty)}(z).$$
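Case 2 thus needs only a small change to the Section 2 sampler: the uniforms live on $(\exp(-bz), \exp(a))$ and the truncation point is rescaled by $b$. A one-step sketch (illustrative):

```python
import numpy as np
from scipy.stats import gamma

def case2_step(z, delta, beta, k, a, b, rng):
    # One substitution-sampling step for the lambda = 1 case.
    u = rng.uniform(np.exp(-b * z), np.exp(a), size=k)
    lower = max(0.0, -np.log(u.min()) / b)
    lo = gamma.cdf(lower, delta, scale=1.0 / beta)
    return gamma.ppf(rng.uniform(lo, 1.0), delta, scale=1.0 / beta)
```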

Case 3: $\lambda = 2$. Here the latent variable $U$, defined as in Case 1, is introduced so that the joint density with $Z$ is given, up to a constant of proportionality, by

$$p_{Z,U}(z, u) \propto z^{\delta-1} \exp(-\beta z)(1 + z) \prod_{i=1}^{k} I_{(\exp(-bz),\, \exp(a))}(u_i).$$

The conditional distributions are given by

$$u_1, \ldots, u_k \mid z \sim \mathcal{U}(\exp(-bz), \exp(a))$$

and

$$p_{Z|U}(z \mid u) \propto \left(\delta \pi z^{\delta-1} \exp(-\beta z) + \beta(1 - \pi) z^{\delta} \exp(-\beta z)\right) I_{(\max\{0,\, -\log(u)/b\},\, \infty)}(z),$$

where

$$\pi = \frac{\beta}{\beta + \delta}.$$
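Sampling this full conditional amounts to choosing between two truncated gamma components, with the mixture weights rescaled by the mass each component places beyond the truncation point. A hedged sketch (the function name is an assumption):

```python
# Draw z from pi * gamma(delta, beta) + (1 - pi) * gamma(delta + 1, beta),
# truncated to (max{0, -log(u)/b}, infinity), with pi = beta / (beta + delta).
import numpy as np
from scipy.stats import gamma

def sample_z_case3(delta, beta, u, b, rng):
    a0 = max(0.0, -np.log(u) / b)
    pi = beta / (beta + delta)
    w1 = pi * gamma.sf(a0, delta, scale=1.0 / beta)                # tail mass, shape delta
    w2 = (1.0 - pi) * gamma.sf(a0, delta + 1.0, scale=1.0 / beta)  # shape delta + 1
    shape = delta if rng.random() < w1 / (w1 + w2) else delta + 1.0
    lo = gamma.cdf(a0, shape, scale=1.0 / beta)
    return gamma.ppf(rng.uniform(lo, 1.0), shape, scale=1.0 / beta)
```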

In all these cases it is straightforward to show that the marginal distribution of $Z$ has the required SD-distribution.

It is interesting to note that for any $\lambda = 3, 4, \ldots$ the density for $Z$, given by (2), exists. The normalizing constant for (2) in such cases is given by

$$\sum_{l=0}^{\lambda-1} \sum_{j=0}^{k} \binom{\lambda - 1}{l} \binom{k}{j} (-1)^j \exp(-ja) \frac{\Gamma(l + \delta)}{(\beta + bj)^{l+\delta}}.$$

In order to obtain samples, the joint density with $U$ is given, up to a constant of proportionality, by

$$p_{Z,U}(z, u) \propto \sum_{l=0}^{\lambda-1} \binom{\lambda - 1}{l} z^{l+\delta-1} \exp(-\beta z) \prod_{i=1}^{k} I_{(\exp(-bz),\, \exp(a))}(u_i).$$

The full conditional for $Z$ leads to a $\lambda$-term mixture of truncated gamma distributions.
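A sketch of that mixture draw for general $\lambda$ (the helper name and weight bookkeeping are assumptions):

```python
# Draw z from p(z) proportional to
#   sum_{l=0}^{lambda-1} C(lambda-1, l) z^(l+delta-1) e^(-beta z)
# on (a0, infinity): pick a component l with weight proportional to
# C(lambda-1, l) Gamma(l+delta) / beta^(l+delta) times its tail mass,
# then invert the corresponding truncated gamma CDF.
import numpy as np
from scipy.stats import gamma
from scipy.special import comb, gammaln

def sample_z_mixture(lam, delta, beta, a0, rng):
    ls = np.arange(lam)
    logw = (np.log(comb(lam - 1, ls)) + gammaln(ls + delta)
            - (ls + delta) * np.log(beta)
            + gamma.logsf(a0, ls + delta, scale=1.0 / beta))
    w = np.exp(logw - logw.max())          # stabilized mixture weights
    l = rng.choice(ls, p=w / w.sum())
    lo = gamma.cdf(a0, l + delta, scale=1.0 / beta)
    return gamma.ppf(rng.uniform(lo, 1.0), l + delta, scale=1.0 / beta)
```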

Acknowledgements

The author is grateful to Ann Mitchell, Paul Damien and Jon Wakefield for their assistance with the preparation of this paper, and for the comments of a referee. The research was financially supported by the Medical Statistics Department of Glaxo Research and Development Ltd.

References

Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York.

Gelfand, A. E. and Smith, A. F. M. (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.


Laud, P. W. (1977) Bayesian nonparametric inference in reliability. Ph.D. Dissertation, University of Missouri, Columbia, MO.

Laud, P. W., Damien, P. and Smith, A. F. M. (1993) Random variate generation from D-distributions. Statistics and Computing, 3, 109-112.

Damien, P., Laud, P. W. and Smith, A. F. M. (1995) Approximate random variate generation from infinitely divisible distributions with applications to Bayesian inference. Journal of the Royal Statistical Society Series B, 57, 547-564.

Smith, A. F. M. and Roberts, G. O. (1993) Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. Journal of the Royal Statistical Society Series B, 55, 3-23.