
Quantum free-energy calculations: Optimized Fourier path-integral Monte Carlo computation of coupled vibrational partition functions

Robert Q. Topper^a) and Donald G. Truhlar, Department of Chemistry and Supercomputer Institute, University of Minnesota, Minneapolis, Minnesota 55455-0431

(Received 3 April 1992; accepted 29 May 1992)

The Fourier coefficient path-integral representation of the quantum density matrix is used to carry out direct, accurate calculations of coupled vibrational partition functions. The present implementation of the Fourier path-integral method incorporates two noteworthy features. First, we use a Gaussian in Fourier space as a probability density function, which is sampled using the Box-Muller algorithm. Second, we introduce an adaptively optimized stratified sampling scheme in Cartesian coordinates to sample the nuclear configurations. We illustrate these strategies by applying them to a coupled stretch-bend model which resembles two of the vibrational modes of H2O. We also apply a simple, yet accurate method for estimating the statistical error of a Metropolis integration, and we compare the Box-Muller and Metropolis sampling algorithms in detail. The numerical tests of the new method are very encouraging, and the approach is promising for accurate calculations of quantum free energies for polyatomic molecules.

I. INTRODUCTION

In principle, the quantum statistical-mechanical computation of the free energy of a vibrating-rotating molecule with no low-lying excited electronic states is straightforward. Using a potential-energy surface obtained within the Born-Oppenheimer approximation, one obtains the energy eigenvalues which are the solutions of the time-independent nuclear-motion Schrödinger equation, and sums their Boltzmann factors according to1

$$Q(T) = \sum_{n} \exp(-\beta E_n). \tag{1}$$

In this expression, $Q(T)$ is the vibrational-rotational partition function, $n$ denotes the collection of all the quantum numbers appropriate to each vibrational-rotational state, $E_n$ is the energy of the eigenstate, and $\beta = 1/k_B T$, with $k_B$ the Boltzmann constant and $T$ the temperature of the ensemble. Then the quantum free energy of an ideal gas of $N$ molecules is given by2,3

$$G = -N k_B T \ln\left[ Q(T)\, Q_{\mathrm{trans}}(T) \right], \tag{2}$$

where $Q_{\mathrm{trans}}(T)$ is the translational partition function. Free energies calculated by Eq. (2) can be used to obtain equilibrium constants from molecular potentials (see, e.g., Refs. 2 and ~), and analogous quantities for transition states could be used to obtain transition-state-theory rate constants.4,7-10
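As a minimal illustration of the eigenvalue route of Eq. (1), the sketch below sums Boltzmann factors for a single harmonic oscillator and compares the result with the closed-form harmonic partition function. The frequency `omega`, the number of retained levels, and all function names are hypothetical choices made for this example; they are not taken from the paper.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
HBAR = 1.054571817e-34  # reduced Planck constant, J s


def partition_function(energies, temperature):
    """Eq. (1): sum of Boltzmann factors over a list of energy eigenvalues (J)."""
    beta = 1.0 / (K_B * temperature)
    return sum(math.exp(-beta * e) for e in energies)


def harmonic_levels(omega, n_levels):
    """Harmonic-oscillator eigenvalues hbar*omega*(v + 1/2), v = 0..n_levels-1."""
    return [HBAR * omega * (v + 0.5) for v in range(n_levels)]


def harmonic_q_exact(omega, temperature):
    """Closed-form harmonic partition function, used here only as a check."""
    x = HBAR * omega / (K_B * temperature)
    return math.exp(-0.5 * x) / (1.0 - math.exp(-x))


omega = 3.0e14  # rad/s, a hypothetical vibrational angular frequency
T = 300.0       # K
q_sum = partition_function(harmonic_levels(omega, 200), T)
q_exact = harmonic_q_exact(omega, T)
print(q_sum, q_exact)
```

For a coupled polyatomic system the eigenvalue list is exactly what becomes expensive to converge, which is the motivation given in the text for turning to path integrals.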

The use of Eq. (1) in practical calculations assumes that a converged computation of the energy eigenvalues is available. For low-dimensional systems it is a straightforward task, albeit computationally intensive, to obtain the eigenvalues via matrix diagonalization. However, as the rotational quantum number J or the dimensionality is increased, such computations rapidly become intractable. Despite recent advances which have improved upon this situation, fully converged vibrational-rotational eigenvalue calculations have not yet been used to calculate converged partition functions or free energies for systems with more than three atoms. This has motivated the present study, in which we have used the path-integral formulation of quantum statistical mechanics for the computation of vibrational partition functions, since this method may be more efficient for the calculation of vibrational-rotational partition functions of polyatomic molecules.

a) Present address: Department of Chemistry, University of Rhode Island, Kingston, RI 02881.

In the present paper we examine the use of Fourier path integrals to calculate partition functions for small systems. It may be possible to use this approach for large systems as well, but that extension is not addressed in the present paper.

II. THEORY

We begin by reviewing well-known theory, as required to explain our approach. An alternate route to $Q(T)$ is to write the microscopic partition function as the trace of the canonical density matrix and to evaluate the trace in the coordinate representation, which yields

$$Q(T) = \int_{-\infty}^{\infty} dx\, \rho(x,x;\beta), \tag{3}$$

where $x$ denotes a point in an $N$-dimensional Cartesian space, $\hat{H}$ is the Hamiltonian operator, and

$$\rho(x,x';\beta) = \langle x | \exp(-\beta \hat{H}) | x' \rangle \tag{4}$$

is the density matrix. Using a standard approach,11 we relabel $x$ as $x^1$, factor the operator into $P$ identical factors, insert the resolution of the identity $P-1$ times, and use the Trotter formula12 to obtain

J. Chem. Phys. 97 (5), 1 September 1992 0021-9606/92/173647-21$006.00 © 1992 American Institute of Physics 3647 Downloaded 22 Jul 2010 to 128.122.88.197. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

3648 R. Q. Topper and D. G. Truhlar: Quantum free-energy calculations

$$Q(T) = \int dx^1 \cdots \int dx^P \prod_{i=1}^{P} \langle x^i | \exp(-\beta \hat{H}/P) | x^{i+1} \rangle, \tag{5}$$

where it is understood that $x^{P+1} = x^1$, and each of the $N$-dimensional integrals $\int dx^i$ ranges over the infinite domain.

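We assume that in this coordinate system $\hat{H}$ can be separated into kinetic and potential energy operators:

$$\hat{H} = \hat{T} + \hat{V}(x) = \sum_{j=1}^{N} \frac{\hat{p}_j^2}{2 m_j} + \hat{V}(x) \tag{6}$$

with $\hat{p}_j$ the momentum operator for the $j$th degree of freedom. If $P$ is large the matrix element in Eq. (5) can be approximated by11,13,14

$$\langle x^i | \exp(-\beta \hat{H}/P) | x^{i+1} \rangle \approx \left[ \prod_{j=1}^{N} \left( \frac{m_j P}{2\pi\hbar^2\beta} \right)^{1/2} \right] \exp\left[ -\frac{P}{2\hbar^2\beta} \sum_{j=1}^{N} m_j \left( x_j^{i+1} - x_j^{i} \right)^2 - \frac{\beta}{P} V(x^i) \right], \tag{7}$$

and the approximation of "order" $P$ to $Q(T)$ is given by15

$$Q^{[P]}(T) = \left[ \prod_{j=1}^{N} \left( \frac{m_j P}{2\pi\hbar^2\beta} \right)^{P/2} \right] \int dx^1 \cdots \int dx^P \exp\left[ -\frac{P}{2\hbar^2\beta} \sum_{i=1}^{P} \sum_{j=1}^{N} m_j \left( x_j^{i+1} - x_j^{i} \right)^2 - \frac{\beta}{P} \sum_{i=1}^{P} V(x^i) \right] \tag{8}$$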

with $x_j^i$ the $j$th component of $x^i$. As pointed out by Chandler and Wolynes,16 $Q^{[P]}(T)$ has the same form as the classical configuration integral of a "necklace" of beads, each linked to one another by springs with force constant $m_j P/\hbar^2\beta^2$ and subjected to an "external" potential $V(x^i)/P$. Note that the approximation made in Eq. (7) is not unique, in the sense that others may be used which may yield more accurate estimates of $Q(T)$ for finite values of $P$. A great deal of research has focused upon developing alternatives to Eqs. (7) and (8).11-27 The accuracy of the chosen approximate form generally depends on the temperature of interest, as well as on how rapidly $V(x)$ varies in the physically accessible region.

In the limit $P \to \infty$, $Q^{[P]}(T)$ converges to the exact partition function:

$$\lim_{P \to \infty} Q^{[P]}(T) = Q(T). \tag{9}$$

It has been pointed out that $Q^{[P]}(T)$ has the same form as that of a discrete approximation to a path integral1,15,28,29 with all paths beginning and ending at $x^1$. Path integrals appear naturally in the context of the Feynman path-integral representation of quantum mechanics28 and in the theory of Brownian motion, where they are referred to as Wiener integrals.29 Regardless of the type of approximation used, expressions such as Eq. (8) are referred to as discretized path integrals. When the integral in this form is evaluated using Monte Carlo quadrature, the calculation is referred to as a discretized path-integral (DPI) Monte Carlo computation.

With use of Feynman's notation, the partition function can be written as1

$$Q(T) = \int dx \int_{x}^{x} \mathcal{D}[x(s)] \exp\left\{ -\frac{1}{\hbar} \int_{0}^{\beta\hbar} ds\, H[x(s),\dot{x}(s)] \right\}. \tag{10}$$

Equation (10) is an exact representation of the quantum-mechanical partition function and does not depend upon the approximation made in Eq. (7). In this notation $x(s)$ indicates a particular path, and $\int_{x}^{x} \mathcal{D}[x(s)]$ indicates the summation of the contributions from all paths which begin and end at the same points $x$ in the $N$-dimensional configuration space. Each path is parametrized by $s$, which has the dimensions of time and ranges between 0 and $\beta\hbar$. The density matrix in this notation becomes

$$\rho(x,x';\beta) = \int_{x}^{x'} \mathcal{D}[x(s)] \exp\left\{ -\frac{1}{\hbar} \int_{0}^{\beta\hbar} ds\, H[x(s),\dot{x}(s)] \right\}. \tag{11}$$

There is another path-integral-based route to the partition function, which uses the exact expression given in Eq. (11) as its starting point. This method,1,28,30 which was introduced by Feynman and Hibbs and further developed extensively by Doll and others,22,31-37 employs a Fourier-series expansion of the paths connecting $x$ to $x'$. To present this method, we let $x_j$ and $x_j'$ denote the $j$th component of $x$ and $x'$, respectively. We choose to expand the true paths about linear "free particle" ($V[x(s)] = 0$) reference paths. The slope of the reference path along the $j$th degree of freedom is taken as $(x_j' - x_j)/\beta\hbar$, and the intercept is taken as $x_j$. Expanding each component of each path in a Fourier series about the linear reference path yields

$$x_j(s) = x_j + \frac{(x_j' - x_j)\, s}{\beta\hbar} + \sum_{l=1}^{\infty} a_{j,l} \sin\left( \frac{l \pi s}{\beta\hbar} \right). \tag{12}$$

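The density matrix can be transformed into a Riemann integral of the form1,28,31,35,37

$$\rho(x,x';\beta) = J(\beta) \exp\left[ -\sum_{j=1}^{N} \frac{m_j (x_j - x_j')^2}{2\beta\hbar^2} \right] \int_{-\infty}^{\infty} da_{1,1} \cdots da_{N,\infty} \exp\left\{ -\sum_{j=1}^{N} \sum_{l=1}^{\infty} \frac{a_{j,l}^2}{2\sigma_{j,l}^2} - \frac{1}{\hbar} \int_{0}^{\beta\hbar} ds\, V[x(s,a)] \right\}. \tag{13}$$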

$J(\beta)$ is the Jacobian of the transformation, given by35,37

$$J(\beta) = \prod_{j=1}^{N} \left[ \left( \frac{m_j}{2\pi\beta\hbar^2} \right)^{1/2} \prod_{l=1}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma_{j,l}} \right], \tag{14}$$

$a$ is an ($N \times \infty$)-dimensional vector of Fourier coefficients, consisting of $N$ vectors $a_j$ of order $\infty$, and the $\sigma_{j,l}$ are members of a set of temperature-dependent parameters

$$\sigma_{j,l}^2 = \frac{2\beta\hbar^2}{m_j \pi^2 l^2}, \tag{15}$$

which are correctly interpreted as Gaussian widths for the Fourier coefficients of the free-particle system. Note that the choice of a linear path as the reference in Eq. (12) is not unique; for example, one could alternatively use the harmonic path, as in Ref. 37. However, such a reference path would not necessarily be appropriate for a system with multiple minima in the potential-energy surface.
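To make the Fourier parametrization concrete, the following sketch evaluates one component of a path built from a linear reference plus a truncated sine series, together with the free-particle Gaussian widths. It assumes the standard free-particle result $\sigma_{j,l}^2 = 2\beta\hbar^2/(m_j\pi^2 l^2)$ and reduced units with $\hbar = 1$; the function names `sigma` and `path` are illustrative only.

```python
import math

HBAR = 1.0  # reduced units (an assumption made for this sketch)


def sigma(mass, beta, l):
    """Free-particle Gaussian width of the l-th Fourier coefficient.

    Assumes sigma^2 = 2*beta*hbar^2 / (mass * pi^2 * l^2).
    """
    return math.sqrt(2.0 * beta * HBAR ** 2 / mass) / (math.pi * l)


def path(x_start, x_end, coeffs, s, beta):
    """One path component: linear reference plus a truncated sine series.

    coeffs[l-1] is the Fourier coefficient a_l; s runs from 0 to beta*hbar.
    """
    tau = s / (beta * HBAR)  # dimensionless position along the path, 0..1
    value = x_start + (x_end - x_start) * tau
    for l, a in enumerate(coeffs, start=1):
        value += a * math.sin(l * math.pi * tau)
    return value


beta = 2.0
widths = [sigma(1.0, beta, l) for l in (1, 2, 3)]
midpoint = path(0.3, 1.1, [0.5, -0.2], 0.5 * beta * HBAR, beta)
print(widths, midpoint)
```

The sine terms vanish at both ends of the path, so the endpoints are fixed by the reference line alone, and the widths fall off as $1/l$, which is why high-order coefficients contribute progressively less.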

We write the partition function as the trace of the density matrix in this free-particle Fourier coefficient representation and introduce the notation

$$S(x,a) = \frac{1}{\hbar} \int_{0}^{\beta\hbar} ds\, V[x(s,a)], \tag{16}$$

where $S(x,a)$ is the action. In this fashion we obtain

$$Q(T) = \int dx\, \rho(x,x;\beta) = J(\beta) \int \left( \prod_{j=1}^{N} dx_j \right) \left( \prod_{j=1}^{N} \prod_{l=1}^{\infty} da_{j,l} \right) \exp\left[ -\sum_{j=1}^{N} \sum_{l=1}^{\infty} \frac{a_{j,l}^2}{2\sigma_{j,l}^2} - S(x,a) \right] \tag{17}$$

with

$$x_j(s) = x_j + \sum_{l=1}^{\infty} a_{j,l} \sin\left( \frac{l \pi s}{\beta\hbar} \right), \tag{18}$$

since $x = x'$ in the trace. Equations (17) and (18) are an exact expression for the quantum-mechanical partition function. In practice, the number of Fourier coefficients is truncated at some finite number $K$, in which case the dimensionality of the integral to be evaluated is $[N + (N \times K)]$.

The critical step for reducing the partition function to a sampling problem is to convert the integral to an average. In the present work this is accomplished by restricting the domain in configuration space to a finite domain $D$ which encloses the most important range of $x$ values. Then we divide the partition function of interest by the partition function $Q_{pb}(T)$ of a particle in the same-size $N$-dimensional box, but with zero potential. This yields

$$Q(T) = Q_{pb}(T)\, \frac{\int_D dx \int da\, \exp\left[ -\sum_{j=1}^{N} \sum_{l=1}^{\infty} a_{j,l}^2 / 2\sigma_{j,l}^2 - S(x,a) \right]}{\int_D dx \int da\, \exp\left[ -\sum_{j=1}^{N} \sum_{l=1}^{\infty} a_{j,l}^2 / 2\sigma_{j,l}^2 - S_{pb}(x,a) \right]}, \tag{19}$$

where $S_{pb}(x,a)$ is defined by Eq. (16) but with potential $V_{pb}$, and $V_{pb}$ is zero within the confines of the domain $D$ and infinite outside of $D$. Thus, $S_{pb}(x,a)$ is zero within the integration range. Truncating to finite $K$, we obtain the approximate form

$$Q^{[K]}(T) = Q_{pb}^{[K]}(T)\, \frac{\int_D dx \int da\, \exp\left[ -\sum_{j=1}^{N} \sum_{l=1}^{K} a_{j,l}^2 / 2\sigma_{j,l}^2 - S(x,a) \right]}{\int_D dx \int da\, \exp\left[ -\sum_{j=1}^{N} \sum_{l=1}^{K} a_{j,l}^2 / 2\sigma_{j,l}^2 \right]}, \tag{20}$$

where $a$ now consists of $N$ vectors of order $K$, and $S(x,a)$ is given by Eq. (16) with

$$x_j(s,a_j) = x_j + \sum_{l=1}^{K} a_{j,l} \sin\left( \frac{l \pi s}{\beta\hbar} \right). \tag{21}$$

$Q_{pb}^{[K]}(T)$ can be found analytically (see Appendix A). Thus, when the ratio of the two finite-dimensional integrals can be evaluated accurately, we can obtain an approximation to the partition function. This approximation will converge to the exact answer as $D$ and $K$ are increased. The ratio of integrals may be evaluated using Monte Carlo methods. The use of Eqs. (12)-(15) to evaluate the density matrix is called the Fourier path-integral (FPI) method.22,31-36,3~ Direct FPI calculations of the partition function or free energy for molecular systems using Eqs. (20) and (21) have not previously been reported in the literature. However, the FPI formalism has been used to carry out calculations of free-energy differences between fully coupled systems and ideal-gas clusters using the "state integration" method.35,40-42 In this method one carries out a number of simulations over a range of temperatures between the temperature of interest and a higher temperature $T_\infty$, in each case calculating the canonical ensemble average of the potential energy. One then integrates these thermodynamically averaged potential energies over the temperature range from $T_\infty$ to $T$ to obtain a free-energy difference. If $T_\infty$ is a high enough temperature for the cluster's internuclear forces to be negligible, the dynamics in this limit correspond to those of a confined ideal gas, and the classical configuration integral may be used to calculate the absolute free energy of the high-temperature state. In contrast, as we will show, a single simulation suffices to evaluate Eqs. (20) and (21). We note that a wide range of methods has been used to calculate classical free-energy differences43-46 as well as absolute classical free energies,47-49 but we will not attempt to review this extensive literature here.50-52
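The ratio of integrals described above can be read as a plain average of $\exp[-S(x,a)]$ when $x$ is drawn uniformly from $D$ and each Fourier coefficient is drawn from its free-particle Gaussian. The one-dimensional sketch below illustrates that reading only; it is not the authors' implementation. It assumes $\hbar = 1$, a midpoint-rule quadrature for the action, the widths $\sigma_l^2 = 2\beta\hbar^2/(m\pi^2 l^2)$, and hypothetical names (`action`, `fpi_ratio`). The analytic box factor of Appendix A is not included.

```python
import math
import random

HBAR = 1.0  # reduced units (assumption)


def action(potential, x, coeffs, beta, n_quad=32):
    """S(x,a) = (1/hbar) * integral_0^{beta*hbar} V[x(s,a)] ds by the midpoint rule."""
    total = 0.0
    for q in range(n_quad):
        tau = (q + 0.5) / n_quad  # s / (beta*hbar)
        xs = x + sum(a * math.sin(l * math.pi * tau)
                     for l, a in enumerate(coeffs, start=1))
        total += potential(xs)
    # (1/hbar) * (beta*hbar / n_quad) * sum of V values
    return beta * total / n_quad


def fpi_ratio(potential, mass, beta, x_lo, x_hi, k_max, n_samples, rng):
    """Monte Carlo estimate of the ratio of integrals, <exp(-S)>."""
    sigmas = [math.sqrt(2.0 * beta * HBAR ** 2 / mass) / (math.pi * l)
              for l in range(1, k_max + 1)]
    acc = 0.0
    for _ in range(n_samples):
        x = rng.uniform(x_lo, x_hi)                  # uniform in the domain D
        coeffs = [rng.gauss(0.0, s) for s in sigmas]  # Gaussian Fourier coefficients
        acc += math.exp(-action(potential, x, coeffs, beta))
    return acc / n_samples


rng = random.Random(0)
ratio = fpi_ratio(lambda x: 0.5 * x * x, 1.0, 2.0, -4.0, 4.0, 4, 2000, rng)
print(ratio)
```

Two limiting cases give a quick sanity check: a zero potential returns a ratio of exactly one, and a constant potential $c$ returns $\exp(-\beta c)$ regardless of the sampled path.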

Doll and co-workers developed a "partial averaging" FPI method and applied it to the computation of thermodynamic properties of rare-gas clusters (including differences in free energies) at low temperatures.22,31-36,40 Partial averaging implicitly takes into account the average contribution of the Fourier coefficients neglected in expressions such as Eqs. (20) and (21). This method is apparently more rapidly convergent than simply neglecting contributions with $l > K$, but its successful application assumes the use of a potential function which either (a) has an analytic Gaussian transform or (b) is simple enough for an $N$-dimensional gradient expansion about every point in the Cartesian space to be numerically tractable.22,35 Obviously condition (a) is a restrictive condition for realistic potential surfaces. Condition (b) is computationally intensive when $N$ is large, especially if the potential is complicated or its gradients must be evaluated numerically. To avoid these limitations we concentrate in the present paper on developing variance reduction techniques with strict truncation at $l = K$.

III. INTEGRATION METHODS

In the preceding section we reviewed the path-integral representation of quantum-mechanical partition functions. Although the Fourier path-integral formalism has been used extensively to compute thermodynamic averages of quantities such as radial distribution functions and potential energies for atomic clusters, it has not previously been used for the direct calculation of partition functions or free energies of such systems. In this paper we describe Monte Carlo integration schemes for such calculations and test them for vibrational partition functions.

In order to numerically integrate the $[N + (N \times K)]$-dimensional approximation to the quantum density matrix efficiently, carefully designed methods must be employed. It is well known that Monte Carlo integration methods53-61 scale favorably with the number of dimensions. When the samples generated by a Monte Carlo algorithm are mutually independent of one another, we have the added benefit that the error bars associated with the integration are straightforward to evaluate and have a definite meaning.

There are two principal convergence issues in the Monte Carlo integration of the density matrix to obtain the partition function: (1) convergence with respect to the $(N \times K)$ Fourier coefficients (which determines the accuracy of the approximation to the path integral), and (2) convergence with respect to the number of samples taken (which determines the precision to which the integral is evaluated for a given $K$). By systematically increasing $K$ and repeating the calculation, one can decide whether the first issue has been adequately addressed. The second issue can be addressed through analysis of the statistical variance of the integration, as described in detail below.

Two methods which have been shown to be very useful in the context of accelerating the convergence of "crude" Monte Carlo integrations are stratified sampling53,54,57,59,62 and importance sampling.53,54,58,59,62,63 We have found that combining stratified sampling in coordinate space with importance sampling in Fourier coefficient space provides good accuracy in the present context. Our implementation of stratified sampling is adaptive in that it uses the result of a series of preliminary "probe" samples to determine the optimal size for the strata, and optimally allocates the remainder of the samples to each subregion in order to minimize the error.

In the following subsections, we present an importance sampling integration scheme based on the Box-Muller algorithm59,63 to calculate vibrational partition functions for coupled-mode systems. We then describe how the Box-Muller algorithm can be combined with adaptively optimized stratified sampling to substantially decrease the statistical uncertainty in the calculation of the partition function. In Sec. III A we discuss importance sampling and the Metropolis algorithm. In Sec. III B we introduce some notation for Monte Carlo integration and error analysis, and in Sec. III C we focus on stratified sampling.
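For readers unfamiliar with it, the Box-Muller transform turns pairs of independent uniform deviates into pairs of independent standard normal deviates. The sketch below is a textbook version written for illustration; the helper names and the use of Python's `random.Random` generator are choices made here and do not reflect the pseudorandom number generator used in the paper.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal deviates from two uniform deviates."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is always finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def gaussian_samples(n, sigma, rng):
    """Draw n mutually independent samples from a zero-mean Gaussian of width sigma."""
    out = []
    while len(out) < n:
        z1, z2 = box_muller(rng)
        out.extend((sigma * z1, sigma * z2))
    return out[:n]


samples = gaussian_samples(20001, 2.0, random.Random(1))
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)
```

Because every sample is generated directly from fresh uniform deviates, successive samples are mutually independent and no equilibration period is needed.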

III. A. Importance sampling and Metropolis walks

We consider the integral of a function $f(x)$ over a domain $D$ of a multidimensional space $x$; the integral and the volume of the domain are given, respectively, by

$$\phi = \int_D f(x)\, dx, \tag{22}$$

$$\mathcal{V} = \int_D dx. \tag{23}$$

The average value of $f(x)$ is

$$\langle f \rangle = \phi / \mathcal{V}, \tag{24}$$

and the unnormalized second central moment of $f(x)$, which has units of $\phi^2$, is given by

$$\mu(\phi) = \mathcal{V} \int_D [f(x) - \langle f \rangle]^2\, dx. \tag{25}$$

Integrals such as Eqs. (22), (23), and (25) can be evaluated by Monte Carlo integration. For the integrals of interest for calculating molecular partition functions, most of the contributions to $\phi$, and hence $\langle f \rangle$ as well, come from a small fraction of the domain $D$. In order to obtain acceptable convergence rates for Monte Carlo integrations in such cases, it is usually necessary to use variance reduction schemes such as importance sampling or stratified sampling.54,57,59 A well-known example is provided by the widespread use of Metropolis sampling64 for evaluating averages such as Eq. (3) when $f(x)$ contains a classical Boltzmann factor for a complicated energy function. Metropolis sampling has been well described previously.3,50,56,58,59,64-69 The Metropolis algorithm is a particular type of importance sampling algorithm applicable to evaluating averages. More general importance sampling functions can also be used to evaluate integrals such as Eq. (22) or (25).

In order to integrate $f(x)$ using the method of importance sampling, one samples points within the domain according to a probability density function $g(x)$ satisfying the normalization condition:53

$$1 = \int_D g(x)\, dx. \tag{26}$$

Then

$$\phi = \int_D f(x)\, dx = \int_D h(x) g(x)\, dx, \tag{27}$$

where

$$h(x) = f(x)/g(x). \tag{28}$$

When the domain is sampled in this fashion, the second central moment of $\phi$ is given by54,59

$$\mu(\phi) = \int_D [h(x) - \phi]^2 g(x)\, dx. \tag{29}$$

If $g(x)$ is well chosen, Eq. (29) can give a considerably smaller result than Eqs. (24) and (25). As we will see in Sec. III B, this implies that the asymptotic error of a Monte Carlo computation can be reduced when the domain is sampled according to $g(x)$. Note that uniform sampling of the domain $D$ is equivalent to setting $g(x) = 1/\mathcal{V}$. In this limit Eqs. (24) and (25) are recovered from Eq. (29).

The importance sampling strategy for accelerating Monte Carlo integrations is an example of a variance reduction method. Here, the optimal sampling function is one that causes the individual weighted-function evaluations $h(x)$ to vary as little as possible from $\phi$.53,54,57,59,62,70 In the limit of zero variance, every weighted-function evaluation gives the same result as any other and a single sample gives the exact value of $\phi$.
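The zero-variance limit can be seen in a toy problem. The sketch below integrates $f(x) = e^{-x}$ over $D = [0, 10]$ twice: once with uniform sampling, and once with a truncated exponential sampling density that is drawn by analytic inversion of its cumulative distribution. The integrand, the domain, and the function names are assumptions chosen for this illustration; when the decay rate of $g$ matches that of $f$, every $h(x) = f(x)/g(x)$ equals the integral itself.

```python
import math
import random


def uniform_estimate(f, lo, hi, n, rng):
    """Crude Monte Carlo: volume times the mean of f at uniformly sampled points."""
    volume = hi - lo
    return volume * sum(f(rng.uniform(lo, hi)) for _ in range(n)) / n


def importance_estimate(f, rate, length, n, rng):
    """Importance sampling on [0, length) with g(x) = rate*exp(-rate*x)/norm."""
    norm = 1.0 - math.exp(-rate * length)
    total = 0.0
    for _ in range(n):
        u = rng.random()
        x = -math.log(1.0 - u * norm) / rate   # inverse of the cumulative distribution
        g = rate * math.exp(-rate * x) / norm  # normalized sampling density at x
        total += f(x) / g                      # weighted-function evaluation h(x)
    return total / n


def f(x):
    return math.exp(-x)


exact = 1.0 - math.exp(-10.0)
crude = uniform_estimate(f, 0.0, 10.0, 20000, random.Random(5))
matched = importance_estimate(f, 1.0, 10.0, 10, random.Random(5))
print(exact, crude, matched)
```

With only ten samples the matched-rate estimate reproduces the exact value to rounding error, whereas the uniform estimate with 20 000 samples still carries a visible statistical error.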

One can sample according to $g(x)$ in a number of ways. If $g(x)$ can be integrated and inverted analytically, one can take mutually independent samples from it directly using a variety of algorithms, such as the Box-Muller algorithm.53,59,63 When the integrand to be evaluated can be cast in the form of an average, a more flexible choice of probability density functions can be used than the rather restricted class of integrable and invertible functions. A well-known example of such a case is the computation of canonical ensemble averages in classical statistical mechanics.50,59,64,65,67,68,71 In order to average a quantity $h(x)$ we must weight it by $\exp[-\beta V(x)]$, where $V(x)$ is the potential energy. Thus we consider integrals of the form

$$\phi = \int h(x) g(x)\, dx, \tag{30}$$

where

$$g(x) = \frac{G(x)}{\int_D G(x)\, dx}. \tag{31}$$

In the case just discussed $G(x)$ is $\exp[-\beta V(x)]$. Clearly, any $g(x)$ of the form of Eq. (31) satisfies Eq. (26). The Metropolis algorithm provides a way of sampling according to this $g(x)$ for arbitrary $V(x)$ despite the fact that we do not generally know the integral of $\exp[-\beta V(x)]$ analytically. In fact, the algorithm can be applied to Eq. (30) for arbitrary normalized $g(x)$. The algorithm is implemented by choosing an initial configuration $x_1$ and then "stepping" within a subdomain centered about $x_1$. This new configuration $x_2$ is usually chosen uniformly (i.e., from a uniform probability distribution function) on a "small" hypercube centered about $x_1$. The configuration generated by the "step" is always "accepted" if $G(x_2) > G(x_1)$ and is accepted with probability $G(x_2)/G(x_1)$ otherwise. If the step is accepted, the new configuration is taken as the next sample; if it is rejected the old configuration is counted again. Then

$$\phi \approx \frac{1}{n} \sum_{i=1}^{n} h(x_i), \tag{32}$$

where $n$ is the total number of (new plus reused) samples. We note that the Metropolis procedure is only guaranteed to sample $g(x)$ asymptotically, i.e., the sampled distribution eventually converges to $g(x)$, but there may be an induction period. Therefore Metropolis walks should be "equilibrated" for a number of steps before samples are taken for averaging purposes.50,59,66,67,72 There is no way to predict how long or short the equilibration period will be for a particular problem or subdomain size, so one checks it numerically.
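The Metropolis walk described above can be sketched in a few lines for a one-dimensional weight $G(x)$. This is a generic textbook version under stated assumptions: a symmetric uniform trial step of half-width `step`, a fixed equilibration length `n_equil`, and a starting point with nonzero weight. All names and parameter values are hypothetical.

```python
import math
import random


def metropolis_walk(weight, x0, step, n_equil, n_samples, rng):
    """Sample x with probability proportional to weight(x) by a Metropolis walk.

    The first n_equil configurations are discarded as the equilibration period.
    Rejected steps recount the old configuration, as the algorithm requires.
    weight(x0) must be nonzero.
    """
    x = x0
    w = weight(x)
    samples = []
    for i in range(n_equil + n_samples):
        trial = x + rng.uniform(-step, step)  # uniform step in a small interval
        w_trial = weight(trial)
        # Always accept uphill moves; accept downhill moves with ratio w_trial/w.
        if w_trial >= w or rng.random() < w_trial / w:
            x, w = trial, w_trial
        if i >= n_equil:
            samples.append(x)
    return samples


def gaussian_weight(x):
    return math.exp(-0.5 * x * x)  # unnormalized; the normalization is never needed


walk = metropolis_walk(gaussian_weight, 0.0, 2.0, 1000, 50000, random.Random(2))
mean_x = sum(walk) / len(walk)
mean_x2 = sum(x * x for x in walk) / len(walk)
print(mean_x, mean_x2)
```

Successive entries of `walk` are correlated, which is the reason the error analysis for Metropolis sampling differs from that for independent samples.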

The Metropolis algorithm can be used to integrate any multidimensional function cast into the form of Eq. (30), and it has found many applications in quantum mechanics73,74 as well as in classical statistical mechanics. It has also been used in previous Fourier path-integral Monte Carlo studies,35,36,40 which were concerned with canonical ensemble averages. However, we have explored other methods in the present work, for two reasons. First, the configuration-space integral in Eq. (20) does not have the form of a weighted average. Second, we will show that the integral over the Fourier coefficient space in Eq. (20) can be performed more efficiently by another method.

III. B. Monte Carlo integration and error estimation

In order to motivate our approach, we next discuss the estimation of sampling errors in some detail. We again consider the integral $\phi$ given in Eqs. (22) and (23). Let $x_1, x_2, \ldots, x_n$ be a set of $n$ independently chosen "randomly selected" points sampled uniformly within the volume of the domain $D$. An unbiased estimate of $\phi$ can be obtained by evaluating the function at each sampled point, averaging the result and multiplying by the volume of the domain, i.e., if

$$\langle f \rangle_n = \frac{1}{n} \sum_{i=1}^{n} f(x_i), \tag{33}$$

then

$$\phi_n = \mathcal{V} \langle f \rangle_n \tag{34}$$

is an unbiased estimate of $\phi$. The law of large numbers assures us that in the limit as $n \to \infty$, $\phi_n \to \phi$.54,59,62 Equations (33) and (34) give the Monte Carlo estimate of $\phi$.



The central limit theorem assures us that the standard deviation $W_n$ of $\phi_n$ from $\phi$ when $n$ is sufficiently large is53,54,59

$$W_n \underset{n \to \infty}{\sim} \left[ \frac{\mu(\phi)}{n} \right]^{1/2}. \tag{35}$$

Note that $\mu(\phi)/\mathcal{V}^2$ is the variance of $f$, whereas $W_n^2$ is the variance of $\phi_n$. The meaning of $W_n$ is the same as the standard deviation in statistics, and in particular the same confidence levels apply. Thus, if we independently carry out the $n$-point integration $\nu$ times using independent random number sequences, we expect that, in the asymptotic limit of large $\nu$ and large $n$, 68.3% of the $\nu$ results $\phi_n$ will lie within $1 W_n$ of the accurate answer $\phi$, 95.4% will lie within $2 W_n$, and so forth.59,62,70,75

If we knew the second central moment $\mu(\phi)$ analytically, we could compute the standard error exactly for any sufficiently large value of $n$. However, in practice $W_n$ must usually be approximated by an estimate of the variance obtained from the same sample as that used for calculating $\langle f \rangle_n$. This gives an estimated standard error $w_n$, where

$$w_n = \sqrt{m_n / n}, \tag{36}$$

and

(37)

Note that $m_n$ approaches $\mu(\phi)$ in the asymptotic limit of large $n$, but of course for any finite value of $n$, $m_n$ does not equal $\mu(\phi)$ exactly. In general, Eqs. (36) and (37) will not give the same result as Eq. (35) since $n$ is finite. However, if in each calculation the samples are mutually independent, we expect that computing the average of $m_n$ over $\nu$ calculations will give a good approximation to $\mu(\phi)$ if $\nu$ is sufficiently large. The average of $m_n$ over $\nu$ calculations, each involving $n$ samples, will be called $\langle m_n^{(j)} \rangle_\nu$. Using this estimate of $\mu(\phi)$ in Eq. (36) yields an estimated standard deviation $w_n^{(\nu)}$:

$$w_n^{(\nu)} = \left( \frac{\langle m_n^{(j)} \rangle_\nu}{n} \right)^{1/2}. \tag{38}$$
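The repeated-integration experiment described above can be sketched as follows. The code carries out $\nu$ independent $n$-point uniform integrations, pools the per-run second-moment estimates, and forms the pooled standard error. One assumption is made explicit: the per-run estimate $m_n$ is taken here to be $\mathcal{V}^2$ times the sample variance of $f$ with a $1/n$ normalization, which may differ in detail from the estimator of Eq. (37). The test integrand $x^2$ on $[0, 1]$ and all names are illustrative.

```python
import math
import random


def crude_mc(f, lo, hi, n, rng):
    """One n-point uniform Monte Carlo integration; returns (phi_n, m_n)."""
    volume = hi - lo
    values = [f(rng.uniform(lo, hi)) for _ in range(n)]
    mean = sum(values) / n
    # Assumed estimator: volume^2 times the (1/n-normalized) sample variance of f.
    m_n = volume ** 2 * sum((v - mean) ** 2 for v in values) / n
    return volume * mean, m_n


def pooled_error(f, lo, hi, n, nu, rng):
    """Run nu independent integrations and pool m_n into one standard error."""
    runs = [crude_mc(f, lo, hi, n, rng) for _ in range(nu)]
    mean_m = sum(m for _, m in runs) / nu
    return [phi for phi, _ in runs], math.sqrt(mean_m / n)


estimates, w_pooled = pooled_error(lambda x: x * x, 0.0, 1.0, 500, 200, random.Random(3))
exact = 1.0 / 3.0
within_one = sum(abs(e - exact) <= w_pooled for e in estimates) / len(estimates)
print(w_pooled, within_one)
```

For mutually independent samples, the fraction of runs falling within one pooled standard error of the exact answer should be close to the 68.3% confidence level quoted in the text.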

When we use importance sampling but the samples are still mutually independent, Eqs. (33) and (37) are replaced by

(39)

(40)

where $h(x_i) = f(x_i)/g(x_i)$. In Metropolis sampling one still uses Eqs. (39) and (40), but now the $x_i$ are not mutually independent samples.59,76-78 Rather, they are generated by a Markov process. One still finds that $\phi_n \to \phi$, but now Eqs. (36) and (40) do not give valid error estimates, even in the limit of asymptotically long Metropolis walks. Rather, the true variance of $n$-point calculations from the mean is given by79-82

$$W_n = \left\{ \frac{C(0)}{n} \left[ 1 + 2 \sum_{\kappa=1}^{n-1} \left( 1 - \frac{\kappa}{n} \right) \rho(\kappa) \right] \right\}^{1/2} \tag{41}$$

with

$$\rho(\kappa) = C(\kappa)/C(0) \tag{42}$$

and

$$C(\kappa) = \left\langle [h(x_i) - \phi][h(x_{i+\kappa}) - \phi] \right\rangle \tag{43}$$

is the expectation value of the product of the deviations of the $i$th and the $(i+\kappa)$th function evaluations from the true mean. Note that in this notation $C(0) = \mu(\phi)$, with $\mu(\phi)$ given by Eq. (29). In practice, $C(\kappa)$ must be estimated from $c(\kappa)$:

and therefore $\rho(\kappa)$ is replaced by $p(\kappa)$:

$$p(\kappa) = c(\kappa)/c(0). \tag{45}$$

The implementation of Eqs. (41)-(45) for a correlated data series is complicated by the fact that the variance in $p(\kappa)$ grows as a function of $\kappa$, according to83

$$\mathrm{Var}[p(\kappa)] = \frac{1}{n} \left\{ 1 + 2 \sum_{i=1}^{\kappa} [\rho(i)]^2 \right\}. \tag{46}$$

As a result, the computation of $W_n$ from the direct application of Eqs. (41)-(45) is usually overwhelmed by the uncertainty of the high-order contributions to Eq. (41), as has been discussed previously in the literature.80,81

Straatsma and co-workers82 suggested a method for estimating $W_n$ from a single correlated data series by calculating the $p(\kappa)$ and their estimated variances, truncating the summation in Eq. (41) at $\kappa_{\max}$ when $p(\kappa_{\max})$ becomes larger than $2\sqrt{\mathrm{Var}[p(\kappa)]}$, then fitting $p(\kappa)$, $\kappa = 1, \ldots, \kappa_{\max}$, to an exponential function, which is used to estimate the effect of the $\kappa = \kappa_{\max} + 1, \ldots, n-1$ terms. In Sec. IV we present two simpler prescriptions which seem to be sufficiently accurate for most purposes. We will also show that using mutually independent samples to generate samples in the Fourier coefficient space leads to sampling errors that are much narrower than those which can be obtained through the use of the Metropolis algorithm to sample from the same $g(x)$. Furthermore, the use of mutually independent samples allows the errors to be evaluated with much less effort. Finally, unlike the Metropolis algorithm, no equilibration period is required and therefore all of the configurations generated can be used in the Monte Carlo quadrature.
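To show how serial correlation inflates the error of a mean, the sketch below evaluates the correlation-corrected standard error for a synthetic correlated series. It is neither the Straatsma procedure nor the prescriptions of Sec. IV: the sum over lags is simply cut off at the first non-positive estimated autocorrelation or at `max_lag`, whichever comes first, and the autocovariance is normalized by the number of available pairs. Both choices, the AR(1) test series, and the function names are assumptions made for this example.

```python
import math
import random


def autocovariance(series, lag):
    """Sample autocovariance at a given lag, normalized by the number of pairs."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[i] - mean) * (series[i + lag] - mean)
               for i in range(n - lag)) / (n - lag)


def correlated_standard_error(series, max_lag):
    """Standard error of the mean with a truncated autocorrelation correction."""
    n = len(series)
    c0 = autocovariance(series, 0)
    factor = 1.0
    for lag in range(1, max_lag + 1):
        rho = autocovariance(series, lag) / c0
        if rho <= 0.0:  # simple cutoff: stop once the estimate reaches the noise
            break
        factor += 2.0 * (1.0 - lag / n) * rho
    return math.sqrt(c0 / n * factor)


# Synthetic AR(1) series x[t+1] = 0.8*x[t] + noise, standing in for a Markov walk.
rng = random.Random(4)
series = [0.0]
for _ in range(19999):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

naive = math.sqrt(autocovariance(series, 0) / len(series))
corrected = correlated_standard_error(series, 100)
print(naive, corrected, corrected / naive)
```

For this series the corrected error is roughly three times the naive one, the same order as the Metropolis-to-independent ratios reported in Table I.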

III. C. Monte Carlo integration with stratified sampling

Although importance sampling is very powerful, the proper choice of a general form for the probability density function $g(x)$ is not always obvious. In fact, if one wants to generate mutually independent samples the choice of a particular form for $g(x)$ is generally restricted to particular integrands $f(x)$. Therefore we also seek methods which may be used adaptively, i.e., which may be used in such a way that the final sampling distribution is determined "on the fly." We have found that the method of stratified sampling54,57,59,62 is particularly easy to use in this way.

The basic strategy of stratified sampling is illustrated by considering Eqs. (22) and (23) again. We divide up the domain $D$ into subdomains $D_k$ (which are referred to as "strata") with subvolumes $\mathcal{V}_k$ and uniformly sample each of the subdomains $n_k$ times, such that

$$\sum_{k=1}^{N_s} n_k = n, \tag{47}$$

$$\sum_{k=1}^{N_s} \mathcal{V}_k = \mathcal{V}, \tag{48}$$

with $N_s$ the number of strata, $n$ the total number of samples, and $\mathcal{V}$ the total volume of the domain $D$. Note that the $n_k$ are not necessarily equal to one another. The stratified $n$-point approximation to $\phi$ is then given by57,62

$$\phi_n = \sum_{k=1}^{N_s} \mathcal{V}_k \langle f \rangle_{n_k}, \tag{49}$$

which we see is the subvolume-weighted sum of the averages of $f(x)$ within each stratum.
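A one-dimensional sketch of the subvolume-weighted sum is given below. The strata are contiguous intervals defined by a list of edges, and the sample counts per stratum are supplied by the caller. The interface (`edges`, `counts`) and the example integrand are hypothetical choices made for illustration.

```python
import random


def stratified_estimate(f, edges, counts, rng):
    """Stratified estimate: sum over strata of (subvolume) * (mean of f in stratum).

    edges  -- stratum boundaries, e.g. [0.0, 1.0, 3.0] defines two strata
    counts -- number of uniform samples n_k drawn in each stratum
    """
    phi = 0.0
    for lo, hi, n_k in zip(edges[:-1], edges[1:], counts):
        volume_k = hi - lo
        mean_k = sum(f(rng.uniform(lo, hi)) for _ in range(n_k)) / n_k
        phi += volume_k * mean_k
    return phi


# Integrate x^2 over [0, 3] with more samples where the integrand varies most.
estimate = stratified_estimate(lambda x: x * x, [0.0, 1.0, 2.0, 3.0],
                               [200, 600, 1200], random.Random(7))
print(estimate)  # the exact value is 9
```

The counts need not be equal, and the total number of samples is simply their sum.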

The square of the standard error $w_n$ given in Eqs. (36) and (37) is replaced by the sum of the squares of the standard errors computed within each stratum:57,62

The goal of stratified sampling is to set up a stratification in such a way that the standard error given in Eqs. (50) is minimized. As discussed by James,53 one can be assured of at least an infinitesimal improvement in the variance if all strata are sampled uniformly, i.e., if all the nk are equal to one another. However, this does not usually yield significant improvement over no stratification.

If one chose the strata $D_k$ a priori and one knew the variances $\mu_k/\mathcal{V}_k^2$ in each stratum in advance, one could optimize the $n_k$ subject to the constraints given in Eqs. (47) and (48). The optimal solution is easily obtained through the use of Lagrange multipliers and is supplied in standard treatments.54,57,59,62 As the method is apparently not presently being used in the context of path-integral Monte Carlo, we briefly outline the route to the optimal solution.

If we sample all of the strata sufficiently (i.e., we let $n_k$ become very large, although still finite), then

$$W_n^2 \approx \sum_{k=1}^{N_s} \frac{1}{n_k} \mu_k. \tag{51}$$

Since $\mu_k$ is independent of $n_k$ we can optimize Eq. (51) with respect to all of the $n_k$, subject to Eqs. (47) and (48). We find that the optimum sampling strategy for a fixed set of $D_k$ is to sample the strata such that the number of samples within each stratum is given by

$$n_k(\mathrm{opt}) = n \sqrt{\mu_k / \mathcal{L}}, \tag{52}$$

with

$$\mathcal{L} = \left( \sum_{k=1}^{N_s} \sqrt{\mu_k} \right)^2. \tag{53}$$

Use of this recipe for the $n_k$ predicts an optimum value for the standard error,

$$W_n(\mathrm{opt}) = \sqrt{\mathcal{L}/n}. \tag{54}$$

This optimized variance is still a function of how we choose to stratify the domain.
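The optimal allocation can be checked numerically. The sketch below distributes a fixed sample budget in proportion to the square roots of the stratum second moments and compares the resulting sum of $\mu_k/n_k$ with that of an equal allocation. The stratum moments used here are made-up numbers, the allocations are left as real values (rounding to integers is up to the user), and the function names are illustrative.

```python
import math


def optimal_allocation(mu, n_total):
    """Samples per stratum proportional to sqrt(mu_k), summing to n_total."""
    roots = [math.sqrt(m) for m in mu]
    norm = sum(roots)
    return [n_total * r / norm for r in roots]


def stratified_variance(mu, counts):
    """Squared standard error of a stratified estimate: sum of mu_k / n_k."""
    return sum(m / n_k for m, n_k in zip(mu, counts))


mu = [1.0, 4.0, 9.0]  # hypothetical stratum second central moments
n_total = 600
best = optimal_allocation(mu, n_total)
var_best = stratified_variance(mu, best)
var_equal = stratified_variance(mu, [n_total / len(mu)] * len(mu))
print(best, var_best, var_equal)
```

Here the optimal allocation is 100, 200, and 300 samples, giving a squared error of 0.06 against 0.07 for equal allocation, and 0.06 coincides with $(\sum_k \sqrt{\mu_k})^2/n = 36/600$.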

It can be shown that the optimized standard error given by Eq. (54) is always less than the standard error given by Eqs. (36) and (37).54 Thus, stratified sampling with optimized nk will always reduce the variance of a crude Monte Carlo integration. However, if the nk are not optimized or if the stratification boundaries are not well chosen, the improvement in the variance may be insignificant. One obvious problem is that we do not know the strata variances a priori. If we use estimates, then the nk will not precisely equal the optimum ones of Eq. (52). If we use poor estimates the stratification may even be counterproductive.

Sophisticated (and computationally expensive) algorithms have been developed in some previous work to optimize the stratification of $D$.84,85 We have adopted a simpler strategy for the stratification of the configuration space, in which we base the stratum boundaries on a preliminary sample, which, for efficiency, is not discarded, but rather is retained for use in the final integral evaluation. We combine this with the Box-Muller algorithm to sample the Fourier coefficient space. The resulting algorithm is very effective. In the rest of this paper we describe the algorithm in some detail. Our discussion is presented in the context of Fourier path-integral Monte Carlo computations of exact quantum-mechanical partition functions, but could readily be applied in other contexts.

IV. COMPARISON OF BOX-MULLER AND METROPOLIS ALGORITHMS FOR A ONE-DIMENSIONAL INTEGRAL

In Sec. III B we pointed out that Eqs. (36) and (37) are not guaranteed to yield an accurate estimate of the width of the distribution of $v$ independent $n$-point Monte Carlo calculations of an integral whose exact value is given by $\phi$. In this section we will present several numerical studies which illustrate this point further. We consider an experiment similar to the one discussed in Sec. III B, i.e., we carry out $v$ mutually independent integration experiments, with each experiment carried out by generating $n$ samples of the domain (which may or may not be mutually independent). We use these samples to estimate $\phi$, defined by Eq. (22), and the second central moment $\mu(\phi)$, defined by Eq. (29). Let $D$ be the positive real axis, let $g(x;a)$ be a normalized importance sampling function parametrized by $a$, and let $h(x;a) = f(x)/g(x;a)$. Then



TABLE I. Comparison of Monte Carlo algorithms for 200 independent Monte Carlo integrations with n = 200 000.^a

Scheme      $\langle\phi_n^{(j)}\rangle_v$ ^b   $\langle m_n^{(j)}\rangle_v$ ^c   $w_n^{(v)}$ ^d   $[P_1^{(1)},P_1^{(2)}]$ ^e   $w_n^{[v]}$ ^f   $[P_2^{(1)},P_2^{(2)}]$ ^e   $w_n^{[v]}/w_n^{(v)}$
Met. A^g    1.99988   0.57045   0.016889   [33.0,61.0]   0.036791   [65.0,97.0]   2.18
Met. B^h    2.00046   0.57114   0.016899   [30.5,55.5]   0.041968   [09.5,96.0]   2.48
BM A^i      2.00012   0.57080   0.016894   [70.5,94.5]   0.017199   [70.5,95.0]   1.02
BM B^j      2.00007   0.57093   0.016896   [71.5,98.0]   0.015600   [64.5,96.0]   0.92
Exact       2.00000   0.57080   0.016894   [68.3,95.4]   0.016894   [68.3,95.4]   1.00

^a $\phi = \int_0^\infty h(x;a)\,g(x;a)\,dx$; $a = 1$. See Eqs. (57) and (58).
^b Average value of integral over $n \times v$ samples.
^c Average value of the second central moment of $h(x;a)$ over $n \times v$ samples.
^d See Eq. (38).
^e Percentages of the 200 integrations which lie respectively within one and two standard deviations (previous column) of $\langle\phi_n^{(j)}\rangle_v$.
^f See Eq. (61).
^g Metropolis walk; rejects negative x values as having too high a "penalty" and reevaluates g(x;a) at the previous configuration. Step size, 2.25; RANF is used to generate the pseudorandom number (PRN) sequences.
^h Same as footnote g, except RCARRY2 is used to generate the PRN sequences.
^i Box-Muller algorithm used to sample from a Gaussian with width given by $\sigma = 1/\sqrt{2a}$; samples with negative x values are discarded without being used. RANF is used to generate the PRN sequences.
^j Same as footnote i, except RCARRY2 is used to generate the PRN sequences.

$$\phi = \int_0^\infty h(x;a)\,g(x;a)\,dx, \tag{55}$$

and the normalization condition is

$$\int_0^\infty g(x;a)\,dx = 1. \tag{56}$$

We will consider a particular $h(x;a)$ of the form

$$h(x;a) = 1 + \sqrt{\pi a}\,x, \tag{57}$$

and let our importance sampling function be a normalized Gaussian

$$g(x;a) = \frac{\exp(-ax^2)}{\int_0^\infty \exp(-ax^2)\,dx}. \tag{58}$$

Both $\phi$ and $\mu(\phi)$ may be calculated analytically, and they happen to be independent of $a$. They are given by

$$\phi = 2, \tag{59}$$

$$\mu(\phi) = \pi/2 - 1. \tag{60}$$

Next, $\phi$ is calculated by drawing $n$ samples from $g(x;a)$ and averaging $h(x;a)$ over all of the samples according to Eq. (39), repeating the calculation $v$ times. One can estimate the width of the distribution of such $n$-sample experiments by using Eqs. (36) and (40), but the estimate will only be accurate if the $n$ samples in a given calculation are mutually independent. Alternatively, the width of the distribution can be estimated using the standard formula for the variance of $v$ independent Gaussian-distributed objects,70

$$w_n^{[v]} = \left\{ \frac{1}{v-1} \sum_{j=1}^{v} \left[ \phi_n^{(j)} - \langle\phi_n^{(j)}\rangle_v \right]^2 \right\}^{1/2}. \tag{61}$$

Notice the different convention for the superscript to distinguish this estimate from Eq. (38). Unlike Eqs. (36) and (40), the assumption of mutually independent samples has no relevance to whether Eq. (61) is appropriate. Thus, if $v$ is large enough, Eq. (61) yields a reasonably accurate

TABLE II. Comparison of Monte Carlo algorithms for 2000 independent Monte Carlo integrations with n = 200 000.^a

Scheme   $\langle\phi_n^{(j)}\rangle_v$   $\langle m_n^{(j)}\rangle_v$   $w_n^{(v)}$   $[P_1^{(1)},P_1^{(2)}]$   $w_n^{[v]}$   $[P_2^{(1)},P_2^{(2)}]$   $w_n^{[v]}/w_n^{(v)}$
Met. A   1.99996   0.57071   0.016893   [32.7,58.8]   0.041280   [68.3,95.2]   2.44
Met. B   2.00002   0.57067   0.016892   [33.2,58.8]   0.040646   [67.7,95.8]   2.41
BM A     1.99998   0.57083   0.016894   [68.7,95.6]   0.016834   [68.6,95.5]   1.00
BM B     2.00002   0.57075   0.016893   [68.0,95.4]   0.016839   [67.8,95.3]   1.00
Exact    2.00000   0.57080   0.016894   [68.3,95.4]   0.016894   [68.3,95.4]   1.00

^a See Table I for explanations of columns and rows.




TABLE III. Comparison of Monte Carlo algorithms for 200 independent Monte Carlo integrations with n = 2 000 000.^a

Scheme   $\langle\phi_n^{(j)}\rangle_v$   $\langle m_n^{(j)}\rangle_v$   $w_n^{(v)}$   $[P_1^{(1)},P_1^{(2)}]$   $w_n^{[v]}$   $[P_2^{(1)},P_2^{(2)}]$   $w_n^{[v]}/w_n^{(v)}$
Met. A   2.00015   0.57094   0.0053430   [28.0,58.0]   0.0132893   [67.0,95.0]   2.49
Met. B   1.99995   0.57083   0.0053424   [25.5,54.5]   0.0140134   [67.0,96.5]   2.62
BM A     1.99996   0.57079   0.0053422   [66.5,97.0]   0.0053308   [66.0,97.0]   1.00
BM B     2.00000   0.57074   0.0053420   [69.5,96.5]   0.0049951   [64.5,95.0]   0.94
Exact    2.00000   0.57080   0.0053423   [68.3,95.4]   0.0053423   [68.3,95.4]   1.00

^a See Table I for explanations of columns and rows.

value for the width of $n$-point integrations regardless of the algorithm used to carry out the integration. We note that this method is practical for test problems, but not in general for demanding applications, since it requires both $n$ and $v$ to be large.

We will compare the accuracies that can be achieved for this problem using two methods to sample from $g(x;a)$. The first method is the Metropolis algorithm, which is explained above. The second method is the Box-Muller algorithm,63 which is well described in standard treatments.59,60 Fundamentally, one generates two random numbers $(\xi_1,\xi_2)$ which are uniformly distributed on (0,1); then the following equations are used to generate two random numbers $(y_1,y_2)$ which are distributed on $(-\infty,\infty)$ with distribution width $\sigma$:

$$y_1 = \sigma\sqrt{-2\ln(1-\xi_1)}\,\cos(2\pi\xi_2), \tag{62}$$

$$y_2 = \sigma\sqrt{-2\ln(1-\xi_1)}\,\sin(2\pi\xi_2). \tag{63}$$

Note that $(y_1,y_2)$ are mutually independent samples drawn from a Gaussian distribution centered at the origin, with width $\sigma$, on the interval $(-\infty,\infty)$. This differs from the Metropolis algorithm in that the sequentially generated configurations are uncorrelated.
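The transformation of Eqs. (62) and (63) can be sketched in a few lines of Python. This is a minimal illustration that uses the standard library generator in place of RANF or RCARRY2; the function name `box_muller`, the seed, and the choice of width are assumptions made for the example.

```python
import math
import random

def box_muller(sigma, rng):
    """Return two independent Gaussian samples of width sigma, Eqs. (62)-(63)."""
    xi1 = rng.random()                    # uniform on [0, 1)
    xi2 = rng.random()
    radius = sigma * math.sqrt(-2.0 * math.log(1.0 - xi1))
    angle = 2.0 * math.pi * xi2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(12345)                # illustrative seed
sigma = 0.5
samples = []
for _ in range(50000):
    samples.extend(box_muller(sigma, rng))

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"mean={mean:.4f}  variance={var:.4f}  (target variance {sigma**2})")
```

Using `1.0 - xi1` keeps the argument of the logarithm strictly positive when the generator returns exactly zero.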

In Tables I-III and Figs. 1 and 2, we compare the Metropolis algorithm to the Box-Muller algorithm using various values of $n$ and $v$. In the basic Metropolis algorithm, one generates a randomly chosen configuration within some subspace of the infinite domain about the present configuration, and then tests the new configuration to see whether the energy has increased or decreased. The new configuration is then accepted or rejected as described in Sec. III A, except that in the present case $ax^2$ takes the role of $\beta V(x)$. However, we wish to restrict the domain to positive values of $x$ [note the limits of integration in Eqs. (55) and (56)], and therefore we must constrain the Metropolis walk from sampling negative values of $x$.

There are several ways to restrain the Metropolis walk from generating negative values of x while still sampling g(x;a) properly. In the present study all steps which produce negative values of x are automatically rejected as having too high a penalty, causing h(x;a) to be counted again at the previous configuration. The Box-Muller algorithm also generates negative x samples; these were simply not accumulated for averaging purposes.

In the Metropolis calculations, the step size was adjusted so that the average acceptance ratio was approximately 50%. The random numbers were generated with RANF, the intrinsic Cray FORTRAN random number generator, or RCARRY2, a modified version of the portable random number generator RCARRY presented recently by James.86
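The comparison described here can be reproduced in miniature with the Python sketch below. It integrates $h(x;a) = 1 + \sqrt{\pi a}\,x$ against the half-line Gaussian with $a = 1$, once with a Metropolis walk that rejects negative trial points and once with independent Box-Muller samples, and then measures the spread of repeated runs in the manner of Eq. (61). Several choices are assumptions rather than the authors' settings: the step size 2.25 is interpreted as the maximum displacement of a uniform trial move, the walk starts at x = 0.5 with no equilibration, and the run sizes are far smaller than those in the tables.

```python
import math
import random

A = 1.0                                   # Gaussian parameter a in g(x;a)
SQRT_PI_A = math.sqrt(math.pi * A)

def h(x):
    """h(x;a) = 1 + sqrt(pi*a)*x, whose exact average over g is 2."""
    return 1.0 + SQRT_PI_A * x

def metropolis_run(n, step, rng):
    """Average h over an n-step Metropolis walk restricted to x >= 0."""
    x, total = 0.5, 0.0
    for _ in range(n):
        trial = x + step * (2.0 * rng.random() - 1.0)
        if trial >= 0.0:                  # negative trials are always rejected
            delta = A * (trial * trial - x * x)
            if delta <= 0.0 or rng.random() < math.exp(-delta):
                x = trial
        total += h(x)                     # a rejected move recounts the old x
    return total / n

def box_muller_run(n, rng):
    """Average h over n independent samples; negative samples are dropped."""
    sigma = 1.0 / math.sqrt(2.0 * A)
    total, kept = 0.0, 0
    while kept < n:
        radius = sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        angle = 2.0 * math.pi * rng.random()
        for y in (radius * math.cos(angle), radius * math.sin(angle)):
            if y >= 0.0 and kept < n:
                total += h(y)
                kept += 1
    return total / n

def mean_and_width(estimates):
    """Mean of the v estimates and their sample standard deviation."""
    v = len(estimates)
    mean = sum(estimates) / v
    spread = sum((e - mean) ** 2 for e in estimates) / (v - 1)
    return mean, math.sqrt(spread)

rng = random.Random(2024)                 # illustrative seed
V_RUNS, N_SAMPLES = 40, 2000
met = [metropolis_run(N_SAMPLES, 2.25, rng) for _ in range(V_RUNS)]
bm = [box_muller_run(N_SAMPLES, rng) for _ in range(V_RUNS)]
met_mean, met_width = mean_and_width(met)
bm_mean, bm_width = mean_and_width(bm)
print(f"Metropolis: mean={met_mean:.4f}  width={met_width:.4f}")
print(f"Box-Muller: mean={bm_mean:.4f}  width={bm_width:.4f}")
```

With so few runs the measured widths are themselves noisy, so the sketch should only be expected to show the qualitative trend that correlated samples give a broader distribution of estimates.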

[Figure 1: two histogram panels, (a) Metropolis and (b) Box-Muller. Horizontal axis: range of $\phi_n^{(j)}$, from 1.985 to 2.015; vertical axis: $N$. Panel (a) marks the widths $w_n^{(v)}$ (33%) and $w_n^{[v]}$ (65%); panel (b) is annotated $w_n^{[v]}/w_n^{(v)} = 1.0$.]

FIG. 1. Comparison of Metropolis and Box-Muller algorithms. The function integrated is given in Eqs. (55)-(60). In each calculation [(a) based on the Metropolis scheme, and (b) based on Box-Muller integration], n = 200 000 configurations are used in each of v = 2000 independent integrations of f(x). See text and Table II for more details.




[Figure 2: two histogram panels, (a) Metropolis and (b) Box-Muller. Horizontal axis: range of $\phi_n^{(j)}$, from 1.985 to 2.015; vertical axis: $N$. Panel (a) marks the widths $w_n^{(v)}$ (28%) and $w_n^{[v]}$ (67%); panel (b) is annotated $w_n^{[v]}/w_n^{(v)} = 1.0$.]

FIG. 2. Same as Fig. 1, except that n = 2 000 000 configurations are used in each of v = 200 independent integrations of f(x). See text and Table III for more details.

All calculations were carried out on a Cray X/MP-EA computer.

In Tables I-III, some overall trends are evident. The first is that the Box-Muller and Metropolis algorithms predict the correct value of the integral with comparable accuracy. However, $w_n^{(v)}$ as given by Eqs. (38) and (40) is not an accurate estimate of the width of the distribution of $n$-sample Metropolis walks, while it is an excellent estimate of the width of $n$-sample Box-Muller integrations. As previously stated, this is due to the fact that the configurations generated by the Metropolis algorithm are not mutually independent.80-82 Second, the actual width of the Metropolis distribution of $n$-sample integrations is, on the average, 2.5 times the width of the Box-Muller distribution (see Fig. 1). Thus, while both algorithms are quite accurate, the Box-Muller algorithm is much more rapidly convergent. Third, RCARRY2 performs just as well as RANF in all cases examined. We have therefore chosen to use RCARRY2 in all of our FPI calculations, as its use allowed us to write our FPI Box-Muller Monte Carlo code in machine-portable FORTRAN.

In the Metropolis calculations described above, no equilibration period was employed for the Metropolis walks. As we have mentioned previously, we are not guaranteed to sample from $g(x)$ from the start of the chain, and therefore it is generally important to allow the walker to equilibrate before collecting samples for averaging purposes. Therefore, in Table IV we present calculations using Metropolis variant A with varying $n$ and $v$, which have been equilibrated for a period of 10% of $n$.72 We find that in this simple one-dimensional case the statistics are unchanged by equilibration, which is to say that $g(x)$ is correctly sampled almost immediately. This justifies our omission of the equilibration period for this case. In general, in multidimensional problems, one is not usually so fortunate, and one should never omit the equilibration period without checking to see if the statistics are affected.

TABLE IV. Tests of Metropolis walk equilibration period.^a

v      n                $\langle\phi_n^{(j)}\rangle_v$   $\langle m_n^{(j)}\rangle_v$   $w_n^{(v)}$   $[P_1^{(1)},P_1^{(2)}]$   $w_n^{[v]}$   $[P_2^{(1)},P_2^{(2)}]$   $w_n^{[v]}/w_n^{(v)}$
200    $2\times10^5$    1.99952   0.57031   0.001689   [27.0,56.0]   0.004243   [66.0,96.5]   2.51
200    $2\times10^6$    2.00014   0.57102   0.000534   [34.0,64.0]   0.001308   [72.0,94.0]   2.45
2000   $2\times10^5$    1.99992   0.57063   0.001689   [30.5,59.5]   0.004120   [68.5,95.7]   2.44

^a See Table I for explanation of columns. Variant A of the Metropolis code was used in all calculations. See Table I for exact values.

In Tables V and VI we investigate the dependence of our conclusions on the acceptance ratio. The reason we do this is that in certain contexts, a 50% acceptance ratio is not considered to be the optimum choice.72 Table V shows the results of equilibrated and nonequilibrated calculations with a 10% acceptance ratio, and Table VI shows calculations using a 90% acceptance ratio. In both cases we find that the actual width of the distribution of calculations is greater than that obtained using a 50% acceptance ratio, which indicates that the values given in Tables I-IV are a fair comparison between the Metropolis and Box-Muller algorithms. Thus, even when we optimize the step size in the Metropolis algorithm, it leads to an error which is 2-2.5 times larger than that obtained using mutually independent samples.

As mentioned earlier, the use of $w_n^{[v]}$ to calculate the error presumes that both $n$ and $v$ are large, which is impractical for multidimensional problems. In the next section we will compare the Metropolis and Box-Muller algorithms in the context of Fourier path-integral Monte Carlo computations, and in order to do so conveniently we




TABLE V. Metropolis walks using 10% acceptance ratio.^a

Equil.^b   v      n                $\langle\phi_n^{(j)}\rangle_v$   $\langle m_n^{(j)}\rangle_v$   $w_n^{(v)}$   $[P_1^{(1)},P_1^{(2)}]$   $w_n^{[v]}$   $[P_2^{(1)},P_2^{(2)}]$   $w_n^{[v]}/w_n^{(v)}$
No         200    $2\times10^5$    1.99963   0.57101   0.001689   [15.0,27.0]   0.008277   [67.5,95.5]   4.90
No         200    $2\times10^6$    1.99973   0.57063   0.000534   [15.5,35.0]   0.002609   [67.0,95.5]   4.88
No         2000   $2\times10^5$    2.00005   0.57103   0.001690   [15.3,30.9]   0.008403   [67.5,95.4]   4.97
Yes        200    $2\times10^5$    1.99911   0.57061   0.001689   [16.0,30.0]   0.008177   [65.5,95.5]   4.84
Yes        200    $2\times10^6$    1.99969   0.57059   0.000534   [11.5,30.0]   0.002678   [69.0,96.5]   5.01
Yes        2000   $2\times10^5$    1.99990   0.57082   0.001689   [16.2,31.0]   0.008432   [67.9,95.3]   4.99

^a See Table I for explanation of unfootnoted columns. Variant A of the Metropolis code was used in all calculations; step size, 12.5. See Table I for exact values.
^b Indicates whether the run was equilibrated with 10% $n$ initial moves.

need a way of computing the error from a single calculation. Therefore we have experimented with the use of Eqs. (41)-(45) to estimate the error. Instead of the exponential fitting procedure developed by Straatsma and co-workers (discussed in Sec. III B), we choose to strictly truncate the sum in Eq. (41) at $K = K_{\max}$. In Table VII we compare two different methods for choosing $K_{\max}$ in the case of the one-dimensional example we have been discussing. In method A, $K_{\max}$ is chosen according to the criterion of Straatsma, Berendsen, and Stam, i.e., $K_{\max}$ is the smallest $K$ such that

$$\rho(K) > 2\sqrt{\mathrm{var}(\rho_K)}, \quad K = 1, \ldots, K_{\max}. \tag{64}$$

Examining Table VII, we see that using method A tends to slightly underestimate the expected error, but that it is still a reasonably good estimate of $w_n^{[v]}$. Method B is similar to method A, except that $K_{\max}$ is the smallest $K$ such that

$\rho(K) > 0,$

and

(65)

This has the effect of including more terms than does method A, while excluding contributions which cannot be distinguished from statistical noise. We see in Table VII that method B still slightly underestimates the error but is a better estimate than method A; it also is reasonably accurate for most purposes (i.e., it is correct to 1 significant figure). Therefore, we recommend method B for estimating the errors of correlated data series, and use it in Sec. IV.
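To make the truncation idea concrete, the following Python sketch estimates the standard error of the mean of a correlated series by summing its autocorrelation function up to the last lag before $\rho(K)$ first becomes non-positive. Eqs. (41)-(45) are not reproduced in this section, so the sketch assumes the textbook form $w^2 \approx (m/n)[1 + 2\sum_{K=1}^{K_{\max}} \rho(K)]$, and it implements only the $\rho(K) > 0$ condition of method B. The test series, an autoregressive sequence with correlation coefficient 0.7, and all function names are hypothetical.

```python
import math
import random

def autocorrelation(series, lag, mean, var):
    """Normalized autocorrelation of the series at the given lag."""
    n = len(series)
    acc = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return acc / ((n - lag) * var)

def correlated_standard_error(series):
    """Return (corrected error, naive error, K_max) for a correlated series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((s - mean) ** 2 for s in series) / n
    rho_sum, k_max = 0.0, 0
    for lag in range(1, n // 2):
        rho = autocorrelation(series, lag, mean, var)
        if rho <= 0.0:                    # stop at the first non-positive term
            break
        rho_sum += rho
        k_max = lag
    naive = math.sqrt(var / n)            # valid only for independent samples
    return naive * math.sqrt(1.0 + 2.0 * rho_sum), naive, k_max

# Hypothetical correlated data: x[t+1] = 0.7 * x[t] + Gaussian noise.
rng = random.Random(99)
x, series = 0.0, []
for _ in range(20000):
    x = 0.7 * x + rng.gauss(0.0, 1.0)
    series.append(x)

corrected, naive, k_max = correlated_standard_error(series)
print(f"naive={naive:.5f}  corrected={corrected:.5f}  K_max={k_max}")
```

For this test series the exact inflation factor is $\sqrt{(1+0.7)/(1-0.7)} \approx 2.4$, which gives a rough check on the truncated sum.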

TABLE VI. Metropolis walks using 90% acceptance ratio.^a

Equil.^b   v      n                $\langle\phi_n^{(j)}\rangle_v$   $\langle m_n^{(j)}\rangle_v$   $w_n^{(v)}$   $[P_1^{(1)},P_1^{(2)}]$   $w_n^{[v]}$   $[P_2^{(1)},P_2^{(2)}]$   $w_n^{[v]}/w_n^{(v)}$
No         200    $2\times10^5$    2.00073   0.57059   0.001689   [5.0,11.0]   0.020233   [70.0,96.0]   12.0
No         200    $2\times10^6$    2.00026   0.57159   0.000535   [5.0,12.5]   0.006023   [66.5,96.0]   11.3
No         2000   $2\times10^5$    2.00134   0.57244   0.001692   [7.5,13.9]   0.019741   [69.0,95.5]   11.7
Yes        200    $2\times10^5$    1.99859   0.56785   0.001685   [4.5,11.0]   0.020992   [69.5,94.5]   12.5
Yes        200    $2\times10^6$    2.00040   0.57206   0.000535   [4.5,13.5]   0.006294   [68.0,96.0]   11.8
Yes        2000   $2\times10^5$    2.00018   0.57056   0.001689   [7.1,13.9]   0.019630   [68.2,95.3]   11.6

^a See Table I for explanation of unfootnoted columns. Variant A of the Metropolis code was used in all calculations; step size, 0.3. See Table I for exact values.
^b Indicates whether the run was equilibrated with 10% $n$ initial moves.

TABLE VII. Comparison of methods for choosing $K_{\max}$.^a

v      n                $\langle\phi_n^{(j)}\rangle_v$   $w_n^{(v)}$   $w_n^{[v]}$   $w_n$(A) ^b   $\langle K_{\max}\rangle$ ^c   $w_n$(B) ^d   $\langle K_{\max}\rangle$ ^c
200    $2\times10^5$    1.99952   0.001689   0.004243   0.003906   11.5   0.003960   24.3
200    $2\times10^6$    2.00014   0.000534   0.001308   0.001249   11.6   0.001252   24.5
2000   $2\times10^5$    1.99992   0.001689   0.004120   0.003909   14.9   0.003964   28.6

^a See Table I for explanation of unfootnoted columns. Variant A of the Metropolis code was used in all calculations; step size, 2.25 (average acceptance ratio, 0.519). See Table I for exact values.
^b Computed width from Eqs. (41), (46), and (64).
^c Average value of $K_{\max}$ used to calculate $w_n$ (previous column).
^d Computed width from Eqs. (41), (46), and (65).

V. PARTITION FUNCTION COMPUTATIONS

In Sec. V A we present a method for integrating the partition function of a coupled oscillator, using Box-Muller importance sampling in the Fourier coefficient space59,63 and uniform sampling in the configuration space, and we apply it to a coupled two-dimensional harmonic oscillator. In Sec. V B we extend the method by combining the Box-Muller method of sampling the Fourier coefficient space with a simple algorithm for stratified sampling in the configuration space, and we show that it substantially decreases the standard error of the calculations. Finally, in Sec. V C we present an improved algorithm we call adaptively optimized stratified sampling, which may be efficient enough to carry out calculations of quantum partition functions for polyatomic molecules.

V. A. A strongly coupled two-dimensional system

We have carried out calculations on an anisotropic model harmonic system with two strongly coupled degrees of freedom. The Hamiltonian is

$$\hat{H} = \sum_{j=1}^{2} \left( \frac{\hat{p}_j^2}{2m_j} + \frac{m_j\omega_j^2}{2}\,x_j^2 \right) + \alpha x_1 x_2, \tag{66}$$

where $m_1$ is the approximate reduced mass of a diatomic OH fragment, and $m_2$ is the approximate reduced mass of an H2 fragment. So that the tests give relevant insight into the effort necessary to treat an actual molecule, the diagonal parameters of the system (frequencies and masses) were chosen to resemble a stretch-bend model of two vibrational modes of an H2O molecule. The experimental vibrational frequencies of H2O are87 $\nu_1 = 3943$ cm$^{-1}$, $\nu_2 = 3832$ cm$^{-1}$, and $\nu_3 = 1648$ cm$^{-1}$. The parameters we used, as given in Table VIII, correspond to $\nu_1 = 3900$ cm$^{-1}$ and $\nu_2 = 1650$ cm$^{-1}$. The partition function for this system can be solved analytically for both a finite and an infinite number of Fourier coefficients (see Appendix A). Thus we can monitor the convergence of the integrand separately from the convergence of the Monte Carlo (MC) computation, which we will see is important in the calculations for large values of $K$.

First, we examine a set of calculations carried out by sampling coordinate space uniformly from a square box with sides of length $L$ at various temperatures. This is roughly equivalent to using the $K$-coefficient approximation to the density matrix of a particle confined in an infinite square potential as an importance sampling function [see Eqs. (19) and (20) and Appendix A]. In all cases the finite approximation to the infinite-dimensional Fourier coefficient space is sampled over the full infinite interval. The calculation is accomplished by generating $n$ configurations and evaluating the inverse exponential of the action functional $S(x,a)$ at each sample via Gauss-Legendre quadrature.35,60 In each sample, both components of the vector $x$ are chosen uniformly. The components of the $a$ array are drawn from a normalized multidimensional Gaussian distribution via the Box-Muller algorithm (see Sec. IV) with widths given by the parameters $\sigma_{j,l}$. All components of the $x$ vector and the $a$ array are varied simultaneously, i.e., at each sample a new value of $x$ and $a$ is generated and used.

The average over n = 1 000 000 configurations is used to form the MC estimate of the partition function via the expression

$$\langle Q(T)\rangle_n = Q_{pb}(T)\,\langle e^{-S(x,a)}\rangle_n. \tag{67}$$

TABLE VIII. Parameters of the coupled 2D harmonic H2O model.^a

$m_1$ = 0.948 amu
$m_2$ = 0.504 amu
$\alpha$ = 10.0 eV Å$^{-2}$
$\omega_1$ = 0.73464 fs$^{-1}$
$\omega_2$ = 0.31079 fs$^{-1}$

^a For Hamiltonian of Eq. (66). Conversion factors between this set of units and atomic units are 1 bohr = 0.529 177 Å, 1 hartree = 27.2114 eV, 1 amu = 1822.89 atomic units of mass, and 1 fs = 41.3414 atomic units of time [A. Szabo and N. S. Ostlund, Modern Quantum Chemistry (McGraw-Hill, New York, 1989); E. R. Cohen and B. N. Taylor, Phys. Today 44 (No. 8), 9 (1991)].

The standard error is estimated using Eqs. (36), (39), and (40) to obtain

$$w_n(T) = \sqrt{m_n(T)/n}, \tag{68}$$

$$m_n(T) = [Q_{pb}(T)]^2 \left[ \langle e^{-2S(x,a)}\rangle_n - \langle e^{-S(x,a)}\rangle_n^2 \right]. \tag{69}$$
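The bookkeeping behind Eqs. (68) and (69) can be sketched as follows. The sketch assumes that the partition-function estimate is the prefactor $Q_{pb}(T)$ times the sample mean of $e^{-S}$, which is the form implied by Eq. (69). Because the action functional is not reproduced here, a one-dimensional toy integrand stands in for $e^{-S(x,a)}$ and the box length stands in for the prefactor; the function name `crude_estimate` and all numerical settings are illustrative.

```python
import math
import random

def crude_estimate(weights, q_pb):
    """Return (estimate, standard error) from samples of exp(-S)."""
    n = len(weights)
    mean1 = sum(weights) / n
    mean2 = sum(w * w for w in weights) / n
    q = q_pb * mean1                          # prefactor times sample mean
    m = q_pb ** 2 * (mean2 - mean1 ** 2)      # second central moment, Eq. (69)
    return q, math.sqrt(m / n)                # standard error, Eq. (68)

# Toy stand-in: integrate exp(-x^2) over a box of length 10 by uniform
# sampling, so the exact answer is very close to sqrt(pi).
rng = random.Random(1)                        # illustrative seed
box = 10.0
weights = [math.exp(-rng.uniform(-box / 2, box / 2) ** 2) for _ in range(20000)]
q, w = crude_estimate(weights, box)
print(f"estimate={q:.4f} +/- {2 * w:.4f}   exact={math.sqrt(math.pi):.4f}")
```

The printed error bar is $2w_n$, matching the 95% confidence convention used for the tabulated results.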

The size of the box was obtained by carrying out a series of calculations with K = 1 and n = 50 000, increasing $L$ until the successive calculations varied by less than the desired accuracy. In this case a box length equal to 5.0 Å resulted in changes in $\langle Q(T)\rangle_n$ of less than 2% (when compared to L = 5.5 Å) at the highest temperature considered. Thus, L = 5.0 Å was used for all calculations.

The results of a series of calculations at T = 600 K are given in Fig. 3, and calculations over a range of temperatures are given in Table IX. In Fig. 3 we plot the ratio of the finite-$K$ approximation of the partition function to the exact value of $Q(T)$ for both the analytic values and the Monte Carlo calculations. We see that, for virtually all temperatures and all values of $K$, the algorithm converges to the correct values within the Monte Carlo statistical limits. The sole exception seems to be at T = 300 K with K = 16, but we attribute this to a normal statistical fluctuation.

Originally all calculations were carried out using a 10-point Gauss-Legendre integration of $S(x,a)$ at each sample. However, upon repeating the calculations with $K$ equal to 16 and 32 several times and using different random number sequences, we found that there was a source of systematic error which caused the Monte Carlo calculations to be too high. Increasing the order of the Gauss-Legendre quadrature to a 30-point integration produces the correct statistics, and increasing the order further had a negligible effect on the calculations. We also found that 10-point integration was sufficient for small $K$. This behavior is apparently due to the rapid oscillations of the high-order sine functions used to expand the "paths" for large values of $K$. Similar results were recently reported by Doll and co-workers.40 For all calculations presented here we have used 10-point Gauss-Legendre quadrature for $K \le 8$ and 30-point quadrature for $K \ge 16$.

FIG. 3. Convergence of finite-$K$ approximation to the partition function at T = 600 K. Triangles indicate the exact values at finite $K$, which are given by Eq. (A9) in the Appendix. Circles with $2w_n$ error bars indicate Monte Carlo calculations using a uniform, or "crude," sampling of the configuration space. See text and Table IX for more details.

TABLE IX. Crude FPI Box-Muller MC partition function computations.^a

T (K)   K    $Q^{(K)}(T)/Q(T)$ ^b   $\langle Q(T)\rangle_n/Q(T) \pm 2w_n(T)/Q(T)$ ^c
300     1    758.4    780 ± 75
        2    168.5    159 ± 25
        4    39.06    38 ± 9
        8    9.530    9 ± 2
        16   3.437    2.6 ± 0.8
        32   1.894    1.7 ± 0.6
600     1    10.32    10.2 ± 0.53
        2    5.060    4.9 ± 0.30
        4    2.865    2.8 ± 0.19
        8    1.816    1.88 ± 0.14
        16   1.369    1.43 ± 0.10
        32   1.174    1.20 ± 0.09
900     1    3.306    3.29 ± 0.13
        2    2.194    2.11 ± 0.09
        4    1.628    1.58 ± 0.07
        8    1.308    1.35 ± 0.06
        16   1.150    1.15 ± 0.05
        32   1.074    1.04 ± 0.05
1200    1    2.061    2.08 ± 0.07
        2    1.587    1.57 ± 0.05
        4    1.322    1.32 ± 0.05
        8    1.164    1.20 ± 0.04
        16   1.082    1.09 ± 0.04
        32   1.041    1.03 ± 0.04

^a n = 1 000 000 samples; L = 5.0 Å; configuration space sampled uniformly.
^b Exact finite-$K$ approximation to partition function (see Appendix A) divided by the exact value.
^c Monte Carlo calculation of finite-$K$ approximation to partition function divided by the exact value (see Appendix A). Error bars at the 95% confidence level are estimated by Eqs. (68) and (69).

Considering the convergence of the Fourier path-integral representation itself, we found that in the high-temperature range (T = 900-1200 K) the partition function is converged to within 4% of the actual value for K = 32. However, with the same value of $K$ there is a 20% error when T = 600 K, and a 72% error when T = 300 K. Moreover, the statistical error bars for the calculations are somewhat larger than desired in the low-temperature calculations. However, there are also some very encouraging trends. One is that the fractional standard error actually decreases as a function of $K$ at all four temperatures, which indicates that the convergence properties of Monte Carlo calculations for very large values of $K$ should be excellent (this result is certainly not guaranteed to be the case in general, as discussed recently by Oh88). However, as we have noted previously, it may be necessary to use an even higher-order Gauss-Legendre quadrature rule in order to converge the calculation as $K$ is further increased. We also may find that this property is unique to the harmonic oscillator and that the situation is not as favorable for more realistic potentials. Further study of these issues is in progress.92

V. B. Stratified sampling of the configuration space: A simple approach

In Sec. V A we evaluated the partition function using the Fourier coefficient representation of the integrals. We sampled the Fourier space by importance sampling with uncorrelated samples, but the configuration space was sampled uniformly. This procedure will not be very efficient for integrating strongly varying functions of $x$, especially as the dimensionality of the configuration space increases. However, choosing a general importance function for uncorrelated sampling of the configuration space is less than straightforward. For example, an importance function which is appropriate for a Morse oscillator might be a poor choice for a double-well potential. Moreover, it may be extremely difficult to design a good importance function that can be sampled practically and efficiently in an uncorrelated fashion for a polyatomic molecule allowed to rotate, vibrate, and undergo internal rotations. Of course, the Metropolis algorithm provides a way to sample a good importance function with correlated samples, but our goal here is to try to develop procedures that do not introduce convergence-retarding correlations in the sample. Therefore, we will use a combination of stratified sampling of the configuration space with importance sampling of the Fourier coefficient space. In this subsection we describe a simple stratified sampling (SS) algorithm which serves as a starting point for a more sophisticated approach described in the next subsection.

In Sec. III, we discussed stratified sampling and how the variance can be optimized, given a fixed stratification. The stratification we will consider here is the simplest possible one; we establish two strata and label them $(D_1,D_2)$. $D_1$ is a square box of side length $b$ centered at the origin, and $D_2$ is its complement within the boundary of the reference potential, which is an infinite square well of fixed length $L$ (see Fig. 4). Given this stratification scheme, our objective is to optimize $b$ and then allocate samples to each of the two strata according to Eqs. (52) and (53).

The optimization is achieved by noting that Eq. (50) will be minimized by causing each stratum to contribute equally to the square of the standard error. Thus, we proceed as follows: Picking a trial value of $b$ and fixing $L$, $K$, and $T$, we carry out a crude calculation of the partition function using a modest number $n_{pr}$ of "probe" samples (in the present case, $n_{pr}$ = 10 000 samples). The samples are drawn uniformly over the domain $D$, which encloses both

J. Chem. Phys., Vol. 97, No.5, 1 September 1992



[Figure 4: schematic of the square sampling domain of side length $L$, with axis ticks running from -2.5 to 2.5.]

FIG. 4. Illustration of simple stratification scheme. The parameter $b$ is to be optimized for fixed $L$, $K$, and $T$. $L$ is chosen to be sufficiently large such that the calculation is well converged.

strata. During the course of the calculation we keep track of how many samples lie within $D_1$ and how many lie within $D_2$, and we calculate the fractional partition function and its standard error separately for each stratum. Then we use the method of bisection60 to minimize the difference between the squares of the standard errors of the two strata, varying $b$. Adequate minimization (i.e., to 1 or 2 significant figures) can be obtained within several trials if a reasonable guess of $b$ is made.

Once $b$ is optimized, stratified sampling calculations using a larger value of $n$ are made for various values of $K$ and $T$. This is accomplished by sampling uniformly over $D$ with $n_{pr} = n/4$ initial probe samples, determining the optimum values of $(n_1,n_2)$ from application of Eqs. (52) and (53), and drawing the remaining $3n/4$ samples uniformly within each stratum so that the total number of samples in each stratum is the optimum. All samples taken during the initial probe are retained in the final sample, but we have not made use of the samples used to optimize $b$ for purposes of averaging, which is clearly not optimally efficient. This is one area in which we will improve the algorithm in Sec. V C.

It is important to note that in order for the procedure to work well, $n_{pr}$ must be large enough for the fractional partition functions to be reasonably well converged. Moreover, the procedure is not optimal in that $b$ should actually be reoptimized for each value of $K$ and $T$. In the next subsection we describe a more sophisticated method which does not suffer from these deficiencies.
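The two-stage procedure described above (a uniform probe over the whole box, followed by a targeted top-up of each stratum) can be illustrated with the Python sketch below. It is a simplified stand-in, not the authors' code: a sharply peaked two-dimensional Gaussian replaces $e^{-S}$, the inner box side is fixed at an arbitrary value rather than optimized by bisection, and the allocation assumes each stratum's variance term is its volume squared times the variance of the integrand within it. All names and numerical values are hypothetical.

```python
import math
import random

L_BOX, B_INNER = 5.0, 1.0            # hypothetical box side L and inner side b
VOLUMES = [B_INNER ** 2, L_BOX ** 2 - B_INNER ** 2]

def f(x, y):
    """Toy peaked integrand standing in for exp(-S); integral is ~0.08*pi."""
    return math.exp(-(x * x + y * y) / 0.08)

def in_inner(x, y):
    return abs(x) <= B_INNER / 2 and abs(y) <= B_INNER / 2

def draw(stratum, rng):
    """Uniform point in stratum 0 (inner square) or 1 (rest of the box)."""
    half = B_INNER / 2 if stratum == 0 else L_BOX / 2
    while True:
        x, y = rng.uniform(-half, half), rng.uniform(-half, half)
        if stratum == 0 or not in_inner(x, y):
            return x, y

def moments(values):
    mean = sum(values) / len(values)
    var = sum(v * v for v in values) / len(values) - mean * mean
    return mean, max(var, 0.0)

def stratified_integral(n, rng):
    values = [[], []]
    n_probe = n // 4
    for _ in range(n_probe):                  # probe: uniform over all of D
        x = rng.uniform(-L_BOX / 2, L_BOX / 2)
        y = rng.uniform(-L_BOX / 2, L_BOX / 2)
        values[0 if in_inner(x, y) else 1].append(f(x, y))
    # Target counts proportional to volume times standard deviation.
    roots = [VOLUMES[k] * math.sqrt(moments(values[k])[1]) for k in range(2)]
    targets = [n * r / sum(roots) for r in roots]
    deficits = [max(0.0, targets[k] - len(values[k])) for k in range(2)]
    scale = (n - n_probe) / sum(deficits)     # spread the remaining budget
    for k in range(2):
        for _ in range(int(deficits[k] * scale)):
            values[k].append(f(*draw(k, rng)))
    estimate, err2 = 0.0, 0.0
    for k in range(2):                        # probe samples are retained
        mean, var = moments(values[k])
        estimate += VOLUMES[k] * mean
        err2 += VOLUMES[k] ** 2 * var / (len(values[k]) - 1)
    return estimate, math.sqrt(err2)

rng = random.Random(7)                        # illustrative seed
estimate, error = stratified_integral(20000, rng)
print(f"estimate={estimate:.4f}  standard error={error:.4f}")
```

Because the probe already places many points in the large outer stratum, that stratum may end up with more samples than its target; the sketch simply keeps them, since discarding retained samples would waste effort.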

The estimates of the partition function and its standard error are obtained by combining Eqs. (49) and (50) with Eqs. (67)-(69):

$$\langle Q(T)\rangle_n = \sum_{k=1}^{N_s=2} [Q_{pb}(T)]_k\,\langle e^{-S(x,a)}\rangle_{n_k}, \tag{70}$$

$$w_n(T) = \left\{ \sum_{k=1}^{N_s=2} \frac{[Q_{pb}(T)]_k^2}{n_k - 1} \left[ \langle e^{-2S(x,a)}\rangle_{n_k} - \langle e^{-S(x,a)}\rangle_{n_k}^2 \right] \right\}^{1/2}. \tag{71}$$

[Figure 5: plot of $\langle Q(T)\rangle_n/Q(T) \pm 2w_n(\mathrm{SS})/Q(T)$ and of the exact $Q^{(K)}(T)/Q(T)$ versus $K$. Horizontal axis: $K$, from 0 to 35; vertical axis from 1.00 to 11.00.]

FIG. 5. Same as Fig. 3, except that the configuration space is sampled by stratified sampling (SS). Note that the error bars are much smaller than those in Fig. 3. See text and Table X for more details.

Note that the volume factor within $[Q_{pb}(T)]_k$ will generally depend upon the shape and size of the $k$th stratum.

The stratification parameter $b$ was optimized with L = 5.0 Å, K = 1, and T = 300 K. The resulting value of $b_{opt}$, equal to 0.555 Å, was then used for all values of $T$ and $K$. In Fig. 5 we present the results of this scheme for the partition function at T = 600 K; the results should be compared with Fig. 3. We see that, as before, virtually all of the Monte Carlo answers are converged to the exact finite-$K$ approximations to the partition function within the expected statistical error, but now the statistical error bars are considerably smaller. Table X presents results at additional temperatures; we see that the error bars are 3.3 times greater on the average in the crude calculations than in the SS calculations, which is a substantial improvement gained by expending very little additional computational effort, i.e., the effort required to optimize $b$. Thus the SS method appears to be very effective in reducing the statistical errors associated with the calculation. However, the method may still not be adequate when the system of interest is a many-dimensional system with a rapidly varying potential in the region of interest. Therefore, in the next subsection we consider a more sophisticated adaptive method which has better convergence properties.

TABLE X. FPI stratified sampling (SS) Monte Carlo partition function computations.^a

T (K)   K          $Q^{(K)}(T)/Q(T)$ ^b   $\langle Q(T)\rangle_n/Q(T) \pm 2w_n(T)/Q(T)$ ^c   $\langle Q(T)\rangle_n$ ^d   $w_n$(crude)/$w_n$(SS) ^e
300     1          758.4    770 ± 19       0.00232    3.98
        2          168.5    166 ± 6        0.00500    4.52
        4          39.06    40 ± 3         0.00122    3.61
        8          9.530    11.0 ± 0.7     0.00330    3.43
        16         3.437    3.4 ± 0.2      0.00101    3.53
        32         1.894    2.0 ± 0.2      0.00060    3.26
        $\infty$   1.000                   0.00030
600     1          10.32    10.3 ± 0.2     0.0188     3.41
        2          5.060    5.03 ± 0.08    0.0092     3.59
        4          2.865    2.86 ± 0.06    0.0052     3.39
        8          1.816    1.87 ± 0.04    0.00341    3.31
        16         1.369    1.38 ± 0.03    0.00252    3.09
        32         1.174    1.16 ± 0.03    0.00212    3.33
        $\infty$   1.000                   0.00182
900     1          3.306    3.31 ± 0.04    0.0551     3.05
        2          2.194    2.21 ± 0.03    0.0369     2.94
        4          1.628    1.62 ± 0.02    0.0271     3.11
        8          1.308    1.32 ± 0.02    0.0219     3.15
        16         1.150    1.15 ± 0.02    0.0191     3.46
        32         1.074    1.08 ± 0.01    0.0180     3.05
        $\infty$   1.000                   0.0167
1200    1          2.061    2.05 ± 0.02    0.110      2.99
        2          1.587    1.59 ± 0.02    0.0856     2.89
        4          1.322    1.31 ± 0.02    0.0706     2.96
        8          1.164    1.18 ± 0.01    0.0634     2.87
        16         1.082    1.08 ± 0.01    0.0583     2.88
        32         1.041    1.03 ± 0.01    0.0562     2.84
        $\infty$   1.000                   0.0539

^a n = 1 000 000 samples; L = 5.0 Å; configuration space sampled via the stratified sampling (SS) method.
^b See Table IX.
^c See Table IX. Error bars are estimated by Eqs. (68) and (71).
^d Absolute value of Monte Carlo partition function. $K = \infty$ values represent the exact partition function for the system (see Appendix A).
^e Ratio of error of crude calculations (see Table IX) to error of SS calculations.

V. C. Adaptively optimized stratified sampling

If we were to consider the extension of the stratified sampling method to a molecule or cluster vibrating and rotating in space, it might be necessary to try a variety of more complex stratification schemes than the one used in the preceding subsection. For example, in the case of a diatomic molecule, the potential energy is symmetric about the molecular center of mass, i.e., the potential is only a function of the internuclear distance $r$. For such a radially symmetric problem the use of cubic strata about the center of mass would most likely be less efficient than the use of spherical strata, as the ratio of the volume of a hypersphere to that of a hypercube rapidly decreases as a function of the number of degrees of freedom. The volume of an $N$-dimensional hypersphere with hyperradius $R$ is given by2,89,90

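$$\mathcal{V}^{[N]}_{\mathrm{sphere}} = \frac{\pi^{N/2} R^{N}}{\Gamma(N/2+1)}, \tag{72}$$

where $\Gamma(N)$ is the usual gamma function. The ratio of this volume to that of the hypercube enclosing it is therefore

$$\frac{\mathcal{V}^{[N]}_{\mathrm{sphere}}}{\mathcal{V}^{[N]}_{\mathrm{cube}}} = \frac{\pi^{N/2}}{2^{N}\,\Gamma(N/2+1)}. \tag{73}$$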

This ratio is asymptotically equal to the fraction of samples generated uniformly within a hypercube which lie within the hypersphere inscribed in it, and it rapidly tends towards zero as N becomes large. As an example, suppose that we need to generate 1 million samples within a hypersphere to obtain the desired convergence in a Monte Carlo integration. If N = 9, the ratio of volumes in Eq. (73) is approximately 0.0064. Therefore, it may be necessary to generate 156 million samples (using 1.4 billion random numbers) to obtain the desired 1 million samples within the hypersphere. As N increases it becomes more difficult to generate the desired number of samples within the hypersphere. If the function to be integrated is also radially symmetric, then samples taken outside of the hypersphere represent wasted effort if the hypersphere is sufficiently large that these samples contribute little to the integral overall. Thus, for radially symmetric problems (or problems with some degree of radial symmetry), sampling directly from a sphere will be more effective than sampling from a cube, as the ratio of the volume which contributes significantly to the integral to the volume of the sampling domain will tend to decrease more slowly as a function of N.
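The N = 9 estimate can be reproduced with a few lines of Python. This is a minimal sketch of the sphere-to-cube volume ratio and the resulting rejection cost; the function names are illustrative and not taken from the authors' code.

```python
import math

def sphere_to_cube_ratio(n_dim: int) -> float:
    """Volume of an N-dimensional ball divided by the volume of its enclosing cube."""
    return math.pi ** (n_dim / 2) / (2 ** n_dim * math.gamma(n_dim / 2 + 1))

def cube_samples_needed(n_inside: int, n_dim: int) -> float:
    """Expected number of uniform cube samples needed to land n_inside points in the ball."""
    return n_inside / sphere_to_cube_ratio(n_dim)

for n_dim in (2, 3, 6, 9):
    print(n_dim, sphere_to_cube_ratio(n_dim))

# Cost of obtaining one million samples inside a nine-dimensional hypersphere.
needed = cube_samples_needed(1_000_000, 9)
random_numbers = 9 * needed  # nine coordinates per trial point
```

For N = 9 this gives a ratio near 0.0064, hence roughly 1.5 to 1.6 × 10^8 trial points and about 1.4 × 10^9 random numbers, in line with the figures quoted in the text.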

In two and three dimensions, it is relatively straightforward to generate random samples uniformly within a sphere, and a variety of methods can be used to produce the desired distribution. In the present paper this is accomplished in two dimensions by generating a random number $\xi$ uniformly on the interval (0,1) and evaluating

$$r = R\,\xi^{1/2} \tag{74}$$

to obtain r, the radius of the sample [59]. We then choose the direction of the vector pointing to the sample such that the distribution of polar angles is uniform on the interval $(0, 2\pi)$. The Cartesian coordinates of each configuration generated by this procedure are then easily obtained via a point transformation at each sample.
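The two-dimensional procedure just described can be sketched as follows. This is an illustration under the assumption that the radius is obtained as R times the square root of a uniform random number, which is the standard inverse-transform result for a disk; the function name and seed are hypothetical.

```python
import math
import random

def sample_disk(radius: float, rng: random.Random) -> tuple[float, float]:
    """Return Cartesian coordinates of a point drawn uniformly inside a disk."""
    xi = rng.random()                      # uniform random number on (0, 1)
    r = radius * math.sqrt(xi)             # radius with density proportional to r
    theta = 2.0 * math.pi * rng.random()   # polar angle uniform on (0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(12345)
points = [sample_disk(1.0, rng) for _ in range(20_000)]

# A disk of half the radius holds one quarter of the area, so about a quarter
# of the points should fall inside it if the sampling is uniform.
inner_fraction = sum(1 for x, y in points if math.hypot(x, y) <= 0.5) / len(points)
```

Taking the square root matters: using the uniform number itself as the radius would crowd the samples toward the origin.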

Another possibility for improving the efficiency of the algorithm is to use more strata. For example, one could stratify along the radius, forming three concentric spherical strata $(D_1, D_2, D_3)$. However, as the number of strata increases, the uncertainty of the boundary optimizations as well as of the optimized $n_k$ will increase, and this must be kept in mind when designing stratified sampling algorithms. We expect that the improvement in efficiency gained by sampling directly from the interior of a sphere will enable us to optimize the $D_k$ and $n_k$ for the three-stratum case reasonably well. However, it is necessary to further develop our stratified sampling algorithm in order to ensure that these quantities are obtained with enough efficiency to use the method routinely. We have developed such an algorithm, which we will refer to as "adaptively optimized" stratified sampling (AOSS).

FIG. 6. Example of the bins used in the second stage of adaptively optimized stratified sampling (AOSS) within hyperspheres for N = 2 (two dimensions). Here R = 1 and B = 20 bins. The outer radii of the first, second, and 19th bins are indicated. [Axes: $x_1$ (horizontal) and $x_2$ (vertical), each running from -1.00 to 1.00.]

The AOSS algorithm proceeds in three stages. In the first stage, we determine the boundary of the domain D over which the partition function is to be evaluated. We need to find a boundary radius R of a "great sphere" enclosing D which will be large enough to determine the integral accurately, yet not so large as to waste effort by generating samples which are very far from the origin. We also wish to use as many of the samples for this determination as possible in the final integration of the function. This is accomplished by choosing a set of T test radii ($R_t = R_1, R_2, \ldots, R_T$ with $R_t > R_{t+1}$), and uniformly sampling the interior of each successively smaller sphere with p samples in each test. The value of $\exp[-S(x^{(i)}, a^{(i)})]$ is evaluated at each configuration and stored in an array, as is the radius of each configuration. The samples taken uniformly on the interior of the tth sphere which lie within smaller test spheres are uniform on the interior of the smaller spheres, and therefore these samples can be used in later optimization tests. Successively smaller spheres are sampled until the integral is observed to be stationary within a specified percent tolerance, at which point R is fixed. All samples drawn from spheres with $R_t > R$ which fall within the boundary of the optimized great sphere can then be used in the final evaluation of the integral.
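The sample-reuse property that the first stage relies on can be illustrated numerically. The sketch below, for N = 2, draws points uniformly in a large disk, keeps those that fall inside a smaller concentric disk, and checks that the retained points behave like a uniform sample of the smaller disk. It shows only the reuse idea, not the authors' full radius-selection loop; the radii and seed are arbitrary choices.

```python
import math
import random

def sample_disk_radii(radius, n, rng):
    """Radii of n points drawn uniformly (by area) inside a disk."""
    return [radius * math.sqrt(rng.random()) for _ in range(n)]

rng = random.Random(2024)
R_big, R_small = 2.0, 1.0
radii = sample_disk_radii(R_big, 40_000, rng)

# Samples from the larger test sphere that happen to fall inside the smaller one.
reused = [r for r in radii if r <= R_small]

# The expected retained fraction is the volume ratio (R_small / R_big)**N with N = 2.
retained_fraction = len(reused) / len(radii)

# Uniformity check: half of the small disk's area lies within R_small / sqrt(2).
inner_half = sum(1 for r in reused if r <= R_small / math.sqrt(2)) / len(reused)
```

Because the retained points need no new function evaluations, the values stored for them can be carried into later tests and into the final average.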

In the second stage, we optimize the stratification boundaries for the three subdomains $(D_1, D_2, D_3)$. We divide the great sphere with boundary radius R into B bins, with the radius of the outermost boundary of the bth bin given by $r_b$. In general, one could divide the great sphere in any number of ways, but we choose to divide it into B bins that all have the same volume. Equal volumes ensure that a uniform sample over the great sphere will place approximately the same number of samples within each bin, so that we gain some information about each bin. An example of such a binning for N = 2, B = 20, R = 1 is shown in Fig. 6. Note that the first bin is a circle about the origin while the other 19 bins are concentric strips, and that the bins are not evenly distributed along the radial direction. We then assign the m samples generated during the first optimization stage to these bins. Then, we generate more samples uniformly on the interior of the great sphere until a total of $n_{\mathrm{pr}}$ probe samples are generated, and assign these to the bins as well. We then generate all distinct combinations of the sampled bins such that three concentric and contiguous strata are defined in each combination. We generate these combinations by starting with the innermost bin and sweeping out to the outermost bin. Each combination of the B bins into three strata can be specified by the three radii of the outermost boundaries of the strata. For example, if the bins shown in Fig. 6 are used, combination 1 would be denoted by $(r_1, r_2, R)$, combination 2 by $(r_1, r_3, R)$, combination 18 by $(r_1, r_{19}, R)$, combination 19 by $(r_2, r_3, R)$, combination 20 by $(r_2, r_4, R)$, and so forth. Note that the number of stratifications to be tested is equal to $(1/2)(B-2)(B-1)$.
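The binning and the enumeration of trial stratifications can be sketched in a few lines. The equal-volume radii follow from requiring the volume inside $r_b$ to be proportional to b; the function names are illustrative, and the index pairs (i, j) stand for the boundary triple $(r_i, r_j, R)$.

```python
from itertools import combinations

def equal_volume_bin_radii(R: float, n_bins: int, n_dim: int) -> list[float]:
    """Outer radii r_b of n_bins concentric shells of equal volume in a ball of radius R."""
    return [R * (b / n_bins) ** (1.0 / n_dim) for b in range(1, n_bins + 1)]

def three_stratum_boundaries(n_bins: int) -> list[tuple[int, int]]:
    """Index pairs (i, j) with i < j < n_bins; the three strata end at r_i, r_j, and R."""
    return list(combinations(range(1, n_bins), 2))

radii = equal_volume_bin_radii(R=1.0, n_bins=20, n_dim=2)
trials = three_stratum_boundaries(20)

print(len(trials))             # number of stratifications for B = 20
print(trials[0], trials[17], trials[18], trials[19])
```

With B = 20 the enumeration reproduces the ordering given in the text, and the count agrees with (1/2)(B - 2)(B - 1) = 171.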

For each of the trial stratifications we calculate $m_k$, which is the Monte Carlo approximation to $\mu_k$ within each stratum [see Eqs. (51) and (68)]. Then, as in Sec. V B, the optimum stratification boundaries are chosen such that the contribution of each stratum to the total error is approximately the same as that of every other stratum, which corresponds to choosing the stratification with the minimum value of $\mathrm{Var}(m_k)$:

$$\mathrm{Var}(m_k) = \frac{1}{3}\sum_{k=1}^{3}\left(m_k - \langle m_k\rangle_3\right)^2, \tag{75}$$

$$\langle m_k\rangle_3 = \frac{1}{3}\sum_{k=1}^{3} m_k. \tag{76}$$
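The selection rule amounts to an argmin over the trial stratifications. The sketch below assumes the three $m_k$ values have already been computed for each trial (their definition is given by equations outside this section) and simply picks the trial with the smallest spread; the numerical values are made up for illustration. The constant prefactor of the variance does not affect which trial is selected.

```python
from statistics import pvariance

def best_stratification(trial_m):
    """Return the key whose three m_k values have the smallest variance."""
    return min(trial_m, key=lambda key: pvariance(trial_m[key]))

# Hypothetical m_k triples for three trial boundary choices (illustrative numbers only).
trial_m = {
    (1, 2): (0.90, 0.30, 0.10),
    (1, 5): (0.45, 0.40, 0.42),
    (3, 9): (0.20, 0.70, 0.35),
}

chosen = best_stratification(trial_m)
```

Here the second trial is selected because its three strata contribute nearly equally.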

Once the stratification boundaries are chosen according to this criterion, we can compute a prediction for the optimum fraction of moves to be allocated to each stratum from Eq. (52), which we denote $T_k$:

(77)

where G is given by the Monte Carlo approximation to $\Gamma$ in Eq. (53). In computing the $T_k$ we have made full use of the information obtained during the probe sample, and we will retain all samples used to optimize the stratum boundaries for use in the final averaging, in contrast with the SS procedure.

In the third stage of the calculation, we commit ourselves to the optimized set of boundaries and concentrate on allocating the optimum number of samples to each stratum. By the end of the calculation, we want to ensure that the three strata have been sampled according to our best estimate of Eq. (52). However, it is possible that we may have initially (in the second stage) made a poor estimate of the optimum number of samples to be allocated to each stratum. We deal with this in the third stage by dividing the remaining ($n_r = n - n_{\mathrm{pr}}$) samples in the computation into rounds, with each round consisting of $n_s$ samples. In the first round, the initial guess of the $T_k$ is used to decide what the $n_k$ should be at the end of the round by scaling $n_{\mathrm{pr}} + n_s$ by $T_k$:

$$n_k^{(1)} = T_k\,(n_{\mathrm{pr}} + n_s). \tag{78}$$



TABLE XI. Comparison of the AOSS-U method with SS-U and AOSS-M methods for calculation of quantum partition functions.^a

| T (K) | SS-U ^b | AOSS-M ^c | AOSS-U ^d | Exact (K) ^e | Exact (∞) ^f |
|---|---|---|---|---|---|
| 300 | (6 ± 1) × 10^-6 | (6 ± 2) × 10^-6 | (5.3 ± 0.73) × 10^-6 | 5.70 × 10^-6 | 3.01 × 10^-6 |
| 600 | (2.2 ± 0.18) × 10^-3 | (1.7 ± 0.24) × 10^-3 | (2.11 ± 0.096) × 10^-3 | 2.14 × 10^-3 | 1.82 × 10^-3 |
| 900 | (1.83 ± 0.077) × 10^-2 | (1.78 ± 0.089) × 10^-2 | (1.75 ± 0.058) × 10^-2 | 1.79 × 10^-2 | 1.67 × 10^-2 |
| 1200 | (5.4 ± 0.19) × 10^-2 | (6.2 ± 0.22) × 10^-2 | (5.5 ± 0.14) × 10^-2 | 5.61 × 10^-2 | 5.39 × 10^-2 |
| AFSE ^g | 8.3% | 14% | 6.0% | | |

^a n = 100 000 samples, K = 32, 30-point Gauss-Legendre quadrature used in all calculations. All error bars are estimated at 95% confidence levels.
^b SS sampling of x within two cubic strata; Box-Muller sampling of a over the infinite domain. L = 5.0 Å; $b_{\mathrm{opt}}$ = 0.555 Å; $n_{\mathrm{pr}}$ = 25 000.
^c AOSS sampling of x; Metropolis sampling of a.
^d AOSS sampling of x; Box-Muller sampling of a. Error bars are estimated by Eqs. (70) and (71) with $N_s$ = 3.
^e Exact value for K-coefficient approximation to partition function (see Appendix A).
^f Exact value of partition function (see Appendix A).
^g Average of fractional statistical errors.

Then the number of samples accumulated during the first two stages within each stratum (denoted by $m_k$) is subtracted from $n_k^{(1)}$ to obtain the total number of samples to be generated within the kth stratum in the first round, given by

$$N_k^{(1)} = n_k^{(1)} - m_k. \tag{79}$$

If $N_k^{(1)} \le 0$, the kth stratum is not sampled during the first round. As the round of sampling proceeds, the $m_k$ are updated to include the newly generated samples in the running totals.

In each subsequent round, the $T_k$ are recomputed using the samples obtained during the previous stages and rounds, and the above procedure is repeated. If in the lth round any of the $N_k^{(l)}$ are negative or zero, the kth stratum is not sampled in that round, but may be sampled in later rounds if $N_k^{(l)}$ becomes positive and more samples are needed in the kth stratum. In short, it is not important that the sampling be optimal at the end of each round, but only that the sampling be optimal at the end of the calculation. The procedure is repeated until n samples are obtained.
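One round of the third stage can be sketched as below. The target count for each stratum is its fraction times the total number of samples that will exist at the end of the round, and the deficit is that target minus the samples already in hand. The text does not say how a round is filled when some strata are already oversampled, so the rescaling of the positive deficits to exactly the round size is an assumption of this sketch, as are the function name and the test numbers.

```python
def allocate_round(fractions, counts, n_round):
    """Split n_round new samples among strata so totals approach fractions * new_total.

    fractions: target fraction per stratum (should sum to 1).
    counts:    samples already accumulated in each stratum.
    Strata whose deficit is zero or negative receive no samples this round.
    """
    new_total = sum(counts) + n_round
    deficits = [max(0.0, t * new_total - m) for t, m in zip(fractions, counts)]
    total_deficit = sum(deficits)
    if total_deficit == 0.0:
        return [0] * len(counts)
    # Assumption: positive deficits are rescaled so the round uses exactly n_round samples.
    shares = [d * n_round / total_deficit for d in deficits]
    alloc = [int(s) for s in shares]
    # Hand out samples lost to truncation, largest fractional remainder first.
    leftovers = sorted(range(len(shares)), key=lambda k: shares[k] - alloc[k], reverse=True)
    for k in leftovers[: n_round - sum(alloc)]:
        alloc[k] += 1
    return alloc

counts = [10, 10, 80]          # third stratum was oversampled in the probe stage
fractions = [0.5, 0.3, 0.2]
alloc = allocate_round(fractions, counts, 100)
```

When no stratum is oversampled the deficits already sum to the round size, so the rescaling leaves them unchanged. In a full calculation the fractions would be re-estimated from the accumulated samples before each call.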

The method is completely automatic and requires no intervention from the user. In our experience this procedure produces the desired number of samples within each stratum without oversampling. However, it should be noted that oversampling in one of the strata can occur if npr is too large a fraction of n, in which case one or more strata can be oversampled during the previous two stages. Therefore, care must be taken when choosing npr; it must be large enough to ensure a good determination of the stratification boundaries, yet not so large as to oversample any of the strata. In our code we ensure that the running total of samples equals n at the end of the calculation. However, if desired the calculation can also proceed until a targeted statistical error is achieved.

In Table XI we compare calculations carried out via the stratified sampling method (discussed in Sec. V B) to ones carried out using the AOSS method just described, combined with Box-Muller sampling of the Fourier coefficient space. These results are labeled SS-U and AOSS-U, respectively, where the U denotes that the samples are fully uncorrelated. We also compare both methods to the AOSS method combined with Metropolis sampling of the Fourier space (AOSS-M). We used n = 100 000 samples to evaluate the partition function, instead of the 1 million samples used in the preceding subsection.

The error analysis of the AOSS-M walks is complicated by the fact that the function evaluations generated within each stratum are correlated with one another due to the correlations introduced by the Metropolis algorithm. However, since the AOSS algorithm draws configurations from various portions of the Metropolis walk in order to sort them into strata in the second stage, we may be able to approximate the situation as if the samples within each stratum came from their own independent Metropolis walks. Thus, in analogy with Eq. (50) we take the error in the AOSS-M computation as the square root of the sum of squares of the correlation errors within each stratum:

$$w_n = \left(\sum_{k=1}^{N_s=3} w_{n_k}^{2}\right)^{1/2} \tag{80}$$

with $w_{n_k}$ given by Eqs. (41)-(46), (63), and (69). In the first AOSS stage, we allow a maximum of T = 19 trial spheres with $R_t$ equally spaced on the interval (1,10) bohr. In the second AOSS stage, B = 100 bins of equal volume are used (which is more than is necessary for this simple system), with a total of $n_{\mathrm{pr}}$ = 40 000 samples accumulated before the stratum boundaries are optimized. In the third stage, $n_s$ = 6000 samples are used in each round until n = 100 000 samples are generated.

We see from Table XI that for a given number of accumulated function evaluations, the average fractional statistical error bars are smaller in the AOSS-U method than in the AOSS-M or the SS-U method. Moreover, the AOSS-U method significantly outperforms the AOSS-M method in terms of accuracy. Since the AOSS-U and AOSS-M methods sample the configuration space in exactly the same way, and in light of our studies in Sec. IV, we conclude that the use of the Box-Muller algorithm to sample the Fourier coefficient space is much more efficient than the use of the standard Metropolis algorithm.
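The average fractional statistical errors (AFSE) in Table XI can be recomputed from the tabulated values and error bars. The sketch below does this from the rounded table entries, so small differences from the reported averages are to be expected; the dictionary layout and function name are illustrative.

```python
# (value, 95% error bar) pairs transcribed from Table XI for T = 300, 600, 900, 1200 K.
# The powers of ten cancel in the ratio error/value, so only mantissas are stored.
table_xi = {
    "SS-U":   [(6.0, 1.0), (2.2, 0.18), (1.83, 0.077), (5.4, 0.19)],
    "AOSS-M": [(6.0, 2.0), (1.7, 0.24), (1.78, 0.089), (6.2, 0.22)],
    "AOSS-U": [(5.3, 0.73), (2.11, 0.096), (1.75, 0.058), (5.5, 0.14)],
}

def average_fractional_error(rows):
    """Mean of error/value over the four temperatures, as a percentage."""
    return 100.0 * sum(err / val for val, err in rows) / len(rows)

afse = {method: average_fractional_error(rows) for method, rows in table_xi.items()}
for method, value in afse.items():
    print(f"{method}: {value:.1f}%")
```

The recomputed averages come out close to the tabulated 8.3%, 14%, and 6.0%, with the SS-U value slightly lower because its table entries are heavily rounded.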

For the calculation of partition functions for small polyatomic molecules, the issue of principal importance is the size of the error for a given number of samples, because the computational effort for calculations on complex systems is dominated by the cost of function evaluations, not the overhead incurred by generating configurations. Thus we must consider the fraction of generated function evaluations which are actually used in the evaluation of the partition function. We have discussed the fact that the AOSS-U method makes more efficient use of the samples used to determine the boundary of the domain than the SS-U method, as well as the fact that the samples used to determine the optimum stratification are wasted in the SS-U method. In particular, it required about 55% more samples to generate the SS-U results in Table XI than to generate the AOSS-U results. We also note that in the Metropolis calculations, the step size for the Metropolis walk in Fourier space must be optimized at each temperature, and the walker must be equilibrated for a number of samples after step-size optimization before samples can be accumulated [72]. The function evaluations taken during the equilibration period must be ignored in the final integration, and this is another reason that the AOSS-U method is much more efficient than might be indicated by inspection of Table XI. Finally, we also note that the Metropolis walk error bars given in Table XI, which are estimated through strict truncation of the autocorrelation function and the assumption of independence within each stratum, are most likely a slight underestimate of the true error in the average. Thus the Metropolis algorithm may be performing significantly worse than indicated in Table XI compared to the Box-Muller algorithm.

Perhaps the most striking aspect of the calculation is the fact that for this strongly coupled, highly anisotropic system the AOSS-U method is able to calculate quantum partition functions with 95% confidence limits on the order of a few percent using only 100 000 samples. These calculations take only a few minutes to complete on a single processor of a Cray X-MP EA computer, and similar timings apply to RISC-based workstations such as the IBM RS/6000. Furthermore, since we have achieved good accuracy with uncorrelated sampling procedures, the calculations are very well suited to parallel computation. Because of the favorable scaling properties of Monte Carlo calculations with respect to the number of degrees of freedom [53], there is good reason to expect that calculations on small polyatomic molecules will be feasible with the present algorithm, and such calculations are currently in progress [92].

VI. DISCUSSION

We have presented a new Monte Carlo method for the direct calculation of quantum-mechanical canonical partition functions based upon the Fourier coefficient representation of the density matrix. The method uses the quantum mechanical particle-in-a-sphere as a reference system and combines Box-Muller sampling of a Gaussian probability density in Fourier coefficient space with adaptively optimized stratified sampling (AOSS) of the configuration space. We have carried out calculations of the partition function of an anisotropic test system consisting of two strongly coupled vibrational modes with diagonal parameters similar to a stretch-bend model of two vibrational modes of H2O. The calculations were carried out in the reference frame of the uncoupled system at a variety of temperatures. Excellent agreement with exact values was obtained in virtually all cases. Another useful result of this work is the introduction of a simple, yet highly successful way to estimate the statistical error of an integral obtained from averaging over correlated samples, based upon the previous work of Straatsma and co-workers [82].

In the present work we have shown that the uncorrelated samples generated via the Box-Muller method are more efficient for sampling the N×K Fourier coefficient variables than the standard Metropolis algorithm. It is interesting to compare our approach to the Fourier path-integral formalism used by Doll et al. in previous studies to calculate free-energy differences for gas-phase van der Waals clusters. Their method involves calculating average potential energies over a range of temperatures and carrying out a "state integration" [35,36,40]. In one of these studies [40] the Box-Muller algorithm was used as part of the algorithm for taking Metropolis "steps" through the many-dimensional Fourier space. The samples generated by that procedure were necessarily sequentially correlated, because a different (nonintegrable) probability density function was used than the one used in the present study. Moreover, in the state integration method for calculating free energies used in that work, it was necessary to carry out a number of simulations in order to estimate the free energy at each individual temperature, and several additional convergence issues arose (the range of temperatures to be used, the number of temperatures within the range, etc.). The focus of our strategy has been to calculate microscopic partition functions using a single (uncorrelated) statistical sample at each temperature of interest. Thus, the present method is designed for the calculation of absolute free energies, and is not directly applicable to the calculation of other thermodynamic quantities (such as enthalpies, canonical ensemble averages, etc., although, of course, since it yields partition functions, all thermodynamic properties [2] would be calculable in principle from multiple runs). Also, as it stands the present method is applicable in a practical sense to small or medium-sized systems, and further work would be required to develop it for calculations on larger systems, primarily because, as the number of degrees of freedom is increased, we will need to use yet more efficient stratified sampling techniques for sampling the N coordinate-space degrees of freedom. Finally, we note that adaptively optimized stratified sampling techniques such as those employed here could possibly be used in combination with the methods employed by Doll and co-workers [40] to accelerate the convergence of their procedure by reducing the variance in the Metropolis integrations. It remains to be seen whether such a strategy would result in a substantial improvement, but since the computational overhead of stratified sampling is generally small, a combination of the two methods may be worthwhile.

To make Fourier path-integral partition function calculations practical, an important issue is the rate at which $Q^{[K]}(T)$ approaches $Q(T)$ as a function of K, where K is the number of sinusoidal functions used to represent the paths. This issue has been considered in some detail by Kono, Takasaka, and Lin [91], who compared the convergence of the primitive discretized and Fourier path-integral expressions for the partition function (see Sec. II) of the one-dimensional harmonic oscillator. We note that they found that at low temperatures the primitive discretized form converges more rapidly than the primitive Fourier form, and that the partially averaged Fourier form [35] converges more rapidly than either form. However, another issue to consider is the dependence of the Monte Carlo error on T and K. We have found some evidence that, at least for the system we have studied, the statistical properties can actually improve as a function of K. It would be interesting to compare the discretized and Fourier approaches on this basis. Also, further work to study the relative efficiency of the two approaches at higher temperature, when the system is anharmonic, when it possesses multiple minima, when it is described by several strongly coupled degrees of freedom, and/or when it is vibrating and rotating in space, would be desirable. Several of these issues are addressed again in further work [92].

Freeman and Doll reported that reasonably accurate results for thermodynamic averages in cluster simulations can be obtained by using a single Fourier coefficient for each degree of freedom in configuration space, if partial averaging is employed [35]. This is a very encouraging result, since the dimensionality of the integrals is small compared to those used by other workers in implementations of the discrete representation of the density matrix [11,14,17-21,23-27], with some notable exceptions [93,94]. However, the discrete representation has recently been cast into well-optimized forms which apparently improve upon this situation [23,24,26,27,91]. For example, Mak and Andersen [26], as well as Cao and Berne [27], have obtained promising results on model systems with two degrees of freedom. There is evidently a wealth of different approaches which could be pursued in the solution of this type of problem.

VII. SUMMARY

The calculations presented here have demonstrated that the AOSS Box-Muller Fourier path-integral Monte Carlo method should be useful in the calculation of numerically exact vibration-rotation partition functions, and hence quantum free energies, for coupled vibrational problems. Statistical errors on the order of a few percent have been obtained for the model system we have studied, while expending only very modest computing resources. In the following paper [92] we apply our methods to the computation of microscopic vibration-rotation partition functions for a diatomic molecule. Work to extend the methods to polyatomic systems is currently in progress.

ACKNOWLEDGMENTS

We thank Gregory Tawa, Da-hong Lu, and Manish Mehta for many helpful conversations. This work was supported in part by the National Science Foundation.

APPENDIX: SOME USEFUL EXACT INTEGRALS

In carrying out our Fourier path-integral computations of partition functions, we make use of several exact integrals, which are presented in this appendix. In particular, we must obtain the integral of the denominator of Eq. (20) in order to evaluate the partition function using Monte Carlo methods, according to Eq. (67). Since we have studied a linearly coupled harmonic system, we also can compare our Monte Carlo partition functions to exact values for finite and infinite values of K.

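We first consider the integral of the denominator in Eq. (20). For simplicity, we consider the case N = 1. We define $Q^{[K]}_{\mathrm{pb}}(T)$ as follows:

$$Q^{[K]}_{\mathrm{pb}}(T) = J^{[K]}(\beta)\int_{-L/2}^{L/2} dx_1 \int da_{1,1}\cdots\int da_{1,K}\,\exp\left(-\sum_{l=1}^{K}\frac{a_{1,l}^{2}}{2\sigma_{1,l}^{2}}\right), \tag{A1}$$

with

$$J^{[K]}(\beta) = \left(\frac{m}{2\pi\beta\hbar^{2}}\right)^{1/2}\prod_{l=1}^{K}\frac{1}{\sqrt{2\pi}\,\sigma_{1,l}}. \tag{A2}$$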

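This is because $\rho_{\mathrm{pb}}(T) = 0$ outside of D, and $V_{\mathrm{pb}}[x(s)] = 0$ within D. The integral over $x_1$ is trivial, as is the K-dimensional integral over Fourier coefficient space, since the integrand is a K-dimensional Gaussian. We have

$$Q^{[K]}_{\mathrm{pb}}(T) = L\,J^{[K]}(\beta)\prod_{l=1}^{K} G(\sigma_{1,l}) \tag{A3}$$

with

$$G(\sigma_{1,l}) = \int_{-\infty}^{\infty} da_{1,l}\, e^{-a_{1,l}^{2}/2\sigma_{1,l}^{2}} = \sqrt{2\pi}\,\sigma_{1,l}. \tag{A4}$$

Substituting Eqs. (A2) and (A4) into Eq. (A3), we find that

$$Q^{[K]}_{\mathrm{pb}}(T) = L\left(\frac{m}{2\pi\beta\hbar^{2}}\right)^{1/2}\left(\prod_{l=1}^{K}\frac{1}{\sqrt{2\pi}\,\sigma_{1,l}}\right)\left(\prod_{l=1}^{K}\sqrt{2\pi}\,\sigma_{1,l}\right) = L\left(\frac{m}{2\pi\beta\hbar^{2}}\right)^{1/2}. \tag{A5}$$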

In N dimensions, for a potential well of arbitrary shape we have

$$Q^{[K]}_{\mathrm{pb}}(T) = \mathcal{V}\prod_{i=1}^{N}\left(\frac{m_i}{2\pi\beta\hbar^{2}}\right)^{1/2}, \tag{A6}$$

where $\mathcal{V}$ is the volume of the N-dimensional space. However, Eq. (A5) is the same as the exact K → ∞ partition function for the quantum particle in a box [1]. Thus we find that $Q^{[K]}_{\mathrm{pb}}(T) = Q_{\mathrm{pb}}(T)$ for all K. This result, which may seem surprising, arises from the fact that we have used $J^{[K]}(\beta)$ instead of $J(\beta)$ (which is an infinite product) to form $Q^{[K]}_{\mathrm{pb}}(T)$. However, using the infinite product in the FPI Box-Muller MC calculation of $Q(T)$ would be inconsistent with the fact that the integral is finite dimensional; the Jacobian would have more degrees of freedom than the element of volume in Fourier coefficient space. Thus, Eq. (A6) is the proper quantity to be used in Eq. (67).

On the other hand, the (actual) finite-K integral for the partition function of the one-dimensional harmonic oscillator has been derived previously [34,91,95]. We have

with $\Omega$ the oscillator frequency. This formula can be applied to the two-dimensional (2D) coupled harmonic oscillator given in Eq. (64) if we transform to normal coordinates. Carrying out the transformation, the effective frequencies for the decoupled system are given by

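$$\Omega_{\pm}^{2} = \frac{\omega_1^{2}+\omega_2^{2}}{2} \pm \frac{1}{2}\left[\left(\omega_1^{2}+\omega_2^{2}\right)^{2} - 4\left(\omega_1^{2}\omega_2^{2} - \frac{k_{12}^{2}}{m_1 m_2}\right)\right]^{1/2}, \tag{A8}$$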

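and the finite-K partition function for the 2D system is then found using

$$Q^{[K]}(T) = Q^{[K]}(T;\Omega_{+})\,Q^{[K]}(T;\Omega_{-}). \tag{A9}$$

Similarly the infinite-K partition function is [2]

$$Q(T) = \left[\frac{1}{2}\operatorname{csch}\left(\frac{\beta\hbar\Omega_{+}}{2}\right)\right]\left[\frac{1}{2}\operatorname{csch}\left(\frac{\beta\hbar\Omega_{-}}{2}\right)\right]. \tag{A10}$$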
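The infinite-K product formula is easy to check numerically against a direct sum over harmonic-oscillator levels. The sketch below works in reduced units with ħ = 1 and uses arbitrary values for β and the two normal-mode frequencies; these numbers are illustrative and are not the parameters of the model system studied in this paper.

```python
import math

def q_harmonic(beta: float, omega: float, hbar: float = 1.0) -> float:
    """Exact 1D harmonic-oscillator partition function, (1/2) csch(beta*hbar*omega/2)."""
    return 0.5 / math.sinh(0.5 * beta * hbar * omega)

def q_two_mode(beta, omega_plus, omega_minus, hbar=1.0):
    """Product of the two normal-mode partition functions, as in Eq. (A10)."""
    return q_harmonic(beta, omega_plus, hbar) * q_harmonic(beta, omega_minus, hbar)

def q_sum_over_states(beta, omega, hbar=1.0, n_max=2000):
    """Direct Boltzmann sum over levels (n + 1/2) * hbar * omega, for comparison."""
    return sum(math.exp(-beta * hbar * omega * (n + 0.5)) for n in range(n_max))

beta, w_plus, w_minus = 1.3, 2.0, 0.7   # arbitrary reduced-unit values
closed_form = q_two_mode(beta, w_plus, w_minus)
direct = q_sum_over_states(beta, w_plus) * q_sum_over_states(beta, w_minus)
```

The two results agree to within floating-point rounding, which is the kind of exact benchmark against which the K = ∞ entries of the tables are defined.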

lR. P. Feynman, Statistical Mechanics: A Set of Lectures (Benjamin, Reading, MA, 1972).

2D. A. McQuarrie, Statistical Thermodynamics (University Science, Mill Valley, CA, 1973).

3D. Chandler, Introduction to Modern Statistical Mechanics (Oxford University, New York, 1987).

4H. Eyring, D. Henderson, B. J. Slover, and E. M. Eyring, Statistical Mechanics and Dynamics (Wiley, New York, 1982).

5p. S. Dardi and J. S. Dahler, J. Chern. Phys. 93, 3562 (1990). 6X. G. Zhao, A. Gonzalez-Lafont, D. G. Ttuhlar, and R. Steckler, J.

Chern. Phys. 94, 5544 (1991). 1D. G. Truhlar, A. D. Isaacson, and B. C. Garrett, in Theory ofChemical Reaction Dynamics, edited by M. Baer (CRC, Boca Raton, FL, 1985), Vol. IV, pp. 65-137.

8 J. I. Steinfeld, J. S. Francisco, and W. L. Hase, Chemical Kinetics and Dynamics (Prentice-Hall, Englewood Cliffs, 1989).

9p. Hanggi, P. Talkner, and M. Borkovec, Rev. Mod. Phys. 62, 251 (1990).

10C. Zheng, J. A. McCammon, and P. G. Wolynes, Chern. Phys. 158, 261 (1991).

11 J. A. Barker, J. Chern. Phys. 70, 2914 (1979). 12H. F. Trotter, Proc. Am. Math. Soc. 10, 545 (1959). 13M. Suzuki, Cornrnun. Math. Phys. 51, 183 (1976). 14H. De Raedt and B. De Raedt, Phys. Rev. A 28, 3575 (1983). ISB. J. Berne and D. Thirurnalai, Annu. Rev. Phys. Chern. 37, 401

(1986). 16D. Chandler and P. G. Wolynes, J. Chern. Phys. 74, 4078 (1981). l1G. Jacucci and E. Omerti, J. Phys. Chern. 79, 3051 (1983). 18E. L. Pollock and D. M. CeperJey, Phys. Rev. B 30, 2555 (1984). 19R. A. Friesner and R. M. Levy, J. Chern. Phys. 80, 4488 (1984): 20M. Sprik, M. L. Klein, and D. Chandler, Phys. Rev. B 31, 4234 (1985). 21M. Sprik, M. L. Klein, and D. Chandler, Phys. Rev. B 32,545 (1985).

22R. D. Coalson, J. Chern. Phys. 85, 926 (1986). 23X. P. Li and J. Q. Broughton, J. Chern. Phys. 86,5094 (1987). 24p. Zhang, R. M. Levy, and R. A. Freisner, Chern. Phys. Lett. 138, 236

(1988). 2sH. Kono, A. Takasaka, and S. H. Lin, J. Chern. Phys. 89, 3233 (1988). 26c. H. Mak and H. C. Andersen, J. Chern. Phys. 92, (1990). 21J. Cao and B. J. Berne, J. Chern. Phys. 92, 7531 (1990). 28R. P. Feynrnan and A. R. Hibbs, Quantum Mechanics and Path Inte

grals (McGraw-Hill, New York, 1965). 29F. W. Wiegel, Introduction to Path Integral Methods in Physics and

Polymer Science (World Scientific, Singapore, 1986). 3OR. P. Feynman and H. Kleinert, Phys. Rev. A 34,5080 (1986). 31J. D. Doll and D. L. Freeman, J. Chern. Phys. 80, 2239 (1984). 32D. L. Freeman and J. D. Doll, J. Chern. Phys. 80, 5709 (1984). 33 J. D. Doll, R. D. Coalson, and D. L. Freeman, Phys. Rev. Lett. 55, 1

(1985). 34R. D. Coalson, D. L. Freeman, and J. D. Doll, J. Chern. Phys. 85, 4567

(1986). 3sD. L. Freeman and J. D. Doll, Adv. Chern. Phys. 70B, 139 (1988). 36T. L. Beck, J. D. Doll, and D. L. Freeman, J. Chern. Phys. 90, 5651

(1989). 37W. H. Miller, J. Chern. Phys. 63, 1166 (1975). 38G. Franke, E. Hi1f, and L. Polley, Z. Phys. D 9, 343 (1988). 39G. Franke and J. Schulte, Z. Phys. D 12, 65 (1989). 4OJ. D. Doll, D. L. Freeman, and T. L. Beck, Adv. Chern. Phys. 78, 61

(1990). 41J. K. Lee, J. A. Barker, and F. F. Abrai3am, J. Chern. Phys. 58, 3166

( 1973). 42M. R. Mruzik, F. F. Abraham, D. E. Schreiber, and G. M. Pound, J.

Chern. Phys. 64, 481 (1976). 43 J. P. M. Postrna, H. J. C. Berendsen, and J. R. Haak, Faraday Syrnp.

Chern. Soc. 17, 55 (1982). 44 A. Warshel, J. Phys. Chern. 86, 2218 (1982). 4SW. L. Jorgensen and C:Ravimohan, J. Chern. Phys. 83, 3050 (1985). 46p. A. Bash, U. C. Singh, R. Langridge, and P. A. Kollman, Science 236,

564 (1987). 47B. Widom, J. Chern. Phys. 39, 2808 (1963). 48R. L. Coldwell, J. P. Henry, and C.-W. Woo, Phys. Rev. A 10, 897

(1974). 49R. Kulver, J. Cornput. Chern. 11, 511 (1990). soJ. P. Valleau and G. M. Torrie, in Statistical Mechanics. Part A: Equi

librium Techniques, edited by B. J. Berne (Plenum, New York, 1977), pp. 169-191.

SI D. L. Beveridge and F. M. DiCapua, Annu. Rev. Biophys. Biophys. Chern. 18, 431 (1989).

S2C. A. Reynolds and P. M. King, in Computer-Aided Molecular Design, edited by W. G. Richards (Oxford University, London, 1989), pp. 51-60.

s3F. James, Rep. Prog. Phys. 43, 73 (1980). 54 J. M. Hammersley and D. C. Handscomb, Monte Carlo Methods

(Methuen, London, 1965). SSS. K. Zaremba, SIAM Rev. 10, 303 (1968). 56J. H. Halton, SIAM Rev. 12, 1 (1970). s1D. G. Truhlar and J. T. Muckerman, in Atom-Molecule Collision The

ory, edited by R. B. Bernstein (Plenum, New York, 1979), pp. 505-566. 58 L. M. Raff and D. L. Thompson, in Theory of Chemical Reaction Dy

namics, edited by M. Baer (CRC, Boca Raton, FL, 1985), Vol. III, pp. 1-121.

S9M. H. Kalos and P. A. Whitlock, Monte Carlo Methods. Volume I: Basics (Wiley, New York, 1986).

6OW. H. Press, B. P. Fleming, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipies: The Art of Scientific Computing (Cambridge University, Cambridge, 1986).

61 B. Schmeiser, in Stochastic Models: Handbooks in Operations Research and Management Science, edited by D. P. Heyman and M. J. Sobel (Elsevier, New York, 1990), Vol. 2, pp. 295-330.

62M. H. Hansen, W. N. Harwitz, and W. G. Madow, Sample Survey Methods and Theory (Wiley, New York, 1953), Vois. 1 and 2.

63 G. E. P. Box and M. E. Muller, Ann. Math. Stat. 29, 610 (1958).

64 N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, J. Chem. Phys. 21, 1087 (1953).

65 J. P. Valleau and S. G. Whittington, in Statistical Mechanics. Part A: Equilibrium Techniques, edited by B. J. Berne (Plenum, New York, 1977), pp. 137-168.




66 J. W. Brady, J. D. Doll, and D. L. Thompson, J. Chem. Phys. 74, 1026 (1981).

67 K. Binder, in Monte Carlo Methods in Statistical Physics, edited by K. Binder (Springer-Verlag, Berlin, 1979), pp. 1-36.

68 K. Binder and D. Stauffer, in Applications of the Monte Carlo Method in Statistical Physics, edited by K. Binder (Springer-Verlag, Berlin, 1987), pp. 1-45.

69 A. F. Bielajew and D. W. O. Rogers, in Monte Carlo Transport of Electrons and Photons, edited by T. M. Jenkins, W. R. Nelson, and A. Rindi (Plenum, New York, 1987), pp. 407-419.

70 W. Feller, An Introduction to Probability Theory and Its Applications (Wiley, New York, 1950).

71 D. Bouzida, S. Kumar, and R. H. Swendsen, Phys. Rev. A 45, 8894 (1992).

72 M. P. Allen and D. J. Tildesley, Computer Simulation of Liquids (Oxford Science, London, 1987).

73 W. A. Lester and B. L. Hammond, Annu. Rev. Phys. Chem. 41, 283 (1990).

74 G. J. Tawa, J. W. Moskowitz, P. A. Whitlock, and K. E. Schmidt, Int. J. Supercomput. Apps. 5, 57 (1991).

75 J. R. Barker, J. Phys. Chem. 91, 3849 (1987).

76 J. L. Doob, Ann. Math. Stat. 14, 229 (1943).

77 W. J. Dixon, Ann. Math. Stat. 15, 119 (1944).

78 U. Grenander, Arkiv. Mat. 1 (17), 195 (1950).

79 G. M. Jenkins and D. G. Watts, Spectral Analysis and Its Applications (Holden-Day, San Francisco, 1968).

80 E. B. Smith and B. H. Wells, Mol. Phys. 53, 701 (1984).

81 S. K. Schiferl and D. C. Wallace, J. Chem. Phys. 83, 5203 (1985).

82 T. P. Straatsma, H. J. C. Berendsen, and A. J. Stam, Mol. Phys. 57, 89 (1986).

83 O. D. Anderson, Time Series and Forecasting: The Box-Jenkins Approach (Butterworths, London, 1975).

84 G. P. Lepage, J. Comput. Phys. 27, 192 (1978).

85 J. H. Friedman and M. H. Wright, ACM Trans. Math. Software 7, 76 (1981).

86 F. James, Comput. Phys. Commun. 60, 329 (1990).

87 W. S. Benedict, N. Gailar, and E. K. Plyler, J. Chem. Phys. 24, 1139 (1956).

88 M.-S. Oh, Contemp. Math. 115, 165 (1991).

89 H. Flanders, Differential Forms with Applications to the Physical Sciences (Dover, New York, 1989).

90 J. Avery, Hyperspherical Harmonics: Applications in Quantum Theory (Kluwer, Dordrecht, 1989).

91 H. Kono, A. Takasaka, and S. H. Lin, J. Chem. Phys. 88, 6390 (1988).

92 R. Q. Topper, G. J. Tawa, and D. G. Truhlar, following paper, J. Chem. Phys. 97, 3668 (1992); R. Q. Topper and D. G. Truhlar (unpublished), manuscript in preparation (1992).

93 R. A. Kuharski and P. J. Rossky, J. Chem. Phys. 82, 5164 (1985).

94 G. S. Del Buono, P. J. Rossky, and J. Schnitker, J. Chem. Phys. 95, 3728 (1991).

95 M. Takahashi and M. Imada, J. Phys. Soc. Jpn. 53, 963 (1984).


