
Diffusion Quantum Monte Carlo

Neil D. Drummond

TCM Group, Cavendish Laboratory, University of Cambridge

QMC and the CASINO program, TTI, Vallico Sotto, Italy

Thursday 2nd August, 2007

Introduction

• DMC is the most accurate total-energy method for systems with more than a few tens of electrons.

• VMC is generally used only as a preliminary to DMC studies.

• I shall describe the DMC algorithm in detail.

• You do not need to understand all the details in order to use DMC.

• You do need to be aware of issues such as time-step bias in order to carry out meaningful DMC work.

References

• CASINO’s DMC algorithm is essentially the same as that described in C. J. Umrigar, M. P. Nightingale and K. J. Runge, J. Chem. Phys. 99, 2865 (1993).

• The exact algorithm is described in the CASINO manual.

• Information about keywords can be obtained using CASINOHELP.

• Useful book: B. L. Hammond, W. A. Lester, Jr. and P. J. Reynolds, Monte Carlo Methods in Ab Initio Quantum Chemistry, World Scientific (1994).

• General overview: W. M. C. Foulkes, L. Mitas, R. J. Needs and G. Rajagopal, Rev. Mod. Phys. 73, 33 (2001).

Imaginary-Time Schrödinger Equation (I)

• Imaginary-time Schrödinger equation (ITSE):

$$[\hat{H}(\mathbf{R}) - E_T]\,\Phi(\mathbf{R},t) = -\frac{1}{2}\nabla^2\Phi(\mathbf{R},t) + U(\mathbf{R})\Phi(\mathbf{R},t) - E_T\Phi(\mathbf{R},t) = -\frac{\partial\Phi(\mathbf{R},t)}{\partial t},$$

where Φ(R, t) is a function of configuration R and imaginary time t, U is the potential energy and $E_T$ is a reference energy.

• Write

$$\Phi(\mathbf{R},t) = \sum_{n=0}^{\infty} c_n\,\phi_n(\mathbf{R})\exp[-(E_n - E_T)t],$$

where $E_n$ and $\phi_n$ are the nth eigenvalue and eigenfunction of the Hamiltonian $\hat{H}(\mathbf{R})$.

Imaginary-Time Schrödinger Equation (II)

• Excited states die away exponentially compared with ground state.

• If ET = E0 then, in limit that t→∞, Φ is proportional to φ0.

• Ground-state component of Φ is “projected out”.

• This is true for any reasonable boundary conditions on Φ.
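
The projection can be illustrated on a small matrix Hamiltonian. The following is a minimal sketch, not part of CASINO: the 3×3 matrix `H_toy`, the function name and the rule used to adjust the reference energy (keeping the norm fixed) are all illustrative assumptions.

```python
import numpy as np

def project_ground_state(H, phi_init, tau=0.05, n_steps=2000):
    """Repeatedly apply exp[-tau (H - E_T)] to a vector, adjusting E_T to keep its norm fixed."""
    # Diagonalise once so that each imaginary-time step is exact for this toy matrix.
    evals, evecs = np.linalg.eigh(H)
    phi = np.asarray(phi_init, dtype=float)
    phi = phi / np.linalg.norm(phi)
    e_t = float(phi @ H @ phi)                 # initial reference energy
    for _ in range(n_steps):
        coeffs = evecs.T @ phi                 # expansion coefficients c_n
        coeffs = coeffs * np.exp(-(evals - e_t) * tau)
        phi = evecs @ coeffs
        norm = np.linalg.norm(phi)
        e_t -= np.log(norm) / tau              # growth of the norm fixes the new E_T
        phi = phi / norm
    return e_t, phi

# Hypothetical symmetric "Hamiltonian" used only for illustration.
H_toy = np.array([[1.0, 0.5, 0.0],
                  [0.5, 2.0, 0.3],
                  [0.0, 0.3, 3.0]])
e_proj, phi_proj = project_ground_state(H_toy, np.ones(3))
print(e_proj, np.linalg.eigvalsh(H_toy)[0])
```

Once the excited-state components have decayed, the reference energy that keeps the norm constant coincides with the lowest eigenvalue.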

Importance-Sampling Transformation

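• Suppose we have a trial wave function Ψ(R). Then

$$-\frac{1}{2}\nabla^2 f(\mathbf{R},t) + \nabla\cdot[\mathbf{V}(\mathbf{R})f(\mathbf{R},t)] + [E_L(\mathbf{R}) - E_T]\,f(\mathbf{R},t) = -\frac{\partial f(\mathbf{R},t)}{\partial t},$$

where $f(\mathbf{R},t) = \Phi(\mathbf{R},t)\Psi(\mathbf{R})$ is the importance-sampled wave function, $\mathbf{V}(\mathbf{R}) = \Psi^{-1}(\mathbf{R})\nabla\Psi(\mathbf{R})$ is the drift velocity and $E_L(\mathbf{R}) = \Psi^{-1}(\mathbf{R})\hat{H}(\mathbf{R})\Psi(\mathbf{R})$ is the local energy.

• Proof: substitute $\Phi = \Psi^{-1}f$ into ITSE.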

• Consequences of importance sampling:

1. Term in ITSE involving potential U(R) is replaced by term involving local energy $E_L(\mathbf{R})$, which is relatively uniform. Makes branching DMC algorithm stable.

2. Configurations are distributed according to f = ΦΨ rather than Φ. (More useful.)

3. Fixed-node approximation is introduced (see later).

DMC Green’s Function (I)

• Importance-sampled ITSE in integral form:

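$$f(\mathbf{R},t) = \int G(\mathbf{R}\leftarrow\mathbf{R}', t-t')\,f(\mathbf{R}',t')\,d\mathbf{R}',$$

where Green’s function $G(\mathbf{R}\leftarrow\mathbf{R}', t-t')$ is solution of ISITSE satisfying initial condition $G(\mathbf{R}\leftarrow\mathbf{R}',0) = \delta(\mathbf{R}-\mathbf{R}')$.

• Formally, Green’s function is

$$G(\mathbf{R}\leftarrow\mathbf{R}',\tau) = \langle\mathbf{R}|e^{-\tau(\hat{T}+E_L-E_T)}|\mathbf{R}'\rangle \simeq \langle\mathbf{R}|e^{-\tau(E_L-E_T)/2}\,e^{-\tau\hat{T}}\,e^{-\tau(E_L-E_T)/2}|\mathbf{R}'\rangle,$$

where $\hat{T}(\mathbf{R}) = -\frac{1}{2}\nabla^2 + (\nabla\cdot\mathbf{V}) + \mathbf{V}\cdot\nabla$ and $E_L$ is the local-energy operator.

• Error in short-time approximation is $O(\tau^3)$.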

DMC Green’s Function (II)

• Further short-time approx.: drift velocity constant between R and R′.

• Obtain DMC Green’s function:

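$$G_{\rm DMC}(\mathbf{R}\leftarrow\mathbf{R}',\tau) = G_D(\mathbf{R}\leftarrow\mathbf{R}',\tau)\,G_B(\mathbf{R}\leftarrow\mathbf{R}',\tau),$$

where

$$G_D(\mathbf{R}\leftarrow\mathbf{R}',\tau) = \langle\mathbf{R}|e^{-\tau\hat{T}}|\mathbf{R}'\rangle \simeq \frac{1}{(2\pi\tau)^{3N/2}}\exp\left(-\frac{[\mathbf{R}-\mathbf{R}'-\tau\mathbf{V}(\mathbf{R}')]^2}{2\tau}\right)$$

is the drift–diffusion Green’s function and

$$G_B(\mathbf{R}\leftarrow\mathbf{R}',\tau) = \exp\left(-\frac{\tau}{2}[E_L(\mathbf{R}) + E_L(\mathbf{R}') - 2E_T]\right)$$

is the branching factor.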

DMC Green’s Function (III)

• DMC Green’s function is exact in limit of small time steps τ .

• $G_D$ is Green’s function for ISITSE without $(E_L - E_T)f$ term; describes evolution of density of randomly diffusing “particles” in 3N-dimensional fluid of velocity V(R).

• $G_B$ is solution of ISITSE without first two terms on LHS; represents exponential growth/decay in density of “particles” at each point in configuration space.

• Green’s function for ITSE: $\langle\mathbf{R}|\exp[-\tau(\hat{H}-E_T)]|\mathbf{R}'\rangle$. Hence Green’s function for ISITSE can be written as

$$G(\mathbf{R}\leftarrow\mathbf{R}',\tau) = \Psi(\mathbf{R})\,\langle\mathbf{R}|\exp[-\tau(\hat{H}-E_T)]|\mathbf{R}'\rangle\,\Psi^{-1}(\mathbf{R}').$$

• $\exp[-\tau(\hat{H}-E_T)]$ is Hermitian and the Green’s functions are real, so

$$\Psi^2(\mathbf{R}')\,G(\mathbf{R}\leftarrow\mathbf{R}',\tau) = \Psi^2(\mathbf{R})\,G(\mathbf{R}'\leftarrow\mathbf{R},\tau).$$

The approximation that V(R) is constant between R′ and R violates this detailed-balance condition.

Propagation of Configuration Population (I)

• f is represented by a set of discrete, time-dependent points in configuration space:

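$$f(\mathbf{R},t) \simeq \sum_{\alpha=1}^{N_C(m)} w_\alpha(m)\,\delta(\mathbf{R}-\mathbf{R}_\alpha(m)),$$

where t = mτ and τ is the time step (=dtdmc).

• Time-dependent point $\mathbf{R}_\alpha(m)$ together with its weight $w_\alpha(m)$ is referred to as a “configuration” or “walker”. $N_C(m)$ is total number of configs at iteration m.

• Substitute above expression for f into integral form of ISITSE:

$$f(\mathbf{R},t+\tau) = \sum_{\alpha=1}^{N_C(m)} w_\alpha(m)\,G_B(\mathbf{R}\leftarrow\mathbf{R}_\alpha(m),\tau)\,G_D(\mathbf{R}\leftarrow\mathbf{R}_\alpha(m),\tau).$$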

Propagation of Configuration Population (II)

• To simulate this, configurations drift by $\tau\mathbf{V}(\mathbf{R}_\alpha(m))$ and diffuse (are displaced by a random vector, Gaussian-distributed with variance τ). Branching factor is absorbed into a new weight for each configuration.

• Make a number of moves before energy data are accumulated, to allow excited-state components of Φ to die away: equilibration phase.

• Then continue to propagate configurations, but gather energy data: statistics-accumulation phase.
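
The drift, diffusion, branching and two-phase structure described above can be sketched for a toy problem. This is a minimal illustration, assuming a one-dimensional harmonic oscillator (exact ground-state energy 0.5 a.u.) with a hypothetical trial function exp(−Ax²/2); it omits the accept/reject step, effective time steps and limiting schemes, and is not CASINO's algorithm.

```python
import numpy as np

A = 0.8  # assumed variational parameter in Psi(x) = exp(-A x^2 / 2)

def drift(x):
    return -A * x                                  # V = Psi'/Psi

def local_energy(x):
    return 0.5 * A + 0.5 * (1.0 - A**2) * x**2     # E_L = Psi^{-1} H Psi

def run_dmc(tau=0.01, n_target=500, n_equil=500, n_accum=2000, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=np.sqrt(0.5 / A), size=n_target)  # start from Psi^2
    e_t = local_energy(x).mean()
    sum_e, sum_w = 0.0, 0.0
    for step in range(n_equil + n_accum):
        e_old = local_energy(x)
        # Drift by tau*V and diffuse with variance tau.
        x = x + tau * drift(x) + rng.normal(scale=np.sqrt(tau), size=x.size)
        e_new = local_energy(x)
        # Branching factor, realised as an integer number of copies.
        m_b = np.exp(-tau * (0.5 * (e_old + e_new) - e_t))
        copies = (m_b + rng.random(x.size)).astype(int)
        x = np.repeat(x, copies)
        e_step = local_energy(x)
        if step >= n_equil:                        # statistics accumulation
            sum_e += e_step.sum()
            sum_w += x.size
            e_best = sum_e / sum_w
        else:                                      # equilibration
            e_best = e_step.mean()
        # Simple population control (valid as written for tau < 1).
        e_t = e_best - np.log(x.size / n_target)
    return sum_e / sum_w

print(run_dmc())
```

With these settings the estimate should lie close to 0.5 a.u., with residual time-step and statistical errors.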

Drift, Diffusion and the Accept/Reject Step (I)

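• Drift–diffusion step: each electron i in each configuration α is moved from $\mathbf{r}'_i(\alpha)$ to $\mathbf{r}_i(\alpha)$ according to

$$\mathbf{r}_i = \mathbf{r}'_i + \boldsymbol{\chi} + \tau\mathbf{v}_i(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}'_i,\ldots,\mathbf{r}'_N),$$

where χ is a three-dimensional vector of Gaussian-distributed numbers with variance τ and zero mean and $\mathbf{v}_i(\mathbf{R})$ denotes those components of the total drift velocity V(R) due to electron i.

• Hence each electron i is moved from $\mathbf{r}'_i$ to $\mathbf{r}_i$ with transition-probability density

$$t_i(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}_i\leftarrow\mathbf{r}'_i,\mathbf{r}'_{i+1},\ldots,\mathbf{r}'_N) = \frac{1}{(2\pi\tau)^{3/2}}\exp\left(-\frac{[\mathbf{r}_i-\mathbf{r}'_i-\tau\mathbf{v}_i(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}'_i,\ldots,\mathbf{r}'_N)]^2}{2\tau}\right).$$

• Transition-probability density for move from $\mathbf{R}' = (\mathbf{r}'_1,\ldots,\mathbf{r}'_N)$ to $\mathbf{R} = (\mathbf{r}_1,\ldots,\mathbf{r}_N)$ is probability that each electron i moves from $\mathbf{r}'_i$ to $\mathbf{r}_i$.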

Drift, Diffusion and the Accept/Reject Step (II)

• So, transition-probability density for the configuration move:

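$$T(\mathbf{R}\leftarrow\mathbf{R}') = \prod_{i=1}^{N} t_i(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}_i\leftarrow\mathbf{r}'_i,\mathbf{r}'_{i+1},\ldots,\mathbf{r}'_N).$$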

• In limit τ → 0, drift velocity V is constant over configuration move. Hence

T (R← R′) = GD(R← R′, τ),

so drift–diffusion process is described by drift–diffusion Green’s function.

• For finite time steps, approximation that drift velocity is constant violates detailed-balance condition.

• Enforce detailed balance using a Metropolis-style accept/reject step.

Drift, Diffusion and the Accept/Reject Step (III)

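• In electron-by-electron algorithm, move of ith electron is accepted with probability

$$\min\left\{1,\ \frac{t_i(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}'_i\leftarrow\mathbf{r}_i,\mathbf{r}'_{i+1},\ldots,\mathbf{r}'_N)\,\Psi^2(\mathbf{r}_1,\ldots,\mathbf{r}_i,\mathbf{r}'_{i+1},\ldots,\mathbf{r}'_N)}{t_i(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}_i\leftarrow\mathbf{r}'_i,\mathbf{r}'_{i+1},\ldots,\mathbf{r}'_N)\,\Psi^2(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}'_i,\ldots,\mathbf{r}'_N)}\right\}.$$

• In configuration-by-configuration algorithm, move of all N electrons is accepted with probability

$$\min\left\{1,\ \frac{T(\mathbf{R}'\leftarrow\mathbf{R})\,\Psi^2(\mathbf{R})}{T(\mathbf{R}\leftarrow\mathbf{R}')\,\Psi^2(\mathbf{R}')}\right\}.$$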

• In either case, RMS distance diffused in configuration space over one time step is $\sqrt{3N\tau p}$, where p is the acceptance probability.

• For given time step, electron-by-electron algorithm is more efficient, because acceptance probability is higher.

• CASINO can perform either e-by-e or c-by-c calculations (choose with dmc_method keyword), but only the former should be used in practice.
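
The accept/reject probability for a single-electron move can be sketched as follows. This is a minimal illustration under stated assumptions: `log_psi` and `drift` are hypothetical user-supplied callables for one electron with all others held fixed, and the hydrogen-like 1s trial function in the usage lines is an arbitrary choice.

```python
import numpy as np

def log_transition(r_to, r_from, v_from, tau):
    """Log of the Gaussian drift-diffusion density t(r_to <- r_from) for one electron."""
    d = r_to - r_from - tau * v_from
    return -np.dot(d, d) / (2.0 * tau) - 1.5 * np.log(2.0 * np.pi * tau)

def acceptance_probability(r_new, r_old, log_psi, drift, tau):
    """Metropolis-style probability that restores detailed balance at finite tau."""
    log_ratio = (log_transition(r_old, r_new, drift(r_new), tau)    # reverse move
                 - log_transition(r_new, r_old, drift(r_old), tau)  # forward move
                 + 2.0 * (log_psi(r_new) - log_psi(r_old)))         # Psi^2 ratio
    if log_ratio >= 0.0:
        return 1.0
    return float(np.exp(log_ratio))

# Usage with an assumed trial function Psi = exp(-|r|).
log_psi_1s = lambda r: -np.linalg.norm(r)
drift_1s = lambda r: -r / np.linalg.norm(r)
r_a = np.array([0.5, 0.0, 0.0])
r_b = np.array([0.6, 0.1, -0.2])
p_forward = acceptance_probability(r_b, r_a, log_psi_1s, drift_1s, tau=0.05)
p_reverse = acceptance_probability(r_a, r_b, log_psi_1s, drift_1s, tau=0.05)
print(p_forward, p_reverse)
```

Working with logarithms avoids overflow when the wave-function ratio is very large or very small.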

Branching and Population Control (I)

• After all electrons have attempted to move, branching factor of configuration α is calculated:

$$M_b(\alpha,m) = \exp\left[\left(-\frac{1}{2}\left[S(\mathbf{R}_{\alpha,m}) + S(\mathbf{R}'_{\alpha,m})\right] + E_T(m)\right)\tau_{\rm eff}(\alpha,m)\right],$$

where $\tau_{\rm eff}$ is effective time step of configuration α at iteration m (see later) and S is limited local energy (see later).

• “Unweighted” DMC: number of copies of configuration α in next time step is

M(α, m) = int[η + Mb(α, m)],

where η is a random number drawn from a uniform distribution on [0, 1].

• Expected number of configurations after branching: Mb(α, m).
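
The integer rounding used in unweighted DMC can be checked numerically. This is a minimal sketch; the function name `n_copies` is an illustrative assumption.

```python
import numpy as np

def n_copies(m_b, rng):
    """Number of copies int[eta + M_b], with eta uniform on [0, 1)."""
    m_b = np.asarray(m_b, dtype=float)
    return np.floor(rng.random(m_b.shape) + m_b).astype(int)

rng_demo = np.random.default_rng(0)
copies_demo = n_copies(np.full(200_000, 1.3), rng_demo)
print(copies_demo.mean())   # should be close to 1.3
```

Each configuration yields either 1 or 2 copies here, and the average number of copies reproduces the branching factor.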

Branching and Population Control (II)

• “Weighted” DMC: each config. carries a weight $w_\alpha$, which is multiplied by $M_b(\alpha,m)$ after each move. Only if weight gets large or small does config. branch or combine.

• Branching: daughter configurations each receive an equal fraction of the parent configuration’s weight. Conserves total weight.

• Combining: another “weak” configuration is sought, one of the two is chosen randomly with probabilities proportional to weights, total weight is given to “winner”, and “loser” is killed. Conserves total weight.

• Clearly, branching procedure introduces no bias. Neither does combining procedure: e.g., if β and γ are to be combined then expected weight of β afterwards is

$$\langle w'_\beta\rangle = (w_\beta + w_\gamma)\,\frac{w_\beta}{w_\beta + w_\gamma} + 0\cdot\frac{w_\gamma}{w_\beta + w_\gamma} = w_\beta.$$

• Use lwdmc to choose between weighted and unweighted DMC.
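
The combining step for two weak configurations can be sketched and tested for bias as follows. The function name and the example weights are illustrative assumptions, not CASINO's thresholds.

```python
import numpy as np

def combine(w_beta, w_gamma, rng):
    """Give the total weight to one configuration, chosen with probability proportional to weight."""
    total = w_beta + w_gamma
    if rng.random() < w_beta / total:
        return total, 0.0      # beta wins, gamma is killed
    return 0.0, total          # gamma wins, beta is killed

rng_comb = np.random.default_rng(2)
outcomes = np.array([combine(0.2, 0.3, rng_comb) for _ in range(100_000)])
print(outcomes.mean(axis=0))   # should be close to (0.2, 0.3)
```

The total weight is conserved in every trial, and the average weight of each configuration is unchanged.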

Branching and Population Control (III)

• Let Ebest(m) be best estimate of ground-state energy at iteration m.

– During equilibration $E_{\rm best}$ is average of configuration local energies over last ebest_av_window iterations.

– During statistics accumulation, Ebest is mixed estimator of Hamiltonian.

• Reference energy $E_T(m)$ is adjusted so that total weight

$$M_{\rm tot}(m) = \sum_{\alpha=1}^{N_C(m)} w_\alpha(m)$$

does not deviate too much from target weight $M_0$. Reference energy is updated as

$$E_T(m+1) = E_{\rm best}(m) - \frac{\min\{1,\tau\}}{\tau_{\rm EFF}(m)}\log\left(\frac{M_{\rm tot}(m)}{M_0}\right),$$

where $\tau_{\rm EFF}$ is mean effective time step (see later).
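
The reference-energy update can be written as a one-line function. This is a minimal sketch of the formula above with illustrative argument names.

```python
import math

def update_reference_energy(e_best, m_tot, m_target, tau, tau_eff_mean):
    """Shift the reference energy so that the total weight is pulled back towards its target."""
    return e_best - (min(1.0, tau) / tau_eff_mean) * math.log(m_tot / m_target)

# A population 10% above target lowers the reference energy below E_best.
print(update_reference_energy(-7.69, 1100.0, 1000.0, tau=0.01, tau_eff_mean=0.0099))
```

A reference energy below the best energy estimate reduces the branching factors on the next iteration, and vice versa.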

Branching and Population Control (IV)

• For weighted DMC, average configuration weight is not necessarily unity. So mean configuration population is generally not equal to $M_0$.

• DMC Green’s function was derived under the assumption that $E_T$ is constant. Adjusting $E_T$ leads to population-control bias.

• Suppose local energies are mostly less than $E_0$. Population will try to increase. Population-control mechanism counteracts this.

• Suppose local energies are mostly greater than $E_0$. Population will try to decrease. Population-control mechanism counteracts this.

• In either case, average local energy increases as a result. Population control introduces a positive bias into DMC energy.

Branching and Population Control (V)

[Figure: DMC energy (mHa per electron) against target population (configurations).]

Population-control bias in a 3D Wigner crystal (a system in which population-control bias is particularly significant).

• Since fluctuations in average local energy and branching factor are proportional to $1/\sqrt{M_0}$, population-control bias goes as $1/M_0$.

• Population-control bias increases very slowly as system size increases.

• Improve trial wave function to reduce population-control bias.

Branching and Population Control (VI)

[Figure: DMC energy (a.u. per electron) against reciprocal of target weight. Curves: 18 electrons, Unwt; 54 electrons, Unwt; 54 electrons, Unwt, S-J; 118 electrons, Unwt; 118 electrons, Wt; 226 electrons, Unwt.]

Population-control bias in a paramagnetic Fermi fluid with $r_s = 4$ a.u. No Jastrow factor was used, except where indicated by “S-J”.

Wave-Function Antisymmetry

• Want to find Fermionic (antisymmetric) ground state.

• Lowest-energy wave function is Bosonic (symmetric) ground state.

• Therefore have to constrain DMC to preserve antisymmetry.

• Constraint is “automatic” in importance-sampled DMC algorithm.

• If Φ and Ψ have different nodal surfaces, there must exist regions where f is negative.

• Our algorithm is based on interpreting f as a probability density.

• Can never have a negative f in our algorithm.

• So we cannot describe a change in the nodal surface of Φ.

• By importance sampling and not permitting weights to become negative, we have introduced the fixed-node approximation.

Fixed-Node Approximation (I)

• Nodes of Ψ divide configuration space into nodal pockets.

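• Within each nodal pocket λ we in effect solve Schrödinger equation subject to boundary condition that (asymmetric) wave function $\Phi_\lambda$ is zero outside pocket.

• So $\hat{H}\Phi_\lambda = E_0^\lambda\Phi_\lambda + \delta_\lambda$, where $\delta_\lambda$ are δ functions at pocket boundary arising from discontinuity of derivative of $\Phi_\lambda$ and $E_0^\lambda$ is pocket energy.

• Consider antisymmetric wave function $\tilde{\Phi}_\lambda(\mathbf{R}) = \hat{A}\Phi_\lambda(\mathbf{R}) \equiv \sum_P (-1)^p \hat{P}\Phi_\lambda(\mathbf{R})$, where $\{\hat{P}\}$ are operators that permute like-spin coordinates and {p} are corresponding parities.

• Variational principle:

$$E_0^F \le \frac{\langle\tilde{\Phi}_\lambda|\hat{H}|\tilde{\Phi}_\lambda\rangle}{\langle\tilde{\Phi}_\lambda|\tilde{\Phi}_\lambda\rangle} = \frac{\langle\Phi_\lambda|\hat{A}\hat{H}\hat{A}|\Phi_\lambda\rangle}{\langle\Phi_\lambda|\hat{A}^2|\Phi_\lambda\rangle} = E_0^\lambda,$$

so each pocket energy is greater than or equal to the Fermion ground-state energy $E_0^F$.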

Fixed-Node Approximation (II)

• Have used fact that $\hat{A}$ is Hermitian and that it commutes with $\hat{H}$, and that contribution due to $\delta_\lambda$ vanishes because $\Phi_\lambda = 0$ at nodes.

• Within each nodal pocket λ, mixed estimator gives energy $E_0^\lambda$.

• Configuration populations in high-energy pockets die out, so DMC energy is $\min\{E_0^\lambda\} \ge E_0^F$.

• Fixed-node DMC energy exceeds Fermion ground-state energy, becoming equal in limit that nodal surface is exact.

• Error in DMC energy is second order in error in nodal surface.

• FNA is only fundamental approximation in DMC.

• Drift velocity diverges at nodal surface, carrying away configurations.

• At finite time steps, configurations can drift/diffuse across surface.

Fixed-Node Approximation (III)

• Source of time-step bias. Best solution: reject such moves.

• FNA with antisymmetric trial wave function enables us to approximate lowest-energy antisymmetric eigenstate.

• Likewise, FNA with trial wave function of any given symmetry enables us to approximate lowest-energy state with that symmetry.

• Hence we can use fixed-node DMC to calculate (some) excited-state energies by using an appropriate trial wave function.

• Variational principle does not hold for excited-state energies in general.

Beyond the Fixed-Node Approximation (I)

• Why not just allow weights to change sign?

– In principle, could represent negative f by allowing weights to become negative when they cross the nodes of Ψ (can use a nodeless “guiding” wave function to determine dynamics of configuration population).

– Dynamics of positive and negative configurations are identical.

– Positive and negative configurations tend to same distribution.

– In principle the difference of these distributions gives f; in practice it is difficult to take difference of two sets of almost identical noisy data.

– Have an exponentially decaying signal-to-noise ratio.

• Released-node method:

– Perform fixed-node method to obtain good configuration distribution.

– Then allow node crossing with change of sign of weights.

– Allow equilibration, then try to obtain some statistics before statistical noise gets too large.

– Impractical for all but the simplest systems.

Beyond the Fixed-Node Approximation (II)

• Pairing methods:

– Try to keep positive and negative populations separate by annihilating positive and negative configurations.

– Difficult to make algorithms stable in practice.

– Has only been applied to small systems, with limited success.

• Alternative approach for reducing fixed-node error: improve the nodal surface.

• Optimise parameters that affect the nodal surface (parameters in orbitals, multideterminant expansion coefficients, backflow functions) together with Jastrow factor.

BREAK (Reconvene in 15 minutes’ time.)

Effective Time Step

• RMS distance diffused by each electron during each move (should be $\sqrt{3\tau}$) is reduced because some moves are rejected.

• At finite time steps it is better to use a time step appropriate for the actual distance diffused when calculating branching factors.

• Effective time step:

$$\tau_{\rm eff}(\alpha,m) = \tau\,\frac{\langle\Delta r_d^2\rangle_{\rm accepted}}{\langle\Delta r_d^2\rangle_{\rm attempted}} = \tau\,\frac{\sum_i p_i\,\Delta r_{d,i}^2}{\sum_i \Delta r_{d,i}^2},$$

where sums run over all attempted moves of electrons i in configuration α at time step m. $\Delta r_{d,i}$ are the diffusive displacements and the $p_i$ are the acceptance probabilities of electron moves.
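
The effective time step is a ratio of weighted sums and can be sketched directly. The function and argument names below are illustrative assumptions; `dr_diffusive` holds one diffusive displacement vector per attempted electron move.

```python
import numpy as np

def effective_time_step(tau, p_accept, dr_diffusive):
    """tau scaled by the acceptance-weighted fraction of the squared diffusive displacement."""
    dr2 = np.sum(np.asarray(dr_diffusive, dtype=float) ** 2, axis=-1)
    p = np.asarray(p_accept, dtype=float)
    return tau * np.sum(p * dr2) / np.sum(dr2)

rng_te = np.random.default_rng(3)
dr_demo = rng_te.normal(scale=0.1, size=(8, 3))      # eight attempted moves
print(effective_time_step(0.01, np.full(8, 0.99), dr_demo))
```

If every move is accepted with certainty the effective time step equals τ.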

Divergence of the Local Energy and Drift Velocity (I)

[Figure: local energy (a.u.) against displacement of electron from Si (a.u.), for S-J and S wave functions.]

Divergence in local energy of SiH4 as an electron is moved through the nodal surface. Note that local energy changes sign at divergence.

Divergence of the Local Energy and Drift Velocity (II)

• Local energy $E_L(\mathbf{R}) = \Psi^{-1}(\mathbf{R})\hat{H}\Psi(\mathbf{R})$ and drift velocity $\mathbf{V}(\mathbf{R}) = \Psi^{-1}(\mathbf{R})\nabla\Psi(\mathbf{R})$ diverge as 1/R, where R is distance from nodal surface.

• Divergences cause population-control problems and time-step bias. (E.g., configurations can escape from their nodal pockets.)

• Eliminating divergences by limiting local energies and drift velocities improves stability of DMC.

• Use of limiting schemes controlled by limdmc keyword.

Drift-Velocity Limiting Scheme

• Limited one-electron drift velocity is

$$\bar{\mathbf{v}} = \frac{-1 + \sqrt{1 + 2a|\mathbf{v}|^2\tau}}{a|\mathbf{v}|^2\tau}\,\mathbf{v},$$

where a ∈ (0, 1] and $\mathbf{v}$ is unlimited drift velocity.

– Limited drift velocity is used to calculate trial moves and acceptance probabilities.

– Unlimited drift velocity is used to calculate local energies.

• Limited local energy is

$$S(\mathbf{R}) = E_{\rm best} + [E_L(\mathbf{R}) - E_{\rm best}]\,\frac{|\bar{\mathbf{V}}(\mathbf{R})|}{|\mathbf{V}(\mathbf{R})|},$$

where $\bar{\mathbf{V}}$ and $\mathbf{V}$ are limited and unlimited total drift velocities.

– Limited local energies are used to evaluate branching factors.

– Unlimited energies are used in mixed estimator of energy.
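
Both limiting formulae can be sketched in a few lines. This is an illustration with assumed function names; the velocity prefactor is evaluated in the algebraically equivalent form 2/(1 + sqrt(1 + 2a|v|²τ)), which avoids cancellation when |v|²τ is small.

```python
import numpy as np

def limit_drift_velocity(v, tau, a=1.0):
    """Limited one-electron drift velocity for unlimited velocity v (3-vector)."""
    v = np.asarray(v, dtype=float)
    x = a * np.dot(v, v) * tau
    # (-1 + sqrt(1 + 2x)) / x rewritten as 2 / (1 + sqrt(1 + 2x)).
    return 2.0 / (1.0 + np.sqrt(1.0 + 2.0 * x)) * v

def limited_local_energy(e_local, e_best, v_limited_all, v_unlimited_all):
    """Scale the deviation of E_L from E_best by the ratio of total drift-velocity magnitudes."""
    ratio = np.linalg.norm(v_limited_all) / np.linalg.norm(v_unlimited_all)
    return e_best + (e_local - e_best) * ratio

tau_demo = 0.01
v_big = np.array([1.0e6, 0.0, 0.0])           # electron very close to a node
v_big_limited = limit_drift_velocity(v_big, tau_demo)
print(np.linalg.norm(v_big_limited), np.sqrt(2.0 / tau_demo))
print(limited_local_energy(-50.0, -7.0, v_big_limited, v_big))
```

For a very large unlimited velocity the limited magnitude saturates near sqrt(2/(aτ)), and the limited local energy is pulled almost all the way back to the best energy estimate.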

Derivation of the Drift-Velocity Limiting Scheme (I)

• Drift velocity varies rapidly as it diverges near the nodes.

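• Calculate Green’s function for drift process under assumption that ∇Ψ rather than $\mathbf{V} = \Psi^{-1}\nabla\Psi$ is constant when close to nodal surface.

• Let electron i have position vector $\mathbf{r}_i(t)$ as it drifts, with all other electrons fixed at positions $\{\mathbf{r}_j\}_{j\neq i}$.

• Suppose configuration is close to node and $\nabla_i\Psi(\mathbf{R}) \neq 0$ at node.

• Then $\mathbf{r}_i(t)$ must be close to a point $\mathbf{s}_i$ such that $\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{s}_i,\mathbf{r}_{i+1},\ldots,\mathbf{r}_N) = 0$. Hence

$$\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}_i(t),\mathbf{r}_{i+1},\ldots,\mathbf{r}_N) \simeq \nabla_i\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{s}_i,\mathbf{r}_{i+1},\ldots,\mathbf{r}_N)\cdot(\mathbf{r}_i(t)-\mathbf{s}_i) = |\mathbf{A}|\,r_i^\perp(t),$$

where $\mathbf{A} = \nabla_i\Psi(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{s}_i,\mathbf{r}_{i+1},\ldots,\mathbf{r}_N)$ is constant over move and $r_i^\perp(t)$ is component of $\mathbf{r}_i(t)-\mathbf{s}_i$ in direction of $\mathbf{A}$.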

Derivation of the Drift-Velocity Limiting Scheme (II)

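• Equation of motion for single-electron drift process:

$$\frac{d\mathbf{r}_i(t)}{dt} = \mathbf{v}_i(\mathbf{r}_1,\ldots,\mathbf{r}_{i-1},\mathbf{r}_i(t),\mathbf{r}_{i+1},\ldots,\mathbf{r}_N) = \mathbf{v}_i(t) = \frac{\nabla_i\Psi}{\Psi} \simeq \frac{\mathbf{r}_i^\perp}{[r_i^\perp(t)]^2}.$$

• Hence

$$v_i(t) = \frac{dr_i^\perp(t)}{dt} = \frac{1}{r_i^\perp(t)}.$$

• Integrating this from time t to t + τ gives

$$r_i^\perp(t+\tau) - r_i^\perp(t) \simeq \sqrt{[r_i^\perp(t)]^2 + 2\tau} - r_i^\perp(t) \equiv \bar{v}_i(t)\,\tau.$$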

Derivation of the Drift-Velocity Limiting Scheme (III)

• Magnitude of limited drift velocity is

$$\bar{v}_i(t) = \frac{\sqrt{[r_i^\perp(t)]^2 + 2\tau} - r_i^\perp(t)}{\tau} = \frac{-1 + \sqrt{1 + 2\tau/[r_i^\perp(t)]^2}}{\tau/r_i^\perp(t)} = \frac{-1 + \sqrt{1 + 2v_i^2(t)\tau}}{v_i(t)\tau}.$$

• Limited drift velocity is

$$\bar{\mathbf{v}}_i(t) = \frac{-1 + \sqrt{1 + 2a v_i^2(t)\tau}}{a v_i^2(t)\tau}\,\mathbf{v}_i(t),$$

where parameter a (=alimit) has been introduced such that a = 1 corresponds to solution close to node and limit a → 0 corresponds to “normal” solution.

• Limited velocity is only substantially different from unlimited velocity when latter is large compared with $1/\sqrt{2a\tau}$.

Derivation of the Drift-Velocity Limiting Scheme (IV)

• Magnitude of limited velocity is always less than that of unlimited velocity.

• Limited drift velocity reduces to unlimited velocity as τ → 0.

• Short-time approximation is improved upon, because the variation of the drift velocity over moves close to nodes is taken into account.

• Limited velocity does not diverge at nodes: it tends to $\sqrt{2/(a\tau)}$. Removes pathological behaviour.

Derivation of the Drift-Velocity Limiting Scheme (V)

[Figure: limited drift velocity (a.u.) against unlimited drift velocity (a.u.), with the values $(2/a\tau)^{1/2}$ and $(2a\tau)^{-1/2}$ marked.]

Limited drift velocity for a = 1 (typical value) and τ = 0.01 a.u. (a common time step in pseudopotential calculations).

Derivation of the Local-Energy Limiting Scheme (I)

• Close to nodal surface, limited local energy is average energy of configuration as it drifts for one time step from point at which limited energy is to be evaluated.

• Gives a limited local energy that remains bounded as node is approached and is physically sensible for computing the branching factor.

• Average local energy during configuration drift to R(t + τ) from R(t) ≡ R in vicinity of a node:

$$\bar{E}_L(\mathbf{R}) = \frac{1}{\tau}\int_t^{t+\tau} E_L(\mathbf{R}(t'))\,dt' = E_{\rm best} + \frac{1}{\tau}\int_t^{t+\tau}[E_L(\mathbf{R}(t')) - E_{\rm best}]\,dt'.$$

Derivation of the Local-Energy Limiting Scheme (II)

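• Divergence in local energy close to a node:

$$E_L(\mathbf{R}) = \Psi^{-1}(\mathbf{R})\hat{H}\Psi(\mathbf{R}) \simeq E_0 + \frac{B}{R_\perp} \simeq E_{\rm best} + BV(\mathbf{R}),$$

where B is constant and $R_\perp$ is distance of point R from nodal surface.

• Hence

$$\bar{E}_L(\mathbf{R}) = E_{\rm best} + \frac{B}{\tau}\int_t^{t+\tau} V(\mathbf{R}(t'))\,dt' = E_{\rm best} + \frac{B}{\tau}[R_\perp(t+\tau) - R_\perp(t)] = E_{\rm best} + B\bar{V}(\mathbf{R}) = E_{\rm best} + [E_L(\mathbf{R}) - E_{\rm best}]\,\frac{\bar{V}(\mathbf{R})}{V(\mathbf{R})}.$$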

Derivation of the Local-Energy Limiting Scheme (III)

• $\bar{E}_L$ reduces to $E_L$ when drift velocity is small, either away from nodes and nuclei or when τ is small.

• $\bar{E}_L$ is always closer to $E_{\rm best}$ than $E_L$.

• Since both drift velocity and local energy diverge as $1/R_\perp$, limited local energy $\bar{E}_L$ remains finite as node is approached.

• Ratio of drift velocities is calculated as

$$\frac{\bar{V}(\mathbf{R})}{V(\mathbf{R})} = \frac{\sqrt{\bar{v}_1^2(\mathbf{R}) + \ldots + \bar{v}_N^2(\mathbf{R})}}{V(\mathbf{R})}$$

in electron-by-electron algorithm.

• Limited single-electron drift velocities are calculated at same time as local energy, once all electron positions have been updated.

Evaluating Expectation Values: Mixed Estimators (I)

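Let $\hat{A}$ be an Hermitian operator that commutes with $\hat{H}$. Then $\phi_0$ is an eigenvector of $\hat{A}$. Let corresponding eigenvalue be $a_0$. Then

$$a_0 = \frac{\langle\phi_0|\hat{A}|\Psi\rangle}{\langle\phi_0|\Psi\rangle} = \lim_{t\to\infty}\frac{\int f(\mathbf{R},t)\,\Psi^{-1}(\mathbf{R})\hat{A}(\mathbf{R})\Psi(\mathbf{R})\,d\mathbf{R}}{\int f(\mathbf{R},t)\,d\mathbf{R}} \simeq \frac{\sum_{m'=1}^{m}\sum_{\alpha=1}^{N_C(m')} w_\alpha(m')\,A(\alpha,m')}{\sum_{m'=1}^{m}\sum_{\alpha=1}^{N_C(m')} w_\alpha(m')},$$

where m denotes iteration number (excluding the equilibration period) and

$$A(\alpha,m) = \Psi^{-1}(\mathbf{R}_\alpha(m))\,\hat{A}(\mathbf{R}_\alpha(m))\,\Psi(\mathbf{R}_\alpha(m)).$$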

Evaluating Expectation Values: Mixed Estimators (II)

• Ground-state energy can be estimated by using A(α, m) = EL(α, m).

• wα(m′) is the weight of configuration α at the end of time step m′.

• In unweighted DMC, branching factor is used in place of weights.

• At each iteration, local energies are averaged over configuration population (weighted by branching factors) and written to dmc.hist, along with total weight of each iteration.

• Analyse the weighted data in dmc.hist using reblocking analysis.
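
A reblocking analysis of serially correlated data can be sketched as follows. This is a simplified illustration, assuming every iteration carries equal weight (the data in dmc.hist are weighted); the function name and the synthetic AR(1) series are assumptions made only for the demonstration.

```python
import numpy as np

def reblock(data):
    """Return (block_length, standard_error) pairs for successive pairwise blocking."""
    x = np.asarray(data, dtype=float)
    results = []
    block = 1
    while x.size >= 2:
        results.append((block, x.std(ddof=1) / np.sqrt(x.size)))
        n_pairs = x.size // 2
        x = 0.5 * (x[0:2 * n_pairs:2] + x[1:2 * n_pairs:2])   # average adjacent pairs
        block *= 2
    return results

# Synthetic correlated series (not DMC data), used only to show the plateau.
rng_rb = np.random.default_rng(0)
noise = rng_rb.normal(size=2**14)
series = np.empty_like(noise)
series[0] = noise[0]
for i in range(1, noise.size):
    series[i] = 0.9 * series[i - 1] + noise[i]
reblock_table = reblock(series)
for block_length, err in reblock_table[:9]:
    print(block_length, err)
```

The naive error bar at block length 1 underestimates the true error; the estimate rises with block length and levels off once the blocks are longer than the correlation time.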

Growth Estimator of Energy (I)

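• Total weight at time t = mτ:

$$W(t) \equiv \int f(\mathbf{R},t)\,d\mathbf{R} \approx \sum_{\alpha=1}^{N_{\rm config}(m)} w_\alpha(m) \equiv M_{\rm tot}(m).$$

• Assume DMC simulation is equilibrated, so $f(\mathbf{R},t) = \Psi(\mathbf{R})\phi_0(\mathbf{R})$. Then

$$W(t+\tau) = \int\!\!\int G(\mathbf{R}\leftarrow\mathbf{R}',\tau)\,f(\mathbf{R}',t)\,d\mathbf{R}'\,d\mathbf{R}$$

$$= \int\!\!\int \Psi(\mathbf{R})\,\langle\mathbf{R}|\exp[-\tau(\hat{H}-E_T)]|\mathbf{R}'\rangle\,\Psi^{-1}(\mathbf{R}')\Psi(\mathbf{R}')\phi_0(\mathbf{R}')\,d\mathbf{R}\,d\mathbf{R}'$$

$$= \langle\Psi|\exp[-\tau(\hat{H}-E_T)]|\phi_0\rangle = W(t)\exp[-\tau(E_0-E_T)].$$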

Growth Estimator of Energy (II)

• Hence single-iteration growth estimator is

$$E_0 = -\frac{1}{\tau}\log\left(\exp[-E_T(t+\tau)\tau]\,\frac{W(t+\tau)}{W(t)}\right) \approx -\frac{1}{\tau}\log\left(\exp[-E_T(m+1)\tau]\,\frac{M_{\rm tot}(m+1)}{M_{\rm tot}(m)}\right).$$

• To reduce effects of statistical fluctuations, take expectation value of argument of logarithm. (This introduces a small bias.)

• Statistical error bars on growth estimator are usually much greater than those on mixed estimator.

• Difference of growth and mixed estimators indicates time-step bias.

• Set growth_estimator to T to calculate growth estimator.

• Growth estimator rarely used in practice. “DMC energy” means mixed estimate.
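
The single-iteration growth estimator can be sketched as a short function. The names are illustrative, and the logarithm of the product is split into a sum so that the exponential is never formed explicitly.

```python
import math

def growth_estimator(e_t_next, m_tot_next, m_tot, tau):
    """Single-iteration growth estimate of E_0 from the change in total weight."""
    # -(1/tau) log( exp(-E_T tau) * M_tot(m+1) / M_tot(m) )
    return e_t_next - math.log(m_tot_next / m_tot) / tau

# Consistency check with made-up numbers: weights that grow as exp[-tau (E_0 - E_T)].
e0_assumed, e_t_assumed, tau_g = -7.70, -7.65, 0.01
w_next = 1000.0 * math.exp(-tau_g * (e0_assumed - e_t_assumed))
print(growth_estimator(e_t_assumed, w_next, 1000.0, tau_g))
```

In a real simulation the weight ratio fluctuates from iteration to iteration, which is why the slide recommends averaging the argument of the logarithm.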

Pure Estimators via Future Walking (I)

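• If $[\hat{A},\hat{H}] \neq 0$ then mixed estimator $A_{\rm DMC}$ of $\hat{A}$ is not exact: error is O(Δ), where Δ is error in trial wave function. Can use extrapolated estimator $A_{\rm extrap} = 2A_{\rm DMC} - A_{\rm VMC}$ to estimate expectation value of $\hat{A}$ with $O(\Delta^2)$ error: see my talk on wave functions.

• Alternatively can use future walking to obtain “pure” distribution $\phi_0^2$.

• Let $\mathbf{R}_\alpha$ be a point in configuration space. Then

$$W_{\rm fut}(\mathbf{R}_\alpha) \equiv \int\!\!\int \delta(\mathbf{R}-\mathbf{R}_\alpha)\,G(\mathbf{R}'\leftarrow\mathbf{R},\infty)\,d\mathbf{R}\,d\mathbf{R}' = \frac{1}{\Psi^2(\mathbf{R}_\alpha)}\int \Psi^2(\mathbf{R}')\,G(\mathbf{R}_\alpha\leftarrow\mathbf{R}',\infty)\,d\mathbf{R}' \propto \frac{1}{\Psi^2(\mathbf{R}_\alpha)}\Psi(\mathbf{R}_\alpha)\phi_0(\mathbf{R}_\alpha) = \frac{\phi_0(\mathbf{R}_\alpha)}{\Psi(\mathbf{R}_\alpha)},$$

where we have (i) used detailed balance and (ii) used fact that DMC projects out mixed ground state $\phi_0\Psi$ from $\Psi^2$.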

Pure Estimators via Future Walking (II)

• $W_{\rm fut}(\mathbf{R}_\alpha)$ is total weight of all descendants of a configuration at $\mathbf{R}_\alpha$ with unit weight after a large number of iterations.

• Can therefore calculate $W_{\rm fut}(\mathbf{R}_\alpha)$ by maintaining a family tree of descendants of each configuration $\mathbf{R}_\alpha$.

• In practice, only need to maintain family tree over a finite number of iterations.

• If one multiplies usual configuration weights $w_\alpha$ by $W_{\rm fut}(\mathbf{R}_\alpha)$, resulting weighted distribution is proportional to $\phi_0^2$.

• Activate in CASINO using future_walking keyword.

Time-Step Bias (I)

• DMC Green’s function is approximate at finite time step: hence bias.

• Time-step bias is linear for sufficiently small time steps.

• Time-step bias does not get more severe in larger systems.

• Bias is greatly reduced if trial wave function is good.

• Must either (i) use sufficiently small time step that bias is negligible or (ii) perform simulations at different time steps and extrapolate to zero time step.

• Time-step biases may cancel, but must not assume this without checking.
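
Option (ii) can be sketched as a weighted linear fit, which is appropriate only in the small-time-step regime where the bias is linear. The energies below are synthetic numbers generated from an assumed straight line, not DMC results, and the function name is illustrative.

```python
import numpy as np

def extrapolate_to_zero_time_step(taus, energies, errors):
    """Weighted linear fit E(tau) = E(0) + b*tau; returns E(0) and its standard error."""
    weights = 1.0 / np.asarray(errors, dtype=float)          # polyfit expects 1/sigma
    coeffs, cov = np.polyfit(taus, energies, deg=1, w=weights, cov="unscaled")
    slope, intercept = coeffs
    return float(intercept), float(np.sqrt(cov[1, 1]))

taus_demo = np.array([0.0025, 0.005, 0.01, 0.02])
energies_demo = -1.0 + 0.3 * taus_demo                       # synthetic, exactly linear
errors_demo = np.full(taus_demo.size, 1.0e-4)
e_zero, e_zero_err = extrapolate_to_zero_time_step(taus_demo, energies_demo, errors_demo)
print(e_zero, e_zero_err)
```

Whether a linear form is adequate should be judged from the data, for example by checking that the fit passes through the points within their error bars.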

Time-Step Bias (II)

[Figure: DMC energy (a.u. per electron) against DMC time step (a.u.). Curves: N=18; N=18, S-J; N=54; N=118; N=226.]

Time-step bias in a paramagnetic Fermi fluid with $r_s = 4$ a.u. No Jastrow factor was used, except where indicated by “S-J”.

Modifications to Green’s Function in Systems with Bare Nuclei

• Drift-velocity limiting scheme removes divergence at nodal surface; interference with cusps at bare nuclei is a side-effect.

• Immediately before limited drift velocity of electron at r′ is calculated, calculate a parameter as

$$a(\mathbf{r}') = \frac{1}{2}(1 + \hat{\mathbf{v}}\cdot\hat{\mathbf{e}}_z) + \frac{Z^2z^2}{10(4 + Z^2z^2)},$$

where $\hat{\mathbf{v}}$ is unit vector in direction of unlimited drift velocity, $\hat{\mathbf{e}}_z$ is unit vector from closest bare nucleus to electron, z is distance of electron from nucleus and Z is atomic number.

• Makes a small (and hence the limiting weak) if electron is close to nucleus and drifting towards it.

Preventing Electrons from Overshooting Nuclei (I)

• Around bare nucleus, drift velocity v is directed towards nucleus.

• Drifting particles should never cross nucleus; should end up on top of it.

• Can impose this on DMC Green’s function at finite time steps.

• Use cylindrical polar coordinates with z-axis lying along line from nucleus to electron. Let position of closest nucleus be $\mathbf{R}_Z$.

• Position of electron relative to nucleus: $\mathbf{r}' - \mathbf{R}_Z = z'\mathbf{e}_z$.

• Limited drift velocity can be resolved as $\mathbf{v} = v_z\mathbf{e}_z + v_\rho\mathbf{e}_\rho$, where $\mathbf{e}_\rho$ is a unit vector orthogonal to $\mathbf{e}_z$.

• New z-coordinate after drifting for one time step is $z'' = \max\{z' + v_z\tau, 0\}$, which cannot lie beyond nucleus.

• Drift in the radial direction over one time step is $\rho'' = 2v_\rho\tau z''/(z' + z'')$.

Preventing Electrons from Overshooting Nuclei (II)

• New radial coordinate is approximately $v_\rho\tau$ when far from the nucleus, but is forced to go to zero as nucleus is approached.

• If electron attempts to overshoot nucleus, it will end up on top of it.

• Let electron position at end of drift process be r′′ = z′′ez + ρ′′eρ.
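
The modified drift step can be sketched as follows. This is an illustration with assumed names; it returns the absolute position, i.e. the nuclear position plus the relative vector built from the new z and radial coordinates, and it assumes the electron does not start exactly on the nucleus.

```python
import numpy as np

def drift_near_nucleus(r_old, r_nucleus, v_limited, tau):
    """Drift one electron for a time step without letting it overshoot the nearest nucleus."""
    rel = np.asarray(r_old, dtype=float) - r_nucleus
    z_old = np.linalg.norm(rel)
    e_z = rel / z_old                               # unit vector from nucleus to electron
    v_z = np.dot(v_limited, e_z)
    v_rho_vec = v_limited - v_z * e_z               # v_rho * e_rho as a vector
    z_new = max(z_old + v_z * tau, 0.0)             # cannot lie beyond the nucleus
    rho_vec = 2.0 * v_rho_vec * tau * z_new / (z_old + z_new)
    return r_nucleus + z_new * e_z + rho_vec

nucleus_demo = np.zeros(3)
# Strong inward drift: the electron ends up on top of the nucleus.
print(drift_near_nucleus(np.array([0.05, 0.0, 0.0]), nucleus_demo,
                         np.array([-20.0, 3.0, 0.0]), tau=0.01))
```

Far from the nucleus the factor z''/(z' + z'') tends to 1/2, so the radial displacement reduces to the ordinary drift.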

Diffusion Close to a Bare Nucleus (I)

• Close to a nucleus, f is the square of the hydrogenic 1s orbital.

[Figure: $\psi_{1s}(x)$ against x (a.u.).]

• Cusp cannot be reproduced by Gaussian diffusion at finite time steps.

• Starting from a nucleus, an electron should take a random step w distributed according to exp(−2Z|w|).

• Only diffuse in this fashion when electron is likely to cross nucleus.

• Let Π be plane with normal ez, containing nucleus.

Diffusion Close to a Bare Nucleus (II)

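• For Gaussian diffusion process, probability that electron crosses Π is

$$q = \frac{1}{2}\,{\rm erfc}\left(\frac{z + v_z\tau}{\sqrt{2\tau}}\right).$$

• So, with probability p ≡ 1 − q, w is sampled from

$$g_1(\mathbf{w}) = (2\pi\tau)^{-3/2}\exp\left(-\frac{|\mathbf{w}|^2}{2\tau}\right),$$

and new electron position is r = r′′ + w; otherwise, w is sampled from

$$g_2(\mathbf{w}) = \frac{\zeta^3}{\pi}\exp(-2\zeta|\mathbf{w}|),$$

and $\mathbf{r} = \mathbf{R}_Z + \mathbf{w}$.

• $\zeta = \sqrt{Z^2 + 1/\tau}$, so ζ ≈ Z for large time steps, giving desired cusp.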

Diffusion Close to a Bare Nucleus (III)

• Choice of ζ causes second moments of $g_1$ and $g_2$ to be equal up to O(τ). Hence Green’s function remains correct to O(τ).

• Single-electron Green’s function for the move from r′ to r is given by

$$g(\mathbf{r}\leftarrow\mathbf{r}') = p\,g_1(\mathbf{r}-\mathbf{r}'') + q\,g_2(\mathbf{r}-\mathbf{R}_Z).$$

• In order to calculate Green’s function for reverse move, all steps above (apart from random diffusion) must be performed, starting at point r and ending up at r′.

• To activate modifications to Green’s function, use nucleus_gf_mods keyword.
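
Sampling the two-component diffusion step can be sketched as follows. This is an illustration with assumed names; it relies on the fact that an isotropic density proportional to exp(−2ζ|w|) has a radial distribution proportional to r² exp(−2ζr), which is a gamma distribution with shape 3 and scale 1/(2ζ).

```python
import numpy as np
from math import erfc, sqrt

def crossing_probability(z, v_z, tau):
    """Probability q that Gaussian diffusion would carry the electron across the plane."""
    return 0.5 * erfc((z + v_z * tau) / sqrt(2.0 * tau))

def sample_exponential_step(zeta, rng):
    """Draw w from the isotropic density (zeta^3/pi) exp(-2 zeta |w|)."""
    radius = rng.gamma(shape=3.0, scale=1.0 / (2.0 * zeta))
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    return radius * direction

def diffuse_near_nucleus(r_drifted, r_nucleus, z, v_z, charge, tau, rng):
    """Gaussian step from the drifted position, or exponential step from the nucleus."""
    if rng.random() < crossing_probability(z, v_z, tau):
        zeta = sqrt(charge**2 + 1.0 / tau)
        return r_nucleus + sample_exponential_step(zeta, rng)
    return r_drifted + rng.normal(scale=sqrt(tau), size=3)

rng_nuc = np.random.default_rng(5)
zeta_demo = sqrt(1.0 + 1.0 / 0.01)
radii_demo = np.array([np.linalg.norm(sample_exponential_step(zeta_demo, rng_nuc))
                       for _ in range(20_000)])
print(radii_demo.mean(), 3.0 / (2.0 * zeta_demo))
```

The sampled mean radius should agree with the analytic value 3/(2ζ) to within statistical error.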

Eliminating Population-Control Bias (I)

• Propagation of DMC mixed wave function:

$$f(\mathbf{R},m) = \int G_{\rm DMC}(\mathbf{R}\leftarrow\mathbf{R}',\tau)\,f(\mathbf{R}',m-1)\,d\mathbf{R}'.$$

• In absence of accept/reject step, effect of including (time-step-dependent) reference energy $E_T(m)$ in $G_{\rm DMC}$ can be “undone” by multiplying RHS by $\exp[-\tau E_T(m)]$.

• Effect of including reference energy from previous time step can be eliminated by multiplying by $\exp[-\tau E_T(m-1)]$.

• When accept/reject step is present, effect of changing reference energy can be approximately undone using best estimate of effective time step in the “undoing” factors.

Eliminating Population-Control Bias (II)

• Effect of changing reference energy may be eliminated by multiplying f by

$$\Pi(m) = \prod_{m'=0} \exp[-\tau_{\rm EFF}(m)\,E_T(m-m')].$$

• Sufficient to include $T_p$ (=tpdmc) factors in the product, provided that $T_p$ is greater than number of iterations over which the DMC data are correlated. (Each factor constitutes an additional source of variance.)

• Let $\Pi(m,T_p) = \prod_{m'=0}^{T_p-1}\exp[\tau_{\rm EFF}(1)E_{\rm VMC} - \tau_{\rm EFF}(m)E_T(m-m')]$. Replacing f(R, m) with $\Pi(m,T_p)f(\mathbf{R},m)$, mixed estimator can be written as

$$\frac{\langle\Phi|\hat{A}|\Psi\rangle}{\langle\Phi|\Psi\rangle} \simeq \frac{\sum_{m'=1}^{m}\Pi(m',T_p)\sum_{\alpha=1}^{N_C(m')} w_\alpha(m')\,A(\alpha,m')}{\sum_{m'=1}^{m}\Pi(m',T_p)\sum_{\alpha=1}^{N_C(m')} w_\alpha(m')}.$$

• The $\exp(\tau_{\rm EFF}(1)E_{\rm VMC})$ factors cancel out of estimator, but help keep $\Pi(m,T_p)$ close to 1.

Effectiveness and Practicality of Π-Weighting Scheme (I)

[Figure: DMC energy (a.u.) against target population (configurations). Curves: Π-weights not used; Π-weights used ($T_p$ = 500).]

DMC energy against population size for SiH4 with and without Π-weighting scheme. No Jastrow factor is used.

Effectiveness and Practicality of Π-Weighting Scheme (II)

• Π-weighting eliminates population-control bias.

• But in nearly all DMC work, population-control bias is negligibly small (typically have several hundred configurations).

• Where population-control bias is present (can check for this by carrying out runs at different target populations), simplest way of eliminating it is to use a larger population!

• If equilibration is a small fraction of total run time, increase in target population does not lead to loss of efficiency.

• Π weights are exponential functions of energy and therefore system size. Fluctuations in weights can cause numerical problems for large systems.

• Therefore we do not normally use Π weighting scheme (so tpdmc=0).

Population-Explosion Catastrophes (I)

• Configuration-population explosions are liable to occur whenever local energy shows singular behaviour.

• Large local energies can invalidate the short-time approximation and the branching factor can diverge.

• Usual signature: unphysically low average configuration energy, accompanied by a jump in the population.

• CASINO halts if the population per processor exceeds 10× nconfig.

Population-Explosion Catastrophes (II)

[Figure: average local energy (a.u.) against iteration.]

Average local energy during a DMC simulation of SiH4. Dashed line shows DMC ground-state energy as found using simulations with a much smaller time step.

Electron–nucleus cusp condition not satisfied at hydrogen nuclei.

Population-Explosion Catastrophes (III)

[Figure: configuration population against iteration, showing actual population and target population.]

Config population in a simulation of H2O. Electron–nucleus cusp condition not satisfied.

Nuclear Persistent-Electron Catastrophes (I)

• If electron–nucleus cusp condition isn’t satisfied, local energy diverges as $r^{-1}$ when electron approaches a nucleus.

• Divergence is negative, causing positive divergence of branching factor.

[Figure: local energy (a.u.) against displacement from Cl atom (a.u.). Curves: Kato satisfied; Kato not satisfied.]

Local energy as electron moves through a bare nucleus. Note that local energy does not change sign at divergence, unlike divergence at a node.

Nuclear Persistent-Electron Catastrophes (II)

• Probability density of electron being at nucleus is finite.

• Accept/reject step tends to prevent electron moves away from nucleus, so electrons may become trapped.

• Possible for simulation to proceed with a population of configurations containing a “persistent” electron, but will have large negative bias.

• If multiplicity sufficiently high, unbounded population explosion occurs.

• Use a trial wave function satisfying electron–nucleus cusp conditions: nuclear persistent-electron catastrophes never seen in practice.

• E.g., if all-electron calculations are performed with a Gaussian basis set, cusp_correction should be T.

• Unfortunately the Gaussian cusp correction sometimes fails.

Smoothly Truncated Localised Orbitals (I)

• The use of truncated localised orbitals can lead to bad behaviour in the local energy and hence population explosions.

• Preferable to truncate localised orbitals abruptly than to bring them smoothly to zero: set bsmooth=F. Discontinuity in wave function leads to a small bias, but not to instability.

[Figure: orbital value against distance from Si atom (a.u.); curves: Smooth (A), Smooth (B), Untruncated, Abrupt.]

Localised orbital for SiH4.

Smoothly Truncated Localised Orbitals (II)

[Figure: Laplacian of orbital against distance from Si atom (a.u.); curves: Smooth (A), Smooth (B), Untruncated, Abrupt.]

Laplacian of localised orbital for SiH4.

• Shell in which orbital is brought to zero is new small length scale; need appropriately tiny time step.

• No such problem when orbital is truncated abruptly.
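The new length scale can be illustrated with a toy cutoff function. The sketch below assumes a cubic "smoothstep" polynomial c(x) = 1 − 3x² + 2x³ that takes the orbital from its full value to zero across a shell of a given width; this polynomial is a hypothetical choice for illustration and is not claimed to be the truncation function used in CASINO. Its second derivative, which enters the Laplacian of the truncated orbital, grows as 1/width².

```python
def cutoff_second_derivative(r, r_start, width):
    """d^2c/dr^2 for c = 1 - 3x^2 + 2x^3, x = (r - r_start)/width, 0 <= x <= 1."""
    x = (r - r_start) / width
    if x < 0.0 or x > 1.0:
        return 0.0
    return (-6.0 + 12.0 * x) / width**2

def max_curvature(r_start, width, n=1001):
    """Largest |d^2c/dr^2| sampled on n points across the truncation shell."""
    rs = [r_start + width * i / (n - 1) for i in range(n)]
    return max(abs(cutoff_second_derivative(r, r_start, width)) for r in rs)

# Halving the shell width quadruples the curvature of the cutoff.
for width in (0.5, 0.25, 0.125):
    print(width, max_curvature(3.0, width))
```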

Nonlocal Pseudopotentials

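• Nonlocal pseudopotentials are awkward in DMC: when we derived the DMC Green’s function we assumed that the Hamiltonian was local.

• Locality approximation: replace nonlocal pseudopotential $V_{\rm NL}$ by local operator $V_{\rm LA}(\mathbf{R}) = \Psi^{-1}(\mathbf{R}) V_{\rm NL} \Psi(\mathbf{R})$.

• Clear that $\langle V_{\rm LA} \rangle_{\rm VMC} = \langle\Psi|V_{\rm LA}|\Psi\rangle / \langle\Psi|\Psi\rangle = \langle\Psi|V_{\rm NL}|\Psi\rangle / \langle\Psi|\Psi\rangle = \langle V_{\rm NL} \rangle_{\rm VMC}$.

• Likewise, $\langle\phi_{\rm LA}|V_{\rm LA}|\Psi\rangle / \langle\phi_{\rm LA}|\Psi\rangle = \langle\phi_{\rm LA}|V_{\rm NL}|\Psi\rangle / \langle\phi_{\rm LA}|\Psi\rangle$.

• However, $\langle\phi_{\rm LA}|V_{\rm NL}|\Psi\rangle / \langle\phi_{\rm LA}|\Psi\rangle \neq \langle\phi_{\rm NL}|V_{\rm NL}|\Psi\rangle / \langle\phi_{\rm NL}|\Psi\rangle$. So DMC does not give ground-state energy of nonlocal Hamiltonian.

• Can show that error in $\langle\phi_{\rm LA}|V_{\rm NL}|\Psi\rangle / \langle\phi_{\rm LA}|\Psi\rangle$ is second order in error in $\Psi$.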

• DMC energies in locality approximation are not guaranteed to exceed GS energy.

• The use of the locality approximation can lead to catastrophic behaviour.

T-Move Scheme (I)

• Can split nonlocal pseudopotential into a part with negative matrix elements and a part with positive matrix elements.

• Can simulate process corresponding to negative elements (so-called T-moves).

• Treat positive matrix elements within locality approximation.

• Restores property that DMC energy is greater than ground state.

• Tend to move away from nodes on nonlocal integration grid: eliminates instabilities.

• Set use_tmove to T to use the T-move scheme.
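The splitting into negative and positive matrix elements can be sketched as follows. The code assumes the heat-bath form often quoted for T-moves: for a list v of nonlocal matrix elements (already multiplied by the ratio of trial wave functions, one per point of the nonlocal integration grid), the probability of moving to grid point i is proportional to −τv_i when v_i is negative, and the probability of staying put is proportional to 1. The function names and this exact form are assumptions for illustration, not a description of CASINO’s implementation.

```python
import random

def tmove_split(v, tau):
    """Split nonlocal matrix elements v[i] (one per grid point) by sign.

    Returns (v_plus_sum, probs): the sum of positive elements, to be treated
    as a local potential, and the probabilities of staying put (index 0) or
    moving to grid point i (index i + 1).
    """
    v_plus_sum = sum(x for x in v if x > 0.0)
    weights = [1.0] + [-tau * x if x < 0.0 else 0.0 for x in v]
    total = sum(weights)
    return v_plus_sum, [w / total for w in weights]

def choose_tmove(probs, rng=random):
    """Heat-bath selection: 0 means no T-move, i > 0 means move to grid point i - 1."""
    return rng.choices(range(len(probs)), weights=probs)[0]

v_plus, probs = tmove_split([-2.0, 1.5, -1.0, 0.5], tau=0.1)
print(v_plus, probs, choose_tmove(probs))
```

Grid points with positive matrix elements are never selected as T-move destinations; their contribution stays in the local potential.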

T-Move Scheme (II)

[Figure: DMC energy (a.u.) against DMC time step (a.u.); curves: No Jas., Loc. app.; No Jas., T-move; u+χ, Loc. app.; u+χ, T-move; u+χ+f, Loc. app.; u+χ+f, T-move.]

DMC energy against time step for a pseudo-oxygen atom, with and without the T-move scheme. Three different trial wave functions are used.

If You Encounter a Catastrophe... (I)

• If you encounter catastrophic behaviour:

1. Check that your wave function satisfies the Kato cusp conditions by using qmc_plot to examine the local energy as an electron is moved through each nucleus.¹

2. Check that any localised orbitals are truncated abruptly.

3. Use the T-move scheme for nonlocal pseudopotentials.

4. Consider using a more complete basis set for your orbitals.

5. Consider using a smaller time step.

• If you still encounter occasional catastrophic behaviour, you can set an upper limit on the population (trip_popn).

• CASINO will jump back to an earlier point in the simulation and change the random number sequence, if the upper limit is exceeded.

• Choose the upper limit to be slightly higher than the maximum population that would be encountered in an ordinary fluctuation.

¹ Type casinohelp qmc_plot to find out how to use qmc_plot.

If You Encounter a Catastrophe... (II)

[Figure: configuration population against iteration; curves: population in DMC simulation, target population, suggested trip population.]

Suggested trip population in a DMC simulation of SiH4.
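One way to pick such a limit is to take the population trace from a well-behaved run and add a small margin to its maximum. The helper below is a hypothetical sketch of that rule of thumb: the function name and the default 5% margin are assumptions, not CASINO features, and the returned number would still have to be entered by hand as the upper limit.

```python
def suggest_trip_popn(populations, margin=0.05):
    """Return an upper population limit slightly above the largest ordinary fluctuation.

    populations: total configuration population at each iteration of a
    well-behaved run; margin: fractional headroom (hypothetical default).
    """
    if not populations:
        raise ValueError("need at least one population sample")
    return int(max(populations) * (1.0 + margin)) + 1

# Hypothetical population trace around a target of 500 configurations.
trace = [498, 507, 512, 489, 520, 503]
print(suggest_trip_popn(trace))
```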

Time-Step Bias in the Total Energy of Neon and Ne+ (I)

[Figure: DMC energy (a.u.) against DMC time step (a.u.); curves: AE Ne, no GF mods; AE Ne, GF mods; Pseudo-Ne; Exact energy.]

DMC energy of all-electron neon and pseudoneon against time step.

Time-Step Bias in the Total Energy of Neon and Ne+ (II)

[Figure: DMC energy (a.u.) against DMC time step (a.u.); curves: No GF mods, GF mods.]

DMC energy of all-electron neon against time step (closeup).

Time-Step Bias in the Total Energy of Neon and Ne+ (III)

[Figure: DMC energy (a.u.) against DMC time step (a.u.); curves: AE Ne+, no GF mods; AE Ne+, GF mods; Pseudo-Ne+; Exact energy.]

DMC energy of all-electron Ne+ and pseudo-Ne+ against time step.

Time-Step Bias in the Total Energy of Neon and Ne+ (IV)

• Time-step bias near-linear at time steps less than 0.005 a.u. in AE calculations.

• Bias in pseudoneon energy remains well-behaved up to higher time steps.

• Modifications to all-electron Green’s function reduce time-step bias.

• Shapes of time-step bias curves are similar for neon and Ne+. Suggests that time-step bias in ionisation energy will be small.
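Where the bias is near-linear, the zero-time-step energy can be estimated by fitting a straight line to energies obtained at several small time steps. The sketch below is a plain unweighted least-squares fit; in real work each point would normally be weighted by its statistical error bar. The function name is hypothetical and the numbers are synthetic, chosen only to exercise the fit.

```python
def extrapolate_to_zero_time_step(taus, energies):
    """Least-squares straight line E(tau) = e0 + slope * tau; returns (e0, slope)."""
    n = len(taus)
    if n < 2 or n != len(energies):
        raise ValueError("need at least two (tau, energy) pairs")
    mean_t = sum(taus) / n
    mean_e = sum(energies) / n
    sxx = sum((t - mean_t) ** 2 for t in taus)
    sxy = sum((t - mean_t) * (e - mean_e) for t, e in zip(taus, energies))
    slope = sxy / sxx
    return mean_e - slope * mean_t, slope

# Synthetic, exactly linear data (not results from any calculation).
taus = [0.001, 0.002, 0.004]
energies = [-100.0 + 2.0 * t for t in taus]
print(extrapolate_to_zero_time_step(taus, energies))
```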

Time-Step Bias in the Ionisation Energy of Neon (I)

[Figure: DMC ionisation energy (a.u.) against DMC time step (a.u.); curves: All-electron, no GF mods; AE, GF mods; Pseudo-Ne; Exp. (Kaufman & Minnhagen).]

DMC ionisation energy of neon against time step.

Time-Step Bias in the Ionisation Energy of Neon (II)

• All-electron ionisation energy with standard algorithm shows least time-step bias at small time steps.

• So time-step bias in total energy is mainly due to innermost electrons; it cancels in ionisation-energy calculations.

• Modifications to Green’s function largely eliminate bias due to innermost electrons, but this is irrelevant to ionisation energy.

• Conclusion should hold in ionisation-energy calculations for heavier atoms and other situations where energy differences are taken.

• Pseudoneon calculations are more efficient because fewer electrons are simulated, and a simpler Jastrow factor and a larger time step can be used.

• Pseudopotential does not lead to any significant loss of accuracy.
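When an ionisation energy is formed as the difference of two DMC total energies at the same time step, the statistical error bars of the two runs combine in quadrature, provided the runs are independent. The helper below is a minimal sketch of that arithmetic; its name and the input values are hypothetical and are not results from the neon calculations above.

```python
import math

def energy_difference(e_a, err_a, e_b, err_b):
    """Return (e_b - e_a, combined standard error) for two independent runs."""
    return e_b - e_a, math.sqrt(err_a**2 + err_b**2)

# Hypothetical total energies (a.u.) of an atom and its ion, with error bars.
diff, err = energy_difference(-128.0, 0.003, -127.0, 0.004)
print(diff, err)
```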

Choosing a DMC Time Step (I)

• Always check for time-step bias by performing simulations at different time steps.

• First guess at time step: one fiftieth of (optimised) VMC time step.

• RMS distance diffused by each electron each time step ($\sqrt{3p\tau}$, where $p$ is mean acceptance probability) should be ≤ smallest length scale.

• For “typical” time steps in “typical” systems, “typical” correlation period is about 1000 iterations.

• So equilibration and statistics accumulation should typically be tens of thousands of iterations.

• RMS distance diffused by each electron over equilibration period ($\sqrt{3pN_{\rm equil}\tau}$, where $N_{\rm equil}$ is number of equilibration iterations) should be ≥ longest length scale in problem.

• Should also check that mean energy has stopped decreasing over equilibration period.
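The two diffusion-distance checks above are easy to automate. The sketch below evaluates √(3pτ) and √(3pN_equil τ) and compares them with length scales supplied by the user; the function names and every numerical value are hypothetical, and the shortest and longest length scales have to be estimated for the system at hand.

```python
import math

def rms_diffusion_distance(tau, acceptance, n_steps=1):
    """RMS distance (a.u.) diffused by one electron in n_steps: sqrt(3 * p * n_steps * tau)."""
    return math.sqrt(3.0 * acceptance * n_steps * tau)

def check_time_step(tau, acceptance, n_equil, shortest_length, longest_length):
    """Return (per-step test passed, equilibration test passed)."""
    per_step_ok = rms_diffusion_distance(tau, acceptance) <= shortest_length
    equil_ok = rms_diffusion_distance(tau, acceptance, n_equil) >= longest_length
    return per_step_ok, equil_ok

tau_vmc = 0.5              # optimised VMC time step (a.u.), hypothetical
tau_dmc = tau_vmc / 50.0   # first guess from the rule of thumb above
print(check_time_step(tau_dmc, 0.99, 10000, shortest_length=0.2, longest_length=10.0))
```

Passing both tests does not replace the explicit check for time-step bias at several time steps; it only indicates that the first guess is not obviously unreasonable.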

Choosing a DMC Time Step (II)

[Figure: energy (a.u.) against iteration; curves: average local energy, reference energy, best estimate of energy.]

Evolution to ground state during DMC equilibration for SiH4.

Summary

To carry out successful DMC calculations:

• Test for time-step bias by performing simulations at different time steps (for a representative system).

• Ensure the equilibration period is sufficiently long.

• Use a sufficiently large target population.

• Use a good basis for your orbitals.

• Use a highly optimised Jastrow factor.

• Ensure that statistical error bars are at least an order of magnitude smaller than the energy difference you are trying to resolve.

Good luck with your DMC calculations!
