arXiv:cond-mat/0111138v1 [cond-mat.mtrl-sci] 8 Nov 2001
Transcript of arXiv:cond-mat/0111138v1 [cond-mat.mtrl-sci] 8 Nov 2001
arX
iv:c
ond-
mat
/011
1138
v1 [
cond
-mat
.mtr
l-sci
] 8
Nov
200
1
The Siesta method for ab initio order-N materials simulation
Jose M. Soler,1 Emilio Artacho,2 Julian D. Gale,3 Alberto Garcıa,4
Javier Junquera,1, 5 Pablo Ordejon,6 and Daniel Sanchez-Portal7
1 Dep. de Fısica de la Materia Condensada, C-III,Universidad Autonoma de Madrid, E-28049 Madrid, Spain2 Department of Earth Sciences, University of Cambridge,
Downing St., Cambridge CB2 3EQ, United Kingdom3 Department of Chemistry, Imperial College of Science,
Technology and Medicine, South Kensington SW7 2AY, United Kingdom4 Departamento de Fısica de la Materia Condensada,
Universidad del Paıs Vasco, Apt. 644, 48080 Bilbao, Spain5 Institut de Physique, Batiment B5, Universite de Liege, B-4000 Sart-Tilman, Belgium
6 Institut de Ciencia de Materials de Barcelona, CSIC,Campus de la UAB, Bellaterra, 08193 Barcelona, Spain
7 Dep. de Fısica de Materiales and DIPC, Facultad de Quımica, UPV/EHU, Apt. 1072, 20080 Donostia, Spain(Dated: February 1, 2008)
We have developed and implemented a self-consistent density functional method using standardnorm-conserving pseudopotentials and a flexible, numerical LCAO basis set, which includes multiple-zeta and polarization orbitals. Exchange and correlation are treated with the local spin density orgeneralized gradient approximations. The basis functions and the electron density are projected ona real-space grid, in order to calculate the Hartree and exchange-correlation potentials and matrixelements, with a number of operations that scales linearly with the size of the system. We use amodified energy functional, whose minimization produces orthogonal wavefunctions and the sameenergy and density as the Kohn-Sham energy functional, without the need of an explicit orthogonal-ization. Additionally, using localized Wannier-like electron wavefunctions allows the computationtime and memory, required to minimize the energy, to also scale linearly with the size of the system.Forces and stresses are also calculated efficiently and accurately, thus allowing structural relaxationand molecular dynamics simulations.
I. INTRODUCTION
As the improvements in computer hardware and soft-ware allow the simulation of molecules and materialswith an increasing number of atoms N , the use of so-called order-N algorithms, in which the computer timeand memory scales linearly with the simulated systemsize, becomes increasingly important. These O(N) meth-ods were developed during the 1970ās and 80ās for longrange forces1 and empirical interatomic potentials2 butonly in the last 5-10 years for the much more complexquantum-mechanical methods, in which atomic forces areobtained by solving the interaction of ions and electronstogether3. Even among quantum mechanical methods,there are very different levels of approximation: empir-ical or semiempirical orthogonal tight-binding methodsare the simplest ones4,5; āab-initioā nonorthogonal-tight-binding and nonself-consistent Harris-functional meth-ods are next6,7; and fully self-consistent density func-tional theory (DFT) methods are the most complexand reliable8. The implementation of O(N) methods inquantum-mechanical simulations has also followed thesesteps, with several methods already well establishedwithin the tight-binding formalism5, but much less soin self-consistent DFT9. The latter also require, in addi-tion to solving Schrodingerās equation, the determina-tion of the self-consistent Hamiltonian in O(N) itera-tions. While this is difficult using plane waves, a local-
ized basis set appears to be the natural choice. One pro-posed approach are the āblipsā of Hernandez and Gillan10,regularly-spaced Gaussian-like splines that can be sys-tematically increased, in the spirit of finite-element meth-ods, although at a considerable computational cost.
We have developed a fully self-consistent DFT, basedon a flexible linear combination of atomic orbitals(LCAO) basis set, with essentially perfect O(N) scal-ing. It allows extremely fast simulations using mini-mal basis sets and very accurate calculations with com-plete multiple-zeta and polarized bases, depending on therequired accuracy and available computational power.In previous papers11,12 we have described preliminaryversions of this method, that we call Siesta (SpanishInitiative for Electronic Simulations with Thousands ofAtoms). There is also a review13 of the tens of stud-ies performed with it, in a wide variety of systems, likemetallic surfaces, nanotubes, and biomolecules. In thiswork we present a more complete description of themethod, as well as some important improvements.
Apart from that of Born and Oppenheimer, the mostbasic approximations concern the treatment of exchangeand correlation, and the use of pseudopotentials. Ex-change and correlation (XC) are treated within Kohn-Sham DFT14. We allow for both the local (spin) den-sity approximation15 (LDA/LSD) or the generalized gra-dient approximation16 (GGA). We use standard norm-conserving pseudopotentials17,18 in their fully non-localform19. We also include scalar-relativistic effects and the
2
nonlinear partial-core-correction to treat exchange andcorrelation in the core region20.
The Siesta code has been already tested and appliedto dozens of systems and a variety of properties13. There-fore, we will just illustrate here the convergence of afew characteristic magnitudes of silicon, the architypi-cal system of the field, with respect to the main preci-sion parameters that characterize our method: basis size(number of atomic basis orbitals); basis range (radius ofthe basis orbitals); fineness of the real-space integrationgrid; and confinement radius of the Wannier-like electronstates. Other parameters, like the k-sampling integrationgrid, are common to all similar methods and we will notdiscuss their convergence here.
II. PSEUDOPOTENTIAL
Although the use of pseudopotentials is not strictlynecessary with atomic basis sets, we find them con-venient to get rid of the core electrons and, moreimportantly, to allow for the expansion of a smooth(pseudo)charge density on a uniform spatial grid. Thetheory and usage of first principles norm-conservingpseudopotentials17 is already well established. Siesta
reads them in semilocal form (a different radial potentialVl(r) for each angular momentum l, optionally gener-ated scalar-relativistically21,22) from a data file that userscan fill with their preferred choice. We generally use theTroullier-Martins parameterization23. We transform thissemilocal form into the fully non-local form proposed byKleinman and Bylander (KB)19:
V PS = Vlocal(r) + V KB (1)
V KB =
lKBmaxā
l=0
lā
m=āl
NKBl
ā
n=1
|ĻKBlmnćvKB
ln ćĻKBlmn| (2)
vKBln =< Ļln|Ī“Vl(r)|Ļln > (3)
where Ī“Vl(r) = Vl(r) ā Vlocal(r). ĻKBlmn(r) =
ĻKBln (r)Ylm(r) (with Ylm(r) a spherical harmonic) are the
KB projection functions
ĻKBln (r) = Ī“Vl(r)Ļln(r). (4)
The functions Ļln are obtained from the eigenstates Ļln
of the semilocal pseudopotential (screened by the pseudo-valence charge density) at energy Ē«ln using the orthogo-nalization scheme proposed by Blochl24:
Ļln(r) = Ļln(r) ānā1ā
nā²=1
Ļlnā²(r)< Ļlnā² |Ī“Vl(r)|Ļln >
< Ļlnā² |Ī“Vl(r)|Ļlnā² >(5)
[ā 1
2r
d2
dr2r +
l(l+ 1)
2r2+ Vl(r) +
+ V H(r) + V xc(r)]Ļln(r) = Ē«lnĻln(r) (6)
V H and V xc are the Hartree and XC potentials for thepseudo-valence charge density, and we are using atomicunits (e = h = me = 1) throughout all of this paper.
The local part of the pseudopotential Vlocal(r) is inprinciple arbitrary, but it must join the semilocal poten-tials Vl(r) which, by construction, all become equal to the(unscreened) all-electron potential beyond the pseudopo-tential core radius rcore. Thus, Ī“Vl(r) = 0 for r > rcore.Ramer and Rappe have proposed that Vlocal(r) be opti-mized for transferability25, but most plane wave schemesmake it equal to one of the Vl(r)ās for reasons of effi-ciency. Our case is different because Vlocal(r) is the onlypseudopotential part that needs to be represented in thereal space grid, while the matrix elements of the non-local part VKB are cheaply and accurately calculated bytwo-center integrals. Therefore, we optimize Vlocal(r) forsmoothness, making it equal to the potential created bya positive charge distribution of the form26
Ļlocal(r) ā exp[ā(sinh(abr)/ sinh(b))2], (7)
where a and b are chosen to provide simultane-ously optimal real-space localization and reciprocal-spaceconvergence27. After some numerical tests we have takenb =1 and a =1.82/rcore. Figure 1 shows Vlocal(r) for sil-icon.
Since Vl(r) = Vlocal(r) outside rcore, ĻKBln (r) is strictly
zero beyond that radius, irrespective of the value of Ē«ln28.
Generally it is sufficient to have a single projector ĻKBlm
for each angular momentum (i.e. a single term in thesum on n). In this case we follow the normal practiceof making Ē«ln equal to the valence atomic eigenvalue Ē«l,and the function Ļl(r) in Eq. 4 is identical to the cor-responding eigenstate Ļl(r). In some cases, particularlyfor alkaline metals, alkaline earths, and transition met-als of the first few columns, we have sometimes foundit necessary to include the semicore states together withthe valence states29. In these cases, we also include twoindependent KB projectors, one for the semicore and onefor the valence states. However, our pseudopotentials arestill norm-conserving rather than āultrasoftā30. This isbecause, in our case, it is only the electron density thatneeds to be accurately represented in a real-space grid,rather than each wavefunction. Therefore, the ultrasoftpseudopotential formalism does not imply in Siesta thesame savings as it does in PW schemes. Also, since thenon-local part of the pseudopotential is a relatively cheapoperator within Siesta, we generally (but not necessar-ily) use a larger than usual value of lKB
max in Eq. (2), mak-ing it one unit larger than the lmax of the basis functions.
III. BASIS SET
Order-N methods rely heavily on the sparsity of theHamiltonian and overlap matrices. This sparsity re-quires either the neglect of matrix elements that are smallenough or the use of strictly confined basis orbitals, i.e.,orbitals that are zero beyond a certain radius7. We have
3
FIG. 1: Local pseudopotential for silicon. Vlocal is the un-screened local part of the pseudopotential, generated as theelectrostatic potential produced by a localized distribution ofpositive charge, Eq. (7), whose integral is equal to the valenceion charge (Z = 4 for Si). The dashed line is āZ/r. VNA isthe local pseudopotential screened by an electron charge dis-tribution, generated by filling the first-Ī¶ basis orbitals withthe free-atom valence occupations. Since these basis orbitalsare strictly confined to a radius rc
max, VNA is also strictly zerobeyond that radius.
adopted this latter approach because it keeps the en-ergy strictly variational, thus facilitating the test of theconvergence with respect to the radius of confinement.Within this radius, our atomic basis orbitals are productsof a numerical radial function times a spherical harmonic.For atom I, located at RI ,
ĻIlmn(r) = ĻIln(rI)Ylm(rI) (8)
where rI = r ā RI , r = |r| and r = r/r. The angu-lar momentum (labelled by l,m) may be arbitrarily largeand, in general, there will be several orbitals (labelled byindex n) with the same angular dependence, but differ-ent radial dependence, which is conventionally called aāmultiple-Ī¶ā basis. The radial functions are defined by acubic spline interpolation31 from the values given on afine radial mesh. Each radial function may have a dif-ferent cutoff radius and, up to that radius, its shape iscompletely free and can be introduced by the user in aninput file. In practice, it is also convenient to have an au-tomatic procedure to generate sufficiently good basis sets.We have developed several such automatic procedures,and we will describe here one of them for completeness,even though we stress that the generation of the basisset, like that of the pseudopotential is to a large extent
up to the user and independent of the Siesta methoditself.
In the case of a minimal (single-Ī¶) basis set, we havefound convenient and efficient the method of Sankey andNiklewski7,32. Their basis orbitals are the eigenfunctionsof the (pseudo)atom within a spherical box (although theradius of the box may be different for each orbital, see be-low). In other words, they are the (angular-momentum-dependent) numerical eigenfunctions Ļl(r) of the atomicpseudopotential Vl(r), for an energy Ē«l + Ī“Ē«l chosen sothat the first node occurs at the desired cutoff radius rc
l :
(
ā 1
2r
d2
dr2r +
l(l+ 1)
2r2+ Vl(r)
)
Ļl(r) = (Ē«l + Ī“Ē«l)Ļl(r)
(9)
with Ļl(rcl ) = 0 (we omit indices I and n here for simplic-
ity). In order to obtain a well balanced basis, in which theeffect of the confinement is similar for all the orbitals, itis usually better to fix a common āenergy shiftā Ī“Ē«, ratherthan a common radius rc, for all the atoms and angularmomenta. This means that the orbital radii depend onthe atomic species and angular momentum.
One obvious possibility for multiple-Ī¶ bases is to usepseudopotential eigenfunctions with an increasing num-ber of nodes32. They have the virtue of being orthogonaland asymptotically complete. However, the efficiency ofthis kind of basis depends on the radii of confinementof the different orbitals, since the excited states of thepseudopotential are usually unbound. Thus, in practicewe have found this procedure rather inefficient. Anotherpossibility is to use the atomic eigenstates for differentionization states33. We have implemented a differentscheme34, based on the āsplit-valenceā method which isstandard in quantum chemistry35. In that method, thefirst-Ī¶ basis orbitals are ācontractedā (i.e. fixed) linearcombinations of Gaussians, determined either variation-ally or by fitting numerical atomic eigenfunctions. Thesecond-Ī¶ orbital is then one of the Gaussians (generallythe slowest-decaying one) which is āreleasedā or āsplitāfrom the contracted combination. Higher-Ī¶ orbitals aregenerated in a similar way by releasing more Gaussians.Our scheme adapts this split-valence method to our nu-merical orbitals. Following the same spirit, our second-Ī¶
functions Ļ2Ī¶l (r) have the same tail as the first-Ī¶ orbitals
Ļ1Ī¶l (r) but change to a simple polynomial behaviour in-
side a āsplit radiusā rsl :
Ļ2Ī¶l (r) =
rl(al ā blr2) if r < rs
l
Ļ1Ī¶l (r) if r ā„ rs
l
(10)
where al and bl are determined by imposing the conti-nuity of value and slope at rs
l . These orbitals thereforecombine the decay of the atomic eigenfunctions with asmooth and featureless behaviour inside rs
l . We havefound it convenient to set the radius rs
l by fixing the
norm of Ļ1Ī¶l in r > rs
l . We have found empirically that areasonable value for this āsplit-normā is ā¼ 0.15. Actually,
4
instead of Ļ2Ī¶l thus defined, we use Ļ1Ī¶
l ā Ļ2Ī¶l , which is
zero beyond rsl , to reduce the number of nonzero matrix
elements, without any loss of variational freedom.To achieve well converged results, in addition to the
atomic valence orbitals, it is generally necessary to alsoinclude polarization orbitals, to account for the defor-mation induced by bond formation. Again, using pseu-doatomic orbitals of higher angular momentum is fre-quently unsatisfactory, because they tend to be too ex-tended, or even unbound. Instead, consider a valencepseudoatomic orbital Ļlm(r) = Ļl(r)Ylm(r), such thatthere are no valence orbitals with angular momentuml + 1. To polarize it, we apply a small electric field E inthe z-direction. Using first-order perturbation theory
(H ā E)Ī“Ļ = ā(Ī“H ā Ī“E)Ļ, (11)
where Ī“H = Ez and Ī“E = ćĻ|Ī“H |Ļć = 0 because Ī“H isodd. Selection rules imply that the resulting perturbedorbital will only have components with lā² = lĀ±1,mā² = m:
Ī“HĻlm(r) = (Er cos(Īø)) (Ļl(r)Ylm(r))
= ErĻl(r)(clā1Ylā1,m + cl+1Yl+1,m) (12)
and
Ī“Ļlm(r) = Ļlā1(r)Ylā1,m(r) + Ļl+1(r)Yl+1,m(r). (13)
Since in general there will already be orbitals with an-gular momentum l ā 1 in the basis set, we select thel + 1 component by substituting (12) and (13) in (11),multiplying by Y ā
l+1,m(r) and integrating over angularvariables. Thus we obtain the equation
[ā 1
2r
d2
dr2r +
(l + 1)(l + 2)
2r2
+ Vl(r) ā El]Ļl+1(r) = ārĻl(r) (14)
where we have also eliminated the factors E and cl+1,which affect only the normalization of Ļl+1. The po-larization orbitals are then added to the basis set:Ļl+1,m(r) = NĻl+1(r)Yl+1,m(r), where N is a normal-ization constant.
We have found that the previously described proce-dures generate reasonable minimal single-Ī¶ (SZ) basissets, appropriate for semiquantitative simulations, anddouble-Ī¶ plus polarization (DZP) basis sets that yieldhigh quality results for most of the systems studied. Wethus refer to DZP as the āstandardā basis, because it usu-ally represents a good balance between well convergedresults and a reasonable computational cost. In somecases (typically alkali and some transition metals), semi-core states also need to be included for good quality re-sults. More recently36, we have obtained extremely ef-ficient basis sets optimized variationally in molecules orsolids. Figure 2 shows the performance of these atomicbasis sets compared to plane waves, using the same pseu-dopotentials and geometries. It may be seen that the SZbases are comparable to planewave cutoffs typically used
5 10 15 20 25 30 350.00
0.50
1.00
1.50
2.00
2.50
Ene
rgy
(eV
)
SiSZ (4)
DZ (8)
TZ (12)
SZP (9)
DZP (13)
TZP (17)
TZDP (22)
PW cutoff (Ry)PW basis size(25) (71) (130) (201) (280) (369) (464)
TZTP (27)TZTPF (34)
FIG. 2: Comparison of convergence of the total energy withrespect to the sizes of a plane wave basis set and of the LCAObasis set used by Siesta. The curve shows the total energyper atom of silicon versus the cutoff of a plane wave basis, cal-culated with a program independent of Siesta, which uses thesame pseudopotential. The arrows indicate the energies ob-tained with different LCAO basis sets, calculated with Siesta,and the plane wave cutoffs that yield the same energies. Thenumbers in parentheses indicate the basis sizes, i.e. the num-ber of atomic orbitals or plane waves of each basis set. SZ:single-Ī¶ (valence s and p orbitals); DZ: double-Ī¶; TZ: triple-Ī¶;DZP: double-Ī¶ valence orbitals plus single-Ī¶ polarization d or-bitals; TZP: triple-Ī¶ valence plus single-Ī¶ polarization; TZDP:triple-Ī¶ valence plus double-Ī¶ polarization; TZTP: triple-Ī¶ va-lence plus triple-Ī¶ polarization; TZTPF: same as TZTP plusextra single-Ī¶ polarization f orbitals.
in Car-Parrinello molecular dynamics simulations, whileDZP sets are comparable to the cutoffs used in geom-etry relaxations and energy comparisons. As expected,the LCAO is far more efficient, tipically by a factor of10 to 20, in terms of number of basis orbitals. This ef-ficiency must be balanced against the faster algorithmsavailable for plane waves, and our main motivation forusing an LCAO basis is its suitability for O(N) meth-ods. Still, we have generally found that, even withoutusing the O(N) functional, Siesta is considerably fasterthan a plane wave calculation of similar quality.
Figure 3 shows the convergence of the total energycurve of silicon, as a function of lattice parameter, fordifferent basis sizes, and table I summarizes the sameinformation numerically. It can be seen that the āstan-dardā DZP basis offers already quite well converged re-sults, comparable to those used in practice in most planewave calculations.
Figure 4 shows the dependence of the lattice constant,bulk modulus, and cohesive energy of bulk silicon withthe range of the basis orbitals. It shows that a cutoffradius of 3 A for both s and p orbitals yields already verywell converged results, specially when using a āstandardāDZP basis.
5
5.0 5.2 5.4 5.6 5.8 6.0Lattice Constant (Ć )
0.0
0.5
1.0
1.5
2.0T
otal
Ene
rgy
(eV
)
SZDZTZSZPDZPTZPTZDPTZTPTZTPFPWminima
FIG. 3: Total energy per atom versus lattice constant forbulk silicon, using different basis sets, noted as in Fig. 2. PWrefers to a very well converged (50 Ry cutoff) plane wavecalculation. The dotted line joins the minima of the differentcurves.
TABLE I: Comparisons of the lattice constant a, bulk mod-ulus B, and cohesive energy Ec for bulk Si, obtained withdifferent basis sets. The basis notation is as in Fig. 2. PWrefers to a 50 Ry-cutoff plane wave calculation. The LAPWresults were taken from ref. 37, and the experimental valuesfrom ref. 38.
Basis a (A) B (GPa) Ec (eV)
SZ 5.521 88.7 4.722
DZ 5.465 96.0 4.841
TZ 5.453 98.4 4.908
SZP 5.424 97.8 5.227
DZP 5.389 96.6 5.329
TZP 5.387 97.5 5.335
TZDP 5.389 96.0 5.340
TZTP 5.387 96.0 5.342
TZTPF 5.385 95.4 5.359
PW 5.384 95.9 5.369
LAPW 5.41 96 5.28
Expt. 5.43 98.8 4.63
IV. ELECTRON HAMILTONIAN
Within the non-local-pseudopotential approximation,the standard Kohn-Sham one-electron Hamiltonian maybe written as
H = T +ā
I
V localI (r) +
ā
I
V KBI + V H(r) + V xc(r)
(15)
5.15.25.35.45.55.6
a (Ć
) ExpPW
SZ
DZP
75
100125150175
B (
GP
a)
ExpPW
DZP
SZ
5.0 6.0 7.0 8.0cutoff radii (a.u.)
3
4
5E
c (
eV) PW
ExpSZ
DZP
FIG. 4: Dependence of the lattice constant, bulk modulus,and cohesive energy of bulk silicon with the cutoff radius ofthe basis orbitals. The s and p orbital radii have been madeequal in this case, to simplify the plot. PW refers to a wellconverged plane wave calculation with the same pseudopoten-tial.
where T = ā 12ā2 is the kinetic energy operator, I is an
atom index, V H(r) and V xc(r) are the total Hartree and
XC potentials, and V localI (r) and V KB
I are the local andnon-local (Kleinman-Bylander) parts of the pseudopo-tential of atom I.
In order to eliminate the long range of V localI , we screen
it with the potential V atomI , created by an atomic electron
density ĻatomI , constructed by populating the basis func-
tions with appropriate valence atomic charges. Noticethat, since the atomic basis orbitals are zero beyond thecutoff radius rc
I = maxl(rcIl), the screened āneutral-atomā
(NA) potential V NAI ā” V local
I +V atomI is also zero beyond
this radius7 (see Fig. 1). Now let Ī“Ļ(r) be the differ-ence between the self-consistent electron density Ļ(r) andthe sum of atomic densities Ļatom =
ā
I ĻatomI , and let
Ī“V H(r) be the electrostatic potential generated by Ī“Ļ(r),
6
which integrates to zero and is usually much smaller thanĻ(r). Then the total Hamiltonian may be rewritten as
H = T +ā
I
V KBI +
ā
I
V NAI (r) + Ī“V H(r) + V xc(r)
(16)
The matrix elements of the first two terms involve onlytwo-center integrals which are calculated in reciprocalspace and tabulated as a function of interatomic distance.The remaining terms involve potentials which are calcu-lated on a three-dimensional real-space grid. We considerthese two approaches in detail in the following sections.
V. TWO-CENTER INTEGRALS
The overlap matrix and the largest part of the Hamilto-nian matrix elements are given by two-center integrals39.We calculate these integrals in Fourier space, as proposedby Sankey and Niklewski7, but we use some implemen-tation details explained in this section. Let us considerfirst overlap integrals of the form
S(R) ā” ćĻ1|Ļ2ć =
ā«
Ļā1(r)Ļ2(r ā R)dr, (17)
where the integral is over all space and Ļ1, Ļ2 may be ba-sis functions Ļlmn, KB pseudopotential projectors Ļlmn,or more complicated functions centered on the atoms.The function S(R) can be seen as a convolution: we takethe Fourier transform
Ļ(k) =1
(2Ļ)3/2
ā«
Ļ(r)eāikrdr (18)
where we use the same symbol Ļ for Ļ(r) and Ļ(k), as itsmeaning is clear from the different arguments. We alsouse the planewave expression of Diracās delta function,ā«
ei(kā²āk)rdr = (2Ļ)3Ī“(kā² ā k), to find the usual resultthat the Fourier transform of a convolution in real spaceis a simple product in reciprocal space:
S(R) =
ā«
Ļā1(k)Ļ2(k)eāikRdk (19)
Let us assume now that the functions Ļ(r) can be ex-panded exactly with a finite number of spherical har-monics:
Ļ(r) =
lmaxā
l=0
lā
m=āl
Ļlm(r)Ylm(r), (20)
Ļlm(r) =
ā« Ļ
0
sin ĪødĪø
ā« 2Ļ
0
dĻY ālm(Īø, Ļ)Ļ(r, Īø, Ļ). (21)
This is clearly true for basis functions and KB projec-tors, which contain a single spherical harmonic, and alsofor functions like xĻ(r), which appear in dipole matrix
elements. We now substitute in (18) the expansion of aplane wave in spherical harmonics40
eikr =
āā
l=0
lā
m=āl
4Ļiljl(kr)Yālm(k)Ylm(r), (22)
to obtain
Ļ(k) =
lmaxā
l=0
lā
m=āl
Ļlm(k)Ylm(k), (23)
Ļlm(k) =
ā
2
Ļ(āi)l
ā« ā
0
r2drjl(kr)Ļlm(r). (24)
Substituting now (23) and (22) into (19) we obtain
S(R) =
2lmaxā
l=0
lā
m=āl
Slm(R)Ylm(R) (25)
where
Slm(R) =ā
l1m1
ā
l2m2
Gl1m1,l2m2,lmSl1m1,l2m2,l(R), (26)
Gl1m1,l2m2,lm =
ā« Ļ
0
sin ĪødĪø
ā« 2Ļ
0
dĻ
Y āl1m1
(Īø, Ļ)Yl2m2(Īø, Ļ)Y ā
lm(Īø, Ļ), (27)
Sl1m1,l2m2,l(R) = 4Ļil1āl2āl
ā« ā
0
k2dkjl(kR)
Ć iāl1Ļā1,l1m1
(k)il2Ļ2,l2m2(k), (28)
Notice that iāl1Ļ1(k), il2Ļ2(k), and il1āl2āl are all real,
since l1āl2āl is even for all lās for which Gl1m1,l2m2,lm 6=0. The Gaunt coefficients Gl1m1,l2m2,lm can be obtainedby recursion from Clebsch-Gordan coefficients7. How-ever, we use real spherical harmonics for computationalefficiency:
Ylm(Īø, Ļ) = ClmPml (cos Īø) Ć
sin(mĻ) if m < 0
cos(mĻ) if m ā„ 0
(29)
where Pml (z) are the associated Legendre polynomials
and Clm normalization constants31. This does not affectthe validity of any of previous equations, but it modifiesthe value of the Gaunt coefficients. Therefore, we findit is simpler and more general to calculate Gl1m1,l2m2,lm
directly from Eq. (27). To do this, we use a Gaussianquadrature31
ā« Ļ
0
sin ĪødĪø
ā« 2Ļ
0
dĻā 4Ļ1
NĪø
NĪøā
i=1
wi sin Īøi1
NĻ
NĻā
j=1
(30)
7
with NĻ = 1 + 3lmax, NĪø = 1 + int(3lmax/2), and thepoints cos Īøi and weights wi are calculated as describedin ref. 31. This quadrature is exact in equation (27) forspherical harmonics Ylm (real or complex) of l ā¤ lmax,and it can be used also to find the expansion of Ļ(r) inspherical harmonics (eq. (21)).
The coefficients Gl1m1,l2m2,lm are universal and theycan be calculated and stored once and for all. The func-tions Sl1m1,l2m2,l(R) depend, of course, on the functionsĻ1,2(r) being integrated. For each pair of functions, theycan be calculated and stored in a fine radial grid Ri, upto the maximum distance Rmax = rc
1 + rc2 at which Ļ1
and Ļ2 overlap. Their value at an arbitrary distance Rcan then be obtained very accurately using a spline in-terpolation.
Kinetic matrix elements T (R) ā” ćĻā1 | ā 1
2ā2|Ļ2ć canbe obtained in exactly the same way, except for an extrafactor k2 in Eq. (28):
Tl1m,l2m2,l(R) = 4Ļil1āl2āl
ā« ā
0
1
2k4dkjl(kR)
Ć iāl1Ļā1,l1m1
(k)il2Ļ2,l2m2(k). (31)
Since we frequently use basis orbitals with a kink7, weneed rather fine radial grids to obtain accurate kineticmatrix elements, and we typically use grid cutoffs of morethan 2000 Ry for this purpose. Once obtained, the finegrid does not penalize the execution time, because theinterpolation effort is independent of the number of gridpoints. It also affects very marginally the storage require-ments, because of the one-dimensional character of thetables. However, even though it needs to be done onlyonce, the calculation of the radial integrals (24), (28),and (31) is not negligible if performed unwisely. We havedeveloped a special fast radial Fourier transform for thispurpose, as explained in appendix B.
Dipole matrix elements, such as ćĻ1|x|Ļ2ć, can alsobe obtained easily by defining a new function Ļ1(r) ā”xĻ1(r), expanding it using (21), and computing ćĻ1|Ļ2ćas explained above (with the precaution of using lmax +1instead of lmax).
VI. GRID INTEGRALS
The matrix elements of the last three terms of Eq. (16)involve potentials which are calculated on a real-spacegrid. The fineness of this grid is controlled by a āgrid cut-offā Ecut: the maximum kinetic energy of the planewavesthat can be represented in the grid without aliasing41.The short-range screened pseudopotentials V NA
I (r) in(16) are tabulated as a function of the distance to atoms Iand easily interpolated at any desired grid point. The lasttwo terms require the calculation of the electron densityon the grid. Let Ļi(r) be the Hamiltonian eigenstates,expanded in the atomic basis set
Ļi(r) =ā
Āµ
ĻĀµ(r)cĀµi, (32)
where cĀµi = ćĻĀµ|Ļić and ĻĀµ is the dual orbital of ĻĀµ:
ćĻĀµ|ĻĪ½ć = Ī“ĀµĪ½ . We use the compact index notationĀµ ā” Ilmn for the basis orbitals, Eq. (8). The elec-tron density is then
Ļ(r) =ā
i
ni|Ļi(r)|2 (33)
where ni is the occupation of state Ļi. If we substitute(32) into (33) and define a density matrix
ĻĀµĪ½ =ā
i
cĀµiniciĪ½ , (34)
where ciĪ½ ā” cāĪ½i, the electron density can be rewritten as
Ļ(r) =ā
ĀµĪ½
ĻĀµĪ½ĻāĪ½(r)ĻĀµ(r) (35)
We use the notation ĻāĀµ for generality, despite our useof real basis orbitals in practice. Then, to calculate thedensity at a given grid point, we first find all the atomicbasis orbitals, Eq. (8), at that point, interpolating theradial part from numerical tables, and then we use (35)to calculate the density. Notice that only a small num-ber of basis orbitals are non-zero at a given grid point,so that the calculation of the density can be performedin O(N) operations, once ĻĀµĪ½ is known. The storage ofthe orbital values at the grid points can be one of themost expensive parts of the program in terms of memoryusage. Hence, an option is included to calculate and usethese terms on the fly, in the the spirit of a direct-SCFcalculation. The calculation of ĻĀµĪ½ itself with Eq. (34)does not scale linearly with the system size, requiring in-stead the use of special O(N) techniques to be describedbelow. However, notice that in order to calculate thedensity, only the matrix elements ĻĀµĪ½ for which ĻĀµ andĻĪ½ overlap are required, and they can therefore be storedas a sparse matrix of O(N) size. Once the valence den-sity is available in the grid, we add to it, if necessary,the non-local core correction20, a spherical charge den-sity intended to simulate the atomic cores, which is alsointerpolated from a radial grid. With it, we find theexchange and correlation potential V xc(r), trivially inthe LDA and using the method described in ref. 42 forthe GGA. To calculate Ī“V H(r), we first find Ļatom(r)at the grid points, as a sum of spherical atomic densi-ties (also interpolated from a radial grid) and subtract itfrom Ļ(r) to find Ī“Ļ(r). We then solve Poissonās equa-tion to obtain Ī“V H(r) and find the total grid potentialV (r) = V NA(r) + Ī“V H(r) + V xc(r). Finally, at everygrid point, we calculate V (r)ĻāĀµ(r)ĻĪ½ (r)ār
3 for all pairs
ĻĀµ, ĻĪ½ which are not zero at that point (ār3 is the vol-
ume per grid point) and add it to the Hamiltonian matrixelement HĀµĪ½ .
To solve Poissonās equation and find Ī“V H(r) we nor-mally use fast Fourier transforms in a unit cell that iseither naturally periodic or made artificially periodic bya supercell construction. For neutral isolated molecules,
8
our use of strictly confined basis orbitals makes it triv-ial to avoid any direct overlap between the repeatedmolecules, and the electric multipole interactions de-crease rapidly with cell size. For charged molecules wesupress the G = 0 Fourier component (an infinite con-stant) of the potential created by the excess of charge.This amounts to compensating this excess with a uni-form charge background. We then use the method ofMakov and Payne43 to correct the total energy for theinteraction between the repeated cells. Alternatively, wecan solve Poissonās equation by the multigrid method,using finite differences and fixed boundary conditions,obtained from the multipole expansion of the molecularcharge density. This can be done in strictly O(N) opera-tions, unlike the FFTās, which scale asN logN . However,the cost of this operation is typically negligible and there-fore has no influence on the overall scaling properties ofthe calculation.
Figures 5 and 6 show the convergence of different mag-nitudes with respect to the energy cutoff of the integra-tion grid. For orthogonal unit cell vectors this is simply,in atomic units, Ecut = (Ļ/āx)2/2 with āx the gridinterval.
-40
-20
0
20
40
ā E
t (m
eV) Si
-6
-4
-2
0
2
ā d
(m
Ang
)
H2O
-15
-5
5
15
10 30 50 70 90
ā P
(kb
ar)
Ec (Ry)
-1
0
1
20 60 100 140 180
ā Ī±
(o )
Ec (Ry)
FIG. 5: (a) Convergence of the total energy and pressure inbulk silicon as a function of the energy cutoff Ecut of the realspace integration mesh. Circles and continuous line: using agrid-cell-sampling of eight refinement points per original gridpoint. The refinement points are used only in the final calcula-tion, not during the self-consistency iteration (see text). Tri-angles: two refinement points per original grid point. Whitecircles: no grid-cell-sampling. (b) Bond length and angle ofthe water molecule as a function of Ecut
VII. NON-COLLINEAR SPIN
In the usual case of a normal (collinear) spinpolarized system, there are two sets of values forĻi(r), ĻĀµĪ½ , Ļ(r), V
xc(r), and HĀµĪ½ , one for spin up and an-other for spin down. Thus, the grid calculations can be
-40
-20
0
20
40
ā E
t (m
eV) Fe
LDA-60
-40
-20
0
GGA
-40
-20
0
20
40
100 300 500
ā P
(kb
ar)
Ec (Ry)
-40
-20
0
20
40
100 300 500Ec (Ry)
FIG. 6: Same as Fig. 5 for the total energy and pressureof bulk iron. This is presented as a specially difficult casebecause of the very hard partial core correction (rm = 0.7 au)required for a correct description of exchange and correlation.
repeated twice in an almost independent way: only tocalculate V xc(r) need they be combined. However, inthe non-collinear spin case44,45,46,47, the density at everypoint is not represented by the up and down values, butalso by a vector giving the spin direction. Equivalently,it may be represented by a local spin density matrix
ĻĪ±Ī²(r) =ā
i
niĻĪ²āi (r)ĻĪ±
i (r) =ā
ĀµĪ½
ĻĪ±Ī²ĀµĪ½Ļ
āĪ½(r)ĻĀµ(r) (36)
ĻĪ±i (r) =
ā
Āµ
ĻĀµ(r)cĪ±Āµi (37)
ĻĪ±Ī²ĀµĪ½ =
ā
i
cĪ±ĀµinicĪ²iĪ½ (38)
where Ī±, Ī² are spin indices, with up or down values. Thecoefficients cĪ±Āµi are obtained by solving the generalizedeigenvalue problem
ā
Ī½Ī²
(HĪ±Ī²ĀµĪ½ ā EiSĀµĪ½Ī“
Ī±Ī²)cĪ²Ī½i = 0 (39)
where HĪ±Ī²ĀµĪ½ , like ĻĪ±Ī²
ĀµĪ½ , is a (2N Ć 2N) matrix, with N thenumber of basis functions:
HĪ±Ī²ĀµĪ½ = ćĻĀµ|T + V KB + V NA(r) + Ī“V H(r) + V Ī±Ī²
XC(r)|ĻĪ½ ć.(40)
This is in contrast to the collinear spin case, in whichthe Hamiltonian and density matrices can be factorizedinto two N ĆN matrices, one for each spin direction. To
calculate V Ī±Ī²XC(r) we first diagonalize the 2 Ć 2 matrix
ĻĪ±Ī²(r) at every point, in order to find the up and downspin densities Ļā(r), Ļā(r) in the direction of the local spin
9
vector. We then find V āXC(r), V ā
XC(r) in that direction,with the usual local spin density functional15 and we ro-
tate back V Ī±Ī²XC(r) to the original direction. Thus, the
grid operations are still basically the same, except thatthey need now be repeated three times, for the āā, āāand āā components. Notice that ĻĪ±Ī²(r) and V Ī±Ī²
XC(r) arelocally Hermitian, while HĪ±Ī²
ĀµĪ½ and ĻĪ±Ī²ĀµĪ½ are globally Her-
mitian (HĪ²Ī±Ī½Āµ = HĪ±Ī²ā
ĀµĪ½ ), so that their āā components canbe obtained from the āā ones.
VIII. BRILLOUIN ZONE SAMPLING
Integration of all magnitudes over the Brillouin zone(BZ) is essential for small and moderately large unit cells,especially of metals. Although Siesta is designed forlarge unit cells, in practice it is very useful, especiallyfor comparisons and checks, to be able to also performcalculations efficiently on smaller systems without usingexpensive superlattices. On the other hand, an efficientk-sampling implementation should not penalize, becauseof the required complex arithmetic, the Ī-point calcula-tions used in large cells. A solution used in some pro-grams is to have two different versions of all or part ofthe code, but this poses extra maintenance requirements.We have dealt with this problem in the following way:around the unit cell (and comprising itself) we define anauxiliary supercell large enough to contain all the atomswhose basis orbitals are non-zero at any of the grid pointsof the unit cell, or which overlap with any of the basisorbitals in it. We calculate all the non-zero two-centerintegrals between the unit cell basis orbitals and the su-percell orbitals, without any complex phase factors. Wealso calculate the grid integrals between all the supercellbasis orbitals ĻĀµā² and ĻĪ½ā²ā² (primed indices run over allthe supercell), but within the unit cell only. We accumu-late these integrals in the corresponding matrix elements,thus making use of the relation
< ĻĀµ|V (r)|ĻĪ½ā² >=ā
(Āµā²Ī½ā²ā²)ā”(ĀµĪ½ā²)
< ĻĀµā² |V (r)f(r)|ĻĪ½ā²ā² > .
(41)
f(r) = 1 for r within the unit cell and is zero other-wise. ĻĀµ is within the unit cell. The notation Āµā² ā” Āµindicates that ĻĀµā² and ĻĀµ are equivalent orbitals, re-lated by a lattice vector translation. (Āµā²Ī½ā²ā²) ā” (ĀµĪ½ā²)means that the sum extends over all pairs of supercellorbitals ĻĀµā² and ĻĪ½ā²ā² such that Āµā² ā” Āµ, Ī½ā²ā² ā” Ī½ā², andRĀµ ā RĪ½ā² = RĀµā² ā RĪ½ā²ā² . Once all the real overlap andHamiltonian matrix elements are calculated, we multiplythem, at every k-point by the corresponding phase fac-tors and accumulate them by folding the supercell orbitalto its unit-cell counterpart. Thus
HĀµĪ½(k) =ā
Ī½ā²ā”Ī½
HĀµĪ½ā²eik(RĪ½ā²āRĀµ) (42)
where ĻĀµ and ĻĪ½ are within the unit cell. The resultingN Ć N complex eigenvalue problem, with N the num-ber of orbitals in the unit cell, is then solved at everysampled k point, finding the Bloch-state expansion coef-ficients cĀµi(k):
Ļi(k, r) =ā
Āµā²
eikRĀµā²ĻĀµā²(r)cĀµā²i(k) (43)
where the sum in Āµā² extends to all basis orbitals in space,i labels the different bands, cĀµā²i = cĀµi if Āµā² ā” Āµ, andĻi(k, r) is normalized in the unit cell.
The electron density is then
Ļ(r) =ā
i
ā«
BZ
ni(k)|Ļi(k, r)|2dk
=ā
Āµā²Ī½ā²
ĻĀµā²Ī½ā²ĻāĪ½ā²(r)ĻĀµā² (r) (44)
where the sum is again over all basis orbitals in space,and the density matrix
ĻĀµĪ½ =ā
i
ā«
BZ
cĀµi(k)ni(k)ciĪ½ (k)eik(RĪ½āRĀµ)dk (45)
is real (for real ĻĀµās) and periodic, i.e. ĻĀµĪ½ = ĻĀµā²Ī½ā² if(Ī½, Āµ) ā” (Ī½ā², Āµā²) (with āā”ā meaning again āequivalent bytranslationā). Thus, to calculate the density at a gridpoint of the unit cell, we simply find the sum (44) overall the pairs of orbitals ĻĀµ, ĻĪ½ in the supercell that arenon-zero at that point.
In practice, the integral in (45) is performed in a finite,uniform grid of the Brillouin zone. The fineness of thisgrid is controlled by a k-grid cutoff lcut, a real-space ra-dius which plays a role equivalent to the planewave cutoffof the real-space grid48. The origin of the k-grid may bedisplaced from k = 0 in order to decrease the number ofinequivalent k-points49.
If the unit cell is large enough to allow a Ī-point-onlycalculation, the multiplication by phase factors is skippedand a single real-matrix eigenvalue problem is solved (inthis case, the real matrix elements are accumulated di-rectly in the first stage, if multiple overlaps occur). Inthis way, no complex arithmetic penalty occurs, and thedifferences between Ī-point and k-sampling are limited toa very small section of the code, while all the two-centerand grid integrals use always the same real-arithmeticcode.
IX. TOTAL ENERGY
The Kohn-Sham14 total energy can be written as asum of a band-structure (BS) energy plus some correctionterms, sometimes called ādouble countā corrections. TheBS term is the sum of the energies of the occupied statesĻi:
EBS =ā
i
nićĻi|H |Ļić =ā
ĀµĪ½
HĀµĪ½ĻĪ½Āµ = Tr(HĻ) (46)
10
where spin and k-sampling notations are omitted here forsimplicity. At convergence, the Ļiās are simply the eigen-vectors of the Hamiltonian, but it is important to realizethat the Kohn-Sham functional is also perfectly well de-fined outside this so-called āBorn-Oppenheimer surfaceā,i.e. it is defined for any set of orthonormal Ļiās. Thecorrection terms are simple functionals of the electrondensity, which can be obtained from equation (35), andthe atomic positions. The Kohn-Sham total energy canthen be written as
EKS =ā
ĀµĪ½
HĀµĪ½ĻĪ½Āµ ā 1
2
ā«
V H(r)Ļ(r)d3r
+
ā«
(Ē«xc(r) ā V xc(r))Ļ(r)d3r
+ā
I<J
ZIZJ
RIJ(47)
where I, J are atomic indices, RIJ ā” |RJ ā RI |, ZI , ZJ
are the valence ion pseudoatom charges, and Ē«xc(r)Ļ(r)is the exchange-correlation energy density. In order toavoid the long range interactions of the last term, weconstruct from the local-pseudopotential V local
I , whichhas an asymptotic behavior of āZI/r, a diffuse ioncharge, Ļlocal
I (r), whose electrostatic potential is equalto V local
I (r):
ĻlocalI (r) = ā 1
4Ļā2V local
I (r). (48)
Notice that we define the electron density as positive,and therefore Ļlocal
I ā¤ 0. Then, we write the last term in(47) as
ā
I<J
ZIZJ
RIJ=
1
2
ā
IJ
U localIJ (RIJ ) +
ā
I<J
Ī“U localIJ (RIJ )
āā
I
U localI (49)
where U localIJ is the electrostatic interaction between the
diffuse ion charges in atoms I and J :
U localIJ (|R|) =
ā«
V localI (r)Ļlocal
J (r ā R)d3r, (50)
Ī“U localIJ is a small short-range interaction term to correct
for a possible overlap between the soft ion charges, whichappears when the core densities are very extended:
Ī“U localIJ (R) =
ZIZJ
Rā U local
IJ (R), (51)
and U localI is the fictitious self interaction of an ion charge
(notice that the first right-hand sum in (49) includes theI = J terms):
U localI =
1
2U local
II (0) =1
2
ā«
V localI (r)Ļlocal
I (r)4Ļr2dr.
(52)
Defining ĻNAI from V NA
I , analogously to ĻlocalI , we have
that ĻNAI = Ļlocal
I + ĻatomI , and equation (47) can be
transformed, after some rearrangement of terms, into
EKS =ā
ĀµĪ½
(TĀµĪ½ + V KBĀµĪ½ )ĻĪ½Āµ +
1
2
ā
IJ
UNAIJ (RIJ)
+ā
I<J
Ī“U localIJ (RIJ) ā
ā
I
U localI
+
ā«
V NA(r)Ī“Ļ(r)d3r (53)
+1
2
ā«
Ī“V H(r)Ī“Ļ(r)d3r +
ā«
Ē«xc(r)Ļ(r)d3r
where V NA =ā
I VNAI and Ī“Ļ = Ļā ā
I ĻNAI .
UNAIJ (R) =
ā«
V NAI (r)ĻNA
J (r ā R)d3r
= ā 1
4Ļ
ā«
V NAI (r)ā2V NA
J (r ā R)d3r (54)
is a radial pairwise potential that can be obtained fromV NA
I (r) as a two-center integral, by the same methoddescribed previously for the kinetic matrix elements:
TĀµĪ½ = ćĻĀµ| ā1
2ā2|ĻĪ½ć
= ā1
2
ā«
ĻāĀµ(r)ā2ĻĪ½(r ā RĀµĪ½)d3r (55)
V KBĀµĪ½ is also obtained by two-center integrals:
V KBĀµĪ½ =
ā
Ī±
ćĻĀµ|ĻĪ±ćvKBĪ± ćĻĪ±|ĻĪ½ć (56)
where the sum is over all the KB projectors ĻĪ± that over-lap simultaneously with ĻĀµ and ĻĪ½ .
Although (53) is the total energy equation actuallyused by Siesta, its meaning may be further clarified ifthe I = J terms of 1
2
ā
IJ UNAIJ (RIJ ) are combined with
ā
I UlocalI to yield
EKS =ā
ĀµĪ½
(TĀµĪ½ + V KBĀµĪ½ )ĻĪ½Āµ +
ā
I<J
UNAIJ (RIJ )
+ā
I<J
Ī“U localIJ (RIJ) +
ā
I
UatomI
+
ā«
V NA(r)Ī“Ļ(r)d3r (57)
+1
2
ā«
Ī“V H(r)Ī“Ļ(r)d3r +
ā«
Ē«xc(r)Ļ(r)d3r
where
UatomI =
ā« ā
0
(
V localI (r) +
1
2V atom
I (r)
)
ĻatomI (r)4Ļr2dr
(58)
is the electrostatic energy of an isolated atom.
11
The last three terms in Eq. (53) are calculated usingthe real space grid. In addition to getting rid of all long-range potentials (except that implicit in Ī“V H(r)), theadvantage of (53) is that, apart from the relatively slowly-varying exchange-correlation energy density, the grid in-tegrals involve Ī“Ļ(r), which is generally much smallerthan Ļ(r). Thus, the errors associated with the finitegrid spacing are drastically reduced. Critically, the ki-netic energy matrix elements can be calculated almostexactly, without any grid integrations.
It is frequently desirable to introduce a finite electronictemperature T and/or a fixed chemical potential Āµ, eitherbecause of true physical conditions or to accelerate theself-consistency iteration. Then, the functional that mustbe minimized is the free energy50
F (RI , Ļi(r), ni) = EKS(RI , Ļi(r), ni) ā Āµā
i
ni
ākBTā
i
(ni logni + (1 ā ni) log(1 ā ni)). (59)
Minimization with respect to ni yields the usual Fermi-Dirac distribution ni = 1/(1 + e(Ē«iāĀµ)/kBT ).
X. HARRIS FUNCTIONAL
We will mention here a special use of the Harris energyfunctional, that is generally defined as51,52
EHarris[Ļin] =ā
i
nouti ćĻout
i |Hin|Ļouti ć
ā1
2
ā« ā«
Ļin(r)Ļin(rā²)
|rā rā²| d3rd3
rā²
+
ā«
(Ē«inxc(r) ā vinxc(r))Ļ
in(r)d3r +
ā
I<J
ZIZJ
RIJ(60)
where Hin is the KS Hamiltonian produced by a trialdensity Ļin and Ļout
i are its eigenvectors (which in gen-eral are different from those whose density is Ļin). Asin Eq. (46), the first term in (60) can be written asTr(HinĻout), and the rest are the so-called ādouble countcorrectionsā. An important advantage of eq. (60) is thatit does not require Ļin
i to be obtained from a set of or-thogonal electron states Ļin
i , and in fact Ļin is frequentlytaken as a simple superposition of atomic densities. How-ever, we will assume here that the states Ļin
i are indeedknown. In this case, the Kohn-Sham energy EKS [Ļin],Eq. (47), obeys exactly the same expression (60), exceptthat Ļout
i and nouti must be replaced by Ļin
i and nini .
Thus, a simple subtraction gives
EHarris[Ļin] = EKS [Ļin] +ā
ĀµĪ½
HinĪ½Āµ
(
ĻoutĀµĪ½ ā Ļin
ĀµĪ½
)
. (61)
Generally the Harris functional is used nonself-consistently, with a trial density given by the sum ofatomic densities. But here we want to comment on its
usefulness to improve dramatically the estimate of theconverged total energy, by taking Ļin
ĀµĪ½ as the density ma-
trix of the (nā1)āth self-consistency iteration and ĻoutĀµĪ½ of
the nāth iteration. In fact, EHarris frequently gives, afterjust two or three iterations, a better estimate than EKS
after tens of iterations. Unfortunately, we have foundthat there is hardly any improvement in the convergenceof the atomic forces thus estimated, and therefore theself-consistent Harris functional is less useful for geome-try relaxations or molecular dynamics.
XI. ATOMIC FORCES
Atomic forces and stresses are obtained by direct dif-ferentiation of (53) with respect to atomic positions.They are obtained simultaneously with the total energy,mostly in the same places of the code, under the generalparadigm āa piece of energy ā a piece of force/stressā(except that some pieces are calculated only in the lastself-consistency step). This ensures that all force contri-butions, including Pulay corrections, are automaticallyincluded. The force contribution from the first term in(53) is
ā
āRI
ā
ĀµĪ½
(TĀµĪ½ + V KBĀµĪ½ )ĻĪ½Āµ =
ā
ĀµĪ½
(TĀµĪ½ + V KBĀµĪ½ )
āĻĪ½Āµ
āRI+ 2
ā
Āµ
ā
Ī½āI
dTĀµĪ½
dRĀµĪ½ĻĪ½Āµ
+2ā
Āµ
ā
Ī½āI
ā
Ī±
SĀµĪ±vKBĪ±
dSĪ±Ī½
dRĪ±Ī½ĻĪ½Āµ
ā2ā
ĀµĪ½
ā
Ī±āI
SĀµĪ±vKBĪ±
dSĪ±Ī½
dRĪ±Ī½ĻĪ½Āµ (62)
where Ī± are KB projector indices, ā I indicates orbitalsor KB projectors belonging to atom I, and we have con-sidered that
āSĀµĪ½
āRIĪ½
= ā āSĀµĪ½
āRIĀµ
=dSĀµĪ½
dRĀµĪ½, (63)
where RIĀµis the position of atom IĀµ, to which orbital
ĻĀµ belongs and RĀµĪ½ = RIĪ½ā RIĀµ
.
Leaving aside for appendix A the terms contain-ing āĻĪ½Āµ/āRI , the other derivatives can be obtainedby straightforward differentiation of their expansion inspherical harmonics (Eq. (25)). However, instead of us-ing the spherical harmonics Ylm(r) themselves, it is con-venient to multiply them by rl, in order to make them
12
analytic at the origin. Thus
dSĀµĪ½(R)
dR=
ā
lm
ā(
SĀµĪ½lm(R)
RlRlYlm(R)
)
=ā
lm
d
dR
(
SĀµĪ½lm(R)
Rl
)
RlYlm(R)R
+ā
lm
SĀµĪ½lm(R)
Rlā(RlYlm(R)) (64)
In fact, it is SĀµĪ½lm(R)/Rl, rather than SĀµĪ½
lm(R), that isstored as a function of R on a radial grid. Its deriva-tive, d(SĀµĪ½
lm(R)/Rl)/dR, is then obtained from the samecubic spline interpolation used for the value itself. Thevalue and gradient of RlYlm(R) are calculated analyt-ically from explicit formulae (up to l = 2) or recur-rence relations31. Entirely analogous equations apply todTĀµĪ½/dRĀµĪ½ .
The second and third terms in Eq. (53) are sim-ple interatomic pair potentials whose force contributionsare calculated trivially from their radial spline inter-polations. The fourth term is a constant which doesnot depend on the atomic positions. Taking into ac-count that V NA(r) =
ā
I VNAI (r ā RI), and therefore
āV NA(r)/āRI = āāV NAI (r ā RI), the force contribu-
tion from the fifth term is
ā
āRI
ā«
V NA(r)Ī“Ļ(r)d3r = (65)
āā«
āV NAI (r)Ī“Ļ(r)d3
r +
ā«
V NA(r)āĪ“Ļ(r)
āRId3
r
The sixth term is the electrostatic self-energy of thecharge distribution Ī“Ļ(r):
ā
āRI
1
2
ā«
Ī“V H(r)Ī“Ļ(r)d3r =
ā«
Ī“V H(r)āĪ“Ļ(r)
āRId3
r (66)
In the last term, we take into account that d(ĻĒ«xc)/dĻ =vxc to obtain
ā
āRI
ā«
Ē«xc(r)Ļ(r)d3r =
ā«
V xc(r)āĻ(r)
āRId3
r (67)
Now, using Eq. (35) and that, for Ī½ ā I, āĻĪ½(r)/āRI =āāĻĪ½ , the change of the self-consistent and atomic den-sities are
āĻ(r)
āRI= Re
ā
ĀµĪ½
āĻĪ½Āµ
āRIĻāĀµ(r)ĻĪ½ (r)
ā 2Reā
Āµ
ā
Ī½āI
ĻĪ½ĀµĻāĀµ(r)āĻĪ½ (r) (68)
āĻatom(r)
āRI= ā2Re
ā
ĀµāI
ĻatomĀµĀµ ĻāĀµ(r)āĻĀµ(r) (69)
where we have taken into account that the density matrixof the separated atoms is diagonal. Thus, leaving still
aside the terms with āĻĪ½Āµ/āRI , the last term in Eq. (65),as well as those in (66) and (67), have the general form
Reā
Āµ
ā
Ī½āI
ĻĪ½Āµ
ā«
V (r)ĻāĀµ(r)āĻĪ½(r)d3r
= Reā
Āµ
ā
Ī½āI
ĻĪ½ĀµćĻĀµ|V (r)|āĻĪ½ć. (70)
These integrals are calculated on the grid, in the sameway as those for the total energy (i. e. ćĻĀµ|V (r)|ĻĪ½ć).The gradients āĻĪ½(r) at the grid points are obtainedanalytically, like those of ĻĪ½(r) from their radial grid in-terpolations of Ļ(r)/rl :
āĻIlmn(r) =d
dr
(
ĻIln(r)
rl
)
rlYlm(r)r
+ĻIln(r)
rlā(rlYlm(r)). (71)
In some special cases, with elements that require hardpartial core corrections or explicit inclusion of the semi-core, the grid integrals may pose a problem for geometryrelaxations, because they make the energy dependent onthe position of the atoms relative to the grid. This āegg-box effectā is small for the energy itself, and it decreasesfast with the grid spacing. But the effect is larger andthe convergence slower for the forces, as they are pro-portional to the amplitude of the energy oscillation, butinversely proportional to its period. These force oscilla-tions complicate the force landscape, especially when thetrue atomic forces become small, making the convergenceof the geometry optimization more difficult. Of course,the problem can be avoided by decreasing the grid spac-ing but this has an additional cost in computer time andmemory. Therefore, we have found it useful to minimizethis problem by recalculating the forces, at a set of po-sitions, determined by translating the whole system bya set of points in a finer mesh. This procedure, whichwe call āgrid-cell samplingā, has no extra cost in memory.And since it is done only at the end of the self-consistencyiteration, for fixed ĻĀµĪ½ , it has only a moderate cost inCPU time.
At finite temperature, the forces are really the deriva-tives of the free energy with respect to atomic displace-ments since
dF (RI , Ļi(r), ni)
dRI=
āF
āRI+
ā
i
āF
āni
āni
āRI
+ā
i
ā«
āF
āĻāi (r)
āĻi(r)
āRId3
r =āE
āRI. (72)
In this particular equation we have used the notationd/dRI , as opposed to ā/āRI , to indicate the inclusionof the change in Ļi(r) and ni when we move the atom,in calculating the derivative. But we have used also thatāF/āni = āF/āĻi(r) = 0 and that the last two terms in(59) do not depend on RI , so that āF/āRI = āE/āRI .
13
The latter are the atomic forces actually calculated. No-tice, however, that dE/dRI 6= āE/āRI , so that the cal-culated forces are indeed the total derivatives of the free,not the internal energy.
We would like to also mention the calculation of forcesusing the non-self-consistent Harris functional, in whichthe āinā density is a superposition of atomic densities. Wehave implemented this as an option for āquick and dirtyācalculations because, used with a minimal basis set, itmakes Siesta competitive with tight binding methods,which are much faster than density functional calcula-tions. The problem that we address here is that, al-though EHarris is stationary with respect to Ļout, it isnot so with respect to Ļin. In particular, there appearsa force term
ā«
āV inxc (r)
āRIĻout(r)d3
r. (73)
A similar term appears for the electrostatic interactionbetween the input and output density, but it presentsno special problems because of the linear character ofthe Hartree potential. However, evaluation of (73) re-quires the change of the exchange-correlation potentialwith density, a quantity also required to evaluate the lin-ear response of the electron gas, but not in normal energyand force calculations. Finally, notice that, apart fromthis minor difficulty, the Harris-functional forces are per-fectly well defined at the first iteration only. For lateriterations (but still not converged) there is no practi-cal way to calculate āĻin/āRI and, without the help ofHellman-Feynmannās theorem (which applies only at con-vergence), the forces are not well defined. Of course, theomission of the terms depending on this quantity pro-duces an estimate of the forces, but we have found thattheir convergence is not appreciably faster than those es-timated from the Kohn-Sham functional.
XII. STRESS TENSOR
We define the stress tensor as the positive derivativeof the total energy with respect to the strain tensor
ĻĪ±Ī² =āEKS
āĒ«Ī±Ī²(74)
where Ī±, Ī² are Cartesian coordinate indices. To trans-late to standard units of pressure, we must simply divideby the unit-cell volume and change sign. During the de-formation, all vector positions, including those of atomsand grid points (and of course lattice vectors), changeaccording to
rā²Ī± =
3ā
Ī²=1
(Ī“Ī±Ī² + Ē«Ī±Ī²) rĪ² (75)
The shape of the basis functions, KB projectors, andatomic densities and potentials do not change, but their
origin gets displaced according to (75). From this equa-tion, we find that
ārĪ³āĒ«Ī±Ī²
= Ī“Ī³Ī±rĪ² (76)
The change in EKS is essentially due to these positiondisplacements, and therefore the calculation of the stressis almost perfectly parallel to that of the atomic forces,thus being performed in the same sections of the code.For example:
āTĀµĪ½
āĒ«Ī±Ī²=
3ā
Ī³=1
āTĀµĪ½
ārĪ³ĀµĪ½
ārĪ³ĀµĪ½
āĒ«Ī±Ī²=āTĀµĪ½
ārĪ±ĀµĪ½
rĪ²ĀµĪ½ (77)
Since āTĀµĪ½/ārĪ±ĀµĪ½ is evaluated to calculate the forces, it
takes very little extra effort to multiply it also by rĪ²ĀµĪ½
for the stress. Equally, force contributions like (70) havetheir obvious stress counterpart
ā
ĀµĪ½
ĻĪ½ĀµćĻĀµ|V (r)|(āĪ±ĻĪ½)rĪ²ć (78)
However, there are three exceptions to this parallelism.The first concerns the change of the volume per grid pointor, in other words, the Jacobian of the transformation(75) in the integrals over the unit cell. This Jacobian issimply Ī“Ī±Ī² , and it leads to a stress contribution
[ā«
(
V NA(r) +1
2Ī“V H(r)
)
Ī“Ļ(r)d3r + Exc
]
Ī“Ī±Ī² (79)
Notice that the the renormalization of the density, re-quired to conserve the charge when the volume changes,enters through the orthonormality constraints, to be dis-cussed in appendix A. The second special contribu-tion to the stress lies in the fact that, as we deformthe lattice, there is a change in the factor 1/|rā r
ā²| ofthe electrostatic energy integrals. We deal with thiscontribution in reciprocal space, when we calculate theHartree potential by FFTs, by evaluating the derivativeof the reciprocal-space vectors with respect to Ē«Ī±Ī². SinceGā²
Ī± =ā
Ī² GĪ²(Ī“Ī²Ī± ā Ē«Ī²Ī±):
ā
āĒ«Ī±Ī²
1
G2=
2GĪ±GĪ²
G4(80)
Finally, the third special stress contribution arises inGGA exchange and correlation, from the change of thegradient of the deformed density Ļ(r) ā Ļ(rā²). The treat-ment of this contribution is explained in detail in refer-ence 42.
Electric Polarization
The calculation of the electric polarization, as an in-tegral in the grid across the unit cell, is standard and
14
almost free for molecules, chains and slabs (in the direc-tions perpendicular to the chain axis, or to the surface).For bulk systems, the electric polarization cannot befound from the charge distribution in the unit cell alone.In this case, we need the so-called Berry-phase theory ofpolarization53,54, which allows to compute quantities likethe dynamical charges53 and piezoeletric constants55,56.Here we comment some details of our implementation57.
If RĪ± are the lattice vetors and Pe =
ā3Ī±=1 P
eĪ±RĪ± is
the electronic contribution to the macroscopic polariza-tion, then we have
2Ļ P eĪ± = GĪ± Ā· Pe
= ā 2e
(2Ļ)3
ā«
BZ
dk GĪ± Ā· ā
ākā²Ī¦(k,kā²)
ā£
ā£
ā£
ā£
kā²=k
(81)
where GĪ± is the corresponding reciprocal lattice vector,e is the electron charge, ui(k, r) = eāik.rĻi(k, r) is theperiodic part of the Bloch function, and the factor of twocomes from the spin degeneracy. The quantum phaseĪ¦(k,kā²) is defined as
Ī¦(k,kā²) = Im [ln (detćui(k, r)|uj(kā², r)ć)] (82)
The derivative in (81) depends on a gauge that mustbe chosen such that u(k + G, r) = eāiGĀ·ru(k, r). Inpractice, the integral is replaced by a discrete summa-tion, and a finite-difference approximation is made for the
derivative53: ākĪ±ā
ākā²
Ī±Ī¦(k,kā²)
ā£
ā£
ā£
kā²=k
ā 12 [Ī¦(k,k+ākĪ±)ā
Ī¦(k,k ā ākĪ±)], where ākĪ± = GĪ±/NĪ±. Then (81) be-comes, for Ī± = 1
G1 Ā·Pe ā ā 2e
Ī©N2N3
N2ā1,N3ā1ā
i2=0,i3=0
N1ā1ā
i1=0
Ī¦(ki1i2i3 ,ki1+1i2i3) ,
(83)
where we have split the sum to stress the fact that wehave a two-dimensional integral in the plane defined byG2 and G3, and a linear integral along G1. Due to theapproximation in the derivative, the linear integral usu-ally requires a finer mesh than the surface integral. Toevaluate Ī¦(k,k + āk) we use our LCAO basis:
ćui(k)|uj(k + āk)ć = ćĻi(k)|eāiākĀ·r|Ļj(k + āk)ć =ā
Ī½
ā
Āµā² ciĪ½(k)cĀµā²j(k + āk) (84)
eāikĀ·(RĪ½āRĀµā² )ćĻĪ½ |eāiākĀ·(rāRĀµā²)|ĻĀµā²ć.
Formulas similar to (84) have been implemented by sev-eral authors58,59, mainly in the context of Hatree-Fockcalculations, in which the basis orbitals are expandedin gaussians whose matrix elements can be found ana-lytically58. Our numerical, localized pseudo-atomic ba-sis orbitals are not well suited for a gaussian expan-sion. Instead, we expand the plane-waves appearingin equation (84) to first order in āk, eāiākĀ·(rāRĀµ) ā1 ā iāk Ā· (r ā RĀµ) + O(āk2), and then we calculate thematrix elements of the position operator as explained in
sectionV. It is interesting to note that, since the dis-cretized formula (83) only holds to O(āk2), the approxi-mation of the matrix elements in (84) does not introduceany further errors in the calculation of the polarization.In a symmetrized version, we approximate equation (84)as
ā
Ī½
ā
Āµā²
ciĪ½(k)cĀµā²j(k + āk)eāi(k+āk
2)Ā·(RĪ½āRĀµā²) [ ćĻĪ½ |ĻĀµā²ć
āiāk
2Ā· ( ćĻĪ½ |(r ā RĪ½)|ĻĀµā² ć + ćĻĪ½ |(r ā RĀµā²)|ĻĀµā² ć) ] ,(85)
XIII. ORDER-N FUNCTIONAL
The basic problem for solving the Kohn-Sham equa-tions in O(N) operations is that the solutions (the Hamil-tonian eigenvectors) are extended over the whole systemand overlap with each other. Just to check the orthogo-nality of N trial solutions, by performing integrals overthe whole system, involves ā¼ N3 operations. Among thedifferent methods proposed to solve this problem5,9, wehave chosen the localized-orbital approach6,60,61 becauseof its superior efficiency for non-orthogonal basis sets.The initially proposed functional6,60 used a fixed num-ber of occupied states, equal to the number of electronpairs, and it was found to have numerous local minima inwhich the electron configuration was easily trapped. Arevised functional form61 which uses a larger number ofstates than electron pairs, with variable occupations, hasbeen found empirically to avoid the local minima prob-lem. This is the functional that we use and recommend.
Each of the localized, Wannier-like states, is con-strained to its own localization region. Each atom I isassigned a number of states equal to int(Zval
I /2+1) sothat, if doubly occupied, they can contain at least oneexcess electron (they can also become empty during theminimization of the energy functional). These states areconfined to a sphere of radius Rc (common to all states)centered at RI . More precisely, the expansion (Eq. (32))of a state Ļi centered at RI may contain only basis or-bitals ĻĀµ centered on atoms J such that |RIJ | < Rc.This implies that Ļi(r) may extend to a maximum rangeRc + rmax
c , where rmaxc is the maximun range of the ba-
sis orbitals. For covalent systems, a localization regioncentered on bonds rather than atoms is more efficient62
(it leads to a lower energy for the same Rc), but it isless suitable to a general algorithm, especially in case ofambiguous bonds. Therefore, we generally use the atom-centered localization regions.
In the method of Kim, Mauri, and Galli (KMG)61, the
15
band-structure energy is rewritten as:
EKMG = 2ā
ij
(2Ī“ji ā Sji)(Hij ā Ī·Sij)
= 4ā
i
ā
ĀµĪ½
ciĀµĪ“HĀµĪ½cĪ½i
ā 2ā
ij
ā
Ī±Ī²ĀµĪ½
ciĪ±SĪ±Ī²cĪ²jcjĀµĪ“HĀµĪ½cĪ½i (86)
Where Sij = ćĻi|Ļjć, Hij = ćĻi|H |Ļjć, Ī“HĀµĪ½ = HĀµĪ½ āĪ·SĀµĪ½ , and we have assumed a non-magnetic solution withdoubly occupied states. The ādouble countā correctionterms of Eq. (47) remain unchanged and the electrondensity is still defined by (35), but the density matrix isre-defined as
ĻĀµĪ½ = 2ā
ij
cĀµi(2Ī“ij ā Sij)cjĪ½
= 4ā
i
cĀµiciĪ½ ā 2ā
ij
ā
Ī±Ī²
cĀµiciĪ±SĪ±Ī²cĪ²jcjĪ½ (87)
The parameter Ī· in Eq. (86) plays the role of a chemi-cal potential, and must be chosen to lie within the bandgap between the occupied and empty states. This may betricky sometimes, since the electron bands can shift dur-ing the self-consistency process or when the atoms move.In general, the number of electrons will not be exactlythe desired one, even if Ī· is within the band gap, becausethe minimization of (86) implies a trade-off in which thelocalized states become fractionally occupied. To avoidan infinite Hartree energy in periodic systems, we simplyrenormalize the density matrix so that the total electroncharge
ā
ĀµĪ½ SĀµĪ½ĻĪ½Āµ equals the required value.
For a given potential, the functional (86) is minimizedby the conjugate-gradients method, using its derivativeswith respect to the expansion coefficients
āEKMG
āciĀµ= 4
ā
Ī½
Ī“HĀµĪ½cĪ½i
ā2ā
j
ā
Ī±Ī²Ī½
( SĀµĪ½cĪ½jcjĪ±Ī“HĪ±Ī²cĪ²i
+ Ī“HĀµĪ½cĪ½jcjĪ±SĪ±Ī²cĪ²i) (88)
The minimization proceeds without need to orthonor-malize the electron states Ļi. Instead, the orthogonal-ity, as well as the correct normalization (one below Ī·and zero above it) result as a consequence of the min-imization of EKMG. This is because, in contrast tothe KS functional, EKMG is designed to penalize anynonorthogonality61. The KS ground state, with all theoccupied Ļiās orthonormal, is also the minimum of (86),at which EKMG = EKS . If the variational freedom isconstrained by the localization of the Ļiās, the orthogo-nality cannot be exact, and the resulting energy is slightlylarger than for unconstrained wavefunctions. In insula-tors and semiconductors, the Wannier functions are ex-ponentially localized63, and the energy excess due to their
strict localization decreases rapidly as a function of thelocalization radius Rc, as can be seen in Fig. 7.
2 3 4 5 6 7 8Rc (A)
ā20
0
20
40
60
80
Rel
ativ
e er
ror
(%)
Bulk ModulusLattice ConstantCohesive Energy
FIG. 7: Convergence of the lattice constant, bulk modulus,and cohesive energy as a function of the localization radiusRc of the Wannier-like electron states in silicon. We used asupercell of 512 atoms and a minimal basis set with a cutoffradius rc = 5 a.u. for both s and p orbitals.
If the system is metallic, or if the chemical potentialis not within the band gap (for example because of thepresence of defects), the KMG functional cannot be usedin practice. In fact, although some O(N) methods canhandle metallic systems in principle9, we are not aware ofany practical calculations at a DFT level. In such caseswe copy the Hamiltonian and overlap matrices to stan-dard expanded arrays and solve the generalized eigen-value problem by conventional order-N3 diagonalizationtechniques64. However, even in this case, most of the op-erations, and particularly those to find the density andpotential, and to set up the Hamiltonian, are still per-formed in O(N) operations.
Irrespective of whether the O(N) functional orthe standard diagonalization is used, an outer self-consistency iteration is required, in which the densitymatrix is updated using Pulayās Residual Metric Mini-mization by Direct Inversion of the Iterative Subspace(RMM-DIIS) method65,66. Even when the code is strictlyO(N), the CPU time may increase faster if the numberof iterations required to achieve the solution increaseswith N . In fact, it is a common experience that the re-quired number of selfconsistency iterations increases withthe size of the system. This is mainly because of theācharge sloshingā effect, in which small displacements ofcharge from one side of the system to another give riseto larger changes of the potential, as the size increases.Fortunately, the localized character of the Wannier-likewavefunctions used in the O(N) method help to solvealso this problem, by limiting the charge sloshing. Ta-
16
TABLE II: Average number of selfconsistency (SCF) iter-ations (per molecular dynamics step) and average numberof conjugate-gradient (CG) iterations (per SCF iteration) re-quired to minimize the O(N) functional, during a simulationof bulk silicon at ā¼ 300 K. We used the Verlet method67 atconstant energy, with a time step of 1.5 fs, and a minimalbasis set with a cutoff radius rc = 5 a.u. Rc is the local-ization radius of the Wannier-like wavefunctions used in theO(N) functional (see text). N is the number of atoms in thesystem.
Rc = 4A Rc = 5A
N CG SCF CG SCF
64 5.8 9.3 8.4 8.4
512 4.9 11.4 8.8 10.1
1000 4.3 11.5 9.9 11.5
ble II presents the average number of iterations requiredto minimize the O(N) functional and the average numberof selfconsistency iterations, during a molecular dynam-ics simulation of bulk silicon at room temperature. It canbe seen that these numbers are quite small and that theyincrease very moderately with system size. As might beexpected, the number of minimization iterations increasewith the localization radius, i.e. with the number of de-grees of freedom (cĀµi coefficients) of the wavefunctions.But this increase is also rather moderate.
Figure 8 shows the essentially perfect O(N) behaviourof the overall CPU time and memory. This is not surpris-ing in view of the completely strict enforcement of O(N)algorithms everywhere in the code (except the marginalN logN factor in the FFT used to solve Poissonās equa-tion, which represents a very small fraction of CPU timeeven for 4000 atoms).
XIV. OTHER FEATURES
Here we will simply mention some of the possibilitiesand features of the Siesta implementation of DFT:
ā¢ A general-purpose package68, the flexible data for-mat (fdf), initially developped for the Siesta
project, allows the introduction of all the dataand precision parameters in a simple tag-oriented,order-independent format which accepts differentphysical units. The data can then be accessed fromanywhere in the program, using simple subroutinecalls in which a default value is specified for thecase in which the data are not present. A simplecall also allows the read pointer to be positionedin order to read complex data āblocksā also markedwith tags.
ā¢ The systematic calculation of atomic forces andstress tensor allows the simultaneous relaxation ofatomic coordinates and cell shape and size, using a
0 2000 4000 6000 8000Number of Atoms
0
500
1000
1500
2000
Mem
ory
(Mb)
0 2000 4000 6000 8000
0
50
100
150
200
250
0
CP
U T
ime
(min
)
FIG. 8: CPU time and memory for silicon supercells of 64,512, 1000, 4096, and 8000 atoms. Times are for one aver-age molecular dynamics step at 300 K. This includes 10 SCFsteps, each with 10 conjugate gradient minimization steps ofthe O(N) energy functional. Memories are peak ones. Al-though the memory requierement for 8000 atoms was deter-mined accurately, the run could not be performed because ofinsufficient memory in the PC used.
conjugate gradients minimization or several otherminimization/annealing algorithms.
ā¢ It is possible to perform a variety of molecular dy-namics simulations, at constant energy or temper-ature, and at constant volume or pressure, also in-cluding Parrinello-Rahman dynamics with variablecell shape67. The geometry relaxation may be re-stricted, to impose certain positions or coordinates,or more complex constraints.
ā¢ The auxiliary program Vibra processes systemat-ically the atomic forces for sets of displaced atomicpositions, and from them computes the Hessian ma-trix and the phonon spectrum. An interface to thePhonon program69 is also provided within Siesta.
ā¢ A linear response program (Linres) to calculatephonon frequencies has also been developed70. Thecode reads the SCF solution obtained by Siesta,and calculates the linear response to the atomicdisplacements, using first order perturbation the-ory. It then calculates the dynamical matrix, fromwhich the phonon frequencies are obtained.
ā¢ A number of auxiliary programs allows various rep-resentations of the total density, the total and lo-cal density of states, and the electrostatic or to-tal potentials. The representations include bothtwo-dimensional cuts and three-dimensional views,which may be colored to simultaneously representthe density and potential.
17
ā¢ Thanks to an interface with the Transiesta pro-gram, it is posible to calculate transport propertiesacross a nanocontact, finding self-consistently theeffective potential across a finite voltage drop, ata DFT level, using the Keldish Greenās functionformalism71.
ā¢ The optical response can be studied with siesta us-ing different approaches. An approximate dielectricfunction can be calculated from the dipolar tran-sition matrix elements between occupied and un-occupied single-electron eigenstates using first or-der time-dependent pertubation theory.72 For finitesystems, these are easily calculated from the matrixelements of the position operator between the ba-sis orbitals. For infinite periodic systems, we usethe matrix elements of the momentum operator.It is important to notice, however, that the use ofnon-local pseudopotentials requires some correctionterms73.
We have also implemented a more sophisticatedapproach to compute the optical response of fi-nite systems, using the adiabatic approximationto time-dependent DFT74,75. The idea is to in-tegrate the time-dependent Schrodinger equationwhen a time depedent perturbation is applied to thesystem76. From the time evolution, it is then pos-sible to extract the optical adsorption and dipolestrength functions, including some genuinely many-body effects, like plasmons. Using this approach wehave succesfully calculated the electronic responseof systems such as fullerenes and small metallicclusters77.
ACKNOWLEDGMENTS
We are deeply indebted to Otto Sankey and DavidDrabold for allowing us to use their code as an initialseed for this project, and to Richard Martin for contin-uous ideas and support. We thank Jose Luis Martinsfor numerous discussions and ideas, and Jurgen Kublerfor helping us implement the noncollinear spin. Theexchange-correlation methods and routines were devel-oped in collaboration with Carlos Balbas and Jose L.Martins. We also thank In-Ho Lee, Maider Machado,Juana Moreno, and Art R. Williams for some routines,and Eduardo Anglada and Oscar Paz for their computa-tional help. This work was supported by the FundacionRamon Areces and by Spainās MCyT grant BFM2000-1312. JDG would like to thank the Royal Society for aUniversity Research Fellowship and EPSRC for the pro-vision of computer facilities. DSP acknowledges supportfrom the Basque Government (Programa de Formacionde Investigadores).
APPENDIX A: ORTHOGONALITY FORCE AND
STRESS
We have yet to comment on the force and stressterms containing āĻĀµĪ½/āRI. Substituting the firstterm of Eq. (68) into Eqs. (65-67) and adding thefirst term of Eq. (62) we obtain a simple expression:ā
ĀµĪ½ HĪ½ĀµāĻĀµĪ½/āRI . Now, ĻĀµĪ½ is a function of the Hamil-tonian eigenvector coefficients and occupations only(Eq. (34)). On the Born-Oppenheimer surface (BOS),EKS is stationary with respect to those coefficients andoccupations, and the Hellman-Feynmann theorem guar-antees that any change of them will not modify the totalenergy to first order, and therefore will not affect theforces. In other words, the atomic forces are the partialderivatives āEKS/āRI at constant cĀµi and ni. Even inthe Car-Parrinello scheme, in which the system movesout of the BOS, making the Hellman-Feynmann theo-rem invalid, the atomic forces are nevertheless defined asderivatives at constant cĀµi and ni. Thus, it may seemthat the terms āĻĀµĪ½/āRI are irrelevant for the calcula-tion of the forces. However, in the previous discussionwe have omitted to say that the KS energy must be min-imized under the constraint of orthonormality of the oc-cupied states and that, therefore, at the BOS the energyis stationary only with respect to changes of Ļi which donot violate the orthonormality. With an atomic basis set,the displacement of atoms (and the deformation of theunit cell) modifies the basis, and therefore the occupiedstates Ļi =
ā
Āµ ĻĀµcĀµi, even at constant cĀµiās. And thechange of the states affects their orthonormality. Thus,in order to calculate the new total energy, we need tore-orthonormalize the occupied states, by changing theircoefficients cĀµi. Schematically, we must solve
ćĻi|Ī“S|Ļjć + ćĪ“Ļi|S|Ļjć + ćĻi|S|Ī“Ļjć = 0 (A1)
where Ī“S represents the change of SĀµĪ½ due to the atomicdisplacements, and Ī“Ļi the modification of Ļi due to thechange of cĀµi. Without lack of generality, we can expandĪ“Ļi in the basis of the eigenvectors Ļi as Ī“Ļi =
ā
j ĻjĪ»ji.
Substituting this expansion into (A1) and using thatćĻi|S|Ļjć = Ī“ij we obtain Ī»ji = ā 1
2 ćĻj |Ī“S|Ļić. Thus
|Ī“Ļić = ā1
2
ā
j
|ĻjććĻj |Ī“S|Ļić. (A2)
In terms of the coefficients cĀµi, we have ćĻj |Ī“S|Ļić =ā
ĀµĪ½ cjĀµĪ“SĀµĪ½cĪ½i and
Ī“cĀµi = ā1
2
ā
j
ā
Ī·Ī½
cĀµjcjĪ·Ī“SĪ·Ī½cĪ½i
= ā1
2
ā
Ī·Ī½
Sā1ĀµĪ· Ī“SĪ·Ī½cĪ½i (A3)
where we have used that cĀµi = ćĻĀµ|Ļić, andā
i
cĀµiciĪ½ = ćĻĀµ|ĻĪ½ć = Sā1ĀµĪ½ (A4)
18
Differentiating now Eq. (34) and using (A3) we obtain
Ī“ĻĀµĪ½ = ā1
2
ā
Ī·Ī¶
(
ĻĀµĪ·Ī“SĪ·Ī¶Sā1Ī¶Ī½ + Sā1
ĀµĪ· Ī“SĪ·Ī¶ĻĪ¶Ī½
)
(A5)
Andā
ĀµĪ½
HĀµĪ½Ī“ĻĪ½Āµ = āā
ĀµĪ½
EĀµĪ½Ī“SĪ½Āµ (A6)
where EĀµĪ½ is the so-called energy-density matrix:
EĀµĪ½ =1
2
ā
Ī·Ī¶
(
Sā1ĀµĪ· HĪ·Ī¶ĻĪ¶Ī½ + ĻĀµĪ·HĪ·Ī¶S
ā1Ī¶Ī½
)
=ā
i
cĀµiniĒ«iciĪ½ (A7)
where Ē«i are the eigenstate energies. To calculate the or-thogonalization force or stress, Ī“SĀµĪ½ must be substitutedby the appropriate derivative:
ForthogI = 2
ā
Āµ
ā
Ī½āI
EĪ½ĀµāSĀµĪ½
āRĀµĪ½(A8)
This equation has been derived before in different ways,and Ordejon et al78 found it also for the O(N) functional,even though it does not require the occupied states to beorthogonal. In this case, Eq. (A7) must be substitutedby a more complicated expression78.
Similarly, the stress contribution is
ĻorthogĪ±Ī² = ā
ā
ĀµĪ½
EĪ½ĀµāSĀµĪ½
āRĪ±ĀµĪ½
RĪ²ĀµĪ½ (A9)
APPENDIX B: RADIAL FAST FOURIER
TRANSFORM
We consider here how to perform fast integrals of theform
Ļl(k) =
ā« ā
0
r2drjl(kr)Ļl(r) (B1)
where jl(kr) is a spherical Bessel function and Ļl(r)is a radial function which behaves as Ļl(r) ā¼ rl forr ā 0. Although methods to perform fast Bessel andHankel transforms have been described previously in dif-ferent fields79,80,81 we have developed a simple methodadapted to our needs. It is based on the fact that jl(x)has the general form (P s
l (x) sin(x) + P cl (x) cos(x))/xl+1,
where P sl (x), P c
l (x) are simple polynomials P s,cl (x) =
āln=0 c
s,cln x
n. Thus, the method involves computing l+1fast sine and cosine transforms31 and add the differentterms:
Ļl(k) =
lā
n=0
csln2kl+1ān
ā« +ā
āā
Ļl(r)
rlā1ānsin(kr)dr
+
lā
n=0
ccln2kl+1ān
ā« +ā
āā
Ļl(r)
rlā1āncos(kr)dr (B2)
Notice that we have extended the integral to the wholereal axis, defining Ļl(ār) ā” (ā1)lĻl(r), in accordance tothe behaviour Ļl(r) ā¼ rl, r ā 0. The coefficients cs,c
ln canbe obtained by defining a complex polynomial Pl(x) =P c
l (x) + iP sl (x), which obeys the recurrence relations82
P0(x) = i ā”āā1
P1(x) = iā x (B3)
Pl+1(x) = (2l + 1)Pl(x) ā x2Plā1(x)
In order to perform the integrals in (B2) using discreteFFTās, we need to calculate Ļ(r) on a regular radial grid,up to a maximum radius rmax, beyond which Ļ(r) is as-sumed to be strictly zero. The separation ār betweengrid points determines a cutoff kmax = Ļ/ār in recip-rocal space, and vice versa, āk = Ļ/rmax. For con-volutions, such as those involved in Eq. (28), we needrmax = rc
1 + rc2 and kmax = max(kc
1, kc2), where rc
1,2, kc1,2
are the cutoff radii and maximum wavevectors of Ļ1,2,respectively. We must then pad with zeros the inter-vals [rc
1,2, rmax] for the forward transforms Ļ1,2(r) āĻ1,2(k). In practice, we set rmax = 2 maxĀµ(rc
Āµ), kmax =maxĀµ(kc
Āµ), where Āµ labels all the basis orbitals and KBprojectors, and we use the same real and reciprocalgrids for all orbital pairs. In this way, we need toperform the forward transform only once for each ra-dial function ĻĀµ(r). Finally, notice that in Eq. (28),Ļā
1,l1m1(k)Ļ2,l2m2
(k) ā¼ kl1+l2 for k ā 0, while l1+l2āl iseven and nonnegative, so that the integrands of Eq. (B2)for the backward transform are all even and well behavedat the origin.
APPENDIX C: EXTENDED-MESH ALGORITHM
We describe here a simple and efficient algorithm tohandle mesh indices in three-dimensional periodic sys-tems. Its versatility makes it suitable for several differ-ent tasks in Siesta like neighbor-list constructions, ba-sis orbital evaluation in the real-space integration grid,density-gradient calculations in the GGA, etc. It wouldbe also very apropriate for other problems, like the so-lution of partial differential equations by real-space dis-cretization or the calculation of the interaction energyin lattice models. For clarity of the exposition, we willdescribe the algorithm for a specially simple application,namely the evaluation of the Laplacian of a function f(r)using finite differences, even though the algorithm is notused in Siesta for this purpose. In three dimensions,one generally discretizes space in all three periodic direc-tions, using an index for each direction. For simplicity,let us consider an orthorhombic unit cell, with mesh stepsāx,āy,āz. Then the simplest formula for the Laplacianis
ā2fix,iy,iz= (fix+1,iy,iz
ā 2fix,iy,iz+ fixā1,iy ,iz
)/āx2
+ (fix,iy+1,izā 2fix,iy,iz
+ fix,iyā1,iz)/āy2
+ (fix,iy,iz+1 ā 2fix,iy,iz+ fix,iy ,izā1)/āz
2
19
A direct translation of this expression into Fortran90code might read
Lf(ix,iy,iz) = &( f(modulo(ix+1,nx),iy,iz) + &
f(modulo(ix-1,nx),iy,iz) )/dx2 &+ ( f(ix,modulo(iy+1,ny),iz) + &
f(ix,modulo(iy-1,ny),iz) )/dy2 &+ ( f(ix,iy,modulo(iz+1,nz)) + &
f(ix,iy,modulo(iz-1,nz)) )/dz2 &- f(ix,iy,iz) * (2/dx2+2/dy2+2/dz2)
where the indices iĪ± (Ī± = x, y, z) of the arrays f andLf run from 0 to nĪ±ā1, as in C. There are two problemswith this construction. First, the modulo operations arerequired to bring the indices back to the allowed range[0, nĪ± ā 1]. And second, the use of three indices to referto a mesh point implies implicit arithmetic operations,generated by the compiler, to translate them into a singleindex giving its position in memory.
A straightforward solution to these inefficiencies wouldbe to create a neighbor-point list j_neighb(i,neighb),of the size of the number of mesh points times the num-ber of neighbor points. However, although the latter areonly six in our simple example, they may frequently beas many as several hundred, what generally makes thisapproach unfeasible. A partial solution, addressing onlythe first problem, is to create six (or more for longerranges) one-dimensional tables jĀ±1
Ī± (iĪ±) = mod(iĪ±Ā±1, nĪ±)to avoid the modulo computations83. Here, we describea multidimensional generalization of this method, whichsolves both problems at the expense of a very reasonableamount of extra storage.
The method is based on an extended mesh, which ex-tends beyond the periodic unit cell, by as much as re-quired to cover all the space that can be reached fromthe unit cell by the range of the interactions or thefinite-difference operator. The extended mesh range isiminĪ± = āānĪ± and imax
Ī± = nĪ± ā1+ānĪ±, where ānĪ± = 1in our particular example, in which the Laplacian for-mula extends just to first-neighbor mesh points. In prin-ciple, in cases with a small unit cell and a long range, themesh extension may be larger than the unit cell itself,extending over several neighbor cells. However, in themore relevant case of a large system, we will expect theextension region to be small compared to the unit cell.We then consider two combined indices, one associatedto the normal unit-cell mesh, and another one associatedto the extended mesh
i = ix + nxiy + nxnyiz,
iext = (ix ā iminx ) +next
x (iy ā iminy ) +next
x nexty (iz ā imin
z ),
where nextĪ± = imax
Ī± ā iminĪ± + 1 = nĪ± + 2ānĪ±. The key
observation is that, if iext is a mesh point within theunit cell (0 ā¤ iĪ± ā¤ nĪ± ā 1), and if jext is a neighbormesh point (within its interaction range, i.e. |jĪ± ā iĪ±| ā¤ānĪ±), then the arithmetic difference jext ā iext depends
only on the relative positions of iext and jext (i.e. onjĪ± ā iĪ±), but not on the position of iext within the unitcell. We can then create a list of neighbor strides āijext,and two arrays to translate back and forth between i andiext. One of the arrays maps the unit cell points to thecentral region of the extended mesh, while the other onefolds back the extended mesh points to their periodicallyequivalent points within the unit cell. Then, to access theneighbors of a point i, we a) translate i ā iext; b) findjext = iext + āijext; and c) translate jext ā j. Noticethat several points of the extended mesh will map to thesame point within the unit cell and that, in principle, aunit cell point j may be neighbor of i through differentvalues of jext. In our example, the innermost loop wouldthen read
Lf(i) = 0do neighb = 1,n_neighb
j_ext = i_extended(i) + ij_delta(neighb)j = i_cell(j_ext)
Lf(i) = Lf(i) + L(neighb) * f(j)end do
where the number of neighbor points would ben_neighb=7, including the central point itself, and
ij_delta(1) = 1 ; L(1) = 1/dx2ij_delta(2) = -1 ; L(1) = 1/dx2
ij_delta(3) = nx_ext ; L(3) = 1/dy2ij_delta(4) = -nx_ext ; L(4) = 1/dy2ij_delta(5) = nx_ext*ny_ext ; L(5) = 1/dz2
ij_delta(6) = -nx_ext*ny_ext ; L(6) = 1/dz2ij_delta(7) = 0 ; L(7)=-2/dx2-2/dy2-2/dz2
Notice that the above loop is completely general, for anylinear operator, using an arbitrary number of neighborpoints for its finite difference representation. In fact, itis even independent of the space dimensionality. Further-more, the index operations required are just one additionand three memory calls to arrays of range one84. This in-ner loop simplicity comes at the expense of the two extraarrays i_extended and i_cell (of the size of the normaland extended meshes, respectively) which are generallyan acceptable memory overhead. Notice, however, thatthe the neighbor-point list ij_delta is independent ofthe mesh index i, what makes this array quite small inmost problems of interest.
APPENDIX D: SPARSE MATRIX TECHNIQUES
We will describe here some of the sparse-matrix mul-tiplication techniques used in evaluating Eqs. (35), (86),(87), and (88). There is a large variety of sparse-matrixrepresentations and algorithms, each one optimized for adifferent kind of sparsity. The main constraint for choos-ing our representation and algorithms is that they mustbe O(N) in both memory and CPU time. We enforcethis condition strictly by requiring, for example, that avector of size ā¼ N will not be reset to zero a number
20
ā¼ N of times. In our sparse matrices, like SĀµĪ½ , HĀµĪ½ , cĀµi,ĻĀµĪ½ , and ĻĀµ(r), the number p of non-zero elements in arow is typically much larger than one (but still of orderā¼ N0) and much smaller than the row size m ā¼ N1.Such matrix rows are efficiently stored as a real vector ofsize p, containing the non-zero elements, and an integervector of the same size containing the column index ofeach non-zero element. The whole matrix A of n rows isthen represented by two arrays A and jcol, of size nĆ p,such that85 Aij = A(i,k), where j = jcol(i,k). Theproblem with this representation is that, given a valuej of the column index, there is no simple way to accessthe element Aij without scanning the whole row, whatis frequently too costly. One solution is to unpack a rowi, that will be repeatedly used, into āexpanded formā, i.e.to transfer it to a vector Arow of the full row size m (con-taining also all the zeros), so that Aij = Arow(j). Sincep >> 1, the size of Arow is negligible compared to thatof A and jcol.
To find the matrix product C of two sparse matricesA and B
Cik =ā
j
AijBjk
we proceed iteratively for each row i of A (which willgenerate the same row of C): each non-zero element jof the row is multiplied by every non-zero element of thejth row of B (whose column index is, say k) and theresult is accumulated in the kāth position of an auxiliaryāexpandedā vector. After finishing with that row of A wepack the vector in sparse format into the ith row of C andrestore the auxiliary vector to zero. In fact, the packingcan be performed simultaneously with the product, usingan auxiliary index vector instead:
C = 0.ncolC = 0
jcolC = 0pos = 0 ! Auxiliary index vectordo i = 1, nA
do jA = 1,ncolA(i)j = jcolA(i,jA)
do jB = 1,ncolB(j)k = jcolB(j,jB)
jC = pos(k)if (jC==0) then ! New non-zero col
ncolC(i) = ncolC(i) + 1jC = ncolC(i)
jcolC(i,jC) = kpos(k) = jC
endifC(i,jC) = C(i,jC) + A(i,jA)*B(j,jB)
enddoenddo
do jA = 1,ncolA(i) ! Restore pos to zeroj = jcolA(i,jA)
do jB = 1,ncolB(j)k = jcolB(j,jB)
pos(k) = 0enddo
enddoenddo
Notice that the auxiliary vector pos, which keeps theposition in āpacked formatā of the non-zero elements ofone row of C, is initiallized in full only once. Noticealso that this algorithm, unlike those of ref. 31, does notrequire the matrix elements to be stored in ascendingcolumn order.
The previous algorithm generates all the non-zero ele-ments of C but in many cases we need only some of them.For example, to calculate the electron density (Eq. (35)),we need only the density matrix elements ĻĀµĪ½ for whichĻĀµ and ĻĪ½ overlap. Also the expression (88) needs tobe evaluated only for the coefficients cĀµi which are al-lowed to be non-zero by the localization constraints. Inthese cases, in which the array jcolC is already known,another algorithm is more effective. We start by find-ing the sparse representation of B in column order or, inother words, the transpose of B:
Bt = 0 ! B transposejcolBt = 0
ncolBt = 0do i = 1,nB
do jB = 1,ncolB(i)j = jcolB(i,jB)ncolBt(j) = ncolBt(j) + 1
jBt = ncolBt(j)jcolBt(j,jBt) = i
Bt(j,jBt) = B(i,jB)enddo
enddo
We then unpack a row i of A and multiply it by a columnj of B (a row of its transpose) for each required matrixelement Cij of their product:
C = 0.
Arow = 0. ! Auxiliary vectordo i = 1,nC
do jA = 1,ncolA(i) ! Copy one row of Aj = jcolA(i,jA)
Arow(j) = A(i,jA)enddo
do jC = 1,ncolC(i) ! Calculate Cijj = jcolC(i,jC)
do jBt = 1,ncolBt(j)k = jcolBt(j,jBt)
C(i,jC) = C(i,jC) + Arow(k)*Bt(j,jBt)enddo
enddodo jA = 1,ncolA(i) ! Restore Arow to zero
j = jcolA(i,jA)Arow(j) = 0.
enddoenddo
21
The combination of these two matrix multiplication al-gorithms allows an efficient evaluation of Eqs. (86-88).Since these equations involve a trace or a relatively smallsubset of a final matrix, it is important to control theorder and sparsity of the intermediate products, in orderto keep them as sparse as possible. Notice that, once arow of A Ć B has been evaluated, it may be multipliedby a third matrix, to obtain a row of the final product,without need of storing the whole intermediate matrix.
To calculate the density at a grid point using Eq. (35)we need to access the matrix elements ĻĀµĪ½ , and this is in-efficient if they are stored in sparse format. Thus, we firstcopy the matrix elements, between the nr basis orbitals
which are non-zero at the grid point r, into an auxiliarymatrix array, of size naux Ć naux, with naux ā„ nr. Wealso create a lookup table pos, of size equal to the to-tal number of basis orbitals, such that pos(mu) is theposition, in the auxiliary matrix, of the matrix elementsof orbital mu (or zero if they have not been copied toit). If there are new non-zero orbitals at the next gridpoints, we keep copying them into the auxiliary matrix,until all its naux slots are full, at which point we erase itand restart the process. Since succesive grid points tendto contain the same non-zero basis orbitals, these copiesand erasures are not frequent.
1 L. Greengard, Science 265, 909 (1994).2 R. W. Hockney and J. W. Eastwood, Computer Simulation
Using Particles (IOP Publishing, Bristol, 1988).3 R. Car and M. Parrinello, Phys. Rev. Lett. 55, 2471 (1985).4 C. M. Goringe, D. R. Bowler, and E. Hernandez, Rep.
Prog. Phys. 60, 1447 (1997).5 P. Ordejon, Comp. Mat. Sci. 12, 157 (1998).6 P. Ordejon, D. A. Drabold, M. P. Grumbach, and R. M.
Martin, Phys. Rev. B 48, 14646 (1993).7 O. F. Sankey and D. J. Niklewski, Phys. Rev. B 40, 3979
(1989).8 M. C. Payne, M. P. Teter, D. C. Allan, T. A. Arias, and
J. D. Joannopoulos, Rev. Mod. Phys. 64, 1045 (1992).9 S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999).
10 E. Hernandez and M. J. Gillan, Phys. Rev. B 51, 10157(1995).
11 P. Ordejon, E. Artacho, and J. M. Soler, Phys. Rev. B 53,R10441 (1996).
12 D. Sanchez-Portal, P. Ordejon, E. Artacho, and J. M.Soler, Int. J. Quantum Chem. 65, 453 (1997).
13 P. Ordejon, Physica Status Solidi (b) 217, 335 (2000), seealso http://www.uam.es/siesta.
14 W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).15 J. P. Perdew and A. Zunger, Phys. Rev. B 23, 5048 (1981).16 J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev.
Lett. 77, 3865 (1996).17 D. R. Hamann, M. Schluter, and C. Chiang, Phys. Rev.
Lett. 43, 1494 (1979).18 G. B. Bachelet, D. R. Hamann, and M. Schluter, Phys.
Rev. B 26, 4199 (1982).19 L. Kleinman and D. M. Bylander, Phys. Rev. Lett. 48,
1425 (1982).20 S. G. Louie, S. Froyen, and M. L. Cohen, Phys. Rev. B 26,
1738 (1982).21 L. Kleinman, Phys. Rev. B 21, 2630 (1980).22 G. B. Bachelet and M. Schluter, Phys. Rev. B 25, 2103
(1982).23 N. Troullier and J. L. Martins, Phys. Rev. B 43, 1993
(1991).24 P. E. Blochl, Phys. Rev. B 41, 5414 (1990).25 N. J. Ramer and A. M. Rappe, Phys. Rev. B 59, 12471
(1999).26 D. Vanderbilt, Phys. Rev. B 32, 8412 (1985).27 The local potentials constructed in this way usually have a
strength (depth) that is an average of the different Vlās and
neither too deep nor too shallow. This tends to maintainthe separable potentials free of ghost states86.
28 For some atoms, tipically those with semicore statessuitable to be treated together with the valence states,Vl(r) only assumes the asymptotic coulombic behaviourā2Zval/r, and therefore only cancels out exactly with ourVlocal(r), for r larger than certain rC > rcore. In thesecases, to avoid very extended KB projector functions, wegenerate the local potentials with a prescription differ-ent from that presented in the text: if rC > 1.3 rcore
we take, Vlocal(r) = Vl(r) for r > rcore and, Vlocal(r) =exp(v1 +v2r
2+v3r3) for r < rcore, where v1, v2, and v3 are
determined by enforcing the continuity of the potential upto the second derivative. This simple prescription usuallyproduces smooth local potentials with similar propertiesto those commented in the text (see note 27).
29 If there are both semicore and valence electrons with thesame angular momentum, the pseudopotential is generatedfor an ion.
30 D. Vanderbilt, Phys. Rev. B 41, 7892 (1990).31 W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P.
Flannery, Numerical Recipes (Cambridge University Press,Cambridge, 1992).
32 D. Sanchez-Portal, J. M. Soler, and E. Artacho, J. Phys.:Condens. Matter 8, 3859 (1996).
33 G. Lippert, J. Hutter, P. Ballone, and M. Parrinello, J.Phys. Chem. 100, 6231 (1996).
34 E. Artacho, D. Sanchez-Portal, P. Ordejon, A. Garcia, andJ. M. Soler, Phys. Stat. Sol. (b) 215, 809 (1999).
35 S. Huzinaga et al., Gaussian basis sets for molecular cal-culations (Elsevier Science, Berlin, 1984).
36 J. Junquera, O. Paz, D. Sanchez-Portal, and E. Artacho(2001), cond-mat/0104170. To appear in Phys. Rev. B.
37 C. Filippi, D. J. Singh, and C. J. Umrigar, Phys. Rev. B50, 14947 (1994).
38 C. Kittel, Introduction to Solid State Physics (John Wileyand Sons, New York, 1986).
39 Some integrals, like ćĻIlmn|VNA
I |ĻIā²lā²mā²nā²ć could alsobe calculated in this way, but this is not the caseof ćĻIlmn|V
NA
Iā²ā² |ĻIā²lā²mā²nā²ć, which involve rather cum-bersome three-center integrals of arbitary numericalfunctions. Therefore, it is simpler to find the totalneutral-atom potential and to calculate a single integralćĻIlmn|V
NA(r)|ĻIā²lā²mā²nā²ć in the uniform spatial grid.40 J. D. Jackson, Classical Electrodynamics (John Wiley &
22
Sons, New York, 1962).41 Notice that our grid cutoff to represent the density is not
directly comparable to the energy cutoff in the context ofplane wave codes, which usually refers to the wavefunc-tions. Strictly speaking, the density requires a value fourtimes larger.
42 L. C. Balbas, J. L. Martins, and J. M. Soler, Phys. Rev. B64, 165110 (2001).
43 G. Makov and M. C. Payne, Phys. Rev. B 51, 4014 (1995).44 L. M. Sandratskii and P. G. Guletskii, J. Phys. F: Metal
Phys. 16, L43 (1986).45 J. Kubler, K. H. Hock, J. Sticht, and A. R. Williams, J.
Appl. Phys 63, 3482 (1988).46 T. Oda, A. Pasquarello, and R. Car, Phys. Rev. Lett. 80,
3622 (1998).47 A. V. Postnikov, P. Engel, and J. M. Soler (2001), cond-
mat/0109540. Submitted to Phys. Rev. B.48 J. Moreno and J. M. Soler, Phys. Rev. B 45, 13891 (1992).49 H. J. Monkhorst and J. D. Pack, Phys. Rev. B 13, 5188
(1976).50 N. D. Mermin, Phys. Rev. B 137, A1441 (1965).51 J. Harris, Phys. Rev. B 31, 1770 (1985).52 W. M. C. Foulkes and R. Haydock, Phys. Rev. B 39, 12520
(1989).53 R. D. King-Smith and D. Vanderbilt, Phys. Rev. B 47,
1651 (1993).54 R. Resta, Rev. Mod. Phys. 66, 899 (1994).55 G. Saghi-Szabo, R. E. Cohen, and H. Krakauer, Phys. Rev.
Lett. 80, 4321 (1998).56 D. Vanderbilt, J. Phys. Chem. Solids 61, 147 (2000).57 D. Sanchez-Portal, I. Souza, and R. M. Martin, in Fun-
damental Physics of Ferroelectrics, edited by R. Cohen(American Institute of Physics, Melville, 2000), vol. 535of AIP Conference Proceedings, pp. 111ā120.
58 S. DallāOlio and R. Dovesi, Phys. Rev. B 56, 10105 (1997).59 E. Yaschenko, L. Fu, L. Resca, and R. Resta, Phys. Rev.
B 58, 1222 (1998).60 F. Mauri, G. Galli, and R. Car, Phys. Rev. B 47, 9973
(1993).61 J. Kim, F. Mauri, and G. Galli, Phys. Rev. B 52, 1640
(1995).62 U. Stephan, D. A. Drabold, and R. M. Martin, Phys. Rev.
B 58, 13472 (1998).63 W. Kohn, Phys. Rev. 115, 809 (1959).64 E. Anderson, Z. Bai, C. Bischof, L. S. Blackford, J. Dem-
mel, J. Dongarra, J. D. Croz, A. Greenbaum, S. Hammar-ling, A. McKenney, et al., LAPACK Usersā Guide (SIAM,Philadelphia, 1999).
65 P. Pulay, Chem. Phys. Lett. 73, 393 (1980).66 P. Pulay, J. Comp. Chem. 13, 556 (1982).67 M. P. Allen and D. J. Tildesley, Computer Simulation of
Liquids (Oxford Univ. Press, Oxford, 1987).
68 A. Garcia and J. M. Soler (2001), unpublished.69 K. Parlinski (2001), unpublished.70 J. M. Pruneda, S. Estreicher, J. Junquera, J. Ferrer, and
P. Ordejon (2001), cond-mat/0109306. To appear in Phys.Rev. B.
71 M. Brandbyge, K. Stokbro, J. Taylor, J.-L. Mozos, andP. Ordejon, Mat. Res. Soc. Symp. Proc. 636, D9.25.1(2001).
72 E. N. Economou, Greenās Functions in Quantum Physics(Springer-Verlag, Berlin, 1983).
73 A. J. Read and R. J. Needs, Phys. Rev. B 44, 13071 (1991).74 E. K. U. Gross, C. A. Ullrich, and U. J. Gossmann, in
Density Functional Theory, edited by E. K. U. Gross andR. M. Dreizler (Plenum Press, New York, 1995), pp. 149ā171.
75 E. K. U. Gross, J. F. Dobson, and M. Petersilka, in Den-sity Functional Theory II: Relativistic and Time DependentExtensions, edited by R. F. Nalewajski (Springer-Verlag,Berlin-Heidelberg, 1996), vol. 181 of Topics in CurrentChemistry, pp. 81ā172.
76 K. Yabana and G. F. Bertsch, Phys. Rev. B 54, 4484(1996).
77 A. Tsolakidis, D. Sanchez-Portal, and R. M. Martin (2001),cond-mat/0109488.
78 P. Ordejon, D. A. Drabold, R. M. Martin, and M. P. Grum-bach, Phys. Rev. B 51, 1456 (1995).
79 B. W. Suter, IEEE Transactions on Signal Processing 39,532 (1991).
80 A. A. Mohsen and E. A. Hashish, Geophysical Prospecting42, 131 (1994).
81 T. Ferrari, D. Perciante, and A. Dubra, Journal of theOptical Society of America A 16, 2581 (1999).
82 M. Abramowitz and I. A. Stegun, Handbook of Mathemath-ical Functions (Dover Publications, New York, 1964).
83 K. Binder and D. W. Heermann, Monte Carlo Simulationin Statistical Physics (Springer Verlag, Berlin, 1992).
84 In the present example, further savings can be achieved byextending the arrays f(i) and Lf(i) themselves, eliminatingthe index translations and facilitating the paralelization.Another efficient possibility is provided in Fortran90 bythe intrinsic cshift operation. However, these approachesare more complicated in other cases, like atomic-neighborlist constructions, while the double index algorithm is quitegeneral.
85 In Fortran, we alternate the order of i and k, to store therow elements consecutively in memory. If the number ofnonzero elements fluctuates largely for different rows, wealso store all the rows consecutively as a large single vector.
86 X. Gonze, R. Stumpf, and M. Scheffler, Phys. Rev. B 44,8503 (1991).