ERDC MSRC Resource, Spring 2006


Contents

from the director

Hurricane Katrina Researchers Tap High Performance Computing Resources
By Wayne Stroupe, Interagency Performance Evaluation Task Force Public Affairs

Governor Haley Barbour Dedicates Most Powerful Supercomputer in Department of Defense
By Rose J. Dykes

ERDC Cray XT3 Proves Vital in the Certification of the Miniature Air-Launched Decoy
By Dr. Nathan Prewitt and Phil Bucci

Nonlinear Response of Materials to Large Electric Fields: A Capabilities Application Project on the Cray XT3
By Dr. James Freericks, Department of Physics, Georgetown University

Tuning Codes for Performance on the Cray XT3
By Drs. Sam B. Cable and Thomas C. Oppe

Cray XT3 Input/Output
By Bobby Hunter and Dr. Jeff Hensley

Porting Codes to Sapphire
By Robert Alter

Launch of ezHPC
By Scotty Swillie

visitors

acronyms

training schedule


Nonlinear Response of Materials to Large Electric Fields: A Capabilities Application Project on the Cray XT3
By Dr. James Freericks, Department of Physics, Georgetown University

Nonlinear effects in nature are often the most important and most interesting effects in science. Many electronic devices either have their operation based on their nonlinear characteristics (like the transistor) or have the regime where they stop operating properly determined by where nonlinear effects become too strong (like intermodulation distortion in passive microwave filters). There is much interest, from both a basic science and an applications perspective, in understanding nonlinear effects in strongly correlated materials. This is because strongly correlated materials have many properties that can be tuned by simply applying pressure, changing the temperature, or by chemical doping. Examples of these materials include the high-temperature superconductors, rare-earth magnets, Kondo-effect and heavy-fermion materials, Mott insulators, and so on. Strongly correlated materials are characterized by having a strong enough electron-electron interaction that the independent-particle picture (i.e., band theory) does not apply, and their behavior is determined by complex quantum-mechanical dynamics. The most interesting materials are those that lie close to a phase boundary between different types of materials, like the boundary between a metal and an insulator, or between a magnetic and a nonmagnetic system.

As the military becomes more and more “high-tech,” it relies on electronics in both the analog and digital realms for all forms of operation. These electronic devices are susceptible to attack or damage from large electric fields (either from natural sources or from high-energy, pulsed-beam weapons). In addition, the military is moving toward “smart” electronics, which have elements that can be tuned “on the fly” to respond to different needs of the user. Strongly correlated materials, because of their inherent high tunability, are solid candidates for use in such devices. Unfortunately, little is known about their response to these kinds of electric fields, and modeling can provide useful answers to help engineer and design the next generation of smart electronics devices.

This Capabilities Application Project (CAP) provides the first numerically exact solution to this long-standing problem.

Strongly correlated materials require the solution of the so-called many-body problem to understand their behavior. For years, the many-body problem was viewed to be essentially intractable, but significant progress has been made over the past two decades as computational algorithms became more sophisticated, and as new ideas emerged for successfully solving the quantum-mechanical problem. Dynamical mean-field theory [1] was introduced in 1989 as a new way to approach this problem. In the ensuing 17 years, it has been employed to solve most of the “classic” models in many-body physics (Hubbard, periodic Anderson, Falicov-Kimball, Holstein, etc.) [2] in equilibrium. These solutions have provided tremendous insight into the Mott metal-insulator transition and the role of strong electron correlations in real materials.

Our project extended this highly successful approach from the equilibrium case to the nonequilibrium case (in the presence of an external electric field), which allows the nonlinear response of the material to be calculated. The formalism for solving the nonequilibrium many-body problem (in an arbitrary-strength external field) was worked out in the early 1960s [3,4]. We solve the simplest many-body model that illustrates this behavior. The model has two kinds of particles on a crystal lattice: mobile electrons, which can move from one lattice site to the nearest neighbor site, and localized electrons, which do not move [5].

Guest writer Dr. James Freericks received his bachelor’s degree (summa cum laude) in Physics from Princeton University in 1985 and his Ph.D. in Physics from the University of California, Berkeley, in 1991. After spending time as a postdoctoral fellow at UC’s Santa Barbara and Davis campuses, he went to Georgetown University, where he has been a Professor of Physics since 1994. He has been funded by the Office of Naval Research, starting with a Young Investigator Program Award, since 1996, and has been involved in high performance computing on DoD machines since 2001. This was his first CAP.


When a mobile electron sits on the same site as a localized electron, there is an energy cost (of magnitude U) because of the mutual electrical repulsion between the particles. If U is large enough, and the number of mobile plus localized electrons equals the number of lattice sites in the crystal, then all electrons become frozen, and the material cannot conduct electricity – the system undergoes a metal-to-insulator transition as a function of U.
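For readers who want the model pinned down explicitly, this is the spinless Falicov-Kimball model of Ref. [5]. The article does not display the Hamiltonian, so the following is written in conventional second-quantized notation as an orientation aid, with the hopping scaled in the way that is customary in the infinite-dimensional limit used by dynamical mean-field theory:

\[
H \;=\; -\frac{t^{*}}{2\sqrt{d}} \sum_{\langle i j \rangle} \bigl( c_i^{\dagger} c_j + c_j^{\dagger} c_i \bigr)
\;+\; U \sum_i c_i^{\dagger} c_i \, f_i^{\dagger} f_i
\;-\; \mu \sum_i c_i^{\dagger} c_i ,
\]

where \(c_i^{\dagger}\) creates a mobile electron on lattice site \(i\), \(f_i^{\dagger} f_i\) (equal to 0 or 1) counts the localized electron on that site, \(U\) is the on-site repulsion discussed above, and \(\mu\) is the chemical potential.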

In nonequilibrium calculations, we need to work with objects called Green’s functions, which describe the distribution in energy of available quantum states in the system, and the statistical occupation of those states as they are driven by the external electric field. Our formulation works in real time, where the transient effects are included, and the steady-state response only builds up after a long period. The Green’s functions are continuous matrix operators defined on the so-called Kadanoff-Baym-Keldysh contour, which involves evolving the system forward in time from some starting point to some ending point, and then de-evolving the system backward in time from the ending point to the starting point. One of the numerical challenges of this approach is that we need to discretize the continuous matrix operator so we can perform matrix operations on it via computer-based linear-algebra techniques. We are limited, by computational time (and to some degree by memory), in how large a matrix we can use in the calculations, which ultimately limits how far out in time the calculations can go.

The formalism is straightforward, but complex. The noninteracting solution in an arbitrarily large electric field is known [6], so we need to add in the effects of the electron-electron interactions. We use the dynamical mean-field theory approach to do this, which requires us to solve a set of nonlinear matrix-valued equations self-consistently.
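In the standard notation of nonequilibrium many-body theory (the article does not write the formula out, so this is added only for orientation), the central object is the contour-ordered Green’s function

\[
G^{c}(t,t') \;=\; -\,i\,\bigl\langle T_{c}\, c(t)\, c^{\dagger}(t') \bigr\rangle ,
\]

where \(t\) and \(t'\) both live on the Kadanoff-Baym-Keldysh contour described above, \(T_{c}\) orders operators along that contour, and the average is taken in the initial equilibrium state. Discretizing each branch of the contour with a time step \(\Delta t\) turns \(G^{c}\) into the large, dense, complex matrix that the calculation manipulates; the matrix dimension grows as the time window is lengthened or \(\Delta t\) is reduced, which is exactly the memory and run-time limitation just mentioned.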

Describing the quantum-mechanical basis for this algorithm is beyond the scope of this article. Instead we focus here on the numerical issues and the use of the XT3 for large-scale parallel programs. The most challenging numerical part of the algorithm involves evaluating a two-dimensional Gaussian-weighted matrix-valued integral to determine the Green’s function. The integrand requires a matrix inversion and a matrix multiplication to determine its value (the matrix is a general complex matrix). We use Gaussian integration and average the results of discretizations of 100 points per dimension and 101 points per dimension, for a total of 20,201 quadrature points, and hence 20,201 matrix inversions and multiplications per iteration step. Since the matrices in the integrand are independent of one another, this part of the algorithm is easily parallelized in the master/slave format, by sending the matrix inversions to different slave nodes. After the integration is completed, the master node needs to perform an additional four matrix inversions to complete one step of the iteration. These calculations must proceed in a serial fashion, because results from the previous calculation are required for the next. Because of this structure, we expect the computational time for the code to be expressed as a sum of these two pieces, T = T0 + T1/N, where T0 is the time for the serial part of the code, T1 the total time for the parallel part, and N the number of slave nodes during the run.
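The article does not include source code, so the following is only a rough Python/mpi4py sketch of the master/slave quadrature step described above. The integrand, matrix size, and script name are invented placeholders; the production code evaluates the real Gaussian-weighted, matrix-valued integrand and was presumably written for the XT3's compilers rather than in Python.

# Hypothetical sketch of the master/slave quadrature step (not the production code).
# Run with at least 2 MPI ranks, e.g.:  mpiexec -n 8 python quadrature_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

NPTS = 100 * 100 + 101 * 101   # 20,201 quadrature points, as in the article
DIM = 64                       # toy matrix size; production runs used 1700x1700 and larger

def integrand_contribution(k):
    """Placeholder for the real integrand: build a complex matrix for quadrature
    point k, invert it, multiply, and apply a stand-in Gaussian weight."""
    rng = np.random.default_rng(k)
    a = rng.standard_normal((DIM, DIM)) + 1j * rng.standard_normal((DIM, DIM))
    a += DIM * np.eye(DIM)                      # keep the toy matrix well conditioned
    weight = np.exp(-0.5 * (k / NPTS) ** 2)     # stand-in for the Gaussian weight
    return weight * (np.linalg.inv(a) @ a.conj().T)

local = np.zeros((DIM, DIM), dtype=complex)
if rank > 0:
    # Slave nodes: each handles every (size-1)-th quadrature point.
    for k in range(rank - 1, NPTS, size - 1):
        local += integrand_contribution(k)

total = np.zeros((DIM, DIM), dtype=complex) if rank == 0 else None
comm.Reduce(local, total, op=MPI.SUM, root=0)   # accumulate partial sums on the master

if rank == 0:
    # Serial part of one iteration: the master performs a few more inversions here.
    print("norm of accumulated integral:", np.linalg.norm(total))

The accumulation step here leans on MPI's built-in reduce for brevity; the hand-tuned recursive binary gather that the article describes next addresses that same accumulation, and a sketch of it follows that discussion.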

Figure 1. Run-time scaling analysis for a medium-sized code on the Cray XT3. The black symbols are the original code; the red are the communications-tuned code. The green is the fit to the expected form because of the serial part of the code. The blue is the limiting result for N = 10,201 processors or more.



Of course, during the time that the serial part of the code is running, the slave nodes are idle, and the relative fraction of time for this part of the code increases as the number of nodes increases. Hence the code cannot display complete linear scale-up with the number of nodes, but we will see that we are able to perform a linear scale-up of the parallel part of the code up to the maximal number of nodes that it makes sense to run on. Since the serial part of the code is at most only about half of the total computational time (and often is much less), we run with an efficiency of 50 percent or more.
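One way to make the 50 percent figure concrete (this reading is ours; the article does not spell out the algebra) is to measure efficiency against perfect scaling of the parallel work alone. With the run-time model above,

\[
\varepsilon(N) \;=\; \frac{T_1/N}{T_0 + T_1/N},
\]

so whenever the serial time satisfies \(T_0 \le T_1/N\), i.e., the serial part is at most half of the wall-clock time, the efficiency obeys \(\varepsilon(N) \ge 1/2\). Pushing \(N\) so high that \(T_0\) dominates drives \(\varepsilon\) toward zero, which is the rollover discussed below and visible in Figure 1.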

The code is quite robust, having been run on Beowulf-type machines, the T3E, and the X1 prior to being ported to the XT3, so it was believed that little would be required to get the code to scale well to large numbers of slave nodes. This turned out not to be true, as can be seen from our scaling attempts on a medium-sized problem of 1700×1700 matrices, shown with the black symbols in Figure 1. Once we reached about 1,000 processors, the code stopped scaling with the number of slave nodes. We quickly realized the problem was a communications bottleneck caused by using a many-to-one communications method to send the matrix results from each slave node to the master node for accumulation of the quadrature. This problem was easily fixed by using a recursive-binary-gather operation to accumulate results on the nodes, in turn, and then send the final results to the master node. One can see that the scaling behavior returns by looking at the red symbols for the communications-tuned code in Figure 1. The recursive gather scheme involves taking all the active slave nodes, dividing them in half, and sending the results from one half of the nodes to the other half. Once a slave node has sent its results, it becomes inactive. The process is repeated until only one active slave node remains. At this point, it sends the accumulated results to the master node, and the serial part of the code starts.

The serial code takes the local Green’s function matrix (which resulted from the two-dimensional integration) and performs four more matrix inversions to construct the next approximation to the self-energy matrix. The system is checked for convergence, and if not converged, the iterations continue. Typically, we need between 10 and 50 iterations to reach convergence. Once the calculations have converged for the Green’s functions, we can directly extract the electron distribution functions. These data tell us how the electrons evolve in response to the electric field. In addition, we also investigate the current as a function of time, to see the transient effects of oscillations.
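To make the communication fix concrete, here is a minimal mpi4py sketch of a recursive binary gather in the spirit of the description above. It is our reconstruction, not the production code; the rank numbering (rank 0 as master, ranks 1 through size-1 as slaves), the matrix size, and the script name are assumptions for the example.

# Hypothetical sketch of the recursive-binary-gather accumulation (not the production code).
# Run with at least 2 MPI ranks, e.g.:  mpiexec -n 8 python gather_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
DIM = 64                                     # toy matrix size

def binary_gather(partial):
    """Fold the slave partial sums together by repeatedly halving the set of
    active slaves; the last active slave ships the total to the master (rank 0)."""
    active = size - 1                        # slaves are ranks 1 .. size-1
    while active > 1:
        half = (active + 1) // 2             # size of the surviving lower half
        if half < rank <= active:
            comm.Send(partial, dest=rank - half, tag=0)   # send result and go idle
            return
        if rank <= half and rank + half <= active:
            recv = np.empty_like(partial)
            comm.Recv(recv, source=rank + half, tag=0)    # fold in a partner's sum
            partial += recv
        active = half
    comm.Send(partial, dest=0, tag=1)        # only rank 1 reaches this point

if rank == 0:
    # Master node: wait for the accumulated quadrature, then start the serial step.
    total = np.empty((DIM, DIM), dtype=complex)
    comm.Recv(total, source=1, tag=1)
    print("sum of slave labels:", total[0, 0].real, "expected:", float(sum(range(1, size))))
else:
    # Each slave starts with its own partial sum (here just a matrix full of its rank).
    binary_gather(np.full((DIM, DIM), rank, dtype=complex))

In practice a single MPI reduce call achieves the same tree-shaped accumulation; the sketch spells the halving out only to mirror the description in the text.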

The results of the scaling analysis show that the code runs well on a large number of processors. The rollover as N becomes large occurs because of the serial part of the code. If we follow the initial slope, for the linearly scaling part of the curve, we can see that in virtually all results the efficiency is better than 50 percent; so choosing an optimal number of processors requires us to balance the efficiency factor against the actual clock time for the code. On the XT3, we find a medium-sized problem, with 1700×1700 matrices, takes about 1,100 hours per iteration, with between 15 and 30 iterations needed for convergence. Increasing the size to 2200×2200 increased the computational time to about 2,000 hours per iteration (because the matrix inversion is the limiting part of the code, the time should scale as the cube of the matrix size). We could not increase the problem size beyond this on the XT3, because the memory per node was just 1 GB at the time. Results for different discretization sizes need to be extrapolated to the limit where the discretization size goes to zero to achieve reliable results. We could show, by comparing to exact sum rules [7], that our results improved in accuracy from an average error of a few percent to less than one-tenth of a percent when we could extrapolate to the zero-discretization limit. Our CAP project used about 600,000 CPU hours for the scaling analysis and production runs over a period of about 6 weeks. We are currently preparing these results for publication in peer-reviewed journals.
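As a quick sanity check of the cubic-scaling remark (our arithmetic, not the article's): \((2200/1700)^3 \approx 2.2\), so the inversion cost alone would predict roughly \(1{,}100 \times 2.2 \approx 2{,}400\) hours per iteration for the larger problem, the same order of magnitude as the roughly 2,000 hours observed.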

References:

[1] W. Metzner and D. Vollhardt, “Correlated Lattice Fermions in d=∞ Dimensions,” Phys. Rev. Lett. 62, 324 (1989).

[2] A. Georges, G. Kotliar, W. Krauth, and M. J. Rozenberg, “Dynamical mean-field theory of strongly correlated Fermion systems and the limit of infinite dimensions,” Rev. Mod. Phys. 68, 13 (1996).

[3] L. P. Kadanoff and G. Baym, Quantum Statistical Mechanics (New York: Benjamin, 1962).

[4] L. V. Keldysh, “Diagram Technique for Nonequilibrium Processes,” J. Exptl. Theoret. Phys. (USSR) 47, 1515 (1964) [Sov. Phys. JETP 20, 1018 (1965)].

[5] L. M. Falicov and J. C. Kimball, “Simple model for semiconductor-metal transitions: SmB6 and transition-metal oxides,” Phys. Rev. Lett. 22, 997 (1969).

[6] V. Turkowski and J. K. Freericks, “Nonlinear response of Bloch electrons in infinite dimensions,” Phys. Rev. B 71, 085104 (2005).

[7] V. O. Turkowski and J. K. Freericks, “Spectral moment sum rules for strongly correlated electrons in time-dependent electric fields,” Phys. Rev. B 73, 075108 (2006).