GYRO on the Cray X1E: High Performance on a 5D Code with...

22
1 GYRO on the Cray X1E GYRO on the Cray X1E: High Performance on a 5D Code with Extreme Bandwidth Requirements J. Candy General Atomics, San Diego, CA Fall Creek Falls Conference 17-18 Oct 2005 Acknowledgements: GA: R. Waltz, C. Estrada-Mila, J. Kinsey ORNL: M. Fahey, P. Worley

Transcript of GYRO on the Cray X1E: High Performance on a 5D Code with...

1 GYRO on the Cray X1E

GYRO on the Cray X1E: High Performance on a5D Code with Extreme Bandwidth Requirements

J. CandyGeneral Atomics, San Diego, CA

Fall Creek Falls Conference17-18 Oct 2005

Acknowledgements:

GA: R. Waltz, C. Estrada-Mila, J. KinseyORNL: M. Fahey, P. Worley

2 GYRO on the Cray X1E

GYRO on the Cray X1E: An Technical Overview

Our chief weapon is surprise ... surprise and fear ... fear and surprise ...

Our two weapons are fear and surprise ... and ruthless efficiency ...

Our three weapons are fear, surprise, and ruthless efficiency ...

and an almost fanatical devotion to the Pope ...

Our four ... no ...

Amongst our weapons....

Amongst our weaponry ... are such elements as fear, surprise ...

3 GYRO on the Cray X1E

GYRO on the Cray X1E: An Technical Overview

• GYRO

– computes turbulent flux of particles and energy in a tokamak

– solves the gyrokinetic equations

– uses a 5-D Eulerian grid

– was developed at General Atomics, starting in late 1999.

– is the most comprehensive code of its kind

4 GYRO on the Cray X1E

GYRO on the Cray X1E: An Technical Overview

• Port to the ORNL Cray X1 done by Mark Fahey in 2003

– with modest effort, the code vectorized well

• Exhaustive performance tests published by Worley, Fahey and others.

– GYRO loop structure and memory access patterns are conducive to

good performance

– high X1E communication bandwidth on the allows excellent GYRO

scalability

5 GYRO on the Cray X1E

GYRO Overview: A 5D Eulerian Turbulence Code

6 GYRO on the Cray X1E

GYRO Overview: A 5D Eulerian Turbulence Code

7 GYRO on the Cray X1E

Very Good Performance on the Original X1Some Data Courtesy of Mark Fahey (ORNL)

16 32 64 128 256 512Number of processors

0.5

1.0

2.0

4.0

8.0

16.0

32.0ra

te (n

orm

aliz

ed to

32

Pow

er3)

IBM Power3 (seaborg)

IBM Power4

SGI Altix

Cray X1

Opteron 250 IB

8 GYRO on the Cray X1E

Very Good (non-FFT) Performance on the X1EResults Courtesy of Pat Worley (ORNL)

9 GYRO on the Cray X1E

Symbolic Distribution Scheme:Each box is a processor

f([n], {e, k}, i, j)

n.= (spectral) binormal direction

i.= radial direction

j.= parallel direction

{e, k}.= 2D velocity space

{e, k}1 {e, k}2 {e, k}3 {e, k}4

{e, k}5 {e, k}6 {e, k}7

n1, n2, n3 i, j i, j i, j i, j

n4, n5, n6 i, j i, j i, j i, j

n7, n8, n9 i, j i, j i, j i, j

n10, n11, n12 i, j i, j i, j i, j

10 GYRO on the Cray X1E

All-to-All Communication only on Subgroups

{e, k}1 {e, k}1

{e, k}5 {e, k}5

n1, n2, n3 i, j i, n j1, j5

n4, n4, n5, i, j i, n j2, j6

n6, n7, n8 i, j i, n j3, j7

n9, n10, n11 i, j i, n j4, j8

f([n], {e, k}, i, j)

→ f([j], {e, k}, i, n)

{e, k}1 {e, k}2 {e, k}3 {e, k}4

{e, k}5 {e, k}6 {e, k}7

n1, n2, n3 i, j i, j i, j i, j

n1, n2, n3 k, j k, j k, j k, j

{i, e}1 {i, e}2 {i, e}3 {i, e}4

{i, e}5 {i, e}6

f([n], {e, k}, i, j)

→ f([n], {i, e}, k, j)

11 GYRO on the Cray X1E

Matching Experimental Power Flows inDIII-D L-mode Plasmas

Phys. Rev. Lett 11 (2003) 045001.

Simulations can match between-

shot transport scaling, and can

match power flows within a

factor of two.

Most realistic gyrokinetic simu-

lations ever published. 250 300 350 4001/ρ∗

0

2

4

6

8

10

12

χ eff/

χ GB a

t r/a

=0.6

Full Physicsno ExB

νei = 0β=0

experiment

Full physics (-5%)

12 GYRO on the Cray X1E

The Local Limit of Global Simulations

Phys. Plasmas 11 (2004) L25.

Local simulations are the rigor-

ous limit of global simulations

as the gyroradius to system size

decreases.

The largest case is Worley’s

B3-GTC benchmark case. 0.2 0.4 0.6 0.8r/a

0.0

0.5

1.0

1.5

2.0

2.5

3.0

χ i [u

nits

of (

c s/a

)ρs2 ]

1/ρ∗ = 5001/ρ∗ = 3751/ρ∗ = 2101/ρ∗ = 105

GTC

GS2

PG3EQ GYRO

13 GYRO on the Cray X1E

Transport in the Vicinity of a Minimum-q Surface

Phys. Plasmas 11 (2004) 1879.

Transport is smooth across

a minimum-q surface. Local and

global results give consistent

picture.

Done exclusively on the X1. 0.2 0.3 0.4 0.5 0.6 0.7 0.8r/a

0.0

0.5

1.0

1.5

2.0

χ i [u

nits

of (

c s/a

)ρs2 ]

Flux-tube

ρ* = 0.0025

14 GYRO on the Cray X1E

Particle Transport in Reactors: Helium Pinch andDeuterum-Tritium flow Separation

Phys. Plasmas 12 (2005) 022305.

Discovered particle flow sep-

aration effect in D-T (reactor)

plasmas. Tritium (T) is better

confined than Deuterium (D).

Developed analytic theory via

FLR symmetry breaking.

First systematic gyrokinetic study

of impurity transport.

1.5 2.0 2.5 3.0 3.5 4.0 4.51/LTe

0

5

10

15

χ i /

χ GB

D

Pure D50% D50% T

(a)

1.5 2.0 2.5 3.0 3.5 4.0 4.51/LTe

-5

-4

-3

-2

-1

0

Di /

χG

BD

Pure D50% D50% T

(b)

15 GYRO on the Cray X1E

ExB Shear Stabilization of Turbulence

Phys. Plasmas 12 (2005) 062302.

In the DIII-D tokamak, equi-

librium sheared E×B rotation

is a powerful stabilizing mecha-

nism.

First and only systematic gy-

rokinetic study of E × B shear

stabilization with kinetic electrons.

16 GYRO on the Cray X1E

Electron Transport via Electromagnetic Fluctuations

Phys. Plasmas 12 (2005) 072307.

Electrons are strongly resonant

at rational magnetic surfaces.

Also, 50% of electron transport

is driven by magnetic-flutter at

β/βcrit = 0.6.

Done exclusively on the X1.1.36 1.38 1.40 1.42 1.44

q

1

2

3

4

d/dr

(ln T

e)

βe=0.1% βe=0.4% βe=0.7%

17 GYRO on the Cray X1E

The Connection between Entropy, Upwind Dissipation,Velocity-Space Resolution and Steady-States of Turbulence

Submitted to Phys. Plasmas.

Study performed to clarify various

misconceptions and misunder-

standings about Eulerian methods.

Results show that there is no miss-

ing velocity-space structure in

GYRO.

GYRO velocity-space integration is

extremely accurate, and simulations

are noiseless

32 64 128 256 512Velocity-space points (PPC)

140

160

180

200

220

240

Entropy (S+W)

32 64 128 256 512Velocity-space points (PPC)

8

10

12

14

16

Flux (F)

18 GYRO on the Cray X1E

Demonstration that the Parallel Nonlinearity is Negligible

Even at unphysically large val-

ues of ρ∗ = ρi/a, no effect of

the PNL is observed. This result

is essentially a requirement of

the gyrokinetic ordering.

Unlikely that current codes could

treat the term accurately if it was

important.0.001 0.010

ρi/a

0

1

2

3

4

5

χ i/χ

GB

PNL onPNL off

19 GYRO on the Cray X1E

Benchmarking Eulerian verus PIC codes onETG Simulations (adiabatic ion dynamics)

Submitted to Phys. Plasmas.

PIC Simulations of ETG are dom-

inated by discrete particle noise

in “strong transport” regime.

Benchmarking effort show per-

fect intercode agreement when

noise is under control.

GYRO (Eulerian, red) versus

PG3EQ (PIC, black). Blue curve

shows the “pure noise” diagnostic.

20 GYRO on the Cray X1E

Summary of Results and Requirements

• Most of the previous simulations/publications were done using

only 16-64 MSPs

• GYRO algorithms are powerful enough to make most ITG-only, ITG-TEM,

or ETG-only studies quick and easy

• Each publication typically presents from 10 to 100 simulations

Are there problems for which 64 MSPs are not enough?

21 GYRO on the Cray X1E

The Future: Fully-Coupled ITG-ETG Studies

• We want to attempt

simulation of ETG turbulence embedded in ITG turbulence

• Preliminary testing on 640 X1E MSPs shows

– µ.=

mi/me = 20 works with 0 ≤ kθρe ≤ 0.3

– implies that µ = 30 runs with 0 ≤ kθρe ≤ 0.6 are feasible

– cost scales roughly as µ3.5

22 GYRO on the Cray X1E

The Future: Fully-Coupled ITG-ETG Studies

This is our INCITE proposal; it will be described in detail at a talk at

ORNL on Wednesday