GYRO on the Cray X1E: High Performance on a 5D Code with...
Transcript of GYRO on the Cray X1E: High Performance on a 5D Code with...
1 GYRO on the Cray X1E
GYRO on the Cray X1E: High Performance on a5D Code with Extreme Bandwidth Requirements
J. CandyGeneral Atomics, San Diego, CA
Fall Creek Falls Conference17-18 Oct 2005
Acknowledgements:
GA: R. Waltz, C. Estrada-Mila, J. KinseyORNL: M. Fahey, P. Worley
2 GYRO on the Cray X1E
GYRO on the Cray X1E: An Technical Overview
Our chief weapon is surprise ... surprise and fear ... fear and surprise ...
Our two weapons are fear and surprise ... and ruthless efficiency ...
Our three weapons are fear, surprise, and ruthless efficiency ...
and an almost fanatical devotion to the Pope ...
Our four ... no ...
Amongst our weapons....
Amongst our weaponry ... are such elements as fear, surprise ...
3 GYRO on the Cray X1E
GYRO on the Cray X1E: An Technical Overview
• GYRO
– computes turbulent flux of particles and energy in a tokamak
– solves the gyrokinetic equations
– uses a 5-D Eulerian grid
– was developed at General Atomics, starting in late 1999.
– is the most comprehensive code of its kind
4 GYRO on the Cray X1E
GYRO on the Cray X1E: An Technical Overview
• Port to the ORNL Cray X1 done by Mark Fahey in 2003
– with modest effort, the code vectorized well
• Exhaustive performance tests published by Worley, Fahey and others.
– GYRO loop structure and memory access patterns are conducive to
good performance
– high X1E communication bandwidth on the allows excellent GYRO
scalability
7 GYRO on the Cray X1E
Very Good Performance on the Original X1Some Data Courtesy of Mark Fahey (ORNL)
16 32 64 128 256 512Number of processors
0.5
1.0
2.0
4.0
8.0
16.0
32.0ra
te (n
orm
aliz
ed to
32
Pow
er3)
IBM Power3 (seaborg)
IBM Power4
SGI Altix
Cray X1
Opteron 250 IB
8 GYRO on the Cray X1E
Very Good (non-FFT) Performance on the X1EResults Courtesy of Pat Worley (ORNL)
9 GYRO on the Cray X1E
Symbolic Distribution Scheme:Each box is a processor
f([n], {e, k}, i, j)
n.= (spectral) binormal direction
i.= radial direction
j.= parallel direction
{e, k}.= 2D velocity space
{e, k}1 {e, k}2 {e, k}3 {e, k}4
{e, k}5 {e, k}6 {e, k}7
n1, n2, n3 i, j i, j i, j i, j
n4, n5, n6 i, j i, j i, j i, j
n7, n8, n9 i, j i, j i, j i, j
n10, n11, n12 i, j i, j i, j i, j
10 GYRO on the Cray X1E
All-to-All Communication only on Subgroups
{e, k}1 {e, k}1
{e, k}5 {e, k}5
n1, n2, n3 i, j i, n j1, j5
n4, n4, n5, i, j i, n j2, j6
n6, n7, n8 i, j i, n j3, j7
n9, n10, n11 i, j i, n j4, j8
f([n], {e, k}, i, j)
→ f([j], {e, k}, i, n)
{e, k}1 {e, k}2 {e, k}3 {e, k}4
{e, k}5 {e, k}6 {e, k}7
n1, n2, n3 i, j i, j i, j i, j
n1, n2, n3 k, j k, j k, j k, j
{i, e}1 {i, e}2 {i, e}3 {i, e}4
{i, e}5 {i, e}6
f([n], {e, k}, i, j)
→ f([n], {i, e}, k, j)
11 GYRO on the Cray X1E
Matching Experimental Power Flows inDIII-D L-mode Plasmas
Phys. Rev. Lett 11 (2003) 045001.
Simulations can match between-
shot transport scaling, and can
match power flows within a
factor of two.
Most realistic gyrokinetic simu-
lations ever published. 250 300 350 4001/ρ∗
0
2
4
6
8
10
12
χ eff/
χ GB a
t r/a
=0.6
Full Physicsno ExB
νei = 0β=0
experiment
Full physics (-5%)
12 GYRO on the Cray X1E
The Local Limit of Global Simulations
Phys. Plasmas 11 (2004) L25.
Local simulations are the rigor-
ous limit of global simulations
as the gyroradius to system size
decreases.
The largest case is Worley’s
B3-GTC benchmark case. 0.2 0.4 0.6 0.8r/a
0.0
0.5
1.0
1.5
2.0
2.5
3.0
χ i [u
nits
of (
c s/a
)ρs2 ]
1/ρ∗ = 5001/ρ∗ = 3751/ρ∗ = 2101/ρ∗ = 105
GTC
GS2
PG3EQ GYRO
13 GYRO on the Cray X1E
Transport in the Vicinity of a Minimum-q Surface
Phys. Plasmas 11 (2004) 1879.
Transport is smooth across
a minimum-q surface. Local and
global results give consistent
picture.
Done exclusively on the X1. 0.2 0.3 0.4 0.5 0.6 0.7 0.8r/a
0.0
0.5
1.0
1.5
2.0
χ i [u
nits
of (
c s/a
)ρs2 ]
Flux-tube
ρ* = 0.0025
14 GYRO on the Cray X1E
Particle Transport in Reactors: Helium Pinch andDeuterum-Tritium flow Separation
Phys. Plasmas 12 (2005) 022305.
Discovered particle flow sep-
aration effect in D-T (reactor)
plasmas. Tritium (T) is better
confined than Deuterium (D).
Developed analytic theory via
FLR symmetry breaking.
First systematic gyrokinetic study
of impurity transport.
1.5 2.0 2.5 3.0 3.5 4.0 4.51/LTe
0
5
10
15
χ i /
χ GB
D
Pure D50% D50% T
(a)
1.5 2.0 2.5 3.0 3.5 4.0 4.51/LTe
-5
-4
-3
-2
-1
0
Di /
χG
BD
Pure D50% D50% T
(b)
15 GYRO on the Cray X1E
ExB Shear Stabilization of Turbulence
Phys. Plasmas 12 (2005) 062302.
In the DIII-D tokamak, equi-
librium sheared E×B rotation
is a powerful stabilizing mecha-
nism.
First and only systematic gy-
rokinetic study of E × B shear
stabilization with kinetic electrons.
16 GYRO on the Cray X1E
Electron Transport via Electromagnetic Fluctuations
Phys. Plasmas 12 (2005) 072307.
Electrons are strongly resonant
at rational magnetic surfaces.
Also, 50% of electron transport
is driven by magnetic-flutter at
β/βcrit = 0.6.
Done exclusively on the X1.1.36 1.38 1.40 1.42 1.44
q
1
2
3
4
d/dr
(ln T
e)
βe=0.1% βe=0.4% βe=0.7%
17 GYRO on the Cray X1E
The Connection between Entropy, Upwind Dissipation,Velocity-Space Resolution and Steady-States of Turbulence
Submitted to Phys. Plasmas.
Study performed to clarify various
misconceptions and misunder-
standings about Eulerian methods.
Results show that there is no miss-
ing velocity-space structure in
GYRO.
GYRO velocity-space integration is
extremely accurate, and simulations
are noiseless
32 64 128 256 512Velocity-space points (PPC)
140
160
180
200
220
240
Entropy (S+W)
32 64 128 256 512Velocity-space points (PPC)
8
10
12
14
16
Flux (F)
18 GYRO on the Cray X1E
Demonstration that the Parallel Nonlinearity is Negligible
Even at unphysically large val-
ues of ρ∗ = ρi/a, no effect of
the PNL is observed. This result
is essentially a requirement of
the gyrokinetic ordering.
Unlikely that current codes could
treat the term accurately if it was
important.0.001 0.010
ρi/a
0
1
2
3
4
5
χ i/χ
GB
PNL onPNL off
19 GYRO on the Cray X1E
Benchmarking Eulerian verus PIC codes onETG Simulations (adiabatic ion dynamics)
Submitted to Phys. Plasmas.
PIC Simulations of ETG are dom-
inated by discrete particle noise
in “strong transport” regime.
Benchmarking effort show per-
fect intercode agreement when
noise is under control.
GYRO (Eulerian, red) versus
PG3EQ (PIC, black). Blue curve
shows the “pure noise” diagnostic.
20 GYRO on the Cray X1E
Summary of Results and Requirements
• Most of the previous simulations/publications were done using
only 16-64 MSPs
• GYRO algorithms are powerful enough to make most ITG-only, ITG-TEM,
or ETG-only studies quick and easy
• Each publication typically presents from 10 to 100 simulations
Are there problems for which 64 MSPs are not enough?
21 GYRO on the Cray X1E
The Future: Fully-Coupled ITG-ETG Studies
• We want to attempt
simulation of ETG turbulence embedded in ITG turbulence
• Preliminary testing on 640 X1E MSPs shows
– µ.=
√
mi/me = 20 works with 0 ≤ kθρe ≤ 0.3
– implies that µ = 30 runs with 0 ≤ kθρe ≤ 0.6 are feasible
– cost scales roughly as µ3.5