Recent advances in coupling in the GFDL Flexible Modeling ...

31
Recent advances in coupling in the GFDL Flexible Modeling System iCAS2015 Annecy France V. Balaji, Rusty Benson, Bruce Wyman, S-J. Lin, Alistair Adcroft and many others NOAA/GFDL and Princeton University 14 September 2015 V. Balaji ([email protected]) Component concurrency 14 September 2015 1 / 31

Transcript of Recent advances in coupling in the GFDL Flexible Modeling ...

Page 1: Recent advances in coupling in the GFDL Flexible Modeling ...

Recent advances in coupling in the GFDL FlexibleModeling System

iCAS2015Annecy France

V. Balaji, Rusty Benson, Bruce Wyman, S-J. Lin, Alistair Adcroftand many others

NOAA/GFDL and Princeton University

14 September 2015

V. Balaji ([email protected]) Component concurrency 14 September 2015 1 / 31

Page 2: Recent advances in coupling in the GFDL Flexible Modeling ...

Outline

1 Towards exascale with Earth System models

2 Adapting ESM architecture for scalability

3 Concurrent radiation

4 Other issuesConcurrent nestingIce-ocean boundary

5 Summary

V. Balaji ([email protected]) Component concurrency 14 September 2015 2 / 31

Page 3: Recent advances in coupling in the GFDL Flexible Modeling ...

Outline

1 Towards exascale with Earth System models

2 Adapting ESM architecture for scalability

3 Concurrent radiation

4 Other issuesConcurrent nestingIce-ocean boundary

5 Summary

V. Balaji ([email protected]) Component concurrency 14 September 2015 3 / 31

Page 4: Recent advances in coupling in the GFDL Flexible Modeling ...

Climate modeling, a computational profile

Intrinsic variability at all timescales from minutes to millennia;distinguishing natural from forced variability is a key challenge.coupled multi-scale multi-physics modeling;physics components have predictable data dependenciesassociated with grids;Adding processes and components improves scientificunderstanding;New physics and higher process fidelity at higher resolution;Ensemble methods to sample uncertainty (ICEs, PPEs, MMEs...)algorithms generally possess weak scalability.

In sum, climate modeling requires long-term integrations ofweakly-scaling I/O and memory-bound models of enormouscomplexity.

V. Balaji ([email protected]) Component concurrency 14 September 2015 4 / 31

Page 5: Recent advances in coupling in the GFDL Flexible Modeling ...

Earth System Model Architecture

Earth System Model

? ?? ?

Atmosphere Land Ice Ocean

? ?AtmDyn AtmPhy

? ? ?Rad H2O PBL

? ?OcnBio OcnClr

? ?LandBio LandH2O

Complexity, resolution, UQ: components can have their own grids,timesteps, algorithms, multiple concurrent instances.V. Balaji ([email protected]) Component concurrency 14 September 2015 5 / 31

Page 6: Recent advances in coupling in the GFDL Flexible Modeling ...

The hardware jungle

Upcoming hardware roadmap looks daunting! GPUs, MICs, DSPs,and many other TLAs. . .

Intel straight line: IvyBridge/SandyBridge, Haswell/Broadwell:“traditional” systems with threading and vectors.Intel knight’s move: Knights Corner, Knights Landing: MICs,thread/vector again, wider in thread space.Hosted dual-socket systems with GPUs: SIMD co-processors.BG/Q: CPU only with hardware threads, thread and vectorinstructions. No followon planned.ARM-based systems coming. (e.g with DSPs).FPGAs? some inroads in finance.Specialized processors: Anton for molecular dynamics, GRAPEfor astrophysics.

V. Balaji ([email protected]) Component concurrency 14 September 2015 6 / 31

Page 7: Recent advances in coupling in the GFDL Flexible Modeling ...

The software zoo

Exascale using nanosecond clocks implies billion-way concurrency!It is unlikely that we will program codes with 106− 109 MPI ranks: it willbe MPI+X. Solve for X . . .

CUDA and CUDA-Fortran: proprietary for NVIDIA GPUs. Invasiveand pervasive.OpenCL: proposed standard, not much penetration.ACC from Portland Group, now a new standard OpenACC.Potential OpenMP/OpenACC merging...?PGAS languages: Co-Array Fortran, UPC, a host of proprietarylanguages.Code generation:

Domain-specific languages (DSLs): e.g STELLA, Psy.Source-to-source translators.

V. Balaji ([email protected]) Component concurrency 14 September 2015 7 / 31

Page 8: Recent advances in coupling in the GFDL Flexible Modeling ...

GFDL between jungle and zoo

GFDL is taking a conservative approach:

it looks like it will be a mix of MPI, threads, and vectors.Developing a three-level abstraction for parallelism: components,domains, blocks. Kernels work on blocks and must havevectorizing inner loops.Recommendation: sit tight, make sure MPI+OpenMP works well,write vector-friendly loops, reduce memory footprint, offload I/O.Other concerns:

Irreproducible computationTools for analyzing performance.Debugging at scale.

Recent experience on Titan, Stampede and Mira reaffirm thisapproach.This talk will focus on coarse-grained parallelism at the componentlevel.

V. Balaji ([email protected]) Component concurrency 14 September 2015 8 / 31

Page 9: Recent advances in coupling in the GFDL Flexible Modeling ...

Outline

1 Towards exascale with Earth System models

2 Adapting ESM architecture for scalability

3 Concurrent radiation

4 Other issuesConcurrent nestingIce-ocean boundary

5 Summary

V. Balaji ([email protected]) Component concurrency 14 September 2015 9 / 31

Page 10: Recent advances in coupling in the GFDL Flexible Modeling ...

Earth System Model Architecture

Earth System Model

? ?? ?

Atmosphere Land Ice Ocean

? ?AtmDyn AtmPhy

? ? ?Rad H2O PBL

? ?OcnBio OcnClr

? ?LandBio LandH2O

Extending component parallelism to O(10) requires a different physicalarchitecture!V. Balaji ([email protected]) Component concurrency 14 September 2015 10 / 31

Page 11: Recent advances in coupling in the GFDL Flexible Modeling ...

Serial coupling

Uses a forward-backward timestep for coupling.

At+1 = At + f (Ot ) (1)Ot+1 = Ot + f (At+1) (2)

6

-

P

T

Ot

At+1

Ot+1

At+2

Ot+2

At+3

Ot+3

At+4

Ot+4

At+5

V. Balaji ([email protected]) Component concurrency 14 September 2015 11 / 31

Page 12: Recent advances in coupling in the GFDL Flexible Modeling ...

Concurrent coupling

This uses a forward-only timestep for coupling. While formally this isunconditionally unstable, the system is strongly damped∗. Answersvary with respect to serial coupling, as the ocean is now forced byatmospheric state from ∆t ago.

At+1 = At + f (Ot ) (3)Ot+1 = Ot + f (At ) (4)

6

-

P

T

Ot

At

Ot+1

At+1

Ot+2

At+2

Ot+3

At+3

Ot+4

At+4

V. Balaji ([email protected]) Component concurrency 14 September 2015 12 / 31

Page 13: Recent advances in coupling in the GFDL Flexible Modeling ...

Massively concurrent coupling

6

-

P

T

Ot

Lt

Bt

Ct

Rt

At

Ot+1

Lt+1

Bt+1

Ct+1

Rt+1

At+1

Ot+2

Lt+2

Bt+2

Ct+2

Rt+2

At+2

Components such as radiation, PBL, ocean biogeochemistry, eachcould run with its own grid, timestep, decomposition, even hardware.Coupler mediates state exchange.V. Balaji ([email protected]) Component concurrency 14 September 2015 13 / 31

Page 14: Recent advances in coupling in the GFDL Flexible Modeling ...

Outline

1 Towards exascale with Earth System models

2 Adapting ESM architecture for scalability

3 Concurrent radiation

4 Other issuesConcurrent nestingIce-ocean boundary

5 Summary

V. Balaji ([email protected]) Component concurrency 14 September 2015 14 / 31

Page 15: Recent advances in coupling in the GFDL Flexible Modeling ...

The radiation component

The atmospheric radiation component computes radiative transfer ofincoming shortwave solar fluxes and outgoing longwave radiation as afunction of all radiatively active species in the atmosphere (greenhousegases, aerosols, particulates, clouds, ...).

The physics of radiative transfer is relatively well-known, but a fullMie-scattering solution is computationally out of reach.Approximate methods (sampling the “line-by-line” calculation into“bands”) have been in use for decades, and “standard” packageslike RRTM are available.They are still very expensive: typically ∆trad > ∆tphy (in the GFDLmodels typically 9X). The model is sensitive to this ratio.Other methods: stochastic sampling of bands (Pincus andStevens 2013), neural nets (Krasnopolsky et al 2005)

Challenge: can we exploit “cheap flops” to set ∆trad = ∆tphy?

V. Balaji ([email protected]) Component concurrency 14 September 2015 15 / 31

Page 16: Recent advances in coupling in the GFDL Flexible Modeling ...

Traditional coupling sequence

Radiation timestep much longer than physics timestep.(Figure courtesy Rusty Benson, NOAA/GFDL).V. Balaji ([email protected]) Component concurrency 14 September 2015 16 / 31

Page 17: Recent advances in coupling in the GFDL Flexible Modeling ...

Proposed coupling sequence

Radiation executes on physics timestep from lagged state.(Figure courtesy Rusty Benson, NOAA/GFDL).

V. Balaji ([email protected]) Component concurrency 14 September 2015 17 / 31

Page 18: Recent advances in coupling in the GFDL Flexible Modeling ...

Proposed coupling sequence using pelists

Requires MPI communication between physics and radiation.(Figure courtesy Rusty Benson, NOAA/GFDL).V. Balaji ([email protected]) Component concurrency 14 September 2015 18 / 31

Page 19: Recent advances in coupling in the GFDL Flexible Modeling ...

Proposed coupling sequence: hybrid approach

Physics and radiation share memory.(Figure courtesy Rusty Benson, NOAA/GFDL).

V. Balaji ([email protected]) Component concurrency 14 September 2015 19 / 31

Page 20: Recent advances in coupling in the GFDL Flexible Modeling ...

Results from climate run

20 year AMIP/SST climate runs have completed on Gaea (Cray XE6).Control: 9.25 sypd

∆trad = 9∆tphy864 MPI-ranks / 2 OpenMP threads

Serial Radiation: 5.28 sypd∆trad = ∆tphy864 MPI-ranks / 2 OpenMP threads

Concurrent Radiation: 5.90 sypd∆trad = ∆tphy432 MPI-ranks / 4 OpenMP threads (2 atmos + 2 radiation)Can get back to 9 sypd at about ∼2700 cores (roughly 1.6X).

Comparison of Concurrent Radiation to Controlclimate is similarTOA balance is off by ∼ 4W/m2, mostly in the short wave, buteasily retuned when ready to deploy

Results presented at AMS (Benson et al 2015). Article in the works forGMD special issue on coupling.V. Balaji ([email protected]) Component concurrency 14 September 2015 20 / 31

Page 21: Recent advances in coupling in the GFDL Flexible Modeling ...

Outline

1 Towards exascale with Earth System models

2 Adapting ESM architecture for scalability

3 Concurrent radiation

4 Other issuesConcurrent nestingIce-ocean boundary

5 Summary

V. Balaji ([email protected]) Component concurrency 14 September 2015 21 / 31

Page 22: Recent advances in coupling in the GFDL Flexible Modeling ...

Lee vortices off Hawaii under two-way nesting

Figure courtesy Lucas Harris and S-J Lin, NOAA/GFDL.V. Balaji ([email protected]) Component concurrency 14 September 2015 22 / 31

Page 23: Recent advances in coupling in the GFDL Flexible Modeling ...

Concurrent two-way nesting

Typical nesting protocols force serialization between fine and coarsegrid timestepping, since the C∗ are estimated by interpolating betweenCn and Cn+1.

F n F n+ 13 F n+ 2

3 F n+1

Cn C∗n+ 13 C∗n+ 2

3 Cn+1

We enable concurrency by instead estimating the C∗ by extrapolationfrom Cn−1 and Cn, with an overhead of less than 10%. (See Harrisand Lin 2012 for details.)

V. Balaji ([email protected]) Component concurrency 14 September 2015 23 / 31

Page 24: Recent advances in coupling in the GFDL Flexible Modeling ...

Ice-ocean boundary

Atmos.Thermo

AtmosphereDynamics

OceanDynamics

OceanThermo

IceThermo

IceDynamics

SST, uo

OceanDynamics

OceanThermo

Q(SST,Ti,Ta), τ(uo,ui,ua)

Sequential coupling

Δtcoupled

Timetn+1tn

Figure courtesy Alistair Adcroft, Princeton University.

V. Balaji ([email protected]) Component concurrency 14 September 2015 24 / 31

Page 25: Recent advances in coupling in the GFDL Flexible Modeling ...

Ice-ocean boundary

Atmos.Thermo

AtmosphereDynamics

OceanDynamics

OceanThermo

IceThermo

IceDynamics

SST, uo

Q(SST,Ti,Ta), τ(uo,ui,ua)

Concurrent coupling

Δtcoupled

Timetn+1tn-1

OceanDynamics

OceanThermo

SST, uo

OceanDynamics

OceanThermo

SST, uo

tn

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta), τ(uo,ui,ua)

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta), τ(uo,ui,ua)

Figure courtesy Alistair Adcroft, Princeton University.

V. Balaji ([email protected]) Component concurrency 14 September 2015 25 / 31

Page 26: Recent advances in coupling in the GFDL Flexible Modeling ...

Ice-ocean boundary

Atmos.Thermo

AtmosphereDynamics

OceanDynamics

OceanThermo

IceThermo

IceDynamics

uo

τ(uo,ui,ua)

Staggered-concurrent coupling

Δtcoupled

Timetn+1tn-1

OceanDynamics

OceanThermo

OceanDynamics

OceanThermo

SST

tn

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta)

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta) τ(uo,ui,ua)Q(SST,Ti,Ta) τ(uo,ui,ua)

uo SST uo SST

Figure courtesy Alistair Adcroft, Princeton University.

V. Balaji ([email protected]) Component concurrency 14 September 2015 26 / 31

Page 27: Recent advances in coupling in the GFDL Flexible Modeling ...

Ice-ocean boundary

Atmos.Thermo

AtmosphereDynamics

OceanDynamics

OceanThermo

IceThermo

IceDynamics

SST, uo

OceanDynamics

OceanThermo

Q(SST,Ti,Ta), τ(uo,ui,ua)

Sequential coupling + Adams-Bashforth

Δtcoupled

Timetn+1tn

OceanDynamics

OceanThermo

tn-1

SST, uo

SST*, uo*

Figure courtesy Alistair Adcroft, Princeton University.

V. Balaji ([email protected]) Component concurrency 14 September 2015 27 / 31

Page 28: Recent advances in coupling in the GFDL Flexible Modeling ...

Ice-ocean boundary

Atmos.Thermo

AtmosphereDynamics

OceanDynamics

OceanThermo

IceThermo

IceDynamics

SST, uo

Q(SST,Ti,Ta), τ(uo,ui,ua)

Concurrent coupling + AB

Δtcoupled

Timetn+1tn-1

OceanDynamics

OceanThermo

SST, uo

OceanDynamics

OceanThermo

SST, uo

tn

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta), τ(uo,ui,ua)

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta), τ(uo,ui,ua)

SST*, uo* SST*, uo

*

Figure courtesy Alistair Adcroft, Princeton University.

V. Balaji ([email protected]) Component concurrency 14 September 2015 28 / 31

Page 29: Recent advances in coupling in the GFDL Flexible Modeling ...

Ice-ocean boundary

Atmos.Thermo

AtmosphereDynamics

OceanDynamics

OceanThermo

IceThermo

IceDynamics

uo

τ(uo,ui,ua)

Staggered-concurrent coupling + AB

Δtcoupled

Timetn+1tn-1

OceanDynamics

OceanThermo

OceanDynamics

OceanThermo

SST

tn

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta)

Atmos.Thermo

AtmosphereDynamics

IceThermo

IceDynamics

Q(SST,Ti,Ta) τ(uo,ui,ua)Q(SST,Ti,Ta) τ(uo,ui,ua)

uo SST uo SST

uo*

SST*

Figure courtesy Alistair Adcroft, Princeton University.

V. Balaji ([email protected]) Component concurrency 14 September 2015 29 / 31

Page 30: Recent advances in coupling in the GFDL Flexible Modeling ...

Outline

1 Towards exascale with Earth System models

2 Adapting ESM architecture for scalability

3 Concurrent radiation

4 Other issuesConcurrent nestingIce-ocean boundary

5 Summary

V. Balaji ([email protected]) Component concurrency 14 September 2015 30 / 31

Page 31: Recent advances in coupling in the GFDL Flexible Modeling ...

Summary

Moore’s law has taken us from the von Neumann model to the“sea of functional units” (Kathy Yelick). Not easy to understand,predict or program performance.... but the “free lunch” decades are over, they’ve come to takeaway your plates.Coarse-grained parallelism is an area in the current effort toreclaim performance from the encroaching “sea”.The “component” abstraction still may let us extract some benefitsout of the machines of this era:

sharing of the wide thread space.distribute components among heterogeneous hardware?concerns about stability, conservation, and accuracy.

Presented at AMS, CW2015, GMD paper near submission:“Coarse-grained component concurrency in Earth Systemmodeling”, Balaji et al 2015.

V. Balaji ([email protected]) Component concurrency 14 September 2015 31 / 31