CAM-SE Development Updates (Mark Taylor, AMWG, 2/11/2013)

Page 1:

Office of Science
U.S. Department of Energy

CAM-SE Development Updates

Mark Taylor, AMWG
2/11/2013

Page 2:

Computational Performance

• CAM5 on Titan
• ne120_tx01 (0.25° cubed-sphere grid, 0.1° POP tripole grid)

• CESM (non-ATM): 16%
• Physics: 19%
• Dynamics: 19%
• Tracers (horizontal/SE): 36%
• Tracers (vertical remap): 12%

• Upcoming improvements target tracer transport:
    • ORNL: PPM vertical remap (2x faster)
    • ORNL: GPU acceleration of tracers (vertical & horizontal), >> 2x faster
    • SNL: Vertically Lagrangian dynamics: 5x fewer tracer vertical remaps
    • NCAR: CSLAM: >> 3x (horizontal advection)

• Net gain in CESM performance: 1.4-1.6x with GPU acceleration or CSLAM
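The quoted net gain can be sanity-checked with an Amdahl's-law style calculation on the runtime shares listed above. The following is only a sketch: the function name `net_speedup` and the trial speedup factors are illustrative assumptions, not values from the talk, and the shares are normalized by their sum because the rounded percentages add up to 102.

```python
# Runtime shares (percent of CESM wall-clock time) as listed on the slide.
# The rounded values sum to 102, so the calculation divides by their sum.
SHARES = {
    "cesm_non_atm": 16.0,
    "physics": 19.0,
    "dynamics": 19.0,
    "tracers_horizontal": 36.0,
    "tracers_vertical_remap": 12.0,
}


def net_speedup(shares, component_speedups):
    """Overall speedup when only some components get faster (Amdahl's law).

    component_speedups maps a component name to its speedup factor;
    components that are not listed are treated as unchanged (factor 1.0).
    """
    old_total = sum(shares.values())
    new_total = sum(
        share / component_speedups.get(name, 1.0)
        for name, share in shares.items()
    )
    return old_total / new_total


# Hypothetical scenarios; the factors are chosen for illustration only.
scenarios = {
    "horizontal tracers 3x": {"tracers_horizontal": 3.0},
    "both tracer parts 3x": {
        "tracers_horizontal": 3.0,
        "tracers_vertical_remap": 3.0,
    },
    "both tracer parts 5x": {
        "tracers_horizontal": 5.0,
        "tracers_vertical_remap": 5.0,
    },
}

for label, speedups in scenarios.items():
    print(f"{label}: {net_speedup(SHARES, speedups):.2f}x overall")
```

Under these assumed factors, speeding up both tracer components by 3x to 5x gives an overall gain of roughly 1.46x to 1.60x, which is in line with the range quoted on the slide.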

Page 3:

• CAM5: 128 cores: CAM-SE 1.8x slower than CAM-FV (1.65 vs 2.96 SYPD)
• CAM5: CAM-SE breakeven: 768 cores (10 SYPD)
• CAM4: SE and FV are comparable (due to fewer tracers)

[Figure: "CESM FC5 1.0°": simulated years per day (1 to 16) versus NCORES (128 to 4K) for CAM-SE and CAM-FV. Slide labels: CAM4 0.25° ANL Intrepid; CAM5 1.0° SNL Redsky.]
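The 128-core comparison above is simple arithmetic on the two throughput numbers. As a minimal sketch (the helper name `hours_per_simulated_year` is an assumption introduced here), SYPD can be converted to a slowdown factor and to wall-clock time per simulated year:

```python
def hours_per_simulated_year(sypd):
    """Wall-clock hours needed for one simulated year at a given SYPD."""
    return 24.0 / sypd


# Throughput at 128 cores for CAM5, as quoted on the slide.
cam_se_sypd = 1.65
cam_fv_sypd = 2.96

slowdown = cam_fv_sypd / cam_se_sypd
print(f"CAM-SE is {slowdown:.2f}x slower than CAM-FV")
print(f"CAM-SE: {hours_per_simulated_year(cam_se_sypd):.1f} h per simulated year")
print(f"CAM-FV: {hours_per_simulated_year(cam_fv_sypd):.1f} h per simulated year")
```

The ratio rounds to the 1.8x stated on the slide.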

Page 4:

• Improved scaling on Titan (16 cores per CPU) as compared to Jaguar (6 cores per CPU) out to 1 spectral element per core
• CAM5-SE 0.25° F1850 running on Titan at 3 SYPD on 43200 cores (ATM: 3.4 SYPD)
• CAM5 on Titan ~2.8x more expensive than CAM4 on Jaguar

[Figure: "CESM1 F1850, ATM component, XT5": simulated years per day (0.5 to 16) versus NCORES (1K to 256K) for SE 0.25°, FV 0.25°, and EUL T341.]

[Figure: "CESM F1850 0.25°": simulated years per day (1 to 8) versus NCORES (4K to 128K) for the ATM component and the total.]

Slide labels: CAM5 0.25° Titan; CAM4 0.25° Jaguar
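The Titan numbers above can be turned into a cost per simulated year and an element count per core. This is a sketch under stated assumptions: the helper names are hypothetical, the 0.25° run is assumed to use the ne120 grid named on the Computational Performance slide, and the element count uses the standard cubed-sphere layout of 6 faces with ne x ne elements each.

```python
def core_hours_per_simulated_year(ncores, sypd):
    """Core-hours consumed per simulated year at a given throughput."""
    return ncores * 24.0 / sypd


def elements_on_cubed_sphere(ne):
    """Spectral elements on a cubed sphere: 6 faces, ne x ne elements each."""
    return 6 * ne * ne


ncores = 43200      # cores used on Titan (from the slide)
total_sypd = 3.0    # full CESM throughput (from the slide)
ne = 120            # assumed: ne120 is the 0.25 degree grid

cost = core_hours_per_simulated_year(ncores, total_sypd)
elements = elements_on_cubed_sphere(ne)

print(f"{cost:,.0f} core-hours per simulated year")
print(f"{elements} elements, {elements / ncores:.1f} elements per core")
```

With these inputs the 43200-core run sits at two elements per core, and the limit of one element per core would correspond to 86400 cores on an ne120 grid.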

Page 5:

Variable Resolution

• Software included in CAM5.2 release
• Not documented or supported
• Running at SNL and LLNL for CSSEF project: 1/8 degree resolution over continental U.S.
• Running at Michigan (C. Zarzycki, C. Jablonowski) (1/8 degree hurricane simulations)
• Challenges:
    • CUBIT for mesh generation
    • Generation of data sets
    • Resolution dependent topo smoothing
    • Tuning dycore dissipation parameters

Page 6:

Hurricane Isaac 2012

• Global Forecast System (GFS) initial conditions + CAM-SE
• 5 day simulation with 1/8 degree resolution over Atlantic, 1 degree global
• 5-10x speedup over global 1/8 degree (3h on Bluefire)
• Images courtesy of Colin Zarzycki, Univ. Michigan

Page 7:

1/8° continental U.S. variable resolution configuration for DOE CSSEF Atmosphere Test-bed

Global 1/8°: 6M core hours per year (ANL/Intrepid)

SGP 8x grid (1/8° over SGP ARM site): 0.12M core hours per year (SNL/Redsky)

Precipitable water (gray), precip rate (color), sea level pressure (contours)
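The two core-hour figures above imply a large cost ratio between the uniform and the variable resolution grids. The snippet below is a rough sketch only: `cost_ratio` is a name introduced here, "per year" is assumed to mean per simulated year, and the two runs were on different machines (Intrepid and Redsky), so core-hours are not strictly comparable.

```python
def cost_ratio(reference_core_hours, candidate_core_hours):
    """How many times cheaper the candidate run is than the reference run."""
    return reference_core_hours / candidate_core_hours


global_eighth_degree = 6.0e6   # core-hours per year, ANL/Intrepid (from the slide)
sgp_8x_grid = 0.12e6           # core-hours per year, SNL/Redsky (from the slide)

ratio = cost_ratio(global_eighth_degree, sgp_8x_grid)
print(f"Variable resolution grid is about {ratio:.0f}x cheaper in core-hours")
```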