U.S. Department of Energy Vehicle Technologies Program … · 2016. 4. 10. · Session 6195 Funded...

LLNL-PRES-687782 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC Burning on the GPU: Fast and Accurate Chemical Kinetics GPU Technology Conference Russell Whitesides April 7, 2016 Session 6195 Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton


LLNL-PRES-687782 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Burning on the GPU: Fast and Accurate Chemical Kinetics

GPU Technology Conference

Russell Whitesides April 7, 2016

Session 6195

Funded by: U.S. Department of Energy

Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton


Why?

To make it go faster?


Why?

We burn a lot of gasoline.

•  Transportation efficiency
•  Chemistry is vital to predictive simulations
•  Chemistry can be > 90% of simulation time


Why?

National lab compute power and industry need.

Supercomputing @ DOE labs: strong investment in GPUs with an eye towards exascale

OEM engine designers: require fast turnaround with desktop-class hardware


“Colorful Fluid Dynamics”

[Figure panels: YO2, Temperature]

“Typical” engine simulation w/ detailed chemistry


Detailed Chemistry in Reacting Flow CFD:

Each cell is treated as an isolated system for chemistry.

Operator Splitting Technique: Solve independent set of ordinary differential equations (ODEs) in each cell to calculate chemical source terms for species and energy advection/diffusion equations.
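A minimal sketch of the chemistry half of an operator-split step, under loud assumptions: the "chemistry" is a toy first-order decay rather than a real mechanism, and a fixed-step RK4 loop stands in for the stiff ODE solver a production code would use. The names `chem_rhs`, `integrate_cell`, and `chemistry_substep` are hypothetical and do not come from the talk.

```python
import math

def chem_rhs(y, rate):
    """Toy stand-in for the chemical source term: dy/dt = -rate * y."""
    return -rate * y

def integrate_cell(y0, rate, dt, n_sub=100):
    """Advance one isolated cell from t to t + dt with fixed-step RK4."""
    h = dt / n_sub
    y = y0
    for _ in range(n_sub):
        k1 = chem_rhs(y, rate)
        k2 = chem_rhs(y + 0.5 * h * k1, rate)
        k3 = chem_rhs(y + 0.5 * h * k2, rate)
        k4 = chem_rhs(y + h * k3, rate)
        y += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

def chemistry_substep(cells, dt):
    """Operator-split chemistry: each cell is solved without reference to its neighbours."""
    return [integrate_cell(y0, rate, dt) for (y0, rate) in cells]

# Three cells with different initial values and rates (illustrative numbers only).
cells = [(1.0, 0.5), (2.0, 5.0), (0.5, 50.0)]
dt = 0.1
new_state = chemistry_substep(cells, dt)

# Closed-form solution of the toy ODE, for comparison.
exact = [y0 * math.exp(-rate * dt) for (y0, rate) in cells]
```

Because no cell reads another cell's state inside `chemistry_substep`, the loop over cells can be handed to separate cores or grouped into batches without changing the result.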

[Figure labels: t, t+∆t]


CPU (un-coupled) chemistry integration

Each cell is treated as an isolated system for chemistry.

[Figure labels: t, t+∆t]


GPU (batched) chemistry integration

On the GPU we solve chemistry in batches of cells simultaneously.

[Figure labels: t, t+∆t]


Previously at GTC:

See also Whitesides & McNenly, GTC 2015; McNenly & Whitesides, GTC 2014


n_gpu = 0;

Note: most CFD simulations are done on distributed memory systems

[Diagram: rank0 through rank7, each labeled CPU]


++n_gpu; //now what?

Note: most CFD simulations are done on distributed memory systems

[Diagram: rank0 through rank7, each labeled CPU]


Ideal CPU-GPU Work-sharing

$S_{GPU} = \dfrac{\text{walltime(CPU)}}{\text{walltime(GPU)}}$

Here CPU is a single core.


Ideal CPU-GPU Work-sharing

Let’s make use of the whole machine.

§  # CPU cores = NCPU

§  # GPU devices = NGPU

$S_{total} = \dfrac{N_{CPU} + N_{GPU}\,(S_{GPU} - 1)}{N_{CPU}}$

$S_{GPU} = \dfrac{\text{walltime(CPU)}}{\text{walltime(GPU)}}$

[Figure: Stotal (axis 1 to 8) versus NGPU (1 to 4) for SGPU = 8, with curves for NCPU = 4, 8, 16, and 32. Marked points (*): TITAN (1.4375), surface (1.8750).]
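The work-sharing formula is easy to evaluate directly. The sketch below assumes node layouts of 16 CPU cores with one or two GPUs; those layouts are not stated on this slide and were picked only because they reproduce the two marked values (1.4375 and 1.8750) at SGPU = 8. The function names are hypothetical.

```python
def s_total(n_cpu, n_gpu, s_gpu):
    """Ideal whole-machine speedup relative to using the n_cpu cores alone."""
    return (n_cpu + n_gpu * (s_gpu - 1.0)) / n_cpu

def implied_s_gpu(n_cpu, n_gpu, s_tot):
    """Invert the formula: the per-device S_GPU consistent with a measured S_total."""
    return (s_tot * n_cpu - n_cpu) / n_gpu + 1.0

print(s_total(16, 1, 8.0))                   # 1.4375
print(s_total(16, 2, 8.0))                   # 1.875
print(round(implied_s_gpu(16, 2, 1.7), 2))   # 6.6
```

One way to read the `- 1` term, offered as an interpretation rather than something the slide spells out: each GPU is driven by one CPU core, so that core's own contribution is replaced by the GPU's rather than added to it.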


Distribute based on number of cells and give more to GPU.

Good performance in simple case with both CPU and GPU doing work

[Figure: Chemistry Time (seconds), log scale from 100 to 10000, versus Number of Processors (1, 2, 4, 8, 16). Series: CPU Chemistry; GPU Chemistry (std work sharing).]
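As a rough illustration of distributing by cell count while giving more to the GPU, the sketch below splits cells among ranks in proportion to a per-rank weight. The weights (SGPU for a rank that drives a GPU, 1 for a CPU-only rank), the 16-rank layout, the cell count, and the name `split_cells` are all assumptions made for the example; the talk does not describe the actual scheme in this detail.

```python
def split_cells(n_cells, rank_weights):
    """Split n_cells among ranks in proportion to rank_weights (largest-remainder rounding)."""
    total = sum(rank_weights)
    exact = [n_cells * w / total for w in rank_weights]
    counts = [int(x) for x in exact]
    # Hand out the cells lost to truncation, largest fractional part first.
    leftover = n_cells - sum(counts)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

s_gpu = 7.0                            # assumed relative speed of a GPU-driving rank
weights = [s_gpu, s_gpu] + [1.0] * 14  # 2 GPU-driving ranks + 14 CPU-only ranks
counts = split_cells(100_000, weights)
print(counts[0], counts[2])            # cells on a GPU rank vs. a CPU rank
```

A split like this balances time only if every cell costs about the same to integrate, which is the "simple case" the slide refers to.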


Distribute based on number of cells and give more to GPU.

Good performance in simple case with both CPU and GPU doing work

[Figure: Chemistry Time (seconds), log scale from 100 to 10000, versus Number of Processors (1, 2, 4, 8, 16). Series: CPU Chemistry; GPU Chemistry (std work sharing); GPU Chemistry (custom work sharing). Annotations: SGPU = 7; Stotal = 1.7 (SGPU = 6.6).]


First attempt @ engine calculation on GPU+CPU

Let’s go!


First attempt @ engine calculation on GPU+CPU

What happened?

§  2x Xeon E5-2670 (16 cores) => 21.2 hours
§  2x Xeon E5-2670 + 2 Tesla K40m => 17.6 hours
§  Stotal = 21.2/17.6 = 1.20 (SGPU = 2.6)


Integrator performance when doing batch solution

If the systems are not similar, how much extra work needs to be done?


What penalty do we pay when batching?

Batches of dissimilar reactors will suffer from excessive extra steps.



Batches of dissimilar reactors will suffer from excessive extra steps

Possibly a lot of extra steps.
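A back-of-the-envelope way to see the penalty, under a simplifying assumption that is not taken from the talk: every reactor in a batch is stepped until the slowest reactor in that batch has finished, so a batch costs (batch size) × (largest step count). The step counts and the name `batched_steps` are made up for illustration.

```python
def batched_steps(step_counts, batch_size):
    """Total reactor-steps when each batch runs until its slowest member finishes."""
    total = 0
    for start in range(0, len(step_counts), batch_size):
        batch = step_counts[start:start + batch_size]
        total += len(batch) * max(batch)
    return total

# Illustrative per-reactor step counts: mostly easy reactors, a few hard ones.
step_counts = [3, 120, 4, 5, 110, 3, 4, 130, 5, 3, 4, 125]

needed = sum(step_counts)                        # work if each reactor ran alone
done = batched_steps(step_counts, batch_size=4)  # work when batched in the given order
print(needed, done, round(done / needed, 2))     # 516 1500 2.91
```

With one hard reactor in every batch, nearly three times the necessary steps are taken in this toy case.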


Easy as pie?

Sort reactors by how many steps they took to solve on the last CFD step.

[Figure: reactors ordered by n_steps (scale from 1 to >100) and grouped into batch0, batch1, batch2, batch3]
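The effect of sorting before batching can be sketched with the same simplified cost model (each batch runs as long as its slowest reactor; an assumption, not the talk's measured behavior). The step counts and the helper names `make_batches` and `lockstep_cost` are hypothetical.

```python
def make_batches(order, batch_size):
    """Cut an ordered list of reactor indices into consecutive batches."""
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def lockstep_cost(batches, steps):
    """Reactor-steps taken if every batch runs until its slowest reactor is done."""
    return sum(len(b) * max(steps[i] for i in b) for b in batches)

# Step counts each reactor needed on the previous CFD step (illustrative values).
last_steps = [3, 120, 4, 5, 110, 3, 4, 130, 5, 3, 4, 125]
ids = list(range(len(last_steps)))

unsorted_batches = make_batches(ids, 4)
sorted_ids = sorted(ids, key=lambda i: last_steps[i])
sorted_batches = make_batches(sorted_ids, 4)

print(lockstep_cost(unsorted_batches, last_steps))  # 1500
print(lockstep_cost(sorted_batches, last_steps))    # 556
```

The previous step's count is only a predictor here; a reactor that ignites between CFD steps would still land in a batch of cheap reactors until the next sort.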


Not so fast.

Sorting and load-balancing have to be managed in a distributed-memory system.

[Diagram: rank0 through rank7]


Load balance based on expected cost and expected performance.

MPI communication to re-balance for chemistry.

[Diagram: rank0 through rank7]
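One simple way to balance on expected cost and expected performance is a greedy assignment, sketched below. Everything specific here is an assumption for illustration: the cost of a reactor is taken to be its step count from the previous CFD step, a rank's performance is a single relative speed (6.6 for a GPU-driving rank, 1 for a CPU rank), and the MPI exchange that would actually move reactors between ranks is left out. The name `balance` is hypothetical.

```python
def balance(costs, speeds):
    """Greedy: take reactors from most to least expensive and give each one to the
    rank that would finish earliest after receiving it."""
    loads = [0.0] * len(speeds)
    assignment = [[] for _ in speeds]
    for i in sorted(range(len(costs)), key=lambda k: costs[k], reverse=True):
        r = min(range(len(speeds)), key=lambda k: (loads[k] + costs[i]) / speeds[k])
        loads[r] += costs[i]
        assignment[r].append(i)
    finish_times = [load / speed for load, speed in zip(loads, speeds)]
    return assignment, finish_times

# Expected cost per reactor (illustrative) and relative speed per rank (assumed).
costs = [50, 45, 40, 40, 35, 30, 30, 25, 20, 20, 15, 10, 10, 5, 5, 5]
speeds = [6.6, 1.0, 1.0, 1.0]   # rank 0 drives a GPU, ranks 1-3 are CPU-only

assignment, finish_times = balance(costs, speeds)
ideal = sum(costs) / sum(speeds)  # finish time if work were perfectly divisible
print(round(ideal, 1), [round(t, 1) for t in finish_times])
```

This sketch ignores the batching constraint on the GPU rank; a fuller version would hand that rank contiguous runs of similarly expensive reactors rather than individual ones.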


Second attempt @ engine calculation on GPU+CPU

Let’s go again!


How much difference does it make?

Total steps significantly reduced by batching appropriately.


Engine results with improved work-sharing and reactor sorting

☺

[Figure labels: 9.1 hrs, 7.6 hrs, 13.0 hrs]

~40% reduction in chemistry time; ~36% reduction in overall time

Stotal = 1.7, SGPU = 6.6


Future directions

Possibilities for significant further improvements.

§  Improve SGPU
    •  Derivative kernels
    •  Matrix operations

§  Extrapolative integration methods
    •  Less “startup” cost when re-initializing
    •  Potentially well suited for GPU

§  Non-chemistry calc’s on GPU
    •  Multi-species transport
    •  Particle spray


Summary

§ Much improved CFD chemistry work-sharing with GPU

§ ~40% reduction in chemistry time for real engine case (~36% total time)

§ Working on further improvement

Thank you!