
LLNL-PRES-687782 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Burning on the GPU: Fast and Accurate Chemical Kinetics

GPU Technology Conference

Russell Whitesides April 7, 2016

Session 6195

Funded by: U.S. Department of Energy

Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton

Lawrence Livermore National Laboratory LLNL-PRES-687782 2

Why?

To make it go faster?


Why?

We burn a lot of gasoline.

•  Transportation efficiency
•  Chemistry is vital to predictive simulations
•  Chemistry can be > 90% of simulation time


National lab compute power and industry need.

Supercomputing @ DOE labs: Strong investment in GPUs with eye towards exascale

OEM engine designers:

Require fast turnaround with desktop-class hardware

Why?


“Colorful Fluid Dynamics”

[Figure: in-cylinder fields of O2 mass fraction (Y_O2) and temperature]

“Typical” engine simulation w/ detailed chemistry


Detailed Chemistry in Reacting Flow CFD:

Each cell is treated as an isolated system for chemistry.

Operator Splitting Technique: solve an independent set of ordinary differential equations (ODEs) in each cell to calculate the chemical source terms for the species and energy advection/diffusion equations.

[Diagram: chemistry in each cell advances independently from t to t+∆t]


CPU (un-coupled) chemistry integration

Each cell is treated as an isolated system for chemistry.



GPU (batched) chemistry integration

On the GPU we solve chemistry in batches of cells simultaneously.

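Batching many cells for a simultaneous GPU solve typically starts with packing the per-cell states into one contiguous struct-of-arrays buffer, so that one-thread-per-cell kernels read coalesced memory. This layout is a common convention for batched solvers, not necessarily the one used in this work; a minimal host-side sketch:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays packing for a batched GPU chemistry
// solve: variable k of cell i lands at batch[k * n_cells + i], so
// consecutive GPU threads (one per cell) would touch consecutive
// addresses when reading variable k.
std::vector<double> pack_batch(const std::vector<std::vector<double>>& cells) {
    const std::size_t n_cells = cells.size();
    const std::size_t n_vars = cells.empty() ? 0 : cells[0].size();
    std::vector<double> batch(n_cells * n_vars);
    for (std::size_t i = 0; i < n_cells; ++i)
        for (std::size_t k = 0; k < n_vars; ++k)
            batch[k * n_cells + i] = cells[i][k];
    return batch;
}
```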


Previously at GTC:

See also Whitesides & McNenly, GTC 2015; McNenly & Whitesides, GTC 2014


n_gpu = 0;

Note: most CFD simulations are done on distributed memory systems

[Diagram: 8 MPI ranks (rank0–rank7), each with one CPU core working on chemistry]


++n_gpu; //now what?

Note: most CFD simulations are done on distributed memory systems

[Diagram: the same 8 MPI ranks and CPU cores, now with a GPU available to share the chemistry work]


Here CPU is a single core.

Ideal CPU-GPU Work-sharing

S_GPU = walltime(CPU) / walltime(GPU)


Let’s make use of the whole machine.

Ideal CPU-GPU Work-sharing

§  # CPU cores = N_CPU

§  # GPU devices = N_GPU

S_total = (N_CPU + N_GPU (S_GPU − 1)) / N_CPU

[Plot: ideal S_total vs. N_GPU (1–4) for S_GPU = 8 at N_CPU = 4, 8, 16, and 32, with reference points for Titan (1.4375) and Surface (1.8750) node configurations]

S_GPU = walltime(CPU) / walltime(GPU)


Distribute based on number of cells and give more to GPU.

Good performance in a simple case with both CPU and GPU doing work

[Plot: chemistry time in seconds (log scale, 100–10000) vs. number of processors (1–16), comparing CPU chemistry with GPU chemistry under standard work sharing]
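A minimal version of "give more to the GPU" is a proportional split: if the GPU solves chemistry s_gpu times faster than a CPU core, handing it s_gpu / (s_gpu + 1) of the cells makes both finish at about the same time. This is the obvious rule, not necessarily the talk's exact policy:

```cpp
#include <cstddef>
#include <utility>

// Proportional work split for one rank that owns both a CPU core and a
// GPU: the GPU's share of cells is s_gpu / (s_gpu + 1). Illustrative only.
std::pair<std::size_t, std::size_t>  // {gpu_cells, cpu_cells}
split_cells(std::size_t n_cells, double s_gpu) {
    const std::size_t gpu_cells =
        static_cast<std::size_t>(n_cells * s_gpu / (s_gpu + 1.0));
    return {gpu_cells, n_cells - gpu_cells};
}
```

With 800 cells and s_gpu = 7, the GPU gets 700 cells and the core gets 100.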



[Plot: the same comparison with a third curve for GPU chemistry under custom work sharing]

S_GPU = 7, S_total = 1.7 (S_GPU = 6.6)


Let’s go!

First attempt @ engine calculation on GPU+CPU


What happened?

First attempt @ engine calculation on GPU+CPU

§  2x Xeon E5-2670 (16 cores) => 21.2 hours

§  2x Xeon E5-2670 + 2 Tesla K40m => 17.6 hours

§  S_total = 21.2/17.6 = 1.20 (S_GPU = 2.6)


Integrator performance when doing batch solution

If the systems are not similar, how much extra work needs to be done?


Batches of dissimilar reactors will suffer from excessive extra steps

What penalty do we pay when batching?




Batches of dissimilar reactors will suffer from excessive extra steps

Possibly a lot of extra steps.
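The penalty from batching dissimilar reactors can be modeled simply: a lockstep batch costs every member the step count of its stiffest reactor. The illustrative helper below counts those extra steps:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Rough model of the batching penalty: if the integrator advances the
// whole batch in lockstep, every reactor pays for the step count of the
// stiffest member. The "extra" steps are the gap between that worst case
// and what each reactor needed on its own. Illustrative only.
long extra_steps(const std::vector<long>& steps_needed) {
    if (steps_needed.empty()) return 0;
    const long worst =
        *std::max_element(steps_needed.begin(), steps_needed.end());
    const long own =
        std::accumulate(steps_needed.begin(), steps_needed.end(), 0L);
    return worst * static_cast<long>(steps_needed.size()) - own;
}
```

A batch needing {10, 100, 20} steps individually wastes 170 steps in lockstep; a batch of similar reactors wastes almost none, which motivates the sorting on the next slide.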


Sort reactors by how many steps they took to solve on the last CFD step

Easy as pie?

[Diagram: reactors sorted by n_steps, from >100 down to 1, split into batch0–batch3]
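The sorting idea can be sketched as: order reactors by the step count recorded on the previous CFD step, then slice the sorted list into fixed-size batches so each batch holds similarly stiff reactors. Names here are illustrative:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sort reactors by the ODE step count they needed on the previous CFD
// step, stiffest first, then cut the sorted list into fixed-size batches
// so each batch groups reactors of similar cost.
std::vector<std::vector<int>> make_batches(
    std::vector<std::pair<int, int>> reactors,  // {reactor_id, last_n_steps}
    std::size_t batch_size) {
    std::sort(reactors.begin(), reactors.end(),
              [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                  return a.second > b.second;
              });
    std::vector<std::vector<int>> batches;
    for (std::size_t i = 0; i < reactors.size(); i += batch_size) {
        std::vector<int> batch;
        for (std::size_t j = i; j < std::min(i + batch_size, reactors.size()); ++j)
            batch.push_back(reactors[j].first);
        batches.push_back(batch);
    }
    return batches;
}
```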


Have to manage the sorting and load-balancing in distributed memory system

Not so fast.

[Diagram: 8 MPI ranks holding unevenly distributed chemistry work]


Load balance based on expected cost and expected performance.

MPI communication to re-balance for chemistry.

[Diagram: chemistry work exchanged among the 8 MPI ranks via MPI to balance the load]
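One way to realize "load balance based on expected cost and expected performance" is a greedy longest-processing-time heuristic: estimate each reactor's cost from its last solve, then repeatedly give the most expensive remaining reactor to the worker with the earliest predicted finish, weighting GPU workers by S_GPU. This standard heuristic stands in for whatever scheme the talk actually used:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Greedy rebalance: costs[i] is reactor i's expected cost (e.g. its last
// step count); worker_speed[w] is 1.0 for a CPU core, S_GPU for a GPU.
// Returns the worker index assigned to each reactor.
std::vector<std::size_t> assign_reactors(std::vector<double> costs,
                                         const std::vector<double>& worker_speed) {
    std::vector<std::size_t> order(costs.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return costs[a] > costs[b]; });
    std::vector<double> load(worker_speed.size(), 0.0);  // predicted finish time
    std::vector<std::size_t> owner(costs.size());
    for (std::size_t r : order) {
        std::size_t best = 0;
        for (std::size_t w = 1; w < load.size(); ++w)
            if (load[w] + costs[r] / worker_speed[w] <
                load[best] + costs[r] / worker_speed[best])
                best = w;
        load[best] += costs[r] / worker_speed[best];
        owner[r] = best;
    }
    return owner;
}
```

In a distributed-memory code this assignment would drive the MPI exchange of reactor states before the chemistry solve.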


Let’s go again!

Second attempt @ engine calculation on GPU+CPU


How much difference does it make?

Total steps significantly reduced by batching appropriately



Engine results with improved work-sharing and reactor sorting

[Bar chart: engine-case run times of 13.0 hrs, 9.1 hrs, and 7.6 hrs]

~40% reduction in chemistry time; ~36% reduction in overall time

S_total = 1.7, S_GPU = 6.6


§  Improve S_GPU
•  Derivative kernels
•  Matrix operations

§  Extrapolative integration methods
•  Less “startup” cost when re-initializing
•  Potentially well suited for GPU

§  Non-chemistry calculations on GPU
•  Multi-species transport
•  Particle spray

Future directions

Possibilities for significant further improvements.


§ Much improved CFD chemistry work-sharing with GPU

§ ~40% reduction in chemistry time for real engine case (~36% total time)

§ Working on further improvements

Summary

Thank you!
