
Monodomain Simulations of Cardiac Electrophysiology over Unstructured 3D Heart Geometry

Johannes Langguth, High Performance Computing Department, Simula Research Laboratory, Oslo, Norway

Patient Data

Partitioning

Scalable Heterogeneous CPU-GPU Computations for Unstructured Tetrahedral Meshes

[Figure: Hierarchical partitioning. The global problem is divided into node interior parts separated by MPI separators; within each node, the subproblem is split further into a CPU interior part, GPU 0 and GPU 1 interior parts, and the CPU-GPU, GPU 0, and GPU 1 separators that couple them.]
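Separator elements are exactly those with a face-neighbor owned by a different partition, and they must be exchanged every time step. A minimal C++ sketch of this classification (the Mesh layout, Owner enum, and markSeparators helper are illustrative assumptions, not the poster's code):

#include <cstdio>
#include <vector>

enum Owner { CPU = 0, GPU0 = 1, GPU1 = 2 };

struct Mesh {
    std::vector<std::vector<int>> neighbors; // face-neighbors per tetrahedron
    std::vector<Owner> owner;                // partition assignment per element
};

// Mark every element that touches a different partition as a separator
// element; all remaining elements form the owner's interior part.
std::vector<bool> markSeparators(const Mesh& m) {
    std::vector<bool> sep(m.owner.size(), false);
    for (std::size_t e = 0; e < m.owner.size(); ++e)
        for (int n : m.neighbors[e])
            if (m.owner[n] != m.owner[e]) { sep[e] = true; break; }
    return sep;
}

int main() {
    // Hypothetical 4-element chain split between the CPU and GPU 0.
    Mesh m;
    m.neighbors = {{1}, {0, 2}, {1, 3}, {2}};
    m.owner = {CPU, CPU, GPU0, GPU0};
    std::vector<bool> sep = markSeparators(m);
    for (std::size_t e = 0; e < sep.size(); ++e)
        std::printf("element %zu: %s\n", e, sep[e] ? "separator" : "interior");
    // Elements 1 and 2 straddle the CPU/GPU 0 cut, so they are separators;
    // elements 0 and 3 are interior and need no per-step exchange.
}

Interior elements can be updated without waiting for any transfer, which is what makes the overlap schedule shown further below possible.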

3D Tetrahedral Mesh

Heterogeneous Nodes

[Figure: Node architecture. Two CPU sockets linked by QPI at 32 GB/s, with 51.2 GB/s memory bandwidth per socket; each of the two GPUs is attached over an 8 GB/s link; nodes communicate via MPI over InfiniBand.]
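To see why these bandwidths matter (hypothetical sizes, not from the poster): a separator of 8 MB crosses an 8 GB/s GPU link in roughly 1 ms but crosses the 32 GB/s QPI link in roughly 0.25 ms, so per-time-step separator traffic stays cheap only if it is overlapped with the much larger interior computation.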

[Figure: Timeline of one compute round, threads 0-15 versus time. Tasks: compute MPI separator; send MPI separator to other nodes; receive MPI separator from other node; compute CPU-GPU separator; send CPU-GPU separator to GPU 0 and GPU 1; compute GPU 0 and GPU 1 separators; send GPU 0 separator to GPU 1 and GPU 1 separator to GPU 0; send GPU 0 and GPU 1 separators to host; compute CPU main part; compute GPU 0 and GPU 1 main parts; CPU swap, GPU 0 swap, GPU 1 swap; idle slots marked.]
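The round computes separators first so that their transfers overlap the much larger interior computation. A minimal CUDA/MPI sketch of that pattern for one GPU per rank (buffer sizes, the computeCells kernel, and the ring exchange are illustrative assumptions, not the poster's code):

#include <mpi.h>
#include <cuda_runtime.h>
#include <vector>

__global__ void computeCells(double* v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += 1.0; // stand-in for the per-cell update
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nSep = 1 << 10, nMain = 1 << 20; // separator << interior
    double *dSep, *dMain;
    cudaMalloc(&dSep, nSep * sizeof(double));
    cudaMalloc(&dMain, nMain * sizeof(double));
    cudaMemset(dSep, 0, nSep * sizeof(double));
    cudaMemset(dMain, 0, nMain * sizeof(double));
    std::vector<double> sendBuf(nSep), recvBuf(nSep); // host staging buffers

    cudaStream_t sepStream, mainStream;
    cudaStreamCreate(&sepStream);
    cudaStreamCreate(&mainStream);

    int next = (rank + 1) % size;        // illustrative ring neighbors
    int prev = (rank + size - 1) % size;

    // 1. Compute the separator first, on its own stream, and start
    //    copying it to the host.
    computeCells<<<(nSep + 255) / 256, 256, 0, sepStream>>>(dSep, nSep);
    cudaMemcpyAsync(sendBuf.data(), dSep, nSep * sizeof(double),
                    cudaMemcpyDeviceToHost, sepStream);

    // 2. Launch the large interior (main part) computation; it runs
    //    while the copy and the MPI exchange below are in flight.
    computeCells<<<(nMain + 255) / 256, 256, 0, mainStream>>>(dMain, nMain);

    // 3. Exchange separators with the neighboring ranks.
    cudaStreamSynchronize(sepStream); // sendBuf is now valid
    MPI_Request reqs[2];
    MPI_Irecv(recvBuf.data(), nSep, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendBuf.data(), nSep, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    // 4. Push the received separator back to the device (the "swap" step),
    //    then wait for the whole round to finish.
    cudaMemcpyAsync(dSep, recvBuf.data(), nSep * sizeof(double),
                    cudaMemcpyHostToDevice, sepStream);
    cudaDeviceSynchronize();

    cudaStreamDestroy(sepStream);
    cudaStreamDestroy(mainStream);
    cudaFree(dSep);
    cudaFree(dMain);
    MPI_Finalize();
}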

Calcium Handling

[Figure: Scalar field over the heart geometry (range -87.2 to 40, presumably mV; colorbar ticks at -75, -50, -25, 0, 25). Panels A, B, and C show snapshots at t = 150 ms, 250 ms, 500 ms, 600 ms, 750 ms, 1100 ms, 1250 ms, and 1350 ms.]
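For background, the calcium handling shown here enters the simulation through the cell model coupled to the monodomain equation; a standard textbook formulation (not transcribed from the poster) is:

\begin{align}
  \chi\left(C_m \frac{\partial v}{\partial t} + I_{\mathrm{ion}}(v, s)\right)
    &= \nabla \cdot (\sigma \, \nabla v), \\
  \frac{\partial s}{\partial t} &= f(v, s),
\end{align}

where v is the transmembrane potential, s the cell-model state (gating variables and intracellular calcium concentrations), \chi the surface-to-volume ratio, C_m the membrane capacitance, and \sigma the effective conductivity tensor.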

0"

500"

1000"

1500"

2000"

2500"

3000"

16" 32" 64" 128"

GFLOPs'

GPU"only" Heterogeneous" Communica=on"Disabled"

ScalingPerformance