
Increasing cluster throughput with Slurm and rCUDA

Federico Silla
Technical University of Valencia

Spain

Increasing cluster throughput with Slurm and rCUDA

Slurm User Group Meeting 2015. September 15th 3/37

Outline

What is “rCUDA”?


Slurm User Group Meeting 2015. September 15th 4/37

Basics of GPU computing

Basic behavior of CUDA

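As a reminder of what rCUDA has to virtualize, the basic CUDA behavior is: the application runs on the CPU, allocates memory on the GPU, copies input data across PCIe, launches kernels, and copies the results back. A minimal illustrative sketch of that flow (not code from the talk):

    // Minimal CUDA flow: allocate on the GPU, copy in, run a kernel, copy back.
    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    __global__ void scale(float *v, float f, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] *= f;
    }

    int main(void) {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *h = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) h[i] = 1.0f;

        float *d = NULL;
        cudaMalloc((void **)&d, bytes);                      // GPU memory
        cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);     // H2D copy over PCIe
        scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);         // kernel runs on the GPU
        cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);     // D2H copy of the result
        cudaFree(d);

        printf("h[0] = %f\n", h[0]);                         // expect 2.0
        free(h);
        return 0;
    }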

Slurm User Group Meeting 2015. September 15th 5/37

rCUDA: remote GPU virtualization

A software technology that enables a more flexible use of GPUs in computing facilities

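With rCUDA the application itself stays unmodified: the rCUDA client library is used in place of the regular CUDA runtime library (typically by pointing LD_LIBRARY_PATH at the rCUDA client libraries) and forwards the CUDA calls over the network to an rCUDA server running in the node that owns the GPU. As a rough usage sketch (environment variable names follow the rCUDA user guide and may differ between releases; the host and binary names below are made up):

    export RCUDA_DEVICE_COUNT=1                # number of remote GPUs to expose
    export RCUDA_DEVICE_0=gpu-server-node:0    # rCUDA server host and GPU index (hypothetical host)
    ./gpu_app                                  # unmodified CUDA binary (hypothetical)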

Slurm User Group Meeting 2015. September 15th 6/37

Basics of remote GPU virtualization

Slurm User Group Meeting 2015. September 15th 7/37

Basics of remote GPU virtualization

Slurm User Group Meeting 2015. September 15th 8/37

Remote GPU virtualization: the vision

Remote GPU virtualization allows a new vision of GPU deployment, moving from the usual cluster configuration:

[Diagram: conventional cluster, where every node has a CPU, main memory, a network interface, and PCIe-attached GPUs with their own GPU memory, all linked by the interconnection network]

to the following one ….

Slurm User Group Meeting 2015. September 15th 9/37

Remote GPU virtualization: the vision

[Diagram: physical configuration vs. logical configuration. Physically, the nodes (CPU, main memory, network interface) and the GPUs are attached to the interconnection network; logically, every node can use any GPU in the cluster through logical connections over the network]

Slurm User Group Meeting 2015. September 15th 10/37

Remote GPU virtualization frameworks

Several efforts have been made to implement remote GPU virtualization during the last years, some publicly available and some not:

rCUDA (CUDA 7.0)
GVirtuS (CUDA 3.2)
DS-CUDA (CUDA 4.1)
vCUDA (CUDA 1.1)
GViM (CUDA 1.1)
GridCUDA (CUDA 2.3)
V-GPU (CUDA 4.0)

Slurm User Group Meeting 2015. September 15th 11/37

Outline

Is “remote GPU virtualization” useful?


Slurm User Group Meeting 2015. September 15th 12/37

1: more GPUs for a single application

GPU virtualization is useful for multi-GPU applications:

Without GPU virtualization, only the GPUs in the node can be provided to the application.
With GPU virtualization, many GPUs in the cluster can be provided to the application.

[Diagram: the same cluster with and without remote GPU virtualization; with virtualization, a single node can use GPUs installed in other nodes across the interconnection network]
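A multi-GPU application does not need to change to benefit from this: it keeps enumerating and selecting devices through the CUDA runtime, and under rCUDA those devices may be located in other nodes. The following is only an illustrative sketch of the usual multi-GPU pattern (not code from the talk):

    // Typical multi-GPU enumeration loop. Under rCUDA, cudaGetDeviceCount()
    // can report GPUs that live in remote nodes, so the same code scales
    // beyond the GPUs physically installed in the local node.
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        printf("visible GPUs: %d\n", ndev);

        for (int d = 0; d < ndev; d++) {
            cudaSetDevice(d);                // may select a remote GPU under rCUDA
            void *buf = NULL;
            cudaMalloc(&buf, 1 << 20);       // the allocation lands on that GPU
            // ... launch work on device d ...
            cudaFree(buf);
        }
        return 0;
    }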

Slurm User Group Meeting 2015. September 15th 13/37

1: more GPUs for a single application

64 GPUs!

Slurm User Group Meeting 2015. September 15th 14/37

1: more GPUs for a single application

MonteCarlo Multi-GPU (from the NVIDIA samples), run over FDR InfiniBand with NVIDIA Tesla K20 GPUs.

[Plots: one panel where lower is better and one where higher is better, as the number of GPUs used by the application grows]

Slurm User Group Meeting 2015. September 15th 15/37

2: increased cluster performance

[Diagram: physical configuration vs. logical configuration; logically, every node can access all of the GPUs in the cluster through the interconnection network]

Slurm User Group Meeting 2015. September 15th 16/37

3: easier cluster upgrade

[Diagram: a cluster whose nodes have no GPUs]

• A cluster without GPUs may be easily upgraded to use GPUs with rCUDA

Slurm User Group Meeting 2015. September 15th 17/37

3: easier cluster upgrade

• A cluster without GPUs may be easily upgraded to use GPUs with rCUDA

[Diagram: the same cluster, now GPU-enabled through remote GPUs]

Slurm User Group Meeting 2015. September 15th 18/37

4: GPU task migration

Box A has 4 GPUs but only one is busy; Box B has 8 GPUs but only two are busy.

1. Move the jobs from Box B to Box A and switch off Box B
2. Migration should be transparent to applications (decided by the global scheduler)

TRUE GREEN GPU COMPUTING

Slurm User Group Meeting 2015. September 15th 19/37

5: virtual machines can easily access GPUs

[Diagram: virtual machines accessing remote GPUs, in one case with a high-performance network available and in another with only a low-performance network available]

Slurm User Group Meeting 2015. September 15th 20/37

Outline

Cons of “remote GPU virtualization”?


Slurm User Group Meeting 2015. September 15th 21/37

Problem with remote GPU virtualization

The main drawback of remote GPU virtualization is the reduced bandwidth to the remote GPU.

Slurm User Group Meeting 2015. September 15th 22/37

Using InfiniBand networks

Slurm User Group Meeting 2015. September 15th 23/37

Impact of optimizations on rCUDA

[Bandwidth plots: H2D and D2H transfers with pageable and pinned memory, comparing original (Orig) and optimized (Opt) rCUDA over EDR and X3 InfiniBand; with pinned memory, almost 100% of the available bandwidth is achieved in both directions]
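Since the pinned-memory paths are the ones that approach the full link bandwidth, applications that move large buffers to a remote GPU benefit from page-locked host allocations. A short illustrative sketch (not from the talk):

    // Pinned (page-locked) host buffers enable the fastest transfers,
    // which is the case where optimized rCUDA reaches almost 100% of
    // the available InfiniBand bandwidth.
    #include <cuda_runtime.h>

    int main(void) {
        const size_t bytes = 64 << 20;                 // 64 MB test buffer
        float *h_buf = NULL;
        cudaMallocHost((void **)&h_buf, bytes);        // pinned host memory
        float *d_buf = NULL;
        cudaMalloc((void **)&d_buf, bytes);
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);  // H2D, pinned path
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);  // D2H, pinned path
        cudaFree(d_buf);
        cudaFreeHost(h_buf);
        return 0;
    }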

Slurm User Group Meeting 2015. September 15th 24/37

Effect of rCUDA optimizations on applications

• Several applications executed with CUDA and rCUDA
• K20 GPU and FDR InfiniBand
• K40 GPU and EDR InfiniBand

Slurm User Group Meeting 2015. September 15th 25/37

Outline

What happens at the cluster level?


Slurm User Group Meeting 2015. September 15th 26/37

rCUDA at cluster level … Slurm

• GPUs can be shared among jobs running in remote clients
• Job scheduler required for coordination: Slurm

[Diagram: applications App 1 … App 9 scheduled by Slurm onto shared remote GPUs]

Slurm User Group Meeting 2015. September 15th 27/37

rCUDA at cluster level … Slurm

• A new resource has been added: rgpu

slurm.conf:

    SelectType=select/cons_rgpu
    SelectTypeParameters=CR_CORE
    GresTypes=rgpu[,gpu]

    NodeName=node1 NodeHostname=node1 CPUs=12 Sockets=2 CoresPerSocket=6 ThreadsPerCore=1 RealMemory=32072 Gres=rgpu:1[,gpu:1]

gres.conf:

    Name=rgpu File=/dev/nvidia0 Cuda=3.5 Mem=4726M
    [Name=gpu File=/dev/nvidia0]

New submission options:

    --rcuda-mode=(shared|excl)
    --gres=rgpu(:X(:Y)?(:Z)?)?

    X = [1-9]+[0-9]*
    Y = [1-9]+[0-9]*[kKmMgG]
    Z = [1-9]\.[0-9](cc|CC)
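For illustration, assuming X, Y and Z encode the number of remote GPUs, the amount of GPU memory and the CUDA compute capability (an interpretation of the syntax above, consistent with the Cuda and Mem fields in gres.conf), a job could be submitted with something like the following; the binary name is hypothetical:

    srun --gres=rgpu:2:2G:3.5cc --rcuda-mode=shared ./gpu_app
    srun --gres=rgpu:4 --rcuda-mode=excl ./gpu_app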

Slurm User Group Meeting 2015. September 15th 28/37

Ongoing work: studying rCUDA+Slurm

Analysis of different GPU assignment policies:
• Based on GPU-memory occupancy
• Based on GPU utilization

Applications used for tests (classified as non-GPU, short execution time, or long execution time, and grouped into Set 1 and Set 2):
• GPU-Blast (21 seconds; 1 GPU; 1599 MB)
• LAMMPS (15 seconds; 4 GPUs; 876 MB)
• MCUDA-MEME (165 seconds; 4 GPUs; 151 MB)
• GROMACS (167 seconds)
• NAMD (11 minutes)
• BarraCUDA (10 minutes; 1 GPU; 3319 MB)
• GPU-LIBSVM (5 minutes; 1 GPU; 145 MB)
• MUMmerGPU (5 minutes; 1 GPU; 2804 MB)

Slurm User Group Meeting 2015. September 15th 29/37

Test bench for studying rCUDA+Slurm

InfiniBand ConnectX-3 FDR based cluster with dual-socket Intel Xeon E5-2620 v2 nodes:
• 1 node without GPU, hosting the main Slurm controller
• 16 nodes with one NVIDIA K20 GPU each

Three workloads: Set 1, Set 2, Set 1 + Set 2
Three workload sizes: Small (100 jobs), Medium (200 jobs), Large (400 jobs)

Slurm User Group Meeting 2015. September 15th 30/37

Results for Set 1: execution time

[Plots: execution time for 4, 8 and 16 nodes; lower is better]

Slurm User Group Meeting 2015. September 15th 31/37

Reducing the amount of GPUs

[Plot callouts: 35% less, 39% less, 41% less]

Slurm User Group Meeting 2015. September 15th 32/37

Results for Set 1: energy consumption

[Plots: energy consumption for 4, 8 and 16 nodes; lower is better]

Slurm User Group Meeting 2015. September 15th 33/37

Energy when removing GPUs

[Plot callouts: 39% less, 42% less, 44% less]

Slurm User Group Meeting 2015. September 15th 34/37

Outline

… in summary …


Slurm User Group Meeting 2015. September 15th 35/37

Slurm + rCUDA allow …

• High Throughput Computing: sharing remote GPUs makes applications execute more slowly, BUT more throughput (jobs/time) is achieved
• Green Computing: GPU migration and application migration make it possible to devote just the required computing resources to the current workload
• More flexible system upgrades: GPU and CPU updates become independent from each other; attaching GPU boxes to non GPU-enabled clusters is possible
• Datacenter administrators can choose between HPC and HTC

Slurm User Group Meeting 2015. September 15th 36/37

Get a free copy of rCUDA at http://www.rcuda.net

@rcuda_

More than 650 requests worldwide

Sergio Iserte, Carlos Reaño, Javier Prades, Fernando Campos

rCUDA is owned by Technical University of Valencia

Slurm User Group Meeting 2015. September 15th 37/37

Thanks! Questions?