Distributing Computation to Large GPU Clusters

Page 1: Distributing Computation to Large GPU Clusters | GTC 2013on-demand.gputechconf.com/gtc/2013/presentations/S... · Networking / Clustering Handles cluster building and data transfers


What is this about?

DiCE: Software library for writing applications scaling to many GPUs and CPUs in a cluster


Used since 2003 in our rendering products...

NVIDIA indeX, NVIDIA Iray

(Images courtesy of Vyacheslav Serov, Rüdiger Raab, and Thomas Zancker)


Why are we presenting this here?

DiCE is a base technology in indeX

— Clustering / networking / distribution based on DiCE

DiCE API exposed by indeX

— Distribute pre-computation of data for indeX

— Do your own computation…


Design Goals

"Provide a software library to be used by rendering experts to write scalable software for GPU clusters."

— Not required: low-level parallelization / networking knowledge

— High level of abstraction / easy to use...

— Not specific to a particular domain (e.g. rendering)

— High performance, meant for interactive applications

Other solutions...


Unique Combination of Features

Simple programming model

Ease of deployment / commodity hardware

Unified multi-core and cluster parallelization

GPU support

Dynamic clustering

Focus on interactive applications

Multi-user support, e.g. for web services

Available on Windows, Linux, Mac OS X


Overview

Networking / Clustering

Datastore

Job System

C++ API

Application


DiCE and indeX

Networking / Clustering

Datastore

Job System

C++ API

Application

indeX


Job System


[Figure: the work split into 20 fragments, numbered 0 to 19]

Parallelization Model

Programmer: split the work into n fragments!

— As independent as possible

— Potentially thousands per "frame"!

No a priori knowledge about resources in the cluster!

Goal: Distribute work over all GPUs / CPUs in cluster


Parallelization Model

Fragmented Job

Roughly similar to a CUDA kernel

Implement a C++ class with:

void execute_fragment(int i, int n) {…}

Called once for every fragment

Ask DiCE to execute the job in n fragments

[Figure: the job split into 20 fragments, numbered 0 to 19]
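A minimal sketch of this model in plain C++, for illustration only. `Fragmented_job` is a hypothetical base class mirroring the slide, and the hypothetical `execute_locally()` helper merely stands in for "ask DiCE to execute the job": it runs each of the n fragments exactly once on a few threads of one host. DiCE's real class names and scheduling are not shown here.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical base class: the programmer only implements execute_fragment(i, n);
// where and when each fragment runs is up to the library.
class Fragmented_job {
public:
    virtual ~Fragmented_job() = default;
    virtual void execute_fragment(int i, int n) = 0;
};

// Hypothetical stand-in for "execute the job in n fragments":
// runs every fragment exactly once, spread round-robin over `workers` threads.
void execute_locally(Fragmented_job& job, int n, int workers) {
    assert(n > 0 && workers > 0);
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&job, n, w, workers] {
            for (int i = w; i < n; i += workers)
                job.execute_fragment(i, n);
        });
    }
    for (auto& t : threads) t.join();
}

// Example job: fragment i fills its own band of image rows, so fragments are
// independent and need no locking.
class Fill_rows_job : public Fragmented_job {
public:
    Fill_rows_job(int width, int height)
        : m_width(width), m_height(height), m_pixels(width * height, 0) {}

    void execute_fragment(int i, int n) override {
        int begin = m_height * i / n;      // first row of fragment i
        int end = m_height * (i + 1) / n;  // one past its last row
        for (int y = begin; y < end; ++y)
            for (int x = 0; x < m_width; ++x)
                m_pixels[y * m_width + x] = i;  // record which fragment wrote the pixel
    }

    const std::vector<int>& pixels() const { return m_pixels; }

private:
    int m_width, m_height;
    std::vector<int> m_pixels;
};
```

Because each fragment only writes its own rows, the result is the same no matter how the fragments are spread over threads, which is exactly the independence the slide asks for.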


Parallelization Model - Cluster

Not a shared memory model!

Idea: split execution and integration of results

void execute_remote(int i, int n, OUT) {…}    (runs on a remote host)

void receive_result(int i, int n, IN) {…}    (runs on the origin host)

execute_remote() + receive_result() = execute_fragment()
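A sketch of this split, assuming results travel as raw byte buffers (`std::vector<unsigned char>` stands in for OUT/IN; DiCE's actual parameter types are not given on the slide). The hypothetical `Split_job` base class shows the identity above: on one host, `execute_fragment()` can simply feed the output of `execute_remote()` straight into `receive_result()`.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

using Buffer = std::vector<unsigned char>;  // assumption: results are raw bytes

class Split_job {
public:
    virtual ~Split_job() = default;
    // May run on any host; must not touch state that only exists on the origin.
    virtual void execute_remote(int i, int n, Buffer& out) = 0;
    // Runs on the origin host; integrates the result of fragment i.
    virtual void receive_result(int i, int n, const Buffer& in) = 0;

    // execute_remote() + receive_result() = execute_fragment()
    void execute_fragment(int i, int n) {
        Buffer result;
        execute_remote(i, n, result);
        receive_result(i, n, result);
    }
};

// Example: fragment i sums its slice of 1..count; the origin adds the partial sums
// into a total that exists only on the origin host.
class Sum_job : public Split_job {
public:
    explicit Sum_job(long count) : m_count(count) {}

    void execute_remote(int i, int n, Buffer& out) override {
        long begin = m_count * i / n + 1, end = m_count * (i + 1) / n;
        long partial = 0;
        for (long k = begin; k <= end; ++k) partial += k;
        out.resize(sizeof partial);
        std::memcpy(out.data(), &partial, sizeof partial);  // "serialize" the result
    }

    void receive_result(int, int, const Buffer& in) override {
        long partial = 0;
        assert(in.size() == sizeof partial);
        std::memcpy(&partial, in.data(), sizeof partial);   // "deserialize" the result
        m_total += partial;
    }

    long total() const { return m_total; }

private:
    long m_count;
    long m_total = 0;
};
```

The remote half only needs the job's input data, while the integration half owns the output; that separation is what lets a fragment run on a host that does not share memory with the origin.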

Parallelization Model – Single Host

My_job: Scene, Camera, Framebuf[ ]

1 host, 2 GPUs

[Figure: on the single host, fragments 0, 1 and 4 run on GPU 1 and fragments 2, 3 and 5 run on GPU 2; each fragment is processed by a call to execute_fragment() on My_job]
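The slides do not say how the fragment-to-GPU assignment is chosen. One plausible policy, shown here purely as an assumption, is dynamic load balancing: one worker thread per GPU repeatedly claims the next unclaimed fragment index from a shared atomic counter, so a faster device simply ends up with more fragments. The function `run_on_devices` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Runs fragments 0..n-1 on `devices` workers; returns (fragment, device) pairs
// in the order the fragments were finished.
std::vector<std::pair<int, int>> run_on_devices(
    int n, int devices, const std::function<void(int i, int n, int device)>& execute) {
    assert(n >= 0 && devices > 0);
    std::atomic<int> next{0};              // next fragment nobody has claimed yet
    std::mutex log_mutex;
    std::vector<std::pair<int, int>> log;  // which device ran which fragment

    std::vector<std::thread> workers;
    for (int d = 0; d < devices; ++d) {
        workers.emplace_back([&, d] {
            for (int i = next++; i < n; i = next++) {  // claim fragments until none are left
                execute(i, n, d);                       // e.g. launch a CUDA kernel on device d
                std::lock_guard<std::mutex> lock(log_mutex);
                log.emplace_back(i, d);
            }
        });
    }
    for (auto& w : workers) w.join();
    return log;
}
```

With two devices and six fragments this can produce a split like the one in the figure, but the exact assignment depends on timing; every fragment is still executed exactly once.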

Parallelization Model

Parallelization Model – 3 Hosts

3 hosts, 2 GPUs each; My_job (Scene, Camera, Framebuf[ ]) lives on Host 1

Fragment 0: Host 1, GPU 1
Fragment 1: Host 2, GPU 1
Fragment 2: Host 2, GPU 2
Fragment 3: Host 1, GPU 2
Fragment 4: Host 3, GPU 1
Fragment 5: Host 3, GPU 2

[Figure: My_job is replicated to Hosts 2 and 3 with Scene and Camera only; Framebuf[ ] exists only on Host 1. Host 1 runs fragments 0 and 3 itself with execute_fragment(), Hosts 2 and 3 run fragments 1, 2, 4 and 5 with execute_remote(), and Host 1 integrates those results with receive_result().]
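A single-process sketch of the three-host flow, under deliberately simplified assumptions: "hosts" are threads, the "network" is a mutex-protected queue of byte buffers, and the fragment-to-host map from the slides is passed in explicitly (host 0 is the origin). The origin runs its own fragments with execute_fragment() and integrates everyone else's results with receive_result(); only the origin owns the framebuffer. `Channel`, `Pixel_job` and `run_cluster` are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Simulated network: remote hosts push (fragment, result) messages; the origin pops them.
class Channel {
public:
    void send(int fragment, Buffer data) {
        { std::lock_guard<std::mutex> lock(m_mutex); m_queue.emplace(fragment, std::move(data)); }
        m_ready.notify_one();
    }
    std::pair<int, Buffer> receive() {  // blocks until a message arrives
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_queue.empty(); });
        auto msg = std::move(m_queue.front());
        m_queue.pop();
        return msg;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::pair<int, Buffer>> m_queue;
};

// Hypothetical job: fragment i computes one "pixel"; only the origin holds framebuf.
struct Pixel_job {
    std::vector<int> framebuf;

    static Buffer execute_remote(int i, int /*n*/) {  // needs no origin-only state
        int value = i * i;                             // placeholder for real rendering work
        Buffer out(sizeof value);
        std::memcpy(out.data(), &value, sizeof value);
        return out;
    }
    void receive_result(int i, int /*n*/, const Buffer& in) {
        std::memcpy(&framebuf[i], in.data(), sizeof(int));
    }
    void execute_fragment(int i, int n) { receive_result(i, n, execute_remote(i, n)); }
};

// host_of[i] is the host assigned to fragment i; host 0 is the origin.
void run_cluster(Pixel_job& job, const std::vector<int>& host_of, int hosts) {
    const int n = static_cast<int>(host_of.size());
    for (int h : host_of) assert(h >= 0 && h < hosts);
    job.framebuf.assign(n, -1);
    Channel network;
    std::vector<std::thread> remotes;
    int expected = 0;  // number of remote results the origin must receive
    for (int h = 1; h < hosts; ++h) {
        std::vector<int> mine;
        for (int i = 0; i < n; ++i)
            if (host_of[i] == h) mine.push_back(i);
        expected += static_cast<int>(mine.size());
        remotes.emplace_back([&network, mine, n] {
            for (int i : mine) network.send(i, Pixel_job::execute_remote(i, n));
        });
    }
    for (int i = 0; i < n; ++i)           // origin runs its own fragments directly
        if (host_of[i] == 0) job.execute_fragment(i, n);
    for (int k = 0; k < expected; ++k) {  // ...and integrates the remote results
        auto msg = network.receive();
        job.receive_result(msg.first, n, msg.second);
    }
    for (auto& t : remotes) t.join();
}
```

Calling `run_cluster(job, {0, 1, 1, 0, 2, 2}, 3)` reproduces the assignment from the slides (zero-based hosts). Results may arrive in any order; receive_result() uses the fragment index to put each one in place.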


Parallelization Model - Hierarchical

[Figure: hierarchical distribution over a Viewer Host, Compositor Host, Render Host and GPUs; the diagram labels a Rendering Job, Compositor Job, GPU Job and GPU Fragment]
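One way to picture the hierarchy, sketched with hypothetical names and a purely sequential executor: a fragment of an outer job can itself launch an inner fragmented job. Here each compositor-level fragment fans out into per-GPU fragments and composites their output before returning. This illustrates nesting only; it is not DiCE's actual scheduling.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sequential executor: runs fragments 0..n-1 of a job.
void run_job(int n, const std::function<void(int i, int n)>& fragment) {
    assert(n > 0);
    for (int i = 0; i < n; ++i) fragment(i, n);
}

// Rendering job: one fragment per compositor; each compositor fragment runs a
// nested GPU job with one fragment per GPU, then composites the pieces.
std::vector<int> render(int compositors, int gpus_per_compositor, int width) {
    std::vector<int> image(width, 0);
    run_job(compositors, [&](int c, int nc) {
        int begin = width * c / nc, end = width * (c + 1) / nc;  // this compositor's region
        std::vector<int> region(end - begin, 0);
        run_job(gpus_per_compositor, [&](int g, int ng) {        // nested GPU job
            int len = end - begin;
            for (int x = len * g / ng; x < len * (g + 1) / ng; ++x)
                region[x] = begin + x;                            // placeholder "shading"
        });
        for (int x = begin; x < end; ++x) image[x] = region[x - begin];  // composite
    });
    return image;
}
```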


Datastore


In-memory NoSQL datastore for arbitrary C++ objects

Store an object on one host / retrieve it on any host

Data transport (mostly) transparent to application
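To make "store on one host, retrieve on any host" concrete, here is a toy in-process model with assumed semantics: each host has a local map from tag to serialized bytes, the cluster remembers which host owns each tag, and `access()` transparently copies a missing object from its owner. `Cluster`, `Tag`, `store()` and `access()` are illustrative names, not the DiCE API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Tag = std::uint32_t;
using Bytes = std::string;  // a serialized object

class Cluster {
public:
    explicit Cluster(int hosts) : m_local(hosts) {}

    // Store an object on `host`, which becomes its owner; returns a cluster-wide tag.
    Tag store(int host, const Bytes& data) {
        Tag tag = m_next_tag++;
        m_local.at(host)[tag] = data;
        m_owner[tag] = host;
        return tag;
    }

    // Access from any host; copies the object from its owner if it is not local yet.
    const Bytes& access(int host, Tag tag) {
        auto& local = m_local.at(host);
        auto it = local.find(tag);
        if (it == local.end()) {
            auto owner = m_owner.find(tag);
            if (owner == m_owner.end()) throw std::out_of_range("unknown tag");
            ++m_transfers;  // stands in for a network transfer
            it = local.emplace(tag, m_local[owner->second].at(tag)).first;
        }
        return it->second;
    }

    int transfers() const { return m_transfers; }

private:
    std::vector<std::map<Tag, Bytes>> m_local;  // per-host storage
    std::map<Tag, int> m_owner;                 // tag -> owning host
    Tag m_next_tag = 1;
    int m_transfers = 0;
};
```

The caller never names the owning host when reading; the first access on a host triggers one transfer and later accesses are served locally.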

Datastore Objects

Start from your own class, with arbitrary member variables and arbitrary member functions. To store it in the datastore: derive from the base class, implement serialization and deserialization, and register the class.

```cpp
class My_adder : public Element< UUID >            // derive from base class (UUID identifies the class)
{
    float m_a;                                     // arbitrary member variables
    int m_b;

    float sum() { return m_a + m_b; }              // arbitrary member functions

    void serialize(ISerializer* serializer)        // implement serialization
    {
        serializer->write(m_a);
        serializer->write(m_b);
    }

    void deserialize(IDeserializer* deserializer)  // implement deserialization:
    {                                              // read in the same order as written
        deserializer->read(m_a);
        deserializer->read(m_b);
    }
};

register_serializable_class< My_adder >();         // register class
```
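The pattern can be tried out without DiCE. In this self-contained sketch, `Serializer`, `Deserializer`, `Element_base`, `registry()` and `register_class()` are hypothetical minimal stand-ins (byte-buffer writer and reader plus a map from class id to factory); DiCE's real interfaces are not reproduced. It shows the contract: deserialize() reads members in exactly the order serialize() wrote them, and registration lets a receiving host construct the right class from an id alone.

```cpp
#include <cassert>
#include <cstring>
#include <functional>
#include <map>
#include <memory>
#include <vector>

class Serializer {  // hypothetical: appends raw bytes to a buffer
public:
    template <typename T> void write(const T& v) {
        const auto* p = reinterpret_cast<const unsigned char*>(&v);
        m_buf.insert(m_buf.end(), p, p + sizeof v);
    }
    const std::vector<unsigned char>& buffer() const { return m_buf; }
private:
    std::vector<unsigned char> m_buf;
};

class Deserializer {  // hypothetical: reads the bytes back in the same order
public:
    explicit Deserializer(const std::vector<unsigned char>& buf) : m_buf(buf) {}
    template <typename T> void read(T& v) {
        assert(m_pos + sizeof v <= m_buf.size());
        std::memcpy(&v, m_buf.data() + m_pos, sizeof v);
        m_pos += sizeof v;
    }
private:
    const std::vector<unsigned char>& m_buf;
    std::size_t m_pos = 0;
};

struct Element_base {  // hypothetical stand-in for the datastore base class
    virtual ~Element_base() = default;
    virtual int class_id() const = 0;
    virtual void serialize(Serializer* s) const = 0;
    virtual void deserialize(Deserializer* d) = 0;
};

// Hypothetical registry: class id -> factory, filled by register_class().
using Factory = std::function<std::unique_ptr<Element_base>()>;
std::map<int, Factory>& registry() {
    static std::map<int, Factory> r;
    return r;
}
template <typename T> void register_class() {
    registry()[T::static_id()] = [] { return std::unique_ptr<Element_base>(new T()); };
}

class My_adder : public Element_base {
public:
    static int static_id() { return 1; }  // stand-in for the slide's UUID
    float m_a = 0;
    int m_b = 0;
    float sum() const { return m_a + m_b; }
    int class_id() const override { return static_id(); }
    void serialize(Serializer* s) const override { s->write(m_a); s->write(m_b); }
    void deserialize(Deserializer* d) override { d->read(m_a); d->read(m_b); }
};
```

A round trip writes the class id first, then the members; the receiver reads the id, asks the registry for a fresh object, and lets that object deserialize itself.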


Datastore: Cache

Per-host cache for objects

— Accessing an object makes sure it is in the local cache!

— If necessary, it is fetched from other hosts

If the cache is full: evict objects owned by other hosts (LRU)

— Store more data in the cluster than a single host could hold

Configurable redundant storage for handling host failure
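The eviction rule can be sketched as an LRU cache in which only copies of objects owned by other hosts are evictable; objects this host owns are never dropped, since this host is their home. `Host_cache` is a hypothetical illustration: capacity is counted in objects rather than bytes, and redundancy is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

class Host_cache {
public:
    explicit Host_cache(std::size_t capacity) : m_capacity(capacity) {}

    // Insert or touch an object; `owned` marks objects this host is the home of.
    void put(const std::string& tag, bool owned) {
        auto it = m_index.find(tag);
        if (it != m_index.end()) { touch(it->second); return; }
        m_lru.push_front({tag, owned});
        m_index[tag] = m_lru.begin();
        evict_if_needed();
    }

    // On access: mark as most recently used; false means "fetch from another host".
    bool get(const std::string& tag) {
        auto it = m_index.find(tag);
        if (it == m_index.end()) return false;
        touch(it->second);
        return true;
    }

    bool contains(const std::string& tag) const { return m_index.count(tag) != 0; }

private:
    struct Entry { std::string tag; bool owned; };
    using Iter = std::list<Entry>::iterator;

    void touch(Iter e) { m_lru.splice(m_lru.begin(), m_lru, e); }

    void evict_if_needed() {
        // Walk from least to most recently used, dropping only non-owned copies.
        auto it = m_lru.end();
        while (m_lru.size() > m_capacity && it != m_lru.begin()) {
            --it;
            if (!it->owned) {
                m_index.erase(it->tag);
                it = m_lru.erase(it);
            }
        }
    }

    std::size_t m_capacity;
    std::list<Entry> m_lru;  // front = most recently used
    std::unordered_map<std::string, Iter> m_index;
};
```

If every cached object is owned, nothing is evicted and the cache may exceed its capacity; a real system would have to handle that case (e.g. by spilling or refusing), which this sketch does not.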


Datastore Transactions

Important for multi-user operation

ACID

— Atomicity: a transaction is either committed or aborted as a whole

— Consistency: cluster-wide locks available

— Isolation: starting a transaction "freezes" its view of the datastore

— Durability: redundancy
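Atomicity means a transaction's changes become visible together on commit, or not at all on abort. A minimal single-host sketch of that contract, assuming writes are buffered inside the transaction until commit. `Transaction` over a plain map is hypothetical; cluster-wide locks and redundancy are not modeled.

```cpp
#include <cassert>
#include <map>
#include <string>

using Store = std::map<std::string, int>;

class Transaction {
public:
    explicit Transaction(Store& store) : m_store(store) {}

    // Buffer the write; the shared store is untouched until commit().
    void write(const std::string& key, int value) { m_pending[key] = value; }

    // Reads see this transaction's own pending writes first.
    int read(const std::string& key) const {
        auto it = m_pending.find(key);
        return it != m_pending.end() ? it->second : m_store.at(key);
    }

    void commit() {  // apply all buffered writes
        for (const auto& kv : m_pending) m_store[kv.first] = kv.second;
        m_pending.clear();
    }

    void abort() { m_pending.clear(); }  // discard all buffered writes

private:
    Store& m_store;
    Store m_pending;
};
```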

Transaction Isolation

Isolation based on multi-version capability

Copy-on-write

[Figure: transactions T7 and T8 both see object versions A5 and X9. When T8 modifies X, copy-on-write creates a new version X10 for T8 while T7 keeps seeing X9. In the final step T7 and X9 no longer appear; A5 and X10 remain.]
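The idea can be sketched with version numbers, under our own simplifying assumptions: each key keeps a map from commit number to value, a transaction reads the newest version not newer than its snapshot, and writes go into a private new version instead of overwriting (copy-on-write). Concurrent transactions therefore keep their frozen view. `Versioned_store` is hypothetical; write-write conflicts and garbage collection of old versions are not handled.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

class Versioned_store {
public:
    struct Transaction {
        int snapshot;                    // last commit visible to this transaction
        std::map<std::string, int> own;  // private new versions (copy-on-write)
    };

    Transaction begin() const { return Transaction{m_last_commit, {}}; }

    int read(const Transaction& t, const std::string& key) const {
        auto mine = t.own.find(key);
        if (mine != t.own.end()) return mine->second;  // own writes first
        const auto& versions = m_versions.at(key);
        auto it = versions.upper_bound(t.snapshot);     // first commit after the snapshot
        assert(it != versions.begin());                 // key must exist in the snapshot
        return std::prev(it)->second;                   // newest version in the snapshot
    }

    // Writing never touches shared versions; the new value is private to t.
    static void write(Transaction& t, const std::string& key, int value) { t.own[key] = value; }

    // Commit publishes t's values as new versions; older versions stay readable.
    void commit(Transaction& t) {
        ++m_last_commit;
        for (const auto& kv : t.own) m_versions[kv.first][m_last_commit] = kv.second;
        t.own.clear();
    }

private:
    std::map<std::string, std::map<int, int>> m_versions;  // key -> (commit number -> value)
    int m_last_commit = 0;
};
```

In the test below, the values 5, 9 and 10 loosely echo the figure's A5, X9 and X10: T7 keeps reading X = 9 even after T8 has committed X = 10.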


Networking / Clustering


Handles cluster building and data transfers

— Self-organizing, dynamic addition and removal of hosts

— Tested with up to 1000 hosts

— Several networking protocols for different environments…


Network Layer: UDP with Multicast

Unicast: send separately to each host

Multicast: like radio, sent once, received by many
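The difference in what the sender has to push onto its own link, for a payload of size S delivered to N hosts and ignoring packet headers, acknowledgements and retransmissions:

```latex
B_{\text{unicast}} = N \cdot S, \qquad B_{\text{multicast}} = S
```

For example, sending 1 GB of data to 100 hosts costs the sender about 100 GB of traffic with unicast but about 1 GB with multicast, so the advantage grows with the number of hosts.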


Self-organization:

— Multicast address identifies cluster

— Multicast “beacon” packets to announce to other hosts

— “Election” process to elect one synchronizer

Multicast / unicast used for bulk data transfers

— Especially effective for many hosts
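A toy model of the self-organization steps, with assumptions the slides do not spell out: every host sends a beacon carrying a numeric host id to the cluster's multicast address, each host hears all beacons of its group, and the election rule is "lowest id wins". Hosts are simulated in one process and no sockets are opened; `Beacon_bus` and the lowest-id rule are illustrative choices, not DiCE's protocol.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Simulated multicast: a beacon sent to a group address reaches every member of that group.
class Beacon_bus {
public:
    void join(const std::string& group, int host_id) { m_groups[group].insert(host_id); }
    void leave(const std::string& group, int host_id) { m_groups[group].erase(host_id); }

    // The set of host ids whose beacons are heard on this group (the cluster's members).
    std::set<int> heard(const std::string& group) const {
        auto it = m_groups.find(group);
        return it == m_groups.end() ? std::set<int>{} : it->second;
    }

private:
    std::map<std::string, std::set<int>> m_groups;  // multicast address -> host ids
};

// Every host applies the same deterministic rule to the same set of beacons,
// so in this model all hosts agree on the synchronizer.
int elect_synchronizer(const std::set<int>& hosts) {
    assert(!hosts.empty());
    return *hosts.begin();  // assumption: lowest id wins
}
```

Because the multicast address identifies the cluster, hosts joined to a different address form a separate cluster with its own election, and a departing synchronizer simply triggers a new election among the remaining beacons.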


Network Layer: TCP

For networks with

— low-bandwidth multicast or

— no multicast (e.g. Amazon Web Services)

Discovering hosts

— via the UDP multicast layer or

— via at least one known host

TCP used for all data transport

Still fully dynamic
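Without multicast, a new host only needs the address of one known member. A simulated sketch of that bootstrapping, assuming a simple join protocol of our own: the newcomer asks the known host for its member list, then introduces itself to every member so that all views agree; leaving removes the host from every view. `Tcp_cluster` and the protocol are hypothetical; the slides do not describe DiCE's actual TCP handshake.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Each host's view of the cluster membership, simulated in one process.
class Tcp_cluster {
public:
    // The first host starts a cluster on its own.
    void start(const std::string& host) { m_views[host] = {host}; }

    // `host` joins by contacting one already-known member.
    void join(const std::string& host, const std::string& known) {
        const std::set<std::string> members = m_views.at(known);  // 1. ask for the member list
        m_views[host] = members;
        m_views[host].insert(host);
        for (const auto& m : members) m_views[m].insert(host);   // 2. introduce itself to each
    }

    // Dynamic removal: every remaining host drops the departed one.
    void leave(const std::string& host) {
        m_views.erase(host);
        for (auto& view : m_views) view.second.erase(host);
    }

    const std::set<std::string>& view(const std::string& host) const { return m_views.at(host); }

private:
    std::map<std::string, std::set<std::string>> m_views;  // host -> hosts it knows
};
```

Note that h3 in the test below joins through h2, not through the first host; any member works as the entry point.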

Network Layer: InfiniBand

Native InfiniBand with RDMA

RDMA used for speeding up bulk data transfers

Fastest transmissions > 30 Gbit/s end-to-end

[Figure: RDMA transfer between the memory of Host 1 (address 0x1234) and the memory of Host 2 (address 0x4532), bypassing the CPUs]


Other Features

More multi-user capabilities (scopes, ...)

"Futures"

Global logging system

HTTP Server

RTMP Video streaming

Cloud Bridge

...


Summary

DiCE is a library for writing scalable applications

DiCE has been used in our rendering products for 10 years

Currently directly usable if you use indeX


Thank you …

Stefan Radig, Sr. Manager, NVIDIA Iray and DiCE