Transcript of "Going Parallel" by Tammy Dahlgren, Tom Epperly, Scott Kohn, & Gary Kumfert

Page 1: Tammy Dahlgren, Tom Epperly, Scott Kohn, & Gary Kumfert

Page 2:

Goals

Describe our vision to the CCA

Solicit contributions (code) for:
  - RMI (SOAP | SOAP w/ MIME types)
  - Parallel network algorithms (general arrays)

Encourage Collaboration

Page 3:

Outline

Background on Components @llnl.gov

General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface

Parallel Handles to a Parallel Distributed Component

Tentative Research Strategy

Page 4:

Components @llnl.gov

Quorum - web voting

Alexandria - component repository

Babel - language interoperability
  - maturing to platform interoperability
  - implies some RMI mechanism: SOAP | SOAP w/ MIME types
  - open to suggestions & contributed source code

Page 5:

Babel & MxN problem

Unique opportunities:
  - SIDL communication directives
  - Babel generates code anyway
  - Users already link against the Babel runtime library
  - Can hook directly into the Intermediate Object Representation (IOR)

Page 6:

Impls and Stubs and Skels

[Figure: the layers, from the user's code down - Application, Stubs, IORs, Skels, Impls.]

Application: uses components in the user's language of choice.

Client-side Stubs: translate from the application language to C.

Internal Object Representation (IOR): always in C.

Server-side Skeletons: translate the IOR (in C) to the component implementation language.

Implementation: the component developer's choice of language (can be wrappers to legacy code).
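A minimal, single-process sketch of this layering, with every name (Vector_IOR, vector_dot_skel, VectorImpl) invented for illustration rather than taken from Babel's actual generated code:

  #include <cstdio>

  // Internal Object Representation: in Babel this is always plain C;
  // here a struct of function pointers stands in for it.
  struct Vector_IOR {
    void* impl;                                      // opaque pointer to the implementation
    double (*dot)(Vector_IOR* self, Vector_IOR* y);  // method-table entry
  };

  // Implementation: the component developer's side.
  struct VectorImpl {
    double data[2];
  };

  // Server-side skeleton: translates the C-level IOR call into the impl language.
  double vector_dot_skel(Vector_IOR* self, Vector_IOR* y) {
    VectorImpl* xi = static_cast<VectorImpl*>(self->impl);
    VectorImpl* yi = static_cast<VectorImpl*>(y->impl);
    return xi->data[0]*yi->data[0] + xi->data[1]*yi->data[1];
  }

  // Client-side stub: what the application codes against.
  class Vector {
  public:
    explicit Vector(Vector_IOR* ior) : ior_(ior) {}
    double dot(Vector& y) { return ior_->dot(ior_, y.ior_); }
  private:
    Vector_IOR* ior_;
  };

  int main() {
    VectorImpl xi = {{1.0, 2.0}}, yi = {{3.0, 4.0}};
    Vector_IOR xior = {&xi, vector_dot_skel}, yior = {&yi, vector_dot_skel};
    Vector x(&xior), y(&yior);
    std::printf("x.dot(y) = %g\n", x.dot(y));  // Application -> Stub -> IOR -> Skel -> Impl
  }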

Page 7:

Remote Components

[Figure: client side - Application, Stubs, Marshaler, IORs, Line Protocol; across the Internet; server side - Line Protocol, Unmarshaler, Skels, IORs, Impls.]
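To make the remote path concrete, here is a minimal sketch of a marshaler flattening a call onto a line protocol and an unmarshaler rebuilding it on the far side. The wire format and every name are invented for illustration; this is not Babel's actual protocol.

  #include <iostream>
  #include <sstream>
  #include <string>
  #include <vector>

  // Marshaler: encode a method name and its arguments into a wire string.
  std::string marshal(const std::string& method, const std::vector<double>& args) {
    std::ostringstream wire;
    wire << method << ' ' << args.size();
    for (double a : args) wire << ' ' << a;
    return wire.str();                      // this is what travels over the Internet
  }

  // Unmarshaler: decode the wire string and dispatch to the skeleton.
  double unmarshal_and_dispatch(const std::string& wire) {
    std::istringstream in(wire);
    std::string method;
    std::size_t n = 0;
    in >> method >> n;
    std::vector<double> args(n);
    for (double& a : args) in >> a;
    double sum = 0.0;                       // skeleton dispatch would go here;
    for (double a : args) sum += a;         // summing is a stand-in
    return sum;
  }

  int main() {
    std::string wire = marshal("Vector.sum", {0.0, 1.1, 2.2});
    std::cout << unmarshal_and_dispatch(wire) << '\n';  // prints 3.3
  }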

Page 8:

Outline

Background on Components @llnl.gov

General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface

Parallel Handles to a Parallel Distributed Component

Tentative Research Strategy

Page 9:

Initial Assumptions

Working Point-to-Point RMI

Object Persistence

Page 10:

Example #1: 1-D Vectors

Vector y is distributed across three processes:

  p0: int globalSize = 6; int localSize = 2; int[] local2global = { 0, 1 }; double[] localData = { 0.0, 1.1 };
  p1: int globalSize = 6; int localSize = 2; int[] local2global = { 2, 3 }; double[] localData = { 2.2, 3.3 };
  p2: int globalSize = 6; int localSize = 2; int[] local2global = { 4, 5 }; double[] localData = { 4.4, 5.5 };

Vector x (and a double result) is distributed across two processes:

  p3: int globalSize = 6; int localSize = 3; int[] local2global = { 0, 1, 2 }; double[] localData = { .9, .8, .7 };
  p4: int globalSize = 6; int localSize = 3; int[] local2global = { 3, 4, 5 }; double[] localData = { .6, .5, .4 };

[Figure: p0-p2 also hold handles to x, and p3-p4 hold handles to y.]

All five processes make the same call:

  double d = x.dot( y );
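The MxN mismatch is already visible in the index sets: p3's piece of x needs y at global indices {0, 1, 2}, which live on p0 ({0, 1}) and p1 ({2}). A self-contained sketch, with a hypothetical Block struct, that locates those owners:

  #include <cstdio>
  #include <vector>

  struct Block {                      // one process's piece of a distributed vector
    std::vector<int> local2global;
    std::vector<double> localData;
  };

  int main() {
    // y as distributed over p0..p2 (M = 3), values from the slide
    std::vector<Block> y = { {{0,1},{0.0,1.1}}, {{2,3},{2.2,3.3}}, {{4,5},{4.4,5.5}} };
    std::vector<int> need = {0, 1, 2};  // indices p3's piece of x requires
    for (int g : need)
      for (std::size_t p = 0; p < y.size(); ++p)
        for (std::size_t i = 0; i < y[p].local2global.size(); ++i)
          if (y[p].local2global[i] == g)
            std::printf("global %d -> p%zu local %zu (%.1f)\n",
                        g, p, i, y[p].localData[i]);
  }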

Page 11:

Example #1: 1-D Vectors (continued)

[Figure: the same five-process layout as the previous slide, shown without the data values; again every process calls double d = x.dot( y );]

Page 12:

Rule #1: Owner Computes

double vector::dot( vector& y ) {
  // initialize
  double * yData = new double[localSize];
  y.requestData( localSize, local2global, yData );

  // sum all x[i] * y[i]
  double localSum = 0.0;
  for( int i = 0; i < localSize; ++i ) {
    localSum += localData[i] * yData[i];
  }

  // cleanup
  delete[] yData;
  return localMPIComm.globalSum( localSum );
}

Page 13:

Design Concerns

Vector y is not guaranteed to have its data mapped appropriately for the dot product.

Vector y is expected to handle the MxN data redistribution internally:

  y.requestData( localSize, local2global, yData );

Should each component implement MxN redistribution?

Page 14:

Outline

Background on Components @llnl.gov

General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface

Parallel Handles to a Parallel Distributed Component

Tentative Research Strategy

Page 15:

Vector Dot Product: Take #2

double vector::dot( vector& y ) {
  // initialize
  MUX mux( *this, y );
  double * yData = mux.requestData( localSize, local2global );

  // sum all x[i] * y[i]
  double localSum = 0.0;
  for( int i = 0; i < localSize; ++i ) {
    localSum += localData[i] * yData[i];
  }

  // cleanup
  mux.releaseData( yData );
  return localMPIComm.globalSum( localSum );
}
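The MUX above is assumed to obey roughly this contract: requestData() returns y's values for the caller's global indices, and releaseData() frees the buffer. Below is a single-process mock with hypothetical names; a real MUX would communicate between processes instead of reading an in-memory map.

  #include <cstdio>
  #include <map>

  class MockMUX {
  public:
    explicit MockMUX(const std::map<int,double>& yGlobal) : y_(yGlobal) {}

    // Fetch y's values for the given global indices.
    double* requestData(int localSize, const int* local2global) {
      double* buf = new double[localSize];
      for (int i = 0; i < localSize; ++i)
        buf[i] = y_.at(local2global[i]);   // real MUX: remote fetch + redistribution
      return buf;
    }
    void releaseData(double* data) { delete[] data; }

  private:
    std::map<int,double> y_;               // stands in for the remote processes
  };

  int main() {
    std::map<int,double> y = {{0,0.0},{1,1.1},{2,2.2},{3,3.3},{4,4.4},{5,5.5}};
    MockMUX mux(y);
    int l2g[3] = {0, 1, 2};                // p3's piece of x
    double* yData = mux.requestData(3, l2g);
    double x[3] = {.9, .8, .7}, sum = 0.0;
    for (int i = 0; i < 3; ++i) sum += x[i] * yData[i];
    mux.releaseData(yData);
    std::printf("local contribution to x.dot(y) = %g\n", sum);  // 2.42
  }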

Page 16:

Generalized Vector Ops

vector<T>::parallelOp( vector<T>& y ) {
  // initialize
  MUX mux( *this, y );
  vector<T> newY = mux.requestData( localSize, local2global );

  // problem reduced to a local operation
  localResult = localOp( newY );

  // cleanup
  mux.releaseData( newY );
  return localMPIComm.reduce( localResult );
}

Page 17:

Rule #2: MUX distributes data

Users invoke parallel operations without concern for data distribution.

Developers implement the local operation assuming the data is already distributed.

Babel generates the code that reduces a parallel operation to a local operation.

The MUX handles all communication.

How general is a MUX?

Page 18:

Example #2: Undirected Graph

[Figure: an undirected graph with vertices 0-11, partitioned into two pieces; each piece carries ghost copies of boundary vertices from the other.]

Page 19:

Key Observations

Every parallel component is a container and is divisible into subsets.

There is a minimal (atomic) addressable unit in each parallel component.

This minimal unit is addressable in global indices.

Page 20:

Atomicity

Vector (Example #1): atom - scalar; addressable - integer offset

Undirected Graph (Example #2): atom - vertex with ghost nodes; addressable - integer vertex id

Undirected Graph (alternate): atom - edge; addressable - ordered pair of integers
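Read as types, each container chooses an atom and a global-address scheme for it. A small illustrative C++ sketch (all names hypothetical):

  #include <utility>

  struct VectorAtoms {              // Example #1
    using Atom    = double;                   // a scalar
    using Address = int;                      // integer offset
  };

  struct GraphVertexAtoms {         // Example #2
    struct Vertex { /* vertex data plus ghost-node copies */ };
    using Atom    = Vertex;                   // vertex with ghost nodes
    using Address = int;                      // integer vertex id
  };

  struct GraphEdgeAtoms {           // alternate decomposition
    struct Edge { /* edge data */ };
    using Atom    = Edge;
    using Address = std::pair<int,int>;       // ordered pair of vertex ids
  };

  int main() {}  // types only; this sketch shows the addressing scheme, not behavior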

Page 21:

Outline

Background on Components @llnl.gov

General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface

Parallel Handles to a Parallel Distributed Component

Tentative Research Strategy

Page 22:

MxNRedistributable Interface

interface Serializable {
  store( in Stream s );
  load( in Stream s );
};

interface MxNRedistributable extends Serializable {
  int getGlobalSize();
  local int getLocalSize();
  local array<int,1> getLocal2Global();
  split( in array<int,1> maskVector,
         out array<MxNRedistributable,1> pieces );
  merge( in array<MxNRedistributable,1> pieces );
};
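To see what split() and merge() ask of a developer, here is an illustrative C++ rendering for the 1-D vector of Example #1. The class and signatures are a sketch under the interface above, not generated Babel code; it returns the pieces rather than using an out parameter, for brevity.

  #include <vector>

  struct VectorPiece {                       // stands in for an MxNRedistributable vector
    std::vector<int>    local2global;
    std::vector<double> localData;

    // maskVector[i] names the destination piece for local atom i.
    std::vector<VectorPiece> split(const std::vector<int>& maskVector,
                                   int numPieces) const {
      std::vector<VectorPiece> pieces(numPieces);
      for (std::size_t i = 0; i < localData.size(); ++i) {
        VectorPiece& p = pieces[maskVector[i]];
        p.local2global.push_back(local2global[i]);  // atoms keep their global addresses
        p.localData.push_back(localData[i]);
      }
      return pieces;
    }

    // merge() is the inverse: concatenate pieces, global addresses intact.
    void merge(const std::vector<VectorPiece>& pieces) {
      for (const VectorPiece& p : pieces) {
        local2global.insert(local2global.end(),
                            p.local2global.begin(), p.local2global.end());
        localData.insert(localData.end(),
                         p.localData.begin(), p.localData.end());
      }
    }
  };

  int main() {
    VectorPiece y{{0,1,2,3}, {0.0,1.1,2.2,3.3}};
    std::vector<VectorPiece> pieces = y.split({0,0,1,1}, 2);  // two destinations
    VectorPiece z;
    z.merge(pieces);   // round-trips back to the original atoms
  }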

Page 23:

Rule #3: All Parallel Components implement “MxNRedistributable”

Provides a standard interface for the MUX to manipulate components.

Minimal coding requirements on the developer.

Key to the abstraction: split() and merge().

Manipulates "atoms" by global address.

Page 24:

Now for the hard part...

... 13 slides illustrating how it all fits together for an Undirected Graph

Page 25:

%> mpirun -np 2 graphtest

[Figure: two client processes start, pid=0 and pid=1.]

Page 26:

BabelOrb * orb = BabelOrb.connect( "http://..." );

[Figure: both processes connect to a remote BabelOrb and obtain orb handles (steps 1-4).]

Page 27:

Graph * graph = orb->create("graph", 3);

[Figure: the orb creates the graph component across three server processes; a graph handle and a MUX appear on every process, client and server (steps 1-4).]

Page 28:

graph->load("file://...");

[Figure: the three server-side graph/MUX pairs load the graph (vertices 0-11, with overlapping ghost nodes); FancyMUX routing connects the client-side orb/graph/MUX stacks to the servers (steps 1-2).]

Page 29:

graph->doExpensiveWork();

[Figure: the servers work on their pieces of the graph (vertices 0-11, with overlap); the client-side orb/graph/MUX stacks are unchanged.]

Page 30:

PostProcessor * pp = new PostProcessor;

[Figure: each client process now holds a pp alongside its orb, graph handle, and MUX; the server-side graph pieces are unchanged.]

Page 31:

pp->render( graph );

The MUX queries the graph for its global size (12).

The graph determines a particular data layout (blocked).

The MUX is invoked to guarantee that layout before the render implementation is called.

[Figure: each client-side pp posts a "require" for its block of the blocked layout over vertices 0-11.]
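A sketch of that guarantee, with all names hypothetical: the generated wrapper compares the current layout to the one render() requires, redistributes only on mismatch, and then calls the user's implementation.

  #include <vector>

  using Layout = std::vector<int>;   // global id -> owning process

  bool sameLayout(const Layout& a, const Layout& b) { return a == b; }

  void redistribute(Layout& current, const Layout& required) {
    // split/send/merge traffic would happen here; the sketch just adopts the layout
    current = required;
  }

  template <typename Impl>
  void render(Layout& graphLayout, const Layout& required, Impl render_impl) {
    if (!sameLayout(graphLayout, required))
      redistribute(graphLayout, required);   // MUX guarantees the layout...
    render_impl();                           // ...before the user's code is called
  }

  int main() {
    Layout current  = {0,0,0,0,0,0,0, 1,1,1,1,1};  // as loaded (illustrative)
    Layout required = {0,0,0,0,0,0, 1,1,1,1,1,1};  // blocked, as render requires
    render(current, required, []{ /* user's render_impl runs here */ });
  }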

Page 32:

MUX solves a general parallel network flow problem (client & server).

[Figure: server-side pieces of the graph on one side, the clients' required blocked layout (vertices 0-11) on the other; the pipes carry index sets 0,1,2,3 / 4,5 / 6,7 / 8,9,10,11.]
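The core of that flow problem is deciding which global indices travel down which pipe. A minimal sketch that intersects server ownership with client requirements; the ownership sets below are illustrative, not the exact sets in the figure.

  #include <algorithm>
  #include <cstdio>
  #include <iterator>
  #include <vector>

  int main() {
    // server-side ownership of the graph's atoms (sorted; illustrative values)
    std::vector<std::vector<int>> owns  = { {0,1,2,3,4,5,6}, {5,6,7,8,9,10,11} };
    // client-side required (blocked) layout
    std::vector<std::vector<int>> needs = { {0,1,2,3,4,5}, {6,7,8,9,10,11} };

    for (std::size_t s = 0; s < owns.size(); ++s)
      for (std::size_t c = 0; c < needs.size(); ++c) {
        std::vector<int> pipe;                  // what flows server s -> client c
        std::set_intersection(owns[s].begin(), owns[s].end(),
                              needs[c].begin(), needs[c].end(),
                              std::back_inserter(pipe));
        std::printf("server %zu -> client %zu:", s, c);
        for (int g : pipe) std::printf(" %d", g);
        std::printf("\n");
      }
    // a real MUX must also pick one sender per duplicated (ghosted) atom
  }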

Page 33:

MUX opens communication pipes.

[Figure: the same configuration, with pipes now open between the server-side and client-side MUXes for index sets 0,1,2,3 / 4,5 / 6,7 / 8,9,10,11.]

Page 34:

MUX splits graphs with multiple destinations (server side).

[Figure: each server-side piece whose atoms have more than one destination is split() into sub-pieces matching the clients' required blocks.]

Page 35:

MUX sends pieces through the communication pipes (persistence).

[Figure: the split sub-pieces travel through the pipes from the server processes toward the clients.]

Page 36:

MUX receives graphs through pipes & assembles them (client side).

[Figure: each client-side MUX merge()s the sub-pieces it receives into the block of the graph its pp required.]
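The pipe traffic on these two slides follows directly from Serializable: the sending MUX store()s each split piece into a stream and the receiving MUX load()s it back before merging. A minimal sketch with a stringstream standing in for a pipe (names hypothetical):

  #include <cstddef>
  #include <istream>
  #include <ostream>
  #include <sstream>
  #include <vector>

  struct Piece {                                 // a split-off set of atoms
    std::vector<int> ids;

    void store(std::ostream& s) const {          // Serializable.store
      s << ids.size() << ' ';
      for (int v : ids) s << v << ' ';
    }
    void load(std::istream& s) {                 // Serializable.load
      std::size_t n = 0;
      s >> n;
      ids.resize(n);
      for (int& v : ids) s >> v;
    }
  };

  int main() {
    std::stringstream pipe;                      // the communication pipe
    Piece out{{3, 4, 5, 6, 7, 8}};               // server side: send a piece
    out.store(pipe);

    Piece in;                                    // client side: receive it...
    in.load(pipe);
    // ...then merge() it into the local graph, as on the next slide
  }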

Page 37:

pp->render_impl( graph );  (user's implementation runs)

[Figure: each client's render implementation runs on its assembled block of the graph (vertices 0-11, blocked).]

Page 38:

Outline

Background on Components @llnl.gov

General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface

Parallel Handles to a Parallel Distributed Component

Tentative Research Strategy

Page 39:

Summary

All distributed components are containers and are subdivisible.

The smallest globally addressable unit is an atom.

The MxNRedistributable interface reduces the general component MxN problem to a 1-D array of ints.

The MxN problem is a special case of the general problem of N handles to M instances.

Babel is uniquely positioned to contribute a solution to this problem.

Page 40:

Outline

Background on Components @llnl.gov

General MxN Solution: bottom-up
  - Initial Assumptions
  - MUX Component
  - MxNRedistributable interface

Parallel Handles to a Parallel Distributed Component

Tentative Research Strategy

Page 41:

Tentative Research Strategy

Fast Track:
  - Java only, no Babel (serialization & RMI built in)
  - Build MUX
  - Experiment
  - Write paper

Sure Track:
  - Finish the 0.5.x line
  - Add serialization
  - Add RMI
  - Add in technology from the Fast Track

Page 42:

Open Questions

- Non-general, optimized solutions
- Client-side caching issues
- Fault tolerance
- Subcomponent migration
- Inter- vs. intra-component communication
- MxN, MxP, or MxPxQxN

Page 43:

MxPxQxN Problem

Long-Haul Network

Page 44:

MxPxQxN Problem

Long-Haul Network

Page 45:

The End

Page 46:

Work performed under the auspices of the U. S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48

UCRL-VG-142096