Parallel CORBA Objects

Transcript
Page 1: Parallel CORBA Objects

Parallel CORBA Objects

May 22nd, 2000

ARC « Couplage »

Christophe René (IRISA/IFSIC)

Page 2: Parallel CORBA Objects

Contents

Introduction

Parallel CORBA object concept

Performance evaluation

Encapsulation example

Conclusion

Page 3: Parallel CORBA Objects

Introduction

Objective: to design a Problem Solving Environment able to integrate a large number of codes aimed at simulating a physical problem, and to perform multi-physics simulations (code coupling)

Constraints:
Simulation codes may be located on different machines
• distributed processing
Simulation codes may require high-performance computers
• parallel processing

Approach: combine parallel and distributed technologies using a component approach (MPI + CORBA)

Page 4: Parallel CORBA Objects

CORBA Generalities

CORBA: Common Object Request Broker Architecture
Open standard for distributed object computing by the OMG

Software bus, object oriented
Remote invocation mechanism
Hardware, operating system and programming language independence
Vendor independence (interoperability)

Problems to face:
Performance issues
Poor integration of high performance computing environments with CORBA

Page 5: Parallel CORBA Objects

How does CORBA work?

Interface Definition Language (IDL): describes the remote object

IDL compiler: generates stub and skeleton code

IDL stub (proxy): handles remote invocation

IDL skeleton: links the object implementation to the ORB

interface MatrixOperations {
  const long SIZE = 100;
  typedef double Vector[ SIZE ];
  typedef double Matrix[ SIZE ][ SIZE ];
  void multiply( in Matrix A, in Vector B, out Vector C );
};

[Figure: object invocation path — client → IDL stub → Object Request Broker (ORB) → object adapter (OA) → IDL skeleton → object implementation; the stub and skeleton are generated by the IDL compiler]

Page 6: Parallel CORBA Objects

Encapsulating MPI-based parallel codes into CORBA objects

Master/slave approach:
One SPMD code acts as the master whereas the others act as slaves
The master drives the execution of the slaves through message-passing

Drawbacks:
Lack of scalability when communicating through the ORB
Requires modifications to the original MPI code

Advantage:
Can be used with any CORBA implementation

[Figure: encapsulated MPI code — the MPI master process (stub, skeleton, OA, scheduler) drives the MPI slave processes (SPMD codes) over the MPI communication layer; the client reaches the master through the CORBA ORB]

Page 7: Parallel CORBA Objects

Master / Slave approach in details

The master has to:
Select the method to invoke within the slave processes
Scatter data to the slave processes
Gather data from the slave processes

Master process: CORBA + MPI initialization
Slave processes: MPI initialization

[Figure: the MPI master process (stub, skeleton, OA, scheduler) and the MPI slave processes (SPMD codes with dispatchers) communicate over the MPI communication layer; the client invokes the master through the CORBA ORB]

Page 8: Parallel CORBA Objects

Parallel CORBA object concept

A collection of identical CORBA objects

Transparent to the client:
Parallel remote invocation
Data distribution

[Figure: a sequential client invokes, through its stub and the CORBA ORB, a parallel server — a parallel CORBA object made of several SPMD codes, each behind its own skeleton and OA, connected by the MPI communication layer]

Page 9: Parallel CORBA Objects

Problems to face

Communication between:
a sequential client and a parallel server
a parallel client and a sequential server
a parallel client and a parallel server

Implementation constraint:
Do not modify the ORB core, to keep interoperability features

Approach:
Modify stub and skeleton code
Extend the IDL compiler

Page 10: Parallel CORBA Objects

Extended-IDL

Collection specification:
Size specification = number of requests to send
Shape specification, used to distribute arrays

Data distribution specification:
Scatter and gather the elements of an array

Reduction operator specification:
Perform collective operations using request replies

Page 11: Parallel CORBA Objects

Specifying the number of objects in a collection

Several ways:
an integer value
an interval of integer values
a mathematical function
• power
• exponential
• multiple
the character “*”

interface[ 4 ] Example1 { /* ... */ };
interface[ 2 .. 8 ] Example2 { /* ... */ };
interface[ 2 ^ n ] Example3 { /* ... */ };
interface[ * ] Example4 { /* ... */ };

Page 12: Parallel CORBA Objects

Shape of the object collection

The shape depends on the data distribution specification, but users may add special requirements

How can we organize 8 objects?

Page 13: Parallel CORBA Objects

Shape of the object collection (cont’d)

Specification of the shape — the size of one dimension may be:
• an integer value
• a mathematical function
– a multiple
• a dependence between dimensions

interface[ 8: 2, 4 ] Example1 { /* ... */ };
interface[ *: 2 ] Example2 { /* ... */ };
interface[ *: *, 2 ] Example3 { /* ... */ };
interface[ *: 2 * n ] Example4 { /* ... */ };
interface[ x ^ 2: n, n ] Example5 { /* ... */ };

Page 14: Parallel CORBA Objects

Inheritance mechanism

Allowed only under some constraints:
the numbers of processors must match
the shapes of the virtual node arrays must match

interface[ * ] Example1 { /* ... */ };
interface[ 2 ^ n ] Example2 : Example1 { /* ... */ };
→ Inheritance not allowed

interface[ 2 ^ n ] Example1 { /* ... */ };
interface[ * ] Example2 : Example1 { /* ... */ };
→ Inheritance allowed

interface[ * ] Example1 { /* ... */ };
interface[ * : 2 ] Example2 : Example1 { /* ... */ };
→ Inheritance allowed

interface[ *: 2 ] Example1 { /* ... */ };
interface[ *: 3 ] Example2 : Example1 { /* ... */ };
→ Inheritance not allowed

Page 15: Parallel CORBA Objects

Specifying data distribution

New keyword: dist

Only arrays and sequences may be distributed

Available distribution modes: BLOCK, BLOCK( size ), CYCLIC, CYCLIC( size ), “*”

interface[ * ] Example {
  typedef double Arr1[ 8 ];
  typedef Arr1 Arr2[ 8 ];
  typedef sequence< double > Seq;
  void Op1( in dist[ CYCLIC ] Arr1 A, in Arr1 B,
            out dist[ BLOCK ][ * ] Arr2 C );
  void Op2( in dist[ BLOCK ] Seq A, inout Seq B );
};

Page 16: Parallel CORBA Objects

Distribution examples on 2 processors

BLOCK = BLOCK( BlockSize ), with
BlockSize = ( ArrayLength + ProcNb - 1 ) / ProcNb

CYCLIC = CYCLIC( 1 )

[Figure: element-to-processor mappings for BLOCK( 5 ) and CYCLIC( 3 ) on 2 processors]

Page 17: Parallel CORBA Objects

Mapping

Vector distribution on a processor matrix

Extended-IDL specification:

interface[ * ] Example {
  typedef double Arr[ 8 ];
  void Op1( in dist[ BLOCK ] Arr A );
  void Op2( in dist[ BLOCK, 2 ] Arr A );
};

[Figure: the resulting mappings of the 8-element vector onto a 2×2 node matrix (nodes 0-3) for Op1 and Op2]

Page 18: Parallel CORBA Objects

Mapping (cont’d)

typedef double Arr1[ 8 ];
typedef Arr1 Arr2[ 8 ];

Specification allowed:

interface[ * ] Example1 {
  void Op1( in dist[ * ][ CYCLIC ] Arr2 A );
  void Op2( in dist[ CYCLIC, 2 ][ * ] Arr2 A );
};

interface[ * ] Example2 {
  void Op1( in dist[ BLOCK, 2 ][ BLOCK, 1 ] Arr2 A,
            out dist[ BLOCK ][ BLOCK ] Arr2 B );
};

Specification not allowed:

interface[ * ] Example3 {
  void Op2( in dist[ CYCLIC, 2 ][ CYCLIC ] Arr2 A );
};

Page 19: Parallel CORBA Objects

Reduction operators

Available reduction operators:
min, max
addition (sum), multiplication (prod)
bitwise operations (and, or, xor)
logical operations (and, or, xor)

interface[ * ] Example1 {
  typedef double Arr[ 8 ];
  cland boolean Op1( in dist[ BLOCK ] Arr A, in double B );
  void Op2( in dist[ CYCLIC ] Arr A, inout cmin double B );
  void Op3( in dist[ CYCLIC( 3 ) ] Arr A, out csum double B );
};

Page 20: Parallel CORBA Objects

Summary

interface MatrixOperations {
  const long SIZE = 100;
  typedef double Vector[ SIZE ];
  typedef double Matrix[ SIZE ][ SIZE ];
  void multiply( in Matrix A, in Vector B, out Vector C );
  double minimum( in Vector A );
};

Collection specification:

interface[ * ] MatrixOperations {
  const long SIZE = 100;
  typedef double Vector[ SIZE ];
  typedef double Matrix[ SIZE ][ SIZE ];
  void multiply( in Matrix A, in Vector B, out Vector C );
  double minimum( in Vector A );
};

Collection + data distribution specification:

interface[ * ] MatrixOperations {
  const long SIZE = 100;
  typedef double Vector[ SIZE ];
  typedef double Matrix[ SIZE ][ SIZE ];
  void multiply( in dist[ BLOCK ][ * ] Matrix A, in Vector B,
                 out dist[ BLOCK ] Vector C );
  double minimum( in Vector A );
};

Collection + data distribution + reduction operator specification:

interface[ * ] MatrixOperations {
  const long SIZE = 100;
  typedef double Vector[ SIZE ];
  typedef double Matrix[ SIZE ][ SIZE ];
  void multiply( in dist[ BLOCK ][ * ] Matrix A, in Vector B,
                 out dist[ BLOCK ] Vector C );
  csum double minimum( in Vector A );
};

Page 21: Parallel CORBA Objects

Code Generation Problems

New type for distributed parameters: the distributed array
The amount of data to be sent to remote objects is known only at runtime
An extension of the CORBA sequence
The data distribution specification is stored in distributed arrays

Skeleton code generation:
Provide access to the data distribution specification

Stub code generation:
Scatter and gather data among remote objects
Manage remote operation invocations

Page 22: Parallel CORBA Objects

Stub code generation

Extended-IDL operation:

void multiply( in dist[ BLOCK ][ * ] Matrix A,
               in Vector B,
               out dist[ BLOCK ] Vector C );

Client-side stub signature:

void multiply( const Matrix A, const Vector B, Vector C );

Client code:

...
pco->multiply( A, B, C );
...

The stub builds one request per object of the collection (#1, #2, ...), each carrying B and the block of A assigned to that object, and gathers the pieces of C from the replies.

Skeleton-side signatures (CORBA sequence mapping vs. distributed array mapping):

void multiply( const Matrix_Seq A, const Vector_Seq B, Vector_Seq C );
void multiply( const Matrix_DArray A, const Vector_DArray B, Vector_DArray C );

[Figure: the client stub sends requests #1 and #2 through the CORBA ORB to the skeletons (OA) of the parallel CORBA object, whose SPMD codes communicate over the MPI communication layer]

Page 23: Parallel CORBA Objects

Parallel CORBA object as client

Stub code generation when the client is parallel:

Assignment of remote object references to the stubs

Use of distributed data types as operation parameters in the stubs

Exchange of data through MPI by the stubs
• to build requests
• to propagate results

Page 24: Parallel CORBA Objects

Parallel CORBA object as client (cont’d)

When the server is sequential, only one process does most of the work:
gather distributed data from the other processes
send the single request
scatter distributed data to the other processes
broadcast the values of non-distributed data

[Figure: a parallel client — SPMD codes with stubs, connected by the MPI communication layer — invokes, through the CORBA ORB, a sequential server: an object implementation behind its skeleton and OA]

Page 25: Parallel CORBA Objects

Parallel CORBA object as client (cont’d)

When the server is parallel, the p requests are dispatched among the n objects (cyclic distribution):
p < n: data distribution handled by the stub
p > n: data distribution handled by the skeleton
p = n: user’s choice

[Figure: a parallel client — a parallel CORBA object of size n, SPMD codes with stubs over their MPI communication layer — invokes through the CORBA ORB a parallel server: a parallel CORBA object of size p, SPMD codes with skeletons and OAs over their own MPI communication layer]

Page 26: Parallel CORBA Objects

Naming Service

Currently (as defined by the OMG):
Provides methods to access a remote object through a symbolic name
Associates a symbolic name with one object reference, and only one

Our needs:
Associate a symbolic name with a collection of object references

Implementation constraint:
The object reference to the standard Naming Service and to the parallel Naming Service must be the same:
orb->resolve_initial_references( “NameService” );

Our solution:
Add new methods to the Naming Service interface

Page 27: Parallel CORBA Objects CORBA

Example_impl* obj = new Example_impl();NamingService->join_collection( Matrix_name, obj );...NamingService->leave_collection( Matrix_name, obj );

Server side

objs = NamingService->resolve_collection( Matrix_name );srv = Example::_narrow( objs );...srv->op1( A, B, C );

Client side

module CosNaming { ... interface NamingContext { ... typedef sequence<Object> ObjectCollection; void join_collection( in Name n, in Object obj ); void leave_collection( in Name n, in Object obj ); ObjectCollection resolve_collection( in Name n ); };};

Extensionto the

CosNaming IDL

specification

Extension to the Naming Service

Page 28: Parallel CORBA Objects

Implementation

Using the MICO implementation of CORBA

Library (not included in the ORB core):
Parallel CORBA object base class
Functions to handle distributed data
Data redistribution library interface

Extended-IDL compiler (an extension of the MICO IDL compiler):
Parser
Semantic analyzer
Code generator

Experimental platform:
Cluster of PCs
Parallel machine (NEC Cenju)

Page 29: Parallel CORBA Objects

Comparison between CORBA and MPI

Benchmark: send / receive

Platform:
2 bi-Pentium III 500 MHz machines
100 Mb/s Ethernet

Latency:
MPI: 0.35 ms
CORBA: 0.52 ms

Differences due to:
Protocol
Memory allocation

interface Bench {
  typedef sequence< long > Vector;
  void sendrecv( in Vector in_a, out Vector out_a );
};

[Figure: throughput (Mb/s, 0-90) versus message size (bytes, 0-80000) for MPI and CORBA]

Page 30: Parallel CORBA Objects

Performance evaluation (cont’d)

Four experiments:

Parallel CORBA object
• through the ORB

Master/slave
• through file exchange
– ASCII file
– XDR file
• through the ORB

[Figure: the four configurations coupling Code 1 and Code 2 — either as parallel CORBA objects (skeleton, OA and object implementation on each SPMD process, with a client holding two stubs and a parallel scheduler), or as master/slave encapsulations (an MPI master process driving MPI slave processes behind each skeleton), over the CORBA ORB and MPI communication layers]

Page 31: Parallel CORBA Objects

Performance evaluation (cont’d)

[Figure: two charts of execution time (ms) versus the number of objects belonging to the collection (1, 2, 4, 8), for a matrix of order 256 with elements of type long; curves: ORB, PaCO, ASCII, XDR; annotated throughputs: 224 Mb/s, 47 Mb/s, 56 Mb/s, 37 Mb/s]

Page 32: Parallel CORBA Objects

Performance evaluation (cont’d)

[Figure: two charts of execution time (ms) versus the number of objects belonging to the collection (1, 2, 4, 8), comparing ORB and PaCO; matrix order 512 (annotated throughput 258 Mb/s) and matrix order 1024 (annotated throughput 293 Mb/s)]

Page 33: Parallel CORBA Objects

Encapsulation example

1. Adapt the original source code

Original code:

int main( int argc, char* argv[] )
{
  /* ... */
  MPI_Init( &argc, &argv );
  MPI_Comm_rank( MPI_COMM_WORLD, &id );
  MPI_Comm_size( MPI_COMM_WORLD, &size );
  /* ... */
  MPI_Send( ... );
  MPI_Recv( ... );
  /* ... */
  MPI_Finalize();
}

Remove the invocations of MPI_Init() and MPI_Finalize():

int main( int argc, char* argv[] )
{
  /* ... */
  /* MPI_Init( &argc, &argv ); */
  MPI_Comm_rank( MPI_COMM_WORLD, &id );
  MPI_Comm_size( MPI_COMM_WORLD, &size );
  /* ... */
  MPI_Send( ... );
  MPI_Recv( ... );
  /* ... */
  /* MPI_Finalize(); */
}

Rename the main function:

int app_main( int argc, char* argv[] )
{
  /* ... */
  /* MPI_Init( &argc, &argv ); */
  MPI_Comm_rank( MPI_COMM_WORLD, &id );
  MPI_Comm_size( MPI_COMM_WORLD, &size );
  /* ... */
  MPI_Send( ... );
  MPI_Recv( ... );
  /* ... */
  /* MPI_Finalize(); */
}

Page 34: Parallel CORBA Objects

Encapsulation example (cont’d)

2. Define the IDL interface:

typedef sequence< string > arg_type;

interface[ * ] Wrapper {
  void compute( in string directory, in arg_type arguments );
  void stop();
};

3. Write the method implementation:

void Wrapper_impl::compute( const char* directory,
                            const arg_type_DArray& arguments )
{
  int argc = arguments.length() + 1;
  char** argv = new char*[ argc ];

  argv[ 0 ] = strdup( "..." ); /* Application name */
  for( int i0 = 1; i0 < argc; ++i0 )
    argv[ i0 ] = strdup( arguments[ i0 - 1 ] );

  chdir( directory );
  app_main( argc, argv );

  for( int i1 = 0; i1 < argc; ++i1 )
    free( argv[ i1 ] );
  delete [] argv;
}

Page 35: Parallel CORBA Objects

Encapsulation example (cont’d)

4. Write the main function of the server:

int main( int argc, char* argv[] )
{
  /* ... */
  PaCO_DL_init( &argc, &argv );
  orb = ORB_init( argc, argv );
  /* ... */
  srv = new Wrapper_impl();
  /* ... */
  orb->run();
  /* ... */
  PaCO_DL_exit();
}

5. Write the client:

int main( int argc, char* argv[] )
{
  /* ... */
  arg_type arguments;
  arguments.length( 2 );

  arguments[ 0 ] = strdup( "..." ); /* arg 1 */
  arguments[ 1 ] = strdup( "..." ); /* arg 2 */

  pos->compute( working_directory, arguments );
  /* ... */
}

Page 36: Parallel CORBA Objects

Conclusion

We have shown that MPI and CORBA can be combined for distributed and parallel programming

Our implementation depends on the CORBA implementation used; a standardized API for the ORB is needed

Response to the OMG RFI “Supporting Aggregated Computing in CORBA”