CX: A Scalable, Robust Network for Parallel Computing Peter Cappello & Dimitrios Mourloukos Computer Science UCSB
  • date post

    22-Dec-2015

Page 1: CX: A Scalable, Robust Network for Parallel Computing Peter Cappello & Dimitrios Mourloukos Computer Science UCSB.

CX: A Scalable, Robust Network for Parallel Computing

Peter Cappello & Dimitrios Mourloukos

Computer Science

UCSB


Outline

1. Introduction

2. Related work

3. API

4. Architecture

5. Experimental results

6. Current & future work


Introduction

• “Listen to the technology!” Carver Mead

• What is the technology telling us?

– Internet’s idle cycles/sec growing rapidly

– Bandwidth increasing & getting cheaper

– Communication latency is not decreasing

– Human technology is getting neither cheaper nor faster.


Introduction

Project Goals

1. Minimize job completion time despite large communication latency

2. Jobs complete with high probability despite faulty components

3. Application program is oblivious to:

• Number of processors

• Inter-process communication

• Fault tolerance


Introduction

Fundamental Issue: Heterogeneity

[Figure: heterogeneous machine/OS pairs (M1/OS1, M2/OS2, M3/OS3, M4/OS4, M5/OS5, …) beneath a functionally homogeneous JVM layer]


Related work

• Cilk, Cilk-NOW, Atlas

– DAG computational model

– Work-stealing

• Linda, Piranha, JavaSpaces

– Space-based coordination

– Decoupled communication

• Charlotte (Milan project / Calypso prototype)

– High performance: fault tolerance not achieved via transactions

– Fault tolerance via eager scheduling

• SuperWeb, Javelin, Javelin++

– Architecture: client, broker, host


API

DAG Computational model

int f( int n )
{
    if ( n < 2 )
        return n;
    else
        return f( n-1 ) + f( n-2 );
}


DAG Computational Model

Method invocation tree for f(4):

f(4)
    f(3)
        f(2)
            f(1)
            f(0)
        f(1)
    f(2)
        f(1)
        f(0)

DAG Computational Model / API

Task f(n):

execute( )
{
    if ( n < 2 )
        setArg( ArgAddr, n );
    else
    {
        spawn ( + );
        spawn ( f(n-1) );
        spawn ( f(n-2) );
    }
}

Task +:

execute( )
{
    setArg( ArgAddr, in[0] + in[1] );
}

[Figure: task DAG for f(4), built up level by level; each f(n) with n ≥ 2 decomposes into f(n-1), f(n-2), and a + task that composes their results]
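The decompose/compose pattern above can be sketched in plain Java. This is a minimal, single-threaded illustration and not the CX API: the class names (`FibDagSketch`, `Task`, `FibTask`, `SumTask`, `Scheduler`) are hypothetical, and `spawn` and `setArg` are only modeled loosely on the slide pseudocode. A task becomes ready once all of its inputs have arrived; a `+` task therefore waits until both of its children have called `setArg`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class FibDagSketch {
    /** A node of the task DAG; its result goes to input `slot` of `target`. */
    abstract static class Task {
        final Task target;   // successor task; null means "final result"
        final int slot;      // which input of the successor to fill
        final long[] in;     // inputs received so far
        int missing;         // inputs still outstanding

        Task(Task target, int slot, int inputs) {
            this.target = target;
            this.slot = slot;
            this.in = new long[inputs];
            this.missing = inputs;
        }

        abstract void execute(Scheduler s);
    }

    /** Decompose: f(n) spawns +, f(n-1), f(n-2); the base case sends n onward. */
    static class FibTask extends Task {
        final int n;

        FibTask(int n, Task target, int slot) {
            super(target, slot, 0);
            this.n = n;
        }

        @Override
        void execute(Scheduler s) {
            if (n < 2) {
                s.setArg(target, slot, n);
            } else {
                Task plus = new SumTask(target, slot);
                s.spawn(plus);
                s.spawn(new FibTask(n - 1, plus, 0));
                s.spawn(new FibTask(n - 2, plus, 1));
            }
        }
    }

    /** Compose: adds its two inputs and passes the sum to its successor. */
    static class SumTask extends Task {
        SumTask(Task target, int slot) {
            super(target, slot, 2);
        }

        @Override
        void execute(Scheduler s) {
            s.setArg(target, slot, in[0] + in[1]);
        }
    }

    /** Stand-in for a task server plus one producer (hypothetical). */
    static class Scheduler {
        private final Deque<Task> ready = new ArrayDeque<>();
        private long result;

        // A task with outstanding inputs simply waits until setArg readies it.
        void spawn(Task t) {
            if (t.missing == 0) ready.push(t);
        }

        void setArg(Task target, int slot, long value) {
            if (target == null) { result = value; return; }
            target.in[slot] = value;
            if (--target.missing == 0) ready.push(target);
        }

        long run(Task root) {
            spawn(root);
            while (!ready.isEmpty()) ready.pop().execute(this);
            return result;
        }
    }

    static long fib(int n) {
        return new Scheduler().run(new FibTask(n, null, 0));
    }

    public static void main(String[] args) {
        System.out.println(fib(4)); // prints 3
    }
}
```

Because the application only describes how a task decomposes and composes, nothing in `FibTask` or `SumTask` mentions processors, communication, or faults; that is the property the project goals call obliviousness.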


Architecture: Basic Entities

[Figure: a CONSUMER connected to a PRODUCTION NETWORK, which is a network of CLUSTERs]

register ( spawn | getResult )* unregister

Architecture: Cluster

[Figure: one TASK SERVER serving four PRODUCERs]

A Cluster at Work

[Figure sequence: a TASK SERVER with WAITING and READY task queues serves two PRODUCERs. The root task f(4) enters the READY queue and is fetched by a producer.]

Decompose

execute( )
{
    if ( n < 2 )
        setArg( ArgAddr, n );
    else
    {
        spawn ( + );
        spawn ( f(n-1) );
        spawn ( f(n-2) );
    }
}

A Cluster at Work

[Figure sequence: executing f(4) spawns +, f(3), and f(2). The + task is held in WAITING while f(3) and f(2) go to READY, are fetched by the producers, and are decomposed in turn into further +, f(n-1), and f(n-2) tasks.]

Compute Base Case

execute( )
{
    if ( n < 2 )
        setArg( ArgAddr, n );
    else
    {
        spawn ( + );
        spawn ( f(n-1) );
        spawn ( f(n-2) );
    }
}

A Cluster at Work

[Figure sequence: the producers execute the base-case tasks f(1) and f(0), whose values are delivered to the + tasks held in WAITING.]

Compose

execute( )
{
    setArg( ArgAddr, in[0] + in[1] );
}

A Cluster at Work

[Figure sequence: + tasks that have received both inputs are executed by the producers, and their sums flow up the DAG until the final + task yields the result object R.]

1. Result object is sent to Production Network

2. Production Network returns it to Consumer

Task Server Proxy

Overlap Communication with Computation

[Figure: inside a PRODUCER, the Task Server Proxy holds an INBOX and an OUTBOX that sit between COMM (communication) and COMP (computation); the TASK SERVER holds a READY priority queue and WAITING tasks]
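The inbox/outbox idea can be illustrated with standard `java.util.concurrent` queues. The sketch below is an assumption-laden toy, not CX code: `ProxySketch`, `process`, and the `STOP` sentinel are hypothetical names, tasks are non-negative integers, "computing" a task means squaring it, and communication is split into a fetcher thread and a sender thread. The point is only that the computing thread never waits on the network directly; it takes prefetched work from the inbox and drops results into the outbox.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ProxySketch {
    static final int STOP = -1; // sentinel marking the end of the stream

    /** Runs the tasks (non-negative ints) and returns the results "sent" to the server. */
    static List<Integer> process(List<Integer> serverTasks) {
        BlockingQueue<Integer> inbox = new ArrayBlockingQueue<>(2); // small prefetch buffer
        BlockingQueue<Integer> outbox = new LinkedBlockingQueue<>();
        List<Integer> delivered = new ArrayList<>();

        // COMM, inbound: fetch tasks ahead of the computation.
        Thread fetcher = new Thread(() -> {
            try {
                for (int t : serverTasks) inbox.put(t);
                inbox.put(STOP);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // COMM, outbound: ship results while the next task is being computed.
        Thread sender = new Thread(() -> {
            try {
                for (int r = outbox.take(); r != STOP; r = outbox.take()) {
                    delivered.add(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        fetcher.start();
        sender.start();
        try {
            // COMP: the calling thread only ever touches the local queues.
            for (int t = inbox.take(); t != STOP; t = inbox.take()) {
                outbox.put(t * t);
            }
            outbox.put(STOP);
            fetcher.join();
            sender.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(process(List.of(1, 2, 3))); // prints [1, 4, 9]
    }
}
```

The bounded inbox keeps a producer from hoarding tasks that other producers could run, while still hiding one round trip of latency behind each computation.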

Architecture: Work stealing & eager scheduling

• A task is removed from the server only after a complete signal is received.

• A task may be assigned to multiple producers

– Balances task load among producers of varying processor speeds

– Tasks on failed/retreating producers are re-assigned.
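A small sketch may make the two rules above concrete. It is a hypothetical, single-threaded model (`EagerServerSketch` and its methods are invented for illustration and are not the CX task server): `assign` hands out an unfinished task but leaves it on the server, so the same task can be handed to another producer later, and `complete` removes the task the first time a complete signal arrives and ignores later duplicates.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class EagerServerSketch {
    // Tasks that have not yet been reported complete, in round-robin order.
    private final Deque<String> unfinished = new ArrayDeque<>();

    void add(String taskId) {
        unfinished.addLast(taskId);
    }

    /**
     * Hands out the next unfinished task without removing it, so a slow,
     * failed, or retreating producer cannot block the job.
     * Returns null when every task has completed.
     */
    String assign() {
        String t = unfinished.pollFirst();
        if (t != null) unfinished.addLast(t); // rotate to the back of the line
        return t;
    }

    /** Complete signal: true for the first report of a task, false for duplicates. */
    boolean complete(String taskId) {
        return unfinished.remove(taskId);
    }

    boolean isDone() {
        return unfinished.isEmpty();
    }

    public static void main(String[] args) {
        EagerServerSketch server = new EagerServerSketch();
        server.add("a");
        server.add("b");
        System.out.println(server.assign());      // a
        System.out.println(server.assign());      // b
        System.out.println(server.assign());      // a again: still unfinished
        System.out.println(server.complete("a")); // true
        System.out.println(server.complete("a")); // false: duplicate result
    }
}
```

Re-assignment is safe in this model only because tasks are assumed to be idempotent: running a task twice produces the same result, and the duplicate is discarded.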

Architecture: Scalability

• A cluster tolerates producer:

– Retreat

– Failure

• A single task server, however, is a:

– Bottleneck

– Single point of failure.

• We introduce a network of task servers.


Scalability: Class loading

1. CX class loader loads classes (Consumer JAR) in each server’s class cache

2. Producer loads classes from its server

Scalability: Fault-tolerance

Replicate a server’s tasks on its sibling

When a server fails, its sibling restores state to a replacement server

Architecture

Production network of clusters

• Network tolerates single server failure.

• Restores ability to tolerate a single failure, hence the ability to tolerate a sequence of failures


Preliminary experiments

• Experiments run on Linux cluster

– 100 port Lucent P550 Cajun Gigabit Switch

• Machine

– 2 Intel EtherExpress Pro 100 Mb/s Ethernet cards

– Red Hat Linux 6.0

– JDK 1.2.2_RC3

– Heterogeneous

• processor speeds

• processors/machine

Fibonacci Tasks with Synthetic Load

Task f(n):

execute( )
{
    if ( n < 2 )
    {
        synthetic workload();
        setArg( ArgAddr, n );
    }
    else
    {
        synthetic workload();
        spawn ( + );
        spawn ( f(n-1) );
        spawn ( f(n-2) );
    }
}

Task +:

execute( )
{
    synthetic workload();
    setArg( ArgAddr, in[0] + in[1] );
}

TSEQ vs. T1 (seconds), Computing F(8)

Workload    TSEQ       T1         Efficiency
4.522       497.420    518.816    0.96
3.740       415.140    436.897    0.95
2.504       280.448    297.474    0.94
1.576       179.664    199.423    0.90
0.914       106.024    120.807    0.88
0.468        56.160     65.767    0.85
0.198        24.750     29.553    0.84
0.058         8.120     11.386    0.71

Parallel Efficiency over 60 nodes

[Figure: parallel efficiency (y-axis, 0 to 1.2) for F(13) through F(18) (x-axis), plotted for Workload 1 and Workload 2]

Parallel efficiency for F(13) = 0.87

Parallel efficiency for F(18) = 0.99

Average task time:

Workload 1 = 1.8 sec.

Workload 2 = 3.7 sec.


Current work

• Implement CX market maker (broker)

Solves discovery problem between Consumers & Production networks

• Enhance Producer with Lea’s Fork/Join Framework

– See gee.cs.oswego.edu

[Figure: several CONSUMERs and several PRODUCTION NETWORKs connected through a MARKET MAKER, which is a JINI Service]

Current work

• Enhance computational model: branch & bound.

– Propagate new bounds thru production network: 3 steps

[Figure: a SEARCH TREE alongside the PRODUCTION NETWORK, with BRANCH and TERMINATE! labels]

Current work

• Investigate computations that appear ill-suited to adaptive parallelism

– SOR

– N-body.

Introduction

Fundamental Issues

• Communication latency

Long latency: overlap computation with communication.

• Robustness

Massive parallelism: faults

• Scalability

Massive parallelism: login privileges cannot be required.

• Ease of use

Jini: easy upgrade of system components

Related work

• Market mechanisms

– Huberman, Waldspurger, Malone, Miller & Drexler, Newhouse & Darlington


Related work

• CX integrates

– DAG computational model

– Work-stealing scheduler

– Space-based, decoupled communication

– Fault-tolerance via eager scheduling

– Market mechanisms (incentive to participate)

Architecture: Task identifier

• DAG has spawn tree

• TaskID = path id

• Root.TaskID = 0

• TaskID used to detect duplicate:

– Tasks

– Results.

[Figure: spawn tree for F(4), with the labels 0, 1, and 2 on the tree illustrating path ids]
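One way to picture path-based identifiers and duplicate detection is sketched below. The encoding is an assumption made for illustration: the slide only states that the TaskID is a path id and that the root's id is 0, so the dotted string format, the child indexes, and the names `TaskIdSketch`, `childId`, and `DuplicateFilter` are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class TaskIdSketch {
    /** The root of the spawn tree has id "0" (per the slide: Root.TaskID = 0). */
    static String rootId() {
        return "0";
    }

    /** Assumed encoding: a child's id is its parent's path plus its spawn index. */
    static String childId(String parentId, int spawnIndex) {
        return parentId + "." + spawnIndex;
    }

    /** Remembers ids already seen so that repeated tasks or results can be dropped. */
    static class DuplicateFilter {
        private final Set<String> seen = new HashSet<>();

        /** Returns true the first time an id is offered, false for any repeat. */
        boolean firstTime(String taskId) {
            return seen.add(taskId);
        }
    }

    public static void main(String[] args) {
        String plus = childId(rootId(), 0); // first spawn of the root task
        String left = childId(rootId(), 1); // second spawn
        DuplicateFilter results = new DuplicateFilter();
        System.out.println(plus + " " + left);        // 0.0 0.1
        System.out.println(results.firstTime(left));  // true
        System.out.println(results.firstTime(left));  // false: duplicate result
    }
}
```

Because an id is derived from the position in the spawn tree rather than from the producer that ran the task, two producers that execute the same task under eager scheduling generate identical ids for its children and its result, which is what makes the duplicates detectable.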


Architecture: Basic Entities

• Consumer

Seeks computing resources.

• Producer

Offers computing resources.

• Task Server

Coordinates task distribution among its producers.

• Production Network

A network of task servers & their associated producers.

Defining Parallel Efficiency

• Scalar: Homogeneous set of P machines:

Parallel efficiency = (T1 / P) / TP

• Vector: Heterogeneous set of P machines:

P = [ P1, P2, …, Pd ], where there are P1 machines of type 1, P2 machines of type 2, …, Pd machines of type d:

Parallel efficiency = ( P1 / T1 + P2 / T2 + … + Pd / Td )^(-1) / TP
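The two definitions can be checked numerically with a short sketch. The class and method names are hypothetical, and the reading of the symbols is an assumption: T1 (or Ti) is taken to be the job's completion time on one machine (of type i), and TP the measured completion time on the whole set. With a single machine type the vector formula reduces to the scalar one.

```java
class EfficiencySketch {
    /** Homogeneous case: (T1 / P) / TP. */
    static double scalar(double t1, int p, double tp) {
        return (t1 / p) / tp;
    }

    /**
     * Heterogeneous case: (P1/T1 + ... + Pd/Td)^(-1) / TP, where counts[i]
     * machines of type i each need times[i] seconds to run the job alone.
     */
    static double vector(int[] counts, double[] times, double tp) {
        double aggregateRate = 0.0; // jobs per second if all machines cooperated perfectly
        for (int i = 0; i < counts.length; i++) {
            aggregateRate += counts[i] / times[i];
        }
        double idealTime = 1.0 / aggregateRate;
        return idealTime / tp;
    }

    public static void main(String[] args) {
        // 4 identical machines, 100 s sequentially, 25 s in parallel: perfect efficiency.
        System.out.println(scalar(100.0, 4, 25.0));
        // 2 slow machines (100 s each) plus 1 fast machine (50 s), measured TP = 50 s.
        System.out.println(vector(new int[] {2, 1}, new double[] {100.0, 50.0}, 50.0));
    }
}
```

In the second call the ideal completion time is 1 / (2/100 + 1/50) = 25 seconds, so a measured time of 50 seconds corresponds to an efficiency of about 0.5.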

Future work

• Support special hardware / data: inter-server task movement.

– Diffusion model:

Tasks are homogeneous gas atoms diffusing through network.

– N-body model: Each kind of atom (task) has its own:

• Mass (resistance to movement: code size, input size, …)

• Attraction/repulsion to different servers

Or other “massive” entities, such as:

» special processors

» large data base.


Future Work

• CX preprocessor to simplify API.