PDCS96 10–11 December 1996
Charlotte: Metacomputing on the Web
Arash Baratloo, Mehmet Karaul, Zvi Kedem, Peter Wyckoff
New York University
Roadmap
  - Goals
  - Virtual machine model
  - Code sample
  - Execution environment
  - Distributed shared memory
  - Experiments
  - Summary
Goals
Programmer’s goals
  - High-level language
  - Reliable and predictable virtual machine (fault tolerance, heterogeneity of machine types and speeds, transiently available machines)
  - Portability
User’s goals
  - Utilize any machine on the Web (no account or shared file system)
  - Reliable and predictable virtual machine
  - Authentication of results
CPU donor’s goals
  - Protection from malicious code
  - Full control of her resources
  - No administrative hassles
Leveraging Java
Predictable and reliable virtual machine on top of the Java virtual machine
  - Java-capable browsers widely available
  - Emerging standard
  - Security
  - Heterogeneity
  - Portability
  - Compilers to appear in the near future
Virtual Machine Model
Separation of programming and execution environment
  - Programmer develops applications for a perfect virtual machine
  - Slow and fast machines are handled transparently by the runtime system
  - Transiently available machines are handled transparently by the runtime system
  - Fault tolerance is handled transparently by the runtime system
High-level programming model
  - Unbounded number of parallel routines
  - Java plus three simple language constructs
Distributed shared memory
Simple memory semantics
Example: Matrix Multiplication
    public class MatrixMult extends Droutine {
        public static int Size = 500;
        public Dfloat a[][] = new Dfloat[Size][Size];
        public Dfloat b[][] = new Dfloat[Size][Size];
        public Dfloat c[][] = new Dfloat[Size][Size];

        // One routine per row: routine `id` computes row `id` of c.
        public void drun(int numTasks, int id) {
            float sum;  // float, so products of Dfloat values are not truncated
            for (int i = 0; i < Size; i++) {
                sum = 0;
                for (int j = 0; j < Size; j++)
                    sum += a[id][j].get() * b[j][i].get();
                c[id][i].set(sum);
            }
        }

        public void run() {
            InitMatrix(a);
            InitMatrix(b);
            parBegin();
            addDroutine(this, Size);  // Size parallel routines
            parEnd();
            PrintMatrix(c);
        }
    }
Sample Charlotte Program
Execution Environment
The same Charlotte program runs on:
  - a single machine
  - multiple machines (one user machine and a set of potential volunteer machines)
Interaction among machines solely through Java-capable browsers
[Figure: user machine and volunteer machines connected through the WWW]
Eager Scheduling
Difficulties in a distributed system
  - Detection of crash-failed machines
  - Detection of slow machines
Solution: Eager scheduling
  - Volunteer machines contact the user machine for work
  - Routines may be assigned to multiple machines
Difficulties with eager scheduling
  - Inconsistent memory views across routines and across different executions of the same routine
  - Violation of exactly-once semantics
Solution: Two-phase Idempotent Execution Strategy (TIES)
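The scheduling idea above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not Charlotte's implementation: the class `EagerScheduler` and its methods `requestWork` and `commit` are hypothetical names, and TIES is simplified here to "a routine's result is buffered by the volunteer and only the first commit takes effect".

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical manager on the user machine; names are illustrative only. */
class EagerScheduler {
    private final int numRoutines;
    private final boolean[] done;                      // done[id]: result committed
    private final Map<Integer, Integer> results = new HashMap<>();
    private int next = 0;                              // round-robin cursor

    EagerScheduler(int numRoutines) {
        this.numRoutines = numRoutines;
        this.done = new boolean[numRoutines];
    }

    /** A volunteer asks for work. Any unfinished routine may be handed out,
     *  even if another (slow or crashed) volunteer already received it.
     *  Returns -1 once every routine has committed. */
    synchronized int requestWork() {
        for (int k = 0; k < numRoutines; k++) {
            int id = (next + k) % numRoutines;
            if (!done[id]) {
                next = (id + 1) % numRoutines;
                return id;
            }
        }
        return -1;
    }

    /** Commit a buffered result. Only the first commit for a routine is
     *  applied, so repeated executions leave the same final state. */
    synchronized boolean commit(int id, int value) {
        if (done[id]) return false;                    // duplicate: discard
        done[id] = true;
        results.put(id, value);
        return true;
    }

    synchronized Integer result(int id) { return results.get(id); }

    public static void main(String[] args) {
        EagerScheduler s = new EagerScheduler(3);
        int slow = s.requestWork();                    // volunteer A takes a routine and stalls
        int id;
        while ((id = s.requestWork()) != -1) {         // volunteer B finishes everything
            s.commit(id, id * 10);
        }
        System.out.println("late commit accepted: " + s.commit(slow, 999));
    }
}
```

No failure detector is needed in this sketch: a stalled routine is simply handed out again, and the late duplicate is ignored at commit time.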
DSM
Why DSM?
  - Easy to use
  - Transparent to programmer and user
Design objectives
  - Heterogeneity
  - Operating system independence
  - Compiler independence
These require an object-based approach for implementing DSM
DSM — Implementation
Realized at the object level
All objects have a unique identifier
Identifiers are identically mapped to objects across machines
Data is transferred on demand
Granularity can be controlled
False sharing avoided
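An object-level DSM along these lines might look like the following sketch. It is an assumption-laden illustration, not Charlotte's code: `SharedStore`, `Cell`, `get`, `set`, and `flush` are hypothetical names. Each shared object is keyed by its identifier, carries a dirty flag and a value, and is fetched from the user's machine only when first read.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-machine store of shared objects; names are illustrative. */
class SharedStore {
    /** One shared object: the map key is its ID, the cell holds dirty flag and value. */
    static final class Cell {
        float value;
        boolean dirty;
        Cell(float value) { this.value = value; }
    }

    private final Map<Integer, Cell> cells = new HashMap<>();
    private final SharedStore home;   // the user's machine; null on that machine itself
    int fetches = 0;                  // number of on-demand transfers so far

    SharedStore(SharedStore home) { this.home = home; }

    /** Read an object, transferring it from the home store on first access. */
    float get(int id) {
        Cell c = cells.get(id);
        if (c == null) {
            if (home == null) throw new IllegalStateException("unknown id " + id);
            c = new Cell(home.get(id));   // data is transferred on demand
            fetches++;
            cells.put(id, c);
        }
        return c.value;
    }

    /** Write locally and mark the object dirty; nothing is sent yet. */
    void set(int id, float v) {
        Cell c = cells.computeIfAbsent(id, k -> new Cell(v));
        c.value = v;
        c.dirty = true;
    }

    /** Send only dirty objects back to the home store; returns how many were sent. */
    int flush() {
        if (home == null) return 0;
        int sent = 0;
        for (Map.Entry<Integer, Cell> e : cells.entrySet()) {
            Cell c = e.getValue();
            if (c.dirty) {
                home.set(e.getKey(), c.value);
                c.dirty = false;
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        SharedStore user = new SharedStore(null);
        user.set(1, 2.0f);
        SharedStore volunteer = new SharedStore(user);
        volunteer.set(2, volunteer.get(1) * 3);   // reads id 1 on demand, writes id 2 locally
        System.out.println("objects written back: " + volunteer.flush());
    }
}
```

Because the unit of transfer is a single object rather than a page, two machines writing different objects never contend for the same unit, which is the sense in which false sharing is avoided.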
[Figure: memory of the user's machine and memory of a volunteer machine. Each holds shared data (entries with ID, Dirty?, Value) and local data.]
Experiments
10 Sun SPARC 5 workstations
10 Mbit/s Ethernet
Application: Ising model
Measured time is wall-clock time
Three tests
  - Scalability
  - Load balancing
  - Transiently available machines
Experiment 1: Scalability
[Figure: Time, Equivalent Machines, and Speedup; x-axis: S, 1, 2, ..., 10; time axis 0 to 1300; second axis 0 to 10]
Experiment 2: Load balancing
[Figure: Time, Equivalent Machines, and Speedup; x-axis: 5F+0H, 4F+2H, 3F+4H, 2F+6H, 1F+8H, 0F+10H; time axis 0 to 300; second axis 0 to 5]
Experiment 3: Transient Availability
Five machines used
After 100 seconds: 1 machine crashed and 1 added
After another 100 seconds: 2 machines crashed and 2 added
90.18% efficiency relative to 5 reliable machines
86.25% efficiency relative to sequential execution (95.64% for 5 reliable machines)
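The two figures are consistent with each other: dividing the efficiency under transient availability by the efficiency of 5 reliable machines (both measured against sequential execution) gives the first number. The symbols $E_{\text{transient}}$ and $E_{\text{reliable}}$ are labels introduced here for this check only.

```latex
\frac{E_{\text{transient}}}{E_{\text{reliable}}}
  = \frac{0.8625}{0.9564}
  \approx 0.9018
```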
Summary
Charlotte targets the Web
Leverages the benefits of Java (security, heterogeneity, wide availability, ...)
Seamlessly crosses administration boundaries
Distribution of program and data
DSM with no compiler or OS support