Crossing The Line: Distributed Computing Across Network and Filesystem Boundaries.

Page 1

Crossing The Line: Distributed Computing Across Network and Filesystem Boundaries

Page 2

Native-Language-Based Distributed Computing

Blending Java and native languages to achieve cross-platform, cross-network, cross-filesystem distributed computing.

Page 3

The Objective

[Diagram: hosts labeled Linux, NT, SGI, Solaris, and two clusters, plus data servers and a data repository. Labels: “Spawn This Here” and “Update Database, Post-process data”.]

Page 4

Implications

• Collaborative programming environment

• Provisional access to computational resources beyond local networks.

• Data mining (send processes to the data).

• Optimal process mapping.

• Nomadic virtual environment.

Page 5

Demands on the System:

• Protocol for porting the static form of an executable program collective to the remote host, resolving its dependencies, checking it for security violations, and instantiating it as a process.

• Whom to trust?

• API for message-passing library calls.

Page 6

Initial Approach: Use Java

• Java provides a means to load processes obtained from remote resources using the java.lang.ClassLoader class.

• The Java SecurityManager provides a general security framework.

• Java bytecode representation provides uniformity in a heterogeneous environment (and more! …later).
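
The class-loading idea in the first bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the loader used in the system described here: the class name `ReceivedBytecodeLoader` and its `receive` method are hypothetical, and the bytecode is assumed to have already arrived from the remote resource as a byte array. Only `ClassLoader.findClass` and `ClassLoader.defineClass` are standard API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical loader that defines classes from bytecode held in memory. */
final class ReceivedBytecodeLoader extends ClassLoader {
    // Bytecode received from a remote host, keyed by binary class name.
    private final Map<String, byte[]> received = new HashMap<>();

    ReceivedBytecodeLoader(ClassLoader parent) {
        super(parent);
    }

    /** Stores bytecode obtained elsewhere (for example, read from a socket). */
    void receive(String className, byte[] bytecode) {
        received.put(className, bytecode.clone());
    }

    /** Called by loadClass() only after the parent loader fails to find the class. */
    @Override
    protected Class<?> findClass(String className) throws ClassNotFoundException {
        byte[] bytecode = received.get(className);
        if (bytecode == null) {
            throw new ClassNotFoundException(className);
        }
        // defineClass turns the raw bytes into a Class without touching the filesystem.
        return defineClass(className, bytecode, 0, bytecode.length);
    }
}
```

A host would call `receive(name, bytes)` when bytecode arrives and then `loadClass(name)`; because `loadClass` delegates to the parent first, locally available classes still take precedence over received ones.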

Page 7

Initial Implementation:

• Java-based communication substrate with Java programming bindings to message-passing functions.

• Commands such as add, delete, spawn, kill, and similar PVM-style commands to configure the environment.

• Additional commands to merge, split and register virtual environments.

Page 8

Initial Implementation:

• Provided the requisite mechanisms to “soft-install” processes upon remote resources without accessing the filesystem.

• Allowed for distributed parallelization and synchronization of processes within the environment.

Page 9

Functionality...

• The Java-based implementation proved to be well suited for “computationally-lite” distributed computing tasks across network boundaries.

• Although speedups were observable for parallelization of tasks over clusters, performance for traditional distributed computing tasks left a lot to be desired.

Page 10

Pros/Cons of using Java

• Pros:

– Java offers tremendous potential to the user in terms of portability and heterogeneous execution.

– Bytecode representation, RMI, object serialization.

• Cons:

– As an interpreted language, Java suffers a significant performance penalty.

– As with any new language, the thought of rewriting existing codes brings reluctance, lack of enthusiasm.

Page 11

Pros/Cons of using Java

[Bar chart: Times (in seconds) to multiply two 500x500 matrices. Horizontal axis: 0 to 500. Series: Unoptimized, JIT/Optimized. Categories: C-wrapped LINPACK implementation, Java-wrapped LINPACK implementation, standalone C implementation, standalone Java implementation. Bar values as extracted, in order: 16.89, 17.02; 41.27, 42.67; 24.42, 84.73; 103.4, 437.9.]

Page 12

So, What Language to Use?

Java is a highly-portable language

Java adheres to the “Write once, run anywhere” philosophy

Java has a well-established collection of scientific library bindings

Java’s execution speed is suitable for HPC

C/Fortran/C++ are highly-portable languages

C/Fortran/C++ adhere to the “Write once, run anywhere” philosophy

C/Fortran/C++ have well-established scientific library bindings

C/Fortran/C++ execution speeds are suitable for HPC

Page 13

So, What Language to Use?

Java is a highly-portable language

Java adheres to the “Write once, run anywhere” philosophy

C/Fortran/C++ have well-established scientific library bindings

C/Fortran/C++ execution speeds are suitable for HPC

Utilize Java for its portability and standardization, but focus on using Java as a wrapper for porting native code in the form of shared libraries.

Page 14

Solution: Blend Java with C/Fortran

• Use Java for the initial introduction of the program collective to the remote host. The wrapper class may be analyzed for class dependencies, shared library usage, security violations, etc.

• Use C/Fortran codes as the computational engine of the process. Compiled into shared libraries (.so’s or .DLL’s), they can be encapsulated within the program collective and loaded onto the remote resource.

Page 15

The (New) Objective:

[Diagram: hosts labeled Linux, NT, SGI, Solaris, and two clusters. Labels: “Spawn This Here” and “Native Library, FORTRAN BLAS perhaps”.]

Page 16

How does it work?

• Request to create Java-based process, “A_process”.

• Local search for A_process fails.

• Request for class A_process.

• Bytecode for A_process.class.

• Class file is run through the “BYTE GRINDER.”

Page 17

The “BYTE GRINDER”

[Diagram: bytecode enters the “BYTE GRINDER,” which yields a list of method calls, a native libraries list, and a list of dependency classes.]

• List of method calls aids in imposing security on processes obtained remotely, run locally.

• Libraries in the native library list can be obtained from the requesting user or from a trusted third party.

• Classes in the dependency class list are analyzed similarly.

Page 18

How does it do that?

• Following the Java Virtual Machine specification, the incoming bytecode is “analyzed.”

– Magic Number (CAFEBABE)

– Minor Version

– Major Version

– Constant Pool Count

• Construction of the constant pool. (Watch out for those double and long entries, each of which occupies two constant pool slots!)
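
The header read and constant pool walk can be sketched as below. This is an illustrative sketch, not the “BYTE GRINDER” itself: `ClassFileHeader` and its fields are hypothetical names. The field order (magic, minor version, major version, constant pool count) and the tag sizes follow the class file format in the Java Virtual Machine specification, including tags added in later Java releases so that newer class files can also be walked.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical helper: reads the class-file header and walks the constant pool. */
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount; // one more than the highest usable slot index
    final int[] tags;            // tag per slot; 0 marks an unusable slot

    private ClassFileHeader(int minor, int major, int count, int[] tags) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = count;
        this.tags = tags;
    }

    static ClassFileHeader read(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("bad magic number");
        }
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        int count = in.readUnsignedShort();
        int[] tags = new int[count];
        for (int i = 1; i < count; i++) { // slot 0 is never used
            int tag = in.readUnsignedByte();
            tags[i] = tag;
            switch (tag) {
                case 1: in.readUTF(); break;                       // Utf8
                case 3: case 4: skip(in, 4); break;                // Integer, Float
                case 5: case 6: skip(in, 8); i++; break;           // Long, Double: two slots
                case 7: case 8: case 16: case 19: case 20:
                    skip(in, 2); break;                            // single index
                case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4); break;                            // two indexes
                case 15: skip(in, 3); break;                       // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return new ClassFileHeader(minor, major, count, tags);
    }

    private static void skip(DataInputStream in, int bytes) throws IOException {
        in.readFully(new byte[bytes]); // fails with EOFException on truncated input
    }
}
```

The `i++` in the Long/Double case is the detail the slide warns about: skipping it shifts every later constant pool index by one.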

Page 19

How does it do that?

– Read super-class entry from the constant pool.

– Read list of interfaces from constant pool references.

– Read list of fields from the constant pool.

– Method opcode listings.

– The Java opcodes of each method are analyzed for invocations of calls such as “System.load.” The argument yields the native library dependency.

• Socket calls, file manipulations, etc.
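
A simplified version of this check is sketched below. It is a deliberate simplification and an assumption on my part, not the analysis described on the slide: instead of decoding each method's opcodes, it lists the method references stored in the constant pool and flags references to `System.load` or `System.loadLibrary`. It therefore shows that a class may load native code but does not recover the library name passed as the argument. `MethodRefScanner` and its method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner: lists the methods a class file refers to. */
final class MethodRefScanner {

    /** Returns "owner.name" for every method reference in the constant pool. */
    static List<String> methodRefs(byte[] classFile) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("bad magic number");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        int[] tag = new int[count];
        int[] first = new int[count];   // first index operand of an entry
        int[] second = new int[count];  // second index operand of an entry
        String[] utf8 = new String[count];
        for (int i = 1; i < count; i++) {
            tag[i] = in.readUnsignedByte();
            switch (tag[i]) {
                case 1: utf8[i] = in.readUTF(); break;
                case 3: case 4: in.readInt(); break;
                case 5: case 6: in.readLong(); i++; break; // two slots
                case 7: case 8: case 16: case 19: case 20:
                    first[i] = in.readUnsignedShort(); break;
                case 9: case 10: case 11: case 12: case 17: case 18:
                    first[i] = in.readUnsignedShort();
                    second[i] = in.readUnsignedShort();
                    break;
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break;
                default: throw new IOException("unknown constant pool tag " + tag[i]);
            }
        }
        List<String> refs = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            if (tag[i] == 10 || tag[i] == 11) { // Methodref, InterfaceMethodref
                String owner = utf8[first[first[i]]];  // ref -> Class -> Utf8 name
                String name = utf8[first[second[i]]];  // ref -> NameAndType -> Utf8 name
                refs.add(owner.replace('/', '.') + "." + name);
            }
        }
        return refs;
    }

    /** True if the class refers to System.load or System.loadLibrary. */
    static boolean refersToNativeLoading(byte[] classFile) throws IOException {
        List<String> refs = methodRefs(classFile);
        return refs.contains("java.lang.System.load")
                || refs.contains("java.lang.System.loadLibrary");
    }
}
```

The same list could be compared against other sensitive owners, such as `java.net.Socket` or `java.io.File`, to cover the socket and file checks mentioned in the last bullet.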

Page 20

Other Things That Make it Work:

• Processes are (sub-subclassed) extensions of the java.lang.Thread class, which allows their execution to be started, stopped, suspended, prioritized, serialized or likewise governed by the remote host.

• JNI: Automatic header file generation and a protocol for interfacing with C/C++ codes (which are then used to interface to Fortran).
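
Both points can be illustrated together with a small sketch. All names here are hypothetical (`GovernedTask`, `DotProductTask`, and the library name `dotkernel` do not come from the system described here), and the pure-Java fallback is an assumption added so the example runs when no native library is installed. The `native` keyword, `System.loadLibrary`, and header generation with `javac -h` (`javah` in older JDKs) are standard.

```java
/** Hypothetical base class: a task the hosting environment can govern as a thread. */
class GovernedTask extends Thread {
    private volatile boolean stopRequested;

    /** Cooperative stop request issued by the host. */
    void requestStop() {
        stopRequested = true;
        interrupt();
    }

    boolean stopRequested() {
        return stopRequested;
    }
}

/** Hypothetical task whose numerical kernel may live in a native shared library. */
final class DotProductTask extends GovernedTask {
    // "dotkernel" is a made-up library name; loading fails harmlessly if it is absent.
    private static final boolean NATIVE_KERNEL = tryLoad("dotkernel");

    private final double[] x;
    private final double[] y;
    private volatile double result;

    DotProductTask(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors differ in length");
        }
        this.x = x.clone();
        this.y = y.clone();
    }

    private static boolean tryLoad(String library) {
        try {
            System.loadLibrary(library);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    // Would be implemented in C through JNI; the C code could in turn call Fortran.
    private static native double nativeDot(double[] x, double[] y);

    // Pure-Java fallback used when the native library is not available.
    private static double javaDot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    @Override
    public void run() {
        if (stopRequested()) {
            return;
        }
        result = NATIVE_KERNEL ? nativeDot(x, y) : javaDot(x, y);
    }

    double result() {
        return result;
    }
}
```

Because the task is a `Thread`, the host can call `setPriority`, `start`, and `join` on it directly. The sketch uses a cooperative stop flag rather than `Thread.stop` and `Thread.suspend`, which were later deprecated.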

Page 21

Work in progress:

• Adopt the new features in the recently released JDK 1.2 that relate to native method calls.

• Optimize message passing mechanism in the substrate.

• Implement full security mechanisms.

• Generate cross-network IceT demo applications.

Page 22

Summary

• IceT extends the scope of distributed computing environments by

– locating and migrating static processes across filesystems and by

– dynamically merging and splitting virtual environments.

• IceT provides a provisional mechanism for supporting program collectives running concurrently, across multiple networks, and existing in multiple phases.