UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar...

22
UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International Workshop on Java tm for Parallel and Distributed Computing (IWJPDC 2005) 4.4.2005 Mikko Vapa, research student Department of Mathematical Information Technology University of Jyväskylä, Finland http:// tisu .it.jyu.fi/ cheesefactory With co-authors Niko Kotilainen, Matthieu Weber, Joni Töyrylä and Jarkko Vuori

Transcript of UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar...

Page 1: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7th International Workshop on Javatm for Parallel and Distributed Computing (IWJPDC 2005)

4.4.2005

Mikko Vapa, research studentDepartment of Mathematical Information Technology

University of Jyväskylä, Finlandhttp://tisu.it.jyu.fi/cheesefactory

With co-authors Niko Kotilainen, Matthieu Weber, Joni Töyrylä and Jarkko Vuori

Page 2: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Overview

• This paper introduces Peer-to-Peer Distributed Computing (P2PDisCo) software providing interface for distributing the computation of Java programs to multiple workstations

• P2PDisCo has been built over Chedar peer-to-peer middleware

• Currently, P2PDisCo is being used for speeding up the training of neural networks with evolutionary algorithm

Page 3: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Peer-to-Peer Networks

• Peer-to-Peer (P2P) networks allow sharing of resources (e.g., computing power, storage space, network bandwidth, printers) over the Internet

• In contrast to clusters, in P2P networks all the tasks and responsibilities for managing the network are shared between peers– This means that there exists no single control entity

responsible for providing the services• Because P2P networks do not require a dedicated hardware,

distributing computation among workstations is usually a cost-effective solution

Page 4: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Related Work

• There are many alternatives for distributing computing using Java programming language– Programming language independent distributed computing

tools such as Globus Toolkit with Java Commodity Grid Kit– Programming language dependent Java distributed

computing software:• Java extensions requiring changes to Java compiler

and/or Java Virtual Machine (JVM)– JavaParty as an example

• Java libraries providing special class libraries without a need for modifications to the Java compiler or JVM

– JavaSymphony and P2PDisCo as examples

Page 5: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Related Work

• Many of these distribution tools have some centralized components in them:– Globus uses centralized indexes for resource discovery

whereas in P2PDisCo the resource discovery is decentralized and provided by the Chedar peer-to-peer network

– In JavaSymphony all the computing resources are centrally configured under JS-Shell whereas in P2PDisCo no central management exists

• There are also some implementations of Java distributed computing that use peer-to-peer network for locating resources– An example of such system is GT-P2PRMI allowing Remote

Method Invocation (RMI) bindings and lookups to be executed via a modified RMIRegistry called P2PRMIRegistry

Page 6: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Chedar P2P Middleware

• Chedar (CHEap Distributed Architecture) is peer-to-peer middleware designed for the needs of peer-to-peer applications

• Chedar constructs a pure peer-to-peer network using topology management algorithms and provides functionalities for locating resources in the network

• Implementation of Chedar is based on Java Standard Edition, thus providing platform independency and easy adaptation to different hardware

Page 7: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Chedar P2P Middleware

• Each Chedar node maintains a database of locally available resources for example information about which applications are running on the device or what files are located in the node

• Resources can contain meta-information about themselves for example the version number for applications and last modification date for files

• Resource database is stored as an XML document using a specific Document Type Definition (DTD)

• This organization of data allows making rich and complex queries to the database in the form of XPath expressions

Page 8: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Chedar P2P Middleware

• Chedar node keeps a list of neighbors it is connected to through TCP sockets– TCP provides reliable data delivery and the disappearance

of a neighbor can be detected with TCP timeout• The neighbor list is updated based on heuristics such as number

of relayed query replies and the actual query replies provided by the neighbor to form an efficient topology for resource discovery

• As a search mechanism we currently use Breadth-First Search (BFS) algorithm, which scales to small network sizes and guarantees to locate all resources in the network– In our experiments, the query traffic in the network of 200

workstations with 100 Mb/s Ethernet connections has not yet posed a significant problem and therefore a more efficient version of the query algorithm has not been implemented

Page 9: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Chedar P2P Middleware

• Each query contains a Message-ID and a query XPath description

• Whenever a query enters a Chedar node, the node checks its resource database for matching resources to XPath expression and if resource is found, a reply message is sent back using the route, which the query came from

• To properly relay the reply message back to the query originator, the message needs to contain the same Message-ID as the query had

• For communicating between two peers, Chedar provides a point-to-point communication protocol allowing basic message passing primitives to be executed by P2P applications

• The protocol uses the same path as the reply message to deliver messages between peers

Page 10: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Peer-to-Peer Distributed Computing

• Problem– Evolving neural networks in a simulator needs a lot of

computing power– One computer is not enough for many research cases

• Solution– Distribute computation across desktop computers all over

the University of Jyväskylä– It has to be as invisible as possible to the user of the network

simulator– The simulator should not interfere with the desktop use of

the distributed computers• As a solution Peer-to-Peer Distributed Computing (P2PDisCo)

was developed on top of Chedar

Page 11: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture• The node that wants to distribute its computation (denoted as

master) needs to query resources, receive query replies and send data (parameters) for the computation

• The node that offers computation time has to implement Distributed interface to be able to receive start, stop and is application running signals

• Reading of parameters and writing of results are done for the streams offered by P2PDisCo

Chedar

Master

startApplicationstopApplicationapplicationRunning

Chedar

P2PDisCo

Application

queryResourceresourceReplysendData

queryResourcesendData readFile

writeFile

receivedData registerResourcesendData

receivedData Distributed

Page 12: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture

Chedarnode

Chedarnode

Chedarnode Chedar

node

Chedarnode

Master

Page 13: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture

Chedarnode

Chedarnode

Chedarnode Chedar

node

Chedarnode

Master

Page 14: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture

Chedarnode

Chedarnode

Chedarnode Chedar

node

Chedarnode

Master

Query

Query

Query: who has the resource ”NetSimulator” available?

Page 15: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture

Chedarnode

Chedarnode

Chedarnode Chedar

node

Chedarnode

Master

Reply

Reply

Reply: I do!

Reply

Page 16: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture

Chedarnode

Chedarnode

Chedarnode Chedar

node

Chedarnode

Master

Task

Task

Task

Page 17: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture

Chedarnode

Chedarnode

Chedarnode Chedar

node

Chedarnode

Master

Computation…

Page 18: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

P2PDisCo - Architecture

Chedarnode

Chedarnode

Chedarnode Chedar

node

Chedarnode

Master

Result

Result

Results are sent back to the master node.Calculation ends and everybody is happy.

Result

Page 19: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

What happens inside a Chedar node

Task

Chedar node starts the distributed application, hijackingits file operations.

Chedar

Distributed program

Result

Any Java program that uses files to read input and store output can be distributed

Page 20: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Security Concerns

• Because of security concerns the distributed application has been beforehand installed to the computers and it is not automatically delivered during the task distribution

• In the task distribution only the execution parameters i.e. configuration files are transferred

• Also, currently the IP addresses of master nodes are restricted such that only certain IP addresses are allowed to start computations

Page 21: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

2005

UNIVERSITY OF JYVÄSKYLÄ

Future Work

• At this time P2PDisCo is just a tool for our research project to speed up the computations of NeuroSearch neural network resource discovery algorithm

• Possible improvements– Checkpointing of computation such that if connection is lost

the computation can be resumed from the same point– Master could leave the network and gather results afterwards– Extending API of P2PDisCo to allow direct communication

between computing nodes, which makes it possible to parallelize the evolutionary algorithm for multiple computers with other architectures than master-slave, such as the panmictic model commonly used for parallelization of evolutionary algorithms

Page 22: UNIVERSITY OF JYVÄSKYLÄ P2PDisCo – Java Distributed Computing for Workstations Using Chedar Peer-to-Peer Middleware Presentation for 7 th International.

UNIVERSITY OF JYVÄSKYLÄ

Thank You!

Any questions?