Search on Centralized Networks

  • 8/6/2019 Search on Centralized Networks

    Peter S.B. Dushkin

    Efficient Search Methods

    In Centralized Systems

    Diploma in Computer Science

    Queens' College, Cambridge 2003

    Proforma

    Peter B Dushkin

    Queens' College

    Peer to Peer Content Sharing

    Diploma, Computer Science, 2003

    Word Count: 9,039

    Project Originator: Mr. Peter Dushkin

    Project Supervisor: Mr. Meng How Lim

    Original Aims:

    The original aim of this project was to build an example of a peer-to-peer content search
    and distribution system. Each user of the system can enter keywords (such as getFile or
    getPeers) to find out information about attached nodes on the network. The project follows
    Napster's example by making a central server available to the connected clients. The server
    contains a registry that keeps current information about connected clients and the content
    each is advertising.

    Work Completed:

    An RMI client and server were designed and implemented to test various search and

    retrieval methods in a centralized network environment. Class libraries were built for

    both the local and remote software and data was collected to test the project.

    Special Difficulties:

    There were no difficulties.

    Declaration:

    I, Peter Dushkin of Queens' College, being a candidate for the Diploma in Computer Science,
    hereby declare that this dissertation and the work described in it are my own work, unaided
    except as may be specified below, and that the dissertation does not contain material that
    has already been used to any substantial extent for a comparable purpose.

    Signed

    Date

    Table of Contents

    List of Figures
    1 Introduction
        1.1 Abstract
        1.2 Motivation
        1.3 Subject Overview and Terminology
    2 Preparation
        2.1 Resources
        2.2 Planning & Documentation
    3 Implementation
        3.2 Classes and Methods
        3.2 The Client
        3.3. The Server
        3.4 Security
    4 Evaluation
        4.1 Data Collection
        3.3 Node Discovery
        3.4 Content Discovery
        3.5 Content Delivery
        3.6 Observing results
    5 Conclusions
        5.1 Further Development
        5.2 Final Conclusion
    Appendices
        A. Key Code Samples
        B. Bibliography
        C. Project Proposal
    Supervision Requirements

    List of Figures

    Figure 1. An example of the question and answer exercises in the planning and
    documentation period. A number of key outputs of this exercise helped us get a better
    sense of what needed to be addressed in sequence diagrams and use cases.

    Figure 2. User view of the system. This use-case describes the main inputs required of a
    user on the system. Later, these inputs will be translated into classes and methods.

    Figure 3. The pseudo code for how the client is intended to interact with the system. A
    number of such pseudo code examples were designed and updated later, informing sequence
    diagrams and class and method design.

    Figure 4. Sequence flow of the central indexing architecture. The local host queries the
    remote directory server for the location of a given piece of content. The server replies
    with the IP location of the node containing the text file. Local host then handshakes
    with the remote host to receive the content.

    Figure 5. Design of the system's core class diagrams.

    Figure 6. The public interface for the client.

    Figure 7. Design of the system's core class diagrams.

    Figure 8. The above class example remotely queries the server and returns a time stamp.

    Figure 9. The startPeers method on the client side invokes the remote givePeers.

    Figure 10. Functioning of the timing sequences. Commands issued on the local client, the
    execution of remote methods, and returned results are all time stamped to issue data
    points for further exploration.

    Figure 11. Wait time for locating remote connected nodes.

    Figure 12. Wait time for locating remote connected nodes. One client.

    Figure 13. Wait time for multiple peers requesting multiple files.

    Figure 14. Wait time for multiple peers requesting the same file.

    Figure 15. Wait time for multiple peers requesting small files.

    Figure 16. Wait time for multiple peers requesting a large file.

    1 Introduction

    1.1 Abstract

    The digital exchange of information over peer-to-peer networks is not a new topic.
    Applications such as classroom educational tools, chat services, multicast applications,
    and, more commonly, electronic mail could all be categorized as variations on this design.
    The now infamous birth of Napster as a quick and dirty way to access music files has
    reinvigorated industrial and academic activity in the p2p arena. Core to such development
    activities have been issues such as disk space utilization, bandwidth constraints,
    effectiveness of peer discovery and group management, single points of failure, quality of
    data location, and reliable exchange of information. The fruits of these development
    efforts have been second generation services such as Gnutella, Kazaa, JXTA, and Pastry,
    all of which offer a largely or completely decentralized method of file sharing.

    1.2 Motivation

    A common trait of all such peer-to-peer systems is that they establish their own particular
    foundation, leaving the application development community much flexibility to build
    products and services on top. The assumption, of course, is that the foundation laid by
    their predecessors is both efficient and reliable. This simple quandary is the driving
    motivation behind this diploma project: how do I test the effectiveness of the underlying
    architecture of a file-sharing system? Given the popularity of this area of development,
    there seem to be as many protocols as there are claims to a solid solution. For example,
    Gnutella's flexible network design dramatically decreases the possibility of a single
    point of failure. But, as the system scales, Gnutella's method of flooding the network
    with queries eventually does more harm than good. Another example is Pixie, a peer-to-peer
    architecture that uses the concept of content scheduling to decrease the limitations
    imposed by network utilization. In this case, rather than flooding the network with
    requests, the application schedules content based on the efficient use of resources.
    Finally, there are the recently popular hybrid networks, a combination of the
    decentralized and centralized architectures of systems such as Gnutella and Napster
    respectively.

    With this latest evolution in networking approaches in mind, the goal of this project is to
    design a small centralized network and test how efficiently that network performs search,
    discovery, and retrieval under varying conditions. Since Napster's introduction, central
    indexing services, as a scalable solution for content delivery, have been largely replaced
    by decentralized systems that are less vulnerable to a single point of failure. However,
    as both research and industrial efforts continue to offer solutions to the shortcomings of
    both centralized and decentralized networks, it is becoming increasingly apparent that a
    combination of the two is (currently) the best solution. This is the approach taken by the
    KaZaA file-sharing network. In the KaZaA solution, centralized servers are located
    throughout decentralized peer groups, combining the fast access of a central index with
    the request propagation strengths of a decentralized solution.

    In large part due to the renewed utility of centralized search methods within hybrid
    solutions, this project sets out to explore how centralized search and retrieval is
    accomplished and, if need be, where it can be improved. The four primary system features
    considered in this project are:

    Node Discovery. One of the major challenges in peer-to-peer systems design is the
    discovery of nodes on the network. Each individual node needs a reliable method for
    discovering and handshaking with every other node. Additionally, information about the
    various nodes on the network needs to be stored in some manner. In an indexing system such
    as this one, a central server is used to maintain information about the nodes on the
    network. Each individual node must log onto the network, register with the central server,
    and query the server's registry to discover other nodes on the network.

    Content Discovery. Content discovery is the location of files on the network. In
    decentralized systems, this can be done directly from one peer to another. In our system,
    the directory server acts as the adjudicating element in the network, directing the local
    client's requests. Specifically, requesting peers are given IP references to the nodes on
    the network that hold the requested files. Of interest to me in this project is the time
    that elapses between a content request and the discovery of the node that holds it.

    Content Delivery. Once the server, remote hosts, and content location are discovered,
    there needs to be an efficient mechanism for delivering files over the network. In many
    peer-to-peer systems, this piece of the puzzle is key to the overall success of the
    design. For centralized architectures in particular, content delivery can become a
    bottleneck as the number of nodes on the network grows. The more hosts requesting content,
    the more likely it is that the single directory server is unavailable to reply to
    requests.

    Security. Security should never be overlooked when designing any networked system. It is
    especially important in peer-to-peer networks, where both the volume of content and the
    number of network nodes can be quite large. Common security problems such as viruses,
    encryption cracking, bandwidth clogging, internal and external network attacks, and
    eavesdropping are all concerns when designing such a system. Additionally, as the number
    of nodes on the peer-to-peer network increases, so does the system's overall vulnerability
    to security breaches. For our centralized application of peer-to-peer, I have decided to
    implement a basic example of Secure Sockets Layer (SSL) from the standard Java SDK.

    1.3 Subject Overview and Terminology

    At the heart of any discussion of efficient network design is overall topology, or how
    best to connect the nodes within a group. For a centralized system such as the one I
    considered, the topology is viewed in terms of the information flow of the network as a
    whole. The nodes in the graph are the peers, and links (or edges) between peers indicate a
    regular sharing of information. For the network to be truly effective, the nodes should be
    able to use the edges to share information without unnecessarily loading the network.

    How the network is designed determines how information is shared. Below are

    definitions of the most common network models in use today:

    1. Centralized. The architecture considered in this diploma project. Centralized
       client/server systems are currently the most popular form of network, with a central
       server adjudicating among its client peers. Examples include web servers, databases,
       SETI@Home, Napster, etc.

    2. Ring. A common method for scaling centralized services is to use a cluster of machines
       arranged in a ring to act as a distributed server. Communication between servers
       coordinates the sharing of the system state. This establishes a group of nodes that
       provide identical function to a single server but incorporate redundancy and load
       balancing capabilities. Typically, ring systems consist of machines that are nearby on
       the network and owned by a single organization.

    3. Hierarchical. DNS is an example of such a system. In hierarchical systems, authority
       flows from the root name servers to the server for the registered name and downward.
       Usenet is another example of a large hierarchical system.

    4. Decentralized. Popularly regarded as true peer-to-peer computing, decentralized systems
       communicate symmetrically: each node takes on full responsibility as both client and
       server. Popular examples are Gnutella and FreeNet.

    Each of the above system architectures dramatically affects the overall success and
    usability of the network. In our centralized example, there are numerous environmental
    factors that can influence the overall stability and effectiveness of the system. For
    example, resources may suddenly become unavailable if a user decides to disconnect from
    the network or power off a machine. Of course, the number of users and their volume of use
    are key concerns. There are also random events such as connectivity failures, hackers, and
    viruses (although these are not considered in this project) that can influence the
    system's performance.

    2 Preparation

    This chapter details the requirements gathering and design phase prior to algorithm design
    and data collection. In this phase, a waterfall-type method was employed, consisting of
    evaluating the project's requirements, building and updating use cases, planning the final
    system design, and finally implementing, debugging, and testing it.

    2.1 Resources

    2.1.1 Hardware

    The only significant hardware requirement of this project was access to a small network.
    After considering setting up a number of Linux boxes in Concroft, I opted for the
    convenience of setting up a small network environment within my college room. This was
    done by networking my laptop directly to a rented PC.

    2.1.2 Protocol

    There are a great many file-sharing protocols on the market today, and each reflects its
    own particular flavor of file sharing. For example, the Gnutella network's decentralized
    approach is, in fact, its own protocol, and applications can be developed against it
    accordingly. When I originally considered protocols with my supervisor, we opted to go
    with the relatively new and much-discussed JXTA API provided by Sun Microsystems. This
    decision was based entirely on our desire to explore a new application of peer-to-peer
    networking with a large development community supporting it. Later on in the project,
    after learning a good deal about the JXTA API, we decided to move to Remote Method
    Invocation (RMI) as the protocol of choice. We did this to be able to dig deeper than the
    JXTA solution would have allowed.

    2.2 Planning & Documentation

    A good deal of design-time effort went into this project. Requirements analysis was made
    up of several sections, each defining a particular functionality of the system. The
    project used the following planning constructs: Analysis (the textual what, how, and why
    of planning for development), Use Case Diagrams (graphical descriptions of how the user
    interacts with the system), Sequence Diagrams (the particular step-by-step process of the
    system), and Class Diagrams (the classes and their methods). Other diagrams such as State,
    Activity, Collaboration, and Deployment were considered but deemed overkill and, in many
    cases, redundant.

    2.2.1 Analysis

    The analysis phase is used as a top-level exploration of the project as a whole. It
    follows a simple question and answer format and is intended to touch on each and every
    aspect of the system, from hardware and software requirements to how users collaborate.
    Below is a piece of the initial Analysis phase Q and A.

    Q: What is the intended purpose of this project?
    A: To build a simple example of a peer-to-peer system. In this case, a central directory
       server will be used (similar to Napster's model).

    Q: What are the particular inputs of the system?
    A: The user should have a way of connecting to the central server and registering with
       the network. Additionally, it should be able to retrieve information on network nodes
       and file contents. Finally, methods for searching for and retrieving files should be
       designed.

    Q: What software is required?
    A: For this project, the j2sdk1.4.2 was used in addition to SparxSystems UML tools.
       Additionally, RMI was used for remote networking. RMI is included in the Java
       development kit.

    Q: What hardware is required?
    A: For realistic client/server interaction, two computers would be nice but much can be
       done on a single computer.

    Q: How many users of the system will there be?
    A: For this system, a single directory server and three remote clients.

    Q:

    Figure 1: An example of the question and answer exercises in the planning and
    documentation period. A number of key outputs of this exercise helped us get a better
    sense of what needed to be addressed in sequence diagrams and use cases.

    2.2.2 Use Case Diagrams

    Use case diagrams were designed to describe how the individual user interacts with the
    system. This exercise helped us to answer questions raised in the analysis phase such as
    "what are the desired inputs?", "how is a file advertised?" and "what is being
    accomplished?". It became clear early on that one of the key issues was going to be how
    Content Delivery would occur. Lower-level mechanisms such as I/O streams over TCP/IP or
    UDP would have made the job easier but, since this is an RMI project, I had to rely on
    remote object calls. This issue is further explored in the Implementation section below.

    Figure 2: User view of the system. This use-case describes the main inputs required of a
    user on the system. Later, these inputs will be translated into classes and methods.

    2.2.3 Sequence Diagrams

    At this stage of the project's documentation, the overall architecture of the project
    started to take shape. When I discussed different design alternatives for this project
    with my supervisor, we concentrated on determining the factors most important to peer
    location, content location, content delivery, and security. Several early architecture
    ideas were designed and eventually dropped. The main issue we initially struggled with was
    the choice of protocol. Sun Microsystems' JXTA seemed like a good option initially but,
    once we decided to move away from a decentralized architecture to a directory server, it
    was no longer relevant.

    Pseudo code was designed in this stage for both the client and server implementations. The
    pseudo code underwent numerous revisions as the project progressed. Below is a version of
    the client implementation:

    Pseudo Code for Client Implementation

    if the directory server accepts a new connection
        register client with the server/network;
        update the directory;
        initialize the peer group;
    else if the server refuses to connect
        attempt to establish a connection n times;
        else fail;

    if there is input into the user interface
        search clients for files matching entered string;
    else
        download files from clients;

    Figure 3: The pseudo code for how the client is intended to interact with the system. A
    number of such pseudo code examples were designed and updated later, informing sequence
    diagrams and class and method design.
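The "attempt to establish a connection n times, else fail" branch of the client pseudo code could look roughly like the sketch below. The names RetryingConnector and connectWithRetry are hypothetical, and the connection attempt is passed in as a Callable so that the sketch does not assume how the real client contacts the directory server.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for the "try n times, else fail" step of the client pseudo code. */
class RetryingConnector {
    /**
     * Calls connect up to maxAttempts times and returns the first successful result.
     * If every attempt fails, throws IllegalStateException wrapping the last failure.
     */
    static <T> T connectWithRetry(Callable<T> connect, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

In an RMI client, the Callable would typically wrap the registry lookup of the directory server, so that a refused connection is retried before the client reports failure to the user.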

    Both the Use Case Diagrams and the pseudo code laid a good foundation for the development
    of the several sequence diagrams that were built to help us describe the steps that would
    be taken during the exchange between client and server. A good example of the flow of
    events can be seen in Figure 4.

    Figure 4. Sequence flow of the central indexing architecture. The local host queries the
    remote directory server for the location of a given piece of content. The server replies
    with the IP location of the node containing the text file. Local host then handshakes
    with the remote host to receive the content.

    3 Implementation

    The work completed is outlined in the following sections. The implementation of the RMI
    client/server was informed by the use cases set up in the requirements. Classes and their
    associated attributes and procedures were designed based on the information gathered
    during the planning and requirements gathering phase. A first draft of the Server and
    Client implementations was then designed. Because RMI transparently accomplishes a good
    deal of the work involved in setting up a file-sharing system, much of the design-time
    effort was straightforward.

    I make a few assumptions during implementation. I assume that only one peer is accessing
    the server at a time and that queries to remote methods do not happen concurrently. I also
    assume that the same is true for similar activity on the system, such as file requests and
    downloads. Finally, I have designed a small network and assume the reader understands that
    the results will not be the same should the number of nodes or overall usage increase.

    3.2 Classes and Methods

    Classes and methods were designed during the implementation phase to transport method
    calls from the local GUI to the remote server. When I designed the methods in my client
    and server classes, I essentially followed the steps below:

    1. Derive an interface from java.rmi.Remote that contains the methods to be made available
       to RMI clients.

    2. Define a class that extends the appropriate subclass of java.rmi.server.RemoteServer.
       In our case, this class is UnicastRemoteObject. Implement the derived interface in the
       derived class.

    3. Use javac to create class files.

    4. Create stub and skeleton classes with the JDK rmic utility, and make the stub classes
       available to the client and servers.

    5. Start the RMI registry on the local machine.

    6. Start the main application, which should instantiate the RMI server class and register
       it with the local registry.
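For readers following the steps above on a current JDK, here is a minimal sketch of the same sequence. It deliberately differs from the project in two ways, both assumptions of this sketch: the implementation class does not extend UnicastRemoteObject but is exported explicitly with UnicastRemoteObject.exportObject, which on JDK 5 and later generates the stub at run time so the rmic step is not needed, and the registry is created inside the server process instead of being started separately. The names EchoService, EchoServiceImpl, and EchoServer are hypothetical.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

/** Hypothetical remote interface: every remote method must declare RemoteException. */
interface EchoService extends Remote {
    String echo(String message) throws RemoteException;
}

/** Server-side implementation of the remote interface. */
class EchoServiceImpl implements EchoService {
    @Override
    public String echo(String message) {
        return "echo: " + message;
    }
}

class EchoServer {
    public static void main(String[] args) throws Exception {
        EchoServiceImpl impl = new EchoServiceImpl();
        // Export on an anonymous port; the stub is generated at run time.
        EchoService stub = (EchoService) UnicastRemoteObject.exportObject(impl, 0);
        // Start a registry inside this JVM and bind the stub under a name.
        Registry registry = LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
        registry.rebind("EchoService", stub);
        // A client would obtain the stub with
        // LocateRegistry.getRegistry(host).lookup("EchoService") and call echo on it.
        System.out.println("EchoService bound");
    }
}
```

The exported object keeps the JVM alive, so the server keeps running after main returns until it is stopped.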

    Originally, the system was designed with a command-line interface, but a GUI was added
    later on in the development process. On the server side, Start() and Stop() methods were
    designed to allow clients to register and deregister with the server. As information flows
    to and from the server, an updateDir() method updates the directory of files.

    Figure 5. Design of the system's core class diagrams.

    The startPeers() method is provided as a callback from the server to the requesting
    client. The server calls this method back on the client and passes it references to the
    other peers on the network. Once this happens, the receiving peer holds the node
    information in an array and searches this information when necessary.

    The generic actionPerformed() method is used here to handle all the inputs from the GUI.
    The possible actions are:

    Search Nodes on the Network. To do this, the client user enters the getPeers string in
    the command line. This puts a call to the server to get a new set of peers for the
    client.

    Search Files on the Network. To do this, the client calls the remote file search method
    on all peers in the client's peer array and puts the results in the list object of the
    graphical interface.

    Download a Located File. To do this, the client user calls a getFile method on the peer
    it needs to receive the file from. The results are written to a byte array on the local
    client.
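The file search action can be sketched without any networking by treating each peer's advertised file list as a plain list of names. Everything here is an assumption made for illustration: the class PeerFileSearch, the search method, and the case-insensitive substring match are hypothetical, and in the real system the listings would be obtained through remote calls on the peers in the client's peer array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical local stand-in for searching the file listings of all known peers. */
class PeerFileSearch {
    /**
     * Returns "peer/file" entries for every advertised file whose name contains the
     * search string, ignoring case. peerListings maps a peer name to its file names.
     */
    static List<String> search(Map<String, List<String>> peerListings, String searchString) {
        String needle = searchString.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> peer : peerListings.entrySet()) {
            for (String fileName : peer.getValue()) {
                if (fileName.toLowerCase().contains(needle)) {
                    hits.add(peer.getKey() + "/" + fileName);
                }
            }
        }
        return hits;
    }
}
```

The returned entries are the kind of result that would be placed in the list object of the graphical interface, from which the user then picks a file to download.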

    Additional methods are used to return basic information to the requesting client. They
    include writeFile, which writes the contents of the byte array to a file, and getFile,
    which is called remotely to return a byte array holding the contents of the requested
    file. In other words, if a client wants to download a file called
    A_Tale_of_Two_Cities.txt, it must call getFile on the providing client.

    3.2 The Client

    After the classes were designed and revised, the next step was to implement the core code
    for the local client. The client design is declared in a public interface that extends
    java.rmi.Remote, and its methods are declared to throw RemoteException.

    As each new client joins and leaves the network, it calls the methods declared in the
    Client interface. Additionally, the filenames and fileSizes arrays were added to store
    information about the files, and their sizes, located on the other clients on the
    network. The interface takes the form:

    package network;

    import java.rmi.*;
    import java.io.*;

    // Remote interface implemented by every client; peers invoke these methods over RMI.
    public interface Client extends Remote {

        // Fields declared in an interface are implicitly public, static, and final.
        String[] filenames = new String[99];
        long[] fileSizes = new long[99];

        void initPeers(Client clientA, Client clientB, Client clientC) throws RemoteException;
        void getHost() throws RemoteException;
        void listFile(int i) throws RemoteException;
        void getNumFiles() throws RemoteException;
        void getIP(String searchString) throws RemoteException;
        void writeFile(String filename) throws RemoteException;
        byte[] getFile(String filename) throws RemoteException;
    }

    Figure 6. The public interface for the client.

    The purpose of java.rmi.Remote is to mark derived interfaces that contain methods to be
    exported by the remote RMI server. These method calls were designed to find out as much as
    possible about the surrounding network environment: most importantly, information about
    the location of other nodes, via getIP() and getHost(), and information about the files
    each node has in its local directory, via listFile and getNumFiles. Finally, the methods
    used to download a file are getFile and writeFile. These last two methods later proved to
    be somewhat problematic, in particular the byte array returned by getFile. I will discuss
    the reasons and the solution further down in the paper.

    Once the public Client interface was designed and tested, the main client implementation
    class, ClientImpl, was coded. This class contains the actual logic of the designed methods
    and procedures, in addition to the relationship to the graphical user interface. The core
    functionality of this class is its ability to register each new client with the network
    and ultimately make its information available to the other clients. Other key algorithms
    involve the deregistering of the client and the method that updates the array of file
    names and sizes in the current listing. This is the updateDir() method.

    This file is too large to reproduce here, but one method of note is getFile, which reads
    the named file into an array of bytes. The contents of the file are held in memory in a
    temporary byte array, which is later written to a new file on the local disk of the
    requesting node. The essence of this process is shown below:

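    public byte[] getFile(String filename) throws RemoteException {
        byte[] temp = new byte[1];             // fallback returned if the file cannot be read
        byte[] contents1;
        File inputFile;
        try {
            inputFile = new File(filename);
            long size = inputFile.length();
            contents1 = new byte[(int) size];  // assumes the whole file fits in memory
            FileInputStream in = new FileInputStream(inputFile);
            try {
                // The original listing elides the read; this loop is one way to complete it.
                int off = 0, n;
                while (off < contents1.length
                        && (n = in.read(contents1, off, contents1.length - off)) > 0) {
                    off += n;
                }
            } finally {
                in.close();
            }
            return contents1;
        } catch (IOException e) {
            return temp;
        }
    }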

    Figure 7. Design of the system's core class diagrams.
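The receiving half of the transfer, writing the returned byte array to a new local file, might be sketched as follows. This is an illustration only: FileTransferSketch is a hypothetical class, it uses java.nio.file.Files (available from Java 7, so newer than the JDK used in the project), and both methods take plain local paths in place of remote calls.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical local sketch of the getFile / writeFile pair. */
class FileTransferSketch {
    /** Provider side: load the whole file into a byte array. */
    static byte[] getFile(String filename) {
        try {
            return Files.readAllBytes(Paths.get(filename));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Requester side: write the received bytes to a new file on the local disk. */
    static void writeFile(String filename, byte[] contents) {
        try {
            Files.write(Paths.get(filename), contents);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the whole file travels as a single byte array, memory use on both sides grows with file size, which is one plausible reason a byte-array transfer can become problematic for large files.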

    3.3. The Server

    The remote extension of the server class is, by comparison, not as complex as the design
    of the client classes. Essentially, the server was designed to do the following things:
    register and deregister clients, store information about client activity, and give
    information to requesting clients as needed. Accordingly, the Server class's methods are
    register, deregister, and givePeers.

    In the implementation of the Server class, a Vector keeps information about client activity. When local clients query the server, the server searches its Vector to report which clients are currently active on the network. The code for this functionality takes the form:

    protected Vector clients;   // registry of currently connected clients

    public ServerImpl () throws RemoteException {
        clients = new Vector ();
    }
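    The registry behaviour described above can be sketched without the RMI plumbing. What follows is a minimal illustration only: the class name PeerRegistry, the use of String host names, and the return types are assumptions made to keep the example self-contained, whereas the project's ServerImpl stores remote Client references.

```java
import java.util.Vector;

// Hypothetical stand-in for the server's client registry (not the project's ServerImpl).
class PeerRegistry {
    protected Vector<String> clients = new Vector<String>();

    // Adds a client and returns its position in the registry.
    public synchronized int register(String host) {
        clients.addElement(host);
        return clients.size() - 1;
    }

    // Removes a client; returns false if it was not registered.
    public synchronized boolean deregister(String host) {
        return clients.removeElement(host);
    }

    // Returns a snapshot of every registered client except the requester.
    public synchronized Vector<String> givePeers(String requester) {
        Vector<String> peers = new Vector<String>(clients);
        peers.removeElement(requester);
        return peers;
    }
}
```

    Returning a copy from givePeers means a requesting client never holds a reference to the server's live registry.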

    Additional efforts were made to determine how long objects were taking to execute on the server-side. The two key operations for the remote timing of events were, first, the


    initialization of a timer at the local node when a call is issued to the server and, second, the termination of that timer once the result is returned from the remote server. The data from this operation helped me gather more information about how the network was behaving in different environments, such as increased traffic or the transmission of different file sizes.

    The server code that records the start and stop time of a remote call should be distinguished from the Timer package, which attempts to gauge the length of remote execution. To do this, the local Java code makes a call on the remote object and returns a date and time associated with the object being acted on. An example of one of the classes in this package takes the form:

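    import java.rmi.RemoteException;
    import java.util.Date;

    public class Timer implements Runnable {
        TimeMonitor tm;   // remote object that receives the time stamps

        public Timer(TimeMonitor tm){
            this.tm = tm;
        }

        // start(), not run(), so the loop executes on its own thread
        public void start(){
            (new Thread(this)).start();
        }

        public void run(){
            while(true){
                try{
                    Thread.sleep(10000);   // wait ten seconds between reports
                } catch(InterruptedException x){
                    // ignored, as in the original listing
                }
                if (this.tm!=null){
                    try{
                        this.tm.tellMeTheTime(new Date());
                    } catch(RemoteException x){
                        // ignored; try again on the next pass
                    }
                }
            }
        }
    }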

    Figure 8. The above class example remotely queries the server and returns

    a time stamp.

    The purpose of this code (together with its related classes) is to remotely start a timer for each new thread that executes on the server. When the thread has completed its execution, the current system time is returned as a result. The end result is that I get a sense of how long the various methods take to execute on the server.
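    The wait time between issuing a call and receiving its result can also be measured entirely on the client, which avoids comparing the clocks of two machines. The sketch below is an illustration under assumptions: CallTimer and its methods are hypothetical names, not part of the project's Time package, and a real remote call would have to handle RemoteException inside the supplied function.

```java
import java.util.function.Supplier;

// Hypothetical helper: times a call from just before it is issued until its result returns.
class CallTimer {
    private long lastElapsedMillis;

    // Runs the call, records the elapsed wall-clock time, and passes the result through.
    public <T> T time(Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            lastElapsedMillis = (System.nanoTime() - start) / 1000000L;
        }
    }

    // Elapsed milliseconds of the most recently timed call.
    public long lastElapsedMillis() {
        return lastElapsedMillis;
    }
}
```

    Because System.nanoTime is a monotonic clock, the measurement is not disturbed by adjustments to the system date during a test run.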

    3.4 Security

    Security in networking, and particularly in large peer-to-peer applications, is an important topic. Because this diploma project is about the effective sharing of resources, there is a lot of potential to deliver harmful material to any of the nodes on the network. In larger networks this topic of research is crucial to the vitality of the system.

    In my particular RMI client/server program, the intention was to decrease the flexibility enjoyed by local clients when invoking remote classes. Otherwise, any client program


    could run any server object, some of which could be potentially harmful to the network.

    When researching solutions provided by RMI, the answer I decided to use in my implementation was to install a security manager. Without the installation of a security manager there are no restrictions placed on how remote objects are accessed and by whom.

    I used the RMISecurityManager class from the java.rmi package to install the security manager with the statement below:

    if(System.getSecurityManager() == null) {

    System.setSecurityManager(new RMISecurityManager());

    }

    In addition to the above lines of code, the Java SDK that I was using for this project required that a security policy file be specified at runtime. This is done by defining the java.security.policy property on the command line, with no spaces around the equals sign:

    java -Djava.security.policy=mypolicy

    In order to access remote objects on the system, Java looks for a system-wide policy file

    in its runtime library. It also looks for a local policy file in the home directory of each

    requesting client. A sample policy file that grants full access permissions to everyone looks like:

    grant {

    permission java.security.AllPermission;

    };

    Through the use of its local policy file, each client on the network can grant permissions to each other node on the network. This exchange is made possible by the Permission classes in the java.security package, which provide access grants to specific resources.
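    A narrower grant than AllPermission is also possible. As a hedged illustration only, a policy file might limit a client to socket connections on unprivileged ports and read access to a single directory. The port range and the directory path below are assumptions chosen for the example, not values taken from this project.

```
grant {
    // allow RMI connections on unprivileged ports only (assumed range)
    permission java.net.SocketPermission "*:1024-65535", "connect,accept";
    // allow reading files beneath a hypothetical shared directory
    permission java.io.FilePermission "/home/user/shared/-", "read";
};
```

    With a policy of this shape, a peer can still serve the files it advertises while code loaded over the network cannot read elsewhere on the local disk.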


    4 Evaluation

    4.1 Data Collection

    Presented in this project are the results of three experiments. In all cases, the intent is to learn something about the strengths and weaknesses of centralized search methods. To do this effectively, I use the classes in the Time package, which gauge how long an action takes from the time it is initiated to the time a response is returned from the server. Performance data is collected for each of the three areas of interest mentioned above: node discovery, content discovery, and content download. In each case, the load placed on the system is equal to 50 simultaneous queries performed by each of the three clients.

    System tests were completed over a period of one week. The tests were not run back to back, as research has suggested that consecutive runs can cause results to vary significantly.

    Later on in the testing phase, results were compiled and explored.

    4.2 Node Discovery

    Of primary interest when implementing the Timer class is the location of additional peers on the network. As each new node comes onto the network, it registers with the server. The server's chief task is to keep track of all the nodes currently logged onto the system and give references to requesting peers. To keep information updated, any given node can, at any time, contact the server to request an updated list of information about the other nodes.

    When a new client comes onto the network, it calls the startPeers method and remotely invokes the server's givePeers method. Information about the other nodes on the network is then loaded into a local array. The initializing method on the client side takes the form:

    public void getPeers (Client clientA, Client clientB, Client clientC)
    {
        // store the host of each peer handed back by the server
        clients[0] = clientA.getHost();
        clients[1] = clientB.getHost();
        clients[2] = clientC.getHost();
    }

    Figure 9. The startPeers method on the client side invokes the remote givePeers method on the server. A resulting list of connected nodes is delivered to the client.


    It is the first objective of this project to study the effectiveness of this peer discovery interplay between client and server. Data is collected by timing the initialization of the getPeers method, the remote invocation of the givePeers method, and the final response from the server. The sequence of events takes the form:

    > getPeers

    client sent time

    Time: Tues Aug 02 12:11:09 CST 2003

    server received time

    Time: Tues Aug 02 12:11:10 CST 2003

    client returned time

    Time: Tues Aug 02 12:11:11 CST 2003

    Figure 10. Functioning of the timing sequences. Commands issued on the local client, the execution of remote methods, and returned results are all time stamped to issue data points for further exploration.

    As Figures 11 and 12 indicate, I tested the getPeers method call in an environment where only one client was querying the server and then in an environment where three clients were simultaneously querying the server. In the case of Figure 11, the overall average of the combined data points for each client was 1.562. In Figure 12, the single client's average was 1.262. While it wasn't surprising that the increased load in Figure 11 resulted in a larger overall average, I did expect the numbers to be further apart.

    A second observation was the difference in smoothness between the two figures. In Figure 11, somewhat erratic behavior is observed that is much less present (if at all) in its counterpart. I am guessing that this can be attributed to the three clients competing for the same method invocations on the remote server. This touches on the issues inherent in concurrent systems programming mentioned earlier.

    For me, the question that Figure 11 raises is whether or not RMI is thread-safe. There are a lot of possibilities here. One such possibility is that the connections are being pooled in such a way that only one is being used by an outstanding remote call at a time. Just because the stub never modifies any instance data does not mean that concurrent calls writing to the same socket will marshal correctly. Another possible explanation is the actual activation of the remote objects. In Sun's documentation it was unclear to me how to tell whether a remote object is in an active or passive state when being accessed. Without clarity here, it is possible that the graph below reflects multiple threads trying to spawn multiple processes for the same activation group (in this case, the givePeers method).


    Figure 11: wait time for locating remote connected nodes.

    Figure 12: wait time for locating remote connected nodes. One client.

    4.3 Content Discovery

    The way in which content is stored and advertised on a network can dramatically influence the effectiveness of its associated search methods. The advantage of a system with a centralized directory is that it is possible to quickly gain access to information about which nodes contain which files. Systems such as KaZaA use this fact to their advantage by combining a fast directory lookup node, or supernode, with the propagation power of decentralized systems.

    Fast access to content references initially happens when the client registers with the remote server. As each new node registers and deregisters with the remote server, the updateDir() method updates the array of file names and sizes with the current information. The local client then stores that information in two local arrays:

    String[] filenames = new String[99]; //stores file list
    long[] fileSizes = new long[99]; //stores file sizes

    The metrics I used to explore the content location qualities of centralized systems are outlined in the charts below. Two approaches were designed. In the first test, each peer simultaneously searches for a different file on the network. In the second, each peer simultaneously searches for the same file. The focus of these two tests is, in general, to gauge the overall time it takes to locate a file on the network and how well file location performs under increased load.
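    The reason a centralized directory answers content queries quickly can be shown with a small in-memory index. This is a sketch under assumptions, not the project's implementation: CentralIndex, advertise, withdraw, and locate are hypothetical names, and hosts are represented as plain strings rather than remote Client references.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical central directory: maps each advertised file name to the hosts offering it.
class CentralIndex {
    private final Map<String, List<String>> hostsByFile = new HashMap<String, List<String>>();

    // Called when a client registers or refreshes its listing.
    public synchronized void advertise(String host, String[] fileNames) {
        for (String name : fileNames) {
            List<String> hosts = hostsByFile.get(name);
            if (hosts == null) {
                hosts = new ArrayList<String>();
                hostsByFile.put(name, hosts);
            }
            if (!hosts.contains(host)) {
                hosts.add(host);
            }
        }
    }

    // Called when a client deregisters: drop it from every entry.
    public synchronized void withdraw(String host) {
        for (List<String> hosts : hostsByFile.values()) {
            hosts.remove(host);
        }
    }

    // A single hash lookup answers "which nodes hold this file?".
    public synchronized List<String> locate(String fileName) {
        List<String> hosts = hostsByFile.get(fileName);
        return hosts == null ? new ArrayList<String>() : new ArrayList<String>(hosts);
    }
}
```

    Because the lookup cost does not depend on how many peers are asking for the same name, an index of this kind would be consistent with similar wait times for same-file and different-file searches.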


    Figure 13: wait time for multiple peers requesting the same file.

    Figure 14: wait time for multiple peers requesting multiple files.

    As Figures 13 and 14 show, there are no significant differences between multiple clients accessing the same file and multiple clients accessing multiple files. I did expect to see some variation in the results. There are, however, some notable spikes in Figure 13's activity. Whether I can attribute these small increases to issues such as thread safety or the remote activation of objects is hard to say, although I doubt it. It is more likely that these spikes are due to slight variations in the results. As a side note, it is important to point out that, of the files searched, the majority of the desirable files (likely the ones I was querying most often) were located on a single client. In terms of wait time resulting from competition for resources, this observation (on a small scale, anyway) doesn't seem to have much effect on the system's activity.

    A second test was conducted to rank the expected results from content searches based on the popularity of the content. Unfortunately, I don't have the luxury of a large network used by a diverse group of users with varied interests to get a truly random sampling of how popularity may influence usage of the network. As a next-best solution, I gave each file on the network a popularity ranking. This was done by assigning values from 1 to 10 (10 being the highest) to each of the ten files on each of the three clients being tested. As the remote method invocations request files at random, the final results should give us an idea of where traffic might be directed within the network. The results of this test were:

    Client A     Rank  Hits   Client B       Rank  Hits   Client C      Rank  Hits
    Timer.java   10    x      Crossley.txt   10           Hayden.txt    10
    Server.java  9     xxx    Hello.c        9     x      Client.java   9
    Bio.txt      8     xxxxx  Resume.txt     8     xxx    Letter.txt    8     x
    Memo.doc     7     xxx    Dad.doc        7     xx     Crypt.java    7
    Summary.txt  6     xxx    Itinerary.txt  6     xx     ToDo.txt      6     xx
    Funny.txt    5     xx     FindIt.html    5     xx     Flight.txt    5     xx
    RMI.html     4     xx     NIHA.txt       4     x      Monitor.java  4     x
    NMH.html     3     x      Dickens.txt    3     xxx    eCOS.html     3     x
    Sam.doc      2            Stream.java    2     x      JXTA.txt      2     xx
    Mom.doc      1            Jill.doc       1            Columbia.txt  1     x


    4.4 Content Delivery

    This section of my diploma project explores how content delivery behaves in a centralized network environment. Namely, I evaluate how information flows from one node to the next. To get a sense of how efficient this type of information exchange is in our small network, the two tests I used evaluated 1) multiple simultaneous downloads of various small files (under 2 MB in size) and 2) multiple downloads of a single large file (20 MB in size).

    Figure 15: wait time for multiple peers requesting small files.

    Figure 16: wait time for multiple peers requesting a large file.

    Once a file was located on the network, the actual transmission from one node to the next proved to be a bit more complicated than I anticipated. After researching solutions such as TCP/IP and IO streams, it seemed that the best method for RMI file transfer was to read the file's contents into an array of bytes on the remote client. To do this, I followed this sequence of events:

    1) Instantiate the remote object.

    2) Open the file and get its size.

    3) Allocate the byte array and read the file into that array.

    4) Copy the file name.

    Once these steps were accomplished, the remote file object could be transferred by calling the getFile method on the remote client. This method call fills the local client's array of bytes with the bytes from the remote file. Then, the local client calls the writeFile method, which writes the contents of the byte array to a file.

    As observed in the large-file results above, initial large-file transfer tests yielded poor results. In most cases, the transfer of a large file proved too memory intensive and the system simply hung. After exploring this problem, I discovered that this wasn't a shortcoming of the centralized architecture design but of how the writeFile method call was writing bytes into the local array. The (short-term) remedy was to cut the client file into 5 MB byte arrays and transfer the file as a sequential series, reassembling it on the receiving end.
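    The split-and-reassemble remedy can be sketched as follows. This is an illustration only: Chunker, split, and join are hypothetical names, and the sketch works on a byte array already in memory, whereas a transfer of a genuinely large file would read and send one chunk at a time so that only a single chunk is held in memory.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for cutting a payload into fixed-size pieces and joining them again.
class Chunker {
    // Splits data into consecutive chunks of at most chunkSize bytes.
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<byte[]>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int length = Math.min(chunkSize, data.length - offset);
            byte[] chunk = new byte[length];
            System.arraycopy(data, offset, chunk, 0, length);
            chunks.add(chunk);
        }
        return chunks;
    }

    // Reassembles chunks in the order in which they were received.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

    With a chunk size of 5 * 1024 * 1024 bytes, a 20 MB file would travel as four sequential pieces.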

    4.5 Observing Results

    The network characteristics of centralized systems were studied with peer location, content location, and file download effectiveness in mind. In our first test, I tested a single client's ability to invoke remote method calls on the server to get a listing of connected peers on the network. The metric used is the wait time between the execution of the command and the returned results. In the first case, I found that the average wait time is roughly 1.562 in an environment where load is being placed on the server. In the case where one node is accessing the remote server to get peer information, the wait time is comparatively less (1.262), as might be expected.

    The ability of a local client to quickly find the location of files on the network was shown in the second exploration. In both cases (multiple nodes querying the same file and multiple nodes querying different files) the response time was immediate. As an added observation, it is interesting to note that the percentage of responses to requests maintains a high and predictable level. At no time was a request for a file rejected by the remote server. A second test addressed the popularity of content. The popularity of a particular file (or group of files on a particular node) on the network can dramatically influence how activity is distributed. In our test, a sampling taken at random shows that the overall load of the network was weighted towards Client A. In such a case, especially if the popularity of files is proportionally small on the surrounding peers, a bottleneck could occur. As the weighted results for Client A show, evenly distributing how and when remote objects are invoked by connected clients is an important consideration when dealing with a centralized system.

    Finally, the effect that file retrieval has on the system was tested. In both cases, all three peers in the network were transferring files: first a relatively small file (under 2 MB) and then a larger file of 20 MB.


    5 Conclusions

    5.1 Further Development

    There are a number of improvements I would likely make to this project, given more time. As mentioned earlier, one of the primary problems of centralized systems is that they are not as efficient in propagating requests throughout the network as their decentralized counterparts. The problem is that a remote peer cannot send unrequested data to a client doing a search. The remote peer can only send data when it is explicitly called for by the requesting client. This inflexibility of the centralized RMI approach makes it impossible to seamlessly share information throughout the network. To illustrate the point, consider Client A. When Client A wants a file, it makes a call to the remote Server, requesting information about the other connected nodes. If Client B has the requested file, then Client A makes a direct connection. But what if the file does not exist within Client A's peer group, but is available somewhere else on the network? In an ideal situation, Client B would be able to refer the requesting client to another node on the network that does have the file. This is the essence of the JXTA API. JXTA uses the notion of Advertisements and a Peer Discovery Protocol (PDP) to fluidly locate references to information throughout the network. Advertisements are essentially messages represented as XML that make available information stored in a given peer's cache, such as other peers, peer groups, or available local or remote content. When a peer attempts to discover a particular piece of content, it searches the referring advertisements until a reference to the correct node is found. The efficient propagation methods of the JXTA protocol are not possible in an RMI environment where information exchange is a one-to-one dynamic.

    A second improvement would address the problem of concurrency within distributed systems. The way RMI currently works, a method dispatched by the RMI runtime to a remote object implementation may or may not execute in a separate thread. The RMI runtime makes no guarantees with respect to mapping remote object invocations to threads. As a result, when an RMI server is written, any assistance in executing separate threads must be hand coded. This introduces a degree of complexity that, although I did not have time to address it in this diploma project, is crucial to any system that entertains the possibility of numerous simultaneous client requests (such as a multi-user file-sharing system).
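    Since invocations may arrive on several threads at once, the state held by a remote object has to be guarded by hand. The sketch below illustrates the idea with plain threads rather than RMI. SafeClientList is a hypothetical name and the String host names are an assumption; it is not the project's ServerImpl.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical remote-object state guarded so that concurrent invocations cannot interleave.
class SafeClientList {
    private final List<String> clients = new ArrayList<String>();

    // synchronized: adding the client and reading its index happen as one step.
    public synchronized int register(String host) {
        clients.add(host);
        return clients.size() - 1;
    }

    // synchronized: readers never observe the list in the middle of an update.
    public synchronized int size() {
        return clients.size();
    }
}
```

    Without the synchronized keyword, two simultaneous registrations could be handed the same index or corrupt the underlying list.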

    Another improvement I would like to add, time allowing, would be the inclusion of additional tests for each of the three subject groups. I feel that additional variations could be done on system load testing. For example, in the case of the content discovery tests, it might be interesting to explore how the system behaves when content is evenly distributed throughout the network versus unevenly located in only a few nodes. Another


    such improvement might be a decent stab at building a system that manages to propagate requests from node to node. Given the focus of this project, I was not able to invest much effort in finding a good solution to this problem. Nonetheless, propagation is the shortcoming of centralized systems (and the advantage of their decentralized kin), and the design of an RMI system that intelligently tackles this problem would certainly be interesting. Finally, it would definitely be useful to gauge how each of the search, discovery, and retrieval subject areas behaves as the size of the system scales. While a three-node network is useful for the purposes of an academic exploration, translation into day-to-day environments would require a more robust architecture.

    5.2 Final Conclusion

    Overall, the goals of this project have been accomplished. I have spent a good deal of time testing the various strengths of searching a centralized network and have found that such a network can be both powerful and powerless depending on what you are demanding of it. Centralized directory servers are a very powerful tool for providing fast references to remote locations on the network. This is certainly a valuable commodity in large, multi-node environments where multiple files are being shared. The data points under the peer discovery and peer delivery sections certainly back up this finding. On the other hand, I found content delivery to be problematic for the reasons stated above. I don't feel this is the result of a centralized environment but, rather, of the shortcomings of RMI. Certainly, my earlier workaround of cutting up files into segments of bytes could be replaced with sockets or some other such solution, but the issue of propagation makes RMI a poor solution for large-scale file-sharing environments, centralized or decentralized.


    Appendices

    A. Key Code Samples

    //The group of Time classes in the Time package act as an
    //aid for data collection by remotely invoking the stub objects
    //on the server and returning the times at which objects were
    //invoked.

    TimeMonitorImpl.java

    package Time;

    import java.rmi.*;

    import java.util.Date;

    import java.io.Serializable;

    public class TimeMonitorImpl implements TimeMonitor, Serializable

    {

    public void tellMeTheTime( Date d ) throws RemoteException

    {

    System.out.println("Time: " + d.toString() + "\n");

    }

    }

    //The below method helps register the clients with the server.

    register() in ServerImpl.java

    public int register (Client client) {
        String chost = "";
        try {
            chost = getClientHost();
        } catch (ServerNotActiveException ignored) {
        }
        clients.addElement (client);
        try {
            System.out.println(chost + " has registered - sharing "
                + client.getNumFiles() + " files");
            givePeers(client);
        } catch (RemoteException ignored) {}
        return clients.size()-1;
    }

    public void givePeers (Client client) throws RemoteException {
        if (clients.size () > 1) { // Only give random clients if there is
            try {
                int randNumber = (int)(Math.random()*(clients.size()-1));
                System.out.println("Giving new Peer "+randNumber+" to remote.");
                Client temp = (Client) clients.elementAt (randNumber);
                randNumber = (int)(Math.random()*(clients.size()-1));
                System.out.println("Giving new Peer "+randNumber+" to remote.");
                Client temp2 = (Client) clients.elementAt (randNumber);
                randNumber = (int)(Math.random()*(clients.size()-1));
                System.out.println("Giving new Peer "+randNumber+" to remote.");
                Client temp3 = (Client) clients.elementAt (randNumber);
                client.initPeers(temp, temp2, temp3);
            } catch (RemoteException ignored) {}
        } else { // First client, register with itself three times
            Client temp = (Client) clients.elementAt (0);
            try {
                client.initPeers(temp, temp, temp);
            } catch (RemoteException ignored) {}
        }
    }

    //The main method of the ClientImpl class. Key methods from the

    //Client class are passed via this method.

    public static void main (String[] args) throws RemoteException,

    NotBoundException {

    if (args.length != 1) throw new IllegalArgumentException

    ("please enter host name");

    ClientImpl Client = new ClientImpl (args[0]);

    Client.start ();

    }

    public String listFile(int i) throws java.rmi.RemoteException {

    return this.fileNames[i];

    }

    public long listSize(int i) throws java.rmi.RemoteException {

    return this.fileSizes[i];

    }

    public int getNumFiles() throws java.rmi.RemoteException {

    return numFiles;

    }

    public int getTime() throws java.rmi.RemoteException {

    return getTime;

    }

    public String getIP() throws java.rmi.RemoteException {

    return myip;

    }


    B. Bibliography

    [1] Brookshier, Govoni, Krishnan, & Soto, JXTA: Java P2P Programming.

    [2] Gradecki, Mastering JXTA.

    [3] Rosenberg & Scott, Applying Use Case Driven Object Modeling with

    UML.

    [4] Fowler & Scott, UML Distilled, Second Edition.

    [5] Kolenikov & Hatch, Building Linux Virtual Private Networks (VPNs).

    [6] Nelson Minar: Distributed Systems Topologies, Parts 1 and 2

    http://www.openp2p.com/pub/a/p2p/2001/12/14/topologies_one.html


    C. Project Proposal

    Peter Dushkin

    Queens

    pbd22

    Diploma in Computer Science Project Proposal

    A File Discovery Scheme in Decentralized Computing

    December 6th, 2002

    Project Originator: Peter Dushkin

    Project Supervisors: Meng How Lim

    Signature:

    Director of Studies: Dr. Robin Walker

    Signature:

    Overseers: Dr. Larry Paulson & Dr. Tim Harris


    Table of Contents

    Introduction
    Project Proposal
        I. Front end application
        II. Node software
    Resources
    Supervision Requirements
    Phases of Development
    Timetable and Milestones
        Weeks 1 and 2: Proposal Definition
        Weeks 3 to 6: Paper Network Design
        Weeks 7 to 10: Paper Software Design
        Weeks 11 to 13: Physical Network Implementation
        Weeks 14 and 20: Application Coding
        Weeks 21 to 27: Evaluation & Debugging
        Weeks 28 to 35: Evaluation & Debugging/Dissertation
        Week 36: Final Form


    Introduction

    The ways in which a network of computers shares files have come a long way over the past decade. The days of a small office or university network sharing documents over a dedicated LAN or WAN have rapidly evolved into radically new areas of network computing. The two most commonly used today are centralized and decentralized architectures.

    Centralized, or client/server, networks rely on one central server to adjudicate activity. The central computer maintains a database of files owned by computers on the network. When a computer requests a file, the request is checked by the central server against the database and, if acknowledged, a direct connection can be established between the requesting and sending computers.

    The problem with a centralized network architecture is that a lot of demand is placed on the central server. As a result, the network can become quite slow due to bottlenecks. Also, should the central server experience problems or go down, the whole network is affected.

    A response to these problems is decentralized computing. In this model, all of the nodes on the network act in both a client and server capacity, removing the need for a central server. This project will use a file location scheme to show the comparative advantages of decentralized over centralized computing.


    Project Proposal

    This diploma project will use the TCP and RMI protocols to search for files on a small decentralized network. A Java-based GUI application will be developed to serve as the primary interface to the network. It will allow the end-user to ping the nodes on the network and discover information about the various computers, with the end-goal of file location. Each node on the network will provide a simple query interface that enables it to receive requests and respond accordingly.

    Outlined below are some possible features of the requesting and responding nodes:

    I. Front end application

    The front-end application will be the GUI interface to the decentralized network. Possible networking protocols involved are RMI, TCP, IP and UDP. The application's core objective is to serve as an interface for the location and discovery of files. Additionally, it should return information about individual nodes. Some of the returned information might be:

    a) The IP address of each computer

    b) Network Bandwidth used by nodes.

    c) Network status of each computer.

    d) Time/milliseconds between ping and pong.

    e) The geographic location of the computers.

    II. Node software

    The node software will respond directly to the incoming packets sent by the requesting computer. As a result, the primary responsibility of the node software is to return the appropriate information or pass the request along to the next computer. Some of the resulting class definitions should be:

    Pong (to return a packet request with related information)

    Retrieval of information

    Download of file(s)
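
    The "answer or pass along" behaviour of the node software can be sketched as follows. This is an in-process illustration only: `PeerNode` and `locate` are hypothetical names, and the hop limit `ttl` is an assumption added here so that forwarding always terminates; the proposal itself does not specify one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-process node; real nodes would talk over TCP or RMI.
class PeerNode {
    final String address;
    private final Set<String> files = new HashSet<>();
    private final List<PeerNode> neighbours = new ArrayList<>();

    PeerNode(String address) {
        this.address = address;
    }

    void addFile(String name) {
        files.add(name);
    }

    void connect(PeerNode other) {
        neighbours.add(other);
    }

    // Returns the address of a node holding the file, or null if none is found.
    // ttl (an assumption, not from the proposal) bounds how far the request travels.
    String locate(String fileName, int ttl) {
        if (files.contains(fileName)) return address;   // answer locally
        if (ttl <= 0) return null;                      // stop forwarding
        for (PeerNode next : neighbours) {              // pass the request along
            String found = next.locate(fileName, ttl - 1);
            if (found != null) return found;
        }
        return null;
    }
}
```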


    Figure 1: Intended Network Build

    Below is the intended network set-up. I will be logging onto the suggested network through SSH.

    [Network diagram. Labels: College Server; 131.111.128.110; Cambridge Network; Router; three PWF Server/Client machines (linux2.pwf.cl.cam.ac.uk, linux3.pwf.cl.cam.ac.uk, linux4.pwf.cl.cam.ac.uk); L. 1, L. 2, L. 3]

    Possible Extension of the Project

    Depending on the overall development of the project, ways of decreasing bandwidth utilization may be considered. A number of peer-to-peer networking protocols have been wrestling with possible solutions to the problem of excessive network traffic. Below are some suggested solutions.

    I. Pong Limiting

    Pong limiting reduces the amount of traffic on the network in two ways. A host returns a pong carrying its own address only if it is not restricted by a firewall. Moreover, only a fixed number of pongs should be returned in response to a given ping.
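
    A minimal sketch of these two rules is given below. The class `PongLimiter`, the cap `maxPongs`, and the idea that a host may also report other hosts it knows about (`knownHosts`) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper applying the two pong-limiting rules.
class PongLimiter {
    private final int maxPongs;   // fixed cap per ping (the value is an assumption)

    PongLimiter(int maxPongs) {
        this.maxPongs = maxPongs;
    }

    // ownAddress: this host; firewalled: true if it cannot accept connections;
    // knownHosts: other addresses this host could report (an assumption).
    List<String> pongsFor(String ownAddress, boolean firewalled, List<String> knownHosts) {
        List<String> pongs = new ArrayList<>();
        // Rule 1: advertise our own address only when not behind a firewall.
        if (!firewalled && maxPongs > 0) pongs.add(ownAddress);
        // Rule 2: never return more than maxPongs pongs for one ping.
        for (String host : knownHosts) {
            if (pongs.size() >= maxPongs) break;
            pongs.add(host);
        }
        return pongs;
    }
}
```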

    II. Pong Caching

    A drawback of pong limiting is that it is inefficient if too many pings are being sent. A possible solution is to cache the most recent pongs and so avoid broadcasting pings.

    In other words, if the appropriate reply is cached, then the distance that the matching request has to travel can be significantly reduced.
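
    The caching idea can be sketched as a small bounded cache of recently seen pongs. The class `PongCache`, its capacity, and the age limit are hypothetical choices for illustration; timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cache of the most recently seen pongs, keyed by host address.
class PongCache {
    private final Map<String, Long> seenAt;   // address -> time the pong was seen (ms)
    private final long maxAgeMillis;

    PongCache(final int capacity, long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
        // Insertion-ordered map that evicts its oldest entry beyond capacity.
        this.seenAt = new LinkedHashMap<String, Long>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > capacity;
            }
        };
    }

    void record(String address, long nowMillis) {
        seenAt.remove(address);          // re-insert so the entry becomes the newest
        seenAt.put(address, nowMillis);
    }

    // Answers a ping from the cache; an empty list means the ping must be forwarded.
    List<String> answer(long nowMillis) {
        List<String> fresh = new ArrayList<>();
        for (Map.Entry<String, Long> e : seenAt.entrySet()) {
            if (nowMillis - e.getValue() <= maxAgeMillis) fresh.add(e.getKey());
        }
        return fresh;
    }
}
```

    A node would consult `answer` first and broadcast a ping only when the returned list is empty.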

    III. Ping Multiplexing


    The idea behind Ping Multiplexing is that when a single incoming ping reaches a node, it is "multiplexed" into numerous outgoing pings. The reverse is true for pongs (numerous pongs can be "demultiplexed" into one pong).
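
    A sketch of both directions follows. The class `PingMultiplexer` and the representation of a pong as a list of host addresses are assumptions for illustration, not part of any real protocol.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the message shapes are assumptions, not a real protocol.
class PingMultiplexer {
    // Fan one incoming ping out to every neighbour except the one it came from.
    static List<String> multiplex(String sender, List<String> neighbours) {
        List<String> targets = new ArrayList<>();
        for (String n : neighbours) {
            if (!n.equals(sender)) targets.add(n);
        }
        return targets;
    }

    // Merge the address lists of many pongs into one reply, dropping duplicates.
    static List<String> demultiplex(List<List<String>> pongs) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> pong : pongs) merged.addAll(pong);
        return new ArrayList<>(merged);
    }
}
```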

    Resources

    I am planning on using the following:

    1. Operating Systems: Linux

    2. Programming Language: Java, UNIX, possibly Perl

    3. Networking Protocols: TCP, IP, RMI, UDP

    4. Hardware: C4 computers via SSH.

    5. Additional Software: Visio for UML

    6. Storage: My ADP Tape drive, possibly Penguin.

    Supervision Requirements

    I will be sitting down with Mr. How Lim once every two weeks to provide a project update and discuss milestones. Otherwise, we will be corresponding via email as needed.

    Phases of Development


    Research and Information Gathering

    Paper Planning

    Physical Network Design

    Physical Software Design

    Dissertation and Completion

    Timetable and Milestones

    Weeks 1 and 2: Proposal Definition


    * Meetings with Supervisor, Overseer, Director of Studies.

    * Set up schedule of meetings with overseers and supervisor.

    * Consolidation of project plan and overall direction.

    * Acknowledgement of resource availability.

    * Supporting research and data collection.

    * Project fine-tuned per overseer/supervisor comments and finalized.

    * Network hardware availability sorted out.

    * All appropriate signatures prior to final project plan.

    * Milestone: Final project plan.

    Weeks 3 to 6: Paper Network Design

    * Study Linux and how it is going to be used for this project.

    * Study particularly relevant aspects of Unix/Linux documentation.

    * Study particularly relevant aspects of P2P documentation.

    * Start designing sketches of network diagram and related functions.

    * Software Modules

    * Start building UML diagrams of approved sketches.

    * Milestone: Final Network Design.

    Weeks 7 to 10: Paper Software Design

    * Study related networking Java source code.

    * Study documentation on ping/pong schemes.

    * Create sketches of the application's GUI.

    * Create pseudocode for all of the items in "A" to "I" above.

    * Design methods, classes, procedures, functions, etc.

    * Apply work to UML diagrams.

    * Milestone: Final Application Design

    Weeks 11 to 13: Physical Network Implementation

    * Implement Network Design

    * Start to think about dissertation.

    * Milestone: Running Network

    Weeks 14 to 20: Application Coding

    * Implement Application Design

    * Successful implementation of all "core items".

    * Start to work on the development of dissertation.

    * Milestone: Prototype Application

    Weeks 21 to 27: Evaluation & Debugging


    * Check consistency, logic, etc.

    * Evaluate the overall protocol

    * Check against original schematics/specification

    * Evaluate network efficiency/functioning

    * Evaluate bandwidth usage

    Weeks 28 to 35: Evaluation & Debugging/Dissertation

    * These weeks repeat the work of the previous phase.

    * Check consistency, logic, etc.

    * Evaluate the overall protocol

    * Check against original schematics/specification

    * Evaluate network efficiency/functioning

    * Continuation of work and development on dissertation.

    * Milestone: Fully fleshed-out dissertation

    Week 36: Final Form

    * Completed application and dissertation.

    * Milestone: Completed Dissertation and Application