UNIVERSITY INSTITUTE OF TECHNOLOGY,
THE UNIVERSITY OF BURDWAN
PARALLEL COMPUTING LAB
Presentation on Distributed Computing
DISTRIBUTED COMPUTING
Presented by: Alokeparna Choudhury (ME201310005)
Hossainara Begum (ME201310004)
CONTENTS
INTRODUCTION
CENTRALIZED VS. DISTRIBUTED COMPUTING
WHAT IS A DISTRIBUTED SYSTEM?
ORGANIZATION
ARCHITECTURE
TYPES OF DISTRIBUTED SYSTEM
COMMUNICATION MIDDLEWARE
MOTIVATION
HISTORY
GOAL
CHARACTERISTICS
EXAMPLES OF DISTRIBUTED COMPUTING
DISTRIBUTED COMPUTING USING MOBILE AGENTS
CONTD..
TYPICAL DISTRIBUTED COMPUTING
A TYPICAL INTRANET
INTERNET
JAVA RMI
TRANSPARENCY IN DISTRIBUTED SYSTEMS
CATEGORIES OF APPLICATIONS IN DISTRIBUTED COMPUTING
MONOLITHIC MAINFRAME APPLICATION VS. DISTRIBUTED APPLICATION
ADVANTAGES
DISADVANTAGES
ISSUES & CHALLENGES
CONCLUSION
REFERENCES
INTRODUCTION
Nowadays it is not only feasible but also easy to put together computing systems composed of large numbers of computers connected by a high-speed network.
They are usually called computer networks or distributed systems, in contrast to the earlier centralized (single-processor) systems.
CENTRALIZED VS. DISTRIBUTED COMPUTING
Early computing was performed on a single processor. Uniprocessor computing can be called centralized computing.
A Distributed system is a collection of independent computers, interconnected via a network, capable of collaborating on a task. Distributed computing is computing performed in a distributed system.
[Figure: centralized computing (terminals attached to a mainframe host) versus distributed computing (workstations connected by network links)]
WHAT IS A DISTRIBUTED SYSTEM?
Definition:
‘‘A system in which hardware and software components located on networked computers communicate and coordinate their actions only by passing messages.’’ (Coulouris)
‘‘A distributed system is a collection of independent computers that appears to its users as a single coherent system.’’ (Tanenbaum)
• A Distributed system consists of multiple autonomous computers that communicate through a computer network.
• Distributed computing utilizes a network of many computers, each accomplishing a portion of an overall task, to achieve a computational result much more quickly than with a single computer.
• Distributed computing is any computing that involves multiple computers remote from each other that each have a role in a computation problem or information processing.
• In the term distributed computing, the word distributed means spread out across space. Thus, distributed computing is an activity performed on a spatially distributed system.
• These networked computers may be in the same room, same campus, same country, or in different continents.
[Figure: a large-scale application over the Internet, with agents cooperating on distributed job requests, subscription, and resource management]
ORGANIZATION
Organizing the interaction between each computer is of prime importance. In order to be able to use the widest possible range and types of computers, the communication channel should not contain or use any information that may not be understood by certain machines.
Special care must also be taken that messages are delivered correctly and that invalid messages, which could otherwise bring down the system and perhaps the rest of the network, are rejected.
Another important factor is the ability to send software to another computer in a portable way so that it may execute and interact with the existing network. This may not always be possible when using differing hardware and resources, in which case other methods must be used such as cross-compiling or manually porting this software.
ARCHITECTURE
Distributed programming typically falls into one of several basic architectures: Client-server, 3-tier architecture, N-tier architecture, Distributed objects, loose coupling, or tight coupling.
Client-server — Smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
3-tier architecture — Three tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment.
N-tier architecture — N-Tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
Tightly coupled (clustered) — refers typically to a set of highly integrated machines that run the same process in parallel, subdividing the task into parts that each machine completes individually, then combining the results into the final result.
Peer-to-peer — an architecture where no special machine or machines provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and servers.
Space based — refers to an infrastructure that creates the illusion (virtualization) of one single address-space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.
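The client-server pattern listed above (a smart client contacts the server for data, then formats and displays it) can be sketched with plain TCP sockets. Everything below is an illustrative assumption, not from the slides: the echo protocol, the ephemeral port, and the single-request server thread stand in for a real service.

```python
# Minimal client-server sketch using Python's standard socket module.
import socket
import threading

def serve_once(server_sock):
    """Server: accept one client, read its request, send back a reply."""
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode()
        # The server holds the data and logic; the client only displays results.
        conn.sendall(f"echo:{request}".encode())

def main():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    threading.Thread(target=serve_once, args=(server,), daemon=True).start()

    # Client side: contact the server, send a request, display the reply.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello")
        reply = client.recv(1024).decode()
    server.close()
    return reply

if __name__ == "__main__":
    print(main())   # prints "echo:hello"
```

In a real deployment the client and server run on different machines and the server loops over many connections; the single in-process exchange here only shows the request/reply shape of the architecture.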
TYPES OF DISTRIBUTED SYSTEM
Distributed Computing Systems
--cluster computing system
--grid computing system
Distributed Information Systems
--transaction processing system
--enterprise application integration
Distributed Pervasive Systems
--home system
--electronic health care system
--sensor networks
COMMUNICATION MIDDLEWARE
Several types of communication middleware exist. With remote procedure calls (RPC), an application component can send a request to another component by making a local procedure call, which results in the request being packaged as a message and sent to the callee. The result is then sent back and returned to the caller as the result of the procedure call. Techniques were later developed to allow calls to remote objects, leading to what is known as remote method invocation (RMI).
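The RPC flow just described (a local-looking call packaged as a message, shipped to the remote side, and the result returned as the call's return value) can be sketched with Python's standard xmlrpc modules. The `add` procedure, the in-process server thread, and the OS-chosen port are illustrative assumptions, not part of the slides.

```python
# RPC sketch using Python's standard xmlrpc modules.
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

def main():
    # Server side: register a procedure under a name. Port 0 lets the OS pick
    # a free port; a real deployment would use a fixed, known endpoint.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda a, b: a + b, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]

    # Client side: this reads like a local procedure call, but the arguments
    # and the result actually travel over HTTP as XML-RPC messages.
    proxy = ServerProxy(f"http://127.0.0.1:{port}")
    result = proxy.add(2, 3)

    server.shutdown()
    server.server_close()
    return result

if __name__ == "__main__":
    print(main())   # prints 5
```

Note how the tight coupling mentioned below is visible even in this sketch: the client can only issue the call while the server is up and must know exactly how to reach it.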
MIDDLEWARE (CONTD.)
An RMI is essentially the same as an RPC, except that it operates on objects instead of applications.
RPC and RMI have the disadvantage that the caller and callee both need to be up and running at the time of communication. In addition, they need to know exactly how to refer to each other.
This tight coupling is often experienced as a serious drawback, and has led to what is known as message-oriented middleware (MOM). In this case, applications simply send messages to logical contact points, often described by means of a subject. Applications can indicate their interest for a specific type of message, after which the communication middleware will take care that those messages are delivered to those applications.
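The publish/subscribe decoupling that MOM provides can be sketched in-process. The `Broker` class, the subject names, and the messages below are hypothetical stand-ins for a real broker, assuming only Python's standard library.

```python
# Message-oriented middleware sketch: senders and receivers are decoupled;
# the broker delivers messages by subject, and neither side needs to know
# the other or be addressed directly.
import queue
from collections import defaultdict

class Broker:
    """In-process stand-in for a MOM broker; real brokers persist and route."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # subject -> list of inbox queues

    def subscribe(self, subject):
        """Register interest in a subject; return the subscriber's inbox."""
        q = queue.Queue()
        self.subscribers[subject].append(q)
        return q

    def publish(self, subject, message):
        # Deliver to every application that registered interest in the subject.
        for q in self.subscribers[subject]:
            q.put(message)

broker = Broker()
inbox = broker.subscribe("stock.updates")
broker.publish("stock.updates", {"symbol": "ACME", "price": 42})
broker.publish("weather", "sunny")        # no subscriber: silently dropped
print(inbox.get_nowait())                 # prints {'symbol': 'ACME', 'price': 42}
```

Unlike the RPC case, the publisher does not block on the subscriber being up: messages wait in the logical contact point until the interested application retrieves them.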
MOTIVATION
Inherently distributed applications
Performance/cost
Resource sharing
Flexibility and extensibility
Availability and fault tolerance
Scalability
Network connectivity is increasing.
A combination of cheap processors is often more cost-effective than one expensive fast system.
Potential increase in reliability.
HISTORY
1975 - 1995
Parallel computing was favored in the early years: primarily vector-based at first, with more thread-based parallelism introduced gradually.
The first distributed computing programs were a pair of programs called Creeper and Reaper, invented in the 1970s.
Ethernet was invented in the 1970s.
ARPANET e-mail was invented in the early 1970s and is probably the earliest example of a large-scale distributed application.
Massively parallel architectures started rising, and the message passing interface and other libraries were developed; bandwidth was a big problem.
The first Internet-based distributed computing project was started in 1988 by the DEC System Research Center.
Distributed.net, founded in 1997, is considered the first project to use the Internet to distribute data for calculation and collect the results.
1995 – TODAY
Cluster/grid architecture is increasingly dominant.
Special node machines are eschewed in favor of COTS technologies.
Web-wide cluster software; Google takes this to the extreme (thousands of nodes per cluster).
SETI@Home, started in May 1999, analyzes the radio signals collected by the Arecibo Radio Telescope in Puerto Rico.
GOAL
Making resources accessible: data sharing and device sharing.
Distribution transparency: access, location, migration, relocation, replication, concurrency, failure.
Communication: make human-to-human communication easier, e.g. electronic mail.
Flexibility: spread the workload over the available machines in the most cost-effective way, coordinate the use of shared resources, and solve large computational problems.
CHARACTERISTICS
Resource Sharing
Openness
Concurrency
Scalability
Fault Tolerance
Transparency
EXAMPLES OF DISTRIBUTED COMPUTING
Network of workstations (NOW) / PCs: a group of networked personal workstations or PCs connected to one or more server machines.
Distributed computing using mobile agents
The Internet (World Wide Web)
An Intranet: a network of computers and workstations within an organization, segregated from the Internet via a protective device (a firewall).
JAVA Remote Method Invocation (RMI)
DISTRIBUTED COMPUTING USING MOBILE AGENTS
Mobile agents can wander around a network, using free resources for their own computations.
TYPICAL DISTRIBUTED COMPUTING
[Figure: a typical distributed computing setup: desktop computers on an intranet, connected through an ISP, backbone, and satellite links to servers]
A TYPICAL INTRANET
[Figure: a typical intranet: desktop computers, a file server, a Web server, email servers, and print and other servers on a local area network, connected to the rest of the Internet through a router/firewall]
INTERNET
The Internet is a global system of interconnected computer networks that use the standardized Internet Protocol Suite (TCP/IP).
JAVA RMI
Embedded in the Java language: an object variant of remote procedure call.
Adds naming compared with RPC (Remote Procedure Call).
Restricted to Java environments.
TRANSPARENCY IN DISTRIBUTED SYSTEMS
Access transparency: enables local and remote resources to be accessed using identical operations.
Location transparency: enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).
Concurrency transparency: enables several processes to operate concurrently using shared resources without interference between them.
Replication transparency: enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.
Failure transparency: enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
Mobility transparency: allows the movement of resources and clients within a system without affecting the operation of users or programs.
Performance transparency: allows the system to be reconfigured to improve performance as loads vary.
Scaling transparency: allows the system and applications to expand in scale without change to the system structure or the application algorithms.
CATEGORIES OF APPLICATIONS IN DISTRIBUTED COMPUTING
Science, Life Sciences, Cryptography, Internet, Financial, Mathematics, Language, Art, Puzzles/Games, Miscellaneous, Distributed Human Projects, Collaborative Knowledge Bases, Charity.
MONOLITHIC MAINFRAME APPLICATION VS DISTRIBUTED APPLICATION
The monolithic mainframe application architecture:
Separate, single-function applications, such as order-entry or billing.
Applications cannot share data or other resources.
Developers must create multiple instances of the same functionality (service).
The distributed application architecture:
Integrated applications.
Applications can share resources.
A single instance of functionality (service) can be reused.
ADVANTAGES OF DISTRIBUTED COMPUTING
Cost: better price/performance as long as everyday hardware is used for the component computers; better use of existing hardware.
Performance: by using the combined processing and storage capacity of many nodes, performance levels can be reached that are out of the scope of centralised machines.
Scalability: resources such as processing and storage capacity can be increased incrementally.
Inherent distribution: some applications, like the Web, are naturally distributed.
Reliability: by having redundant components, the impact of hardware and software faults on users can be reduced.
DISADVANTAGES OF DISTRIBUTED COMPUTING
Multiple points of failure: the failure of one or more participating computers, or of one or more network links, can cause trouble.
Security concerns: in a distributed system there are more opportunities for unauthorized attack.
Software: distributed software is harder to develop than conventional software; hence, it is more expensive.
ISSUES & CHALLENGES
Heterogeneity of components :-
Variety or differences in computer hardware, networks, operating systems, programming languages, and implementations by different developers.
All differences in representation must be dealt with for message exchange to work.
Example: the call for message exchange in UNIX differs from that in Windows.
Openness:-
The system can be extended and re-implemented in various ways. This cannot be achieved unless the specification and documentation are made available to software developers. The main challenge for designers is to tackle the complexity of distributed systems designed by different people.
Transparency:-
Aim: make certain aspects of distribution invisible to the application programmer, so they can focus on the design of their particular application.
Programmers need not be concerned with the locations of resources or the details of how they operate, whether replicated or migrated.
Failures can be presented to application programmers in the form of exceptions, which must be handled.
Security:-
Security for information resources in a distributed system has three components:
a. Confidentiality : protection against disclosure to unauthorized individuals.
b. Integrity : protection against alteration/corruption
c. Availability : protection against interference with the means to access the resources.
The challenge is to send sensitive information over the Internet in a secure manner and to identify a remote user or other agent correctly.
Scalability:
Distributed computing operates at many different scales, ranging from a small intranet to the Internet. A system is scalable if it remains effective when there is a significant increase in the number of resources and users. The challenges are:
a. controlling the cost of physical resources.
b. controlling the performance loss.
c. preventing software resources from running out.
d. avoiding performance bottlenecks.
Failure Handling:
Failures in a distributed system are partial: some components fail while others continue to function. That is why handling failures is difficult.
a. Detecting failures: some failures cannot be detected but may be suspected.
b. Masking failures: hiding failures is not guaranteed in the worst case.
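A common way to cope with failures that can only be suspected, not detected, is a bounded retry with backoff. The sketch below is illustrative: the flaky remote call is simulated in-process, and the attempt count and delays are arbitrary assumptions.

```python
# Retry sketch for partial failure: a remote call may fail while the rest of
# the system keeps working, so the caller wraps it with a bounded retry.
import time

def call_with_retry(operation, attempts=3, delay=0.01):
    """Try 'operation' up to 'attempts' times; re-raise the last failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:   # only retry failures suspected transient
            last_error = exc
            time.sleep(delay * (2 ** attempt))   # exponential backoff
    raise last_error

# Simulated flaky remote call: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("node unreachable")
    return "ok"

print(call_with_retry(flaky))   # prints "ok" on the third attempt
```

Note the limits the slide points out: the retry only masks the failure if it eventually stops, and a node that is merely slow is indistinguishable from one that has failed.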
Concurrency:
Where applications or services process requests concurrently, operations may conflict with one another and produce inconsistent results.
Each resource must be designed to be safe in a concurrent environment.
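Making a resource safe in a concurrent environment usually comes down to guarding its state so that updates cannot interleave. A minimal sketch, assuming a simple counter as the shared resource (the class and thread counts are illustrative, not from the slides):

```python
# Concurrency sketch: a shared resource made safe with a lock so concurrent
# updates do not interleave and produce inconsistent results.
import threading

class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, the read-modify-write of 'value' could interleave
        # across threads and lose updates.
        with self._lock:
            self.value += 1

counter = SafeCounter()
threads = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(1000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)   # prints 8000: no updates lost
```

In a distributed system the same discipline applies, but the lock becomes a distributed coordination mechanism (e.g. a lock service or transactions) rather than an in-process mutex.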
CONCLUSION
The concept of distributed computing is an efficient way to achieve optimization.
Distributed computing is everywhere: intranets, the Internet, and mobile ubiquitous computing (laptops, PDAs, pagers, smart watches, hi-fi systems).
It deals with hardware and software systems that contain more than one processing or storage element and run concurrently.
The main motivating factor is resource sharing, such as files, printers, web pages, or database records.
Grid computing and cloud computing are forms of distributed computing.
REFERENCES
Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and Paradigms, Pearson Prentice Hall, 2nd Edition, 2007.
George Coulouris, Jean Dollimore, and Tim Kindberg, Distributed Systems: Concepts and Design, Addison-Wesley / Pearson Education, 3rd Edition, 2001.
www.inderscience.com/ijcnds