Distributed Computing

Lecture 7: Grid Computing

Tuesday, February 24, 2009

“UNIX was never designed to keep people from doing stupid things, because that policy would also keep them from doing clever things.”

- Doug Gwyn

Aggregation of Computing Power: Clustering

High Performance Computing (HPC) environments created using workstations interconnected via high-speed networks.

Desirables in clustering
- Scalable solutions.
- Readily available environment for research into parallel computing.
- Unused computing cycles can be scavenged, providing inexpensive additional computing capacity.
- Commodity microprocessor-based systems offer enormous cost benefits.
- Robust, stable first-generation software is available.

Disadvantages of Clustering
- A cluster is a dedicated facility built at a single location.
- Financial, political and technical constraints place limits on the size of clusters.
- Clusters generally fall outside the financial limits of individual research groups.

The Grid Problem

Flexible, secure, coordinated resource sharing among dynamic collections of individuals and institutions.

Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
- central location
- central control
- omniscience
- existing trust relationships

Diff. P2P Vs Grid

The Grid Problem (contd.)

Infrastructure? Framework? Platform? Architecture? Infrastructure and failure.

The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem solving and resource brokering strategies emerging in industry, science and engineering.

This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs.

A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).
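The idea that a VO is defined by its sharing rules can be made concrete with a small sketch. The class names, the single CPU-hours condition, and the member and resource names below are all hypothetical, chosen only to illustrate "what is shared, who is allowed to share, and the conditions under which sharing occurs"; real grid middleware expresses such policies in far richer ways.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharingRule:
    """One provider-defined rule (hypothetical model, not a real grid API)."""
    resource: str               # what is shared, e.g. "cluster-a/cpu"
    allowed_members: frozenset  # who is allowed to share it
    max_cpu_hours: float        # a condition under which sharing occurs


@dataclass
class VirtualOrganization:
    """A VO modelled as a name plus the sharing rules that define it."""
    name: str
    rules: list = field(default_factory=list)

    def may_use(self, member: str, resource: str, cpu_hours: float) -> bool:
        # Grant a request only if some rule covers the resource, names the
        # member, and the requested amount satisfies the rule's condition.
        return any(
            rule.resource == resource
            and member in rule.allowed_members
            and cpu_hours <= rule.max_cpu_hours
            for rule in self.rules
        )


vo = VirtualOrganization("climate-vo")
vo.rules.append(SharingRule("cluster-a/cpu", frozenset({"alice", "bob"}), 100.0))

print(vo.may_use("alice", "cluster-a/cpu", 40.0))   # True: member, within limit
print(vo.may_use("carol", "cluster-a/cpu", 40.0))   # False: not a listed member
print(vo.may_use("alice", "cluster-a/cpu", 500.0))  # False: exceeds the condition
```

Denying by default (no matching rule means no access) reflects the "highly controlled" nature of the sharing described above.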

Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, …
  - Sharing always conditional: issues of trust, policy, negotiation, payment, …
- Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, …
- Dynamic, multi-institutional virtual orgs
  - Community overlays on classic org structures
  - Large or small, static or dynamic

Broader Context
- “Grid Computing” has much in common with major industrial thrusts
  - Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing…
- Sharing issues not adequately addressed by existing technologies
  - Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
  - High performance: unique demands of advanced & high-performance systems

Common Characteristics of Grids
- Grids are large
  - Both in terms of the number of potentially available resources and their geographical dispersion.
- Grids are distributed
  - Latencies involved in moving data between resources are substantial and may dominate applications.
- Grids are dynamic
  - Available resources change on the same time scale as the life span of a typical application.

Common Characteristics of Grids (Contd.)
- Grids are heterogeneous
  - Form and properties of sites differ in significant ways.
- Grids cross the boundaries of human organizations
  - Policies for access to and use of resources differ at different sites.

Typical Issues in Grids
- Resource discovery
- Execution planning
- Authentication and security
- Heterogeneity of compute servers and data formats
- Pricing

Grid Architecture

Application

Languages and Frameworks

Collective APIs and SDKs

Collective Services

Resource APIs and SDK

Resource Services

Connectivity APIs and SDKs

Connectivity Services

Fabric

Grid Services Architecture

[Figure: layered diagram. Labels recovered from the figure, grouped by layer where the grouping is clear.]
- Applications: High Energy Physics Data Analysis, Climate Studies, Collab Engineering, Online Instrumentation
- Application Toolkits: Data Intensive, Remote Visualization, Collab Design, High Throughput, Remote Control
- Grid Services: Security, Data Management, Fault Detection, Information Services, Resource Management
- Grid Fabric: Data Transport, Schedulers, Control Interfaces, Operating Systems
- Other labels in the figure: Portability, Instrumentation, QoS Services, Message Passing

Fabric
- Provides resources to which shared access is mediated by grid protocols.
- Fabric components implement the resource-specific operations that occur on specific resources as a result of sharing operations at higher layers. [LSF]
- At a minimum, resources should implement:
  - Enquiry mechanisms: permit discovery of structure, state and capabilities of resources.
  - Resource management mechanisms: provide control of delivered quality of service.
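The two minimum fabric mechanisms can be pictured as a two-method interface. This is only a sketch: `FabricResource`, `ComputeNode`, and the CPU-share model of quality of service are hypothetical names and simplifications introduced for illustration, not part of any grid toolkit.

```python
from abc import ABC, abstractmethod


class FabricResource(ABC):
    """Hypothetical minimum interface a fabric resource exposes to higher layers."""

    @abstractmethod
    def enquire(self) -> dict:
        """Enquiry mechanism: report structure, state and capabilities."""

    @abstractmethod
    def reserve(self, cpu_share: float) -> bool:
        """Resource management mechanism: control the delivered quality of service."""


class ComputeNode(FabricResource):
    def __init__(self, name: str, cores: int):
        self.name = name
        self.cores = cores
        self.reserved_share = 0.0  # fraction of the node already promised

    def enquire(self) -> dict:
        return {
            "name": self.name,
            "cores": self.cores,
            "free_share": 1.0 - self.reserved_share,
        }

    def reserve(self, cpu_share: float) -> bool:
        # Refuse requests that would promise more than the whole node.
        if cpu_share <= 0 or self.reserved_share + cpu_share > 1.0:
            return False
        self.reserved_share += cpu_share
        return True


node = ComputeNode("node-1", cores=8)
first = node.reserve(0.5)    # accepted: half the node is now reserved
second = node.reserve(0.75)  # refused: would exceed the node's capacity
print(first, second, node.enquire())
```

Higher layers would call `enquire` for discovery and `reserve` when negotiating access, without needing to know what kind of resource sits behind the interface.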

Connectivity Layer
- Defines core communication and authentication protocols required for Grid-specific network transactions.
- Communication requirements include transport, routing and naming.
- Authentication protocols are built on communication protocols to provide cryptographically secure mechanisms for verifying the identity of users and resources.

Resource Layer

Builds on connectivity layer protocols to define protocols, APIs and SDKs for the secure negotiation, initiation, monitoring, control, accounting and payment of sharing operations on individual resources.

Resource layer protocols call Fabric layer functions to access and control resources.

Resource layer protocols are chosen to capture the fundamental mechanisms of sharing across many different resource types without constraining the type and performance of higher protocols.

Resource Layer (Contd.)

Primary classes of resource layer protocols
- Information protocols
  - Used to obtain information about the structure and state of a resource, e.g., configuration, current load, usage policy, etc.
- Management protocols
  - Used to negotiate access to a shared resource.
  - Parameters typically specified: resource requirements, operations to be performed.
  - Requested protocol operations should be consistent with the policy under which the resource is shared.
  - Accounting and payment are typical issues.

Collective Layer
- Contains protocols and services (and APIs and SDKs) that are not associated with any one specific resource.
- These protocols are global in nature and capture interactions across collections of resources.
- They can implement a wide variety of sharing behaviours without placing new requirements on the resources being shared.

Collective Layer (Contd.)

Typical services
- Directory services
- Co-allocation, scheduling and brokering services
- Monitoring and diagnostic services
- Data replication services
- Grid-enabled programming systems
- Workload management systems and collaboration frameworks
- Software discovery services
- Community authorization services
- Community accounting and payment services
- Collaboratory services
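As one illustration of a collective service, co-allocation can be sketched as an all-or-nothing reservation across several sites. The function name, the dictionary-based bookkeeping, and the site names are assumptions made for this example; a real co-allocator would also have to handle concurrent requests and failures between the two phases.

```python
def co_allocate(free_cpus: dict, request: dict) -> bool:
    """Reserve CPUs at every requested site, or at none of them.

    free_cpus maps site name -> CPUs currently free (updated in place).
    request maps site name -> CPUs wanted at that site.
    """
    # Phase 1: check that every site can satisfy its part of the request.
    for site, needed in request.items():
        if free_cpus.get(site, 0) < needed:
            return False  # nothing has been reserved yet, so nothing to undo
    # Phase 2: commit the reservation at all sites.
    for site, needed in request.items():
        free_cpus[site] -= needed
    return True


free = {"site-a": 8, "site-b": 4}
ok = co_allocate(free, {"site-a": 4, "site-b": 4})       # both sites have room
rejected = co_allocate(free, {"site-a": 2, "site-b": 1})  # site-b is now full
print(ok, rejected, free)  # True False {'site-a': 4, 'site-b': 0}
```

Note that the service only uses per-resource information and reservation operations, which is why it places no new requirements on the resources themselves.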

Collective Layer (Contd.)

Unlike Resource layer protocols, Collective layer protocols can vary from very general to highly application-specific.

Collective functions can be implemented as persistent services with associated protocols and SDKs designed to be linked with applications.

For large user communities, Collective layer protocols need to be standards based.

Application Layer
- User applications.
- Constructed in terms of, and by calling upon, services defined at any layer.
- At each layer, well-defined protocols provide access to some useful resources.
- At each layer, APIs may also be defined whose implementations exchange protocol messages with the appropriate services to perform desired actions.
- The application layer, in practice, uses sophisticated frameworks and libraries defining protocols, services and APIs.

Application Layer (Contd.)

Application

Languages and Frameworks

Collective APIs and SDKs

Collective Services

Resource APIs and SDK

Resource Services

Connectivity APIs and SDKs

Connectivity Services

Fabric

Example Grids
- GridLab Testbed
  - Ten thousand machines in Europe for developers of Grid tools
- SC2001 ARG Testbed & Global Grid Testbed Collaboration
  - Hastily assembled loose federation of world machines for SC2001 and SC2002 demonstrations
- NCSA Virtual Machine Room and PACI Grid
  - Production resources
- TeraGrid (www.teragrid.org)
  - USA distributed terascale facility at 4 sites for open scientific research
- Information Power Grid (www.ipg.nasa.gov)
  - NASA's high performance computing grid

Mobile Grid

Computer Science Major Area Examination

Ye Wen

Introduction: Grid Computing
- The Grid Computing Problem
  - Coordinated resource sharing and problem solving in a dynamic, heterogeneous environment.
- Characteristics of current Grid systems
  - Large-scale
  - Heterogeneous
  - Dynamic resource sharing relationship
- Pros and Cons
  - Pros: large-scale, heterogeneity, flexibility
  - Cons: static availability of resources, infrequent change

Introduction: Mobile Computing
- What is mobile computing about?
  - Build a distributed system for a network in which mobile devices and static hosts are connected via wireless links.
- Characteristics of mobile computing
  - Versatile communication (no wire constraints)
  - Ubiquitous computation
  - Flexible usability
- Pros and Cons
  - Pros: ubiquity, availability, productivity
  - Cons: constraints of wireless network
    - Unpredictable network quality
    - Lowered trust and robustness
    - Limited local resources and battery lifetime for mobile devices

Mobile Grid: Grid in mobile environment

Mobile Grid: sharing both advantages
- Powerful computation capability of Grid system
- Ubiquitous and flexible availability of mobile system

A scenario: forest fire

[Figure: forest fire scenario. Labels: Firemen, Computation center, History databases, Geographic databases, Fire simulation, Weather forecast, Wireless links.]

Other scenarios: scientific application, commercial business

Exploring Mobile Grid (Outline)
- Overview of Grid: Grid architecture
- Performance: scheduling scheme, scheduling algorithm
- Energy awareness: dynamic power management, computation offloading
- Adaptation: disconnected operation, adaptive application
- Security: mobile authentication
- Address mobility and location-independent naming: mobile IP, ad hoc protocols
- Distributed, reliable and scalable storage: peer-to-peer resource routing

Scheduling: Application Level Scheduling
- Goal of scheduling: maximize application performance.
- Application Level Scheduling (AppLeS)
  - An application-specific approach to building schedulers for parallel applications on heterogeneous systems.
  - Comprehensive system and application information
    - Static information: user-specified application parameters, application performance model
    - Dynamic information: Network Weather Service
    - Performance prediction: Network Weather Service
    - Experience the system from the application's point of view
  - Run-time scheduling: information is applied to the application model to estimate application performance and choose an optimal resource allocation from a set of viable configurations.
  - Goodness: accurate

F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, "Application-Level Scheduling on Distributed Heterogeneous Networks." In Proceedings of Supercomputing 96, Pittsburgh, PA, Nov. 1996.
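The run-time scheduling step described on this slide (apply current information to a performance model, then pick the best of the viable configurations) might look roughly like the following. This is not the AppLeS implementation: the performance model (work divided by aggregate CPU rate, plus data size divided by bandwidth), the forecast dictionary standing in for Network Weather Service predictions, and all host and link names are assumptions made for illustration.

```python
def predicted_time(work_units: float, config: dict, forecast: dict) -> float:
    """Hypothetical performance model: compute time plus data transfer time."""
    # Aggregate predicted CPU rate (work units per second) over the chosen hosts.
    cpu_rate = sum(forecast["cpu_rate"][host] for host in config["hosts"])
    compute_s = work_units / cpu_rate
    # Time to move the input data over the link this configuration uses.
    transfer_s = config["data_mb"] / forecast["bandwidth_mb_s"][config["link"]]
    return compute_s + transfer_s


def choose_configuration(work_units: float, configs: list, forecast: dict) -> dict:
    """Pick the viable configuration with the lowest predicted execution time."""
    return min(configs, key=lambda c: predicted_time(work_units, c, forecast))


# Stand-in for dynamic forecasts (made-up numbers, not real measurements).
forecast = {
    "cpu_rate": {"h1": 10.0, "h2": 10.0, "h3": 40.0},
    "bandwidth_mb_s": {"lan": 100.0, "wan": 5.0},
}
configs = [
    {"name": "local-pair", "hosts": ["h1", "h2"], "link": "lan", "data_mb": 200.0},
    {"name": "remote-fast", "hosts": ["h3"], "link": "wan", "data_mb": 200.0},
]

best = choose_configuration(1000.0, configs, forecast)
print(best["name"], predicted_time(1000.0, best, forecast))  # local-pair 52.0
```

In this made-up case the faster remote host loses (65 s predicted) because moving the data over the slow link outweighs its compute advantage, which is the kind of trade-off an application-level model is meant to expose.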

Energy saving
- “Energy crisis” of mobile devices
  - Performance also concerns energy
- Energy consumption estimation
  - Simulation: SimplePower, Wattch
  - Empirical methods
- Ways to save energy
  - Dynamic power management (DPM) policies: tradeoff between energy and performance
    - Spin down disks
    - Turn off screen
    - Network interface hibernation
    - Processor voltage scaling
    - Comprehensive stochastic model
  - Computation offloading
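The energy/performance tradeoff behind DPM policies such as spinning down a disk can be illustrated with a fixed-timeout policy and its break-even time. The model below is a common textbook simplification and an assumption of this sketch: it charges a single lump of transition energy for sleeping and waking, ignores transition latency, and uses made-up power figures.

```python
def break_even_time(p_idle_w: float, p_sleep_w: float, e_transition_j: float) -> float:
    """Idle duration (s) beyond which sleeping saves energy in this simple model."""
    return e_transition_j / (p_idle_w - p_sleep_w)


def idle_energy(idle_s: float, timeout_s: float,
                p_idle_w: float, p_sleep_w: float, e_transition_j: float) -> float:
    """Energy (J) spent during one idle period under a fixed-timeout policy."""
    if idle_s <= timeout_s:
        # The device never went to sleep: it idled at full idle power.
        return p_idle_w * idle_s
    # Idle until the timeout fires, pay the transition cost, then sleep.
    return p_idle_w * timeout_s + e_transition_j + p_sleep_w * (idle_s - timeout_s)


# Made-up figures for a small disk: 2 W idle, 0.5 W asleep, 6 J per sleep/wake cycle.
P_IDLE, P_SLEEP, E_TRANS = 2.0, 0.5, 6.0

t_be = break_even_time(P_IDLE, P_SLEEP, E_TRANS)
long_idle = idle_energy(10.0, t_be, P_IDLE, P_SLEEP, E_TRANS)
short_idle = idle_energy(3.0, t_be, P_IDLE, P_SLEEP, E_TRANS)
print(t_be, long_idle, short_idle)  # 4.0 17.0 6.0
```

For the 10 s idle period the policy uses 17 J instead of the 20 J of staying idle; for the 3 s period it never sleeps, so it avoids paying a transition cost it could not recover.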

Computation offloading
- Scheduling in terms of energy
  - Offloading can reduce computation, but communication also consumes energy
  - Optimize energy consumption by offloading part of the computation
- Model a program
  - Task definition: each call site (statically); each invocation (dynamically)
  - Cost graph
    - Relationship between tasks and data
    - Node weight indicating power consumption of computation and communication
    - Edge weight indicating mean number of times tasks access data
- Aggregate the consumption from the cost graph and optimize

Zhiyuan Li, Cheng Wang, Rong Xu, "Computation offloading to save energy on handheld devices: a partition scheme." In Proceedings of the international conference on compilers, architecture, and synthesis for embedded systems, Atlanta, Georgia, USA, 2001.
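A toy version of the partitioning idea is sketched below. It is a deliberate simplification and not the scheme from the cited paper: tasks are connected directly by communication edges rather than through data nodes, the search is brute force over all subsets, and the task names and energy figures are invented.

```python
from itertools import combinations


def total_energy(offloaded, local_j, wait_j, comm_j):
    """Handheld energy (J) for a given set of offloaded tasks.

    local_j: energy to run each task on the handheld.
    wait_j:  energy the handheld spends waiting if the task runs remotely.
    comm_j:  energy to communicate between two tasks placed on different sides.
    """
    energy = 0.0
    for task in local_j:
        energy += wait_j[task] if task in offloaded else local_j[task]
    for (a, b), cost in comm_j.items():
        if (a in offloaded) != (b in offloaded):  # edge crosses the partition
            energy += cost
    return energy


def best_partition(local_j, wait_j, comm_j):
    """Exhaustively try every subset of tasks (only feasible for small graphs)."""
    tasks = sorted(local_j)
    candidates = (
        frozenset(subset)
        for size in range(len(tasks) + 1)
        for subset in combinations(tasks, size)
    )
    return min(candidates, key=lambda s: total_energy(s, local_j, wait_j, comm_j))


local_j = {"ui": 1.0, "decode": 8.0, "filter": 6.0}
wait_j = {"ui": 4.0, "decode": 1.0, "filter": 1.0}
comm_j = {("ui", "decode"): 2.0, ("decode", "filter"): 5.0}

best = best_partition(local_j, wait_j, comm_j)
print(sorted(best), total_energy(best, local_j, wait_j, comm_j))  # ['decode', 'filter'] 5.0
```

With these numbers, offloading only `decode` costs 15 J, no better than running everything locally, because the communication across the cut cancels the savings. Offloading `decode` and `filter` together keeps their heavy exchange on the server side and brings the total down to 5 J.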

Disconnected operation
- Another factor that affects performance: unpredictable network link quality
- Solution: adaptation [application level adaptation]
- Disconnected operation in Coda file system
  - Definition: a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository.
  - Solution: proxy + cache
  - Venus: client-side proxy
  - Three working states: Hoarding, Emulation, Reintegration

James J. Kistler, M. Satyanarayanan, "Disconnected Operation in the Coda File System." ACM Transactions on Computer Systems, Feb. 1992, Vol. 10, No. 1, pp. 3-25.

[Figure: Venus state diagram. Hoarding to Emulation on disconnection; Emulation to Reintegration on physical reconnection; Reintegration to Hoarding on logical reconnection.]
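The three Venus states and the transitions named in the figure (disconnection, physical reconnection, logical reconnection) can be written down as a small state machine. This is an illustrative sketch only: the event strings are hypothetical labels, and ignoring events that do not apply in the current state is an assumption of this example rather than documented Coda behaviour.

```python
from enum import Enum


class VenusState(Enum):
    HOARDING = "hoarding"            # connected: cache useful data in advance
    EMULATION = "emulation"          # disconnected: serve requests from the cache
    REINTEGRATION = "reintegration"  # reconnected: propagate changes to servers


# (current state, event) -> next state, following the cycle in the figure.
TRANSITIONS = {
    (VenusState.HOARDING, "disconnection"): VenusState.EMULATION,
    (VenusState.EMULATION, "physical_reconnection"): VenusState.REINTEGRATION,
    (VenusState.REINTEGRATION, "logical_reconnection"): VenusState.HOARDING,
}


def next_state(state: VenusState, event: str) -> VenusState:
    # Assumption: an event that is not valid in the current state is ignored.
    return TRANSITIONS.get((state, event), state)


state = VenusState.HOARDING
history = [state]
for event in ["disconnection", "physical_reconnection", "logical_reconnection"]:
    state = next_state(state, event)
    history.append(state)

print([s.value for s in history])
# ['hoarding', 'emulation', 'reintegration', 'hoarding']
```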

Mobile security
- Difficulties of security in wireless mobile environment
  - Inherent vulnerability of wireless media
  - Performance impact!
- Charon: indirect authentication using Kerberos
  - Extend Kerberos by inserting a remote proxy (again!!) between client and other servers
  - Secure channel is built by first granting the proxy service to client
  - Proxy interacts with other servers on client’s behalf
  - Client can be very small: only needs DES encryption/decryption
  - No compromise of security:
    - The communication between client and proxy is encrypted
    - Proxy believes the identity of the user
    - Proxy does not possess client’s session key and private key

Armando Fox, Steven D. Gribble, "Security on the move: indirect authentication using Kerberos." In Proceedings of the second annual international conference on Mobile computing and networking (MobiCom'96), Rye, New York, United States, 1996.

Conclusions
- Incorporating mobility into Grid architecture is necessary and beneficial.
- Problems arise since the meaning of performance is extended
  - Computational performance: scheduling
  - Energy: power management and offloading
  - Unstable network: adaptation
  - Security
  - Addressing and naming
  - Scalability & reliability
- A lot can be borrowed from other research areas, but they should be put into a real Mobile Grid framework for inspection.
- Future focus: comprehensive scheduling

Recommended Reading

Anatomy of the Grid

Physiology of Grid
