Distributed Computing
1
Distributed Computing
Lecture 7: Grid Computing
2
Tuesday, February 24, 2009
“UNIX was never designed to keep people from doing stupid things, because that policy would also
keep them from doing clever things.”
- Doug Gwyn
3
Aggregation of Computing Power: Clustering
High Performance Computing (HPC) environments created using workstations interconnected via high speed networks.
Desirables in clustering
  Scalable solutions.
  Readily available environment for research into parallel computing.
  Unused computing cycles can be scavenged, providing inexpensive additional computing capacity.
  Commodity microprocessor-based systems offer enormous cost benefits.
  Robust/stable first-generation software available.
4
Disadvantages of Clustering
A cluster is a dedicated facility built at a single location.
Financial, political and technical constraints place limits on the size of clusters.
Large clusters generally fall outside the financial limits of individual research groups.
5
The Grid Problem
Flexible, secure, coordinated resource sharing among dynamic collections of individuals and institutions
Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of… central location, central control, omniscience, existing trust relationships.
Difference between P2P and Grid
6
The Grid Problem (contd.)
Infrastructure? Framework? Platform? Architecture?
Infrastructure and failure
The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem solving and resource brokering strategies emerging in industry, science and engineering.
This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs.
A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).
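The sharing rules that define a VO can be pictured as data: what is shared, who may share it, and under which conditions. The sketch below is illustrative only. The names `SharingRule`, `may_use` and the CPU-hour limit are assumptions made for this example and do not come from any Grid toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharingRule:
    """Hypothetical rule: what is shared, who may use it, and under what condition."""
    resource: str                 # what is shared
    allowed_members: frozenset    # who is allowed to share
    max_cpu_hours: float          # condition under which sharing occurs


def may_use(rules, member, resource, cpu_hours):
    """Return True if at least one rule permits this request."""
    return any(
        rule.resource == resource
        and member in rule.allowed_members
        and cpu_hours <= rule.max_cpu_hours
        for rule in rules
    )


# The members named in the rules for a set of resources make up the VO.
rules = [SharingRule("cluster-a", frozenset({"alice", "bob"}), 100.0)]
```

With these rules, a request by "alice" for 50 CPU hours on "cluster-a" is allowed, while a request by a non-member or one above the limit is refused.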
7
Elements of the Problem
Resource sharing
  Computers, storage, sensors, networks, …
  Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
  Beyond client-server: distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual orgs
  Community overlays on classic org structures
  Large or small, static or dynamic
8
Broader Context
“Grid Computing” has much in common with major industrial thrusts
  Business-to-business, Peer-to-peer, Application Service Providers, Storage Service Providers, Distributed Computing, Internet Computing…
Sharing issues not adequately addressed by existing technologies
  Complicated requirements: “run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
  High performance: unique demands of advanced & high-performance systems
9
Common Characteristics of Grids
Grids are large
  Both in terms of the number of potentially available resources and their geographical dispersion.
Grids are distributed
  Latencies involved in moving data between resources are substantial and may dominate applications.
Grids are dynamic
  Available resources change on the same time scale as the life span of a typical application.
10
Common Characteristics of Grids (Contd.)
Grids are heterogeneous
  Form and properties of sites differ in significant ways.
Grids cross the boundaries of human organizations
  Policies for access to and use of resources differ at different sites.
11
Typical Issues in Grids
  Resource discovery
  Execution planning
  Authentication and security
  Heterogeneity of compute servers and data formats
  Pricing
12
Grid Architecture
Application
Languages and Frameworks
Collective APIs and SDKs
Collective Services
Resource APIs and SDKs
Resource Services
Connectivity APIs and SDKs
Connectivity Services
Fabric
Grid Services Architecture
[Figure: layered diagram; recoverable labels listed by layer]
Applications: High Energy Physics Data Analysis, Climate Studies, Collaborative Engineering, Online Instrumentation
Application Toolkits: Data Intensive, Remote Visualization, Collaborative Design, High Throughput, Remote Control
Grid Services: Security, Data Management, Fault Detection, Information Services, Resource Management
Grid Fabric: Data Transport, Schedulers, Control Interfaces, Operating Systems
Other labels in the figure: Portability, Instrumentation, QoS Services, Message Passing
14
Fabric
Provides resources to which shared access is mediated by grid protocols.
Fabric components implement the resource-specific operations that occur on specific resources as a result of sharing operations at higher layers. [LSF]
At a minimum, resources should implement:
  Enquiry mechanisms
    Permit discovery of structure, state and capabilities of resources.
  Resource management mechanisms
    Provide control of delivered quality of service.
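The two minimum mechanisms can be pictured as a small interface on a fabric resource. This is a minimal sketch under stated assumptions: `ComputeResource`, `enquire`, `reserve` and `release` are hypothetical names chosen for illustration, and CPU count stands in for "delivered quality of service".

```python
class ComputeResource:
    """Hypothetical fabric-level resource exposing the two minimum mechanisms."""

    def __init__(self, name, total_cpus):
        self.name = name
        self.total_cpus = total_cpus
        self.free_cpus = total_cpus

    def enquire(self):
        """Enquiry mechanism: report structure, state and capabilities."""
        return {
            "name": self.name,
            "total_cpus": self.total_cpus,
            "free_cpus": self.free_cpus,
        }

    def reserve(self, cpus):
        """Resource management mechanism: grant a reservation only if it can be honoured."""
        if cpus <= 0 or cpus > self.free_cpus:
            return False
        self.free_cpus -= cpus
        return True

    def release(self, cpus):
        """Return previously reserved CPUs, never exceeding the total."""
        self.free_cpus = min(self.total_cpus, self.free_cpus + cpus)
```

Higher layers would call `enquire` during resource discovery and `reserve` when negotiating access. The resource stays in control of what it promises.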
15
Connectivity Layer
Defines core communication and authentication protocols required for Grid-specific network transactions.
Communication requirements include transport, routing and naming.
Authentication protocols are built on communication protocols to provide cryptographically secure mechanisms for verifying the identity of users and resources.
16
Resource Layer
Builds on connectivity layer protocols to define protocols, APIs and SDKs for the secure negotiation, initiation, monitoring, control, accounting and payment of sharing operations on individual resources.
Resource layer protocols call Fabric layer functions to access and control resources.
Resource layer protocols are chosen to capture the fundamental mechanisms of sharing across many different resource types without constraining the type and performance of higher protocols.
17
Resource Layer (Contd.)
Primary classes of resource layer protocols
Information protocols
  Used to obtain information about the structure and state of a resource, e.g., configuration, current load, usage policy, etc.
Management protocols
  Used to negotiate access to a shared resource.
  Parameters typically specified:
    Resource requirements.
    Operations to be performed.
  Requested protocol operations should be consistent with the policy under which the resource is shared.
  Accounting and payment are typical issues.
18
Collective Layer
Contains protocols and services (and APIs and SDKs) that are not associated with any one specific resource.
These protocols are global in nature and capture interactions across collections of resources.
They can implement a wide variety of sharing behaviours without placing new requirements on the resources being shared.
19
Collective Layer (Contd.)
Typical services
  Directory services
  Co-allocation, scheduling and brokering services
  Monitoring and diagnostic services
  Data replication services
  Grid enabled programming systems
  Workload management systems and collaboration frameworks
  Software discovery services
  Community authorization services
  Community accounting and payment services
  Collaboratory services
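As an illustration of a collective service, co-allocation can be sketched as an all-or-nothing request spread across several sites. This is a toy greedy strategy under assumed inputs, not a real broker. The function name `co_allocate` and the `free_cpus` dictionary are hypothetical.

```python
def co_allocate(free_cpus, needed):
    """Try to satisfy `needed` CPUs across sites, all or nothing.

    free_cpus maps a site name to its free CPU count and is updated in
    place only when the whole request can be satisfied.
    Returns a plan {site: cpus} or None.
    """
    plan = {}
    remaining = needed
    # Greedy choice: take from the sites with the most free CPUs first.
    for site, free in sorted(free_cpus.items(), key=lambda kv: kv[1], reverse=True):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            plan[site] = take
            remaining -= take
    if remaining > 0:
        return None  # request cannot be met; nothing is reserved
    for site, take in plan.items():
        free_cpus[site] -= take
    return plan


free = {"siteA": 4, "siteB": 8, "siteC": 2}
plan = co_allocate(free, 10)
```

The service touches no single resource's internals. It only combines what each site reports and accepts, which is why collective services place no new requirements on the resources being shared.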
20
Collective Layer (Contd.)
Unlike Resource layer protocols, Collective layer protocols can vary from being very general to highly application specific.
Collective functions can be implemented as persistent services with associated protocols and SDKs designed to be linked with applications.
For large user communities, Collective layer protocols need to be standards based.
21
Application Layer
User applications.
Constructed in terms of, and by calling upon, services defined at any layer.
At each layer, well-defined protocols provide access to some useful resources.
At each layer, APIs may also be defined whose implementations exchange protocol messages with appropriate services to perform desired actions.
The application layer, in practice, uses sophisticated frameworks and libraries defining protocols, services and APIs.
22
Application Layer (Contd.)
Application
Languages and Frameworks
Collective APIs and SDKs
Collective Services
Resource APIs and SDKs
Resource Services
Connectivity APIs and SDKs
Connectivity Services
Fabric
23
Example Grids
GridLab Testbed
  Ten thousand machines in Europe for developers of Grid tools
SC2001 ARG Testbed & Global Grid Testbed Collaboration
  Hastily assembled loose federation of world machines for SC2001 and SC2002 demonstrations
NCSA Virtual Machine Room and PACI Grid
  Production resources
TeraGrid (www.teragrid.org)
  USA distributed terascale facility at 4 sites for open scientific research
Information Power Grid (www.ipg.nasa.gov)
  NASA's high performance computing grid
24
Mobile Grid
Computer Science Major Area Examination
Ye Wen
26
Introduction: Grid Computing
The Grid Computing Problem
  Coordinated resource sharing and problem solving in dynamic, heterogeneous environments.
Characteristics of current Grid systems
  Large-scale
  Heterogeneous
  Dynamic resource sharing relationships
Pros and Cons
  Pros: large-scale, heterogeneity, flexibility
  Cons: static availability of resources, infrequent change
27
Introduction: Mobile Computing
What is mobile computing about?
  Building a distributed system for a network in which mobile devices and static hosts are connected via wireless links.
Characteristics of mobile computing
  Versatile communication (no wire constraints)
  Ubiquitous computation
  Flexible usability
Pros and Cons
  Pros: ubiquity, availability, productivity
  Cons: constraints of wireless networks
    Unpredictable network quality
    Lowered trust and robustness
    Limited local resources and battery lifetime for mobile devices
28
Mobile Grid: Grid in mobile environment
Mobile Grid: sharing both advantages
  Powerful computation capability of the Grid system
  Ubiquitous and flexible availability of the mobile system
A scenario: forest fire
  [Figure: firemen in the field connect over wireless links to a computation center that draws on history databases, geographic databases, fire simulation and weather forecast.]
Other scenarios: scientific application, commercial business
29
Exploring Mobile Grid (Outline)
Overview of Grid
  Grid architecture
Performance
  Scheduling scheme, scheduling algorithm
Energy awareness
  Dynamic power management, computation offloading
Adaptation
  Disconnected operation, adaptive application
Security
  Mobile authentication
Address mobility and location-independent naming
  Mobile IP, ad hoc protocols
Distributed, reliable and scalable storage
  Peer-to-peer resource routing
30
Scheduling: Application Level Scheduling
Goal of scheduling: maximize application performance.
Application Level Scheduling (AppLeS)
  An application-specific approach to building schedulers for parallel applications on heterogeneous systems.
Comprehensive system and application information
  Static information
    User-specified application parameters
    Application performance model
  Dynamic information: Network Weather Service
  Performance prediction: Network Weather Service
  Experience the system from the application's point of view
Run-time scheduling: information is applied to the application model to estimate application performance and choose an optimal resource allocation from a set of viable configurations.
Goodness: accurate
F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, "Application-Level Scheduling on Distributed Heterogeneous Networks." In Proceedings of Supercomputing 96, Pittsburgh, PA, Nov. 1996.
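The run-time scheduling step can be sketched as: evaluate a performance model on each viable configuration using current forecasts, then keep the configuration with the lowest predicted time. Everything below is a simplified assumption for illustration: the `Forecast` tuple stands in for values a forecasting service would supply, and the model (work split in proportion to predicted speed, plus the slowest data transfer) is a toy, not the model from the AppLeS paper.

```python
from collections import namedtuple

# Hypothetical per-host forecast: compute speed (work units/s) and bandwidth (MB/s).
Forecast = namedtuple("Forecast", ["speed", "bandwidth"])


def predicted_time(hosts, forecasts, work, data_mb):
    """Toy performance model: transfer each host's data share, then compute in parallel."""
    total_speed = sum(forecasts[h].speed for h in hosts)
    compute = work / total_speed
    # Each host receives data in proportion to its share of the work.
    transfer = max(
        data_mb * (forecasts[h].speed / total_speed) / forecasts[h].bandwidth
        for h in hosts
    )
    return transfer + compute


def choose_configuration(configurations, forecasts, work, data_mb):
    """Pick the viable configuration with the lowest predicted completion time."""
    return min(
        configurations,
        key=lambda hosts: predicted_time(hosts, forecasts, work, data_mb),
    )


forecasts = {
    "a": Forecast(speed=100.0, bandwidth=10.0),
    "b": Forecast(speed=100.0, bandwidth=10.0),
    "c": Forecast(speed=50.0, bandwidth=0.5),   # fast enough CPU, poor link
}
configurations = [("a",), ("a", "b"), ("a", "b", "c")]
best = choose_configuration(configurations, forecasts, work=1000.0, data_mb=100.0)
```

In this example, adding host "c" makes the prediction worse because its slow link dominates. This is the kind of application-specific effect that a scheduler looking only at CPU counts would miss.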
31
Energy saving
“Energy crisis” of mobile devices
  Performance also concerns energy
Energy consumption estimation
  Simulation: SimplePower, Wattch
  Empirical methods
Ways to save energy
  Dynamic power management (DPM) policies: tradeoff between energy and performance
    Spin down disks
    Turn off screen
    Network interface hibernation
    Processor voltage scaling
    Comprehensive stochastic model
  Computation offloading
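The energy/performance tradeoff behind a DPM policy can be illustrated with a break-even calculation: putting a component (disk, screen, network interface) to sleep pays off only if the idle period is long enough to recover the cost of the transition. The two-state model and all parameter names below are assumptions for illustration. Staying idle for T seconds costs p_idle * T, while sleeping costs e_transition (spent during t_transition seconds of going down and coming back) plus p_sleep * (T - t_transition).

```python
def break_even_time(p_idle, p_sleep, e_transition, t_transition):
    """Shortest idle period (seconds) for which sleeping saves energy.

    Solves p_idle * T = e_transition + p_sleep * (T - t_transition) for T.
    The result is never shorter than the transition itself.
    """
    if p_idle <= p_sleep:
        raise ValueError("sleep state must draw less power than idle state")
    t = (e_transition - p_sleep * t_transition) / (p_idle - p_sleep)
    return max(t, t_transition)


def should_sleep(predicted_idle, p_idle, p_sleep, e_transition, t_transition):
    """Policy decision: sleep only if the predicted idle period exceeds break-even."""
    return predicted_idle > break_even_time(p_idle, p_sleep, e_transition, t_transition)
```

For example, with 1 W idle power, 0 W sleep power, 3 J transition energy and a 1 s transition, the break-even time is 3 s, so a predicted 5 s idle period justifies sleeping and a 2 s one does not. The performance cost is the wake-up delay, which this sketch does not model.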
32
Computation offloading
Scheduling in terms of energy:
  Offloading can reduce computation, but communication also consumes energy
  Optimize energy consumption by offloading part of the computation
Model a program
  Task definition: each call site (statically); each invocation (dynamically)
  Cost graph
    Relationship between tasks and data
    Node weight indicating power consumption of computation and communication
    Edge weight indicating the mean number of times tasks access data
Aggregate the consumption from the cost graph and optimize
Zhiyuan Li, Cheng Wang, Rong Xu, "Computation offloading to save energy on handheld devices: a partition scheme." In Proceedings of the international conference on compilers, architecture, and synthesis for embedded systems, Atlanta, Georgia, USA, 2001.
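The idea of partitioning a program to minimize handheld energy can be sketched with a much simpler cost graph than the one in the paper. Assumptions for this illustration: nodes are tasks with a local energy cost and an idle-wait cost when offloaded, edges carry the energy of moving data between two tasks placed on different sides, and the search is brute force, which is only practical for a handful of tasks. The name `best_partition` and all figures are hypothetical.

```python
from itertools import product


def best_partition(local_energy, idle_energy, transfer_energy, pinned=frozenset()):
    """Exhaustively search task placements and return (energy, offloaded_tasks).

    local_energy[t]     energy if task t runs on the handheld
    idle_energy[t]      energy the handheld spends waiting if t runs remotely
    transfer_energy     {(a, b): energy} paid when a and b are on different sides
    pinned              tasks that must stay on the handheld (e.g. user interface)
    """
    tasks = sorted(local_energy)
    best = None
    for choice in product((False, True), repeat=len(tasks)):
        offloaded = {t for t, off in zip(tasks, choice) if off}
        if offloaded & pinned:
            continue  # placement violates a pinning constraint
        energy = sum(
            idle_energy[t] if t in offloaded else local_energy[t] for t in tasks
        )
        energy += sum(
            cost
            for (a, b), cost in transfer_energy.items()
            if (a in offloaded) != (b in offloaded)
        )
        if best is None or energy < best[0]:
            best = (energy, offloaded)
    return best


energy, offloaded = best_partition(
    local_energy={"ui": 1.0, "decode": 8.0, "filter": 6.0},
    idle_energy={"ui": 0.5, "decode": 1.0, "filter": 1.0},
    transfer_energy={("ui", "decode"): 2.0, ("decode", "filter"): 5.0},
    pinned=frozenset({"ui"}),
)
```

Here offloading "decode" or "filter" alone saves nothing, because the data exchanged between them then crosses the wireless link. Offloading both together keeps that traffic on the server side and is the cheapest placement.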
33
Disconnected operation
Another factor affecting performance: unpredictable network link quality
  Solution: adaptation [application level adaptation]
Disconnected operation in the Coda file system
  Definition: a mode of operation that enables a client to continue accessing critical data during temporary failures of a shared data repository.
  Solution: proxy + cache
    Venus: client-side proxy
  Three working states
    Hoarding
    Emulation
    Reintegration
James J. Kistler, M. Satyanarayanan, "Disconnected Operation in the Coda File System." ACM Transactions on Computer Systems, Feb. 1992, Vol. 10, No. 1, pp. 3-25.
[Figure: Venus state transitions. Hoarding to Emulation on disconnection; Emulation to Reintegration on physical reconnection; Reintegration to Hoarding on logical reconnection.]
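The three working states and their transitions form a small state machine, sketched below. The state and event names follow the slide. The `step` function and the string event names are illustrative choices, not part of Coda.

```python
from enum import Enum


class VenusState(Enum):
    HOARDING = "hoarding"            # connected: cache useful data in anticipation
    EMULATION = "emulation"          # disconnected: serve requests from the cache
    REINTEGRATION = "reintegration"  # reconnected: propagate local changes back


TRANSITIONS = {
    (VenusState.HOARDING, "disconnection"): VenusState.EMULATION,
    (VenusState.EMULATION, "physical_reconnection"): VenusState.REINTEGRATION,
    (VenusState.REINTEGRATION, "logical_reconnection"): VenusState.HOARDING,
}


def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = VenusState.HOARDING
for event in ("disconnection", "physical_reconnection", "logical_reconnection"):
    state = step(state, event)
```

After one full cycle of disconnection and reconnection the client is back in the hoarding state.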
34
Mobile security
Difficulties of security in wireless mobile environment
  Inherent vulnerability of wireless media
  Performance impact!
Charon: indirect authentication using Kerberos
  Extend Kerberos by inserting a remote proxy (again!!) between client and other servers
  Secure channel is built by first granting the proxy service to client
  Proxy interacts with other servers on client’s behalf
  Client can be very small: only need DES encryption/decryption
  No compromise of security:
    The communication between client and proxy is encrypted
    Proxy believes the identity of user
    Proxy does not possess client’s session key and private key
Armando Fox, Steven D. Gribble, "Security on the move: indirect authentication using Kerberos." In Proceedings of the second annual international conference on Mobile computing and networking (MobiCom'96), Rye, New York, United States, 1996.
35
Conclusions
Incorporating mobility into Grid architecture is necessary and beneficial.
Problems arise since the meaning of performance is extended
  Computational performance: scheduling
  Energy: power management and offloading
  Unstable network: adaptation
  Security
  Addressing and naming
  Scalability & Reliability
A lot can be borrowed from other research areas, but they should be put into a real Mobile Grid framework for inspection.
Future focus: comprehensive scheduling
Recommended Reading
36
The Anatomy of the Grid
The Physiology of the Grid