Grid Computing 7700 Fall 2005 Lecture 5: Grid Architecture and Globus Gabrielle Allen...
Grid Computing 7700, Fall 2005
Lecture 5: Grid Architecture and Globus
Gabrielle Allen
http://www.cct.lsu.edu/~gallen
Concrete Example
I have a source file Main.F on machine A and an input file on machine B. Main.F is written using MPI; it will need around 4 GB of core memory to run, will take several hours to complete, and will produce a large output file.
What functionality do we need?
Issues
How to select a machine to run it on?
How to provide an executable which can run on that machine?
How to move the input file?
How to start the executable?
How to monitor the job? When does it start? When does it finish?
How to move the output file back?
What about security?
How do we know if it didn’t work and how it failed?
How to Select a Machine
What properties of a machine are we interested in?
  – What resources does my executable require?
    • 4 GB memory, “several hours of compute time”
    • Enough disk space for the output
  – What kind of environment do I need on the machine?
    • OS limitations?
    • MPI? (Which version?), Fortran?
  – What resources am I authorized to run on?
  – How quickly will it run?
  – How much will it cost/what is my allocation there?
  – How to find all this information? What should the user provide?
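The selection questions above can be read as a filter over candidate machines. The following is a minimal sketch only: the machine names, attribute names and numbers are invented for illustration, and a real broker would also weigh cost, allocation and expected queue time.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    # All attributes are illustrative; a real grid would obtain them from an information service.
    name: str
    memory_gb: float
    free_disk_gb: float
    max_walltime_hours: float
    software: set = field(default_factory=set)
    authorized: bool = False


@dataclass
class JobRequirements:
    memory_gb: float
    disk_gb: float
    walltime_hours: float
    software: set = field(default_factory=set)


def eligible_machines(job, machines):
    """Return the names of machines that satisfy every requirement of the job."""
    return [
        m.name for m in machines
        if m.authorized
        and m.memory_gb >= job.memory_gb
        and m.free_disk_gb >= job.disk_gb
        and m.max_walltime_hours >= job.walltime_hours
        and job.software <= m.software   # every required package is installed
    ]


job = JobRequirements(memory_gb=4, disk_gb=50, walltime_hours=6,
                      software={"mpi", "fortran"})
machines = [
    Machine("alpha", 8, 200, 24, {"mpi", "fortran"}, authorized=True),
    Machine("beta", 2, 500, 48, {"mpi", "fortran"}, authorized=True),    # too little memory
    Machine("gamma", 16, 100, 12, {"mpi"}, authorized=True),             # no Fortran
    Machine("delta", 16, 100, 12, {"mpi", "fortran"}, authorized=False), # not authorized
]
print(eligible_machines(job, machines))
```

Only the hypothetical machine "alpha" passes all four checks here; the others each fail exactly one of the questions on the slide.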
More Complicated
What if the program might need to read in data kept on machine C while it is running?
What about distributing across processors on different machines?
What if I have a lot of interconnected programs?
How do I find the output file afterwards?
What if it doesn’t work?
Questions
What kind of functionality do we need?
What tools exist to do this?
What features of distributed computing do they need to be designed for?
What design issues to watch for?
Abstract Requirements
Single sign-on
Job submission, monitoring and management
  – submit a job to a resource on the grid
  – monitor the progress of a submitted job
  – retrieve results
  – cancel job
File transfer
  – move files from A to B, securely, reliably and efficiently
Resource discovery
  – locate resources or services with particular characteristics
Less typical: metacomputing, workflow enactment, resource brokering, ...
What do I have to choose from?
Globus Toolkit
  – version 2 is widely deployed; nearest thing to a de facto standard
  – horizontally integrated bag of tools
  – suits grid application developers better than end users
  – brand new V4 based on web services
UNICORE
  – less widely deployed; few UK deployments
  – vertically integrated
  – suits end users better than application developers
Condor
  – high throughput computing
  – great for cycle harvesting
Web Services?
  – GT4 or roll your own using Web Services tools
Others
  – yes, there are others
Generation Game
[Figure: functionality and standardization increase over time, from custom solutions, through the Globus Toolkit, Condor and Unicore with de facto standards (GridFTP, GSI) built on X.509, LDAP, FTP, …, to the Open Grid Services Architecture built on Web services, with app-specific services. Adapted from Ian Foster, GGF7 Plenary.]
Earlier generation: computationally intensive; file access/transfer; bag of various heterogeneous protocols and toolkits; monolithic design; recognised the Internet, ignored the Web; academic teams
Later generation: data and knowledge intensive; open services-based architecture; builds on Web services; GGF + OASIS + W3C; multiple implementations; Global Grid Forum; industry participation
Grid Architecture
Fabric
Connectivity
Resource
Collective
Application
Fabric Layer
Contains the resources themselves, which the Grid infrastructure needs to access
Fabric components implement local, resource-specific operations to provide higher-level Grid operations
  – NFS storage protocol
  – Kerberos security
  – PBS queuing system
Grid cannot provide more than local operations can support (e.g. advance reservation)
Fabric Layer
Computational resources
Storage resources
Network resources
But also
  – Database resources
  – Code repository resources
  – Etc.
Fabric Layer
What is the minimum functionality?
  – Introspection mechanisms:
    • Computational resources: hardware, software characteristics, state information such as current load and queue state
    • Storage resources: hardware, software characteristics, available space
    • Network resources: network characteristics and load
  – Resource management mechanisms:
    • Computational resources: starting programs, monitoring and controlling execution of resulting programs
    • Storage resources: file put and get
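One way to picture this minimum functionality for a computational resource is as an abstract interface with an introspection call and a few management calls. This is a sketch under assumptions: the class and method names are invented here and are not part of any Globus API, and the toy implementation does not run anything.

```python
from abc import ABC, abstractmethod


class ComputeResource(ABC):
    """Hypothetical minimum fabric interface for a computational resource."""

    @abstractmethod
    def describe(self) -> dict:
        """Introspection: hardware/software characteristics and current state."""

    @abstractmethod
    def start(self, command: str) -> int:
        """Resource management: start a program and return a job id."""

    @abstractmethod
    def status(self, job_id: int) -> str:
        """Resource management: report the state of a job."""

    @abstractmethod
    def cancel(self, job_id: int) -> None:
        """Resource management: stop a job."""


class InMemoryQueue(ComputeResource):
    """Toy stand-in for a local scheduler; nothing is actually executed."""

    def __init__(self, cpus, memory_gb):
        self._cpus = cpus
        self._memory_gb = memory_gb
        self._jobs = {}   # job id -> state string

    def describe(self):
        queued = sum(1 for state in self._jobs.values() if state == "queued")
        return {"cpus": self._cpus, "memory_gb": self._memory_gb, "queued_jobs": queued}

    def start(self, command):
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = "queued"
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]

    def cancel(self, job_id):
        self._jobs[job_id] = "cancelled"
```

Higher layers would be written against `ComputeResource` only, which is why the Grid cannot offer more than the local implementation behind it supports.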
Fabric Layer
What is desirable?
  – Introspection mechanisms:
    • Storage resources: bandwidth utilization
  – Resource management mechanisms:
    • Computational resources: control over resources allocated to processes, advance reservation
    • Storage resources: third-party transfers, high-performance transfers, put and get of file subsets, callback functionality
    • Network resources: control of resources, prioritization, reservation
Connectivity Layer
Core communication and authentication protocols needed for network transactions
Exchange of data between fabric layer resources
Communication requirements: transport, routing, naming
Assumed to use protocols from the TCP/IP stack (IP, ICMP, TCP, UDP, DNS, OSPF, RSVP, …), but could be others.
Connectivity Layer
Security requirements
  – Single sign-on to all resources
  – Delegation of rights
  – Integration with local security
  – Implementation of trust relations
  – Secure transport of data
Resource Layer
Protocols for secure negotiation, initiation, monitoring, control and accounting on individual resources
Concerned with individual resources only (collections of resources are addressed in the next layer)
Information protocols
  – Obtaining information about structure and state of a resource
Management protocols
  – Negotiating access for given resource requirements, performing operations (job starting, data access). Monitoring and controlling resources and processes.
Collective Layer
Deals with operations across collections of resources
Builds on a relatively small number of resource and connectivity protocols
Examples
  – Directory services (to provide information about resources)
  – Co-allocation, scheduling, brokering services
  – Monitoring and diagnostic services
  – Data replication services
  – Community authorization and accounting services
UNICORE
Packaged software with GUI
Open source
  – http://unicore.sourceforge.net/
Designed for firewalls
Strict security model
  – explicit delegation
Abstract Job Object (AJO)
  – built-in workflow management
Resource Broker
  – can submit to Globus grids
Has notion of software resource
Few APIs
  – extend through plug-ins
  – starting to expose service interfaces
Serves the user
http://www.unicore.org/
Condor: High-throughput computing
Condor converts collections of workstations and clusters into a distributed high-throughput computing facility
Emphasis on policy management and reliability
High-throughput scheduler
Supports job checkpoint and migration
  – single processor jobs only
Remote system calls
Condor-G lets Condor users add Globus-enabled resources to their private view of a Condor pool ("flock")
"glide-in"
http://www.cs.wisc.edu/condor/
Legion/Avaki
Object-based meta-system, providing a single integrated infrastructure
All components are objects (unlike GT)
  – Data abstraction, encapsulation, inheritance, polymorphism
API to core services
Core object types
  – Classes/metaclasses: managers and policy makers
  – Host objects: abstractions of processing resources (one or many)
  – Vault objects: persistent storage
  – Implementation objects and caches: “executables”
  – Binding agents: map objects to physical addresses
  – Context objects: naming of objects
Globus Toolkit V2
GT2 “Implements Grid protocols for security, information discovery, resource management, data management, communication, fault detection and portability”
Bag of tools rather than a uniform programming model; aims to provide distinct services with well-defined APIs
Assumes suitable software is deployed on resources to provide basic fabric functionality (although some tools to help with this are provided)
  – Discovering and packaging structure and state information
Globus Toolkit version 2
"Single sign-on" through Grid Security Infrastructure (GSI)
Remote execution of jobs
  – GRAM, job-managers, Resource Specification Language (RSL)
GridFTP
  – Efficient, reliable file transfer; third-party file transfers
MDS (Metacomputing Directory Service)
  – Resource discovery (GRIS and GIIS)
Co-allocation (DUROC)
  – Limited by support from scheduling infrastructure
Other GSI-enabled utilities
  – gsi-ssh, grid-cvs, etc.
Low-level APIs and command-line interfaces
Commodity Grid Kits (CoG Kits): Java, Perl, Python
Widespread deployment, lots of projects
[Figure: layers labelled Applications, Diverse global services, Core services, Local OS]
Globus Toolkit V2
Connectivity
  – Grid Security Infrastructure (GSI) protocols
  – Based on public-key infrastructure (PKI) and Internet protocols
  – Single sign-on (authentication creates a proxy credential: a digitally signed certificate that grants the holder the right to perform operations on behalf of the signer for a limited time)
  – Delegation (communication of a (restricted) proxy credential to a remote service)
  – Credential format is an extension of the X.509 certificate
  – Remote delegation protocol based on the Transport Layer Security (TLS) protocol (follow-on to SSL)
  – High-level programming API: extensions of the Generic Security Service application programming interface (GSS-API)
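The proxy-credential idea (limited lifetime, optionally restricted rights, each proxy issued by the credential above it) can be illustrated with a toy model. This sketch contains no cryptography and is not the GSI or GSS-API interface; the class, the "/proxy" naming and the example subject are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    subject: str
    rights: frozenset
    not_after: float                      # expiry time, in seconds on any consistent clock
    issuer: Optional["Credential"] = None

    def is_valid(self, now):
        """Valid only if this credential and every issuer in its chain are unexpired."""
        cred = self
        while cred is not None:
            if now >= cred.not_after:
                return False
            cred = cred.issuer
        return True

    def delegate(self, rights, lifetime, now):
        """Issue a proxy whose rights and lifetime never exceed this credential's."""
        if not self.is_valid(now):
            raise ValueError("cannot delegate from an expired credential")
        return Credential(
            subject=self.subject + "/proxy",          # illustrative naming only
            rights=frozenset(rights) & self.rights,   # restriction: intersection of rights
            not_after=min(now + lifetime, self.not_after),
            issuer=self,
        )

    def allows(self, right, now):
        return self.is_valid(now) and right in self.rights


# Hypothetical user credential, then a 12-hour sign-on proxy, then a 1-hour delegated proxy.
user = Credential("/O=Grid/CN=Alice", frozenset({"submit", "transfer", "admin"}), 1_000_000)
proxy = user.delegate({"submit", "transfer"}, lifetime=12 * 3600, now=0)
remote = proxy.delegate({"transfer", "admin"}, lifetime=3600, now=100)
print(sorted(remote.rights), remote.not_after)
```

The remote proxy ends up with the "transfer" right only, because "admin" was never granted to the sign-on proxy, and it expires an hour after it was issued.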
Globus Toolkit V2
Resource Layer
  – Grid Resource Allocation and Management (GRAM) protocol
  – Monitoring and Discovery Service (MDS-2)
  – Grid File Transfer Protocol (GridFTP)
GRAM Protocol
Grid Resource Allocation and Management
  – Creation and management of remote computations
  – GSI for authentication, authorization, delegation
  – GRAM implementations map requests expressed in a Resource Specification Language (RSL) into commands understood by local schedulers and computers
  – Multiple GRAM implementations exist (with C, Java, Python interfaces)
  – GT2 implementation
    • Based on HTTP protocol
    • “gatekeeper” initiates remote computations
    • “jobmanager” manages remote computation
    • GRAM reporter monitors and publishes information
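As a rough illustration of what an RSL request looks like, the sketch below renders a Python dictionary as a conjunction of (attribute=value) pairs. The attribute names, units and quoting rule are assumptions based on common GT2 usage and should be checked against the GRAM documentation; the paths are hypothetical, and `to_rsl` is a helper written for this example, not a Globus function.

```python
def to_rsl(attributes):
    """Render a dict as a conjunctive RSL request: &(name=value)(name=value)..."""
    parts = []
    for name, value in attributes.items():
        if isinstance(value, str):
            # Assumed quoting rule: wrap strings in double quotes, doubling any embedded quote.
            value = '"' + value.replace('"', '""') + '"'
        parts.append(f"({name}={value})")
    return "&" + "".join(parts)


job = {
    "executable": "/home/user/Main",    # hypothetical path
    "count": 8,                         # number of processes
    "jobType": "mpi",
    "maxMemory": 4096,                  # assumed to be megabytes
    "maxWallTime": 360,                 # assumed to be minutes
    "stdout": "/home/user/Main.out",    # hypothetical path
}
print(to_rsl(job))
```

A GRAM implementation would translate such a request into whatever the local scheduler understands, for example a PBS submission.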
MDS-2
Monitoring and Discovery Service
  – Framework for discovering and accessing structure and status information about resources (and services)
    • Data model for representing information
    • Protocols for publishing and accessing information
  – GT2 implementation
    • Based on LDAP (Lightweight Directory Access Protocol)
    • Local registry to manage collection and publication of information at a single location
    • Collective registry to support queries for information from multiple locations
    • Caching for performance
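The split between local and collective registries, with caching in between, can be sketched in a few lines. This is a toy model, not LDAP and not the MDS API: the class names, the attribute names and the time-to-live policy are assumptions chosen for illustration.

```python
class LocalRegistry:
    """Toy stand-in for a per-resource information provider (the GRIS role)."""

    def __init__(self, name, probe):
        self.name = name
        self._probe = probe   # callable returning a dict of current attributes
        self.calls = 0        # counts how often the resource is actually probed

    def query(self):
        self.calls += 1
        return dict(self._probe(), name=self.name)


class CollectiveRegistry:
    """Toy stand-in for an aggregate index (the GIIS role) with a time-to-live cache."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._members = []
        self._cache = {}      # resource name -> (timestamp, record)

    def register(self, registry):
        self._members.append(registry)

    def search(self, predicate, now):
        """Return cached or freshly fetched records that satisfy the predicate."""
        results = []
        for member in self._members:
            cached = self._cache.get(member.name)
            if cached is None or now - cached[0] >= self._ttl:
                cached = (now, member.query())     # refresh a missing or stale entry
                self._cache[member.name] = cached
            if predicate(cached[1]):
                results.append(cached[1])
        return results


index = CollectiveRegistry(ttl_seconds=30)
a = LocalRegistry("A", lambda: {"memory_gb": 8, "free_cpus": 4})
b = LocalRegistry("B", lambda: {"memory_gb": 2, "free_cpus": 16})
index.register(a)
index.register(b)
matches = index.search(lambda record: record["memory_gb"] >= 4, now=0)
print([record["name"] for record in matches])
```

Repeating the search within 30 seconds is answered from the cache without probing the resources again, at the price of possibly stale state information.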
GridFTP Protocol
Extended version of the File Transfer Protocol (FTP)
  – GSI security
  – Partial file access, high-speed striping
  – Third-party transfers
  – Separate control/data channels
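Partial file access is what makes parallel transfers possible: a file can be cut into byte ranges that are moved independently. The sketch below only shows the range arithmetic; it is not the GridFTP protocol or any client API, and `split_ranges` is a name invented for this example.

```python
def split_ranges(file_size, streams):
    """Divide [0, file_size) into contiguous (offset, length) ranges, one per stream."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


print(split_ranges(10, 3))
```

Each (offset, length) pair could then be requested over its own data channel and written into place at the destination.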
Web Services
A Web service is a software system designed to support interoperable machine-to-machine interaction over a network.
It has an interface that is described in a machine-processable format such as WSDL.
Other systems interact with the Web service in a manner prescribed by its interface using messages (usually enclosed in a SOAP envelope).
These messages are typically conveyed using HTTP and are normally composed of XML.
Software applications written in various programming languages and running on various platforms can use web services to exchange data over networks.
This interoperability (e.g., between Java and Python, or Windows and Linux applications) is due to the use of open standards.
OASIS and the W3C are the primary committees responsible for the architecture and standardization of web services.
Specifications for additional features are under development.
Basically: Web service = TRANSPORT (HTTP) + MESSAGING (SOAP) + DESCRIPTION (WSDL) + DISCOVERY (UDDI) + MESSAGE (XML)
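To make the MESSAGING (SOAP) and MESSAGE (XML) parts concrete, here is a small sketch that builds a SOAP 1.1 envelope with the Python standard library. The operation name, the parameter and the `http://example.org/jobs` namespace are hypothetical; a real service would define them in its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/jobs"                      # hypothetical service namespace


def build_request(operation, **params):
    """Wrap an operation and its parameters in a SOAP envelope and return XML text."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


message = build_request("getJobStatus", jobId=42)
print(message)
```

The resulting text would normally be sent as the body of an HTTP POST to the service endpoint.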
Service Oriented Architecture
Components are defined by service interfaces (e.g. Web Services)
Characterized by:
  – Abstract logical view of programs, databases etc.
  – Services defined by exchanged messages (not by properties of the agents themselves)
  – Internal structure of agent is not relevant (can accommodate legacy systems)
  – Services defined by machine-processable metadata (documented semantics)
  – Small number of operations
  – Services oriented towards network usage
  – Platform neutral (e.g. messages in XML)
Open Grid Services Architecture
Resulted from an attempt to standardize GT protocols, influenced by uptake of web services and SOA ideas:
  – Modularize components for different grid functions
  – Uniform treatment of network entities (service orientation)
  – Standard IDLs aligned with Web services
  – Develop within a standards body (Global Grid Forum)
Open Grid Services Architecture
Grid Service
  – A web service extended to support transient, stateful service instances
OGSI specification
  – Open Grid Services Infrastructure
  – Defines interfaces, behaviours and conventions for grid services
  – Now replaced by a range of web service definitions
OGSA defines services and interfaces required in a working grid environment
  – GGF working groups are identifying required functions and then making OGSI-compliant interfaces
Multiple implementations
  – GT3: reference implementation of OGSI and basic OGSA services
  – GT4: pure web services
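"Transient and stateful" can be pictured as a factory that creates service instances on demand, each holding its own state and a termination time. The sketch below is loosely modelled on that idea only; the class names, handle format and method names are invented and do not reproduce the OGSI interfaces.

```python
import itertools


class CounterInstance:
    """A stateful service instance with a termination time."""

    def __init__(self, handle, expires_at):
        self.handle = handle
        self.expires_at = expires_at
        self.value = 0            # state kept between calls

    def add(self, amount):
        self.value += amount
        return self.value


class CounterFactory:
    """Creates transient instances and discards them once their lifetime lapses."""

    def __init__(self):
        self._instances = {}
        self._ids = itertools.count(1)

    def create(self, lifetime, now):
        handle = f"counter-{next(self._ids)}"   # hypothetical handle format
        self._instances[handle] = CounterInstance(handle, now + lifetime)
        return handle

    def lookup(self, handle, now):
        instance = self._instances.get(handle)
        if instance is None or now >= instance.expires_at:
            self._instances.pop(handle, None)   # expired instances are dropped
            raise KeyError(f"{handle} does not exist or has expired")
        return instance

    def extend(self, handle, lifetime, now):
        """Keep-alive: push the termination time further into the future."""
        self.lookup(handle, now).expires_at = now + lifetime
```

A client that stops extending the lifetime lets the instance disappear by itself, so abandoned state does not accumulate on the server.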
GT4
Released April 2005
Service-oriented architecture
Web services to describe and invoke most components
GT4 web service containers for deploying and managing GT4 services (Java, C, Python)
Most interfaces still need to be standardized
Coursework 3
Write one or two pages describing each of the following Globus components:
  – GRAM
  – MDS
  – GridFTP
Best documentation and relevant papers at http://www.globus.org