Grid Computing 7700 Fall 2005

Lecture 5: Grid Architecture and Globus

Gabrielle Allen [email protected]

http://www.cct.lsu.edu/~gallen

Concrete Example

I have a source file Main.F on machine A and an input file on machine B. Main.F is written using MPI; it will need around 4 GB of core memory to run, will take several hours to complete, and will produce a large output file.

What functionality do we need?

Issues

How to select a machine to run it on?
How to provide an executable which can run on that machine?
How to move the input file?
How to start the executable?
How to monitor the job? When does it start? When does it finish?
How to move the output file back?
What about security?
How do we know if it didn’t work and how it failed?

How to Select a Machine

What properties of a machine are we interested in?
  – What resources does my executable require?
    • 4 GB memory, “several hours of compute time”
    • Enough disk space for the output
  – What kind of environment do I need on the machine?
    • OS limitations?
    • MPI? (Which version?), Fortran?
  – What resources am I authorized to run on?
  – How quickly will it run?
  – How much will it cost/what is my allocation there?
  – How to find all this information? What should the user provide?
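As a rough illustration of the selection questions above, the sketch below filters a list of candidate machines against a job's stated requirements. The `Machine` and `JobRequirements` classes, their field names, and the sample values are all hypothetical; a real grid would obtain this information from an information service rather than a hard-coded list.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical description of a resource; field names are illustrative."""
    name: str
    memory_gb: float
    free_disk_gb: float
    max_walltime_hours: float
    software: set = field(default_factory=set)
    authorized: bool = False


@dataclass
class JobRequirements:
    memory_gb: float
    disk_gb: float
    walltime_hours: float
    software: set = field(default_factory=set)


def eligible(machine, job):
    """Return True if the machine satisfies every stated requirement."""
    return (
        machine.authorized
        and machine.memory_gb >= job.memory_gb
        and machine.free_disk_gb >= job.disk_gb
        and machine.max_walltime_hours >= job.walltime_hours
        and job.software <= machine.software  # all required software present
    )


def select_machines(machines, job):
    """Return the names of the machines the job could run on."""
    return [m.name for m in machines if eligible(m, job)]


# The concrete example: 4 GB memory, several hours, MPI and Fortran.
job = JobRequirements(memory_gb=4, disk_gb=50, walltime_hours=6,
                      software={"mpi", "fortran"})
machines = [
    Machine("A", 2, 500, 24, {"mpi", "fortran"}, authorized=True),   # too little memory
    Machine("B", 8, 200, 12, {"mpi", "fortran"}, authorized=True),
    Machine("C", 16, 100, 48, {"mpi"}, authorized=True),             # no Fortran
    Machine("D", 32, 900, 48, {"mpi", "fortran"}, authorized=False), # not authorized
]
print(select_machines(machines, job))  # prints ['B']
```

Questions such as "how quickly will it run" and "how much will it cost" would turn this yes/no filter into a ranking, which is left out here.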

More Complicated

What if the program might need to read in data kept on machine C while it is running?

What about distributing across processors on different machines?

What if I have a lot of interconnected programs?

How do I find the output file afterwards? What if it doesn’t work?

Questions

What kind of functionality do we need?
What tools exist to do this?
What features of distributed computing do they need to be designed for?

What design issues to watch for?

Abstract Requirements

Single sign-on
Job submission, monitoring and management
  – submit a job to a resource on the grid
  – monitor the progress of a submitted job
  – retrieve results
  – cancel job
File transfer
  – move files from A to B, securely, reliably and efficiently
Resource discovery
  – locate resources or services with particular characteristics
Less typical: metacomputing, workflow enactment, resource brokering, ...
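The job submission, monitoring and management requirement can be pictured as a small interface. The sketch below is an in-memory stand-in, not a client for any real middleware: `FakeResource`, `JobState`, and the `advance` method (which simulates the scheduler making progress) are all invented for illustration.

```python
from enum import Enum
import itertools


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    CANCELLED = "cancelled"


class FakeResource:
    """In-memory stand-in for a grid resource; not a real middleware client."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, executable):
        """Submit a job and return its identifier."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"exe": executable, "state": JobState.PENDING}
        return job_id

    def advance(self, job_id):
        """Simulate the local scheduler moving a job one step forward."""
        job = self._jobs[job_id]
        if job["state"] is JobState.PENDING:
            job["state"] = JobState.RUNNING
        elif job["state"] is JobState.RUNNING:
            job["state"] = JobState.DONE
            job["output"] = f"output of {job['exe']}"

    def status(self, job_id):
        """Monitor the progress of a submitted job."""
        return self._jobs[job_id]["state"]

    def cancel(self, job_id):
        """Cancel a job that has not finished yet."""
        job = self._jobs[job_id]
        if job["state"] in (JobState.PENDING, JobState.RUNNING):
            job["state"] = JobState.CANCELLED

    def retrieve(self, job_id):
        """Retrieve results; only possible once the job is done."""
        job = self._jobs[job_id]
        if job["state"] is not JobState.DONE:
            raise RuntimeError("results are only available for finished jobs")
        return job["output"]
```

A caller would submit, poll `status` until the job is done, and then call `retrieve`; single sign-on and file transfer would sit underneath these calls in a real system.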

What do I have to choose from?

Globus Toolkit
  – version 2 is widely deployed; nearest thing to a de facto standard
  – horizontally integrated bag of tools
  – suits grid application developers better than end users
  – brand new V4 based on web services
UNICORE
  – less widely deployed; few UK deployments
  – vertically integrated
  – suits end users better than application developers
Condor
  – high throughput computing
  – great for cycle harvesting
Web Services?
  – GT4 or roll your own using Web Services tools
Others
  – yes, there are others

Generation Game

[Figure: "Increased functionality, standardization" plotted against "Time". Stages shown: Custom solutions; Globus Toolkit, Condor, Unicore (de facto standards: GridFTP, GSI; building on X.509, LDAP, FTP, …); Open Grid Services Architecture (Web services, app-specific services).]

Annotations on the figure:
  – Computationally intensive; file access/transfer; bag of various heterogeneous protocols & toolkits; monolithic design; recognised internet, ignored Web; academic teams
  – Data and knowledge intensive; open services-based architecture; builds on Web services; GGF + OASIS + W3C; multiple implementations; Global Grid Forum; industry participation

(adapted from Ian Foster GGF7 Plenary)

Grid Architecture

Fabric

Connectivity

Resource

Collective

Application

Fabric Layer

Contains the resources themselves which the Grid infrastructure needs to access

Fabric components implement local, resource-specific operations to provide higher-level Grid operations
  – NFS storage protocol
  – Kerberos security
  – PBS queuing system

Grid cannot provide more than local operations can support (e.g. advance reservation)

Fabric Layer

Computational resources
Storage resources
Network resources
But also
  – Database resources
  – Code repository resources
  – Etc.

Fabric Layer

What is the minimum functionality?
  – Introspection mechanisms:
    • Computational resources: hardware, software characteristics, state information such as current load and queue state
    • Storage resources: hardware, software characteristics, available space
    • Network resources: network characteristics and load
  – Resource management mechanisms
    • Computational resources: starting programs, monitoring and controlling execution of resulting programs
    • Storage resources: file put and get

Fabric Layer

What is desirable?
  – Introspection mechanisms:
    • Storage resources: bandwidth utilization
  – Resource management mechanisms
    • Computational resources: control over resources allocated to processes, advance reservation
    • Storage resources: third-party transfers, high performance transfers, put and get of file subsets, callback functionality
    • Network resources: control of resources, prioritization, reservation

Connectivity Layer

Core communication and authentication protocols needed for network transactions
Exchange of data between fabric layer resources
Communication requirements: transport, routing, naming
Assumed to use protocols from the TCP/IP stack (IP, ICMP, TCP, UDP, DNS, OSPF, RSVP, …), but could be others.

Connectivity Layer

Security requirements
  – Single sign-on to all resources
  – Delegation of rights
  – Integration with local security
  – Implementation of trust relations
  – Secure transport of data

Resource Layer

Protocols for secure negotiation, initiation, monitoring, control and accounting on individual resources
Concerned only with individual resources (collections of resources are addressed in the next layer)
Information protocols
  – Obtaining information about the structure and state of a resource
Management protocols
  – Negotiating access for given resource requirements, performing operations (job starting, data access). Monitoring and controlling resources and processes.

Collective Layer

Dealing with operations across collections of resources
Builds on a relatively small number of resource/connectivity protocols
Examples
  – Directory services (to provide information about resources)
  – Co-allocation, scheduling, brokering services
  – Monitoring and diagnostic services
  – Data replication services
  – Community authorization and accounting services

UNICORE

Packaged software with GUI
Open source
  – http://unicore.sourceforge.net/
Designed for firewalls
Strict security model
  – explicit delegation
Abstract Job Object (AJO)
  – built-in workflow management
Resource Broker
  – can submit to Globus grids
Has notion of software resource
Few APIs
  – extend through plug-ins
  – starting to expose service interfaces
Serves the user

http://www.unicore.org/

Condor: High-throughput computing

Condor converts collections of workstations and clusters into a distributed high-throughput computing facility
Emphasis on policy management and reliability
High-throughput scheduler
Supports job checkpoint and migration
  – single processor jobs only
Remote system calls
Condor-G lets Condor users add Globus-enabled resources to their private view of a Condor pool ("flock")
"glide-in"

http://www.cs.wisc.edu/condor/

Legion/Avaki

Object based meta-system, providing a single integrated infrastructure

All components are objects (unlike GT)
  – Data abstraction, encapsulation, inheritance, polymorphism
API to core services
Core object types
  – Classes/metaclasses: managers and policy makers
  – Host objects: abstractions of processing resources (one or many)
  – Vault objects: persistent storage
  – Implementation objects and caches: “executables”
  – Binding agents: map objects to physical addresses
  – Context objects: naming of objects

Globus Toolkit V2

GT2 “Implements Grid protocols for security, information discovery, resource management, data management, communication, fault detection and portability”

Bag of tools rather than a uniform programming model, aims to provide distinct services with well defined APIs

Assumes suitable software deployed on resources to provide basic fabric functionality (although some tools to help this are provided)
  – Discovering and packaging structure and state information

Globus Toolkit version 2

"Single sign-on" through Grid Security Infrastructure (GSI)
Remote execution of jobs
  – GRAM, job-managers, Resource Specification Language (RSL)
GridFTP
  – Efficient, reliable file transfer; third-party file transfers
MDS (Metacomputing Directory Service)
  – Resource discovery (GRIS and GIIS)
Co-allocation (DUROC)
  – Limited by support from scheduling infrastructure
Other GSI-enabled utilities
  – gsi-ssh, grid-cvs, etc.
Low-level APIs and command-line interfaces
Commodity Grid Kits (CoG-kits), Java, Perl, Python
Widespread deployment, lots of projects

[Figure: layering, top to bottom: Applications; Diverse global services; Core services; Local OS]

Globus Toolkit V2

Connectivity
  – Grid Security Infrastructure (GSI) protocols
  – Based on public-key infrastructure (PKI) and Internet protocols
  – Single sign-on (authentication creates a proxy credential: a digitally signed certificate that grants the holder the right to perform operations on behalf of the signer for a limited time)
  – Delegation (communication of a (restricted) proxy credential to a remote service)
  – Credential format is an extension of the X.509 certificate
  – Remote delegation protocol based on the transport layer security (TLS) protocol (follow-on to SSL)
  – High-level programming API: extensions of the generic security service application programming interface (GSS-API)
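The proxy credential and restricted delegation ideas can be sketched as a toy model. This is not GSI and involves no cryptography or X.509 handling: the `Credential` class, `delegate`, and `permits` are hypothetical names, and "signing" is reduced to keeping a reference to the issuer. The sketch only shows the two rules stated above: a proxy is valid for a limited time, and a delegated proxy can carry at most the rights of its signer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """Toy credential: no cryptography, only the lifetime and rights logic."""
    subject: str
    rights: frozenset
    not_after: float                        # expiry, seconds since the epoch
    issuer: Optional["Credential"] = None   # None for the user's own identity


def delegate(parent, subject, rights, lifetime_s, now):
    """Create a proxy whose rights and lifetime cannot exceed the parent's."""
    rights = frozenset(rights)
    if not rights <= parent.rights:
        raise ValueError("a proxy cannot be granted rights its signer lacks")
    expiry = min(now + lifetime_s, parent.not_after)
    return Credential(subject, rights, expiry, issuer=parent)


def permits(cred, right, now):
    """Walk the chain back to the user: every link must be unexpired
    and must itself allow the requested right."""
    link = cred
    while link is not None:
        if now > link.not_after or right not in link.rights:
            return False
        link = link.issuer
    return True


# Sign on once, then hand a more restricted proxy to a remote service.
now = 1_000_000.0
user = Credential("user", frozenset({"submit", "read", "write"}),
                  now + 365 * 86400)
proxy = delegate(user, "user/proxy", {"submit", "read"}, 12 * 3600, now)
remote = delegate(proxy, "user/proxy/proxy", {"read"}, 3600, now)
```

Here `remote` may read for one hour but can never submit jobs, and `proxy` stops working after twelve hours even though the user's own credential is still valid.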

Globus Toolkit V2

Resource Layer
  – Grid Resource Allocation and Management (GRAM) protocol
  – Monitoring and Discovery Service (MDS-2)
  – Grid File Transfer Protocol (GridFTP)

GRAM Protocol

Grid Resource Allocation and Management
  – Creation and management of remote computations
  – GSI for authentication, authorization, delegation
  – GRAM implementations map requests expressed in a Resource Specification Language (RSL) into commands understood by local schedulers and computers
  – Multiple GRAM implementations exist (with C, Java, Python interfaces)
  – GT2 implementation
    • Based on HTTP protocol
    • “gatekeeper” initiates remote computations
    • “jobmanager” manages remote computation
    • GRAM reporter monitors and publishes information
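To give a feel for what an RSL request looks like, the sketch below assembles a conjunction of `(attribute=value)` relations from a Python dictionary. The helper names `rsl_value` and `build_rsl` are invented here, the quoting rule is a simplification, and the attribute names used in the example (`executable`, `count`, `jobType`, `maxMemory`, `stdout`) are written from memory of the GT2 RSL attribute set, so they should be checked against the GRAM documentation before use.

```python
import re

# Values made only of these characters are written unquoted (an assumption
# of this sketch, chosen to be conservative).
_SAFE = re.compile(r"^[A-Za-z0-9_./:-]+$")


def rsl_value(value):
    """Return the value as text, double-quoted if it is not plainly safe."""
    text = str(value)
    if _SAFE.match(text):
        return text
    if '"' in text:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return f'"{text}"'


def build_rsl(attributes):
    """Build a conjunction request: '&' followed by (name=value) relations."""
    relations = "".join(
        f"({name}={rsl_value(value)})" for name, value in attributes.items()
    )
    return "&" + relations


# The MPI job from the concrete example, with hypothetical paths.
request = build_rsl({
    "executable": "/home/user/main",
    "count": 8,
    "jobType": "mpi",
    "maxMemory": 4096,
    "stdout": "main.out",
})
print(request)
# prints &(executable=/home/user/main)(count=8)(jobType=mpi)(maxMemory=4096)(stdout=main.out)
```

A GRAM implementation would translate such a request into whatever the local scheduler understands, for example a batch script for PBS.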

MDS-2

Monitoring and Discovery Service
  – Framework for discovering and accessing structure and status information about resources (and services)
    • Data model for representing information
    • Protocols for publishing and accessing information
  – GT2 implementation
    • Based on LDAP (lightweight directory access protocol)
    • Local registry to manage collection and publication of information at a single location
    • Collective registry to support queries for information from multiple locations
    • Caching for performance
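The split between a local registry, a collective registry, and caching can be sketched in a few lines. This is a toy in-memory model, not LDAP and not the MDS-2 schema: `LocalRegistry`, `CollectiveRegistry`, the `memory_mb` key, and the time-to-live value are all assumptions made for illustration.

```python
import time


class LocalRegistry:
    """Collects and publishes information at a single location."""

    def __init__(self, host):
        self.host = host
        self._providers = {}

    def register(self, key, provider):
        """Register a zero-argument callable that reports one value."""
        self._providers[key] = provider

    def query(self):
        """Ask every provider for its current value."""
        return {key: provider() for key, provider in self._providers.items()}


class CollectiveRegistry:
    """Answers queries across many local registries, caching each
    registry's answer for ttl seconds."""

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self._locals = []
        self._cache = {}      # host -> (timestamp, data)
        self._ttl = ttl
        self._clock = clock

    def add(self, registry):
        self._locals.append(registry)

    def _lookup(self, registry):
        now = self._clock()
        cached = self._cache.get(registry.host)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]              # still fresh: skip the remote query
        data = registry.query()
        self._cache[registry.host] = (now, data)
        return data

    def find(self, predicate):
        """Return hosts whose (possibly cached) information satisfies predicate."""
        return [r.host for r in self._locals if predicate(self._lookup(r))]
```

A query such as `find(lambda info: info["memory_mb"] >= 4096)` then answers the "which machines have 4 GB" question from the concrete example, at the cost of information that may be up to `ttl` seconds stale.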

GridFTP Protocol

Extended version of file transfer protocol
  – GSI security
  – Partial file access, high speed striping
  – Third party transfers
  – Separate control/data channels
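Partial file access is what makes striped or parallel transfers possible: once any byte range can be requested on its own, a file can be cut into blocks that travel independently. The sketch below shows only that arithmetic. It does not speak the GridFTP protocol, and `split_ranges` and `copy_ranges` are hypothetical helper names.

```python
def split_ranges(file_size, streams):
    """Divide [0, file_size) into at most `streams` contiguous
    (offset, length) blocks of nearly equal size."""
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams >= 1")
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        if length == 0:
            break                                # fewer bytes than streams
        ranges.append((offset, length))
        offset += length
    return ranges


def copy_ranges(data, ranges):
    """Reassemble a buffer block by block, deliberately out of order,
    to show that each range can be moved independently."""
    out = bytearray(len(data))
    for offset, length in reversed(ranges):
        out[offset:offset + length] = data[offset:offset + length]
    return bytes(out)


print(split_ranges(10, 3))  # prints [(0, 4), (4, 3), (7, 3)]
```

In a real transfer each block would go over its own data channel, while the single control channel coordinates which ranges are sent.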

Web Services

A Web service is a software system designed to support interoperable machine-to-machine interaction over a network.

It has an interface that is described in a machine-processable format such as WSDL.

Other systems interact with the Web service in a manner prescribed by its interface using messages (usually enclosed in a SOAP envelope).

These messages are typically conveyed using HTTP, and are normally composed of XML.

Software applications written in various programming languages and running on various platforms can use web services to exchange data over networks.

This interoperability (e.g., between Java and Python, or Windows and Linux applications) is due to the use of open standards.

OASIS and the W3C are the primary committees responsible for the architecture and standardization of web services.

Specifications for additional features are under development.

Basically: Web service = TRANSPORT (HTTP) + MESSAGING (SOAP) + DESCRIPTION (WSDL) + DISCOVERY (UDDI) + MESSAGE (XML)
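The MESSAGING and MESSAGE parts of that equation can be illustrated with the Python standard library alone. The sketch below builds and parses a SOAP 1.1 envelope; the `submitJob` operation, its parameters, and the `http://example.org/jobs` namespace are hypothetical, and no HTTP transport, WSDL description, or UDDI lookup is involved.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/jobs"                      # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation and its parameters in a SOAP envelope; returns XML bytes."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(xml_bytes):
    """Recover (operation, params) from an envelope built by build_request."""
    envelope = ET.fromstring(xml_bytes)
    call = envelope.find(f"{{{SOAP_NS}}}Body")[0]   # first element inside Body

    def local_name(tag):
        return tag.split("}", 1)[1]                 # drop the '{namespace}' prefix

    return local_name(call.tag), {local_name(c.tag): c.text for c in call}


message = build_request("submitJob", {"executable": "main", "count": 8})
operation, params = parse_request(message)
```

Because the message is plain XML, the sender and receiver need not share a programming language or platform, which is the interoperability point made above. Note that all parameter values come back as text.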

Service Oriented Architecture

Components are defined by service interfaces (e.g. Web Services)

Characterized by:
  – Abstract logical view of programs, databases etc.
  – Services defined by exchanged messages (not by properties of the agents themselves)
  – Internal structure of agent is not relevant (can accommodate legacy systems)
  – Services defined by machine-processable metadata (documented semantics)
  – Small number of operations
  – Services oriented towards network usage
  – Platform neutral (e.g. messages in XML)

Open Grid Services Architecture

Resulted from an attempt to standardize GT protocols, influenced by uptake of web services and SOA ideas:
  – Modularize components for different grid functions
  – Uniform treatment of network entities (service orientation)
  – Standard IDLs aligned with Web services
  – Develop within standards body (Global Grid Forum)

Open Grid Services Architecture

Grid Service
  – A web service which is extended to include transient and stateful services
OGSI specification
  – Open Grid Services Infrastructure
  – Defines interfaces, behaviours and conventions for grid services
  – Now replaced by range of web service definitions
OGSA defines services and interfaces required in a working grid environment
  – GGF working groups are identifying required functions and then making OGSI compliant interfaces
Multiple implementations
  – GT3: reference implementation of OGSI and basic OGSA services
  – GT4: pure web services
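What "transient and stateful" means can be sketched with a factory that creates service instances on demand, each holding its own state and each with a limited lifetime. This is a toy model only: `Factory`, `CounterService`, the handle format, and the explicit `now` arguments are invented for illustration and are not the OGSI interfaces.

```python
import itertools


class CounterService:
    """A stateful service instance: it remembers a value between calls."""

    def __init__(self, handle, expires_at):
        self.handle = handle
        self.expires_at = expires_at
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


class Factory:
    """Creates transient instances and discards them once their lifetime lapses."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._instances = {}

    def create(self, now, lifetime):
        """Create a new instance and return a handle that names it."""
        handle = f"counter-{next(self._ids)}"
        self._instances[handle] = CounterService(handle, now + lifetime)
        return handle

    def resolve(self, handle, now):
        """Look up a live instance; raises KeyError if expired or unknown."""
        self._sweep(now)
        return self._instances[handle]

    def extend(self, handle, now, lifetime):
        """Keep an instance alive by pushing its expiry time forward."""
        self.resolve(handle, now).expires_at = now + lifetime

    def _sweep(self, now):
        expired = [h for h, s in self._instances.items() if s.expires_at <= now]
        for handle in expired:
            del self._instances[handle]
```

A client that stops extending the lifetime simply loses the instance, so abandoned state is cleaned up without an explicit destroy call. A plain stateless web service has no equivalent of the handle or the lifetime.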

GT4

Released April 2005
Service oriented architecture
Web services to describe and invoke most components
GT4 web service containers for deploying and managing GT4 services (Java, C, Python)
Most interfaces still need to be standardized

Coursework 3

Write one or two pages describing each of the following Globus components:
  – GRAM
  – MDS
  – GridFTP

Best documentation and relevant papers at http://www.globus.org

Required Reading

The Physiology of the Grid – See course page for link