Chapter 4:- Introduction to Grid and its Evolution

Prepared By: NITIN PANDYA, Assistant Professor, SVBIT.



Page 2: Overview

- Background: What is the Grid?
- Related technologies
- Grid applications
- Communities
- Grid Tools
- Case Studies

Page 3: What is a Grid?

Many definitions exist in the literature. Early definitions:

- Foster and Kesselman, 1998: “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.”
- Kleinrock, 1969: “We will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”

Page 4: Grid computing (1)

“Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organisations” (I. Foster)

Page 5: Grid computing (2)

- Information grid: large-scale access to distributed data (the Web)
- Data grid: management and processing of very large distributed data sets
- Computing grid: meta-computer

Page 6: Parallelism vs. grids: some reminders

Grids date back “only” to 1996. Parallelism is older (the first classification dates from 1972).

Motivations:
- need more computing power (weather forecasting, atomic simulation, genomics…)
- need more storage capacity (petabytes and more)

In a word: improve performance! There are 3 ways:
- Work harder --> use faster hardware
- Work smarter --> optimize algorithms
- Get help --> use more computers!

Page 7: Performance

Ideally, performance grows linearly with the number of processors.

Speed-up: if $T_S$ is the best time to process a problem sequentially, then with $P$ processors the parallel processing time should ideally be $T_P = T_S / P$, where

$$\text{speedup} = \frac{T_S}{T_P}$$

The speedup is limited by Amdahl's law: any parallel program has a purely sequential part $F$ and a parallelizable part $T_{//}$, so $T_S = F + T_{//}$. Because only $T_{//}$ is divided among the processors, the speedup is bounded:

$$\text{speedup} = \frac{F + T_{//}}{F + T_{//}/P} < P \quad (F > 0,\ P > 1)$$

and it tends to $(F + T_{//})/F$ as $P \to \infty$.

Scale-up: if $T_{P,S}$ is the time to solve a problem of size $S$ with $P$ processors, then ideally $T_{P,S}$ should also be the time to solve a problem of size $n \cdot S$ with $n \cdot P$ processors.
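The following minimal Python sketch implements the Amdahl formula above. It shows the speedup flattening out well below $P$. The workload numbers (10 s sequential, 90 s parallelizable) are hypothetical.

```python
def amdahl_speedup(f: float, t_par: float, p: int) -> float:
    """Speedup of a program with sequential time f and parallelizable time t_par on p processors."""
    if p < 1:
        raise ValueError("p must be >= 1")
    t_s = f + t_par        # sequential time T_S = F + T//
    t_p = f + t_par / p    # only the parallelizable part T// is divided among processors
    return t_s / t_p


def speedup_limit(f: float, t_par: float) -> float:
    """Upper bound on the speedup as p grows without bound: (F + T//) / F."""
    return (f + t_par) / f


if __name__ == "__main__":
    # Hypothetical workload: T_S = 100 s, of which 10 s is purely sequential.
    for p in (1, 2, 10, 100, 1000):
        print(p, round(amdahl_speedup(10, 90, p), 2))
    print("limit:", speedup_limit(10, 90))
```

With 10% sequential work, even 1000 processors give a speedup of less than 10.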

Page 8: Why do we need Grids?

- Many large-scale problems cannot be solved by a single computer
- Data and resources are globally distributed

Page 9: Background: Related technologies

- Cluster computing
- Peer-to-peer computing
- Internet computing

Page 10: Cluster computing

- Idea: put some PCs together and get them to communicate
- Cheaper to build than a mainframe supercomputer
- Clusters come in different sizes
- Scalable: a cluster can grow by adding more PCs

Page 11: Cluster Architecture

Page 12: Peer-to-Peer computing

- Connect to other computers
- Can access files from any computer on the network
- Allows data sharing without going through a central server
- The decentralized approach is also useful for the Grid

Page 13: Peer-to-Peer architecture

Page 14: Internet computing

- Idea: there are many idle PCs on the Internet
- They can perform other computations while not being used
- “Cycle scavenging”: rely on getting free time on other people’s computers
- Example: SETI@home
- What are the advantages/disadvantages of cycle scavenging?

Page 15: Some Grid Applications

- Distributed supercomputing
- High-throughput computing
- On-demand computing
- Data-intensive computing
- Collaborative computing

Page 16: Grid Users

Many levels of users:
- Grid developers
- Tool developers
- Application developers
- End users
- System administrators

Page 17: Some Grid challenges

- Data movement
- Data replication
- Resource management
- Job submission

Page 18: Computational grid

“Hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” (I. Foster)

Performance criteria:
- security
- reliability
- computing power
- latency
- throughput
- scalability
- services

Page 19: Grid characteristics

- Large scale
- Heterogeneity
- Multiple administration domains
- Autonomy… and coordination
- Dynamicity
- Flexibility
- Extensibility
- Security

Page 20: Levels of cooperation in a computing grid

- End system (computer, disk, sensor…): multithreading, local I/O
- Cluster: synchronous communications, DSM (distributed shared memory), parallel I/O, parallel processing
- Intranet/Organization: heterogeneity, distributed administration, distributed file systems and databases, load balancing, access control
- Internet/Grid: global supervision, brokers, negotiation, cooperation…

Page 21: Basic services

- Authentication/Authorization/Traceability
- Activity control (monitoring)
- Resource discovery
- Resource brokering
- Scheduling
- Job submission, data access/migration and execution
- Accounting

Page 22: Layered Grid Architecture (by analogy to the Internet architecture)

Grid protocol architecture, from top to bottom:
- Application
- Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, application-specific distributed services
- Resource: “Sharing single resources”: negotiating access, controlling use
- Connectivity: “Talking to things”: communication (Internet protocols) and security
- Fabric: “Controlling things locally”: access to, and control of, resources

Internet protocol architecture, for comparison: Application, Transport, Internet, Link. Roughly, Fabric corresponds to Link, Connectivity to Internet and Transport, and Resource and Collective to Application.

(From I. Foster)

Page 23: Resources

- Description
- Advertising
- Cataloging
- Matching
- Claiming
- Reserving
- Checkpointing

Page 24: Resource management (1)

Services and protocols depend on the infrastructure. Some parameters:
- stability of the infrastructure (same set of resources or not)
- freshness of the resource availability information
- reservation facilities
- multiple-resource or single-resource brokering

Example of request: I need from 10 to 100 CEs (computing elements), each with at least 512 MB of RAM and a computing power of 150 Mflops.
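The request above can be read as a matchmaking problem: filter the CEs that satisfy the per-CE constraints, then check the count range. The minimal sketch below makes several assumptions. The `Resource` record is hypothetical, "150 Mflops" is treated as a minimum, and picking the fastest CEs first is only one possible selection policy.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    ram_mb: int
    mflops: float


def match_request(resources, min_count, max_count, min_ram_mb, min_mflops):
    """Return up to max_count CEs meeting the per-CE constraints, or None if fewer than min_count qualify."""
    eligible = [r for r in resources if r.ram_mb >= min_ram_mb and r.mflops >= min_mflops]
    eligible.sort(key=lambda r: r.mflops, reverse=True)  # prefer the fastest CEs (one policy among many)
    if len(eligible) < min_count:
        return None  # the request cannot be satisfied
    return eligible[:max_count]


if __name__ == "__main__":
    # Hypothetical pool of 15 CEs with varying memory and speed.
    pool = [Resource(f"ce{i:02d}", 256 if i % 4 == 0 else 1024, 100 + 10 * i) for i in range(15)]
    print(match_request(pool, 10, 100, 512, 150))   # None: only 8 CEs qualify
    print([r.name for r in match_request(pool, 5, 100, 512, 150)])
```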

Page 25: Resource management and scheduling (1)

Levels of scheduling:
- job scheduling (global level; performance: throughput)
- resource scheduling (performance: fairness, utilization)
- application scheduling (performance: response time, speedup, produced data…)

Mapping/scheduling process:
- resource discovery and selection
- assignment of tasks to computing resources
- data distribution
- task scheduling on the computing resources
- (communication scheduling)
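The step "assignment of tasks to computing resources" can be illustrated on heterogeneous machines with a simple greedy heuristic. Each task, taken in order, goes to the machine on which it would finish earliest (often called minimum completion time). This is a swapped-in textbook heuristic, not a method from the slides. The machine names and speeds in the usage example are hypothetical.

```python
def greedy_schedule(task_work, machine_speed):
    """Assign tasks (in order) to the machine that would complete them earliest.

    task_work:     work per task (e.g. in Mflop)
    machine_speed: dict machine -> processing rate (e.g. in Mflops)
    Returns (list of (task index, machine), makespan).
    """
    ready = {m: 0.0 for m in machine_speed}  # time at which each machine becomes free
    assignment = []
    for t, work in enumerate(task_work):
        # completion time of task t on machine m = time m becomes free + run time on m
        best = min(machine_speed, key=lambda m: ready[m] + work / machine_speed[m])
        ready[best] += work / machine_speed[best]
        assignment.append((t, best))
    return assignment, max(ready.values())


if __name__ == "__main__":
    print(greedy_schedule([400, 300, 400], {"fast": 200.0, "slow": 100.0}))
```

Because the heuristic ignores future tasks, it can be far from optimal. This is one reason individual and global performance goals may conflict.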

Page 26: Resource management and scheduling (2)

Individual performance goals are not necessarily consistent with global (system) performance!

Grid problems:
- predictions are not definitive: dynamicity!
- heterogeneous platforms
- checkpointing and migration

Page 27: A Resource Management System Example (Globus)

[Figure: the Application sends an RSL request to a Broker. The Broker performs RSL specialization using queries & info from the Information Service and produces ground RSL. A Co-allocator splits the ground RSL into simple ground RSL requests. Each request goes to a GRAM in front of a local resource manager (LSF, Condor, NQE).]

- RSL: Resource Specification Language
- LSF: Load Sharing Facility (task scheduling and load balancing; developed by Platform Computing)
- NQE: Network Queuing Environment (batch management; developed by Cray Research)
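To make RSL concrete, the sketch below builds request strings in the parenthesized style of GT2-era GRAM RSL. It produces a conjunctive `&(...)(...)` request and a `+` multi-request of the kind a co-allocator splits up. The attribute names (`executable`, `count`, `minMemory`) are used as we recall them and should be checked against the Globus documentation. The helper functions are hypothetical, and quoting is simplified (embedded double quotes are rejected).

```python
SPECIAL = set(' ()=&|+<>!^')  # characters that force a value to be quoted in this sketch


def rsl_relation(attr: str, value) -> str:
    """Format one (attr=value) relation, double-quoting values with spaces or RSL operators."""
    text = str(value)
    if '"' in text:
        raise ValueError("embedded double quotes are not handled in this sketch")
    if any(c in SPECIAL for c in text):
        text = f'"{text}"'
    return f"({attr}={text})"


def rsl_request(attrs: dict) -> str:
    """Conjunctive request: all relations must hold, e.g. &(executable=...)(count=...)."""
    return "&" + "".join(rsl_relation(k, v) for k, v in attrs.items())


def rsl_multirequest(requests) -> str:
    """Several ground requests bundled with '+', as handed to a co-allocator."""
    return "+" + "".join(f"({rsl_request(r)})" for r in requests)


if __name__ == "__main__":
    print(rsl_request({"executable": "/bin/hostname", "count": 10, "minMemory": 512}))
    # &(executable=/bin/hostname)(count=10)(minMemory=512)
```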

Page 28: Resource information (1)

What is to be stored?
- virtual organizations, people, computing resources, software packages, communication resources, event producers, devices…
- what about data?

This is a key issue in such dynamic environments.

A first approach: a (distributed) directory (LDAP)
- easy to use
- tree structure
- distribution
- static
- mostly read; updates are not efficient
- hierarchical
- poor procedural language

Page 29: Resource information (2)

Goals:
- dynamicity
- complex relationships
- frequent updates
- complex queries

A second approach: a (relational) database
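The relational approach can be sketched with Python's built-in `sqlite3` module. The sketch shows a frequent update followed by a query that joins two relations and aggregates, which is awkward to express against a tree-shaped directory. The schema, site names and VO names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site (name TEXT PRIMARY KEY, vo TEXT);
CREATE TABLE resource (
    id TEXT PRIMARY KEY,
    site TEXT REFERENCES site(name),
    ram_mb INTEGER,
    mflops REAL,
    free_cpus INTEGER
);
""")
conn.executemany("INSERT INTO site VALUES (?, ?)", [("siteA", "climate"), ("siteB", "physics")])
conn.executemany("INSERT INTO resource VALUES (?, ?, ?, ?, ?)", [
    ("ce1", "siteA", 1024, 200.0, 4),
    ("ce2", "siteA", 256, 300.0, 8),
    ("ce3", "siteB", 2048, 150.0, 0),
])

# Frequent update: a (hypothetical) monitoring probe reports new availability for ce3.
conn.execute("UPDATE resource SET free_cpus = ? WHERE id = ?", (6, "ce3"))

# Complex query: join site and resource, filter, and aggregate per virtual organization.
rows = conn.execute("""
    SELECT s.vo, COUNT(*), SUM(r.free_cpus)
    FROM resource r JOIN site s ON r.site = s.name
    WHERE r.ram_mb >= 512 AND r.mflops >= 150
    GROUP BY s.vo ORDER BY s.vo
""").fetchall()
print(rows)  # [('climate', 1, 4), ('physics', 1, 6)]
```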

Page 30: Programming on the grid: potential programming models

- Message passing (PVM, MPI)
- Distributed Shared Memory
- Data parallelism (HPF, HPC++)
- Task parallelism (Condor)
- Client/server: RPC
- Agents
- Integration systems (CORBA, DCOM, RMI)

Page 31: Program execution: issues

- Parallelize the program with the right job structure, communication patterns/procedures and algorithms
- Discover the available resources
- Select the suitable resources
- Allocate or reserve these resources
- Migrate the data
- Initiate computations
- Monitor the executions; checkpoints?
- React to changes
- Collect results
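The "checkpoints?" and "react to changes" steps can be illustrated with a minimal checkpoint/restart sketch. A toy computation saves its progress after every step. After a simulated failure (e.g. the machine is reclaimed), a rerun resumes from the last checkpoint instead of starting over. The JSON file format and the `fail_at` parameter are assumptions made for illustration.

```python
import json
import os
import tempfile


def run_with_checkpoints(n_steps, ckpt_path, fail_at=None):
    """Sum i*i for i in range(n_steps), checkpointing after each step; resume if a checkpoint exists."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)  # react to a previous failure: restart from the last checkpoint
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError(f"simulated failure at step {fail_at}")
        state["acc"] += state["step"] ** 2
        state["step"] += 1
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, ckpt_path)  # swap in one step (atomic on POSIX within one filesystem)
    return state["acc"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    try:
        run_with_checkpoints(10, path, fail_at=6)
    except RuntimeError as e:
        print("first attempt:", e)
    print("resumed result:", run_with_checkpoints(10, path))  # 285, same as an uninterrupted run
```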

Page 32: Data management

It was long forgotten, though it is a key issue! Issues:
- indexing
- retrieval
- replication
- caching
- traceability (auditing)
- and security!

Page 33: Some Grid-Related Projects

- Globus
- Condor
- Nimrod-G

Page 34: Globus Grid Toolkit

- Open source toolkit for building Grid systems and applications
- Enabling technology for the Grid
- Share computing power, databases, and other tools securely online
- Facilities for:
  - resource monitoring
  - resource discovery
  - resource management
  - security
  - file management

Page 35: Data Management in Globus Toolkit

- Data movement:
  - GridFTP
  - Reliable File Transfer (RFT)
- Data replication:
  - Replica Location Service (RLS)
  - Data Replication Service (DRS)

Page 36: GridFTP

- High-performance, secure, reliable data transfer protocol
- Optimized for wide area networks
- Superset of the Internet FTP protocol
- Features:
  - multiple data channels for parallel transfers
  - partial file transfers
  - third-party transfers
  - reusable data channels
  - command pipelining

Page 37: More GridFTP features

- Auto-tuning of parameters
- Striping: transfer data in parallel among multiple senders and receivers instead of just one
- Extended block mode:
  - send data in blocks
  - each block's size and offset are known
  - data can arrive out of order
  - allows multiple streams
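The key idea of extended block mode can be simulated in a few lines. Because every block carries its own offset and size, blocks can travel over several parallel streams and arrive in any order, yet still be written to the right place. The sketch below uses plain `(offset, payload)` tuples and a random shuffle to model arrival order. It does not reproduce the actual GridFTP wire format.

```python
import random


def split_blocks(data: bytes, block_size: int):
    """Cut data into (offset, payload) blocks; each block carries its own offset and size."""
    return [(off, data[off:off + block_size]) for off in range(0, len(data), block_size)]


def send_over_streams(blocks, n_streams: int, seed: int = 0):
    """Spread blocks round-robin over n_streams, then interleave them in a random arrival order."""
    streams = [blocks[i::n_streams] for i in range(n_streams)]
    arrived = [b for s in streams for b in s]
    random.Random(seed).shuffle(arrived)  # parallel streams do not preserve global order
    return arrived


def reassemble(arrived, total_size: int) -> bytes:
    """Write each block at its offset; arrival order does not matter."""
    buf = bytearray(total_size)
    for off, payload in arrived:
        buf[off:off + len(payload)] = payload
    return bytes(buf)


if __name__ == "__main__":
    data = bytes(range(256)) * 40  # 10,240 bytes of sample data
    arrived = send_over_streams(split_blocks(data, 1000), n_streams=4)
    print(len(arrived), "blocks, reassembled correctly:", reassemble(arrived, len(data)) == data)
```

The same offset information is what makes partial file transfers and striping across several servers possible.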

Page 38: Striping Architecture

- Use “striped” servers

Page 39: Limitations of GridFTP

- Not a web service protocol (does not employ SOAP, WSDL, etc.)
- Requires the client to maintain an open socket connection throughout the transfer
  - inconvenient for long transfers
- Cannot recover from client failures

Page 40: GridFTP

Page 41: Reliable File Transfer (RFT)

- Web service with “job-scheduler” functionality for data movement
- The user provides source and destination URLs
- The service writes the job description to a database and moves the files
- Service methods for querying transfer status
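The "job-scheduler" idea behind RFT can be sketched as follows. Requests are persisted in a database before any data moves, so the client can disconnect and poll for status later, and failed transfers are retried. This is a toy model, not the RFT interface. The class, its methods, the `copy_fn` callback and the `gsiftp://siteA/...` URLs are all hypothetical. SQLite stands in for the service's database.

```python
import sqlite3


class TransferService:
    """Toy RFT-like service: each transfer request is stored before it is executed."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute("""CREATE TABLE IF NOT EXISTS transfer (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            source TEXT NOT NULL, dest TEXT NOT NULL,
            status TEXT NOT NULL DEFAULT 'PENDING',
            attempts INTEGER NOT NULL DEFAULT 0)""")
        self.db.commit()

    def submit(self, source, dest):
        """Record the job description; the returned id lets the client poll later."""
        cur = self.db.execute("INSERT INTO transfer (source, dest) VALUES (?, ?)", (source, dest))
        self.db.commit()
        return cur.lastrowid

    def status(self, transfer_id):
        row = self.db.execute("SELECT status FROM transfer WHERE id = ?", (transfer_id,)).fetchone()
        return row[0] if row else None

    def process(self, copy_fn, max_attempts=3):
        """Run pending transfers with copy_fn(source, dest); failed ones stay PENDING until max_attempts."""
        pending = self.db.execute(
            "SELECT id, source, dest, attempts FROM transfer WHERE status = 'PENDING'").fetchall()
        for tid, src, dst, attempts in pending:
            try:
                copy_fn(src, dst)
                new_status = "DONE"
            except OSError:
                new_status = "FAILED" if attempts + 1 >= max_attempts else "PENDING"
            self.db.execute("UPDATE transfer SET status = ?, attempts = ? WHERE id = ?",
                            (new_status, attempts + 1, tid))
            self.db.commit()
```

In use, `submit()` returns immediately. A background worker calls `process()` periodically, so a transfer interrupted by a network error is retried without the client staying connected.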

Page 42: RFT

Page 43: Replica Location Service (RLS)

- Registry to keep track of where replicas exist on physical storage systems
- Users or services register files in the RLS when the files are created
- Distributed registry:
  - may consist of multiple servers at different sites
  - increases scale
  - provides fault tolerance

Page 44: Replica Location Service (RLS)

- Logical file name: unique identifier for the contents of a file
- Physical file name: location of a copy of the file on a storage system
- The user can provide a logical name and ask for its replicas
- Or query to find the logical name associated with a physical file location
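The two queries above amount to a mapping that can be searched in both directions. The minimal single-server sketch below is not the RLS API, which as we understand it distributes this information across local catalogs and index servers. The class name, method names and `lfn://`/`gsiftp://` names are hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy replica catalog mapping logical file names (LFNs) to physical file names (PFNs)."""

    def __init__(self):
        self._lfn_to_pfns = defaultdict(set)
        self._pfn_to_lfn = {}

    def register(self, lfn, pfn):
        """Record that pfn holds a copy of the contents identified by lfn."""
        owner = self._pfn_to_lfn.get(pfn)
        if owner is not None and owner != lfn:
            raise ValueError(f"{pfn} is already registered under {owner}")
        self._lfn_to_pfns[lfn].add(pfn)
        self._pfn_to_lfn[pfn] = lfn

    def replicas(self, lfn):
        """Forward query: logical name -> all known physical locations."""
        return sorted(self._lfn_to_pfns.get(lfn, ()))

    def logical_name(self, pfn):
        """Reverse query: physical location -> logical name (None if unknown)."""
        return self._pfn_to_lfn.get(pfn)
```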

Page 45: Data Replication Service (DRS)

- Pull-based replication capability
- Implemented as a web service
- Higher-level data management service built on top of RFT and RLS
- Goal: ensure that a specified set of files exists on a storage site:
  - first, query the RLS to locate the desired files
  - next, create a transfer request using RFT
  - finally, register the new replicas with the RLS
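The three DRS steps (locate, transfer, register) can be composed as in the sketch below. To keep it self-contained, a plain dict stands in for the RLS and a callable stands in for submitting an RFT request. The function name, the URLs and the choice of the first listed replica as the source are hypothetical.

```python
def replicate(lfns, target_site, catalog, transfer):
    """Pull-based replication: ensure every LFN in lfns has a replica under target_site.

    catalog:  dict LFN -> list of PFN URLs (stands in for the RLS)
    transfer: function(source_url, dest_url) copying one file (stands in for an RFT request)
    Returns the list of newly created PFNs.
    """
    created = []
    for lfn in lfns:
        pfns = catalog.get(lfn, [])                       # 1. locate existing replicas
        if not pfns:
            raise LookupError(f"no replica registered for {lfn}")
        if any(p.startswith(target_site + "/") for p in pfns):
            continue                                      # already present at the target site
        dest = target_site + "/" + lfn.rsplit("/", 1)[-1]
        transfer(pfns[0], dest)                           # 2. copy from one known replica
        catalog[lfn].append(dest)                         # 3. register the new replica
        created.append(dest)
    return created
```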

Page 46: Condor

- Original goal: high-throughput computing
- Harvest wasted CPU power from other machines
- Can also be used on a dedicated cluster
- Condor-G: Condor interface to Globus resources
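Harvesting wasted CPU power depends on a policy that decides when a desktop counts as idle and when a guest job must leave because the owner is back. Condor expresses such policies through configurable expressions (for example START and PREEMPT). The Python below is not Condor syntax. It is a hypothetical sketch of the same decision, and its thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class MachineState:
    keyboard_idle_s: float  # seconds since the owner last used keyboard or mouse
    load_avg: float         # owner's CPU load, excluding the guest job

# Hypothetical thresholds, for illustration only.
START_IDLE_S = 15 * 60     # owner must be away this long before a guest job starts
OWNER_BACK_S = 60          # recent activity below this means the owner has returned
MAX_OWNER_LOAD = 0.3


def policy(state: MachineState, running_guest: bool) -> str:
    """Decide what a cycle-scavenging agent does with a guest job on this desktop."""
    if not running_guest:
        if state.keyboard_idle_s >= START_IDLE_S and state.load_avg <= MAX_OWNER_LOAD:
            return "start"
        return "wait"
    if state.keyboard_idle_s < OWNER_BACK_S or state.load_avg > MAX_OWNER_LOAD:
        return "vacate"  # owner has priority: stop (or checkpoint and migrate) the guest job
    return "continue"
```

The asymmetry (a long idle period before starting, but a quick vacate) reflects the owner-first principle behind cycle scavenging.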

Page 47: Earth System Grid

- Provide climate studies scientists with access to large datasets
- Data generated by computational models: requires massive computational power
- Most scientists work with subsets of the data
- Requires access to local copies of the data

Page 48: ESG Infrastructure

- Archival storage systems and disk storage systems at several sites
- Storage resource managers and GridFTP servers to provide access to storage systems
- Metadata catalog services
- Replica location services
- Web portal user interface

Page 49: Earth System Grid

Page 50: Earth System Grid Interface

Page 51: Laser Interferometer Gravitational Wave Observatory (LIGO)

- Instruments at two sites to detect gravitational waves
- Each experiment run produces millions of files
- Scientists at other sites want these datasets on local storage
- LIGO deploys RLS servers at each site to register local mappings and collect information about mappings at other sites

Page 52: Large Scale Data Replication for LIGO

- Goal: detection of gravitational waves
- Three interferometers at two sites
- Generate 1 TB of data daily
- Need to replicate this data across 9 sites to make it available to scientists
- Scientists need to learn where data items are, and how to access them
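A back-of-the-envelope estimate gives a feel for the required sustained bandwidth. It assumes 1 TB = 10^12 bytes, a steady daily rate, and that each of the 9 sites receives a full copy. These are simplifying assumptions, not figures from the slides.

$$
\frac{10^{12}\ \text{B}}{86\,400\ \text{s}} \approx 11.6\ \text{MB/s} \approx 93\ \text{Mbit/s per destination},
\qquad 9 \times 11.6\ \text{MB/s} \approx 104\ \text{MB/s in aggregate}
$$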

Page 53: LIGO

Page 54: LIGO Solution

Lightweight Data Replicator (LDR):
- Uses parallel data streams, tunable TCP windows, and tunable write/read buffers
- Tracks where copies of specific files can be found
- Stores descriptive information (metadata) in a database
- Can select files based on their description rather than their filename
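Selecting files by description means querying metadata instead of naming files directly. A typical example is "all data from one detector during one run that overlaps a given time window". The sketch below does this with in-memory records. The field names and records are hypothetical and are not LDR's actual schema.

```python
# Hypothetical metadata records (not LDR's real schema).
FILES = [
    {"lfn": "H1-0000.gwf", "detector": "H1", "gps_start": 0, "gps_end": 64, "run": "S1"},
    {"lfn": "H1-0064.gwf", "detector": "H1", "gps_start": 64, "gps_end": 128, "run": "S1"},
    {"lfn": "L1-0000.gwf", "detector": "L1", "gps_start": 0, "gps_end": 64, "run": "S1"},
    {"lfn": "H1-0128.gwf", "detector": "H1", "gps_start": 128, "gps_end": 192, "run": "S2"},
]


def select_by_description(files, detector, run, t0, t1):
    """Logical names of files from `detector` in `run` whose [gps_start, gps_end) overlaps [t0, t1)."""
    return [f["lfn"] for f in files
            if f["detector"] == detector and f["run"] == run
            and f["gps_start"] < t1 and f["gps_end"] > t0]


if __name__ == "__main__":
    print(select_by_description(FILES, "H1", "S1", 50, 100))  # ['H1-0000.gwf', 'H1-0064.gwf']
```

The returned logical names would then be resolved to physical copies through the replica catalog.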

Page 55: TeraGrid

- NSF high-performance computing facility
- Nine distributed sites, each with different capabilities, e.g., computation power, archiving facilities, visualization software
- Applications may require more than one site
- Data sizes on the order of gigabytes or terabytes

Page 56: TeraGrid

Page 57: TeraGrid

Solution: use GridFTP and RFT with a front-end command-line tool (tgcp).

Benefits of the system:
- Simple user interface
- High-performance data transfer capability
- Ability to recover from both client and server software failures
- Extensible configuration

Page 58: TGCP Details

Idea: hide low-level GridFTP commands from users.

Copy the file smallfile.dat in the working directory to another system:

    tgcp smallfile.dat tg-login.sdsc.teragrid.org:/users/ux454332

Equivalent GridFTP command:

    globus-url-copy -p 8 -tcp-bs 1198372 \
        gsiftp://tg-gridftprr.uc.teragrid.org:2811/home/navarro/smallfile.dat \
        gsiftp://tg-login.sdsc.teragrid.org:2811/users/ux454332/smallfile.dat

Here -p 8 requests 8 parallel data streams, and -tcp-bs sets the TCP buffer size in bytes.
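A tgcp-style wrapper mostly translates a short `file host:/dir` argument into full gsiftp URLs and tuning flags. The hypothetical sketch below only builds the `globus-url-copy` argument list and does not run it. It takes the local GridFTP host and directory as inputs because a wrapper cannot guess them. The defaults (8 streams, buffer size, port 2811) simply mirror the command above.

```python
import os
import shlex


def tgcp_to_globus_url_copy(local_file, remote, local_host, local_dir,
                            streams=8, tcp_buffer_bytes=1198372, port=2811):
    """Translate a tgcp-style 'file host:/dir' request into a globus-url-copy argv list."""
    host, _, remote_dir = remote.partition(":")
    if not host or not remote_dir:
        raise ValueError("remote must look like host:/path")
    name = os.path.basename(local_file)
    src = f"gsiftp://{local_host}:{port}{local_dir.rstrip('/')}/{name}"
    dst = f"gsiftp://{host}:{port}{remote_dir.rstrip('/')}/{name}"
    return ["globus-url-copy", "-p", str(streams), "-tcp-bs", str(tcp_buffer_bytes), src, dst]


if __name__ == "__main__":
    argv = tgcp_to_globus_url_copy("smallfile.dat", "tg-login.sdsc.teragrid.org:/users/ux454332",
                                   local_host="tg-gridftprr.uc.teragrid.org", local_dir="/home/navarro")
    print(shlex.join(argv))  # pass argv to subprocess.run(argv, check=True) to perform the transfer
```

Returning an argument list instead of a shell string avoids quoting problems when the list is handed to `subprocess.run`.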

Page 59: The reality

- We have spent a lot of time talking about “The Grid”
- There is “the Web” and “the Internet”
- Is there a single Grid?

Page 60: The reality

- Many types of Grids exist:
  - private vs. public
  - regional vs. global
  - all-purpose vs. dedicated to a particular scientific problem