The Anatomy and Physiology of the Grid Revisited

The Anatomy and Physiology of the Grid Revisited Nenad Medvidovic USC-CSSE and Computer Science Department University of Southern California [email protected] http://csse.usc.edu/~neno/ Collaborative work with Joshua Garcia, Ivo Krka, Chris Mattmann, and Daniel Popescu


Page 1: The Anatomy and Physiology of the Grid Revisited

The Anatomy and Physiology of the Grid Revisited

Nenad Medvidovic
USC-CSSE and Computer Science Department
University of Southern California
[email protected]

http://csse.usc.edu/~neno/

Collaborative work with Joshua Garcia, Ivo Krka, Chris Mattmann, and Daniel Popescu

Page 2: The Anatomy and Physiology of the Grid Revisited

What is the grid?

• A distributed systems technology that enables the sharing of resources across organizations scalably, efficiently, reliably, and securely
• Analogous to the electric grid

Page 3: The Anatomy and Physiology of the Grid Revisited

Why Study the Grid?

• A highly successful technology
• Deficiencies in the existing guidance for building grids
– More to come
• Grids are not easy to build
– See CERN’s Large Hadron Collider
• Their architecture was published very early
– “anatomy” and “physiology”

• Yet “What is (not) a grid?” is still a subject of debate

Page 4: The Anatomy and Physiology of the Grid Revisited

The Architectural Perspective

• Grids are large, complex systems
– Thousands of nodes or more
– Span many agency boundaries
• Qualities of Service (QoS) are critical
– Scalability
– Security
– Performance
– Reliability ...
• Software architecture is just what the doctor ordered
– The set of principal design decisions about a software system [Taylor, Medvidovic, Dashofy 2009]

Page 5: The Anatomy and Physiology of the Grid Revisited

So, What Did We Set out to Do?

• Study grid’s reference requirements and architecture

• Study the architectures of existing grid technologies

• Compare the two
• Suggest how to fix any discrepancies
– Knowing that there will likely be very few straightforward answers

Page 6: The Anatomy and Physiology of the Grid Revisited

Architectural Recovery Approach

Page 7: The Anatomy and Physiology of the Grid Revisited

Original grid reference architecture

Page 8: The Anatomy and Physiology of the Grid Revisited

Some Reference Requirements

Page 9: The Anatomy and Physiology of the Grid Revisited

Studied Grid Technologies

Technology             Programming language(s)   KSLOC    # Modules
Alchemi                C# (.NET)                 26.2     186
Apache Hadoop          Java, C/C++               66.5     1643
Apache HBase           Java, Ruby, Thrift        14.1     362
Condor                 Java, C/C++               51.6     962
DSpace                 Java                      23.4     217
Ganglia                C                         19.3     22
GLIDE                  Java                      2        57
Globus 4.0 (GT 4.0)    Java, C/C++               2218.7   2522
Grid Datafarm          Java, C                   51.4     220
Gridbus Broker         Java                      30.5     566
Jcgrid                 Java                      6.7      150
OODT                   Java                      14       320
Pegasus                Java, C                   79       659
SciFlo                 Python                    18.5     129
iRODS                  Java, C/C++               84.1     163
Sun Grid Engine        Java, C/C++               265.1    572
Unicore                Java                      571      3665
Wings                  Java                      8.8      97

Page 10: The Anatomy and Physiology of the Grid Revisited

Architecture Recovery Technique

- Focus -

• Establish idealized architecture and candidate architectural style(s)
• Identify data and processing components
– Groups implementation modules according to a set of rules
• Map identified data and processing components onto an idealized architecture

Examine: source code, documentation, runtime behavior
Tie to requirements satisfied by component

Page 11: The Anatomy and Physiology of the Grid Revisited

Rules of Focus

1. Group based on isolated classes
2. Group based on generalization
3. Group based on aggregation
4. Group based on composition
5. Group based on two-way association
6. Identify domain classes
7. Merge classes with a single originating domain class association into domain class
8. Group classes along a domain class circular dependency path
9. Group classes along a path with a start node and end node that reference a domain class
10. Group classes along paths with the same end node, and whose start node references the same domain class
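The structural rules in this list (2 through 5) can be read as merging classes that are connected by particular relationship kinds. The following is a minimal sketch of that reading, not the Focus implementation: it assumes a hypothetical list of (source, kind, target) triples, the relation labels are invented for illustration, and union-find is simply one convenient way to build the clusters. Rules 1 and 6 through 10 are not covered.

```python
from collections import defaultdict

# Relationship kinds that trigger grouping under rules 2-4 (labels are assumptions).
MERGING_RELATIONS = {"generalization", "aggregation", "composition"}


def group_classes(classes, relations):
    """Cluster classes into candidate components.

    classes: iterable of class names (every name used in relations must appear here).
    relations: iterable of (source, kind, target) triples.
    """
    parent = {c: c for c in classes}

    def find(c):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        parent[find(a)] = find(b)

    associations = set()
    for source, kind, target in relations:
        if kind in MERGING_RELATIONS:
            union(source, target)
        elif kind == "association":
            associations.add((source, target))

    # Rule 5: merge only when the association exists in both directions.
    for source, target in associations:
        if (target, source) in associations:
            union(source, target)

    clusters = defaultdict(set)
    for c in parent:
        clusters[find(c)].add(c)
    return sorted(sorted(members) for members in clusters.values())
```

With hypothetical classes `Job`, `BatchJob`, `Scheduler`, `Queue`, and `Logger`, a generalization from `BatchJob` to `Job` and a two-way association between `Scheduler` and `Queue` produce two two-class clusters, while a one-way association from `Scheduler` to `Logger` leaves `Logger` on its own.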

Page 12: The Anatomy and Physiology of the Grid Revisited

Some Refinements to the Rules

• Domain class rules
– Class with large majority of outgoing calls
• Exclusion rules
– Class with large majority of incoming calls
– Utility classes
– Heavily passed data-structures
– Benchmarking and test classes
• Additional groupings
– By exception
– By interface
– By package if idealized architecture matches first-class component
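The call-direction refinements above can be sketched as a simple ratio test. This is an illustration under assumptions: the slides do not say how "large majority" is measured, so the 0.8 cutoff and the comparison of outgoing against incoming call counts are hypothetical choices, as are the function name and return labels.

```python
def classify_by_calls(outgoing, incoming, majority=0.8):
    """Return 'domain', 'excluded', or 'undecided' for one class.

    outgoing / incoming: number of calls the class makes / receives.
    majority: fraction treated as a "large majority" (an assumed cutoff).
    """
    total = outgoing + incoming
    if total == 0:
        # No call information at all, so neither rule applies.
        return "undecided"
    if outgoing / total >= majority:
        return "domain"      # mostly a caller: candidate domain class
    if incoming / total >= majority:
        return "excluded"    # mostly called: likely a utility, left out of grouping
    return "undecided"
```

A class that makes 9 calls and receives 1 would be flagged as a domain class candidate; the reverse profile would be excluded, and a balanced class is left for the other rules to decide.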

Page 13: The Anatomy and Physiology of the Grid Revisited

Focus Rules for Distributed Systems

• Infer distributor connectors from idealized architecture

• Classes with methods and names similar to first-class components are domain classes

• Classes importing network communication libraries are domain classes

• main() functions often identify first-class components

• Classes deployed onto different hosts must be grouped separately
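Two of the heuristics above (network library imports and main() functions) lend themselves to a lightweight textual check. The sketch below scans Java source text with regular expressions; the list of package prefixes, the hint strings, and the function name are assumptions made for illustration, and a real recovery effort would more likely use a parser than pattern matching.

```python
import re

# Package prefixes treated as network communication libraries (an assumed list).
NETWORK_PREFIXES = ("java.net.", "java.rmi.", "java.nio.channels.")

IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)
# Matches only the common modifier order "public static void main(".
MAIN_RE = re.compile(r"public\s+static\s+void\s+main\s*\(")


def distributed_hints(java_source):
    """Return the heuristic hints that apply to one Java source file."""
    hints = set()
    imports = IMPORT_RE.findall(java_source)
    if any(name.startswith(NETWORK_PREFIXES) for name in imports):
        hints.add("domain class: imports a network library")
    if MAIN_RE.search(java_source):
        hints.add("first-class component candidate: defines main()")
    return hints
```

A file that imports `java.net.Socket` and declares `public static void main` yields both hints; a file that imports only `java.util.List` yields none.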

Page 14: The Anatomy and Physiology of the Grid Revisited

Discovered discrepancies

• Empty layers
• Skipped layers
• Up-calls
• Multi-layer components
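The four discrepancy kinds can be checked mechanically once recovered components are mapped to layers. The sketch below assumes the usual reading of a strictly layered style (a component calls only the layer directly beneath it) and uses the layer order of the original reference architecture; the component names, the input shapes, and the decision to skip call checks for multi-layer components are assumptions, not the procedure used in the study.

```python
# Layers of the original reference architecture, top to bottom.
LAYERS = ["Application", "Collective", "Resource", "Connectivity", "Fabric"]
DEPTH = {name: i for i, name in enumerate(LAYERS)}


def find_discrepancies(component_layers, calls):
    """Report layering discrepancies.

    component_layers: component name -> set of layer names it maps to.
    calls: iterable of (caller, callee) component pairs.
    """
    used = set().union(*component_layers.values())
    report = {
        "empty_layers": [layer for layer in LAYERS if layer not in used],
        "multi_layer_components": sorted(
            c for c, layers in component_layers.items() if len(layers) > 1
        ),
        "skipped_layers": [],
        "upcalls": [],
    }
    for caller, callee in calls:
        # Direction is only well defined when both ends sit in exactly one layer.
        if len(component_layers[caller]) != 1 or len(component_layers[callee]) != 1:
            continue
        (src,) = component_layers[caller]
        (dst,) = component_layers[callee]
        step = DEPTH[dst] - DEPTH[src]
        if step < 0:
            report["upcalls"].append((caller, callee))
        elif step > 1:
            report["skipped_layers"].append((caller, callee))
    return report
```

For example, a hypothetical Resource-layer component calling a Collective-layer component is reported as an upcall, and an Application-layer component calling a Resource-layer component directly is reported as a skipped layer.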

Page 15: The Anatomy and Physiology of the Grid Revisited

Empty Layers

- Wings -

Page 16: The Anatomy and Physiology of the Grid Revisited

Skipped Layers

- Pegasus -

Page 17: The Anatomy and Physiology of the Grid Revisited

Upcalls

- Hadoop -

Page 18: The Anatomy and Physiology of the Grid Revisited

Multi-Layer Components

- iRODS -

Page 19: The Anatomy and Physiology of the Grid Revisited

What about Globus?

Page 20: The Anatomy and Physiology of the Grid Revisited

[Figure: recovered Globus classes mapped onto the Application, Collective, Resource, Connectivity, and Fabric layers. Annotations mark upcalls, components that span a two-layer boundary and also make upcalls, and components for which the right “layer” could not be determined.]

Page 21: The Anatomy and Physiology of the Grid Revisited

Discrepancies Found

Page 22: The Anatomy and Physiology of the Grid Revisited

Revised Grid Architecture

• The connectivity layer is eliminated
• Explicitly addressing deployment view
• Subsystem types rather than layer-oriented
• Four architectural styles comprise the grid
– Client/server
– Peer-to-peer
– Layered
– Event-based

• An improved classification of grid technologies

Page 23: The Anatomy and Physiology of the Grid Revisited

Revised Grid Reference Architecture

Page 24: The Anatomy and Physiology of the Grid Revisited

Grid Styles – C/S

• Application components are clients to Collective components
– e.g., application components query for resource component locations from collective components
• Application components are clients to Resource components
– e.g., direct job submission from application components to resource components
• Resource components can act as clients to Collective components
– e.g., resource components may obtain locations of other resource components through collective components

Page 25: The Anatomy and Physiology of the Grid Revisited

Grid Styles – p2p

• Resource components are peers
– e.g., Grid Datafarm Filesystem Daemon (gfsd) instance makes requests for file data from other gfsds
• Collective components are peers
– e.g., iRODS agents communicate with each other to exchange data to create replicas

Page 26: The Anatomy and Physiology of the Grid Revisited

Grid Styles – Event-Based

• Resource components notify Collective components that monitor them
– e.g., executors send heartbeats to managers

Page 27: The Anatomy and Physiology of the Grid Revisited

Grid Architectural Styles – Layered

• Collective or Resource components request services from Fabric components
– e.g., iRODS agent accesses a DBMS with metadata
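The interactions listed across the four style slides can be collected into a lookup table keyed by subsystem type. This is a sketch only: the table restates the pairs named on the slides, while the function names and the choice to treat every unlisted pair as a violation are assumptions rather than part of the revised reference architecture.

```python
# (source subsystem, target subsystem) -> styles the slides list for that pair.
ALLOWED = {
    ("Application", "Collective"): {"client/server"},
    ("Application", "Resource"): {"client/server"},
    ("Resource", "Collective"): {"client/server", "event-based"},
    ("Resource", "Resource"): {"peer-to-peer"},
    ("Collective", "Collective"): {"peer-to-peer"},
    ("Collective", "Fabric"): {"layered"},
    ("Resource", "Fabric"): {"layered"},
}


def styles_for(source, target):
    """Return the styles under which source may interact with target."""
    return ALLOWED.get((source, target), set())


def is_violation(source, target):
    """Assumed rule: an interaction with no listed style counts as a violation."""
    return not styles_for(source, target)
```

Under this table, a Resource component contacting a Collective component is explained by either client/server or event-based interaction, whereas a Fabric component calling an Application component has no matching style and would be flagged.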

Page 28: The Anatomy and Physiology of the Grid Revisited

Grid Technology Classification

• Computational grid
– Implementing all Collective components
– e.g., Alchemi and Sun Grid Engine

Page 29: The Anatomy and Physiology of the Grid Revisited

Grid Technology Classification

• Data grid
– Job scheduling components in Collective subsystem are not required
– e.g., Grid Datafarm and Hadoop

Page 30: The Anatomy and Physiology of the Grid Revisited

Grid Technology Classification

• Hybrid
– Resource components providing services either to perform operations on a storage repository or to execute a job or task
– e.g., Gridbus Broker and iRODS

[Figure labels: File Resource, Computational Resource]

Page 31: The Anatomy and Physiology of the Grid Revisited

Correcting Violations in the Reference Architecture

• Why were there originally so many upcalls?
– Legitimate client-server and event-based communication
• Why so many skipped layer calls?
– The Fabric layer was at the wrong level of abstraction
– Mostly utility classes that should be abstracted away
• Why so many multi-layer components?
– Connectivity layer was at the wrong level of abstraction
– Not a layer, but utility libraries to enable connector functionality
– Also accounts for skipped layer calls
• Benefit of the deployment view
– Essential for distributed systems
– Helped to identify that the Fabric layer was not abstracted properly

Page 32: The Anatomy and Physiology of the Grid Revisited

Where Are We Currently?

• There are remaining violations
– Are they legitimate or a result of an improperly recast reference architecture?
• Original Focus is not ideal for recovering systems of these types
– Distributed systems realized by a middleware

• A more automated approach that combines static and dynamic analysis would be preferable

• Use the recast reference architecture to build a new grid

• What are the overarching grid principles?

Page 33: The Anatomy and Physiology of the Grid Revisited

Evolving Grid Principles

1. A grid is a collection of logical resources (computing and data) distributed across a wide-area network of physical resources (hosts).
2. In a single grid-based application, the logical resources are owned by a single agency, while the physical resources are owned by multiple agencies.
3. All resources in a grid are described using a common meta-resource language.
4. Atomic-level logical resources are defined independently of the atomic-level physical resources.
5. The allocation of the atomic-level logical resources to the atomic-level physical resources can be N:M.
6. All computation in a grid is initiated by a client, which is a physical resource. The client sends the logical resources to the servers, which are also physical resources. A server can, in turn, delegate the requested computation to other physical resources.
7. All agencies that own physical resources in a grid must be able to specify policies that enforce the manner in and extent to which their physical resources can be used in grid applications.