Overview of Grids, Services Oriented Architectures and Science Portals

Sriram Krishnan, PhD [email protected]


Outline

• What is Grid Computing?

• What are Services Oriented Architectures?

• What are Science Portals?

• How are all the pieces tied together?

• Some case studies

Cluster Computing

• Independent computers combined into a unified system through software and networking
• Typical Setup
  – Collection of commodity computers (PCs)
  – Using a commodity network (Ethernet)
  – Typically running an open-source operating system (Linux)
• Interconnect
  – Gigabit Ethernet (commodity)
    • High Latency
    • Cheap
  – Myrinet, InfiniBand, … (non-commodity)
    • Low Latency
    • OS-bypass
    • Expensive
  – Programming model is Message Passing
• History
  – Network Of Workstations (NOW) pioneered the vision for clusters of commodity processors
  – Beowulf popularized the notion and made it very affordable
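The message-passing programming model named above can be illustrated with a small sketch. This is an in-process stand-in that uses threads and queues from the Python standard library; a real cluster would typically use a message-passing library across separate nodes. The names `worker` and `parallel_sum` are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers, compute a partial sum, send it back."""
    chunk = inbox.get()               # blocking receive
    outbox.put((rank, sum(chunk)))    # send the result to the coordinator


def parallel_sum(data, n_workers=4):
    """Scatter `data` across workers, then reduce their partial sums."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], outbox))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker `rank` gets every n_workers-th element starting at rank.
    for rank, inbox in enumerate(inboxes):
        inbox.put(data[rank::n_workers])
    # Gather and reduce: exactly one message arrives per worker.
    total = sum(outbox.get()[1] for _ in range(n_workers))
    for t in threads:
        t.join()
    return total
```

The workers never touch shared data directly; all coordination happens through explicit send and receive operations, which is the property that lets the same structure span machines connected only by a network.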

© 2008 UC Regents

High Performance Computing Cluster

[Figure: cluster diagram. Labels: Front-end Node; Public Ethernet; Private Ethernet Network; Application Network (Optional); Nodes; Power Distribution (net addressable units as option)]

Clusters now Dominate High-End Computing

http://www.top500.org/charts/list/29/archtype/

Grid Computing

• “Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]
  – Coordinated - multiple resources working in concert, e.g. disk & CPU, or instruments & database, etc.
  – Resources - compute cycles, databases, files, application services, instruments
  – Problem solving - focus on solving scientific problems
  – Dynamic - environments that are changing in unpredictable ways
  – Virtual Organization - resources spanning multiple organizations and administrative domains, security domains, and technical domains

Other Terms

• Cyberinfrastructures
  – Encompasses advanced scientific computing, as well as a more comprehensive infrastructure for research and education based upon distributed, federated networks of computers, information resources, on-line instruments, and human interfaces (Atkins Report, 2003)
• eScience
  – Computationally intensive science that is carried out in highly distributed network environments (e.g. in the context of the U.K. eScience program)

Grids are not the same as Clusters!

• Ian Foster’s 3 point checklist
  – Resources not subject to centralized control
  – Use of standard, open, general-purpose protocols and interfaces
  – Delivery of non-trivial qualities of service
• Grids are typically made up of multiple clusters

Popular Misconception

• Misconception: Grids are all about CPU cycles
  – CPU cycles are just one aspect, others are:
    • Data: For publishing and accessing large collections of data, e.g. Geosciences Network (GEON) Grid
    • Collaboration: For sharing access to instruments (e.g. UCSD TeleScience Grid), and collaboration tools (e.g. Global MMCS at IU)

How do you build a “Grid”?

• Start with raw hardware,
• Add storage
• and networks,
• Mix in scientific datasets,
• Build collaboratory and visualization tools
• How do you manage, provision, schedule, authenticate, monitor, program, and access these resources?

SETI@Home

• Uses 1000s of Internet-connected PCs to help in the search for extraterrestrial intelligence
• When the computer is idle, the software downloads a ~1/2 MB chunk of data for analysis
• Results of the analysis are sent back to the SETI team and combined with those of 1000s of other participants
• Largest distributed computation project in existence
  – Total CPU time: 2433979.781 years
  – Users: 5436301
• Statistics from 2006
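The download, analyze, and combine cycle described above can be sketched in a few lines. This is a toy illustration only: the function names are hypothetical, the "analysis" is just a search for the strongest sample, and none of it reflects the actual SETI@Home client or its signal processing.

```python
def make_work_units(samples, unit_size):
    """Split a long recording into fixed-size chunks for volunteers."""
    return [samples[i:i + unit_size] for i in range(0, len(samples), unit_size)]


def analyze(unit_id, unit):
    """Toy analysis run on a volunteer PC: find the strongest sample."""
    peak = max(unit)
    return {"unit": unit_id, "peak": peak, "offset": unit.index(peak)}


def combine(results, unit_size):
    """Merge per-unit results into (global sample index, peak value)."""
    best = max(results, key=lambda r: r["peak"])
    return best["unit"] * unit_size + best["offset"], best["peak"]


samples = [0, 1, 0, 2, 9, 1, 0, 3, 0, 1]
units = make_work_units(samples, unit_size=4)
results = [analyze(i, unit) for i, unit in enumerate(units)]
strongest = combine(results, unit_size=4)
```

Because each work unit is analyzed independently, the units can be handed to any number of idle machines in any order, and only the small result records travel back to the central team.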

NCMIR TeleScience Grid

• An ability to dynamically link resources together as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications

[Figure: “Telescience Grid” diagram. Labels: Imaging Instruments; Computational Resources; Large-Scale Databases; Data Acquisition, Analysis; Advanced Visualization]

TeraGrid

TeraGrid is a “top-down”, planned Grid

Extensible Terascale Facility

• Members: IU, ORNL, NCSA, PSC, Purdue, SDSC, TACC, ANL, NCAR

• 280 Tflops of computing capability

• 30 PB of distributed storage

• High performance networking between partner sites

• Linux-based software environment, uniform administration

• Focus is a national, production Grid

PRAGMA Grid Member Institutions

31 institutions in 15 countries/regions (+ 7 in preparation)

• Switzerland: UZurich
• Thailand: NECTEC, ThaiGrid
• India: UoHyd
• Malaysia: MIMOS, USM
• Hong Kong: CUHK
• Taiwan: ASGC, NCHC
• Vietnam: HCMUT, IOIT-HCM
• Japan: AIST, OsakaU, UTsukuba, TITech
• Singapore: BII, IHPC, NGO, NTU
• Australia: MU, APAC, QUT
• Korea: KISTI
• China: JLU, CNIC, GUCAS, LZU
• USA: SDSC, UUtah, NCSA, BU
• Mexico: CICESE, UNAM
• Chile: UCN, UChile
• Costa Rica: ITCR
• New Zealand: BESTGrid
• Puerto Rico: UPRM

Usability Issues

• Access to Grid resources is still very complicated
  – User account creation
  – Management of credentials (identities)
  – Installation and deployment of scientific software
  – Interaction with Grid schedulers
  – Data management

Technical Challenges

• Security
  – Grids traverse organizational boundaries
    • Different administration domains have different authentication mechanisms
    • Resources have different use agreements and sharing priorities
  – Need to provide Single Sign-On (SSO), Authentication, Authorization
• Resource Management
  – Resources loosely-coupled
    • Higher network latencies
    • Planned and unplanned disruptions
  – Requirements
    • Seamless access to Grid resources
    • QoS guarantees for jobs
    • Scheduling/co-scheduling of resources
    • Failure management
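One way to picture the failure-management requirement is a client that tries several loosely-coupled resources in turn. The sketch below is a simplified assumption, not how any particular Grid middleware behaves; `ResourceUnavailable`, `run_with_failover`, and the submit callables are hypothetical names.

```python
class ResourceUnavailable(Exception):
    """Raised by a submit function when its resource cannot accept the job."""


def run_with_failover(job, resources):
    """Try each (name, submit) pair in order; return the first success.

    `resources` is a list of (resource name, callable taking the job).
    """
    errors = []
    for name, submit in resources:
        try:
            return name, submit(job)
        except ResourceUnavailable as exc:
            # Record the failure and fall through to the next resource.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all resources failed: " + "; ".join(errors))


def offline_cluster(job):
    raise ResourceUnavailable("scheduler not responding")


def healthy_cluster(job):
    return f"ran {job}"


outcome = run_with_failover(
    "blast", [("clusterA", offline_cluster), ("clusterB", healthy_cluster)]
)
```

A production system would also need to distinguish transient from permanent failures and to avoid running the same job twice, which this sketch does not attempt.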

Technical Challenges

• Data Management
  – Data Transfer
    • GridFTP: High-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks
  – Managing large-scale scientific data across different sites
    • Storage Resource Broker (SRB): Shared collections that can be distributed across multiple organizations and heterogeneous storage systems

Technical Challenges

• Interoperability
  – In the past, different projects used different protocols and APIs
    • Legion, Condor, Globus, SGE, etc.
  – Need to use standard, open mechanisms
    • Current thrust towards the use of Service Oriented Architectures and Web service technologies for interoperability

Service Oriented Architectures (SOAs)

• “SOA represents a model in which functionality is decomposed into small, distinct units (services), which can be distributed over a network and can be combined together and reused to create applications” - Erl, Thomas (2005). Service-Oriented Architecture: Concepts, Technology, and Design.

Benefits of SOAs

• Reduce complexity by encapsulating the back-end implementation
  – Service interfaces can be published and used by a number of clients
• Enable interoperability across systems through the use of open standards
  – Web services (WSDL, SOAP, XML Schemas) are de facto standards
  – Lend themselves well to the creation of workflows
• Support a loosely-coupled model where clients can bind to services at run-time
  – Enables greater flexibility and fault tolerance

What are Web Services?

• Many different definitions are available
• IBM (Gottschalk, et al): A Web service is an interface that describes a collection of operations that are network accessible through standardized XML messaging.
• Microsoft (on MSDN): A programmable application logic accessible using standard Internet protocols.
• Simply put, a Web service is a network service that provides a programmatic interface to remote clients
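To make "standardized XML messaging" concrete, here is a minimal sketch that builds and parses a request message, assuming SOAP 1.1 framing. The operation name `launchJob`, its parameters, and the `urn:example:app` namespace are hypothetical and are not taken from any service described in these slides.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:app"                             # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[1]


def parse_request(xml_text):
    """Recover (operation name, parameter dict) from a SOAP envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    op = body[0]  # the first child of Body is the operation element
    return local_name(op.tag), {local_name(c.tag): c.text for c in op}


message = build_request("launchJob", {"argList": "-i in.pdb", "numProcs": 4})
operation, arguments = parse_request(message)
```

All values travel as text inside the XML, so the number 4 comes back as the string "4"; in practice the service description tells each side how to interpret the types.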

Web Services: Features

• Independent of programming language and OS
• All information required to contact a service is captured by the Web Service Description
  – Web Services Description Language (WSDL) provides a way to encapsulate an interface definition, data types being used, and the protocol information
• Web services provide programmatic access to remote clients using standard Internet protocols

Web Services Lifecycle

[Figure: Service Provider, Service Registry, and Service Requestor, connected by the operations Publish, Lookup, and Interact]
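The publish, lookup, and interact steps of the lifecycle can be sketched as follows, assuming the usual reading in which a provider publishes to the registry and a requestor looks the service up before calling it. The `ServiceRegistry` class and the `reverse` service are hypothetical, and an in-memory dictionary stands in for a networked registry.

```python
class ServiceRegistry:
    """In-memory stand-in for a registry that maps names to endpoints."""

    def __init__(self):
        self._services = {}

    def publish(self, name, endpoint):
        """Called by a service provider to advertise an endpoint."""
        self._services[name] = endpoint

    def lookup(self, name):
        """Called by a service requestor to find an endpoint at run-time."""
        try:
            return self._services[name]
        except KeyError:
            raise LookupError(f"no service published as {name!r}") from None


def reverse_service(text):
    """Provider-side implementation; the requestor never sees this code."""
    return text[::-1]


registry = ServiceRegistry()
registry.publish("reverse", reverse_service)   # Publish
service = registry.lookup("reverse")           # Lookup
result = service("grid")                       # Interact
```

Because the requestor only knows the published name, the provider can be replaced or relocated without changing client code, which is the loose coupling that SOAs aim for.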

Open Grid Services Architecture

• A standards-based distributed service system that supports the creation of sophisticated distributed services required in inter-organizational computing environments

• The standards are described by a set of specifications called the Web Services Resource Framework (WSRF)

Open Grid Services Architecture

• The evolution of the Grid to an architecture based on prior Grid and Web service technologies
  – Open: Extensibility, Vendor-neutrality, Committed to community standardization
• Use of WSDL to achieve self-describing, discoverable services & interoperable protocols
• Support for reliable & secure invocation, lifetime management, notification, policy & credential management, and virtualization

Open Grid Services Architecture

From Theory to Practice

SOAs in eScience

• Grid and scientific communities have been adopting SOAs over the past few years
  – Open Grid Services Architecture (OGSA)
  – Web Services Resource Framework (WSRF)
• However, in general, most past efforts have focused on middleware, and not science
  – For instance, the Globus Toolkit
  – More recently, there are several efforts to build infrastructures for Services Oriented Science*

* I. Foster. “Service-Oriented Science”. Science, 2005

Application-level Services

• Traditional model: Services for middleware tools, e.g. job launch, data transfer, etc.
• Current trend: “Services Oriented Science”
  – Scientific applications as first class services
  – Delegation of middleware management to the services back-end
  – End-users are presented with science-oriented, and not middleware-oriented interfaces

Enabling Multiple User Interfaces

• Gemstone: http://gemstone.mozdev.org
• ADT: http://mgltools.scripps.edu
• GridSphere: http://www.gridsphere.org
• Kepler: http://kepler-project.org

What is a Web Portal?

• Web portals aggregate information content from diverse sources, and present them in a unified way
• Traditional Model
  – Monolithic websites, all information content co-located on central server
• Current Trend
  – Information content geographically distributed, and implemented as an SOA
  – Portals provide a single point of entry, by aggregating geographically distributed resources

What is a Web Portal?

• “A portal is a web based application that commonly provides personalization, single sign on, content aggregation from different sources and hosts the presentation layer of Information Systems” (JSR 168)
• Grid/Science Portals build upon the familiar Web portal model, such as Yahoo or Amazon, to deliver the benefits of Grid computing to virtual communities of users, providing a single access point to Grid services and resources.

Portals: Pros & Cons

• Pros
  – Single point of entry to diverse information sources
  – Ubiquitous access to applications (browser based)
  – No need to install complex software
• Cons
  – Limited interaction with local desktop tools
  – Interfaces may not be rich enough for complex tasks such as visualization
  – Not very easy to make highly interactive interfaces

Portal Technology

• JSR 168 Portlet API
  – Similar to Servlet API in providing reusable Web applications
  – Ratified in August 2003 by vendors including BEA, Sun, IBM, Oracle, Plumtree, etc.
• GridSphere: http://www.gridsphere.org
  – JSR 168 Compliant
  – Used by several projects at UCSD such as GEON, NEES, NBCR, CAMERA

What is a Portlet?

• Unit of composition for a portal - a portal is simply an aggregation of portlets
• Standardized packaging model to share applications among portal vendors
• Builds off Servlet API and specification so no major surprises for existing Java portal developers
• API provides useful methods for storing per user data and configuration settings
• Can be used as building blocks to aggregate content from disparate information sources
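The idea that a portal is an aggregation of portlets, each with per-user settings, can be sketched in Python. This is an analogy only: it is not the JSR 168 API (which is a Java specification), and every class and method name below is hypothetical.

```python
from html import escape


class Portlet:
    """Hypothetical unit of composition: renders one fragment of a page."""

    title = "Untitled"

    def __init__(self):
        self._prefs = {}  # per-user settings: user -> {key: value}

    def set_pref(self, user, key, value):
        self._prefs.setdefault(user, {})[key] = value

    def get_pref(self, user, key, default=None):
        return self._prefs.get(user, {}).get(key, default)

    def render(self, user):
        raise NotImplementedError


class GreetingPortlet(Portlet):
    title = "Welcome"

    def render(self, user):
        name = self.get_pref(user, "display_name", user)
        return f"<p>Hello, {escape(name)}</p>"


class JobListPortlet(Portlet):
    title = "My Jobs"

    def __init__(self, jobs_by_user):
        super().__init__()
        self._jobs_by_user = jobs_by_user

    def render(self, user):
        jobs = self._jobs_by_user.get(user, [])
        items = "".join(f"<li>{escape(job)}</li>" for job in jobs)
        return f"<ul>{items}</ul>"


class Portal:
    """A portal page is the concatenation of its portlets' fragments."""

    def __init__(self, portlets):
        self._portlets = list(portlets)

    def render_page(self, user):
        return "\n".join(
            f"<section><h2>{escape(p.title)}</h2>{p.render(user)}</section>"
            for p in self._portlets
        )


greeting = GreetingPortlet()
greeting.set_pref("alice", "display_name", "Alice")
portal = Portal([greeting, JobListPortlet({"alice": ["blast-42"]})])
page = portal.render_page("alice")
```

Each portlet knows nothing about its neighbours, so the same job-list fragment could be dropped into a different portal alongside other content.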

Putting it all together: NEES Architecture

Case Study: The NBCR SOA

• Transparent access to distributed resources by grid-enabling biomedical codes and biological and biomedical databases
  – Researchers should be able to harness the computational and data resources without having to worry about the complexity of the back-end infrastructure
• Enable integration of applications across different scales (e.g. atomic to macro-molecular, to cellular and tissue, and so on)
  – With the help of commodity workflow tools and Problem Solving Environments (PSEs)

Approach

• Scientific applications wrapped as Web services
  – Provision of a SOAP API for programmatic access
• Clients interact with application Web services, instead of Grid resources

NBCR SOA: Big Picture

[Figure: architecture diagram. Labels: Web Portals, ADT, Kepler, Continuity; Application Services; State Mgmt; Security Services (GAMA); Globus, DRMAA, Globus; Condor pool, SGE Cluster, PBS Cluster]

Scientific SOA: Benefits

• Applications are installed once, and used by all authorized users
  – No need to create accounts for all Grid users
  – Use of standards-based Grid security mechanisms
• Users are shielded from the complexities of Grid schedulers
• Data management for multiple concurrent job runs performed automatically by the Web service
• State management and persistence for long running jobs
• Accessibility via a multitude of clients
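A rough sketch of how an application service might deliver these benefits: one installed application, a separate working directory per job, and job state that clients can query later. This is not the NBCR implementation; the `ApplicationService` class and its methods are hypothetical, a thread pool stands in for a Grid scheduler, and state is kept in memory rather than persisted.

```python
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


class ApplicationService:
    """Hypothetical wrapper exposing one application via launch/query calls."""

    def __init__(self, application, max_workers=2):
        self._application = application  # callable: input text -> output text
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}                  # job_id -> (future, working directory)

    def launch_job(self, input_text):
        """Start a run in its own working directory and return a job ID."""
        job_id = uuid.uuid4().hex
        workdir = Path(tempfile.mkdtemp(prefix=f"job-{job_id[:8]}-"))
        future = self._pool.submit(self._run, workdir, input_text)
        self._jobs[job_id] = (future, workdir)
        return job_id

    def _run(self, workdir, input_text):
        # Outputs of concurrent jobs never collide: each has its own directory.
        (workdir / "stdout.txt").write_text(self._application(input_text))

    def query_status(self, job_id):
        future, _ = self._jobs[job_id]
        if not future.done():
            return "RUNNING"
        return "FAILED" if future.exception() else "DONE"

    def get_output(self, job_id):
        future, workdir = self._jobs[job_id]
        future.result()  # waits for completion; re-raises application errors
        return (workdir / "stdout.txt").read_text()


service = ApplicationService(str.upper)  # toy "application": uppercase the input
first = service.launch_job("acgt")
second = service.launch_job("ttaa")
outputs = (service.get_output(first), service.get_output(second))
```

The client only ever holds a job ID, so it never needs an account on, or knowledge of, the machine where the application actually runs.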

Web Portal based Access

Scientific Workflows

• Need for automation of scientific processes
  – An end-to-end application is typically more than a single application run

• Must be reproducible and maintainable

• Should be easy to compose from individual components
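The three requirements above (automation, reproducibility, easy composition) can be illustrated with a very small workflow runner. This is a generic sketch and does not reflect how Kepler or Vision represent workflows; `run_workflow` and the step names are hypothetical.

```python
def run_workflow(steps, initial):
    """Run named steps in order, feeding each output into the next step.

    Returns the final value and a log of (step name, output) pairs, which
    records enough to re-check or reproduce each stage of the run.
    """
    value = initial
    log = []
    for name, func in steps:
        value = func(value)
        log.append((name, value))
    return value, log


def count_gc(sequence):
    """Toy analysis step: number of G and C bases in a sequence."""
    return sequence.count("G") + sequence.count("C")


# Compose an end-to-end run from individual components.
steps = [("strip", str.strip), ("upper", str.upper), ("count_gc", count_gc)]
final, log = run_workflow(steps, "  acgtgc \n")
```

Because the workflow is just an ordered list of named components, swapping or reordering steps does not require touching the components themselves.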

Molecular Visualization Using the Vision Workflow Toolkit

Bioinformatics Workflows Using Kepler

Conclusions

• Grid computing provides coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations
• Service Oriented Architectures (SOAs) provide a model in which functionality is decomposed into small, distinct services, which can be distributed over a network and can be combined together and reused to create applications
  – Grid computing and eScience are moving towards SOAs
• Web portals aggregate information content from diverse sources that are implemented as SOAs, and present them in a unified way
  – Services can also be accessed via a multitude of other clients

Questions?