Post on 21-Jan-2016
Grid Services
Presented by
Karan Bhatia
2
Hype Curve
3
Overview
• Grid Computing Background
– Definition
– Opportunities
– Markets
• Technical Challenges
– Security Infrastructure
– Resource Management
– Service Interoperability
• Summary
4
Grid Computing is …
• “Co-ordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]
– Co-ordinated - multiple resources working in concert, e.g. disk & CPU, or instruments & database, etc.
– Resources - compute cycles, databases, files, application services, instruments.
– Problem solving - focus on solving scientific problems
– Dynamic - environments that are changing in unpredictable ways
– Virtual Organization - resources spanning multiple organizations and administrative domains, security domains, and technical domains
5
Grid Computing is … (Industry)
• “about finding distributed, underutilized compute resources (systems, desktops, storage) and provisioning those resources to users or applications requiring them.” [The Grid Report, Clabby Analytics]
– Distributed - all the resources lying around in departments or server rooms.
– Underutilized - typical utilization of “big iron” is 5 to 10%. Organizations save money by increasing utilization versus purchasing new resources.
– Resources - servers and server cycles, applications, data resources
– Provisioning - predict and schedule resource use depending on load.
6
Types of Grids…
• Compute Grids
– SETI@home, Entropia, United Devices, Condor
• Data Grids
– Storage Resource Broker (SRB), Avaki, BIRN, GEON
• Collaboration Grids
– Instrumentation (telescience), applications
• Enterprise Grids
– Majority of commercial interest
• Partner Grids
– B2B, Academic/Govt Grids
• Service Grids
– “Utility” Computing, “On Demand”, pervasive, autonomic, etc…
7
A Grid is …
• “the next generation Internet,”
• “all about free cycles à la SETI@home,”
• “a distributed object system,”
• “a new programming model,”
• “a replacement for high performance computing,”
8
Example… TeleScience Grid
• IMAGING INSTRUMENTS
• COMPUTATIONAL RESOURCES
• LARGE-SCALE DATABASES
• DATA ACQUISITION, ANALYSIS
• ADVANCED VISUALIZATION
9
Grid Resources - Networks
10
Grid Resources - Compute
11
Top 500.org
12
13
Another Grid Example … Google
• Queries
– 150 M queries/day (2000/s)
– 100 countries
– 3.3 B documents
• Hardware
– 15,000 Linux systems in 6 data centers
– 15 Tflop/s and 1000 TB total capacity
– 40-80 1U/2U servers per cabinet
– 100 Mb/s Ethernet switches in each cabinet, with gigabit uplinks
– Growth from 4000 systems (18 M queries/day)
14
Grid Resources - Data
• SDSC Resources
– HPSS:
  • SDSC's central long-term data storage system
  • One of the world's largest IBM High Performance Storage System (HPSS) units
  • Currently holds more than a petabyte (a million gigabytes) of data in approximately 21 million files
  • Has the capacity to store six petabytes of data; files are added at an average rate of 10,000 gigabytes per month
– Storage-Area Network (SAN):
  • A 72-processor Sun Microsystems SunFire 15K high-end server and 11 Brocade switches (1,400 ports)
  • 225,000 gigabytes of networked disk storage for data-oriented applications
• 1 TB of data = $2500
15
Protein Data Bank (PDB)
16
Putting it all together… TeraGrid
17
Grid Market
18
Grid Companies
• IBM
– “on demand” solutions
• Sun Microsystems
– N1 initiative
• Oracle
– 10g
• Dell
• HP
– “utility” computing
• Platform Computing
– LSF, metaclustering
• United Devices
– Desktop grids
• DataSynapse
• Akamai
• Google?
• Sony Online Entertainment?
• Where’s Microsoft?
19
Grid Organizations
• Global Grid Forum (GGF)
• Organization for the Advancement of Structured Information Standards (OASIS)
• Distributed Management Task Force (DMTF)
• World Wide Web Consortium (W3C)
• Globus Alliance
• NSF Middleware Initiative (NMI)
• NASA IPG
• DOE Science Grid
• EU DataGrid
• NSF TeraGrid
20
Technical Challenges for Grid Computing
21
Challenges: Security
• Grids traverse organizational boundaries
– Different administration domains have different authentication mechanisms
– Resources have different use agreements and sharing priorities
• Single sign-on
– Multiple passwords difficult to manage
• Rights delegation
• Trust
– Authentication of users
– Authorization of users
– Resource access
22
Security
• Public Key Infrastructure
– Public key A.public
– Private key A.private
• Supports Encryption
– Message from A to B:
  • A computes m’ = F(m, B.public) and sends m’ to B
  • B receives m’ and recovers m = F’(m’, B.private)
• Digital Signatures
– Signed message from A to B:
  • m’ = (m, F(m, A.private))
– Receiver uses A.public to verify that m’ is from A and has not been tampered with
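The two uses of a key pair can be sketched with textbook RSA on tiny primes. This is only an illustration of which key is used where: the primes, the exponent, and all function names are assumptions chosen for readability, there is no padding or hashing, and nothing here is secure or taken from GSI.

```python
# Toy RSA: illustrates key roles only. Never use numbers this small in practice.

def make_keypair(p, q, e=17):
    """Return (public, private) keys built from two small primes."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e (needs Python 3.8+)
    return (e, n), (d, n)

def apply_key(value, key):
    """Modular exponentiation with either half of a key pair."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

A_PUBLIC, A_PRIVATE = make_keypair(61, 53)
B_PUBLIC, B_PRIVATE = make_keypair(89, 97)

def encrypt_for(message, recipient_public):
    # Anyone can encrypt for B using B's public key.
    return apply_key(message, recipient_public)

def decrypt(ciphertext, own_private):
    # Only the holder of the private key can recover the message.
    return apply_key(ciphertext, own_private)

def sign(message, signer_private):
    # The signer transforms the message with its private key.
    return (message, apply_key(message, signer_private))

def verify(signed, signer_public):
    # Anyone can check the signature with the signer's public key.
    message, signature = signed
    return apply_key(signature, signer_public) == message

ciphertext = encrypt_for(65, B_PUBLIC)
recovered = decrypt(ciphertext, B_PRIVATE)
signed = sign(65, A_PRIVATE)
```

Real systems sign a hash of the message rather than the message itself, and messages must be smaller than the modulus for this toy version to work.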
23
Grid Security Infrastructure (GSI)
• A central concept in GSI authentication is the certificate.
• Every user and service on the Grid is identified via a certificate, a text file containing the following information:
– a subject name identifying the person or object that the certificate represents,
– the public key belonging to the subject,
– the identity of a Certificate Authority (CA) that has signed the certificate to certify that the public key and the identity both belong to the subject,
– the digital signature of the named CA.
24
Proxy Certificate
• A proxy consists of a new certificate with a new public and private key.
• The new certificate contains the owner's identity modified slightly to indicate that it is a proxy.
• The new certificate is signed by the owner rather than a CA.
– Strictly, this is not a self-signed certificate: trust chains from the CA through the owner's certificate to the proxy.
• The certificate also includes a time notation after which the proxy should no longer be accepted by others.
• Proxies have limited lifetimes in order to minimize the security vulnerability.
• Because the proxy isn't valid for very long, it doesn't have to be kept quite as secure as the owner's private key.
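The acceptance rule for a proxy can be sketched as a walk up the issuer chain with an expiry check at each step. This is a simplified model under stated assumptions: the `Cert` class, the subject strings, and `chain_is_valid` are hypothetical, and "signed by" is modelled only by matching issuer names, with no cryptographic verification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Cert:
    subject: str         # who the certificate identifies
    issuer: str          # subject name of whoever signed it
    not_after: datetime  # time after which it must be rejected

def chain_is_valid(cert, certs_by_subject, trusted_cas, now):
    """Follow issuers upward until a trusted CA is reached.

    Fails if any certificate on the way has expired, if an issuer is
    unknown, or if the chain loops back on itself.
    """
    seen = set()
    while True:
        if now > cert.not_after:
            return False
        if cert.issuer in trusted_cas:
            return True
        if cert.subject in seen:
            return False
        seen.add(cert.subject)
        parent = certs_by_subject.get(cert.issuer)
        if parent is None:
            return False
        cert = parent

# Hypothetical example: a user certificate signed by a CA,
# and a 12-hour proxy signed by the user.
start = datetime(2004, 1, 1, 12, 0)
user = Cert("/O=Grid/CN=Alice", "/O=Grid/CN=Example CA", start + timedelta(days=365))
proxy = Cert("/O=Grid/CN=Alice/CN=proxy", "/O=Grid/CN=Alice", start + timedelta(hours=12))
known = {user.subject: user}
cas = {"/O=Grid/CN=Example CA"}
```

The short lifetime shows up directly: the same proxy that validates at `start` is rejected thirteen hours later, while the user certificate is still good.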
25
Mutual Authentication
26
Additional Challenges
• Certificate Management
– MyProxy
• Role-based Access Control
– CAS, VOMS
• Authorization services
• Integration with applications & Portals
27
Challenges: Resource Management
• Resources loosely-coupled
– Higher network latencies
– Planned and unplanned disruptions
• How to provide QoS guarantees?
• Case Study: Entropia Desktop Grids
– Additional trust/security issues
29
Entropia 1: GIMPS
• Over 1.5 billion CPU hours served
• 300,000+ machines, over 4 years operational
• Every PC and hardware config imaginable (proc, memory, disk, etc.)
• Every networking hookup imaginable
• Found 35th, 36th, 37th, 38th, and 39th Mersenne Primes
30
Entropia 2: FightAids@home
• Sept 2000 launch
• Internet-based
• 54,657 total machines
• 10,770,506 total hours of computation
• 27,881 peak billions of calculations/sec
31
Entropia 3: DCGrid
• Enterprise focus
– Tremendous resources available in enterprise
– Complements other HPC resources
• Computing Platform
– Arbitrary application (open scheduling model)
– Security, unobtrusiveness, manageability guaranteed
• Focus on
– Pharmaceuticals, Chemicals, and Materials
– Financial Services
32
DCGrid Architecture
35
Server vs. Desktop Grids
• Server environment
– Fixed IP, always connected
– Always-on operation
– Moderate number of systems (10’s – 100’s)
– Dedicated use, trusted systems
• Desktop environment
– Dynamic, temporary IP, intermittent connection
– Off evenings, off weekends, off lunch
– Large numbers of systems (100’s – 1000’s - ?)
– Shared resources, potentially untrusted users
• These differences give rise to desktop Grid challenges
36
Typical PC-Grid Environment
[Figure: plot against Time (hours), x-axis from 552 to 720, y-axis from 0 to 700; y-axis label not recoverable.]
37
PC-Grid Challenges
• Provide a stable compute environment for apps
– Isolate app from variable desktop environment
• Operate in environment of dynamic use
– Unobtrusiveness and Fault Tolerance are key!
• Provide simple application integration
– Support ANY Application without modification
• Provide centralized management console
– Zero additional management costs
38
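[Figure: layered architecture diagram. Layers: Job Management (Job Manager), Resource Scheduling (Subjob Scheduler), Physical Node Management (Node Manager). Also shown: End-user, Entropia Clients, and labelled flows for computation, resource, resource description, and numbered workflow steps.]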
39
Stable Compute Environment
• Entropia Proprietary Sandbox
– Binary-level protection
– System virtualization (registry, file system, network)
• Open Scheduling Infrastructure
– Intelligent scheduling (match resources to subjob requirements)
– Manage subjob redundancy/fault tolerance
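Matching resources to subjob requirements, with redundant placement for fault tolerance, might look like the sketch below. It is not Entropia's scheduler: the `Client` and `Subjob` fields, the `schedule` function, and the least-loaded-first policy are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Client:
    name: str
    memory_mb: int
    disk_mb: int

@dataclass
class Subjob:
    name: str
    min_memory_mb: int
    min_disk_mb: int

def schedule(subjobs, clients, redundancy=2):
    """Place each subjob on up to `redundancy` distinct eligible clients.

    A client is eligible if it meets the subjob's memory and disk needs.
    Among eligible clients, the least-loaded ones are chosen first.
    """
    load = {c.name: 0 for c in clients}
    plan = {}
    for subjob in subjobs:
        eligible = [
            c for c in clients
            if c.memory_mb >= subjob.min_memory_mb
            and c.disk_mb >= subjob.min_disk_mb
        ]
        eligible.sort(key=lambda c: load[c.name])  # stable sort keeps input order on ties
        chosen = eligible[:redundancy]
        for c in chosen:
            load[c.name] += 1
        plan[subjob.name] = [c.name for c in chosen]
    return plan

clients = [Client("pc1", 256, 1000), Client("pc2", 512, 2000), Client("pc3", 1024, 4000)]
subjobs = [Subjob("dock-1", 512, 1000), Subjob("dock-2", 128, 500)]
plan = schedule(subjobs, clients)
```

Running each subjob on two clients means a desktop that is switched off mid-run does not force a restart from scratch; the cost is duplicated work when both copies finish.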
40
Manage Dynamic Use
• PC primary use must be respected!
• Entropia Proprietary Sandbox
– Guaranteed to run at idle priority
– Limit application capability
– Monitor page faults, network access
• Management
– Provide time-of-use windows
– Different levels of unobtrusiveness
• Gathers 95+ % of cycles
41
Application Integration
• Support any Win32 binary
– Language Neutral (C, C++, Fortran, Java, C#, etc.)
– Compiler/library Neutral
[Figure: applications (App A, App B, App C) pass through Application Preparation Tools onto the Open Grid Platform, which runs them on clients (Client1, Client2, …); applications are run with qsub, qstat, …]
42
Manageability
43
Application Performance
[Figure: four throughput plots against Number of Clients, covering the applications HMMER, AUTODOCK, GOLD and DOCK. Y-axes: Sequences per hour (series: Entropia, 1 CPU SGI, 1 CPU SUN, Linear (Entropia)), Throughput (Packets per Hour), and Compounds per Hour (two plots).]
44
Scheduling Performance
[Figure: Job 14 Nodes (94 clients). Client ID (0 to 100) plotted against Time (secs, 0 to 21600).]
45
Challenges: Service Interoperability
• Trying to force homogeneity on users is futile. Everyone has their own preferences, sometimes even dogma.
• The Internet provides the model…
46
Typical Application
[Figure: layered application diagram with four tiers.]
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces
Components shown: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, Web Portal, Registration Service, Credential Repository, Certificate authority, Data Catalog, Compute Server (×2), Database service (×3), Camera (×2)
47
Typical Application
• Implementations are provided by a mix of
– Application-specific code
– “Off the shelf” tools and services
– Tools and services from the Globus Toolkit
– Tools and services from the Grid community (compatible with GT)
• Glued together by…
– Application development
– System integration
48
How it Really Happens (without the Grid)
[Figure: layered application diagram with four tiers and callouts A to E.]
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces
Components shown: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, Web Portal, Registration Service, Credential Repository, Certificate authority, Data Catalog, Compute Server (×2), Database service (×3), Camera (×2)
Component counts by provider:
• Grid Community: 0
• Globus Toolkit: 0
• Off the Shelf: 13
• Application Developer: 9
49
How it Really Happens (with the Grid)
[Figure: layered application diagram with four tiers.]
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces
Components shown: Web Browser, Data Viewer Tool, portlet, Simulation Tool, Telepresence Monitor, Portal, MyProxy, Certificate Authority, Globus MCS/RLS, Globus Index Service, Globus GRAM (×2), Globus DAI (×3), Compute Server (×2), Database service (×3), Camera (×2)
Component counts by provider:
• Grid Community: 4
• Globus Toolkit: 4
• Off the Shelf: 9
• Application Developer: 2
50
Theory -> Practice
51
What You Get in the Globus Toolkit
• OGSI (3.x)/WSRF (4.x) Core Implementation
– Used to develop and run OGSA-compliant Grid Services (Java, C/C++)
• Basic Grid Services
– Popular among current Grid users, common interfaces to the most typical services; includes both OGSA and non-OGSA implementations
• Developer APIs
– C/C++ libraries and Java classes for building Grid-aware applications and tools
• Tools and Examples
– Useful tools and examples based on the developer APIs
52
Components in Globus Toolkit 3.0
[Figure: component chart with columns Security, Data Management, Resource Management, Information Services, WS Core.]
Components shown: GSI, WS-Security, RFT (OGSI), RLS, WU GridFTP, Java WS Core (OGSI), OGSI C Bindings, MDS2, WS-Index (OGSI), Pre-WS GRAM, WS GRAM (OGSI)
Components in Globus Toolkit 3.2
[Figure: component chart with columns Security, Data Management, Resource Management, Information Services, WS Core.]
Components shown: GSI, WS-Security, CAS (OGSI), SimpleCA, RFT (OGSI), RLS, OGSI-DAI, WU GridFTP, XIO, Java WS Core (OGSI), OGSI C Bindings, MDS2, WS-Index (OGSI), Pre-WS GRAM, WS GRAM (OGSI), OGSI Python Bindings (contributed), pyGlobus (contributed)
54
Planned Components in GT 4.0
[Figure: component chart with columns Security, Data Management, Resource Management, Information Services, WS Core.]
Components shown: GSI, WS-Security, CAS (WSRF), SimpleCA, Authz Framework, RFT (WSRF), RLS, OGSI-DAI, New GridFTP, XIO, Java WS Core (WSRF), C WS Core (WSRF), MDS2, WS-Index (WSRF), Pre-WS GRAM, WS-GRAM (WSRF), CSF (contribution), pyGlobus (contributed)
55
Grid and Web Services Convergence
The definition of WSRF means that the Grid and Web services communities can move forward on a common base.
Grid Services Example
• (from the Sotomayor tutorial)
• MathService API:
– add(int x)
– subtract(int x)
– getValue()
Note 1: How is this different from
– Web Services?
– CORBA?
– COM/DCOM?
Note 2: This is too simple! What about
– co-ordination/workflows
– personalization
– presentation
– security
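As a point of reference, the MathService interface can be sketched as an ordinary in-process class. This is not the tutorial's implementation and uses no Globus API; it only assumes that the service keeps a running integer value, which is what makes each instance stateful.

```python
class MathService:
    """In-process stand-in for the MathService interface.

    Each instance keeps its own running value, so two clients using
    two instances do not interfere with each other.
    """

    def __init__(self):
        self._value = 0

    def add(self, x: int) -> None:
        self._value += x

    def subtract(self, x: int) -> None:
        self._value -= x

    def get_value(self) -> int:
        return self._value

first = MathService()
first.add(5)
first.subtract(2)

second = MathService()
second.add(10)
```

The per-instance state is the part that a plain stateless web service does not give you for free, and it is what the grid service machinery (naming, factories, lifetime) is built around.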
OGSI (or what is a grid service?)
• Using web service infrastructure
– MathService is defined by WSDL (like IDL)

<?xml version="1.0" encoding="UTF-8"?>
...
<types>
  <xsd:schema targetNamespace="http://www.gt3tutorial.org/namespaces/0.2/core/gwsdl/Math"
      attributeFormDefault="qualified"
      elementFormDefault="qualified"
      xmlns="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="add">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="addResponse">
      <xsd:complexType/>
    </xsd:element>
    ...
</types>

<message name="AddInputMessage">
  <part name="parameters" element="tns:add"/>
</message>
<message name="AddOutputMessage">
  <part name="parameters" element="tns:addResponse"/>
</message>
...

<gwsdl:portType name="MathPortType" extends="ogsi:GridService">
  <operation name="add">
    <input message="tns:AddInputMessage"/>
    <output message="tns:AddOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="subtract">
    <input message="tns:SubtractInputMessage"/>
    <output message="tns:SubtractOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="getValue">
    <input message="tns:GetValueInputMessage"/>
    <output message="tns:GetValueOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
</gwsdl:portType>

</definitions>
Basic Concepts
The GridService PortType
• a “grid service” is a web service that implements the GridService PortType

<portType name="GridService">
  <operation name="setServiceData"> [snip] </operation>
  <operation name="destroy"> [snip] </operation>
  <operation name="requestTerminationAfter"> [snip] </operation>
  <operation name="requestTerminationBefore"> [snip] </operation>
  <operation name="findServiceData"> [snip] </operation>
</portType>

<gwsdl:portType name="GridService">
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="constant" name="interface" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false"
      mutability="mutable" name="serviceDataName" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false"
      mutability="mutable" name="factoryLocator" nillable="true" type="ogsi:LocatorType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false"
      mutability="extendable" name="gridServiceHandle" nillable="false" type="ogsi:HandleType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="mutable" name="gridServiceReference" nillable="false" type="ogsi:ReferenceType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="static" name="findServiceDataExtensibility" nillable="false"
      type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false"
      mutability="static" name="setServiceDataExtensibility" nillable="false"
      type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false"
      mutability="mutable" name="terminationTime" nillable="false"
      type="ogsi:TerminationTimeType"/>
  <sd:staticServiceDataValues>
    <ogsi:findServiceDataExtensibility inputElement="ogsi:queryByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:setByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:deleteByServiceDataNames"/>
  </sd:staticServiceDataValues>
</gwsdl:portType>
GridService PortType
• FindServiceData()
• QueryByServiceDataNames()
• GetServiceData()
• SetByServiceDataNames()
• DeleteByServiceDataNames()
• RequestTerminationAfter()
• RequestTerminationBefore()
• Destroy()
Capabilities of a Grid Service
• 2-level naming (GSH vs. GSR)
• Factories
• Lifetime management
• Service Data Elements
• Event Notification
• ServiceGroups
GSH versus GSR
• A GSH (Grid Service Handle) is a unique name for a Grid Service Instance
• A GSR (Grid Service Reference) is a perhaps temporary mechanism to access the Grid Service Instance
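The split between a permanent name and a temporary access mechanism can be sketched as a lookup table with expiry. The `HandleResolver` class below, its method names, and the example handle and reference strings are all hypothetical; they illustrate the two-level idea and are not the OGSI interface.

```python
class HandleResolver:
    """Maps stable handles (GSH-like) to expiring references (GSR-like)."""

    def __init__(self):
        self._bindings = {}

    def bind(self, handle, reference, expires_at):
        # Rebinding the same handle replaces the old reference.
        self._bindings[handle] = (reference, expires_at)

    def resolve(self, handle, now):
        # Raises KeyError for an unknown handle.
        reference, expires_at = self._bindings[handle]
        if now >= expires_at:
            raise LookupError("reference expired; resolve again after rebinding")
        return reference

resolver = HandleResolver()
handle = "gsh://example.org/math/instance-1"  # hypothetical handle
resolver.bind(handle, "http://host-a.example.org:8080/math", expires_at=100)
first_ref = resolver.resolve(handle, now=10)

# The instance moves to another host: the handle stays the same,
# only the reference changes.
resolver.bind(handle, "http://host-b.example.org:8080/math", expires_at=200)
second_ref = resolver.resolve(handle, now=150)
```

Clients hold on to the handle and re-resolve when a reference stops working, which is what lets a service instance migrate or change protocol without being renamed.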
Factories
• Create new instances of services dynamically
• Individualized Instances
• Lifetime management techniques
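Factories and lifetime management fit together roughly as in the sketch below: a factory creates instances on request, each with a termination time, and instances that are not kept alive are reclaimed. The `Factory` class, its method names, and the integer clock are assumptions for illustration rather than the OGSI Factory portType.

```python
import itertools

class Factory:
    """Creates service instances with a termination time and reclaims expired ones."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._termination = {}  # handle -> termination time

    def create(self, now, lifetime):
        handle = f"instance-{next(self._ids)}"
        self._termination[handle] = now + lifetime
        return handle

    def extend(self, handle, new_termination_time):
        # A client that still needs the instance pushes its termination time out.
        current = self._termination[handle]
        self._termination[handle] = max(current, new_termination_time)

    def destroy(self, handle):
        # Explicit destruction, for clients that finish early.
        del self._termination[handle]

    def sweep(self, now):
        # Soft state: anything past its termination time is removed.
        expired = [h for h, t in self._termination.items() if t <= now]
        for handle in expired:
            del self._termination[handle]
        return expired

    def live(self):
        return sorted(self._termination)

factory = Factory()
short = factory.create(now=0, lifetime=10)
long_lived = factory.create(now=0, lifetime=100)
removed = factory.sweep(now=20)
factory.extend(long_lived, 200)
removed_later = factory.sweep(now=150)
```

Because expiry is the default, a client that crashes never has to clean up: its instances simply time out.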
Service Data Elements
• Generalized State
– useful for describing capability
– Get/Set model similar to JavaBeans properties
• Can specify initial values in WSDL
• Integrated with Notification mechanism
Service Data Elements: GridService
• Interface
• ServiceDataName
• FactoryLocator
• GridServiceHandle
• GridServiceReference
• TerminationTime
Notifications
• Source
– implements NotificationSourcePortType
– sends a notification message (XML Element) to Sinks
• Sink
– implements NotificationSinkPortType
– sends a notification subscription request to source
– causes a GridService Instance of porttype NotificationSubscription to be created
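The source and sink roles, tied to service data changes, can be sketched as a small observer pattern. The class and method names below are hypothetical stand-ins chosen to echo the portType names; there is no XML messaging, and a subscription here is just a list entry rather than a separate service instance.

```python
class Sink:
    """Receives notification messages from sources it has subscribed to."""

    def __init__(self):
        self.received = []

    def deliver_notification(self, name, value):
        self.received.append((name, value))

class Source:
    """Holds service data and notifies subscribed sinks when it changes."""

    def __init__(self):
        self._data = {}
        self._subscriptions = {}  # service data name -> list of sinks

    def subscribe(self, name, sink):
        self._subscriptions.setdefault(name, []).append(sink)

    def set_service_data(self, name, value):
        self._data[name] = value
        for sink in self._subscriptions.get(name, []):
            sink.deliver_notification(name, value)

    def find_service_data(self, name):
        return self._data[name]

source = Source()
sink = Sink()
source.subscribe("value", sink)
source.set_service_data("value", 5)
source.set_service_data("unwatched", 1)  # no subscriber, so nothing is delivered
source.set_service_data("value", 8)
```

Subscribing by service data name means a sink only hears about the state it asked for, and still can query the current value at any time.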
ServiceGroups
• A grid service that maintains information about other grid services
• Can be used to implement a classic registry model
• Can be used for dataset replication
• A grid service can belong to more than one ServiceGroup
• Membership in a ServiceGroup can be homogeneous or heterogeneous
• ServiceGroup portTypes are optional
Grid Services: Summary
• Extends Web Services to support Transient Services
– WSDL 1.2 expected to include extensions
• Requires support for factories, lifetime management, soft-state management, and notifications
• Java implementation pretty solid
– Security implementation still shaky
69
Other Challenges
• Developing user interfaces
• Data Management
• Scheduling/co-scheduling of resources
• Failure management
• Application development
• Performance
• Many others…
70
What I hope you got from this talk
• Grid Computing is about
– Co-ordinated use of different resources
– Provisioning resources for increased utilization
– Scaling to large numbers of resources, services and users
• Many systems being built
• Many Applications being developed