"One can't believe impossible things" UK OGSA Evaluation Project (UCL, Imperial, Newcastle,...

"One can't believe impossible things"

UK OGSA Evaluation Project

(UCL, Imperial, Newcastle, Edinburgh)

(Full list of project members)

Paul Brebner

University College London

[email protected]

"Grid middleware is easy to install, configure, secure, debug and manage - across multiple sites"

Grid Complexity – The Grid will be BIG

Grid Complexity - growing

Grid Complexity – built on the internet

Grid Complexity – but more complex

Grid Simplicity – Start with something simple

• OGSA
– OGSI

• GT3.2 – exemplar of a Grid SOA

• Initially evaluate installation, configuration, and security

• Then performance and scalability, deployment, architectural choices, etc.

Grid Realism – But realistic test-bed

• Heterogeneous platforms
– Linux, Solaris, Windows

• Cross-organisational
– Four nodes
– Independently administered
– Firewalls and access restrictions

• Security
– UK e-Science CA

Grid Confusion – What is Globus?

• How is Globus intended to be used?
– 1: Science as first-order services: Middleware for building and hosting Grid Applications, by exposing science code as Grid services.
– 2: Middleware as services: As a set of high level Grid services, composed to provide new Grid functionality. Science isn’t a first-order service, but is managed by Grid services.

Grid Confusion – Science services or Grid services

[Diagram, built up over three slides: a Client and the two usage options. (1) Science code (E=mc², D=A+2B+C2) exposed directly as Grid services. (2) Grid services that manage the same science code.]

Grid Confusion – How to evaluate

• Do we evaluate GT3 as middleware for hosting Grid services, or as a toolkit for constructing Grid middleware?

• If the first, only need GT3 Core – just the container. If the second, need “All Services” (and more – there’s no scheduler).

Grid Simplicity – Incremental

• Start with Core Package

• Add Security

• Then try “All Services”

• Simple enough – in theory

Grid Steps – single node

[Diagram, built up over four slides: Install, Configure, Deploy, Run; GT3 installed on OS/HW.]

Grid Steps – Multiple sites

[Diagram, built up over five slides: four GT3 nodes at multiple sites; steps added in turn: Interoperate, Secure, Manage.]

Grid Reality – What we found

• Port number management
• Host access
• Remote visibility of installation, container, services
• Installation by System Administrators
• Tomcat or Test container
• Compilation issues on Solaris
• Exponential increase in testing complexity as number of nodes increases.

Grid Reality – What we found

• Port number management
– Port number conflicts (with other services)
– What port is the container running on?
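A port conflict can be detected before the container is started. The following is a minimal sketch only: the helper name `port_is_free` is hypothetical, and it checks a single local interface using nothing but the Python standard library.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port.

    Hypothetical helper: it tries to bind a throwaway TCP socket and
    treats any OSError (typically "address already in use") as a conflict.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    # 8080 is used here purely as an illustrative port number.
    print("port 8080 free:", port_is_free(8080))
```

The check is only a snapshot: another service can still claim the port between the check and the container start.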

Grid Reality – What we found

• Host access
– Is the container visible on that port externally?
– From which machines?
– For which users?
– Non-trivial to test/debug if/when something goes wrong
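The first two questions can at least be probed at the TCP level. A minimal sketch, assuming plain TCP reachability is a useful first test (it says nothing about which users are authorised); the function name `can_connect` and the host name in the usage line are hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result does not say why: the container may be down, the port
    may be wrong, or a firewall may be dropping the traffic.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace with a real test-bed node.
    print(can_connect("grid-node.example.org", 8080))
```

Because visibility depends on where the call originates, the probe has to be run from each client machine in turn, not just from the server's own site.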

Grid Reality – What we found

• Remote visibility of installation, container, services
– What infrastructure is installed?
– What packages and versions?
– How is it configured?
– What state is it in?

Grid Reality – What we found

• Installation by System Administrators
– Division of roles
– Didn’t meet expectations
– Extra effort to support multiple roles

• System Administrators – install, configure and secure

• Globus Administrators – test, maintain

• Globus Developers – develop, deploy, test/use Grid services

Grid Reality – What we found

• Tomcat or Test container
– Differences in deployment, configuration, and management
– With Tomcat, increased potential for centralised management, and sand-boxing of run-time environment

Grid Reality – What we found

• Compilation issues on Solaris
– Took longer than expected
– Only Linux testing and support can be taken for granted

Grid Reality – What we found

• Exponential increase in testing complexity as number of nodes increases
– Testing (and maintaining) interoperability between m client machines and n servers gets complicated.
– How well will this scale for 100s, 1000s of nodes?
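To get a feel for the numbers, the client/server test matrix can be enumerated. This is a sketch under stated assumptions: every node acts as both client and server, and each pair is tested once without and once with security. The function name and the two mode labels are illustrative, not part of the project's tooling.

```python
from itertools import product

# The four test-bed nodes named on the title slide.
SITES = ["UCL", "Imperial", "Newcastle", "Edinburgh"]


def interop_test_matrix(clients, servers, modes=("non-secure", "secure")):
    """List every (client, server, mode) combination that needs a test."""
    return list(product(clients, servers, modes))


if __name__ == "__main__":
    print(len(interop_test_matrix(SITES, SITES)), "tests for 4 nodes")
    hundred = [f"node{i}" for i in range(100)]
    print(len(interop_test_matrix(hundred, hundred)), "tests for 100 nodes")
```

Four nodes already give 32 combinations and 100 nodes give 20,000. Each further dimension (platform, container type, package version) multiplies the total again.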

Grid Reality – Security

• In theory just had to

– obtain (and update) host, client, and CA certificates

– convert

– install

– configure

– generate (and update) proxies.

• However, parts of “All Services” package also needed.

Grid Security - What we found

• Interactions between security for multiple installations

• Essential to test non-secure interoperability first

• Windows client-side security

• Testing and viewing security configuration

• Debugging secure calls

• Client side security is programmatic

• Security management scalability

– Construction and maintenance of user accounts and grid-map file entries.

Grid Security - What we found

• Interactions between security for multiple installations

– For testing may want

• multiple versions, or duplicates (with different configurations) of same versions.

• One container with no security, and another container with security

– May want test/production environments

Grid Security - What we found

• Essential to test non-secure interoperability first
– Trying to test interoperability and security simultaneously wasn’t fun

Grid Security - What we found

• Windows client-side security
– Still haven’t got it working
– Not obvious exactly what parts of Globus are needed for client side code with security (no “client plus security” package).

Grid Security - What we found

• Testing and viewing security configuration
– Need to be able to view/edit and check security configuration for containers and services
– Confusion about hierarchical security settings
• Virtual Organisations, clusters, servers, containers, factories, services, methods, and instances.
– Remotely
– Validate security deployment before run-time

Grid Security - What we found

• Debugging secure calls (or any stateful service)
– Proxy interceptor approach (e.g. TCPMON) won’t work with stateful services
• As grid handle returned to client contains the port number of the instance, not the proxy
– But proxies are an important design pattern for SOAs…
– GT4/WS-RF may be different
• Handle resolvers, WS-Addressing and WS-RenewableReferences
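The port mismatch can be made concrete with a small check. This is an illustrative sketch only: the handle URL, its path, and both port numbers are hypothetical, chosen to show a client that talks to an intercepting proxy on one port while the returned handle names the container's own port.

```python
from urllib.parse import urlsplit


def bypasses_proxy(handle_url: str, proxy_port: int) -> bool:
    """True if calls made through this handle would skip the proxy.

    The client created the instance via the proxy, but follow-up calls use
    the port embedded in the returned handle.
    """
    return urlsplit(handle_url).port != proxy_port


if __name__ == "__main__":
    # Hypothetical set-up: proxy listens on 8081, container on 8080.
    handle = "http://grid-node.example.org:8080/ogsa/services/demo/instance-1"
    print(bypasses_proxy(handle, proxy_port=8081))
```

In this set-up the factory call is visible to the interceptor, but every subsequent call on the instance goes straight to the container and is never captured.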

Grid Security - What we found

• Client side security is programmatic
– Client side code modifications required to call services/methods with required protocols
– Should be declarative
– Sensitive to server side security credentials

Grid Security - What we found

• Security management scalability
– Construction and maintenance of user accounts and grid-map file entries.
– For each server, each user needs an account, and an entry in the container gridmap file (mapping client certificate to account)
– May also need service specific gridmap files
– Not scalable for large numbers of users, servers, services.

• Alternatives?

– Tool support

– Role based authentication

– Shared accounts or certificates
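The bookkeeping behind the scalability concern can be sketched as follows. A grid-map file entry pairs a quoted certificate subject name with a local account name. The distinguished name and the counting model below are assumptions for illustration: the count assumes every user appears in the container gridmap file on every server, plus in each service-specific gridmap file.

```python
def gridmap_line(subject_dn: str, local_account: str) -> str:
    """Format one grid-map file entry: quoted certificate DN, then account."""
    return f'"{subject_dn}" {local_account}'


def entries_to_maintain(num_users: int, num_servers: int,
                        service_gridmaps_per_server: int = 0) -> int:
    """Count gridmap entries under the 'every user everywhere' assumption."""
    files_per_server = 1 + service_gridmaps_per_server  # container + services
    return num_users * num_servers * files_per_server


if __name__ == "__main__":
    # Hypothetical subject name, for illustration only.
    print(gridmap_line("/C=UK/O=eScience/OU=Example/CN=alice", "alice"))
    print(entries_to_maintain(num_users=10, num_servers=4))
    print(entries_to_maintain(num_users=1000, num_servers=100,
                              service_gridmaps_per_server=2))
```

Each of those entries also implies a local account on the server, which is why the listed alternatives (tool support, roles, shared accounts) aim to break the per-user, per-server product.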

Grid Recommendations

• If Globus is middleware, then need:
– Platform independent, automatic, installation.
– Tool support for configuration and deployment creation, validation, viewing and editing.
– Management console for grid, nodes, globus packages, containers and services.
– Support for remote, location independent, cross-organisational, multiple role scenarios.

Grid Recommendations (continued)

• If Globus is middleware, then need:
– Remote deployment and management of services.
– Remote distributed debugging of grid installations, services, and applications.
– Tool support, and more scalable processes for security.

Grid Alternatives

• Next we plan to evaluate the two architectural choices in more detail
– Science exposed as services, vs science code managed by higher level grid services.

• Explore alternative mechanisms for:

– Load balancing and resource management

– Directory services (service and resource discovery)

– Data movement approaches (e.g. SOAP Attachments vs GridFTP)

Grid Performance

• First approach (initial results)
– Scientific benchmark (SciMark2.0) modified to measure throughput, and invoked as a Stateful Grid Service
– Metric is Calls Per Minute (CPM) – one unit of work.
– No data movement, just computation and memory load.
– JVM: 512MB heap and -server (of course)

• Good performance and scalability
– Security has minimal overhead
– Problem with client side timeouts as response times increase

Grid Performance

[Chart: ART (s) against number of threads (0–70); y-axis Time (s), 0–200; labelled Tomcat. Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel), All.]

Fastest: 3.6s (Edinburgh)

Slowest: 25s (UCL)

Grid Performance

[Chart: Throughput (CPM) against number of threads (0–80); y-axis CPM, 0–80. Series: UCL (4 cpu Sun), Newcastle (2 cpu Intel), Imperial (2 cpu Intel), Edinburgh (4 hyperthread cpu Intel), All (12 cpus), Theoretical Maximum.]

95% of predicted maximum throughput
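Response time and throughput are linked by Little's Law. The sketch below assumes a closed test in which each client thread issues its next call as soon as the previous one returns (the slides do not describe the test driver, so this is an assumption), and assumes the predicted maximum is the sum of per-node maxima. Function names and the per-node figures in the usage lines are illustrative; the 3.6 s input is reused from the previous slide purely as an example.

```python
def calls_per_minute(threads: int, avg_response_time_s: float) -> float:
    """Little's Law for a closed system with no think time: X = N / R."""
    if avg_response_time_s <= 0:
        raise ValueError("average response time must be positive")
    return 60.0 * threads / avg_response_time_s


def fraction_of_maximum(measured_cpm: float, per_node_max_cpm) -> float:
    """Measured throughput as a fraction of the summed per-node maxima."""
    return measured_cpm / sum(per_node_max_cpm)


if __name__ == "__main__":
    # One thread with a 3.6 s response time.
    print(round(calls_per_minute(1, 3.6), 1), "CPM")
    # Made-up per-node maxima, only to show the calculation.
    print(fraction_of_maximum(76.0, [20.0, 20.0, 20.0, 20.0]))
```

Once a node is saturated, adding threads raises response time in proportion and throughput stays flat, which is the shape a throughput-against-threads chart would be expected to show.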

Grid Performance

• Tomcat vs Test container
– No difference on 3 out of 4 nodes

– But 67% faster on one node (Newcastle, slowest Intel box)

• Attachments will work with GT3 and Tomcat
– But not with security

– Limit of 1GB (DIME)

– Bug in Axis – doesn’t clean up temporary files.

Grid Performance

• Stateful instances can be problematic
– Intermittent unreliability
• On some runs, 1 exception in 300 calls (reliability of 0.9967)
– But non-repeatable, SOAP/network related?
• What is the safe response to exceptions? Can’t just retry.
– Possible to kill container (relies on clients being well behaved):
• By invoking same instance/method more than once.
• By consuming container resources
– But instances can be passivated/activated in theory
– Could be used to enable fine-grain (per instance) control over resource usage.
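The reliability figure follows directly from the failure count, and it compounds over longer runs. A short worked sketch, assuming failures are independent from call to call (the slide suggests they are intermittent and non-repeatable, but independence is an assumption); the function names are illustrative.

```python
def per_call_reliability(failures: int, calls: int) -> float:
    """Fraction of calls that completed without an exception."""
    return 1 - failures / calls


def all_succeed_probability(reliability: float, n_calls: int) -> float:
    """Chance that n independent calls all succeed."""
    return reliability ** n_calls


if __name__ == "__main__":
    r = per_call_reliability(1, 300)
    print(round(r, 4))
    print(round(all_succeed_probability(r, 100), 2))
```

Under that assumption a client making 100 calls has roughly a 72% chance of seeing no exception at all, so a safe response to exceptions matters even at this failure rate.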

Grid Deployment

• How to install and configure Grid infrastructure and services - scalably and securely?

• Install GT3 infrastructure and security manually
– MMJFS allows executable code to be staged automatically (but not services; could provide a deployment service).

• Install bootstrapping code, and then install and deploy all other code and security automatically.
– Using SmartFrog (HP) in the lab, and then test-bed.
– Configuring GT3 security remotely is an open issue, as is “trust” with System Administrators.

Grid Dreams - Debugging

• Debugging distributed systems is tricky
– Need better support for cross-cutting non-functional concerns such as deployment and debugging.
– (One) problem with debugging services is not knowing the context of errors (to aid diagnosis or cure) – a service is just an interface.

• Deployment aware debugging:
– Starting from functional work-flows, generate deployment-flows, which are executed prior to, or concurrent with, functional work-flows.

– If failure in functional work-flow, then corresponding deployment-flow is examined to determine likely causes, and parts are re-executed.

Grid Dreams - Debugging

• Backtrack through deployment steps (like peeling an onion)
– Some steps will need to be reversed
– Track dependencies, and redundant operations.

• This approach may fix an (interesting) sub-class of problems:
• Those which can be fixed by simply redoing (or replicating) (part of) the installation, e.g.
– Intermittent failure of container or services
– Resource starvation or overload
• Security problems that can be fixed with reconfiguration or refresh of certificates/proxies.
– But not:
• network, or all configuration and security/access problems.
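The backtracking idea can be sketched in a few lines. This is a hypothetical illustration, not the project's design: a deployment-flow is modelled as an ordered list of named steps, each with a callable that redoes it, and `works` stands in for re-running the failed part of the functional work-flow. The sketch does not reverse steps or track dependencies and redundant operations, both of which the slide calls for.

```python
from typing import Callable, List, Optional, Tuple

# A deployment step: (name, action that re-executes the step).
Step = Tuple[str, Callable[[], None]]


def repair_by_backtracking(deployment_flow: List[Step],
                           works: Callable[[], bool]) -> Optional[List[str]]:
    """Peel back deployment steps from the end until the service works.

    At depth d the last d steps are re-executed in their original order,
    then the functional check is retried. Returns the names of the steps
    redone in the successful pass, [] if nothing was wrong, or None if
    redoing the whole flow did not help.
    """
    if works():
        return []
    for depth in range(1, len(deployment_flow) + 1):
        suffix = deployment_flow[-depth:]
        for _name, redo in suffix:
            redo()
        if works():
            return [name for name, _redo in suffix]
    return None


if __name__ == "__main__":
    state = {"container": False, "proxy": True}  # simulated fault
    flow: List[Step] = [
        ("install", lambda: None),
        ("configure", lambda: None),
        ("start_container", lambda: state.update(container=True)),
        ("refresh_proxy", lambda: state.update(proxy=True)),
    ]
    print(repair_by_backtracking(flow, lambda: all(state.values())))
```

A `None` result corresponds to the "But not" cases: problems such as network or access restrictions that no amount of redoing the installation will cure.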

UK OGSA Evaluation Project

• Thank you – Questions/Comments?

• Email: [email protected]
– After November: [email protected]

• Not (quite) the End…

Postscript – The Secret Life of Grid?

Our experience evaluating Grid technology reminds me of an Australian book (“The Secret Life of Wombats”) about a schoolboy who used to sneak out of his dormitory after everyone was asleep to go “wombatting”. He spent his nights secretly crawling down wombat burrows with a flashlight – a potentially lethal activity (not just from cave-ins, as wombats are ferocious when cornered!) – and wrote copious notes, resulting in a substantial increase in knowledge of these “mysterious and often misunderstood creatures”.

UK OGSA Evaluation Project Report 1.0

Evaluation of Globus Toolkit 3.2 (GT3.2) Installation

http://sse.cs.ucl.ac.uk/UK-OGSA/Report1.doc