9-1.1 “Grid-enabling” applications Part 1 © 2010 B. Wilkinson/Clayton Ferner. Spring 2010 Grid computing course. slides9-1.ppt Modification date: Feb 26, 2010

9-1.1

“Grid-enabling” applications

Part 1

© 2010 B. Wilkinson/Clayton Ferner. Spring 2010 Grid computing course. slides9-1.ppt Modification date: Feb 26, 2010

Grid-enabling an application

A poorly defined and understood term.

It does NOT mean simply executing a job on a Grid platform!

Almost all computer batch programs can be shipped to a remote Grid site and executed with little more than a remote ssh connection.

This is a model we have had since computers were first connected (via telnet).

Grid-enabling should include utilizing the unique distributed nature of the Grid platform.

9-1.2

Grid-enabling an application

With that in mind, a simple definition is:

Being able to execute an application on a Grid platform, using the distributed resources available on that platform.

However, even that simple definition is not agreed upon by everyone!

9-1.3

A broad definition that matches our view of Grid enabling applications is:

“Grid Enabling refers to the adaptation or development of a program to provide the capability of interfacing with a grid middleware in order to schedule and utilize resources from a dynamic and distributed pool of ‘grid resources’ in a manner that effectively meets the program’s needs” [2]

[2] Nolan, K., “Approaching the Challenge of Grid-Enabling Applications,” Open Source Grid & Cluster Conf., Oakland, CA, 2008.

9-1.4

9-1.5

How does one do “Grid-enabling”?

Still an open question and in the research domain without a standard approach.

Here we will describe various approaches.

We can divide the use of the computing resources in a Grid into two types:

•Using multiple computers separately to solve multiple problems

•Using multiple computers collectively to solve a single problem

9-1.6

Using Multiple Computers Separately

Parameter Sweep Applications

In some domain areas, scientists need to run the same program many times but with different input data.

“Sweep” across the parameter space with different input parameter values in search of a solution.

In many cases, it is not easy to compute the answer, and human intervention is required to search the design space.

9-1.7

Parameter Sweep Applications

Examples

•A scientist might wish to search for a new drug and needs to try different formulations that might best fit with a particular protein.

•A design engineer might be studying effects of different aerodynamic designs on performance of an aircraft.

•Computing in an aesthetic design process with many possible alternative designs, where a human has to choose.

•Sometimes it is a learning process: a design engineer wishes to understand the effects of changing various parameters.

9-1.8

Parameters in Parameter Sweep

Typically, many parameters that can be altered.

Might be a vast combination of parameter values.

Ideally, some automated way of doing parameter sweep needed that includes both specifying parameter sweep and a way of scheduling individual sweeps across Grid platform.
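The "vast combination of parameter values" is the Cartesian product of the per-parameter value lists. A minimal Java sketch of enumerating it is given here; the class name SweepCombinations and the parameter names and values are hypothetical, and scheduling each combination as a Grid job is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: enumerates every combination of parameter values.
class SweepCombinations {

    // Returns one map per sweep; each map assigns one value to every parameter.
    static List<Map<String, String>> enumerate(Map<String, List<String>> parameters) {
        List<Map<String, String>> sweeps = new ArrayList<>();
        sweeps.add(new LinkedHashMap<>()); // start with a single empty assignment
        for (Map.Entry<String, List<String>> parameter : parameters.entrySet()) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> partial : sweeps) {
                for (String value : parameter.getValue()) {
                    Map<String, String> next = new LinkedHashMap<>(partial);
                    next.put(parameter.getKey(), value);
                    extended.add(next);
                }
            }
            sweeps = extended;
        }
        return sweeps;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parameters = new LinkedHashMap<>();
        parameters.put("wingAngle", List.of("5", "10", "15")); // illustrative values
        parameters.put("airSpeed", List.of("200", "400"));     // illustrative values
        for (Map<String, String> sweep : enumerate(parameters)) {
            System.out.println(sweep);
        }
    }
}
```

The number of sweeps is the product of the list sizes, which is why the count grows so quickly as parameters are added.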

9-1.9

Implementing Parameter Sweep

Can be simply achieved by submitting multiple job description files, one for each set of parameters but that is not very efficient.

Parameter sweep applications are so important that research projects are devoted to making them efficient on a Grid.

Parameter sweeps appear explicitly in job description languages.

9-1.10

RSL-2/JDD Example

<count> 5 </count>

causes five instances of job to be submitted.

Simply causes five identical executables to be submitted.

Four would be pointless unless either:

•Code selected actions for each instance, or

•Different input and output files selected for each instance in job description file.

Job description elements usually can be specified to change for each instance.

9-1.11

JSDL (version 1)

Originally did not have parameter sweep.

Has been (unofficially) extended to incorporate features for parameter sweep.

Two forms of parameter sweep creation identified:

•Enumeration in a list, and

•Numerically related arguments.

9-1.12

Arguments Enumerated in a List

Two additional elements:

•<Parameter> To specify selection of parameters

•<Value> To list the values

contained within an <Assignment> element for each assignment.

Multiple/nested assignments for various scenarios:

• Single substitution or

• Multiple simultaneous substitutions in different combinations.

9-1.13

9-1.14

Fig 9.1

Parameter sweep element selection and substitution

9-1.15

Fig 9.2

Selecting XML Element

Expression needed that selects an XML element.

An XPath expression provides a way to select an XML element in an XML document.

9-1.16

XPath

Suppose XML document has form:

<a>
  <b>
    <c> </c>
  </b>
</a>

XPath expression to identify element <c> ... </c> would be /a/b/c

9-1.17

XPath allows for much more expressive forms.

For example, suppose multiple tags called <c>:

<a>
  <b>
    <c> </c>
    ...
    <c> </c>
  </b>
</a>

Expression to select 3rd <c> element is /a/b/c[3]
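The two expressions above can be tried with the XPath support in the Java standard library (javax.xml.xpath). This is a minimal sketch; the class name XPathSelect and the element text ("first", "second", "third") are made up for illustration. Note that XPath positions start at 1, not 0.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper for experimenting with XPath expressions.
class XPathSelect {

    // Evaluates an XPath expression against an XML string and returns the
    // text content of the first selected node ("" if nothing matches).
    static String select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath or XML: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Illustrative document with three <c> elements inside /a/b.
        String xml = "<a><b><c>first</c><c>second</c><c>third</c></b></a>";
        System.out.println(select(xml, "/a/b/c"));    // first <c> in document order
        System.out.println(select(xml, "/a/b/c[3]")); // third <c>
    }
}
```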

9-1.18

To take an example for parameter sweep, consider JSDL job:

<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>Hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>Fred</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>

9-1.19

To alter second argument to be Bob, Alice, and Tom (3 sweeps):

<jsdl:JobDefinition>
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/echo</jsdl-posix:Executable>
        <jsdl-posix:Argument>Hello</jsdl-posix:Argument>
        <jsdl-posix:Argument>Fred</jsdl-posix:Argument>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
  <sweep:Sweep>
    <sweep:Assignment>
      <sweep:Parameter>//jsdl-posix:Argument[2]</sweep:Parameter>
      <sweepfunc:Values>
        <sweepfunc:Value>Bob</sweepfunc:Value>
        <sweepfunc:Value>Alice</sweepfunc:Value>
        <sweepfunc:Value>Tom</sweepfunc:Value>
      </sweepfunc:Values>
    </sweep:Assignment>
  </sweep:Sweep>
</jsdl:JobDefinition>

9-1.20
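The substitution that a sweep-aware job manager would perform can be pictured with a small Java model. This is only a sketch, not a JSDL processor: the class ArgumentSweep is hypothetical, and the job is reduced to an executable plus an argument list, with the argument position taken from the Argument[2] selector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an enumerated sweep over one argument position.
class ArgumentSweep {

    // Builds one command line per sweep value by replacing the argument at
    // the given 1-based position (XPath positions start at 1).
    static List<String> expand(String executable, List<String> arguments,
                               int position, List<String> values) {
        List<String> commandLines = new ArrayList<>();
        for (String value : values) {
            List<String> substituted = new ArrayList<>(arguments); // copy, keep original intact
            substituted.set(position - 1, value);
            commandLines.add(executable + " " + String.join(" ", substituted));
        }
        return commandLines;
    }

    public static void main(String[] args) {
        List<String> jobs = expand("/bin/echo", List.of("Hello", "Fred"),
                                   2, List.of("Bob", "Alice", "Tom"));
        jobs.forEach(System.out::println); // one line per generated job
    }
}
```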

Question

What is the output from the echo programs?

9-1.21

Numerically Related Arguments

Job description languages such as JSDL can be extended to increment an integer argument automatically with a for-like construct.

The for construct would specify the values of an argument, which would be substituted in a similar fashion to the previous substitutions, essentially a macro-substitution.

9-1.22

Example - XPML job description language

First, a parameter element specifies argument values, for example

<parameter name="arg1" type="integer" domain="range">
  <range from="1" to="99" type="step" interval="2"/>
</parameter>

Argument called arg1. Values for arg1 here are 1, 3, 5 ... 99.

Argument arg1 would occur later within execute element:

<execute>
  <command value=" ... "/>
  <arg value="$arg1"/>
  ...
</execute>

One value of arg1 substitutes for each sweep.

9-1.23
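The range element behaves like a for loop over the argument. A minimal Java sketch of the values it generates is shown here; the class RangeParameter is hypothetical, and the inclusive upper bound follows the 1, 3, 5 ... 99 sequence stated above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of <range from=".." to=".." type="step" interval=".."/>.
class RangeParameter {

    // Values from 'from' to 'to' inclusive, advancing by 'interval' each step.
    static List<Integer> values(int from, int to, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<Integer> result = new ArrayList<>();
        for (int v = from; v <= to; v += interval) {
            result.add(v);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> arg1 = values(1, 99, 2);
        // Each value would replace $arg1 in one sweep.
        System.out.println(arg1.size() + " sweeps, first " + arg1.get(0)
                + ", last " + arg1.get(arg1.size() - 1));
    }
}
```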

Using Multiple Computers Collectively

9-1.24

Data partitioning

Perhaps easiest way to use multiple computers together.

Divide data into parts.

Each computer works on one part.
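A minimal Java sketch of the "divide data into parts" step follows. The class DataPartitioner is hypothetical; it makes contiguous slices of nearly equal size, which is only one of several reasonable partitioning strategies, and it does not cover sending the slices to remote machines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a data set into parts, one per computer.
class DataPartitioner {

    // Splits 'data' into 'parts' contiguous slices whose sizes differ by at most one.
    static <T> List<List<T>> partition(List<T> data, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<T>> slices = new ArrayList<>();
        int base = data.size() / parts;
        int extra = data.size() % parts; // the first 'extra' slices get one more item
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int end = start + base + (i < extra ? 1 : 0);
            slices.add(new ArrayList<>(data.subList(start, end)));
            start = end;
        }
        return slices;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<List<Integer>> slices = partition(data, 3);
        for (int i = 0; i < slices.size(); i++) {
            System.out.println("computer " + i + " works on " + slices.get(i));
        }
    }
}
```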

9-1.25

Example

BLAST algorithm used in bioinformatics to find statistical matches between gene sequences.

User might submit sequence query that is compared to a very large database of known sequences in order to discover relationships or to match sequence to a gene family.

Databases extremely large.

9-1.26

Partitioning BLAST database

9-1.27

If just one sequence from user, database partitioned into parts and different computers work on different parts.

Fig 9.3

Alternatively, if user(s) are submitting many queries, submit each query to a different computer having access to the whole database.

9-1.28

Fig 9.4

Legacy Code

9-1.29

In many cases, Grid users want to re-use their existing programs written in C, C++ or even Fortran if really old.

Documented source code may not be available.

May be pre-packaged by manufacturer so rewriting not an option.

9-1.30

Grid Enabling Legacy Software (GriddLeS)

One project that addresses porting legacy code onto a Grid.

Focuses on file handling

Overloads existing file handling routines and redirects requests to remote locations if required.

9-1.31

Grid Enabling Legacy Software (GriddLeS)

Derived from: http://www.csse.monash.edu.au/~davida/griddles/

Exposing an Application as a Service

• Grid computing has embraced Web service technology, so it is natural to consider its use for accessing applications.

• “Wrap” application code to produce a Web service

• “Wrapping” means application not accessed directly but through service interface

9-1.32

Web Service Wrapper Approach

9-1.33

Fig 9.5

Web service invoking a program

If Web service written in Java, service could issue a command in a separate process using exec method of current Runtime object with the construction:

Runtime runtime = Runtime.getRuntime();

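Process process = runtime.exec("<command>");

where <command> is command to issue, capturing output with

InputStream stdout = process.getInputStream();

(getInputStream returns the stream connected to the standard output of the process; getOutputStream is connected to its standard input.)

...

9-1.34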
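A self-contained version of the idea is sketched below. It uses ProcessBuilder from the Java standard library rather than Runtime.exec, reads the output of the process through getInputStream, and waits for the process to finish. The class name WrappedCommand is hypothetical, and the main method assumes a POSIX system where /bin/echo exists.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper that a service method could call to run a program.
class WrappedCommand {

    // Runs the command in a separate process and returns its combined
    // standard output and standard error as text.
    static String run(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            String output;
            try (InputStream stdout = process.getInputStream()) {
                output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
            }
            process.waitFor(); // wait for the process to exit
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }

    public static void main(String[] args) {
        // Assumes a POSIX system where /bin/echo exists.
        System.out.print(run("/bin/echo", "Hello", "Fred"));
    }
}
```

Passing the command as separate arguments avoids relying on how a single command string would be split into tokens.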

Portlet acting as a front-end to a wrapped application

9-1.35

Fig 9.6

Application with physically distributed components

9-1.36

Fig 9.7

Using Grid Middleware APIs

Could use Grid middleware APIs in application code for operations such as:

• File input/output

• Starting and monitoring jobs

• Monitoring and discovery of Grid resources.

9-1.37

Using Globus APIs

Globus provides a suite of services that have APIs (C and Java interfaces) that could be called from the application.

Extremely steep learning curve!!

Literally hundreds, if not thousands, of C and Java routines listed at the Globus site.

No tutorial help or sample usage.

9-1.38

Code using Globus APIs to copy a file (C++)

Directly from (van Nieuwpoort). Also in (Kaiser 2004) and (Kaiser 2005).

9-1.39

Using CoG kit APIs

Using CoG kit APIs is at a slightly higher level.

Not too difficult but still requires setting up the Globus context.

9-1.40

CoG Kit program to transfer files

9-1.41

Higher Level Middleware-Independent APIs

Higher level of abstraction than Globus middleware APIs desirable because:

•Complexity of Globus routines

•Grid middleware changes very often

•Globus not only Grid middleware

9-1.42

Other Grid middleware

Includes:

• UNICORE (Uniform Interface to Computing Resources)

• gLite (Lightweight Middleware for Grid computing)

– part of EGEE (Enabling Grids for E-sciencE) collaborative.

To give an indication of the rapid changes that occur:

• gLite 3.0.2 Update 43 released May 22, 2008.

• gLite 3.1 Update 27 released July 3, 2008, 6 weeks later.

9-1.43

Concept of higher-level APIs above Grid middleware

9-1.44

Higher-level APIs should expose a simple interface not tied to a specific version of Grid middleware, or even to a Grid middleware family at all.

Fig 9.8
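One way to picture a higher-level API is as an interface that the application calls, with interchangeable adaptors underneath. The Java sketch below is an illustration under stated assumptions, not the API of GAT, SAGA, or Globus: FileCopier, LocalFileCopier, and PortableApplication are hypothetical names, and the local file system adaptor simply stands in for a middleware-specific one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical middleware-independent interface seen by the application.
interface FileCopier {
    void copy(String source, String destination);
}

// One possible adaptor: the local file system, standing in for an adaptor
// that would call a particular Grid middleware.
class LocalFileCopier implements FileCopier {
    @Override
    public void copy(String source, String destination) {
        try {
            Files.copy(Path.of(source), Path.of(destination),
                       StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class PortableApplication {
    private final FileCopier copier;

    PortableApplication(FileCopier copier) {
        this.copier = copier;
    }

    // Application code depends only on the interface, not on the adaptor.
    void stageInput(String source, String destination) {
        copier.copy(source, destination);
    }

    // Demonstration helper: writes text to a temporary file, copies it through
    // the interface, and returns what the copy contains.
    static String roundTrip(String text) {
        try {
            Path in = Files.createTempFile("input", ".dat");
            Path out = Files.createTempFile("output", ".dat");
            Files.writeString(in, text);
            new PortableApplication(new LocalFileCopier())
                    .stageInput(in.toString(), out.toString());
            return Files.readString(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("sample data"));
    }
}
```

Swapping the middleware then means supplying a different FileCopier implementation, while stageInput and the rest of the application stay unchanged.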

9-1.45

Grid Application Toolkit (GAT)

• APIs for developing and executing portable Grid applications that are independent of the underlying Grid infrastructure and available services.

• Developed in 2003-2005 time frame.

9-1.46

9-1.47

Copy a file in GAT/C++ (Kaiser, H. 2005)

9-1.48

SAGA (Simple API for Grid Applications)

A subsequent effort made by the Grid community to standardize higher-level APIs.

9-1.49

SAGA Reading a file (C++) (Kielmann 2006)

9-1.50

What is meant by parameter sweep?

(a) Executing an application multiple times each time with the arguments specifically incremented by one each time

(b) Executing an application multiple times with the same arguments

(c) Executing an application multiple times with different arguments

(d) Cleaning out the parameters from a computer program

SAQ 9-2

9-1.51

What is the XPath expression to select the second c element within the second b element within the second a element of an XML document?

(a) 2a/2b/2c

(b) a/b/c[2]

(c) a[2]/b[2]/c[2]

(d) a2/b2/c2

(e) 2a2b2c

(f) None of the other answers

SAQ 9-3

Questions

9-1.52