
A JSDL Application Repository Portal for Heterogeneous Grids and the NGS

David Meredith

NGS Operations, e-Science Centre, Daresbury Laboratory, UK

[email protected]

NGS Applications Repository Portal/Portlet

Core Functionality

1. A JSDL Repository:

• Search/browse for JSDL (personal and shared) by category of interest (e.g. bioinformatics, chemistry, tutorials/examples). Select, load, modify, save.

• JSDL documents can be pre-configured and published by domain experts / resource administrators (users benefit from sharing expertise, artefacts and configuration captured in JSDL).

• Community formation around a “best practice” approach (OGF).

2. JSDL GUI Editor: for authoring, validating, sharing, uploading app descriptions.

3. Grid Operations: File Staging, Application Submission, Monitoring (run either ‘out-of-the-box,’ or modify/tweak as required).

4. Generic design: extensible to support different Grid middleware technologies and data staging protocols.

Ali Anjomshoaa, Fred Brisard, Michel Drescher, Donal K. Fellows, William Lee, An Ly, Steve McGough, Darren Pulsipher, Andreas Savva, Chris Smith

JSDL 1.0 is an OGF recommendation

JSDL 1.0 is published as GFD-R-P.56 – http://www.ggf.org/gf/docs/?final

<jsdl:Application>
  <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
  <jsdl-posix:POSIXApplication>
    <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
    <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
    <jsdl-posix:Input>input.dat</jsdl-posix:Input>
    <jsdl-posix:Output>output1.png</jsdl-posix:Output>
  </jsdl-posix:POSIXApplication>
</jsdl:Application>
<jsdl:Resources> ….

JSDL

1. An XML Schema language for describing the requirements of computational jobs for submission to Grids.

2. Is agnostic of middleware - no dependencies on Globus, WSRF, gLite (means portal can be generic and not tied to any particular set of Grid technologies).

3. GGF / OGF Standard.

4. JSDL documents can be validated against the JSDL and JSDL-POSIX XSD schemas to ensure their correctness.
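As a rough illustration of how a portal might read the fields of a JSDL document such as the gnuplot example, the sketch below parses a small document with Python's standard library. The namespace URIs are those of the JSDL 1.0 and JSDL-POSIX schemas; the wrapping JobDefinition/JobDescription elements and the helper name `read_posix_application` are assumptions added for the example. It checks well-formedness only and does not perform XSD validation.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the JSDL 1.0 and JSDL-POSIX schemas.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

# Example document; the outer wrapper elements are added for completeness.
JSDL_DOC = """<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:jsdl-posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <jsdl:ApplicationName>gnuplot</jsdl:ApplicationName>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/usr/local/bin/gnuplot</jsdl-posix:Executable>
        <jsdl-posix:Argument>control.txt</jsdl-posix:Argument>
        <jsdl-posix:Input>input.dat</jsdl-posix:Input>
        <jsdl-posix:Output>output1.png</jsdl-posix:Output>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>"""


def read_posix_application(jsdl_text):
    """Hypothetical helper: extract the POSIX application fields as a dict."""
    root = ET.fromstring(jsdl_text)  # raises ParseError if not well-formed
    app = root.find(".//jsdl:Application", NS)
    posix = app.find("posix:POSIXApplication", NS)

    def text_of(tag):
        value = posix.findtext(f"posix:{tag}", default=None, namespaces=NS)
        return value.strip() if value else None

    return {
        "name": app.findtext("jsdl:ApplicationName", default=None, namespaces=NS),
        "executable": text_of("Executable"),
        "arguments": [a.text.strip() for a in posix.findall("posix:Argument", NS)],
        "stdin": text_of("Input"),
        "stdout": text_of("Output"),
    }


info = read_posix_application(JSDL_DOC)
print(info)
```

Full schema validation would need an XSD-aware library, which the standard library does not provide.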

Grid Heterogeneity

Grid A - NGS

Middleware - GT

Grid B - EGEE

Middleware - gLite

• Different middleware adopts different formats for describing applications and their associated resources (e.g. JDL, RSL), and for submitting them to a Grid for execution.

• A number of different data storage resources and transfer protocols are also relevant for managing and transferring data, e.g. GsiFTP, SRB, SRM, WebDAV, (S)FTP.

Grid A: Globus RSL (Resource Specification Language)

&(executable=$(GLOBUSRUN_GASS_URL)/home/ngs0153/cpi)
 (arguments= 30 fileA)
 (jobType=mpi)
 (environment = (NGSMODULES mpich-gm/1.2.5..10-intel8.1:intel/fce/9.1.032) (TMP /tmp))
 (count = 4)
 (hostCount = 8)
 (minMemory = 512)
 (maxWallTime = 3)
 (directory=/home/ngs0153)
 (stdin=/home/ngs0153/cpi.in)
 (stdout=/home/ngs0153/cpi.out)
 (stderr=/home/ngs0153/cpi.err)

Grid B: gLite JDL (Job Description Language)

Type = "Job";
JobType = "Normal";
RetryCount = 3;
Executable = "/home/ngs0153/cpi";
Arguments = "30 fileA";
VirtualOrganisation = "myGridVOproject";
StdInput = "cpi.in";
StdOutput = "cpi.out";
StdError = "cpi.err";
InputSandbox = { "gsiftp://grid-data.rl.ac.uk:2811/home/ngs0153/cpi", "gsiftp://grid-data2.dl.ac.uk:2811/myhome/fileA" };
InputSandboxDestFileName = { "cpi", "fileA" };
OutputSandbox = { "cpi.out" };
OutputSandboxDestURI = { "gsiftp://mygridhome.dl.ac.uk:2811/myhome" };
DeleteOnTermination = { "fileA" };
Environment = { "NGSMODULES=mpich-gm/1.2.5..10-intel8.1:intel/fce/9.1.032", "TMP=/tmp" };
Requirements = ( other.GlueCEInfoLRMSType == "PBS" )
  && ( member( GlueCEInfoHostName, {"grid-data.rl.ac.uk:2119" , "mygrid-resource.dl.ac.uk:2119" } ) )
  && ( GlueHostProcessorModel == "Intel" );
Rank = -other.GlueCEStateEstimatedResponseTime;

• Middleware-specific dependencies are added at run time: the JSDL is converted into the middleware-specific format (e.g. RSL).

• Middleware-specific parameters, e.g. the RSL jobType, are added to the JSDL using XML Schema extensions in place of the <xsd:any> placeholder elements.

• Portal Database has to accommodate all middleware variations.
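The run-time conversion described above could look roughly like the following sketch, which maps a JSDL-POSIX application onto a Globus RSL string. This is not the portal's actual converter: the function name `jsdl_posix_to_rsl`, the partial element-to-attribute mapping, and the way middleware-specific extensions are passed in as a dictionary are all assumptions made for illustration, and RSL quoting of special characters is not handled.

```python
import xml.etree.ElementTree as ET

POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"

# JSDL-POSIX element name -> RSL attribute name (partial, illustrative mapping).
POSIX_TO_RSL = [
    ("Executable", "executable"),
    ("WorkingDirectory", "directory"),
    ("Input", "stdin"),
    ("Output", "stdout"),
    ("Error", "stderr"),
]

POSIX_XML = f"""<p:POSIXApplication xmlns:p="{POSIX_NS}">
  <p:Executable>/home/ngs0153/cpi</p:Executable>
  <p:Argument>30</p:Argument>
  <p:Argument>fileA</p:Argument>
  <p:Input>/home/ngs0153/cpi.in</p:Input>
  <p:Output>/home/ngs0153/cpi.out</p:Output>
  <p:Error>/home/ngs0153/cpi.err</p:Error>
  <p:WorkingDirectory>/home/ngs0153</p:WorkingDirectory>
</p:POSIXApplication>"""


def jsdl_posix_to_rsl(posix_xml, extensions=None):
    """Hypothetical converter: build an RSL conjunction from POSIX elements.

    `extensions` carries middleware-specific values (e.g. jobType, count)
    that have no counterpart in the core JSDL-POSIX schema.
    """
    posix = ET.fromstring(posix_xml)
    relations = []
    for element_name, rsl_name in POSIX_TO_RSL:
        node = posix.find(f"{{{POSIX_NS}}}{element_name}")
        if node is not None and node.text:
            relations.append((rsl_name, node.text.strip()))
    args = [a.text.strip() for a in posix.findall(f"{{{POSIX_NS}}}Argument") if a.text]
    if args:
        relations.append(("arguments", " ".join(args)))
    for name, value in (extensions or {}).items():
        relations.append((name, str(value)))
    return "&" + "".join(f"({name}={value})" for name, value in relations)


rsl = jsdl_posix_to_rsl(POSIX_XML, {"jobType": "mpi", "count": 4})
print(rsl)
```

Keeping the middleware-specific values separate from the core mapping mirrors the idea of carrying them in a JSDL extension schema rather than in the standard elements.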

Catering for Grid Heterogeneity

GT2 RSL extension XML schema

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.ggf.org/namespaces/2004/11/jsdl-rsl-1.0.xsd"
    targetNamespace="http://www.ggf.org/namespaces/2004/11/jsdl-rsl-1.0.xsd"
    elementFormDefault="qualified">

  <xsd:element name="jobType" type="jobType"/>
  <xsd:element name="gramMyJob" type="gramMyJob"/>
  <xsd:element name="dryRun" type="boolean" default="no"/>
  <xsd:element name="save_state" type="boolean" default="no"/>

Portal is ‘open’: anyone is free to browse public JSDL documents and use the JSDL editor without logging in.

Login required to browse personal applications, save and submit jobs, interact with Grid resources.

List jobs, read job descriptions and load a job to initialise the ‘Active Job.’

Changes to the parameters in the GUI will update and validate the JSDL template automatically.

Applications Repository

Input fields are pre-configured / filled out.

Fields are taken from the JSDL and JSDL-POSIX extension schemas.

POSIXApplication is a JSDL extension. It defines standard POSIX elements.

– stdin, stdout, stderr
– Working directory
– Command line arguments
– Environment variables

<POSIXApplication>
  <Executable ... />
  <Input ... />?
  <Output ... />?
  <Error ... />?
  <WorkingDirectory ... />?
</POSIXApplication>


‘My Job’ Detail

<jsdl1:Environment name="TMP">/tmp</jsdl1:Environment>
<jsdl1:Environment name="NGSMODULES">envVarValue1</jsdl1:Environment>
…..

Environment Variables

Paste and parse command line arguments (space and/or line separated values)

<jsdl1:Argument>fasta34</jsdl1:Argument>
<jsdl1:Argument>-H</jsdl1:Argument>
<jsdl1:Argument>humanDNA2.input</jsdl1:Argument>
<jsdl1:Argument>/var/data/bioinformatics/..</jsdl1:Argument>
<jsdl1:Argument>S</jsdl1:Argument>
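One simple way to turn pasted text into a sequence of Argument elements is sketched below. The function name `arguments_to_elements` is hypothetical, and the sketch assumes that no single argument contains whitespace; quoted arguments would need something like `shlex.split` instead of plain splitting.

```python
import xml.etree.ElementTree as ET

POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix"


def arguments_to_elements(pasted_text):
    """Split on any whitespace (spaces, tabs, newlines); one element per token."""
    elements = []
    for token in pasted_text.split():
        element = ET.Element(f"{{{POSIX_NS}}}Argument")
        element.text = token
        elements.append(element)
    return elements


# Mixed space- and line-separated input, as a user might paste it.
pasted = "fasta34 -H\nhumanDNA2.input   S"
elements = arguments_to_elements(pasted)
for element in elements:
    print(ET.tostring(element, encoding="unicode"))
```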

Command Line Arguments

Named file systems used to declare mount points on the consuming system.

File system names are referenced throughout the portal (and JSDL doc) for substituting mount points.

Changes to a FS mount point will be updated automatically throughout the portal/JSDL.

Used when specifying path info e.g. locations to files/dirs, stage data locations etc.

<jsdl:FileSystem name="WORKINGDIR">
  <jsdl:MountPoint>/home/ngs0024/myScratchDir</jsdl:MountPoint>
</jsdl:FileSystem>
<jsdl:FileSystem name="DataDir">
  <jsdl:MountPoint>/home/ngs0024/myDataDir</jsdl:MountPoint>
</jsdl:FileSystem>
…
<jsdl1:Output filesystemName="WORKINGDIR">fasta.out</jsdl1:Output>
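The substitution of mount points for file system names can be sketched as a lookup table built from the FileSystem elements. The helper names `mount_points` and `resolve` are hypothetical, and the sketch assumes POSIX-style paths on the consuming system.

```python
import posixpath
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"

RESOURCES_XML = f"""
<jsdl:Resources xmlns:jsdl="{JSDL_NS}">
  <jsdl:FileSystem name="WORKINGDIR">
    <jsdl:MountPoint>/home/ngs0024/myScratchDir</jsdl:MountPoint>
  </jsdl:FileSystem>
  <jsdl:FileSystem name="DataDir">
    <jsdl:MountPoint>/home/ngs0024/myDataDir</jsdl:MountPoint>
  </jsdl:FileSystem>
</jsdl:Resources>
"""


def mount_points(resources_xml):
    """Map each named file system to its mount point."""
    root = ET.fromstring(resources_xml)
    table = {}
    for fs in root.findall(f"{{{JSDL_NS}}}FileSystem"):
        mount = fs.findtext(f"{{{JSDL_NS}}}MountPoint") or ""
        table[fs.get("name")] = mount.strip()
    return table


def resolve(filesystem_name, relative_path, table):
    """Expand a path given relative to a named file system."""
    return posixpath.join(table[filesystem_name], relative_path.strip())


table = mount_points(RESOURCES_XML)
print(resolve("WORKINGDIR", "fasta.out", table))

# Changing a mount point in one place changes every path that references it.
table["WORKINGDIR"] = "/scratch/ngs0024"
print(resolve("WORKINGDIR", "fasta.out", table))
```

Because paths are stored relative to a name rather than as absolute strings, a single change to the table is enough to retarget the whole job description.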

Named File Systems

List of data from across the Grid that should be copied to the consuming system

Before job: src URI

After job: tgt URI

JSDL does not mandate the protocol / URI format.

Data is staged relative to named file systems.

<jsdl:DataStaging>
  <jsdl:FileName>Mg.psf</jsdl:FileName>
  <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:DeleteOnTermination>false</jsdl:DeleteOnTermination>
  <jsdl:Source>
    <jsdl:URI>gsiftp://ngs.rl.ac.uk:2811/apps/Siesta_mpi/…</jsdl:URI>
  </jsdl:Source>
</jsdl:DataStaging>
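To illustrate the before-job / after-job distinction, the sketch below separates DataStaging elements into stage-in transfers (those with a Source URI) and stage-out transfers (those with a Target URI). The function name `staging_plan`, the file names, and the `data.example.org` URIs are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl"
NS = {"jsdl": JSDL_NS}

# Example job description with one stage-in and one stage-out entry.
JOB_XML = f"""
<jsdl:JobDescription xmlns:jsdl="{JSDL_NS}">
  <jsdl:DataStaging>
    <jsdl:FileName>input.dat</jsdl:FileName>
    <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:DeleteOnTermination>false</jsdl:DeleteOnTermination>
    <jsdl:Source><jsdl:URI>gsiftp://data.example.org:2811/in/input.dat</jsdl:URI></jsdl:Source>
  </jsdl:DataStaging>
  <jsdl:DataStaging>
    <jsdl:FileName>result.out</jsdl:FileName>
    <jsdl:FilesystemName>WORKINGDIR</jsdl:FilesystemName>
    <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
    <jsdl:Target><jsdl:URI>gsiftp://data.example.org:2811/out/result.out</jsdl:URI></jsdl:Target>
  </jsdl:DataStaging>
</jsdl:JobDescription>
"""


def staging_plan(job_description_xml):
    """Return (stage_in, stage_out) lists of transfers.

    A local file is identified by (file system name, file name); the file
    system name is resolved to a mount point elsewhere.
    """
    root = ET.fromstring(job_description_xml)
    stage_in, stage_out = [], []
    for staging in root.findall("jsdl:DataStaging", NS):
        local = (
            staging.findtext("jsdl:FilesystemName", default="", namespaces=NS).strip(),
            staging.findtext("jsdl:FileName", default="", namespaces=NS).strip(),
        )
        source = staging.findtext("jsdl:Source/jsdl:URI", namespaces=NS)
        target = staging.findtext("jsdl:Target/jsdl:URI", namespaces=NS)
        if source:
            stage_in.append((source.strip(), local))   # copy before the job runs
        if target:
            stage_out.append((local, target.strip()))  # copy after the job ends
    return stage_in, stage_out


stage_in, stage_out = staging_plan(JOB_XML)
print("before job:", stage_in)
print("after job:", stage_out)
```

The URI scheme is passed through untouched, which reflects the point that JSDL does not mandate the protocol; choosing a transfer client for each scheme would be a separate step.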

Stage Data

Candidate Hosts: resources that can be used to run the given application.

The candidate host list can contain personal and default hosts (available to all users).

In future, resource broker (RB) matchmaking will be used to select the execution host from the candidate hosts.

<jsdl:CandidateHosts>
  <jsdl:HostName>ngs.rl.ac.uk:2119</jsdl:HostName>
  <jsdl:HostName>clyde.dl.ac.uk:2119</jsdl:HostName>
</jsdl:CandidateHosts>

Candidate Hosts

Browse Host / Data Transfer

• File and recursive directory transfers between hosts

• File and directory operations

• Actions for updating application

1. JSF v1.1 (JavaServer Faces) GUI.

2. JSR-168 compliant: vanilla JSF (core spec) is JSR-168 compliant, so the application can be hosted as a standalone Web application or as a portlet within institutional portals (JSF extensions can be problematic).

3. Spring v2.0 for managing objects in an n-tier server application (highly recommended; brings J2EE-style services to non-J2EE apps, e.g. Tomcat/Jetty apps).

• Declarative transaction demarcation (akin to EJB 3 session beans).

• Data source management (e.g. JPA persistence context, Hibernate Session).

• Propagation of the data source across DAOs / session façades during long-running transactions.

4. c3p0 pooled database connections.

5. JPA (Java Persistence API) for ORM (object-relational mapping). Hibernate 3.2 for the domain model (could use Kodo, TopLink, Apache OpenJPA).

6. Java CoG Kit for the Globus API.

7. Object/XML data binding framework: XMLBeans / JAXB.

Technical

• Parametric jobs (parametric JSDL extension schema – defines parametric variables, functions, ranges for modifying JSDL doc for iteration).

• Middleware extensions, e.g. gLite resource broker, JSDL conversion to JDL (aim to use SAGA).

• Integrate OMII WHIP artefact sharing framework (gather and bundle remote resources / artefacts together into self contained application bundle, e.g. executable for particular OS, src, input files, data files).

• Support roles / VOs (for artefact sharing, not just public / personal).

• Enable Shibboleth authentication.

• Describe more apps using NGS Uniform Execution Environment (UEE) - standard way to describe same application across different (NGS) resources – consistent JSDL description with multiple candidate hosts for the same app.

• Improvements / refinements (AJAXify)

TODO

• Staging from more Data Grid + Web protocols (SRB). Browsing / file operations with different data storage resources. Staging across different protocols adds complexity (buffering required).

CURRENT

Please come and find me at the NGS Stand

Demo on the OMII booth (2.00pm Wed)

https://portal.ngs.ac.uk

1. Please contact NGS to request more hosted applications.

Summary

1. JSDL Repository: https://portal.ngs.ac.uk

• Search/browse for JSDL (personal and shared) by category of interest (e.g. bioinformatics, chemistry, tutorials/examples). Select, load, save application (run either ‘out-of-the-box,’ or modify/tweak as required).

• JSDL documents can be pre-configured and published by domain experts / resource admins (users benefit from sharing expertise and artefacts captured in JSDL).

• Community formation around a “best practice” approach (JSDL is an OGF recommendation).

2. JSDL GUI Editor: for authoring, validating, sharing, uploading app descriptions.

3. Grid Operations: File Staging, Application Submission, Monitoring.

4. Generic and not tied to any particular set of Grid technologies. Extend to support more middleware and staging protocols.