PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

46
PROGRESS – Computing Portal and Data Management in the Cluster of SUNs Michał Kosiedowski Sun HPC Consortium Heidelberg 2003

description

PROGRESS – Computing Portal and Data Management in the Cluster of SUNs. Michał Kosiedowski Sun HPC Consortium Heidelberg 2003. R & D Center. PSNC was established in 1993 and is an R&D Center in: New Generation Networks POZMAN and PIONIER networks 6-NET, SEQUIN, ATRIUM projects - PowerPoint PPT Presentation

Transcript of PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Page 1: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

PROGRESS – Computing Portal and Data Management in the

Cluster of SUNsMichał Kosiedowski

Sun HPC ConsortiumHeidelberg 2003

Page 2: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 2

R & D Center• PSNC was established in 1993 and is an

R&D Center in:– New Generation Networks

• POZMAN and PIONIER networks• 6-NET, SEQUIN, ATRIUM projects

– HPC and Grids• GRIDLAB, CROSSGRID, PROGRESS projects

– Portals and Content Delivery Tools• Polish Educational Portal "Interkl@sa", Multimedia

City Guide, dLibra framework

Page 3: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 3

Center of Excellence• PSNC became the

Sun CoE in New Generation Networks, Grids and Portals in November 2002

Page 4: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 4

PROGRESS• Duration: December 2001 – May 2003 (R&D)• Budget: ~4,0 MEuro• Project Partners

– SUN Microsystems Poland– PSNC IBCh Poznań– Cyfronet AMM, Kraków– Technical University Łódź

• Co-funded by The State Committee for Scientific Research (KBN) and SUN Microsystems Poland

Page 5: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 5

PROGRESS (2)• Cluster of 80

processors• Networked

Storage of 1,3 TB• Software:

ORACLE, HPC Cluster Tools, Sun ONE, Sun Grid Engine

Page 6: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 6

PROGRESS – architecture

Page 7: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 7

PROGRESS – the talk• HPC Window – the access environment

to grid resources: user interfaces (Computing Portal and Migrating Desktop) + Grid Service Provider

• Data Management System – a distributed system designed for storing scientific data used in experiments performed in the computing cluster (grid)

Page 8: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 8

HPC Window - motivation• grid access environments lacked

flexibility: one grid -> one portal• user must have relied on the grid

as far as the history is concerned• user interfaces were not that much

functional

Page 9: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 9

Grid-Portal EnvironmentPORTAL

HPCRESOURCES

GRIDMANAGEMENT

SYSTEM

GRID SERVICEPROVIDER

4-tier new4-tier newgrid-portal environmentgrid-portal environment

PORTAL

HPCRESOURCES

GRIDMANAGEMENT

SYSTEM

3-tier classical3-tier classicalgrid-portal grid-portal

environmentenvironment

Page 10: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 10

PROGRESS GPE

Content P

rovider

Webservice

Session B

ean

Entity B

eans

Page 11: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 11

PROGRESS HPC Portal• interaction with users• presentation of data obtain from

services

Page 12: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 12

PROGRESS HPC Portal (2)

Page 13: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 13

PROGRESS HPC Portal (3)

Page 14: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 14

PROGRESS HPC Portal (4)

Page 15: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 15

PROGRESS HPC Portal (5)

Page 16: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 16

PROGRESS HPC Portal (6)

Page 17: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 17

PROGRESS HPC Portal (7)

Page 18: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 18

PROGRESS HPC Portal (8)

Page 19: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 19

Grid Service Provider• the use of the grid resources more comfortable

to the end users• allows for easy building of numerous portals and

other user interfaces; users can switch from one to another and use the same GSP services

• various thematic scientific web portals sharing the same grid resources

• possibility of providing all clients (user interfaces) with computing resources belonging to two or more different grids

Page 20: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 20

Grid Service Provider (2)• Necessary services to provide:

– job submission service• managing the creation of user jobs, their submission to the

grid and the monitoring of their execution (typically through reverse reporting performed by the Grid Management System about events connected with the execution of jobs)

– application management service• storing information about applications available for running

in the grid• assisting application developers in adding new applications

to the application repository– provider management service

• keeping up-to-date information on the services available within the provider

Page 21: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 21

GSP: Job submission service

• computing job building, submitting them to the grid for execution and viewing the results

• job description is prepared using the XRSL language and transferred to the grid resource broker for the execution of the job

• grid resource broker reverse reports on grid events connected with the job

• "workflowed" jobs: sequences and parallels

Page 22: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 22

GSP: Application mgmt. srv.

• application repository management• application descriptor contains a reference to the

application executable: a reference to a file stored in the DMS or a path to a binary on grid computing server filesystems

• also included in the application descriptor: available (required or optional) arguments, required environment variables and required input and output files

• applications in PROGRESS may be unconfigured or configured: one executable -> multiple configured applications

• virtual applications

Page 23: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 23

GSP: Provider mgmt. service

• enables keeping up-to-date information on services available in the grid service provider

• a service descriptor contains information on the Web Service interface: URL at which the service is available, the service namespace reference (URN) and the service WSDL reference

• services may have multiple instances: informational services

Page 24: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 24

GSP: Informational services

• examples of instance enabled services

• intended for use by web portals• PROGRESS example: short news

service• other: document directory,

discussion forum (under development)

Page 25: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 25

GSP: XRSL Language• Extended Resource Specification

Language (XRSL) is an XML based language designed for description of computing jobs

• the XML documents describing grid computing jobs are passed to the grid resource broker, which analyzes them and executes jobs in accordance with requirements included

Page 26: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 26

Data Mgmt. Syst. - motivation

• data management systems were not oriented towards grid access environment

• data management systems concentrated on distributing data between grid computers

Page 27: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 27

Data Management SystemINTERNET

DBFile System

Data Broker

Data Storage

Mirror & Proxy

Data Storage Data Storage

MetadataRepository

SRS

UniTree

WS

GASSFTPGrid FTP(...)

Clients

Portal

Grid broker

Migrating desktop

Page 28: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 28

Data Management System (2)

• provides seamless access to data and information for grid computing

• uses metadata for describing stored data

• stores data on various media such as files, tapes and databases

• serves as the source of input data and the destination for the results of computing experiments

Page 29: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 29

DMS: Data brokerINTERNET

DBFile System

Data Broker

Data Storage

Mirror & Proxy

Data Storage Data Storage

MetadataRepository

SRS

UniTree

WS

GASSFTPGrid FTP(...)

Clients

Portal

Grid broker

Migrating desktop

Page 30: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 30

DMS: Data broker (2)• serves as an interface (Web

Services based) for external clients, such as the HPC Portal and the grid resource broker

• delivers DMS functions for directory, file and metadata management

Page 31: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 31

DMS: Data broker (3)• directory mgmt.: add, remove and rename

directories, retrieve root and current path, change path, list contents

• file mgmt.: add, remove and rename files, add, remove and retrieve physical location, add and remove archives, add and remove symbolic links

• metadata mgmt.: metadata scheme mgmt., retrieve list of schemes and attributes, assign schemes to files and edit values and attributes, search metadata repository

Page 32: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 32

DMS: Metadata repositoryINTERNET

DBFile System

Data Broker

Data Storage

Mirror & Proxy

Data Storage Data Storage

MetadataRepository

SRS

UniTree

WS

GASSFTPGrid FTP(...)

Clients

Portal

Grid broker

Migrating desktop

Page 33: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 33

DMS: Metadata repository (2)

• responsible for storing and managing metadata

• format of the metadata scheme associated with a file can be defined by the user or chosen from the predefined formats like the Dublin Core

Page 34: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 34

DMS: Data storage modules

INTERNET

DBFile System

Data Broker

Data Storage

Mirror & Proxy

Data Storage Data Storage

MetadataRepository

SRS

UniTree

WS

GASSFTPGrid FTP(...)

Clients

Portal

Grid broker

Migrating desktop

Page 35: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 35

DMS: Data storage mod. (2)

• enable access to physical data• data are arranged in data containers

and can be stored on all media types and accessed by a uniform interface

• data can be organized as files on generic filesystems, BLOBs in databases or files on data tapes

• GASS, GridFTP and FTP as the data transport protocols

Page 36: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 36

DMS: Mirror & proxy module

INTERNET

DBFile System

Data Broker

Data Storage

Mirror & Proxy

Data Storage Data Storage

MetadataRepository

SRS

UniTree

WS

GASSFTPGrid FTP(...)

Clients

Portal

Grid broker

Migrating desktop

Page 37: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 37

DMS: Mirror & proxy mod. (2)

• enables access to external scientific databanks

• mirrors earlier defined sets of data (like the SRS databank in PROGRESS)

• serves as a proxy to internet based data resources (with caching functions)

Page 38: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 38

Web Services Communication

HPC Portal GridService Provider

DataManagement System

GridResource Broker

saveJob()getApplications()saveTaskOfJob()saveStdOfTask()submitJob()getUserJobs()getJobStatus()

listUserDirectory()addUserFile()

getUserFileLocation()

submitJob() changeJobStatus()

Page 39: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 39

Authentication & authorization

• utilize the services available within the Sun One Portal Server 6.0 package: authentication techniques, user database, portlet access control, identity server

• design an authorization system for the grid service provider and the data management system: based on the RAD model

• apply a Single Sign-On mechanism

Page 40: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 40

Authorization scheme

Portal

GRID SERVICEPROVIDER

Identity serverRAD based

authorizationsystem

Logon

Authentication

Request

Method invocation

Token validation

Resource accessauthorization

Page 41: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 41

Visualization of job results

Page 42: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 42

Where to go now?• Project: Research & Development

finished; the test and deployment phase now

• We will continue the R&D on the tools, including the grid service provider and the data management system

Page 43: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 43

Where to go now? (2)• PROGRESS HPC Portal is a

bioinformatic thematic portal: other thematic scientific portals possible to deploy, using the same grid service provider, data management system and grid resources and utilizing the same portal tools

Page 44: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 44

Where to go now? (3)• The grid service provider may be

equipped with means of communication with multiple grids:– cooperation with GRIDLAB– perhaps some Sun Grid Engine based

grids?

Page 45: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 45

Where to go now? (4)• encourage application developers to

add their applications to the PROGRESS repository

• encourage visualization module developers to add their software to the PROGRESS repository

• encourage bioinformatic experts to take over as information services editors

Page 46: PROGRESS – Computing Portal and Data Management in the Cluster of SUNs

Sun HPC Consortium 2003 46

PROGRESShttp://progress.psnc.pl/

http://progress.psnc.pl/portal/

[email protected]