INFN-GRID Globus evaluation
(WP 1)
Massimo Sgaravatto, INFN Padova
for the INFN Globus group
[email protected]
http://www.infn.it/globus
Globus
Some basic services (security, information service, resource management, …) must be implemented in order to build and use a Grid for real applications
Globus identified as a possible Grid framework providing these services
… but it has been developed mainly for “traditional” computing, which differs from computing in HEP
  High performance vs. high throughput
  Supercomputers vs. PC farms
  Distributed data-intensive computing not addressed
Need to assess what can be used in the HEP environment
WP1 “Installation and Evaluation of the Globus Toolkit” of the INFN-GRID Project
  Goal: evaluation of the Globus toolkit
  Which services can be useful?
  What needs to be integrated or modified?
  What is missing?
Globus Architecture
[Layered diagram, top to bottom]
Applications
High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status
Core Services: Metacomputing Directory Service, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS
Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX
Proposed work plan
Security
  Mechanisms for user authentication and authorization are needed to access GRID resources
  Evaluation of GSI service
Information Service
  To discover the GRID resources (CPU, storage, network, …), mechanisms to “publish” them must be defined
  Analysis of GIS service to “publish” information using a uniform and standard interface
Resource Management
  A uniform interface is necessary to submit jobs to GRID resources
    Uniform standard interface to different resource management systems
    Uniform standard language for task management
  Assessment of Globus services for resource allocation and process management
Proposed work plan
Data Access and Migration
  High-performance and reliable tools are needed to “manage” data (access to remote data, data transfers, wide-area replicas, …)
  Assessment of Globus tools for data management (GASS, Globusftp)
Fault Monitoring
  Faults in a GRID environment must be promptly detected, and recovery mechanisms must be implemented
  Evaluation of HBM service for fault detection
Execution Environment Management
  Code migration (moving the application to where the job will actually be executed) as a possible implementation strategy
  Evaluation of GEM service to support code migration
Globus installation tools
  Reduce complexity and manpower for Globus installation and maintenance
Globus installation tools
See Flavia’s presentation
INFN-GRID installation tool to shorten the installation time of the Globus toolkit, avoid common mistakes, and support specific customisations
Options to install optional software, to apply INFN-specific customisations (INFN CA, configuration of a hierarchical GIS architecture), and to install and use specific INFN tools
Proven successful within INFN (used to set up an INFN GRID testbed) and also outside (CERN, FNAL, …)
Security
Evaluation of Globus GSI
  User authentication (implementation based on X.509 certificates)
  User authorization “managed” by grid-mapfile (mapping between Grid users and local users)
Some shortcomings, but the GSI security model seems to satisfy our requirements
Some shortcomings already addressed
  INFN CA used to sign certificates
  CRL (issued by INFN CA) distribution
  Centralized management of grid-mapfiles
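The grid-mapfile authorization step above can be sketched as a simple lookup. This is a minimal illustration, assuming the common one-entry-per-line layout (quoted certificate subject followed by one local account); the subjects and account names below are hypothetical, and entries mapping to several accounts are not handled.

```python
import shlex


def parse_grid_mapfile(text):
    """Return a dict mapping certificate subject (DN) -> local user name."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        subject, local_user = parts
        mapping[subject] = local_user
    return mapping


def authorize(mapping, subject):
    """Return the local account for an authenticated subject, or None."""
    return mapping.get(subject)


# Hypothetical entries, for illustration only.
EXAMPLE = '''
# subject                                   local account
"/C=IT/O=INFN/L=Padova/CN=Example User" euser
"/C=IT/O=INFN/L=Bologna/CN=Another User" auser
'''
```

Authentication (checking the X.509 certificate) is assumed to have happened already; the lookup only decides which local account, if any, the authenticated subject maps to.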
Security
Centralized management of the grid-mapfiles
Goal: ease the sharing of the same access policies (represented by the grid-mapfiles) among groups of hosts with common purposes
Proposed system
  Central repository (LDAP server) to store user certificates and to define groups of users
  Certificates published by the CA manager
  Group manager responsible for editing group memberships (using an LDAP client)
  Resource owners (Globus administrators) periodically (e.g. via a cron job) “connect” to this repository, “download” the subjects of the certificates that meet a specified criterion (e.g. all users of group X), and produce grid-mapfile entries
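The periodic step run by resource owners could look roughly like the sketch below. The LDAP repository is replaced by an in-memory dictionary (an assumption made to keep the example self-contained), and the group names, subjects and local accounts are all hypothetical.

```python
# Hypothetical stand-in for the central LDAP repository:
# group name -> certificate subjects of its members.
DIRECTORY = {
    "cms": [
        "/C=IT/O=INFN/L=Padova/CN=Example User",
        "/C=IT/O=INFN/L=Bologna/CN=Another User",
    ],
    "test": ["/C=IT/O=INFN/L=Padova/CN=Example User"],
}


def subjects_in_group(directory, group):
    """Return the subjects of all members of a group, in a stable order."""
    return sorted(directory.get(group, []))


def grid_mapfile_entries(directory, group_to_account):
    """Build grid-mapfile lines mapping each group member to a local account.

    group_to_account is a list of (group, local_account) pairs; if a subject
    belongs to several groups, the first pair listed wins.
    """
    seen = set()
    lines = []
    for group, account in group_to_account:
        for subject in subjects_in_group(directory, group):
            if subject in seen:
                continue
            seen.add(subject)
            lines.append(f'"{subject}" {account}')
    return lines
```

A cron job on each resource would then write the returned lines to the local grid-mapfile, for example with `grid_mapfile_entries(DIRECTORY, [("cms", "cmsuser")])`.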
Security
AFS tests
  Analysis of what can be done now with the existing tools (quite unfit for any real need)
  Possible ways to address the existing shortcomings identified
  New Globus tool (gsiklog) available
Information Service
See Alessandro’s presentation
Evaluation of Globus GIS (Grid Information Service)
  Definition and implementation of a hierarchical architecture of GIS 1.1.3
  Performance and scalability tests
  Web interface for browsing
Various shortcomings must be addressed (to use the GIS in a production environment)
  Mixed push/pull model more suitable than a pull model
  Performance
  Lack of security
  …
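One possible reading of a mixed push/pull model is sketched below: resources push updates when their state changes, and the information service pulls only when its cached entry has gone stale. The class name, the time-to-live policy and the attribute names are assumptions for illustration, not part of the Globus GIS.

```python
class InfoCache:
    """Cache of resource information fed by pushes, refreshed by pulls."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # resource name -> (timestamp, data)

    def push(self, resource, data, now):
        """Called when a resource reports its own state change."""
        self._entries[resource] = (now, data)

    def get(self, resource, now, pull):
        """Return cached data if still fresh, otherwise query the resource."""
        entry = self._entries.get(resource)
        if entry is not None and now - entry[0] <= self.ttl:
            return entry[1]
        data = pull(resource)  # fall back to the pull model
        self._entries[resource] = (now, data)
        return data
```

Timestamps are passed in explicitly so the behaviour is easy to reason about; a real service would read a clock instead.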
INFN GIS Topology
[Diagram: hierarchy of GIIS servers, with GRIS servers registered below them]
Top Level INFN GIIS: dc=infn,dc=it,o=grid
  Bologna GIIS: dc=bo,dc=infn,dc=it,o=grid
  Padova GIIS: dc=pd,dc=infn,dc=it,o=grid
INFN CMS GIIS: exp=cms,o=grid
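The naming hierarchy in the topology can be illustrated by walking a directory suffix up towards its root. This is only a sketch: the server labels are taken from the diagram, the helper names are hypothetical, and the comma-based parsing ignores escaped commas that real LDAP names may contain.

```python
def normalize(dn):
    """Lower-case a distinguished name and strip spaces around its parts."""
    return ",".join(part.strip().lower() for part in dn.split(","))


def parent(dn):
    """Return the suffix one level up, or None at the root."""
    parts = normalize(dn).split(",")
    return ",".join(parts[1:]) if len(parts) > 1 else None


# Suffixes and labels as they appear in the topology diagram.
GIIS = {
    normalize("dc=infn,dc=it,o=grid"): "Top Level INFN GIIS",
    normalize("dc=bo,dc=infn,dc=it,o=grid"): "Bologna GIIS",
    normalize("dc=pd,dc=infn,dc=it,o=grid"): "Padova GIIS",
    normalize("exp=cms,o=grid"): "INFN CMS GIIS",
}


def giis_chain(dn):
    """List the GIIS servers found from the given suffix up to the root."""
    chain = []
    current = normalize(dn)
    while current is not None:
        if current in GIIS:
            chain.append(GIIS[current])
        current = parent(current)
    return chain
```

For the Padova suffix the chain contains the site GIIS followed by the top-level INFN GIIS, which mirrors how a query can be answered at the site level or escalated upwards.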
Resource Management
Most of these activities carried out in collaboration with the Grid Workload Management work package
Evaluation of the Globus resource management architecture
Evaluation of Globus GRAM
  Tests with fork, Condor, LSF and PBS as underlying resource management systems
  The model is fine, but it lacks “robustness” (needed for real production environments)
    Memory leaks in the Globus job manager (fixed)
    Scalability (one job manager for each job)
    Reliability (the job manager is not persistent)
    …
Globus resource management architecture (simplified design)
[Diagram]
Broker: uses the Grid Information Service (GIS) for resource discovery, chooses the resources to which the jobs are submitted (not implemented in the Globus framework), and submits jobs (RSL) to the sites
Grid Information Service (GIS): information on characteristics and status of local resources
Site 1: Globus GRAM in front of Condor
Site 2: Globus GRAM in front of LSF
Site 3: Globus GRAM in front of PBS
Globus GRAM as uniform interface to different local resource management systems (farms)
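Since the broker is the piece not provided by the Globus framework, a minimal sketch of its selection step may help. The record layout (`contact`, `free_cpus`), the host names and the "most free CPUs" policy are all assumptions chosen for illustration; a real broker would use whatever the information service publishes.

```python
def choose_resource(resources, cpus_needed):
    """Pick the contact string of the resource with the most free CPUs.

    resources is a list of dicts as they might be returned by a query to
    the information service. Returns None if no resource can run the job.
    """
    candidates = [r for r in resources if r["free_cpus"] >= cpus_needed]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: r["free_cpus"])
    return best["contact"]


# Hypothetical snapshot of three sites, one per local resource manager.
SNAPSHOT = [
    {"contact": "gram.site1.example.org/jobmanager-condor", "free_cpus": 12},
    {"contact": "gram.site2.example.org/jobmanager-lsf", "free_cpus": 3},
    {"contact": "gram.site3.example.org/jobmanager-pbs", "free_cpus": 20},
]
```

The chosen contact would then be used to submit the job description through GRAM, which hides the differences between Condor, LSF and PBS.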
Resource Management
Evaluation of GRAM API
Evaluation of GRAM Reporter (“cooperation” between GRAM and GIS), in particular for farms
  Many useless attributes (at least for our needs), attributes not calculated (always defined as 0), some attributes not properly calculated by Globus shell scripts
  Some important information describing the farms and the submitted jobs (necessary, for example, for a resource broker) is missing
  Draft proposal for a possible modification of the default schema
Evaluation of RSL as uniform language to specify resources
  More flexibility required
Submission of Condor jobs to Globus resources
  Condor-G (useful as a reliable, crash-proof job submission service)
  GlideIn
Evaluation of MPICH-G2 vs. MPICH
  Some shortcomings found (lack of support for shared memory, worse latency than MPICH)
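To make the role of RSL concrete, the sketch below renders a job description as a conjunctive RSL request of the form `&(attribute=value)...`. The helper names are hypothetical, only simple attribute/value pairs are covered, and values containing double quotes are rejected rather than escaped.

```python
def _literal(value):
    """Render one value, quoting it when it contains special characters."""
    text = str(value)
    if '"' in text:
        raise ValueError("embedded double quotes are not handled in this sketch")
    special = "()=&|+<>!"
    needs_quotes = text == "" or any(ch.isspace() or ch in special for ch in text)
    return f'"{text}"' if needs_quotes else text


def to_rsl(attributes):
    """Render a dict as an RSL conjunction: &(name=value)(name=value)..."""
    clauses = []
    for name, value in attributes.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        rendered = " ".join(_literal(v) for v in values)
        clauses.append(f"({name}={rendered})")
    return "&" + "".join(clauses)
```

For example, `to_rsl({"executable": "/bin/hostname", "count": 2})` yields `&(executable=/bin/hostname)(count=2)`, the same request regardless of whether the target GRAM sits in front of Condor, LSF or PBS.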
Data management
Tests with GASS
  Service to ease the access to remote files without having a distributed file system and/or transferring files from/to remote storage systems
  Tests with command line tools and APIs
  Problems (huge decrease in transfer rate) when transferring big files
Tests with Globusftp alpha release 2
  Collaboration with the INFN-GRID network work package
  Tests of new features
    Support for GSI mechanisms
    Capability of resuming interrupted file transfers
    Throughput tests using parallel data transfers
  See Antonio’s presentation
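The arithmetic behind parallel and resumable transfers is simple enough to sketch. The functions below only compute byte ranges; they do not talk to Globusftp, and their names are hypothetical.

```python
def split_ranges(size, streams):
    """Split [0, size) into contiguous (offset, length) chunks, one per stream.

    Chunk lengths differ by at most one byte; empty chunks are dropped when
    there are more streams than bytes.
    """
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams >= 1")
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges


def resume_range(size, bytes_received):
    """Return the (offset, length) still to fetch, or None if complete."""
    if not 0 <= bytes_received <= size:
        raise ValueError("bytes_received must be between 0 and size")
    if bytes_received == size:
        return None
    return (bytes_received, size - bytes_received)
```

A resumed transfer can itself be parallelised by splitting only the remaining length and shifting each offset by the number of bytes already received.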
Other services
Fault Monitoring (HBM)
  Evaluation of HBM for fault detection (for “system” and “user” processes)
  … but the HBM package is not seeing active development
Execution Environment Management (GEM)
  Evaluation of GEM as a service for code migration
  … but the GEM service currently provides only limited capabilities (executable staging)
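Heartbeat-based fault detection, the idea behind HBM, can be sketched in a few lines. This is a generic illustration rather than the HBM API: the class, the timeout policy and the process names are assumptions.

```python
class HeartbeatMonitor:
    """Flag processes whose last heartbeat is older than a timeout."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self._last_seen = {}  # process id -> time of last heartbeat

    def heartbeat(self, process_id, now):
        """Record a heartbeat received from a monitored process."""
        self._last_seen[process_id] = now

    def failed(self, now):
        """Return the processes that have been silent for too long."""
        return sorted(
            pid for pid, seen in self._last_seen.items()
            if now - seen > self.timeout
        )
```

Both “system” and “user” processes could be registered the same way; detection only says that a process went silent, so a recovery mechanism would still have to decide what to do about it.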
WP 1: Deliverables & Milestones
Deliverables
  Tools, documentation and operational procedures for Globus deployment (6 months)
  Final report on suitability of the Globus toolkit as basic Grid infrastructure (6 months)
Milestones
  Basic deployment of a Grid infrastructure for the INFN GRID (6 months)
  Globus installed on ~40 machines at ~10 different sites
Conclusions
The activities of WP 1 are over
The Globus toolkit can provide basic services useful to create and deploy usable Grids, but many shortcomings and issues must be addressed
… more details in the report
Other info: http://www.infn.it/globus