D-Grid and DEISA: Towards Sustainable Grid Infrastructures
Transcript of D-Grid and DEISA: Towards Sustainable Grid Infrastructures
Michael Gerndt, Technische Universität München
Technische Universität München
• Founded in 1868 by Ludwig II, King of Bavaria
• 3 campuses: München, Garching, Freising
• 12 faculties with 20,000 students, 480 professors, 5,000 employees
• 5 Nobel Prize winners and three winners who studied at TUM
• 26 Diplom study programs, 45 Bachelor and Master programs
Faculty of Informatics
• 19 Chairs
• 30 professors, 3000 students
• 340 first-year students in winter semester 2004/05
• Computer science: Diplom, Bachelor, Master
• Information systems: Bachelor, Master
• Bioinformatics: Bachelor, Master
• Applied informatics, computational science and engineering
Thanks
• Prof. Gentzsch: slides on D-Grid
• My students in Grid Computing: slides on DEISA
• Helmut Reiser from LRZ: discussion on the status of D-Grid and DEISA
e-Infrastructure
1. Resources: Networks with computing and data nodes, etc.
2. Development/support of standard middleware & grid services
3. Internationally agreed authentication, authorization, and auditing infrastructure
4. Discovery services and collaborative tools
5. Data provenance
6. Open access to data and publications via interoperable repositories
7. Remote access to large-scale facilities: Telescopes, LHC, ITER, ..
8. Application- and community-specific portals
9. Industrial collaboration
10. Service Centers for maintenance, support, training, utility, applications, etc.
Courtesy Tony Hey
Biomedical Scenario
• Bioinformatics scientists have to execute complex tasks
• There is a need to orchestrate these services in workflows
[Diagram: tools, computational power, and storage and data services exposed in a service-oriented architecture (SOA). Courtesy Livia Torterolo]
Gridified Scenario

[Diagram: the same tools, computational power, and storage and data services (SOA), now accessed through a grid layer.]
Grid technology leverages both the computational and the data-management resources, providing optimisation, scalability, reliability, fault tolerance, QoS, …
[Diagram: applications access the grid through a grid portal/gateway. Courtesy Livia Torterolo]
D-Grid e-Infrastructure
• Building a National e-Infrastructure for Research and Industry
01/2003: Pre-D-Grid working groups, recommendation to government
09/2005: D-Grid-1: early adopters, ‘Services for Science’
07/2007: D-Grid-2: new communities, ‘Service Grids’
12/2008: D-Grid-3: Service Grids for research and industry

D-Grid-1: 25 MEuro, > 100 organizations, > 200 researchers
D-Grid-2: 30 MEuro, > 100 additional organizations, > 200 additional researchers, plus industry
D-Grid-3: call 05/2008
• Important:
  • Sustainable production grid infrastructure after funding stops
  • Integration of new communities
  • Evaluating business models for grid services
Structure of D-Grid
[Diagram: Steering Committee (CG, DGI, leaders of the technical areas), the DGI Integration Project, the Community Grids, and an Advisory Board of external experts and industry, linked by informs/advises/reports/reviews relationships; the Steering Committee coordinates cooperation. D-Grid-1, -2, -3: 2005–2011.]
[Diagram: the Community Grids (AstroGrid, C3-Grid, HEP-Grid, IN-Grid, MediGrid, ONTOVERSE, WIKINGER, WISENT, TextGrid, Im Wissensnetz, ...) build on the Integration Project DGI-2, which provides business services (SLAs, SOA integration, virtualization), a user-friendly access layer with portals, and generic grid middleware and grid services.]
DGI Infrastructure Project
• Goals
  • Scalable, extensible, generic grid platform for the future
  • Long-term, sustainable, SLA-based grid operation
• Structure
  • WP 1: D-Grid basic software components
    – large storage, data interfaces, virtual organizations, management
  • WP 2: Develop, operate and support a robust core grid
    – resource description, monitoring, accounting, and billing
  • WP 3: Network (transport protocols, VPN) and security (AAI, CAs, firewalls)
  • WP 4: Business platform and sustainability
    – project management, communication and coordination
DGI Services, Available Dec 2006
• Sustainable grid operation environment with a set of core D-Grid middleware services
• Central registration and information management for all resources
• Packaged middleware components for gLite, Globus and Unicore and for data management systems SRB, dCache and OGSA-DAI
• D-Grid support infrastructure for new communities with installation and integration of new grid resources
• Help-Desk, Monitoring System and Central Information Portal
DGI Services, Dec 2006, cont.
• Tools for managing VOs based on VOMS and Shibboleth
• Prototype for Monitoring & Accounting for Grid resources, and first concept for a billing system
• Network and security support for Communities (firewalls in grids, alternative network protocols,...)
• DGI operates “Registration Authorities” issuing internationally accepted grid certificates of DFN & GridKa Karlsruhe
• Partners support new D-Grid members in building their own “Registration Authorities”
DGI Services, Dec 2006, cont.
• DGI will offer resources to other Communities, with access via gLite, Globus Toolkit 4, and UNICORE
• The portal framework GridSphere can be used by future users as a graphical user interface
• For administration and management of large scientific datasets, DGI will offer dCache for testing
• New users can use the D-Grid resources of the core grid infrastructure upon request
D-Grid Middleware

[Diagram: users reach distributed compute resources, data/software resources in D-Grid, and a distributed data archive through an application-development and user-access layer (GAT API, GridSphere, plug-ins). High-level grid services (scheduling and workflow management, monitoring, accounting/billing, user/VO management, data management, security) build on the basic grid services LCG/gLite, Globus 4.0.1, and UNICORE, all running over the network infrastructure.]
The DGI Infrastructure (10/2007)

[Map: 2,200 CPU cores, 800 TB disk, 1,400 TB tape across the participating sites.]
HEP-Grid: p-p collisions at the LHC at CERN

Image courtesy Harvey Newman, Caltech

[Diagram: the LHC tiered computing model. The online system streams ~PBytes/sec off the detector; ~100 MBytes/sec reaches the offline processor farm (~20 TIPS) at the CERN computer centre (Tier 0). Tier 1 regional centres (FermiLab ~4 TIPS, France, Italy, Germany) are connected at ~622 Mbits/sec (or air freight, deprecated). Tier 2 centres (~1 TIPS each, e.g. Caltech) connect at ~622 Mbits/sec, and institutes (~0.25 TIPS) with a physics data cache serve physicist workstations (Tier 4) at ~1 MBytes/sec. 1 TIPS is approximately 25,000 SpecInt95 equivalents.]

There is a “bunch crossing” every 25 nsec. There are 100 “triggers” per second, and each triggered event is ~1 MByte in size.

Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
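The trigger figures quoted above can be sanity-checked with a line of arithmetic. A minimal sketch; the ~1e7 seconds of effective beam time per year is an assumption of this sketch, not a number from the slides.

```python
# Back-of-the-envelope check of the quoted LHC trigger figures.
triggers_per_sec = 100        # "100 triggers per second"
event_size_mb = 1.0           # "each triggered event is ~1 MByte"

rate_mb_per_sec = triggers_per_sec * event_size_mb
print(rate_mb_per_sec)        # matches the ~100 MBytes/sec link into Tier 0

# Yearly raw volume, assuming ~1e7 seconds of effective beam time per year
# (an assumption of this sketch, not a figure from the slides).
seconds_per_year = 1e7
volume_pb = rate_mb_per_sec * seconds_per_year / 1e9   # MB -> PB
print(volume_pb)              # on the order of a petabyte per year
```

At these assumed figures, Tier 0 ingests roughly a petabyte of triggered data per year, which is why the tier model pushes caching outward toward the institutes.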
AstroGrid
MediGRID - Data flow in Life Science Grids
Grid-based Platform for Virtual Organisations in the Construction Industry
BIS-Grid: Grid Technologies for EAI
Grid-based Enterprise Information System
Use of Grid technologies to integrate distributed enterprise information systems
D-Grid: Towards a Sustainable Infrastructure for Science and Industry
• 3rd Call: Focus on service provisioning for science & industry
• Close collaboration with: Globus Project, EGEE, DEISA, CoreGRID, NextGRID, …
• Application- and user-driven, not infrastructure-driven => NEED
• Focus on implementation and production, not grid research, in a multi-technology environment (Globus, UNICORE, gLite, etc.)
• Government is (thinking of) changing policies for resource acquisition (HBFG!) to enable a service model
DEISA: Distributed European Infrastructure for Supercomputing Applications
What is DEISA?
• A Framework 6 project
• A consortium of leading supercomputing centres in Europe

What are the main goals?
• To deploy and operate a persistent, production-quality, distributed supercomputing environment with continental scope
• To enable scientific discovery across a broad spectrum of science and technology; scientific impact (enabling new science) is the only criterion for success
DEISA: Principal Project Partners

• IDRIS-CNRS, Paris (F): Prof. Victor Alessandrini (project director)
• LRZ, München (D): Dr. Horst-Dieter Steinhoefer
• RZG, Garching (D): Dr. Stefan Heinzel
• CINECA, Bologna (I): Dr. Sanzio Bassini
• EPCC, Edinburgh (GB): Dr. David Henty
• CSC, Espoo (FIN): Mr. Klaus Lindberg
• SARA, Amsterdam (NL): Dr. Axel Berg
• ECMWF, Reading (GB) (weather forecast): Mr. Walter Zwieflhofer
• FZJ, Jülich (D): Dr. Achim Streit
• BSC, Barcelona (ESP): Prof. Mateo Valero
• HLRS, Stuttgart (D): Prof. Michael Resch
[Timeline 2004–2008:
• DEISA deployed UNICORE, the enabler for transparent access to distributed resources; it allows high-performance data sharing at a continental scale as well as transparent job migration across similar platforms. A virtual dedicated 1 Gb/s internal network provided by GEANT.
• Evolving star-like configuration: 10 Gb/s Phase-2 network, allowing high-performance data transfer services across sites.
• Deployment of a co-scheduling service, synchronizing remote supercomputers.]
DEISA Architecture

[Diagram: Linux clusters, AIX distributed superclusters, and vector systems federated in one infrastructure.]

The DEISA environment incorporates different platforms and operating systems: IBM Linux on PowerPC, IBM AIX on Power4/5, SGI Linux on Itanium, and NEC vector systems.

Since 2007, the DEISA infrastructure's aggregated computing power is close to 190 Teraflops.

The GRID file system allows storage of data in heterogeneous environments, avoiding data redundancy.
DEISA Operation and Services
• Load balancing the computational workload across national borders
• Huge, demanding applications are run by reorganizing the global operation to allocate substantial resources at one site
• The application runs “as such”, with no modification. This strategy relies only on network bandwidth, which will keep improving in the years to come.
UNICORE Portal
Jobs may be attached to different target sites and systems
Dependencies between tasks / subjobs
UNICORE Infrastructure
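The task-dependency model sketched above can be pictured as a small directed acyclic graph of subjobs, each released only when its predecessors finish. A minimal sketch of the idea, not the actual UNICORE API; the job names and the trivial runner are hypothetical:

```python
# Minimal sketch of dependency-ordered job execution, UNICORE-workflow style:
# a subjob runs only after every subjob it depends on has completed.
# Job names and the run() body are hypothetical illustrations.
from graphlib import TopologicalSorter

# subjob -> set of subjobs it depends on
workflow = {
    "preprocess": set(),
    "simulate_siteA": {"preprocess"},
    "simulate_siteB": {"preprocess"},
    "collect": {"simulate_siteA", "simulate_siteB"},
}

def run(job):
    print(f"running {job}")

# static_order() yields each job after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())
for job in order:
    run(job)
```

In any valid ordering, "preprocess" comes first and "collect" last; the two simulation subjobs could also run in parallel on different target systems, which is the point of attaching them to different sites.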
DEISA – Extreme Computing Initiative
• Launched in May 2005 by the DEISA Consortium, as a way to enhance its impact on science and technology.
• Applications adapted to the current DEISA grid
• International collaborations involving scientific teams
• Workflow applications involving at least two platforms
• Coupled applications involving more than one platform
• The Applications Task Force (ATASKF) was created in April 2005.
• It is a team of leading experts in high-performance and grid computing whose major objective is to provide the consultancy needed to enable users' adoption of the DEISA research infrastructure.
Common Production Environment
• DEISA Common Production Environment (DCPE)
  • Defined and deployed on each computer integrated in the platform
• The DCPE includes:
  • shells (Bash and Tcsh),
  • compilers (C, C++, Fortran and Java),
  • libraries (for communication, data formatting, numerical analysis, etc.),
  • tools (debuggers, profilers, editors, batch and workflow managers, etc.),
  • and applications.
• Accessible via the module command
  • list, load and unload each component
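In practice, access through `module` looks like the following session; the component name shown is illustrative, not taken from the actual DCPE catalogue, and the available names vary per site.

```shell
# Show which components are currently loaded in this session
module list

# Show all components the site makes available
module avail

# Load a component into the environment, then remove it again
# (the name "ddt" is an illustrative example)
module load ddt
module unload ddt
```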
Monitoring: The Inca System
• The Inca system provides user-level grid monitoring
  • Periodic, automated testing of the software and services required to support persistent, reliable grid operation
  • Collect, archive, publish, and display data
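The pattern described above (run checks on a schedule, archive the results, publish them) can be sketched in a few lines. This is an illustrative skeleton, not Inca's actual reporter API; the check names and the in-memory archive are hypothetical stand-ins for real probes and Inca's central depot.

```python
# Illustrative Inca-style monitoring skeleton: run registered checks,
# timestamp each result, and archive it for later publishing.
# The check names and probe bodies are hypothetical stand-ins for real
# tests (middleware versions, batch-queue reachability, certificates, ...).
import time

checks = {
    "compiler_present": lambda: True,   # e.g. would probe for a compiler
    "batch_system_up": lambda: True,    # e.g. would ping the queue manager
}

archive = []   # stands in for Inca's central depot

def run_all():
    """Run every registered check once; archive a timestamped record each."""
    results = [{"check": name, "ok": probe(), "at": time.time()}
               for name, probe in checks.items()]
    archive.extend(results)
    return results

# One monitoring sweep; a real deployment would repeat this on a schedule.
results = run_all()
print(all(r["ok"] for r in results))
```

A display layer would then read the archive and render per-site status, which is the user-visible half of Inca's "collect, archive, publish, display" cycle.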
Potential Grid Inhibitors
• Sensitive data, sensitive applications (medical patient records)
• Different organizations have different ROI
• Accounting, who pays for what (sharing!)
• Security policies: consistent and enforced across the grid !
• Lack of standards prevents interoperability of components
• Current IT culture is not predisposed to sharing resources
• Not all applications are grid-ready or grid-enabled
• SLAs based on open source (liability?)
• “Static” licensing models don't embrace the grid
• Protection of intellectual property
• Legal issues (privacy, national laws, multi-country grids)
Lessons Learned and Recommendations
• Large infrastructure update cycles
  • During development and operation, the grid infrastructure should be modified and improved in large cycles only: all applications depend on this infrastructure!
• Funding required after project
  • Continuity, especially for the infrastructure part of grid projects, is important. Therefore, funding should be available after the project to guarantee services, support, and continuous improvement and adjustment to new developments.
• Interoperability
  • Use software components and standards from open-source and standards initiatives, especially in the infrastructure and application-middleware layers.
• Close collaboration of grid developers and users
  • Mandatory to best utilize grid services and to avoid application silos.
Lessons Learned and Recommendations, cont.
• Management board steering collaboration
  • For complex projects (infrastructure and application projects), a management board (consisting of the leaders of the different projects) should steer coordination and collaboration among the projects.
• Reduce re-invention of the wheel
  • New projects should utilize the general infrastructure and focus on an application or a specific service, to avoid complexity, re-inventing wheels, and building grid application silos.
• Participation of industry has to be industry-driven
  • A push from outside, even with government funding, is not promising. Success will come only from real needs, e.g. through existing collaborations between research and industry, as a first step.