Update on EU DataGrid progress and plans for EGEE
Fabrizio Gagliardi
EU DataGrid Project Leader
www.edg.org
California 14 January 2003 EDG/EGEE status 2
Overview
Project Outline
ATLAS task force
CMS stress test
Tutorials
Relationships with other grid projects
Future Directions (EGEE)
Summary
The Project
9.8 M Euros EU funding over 3 years
90% for middleware and applications (HEP, Earth Obs. and Bio Med.)
Three-year phased developments & demos (2001-2003)
Total of 21 partners: research and academic institutes as well as industrial companies
Related projects and activities:
  DataTAG (2002-2003)
  CrossGrid (2002-2004)
  GRIDSTART (2002-2004)
  Grace (2002-2004)
DataGRID project priorities
Quality Policy Statement published
http://eu-datagrid.web.cern.ch/eu-datagrid/WP12/default.htm
List of priorities defined at the project retreat
http://documents.cern.ch/age?a021130
Followed-up at project conference
http://www.tomiexpress.hu/datagrid/
Show-stoppers found by users on the application testbed are the highest priority
Incremental improvements to current release driven by the needs of the applications (HEPCAL)
After initial middleware development and testbed deployment, effort has been refocused on quality and stability
ATLAS-EDG Task Force (by Oxana Smirnova)
ATLAS is eager to use Grid tools for the Data Challenges
  ATLAS Data Challenges are already on the Grid (NorduGrid, USA)
  The DC1/phase2 (to start in October) is expected to be done using the Grid tools to a bigger extent
ATLAS-EDG Task Force was put together in August with the aims:
  To assess the usability of the EDG testbed for the immediate production tasks
  To introduce Grid awareness to the ATLAS collaboration
The Task Force has representatives from both ATLAS and EDG: 40 members (!) on the mailing list, ca 10 of them working nearly full-time
The initial task: to process 5 input partitions of the Dataset 2000 at the EDG Testbed + one non-EDG site (Karlsruhe); if this works, continue with other datasets
Achievements (by Oxana Smirnova):
A team of hard-working people across Europe
ATLAS software (release 3.2.1) is packaged into relocatable RPMs, distributed and validated elsewhere
DC1 production script is “gridified”, submission script is produced
User-friendly testbed status monitor deployed
5 Dataset 2000 input files are replicated to 5 sites (2 @ each)
After fixing the “long jobs” problem, 50% of the planned challenge is performed (5 researchers × 10 jobs) – unfortunately, only CERN testbed was fully available
With the rest of the testbed being fixed, jobs are getting scheduled and executed elsewhere
Second test: 4 input files (ca 400 MB each) replicated to 4 sites; 250 jobs submitted, adjusted to run ca 4 hours each. The jobs were distributed across all the testbed by the Resource Broker
Summary (by Oxana Smirnova)
Advantages of the Grid:
  Possibility to execute tasks and move files over a distributed computing infrastructure by using one single personal certificate (no need to memorize dozens of passwords)
  Possibility to distribute the workload adequately and automatically, without logging in explicitly to each remote system
  Possibility to do worldwide production in a perfectly coordinated way, using identical software (RPMs), scripts and databases
Where we are now:
  Several Grid toolkits are on the market
  EDG – probably the most elaborated, but still in development
  This development goes way faster with the help of the users running real applications
  Common efforts of the ATLAS-EDG Task Force proved that it is possible to execute real tasks on the EDG Testbed already now
Thanks to all the members for the efforts so far, but there’s more to be done!
CMS/EDG stress test status
Andrea Sciabà
on behalf of CMS & EDG collaboration
CCS general meeting
December 3, 2002
Sites and resources
Site      CE           No. of CPUs    SE           Disk space (GB)
CERN      lxshare0227  122            lxshare0393  100
                                      lxshare0384  1000 (=100×10)
CNAF      testbed008   40             grid007g     1000
RAL       gppce05      12             gppse05      360
(NIKHEF)  tbn09        20             tbn03        35
Lyon      ccgridli03   120 (shared)   ccgridli07   200
          ccgridli08   400
Legnaro   cmsgrid001   48             cmsgrid002   513 (+513)
Total                  762                         3721
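The totals in the table can be cross-checked with a few lines of Python. This is only a sketch: the grouping of values per site is an assumption about how the flattened table should be read (both Lyon CPU figures are counted, and Legnaro's "(+513)" is counted as extra disk). Under that reading the per-site figures add up to the stated totals.

```python
# Per-site figures transcribed from the "Sites and resources" table.
# Assumption: each site maps to (CPU counts, SE disk sizes in GB),
# with every figure listed for the site included in the sum.
sites = {
    "CERN":    ([122],      [100, 1000]),
    "CNAF":    ([40],       [1000]),
    "RAL":     ([12],       [360]),
    "NIKHEF":  ([20],       [35]),
    "Lyon":    ([120, 400], [200]),
    "Legnaro": ([48],       [513, 513]),
}

total_cpus = sum(sum(cpus) for cpus, _ in sites.values())
total_disk_gb = sum(sum(disks) for _, disks in sites.values())

print(total_cpus)     # 762
print(total_disk_gb)  # 3721
```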
CMSIM events vs. time
Current issues
The biggest problems are related to the Information System:
  Symptom: no resources are found
  Cause: instability of the MDS when it is overloaded
  Solution: submitting jobs at a lower rate improves the chances of success
  Symptom: the RB gets stuck (no job ever starts)
  Cause: investigating...
  Symptom: grid elements disappear from the II
  Cause: services on some machines stopped working
  Solution: restart the services
  Symptom: timeouts when copying the input sandbox
  Symptom: log file lost (“Stdout does not contain useful data”)
  Cause: several (no free files/inodes, broken connection between CE & RB, …)
Problems related to the replica manager:
  Symptom: file registration in the RC fails from time to time
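The workaround of submitting jobs at a lower rate can be sketched as a simple paced loop. This is an illustration only, not the procedure used in the stress test: `submit` is a hypothetical callable standing in for whatever submission command the testbed provides, and the 30-second default interval is an arbitrary assumption.

```python
import time
from typing import Callable, Iterable, List


def submit_throttled(jobs: Iterable[str],
                     submit: Callable[[str], bool],
                     min_interval_s: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Submit jobs one at a time, pausing min_interval_s between submissions.

    `submit` is a hypothetical wrapper around the real submission command;
    it should return True on success. The 30 s default is an assumption.
    Returns the jobs whose submission failed, so they can be retried later.
    """
    failed = []
    for i, job in enumerate(jobs):
        if i > 0:
            sleep(min_interval_s)  # pace submissions to reduce load on the MDS
        if not submit(job):
            failed.append(job)
    return failed
```

The `sleep` parameter exists so that the pacing can be replaced in a dry run; the returned list of failed jobs can be fed back into the same function for a second pass.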
Current issues
None of these problems is a show-stopper and they happen just in a fraction of the jobs!
Fixes are already there for some of them (but not yet deployed)
Conclusions
50000 events (FZ files) produced in ~ 2 days!
The CMS Task Force has made impressive progress and the first results are promising. A few issues have been identified and solutions are being worked out/applied
The entire task force shows a fruitful cooperation between CMS and EDG!
Tutorials
DAY1
Introduction to Grid computing and overview of the DataGrid project
Security
Testbed overview
Job Submission
lunch
hands-on exercises: job submission
DAY2
Data Management
LCFG, fabric mgmt & sw distribution & installation
Applications and Use cases
Future Directions
lunch
hands-on exercises: data mgmt
The tutorials are aimed at users wishing to "gridify" their applications using EDG software and are organized over 2 full consecutive days. Approx. 100 people have followed the tutorial since August.
October: 3 & 4 – CERN; 31 & Nov 1 – CERN
December: 2 & 3 – Edinburgh; 5 & 6 – Italy; 9 & 10 – NIKHEF; 12 – Cracow
More sessions will be organised in the future
http://hep-proj-grid-tutorials.web.cern.ch/hep-proj-grid-tutorials/
Related Grid projects
Through links with sister projects, there is the potential for a truly global scientific applications grid
Interaction with sister projects
CrossGrid
  Using the same security certs.
  Testbed sites install EDG software
  Extending it for needs of intensive interactive applications
  Participating in the EDG testing activities
  Representatives in each project's architecture & management groups
DataTAG (EDT)
  EDT is deploying EDG sw to investigate inter-operability with US projects (iVDGL, GriPhyN, PPDG)
  Results feed back into EDG software releases (e.g. GLUE compatible information providers/consumers)
NorduGrid
  Using the same security certs.
  Involved in EDG architecture work
  Good ideas for gatekeeper and MDS configuration
  Helped develop GDMP and GSI extensions for Replica Catalog
  Involved in Glue schema work
  Security policy
  Mware testing
  Working in WP8 (HEP applications)
iVDGL/GriPhyN/PPDG
  US members in EDG architecture group
  Looking for common packaging and toolkit usage solutions
No strict boundaries with a large cross-fertilization of ideas, software and people
DataGRID is learning from the experiences in these projects
Plans for the future
Further development in 2003
  Further iterative improvements to middleware driven by LCG and users' needs
  More extensive testbeds providing more computing resources
  Prepare EDG software for future migration to Open Grid Services Architecture
Interaction with LCG
  LCG intends to make use of the DataGRID middleware
  LCG is contributing to DataGRID
    Testbed support and infrastructure: get access to more computing resources in HEP computing centres
    Testing and verification: reinforce the testing group and maintain a certification testbed
    Fabric management and middleware development
New EU project (EGEE)
  Make plans to preserve current major asset of the project: probably the largest Grid development team in the world
  EoI for FP6 (www.cern.ch/egee-ei)
EGEE vision
Enabling Grids for E-science in Europe
Goal: create a general European Grid production quality infrastructure on top of present and future EU RN infrastructure
Build on EU and EU member states' major investment in Grid Technology
Several pioneering prototype results
Largest Grid development team in the world
Goal can be achieved for about €100m/4 years on top of the national and regional initiatives
Approach:
  Leverage current and planned national and regional Grid programmes (e.g. LCG)
  Work closely with relevant industrial Grid developers, NRNs and US
Diagram labels: EGEE, applications, Geant network
Work done so far
EoI for FP6: www.cern.ch/egee-ei submitted on June 7th
Several follow up meetings
An editorial board and an Interim Task Force established to prepare a position paper and a presentation for an EU Grid workshop in Brussels on October 3-4
Both bodies extended to follow-up with the EU (IST02, ER02, individual contacts)
GÉANT and GRIDs: The model
GRIDs use GÉANT infrastructure
GÉANT profits from technological innovation
GRIDs empowered GÉANT
International dimension
Diagram labels: GÉANT network, GRIDs platforms, Application areas, R&D on GRIDs
IST Programme: 3.825 M Euro
  Instruments: Integrated Projects, Networks of Excellence, Specific Targeted Projects, Coordinated actions, Support actions
Structuring the ERA Programme: 2.655 M Euro
  Research Infrastructures: 665 M Euro
  Instruments: Integrated Infrastructure Initiatives, Coordinated actions, Support actions
GÉANT, GRIDs, other ICT-RI: 100 + 200 M Euro
Separate calls for proposals!
More info on: http://www.cordis.lu/ist/fp6/activities.htm
(Slide credit: K. Baxevanidis, EU)
Communication Network Development Call
45-47 Million Euros available in the first EU call (Dec 17th, 2002)
Hard to get the whole budget: we will need to share with one, two or more projects, and a lot of competition is to be expected (1200 EoIs received in this area!)
Focus on support and integration of already established Grid infrastructures
Build a Grid production layer on top of the EU RN infrastructure
No major funds for H/W, CS research or application development (in a first approximation)
Integrated Infrastructure Initiative (I3)
Three lines of funding supported (with possible budget breakdown):
Networking activities (nothing to do with networks…):
  This is the overhead: management, coordination, dissemination and outreach (7-10% of the total funding)
Specific service activities:
  Provision and procurement of Grid services (60% of total funding)
Joint research activity:
  Engineering development to improve the services provided by the Grid infrastructure (20% of total funding)
  Application support and focused R&D (10% of total funding)
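As a quick sanity check, the possible breakdown can be summed and applied to a total. This is a sketch under assumptions: the shares are taken from the slide as stated, while the total of 100 M Euro passed to `breakdown` is purely hypothetical and chosen only to make the numbers easy to read.

```python
# Possible I3 budget shares from the slide, as (low, high) percentages.
shares_pct = {
    "Networking activities (overhead)":    (7, 10),
    "Specific service activities":         (60, 60),
    "Joint research: engineering":         (20, 20),
    "Application support and focused R&D": (10, 10),
}


def breakdown(total_meuro):
    """Return {funding line: (low, high)} amounts in M Euro for a given total."""
    return {name: (lo * total_meuro / 100, hi * total_meuro / 100)
            for name, (lo, hi) in shares_pct.items()}


low_sum = sum(lo for lo, _ in shares_pct.values())
high_sum = sum(hi for _, hi in shares_pct.values())
print(low_sum, high_sum)  # 97 100

# Hypothetical total of 100 M Euro (an assumption, not a figure for this call).
for name, (lo, hi) in breakdown(100).items():
    print(f"{name}: {lo}-{hi} M Euro")
```

The shares only reach 100% when the overhead sits at the top of its 7-10% range, which is consistent with the slide calling this a possible breakdown rather than a fixed one.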
Networking activities
Coordination and management of the participating Grid infrastructures
  Management structure to be defined
Dissemination, training and outreach
  Leverage EDG and other project tutorials
  Proposal from Terena received
  User clubs, industry forum etc.
Specific service activities
Integration of major national and international Grid infrastructures
Two tier structure:
  1st Tier: Major Grid centres (6-8). Must satisfy minimum level of Grid resources and staffing
  2nd Tier: POPs in all other Geant supported countries
EU resources for doubling the 1st tier centres' Grid support staff, a central operation centre and a distributed call and support centre
Interface to Geant follow-on project
Mostly staff and overhead (computer fabrics and storage provided by the partners)
Joint research activity
Focus on hardening and re-engineering of Middleware
Leverage current EU Grid projects and international Grid technology developers (large and established M/W development community)
8-10 WPs with critical mass in a single geographical center, dedicated WP managers hired by the project and reporting to the project technical management (possible international and industrial participation)
Quality assurance group, integration, certification and distribution group with industrial quality
International senior advisory group for project review, long term technology development and direction
Additional activities
Application support:
  high level interface and portals
  user requirements (a la HEPCAL)
CS focused activity:
  Long term CS issues for production quality Grids
Distribution of responsibilities
Motivation: provide transparent, effective process for proposal preparation
EGEE Executive Committee:
  Responsible for defining Work Packages and setting up Task Forces to deliver technical content for proposal
  Max ~10 persons for effective process
  Should represent stakeholders with major, proven computer and human resources to contribute to EGEE
  US has observer status (Ian Foster)
EGEE technical advisory board:
  Advise the Executive Committee on the overall architecture and specific technical issues
  US participation being investigated
Distribution of responsibilities
EGEE Editorial board:
  Responsible for gathering input from taskforces, overall editing of proposal, filling out administrative forms and maintaining timeline
EGEE National Partners board:
  Responsible for coordination and communicating with interested parties on national/regional level
  Ideally one person per country/region
  Consulted by Executive Committee during preparation of proposal, to ensure adequate transparency – must be seen as impartial
EGEE interest group:
  All institutes, companies, organisations interested to remain informed about progress of EGEE proposal
  Includes potential subcontractors for different workpackages
EGEE proposal timeline
Tentative Schedule (continued)
• EU call out on Dec 17th
• Draft 1: overall project structure end of February 2003
• Draft 2: with detailed workpackages end of March 2003
• Final proposal including admin and management end of April 2003
• Submission by May 6th 2003
• First feedback from EU in June-July
• Contract negotiation late summer, fall ’03
• Contract signature by the end of ’03
• Start of project Q1-Q2 ’04
Summary
ATLAS/DataGRID task force has been a successful experience for EDG
CMS stress test, still on-going, is a major advance in production quality performance in view of the next EU EDG review on February 4-5
Collaboration with LCG and other Grid projects (EU and US) enhanced
Deployment of a very large production Grid testbed being explored with the EU (EGEE) in close collaboration with LCG and the US Grid developers
Need to define a common EU-US roadmap for EGEE
This meeting is a good place to progress