EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE EGEE A Large-scale Production Grid...

Transcript of EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE EGEE A Large-scale Production Grid...

  • Slide 1
  • EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE A Large-scale Production Grid Infrastructure Erwin Laure EGEE Technical Director ISSGC06 July 16-28, 2006 Ischia, Italy
  • Slide 2
  • Lost in Definitions? Defining the Grid: access to (high performance) computing power; distributed parallel computing; improved resource utilization through resource sharing; increased storage provision; controlled access to distributed storage; interconnection of arbitrary resources (sensors, instruments, ...); collaboration between users and resources; a higher abstraction layer above network services; corresponding security.
  • Slide 3
  • Defining the Grid: A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user. This interconnection of users, resources, and services for jointly addressing dedicated tasks is called a virtual organization. Comparison between Grids and Networks: networks realize message exchange between endpoints; Grids realize services for the users, at a higher level of abstraction.
  • Slide 4
  • Defining the Grid: A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user.
  • Slide 5
  • The EGEE Project. Aim of EGEE: to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA). EGEE: 1 April 2004 to 31 March 2006; 71 partners in 27 countries, federated in regional Grids. EGEE-II: 1 April 2006 to 31 March 2008; expanded consortium, 91 partners.
  • Slide 6
  • Defining the Grid: A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user.
  • Slide 7
  • EGEE Infrastructure. Scale (June 2006): ~200 sites in 40 countries; ~25,000 CPUs; >10 PB storage; >35,000 jobs per day; >60 Virtual Organizations. [Map legend: country participating in EGEE.]
  • Slide 8
  • EGEE Infrastructures. Production service: scaling up the infrastructure with resource centres around the globe; a stable, well-supported infrastructure, running only well-tested and reliable middleware. Pre-production service: run in parallel with the production service (restricted number of sites); first deployment of new versions of the gLite middleware; test-bed for applications and other external functionality. T-Infrastructure (Training & Education): complete suite of Grid elements and applications (testbed, CA, VO, monitoring, support, ...); everyone can register and use GILDA for training and testing; 20 sites on 3 continents.
  • Slide 9
  • EGEE Operations Process. Geographically distributed responsibility for operations: there is no central operation. Regional Operation Centers (ROCs) are responsible for resource centers in their region. Tools are developed and hosted at different sites: GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon). Grid operator on duty: 6 teams working in weekly rotation (CERN, IN2P3, INFN, UK/I, Ru, Taipei); crucial in improving site stability and management; expanding to all ROCs in EGEE-II. Operations coordination: weekly operations meetings; regular ROC managers meetings; series of EGEE Operations Workshops (Nov 04, May 05, Sep 05, June 06). Procedures described in the Operations Manual: introducing new sites; site downtime scheduling; suspending a site; escalation procedures; etc. Highlights: distributed operation; evolving and maturing procedures; procedures being introduced into and shared with the related infrastructure projects.
  • Slide 10
  • Defining the Grid: A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user.
  • Slide 11
  • Production Grid Middleware. Key factors in EGEE Grid Middleware Development: 1. Strict software process: use industry-standard software engineering methods (software configuration management, version control, defect tracking, automatic build system, ...). 2. Conservative approach in what software to use. Avoid cutting-edge software: deployment on over 100 sites cannot assume a homogeneous environment, so the middleware needs to work with many underlying software flavors. Avoid evolving standards: evolving standards change quickly (and sometimes significantly, cf. OGSI vs. WSRF), which makes it impossible to keep pace on > 100 sites. Long (and tedious) path from prototypes to production.
  • Slide 12
  • EGEE Middleware: gLite. Exploit experience and existing components: VDT (Condor, Globus), EDG/LCG, AliEn. Develop a lightweight stack of generic EGEE middleware: dynamic deployment; pluggable components. Focus is on re-engineering and hardening. March 4, 2006: gLite 3.0. [Timeline figure, 2004 to 2006, with labels: LCG-2, gLite, prototyping, product, gLite 3.0.]
  • Slide 13
  • Developing gLite: version 3.0 is now available on the production infrastructure. After gLite 3.0: continuous release of single components, as needed by users and as made available by developers. Major releases provide a check-point, in general coinciding with major application challenges. Continuing development to: bring components not yet included in the release to maturity; improve functionality; increase robustness; increase usability; improve compliance with international standards.
  • Slide 14
  • Grid Interoperability. Leading role in building world-wide grids: incubator for new Grid projects world-wide. Interoperation efforts: bilateral (EGEE/OSG, EGEE/NDGF, EGEE/NAREGI); multilateral (Grid Interoperability Now, GIN). Experiences and requirements are fed back into the standardization process (GGF, now OGF). Strengthening contacts with industry.
  • Slide 15
  • Building Software for the Grid [layer diagram]. Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences. Middleware: Globus GT4, Condor, APST. Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH. Courtesy IBM. Slide courtesy David Abramson.
  • Slide 16
  • Building Software for the Grid [layer diagram]. Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences. Upper Middleware & Tools. Lower Middleware. Middleware: Globus GT4, Condor, APST. Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH. Additional label: Bonds. Courtesy IBM. Slide courtesy David Abramson.
  • Slide 17
  • Middleware structure. Higher-Level Grid Services: may or may not be used by the applications; should help them but not be mandatory. Foundation Grid Middleware: is deployed on the infrastructure; should not assume the use of Higher-Level Grid Services; must be complete and robust; should allow interoperation with other major grid infrastructures.
  • Slide 18
  • gLite Grid Middleware Services. Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
  • Slide 19
  • Job submission [diagram]. Components: User Interface, Resource Broker, Information System, Authorization Service, File and Replica Catalogs, and, at Site X, a Computing Element and a Storage Element. Arrow labels: submit, query, retrieve, publish state, update credential, discover services.
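The interaction on this slide (sites publish state, the Resource Broker queries the Information System and picks a Computing Element for a submitted job) can be illustrated with a toy matchmaking sketch. This is not gLite code: every class, field, and site name below is hypothetical, authorization is reduced to a per-site set of allowed virtual organizations, and the "most free CPUs" ranking rule is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str               # hypothetical site identifier
    free_cpus: int          # state the site publishes
    allowed_vos: frozenset  # virtual organizations authorized at this site


class InformationSystem:
    """Toy in-memory stand-in for the service that collects published state."""

    def __init__(self):
        self._ces = {}

    def publish(self, ce):
        self._ces[ce.name] = ce

    def query(self):
        return list(self._ces.values())


class ResourceBroker:
    """Matches a job to a computing element using the published state."""

    def __init__(self, info_system):
        self.info_system = info_system

    def submit(self, vo, cpus_needed):
        # Keep only sites that authorize the VO and have enough free CPUs.
        candidates = [
            ce for ce in self.info_system.query()
            if vo in ce.allowed_vos and ce.free_cpus >= cpus_needed
        ]
        if not candidates:
            return None  # no matching resource
        # Assumed ranking rule: prefer the site with the most free CPUs.
        best = max(candidates, key=lambda ce: ce.free_cpus)
        best.free_cpus -= cpus_needed
        return best.name


if __name__ == "__main__":
    info = InformationSystem()
    info.publish(ComputingElement("site-a", 10, frozenset({"atlas", "biomed"})))
    info.publish(ComputingElement("site-b", 40, frozenset({"atlas"})))
    broker = ResourceBroker(info)
    print(broker.submit("atlas", 8))
    print(broker.submit("biomed", 8))
```

The user never names a site in this sketch; the broker selects one from published state, which is the point the diagram makes.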
  • Slide 20
  • gLite Software Process [diagram]. JRA1 Development: software, error fixing. SA3 Integration: deployment packages, integration tests. SA3 Testing & Certification: functional tests, testbed deployment; installation guide, release notes, etc. SA1 Pre-Production: scalability tests, pre-production deployment. SA1 Production Infrastructure: release. Arrow labels: Fail, Pass, Problem, Serious problem, Directives.
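One possible reading of this process diagram is a staged pipeline in which a release advances on "Pass" and returns to development for error fixing on "Fail". The sketch below encodes only that reading; the stage order and the rule that every failure goes back to JRA1 development are assumptions drawn from the flattened diagram labels, not a documented EGEE procedure.

```python
# Stage names follow the labels on the slide; their order is an assumption.
STAGES = [
    "JRA1 development",
    "SA3 integration",
    "SA3 testing and certification",
    "SA1 pre-production",
    "SA1 production",
]


def next_stage(stage: str, passed: bool) -> str:
    """Advance one stage on a pass; return to development on a failure."""
    if not passed:
        return STAGES[0]
    index = STAGES.index(stage)
    # The last stage has no successor, so a pass keeps the release there.
    return STAGES[min(index + 1, len(STAGES) - 1)]


def trace(results):
    """Follow a release from development through a sequence of pass/fail results."""
    stage = STAGES[0]
    path = [stage]
    for passed in results:
        stage = next_stage(stage, passed)
        path.append(stage)
    return path


if __name__ == "__main__":
    # Two passes, one failure in certification, then a fresh start.
    for step in trace([True, True, False, True]):
        print(step)
```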
  • Slide 21
  • Defining the Grid: A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user.
  • Slide 22
  • EGEE Applications. >20 applications from: Astronomy, Biomedicine, Computational Chemistry, Earth Sciences, Financial Simulation, Fusion, Geo-Physics, High Energy Physics. Further applications are in evaluation. Applications are now moving from testing to routine and daily usage.
  • Slide 23
  • High Energy Physics. Large Hadron Collider (LHC): one of the most powerful instruments ever built to investigate matter; 4 experiments: ALICE, ATLAS, CMS, LHCb; 27 km circumference tunnel; due to start up in 2007. [Photo labels: Mont Blanc (4810 m), downtown Geneva.]
  • Slide 24
  • Accelerating and colliding particles
  • Slide 25
  • The LHC Accelerator. The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments' detectors.
  • Slide 26
  • LHC Data. This is reduced by online computers that filter out a few hundred good events per second, which are recorded on disk and magnetic tape at 100-1,000 megabytes/sec: ~15 petabytes per year for all four experiments.
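As a rough consistency check of the two figures on this slide, the yearly volume can be converted to an average rate and back. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes), a 365-day year, and a constant recording rate; none of these assumptions come from the slide, and real accelerators do not record continuously.

```python
# Sanity check of the data-volume figures quoted on the slide.
# Assumptions (not from the slide): decimal units, a 365-day year,
# and a constant recording rate over the whole year.

SECONDS_PER_YEAR = 365 * 24 * 3600
MB = 10**6
PB = 10**15


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s needed to accumulate the given yearly volume."""
    return petabytes_per_year * PB / SECONDS_PER_YEAR / MB


def yearly_volume_pb(rate_mb_per_s: float) -> float:
    """Volume in PB accumulated in one year of recording at a constant rate."""
    return rate_mb_per_s * MB * SECONDS_PER_YEAR / PB


if __name__ == "__main__":
    print(f"15 PB/year averages to {average_rate_mb_per_s(15):.0f} MB/s")
    for rate in (100, 1000):
        print(f"{rate} MB/s for a full year gives {yearly_volume_pb(rate):.1f} PB")
```

Under these assumptions, 15 PB per year corresponds to an average of a little under 500 MB/s, which lies inside the quoted 100-1,000 MB/s band.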
  • Slide 27
  • Data Handling and Computation for Physics Analysis [diagram]. Labels: detector; event filter (selection & reconstruction); raw data; reconstruction; event reprocessing; event simulation; simulation; processed data; event summary data; batch physics analysis; analysis objects (extracted by physics topic); analysis; interactive physics analysis.
  • Slide 28
  • LCG depends on two major science grid infrastructures: EGEE (Enabling Grids for E-sciencE) and OSG (US Open Science Grid).
  • Slide 29
  • Example: HEP. LHC data and service challenges: preparing for LHC start-up in 2007; ensure key services and infrastructure are in place; emphasis on providing a service. Computing needs of experiments: e.g. LHCb, ~700 CPU years in 2005 on the EGEE infrastructure; e.g. ATLAS, over 10,000 jobs per day. Massive data transfers: > 1.5 GB/s. [Plot labels: ATLAS, LHCb.]
  • Slide 30
  • Example: Addressing emerging diseases. Emerging diseases know no frontiers, and time is a critical factor (avian influenza: human casualties). International collaboration is required for: early detection; epidemiological watch; prevention; search for new drugs; search for vaccines.
  • Slide 31
  • WISDOM, the first step. WISDOM focuses on drug discovery for neglected and emerging diseases. Summer 2005: World-wide In Silico Docking On Malaria: 46 million ligands docked in 6 weeks; ~1 million virtual ligands selected; 1 TB of data produced; 1000 computers in 15 countries; equivalent to 80 CPU years. Spring 2006: drug design against H5N1 neuraminidase, which is involved in virus propagation: impact of selected point mutations on the efficiency of existing drugs; identification of new potential drugs acting on mutated N1. [Figure labels: N1, H5.]
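The malaria figures above can be cross-checked with simple arithmetic: how much CPU time one docking costs on average, and how many CPUs must be busy to deliver 80 CPU years in 6 weeks. The sketch assumes a 365-day year and even load over the whole period; both are simplifications, not claims from the slide.

```python
# Back-of-the-envelope check of the WISDOM malaria figures.
# Assumptions (not from the slide): 365-day year, evenly spread load.

SECONDS_PER_DAY = 24 * 3600
DAYS_PER_YEAR = 365


def cpu_seconds_per_docking(cpu_years: float, dockings: float) -> float:
    """Average CPU time spent on one docking, in seconds."""
    return cpu_years * DAYS_PER_YEAR * SECONDS_PER_DAY / dockings


def average_busy_cpus(cpu_years: float, wall_clock_weeks: float) -> float:
    """CPUs that must run continuously to deliver the CPU time in the period."""
    return cpu_years * DAYS_PER_YEAR / (wall_clock_weeks * 7)


if __name__ == "__main__":
    print(f"{cpu_seconds_per_docking(80, 46_000_000):.0f} CPU seconds per docking")
    print(f"{average_busy_cpus(80, 6):.0f} CPUs busy on average")
```

With the slide's numbers this gives about 55 CPU seconds per docked ligand and roughly 700 CPUs busy on average, which is in line with the quoted 1000 computers.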
  • Slide 32
  • Challenges for high throughput virtual docking. Millions of chemical compounds are available in laboratories; high throughput screening at 2$/compound is nearly impossible. Inputs: 300,000 chemical compounds (ZINC & chemical combinatorial library); target (PDB): neuraminidase (8 structures). Molecular docking (Autodock): ~100 CPU years, 600 GB data. Data challenge on EGEE, Auvergrid, TWGrid: ~6 weeks on ~2000 computers. Hits sorting and refining. In vitro screening of 100 hits.
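The sizing of this data challenge can be sketched in the same spirit. The code below assumes that every compound is docked against every target structure, that the 2$/compound figure applies to laboratory screening of the same compound set, and that a year has 365 days; these are illustrative assumptions, and the function names are hypothetical.

```python
# Rough sizing of the docking data challenge described on the slide.
# Assumptions (not from the slide): every compound is docked against every
# structure, and the per-compound lab cost applies to the same compound set.

DAYS_PER_YEAR = 365


def docking_runs(compounds: int, structures: int) -> int:
    """Number of compound/structure pairs to dock."""
    return compounds * structures


def capacity_cpu_years(computers: int, weeks: float) -> float:
    """CPU years available if all computers run for the whole period."""
    return computers * weeks * 7 / DAYS_PER_YEAR


def lab_screening_cost(compounds: int, dollars_per_compound: float) -> float:
    """Cost of screening every compound in the laboratory instead."""
    return compounds * dollars_per_compound


if __name__ == "__main__":
    runs = docking_runs(300_000, 8)
    capacity = capacity_cpu_years(2000, 6)
    print(f"{runs:,} docking runs")
    print(f"{capacity:.0f} CPU years available, {100 / capacity:.0%} needed for ~100 CPU years")
    print(f"${lab_screening_cost(300_000, 2):,.0f} to screen the same compounds in vitro")
```

On these assumptions the quoted ~100 CPU years fit within the ~230 CPU years that 2000 computers could supply in 6 weeks, which leaves room for scheduling overhead and failures.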
  • Slide 33
  • Example: Pharmacokinetics. A lesion is detected in an MRI study of a patient; start with a virtual biopsy. The process requires obtaining a sequence of MRI volumetric images. Different images are obtained in different breath-holds. Before analyzing the variation of each voxel, images must be co-registered to minimize deformation due to the different breath-holds. The total computational cost of a clinical trial of 20 patients is around 100 CPU days.
  • Slide 34
  • Example: Determining earthquake mechanisms. A seismic software application determines epicentre, magnitude, and mechanism. Analysis of the Indonesian earthquake (28 March 2005): seismic data within 12 hours after the earthquake; solution found within 30 hours after the earthquake occurred; 10 times faster on the Grid than on local computers. Results: not an aftershock of the December 2004 earthquake; different location (different part of the fault line, further south); different mechanism. Rapid analysis of earthquakes is important for relief efforts. [Figure labels: Peru, June 23, 2001, Mw=8.4; Sumatra, March 28, 2005, Mw=8.5.]
  • Slide 35
  • Flood forecasting problem. Many kinds of data: meteorological, hydrological, hydraulic; generated by simulations or obtained from sensors; permanent or periodically updated; publicly available or with restricted access.
  • Slide 36
  • ITU-BR system for RRC 2006. ITU-BR developed a system for RRC 2006: run compatibility and complementary analysis; 84 PCs executing 168 parallel tasks; compatibility analysis < 4 h. Great success! ITU-BR wanted to be sure and do even better: provide more CPU power; reduce risks by providing a supplementary system; gain experience on how to access large and reliable computing resources on demand. EGEE used a subset of its Grid for RRC 2006: over 400 PCs; compatibility analysis < 1 h.
  • Slide 37
  • The Future of Grids. Increasing the number of infrastructure users by increasing awareness: dissemination and outreach; training and education. Increasing the number of applications by improving application support and middleware functionality: improved usability through high-level grid middleware extensions. Increasing the grid infrastructure: incubating related projects; ensuring interoperability between projects; protecting user investments. Towards a sustainable grid infrastructure.
  • Slide 38
  • User Information & Support. More than 170 training events and summer schools across many countries: >3000 people trained (induction; application developer; advanced; retreats); material archive online with ~250 presentations. Public and technical websites; dissemination material constantly evolving to expand information and keep it up to date. 4 conferences organized (~460 at Pisa). Next conference: September 2006 in Geneva, ~600 participants.
  • Slide 39
  • Industry and EGEE-II. Industry Task Force: group of industry partners in the project; links related industry projects (NESSI, BEinGRID, ...); works with EGEE's Technical Coordination Group. Collaboration with the CERN openlab project: IT industry partnerships for hardware and software development. EGEE Business Associates (EBA): companies sponsoring work on joint-interest subjects. Industry Forum: led by industry to improve Grid take-up in industry; organises industry events and disseminates grid information, e.g. this Wednesday here at the school.
  • Slide 40
  • The Future of Grids. Increasing the number of infrastructure users by increasing awareness: dissemination and outreach; training and education. Increasing the number of applications by improving application support and middleware functionality: improved usability through high-level grid middleware extensions. Increasing the grid infrastructure: incubating related projects; ensuring interoperability between projects; protecting user investments. Towards a sustainable grid infrastructure.
  • Slide 41
  • Building Software for the Grid [layer diagram]. Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences. Upper Middleware & Tools. Lower Middleware. Middleware: Globus GT4, Condor, APST. Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH. Additional labels: Bonds, ???. Courtesy IBM. Slide courtesy David Abramson.
  • Slide 42
  • Portals on EGEE: P-Grade, Genius
  • Slide 43
  • Example: Biomedicine. Parallel simulation of blood flow on the Grid; online visualization of simulation results on the desktop; interactive steering of the simulation; the Grid is invisible. Cooperation with the University of Amsterdam.
  • Slide 44
  • Example: Flooding Crisis Support. Simulation of flooding on the Grid; online visualization of simulation results in the CAVE; interactive steering of the simulation; the Grid is invisible. Cooperation with the Slovak Academy of Sciences.
  • Slide 45
  • Scientific Visualization. Use your favourite device to connect to the Grid: Sony PSP (PlayStation Portable).
  • Slide 46
  • Not only portals. Portals are a good way to bring computing power to end-users; in most cases they are domain specific. Application programmers (and portal programmers) need more powerful interfaces: workflow engines; higher-level programming abstractions (SAGA, DRMAA, ...); programming environments (gEclipse); compilers?
  • Slide 47
  • The Future of Grids. Increasing the number of infrastructure users by increasing awareness: dissemination and outreach; training and education. Increasing the number of applications by improving application support and middleware functionality: improved usability through high-level grid middleware extensions. Increasing the grid infrastructure: incubating related projects; ensuring interoperability between projects; protecting user investments. Towards a sustainable grid infrastructure.
  • Slide 48
  • Projects related to EGEE [figure label: EUGRID]
  • Slide 49
  • Related Infrastructures [figure label: GIN]
  • Slide 50
  • The Future of Grids. Increasing the number of infrastructure users by increasing awareness: dissemination and outreach; training and education. Increasing the number of applications by improving application support and middleware functionality: improved usability through high-level grid middleware extensions. Increasing the grid infrastructure: incubating related projects; ensuring interoperability between projects; protecting user investments. Towards a sustainable grid infrastructure.
  • Slide 51
  • Sustainability: Beyond EGEE-II. Need to prepare for a permanent Grid infrastructure: maintain Europe's leading position in global science Grids; ensure a reliable and adaptive support for all sciences; independent of project funding cycles; modelled on the success of GÉANT; infrastructure managed centrally in collaboration with national bodies (in EGEE-II: JRUs).
  • Slide 52
  • Grids in Europe. Great investment in developing Grid technology. Sample of national Grid projects: Austrian Grid Initiative; DutchGrid; France: Grid5000; Germany: D-Grid, Unicore; Greece: HellasGrid; Grid Ireland; Italy: INFNGrid, GRID.IT; NorduGrid; Swiss Grid; UK e-Science: National Grid Service, OMII, GridPP. EGEE provides a framework for national, regional and thematic Grids.
  • Slide 53
  • Evolution [diagram]. Projects: EDG, EGEE, EGEE-II, EGEE-III, European e-Infrastructure Coordination. Stage labels: Testbeds, Utility Service, Routine Usage.
  • Slide 54
  • Summary. Grids represent a powerful new tool for science. Today we have a window of opportunity to move grids from research prototypes to permanent production systems (as networks did a few years ago). EGEE offers: a mechanism for linking together people, resources and data of many scientific communities; a basic set of middleware for gridifying applications, with documentation, training and support; regular forums for linking with grid experts, other communities and industry.
  • Slide 55
  • Summary. Success will lead to the adoption of grids as the main computing infrastructure for science. If we succeed, the potential return to international scientific communities will be enormous and will open the path for commercial and industrial applications.
  • Slide 56
  • EGEE06 Conference. EGEE06: Capitalising on e-infrastructures. Demos; related projects; industry; international community (UN organisations in Geneva etc.). 25-29 September 2006, Geneva, Switzerland. http://www.eu-egee.org/egee06