Slide 1

LIGO-G040247-00-E
LIGO Scientific Collaboration Data Grid Status
Albert Lazzarini, Caltech LIGO Laboratory
Trillium Steering Committee Meeting, 20 May 2004, University of Chicago

Slide 2: The LIGO Scientific Collaboration and the LSC Data Grid

• LSC Data Grid: 6 US sites + 3 EU sites (Birmingham, Cardiff/UK, AEI-MPII/Germany)
• LHO, LLO: observatory sites
• LSC: LIGO Scientific Collaboration, supported by iVDGL
• iVDGL has enabled the collaboration to establish a persistent production grid
• [Map of LSC Data Grid sites; the European sites Cardiff, AEI/Golm, and Birmingham are labeled]

Slide 3: LIGO Laboratory Distributed Tier 1 Center

  Site    CPU (GHz)   Disk (TB)   Tape (TB)   Network
  LHO        763          14         140       OC3
  LLO        391           7         140       OC3
  CIT       1150          30         500       GigE
  MIT        244           9           -       FastE
  TOTAL     2548          60         780

• These resources represent the Tier 1 Center for the Collaboration
• They are dedicated to Collaboration use only
• A subset of the Globus Toolkit (GT) is used to move data and to provide the collaboration with access to computing resources

Slide 4: LIGO Scientific Collaboration (LSC) Tier 2: iVDGL/GRID3 sites

• 296 GHz CPU; 64 TB storage (commodity IDE) for data; OC-12 (622 Mbps) to Abilene
• 889 GHz CPU; 34 TB RAID 5 storage; OC-12 (622 Mbps) to Abilene

Slide 5: GEO 600 Computing Sites (European data grid sites)

• 272 GHz CPU; 18 TB storage (RAID 5 and commodity IDE); GigE to SuperJANET (to GEANT to Abilene)
• 670 GHz CPU; 40 TB storage (commodity IDE); Fast Ethernet to G-WiN (to GEANT to Abilene)
• 508 GHz CPU; 18 TB storage (RAID 5 and commodity IDE); 10 Gbps to SuperJANET (to GEANT to Abilene)

Slide 6: LSC DataGrid Current Status

• Lightweight Data Replicator (LDR): built upon GT (RLS/GridFTP); provides automated data movement, in production 7x24 (see the sketch after this slide)
• LSC Data Grid Server deployed at 7 of 9 sites; built on VDT plus LSC-specific APIs
• Data grid in science production, 7x24
• Most use consists of conventional use of GT and Condor: job submission to individual sites, manual data-product migration, tracking, etc.
• 35 LSC scientists hold digital credentials
• VDT use is limited to the subgroup participating in GriPhyN/iVDGL
• Small experiments in running analysis jobs across multiple sites have been successful: part of the SC2002 demo; a "Big Run" was pulled together as part of the demo for SC2003
• Real effort is needed to do production work; the hurdle is still too high to interest most scientific users
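To make the Lightweight Data Replicator bullet above concrete, the sketch below shows in outline what replication built on RLS and GridFTP looks like: ask a replica catalog where physical copies of a logical file live, then pull one of them with globus-url-copy. This is a minimal sketch, not LDR code; the catalog lookup is a hypothetical stand-in for a real RLS query, the file name and host names are placeholders, and the only external tool invoked is globus-url-copy from a standard Globus Toolkit installation, which assumes a valid proxy credential.

#!/usr/bin/env python
# Minimal sketch of replication built on a replica catalog plus GridFTP,
# in the spirit of the Lightweight Data Replicator.  This is NOT LDR code:
# the catalog lookup below is a hypothetical stand-in for a real RLS query,
# and the file name and hosts are placeholders.

import subprocess

def replica_locations(logical_file_name):
    # Hypothetical stand-in for an RLS query: map a logical file name to
    # the gsiftp:// URLs of its known physical replicas.
    catalog = {
        "H-RDS_R_L3-751658000-16.gwf": [
            "gsiftp://ldas.site-a.example.org/data/H-RDS_R_L3-751658000-16.gwf",
        ],
    }
    return catalog.get(logical_file_name, [])

def replicate(logical_file_name, local_dir):
    # Pull one replica of the file into local_dir via GridFTP.
    # globus-url-copy <source> <dest> is the basic Globus Toolkit transfer
    # client; it requires a valid proxy credential (grid-proxy-init).
    for url in replica_locations(logical_file_name):
        dest = "file://%s/%s" % (local_dir, logical_file_name)
        if subprocess.call(["globus-url-copy", url, dest]) == 0:
            return True   # transfer succeeded; stop trying other replicas
    return False

if __name__ == "__main__":
    replicate("H-RDS_R_L3-751658000-16.gwf", "/data/incoming")

In the production system a loop of this kind runs unattended at each site, which is what the "automated data movement, 7x24" bullet refers to.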
Slide 7: Summary

• Developed data replication and distribution capabilities over the collaboration Data Grid: robust, fast replication of data sets across 3 continents; 50+ TB moved over the Internet; data discovery mechanisms provided
• Deployed a persistent Data Grid for the international collaboration: access to distributed computing power in the US and EU; single sign-on using a single grid identity; will eventually enable CPU-limited analyses to run as background jobs; the challenge is making full use of the inherent CPU capacity
• Implemented the use of virtual data catalogs for efficient (re)utilization of data as part of SC2002/03: tracking data locations and availability with catalogs; data discovery and data transformations; ongoing work for two classes of pipeline analyses

Slide 8: Plans

• Continue the deployment and evolution of GRID3; LIGO will participate with its iVDGL partners in the future Open Science Grid (OSG) initiative
• Focus must be on 7x24 production for the S4 and S5 runs and their data analysis, Q4 2004 through Q4 2005; this constrains the use of, and access to, resources for grid research, but it provides an excellent opportunity for use-case studies of successes and failures
• Continue to integrate grid technologies (VDT): better, wider use of virtual data across all Data Grid sites; publish data automatically as they become available (a prototype exists in non-grid-enabled internal code; develop an API to expose this module); job scheduling across the distributed grid (see the sketch after this slide)
• Enhance/extend the persistent data grid for the collaboration: add sites (Tier 3); add additional LIGO Laboratory (Tier 1) resources
• Redeploy the SC2003 pipelines using more efficient script topologies; target: saturate the distributed grid resources
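One way to picture the "job scheduling across the distributed grid" and "more efficient script topologies" items above is as a workflow expressed for Condor DAGMan and dispatched to remote sites through Condor-G. The following is a minimal, hypothetical sketch of that idea, not the SC2003 pipeline itself: it writes per-segment submit files in the Condor-G style of the period (universe = globus with a globusscheduler contact) and a two-level DAG in which filter jobs fan out across sites and a single merge job joins their results. The site contacts, executables, and segment names are invented placeholders.

#!/usr/bin/env python
# Minimal sketch: express a two-level analysis pipeline as a Condor DAG
# whose jobs are dispatched to grid sites via Condor-G.  Illustrative only:
# site contacts, executables, and segments are hypothetical, and this is
# not the SC2003 pipeline configuration.

SITES = [  # hypothetical gatekeeper contacts for Condor-G
    "grid.site-a.example.org/jobmanager-condor",
    "grid.site-b.example.org/jobmanager-pbs",
]

SUBMIT_TEMPLATE = """universe        = globus
globusscheduler = %(contact)s
executable      = %(executable)s
arguments       = %(arguments)s
output          = %(name)s.out
error           = %(name)s.err
log             = pipeline.log
queue
"""

def write_submit(name, executable, arguments, contact):
    # One Condor-G submit description per job.
    with open(name + ".sub", "w") as f:
        f.write(SUBMIT_TEMPLATE % {"name": name, "executable": executable,
                                   "arguments": arguments, "contact": contact})

def write_dag(segments):
    # Fan the per-segment filter jobs out across sites (round robin),
    # then join them with a single merge job.
    with open("pipeline.dag", "w") as dag:
        for i, seg in enumerate(segments):
            name = "filter_%d" % i
            write_submit(name, "/bin/echo", "filter segment %s" % seg,
                         SITES[i % len(SITES)])
            dag.write("JOB %s %s.sub\n" % (name, name))
        write_submit("merge", "/bin/echo", "merge results", SITES[0])
        dag.write("JOB merge merge.sub\n")
        parents = " ".join("filter_%d" % i for i in range(len(segments)))
        dag.write("PARENT %s CHILD merge\n" % parents)

if __name__ == "__main__":
    # Two hypothetical GPS-time segments; submit with: condor_submit_dag pipeline.dag
    write_dag(["751658000-751662000", "751662000-751666000"])

Submitting pipeline.dag with condor_submit_dag lets DAGMan enforce the dependency ordering while Condor-G handles the per-site submissions; denser topologies of this kind are one route toward the "saturate distributed grid resources" target.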