High Energy Physics – A big data use case


Page 1: High Energy Physics –  A big data  use case

High Energy Physics – A big data use case

Bob Jones
Head of openlab, IT dept, CERN

This document produced by Members of the Helix Nebula consortium is licensed under a Creative Commons Attribution 3.0 Unported License. Permissions beyond the scope of this license may be available at http://helix-nebula.eu/. The Helix Nebula project is co-funded by the European Community Seventh Framework Programme (FP7/2007-2013) under Grant Agreement no 312301.

Franco-British Workshop on Big Data in Science
London, 6-7 November 2012

Page 2: High Energy Physics –  A big data  use case

Accelerating Science and Innovation

Page 3: High Energy Physics –  A big data  use case

Data flow to permanent storage: 4-6 GB/sec

Individual flows labelled in the figure: 200-400 MB/sec; 1.25 GB/sec; 1-2 GB/sec; 1-2 GB/sec
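As a rough sense of scale, the quoted 4-6 GB/sec to permanent storage can be converted into a daily volume. This is a back-of-envelope sketch rather than a figure from the slides: it assumes the rate is sustained for a full 24 hours and uses decimal units (1 TB = 1000 GB). The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def daily_volume_tb(rate_gb_per_sec: float) -> float:
    """Volume in TB written in one day at a sustained rate (decimal units)."""
    return rate_gb_per_sec * SECONDS_PER_DAY / 1000.0


# Assumption: the 4-6 GB/sec range from the slide holds around the clock.
low = daily_volume_tb(4.0)
high = daily_volume_tb(6.0)
print(f"{low:.1f} to {high:.1f} TB/day")
```

The result is an upper bound under the stated assumption; whenever the rate is not sustained for the whole day, the real daily total would be lower.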

Page 4: High Energy Physics –  A big data  use case

WLCG – what and why?

• A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments

• Managed and operated by a worldwide collaboration between the experiments and the participating computer centres

• The resources are distributed – for funding and sociological reasons

• Our task was to make use of the resources available to us – no matter where they are located

• Secure access via X.509 certificates issued by a network of national authorities, the International Grid Trust Federation (IGTF)

– http://www.igtf.net/

Tier-0 (CERN):
– Data recording
– Initial data reconstruction
– Data distribution

Tier-1 (11 centres):
– Permanent storage
– Re-processing
– Analysis

Tier-2 (~130 centres):
– Simulation
– End-user analysis
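The tier structure on this slide can be written down as a small data model. The following is a minimal sketch for illustration only: the `Tier` class, the `TIERS` tuple and the `tiers_for` helper are hypothetical names, not part of any WLCG software, and the roles are copied from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    centres: str   # centre count as quoted on the slide
    roles: tuple   # responsibilities listed on the slide


TIERS = (
    Tier("Tier-0", "CERN",
         ("Data recording", "Initial data reconstruction", "Data distribution")),
    Tier("Tier-1", "11 centres",
         ("Permanent storage", "Re-processing", "Analysis")),
    Tier("Tier-2", "~130 centres",
         ("Simulation", "End-user analysis")),
)


def tiers_for(role: str) -> list:
    """Names of tiers whose listed roles contain `role` (case-insensitive substring)."""
    wanted = role.lower()
    return [t.name for t in TIERS if any(wanted in r.lower() for r in t.roles)]


print(tiers_for("analysis"))  # ['Tier-1', 'Tier-2']
```

Looking up "analysis" returns both Tier-1 and Tier-2, which matches the slide: analysis is listed for Tier-1 and end-user analysis for Tier-2.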

Page 5: High Energy Physics –  A big data  use case

WLCG: Data Taking

• Castor service at Tier 0 well adapted to the load:

– Heavy Ions: more than 6 GB/s to tape (tests show that Castor can easily support >12 GB/s); the actual limit now is the network from the experiment to the CC

– Major improvements in tape efficiencies: tape writing at ~native drive speeds, so fewer drives are needed

– ALICE had x3 compression for raw data in HI runs

Figure labels:
– HI: ALICE data into Castor > 4 GB/s (red)
– HI: Overall rates to tape > 6 GB/s (r+b)

23 PB data written in 2011; 16 PB in 2012!

Accumulated Data 2012 (pie chart). Legend: ALICE, AMS, ATLAS, CMS, COMPASS, LHCB, NA48, NA61, NTOF, USER. Slice labels as extracted: 1.3, 0.8, 5.8, 4.6, 0.9, 1.2, 0.0, 0.1, 0.2, 0.8
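A quick consistency check on the "Accumulated Data 2012" chart: summing the ten slice labels gives a total close to the 16 PB quoted for 2012. This sketch assumes each label is a volume in PB, which the transcript does not state. The pairing of values with experiment names is not recoverable from the transcript, so only the total is computed.

```python
# Slice labels from the "Accumulated Data 2012" chart, in transcript order.
# Assumption: each label is a volume in PB (units are not given in the transcript).
slice_labels_pb = [1.3, 0.8, 5.8, 4.6, 0.9, 1.2, 0.0, 0.1, 0.2, 0.8]

# Round to one decimal to avoid floating-point noise in the sum.
total_pb = round(sum(slice_labels_pb), 1)
print(total_pb)  # 15.7
```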

Page 6: High Energy Physics –  A big data  use case

Overall use of WLCG

10^9 HEPSPEC-hours/month (~150k CPUs in continuous use)

1.5M jobs/day

Usage continues to grow even over the end-of-year technical stop:
– # jobs/day
– CPU usage
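The headline usage numbers can be cross-checked against each other. The sketch below reads the monthly figure as 10^9 HEPSPEC-hours and assumes an average month of 730.5 hours. The implied HEPSPEC rating per CPU is derived from the two slide figures and is not a number quoted in the talk.

```python
HOURS_PER_MONTH = 365.25 * 24 / 12   # 730.5 h, an average month (assumption)
SECONDS_PER_DAY = 86_400

hepspec_hours_per_month = 1e9        # slide figure, read as 10^9
cpus_in_continuous_use = 150_000     # slide figure (~150 k)
jobs_per_day = 1.5e6                 # slide figure

# Sustained capacity needed to deliver the monthly HEPSPEC-hours.
continuous_hepspec = hepspec_hours_per_month / HOURS_PER_MONTH
# Rating per CPU implied by the two slide figures taken together.
hepspec_per_cpu = continuous_hepspec / cpus_in_continuous_use
# Average job throughput.
jobs_per_second = jobs_per_day / SECONDS_PER_DAY

print(f"{continuous_hepspec:,.0f} HEPSPEC sustained")
print(f"{hepspec_per_cpu:.1f} HEPSPEC per CPU (implied)")
print(f"{jobs_per_second:.1f} jobs per second on average")
```

Under these assumptions the two slide figures agree with each other at roughly 9 HEPSPEC per CPU, and 1.5M jobs/day corresponds to about 17 jobs per second.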

Page 7: High Energy Physics –  A big data  use case

Significant use of Tier 2s for analysis

Share by country (pie chart):
– With percentages: USA 32%; UK 16%; France 10%; Germany 9%; Italy 6%; Spain 4%; Russian Federation 4%; Poland 3%
– No percentage shown: Brazil, Ukraine, Greece, India, Pakistan, Australia, Finland, Republic of Korea, Taipei, Norway, Turkey, Estonia, Hungary, Austria, Belgium, Israel, Romania, China, Sweden, Portugal, Czech Republic, Slovenia, Switzerland, Japan, Canada

CPU – 11.2010-10.2011 (pie chart):
– TRIUMF 1%; CERN 3%; KIT 5%; PIC 2%; CCIN2P3 3%; CNAF 4%; NDGF 1%; Sara/NIKHEF 4%; ASGC 1%; RAL 4%; FNAL 5%; BNL 5%
– Tier-2's 61%
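The CPU chart for 11.2010-10.2011 can be summarised per tier. This is a sketch under one assumption: CERN is counted as Tier-0 and the other eleven named sites as the Tier-1 centres. That grouping is inferred from the tier definitions earlier in the talk and is not stated on this slide.

```python
# CPU share (%) per site, 11.2010-10.2011, as labelled on the chart.
tier0 = {"CERN": 3}
# Assumption: the eleven other named sites are the Tier-1 centres.
tier1 = {
    "TRIUMF": 1, "KIT": 5, "PIC": 2, "CCIN2P3": 3, "CNAF": 4, "NDGF": 1,
    "Sara/NIKHEF": 4, "ASGC": 1, "RAL": 4, "FNAL": 5, "BNL": 5,
}
tier2_total = 61

tier1_total = sum(tier1.values())
grand_total = sum(tier0.values()) + tier1_total + tier2_total

print(len(tier1), tier1_total, grand_total)  # 11 35 99
```

The labels add up to 99% rather than 100%, which is consistent with each slice having been rounded to a whole percent.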

Page 8: High Energy Physics –  A big data  use case

Broader Impact of the LHC Computing Grid

• WLCG has been leveraged on both sides of the Atlantic, to benefit the wider scientific community

– Europe (EC FP7):
  • Enabling Grids for E-sciencE (EGEE) 2004-2010
  • European Grid Infrastructure (EGI) 2010-

– USA (NSF):
  • Open Science Grid (OSG) 2006-2012 (+ extension?)

• Many scientific applications: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences…

Page 9: High Energy Physics –  A big data  use case

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 May 2008

EGEE – What do we deliver?

• Infrastructure operation
– Sites distributed across many countries
– Large quantity of CPUs and storage
– Continuous monitoring of grid services & automated site configuration/management
– Support multiple Virtual Organisations from diverse research disciplines

• Middleware
– Production quality middleware distributed under business friendly open source licence
– Implements a service-oriented architecture that virtualises resources
– Adheres to recommendations on web service inter-operability and evolving towards emerging standards

• User Support - Managed process from first contact through to production usage
– Training
– Expertise in grid-enabling applications
– Online helpdesk
– Networking events (User Forum, Conferences etc.)

Page 10: High Energy Physics –  A big data  use case


Sample of Business Applications

• SMEs
– NICE (Italy) & GridWisetech (Poland): develop services on open source middleware for deployment on customer in-house IT infrastructure
– OpenPlast project (France): develop and deploy Grid platform for plastics industry
– Imense Ltd (UK): ported gLite application and GridPP sites

• Energy
– TOTAL, UK: ported application using GILDA testbed
– CGGVeritas (France): manages in-house IT infrastructures and sells services to petrochemical industry

• Automotive
– DataMat (Italy): provides grid services to automotive industry

Page 11: High Energy Physics –  A big data  use case
Page 12: High Energy Physics –  A big data  use case

Bob Jones – CERN openlab 2012

CERN openlab in a nutshell

• A science – industry partnership to drive R&D and innovation with over a decade of success

• Evaluate state-of-the-art technologies in a challenging environment and improve them

• Test in a research environment today what will be used in many business sectors tomorrow

• Train next generation of engineers/employees

• Disseminate results and outreach to new audiences



Page 13: High Energy Physics –  A big data  use case


Virtuous Cycle

• CERN requirements push the limit

• Apply new techniques and technologies

• Joint development in rapid cycles

• Test prototypes in CERN environment

• Produce advanced products and services

A public-private partnership between the research community and industry

Page 14: High Energy Physics –  A big data  use case

openlab III (2009-2011)

• Inter-partner collaborations: 2

• Fellows: 4

• Summer Students: 6

• Publications: 37

• Presentations: 41

• Reference Activities: over 15

• Product enhancements: on 8 product lines

CERN openlab Board of Sponsors 2012

Page 15: High Energy Physics –  A big data  use case

ICE-DIP

Marie Curie proposal submitted in January 2012 to EC and accepted for funding (total 1.25M€ from EC):

ICE-DIP, the Intel-CERN European Doctorate Industrial Program, is an EID scheme hosted by CERN and Intel Labs Europe.

ICE-DIP will engage 5 Early Stage Researchers (ESRs). Each ESR will be hired by CERN for 3 years and will spend 50% of their time at Intel. Academic rigour and training quality are ensured by the associate partners, National University of Ireland Maynooth and Dublin City University, where the ESRs will be enrolled in doctorate programmes.

Research themes: usage of many-core processors for data acquisition, future optical interconnect technologies, reconfigurable logic and data acquisition networks.

Focus is the LHC experiments’ trigger and data acquisition systems


Page 16: High Energy Physics –  A big data  use case

How to evolve WLCG?

A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments

• Collaboration - The resources are distributed and provided “in-kind”

• Service - Managed and operated by a worldwide collaboration between the experiments and the participating computer centres

• Implementation - Today: general grid technology with higher-level services specific to high-energy physics

Evolve the Implementation while preserving the collaboration & service


Page 17: High Energy Physics –  A big data  use case

CERN-ATLAS flagship configuration

Monte Carlo jobs (lighter I/O)
– 10s of MB in/out
– ~6-12 hours/job

Ran ~40,000 CPU days
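The two figures on this slide give a rough idea of how many jobs the flagship ran. This is an estimate only, assuming each job occupied a single CPU for its full 6-12 hour duration. The talk does not quote a job count.

```python
CPU_DAYS = 40_000          # slide figure (approximate)
HOURS_PER_JOB = (6, 12)    # slide figure (approximate range)

cpu_hours = CPU_DAYS * 24

# Assumption: one job keeps one CPU busy for its whole run time.
fewest_jobs = cpu_hours // max(HOURS_PER_JOB)   # every job at the long end
most_jobs = cpu_hours // min(HOURS_PER_JOB)     # every job at the short end

print(fewest_jobs, most_jobs)  # 80000 160000
```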

Ramón Medrano Llamas,Fernando Barreiro, Dan van der Ster (CERN IT), Rodney Walker (LMU Munich)

Difficulties overcome
– Different vision of clouds
– Different APIs
– Networking aspects

Page 18: High Energy Physics –  A big data  use case

Conclusions

• The Physics community took the concept of a grid and turned it into a global production-quality service aggregating massive resources to meet the needs of the LHC collaborations

• The results of this development serve a wide range of research communities; have helped industry understand how it can use distributed computing; have launched a number of start-up companies; and have provided the IT service industry with new tools to support their customers

• Open source licenses encourage the uptake of the technology by other research communities and industry while ensuring the research community contribution is acknowledged

• Providing access to computing infrastructures by industry and research communities for prototyping purposes reduces the investment and risk for the adoption of new technologies

October 2012 - The LHC Computing Grid - Bob Jones

Page 19: High Energy Physics –  A big data  use case

Conclusions

• Many research communities and business sectors are now facing an unprecedented data deluge. The Physics community with its LHC programme has unique experience in handling data at this scale

• The on-going work to evolve the LHC computing infrastructure to make use of cloud computing technology can serve as an excellent test ground for the adoption of cloud computing in many research communities, business sectors and government agencies

• The Helix Nebula initiative is driving the physics community's exploration of how commercial cloud services can serve the research infrastructures of the future and provide new markets for European industry
