DA-JPL-final


Transcript of DA-JPL-final

Page 1: DA-JPL-final
Page 2: DA-JPL-final

CERN Big Data Analytics as a Service Infrastructure: Challenges and Desired Features
Manuel Martín Márquez

Page 3: DA-JPL-final

CERN
• CERN - European Laboratory for Particle Physics
• Founded in 1954 by 12 countries for fundamental physics research in a post-war Europe
• Major milestone in the post-World War II recovery/reconstruction process

Jet Propulsion Laboratory – NASA, Pasadena, October 6th

Page 4: DA-JPL-final

CERN openlab
• Public-private partnership between CERN and leading ICT companies
• Accelerate cutting-edge solutions to be used by the worldwide LHC community
• Train the next generation of top engineers and scientists

Page 5: DA-JPL-final

CERN openlab

Page 6: DA-JPL-final

Manuel Martin Marquez, Intel IoT Ignition Lab – Cloud and Big Data, Munich, September 17th

Page 7: DA-JPL-final


A World-Wide Collaboration

Page 8: DA-JPL-final

Fundamental Research
• Why do particles have mass?

Page 9: DA-JPL-final

Fundamental Research
• Why is there no antimatter left in the Universe?
  • Nature should be symmetrical
• What was matter like during the first second of the Universe, right after the "Big Bang"?
  • A journey towards the beginning of the Universe gives us deeper insight

Page 10: DA-JPL-final

Fundamental Research
• What is 95% of the Universe made of?

Page 11: DA-JPL-final


Page 12: DA-JPL-final

The Large Hadron Collider (LHC)

Largest machine in the world: 27 km, 6000+ superconducting magnets

Emptiest place in the solar system: high vacuum inside the magnets

Hottest spot in the galaxy: lead-ion collisions create temperatures 100 000x hotter than the heart of the sun

Fastest racetrack on Earth: protons circulate 11245 times/s (99.9999991% of the speed of light)
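The "11245 times/s" figure can be sanity-checked with one line of arithmetic. The sketch below assumes a ring circumference of 26 659 m, a value that is not on the slide (which rounds it to 27 km); the speed fraction is the one quoted on the slide.

```python
# Sanity check of the revolution frequency quoted on the slide.
SPEED_OF_LIGHT_M_S = 299_792_458   # exact, by definition of the metre
LHC_CIRCUMFERENCE_M = 26_659       # assumption: not stated on the slide
SPEED_FRACTION = 0.999999991       # from the slide (99.9999991% of c)


def revolutions_per_second(circumference_m: float, speed_fraction: float) -> float:
    """Laps per second for a particle travelling at speed_fraction * c."""
    return speed_fraction * SPEED_OF_LIGHT_M_S / circumference_m


laps = revolutions_per_second(LHC_CIRCUMFERENCE_M, SPEED_FRACTION)
print(f"{laps:.0f} revolutions per second")
```

Under that assumption the result rounds to the slide's 11245 revolutions per second.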

Page 13: DA-JPL-final


CERN’s Accelerator Complex

Page 14: DA-JPL-final

ATLAS Detector

150 million sensors: control and detection sensors

Massive 3D camera: captures 40+ million collisions per second, with a data rate of TB per second

Page 15: DA-JPL-final

CMS Detector

Raw Data
• Was a detector element hit?
• How much energy?
• What time?

Reconstructed Data
• Particle type
• Origin
• Momentum of tracks (4-vectors)
• Energy in clusters (jets)
• Calibration information
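The difference between the two data tiers can be pictured as two record types. This is only an illustrative sketch: the class names, field names and units are hypothetical and do not reflect the actual CMS data formats. The `invariant_mass` helper shows one thing a four-vector makes possible, using natural units (c = 1).

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class RawHit:
    """One raw detector reading (hypothetical layout)."""
    element_id: int   # which detector element was hit
    energy: float     # how much energy was deposited
    time_ns: float    # when the hit occurred


@dataclass
class ReconstructedParticle:
    """One reconstructed object (hypothetical layout)."""
    particle_type: str
    origin: tuple     # (x, y, z) of the production point
    energy: float     # E component of the four-vector
    px: float         # momentum components of the four-vector
    py: float
    pz: float

    def invariant_mass(self) -> float:
        """m = sqrt(E^2 - |p|^2) in natural units (c = 1)."""
        m_squared = self.energy**2 - (self.px**2 + self.py**2 + self.pz**2)
        return sqrt(max(m_squared, 0.0))  # clamp tiny negative rounding errors


hit = RawHit(element_id=42, energy=0.3, time_ns=12.5)
particle = ReconstructedParticle("muon", (0.0, 0.0, 0.0), 5.0, 3.0, 0.0, 0.0)
print(particle.invariant_mass())
```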

Page 16: DA-JPL-final


Page 17: DA-JPL-final

Worldwide LHC Computing Grid
• Provides global computing resources
  • Storage, distribution and analysis
• Physics analysis using ROOT
  • Dedicated analysis framework
  • Plotting, fitting, statistics and analysis

Page 18: DA-JPL-final

Grid Data Analysis in Practice
• Small datasets
  • Copy files and run locally
• Large datasets
  • Split the analysis into multiple jobs
  • Jobs sent to the Grid
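The small-versus-large workflow above can be sketched in a few lines. This is a toy illustration, not Grid code: a local thread pool stands in for Grid job submission, and `run_job`, `files_per_job` and `local_threshold` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_jobs(files, files_per_job):
    """Partition the file list into consecutive chunks, one chunk per job."""
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]


def run_job(job_files):
    """Placeholder analysis: here it only counts the files it was given."""
    return len(job_files)


def analyse(files, files_per_job=3, local_threshold=4):
    """Run small datasets in one go; split large ones into parallel jobs."""
    if len(files) <= local_threshold:
        return run_job(files)                      # small dataset: run locally
    jobs = split_into_jobs(files, files_per_job)   # large dataset: split
    with ThreadPoolExecutor(max_workers=4) as pool:  # stand-in for the Grid
        partial_results = list(pool.map(run_job, jobs))
    return sum(partial_results)                    # merge the partial results


files = [f"events_{i:03d}.root" for i in range(10)]
print(analyse(files))
```

The merge step matters as much as the split: each job returns a partial result, and the analysis is only complete once those are combined.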

Page 19: DA-JPL-final

CERN Control Systems

Control and operations
• Millions of sensors, a large number of control devices, front-end equipment, etc.
• Many critical systems: cryogenics, vacuum, machine protection, etc.

Page 20: DA-JPL-final

The Challenge
LHC Availability – Estimated vs. Observed

Phase               Estimated   Observed
Setup               6%          28%
Injection           4%          15%
Ramp                3%          2%
Squeeze             3%          5%
Stable Beams        83%         37%
No Beam (access)    not shown   14%

Page 21: DA-JPL-final

The Challenge
LHC Corrective Interventions: 3087 / year

System                      Interventions
Access System               527
Controls                    158
Cryogenics                  655
Electricity                 455
Fluids                      657
Heavy Handling              266
Safety Systems              233
Technical Infrastructure    124
Other                       12
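The per-system counts above can be tallied and ranked to see where interventions concentrate. The numbers are taken from the slide; the code is only a small sketch of that arithmetic.

```python
# Corrective interventions per system, as listed on the slide.
interventions = {
    "Access System": 527,
    "Controls": 158,
    "Cryogenics": 655,
    "Electricity": 455,
    "Fluids": 657,
    "Heavy Handling": 266,
    "Safety Systems": 233,
    "Technical Infrastructure": 124,
    "Other": 12,
}

total = sum(interventions.values())

# Rank systems from most to fewest interventions.
ranked = sorted(interventions.items(), key=lambda item: item[1], reverse=True)

for system, count in ranked:
    print(f"{system:<26}{count:>5}{count / total:>8.1%}")
print(f"{'Total':<26}{total:>5}")
```

The categories add up to the 3087 interventions per year quoted on the slide, and the three largest (Fluids, Cryogenics, Access System) account for roughly 60% of them.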

Page 22: DA-JPL-final

The Challenge
Fault Drivers

Page 23: DA-JPL-final

The Challenge
• A look into the near future
  • LHC Run 2 (2015)

Page 24: DA-JPL-final

Post-LHC accelerator projects: 80-100 km

Page 25: DA-JPL-final

Data Analytics Challenges
• Profit from our data investment
  • Extracting knowledge
• Optimizing our systems is mandatory
  • Reduce and predict faults and corrective interventions
  • Increase availability and operational efficiency
• Control and monitoring systems
  • Proactive
  • Predictive
  • Intelligent

Page 26: DA-JPL-final

DA Technology Aspects:
• Near-real-time processing
  • GBs per second, low latency (on the order of a second)
  • Integrate pre-existing human knowledge with knowledge inferred from analytics
• Important factors to consider
  • Scalability
  • Fault tolerance
  • Guarantee that all data is processed
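One way to picture "human knowledge plus knowledge inferred from analytics" in a streaming setting is a monitor that applies both a fixed expert limit and a statistical check to every reading. This is a minimal sketch under stated assumptions: the class name, thresholds, window size and sample readings are all hypothetical, and a rolling z-score stands in for whatever analytics a real system would use.

```python
from collections import deque
from statistics import mean, pstdev


class SignalMonitor:
    """Checks each reading against an expert rule and a rolling z-score."""

    def __init__(self, hard_limit, window=20, z_threshold=3.0, min_history=5):
        self.hard_limit = hard_limit          # human knowledge: fixed limit
        self.history = deque(maxlen=window)   # recent readings only
        self.z_threshold = z_threshold
        self.min_history = min_history

    def process(self, value):
        """Return the list of alerts raised by this single reading."""
        alerts = []
        if value > self.hard_limit:
            alerts.append("rule")
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # inferred knowledge: reading is far from recent behaviour
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                alerts.append("statistical")
        self.history.append(value)            # every reading is recorded
        return alerts


monitor = SignalMonitor(hard_limit=4.5)
readings = [1.9, 2.0, 2.1, 2.0, 1.9, 2.1, 2.0, 3.0, 5.0]  # made-up values
results = [monitor.process(value) for value in readings]
for value, alerts in zip(readings, results):
    print(value, alerts)
```

In this made-up series the reading of 3.0 stays below the expert limit but is flagged statistically, which is the kind of early warning a fixed rule alone would miss.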

Page 27: DA-JPL-final

DA Technology Aspects:
• Batch processing
  • Different domains
  • Highly heterogeneous data
  • Support for a wide range of DA tools and programming languages
• Data repositories
  • Store large amounts of data (hundreds of TBs)
  • Integrate with existing repositories

Page 28: DA-JPL-final

DA Technology Aspects:
• CERN Accelerator Logging Service (1 million signals)
  • Cryogenics temperatures
  • Magnetic field strengths, power dissipation, vacuum pressures
  • Beam intensities and positions, etc.
• About 5 million data requests per day on average
• Throughput over 100 TB/year, 300 TB in 2015
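To put the logging-service figures in per-second terms, the yearly and daily totals can be converted to average rates. The sketch assumes decimal terabytes (1 TB = 10^12 bytes) and a uniform load over time, neither of which is stated on the slide.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

requests_per_day = 5_000_000   # from the slide
bytes_per_year = 100e12        # 100 TB/year, assuming 1 TB = 10**12 bytes

# Average rates, assuming the load is spread evenly over time.
requests_per_second = requests_per_day / SECONDS_PER_DAY
megabytes_per_second = bytes_per_year / SECONDS_PER_YEAR / 1e6

print(f"{requests_per_second:.0f} requests/s on average")
print(f"{megabytes_per_second:.1f} MB/s on average")
```

These are averages only: roughly 58 requests and about 3 MB every second. Peak rates during operation are likely higher.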

Page 29: DA-JPL-final

DA Educational Aspects:
• General
  • New professional profile
• CERN
  • Many domains of expertise involved
    • Vacuum, cryogenics, power converters
  • Engineering and control teams
    • Need to work closely with data scientists

Data Scientists: data analysis platforms, statistics, mathematics, data visualization, monitoring, security, etc.

Page 30: DA-JPL-final

DA as a Service:
• Integration
  • Use open and well-defined standards
  • Real-time analysis
  • Batch processing
  • Data repositories
• Offer solutions for data analytics needs at other institutions
  • ESA, Human Brain Project, etc.

Page 31: DA-JPL-final