Tackling tomorrow’s computing challenges today at CERN
Maria Girone, CERN openlab CTO
CERN is the European Laboratory for Particle Physics.
The laboratory straddles the Franco-Swiss border near Geneva.
It has 22 member states and supports a global community of 15,000 researchers.
A worldwide endeavour: 22 members, 8 associates, 3 observers
[Map legend: member states; associate member states in the pre-stage to membership; associate member states; observers; cooperation agreements]
Budget (2017): 1,100 MCHF
These researchers are probing the fundamental structure of the Universe.
• Understanding the very first moments of our Universe after the Big Bang
• Understanding dark matter
• Looking for antimatter
CERN’s mission: research, technology, education, and collaboration.
• Advance the frontiers of knowledge. E.g. the secrets of the Big Bang: what was matter like within the first moments of the Universe’s existence?
• Develop new technologies for accelerators and detectors
  • Information technology: the Web and the Grid
  • Medicine: diagnosis and therapy
• Train the scientists and engineers of tomorrow
• Unite people from different countries and cultures
The LHC is the world’s largest and most powerful particle accelerator.
[Diagram labels: CMS, ALICE, ATLAS, LHCb]
It is built around 100 m underground and has a circumference of 27 km.
The particles are accelerated to close to the speed of light.
The LHC is a machine of records!
• The hottest spots in the galaxy
• Colder temperatures than outer space
• The fastest racetrack on the planet
• The most powerful magnets
• The highest vacuum
• The most sophisticated detectors ever built
What may come next.
The detectors are like gigantic digital cameras built in cathedral-sized caverns.
Experiments are run by collaborations of scientists from institutes all over the world.
Two general-purpose detectors cross-confirm discoveries, such as the Higgs boson.
ATLAS: 46 m long, 25 m in diameter, weighs 7,000 tonnes; 100 million electronic channels; 3,000 km of cables
CMS: 22 m long, 15 m in diameter, weighs 14,000 tonnes; most powerful superconducting solenoid ever built
The ALICE and LHCb experiments have detectors specialised in studying specific phenomena.
ALICE: studies the «quark-gluon plasma», a state of matter which existed moments after the Big Bang.
LHCb: studies the difference in behaviour between the b quark and the anti-b quark to explain the matter-antimatter asymmetry in the Universe.
Collisions generate particles that decay in complex ways into even more particles.
Up to about 1 billion particle collisions can take place every second.
This can generate up to a petabyte of data per second. The data are filtered in real time, selecting potentially interesting events (trigger).
Trigger chain: data generated 40 million times per second (PB/s) → 100,000 selections per second (TB/s) → 1,000 selections per second (GB/s)
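The rate reduction in the trigger chain can be checked with simple arithmetic. The following is a minimal sketch; the stage names and the `TRIGGER_STAGES` structure are illustrative assumptions, while the rates are the ones quoted on the slide.

```python
# Rates quoted on the slide, per second.
# Stage names are illustrative labels, not official terminology.
TRIGGER_STAGES = [
    ("data generated", 40_000_000),
    ("first selection", 100_000),
    ("second selection", 1_000),
]


def rejection_factors(stages):
    """Return the reduction factor between each pair of consecutive stages."""
    factors = []
    for (_, rate_in), (name_out, rate_out) in zip(stages, stages[1:]):
        factors.append((name_out, rate_in / rate_out))
    return factors


def overall_keep_fraction(stages):
    """Fraction of the input rate that survives the last stage."""
    return stages[-1][1] / stages[0][1]


for name, factor in rejection_factors(TRIGGER_STAGES):
    print(f"{name}: keeps 1 in {factor:,.0f}")
print(f"overall fraction kept: {overall_keep_fraction(TRIGGER_STAGES):.1e}")
```

With these numbers the first selection keeps 1 in 400 and the second keeps 1 in 100, so roughly 1 in 40,000 of the generated data survives.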
The CERN data centre processes hundreds of petabytes of data every year.
CERN’s data centre in Meyrin is the heart of the laboratory’s computing infrastructure.
The Wigner data centre in Budapest serves as an extension to the one in Meyrin.
The two centres are connected by three 100 Gb/s fibre-optic links.
MEYRIN CENTRE (CH): 300,000 processor cores, 180 PB on disk, 230 PB on tape
WIGNER CENTRE (H): 100,000 processor cores, 100 PB on disk
Physicists must sift through the 30-50 PB produced annually by the LHC experiments.
“Online” (real time): ~PB/s at ~40 MHz → L1 Trigger (HW) → ~100 kHz → HL Trigger (SW) → ~1 kHz
“Offline” (asynchronous): “raw data” at ~1-10 GB/s → WLCG
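The online figures imply an average size per recorded event. A minimal sketch of that arithmetic follows; it assumes decimal units and that the quoted ~1-10 GB/s raw-data rate corresponds to the ~1 kHz output of the software trigger, which the slide suggests but does not state explicitly.

```python
GB = 10**9  # decimal gigabyte: an assumption, the slide does not specify
MB = 10**6


def bytes_per_event(data_rate_bytes_per_s, event_rate_hz):
    """Average size of one recorded event for a given output bandwidth."""
    if event_rate_hz <= 0:
        raise ValueError("event rate must be positive")
    return data_rate_bytes_per_s / event_rate_hz


HLT_OUTPUT_RATE_HZ = 1_000  # "~1 kHz" after the software trigger

low = bytes_per_event(1 * GB, HLT_OUTPUT_RATE_HZ)
high = bytes_per_event(10 * GB, HLT_OUTPUT_RATE_HZ)
print(f"implied event size: {low / MB:.0f}-{high / MB:.0f} MB")
```

Under these assumptions each stored event is of the order of a few megabytes.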
The WLCG gives thousands of physicists across the globe near real-time access.
Tier-0 (CERN and Hungary): data recording, reconstruction and distribution
Tier-1: permanent storage, re-processing, analysis
Tier-2: simulation, end-user analysis
The Worldwide LHC Computing Grid integrates computer centres worldwide to combine computing and storage resources into a single infrastructure accessible by all LHC physicists.
With 170 computing centres in 42 countries, the WLCG is the grid that never sleeps!
The size of WLCG.
Computing in 2017: a new record in peak performance
[Figure panels: CPU delivered (~1M cores); data stored]
• > 2 million jobs/day
• ~1M CPU cores
• ~1 EB of storage
• ~170 sites, 42 countries
• 10-100 Gb/s links, 340 Gb/s transatlantic
• 3 PB moved per day
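To put "3 PB moved per day" next to the quoted link speeds, the daily volume can be converted into an average network rate. This is a rough sketch assuming decimal petabytes and perfectly even traffic over 24 hours; the function names are illustrative.

```python
PB = 10**15  # decimal petabyte: an assumption
SECONDS_PER_DAY = 24 * 60 * 60


def average_gbit_per_s(bytes_per_day):
    """Average network throughput, in Gb/s, needed to move a daily volume."""
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY
    return bits_per_second / 10**9


def equivalent_links(bytes_per_day, link_gbit_per_s):
    """Number of fully used links of a given speed that carry the same volume."""
    return average_gbit_per_s(bytes_per_day) / link_gbit_per_s


daily_volume = 3 * PB
print(f"average rate: {average_gbit_per_s(daily_volume):.0f} Gb/s")
print(f"equivalent 100 Gb/s links: {equivalent_links(daily_volume, 100):.1f}")
```

The average comes out a little under 280 Gb/s, i.e. the equivalent of almost three saturated 100 Gb/s links running around the clock.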
Data Organization, Management and Access in WLCG
Making hundreds of petabytes of data accessible globally to scientists is one of the biggest challenges of WLCG.
[Diagram labels: storage sites joined by 1 to 10 Tb links; compute sites, HPC and commercial cloud, each with a cache; WLCG]
The LHC has been designed to follow a carefully set out programme of upgrades.
The planned upgrades will greatly increase the scientific reach.
RUN 3: ALICE & LHCb upgrades
RUN 4: ATLAS & CMS upgrades
More collisions help physicists to observe rare processes and study them with greater precision.
Rate of new physics is 1 event in 10^12
Selecting a new physics event is like choosing 1 grain of sand in 20 volleyball courts
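The quoted rarity can be turned into an expected waiting time. The sketch below is purely illustrative: it combines the "1 event in 10^12" figure with the "up to about 1 billion collisions per second" figure from earlier in the talk, treats both as exact, and ignores trigger and detector efficiency.

```python
COLLISIONS_PER_SECOND = 1e9    # "up to about 1 billion" per second
NEW_PHYSICS_FRACTION = 1e-12   # "1 event in 10^12"


def expected_signal_events(collision_rate_hz, signal_fraction, seconds):
    """Expected number of rare events produced in a time window."""
    return collision_rate_hz * signal_fraction * seconds


def seconds_per_signal_event(collision_rate_hz, signal_fraction):
    """Average waiting time between rare events."""
    return 1.0 / (collision_rate_hz * signal_fraction)


wait = seconds_per_signal_event(COLLISIONS_PER_SECOND, NEW_PHYSICS_FRACTION)
per_day = expected_signal_events(
    COLLISIONS_PER_SECOND, NEW_PHYSICS_FRACTION, 24 * 60 * 60
)
print(f"about one such event every {wait:.0f} s, or {per_day:.0f} per day")
```

Under these idealised assumptions a new-physics event would be produced only about once every 1,000 seconds, which is why more collisions translate directly into more reach.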
The HL-LHC will come online around 2026. More collisions and more complex data.
ATLAS and CMS had to cope with monster pile-up
With L = 1.5 × 10^34 cm^-2 s^-1 and 8b4e bunch structure → pile-up of ~60 events/x-ing (note: ATLAS and CMS designed for ~20 events/x-ing)
CMS: event from 2017 with 78 reconstructed vertices
ATLAS: simulation for HL-LHC with 200 vertices
The ALICE and LHCb experiments will increase their data acceptance rates for Run 3.
• LHCb and ALICE will move offline processing closer to the online data collection chain
• Performing processing and data analysis in near real-time
• Solutions under investigation
  • New HLT farms for Run 3
  • Flexible and efficient system with ambitious PUE ratio
Courtesy of Automation Data Center Facilities
The ATLAS and CMS experiments will be significantly upgraded for the HL-LHC.
• By Run 4, the detectors will become more granular and more radiation hard.
• Reconstructing more particles with more granular detectors will be computationally more expensive.
Using current techniques, required computing capacity increases 50-100 times.
[Figure, ATLAS Preliminary: CPU resources [kHS06 × 1000] versus year, 2018-2028 (Run 2, Run 3, Run 4); resource needs (2017 computing model) compared with a flat budget model (+20%/year)]
[Figure, ATLAS Preliminary: disk storage [PBytes] versus year, 2018-2028 (Run 2, Run 3, Run 4); resource needs (2017 computing model) compared with a flat budget model (+15%/year)]
Data storage needs are expected to be in the order of exabytes by this time.
It is vital to explore new technologies and methodologies.
[Same two ATLAS Preliminary figures (CPU resources and disk storage versus year, resource needs against a flat budget model), annotated: Factor 4, Factor 8]
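The flat budget curves in the ATLAS figures are compound growth at +20%/year (CPU) and +15%/year (disk). A minimal sketch of that growth follows. The 2018 to 2026 span is an assumption chosen here for illustration, and the function names are hypothetical; the sketch does not reproduce the ATLAS resource-needs curves themselves.

```python
def flat_budget_growth(annual_rate, years):
    """Capacity multiplier after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


def resource_gap(required_multiplier, annual_rate, years):
    """Ratio of required capacity to what the flat budget model delivers."""
    return required_multiplier / flat_budget_growth(annual_rate, years)


YEARS = 2026 - 2018  # assumed span, for illustration only

cpu_growth = flat_budget_growth(0.20, YEARS)
disk_growth = flat_budget_growth(0.15, YEARS)
print(f"CPU capacity at +20%/year over {YEARS} years: x{cpu_growth:.1f}")
print(f"disk capacity at +15%/year over {YEARS} years: x{disk_growth:.1f}")
```

Over eight years a flat budget buys roughly 4.3 times the CPU and 3.1 times the disk. `resource_gap` divides any required multiplier by that growth to give the shortfall factor.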
Closing the resource gap in the next decade requires close collaboration with industry.
Improvements in hardware performance and capacity.
Innovation and revolutionary thinking.
Technology Evolution and Improvements
Software innovation, New Architectures, Techniques and Methods
CERN openlab is a unique science-industry partnership, fostering research and innovation.
MANAGEMENT
JOINT R&D
EDUCATION
INNOVATION & KNOWLEDGE TRANSFER
COMMUNICATION
Three main areas of research and development.
COMPUTING CHALLENGES
• Increase data centre performance with hardware accelerators (FPGAs, GPUs, ...), optimized software
• Scale out capacity with public clouds, HPC, new architectures
• New techniques with Machine Learning, Deep Learning, Advanced Data Analytics
Data centre technologies and infrastructures.
Faced with a resource gap of this magnitude:
1. fully exploit available hardware;
2. expand dynamically to new computing environments
Layered, virtualized services provide flexibility and efficiency.
[Diagram "CERN Tool Chain" (Tim Bell, CERN Computing Infrastructure, 04/09/2017). Labels: End Users; CI/CD; APIs, CLIs, GUIs; IT & Experiment Services; Experiment Pilot Factories; HTCondor; (LSF); 320k cores; Public Cloud; Volunteer Computing; VMs, Containers, Bare Metal and HPC; OpenStack Resource Provisioning (>1 physical data centre)]
CERN is one of the early adopters and largest contributors to OpenStack
• 90% of the resources are provided through a private cloud
• Allows for flexible and dynamic deployment
Moving to containers for even more flexibility
• Current investigations within CERN openlab
Large-scale tests with commercial clouds.
Experiments have demonstrated that it is possible to elastically and dynamically expand production resources to commercial clouds.
[Figure labels: 300k cores, 80k cores]
Joint procurement of R&D cloud services for scientific research.
Demonstrations with large-scale, dedicated HPC resources, too.
[Embedded slide: T. Wenaus, 2018-05-29, "Processing since Jan 1". Panels: cores by processing type (MC simu, MC reco, derivation, data); cores by resource type (grid, HPC, HLT, cloud); HS06 by resource type (HPC, grid, HLT, cloud)]
● Smooth Tier-0 running on 23k cores
  ○ Thank you CERN for a 20% bump over pledge from contingency
  ○ Commissioning B-physics stream spillover to grid
● Sustained production with smooth operations, ~300-350k cores
● HPC peaks to ~900k cores (but cores are 5-10x weaker than grid)
● Derivation production of the 2018 collision data underway
● Moving >1 PB, >20 GB/s, 1.5-2M files per day
HS06 shares: Grid 73%, Cloud 13%, HPC 9% (green is transparent, grid-like, HPC)
Full sim events since Jan 1: Grid 3.81 B, Cloud 1.02 B, Transparent HPC 0.553 B, Complex HPC 0.483 B
HPC are significant resources and are being tested by the experiments
• Optimized for highly parallel applications
ATLAS reached more than 200k traditional x86 HPC cores for simulation workflows
All experiments are exploring the use of heterogeneous HPC architectures
CERN will partner with EU-PRACE for optimizing the use of HPC resources
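The embedded ATLAS slide notes that HPC peaks reach ~900k cores but that those cores are 5-10x weaker than grid cores. A small sketch of the corresponding conversion follows; `grid_equivalent_cores` is a hypothetical helper, and dividing by a single slowdown factor is a simplification.

```python
def grid_equivalent_cores(hpc_cores, slowdown_factor):
    """Convert a count of weaker HPC cores into grid-core equivalents."""
    if slowdown_factor <= 0:
        raise ValueError("slowdown factor must be positive")
    return hpc_cores / slowdown_factor


PEAK_HPC_CORES = 900_000  # "~900k cores" at peak

best_case = grid_equivalent_cores(PEAK_HPC_CORES, 5)
worst_case = grid_equivalent_cores(PEAK_HPC_CORES, 10)
print(f"grid-equivalent peak: {worst_case:,.0f} to {best_case:,.0f} cores")
```

On that basis the HPC peak corresponds to roughly 90k to 180k grid cores, compared with the ~300-350k cores of sustained production quoted on the same slide.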
CERN is a partner of DEEP-EST, a blueprint project for heterogeneous HPC systems.
DEEP-EST: Dynamical Exascale Entry Platform - Extreme Scale Technologies
Computing Performance and Software: exploiting heterogeneous resources.
Software optimization can gain factors in performance.
Processor evolution (C. Leggett, LBNL)
● Moore’s Law continues to deliver increases in transistor density
  ○ Doubling time is clearly lengthening
● Clock speed increases stopped around 2006
  ○ No longer possible to ramp the clock speed as process size shrinks (Dennard scaling failed)
● So we are basically stuck at ~3 GHz clocks from the underlying W m^-2 limit
  ○ This is the Power Wall
  ○ Limits the capabilities of serial processing
  ○ CPU based concurrency still in development for LHC Run 3
Accelerated computing devices (GPUs, FPGAs) offer a different model
● Potentially much greater throughput
● Still many unresolved issues for legacy code and complexity of heterogeneous processing
HEP has a vast investment in software
• Significant effort to make efficient multi-threaded and vectorized CPU code
Accelerated computing devices (GPUs, FPGAs) offer a different model
• Complexity of heterogeneous architectures
Simultaneously exploring lower performance but lower power alternatives like ARM
The landscape is shifting at all levels.
[Figure labels: 2008, 2018]
Exploiting co-processors for software-based filtering and real-time reconstruction.
Higher data rates require more selective triggering and faster reconstruction
• LHCb is investigating FPGAs and GPUs to allow reconstruction of 5 GB/s of events in real time.
• CMS is porting heavy “offline” tasks to real-time processing for HL-LHC
  • Integrate GPUs in the HLT farm to give high-quality reconstruction in 100 ms latency (as opposed to tens of seconds)
Upgrade I data processing chain: DETECTOR READOUT → 5 TB/s → HLT1 PARTIAL RECO → 0.1-0.2 TB/s → HLT2 FULL RECO → ~5 GB/s
Output: 5% FULL, 85% TURBO & real-time analysis, 10% CALIB
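The latency target for real-time reconstruction matters because it sets how many events a trigger farm must hold in flight at once. By Little's law, the average number of items in a system equals the arrival rate times the time each item spends in it. The sketch below applies this with an assumed input rate of 100 kHz (the L1 output rate quoted earlier in the talk; the HL-LHC rate may differ) and an assumed 10 s standing in for "tens of seconds".

```python
def events_in_flight(input_rate_hz, latency_s):
    """Little's law: average events being processed = arrival rate x latency."""
    return input_rate_hz * latency_s


INPUT_RATE_HZ = 100_000  # assumption: ~100 kHz into the software trigger

fast = events_in_flight(INPUT_RATE_HZ, 0.1)   # 100 ms reconstruction
slow = events_in_flight(INPUT_RATE_HZ, 10.0)  # assumed 10 s reconstruction
print(f"100 ms latency: ~{fast:,.0f} events in flight")
print(f"10 s latency:   ~{slow:,.0f} events in flight")
```

Cutting latency by a factor of 100 cuts the number of concurrently processed events, and hence the buffering and parallel capacity needed, by the same factor.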
Quantum Computing is also on the horizon.
CERN openlab is engaging in QC with industry
• Can substantially speed up training of deep learning and combinatorial searches
• Well suited for fitting, minimization, optimization
• Can directly describe basic interactions as well as lattice QCD calculations
Machine learning and advanced data analytics.
Monitoring, automation and anomaly detection.
A multitude of Industrial Control Systems: Cooling & Ventilation, Cryogenics, Electric Grid, Vacuum, Gas, LHC Circuit, QPS, WIC, PIC, …
Experiment and accelerator operations have similar challenges to industrial applications
• Detector and accelerator infrastructure health needs to be monitored
• Quality of produced data needs to be validated
• Resource usage needs to be optimized
Working with industry partners to deploy similar techniques and automation
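As one simple illustration of anomaly detection on monitoring data, the sketch below flags readings that lie far from the mean in units of standard deviation (a z-score test). The slides do not say which techniques are actually deployed; the function name, threshold and sensor values here are all hypothetical.

```python
import statistics


def zscore_anomalies(readings, threshold=2.5):
    """Return indices of readings more than `threshold` std devs from the mean."""
    mean = statistics.fmean(readings)
    spread = statistics.pstdev(readings)
    if spread == 0:
        return []  # all readings identical: nothing stands out
    return [
        index
        for index, value in enumerate(readings)
        if abs(value - mean) / spread > threshold
    ]


# Made-up temperature readings from a hypothetical cooling sensor.
readings = [20.0] * 9 + [35.0]
print(zscore_anomalies(readings))
```

A production system would more likely use robust statistics or learned models, since a single large outlier inflates the mean and standard deviation it is compared against.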
Exploring image recognition for reconstruction and object identification.
With current software and computers, reconstructing an HL-LHC-like event takes tens of seconds
Examine the detector hit information and use 3D image recognition techniques to identify objects
• Recognize physics objects from learned patterns
Might dramatically increase the speed for reconstruction
[Figure labels: INPUT IMAGES, IDENTIFICATION]
Simulation is one of the most resource-intensive computing applications.
Looking at adversarial networks to improve speed without giving up accuracy of simulated events
• One network attempts to simulate events that match a data distribution
• While a second network tries to distinguish data and simulation
• Main R&D areas
  • Adapting the existing code to new computing architectures
  • Replacing complex algorithms with deep-learning approaches (FAST SIMULATION)
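The two-network setup described above is the generative adversarial network (GAN) idea. The sketch below shows only the two standard loss functions, evaluated on discriminator outputs, using the common non-saturating generator loss. The slides do not state which loss or framework is used, so this is an assumed textbook formulation; inputs must be probabilities strictly between 0 and 1.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the second network.

    d_real: its outputs (probability of "data") on real events.
    d_fake: its outputs on simulated events from the first network.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss for the first network: low when fakes pass as data."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A discriminator that cannot tell data from simulation outputs 0.5 everywhere.
undecided = [0.5, 0.5, 0.5, 0.5]
print(f"discriminator loss: {discriminator_loss(undecided, undecided):.3f}")
print(f"generator loss:     {generator_loss(undecided):.3f}")
```

Training alternates between lowering the discriminator loss and lowering the generator loss. At the ideal balance point the discriminator outputs 0.5 for everything, which is the case printed here.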
CERN is collaborating with other communities who share similar computing challenges.
The Square Kilometre Array (SKA) observatory’s two telescopes will enable astronomers to study the sky in unprecedented detail.
• First phase will be operational in the mid 2020s; observatory will function for 50 years.
Joint exascale data-storage and processing challenge between HL-LHC and SKA.
[Map labels: Western Australia, South Africa]
Accelerating innovation and knowledge transfer to medical applications.
• CERN-MEDICIS: production of innovative isotopes for medical research
• Accelerator design for future hadron therapy facilities
• Medical imaging
• Dosimetry
• Computing & simulation for health applications
[Logos: CERN MEDICIS, BioDynamo]
CERN has been pushing the boundaries of knowledge and technology for more than 60 years.
The next phase of the programme will include unprecedented computing challenges.
We look forward to tackling these challenges through open collaboration and innovation with industry and other scientific communities.
«Magic is not happening at CERN, magic is being explained at CERN.» Tom Hanks
Thank you!
Credit for the slide layout to Andrew Purcell, CERN
openlab.cern/whitepaper
Homepage: home.cern
@CERNopenlab
Homepage: openlab.cern
openlab.cern/education
home.cern/students-educators
@CERN
Backup Slides
Storage has many more layers and improvements in access.
Introduction of NVRAM provides much faster access to data and applications in memory
High-performance and long-duration SSD improves the access on large data sets
Low-cost and high-capacity SSD could revolutionize long-term storage
Larger spinning disk brings us to exabytes of storage
The data at the LHC.
• 150 million sensors deliver data … 40 million times per second
• Raw data:
  • Was a detector element hit?
  • How much energy?
  • What time?
• Reconstructed data:
  • Momentum of tracks (4-vectors)
  • Origin of collision
  • Energy in clusters (jets)
  • Type of particle
  • Calibration information
  • …