FINDING THE “HIGGS” IN THE HAYSTACK(S)
Stephen J. Gowdy (CERN) 12th September 2012 XLDB Conference
Overview
Large Hadron Collider (LHC)
Compact Muon Solenoid (CMS) experiment
The Challenge
Worldwide LHC Computing Grid (wLCG)
Data Organisation
Analysis Techniques
Databases
Future Trends
12th September 2012 Finding the "Higgs" in the Haystack(s) 2
a hadron is a composite particle made of quarks
Large Hadron Collider
Big machine characteristics
17 mile circular tunnel, 100m underground, straddling the French-Swiss border
Protons currently travel at 99.9999964% of the speed of light
Each proton enters Switzerland (CH) over 11,000 times per second
Will not reach design beam energy till 2014
Interactions potentially every 25ns (40MHz)
Each interaction has multiple collisions
Called “pileup”, currently around 30 collisions per event
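The orbit and crossing figures above follow from simple arithmetic. A minimal sketch, assuming an LHC ring length of 26,659 m (a value not given in the talk, which quotes 17 miles) and using the speed fraction and 25ns spacing from the slide:

```python
# Back-of-envelope check of the LHC numbers quoted above.
C_LIGHT_M_PER_S = 299_792_458      # speed of light (exact, SI definition)
CIRCUMFERENCE_M = 26_659           # assumption: LHC ring length, not stated in the talk
SPEED_FRACTION = 0.999999964       # 99.9999964% of c, from the slide
BUNCH_SPACING_S = 25e-9            # 25ns between potential interactions

def revolutions_per_second():
    """Laps of the ring per second for one proton."""
    return SPEED_FRACTION * C_LIGHT_M_PER_S / CIRCUMFERENCE_M

def crossing_rate_hz():
    """Rate of potential interactions for the given bunch spacing."""
    return 1.0 / BUNCH_SPACING_S

print(f"{revolutions_per_second():.0f} laps per second")
print(f"{crossing_rate_hz() / 1e6:.0f} MHz crossing rate")
```

Each lap crosses the border once in each direction, which is where the "over 11,000 times" figure comes from.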
Accelerator Complex
Older machines feed newer machines
LHC Protons start in LINAC2 then go to the PS via the BOOSTER
From the PS they are injected to the SPS
Injected into the LHC at 450GeV
Accelerated to 4TeV in the LHC
Need to have “fills” ~1/day
[Figure: overview image labelled LHC, SPS, CMS and CERN Main Site]
a muon is a (comparatively) long lived big brother to the electron
Compact Muon Solenoid
Particle Identification 101
Trigger Architecture
[Figure: trigger architecture. Calorimeter side: matching of ECAL and HCAL “trigger towers”, electron isolation, jet detection, sorting, ETmiss, ETtot. Muon side: track segments for endcap and barrel (pT, η, φ, quality), with coverage 0.8 < |η| < 2.4, |η| < 1.2 and |η| < 2.1, ≤ 4 candidates. Last stage: final decision, partitioning, interface to TTC and TTS (trigger throttling system).]
Data Rates
RAW (i.e. unprocessed) data is about 1MB/event
Potential detector acquisition rate 1MB * 40MHz = 40TB/s
Actual data volume is much larger, but not all detectors can be read out at 40MHz
Hardware trigger decision allows 100kHz rate
Looks at individual detectors to make a fast choice
Data rate up to 100GB/s
High Level Trigger done on filter farm
Output rate is nominally 300Hz ~= 300MB/s
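The rates on this slide are each just event size times event rate. A small sketch that reproduces them, assuming decimal units (1MB = 1e6 bytes); the function and stage names are illustrative only:

```python
# Data-rate cascade from the figures on this slide (decimal units: 1MB = 1e6 bytes).
EVENT_SIZE_BYTES = 1e6  # ~1MB per RAW event

STAGES = {
    "detector (potential)": 40e6,   # 40MHz crossing rate
    "hardware trigger": 100e3,      # 100kHz
    "High Level Trigger": 300.0,    # nominal 300Hz
}

def throughput_bytes_per_s(rate_hz, event_size=EVENT_SIZE_BYTES):
    """Bytes per second produced at a given event rate."""
    return rate_hz * event_size

def rejection_factor(rate_in_hz, rate_out_hz):
    """How many input events there are per event kept."""
    return rate_in_hz / rate_out_hz

for name, rate in STAGES.items():
    print(f"{name}: {throughput_bytes_per_s(rate):.3g} B/s")
print(f"overall rejection: about 1 in {rejection_factor(40e6, 300.0):.0f}")
```

Taken together, the two trigger stages keep roughly one crossing in 130,000.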
why it isn’t easy
The Challenge
A “Higgs” event
A Haystack
40 reconstructed vertices, high pileup run, 25th October 2011
Haystacks
So that was one event
2012 average is 30 collisions per event
By the end of 2012 we will have almost 7 billion events recorded
After the reduction of 40MHz to O(300Hz)
Doesn’t include simulated data
Looking for half a million Higgs particles
Assuming predicted cross sections are correct
Many are much much harder to find than 4 muons
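To get a feel for the size of the haystack, the numbers on this slide can be combined directly. This is only a rough sketch: it assumes, optimistically, that every produced Higgs ends up in a recorded event, which the talk does not claim.

```python
# Rough size of the haystack, using only the numbers quoted on this slide.
RECORDED_EVENTS = 7e9          # "almost 7 billion" events by the end of 2012
COLLISIONS_PER_EVENT = 30      # 2012 average pileup
HIGGS_PRODUCED = 0.5e6         # half a million, if predicted cross sections hold

def recorded_collisions():
    """Individual proton-proton collisions contained in the recorded events."""
    return RECORDED_EVENTS * COLLISIONS_PER_EVENT

def events_per_higgs():
    # Optimistic bound: assumes every produced Higgs is in a recorded event.
    return RECORDED_EVENTS / HIGGS_PRODUCED

print(f"{recorded_collisions():.1e} collisions in recorded events")
print(f"at best 1 Higgs per {events_per_higgs():.0f} recorded events")
```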
like an electric grid that supplies computing power
Worldwide LHC Computing Grid (wLCG)
Tiered System
Tier-0 at CERN: data gets “sorted” and has its first-pass reconstruction
Tier-1 centres: large regional facilities, CMS has seven
Provide custodial tape storage
Large scale re-reconstruction
Tier-2 centres: frequently universities or groups of universities
Simulation
End user analysis
Schematic
[Figure: tiered data flow from the CMS Detector through the Filter Farm to Tier-0 (CERN), then over the LHCOPN (Optical Private Network) to Tier-1 (Fermilab, IN2P3, ASGC, KIT, CNAF), on to Tier-2 (Florida, UCSD) and Tier-3 (UCLA, MyLaptop)]
[Figure: traffic on a CERN holiday; CMS is green]
Resources
Resource      Tier-0        Tier-1        Tier-2        Total
CPU (kHS06)   121 (21%)     137 (23%)     324 (56%)     582 (582kHS06 ~= 150kSi2k)
Disk (TB)     4800 (9%)     21000 (40%)   27000 (51%)   51800
Tape (TB)     23000 (33%)   47000 (67%)   -             90000
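The percentages can be recomputed from the per-tier values quoted above. A short sketch follows; it uses only the slide's numbers. Note that the per-tier disk and tape values sum to 52800TB and 70000TB, not the quoted totals of 51800TB and 90000TB, while the quoted percentages match the sums. The talk does not explain the difference and it is left unresolved here.

```python
# Recompute each tier's share from the per-tier values quoted on the slide.
RESOURCES = {
    "CPU (kHS06)": {"Tier-0": 121, "Tier-1": 137, "Tier-2": 324},
    "Disk (TB)": {"Tier-0": 4800, "Tier-1": 21000, "Tier-2": 27000},
    "Tape (TB)": {"Tier-0": 23000, "Tier-1": 47000},
}

def shares(per_tier):
    """Return ({tier: rounded percent}, sum of the per-tier values)."""
    total = sum(per_tier.values())
    percentages = {tier: round(100 * value / total) for tier, value in per_tier.items()}
    return percentages, total

for resource, per_tier in RESOURCES.items():
    percentages, total = shares(per_tier)
    print(resource, percentages, "sum:", total)
```

Independent rounding can differ from the slide by one percentage point (the slide's figures are presumably adjusted to add up to 100%).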
lining up the bytes in a consumable order
Data Organisation
Data Tiers
“Streamer” files written to disk by filter farm
Read and reorganised into Primary Datasets (PD)
Based on trigger selections (physics motivation)
Output is the custodial RAW data
Reconstruction run on RAW PDs
Output RECO and AOD (Analysis Object Data)
Simulation also produces similar data tiers plus truth information
Data Ordering
ROOT used as persistency framework
Ordering of data in files is adjusted to the expected reading pattern
RAW & RECO: expected to read the whole event
Ordering in file is by event
AOD: could have only a subset of the data read
Frequent passes over a single variable to make plots
[Figure: file layout by attribute, with entries 1, 2, 3 … n stored contiguously for each attribute (Attribute 1 … Attribute 4)]
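The difference between ordering by event and ordering by attribute can be shown with a toy layout. This is a sketch in plain Python lists, not ROOT's actual file format; the attribute names (pt, eta, phi) and values are hypothetical.

```python
# Two ways to lay out the same events in a file (toy model, not ROOT itself).
events = [
    {"pt": 41.2, "eta": 0.3, "phi": 1.1},
    {"pt": 12.7, "eta": -1.9, "phi": -2.4},
    {"pt": 55.0, "eta": 2.1, "phi": 0.2},
]
ATTRIBUTES = ["pt", "eta", "phi"]  # hypothetical attribute names

def order_by_event(evts):
    """Event-wise: all attributes of event 1, then event 2, ... (whole-event reads)."""
    return [e[a] for e in evts for a in ATTRIBUTES]

def order_by_attribute(evts):
    """Attribute-wise: attribute 1 for all events, then attribute 2, ... (subset reads)."""
    return [e[a] for a in ATTRIBUTES for e in evts]

def read_attribute(flat, attr, n_events):
    """With attribute ordering, one variable is a single contiguous slice."""
    start = ATTRIBUTES.index(attr) * n_events
    return flat[start:start + n_events]

columnar = order_by_attribute(events)
print(read_attribute(columnar, "pt", len(events)))
```

With event ordering the same plot would need a strided read that touches every event, which is why the layout is matched to the expected access pattern.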
Skims
“Train model”-like event selection
Various analyses include their event selections
Selection done using reco output
More detailed and accurate than trigger info
Can cut a lot harder
First skims done at Tier-1 on the Tier-0 output
Called PromptSkims as they are started ASAP
Currently write out 81 datasets from Tier-0 output
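One reading of the "train" idea is that the input is read once and every analysis selection is applied to each event on the way through. A minimal sketch of that interpretation follows; the event fields, cut values and skim names are hypothetical, not the actual CMS skim definitions.

```python
# Toy "train": read each event once, apply every skim selection to it.
def run_skim_train(events, selections):
    """Return {skim name: [selected events]} after a single pass over the input."""
    outputs = {name: [] for name in selections}
    for event in events:                      # the input is read only once
        for name, passes in selections.items():
            if passes(event):
                outputs[name].append(event)
    return outputs

# Hypothetical reconstructed quantities and cuts, for illustration only.
events = [
    {"n_muons": 4, "met": 10.0},
    {"n_muons": 1, "met": 80.0},
    {"n_muons": 2, "met": 5.0},
]
selections = {
    "four_muon": lambda e: e["n_muons"] >= 4,
    "high_met": lambda e: e["met"] > 50.0,
}
skims = run_skim_train(events, selections)
print({name: len(selected) for name, selected in skims.items()})
```

Each output list plays the role of one skim dataset; an event may land in several or in none.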
Datasets
Files are collected in datasets
Datasets should be processed together
This actually uses a database (Oracle)
Each dataset has provenance attached to it
Can be superseded by a reprocessing
End user tool queries database and creates jobs to process it
Typically across all the Tier-2s hosting the dataset
narrowing the haystacks
Analysis Techniques
Discriminating Variables
Each analysis will find the variables that enhance its signal-to-noise ratio
High-energy muon is an easy one, i.e. something going really fast doesn’t bend so much in the magnetic field
May end up losing a lot of signal to reduce the background by a larger factor
Optimise S/√B or S/√(S+B)
[Figure: momentum of muon (GeV) for pseudo data, background and signal; axis ticks from 0 to 60]
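Optimising a cut on one discriminating variable can be sketched as a simple threshold scan. The momenta below are made-up numbers chosen for illustration, and the scan is a generic approach rather than the procedure of any particular CMS analysis.

```python
import math

def significance(s, b, kind="s_over_sqrt_b"):
    """Figure of merit for s signal and b background events passing a cut.

    kind="s_over_sqrt_b" gives S/sqrt(B); any other value gives S/sqrt(S+B).
    """
    if kind == "s_over_sqrt_b":
        return s / math.sqrt(b) if b > 0 else float("inf")
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

def best_cut(signal, background, cuts, kind="s_over_sqrt_s_plus_b"):
    """Scan lower thresholds on one variable; return (cut, figure of merit)."""
    scored = []
    for cut in cuts:
        s = sum(1 for x in signal if x > cut)
        b = sum(1 for x in background if x > cut)
        scored.append((cut, significance(s, b, kind)))
    return max(scored, key=lambda pair: pair[1])

# Hypothetical muon momenta in GeV: signal is harder than background.
signal = [35, 42, 48, 51, 55, 58]
background = [5, 8, 12, 15, 18, 22, 25, 31, 38, 44]
print(best_cut(signal, background, cuts=range(0, 60, 10)))
```

In this toy sample the best threshold gives up one signal event in order to remove all but one background event, the trade-off described above. S/√B diverges when no background survives, so the scan defaults to S/√(S+B).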
Multivariate Analysis
Many different types:
Simple rectangular cuts (multiple 1-d cuts)
Maximum Likelihood approaches: combine the probability of all input variables
Fisher Discriminants: input variables are projected to another space to avoid correlations
Neural Networks
Most of these methods rely on training
Some packages can apply many methods
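As an illustration of the Fisher discriminant idea, here is a textbook two-class version for two input variables in plain Python. It is a sketch under stated assumptions, not TMVA's implementation, and the sample points are hypothetical. The projection direction is w = Sw^-1 (mu_signal - mu_background), where Sw is the summed within-class scatter matrix.

```python
def mean(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def scatter(points, mu):
    """2x2 within-class scatter matrix for one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(signal, background):
    """w = Sw^-1 (mu_signal - mu_background) for two input variables."""
    mu_s, mu_b = mean(signal), mean(background)
    ss, sb = scatter(signal, mu_s), scatter(background, mu_b)
    sw = [[ss[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mu_s[0] - mu_b[0], mu_s[1] - mu_b[1]]
    # Explicit inverse of the 2x2 matrix applied to the mean difference.
    return [
        (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
        (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det,
    ]

def project(point, w):
    return point[0] * w[0] + point[1] * w[1]

# Hypothetical, strongly correlated two-variable samples.
signal = [(3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (4.0, 4.4)]
background = [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2), (2.0, 3.4)]
w = fisher_direction(signal, background)
print([round(project(p, w), 2) for p in signal])
print([round(project(p, w), 2) for p in background])
```

In these samples neither input variable separates the classes on its own, but the single projected value does, which is the point of moving to another space.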
TMVA (Toolkit for MVA in ROOT)
New Boson Plot
H -> ZZ -> llll
Use five angles and two masses as discriminators
not xldbs though
Databases
Conditions Database
Largest database use (not in size, ~300GB)
Provides calibration, geometry and alignment information
Used by all running jobs
Can be more than 100k jobs world wide
Network of squid caches used
Database queries transformed into http requests
Home grown technology to achieve this (Frontier)
Works because data is written once, read many times
Squids aggregate: 500k requests/min, 500MB/s
Offline servers: 4k requests/min, 0.5MB/s
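Why caching works for conditions data can be sketched with a toy read-through cache. Everything here is illustrative: the URL scheme, tag name, run number and class names are hypothetical, and the real Frontier encoding is not shown in the talk.

```python
# Toy read-through cache in the spirit of "query -> http request -> squid".
from urllib.parse import urlencode

class ConditionsBackend:
    """Stand-in for the central database; counts how often it is hit."""
    def __init__(self, payloads):
        self.payloads = payloads
        self.hits = 0

    def fetch(self, url):
        self.hits += 1
        return self.payloads[url]

def query_to_url(tag, run):
    # Hypothetical URL scheme, for illustration only.
    return "/conditions?" + urlencode({"tag": tag, "run": run})

class CachingProxy:
    """Caching without invalidation is safe only because data is write-once."""
    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def get(self, url):
        if url not in self.cache:
            self.cache[url] = self.backend.fetch(url)
        return self.cache[url]

url = query_to_url("alignment_v1", 190000)
backend = ConditionsBackend({url: b"alignment constants"})
proxy = CachingProxy(backend)
for _ in range(1000):          # 1000 jobs asking for the same conditions
    proxy.get(url)
print(backend.hits)            # the backend is asked only once
```

Reading the two rates above the same way, and assuming the offline-server requests are the cache misses (the slide does not say so), 4k of 500k requests/min reach the servers, so roughly 99% are absorbed by the squids.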
Other Databases
PhEDEx : Manages file transfers
Single Oracle instance at CERN
DBS : Dataset Bookkeeping System
Contains meta-data about datasets and files
Main instance in Oracle at CERN
User instances available elsewhere with MySQL
Job tracking databases
Use both Oracle and MySQL
Recent system archiving information in CouchDB
Reading Rate
[Figure: reading rate, with annotations at 6TB/day and 250TB/day]
…need to wear shades
Future Trends
Federated Storage
Aiming towards an architecture where all storage is visible globally
[Figure: federated storage redirection. A User App opens /store/foo; redirectors (Global Redirector, US Redirector, EU Redirector; US Region, EU Region, ?? Region) query Site A to Site D for /store/foo, and the app is passed along (Redirect Global, Redirect EU, Redirect Site C) until it opens /store/foo at the site holding the file.]
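The redirection flow in the diagram can be sketched as a small hierarchy that asks the local region first and escalates to the global redirector on a miss. This is a toy model under stated assumptions: the assignment of Sites A and B to the US and C and D to the EU, and the class and method names, are hypothetical.

```python
# Toy hierarchy of redirectors: ask the region first, then go global.
class Site:
    def __init__(self, name, files):
        self.name, self.files = name, set(files)

    def has(self, path):
        return path in self.files

class Redirector:
    def __init__(self, name, children, parent=None):
        self.name, self.children, self.parent = name, children, parent
        for child in children:
            if isinstance(child, Redirector):
                child.parent = self

    def locate_below(self, path):
        """Query everything beneath this redirector; return a Site or None."""
        for child in self.children:
            if isinstance(child, Site):
                if child.has(path):
                    return child
            else:
                found = child.locate_below(path)
                if found is not None:
                    return found
        return None

    def open(self, path):
        """Resolve within this subtree if possible, otherwise escalate upwards."""
        found = self.locate_below(path)
        if found is not None:
            return found.name
        if self.parent is not None:
            return self.parent.open(path)
        raise FileNotFoundError(path)

# Hypothetical placement: only Site C holds /store/foo.
us = Redirector("US", [Site("Site A", []), Site("Site B", [])])
eu = Redirector("EU", [Site("Site C", ["/store/foo"]), Site("Site D", [])])
global_redirector = Redirector("Global", [us, eu])
print(us.open("/store/foo"))   # a US client ends up redirected to Site C
```

The user application only ever names the logical path; where the file physically lives is resolved by the hierarchy.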
Clouds: for a rainy day
Helix Nebula
European initiative to provide unified system
Shows the importance of standards
Proof of concept demonstrated on Amazon
Costs still prohibitive
Estimated to be an order of magnitude more expensive
Running our own data centres more cost effective
May be interesting for adding short term capacity
Clouds: internal cloud
CERN moving to “agile” infrastructure
Commissioning new data centre in Hungary
Filter farm as cloud during LHC shutdown
Using OpenStack across 15k cores
Allows flexibility for redeployment
Farm also needed for detector work
Summary
Database technology used in various roles
Total size around 10TB: not huge
Our Big Data: 20PB RAW data
CMS uses worldwide computing infrastructure to deliver physics results
We’ve found a needle, now need to figure out what kind it is: http://lanl.arxiv.org/abs/1207.7235
XLDB Europe 2013 @ CERN
CERN will be happy to host a European Satellite XLDB
Planned date: 25+26 June 2013
During the LHC long shutdown, which will also allow discussions on LHC data management issues
We invite everyone to help reach out to places in Europe with challenging XLDB-related issues; please contact dirk.duellmann@cern.ch and becla@slac.stanford.edu