EDELWEISS data structure and analysis framework · 2014. 5. 21.
-
KIT – Universität des Landes Baden-Württemberg und nationales Forschungszentrum in der Helmholtz-Gemeinschaft
Benjamin Schmidt, June 2014 at MPI München
www.kit.edu
Photo by Böhringer Friedrich
EDELWEISS data structure and analysis framework
-
June 2014, CRESST/EDELWEISS/EURECA software workshop 2
Motivation to build a new data structure and analysis framework (Kdata)
• We had: Edw-II data analysis dispersed between Ana and Era
  • 2 experts (full-time analysis)
  • Each with their own code: single (few local) user / single programmer
• 2010: A. Cox and I struggling to find, to access and to analyze Edw2 data
  • Coincidence (muon-veto/bolometer) study as diploma work
Era: ROOT based, but difficult access, no server with most recent code/data…
Saclay Ana: Fortran, PAW and C; no PAW support; French comments in code/data…
Task: Get the data
J. Cham
-
Motivation to build a new data structure and analysis framework (Kdata)
• Short term: facilitate data access
  • Build flexible event-based data structure
  • Single combined HLA file: muon-veto and bolometer data
  • Make code and data easily available
  • Documentation
• Long term: establish a common collaboration-wide analysis and data storage tool
  • Share tasks (calibration, template creation, …) / remove barriers (documentation)
  • Allow for upgrade to 100s of detectors: develop automatic processing scheme
-
The general picture – the idea: all software modules
[Diagram of the software modules. Labels: DAQ, KSamba; KDS data structure; KPTA pulse trace analysis, Kamping; data levels Raw, Amp, HLA; ampToHLA; Analysis: KDataPy, KQPA]
A bit special: standalone code, extensive use of templates
-
Specific known - unknown requirements during Kdata development
• Requirements Edw-3:
  • 10 -> 40 detectors
    • Larger workload for debugging, calibration and analysis
  • New detector design (channel number/specifics initially unknown)
  • New electronics (some specifics unknown)
    • First time-resolved ionization signals (trace length? number of traces?)
    • Change in analog amplifiers -> signal shape? trace length? sampling?
    • New efforts to optimize signal treatment needed
  • Integrate muon veto in bolo DAQ
-
Event-based data storage: Kdata implementation
• The idea:
  • Build a data storage and analysis framework; use ROOT for event-based physics data
    • Fast I/O
    • Support for LHC lifetime
    • Data compression
    • Statistics tools
    • Well known
  • C++ class library for data encapsulation
    • Keep it modular
    • Keep it flexible and general
    • Try to keep it simple
    • Keep fully split tree (library independent)
  • Document it
  • Make it easily accessible
Repository: https://edwdev-ik.fzk.de/SVN_Repository_for_the_KIT_Dark_Matter_Group/KData.html
-
Kdata event structure in detail
• Use ROOT types
• No nested arrays
  • Kdata library not needed to read data
  • Longevity of data guaranteed
• Kdata coded consistently with the ROOT and Taligent coding style:
  • Easier to read/collaborate/check code
  • For example:
    • classes defined in header .h; implemented in .cxx
    • variables start with a lowercase f (fChannelName; fAmp; fExtra; …)
    • functions start with a capital letter: GetChannelName(); GetTrace(); …
    • Kds completely implemented with Get…() and Set…() methods
      → Tab completion (ipython, ROOT session)
-
Kdata event structure in detail
• ROOT TTree with single event branch
• Event with flexible structure:
  • Variable-sized TClonesArrays for Bolometer, BoloPulse, PulseAnalysis, Samba and MuonModule information
  • Allows hardware changes (number of bolos, number of channels per bolo, …) without code change in "kds" (data structure source code)!
  • Requires some effort to get to know, though
-
Kdata event structure Logic Layout:
TTree
KEvent
KBoloPulseRecord = Channel
KPulseAnalysisRecord
KSambaRecord
KMuonModuleRecords
KBolometerRecord
Logical event structure via TRef and TRefArray
• Very powerful: can be spread over files, …
• A word of caution though: requires specific handling in event building
  • Never forget to reset the referenced object count (TProcessID::SetObjectCount) -> file size blows up otherwise
  • Probably most bugs and problems in kds were related to TRef issues
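The reset mentioned above is a save-and-restore pattern: remember the object count before building an event and set it back afterwards. The sketch below is a toy model in plain Python, not ROOT code. `ToyProcessID`, `build_event` and `highest_id_after` are hypothetical stand-ins that only mimic a process-wide counter of the kind behind TProcessID::GetObjectCount and TProcessID::SetObjectCount.

```python
class ToyProcessID:
    """Hypothetical stand-in for a process-wide referenced-object counter."""
    _count = 0

    @classmethod
    def get_object_count(cls):
        return cls._count

    @classmethod
    def set_object_count(cls, n):
        cls._count = n

    @classmethod
    def register(cls):
        # Each newly referenced object takes the next unique number.
        cls._count += 1
        return cls._count


def build_event(n_refs, reset=True):
    """Reference n_refs objects for one event; optionally restore the counter."""
    saved = ToyProcessID.get_object_count()    # remember the count before the event
    ids = [ToyProcessID.register() for _ in range(n_refs)]
    if reset:
        ToyProcessID.set_object_count(saved)   # reuse the same numbers for the next event
    return ids


def highest_id_after(n_events, n_refs, reset):
    """Build n_events events and return the largest number handed out in the last one."""
    ToyProcessID.set_object_count(0)
    last = []
    for _ in range(n_events):
        last = build_event(n_refs, reset=reset)
    return max(last)


print(highest_id_after(1000, 5, reset=True))    # numbers stay bounded
print(highest_id_after(1000, 5, reset=False))   # numbers grow with every event
```

In this toy model the numbers stay bounded only when the counter is restored after each event; the unbounded growth without the reset is the effect that, according to the slide, blows up the file size.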
-
Kdata event structure Logic Layout:
TTree
KEvent
KBolometerRecord
KBoloPulseRecord = Channel
KPulseAnalysisRecord
KSambaRecord
KMuonModuleRecords
Looping in Python:
    for event in filereader:
        for bolo in event.boloRecords():
            for pulse in bolo.pulseRecords():
                for analysis in pulse.analysisRecords():
                    pass  # use the analysis record here

Looping C++ style in Python:
    for i in range(f.GetEntries()):
        f.GetEntry(i)
        event = f.GetEvent()
        for ii in range(event.GetNumBolos()):
            bolo = event.GetBolo(ii)
            samba = bolo.GetSambaRecord()
            print(samba.GetNtpDateSec())
            for iii in range(bolo.GetNumPulseRecords()):
                pulse = bolo.GetPulseRecord(iii)
                trace = pulse.GetTrace()
                # …
KPulseAnalysisRecords: bandpass analysis, optimal filter, trapezoidal filter, …
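The nesting behind these loops can be tried out without ROOT or KDataPy. The sketch below uses plain Python dataclasses as hypothetical stand-ins for KEvent, KBolometerRecord, KBoloPulseRecord and KPulseAnalysisRecord; the class names, fields and values are invented for illustration and are not the Kdata API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class ToyAnalysisRecord:      # stands in for a KPulseAnalysisRecord
    name: str                 # e.g. "bandpass" or "optimal filter"
    amp: float


@dataclass
class ToyPulseRecord:         # stands in for a KBoloPulseRecord (one channel)
    channel: str
    analyses: List[ToyAnalysisRecord] = field(default_factory=list)


@dataclass
class ToyBoloRecord:          # stands in for a KBolometerRecord
    detector: str
    pulses: List[ToyPulseRecord] = field(default_factory=list)


@dataclass
class ToyEvent:               # stands in for a KEvent
    bolos: List[ToyBoloRecord] = field(default_factory=list)


def iter_analysis(events: List[ToyEvent]) -> Iterator[Tuple[str, str, str, float]]:
    """Flatten event -> bolo -> pulse -> analysis into one stream of tuples."""
    for event in events:
        for bolo in event.bolos:
            for pulse in bolo.pulses:
                for ana in pulse.analyses:
                    yield bolo.detector, pulse.channel, ana.name, ana.amp


# One toy event with one detector, two channels and three analysis records.
events = [ToyEvent([ToyBoloRecord("FID823", [
    ToyPulseRecord("chalA FID823", [ToyAnalysisRecord("bandpass", 12.5)]),
    ToyPulseRecord("chalB FID823", [ToyAnalysisRecord("bandpass", 11.9),
                                    ToyAnalysisRecord("optimal filter", 12.1)]),
])])]

rows = list(iter_analysis(events))
for row in rows:
    print(row)
```

Wrapping the four nested loops in one generator keeps analysis code flat: each consumer sees one tuple per analysis record and does not need to know how the records are nested.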
-
Kdata event structure in detail
• Structure subclassed in
  • Raw: KRawEvent, KRawBolometerRecord, …
  • Amp: KAmpEvent, KAmpBolometerRecord, …
  • HLA: KHLAEvent, KHLABolometerRecord, …
Raw: with pulse traces! No KPulseAnalysisRecords
Amp and HLA: no pulse traces, but KPulseAnalysisRecord
With a quick calculation: 2.87 × 356/1850 × 2.35 → FWHM 1.04 keV (Ana: 1.1 keV)
< 1/10 raw file size
~ 1/2 samba file size
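The quick FWHM estimate quoted above reads like the usual sigma-to-FWHM conversion: a width in ADC units is scaled by the keV-per-ADU factor from a calibration line and multiplied by 2√(2 ln 2) ≈ 2.35. The sketch below assumes that 2.87 is a Gaussian sigma in ADU and that the 356 keV line sits at 1850 ADU; the slide does not state this explicitly, and the function name is made up. Evaluated literally, these inputs give roughly 1.3 keV rather than the quoted 1.04 keV, so one of the numbers on the slide probably differs from what the plot showed.

```python
import math


def fwhm_kev(sigma_adu, peak_adu, line_kev):
    """Convert a Gaussian sigma in ADU to a FWHM in keV using one calibration line."""
    kev_per_adu = line_kev / peak_adu                 # calibration factor
    sigma_to_fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0))   # about 2.355
    return sigma_to_fwhm * sigma_adu * kev_per_adu


# Inputs as printed on the slide (their interpretation is an assumption).
result = fwhm_kev(sigma_adu=2.87, peak_adu=1850.0, line_kev=356.0)
print(round(result, 2))
```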
-
Python and KDataPy
-
simpleEventViewer output:
-
Looping utilities – no need to write the looping/plotting
• Use KDataPy.util with plotpulse(), looppulse(), loopbolo() and KDataPy.loop_amp with loopchannel(), plotchan_x(), plotchan_x_files(), plotchan_x_dir()
• loop_amp to be completed with plotchannel_xy(), … and loop/plotbolo functions. Note that KDataPy.util loopbolo() also works for Amp and HLA data
• Basic usage:
    import ROOT
    import KDataPy.util as ut
    ut.plotpulse("/sps/edelweis/kdata/data/raw/nk23b002_000.root", "chalB FID823")
• Documentation
-
Our data acquisition chains revisited
[Diagram labels: Samba Macs; Muon Veto DAQ; Bolo-Raw data; Kdata - ROOT on kalinka (our look-up place); sites: Modane, Lyon, Karlsruhe; Radon]
Automated:
• proc0: copy to Lyon
• proc1: rootification
• proc2: raw -> amp
• proc3: amp -> hla
• proc4: merge/skim muon/hla bolo data
• spsToHpss: backup on tape drive
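One way to picture an automated chain like proc0 to proc4 plus spsToHpss is a status record per file that is advanced one step at a time. The sketch below is a minimal illustration under assumptions: the "status" field layout, the document ids and the handler functions are hypothetical, and it is not the processing code used by the collaboration.

```python
PIPELINE = ["proc0", "proc1", "proc2", "proc3", "proc4", "spsToHpss"]


def next_step(doc):
    """Return the first step not yet marked 'good' in a file's status record, or None."""
    status = doc.get("status", {})
    for step in PIPELINE:
        if status.get(step) != "good":
            return step
    return None


def run_pending(docs, handlers):
    """Advance every record by one step using the registered handler functions."""
    for doc in docs:
        step = next_step(doc)
        if step is None or step not in handlers:
            continue
        ok = handlers[step](doc)
        # A failed step stays pending, so it is retried on the next pass.
        doc.setdefault("status", {})[step] = "good" if ok else "failed"
    return docs


docs = [
    {"_id": "run_a_000", "status": {"proc0": "good"}},   # hypothetical ids
    {"_id": "run_b_000", "status": {}},
]
handlers = {step: (lambda doc: True) for step in PIPELINE}  # dummy steps that always succeed
run_pending(docs, handlers)
pending = [(d["_id"], next_step(d)) for d in docs]
print(pending)
```

Keeping the state in the record rather than in the script means the chain can be restarted at any point and simply picks up the first unfinished step for each file.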
-
Using the Kdata pulse processing library
Adam Cox our benevolent dictator for life
-
The KPulseAnalysisChain
The kpta-chain is applied before your analysis function
-
Ionisation channel after pattern removal:
-
Advantages – Drawbacks (personal opinion)
Advantages:
• Flexibility of data structure
• Consistency of data structure (over time)
• Same data structure for different detector systems -> great for coincidence studies
• Same data structure for different processing/analyses (bandpass, optimal filter, …)
• Decouple high-level analyses from DAQ/processing changes
• Independent kpta library
  • Has been reused with (flat) data from EURECA test stand
  • Very versatile
Drawbacks:
• Flexibility of data structure comes with some complexity (heaviness)
  • Especially TTree.Draw() more complex
• Single raw data folder → restricted use of ls
• Writing kpta with templates a bit more complex
-
Usage of Python
• 90 % of the time Python feels like the right solution
  • Shorter, more legible code
  • Vast set of external libraries
  • Extremely handy for scripting
  • Basic documentation in Python always via '''docstrings'''
• Main price: speed
  • Circumvent by producing an additional set of data files skimmed by detector
  • Future use of PyPy + ROOT 6
-
But 50× slower; with PyPy JIT compilation 1.06× slower
-
-
CouchDB for everything else and python to glue everything together
• Automat database (117 parameters every 20 s)
• dataDB
  • Samba header information: useful to find data under conditions (temperature, voltage, run_type, …)
  • Processing state: history of processing/file location (complete documentation)
• Supplementary processing databases
  • Templates, high-/lowpass filter parameters, cuts
• Radon measurements
• …
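Finding data under given conditions comes down to filtering header documents on a few fields. The sketch below does this on a plain Python list of dictionaries, which is how JSON documents from a document database look once loaded. The field names follow the slide (temperature, voltage, run_type), but the document ids, values, units and the `find_runs` helper are invented for illustration; it does not talk to CouchDB.

```python
def find_runs(docs, run_type=None, max_temperature=None, min_voltage=None):
    """Return the ids of header documents that match the given run conditions."""
    selected = []
    for doc in docs:
        if run_type is not None and doc.get("run_type") != run_type:
            continue
        # Documents without the field are treated as not matching the cut.
        if max_temperature is not None and doc.get("temperature", float("inf")) > max_temperature:
            continue
        if min_voltage is not None and doc.get("voltage", float("-inf")) < min_voltage:
            continue
        selected.append(doc["_id"])
    return selected


# Made-up header documents in arbitrary units.
header_docs = [
    {"_id": "run_a", "run_type": "calibration", "temperature": 18.0, "voltage": 8.0},
    {"_id": "run_b", "run_type": "physics", "temperature": 18.2, "voltage": 8.0},
    {"_id": "run_c", "run_type": "physics", "temperature": 25.0, "voltage": 8.0},
    {"_id": "run_d", "run_type": "physics", "temperature": 18.1, "voltage": 4.0},
]

selected = find_runs(header_docs, run_type="physics", max_temperature=20.0, min_voltage=8.0)
print(selected)
```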
-
A more complex example: Heat template fitting code
• Three Python modules (all part of KDataPy!):
  1. templateFitSelection.py (loop over data, select pulses, average parameters; call the other scripts)
  2. pulsetempy.py (perform template fit)
  3. uploadAnalyticalTemplateToDB.py (save fit parameters to DB)
• Usage:
    import KDataPy.TemplateFitSelection as tfit
    tfit.templateFitSelection('/sps/kdata/data/raw/nk23b002_000.root')
    tfit.run('chalB FID808')
• Note that there are some more options though!
  • The code itself is commented and should help to discover more options
  • Sorry: documentation (web) has not been updated yet
-
Basic looping once more
• More verbose version:
  • Use plotPulseEventViewer module in kanacodewok
    import plotPulseEventViewer as plt
    plt.plotPulseEventViewer('/sps/edelweis/kdata/data/raw/nk23b002_000.root', 'chalA FID823')
-
More advanced usage
• Hook in an analysis function
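The hook-in idea can be illustrated without KDataPy: the loop utility owns the iteration and calls a user-supplied function for every pulse of the requested channel. In the sketch below, `loop_pulses`, the dictionary layout of a pulse and the `peak_to_peak` function are all hypothetical; the real KDataPy.util signatures may differ.

```python
def loop_pulses(events, channel, analysis_function, **kwargs):
    """Call analysis_function(pulse, **kwargs) for every pulse of the given channel."""
    results = []
    for event in events:
        for pulse in event:
            if pulse["channel"] == channel:
                results.append(analysis_function(pulse, **kwargs))
    return results


def peak_to_peak(pulse, scale=1.0):
    """Example hook-in function: scaled peak-to-peak height of the trace."""
    trace = pulse["trace"]
    return scale * (max(trace) - min(trace))


# Toy data: each event is a list of pulses, each pulse a dict with channel and trace.
events = [
    [{"channel": "chalA FID823", "trace": [0, 1, 5, 2]},
     {"channel": "chalB FID823", "trace": [0, 2, 9, 3]}],
    [{"channel": "chalB FID823", "trace": [1, 1, 4, 1]}],
]

amplitudes = loop_pulses(events, "chalB FID823", peak_to_peak, scale=0.5)
print(amplitudes)
```

Passing extra keyword arguments through the loop lets one analysis function be reused with different settings without touching the looping code.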
-
Processing – some details
• Database driven:
• Proc0: scp of samba raw data to ccage (Lyon)
  • Task1: change scp account to keitel (all tests finished, batch-, hpss-, …)
  • Task2: add md5 checksum test after transfer
• Proc1: rootification (Modane), scp to ccage (Lyon)
  • Task: transfer rootification to Lyon
• Proc2: processing and filtering
  • Template fitting tools with DB access implemented
  • Adaptation of processing to 8 step function ionization channels
  • All data from November processed with KFeldbergKampSite (BW bandpass filter, all channels treated separately): sps/edelweis/kdata/data/amp/Run305
  • Task1: automate using DB and redhook.sh script
  • Task2: implement KSeebugKampSite (BW bandpass with simultaneous heat-ionization fits)
  • Task3: (longer term) revive/debug optimal filter KChamonixKAmpSite
-
Processing – some more details
• Proc3: calibration of Amp level files
  • Task1: port the Era scripts: perform calibration, store results (calibDB)
  • Task2: implement Amp->HLA process using calibDB
• Proc4/5/6:
  • Tasks: concat/merge/skim data
    • What can/should be automated?
  • Tasks: facilitate access to data:
    • Implement run list based on datadb (see talks by Cecile/Lukas/Valentin)
    • Write Python utilities to facilitate plotting/looping
      • KDataPy.util
      • KDataPy.loop_amp …
• spsToHPSS:
  • Fully working
  • Task1: nj13b…tar: there is a file that was too big for automatic processing
  • Task2: implement md5 checksum test after writing
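The md5 checksum tasks listed for the transfer and backup steps amount to hashing the file on both sides and comparing the digests. The sketch below shows the idea with the Python standard library; a local file copy stands in for the scp or tape transfer, and the function names and file names are made up.

```python
import hashlib
import os
import shutil
import tempfile


def md5sum(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy a file and report whether source and copy have the same checksum."""
    shutil.copyfile(source, destination)    # stand-in for the real transfer
    return md5sum(source) == md5sum(destination)


# Demonstration on a throwaway file in a temporary directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "raw.bin")
dst = os.path.join(workdir, "copy.bin")
with open(src, "wb") as handle:
    handle.write(b"toy samba data" * 1000)
ok = copy_and_verify(src, dst)
print(ok)
shutil.rmtree(workdir)
```

For a remote transfer, the second digest would be computed on the receiving host and compared with the one recorded before sending.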
-
Template fitting
The program is rather verbose!
-
Template fitting
• Strong dependence on initial parameters
• Initial parameters from last fit (pulstemplates db)
• Some tweaking still necessary (larger amplitude, …)
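Why the starting parameters matter can be seen in a stripped-down version of a template fit. The sketch below is not the pulsetempy.py algorithm: it assumes a toy pulse shape (a difference of two exponentials), keeps the shape fixed and solves only for the amplitude in closed form, then compares the chi-square for a starting shape taken from an earlier fit with that for a poor first guess. All names and numbers are invented.

```python
import math


def template(t, rise, decay):
    """Toy pulse shape: difference of two exponentials (an assumed form)."""
    return (math.exp(-t / decay) - math.exp(-t / rise)) if t >= 0 else 0.0


def fit_amplitude(trace, shape):
    """Closed-form least-squares amplitude for a fixed template shape."""
    num = sum(y * s for y, s in zip(trace, shape))
    den = sum(s * s for s in shape)
    return num / den


def chi2(trace, shape, amp):
    """Sum of squared residuals between the trace and the scaled template."""
    return sum((y - amp * s) ** 2 for y, s in zip(trace, shape))


times = [float(i) for i in range(200)]
true_shape = [template(t, rise=5.0, decay=40.0) for t in times]
trace = [3.0 * s for s in true_shape]            # noise-free toy pulse with amplitude 3

good_start = [template(t, rise=5.0, decay=40.0) for t in times]    # e.g. parameters from a previous fit
bad_start = [template(t, rise=20.0, decay=150.0) for t in times]   # a poor first guess

amp_good = fit_amplitude(trace, good_start)
amp_bad = fit_amplitude(trace, bad_start)
chi2_good = chi2(trace, good_start, amp_good)
chi2_bad = chi2(trace, bad_start, amp_bad)
print(amp_good, chi2_good)
print(amp_bad, chi2_bad)
```

A full fit would also vary the shape parameters, and a minimizer started from the poor guess has much further to go, which is one reason to seed it from the last stored fit.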
-
A useful trick – Quitting your loop
-
Loop-/plotbolo
• You need to correlate channels? → skip looping at bolometer level
-
Okay, a stupid example, but a quick one.
Note the documentation with further examples: KDataPy Utility functions
-
From theory to practice – Part 2 Working with Amp level data
• Structure subclassed in
  • Raw: KRawEvent, KRawBolometerRecord, …
  • Amp: KAmpEvent, KAmpBolometerRecord, …
  • HLA: KHLAEvent, KHLABolometerRecord, …
Raw: with pulse traces! No KPulseAnalysisRecords
Amp and HLA: no pulse traces, but KPulseAnalysisRecord
< 1/10 raw file size
~ 1/2 samba file size
-
TTree.Draw() example
With a quick calculation: 2.87 × 356/1850 × 2.35 → FWHM 1.04 keV (Ana: 1.1 keV)
TTree->Draw() command, or rather TChain->Draw() (called from Python):
    c.Draw("fPulseAna[].GetAmp()", "fPulseAna[].GetBoloPulseRecord().GetChannelName() == \"slowD FID823\" && fPulseAna[].GetExtra(8)==5")
-
Using loop_amp
Or – if the automatic binning is too crude:
-
Loop_amp together with file lists/directories
• Use loop.plotchan_x_files(["file1.root", "file2.root"], 'channel', …) or use loop.plotchan_x_dir('directory', 'file-pattern', 'channel', …)
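The directory variant can be thought of as building the file list from a directory and a file-name pattern and then handing it to the file-list variant. The sketch below shows that step with the standard library, assuming shell-style wildcards; whether plotchan_x_dir interprets its pattern this way is an assumption, and `files_matching` and the file names are made up.

```python
import fnmatch
import os
import tempfile


def files_matching(directory, pattern):
    """Return sorted full paths of files in directory whose names match a shell-style pattern."""
    names = sorted(n for n in os.listdir(directory) if fnmatch.fnmatch(n, pattern))
    return [os.path.join(directory, n) for n in names]


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    for name in ["a_000.root", "a_001.root", "b_000.root", "a_notes.txt"]:
        open(os.path.join(directory, name), "w").close()
    matched = [os.path.basename(p) for p in files_matching(directory, "a_*.root")]

print(matched)
```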
[Two histograms: Entries vs. Amplitude]
-
Plotting a TGraph of two variables – very first example: RMS vs energy
[Plot axis labels: Chi2, Amplitude]
These are just examples. Develop your own "hook-in" functions: x_some_function(), xy_some_function(), …
-
Calibrated data
• ERA calibrated data in Kdata v3.0 format for Run12: computing center in Lyon and at KIT
• Ana calibrated data in Kdata (dev version) for Run20
  https://edwdev-ik.fzk.de/wsvn/EDELWEISS/analysis/kdata/branches/newhla2/
  An initial data set FID804 is available at KIT and Lyon: /sps/edelweis/schmidt/AnaToKData/Run20
• KData preliminary analysis files of single detectors, Run12 – Run20 – Run 304, at KIT
[Figure labels: hole collecting, hole veto, electron veto, electron collecting]