Scheduling system for distributed MPD data processing
Gertsenberger K. V.
Joint Institute for Nuclear Research, Dubna
NICA scheme
Multipurpose Detector (MPD)
The MPDRoot software is developed for MPD event simulation, reconstruction of experimental or simulated data, and subsequent physics analysis of heavy-ion collisions registered by the MultiPurpose Detector at the NICA collider.
Development of the NICA cluster
Two main directions of development:
data storage development for the experiment
organization of parallel processing of MPD events
Development and expansion of a distributed cluster for the MPD experiment based on the LHEP farm
Current NICA cluster in LHEP
Data storage on the NICA cluster
Distributed file system GlusterFS
aggregates existing file systems into a common distributed file system
automatic replication runs as a background process
a background self-checking service restores corrupted files after a hardware or software failure
Parallel MPD data processing
PROOF server: parallel data processing in ROOT macros on parallel architectures
concurrent data processing
MPD-scheduler: scheduling system for task distribution to parallelize data processing on the cluster nodes
MPD-scheduler
Developed in C++ with support for ROOT classes.
SVN: mpdroot/macro/mpd_scheduler
Uses the Sun Grid Engine scheduling system (qsub command) for execution in cluster mode.
SGE combines the cluster machines at the LHEP farm (nc10, nc11 and nc13) into a pool of worker nodes with 34 logical processors.
Jobs for distributed execution on the NICA cluster are described in an XML file that is passed to MPD-scheduler:
$ mpd-scheduler my_job.xml
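As a rough illustration of submission in cluster mode, the sketch below builds a qsub command line for one task script without running it. The helper name build_qsub_command and the script, job and log names are hypothetical. The slides state only that MPD-scheduler relies on the qsub command, not which options it passes; -N, -cwd, -j and -o are standard Sun Grid Engine qsub options chosen here for illustration.

```python
import shlex


def build_qsub_command(script_path, job_name, log_path):
    """Build (but do not run) a qsub command line for one task script.

    Hypothetical helper: the options MPD-scheduler really passes to qsub
    are not given in the slides.
    """
    return [
        "qsub",
        "-N", job_name,   # job name shown in the queue
        "-cwd",           # run the job from the current working directory
        "-j", "y",        # merge stderr into stdout
        "-o", log_path,   # file that receives the job output
        script_path,
    ]


if __name__ == "__main__":
    cmd = build_qsub_command("task_1.sh", "mpd_reco_1", "task_1.log")
    # Print a shell-ready line; on a machine with SGE installed the list
    # could be handed to subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

Keeping the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.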
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
  <file db_input="mpd.jinr.ru,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>
The description starts and ends with the tag <job>.
The <macro> tag describes the macro to be executed by MPDRoot:
name – path of the ROOT macro to execute, required
start_event – number of the first event to process for all input files, optional
count_event – number of events to process for all input files, optional
add_args – additional arguments of the ROOT macro, if required
Job description. Tag <macro>.
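To make the role of the <macro> attributes concrete, here is a minimal sketch that reads them from a job description with Python's standard xml.etree.ElementTree module. It is not MPD-scheduler's own C++ parser. The function names read_macro and _optional_int are hypothetical, and the sketch covers only jobs that contain a <macro> tag.

```python
import xml.etree.ElementTree as ET

JOB_XML = """
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>
"""


def _optional_int(value):
    """Convert an optional attribute to int; None means it was not given."""
    return int(value) if value is not None else None


def read_macro(job_xml):
    """Return the <macro> attributes of a job description as a dict."""
    root = ET.fromstring(job_xml.strip())
    if root.tag != "job":
        raise ValueError("the description must start and end with <job>")
    macro = root.find("macro")
    if macro is None or "name" not in macro.attrib:
        raise ValueError("<macro> with a 'name' attribute is required")
    return {
        "name": macro.get("name"),
        "start_event": _optional_int(macro.get("start_event")),
        "count_event": _optional_int(macro.get("count_event")),
        "add_args": macro.get("add_args"),
    }


if __name__ == "__main__":
    print(read_macro(JOB_XML))
```

Note that the attribute values must use straight double quotes; typographic quotes make the file invalid XML.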
The <file> tag defines the files to be processed by the macro above:
input – input file path
output – result file path
start_event – number of the first event in the input file, optional
count_event – number of events to process in the input file, optional
parallel_mode – number of processors used to process the events of the input file in parallel, optional
merge – whether to merge the partial result files in parallel_mode, default: "true"
Job description. Tag <file>.
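One way to picture parallel_mode is as a split of the event range of a single input file into contiguous parts, one per processor, whose partial results are merged afterwards when merge is "true". The sketch below assumes an even split with the remainder spread over the first parts. The slides do not say how MPD-scheduler actually divides the events, and split_events is a hypothetical name.

```python
def split_events(start_event, count_event, parallel_mode):
    """Split count_event events starting at start_event into contiguous parts.

    Returns a list of (first_event, event_count) pairs, at most one per
    processor. Assumption: even split, the remainder goes to the first parts.
    """
    if parallel_mode < 1:
        raise ValueError("parallel_mode must be at least 1")
    if count_event < 0:
        raise ValueError("count_event must not be negative")
    base, extra = divmod(count_event, parallel_mode)
    parts = []
    first = start_event
    for index in range(parallel_mode):
        size = base + (1 if index < extra else 0)
        if size == 0:
            break  # fewer events than processors: do not create empty parts
        parts.append((first, size))
        first += size
    return parts


if __name__ == "__main__":
    # 1000 events on 5 processors, as in count_event="1000" parallel_mode="5"
    for first, size in split_events(0, 1000, 5):
        print(first, size)
```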
<job>
…
<file db_input="mpd.jinr.ru, energy=3, gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
…
</job>
db_input – string defining a list of files from the MPD simulation database
mpd.jinr.ru – network address of the server with the simulation database, followed by selection parameters: collision energy range, particle generator type, colliding particles, description and others.
Special variables available in the "output" attribute:
${counter} = file counter, starting at 1 and increasing by 1
${input} = input file path
${file_name} = name of the input file without extension
${file_name_with_ext} = name of the input file with extension
Processing event files from MPD simulation database.
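The special variables listed above can be read as simple text substitutions applied to the output attribute once per selected input file. The following sketch shows that reading in Python. The function name expand_output and the input paths are hypothetical, and the real substitution code in MPD-scheduler may differ.

```python
import os


def expand_output(template, input_path, counter):
    """Substitute the special variables in an "output" attribute value."""
    file_name_with_ext = os.path.basename(input_path)
    file_name = os.path.splitext(file_name_with_ext)[0]
    values = {
        "${counter}": str(counter),
        "${input}": input_path,
        "${file_name}": file_name,
        "${file_name_with_ext}": file_name_with_ext,
    }
    result = template
    for variable, value in values.items():
        # Each key includes its closing brace, so "${file_name}" never
        # matches inside "${file_name_with_ext}".
        result = result.replace(variable, value)
    return result


if __name__ == "__main__":
    inputs = ["/data/run_a.root", "/data/run_b.root"]  # hypothetical paths
    template = "~/mpdroot/macro/mpd/evetest_${counter}.root"
    # The counter starts at 1 and increases by 1 for each input file.
    for counter, path in enumerate(inputs, start=1):
        print(expand_output(template, path, counter))
```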
The <run> tag describes the run parameters and the resources allocated to the job:
mode – execution mode: 'global' for distributed processing on the NICA cluster, 'local' for multithreaded execution on a multicore computer
count – maximum number of processors allocated to the job
config – path of a bash file with environment variables (including ROOT environment variables) that is executed before the macro
logs – log file path for multithreaded mode
Job description. Tag <run>.
<job>
<command line="get_mpd_prod energy=5-9 "/>
<run mode="global" config="~/mpdroot/build/config.sh"/>
</job>
The <command> tag with the line attribute is used to run a non-ROOT command.
Running a non-ROOT command on the NICA cluster
Job description. Non-ROOT command.
Local use
In local mode, MPD-scheduler can be used to parallelize event processing on a user's multicore machine.
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root"
        start_event="0" count_event="0"/>
  <file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root"
        start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
  <run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>
</job>
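Local mode can be thought of as a bounded pool of workers in which at most count tasks run at the same time. The sketch below uses Python's concurrent.futures as a stand-in; it is not MPD-scheduler's C++ implementation, and run_task, run_local and the task tuples are hypothetical. The six tasks mirror the job above under the assumption that the first file forms one task and the second file is split into five equal parts.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task):
    """Placeholder for processing one part of an input file.

    A real worker would launch the ROOT macro; here the task is only
    described so that the sketch stays runnable without ROOT.
    """
    input_file, first_event, event_count = task
    return f"{input_file}: events from {first_event}, count {event_count}"


def run_local(tasks, count):
    """Run tasks with at most `count` of them active at any time."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        # map() returns the results in the same order as `tasks`.
        return list(pool.map(run_task, tasks))


if __name__ == "__main__":
    # count_event="0" for the first file is copied from the job file as-is.
    tasks = [("evetest1.root", 0, 0)]
    tasks += [("evetest2.root", first, 200) for first in range(0, 1000, 200)]
    for line in run_local(tasks, count=6):
        print(line)
```

With count="6", one processor can serve the first file while five serve the parts of the second file, which is consistent with parallel_mode="5" in the example.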
MPD-scheduler on the NICA cluster
job_reco.xml:
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/evetest1.root" output="$VMCWORKDIR/mpddst1.root"/>
  <file input="$VMCWORKDIR/evetest2.root" output="$VMCWORKDIR/mpddst2.root"/>
  <file input="$VMCWORKDIR/evetest3.root" output="$VMCWORKDIR/mpddst3.root"/>
  <run mode="global" count="3" config="~/mpdroot/build/config.sh"/>
</job>
job_command.xml:
<job>
  <command line="get_mpd_production energy=5-9 "/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>
[Diagram: job_reco.xml and job_command.xml are passed to MPD-scheduler, which uses qsub to submit work to the SGE batch system, consisting of a Sun Grid Engine server and Sun Grid Engine workers with 10, 10 and 14 processors, marked free. The input files evetest1.root, evetest2.root and evetest3.root and the output files mpddst1.root, mpddst2.root and mpddst3.root (*.root) are shown on GlusterFS.]
The speedup of a single reconstruction on the NICA cluster
The description of the scheduling system on mpd.jinr.ru
Conclusions
The distributed NICA cluster (128 cores) was deployed on the basis of the LHEP farm for the NICA/MPD experiment (FairSoft, ROOT/PROOF, MPDRoot, GlusterFS, Sun Grid Engine).
The data storage (10 TB) was organized with the GlusterFS distributed file system: /nica/mpd[1-8].
MPD-scheduler, a system for distributed job execution, was developed to run MPDRoot macros concurrently on the cluster. It is based on the Sun Grid Engine scheduling system.
The web site mpd.jinr.ru presents the manual for the MPD scheduling system in the section Computing – NICA cluster – Batch processing.