Scheduling system for distributed MPD data processing


Transcript of Scheduling system for distributed MPD data processing

Page 1: Scheduling system for distributed MPD data processing


Scheduling system for distributed MPD data processing

Gertsenberger K. V.

Joint Institute for Nuclear Research, Dubna

Page 2: Scheduling system for distributed MPD data processing

NICA scheme


Page 3: Scheduling system for distributed MPD data processing

Multipurpose Detector (MPD)

The MPDRoot software is developed for MPD event simulation, reconstruction of experimental or simulated data, and the subsequent physics analysis of heavy-ion collisions registered by the MultiPurpose Detector at the NICA collider.


Page 4: Scheduling system for distributed MPD data processing

Development of the NICA cluster

Two main directions of the development:

data storage development for the experiment

organization of parallel processing of the MPD events

development and expansion of a distributed cluster for the MPD experiment based on the LHEP farm

Page 5: Scheduling system for distributed MPD data processing

Current NICA cluster in LHEP


Page 6: Scheduling system for distributed MPD data processing

Data storage on the NICA cluster


Distributed file system GlusterFS

aggregates existing file systems into a common distributed file system

automatic replication works as a background process

a background self-checking service restores corrupted files in case of hardware or software failure

Page 7: Scheduling system for distributed MPD data processing

Parallel MPD data processing

PROOF server – parallel data processing in ROOT macros on parallel architectures

concurrent data processing

MPD-scheduler – scheduling system for task distribution to parallelize data processing on the cluster nodes


Page 8: Scheduling system for distributed MPD data processing

MPD-scheduler is developed in C++ with support for ROOT classes.

SVN: mpdroot/macro/mpd_scheduler

Uses the Sun Grid Engine (SGE) scheduling system (qsub command) for execution in cluster mode.

SGE combines cluster machines at the LHEP farm (nc10, nc11 and nc13) into a pool of worker nodes with 34 logical processors.

Jobs for distributed execution on the NICA cluster are described and passed to MPD-scheduler as an XML file:

$ mpd-scheduler my_job.xml
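A minimal job description along these lines might contain just one macro, one file and one run element; the sketch below uses hypothetical file names and resource values, while the tags and attributes themselves are detailed on the following slides:

<!-- minimal sketch; file names and count are hypothetical -->
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest.root" output="$VMCWORKDIR/macro/mpd/mpddst.root"/>
  <run mode="global" count="10" config="~/mpdroot/build/config.sh"/>
</job>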


Page 9: Scheduling system for distributed MPD data processing


<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
  <file db_input="mpd.jinr.ru,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>

The job description starts and ends with the tag <job>.

Tag <macro> sets information about the macro being executed by MPDRoot:

name – file path of the ROOT macro to execute, required parameter

start_event – number of the first event to process for all input files, optional

count_event – number of events to process for all input files, optional

add_args – additional arguments of the ROOT macro, if required

Job description. Tag <macro>.


Page 10: Scheduling system for distributed MPD data processing


Tag <file> defines the files to be processed by the macro above:

input – input file path

output – result file path

start_event – number of the first event in the input file, optional

count_event – count of the events to process in the input file, optional

parallel_mode – number of processors used to parallelize event processing of the input file, optional

merge – whether to merge the resulting part files in parallel_mode, default: "true"

Job description. Tag <file>.
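For illustration, a single input file could be split across several processors and the partial results merged back; the sketch below mirrors the local-mode example on page 14 and uses hypothetical values:

<!-- hypothetical values: splits evetest1.root over 5 processors and merges the part files -->
<file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root" parallel_mode="5" merge="true"/>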

Page 11: Scheduling system for distributed MPD data processing


<job>
  <file db_input="mpd.jinr.ru, energy=3, gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
</job>

db_input – string defining a list of files from the MPD simulation database

mpd.jinr.ru – network address of the server hosting the simulation database, followed by selection parameters: range of the collision energy, type of the particle generator, particles of the collision, description and others.

The list of special variables of the "output" argument:

${counter} = file counter with start value and step equal to 1

${input} = input file path

${file_name} = name of the input file without extension

${file_name_with_ext} = name of the input file with extension

Processing event files from the MPD simulation database.
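As an illustration of these output variables, ${file_name} could be used to derive the result name from each input file; the paths below are hypothetical:

<!-- hypothetical paths: the result of evetest1.root would be written to ~/mpd_out/evetest1_dst.root -->
<file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="~/mpd_out/${file_name}_dst.root"/>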

Page 12: Scheduling system for distributed MPD data processing


Tag <run> describes run parameters and the allocated resources for the job:

mode – execution mode: "global" – distributed processing on the NICA cluster, "local" – multithreaded execution on a multicore computer

count – maximum number of processors allocated for this job

config – path to a bash file with environment variables (including ROOT environment variables) that is executed before the macro

logs – log file path for the multithreaded mode

Job description. Tag <run>.
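For comparison, a global (cluster) run and a local multithreaded run with a log file could be configured as follows; the values are hypothetical and mirror the full examples on pages 9 and 14:

<!-- hypothetical: distributed run on the NICA cluster -->
<run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
<!-- hypothetical: multithreaded run on a local machine, writing a log file -->
<run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>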

Page 13: Scheduling system for distributed MPD data processing


<job>
  <command line="get_mpd_prod energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>

Tag <command> with the argument line is used to run a non-ROOT command.

Running a non-ROOT command on the NICA cluster


Job description. Non-ROOT command.

Page 14: Scheduling system for distributed MPD data processing

Local use

MPD-scheduler can also be used to parallelize event processing on a user's multicore machine in local mode.

<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root"
        start_event="0" count_event="0"/>
  <file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root"
        start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
  <run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>
</job>

Page 15: Scheduling system for distributed MPD data processing

MPD-scheduler on the NICA cluster

job_reco.xml:
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/evetest1.root" output="$VMCWORKDIR/mpddst1.root"/>
  <file input="$VMCWORKDIR/evetest2.root" output="$VMCWORKDIR/mpddst2.root"/>
  <file input="$VMCWORKDIR/evetest3.root" output="$VMCWORKDIR/mpddst3.root"/>
  <run mode="global" count="3" config="~/mpdroot/build/config.sh"/>
</job>

job_command.xml:
<job>
  <command line="get_mpd_production energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>

[Diagram: MPD-scheduler submits the jobs to the SGE (Sun Grid Engine) batch system via qsub; the SGE server distributes the tasks to free SGE worker nodes; the evetest*.root input files are read from GlusterFS storage and the resulting mpddst*.root files are written back to it.]

Page 16: Scheduling system for distributed MPD data processing

The speedup of a single reconstruction on the NICA cluster

Page 17: Scheduling system for distributed MPD data processing

The description of the scheduling system on mpd.jinr.ru


Page 18: Scheduling system for distributed MPD data processing

Conclusions

The distributed NICA cluster was deployed based on the LHEP farm for the NICA/MPD experiment (FairSoft, ROOT/PROOF, MPDRoot, GlusterFS, Sun Grid Engine): 128 cores.

The data storage was organized with the GlusterFS distributed file system (/nica/mpd[1-8]): 10 TB.

The system for distributed job execution, MPD-scheduler, was developed to run MPDRoot macros concurrently on the cluster. It is based on the Sun Grid Engine scheduling system.

The web site mpd.jinr.ru, section Computing – NICA cluster – Batch processing, presents the manual for the developed MPD scheduling system.