Scheduling system for distributed MPD data processing


Transcript of Scheduling system for distributed MPD data processing

Page 1: Scheduling system for distributed MPD data processing


Scheduling system for distributed MPD data processing

Gertsenberger K. V.

Joint Institute for Nuclear Research, Dubna

Page 2: Scheduling system for distributed MPD data processing

NICA scheme


Page 3: Scheduling system for distributed MPD data processing

Multipurpose Detector (MPD)

The MPDRoot software is developed for MPD event simulation, reconstruction of experimental or simulated data, and the subsequent physics analysis of heavy-ion collisions registered by the MultiPurpose Detector at the NICA collider.


Page 4: Scheduling system for distributed MPD data processing

Development of the NICA cluster

Two main directions of the development:

data storage development for the experiment

organization of parallel processing of the MPD events

development and expansion of a distributed cluster for the MPD experiment based on the LHEP farm

Page 5: Scheduling system for distributed MPD data processing

Current NICA cluster in LHEP


Page 6: Scheduling system for distributed MPD data processing

Data storage on the NICA cluster


Distributed file system GlusterFS

aggregates existing file systems into a common distributed file system

automatic replication works as a background process

a background self-checking service restores corrupted files in case of hardware or software failure

Page 7: Scheduling system for distributed MPD data processing

Parallel MPD data processing

PROOF server – parallel data processing in ROOT macros on parallel architectures

concurrent data processing

MPD-scheduler – scheduling system for task distribution to parallelize data processing on the cluster nodes


Page 8: Scheduling system for distributed MPD data processing

MPD-scheduler is developed in C++ with support for ROOT classes.

SVN: mpdroot/macro/mpd_scheduler

Uses the Sun Grid Engine (SGE) scheduling system (qsub command) for execution in cluster mode.

SGE combines cluster machines at the LHEP farm (nc10, nc11 and nc13) into a pool of worker nodes with 34 logical processors.

Jobs for distributed execution on the NICA cluster are described and passed to MPD-scheduler as an XML file:

$ mpd-scheduler my_job.xml
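A minimal job description along these lines might contain just one macro, one file and one run element; the sketch below uses hypothetical file names and resource values, while the tags and attributes themselves are detailed on the following slides:

<!-- minimal sketch; file names and count are hypothetical -->
<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest.root" output="$VMCWORKDIR/macro/mpd/mpddst.root"/>
  <run mode="global" count="10" config="~/mpdroot/build/config.sh"/>
</job>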


Page 9: Scheduling system for distributed MPD data processing


<job>
  <macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
  <file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
  <file db_input="mpd.jinr.ru,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
  <run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
</job>

The job description starts and ends with the tag <job>.

Tag <macro> sets information about the macro being executed by MPDRoot:

name – file path of the ROOT macro to execute, required parameter

start_event – number of the first event to process for all input files, optional

count_event – number of events to process for all input files, optional

add_args – additional arguments of the ROOT macro, if required

Job description. Tag <macro>.


Page 10: Scheduling system for distributed MPD data processing


Tag <file> defines the files to be processed by the macro above:

input – input file path

output – result file path

start_event – number of the first event in the input file, optional

count_event – count of the events to process in the input file, optional

parallel_mode – number of processors used to parallelize event processing of the input file, optional

merge – whether to merge the resulting part files in parallel_mode, default: "true"

Job description. Tag <file>.
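For illustration, a single input file could be split across several processors and the partial results merged back; the sketch below mirrors the local-mode example on page 14 and uses hypothetical values:

<!-- hypothetical values: splits evetest1.root over 5 processors and merges the part files -->
<file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root" parallel_mode="5" merge="true"/>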

Page 11: Scheduling system for distributed MPD data processing


<job>
  <file db_input="mpd.jinr.ru, energy=3, gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
</job>

db_input – string defining a list of files from the MPD simulation database

mpd.jinr.ru – network address of the server hosting the simulation database, followed by selection parameters: range of the collision energy, type of the particle generator, particles of the collision, description and others.

The list of special variables of the "output" argument:

${counter} = file counter with start value and step equal to 1

${input} = input file path

${file_name} = name of the input file without extension

${file_name_with_ext} = name of the input file with extension

Processing event files from the MPD simulation database.
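As an illustration of these output variables, ${file_name} could be used to derive the result name from each input file; the paths below are hypothetical:

<!-- hypothetical paths: the result of evetest1.root would be written to ~/mpd_out/evetest1_dst.root -->
<file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="~/mpd_out/${file_name}_dst.root"/>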

Page 12: Scheduling system for distributed MPD data processing


Tag <run> describes run parameters and the allocated resources for the job:

mode – execution mode: "global" – distributed processing on the NICA cluster, "local" – multithreaded execution on a multicore computer

count – maximum number of processors allocated for this job

config – path to a bash file with environment variables (including ROOT environment variables) that is executed before the macro

logs – log file path for the multithreaded mode

Job description. Tag <run>.
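For comparison, a global (cluster) run and a local multithreaded run with a log file could be configured as follows; the values are hypothetical and mirror the full examples on pages 9 and 14:

<!-- hypothetical: distributed run on the NICA cluster -->
<run mode="global" count="25" config="~/mpdroot/build/config.sh"/>
<!-- hypothetical: multithreaded run on a local machine, writing a log file -->
<run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>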

Page 13: Scheduling system for distributed MPD data processing


<job>
  <command line="get_mpd_prod energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>

Tag <command> with the argument line is used to run a non-ROOT command.

Running a non-ROOT command on the NICA cluster


Job description. Non-ROOT command.

Page 14: Scheduling system for distributed MPD data processing

Local use

MPD-scheduler can also be used to parallelize event processing on a user's multicore machine in local mode.

<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="~/mpdroot/macro/mpd/evetest1.root" output="~/mpdroot/macro/mpd/mpddst1.root"
        start_event="0" count_event="0"/>
  <file input="~/mpdroot/macro/mpd/evetest2.root" output="~/mpdroot/macro/mpd/mpddst2.root"
        start_event="0" count_event="1000" parallel_mode="5" merge="true"/>
  <run mode="local" count="6" config="~/mpdroot/build/config.sh" logs="processing.log"/>
</job>

Page 15: Scheduling system for distributed MPD data processing

MPD-scheduler on the NICA cluster

job_reco.xml:
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/evetest1.root" output="$VMCWORKDIR/mpddst1.root"/>
  <file input="$VMCWORKDIR/evetest2.root" output="$VMCWORKDIR/mpddst2.root"/>
  <file input="$VMCWORKDIR/evetest3.root" output="$VMCWORKDIR/mpddst3.root"/>
  <run mode="global" count="3" config="~/mpdroot/build/config.sh"/>
</job>

job_command.xml:
<job>
  <command line="get_mpd_production energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>

[Diagram: MPD-scheduler submits the jobs to the SGE (Sun Grid Engine) batch system via qsub; the SGE server distributes the tasks to free SGE worker nodes; the evetest*.root input files are read from GlusterFS storage and the resulting mpddst*.root files are written back to it.]

Page 16: Scheduling system for distributed MPD data processing

The speedup of a single reconstruction on the NICA cluster

Page 17: Scheduling system for distributed MPD data processing

The description of the scheduling system on mpd.jinr.ru


Page 18: Scheduling system for distributed MPD data processing

Conclusions

The distributed NICA cluster was deployed based on the LHEP farm for the NICA/MPD experiment (FairSoft, ROOT/PROOF, MPDRoot, GlusterFS, Sun Grid Engine): 128 cores.

The data storage was organized with the GlusterFS distributed file system (/nica/mpd[1-8]): 10 TB.

The system for distributed job execution, MPD-scheduler, was developed to run MPDRoot macros concurrently on the cluster. It is based on the Sun Grid Engine scheduling system.

The web site mpd.jinr.ru, section Computing – NICA cluster – Batch processing, presents the manual for the developed MPD scheduling system.