The LEGO Train Framework
Andrei Gheata, Costin Grigoras, Jan Fiete Grosse-Oetringhaus
The LEGO Framework - Jan Fiete Grosse-Oetringhaus 2
Idea
• Manage trains using MonALISA
  – Users register wagons
  – Train operators compose trains
• Automatic testing per wagon
• Train file generation
• Submission managed by MonALISA (ML), using the existing LPM infrastructure
• Merging managed by LPM
• Aim: let operators run analysis trains easily (~weekly), with output available within 1-2 days
Configuration & Testing
• Train configuration: new class AliAnalysisTaskCfg
  – Contains the description of the wagons (AddTask macro, libraries, dependencies)
  – See the talk by Andrei on Monday
• Testing
  – Uses the alientest04 machine
  – Downloads the AliEn packages (ROOT, AliRoot)
  – Copies part of the input data set to the local machine
  – Runs the tests per wagon
  – Uses syswatch to extract memory/CPU information
  – Also tests an empty "baseline" task
[Figure: example train composed of the wagons Base line, Phys Sel, Centr Sel, User A, User B, User C]
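The wagon description and the empty baseline test can be illustrated with a minimal Python sketch. It assumes a wagon is a record holding the three items listed above (AddTask macro, libraries, dependencies); the class `Wagon` and the functions `wagon_order` and `overhead_vs_baseline` are hypothetical and are not the AliAnalysisTaskCfg interface. It shows two things such a test setup plausibly needs: ordering wagons so each one follows the wagons it depends on, and subtracting the baseline task's resource usage from each wagon's measurement.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Wagon:
    """Hypothetical wagon record (not the AliAnalysisTaskCfg API)."""
    name: str
    add_task_macro: str
    libraries: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)


def wagon_order(wagons: list[Wagon]) -> list[str]:
    """Return wagon names so that every wagon follows its dependencies.

    Raises ValueError for a dependency on an unknown wagon and
    graphlib.CycleError for circular dependencies.
    """
    names = {w.name for w in wagons}
    for w in wagons:
        missing = set(w.dependencies) - names
        if missing:
            raise ValueError(f"{w.name} depends on unknown wagons: {sorted(missing)}")
    graph = {w.name: set(w.dependencies) for w in wagons}
    return list(TopologicalSorter(graph).static_order())


def overhead_vs_baseline(measured: dict[str, float], baseline: float) -> dict[str, float]:
    """Subtract the empty baseline task's usage (e.g. memory in MB) from each
    wagon's measurement, leaving the part attributable to the wagon itself."""
    return {name: value - baseline for name, value in measured.items()}


if __name__ == "__main__":
    train = [
        Wagon("UserA", "AddTaskUserA.C", dependencies=["CentrSel"]),  # hypothetical macro names
        Wagon("PhysSel", "AddTaskPhysSel.C"),
        Wagon("CentrSel", "AddTaskCentrSel.C", dependencies=["PhysSel"]),
    ]
    print(wagon_order(train))  # ['PhysSel', 'CentrSel', 'UserA']
    print(overhead_vs_baseline({"UserA": 350.0}, baseline=300.0))  # {'UserA': 50.0}
```

Measuring an empty task is what makes the subtraction meaningful: whatever the framework itself costs shows up in the baseline, so the remainder is a fairer per-wagon figure.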
Workflow
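[Diagram: User, Train operator, MonALISA, Test machine, AliEn and LPM, exchanging config, test results and train files]
1. User adds wagons
2. Train operator composes the train
3. Test files are generated and the test is executed on the test machine
4. Train operator recomposes the train after the test
5. Train JDL + scripts are generated
6. Train runs on AliEn (submitted via LPM)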
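The six steps imply an order: a train is tested before its JDL is generated, and it can be recomposed after a test. A minimal Python sketch of that order as a state machine, assuming the states and transitions below (they are an illustration, not the MonALISA implementation) and assuming that a recomposed train has to be tested again:

```python
from enum import Enum, auto


class TrainState(Enum):
    """Hypothetical lifecycle states, one per stage of the workflow."""
    COMPOSING = auto()  # steps 1-2: users add wagons, operator composes the train
    TESTING = auto()    # step 3: test files generated and test executed
    TESTED = auto()     # step 4: results back, operator may recompose
    GENERATED = auto()  # step 5: train JDL + scripts generated
    RUNNING = auto()    # step 6: train runs


# Allowed transitions. Recomposing after a test sends the train back to
# COMPOSING, so it has to pass the test again before generation.
ALLOWED = {
    TrainState.COMPOSING: {TrainState.TESTING},
    TrainState.TESTING: {TrainState.TESTED},
    TrainState.TESTED: {TrainState.COMPOSING, TrainState.GENERATED},
    TrainState.GENERATED: {TrainState.RUNNING},
    TrainState.RUNNING: set(),
}


def advance(current: TrainState, target: TrainState) -> TrainState:
    """Return target if the transition is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {target.name}")
    return target
```

For example, COMPOSING → TESTING → TESTED → COMPOSING → TESTING → TESTED → GENERATED → RUNNING is accepted, while jumping from COMPOSING straight to GENERATED raises an error.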
Screenshot
[Screenshot of the train web interface, with callouts: handler configuration, wagon configuration, data configuration, testing and running status]
Operator Workflow
1. Select dataset
2. Select wagon
3. Start testing
4. Inspect output
Operator Workflow (2)
• Status of analysis
• Status of merging
• Intermediate merging steps
• Submit final merge job (to be automated)
• Final merging status
• Check output
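The split between intermediate merging steps and one final merge job can be illustrated with a staged-merge plan. This is a minimal Python sketch assuming a fixed fan-in per merge job (a hypothetical parameter); the actual LPM splitting may differ, and the `stage<k>_part<j>` names are made up for illustration.

```python
def merge_plan(inputs: list[str], fan_in: int) -> list[list[list[str]]]:
    """Plan a staged merge: each stage merges groups of at most fan_in files,
    and the last stage is a single group producing the final output.

    Returns a list of stages; each stage is a list of groups of file names.
    Outputs of stage k get hypothetical names 'stage<k>_part<j>'.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    stages: list[list[list[str]]] = []
    level = list(inputs)
    while len(level) > 1:
        # Split the current level into consecutive groups of up to fan_in files.
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        stages.append(groups)
        # Each group produces one output file that feeds the next stage.
        level = [f"stage{len(stages)}_part{j}" for j in range(len(groups))]
    return stages
```

With 10 inputs and a fan-in of 4, the first stage has groups of 4, 4 and 2 files, and the second stage is a single group of 3: that last single group plays the role of the final merge job, which the slide notes is still to be automated.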
Demo…
• Enough theory, let's do some clicking…
http://alimonitor.cern.ch/trains
Some More Details
• The train runs with an analysis tag
  – All code and the "AddTask" macro must be in the tag (no PAR file!)
• Output per run is stored in the input data directory (as for the AOD and QA trains), e.g.:
  /alice/data/2010/LHC10h/000137366/ESDs/pass2/PWG4/CorrelationTrain/7_20111117_1350
• Merged output for all runs is found in:
  /alice/cern.ch/user/a/alitrain/PWG4/CorrelationTrain/7_20111117_1350/merge
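The two example paths follow a visible pattern. The Python sketch below rebuilds them from their parts; it assumes the run number is zero-padded to nine digits (as in 000137366) and that 7_20111117_1350 encodes a train id followed by the start date and time. Both readings are inferred from this single example, and the function names are hypothetical.

```python
from datetime import datetime


def _train_tag(train_id: int, started: datetime) -> str:
    # Assumed meaning of '7_20111117_1350': <train id>_<YYYYMMDD>_<HHMM>
    return f"{train_id}_{started:%Y%m%d_%H%M}"


def run_output_dir(year: int, period: str, run: int, pass_name: str,
                   pwg: str, train: str, train_id: int, started: datetime) -> str:
    """Per-run output directory next to the input ESDs (pattern from the example)."""
    return (f"/alice/data/{year}/{period}/{run:09d}/ESDs/{pass_name}/"
            f"{pwg}/{train}/{_train_tag(train_id, started)}")


def merged_output_dir(pwg: str, train: str, train_id: int, started: datetime) -> str:
    """Directory holding the merged output of all runs (pattern from the example)."""
    return (f"/alice/cern.ch/user/a/alitrain/{pwg}/{train}/"
            f"{_train_tag(train_id, started)}/merge")


if __name__ == "__main__":
    t0 = datetime(2011, 11, 17, 13, 50)
    print(run_output_dir(2010, "LHC10h", 137366, "pass2", "PWG4", "CorrelationTrain", 7, t0))
    print(merged_output_dir("PWG4", "CorrelationTrain", 7, t0))
```

Sharing one tag helper keeps the per-run and merged directories of the same train run consistent.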
Operations
• After 10-12 h most jobs are done (~90-98%)
  – A few are still running, a few still waiting
  – This situation can persist for days, which is a killer for merging the output
  – Solutions
    • Kill jobs that have waited longer than X (being tested at the LPM level; better as a JDL tag)
    • Remove the CE requirement after a certain time (thanks to Latchezar for this idea); to be implemented
• Merge jobs have the same tail of a few jobs that wait a long time
  – Ideas: same as above, or run them on any CE (problem with splitting; Pablo is investigating)
• Output available after ~2 days
  – 25% (real time) spent running
  – 75% spent merging
  – I believe this can still be improved!
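The two proposed remedies for the waiting tail can be combined into one policy. A minimal Python sketch, assuming a waiting job first has its CE requirement dropped after `relax_after_h` hours and is killed after `kill_after_h` hours; the `Job` record, the thresholds and the action names are hypothetical, and the slide does not fix the value of X.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record."""
    job_id: int
    status: str           # "WAITING", "RUNNING", "DONE" or "ERROR"
    waiting_hours: float  # hours spent waiting so far (relevant for WAITING jobs)


def tail_actions(jobs: list[Job], relax_after_h: float, kill_after_h: float) -> dict[int, str]:
    """Map job id -> action for waiting jobs that have waited too long.

    Jobs waiting at least relax_after_h hours lose their CE requirement;
    jobs waiting at least kill_after_h hours are killed.
    """
    if relax_after_h >= kill_after_h:
        raise ValueError("relax_after_h must be smaller than kill_after_h")
    actions: dict[int, str] = {}
    for job in jobs:
        if job.status != "WAITING":
            continue  # running and finished jobs are left alone
        if job.waiting_hours >= kill_after_h:
            actions[job.job_id] = "kill"
        elif job.waiting_hours >= relax_after_h:
            actions[job.job_id] = "drop_ce_requirement"
    return actions
```

Relaxing before killing gives a stuck job a chance to run on another CE before its output is given up; the same function could be applied to the waiting merge jobs.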
Operations (visually…)
[Figure: three plots of job counts by state (Waiting, Running, Done, Error) vs. hours since submission: analysis jobs, merging jobs, analysis jobs. Annotations: "here we kill the remaining ones"; "80% done in 4 hours"]
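A figure such as "80% done in 4 hours" can be read off the completion times of the jobs. The Python sketch below assumes a list of completion times, in hours since submission, for the jobs that finished successfully; the function name and input format are hypothetical.

```python
from typing import Optional


def hours_to_percent_done(finish_hours: list[float], total_jobs: int,
                          percent: int) -> Optional[float]:
    """Hours since submission at which `percent` % of all jobs were done.

    finish_hours holds, for each job that finished successfully, its completion
    time in hours since submission. Returns None if the target was not reached.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Integer ceiling of percent% of total_jobs (avoids float rounding issues).
    needed = (percent * total_jobs + 99) // 100
    if needed == 0:
        return 0.0
    done = sorted(finish_hours)
    if needed > len(done):
        return None
    return done[needed - 1]
```

Using an integer percentage keeps the job count exact; with a float fraction, products like 0.7 * 10 round to slightly above 7 and would overshoot by one job.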
Current Trains
• Four active beta testers
  – Jets (Christian KB)
  – D2H (Zaida)
  – Correlations in pp (Eva)
  – Correlations in PbPb (JF)
• We received a lot of feedback and improved the system
TODO
• Graphs of CPU/wall-time/memory consumption of user tasks as a function of AliRoot tag
• Some improvements in the web interface
• Automatic launching of the final merge job
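The first TODO item needs per-tag aggregates of the memory/CPU measurements taken during testing. A minimal Python sketch, assuming each test result is a dict with hypothetical keys `wagon`, `tag` and one key per metric; plotting is left out.

```python
from collections import defaultdict
from statistics import mean


def per_tag_series(samples: list[dict], wagon: str, metric: str) -> dict[str, float]:
    """Average one metric (e.g. 'cpu', 'wall', 'mem') for one wagon per AliRoot tag.

    Tags keep the order in which they first appear in samples, so the caller
    controls the x-axis order (version-like tag strings do not sort reliably as text).
    """
    by_tag: dict[str, list[float]] = defaultdict(list)
    for s in samples:
        if s["wagon"] == wagon:
            by_tag[s["tag"]].append(s[metric])
    return {tag: mean(values) for tag, values in by_tag.items()}
```

Averaging repeated tests per tag smooths out run-to-run noise, so a jump in the series is more likely to reflect a change between tags than a single unlucky test.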
Documentation
• Mailing list (for operators)
  – [email protected]
• TWiki (users + operators)
  – https://twiki.cern.ch/twiki/bin/viewauth/ALICE/AnalysisTrains