Introduction to Ganga Karl Harrison (University of Cambridge) ATLAS Distributed Analysis Tutorial Milano, 5-6 February 2007 http://cern.ch/ganga


Introduction to Ganga

Karl Harrison (University of Cambridge)

ATLAS Distributed Analysis Tutorial, Milano, 5-6 February 2007

http://cern.ch/ganga


Ganga basics

• Ganga is an easy-to-use frontend for job definition and management
  – Allows simple switching between testing on a local batch system and large-scale data processing on distributed resources (Grid)
  – Developed in the context of ATLAS and LHCb
    • For ATLAS, has built-in support for applications based on the Athena framework, for JobTransforms, and for the DQ2 data-management system
  – Component architecture readily allows extension
  – Implemented in Python
• Strong development team, meaning strong user support
  – F.Brochu (Cambridge), U.Egede (Imperial), J.Elmsheuser (München), K.Harrison (Cambridge), H.C.Lee (ASCC), D.Liko (CERN), A.Maier (CERN), J.T.Moscicki (CERN), A.Muraru (Bucharest), V.Romanovsky (IHEP), A.Soroko (Oxford), C.L.Tan (Birmingham)
• Contributions past and present from many others


Ganga job abstraction

• A job in Ganga is constructed from a set of building blocks, not all required for every job

[Diagram: a Job and its building blocks]
  – Application: what to run
  – Backend: where to run
  – Input Dataset: data read by application
  – Output Dataset: data written by application
  – Splitter: rule for dividing into subjobs
  – Merger: rule for combining outputs
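The building blocks listed on this slide can be pictured as fields of a job record, most of them optional. The following is a minimal sketch in plain Python and does not use Ganga's own classes: `JobSketch`, its field names and the string values are hypothetical, chosen only to mirror the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSketch:
    """Hypothetical stand-in for a job record; not Ganga's Job class."""
    application: str                   # what to run
    backend: str                       # where to run
    inputdata: Optional[str] = None    # data read by application
    outputdata: Optional[str] = None   # data written by application
    splitter: Optional[str] = None     # rule for dividing into subjobs
    merger: Optional[str] = None       # rule for combining outputs

    def blocks_in_use(self):
        """Return the names of the building blocks that have been set."""
        return [name for name, value in vars(self).items() if value is not None]


# In this sketch a simple job sets only an application and a backend.
simple = JobSketch(application="Executable", backend="Local")
print(simple.blocks_in_use())
```

Leaving the other four fields as `None` is one way to express "not all required for every job"; how Ganga itself represents unset blocks is not stated on the slide.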


Framework for plugin handling

• Ganga provides a framework for handling different types of Application, Backend, Dataset, Splitter and Merger, implemented as plugin classes
• Each plugin class has its own schema

[Diagram: GangaObject at the top; below it the plugin interfaces IApplication, IBackend, IDataset, ISplitter and IMerger; below those the example plugins and schemas, with attributes labelled User or System]
  – Athena: atlas_release, max_events, options, option_file, user_setupfile, user_area
  – LCG: CE, requirements, jobtype, middleware, id, status, reason, actualCE, exitcode
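As a rough illustration of plugin classes that each carry their own schema, the sketch below uses plain Python. The class names ending in `Sketch` are hypothetical and this is not Ganga's implementation; the attribute names are taken from the slide, while the default values are placeholders.

```python
class GangaObjectSketch:
    """Hypothetical base class; each subclass declares its own schema."""
    schema = {}  # attribute name -> default value

    def __init__(self, **values):
        unknown = set(values) - set(self.schema)
        if unknown:
            raise AttributeError("not in schema: %s" % sorted(unknown))
        # Start from the schema defaults, then apply the supplied values.
        for name, default in self.schema.items():
            setattr(self, name, values.get(name, default))


class IApplicationSketch(GangaObjectSketch):
    """Hypothetical interface for application plugins."""


class IBackendSketch(GangaObjectSketch):
    """Hypothetical interface for backend plugins."""


class AthenaSketch(IApplicationSketch):
    # Attribute names from the slide; defaults are placeholders.
    schema = {"atlas_release": "", "max_events": None, "options": "",
              "option_file": "", "user_setupfile": "", "user_area": ""}


class LCGSketch(IBackendSketch):
    # Attribute names from the slide; defaults are placeholders.
    schema = {"CE": "", "requirements": "", "jobtype": "", "middleware": "",
              "id": None, "status": None, "reason": None,
              "actualCE": None, "exitcode": None}


app = AthenaSketch(atlas_release="12.0.4", max_events=100)
print(app.atlas_release, app.max_events)
```

Rejecting names that are not in the schema is a design choice of this sketch: it catches misspelt attributes early, at the cost of making ad hoc attributes impossible.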


Applications and Backends

• Running of a particular Application on a given Backend is enabled by implementing an appropriate adapter component or Runtime Handler
  – Can often use the same Runtime Handler for several Backends: less coding

[Diagram: applications paired with backends, each pairing marked as Implemented or Coming soon]
  – Backends: PBS, OSG, NorduGrid, Local, LSF, PANDA (US-ATLAS WMS), LHCb WMS
  – Experiment-neutral application: Executable
  – ATLAS applications: Athena (Simulation/Digitisation/Reconstruction/Analysis), AthenaMC (Production)
  – LHCb applications: Gauss/Boole/Brunel/DaVinci (Simulation/Digitisation/Reconstruction/Analysis)
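One way to picture the Runtime Handler idea is as a lookup keyed on the (application, backend) pairing, where a single handler may be registered for several backends. The sketch below is plain Python under that assumption; `register_handler`, `prepare` and `batch_handler` are hypothetical names, not Ganga functions, and the grouping of Local, LSF and PBS under one handler is only an illustration.

```python
# Hypothetical registry: (application name, backend name) -> handler function.
_handlers = {}


def register_handler(application, backends, handler):
    """Associate one handler with an application on one or more backends."""
    for backend in backends:
        _handlers[(application, backend)] = handler


def prepare(application, backend):
    """Look up the handler for this pairing and run it."""
    try:
        handler = _handlers[(application, backend)]
    except KeyError:
        raise LookupError("no handler for %s on %s" % (application, backend))
    return handler(application, backend)


def batch_handler(application, backend):
    # Placeholder for the work of adapting an application to a backend.
    return "%s prepared for %s" % (application, backend)


# One handler serves several backends, so it is written only once.
register_handler("Executable", ["Local", "LSF", "PBS"], batch_handler)
print(prepare("Executable", "LSF"))
```

A pairing with no registered handler raises `LookupError`, which corresponds to an Application that cannot yet run on that Backend.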


Ganga: how the pieces fit together

• Ganga has built-in support for ATLAS and LHCb
• Component architecture allows customisation for other user groups

[Diagram components, grouped around the user interface for job definition and management and the Ganga monitoring loop]
  – Applications: LHCb applications, ATLAS applications, other applications
  – Processing systems (backends): experiment-specific workload-management systems, local batch systems, distributed (Grid) systems
  – Data storage and retrieval: metadata catalogues, file catalogues, tools for data management
  – Ganga job archives: local repository, remote repository


Help with using Ganga

• Ganga documentation can be found in the User Guides section of the Ganga web site: http://cern.ch/ganga/
  – Most relevant items are:
    • Installation
    • Working with Ganga - general introduction to functionality
    • GUI manual - introduction to graphical interface
    • Link to ATLAS Wiki page for distributed analysis using Ganga
      – https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427
      – Today’s tutorial borrows heavily from this
• For problems or feature requests, do any of the following:
  – Use the hypernews forum for Ganga users and developers: https://hypernews.cern.ch/HyperNews/Atlas/get/GANGAUserDeveloper.html
  – Send e-mail to [email protected]
  – Submit a report via Ganga’s bug-submission page in Savannah: https://savannah.cern.ch/bugs/?func=additem&group=ganga
    • You should either log in to Savannah first, or give an e-mail address


Setting up to run Athena jobs with Ganga

• Setup sequence for an account on lxplus is as follows
  – Ensure that you have a Grid certificate installed, and that you are registered with the ATLAS Virtual Organisation
  – Set up the environment for Athena, then check out and build the UserAnalysis package
  – Set up the environment for using LCG client tools
  – Set up the environment for using DQ2
  – Set up the environment for using Ganga
• Can work from an account on a different machine, but this implies installing Ganga, an ATLAS release, LCG client and DQ2 tools
  – Not difficult on a supported platform, but takes time
• Detailed setup instructions given as part of hands-on exercises

source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh

Optional, but sometimes useful


Using Ganga

• Ganga allows users to work in a variety of ways
• Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython)
  – Especially good for trying things out, and seeing how the system works
• Scripts, which may contain any Python/IPython or CLIP commands, allow automation of repetitive tasks
• Scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system
• Graphical User Interface (GUI) allows job management based on mouse selections and field completion
  – Lots of configuration possibilities
• CLIP and scripts covered now
• GUI dealt with in separate session


Ganga startup and configuration files

• A Ganga CLIP session is started by giving the command:

    ganga

  – If the user doesn’t have a valid proxy then his/her Grid passphrase is requested
• When Ganga is first run, a configuration file .gangarc is created in the user’s home directory
  – The file includes comments on the configuration possibilities
  – The latest default configuration file can always be obtained with:

    ganga -g

• Before processing .gangarc, Ganga processes, in the order they are specified, any configuration files pointed to by the environment variable GANGA_CONFIG_PATH
  – This makes possible the use of group configuration files, but allows settings to be overridden on a user-by-user basis
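The precedence rule for configuration files (group files first, in the order given, then the user's own file last) can be sketched with Python's standard `configparser`. This is an illustration rather than Ganga's own loader: it assumes INI-style files, and the section name `Demo`, the option names and the file names are hypothetical.

```python
import configparser
import os
import tempfile


def load_config(group_paths, user_path):
    """Read group files in the given order, then the user file last."""
    parser = configparser.ConfigParser()
    # read() skips missing files; values in later files override earlier ones.
    parser.read(list(group_paths) + [user_path])
    return parser


# Demonstration with temporary files standing in for real configuration.
with tempfile.TemporaryDirectory() as tmp:
    group_file = os.path.join(tmp, "group.ini")
    user_file = os.path.join(tmp, "gangarc")
    with open(group_file, "w") as f:
        f.write("[Demo]\nbackend = LSF\nqueue = short\n")
    with open(user_file, "w") as f:
        f.write("[Demo]\nbackend = LCG\n")
    config = load_config([group_file], user_file)
    backend = config["Demo"]["backend"]  # set in both files: user value wins
    queue = config["Demo"]["queue"]      # set only in the group file: kept
    print(backend, queue)
```

Reading the user file last is what lets a group supply shared defaults while each user overrides only the settings they care about.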


Ganga workspace

• Ganga creates a directory gangadir in your home directory and uses this for storing job-related files and information
  – You can’t move this directory, but before running Ganga, you can create ~/gangadir as a link to another location

[Diagram: directory tree rooted at gangadir, with entries repository, workspace, gui, <username>, Local, Remote, jobs, templates, input and output, and the numbered entries 66 and 67]


Python commands

• Ganga is developed in Python, making use of IPython extensions

• All Python/IPython commands can be used at the prompt in a Ganga CLIP session, and the syntax for CLIP and Python commands is the same

• Information about Python can be found at: http://www.python.org/
  – If you’re new to Python, the on-line tutorial is extremely helpful
• The following are often useful

    # A hash (#) marks the start of a comment
    # A backslash (\) at the end of a line indicates that
    # the following line is a continuation
    dir()          # List currently available objects
    help()         # Give help
    help( item )   # Give help on specified item
    x = 5          # Assign value to variable
    print x        # Print value of variable
    ctrl-D         # Exit from session


IPython commands

• Information about IPython extensions can be found at: http://ipython.scipy.org/
• One useful extension is the possibility to use shell commands from Python, together with both shell variables and Python variables

    # Use ! before shell commands
    # Use $ before Python variables
    # Use $$ before shell variables
    here = 'where the heart is'
    !echo $$HOME is $here
    !ls $$HOME/mySubdir
    !emacs   # Start emacs session, but don't try adding &
    Exit     # Exit from session


Ganga CLIP commands (1)

• Ganga commands are explained in the guide Working with Ganga: http://cern.ch/ganga/user/html/GangaIntroduction
• From a CLIP session, available classes, objects and functions may be listed, and help can be requested for each
• Useful commands include the following

    plugins( 'type' )             # List plugins of specified type:
                                  # 'applications', 'backends', etc
    j1 = Job( backend = LSF() )   # Create a new job for LSF
    a1 = Executable()             # Create Executable application
    j1.application = a1           # Set value for job's application
    j1.backend = LCG()            # Change job's backend to LCG
    export( j1, 'myJob.py' )      # Write job to specified file
    load( 'myJob.py' )            # Load job(s) from specified file
    j2 = j1.copy()                # Create j2 as a copy of job j1
    jobs                          # List jobs
    jobs[ i ].subjobs             # List subjobs for split job i


Ganga CLIP commands (2)

• When a job j has been defined, the following methods can be used

    j.submit()   # Submit the job
    j.kill()     # Kill the job (if running)
    j.remove()   # Kill the job and delete associated files
    j.peek()     # List files in job's output directory

• Once a job has been submitted, it can no longer be modified, and it cannot be resubmitted, but the job can be copied and the copy can be modified/submitted
• Ganga supports use of templates, which can be used as the basis of a job definition

    t = JobTemplate()            # Create template
    templates                    # List templates
    j3 = Job( templates[ i ] )   # Create job from template i
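The rule that a submitted job is frozen, while a copy of it is not, can be modelled in a few lines of plain Python. This is only a sketch of the behaviour described on this slide: `FrozenJobSketch` and its status strings are hypothetical and it does not reproduce Ganga's Job class.

```python
class FrozenJobSketch:
    """Hypothetical model of the submit-then-freeze rule; not Ganga's Job."""

    def __init__(self, application="Executable", backend="Local"):
        # Set the status directly so that the checks below can rely on it.
        object.__setattr__(self, "status", "new")
        self.application = application
        self.backend = backend

    def __setattr__(self, name, value):
        if self.status != "new":
            raise AttributeError("job already submitted; copy it instead")
        object.__setattr__(self, name, value)

    def submit(self):
        if self.status != "new":
            raise RuntimeError("job cannot be resubmitted")
        object.__setattr__(self, "status", "submitted")

    def copy(self):
        """Return a fresh, modifiable job with the same settings."""
        return FrozenJobSketch(self.application, self.backend)


j = FrozenJobSketch()
j.submit()
j2 = j.copy()        # the copy starts in the "new" state
j2.backend = "LCG"   # so it can still be modified
j2.submit()          # and then submitted in its own right
print(j.status, j.backend, j2.status, j2.backend)
```

Keeping the original untouched and working on a copy means the record of what was actually submitted is never overwritten.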


CLIP: “Hello World” example

• From a Ganga CLIP session, a job that writes “Hello World” can be created, and then submitted to LCG, as follows

    app = Executable()
    app.exe = '/bin/echo'
    app.env = {}
    app.args = [ 'Hello World' ]
    # Property values set above are in fact the defaults
    # for Executable application
    j = Job( application = app, backend = LCG() )
    j.submit()
    # Check on job progress
    jobs
    # When job has completed, check the output
    j.peek( 'stdout' )


Using Ganga commands from a Linux shell

• Ganga includes scripts that can be used from a Linux shell (i.e. outside of CLIP)

    # Create a job for submitting Executable to LCG
    ganga make_job Executable LCG test.py
    [ Edit test.py to set Executable and/or LCG properties ]
    # Submit job
    ganga submit test.py
    # Query status, triggering output retrieval if job is completed
    ganga query

• Given job name or id as returned by query, also have possibilities such as

    # Kill job
    ganga kill id
    # Remove job from Ganga repository and workspace
    ganga remove id

• For ATLAS users, Ganga also includes an athena script for running Athena jobs
• Same syntax can be used from inside CLIP, with no overheads for startup


Hands-on exercises

• Set of 11 exercises attached to course agenda
  – You should aim to complete at least the first 8
• Exercises 1-5: deal with setup for using Ganga with ATLAS release 12.0.4
• Exercises 5-8: introduce basic CLIP functionality
• Exercise 9: shows how to use Ganga for job submission from the Linux shell
• Exercises 10-11: provide minimal examples of how to use Python and IPython
  – Can be left for later if you don’t have time now, or can be omitted altogether if you’re already an expert