The Pipeline Processing Framework LSST Applications Meeting IPAC Feb. 19, 2008 Raymond Plante...
The Pipeline Processing Framework
LSST Applications Meeting, IPAC
Feb. 19, 2008
Raymond Plante
National Center for Supercomputing Applications
LSST Applications Meeting
February 19-20, 2008
Overview
• Pipeline Framework provides
  – a container for hosting science algorithms
  – a mechanism for applying an algorithm in parallel
• Data-Parallel Processing Model
  – algorithm implemented as a “stage” of the pipeline
  – a stage can have optional serial sections
  – parallel section applied to one data-parallel unit of data
    • one CCD amplifier
    • one section of sky
  – algorithm implementation usually avoids doing I/O
    • I/O handled in separate steps
    • stage is handed the data it is supposed to work on
    • exception: database access
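As a rough illustration of the data-parallel model, the sketch below applies one algorithm independently to each unit of data and keeps loading outside the algorithm. It uses Python's standard thread pool purely as a stand-in for the framework's parallel machinery. The names `load_units`, `subtract_bias`, and `apply_in_parallel` are hypothetical and not part of the framework.

```python
from concurrent.futures import ThreadPoolExecutor


def load_units():
    """Stand-in for a separate I/O step: one pixel list per amplifier (made-up data)."""
    return {"amp0": [10, 12, 11], "amp1": [20, 22, 21]}


def subtract_bias(pixels, bias=10):
    """Hypothetical science algorithm: works only on the data it is handed."""
    return [p - bias for p in pixels]


def apply_in_parallel(algorithm, units, max_workers=2):
    """Apply the algorithm to every data-parallel unit independently."""
    names = list(units)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with names.
        results = list(pool.map(algorithm, (units[n] for n in names)))
    return dict(zip(names, results))
```

Calling `apply_in_parallel(subtract_bias, load_units())` returns one result per amplifier. The algorithm itself never opens a file, which is the separation the slide describes.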
Pipeline Concepts
• Pipeline = a sequence of processing Stages
• Each stage can be distributed across multiple processors.
  – Each stage starts and ends with synchronized serial steps
• Slice = parts of the stages working on the same portion of data.
  – Can reside in one address space on a single machine
[Figure: a Pipeline (serial processing) and Slices (parallel processing), each shown as Stages connected by Queues.]

DC2 Pipeline Harness
[Figure: one Pipeline above several Slices, each a chain of Stages connected by Queues.]
• Pipeline Process
  – executes serial processing
  – controls the parallel slice workers
• Slice Worker Processes
  – each processes one data-parallel portion of the data (e.g. a CCD)
Pipeline Execution
• Pipeline Harness manages parallel processing on HPC platforms
  – Message Passing Interface
    • MPI-2 functionality via MPICH2
    • Explicit process spawning, control
  – Coordination of Serial & Parallel Processing
    • Pipeline is a sequence of Stages
    • “Slices” serve as data parallel worker threads
    • Pipeline manager instructs Slices in execution of Stages
    • Pipeline <=> Slices communicate via MPI
• Pipeline Harness interface hides complexity
  – Application Stage developers implement Stage API
    • process(): parallel processing
    • preprocess(), postprocess(): serial processing
  – Python as Stage “glue”
    • Stage developer writes algorithm code in C++
    • Python interface is generated
    • Stitches algorithm code together to create a Stage using Python
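The split between serial and parallel hooks can be sketched as below. Only the method names `preprocess()`, `process()`, and `postprocess()` come from the slide. The argument lists, the `Stage` base class shown here, the `MeanStage` example, and the `run_stage` driver are all assumptions made for illustration, and the driver runs the slices one after another rather than through MPI.

```python
class Stage:
    """Hypothetical base class with the three hooks named on the slide."""

    def preprocess(self, shared):
        """Serial step, run once before the parallel section."""

    def process(self, slice_data, shared):
        """Parallel step, called once per slice with that slice's data."""
        return slice_data

    def postprocess(self, results, shared):
        """Serial step, run once after every slice has finished."""
        return results


class MeanStage(Stage):
    """Toy stage: bias-subtracted partial sums per slice, one global mean."""

    def preprocess(self, shared):
        shared["bias"] = 1  # serial setup visible to every slice

    def process(self, slice_data, shared):
        corrected = [v - shared["bias"] for v in slice_data]
        return sum(corrected), len(corrected)

    def postprocess(self, results, shared):
        total = sum(s for s, _ in results)
        count = sum(n for _, n in results)
        return total / count


def run_stage(stage, slices):
    """Sequential stand-in for the harness (assumed; the real one uses MPI)."""
    shared = {}
    stage.preprocess(shared)
    results = [stage.process(data, shared) for data in slices]
    return stage.postprocess(results, shared)
```

Here `run_stage(MeanStage(), [[1, 2, 3], [4, 5]])` treats each inner list as one slice's share of the data. The stage author writes only the three hooks, and the driver decides where and when they run.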
Pipeline Dataflow
• Data flows through Stages via Queues
• A stage can add data products to its output Queue.
• Products can be persisted at any point in the chain.
[Figure: a Pipeline Manager and a Pipeline of Stages connected by Queues, with “New Input Data” entering and “Output Products” leaving.]
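The dataflow described above can be mimicked with ordinary in-memory queues. In this sketch each stage is a plain function that adds products to a shared dictionary before it moves to the next queue. The function names and the dictionary keys are hypothetical, and the framework's own queue and data-product classes are not shown.

```python
from collections import deque


def run_pipeline(stages, new_input):
    """Pass one dict of data products through a chain of stages and queues."""
    queues = [deque() for _ in range(len(stages) + 1)]
    queues[0].append(dict(new_input))
    for i, stage in enumerate(stages):
        products = queues[i].popleft()   # take data from the input queue
        stage(products)                  # the stage may add new products
        queues[i + 1].append(products)   # hand everything downstream
    return queues[-1].popleft()


def detect(products):
    """Hypothetical stage: keep pixel values above a fixed threshold."""
    products["detections"] = [p for p in products["pixels"] if p > 5]


def count(products):
    """Hypothetical stage: summarize what the previous stage produced."""
    products["n_detections"] = len(products["detections"])
```

Running `run_pipeline([detect, count], {"pixels": [1, 7, 9]})` yields the original input plus the products each stage added, which is the point at which a real pipeline could choose to persist them.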
Coupling Pipelines via the Event Framework
[Figure: three pipelines, each with its own Pipeline Manager and a chain of Stages connected by Queues: the Image/Detection Pipeline, the Object Association Pipeline, and the Moving Objects Pipeline. An Event System connects them, carrying the events “New Detections available” and “New Moving Object Candidates Available”.]
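Coupling through events can be sketched with a small publish/subscribe object. This is an in-process toy, not the LSST event framework: the `EventSystem` class and `couple_pipelines` function are hypothetical, and the sketch assumes the Object Association pipeline reacts to “New Detections available” and in turn announces “New Moving Object Candidates Available”.

```python
from collections import defaultdict


class EventSystem:
    """Hypothetical in-process stand-in for the event framework."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        for callback in list(self._subscribers[topic]):
            callback(payload)


def couple_pipelines(events, log):
    """Wire toy pipelines together purely through events (assumed wiring)."""

    def association(detections):
        log.append(("association", detections))
        # Made-up rule: names starting with "mov" count as moving candidates.
        candidates = [d for d in detections if d.startswith("mov")]
        events.publish("New Moving Object Candidates Available", candidates)

    def moving_objects(candidates):
        log.append(("moving_objects", candidates))

    events.subscribe("New Detections available", association)
    events.subscribe("New Moving Object Candidates Available", moving_objects)
```

After `couple_pipelines(events, log)`, a single `events.publish("New Detections available", [...])` call triggers both downstream pipelines in turn. Neither pipeline holds a reference to the other; they share only topic names.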
Tools for Stage Implementations
• Configuring a stage with Policies
  – Policy: a set of data properties as name-value pairs
  – Provided to stage implementation when stage is configured
• Recording messages: Logging
  – Messages have an associated “loudness”
    • “DEBUG” = soft; “WARN” = louder
  – Messages sent to a named topic
    • topics have an associated loudness threshold
    • messages louder than the threshold will be recorded
  – Messages can have data properties associated with them
    • all messages automatically timestamped
      – can be used to time sub-portions of implementation
    • caller can attach other arbitrary properties
  – Framework handles destination of messages
    • outside of pipeline harness, messages printed to screen
    • inside a parallel pipeline, messages sent out through event system, recorded in database
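Python's standard `logging` module follows a similar pattern and can serve as an analogy: loggers are named, each has a level threshold, every record is timestamped, and callers can attach extra properties. The sketch below is not the LSST logging package. The policy dictionary, its keys, and the `ListHandler` and `configure_stage_log` helpers are hypothetical. Note that the standard module records messages at or above the threshold.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def configure_stage_log(policy):
    """Build a logger for the topic and threshold named in a policy dict."""
    log = logging.getLogger(policy["topic"])
    log.setLevel(getattr(logging, policy["threshold"]))
    log.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    log.addHandler(handler)
    return log, handler


# Hypothetical policy: name-value pairs handed to the stage at configuration.
policy = {"topic": "demo.stage", "threshold": "WARNING"}
log, handler = configure_stage_log(policy)
log.debug("too soft, dropped")
log.warning("loud enough", extra={"ccd": 7})  # caller-attached property
```

Only the warning reaches `handler.records`. Its `created` attribute holds the automatic timestamp, and `ccd` appears as an attribute on the record.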
Possible Variations
• Fine control over inter-slice communication
  – normal communication is between master and slices
  – a stage could have direct access to other slices via MPI commands
• Custom pipeline
  – managed by pipeline orchestration layer for monitoring
  – external communication via events
Building the Stack
• Basic Installation instructions: http://dev.lsstcorp.org/pkgs/GettingStarted.html
    setenv LSST_HOME $PWD/stack
    mkdir $LSST_HOME; cd $LSST_HOME
    curl -o newinstall.sh http://dev.lsstcorp.org/pkgs/newinstall.sh
    sh ./newinstall.sh
    source loadLSST.sh
    lsstpkg fetch LSSTPipe
• Best supported platform:
  – Linux, gcc v3.4.6
• Alternatives to building the stack
  – Logging into LSST cluster @ NCSA
  – Running Virtual Machine with stack pre-installed
Working with code from the repository
• GettingStarted document contains SVN survival guide
• LSST software organized into packages
  – packages are separately versioned
  – usually one person is in charge of tracking each package's state
• Building from SVN
    setenv LSST_DC2 svn+ssh://svn.lsstcorp.org/DC2
    svn co $LSST_DC2/fw/trunk fw-trunk   # check out the package
    cd fw-trunk
    setup -r .                           # load required environment
    scons                                # build it in place
    scons install                        # install it into the stack
Testing Your Code
• Outside the framework
  – Create classes that can apply your algorithm to arbitrary data
  – Classes should not depend on pipeline framework
  – Create unit tests (in tests subdir.) or examples (in examples subdir.) that exercise the class
  – testing can occur in C++, Python or both
• Inside the framework
  – Create Python implementation of a Stage class
  – Create a policy file for configuring stage
  – Create a simple pipeline using policy files
  – Use the launchDC2.py script from the dc2pipe package to run
    • provide identifying name for run (run ID) as input
• Process will likely change somewhat for DC3
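A test written outside the framework might look like the sketch below, which uses Python's standard `unittest` module. The `BackgroundEstimator` class and its median-based method are hypothetical examples of an algorithm class with no dependence on the pipeline framework. The layout of a real package's tests directory is not shown.

```python
import unittest


class BackgroundEstimator:
    """Hypothetical algorithm class, independent of the pipeline framework."""

    def estimate(self, pixels):
        """Return the median pixel value as a crude background level."""
        if not pixels:
            raise ValueError("no pixels supplied")
        ordered = sorted(pixels)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


class BackgroundEstimatorTest(unittest.TestCase):
    """Exercises the class on arbitrary data, with no pipeline involved."""

    def test_odd_count(self):
        self.assertEqual(BackgroundEstimator().estimate([5, 1, 3]), 3)

    def test_even_count(self):
        self.assertEqual(BackgroundEstimator().estimate([4, 2]), 3.0)

    def test_empty_input(self):
        with self.assertRaises(ValueError):
            BackgroundEstimator().estimate([])
```

Saved as a file in a tests subdirectory, this can be run with `python -m unittest`. Because the class takes plain lists, the same code can later be wrapped by a Stage without change.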
Running on the cluster
• The LSST cluster @ NCSA
  – up to date software stack
  – input data organized and ready for use
  – standard pipelines configured to write output in organized tree
    • /lsst/DC2root contains directory for each run ID
    • each run ID has subdirectory that names a pipeline that was run
    • each pipeline contains...
      – input: the input data processed
      – output: the output image products
      – work: the pipeline's working directory, contains copy of all input policy files, log capturing stdout, stderr from master process.
• output database products
  – saved in MySQL database on lsst10
  – database named after run ID
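The output tree described above can be expressed as a small path helper. The root `/lsst/DC2root` and the three subdirectory names come from the slide. The function name `run_directories` is hypothetical, and the run ID and pipeline name in the usage note are made-up values.

```python
from pathlib import PurePosixPath

DC2_ROOT = PurePosixPath("/lsst/DC2root")
SUBDIRS = ("input", "output", "work")


def run_directories(run_id, pipeline):
    """Return the input/output/work paths for one pipeline of one run."""
    base = DC2_ROOT / run_id / pipeline
    return {name: base / name for name in SUBDIRS}
```

For example, `run_directories("test042", "examplePipeline")["work"]` points at the working directory that holds the copied policy files and the master process log for that run.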
Dealing with bugs
• Bugs, issues and milestones are tracked using trac
  – life as a trac “ticket”
• Life Cycle:
  – ticket is created and assigned to a developer
  – developer creates copy of relevant package under the package's tickets subdirectory in svn. Example: ticket #350 for change to fw
      svn copy -m "addressing #350" $LSST_DC2/fw/trunk $LSST_DC2/fw/tickets/350
  – changes are implemented, tested, checked into ticket branch
  – request code review: checked for compliance against coding standards
  – reviewed code merged into trunk
• some refinement of process is expected