An Overview of Swift Parallel scripting for distributed systems Mike Wilde [email protected]
An Overview of Swift
Parallel scripting for distributed systems
Mike Wilde [email protected]
Computation Institute, University of Chicago and Argonne National Laboratory
www.ci.uchicago.edu/swift
Acknowledgments
• The Swift effort is supported in part by NSF grants OCI-721939 and PHY-636265, NIH DC08638 and DA024304, the Argonne LDRD program, and the UChicago/Argonne Computation Institute.
• The Swift team:
– Ben Clifford, Ian Foster, Mihael Hategan, Sarah Kenny, Mike Wilde, Zhao Zhang, Yong Zhao
• Java CoG Kit used by Swift developed by:
– Mihael Hategan, Gregor Von Laszewski, and many others
• User-contributed workflows and application use:
– I2U2 Project, U.Chicago Molecular Dynamics, U.Chicago Radiology and Human Neuroscience Lab, UMaryland, Dartmouth Brain Imaging Center
Swift programs
• A Swift script is a set of functions
– Atomic functions wrap & invoke application programs
– Composite functions invoke other functions
• Data is typed as composable arrays and structures of files and simple scalar types (int, float, string)
• Collections of persistent file structures (datasets) are mapped into this data model
• Members of datasets can be processed in parallel
• Statements in a procedure execute in data-flow dependency order, concurrently where dependencies allow
• Variables are single assignment
• Provenance is gathered as scripts execute
A simple Swift script
type imagefile;
(imagefile output) flip(imagefile input) {
app {
convert "-rotate" "180" @input @output;
}
}
imagefile stars <"orion.2008.0117.jpg">;
imagefile flipped <"output.jpg">;
flipped = flip(stars);
Parallelism via foreach { }
type imagefile;
(imagefile output) flip(imagefile input) {
app {
convert "-rotate" "180" @input @output;
}
}
imagefile observations[ ] <simple_mapper; prefix="orion">;
imagefile flipped[ ] <simple_mapper; prefix="orion-flipped">;
foreach obs,i in observations {
flipped[i] = flip(obs);
}
Name outputs based on inputs
Process all dataset members in parallel
The Swift Scripting Model
• Program in high-level, functional model
• Swift hides issues of location, mechanism and data representation
• Basic active elements are functions that encapsulate application tools and run jobs
• Typed data model: structures and arrays of files and scalar types
• Variables are single assignment
• Control structures perform conditional, iterative and parallel operations
The Variable model
• Single assignment:
– Can only assign a value to a var once
– This makes data flow semantics much cleaner to specify, understand and implement
• Variables are scalars or references to composite objects
• Variables are typed
• File typed variables are “mapped” to files
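The single-assignment rule can be illustrated outside Swift. The following is a minimal Python sketch, not Swift's implementation: the class name `SingleAssignment` and its `set`/`get` methods are hypothetical names chosen for this example. A variable accepts exactly one value, and readers wait until that value exists, which is what lets a data-flow runtime reason about when a consumer may run.

```python
import threading


class SingleAssignment:
    """Write-once variable: readers block until a value has been assigned.

    Illustrative only; the names here are assumptions, not Swift APIs.
    """

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._ready = threading.Event()
        self._value = None

    def set(self, value):
        # A second assignment is an error, mirroring single assignment.
        with self._lock:
            if self._ready.is_set():
                raise ValueError(f"variable {self.name!r} is already assigned")
            self._value = value
            self._ready.set()

    def get(self, timeout=None):
        # Wait for the producer; fail if no value arrives in time.
        if not self._ready.wait(timeout):
            raise TimeoutError(f"variable {self.name!r} was never assigned")
        return self._value


x = SingleAssignment("x")
x.set(42)
print(x.get())  # prints 42
```

Because a value can never change after it is set, a reader never needs to ask which version of the variable it saw.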
Swift statements
• Var declarations
– Can be mapped
• Type declarations
• Assignment statements
– Assignments are type-checked
• Control-flow statements
– if, foreach, iterate
• Function declarations
Data Flow Model
• The data flow model is what makes location independence possible
• Computations proceed when data is ready (often not in source-code order)
• User specifies DATA dependencies, doesn’t worry about sequencing of operations
• Exposes maximal parallelism
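A small Python sketch can make "computations proceed when data is ready" concrete. This is an illustration under stated assumptions, not how Swift or Karajan schedules work: the function `run_dataflow` and the task-table format (name mapped to a function and a list of input names) are invented for this example. Each task is submitted as soon as all of its inputs have results, regardless of the order in which the tasks are listed.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dataflow(tasks, max_workers=4):
    """Run tasks in dependency order, in parallel where possible.

    tasks maps a name to (function, [names of inputs]); the function is
    called with the results of its inputs. Hypothetical helper.
    """
    results = {}
    pending = dict(tasks)
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Submit every task whose inputs are all available.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            for name in ready:
                func, deps = pending.pop(name)
                future = pool.submit(func, *[results[d] for d in deps])
                running[future] = name
            if not running:
                raise ValueError("unsatisfiable dependencies: "
                                 + ", ".join(sorted(pending)))
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


# The consumer is listed before its producer; data dependencies, not
# source order, decide what runs first.
tasks = {
    "flipped": (lambda s: s[::-1], ["stars"]),
    "stars": (lambda: "orion", []),
}
print(run_dataflow(tasks)["flipped"])  # prints noiro
```

Tasks with no dependency between them are submitted in the same pass and may run at the same time, which is the sense in which the model exposes parallelism.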
Data Management
• Directories and management model
– local dir, storage dir, work dir
– caching within workflow
– reuse of files on restart
• Makes unique names for: jobs, files, workflows
• Can leave data on a site
– For now, in Swift you need to track it
– In Pegasus (and VDS) this is done automatically
Mappers and Vars
• Vars can be “file valued”
• Many useful mappers built-in, written in Java to the Mapper interface
• “Ext”ernal mapper can be easily written as an external script in any language
Example: fMRI Type Definitions
type Study { Group g[ ]; }
type Group { Subject s[ ]; }
type Subject { Volume anat; Run run[ ]; }
type Run { Volume v[ ]; }
type Volume { Image img; Header hdr; }
type Image {};
type Header {};
type Warp {};
type Air {};
type AirVec { Air a[ ]; }
type NormAnat { Volume anat; Warp aWarp; Volume nHires; }
Simplified version of fMRI AIRSN Program (Spatial Normalization)
Coding your own “external” mapper
awk <angle-spool-1-2 '
BEGIN { server="gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle"; }
{ printf "[%d] %s/%s\n", i++, server, $0 }'
$ cat angle-spool-1-2
spool_1/anl2-1182294000-dump.1.167.pcap.gz
spool_1/anl2-1182295800-dump.1.170.pcap.gz
spool_1/anl2-1182296400-dump.1.171.pcap.gz
spool_1/anl2-1182297600-dump.1.173.pcap.gz
…
$ ./map1 | head
[0] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182294000-dump.1.167.pcap.gz
[1] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182295800-dump.1.170.pcap.gz
[2] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182296400-dump.1.171.pcap.gz
…
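Since an external mapper can be written in any language, the same mapping can be sketched in Python. This is an illustrative equivalent of the awk one-liner above, not a script from the Swift distribution; the function name `map_lines` is hypothetical, and unlike the awk version it skips blank lines.

```python
SERVER = "gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle"


def map_lines(lines, server=SERVER):
    """Yield '[index] server/path' for each non-empty input line.

    Same output format as the awk mapper shown on this slide.
    """
    index = 0
    for line in lines:
        path = line.strip()
        if path:  # assumption: blank lines carry no file name
            yield f"[{index}] {server}/{path}"
            index += 1


listing = [
    "spool_1/anl2-1182294000-dump.1.167.pcap.gz",
    "spool_1/anl2-1182295800-dump.1.170.pcap.gz",
]
for entry in map_lines(listing):
    print(entry)
# [0] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182294000-dump.1.167.pcap.gz
# [1] gsiftp://tp-osg.ci.uchicago.edu//disks/ci-gpfs/angle/spool_1/anl2-1182295800-dump.1.170.pcap.gz
```

To use it as a stand-alone script in the style of `./map1`, the same function could be fed the lines of the listing file or of standard input.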
fMRI Example Workflow
(Run resliced) reslice_wf ( Run r) {
Run yR = reorientRun( r , "y", "n" );
Run roR = reorientRun( yR , "x", "n" );
Volume std = roR.v[1];
AirVector roAirVec = alignlinearRun(std, roR, 12, 1000, 1000, "81 3 3");
resliced = resliceRun( roR, roAirVec, "-o", "-k");
}
(Run or) reorientRun (Run ir, string direction, string overwrite) {
foreach Volume iv, i in ir.v {
or.v[i] = reorient (iv, direction, overwrite);
}
}
Collaboration with James Dobson, Dartmouth
AIRSN Program Definition
(Run snr) functional ( Run r, NormAnat a, Air shrink ) {
Run yroRun = reorientRun( r , "y" );
Run roRun = reorientRun( yroRun , "x" );
Volume std = roRun[0];
Run rndr = random_select( roRun, 0.1 );
AirVector rndAirVec = align_linearRun( rndr, std, 12, 1000, 1000, "81 3 3" );
Run reslicedRndr = resliceRun( rndr, rndAirVec, "o", "k" );
Volume meanRand = softmean( reslicedRndr, "y", "null" );
Air mnQAAir = alignlinear( a.nHires, meanRand, 6, 1000, 4, "81 3 3" );
Warp boldNormWarp = combinewarp( shrink, a.aWarp, mnQAAir );
Run nr = reslice_warp_run( boldNormWarp, roRun );
Volume meanAll = strictmean( nr, "y", "null" );
Volume boldMask = binarize( meanAll, "y" );
snr = gsmoothRun( nr, boldMask, "6 6 6" );
}
(Run or) reorientRun (Run ir, string direction) {
foreach Volume iv, i in ir.v {
or.v[i] = reorient(iv, direction);
}
}
Automated image registration for spatial normalization
[Figure: two graphs. "AIRSN workflow" shows the stages reorientRun, reorientRun, random_select, alignlinearRun, resliceRun, softmean, alignlinear, combinewarp, reslice_warpRun, strictmean, binarize, gsmoothRun. "AIRSN workflow expanded" shows the individual jobs they expand into: reorient, alignlinear, reslice, softmean, combinewarp, reslice_warp, strictmean, binarize, gsmooth.]
Collaboration with James Dobson, Dartmouth [SIGMOD Record Sep05]
Workflow Coding/Style Issues
• Swift is not a data manipulation language - use scripting tools for that
• Expose or hide parameters
• One atomic, many variants
• Expose or hide program structure
• Driving a parameter sweep with readdata() - reads a csv file into struct[].
• Can code your own mappers
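The parameter-sweep bullet describes reading a CSV file into an array of structures. A rough Python analogue is sketched below to show the shape of the idea; it is not Swift's `readdata()`, and the record type `Params`, its fields `angle` and `scale`, and the helper `read_params` are hypothetical names for this example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """One row of the sweep; field names are assumptions for this sketch."""
    angle: int
    scale: float


def read_params(text):
    """Parse CSV text with a header row into a list of typed records."""
    reader = csv.DictReader(io.StringIO(text))
    return [Params(int(row["angle"]), float(row["scale"])) for row in reader]


sweep = read_params("angle,scale\n90,0.5\n180,1.0\n")
for p in sweep:
    # Each record would drive one application invocation in the sweep.
    print(p.angle, p.scale)
```

Keeping the parameter table in a file, and the loop over it in the script, means a sweep can be changed without touching the workflow logic.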
Swift Architecture
[Figure: architecture diagram. Specification: SwiftScript, abstract computation, Virtual Data Catalog, SwiftScript compiler. Execution: execution engine (Karajan w/ Swift runtime), Swift runtime callouts, status reporting. Scheduling: virtual node(s) and worker nodes, where launchers run applications (AppF1, AppF2) over files (file1, file2, file3); a provenance collector gathers provenance data. Provisioning: Falkon resource provisioner, Amazon EC2.]
Running swift
• Fully contained Java grid client
• Can test on a local machine
• Can run on a PBS cluster
• Runs on multiple clusters over Grid interfaces
Using Swift
[Figure: diagram labels: SwiftScript, site list, app list, data (f1, f2, f3), swift command, launchers, worker nodes, apps a1 and a2, workflow status and logs, provenance data.]
Site selection and throttling
• Avoid overloading target infrastructure
• Base resource choice on current conditions and on the response you actually observe from each site
• Balance this with space availability
• Things are getting more automated.
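Throttling, in the sense of not overloading target infrastructure, can be sketched as a per-site cap on concurrent jobs. The Python below is an illustration only and does not reflect Swift's scheduler: the class `SiteThrottle`, the site name key, and the limit of 2 are assumptions made for this example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class SiteThrottle:
    """Caps the number of jobs running at once on each site."""

    def __init__(self, limits):
        # limits: site name -> maximum concurrent jobs (hypothetical values)
        self._slots = {site: threading.BoundedSemaphore(n)
                       for site, n in limits.items()}

    def run(self, site, job, *args):
        # Block until the site has a free slot, then run the job there.
        with self._slots[site]:
            return job(*args)


# Usage: ten short jobs aimed at a site that tolerates two at a time.
throttle = SiteThrottle({"teraport": 2})
state = {"active": 0, "peak": 0}
state_lock = threading.Lock()


def job(i):
    with state_lock:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
    time.sleep(0.01)  # stand-in for real work
    with state_lock:
        state["active"] -= 1
    return i * i


with ThreadPoolExecutor(max_workers=8) as pool:
    squares = list(pool.map(lambda i: throttle.run("teraport", job, i), range(10)))
```

Although eight worker threads are available, `state["peak"]` never exceeds the site's limit, because the semaphore admits only two jobs at a time.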
Clustering and Provisioning
• Can cluster jobs together to reduce grid overhead for small jobs
• Can use a provisioner
• Can use a provider to go straight to a cluster
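Job clustering, as described in the first bullet, amortizes per-job grid overhead by submitting several small jobs as one unit. The following Python sketch shows one simple greedy policy under stated assumptions; it is not Swift's clustering algorithm, and the function `cluster_jobs`, the runtime estimates, and the 30-second target are hypothetical.

```python
def cluster_jobs(jobs, max_cluster_seconds):
    """Group consecutive jobs so each cluster stays within a time budget.

    jobs is a list of (name, estimated_seconds). A job longer than the
    budget ends up in a cluster of its own.
    """
    clusters, current, total = [], [], 0.0
    for name, seconds in jobs:
        # Close the current cluster if this job would exceed the budget.
        if current and total + seconds > max_cluster_seconds:
            clusters.append(current)
            current, total = [], 0.0
        current.append(name)
        total += seconds
    if current:
        clusters.append(current)
    return clusters


jobs = [("a", 10), ("b", 20), ("c", 40), ("d", 5)]
print(cluster_jobs(jobs, 30))  # prints [['a', 'b'], ['c'], ['d']]
```

Four submissions become three here; with many short jobs the reduction, and the saving in per-submission overhead, is correspondingly larger.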
Research topic: Collective I/O for Petascale Workflow
• Goal: efficient I/O at petascale for loosely coupled programming
• Approach
– Distribute input data to local filesystems
– Create fast local filesystems distributed throughout the cluster
– Asynchronously batch output data in a pipelined manner to later stages and persistent storage
• Work is in early stages
Abstract architecture
[Figure: diagram labels: global FS; workload processor; intermediate filesystem (IFS) & IO processor; compute nodes, each with an LFS; interconnect.]
Application Dataflow Patterns
[Figure: diagram labels: common input datasets; per-invocation datasets; multiple App invocations; large output dataset.]
LCP Collective IO Model
[Figure: diagram labels: application script; global FS; ZOID on IO node; ZOID IFS for staging; torus & tree interconnects; CN-striped IFS for data (IFS compute nodes, each holding an IFS segment); compute nodes with local datasets on LFS; large input dataset.]
Distributor and collector
[Figure: diagram labels: global FS; distributor; collector; IFS nodes, each with an IFS; compute nodes (CN), each with an LFS; steps numbered 1 to 5.]
Mapping Compute Nodes to IFSs
[Figure: diagram labels: torus interconnect; groups of compute nodes (CN), each with an LFS; IFS.]
Asynchronous Output Staging
[Figure: timeline with repeated "compute" and "out" phases; labels: to GFS, next task.]
Getting Started
• www.ci.uchicago.edu/swift
– Documentation -> tutorials
• Get CI accounts
– https://www.ci.uchicago.edu/accounts/
• Request: workstation, gridlab, teraport
• Get a DOEGrids Grid Certificate
– http://www.doegrids.org/pages/cert-request.html
• Virtual organization: OSG / OSGEDU
• Sponsor: Mike Wilde, [email protected], 630-252-7497
• Develop your Swift code and test locally, then:
– On PBS / TeraPort
– On OSG: OSGEDU
• Use simple scripts (Perl, Python) as your test apps
Summary
• Clean separation of logical/physical concerns
– XDTM specification of logical data structures
+ Concise specification of parallel programs
– SwiftScript, with iteration, etc.
+ Efficient execution (on distributed resources)
– Karajan+Falkon: Grid interface, lightweight dispatch, pipelining, clustering, provisioning
+ Rigorous provenance tracking and query
– Records provenance data of each job executed
Improved usability and productivity
– Demonstrated in numerous applications
For Swift Information
• www.ci.uchicago.edu/swift
– Quick Start Guide:
• http://www.ci.uchicago.edu/swift/guides/quickstartguide.php
– User Guide:
• http://www.ci.uchicago.edu/swift/guides/userguide.php
– Introductory Swift Tutorials:
• http://www.ci.uchicago.edu/swift/docs/index.php