Automating Real-time Seismic Analysis Through Streaming and High Throughput Workflows
Rafael Ferreira da Silva, Ph.D.
http://pegasus.isi.edu
Do we need seismic analysis?
USArray: A continental-scale Seismic Observatory
US Array TA (IRIS Service): 836 stations
394 stations have online data available
http://ds.iris.edu/ds/nodes/dmc/earthscope/usarray/_US-TA-operational/
The development of reliable risk assessment methods for these hazards requires real-time analysis of seismic data
So, how to efficiently process these data?
Experiment Timeline
Scientific Problem: Earth Science, Astronomy, Neuroinformatics, Bioinformatics, etc.
Analytical Solution: Models, Quality Control, Image Analysis, etc.
Computational Scripts: Shell scripts, Python, Matlab, etc.
Automation: Workflows, MapReduce, etc.
Distributed Computing: Clusters, HPC, Cloud, Grid, etc.
Monitoring and Debug: Fault-tolerance, Provenance, etc.
Scientific Result
What is involved in an experiment execution?
Automate
Recover
Debug
Why Scientific Workflows?
Automates complex, multi-stage processing pipelines
Enables parallel, distributed computations
Automatically executes data transfers
Reusable, aids reproducibility
Records how data was produced (provenance)
Handles failures to provide reliability
Keeps track of data and files
Taking a closer look into a workflow…
job: command-line programs
dependency: usually data dependencies
split
merge
pipeline
DAG: directed acyclic graph
abstract workflow
executable workflow
storage constraints
optimizations
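The split, merge, and dependency ideas above can be sketched with the Python standard library. This is a minimal illustration, not Pegasus code: the job names are hypothetical, and `graphlib` (Python 3.9+) stands in for a real workflow engine.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each key maps a job to the set of jobs it depends on.
dependencies = {
    "split": set(),
    "analyze_a": {"split"},
    "analyze_b": {"split"},
    "merge": {"analyze_a", "analyze_b"},
}

def execution_waves(deps):
    """Group jobs into waves; jobs in the same wave have no unmet
    dependencies and could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = execution_waves(dependencies)
print(waves)
```

The two analysis jobs land in the same wave because neither depends on the other, which is where a DAG exposes parallelism.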
From the abstraction to execution!
stage-in job: transfers the workflow input data
stage-out job: transfers the workflow output data
registration job: registers the workflow output data
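As a rough sketch of what "abstract to executable" means, the function below wraps a list of compute jobs with the three auxiliary job types. The function name, job names, and file names are hypothetical; a real planner also decides where each job runs and how data moves, which this toy version ignores.

```python
def plan(abstract_jobs, workflow_inputs, workflow_outputs):
    """Return an ordered job list with auxiliary data-management jobs added.

    abstract_jobs: compute job names, already in a valid execution order.
    workflow_inputs / workflow_outputs: file names (hypothetical).
    """
    executable = [f"stage_in:{name}" for name in workflow_inputs]
    executable += abstract_jobs
    for name in workflow_outputs:
        executable.append(f"stage_out:{name}")
        executable.append(f"register:{name}")
    return executable

executable = plan(["preprocess", "correlate"], ["traces.dat"], ["result.dat"])
print(executable)
```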
Optimizing storage usage…
cleanup job: removes unused data
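One way to picture cleanup jobs: find the last job that touches each file and schedule a removal right after it. The sketch below assumes a simple sequential job order and hypothetical file names; it illustrates the idea and is not the algorithm Pegasus implements.

```python
def add_cleanup_jobs(ordered_jobs, uses, keep=()):
    """Insert a cleanup step right after the last job that touches each file.

    ordered_jobs: job names in execution order.
    uses: dict mapping job name -> set of files it reads or writes.
    keep: files that must survive (for example, final outputs).
    """
    last_user = {}
    for job in ordered_jobs:
        for f in uses[job]:
            last_user[f] = job  # later jobs overwrite earlier ones
    plan = []
    for job in ordered_jobs:
        plan.append(job)
        done = sorted(f for f, j in last_user.items() if j == job and f not in keep)
        plan.extend(f"cleanup:{f}" for f in done)
    return plan

uses = {
    "first_job": {"input.txt", "tmp.txt"},
    "simul_job": {"tmp.txt", "output.dat"},
}
cleanup_plan = add_cleanup_jobs(["first_job", "simul_job"], uses, keep={"output.dat"})
print(cleanup_plan)
```

Here `input.txt` can be removed as soon as the first job finishes, while `tmp.txt` has to wait until its last consumer is done.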
Workflow systems provide tools to generate the abstract workflow

from Pegasus.DAX3 import ADAG, File, Job, Link

dax = ADAG("test_dax")

# First job: reads input.txt and writes tmp.txt
firstJob = Job(name="first_job")
firstInputFile = File("input.txt")
firstOutputFile = File("tmp.txt")
firstJob.addArguments("input=input.txt", "output=tmp.txt")
firstJob.uses(firstInputFile, link=Link.INPUT)
firstJob.uses(firstOutputFile, link=Link.OUTPUT)
dax.addJob(firstJob)

# Five simulation jobs, each reading tmp.txt and writing its own output file
for i in range(0, 5):
    simulJob = Job(id="%s" % (i + 1), name="simul_job")
    simulInputFile = File("tmp.txt")
    simulOutputFile = File("output.%d.dat" % i)
    simulJob.addArguments("parameter=%d" % i, "input=tmp.txt",
                          "output=%s" % simulOutputFile.name)
    simulJob.uses(simulInputFile, link=Link.INPUT)
    simulJob.uses(simulOutputFile, link=Link.OUTPUT)
    dax.addJob(simulJob)
    # Each simulation job runs only after the first job has finished
    dax.depends(parent=firstJob, child=simulJob)

# Write the abstract workflow (DAX) to an XML file
with open("test.dax", "w") as fp:
    dax.writeXML(fp)
abstract workflow
Which Workflow Management System?
Pegasus
Nimrod
…and which model to use?
task-oriented
  files
  parallelism: no concurrency
  data transfers via files
  heterogeneous execution: tasks can run in heterogeneous resources
stream-based
  streams
  concurrency: tasks run concurrently
  data transfers via memory or messages
  homogeneous execution: tasks should run in homogeneous resources
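The contrast between the two models can be sketched with the same two-stage computation written both ways. The function names are hypothetical, and a Python generator only interleaves the stages rather than running them in parallel, so this shows the data-passing style, not true concurrency.

```python
import os
import tempfile

def task_oriented(values):
    """Stage 1 writes a file; stage 2 starts only after stage 1 has finished."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "squares.txt")
        with open(path, "w") as fh:            # stage 1: produce the whole file
            fh.write("\n".join(str(v * v) for v in values))
        with open(path) as fh:                 # stage 2: read the file back
            return sum(int(line) for line in fh)

def stream_based(values):
    """Stage 2 consumes each item as soon as stage 1 yields it; no file is written."""
    squares = (v * v for v in values)          # stage 1: lazy generator
    return sum(squares)                        # stage 2: pulls items one by one

print(task_oriented([1, 2, 3, 4]), stream_based([1, 2, 3, 4]))
```

Both variants give the same result; the difference is whether the intermediate data ever touches storage.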
What does Pegasus provide?
Automation
  Automates pipeline executions
  Parallel, distributed computations
  Automatically executes data transfers
Heterogeneous resources
  Task-oriented model
  Application is seen as a black box
Debug
  Workflow execution and job performance metrics
  Set of debugging tools to unveil issues
  Real-time monitoring, graphs, provenance
Optimization
  Job clustering
  Data cleanup
Recovery
  Job failure detection
  Checkpoint files
  Job retry
  Rescue DAGs
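The "job retry" item can be illustrated with a small wrapper. This is a generic sketch with hypothetical names, not how Pegasus implements retries; checkpoint files and rescue DAGs are not modeled here.

```python
def run_with_retries(job, max_retries=3):
    """Run job() and retry on failure; return (result, attempts used)."""
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return job(), attempt
        except Exception as error:
            last_error = error  # remember the failure and try again
    raise RuntimeError(f"job failed after {max_retries} attempts") from last_error

# Hypothetical flaky job: fails twice, then succeeds.
calls = {"count": 0}

def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("transient failure")
    return "done"

result, attempts = run_with_retries(flaky_job)
print(result, attempts)
```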
…and dispel4py?
Automation
  Automates pipeline executions
  Concurrent, distributed computations
  Stream-based model
Workflow Composition
  Python library
  Grouping (all-to-all, all-to-one, one-to-all)
Optimization
  Multiple streams (in/out)
  Avoids I/O (shared memory or message passing)
Mapping
  Sequential
  Multiprocessing (shared memory)
  Distributed memory, message passing (MPI)
  Distributed real-time (Apache Storm)
  Apache Spark (prototype)
Asterism greatly simplifies the effort required to develop data-intensive applications that run across multiple heterogeneous resources distributed in the wide area
Pegasus
sub-workflow: dispel4py workflow
ASTERISM
Where to run scientific workflows?
High Performance Computing
There are several possible configurations…
Shared filesystem: typical of most HPC sites
Submit host (e.g., user’s laptop)
Workflow Engine
Cloud Computing
Object storage: highly scalable object stores, typical of cloud computing deployments (Amazon S3, Google Storage)
Submit host (e.g., user’s laptop)
Workflow Engine
Grid Computing
Local data management: typical of OSG (Open Science Grid) sites
Submit host (e.g., user’s laptop)
Workflow Engine
And yes… you can mix everything!
Compute site A
Compute site B
Shared filesystem
Object storage
Submit host (e.g., user’s laptop)
Workflow Engine
How do we use Asterism to automate seismic analysis?
WORKFLOW
Seismic Ambient Noise Cross-Correlation
Phase 1 (pre-process)
Phase 2 (cross-correlation)
Preprocesses and cross-correlates traces (sequences of measurements of acceleration in three dimensions) from multiple seismic stations (IRIS database)
Phase 1: data preparation using statistics for extracting information from the noise
Phase 2: compute correlation, identifying the time for signals to travel between stations. Infers properties of the intervening rock
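To make Phase 2 concrete, the sketch below cross-correlates two short synthetic traces and reads the travel time off the best-matching lag. The trace values and the 40 Hz sampling rate are made up for illustration; real ambient-noise processing works on long, preprocessed noise records rather than a clean pulse.

```python
def best_lag(a, b, max_lag):
    """Return the lag (in samples) at which trace b best matches trace a.

    A positive lag means b records the same signal later than a.
    """
    def correlation(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=correlation)

# Synthetic traces (made-up values): station_b sees the pulse 3 samples after station_a.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0]
station_a = pulse + [0.0] * 10
station_b = [0.0] * 3 + pulse + [0.0] * 7

lag_samples = best_lag(station_a, station_b, max_lag=5)
sampling_rate_hz = 40.0  # assumed sampling rate, not taken from the talk
travel_time_s = lag_samples / sampling_rate_hz
print(lag_samples, travel_time_s)
```

Dividing the known distance between the two stations by this travel time gives a wave speed, which is how the correlation carries information about the rock in between.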
Apache Storm
Distributed computation framework for event stream processing
Designed for massive scalability; supports fault tolerance with a “fail fast, auto restart” approach to processes
Rich array of available spouts specialized for receiving data from all types of sources
The “Hadoop of real-time processing”; very scalable
spout
bolt
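The spout and bolt roles can be mimicked with threads and queues from the Python standard library. This sketch does not use the Apache Storm API; `spout`, `bolt`, and `run_topology` are hypothetical names chosen to mirror the terms above, and the fail-fast restart behaviour is not modeled.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream

def spout(source, out_q):
    """Emit each item from the source into the stream."""
    for item in source:
        out_q.put(item)
    out_q.put(STOP)

def bolt(func, in_q, out_q):
    """Apply func to each incoming item and pass the result downstream."""
    while True:
        item = in_q.get()
        if item is STOP:
            out_q.put(STOP)
            return
        out_q.put(func(item))

def run_topology(source, funcs):
    """Chain one spout and one bolt per function; collect the final stream."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=spout, args=(source, queues[0]))]
    for i, func in enumerate(funcs):
        threads.append(threading.Thread(target=bolt, args=(func, queues[i], queues[i + 1])))
    for t in threads:
        t.start()
    results = []
    while True:
        item = queues[-1].get()
        if item is STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

readings = run_topology([1.0, -2.0, 3.0], [abs, lambda x: x * 10])
print(readings)
```

Each bolt starts working on the first item while the spout is still emitting later ones, which is the point of the stream-based model.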
Seismic workflow execution
IRIS database (stations)
input data (~150MB)
submit host (e.g., user’s laptop)
Compute site A (MPI-based)
Compute site B (Apache Storm)
Phase 1
Phase 2
output data (~40GB)
data transfers between sites performed by Pegasus
WORKFLOW
Southern California Earthquake Center’s CyberShake
Builders ask seismologists: What will the peak ground motion be at my new building in the next 50 years?
Seismologists answer this question using Probabilistic Seismic Hazard Analysis (PSHA)
286 sites, 4 models
each workflow has 420,000 tasks
A few more workflow features…
Performance, why not improve it?
clustered job: groups small jobs together to improve performance
task: small granularity
workflow restructuring
workflow reduction
pegasus-mpi-cluster
hierarchical workflows
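Why clustering helps can be shown with a back-of-the-envelope model: each submitted job pays a fixed scheduling overhead, so grouping small tasks pays it less often. The function names and the runtime and overhead figures below are assumptions for illustration, not measurements from the talk.

```python
def cluster_jobs(jobs, cluster_size):
    """Group independent jobs into clusters of at most cluster_size jobs."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return [jobs[i:i + cluster_size] for i in range(0, len(jobs), cluster_size)]

def total_time(clusters, runtime_s, overhead_s):
    """Sequential estimate: one scheduling overhead per submitted cluster."""
    return sum(overhead_s + runtime_s * len(c) for c in clusters)

jobs = [f"task_{i}" for i in range(10)]
unclustered = cluster_jobs(jobs, 1)   # 10 submissions of 1 task each
clustered = cluster_jobs(jobs, 5)     # 2 submissions of 5 tasks each

# Assumed figures: 2 s of work per task, 30 s of overhead per submission.
print(total_time(unclustered, 2, 30), total_time(clustered, 2, 30))
```

The work done is identical in both cases; only the number of times the overhead is paid changes.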
What about data reuse?
data reuse: jobs whose output data is already available are pruned from the DAG
data already available
data also available
workflow reduction
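A simplified view of data reuse: walk backwards from the requested files and keep only the jobs whose outputs are missing. The `prune` function and the file names are hypothetical, and the actual reduction performed by Pegasus may differ in its details.

```python
def prune(jobs, available, requested):
    """Return the set of jobs that still need to run.

    jobs: dict mapping job name -> (set of input files, set of output files).
    available: files already present (for example, listed in a catalog).
    requested: final files the user wants.
    """
    producer = {f: name for name, (_, outputs) in jobs.items() for f in outputs}
    needed, to_run = list(requested), set()
    while needed:
        f = needed.pop()
        if f in available or f not in producer:
            continue  # already there, or a raw input that no job produces
        job = producer[f]
        if job not in to_run:
            to_run.add(job)
            needed.extend(jobs[job][0])  # this job's inputs are now needed too
    return to_run

jobs = {
    "first_job": ({"input.txt"}, {"tmp.txt"}),
    "simul_job": ({"tmp.txt"}, {"output.dat"}),
}
print(prune(jobs, available={"input.txt", "tmp.txt"}, requested={"output.dat"}))
```

With `tmp.txt` already available, only `simul_job` remains; if `output.dat` were available as well, nothing would need to run.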
Handling large-scale workflows
hierarchical workflows
sub-workflow
sub-workflow
recursion ends when a workflow with only compute jobs is encountered
Running fine-grained workflows on HPC systems…
pegasus-mpi-cluster
submit host (e.g., user’s laptop)
HPC System
workflow wrapped as an MPI job
Allows sub-graphs of a Pegasus workflow to be submitted as monolithic jobs to remote resources
Pegasus
Automate, recover, and debug scientific computations.
Get Started
Pegasus Website: http://pegasus.isi.edu
HipChat
Thank You
Questions?
Rafael Ferreira da Silva, [email protected]
Karan Vahi
Rafael Ferreira da Silva
Rajiv Mayani
Mats Rynge
Ewa Deelman