Reverse Time Migration via Resilient Distributed Datasets: Towards In-Memory Coherence of Seismic-Reflection Wavefields using Apache Spark
Ian Lumb
HPCS 2015 - Montreal
http://hpcs.ca
Outline
● The challenges and opportunities of RTM
● Refactoring RTM with Spark/RDDs
  o Spark’ing coherence between wavefields
● Summary
http://www.acceleware.com/technical-papers
Zhou 2014, Fig. 7.25
Motivation
● RTM is performance-challenged
  o Algorithms research remains topical
    ▪ GPUs are responsible for compelling results
● Revisit RTM as a ‘Big Data problem’
  o In-memory analytics has the potential to:
    ▪ Improve performance of data and wavefield manipulations in concert with computations
    ▪ Introduce new prospects for imaging conditions
Key Performance Challenges
● RTM modeling kernel is compute-intensive
  o Stable, non-dispersive solution via FDM requires:
    ▪ Small time steps and small grid intervals
    ▪ Higher-order approximations of the spatial derivatives
● RTM wavefields exceed memory capacity
  o Multiple-TB source volumes must be stored to disk
e.g., Liu et al., Computers & Geosciences 59 (2013) 17–23
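To make the cost argument concrete, here is a minimal 1D sketch of the kind of explicit finite-difference kernel the slide refers to: second order in time, fourth order in space, for the constant-velocity acoustic wave equation u_tt = c² u_xx. It is not the author's or Liu et al.'s kernel; the function names, the fixed zero boundaries and the parameter values used below are assumptions for illustration. The stability check uses the von Neumann limit c·Δt/Δx ≤ √3/2, which holds for this particular 1D stencil only. `snapshot_storage_bytes` shows the arithmetic behind multi-TB source volumes: a hypothetical 1000 × 1000 × 1000 grid stored as 32-bit floats at 1000 time steps needs 4 × 10¹² bytes (4 TB).

```python
import math

def laplacian_4th(u, dx):
    """Fourth-order central-difference approximation of d2u/dx2.

    The two outermost points on each side get no update (lap = 0), which,
    with zero initial values there, acts as a fixed zero boundary.
    """
    n = len(u)
    lap = [0.0] * n
    inv = 1.0 / (12.0 * dx * dx)
    for i in range(2, n - 2):
        lap[i] = (-u[i + 2] + 16.0 * u[i + 1] - 30.0 * u[i]
                  + 16.0 * u[i - 1] - u[i - 2]) * inv
    return lap

def cfl_limit_4th_order_1d():
    """Von Neumann stability limit on c*dt/dx for this 1D leapfrog scheme."""
    return math.sqrt(3.0) / 2.0

def step(u_prev, u_curr, c, dt, dx):
    """One leapfrog step: u^{n+1} = 2 u^n - u^{n-1} + (c dt)^2 * Lap(u^n)."""
    lap = laplacian_4th(u_curr, dx)
    k = (c * dt) ** 2
    return [2.0 * uc - up + k * lp for up, uc, lp in zip(u_prev, u_curr, lap)]

def propagate(nx, nt, c, dt, dx, src_index):
    """Propagate an initial spike, keeping every snapshot as stored-wavefield RTM does."""
    if c * dt / dx > cfl_limit_4th_order_1d():
        raise ValueError("time step violates the stability limit")
    if not 2 <= src_index < nx - 2:
        raise ValueError("source must lie in the stencil interior")
    u_curr = [0.0] * nx
    u_curr[src_index] = 1.0
    u_prev = list(u_curr)  # zero initial velocity
    snapshots = []
    for _ in range(nt):
        u_prev, u_curr = u_curr, step(u_prev, u_curr, c, dt, dx)
        snapshots.append(u_curr)
    return snapshots

def snapshot_storage_bytes(nx, ny, nz, nt, bytes_per_sample=4):
    """Bytes needed to keep nt full 3D snapshots in memory or on disk."""
    return nx * ny * nz * nt * bytes_per_sample
```

Because Δt is tied to Δx by the stability limit, halving the grid interval in 3D multiplies the number of grid points by 8 and the number of time steps by 2, so the work grows roughly 16-fold.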
Resilient Distributed Datasets (RDDs)
● Abstraction for in-memory computing
● Fault-tolerant, parallel data structures
  o Cluster-ready
● Optionally persistent
● Can be partitioned for optimal placement
● Manipulated via operators
Zaharia et al., NSDI 2012
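The properties listed above can be illustrated with a toy model. The sketch below is plain Python and is NOT Spark's API: the class `ToyRDD` and its methods are hypothetical. It only mimics the idea from Zaharia et al. that an RDD records the transformation (lineage) used to build each partition, so a lost in-memory partition is recomputed from its parent instead of being restored from a replica. Spark's real RDDs do expose operators named `map`, `filter`, `persist` and `collect`, but with different signatures and distributed execution.

```python
class ToyRDD:
    """Toy model of an RDD: fixed partitions plus recorded lineage.

    Hypothetical illustration only, not Spark's API.
    """

    def __init__(self, source_partitions=None, parent=None, fn=None):
        self._source = source_partitions  # set only for the root dataset
        self._parent = parent             # lineage: the dataset this was derived from
        self._fn = fn                     # per-partition transformation
        self._cache = {}                  # in-memory copies of computed partitions
        self._persist = False

    @property
    def num_partitions(self):
        if self._source is not None:
            return len(self._source)
        return self._parent.num_partitions

    def map(self, f):
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def persist(self):
        """Keep computed partitions in memory for reuse."""
        self._persist = True
        return self

    def compute(self, i):
        """Return partition i, recomputing it from lineage if not cached."""
        if i in self._cache:
            return self._cache[i]
        if self._source is not None:
            part = list(self._source[i])
        else:
            part = self._fn(self._parent.compute(i))
        if self._persist:
            self._cache[i] = part
        return part

    def lose_partition(self, i):
        """Simulate losing a cached partition, e.g. when a worker fails."""
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.compute(i)]
```

For example, `ToyRDD(source_partitions=[[1.0, -2.0], [3.0, -4.0]]).map(lambda a: a * a).persist()` keeps both squared partitions in memory; after `lose_partition(1)`, the next `compute(1)` rebuilds `[9.0, 16.0]` by replaying the recorded `map` on the parent.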
RTM via RDDs: Implementation using Spark
● Apache Spark is an implementation of RDDs
● Make use of HDFS or an alternative file system
  o GPFS, AWS S3, OpenStack Swift, Ceph or Lustre
● Choose appropriate programming model(s)
  o Not limited to MapReduce
  o Iterative and/or interactive (including streaming)
● Manage Spark workloads
  o Built-in (standalone) mode, YARN mode or Mesos
  o Univa Universal Resource Broker
after Lumb, insideBIGDATA
http://insidebigdata.com/2015/03/06/8-reasons-apache-spark-hot/
RTM via RDDs: Implementation using Spark (2)
● Deployable on bare metal … clouds
  o Monitoring/management
    ▪ Bright Cluster Manager
● Introduces analytics possibilities for RTM
  o Program in Java (C/C++ via JNA), Scala or Python
● Uptake is significant: rapidly growing community
● Results are extremely impressive
  o Exploit CPUs and/or GPUs
after Lumb, insideBIGDATA
http://insidebigdata.com/2015/03/06/8-reasons-apache-spark-hot/
RTM via RDDs: Opportunities
● Apply RDDs to gathers of seismic data
  o Partition RDDs optimally for wavefield calculations
● Apply RDDs to source wavefields
  o Partition RDDs optimally for cross-correlation of forward and reverse time wavefields
    ▪ Significantly reduce/eliminate disk I/O
● Investigate alternative imaging conditions
  o Machine-learning and/or graph-analytics algorithms in addition to cross-correlation
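The conventional zero-lag cross-correlation imaging condition is I(x) = Σ_t S(x, t) R(x, t), where S is the forward-propagated source wavefield and R the back-propagated receiver wavefield at the same time step. The sketch below assumes, hypothetically, that both wavefields are held in memory as key/value records keyed by (spatial block, time step). Plain dicts stand in for pair RDDs, and the hypothetical helpers `join_by_key` and `reduce_by_key` mimic what PySpark's `join` and `reduceByKey` do on distributed data. It also returns the source-normalized image Σ_t S R / Σ_t S², a standard illumination-compensated variant included only to show how a different imaging condition slots into the same dataflow; it is not one of the machine-learning or graph-analytics approaches the slide proposes.

```python
def join_by_key(left, right):
    """Inner join of two {key: value} dicts -> {key: (lv, rv)}, like a pair-RDD join."""
    return {k: (v, right[k]) for k, v in left.items() if k in right}

def correlate_pair(kv):
    """((block, t), (s, r)) -> (block, (s*r, s*s)), element-wise per grid point."""
    (block, _t), (s, r) = kv
    return block, ([a * b for a, b in zip(s, r)], [a * a for a in s])

def add_pairs(x, y):
    """Element-wise sum of two (xcorr, illumination) tuples, as reduceByKey expects."""
    return ([a + b for a, b in zip(x[0], y[0])],
            [a + b for a, b in zip(x[1], y[1])])

def reduce_by_key(pairs, fn):
    """Combine all values sharing a key with fn, like reduceByKey."""
    out = {}
    for k, v in pairs:
        out[k] = fn(out[k], v) if k in out else v
    return out

def image(source, receiver, eps=1e-12):
    """Return {block: (xcorr_image, source_normalized_image)}.

    source, receiver: {(block_id, t): [samples of that block at step t]},
    with the receiver wavefield already back-propagated to step t.
    eps guards against division by zero where the source never illuminates.
    """
    joined = join_by_key(source, receiver)
    per_block = reduce_by_key(map(correlate_pair, joined.items()), add_pairs)
    return {
        block: (xcorr, [c / (i + eps) for c, i in zip(xcorr, illum)])
        for block, (xcorr, illum) in per_block.items()
    }
```

On a cluster the corresponding chain would be roughly `src.join(rcv).map(correlate_pair).reduceByKey(add_pairs)`. How the keyed records are partitioned (for example with `partitionBy`) is where the slide's "partition RDDs optimally" would come in; whether a given layout avoids shuffle traffic would need to be checked for the Spark version in use.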
[Diagram labels: Spark Workers; Spark (YARN) Master; Spark or YARN]
http://www.informationweek.com/big-data/big-data-analytics/apache-spark-3-promising-use-cases/a/d-id/1319660
http://ipython.org/notebook.html
Thunder: Initial Impressions
● Written in Spark's Python API (PySpark)
  o Makes use of SciPy, NumPy and scikit-learn
● IPython Notebook serves as interactive GUI
  o Runs in a Web browser
  o Notebooks can include text and graphics
  o Secure, remote access to an in-cluster IPython Notebook server
● Includes modular functions for time-series analysis
● Can interface with C/C++ from Python
http://thunder-project.org/
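A seismic trace is a time series, so per-trace analysis of the kind Thunder modularizes can be written as an ordinary Python function and mapped over a keyed dataset. The sketch below does not use Thunder's own API; `pearson` and `best_lag` are hypothetical helpers written in plain Python so they run without Spark. `best_lag` returns the shift (in samples) that maximizes the normalized cross-correlation between a trace and a reference trace, a simple trace-to-trace coherence measure, assuming both traces have the same length.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def best_lag(trace, reference, max_lag):
    """Shift (in samples) maximizing normalized cross-correlation.

    A positive lag means `trace` is delayed relative to `reference`.
    Only the overlapping samples are compared at each lag.
    Returns (lag, coefficient) with coefficient in [-1, 1].
    """
    if len(trace) != len(reference):
        raise ValueError("traces must have the same length")
    n = len(trace)
    best, best_r = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, lag), min(n, n + lag)
        if hi - lo < 2:  # too little overlap to correlate
            continue
        x = trace[lo:hi]                   # trace[i]
        y = reference[lo - lag:hi - lag]   # reference[i - lag]
        r = pearson(x, y)
        if r > best_r:
            best, best_r = lag, r
    return best, best_r
```

With PySpark, a pair RDD of `(trace_id, samples)` could apply it per record via `rdd.mapValues(lambda tr: best_lag(tr, reference, 10))`, with `reference` shipped to the workers in the closure.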
Is there a case for migration?
● In-memory computing via RDDs is promising
  o Application to gathers and wavefields
● Spark provides analytics upside
  o Imaging conditions other than cross-correlation
● Spark may be applicable to modeling kernels
● Spark can be easily incorporated into pre-existing IT infrastructures
  o Complements existing HPC environments
http://rice2015oghpc.rice.edu/technical-program/
Summary
● Is there a case for migration?
  o From: RTM via HPC
  o To: RTM via Big Data or (Big Data and HPC)
● Does it make sense to refactor other HPC problems as ‘Big Data problems’?
Refactoring HPC with Spark/RDDs …
● Could Spark/RDDs replace MPI?
  o Spark has primitives for distributed in-memory parallel computing … including fault tolerance
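One concrete way to frame the MPI question is halo (ghost-cell) exchange, which explicit finite-difference kernels such as RTM's rely on: each subdomain needs a few cells from its neighbors before every time step. The sketch below, in plain Python with hypothetical function names, splits a 1D grid into blocks, pads each block with halos of width h = len(weights) // 2 taken from its neighbors, updates the blocks independently, and reassembles a result identical to the undecomposed update. It assumes an odd-length (centered) stencil, a grid length that divides evenly into the blocks, and fixed edge values as a crude boundary.

```python
def apply_stencil(u, weights):
    """Apply a centered (odd-length) stencil to interior points of a 1D grid.

    Points within the halo width of either end are copied unchanged,
    a crude fixed boundary used only to keep the example short.
    """
    h = len(weights) // 2
    out = list(u)
    for i in range(h, len(u) - h):
        out[i] = sum(w * u[i + j - h] for j, w in enumerate(weights))
    return out

def split_blocks(u, nblocks):
    """Split the grid into equal contiguous blocks (one per worker or rank)."""
    if len(u) % nblocks:
        raise ValueError("grid length must divide evenly into blocks")
    size = len(u) // nblocks
    return [u[k * size:(k + 1) * size] for k in range(nblocks)]

def exchange_halos(blocks, h):
    """Pair each block with h cells from each neighbor (None at the grid edges).

    This is the step an MPI code performs with neighbor messages.
    """
    out = []
    for k, block in enumerate(blocks):
        if k > 0:
            prev = blocks[k - 1]
            left = prev[len(prev) - h:]
        else:
            left = None
        right = blocks[k + 1][:h] if k + 1 < len(blocks) else None
        out.append((left, block, right))
    return out

def step_block(left, block, right, weights):
    """Update one block independently, using only its own halos."""
    h = len(weights) // 2
    padded = (left or []) + block + (right or [])
    new = apply_stencil(padded, weights)
    start = h if left is not None else 0
    return new[start:start + len(block)]

def distributed_step(u, nblocks, weights):
    """Decompose, exchange halos, update blocks independently, reassemble."""
    h = len(weights) // 2
    blocks = split_blocks(u, nblocks)
    if any(len(b) < h for b in blocks):
        raise ValueError("blocks must be at least as wide as the halo")
    result = []
    for left, block, right in exchange_halos(blocks, h):
        result.extend(step_block(left, block, right, weights))
    return result
```

In an MPI code the exchange is a pair of neighbor messages per block per time step. In Spark each exchange would typically become a shuffle between stages (for example, emitting boundary slices keyed by the neighboring block and joining them back), so how that per-step overhead compares with MPI is part of what the question on this slide would have to settle.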
Acknowledgements
● M. Zaharia et al. for RDDs
● Communities responsible for Spark, Python & Thunder
● M. Lamarca, P. Labropoulos, D. Shestakov & L. Gibbons at Bright Computing
Resources
● RTM's scientific context
● Spark support in Bright Cluster Manager for Apache Hadoop