Post on 14-Dec-2015
Ryan Fraser (CSIRO), Lesley Wyborn (GA), Richard Chopping (GA), Terry Rankine (CSIRO), Robert Woodcock (CSIRO)
MINERALS DOWN UNDER
Virtual Geophysics Laboratory (VGL): Exploiting the Cloud and HPC
Scientific workflow – Virtual Geophysics Laboratory (VGL)
• Scientific Workflow Engine (or Virtual Laboratory)
• Automates and massively expands (GA) geophysicists' computational capacity via the Cloud
  – Amazon EC2 / S3 (and others using this interface)
  – OpenStack
• Collaboration between CSIRO, GA, NCI, Monash, UQ, and ANU
• VGL is just a pretty face
  – User Driven GUI
  – Leverages data providers and cloud technologies to do all the heavy lifting
• Non-restrictive, open source
V(what)GL
VEGL – Virtual Exploration Geophysics Laboratory
• One primary science collaboration
• One primary workflow
• One collection of geophysical data sets
VGL – Virtual Geophysics Laboratory
• NeCTAR funded activity
• Collaboration with multiple partners (CSIRO, NCI, GA, UQ, Monash, ANU)
• Supporting multiple workflows
• New data sets
• New data types
• New use: not just exploration
Done
Geophysics (as seen by a software dev)
Geophysics is taking physical measurements…
• Magnetism over an area
• Acceleration due to gravity over an area
…applying lots of mathematics…
…to infer the structure of what is under the surface…
It is not geology
• Samples are never taken, only measurements
Apply Mathematics
Raw data
But -- Where is all the ????
Our Geophysics Problem
• Measurements coming from the field are ‘raw’
  – Varying spatial reference systems
  – Noisy
  – Artefacts from collection process
• This data needs processing
  – From raw data to a data product
• Data products are valuable
  – They will be re-used and referenced repeatedly
• Processing is a time-consuming process
  – Made worse by a purely manual workflow
The Past
• Compile raw data using proprietary FORTRAN
  – Also use software: Intrepid
• Transform to a regular grid using more software
  – MATLAB, Intrepid, ER Mapper, ESRI ArcGIS, QGIS
• Crop data spatially to suit final data product
  – e.g. everything in Victoria
• Transform data into a file format that can be read by proprietary scientific code
  – This is usually done with some handwritten Python or C
  – There is no version control; code is often rewritten / redone
• Upload data to HPC
• Manually enter input parameters / start job
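As a rough illustration of the gridding and cropping steps above, here is a minimal Python sketch using only the standard library. It is not the Intrepid or MATLAB procedure: the function names, the sample readings, and the bounding box for Victoria are all hypothetical, and simple cell averaging stands in for whatever interpolation a real gridding tool would apply.

```python
from collections import defaultdict


def crop(points, west, south, east, north):
    """Keep only (lon, lat, value) points inside the bounding box."""
    return [(x, y, v) for x, y, v in points
            if west <= x <= east and south <= y <= north]


def grid_mean(points, west, south, cell):
    """Average point values into square cells of size `cell` (degrees).

    Returns a dict keyed by (column, row) counted from the south-west corner.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for x, y, v in points:
        key = (int((x - west) // cell), int((y - south) // cell))
        sums[key][0] += v
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical readings: (longitude, latitude, measured value)
readings = [
    (144.1, -37.3, 10.0),
    (144.3, -37.4, 20.0),
    (146.6, -36.1, 30.0),
    (151.2, -33.9, 99.0),  # falls outside the box below
]

# Rough, illustrative bounding box around Victoria (an assumption)
WEST, SOUTH, EAST, NORTH = 141.0, -39.2, 150.0, -34.0

inside = crop(readings, WEST, SOUTH, EAST, NORTH)
grid = grid_mean(inside, WEST, SOUTH, cell=1.0)
```

Keeping even this much in a versioned script, rather than in hand-edited one-off code, is what makes the step repeatable.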
Hardcopy of data
SSH Client
MATLAB, Intrepid
Let’s map it out…
• Get handed field data
• Transform to a regular grid
• Crop data to area of interest
• Reformat data for processing
• Upload data to NCI
• Configure job and start processing
• Download results
• Visualise data
There seems to be a problem…
Reproducibility – there is none
• What was the input of your model?
• What transformations occurred?
It’s a manual process
• Time consuming
• Error prone
Expensive
• Licensing costs
The Recent Past - Our solution
Virtual Exploration Geophysics Laboratory - http://siss1.anu.edu.au/VEGL-Portal
• Provenance (“I hate this word”)
  – All input data is saved and then published with the final data product
• VEGL automates portions of the workflow
  – Allowing scientists to focus on science
• Built entirely on open source tools
  – No licensing costs
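To make the provenance idea concrete, the following is a minimal sketch of what a record saved alongside a data product might contain. It is an assumption for illustration only, not the VEGL record format: the field names, file name, and parameters are hypothetical.

```python
import hashlib
import json


def provenance_record(inputs, parameters, product_name):
    """Describe a run: what went in, how it was configured, what came out.

    `inputs` maps a file name to its raw bytes; each input is fingerprinted
    with SHA-256 so the exact data can be verified later.
    """
    return {
        "product": product_name,
        "parameters": parameters,
        "inputs": [
            {"name": name, "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(inputs.items())
        ],
    }


# Hypothetical input file and job parameters
inputs = {"gravity_subset.csv": b"lon,lat,value\n144.1,-37.3,10.0\n"}
record = provenance_record(inputs, {"cell_size_deg": 1.0}, "gravity_grid_v1")

# Serialise so the record can be published with the final data product
record_json = json.dumps(record, indent=2, sort_keys=True)
```

A record like this answers the two reproducibility questions directly: what the input of the model was, and which settings were applied to it.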
CLOUD
Data Discovery
Data Selection
Script Builder
Job Monitoring
Published provenance records
From this…
Hardcopy of data
SSH Client
MATLAB, Intrepid
• Get handed field data
• Transform to a regular grid
• Crop data to area of interest
• Reformat data for processing
• Upload data to NCI
• Configure job and start processing
• Download results
• Visualise data
…to this
Virtual Geophysics Laboratory
• Discover raw data
• Select spatial bounds
• Build “science” from existing libraries
• Run job
• Collect and publish results
VEGL – Point-of-View benefits
User:
• Data all accessible in one place
• Same science but MUCH more efficient
• Bigger scale, quicker
• Repeatability
Developer/Tech:
• Its concepts can be re-used for other scientific workflows
• It can produce actual scientific data products
• It is capable of integrating with any SISS (OGC) data provider
• The power lies with the underlying services, accessed using standardised protocols
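As an example of what "standardised protocols" can look like in practice, the sketch below builds an OGC Web Coverage Service (WCS) 1.0.0 GetCoverage request for a bounding box. The slides do not say which OGC services or versions VGL calls, so this is an assumption: the endpoint, coverage name, and output format are hypothetical placeholders, and a real server advertises its supported values in its capabilities document.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint, coverage, bbox, width, height,
                        crs="EPSG:4326", fmt="GeoTIFF"):
    """Build a WCS 1.0.0 key-value-pair GetCoverage URL.

    `bbox` is (min_x, min_y, max_x, max_y) in the units of `crs`;
    `width` and `height` give the size of the returned grid in cells.
    """
    params = {
        "service": "WCS",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "crs": crs,
        "bbox": ",".join(str(v) for v in bbox),
        "width": width,
        "height": height,
        "format": fmt,  # format names are server-dependent
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and coverage name
url = wcs_getcoverage_url(
    "https://example.org/wcs",
    "gravity_grid",
    (141.0, -39.2, 150.0, -34.0),
    width=900,
    height=520,
)
```

Because the request is plain HTTP with well-known parameters, the same client code can be pointed at any provider that implements the standard.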
NOW - Opportunities: Virtual Geophysics Laboratory
• Collaboration with multiple partners – CSIRO, GA, NCI, Monash, UQ, ANU
• Supporting multiple workflows
• Model Registry (3D)
• New scientific codes – Underworld, eScript, UBC, Airborne EM inversion codes
• New data sets from GA: National Airborne Geophysical DB, including
  – Gravity, Radiometric, AEM, Magnetics
• New use – broad application
Opportunities
• Exploiting the generic
  – Modularising the workflow for general scientific usage
  – Repurposing for other use cases: natural hazards, climate prediction, etc.
• Commercial uptake
• Integration with other VLs to achieve ultimate aim
• Supercomputing – Pawsey Centre, NCI
• Cloud – NeCTAR, NCI and commercial providers
• NeCTAR VGL funded – opportunity to use and/or repurpose the platform for other uses
Thank you. For more information:
CSIRO Earth Science & Resource Engineering
Ryan Fraser
Project Lead
Phone: +61 8 6436 8760
Email: ryan.fraser@csiro.au
Web: www.csiro.au, siss.auscope.org
http://siss1.anu.edu.au/VEGL-Portal/
https://twiki.auscope.org/wiki/Grid/VEGLPortalDevelopment