Adventures in Web Services for Large Geophysical Datasets

Joe Sirott

PMEL/NOAA



Motivation

Zonal averages of precipitation trends. From Zhang et al., Nature 448, 461-465 (26 July 2007)


Seasonal zonal averages of Arctic temperature trends. From Graversen et al., Nature 451, 53-56 (3 Jan 2008)


Use case

Calculate zonally averaged seasonal temperature trends from the 20th century climate experiment from four climate models (NASA GISS, NCAR PCM and CCSM, GFDL CM2.1, and Hadley CM3) in CMIP3 archives from 30N to 90N

Total of 81 files in 36GB

Time period of interest 1979-2000


Recipe is…

Regrid all model data to common grid

Calculate seasonal ensemble means for all models for 30N-90N, 1979-2000

Calculate zonal means from seasonal ensemble means

Calculate seasonal trends from zonal mean

Plot/download results
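
The middle steps of the recipe can be sketched in a few lines. This is a minimal illustration in pure Python, assuming the model fields have already been regridded to a common grid and reduced to one seasonal mean per year; the function names and the nested-list layout `[lat][lon]` are assumptions made for the example, not part of the original talk. In practice tools such as nco, cdo or Ferret would do this work.

```python
from statistics import mean

def ensemble_mean(fields):
    """Average several [lat][lon] grids (same shape) point by point."""
    n_lat, n_lon = len(fields[0]), len(fields[0][0])
    return [[mean(f[j][i] for f in fields) for i in range(n_lon)]
            for j in range(n_lat)]

def zonal_mean(field):
    """Average a [lat][lon] grid over longitude: one value per latitude row."""
    return [mean(row) for row in field]

def linear_trend(years, values):
    """Least-squares slope of values against years (units per year)."""
    y_bar, v_bar = mean(years), mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den

def zonal_trends(years, seasonal_fields_by_model):
    """seasonal_fields_by_model[m][t] is the [lat][lon] seasonal mean of
    model m in year years[t]. Returns one trend per latitude row."""
    zonal_by_year = []
    for t in range(len(years)):
        ens = ensemble_mean([model[t] for model in seasonal_fields_by_model])
        zonal_by_year.append(zonal_mean(ens))
    n_lat = len(zonal_by_year[0])
    return [linear_trend(years, [z[j] for z in zonal_by_year])
            for j in range(n_lat)]
```

Calling `zonal_trends` once per season gives the seasonal trend profiles; the result is a handful of numbers per latitude, which is why the final product is only a few kilobytes.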


Traditional approach

Find datasets/variables of interest

Download individual data files or subset with OPeNDAP

Analyze data locally
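
Subsetting with OPeNDAP means appending a constraint expression to the dataset URL, with one `[start:stride:stop]` index range per dimension. The sketch below builds such a URL; the server address, the variable name `tas` and the index values are hypothetical, chosen only to illustrate the syntax.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range; both start and stop are inclusive."""
    return f"[{start}:{stride}:{stop}]"

def opendap_subset_url(dataset_url, variable, index_ranges, suffix="dods"):
    """Build a subset request for one variable.

    index_ranges holds one (start, stop) or (start, stop, stride) tuple
    per dimension, in the order the variable declares its dimensions.
    """
    constraint = variable + "".join(hyperslab(*r) for r in index_ranges)
    return f"{dataset_url}.{suffix}?{constraint}"

# Hypothetical dataset: time x lat x lon, keeping only the northern rows.
url = opendap_subset_url(
    "http://example.org/opendap/model.nc",  # placeholder server and file
    "tas",                                  # placeholder variable name
    [(0, 263), (60, 89), (0, 143)],
)
```

Each of the 81 files needs its own request of this kind, which is part of what makes the traditional approach tedious.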


Problems with traditional approach

Awkward user interface(s)

Obscure UI naming conventions make it difficult to find variables of interest

Datasets often aren’t aggregated

Subsetting and/or aggregation services often fail with large datasets (e.g. out of memory errors)

Requires download of 36GB of data (file download) or ~2.5GB (OPeNDAP) for a final product of ~5KB


More modern approach

Aggregated data

Spatial or temporal subsetting

Meaningful variable and dataset names

Modern Web UI


Mandatory product plug


Dapper (dapper.pmel.noaa.gov/dapper)

Web server that provides distributed access to in-situ or gridded data via OPeNDAP protocol

Aggregates local files, or remote datasets via HTTP or OPeNDAP

Streams data (no more “out of memory” errors)
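
The idea behind streaming an aggregation can be shown with generators: values are pulled from each source in turn and passed on in small chunks, so memory use depends on the chunk size rather than the dataset size. This is a generic sketch of the technique, not Dapper's actual implementation; the function names are invented for the example.

```python
def stream_aggregated(sources, chunk_size):
    """Yield lists of at most chunk_size values, reading the sources
    (any iterables, e.g. one per file) one after another."""
    buffer = []
    for source in sources:
        for value in source:
            buffer.append(value)
            if len(buffer) == chunk_size:
                yield buffer
                buffer = []
    if buffer:  # final partial chunk
        yield buffer

def streamed_mean(chunks):
    """Mean over all chunks, holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count
```

A server built this way can write each chunk to the response as soon as it is ready instead of assembling the whole aggregated array first.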


DChart (dapper.pmel.noaa.gov)

Browser based tool for visualizing or downloading in-situ or gridded ocean or atmospheric data

Also aggregates data

AJAX based user interface

Access to ~3.5 TB of gridded data

Configurable UI


What’s missing?

Still requires download of ~2.5GB for a final product of ~5KB

Lots of clicking to download multiple datasets

BIG problem for AR5 data needs (>1PB)


Missing piece


Ideal analysis environment (scientist perspective)

Highly interactive (i.e. command line)

Scripting in familiar language of choice (bash, Python, Ruby, Matlab)

Access to multiple tools (Matlab, nco, cdo, GrADS, Ferret, gdal, …)

Access to custom home-grown tools

Storage of intermediate products (anomalies, statistics, etc.)


Limitations of Web services

Users locked in to backend analysis software

Difficult to debug

Steep learning curve

How to handle long-lived operations?

Security problems

No (or limited) scripting capabilities

Not interactive


A cloud computing alternative

Upload data to cloud

Move computation to data

Boot VM preloaded with common analysis tools

Users can customize (and share) VM images and data

Users have full ssh access to Xen VM(s) running Linux with local access to data stored in cloud


Amazon AWS

Amazon EC2
– Uses customizable Linux Xen image
– Start 1-100 hosts in parallel
– $0.10/instance-hour

Amazon S3
– Data storage service
– $0.15/GB/month for storage
– Data transfer in $0.10/GB
– Data transfer out $0.18/GB
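
With these prices the cost of the use case is easy to estimate. The sketch below uses only the four rates listed on this slide; the function name, the number of instance-hours and the storage duration are assumptions for illustration, and any charges not listed on the slide are ignored.

```python
# Rates as listed on the slide (US dollars).
EC2_PER_INSTANCE_HOUR = 0.10
S3_STORAGE_PER_GB_MONTH = 0.15
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.18

def estimate_cost(data_gb, months_stored, instance_hours, download_gb):
    """Break a cloud analysis run down into its four billed components."""
    return {
        "upload": data_gb * TRANSFER_IN_PER_GB,
        "storage": data_gb * S3_STORAGE_PER_GB_MONTH * months_stored,
        "compute": instance_hours * EC2_PER_INSTANCE_HOUR,
        "download": download_gb * TRANSFER_OUT_PER_GB,
    }

# 36GB archive, one month of storage, an assumed 10 instance-hours of
# analysis, and a ~5KB final product downloaded at the end.
costs = estimate_cost(data_gb=36, months_stored=1,
                      instance_hours=10, download_gb=5e-6)
total = sum(costs.values())
```

For these inputs the upload comes to $3.60, a month of storage to $5.40 and the assumed compute time to $1.00, while downloading the ~5KB result costs effectively nothing. The one-time upload can be shared by every user who later analyzes the same data.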


Cloud analysis architecture


Sample workflow (free version)

1. User authenticated via Web UI
2. EC2 instance booted with OPeNDAP access to datasets (stored on S3 or EC2 volumes)
3. User rpms installed (optional)
4. ssh access to instance using ssh keypair (generated when account issued)
5. User analyzes, downloads, visualizes, ...
6. Instance restored to pool after user done (or after period of inactivity)
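
The pool behaviour in steps 2 and 6 can be modelled with a small bookkeeping class. This is a hypothetical sketch of the idea only: it makes no AWS calls, the class and method names are invented, and times are plain numbers in seconds so the logic is easy to follow.

```python
class InstancePool:
    """Tracks which pre-booted instances are free and which are in use."""

    def __init__(self, instance_ids, idle_timeout_s):
        self.free = list(instance_ids)
        self.in_use = {}  # instance id -> (user, time of last activity)
        self.idle_timeout_s = idle_timeout_s

    def checkout(self, user, now):
        """Hand a free instance to an authenticated user (step 2)."""
        if not self.free:
            raise RuntimeError("no free instances in pool")
        instance = self.free.pop(0)
        self.in_use[instance] = (user, now)
        return instance

    def touch(self, instance, now):
        """Record user activity so the instance is not treated as idle."""
        user, _ = self.in_use[instance]
        self.in_use[instance] = (user, now)

    def release(self, instance):
        """Return an instance to the pool when the user is done (step 6)."""
        del self.in_use[instance]
        self.free.append(instance)

    def reap_idle(self, now):
        """Release every instance idle for longer than the timeout."""
        idle = [i for i, (_, last) in self.in_use.items()
                if now - last > self.idle_timeout_s]
        for instance in idle:
            self.release(instance)
        return idle
```

A real service would also need to reset the instance (remove user rpms and data) before returning it to the pool, which is one of the details left to work out.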


Analysis cloud advantages

Scalable

Data lives in same network as software

No user software lock-in

Users can work in familiar environment

Security problems reduced

Interactive

Access to debugging tools

BUT

Lots of details to work out!


Questions?


More info

PMEL Dapper Server
http://dapper.pmel.noaa.gov/dapper

PMEL DChart
http://dapper.pmel.noaa.gov/dchart

Downloads, propaganda
http://www.epic.noaa.gov/epic/software/dapper/
http://www.epic.noaa.gov/epic/software/dchart/

[email protected]@noaa.gov