The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin O’Brien, Ansley Manke, Steve Du, Xiaoping Wang, Joe Mclean, Joe Sirott, Jerry Davison

Page 1: The Live Access Server (Access to observational data) Jonathan Callahan (University of Washington) Steve Hankin (NOAA/PMEL – PI) Roland Schweitzer, Kevin.

The Live Access Server (Access to observational data)

Jonathan Callahan (University of Washington)

Steve Hankin (NOAA/PMEL – PI)

Roland Schweitzer, Kevin O’Brien, Ansley Manke, Steve Du, Xiaoping Wang, Joe Mclean, Joe Sirott, Jerry Davison

Gridded vs. Observational Data

Gridded data:

• Clean

• Organized

• Labeled

• Voluminous

• Handled by machines

Observational data:

• Dirty

• Messy

• Often unlabeled or mislabeled

• Increasingly voluminous

• Previously handled by hand

Live Access Server (LAS)

• Web-based, common interface to diverse sources of climate data

• Single interface for subsetting, download, visualization and comparison

• Easy access to metadata and documentation

• Unified access to distributed data holdings

• Uniform user interface to existing back-end visualization packages

LAS Data Model

For data access, users must specify:

Dataset

Variable

4D Region ‘Constraints’
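The three items above can be pictured as a single request object. The sketch below is a hypothetical Python data structure, not the actual LAS request format: the class name `DataRequest`, the axis keys `x`, `y`, `z`, `t`, the variable name `temperature` and the rule that an omitted axis means its full extent are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

AXES = ("x", "y", "z", "t")  # longitude, latitude, depth/altitude, time


@dataclass
class DataRequest:
    """Hypothetical request: dataset + variable + 4D region constraints."""
    dataset: str
    variable: str
    # (low, high) bounds per constrained axis; an omitted axis is assumed
    # to mean "the full extent of that axis".
    region: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError for unknown axes or inverted ranges."""
        for axis, (lo, hi) in self.region.items():
            if axis not in AXES:
                raise ValueError(f"unknown axis {axis!r}")
            if lo > hi:
                raise ValueError(f"empty range on {axis}: {lo} > {hi}")


req = DataRequest("World Ocean Database 2001", "temperature",
                  {"x": (-130.0, -120.0), "y": (40.0, 50.0), "z": (0.0, 500.0)})
req.validate()  # passes; time is left unconstrained
```

Keeping the region as a per-axis mapping lets a request constrain any subset of the four axes without special cases.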

Dataset

Dataset

Variable

4D Region Constraints

Output

LAS Architecture

LAS is three-tiered.

Access to Remote Data

Ferret back end is linked with OPeNDAP

Data Server Details

Java servlet redesign

Server Side Functionality

After parsing the user request, LAS must:

Access & subset the data

Perform analysis

Create visualization

For interactive results, each task should take less than 5 seconds.
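One way to check the five-second target is to time each of the three tasks separately. This is a minimal sketch, assuming each task can be written as a function that takes the previous task's output; `run_request` and the stage names are hypothetical, not part of LAS.

```python
import time
from typing import Any, Callable, Dict, List, Sequence, Tuple

BUDGET_S = 5.0  # interactive target per task, from the slide

Stage = Tuple[str, Callable[[Any], Any]]


def run_request(stages: Sequence[Stage],
                request: Any) -> Tuple[Any, Dict[str, float], List[str]]:
    """Run the stages in order, timing each one.

    Returns the final result, the elapsed seconds per stage, and the names
    of any stages that exceeded the per-task budget.
    """
    timings: Dict[str, float] = {}
    result = request
    for name, fn in stages:
        start = time.perf_counter()
        result = fn(result)
        timings[name] = time.perf_counter() - start
    over_budget = [name for name, secs in timings.items() if secs > BUDGET_S]
    return result, timings, over_budget
```

In practice the three stages would wrap the data access, analysis and plotting back ends; timing them separately shows which one breaks the interactive budget.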

The Hard Part

After parsing the user request, LAS must:

Access & Subset the data

Perform analysis

Create Visualization

Classes of Observational Climate Data

Station time series (Eulerian)

– Oceanic

• tide gauges (1D)

• moored thermistor chains (2D)

– Atmospheric

• surface weather stations (1D)

• profilers (2D)

Classes of Observational Climate Data

Profile data

– Oceanic

• CTD casts, bottle data (ordered by cruise track, quasi-scattered)

• repeat stations (ordered by cruise track or station location)

– Atmospheric

• profilers (station-based)

• balloons (2D, quasi-Lagrangian)

Classes of Observational Climate Data

Tracks (Lagrangian)

– Oceanic

• ship underway data (surface)

• drifting buoys (surface)

• ARGO floats (surface tracks, scattered profiles)

• instrumented animals (depth)

– Atmospheric

• airplane underway data (altitude)

• balloons (altitude, quasi-stationary, quasi-profile)

Classes of Observational Climate Data

Random Scatter

– Oceanic

• surface ship observations

• profile locations

– Atmospheric

• surface weather obs
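The four classes differ mainly in which of the X, Y, Z and T coordinates vary within one platform's records. The hypothetical heuristic below makes that distinction explicit; it illustrates the taxonomy above and is not code from LAS. The record fields and the exact-equality test are assumptions, and real data would need tolerances for jitter in nominally fixed positions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Obs:
    platform_id: str  # hypothetical: station, ship, float or animal ID
    t: float          # time
    x: float          # longitude
    y: float          # latitude
    z: float          # depth or altitude


def classify(records: Sequence[Obs]) -> str:
    """Heuristically name the class of one platform's records by checking
    which coordinates vary (exact comparison; no tolerance for jitter)."""
    if len(records) <= 1:
        return "random scatter"  # an isolated point carries no structure
    varies = {axis: len({getattr(r, axis) for r in records}) > 1
              for axis in ("t", "x", "y", "z")}
    if varies["x"] or varies["y"]:
        return "track (Lagrangian)"              # the platform moves
    if varies["t"]:
        return "station time series (Eulerian)"  # fixed position, many times
    if varies["z"]:
        return "profile"                         # one time, many depths
    return "random scatter"                      # repeated identical points
```

Under this rule a moored thermistor chain (T and Z vary, X/Y fixed) is a 2D station time series, while an ARGO float (X/Y vary between surfacings) falls under tracks, matching the slides.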

Example Dataset

NOAA/NODC/OCL World Ocean Database 2001

– data collected from ocean cruises and moorings

– scattered profiles, Lagrangian drifters

– physical, chemical and biological data

– dozens (hundreds?) of variables

– > 7 million profiles (1792–present, global)

– > 10 gigabytes of data (growth accelerating every year)

Example Dataset

NOAA/NODC/OCL World Ocean Database 2001

Current access:

• Choose either temporally or spatially sorted data

• Choose year(s) or 10x10 degree box

• Choose instrument

• Retrieve data for all variables from that ‘file’

Problems:

• Cannot subset data (1 year x 1 instrument ≈ 7 Mbytes)

• Data returned in impenetrable compressed ASCII files

• Associated metadata is lost

Example Dataset

NOAA/NODC/OCL World Ocean Database 2001

Our attempt at synoptic/cross-instrument data access:

– Store data by variable

• Plan for those getting data out, not putting data in.

• What do scientific analysis and visualization packages need?

– Store data for the minimum number of disk seeks

• Memory is fast (and cheap!), disk seeks are slow.

• Multi-stage process for determining data blocks needed.

• Read excess data into memory, then winnow.
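"Read excess data into memory, then winnow" trades bandwidth for seeks: nearby wanted records are merged into one larger read, and the unwanted bytes between them are discarded in memory. A minimal sketch, assuming fixed-size records at known byte offsets; `coalesce`, `read_and_winnow` and the `max_gap` threshold are illustrative names, not part of LAS.

```python
import io
from typing import BinaryIO, Dict, List, Tuple


def coalesce(offsets: List[int], rec_size: int, max_gap: int) -> List[Tuple[int, int]]:
    """Merge wanted records into (start, length) blocks, accepting up to
    max_gap unwanted bytes between neighbours to avoid an extra seek."""
    blocks: List[Tuple[int, int]] = []
    for off in sorted(set(offsets)):
        if blocks:
            start, length = blocks[-1]
            if off - (start + length) <= max_gap:
                blocks[-1] = (start, off + rec_size - start)  # extend block
                continue
        blocks.append((off, rec_size))
    return blocks


def read_and_winnow(f: BinaryIO, offsets: List[int], rec_size: int,
                    max_gap: int) -> List[bytes]:
    """One seek per block; wanted records are sliced out in memory and
    returned in the order requested."""
    wanted: Dict[int, bytes] = {}
    for start, length in coalesce(offsets, rec_size, max_gap):
        f.seek(start)
        block = f.read(length)
        for off in offsets:
            if start <= off < start + length:
                wanted[off] = block[off - start:off - start + rec_size]
    return [wanted[off] for off in offsets]


# Example: 4-byte records at offsets 0, 8 and 40 of a 100-byte file.
f = io.BytesIO(bytes(range(100)))
print(coalesce([0, 8, 40], rec_size=4, max_gap=8))  # [(0, 12), (40, 4)]
print(read_and_winnow(f, [40, 0, 8], rec_size=4, max_gap=8))
```

Here two seeks replace three; the 4 unwanted bytes between offsets 4 and 8 are read and thrown away. Choosing `max_gap` amounts to weighing the cost of one seek against the cost of reading that many extra bytes.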

Example Dataset

NOAA/NODC/OCL World Ocean Database 2001

[Figure: grid of boxes with Longitude, Latitude and Time axes]

Step 1: synoptic meta-pointer file (0.3 MByte)

a) load synoptic meta-pointer file into memory

b) subset to extract metadata pointers

10deg x 10deg x 50 irregular timesteps = 260 Kbytes

Each grid cell = (number of profiles, pointer into NetCDF metadata file)
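The 260 Kbyte figure follows if each cell holds two 4-byte integers (an assumption; the slide gives only the total): 36 × 18 × 50 = 32,400 cells, and 32,400 × 8 bytes = 259,200 bytes. Because the whole grid fits in memory, step 1 reduces to array indexing. A minimal sketch, assuming a flat in-memory list of `(n_profiles, pointer)` pairs, longitudes in [-180, 180) and time already converted to a step index; all names are hypothetical.

```python
from typing import List, Tuple

NX, NY, NT = 36, 18, 50   # 10-degree lon boxes, 10-degree lat boxes, time steps
BYTES_PER_CELL = 2 * 4    # assumption: (n_profiles, pointer) as two int32s

Cell = Tuple[int, int]    # (number of profiles, pointer into metadata file)


def flat_index(i: int, j: int, t: int) -> int:
    """Position of box (i, j) at time step t in a flat row-major array."""
    return (t * NY + j) * NX + i


def box(lon: float, lat: float) -> Tuple[int, int]:
    """10x10 degree box containing (lon, lat); clamped so 180/90 stay in range."""
    i = min(int((lon + 180.0) // 10), NX - 1)
    j = min(int((lat + 90.0) // 10), NY - 1)
    return i, j


def subset_pointers(grid: List[Cell], lon: Tuple[float, float],
                    lat: Tuple[float, float],
                    steps: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Return (pointer, n_profiles) for every non-empty box in the request."""
    (i0, j0), (i1, j1) = box(lon[0], lat[0]), box(lon[1], lat[1])
    hits = []
    for t in range(steps[0], steps[1] + 1):
        for j in range(j0, j1 + 1):
            for i in range(i0, i1 + 1):
                n, ptr = grid[flat_index(i, j, t)]
                if n > 0:
                    hits.append((ptr, n))
    return hits
```

The sketch does not handle longitude ranges that cross the dateline; those would need to be split into two requests.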

Example Dataset

NOAA/NODC/OCL World Ocean Database 2001

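Step 2: metadata/data-pointer file (200 Mbyte)

a) read blocks of profile metadata into memory

b) subset by X/Y/T to obtain valid data pointers

Each profile's metadata record = Julian day | Lat | Lon | Cruise ID | # of levels | (Var_ptr, Var_QC) × N variables, where the first three fields give the profile's T/X/Y position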
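Step 2 can be sketched as a filter over one block of profile records. The field names follow the record layout above; the Python types, the use of -1 for a variable that was not measured and 0 for a good QC flag are assumptions for illustration, and `valid_data_pointers` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ProfileMeta:
    julian_day: float   # T
    lat: float          # Y
    lon: float          # X
    cruise_id: int
    n_levels: int
    var_ptr: List[int]  # one data-file pointer per variable; -1 = not measured (assumed)
    var_qc: List[int]   # one profile-level QC flag per variable; 0 = good (assumed)


def valid_data_pointers(block: Sequence[ProfileMeta], var: int,
                        t: Tuple[float, float], lat: Tuple[float, float],
                        lon: Tuple[float, float]) -> List[Tuple[int, int]]:
    """Return (data pointer, n_levels) for profiles inside the T/Y/X box
    that contain variable `var` with a good QC flag."""
    out = []
    for p in block:
        inside = (t[0] <= p.julian_day <= t[1]
                  and lat[0] <= p.lat <= lat[1]
                  and lon[0] <= p.lon <= lon[1])
        if inside and p.var_ptr[var] >= 0 and p.var_qc[var] == 0:
            out.append((p.var_ptr[var], p.n_levels))
    return out
```

Returning `n_levels` with each pointer tells step 3 exactly how many bytes to read for each profile.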

Example Dataset

NOAA/NODC/OCL World Ocean Database 2001

Step 3: data files (10–2000 Mbyte)

a) read profile data

b) subset by depth/quality flag to obtain valid data

Each 1D profile at a T/X/Y position = (Depth (Z), Value, Quality flag) × N depths
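Step 3 then reads one profile at a time from the data file and winnows it. A minimal sketch, assuming a binary layout of little-endian float32 depth, float32 value and uint8 quality flag per level (the real files are not described in the slides) and treating flag 0 as good; `read_profile` is a hypothetical name.

```python
import io
import struct
from typing import BinaryIO, FrozenSet, List, Tuple

# Assumed on-disk layout of one level: little-endian float32 depth (m),
# float32 value, uint8 quality flag -> 9 bytes, no padding.
LEVEL = struct.Struct("<ffB")


def read_profile(f: BinaryIO, ptr: int, n_levels: int,
                 z: Tuple[float, float],
                 good_flags: FrozenSet[int] = frozenset({0})) -> List[Tuple[float, float]]:
    """Read one profile at byte offset `ptr` and keep (depth, value) pairs
    inside the depth range whose quality flag is acceptable."""
    f.seek(ptr)
    raw = f.read(n_levels * LEVEL.size)
    return [(depth, value)
            for depth, value, flag in LEVEL.iter_unpack(raw)
            if z[0] <= depth <= z[1] and flag in good_flags]


# Example: a 3-level profile stored 16 bytes into a file.
buf = io.BytesIO(b"\x00" * 16
                 + LEVEL.pack(0.0, 15.5, 0)
                 + LEVEL.pack(10.0, 14.25, 1)   # flagged as bad
                 + LEVEL.pack(50.0, 12.0, 0))
print(read_profile(buf, ptr=16, n_levels=3, z=(0.0, 100.0)))
# [(0.0, 15.5), (50.0, 12.0)]
```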

Example Dataset

NOAA/NODC/OCL World Ocean Database 2001

Our attempt at synoptic/cross-instrument data access

Successes:

• Able to subset without accessing (much) unwanted data

• Access to (<1 Mbyte) subsets in seconds

• Access to metadata (“What profiles exist?”) even faster

Problems:

• Only set up for the most important variables

• Data cannot be updated; files must be rewritten

• Must reinvent logic for relational queries

• Funky, home-built solution

Other data streams

• METAR obs (station time series)

– 1700 US weather stations report hourly data

– 25 variables = 120 Mbytes/month

• ARGO floats (profiles)

– 4000 floats reporting profiles every 10 days

– 50 levels x 10 variables = 24 Mbytes/month

• Tagging Of Pacific Pelagics (TOPP) (Lagrangian tracks)

– 50 animals per year tagged with 1 min data recorders

– 5 variables = 0.8 Mbytes/month

• Voluntary Observing Ships (random scatter)

– 3000 surface ship reports per day

– 25 variables = 9 Mbytes/month
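The METAR, ARGO and VOS monthly figures can be reproduced by assuming 4 bytes per value and a 30-day month (both are assumptions; the slide does not show its arithmetic). The hypothetical helper below does the multiplication.

```python
def mbytes_per_month(records_per_month: float, values_per_record: int,
                     bytes_per_value: int = 4) -> float:
    """Raw monthly volume in megabytes (10**6 bytes), ignoring metadata
    and compression; 4-byte values are an assumption."""
    return records_per_month * values_per_record * bytes_per_value / 1e6


DAYS = 30  # assumed month length

streams = {
    # name: (records per month, values per record)
    "METAR": (1700 * 24 * DAYS, 25),      # hourly reports from 1700 stations
    "ARGO": (4000 * DAYS / 10, 50 * 10),  # one profile per float every 10 days
    "VOS": (3000 * DAYS, 25),             # 3000 ship reports per day
}
for name, (records, values) in streams.items():
    print(f"{name}: {mbytes_per_month(records, values):.1f} Mbytes/month")
# METAR: 122.4 Mbytes/month
# ARGO: 24.0 Mbytes/month
# VOS: 9.0 Mbytes/month
```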

Observational Data Access Requirements

• Subset based on X, Y, Z, T or metadata (e.g. quality flag or station/ship/platform/animal_ID).

• Only return requested data. (Reduced volume for remote data access.)

• For near-real-time, daily updates are acceptable. (Can recreate static files on a daily basis if necessary.)

• Use standards wherever possible.

• Make the creation of the database as simple as possible. (Non-experts can follow cookbook examples.)

Conclusion

• Efficient access to observational data is an unsolved problem.

• Data volumes are increasing exponentially.

• Data access problems hinder the development of interactive visualization tools.