New Resources in the Research Data Archive

New Resources in the Research Data Archive Doug Schuster


Page 1: New Resources in the Research Data Archive

New Resources in the Research Data Archive

Doug Schuster

Page 2: New Resources in the Research Data Archive

Topic Outline

• New Resources
– Search/Discovery and Data Delivery
– TIGGE
– JRA-25
• Routine Updates

Page 3: New Resources in the Research Data Archive

Data Search, Discovery, and Delivery

• Popular Datasets
• Google Style Search
• Drill Down Style Search
• File Level Metadata

Example: Search for model-generated tropical cyclone track data using “Drill Down” method.

Page 4: New Resources in the Research Data Archive

Data Search, Discovery, and Delivery

Page 5: New Resources in the Research Data Archive

Data Search, Discovery, and Delivery (Drill Down)

Page 6: New Resources in the Research Data Archive

Data Search, Discovery, and Delivery (Drill Down)

Page 7: New Resources in the Research Data Archive

Data Search, Discovery, and Delivery (File Level Metadata)

Page 8: New Resources in the Research Data Archive

Background on TIGGE

• WMO World Weather Research Programme THORPEX
– THe Observing system Research and Predictability EXperiment
– THORPEX Interactive Global Grand Ensemble (TIGGE) Archive supports research
  • Grand Ensemble = multiple NWP centers' ensembles are combined (an ensemble of ensembles)
  • 10 international NWP centers contributing to TIGGE

Page 9: New Resources in the Research Data Archive

Background on TIGGE

Three mirrored archive centers
• NCAR
• ECMWF
• CMA

{Shared System Development!}

• Daily Data Flow Metrics
– 245 GB
– 1.6 million gridded fields as separate data packets
– 3000+ files/day
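The daily flow figures above imply rough average sizes per field and per file. The sketch below only does that arithmetic; it assumes decimal units (1 GB = 10^9 bytes) and treats "3000+" as exactly 3000, so the per-file numbers are upper bounds.

```python
# Figures taken from the slide; decimal units are an assumption.
DAILY_VOLUME_BYTES = 245 * 10**9
DAILY_FIELDS = 1_600_000
DAILY_FILES = 3000  # the slide says "3000+", so this is a lower bound

avg_field_kb = DAILY_VOLUME_BYTES / DAILY_FIELDS / 10**3
avg_file_mb = DAILY_VOLUME_BYTES / DAILY_FILES / 10**6
fields_per_file = DAILY_FIELDS / DAILY_FILES

print(f"average field size : {avg_field_kb:.0f} KB")
print(f"average file size  : {avg_file_mb:.0f} MB (at most)")
print(f"fields per file    : {fields_per_file:.0f} (at most)")
```

On these assumptions a single gridded field averages about 150 KB, which is why the archive handles fields as small separate packets and collates them into much larger files.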

Page 10: New Resources in the Research Data Archive

Data Receipt

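[Diagram of data flow between current data providers and archive centres. Legend: Archive Centre, Current Data Provider. Transfer methods shown: IDD/LDM, HTTP, FTP.]

Centres shown: NCAR, NCEP, CMC, UKMO, ECMWF, MeteoFrance, JMA, KMA, CMA, BoM, CPTEC, NCDC

Unidata IDD/LDM
– Internet Data Distribution / Local Data Manager
– Commodity internet application to send and receive data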

Page 11: New Resources in the Research Data Archive

Archive Summary

• Online Data
– Period, most recent two weeks
– ~4 TB, public products
– ~2 TB, data preparation, subsetting, DB

• Offline Data
– Full period of record
– ~200 TB, NCAR MSS system
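As a quick consistency check, the online capacity can be divided by the daily inflow of 245 GB quoted on the TIGGE background slide. This is a back-of-the-envelope sketch that assumes decimal units and a constant daily volume; the function name is illustrative.

```python
def days_of_coverage(capacity_tb: float, daily_inflow_gb: float) -> float:
    """Days of data that fit in the given capacity at a constant daily inflow."""
    return capacity_tb * 1000 / daily_inflow_gb

# ~4 TB of public online products, 245 GB arriving per day (both from the slides).
online_days = days_of_coverage(4, 245)
print(f"online coverage: about {online_days:.1f} days")
```

The result is a little over 16 days, which agrees with the "most recent two weeks" online period.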

Page 12: New Resources in the Research Data Archive

Major Challenges

• Ensure data receipt, build complete archive
– Exchange manifest files as part of IDD/LDM data transmission between archive centers
– Verify send, receive
– Automated resend requests for missing fields
• Collate data fields into different file types
• Harvest and hold metadata in MySQL DBs
– Identify location of every field in file set
– Updated often
– Critical for user interface and background data processing
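The manifest check described above can be sketched as a comparison between the fields a sender lists and the fields that actually arrived. This is a minimal illustration, not the TIGGE implementation: the `FieldId` keys, the function names, and the shape of the resend request are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldId:
    """Hypothetical key identifying one gridded field in a transmission."""
    center: str
    parameter: str
    member: int
    step_hours: int

def missing_fields(manifest, received):
    """Return manifest entries that never arrived, in manifest order."""
    got = set(received)
    return [field for field in manifest if field not in got]

def build_resend_request(manifest, received):
    """Summarize the check: complete flag plus the fields to ask for again."""
    missing = missing_fields(manifest, received)
    return {"complete": not missing, "resend": missing}

# Made-up example: three fields announced, one lost in transit.
manifest = [
    FieldId("center_a", "t2m", 0, 6),
    FieldId("center_a", "t2m", 1, 6),
    FieldId("center_a", "gh", 0, 6),
]
received = [manifest[0], manifest[2]]
request = build_resend_request(manifest, received)
print(request["complete"], len(request["resend"]))
```

Running the check on every transmission and sending the `resend` list back to the provider is one way to automate the resend step.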

Page 13: New Resources in the Research Data Archive

Major Challenges

• Access system must accurately display what common parameters are available as users make selections
– Driven by multi-center research (Grand Ensemble)
– Parameters vary between centers
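One simple way to compute the common parameters for a selection is a set intersection over the chosen centers. The sketch below assumes the availability table has already been loaded from the metadata database; the center and parameter names are made up and do not describe any real provider.

```python
def common_parameters(available, selected_centers):
    """Parameters offered by every selected center (empty set if none selected)."""
    sets = [set(available[center]) for center in selected_centers]
    return set.intersection(*sets) if sets else set()

# Hypothetical availability table: center -> parameters it provides.
available = {
    "center_a": {"t2m", "gh", "msl", "tp"},
    "center_b": {"t2m", "gh", "msl"},
    "center_c": {"t2m", "gh"},
}

two = sorted(common_parameters(available, ["center_a", "center_b"]))
three = sorted(common_parameters(available, ["center_a", "center_b", "center_c"]))
print(two)
print(three)
```

Each center added to the selection can only shrink the list, so the interface has to recompute it after every change.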

Page 14: New Resources in the Research Data Archive

Variance between centers

[Three charts comparing the data providers ECMWF, UKMO, JMA, NCEP, CMA, CMC, BOM, MF, KMA, and CPTEC:]
– Spatial Resolution: number of data providers at each model resolution (N200, N128, 0.56x0.56, 1.00x1.00, 1.25x0.83, 1.25x1.25, 1.50x1.50)
– Conforming parameters and ensemble members (# fields, # ensemble members) per provider
– Forecast Length, Initialization: forecast length (days) and forecasts/day per provider

Page 15: New Resources in the Research Data Archive

Get Forecast Data

NCAR online file archive

• Selection options (Portal or RDA)
– Center(s)
– Date
– File type (sl, pl, etc.)
– Initialization time
– Forecast length
• Download options
– Point and click using browser, one file at a time
– Script to run on local machine
  • User and password encrypted ‘wget’ commands
  • Background process to access all files

User customized files

• Selection options (Portal)
– Same as for files, plus
– Parameter subsets
– Grid interpolation
– Spatial subsets
– Formats: GRIB2, NetCDF

Delayed Mode
Real Time

Two User Interfaces

Page 16: New Resources in the Research Data Archive

User access selection demonstration

Animation, what you will see
– Multiple centers
  • (ECMWF, UKMO, NCEP, CMA, CMC, KMA)
– Fields/Parameters
  • (Geopotential Height, 2m Temperature)
– Levels
  • (500 hPa, Single Level)
– Spatial and temporal ranges
  • (Global, 3-days, 12Z initializations, 48 hour forecasts)
– Regridding to common spatial resolution
  • (1.5°)
– Output format
  • (netCDF)
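The selections in the demonstration can be written down as a small data structure, which makes the size of the request easier to see. This is only an illustrative sketch: `SubsetRequest` and its field names are hypothetical and are not the portal's actual request format, and pairing Geopotential Height with 500 hPa and 2m Temperature with Single Level is an assumption.

```python
from dataclasses import dataclass

@dataclass
class SubsetRequest:
    """Hypothetical container for the choices shown in the demonstration."""
    centers: list[str]
    fields: list[tuple[str, str]]  # (parameter, level) pairs
    region: str
    n_days: int                    # one initialization per day
    init_hour_utc: int
    forecast_hours: int
    grid_degrees: float
    output_format: str

    def n_selections(self) -> int:
        """Center x field x initialization combinations, before ensemble members."""
        return len(self.centers) * len(self.fields) * self.n_days

request = SubsetRequest(
    centers=["ECMWF", "UKMO", "NCEP", "CMA", "CMC", "KMA"],
    fields=[("Geopotential Height", "500 hPa"), ("2m Temperature", "Single Level")],
    region="Global",
    n_days=3,
    init_hour_utc=12,
    forecast_hours=48,
    grid_degrees=1.5,
    output_format="netCDF",
)
print(request.n_selections())
```

Each of these combinations is then multiplied by the number of ensemble members and forecast steps the chosen centers provide, which the sketch leaves out because those counts vary by center.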

Page 17: New Resources in the Research Data Archive

Sample Data Request for an Event

Page 18: New Resources in the Research Data Archive
Page 19: New Resources in the Research Data Archive
Page 20: New Resources in the Research Data Archive
Page 21: New Resources in the Research Data Archive

Retrieve Completed Subset

Page 22: New Resources in the Research Data Archive
Page 23: New Resources in the Research Data Archive

Subset Request Animation

Page 24: New Resources in the Research Data Archive

Gustav/Hanna Animation

Page 25: New Resources in the Research Data Archive

Features of JRA-25/JCDAS at NCAR

• All data available through web/RDA portal and NCAR MSS, 11 TB
• Available dates, 1979 through 2007
• 23 different data products
– 4 x daily, GRIB1 format
– Monthly mean, netCDF (NCAR derived from binary) format
• All data users are registered and must agree to JMA’s ‘Condition of Use’

Page 26: New Resources in the Research Data Archive

Typhoon Sepat, 16 August 2007

Images courtesy Dave Stepaniak

Page 27: New Resources in the Research Data Archive

Routine Updates

• NCEP
– FNL Global Tropospheric Analysis (Daily)
– BUFR/PREPBUFR obs. data (Weekly)
• Unidata IDD data (Daily)
– NetCDF format obs collected from GTS
– IDD model data (GRIB-2)
  • GFS
  • NAM
  • RUC

Page 28: New Resources in the Research Data Archive

Routine Updates

• SST
– NCEP OI Global SST 1x1 Deg (weekly)
– NOAA OI Global 0.25 x 0.25 SST (monthly)
– Hadley Centre Global Sea Ice and SST (monthly)
• Reanalysis
– NNR Yearly updates
– NARR Yearly updates
– JRA-25

Page 29: New Resources in the Research Data Archive

Questions?

Page 30: New Resources in the Research Data Archive

Lessons Learned

• Manifest files and automated resend are critical for a complete archive
• The impact of different contributions from the NWP centers across the archive should not be underestimated
• There are important design considerations to ensure prompt browser interactions
– Caching data from the DB
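Caching database lookups, as mentioned in the last point, can be sketched with a memoized query function. The example below uses an in-memory SQLite table as a stand-in for the MySQL metadata database; the table layout, the function name, and the sample rows are all assumptions.

```python
import sqlite3
from functools import lru_cache

# In-memory SQLite stands in for the metadata DB (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fields (center TEXT, parameter TEXT)")
conn.executemany(
    "INSERT INTO fields VALUES (?, ?)",
    [("center_a", "t2m"), ("center_a", "gh"), ("center_b", "t2m")],
)

stats = {"queries": 0}  # counts how often the database is actually hit

@lru_cache(maxsize=128)
def parameters_for(center: str) -> tuple:
    """Return the parameters held for one center, querying the DB only on a miss."""
    stats["queries"] += 1
    rows = conn.execute(
        "SELECT DISTINCT parameter FROM fields WHERE center = ? ORDER BY parameter",
        (center,),
    ).fetchall()
    return tuple(row[0] for row in rows)

first = parameters_for("center_a")
second = parameters_for("center_a")  # served from the cache
print(first, stats["queries"])
```

Because the metadata is updated often, a cache like this has to be invalidated after each update, for example with `parameters_for.cache_clear()`.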

Page 31: New Resources in the Research Data Archive

Lessons Learned

• Computational resource requirements ramp up quickly with multi-dimensional problems
– Dimensions: center, ensemble member, parameter, forecast length, etc.
• Archive file structure choices greatly impact subsetting ability
– TIGGE currently based on synoptic order
– Time-series by parameter could be better?
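The effect of file structure on subsetting can be illustrated by counting how many files a request has to open under each layout. This is a deliberately simplified sketch: it assumes one file per initialization time for synoptic order and one file per parameter for time-series order, and the file names and dates are hypothetical. Real archive files are also split by center and file type.

```python
from itertools import product

def synoptic_file(init_time: str, parameter: str) -> str:
    """Synoptic order: one file per initialization time, all parameters inside."""
    return f"{init_time}.grib2"

def timeseries_file(init_time: str, parameter: str) -> str:
    """Time-series order: one file per parameter, all times inside."""
    return f"{parameter}.grib2"

def files_touched(layout, init_times, parameters):
    """Set of distinct files needed to satisfy a request under a given layout."""
    return {layout(t, p) for t, p in product(init_times, parameters)}

# Hypothetical request: one parameter over seven daily 12Z initializations.
init_times = [f"2008-09-{day:02d}T12" for day in range(1, 8)]
synoptic_count = len(files_touched(synoptic_file, init_times, ["t2m"]))
timeseries_count = len(files_touched(timeseries_file, init_times, ["t2m"]))
print(synoptic_count, timeseries_count)
```

The comparison reverses for a request that wants many parameters at a single time, so the better layout depends on which kind of request dominates.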

Page 32: New Resources in the Research Data Archive

Major Challenges

• Limited online storage: 4 TB, ≅ 2 weeks temporal coverage
– Full archive on NCAR Mass Storage System
• User registration and metrics required
– Accept data policy; for research and education only
• 48 hour delay from forecast initialization time