Gridded Data Sub-setting Services through the RDA at NCAR


Gridded Data Sub-setting Services through the RDA at NCAR. Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak. Topics: Research Data Archive (RDA) Overview, Problem Background, Required Infrastructure, Current Services.

Transcript of Gridded Data Sub-setting Services through the RDA at NCAR

Page 1: Gridded Data Sub-setting Services through the RDA at NCAR

Page 2: Gridded Data Sub-setting Services through the RDA at NCAR

Gridded Data Sub-setting Services through the RDA at NCAR

Doug Schuster, Steve Worley, Bob Dattore, Dave Stepaniak

Page 3: Gridded Data Sub-setting Services through the RDA at NCAR

Gridded Data Sub-setting Services Through the RDA at NCAR

• Research Data Archive (RDA) Overview
• Problem Background
• Required Infrastructure
• Current Services
• Future Directions

Page 4: Gridded Data Sub-setting Services through the RDA at NCAR

RDA Overview

• Total archive volume over 1.3 PB
• 8000+ unique users annually

Meteorological and Oceanographic Observations

Operational and Reanalysis model outputs

Remote Sensing Observations

Topography/Bathymetry, Vegetation, Land Use

Page 5: Gridded Data Sub-setting Services through the RDA at NCAR

Problem Background

[Figure: Reanalysis Data Volume. Chart of data volume (TB), axis from 0 to 80, for NNR (1996), ERA-40 (2003), NARR (2004), JRA-25 (2006), ERA-I (2011), and CFSR (2011).]

Page 6: Gridded Data Sub-setting Services through the RDA at NCAR

Problem Background

• Large computational/storage resources needed
  – Store data
  – Extract desired data from large grids/files
  – Convert data to desirable format(s)

Scientific data centers have these resources

Individual researchers generally don’t

Page 7: Gridded Data Sub-setting Services through the RDA at NCAR

Problem Background

• Goals
  – Make data more accessible and easier to use for individual researchers
    • Reasonable access volumes
    • Desired data formats
    • User defined parameters/grids
• Researchers stay focused on research

Page 8: Gridded Data Sub-setting Services through the RDA at NCAR

Required Infrastructure

• Powerful Computing (NCAR HPC/DAV)
• Large Disk Storage (500 TB)
• Rich and Detailed Metadata Databases (RDADB)
• Generalized Software Tools
  – Control system (RDAMS)
  – Sub-setting
  – Format conversion
• Web Interface
• Command Line Interface

Page 9: Gridded Data Sub-setting Services through the RDA at NCAR

Required Infrastructure

• Rich Metadata Databases (key ingredient)

Metadata DB

File attribute metadata: Name, Dataset, Location, Format

File content metadata: T(C,D,T,L,L), RH(C,D,T,L,L), Vort(C,D,T,L,L), Vis(C,D,T,L,L), PcpR(C,D,T,L,L)

Drive Interfaces

Support Efficient Backend Processing

Provide Scalability
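
The role of the metadata database in backend processing can be illustrated with a minimal sketch. It assumes a catalogue in which every file carries attribute metadata (name, dataset, location, format) and a summary of the parameters it contains, so a request can be matched to files without opening any of them. The `FileRecord` class, the `files_for_request` function, and the catalogue entries are hypothetical and are not taken from RDADB.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileRecord:
    """One archived file: attribute metadata plus a content summary."""
    name: str          # file attribute metadata
    dataset: str
    location: str
    data_format: str
    parameters: frozenset = field(default_factory=frozenset)  # content metadata


def files_for_request(records, dataset, wanted):
    """Return names of files in `dataset` holding at least one wanted parameter."""
    wanted = set(wanted)
    return [r.name for r in records
            if r.dataset == dataset and r.parameters & wanted]


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    FileRecord("a.grb", "ds-example", "/archive/a.grb", "GRIB", frozenset({"T", "RH"})),
    FileRecord("b.grb", "ds-example", "/archive/b.grb", "GRIB", frozenset({"Vort"})),
    FileRecord("c.grb", "ds-other", "/archive/c.grb", "GRIB", frozenset({"T"})),
]

# Only files that can contribute to the request are selected for processing.
print(files_for_request(CATALOGUE, "ds-example", ["T", "Vis"]))
```

Because the lookup touches only the catalogue, the cost of planning a request depends on the number of metadata records rather than on the volume of archived data.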

Page 10: Gridded Data Sub-setting Services through the RDA at NCAR

Current Services

• Sub-setting available on 13 datasets
  – ERA-I, CFSR, Operational Model, EaSM
  – Also available on select observation sets
• Sub-setting options
  – Parameter selection
  – Spatial region selection (limited availability)
• Available output formats
  – Native GRIB formats
  – NetCDF format
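
The two sub-setting options listed above, parameter selection and spatial region selection, can be sketched on a small in-memory grid. This is an illustration only: the `subset_grid` function, the dictionary-of-lists layout, and the sample values are assumptions and do not describe the RDA software, which works on GRIB and NetCDF files.

```python
def subset_grid(fields, lats, lons, parameters, south, north, west, east):
    """Keep only the requested parameters and the grid points inside the box.

    fields: dict mapping parameter name -> 2-D list indexed [lat][lon].
    Assumes the box does not cross the longitude wrap-around.
    """
    rows = [i for i, lat in enumerate(lats) if south <= lat <= north]
    cols = [j for j, lon in enumerate(lons) if west <= lon <= east]
    subset = {
        name: [[fields[name][i][j] for j in cols] for i in rows]
        for name in parameters
    }
    return subset, [lats[i] for i in rows], [lons[j] for j in cols]


# Hypothetical 4 x 3 grid with two parameters.
lats = [30.0, 35.0, 40.0, 45.0]
lons = [250.0, 255.0, 260.0]
fields = {
    "T": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    "RH": [[0] * 3 for _ in range(4)],
}

# Request one parameter over a 2 x 2 box; "RH" is dropped entirely.
sub, sub_lats, sub_lons = subset_grid(fields, lats, lons, ["T"], 35, 40, 255, 260)
print(sub, sub_lats, sub_lons)
```

In this toy case the request returns 4 of the 24 stored values, which is the effect the service aims for: the user downloads only the parameters and region needed.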

Page 11: Gridded Data Sub-setting Services through the RDA at NCAR

Current Services

Page 12: Gridded Data Sub-setting Services through the RDA at NCAR

Current Services

• Sub-set requests
• Processed in delayed mode
• User notified by email when request is ready
• Download data via server-provided wget scripts
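
The delayed-mode workflow described above (submit, process later, notify when ready) can be sketched as a simple queue. The `SubsetQueue` class, its status names, and the `notify` callback that stands in for the email step are hypothetical; the slides do not describe how the RDA control system is implemented.

```python
from collections import deque


class SubsetQueue:
    """Toy delayed-mode queue: submit now, process later, notify when ready."""

    def __init__(self, notify):
        self._pending = deque()
        self._status = {}
        self._notify = notify      # callback standing in for the email step
        self._next_id = 1

    def submit(self, user, request):
        """Record the request and return its id without doing any work yet."""
        request_id = self._next_id
        self._next_id += 1
        self._status[request_id] = "queued"
        self._pending.append((request_id, user, request))
        return request_id

    def status(self, request_id):
        return self._status[request_id]

    def process_next(self, run_subset):
        """Run the oldest queued request; return its id, or None if idle."""
        if not self._pending:
            return None
        request_id, user, request = self._pending.popleft()
        self._status[request_id] = "processing"
        run_subset(request)
        self._status[request_id] = "ready"
        self._notify(user, request_id)
        return request_id


sent = []
queue = SubsetQueue(notify=lambda user, rid: sent.append((user, rid)))
rid = queue.submit("user@example.org", {"parameters": ["T"]})
queue.process_next(run_subset=lambda request: None)
print(queue.status(rid), sent)
```

Separating submission from processing lets the user leave immediately after making a request, while the backend works through large extractions at its own pace.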

Page 13: Gridded Data Sub-setting Services through the RDA at NCAR

Current Services

[Figure: Number of Unique Users and Requests for 2011 Gridded Sub-Sets. Series: # Unique Users, # Requests. x-axis: Month (2011), Aug to Dec; y-axis: Count, 0 to 1000.]

Page 14: Gridded Data Sub-setting Services through the RDA at NCAR

Current Services

[Figure: Volume of Data Accessed and Output for 2011 Gridded Sub-Set Requests. Series: Data Accessed, Data Output. x-axis: Month (2011), Aug to Dec; y-axis: Volume (TB), 0 to 300.]

Page 15: Gridded Data Sub-setting Services through the RDA at NCAR

Future Directions

• Spatial Interpolation
• Faster Request Processing (NWSC)
• Include More RDA Datasets
• Improved Access Portals
• Additional Output Formats
• Web Service Access

Page 16: Gridded Data Sub-setting Services through the RDA at NCAR

Summary

• Data Analysis Research Challenges
  – Large and Growing Data Volumes
  – Numerous Formats
• RDA
  – Supply “User Friendly” Data
  – Parameter and Spatial Sub-Setting
  – Format Conversion
  – Improved and Additional Services
