RDA Data Support Section. Topics 1.What is it? 2.Who cares? 3.Why does the RDA need CISL? 4.What is...

RDA Data Support Section



Page 1: RDA Data Support Section. Topics 1.What is it? 2.Who cares? 3.Why does the RDA need CISL? 4.What is on the horizon?

RDA

Data Support Section

Page 2

Topics

1. What is it?
2. Who cares?
3. Why does the RDA need CISL?
4. What is on the horizon?

Page 3

1. What is it?

Research Data Archive (RDA)
- 600+ datasets that are significant to many NCAR and University scientists
- Archive work began over 40 years ago
- Branded as RDA in 2003
- Generally, focused on atmospheric and oceanic environmental measurements or analyzed products derived from them
- Critical data for weather and climate studies

Page 4

Who cares?

Growth in user access via the web, 2001-08
• Promoted with more online data and better interfaces

Consistent user access from the MSS
• Represents provision to NCAR computers

26-year record for filling one-off data requests
• Decreasing as web increases in recent years

Over 6000 Unique Users in 2008

Page 5

Why does RDA need CISL?

Rely heavily on CISL infrastructure and experts:
- Secure and reliable MSS/HPSS storage
- Disk to support web services
- Networks to bring data in and distribute it out to users
- Computing platforms to prepare and serve the RDA
- DSS is geoscience-educated; need technical advice/support

Current metrics
- Storage:
  - Primary – 400+ TB, 4+M files
  - All – 800+ TB (backup/working/etc.)
- Disk: 40 TB on SAN
- Servers and laptops:
  - Servers (8), mix of SunOS & Linux
  - About 12 laptops/desktops

Data movement and growth

Page 6

Complete User Community

Pros:
- Fast access to online data.
- Access to all RDA content metadata.
- Access to RDA data processing services.

Cons:
- Slow access to offline data.
- Have to create a separate RDA account and log in.
- Data processing requests take a long time to finish.
- Slow download speeds for some users.

HPC User Community

Pros:
- Access to full RDA.
- Fast computing.
- No login required.

Cons:
- No access to online data.
- Forced to use MSS as a file server; access is too slow.
- No direct access to RDA metadata.
- No direct access to RDA data processing services.

Page 7

Complete User Community

Improvements:
- Fast access to full RDA.
- Expanded data processing services available.
- Faster turnaround on data processing requests.
- No need for separate RDA user account. Authenticate through Kerberos?
- Faster download speeds (future tools with proper data usage authorization – GridFTP, etc…).
- Consistent “first point of contact” for user support?

HPC User Community

Improvements:
- Fast access to full RDA.
- Access to all RDA content metadata.
- Access to RDA data processing services.
- No need for separate RDA user account.
- Consistent “first point of contact” for user support?

Page 8

What is on the horizon?

- Transition off all SunOS to Linux
- Move SAN storage to GPFS GLADE
- Put more data online in GLADE (O 130TB)
- Fast access path internal and external
- Transition ALL RDA from MSS to HPSS
- Implement more on demand products
- Data extraction and computing across TB datasets
- Must be successful in GLADE, with HPSS, and using a scalable DA compute environment

Page 9

Questions

1. What is it?
2. Who cares?
3. Why does the RDA need CISL?
4. What is on the horizon?