The LEAD Effort at Unidata

The Unidata Seminar will start at 1:30 PM MST.

Tom Baltzer, Brian Kelly, Doug Lindholm, Anne Wilson

December 14, 2005

LEAD is funded by the National Science Foundation under the following Cooperative Agreements:

ATM-0331594, ATM-0331591, ATM-0331574, ATM-0331480, ATM-0331579, ATM-0331586, ATM-0331587, ATM-0331578

Outline

1. Setting the Stage: Introduction to LEAD and Unidata’s LEAD Efforts: Anne

2. Application of current technology on the LEAD testbeds: Tom

3. The LEAD Hardware at Unidata: Brian

4. The THREDDS Data Repository: Doug

Setting the Stage: Introduction to LEAD and Unidata’s LEAD Efforts

Anne Wilson

Current IT Barriers to Mesoscale Weather Research and Education

• Data and tools useable mainly by experts

• Researchers and educators constrained by hardware limitations

• Rigid, brittle technology can’t accommodate mesoscale weather research requirements:

– Real time, on demand, dynamic data processing and sensor steering

A Solution: Linked Environments for Atmospheric Discovery (LEAD)

• Funded by NSF Large Information Technology Research (ITR) award

• Produce a web service based, scalable framework for handling meteorological data and model output:

– Identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, visualizing

– Independent of data format and physical location

• Dynamically adaptive workflows and steering of sensors

The LEAD Vision

• Data access via querying and browsing

• Analysis and forecast tools that can be composed into workflows

• Workflows and sensors that respond to the weather

• Support users ranging from grade 6 to experienced researchers

LEAD Objectives

• Lower the barrier for entry and increase the sophistication of problems that can be addressed by complex end-to-end weather analysis and forecasting/simulation tools

• Improve our understanding of and ability to detect, analyze and predict mesoscale atmospheric phenomena by interacting with weather in a dynamically adaptive manner

• Result: Paradigm change in how experiments are conceived and performed

LEAD Challenges

• Challenge: Disparate, high volume data sets

– Requirements: Efficient transmission, remote subsetting and aggregation, reliable, robust storage, format independence

• Challenge: Huge computational demands, e.g. ensemble forecasting

– Requirements: Distributed, load balanced computations

• Challenge: Use of existing complex numerical models and data assimilation systems

– Requirements: Make existing tools work in web service environment

• Challenge: Lack of controlled vocabulary

– Requirements: Ontology, dictionary

• Challenge: Support for grades 6 – 12, college, graduate, and advanced research

– Requirements: Robust security, user aids, education modules, meaningful responses

Multidisciplinary Effort

• Meteorology

• Computer Science and Information Technology

• Education and Outreach

LEAD Institutions

> 100 scientists, students, technical staff

LEAD Thrust Groups

• Data*

• Orchestration

• Portal

• Meteorology

• Grid and Web Services Test Bed*

• Education and Outreach Test Bed

*Major Unidata areas

LEAD Data Subsystem

Query Service

Dictionary

Ontology Service

Resource Catalog

myLEAD Catalog

LEAD Data Repository (LDR)

Public Data (e.g. IDD data)

LEAD Portal

Unidata Technology Used in LEAD

• LDM/IDD Data Delivery: near real time data delivery

• THREDDS: catalogs of data and their associated metadata

• Common Data Model (CDM): single interface to multiple data formats

• THREDDS Data Server (TDS): integrated OPeNDAP and HTTP data access

• Integrated Data Viewer (IDV): visualization

• THREDDS Data Repository (TDR): data storage framework

• Decoders
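As a rough illustration of how a THREDDS catalog ties dataset names to an access method such as OPeNDAP, the sketch below parses a small catalog document and builds access URLs from the service base and each dataset's urlPath. The catalog content, the host name testbed.example.edu, and the function name list_opendap_urls are invented for this example; real catalogs also associate datasets with services explicitly, which is simplified away here.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog content, invented for illustration only.
CATALOG_XML = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example testbed catalog">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Model output">
    <dataset name="NAM forecast 00Z" urlPath="model/nam/20051214_0000.nc"/>
    <dataset name="NAM forecast 12Z" urlPath="model/nam/20051214_1200.nc"/>
  </dataset>
</catalog>
"""

# ElementTree spells namespaced tags as "{namespace}localname".
TAG = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"


def list_opendap_urls(catalog_xml, host):
    """Return (dataset name, OPeNDAP URL) pairs for datasets that have a urlPath."""
    root = ET.fromstring(catalog_xml.strip())

    # Find the base path of the first OPeNDAP service declared in the catalog.
    base = None
    for service in root.iter(TAG + "service"):
        if service.get("serviceType", "").upper() == "OPENDAP":
            base = service.get("base")
            break
    if base is None:
        return []

    # Container datasets carry no urlPath and are skipped.
    urls = []
    for dataset in root.iter(TAG + "dataset"):
        url_path = dataset.get("urlPath")
        if url_path:
            urls.append((dataset.get("name"), host + base + url_path))
    return urls


urls = list_opendap_urls(CATALOG_XML, "http://testbed.example.edu")
for name, url in urls:
    print(name, url)
```

The catalog carries the discovery metadata, while the URL it yields is what a client would hand to an OPeNDAP-aware reader.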

Unidata and LEAD

• Unidata also brings:

– Experience with atmospheric data

– Community of users

– Robust, fielded software

Recent LEAD-Related Efforts

2. Application of current technology on our LEAD testbed: Tom

3. Structure of the LEAD testbed: Brian

4. THREDDS Data Repository: Doug

Goal: Support both LEAD and our community

Application of Current Technologies on the LEAD Testbed Systems

Tom Baltzer

Acronyms for LEAD Tools

ADAS – ARPS Data Assimilation System (Center for Advanced Prediction of Storms at OU)

ADaM – Algorithm Development and Mining (University of Alabama at Huntsville)

IDV – Integrated Data Viewer (Unidata)

LDM/IDD – Local Data Manager/Internet Data Distribution (Unidata)

OPeNDAP – Open-source Project for a Network Data Access Protocol (OPeNDAP.org)

THREDDS – Thematic Real-time Environmental Distributed Data Services

TDS – THREDDS Data Server

TDR – THREDDS Data Repository (Unidata)

WRF – The Weather Research and Forecasting Model (ARW Core - NCAR)

Also: WS-Eta – Workstation Eta Model

LEAD Testbed Systems

• Testbed systems at several LEAD locations to provide:

– Data

  • Near Real-Time data ingest, storage and access

  • LEAD Data Product storage and access

– Data Processing

  • High Performance Computing

  • Grid and Web Services

• Allow each institution to develop methods by which their capabilities fit into LEAD effort

• Single Web Portal system at Indiana Univ. to bring it all together and provide User Interface

LEAD Grid

[Map. Legend: Core Academic Partner; Core Academic Partner + Grid Test Bed; Core Academic Partner + Education Test Bed; Core Academic Partner + Grid Test Bed + Education Test Bed. Sites shown: CSU, Unidata, OU, UI, IU, UAH, UNC, MU, HU]

Data Aspects of LEAD Testbeds

LEAD Testbed Systems

• UPC Technologies being leveraged to facilitate LEAD needs

– LDM/IDD

– THREDDS

– IDV

– NetCDF Decoders

– OPeNDAP (Unidata supported)

Typical LEAD Testbed (Current Source Data Configuration)

[Diagram labels: IDD; Testbed System; Forecast Model Output; Weather station observations; Aircraft data; Radar data; Decoders; THREDDS Catalog; GridFTP; OPeNDAP; LEAD Grid System]

Typical LEAD “Data” Testbed (Future Source Data Configuration)

[Diagram labels: IDD; Testbed System; Forecast Model Output; Weather station observations; Radar data; Aircraft data; Decoders; THREDDS Catalog; TDS & TDR; GridFTP; OPeNDAP; LEAD Grid System]

Note: UPC plans ~ 6 month store

LEAD Processing on the Unidata Testbed System

UPC Processing Testbed (Current Configuration)

[Diagram labels: NCEP NAM (Eta) Forecast; Precipitation Locator; Center Lat/Lon; OPeNDAP Access; THREDDS Catalog; Unidata LEAD Test Bed; Regional Forecasts; WS-Eta; WRF; Initial and Boundary Conditions]

- WRF being Steered by Chiz’s GEMPAK precipitation locator
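The slides do not describe how the GEMPAK-based precipitation locator works internally. As a minimal sketch of the general idea (pick a center latitude/longitude from a forecast precipitation field, then place a regional forecast domain around it), the example below simply takes the grid cell with the largest value. The function names, the half-width parameter, and the sample grid are all assumptions made for illustration.

```python
def locate_precipitation_center(lats, lons, precip):
    """Return (lat, lon, value) for the grid cell with the largest precipitation.

    precip[i][j] is the value at latitude lats[i] and longitude lons[j].
    """
    best = None
    for i, row in enumerate(precip):
        for j, value in enumerate(row):
            if best is None or value > best[2]:
                best = (lats[i], lons[j], value)
    if best is None:
        raise ValueError("empty precipitation grid")
    return best


def regional_domain(center_lat, center_lon, half_width_deg):
    """Build a simple lat/lon bounding box centered on the located point."""
    return {
        "south": center_lat - half_width_deg,
        "north": center_lat + half_width_deg,
        "west": center_lon - half_width_deg,
        "east": center_lon + half_width_deg,
    }


# Made-up sample field (mm), standing in for forecast model precipitation.
lats = [35.0, 36.0, 37.0]
lons = [-100.0, -99.0, -98.0, -97.0]
precip_mm = [
    [0.0, 1.2, 0.4, 0.0],
    [0.3, 5.6, 9.8, 2.1],
    [0.0, 2.2, 3.0, 0.5],
]

center = locate_precipitation_center(lats, lons, precip_mm)
domain = regional_domain(center[0], center[1], half_width_deg=5.0)
print(center, domain)
```

In a steered setup, the resulting center point would be passed to the regional model's domain configuration before each run.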

Next Steps

[Diagram labels: NCEP NAM (Eta) Forecast; Precipitation Locator; Center Lat/Lon; OPeNDAP Access; THREDDS Catalog; Unidata LEAD Test Bed; Regional Forecasts; WS-Eta; WRF; Boundary Conditions; CAPS ADAS Assimilation; Initial Conditions; Millersville ADaM Precip Locator]

Longer Term

[Diagram labels: NCEP NAM (Eta) Forecast; Precipitation Locator; Center Lat/Lon; OPeNDAP Access; THREDDS Catalog; Unidata LEAD Test Bed; Regional Forecasts; WS-Eta; WRF; Boundary Conditions; ADAS; ADaM]

IDD Datasets

• Radar

• Surface & Upper air

• Satellite

• NCEP NAM

Ultimately

[Diagram labels: NCEP NAM (Eta) Forecast; Precipitation Locator; Center Lat/Lon; OPeNDAP Access; THREDDS Catalog; Unidata LEAD Test Bed; Regional Forecasts; WS-Eta; Web Service: WRF; Boundary Conditions; Web Service: ADAS; Web Service: ADaM; LEAD Grid System]

IDD Datasets

• Radar

• Surface & Upper air

• Satellite

• NCEP NAM

Objectives for UPC Testbed

• Testing ground for integrating new UPC and LEAD technologies

• Determining ways to bring LEAD Technologies to the Unidata Community

• “Operational” environment for LEAD

• Processing cluster

• Data Storage

– ~6 months of IDD data

– LEAD product data

The LEAD Hardware at Unidata

Brian Kelly

Existing LEAD Infrastructure

● Lead1: GRID Server, Development Tools, NFS Server, Cluster Node

● Lead2: GRID Server, NFS Server, Cluster Node, Cluster Monitoring

● Lead3: HTTP Server, THREDDS Server, OPeNDAP Server, LDM Node, NFS Server, Cluster Node

● Lead4: TDS, LDM Node, NFS Server, Cluster Node

● LeadStor: 8 TB of Disk, NFS Server

UCAR/Unidata LEAD Infrastructure

● 40 TB Storage Cluster

● ~30 GFLOP Processing Cluster

● Portal Servers for Web, TDS, Grid and LDM Services

LEAD Portal Systems

[Diagram labels: Processing Cluster Head Node; HTTP, TDS and Grid Server; LDM Server; Test Server; Gigabit Network for NFS Storage Access; Storage Cluster Gateway; Beowulf Cluster Connected by a Gigabit Fibre Network]

LEAD Processing Cluster

Each Node contains Two Athlon 2400+ CPUs

Cluster Uses OSCAR with the MPICH MPD

Eight Nodes Deliver ~30 GFLOPs

LEAD Storage Cluster

[Diagram labels: LEAD Storage Gigabit Network; LEAD Storage Nodes; LEAD Storage Head Node]

● One (1) Guanghsing GHI-583 5U Case

– 24 hot-swappable SATA trays

– 1000W 2+2 power supply

● One (1) Tyan Thunder K8SD Pro Motherboard

– Dual Opteron CPUs

– Four 64-bit 133/100 MHz PCI-X Slots

– Two Gigabit Ethernet ports

● One (1) AMD Opteron 242 Processor

– 1.6 GHz CPU

● Three (3) Broadcom RAIDCore BC4853

– Eight SATA ports

– Controller spanning

– Advanced RAID

● Twenty-Four (24) Seagate Barracuda ST3400832AS

– 7200 RPM 400GB SATA Drives

LEAD Storage Node

Twenty-Four (24) 400 GB Drives

Divided into Two (2) Eleven Column RAID 5 Arrays and Two Hot Spares

Form Two (2) 4 TB LUNs Using bcraid

Each Node Publishes the Two LUNs over iSCSI

LEAD Storage Gateway

● Mounts Each Node's Two (2) 4 TB LUNs Published via iSCSI

● Builds Two (2) 20 TB 6 column RAID 5 Meta-devices using mdadm

● Divides Each Meta-device into Volumes using LVM

● Each Volume is Formatted with an XFS Filesystem

● Each Filesystem is Published with NFS

Result: 40 TB of mid-performance double-redundant storage
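The capacity figures on these slides follow from the usual RAID 5 rule that one member's worth of space in each array holds parity. The sketch below reproduces the arithmetic using only numbers from the slides. The helper name raid5_usable_gb is made up, and the count of storage nodes is inferred from the LUN counts rather than stated in the talk.

```python
def raid5_usable_gb(columns, member_gb):
    """Usable capacity of a RAID 5 array: (columns - 1) members hold data."""
    if columns < 3:
        raise ValueError("RAID 5 needs at least three members")
    return (columns - 1) * member_gb


# Storage node level: 24 drives of 400 GB each.
DRIVE_GB = 400
DRIVES_PER_NODE = 24
ARRAYS_PER_NODE = 2
COLUMNS_PER_ARRAY = 11

hot_spares = DRIVES_PER_NODE - ARRAYS_PER_NODE * COLUMNS_PER_ARRAY
lun_gb = raid5_usable_gb(COLUMNS_PER_ARRAY, DRIVE_GB)  # one LUN per array

# Gateway level: two 6-column RAID 5 meta-devices built from 4 TB LUNs.
META_DEVICES = 2
COLUMNS_PER_META = 6

meta_gb = raid5_usable_gb(COLUMNS_PER_META, lun_gb)
total_gb = META_DEVICES * meta_gb

# Inferred, not stated on the slides: 12 LUNs at 2 LUNs per node means 6 nodes.
luns_needed = META_DEVICES * COLUMNS_PER_META
storage_nodes = luns_needed // ARRAYS_PER_NODE
raw_gb = storage_nodes * DRIVES_PER_NODE * DRIVE_GB

print(f"hot spares per node: {hot_spares}")
print(f"LUN size: {lun_gb / 1000} TB")
print(f"meta-device size: {meta_gb / 1000} TB")
print(f"total usable: {total_gb / 1000} TB from {raw_gb / 1000} TB raw")
```

Parity is paid twice, once inside each node and once across the LUNs at the gateway, which is where the "double-redundant" label comes from.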

THREDDS Data Repository (TDR)

Doug Lindholm

LEAD Architecture: Data Storage Perspective

[Diagram, built up over several slides. Labels in the final build:]

• LEAD Data Grid: Unidata, NCSA, IU, OU, UAH

• “Atomic” Capabilities: Cataloger (myLEAD), Storage Locator, Data Mover, ID Generator, Name Resolver, Metadata Generator, Metadata Crosswalk

• Application Services: Data Mining (ADaM), Visualization (IDV), Data Assimilation (ADAS), Forecast Model (WRF)

• Portal

• User

• Data Repository

THREDDS Data Repository Component Architecture

[Diagram labels: THREDDS Data Repository; putData(), getData(), discoverData(); Data Storage]

• Storage Locator: locate-Storage()

• Data Mover: move-Data()

• Unique ID Generator: generate-UniqueID()

• Name Resolver: mapID-ToURL()

• Metadata Generator: generate-Metadata()

• Metadata Crosswalk: translate-Metadata()

• Cataloger: catalog-Metadata()


THREDDS Data Repository Component Architecture: LEAD Configuration

[Diagram labels: THREDDS Data Repository; putData(), getData(), discoverData(); Data Storage]

• Resource Broker: locate-Storage()

• trebuchet: move-Data()

• Unique ID Generator: generate-UniqueID()

• RLS: mapID-ToURL()

• THREDDS Metadata Generator: generate-Metadata()

• THREDDS to LEAD Crosswalk: translate-Metadata()

• myLEAD: catalog-Metadata()

THREDDS Data Repository Component Architecture: Alternate Configuration

[Diagram labels: THREDDS Data Repository; putData(), getData(), discoverData(); Data Storage]

• Storage Locator: locate-Storage()

• Data Mover: move-Data()

• Operations shown without a named component: generate-UniqueID(), mapID-ToURL(), generate-Metadata(), translate-Metadata()

• THREDDS Catalog: catalog-Metadata()

• THREDDS Metadata Generator
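The component architecture suggests a repository whose putData(), getData(), and discoverData() operations are assembled from replaceable parts, so that a LEAD deployment and another deployment can plug in different implementations. The sketch below is one possible reading of that design, not the TDR code itself. The method names are Python spellings of the operations on the slides. The in-memory classes, the helper methods marked as hypothetical, and the order of calls inside put_data are all assumptions.

```python
import itertools
from dataclasses import dataclass


class InMemoryStorageLocator:
    def locate_storage(self, size_bytes):
        # A real locator would choose among storage resources; this has one.
        return "mem://store"


class InMemoryDataMover:
    def __init__(self):
        self.blobs = {}

    def move_data(self, data, url):
        self.blobs[url] = data

    def fetch(self, url):  # hypothetical helper, not on the slides
        return self.blobs[url]


class CounterIdGenerator:
    def __init__(self):
        self._counter = itertools.count(1)

    def generate_unique_id(self):
        return f"tdr:{next(self._counter)}"


class DictNameResolver:
    def __init__(self):
        self.urls = {}

    def register(self, uid, url):  # hypothetical helper, not on the slides
        self.urls[uid] = url

    def map_id_to_url(self, uid):
        return self.urls[uid]


class BasicMetadataGenerator:
    def generate_metadata(self, name, data):
        return {"name": name, "bytes": len(data)}


class IdentityCrosswalk:
    def translate_metadata(self, metadata):
        # A real crosswalk would map one metadata schema onto another.
        return dict(metadata)


class ListCataloger:
    def __init__(self):
        self.entries = []

    def catalog_metadata(self, uid, metadata):
        self.entries.append((uid, metadata))

    def search(self, text):  # hypothetical helper, not on the slides
        return [uid for uid, md in self.entries if text in md["name"]]


@dataclass
class Repository:
    locator: object
    mover: object
    ids: object
    resolver: object
    generator: object
    crosswalk: object
    cataloger: object

    def put_data(self, name, data):
        # Assumed sequence: ID, storage location, move, register, catalog.
        uid = self.ids.generate_unique_id()
        url = self.locator.locate_storage(len(data)) + "/" + uid
        self.mover.move_data(data, url)
        self.resolver.register(uid, url)
        metadata = self.generator.generate_metadata(name, data)
        self.cataloger.catalog_metadata(uid, self.crosswalk.translate_metadata(metadata))
        return uid

    def get_data(self, uid):
        return self.mover.fetch(self.resolver.map_id_to_url(uid))

    def discover_data(self, text):
        return self.cataloger.search(text)


repo = Repository(InMemoryStorageLocator(), InMemoryDataMover(), CounterIdGenerator(),
                  DictNameResolver(), BasicMetadataGenerator(), IdentityCrosswalk(),
                  ListCataloger())
uid = repo.put_data("wrf_forecast.nc", b"example bytes")
print(uid, repo.get_data(uid), repo.discover_data("wrf"))
```

Swapping a configuration then amounts to passing different objects to the constructor, for example a cataloger backed by myLEAD in one case and by a THREDDS catalog in another.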

Unidata Architecture

[Diagram, built up over several slides. Labels in the final build:]

• Internet Data Distribution (IDD)

• Local Data Manager (LDM)

• Data Storage

• Locally Generated Data (shown twice)

• Common Data Model (CDM)

• THREDDS Catalog

• THREDDS Data Server (TDS)

• THREDDS Data Repository (TDR)

• THREDDS Client API

• Application (e.g. IDV)

• E-mail Service

• Arrows: discover, access, store, notify

Questions?