Indiana University School of Informatics

The LEAD Gateway

Dennis Gannon, Beth Plale, Suresh Marru, Marcus Christie

School of Informatics, Indiana University


Overview

• The LEAD ITR Project
  – Science Objectives
  – Adaptive CyberInfrastructure for Mesoscale Storm Prediction
• A tour of the LEAD project
  – Components of our approach to Data and Data Driven Adaptive Workflow
• Experience so far.
• The Gateway Lifecycle


Predicting Storms

• Hurricanes and tornadoes cause massive loss of life and damage to property
• Underlying physical systems involve highly non-linear dynamics, so they are computationally intense
• Data comes from multiple sources
  – “real time” derived from streams of data from sensors
  – Archived in databases of past storms
• Infrastructure challenges:
  – Data mine instrument radar data for storms
  – Allocate supercomputer resources automatically to run forecast simulations
  – Monitor results and retarget instruments.
  – Log provenance and metadata about experiments for auditing.


The LEAD Project


Traditional Methodology

• STATIC OBSERVATIONS
  – Radar Data
  – Mobile Mesonets
  – Surface Observations
  – Upper-Air Balloons
  – Commercial Aircraft
  – Geostationary and Polar Orbiting Satellite
  – Wind Profilers
  – GPS Satellites
• Analysis/Assimilation
  – Quality Control
  – Retrieval of Unobserved Quantities
  – Creation of Gridded Fields
• Prediction/Detection
  – PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users
  – NWS
  – Private Companies
  – Students

The Process is Entirely Serial and Static (Pre-Scheduled): No Response to the Weather!


The LEAD Vision: Adaptive Cyberinfrastructure

• DYNAMIC OBSERVATIONS
• Analysis/Assimilation
  – Quality Control
  – Retrieval of Unobserved Quantities
  – Creation of Gridded Fields
• Prediction/Detection
  – PCs to Teraflop Systems
• Product Generation, Display, Dissemination
• End Users
  – NWS
  – Private Companies
  – Students
• Models and Algorithms Driving Sensors

The CS challenge: Build cyberinfrastructure services that provide adaptability, scalability, availability, usability, and real-time response.


Change the Paradigm

• To make fundamental advances we need:
  – Adaptivity in computational model.
• But also Cyberinfrastructure to:
  – Execute complex scenarios in response to weather events
    • Stream processing, triggers
    • Close loop with the instruments.
  – Acquire computational resources on demand.
    • Need supercomputer-scale resources
    • Invoked in response to weather events
  – Deal with data deluge
    • User can no longer manage his/her own experiment products


The LEAD Gateway Portal

• To support three classes of users
  – Meteorology research scientists & grad students.
  – Undergrads in meteorology classes
  – People who want easy access to weather data.

Go to: http://www.leadproject.org


Gateway Components

• A Framework for Discovery
  – Four basic components
• Data Discovery
  – Catalogs and index services
• The experiment
  – Computational workflow managing on-demand resources
• Data analysis and visualization
• Data product preservation,
  – automatic metadata generation and experimental data provenance.


Data Search

• Select a region and a time range and desired attributes
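
A search of this kind can be sketched as a filter over catalog entries. This is only a minimal illustration, not the LEAD catalog API: `Region`, `CatalogEntry`, `search`, and all sample entries below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Region:
    """Latitude/longitude bounding box in degrees (hypothetical type)."""
    south: float
    north: float
    west: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


@dataclass(frozen=True)
class CatalogEntry:
    """One data product in a made-up catalog."""
    name: str
    lat: float
    lon: float
    time: datetime
    attributes: frozenset


def search(catalog, region, start, end, wanted):
    """Return names of entries inside the region and time range
    that carry every requested attribute."""
    wanted = frozenset(wanted)
    return [
        entry.name
        for entry in catalog
        if region.contains(entry.lat, entry.lon)
        and start <= entry.time <= end
        and wanted <= entry.attributes
    ]


# Made-up sample entries, for illustration only.
CATALOG = [
    CatalogEntry("radar_scan_a", 35.2, -97.4, datetime(2007, 5, 4, 21, 0),
                 frozenset({"reflectivity", "velocity"})),
    CatalogEntry("surface_obs_b", 35.5, -97.6, datetime(2007, 5, 4, 22, 0),
                 frozenset({"temperature"})),
    CatalogEntry("radar_scan_c", 41.0, -87.9, datetime(2007, 5, 4, 21, 30),
                 frozenset({"reflectivity"})),
]

hits = search(
    CATALOG,
    Region(south=34.0, north=37.0, west=-99.0, east=-96.0),
    datetime(2007, 5, 4, 20, 0),
    datetime(2007, 5, 4, 23, 0),
    {"reflectivity"},
)
```

Here `hits` contains only the entry that satisfies all three criteria at once: region, time range, and attributes.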


Portal: Experimental Data & Metadata Space

• CyberInfrastructure extends user’s desktop to incorporate vast data analysis space.
• As users go about doing scientific experiments, the CI manages back-end storage and compute resources.
  – Portal provides ways to explore this data and search and discover it.
• Metadata about experiments is largely automatically generated, and highly searchable.
  – Describes data object (the file) in application-rich terms, and provides URI to data service that can resolve an abstract unique identifier to real, on-line data “file”.
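
The indirection between an abstract identifier and the physical file might look roughly like this. It is a sketch under assumptions: `DataService`, `MetadataRecord`, the identifier scheme, and the example.org addresses are all invented for illustration and are not the LEAD services.

```python
class DataService:
    """Hypothetical service that maps abstract identifiers to file locations."""

    def __init__(self, service_uri):
        self.service_uri = service_uri
        self._locations = {}

    def register(self, data_id, location):
        self._locations[data_id] = location

    def resolve(self, data_id):
        """Return the current on-line location for an abstract identifier."""
        return self._locations[data_id]


class MetadataRecord:
    """Describes a data object in application terms and points at the
    service that can resolve its identifier."""

    def __init__(self, data_id, description, service):
        self.data_id = data_id
        self.description = description
        self.service = service

    def locate(self):
        return self.service.resolve(self.data_id)


# All identifiers and addresses below are made up.
service = DataService("https://data.example.org/resolve")
service.register("urn:example:forecast-0001",
                 "https://storage.example.org/run1/forecast.nc")

record = MetadataRecord(
    "urn:example:forecast-0001",
    {"variable": "reflectivity", "grid_km": 2},
    service,
)

# If the file moves, only the service mapping changes; the metadata
# record and its identifier stay the same.
service.register("urn:example:forecast-0001",
                 "https://archive.example.org/run1/forecast.nc")
location = record.locate()
```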


Workflow: Composing Computational Tools to build new Tools

• Workflow is a term that describes the process of moving data through a sequence of analysis and transformational steps to achieve a goal.
• Another Paradigm Shift for the users.
• Each activity a user initiates in LEAD is an Experiment which consists of
  – Data discovery and collection.
  – Applied analysis and transformation
    • A graph of activities (workflow)
  – Curated data products and results
• Each workflow activity is logged using an event system and stored as metadata in the user’s workspace.
  – Provides a complete provenance of work.
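
A workflow as a graph of activities with an event log can be sketched as follows. This is not the LEAD workflow engine or its event system; the step names, the event dictionaries, and `run_workflow` are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workflow(steps, depends_on, inputs):
    """Run each step after its predecessors and log start/finish events.

    steps:      step name -> function taking the predecessors' results
    depends_on: step name -> list of predecessor names (steps or inputs)
    inputs:     input name -> value supplied by the user
    """
    results = dict(inputs)
    events = []
    for name in TopologicalSorter(depends_on).static_order():
        if name in results:  # user-supplied input, nothing to run
            continue
        predecessors = depends_on[name]
        events.append({"step": name, "event": "started", "inputs": list(predecessors)})
        results[name] = steps[name](*(results[p] for p in predecessors))
        events.append({"step": name, "event": "finished"})
    return results, events


# Toy two-step workflow: drop missing readings, then average them.
STEPS = {
    "quality_control": lambda radar: [v for v in radar if v is not None],
    "gridding": lambda clean: sum(clean) / len(clean),
}
DEPENDS_ON = {
    "quality_control": ["radar"],
    "gridding": ["quality_control"],
}

results, events = run_workflow(STEPS, DEPENDS_ON, {"radar": [1.0, None, 3.0]})
```

The `events` list plays the role of the provenance record: it shows which step ran, in what order, and from which inputs.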


The Experiment Builder

• A Portal “wizard” that leads the user through the set-up of a workflow
• Asks the user:
  – “Which workflow do you want to run?”
• Once this is known, it can prompt the user for the required input data sources
• Then it “launches” the workflow.
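
The wizard steps above can be sketched as a small function. The workflow names, their required inputs, and `build_experiment` are assumptions made for illustration; the real portal is interactive and its workflow catalog is not shown in these slides.

```python
# Hypothetical workflows and the inputs each one requires.
WORKFLOWS = {
    "forecast": ["region", "start_time", "radar_source"],
    "data_mining": ["radar_source"],
}


def build_experiment(workflow_name, ask):
    """Collect the required inputs for the chosen workflow and 'launch' it.

    ask: callable that takes an input name and returns the user's answer
         (in a portal this would be a form prompt).
    """
    required = WORKFLOWS[workflow_name]
    inputs = {name: ask(name) for name in required}
    return {"workflow": workflow_name, "inputs": inputs, "status": "launched"}


# Canned answers stand in for an interactive user.
answers = {
    "region": "central_oklahoma",
    "start_time": "2007-05-04T21:00",
    "radar_source": "radar_scan_a",
}
experiment = build_experiment("forecast", answers.__getitem__)
```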


Parameter Selection


Selecting the forecast region


Gateway Support for Adaptive Queries

LEAD requires ability to construct workflows that are
• Data Driven
  – Weather data streams define nature of computation
• Persistent and Agile
  – Data mining of a data stream detects an “interesting” feature; the event triggers a workflow scenario that has been waiting for months.
• Adaptive
  – In response to weather: weather changes.
  – Nature of workflow may have to change on-the-fly.
  – Resource and requirements change.
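
A persistent, data-driven trigger could be sketched along these lines. Everything here is an assumption for illustration: the reflectivity field name, the 45 and 55 dBZ thresholds, the 2 km and 4 km grid spacings, and the function names are not taken from LEAD.

```python
def plan_forecast(detection):
    """Adapt the forecast set-up to the detected feature:
    a stronger echo gets a finer grid (thresholds are illustrative)."""
    grid_km = 2 if detection["reflectivity_dbz"] >= 55 else 4
    return {"center": detection["location"], "grid_km": grid_km}


def watch_stream(readings, threshold_dbz, launch):
    """Scan a stream of readings; each one at or above the threshold
    triggers a workflow launch with a plan adapted to that reading."""
    launched = []
    for reading in readings:
        if reading["reflectivity_dbz"] >= threshold_dbz:
            launched.append(launch(plan_forecast(reading)))
    return launched


# Made-up stream of radar readings.
STREAM = [
    {"location": (35.2, -97.4), "reflectivity_dbz": 20},
    {"location": (35.3, -97.5), "reflectivity_dbz": 48},
    {"location": (35.4, -97.5), "reflectivity_dbz": 60},
]

# The launcher here just returns the plan; a real one would submit a workflow.
launched = watch_stream(STREAM, threshold_dbz=45, launch=lambda plan: plan)
```

The first reading is ignored, the second triggers a coarse forecast, and the third triggers a finer one, so both the decision to run and the shape of the run depend on the data.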


Experience with on-demand computing

• We use TeraGrid.
  – Actually “best effort” and not yet “on demand”
  – Use Grid technology for remote job execution and security.
• Reliability is critical.
• Workflow can automatically resubmit a failed task to another resource
• Urgent Computing handled by the Spruce Gateway.
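
Resubmitting a failed task to another resource can be sketched as a simple failover loop. This is an illustration only: `TaskFailed`, `run_with_failover`, the resource names, and the fake submit function are hypothetical and do not reflect the actual Grid job submission interface.

```python
class TaskFailed(Exception):
    """Raised when a task cannot complete on a resource (hypothetical)."""


def run_with_failover(task, resources, submit):
    """Try each resource in order; return the first successful result
    together with a log of every attempt."""
    attempts = []
    for resource in resources:
        try:
            result = submit(task, resource)
        except TaskFailed as err:
            attempts.append((resource, f"failed: {err}"))
            continue
        attempts.append((resource, "ok"))
        return result, attempts
    raise TaskFailed(f"{task} failed on all resources: {attempts}")


def fake_submit(task, resource):
    """Stand-in for remote job submission; 'cluster_a' always fails."""
    if resource == "cluster_a":
        raise TaskFailed("node unavailable")
    return f"{task} completed on {resource}"


result, attempts = run_with_failover(
    "forecast_run", ["cluster_a", "cluster_b"], fake_submit
)
```

Keeping the attempt log alongside the result means the failure on the first resource still ends up in the experiment's record.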


Validating Scientific Discovery

• The Gateway is becoming part of the process of science by being an active repository of data provenance
• Disks are cheap, so why not record everything?
• The system records each computational experiment that a user initiates
  – A complete audit trail of the experiment or computation
  – Published results can include link to provenance information for repeatability and transparency.


Experience so far

• First release to support “WxChallenge: the new collegiate weather forecast challenge”
  – The goal: “forecast the maximum and minimum temperatures, precipitation, and maximum sustained wind speeds for select U.S. cities.
  – to provide students with an opportunity to compete against their peers and faculty meteorologists at 64 institutions for honors as the top weather forecaster in the nation.”
  – 79 “users” ran 1,232 forecast workflows generating 2.6 TBytes of data.
• Over 160 processors were reserved on Tungsten from 10am to 8pm EDT(EST), five days each week
• National Spring Forecast
  – First use of user initiated 2 km forecasts as part of that program. Generated serious interest from National Severe Storm Center.
• Integration with CASA project scheduled for final year of LEAD ITR.
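
For a sense of scale, the WxChallenge figures above can be turned into per-user and per-workflow averages. This is back-of-the-envelope arithmetic on the quoted numbers, assuming decimal units (1 TB = 1000 GB) and treating "over 160 processors" as exactly 160, so the reservation figure is a lower bound.

```python
users = 79
workflows = 1232
data_tb = 2.6

processors = 160      # "over 160", so this is a lower bound
hours_per_day = 10    # 10am to 8pm
days_per_week = 5

workflows_per_user = workflows / users                      # about 15.6
gb_per_workflow = data_tb * 1000 / workflows                # about 2.1 GB
processor_hours_per_week = processors * hours_per_day * days_per_week  # 8000
```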


The LEAD Gateway Lifecycle

• Work began in 2003 with requirements analysis by the LEAD meteorology and CS teams.
• First 2 years of development supported by LEAD ITR and NMI Portals project.
• Year 3 & 4 support of 2 FTE from TG.
  – Public Release March 2007.
• Current Status
  – A new production release in July 2007.
  – Last year of LEAD ITR: hardened version of the Gateway to transition to community support
    • UCAR - UNIDATA may be the host.
    • Extensive planning underway.