Page 1: Data Centric HPC for Numerical Weather Forecasting

DATA CENTRIC HPC FOR NUMERICAL WEATHER FORECASTING

James Faeldon

Delfin Jay Sabido III

Karen España

IBM Philippines, STG Labs

Page 2: Data Centric HPC for Numerical Weather Forecasting

Extreme Weather Events

• The Philippines is home to devastating typhoons.

• 19 typhoons a year and intense monsoon rains that can cause widespread flooding.

• Research collaboration by the Philippine Government, University of the Philippines and IBM (2013).

• The strongest typhoons group near the Philippines

Figure: Typhoon Tracks, Eastern Hemisphere. Image courtesy of NOAA.

Figure: Before and after, Super Typhoon Haiyan (Nov 2013). Image courtesy of DigitalGlobe.

Page 3: Data Centric HPC for Numerical Weather Forecasting

Coupled Models for Pre-Disaster Planning

• Numerical weather model forecasts typhoon track and intensity

• Machine learning model predicts affected population and damages

• Optimization model recommends relief supplies pre-positioning and allocation

Typhoons can be forecast a few days in advance. But we need more reports, better visualization, and data exploration tools to reduce analysis cycles and facilitate timely decisions.

Operations Center

Page 4: Data Centric HPC for Numerical Weather Forecasting

Operational Forecasting Schedule Runs

Data-Intensive

Compute-Intensive

Data-intensive processes are increasingly becoming the bottleneck in the operational forecasting workflow.

Page 5: Data Centric HPC for Numerical Weather Forecasting

Drivers for Increased Data Processing

Analytics Big Data

Page 6: Data Centric HPC for Numerical Weather Forecasting

Operational Forecasting Data Challenges

Quality Control Sampling

Verification Machine Learning

Ensemble Forecasts

Update relief operations plan based on new forecast

+ 7 historical days

663 Gb per forecast

Model Output Statistics

6-hour processing and analysis window

ETL

Real-time Sensor Data

Source      Qty   Unit Size     Total Size
AWS         733   7 Kb/day      5 Mb/day
Satellite   1     480 Mb/day    480 Mb/day
Radar       7     9 Gb/day      63 Gb/day

Forecast Data

Resolution   Cells    Grid Dimensions    Total Size
12 km        5.2 M    307 x 481 x 35     81 Gb/forecast
4 km         8.8 M    619 x 406 x 35     138 Gb/forecast
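The data volumes on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it keeps the slide's units as written (Kb, Mb, Gb), assumes decimal prefixes (1000 per step), and reads "AWS" as automatic weather stations, which the slides do not spell out. All variable names are made up for this example.

```python
# Figures copied from the slide; unit prefixes are assumed to be decimal.
KB_PER_MB = 1000
MB_PER_GB = 1000

# source -> (quantity, size per unit in Mb/day)
sensors = {
    "AWS": (733, 7 / KB_PER_MB),   # assumed: automatic weather stations
    "Satellite": (1, 480.0),
    "Radar": (7, 9 * MB_PER_GB),
}

# Total daily volume per source, in Mb/day.
sensor_totals_mb = {name: qty * unit for name, (qty, unit) in sensors.items()}
total_sensor_gb = sum(sensor_totals_mb.values()) / MB_PER_GB

# Forecast grids: resolution -> (nx, ny, vertical levels).
grids = {"12km": (307, 481, 35), "4km": (619, 406, 35)}
cells = {res: nx * ny * nz for res, (nx, ny, nz) in grids.items()}

# Output size per forecast in Gb, as listed on the slide.
forecast_gb = {"12km": 81, "4km": 138}

# Implied output per grid cell, in Kb per forecast.
kb_per_cell = {res: forecast_gb[res] * 1e6 / cells[res] for res in grids}

for res in grids:
    print(res, cells[res], round(kb_per_cell[res], 2))
print("sensor total (Gb/day):", round(total_sensor_gb, 1))
```

Under these assumptions the cell counts match the slide's rounded 5.2 M and 8.8 M, radar dominates the sensor volume, and both grids imply a similar output size per cell.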

Page 7: Data Centric HPC for Numerical Weather Forecasting

Project Goals

• Manage and process time-sensitive data arriving from remote sensors and weather forecasts.

• Reduce data analysis cycles to facilitate timely decisions.

Page 8: Data Centric HPC for Numerical Weather Forecasting

Numerical Weather Model

Post-Processing

MapReduce, NoSQL Database

Stream Pre-Processing Data Warehouse, OLAP Database

Weather Sensors

Observations Structured Data

Data Assimilation

Forecast Data

1 Remote sensor data in various formats.

2 Quality Control, Interpolation, Sampling, Filtering, Classification

3 High Performance Computing

4 Store structured and unstructured data for analysis and post-processing

5 Business intelligence, data mining, visualization, verification

6 Dashboards and Reports

Automated End-to-End Process

Decision Support Tool

Reports

Page 9: Data Centric HPC for Numerical Weather Forecasting

Hardware Infrastructure

Traditional HPC (BlueGene/P)

Commodity Servers (x86)

Elastic Cloud Computing (Virtual Machines)

In-situ Big Data

MapReduce

Real-time Data Processing

OLAP

Visualization

Numerical Weather Models

MPP Jobs

Page 10: Data Centric HPC for Numerical Weather Forecasting

Weather Model

• WRF ARW v3.5 limited area model

• 3.4 hours using 2048 cores on BlueGene/P (850 MHz).


Page 11: Data Centric HPC for Numerical Weather Forecasting

Pre-Processing

• Stream Processing, ETL, R, Python

• Multi-stage quality control of remote sensor data.

• Spatio-temporal interpolation and sampling.

• Star-schema data warehouse.

• NoSQL with MapReduce.
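To make the quality control and interpolation bullets concrete, here is a minimal sketch of a two-stage check followed by gap filling. It is not the project's actual pipeline: the function names, the temperature limits, and the jump threshold are all hypothetical, the sample readings are invented, and only the temporal half of "spatio-temporal" interpolation is shown. The series is assumed to be sorted by time.

```python
def range_check(value, lo=-10.0, hi=50.0):
    """Stage 1: reject missing or physically implausible values (limits are illustrative)."""
    return value is not None and lo <= value <= hi

def spike_check(prev, value, max_jump=5.0):
    """Stage 2: reject jumps larger than max_jump from the last accepted value."""
    return prev is None or abs(value - prev) <= max_jump

def quality_control(series):
    """Return the series with readings that fail either stage replaced by None."""
    cleaned, last_good = [], None
    for minute, value in series:
        ok = range_check(value) and spike_check(last_good, value)
        cleaned.append((minute, value if ok else None))
        if ok:
            last_good = value
    return cleaned

def interpolate_gaps(series):
    """Fill None values by linear interpolation in time between good neighbours."""
    good = [(t, v) for t, v in series if v is not None]
    filled = []
    for t, v in series:
        if v is None:
            before = [(tb, vb) for tb, vb in good if tb < t]
            after = [(ta, va) for ta, va in good if ta > t]
            if before and after:
                (t0, v0), (t1, v1) = before[-1], after[0]
                v = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
        filled.append((t, v))  # leading or trailing gaps stay None
    return filled

# Invented sample: (minutes since start, temperature in deg C).
raw = [(0, 27.0), (10, 27.4), (20, 99.9), (30, None), (40, 28.2)]
clean = interpolate_gaps(quality_control(raw))
print(clean)
```

Here the 99.9 reading fails the range check and the missing reading is flagged as well, so both are replaced with values interpolated between the 10-minute and 40-minute readings.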

Diagram labels: NetCDF, Image, CSV; Staging Files; Low-latency Stream Processing; ETL; Custom Scripts; NoSQL; Data Warehouse; BI Cubes; Observations, Forecast Raw Data; Quality Control, Sampling, Filtering

Structured point or topological data (small <1TB), emphasis on data consistency.

Gridded high-resolution data (big >1TB), emphasis on availability and scalability. Input to coupled models down the line.

Data stores for post-processing…
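As an illustration of the star-schema data warehouse mentioned on this slide, the sketch below builds one fact table and two dimension tables in an in-memory SQLite database. The slides do not describe the actual schema or database product, so every table name, column name, and row here is hypothetical and the values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table keyed by two dimensions.
conn.executescript("""
CREATE TABLE dim_station (
    station_id INTEGER PRIMARY KEY,
    name TEXT, lat REAL, lon REAL
);
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY,
    label TEXT
);
CREATE TABLE fact_observation (
    station_id INTEGER REFERENCES dim_station(station_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    rain_mm REAL,
    temp_c REAL
);
""")

# Invented rows, for illustration only.
conn.executemany("INSERT INTO dim_station VALUES (?, ?, ?, ?)",
                 [(1, "Station A", 14.6, 121.0), (2, "Station B", 10.3, 123.9)])
conn.executemany("INSERT INTO dim_time VALUES (?, ?)",
                 [(1, "T+00h"), (2, "T+06h")])
conn.executemany("INSERT INTO fact_observation VALUES (?, ?, ?, ?)",
                 [(1, 1, 12.0, 26.5), (1, 2, 30.0, 25.0),
                  (2, 1, 5.0, 27.0), (2, 2, 45.0, 24.5)])

# A typical cube-style query: total rainfall per station.
rows = conn.execute("""
    SELECT s.name, SUM(f.rain_mm)
    FROM fact_observation AS f
    JOIN dim_station AS s ON s.station_id = f.station_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(rows)
```

Keeping measurements in a narrow fact table and descriptive attributes in dimension tables is what lets BI tools slice the same facts by station, time, or any other dimension added later.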

Page 12: Data Centric HPC for Numerical Weather Forecasting

Post Processing

• Business Intelligence Cubes
  • Multi-dimensional analysis
  • Dashboards and reports
  • GIS Integration

• MapReduce Views (NoSQL)
  • Model Verification
  • Ensemble Forecasts/MOS
  • Ad-Hoc Data Mining

Multi-Dimensional Cubes

MapReduce Views

Reports and Dashboards: reports and visualizations generated using BI and data visualization tools.

Custom Scripts, Coupled Models, Model Output Statistics, Reports and Dashboards

Downstream predictive models use MapReduce views as a data source.
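A MapReduce view for model verification could be sketched as follows, in plain Python rather than in any particular NoSQL engine. The map step emits a per-station forecast error, and the reduce step summarizes those errors as bias and mean absolute error. The record layout, function names, and sample values are assumptions made for this example.

```python
from collections import defaultdict

# Invented forecast/observation pairs (rainfall in mm).
records = [
    {"station": "A", "forecast_mm": 10.0, "observed_mm": 12.0},
    {"station": "A", "forecast_mm": 20.0, "observed_mm": 16.0},
    {"station": "B", "forecast_mm": 5.0, "observed_mm": 5.0},
    {"station": "B", "forecast_mm": 8.0, "observed_mm": 11.0},
]

def map_error(rec):
    """Map: emit (station, forecast minus observation) for one record."""
    yield rec["station"], rec["forecast_mm"] - rec["observed_mm"]

def reduce_errors(errors):
    """Reduce: summarize one station's errors as count, bias, and MAE."""
    n = len(errors)
    return {
        "count": n,
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
    }

def build_view(recs):
    """Group mapped values by key (the shuffle step), then reduce each group."""
    groups = defaultdict(list)
    for rec in recs:
        for key, value in map_error(rec):
            groups[key].append(value)
    return {key: reduce_errors(values) for key, values in groups.items()}

view = build_view(records)
print(view)
```

A downstream model would read `view` instead of rescanning the raw forecast and observation data, which is the role the slide assigns to MapReduce views.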

Page 13: Data Centric HPC for Numerical Weather Forecasting

Current Challenges and Future Directions

• Improvements in geostatistics: gridded data to topological features.
  • River basins, flood-prone areas, political boundaries, and other locations of interest
  • Generating statistics makes for very data-intensive processing
  • Potential for parallelization.

• Efficient stream processing engine for larger tuples with longer sliding windows.
  • Complex quality control and verification require longer time-series statistics spanning multi-day historical observed and forecasted data.
  • Strategy: can we retain data processing all in-memory, caching, etc.?

• Efficient MapReduce views on array-based data models and other approaches.

• Improvements on data warehousing schema.
  • Ongoing improvements for handling spatio-temporal data.
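The sliding-window statistics mentioned above can be kept entirely in memory with running sums, so each new tuple costs constant time regardless of window length. The class below is a minimal sketch of that idea, not the stream engine used in the project; the class name, the window size, and the threshold `k` are hypothetical.

```python
from collections import deque
import math

class SlidingWindowStats:
    """Running mean and variance over the last `size` values."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def push(self, x):
        """Add a value and evict the oldest one if the window is full."""
        self.window.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.total -= old
            self.total_sq -= old * old

    def mean(self):
        return self.total / len(self.window)

    def variance(self):
        """Population variance; clamped at zero to absorb rounding error."""
        n = len(self.window)
        return max(0.0, self.total_sq / n - self.mean() ** 2)

    def is_outlier(self, x, k=3.0):
        """Flag x if it lies more than k standard deviations from the window mean."""
        if len(self.window) < 2:
            return False
        return abs(x - self.mean()) > k * math.sqrt(self.variance())

stats = SlidingWindowStats(size=3)
for value in [1.0, 2.0, 3.0, 4.0]:
    stats.push(value)
print(stats.mean(), stats.variance(), stats.is_outlier(10.0))
```

Running sums can accumulate rounding error over very long streams; a production system would likely recompute periodically or use a more numerically stable update.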

Page 14: Data Centric HPC for Numerical Weather Forecasting

Summary

• Planning for extreme weather events is a time-critical workflow that involves complex analysis of large data-sets from various sources.

• Recent advances in Big Data and HPC enable the architecture of real-world disaster planning applications.

• Current integration schemes use intermediary staging files and ETL-like scripts.

• Better algorithms and techniques are needed to improve performance and integration.

Page 15: Data Centric HPC for Numerical Weather Forecasting

James Faeldon

[email protected]

IBM Philippines, STG Labs