Environmental Science, Big Data and the Cloud


description

Scientific instruments, environmental sensors, and large-scale simulations are creating more scientific data than ever before. By using advanced, large-scale information processing facilities, scientists can now analyze massive volumes of data in ways that would not have been possible just a few years ago. While a few researchers have access to these large computer systems, most are limited by the processing capacity they can access conveniently and quickly. Cloud computing solutions built on Microsoft Azure give environmental science researchers the compute and storage resources they need, when they need them, without an up-front financial investment, and help reduce the time between progress and breakthroughs. Microsoft Azure brings on-demand computing and data access to environmental scientists and researchers everywhere.

Transcript of Environmental Science, Big Data and the Cloud

Page 1: Environmental Science, Big Data and the Cloud
Page 4: Environmental Science, Big Data and the Cloud

Microsoft Research: Computational Ecology and Environmental Science Group

http://research.microsoft.com/en-us/groups/ecology/

Page 6: Environmental Science, Big Data and the Cloud

Manual Measurement

Automated Measurement

Sample Collection

Historical Photographs

Counting

Ubiquitous Motes

Aircraft Surveys

Model Output

Typing

Page 7: Environmental Science, Big Data and the Cloud

Monitoring

Collation

Quality assurance

Aggregation

Analysis

Reporting

Forecasting

Distribution

Done poorly, but a few notable counter-examples

Done poorly to moderately, not easy to find

Sometimes done well, generally discoverable and available, but could be improved

Integration

(I. Zaslavsky & CSIRO, BOM, WMO)

Page 8: Environmental Science, Big Data and the Cloud

Data-intensive Science

Data

Acquisition & modelling

Collaboration and visualisation

Analysis & data mining

Dissemination & sharing

Archiving and preserving

fourthparadigm.org

Page 11: Environmental Science, Big Data and the Cloud

Complex shared detector

Simple instrument (if any)

Complex and heavy process by experts

Ad hoc observations and models

KB

GB

TB

PB

Science happens when PBs, TBs, GBs, and KBs can be mashed up simply

Provenance and trust vary widely

Data acquisition, early processing, and reporting range from large government agencies to individual scientists.

Smaller data is often passed around in email; big data downloads can take days (if they are possible at all)

Data sharing concerns and patterns vary

Open access followed by (non-repeatable and tedious) pre-processing

True science-ready data sets, but with concerns about misuse and misunderstanding, particularly for hard-won data.

Computational tools differ. Not everyone can get an account at a supercomputer center

Very large computations require engineering (error handling)

Space and time aren’t always simple dimensions
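
The point about multi-day downloads can be made concrete with a little arithmetic. The sketch below is illustrative only: the data sizes and the 100 Mbit/s sustained link speed are assumptions chosen for the example, not figures from the talk, and `transfer_time_hours` is a hypothetical helper.

```python
# Illustrative only: sizes and link speed are assumed, not taken from the talk.
# Decimal (SI) byte units.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15}


def transfer_time_hours(size, unit, link_mbit_per_s):
    """Ideal transfer time in hours, ignoring protocol overhead and retries."""
    bits = size * UNITS[unit] * 8
    seconds = bits / (link_mbit_per_s * 10**6)
    return seconds / 3600


if __name__ == "__main__":
    for size, unit in [(1, "GB"), (1, "TB"), (1, "PB")]:
        hours = transfer_time_hours(size, unit, link_mbit_per_s=100)
        print(f"{size} {unit} at 100 Mbit/s: {hours:,.1f} hours ({hours / 24:,.1f} days)")
```

Under these assumptions a gigabyte moves in under two minutes, a terabyte takes roughly 22 hours, and a petabyte takes well over two years, which is one reason to move the computation to the data rather than the reverse.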

Page 13: Environmental Science, Big Data and the Cloud

Getting what you need, when you need it

Cloud computing is good for…

Page 16: Environmental Science, Big Data and the Cloud

http://github.com/windowsazure

Page 21: Environmental Science, Big Data and the Cloud

Customer Data Center

Page 23: Environmental Science, Big Data and the Cloud

http://fetchclimate2.cloudapp.net/

Page 24: Environmental Science, Big Data and the Cloud

Data Marketplaces

Page 25: Environmental Science, Big Data and the Cloud

Web search: “open weather data azure”

Page 28: Environmental Science, Big Data and the Cloud

Weather Forecast Computation as a Service

http://aka.ms/oljnt2

Page 29: Environmental Science, Big Data and the Cloud

http://weatherservice.cloudapp.net

Page 34: Environmental Science, Big Data and the Cloud

http://research.microsoft.com/en-us/projects/azure/technical-papers.aspx

Page 35: Environmental Science, Big Data and the Cloud

http://aka.ms/dm0 http://research.microsoft.com/projects/msrceesdm/

Page 41: Environmental Science, Big Data and the Cloud

MODIS Azure: Computing Evapotranspiration (ET) in the Cloud

A pipeline for download, processing, and reduction of diverse NASA MODIS satellite imagery.

Catharine van Ingen (Microsoft Research), Jie Li, Marty Humphrey (UVA), Youngryel Ryu (UCB), Deb Agarwal (BWC/LBL), Keith Jackson (BL), Jay Borenstein (Stanford), Team SICT: Vlad Andrei, Klaus Ganser, Samir Selman, Nandita Prabhu (Stanford), Team Nimbus: David Li, Sudarshan Rangarajan, Shantanu Kurhekar, Riddhi Mittal (Stanford)

Page 42: Environmental Science, Big Data and the Cloud

MODIS Azure Service

[Architecture diagram. Actors and endpoints: Scientists; MODIS Azure Service Web Role Portal; Source Imagery Download Sites; Source Metadata; Scientific Results Download; Science results. Stages: Data Collection Stage; Reprojection Stage; Derivation Reduction Stage; Analysis Reduction Stage. Queues: Request Queue; Download Queue; Reprojection Queue; Reduction #1 Queue; Reduction #2 Queue.]
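
The staged, queue-driven layout named on this slide can be sketched in a few lines. This is a minimal illustration, not the MODIS Azure implementation: in-memory `queue.Queue` objects stand in for a cloud queue service, the stage functions (`download`, `reproject`, `derive`, `analyze`), the `tile` field, and the retry budget are all hypothetical, and the mapping of stage names to functions is an assumption made for the example.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry budget per task and stage


# Hypothetical stand-ins for the real stage logic.
def download(task):
    return {**task, "downloaded": True}


def reproject(task):
    return {**task, "reprojected": True}


def derive(task):
    return {**task, "derived": True}


def analyze(task):
    return {**task, "analyzed": True}


STAGES = [
    ("download", download),
    ("reprojection", reproject),
    ("reduction1", derive),
    ("reduction2", analyze),
]


def run_pipeline(requests, stages=STAGES):
    """Push each request through every stage, one queue per stage.

    A task that raises is put back on its queue until MAX_ATTEMPTS is
    reached, then recorded in the failed list instead of being lost.
    """
    queues = {name: queue.Queue() for name, _ in stages}
    results, failed = [], []
    for req in requests:
        queues[stages[0][0]].put((req, 1))
    for i, (name, work) in enumerate(stages):
        q = queues[name]
        while not q.empty():
            task, attempt = q.get()
            try:
                out = work(task)
            except Exception as exc:
                if attempt < MAX_ATTEMPTS:
                    q.put((task, attempt + 1))  # retry later
                else:
                    failed.append((name, task, str(exc)))
                continue
            if i + 1 < len(stages):
                queues[stages[i + 1][0]].put((out, 1))  # hand off to next stage
            else:
                results.append(out)
    return results, failed


if __name__ == "__main__":
    done, bad = run_pipeline([{"tile": "a"}, {"tile": "b"}])
    print(len(done), "completed;", len(bad), "failed")
```

Decoupling stages with queues means each stage can be scaled or restarted independently, and the explicit retry-then-record path is the kind of error-handling engineering that very large computations need.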


Page 46: Environmental Science, Big Data and the Cloud

Use laptops & desktop computers

Overwhelmed by data

Finding analysis ever more difficult; sharing even harder

Page 47: Environmental Science, Big Data and the Cloud

www.azure4research.com
