Environmental Science, Big Data and the Cloud


  • date post

    23-Aug-2014
  • Category

    Science

description

Scientific instruments, environmental sensors, and large-scale simulations are creating more scientific data than ever before. By using advanced, large-scale information processing facilities, scientists can now analyze massive volumes of data in ways that would not have been possible just a few years ago. While a few researchers have access to these large computer systems, most are limited by the processing capacity they can access conveniently and quickly. Cloud computing solutions built on Microsoft Azure give environmental science researchers access to the compute and storage resources they need, when they need them, without the up-front financial investment, and help reduce the time between progress and breakthroughs. Microsoft Azure brings on-demand computing and data access to environmental scientists and researchers everywhere.

Transcript of Environmental Science, Big Data and the Cloud

  • Microsoft Research: Computational Ecology and Environmental Science Group http://research.microsoft.com/en-us/groups/ecology/
  • Manual Measurement Automated Measurement Sample Collection Historical Photographs Counting Ubiquitous Motes Aircraft Surveys Model Output Typing
  • Monitoring; Collation; Quality assurance; Aggregation; Analysis; Reporting; Forecasting; Distribution. Done poorly, but a few notable counter-examples. Done poorly to moderately, not easy to find. Sometimes done well, generally discoverable and available, but could be improved. Integration (I. Zaslavsky & CSIRO, BOM, WMO)
  • Data-intensive Science: Data Acquisition & modelling; Collaboration and visualisation; Analysis & data mining; Dissemination & sharing; Archiving and preserving. fourthparadigm.org
  • Complex shared detector. Simple instrument (if any). Complex and heavy process by experts. Ad hoc observations and models. KB, GB, TB, PB. Science happens when PBs, TBs, GBs, and KBs can be mashed up simply. Provenance and trust vary widely. Data acquisition, early processing, and reporting range from a large government agency to individual scientists. Smaller data are often passed around in email; big data downloads can take days (if at all). Data sharing concerns and patterns vary: open access followed by (non-repeatable and tedious) pre-processing; a true science-ready data set, but with concerns about misuse and misunderstanding, particularly for hard-won data. Computational tools differ. Not everyone can get an account at a supercomputer center. Very large computations require engineering (error handling). Space and time aren't always simple dimensions.
  • Getting what you need, when you need it Cloud computing is good for
  • http://github.com/windowsazure
  • Customer Data Center
  • http://fetchclimate2.cloudapp.net/
  • Data Marketplaces
  • Web search: open weather data azure
  • Weather Forecast Computation as a Service http://aka.ms/oljnt2
  • http://weatherservice.cloudapp.net
  • http://research.microsoft.com/en-us/projects/azure/technical-papers.aspx
  • http://aka.ms/dm0 http://research.microsoft.com/projects/msrceesdm/
  • Windows Azure for Research Group @azure4research www.azure4research.com
  • MODIS Azure: Computing Evapotranspiration (ET) in the Cloud A pipeline for download, processing, and reduction of diverse NASA MODIS satellite imagery. Catharine van Ingen (Microsoft Research), Jie Li, Marty Humphrey (UVA), Youngryel Ryu (UCB), Deb Agarwal (BWC/LBL), Keith Jackson (BL), Jay Borenstein (Stanford) , Team SICT: Vlad Andrei, Klaus Ganser, Samir Selman, Nandita Prabhu (Stanford), Team Nimbus: David Li, Sudarshan Rangarajan, Shantanu Kurhekar, Riddhi Mittal (Stanford)
  • MODIS Azure Service (architecture diagram). Labels: Scientists; MODIS Azure Service Web Role Portal; Request Queue; Data Collection Stage; Download Queue; Source Imagery Download Sites; Source Metadata; Reprojection Stage; Reprojection Queue; Derivation Reduction Stage; Reduction #1 Queue; Analysis Reduction Stage; Reduction #2 Queue; Scientific Results Download; Science results
  • Use laptops & desktop computers Overwhelmed by data Finding analysis ever more difficult; sharing even harder
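
The MODIS Azure slides above describe a pipeline for download, processing, and reduction whose stages are connected by queues (Download Queue, Reprojection Queue, Reduction Queues). As a rough illustration of that queue-per-stage pattern only, here is a minimal in-memory sketch in Python. It does not use any Azure API and is not the MODIS Azure implementation: the stage functions, the `run_stage` and `run_pipeline` helpers, and the tile names are all hypothetical, and the two reduction stages are collapsed into one for brevity.

```python
import queue
from typing import Callable, List


# Hypothetical stand-ins for the real download, reprojection, and reduction work.
def download(tile: str) -> str:
    return f"raw:{tile}"


def reproject(raw: str) -> str:
    return raw.replace("raw:", "reprojected:")


def reduce_tile(reprojected: str) -> str:
    return reprojected.replace("reprojected:", "reduced:")


def run_stage(worker: Callable[[str], str],
              inbox: "queue.Queue[str]",
              outbox: "queue.Queue[str]") -> None:
    """Drain the inbox, apply the worker to each item, and enqueue the result."""
    while True:
        try:
            item = inbox.get_nowait()
        except queue.Empty:
            break
        outbox.put(worker(item))


def run_pipeline(tiles: List[str]) -> List[str]:
    """Push tiles through download -> reprojection -> reduction, one queue per stage."""
    download_q: "queue.Queue[str]" = queue.Queue()
    reprojection_q: "queue.Queue[str]" = queue.Queue()
    reduction_q: "queue.Queue[str]" = queue.Queue()
    results_q: "queue.Queue[str]" = queue.Queue()

    for tile in tiles:
        download_q.put(tile)

    run_stage(download, download_q, reprojection_q)
    run_stage(reproject, reprojection_q, reduction_q)
    run_stage(reduce_tile, reduction_q, results_q)

    results: List[str] = []
    while not results_q.empty():
        results.append(results_q.get())
    return results


if __name__ == "__main__":
    # Tile names are illustrative placeholders.
    print(run_pipeline(["h08v05", "h09v05"]))
```

Because each stage only reads from one queue and writes to the next, stages can in principle be scaled or retried independently; in this sketch they simply run one after another in a single process.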