Accelerating Scientific Dataflows
Sudarshan S. Chawathe
Associate Professor of Computer Science & Cooperating Associate Professor of Climate Change Institute
University of Maine
Outline: A Data-Centric View | Project 301 | P301dx Features | Tambora & SO4 | Map: Icereader Data | Web App Challenges | RFDE | RFDE Client Upgrades | Web Mapping Service | WMS Parameters | Handheld Data | Summary
Sudarshan S. Chawathe Accelerating Scientific Dataflows – p. 2
A Data-Centric View
- What are the primary and supplemental datasets?
- How are different datasets acquired?
- What are the key transformations, interpretations, and visualizations?
- What may be automated? What requires human interpretation?
- What are effective and efficient modes of interaction with data?
Project 301
- Cyber-Infrastructure for Climate-Change Research.
- Goal: Accelerate scientific discoveries by enabling more effective management of large and diverse datasets.
- Approach: Develop domain-specific adaptations of data management methods. Implement and evaluate the methods on real data.
- Research topics (Computer Sci.):
  - Data importation: “ETL” for scientific data.
  - Data integration: instruments, documents, Web services, ...
  - Interactive data exploration and visualization.
  - Visual programming.
  - Data mining.
  - Provenance of data.
  - Workflows.
  - Systems issues: performance, scalability, reliability, ...
P301dx Features
- Integrated view of large, diverse datasets: ice-core data, volcanic records, data extracted from documents, ...
- Interactive data exploration based on charts plotting time-series and related data, maps, ...
- Palette of tools for data processing, plotting, and other manipulations. Built-in tools for resampling, smoothing, ...
- Tools that operate on, and produce, objects in the working-object store, simplifying multi-step data manipulation and plotting.
- Interactive generation of new tools by composition and other higher-level operations: tool-generating tools.
- Chart exportation in high-quality vector and raster formats.
- A door to the larger cyber-infrastructure effort, P301.
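
The idea of tools that read and write a working-object store, and of tool-generating tools built by composition, can be illustrated with a minimal Python sketch. This is not P301dx code: the names `resample`, `smooth`, `compose`, and `store`, the simple decimation and moving-average definitions, and the sample numbers are all assumptions made up for illustration.

```python
from typing import Callable, Dict, List

Series = List[float]
Tool = Callable[[Series], Series]


def resample(step: int) -> Tool:
    """Return a tool that keeps every `step`-th sample (simple decimation)."""
    def tool(series: Series) -> Series:
        return series[::step]
    return tool


def smooth(window: int) -> Tool:
    """Return a tool that applies a trailing moving average of width `window`."""
    def tool(series: Series) -> Series:
        out = []
        for i in range(len(series)):
            chunk = series[max(0, i - window + 1): i + 1]
            out.append(sum(chunk) / len(chunk))
        return out
    return tool


def compose(*tools: Tool) -> Tool:
    """Tool-generating tool: chain existing tools, left to right, into a new one."""
    def tool(series: Series) -> Series:
        for t in tools:
            series = t(series)
        return series
    return tool


# Hypothetical working-object store: named objects that tools read and write.
# The values are made-up demonstration numbers, not measurements.
store: Dict[str, Series] = {"demo_raw": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]}

resample_then_smooth = compose(resample(2), smooth(2))
store["demo_clean"] = resample_then_smooth(store["demo_raw"])
print(store["demo_clean"])  # [1.0, 3.0, 7.0]
```

Because the composed tool has the same shape as its parts (series in, series out), it can itself be stored, reused, or composed again, which is the property the "tool-generating tools" bullet relies on.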
Tambora and SO4
Map: Icereader Data
Web Application Challenges
1. REST: Representational State Transfer.
   - Robust and scalable Web applications.
   - Standards-based, wide availability.
   - Broadly accessible.
2. Modern Web interfaces: JavaScript, HTML5, ...
   - High interactivity.
   - Client-side optimizations.
   - Glamor.
3. How to consolidate 1 and 2?
RFDE: Robust Web Applications
- REST Framework for Dynamic Environments
RFDE Client Upgrades
Web Mapping Service
[Figure: architecture diagram. Labels: Interpolation Module; Tile Renderer; Load Balancer; Database Servers; TMS Servers; Clients (Desktop Applications, Web Applications, Mobile Applications); x,y,z; Grids; Cached & Static Tiles.]
- Arbitrary geocoded point and grid data, backgrounds, ...
- Web interface similar to Google Maps; de-facto standard.
- REST-based design; easily re-targetable: Android, iOS, ...
- Challenges: 10^13 tiles, 10^4 Terabytes.
- Fast in-database dynamic tile generation from numeric data.
- Easy to replicate, map on to cloud services.
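
The x,y,z labels and TMS servers in the diagram suggest tile addressing of the kind used by Google-Maps-style interfaces. The sketch below shows the widely used Web Mercator tile arithmetic in Python; whether this service uses exactly this scheme is an assumption, and the `tile_path` URL layout and the layer name are hypothetical, not the service's actual API.

```python
import math
from typing import Tuple


def tiles_at_zoom(zoom: int) -> int:
    """Number of tiles covering the world at a zoom level (a 2^z by 2^z grid)."""
    return 4 ** zoom


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Map longitude/latitude in degrees to (x, y), with y counted from the top."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge cases (lon = 180, latitudes near the poles) stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tms_row(y: int, zoom: int) -> int:
    """Convert a top-origin row to the TMS convention, which counts from the bottom."""
    return 2 ** zoom - 1 - y


def tile_path(layer: str, zoom: int, x: int, y: int) -> str:
    """Hypothetical REST-style resource path for one tile."""
    return f"/{layer}/{zoom}/{x}/{y}.png"


x, y = lonlat_to_tile(0.0, 0.0, 1)
print(tile_path("demo", 1, x, y))  # /demo/1/1/1.png
```

Addressing each tile as its own resource is what lets static tiles be cached and the remaining requests be spread over replicated servers, in line with the REST-based design noted above.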
WMS Descriptive Parameters
Parameter                    Value
data parameters              115
period                       32 years
tiles                        23 × 10^12
rendered tile size           10,000 Terabytes
database size                0.42 Terabytes
avg static response time     0.2 seconds
avg dynamic response time    0.5 seconds
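
A short back-of-the-envelope calculation in Python shows why dynamic tile generation is attractive at this scale. It assumes that "rendered tile size" means the total size of all tiles if every one were pre-rendered, and that a terabyte is 10^12 bytes; both are interpretations, not statements from the slides.

```python
# Figures taken from the parameter table; units are assumed decimal (1 TB = 10**12 bytes).
TB = 10 ** 12

tiles = 23 * 10 ** 12          # total number of tiles
rendered_bytes = 10_000 * TB   # size if every tile were pre-rendered (assumed meaning)
database_bytes = 0.42 * TB     # size of the numeric data held in the database

# Average size of one rendered tile under the assumptions above.
bytes_per_tile = rendered_bytes / tiles

# How much larger the fully rendered tile set is than the source database.
size_ratio = rendered_bytes / database_bytes

print(f"average bytes per tile: {bytes_per_tile:.0f}")
print(f"rendered size / database size: {size_ratio:.0f}")
```

Under these assumptions the average tile is roughly 435 bytes, and the fully rendered set is about 24,000 times larger than the database, so storing the numeric data and rendering tiles on request trades a modest increase in response time (0.5 versus 0.2 seconds) for a very large saving in storage.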
Handheld Data Analysis
- Test data; do not use!
- HCDX: handheld chronological data explorer.
- Android, iOS, Maemo, Web, ...
- Very high-level end-user programming.
- Interactive analysis of time-series datasets.
- In-field data collection and analysis.
- Handheld interfaces, functional programming, database optimizations, ...
Summary
- Scientific dataflows: from raw data to insights.
  - Explication, documentation, optimization, ...
  - Durability, traceability, analyses, visualizations, ...
  - Platforms: desktop/laptop, Web, mobile, ...
  - Bottleneck in the research process?
- Investments in improving dataflow have a multiplier effect on other research investments.
- Acknowledgments:
  - Faculty: Shaleen Jain, Andrei Kurbatov, Paul Mayewski.
  - Graduate students: Erik Albert, Mark Royer.
  - Undergraduate students: Will Lamond, Joe Petrakovich.
  - Project teams: P301, 10green, RFDE/SSI.
  - Funding: NSF, U.Maine.
- Data management collaborations? [email protected]