DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann
-
Upload
delftsoftwaredays -
Category
Software
-
view
283 -
download
1
Transcript of DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann
Datacubes :: Delft SW Days :: ©2015 rasdaman
Delft Software Days, Deltares, Delft, 2015-oct-26
Peter Baumann
Jacobs University | rasdaman GmbH
Datacubes:
Exploiting Big Earth Data Better
[co-funded by EU through EarthServer 1/2, PublicaMundi]
[gamingfeeds.com]
Datacubes :: Delft SW Days :: ©2015 rasdaman
Datacubes :: Delft SW Days :: ©2015 rasdaman
Datacube Research
@ Jacobs U
Large-Scale Scientific Information Systems
research group
• Flexible, scalable n-D array services
• www.jacobs-university.de/lsis
Main results:
• pioneer Array DBMS, rasdaman
• standardization:
• OGC Big Geo Data (also ISO, INSPIRE, W3C)
• ISO „Science SQL“
Hiring PhD students, PostDocs
Datacubes :: Delft SW Days :: ©2015 rasdaman
Datacubes :: Delft SW Days :: ©2015 rasdaman
sensor feeds
Data Homogenization: Making Life Simpler
5
coverage
server
sensor, image [timeseries], simulation, statistics data
Datacubes :: Delft SW Days :: ©2015 rasdaman
OGC Web Coverage Service (WCS)
OGC/ISO Coverages: regular & irregular grids, point clouds, meshes
- OGC Coverage
Implementation Schema
Large, growing
implementation basis:
rasdaman, GDAL, QGIS,
OpenLayers, OPeNDAP,
MapServer, GeoServer,
NASA WorldWind, EOx-
Server; Pyxis, ERDAS,
ArcGIS, ...
WCS Core: access to spatio-temporal coverages & subsets
- subset = trim | slice
WCS Extensions: optional functionality facets
- Scaling, CRS transformation, …, Analytics (WCPS)
Datacubes :: Delft SW Days :: ©2015 rasdaman
Array Analytics
Array Analytics :=
Efficient analysis on multi-dimensional arrays of a size several orders of
magnitude above evaluation engine„s main memory
Essential data property: n-dimensional Euclidean neighborhood
- Secondary: #dimensions, density, ...
Operations: Linear Algebra [M. Stonebraker],
statistics, image/signal processing
Datacubes :: Delft SW Days :: ©2015 rasdaman
Datacube Access: A Simple Example
t
Datacubes :: Delft SW Days :: ©2015 rasdaman
A Brief History of Array DBMSs
first appearance in literature (not first implementation)
Datacubes :: Delft SW Days :: ©2015 rasdaman
Agile Array Analytics: rasdaman
„raster data manager“: SQL + n-D arrays
Scalable parallel “tile streaming” architecture
Blueprint for ISO Array SQL standard
[rasdaman visitors]
Datacubes :: Delft SW Days :: ©2015 rasdaman
Array SQL
select id, encode(scene.band1-scene.band2)/(scene.nband1+scene.band2)), „image/tiff“ )
from LandsatScenes
where acquired between „1990-06-01“ and „1990-06-30“ and
avg( scene.band3-scene.band4)/(scene.band3+scene.band4)) > 0
create table LandsatScenes(
id: integer not null, acquired: date,
scene: row( band1: integer, ..., band7: integer ) mdarray [ 0:4999,0:4999] )
Datacubes :: Delft SW Days :: ©2015 rasdaman
Direct Data Visualization
select
encode(
struct {
red: (char) s.b7[x0:x1,x0:x1],
green: (char) s.b5[x0:x1,x0:x1],
blue: (char) s.b0[x0:x1,x0:x1],
alpha: (char) scale( d, 20 )
},
“image/png"
)
from SatImage as s, DEM as d
[JacobsU, Fraunhofer; data courtesy BGS, ESA]
Datacubes :: Delft SW Days :: ©2015 rasdaman
Goal: faster loading by adapting storage units to access patterns
Approach: partition n-D array into n-D partitions („tiles“)
Tiling classification based on degree of alignment [ICDE 1999]
Partitioned Array Storage
chunking [Sarawagi,
Stonebraker, DeWitt, ... ]
Datacubes :: Delft SW Days :: ©2015 rasdaman
Tiling: Tuning Data for Applications
tiling strategies as service tuning [Furtado]:
- regular directional area of interest
rasdaman storage layout language
insert into MyCollection
values ...
tiling area of interest [0:20,0:40], [45:80,80:85]
tile size 1000000
index d_index storage array compression zlib
„chunks“
[Sarawagi,
DeWitt, ...]
Datacubes :: Delft SW Days :: ©2015 rasdaman
Why Irregular Tiling?
e-Science often uses irregular partioning
[OpenStreetMap]
[Centrella et al: scidacreviews.org]
Datacubes :: Delft SW Days :: ©2015 rasdaman
Parallel / Distributed Query Processing
1 query 1,000+ cloud nodes
[SIGMOD DANAC 2014]
Dataset B
Dataset A
Dataset D
Dataset C
select
max((A.nir - A.red) / (A.nir + A.red))
- max((B.nir - B.red) / (B.nir + B.red))
- max((C.nir - C.red) / (C.nir + C.red))
- max((D.nir - D.red) / (D.nir + D.red))
from A, B, C, D
Datacubes :: Delft SW Days :: ©2015 rasdaman
Secured Archive Integration
First-ever direct, ad-hoc mix from protected NASA & ESA services
in OGC WCS/WCPS Web client (EarthServer + CobWeb)
Datacubes :: Delft SW Days :: ©2015 rasdaman
Web clients (m2m, browser)
Scalable Geo Service Architecture
OGC
WMS, WCS,
WCPS, WPS
distributed query
processingNo single point of failure
external
files
Internet
rasserver
databaseFile system
rasdaman
geo services
alternative
storage
Datacubes :: Delft SW Days :: ©2015 rasdaman
Inset: Hadoop not the Answer to All
no builtin knowledge about structured data types
- “Since it was not originally designed to leverage the structure
[…] its performance […] is therefore suboptimal” [Daniel Abadi]
• M. Stonebraker (XLDB 2012): „will hit a scalability wall“
Datacubes :: Delft SW Days :: ©2015 rasdaman
EarthServer: Datacubes At Your Fingertips
INSPIRE WCS :: ©2015 P. Baumann www.earthserver.eu
Datacubes :: Delft SW Days :: ©2015 rasdaman
EarthServer: Datacubes At Your Fingertips
Agile Analytics on x/y/t + x/y/z/t Earth & Planetary datacubes
- EU rasdaman + NASA WorldWind
- Rigorously standards: OGC WMS + WCS + WCPS
- 100s of TB sites now, next: 1+ Petabyte per cube
Intercontinental initiative, 3+3 years:
EU + US + AUS
INSPIRE WCS :: ©2015 P. Baumann
Phase 1 reviewers:
"proven evidence" that rasdaman
will “significantly transform [how to]
access and use data“ …and "with
no doubt has been shaping the Big
Earth Data landscape” …
www.earthserver.eu
Datacubes :: Delft SW Days :: ©2015 rasdaman
EarthServer Phase 1 & 2 Partners
www.earthserver.eu
Datacubes :: Delft SW Days :: ©2015 rasdaman
From data stewardship to service stewardship
Open-standard Coverage Cubes interoperability
- consensus across OGC, ISO, INSPIRE
EarthServer: agile analytics federation
- rasdaman Array DBMS
„A cube tells more than a million images“
Conclusion
[rasdaman/WW screenshots]