DSD-INT 2015 - Datacubes: Understanding Big EO Data Better - Peter Baumann

Page 1: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Datacubes :: Delft SW Days :: ©2015 rasdaman

Delft Software Days, Deltares, Delft, 2015-oct-26

Peter Baumann

Jacobs University | rasdaman GmbH

Datacubes: Exploiting Big Earth Data Better

[co-funded by EU through EarthServer 1/2, PublicaMundi]

[gamingfeeds.com]

Page 2: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Page 3: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Datacube Research @ Jacobs U

Large-Scale Scientific Information Systems research group

• Flexible, scalable n-D array services

• www.jacobs-university.de/lsis

Main results:

• pioneer Array DBMS, rasdaman

• standardization:

  - OGC Big Geo Data (also ISO, INSPIRE, W3C)

  - ISO "Science SQL"

Hiring PhD students, PostDocs

Page 4: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Page 5: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Data Homogenization: Making Life Simpler

[Figure labels: sensor feeds; coverage server]

sensor, image [timeseries], simulation, statistics data

Page 6: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

OGC Web Coverage Service (WCS)

OGC/ISO Coverages: regular & irregular grids, point clouds, meshes

- OGC Coverage Implementation Schema

Large, growing implementation basis: rasdaman, GDAL, QGIS, OpenLayers, OPeNDAP, MapServer, GeoServer, NASA WorldWind, EOxServer; Pyxis, ERDAS, ArcGIS, ...

WCS Core: access to spatio-temporal coverages & subsets

- subset = trim | slice

WCS Extensions: optional functionality facets

- Scaling, CRS transformation, …, Analytics (WCPS)
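The trim/slice distinction can be illustrated with a small request builder. This is only a sketch: it assumes the WCS 2.0 key-value-pair request style, and the endpoint `https://example.org/ows`, the coverage name `AvgLandTemp`, and the axis names `Lat`, `Long`, `ansi` are hypothetical placeholders, not taken from the slides.

```python
from urllib.parse import urlencode

def get_coverage_url(endpoint, coverage_id, trims=None, slices=None, fmt=None):
    """Build a WCS 2.0 GetCoverage request in key-value-pair form.

    trims:  {axis: (low, high)}  narrows an axis, dimension is kept
    slices: {axis: point}        cuts at one position, dimension is dropped
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, (low, high) in (trims or {}).items():
        params.append(("subset", f"{axis}({low},{high})"))   # trim
    for axis, point in (slices or {}).items():
        params.append(("subset", f"{axis}({point})"))        # slice
    if fmt:
        params.append(("format", fmt))
    return f"{endpoint}?{urlencode(params)}"

# Hypothetical x/y/t coverage: trim in space, slice in time -> a 2-D image.
url = get_coverage_url(
    "https://example.org/ows",          # placeholder endpoint
    "AvgLandTemp",                      # placeholder coverage name
    trims={"Lat": (40, 50), "Long": (0, 10)},
    slices={"ansi": '"2015-10-26"'},
    fmt="image/tiff",
)
print(url)
```

Trimming both spatial axes and slicing the time axis turns a 3-D cube into a 2-D result; trimming all three axes would return a smaller 3-D cube instead.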

Page 7: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Array Analytics

Array Analytics := Efficient analysis on multi-dimensional arrays of a size several orders of magnitude above evaluation engine's main memory

Essential data property: n-dimensional Euclidean neighborhood

- Secondary: #dimensions, density, ...

Operations: Linear Algebra [M. Stonebraker], statistics, image/signal processing
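To make "arrays larger than main memory" concrete, here is a minimal sketch of a statistic computed tile by tile, so that only one tile is held in memory at a time. The `load_tile` function is a hypothetical stand-in that synthesizes values; a real engine would read each tile from storage.

```python
def tiles(shape, tile_shape):
    """Yield (row_range, col_range) for each tile covering a 2-D domain."""
    rows, cols = shape
    tile_rows, tile_cols = tile_shape
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            yield (range(r0, min(r0 + tile_rows, rows)),
                   range(c0, min(c0 + tile_cols, cols)))

def load_tile(row_range, col_range):
    """Hypothetical tile reader: cell value is row index + column index."""
    return [[r + c for c in col_range] for r in row_range]

def streamed_mean(shape, tile_shape):
    """Average over the whole array while holding a single tile at a time."""
    total, count = 0, 0
    for row_range, col_range in tiles(shape, tile_shape):
        tile = load_tile(row_range, col_range)
        total += sum(sum(row) for row in tile)
        count += len(row_range) * len(col_range)
    return total / count

print(streamed_mean((100, 80), (32, 32)))
```

The running sum and count are the only state carried between tiles, which is what lets the array size exceed memory by orders of magnitude.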

Page 8: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Datacube Access: A Simple Example

[Figure: datacube with time axis t]

Page 9: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

A Brief History of Array DBMSs

first appearance in literature (not first implementation)

Page 10: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Agile Array Analytics: rasdaman

"raster data manager": SQL + n-D arrays

Scalable parallel “tile streaming” architecture

Blueprint for ISO Array SQL standard

[rasdaman visitors]

Page 11: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Array SQL

select id, encode( (scene.band1-scene.band2)/(scene.band1+scene.band2), "image/tiff" )
from LandsatScenes
where acquired between "1990-06-01" and "1990-06-30" and
      avg( (scene.band3-scene.band4)/(scene.band3+scene.band4) ) > 0

create table LandsatScenes(
    id: integer not null, acquired: date,
    scene: row( band1: integer, ..., band7: integer ) mdarray [ 0:4999,0:4999] )
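What the query computes can be traced in plain Python. The following is a sketch of the semantics only, not of how an Array DBMS evaluates it: the scenes are made-up 2x2 arrays, the TIFF encoding step is left out, and returning 0.0 where both bands are zero is an assumption added here to avoid division by zero.

```python
from datetime import date

def normalized_difference(a, b):
    """Cell-wise (a - b) / (a + b) for two equally sized 2-D bands."""
    return [
        [(x - y) / (x + y) if (x + y) != 0 else 0.0 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]

def avg(array):
    """Mean over all cells of a 2-D array."""
    cells = [v for row in array for v in row]
    return sum(cells) / len(cells)

def query(scenes, start, end):
    """Mimic the select: date filter, array-valued filter, array-valued result."""
    for s in scenes:
        if not (start <= s["acquired"] <= end):      # between is inclusive
            continue
        if avg(normalized_difference(s["band3"], s["band4"])) > 0:
            yield s["id"], normalized_difference(s["band1"], s["band2"])

scenes = [  # made-up sample data
    {"id": 1, "acquired": date(1990, 6, 10),
     "band1": [[3, 1], [2, 2]], "band2": [[1, 1], [2, 6]],
     "band3": [[4, 4], [4, 4]], "band4": [[1, 1], [1, 1]]},
    {"id": 2, "acquired": date(1990, 6, 20),         # fails the avg(...) > 0 test
     "band1": [[3, 1], [2, 2]], "band2": [[1, 1], [2, 6]],
     "band3": [[1, 1], [1, 1]], "band4": [[4, 4], [4, 4]]},
    {"id": 3, "acquired": date(1990, 7, 5),          # outside the date range
     "band1": [[3, 1], [2, 2]], "band2": [[1, 1], [2, 6]],
     "band3": [[4, 4], [4, 4]], "band4": [[1, 1], [1, 1]]},
]

result = list(query(scenes, date(1990, 6, 1), date(1990, 6, 30)))
print(result)
```

Only scene 1 passes both the relational predicate on `acquired` and the array predicate, which is the point of mixing table and array expressions in one statement.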

Page 12: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Direct Data Visualization

select encode(
         struct {
           red:   (char) s.b7[x0:x1,x0:x1],
           green: (char) s.b5[x0:x1,x0:x1],
           blue:  (char) s.b0[x0:x1,x0:x1],
           alpha: (char) scale( d, 20 )
         },
         "image/png"
       )
from SatImage as s, DEM as d

[JacobsU, Fraunhofer; data courtesy BGS, ESA]

Page 13: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Partitioned Array Storage

Goal: faster loading by adapting storage units to access patterns

Approach: partition n-D array into n-D partitions ("tiles")

Tiling classification based on degree of alignment [ICDE 1999]

chunking [Sarawagi, Stonebraker, DeWitt, ...]
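Why the tile shape should follow the access pattern can be seen by counting how many tiles a range query touches. The sketch below assumes regular (aligned) tiling only; the cube extents and tile shapes are invented for illustration.

```python
from itertools import product

def tiles_hit(query_box, tile_shape):
    """Grid coordinates of all regular tiles that a range query touches.

    query_box:  [(lo, hi), ...] inclusive cell bounds per axis
    tile_shape: tile edge length per axis
    """
    per_axis = [
        range(lo // edge, hi // edge + 1)
        for (lo, hi), edge in zip(query_box, tile_shape)
    ]
    return list(product(*per_axis))

# Time series at one pixel of an x/y/t cube with 1000 time steps.
timeseries_query = [(100, 100), (200, 200), (0, 999)]

image_tiles = tiles_hit(timeseries_query, (100, 100, 1))     # one tile per time step
column_tiles = tiles_hit(timeseries_query, (10, 10, 100))    # tiles elongated along t

print(len(image_tiles), len(column_tiles))
```

Both tile shapes hold the same number of cells, yet the time-series query reads 1000 tiles with image-shaped tiles and 10 with tiles stretched along the time axis.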

Page 14: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Tiling: Tuning Data for Applications

tiling strategies as service tuning [Furtado]:

- regular, directional, area of interest

rasdaman storage layout language:

insert into MyCollection
values ...
tiling area of interest [0:20,0:40], [45:80,80:85]
tile size 1000000
index d_index storage array compression zlib

"chunks" [Sarawagi, DeWitt, ...]
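The idea behind area-of-interest tiling can be sketched as cutting the domain along the borders of the declared areas, so that each area is covered by whole tiles. This is a simplified illustration and not rasdaman's actual algorithm: it ignores the `tile size` limit, assumes the areas lie inside the domain, and the domain `[0:99,0:99]` is a made-up example.

```python
from itertools import product

def aoi_tiling(domain, areas):
    """Split an n-D domain into tiles whose borders follow the areas of interest.

    domain and each area: [(lo, hi), ...] inclusive bounds per axis.
    """
    tiles_per_axis = []
    for axis, (lo, hi) in enumerate(domain):
        # A new tile starts at the domain start, at every area start,
        # and directly after every area end.
        starts = {lo}
        for area in areas:
            a_lo, a_hi = area[axis]
            starts.add(a_lo)
            if a_hi + 1 <= hi:
                starts.add(a_hi + 1)
        starts = sorted(starts)
        ends = [s - 1 for s in starts[1:]] + [hi]
        tiles_per_axis.append(list(zip(starts, ends)))
    return [list(tile) for tile in product(*tiles_per_axis)]

domain = [(0, 99), (0, 99)]                       # hypothetical array extent
areas = [[(0, 20), (0, 40)], [(45, 80), (80, 85)]]  # the two areas from the slide
layout = aoi_tiling(domain, areas)

print(len(layout))
for tile in layout[:4]:
    print(tile)
```

With this layout a query for either area of interest reads exactly one tile, while the remaining tiles fill the rest of the domain.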

Page 15: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Why Irregular Tiling?

e-Science often uses irregular partitioning

[OpenStreetMap]

[Centrella et al: scidacreviews.org]

Page 16: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Parallel / Distributed Query Processing

1 query, 1,000+ cloud nodes [SIGMOD DANAC 2014]

[Figure labels: Dataset A, Dataset B, Dataset C, Dataset D]

select max((A.nir - A.red) / (A.nir + A.red))
     - max((B.nir - B.red) / (B.nir + B.red))
     - max((C.nir - C.red) / (C.nir + C.red))
     - max((D.nir - D.red) / (D.nir + D.red))
from A, B, C, D
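The query splits naturally into one sub-query per dataset, each reducing an array to a single number, so only scalars need to travel between nodes. A minimal sketch of that split follows; local threads stand in for cloud nodes, the datasets are made-up flat lists, and cells with `nir + red == 0` are assumed not to occur.

```python
from concurrent.futures import ThreadPoolExecutor

def max_ndvi(dataset):
    """Sub-query for one dataset: max over cells of (nir - red) / (nir + red)."""
    return max(
        (nir - red) / (nir + red)
        for nir, red in zip(dataset["nir"], dataset["red"])
    )

def distributed_query(datasets):
    """Run one sub-query per dataset in parallel, then combine the scalars."""
    with ThreadPoolExecutor(max_workers=len(datasets)) as pool:
        a, b, c, d = pool.map(max_ndvi, datasets)   # results come back in input order
    return a - b - c - d

# Made-up sample data for the four datasets A, B, C, D.
A = {"nir": [3, 6], "red": [1, 2]}
B = {"nir": [5], "red": [3]}
C = {"nir": [9], "red": [7]}
D = {"nir": [1], "red": [1]}

answer = distributed_query([A, B, C, D])
print(answer)
```

Each worker returns one number per dataset, so the final combination step is cheap regardless of how large the individual arrays are.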

Page 17: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Secured Archive Integration

First-ever direct, ad-hoc mix from protected NASA & ESA services in OGC WCS/WCPS Web client (EarthServer + CobWeb)

Page 18: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Scalable Geo Service Architecture

No single point of failure

[Figure labels: Web clients (m2m, browser); Internet; OGC WMS, WCS, WCPS, WPS; rasdaman geo services; distributed query processing; rasserver; database; file system; external files; alternative storage]

Page 19: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Inset: Hadoop not the Answer to All

no built-in knowledge about structured data types

- "Since it was not originally designed to leverage the structure […] its performance […] is therefore suboptimal" [Daniel Abadi]

- M. Stonebraker (XLDB 2012): "will hit a scalability wall"

Page 20: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

EarthServer: Datacubes At Your Fingertips

www.earthserver.eu

Page 21: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

EarthServer: Datacubes At Your Fingertips

Agile Analytics on x/y/t + x/y/z/t Earth & Planetary datacubes

- EU rasdaman + NASA WorldWind

- Rigorously standards-based: OGC WMS + WCS + WCPS

- 100s of TB sites now, next: 1+ Petabyte per cube

Intercontinental initiative, 3+3 years: EU + US + AUS

Phase 1 reviewers: "proven evidence" that rasdaman will "significantly transform [how to] access and use data" … and "with no doubt has been shaping the Big Earth Data landscape" …

www.earthserver.eu

Page 22: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

EarthServer Phase 1 & 2 Partners

www.earthserver.eu

Page 23: DSD-INT 2015 - Datacubes understanding big eo data better - Peter baumann

Conclusion

From data stewardship to service stewardship

Open-standard Coverage Cubes interoperability

- consensus across OGC, ISO, INSPIRE

EarthServer: agile analytics federation

- rasdaman Array DBMS

"A cube tells more than a million images"

[rasdaman/WW screenshots]