BigDataCube:
Flexible, Scalable User Services
for Massive Spatio-Temporal EO Data
Symposium „Neue Perspektiven der Erdbeobachtung“ ("New Perspectives of Earth Observation"), Cologne, 2019-11-12
Dimitar Misev, Peter Baumann, Bang Pham Huu, Vlad Merticariu, Heike Hoenig,
Dimitris Bellos, Sven Jacobsen, Stefan Wiehle
Jacobs University, rasdaman GmbH, cloudeo AG, DLR
Technically co-sponsored by:
BigDataCube
Mission: flexible & scalable services for massive spatio-temporal EO datacubes
Goals:
– standards-based datacubes
– enable industry & research collaboration across boundaries
– serve data ready for use → free resources for core business
Approach: public & commercial datacubes, federated
Benefit: enable novel 3rd-party services: fast, flexible, scalable
BigDataCube: Partners
Jacobs University (project coordinator)
rasdaman GmbH
– datacube R&D, OGC + ISO + INSPIRE standardization
– Federation, security
cloudeo AG
– commercial geo-infrastructure
DLR Maritime Safety and Security Lab
– maritime wind & sea state products, S1 derived
BigDataCube:
Public/Private Datacube Partnership
On-demand standards-based datacube analytics
– Public CODE-DE + DLR service + private cloudeo services
– S1 & S2 (~400 TB), DTM/DEM, wind speed & sea state
Jacobs University (lead), rasdaman GmbH, cloudeo AG, DLR
Goals: Advancing federation; security!
rasdaman: Actionable Datacubes
= „raster data manager“: SQL + n-D datacubes
– pioneered actionable datacubes; intl patents, publications
massively scalable datacube analytics engine
– 2.5+ PB; 1000x parallelization; intercontinental federation
standards blueprint, reference implementation, awards
– 28k+ downloads of open-source rasdaman
rasdaman Full-Stack Architecture
[Figure: architecture diagram. Labels: Web clients (m2m, browser); rasdaman geo services; rasserver; distributed query processing; no single point of failure; database / file system; alternative storage; external archives; tile access; optional compression. Reference: [SSTD 2013]]
Parallel, Distributed Processing
1 query → 1,000+ cloud nodes
[Figure labels: Dataset A, Dataset B, Dataset C, Dataset D]
max( (A.nir - A.red) / (A.nir + A.red) )
+ avg(B.green)
+ max( (C.red + C.green + C.blue) / 3 )
+ max( (D.nir + D.red) / 2 )
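The expression above combines four independent sub-expressions, one per dataset, which is what makes it easy to split across nodes. The following is only a toy sketch of that idea, not rasdaman's query processor: threads stand in for cloud nodes, and the dataset contents and function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for the four datasets: band name -> pixel values.
# All values are invented for this sketch.
A = {"nir": [0.75, 0.5], "red": [0.25, 0.5]}
B = {"green": [0.25, 0.75]}
C = {"red": [0.5, 0.25], "green": [0.5, 0.25], "blue": [0.5, 0.25]}
D = {"nir": [0.5, 1.0], "red": [0.5, 0.0]}


def part_a():
    # max( (A.nir - A.red) / (A.nir + A.red) )
    return max((n - r) / (n + r) for n, r in zip(A["nir"], A["red"]))


def part_b():
    # avg(B.green)
    return sum(B["green"]) / len(B["green"])


def part_c():
    # max( (C.red + C.green + C.blue) / 3 )
    return max((r + g + b) / 3 for r, g, b in zip(C["red"], C["green"], C["blue"]))


def part_d():
    # max( (D.nir + D.red) / 2 )
    return max((n + r) / 2 for n, r in zip(D["nir"], D["red"]))


def run_query():
    # Each sub-expression touches one dataset only, so the four parts
    # can be evaluated concurrently and combined with a final sum.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(f) for f in (part_a, part_b, part_c, part_d)]
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(run_query())
```

In a real federation each part would run where its dataset lives, and only the scalar results would travel over the network.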
Bring Your Favourite Client
Open standards → users in comfort zone
– Map navigation: OpenLayers, Leaflet, ...
– Virtual globe: NASA WorldWind, Cesium, ...
– Web GIS: QGIS, ArcGIS, ...
– Analysis: GDAL, R, python, ...
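Because the services speak open standards, any client that can issue an HTTP request can fetch a datacube subset. As a minimal sketch, a WCS GetCoverage request in key-value form could be assembled as below. The endpoint, coverage name, and axis names are hypothetical placeholders, and the version string 2.0.1 is an assumption about what a given server accepts.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS GetCoverage URL in key-value-pair encoding.

    subsets maps an axis name to a (low, high) pair; each entry becomes
    one 'subset=Axis(low,high)' parameter.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),  # assumption: version accepted by the server
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, (low, high) in subsets.items():
        params.append(("subset", f"{axis}({low},{high})"))
    params.append(("format", fmt))
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint, coverage, and axis names for illustration.
    url = wcs_getcoverage_url(
        "https://example.org/ows",
        "S2_demo",
        {"Lat": (50.0, 51.0), "Long": (6.0, 7.0)},
    )
    print(url)
```

The same URL can be pasted into a browser or handed to tools such as GDAL or QGIS that support WCS.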
Result: Federations
CODE-DE: Sentinel-x
CreoDIAS
cloudeo AG: DEM
Alfred-Wegener-Institut: maritime products
Taiwan Datacube
– Agro services on FORMOSAT, drones
Helmholtz-Zentrum Geesthacht (under work)
DWD: climate variables (planned)
European Earth Datacube Federation
– single common geo information space
Security
Datacubes too large for yes/no decision
– Ex: ECMWF climate timeseries
Role-based access control + triggers
– Any user/role, any op, any shape down to single pixel
– Full admin control
Authentication & authorization
– External identity provider (OGC Testbed-15) or locally
Quota: access, download, processing
Billing
[Figure labels: free, priced]
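To illustrate why access to a large datacube is more than a yes/no decision, here is a minimal sketch of region-scoped, role-based access with a quota. It is not rasdaman's access control mechanism: the Box and Grant classes, the rectangular regions, and the pixel-count quota are all simplifying assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned pixel region with inclusive bounds (hypothetical model)."""
    x_min: int
    x_max: int
    y_min: int
    y_max: int

    def contains(self, other):
        return (self.x_min <= other.x_min and other.x_max <= self.x_max
                and self.y_min <= other.y_min and other.y_max <= self.y_max)

    def area(self):
        return (self.x_max - self.x_min + 1) * (self.y_max - self.y_min + 1)


@dataclass(frozen=True)
class Grant:
    """Allows one role to perform one operation inside one region."""
    role: str
    operation: str
    region: Box
    quota_pixels: int


def is_allowed(grants, role, operation, request, used_pixels=0):
    # A request passes if some grant matches role and operation, fully
    # covers the requested region, and the quota is not exceeded.
    return any(
        g.role == role
        and g.operation == operation
        and g.region.contains(request)
        and used_pixels + request.area() <= g.quota_pixels
        for g in grants
    )


if __name__ == "__main__":
    grants = [Grant("guest", "download", Box(0, 99, 0, 99), quota_pixels=500)]
    print(is_allowed(grants, "guest", "download", Box(10, 10, 10, 10)))  # one pixel
    print(is_allowed(grants, "guest", "download", Box(0, 99, 0, 99)))    # over quota
```

The region can shrink to a single pixel, and the same check extends naturally to separate quotas for access, download, and processing.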
Results: SAFE for Services & Analytics?
Server effort for analysis tasks?
Zip archive → extra tool invocation for extracting image file
– subdirectories
JPEG (lossless) → extra CPU cycles for pixel reconstruction
– Wavelets suboptimal for spatio-temporal subsetting
File granularity: 100x100 km → GB sizes
– Benchmarks [Furtado et al]: ~3 MB suitable
SAFE is archive format, not service format!
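The granularity gap can be made concrete with a little arithmetic. The sketch below assumes 10 m pixels stored as 16-bit integers, which is not stated on the slide, and compares one uncompressed band of a 100x100 km scene against the ~3 MB tile size quoted above.

```python
# Assumptions for this sketch (not from the slide): 10 m pixel size,
# 2 bytes per pixel, decimal megabytes.
SCENE_EDGE_M = 100_000
PIXEL_SIZE_M = 10
BYTES_PER_PIXEL = 2
TARGET_TILE_BYTES = 3_000_000  # the ~3 MB figure from the benchmarks


def band_bytes(edge_m=SCENE_EDGE_M, pixel_m=PIXEL_SIZE_M,
               bytes_per_pixel=BYTES_PER_PIXEL):
    """Uncompressed size of one square band, in bytes."""
    pixels_per_side = edge_m // pixel_m
    return pixels_per_side * pixels_per_side * bytes_per_pixel


def tiles_needed(total_bytes, target_bytes=TARGET_TILE_BYTES):
    """Number of target-sized tiles needed to hold total_bytes (rounded up)."""
    return -(-total_bytes // target_bytes)


if __name__ == "__main__":
    size = band_bytes()
    print(size, "bytes per band")
    print(tiles_needed(size), "tiles of about 3 MB each")
```

Under these assumptions a single band is already 200 MB, so a multi-band scene reaches GB size, while a service reading a small subset would ideally touch only a few of the small tiles.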
Results: Standardization
OGC
– Coverage Implementation Schema (CIS) 1.1
– Web Coverage Service (WCS) 2.1
– Web Coverage Processing Service (WCPS) 1.1
  • Integrated data / metadata retrieval & processing
ISO TC211
– 19123-1 Abstract Coverage Model (under work)
– 19123-2 Coverage Implementation Schema
ISO 9075-15:2019 SQL/MDA (Multi-Dimensional Arrays)
INSPIRE: coverage compatibility with OGC/ISO
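To give a feel for WCPS, the sketch below phrases the first term of the earlier example expression, max((A.nir - A.red) / (A.nir + A.red)), as a WCPS query string and wraps it in a request URL. The endpoint is a hypothetical placeholder, and sending the query through a ProcessCoverages request with a query parameter is an assumption about the server's interface, so check the target service's documentation.

```python
from urllib.parse import urlencode


def ndvi_max_query(coverage):
    """WCPS query: maximum normalized difference of the nir and red bands."""
    return (f"for $c in ({coverage}) "
            "return max(($c.nir - $c.red) / ($c.nir + $c.red))")


def process_coverages_url(endpoint, query):
    # Assumption: the server accepts WCPS text via request=ProcessCoverages.
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    q = ndvi_max_query("A")
    print(q)
    print(process_coverages_url("https://example.org/ows", q))
```

The query runs on the server, so only the single resulting number is returned rather than the full bands.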
rasdaman as SME Business Enabler
EOfarm/GR: Big Data Analytics for farmers & water quality monitoring
SBI/DE: intra-field variation analysis service for farmers
CropMaps/DE startup: SAR-based crop classification and yield prediction
Related Projects
H2020 LANDSUPPORT
– integrated land resource management and agriculture & forestry practices
H2020 EOSC-hub
– European Open Science Cloud
H2020 PARSEC
– EO business accelerator
BMEL BigPicture
– Rule-based classification of field health status
– in-field anomalies, frost damage, drought
BMBF DeepRain
Summary
BigDataCube = public/private datacube federation
– Platform: rasdaman datacube reference implementation
„Making federations a commodity“
– Location transparency
– Security, quota, billing
– Performance, scalability
Datacube standardization progress
Contributing to European technology independence & visibility