
Posted on 20-Apr-2018



The Big Data Lab for Interdisciplinary Spatially-Enabled Science (BLISS)

Goal

Setting up a large database of time series of changes in land use for most of the agricultural areas of the planet

Agenda

● The problem
● The challenge
● The solution

The problem

Mato Grosso, Brazil, May 8 – Jun 9, 1984

Mato Grosso, Brazil, Jun 10 – Jul 12, 1985

Mato Grosso, Brazil, Jul 12 – Aug 13, 1986

Mato Grosso, Brazil, May 8 – Jun 9, 1988

Mato Grosso, Brazil, Aug 13 – Sep 14, 1989

Mato Grosso, Brazil, Jul 12 – Aug 13, 1990

Mato Grosso, Brazil, Jul 12 – Aug 13, 1991

Mato Grosso, Brazil, Aug 12 – Sep 13, 1992

Mato Grosso, Brazil, Jun 10 – Jul 12, 1993

Mato Grosso, Brazil, Jul 12 – Aug 13, 1994

Mato Grosso, Brazil, Jul 12 – Aug 13, 1995

Mato Grosso, Brazil, Jun 9 – Jul 11, 1996

Mato Grosso, Brazil, Jun 10 – Jul 12, 1997

Mato Grosso, Brazil, Jun 10 – Jul 12, 1998

Mato Grosso, Brazil, Jun 10 – Jul 12, 1999

Mato Grosso, Brazil, Jun 9 – Jul 11, 2000

Mato Grosso, Brazil, Jul 12 – Aug 13, 2001

Mato Grosso, Brazil, Jul 12 – Aug 13, 2003

Mato Grosso, Brazil, Jun 9 – Jul 11, 2004

Mato Grosso, Brazil, Jul 12 – Aug 13, 2005

Mato Grosso, Brazil, May 9 – Jun 10, 2006

Mato Grosso, Brazil, Jun 10 – Jul 12, 2007

Mato Grosso, Brazil, Jun 9 – Jul 11, 2008

Mato Grosso, Brazil, Jul 12 – Aug 13, 2009

Mato Grosso, Brazil, Jun 10 – Jul 12, 2010

“Remote sensing images describe landscape dynamics”

What's in an image?

[Images: 2010 and 2011]

Deforestation event detection: images and time series

Vegetation index time series for Area 1, Area 2 and Area 3 (source: Victor Maus, INPE)
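One simple way to see how an event shows up in a vegetation index time series is to flag the first observation that falls well below its recent baseline. The sketch below does exactly that; the window length, the 0.3 drop threshold and the example values are hypothetical, and this is not the detection method used by the authors or by INPE.

```python
from statistics import mean


def detect_drop(series, window=3, drop=0.3):
    """Return the index of the first observation that falls more than `drop`
    below the mean of the preceding `window` observations, or None.
    Window and threshold are illustrative assumptions."""
    for t in range(window, len(series)):
        baseline = mean(series[t - window:t])
        if baseline - series[t] > drop:
            return t
    return None


# Hypothetical vegetation index values for one pixel (not data from the slides)
forest = [0.85, 0.86, 0.84, 0.85, 0.87, 0.86]
cleared = [0.85, 0.86, 0.84, 0.40, 0.35, 0.45]
print(detect_drop(forest))   # no abrupt drop
print(detect_drop(cleared))  # drop at the fourth observation
```

A fixed threshold is the crudest choice; published approaches model seasonality and noise explicitly, which matters for the low-resolution artifacts mentioned for MODIS.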

Time series analysis of land change

[Figure: example time series with panels labelled Forest; Pasture / Forest; Forest / Agriculture]
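These trajectory types can be told apart, very roughly, by how high and how seasonal the vegetation index is in each year. The rule below (stable high values for forest, a strong seasonal cycle for agriculture, anything else for pasture) and its thresholds are assumptions made for illustration, not a published classifier.

```python
def label_year(values, forest_min=0.75, crop_amplitude=0.4):
    """Label one year of vegetation index values with a hypothetical rule:
    strong seasonal amplitude -> Agriculture,
    consistently high values -> Forest,
    anything else            -> Pasture."""
    amplitude = max(values) - min(values)
    if amplitude >= crop_amplitude:
        return "Agriculture"
    if min(values) >= forest_min:
        return "Forest"
    return "Pasture"


def label_trajectory(series, per_year):
    """Split a series into consecutive years of `per_year` samples and label each."""
    return [label_year(series[i:i + per_year])
            for i in range(0, len(series), per_year)]


# Hypothetical yearly profiles, 4 samples per year
forest_year = [0.85, 0.83, 0.86, 0.84]
pasture_year = [0.5, 0.6, 0.55, 0.45]
crop_year = [0.2, 0.8, 0.7, 0.25]
print(label_trajectory(forest_year + pasture_year + crop_year, 4))
```

Labelling year by year turns one pixel's series into a land-change trajectory such as Forest, then Pasture, then Agriculture.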

The data

Earth observation satellites and geosensor webs provide key information about global change…

…but that information needs to be modelled and extracted

EO data is now free… and big (image source: NASA)

Sentinels: 3 Tb/day

Is free data download our answer?

Currently, users download one snapshot at a time

Data Access Hitting a Wall

How do you download a petabyte? You don’t! Move the software to the archive.

Landsat/TM (August 2007)

MODIS (November 2007)

How hard is it to use MODIS?

Detecting deforestation and degradation in MODIS requires considerable expertise (because of low-resolution artifacts)

The challenge

Daily warnings of newly deforested large areas

Real-time Deforestation Monitoring

Evaluation of automated methods in one image only!

Real-time Deforestation Monitoring: how to make progress?

The practices of the research community do not match the needs of the end-users!


Where we want to get to

Remote visualization and method development

Big data EO management and analysis

40 years of Earth Observation data of land change accessible for analysis and modelling.

● 30 years of EO experience
● Powerful analysis engine (R)
● EO database tech (Terralib)
● Time series EO analysis

SciDB: innovative DBMS for big arrays

INPE + IFGI

What we know

What we know we don’t know 1: Data

How to put all EO data together?
How to work with different ST resolutions?
Different satellites have different calibrations.
Geometric and radiometric problems.

What we know we don’t know 2: databases

How to organize scientific data in array databases?
How to match data semantics to arrays?
What’s the equivalent of a transaction? What about concurrency control?
How to support worldwide users?

What we know we don’t know 3: methods

What are good tools for space-time modelling of EO data?
How to combine time series with spatial statistics?
How to do space-time object and event detection?
How to develop a library of methods for the SciDB-R environment?

What we know we don’t know 4: applications

How best to use ST EO data for global forest studies?
How best to use ST EO data for global food studies?

The technology

Nature

“A few satellites can cover the entire globe, but there needs to be a system in place to ensure their images are readily available to everyone who needs them. Brazil has set an important precedent by making its Earth-observation data available, and the rest of the world should follow suit.”


R: The lingua franca for data analysis

Database

Array databases: all data from a sensor put together in a single array with dimensions x, y and t

result = analysis_function(points in space-time)
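The idea of `result = analysis_function(points in space-time)` can be sketched with a small dense x/y/t array in plain Python, where the analysis function receives the full time series of each pixel. Nested lists stand in for the array database here; `make_array` and `analyse` are hypothetical helpers, not SciDB or R functions.

```python
from statistics import mean


def make_array(nx, ny, nt, fill):
    """Build a dense x/y/t array as nested lists, indexed arr[x][y][t].
    Stands in for a single sensor array in an array database."""
    return [[[fill(x, y, t) for t in range(nt)] for y in range(ny)]
            for x in range(nx)]


def analyse(arr, analysis_function):
    """Apply analysis_function to the time series at every (x, y) location."""
    return [[analysis_function(arr[x][y]) for y in range(len(arr[x]))]
            for x in range(len(arr))]


# Toy 2 x 2 x 3 array whose values are x + y + t
arr = make_array(2, 2, 3, lambda x, y, t: x + y + t)
print(analyse(arr, mean))  # mean over time for each pixel
```

Any function of a time series (a trend, a break detector, a phenology metric) can be passed as `analysis_function`; keeping the whole series of a pixel together is what the array layout makes cheap.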

SciDB Architecture: “shared nothing”

Large arrays are broken into chunks; distributed servers process the data in parallel.

Chunks

Example array (4 x 4):
0 1 1 2
3 5 8 13
21 34 55 89
144 233 377 610

Split into four 2 x 2 chunks:
Chunk 1: 0 1 / 3 5
Chunk 2: 1 2 / 8 13
Chunk 3: 21 34 / 144 233
Chunk 4: 55 89 / 377 610
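The chunked, shared-nothing layout can be illustrated with the same 4 x 4 array: each chunk is handed to a separate worker that only sees its own cells, and the partial results are combined at the end. In this sketch Python threads stand in for separate SciDB server instances and a per-chunk sum stands in for a real analysis; both are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(array, size):
    """Split a 2D list into size x size chunks keyed by chunk origin (row, col)."""
    chunks = {}
    for r0 in range(0, len(array), size):
        for c0 in range(0, len(array[0]), size):
            chunks[(r0, c0)] = [row[c0:c0 + size] for row in array[r0:r0 + size]]
    return chunks


def chunk_sum(chunk):
    """Work done by one worker: it only sees the cells of its own chunk."""
    return sum(sum(row) for row in chunk)


array = [[0, 1, 1, 2],
         [3, 5, 8, 13],
         [21, 34, 55, 89],
         [144, 233, 377, 610]]

chunks = split_into_chunks(array, 2)
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map returns results in input order, so they line up with the keys
    partial = dict(zip(chunks, pool.map(chunk_sum, chunks.values())))
total = sum(partial.values())  # combine step
print(partial)
print(total)
```

Because no worker needs another worker's cells, adding servers adds capacity; operations that do need neighbouring cells across chunk borders are where this model gets harder.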

The Proposed Solution

Software goes where the data is!

SciDB: array database for big scientific data

Free satellite images

R: Powerful data analysis methods

Global Land Observatory: describing change in a connected world

Unique repository of knowledge and data about global land change

40 years of Landsat + 12 years of MODIS + Sentinels + CBERS


Methods for land change in forestry and agricultural uses


ST arrays allow new questions: are biofuels replacing food production in Brazil?

source: B. Rudorff, INPE

RFields

Current GIS architecture

Single data source, single data schema, layer-oriented view

Distributed architecture for GIS

Consumer

Broker

Provider(s)

- Catalog of available data sets
- Location and access information
- Data set metadata

I need: rainfall of the Brazilian Amazon from 1999 to 2005

Data set 1: WGS84 (SciDB)
Data set 2: SAD69 (GeoTIFF)

Get Data Set

Data Set

SOA

Huge diversity of Geospatial Data...

What is the data about?
What is the data format?
Where to find the data?

Data Discovery

Consumer: R package (RGIS?)

- Creates an abstraction layer between users and data sources

- Provides direct access to geospatial data types (Coverage, Time Series, Trajectories)

Broker (RDF triple store)

- Manages available data sources and their data sets
- Stores data set metadata
- Provides thematic, temporal and geospatial filters
- Links data sets to other repositories (metadata enhancement)
- Provides credentials for accessing data sources

Main challenge:
- Generic vocabulary for describing data sets / data sources
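The broker's role, keeping metadata about data sets and answering thematic, temporal and geospatial queries such as the rainfall request above, can be sketched with an in-memory catalog. The slides place this metadata in an RDF triple store; the plain Python list, the `DataSetEntry` fields, the bounding boxes and the years below are all hypothetical, and the sketch does not reconcile the different reference systems (WGS84 vs SAD69).

```python
from dataclasses import dataclass


@dataclass
class DataSetEntry:
    """Metadata the broker keeps about one data set (hypothetical schema)."""
    name: str
    theme: str
    start_year: int
    end_year: int
    bbox: tuple    # (min_lon, min_lat, max_lon, max_lat)
    crs: str
    provider: str  # e.g. "SciDB" or "GeoTIFF"


def overlaps(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find(catalog, theme, start_year, end_year, bbox):
    """Thematic, temporal and geospatial filter over the catalog."""
    return [d for d in catalog
            if d.theme == theme
            and d.start_year <= end_year and start_year <= d.end_year
            and overlaps(d.bbox, bbox)]


# Hypothetical catalog entries and query box
catalog = [
    DataSetEntry("Data set 1", "rainfall", 1998, 2003, (-75.0, -20.0, -45.0, 5.0), "WGS84", "SciDB"),
    DataSetEntry("Data set 2", "rainfall", 2002, 2010, (-70.0, -15.0, -50.0, 0.0), "SAD69", "GeoTIFF"),
    DataSetEntry("Data set 3", "temperature", 1999, 2005, (-75.0, -20.0, -45.0, 5.0), "WGS84", "SciDB"),
    DataSetEntry("Data set 4", "rainfall", 1999, 2005, (-10.0, 35.0, 30.0, 60.0), "WGS84", "GeoTIFF"),
]
amazon = (-74.0, -18.0, -44.0, 5.0)  # rough, hypothetical bounding box
hits = find(catalog, "rainfall", 1999, 2005, amazon)
print([(d.name, d.crs, d.provider) for d in hits])
```

The query returns both rainfall sets even though they use different datums and formats, which is exactly why a generic vocabulary and CRS handling are the hard part of the broker.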

Generic Fields data type operations: new, add obs, domain, extent, value, combine, neigh, apply, select, filter, reduce

Reference systems
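A minimal sketch of how a generic field could expose the operations listed above, treating a field as a mapping from space-time positions (x, y, t) to values. Only the operation names come from the slide; the `Field` class, its method signatures and the meaning given to each operation (for example, `neigh` returning the values of spatial neighbours at the same time step) are assumptions.

```python
from functools import reduce as fold


class Field:
    """Hypothetical generic field: a function from (x, y, t) positions to values.
    Operation names follow the slide; their semantics here are assumptions."""

    def __init__(self, obs=None):            # "new"
        self._obs = dict(obs or {})

    def add_obs(self, pos, val):             # "add obs"
        self._obs[pos] = val

    def domain(self):
        return set(self._obs)

    def extent(self):
        """Bounding box ((xmin, ymin, tmin), (xmax, ymax, tmax)) of the domain."""
        xs, ys, ts = zip(*self._obs)
        return (min(xs), min(ys), min(ts)), (max(xs), max(ys), max(ts))

    def value(self, pos):
        return self._obs[pos]

    def apply(self, f):
        return Field({p: f(v) for p, v in self._obs.items()})

    def filter(self, pred):
        return Field({p: v for p, v in self._obs.items() if pred(v)})

    def select(self, lo, hi):
        """Keep positions with lo <= pos <= hi in every coordinate."""
        return Field({p: v for p, v in self._obs.items()
                      if all(l <= c <= h for l, c, h in zip(lo, p, hi))})

    def neigh(self, pos, radius=1):
        """Values at spatial neighbours of pos within radius, same time step."""
        x, y, t = pos
        return [v for (px, py, pt), v in self._obs.items()
                if pt == t and (px, py) != (x, y)
                and abs(px - x) <= radius and abs(py - y) <= radius]

    def combine(self, other, f):
        """Pointwise combination on the positions both fields share."""
        return Field({p: f(v, other._obs[p]) for p, v in self._obs.items()
                      if p in other._obs})

    def reduce(self, op):
        return fold(op, self._obs.values())


# Toy 2 x 2 x 2 field whose values are x + y + t
f = Field()
for x in range(2):
    for y in range(2):
        for t in range(2):
            f.add_obs((x, y, t), x + y + t)
print(f.extent())
print(f.select((0, 0, 1), (1, 1, 1)).reduce(lambda a, b: a + b))  # sum at t = 1
```

Because every operation returns a new `Field` or a plain value, the same calls could in principle be translated into server-side array operations instead of running in memory.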

Generic Fields data type for Big Spatial Data

● GI representation as arrays

● Use of ADBMS routines for GI (server-side processing)

● Keep existing interfaces to data (R & Terralib, Terraview)

MOD09Q1

250 m spatial resolution

8-day temporal resolution

4800 x 4800 pixels, 3 bands (red, nir, qc), 13 years of data (since 2000), HDF format tiles

MODIS tiles

Region (tiles x 8-day periods x years) | HDFs | HDF size (TB) | Binary size (TB) | Binary size ijt (TB)
Amazon (8 x 46 x 13) | 4784 | 0.30 | 2.41 | 4.81
South America (24 x 46 x 13) | 14352 | 0.91 | 7.22 | 14.44
World land area (225 x 46 x 13) | 134550 | 8.53 | 67.67 | 135.33
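The HDF counts are tiles x 46 x 13, and the binary sizes in the table can be reproduced with simple arithmetic if one assumes that each band value is stored as an 8-byte number, that the ijt layout additionally stores the i, j and t coordinates as 8-byte values, and that TB in the table means 2^40 bytes. These are assumptions that happen to match the published figures, not storage details stated in the slides.

```python
PERIODS_PER_YEAR = 46   # 8-day composites per year
YEARS = 13              # 2000 onwards
PIXELS = 4800 * 4800    # pixels per MODIS tile
BANDS = 3               # red, nir, qc
BYTES_PER_VALUE = 8     # assumption: every value stored as an 8-byte number
TIB = 2 ** 40           # assumption: "TB" in the table means 2^40 bytes


def region_sizes(tiles):
    """Return (number of HDFs, binary size, binary size with i/j/t), sizes in TiB."""
    hdfs = tiles * PERIODS_PER_YEAR * YEARS
    binary = hdfs * PIXELS * BANDS * BYTES_PER_VALUE
    # ijt layout: the three coordinates are stored alongside the three bands
    binary_ijt = hdfs * PIXELS * (BANDS + 3) * BYTES_PER_VALUE
    return hdfs, round(binary / TIB, 2), round(binary_ijt / TIB, 2)


for region, tiles in [("Amazon", 8), ("South America", 24), ("World land area", 225)]:
    print(region, region_sizes(tiles))
```

Storing explicit coordinates doubles the volume, which is why the ijt column is twice the binary column in every row.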

Data loading

MODIS HDF → Export → SciDB bin → Load → 1D array → Insert / Redim / Apply → 3D array

Timings: 1.3 min / HDF and 0.8 min / HDF

● Ubuntu server 12 LTS
● Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
● 24 cores
● RAM 125 GB
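The loading steps above can be mimicked in plain Python to show what each stage does to the data: cells arrive as a flat 1D list, an apply step adds a derived attribute, a redimension step turns (col, row, time) attributes into array coordinates, and an insert step merges the result into the target 3D array. This is not SciDB code; the record layout, the function names and the use of NDVI as the derived attribute are illustrative assumptions.

```python
def load_1d(records):
    """'Load': cells arrive as a flat list of (col, row, time, red, nir, qc)."""
    return list(records)


def apply_ndvi(cells):
    """'Apply': add a derived attribute; NDVI is used only as an example."""
    out = []
    for col, row, t, red, nir, qc in cells:
        ndvi = (nir - red) / (nir + red) if (nir + red) else None
        out.append((col, row, t, red, nir, qc, ndvi))
    return out


def redimension(cells):
    """'Redim': index cells by (col, row, time) instead of by position in a list."""
    return {(col, row, t): attrs for col, row, t, *attrs in cells}


def insert(target, array3d):
    """'Insert': merge the new cells into the target 3D array."""
    target.update(array3d)
    return target


# Two hypothetical HDF files with integer reflectance values
target = {}
hdf1 = [(0, 0, 0, 100, 300, 0), (0, 1, 0, 200, 200, 0)]
hdf2 = [(0, 0, 1, 300, 300, 1)]
for hdf in (hdf1, hdf2):
    insert(target, redimension(apply_ndvi(load_1d(hdf))))
print(target[(0, 0, 0)])
```

Loading file by file into one growing 3D array is what makes the per-HDF timings the natural unit for estimating total loading time.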

Estimated loading time

Region (tiles x 8-day periods x years) | HDFs | Time
Amazon (8 x 46 x 13) | 4784 | 2.6 days
South America (24 x 46 x 13) | 14352 | 7.9 days
World land area (225 x 46 x 13) | 134550 | 74.2 days
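The loading-time estimate is the number of HDF files multiplied by the time per file. Using the rounded 0.8 min/HDF figure gives about 2.7, 8.0 and 74.8 days; the table's 2.6, 7.9 and 74.2 days correspond to roughly 0.78 to 0.79 min/HDF, which suggests the table was computed from an unrounded measurement. That reading is an inference, not something the slides state.

```python
MINUTES_PER_DAY = 24 * 60


def loading_days(hdfs, minutes_per_hdf):
    """Estimated loading time in days for a given number of HDF files."""
    return hdfs * minutes_per_hdf / MINUTES_PER_DAY


def implied_rate(hdfs, days):
    """Minutes per HDF implied by a reported loading time in days."""
    return days * MINUTES_PER_DAY / hdfs


regions = [("Amazon", 4784, 2.6), ("South America", 14352, 7.9), ("World land area", 134550, 74.2)]
for name, hdfs, reported_days in regions:
    print(name, round(loading_days(hdfs, 0.8), 1), round(implied_rate(hdfs, reported_days), 3))
```

Loading time grows linearly with the number of files on a single server, so the global estimate of about 74 days is what motivates loading in parallel.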

SciDB performance
