Big Data, Data and Information Mining for Earth Observation


Page 1: Big Data, Data and Information Mining for Earth Observation

International Conference Frontiers in Diagnostic Technologies (ICFDT 2013) 26/11/2013

P.G. Marchetti ESA, M. Iapaolo Randstad

Ground Segment and Mission Operations Department

Research and Ground Segment Technology Section

Earth Observation Programmes Directorate

Image Information Mining and Knowledge Discovery from Earth Observation Data

Towards the Sentinels Era

Page 2: Outline

1. Background on the European Space Agency
2. Motivation
3. Overview of ESA activities in the IIM field
4. Systems and services for EO data exploitation
5. The road ahead

Page 3: The Heritage: ERS and ENVISAT

ESA UNCLASSIFIED – For Official Use

• ERS and Envisat missions 1991-2012
• More than 2 Petabytes of data
• Two decades of global change records
• Need for data preservation, availability and exploitation

Page 4: Ten Years of Envisat Science

Timeline:
• Mar 02: Launch; first images
• Sep 04: Envisat Symposium, Salzburg (A)
• Apr 07: Envisat Symposium, Montreux (CH)
• Jun 10: Living Planet Symposium, Bergen (N)
• Sep 13: Living Planet Symposium, Edinburgh (UK)
• and many workshops dedicated to specific Envisat user communities

5000 scientific projects using Envisat data

Envisat was the Sentinel “precursor” for many operational users

Image captions: Global air pollution; Chlorophyll concentration; Hurricane Katrina; CO2 map; Prestige tanker oil slick; B-15A iceberg; Bam earthquake; Ozone hole 2005; Arctic 2007; L’Aquila 2009; Iceland 2010; Japan 2011

Page 5: The Copernicus Programme

Copernicus (formerly known as GMES) is a European space flagship programme led by the European Union

Provides the necessary data for operational monitoring of the environment and for civil security

ESA coordinates the space(*) component

Components: Space Component; In-Situ Component; Services Component

(*) spacecraft, flight operation segment, ground segment

Page 6: Copernicus Space Component: Dedicated Missions

• S1A/B: Radar Mission
• S2A/B: High Resolution Optical Mission
• S3A/B: Medium Resolution Imaging and Altimetry Mission
• S4A/B: Geostationary Atmospheric Chemistry Mission
• S5P: Low Earth Orbit Atmospheric Chemistry Precursor Mission
• S5A/B/C: Low Earth Orbit Atmospheric Chemistry Mission
• Jason-CS A/B: Altimetry Mission

Figure annotation, shown twice: first launch in 2014

Page 7: Copernicus Contributing Missions

Mission categories shown in the diagram:
• Optical VHR and HR missions
• Optical MR and LR missions
• SAR missions
• Altimetry missions
• Atmospheric missions

Missions named in the diagram: DMC; Deimos-2; SPOT (HRS); PROBA-V; SPOT (VGT); MetOp; Meteosat 2nd Generation; Cryosat; Jason; Pléiades; RapidEye; COSMO-Skymed; Radarsat; TerraSAR-X; Tandem-X

Sentinels are complementary

Page 8: Motivation

1. Foster the use of IIM and derived technologies in support of EO data exploitation
2. Develop state-of-the-art data processing for improving access to and dissemination of future EO data (e.g. the Sentinel missions)
3. Implement systems and services for supporting the “scientific exploitation” of EO data
4. Investigate new approaches and methodologies to exploit data from all available missions and archives (joint effort with the Long Term Data Preservation programme)

Page 9: Image Information Mining Coordination Group (IIMCG)

The Image Information Mining Coordination Group (IIMCG):
• Space Agencies (ESA, DLR, CNES, ASI)
• European Institutions (EUSC, JRC)
• National Research Institutes (Uni-Trento, ETHZ, INGV, Mississippi State University)

Main objectives:
• Inform Agencies and partners, promote research and technological activities on IIM (automatic information extraction from EO data for image understanding and retrieval)
• Promote the use of IIM techniques for management and exploitation of very large EO data archives/missions (PB of data)
• Foster the role of IIM in the context of future missions and existing archives
• Involve industry and agency partners to increase the relevance of IIM activities in Europe

Page 10: 10 years of IIM activities at ESA

Technology activities over last decade (timeline axis: 2000 2002 2004 2006 2008 2010 2012):
• Knowledge Driven Information Mining Prototype
• Knowledge-centred Earth Observation Prototype
• Multi-sensor Evolution Analysis Prototype

Main achievements:
• KIM System: IIM reference prototype @ ESRIN
• Platforms for EO data exploitation (KEO, GPOD, SSE, etc.)
• Tools for multi-temporal and evolution analysis (MEA)

Issues:
• Limited number of scientific and industrial partners involved
• National efforts not coordinated and harmonised
• Funds limited with respect to the size of the research goals

Page 11: Image Information Mining: From Data to Information

Image Information Mining (IIM):
• Provides processing tools to extract features from images and associate meaning to extracted features (bridging the gap between data and information)
• Empowers users (researchers, service providers, decision makers) to identify and reuse relevant information for their applications
• Encourages the use of common cooperative environments to achieve a common knowledge

Diagram labels: EO Data; Data (PB); Information (KB); Information; Acquisition; Catalogue & Ordering; Algorithms & Applications; Knowledge Models & Ground Truth

Page 12: KIM (Knowledge-based Information Mining)

The KIM prototype developed @ ESRIN permits:
• Intelligent and effective access to information in large EO datasets
• Improved exploration and use of EO images for scientific research
• Extraction of relevant information for different applications (change detection, global monitoring, disaster management, …)
• Implementation, integration and validation of services derived from IIM methods

Three main components:
1. Ingestion Software (Primitive Feature Extraction / Clustering)
2. Database (storing extracted information)
3. Interactive Client Application
   i. Training and definition of “semantic rules”
   ii. Application of training (rules) to the entire collection
   iii. Definition of “semantic labels” for extracted information
   iv. Store for successive re-use
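The three KIM components can be illustrated with a toy sketch. This is not the KIM implementation: the tile-based mean/variance features, the small k-means routine, the majority-vote "semantic rule" and all function names are assumptions chosen only to show how ingestion, storage and interactive training might fit together.

```python
import numpy as np

def extract_primitive_features(image, tile=4):
    """Ingestion, step 1: one (mean, variance) feature vector per image tile."""
    rows, cols = image.shape
    feats = [
        (image[r:r + tile, c:c + tile].mean(), image[r:r + tile, c:c + tile].var())
        for r in range(0, rows - tile + 1, tile)
        for c in range(0, cols - tile + 1, tile)
    ]
    return np.array(feats, dtype=float)

def cluster(features, k, iterations=10):
    """Ingestion, step 2: plain k-means with a deterministic farthest-point start."""
    centres = [features[0]]
    while len(centres) < k:
        nearest = np.min([np.linalg.norm(features - c, axis=1) for c in centres], axis=0)
        centres.append(features[nearest.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(iterations):
        dists = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
        ids = dists.argmin(axis=1)
        for j in range(k):
            if np.any(ids == j):
                centres[j] = features[ids == j].mean(axis=0)
    dists = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

def learn_semantic_rule(cluster_ids, training):
    """Client, step i: map each cluster to the label the user gave most often."""
    votes = {}
    for tile_index, label in training.items():
        votes.setdefault(int(cluster_ids[tile_index]), []).append(label)
    return {c: max(set(labels), key=labels.count) for c, labels in votes.items()}

def apply_rule(cluster_ids, rule, default="unlabelled"):
    """Client, steps ii and iii: label every tile of the collection."""
    return [rule.get(int(c), default) for c in cluster_ids]

# Synthetic 8x8 scene: dark left half, bright right half (hypothetical data).
image = np.empty((8, 8))
image[:, :4] = 0.1
image[:, 4:] = 0.9

features = extract_primitive_features(image)
cluster_ids = cluster(features, k=2)
rule = learn_semantic_rule(cluster_ids, {0: "water", 1: "land"})  # two user-labelled tiles
semantic_labels = apply_rule(cluster_ids, rule)
database = {"rule": rule, "labels": semantic_labels}  # step iv: store for re-use
```

The user labels only two tiles; the rule learned from them is then applied to all four tiles, which mirrors the "train on a few examples, apply to the entire collection" idea on the slide.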

Page 13: KIM Architectural Elements

Diagram elements: EO Images; KIM; Ingestion (Feature Extraction, Clustering); Database; Information Mining; Client

Input: EO images

Output:
• Identifiers of searched images
• Feature Maps / Thematic maps

Page 14: KIM Search and label

KIM allows the user to:
• inspect a collection of images…
• …interactively define “semantic features” using the “primitive features” extracted by the system…
• …search for the defined feature within the entire collection…

Page 15: KIM Information Extraction

…and extract Feature Maps or Thematic Maps

Examples shown: Forest Monitoring; Flooded areas; Cloud masks

Page 16: KIM Primitive features

• Spectral: Spectral signature
• Texture: Structural information extracted with the Gibbs Markov Random Fields (GMRF) model (S0: full resolution images; S1: sub-sampled images)
• DCT: Discrete Cosine Transform, which transforms signals and images from the spatial domain to the frequency domain
• EMBD: Enhanced-Model-Based-Despeckling, which performs a high quality despeckling of SAR images
• Area: Area of the objects detected with the segmentation process
• Compactness: Compactness of the objects detected with the segmentation process
• Spectral Mean: Mean value of the radiometric information of the image inside the closed area detected by the segmenter
• Spectral Variance: Variance of the radiometric information of the image inside the closed area detected by the segmenter
• Hu Moments: Hu-Moment Invariants, i.e. shape information conveyed by the contour points. Hu moments are invariant to scale, rotation and translation (the first 4 of the 7 invariant moments have been used as shape descriptors).
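As an illustration of the last descriptor, the first four Hu invariants can be computed from normalised central moments with the standard formulas. The sketch below is an assumption-laden example, not KIM code: it works on a filled binary mask rather than on contour points, and the function name `hu_moments_first4` and the test shape are hypothetical.

```python
import numpy as np

def hu_moments_first4(mask):
    """First four Hu invariants of a 2-D mask (standard textbook formulas)."""
    img = np.asarray(mask, dtype=float)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xc = (x * img).sum() / m00  # centroid removes the dependence on translation
    yc = (y * img).sum() / m00

    def eta(p, q):
        # Central moment mu_pq, normalised by m00 to remove the dependence on scale.
        mu = ((x - xc) ** p * (y - yc) ** q * img).sum()
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    return np.array([h1, h2, h3, h4])

# Hypothetical L-shaped object on a 12x12 grid.
shape = np.zeros((12, 12))
shape[2:9, 3:5] = 1
shape[7:9, 3:9] = 1

rotated = np.rot90(shape)                       # 90 degree rotation
shifted = np.roll(shape, (2, 1), axis=(0, 1))   # translation without wrap-around

hu_original = hu_moments_first4(shape)
hu_rotated = hu_moments_first4(rotated)
hu_shifted = hu_moments_first4(shifted)
```

On a pixel grid the invariance is exact for translations and 90 degree rotations; for arbitrary rotations and rescaling it holds only approximately because of resampling.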

Page 17: KIM Validation

KIM has been tested and validated with different datasets:
1. MERIS RR / MERIS FR
2. ERS / ASAR
3. SPOT
4. Landsat
5. Maps (Level 2 / Level 3 products)

Large number of collections created

Low number of significant semantic features identified

Page 18: KIM for Information Extraction

1. Flood Detection (SAR data)
2. Cloud Detection (MERIS RR)
3. Long-term Forest Monitoring (Landsat)
4. Rapid Mapping / Damage Assessment (VHR optical data)

The potential of the tool has been highlighted and confirmed in different contexts

End-user expectations not always met

Page 19: KEO (Knowledge centred Earth Observation)

KEO is a distributed Component-based Processing Environment (CPE) that makes it possible to:
a. Create & semantically identify internal/external Processing Components
b. Graphically chain Processing Components into processing chains
c. Create Processing Components from IIM components (KIM training)
d. Export and store outputs into Web Servers (WFS, WMS, WCS)

KEO also provides some relevant Reference Data Sets:
e. Heterogeneous data and information, growing with external contributions (images, documents, DEMs, photos, processors, etc.)
f. In support of various applications (Classification, Time Series Analysis, Ortho-rectification, Urban Monitoring, Interferometry, etc.)
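Chaining Processing Components (item b) can be pictured as function composition. The following is only a sketch under assumptions: the `chain` helper and the three components (`to_reflectance`, `cloud_mask`, `cloud_fraction`) are hypothetical and do not reflect the KEO component interface.

```python
from functools import reduce
import numpy as np

def chain(*components):
    """Build a processing chain: the output of each component feeds the next."""
    def run(data):
        return reduce(lambda acc, component: component(acc), components, data)
    return run

# Hypothetical processing components, each a plain function on arrays.
def to_reflectance(digital_numbers, gain=0.01):
    return digital_numbers * gain

def cloud_mask(reflectance, threshold=0.5):
    return reflectance > threshold

def cloud_fraction(mask):
    return float(mask.mean())

processing_chain = chain(to_reflectance, cloud_mask, cloud_fraction)

scene = np.array([[10, 90],
                  [95, 20]])
fraction = processing_chain(scene)  # two of the four pixels exceed the threshold
```

Because every component has the same "data in, data out" shape, a component produced elsewhere (for example a trained classifier) could be inserted into the chain without changing the other steps.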

Page 20: KEO CPE: Graphical Processor Designer

Page 21: MEA (Multi-temporal Evolution Analysis)

Multi-temporal analysis of HR / VHR products:

1. Select multi-temporal applications that might benefit from such extension

2. Design, implement and integrate the automatic multi-temporal algorithms to support the selected applications

3. Create the needed HR/VHR Reference Data Sets and Evolution Models

4. Develop standard interfaces between the different systems for common exploitation of ingested data and processing capabilities

5. Integrate algorithms and Evolution Models provided by other independent projects

6. Validate (with the support of a Validation Group) the Automatic Multi-temporal algorithms and Evolution Models

Page 22: MEA (Multi-temporal Evolution Analysis)

The MEA-ASIM system aims at providing:
1. Advanced tools for Land Use / Land Cover change analysis
2. Level-2 EO products for real time exploitation
3. Interfaces to external systems (G-POD, KEO, data providers, etc.)
4. Access to data via standard WCS OGC interface
5. Native support for Sentinel-2 datasets

Diagram label: RSS Data Farm
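Access through a standard WCS interface (item 4) typically means issuing a GetCoverage request. The sketch below builds such a request URL assuming the OGC WCS 2.0 key-value-pair encoding; the endpoint, the coverage identifier and the axis labels `Lat` and `Long` are hypothetical, since axis names and available coverages depend on the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def wcs_get_coverage_url(endpoint, coverage_id, lat_range, lon_range, fmt="image/tiff"):
    """Build a WCS 2.0 KVP GetCoverage URL for a latitude/longitude window."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # One "subset" parameter per axis; axis labels are server-specific.
        ("subset", f"Lat({lat_range[0]},{lat_range[1]})"),
        ("subset", f"Long({lon_range[0]},{lon_range[1]})"),
        ("format", fmt),
    ]
    return endpoint + "?" + urlencode(params)

url = wcs_get_coverage_url(
    "https://example.org/wcs",      # placeholder endpoint, not a real service
    "EXAMPLE_COVERAGE",             # placeholder coverage identifier
    lat_range=(41.0, 42.0),
    lon_range=(12.0, 13.0),
)
query = parse_qs(urlsplit(url).query)  # decode again to inspect the parameters
```

The URL is only constructed here, not fetched; a real client would first call GetCapabilities and DescribeCoverage to discover the coverage identifiers and axis labels the server actually exposes.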

Page 23: MEA Pixel and Coverage analysis

Time-Series Analysis: single and multi plot functionality
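Pixel and coverage analysis can be sketched on a small data cube. The example assumes the products are stacked as a NumPy array with axes (time, row, column); the function names and the synthetic values are hypothetical and are not taken from MEA.

```python
import numpy as np

def pixel_time_series(cube, row, col):
    """Values of a single pixel across all acquisition dates."""
    return cube[:, row, col]

def coverage_time_series(cube, mask=None):
    """Mean over an area of interest for each date, ignoring missing (NaN) pixels."""
    if mask is None:
        mask = np.ones(cube.shape[1:], dtype=bool)
    return np.nanmean(cube[:, mask], axis=1)

# Synthetic cube: 3 dates, 2x2 pixels.
cube = np.arange(12, dtype=float).reshape(3, 2, 2)

pixel_series = pixel_time_series(cube, row=0, col=1)
area_series = coverage_time_series(cube)

# Restricting the analysis to the top row of the scene.
top_row = np.array([[True, True], [False, False]])
top_row_series = coverage_time_series(cube, top_row)
```

Several such series, for different pixels or different products, can then be drawn on one set of axes, which corresponds to the multi plot view mentioned above.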

Page 24: MEA Pixel and Coverage analysis

Cross-comparison of EO products

Page 25: Exploitation Platforms for EO

Development and implementation of collaborative Exploitation Platforms (G-POD, SSEP, E-CEO, etc.):
1. Fostering the scientific exploitation of EO data
2. Automating the creation of data mining and information extraction experiments and algorithms
3. Supporting the creation of EO-based applications and services
4. Supporting the entire scientific research process:
   a. Addressing specific scientific challenges and tackling new research problems in a “parallel and collaborative way”
   b. Generation of reproducible results that can be easily shared and validated

Page 26: Research and Service Support: Research Process

Page 27: RSS G-POD Process Steps

Principal Investigators:
• EO algorithms delivery
• Data type and range indication
• Output validation
• On-demand EO data processing
• Use of produced data (scientific projects delivery)
• Publications

RSS:
• Data are made available in the RSS catalogue
• Algorithm porting and integration
• Test and validation (involving the PI)
• On-demand EO data processing
• Delivery

Page 28: RSS Flexible Resources

Flexible Infrastructure satisfies:
• HW requirements
• Connectivity requirements
• SLA (HA, help desk, ticketing systems, etc.)

On-demand processing service (diagram labels: Platform; Infrastructure; G-POD):
• ESRIN: 172 cores, 400 TB
• UK-PAC: 96 cores, 300 TB
• Flexible / Unlimited Infrastructure: 10-200 cores, 1-10 TB

Diagram labels: EO Scientists / Principal Investigators; EO data; Process; delivery

Volume accessed by PI projects in 2012:
• Total Number Submitted Jobs: 38,774
• Average Number of Products per Job: 35
• Average Product Size: 700 MB
• Total Size Data Processed: 906 TB
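The 2012 volume figures are mutually consistent, as a short calculation shows. The only assumption is the unit convention: the quoted total matches when 1 TB is taken as 1024 × 1024 MB.

```python
jobs = 38_774
products_per_job = 35      # average
product_size_mb = 700      # average

total_products = jobs * products_per_job
total_mb = total_products * product_size_mb

total_tb_binary = total_mb / 1024**2    # 1 TB = 1024 * 1024 MB
total_tb_decimal = total_mb / 1000**2   # 1 TB = 1000 * 1000 MB

print(f"{total_products:,} products, "
      f"{total_tb_binary:.0f} TB (binary) or {total_tb_decimal:.0f} TB (decimal)")
```

With binary units the product of the three averages rounds to the 906 TB stated on the slide; with decimal units it would be about 950 TB.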

Page 29: RSS Facts & Figures

On-demand Processing: actual figures in the last 3 years

• Supported more than 40 active users per year
• Supported >20 processing/re-processing campaigns (including entire missions, e.g. MERIS, ASAR, SMOS and TPM)
• Integrated ~10 new algorithms per year
• Upgraded ~15 algorithms per year
• Set up flexible (additional) processing capacity in less than 2 working days
• Managed >450 TB data farm (ESA, TPM and scientific products):
   • ESA: ENVISAT (~320 TB), ERS (~50 TB), SMOS (~10 TB)
   • TPM: MSG (~19 TB), METOP (~11 TB), ALOS (~2 TB)
   • Scientific products: AARDVARC Swansea University and MGVI JRC (produced by GPOD and distributed via SSE), MKL3 ACRI

Page 30: SMOS Testbed

Service Purpose
The SMOS Testbed aims to provide a flexible test environment to support the ESA calibration team for L1 calibration, and the Expert Support Laboratories (ESLs) for L2 Soil Moisture and Ocean Salinity pre-validation.

G-POD support elements
• Fast integration of new versions
• SMOS L0 NRT ingestion chain set-up for L1 NRT custom re-processing
• Access to online data for bulk re-processing
• Access to flexible cloud resources for meeting deadlines
• On-demand SMOS L1 and L2 processors available for SMOS Teams

Workflow diagram elements: SMOS New Processor Delivery; Processor Integration in G-POD; G-POD Processing Campaign; Results Analysis and Validation; Auxiliary and Calibration Datasets

Page 31: SMOS Testbed

SMOS L1 TESTBED
• Processors: Calibration, Telemetry, Level 1A, 1B, 1C
• Supported versions: 3.46, 5.00, 5.01, 5.02, 5.03, 5.04, 5.05, 6.00, 6.01
• Reference data series: L0, L1A, L1B, L1C (reprocessed)
• Auxiliary data baseline: as per Operational environment

SMOS L2 SOIL MOISTURE TESTBED
• Processor: SM L2, SM L2 post-processing
• Supported versions: 4.00, 4.01
• Reference data series: L1C (reprocessed)
• Auxiliary data baseline: as per CESBIO reprocessing

SMOS L2 OCEAN SALINITY TESTBED
• Processor: OS L2
• Supported versions: 5.00, 5.50
• Reference data series: L1C (reprocessed)
• Auxiliary data baseline: as per Operational environment

Page 32: The road ahead (1)

Funds for a full scale research programme are needed to foster:
• The widening of competence and expertise in several research centres/industrial actors in Europe
• The widening of efforts to cover time series analysis and data analytics in general
• Research and development of multi-dimensional and scalable DB solutions (including NoSQL databases, Hadoop, etc.)
• Large collaborative and persistent effort on crowdsourcing, benchmarking, image and feature annotation and evaluation
• Establishing a theoretical framework to bridge the semantic gap and be able to assign “discriminating power” to extracted features and “categorization” of extracted classes/objects
• High quality software and algorithm developments able to reach at least the “software prototype” readiness level

Page 33: The road ahead (2)

To achieve these goals it is necessary to:
• Establish a common “Big Data Mining” framework with interdisciplinary partners
• Establish an R&D network to sustain this field
• Establish a network of users, and give them access to IIM resources (system, data, …)
• Enlarge the scope of “Image” Mining to the physical parameters measured by EO instruments
• Address the “instrument” gap, instrument-application
• Develop methods to use heterogeneous data: in situ, metadata, linked data, models, etc.

Page 34: The road ahead (3)

Activities to be started:
• Promote IIM technology acceptance for EO users
• Extend and adapt methods from multimedia and social nets
• Apply human computing, gather knowledge from the use of the system, adaptation, personalization, etc.
• Focus on Web/Internet based systems
• Develop simple and specific HMI and GUI
• Focus on Visual Data Mining, Visual Analytics, and related methods
• In the PDGS identify “long term data preservation” and “interactive data exploitation” components
• Design data representations: actionable information

Page 35: The road ahead (4)

Merge the best of:
• data mining approach
• time series capability
• ability to support and host the user algorithm

Page 36: MANY THANKS!