Big Data, Data and Information Mining for Earth Observation
International Conference Frontiers in Diagnostic Technologies (ICFDT 2013) 26/11/2013
P.G. Marchetti ESA, M. Iapaolo Randstad
Ground Segment and Mission Operations Department
Research and Ground Segment Technology Section
Earth Observation Programmes Directorate
Image Information Mining and Knowledge Discovery from Earth Observation Data: Towards the Sentinels Era
Outline
1. Background on the European Space Agency
2. Motivation
3. Overview of ESA activities in the IIM field
4. Systems and services for EO data exploitation
5. The road ahead
The Heritage: ERS and ENVISAT
• ERS and Envisat missions 1991-2012
• More than 2 Petabytes of data
• Two decades of global change records
• Need for data preservation, availability and exploitation
Ten Years of Envisat Science
[Timeline figure]
Milestones: Launch (Mar 02) and first images; Envisat Symposium, Salzburg (A), Sep 04; Envisat Symposium, Montreux (CH), Apr 07; Living Planet Symposium, Bergen (N), Jun 10; Living Planet Symposium, Edinburgh (UK), Sep 13; and many workshops dedicated to specific Envisat user communities
Highlights shown: global air pollution; chlorophyll concentration; Hurricane Katrina; CO2 map; Prestige tanker oil slick; B-15A iceberg; Bam earthquake; ozone hole 2005; Arctic 2007; L’Aquila 2009; Iceland 2010; Japan 2011
5000 scientific projects using Envisat data
Envisat was the Sentinel “precursor” for many operational users
The Copernicus Programme
Copernicus (formerly known as GMES) is a European space flagship programme led by the European Union
Provides the necessary data for operational monitoring of the environment and for civil security
Components: Space Component, In-Situ Component, Services Component
ESA coordinates the space(*) component
(*) spacecraft, flight operation segment, ground segment
Copernicus Space Component: Dedicated Missions
S1A/B: Radar Mission
S2A/B: High Resolution Optical Mission
S3A/B: Medium Resolution Imaging and Altimetry Mission
S4A/B: Geostationary Atmospheric Chemistry Mission
S5P: Low Earth Orbit Atmospheric Chemistry Precursor Mission
S5A/B/C: Low Earth Orbit Atmospheric Chemistry Mission
Jason-CS A/B: Altimetry Mission
[Slide callouts: first launch in 2014 (shown twice)]
Copernicus Contributing Missions
[Figure: contributing missions grouped by category]
Categories: Optical VHR and HR missions; Optical MR and LR missions; SAR missions; Altimetry missions; Atmospheric missions
Missions shown: DMC, Deimos-2, SPOT (HRS), Pléiades, RapidEye, PROBA-V, SPOT (VGT), COSMO-Skymed, Radarsat, TerraSAR-X, Tandem-X, Cryosat, Jason, MetOp, Meteosat 2nd Generation
Sentinels are complementary
Motivation
1. Foster the use of IIM and derived technologies in support of EO data exploitation
2. Develop state-of-the-art data processing for improving access to and dissemination of future EO data (e.g. the Sentinel missions)
3. Implement systems and services for supporting the “scientific exploitation” of EO data
4. Investigate new approaches and methodologies to exploit data from all available missions and archives (joint effort with the Long Term Data Preservation programme)
Image Information Mining Coordination Group (IIMCG)
Members of the IIMCG:
• Space Agencies (ESA, DLR, CNES, ASI)
• European Institutions (EUSC, JRC)
• National Research Institutes (Uni-Trento, ETHZ, INGV, Mississippi State University)
Main objectives:
• Inform Agencies and partners, promote research and technological activities on IIM (automatic information extraction from EO data for image understanding and retrieval)
• Promote the use of IIM techniques for management and exploitation of very large EO data archives/missions (PB of data)
• Foster the role of IIM in the context of future missions and existing archives
• Involve industry and agency partners to increase the relevance of IIM activities in Europe
10 years of IIM activities at ESA
Technology activities over last decade
[Timeline, 2000 to 2012 in two-year steps: Knowledge Driven Information Mining Prototype; Knowledge-centred Earth Observation Prototype; Multi-sensor Evolution Analysis Prototype]
Main achievements:
• KIM System: IIM reference prototype @ ESRIN
• Platforms for EO data exploitation (KEO, GPOD, SSE, etc.)
• Tools for multi-temporal and evolution analysis (MEA)
Issues:
• Limited number of scientific and industrial partners involved
• National efforts not coordinated and harmonised
• Funds limited with respect to the size of the research goals
Image Information Mining: From Data to Information
• Provide processing tools to extract features from images and associate meaning with the extracted features (bridging the gap between data and information)
• Empower users (researchers, service providers, decision makers) to identify and reuse relevant information for their applications
• Encourage the use of common cooperative environments to build shared knowledge
[Diagram: EO Data flows from Acquisition and Catalogue & Ordering to Algorithms & Applications and Knowledge Models & Ground Truth; Image Information Mining (IIM) turns Data (PB) into Information (KB)]
KIM (Knowledge-based Information Mining)
The KIM prototype developed @ ESRIN permits:
• Intelligent and effective access to information in large EO datasets
• Improved exploration and use of EO images for scientific research
• Extraction of relevant information for different applications (change detection, global monitoring, disaster management, …)
• Implementation, integration and validation of services derived from IIM methods
Three main components:
1. Ingestion Software (Primitive Feature Extraction / Clustering)
2. Database (storing extracted information)
3. Interactive Client Application
   i. Training and definition of “semantic rules”
   ii. Application of training (rules) to the entire collection
   iii. Definition of “semantic labels” for extracted information
   iv. Store for successive re-use
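The interactive workflow above (train a semantic rule on primitive features, apply it to the entire collection, label the result, store it for re-use) can be illustrated with a minimal sketch. This is not KIM's actual algorithm: the nearest-centroid rule, the feature values, and every name used (`train_rule`, `apply_rule`, `rule_store`, the tile ids) are assumptions made purely for illustration.

```python
from math import dist  # Python 3.8+


def train_rule(positive, negative):
    """Summarise user-marked examples as two centroids in feature space."""
    def centroid(vectors):
        return tuple(sum(col) / len(vectors) for col in zip(*vectors))
    return centroid(positive), centroid(negative)


def apply_rule(rule, collection):
    """Return ids of items lying closer to the positive centroid than to the negative one."""
    pos, neg = rule
    return [item_id for item_id, vec in collection.items()
            if dist(vec, pos) < dist(vec, neg)]


# Hypothetical primitive features per image tile: (spectral mean, texture energy)
collection = {
    "tile-1": (0.10, 0.80),
    "tile-2": (0.15, 0.75),
    "tile-3": (0.70, 0.20),
    "tile-4": (0.65, 0.30),
}

# i. Training: the user marks one positive and one negative example
rule = train_rule(positive=[(0.12, 0.78)], negative=[(0.68, 0.25)])
# ii. Application of the rule to the entire collection
matches = apply_rule(rule, collection)
# iii. and iv. Attach a semantic label and store the rule for re-use
rule_store = {"forest": rule}
print(matches)
```

A real system would work on clustered features from the ingestion step rather than raw tuples; the sketch only shows how a stored rule separates training from later searches.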
KIM Architectural Elements
[Architecture diagram: EO Images; Ingestion (Feature Extraction, Clustering); Database; Information Mining; Client]
Input: EO images
Output: Identifiers of searched images; Feature Maps / Thematic maps
KIM Search and label
KIM lets the user inspect a collection of images, interactively define “semantic features” using the “primitive features” extracted by the system, search for the defined feature within the entire collection…
KIM Information Extraction
…and extract Feature Maps or Thematic Maps
Examples: Forest Monitoring, Flooded areas, Cloud masks
KIM Primitive features
Spectral: Spectral signature
Texture: Structural information extracted with the Gibbs Markov Random Fields (GMRF) model (S0: full resolution images; S1: sub-sampled images)
DCT: Discrete Cosine Transform; transforms signals and images from the spatial domain to the frequency domain
EMBD: Enhanced-Model-Based-Despeckling; performs a high quality despeckling of SAR images
Area: Area of the objects detected with the segmentation process
Compactness: Compactness of the objects detected with the segmentation process
Spectral Mean: Mean value of the radiometric information of the image inside the closed area detected by the segmenter
Spectral Variance: Variance of the radiometric information of the image inside the closed area detected by the segmenter
Hu Moments: Hu-Moment Invariants; shape information conveyed by the contour points. Hu moments are invariant to scale, rotation and translation (the first 4 of the 7 invariant moments are used as shape descriptors).
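As an illustration of the object-based features listed above (Area, Compactness, Spectral Mean, Spectral Variance), the sketch below computes them for a single segmented object. It is a hedged example, not KIM's implementation: the slides do not define compactness, so the isoperimetric quotient 4πA/P² is assumed here, the perimeter is approximated by counting exposed pixel edges, and the function name `region_descriptors` is hypothetical.

```python
from math import pi
from statistics import fmean, pvariance


def region_descriptors(mask, image):
    """mask: 2-D list of 0/1 marking one non-empty segmented object.
    image: radiometric values with the same shape as mask."""
    rows, cols = len(mask), len(mask[0])
    values, perimeter = [], 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            values.append(image[r][c])
            # Count pixel edges that face the background or the image border
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < rows and 0 <= cc < cols
                if not inside or not mask[rr][cc]:
                    perimeter += 1
    area = len(values)
    return {
        "area": area,
        # Assumed definition: isoperimetric quotient 4*pi*A / P**2
        "compactness": 4 * pi * area / perimeter ** 2,
        "spectral_mean": fmean(values),
        "spectral_variance": pvariance(values),
    }


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
image = [[5, 5, 5, 5],
         [5, 10, 12, 5],
         [5, 14, 16, 5],
         [5, 5, 5, 5]]
features = region_descriptors(mask, image)
print(features)
```

With this pixel-edge perimeter a square scores π/4 rather than 1, so the values are only comparable between objects measured the same way.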
KIM Validation
KIM has been tested and validated with different datasets:
1. MERIS RR / MERIS FR
2. ERS / ASAR
3. SPOT
4. Landsat
5. Maps (Level 2 / Level 3 products)
Large number of collections created
Low number of significant semantic features identified
KIM for Information Extraction
1. Flood Detection (SAR data)
2. Cloud Detection (MERIS RR)
3. Long-term Forest Monitoring (Landsat)
4. Rapid Mapping / Damage Assessment (VHR optical data)
The potential of the tool has been demonstrated and confirmed in different contexts
End-user expectations not always met
KEO (Knowledge-centred Earth Observation)
KEO is a distributed Component-based Processing Environment (CPE) that allows users to:
a. Create & semantically identify internal/external Processing Components
b. Graphically chain Processing Components into processing chains
c. Create Processing Components from IIM components (KIM training)
d. Export and store outputs into Web Servers (WFS, WMS, WCS)
KEO also provides some relevant Reference Data Sets:
e. Heterogeneous data and information, growing with external contributions (images, documents, DEMs, photos, processors, etc.)
f. In support of various applications (Classification, Time Series Analysis, Ortho-rectification, Urban Monitoring, Interferometry, etc.)
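The idea of chaining Processing Components into processing chains (item b above) can be sketched as plain function composition. This is not the KEO API: `chain`, `calibrate`, `threshold` and the numeric values are hypothetical stand-ins chosen only to show the pattern.

```python
from functools import reduce
from typing import Callable, List, Sequence

# A processing component is modelled as a function from pixel values to pixel values
Component = Callable[[List[float]], List[float]]


def chain(components: Sequence[Component]) -> Component:
    """Compose components so that each one consumes the previous one's output."""
    def run(data: List[float]) -> List[float]:
        return reduce(lambda acc, component: component(acc), components, data)
    return run


def calibrate(pixels: List[float]) -> List[float]:
    """Hypothetical component: apply a fixed gain."""
    return [p * 0.5 for p in pixels]


def threshold(pixels: List[float]) -> List[float]:
    """Hypothetical component: binary mask of values above 10."""
    return [1 if p > 10 else 0 for p in pixels]


mask_chain = chain([calibrate, threshold])
result = mask_chain([10, 30, 50])
print(result)
```

Because a chain has the same signature as a component, a finished chain can itself be reused as a step in a larger chain.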
KEO CPE: Graphical Processor Designer
MEA (Multi-temporal Evolution Analysis)
Multi-temporal analysis of HR / VHR products:
1. Select multi-temporal applications that might benefit from such extension
2. Design, implement and integrate the automatic multi-temporal algorithms to support the selected applications
3. Create the needed HR/VHR Reference Data Sets and Evolution Models
4. Develop standard interfaces between the different systems for common exploitation of ingested data and processing capabilities
5. Integrate algorithms and Evolution Models provided by other independent projects
6. Validate (with the support of a Validation Group) the Automatic Multi-temporal algorithms and Evolution Models
MEA (Multi-temporal Evolution Analysis)
The MEA-ASIM system aims at providing:
1. Advanced tools for Land Use / Land Cover change analysis
2. Level-2 EO products for real time exploitation
3. Interfaces to external systems (G-POD, KEO, data providers, etc.)
4. Access to data via standard WCS OGC interface
5. Native support for Sentinel-2 datasets
[Diagram label: RSS Data Farm]
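Access through a standard OGC WCS interface, as listed for MEA-ASIM, typically means issuing a GetCoverage request. The sketch below builds such a request URL using the WCS 2.0 key-value-pair encoding. The endpoint, the coverage identifier and the axis labels (`Lat`, `Long`) are assumptions for illustration; the real values depend on the server and are not given in the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_coverage_url(endpoint, coverage_id, subsets, fmt="image/tiff"):
    """Build a WCS 2.0 GetCoverage URL (KVP encoding).

    subsets maps an axis label to a (low, high) pair; axis labels are
    coverage-specific, so the ones used below are placeholders.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("format", fmt),
    ]
    # One "subset" parameter per trimmed axis
    params += [("subset", f"{axis}({low},{high})")
               for axis, (low, high) in subsets.items()]
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and coverage name
url = get_coverage_url(
    "https://example.org/wcs",
    "landcover_timeseries",
    {"Lat": (41.5, 42.5), "Long": (12.0, 13.0)},
)
query = parse_qs(urlsplit(url).query)
print(url)
```

A time axis can be trimmed the same way by adding another entry to `subsets`, using whatever label the server reports for that coverage.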
MEA Pixel and Coverage analysis
Time-Series Analysis: single and multi-plot functionality
MEA Pixel and Coverage analysis
Cross-comparison of EO products
Exploitation Platforms for EO
Development and implementation of collaborative Exploitation Platforms (G-POD, SSEP, E-CEO, etc.):
1. Fostering the scientific exploitation of EO data
2. Automating the creation of data mining and information extraction experiments and algorithms
3. Supporting the creation of EO-based applications and services
4. Supporting the entire scientific research process:
a. Addressing specific scientific challenges and tackling new research problems in a “parallel and collaborative way”
b. Generation of reproducible results that can be easily shared and validated
Research and Service Support: Research Process
RSS G-POD Process Steps
Principal Investigators
• EO algorithms delivery
• Data type and range indication
• Output validation
• On-demand EO data processing
• Use of produced data (scientific projects delivery)
• Publications
RSS
• Data are made available in the RSS catalogue
• Algorithm porting and integration
• Test and validation (involving the PI)
• On-demand EO data processing
• Delivery
RSS Flexible Resources
Flexible Infrastructure satisfies:
- HW requirements
- Connectivity requirements
- SLA (HA, help desk, ticketing systems, etc.)
[Diagram: on-demand processing service (G-POD platform and infrastructure) used by EO Scientists / Principal Investigators; arrow labels: delivery, Process, EO data]
Infrastructure:
• ESRIN: 172 cores, 400 TB
• UK-PAC: 96 cores, 300 TB
• Flexible / Unlimited Infrastructure: 10-200 cores, 1-10 TB
Volume accessed by PI projects in 2012:
• Total Number of Submitted Jobs: 38,774
• Average Number of Products per Job: 35
• Average Product Size: 700 MB
• Total Size of Data Processed: 906 TB
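The 2012 volume figures are mutually consistent, which a short calculation makes visible. The sketch below multiplies the three averages and converts to terabytes; that the slide's 906 TB total uses binary units (1 TB = 1024 × 1024 MB) is an inference from the numbers, not something the slides state.

```python
jobs = 38_774            # total submitted jobs
products_per_job = 35    # average number of products per job
product_size_mb = 700    # average product size in MB

total_products = jobs * products_per_job
total_mb = total_products * product_size_mb

# Assumption: binary units reproduce the slide's total
total_tb_binary = total_mb / 1024 ** 2
# Decimal units give a slightly larger figure
total_tb_decimal = total_mb / 1000 ** 2

print(total_products, round(total_tb_binary), round(total_tb_decimal))
```

The binary conversion rounds to the 906 TB quoted on the slide, while the decimal conversion gives about 950 TB.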
RSS Facts & Figures
On-demand Processing: figures for the last 3 years
• Supported more than 40 active users per year
• Supported >20 processing/re-processing campaigns (including entire missions, e.g. MERIS, ASAR, SMOS and TPM)
• Integrated ~10 new algorithms per year
• Upgraded ~15 algorithms per year
• Set up flexible (additional) processing capacity in less than 2 working days
• Managed >450 TB data farm (ESA, TPM and scientific products):
  • ESA: ENVISAT (~320 TB), ERS (~50 TB), SMOS (~10 TB)
  • TPM: MSG (~19 TB), METOP (~11 TB), ALOS (~2 TB)
  • Scientific products: AARDVARC Swansea University and MGVI JRC (produced by GPOD and distributed via SSE), MKL3 ACRI
SMOS Testbed
Service Purpose: The SMOS Testbed aims to provide a flexible test environment to support the ESA calibration team for L1 calibration, and the Expert Support Laboratories (ESLs) for L2 Soil Moisture and Ocean Salinity pre-validation.
G-POD support elements:
• Fast integration of new versions
• SMOS L0 NRT ingestion chain set-up for L1 NRT custom re-processing
• Access to online data for bulk re-processing
• Access to flexible cloud resources for meeting deadlines
• On-demand SMOS L1 and L2 processors available for SMOS Teams
[Workflow diagram: SMOS New Processor Delivery; Processor Integration in G-POD; G-POD Processing Campaign; Results Analysis and Validation; Auxiliary and Calibration Datasets]
SMOS Testbed
SMOS L1 TESTBED
Processors: Calibration, Telemetry, Level 1A, 1B, 1C
Supported versions: 3.46, 5.00, 5.01, 5.02, 5.03, 5.04, 5.05, 6.00, 6.01
Reference data series: L0, L1A, L1B, L1C (reprocessed)
Auxiliary data baseline: as per Operational environment
SMOS L2 SOIL MOISTURE TESTBED
Processors: SM L2, SM L2 post-processing
Supported versions: 4.00, 4.01
Reference data series: L1C (reprocessed)
Auxiliary data baseline: as per CESBIO reprocessing
SMOS L2 OCEAN SALINITY TESTBED
Processor: OS L2
Supported versions: 5.00, 5.50
Reference data series: L1C (reprocessed)
Auxiliary data baseline: as per Operational environment
The road ahead (1)
Funds for a full scale research programme are needed to foster:
The widening of competence and expertise in several research centres/industrial actors in Europe
The widening of efforts to cover time series analysis and data analytics in general
Research and development of multi-dimensional and scalable DB solutions (including NoSQL databases, Hadoop, etc.)
Large collaborative and persistent effort on crowdsourcing, benchmarking, image and feature annotation and evaluation
Establishing a theoretical framework to bridge the semantic gap and be able to assign “discriminating power” to extracted features and “categorization” of extracted classes/objects
High quality software and algorithm developments able to reach at least the “software prototype” readiness level
The road ahead (2)
To achieve these goals it is necessary to:
• Establish a common “Big Data Mining” framework with interdisciplinary partners
• Establish an R&D network to sustain this field
• Establish a network of users, and give them access to IIM resources (system, data, …)
• Enlarge the scope of “Image” Mining to the physical parameters measured by EO instruments
• Address the “instrument” gap (instrument-application)
• Develop methods to use heterogeneous data: in situ, metadata, linked data, models, etc.
The road ahead (3)
Activities to be started:
• Promote IIM technology acceptance for EO users
• Extend and adapt methods from multimedia and social networks
• Apply human computing, gather knowledge from the use of the system, adaptation, personalization, etc.
• Focus on Web/Internet based systems
• Develop simple and specific HMI and GUI
• Focus on Visual Data Mining, Visual Analytics, and related methods
• In the PDGS identify “long term data preservation” and “interactive data exploitation” components
• Design data representations: actionable information
The road ahead (4)
Merge the best of:
• data mining approach
• time series capability
• ability to support and host the user algorithm
MANY THANKS!