D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical...
Transcript of D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical...
EEA/IDM/R0/17/003
Technical specifications for implementation
of a new land-monitoring concept based on
EAGLE
D4: Draft design concept and CLC-Backbone
and CLC-Core technical specifications,
including requirements review
Version 4.0
31.01.2018
1 | P a g e
Version history
Version Date Author Status and
description
Distribution
2.0 04/10/2017 S. Kleeschulte, G. Banko,
G. Smith, S. Arnold, J.
Scholz, B. Kosztra, G.
Maucha
Reviewers:
G-H Strand, G. Hazeu,
M. Bock, M. Caetano, L.
Hallin-Pihlatie
Draft for review by NRC
LC
EEA, NRC LC
3.0 10/11/2017 S. Kleeschulte, G. Banko,
G. Smith, S. Arnold, J.
Scholz, B. Kosztra, G.
Maucha
Reviewers:
G-H Strand, G. Hazeu,
M. Bock, M. Caetano, L.
Hallin-Pihlatie
Updated draft
integrating the
comments following the
NRC LC workshop
For distribution at Copernicus User
Day
4.0 31/01/2018 S. Kleeschulte, G. Banko,
G. Smith, S. Arnold, J.
Scholz, B. Kosztra, G.
Maucha
Reviewers:
G-H Strand, G. Hazeu,
M. Bock, M. Caetano, L.
Hallin-Pihlatie
Updated draft
integrating the
comments following the
Brussels COPERNICUS
Land monitoring
workshop
For EEA distribution / consultations
2 | P a g e
Table of Contents
1 Context ............................................................................................................................................ 7
2 Introduction .................................................................................................................................... 9
2.1 Background ............................................................................................................................. 9
2.2 Concept ................................................................................................................................. 10
2.3 Role of industry ..................................................................................................................... 14
2.4 Role of Member States ......................................................................................................... 14
2.5 Engagement with stakeholder community ........................................................................... 15
2.6 Structure of this document ................................................................................................... 15
3 Requirements analysis .................................................................................................................. 16
4 CLC-Backbone – the industry call for tender ................................................................................ 21
4.1 Introduction .......................................................................................................................... 21
4.2 Description of CLC-backbone ................................................................................................ 22
4.2.1 Level 1 Objects – hard bones ........................................................................................ 23
4.2.2 Level 2 Objects – soft bones ......................................................................................... 23
4.3 Technical specifications ........................................................................................................ 24
4.3.1 Spatial scale / Minimum Mapping Unit ........................................................................ 24
4.3.2 Delineation accuracy of geometric units ...................................................................... 25
4.3.3 Reference year .............................................................................................................. 25
4.3.4 EO data .......................................................................................................................... 26
4.4 Thematic attribution ............................................................................................................. 27
4.4.1 Thematic classes ........................................................................................................... 28
4.5 European ancillary data sets for Level 1 hard bone creation ............................................... 29
4.5.1 Criteria for ancillary data input ..................................................................................... 31
4.5.2 Ancillary input data ....................................................................................................... 32
4.5.3 Remark: land parcel identification system (LPIS) .......................................................... 36
4.6 Short-Form of CLC-Backbone end product technical specifications ..................................... 37
5 CLC-Core – the grid approach ....................................................................................................... 40
5.1 Background ........................................................................................................................... 40
5.2 Populating the database ....................................................................................................... 41
5.3 Data modelling in CLC-Core .................................................................................................. 42
5.4 Database implementation approach for CLC-Core ............................................................... 43
5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores .................. 43
5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach ....................................... 48
3 | P a g e
5.4.3 Processing and Publishing of CLC-Core Products .......................................................... 49
5.4.4 Conclusion and critical remarks .................................................................................... 49
5.5 Short-form of CLC-Core end product technical specifications ................................................ 50
6 CLC+ - the long-term vision ........................................................................................................... 53
7 CLC-Legacy .................................................................................................................................... 57
7.1 Experiences to be considered ................................................................................................ 59
8 Annex 1 – CLC nomenclature ........................................................................................................ 63
9 Annex 2: CLC-Back bone data model ............................................................................................ 64
9.1.1 Integration of temporal dimension into LISA................................................................ 67
9.1.2 Integration of multi-temporal observations into LISA .................................................. 68
10 Annex 3: OSM data ................................................................................................................... 70
10.1 OSM road nomenclature....................................................................................................... 70
10.2 OSM roads: completeness ................................................................................................... 71
11 Annex 4: LPIS data in Europe .................................................................................................... 72
4 | P a g e
List of Figures
Figure 1-1: The relationship of the 2nd generation CLC to the existing CLMS components. .................. 7
Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved
European land monitoring (2nd generation CLC). .................................................................................. 11
Figure 2-2: Conceptual design for the products and stages required to deliver improved European
land monitoring (2nd generation CLC). .................................................................................................. 12
Figure 2-3: A scale versus format schematic for the current and proposed CLMS products. .............. 13
Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26,3%
of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC) ....................................................... 21
Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori
information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3)
the pixel-based classification of EAGLE land cover components and (4) attribution of Level 2 objects
based on this classification and additional Sentinel-1 information. ..................................................... 22
Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo),
Romania (near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2
cloud free services as Background (left to right). ................................................................................. 33
Figure 4-4: Downloadable datasets for INSPIRE transport network (road) services using the official
INSPIRE interface (Status: Jan. 2018). ................................................................................................... 34
Figure 4-5: Distribution of number of downloadable datasets for INSPIRE transport service across
member countries (Status: Jan. 2018) .................................................................................................. 35
Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between
encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit:
10 daa = 1 ha. ........................................................................................................................................ 41
Figure 5-2: Representation of real world data in the CLC-Core. ........................................................... 42
Figure 5-3: Example of an RDF triple (subject - predicate - object). ..................................................... 46
Figure 5-4: GeoSPARQL query for Airports near the City of London. ................................................... 47
Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format. .......................... 47
Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other.
The arrows indicate the flow of information from a query directed to the French CLC SPARQL
endpoint. ............................................................................................................................................... 48
Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.
.............................................................................................................................................................. 49
Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain
area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely
vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are
correct. .................................................................................................................................................. 58
Figure 7-2: Result of changing the CLC implementation method in Germany ..................................... 59
Figure 7-3: Generalisation technique applied in Norway based on expanding and subsequently
shrinking. The technique exists for polygon as well as raster data. ..................................................... 61
Figure 7-4: Generalization levels used in LISA generalization in Austria .............................................. 61
Figure 7-5: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany ............................................... 61
Figure 7-6: CLC+ national test case results (Hungary, Budapest-North). Comparison of traditional CLC
(top) with bottom-up CLC (middle), created from high-resolution national CLC3 (bottom). The
generalized CLC is more fragmented than CLC2012, even if 25ha MMU is kept. The high-resolution
CLC illustrates well the potential laying in CLC+. .................................................................................. 62
5 | P a g e
Figure 9-1: INSPIRE main feature types: land cover dataset and land cover unit (from land cover
extended application schema) .............................................................................................................. 64
Figure 9-2: EAGLE extension to land cover unit and new feature type land cover component ........... 65
Figure 9-3: The new CLC-Basis model (based on LISA) ......................................................................... 66
Figure 9-4: illustration of temporal NDVI profile and derived land cover components throughout a
vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017) .................. 68
Figure 9-5: Draft sketch to illustrate the principles of the combined temporal information that is
stored in the data model ...................................................................................................................... 69
Figure 11-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit73
Figure 11-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project
LandSense ............................................................................................................................................. 74
6 | P a g e
List of Tables
Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd
generation CLC based on initial discussions with EEA. ......................................................................... 12
Table 2-2: Summary (matrix) of potential roles associated with each element / product. ................. 15
Table 4-2: Main characteristic of Level 1 objects (hard bone) and level 2 objects (soft bones) .......... 24
Table 4-1: Overview of existing relevant products which were analysed as potential input to support
the construction of the geometric structure (Level 1 “hard bones”) of the CLC-Backbone. ................ 30
Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data. .............. 46
Table 6-1: List of current CLC classes and requirements for external information .............................. 56
7 | P a g e
1 CONTEXT
The European Environment Agency (EEA) and European Commission DG Internal Market, Industry,
Entrepreneurship and SMEs (DG GROW) have determined to develop and design a conceptual
strategy and associated technical specifications for a new series of products within the Copernicus
Land Monitoring Service (CLMS) portfolio (Figure 1-1), which should meet the current and future
requirements for European Land Use Land Cover (LULC) monitoring and reporting obligations. This
collection of products is nominally called the "2nd generation CORINE Land Cover (CLC)".
Figure 1-1: The relationship of the 2nd generation CLC to the existing CLMS components.
After a call for tender in 2017, the EEA has tasked the European Environment Information and
Observation Network (EIONET) Action Group on Land monitoring in Europe (EAGLE Group) with
developing an initial response to fulfilling the needs listed in the following chapters through a
conceptual design and technical specifications. The approach adopted for the development of the
2nd generation CLC represents a sequence of stages, where separate single elements or products of
the whole concept could be developed relatively independently and at different rates, to allow time
for broad consultation with stakeholders. This also allows the continuous inclusion of Member State
(MS) input, the exploitation of industrial production capacity, and the necessary feedback, lessons
learnt and refinement to reach the ultimate goal of a coherent and harmonized European Land
Monitoring Framework.
The first stage of this development process was to engage with the EEA and DG-GROW to refine
their requirements and build on the ideas presented by the EAGLE Group in the proposal. These first
developments were undertaken within a set of constraints outlined by EEA and DG GROW for the
initial product to be delivered within the 2nd generation CLC:
Industrial production by service providers,
Outcome product in vector format,
Highly automated production process,
Short timeframe of production phase,
Driven by Earth Observation (Sentinel-2),
Complete the picture started by the Local Component1 products (EEA-39).
1 The term “component” has a twofold function, 1) as the Copernicus local component products, 2) as a term for Land Cover Component as an element within the EAGLE data model. In this case the former.
8 | P a g e
These developments resulted in the first distributed deliverable (D2) which outlined the conceptual
strategy and proposed a draft technical specification for the first product (CLC-Backbone, see later)
to be developed. This stage also involved a presentation of the concepts in D2 to the EIONET
National Reference Centres (NRCs) Land Cover in Copenhagen in October 2017.
The NRCs feedback collected during the NRC meeting and afterwards was carefully taken into
consideration for the continuation of the development process into the next stage. The conceptual
strategy and technical specification were both refined and extended for the production of the next
distributed deliverable (D3). The concepts and specification for the 2nd generation of CLC contained
in D3 were presented to the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+
in Brussels in November 20172. This meeting generated lively debate and a set of survey results from
the participants.
This document is the penultimate deliverable (D4) of the project representing the outcomes of the
latest stage of development of the 2nd generation of CLC. This document contains a further
expansion on the conceptual strategy and extended technical specifications for the proposed
products. The aim of the document is to continue to communicate these developments to the
stakeholders involved in the field of European land monitoring and elicit their feedback. This
document will be presented by the EEA to a number of high level meetings and stakeholders.
A final revised version of this document (D5) will be produced and made available in early March
2018.
2 “Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+, 16/11/2017” Report. Framework Contract 385/PP/2014/FC Lot 2 (Copernicus User Uptake Framework Contract). Prepared for: European Commission - DG GROW
9 | P a g e
2 INTRODUCTION
This chapter provides the background to the concept, an outline of the proposed approach to be
adopted, an overview of the elements of the concept and the current status of the developments.
2.1 Background Monitoring of Land Use and Land Cover (LULC) is among the most fundamental environmental
survey efforts required to support policy development and effective environmental management3.
Information on LULC plays a key role in a large number of European environmental directives and
regulations. Many current environmental issues are directly related to the land surface, such as
habitats, biodiversity, phenology and distribution of plant species, ecosystem services, as well as
other issues relevant to population growth and climate change. Human activities and behaviour in
space (living, working, production, education, supplying, recreation, mobility & communication,
socializing) have significant impacts on the environment through settlements, transportation and
industrial infrastructure, agriculture, forestry, exploitation of natural resources and tourism. The
land surface, who´s negative change of state can only – if at all – be reversed with huge efforts, is
therefore a crucial ecological factor, an essential economic resource, and a key societal determinant
for all spatially relevant basic functions of human existence and, not least, nations’ sense of identity.
Land thus plays a central role in all three factors of sustainable development: ecology, economy, and
society.
LULC products so far tend to be produced independently of each other at the global, European,
national and sub-national levels, as well as by different disciplines and sectors, each of them
focussing on a fairly similar end product but different emphases on thematic content and spatial
detail. This independency and uncoordinated production is explained and justified by the large
variation in objectives and requirements of these products, but it leads to reduced interoperability
and sometimes also duplication of work, and thereby, inefficient use of resources.
At the European level CORINE Land Cover (CLC) is the flagship programme for long-term land
monitoring. This is now part of the Copernicus Land Monitoring Service (CLMS). CLC has been
produced for reference years of 1990, 2000, 2006 and 2012, with 2018 under preparation and
expected to be available by late 2018. CLC is produced by MS with technical coordination of EEA and
guided by well-established specifications and methodological guidelines. The CLC specification aims
to provide consistent localized geographical information on LULC using 44 classes at level-3 in the
hierarchical nomenclature (see Annex 1). The vector databases have a minimum mapping unit
(MMU) of 25 ha and minimum mapping width (MMW) of 100 m with a single thematic class
attribute per land parcel. At the European level, the database is also made available on a 100 x 100
m and 250 x 250 m raster products, which have been aggregated from the original vector data at
1:100 000 scale. CLC also includes a directly mapped (i.e. not creating by intersecting CLC status
layers) change layer, which records changes between two of the 44 thematic classes between two
dates with a MMU of 5 ha.
Although CLC has become well established and has been successfully used, mainly at the pan-
European level, there are a number of deficiencies and limitations that restrict its wider exploitation,
particularly at the MS level and below. This is partly due to the fact that the MMU of CLC (25 ha) is
too coarse to capture the fine details of the landscape at the local and regional scales, but also to the
fact that some MS have access to more detailed, precise and timely information from national
programs. In consequence of the first point, features of smaller size that represent landscapes
diversity and complexity are not mapped either because of geometric generalization or because they
3 Harmonised European Land Monitoring - Findings and Recommendations of the HELM Project.
10 | P a g e
are absorbed by classes with broad definitions. Moreover, landscape dynamics that are highly
relevant to locally decided but globally effective policy, such as small-scale forest rotation, changes
in agricultural practices and urban in-filling, may be missed due to low spatial resolution and / or
thematic depth of CLC class definitions. The successful use of CLC in combination with higher spatial
resolution products to detect and document urban in-filling has given a clear indication of the
required direction of development for European land monitoring.
To address some of the above issues, the CLMS has expanded its portfolio of products beyond CLC to
include the High Resolution Layers (HRLs) and local component (LoCo) products. The HRLs provide
pan-European information on selected surface characteristics in a 20 m raster format, also available
aggregated to 100 m raster cells. They provide information on basic surface properties such as
imperviousness, tree cover density and grassland and can be described as intermediate products.
The LoCo products in vector format are based on very high spatial resolution EO data and ancillary
information tailored to a specific landscape monitoring purpose (e.g. Urban Atlas), which provide
more detailed thematic LCLU information on polygon level with a MMU in a range of 0.25 to 1 ha.
However, the LoCo products altogether do not provide wall to wall coverage of the EEA-39
countries, even when combined. Their nomenclatures and geometries are not harmonised and have
thus caused interoperability problems which are now being addressed.
2.2 Concept Given the above issues and the known reporting obligation list in the next chapter, a revised concept
for European Land Monitoring is required which both provides improved spatial and thematic
performance and builds on the existing heritage. Also, recent evolutions in the field of land
monitoring (i.e. improved Earth Observation (EO) input data due to e.g. Sentinel programme,
national bottom-up approaches, processing methods, etc.) and desktop and cloud-computing
capacities offer opportunities to deliver these improvements more effectively and efficiently. The
EEA in conjunction with DG GROW has identified this need and now aims to harmonise and integrate
some of the CLMS activities by investigating the concept and technical specifications for a higher
performance pan-European mapping product under the banner of "2nd generation CLC" (Figure 1-1).
EEA and DG GROW determined that such a 2nd generation CLC should build upon recent conceptual
development and expertise, while still guaranteeing backwards compatibility with the conventional
CLC datasets. Also, the proposed approach should be suitable to answer and assist the needs of
recent evolution in European policies like reporting obligations on land use, land use change and
forestry (LULUCF), plans for an upcoming Energy Union or long-term climate mitigation objectives.
After a successful establishment of 2nd generation CLC there would then also be knock-on benefits
for a broad range of other European policy requirements, land monitoring activities and reporting
obligations. Given these requirements, no single product or specification would address all needs
and the complexity of the solution would require a stage development and implementation.
The proposed conceptual strategy therefore consists of a number of interlinked elements (Figure
2-1) which represent separate products and therefore can be delivered independently in discrete
production phases. Each of the products has its own technical specification and production
methodology and can be produced through its own funding / resourcing mechanisms.
The structure of the conceptual design proposed by the EAGLE Group for the 2nd generation CLC is
based on the elements shown in Figure 2-1. The reports associated with this work will expand
incrementally giving more details on the conceptual design and provide technical specifications of
increasing elaboration for the for the individual products.
11 | P a g e
Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved
European land monitoring (2nd generation CLC).
Although the four new elements / products of the conceptual strategy will be described in more
detail later in this and the final subsequent report (D5), they can be summarised as follows to aid
understanding of the conceptual design:
1. CLC-Backbone is a spatially detailed, large scale inventory in vector format providing a
geometric spatial structure attributed with raster data for landscape features with limited,
but robust, EO-based land cover thematic detail on which to build other products.
2. CLC-Core is a consistent multi-use grid4 database repository for environmental land
monitoring information populated with a broad range of land cover, land use and ancillary
data form the CLMS and other sources, forming the information engine to deliver and
support tailored thematic information requirements.
3. CLC+ is the “nominal” end point or final product in the establishment of CLC 2nd generation.
It is a derived vector and raster product from the CLC-Core and CLC-Backbone and will be a
LULC monitoring product with improved spatial and thematic performance, relative to the
current CLC, for reporting and assessment.
4. The final element of the conceptual design, although not strictly a new product, is the ability
to continue producing the existing CLC, which may be referred to as CLC-Legacy in the
future, which already has a well-established and agreed specification.
Table 2-1 provides the current overview of the main characteristics of the four elements to allow the
reader to make comparisons between the key characteristics of the products.
To manage the project efficiently and effectively, when developing and specifying these new
products it has been necessary to stage the work. Throughout the documents delivered by this
project the staging of the work given in Figure 2-2 will be used as a key graphic to identify the
product and stage being described.
4 A data structure whose grid cells are linked to a data model that can be populated with the information from the different sources.
12 | P a g e
Figure 2-2: Conceptual design for the products and stages required to deliver improved European
land monitoring (2nd generation CLC).
Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd generation CLC based on initial discussions with EEA.
CLC-Backbone CLC-Core CLC+ CLC-Legacy
Description Detailed wall to wall (EEA-39) geometric vector reference layer with basic thematic content and a raster attribution.
All-in-one data container for environmental land monitoring information according to EAGLE data model.
Thematically and geometrically detailed LULC product.
A more generalised LULC product consistent with the CLC specification.
Role / purpose Support to CLMS products and services at the pan-European and local levels.
Thematic characterisation of CLMS products and services at the pan-European and local levels.
Support to EU and national reporting and policy requirements.
Maintain the time series (backwards compatibility) and support legacy systems.
Format Raster and vector. GRID database. Raster and vector. Raster and vector.
Thematic detail <10 basic land cover classes, few spectral-temporal attributes.
Rich attribution of LU, LC and ancillary information.
High thematic detail including LC and LU with improvements compared to CLC.
CLC-nomenclature with 44 classes
+ changes between the 44.
Geometric detail (MMU, grid size)
1.0 ha. 1.0 ha, 100 x 100 m grid.
1.0 ha for status and changes.
25 ha for status, 5 ha for changes (raster: 100 x 100 m).
Update cycle 3 - 6 years. Dynamic update as new information becomes available.
3 - 6 years. Standard 6 years.
Reference year 2018 2018 2018 2018 / 2024
Production year 2018 2019 TBD TBD
Method Geospatial data integration and image segmentation for geometric boundaries and attribution / labelling by pixel-based EO derived land cover
EAGLE-Grid approach: population and attribution of regular GRID
Derived by SQL from CLC-Core
Derived from CLC+ and CLC-Core
Input data Geospatial data, EO images and ancillary data selected from LoCo (with visual control) and HRL products.
CLC-Backbone, all available HRLs, LoCo, ancillary data, national data provisions (e.g. LU), photointerpretation.
CLC-Core and CLC-Backbone
CLC+ and / or CLC-Core
13 | P a g e
Figure 2-3 is a schematic description of the conceptual design showing the relationship of the new
elements / products to existing CLMS products in terms of their format and level of spatial detail. In
this representation:
1. The current or conventional CLC (CLC2000, CLC2006, CLC2012, CLC2018 etc.), a polygon map
with fewer geometric details.
2. The LoCos (Urban Atlas, Riparian Zones, N2000 etc.), polygon maps with more geometric
and thematic details.
3. The HRLs (Imperviousness, Forest, Grassland, Wetland etc.), raster products for specific
surface characteristics with high spatial details.
4. The proposed CLC-Core, a grid product where the level of detail has still to be decided.
5. The CLC-Backbone and CLC+, both polygon maps with different levels of thematic detail.
Figure 2-3: A scale versus format schematic for the current and proposed CLMS products.
In any monitoring system the requirement for updates is central to the implementation of the
activity. In this project we are focusing on the technical specifications, therefore the update cycle
will only be addressed in terms of the suggested time between repeat productions. Details of the
update process are more associated with the methodologies which still need to be finalised by the
service providers, national teams or the EEA in a later phase of development. However, particularly
in relation to the CLC-Legacy, reference will be made to the issues associated with the update cycle
and change mapping where appropriate.
Given the context, requirements and issues, the aim should be to find a means to deliver the
concept and its proposed products with an efficient mixture of industrially produced material backed
up by ancillary (e.g. land use) information from various national programmes if they are in existence.
It is important to propose a viable system in which there is flexibility to adapt the later steps to
issues of feasibility and practicality once the implementation of the earlier steps is underway. For
instance, the outcomes of the CLC-Backbone production should be able to influence thematic
content of the CLC-Core and / or the technical specifications of the CLC+.
14 | P a g e
2.3 Role of industry The development and production of CLC to this point has had only a limited role for industry, mainly
focused on the productions of pre-processed image datasets (e.g. IMAGE2006, IMAGE2012, ….), the
validation of the CLC2012 products and in some MS the actual production. Conversely, the
production of the non-CLC products within the CLMS has been dominated by industry through a
series of service contracts to generate consistent products across Europe.
Industry has the ability to produce operational solutions which exploit automation, can handle large
data volumes and are scalable to European-wide coverage. It is also possible that it is easier to
standardize an industrial (top-down) production process than a (bottom-up) process based on MS
participation, and that industrialized production therefore will lead to more harmonized products
across Europe. As the amount of available EO and geographic information data increases, highly
efficient and effective mechanisms for production will be required for at least parts of the European
land monitoring process and economies of scale must be exploited. Industry therefore offers a
number of capabilities which will be required at selected points within the production of the 2nd
generation CLC.
DG GROW has expressed the wish that industry should have an initial role in the production of CLC-
Backbone which has therefore been designed in part to exploit industrial capabilities. Further
opportunities for industrial involvement are shown in Table 2-2.
2.4 Role of Member States The MS have always been intimately linked with the CLC as its production has been the responsibility
of the nominated authorities through EIONET, i.e. NRC Land Cover. The MS have provided the
production teams for the actual mapping, allowing the process to exploit local knowledge and
familiarity with native landscape types, and providing access to national datasets, which may not
otherwise have been possible. The bottom-up production approach has been the key to successful
delivery of this important time series of LULC data.
With respect to the non-CLC products within the CLMS, the MS involvement has so far been limited
to verification and, in the case of the 2012 products, enhancement. Although the MS have provided
valuable feedback and enhancement in some cases, particularly on the HRLs, their capabilities have
not yet been fully exploited within the CLMS. The EEA is looking into possibilities to further increase
the role of MS in non-CLC products, for instance by introducing a new task on enrichment of these
products. It is therefore of the utmost importance to ensure a balanced contribution to the CLC+
suite of products between industry and MS via Eionet NRC/LCs.
Table 2-2 shows the potential for a greater role for the MS across all of the new elements / products
within the 2nd generation CLC. It is important that the MS experts have oversight of the specifications
and an understanding of the opportunities to provide feedback on the monitoring process and
products. This includes the opportunities to contribute with data, experience and local knowledge to
the production and validation activities where appropriate.
The MS are currently involved in verification, but not in the validation of CLMS products. Verification
is the evaluation of whether or not the product corresponds to the reality of the landscape at stake
by confronting the product to the regional / local expertise of the terrain, and hence meets the
expectation of the stakeholders. Validation is the independent geostatistical control on the accuracy
with reference to the product specifications. The intended use of CLMS products and services at the
national level calls for verification against national user requirements – not as an acceptance test but
as a documentation of the usefulness, a promotional tool, and possibly also as assistance to improve
the products for this intended use.
15 | P a g e
Table 2-2: Summary (matrix) of potential roles associated with each element / product.
CLC-Backbone CLC-Core CLC+ CLC-Legacy
EEA Definition, coordination, main user.
Definition, coordination, main user.
Definition, coordination, main user.
Definition, coordination, main user.
MS Review of specification, provision of geometric data, verification, validation, user.
Review of
specification,
population with
national datasets,
land use
information, user.
Review of
specification,
support to
production,
verification,
validation, user.
Production,
verification,
validation, user.
Industry Production. Implementation and maintenance of DB infrastructure.
Support
production,
validation.
Validation.
2.5 Engagement with stakeholder community The success of the project and the long-term development of land monitoring in Europe is
intrinsically linked to the involvement of the stakeholder community. It is vital that all the relevant
stakeholders are aware of this activity and contribute their requirements and opinions to this
process. Revised versions of this document are foreseen at specific milestones towards a final
version for deliver in early 2018.
2.6 Structure of this document This specific deliverable, D4, is the third step towards the definition of the conceptual strategy and
the potential technical specifications for a series of new CLMS products. It aims at communicating
these details to the stakeholders involved in European land monitoring to elicit feedback, comments
and questions. The work reported here begins with a review of requirements which goes beyond the
remit of the call for tender. The four elements of the 2nd generation CLC within the CLMS and their
potential products are described in further detail in the following chapters. This version includes the
first feedback from the NRC LC meeting in Copenhagen in October 2017, the Copernicus Land
Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017 and dedicated
written feedback from various MS.
For CLC-Backbone, this document updates the draft versions of the technical specifications, but with
less emphasis on the outline of the implementation methodology that was provided in D3. These
details are still open for discussion and review by stakeholders and will ultimately be used as input
for an open call for tenders to industrial service providers for a production to start in 2018. For CLC-
Core, this document further develops the technical specification and proposes a number of options
for the implementation. The technical design of CLC-Core will continue to be developed based on
stakeholder feedback in future steps. The current thinking around CLC+ has been developed and is
approaching a technical specification.
16 | P a g e
3 REQUIREMENTS ANALYSIS
In line with the principle of Copernicus, this analysis in support of the development of new products
with CLMS, is driven by user needs rather than the current technical capabilities of EO sensors and
processing systems. The technical issues will be dealt with in the chapters related to the individual
products focusing on their specification and, to a lesser extent, methodological development. This
analysis was initially based around the requirements set out in the original call for tender, but has
been extended through inclusion of recent work by the EC and ETC to provide a broader range of
stakeholder needs. As this project has developed, further requirements have been identified
through meetings and stakeholder feedback to support the increasingly detailed specifications of a
complete 2nd generation CLC.
The MMU and nomenclature of the current CLC have been operational for three decades. Backward
compatibility is therefore an essential issue when a new monitoring framework is considered. Still,
the specification for a future, improved and harmonised European land monitoring product is now
open for revision. The proposed products should both support current European policies through
continuity of current monitoring activities, but also support new policies. Examples are the reporting
obligations (on the European level) on land use, land use change and forestry (LULUCF), plans for an
upcoming Energy Union, Greening of the CAP, Urban Planning, Biodiversity and long-term climate
mitigation objectives.
A confounding element is that many of the policies that could exploit EO-based land monitoring
information rarely give clear quantitative requirements that can provide a basis for spatial, temporal
or thematic specifications. Also, despite the fact that land monitoring information is crucial in many
domains, so far, there has been no Land Framework Directive in Europe. The higher-level strategic
policies, such as the EU Energy Union often refer to the monitoring linked to other activities such as
REDD+ and LULUCF. Where actual quantitative analysis and assessment takes place, they tend to
rely on currently available products. For instance, LULUCF assessments in Europe (independently
from national reporting obligations) use CLC (with a 25 ha MMU), while monitoring at the global
scale is using Global Land Cover 2000 (GLC 2000) with a 1 km spatial resolution (or 100 ha MMU).
Reference was made to future monitoring in Decision No 529/2013/EU of the European Parliament
and of the Council on accounting rules on greenhouse gas emissions and removals resulting from
activities relating to LULUCF, but this only refers to MS activities with MMUs between 0.05 ha and 1
ha. Many MS also employ sample based field surveys (e.g. national forest inventories) to collect the
required LULUCF data. Some of the initiatives associated with LULUCF have explored the potential of
improved monitoring performance, such as the use of SPOT4 imagery with a 20 m spatial resolution
to show land-use change between 2000 and 2010 for REDD+ reporting, but until very recently it has
not been practical for these types of monitoring to become operational globally.
A more fruitful avenue for user requirements in support of the 2nd generation CLC is to consider the
initiatives which are now addressing habitats, biodiversity and ecosystem services. The EU
Biodiversity Strategy to 2020 was adopted by the European Commission to halt the loss of
biodiversity and improve the state of Europe’s species, habitats, ecosystems and the services. It
defines six major targets with the second target focusing on maintaining and enhancing ecosystem
services, and restoring degraded ecosystems across the EU, in line with the global goal set in 2010.
Within this target, Action 5 is directed at improving knowledge of ecosystems and their services in
the EU. Member States, with the assistance of the Commission, will map and assess the state of
ecosystems and their services in their national territory, assess the economic value of such services,
and promote the integration of these values into accounting and reporting systems.
17 | P a g e
The Mapping and Assessment of Ecosystems and their Services (MAES) initiative is supporting the
implementation of Action 5 and, although much of the early work in this area on European level has
used inputs such as CLC, it is obvious that to fully address this action an improved approach is
required especially for a better characterisation of land, freshwater and coastal habitats, particularly
in watershed and landscape approaches. The spatial resolution at which ecosystems and services
should be mapped and assessed will vary depending on the context and the purpose for which the
mapping/assessment is carried out. However, information from a more detailed thematic
characterisation and classification and at higher spatial resolution are required which are compatible
with the European-wide classification and could be aggregated in a consistent manner if needed.
A first version of a European ecosystem map covering spatially explicit ecosystem types for land and
freshwater has been produced at 1 ha spatial resolution using CLC (100 m raster), the predecessor of
the HRL imperviousness with 20 m resolution, JRC forest layer with 25 m spatial resolution plus a
range of other ancillary datasets (e.g. ECRINS water bodies) using a wide variety of spatial
resolutions (from detailed Open Street Map data to 10 x 10 km grid data used in Article 17 reporting
of the Habitats Directive). It is clear that ecosystem mapping and assessment could more fully
exploit the recently available Copernicus Sentinel data and land products, and move down to a
higher spatial resolution.
The primary goal of the 2nd generation CLC is to underpin European policies and, where possible,
fulfil their needs. Furthermore, the EEA also wants to make the new products useful at national,
regional and even local levels. To this end the requirements gathering exercise was extended to a
broader stakeholder community at the national level and below, including representatives of
commercial service providers, NGOs and academia. This work drew on existing surveys (e.g. MS CLC
survey conducted by ETC-ULS5) and the feedback from the two events where the 2nd generation CLC
was promoted.
The recent EC-funded NextSpace project collected user requirements for the next generation of
Sentinel satellites to be launched around 2030. These requirements were analysed on a domain
basis. From the land domain, those requirements which were given a context of land cover
(including vegetation), glaciers, lakes, above-ground biomass, leaf area index and snow were
considered. A broad range of spatial resolutions were requested from sub 2.5 m to 10 km. Two
thirds of the collated requirements wanted data in the 10 – 30 m region. The requested MMUs were
also broadly distributed, but with a preference for 0.5 to 5 ha, which represents field / city block
level for much of Europe. The temporal resolution / update frequency was dominated by yearly
revisions, although some users could accept 5 yearly updates. Some users also wished for monthly
updates, although the reason was unclear. The users were less specific about the thematic detail
required and the few references were given to CLC, EUNIS, LCCS and “basic land cover”. The
acceptable accuracy of the products was in the range 85 to 90%.
The European Topic Centre on Urban, Land and Soil systems (ETC/ULS) undertook a survey of
EIONET members involved with the CLMS and CLC production considering a range of topics in
advance of the 2018 activities. Some of the questions referred to the shortcomings of CLC and the
potential improvements that could be made towards a 2nd generation CLC product. As expected, it
was noted that some CLC classes cause problems because of their mixed nature and instances on the
ground that are sometimes complex and difficult to disentangle. It was suggested that this situation
could be improved by the use of a smaller MMU so that the mapped features have more
5 Service Contract No 3436/R0-Copernicus/EEA.56586, Task 3: Planning of cooperation with EEA member and cooperating countries. Final report in preparation of the CLC2018 exercise
18 | P a g e
homogeneous characteristics. Also, a MMW reduction, particularly for highways, would allow linear
features to be represented more realistically. Of the 32 countries, who responded to the
questionnaire, 25 of them would support a finer spatial resolution, showing that there is, in general,
a national demand for high spatial resolution LULC data. The proposals ranged from 0.05 ha to
remaining at the current 25 ha, but the majority favoured 0.56 to 5 ha. Thematic refinement is also
supported by around one third of the respondents, who requested improved thematic detail,
separation of land cover from land use, splitting formerly mixed CLC classes and the addition of
further attributes to the spatial polygons.
During the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in
November 2017 a number of MS representatives were asked to provide their requirements for land
monitoring and their thoughts on the 2nd generation CLC proposals as presentations to the whole
event. Their required spatial detail ranged from 0.5 ha MMU down 5 m pixel resolutions, which are
at the most detailed end of what had been suggested to date and potentially the smaller spatial
resolutions are out of scope for the 2nd generation CLC. For the required thematic detail, it was
suggested to go beyond the current CLC 44 class nomenclature and aim for either a full level-4
classification or split the more vague or ambiguous classes. There was a unanimous request for
yearly updates and a general feeling that the quality of the data should be improved relative to the
local component products.
As the final session in the workshop, a survey of the attendees (via an online voting tool) was
undertaken against a set of questions designed to understand the audience’s feeling towards the 2nd
generation CLC and assess their preferences for some of the key technical specifications. There was
strong support for 2nd generation CLC approach with 88 % of respondents considering the products
to have relevance at European and national levels with 35 % also thinking they would be useful at
sub-national levels. Similar levels of support were found for a spatially detailed land cover product
with limited thematic content, i.e. CLC-Backbone. On the technical specification of CLC-Backbone, 80
% felt a MMU of 1.0 ha or smaller would be appropriate, but this was almost evenly split across 0.25
ha, 0.5 ha and 1 ha. The CLC-Backbone MMW was evenly split between 10 m and 20 m and a lively
debate was produced around the impact of the input EO data. The dominant level of the preferred
thematic detail was around 10 classes with 51 %, although 27 % voted for an even less detailed
nomenclature of around 5 classes. In terms of the update cycle for CLC-Backbone, the majority of
the audience thought 3 years as sufficient, however almost a quarter wanted yearly updates. For the
CLC-Core the preferred grid size was less clear, ranging from 10 m to 100 m, with the extremes
dominating and representing almost equal numbers of votes. Further issues regarding the data
sources and contents of the 2nd generation CLC products were also surveyed, but these will be dealt
with in the relevant chapters.
During the project, the EEA and the EAGLE Group have received a large amount of written feedback
from MS and other stakeholders which has been addressed where appropriate in the relevant
product chapters below. Some of the feedback, although important, is out of scope for this project
and is being dealt with separately by the EEA and DG-GROW. The key themes of the feedback have
been:
Clarification of the overall 2nd generation CLC concept. The team appreciate that this is quite
a departure from current practice within the CLMS and during the iterations of the
deliverables the description of the concept has been clarified and improved, particularly in
6 0.5 ha being required by LULUCF
19 | P a g e
visualising the relationship with the exiting CLMS components and the links between the
individual products of the 2nd generation CLC.
Suggestions regarding the technical specifications. A large number of technical issues have
been identified from the feedback from the stakeholders. These issues have been reviewed
and incorporated where appropriate, practical and feasible within the revision of the
technical specifications given in the later chapters of this deliverable.
Use cases for the 2nd generation CLC products. The requirements for an improved version of
CLC has been known for some time and the detailed specification for CLC+ is now well in line
with a range of global and European policy and environmental management needs. The new
products within the 2nd generation CLC, such as CLC-Backbone and CLC-Core, have a vital
role to play in the delivery of CLC+. Still, there are so far, no admissible use cases. The
required use cases should be developed as end users become more familiar with the
capabilities.
Inputs to the products from MS and other sources. Inclusion of MS data in the production of
the 2nd generation CLC products matches the bottom-up approaches promoted by the EEA
and EAGLE, and may help make the products more relevant for use at the national and sub-
national level. There are, however, a number of technical and organisation challenges which
need to be addressed. This project will address the technical issues by proposing additional
specifications for potential MS inputs while EEA and DG-GROW will deal with the
organisational issues.
Quality of the 2nd generation CLC products. A further issue of improved spatial and thematic
detail is the need to increase the quality of the products so that they will be accepted by the
user community. The conceptual strategy aims to support high quality products by allowing
CLC-Backbone to focus on improvement of the spatial detail and CLC-Core to focus on the
enrichment of thematic detail.
Change detection, update cycle and maintenance. The aim of this project was to propose the
initial technical specification for the 2nd generation CLC products, not to design a
methodology for the updating and maintenance of the product in an operational monitoring
environment. These issues have been noted where relevant in the technical specifications
and are likely to feature more prominently in later development work.
Feasibility studies. The project was designed as a desk study and there was no requirement
or resources to undertake any testing of the specifications proposed or methods that may
be relevant. Where appropriate existing feasibility study results were incorporated into the
reviews and technical specification development, such as the work undertaken in Hungary.
Governance and implementation of the 2nd generation CLC. As was expected, the proposal of
a set of products more closely aligned to MS activities, yet integral to the CLMS, has raised a
number of issues related to governance and implementation of work. These issues are out of
scope for this project, but have been noted and are being dealt with by the EEA and DG
GROW.
From the requirements review so far, the feedback from the workshops and considering the specific
requirement associated with LULUCF, it is suggested that the ultimate product of a 2nd generation
CLC+ should have a MMU of around 1 ha (with the ability to estimate areas down to 0.5 ha), a MMW
of 20 m and be based on EO data with a spatial resolution of between 10 and 20 m. The thematic
content would be a refinement of the current CLC nomenclature to cope with changing MMU,
20 | P a g e
separation of land cover and land use for the needs of ecosystem mapping and assessment. The
temporal update could come down from 6 years to 3 years in the short-term, and potentially to 1
year in the longer-term. To reach this ultimate goal and initiate implementation of the 2nd
generation CLC, the requirements review supports the role of the CLC-backbone and CLC-Core in
contributing to the development and delivery of the required land monitoring system.
21 | P a g e
4 CLC-BACKBONE – THE INDUSTRY CALL FOR TENDER
4.1 Introduction CLC-Backbone has been conceived as a new high spatial resolution vector product. It represents a
baseline object delineation with the emphasis on geometric rather than thematic detail. The wall-to-
wall coverage of CLC-Backbone shall draw on, complete and amend the incomplete picture formed
by the current local component products (Urban Atlas, Riparian Zones and Natura 2000) which cover
approximately a quarter of total area as shown in Figure 4-1. CLC-Backbone will provide a
comprehensive and detailed coverage of whole EEA-39.
The objective of the CLC-Backbone is to provide:
A useful standalone product
a spatially detailed geometric base for CLC+
Basic homogeneous land cover units (EAGLE terminology) as thematic input to CLC-Core and
CLC+
Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26,3% of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC)
CLC-Backbone will initially be produced by a mostly automated industrial approach divided into
three distinct steps (Figure 4-2). The spatial units of CLC-Backbone, called “landscape objects”, are
22 | P a g e
vector polygons. They are attributed using a wall-to-wall pixel-based 10*10m2 classification and
further object-based characteristics.
Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3) the pixel-based classification of EAGLE land cover components and (4) attribution of Level 2 objects based on this classification and additional Sentinel-1 information.
For implementation of CLC-Backbone the following principles are taken into account in order to
contribute to a harmonized European land cover monitoring framework:
• Heritage: The concept takes into account the existing pan-European and local Copernicus
layers. The technical specifications to be provided by this contract are therefore based on a
thorough review of these datasets and their geometry and thematic content.
• Integration: CLC-Backbone will integrate selected geometric information from existing
COPERNICUS products and ancillary data within the process.
4.2 Description of CLC-backbone
CLC-backbone will consist of two sub-products:
Raster product: pixel-based (Sentinel-2 based), 10*10m2, pure land cover classes
Vector product: 1 ha MMU, derived from linear networks and image segmentation,
attributed with aggregated statistics from the raster product and additional characteristics
from Sentinel-1
23 | P a g e
The production of the CLC-backbone foresees four main steps:
Step 1:
The first level of object borders (geometric skeleton) derived in this step represent persistent
object borders (“hard bones”) in the landscape (Step 1). The borders are derived from ancillary data
(roads, railways and rivers).
Step 2:
On a second level a subdivision of the persistent objects (Level 1) will be achieved through image
segmentation (“soft bones”), based on multi-temporal Sentinel data within a defined observation
period (resulting in Level 2 landscape objects = polygons). The geometric output of CLC-Backbone –
the delineated “landscape objects” – represent spectrally and/or texturally homogeneous features
that are further characterised and attributed in step 3.
Step 3:
Production of an independent pixel based (10*10m2) pure land cover classification with pure land
cover classes based on the EAGLE land cover component concept.
Step 4:
Characterization of the landscape (objects/features) from Step 2 using the pixel values from Step 3.
Further attribution using spectral characteristic of the object (mean, variance) from Sentinel-2 and
Sentinel-1.
4.2.1 Level 1 Objects – hard bones The idea behind the Level 1 objects (“hard bones”) is to derive a partition of landscape that is
reflecting relatively stable (persistent) structures and borders built from anthropogenic and natural
features. The transportation network like roads and railways (both without tunnels) is the major
component, followed by the river networks.
When building the spatial framework, there is a potential long list of sources for persistent object
geometries. However, throughout the consultation phase the decision was taken to solely
concentrate on linear networks and provide all other object geometries directly from Image
segmentation.
4.2.2 Level 2 Objects – soft bones The aim of this second level segmentation is to derive objects that can be differentiated by the
spectral response and behaviour throughout a year. With the increasing number of observation
dates per pixel throughout a year the likelihood to find homogeneous management units increases,
but at the same time pixel based noise is increased as environmental conditions within a field parcel
may vary significantly throughout a year (e.g. soil moisture content). Service providers will have to
balance the number of acquisitions to maximize the identification of single field parcels and on the
other side to reduce “salt and pepper” effects due to varying spectral response.
The level 2 landscape objects represent in the ideal case a consistent management of field parcels,
and/or objects with a unique vegetation cover and homogeneous vegetation dynamics throughout a
year. As an example, different settlement structures (e.g. single houses or building blocks) should
appear spectrally different (due to the different levels of sealing, shape spatial extend and spatial
built-up pattern) and will therefore be delineated as separate polygons. The actual cultivation
measures applied on the land will also influence the type of delineation as e.g. in agriculture the field
structure is visible in multi-temporal images due to the differences in temporal-spectral response
according to the applied management practices on field level (mowing, ploughing, sowing).
24 | P a g e
Table 4-1: Main characteristic of Level 1 objects (hard bone) and level 2 objects (soft bones)
1. Level objects 2. Level objects
Name Hard bones Soft bones
Method Partition of landscape according to existing spatial data
Image segmentation
Input data Persistent landscape objects: Transport infrastructure (roads, railways excluding tunnels); river network
Sentinel-2 complete time-stacks of one vegetation period (amended by Landsat 8 data in case of clouds/haze)
Special issue Geometric transformation of line geometry into vector-polygons and further transformation of vector-polygons into raster based regions
Pre-processing (atmospheric and topographic) of S-2 images;
Advantage Very good cost/benefit ratio; no doubling of efforts, reuse existing European geospatial data infrastructure;
Independent data source; Clearly defined observation period (2018)
Disadvantage Data might be outdated; large vector processing facilities needed; effort to integrate and collect appropriate data
Reliability and reproducibility of segmentation algorithms
Concerns European data vs. national data Technical feasibility (big data processing); cooperation with data centres necessary
4.3 Technical specifications
4.3.1 Spatial scale / Minimum Mapping Unit Raster product:
MMU 100 m2 (10*10m2 Pixel)
Vector product:
CLC-Backbone shall address landscape objects with a predefined
o MMU of 1 ha and a
o Minimum mapping width (MMW) of 20m.
Units (final “landscape objects” resulting from merged Level 1 and Level 2)
o Spectral and/or textural homogeneous (over time) units that represent meaningful
objects in the real world (unique land cover and/or homogeneous vegetation
dynamics)
o They are characterized with the land cover class that is predominantly occurring
throughout the season (e.g. differentiation between trees and clear-cut according to
dominating time period of occurrence within observation period)
25 | P a g e
o Examples for meaningful objects are:
Single agricultural management units (land parcels)
Tree stands that are homogeneous according to criteria age, mixture and/or
management
Clear cuts
City blocks
Extraction site
Lakes
Wider roads, railways and rivers (>=20m)
o Short-term temporary effects (less than 6 month) like variation in soil water content
or temporary flooding do not constitute a meaningful object in the context of CLC-
backbone
o Adjacent Units can have the same land cover code
As they might only be separated by a road or river, but nevertheless
comprise different spatial units with unique management cycle and spatial
configuration (and thus different spectral-temporal characteristics)
Area coverage of units
o Wall-to-wall
o 100% of area has to be covered by segmented units
4.3.2 Delineation accuracy of geometric units The appropriate size and delineation are defined by a sample of verification polygons derived via
visual interpretation
The delineation accuracy of the resulting polygons (vector product) is defined with
Appropriate size:
o Too large polygons (overestimation of object area): not more than 10% of all
polygons
o Too small polygons (underestimation of object area): not more than 15% of all
polygons
Any deviation of more than 15% of polygon area is considered too small or
too large
Appropriate delineation (positional accuracy)
o Shift of border: max. 20m shift
not more than 10% of perimeters is shifted more than 15m
4.3.3 Reference year The production of CLC-Backbone shall generally make use of images from the year 2017/2018, i.e.
Sentinel-2 HR layer stack. The image stack covers one full vegetation season with 2018 as reference
year, but taking regional conditions into account.
The vegetation season in Mediterranean regions starts in October of the previous year and the
vegetation season is for this region therefore set to 10/2017-9/2018. Northern regions, with short
vegetation season, and areas along the Atlantic coast (with frequent and high cloud cover) may,
26 | P a g e
even with Sentinel-2 , be impossible to cover with data from a single season. A pragmatic solution is
then to include imagery from 2017 or earlier if necessary. Special condition, currently not known to
the development team, may of course also imply similar pragmatic deviations from the reference
year in other parts of Europe.
4.3.4 EO data The Sentinel-2 data for the production of CLC-Backbone will need to make use of all available
observations from the ESA service hubs. The aim is to provide a multi-spectral, multi-date, 10-20 m
spatial resolution imagery as basic EO-input. This selection can be amended by Sentinel-1, where
helpful. As the full satellite constellation of Sentinel (using 2A and 2B) is available operationally from
late 2017 onwards, it is anticipated that last months of 2017 and full year 2018 is the first full
vegetation period for a comprehensive multi-temporal coverage of Europe.
The complete layer stack of multiple time series of all S-2 (and if necessary S-1) acquisitions have to
be considered to receive in minimum 6 (i.e. monthly) acquisitions for the 2018 vegetation period
(that is extended to 2017 observations in Mediterranean areas) in order to:
fully track the seasonal cycles of vegetation
increase the automation of the processing
ensure the homogeneity of results across Europe
Older Sentinel-2 archive data (before 2018) might be used to
close data gaps for 2017/18, e.g. due to cloud cover (specifically in northern countries)
to confirm stable object geometries (“soft bones”)
In case of availability of a European-wide coverage of VHR data for 2018, its integration in the
processing chain should be considered (either for accompanying the production or for verification).
The synergetic use of Sentinel -1 (SAR) imagery shall be considered for improving classification
accuracy and enhancing the thematic information, especially on thematic issues like soil properties
(wetness). Recent studies on the use of merged optical-SAR imagery as well as SAR visual products
and the experiences in HRL production have confirmed their applicability for both semi-automated
classification and visual interpretation.
Landsat 8 or equivalent data shall be used for gap filling in case of insufficient coverage by Sentinel-
2.
4.3.4.1 Pre-processing of Sentinel-2 data
As Sentinel-2 is an optical system the average cloud coverage influences the number of observations
significantly. All cloud free (and shadow-free) pixels of an image can be analysed. The constellation
of Sentinel-2 satellites provides an average the revisiting frequency of 3-4 days in Europe. The
coverage is even more frequent in northern regions due to the overlaps of the S-2 path-footprints.
EO imagery has to be pre-processed to correct for atmospheric and topographic conditions. ESA has
decided that Sen2Cor will be used for correction of imagery for Europe and will reprocess the L2A
level of Sentinel 2 imagery for Europe back to May 2017 using this algorithm. The current plan is
thus to start atmospheric correction using Sen2Cor in Europe and to reassess the algorithms
(e.g. Maja) in 2019.
In case radiometric pre-processing using ESA Sentinel data hub is not available in time, service
providers will have to perform their own radiometric pre-processing. Service Providers will therefore
have to prove their ability to perform the pre-processing as well as a quality assurance to exclude
scenes with displacement errors (common problem with S-2).
27 | P a g e
4.4 Thematic attribution
Raster product:
Classification of individual pixels according to the nomenclature described is chapter 0.
below
Vector product:
The polygons produced by geometric segmentation (merging Level 1 and Level 2) specification are
classified using three different approaches.
1. Attribution with basic land cover class statistic based on pixels in the raster product found in
the landscape object
Statistical parameters to be assigned to the landscape object
i. Dominating land cover class (majority)
ii. Ordered list of land cover class percentages (3 most dominating land cover
classes)
iii. Percentage of land cover class i…j
1. count of pixels per land cover class/total number of pixels per
landscape object
2. Land cover class (chapter 0) determined by conventional analysis of the spectral properties
of the entire object
3. Attribution according to Sentinel-1 data
(1) Wetness
(2) Roughness
28 | P a g e
4.4.1 Thematic classes Thematic detail:
After obtaining the hard bones and soft bones of the geometric skeleton, each single landscape object
is then labelled by simple land cover classes. The descriptive nomenclature that is applied for the
labelling contains the following EAGLE derived Land Cover Components:
9 land cover classes
1. Sealed surface (buildings and flat sealed surfaces)
2. Woody – coniferous
3. Woody – broadleaved
4. Permanent herbaceous (i.e. grasslands)
5. Periodically herbaceous (i.e. arable land, natural grasslands with periodic vegetation
cover)
6. Permanent bare soil
7. Non-vegetated or scantly vegetated surfaces (i.e. rock, screes and sand, lichen,
sparsely vegetated alpine heath)
8. Water surfaces
9. Snow & ice
Remark to class woody:
Representatives from some member countries have expressed the wish to further differentiate the
class woody into “trees” and “shrubs”. This differentiation would be useful but can currently not be
achieved with sufficient accuracy from spectral attributes alone. A robust differentiation is only
possible by using a normalized difference surface model (nDSM).
In the CLC-Backbone nomenclature code list no land use terminology has been used. This is done to avoid semantic confusions between land cover and land use. Land use is, however, implied in the separation between permanent and periodically herbaceous land.
permanent herbaceous
o Permanent herbaceous areas are characterized by a continuous vegetation cover throughout a year. No bare soil occurs within a year. These areas are either unmanaged or extensively managed natural grasslands or permanently managed grasslands, or arable areas with a permanent vegetation cover (e.g. fodder crops) or even set-aside land in agriculture. For managed grasslands, the biomass will vary over the year, depending on the number of mowing (grassland cuts) or grazing events.
periodically herbaceous
o Periodically herbaceous areas are characterized by at least one land cover change (in the sense of EAGLE land cover components) between bare soil and herbaceous vegetation within one year. Depending on the management intensities these areas can also have up to several changes between these two EAGLE land cover components within a year. Normally these areas are managed as arable areas.
29 | P a g e
A permanent and managed grassland as defined in IACS/LPIS may be ploughed every 3-5 years for amelioration purposes followed by an artificial seeding phase and a renewal of vegetation cover. Thus, a managed grassland may even show a phase of bare soil within a time frame of 5-6 years. This has to be considered, when evaluating shorter time phases, as by occasion a managed permanent grassland may accidentally be subject to renewal within the observation period of 1 year.
Complex classes, either as mixture of existing land cover classes (e.g. CLC-mixed classes) or in the sense of ecologically diverse classes (e.g. wetlands), are avoided because the nomenclature solely concentrates on the classification according to surface structure and properties.
A detailed technical description how to model the temporal changes is given in Annex 3.
Important note:
As the main emphasis is on the geometric delineation of landscape objects neighbouring objects may
be assigned the same land cover code. The polygons should still be kept as unique entities, as they can
be different with respect to attributes that may be added in a later stage (e.g. broadleaved trees vs.
needle-leaved coniferous trees, count of grass cutting, number of bare soil count, wetness …).
Accuracy:
The accuracy for the thematic differentiation is defined with
9 land cover classes
o 85% overall accuracy: for 10*10 m2 pixel based product
Omission errors: max. 20%
Commission errors: max. 20%
o 90% overall accuracy: for vector based product
Omission errors: max. 20%
Commission errors: max. 20%
Remark on further development towards CLC+:
Apart from the technical specifications for the foreseen tender for CLC-Backbone, additional
attribution by various data sources is an option for future enrichment of classes towards the
development of CLC+. First of all, member countries can be asked to populate the resulting geometries
with national data. In addition, crowd-sourced data may be used to characterize the land use within
these polygons (either by in-situ point observations or by wall-to-wall datasets e.g. OSM land use).
4.5 European ancillary data sets for Level 1 hard bone creation
Throughout the consultation process a number of potential datasets that could contribute to the
creation of the “hard bones” (level 1 objects in Figure 4-2) were discussed. The team decided, after
thorough consideration of all proposals, to concentrate solely on linear networks such as roads,
railways and rivers as input to the hard bone.
30 | P a g e
Table 4-2: Overview of existing relevant products which were analysed as potential input to support the construction of the geometric structure (Level 1 “hard bones”) of the CLC-Backbone.
Product comment Format Potential use for constructing basic geometry
Reference year
Roads and railways
National data portals7 National reference data line National INSPIRE data set provision; not harmonized European dataset, varying license conditions, varying levels of roads, varying information on tunnels
Varying throughout member countries
Open Street Map – roads and railways
Crowd sourced data Line Linear transportation network (centre lines), standardized classes, information on tunnels available
Frequent change mapping; full time history
HERE maps (Navmart) Operational Commercial data
Line Commercial layer; licensing constraints
Regular updates based on business processes
Rivers
EU Hydro (Copernicus reference layer)
Free an open COPERNICUS product
Line Geometric quality derived from VHR images, very high location accuracy
2011-2013
WISE WFD – surface water bodies
bottom-up dataset from official national hydrography networks
Waterbodies as polygons Waterbody line as line geometry
Based on WFD2016 reporting (UK and Slovenia only for viewing)
2016 (6 year update cycle, next update 2022)
7 At the current stage not all geographic information is available via the national INPSIRE portals
31 | P a g e
The usage of additional geospatial information for constructing Level 1 “hard bones” is based on the
following arguments:
European landscape is a highly anthropogenic transformed landscape, where transport and
river networks (beside others) form the basic subdivision of landscape
The location of transport and river networks are fairly known due to geospatial data on
national and/or European scale
Many of the borders defined by transport and river networks can also be identified from
Sentinel-2 images, but at high costs and with lower accuracies
Usage of linear networks:
The linear networks are used to represent line features (in the sense of being smaller than 20m) on
the earth surface. Therefore, all kind of tunnels are excluded.
The linear networks (centre line of roads, railways and rivers) represent relatively stable borders of
landscape objects (e.g. parcels/agricultural fields/city blocks that are surrounded by roads, railways
and rivers). This division of landscape is used as input to step 2 for the subsequent segmentation.
Wider roads (>20m) however establish landscape objects themselves.
It is important to note that no attribute information is needed for the integration of the linear
networks. Any part of the linear network represents a border for the adjacent polygon, therefore in
the final partition of the landscape not even the information, if the underlying border represents a
road, railway or river is of any importance anymore. The vector line network is transformed into
polygons (1 ha MMU). These polygons are further subdivided using Sentinel-2 data in Step 2 using
automated image segmentation.
Roads, railways and rivers that are wider than 20m have to be covered in step 2 as polygon features.
In case automated image segmentation algorithms do not fully recover these narrow and elongated
polygons a manual editing phase for sufficiently covering those wider linear networks (that comprise
polygon features) has to be foreseen.
4.5.1 Criteria for ancillary data input The geospatial data for the creation of the CLC-Backbone have to meet following specifications:
Provision of a consistent set of information across European countries
o Scale: 1:10.000 or better
o Geometric quality: +/- 5m positional accuracy
o Geometric representation:
Centre line
Parallel lines (distance less than 20 m, e.g. driving directions) have to be
merged
o Completeness: no spatial or thematic data gaps
Spatial completeness: >95% of network covered
Thematic completeness: all types of roads from highway down to
agricultural tracks covered
o Network: connected network
no dangles
snapping tolerance 10m
32 | P a g e
o Specifically, for roads and railways
Information on tunnels per line string
Road types:
Standardized and comparable road type categories across Europe
all types of roads (from highway to agricultural tracks)
o Specifically, for rivers
Completeness: All rivers within catchments larger than 10 km2
Provision of data at no cost or at reasonable costs, where cost reduction in the overall
process justifies the costs
Support of the full, free and open data policy of the end product
o Access to final data is based on a principle of full, open and free access as
established by the Copernicus data and information policy Regulation (EU) No
1159/2013 of 12 July 2013
Provision of data to industrial service providers, potentially outside the country of the data
provider.
Availability of the data to the service provider at the start of the project implementation.
A combination of datasets is feasible (e.g. WFD-WISE dataset within EU amended by EU-Hydro
outside EU coverage), if both dataset comply with the specification criteria.
4.5.2 Ancillary input data Three different possible data sources are identified for transport network (Table 4.1):
National reference data
Open street map data
Commercially available navigation data
For rivers two European datasets are available
Water framework directive – WISE
EU-hydro (COPERNICUS product)
The choice of dataset to be used for the production has to consider the criteria defined in chapter
4.5.1. If alternative dataset exists (e.g. availability of national reference data and OSM) the objective
criteria in chapter 4.5.1 are used to select the data with the highest fitness of purpose.
4.5.2.1 Open street map data
Throughout the last 10-15 years the Open Street Map has turned to be the major and
comprehensive provider for road and railway networks worldwide. Data collected by volunteers and
provided with an open-content license.
One major critic when using crowd-sourced data is their incompleteness and varying quality. For
OSM road dataset recent scientific reviews (e.g. Barrington-Leigh and Millard-Ball, 20178) have
shown a completeness in Europe of by far more than 96% for the majority of the EEA 39 countries
(the figures for the completeness per country are available in the Annex 3). According to their study
only Serbia, Bosnia-Hercegovina and Turkey have completeness below 90%. These three countries
may be subject for a refinement of roads using additional national data sources.
In the Figure 4-3 below the completeness of OSM in four countries (Estonia, Portugal, Romania and
Serbia) is visualised on aerial images and Sentinel-2 background data.
8 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180698
33 | P a g e
Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo), Romania
(near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2 cloud free services as
Background (left to right).
License:
The Open Street Map data is licensed under the Open Data Commons Open Database License
(ODbL) by the Open Street Map Foundation (OSMF). As the input data from OSM is restricted to
geometric properties and these properties are substantially altered according to the geometric
transformation from original vector data into the grid database and combined with remote sensing
data the CLC-backbone is not regarded as a derivative database of OSM, but a collective database.
34 | P a g e
Therefore, according to Art. 4,5 of the ODbL v1.09 the COPERNICUS standard license (guaranteeing a
full, free and open access with the possibility to derive business application) can be applied to the
dataset.
4.5.2.2 INSPIRE national portals: roads & railways
The project ELF – European Location Framework10 – has compiled overviews for existing national
services mainly for INSPIRE Annex I themes (roads buildings, hydrology). They try to serve as single
point of access for harmonised reference data from National Mapping, Cadastre and Land Registry
Authorities. The INSPIRE regulation will certainly lead to an increased availability and accessibility for
European reference data, but at the current point in time the data situation even for core national
data is still quite patchy and requires large efforts for searching, finding, compiling and harmonizing
(semantically and geometrically) the various dataset. Therefore, ELF is currently not suited to
provide data according to the criteria defined in chapter 4.5.1.
The recent survey (status: 19. Jan. 2018) in the official INSPIRE portal 11 reveals that in total 101
datasets are available from 10 member countries for download in the data theme “transport
networks”. Some countries do only provide regional distributed datasets, but not singular national
wide download services.
Figure 4-4: Downloadable datasets for INSPIRE transport network (road) services using the official INSPIRE interface (Status: Jan. 2018).
9 https://opendatacommons.org/licenses/odbl/1.0/ 10 http://locationframework.eu/ 11 http://inspire-geoportal.ec.europa.eu/thematicviewer/Theme.action?themecode=tn
35 | P a g e
Figure 4-5: Distribution of number of downloadable datasets for INSPIRE transport service across member countries (Status: Jan. 2018)
The advantage of using national data instead of European wide available data has to be carefully
evaluated with regard to costs and benefit. First of all, it is still a challenge to find the appropriate
dataset. Although it is known that other national services exist (e.g. Spain), they cannot be found
using this central access portal. This has to be considered, when replacing existing European wide
data sources with national data.
The current available national data are not harmonized across member countries regarding the e.g.
definition of road-types or completeness of road types.
4.5.2.3 Rivers and lakes
One example for a joint European (EU 28) dataset is the reference spatial data set of the water
framework directive (WFD) that is part of the water information system for Europe (WISE)12. They
compile national information reported by the EU Member states to one homogeneous dataset.
According to the reporting cycle of the WFD that requires an update ever 6 years, the dataset is
available in a version 2010 and 2016. Such datasets may provide a valuable data source for the
generation of “hard bones”, but still do not provide the optimal solution. In the case of the WISE
WFD dataset the coverage is in principal restricted to EU 28, but not even complete within EU 28, as
12 https://www.eea.europa.eu/data-and-maps/data/wise-wfd-spatial
36 | P a g e
countries like UK, Ireland, Slovenia and Lithuania are not included yet (UK and Slovenia are only
available for EEA and EC for visualisation purposes).
Compared to the EU-Hydro dataset they provide a higher geometric quality, but are less complete,
as smaller rivers of catchments less than 10 km2 are not included by definition.
A combination of these datasets is recommended.
4.5.3 Remark: land parcel identification system (LPIS) The land parcel identification system (LPIS) in Europe is at the current stage not suited to be
integrated into the Level 1 “hard bone” process, as the data is currently either
not completely available on national level (approx. 50% of countries provide data)
partly only available on regional level (Spain, Germany)
do not represent the same spatial scale across countries
are not available with the same semantic information (e.g. grassland/arable land
differentiation)
if data are harmonized under INSPIRE, they are provided either under Annex II land cover or
Annex III land use (but not in one homogeneous data format)
not accessible for all users / uses
A detailed explanation is given in the Annex 4.
It is expected that the availability and homogeneity of IACS/LPIS data in Europe is improving in the
next years. For a future update of the CLC-backbone this improved availability of LPIS data will play
an important role.
37 | P a g e
4.6 Short-Form of CLC-Backbone end product technical specifications
CLC-Backbone
[Example image if available.]
Description
CLC-Backbone is a spatially detailed, large scale, EO-based land cover inventory in a vector format
(accompanied by a raster product layer) providing a geometric backbone with limited, but robust
thematic detail on which to build other products.
Input data sources (EO & ancillary)
EO data used in the production
Spatial resolution 10 – 20 m
Spectral content – Visible, NIR, SWIR, SAR
Temporal resolution – in minimum monthly cloud free acquisitions
Source data –Sentinel-2 amended by Landsat 8 where necessary
Contributing data – Sentinel-1
Ancillary data used in the production
Roads and railways (national datasets and/or OSM)
Hydrography: WISE WFD and/or EU – Hydro
Thematic Content
EAGLE Land Cover Components:
1…Sealed
2…Woody vegetation - coniferous
3…Woody vegetation - broadleaved
4…permanent herbaceous (i.e. grasslands)
5…Periodically herbaceous (i.e. arable land)
6…Permanent bare soil
7…Non-vegetated bare surfaces (i.e. rock and screes)
8…Water surfaces
9…Snow & ice
38 | P a g e
Methodology
The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step, the
boundaries (hard bones) of relatively stable landscape objects are detected based on existing data
sets for roads, railways and rivers (Level 1 objects). In a second step, image segmentation
techniques based on multi-temporal Sentinel-2 data are applied to further subdivide the Level 1
objects into smaller units (Level 2 objects).
The third step contains a classification and spectral-temporal attribution (Sentinel-1 and -2) of the
landscape objects derived from steps 1 and 2 using a pixel-based classification of the Land Cover
components. The pixel based classification is generalized on object level using indicators like
dominating land cover class and percentage mixture of land cover classes.
The third step contains an independent land cover classification of individual pixels. The fourth step
assigns land cover class values (dominating class, percentages of classes, …) to the Level 2 polygons
using the pixel map (Step 3). In addition, other attribution methods are used to characterize the
polygon with spectral-temporal characteristic and indices (e.g. Sentinel-1 characteristics).
Geometric resolution (Scale)
~1:10 000
Geographic projection / Reference system
ETRS89-LAEA
ETRS89 Lambert Azimuthal Equal-Area (EPSG: 3035, http://spatialreference.org/ref/epsg/etrs89-
etrs-laea/)
Coverage
EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium,
Bosnia- Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France ,
Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution
1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia,
Malta, Montenegro, the Netherlands, Norway , Poland, Portugal , Romania, Serbia, Slovakia,
Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular
CLC dataflow. The number of countries involved is 39. The area covered is, approximately, 6 Million
km2.
Geometric accuracy (positioning accuracy)
Raster product: equals Sentinel-2
Vector product: Less than half a pixel based on 10 m spatial resolution input data
Spatial resolution / Minimum Mapping Unit (MMU)
Raster product: 100 m2 (10*10m2)
Vector product: 1 ha
Min. Width of linear features
Vector product: 20 m
Topological correctness
[TBD]
39 | P a g e
Raster coding
Encoding identical to [thematic detail]
Thematic accuracy (in %) / quality method
The target will be set at 90 %. A quantitative approach will be used based on a set of stratified random
point samples compared to external datasets (e.g. VHR, GoogleEarth, national orthophotos).
Target / reference year
2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)
Up-date frequency
[TBD]
Information actuality
Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for the
ancillary datasets.
Delivery format
Raster product: Grid
Vector product: Vector
Data type
Raster product: Grid
Vector product: Vector
Naming conventions
[TBD]
Medium
[TBD]
Delivery reliability
[TBD]
Delivery time
[TBD]
Archive
[TBD]
40 | P a g e
5 CLC-CORE – THE GRID APPROACH
The CLC-Core product should be seen as an underpinning semantically and geometrically
harmonized “data container” and “data modeller” that holds, or allows referencing of, land
monitoring information from different sources in a grid-based information system. The grid
database will essentially store geospatial and thematic data from sources such as CLC-Backbone,
CLC, Local Component data, HRLs, in-situ as well as information provided by the MS. The contents of
the grid database can be exploited directly within grid-based analyses or alternatively subsets of the
grid database can be extracted and analysed by spatial queries driven by objects from an external
vector dataset. In this way, the CLC-Core can support the other elements of the 2nd generation CLC,
for instance, a CLC-Backbone object could gain additional thematic information by aggregating or
analysing the equivalent portion of the CLC-Core as a starting point for CLC+ production (see next
chapter).
5.1 Background Textbooks usually differentiate between “raster GIS” and “vector GIS” (Couclelis 199213, Congalton
199714). The “vector GIS” does, like the paper map, attempt to represent individual objects that can
be identified, measured, classified and digitized. The “raster GIS” attempts to represent continuous
spatial fields approximated by a partition of the land into small spatial units with uniform size and
shape. The “raster GIS” therefore have structural properties that make them particularly suitable for
use in modelling exercises.
A grid is a spatial model falling somewhere between the vector model and the raster model. The grid
looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is
classified to a particular land cover class or single characteristic. The grid cell, on the other hand, is
characterized by how much it contains of each land cover class or other information. The grid is, as a
result, a more information-rich data model than the raster.
The difference between the raster and the grid is illustrated in Figure 5-1, using an example based on
CLC. The original CLC vector data set can be converted into a new data set with uniform spatial units
of e.g. in this case 1 X 1 km. The result can either be a “raster” where a single CLC class is assigned to
each pixel (representing for example the dominant land cover class inside the pixel; or the CLC class
found at the centre of the pixel unit). Alternatively, the result is a “grid” where an attribute vector
representing the complete composition of land cover classes inside the unit is assigned to each grid
cell. Geometric information on a more detailed scale than the size of the spatial unit is in both cases
lost, but the overall loss of information is less in the grid than in the raster.
The loss of information when using a grid structure can be further reduced if the grid size is selected
to be less than the minimum mapping unit of the input product. For instance, with a MMU of 0.25 ha
13 Couclelis, H. 1992. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65 – 77, Springer Berlin Heidelberg 14 Congalton, R.G. 1997. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63: 425-434.
41 | P a g e
(50 x 50 m), the grid size should be around 10 x 10 m. Consequently, any downstream products
derived from the grid-based information must also consider grid size in relation to its requirements.
Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit: 10 daa = 1 ha.
A grid data model is not the result of mapping in any traditional sense. The grid is “populated” with
thematic information from (one or more) existing sources (Figure 5-2). One method frequently used
to populate grids with LULC information is to calculate spatial coverage of each LULC class based on
a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes
can furthermore involve counting or statistical processing of registered data for observations made
inside each grid cell (for instance, not only the area of a class, but also the number of objects it
represented). Ancillary databases, which provide important and relevant information about the
content and context of the grid cells, can also be added. For instance, the number of buildings,
length of roads or number of people found within the grid cells.
5.2 Populating the database The grid as a spatial model consists of square spatial units of uniform size and shape that can have an
(internally) heterogeneous thematic composition, i.e. the grids are “geometrically uniform polygons”.
Each grid cell has a unique ID that is related to a database containing the attribute data.
42 | P a g e
In a first step, CLC-Core will be populated with existing thematic information from the CLMS (see
also chapter 4.4):
Local Component data (Urban Atlas, N2000, Riparian Zones)
High Resolution Layers (Imperviousness, Forest, Grassland, Water & Wetness)
CLC-Backbone
Any other data, such as CLC (25 ha), MS data, …
Upon agreement, Member States are invited to provide also their national information to populate
the information system, mainly on land use and agricultural themes.
Figure 5-2: Representation of real world data in the CLC-Core.
5.3 Data modelling in CLC-Core CLC-Core will be the effective “engine” to model, process and integrate the individual input data sets
to create new, value added information. The modelling will make use of synergies (the same
information provided for one cell based on different sources) as well as disagreements (different
information provided for one cell by different sources) during the data processing.
We would like to refer to the modelling and combination of different layers of the database,
including the modelling parameters as an “instance” of the database. In this understanding, any
output of a database modelling process using different input layers and different parameters will
create a specific instance with associated metadata.
Technically CLC-Core shall:
Allow further characterisation (e.g. by additional input data / knowledge, see also chapter
5.2 - Populating the database).
Allow for (fully or partial) inclusion of Copernicus High Resolution Layers and local
component products.
Allow the best possible transformation of National datasets into CLC+ (capturing the wealth
of local knowledge existing in the countries).
Storing information in a grid structure will also help to overcome the problem of different geometries:
as the data are converted to grid cells, differences in vector geometries will have a lesser impact.
43 | P a g e
5.4 Database implementation approach for CLC-Core The implementation approach for the database of CLC-Core is based on paradigms originating from
the semantic web and knowledge engineering. This opens up new possibilities beyond the classical
object-relational database management system concept. In the following subsections, we describe
the pros and cons of contemporary object-relational database models, not-only SQL (NoSQL)
concepts, and in particular Triple Stores. In addition, we elaborate on the intended approach that
utilizes a spatio-temporal Triple Store to store, manage and query CLC-Core data. Furthermore, we
propose strategies to generate CLC-Core products – i.e. “flat” shape-files or GeoJSON files – based
on a spatio-temporal Triple store.
5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores The contemporary object-relational database model relies on the work of Codd (1970)15. Although
object-relational database management systems (ORDBMS) were developed further since the 1970s,
their relational concept remains. Due to the emergence of the World Wide Web, Web 2.0 and the
Semantic Web the requirements for databases have changed drastically, which led to the
development of NoSQL database concepts (Friedland et al., 2011; Sadalage & Fowler, 2012)16,17.
ORDBMSs are databases that follow the relational model, published by Codd (1970). Any relational
model defines how data are stored, retrieved and altered within databases – base on n-ary relations
on sets. Formally speaking, any relation is a subset of the Cartesian product of sets. In such an
environment “reasoning” is done using a two-valued predicate logic. Data, stored utilizing the
relational model, are viewed as tables, where each row represents an n-tuple of the relation itself.
Each column is labelled with the name of the corresponding set (domain). Operations are performed
using relational algebra, allowing to express data transactions in a formal way. ORDBMSs are
designed to adhere to the ACID principles – atomicity, consistency, isolation and durability – which
are defined by Gray (1981)18. ACID principles ensure that database transactions are performed in a
“reliable” way. In contemporary ORDBMSs consistency is the central issue limiting performance and
the ability to scale horizontally – which is crucial for Web 2.0 or Big Data requirements. In addition,
ORDBMSs rely on a fixed schema to which all data have to adhere. Hence, alterations of the data
model are hardly possible, which leads to the fact that data models in an ORDBMS are regarded as
static. Other downsides of contemporary ORDBMSs are the data load performance, and the handling
of large data volumes with high velocity with replication. Nevertheless, ORDBMS are still widely used
in business applications, as the guarantee consistency, which is crucial for e.g. bank accounting. In
addition, commercial and open-source ORDBMSs are mature products and offer a variety of options
and extensions.
The term “NoSQL” database emerged in 2009 (Evans, 2009)19, as a name for a meeting of database
specialists on distributed structured data storage in San Francisco on June 11, 2009. Since then the
15 Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM. 13 (6): 377. doi:10.1145/362384.362685 16 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education. 17 Friedland, A., Hampe, J., Brauer, B., Brückner, M., & Edlich, S. (2011). NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken. Hanser Publishing. 18 Gray, J. (1981): The Transaction Concept: Virtues and Limitations. In: Proceedings of the 7th International Conference on Very Large Databases, pp. 144–154, Cannes, France. 19 Evans, E. (2017): NOSQL 2009. Url: http://blog.sym-link.com/2009/05/12/nosql_2009.html, visited: 06-11-2017.
44 | P a g e
term has grown into a widely known umbrella term for a number of different database concepts
(Friedland et al., 2011). NoSQL databases have the following characteristics in common:
- non-relational data model
- absence of ACID principles (especially consistency – defined as BASE [eventual consistency
and basically available, soft state eventually consistent])
- flexible schema
- simple API’s (no join operations)
- simple replication approach
- tailored towards distributed and horizontal scalability
NoSQL databases are especially designed for Web 2.0 applications and cloud platforms that require
the handling large data volumes with replication. In addition, NoSQL databases are designed to
support horizontal scalability, which is necessary for handling big data with high turnover rates20.
Currently, NoSQL concepts are employed by e.g. the following companies: Amazon, Google, Yahoo,
Facebook, and Twitter. Types of NoSQL databases are as follows (see e.g. Scholz (2011)21, Sadalage &
Fowler (2012)):
- Column databases: have tables, rows and columns, but the column names and their format
can change – i.e. each row in a table can be different (Khoshafian et al., 198722; Abadi,
200723). Generally speaking, a wide column store can be regarded as 2D key-value store.
Examples are: Apache Cassandra, Apache HBase, Apache Accumulo, Hybertable, Google’s
Bigtable
- Key-value databases: store data in the form of a key and an associated value (similar to a
hash), and strictly no relations. Key value stores show exceptional performance in terms of
horizontal scaling, and handling big data volumes. Examples of key-value stores are e.g.
OrientDB, Dynamo (Amazon), Redis, or Berkeley DB.
- Document databases: rely on the “document” metaphor that implies that the
data/information is contained in documents – that share a given format or encoding. Thus,
encodings like XML or JSON are used to represent documents. Each document can be
schema free, which is a key advantage for Web 2.0 applications. Examples of document
databases are. Apache CouchDB, MongoDB, Cosmos DB (Microsoft), or IBM Domino.
- Graph databases: utilize the notion of a mathematical graph structure for database
purposes (Robinson et al., 2015)24. A graph contains of nodes and edges that connect nodes.
In a graph database, the data/information is stored in nodes which are connected via
relationships, represented by edges. The relations can be annotated in a way that a graph
database may contain semantic information. The concept allows modelling of real-world
phenomena in terms of graphs, e.g. supply-chains, road networks, or medical history for
20 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education. 21 Scholz J. (2011). Coping with Dynamic, Unstructured Data Sets - NoSQL: a Buzzword or a Savior?. In: M. Schrenk, V. Popovich & P. Zeile (Eds.) Proceedings of the Real CORP 2011, May 18-20, 2011, Essen, Germany: pp. 121 - 129. 22 KHOSHAFIAN, S., COPELAND, G., JAGODIS, T., BORAL H., VALDURIEZ, P. (1987). A query processing strategy for the decomposed storage model. In: ICDE, pp. 636–643. 23 Abadi, D. (2007). Column-Stores for Wide and Sparse Data. In: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Url: http://db.csail.mit.edu/projects/cstore/abadicidr07.pdf, visited: Nov. 08, 2017. 24 Robinson, I., Webber, J., Eifrem, E. (2015). Graph Databases: New Opportunities for Connected Data, O'Reilly Media Inc.
45 | P a g e
populations. Graph databases are capable of storing semantics and ontologies – especially in
the field of Geographic Information Science (Lampoltshammer & Wiegand, 2015)25. They
became very popular due to their suitability for social networking, and are used in systems
like Facebook Open Graph, Google Knowledge Graph, or Twitter FlockDB (Miller, 2013)26.
- Multi-modal: such databases are capable of supporting multiple NoSQL models/types.
Examples are: Cosmos DB (Microsoft), CouchBase (Apache), Oracle, OrientDB.
Additionally, the term Linked Data has emerged as part of the Semantic Web initiative (Bizer et al.,
2009, Heath & Bizer, 2011)27,28. Originating from a web of documents, Bizer et al. (2009) describe an
approach to develop a web of data (i.e. the data driven web) – by publishing structured data such
that data from different sources can be interlinked with typed links. Additionally, they formulate
four characteristics of linked data:
- They are published in machine-readable form
- They are published in a way that their meaning is explicitly defined
- They are linked to other datasets
- They can be linked from other data sets
In order to technically fulfil these characteristics, Burners-Lee (2009)29 established a number of
Linked Data principles:
- Uniform Resource Identifiers (URI‘s) to denote things
- HTTP URI‘s shall be used that things can be referred to and dereferenced
- W3C standards like Resource Description Framework (RDF) should be used to provide
information (Subject – Predicate – Object)
- Data about anything should link out to other data, can contribute to the Linked Data Cloud
RDF is a family of W3C specifications that was originally designed as metadata model. Over the years
it has evolved to the general method to describe resources. RDF is based on the notion of making
statements about resources – which follow the form: subject – predicate – object, known as triple.
Here the subject denotes the resource, the predicate represents aspects of the resource, and
therefore is regarded as an expression of the relationship between the subject and object (see
Figure 5-3).
25 Lampoltshammer, T. J. & Wiegand, S. (2015). “Improving the Computational Performance of Ontology-Based Classification Using Graph Databases” Remote Sensing, vol. 7(7): 9473-9491. 26 Miller, J.J. (2013). Graph database applications and concepts with Neo4j, Proceedings of the Southern Association for Information Systems Conference, Atlanta, vol. 2324, pp. 141-147. 27 Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data-the story so far. Semantic services, interoperability and web applications: emerging concepts, 205-227. 28 Heath, T. and Bizer, C. 2011. Linked data: Evolving the web into a global data space. Synthesis lectures on the semantic web: theory and technology, 1(1), 1-36. 29 Berners-Lee, T. 2009. Linked Data - Design Issues. Online: http://www.w3.org/DesignIssues/LinkedData.html; visited: Nov. 09, 2017.
46 | P a g e
Figure 5-3: Example of an RDF triple (subject - predicate - object).
To make use of semantics and ontologies in RDF there is the opportunity to integrate an RDF Schema (RDFS) or an Ontology Web Language (OWL). These technologies can be utilized to allow semantic reasoning within RDF triples. Both RDF and RDFS need to be stored in a specific data storage – called Triplestores. Triplestores are databases for the purpose of storing and retrieval of (RDF) triples through semantic queries. Triplestores are specific graph databases, document oriented stores, or specific relational data stores that are designed for RDF data. A high number of available Triplestores is able to store spatial and temporal amended data. The following table gives an overview of the available triplestores (both open-source and commercial products).
Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data.
Product name Supports spatial data Supports temporal data
Apache Jena Yes No
Apache Marmotta Yes Yes
OpenLink Virtuoso Yes Yes
Oracle Yes Yes
Parliament Yes Yes
Sesame/RDF4J Yes No
AllegroGraph No Yes
Strabon Yes Yes (stSPARQL)
GraphDB Yes No
To query and manipulate RDF data the W3C standardized language SPARQL is used. SPARQL allows
to write queries that can use quantitative (typically proximity based) relations as well as qualitative
relations (spatial “on” and temporal “after”, “from”). Spatial and temporal extensions of SPARQL
(such as stSPARQL and GeoSPARQL) supporting qualitative relations have been proposed but
expressivity and dealing with uncertainty are still major challenges (Belussi & Migliorini, 2014)30.
Nevertheless, spatial queries, using GeoSPARQL can look as depicted in the GeoSPARQL query “list
airports near the city of London”. The result of the query is shown in Figure 5-5, the GeoSPARQL
30 Belussi, A., & Migliorini, S. (2014). A framework for managing temporal dimensions in archaeological data. In Temporal Representation and Reasoning (TIME), 2014 21st International Symposium on (pp. 81-90). IEEE.
47 | P a g e
query “list airports near the city of London” is depicted. The result of the query is shown in Figure
5-4 the GeoSPARQL query “list airports near the city of London” is depicted. The result of the query
is shown in a map metaphor and as JSON (Figure 5-5).
One major feature of triplestores and SPARQL is the possibility to create federated queries. A
federated query is a single query that is distributed to several SPARQL endpoints (capable of
accepting SPARQL queries and returning results), that individually compute results. Finally, the
originating SPARQL endpoint gathers the individual results, compiles them in one single output, and
returns the output accordingly. This allows to query a large number of different data storages that
can be geographically dispersed (e.g. over Europe).
Figure 5-4: GeoSPARQL query for Airports near the City of London.
Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format.
PREFIX gn: <http://www.geonames.org/ontology#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX spatial: <http://jena.apache.org/spatial#> PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?link ?name ?lat ?lon WHERE { ?link spatial:nearby(51.139725 -0.895386 50 'km') . ?link gn:name ?name . ?link gn:featureCode gn:S.AIRP . ?link geo:lat ?lat . ?link geo:long ?lon }
48 | P a g e
5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach The approach followed for the CLC-Core database can be based on a triplestore approach. This
approach could exploit the advantages of NoSQL databases (especially triplestores) while
overcoming the drawbacks of contemporary ORDBMS – especially addressing horizontal scalability
and consistency.
Hence, we propose distributed triplestore approach, where each member country of the European
Union could host their own CLC-core data in an RDF format. Each member country could host a
triplestore that provides a SPARQL endpoint. This approach can help to overcome data integrity and
replication issues that might arise with contemporary ORDBMS. Hence, each member country can
host their own data, and offer them to be queried via the SPARQL endpoint. The databases of the
member countries can be connected via federated queries, which requires only one SPARQL query at
an endpoint at hand (i.e. at exactly one member country’s CLC-endpoint) – see Figure 5-6.
If there are member countries that are not sure if they should operate a triplestore (and a SPARQL
endpoint) with the CLC-Core data, they can also ask another country to host their data on their
infrastructure. Such a concept would give CLC+ the flexibility to react to changing situation in EU
member countries regarding e.g. the ability to host the CLC dataset accordingly. For example, Austria
could host the CLC+ data of another country if they e.g. don’t have the resources to develop the
infrastructure for hosting and publishing the data.
Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint.
Inevitable for such an approach is the development of an underlying shared semantical model – i.e.
an ontology. An abstract semantical model – in RDFS or OWL – can be developed that describes the
abstract model of CLC+ (e.g. classes, properties, etc.), which can be stored in a triplestore. This
allows the utilization of the abstract semantic model in queries (semantic reasoning based on the
developed knowledge basis). An example for semantic reasoning could be the determination of the
land cover type, based on a number of properties of grid cells.
France: CLC+ SPARQL
Germany: CLC+ SPARQL
Austria: CLC+ SPARQL
Italy: CLC+ SPARQL
SPARQL query Federated SPARQL queries Flow of result data
49 | P a g e
5.4.3 Processing and Publishing of CLC-Core Products To generate CLC+ products based on the distributed triplestore architecture, we need to define the
products requested by the customers. Based on the defined products we need to develop a
migration/transition strategy describing the data flow from triplestores to the desired product.
We are of the opinion that the RDF approach is well suited for distributed storage and ad-hoc
queries and analyses, whereas customers would prefer a single “flat” file containing the CLC+ data.
Due to historic (and practical) reasons such a file could be an ESRI shapefile, an ESRI Geodatabase
file, a GRID file or a GeoJSON file. Such files containing the CLC+ data could be generated in an
automatic way by using a spatial Extract Transform and Load (ETL) tool like SAFE FME31, geokettle32
(open source), GDAL/OGR33 (open source). The migration can be performed with e.g. different
geographical or temporal coverage. These files can be hosted and published on a central server,
where they are available for the public (see Error! Reference source not found.).
Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.
5.4.4 Conclusion and critical remarks The intended fine spatial and temporal resolution of CLC+ will result in high data volumes with high
turnover rates (velocity) – which are two characteristics of Big Data34. Thus, it seems advisable to
adhere to the nature of these datasets when planning a sustainable architecture for CLC+. As
contemporary ORDBMS are not designed for Big Data applications, which manifests in their
weaknesses regarding horizontal scaling, consistency and replication, we advise to have a deeper look
into NoSQL concepts. Especially triplestores with RDF schemas seem appropriate to cope with the
requirements of CLC+, with respect to data volume, data turnover rate, distributed architecture and
semantic queries (and reasoning). Although, NoSQL datastores are widely used in business
applications like Facebook, Twitter, Amazon, and Google, the technology and theoretical foundation
is still under active development. Hence, NoSQL databases do not reach the technological maturity of
contemporary ORDBMS, like Oracle or PostgreSQL. In our eyes, this is a weakness and an opportunity.
By using NoSQL databases, this CLC+ is on the edge of the technological developments and is able to
contribute to the future of NoSQL databases – and to strengthen the spatio-temporal abilities of them.
31 https://www.safe.com/fme/fme-server/ 32 http://www.spatialytics.org/projects/geokettle/ 33 http://www.gdal.org 34 Hilbert, M. (2015): "Big Data for Development: A Review of Promises and Challenges. Development Policy Review.".martinhilbert.net. visited: Nov. 08, 2017.
Web Server (public)
…
Spatial
ETL Tool
Triple Store #1
Triple Store #2
Triple Store #n
SHAPE File France
SHAPE File Germany
SHAPE File Austria
SHAPE File Finland
…
GeoJSON France
GeoJSON Germany
GeoJSON Austria
GeoJSON Finland
…
50 | P a g e
5.5 Short-form of CLC-Core end product technical specifications
CLC-Core
[Example image if available.]
Description
CLC-Core is a consistent, multi-use grid database repository for environmental information populated
with a broad range of land cover, land use and ancillary data, forming the information engine to
deliver and support tailored thematic information requirements.
Input data sources (EO & ancillary)
EO data used in the production
No direct EO inputs.
Ancillary data used in the production
CLMS dataset
MS datasets
Other relevant environmental datasets
Thematic Content
EAGLE Data Model: TBD
Methodology
CLC-Core will be based on a grid spatial model which consists of square spatial units of uniform size
and shape that can have an (internally) heterogeneous thematic composition, i.e. the grids are
“geometrically uniform polygons”. Each grid cell has a unique ID that is related to a database
containing the attribute data. In a first step, CLC-Core will be populated with existing thematic
information from the CLMS, such as LoCo, HRL, CLC-Backbone, CLC, and any other data, such as in-
situ and MS data, mainly on land use and agricultural themes.
Geometric resolution (Scale)
n/a
Geographic projection / Reference system
ETRS89-LAEA
ETRS89 Lambert Azimuthal Equal-Area (EPSG: 3035, http://spatialreference.org/ref/epsg/etrs89-
etrs-laea/)
51 | P a g e
Coverage
EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium,
Bosnia- Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France ,
Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution
1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia,
Malta, Montenegro, the Netherlands, Norway , Poland, Portugal , Romania, Serbia, Slovakia,
Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular
CLC dataflow. The number of countries involved is 39. The area covered is, approximately, 6 Million
km2.
Geometric accuracy (positioning accuracy)
Grid: Less than half a grid cell
Spatial resolution / Minimum Mapping Unit (MMU)
Grid: 1 ha
Min. Width of linear features
Grid: 100 m
Topological correctness
[TBD]
Raster coding
n/a
Thematic accuracy (in %) / quality method
To be explored as it will aggregate a disparate range of data types, products and sources.
Target / reference year
2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)
Up-date frequency
[TBD]
Information actuality
Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for the
ancillary datasets.
Delivery format
Grid
Data type
Grid
Naming conventions
[TBD]
Medium
[TBD]
Delivery reliability
[TBD]
52 | P a g e
Delivery time
[TBD]
Archive
[TBD]
53 | P a g e
6 CLC+ - THE LONG-TERM VISION
CLC+ is expected to represent the new long-term vision for land cover / land use monitoring in
Europe. CLC+ shall meet the requirements of today and tomorrow for European LC/LU information
needs, building upon the EAGLE concept and expertise, while still have capabilities within the 2nd
generation CLC to guarantee backwards compatibility with the original CLC datasets. That implies:
addressing and overcoming the limitations of the spatial resolution of CLC;
overcoming the problems created by the use of heterogeneous classes, while still allowing
their creation in CLC-Legacy to maintain the “landscape vision” of CLC;
a better discrimination of specific landscape features by introducing a limited number of
new or sub-level classes;
keeping the CLC nomenclature unchanged to a large degree, while allowing it to address
known major shortcomings in the classical CLC nomenclature (by attribution and/ or
introduction of subclasses);
making use of a consistent and reproducible method of mapping changes at the 1 ha level
for future land monitoring at a much higher spatial resolution than with the traditional CLC.
Technically CLC+ is proposed as a raster data set with 1 ha spatial resolution (100 x 100 m), created
from CLC-Core, i.e. already including information from CLC-Backbone (e.g. pixel classification),
existing HRL, LoCo and national data stored in CLC-Core. CLC+ shall be understood as one specific
instance (i.e. realisation) of CLC-Core in which the stored information is aggregated to a specific
nomenclature, using the geometry defined by the CLC-Core grid cells35.
The 2018 reference year provides a unique opportunity for creating both a traditional CLC update
(by the MS) and a CLC-Legacy (via CLC-Backbone, CLC-Core and CLC+). The proposed nomenclature
and structure of CLC+ will not deviate too much from the classical CLC to allow a comparison of both
approaches.
Obviously, many other land monitoring products with various nomenclatures and spatial frameworks
may also be defined on the basis of CLC-Core in the future, also addressing the separation of land
cover and land use (already foreseen in CLC-Core), the use of attributes in the object
characterisation or the inclusion of ecosystem or habitat information. But this will be topic of future
discussions and developments, not part of the current work.
Despite the 1 ha MMU of CLC+, the information provided from the pixel based classification of CLC-
Backbone and stored in CLC-Core will allow the reporting of fractions of 1 ha units to respond to
more detailed (i.e. 0.5 ha) reporting obligations.
Due to the higher geometric resolution, the use of some of the often-debated heterogeneous
classes, like 2.4.2 (complex cultivation pattern) and 2.4.3 (land principally occupied by agriculture,
with significant areas of natural vegetation) will be significantly reduced, if not disappear altogether.
35 For this first realisation of CLC+ the vector geometry of CLC-Backbone is not used. Nonetheless, this could be considered as further option to be explored in the future.
54 | P a g e
Other conceptually heterogeneous classes like 2.4.4 (agro-forestry) will remain as they also have a
meaning at the 1 ha level.
The CLC+ nomenclature should be formulated in a way that it is compatible with the traditional
level-3 CLC classes, and takes into account user needs expressed e.g. in the CLC survey conducted by
ETC-ULS in 201636, former CLC production experience, as well as feasibility.
Considering the frequency of classes in CLC and proposals from MS, the following are primary candidate classes for division into subclasses:
121 – green energy industry separated
133 – “construction of artificial features” and “nature construction"
142 – cemeteries and sport/recreation (subclasses e.g. sport arenas, sport fields and racecourses, golf courses, ski slopes, parks (outside urban fabric), allotment gardens, archaeological sites and museums, holiday villages (temporary residential areas), sport airfields)
123 – agricultural grasslands (pastures, meadows) and non-agricultural grasslands under strong human impact
324 – forests growth areas (succession, afforestation, reforestation) and decline of forests (clearcutting, damage)
512 – natural and man-made
Attributes that can be applied for a number of classes are proposed to have a priority in introduction to CLC+. These are e.g.:
abandoned / out of use: can be applied for disused artificial sites / brownfields (121), uncompleted construction sites (133), suburban unused ruderal areas (231), salines (422).
irrigated: all 2xx classes
burnt: all 22x and 3xx classes
presence of windmills: all 2xx, most 3xx classes and 523
As the original CLC nomenclature is not a pure land cover classification there will be the need to
make use of more information than provided by the Copernicus Land Monitoring Service. Table 6-1
provides an overview of the traditional 44 CLC classes and information needed in addition to the
information provided by the Local Components, the High Resolution Layers and the land cover
information collected by CLC-Backbone in order to derive a specific CLC class. The new nomenclature
will be demanding similar and probably even more ancillary information.
The Hungarian test case37 for implementing the EAGLE concept in a bottom up CLC production has
clearly shown that content and quality of a bottom up derived CLC is ultimately dependent on input
data on:
• Land Use (critical);
• Crops and use intensity (e.g. from LPIS)
• Habitat (vegetation composition) information.
The situation is different for each CLC class and Table 6-1 clearly shows that many CLC classes cannot
be derived without additional information on the land use aspect of the area. The current LoCo data
36 Service Contract No 3436/R0-Copernicus/EEA.56586 Task 3: Planning of cooperation with EEA member and cooperating countries. FINAL report in preparation of the CLC2018 exercise. 37 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the European Environment Agency and the European Environment Information and Observation Network. Final report: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country
55 | P a g e
are only partly able to mitigate this lack of land use information as they are only available in selected
areas, not covering all of the European territory.
In almost all cases this information gap can be closed with data from MS (databases or photo-
interpretation). In case MS data is not available, a few classes can be approximated via European
level information, but most certainly at the expense of data accuracy and precision. The MS
information should be provided as input to CLC-Core. In case of missing national land use
information as well as no other European-level replacement data38, traditional CLC data or
production methods (e.g. photo interpretation) might need to be used as back-up solution.
38 Wall-to-wall HRL can be used to address density issues or leaf types, but not they cannot be used as replacement information of land use.
56 | P a g e
Table 6-1: List of current CLC classes and requirements for external information
57 | P a g e
7 CLC-LEGACY
The history of European land monitoring is a story of permanent evolution of approaches and
concepts, resulting in different solutions for individual needs and specifications. In this context, the
CLC specifications provided the first set of European-wide accepted de-facto standards, i.e. a
nomenclature of land cover classes, a geometric specification, an approach for land cover change
mapping and a conceptual data model (Heymann et al, 1994). In this sense, CLC represents a unique
European-wide, if not global, consensus on land monitoring, supported by 39 countries.
Being the thematically most detailed wall to wall European LC/LU dataset and representing a time-
series started in the early 1990’s CLC data became the basis of several analyses and indicator
developments, like Land & Ecosystem Accounts (LEAC) for Europe. It is an obvious requirement that
the planned new European land monitoring framework (2nd generation CLC) has to be able to
ensure a possible maximum of backwards compatibility with standard CLC data.
This comparability will be ensured by CLC-Legacy – a data set with 25 ha MMU and the standard 44
thematic CLC classes, but derived by spatial and thematic generalisation from CLC+ (and/or CLC-
Core, tbd). Considering today’s already existing different CLC production methods, i.e. traditional
visual photo-interpretation vs. national bottom-up solutions, shows that those specific
methodologies result in CLC data with slightly different characteristics in terms of LC/LU statistics or
fine geometric detail, but still respecting and adhering to the general CLC specifications.
Although original CLC vector data are still used mainly for cartographic purposes, in practice the
most commonly used versions of CLC data are CLC status and CLC change raster layers. That is why
CLC-Legacy is also proposed as a 100 x 100m raster data set.
As already mentioned in chapter Error! Reference source not found., 2018 offers the unique
opportunity of comparing two different production approaches for CLC, i.e. the classical approach
implemented by the MS (CLC2018) and a modernised approach making use of capacities from the
Copernicus service industry and MS. Comparing CLC2018 and CLC_Legacy_2018 will allow to assess
the strengths and weaknesses of both approaches.
Production method is, however, not the only source of variation. The specification of the
nomenclature, combined with the variation in land cover and landscape across the European
continent, is necessarily leading to large variability within each class and partial overlap between
classes (Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central
mountain area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely
vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are
correct.).
Variation is inherent in the CLC methodology and variation due to the production method should not
cause any major concern.
58 | P a g e
Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are correct.
For the practical creation of a CLC-Legacy 201839, the production process is facing a classical dilemma
that is associated with a change of methodology:
How to derive a CLC-Legacy 2018? The production process of CLC-Backbone, CLC-Core and CLC+ will
only provide status information for 2018, not any changes relative to CLC2012. A possible solution to
this dilemma (i.e. to map changes relative to the last update) is the approach used by Germany that
moved from a traditional CLC mapping approach to an automated approach in 2012, using a “status
layer first” approach instead of “change mapping first”. The result of that methodological change
was a delineation of CLC2012 objects based on national high spatial resolution data with a different
geometry than the previous CLC2006 objects. For an illustration, see also the areas marked in Figure
7-2.
In a subsequent step it was then necessary to separate the differences between CLC2006 and the
new CLC2012 into “technical changes” (due to the new mapping approach) and “real changes”. This
step involved a mostly manual revision and the constitution of the change layer 2006-2012. In
subsequent update cycles it may be possible to derive changes from the new versions of CLC+, by
creating a difference layer, then manually separating real changes from differences between
datasets (resulting from differences in CLC Core input data). Additional manual change detection
might be needed for changes that are not automatically detectable due to lack of up-to-date input
information (e.g. changes of land use without land cover change).
39 The process of deriving CLC-Legacy from CLC-Backbone, CLC-Core and finally CLC+ should not be confused with the traditional creation of CLC2018 by the MS.
59 | P a g e
CLC2006 CLC2012
Figure 7-2: Result of changing the CLC implementation method in Germany
For a future CLC2024 (assuming the 6 year update cycle) the change mapping approach could be the
combination of the traditional “change mapping first” approach and the ‘update first’ approach.
Information from 1 ha CLC+2018 and CLC+2024 would be used to derive 1 ha changes with
strong manual control, to separate differences from real changes.
These 1 ha changes would be aggregated to 5 ha change objects (i.e. CLC-Changes2018-2024).
The 5 ha change objects (2018-2014) would then be added to CLC2018 to create a CLC2024.
Methods and approaches for how to obtain 25 ha status layers and 5 ha changes from higher spatial
resolution data exist already in a number of MS.
7.1 Experiences to be considered
The following experiences can be considered for establishing the grounds for creating CLC-Legacy:
CLC accounting layers are used as input for EEAs Land and Ecosystem Accounting (LEAC)
system. The methodology was developed by ETC/ULS to create these layers, as well as for
the raster generalization of CLC accounting layers40.
Methodologies for bottom-up creation of CLC data have been developed by several CLC
national teams. This includes spatial and thematic aggregation of in-situ based high spatial
resolution data (e.g. Norway, Finland, Germany, The Netherlands, United Kingdom, Spain).
In most cases, rules were developed to allow thematic conversion in an automated or semi-
automated fashion. For spatial generalisation different techniques are available, depending
on the starting point in each of the countries.
CLC+ national test case for Hungary41
40 Generalization of adjusted CLC layers. ETC/ULS report 2016. 41 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the European Environment Agency and the European Environment Information and Observation Network. FINAL REPORT: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country
60 | P a g e
CLC Accounting Layers
One of the strongest assets of CLC is its ability to provide consistent information on land cover
changes and trends of the environment over multiple decades. Any new land monitoring
framework therefore must ensure the continuity of the land cover change information, such as
provided by the CLC Accounting Layers.
Specific characteristics and the variety of the CLC creation and update methodologies have
shown, that while CLC change data are well representing land cover changes between two certain
reference dates, there are significant "breaks" in the continuity of CLC time-series both in
statistical and in geometric sense. The solution applied for the harmonization of CLC time-series
is based on the idea to combine CLC status and change information to create a homogenous
quality time series of CLC / CLC-change layers for accounting purposes fulfilling the relation:
CLC change = CLC accounting new status – CLC accounting old status
Additional criteria of the realization were:
• Add more detail to the latest CLC status layer (CLC2012) from previous CLCC
information and use this "adjusted" layer as a reference
• Create previous CLC status layers by "backdating" of the reference, realized as
subtracting CLCC based information for CLC2012
Based on the above principles, the working steps of the creation of adjusted CLC layers (2000-
2006-2012) were as follows:
1 Include formation information from CLC-change layers into current CLC2012 status by
creating adjusted CLC2012 layer
a. Overwrite CLC2012 with first with code_2006 from CLC-change 2000-2006.
Intermediate result: A1_CLC2012
b. Overwrite F1_CLC2012 with code_2012 from CLC-change 2006-2012. Result:
A2_CLC2012
2 Create adjusted CLC2006 by including consumption information (code 2006 from CLC-
change 2006-2012) into A2_CLC2012. Result: A1_CLC2006
3 Create adjusted CLC2000 by including consumption information (code_2000 from CLC-
change 2000-2006) into A1_CLC2006. Result: A1_CLC2000
Due to the fact that the MMU is different for CLC status (25 ha) and CLC change (5 ha) layers,
resulting CLC accounting layers may include at several locations more spatial detail, than original
CLC status layers. Although this fact contradicts the original CLC specifications, still CLC-
accounting layers are considered as the most consistent harmonized time-series of CLC data.
CLC accounting layers are always created on the basis of the latest raster CLC status layer. Fine
details are added to this layer from the formation codes of previous CLC-change inventories. All
the former CLC accounting status layers are calculated backwards by overlaying CLC-change
consumption codes.
61 | P a g e
Figure 7-3 to Figure 7-5 illustrate different techniques and results of national generalisation
approaches to produce standard CLC from more detailed data.
Figure 7-3: Generalisation technique applied in Norway based on expanding and subsequently shrinking. The technique exists for polygon as well as raster data.
Original 3000m² Level III 1ha Level III 5ha Level III 25ha Level III
Figure 7-4: Generalization levels used in LISA generalization in Austria
Figure 7-5: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany
62 | P a g e
CLC2012
Generalized CLC , MMU = 25 ha
High spatial resolution CLC, MMU = 20m cells
Figure 7-6: CLC+ national test case results (Hungary, Budapest-North). Comparison of traditional CLC (top) with bottom-up CLC (middle), created from high-resolution national CLC3 (bottom). The generalized CLC is more fragmented than CLC2012, even if 25ha MMU is kept. The high-resolution CLC illustrates well the potential laying in CLC+.
63 | P a g e
8 ANNEX 1 – CLC NOMENCLATURE
64 | P a g e
9 ANNEX 2: CLC-BACK BONE DATA MODEL
The complete list of feature types within CLC-Backbone contain:
INSPIRE o application schema: land cover vector
Land cover dataset Land cover unit
INSPIRE extension o Application schema: land cover extension (not endorsed)
Parametric observation
EAGLE o Land cover component
Each land cover dataset can contain several land cover units (in our case Level 2 landscape objects).
Each land cover unit contains a valid geometry (polygon border in our case). A specific land cover
unit can consist of one or more land cover components (either in time and/or space), each of which
with a specific lifespan.
It is important to note that the parametric observation as defined in the (not endorsed) INSPIRE land
cover extension application schema is an absolutely necessary part to attach specific observations
(e.g. temporal profile of NDVI) to the objects (land cover units).
INSPIRE main feature types: land cover dataset and land cover unit
Figure 9-1: INSPIRE main feature types: land cover dataset and land cover unit (from land cover extended application schema)
65 | P a g e
EAGLE extension to land cover unit and new feature type land cover component
Figure 9-2: EAGLE extension to land cover unit and new feature type land cover component
Each land cover unit can contain one or more so called parametric observations. According to
INSPIRE these parametric observations are limited to percentage, countable and presence
parameter. In order to enlarge the usage and to integrate the required observation (e.g. mean NDVI,
or vegetation trajectories) of a single date this parameter type is enlarged with the data type
“Indexparameter”.
Therefore, the CLC-Backbone data model includes the single date observations as ParameterType
(with a specific observationDate and a measurement as real number (0…1 or 0….100). A new CLC-
Backbone base model is shown in Figure 9-3.
66 | P a g e
Figure 9-3: The new CLC-Basis model (based on LISA)
The new CLC-Backbone data model includes now three feature types:
LandCoverDataset
LandCoverUnit
LandCoverComponent And two additional datatypes
LandCoverUnit o ParameterType
Name: name of the parameter type (e.g. NDVI_max, NDVI_mean, NDVI_min, …)
ObservationDate: date of the observation (e.g. acquisition date of satellite scene)
IndexParameter: Value of e.g. NDVI_max, etc. to be stored here
LandCoverComponent o Occurrence
Cover percentage of a single land cover component within the LC Unit o Overlaying
1….if the LC component is overlaying another component (e.g. temporarily covered with water, snow)
0…default-value, if the component is not overlaying o validFrom: The time when the activity complex started to exist in the real world. o validTo: The time when the activity complex no longer exists in the real world.
Integration of temporal dimension
67 | P a g e
9.1.1 Integration of temporal dimension into LISA
As LISA was historically set up as single snap-shot within a 3-year interval (repetition cycle of
orthofoto campaign), no temporal phenomena could be modelled. The underlying assumption for
the adaptation of the model is that the land cover unit establishes the more or less stable geometry
over time (>1 year continuity).
Within the land cover unit several land cover components can exist with a specific life time. The land
cover unit can be considered as stable container for these changing land cover components.
Types of changes:
Changes within a land cover unit is in general regarded as status change.
We consider changes within the land cover components as cyclic changes.
Long term ecosystem changes (conditional changes) will be rather recorded in various attributes that
will be defined within the products nr. 5.
The implementation of the temporal dynamics is realized using a 1:many relation between the
LandCoverUnit and the LandCoverComponent in combination with the defined Lifespan of the
specific component.
Example
A typical agricultural field the land cover unit could have the following LC components:
Bare soil January – March
Herbaceous vegetation April-June
Bare soil 2nd half of June
Herbaceaous vegetation July-September
Bare soil 2nd half of September
Herbaceous vegetation October-December A web-based application within the GSE Cadaster Environment project (financed by ESA) is available under: http://demo.geoville.com/shiny/cadenv/p2/wien/ to demonstrate the above-mentioned concept (Figure 9-4).
68 | P a g e
Figure 9-4: illustration of temporal NDVI profile and derived land cover components throughout a vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017)
Important note:
It is not foreseen in the technical specifications that each temporal LC component is stored in the
underlying CLC Base production database. The production chain may just provide the final
classification of the LC unit in the final CLC-Backbone product, without giving information on the
appearance of land cover components throughout.
9.1.2 Integration of multi-temporal observations into LISA
Spectral profiles over time constitute a valuable information source (Figure 9-5), not only for the
automated classification, but as well for the visual interpretation of object characteristic. For each
CLC-Backbone object various time series of NDVI-indices (NDVI_mean, NDVI_max, NDVI_min, etc.)
can be stored in the data model using the parametricObservation.
69 | P a g e
Figure 9-5: Draft sketch to illustrate the principles of the combined temporal information that is stored in the data model
The sketch above illustrates the principles of the combined temporal information that is stored in
the data model
The graph above illustrates the variation within a year within a LandCoverUnit. The geometric borders of the unit are pre-fixed based on the Level 2 landscape object geometries
o red points: single observations from each S-2 scene calculated on object level (Level 2 landscape polygon)
From these observations (stored in the parametricObservation of the LandCoverUnit the complete temporal profile for the specific object can be reconstructed as graph and visually illustrated to domain experts.
o Brown and green bars: lifespan of the single LandCoverComponents (green – herbaceous vegetation and brown – bare soil) within a Land Cover unit
o Dark green bar: classification of the land cover unit according to the information of the sequential appearance of land cover components
Within the LandCoverComponent the attributes validFrom and validTo are used to store the life-time
information of the component. If only the start, but not the end of a LandCoverComponent can be
observed (e.g. due to missing observations in time), only the validFrom can be used. The element
will automatically be set to invalid, if a new LandCoverComponent appears (with 100% coverage of
the landcover unit).
70 | P a g e
10 ANNEX 3: OSM DATA
10.1 OSM road nomenclature The following table identifies all OSM road types that should be considered for CLC-Backbone Level 1
delineation. In many countries, most roads are tagged as “unclassified”. Only highway =
motorway/motorway_link implies anything about quality. (Source:
https://wiki.openstreetmap.org/wiki/Key:highway)
CLC-Backbone
Potential 20m width
Key Value Comment
y wide highway motorway A restricted access major divided highway, normally with 2 or more running lanes plus emergency hard shoulder. Equivalent to the Freeway, Autobahn, etc..
y highway trunk The most important roads in a country's system that aren't motorways. (Need not necessarily be a divided highway.)
y wide highway primary The next most important roads in a country's system. (Often link larger towns.)
y highway secondary The next most important roads in a country's system. (Often link towns.)
y highway tertiary The next most important roads in a country's system. (Often link smaller towns and villages)
y highway unclassified The least most important through roads in a country's system – i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets. (The word 'unclassified' is a historical artefact of the UK road system and does not mean that the classification is unknown; you can use highway=road for that.)
y highway residential Roads which serve as an access to housing, without function of connecting settlements. Often lined with housing.
y highway service For access roads to, or within an industrial estate, camp site, business park, car park etc. Can be used in conjunction with service=* to indicate the type of usage and with access=* to indicate who can use it and in what circumstances.
Link roads
Y wide highway motorway_link The link roads (sliproads/ramps) leading to/from a motorway from/to a motorway or lower class highway. Normally with the same motorway restrictions.
y highway trunk_link The link roads (sliproads/ramps) leading to/from a trunk road from/to a trunk road or lower class highway.
y highway primary_link The link roads (sliproads/ramps) leading to/from a primary road from/to a primary road or lower class highway.
y highway secondary_link The link roads (sliproads/ramps) leading to/from a secondary road from/to a secondary road or lower class highway.
y highway tertiary_link The link roads (sliproads/ramps) leading to/from a tertiary road from/to a tertiary road or lower class highway.
Special road types
y highway track Roads for mostly agricultural or forestry uses. To describe the quality of a track, see tracktype=*. Note: Although tracks are often rough with unpaved surfaces, this tag is not describing the quality of a road but its use. Consequently, if you want to tag a general use road, use one of the general highway values instead of track.
71 | P a g e
10.2 OSM roads: completeness Estimation of OSM completeness according to parametric estimation in study of Barrington-Leigh
and Millard-Ball (2017)42. A sigmoid curve was fitted to the cumulative length of contributions
within OSM data and used to estimate the saturation level for each country.
country ISO-Code frcComplete_fit OSM length [km]
Serbia SRB 74,3% 61.233
Bosnia and Herzegovina BIH 86,5% 37.877
Turkey TUR 89,7% 501.384
Sweden SWE 92,7% 280.107
Norway NOR 93,6% 132.527
Hungary HUN 93,8% 90.670
Romania ROU 94,3% 162.844
Ireland IRL 96,0% 107.263
Spain ESP 96,2% 509.937
Liechtenstein LIE 96,3% 359
Croatia HRV 96,8% 60.119
Bulgaria BGR 96,9% 80.983
Cyprus CYP 97,0% 12.091
Albania ALB 97,6% 22.875
Italy ITA 97,9% 594.384
Belgium BEL 98,3% 103.778
Lithuania LTU 98,5% 69.885
Poland POL 98,5% 358.733
Iceland ISL 98,6% 17.871
France FRA 99,9% 1.164.238
Denmark DNK 100,3% 96.656
Estonia EST 100,3% 42.878
Portugal PRT 100,4% 150.176
United Kingdom GBR 100,4% 443.265
Netherlands NLD 100,5% 136.118
Greece GRC 100,8% 156.586
Slovakia SVK 101,4% 41.566
Austria AUT 101,4% 128.934
Luxembourg LUX 101,5% 6.210
Germany DEU 101,6% 744.304
Latvia LVA 101,7% 61.066
Montenegro MNE 102,2% 12.346
Macedonia (FYROM) MKD 102,4% 15.174
Switzerland CHE 103,0% 73.066
Czech Republic CZE 104,0% 112.441
Finland FIN 105,3% 231.298
Kosovo XKO 107,4% 13.297
Malta MLT 109,8% 2.241
Slovenia SVN NoData 36.041
42 Barrington-Leigh C, Millard-Ball A (2017) The world’s user-generated road map is more than 80% complete. PLOS ONE 12(8): e0180698. https://doi.org/10.1371/journal.pone.0180698
72 | P a g e
11 ANNEX 4: LPIS DATA IN EUROPE
The land parcel identification system is the geographic information system component of IACS
(Integrated agricultural control system). It contains the delineation of the reference areas, ecological
focus areas and agricultural management areas that are subject for receiving CAP-payments. Due to
different national priorities in the implementation of the CAP the LPIS-data are not homogeneous
throughout Europe. Some countries use rather aggregated physical blocks as geometric reference
and other countries use single parcel management units as smallest geographic entity within LPIS.
Three types of spatial objects can in principal be differentiated within LPIS (Figure 11-1):
Physical block
Field parcel
Single management unit
A physical block is defined as the area that is surrounded by permanent non-agricultural objects like roads, rivers, settlement, forest or other features. A physical block normally contains several field management units that may belong to various farmers.
A field parcel is owned by a single farmer and is the dedicated management unit for a specific agricultural main category. It contains either grassland or arable land, but never both categories. It may be split into several single sub-management units.
A single management unit contains a specific crop type (e.g. winter wheat, maize) or a specific grassland management type (e.g. one cut meadow, two cut meadow, 3 and more cut meadow).
The relation between field parcels and single management units is a 1:m relation, where 1 management unit (field) can contain 1 or many sub-management units (single parcels).
Remark to the relation with the real estate database:
Land properties are recorded in many countries in the real estate database. An elementary part of the real estate database is the cartographic representation of real estate parcel boundaries. These parcels should not be mixed up with the field parcels. The parcels in the real estate database constitutes owner ship rights and are documented as official boundaries. The boundaries in agricultural management units however are delineated to derive correct area figures as base for payments under the CAP. In most cases the parcel boundaries between the real estate database and the LPIS will be the same, but not in all cases.
73 | P a g e
(a)
(b)
(c)
Figure 11-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit
From year to year the availability of national LPIS data is increasing. An overview of available and
downloadable datasets has been created based on a previous ETC-ULS report by Synergise within
the Horizon 2020 project LandSense (Figure 11-2). The minimum thematic detail that should be
harmonized from LPIS data is the classification into (1) arable land, (2) permanent grassland, (3)
permanent crops, (4) ecological focus areas and (5) others. As these categories contain quite
heterogeneous classes on national level a harmonization is quite challenging, as they are applied
quite inconsistently across Europe. Nevertheless, the geometric delineation of field parcels at any
level could already provide valuable input given that LPIS is available for the large majority of
countries. Unless a critical number of countries with available LPIS data is reached the inclusion of
LPIS will add more bias in the resulting geometric CLC-backbone compared to a method solely based
on pure image segmentation.
74 | P a g e
Figure 11-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project LandSense