D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical...

75
EEA/IDM/R0/17/003 Technical specifications for implementation of a new land-monitoring concept based on EAGLE D4: Draft design concept and CLC-Backbone and CLC-Core technical specifications, including requirements review Version 4.0 31.01.2018

Transcript of D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical...

Page 1: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

EEA/IDM/R0/17/003

Technical specifications for implementation

of a new land-monitoring concept based on

EAGLE

D4: Draft design concept and CLC-Backbone

and CLC-Core technical specifications,

including requirements review

Version 4.0

31.01.2018

Page 2: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

1 | P a g e

Version history

Version Date Author Status and

description

Distribution

2.0 04/10/2017 S. Kleeschulte, G. Banko,

G. Smith, S. Arnold, J.

Scholz, B. Kosztra, G.

Maucha

Reviewers:

G-H Strand, G. Hazeu,

M. Bock, M. Caetano, L.

Hallin-Pihlatie

Draft for review by NRC

LC

EEA, NRC LC

3.0 10/11/2017 S. Kleeschulte, G. Banko,

G. Smith, S. Arnold, J.

Scholz, B. Kosztra, G.

Maucha

Reviewers:

G-H Strand, G. Hazeu,

M. Bock, M. Caetano, L.

Hallin-Pihlatie

Updated draft

integrating the

comments following the

NRC LC workshop

For distribution at Copernicus User

Day

4.0 31/01/2018 S. Kleeschulte, G. Banko,

G. Smith, S. Arnold, J.

Scholz, B. Kosztra, G.

Maucha

Reviewers:

G-H Strand, G. Hazeu,

M. Bock, M. Caetano, L.

Hallin-Pihlatie

Updated draft

integrating the

comments following the

Brussels COPERNICUS

Land monitoring

workshop

For EEA distribution / consultations

Page 3: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

2 | P a g e

Table of Contents

1 Context ............................................................................................................................................ 7

2 Introduction .................................................................................................................................... 9

2.1 Background ............................................................................................................................. 9

2.2 Concept ................................................................................................................................. 10

2.3 Role of industry ..................................................................................................................... 14

2.4 Role of Member States ......................................................................................................... 14

2.5 Engagement with stakeholder community ........................................................................... 15

2.6 Structure of this document ................................................................................................... 15

3 Requirements analysis .................................................................................................................. 16

4 CLC-Backbone – the industry call for tender ................................................................................ 21

4.1 Introduction .......................................................................................................................... 21

4.2 Description of CLC-backbone ................................................................................................ 22

4.2.1 Level 1 Objects – hard bones ........................................................................................ 23

4.2.2 Level 2 Objects – soft bones ......................................................................................... 23

4.3 Technical specifications ........................................................................................................ 24

4.3.1 Spatial scale / Minimum Mapping Unit ........................................................................ 24

4.3.2 Delineation accuracy of geometric units ...................................................................... 25

4.3.3 Reference year .............................................................................................................. 25

4.3.4 EO data .......................................................................................................................... 26

4.4 Thematic attribution ............................................................................................................. 27

4.4.1 Thematic classes ........................................................................................................... 28

4.5 European ancillary data sets for Level 1 hard bone creation ............................................... 29

4.5.1 Criteria for ancillary data input ..................................................................................... 31

4.5.2 Ancillary input data ....................................................................................................... 32

4.5.3 Remark: land parcel identification system (LPIS) .......................................................... 36

4.6 Short-Form of CLC-Backbone end product technical specifications ..................................... 37

5 CLC-Core – the grid approach ....................................................................................................... 40

5.1 Background ........................................................................................................................... 40

5.2 Populating the database ....................................................................................................... 41

5.3 Data modelling in CLC-Core .................................................................................................. 42

5.4 Database implementation approach for CLC-Core ............................................................... 43

5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores .................. 43

5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach ....................................... 48

Page 4: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

3 | P a g e

5.4.3 Processing and Publishing of CLC-Core Products .......................................................... 49

5.4.4 Conclusion and critical remarks .................................................................................... 49

5.5 Short-form of CLC-Core end product technical specifications ................................................ 50

6 CLC+ - the long-term vision ........................................................................................................... 53

7 CLC-Legacy .................................................................................................................................... 57

7.1 Experiences to be considered ................................................................................................ 59

8 Annex 1 – CLC nomenclature ........................................................................................................ 63

9 Annex 2: CLC-Back bone data model ............................................................................................ 64

9.1.1 Integration of temporal dimension into LISA................................................................ 67

9.1.2 Integration of multi-temporal observations into LISA .................................................. 68

10 Annex 3: OSM data ................................................................................................................... 70

10.1 OSM road nomenclature....................................................................................................... 70

10.2 OSM roads: completeness ................................................................................................... 71

11 Annex 4: LPIS data in Europe .................................................................................................... 72

Page 5: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

4 | P a g e

List of Figures

Figure 1-1: The relationship of the 2nd generation CLC to the existing CLMS components. .................. 7

Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved

European land monitoring (2nd generation CLC). .................................................................................. 11

Figure 2-2: Conceptual design for the products and stages required to deliver improved European

land monitoring (2nd generation CLC). .................................................................................................. 12

Figure 2-3: A scale versus format schematic for the current and proposed CLMS products. .............. 13

Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26,3%

of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC) ....................................................... 21

Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori

information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3)

the pixel-based classification of EAGLE land cover components and (4) attribution of Level 2 objects

based on this classification and additional Sentinel-1 information. ..................................................... 22

Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo),

Romania (near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2

cloud free services as Background (left to right). ................................................................................. 33

Figure 4-4: Downloadable datasets for INSPIRE transport network (road) services using the official

INSPIRE interface (Status: Jan. 2018). ................................................................................................... 34

Figure 4-5: Distribution of number of downloadable datasets for INSPIRE transport service across

member countries (Status: Jan. 2018) .................................................................................................. 35

Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between

encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit:

10 daa = 1 ha. ........................................................................................................................................ 41

Figure 5-2: Representation of real world data in the CLC-Core. ........................................................... 42

Figure 5-3: Example of an RDF triple (subject - predicate - object). ..................................................... 46

Figure 5-4: GeoSPARQL query for Airports near the City of London. ................................................... 47

Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format. .......................... 47

Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other.

The arrows indicate the flow of information from a query directed to the French CLC SPARQL

endpoint. ............................................................................................................................................... 48

Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.

.............................................................................................................................................................. 49

Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain

area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely

vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are

correct. .................................................................................................................................................. 58

Figure 7-2: Result of changing the CLC implementation method in Germany ..................................... 59

Figure 7-3: Generalisation technique applied in Norway based on expanding and subsequently

shrinking. The technique exists for polygon as well as raster data. ..................................................... 61

Figure 7-4: Generalization levels used in LISA generalization in Austria .............................................. 61

Figure 7-5: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany ............................................... 61

Figure 7-6: CLC+ national test case results (Hungary, Budapest-North). Comparison of traditional CLC

(top) with bottom-up CLC (middle), created from high-resolution national CLC3 (bottom). The

generalized CLC is more fragmented than CLC2012, even if 25ha MMU is kept. The high-resolution

CLC illustrates well the potential laying in CLC+. .................................................................................. 62

Page 6: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

5 | P a g e

Figure 9-1: INSPIRE main feature types: land cover dataset and land cover unit (from land cover

extended application schema) .............................................................................................................. 64

Figure 9-2: EAGLE extension to land cover unit and new feature type land cover component ........... 65

Figure 9-3: The new CLC-Basis model (based on LISA) ......................................................................... 66

Figure 9-4: illustration of temporal NDVI profile and derived land cover components throughout a

vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017) .................. 68

Figure 9-5: Draft sketch to illustrate the principles of the combined temporal information that is

stored in the data model ...................................................................................................................... 69

Figure 11-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit73

Figure 11-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project

LandSense ............................................................................................................................................. 74

Page 7: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

6 | P a g e

List of Tables

Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd

generation CLC based on initial discussions with EEA. ......................................................................... 12

Table 2-2: Summary (matrix) of potential roles associated with each element / product. ................. 15

Table 4-2: Main characteristic of Level 1 objects (hard bone) and level 2 objects (soft bones) .......... 24

Table 4-1: Overview of existing relevant products which were analysed as potential input to support

the construction of the geometric structure (Level 1 “hard bones”) of the CLC-Backbone. ................ 30

Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data. .............. 46

Table 6-1: List of current CLC classes and requirements for external information .............................. 56

Page 8: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

7 | P a g e

1 CONTEXT

The European Environment Agency (EEA) and European Commission DG Internal Market, Industry,

Entrepreneurship and SMEs (DG GROW) have determined to develop and design a conceptual

strategy and associated technical specifications for a new series of products within the Copernicus

Land Monitoring Service (CLMS) portfolio (Figure 1-1), which should meet the current and future

requirements for European Land Use Land Cover (LULC) monitoring and reporting obligations. This

collection of products is nominally called the "2nd generation CORINE Land Cover (CLC)".

Figure 1-1: The relationship of the 2nd generation CLC to the existing CLMS components.

After a call for tender in 2017, the EEA has tasked the European Environment Information and

Observation Network (EIONET) Action Group on Land monitoring in Europe (EAGLE Group) with

developing an initial response to fulfilling the needs listed in the following chapters through a

conceptual design and technical specifications. The approach adopted for the development of the

2nd generation CLC represents a sequence of stages, where separate single elements or products of

the whole concept could be developed relatively independently and at different rates, to allow time

for broad consultation with stakeholders. This also allows the continuous inclusion of Member State

(MS) input, the exploitation of industrial production capacity, and the necessary feedback, lessons

learnt and refinement to reach the ultimate goal of a coherent and harmonized European Land

Monitoring Framework.

The first stage of this development process was to engage with the EEA and DG-GROW to refine

their requirements and build on the ideas presented by the EAGLE Group in the proposal. These first

developments were undertaken within a set of constraints outlined by EEA and DG GROW for the

initial product to be delivered within the 2nd generation CLC:

Industrial production by service providers,

Outcome product in vector format,

Highly automated production process,

Short timeframe of production phase,

Driven by Earth Observation (Sentinel-2),

Complete the picture started by the Local Component1 products (EEA-39).

1 The term “component” has a twofold function, 1) as the Copernicus local component products, 2) as a term for Land Cover Component as an element within the EAGLE data model. In this case the former.

Page 9: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

8 | P a g e

These developments resulted in the first distributed deliverable (D2) which outlined the conceptual

strategy and proposed a draft technical specification for the first product (CLC-Backbone, see later)

to be developed. This stage also involved a presentation of the concepts in D2 to the EIONET

National Reference Centres (NRCs) Land Cover in Copenhagen in October 2017.

The NRCs feedback collected during the NRC meeting and afterwards was carefully taken into

consideration for the continuation of the development process into the next stage. The conceptual

strategy and technical specification were both refined and extended for the production of the next

distributed deliverable (D3). The concepts and specification for the 2nd generation of CLC contained

in D3 were presented to the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+

in Brussels in November 20172. This meeting generated lively debate and a set of survey results from

the participants.

This document is the penultimate deliverable (D4) of the project representing the outcomes of the

latest stage of development of the 2nd generation of CLC. This document contains a further

expansion on the conceptual strategy and extended technical specifications for the proposed

products. The aim of the document is to continue to communicate these developments to the

stakeholders involved in the field of European land monitoring and elicit their feedback. This

document will be presented by the EEA to a number of high level meetings and stakeholders.

A final revised version of this document (D5) will be produced and made available in early March

2018.

2 “Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+, 16/11/2017” Report. Framework Contract 385/PP/2014/FC Lot 2 (Copernicus User Uptake Framework Contract). Prepared for: European Commission - DG GROW

Page 10: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

9 | P a g e

2 INTRODUCTION

This chapter provides the background to the concept, an outline of the proposed approach to be

adopted, an overview of the elements of the concept and the current status of the developments.

2.1 Background Monitoring of Land Use and Land Cover (LULC) is among the most fundamental environmental

survey efforts required to support policy development and effective environmental management3.

Information on LULC plays a key role in a large number of European environmental directives and

regulations. Many current environmental issues are directly related to the land surface, such as

habitats, biodiversity, phenology and distribution of plant species, ecosystem services, as well as

other issues relevant to population growth and climate change. Human activities and behaviour in

space (living, working, production, education, supplying, recreation, mobility & communication,

socializing) have significant impacts on the environment through settlements, transportation and

industrial infrastructure, agriculture, forestry, exploitation of natural resources and tourism. The

land surface, who´s negative change of state can only – if at all – be reversed with huge efforts, is

therefore a crucial ecological factor, an essential economic resource, and a key societal determinant

for all spatially relevant basic functions of human existence and, not least, nations’ sense of identity.

Land thus plays a central role in all three factors of sustainable development: ecology, economy, and

society.

LULC products so far tend to be produced independently of each other at the global, European,

national and sub-national levels, as well as by different disciplines and sectors, each of them

focussing on a fairly similar end product but different emphases on thematic content and spatial

detail. This independency and uncoordinated production is explained and justified by the large

variation in objectives and requirements of these products, but it leads to reduced interoperability

and sometimes also duplication of work, and thereby, inefficient use of resources.

At the European level CORINE Land Cover (CLC) is the flagship programme for long-term land

monitoring. This is now part of the Copernicus Land Monitoring Service (CLMS). CLC has been

produced for reference years of 1990, 2000, 2006 and 2012, with 2018 under preparation and

expected to be available by late 2018. CLC is produced by MS with technical coordination of EEA and

guided by well-established specifications and methodological guidelines. The CLC specification aims

to provide consistent localized geographical information on LULC using 44 classes at level-3 in the

hierarchical nomenclature (see Annex 1). The vector databases have a minimum mapping unit

(MMU) of 25 ha and minimum mapping width (MMW) of 100 m with a single thematic class

attribute per land parcel. At the European level, the database is also made available on a 100 x 100

m and 250 x 250 m raster products, which have been aggregated from the original vector data at

1:100 000 scale. CLC also includes a directly mapped (i.e. not creating by intersecting CLC status

layers) change layer, which records changes between two of the 44 thematic classes between two

dates with a MMU of 5 ha.

Although CLC has become well established and has been successfully used, mainly at the pan-

European level, there are a number of deficiencies and limitations that restrict its wider exploitation,

particularly at the MS level and below. This is partly due to the fact that the MMU of CLC (25 ha) is

too coarse to capture the fine details of the landscape at the local and regional scales, but also to the

fact that some MS have access to more detailed, precise and timely information from national

programs. In consequence of the first point, features of smaller size that represent landscapes

diversity and complexity are not mapped either because of geometric generalization or because they

3 Harmonised European Land Monitoring - Findings and Recommendations of the HELM Project.

Page 11: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

10 | P a g e

are absorbed by classes with broad definitions. Moreover, landscape dynamics that are highly

relevant to locally decided but globally effective policy, such as small-scale forest rotation, changes

in agricultural practices and urban in-filling, may be missed due to low spatial resolution and / or

thematic depth of CLC class definitions. The successful use of CLC in combination with higher spatial

resolution products to detect and document urban in-filling has given a clear indication of the

required direction of development for European land monitoring.

To address some of the above issues, the CLMS has expanded its portfolio of products beyond CLC to

include the High Resolution Layers (HRLs) and local component (LoCo) products. The HRLs provide

pan-European information on selected surface characteristics in a 20 m raster format, also available

aggregated to 100 m raster cells. They provide information on basic surface properties such as

imperviousness, tree cover density and grassland and can be described as intermediate products.

The LoCo products in vector format are based on very high spatial resolution EO data and ancillary

information tailored to a specific landscape monitoring purpose (e.g. Urban Atlas), which provide

more detailed thematic LCLU information on polygon level with a MMU in a range of 0.25 to 1 ha.

However, the LoCo products altogether do not provide wall to wall coverage of the EEA-39

countries, even when combined. Their nomenclatures and geometries are not harmonised and have

thus caused interoperability problems which are now being addressed.

2.2 Concept Given the above issues and the known reporting obligation list in the next chapter, a revised concept

for European Land Monitoring is required which both provides improved spatial and thematic

performance and builds on the existing heritage. Also, recent evolutions in the field of land

monitoring (i.e. improved Earth Observation (EO) input data due to e.g. Sentinel programme,

national bottom-up approaches, processing methods, etc.) and desktop and cloud-computing

capacities offer opportunities to deliver these improvements more effectively and efficiently. The

EEA in conjunction with DG GROW has identified this need and now aims to harmonise and integrate

some of the CLMS activities by investigating the concept and technical specifications for a higher

performance pan-European mapping product under the banner of "2nd generation CLC" (Figure 1-1).

EEA and DG GROW determined that such a 2nd generation CLC should build upon recent conceptual

development and expertise, while still guaranteeing backwards compatibility with the conventional

CLC datasets. Also, the proposed approach should be suitable to answer and assist the needs of

recent evolution in European policies like reporting obligations on land use, land use change and

forestry (LULUCF), plans for an upcoming Energy Union or long-term climate mitigation objectives.

After a successful establishment of 2nd generation CLC there would then also be knock-on benefits

for a broad range of other European policy requirements, land monitoring activities and reporting

obligations. Given these requirements, no single product or specification would address all needs

and the complexity of the solution would require a stage development and implementation.

The proposed conceptual strategy therefore consists of a number of interlinked elements (Figure

2-1) which represent separate products and therefore can be delivered independently in discrete

production phases. Each of the products has its own technical specification and production

methodology and can be produced through its own funding / resourcing mechanisms.

The structure of the conceptual design proposed by the EAGLE Group for the 2nd generation CLC is

based on the elements shown in Figure 2-1. The reports associated with this work will expand

incrementally giving more details on the conceptual design and provide technical specifications of

increasing elaboration for the for the individual products.

Page 12: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

11 | P a g e

Figure 2-1: Conceptual design showing the interlinked elements required to deliver improved

European land monitoring (2nd generation CLC).

Although the four new elements / products of the conceptual strategy will be described in more

detail later in this and the final subsequent report (D5), they can be summarised as follows to aid

understanding of the conceptual design:

1. CLC-Backbone is a spatially detailed, large scale inventory in vector format providing a

geometric spatial structure attributed with raster data for landscape features with limited,

but robust, EO-based land cover thematic detail on which to build other products.

2. CLC-Core is a consistent multi-use grid4 database repository for environmental land

monitoring information populated with a broad range of land cover, land use and ancillary

data form the CLMS and other sources, forming the information engine to deliver and

support tailored thematic information requirements.

3. CLC+ is the “nominal” end point or final product in the establishment of CLC 2nd generation.

It is a derived vector and raster product from the CLC-Core and CLC-Backbone and will be a

LULC monitoring product with improved spatial and thematic performance, relative to the

current CLC, for reporting and assessment.

4. The final element of the conceptual design, although not strictly a new product, is the ability

to continue producing the existing CLC, which may be referred to as CLC-Legacy in the

future, which already has a well-established and agreed specification.

Table 2-1 provides the current overview of the main characteristics of the four elements to allow the

reader to make comparisons between the key characteristics of the products.

To manage the project efficiently and effectively, when developing and specifying these new

products it has been necessary to stage the work. Throughout the documents delivered by this

project the staging of the work given in Figure 2-2 will be used as a key graphic to identify the

product and stage being described.

4 A data structure whose grid cells are linked to a data model that can be populated with the information from the different sources.

Page 13: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

12 | P a g e

Figure 2-2: Conceptual design for the products and stages required to deliver improved European

land monitoring (2nd generation CLC).

Table 2-1: Overview of key characteristics proposed for the four elements / products of the 2nd generation CLC based on initial discussions with EEA.

CLC-Backbone CLC-Core CLC+ CLC-Legacy

Description Detailed wall to wall (EEA-39) geometric vector reference layer with basic thematic content and a raster attribution.

All-in-one data container for environmental land monitoring information according to EAGLE data model.

Thematically and geometrically detailed LULC product.

A more generalised LULC product consistent with the CLC specification.

Role / purpose Support to CLMS products and services at the pan-European and local levels.

Thematic characterisation of CLMS products and services at the pan-European and local levels.

Support to EU and national reporting and policy requirements.

Maintain the time series (backwards compatibility) and support legacy systems.

Format Raster and vector. GRID database. Raster and vector. Raster and vector.

Thematic detail <10 basic land cover classes, few spectral-temporal attributes.

Rich attribution of LU, LC and ancillary information.

High thematic detail including LC and LU with improvements compared to CLC.

CLC-nomenclature with 44 classes

+ changes between the 44.

Geometric detail (MMU, grid size)

1.0 ha. 1.0 ha, 100 x 100 m grid.

1.0 ha for status and changes.

25 ha for status, 5 ha for changes (raster: 100 x 100 m).

Update cycle 3 - 6 years. Dynamic update as new information becomes available.

3 - 6 years. Standard 6 years.

Reference year 2018 2018 2018 2018 / 2024

Production year 2018 2019 TBD TBD

Method Geospatial data integration and image segmentation for geometric boundaries and attribution / labelling by pixel-based EO derived land cover

EAGLE-Grid approach: population and attribution of regular GRID

Derived by SQL from CLC-Core

Derived from CLC+ and CLC-Core

Input data Geospatial data, EO images and ancillary data selected from LoCo (with visual control) and HRL products.

CLC-Backbone, all available HRLs, LoCo, ancillary data, national data provisions (e.g. LU), photointerpretation.

CLC-Core and CLC-Backbone

CLC+ and / or CLC-Core

Page 14: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

13 | P a g e

Figure 2-3 is a schematic description of the conceptual design showing the relationship of the new

elements / products to existing CLMS products in terms of their format and level of spatial detail. In

this representation:

1. The current or conventional CLC (CLC2000, CLC2006, CLC2012, CLC2018 etc.), a polygon map

with fewer geometric details.

2. The LoCos (Urban Atlas, Riparian Zones, N2000 etc.), polygon maps with more geometric

and thematic details.

3. The HRLs (Imperviousness, Forest, Grassland, Wetland etc.), raster products for specific

surface characteristics with high spatial details.

4. The proposed CLC-Core, a grid product where the level of detail has still to be decided.

5. The CLC-Backbone and CLC+, both polygon maps with different levels of thematic detail.

Figure 2-3: A scale versus format schematic for the current and proposed CLMS products.

In any monitoring system the requirement for updates is central to the implementation of the

activity. In this project we are focusing on the technical specifications, therefore the update cycle

will only be addressed in terms of the suggested time between repeat productions. Details of the

update process are more associated with the methodologies which still need to be finalised by the

service providers, national teams or the EEA in a later phase of development. However, particularly

in relation to the CLC-Legacy, reference will be made to the issues associated with the update cycle

and change mapping where appropriate.

Given the context, requirements and issues, the aim should be to find a means to deliver the

concept and its proposed products with an efficient mixture of industrially produced material backed

up by ancillary (e.g. land use) information from various national programmes if they are in existence.

It is important to propose a viable system in which there is flexibility to adapt the later steps to

issues of feasibility and practicality once the implementation of the earlier steps is underway. For

instance, the outcomes of the CLC-Backbone production should be able to influence thematic

content of the CLC-Core and / or the technical specifications of the CLC+.

Page 15: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

14 | P a g e

2.3 Role of industry The development and production of CLC to this point has had only a limited role for industry, mainly

focused on the productions of pre-processed image datasets (e.g. IMAGE2006, IMAGE2012, ….), the

validation of the CLC2012 products and in some MS the actual production. Conversely, the

production of the non-CLC products within the CLMS has been dominated by industry through a

series of service contracts to generate consistent products across Europe.

Industry has the ability to produce operational solutions which exploit automation, can handle large

data volumes and are scalable to European-wide coverage. It is also possible that it is easier to

standardize an industrial (top-down) production process than a (bottom-up) process based on MS

participation, and that industrialized production therefore will lead to more harmonized products

across Europe. As the amount of available EO and geographic information data increases, highly

efficient and effective mechanisms for production will be required for at least parts of the European

land monitoring process and economies of scale must be exploited. Industry therefore offers a

number of capabilities which will be required at selected points within the production of the 2nd

generation CLC.

DG GROW has expressed the wish that industry should have an initial role in the production of CLC-

Backbone which has therefore been designed in part to exploit industrial capabilities. Further

opportunities for industrial involvement are shown in Table 2-2.

2.4 Role of Member States The MS have always been intimately linked with the CLC as its production has been the responsibility

of the nominated authorities through EIONET, i.e. NRC Land Cover. The MS have provided the

production teams for the actual mapping, allowing the process to exploit local knowledge and

familiarity with native landscape types, and providing access to national datasets, which may not

otherwise have been possible. The bottom-up production approach has been the key to successful

delivery of this important time series of LULC data.

With respect to the non-CLC products within the CLMS, the MS involvement has so far been limited

to verification and, in the case of the 2012 products, enhancement. Although the MS have provided

valuable feedback and enhancement in some cases, particularly on the HRLs, their capabilities have

not yet been fully exploited within the CLMS. The EEA is looking into possibilities to further increase

the role of MS in non-CLC products, for instance by introducing a new task on enrichment of these

products. It is therefore of the utmost importance to ensure a balanced contribution to the CLC+

suite of products between industry and MS via Eionet NRC/LCs.

Table 2-2 shows the potential for a greater role for the MS across all of the new elements / products

within the 2nd generation CLC. It is important that the MS experts have oversight of the specifications

and an understanding of the opportunities to provide feedback on the monitoring process and

products. This includes the opportunities to contribute with data, experience and local knowledge to

the production and validation activities where appropriate.

The MS are currently involved in verification, but not in the validation of CLMS products. Verification

is the evaluation of whether or not the product corresponds to the reality of the landscape at stake

by confronting the product to the regional / local expertise of the terrain, and hence meets the

expectation of the stakeholders. Validation is the independent geostatistical control on the accuracy

with reference to the product specifications. The intended use of CLMS products and services at the

national level calls for verification against national user requirements – not as an acceptance test but

as a documentation of the usefulness, a promotional tool, and possibly also as assistance to improve

the products for this intended use.

Page 16: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

15 | P a g e

Table 2-2: Summary (matrix) of potential roles associated with each element / product.

CLC-Backbone CLC-Core CLC+ CLC-Legacy

EEA Definition, coordination, main user.

Definition, coordination, main user.

Definition, coordination, main user.

Definition, coordination, main user.

MS Review of specification, provision of geometric data, verification, validation, user.

Review of

specification,

population with

national datasets,

land use

information, user.

Review of

specification,

support to

production,

verification,

validation, user.

Production,

verification,

validation, user.

Industry Production. Implementation and maintenance of DB infrastructure.

Support

production,

validation.

Validation.

2.5 Engagement with stakeholder community The success of the project and the long-term development of land monitoring in Europe is

intrinsically linked to the involvement of the stakeholder community. It is vital that all the relevant

stakeholders are aware of this activity and contribute their requirements and opinions to this

process. Revised versions of this document are foreseen at specific milestones towards a final

version for deliver in early 2018.

2.6 Structure of this document This specific deliverable, D4, is the third step towards the definition of the conceptual strategy and

the potential technical specifications for a series of new CLMS products. It aims at communicating

these details to the stakeholders involved in European land monitoring to elicit feedback, comments

and questions. The work reported here begins with a review of requirements which goes beyond the

remit of the call for tender. The four elements of the 2nd generation CLC within the CLMS and their

potential products are described in further detail in the following chapters. This version includes the

first feedback from the NRC LC meeting in Copenhagen in October 2017, the Copernicus Land

Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in November 2017 and dedicated

written feedback from various MS.

For CLC-Backbone, this document updates the draft versions of the technical specifications, but with

less emphasis on the outline of the implementation methodology that was provided in D3. These

details are still open for discussion and review by stakeholders and will ultimately be used as input

for an open call for tenders to industrial service providers for a production to start in 2018. For CLC-

Core, this document further develops the technical specification and proposes a number of options

for the implementation. The technical design of CLC-Core will continue to be developed based on

stakeholder feedback in future steps. The current thinking around CLC+ has been developed and is

approaching a technical specification.

Page 17: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

16 | P a g e

3 REQUIREMENTS ANALYSIS

In line with the principle of Copernicus, this analysis in support of the development of new products

with CLMS, is driven by user needs rather than the current technical capabilities of EO sensors and

processing systems. The technical issues will be dealt with in the chapters related to the individual

products focusing on their specification and, to a lesser extent, methodological development. This

analysis was initially based around the requirements set out in the original call for tender, but has

been extended through inclusion of recent work by the EC and ETC to provide a broader range of

stakeholder needs. As this project has developed, further requirements have been identified

through meetings and stakeholder feedback to support the increasingly detailed specifications of a

complete 2nd generation CLC.

The MMU and nomenclature of the current CLC have been operational for three decades. Backward

compatibility is therefore an essential issue when a new monitoring framework is considered. Still,

the specification for a future, improved and harmonised European land monitoring product is now

open for revision. The proposed products should both support current European policies through

continuity of current monitoring activities, but also support new policies. Examples are the reporting

obligations (on the European level) on land use, land use change and forestry (LULUCF), plans for an

upcoming Energy Union, Greening of the CAP, Urban Planning, Biodiversity and long-term climate

mitigation objectives.

A confounding element is that many of the policies that could exploit EO-based land monitoring

information rarely give clear quantitative requirements that can provide a basis for spatial, temporal

or thematic specifications. Also, despite the fact that land monitoring information is crucial in many

domains, so far, there has been no Land Framework Directive in Europe. The higher-level strategic

policies, such as the EU Energy Union often refer to the monitoring linked to other activities such as

REDD+ and LULUCF. Where actual quantitative analysis and assessment takes place, they tend to

rely on currently available products. For instance, LULUCF assessments in Europe (independently

from national reporting obligations) use CLC (with a 25 ha MMU), while monitoring at the global

scale is using Global Land Cover 2000 (GLC 2000) with a 1 km spatial resolution (or 100 ha MMU).

Reference was made to future monitoring in Decision No 529/2013/EU of the European Parliament

and of the Council on accounting rules on greenhouse gas emissions and removals resulting from

activities relating to LULUCF, but this only refers to MS activities with MMUs between 0.05 ha and 1

ha. Many MS also employ sample based field surveys (e.g. national forest inventories) to collect the

required LULUCF data. Some of the initiatives associated with LULUCF have explored the potential of

improved monitoring performance, such as the use of SPOT4 imagery with a 20 m spatial resolution

to show land-use change between 2000 and 2010 for REDD+ reporting, but until very recently it has

not been practical for these types of monitoring to become operational globally.

A more fruitful avenue for user requirements in support of the 2nd generation CLC is to consider the

initiatives which are now addressing habitats, biodiversity and ecosystem services. The EU

Biodiversity Strategy to 2020 was adopted by the European Commission to halt the loss of

biodiversity and improve the state of Europe’s species, habitats, ecosystems and the services. It

defines six major targets with the second target focusing on maintaining and enhancing ecosystem

services, and restoring degraded ecosystems across the EU, in line with the global goal set in 2010.

Within this target, Action 5 is directed at improving knowledge of ecosystems and their services in

the EU. Member States, with the assistance of the Commission, will map and assess the state of

ecosystems and their services in their national territory, assess the economic value of such services,

and promote the integration of these values into accounting and reporting systems.

Page 18: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

17 | P a g e

The Mapping and Assessment of Ecosystems and their Services (MAES) initiative is supporting the

implementation of Action 5 and, although much of the early work in this area on European level has

used inputs such as CLC, it is obvious that to fully address this action an improved approach is

required especially for a better characterisation of land, freshwater and coastal habitats, particularly

in watershed and landscape approaches. The spatial resolution at which ecosystems and services

should be mapped and assessed will vary depending on the context and the purpose for which the

mapping/assessment is carried out. However, information from a more detailed thematic

characterisation and classification and at higher spatial resolution are required which are compatible

with the European-wide classification and could be aggregated in a consistent manner if needed.

A first version of a European ecosystem map covering spatially explicit ecosystem types for land and

freshwater has been produced at 1 ha spatial resolution using CLC (100 m raster), the predecessor of

the HRL imperviousness with 20 m resolution, JRC forest layer with 25 m spatial resolution plus a

range of other ancillary datasets (e.g. ECRINS water bodies) using a wide variety of spatial

resolutions (from detailed Open Street Map data to 10 x 10 km grid data used in Article 17 reporting

of the Habitats Directive). It is clear that ecosystem mapping and assessment could more fully

exploit the recently available Copernicus Sentinel data and land products, and move down to a

higher spatial resolution.

The primary goal of the 2nd generation CLC is to underpin European policies and, where possible,

fulfil their needs. Furthermore, the EEA also wants to make the new products useful at national,

regional and even local levels. To this end the requirements gathering exercise was extended to a

broader stakeholder community at the national level and below, including representatives of

commercial service providers, NGOs and academia. This work drew on existing surveys (e.g. MS CLC

survey conducted by ETC-ULS5) and the feedback from the two events where the 2nd generation CLC

was promoted.

The recent EC-funded NextSpace project collected user requirements for the next generation of

Sentinel satellites to be launched around 2030. These requirements were analysed on a domain

basis. From the land domain, those requirements which were given a context of land cover

(including vegetation), glaciers, lakes, above-ground biomass, leaf area index and snow were

considered. A broad range of spatial resolutions were requested from sub 2.5 m to 10 km. Two

thirds of the collated requirements wanted data in the 10 – 30 m region. The requested MMUs were

also broadly distributed, but with a preference for 0.5 to 5 ha, which represents field / city block

level for much of Europe. The temporal resolution / update frequency was dominated by yearly

revisions, although some users could accept 5 yearly updates. Some users also wished for monthly

updates, although the reason was unclear. The users were less specific about the thematic detail

required and the few references were given to CLC, EUNIS, LCCS and “basic land cover”. The

acceptable accuracy of the products was in the range 85 to 90%.

The European Topic Centre on Urban, Land and Soil systems (ETC/ULS) undertook a survey of

EIONET members involved with the CLMS and CLC production considering a range of topics in

advance of the 2018 activities. Some of the questions referred to the shortcomings of CLC and the

potential improvements that could be made towards a 2nd generation CLC product. As expected, it

was noted that some CLC classes cause problems because of their mixed nature and instances on the

ground that are sometimes complex and difficult to disentangle. It was suggested that this situation

could be improved by the use of a smaller MMU so that the mapped features have more

5 Service Contract No 3436/R0-Copernicus/EEA.56586, Task 3: Planning of cooperation with EEA member and cooperating countries. Final report in preparation of the CLC2018 exercise

Page 19: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

18 | P a g e

homogeneous characteristics. Also, a MMW reduction, particularly for highways, would allow linear

features to be represented more realistically. Of the 32 countries, who responded to the

questionnaire, 25 of them would support a finer spatial resolution, showing that there is, in general,

a national demand for high spatial resolution LULC data. The proposals ranged from 0.05 ha to

remaining at the current 25 ha, but the majority favoured 0.56 to 5 ha. Thematic refinement is also

supported by around one third of the respondents, who requested improved thematic detail,

separation of land cover from land use, splitting formerly mixed CLC classes and the addition of

further attributes to the spatial polygons.

During the Copernicus Land Monitoring Service: Workshop on CORINE Land Cover+ in Brussels in

November 2017 a number of MS representatives were asked to provide their requirements for land

monitoring and their thoughts on the 2nd generation CLC proposals as presentations to the whole

event. Their required spatial detail ranged from 0.5 ha MMU down 5 m pixel resolutions, which are

at the most detailed end of what had been suggested to date and potentially the smaller spatial

resolutions are out of scope for the 2nd generation CLC. For the required thematic detail, it was

suggested to go beyond the current CLC 44 class nomenclature and aim for either a full level-4

classification or split the more vague or ambiguous classes. There was a unanimous request for

yearly updates and a general feeling that the quality of the data should be improved relative to the

local component products.

As the final session in the workshop, a survey of the attendees (via an online voting tool) was

undertaken against a set of questions designed to understand the audience’s feeling towards the 2nd

generation CLC and assess their preferences for some of the key technical specifications. There was

strong support for 2nd generation CLC approach with 88 % of respondents considering the products

to have relevance at European and national levels with 35 % also thinking they would be useful at

sub-national levels. Similar levels of support were found for a spatially detailed land cover product

with limited thematic content, i.e. CLC-Backbone. On the technical specification of CLC-Backbone, 80

% felt a MMU of 1.0 ha or smaller would be appropriate, but this was almost evenly split across 0.25

ha, 0.5 ha and 1 ha. The CLC-Backbone MMW was evenly split between 10 m and 20 m and a lively

debate was produced around the impact of the input EO data. The dominant level of the preferred

thematic detail was around 10 classes with 51 %, although 27 % voted for an even less detailed

nomenclature of around 5 classes. In terms of the update cycle for CLC-Backbone, the majority of

the audience thought 3 years as sufficient, however almost a quarter wanted yearly updates. For the

CLC-Core the preferred grid size was less clear, ranging from 10 m to 100 m, with the extremes

dominating and representing almost equal numbers of votes. Further issues regarding the data

sources and contents of the 2nd generation CLC products were also surveyed, but these will be dealt

with in the relevant chapters.

During the project, the EEA and the EAGLE Group have received a large amount of written feedback

from MS and other stakeholders which has been addressed where appropriate in the relevant

product chapters below. Some of the feedback, although important, is out of scope for this project

and is being dealt with separately by the EEA and DG-GROW. The key themes of the feedback have

been:

Clarification of the overall 2nd generation CLC concept. The team appreciate that this is quite

a departure from current practice within the CLMS and during the iterations of the

deliverables the description of the concept has been clarified and improved, particularly in

6 0.5 ha being required by LULUCF

Page 20: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

19 | P a g e

visualising the relationship with the exiting CLMS components and the links between the

individual products of the 2nd generation CLC.

Suggestions regarding the technical specifications. A large number of technical issues have

been identified from the feedback from the stakeholders. These issues have been reviewed

and incorporated where appropriate, practical and feasible within the revision of the

technical specifications given in the later chapters of this deliverable.

Use cases for the 2nd generation CLC products. The requirements for an improved version of

CLC has been known for some time and the detailed specification for CLC+ is now well in line

with a range of global and European policy and environmental management needs. The new

products within the 2nd generation CLC, such as CLC-Backbone and CLC-Core, have a vital

role to play in the delivery of CLC+. Still, there are so far, no admissible use cases. The

required use cases should be developed as end users become more familiar with the

capabilities.

Inputs to the products from MS and other sources. Inclusion of MS data in the production of

the 2nd generation CLC products matches the bottom-up approaches promoted by the EEA

and EAGLE, and may help make the products more relevant for use at the national and sub-

national level. There are, however, a number of technical and organisation challenges which

need to be addressed. This project will address the technical issues by proposing additional

specifications for potential MS inputs while EEA and DG-GROW will deal with the

organisational issues.

Quality of the 2nd generation CLC products. A further issue of improved spatial and thematic

detail is the need to increase the quality of the products so that they will be accepted by the

user community. The conceptual strategy aims to support high quality products by allowing

CLC-Backbone to focus on improvement of the spatial detail and CLC-Core to focus on the

enrichment of thematic detail.

Change detection, update cycle and maintenance. The aim of this project was to propose the

initial technical specification for the 2nd generation CLC products, not to design a

methodology for the updating and maintenance of the product in an operational monitoring

environment. These issues have been noted where relevant in the technical specifications

and are likely to feature more prominently in later development work.

Feasibility studies. The project was designed as a desk study and there was no requirement

or resources to undertake any testing of the specifications proposed or methods that may

be relevant. Where appropriate existing feasibility study results were incorporated into the

reviews and technical specification development, such as the work undertaken in Hungary.

Governance and implementation of the 2nd generation CLC. As was expected, the proposal of

a set of products more closely aligned to MS activities, yet integral to the CLMS, has raised a

number of issues related to governance and implementation of work. These issues are out of

scope for this project, but have been noted and are being dealt with by the EEA and DG

GROW.

From the requirements review so far, the feedback from the workshops and considering the specific

requirement associated with LULUCF, it is suggested that the ultimate product of a 2nd generation

CLC+ should have a MMU of around 1 ha (with the ability to estimate areas down to 0.5 ha), a MMW

of 20 m and be based on EO data with a spatial resolution of between 10 and 20 m. The thematic

content would be a refinement of the current CLC nomenclature to cope with changing MMU,

Page 21: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

20 | P a g e

separation of land cover and land use for the needs of ecosystem mapping and assessment. The

temporal update could come down from 6 years to 3 years in the short-term, and potentially to 1

year in the longer-term. To reach this ultimate goal and initiate implementation of the 2nd

generation CLC, the requirements review supports the role of the CLC-backbone and CLC-Core in

contributing to the development and delivery of the required land monitoring system.

Page 22: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

21 | P a g e

4 CLC-BACKBONE – THE INDUSTRY CALL FOR TENDER

4.1 Introduction CLC-Backbone has been conceived as a new high spatial resolution vector product. It represents a

baseline object delineation with the emphasis on geometric rather than thematic detail. The wall-to-

wall coverage of CLC-Backbone shall draw on, complete and amend the incomplete picture formed

by the current local component products (Urban Atlas, Riparian Zones and Natura 2000) which cover

approximately a quarter of total area as shown in Figure 4-1. CLC-Backbone will provide a

comprehensive and detailed coverage of whole EEA-39.

The objective of the CLC-Backbone is to provide:

A useful standalone product

a spatially detailed geometric base for CLC+

Basic homogeneous land cover units (EAGLE terminology) as thematic input to CLC-Core and

CLC+

Figure 4-1: illustration of a merger of current local component layers covering (with overlaps) 26,3% of EEA39 territory (legend: red – UA, yellow - N2K, blue – RZ LC)

CLC-Backbone will initially be produced by a mostly automated industrial approach divided into

three distinct steps (Figure 4-2). The spatial units of CLC-Backbone, called “landscape objects”, are

Page 23: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

22 | P a g e

vector polygons. They are attributed using a wall-to-wall pixel-based 10*10m2 classification and

further object-based characteristics.

Figure 4-2: Processing steps to derive (1) the geometric partition of objects on Level 1 using a-priori information, (2) the delineation of objects on Level 2 using image segmentation techniques and (3) the pixel-based classification of EAGLE land cover components and (4) attribution of Level 2 objects based on this classification and additional Sentinel-1 information.

For implementation of CLC-Backbone the following principles are taken into account in order to

contribute to a harmonized European land cover monitoring framework:

• Heritage: The concept takes into account the existing pan-European and local Copernicus

layers. The technical specifications to be provided by this contract are therefore based on a

thorough review of these datasets and their geometry and thematic content.

• Integration: CLC-Backbone will integrate selected geometric information from existing

COPERNICUS products and ancillary data within the process.

4.2 Description of CLC-backbone

CLC-backbone will consist of two sub-products:

Raster product: pixel-based (Sentinel-2 based), 10*10m2, pure land cover classes

Vector product: 1 ha MMU, derived from linear networks and image segmentation,

attributed with aggregated statistics from the raster product and additional characteristics

from Sentinel-1

Page 24: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

23 | P a g e

The production of the CLC-backbone foresees four main steps:

Step 1:

The first level of object borders (geometric skeleton) derived in this step represent persistent

object borders (“hard bones”) in the landscape (Step 1). The borders are derived from ancillary data

(roads, railways and rivers).

Step 2:

On a second level a subdivision of the persistent objects (Level 1) will be achieved through image

segmentation (“soft bones”), based on multi-temporal Sentinel data within a defined observation

period (resulting in Level 2 landscape objects = polygons). The geometric output of CLC-Backbone –

the delineated “landscape objects” – represent spectrally and/or texturally homogeneous features

that are further characterised and attributed in step 3.

Step 3:

Production of an independent pixel based (10*10m2) pure land cover classification with pure land

cover classes based on the EAGLE land cover component concept.

Step 4:

Characterization of the landscape (objects/features) from Step 2 using the pixel values from Step 3.

Further attribution using spectral characteristic of the object (mean, variance) from Sentinel-2 and

Sentinel-1.

4.2.1 Level 1 Objects – hard bones The idea behind the Level 1 objects (“hard bones”) is to derive a partition of landscape that is

reflecting relatively stable (persistent) structures and borders built from anthropogenic and natural

features. The transportation network like roads and railways (both without tunnels) is the major

component, followed by the river networks.

When building the spatial framework, there is a potential long list of sources for persistent object

geometries. However, throughout the consultation phase the decision was taken to solely

concentrate on linear networks and provide all other object geometries directly from Image

segmentation.

4.2.2 Level 2 Objects – soft bones The aim of this second level segmentation is to derive objects that can be differentiated by the

spectral response and behaviour throughout a year. With the increasing number of observation

dates per pixel throughout a year the likelihood to find homogeneous management units increases,

but at the same time pixel based noise is increased as environmental conditions within a field parcel

may vary significantly throughout a year (e.g. soil moisture content). Service providers will have to

balance the number of acquisitions to maximize the identification of single field parcels and on the

other side to reduce “salt and pepper” effects due to varying spectral response.

The level 2 landscape objects represent in the ideal case a consistent management of field parcels,

and/or objects with a unique vegetation cover and homogeneous vegetation dynamics throughout a

year. As an example, different settlement structures (e.g. single houses or building blocks) should

appear spectrally different (due to the different levels of sealing, shape spatial extend and spatial

built-up pattern) and will therefore be delineated as separate polygons. The actual cultivation

measures applied on the land will also influence the type of delineation as e.g. in agriculture the field

structure is visible in multi-temporal images due to the differences in temporal-spectral response

according to the applied management practices on field level (mowing, ploughing, sowing).

Page 25: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

24 | P a g e

Table 4-1: Main characteristic of Level 1 objects (hard bone) and level 2 objects (soft bones)

1. Level objects 2. Level objects

Name Hard bones Soft bones

Method Partition of landscape according to existing spatial data

Image segmentation

Input data Persistent landscape objects: Transport infrastructure (roads, railways excluding tunnels); river network

Sentinel-2 complete time-stacks of one vegetation period (amended by Landsat 8 data in case of clouds/haze)

Special issue Geometric transformation of line geometry into vector-polygons and further transformation of vector-polygons into raster based regions

Pre-processing (atmospheric and topographic) of S-2 images;

Advantage Very good cost/benefit ratio; no doubling of efforts, reuse existing European geospatial data infrastructure;

Independent data source; Clearly defined observation period (2018)

Disadvantage Data might be outdated; large vector processing facilities needed; effort to integrate and collect appropriate data

Reliability and reproducibility of segmentation algorithms

Concerns European data vs. national data Technical feasibility (big data processing); cooperation with data centres necessary

4.3 Technical specifications

4.3.1 Spatial scale / Minimum Mapping Unit Raster product:

MMU 100 m2 (10*10m2 Pixel)

Vector product:

CLC-Backbone shall address landscape objects with a predefined

o MMU of 1 ha and a

o Minimum mapping width (MMW) of 20m.

Units (final “landscape objects” resulting from merged Level 1 and Level 2)

o Spectral and/or textural homogeneous (over time) units that represent meaningful

objects in the real world (unique land cover and/or homogeneous vegetation

dynamics)

o They are characterized with the land cover class that is predominantly occurring

throughout the season (e.g. differentiation between trees and clear-cut according to

dominating time period of occurrence within observation period)

Page 26: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

25 | P a g e

o Examples for meaningful objects are:

Single agricultural management units (land parcels)

Tree stands that are homogeneous according to criteria age, mixture and/or

management

Clear cuts

City blocks

Extraction site

Lakes

Wider roads, railways and rivers (>=20m)

o Short-term temporary effects (less than 6 month) like variation in soil water content

or temporary flooding do not constitute a meaningful object in the context of CLC-

backbone

o Adjacent Units can have the same land cover code

As they might only be separated by a road or river, but nevertheless

comprise different spatial units with unique management cycle and spatial

configuration (and thus different spectral-temporal characteristics)

Area coverage of units

o Wall-to-wall

o 100% of area has to be covered by segmented units

4.3.2 Delineation accuracy of geometric units The appropriate size and delineation are defined by a sample of verification polygons derived via

visual interpretation

The delineation accuracy of the resulting polygons (vector product) is defined with

Appropriate size:

o Too large polygons (overestimation of object area): not more than 10% of all

polygons

o Too small polygons (underestimation of object area): not more than 15% of all

polygons

Any deviation of more than 15% of polygon area is considered too small or

too large

Appropriate delineation (positional accuracy)

o Shift of border: max. 20m shift

not more than 10% of perimeters is shifted more than 15m

4.3.3 Reference year The production of CLC-Backbone shall generally make use of images from the year 2017/2018, i.e.

Sentinel-2 HR layer stack. The image stack covers one full vegetation season with 2018 as reference

year, but taking regional conditions into account.

The vegetation season in Mediterranean regions starts in October of the previous year and the

vegetation season is for this region therefore set to 10/2017-9/2018. Northern regions, with short

vegetation season, and areas along the Atlantic coast (with frequent and high cloud cover) may,

Page 27: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

26 | P a g e

even with Sentinel-2 , be impossible to cover with data from a single season. A pragmatic solution is

then to include imagery from 2017 or earlier if necessary. Special condition, currently not known to

the development team, may of course also imply similar pragmatic deviations from the reference

year in other parts of Europe.

4.3.4 EO data The Sentinel-2 data for the production of CLC-Backbone will need to make use of all available

observations from the ESA service hubs. The aim is to provide a multi-spectral, multi-date, 10-20 m

spatial resolution imagery as basic EO-input. This selection can be amended by Sentinel-1, where

helpful. As the full satellite constellation of Sentinel (using 2A and 2B) is available operationally from

late 2017 onwards, it is anticipated that last months of 2017 and full year 2018 is the first full

vegetation period for a comprehensive multi-temporal coverage of Europe.

The complete layer stack of multiple time series of all S-2 (and if necessary S-1) acquisitions have to

be considered to receive in minimum 6 (i.e. monthly) acquisitions for the 2018 vegetation period

(that is extended to 2017 observations in Mediterranean areas) in order to:

fully track the seasonal cycles of vegetation

increase the automation of the processing

ensure the homogeneity of results across Europe

Older Sentinel-2 archive data (before 2018) might be used to

close data gaps for 2017/18, e.g. due to cloud cover (specifically in northern countries)

to confirm stable object geometries (“soft bones”)

In case of availability of a European-wide coverage of VHR data for 2018, its integration in the

processing chain should be considered (either for accompanying the production or for verification).

The synergetic use of Sentinel -1 (SAR) imagery shall be considered for improving classification

accuracy and enhancing the thematic information, especially on thematic issues like soil properties

(wetness). Recent studies on the use of merged optical-SAR imagery as well as SAR visual products

and the experiences in HRL production have confirmed their applicability for both semi-automated

classification and visual interpretation.

Landsat 8 or equivalent data shall be used for gap filling in case of insufficient coverage by Sentinel-

2.

4.3.4.1 Pre-processing of Sentinel-2 data

As Sentinel-2 is an optical system the average cloud coverage influences the number of observations

significantly. All cloud free (and shadow-free) pixels of an image can be analysed. The constellation

of Sentinel-2 satellites provides an average the revisiting frequency of 3-4 days in Europe. The

coverage is even more frequent in northern regions due to the overlaps of the S-2 path-footprints.

EO imagery has to be pre-processed to correct for atmospheric and topographic conditions. ESA has

decided that Sen2Cor will be used for correction of imagery for Europe and will reprocess the L2A

level of Sentinel 2 imagery for Europe back to May 2017 using this algorithm. The current plan is

thus to start atmospheric correction using Sen2Cor in Europe and to reassess the algorithms

(e.g. Maja) in 2019.

In case radiometric pre-processing using ESA Sentinel data hub is not available in time, service

providers will have to perform their own radiometric pre-processing. Service Providers will therefore

have to prove their ability to perform the pre-processing as well as a quality assurance to exclude

scenes with displacement errors (common problem with S-2).

Page 28: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

27 | P a g e

4.4 Thematic attribution

Raster product:

Classification of individual pixels according to the nomenclature described is chapter 0.

below

Vector product:

The polygons produced by geometric segmentation (merging Level 1 and Level 2) specification are

classified using three different approaches.

1. Attribution with basic land cover class statistic based on pixels in the raster product found in

the landscape object

Statistical parameters to be assigned to the landscape object

i. Dominating land cover class (majority)

ii. Ordered list of land cover class percentages (3 most dominating land cover

classes)

iii. Percentage of land cover class i…j

1. count of pixels per land cover class/total number of pixels per

landscape object

2. Land cover class (chapter 0) determined by conventional analysis of the spectral properties

of the entire object

3. Attribution according to Sentinel-1 data

(1) Wetness

(2) Roughness

Page 29: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

28 | P a g e

4.4.1 Thematic classes Thematic detail:

After obtaining the hard bones and soft bones of the geometric skeleton, each single landscape object

is then labelled by simple land cover classes. The descriptive nomenclature that is applied for the

labelling contains the following EAGLE derived Land Cover Components:

9 land cover classes

1. Sealed surface (buildings and flat sealed surfaces)

2. Woody – coniferous

3. Woody – broadleaved

4. Permanent herbaceous (i.e. grasslands)

5. Periodically herbaceous (i.e. arable land, natural grasslands with periodic vegetation

cover)

6. Permanent bare soil

7. Non-vegetated or scantly vegetated surfaces (i.e. rock, screes and sand, lichen,

sparsely vegetated alpine heath)

8. Water surfaces

9. Snow & ice

Remark to class woody:

Representatives from some member countries have expressed the wish to further differentiate the

class woody into “trees” and “shrubs”. This differentiation would be useful but can currently not be

achieved with sufficient accuracy from spectral attributes alone. A robust differentiation is only

possible by using a normalized difference surface model (nDSM).

In the CLC-Backbone nomenclature code list no land use terminology has been used. This is done to avoid semantic confusions between land cover and land use. Land use is, however, implied in the separation between permanent and periodically herbaceous land.

permanent herbaceous

o Permanent herbaceous areas are characterized by a continuous vegetation cover throughout a year. No bare soil occurs within a year. These areas are either unmanaged or extensively managed natural grasslands or permanently managed grasslands, or arable areas with a permanent vegetation cover (e.g. fodder crops) or even set-aside land in agriculture. For managed grasslands, the biomass will vary over the year, depending on the number of mowing (grassland cuts) or grazing events.

periodically herbaceous

o Periodically herbaceous areas are characterized by at least one land cover change (in the sense of EAGLE land cover components) between bare soil and herbaceous vegetation within one year. Depending on the management intensities these areas can also have up to several changes between these two EAGLE land cover components within a year. Normally these areas are managed as arable areas.

Page 30: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

29 | P a g e

A permanent and managed grassland as defined in IACS/LPIS may be ploughed every 3-5 years for amelioration purposes followed by an artificial seeding phase and a renewal of vegetation cover. Thus, a managed grassland may even show a phase of bare soil within a time frame of 5-6 years. This has to be considered, when evaluating shorter time phases, as by occasion a managed permanent grassland may accidentally be subject to renewal within the observation period of 1 year.

Complex classes, either as mixture of existing land cover classes (e.g. CLC-mixed classes) or in the sense of ecologically diverse classes (e.g. wetlands), are avoided because the nomenclature solely concentrates on the classification according to surface structure and properties.

A detailed technical description how to model the temporal changes is given in Annex 3.

Important note:

As the main emphasis is on the geometric delineation of landscape objects neighbouring objects may

be assigned the same land cover code. The polygons should still be kept as unique entities, as they can

be different with respect to attributes that may be added in a later stage (e.g. broadleaved trees vs.

needle-leaved coniferous trees, count of grass cutting, number of bare soil count, wetness …).

Accuracy:

The accuracy for the thematic differentiation is defined with

9 land cover classes

o 85% overall accuracy: for 10*10 m2 pixel based product

Omission errors: max. 20%

Commission errors: max. 20%

o 90% overall accuracy: for vector based product

Omission errors: max. 20%

Commission errors: max. 20%

Remark on further development towards CLC+:

Apart from the technical specifications for the foreseen tender for CLC-Backbone, additional

attribution by various data sources is an option for future enrichment of classes towards the

development of CLC+. First of all, member countries can be asked to populate the resulting geometries

with national data. In addition, crowd-sourced data may be used to characterize the land use within

these polygons (either by in-situ point observations or by wall-to-wall datasets e.g. OSM land use).

4.5 European ancillary data sets for Level 1 hard bone creation

Throughout the consultation process a number of potential datasets that could contribute to the

creation of the “hard bones” (level 1 objects in Figure 4-2) were discussed. The team decided, after

thorough consideration of all proposals, to concentrate solely on linear networks such as roads,

railways and rivers as input to the hard bone.

Page 31: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

30 | P a g e

Table 4-2: Overview of existing relevant products which were analysed as potential input to support the construction of the geometric structure (Level 1 “hard bones”) of the CLC-Backbone.

Product comment Format Potential use for constructing basic geometry

Reference year

Roads and railways

National data portals7 National reference data line National INSPIRE data set provision; not harmonized European dataset, varying license conditions, varying levels of roads, varying information on tunnels

Varying throughout member countries

Open Street Map – roads and railways

Crowd sourced data Line Linear transportation network (centre lines), standardized classes, information on tunnels available

Frequent change mapping; full time history

HERE maps (Navmart) Operational Commercial data

Line Commercial layer; licensing constraints

Regular updates based on business processes

Rivers

EU Hydro (Copernicus reference layer)

Free an open COPERNICUS product

Line Geometric quality derived from VHR images, very high location accuracy

2011-2013

WISE WFD – surface water bodies

bottom-up dataset from official national hydrography networks

Waterbodies as polygons Waterbody line as line geometry

Based on WFD2016 reporting (UK and Slovenia only for viewing)

2016 (6 year update cycle, next update 2022)

7 At the current stage not all geographic information is available via the national INPSIRE portals

Page 32: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

31 | P a g e

The usage of additional geospatial information for constructing Level 1 “hard bones” is based on the

following arguments:

European landscape is a highly anthropogenic transformed landscape, where transport and

river networks (beside others) form the basic subdivision of landscape

The location of transport and river networks are fairly known due to geospatial data on

national and/or European scale

Many of the borders defined by transport and river networks can also be identified from

Sentinel-2 images, but at high costs and with lower accuracies

Usage of linear networks:

The linear networks are used to represent line features (in the sense of being smaller than 20m) on

the earth surface. Therefore, all kind of tunnels are excluded.

The linear networks (centre line of roads, railways and rivers) represent relatively stable borders of

landscape objects (e.g. parcels/agricultural fields/city blocks that are surrounded by roads, railways

and rivers). This division of landscape is used as input to step 2 for the subsequent segmentation.

Wider roads (>20m) however establish landscape objects themselves.

It is important to note that no attribute information is needed for the integration of the linear

networks. Any part of the linear network represents a border for the adjacent polygon, therefore in

the final partition of the landscape not even the information, if the underlying border represents a

road, railway or river is of any importance anymore. The vector line network is transformed into

polygons (1 ha MMU). These polygons are further subdivided using Sentinel-2 data in Step 2 using

automated image segmentation.

Roads, railways and rivers that are wider than 20m have to be covered in step 2 as polygon features.

In case automated image segmentation algorithms do not fully recover these narrow and elongated

polygons a manual editing phase for sufficiently covering those wider linear networks (that comprise

polygon features) has to be foreseen.

4.5.1 Criteria for ancillary data input The geospatial data for the creation of the CLC-Backbone have to meet following specifications:

Provision of a consistent set of information across European countries

o Scale: 1:10.000 or better

o Geometric quality: +/- 5m positional accuracy

o Geometric representation:

Centre line

Parallel lines (distance less than 20 m, e.g. driving directions) have to be

merged

o Completeness: no spatial or thematic data gaps

Spatial completeness: >95% of network covered

Thematic completeness: all types of roads from highway down to

agricultural tracks covered

o Network: connected network

no dangles

snapping tolerance 10m

Page 33: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

32 | P a g e

o Specifically, for roads and railways

Information on tunnels per line string

Road types:

Standardized and comparable road type categories across Europe

all types of roads (from highway to agricultural tracks)

o Specifically, for rivers

Completeness: All rivers within catchments larger than 10 km2

Provision of data at no cost or at reasonable costs, where cost reduction in the overall

process justifies the costs

Support of the full, free and open data policy of the end product

o Access to final data is based on a principle of full, open and free access as

established by the Copernicus data and information policy Regulation (EU) No

1159/2013 of 12 July 2013

Provision of data to industrial service providers, potentially outside the country of the data

provider.

Availability of the data to the service provider at the start of the project implementation.

A combination of datasets is feasible (e.g. WFD-WISE dataset within EU amended by EU-Hydro

outside EU coverage), if both dataset comply with the specification criteria.

4.5.2 Ancillary input data Three different possible data sources are identified for transport network (Table 4.1):

National reference data

Open street map data

Commercially available navigation data

For rivers two European datasets are available

Water framework directive – WISE

EU-hydro (COPERNICUS product)

The choice of dataset to be used for the production has to consider the criteria defined in chapter

4.5.1. If alternative dataset exists (e.g. availability of national reference data and OSM) the objective

criteria in chapter 4.5.1 are used to select the data with the highest fitness of purpose.

4.5.2.1 Open street map data

Throughout the last 10-15 years the Open Street Map has turned to be the major and

comprehensive provider for road and railway networks worldwide. Data collected by volunteers and

provided with an open-content license.

One major critic when using crowd-sourced data is their incompleteness and varying quality. For

OSM road dataset recent scientific reviews (e.g. Barrington-Leigh and Millard-Ball, 20178) have

shown a completeness in Europe of by far more than 96% for the majority of the EEA 39 countries

(the figures for the completeness per country are available in the Annex 3). According to their study

only Serbia, Bosnia-Hercegovina and Turkey have completeness below 90%. These three countries

may be subject for a refinement of roads using additional national data sources.

In the Figure 4-3 below the completeness of OSM in four countries (Estonia, Portugal, Romania and

Serbia) is visualised on aerial images and Sentinel-2 background data.

8 http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0180698

Page 34: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

33 | P a g e

Figure 4-3: Illustration of OSM completeness for Estonia (near Vändra), Portugal (near Pinhal Novo), Romania

(near Parta) and Serbia (near Indija) (top to bottom). © Bing/Google and EOX Sentinel-2 cloud free services as

Background (left to right).

License:

The Open Street Map data is licensed under the Open Data Commons Open Database License

(ODbL) by the Open Street Map Foundation (OSMF). As the input data from OSM is restricted to

geometric properties and these properties are substantially altered according to the geometric

transformation from original vector data into the grid database and combined with remote sensing

data the CLC-backbone is not regarded as a derivative database of OSM, but a collective database.

Page 35: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

34 | P a g e

Therefore, according to Art. 4,5 of the ODbL v1.09 the COPERNICUS standard license (guaranteeing a

full, free and open access with the possibility to derive business application) can be applied to the

dataset.

4.5.2.2 INSPIRE national portals: roads & railways

The project ELF – European Location Framework10 – has compiled overviews for existing national

services mainly for INSPIRE Annex I themes (roads buildings, hydrology). They try to serve as single

point of access for harmonised reference data from National Mapping, Cadastre and Land Registry

Authorities. The INSPIRE regulation will certainly lead to an increased availability and accessibility for

European reference data, but at the current point in time the data situation even for core national

data is still quite patchy and requires large efforts for searching, finding, compiling and harmonizing

(semantically and geometrically) the various dataset. Therefore, ELF is currently not suited to

provide data according to the criteria defined in chapter 4.5.1.

The recent survey (status: 19. Jan. 2018) in the official INSPIRE portal 11 reveals that in total 101

datasets are available from 10 member countries for download in the data theme “transport

networks”. Some countries do only provide regional distributed datasets, but not singular national

wide download services.

Figure 4-4: Downloadable datasets for INSPIRE transport network (road) services using the official INSPIRE interface (Status: Jan. 2018).

9 https://opendatacommons.org/licenses/odbl/1.0/ 10 http://locationframework.eu/ 11 http://inspire-geoportal.ec.europa.eu/thematicviewer/Theme.action?themecode=tn

Page 36: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

35 | P a g e

Figure 4-5: Distribution of number of downloadable datasets for INSPIRE transport service across member countries (Status: Jan. 2018)

The advantage of using national data instead of European wide available data has to be carefully

evaluated with regard to costs and benefit. First of all, it is still a challenge to find the appropriate

dataset. Although it is known that other national services exist (e.g. Spain), they cannot be found

using this central access portal. This has to be considered, when replacing existing European wide

data sources with national data.

The current available national data are not harmonized across member countries regarding the e.g.

definition of road-types or completeness of road types.

4.5.2.3 Rivers and lakes

One example for a joint European (EU 28) dataset is the reference spatial data set of the water

framework directive (WFD) that is part of the water information system for Europe (WISE)12. They

compile national information reported by the EU Member states to one homogeneous dataset.

According to the reporting cycle of the WFD that requires an update ever 6 years, the dataset is

available in a version 2010 and 2016. Such datasets may provide a valuable data source for the

generation of “hard bones”, but still do not provide the optimal solution. In the case of the WISE

WFD dataset the coverage is in principal restricted to EU 28, but not even complete within EU 28, as

12 https://www.eea.europa.eu/data-and-maps/data/wise-wfd-spatial

Page 37: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

36 | P a g e

countries like UK, Ireland, Slovenia and Lithuania are not included yet (UK and Slovenia are only

available for EEA and EC for visualisation purposes).

Compared to the EU-Hydro dataset they provide a higher geometric quality, but are less complete,

as smaller rivers of catchments less than 10 km2 are not included by definition.

A combination of these datasets is recommended.

4.5.3 Remark: land parcel identification system (LPIS) The land parcel identification system (LPIS) in Europe is at the current stage not suited to be

integrated into the Level 1 “hard bone” process, as the data is currently either

not completely available on national level (approx. 50% of countries provide data)

partly only available on regional level (Spain, Germany)

do not represent the same spatial scale across countries

are not available with the same semantic information (e.g. grassland/arable land

differentiation)

if data are harmonized under INSPIRE, they are provided either under Annex II land cover or

Annex III land use (but not in one homogeneous data format)

not accessible for all users / uses

A detailed explanation is given in the Annex 4.

It is expected that the availability and homogeneity of IACS/LPIS data in Europe is improving in the

next years. For a future update of the CLC-backbone this improved availability of LPIS data will play

an important role.

Page 38: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

37 | P a g e

4.6 Short-Form of CLC-Backbone end product technical specifications

CLC-Backbone

[Example image if available.]

Description

CLC-Backbone is a spatially detailed, large scale, EO-based land cover inventory in a vector format

(accompanied by a raster product layer) providing a geometric backbone with limited, but robust

thematic detail on which to build other products.

Input data sources (EO & ancillary)

EO data used in the production

Spatial resolution 10 – 20 m

Spectral content – Visible, NIR, SWIR, SAR

Temporal resolution – in minimum monthly cloud free acquisitions

Source data –Sentinel-2 amended by Landsat 8 where necessary

Contributing data – Sentinel-1

Ancillary data used in the production

Roads and railways (national datasets and/or OSM)

Hydrography: WISE WFD and/or EU – Hydro

Thematic Content

EAGLE Land Cover Components:

1…Sealed

2…Woody vegetation - coniferous

3…Woody vegetation - broadleaved

4…permanent herbaceous (i.e. grasslands)

5…Periodically herbaceous (i.e. arable land)

6…Permanent bare soil

7…Non-vegetated bare surfaces (i.e. rock and screes)

8…Water surfaces

9…Snow & ice

Page 39: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

38 | P a g e

Methodology

The detailed geometric structure is derived via a hierarchical 2-step approach. In a first step, the

boundaries (hard bones) of relatively stable landscape objects are detected based on existing data

sets for roads, railways and rivers (Level 1 objects). In a second step, image segmentation

techniques based on multi-temporal Sentinel-2 data are applied to further subdivide the Level 1

objects into smaller units (Level 2 objects).

The third step contains a classification and spectral-temporal attribution (Sentinel-1 and -2) of the

landscape objects derived from steps 1 and 2 using a pixel-based classification of the Land Cover

components. The pixel based classification is generalized on object level using indicators like

dominating land cover class and percentage mixture of land cover classes.

The third step contains an independent land cover classification of individual pixels. The fourth step

assigns land cover class values (dominating class, percentages of classes, …) to the Level 2 polygons

using the pixel map (Step 3). In addition, other attribution methods are used to characterize the

polygon with spectral-temporal characteristic and indices (e.g. Sentinel-1 characteristics).

Geometric resolution (Scale)

~1:10 000

Geographic projection / Reference system

ETRS89-LAEA

ETRS89 Lambert Azimuthal Equal-Area (EPSG: 3035, http://spatialreference.org/ref/epsg/etrs89-

etrs-laea/)

Coverage

EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium,

Bosnia- Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France ,

Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution

1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia,

Malta, Montenegro, the Netherlands, Norway , Poland, Portugal , Romania, Serbia, Slovakia,

Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular

CLC dataflow. The number of countries involved is 39. The area covered is, approximately, 6 Million

km2.

Geometric accuracy (positioning accuracy)

Raster product: equals Sentinel-2

Vector product: Less than half a pixel based on 10 m spatial resolution input data

Spatial resolution / Minimum Mapping Unit (MMU)

Raster product: 100 m2 (10*10m2)

Vector product: 1 ha

Min. Width of linear features

Vector product: 20 m

Topological correctness

[TBD]

Page 40: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

39 | P a g e

Raster coding

Encoding identical to [thematic detail]

Thematic accuracy (in %) / quality method

The target will be set at 90 %. A quantitative approach will be used based on a set of stratified random

point samples compared to external datasets (e.g. VHR, GoogleEarth, national orthophotos).

Target / reference year

2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)

Up-date frequency

[TBD]

Information actuality

Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for the

ancillary datasets.

Delivery format

Raster product: Grid

Vector product: Vector

Data type

Raster product: Grid

Vector product: Vector

Naming conventions

[TBD]

Medium

[TBD]

Delivery reliability

[TBD]

Delivery time

[TBD]

Archive

[TBD]

Page 41: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

40 | P a g e

5 CLC-CORE – THE GRID APPROACH

The CLC-Core product should be seen as an underpinning semantically and geometrically

harmonized “data container” and “data modeller” that holds, or allows referencing of, land

monitoring information from different sources in a grid-based information system. The grid

database will essentially store geospatial and thematic data from sources such as CLC-Backbone,

CLC, Local Component data, HRLs, in-situ as well as information provided by the MS. The contents of

the grid database can be exploited directly within grid-based analyses or alternatively subsets of the

grid database can be extracted and analysed by spatial queries driven by objects from an external

vector dataset. In this way, the CLC-Core can support the other elements of the 2nd generation CLC,

for instance, a CLC-Backbone object could gain additional thematic information by aggregating or

analysing the equivalent portion of the CLC-Core as a starting point for CLC+ production (see next

chapter).

5.1 Background Textbooks usually differentiate between “raster GIS” and “vector GIS” (Couclelis 199213, Congalton

199714). The “vector GIS” does, like the paper map, attempt to represent individual objects that can

be identified, measured, classified and digitized. The “raster GIS” attempts to represent continuous

spatial fields approximated by a partition of the land into small spatial units with uniform size and

shape. The “raster GIS” therefore have structural properties that make them particularly suitable for

use in modelling exercises.

A grid is a spatial model falling somewhere between the vector model and the raster model. The grid

looks like a raster. The difference is the information attached to the grid cell. A raster cell (pixel) is

classified to a particular land cover class or single characteristic. The grid cell, on the other hand, is

characterized by how much it contains of each land cover class or other information. The grid is, as a

result, a more information-rich data model than the raster.

The difference between the raster and the grid is illustrated in Figure 5-1, using an example based on

CLC. The original CLC vector data set can be converted into a new data set with uniform spatial units

of e.g. in this case 1 X 1 km. The result can either be a “raster” where a single CLC class is assigned to

each pixel (representing for example the dominant land cover class inside the pixel; or the CLC class

found at the centre of the pixel unit). Alternatively, the result is a “grid” where an attribute vector

representing the complete composition of land cover classes inside the unit is assigned to each grid

cell. Geometric information on a more detailed scale than the size of the spatial unit is in both cases

lost, but the overall loss of information is less in the grid than in the raster.

The loss of information when using a grid structure can be further reduced if the grid size is selected

to be less than the minimum mapping unit of the input product. For instance, with a MMU of 0.25 ha

13 Couclelis, H. 1992. People manipulate objects (but cultivate fields): Beyond the raster-vector debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (eds) Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639: 65 – 77, Springer Berlin Heidelberg 14 Congalton, R.G. 1997. Exploring and evaluating the consequences of vector-to-raster and raster-to-vector conversion. Photogrammetric Engineering and Remote Sensing, 63: 425-434.

Page 42: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

41 | P a g e

(50 x 50 m), the grid size should be around 10 x 10 m. Consequently, any downstream products

derived from the grid-based information must also consider grid size in relation to its requirements.

Figure 5-1: CLC with a 1 km raster/grid superimposed (top) illustrating the difference between encoding a particular unit as raster pixel (centre) or a grid cell (bottom). “daa” is a Norwegian unit: 10 daa = 1 ha.

A grid data model is not the result of mapping in any traditional sense. The grid is “populated” with

thematic information from (one or more) existing sources (Figure 5-2). One method frequently used

to populate grids with LULC information is to calculate spatial coverage of each LULC class based on

a geometric overlay between the grid and detailed polygon datasets. The assignment of attributes

can furthermore involve counting or statistical processing of registered data for observations made

inside each grid cell (for instance, not only the area of a class, but also the number of objects it

represented). Ancillary databases, which provide important and relevant information about the

content and context of the grid cells, can also be added. For instance, the number of buildings,

length of roads or number of people found within the grid cells.

5.2 Populating the database The grid as a spatial model consists of square spatial units of uniform size and shape that can have an

(internally) heterogeneous thematic composition, i.e. the grids are “geometrically uniform polygons”.

Each grid cell has a unique ID that is related to a database containing the attribute data.

Page 43: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

42 | P a g e

In a first step, CLC-Core will be populated with existing thematic information from the CLMS (see

also chapter 4.4):

Local Component data (Urban Atlas, N2000, Riparian Zones)

High Resolution Layers (Imperviousness, Forest, Grassland, Water & Wetness)

CLC-Backbone

Any other data, such as CLC (25 ha), MS data, …

Upon agreement, Member States are invited to provide also their national information to populate

the information system, mainly on land use and agricultural themes.

Figure 5-2: Representation of real world data in the CLC-Core.

5.3 Data modelling in CLC-Core CLC-Core will be the effective “engine” to model, process and integrate the individual input data sets

to create new, value added information. The modelling will make use of synergies (the same

information provided for one cell based on different sources) as well as disagreements (different

information provided for one cell by different sources) during the data processing.

We would like to refer to the modelling and combination of different layers of the database,

including the modelling parameters as an “instance” of the database. In this understanding, any

output of a database modelling process using different input layers and different parameters will

create a specific instance with associated metadata.

Technically CLC-Core shall:

Allow further characterisation (e.g. by additional input data / knowledge, see also chapter

5.2 - Populating the database).

Allow for (fully or partial) inclusion of Copernicus High Resolution Layers and local

component products.

Allow the best possible transformation of National datasets into CLC+ (capturing the wealth

of local knowledge existing in the countries).

Storing information in a grid structure will also help to overcome the problem of different geometries:

as the data are converted to grid cells, differences in vector geometries will have a lesser impact.

Page 44: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

43 | P a g e

5.4 Database implementation approach for CLC-Core The implementation approach for the database of CLC-Core is based on paradigms originating from

the semantic web and knowledge engineering. This opens up new possibilities beyond the classical

object-relational database management system concept. In the following subsections, we describe

the pros and cons of contemporary object-relational database models, not-only SQL (NoSQL)

concepts, and in particular Triple Stores. In addition, we elaborate on the intended approach that

utilizes a spatio-temporal Triple Store to store, manage and query CLC-Core data. Furthermore, we

propose strategies to generate CLC-Core products – i.e. “flat” shape-files or GeoJSON files – based

on a spatio-temporal Triple store.

5.4.1 Database concepts revisited: from Relational to NoSQL and Triple Stores The contemporary object-relational database model relies on the work of Codd (1970)15. Although

object-relational database management systems (ORDBMS) were developed further since the 1970s,

their relational concept remains. Due to the emergence of the World Wide Web, Web 2.0 and the

Semantic Web the requirements for databases have changed drastically, which led to the

development of NoSQL database concepts (Friedland et al., 2011; Sadalage & Fowler, 2012)16,17.

ORDBMSs are databases that follow the relational model, published by Codd (1970). Any relational

model defines how data are stored, retrieved and altered within databases – base on n-ary relations

on sets. Formally speaking, any relation is a subset of the Cartesian product of sets. In such an

environment “reasoning” is done using a two-valued predicate logic. Data, stored utilizing the

relational model, are viewed as tables, where each row represents an n-tuple of the relation itself.

Each column is labelled with the name of the corresponding set (domain). Operations are performed

using relational algebra, allowing to express data transactions in a formal way. ORDBMSs are

designed to adhere to the ACID principles – atomicity, consistency, isolation and durability – which

are defined by Gray (1981)18. ACID principles ensure that database transactions are performed in a

“reliable” way. In contemporary ORDBMSs consistency is the central issue limiting performance and

the ability to scale horizontally – which is crucial for Web 2.0 or Big Data requirements. In addition,

ORDBMSs rely on a fixed schema to which all data have to adhere. Hence, alterations of the data

model are hardly possible, which leads to the fact that data models in an ORDBMS are regarded as

static. Other downsides of contemporary ORDBMSs are the data load performance, and the handling

of large data volumes with high velocity with replication. Nevertheless, ORDBMS are still widely used

in business applications, as the guarantee consistency, which is crucial for e.g. bank accounting. In

addition, commercial and open-source ORDBMSs are mature products and offer a variety of options

and extensions.

The term “NoSQL” database emerged in 2009 (Evans, 2009)19, as a name for a meeting of database

specialists on distributed structured data storage in San Francisco on June 11, 2009. Since then the

15 Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM. 13 (6): 377. doi:10.1145/362384.362685 16 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education. 17 Friedland, A., Hampe, J., Brauer, B., Brückner, M., & Edlich, S. (2011). NoSQL: Einstieg in die Welt nichtrelationaler Web 2.0 Datenbanken. Hanser Publishing. 18 Gray, J. (1981): The Transaction Concept: Virtues and Limitations. In: Proceedings of the 7th International Conference on Very Large Databases, pp. 144–154, Cannes, France. 19 Evans, E. (2017): NOSQL 2009. Url: http://blog.sym-link.com/2009/05/12/nosql_2009.html, visited: 06-11-2017.

Page 45: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

44 | P a g e

term has grown into a widely known umbrella term for a number of different database concepts

(Friedland et al., 2011). NoSQL databases have the following characteristics in common:

- non-relational data model

- absence of ACID principles (especially consistency – defined as BASE [eventual consistency

and basically available, soft state eventually consistent])

- flexible schema

- simple API’s (no join operations)

- simple replication approach

- tailored towards distributed and horizontal scalability

NoSQL databases are especially designed for Web 2.0 applications and cloud platforms that require

the handling large data volumes with replication. In addition, NoSQL databases are designed to

support horizontal scalability, which is necessary for handling big data with high turnover rates20.

Currently, NoSQL concepts are employed by e.g. the following companies: Amazon, Google, Yahoo,

Facebook, and Twitter. Types of NoSQL databases are as follows (see e.g. Scholz (2011)21, Sadalage &

Fowler (2012)):

- Column databases: have tables, rows and columns, but the column names and their format

can change – i.e. each row in a table can be different (Khoshafian et al., 198722; Abadi,

200723). Generally speaking, a wide column store can be regarded as 2D key-value store.

Examples are: Apache Cassandra, Apache HBase, Apache Accumulo, Hybertable, Google’s

Bigtable

- Key-value databases: store data in the form of a key and an associated value (similar to a

hash), and strictly no relations. Key value stores show exceptional performance in terms of

horizontal scaling, and handling big data volumes. Examples of key-value stores are e.g.

OrientDB, Dynamo (Amazon), Redis, or Berkeley DB.

- Document databases: rely on the “document” metaphor that implies that the

data/information is contained in documents – that share a given format or encoding. Thus,

encodings like XML or JSON are used to represent documents. Each document can be

schema free, which is a key advantage for Web 2.0 applications. Examples of document

databases are. Apache CouchDB, MongoDB, Cosmos DB (Microsoft), or IBM Domino.

- Graph databases: utilize the notion of a mathematical graph structure for database

purposes (Robinson et al., 2015)24. A graph contains of nodes and edges that connect nodes.

In a graph database, the data/information is stored in nodes which are connected via

relationships, represented by edges. The relations can be annotated in a way that a graph

database may contain semantic information. The concept allows modelling of real-world

phenomena in terms of graphs, e.g. supply-chains, road networks, or medical history for

20 Sadalage, P. J., & Fowler, M. (2012). NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education. 21 Scholz J. (2011). Coping with Dynamic, Unstructured Data Sets - NoSQL: a Buzzword or a Savior?. In: M. Schrenk, V. Popovich & P. Zeile (Eds.) Proceedings of the Real CORP 2011, May 18-20, 2011, Essen, Germany: pp. 121 - 129. 22 KHOSHAFIAN, S., COPELAND, G., JAGODIS, T., BORAL H., VALDURIEZ, P. (1987). A query processing strategy for the decomposed storage model. In: ICDE, pp. 636–643. 23 Abadi, D. (2007). Column-Stores for Wide and Sparse Data. In: Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Url: http://db.csail.mit.edu/projects/cstore/abadicidr07.pdf, visited: Nov. 08, 2017. 24 Robinson, I., Webber, J., Eifrem, E. (2015). Graph Databases: New Opportunities for Connected Data, O'Reilly Media Inc.

Page 46: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

45 | P a g e

populations. Graph databases are capable of storing semantics and ontologies – especially in

the field of Geographic Information Science (Lampoltshammer & Wiegand, 2015)25. They

became very popular due to their suitability for social networking, and are used in systems

like Facebook Open Graph, Google Knowledge Graph, or Twitter FlockDB (Miller, 2013)26.

- Multi-modal: such databases are capable of supporting multiple NoSQL models/types.

Examples are: Cosmos DB (Microsoft), CouchBase (Apache), Oracle, OrientDB.

Additionally, the term Linked Data has emerged as part of the Semantic Web initiative (Bizer et al.,

2009, Heath & Bizer, 2011)27,28. Originating from a web of documents, Bizer et al. (2009) describe an

approach to develop a web of data (i.e. the data driven web) – by publishing structured data such

that data from different sources can be interlinked with typed links. Additionally, they formulate

four characteristics of linked data:

- They are published in machine-readable form

- They are published in a way that their meaning is explicitly defined

- They are linked to other datasets

- They can be linked from other data sets

In order to technically fulfil these characteristics, Burners-Lee (2009)29 established a number of

Linked Data principles:

- Uniform Resource Identifiers (URI‘s) to denote things

- HTTP URI‘s shall be used that things can be referred to and dereferenced

- W3C standards like Resource Description Framework (RDF) should be used to provide

information (Subject – Predicate – Object)

- Data about anything should link out to other data, can contribute to the Linked Data Cloud

RDF is a family of W3C specifications that was originally designed as metadata model. Over the years

it has evolved to the general method to describe resources. RDF is based on the notion of making

statements about resources – which follow the form: subject – predicate – object, known as triple.

Here the subject denotes the resource, the predicate represents aspects of the resource, and

therefore is regarded as an expression of the relationship between the subject and object (see

Figure 5-3).

25 Lampoltshammer, T. J. & Wiegand, S. (2015). “Improving the Computational Performance of Ontology-Based Classification Using Graph Databases” Remote Sensing, vol. 7(7): 9473-9491. 26 Miller, J.J. (2013). Graph database applications and concepts with Neo4j, Proceedings of the Southern Association for Information Systems Conference, Atlanta, vol. 2324, pp. 141-147. 27 Bizer, C., Heath, T., & Berners-Lee, T. (2009). Linked data-the story so far. Semantic services, interoperability and web applications: emerging concepts, 205-227. 28 Heath, T. and Bizer, C. 2011. Linked data: Evolving the web into a global data space. Synthesis lectures on the semantic web: theory and technology, 1(1), 1-36. 29 Berners-Lee, T. 2009. Linked Data - Design Issues. Online: http://www.w3.org/DesignIssues/LinkedData.html; visited: Nov. 09, 2017.

Page 47: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

46 | P a g e

Figure 5-3: Example of an RDF triple (subject - predicate - object).

To make use of semantics and ontologies in RDF there is the opportunity to integrate an RDF Schema (RDFS) or an Ontology Web Language (OWL). These technologies can be utilized to allow semantic reasoning within RDF triples. Both RDF and RDFS need to be stored in a specific data storage – called Triplestores. Triplestores are databases for the purpose of storing and retrieval of (RDF) triples through semantic queries. Triplestores are specific graph databases, document oriented stores, or specific relational data stores that are designed for RDF data. A high number of available Triplestores is able to store spatial and temporal amended data. The following table gives an overview of the available triplestores (both open-source and commercial products).

Table 5-1: Comparison of selected Triplestores with respect to spatial and temporal data.

Product name Supports spatial data Supports temporal data

Apache Jena Yes No

Apache Marmotta Yes Yes

OpenLink Virtuoso Yes Yes

Oracle Yes Yes

Parliament Yes Yes

Sesame/RDF4J Yes No

AllegroGraph No Yes

Strabon Yes Yes (stSPARQL)

GraphDB Yes No

To query and manipulate RDF data the W3C standardized language SPARQL is used. SPARQL allows

to write queries that can use quantitative (typically proximity based) relations as well as qualitative

relations (spatial “on” and temporal “after”, “from”). Spatial and temporal extensions of SPARQL

(such as stSPARQL and GeoSPARQL) supporting qualitative relations have been proposed but

expressivity and dealing with uncertainty are still major challenges (Belussi & Migliorini, 2014)30.

Nevertheless, spatial queries, using GeoSPARQL can look as depicted in the GeoSPARQL query “list

airports near the city of London”. The result of the query is shown in Figure 5-5, the GeoSPARQL

30 Belussi, A., & Migliorini, S. (2014). A framework for managing temporal dimensions in archaeological data. In Temporal Representation and Reasoning (TIME), 2014 21st International Symposium on (pp. 81-90). IEEE.

Page 48: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

47 | P a g e

query “list airports near the city of London” is depicted. The result of the query is shown in Figure

5-4 the GeoSPARQL query “list airports near the city of London” is depicted. The result of the query

is shown in a map metaphor and as JSON (Figure 5-5).

One major feature of triplestores and SPARQL is the possibility to create federated queries. A

federated query is a single query that is distributed to several SPARQL endpoints (capable of

accepting SPARQL queries and returning results), that individually compute results. Finally, the

originating SPARQL endpoint gathers the individual results, compiles them in one single output, and

returns the output accordingly. This allows to query a large number of different data storages that

can be geographically dispersed (e.g. over Europe).

Figure 5-4: GeoSPARQL query for Airports near the City of London.

Figure 5-5: Result of the GeoSPARQL of Figure 5-4 as map and in the JSON format.

PREFIX gn: <http://www.geonames.org/ontology#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX spatial: <http://jena.apache.org/spatial#> PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT ?link ?name ?lat ?lon WHERE { ?link spatial:nearby(51.139725 -0.895386 50 'km') . ?link gn:name ?name . ?link gn:featureCode gn:S.AIRP . ?link geo:lat ?lat . ?link geo:long ?lon }

Page 49: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

48 | P a g e

5.4.2 CLC-Core Database: spatio-temporal Triple Store Approach The approach followed for the CLC-Core database can be based on a triplestore approach. This

approach could exploit the advantages of NoSQL databases (especially triplestores) while

overcoming the drawbacks of contemporary ORDBMS – especially addressing horizontal scalability

and consistency.

Hence, we propose distributed triplestore approach, where each member country of the European

Union could host their own CLC-core data in an RDF format. Each member country could host a

triplestore that provides a SPARQL endpoint. This approach can help to overcome data integrity and

replication issues that might arise with contemporary ORDBMS. Hence, each member country can

host their own data, and offer them to be queried via the SPARQL endpoint. The databases of the

member countries can be connected via federated queries, which requires only one SPARQL query at

an endpoint at hand (i.e. at exactly one member country’s CLC-endpoint) – see Figure 5-6.

If there are member countries that are not sure if they should operate a triplestore (and a SPARQL

endpoint) with the CLC-Core data, they can also ask another country to host their data on their

infrastructure. Such a concept would give CLC+ the flexibility to react to changing situation in EU

member countries regarding e.g. the ability to host the CLC dataset accordingly. For example, Austria

could host the CLC+ data of another country if they e.g. don’t have the resources to develop the

infrastructure for hosting and publishing the data.

Figure 5-6: Schematic view of the distributed SPARQL endpoints communicating with each other. The arrows indicate the flow of information from a query directed to the French CLC SPARQL endpoint.

Inevitable for such an approach is the development of an underlying shared semantical model – i.e.

an ontology. An abstract semantical model – in RDFS or OWL – can be developed that describes the

abstract model of CLC+ (e.g. classes, properties, etc.), which can be stored in a triplestore. This

allows the utilization of the abstract semantic model in queries (semantic reasoning based on the

developed knowledge basis). An example for semantic reasoning could be the determination of the

land cover type, based on a number of properties of grid cells.

France: CLC+ SPARQL

Germany: CLC+ SPARQL

Austria: CLC+ SPARQL

Italy: CLC+ SPARQL

SPARQL query Federated SPARQL queries Flow of result data

Page 50: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

49 | P a g e

5.4.3 Processing and Publishing of CLC-Core Products To generate CLC+ products based on the distributed triplestore architecture, we need to define the

products requested by the customers. Based on the defined products we need to develop a

migration/transition strategy describing the data flow from triplestores to the desired product.

We are of the opinion that the RDF approach is well suited for distributed storage and ad-hoc

queries and analyses, whereas customers would prefer a single “flat” file containing the CLC+ data.

Due to historic (and practical) reasons such a file could be an ESRI shapefile, an ESRI Geodatabase

file, a GRID file or a GeoJSON file. Such files containing the CLC+ data could be generated in an

automatic way by using a spatial Extract Transform and Load (ETL) tool like SAFE FME31, geokettle32

(open source), GDAL/OGR33 (open source). The migration can be performed with e.g. different

geographical or temporal coverage. These files can be hosted and published on a central server,

where they are available for the public (see Error! Reference source not found.).

Figure 5-7: Intended generation of CLC+ products based on the distributed triplestore architecture.

5.4.4 Conclusion and critical remarks The intended fine spatial and temporal resolution of CLC+ will result in high data volumes with high

turnover rates (velocity) – which are two characteristics of Big Data34. Thus, it seems advisable to

adhere to the nature of these datasets when planning a sustainable architecture for CLC+. As

contemporary ORDBMS are not designed for Big Data applications, which manifests in their

weaknesses regarding horizontal scaling, consistency and replication, we advise to have a deeper look

into NoSQL concepts. Especially triplestores with RDF schemas seem appropriate to cope with the

requirements of CLC+, with respect to data volume, data turnover rate, distributed architecture and

semantic queries (and reasoning). Although, NoSQL datastores are widely used in business

applications like Facebook, Twitter, Amazon, and Google, the technology and theoretical foundation

is still under active development. Hence, NoSQL databases do not reach the technological maturity of

contemporary ORDBMS, like Oracle or PostgreSQL. In our eyes, this is a weakness and an opportunity.

By using NoSQL databases, this CLC+ is on the edge of the technological developments and is able to

contribute to the future of NoSQL databases – and to strengthen the spatio-temporal abilities of them.

31 https://www.safe.com/fme/fme-server/ 32 http://www.spatialytics.org/projects/geokettle/ 33 http://www.gdal.org 34 Hilbert, M. (2015): "Big Data for Development: A Review of Promises and Challenges. Development Policy Review.".martinhilbert.net. visited: Nov. 08, 2017.

Web Server (public)

Spatial

ETL Tool

Triple Store #1

Triple Store #2

Triple Store #n

SHAPE File France

SHAPE File Germany

SHAPE File Austria

SHAPE File Finland

GeoJSON France

GeoJSON Germany

GeoJSON Austria

GeoJSON Finland

Page 51: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

50 | P a g e

5.5 Short-form of CLC-Core end product technical specifications

CLC-Core

[Example image if available.]

Description

CLC-Core is a consistent, multi-use grid database repository for environmental information populated

with a broad range of land cover, land use and ancillary data, forming the information engine to

deliver and support tailored thematic information requirements.

Input data sources (EO & ancillary)

EO data used in the production

No direct EO inputs.

Ancillary data used in the production

CLMS dataset

MS datasets

Other relevant environmental datasets

Thematic Content

EAGLE Data Model: TBD

Methodology

CLC-Core will be based on a grid spatial model which consists of square spatial units of uniform size

and shape that can have an (internally) heterogeneous thematic composition, i.e. the grids are

“geometrically uniform polygons”. Each grid cell has a unique ID that is related to a database

containing the attribute data. In a first step, CLC-Core will be populated with existing thematic

information from the CLMS, such as LoCo, HRL, CLC-Backbone, CLC, and any other data, such as in-

situ and MS data, mainly on land use and agricultural themes.

Geometric resolution (Scale)

n/a

Geographic projection / Reference system

ETRS89-LAEA

ETRS89 Lambert Azimuthal Equal-Area (EPSG: 3035, http://spatialreference.org/ref/epsg/etrs89-

etrs-laea/)

Page 52: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

51 | P a g e

Coverage

EEA 32 Member Countries and 7 Cooperating Countries, i.e. the full EEA39 (Albania, Austria, Belgium,

Bosnia- Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France ,

Germany, Greece, Hungary, Iceland, Ireland, Italy, Kosovo (under the UN Security Council Resolution

1244/99), Latvia, Liechtenstein, Lithuania, Luxembourg, Former Yugoslavian Republic of Macedonia,

Malta, Montenegro, the Netherlands, Norway , Poland, Portugal , Romania, Serbia, Slovakia,

Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom) who are participating in the regular

CLC dataflow. The number of countries involved is 39. The area covered is, approximately, 6 Million

km2.

Geometric accuracy (positioning accuracy)

Grid: Less than half a grid cell

Spatial resolution / Minimum Mapping Unit (MMU)

Grid: 1 ha

Min. Width of linear features

Grid: 100 m

Topological correctness

[TBD]

Raster coding

n/a

Thematic accuracy (in %) / quality method

To be explored as it will aggregate a disparate range of data types, products and sources.

Target / reference year

2018 (full vegetation season, starting partly in autumn 2017 in southern Europe)

Up-date frequency

[TBD]

Information actuality

Plus and minus 1 year from target year (2017 – 2019). A longer time window may be required for the

ancillary datasets.

Delivery format

Grid

Data type

Grid

Naming conventions

[TBD]

Medium

[TBD]

Delivery reliability

[TBD]

Page 53: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

52 | P a g e

Delivery time

[TBD]

Archive

[TBD]

Page 54: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

53 | P a g e

6 CLC+ - THE LONG-TERM VISION

CLC+ is expected to represent the new long-term vision for land cover / land use monitoring in

Europe. CLC+ shall meet the requirements of today and tomorrow for European LC/LU information

needs, building upon the EAGLE concept and expertise, while still have capabilities within the 2nd

generation CLC to guarantee backwards compatibility with the original CLC datasets. That implies:

addressing and overcoming the limitations of the spatial resolution of CLC;

overcoming the problems created by the use of heterogeneous classes, while still allowing

their creation in CLC-Legacy to maintain the “landscape vision” of CLC;

a better discrimination of specific landscape features by introducing a limited number of

new or sub-level classes;

keeping the CLC nomenclature unchanged to a large degree, while allowing it to address

known major shortcomings in the classical CLC nomenclature (by attribution and/ or

introduction of subclasses);

making use of a consistent and reproducible method of mapping changes at the 1 ha level

for future land monitoring at a much higher spatial resolution than with the traditional CLC.

Technically CLC+ is proposed as a raster data set with 1 ha spatial resolution (100 x 100 m), created

from CLC-Core, i.e. already including information from CLC-Backbone (e.g. pixel classification),

existing HRL, LoCo and national data stored in CLC-Core. CLC+ shall be understood as one specific

instance (i.e. realisation) of CLC-Core in which the stored information is aggregated to a specific

nomenclature, using the geometry defined by the CLC-Core grid cells35.

The 2018 reference year provides a unique opportunity for creating both a traditional CLC update

(by the MS) and a CLC-Legacy (via CLC-Backbone, CLC-Core and CLC+). The proposed nomenclature

and structure of CLC+ will not deviate too much from the classical CLC to allow a comparison of both

approaches.

Obviously, many other land monitoring products with various nomenclatures and spatial frameworks

may also be defined on the basis of CLC-Core in the future, also addressing the separation of land

cover and land use (already foreseen in CLC-Core), the use of attributes in the object

characterisation or the inclusion of ecosystem or habitat information. But this will be topic of future

discussions and developments, not part of the current work.

Despite the 1 ha MMU of CLC+, the information provided from the pixel based classification of CLC-

Backbone and stored in CLC-Core will allow the reporting of fractions of 1 ha units to respond to

more detailed (i.e. 0.5 ha) reporting obligations.

Due to the higher geometric resolution, the use of some of the often-debated heterogeneous

classes, like 2.4.2 (complex cultivation pattern) and 2.4.3 (land principally occupied by agriculture,

with significant areas of natural vegetation) will be significantly reduced, if not disappear altogether.

35 For this first realisation of CLC+ the vector geometry of CLC-Backbone is not used. Nonetheless, this could be considered as further option to be explored in the future.

Page 55: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

54 | P a g e

Other conceptually heterogeneous classes like 2.4.4 (agro-forestry) will remain as they also have a

meaning at the 1 ha level.

The CLC+ nomenclature should be formulated in a way that it is compatible with the traditional

level-3 CLC classes, and takes into account user needs expressed e.g. in the CLC survey conducted by

ETC-ULS in 201636, former CLC production experience, as well as feasibility.

Considering the frequency of classes in CLC and proposals from MS, the following are primary candidate classes for division into subclasses:

121 – green energy industry separated

133 – “construction of artificial features” and “nature construction"

142 – cemeteries and sport/recreation (subclasses e.g. sport arenas, sport fields and racecourses, golf courses, ski slopes, parks (outside urban fabric), allotment gardens, archaeological sites and museums, holiday villages (temporary residential areas), sport airfields)

123 – agricultural grasslands (pastures, meadows) and non-agricultural grasslands under strong human impact

324 – forests growth areas (succession, afforestation, reforestation) and decline of forests (clearcutting, damage)

512 – natural and man-made

Attributes that can be applied for a number of classes are proposed to have a priority in introduction to CLC+. These are e.g.:

abandoned / out of use: can be applied for disused artificial sites / brownfields (121), uncompleted construction sites (133), suburban unused ruderal areas (231), salines (422).

irrigated: all 2xx classes

burnt: all 22x and 3xx classes

presence of windmills: all 2xx, most 3xx classes and 523

As the original CLC nomenclature is not a pure land cover classification there will be the need to

make use of more information than provided by the Copernicus Land Monitoring Service. Table 6-1

provides an overview of the traditional 44 CLC classes and information needed in addition to the

information provided by the Local Components, the High Resolution Layers and the land cover

information collected by CLC-Backbone in order to derive a specific CLC class. The new nomenclature

will be demanding similar and probably even more ancillary information.

The Hungarian test case37 for implementing the EAGLE concept in a bottom up CLC production has

clearly shown that content and quality of a bottom up derived CLC is ultimately dependent on input

data on:

• Land Use (critical);

• Crops and use intensity (e.g. from LPIS)

• Habitat (vegetation composition) information.

The situation is different for each CLC class and Table 6-1 clearly shows that many CLC classes cannot

be derived without additional information on the land use aspect of the area. The current LoCo data

36 Service Contract No 3436/R0-Copernicus/EEA.56586 Task 3: Planning of cooperation with EEA member and cooperating countries. FINAL report in preparation of the CLC2018 exercise. 37 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the European Environment Agency and the European Environment Information and Observation Network. Final report: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country

Page 56: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

55 | P a g e

are only partly able to mitigate this lack of land use information as they are only available in selected

areas, not covering all of the European territory.

In almost all cases this information gap can be closed with data from MS (databases or photo-

interpretation). In case MS data is not available, a few classes can be approximated via European

level information, but most certainly at the expense of data accuracy and precision. The MS

information should be provided as input to CLC-Core. In case of missing national land use

information as well as no other European-level replacement data38, traditional CLC data or

production methods (e.g. photo interpretation) might need to be used as back-up solution.

38 Wall-to-wall HRL can be used to address density issues or leaf types, but not they cannot be used as replacement information of land use.

Page 57: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

56 | P a g e

Table 6-1: List of current CLC classes and requirements for external information

Page 58: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

57 | P a g e

7 CLC-LEGACY

The history of European land monitoring is a story of permanent evolution of approaches and

concepts, resulting in different solutions for individual needs and specifications. In this context, the

CLC specifications provided the first set of European-wide accepted de-facto standards, i.e. a

nomenclature of land cover classes, a geometric specification, an approach for land cover change

mapping and a conceptual data model (Heymann et al, 1994). In this sense, CLC represents a unique

European-wide, if not global, consensus on land monitoring, supported by 39 countries.

Being the thematically most detailed wall to wall European LC/LU dataset and representing a time-

series started in the early 1990’s CLC data became the basis of several analyses and indicator

developments, like Land & Ecosystem Accounts (LEAC) for Europe. It is an obvious requirement that

the planned new European land monitoring framework (2nd generation CLC) has to be able to

ensure a possible maximum of backwards compatibility with standard CLC data.

This comparability will be ensured by CLC-Legacy – a data set with 25 ha MMU and the standard 44

thematic CLC classes, but derived by spatial and thematic generalisation from CLC+ (and/or CLC-

Core, tbd). Considering today’s already existing different CLC production methods, i.e. traditional

visual photo-interpretation vs. national bottom-up solutions, shows that those specific

methodologies result in CLC data with slightly different characteristics in terms of LC/LU statistics or

fine geometric detail, but still respecting and adhering to the general CLC specifications.

Although original CLC vector data are still used mainly for cartographic purposes, in practice the

most commonly used versions of CLC data are CLC status and CLC change raster layers. That is why

CLC-Legacy is also proposed as a 100 x 100m raster data set.

As already mentioned in chapter Error! Reference source not found., 2018 offers the unique

opportunity of comparing two different production approaches for CLC, i.e. the classical approach

implemented by the MS (CLC2018) and a modernised approach making use of capacities from the

Copernicus service industry and MS. Comparing CLC2018 and CLC_Legacy_2018 will allow to assess

the strengths and weaknesses of both approaches.

Production method is, however, not the only source of variation. The specification of the

nomenclature, combined with the variation in land cover and landscape across the European

continent, is necessarily leading to large variability within each class and partial overlap between

classes (Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central

mountain area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely

vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are

correct.).

Variation is inherent in the CLC methodology and variation due to the production method should not

cause any major concern.

Page 59: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

58 | P a g e

Figure 7-1: CLC2012 along the border between Norway and Sweden (red line). The central mountain area covered with sparsely vegetated lichen and calluna heath is assigned to 3.3.3 Sparsely vegetated areas in Norway and 3.2.2 Moors and heathland in Sweden. Both classifications are correct.

For the practical creation of a CLC-Legacy 201839, the production process is facing a classical dilemma

that is associated with a change of methodology:

How to derive a CLC-Legacy 2018? The production process of CLC-Backbone, CLC-Core and CLC+ will

only provide status information for 2018, not any changes relative to CLC2012. A possible solution to

this dilemma (i.e. to map changes relative to the last update) is the approach used by Germany that

moved from a traditional CLC mapping approach to an automated approach in 2012, using a “status

layer first” approach instead of “change mapping first”. The result of that methodological change

was a delineation of CLC2012 objects based on national high spatial resolution data with a different

geometry than the previous CLC2006 objects. For an illustration, see also the areas marked in Figure

7-2.

In a subsequent step it was then necessary to separate the differences between CLC2006 and the

new CLC2012 into “technical changes” (due to the new mapping approach) and “real changes”. This

step involved a mostly manual revision and the constitution of the change layer 2006-2012. In

subsequent update cycles it may be possible to derive changes from the new versions of CLC+, by

creating a difference layer, then manually separating real changes from differences between

datasets (resulting from differences in CLC Core input data). Additional manual change detection

might be needed for changes that are not automatically detectable due to lack of up-to-date input

information (e.g. changes of land use without land cover change).

39 The process of deriving CLC-Legacy from CLC-Backbone, CLC-Core and finally CLC+ should not be confused with the traditional creation of CLC2018 by the MS.

Page 60: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

59 | P a g e

CLC2006 CLC2012

Figure 7-2: Result of changing the CLC implementation method in Germany

For a future CLC2024 (assuming the 6 year update cycle) the change mapping approach could be the

combination of the traditional “change mapping first” approach and the ‘update first’ approach.

Information from 1 ha CLC+2018 and CLC+2024 would be used to derive 1 ha changes with

strong manual control, to separate differences from real changes.

These 1 ha changes would be aggregated to 5 ha change objects (i.e. CLC-Changes2018-2024).

The 5 ha change objects (2018-2014) would then be added to CLC2018 to create a CLC2024.

Methods and approaches for how to obtain 25 ha status layers and 5 ha changes from higher spatial

resolution data exist already in a number of MS.

7.1 Experiences to be considered

The following experiences can be considered for establishing the grounds for creating CLC-Legacy:

CLC accounting layers are used as input for EEAs Land and Ecosystem Accounting (LEAC)

system. The methodology was developed by ETC/ULS to create these layers, as well as for

the raster generalization of CLC accounting layers40.

Methodologies for bottom-up creation of CLC data have been developed by several CLC

national teams. This includes spatial and thematic aggregation of in-situ based high spatial

resolution data (e.g. Norway, Finland, Germany, The Netherlands, United Kingdom, Spain).

In most cases, rules were developed to allow thematic conversion in an automated or semi-

automated fashion. For spatial generalisation different techniques are available, depending

on the starting point in each of the countries.

CLC+ national test case for Hungary41

40 Generalization of adjusted CLC layers. ETC/ULS report 2016. 41 Negotiated procedure No EEA/IDM/R0/16/001 based on Regulation (EC) No 401/2009 of 23.4.2009 on the European Environment Agency and the European Environment Information and Observation Network. FINAL REPORT: Implementation of the EAGLE approach for deriving the European CLC from national databases on a selected area in one EU country

Page 61: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

60 | P a g e

CLC Accounting Layers

One of the strongest assets of CLC is its ability to provide consistent information on land cover

changes and trends of the environment over multiple decades. Any new land monitoring

framework therefore must ensure the continuity of the land cover change information, such as

provided by the CLC Accounting Layers.

Specific characteristics and the variety of the CLC creation and update methodologies have

shown, that while CLC change data are well representing land cover changes between two certain

reference dates, there are significant "breaks" in the continuity of CLC time-series both in

statistical and in geometric sense. The solution applied for the harmonization of CLC time-series

is based on the idea to combine CLC status and change information to create a homogenous

quality time series of CLC / CLC-change layers for accounting purposes fulfilling the relation:

CLC change = CLC accounting new status – CLC accounting old status

Additional criteria of the realization were:

• Add more detail to the latest CLC status layer (CLC2012) from previous CLCC

information and use this "adjusted" layer as a reference

• Create previous CLC status layers by "backdating" of the reference, realized as

subtracting CLCC based information for CLC2012

Based on the above principles, the working steps of the creation of adjusted CLC layers (2000-

2006-2012) were as follows:

1 Include formation information from CLC-change layers into current CLC2012 status by

creating adjusted CLC2012 layer

a. Overwrite CLC2012 with first with code_2006 from CLC-change 2000-2006.

Intermediate result: A1_CLC2012

b. Overwrite F1_CLC2012 with code_2012 from CLC-change 2006-2012. Result:

A2_CLC2012

2 Create adjusted CLC2006 by including consumption information (code 2006 from CLC-

change 2006-2012) into A2_CLC2012. Result: A1_CLC2006

3 Create adjusted CLC2000 by including consumption information (code_2000 from CLC-

change 2000-2006) into A1_CLC2006. Result: A1_CLC2000

Due to the fact that the MMU is different for CLC status (25 ha) and CLC change (5 ha) layers,

resulting CLC accounting layers may include at several locations more spatial detail, than original

CLC status layers. Although this fact contradicts the original CLC specifications, still CLC-

accounting layers are considered as the most consistent harmonized time-series of CLC data.

CLC accounting layers are always created on the basis of the latest raster CLC status layer. Fine

details are added to this layer from the formation codes of previous CLC-change inventories. All

the former CLC accounting status layers are calculated backwards by overlaying CLC-change

consumption codes.

Page 62: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

61 | P a g e

Figure 7-3 to Figure 7-5 illustrate different techniques and results of national generalisation

approaches to produce standard CLC from more detailed data.

Figure 7-3: Generalisation technique applied in Norway based on expanding and subsequently shrinking. The technique exists for polygon as well as raster data.

Original 3000m² Level III 1ha Level III 5ha Level III 25ha Level III

Figure 7-4: Generalization levels used in LISA generalization in Austria

Figure 7-5: Examples of 25 ha, 10 ha and 1 ha MMU CLC for Germany

Page 63: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

62 | P a g e

CLC2012

Generalized CLC , MMU = 25 ha

High spatial resolution CLC, MMU = 20m cells

Figure 7-6: CLC+ national test case results (Hungary, Budapest-North). Comparison of traditional CLC (top) with bottom-up CLC (middle), created from high-resolution national CLC3 (bottom). The generalized CLC is more fragmented than CLC2012, even if 25ha MMU is kept. The high-resolution CLC illustrates well the potential laying in CLC+.

Page 64: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

63 | P a g e

8 ANNEX 1 – CLC NOMENCLATURE

Page 65: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

64 | P a g e

9 ANNEX 2: CLC-BACK BONE DATA MODEL

The complete list of feature types within CLC-Backbone contain:

INSPIRE o application schema: land cover vector

Land cover dataset Land cover unit

INSPIRE extension o Application schema: land cover extension (not endorsed)

Parametric observation

EAGLE o Land cover component

Each land cover dataset can contain several land cover units (in our case Level 2 landscape objects).

Each land cover unit contains a valid geometry (polygon border in our case). A specific land cover

unit can consist of one or more land cover components (either in time and/or space), each of which

with a specific lifespan.

It is important to note that the parametric observation as defined in the (not endorsed) INSPIRE land

cover extension application schema is an absolutely necessary part to attach specific observations

(e.g. temporal profile of NDVI) to the objects (land cover units).

INSPIRE main feature types: land cover dataset and land cover unit

Figure 9-1: INSPIRE main feature types: land cover dataset and land cover unit (from land cover extended application schema)

Page 66: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

65 | P a g e

EAGLE extension to land cover unit and new feature type land cover component

Figure 9-2: EAGLE extension to land cover unit and new feature type land cover component

Each land cover unit can contain one or more so called parametric observations. According to

INSPIRE these parametric observations are limited to percentage, countable and presence

parameter. In order to enlarge the usage and to integrate the required observation (e.g. mean NDVI,

or vegetation trajectories) of a single date this parameter type is enlarged with the data type

“Indexparameter”.

Therefore, the CLC-Backbone data model includes the single date observations as ParameterType

(with a specific observationDate and a measurement as real number (0…1 or 0….100). A new CLC-

Backbone base model is shown in Figure 9-3.

Page 67: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

66 | P a g e

Figure 9-3: The new CLC-Basis model (based on LISA)

The new CLC-Backbone data model includes now three feature types:

LandCoverDataset

LandCoverUnit

LandCoverComponent And two additional datatypes

LandCoverUnit o ParameterType

Name: name of the parameter type (e.g. NDVI_max, NDVI_mean, NDVI_min, …)

ObservationDate: date of the observation (e.g. acquisition date of satellite scene)

IndexParameter: Value of e.g. NDVI_max, etc. to be stored here

LandCoverComponent o Occurrence

Cover percentage of a single land cover component within the LC Unit o Overlaying

1….if the LC component is overlaying another component (e.g. temporarily covered with water, snow)

0…default-value, if the component is not overlaying o validFrom: The time when the activity complex started to exist in the real world. o validTo: The time when the activity complex no longer exists in the real world.

Integration of temporal dimension

Page 68: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

67 | P a g e

9.1.1 Integration of temporal dimension into LISA

As LISA was historically set up as single snap-shot within a 3-year interval (repetition cycle of

orthofoto campaign), no temporal phenomena could be modelled. The underlying assumption for

the adaptation of the model is that the land cover unit establishes the more or less stable geometry

over time (>1 year continuity).

Within the land cover unit several land cover components can exist with a specific life time. The land

cover unit can be considered as stable container for these changing land cover components.

Types of changes:

Changes within a land cover unit is in general regarded as status change.

We consider changes within the land cover components as cyclic changes.

Long term ecosystem changes (conditional changes) will be rather recorded in various attributes that

will be defined within the products nr. 5.

The implementation of the temporal dynamics is realized using a 1:many relation between the

LandCoverUnit and the LandCoverComponent in combination with the defined Lifespan of the

specific component.

Example

A typical agricultural field the land cover unit could have the following LC components:

Bare soil January – March

Herbaceous vegetation April-June

Bare soil 2nd half of June

Herbaceaous vegetation July-September

Bare soil 2nd half of September

Herbaceous vegetation October-December A web-based application within the GSE Cadaster Environment project (financed by ESA) is available under: http://demo.geoville.com/shiny/cadenv/p2/wien/ to demonstrate the above-mentioned concept (Figure 9-4).

Page 69: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

68 | P a g e

Figure 9-4: illustration of temporal NDVI profile and derived land cover components throughout a vegetation season (Example taken from project Cadaster Env Austria, © Geoville 2017)

Important note:

It is not foreseen in the technical specifications that each temporal LC component is stored in the

underlying CLC Base production database. The production chain may just provide the final

classification of the LC unit in the final CLC-Backbone product, without giving information on the

appearance of land cover components throughout.

9.1.2 Integration of multi-temporal observations into LISA

Spectral profiles over time constitute a valuable information source (Figure 9-5), not only for the

automated classification, but as well for the visual interpretation of object characteristic. For each

CLC-Backbone object various time series of NDVI-indices (NDVI_mean, NDVI_max, NDVI_min, etc.)

can be stored in the data model using the parametricObservation.

Page 70: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

69 | P a g e

Figure 9-5: Draft sketch to illustrate the principles of the combined temporal information that is stored in the data model

The sketch above illustrates the principles of the combined temporal information that is stored in

the data model

The graph above illustrates the variation within a year within a LandCoverUnit. The geometric borders of the unit are pre-fixed based on the Level 2 landscape object geometries

o red points: single observations from each S-2 scene calculated on object level (Level 2 landscape polygon)

From these observations (stored in the parametricObservation of the LandCoverUnit the complete temporal profile for the specific object can be reconstructed as graph and visually illustrated to domain experts.

o Brown and green bars: lifespan of the single LandCoverComponents (green – herbaceous vegetation and brown – bare soil) within a Land Cover unit

o Dark green bar: classification of the land cover unit according to the information of the sequential appearance of land cover components

Within the LandCoverComponent the attributes validFrom and validTo are used to store the life-time

information of the component. If only the start, but not the end of a LandCoverComponent can be

observed (e.g. due to missing observations in time), only the validFrom can be used. The element

will automatically be set to invalid, if a new LandCoverComponent appears (with 100% coverage of

the landcover unit).

Page 71: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

70 | P a g e

10 ANNEX 3: OSM DATA

10.1 OSM road nomenclature The following table identifies all OSM road types that should be considered for CLC-Backbone Level 1

delineation. In many countries, most roads are tagged as “unclassified”. Only highway =

motorway/motorway_link implies anything about quality. (Source:

https://wiki.openstreetmap.org/wiki/Key:highway)

CLC-Backbone

Potential 20m width

Key Value Comment

y wide highway motorway A restricted access major divided highway, normally with 2 or more running lanes plus emergency hard shoulder. Equivalent to the Freeway, Autobahn, etc..

y highway trunk The most important roads in a country's system that aren't motorways. (Need not necessarily be a divided highway.)

y wide highway primary The next most important roads in a country's system. (Often link larger towns.)

y highway secondary The next most important roads in a country's system. (Often link towns.)

y highway tertiary The next most important roads in a country's system. (Often link smaller towns and villages)

y highway unclassified The least most important through roads in a country's system – i.e. minor roads of a lower classification than tertiary, but which serve a purpose other than access to properties. Often link villages and hamlets. (The word 'unclassified' is a historical artefact of the UK road system and does not mean that the classification is unknown; you can use highway=road for that.)

y highway residential Roads which serve as an access to housing, without function of connecting settlements. Often lined with housing.

y highway service For access roads to, or within an industrial estate, camp site, business park, car park etc. Can be used in conjunction with service=* to indicate the type of usage and with access=* to indicate who can use it and in what circumstances.

Link roads

Y wide highway motorway_link The link roads (sliproads/ramps) leading to/from a motorway from/to a motorway or lower class highway. Normally with the same motorway restrictions.

y highway trunk_link The link roads (sliproads/ramps) leading to/from a trunk road from/to a trunk road or lower class highway.

y highway primary_link The link roads (sliproads/ramps) leading to/from a primary road from/to a primary road or lower class highway.

y highway secondary_link The link roads (sliproads/ramps) leading to/from a secondary road from/to a secondary road or lower class highway.

y highway tertiary_link The link roads (sliproads/ramps) leading to/from a tertiary road from/to a tertiary road or lower class highway.

Special road types

y highway track Roads for mostly agricultural or forestry uses. To describe the quality of a track, see tracktype=*. Note: Although tracks are often rough with unpaved surfaces, this tag is not describing the quality of a road but its use. Consequently, if you want to tag a general use road, use one of the general highway values instead of track.

Page 72: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

71 | P a g e

10.2 OSM roads: completeness Estimation of OSM completeness according to parametric estimation in study of Barrington-Leigh

and Millard-Ball (2017)42. A sigmoid curve was fitted to the cumulative length of contributions

within OSM data and used to estimate the saturation level for each country.

country ISO-Code frcComplete_fit OSM length [km]

Serbia SRB 74,3% 61.233

Bosnia and Herzegovina BIH 86,5% 37.877

Turkey TUR 89,7% 501.384

Sweden SWE 92,7% 280.107

Norway NOR 93,6% 132.527

Hungary HUN 93,8% 90.670

Romania ROU 94,3% 162.844

Ireland IRL 96,0% 107.263

Spain ESP 96,2% 509.937

Liechtenstein LIE 96,3% 359

Croatia HRV 96,8% 60.119

Bulgaria BGR 96,9% 80.983

Cyprus CYP 97,0% 12.091

Albania ALB 97,6% 22.875

Italy ITA 97,9% 594.384

Belgium BEL 98,3% 103.778

Lithuania LTU 98,5% 69.885

Poland POL 98,5% 358.733

Iceland ISL 98,6% 17.871

France FRA 99,9% 1.164.238

Denmark DNK 100,3% 96.656

Estonia EST 100,3% 42.878

Portugal PRT 100,4% 150.176

United Kingdom GBR 100,4% 443.265

Netherlands NLD 100,5% 136.118

Greece GRC 100,8% 156.586

Slovakia SVK 101,4% 41.566

Austria AUT 101,4% 128.934

Luxembourg LUX 101,5% 6.210

Germany DEU 101,6% 744.304

Latvia LVA 101,7% 61.066

Montenegro MNE 102,2% 12.346

Macedonia (FYROM) MKD 102,4% 15.174

Switzerland CHE 103,0% 73.066

Czech Republic CZE 104,0% 112.441

Finland FIN 105,3% 231.298

Kosovo XKO 107,4% 13.297

Malta MLT 109,8% 2.241

Slovenia SVN NoData 36.041

42 Barrington-Leigh C, Millard-Ball A (2017) The world’s user-generated road map is more than 80% complete. PLOS ONE 12(8): e0180698. https://doi.org/10.1371/journal.pone.0180698

Page 73: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

72 | P a g e

11 ANNEX 4: LPIS DATA IN EUROPE

The land parcel identification system is the geographic information system component of IACS

(Integrated agricultural control system). It contains the delineation of the reference areas, ecological

focus areas and agricultural management areas that are subject for receiving CAP-payments. Due to

different national priorities in the implementation of the CAP the LPIS-data are not homogeneous

throughout Europe. Some countries use rather aggregated physical blocks as geometric reference

and other countries use single parcel management units as smallest geographic entity within LPIS.

Three types of spatial objects can in principal be differentiated within LPIS (Figure 11-1):

Physical block

Field parcel

Single management unit

A physical block is defined as the area that is surrounded by permanent non-agricultural objects like roads, rivers, settlement, forest or other features. A physical block normally contains several field management units that may belong to various farmers.

A field parcel is owned by a single farmer and is the dedicated management unit for a specific agricultural main category. It contains either grassland or arable land, but never both categories. It may be split into several single sub-management units.

A single management unit contains a specific crop type (e.g. winter wheat, maize) or a specific grassland management type (e.g. one cut meadow, two cut meadow, 3 and more cut meadow).

The relation between field parcels and single management units is a 1:m relation, where 1 management unit (field) can contain 1 or many sub-management units (single parcels).

Remark to the relation with the real estate database:

Land properties are recorded in many countries in the real estate database. An elementary part of the real estate database is the cartographic representation of real estate parcel boundaries. These parcels should not be mixed up with the field parcels. The parcels in the real estate database constitutes owner ship rights and are documented as official boundaries. The boundaries in agricultural management units however are delineated to derive correct area figures as base for payments under the CAP. In most cases the parcel boundaries between the real estate database and the LPIS will be the same, but not in all cases.

Page 74: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

73 | P a g e

(a)

(b)

(c)

Figure 11-1: Illustration of (a) physical block, (b) field parcel and (c) single management unit

From year to year the availability of national LPIS data is increasing. An overview of available and

downloadable datasets has been created based on a previous ETC-ULS report by Synergise within

the Horizon 2020 project LandSense (Figure 11-2). The minimum thematic detail that should be

harmonized from LPIS data is the classification into (1) arable land, (2) permanent grassland, (3)

permanent crops, (4) ecological focus areas and (5) others. As these categories contain quite

heterogeneous classes on national level a harmonization is quite challenging, as they are applied

quite inconsistently across Europe. Nevertheless, the geometric delineation of field parcels at any

level could already provide valuable input given that LPIS is available for the large majority of

countries. Unless a critical number of countries with available LPIS data is reached the inclusion of

LPIS will add more bias in the resulting geometric CLC-backbone compared to a method solely based

on pure image segmentation.

Page 75: D4: Draft design concept and CLC-Backbone and CLC-Core ...€¦ · conceptual design and technical specifications. The approach adopted for the development of the 2nd generation CLC

74 | P a g e

Figure 11-2: Overview of available GIS datasets LPIS and their thematic content © Synergise, project LandSense