Improvement of the use of administrative sources …...the use of geospatial data from the main...

The project is funded by the European Statistical System

National Statistical Institute

IMPROVEMENT OF THE USE OF ADMINISTRATIVE SOURCES (ESS.VIP ADMIN WP6: pilot studies and applications)

Subtitle: Establishing a point-based foundation for address geocoding of statistical and administrative data

Grant agreement 07112.2017.007-2017.441

Final methodological report

Sofia, April 1, 2019



Table of contents

List of acronyms and abbreviations

Executive summary

I. Introduction and background

II. Summary of project activities and main results

III. Proposed solution for establishing a point-based foundation for address geocoding at NSI

Conclusion

References

Annex: Summary table of the analysis and conclusions on selected data sources to establish a point-based infrastructure

List of acronyms and abbreviations:

AO Addressable Object

ATU Administrative Territorial Units

CA Geodesy, Cartography and Cadaster Agency (in short Cadastral Agency)

CLU Classificatory of Localisation Units

DG CRAS DG Civil Registration and Administrative Service (in short Civil Registration)

EKATTE Unified classificatory for administrative-territorial and territorial units

GSGF Global Statistical Geospatial Framework

SDG Sustainable Development Goals

NAR Centralized Information System “Address Register” (in short National Address Register)

NRPP National Register of Populated Places

NSDI National Spatial Data Infrastructure

NSI National Statistical Institute of Bulgaria

SEGA State e-Government Agency

SBR Register of Statistical Units (Statistical Business Register)

SPR Information System Demography (Statistical Population Register)

UN-GGIM United Nations initiative on Global Geospatial Information Management


Executive summary

The purpose of this report is to propose a methodological framework to guide the work on

establishment of a point-based geocoding infrastructure for Census 2021 and post-census

statistical production. The paper presents the use of geospatial data at NSI and proposes possible

solutions to enable consistent address geocoding at organizational level. It outlines an

implementation strategy that could be followed by executing certain tasks and activities. More

specifically, the report tries to answer the following questions:

What information should be used, and how?

What institutional arrangements are needed?

How to harmonise address data and enable it spatially?

How to make geocoding to a point a consistent process?

What can improve the situation?

To answer these questions and build a consistent analytical approach, the project team

used a range of inputs: outcomes from the project kick-off meeting, assessment results from a

specially designed survey sent to the partner organisations and some of the subject matter

departments of NSI, the status quo and plans for establishing and maintaining important

registers at national level, results from EFGS/GEOSTAT projects, UN-GGIM papers

on integration of statistical and geospatial information and on core geospatial data themes,

recommendations on data content for the 2021 Population and Housing Censuses, and a number

of good practices.

The report is divided into three sections.

Section I provides an overview on geospatial data use at NSI, introduces the strategic and

legislative preconditions for identification of needs, describes the challenges and issues in the

national context and summarises the general recommendations, taking into account the current

national context.

Section II gives a summary of the project activities and main results, marking the scope of

methodological work and the steps followed to complete the project tasks.

Section III lays out the proposed solution for establishment of a point-based geocoding

framework for Census 2021. It focuses on the data sources and on expanding and improving

the use of geospatial data from the main spatial data provider (Cadastral Agency). The section

elaborates on a conceptual model to harmonise address data collected from a range of sources,

methods for geo-enabling this data and organisation of geocoding, as well as findings and

recommendations for organisational set-up to implement the infrastructure.

Expanding and improving the use of cadastral data within NSI and close collaboration with CA

was essential for the project work and is essential for NSI to meet the requirement for full

geocoding of census data and integration of statistical and geospatial information.


[NOTE on basic data configurations - The ESSnet KOMUSO typology] The data sources in the

focus of the project were not intended for statistical content production. The sources were

assessed particularly for their value to provide location for statistical information, as purely

infrastructural data.

Acknowledgements

The project team from NSI would like to thank all the partner institutions and experts involved

in this project, namely:

o Geodesy, Cartography and Cadaster Agency

o DG Civil Registration and Administrative Service

o Municipality of Gabrovo

o State e-Government Agency


I. Introduction and background

1. Background

GIS technologies and the concept of geospatial data were first introduced at NSI in

1997-1998, in support of the pre-enumeration phase, the so-called census mapping process.

At the beginning, NSI created digital models and spatial data layers for a few

settlements by scanning and digitizing paper maps and plans and, after

Census 2001, added census data to these digital models and layers. This method of spatial data

collection and processing continued until 2006, when the Cadastral Agency provided, on a

voluntary basis, the first vector data from the digital cadastral maps. In 2009, recognising the

importance of having official and more accurate spatial data, NSI and the CA signed a

bilateral agreement for data exchange to support Census 2011. The agreement was the

main precondition for cooperation and for implementing joint activities.

2. Identifying the needs

Certain strategic and legislative drivers create the need to integrate high-resolution

geospatial data in support of statistical production at NSI.

2.1. Strategy for Development of the National Statistical System of the Republic of

Bulgaria 2013 - 2017, amended by an extension until 2020 1

The strategy highlights as a horizontal priority the process of establishing conditions for

production and integration of spatial (georeferenced) information with statistical information

by:

Using the infrastructure for spatial information in the European Community (INSPIRE),

in particular via an EU geoportal.

Integration of statistical data, when applicable, in order to establish an infrastructure

with multiple information sources for providing a spatial and temporal analysis.

Enlarging the use and dissemination of regional geo-referenced statistical information.

Furthermore, this strategic document also recognises that the process of integration of statistical

and geospatial data will be one of the main challenges that will determine the development of

the NSI until 2020.

2.2. Law on Population and Housing Census 2021 2

In 2019, the national law on population and housing census 2021 was adopted. The law

identifies certain stages and actions of Census preparations that require the use of official

geospatial data, orthophoto imagery, collaboration with relevant bodies as well as the

application of georeferencing activities in order to ensure reliable, detailed and comparable data

from the survey. The law is in line with the EU regulatory framework related to the Census

2021.

1 http://www.nsi.bg/sites/default/files/files/pages/uplf_e/Strategy2013-2017_2020.pdf

2 http://www.nsi.bg/sites/default/files/files/pages/Census2021/ZPNJF2021.pdf (BG only)

When drafting the law, NSI organized a series of public consultations, where the demand for

census data at very detailed territorial levels was highlighted by key users such as

policy makers (national and sub-national level), academia and researchers, and the general public.

The address registration process and geocoding activities will be a milestone in Census 2021

and will provide a valuable mechanism for linking individuals to housing units and for disseminating

data at the smallest possible territorial units.

2.3. COMMISSION IMPLEMENTING REGULATION (EU) 2018/1799 of 21

November 2018 on the establishment of a temporary direct statistical action for the

dissemination of selected topics of the 2021 population and housing census geocoded to a

1 km2 grid 3

In 2018, EC adopted an implementing regulation on the establishment of a temporary direct

statistical action in order to develop, produce and disseminate selected topics of the 2021

population and housing census geocoded to a 1 km2 grid. The action is justified by a common

need across the Union for reliable, accurate and comparable information on population

distribution with sufficient spatial resolution, founded on harmonised output requirements and

intended in particular for pan-European regional policy-making.

The plans of NSI to geocode the census data at point level, or in certain parts of the territory at

least to the 1 km2 grid, will provide sufficiently precise information to fulfil the task.
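To illustrate how a point-level geocode relates to the 1 km2 grid, the sketch below derives a grid cell identifier from projected coordinates. It is only an illustration under stated assumptions: the coordinates are taken to be in ETRS89-LAEA (EPSG:3035) metres, the cell code follows the form `1kmN<northing km>E<easting km>` based on the lower-left cell corner (a convention seen in GEOSTAT grid datasets), and the function name and sample coordinates are hypothetical.

```python
def grid_cell_1km(easting_m: float, northing_m: float) -> str:
    """Return a 1 km grid cell code for a point given in projected metres.

    Assumption: coordinates are in ETRS89-LAEA (EPSG:3035) and the cell is
    named after its lower-left corner, expressed in whole kilometres.
    """
    if easting_m < 0 or northing_m < 0:
        raise ValueError("coordinates must be non-negative projected metres")
    east_km = int(easting_m // 1000)
    north_km = int(northing_m // 1000)
    return f"1kmN{north_km}E{east_km}"


# Hypothetical address points (easting, northing) in metres.
points = [(5401250.0, 2275890.5), (5401999.9, 2275000.0), (5402000.0, 2275000.0)]
cells = [grid_cell_1km(e, n) for e, n in points]
```

Because the cell code is derived from the point, data geocoded to a point can always be aggregated to the grid afterwards, whereas the reverse is not possible.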

2.4. Regulation (EC) No 177/2008 of the European Parliament and of the Council of 20

February 2008 establishing a common framework for business registers for statistical

purposes and repealing Council Regulation (EEC) No 2186/93 4

The regulation establishes a common framework for business registers for statistical purposes

in the Community and demands that EU Member States shall set up one or more harmonised

registers for statistical purposes, as a tool for the preparation and coordination of surveys, as a

source of information for the statistical analysis of the business population and its demography,

for the use of administrative data, and for the identification and construction of statistical units.

The regulation also specifies the information content of the business register, where collecting

data on geographical location code and address (including postcode) at the most detailed level is

mandatory for the local units.

At operational level the MSs represented in the permanent working group on business registers

and statistical units of Eurostat are trying to find solutions for identification of the local units

via their physical geographic location. To handle this issue, the working group proposes

several operational rules related to identification through geographical localisation:

3 https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32018R1799

4 https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32008R0177

Operational rule: Identification

For the identification of a local unit, the physical geographic location has to be identified. Such

a single physical location is normally best approximated by the postal address. Several physical

locations of the same enterprise within the same community or within the same region are to be

treated as several local units of that enterprise.

Operational rule: Physical geographic location

A physical location of a local unit may be found within a building, may correspond to one

building or may comprise more than one building. In the latter case, the various buildings do

not form separate local units if they are physically close together and have a common postal

address.

Operational rule: Local unit without postal address

A local unit may not be situated in a building at all. If in that case the other criteria are fulfilled

a separate local unit should be identified. In such a case a postal address may not exist; however,

the geographical identification could be represented by geographical coordinates or other

measures.

2.5. Sustainable Development Goals 5

The 2030 Agenda for Sustainable Development of UN is another strategic driver for NSI to

establish a point-based foundation for address geocoding of statistical and administrative data.

The demand for statistical data on SDGs is growing each year and NSI is trying to address these

needs by combining different sources and planning concrete activities related to SDGs in the

National Statistical Programme elaborated on a yearly basis.

The deliverables of this project can be used to expand the current statistical production models

and provide NSI with a powerful tool for ensuring statistical data disaggregated on levels that

will ease and enrich the reporting process on SDGs and their targets at national level.

The project recognizes the need for further practical application of the proposed point-based

foundation in data production for relevant indicators from the Global Indicators Framework

(GIF) or substitute proxy indicators from the national list which is under development.

Indicators from the GIF that can be tested:

9.1.1 Proportion of the rural population who live within 2 km of an all-season road.

11.1.1 Proportion of urban population living in slums, informal settlements or inadequate

housing.

11.7.1 Average share of the built-up area of cities that is open space for public use for all, by

sex, age and persons with disabilities.
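As an illustration of how point-based geocodes could feed an indicator such as 9.1.1, the sketch below computes the share of population located within 2 km of a road. It is a simplified example, not an official calculation method: it assumes planar coordinates in metres, roads represented as straight line segments, and population counts already attached to geocoded points. All names and sample values are hypothetical.

```python
import math


def dist_point_segment(p, a, b):
    """Shortest planar distance from point p to segment a-b (all in metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def share_within(points, road_segments, threshold_m=2000.0):
    """points: list of ((x, y), population). Returns the population share
    located within threshold_m of at least one road segment."""
    total = sum(pop for _, pop in points)
    if total == 0:
        return 0.0
    near = sum(
        pop
        for xy, pop in points
        if any(dist_point_segment(xy, a, b) <= threshold_m for a, b in road_segments)
    )
    return near / total


# Hypothetical data: one straight road and three geocoded rural address points.
road = [((0.0, 0.0), (10000.0, 0.0))]
rural_points = [((1000.0, 1500.0), 40), ((5000.0, 2500.0), 10), ((12000.0, 0.0), 50)]
share = share_within(rural_points, road)
```

In production, a GIS buffer or spatial index would normally replace the brute-force distance check used here.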

5 https://sustainabledevelopment.un.org/?menu=1300


3. Issues and challenges at national and organisational level

3.1. A common standard for addresses

Standardisation of address data is applied in the framework of the civil registration system,

regulated by the national law on civil registration. The law provides a very general definition

and set of rules to be applied by registration authorities.

The address according to the law is the unambiguous description of the place where the person

lives or where he or she receives correspondence. The address in the Republic of Bulgaria shall

contain the district, municipality and settlement names. Depending on the location described,

the address may also include a localisation unit name (square, boulevard, street, residential

complex, neighborhood, etc.), street number, entrance, floor and apartment number. The locator

unit number may consist of a combination of up to four characters, the first three numerical

digits, and the last character a letter. The entrance can consist of one letter or a number up to

two digits, the floor up to two digits and the apartment up to three digits. The mayor of each

municipality defines the addresses in a given municipality by issuing an official act. The collection

of addresses for all municipalities forms the National classification of current and

permanent addresses (NCCPA).
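The format rules for address components described above lend themselves to automated checking. The sketch below shows one possible validation routine. The patterns reflect one reading of the rules (a number of up to three digits optionally followed by one letter, and so on), which is an assumption; the field names and the function name are hypothetical.

```python
import re

# Patterns follow one reading of the rules quoted above (an assumption).
# [^\W\d_] matches a single letter, including Cyrillic letters.
PATTERNS = {
    "number": re.compile(r"\d{1,3}[^\W\d_]?"),   # up to 3 digits + optional letter
    "entrance": re.compile(r"[^\W\d_]|\d{1,2}"),  # one letter, or up to 2 digits
    "floor": re.compile(r"\d{1,2}"),              # up to 2 digits
    "apartment": re.compile(r"\d{1,3}"),          # up to 3 digits
}


def invalid_components(address: dict) -> list:
    """Return the names of the components that are present but break the rules."""
    return [
        name
        for name, pattern in PATTERNS.items()
        if address.get(name) and not pattern.fullmatch(address[name])
    ]


# Hypothetical records: the first is well formed, the second is not.
ok = invalid_components({"number": "12А", "entrance": "Б", "floor": "3", "apartment": "45"})
bad = invalid_components({"number": "1234", "entrance": "AB", "floor": "3"})
```

Components that are absent are not reported, since the law makes them optional depending on the location described.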

The classification contains only addresses that describe units for

residential purposes. Moreover, very few of the 265 municipalities maintain a local address

register or database in digital format. In many cases, the addresses are stored on paper and are

entered in the classification only when an update “event” is initiated by citizens, for

example a property or civil registration.

3.2. Delay and reset of the National Address Register project

The national strategy on the development of e-government in the Republic of Bulgaria 2014-2020 6

set as a priority the establishment of conditions for normal functioning of the primary

electronic registers existing in the country. The NAR was included in the action plan, as it was

considered one of the foundational registers and one of the pillars for the development of

e-governance.

Initially, NAR was intended to synchronise address information in administrative registers

by providing a Unique Address Identifier (UAI), but not to serve physical location in the form of

geographical coordinates for the requested address.

After a number of discussions at national level and recognition of user needs for a geo-enabled

national address dataset, the Cadastral Agency was appointed as the authority responsible for

the national address system and the administration of the address register. Finally, in the middle of

2018, the NAR project was assigned to CA for implementation.

6 https://www.e-gov.bg/en/about_us


However, a fully functioning NAR with relevant coverage suitable for census purposes is not

guaranteed to be in place in time for census activities. The need for results, driven by EU

and national legislative obligations and by identified user priorities, is another reason to

find solutions in cooperation with the CA as a partner organisation of NSI.

3.3. National spatial data infrastructure

Availability and access to fundamental geodata is one of the main challenges at national level.

The national spatial data infrastructure is at a very immature stage of development. Currently,

access to geodata is left to bilateral agreements between providers and users. Interoperability

and availability of national reference geodata are still insufficient for statistical purposes.

3.4. Challenges in organisational context

The current in-house production methodology for geocoding is inefficient and cannot meet

user needs for regular production of spatial statistics. Investments in technical infrastructure

and a stepwise transformation to a service-oriented architecture are needed in order to ensure

more effective and consistent automated processes and interoperability. The lack of

organisational integration of geospatial data and of a common platform are challenges to be

worked on. The resulting problems affect not only production capacity but also

quality assurance, since quality issues can usually be detected through geospatial analysis and

assessment activities.

4. Methodological and organisational recommendations

In order to cover the identified needs and to prepare a business plan for establishing a point-based

geospatial framework in NSI, the project team developed methodological and

organizational recommendations that should be followed. The recommendations are structured

so that each one answers a specific question or issue.

4.1. A point-based spatial statistical framework

Use high-quality point-based location data. All features making up the infrastructure need to

be time-aware and have a start- and end date, and relevant metadata to know when and how the

geocodes were derived. The data should be regularly updated with relevant timestamps.
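A minimal sketch of what a time-aware feature could look like is given below. It assumes a half-open validity interval (start date included, end date excluded) and an open end for features that are still valid. The class, the field names and the sample values are hypothetical and do not describe an existing NSI data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AddressPoint:
    address_id: str
    x: float
    y: float
    valid_from: date
    valid_to: Optional[date]  # None means the feature is still valid
    geocode_method: str       # metadata: how the geocode was derived
    geocoded_on: date         # metadata: when the geocode was derived

    def valid_on(self, day: date) -> bool:
        """True if the feature was valid on the given day (end date excluded)."""
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def snapshot(features, day):
    """Return the features valid on a reference date."""
    return [f for f in features if f.valid_on(day)]


# Hypothetical history of one address point whose location was later refined.
versions = [
    AddressPoint("A1", 100.0, 200.0, date(2011, 2, 1), date(2019, 6, 1),
                 "census 2011", date(2011, 3, 1)),
    AddressPoint("A1", 101.5, 199.0, date(2019, 6, 1), None,
                 "cadastral building centroid", date(2019, 6, 15)),
]
current = snapshot(versions, date(2020, 1, 1))
```

Keeping superseded versions rather than overwriting them allows statistical data to be matched to the location information that was valid at its own reference date.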

Presence of high-precision and standardized geocodes/identifiers at a unit record level.

Geocodes are standardised codes or identifiers usually from national classifiers or coding

systems and are used to link unit record data with location data.

Priority on address geocoding process. Geocoding is the process of assigning geocodes to

statistical unit records or to their geographical localisation attributes, such as addresses. Addresses are

collected and used from a number of statistical surveys and registers and currently are the only

option to spatially enable statistical data that is already collected. Addresses can be turned into

geocodes if they are linked to a geospatially referenced object, such as a building, that is properly

managed and provided with identifiers.
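The linking step described above can be sketched as a simple lookup: each unit record carries an address key, and a reference dataset maps address keys to a building identifier and coordinates. This is an illustrative outline only. The key format, field names and sample values are hypothetical, and real matching would need address standardisation before the lookup.

```python
def geocode(records, reference):
    """Attach building identifier and coordinates to records by address key.

    Returns (matched, unmatched) so that records without a reference
    address can be routed to manual or complementary geocoding.
    """
    matched, unmatched = [], []
    for rec in records:
        hit = reference.get(rec["address_key"])
        if hit is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, **hit})
    return matched, unmatched


# Hypothetical reference dataset keyed by "settlement code|street|number".
reference = {
    "12345|ul. Primerna|7": {"building_id": "B-001", "x": 5401250.0, "y": 2275890.5},
}
records = [
    {"unit_id": "P1", "address_key": "12345|ul. Primerna|7"},
    {"unit_id": "P2", "address_key": "12345|ul. Primerna|9"},
]
matched, unmatched = geocode(records, reference)
```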

Establish address location and identification system. NSI should work on establishing an

address reference dataset because there are no national address or building registers. Cadastral

data maintained by CA contains property IDs, but integration with statistical records by

cadastral identifiers alone is not possible without an intermediate reference framework that contains

all key identifiers. In parallel, a model to maintain collected address data in a standardised and

consistent way should be developed. A centralized address location and identification system

that can harmonise all address data across the organisation by using permanent address

identifiers is needed. The system should provide a permanent location identifier to all

datasets containing addresses through a validation service. In addition, point-of-entry

validation against the central address database should be developed by the time of the census at the latest.

Ideally, the system should in the future provide integration with the national address register.

In any case, such a system is needed because it could maintain location descriptions that exist and

are used by people but are not officially recognised.
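The idea of a central system that issues permanent address identifiers and validates incoming addresses can be sketched as follows. This is a toy in-memory illustration under stated assumptions, not a design for the actual system: the normalisation rules, the identifier format `ADR` plus eight digits, and all names and sample values are hypothetical.

```python
import itertools


def normalise(settlement_code: str, street: str, number: str) -> tuple:
    """Reduce spelling variants of the same address to one comparison key."""
    return (
        settlement_code.strip(),
        " ".join(street.upper().split()),  # collapse whitespace, ignore case
        number.upper().replace(" ", ""),
    )


class AddressRegistry:
    """Keeps one permanent identifier per normalised address."""

    def __init__(self):
        self._ids = {}
        self._counter = itertools.count(1)

    def validate(self, settlement_code, street, number):
        """Return the permanent identifier if the address is known, else None."""
        return self._ids.get(normalise(settlement_code, street, number))

    def register(self, settlement_code, street, number):
        """Return the existing identifier or issue a new permanent one."""
        key = normalise(settlement_code, street, number)
        if key not in self._ids:
            self._ids[key] = f"ADR{next(self._counter):08d}"
        return self._ids[key]


registry = AddressRegistry()
first = registry.register("12345", "ul.  Primerna", "7a")
second = registry.register(" 12345 ", "UL. PRIMERNA", "7 A")  # same address, other spelling
unknown = registry.validate("12345", "ul. Primerna", "9")
```

A point-of-entry check would call `validate` while the address is being captured, so that unknown addresses are flagged at collection time rather than during later processing.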

4.2. Institutional priorities and activities

Build the infrastructure required to answer the needs. Covering the identified needs

requires certain changes to be initiated by NSI in its corporate infrastructure management.

These changes should take into account all current issues (constraints) and challenges at the

national context, including the burden that will be generated on the production and institutional

management processes.

Ensure resources. Besides identifying and accessing new sources to feed the system of point-based

geolocation data, investments in capacity and resources are needed to answer the

challenges. The new infrastructure will come with new architecture, new or significantly

revised data flows, which NSI has to manage together with the external data providers.

Coordination and cooperation. Building new infrastructure requires sustainable and

consistent coordination between all parties involved in the maintenance of the infrastructure. The

components and the overall business process have to be developed in close cooperation with all

data or service providers.


II. Overview of project activities and results

Methodology work was carried out in four stages.

Experts from the National Geodesy, Cartography and Cadastre Agency (CA), the Directorate General

for Civil Registration and Administrative Services (DG CRAS), the State e-Government Agency (SEGA)

and the Municipality of Gabrovo participated in the project activities.

Fig. 1: Focuses of methodological work on setting point-based geocoding infrastructure


Stage 1: Analyze and assess the potential of administrative and statistical sources for setting up

a point-based infrastructure for geocoding of data.

(1.1) Assess usefulness and accuracy of recognized geospatial data sources.

(1.2) Assess completeness and geographical coverage of the datasets for the moment of

Census enumeration planning and collecting phase.

(1.3) Assess quality and scheme of maintenance of the datasets.

(1.4) Assess temporal aspects of maintenance and temporal coherence with statistical data.

The address data within the following datasets were examined and assessed for their

usefulness for the initial establishment, regular and complementary updating, and standardization

of the address foundation. The address data was assessed for quality of address content

(consistency and completeness):

Information System Demography (NSI)

Business register (NSI)

Census 2011 (NSI)

Survey on newly built residential buildings and dwellings (NSI)

Nomenclature of permanent and current addresses in Bulgaria (DG CRAS)

Cadastral map and cadastral registers (CA)

Local taxes (Municipality of Gabrovo)

Stage 2: Improve access to geospatial data with a view to setting up a building/dwelling and address

register for statistical purposes.

(2.1) Harmonise address location data in line with European and international standards

and coordinate address identifiers/geocodes.

(2.2) Develop uniform approach to spatially enable address data with cadastral map

objects.

(2.3) Build formal working relationship with location data providers for sustainable

geospatial data flows and feedback managing routines.

Stage 3: Develop strategy for setting up and maintenance of point-based geocoding

infrastructure at NSI:

(3.1) Assess resources needed/ assess processing capacity.

(3.2) Set-up organization for obtaining and management of geospatial data.

(3.3) Specify geo-statistical census output.

Stage 4: Develop methodology for consistent geocoding of statistical/ administrative data.

(4.1) Develop routines for geocoding, geocoded data verification and managing

geocoding errors.

(4.2) Develop consistent methodology for complementary geocoding (for areas where

point location does not exist).


Main results

Conclusions on sources to be used and proposal for data to set up a point-based

foundation.

Methodology for enabling address data with geolocation. Spatial address data model.

Methodology for geocoding of administrative and statistical data records. Components

of consistent geocoding.

Organisational setup - key findings and recommendations for activities.


III. Proposed solution for establishing a point-based foundation for address geocoding at

NSI

“As to methods there may be a million and then some, but principles are few. The man who grasps principles

can successfully select his own methods.” ― Harrington Emerson

The Global Statistical Geospatial Framework provides a common method for statistical and

geospatial data integration and sets five high-level strategic principles to form the basis for the

statistical geospatial infrastructure development:

Principle 1: Use of fundamental geospatial infrastructure and geocoding.

Principle 2: Geocoded unit record data in a data management environment.

Principle 3: Common geographies for dissemination of statistics.

Principle 4: Statistical and geospatial interoperability – Data, Standards and Processes.

Principle 5: Accessible and usable geospatially enabled statistics.

The team followed these principles and the connected objectives, consolidating the outcomes

from GEOSTAT projects together with UN-GGIM recommendations for content of the core

data, to elaborate on how to implement the address geocoding framework in the contemporary

situation.

1. A Point-based address reference dataset

For the census, an address system for locating buildings and dwellings or identifying housing

units is needed. The address reference system is a location system in human-readable form. It

defines a set of address components and the rules for their combination into addresses.

Currently, the exhaustive set of addresses in the country cannot be provided by one single

source.

1.1. Data sources and frameworks

To find the high-accuracy data appropriate for an address point-based foundation, a number of

data sources from the public domain were initially selected for assessment, following the

recommendations to use data from trusted, authoritative sources.

Addresses in statistical datasets:

Addresses of buildings and dwellings from Census 2011 survey (NSI)

Addresses of current residence of population, Statistical Population Register (NSI)

Addresses of businesses, Statistical Business Register (NSI)

Addresses of newly built residential buildings, quarterly exhaustive statistical survey

collecting data from local authorities (NSI)

Addresses in administrative sources:

Addresses of physical location of land parcels, buildings, units within buildings from

Cadastral registers (CA)

Nomenclature of permanent and current addresses in Bulgaria (DG CRAS)

Local taxes (Municipality of Gabrovo)


Location reference data from Cadastral map

Vector data of land parcels, buildings, units within buildings (CA)

The overall quality of the sources and datasets was assessed in terms of regulations and ordinances for

maintenance, schemes of collection, storage and update, usage of thematic coding systems, etc. The

quality of thematic content of addresses/geocodes was evaluated for completeness of attributes

and consistency of coding for the address components by comparison with the standardised

national lists/nomenclatures. A summary table of the analysis can be found in the annex to the

report.

Regarding the establishment and maintenance of the spatial address reference dataset, the data sources were classified by their role in the process:

Initial – sources to establish the initial collection of addresses and addressable objects;

Updating – sources for regular update of the address foundation;

Complementing – sources for complementary updating, which contribute to the address collection by extending the thematic scope of address information;

Standardising – national registers and nomenclatures that provide the standard for address components.

Addresses are defined as structured descriptions of a place, and often an address consists of a

number of hierarchical components and identifiers. No official national address standard is in

place, but coding systems and frameworks to help in data harmonisation do exist.

National classifiers and registers:

EKATTE – Unified Classificatory of Administrative-territorial and Territorial Units

National Register of Populated Places and Unified Classificatory of Administrative-territorial and Territorial Units (NSI);

Register of Geographic Names (CA);

Classificatory of Localisation Units (thoroughfares) (DG CRAS);

Classificatory of Addresses – list of all addresses eligible for registration of citizens, defining the numbering ranges for every localisation unit (DG CRAS).

First, we want to establish the most exhaustive possible list of trusted, valid addresses. During the 2011 census, the addresses of residential buildings and dwellings were collected in a standardised structure and coded at the time of collection. The census address collection is the most complete and standardised address dataset and is proposed for the initial setup of the address reference dataset.

DG CRAS maintains address information within the framework of the Civil Registration System and provides the national coding system for thoroughfares and the national list of addresses where citizens can be registered for residence.

The nomenclature from DG CRAS, maintained with the help of local authorities, is proposed for updating address components/characteristics: registering new addresses, retiring addresses that are no longer in use, and updating changes in address characteristics.

1.2. Address content and harmonisation


The address definition in the Cadastral and Property Act is: “Address of an immovable property” shall be the description of its physical location comprising obligatorily the names of the district, of the municipality and the populated place/the settlement unit, and including (as appropriate) the name of the street, respectively square or boulevard, housing complex or neighborhood, street number, entrance, floor, self-contained property within a building, and for immovable properties in agricultural and forest areas, respectively the name of the locality.

Addresses differ in content and level of detail between the urban areas of settlements and other areas. Addresses outside urban areas (in agricultural and forest areas) and in small villages may have no assigned street names, so the address is given only at the level of the populated place/village (address area name). Buildings in such a village share the same address and have no distinctive address. This is a problem for identifying housing units when conducting an online census.

INSPIRE defines four subclasses of address components (administrative unit name, address area name, thoroughfare name and postal descriptor), which are complemented by the address locator. Every component represents a level of objects in this hierarchical framework and defines a level of accuracy of addressing.

District name

Municipality name

Capital city municipality subdivision name

Settlement name (type prefixes)

Plovdiv /Varna city subdivision name

Postal area

Localisation unit name (type prefixes)

Street locator

Building locator

Entrance locator

Floor locator

Unit within building

House numbers or names are important for distinguishing one location from neighboring addresses. This locator is mandatory information. It can be a systematic designator, such as a number, or a name. Addresses can have other locators, such as an entrance number or apartment/unit number.

Two types of temporal information are recommended by the theme specification:

temporal information on when this version of the address is valid in real world

temporal information on the changes of the address record in the database or spatial

dataset.

This information provides metadata about the lifecycle of the connection between the address and the object. The status value (Current, Retired) of address data relates to the real-world address or address component, not to the property to which it is assigned (the addressable object). The addressable object has its own status value.


A characteristic point represents the position of the address. The address record metadata should provide information on how this point was captured and by whom:

Captured from a cadastral addressable object, or

Created manually by NSI staff by pinpointing the position on a map, or

Created by NSI through field surveying (capturing GPS coordinates).

The geographic position of an address should be specified together with the type of spatial object used to derive it. This could be a building, a part of a building/entrance (for residential multifamily buildings), a land parcel or an administrative unit. Wherever possible, the building or entrance is recommended, for reasons of precision. The position attributes are:

position – a pair of geographic coordinates;

position object – the type of the object that provides the position;

position object identifier – the permanent identifier of the AO;

position level – the level of accuracy from which the position is derived;

position source – information on how the position was captured;

position GridID – the ID of the grid cell in the ETRS89_1km grid net.
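The position attributes listed above can be pictured as a small record structure. The following Python sketch is illustrative only: the class names, field names and enumeration values are assumptions made for this example and are not taken from the NSI data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PositionObject(Enum):
    """Type of spatial object the position is derived from (assumed values)."""
    ENTRANCE = "entrance"
    BUILDING = "building"
    PARCEL = "land parcel"
    ADMIN_UNIT = "administrative unit"


class PositionSource(Enum):
    """How the characteristic point was captured (assumed values)."""
    CADASTRAL_OBJECT = "captured from cadastral addressable object"
    MANUAL_PINPOINT = "pinpointed on a map by NSI staff"
    FIELD_SURVEY = "GPS coordinates from NSI field survey"


@dataclass(frozen=True)
class AddressPosition:
    x: float                             # first coordinate (CRS not fixed here)
    y: float                             # second coordinate
    position_object: PositionObject      # type of object providing the position
    position_object_id: Optional[str]    # permanent identifier of the AO, if any
    position_level: str                  # accuracy level the position comes from
    position_source: PositionSource      # how the point was captured
    grid_id: Optional[str] = None        # 1 km grid cell ID, filled in later


# Example record with placeholder values.
example = AddressPosition(
    x=1.0,
    y=2.0,
    position_object=PositionObject.BUILDING,
    position_object_id="hypothetical-building-id",
    position_level="building",
    position_source=PositionSource.CADASTRAL_OBJECT,
)
```

Keeping the source and the object type as separate fields lets a later process tell a manually pinpointed building point apart from one derived from a cadastral footprint.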

Fig 2. Characteristic points captured from cadastral map objects

Address location can be determined by the following component combination/ location styles:

The address should allow the unambiguous determination of an object for purposes of identification and location. Ambiguous naming (units at the same level of the hierarchy sharing the same name) was found at every level of the address hierarchy except the district level. This issue has to be tracked with metadata that documents every step of address identification.

1.3. Address Matching

Four general parts of an address must be identified to find the physical location:

Locate within the country – identify the settlement

Locate within the settlement – identify the localisation unit

Locate within the localisation unit – identify the building or parcel

Locate within the building – identify the dwelling/unit within the building


To identify the settlement unambiguously, its "full name" is needed, which includes the settlement type, name and administrative-territorial affiliation. Three alternatives for settlement identification exist in the datasets:

[District], [Municipality], [Settlement type], [Settlement name] or

[EKATTE code] or

[Postal code], [Settlement name].
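The three alternatives can be handled by a single lookup that tries the most direct key first. The sketch below is a minimal illustration: the reference rows, codes and names are invented placeholders, and the function name is hypothetical.

```python
from typing import Optional

# Invented placeholder rows:
# (EKATTE code, district, municipality, settlement type, settlement name, postcode)
SETTLEMENTS = [
    ("00001", "District A", "Municipality A", "village", "Example", "1111"),
    ("00002", "District B", "Municipality B", "village", "Example", "2222"),
]


def identify_settlement(ekatte: Optional[str] = None,
                        district: Optional[str] = None,
                        municipality: Optional[str] = None,
                        settlement_type: Optional[str] = None,
                        name: Optional[str] = None,
                        postcode: Optional[str] = None) -> Optional[str]:
    """Return the EKATTE code if exactly one settlement matches, else None."""
    if ekatte is not None:
        matches = [r for r in SETTLEMENTS if r[0] == ekatte]
    elif district and municipality and settlement_type and name:
        full_name = (district, municipality, settlement_type, name)
        matches = [r for r in SETTLEMENTS if r[1:5] == full_name]
    elif postcode and name:
        matches = [r for r in SETTLEMENTS if r[5] == postcode and r[4] == name]
    else:
        matches = []  # a bare settlement name is not enough
    return matches[0][0] if len(matches) == 1 else None
```

In this toy data the name "Example" occurs in two districts, so the name alone returns no result, while the name combined with a postcode or with the full administrative path resolves to one code.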

Statistical registers and databases use EKATTE consistently to code ATUs. In administrative datasets, EKATTE is also widely known and used. Within 3 days of an ATU change being published in the State Gazette, NRPP and EKATTE are updated and the updates are made visible to users in a structured way. Automated updating is possible, but no standard network services are available. EKATTE is also used in the cadastral registers. Some inconsistencies in coding were found there, which may require joint expert work to resolve so that address data can be referenced properly by code.

Postcodes are assigned to populated places/settlements. For cities, a subdivision into postal areas exists. Postcodes are four-digit codes; no boundaries of postal areas are available. The advantage of postcodes is that they identify the settlement unambiguously without the upper-level administrative units having to be stated. People are usually aware of the code of the postal area where they live, which can be useful in address collection. It is recommended that postcodes be taken into account when implementing the address dataset.

To identify a localisation unit within a settlement, either the name and type of the localisation unit or the localisation unit code from the CLU is needed:

[Localisation unit type], [Localisation unit name] or

[Localisation unit code]

A localisation unit can be a street, boulevard, square, alley, housing complex, neighborhood, system of small unnamed roads, allotment, or settlement formations or localities, which are named geographical areas situated in the lands of the settlement, outside its urban area.

Addresses describing the physical location of an object include different address components depending on where the addressable object is located (they are location specific): within the settlement's urban regulation area, where addresses are more detailed and accurate, or outside it.

The addressing style (specification) is determined by the type of localisation unit. Three types of address specification can be distinguished: urban street-type addressing, urban quarter/area-type addressing, and rural addressing. For each address style, the components sufficient for unambiguous identification of a place were identified.

One place can have several address descriptions. This is the case when a housing complex/quarter and building designator are used together with a street name/code and street locator. It happens in quarters that have named streets and where mixed address styles are in use. A building on the corner of two streets has the potential for either address to be used to


define that location. It is recommended that all descriptions of a place that are in use be included in the dataset.

One object can have more than one address description. However, it must be ensured that one address determines only one object, and only one object should be selected from the cadastral data to represent the address position.
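The rule that one object may carry several addresses while one address may point to only one object can be checked mechanically. The sketch below assumes a simple list of (address identifier, object identifier) links; the identifiers and the function name are hypothetical.

```python
from collections import defaultdict


def find_ambiguous_addresses(links):
    """Return addresses linked to more than one object.

    links: iterable of (address_id, object_id) pairs.
    Several addresses per object are allowed; several objects per address are not.
    """
    objects_by_address = defaultdict(set)
    for address_id, object_id in links:
        objects_by_address[address_id].add(object_id)
    return {
        address_id: sorted(object_ids)
        for address_id, object_ids in objects_by_address.items()
        if len(object_ids) > 1
    }


# A1 and A2 describe the same corner building (allowed);
# A3 points to two different buildings (must be resolved).
links = [("A1", "B10"), ("A2", "B10"), ("A3", "B11"), ("A3", "B12")]
ambiguous = find_ambiguous_addresses(links)
```

Addresses reported by such a check would need a manual decision on which cadastral object represents the address position.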

For addresses that give an ambiguous location, coding the address names solves the problem, but the automated coding of text addresses remains an issue.

Ambiguous address descriptions are usually due to the merging of settlements that was not followed by readdressing at the municipal level. DG CRAS has solved this issue in its databases by applying unique coding of thoroughfares. Another cause is localisation units of different types that share the same name within one address area. Identical names within one address area remain a problem when matching textual addresses.

2. Components of consistent geocoding framework

What is required for a dataset to become a spatial statistical framework?

Standardised identifiers/geocodes

High-quality location data, regularly updated and time-stamped

High-quality geocodes in statistical unit records

Data management and documentation of processing

2.1. Persistent identifiers

To turn addresses into geocodes, an identifier system needs to be established. In current practice at NSI, when a task requires address matching, address components are coded, locators are formatted and both are combined into a structured identifier. This practice has proved to introduce inconsistencies in address identification and matching between different datasets, and especially between different periods. Semantic identifiers are therefore not preferred for consistent maintenance of location identification. It is recommended that a unique and persistent identifier be used, because it is more reliable than simple coding and matching.

A key requirement is centralised maintenance of the address reference dataset, which supplies a unique numeric address code: the persistent identifier of an address unit. Additionally, the model can reserve a field to hold the future national identifier from NAR.
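The difference between a semantic identifier and a persistent one can be shown with a toy address index. This is a sketch under assumptions: the class, the field names (`settlement_code`, `street_code`, `number`, `national_id`) and the code values are invented for illustration and do not describe the NSI system.

```python
import itertools


def semantic_key(record):
    """Identifier built from coded components; it changes when a component changes."""
    return "-".join(str(record[k]) for k in ("settlement_code", "street_code", "number"))


class AddressIndex:
    """Toy central index that issues persistent numeric address codes."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._records = {}

    def register(self, components):
        address_id = next(self._next_id)
        # national_id is a reserved, initially empty field for a future national identifier.
        self._records[address_id] = dict(components, status="Current", national_id=None)
        return address_id

    def update(self, address_id, **changes):
        self._records[address_id].update(changes)

    def get(self, address_id):
        return self._records[address_id]


index = AddressIndex()
aid = index.register({"settlement_code": "00001", "street_code": "00101", "number": "12"})
key_before = semantic_key(index.get(aid))

# The thoroughfare is recoded: the semantic key changes, the persistent code does not.
index.update(aid, street_code="00202")
key_after = semantic_key(index.get(aid))
```

Records in other datasets that store `aid` remain linked after the recoding, whereas records that stored the semantic key would no longer match.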


Fig 3: Sources for initial set-up, regular update and standardisation of records of the address reference dataset (address index)

2.2. Hierarchical geocoding framework

The process of obtaining locations and geocodes for the different addressing levels should use relevant and fundamental geospatial data, which is why cadastral data is currently the choice for location reference objects.

The level of buildings, parts of buildings and land parcels provides the highest accuracy of geocodes.

Cadastral map and cadastral registers are produced and maintained in digital format, and pass

through a number of mandatory procedures, for quality assurance and acceptance. The approved

cadastral map and cadastral register data are maintained in information system and the date of

the entry in the system is indicated. Each cadastral object is attributed an identifier. The


structure and the content of the identifier of a real estate is prescribed by an ordinance, issued

by the Minister of Regional Development and Public Works, and is attributed by the cadastral

office.

The land parcel part contains: identifier; boundaries and area, fixed by the geodetic co-

ordinates of the points defining them; permanent purpose of use of the territory; method

of permanent use; address;

The building part contains: identifier; boundary and/or outline of the building and of

facility; built-up area determined by the geodetic co-ordinates of the defining points;

number of floors; purpose of use; building type; address. A polygon feature (the

building footprint) on the map represents building entity.

The self-contained object in a building (unit in a building) part contains: identifier;

floor; outline; number of levels in the object; area according to the documents; purpose

of use; information about individual units/dwellings; address. The dwellings have

spatial representation. The outlines of units in the building are available as polygon

features on the map.

Part of building: the cadastral map also contains helpful point information used for labeling the map, which is not part of the official cadastral data. These point tags give the type of the labeled object and the label itself. Where available, they could be used to extract a characteristic point for entrances in apartment buildings. Entrance locators are part of the standard address description for residential multifamily buildings with more than one entrance.

Fig 4. Tags for building entrances in cadastral dataset.

Addresses recorded in the cadastral registers are physical descriptions of the location of the objects. Address data in cadastral records is collected when immovable properties are registered in the cadastre. The address is declared by the property owner or collected from documents and is recorded as an attribute of the corresponding object. Currently, addresses in the cadastre are not updated when address components change (renaming of a thoroughfare, change in designators, etc.). The addresses in the cadastral register therefore need pre-processing, including address repairing and coding, before they are matched with the reference address list.

Addresses in cadastral attributes provide the path to geocode statistical data.


Coverage of the territory with cadastral parcels in the beginning of 2018 was 39%, coverage in

the beginning of 2019 was 73%.

Fig 4. Cadastral parcel coverage at the beginning of 2018 (light colour) and lands covered

with cadastral parcels between January 2018 and January 2019 (dark colour).

Although the percentage of the territory covered by the digital cadastral map will be close to 100% at the end of 2019, the remaining uncovered parts are predominantly urban areas (Fig 5). This means that the cadastral map's coverage of buildings is much lower than its coverage of the territory.

The ‘building’ is selected as the basic addressable unit. To assure geographical coverage of this important layer and achieve a fully geocoded census, NSI needs to fill the gaps in data availability and collect building data in the areas where cadastral data is not complete.

For every captured building, a record should exist in the address index with its mandatory attributes populated, so that compliant addresses are available for all buildings. NSI's objective is to cover at least 95% of the basic addressable units of residential type before census collection. Human and financial resources for this task are planned within the census activities of the pre-enumeration phase. The building data will be updated on a continuous basis from the cadastral building dataset whenever cadastral data is produced and accepted or updated. Around 40% of residential building positions need to be captured by NSI.


Fig 5. Cadastral data in urban areas

The address characteristic point is required to lie within 5 metres of the true position of the building centroid or entrance. Particular care is required to locate the address on the correct side of the street.

The level of localisation units

Street-level objects are an important and valuable dataset for address allocation and geocoding. Unfortunately, nationwide public vector data of localisation units does not exist. Commercial products and OpenStreetMap roads are an option, but they do not provide the national coding used in administrative and statistical records. The coding system for street entities is provided by DG CRAS on a monthly basis, including the status and date of change of units.

The Classificatory of Localisation Units contains: identifier, localisation unit name, type, code, validity period and status. The identifier is a 5-digit code that identifies the unit within the settlement.

Features for localisation units can be extracted from cadastral land parcels by filtering on the use type (transport) of the individual land parcel and matching by street code or name. Routines were developed for capturing parcels and updating street features from cadastral land parcels and from tag information. OpenStreetMap is used as an additional source of information.
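The filtering step described here can be sketched as follows. This is not the routine developed in the project: the parcel field names (`usetype`, `street_code`, `settlement_code`, `parcel_id`) and the use-type values are assumptions chosen for the example.

```python
from collections import defaultdict


def collect_street_parcels(parcels, transport_usetypes):
    """Group transport-use parcels by (settlement code, street code).

    parcels: iterable of dicts with hypothetical keys
             'parcel_id', 'settlement_code', 'usetype', 'street_code'.
    transport_usetypes: set of use-type values treated as street parcels.
    """
    streets = defaultdict(list)
    for parcel in parcels:
        if parcel["usetype"] in transport_usetypes and parcel.get("street_code"):
            key = (parcel["settlement_code"], parcel["street_code"])
            streets[key].append(parcel["parcel_id"])
    return dict(streets)


parcels = [
    {"parcel_id": "P1", "settlement_code": "00001", "usetype": "street", "street_code": "00101"},
    {"parcel_id": "P2", "settlement_code": "00001", "usetype": "street", "street_code": "00101"},
    {"parcel_id": "P3", "settlement_code": "00001", "usetype": "residential", "street_code": "00101"},
    {"parcel_id": "P4", "settlement_code": "00001", "usetype": "street", "street_code": None},
]
street_features = collect_street_parcels(parcels, {"street"})
```

Parcels with a transport use type but no street code (P4 above) are left out here; in practice they would be candidates for matching by name or by tag information.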

The type of localisation unit is important for distinguishing a place unambiguously. For example, a street and a square can have the same name within a settlement but define different locations.

Fig 5 legend: no cadastral data in settlement urban area; cadastral data in settlement urban area and in settlement lands; no cadastral data for the entire settlement land.


Fig 6. Capturing of street address parcels for establishing settlement street reference dataset

The level of settlements

Settlements are territorial units that define the address area and are important as a feature layer for address data referencing and management. For the purpose of addressing, they need to be modeled as complex features: settlement urban areas and settlement lands are needed as polygon features, and a characteristic point for the settlement needs to be provided within the settlement's main urban area for geocoding purposes. Settlement polygons are the basic territorial units from which the upper hierarchical levels of administrative units can be constructed.

The settlement codes are provided by NSI as administrator of NRPP and EKATTE. CA provides the boundaries of settlements. The characteristic points need to be calculated by NSI.

The grid net framework

Grid nets are important for spatial, grid-based statistics. No national grid system for spatial analysis and reporting is in use, so NSI plans to apply the European grid system, which should be reflected when setting up the production environment.


2.3. Address data management

To manage address reference data and geocoding consistently, standard processes and sequences of processes should be applied. Every process populates its own part of the metadata variables in address records, which are designed for information management at the record level.

Address Standardisation – several processes can contribute to address standardisation, depending on the input address representation (text, coded, etc.). Parsing splits address text into address components. Cleaning removes spelling errors and erroneous symbols. Repairing adds implicit information, fills gaps and formats the address.

Address Verification – checks whether an address is a ‘true address’ and has all mandatory attributes required by the specification. In short, it checks whether the address is correct and usable, even if it is not necessarily in the address index.

Address Validation or Identification – matches a formatted address to the index of valid addresses. An indicator of address matching confidence should be developed in the system: the more matching addresses are found, the lower the confidence of identification. The value could be, for example, the number of matched addresses. If the process fails to validate the address, it should report the level at which matching failed.

Address Learning – addresses are “learned” from the spatial reference dataset (from cadastral objects) if they are “true addresses” (they provide settlement information, street and street number) and are consistent with neighboring addresses and the neighboring thoroughfare feature. Addresses can be learned from any dataset if the address learning process is switched on.

Address Allocation – finds the address position among the map objects of one reference level, for example buildings. The process populates the geographic position attribute and position metadata of an existing address if it finds an object on the map with the same address identifier. Finding the address location on the map includes coding and repairing the address data in cadastral unit records and selecting the object if the address matches. The addresses within cadastral object attributes are first standardised and validated. For every address in the Address index

Address Management (Update, Registration, Archiving) – processes needed to manage address registration and updating routines. They populate time and status values.
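The first three processes above (standardisation, verification, validation) can be chained as in the following simplified sketch. It handles only street-type addresses written as "type name number"; the abbreviation table, the default type used for repairing and all names are assumptions made for the example, not the rules applied at NSI.

```python
import re

# Hypothetical abbreviation table for localisation unit types.
UNIT_TYPES = {"ul": "street", "ul.": "street", "bul.": "boulevard", "pl.": "square"}


def standardise(text, default_type="street"):
    """Clean, parse and repair a free-text street address."""
    cleaned = re.sub(r"[^\w\s.]", " ", text)        # cleaning: drop stray symbols
    tokens = cleaned.lower().split()
    if not tokens:
        return None
    unit_type = UNIT_TYPES.get(tokens[0])           # parsing: optional type prefix
    if unit_type:
        tokens = tokens[1:]
    else:
        unit_type = default_type                    # repairing: implicit type (assumption)
    number = tokens.pop() if tokens and tokens[-1][0].isdigit() else None
    return {"type": unit_type, "name": " ".join(tokens), "number": number}


def verify(address):
    """A usable address has all mandatory components filled."""
    return bool(address and address["name"] and address["number"])


def validate(address, index):
    """Return (matching address ids, level at which matching failed or None)."""
    same_unit = {
        aid: rec for aid, rec in index.items()
        if (rec["type"], rec["name"]) == (address["type"], address["name"])
    }
    if not same_unit:
        return [], "localisation unit"
    matches = sorted(aid for aid, rec in same_unit.items()
                     if rec["number"] == address["number"])
    return matches, (None if matches else "locator")


index = {
    1: {"type": "street", "name": "vasil levski", "number": "12"},
    2: {"type": "street", "name": "vasil levski", "number": "14"},
}
address = standardise("ul. Vasil Levski, 12")
matches, failed_level = validate(address, index)
```

The length of `matches` can serve as the confidence indicator mentioned above: exactly one match is the confident case, while more than one signals an ambiguous identification.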


Fig 7. “Learning” addresses from map features

Fig 8. Allocating addresses from Address Index on a building point or grid cell (from address

list to map)

Geocoding data by address is the ability to “travel” the path from the address description in data records to the geographic coordinates on the map. Preferably, the point represented by these coordinates should be the most accurate representation of the physical location of the addressable object. By default, the most precise location is recorded in the position attribute of the records in the address reference dataset. Every address in the Address Index should be positioned on a building object or, failing that, on a grid cell.


When addresses from the address index are allocated, the Allocation process populates the position attribute with the coordinates of a characteristic point from one of the following accuracy levels: [unit within building], building entrance, building, parcel, street or settlement.

Ideally, these coordinates should fall within the building footprint. When this is not possible, the allocation passes to the next level up in the hierarchy and returns the geocode of the most detailed level that is available.
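The fallback through accuracy levels can be written as a simple ordered search. In this sketch the level names follow the list above, while the data layout (a dictionary of characteristic points per level, keyed by address identifier) and the function name are assumptions.

```python
# Accuracy levels from most to least detailed.
LEVELS = ["unit within building", "building entrance", "building",
          "parcel", "street", "settlement"]


def allocate(address_id, points_by_level):
    """Return the position from the most detailed level that has a point.

    points_by_level: {level name: {address_id: (x, y)}} (hypothetical layout).
    """
    for level in LEVELS:
        point = points_by_level.get(level, {}).get(address_id)
        if point is not None:
            return {"position": point, "position_level": level}
    return None  # not even a settlement point: needs manual processing


points_by_level = {
    "building": {"A1": (10.0, 20.0)},
    "street": {"A1": (11.0, 21.0), "A2": (30.0, 40.0)},
    "settlement": {"A1": (0.0, 0.0), "A2": (0.0, 0.0), "A3": (5.0, 5.0)},
}
result_a1 = allocate("A1", points_by_level)
result_a2 = allocate("A2", points_by_level)
```

Recording the level alongside the coordinates keeps the accuracy of each geocode visible to later users of the data.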

We want to ensure that every address is geocoded at least to a 1 km ETRS89 grid cell. This will require some manual processing and, as a final step, imputation of the gridID in the address record.
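Once an address has coordinates, deriving the 1 km grid cell is a matter of truncating them to whole kilometres. The sketch below assumes that the coordinates are already projected in ETRS89-LAEA (EPSG:3035) metres and that the cell code follows the "1kmN…E…" pattern commonly used for European 1 km grids, built from the lower-left corner of the cell; the report does not state which code format NSI uses.

```python
import math


def grid_id_1km(easting_m, northing_m):
    """Return the 1 km grid cell code for a point in projected metres.

    Assumes ETRS89-LAEA coordinates; the code is built from the
    lower-left corner of the cell, expressed in kilometres.
    """
    east_km = math.floor(easting_m / 1000)
    north_km = math.floor(northing_m / 1000)
    return f"1kmN{north_km}E{east_km}"


# Illustrative coordinates only, not a real address position.
cell = grid_id_1km(5432100.5, 2345999.0)
```

All points inside the same square kilometre receive the same code, so the gridID can be imputed even for addresses located only at street or settlement level.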

2.4. Geocodes in statistical unit records

To provide quality address data for high-precision geocodes, how address data is entered and stored is important. In statistical records, address data is relatively well structured and formatted, and standard nomenclatures are used.

Measures are recommended to ensure the accuracy of address components in statistical records and to reduce inefficiency and duplication of work in address management.

Quality requirements on the completeness and correct spelling of address components can be met by consistent collection mechanisms that prevent spelling errors and the entry of incorrect values. Collection interfaces that standardise address data and check its existence in the reference dataset provide point-of-entry validation.

Fig. 9: Web interface for collection of addresses for Statistical Units in SBR.

Point-of-entry address validation against the address reference dataset is recommended for computer- and internet-based capture of address information. During the e-census collection phase, such validation within online survey forms could provide a mechanism to link the household to the frame of cadastral units/features and return a valid geocode. Capturing the address and its location should also be considered for cases where the address is not recognised by the address identification system. Addresses that are standardised and formatted at the time of entry into a statistical dataset yield high-quality geocodes.
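A point-of-entry check of this kind can be sketched as a lookup against the reference dataset that either returns the address identifier and geocode or hands the entry over for capture. The key fields, the reference layout and the returned structure are assumptions made for the example.

```python
KEY_FIELDS = ("settlement_code", "street_code", "number")


def validate_at_entry(entered, reference):
    """Check an entered address against the reference dataset.

    entered: dict of coded components typed or selected in the form.
    reference: {(settlement_code, street_code, number): record} (hypothetical layout).
    """
    key = tuple(str(entered.get(field, "")).strip().lower() for field in KEY_FIELDS)
    record = reference.get(key)
    if record is not None:
        return {"status": "valid",
                "address_id": record["address_id"],
                "geocode": record["grid_id"]}
    # Unrecognised: keep what was entered so the address and location can be captured.
    return {"status": "unrecognised", "captured": dict(entered)}


reference = {
    ("00001", "00101", "12"): {"address_id": 1, "grid_id": "1kmN2345E5432"},
}
ok = validate_at_entry(
    {"settlement_code": "00001", "street_code": "00101", "number": " 12 "}, reference)
unknown = validate_at_entry(
    {"settlement_code": "00001", "street_code": "00101", "number": "99"}, reference)
```

Returning the entered values for unrecognised addresses, rather than rejecting them, leaves room for the address and location capture step.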


Central address services would help ensure consistent standards and help control quality.

Fig. 10: Address services based on the established address reference dataset, provided for statistical registers and datasets.

3. Organisational setup - key findings and recommendations for activities.

Implementation of the foundational level of statistical spatial framework requires changes at

organisational level and that brings financial and operational challenges to the organisation.

Three points were found clearly important:

Single standard for addresses

Point-of-entry validation of address data

Development of central address processing system

Clear guidelines are needed both for staff that will maintain address reference dataset

and for users that will use address services.

3.1. Technical conditions

Automated corporate services based on one central reference address dataset are the preferred solution for establishing consistent address maintenance in statistical data and providing quality geocoding.

NSI needs centralised management and storage of collected geospatial data. This is one of the major steps, and one of the major resources, needed to enable enterprise-wide organisation of work and access to data and metadata.

Manual tasks take time and effort and increase the risk of errors. Automation provides efficiency and consistency and can significantly improve geocoding workflows.

A user-friendly web application is needed for address and building data analysis and management. It should provide tools for in-house and mobile capturing of building data.


The application should enable staff from central and regional statistical offices, staff from local authorities and staff hired for pre-enumeration tasks to work together on building and address data. Furthermore, styles for visual indication of missing or incorrect data could be applied. For instance, streets, buildings and address characteristic points with the same street name could be highlighted in the same colour.

3.2. Proposal for a sequence of steps for establishing a point-based address infrastructure for geocoding

The capability for address geocoding needs to be in place by the time of census enumeration. Implementation can be planned stepwise, so the following priorities and sequence of tasks for establishing the different levels of geocoding are proposed:

1. Enable Address Management and Address Identification.

Design spatial address reference dataset, design process metadata at record level.

Create guiding materials for address data management and addressing issues.

Implement.

Populate dataset by Registering selected initial collection of addresses (Census

2011)

Update to current

Standardise cadastral addresses and Validate/Identify

2. Enable Address Allocation.

Establish Geocoding Level of Settlements.

Allocate addresses on Settlement points and Analyze, Repair.

Establish Geocoding Level of Streets.

Allocate addresses on Streets points and Analyze, Repair.

Establish Geocoding Level of Buildings

Allocate addresses on Building points and Analyze, Repair.

[Establish Geocoding Level of Units within Building]

Allocate addresses on the level of Unit within building points and Analyze,

Repair.

3. Enable Address Geocoding.

Geocode Address index to ETRS89Grid_1km

4. Enable Point-of-entry Validation for computer and internet based capture of

address information to provide high precision address geocodes in statistical records.

3.3. Institutional arrangements required to conduct and support the address geocoding framework

Addresses from Population Register and Nomenclature of permanent and current addresses in

Bulgaria are received on a monthly basis from DG CRAS in an agreed content and in a defined

csv format in the framework of SPR. No additional action is needed.

CA provides NSI with copies of the cadastral map and cadastral register, excluding information on the ownership of immovable properties. The address data is attributed to spatial features, and address components are provided in separate fields. The copies are currently acquired once a year and are received as a structured geodatabase, accompanied by metadata on the content, in a predefined coordinate reference system based on the ETRS89 datum.

Two bilateral agreements were signed between NSI and CA at the end of October 2017 that

allow general access to the Cadastral information system and the Cartography fund.

The cadastral map can also be accessed as a standard web map service for a small annual maintenance fee. NSI is recommended to use this map service once the necessary technical conditions are in place, as it contains the most up-to-date information. NSI should also work on automating the updates and on streamlining the collection and processing of cadastral data, as the volume of data is continuously increasing.
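As an illustration of how a standard web map service can be queried programmatically, the following minimal sketch builds an OGC WMS 1.3.0 GetMap request URL. The endpoint, the layer name and the bounding box are placeholders chosen for the example; they are assumptions and do not describe the actual CA service.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width=1024, height=1024,
                   crs="EPSG:3857", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL for one layer and one bounding box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                              # empty value = default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),    # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


url = wms_getmap_url(
    "https://example.org/wms",   # placeholder, not the CA service address
    "buildings",                 # hypothetical layer name
    (2590000, 5260000, 2600000, 5270000),
)
print(url)
```

A map service of this kind returns rendered images, so it supports viewing and visual checks; bulk updates of the address reference dataset would still rely on the structured copies of cadastral data.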

To achieve a fully geocoded census in two years, it is essential to continue close collaboration with CA and the municipalities and to stay focused on the selected development priorities.

To maintain and improve the quality of address information shared between the cooperating organizations, the following opportunities for working groups were identified:

   Clearing coding issues at settlement level

   Clearing coding issues at street level


Conclusions

The main goal of the project was to identify and propose an applicable solution for a point-based foundation for geocoding, using available and trusted address and location data maintained in statistical and administrative datasets.

Given the initial level of maturity of the national spatial data infrastructure, finding a consistent path to high-resolution geocoding of statistical data was a real challenge. The project benefited from the valuable guidance of the GEOSTAT reports and outcomes on how to set up and use a point-based foundation for statistics, and from the dozens of good practices.

Based on the identified needs, concrete initiatives for geocoding and address collection during Census 2021 were proposed and included in the Census Programme. Additional financial resources were allocated from the central budget to Census 2021 for implementation. The role of CA as a partner institution of NSI in this process was recognised as essential and highlighted in the Census 2021 act and the Programme.

Work on the project improved understanding and expanded knowledge within NSI of administrative data from cadastral registers. Additional use cases were identified, and NSI can initiate activities to use the data not only as a reference infrastructure but also as a source for calculating statistics, such as land use statistics, land area statistics, housing statistics and more. Integrating the cadastral identifier into the statistical framework can facilitate linking further administrative data sources.

References

EFGS/GEOSTAT 2 (2017). A Point-based Foundation for Statistics - Final report from the GEOSTAT 2 project.

EFGS/GEOSTAT 3 National report (v0.96_Draft). Implementing the Statistical Geospatial Framework at Statistics Sweden.

EFGS/GEOSTAT 3

UN-GGIM: Europe (2017). Core Spatial Data Theme Address. Recommendation for Content. Version 1.0 - 2017-11-10.

UN-GGIM: Europe (2017). Core Spatial Data Theme Buildings. Recommendation for Content. Version 1.0 - 2018-06-01.


Annex

Summary table of the analysis and conclusions on selected data sources to establish a point-based infrastructure

Each quality factor is assessed for the following data sources.
Statistical sources: Census 2011; Statistical Population Register; Statistical Business Register; Newly built residential buildings/dwellings
Administrative sources: Cadastral map and cadastral registers; NCPCA; Local taxes

Quality factor: Relevant dataset units (with attributed addresses)
Census 2011: Residential buildings, dwellings
Statistical Population Register: Population
Statistical Business Register: Enterprises, Local units
Newly built residential buildings/dwellings: Newly built and destroyed residential buildings and the dwellings in residential buildings.
Cadastral map and cadastral registers: Land parcels, Buildings, Unit of property within building
NCPCA: Addresses for population registration, Localisation Units (Thoroughfares)
Local taxes: Immovable properties (real estates) declared for taxation.

Quality factor: Relevance of location represented by address
Census 2011: Addresses describe physical location of residential buildings and dwellings
Statistical Population Register: Addresses describe physical location of current residence of population.
Statistical Business Register: Addresses describe the postal address (correspondence address) for the Enterprise. Addresses describe physical location of the place of activity for Local Units
Newly built residential buildings/dwellings: Addresses describe physical location of newly built residential buildings
Cadastral map and cadastral registers: Addresses describe physical location of units
NCPCA: List of addresses currently approved for population registration, nomenclature of localisation units
Local taxes: Addresses of physical location of real estates declared by the owners for property taxation

Quality factor: Relevance of data source
Census 2011: Exhaustive statistical survey on population and housing as of 1st of February 2011. The dataset is result of one-off data collection.
Statistical Population Register: Microdata in statistical register, maintained by information system, source for demography statistics.
Statistical Business Register: Statistical register, maintained by information system. Collecting and integrating data from several administrative registers.
Newly built residential buildings/dwellings: Exhaustive statistical survey. The information on newly built and destroyed residential buildings is obtained quarterly through regular reports from all local administrations. For the period 2004-2007 annual data, since 2008 - quarterly and annual data.
Cadastral map and cadastral registers: Basic data on the location, boundaries and dimensions of immovable properties (real estate) within the territory of the country, submitted and kept up to date as well as in accordance with the law. Maintained for administrative purposes.
NCPCA: Nomenclature maintained for administrative purposes in the framework of population registration.
Local taxes: Local/municipality registers of declared immovable properties for which local taxes are collected. Maintained for administrative purposes.


Quality factor: Relevance of provider
Census 2011: Internal provider, authoritative, trusted. Follows statistical production standards.
Statistical Population Register: Internal provider, authoritative, trusted. Follows statistical production standards.
Statistical Business Register: Internal provider, authoritative, trusted. Follows statistical production standards.
Newly built residential buildings/dwellings: Internal provider, authoritative, trusted. Follows statistical production standards.
Cadastral map and cadastral registers: The cadastral authority is the Agency of Geodesy, Cartography and Cadastre of the Ministry of Regional Development and Urban Development. External provider, authoritative, trusted.
NCPCA: DG CRAS, Civil Registration and Administrative Services of the Ministry of Regional Development and Urban Development. External provider, authoritative, trusted.
Local taxes: Local authorities.

Quality factor: Legal basis
Census 2011: Statistical act, Census act
Statistical Population Register: Statistical act, EU and EC Regulations setting up common basic standards in the area of demography statistics, Civil Registration Act
Statistical Business Register: Statistical act, EU Regulations in legal framework for business registers for statistical purposes, Guidelines on Statistical Business Registers
Newly built residential buildings/dwellings: Statistical act, National Statistical Programme
Cadastral map and cadastral registers: Cadaster and Property Register Act
NCPCA: Civil Registration Act defines address content and format for address registration.
Local taxes: Local Taxes and Fees Act

Quality factor: Standard identifiers
Census 2011: No standard identifiers available for housing units, only census-system specific unique identifiers. Cannot provide links to administrative data for buildings/dwellings. Not suitable for use in the point-based infrastructure.
Statistical Population Register: National: Personal Identity Number (PIN)
Statistical Business Register: Unique Identification Code (UIC) of business entities
Newly built residential buildings/dwellings: Regulation Plan Identifier of the property from the territory regulation plans and/or cadastral ID of the building.
Cadastral map and cadastral registers: National cadastral IDs standardised by ordinance: for land properties, for buildings and for units within buildings; Not standard/Internal. Identifiers suitable for infrastructure.
NCPCA: Identifier of Localisation unit within settlement.
Local taxes: Identifier of declaration document; Regulation Plan Identifier of the property

Quality factor: Address resolution level/highest spatial level available
Census 2011: Address resolution to the level of property location (including within building locations).
Statistical Population Register: Address resolution to the level of property location (including within building locations).
Statistical Business Register: For Enterprises: postal address to the highest available level - location within building. For Local Units: addresses provide accuracy to the level of settlement. A higher resolution address for Local Units is needed.
Newly built residential buildings/dwellings: Address resolution to the level of property location/street number.
Cadastral map and cadastral registers: Geodetic accuracy standards for surveyed objects. No accuracy standard for related addresses of the objects, but address description formatting is set.
NCPCA: Addresses for citizen registration, included in the Classificatory of addresses, resolve to the level of property location/street number.
Local taxes: Address resolution to the level of property location.

Quality factor: Availability of spatial data representation
Census 2011: No spatial data available, only address description.
Statistical Population Register: No spatial data in the information source, only address description.
Statistical Business Register: No spatial data available, only address description.
Newly built residential buildings/dwellings: No spatial data available, only address description.
Cadastral map and cadastral registers: Vector polygons of landed properties, vector polygons of building footprints, vector polygons of schemes of property units within buildings
NCPCA: No spatial data available, only address description.
Local taxes: No spatial data available, only address description.

Quality factor: Address collection and storage format
Census 2011: Rules for control and validation in the time of electronic collection and data entry. Formatting and standardization of address components.
Statistical Population Register: Updated monthly from demographic events. System rules on validation and control. Address attributes stored in separate fields.
Statistical Business Register: Collected by standardised form, stored in separate fields in the database.
Newly built residential buildings/dwellings: Electronic form filled quarterly by information from municipality authorities. Stored in one field, comma delimited.
Cadastral map and cadastral registers: Addresses collected by special semicolon delimited format. Address elements stored in separate fields and can be provided as separate address attributes.
NCPCA: Addresses are defined by local authorities and reported to the regional structures of DG CRAS on a daily basis.
Local taxes: In declaration document, formatted text. Usually one-field text. Technical implementations differ by municipality.

Quality factor: Maintenance of updates of addresses, documentation of the updates and time stamps
Census 2011: No updates of the dataset. Addresses are valid for 1.02.2011
Statistical Population Register: Monthly updates received from the Unified System for Civil Registration and Administrative Service of Population by information on demographic events (incl. migration, e.g. change of address of current residence).
Statistical Business Register: Updated and documented in the time when event occurs in the source administrative registers.
Newly built residential buildings/dwellings: Addresses in the dataset are not updated. Time stamp is related with the date when the building receives completion certificate.
Cadastral map and cadastral registers: Address data in cadastral records is updated upon request of the property owner. No updates of address attributes are maintained currently.
NCPCA: Changes reported from municipalities on a daily basis and consolidated in the national database. Period of validity and status are documented on unit level.
Local taxes: Declaration by the property owner. Correcting declaration to change information.

Quality factor: Use of address attributes coding and standardisation
Census 2011: Settlement and administrative units national coding, Localisation Unit national coding, NUTS coding, controls in formatting of designators
Statistical Population Register: Settlement and administrative units national coding, Localisation Unit national coding, NUTS coding, controls in formatting of designators
Statistical Business Register: Settlement and administrative units national coding, Localisation Unit national coding, NUTS coding, Postal codes.
Newly built residential buildings/dwellings: Settlement and administrative units national coding, NUTS coding.
Cadastral map and cadastral registers: Settlement and administrative units national coding, Localisation unit national coding, Postal codes, Registered geographic name coding.
NCPCA: Settlement and administrative units national coding, Localisation unit national coding.
Local taxes: No coding applied to address attributes. Settlement code available in the declaration document.

Quality factor: Consistency of address coding
Census 2011: Consistent with national coding systems as of 02.2011
Statistical Population Register: Consistent with current national coding systems.
Statistical Business Register: Consistent with national administrative-territorial coding. Street codes are not updated on every change; the frequency of street code updates is not set.
Newly built residential buildings/dwellings: High percentage of unfilled street/street number attributes (addresses may not yet be assigned by the local authority at the time of data collection).
Cadastral map and cadastral registers: Coding of administrative units is consistent. Some inconsistencies on the level of settlement were found. Localisation units are barely coded.
NCPCA: Provides national coding systems.
Local taxes: Consistency of settlement coding not checked.

Quality factor: Completeness of address attributes
Census 2011: Around 8% of address descriptions are not geocodable (are not “true addresses”) due to missing address components.
Statistical Population Register: Around 7.9% of address descriptions in the dataset are not “true addresses” due to missing street name and/or street number.
Statistical Business Register: For Enterprises around 10% and for Local Units around 4% of address descriptions are not geocodable to the point.
Newly built residential buildings/dwellings: Around 40% of units are not geocodable by address description. Cadastral ID or Regulation Plan Identifier of the property could be used for geocoding.
Cadastral map and cadastral registers: It was difficult to assess accuracy of addresses in the part of street name because co. Completeness of localisation unit is difficult to assess
NCPCA: All addresses listed in the classificatory of addresses are “true addresses”.
Local taxes: Address accuracy not checked for completeness of address components.

Quality factor: Geographic coverage (as of beginning of 2018)
Census 2011: National.
Statistical Population Register: National.
Statistical Business Register: National.
Newly built residential buildings/dwellings: National.
Cadastral map and cadastral registers: 39% of the territory of the country was covered by digital cadaster at the beginning of 2018 and 73% at the beginning of 2019; coverage above 93% is estimated for 2020.
NCPCA: All settlements registered in the country (5256) define address areas. In 2003 settlements no localisation units are defined so no street addresses are formed.
Local taxes: All properties on the territory of municipalities within the country excluding properties with assessed value less than 1680 lv.

Quality factor: Complexity of pre-processing, standardisation
Census 2011: Easy to obtain and check the data. Easy to populate corresponding standard Address reference dataset structure.
Statistical Population Register: Easy to
Statistical Business Register: No parsing needed. For Local Units in SBR more detailed address need to be collected from corresponding legal units in administrative register.
Newly built residential buildings/dwellings: Cadastral IDs provide direct link to cadastral map and corresponding building object. Location of property by Regulation plan ID is an option where no digital cadaster is available.
Cadastral map and cadastral registers: CR Require time-consuming standardization of Localisation units. Clearing coding errors in settlement coding requires joint work with experts of the provider.
NCPCA: Easy to obtain and populate standard Address reference dataset structure.
Local taxes: Addresses of properties need to be parsed, normalized and standardised before use. Different formats of address. High resource consuming task.

Quality factor: Availability and costs, conditions of access
Census 2011: Internal source, information for addresses is available for use and processing and can be provided in format needed.
Statistical Population Register: Internal source, information on addresses is available for use and processing. Can be provided in XLS/XLSX, CSV, SAV or other formats.
Statistical Business Register: Internal source, information for addresses is available for use and processing. Address attributes together with UIC can be provided in dbf, txt and xls formats.
Newly built residential buildings/dwellings: Internal source, information for addresses is available for use and processing in xls format.
Cadastral map and cadastral registers: Full or partial copy of cadastral map in vector CAD format. Other formats by agreement. WMS available for annual fee. On-line access at http://kais.cadastre.bg for viewing, querying and downloading of parts of map.
NCPCA: Received monthly by an agreed CSV format, together with the information on demographic events (births, deaths, marriages, divorces, migration) in the IS Demography
Local taxes: Agreements are needed with the municipalities.

Quality factor: Relevance for establishing Address reference dataset for the Census 2021
Census 2011: Suitable for initial setup of Address reference dataset.
Statistical Population Register: Suitable for updating addresses of dwellings from individual population records.
Statistical Business Register: Suitable to extend the scope of address information by adding new categories of addresses.
Newly built residential buildings/dwellings: Appropriate for updates where cadastral information is not available. It is recommended that collection of the Cadastral ID be made obligatory where the region is covered by digital cadaster; otherwise collection of the Regulation Plan ID of the property should be obligatory.
Cadastral map and cadastral registers: Appropriate to provide position for characteristic point for building addresses. Suitable to extend the scope of address information. High quality location data: geodetic precision and accuracy. Suitable for establishing a point infrastructure.
NCPCA: Suitable for standardisation of address components. Regularly updated. Provides period of validity of address components and is suitable for address reference update.
Local taxes: Appropriate to verify address reference dataset locally, by local authorities.
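The completeness figures in the table rest on distinguishing “true addresses” from address descriptions with missing components. A minimal sketch of such a check follows, assuming a hypothetical single-field, delimited layout with the components settlement, street and number in that order; the field order, the function names and the sample records are illustrative only and are not taken from any of the assessed sources.

```python
from typing import Dict, Iterable, List

FIELDS = ("settlement", "street", "number")  # assumed field order


def parse_address(raw: str, sep: str = ",") -> Dict[str, str]:
    """Split a single-field address into named, whitespace-trimmed components."""
    parts: List[str] = [p.strip() for p in raw.split(sep)]
    parts += [""] * (len(FIELDS) - len(parts))  # pad missing trailing components
    return dict(zip(FIELDS, parts))


def is_true_address(address: Dict[str, str]) -> bool:
    """Geocodable to a point only if every component is filled."""
    return all(address[f] for f in FIELDS)


def share_not_geocodable(records: Iterable[str], sep: str = ",") -> float:
    """Fraction of records that lack at least one address component."""
    parsed = [parse_address(r, sep) for r in records]
    if not parsed:
        return 0.0
    return sum(not is_true_address(a) for a in parsed) / len(parsed)


sample = [
    "Sofia, Vitosha Blvd, 12",
    "Plovdiv, , 5",            # street name missing
    "Varna, Primorski Blvd",   # street number missing
    "Burgas, Aleksandrovska St, 3",
]
share = share_not_geocodable(sample)
print(f"{share:.0%} of sample records are not true addresses")
```

The same functions accept a different delimiter through the sep argument, for example a semicolon for sources that deliver addresses in a semicolon-delimited format.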