CLASSIFICATIONS – a key element in the process of harmonization

Isabel Valente (isabel.valente@ine.pt), Statistics Portugal/Metadata Unit
Work Session on Quality management systems (Q2010), Helsinki, 3–6 May 2010


[Fig. 1 Macro architecture of the statistical metadata system¹: Variables, Surveys, Concepts, Classifications, Thesaurus, Data warehouse, Production systems, Dissemination systems]

¹ In: Morgado, Isabel, “Metadata and survey documentation Portuguese NSI experience”, European Conference on Quality and Methodology in Official Statistics (Q2004), Mainz, Germany, 24–26 May 2004.

Integrated System of Statistical Classifications (SINE)

Based on the conceptual model developed by the Neuchâtel Group

SINE main phases

2002–2004
- development of the consultation application
- replacement of the existing information on classifications in the Portal

2004–2005
- enlargement of the information made available
- beginning of the gradual incorporation of code lists
- start of the development of the management application

2006–2007
- consolidation of the management application
- small adjustments and improvements in the consultation application

Current phase (2008)
- consolidation and improvement of the existing model
- harmonization of the existing information

SINE main purposes

1. to be a reference base for national, Community (EU) and international classifications used for statistical purposes

2. to be a reference instrument for the management of classifications

3. to be an instrument for the harmonization and coordination of classifications

SINE structure

[Diagram: Family > Classification > Version > Level > Item]

• Classifications

• Code lists for observation

• Code lists for dissemination
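The hierarchy that SINE takes from the Neuchâtel model (a family groups classifications, a classification has versions, a version has levels, and each level holds items) can be sketched as nested records. This is a minimal illustration only: the class and field names are hypothetical, not SINE's actual schema, the `kind` field is an assumed way of tagging the three kinds of structure listed above, and the family name "Labour market" is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    code: str
    label: str
    parent_code: Optional[str] = None  # code of the item one level up, if any


@dataclass
class Level:
    number: int  # 1 = most aggregated level
    items: List[Item] = field(default_factory=list)


@dataclass
class Version:
    version_id: str  # identifier style taken from the SINE examples (V00253)
    name: str
    kind: str  # hypothetical: "classification", "observation" or "dissemination"
    levels: List[Level] = field(default_factory=list)

    def find(self, code: str) -> Optional[Item]:
        """Return the item with this code, searching every level."""
        for level in self.levels:
            for item in level.items:
                if item.code == code:
                    return item
        return None


@dataclass
class Classification:
    name: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class Family:
    name: str
    classifications: List[Classification] = field(default_factory=list)


# Usage: a two-level excerpt of V00253 (Activity status, 2005)
status = Version("V00253", "Activity status, 2005", "classification", [
    Level(1, [Item("1", "Actives"), Item("2", "Inactives")]),
    Level(2, [Item("11", "Employed", "1"), Item("12", "Unemployed", "1")]),
])
family = Family("Labour market", [Classification("Activity status", [status])])
print(status.find("12").label)  # Unemployed
```

Keeping items inside levels inside versions mirrors the model directly; a production system would more likely store these as related tables.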

What’s the difference between a classification and a code list?

General ideas

Classifications
• more conceptual
• have a formal base
• complex structures
• large size
• have a system of codification
• formalized rules for revisions and changes
• versions are defined

Code lists
• less conceptual
• no formal base
• simple structures
• small size
• may or may not have a system of codification
• no formalized rules for revisions and changes
• not based on the concept of version
• operational lists for internal use by the institution

• Marital status
• Degree of relationship with the representative of the household
• Turnover classes
• Size classes of persons employed
• Sex

What to do?

Should those cases be considered classifications or code lists?

Classification structures based on:
• Community or national regulations
• methodological manuals
• Community or international recommendations
are treated as reference structures.

Consequence: whenever possible, the remaining structures (code lists) were brought into line with those reference structures.

Problem encountered

• 1st: access to the code lists used for data dissemination
• 2nd: access to the classification structure that is part of a recommendation or regulation

≈2000

Another problem

How can standard classifications or reference structures be distinguished from code lists?

Solution

Find distinctive elements in the version names

Rules for writing names (naming convention)

General form

Main part [+ ", " + formal qualifier] [+ " (" + informal qualifier + ")"] [+ " - variant " + n]

Examples:
- Nomenclature of territorial units for statistics, 2002 version
- International standard classification of education, 1997 (levels of education)
- Types of dwellings (4)

Specific form: variant

The variant is always the last part of the name and is formed by " - " + the word “variant” + the variant number.

Examples:
- CAE Rev.2 (sections C to E) - variant 1
- Classes of net monthly wages (IEFA, €) - variant 1

Constituent elements of the version name
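The general form of a version name can be read as a small assembly rule: a main part, then optional formal qualifier, informal qualifier and variant, always in that order. The function below is a hypothetical sketch of that rule, not SINE code; it assumes " - " as the variant separator, as in the SINE names V01675 and V02023 (the slides also show an en dash in some examples).

```python
from typing import Optional


def version_name(main: str,
                 formal: Optional[str] = None,
                 informal: Optional[str] = None,
                 variant: Optional[int] = None) -> str:
    """Assemble a version name from its constituent elements.

    General form: main [", " formal] [" (" informal ")"] [" - variant " n]
    """
    name = main
    if formal:
        name += ", " + formal  # formal qualifier, e.g. "2002 version"
    if informal:
        name += " (" + informal + ")"  # informal qualifier in brackets
    if variant is not None:
        name += " - variant " + str(variant)  # always the last part
    return name


print(version_name("International standard classification of education",
                   formal="1997", informal="levels of education"))
```

Building names from separate fields, rather than typing them freely, is one way to keep the order of the elements consistent across thousands of versions.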

Rules for writing names

Reference structures
• keep the original, official name
• keep the word “nomenclature” or “classification” in the name
• informal qualifiers are added to distinguish national classifications from Community ones

Code lists
• may or may not keep the original name
• must not contain the word “nomenclature” or “classification” in the name
• informal qualifiers are added to distinguish the code lists
• variants of a reference structure keep the name or acronym of that classification
• names should be general

Another problem

Lack of harmonization in the way classifications and code lists are written, and also in their contents

How to harmonize?

1. Harmonization of the names of

– classifications

– versions

– item labels

SINE internal rules for writing classification and version names

Names begin with a capital letter, followed by lower-case letters. Exceptions: acronyms, proper names and words that follow a full stop.

Examples:
• V00011 - Statistical classification of products by activity in the European Economic Community, 2002 version
• V00021 - International standard industrial classification of all economic activities, revision 3.1


The names of code lists should use the plural form.

Example:
• V01610 - Types of primary and lower secondary education

Code lists derived from a standard classification must keep the acronym or name of that standard classification in their own name.

Examples:
• V01675 - CAE Rev. 3 (total, sections C to N) - variant 2
• V01717 - CPA 2008 (legal services) - variant 7


These code lists must include the word “variant” in their name.

Example:
• V02023 - Activity status (IEFA) - variant 4

Cumulative structures must include the word “cumulative” in their name.

Example:
• V02069 - Countries (cumulative - air transport companies)


• Item labels should be written in full; abbreviations should be avoided. Exceptions: acronyms and proper names.

• Item labels begin with a capital letter, followed by lower-case letters.
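Several of the code-list naming rules above are mechanical enough to check automatically. The sketch below assumes such a check runs when a code list is registered; SINE's actual tooling is not described here, and the function name and messages are hypothetical. It covers the capital initial, the reserved words, the acronym and "variant" rule for derived lists, and the "cumulative" rule; the plural-form rule is left out because string matching cannot test it reliably.

```python
import re
from typing import List, Optional

RESERVED = ("classification", "nomenclature")  # kept for reference structures
VARIANT_SUFFIX = re.compile(r" - variant \d+$")


def check_code_list_name(name: str,
                         parent_acronym: Optional[str] = None,
                         cumulative: bool = False) -> List[str]:
    """Return the naming rules a code-list name breaks (empty list = OK)."""
    problems = []
    if not name[:1].isupper():
        problems.append("must begin with a capital letter")
    lowered = name.lower()
    if any(word in lowered for word in RESERVED):
        problems.append('must not contain "classification" or "nomenclature"')
    if parent_acronym is not None:
        # derived code lists keep the standard's acronym and are variants
        if parent_acronym not in name:
            problems.append(f"must keep the acronym {parent_acronym!r}")
        if not VARIANT_SUFFIX.search(name):
            problems.append('must end with " - variant <n>"')
    if cumulative and "cumulative" not in lowered:
        problems.append('must include "cumulative"')
    return problems


print(check_code_list_name("CPA 2008 (legal services)", parent_acronym="CPA 2008"))
```

The check applies only to code lists; reference structures follow the opposite rule and keep their official names.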

Problems with the names

• People give different names to the same things, depending on the perspective they take

• We should harmonize the expressions used, avoiding different names for the same thing


Types of flow
• Type of rail freight traffic
• Type of movement in port
• Type of traffic on the enterprise

Version  Code  Label
00811    T     Total
00811    1     National
00811    2     International


• However, when there are many versions of the same classification, we need elements to distinguish between them.


2. Harmonization of contents

How to do that?

Lists of countries

• compulsory harmonization of item codes and labels according to the ISO 3166-1 alpha-2 standard.

• the names of countries in Portuguese must follow the version approved by the Statistical Council.

• the groupings of countries used in code lists have been created and are managed centrally, to provide a consistent, harmonized reference base.

• codes are independent of language, so they remain unchanged in translations.
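Keying every label by its ISO alpha-2 code, with one label per language, is one way to make codes language-independent while groupings are defined once, centrally, as sets of codes. This is a minimal sketch: the labels are illustrative and are not the Statistical Council's approved list, and the grouping name `EU_EXCERPT` and the functions are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Labels keyed by ISO 3166-1 alpha-2 code, then by language (small excerpt).
COUNTRIES: Dict[str, Dict[str, str]] = {
    "PT": {"en": "Portugal", "pt": "Portugal"},
    "ES": {"en": "Spain", "pt": "Espanha"},
    "FI": {"en": "Finland", "pt": "Finlândia"},
    "BR": {"en": "Brazil", "pt": "Brasil"},
}

# Groupings are stored only as sets of codes, never as labels.
GROUPINGS: Dict[str, FrozenSet[str]] = {
    "EU_EXCERPT": frozenset({"PT", "ES", "FI"}),
}


def country_list(codes: Iterable[str], lang: str) -> List[Tuple[str, str]]:
    """Return (code, label) pairs in the requested language, sorted by code."""
    return [(c, COUNTRIES[c][lang]) for c in sorted(codes)]


def grouping(name: str, lang: str) -> List[Tuple[str, str]]:
    return country_list(GROUPINGS[name], lang)


print(grouping("EU_EXCERPT", "pt"))
```

Translating a list then only means swapping the label column; the codes, and every grouping built from them, stay the same.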

Activities or products code lists

• code lists derived from standard classifications must keep the same codes and labels as the standard where the categories are the same.

• where the categories differ, they must have different codes and labels.

• when consecutive categories are aggregated, their codes are joined by a hyphen (e.g. C-D).

• when non-consecutive categories are aggregated, their codes are joined by a plus sign “+” (e.g. A+C).
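The aggregation rule for activity codes can be sketched as a small function that groups the selected sections into runs. The sketch makes several assumptions. It treats the CAE Rev. 3 sections as running from A to U, as in NACE Rev. 2. A run of three or more consecutive sections is written with its two endpoints (C-E). A mixed selection combines both rules (A+C-E). The slide only shows the two pure cases, and `aggregate_code` is a hypothetical helper name.

```python
from typing import Iterable, List

# Assumed section letters of CAE Rev. 3 (aligned with NACE Rev. 2): A to U.
SECTIONS: List[str] = [chr(c) for c in range(ord("A"), ord("U") + 1)]


def aggregate_code(selected: Iterable[str], order: List[str] = SECTIONS) -> str:
    """Build the code of an aggregate of categories.

    Consecutive categories are joined with a hyphen (C-D); separate runs
    are joined with "+" (A+C). Mixed cases such as A+C-E are an assumption.
    """
    positions = sorted({order.index(code) for code in selected})
    if not positions:
        raise ValueError("at least one category is required")
    runs = []
    start = prev = positions[0]
    for pos in positions[1:]:
        if pos != prev + 1:  # gap found: close the current run
            runs.append((start, prev))
            start = pos
        prev = pos
    runs.append((start, prev))
    return "+".join(
        order[a] if a == b else f"{order[a]}-{order[b]}" for a, b in runs
    )


print(aggregate_code(["C", "D"]), aggregate_code(["A", "C"]))
```

Deriving the code from the selected sections, instead of typing it, guarantees that the same aggregate always receives the same code in every code list.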

Other code lists

• For code lists that cover the same subject but have no standard classification as a reference, we try to find the most comprehensive structure.

• Once found, that structure becomes the reference structure, and new code lists are aligned with it.


V00253 - Activity status, 2005

Code  Label
1     Actives
11    Employed
12    Unemployed
121   Unemployed seeking first job
122   Unemployed seeking new job
2     Inactives
21    Pupils/students
22    Homemakers
23    Retired
24    Permanent disabled for work
25    Others
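The codes of V00253 are hierarchical: each child code extends its parent's code by one digit (121 and 122 under 12; 11 and 12 under 1). Assuming that one-digit-per-level pattern holds, a comprehensive reference structure like this one can be checked for missing parents, and a shorter code list aligned with it can be derived by cutting at a level, so the shared categories keep identical codes and labels. The function names are hypothetical.

```python
from typing import Dict, List, Optional

# V00253 - Activity status, 2005 (codes and labels as listed above)
ACTIVITY_STATUS: Dict[str, str] = {
    "1": "Actives", "11": "Employed", "12": "Unemployed",
    "121": "Unemployed seeking first job", "122": "Unemployed seeking new job",
    "2": "Inactives", "21": "Pupils/students", "22": "Homemakers",
    "23": "Retired", "24": "Permanent disabled for work", "25": "Others",
}


def parent(code: str) -> Optional[str]:
    """Parent code = the code without its last digit (one digit per level)."""
    return code[:-1] or None


def children(codes: Dict[str, str], code: str) -> List[str]:
    return sorted(c for c in codes if parent(c) == code)


def check_hierarchy(codes: Dict[str, str]) -> List[str]:
    """Return the codes whose parent is missing from the list."""
    return [c for c in codes if parent(c) is not None and parent(c) not in codes]


def collapse_to_level(codes: Dict[str, str], max_level: int) -> Dict[str, str]:
    """Derive a shorter code list that keeps the reference codes and labels."""
    return {c: label for c, label in codes.items() if len(c) <= max_level}


print(children(ACTIVITY_STATUS, "12"))  # ['121', '122']
```

Because a derived list only drops rows, any category it keeps has exactly the code and label of the reference structure.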


• For other code lists where no standard can be found and the categories vary little, the codes and labels of the categories that stay the same are kept unchanged.


• Use of specific codes for specific situations in code lists:
– total coded as T
– residual values preferably coded 9, or with a code ending in 9

• The use of codes and labels from structures already in SINE is promoted, in preference to new codes and wordings.

Age groups

• UN, standard international age classification

• five-year and ten-year age groups, with boundaries generally starting at multiples of five or ten and ending in 4 or 9

• ages separated by a hyphen with a space before and after it (e.g. 15 - 19), which avoids connecting words and makes the labels more general
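Age-group labels following these conventions can be generated rather than typed. A minimal sketch, assuming integer ages. It also assumes the open-ended top group borrows the "and more" wording from the size-class rules that follow, which the age-group slide does not itself state. `age_group_labels` is a hypothetical name.

```python
from typing import List


def age_group_labels(width: int = 5, start: int = 0,
                     open_from: int = 85, unit: str = "years") -> List[str]:
    """Labels for five- or ten-year age groups.

    Each group starts at a multiple of the width and ends in 4 or 9; the two
    ages are joined by " - ". The last group is open-ended ("and more").
    """
    if width not in (5, 10):
        raise ValueError("width must be 5 or 10")
    if start % width or open_from % width or start >= open_from:
        raise ValueError("boundaries must be increasing multiples of the width")
    labels = [f"{lo} - {lo + width - 1} {unit}"
              for lo in range(start, open_from, width)]
    labels.append(f"{open_from} and more {unit}")
    return labels


print(age_group_labels(10, 0, 30))
```

Generating the labels from the boundaries keeps every list consistent in spacing, wording and unit.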

Other size classes

• consecutive classes must be unambiguous, so the same value must not appear in two different classes

• every item must state explicitly what is being counted or measured (e.g. years, euros, persons).

• minimum and maximum thresholds should use standard expressions:

– lowest class: “Less than” (e.g. Less than 30 years)

– highest class: “and more”, placed immediately after the last value (e.g. 65 and more years)

– the symbols “<”, “>”, “≤” and “≥” should not be used


• numbers above one thousand must have their digit groups separated by a space, to make thousands, tens of thousands, millions, etc. easier to read (e.g. 10 000 000)

• alternatively, powers of 10 may be used instead (e.g. 10⁶)
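The size-class rules can be combined in one label generator: "Less than" for the lowest class, "and more" for the highest, the unit in every label, no value shared by two classes, and digit groups separated by a space. This sketch assumes integer-valued classes and reuses the " - " separator from the age-group rule. The function names are hypothetical, and it groups every number of four or more digits.

```python
from typing import List, Sequence


def fmt_number(n: int) -> str:
    """Group digits in threes with a space: 10000000 -> "10 000 000"."""
    return f"{n:,}".replace(",", " ")


def size_class_labels(bounds: Sequence[int], unit: str) -> List[str]:
    """Labels for integer size classes with lower bounds b1 < b2 < ... < bk.

    Produces "Less than b1", "b1 - (b2 - 1)", ..., "bk and more", so no
    value belongs to two classes.
    """
    if not bounds or list(bounds) != sorted(set(bounds)):
        raise ValueError("bounds must be non-empty and strictly increasing")
    labels = [f"Less than {fmt_number(bounds[0])} {unit}"]
    for lo, hi in zip(bounds, bounds[1:]):
        labels.append(f"{fmt_number(lo)} - {fmt_number(hi - 1)} {unit}")
    labels.append(f"{fmt_number(bounds[-1])} and more {unit}")
    return labels


print(size_class_labels([10, 50, 250], "persons"))
```

Deriving each upper limit as the next lower bound minus one is what prevents the same value from appearing in two consecutive classes.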


Conclusions

• SINE made known what exists in terms of classifications
• extended the scope to code lists
• makes classification structures available:
– in a normalized format
– in an easy way
– at any time
– in accordance with users' needs


• This made it possible to:
– detect and correct writing errors
– harmonize the way codes and labels are written
– implement harmonization procedures and rules
– improve the clarity and precision of the terms used
– improve the integration between code lists and standard classifications
– harmonize codes and labels across code lists
– reduce the number of code lists needed, by creating generic, cross-cutting structures
– save time
– achieve greater integration between the different metadata subsystems


Classification systems are a key element in improving the quality and coherence of:
– the existing metadata
– the existing information

Thank you