An information model for a metadata-driven editing and imputation system Rok Platinovsek UNECE Work Session on Statistical Data Editing, April 24-26 2017




Contents

• Banff parameters

• Metadata information model

• Data organization


Banff


Banff procedures

• Edit Specification and Analysis

• Edit Summary Statistics Tables

• Error Localization

• Deterministic Imputation

• Donor Imputation

• Imputation Estimators

• Prorating

• Mass Imputation

• Outlier Detection

Categories: Amendment, Review, Selection


Banff processor


The metadata information model


E&I metadata information model

• Process flow

• Function

• Method

• Scalar parameter

• Variable list

• Weighted variable list

• Expression

• Edit set

• Estimator set

• Edit

• Estimator

• Algorithm


Process flow

• Topmost object

• Several process flows can be defined, e.g. production and testing


Function

• Description of E&I activity without reference to implementation

• Purpose attribute (review | selection | amendment)


Method

• Central information object in the model

• Implementation attribute, e.g., “Banff donor imputation”

• Parameter set depends on implementation


Driver table


| Function | Method name | Purpose | Implementation |
| Verify the set of edits | | prep | |
| | Verify the set of edits | | BANFF verifyedits |
| Edit summary tables | | review | |
| | Edit summary tables | | BANFF editstats |
| Identify outliers | | selection | |
| | Identify outliers; historic method | | BANFF outlier |
| | Identify outliers; current method | | BANFF outlier |
| Identify inconsistent observations and select fields for imputation | | selection | |
| | Identify inconsistent observations and select fields for imputation | | BANFF errorloc |
| Impute missing values and fields identified by error localization | | amendment | |
| | Deterministic imputation | | BANFF deterministic |
| | Donor imputation within areas | | BANFF donorimputation |
| | Donor imputation (unrestricted) | | BANFF donorimputation |
| | Estimator imputation (negative values not accepted) | | BANFF estimatorimputation |
| | Estimator imputation for QR_PROF (negative values accepted) | | BANFF estimatorimputation |

E&I process for the egg-laying statistic (production)
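The driver table can be read as a two-level hierarchy: each function carries a purpose, and each method under it carries an implementation. The following is a minimal sketch of that reading, not the system's actual storage format. The class names `Function` and `Method` and the helper `driver_rows` are hypothetical; the row contents are transcribed from the table.

```python
from dataclasses import dataclass, field


@dataclass
class Method:
    name: str
    implementation: str  # e.g. "BANFF outlier"


@dataclass
class Function:
    name: str
    purpose: str  # prep | review | selection | amendment
    methods: list = field(default_factory=list)


# Rows transcribed from the driver table; the class layout is an assumption.
process_flow = [
    Function("Verify the set of edits", "prep", [
        Method("Verify the set of edits", "BANFF verifyedits"),
    ]),
    Function("Edit summary tables", "review", [
        Method("Edit summary tables", "BANFF editstats"),
    ]),
    Function("Identify outliers", "selection", [
        Method("Identify outliers; historic method", "BANFF outlier"),
        Method("Identify outliers; current method", "BANFF outlier"),
    ]),
    Function("Identify inconsistent observations and select fields for imputation",
             "selection", [
        Method("Identify inconsistent observations and select fields for imputation",
               "BANFF errorloc"),
    ]),
    Function("Impute missing values and fields identified by error localization",
             "amendment", [
        Method("Deterministic imputation", "BANFF deterministic"),
        Method("Donor imputation within areas", "BANFF donorimputation"),
        Method("Donor imputation (unrestricted)", "BANFF donorimputation"),
        Method("Estimator imputation (negative values not accepted)",
               "BANFF estimatorimputation"),
        Method("Estimator imputation for QR_PROF (negative values accepted)",
               "BANFF estimatorimputation"),
    ]),
]


def driver_rows(flow):
    """Flatten the hierarchy into (function, purpose, method, implementation) rows."""
    return [(f.name, f.purpose, m.name, m.implementation)
            for f in flow for m in f.methods]


for row in driver_rows(process_flow):
    print(" | ".join(row))
```

Keeping the purpose on the function and the implementation on the method mirrors the split between what an E&I activity does and how it is carried out.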


Method parameters


Scalar parameter

• Name-value pair

• E.g. name="mindonors" value="10"

• Information model extensible


Variable list

• Ordered vector of variable names

• Each variable is identified by a unique variable ID


Expression

• SAS expression, used for various purposes

• E.g. “strat=1” selects the subset to which an E&I action is applied

• Contains a variable list of the variables used in the expression


Edit & edit set

• The same edit can appear in several edit sets

• Several methods can use the same edit set

• An edit is expressed as a SAS expression


Example: edit rule definition

• Edit rule: Edit 1

• Expression: HEN_LT20 + HEN_GE20 + HEN_OTH = HEN_TOT

• Variable list: HEN_LT20, HEN_GE20, HEN_OTH, HEN_TOT
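To illustrate how an edit's expression and its variable list relate, here is a small sketch in Python (the system itself stores SAS expressions). It derives the variable list from the expression text and checks the balance edit against a record. The helper names and the two sample records are hypothetical, and the identifier scan is a simplification that would also pick up function names or keywords in a more general SAS expression.

```python
import re

edit_expression = "HEN_LT20 + HEN_GE20 + HEN_OTH = HEN_TOT"
variable_list = ["HEN_LT20", "HEN_GE20", "HEN_OTH", "HEN_TOT"]


def variables_in(expression):
    """Return the distinct identifiers in an expression, in order of first appearance."""
    seen = []
    for name in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression):
        if name not in seen:
            seen.append(name)
    return seen


def passes_balance_edit(expression, record):
    """Check a balance edit of the form 'A + B + ... = TOTAL' against one record."""
    left, right = expression.split("=")
    left_sum = sum(record[term.strip()] for term in left.split("+"))
    return left_sum == record[right.strip()]


# Illustrative records only; the values are made up for this sketch.
record_ok = {"HEN_LT20": 100, "HEN_GE20": 250, "HEN_OTH": 50, "HEN_TOT": 400}
record_bad = dict(record_ok, HEN_TOT=390)

print(variables_in(edit_expression) == variable_list)
print(passes_balance_edit(edit_expression, record_ok))
print(passes_balance_edit(edit_expression, record_bad))
```

Storing the variable list next to the expression means the variables an edit touches can be looked up without parsing the expression each time.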


Example: donor imputation

• Method: Donor imputation (unrestricted)

• Edit sets: Production edits, Post-imp edits

• Edit rules: Edit 1, Edit 2, Edit 3, Post-imp edit 1, Post-imp edit 2

• Variable list: STRAT, HEN_TOT, QR_REV

• Scalar parameter: name="acceptnegative", value="yes"

• Scalar parameter: name="mindonors", value="10"
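The objects in this example could be composed roughly as follows. This is a sketch under assumptions: the class names are hypothetical, and the assignment of edits to edit sets is inferred from the edit names, since the slide text does not state which edit belongs to which set.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edit:
    name: str


@dataclass
class EditSet:
    name: str
    edits: list


@dataclass
class Method:
    name: str
    edit_sets: list = field(default_factory=list)
    variable_list: list = field(default_factory=list)
    scalar_parameters: dict = field(default_factory=dict)


edit1, edit2, edit3 = Edit("Edit 1"), Edit("Edit 2"), Edit("Edit 3")
post1, post2 = Edit("Post-imp edit 1"), Edit("Post-imp edit 2")

# Assumed membership: inferred from the names, not stated on the slide.
production_edits = EditSet("Production edits", [edit1, edit2, edit3])
post_imp_edits = EditSet("Post-imp edits", [post1, post2])

donor_imputation = Method(
    name="Donor imputation (unrestricted)",
    edit_sets=[production_edits, post_imp_edits],
    variable_list=["STRAT", "HEN_TOT", "QR_REV"],  # ordered vector of variable names
    scalar_parameters={"acceptnegative": "yes", "mindonors": "10"},
)

print(donor_imputation.name)
print([s.name for s in donor_imputation.edit_sets])
print(donor_imputation.scalar_parameters)
```

Because edit sets hold references to edit objects, the same edit can appear in several sets, and several methods can point at the same set.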


Metadata versioning

• Once parameters are used in production, they should be retained indefinitely

• This requires a versioning mechanism

• Example:

Initial version:

<methodparameters>
  <scalarparameter vstart="2017-01-11" vend="" name="mindonors" value="10"/>
</methodparameters>

After the value is changed on 2017-04-25:

<methodparameters>
  <scalarparameter vstart="2017-01-11" vend="2017-04-25" name="mindonors" value="10"/>
  <scalarparameter vstart="2017-04-25" vend="" name="mindonors" value="15"/>
</methodparameters>
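A lookup against versioned parameters might work as in the sketch below, which returns the value of a scalar parameter valid on a given date. It assumes that `vstart` is inclusive, `vend` is exclusive, and an empty `vend` marks the currently valid version; these interval semantics are a reading of the example, not something the slides state. The function name `value_at` is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

METADATA = """
<methodparameters>
  <scalarparameter vstart="2017-01-11" vend="2017-04-25" name="mindonors" value="10"/>
  <scalarparameter vstart="2017-04-25" vend="" name="mindonors" value="15"/>
</methodparameters>
"""


def value_at(xml_text, name, when):
    """Return the value of scalar parameter `name` valid on date `when`, or None."""
    root = ET.fromstring(xml_text)
    for param in root.findall("scalarparameter"):
        if param.get("name") != name:
            continue
        start = date.fromisoformat(param.get("vstart"))
        vend = param.get("vend")
        # Assumption: an empty vend means the version is still valid.
        end = date.fromisoformat(vend) if vend else date.max
        if start <= when < end:
            return param.get("value")
    return None


print(value_at(METADATA, "mindonors", date(2017, 2, 15)))
print(value_at(METADATA, "mindonors", date(2017, 4, 25)))
print(value_at(METADATA, "mindonors", date(2017, 1, 10)))
```

With half-open intervals, the version that ends on 2017-04-25 and the one that starts on that date never overlap, so a timestamp in the audit trail resolves to exactly one parameter version.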


Data organization


Data organization principles

• E&I process fully described in the metadata

• Audit trail info required for traceability and reproducibility:

a) Mark the field that was reviewed, selected or amended

b) Identify the method via reference to metadata

c) Timestamp (there may be multiple parameter set versions)


A naive data organization model

| edt_status | edt_mref | edt_time | id | year | class | var1 | var2 | edt_n_var1 |
| original | | 2016-03-01 09:31 | 001 | 2015 | 2 | 45 | 150 | . |
| original | | 2017-02-15 15:01 | 001 | 2016 | 2 | 51 | 156 | . |
| original | | 2016-03-01 09:31 | 002 | 2015 | 9 | 12 | 99 | . |
| original | | 2017-02-15 15:01 | 002 | 2016 | 9 | 60 | 110 | . |
| selection/banff_errorloc | ref5 | 2017-02-16 10:03 | 002 | 2016 | 9 | . | . | 1 |
| amendment/banff_estimator | ref7 | 2017-02-15 10:23 | 002 | 2016 | 9 | 13 | 110 | 1 |

• edt_status is either "original" or denotes the E&I method applied

• edt_mref is the link to the metadata

• An indicator variable (prefix edt_n_) is added for each original variable that is subject to E&I actions
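With this layout, an indicator such as the imputation rate can be computed from the status and indicator columns alone. The sketch below uses the rows of the example table, with the SAS missing value "." represented as `None`. The definition of the rate (share of units in a year whose variable was amended) and the function name are assumptions made for illustration.

```python
COLUMNS = ("edt_status", "edt_mref", "edt_time", "id", "year",
           "class", "var1", "var2", "edt_n_var1")

# Rows transcribed from the example table; "." (SAS missing) becomes None.
ROWS = [
    ("original", None, "2016-03-01 09:31", "001", 2015, 2, 45, 150, None),
    ("original", None, "2017-02-15 15:01", "001", 2016, 2, 51, 156, None),
    ("original", None, "2016-03-01 09:31", "002", 2015, 9, 12, 99, None),
    ("original", None, "2017-02-15 15:01", "002", 2016, 9, 60, 110, None),
    ("selection/banff_errorloc", "ref5", "2017-02-16 10:03", "002", 2016, 9, None, None, 1),
    ("amendment/banff_estimator", "ref7", "2017-02-15 10:23", "002", 2016, 9, 13, 110, 1),
]
records = [dict(zip(COLUMNS, row)) for row in ROWS]


def imputation_rate(records, year, variable):
    """Share of units in `year` whose `variable` was amended (assumed definition)."""
    indicator = "edt_n_" + variable
    units = {r["id"] for r in records
             if r["year"] == year and r["edt_status"] == "original"}
    amended = {r["id"] for r in records
               if r["year"] == year
               and r["edt_status"].startswith("amendment/")
               and r.get(indicator) == 1}
    if not units:
        return None  # no original records for that year
    return len(amended & units) / len(units)


print(imputation_rate(records, 2016, "var1"))
print(imputation_rate(records, 2015, "var1"))
```

The same filter on edt_status, applied with a different prefix, would count selected rather than amended fields.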


Demands on the data organization model

a) Data pertaining to different production cycles can be extracted in a standardized way. The extracted data have a standard structure.

b) Any editing version can be extracted in a standardized way.

c) Indicators like the imputation rate can be calculated in a standardized way.

d) Traceability and reproducibility of E&I actions via audit trail.


Conclusions


• Information model for a Banff-based E&I system with full audit trail

• Metadata information model: 12 metadata objects that fully specify the E&I process

• Data organization principles

• Extendable to non-Banff implementations with minimal or no changes


Thank you!

Rok Platinovsek [email protected]

UNECE Work Session on Statistical Data Editing, April 24-26 2017