End-to-End Management of the Statistical Process An Initiative by ABS

End-to-End Management of the Statistical Process: An Initiative by ABS. Bryan Fitzpatrick, Rapanea Consulting Limited and Australian Bureau of Statistics. Work Session on Statistical Metadata (METIS), March 2010, Geneva.


Transcript of End-to-End Management of the Statistical Process An Initiative by ABS

Page 1: End-to-End Management of the Statistical Process An Initiative by ABS

End-to-End Management of the Statistical Process

An Initiative by ABS

Bryan Fitzpatrick
Rapanea Consulting Limited and Australian Bureau of Statistics

Work Session on Statistical Metadata (METIS)
March 2010, Geneva

Page 2: End-to-End Management of the Statistical Process An Initiative by ABS

The Objectives
• Business transformation aimed at
    • reducing cost
    • improving effectiveness and ability to respond
  – A holistic approach to managing and improving the entire statistical life-cycle
• International collaboration
  – ABS does not want to go it alone
  – aim is for a shared approach
    • sharing of ideas, interfaces, tools
    • but with acceptance of national differences
• Build on recent progress in international statistical community
  – standards (SDMX, DDI), GSBPM
  – aim is to make them work in practice
• A new program: IMTP
  – Information Management Transformation Program

Page 3: End-to-End Management of the Statistical Process An Initiative by ABS

End-to-End Management of the Statistical Process
• Metadata is always the key to better approaches and process improvements
  – it has been in all previous ABS improvement programs
  – ABS has a long history of trying to manage metadata (with modest successes)
• Metadata means all the information we use in and around the processes and the data
  – to improve things we need to understand it, rationalise it, share it, and use it to automate and drive processes and make the outputs more integrated and usable
• Previous improvement programs have generally been much more limited
  – Focused on a few areas in a few projects
  – Narrow metadata focus

Page 4: End-to-End Management of the Statistical Process An Initiative by ABS

SDMX and DDI

• They are useful standards
  – they are not the focus of ABS interest in the exercise
    • the focus is optimising the statistical processes and improving the results from the processes
    • but we need to describe and manage all aspects of the statistical process and that is their target domain
  – they are international standards
    • sponsored and used by the community ABS is part of for purposes that are relevant to IMTP
    • to discuss the issues internally and with other organisations we need models
      – SDMX and DDI are in use, relevant, and fit for purpose
  – IMTP aims to apply these standards (along with some others: ISO 11179, ISO 19115) and make them work
    • build on recent work in the international statistical community

Page 5: End-to-End Management of the Statistical Process An Initiative by ABS

IMTP and Metadata Management
• Metadata Management will be a major part of IMTP
  – storing it, rationalising it, making it available for sharing and easy use, presenting it in different ways
    • and integrating with existing stores such as Input Data Warehouse, Data Element Repository, ABS Information Warehouse
  – we talk of a “Metadata Bus” and “Metadata Services”
    • some technical jargon
      – it means the metadata is easily available to all systems running in the ABS environment
      – we are still figuring out precisely what we mean and how it should look
    • we need to get “use cases”: examples of what business areas and their systems need to do with the metadata
    • but the services will deliver various sorts of metadata in XML formats
      – conforming to schemas from DDI and SDMX

Page 6: End-to-End Management of the Statistical Process An Initiative by ABS

IMTP and Metadata Management
• IMTP focus will be on metadata that is “actionable”
  – it means we want it in a form that both people and systems can use
    • that can be easily stored and passed around
    • that can be used easily to generate whatever format is required in any particular case
      – including web pages, PDFs, manuals, other human-readable forms
    • SDMX and DDI both represent the metadata in XML
• Major focus on metadata management
  – versioned and maintained as in SDMX and DDI
  – “confrontation” across collections and processes
    • aim is consistent, standard metadata across the organisation
      – and consistent with international use wherever sensible
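
One way to picture “actionable” metadata is a single versioned definition from which both a machine-readable and a human-readable form are generated. The sketch below is illustrative only: the class names, the `CL_SEX` identifier, and the XML element names are assumptions made for the example and do not follow the actual SDMX or DDI schemas.

```python
from dataclasses import dataclass
from html import escape
from typing import Tuple
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Code:
    value: str  # stored representation, e.g. "M"
    label: str  # human-readable meaning, e.g. "Male"


@dataclass(frozen=True)
class Codelist:
    id: str
    version: str
    name: str
    codes: Tuple[Code, ...]


def to_xml(cl: Codelist) -> str:
    """Machine-readable form (hypothetical element names, not a real schema)."""
    root = ET.Element("Codelist", id=cl.id, version=cl.version)
    ET.SubElement(root, "Name").text = cl.name
    for code in cl.codes:
        element = ET.SubElement(root, "Code", value=code.value)
        element.text = code.label
    return ET.tostring(root, encoding="unicode")


def to_html(cl: Codelist) -> str:
    """Human-readable form generated from the same definition."""
    rows = "".join(
        f"<tr><td>{escape(c.value)}</td><td>{escape(c.label)}</td></tr>"
        for c in cl.codes
    )
    return (
        f"<h2>{escape(cl.name)} (version {escape(cl.version)})</h2>"
        f"<table><tr><th>Code</th><th>Label</th></tr>{rows}</table>"
    )


# Hypothetical example content.
SEX = Codelist("CL_SEX", "1.0", "Sex", (Code("M", "Male"), Code("F", "Female")))

if __name__ == "__main__":
    print(to_xml(SEX))
    print(to_html(SEX))
```

Because both outputs derive from the one `Codelist` object, a change to the definition flows to systems and documentation together, which is the property the slide calls actionable.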

Page 7: End-to-End Management of the Statistical Process An Initiative by ABS

What sorts of metadata?
• Current ABS metadata management has many shortcomings
  – much metadata in corporate stores
    • in too many stores, and often documentary rather than actionable
    • often not used to drive systems even where it is available and actionable
      – the systems predated the stores
  – but much metadata is still embedded in individual systems
  – there are cases of good managed shared approaches
    • but often narrowly focused
      – eg around dissemination
• End-to-end management of the process requires a comprehensive, consistent approach
  – questions, question controls, interviewer instructions
  – coding, editing and derivation metadata
  – data relationship metadata
  – table structures
  – classification evolution and history
  – alternative hierarchies in geography and other classifications
  – …

Page 8: End-to-End Management of the Statistical Process An Initiative by ABS

SDMX and DDI
• SDMX comes from the international agencies (OECD, IMF, Eurostat, UNSD, World Bank, ECB, BIS)
  – they get aggregate statistical tables from many countries regularly over time
  – they want to automate and manage the process
    • they need standard agreed definitions and classifications, standard agreed table structures, standard agreed formats for both data and metadata
  – They commissioned SDMX in 2002
    • started a project, gathered use cases, employed consultants
    • produced a standard and presented it to large numbers of international statistical forums
    • started to use it and to pressure NSOs to use it
  – SDMX is pretty good
    • excellent for managing dissemination of statistical data
      – very good tools for very impressive web sites based on data organised in the SDMX model
    • also some good frameworks for managing evolution of classifications
    • a framework for discussing agreements on concepts and classifications
      – Metadata Common Vocabulary, Cross-Domain Concepts, Domain-specific Concepts

Page 9: End-to-End Management of the Statistical Process An Initiative by ABS

SDMX and DDI
• DDI (Data Documentation Initiative) comes from the data archive organisations across many countries
  – trying to capture and store survey data for future use
    • and to document it so future users can understand it and make sense of it
    • mostly social science collections from researchers
    • funding organisations are requiring such data to be preserved for further use
  – mostly they had to grab data and try to salvage metadata after the event
    • but DDI now aims to capture all metadata “at source”
  – early versions were narrowly focused on an individual data set
    • grew out of their documentation processes
  – latest version (DDI V3) is much more extensive, better organised
    • common analysis/designer support with SDMX
    • an end-to-end model compatible with the Generic Statistical Business Process Model (GSBPM)

Page 10: End-to-End Management of the Statistical Process An Initiative by ABS

DDI Metadata
• DDI has
  – Survey-level metadata
    • Citation, Abstract, Purpose, Coverage, Analysis Unit, Embargo, …
  – Data Collection Metadata
    • Methodology, Sampling, Collection strategy
    • Questions, Control constructs, and Interviewer Instructions organised into schemes
  – Processing metadata
    • Coding, Editing, Derivation, Weighting
  – Conceptual metadata
    • Concepts organised into schemes
      – Including 11179 links
    • Universes organised into schemes
    • Geography structures and locations organised into schemes

Page 11: End-to-End Management of the Statistical Process An Initiative by ABS

DDI Metadata
• DDI has (cont)
  – Logical metadata
    • Categories organised into schemes
      – (Categories are labels and descriptions for question responses, eg, Male, Unemployed, Plumber, Australia, ..)
    • Codes organised into schemes and linked to Categories
      – (Codes are representations for Categories, eg “M” for Male, “Aus” for Australia)
    • Variables organised into schemes
      – Variables are the places where we hold the codes that correspond to a response to a question
    • Data relationship metadata
      – eg, how Persons are linked to Households and Dwellings
    • NCube schemes
      – descriptions for tables
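
The separation of Categories, Codes, and Variables described above can be sketched as three small structures. This is a simplified illustration, not the DDI schema: the class names, identifiers such as `cat_male`, and the question text are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Category:
    """A label and description for a possible response, e.g. Male."""
    id: str
    label: str


@dataclass
class CodeScheme:
    """Codes are representations that point at Categories."""
    id: str
    codes: Dict[str, str]  # code value -> Category id


@dataclass
class Variable:
    """The place where the code for a response to a question is held."""
    name: str
    question: str
    code_scheme: CodeScheme


def decode(variable: Variable, stored_value: str,
           categories: Dict[str, Category]) -> str:
    """Follow Variable -> Code -> Category to recover the response label."""
    category_id = variable.code_scheme.codes[stored_value]
    return categories[category_id].label


# Hypothetical example content.
CATEGORIES = {
    c.id: c
    for c in (Category("cat_male", "Male"), Category("cat_female", "Female"))
}
SEX_CODES = CodeScheme("cs_sex", {"M": "cat_male", "F": "cat_female"})
SEX_VAR = Variable("SEX", "What is the person's sex?", SEX_CODES)

if __name__ == "__main__":
    print(decode(SEX_VAR, "M", CATEGORIES))
```

Keeping Categories apart from Codes means the same Category can be represented by different codes in different collections without duplicating its label and description.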

Page 12: End-to-End Management of the Statistical Process An Initiative by ABS

DDI Metadata• DDI has (cont)

  – Physical metadata
    • record structures and layouts
  – File instance metadata
    • specific data files linked to their record structures
  – Archive metadata
    • archival formats, locations, retention times, etc
  – Places for other stuff not elsewhere described
    • Notes, Other Material
  – References to “Agencies” which own artefacts but no explicit structure to describe them
  – Inheritance and links embedded in most schemes
    • but need to be ferreted out, not necessarily easily usable

Page 13: End-to-End Management of the Statistical Process An Initiative by ABS

SDMX Metadata

• SDMX has
  – Organisations organised into schemes
    • Organisations own and manage artefacts, and provide or receive things
  – Concepts organised into schemes
  – Codelists, including classifications
    • a Codelist combines DDI Categories and Codes
  – Data Structure Definitions (Key Families)
    • a DSD describes a conceptual multi-dimensional cube used in a Data Flow and referenced in Datasets
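
As a rough illustration of how a DSD makes a cube description actionable, the sketch below models a Codelist as a code-to-label mapping and a DSD as an ordered set of dimensions, then checks an observation key against it. The class names, the `validate_key` helper, and the identifiers (`CL_SEX`, `CL_AREA`, `DSD_POPULATION`) are assumptions for the example; real SDMX structures carry considerably more (attributes, versioning, maintenance agencies).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Codelist:
    id: str
    codes: Dict[str, str]  # code -> label (code and category held together)


@dataclass
class Dimension:
    concept: str
    codelist: Codelist


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: List[Dimension]
    measure: str

    def validate_key(self, key: Dict[str, str]) -> List[str]:
        """Return a list of problems; an empty list means the key fits the cube."""
        problems = []
        known = {d.concept for d in self.dimensions}
        for dim in self.dimensions:
            if dim.concept not in key:
                problems.append(f"missing dimension {dim.concept}")
            elif key[dim.concept] not in dim.codelist.codes:
                problems.append(
                    f"unknown code {key[dim.concept]!r} for {dim.concept}"
                )
        for concept in key:
            if concept not in known:
                problems.append(f"unexpected dimension {concept}")
        return problems


# Hypothetical example content.
CL_SEX = Codelist("CL_SEX", {"M": "Male", "F": "Female"})
CL_AREA = Codelist("CL_AREA", {"AUS": "Australia", "NZL": "New Zealand"})
DSD = DataStructureDefinition(
    "DSD_POPULATION",
    [Dimension("SEX", CL_SEX), Dimension("REF_AREA", CL_AREA)],
    measure="OBS_VALUE",
)

if __name__ == "__main__":
    print(DSD.validate_key({"SEX": "M", "REF_AREA": "AUS"}))
    print(DSD.validate_key({"SEX": "X"}))
```

The same definition that documents the table for people is what a loading or dissemination system would consult to accept or reject incoming data.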

Page 14: End-to-End Management of the Statistical Process An Initiative by ABS

SDMX Metadata
• SDMX has
  – Data Flows
    • described by a DSD, linked to registered data sets, and categorised
  – Categories organised into schemes
    • not the same as a DDI Category
    • provide a basis for indexing and searching data
  – Hierarchical Codelists
    • a misnomer: it maps relationships amongst inter-related classifications
    • explicit, actionable representations of relationships
  – Process metadata
    • a Process has steps with descriptions, transition rules, computation information, inputs, outputs
    • all actionable, linked to other SDMX artefacts or to external sources
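
To suggest what an “explicit, actionable” relationship between classifications might look like in practice, the sketch below holds child-to-parent links between two code sets as plain data and uses them to aggregate values. It is a deliberate simplification and not the SDMX Hierarchical Codelist structure itself; the codes, the `STATE_TO_COUNTRY` mapping, and the `roll_up` helper are all hypothetical.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical links: each code in a regional classification points to its
# parent code in a related country classification.
STATE_TO_COUNTRY: Dict[str, str] = {
    "NSW": "AUS",
    "VIC": "AUS",
    "AKL": "NZL",
}


def roll_up(values: Dict[str, int], links: Dict[str, str]) -> Dict[str, int]:
    """Aggregate values keyed by child code to totals keyed by parent code."""
    totals: Dict[str, int] = defaultdict(int)
    for child, value in values.items():
        totals[links[child]] += value
    return dict(totals)


if __name__ == "__main__":
    print(roll_up({"NSW": 5, "VIC": 3, "AKL": 2}, STATE_TO_COUNTRY))
```

Because the relationship is data rather than logic buried in a system, the same links could drive aggregation, validation, or navigation in a dissemination site.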

Page 15: End-to-End Management of the Statistical Process An Initiative by ABS

SDMX Metadata

• SDMX has
  – Structure Sets
    • additional linking of related DSD and Flows
  – Reporting Taxonomies
    • information about assembling reports or publications
  – Reference Metadata, Metadata Structure Definitions, and Metadata Flows
    • additional, probably useful, options for attaching metadata to data
  – Annotations almost everywhere
    • good options for managed, actionable extensions

Page 16: End-to-End Management of the Statistical Process An Initiative by ABS

What sorts of metadata?
• What are we interested in?
  – Concepts
    • probably organised into schemes
    • what are the use cases?
  – Classifications
    • broken up into Categories and Codes DDI-style?
    • with links to related classifications SDMX Hierarchical Codelist-style?
    • what are the use cases?
  – Questions and related metadata
    • just how should it look?
      – a DDI package but precisely what is useful
      – what are the use cases?

Page 17: End-to-End Management of the Statistical Process An Initiative by ABS

What sorts of metadata?

• What are we interested in?
  – Survey-level metadata?
    • what are the use cases?
  – Structure Definitions
    • almost certainly, but we need use cases
  – Variable, Relationship, and Record Structure metadata
    • maybe, but we need use cases
  – Processing metadata
    • almost certainly, but we need use cases
    • SDMX Process and/or DDI artefacts

Page 18: End-to-End Management of the Statistical Process An Initiative by ABS

What are the next steps?

• Basically we need use cases
  – How do we see our metadata being used?
  – What are we trying to support?
  – What can we get from our pilot programs?
    • we need to do our own abstraction from that
• We can then start to define a provisional set of services
  – with parameters and schemas
• We can then think about existing sources and demonstration systems
• We can then think about repositories and stores

Page 19: End-to-End Management of the Statistical Process An Initiative by ABS

Timeframe and Process
• We are at the start of the process
  – a project team that is still forming
  – several “satellite” projects
    • small, sometimes significant projects attempting to apply ideas
      – and provide use cases for design
• Have had substantial training and discussion around application of DDI and SDMX
  – international experts providing training
  – significant numbers of ABS staff involved
  – more to come later this month
• Not a “big bang” new implementation
  – rather a framework and environment for all new developments
    • with some retro-fitting to existing systems
  – some direct development of key components

Page 20: End-to-End Management of the Statistical Process An Initiative by ABS

International Collaboration
• A definite part of the project
  – most national agencies are feeling financial pressures and struggling to build everything themselves
• Need to discuss how collaboration might proceed
  – some discussions have been held amongst heads of NSOs
    • more planned
  – agreed standards are an important enabler
    • need participation of NSOs in evolution of standards
  – what are the barriers to collaboration and how might we manage them
  – probably do not want too large a group of collaborators at the start
• ABS (and others) will continue to report to international forums and meetings
  – managerial and technical
  – important part of fostering the collaboration
    • and finding out what others are doing
    • and getting feedback on our ideas

Page 21: End-to-End Management of the Statistical Process An Initiative by ABS

Questions?
