CORA – COmmon Reference Architecture

Eurostat

Monica Scannapieco (Istat), Carlo Vaccari (Università di Camerino), Antonino Virgillito (Istat)



Outline

• Introduction (90 mins)

• CORE Design (60 mins)

• CORE Architectural Components (90 mins)

• Illustration of CORE Platform (135 mins)

• Case studies (90 mins)

• CORE Follow-up (60 mins)


Introduction


CORE Generalities

• Principal Outcome: Environment for the definition and execution of standard statistical processes

– Definition of a process in terms of available services

– Execution of the composed workflow


CORE Generalities

“Plug and play” approach to process execution

[Figure: Service Repository, Process View, Data View]


Why CORE?

[Figure: process START → Allocation (MAUSS-R) → Selection (SAS script) → Estimation (ReGenesees) → STOP]

Technological Heterogeneity

• Different technologies

• Different formats

• …

Data Heterogeneity

• Different names for variables

• Variables as combinations of other variables

• …


Why CORE?

• Technological heterogeneity can be solved by solutions available on the market

• CORE makes it possible to solve both technological and data heterogeneity in a single environment


CORE Vision

1. Abstract services: well-defined, technology-independent functionalities implemented by different IT tools;

2. Statistical process: workflow defined in terms of available services;

3. Data model: standardization of the semantics/format of services data, i.e. definition of the domain entities involved as input/output between services.
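The first two points can be illustrated with a minimal sketch. It is not the CORE API: the class `AbstractService`, the function `execute`, and the lambda stand-ins for the tools are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout: this is not the CORE API.
Dataset = List[Dict[str, object]]  # a rectangular data set as a list of rows


@dataclass
class AbstractService:
    """A technology-independent functionality, e.g. 'Allocation'."""
    name: str
    tool: str                          # label of the IT tool implementing it
    run: Callable[[Dataset], Dataset]  # stand-in for the tool invocation


def execute(process: List[AbstractService], data: Dataset) -> Dataset:
    """Run the services in sequence, feeding each output to the next input."""
    for service in process:
        data = service.run(data)
    return data


# Stand-in implementations; real tools would be MAUSS-R or a SAS script.
allocation = AbstractService("Allocation", "MAUSS-R",
                             lambda rows: [dict(r, SIZE=2) for r in rows])
selection = AbstractService("Selection", "SAS script",
                            lambda rows: [dict(r, SELECTED=True) for r in rows])

result = execute([allocation, selection], [{"VAR": "A"}, {"VAR": "B"}])
```

Swapping the tool behind a service only changes the `run` callable; the process definition, a list of services, stays the same.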


CORE Vision

1. Abstract services: well-defined, technology-independent functionalities implemented by IT tools

Allocation (MAUSS-R)

Selection (SAS Script)

Estimation (ReGenesees)


CORE Vision

3. Data model: standardization of the semantics/format of services data

Allocation (MAUSS-R)

Selection (SAS Script)

<schema name="DEMO_DD">

<entity name="SamplePlan">

<property name="VAR"/>

<property name="SIZE"/>

. . .

</entity>

</schema>

3.1 Domain descriptor (DD)

3.2 Mapping to/from DD



CORE Design Tasks - 1

• Design of services

• Definition of integration APIs (IAPIs)

• Data conversion between the CORA model and tool-specific formats

• Graphical front ends for designing schemas and mappings


CORE Design Tasks - 2

• Design of processes

• How to define and execute processes within CORE

• Modelling language

• Execution

• Visual interfaces design

• Design of a service repository


CORE Design Tasks - 3

• Design of exchanged data

• Definition of data models and formats (plain XML/XSD, SDMX…) to be used for data exchanges

• Definition of metadata necessary for process execution

• SDMX Relationships


CORE Design


CORE Design: Services

• Abstract services: specify a well-defined functionality in a technology-independent way

• An abstract service can be implemented by one or more concrete services, i.e. IT tools

• Examples: sample allocation, record linkage, estimates and errors computation, etc.


CORE Design: Services

• GSBPM classification

• Documentation purpose

• Provided that a CORE service can be linked to IT tools, GSBPM tagging enables searches such as retrieving “all the IT tools implementing the 5.4 Impute subprocess of the GSBPM proposal”
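A search of this kind could look like the sketch below. The `Service` class, the catalogue entries, the tool names, and the tag "2.4" are invented for illustration; only the "5.4 Impute" tag comes from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Service:
    name: str
    gsbpm_tag: str                        # e.g. "5.4" for the Impute subprocess
    tools: List[str] = field(default_factory=list)


def tools_for_gsbpm(services: List[Service], tag: str) -> List[str]:
    """Return the distinct tools linked to services carrying the given tag."""
    found: List[str] = []
    for service in services:
        if service.gsbpm_tag == tag:
            for tool in service.tools:
                if tool not in found:
                    found.append(tool)
    return found


# Illustrative catalogue: names and tags other than "5.4" are assumptions.
catalogue = [
    Service("Impute missing values", "5.4", ["ToolA", "ToolB"]),
    Service("Allocate the sample", "2.4", ["MAUSS-R"]),
    Service("Impute by donor", "5.4", ["ToolB", "ToolC"]),
]
impute_tools = tools_for_gsbpm(catalogue, "5.4")
```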


CORE Design: Services

• Service inputs and outputs

• Specified by logical names

• Characterized with respect to their “role” in data exchange

– Non-CORE: they are not provided by/to other services of the process, but are only “local” to a specific service

– CORE: they are passed by/to other services and hence need to undergo CORE transformations


CORE Design: Data and Metadata

• They are specified as service inputs and outputs

• Logical names link them to previously specified services

• Non-CORE data only need the file system path where they can be retrieved
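The distinction between CORE and non-CORE inputs and outputs can be sketched as a small record type. The class `ServiceIO`, its field names, and the example paths are hypothetical, not taken from the CORE platform.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceIO:
    logical_name: str
    is_core: bool                   # True if exchanged with other services
    path: Optional[str] = None      # file system path; enough for non-CORE data
    mapping: Optional[str] = None   # hypothetical reference to a mapping file


def needs_transformation(items: List[ServiceIO]) -> List[str]:
    """Logical names of the items that must undergo CORE transformations."""
    return [item.logical_name for item in items if item.is_core]


inputs = [
    ServiceIO("strata_statistics", True, mapping="strata_mapping.xml"),
    ServiceIO("tool_parameters", False, path="/data/params.txt"),
]
to_convert = needs_transformation(inputs)
```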


CORE Design: CORE Data

• The specification of CORE data is provided by 3 elements:

• Domain descriptor

• CORE data model

• Mapping model


Domain Descriptor: Model

• Entity

• Like “entities” in the Entity-Relationship model

• Entity properties

• Like “attributes” in the Entity-Relationship model

• Very simple (meta-)model


Domain Descriptor: Example

<schema name="DEMO_Domain_Descriptor">

<entity name="SamplePlan">

<property name="STRATIFICATION_VAR"/>

<property name="STRATUM_SAMPLE_SIZE"/>

<property name="STRATUM_POPULATION_SIZE"/>

</entity>

<entity name="Enterprise">

<property name="IDENTIFIER"/>

<property name="STRATIFICATION_VAR"/>

<property name="WEIGHT"/>

<property name="SAMPLING_FRACTION"/>

<property name="ENTERPRISE_FLAG"/>

<property name="EMPLOYEES_NUM"/>

<property name="VALUE_ADDED"/>

<property name="AREA"/>

</entity>

</schema>
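Because a domain descriptor is plain XML, it can be read with a standard parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened copy of the example; the function name `load_entities` is an assumption, not part of CORE.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the domain descriptor shown on the slide.
DOMAIN_DESCRIPTOR = """
<schema name="DEMO_Domain_Descriptor">
  <entity name="SamplePlan">
    <property name="STRATIFICATION_VAR"/>
    <property name="STRATUM_SAMPLE_SIZE"/>
    <property name="STRATUM_POPULATION_SIZE"/>
  </entity>
</schema>
"""


def load_entities(xml_text):
    """Map each entity name to the list of its property names."""
    root = ET.fromstring(xml_text)
    return {
        entity.get("name"): [p.get("name") for p in entity.findall("property")]
        for entity in root.findall("entity")
    }


entities = load_entities(DOMAIN_DESCRIPTOR)
```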


Domain Descriptor Role

• Role of the Domain Descriptor (DD): from service-to-service data mapping to service-to-global data mapping
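One way to read this slide is as a count of mappings. The sketch assumes a worst case in which every service may exchange data with every other one, and that each service needs one mapping to and one from the DD; neither assumption is stated in the slides.

```python
def pairwise_mappings(n_services: int) -> int:
    """Directed service-to-service mappings if every pair can exchange data."""
    return n_services * (n_services - 1)


def global_mappings(n_services: int) -> int:
    """One mapping to and one from the domain descriptor per service."""
    return 2 * n_services


comparison = {n: (pairwise_mappings(n), global_mappings(n)) for n in (3, 10)}
```

Under these assumptions, ten services would need up to 90 pairwise mappings but only 20 mappings against the DD.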


CORE Data Model: Role

• Specified once and valid for all processes

• Extensible, i.e. core tag, data set kind, column kind can be modified

• Adds more semantics to data

• Example of usage: mapping to other models


CORE Data Model

• Rectangular data set

• CORE tag:

• Data set level (mandatory)

• Column level (optional)

• Rows level (optional)

• Data set kind

• Column kind


Mapping Model

• Rectangular data assumption

• Mapping is intended to be specified with respect to the Domain Descriptor

• Columns are to be mapped to properties of an entity

• It contains the specification of how CORE data model concepts are associated with data
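The column-to-property idea can be sketched as a renaming step in both directions. The tool column names (`strato`, `n`, `N`), the `Entity.PROPERTY` naming, and both functions are assumptions for illustration; the actual CORE mapping files are not shown in the slides.

```python
# Hypothetical mapping for one tool: tool column name -> (entity, property)
MAPPING = {
    "strato": ("SamplePlan", "STRATIFICATION_VAR"),
    "n": ("SamplePlan", "STRATUM_SAMPLE_SIZE"),
    "N": ("SamplePlan", "STRATUM_POPULATION_SIZE"),
}


def to_domain(rows, mapping):
    """Rename tool-specific columns to 'Entity.PROPERTY' names."""
    converted_rows = []
    for row in rows:
        converted = {}
        for column, value in row.items():
            entity, prop = mapping[column]  # KeyError signals an unmapped column
            converted[f"{entity}.{prop}"] = value
        converted_rows.append(converted)
    return converted_rows


def from_domain(rows, mapping):
    """Inverse renaming, back to the tool's own column names."""
    inverse = {f"{e}.{p}": column for column, (e, p) in mapping.items()}
    return [{inverse[key]: value for key, value in row.items()} for row in rows]


tool_rows = [{"strato": "A", "n": 5, "N": 100}]
domain_rows = to_domain(tool_rows, MAPPING)
```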


CORE Logical Architecture


CORE GUIs

• Process design

• Ad-hoc customization of an existing tool (Oryx)

• Service data flow

• Service design

• Set of interfaces for the definition of services and related data flow

• Data design

• Set of interfaces for the specification of domain descriptors and mapping files


Use Case Specification

• CORE (Principal) Users


Use Case Specification: Tool Management


Use Case Specification: Service Management

[Use case diagram: actor Statistical User; use cases Service Management, Show Services' List, Show Tools' List, Add Service, Modify Service, Delete Service, Select Service, Select Tool, connected by «uses» relationships]


Use Case Specification: Process


Process design: Oryx

• Oryx is an academic open source framework for graphical process modeling

• Based on web technology

• Extensible via a plugin mechanism and new stencil sets

• Supports BPMN and other process modeling languages

• Implemented in JavaScript and Java; internal data format based on RDF


Stencil Set

• Set of graphical objects and rules that specify how to relate those graphical objects to others

• Additional properties that can later be used by other applications or Oryx extensions (e.g. setting element colors and visibility)

• Can be used to build process models


The CORE Stencil Set

• Graphical representation of CORE processes

• Easy-to-use editor (desktop feeling)

• Easy-to-extend source (JSON)

• Defined from BPMN

• Guarantees complete BPMN compliance


Integration APIs

• Purpose: wrapping a tool as a CORE service

• Translates inputs and outputs of the tool in a completely transparent and automatic way


Repository

• Processes and their instances

• Services with their GSBPM and CORE classifications

• Tools and their runtime features

• Data with their logical classification within CORE processes


Database design: Overview


Database Design: Principal Entities


• Service & Tool


Database Design: Principal Entities

• Service & Process

– service: id, name, GSBPMtag, coretag, version, namespace

– process: id, name, definition

– Association between service and process with multiplicity 0..* on both ends


Database Design: Principal Entities


• Operational Data


Process Engine

• Official statistics processes can be viewed from two perspectives:

• Functional: they are data-oriented, reflecting a common feature of scientific workflows

• Organizational: they are workflow-oriented, have the complexity of real production lines, with the need for harmonizing the work of different actors


Process Engine

• Hence our process engine has two layers:

WF ENGINE: complex control flows

– Synchronizing constructs, cycles, conditions, etc.

– E.g.: interactive multi-user editing and imputation

DATA FLOW CONTROL SYSTEM: simple control flows

– Sequence of tasks is composed by connecting the output of one task to the input of another

– Data-intensive operations
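The "simple control flow" layer, in which a task runs as soon as the data it consumes is available, can be sketched as follows. The tuple layout, the function `run_data_flow`, and the example tasks are assumptions, not the CORE data flow control system.

```python
from typing import Callable, Dict, List, Tuple

# (task name, input logical name, output logical name, function): hypothetical layout
Task = Tuple[str, str, str, Callable[[object], object]]


def run_data_flow(tasks: List[Task], store: Dict[str, object]) -> List[str]:
    """Run each task once its input is in the store; return the execution order."""
    pending = list(tasks)
    order: List[str] = []
    while pending:
        ready = [task for task in pending if task[1] in store]
        if not ready:
            raise ValueError("no task can run: an input is missing")
        for task in ready:
            name, source, target, func = task
            store[target] = func(store[source])  # output becomes a later input
            order.append(name)
            pending.remove(task)
    return order


tasks = [
    ("estimate", "sample", "estimates", lambda s: sum(s) / len(s)),
    ("select", "frame", "sample", lambda f: f[:2]),
]
store = {"frame": [10, 20, 30]}
order = run_data_flow(tasks, store)
```

The order of execution falls out of the data dependencies alone; conditions, cycles, and user interaction would be left to the workflow engine layer.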


Workflow Engine Selection Process

• CORE workpackage (WP$) led by INSEE

• Business Process Management (BPM) platforms:

• Bonita (http://www.bonitasoft.com/)

• Activiti (http://www.Activiti.org/)

• ActiveVOS (http://www.activevos.com/)


SDMX Relationships

• Both propose an information model

• CORE information model

– explicitly takes the process dimension into account

– data dimension spans the whole statistical process

• SDMX information model

– focused on data exchange (though processes are also considered)


SDMX Relationships

• CORE information model

– Deals with both microdata and macrodata

• SDMX information model

– Mainly deals with macrodata


SDMX Relationships

1. Can we use SDMX for micro and macro data exchanges in a CORE process?

– Need for mapping of information models

2. What about metadata?

– CORE: data and metadata are managed in the same way

– SDMX: distinction between structural metadata and reference metadata. Possibility of having domain knowledge codified through concepts


SDMX Relationships

• CORE-to-SDMX Conversion Proof-of-Concept

• Setting:

• Italian Time-Use Survey

• Data Structure Wizard and the SDMX Converter

• “Compute Estimates and Sampling Errors” (as the aggregated data dissemination phase)

• Choices and steps:

• Conversion from CORE XML to CSV in order to use SDMX conversion tools

• An SDMX DSD (Data Structure Definition) was created starting from the CORE file structure

• SDMX data format: cross-sectional

• Once the DSD was prepared, the CORE file was converted using the SDMX Converter tool


SDMX Relationships

• The experiment has shown the feasibility of the conversion to SDMX format of a data file obtained as a CORE output

• Conversion is not automated:

• Manual mapping of the CORE output’s fields to the dimensions and attributes of the SDMX DSD

• Since SDMX does not manage more than one measure, the CORE output file had to be verticalized in order to convert it to the SDMX cross-sectional format
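Verticalization here means turning a row with several measure columns into one row per measure. The sketch below shows the idea on invented data; the column names `AREA`, `ESTIMATE`, `STD_ERROR`, `MEASURE`, and `OBS_VALUE` are assumptions, not the fields of the actual CORE output file.

```python
def verticalize(rows, dimensions, measures):
    """Turn one row with several measure columns into one row per measure."""
    long_rows = []
    for row in rows:
        for measure in measures:
            record = {dim: row[dim] for dim in dimensions}  # copy the dimensions
            record["MEASURE"] = measure                     # which measure this is
            record["OBS_VALUE"] = row[measure]              # its value
            long_rows.append(record)
    return long_rows


wide = [{"AREA": "North", "ESTIMATE": 12.5, "STD_ERROR": 0.8}]
long_format = verticalize(wide, ["AREA"], ["ESTIMATE", "STD_ERROR"])
```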


Architecture Deployment

• Web-based architecture centered on a centralized component

– CORE Environment

• Different CORE deployments can co-exist

– Intra- or inter-organization

• Services can be remotely executed

– Support is needed in the form of a distributed component for tool execution and data transfer


Types of service runtime

• Batch

– Tool executed by a command line call

– Can be automated

• Interactive

– Users interact with the tool through a tool-provided GUI

– Cannot be automated

• Web service

– No tool: the procedure is distributed as a web service, activated by a programming language call

– Can be automated
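The three runtime types and their automation property can be captured in a small enumeration. The names `RuntimeType`, `can_be_automated`, and `fully_automatic` are illustrative assumptions.

```python
from enum import Enum


class RuntimeType(Enum):
    BATCH = "batch"              # tool executed by a command line call
    INTERACTIVE = "interactive"  # user works through a tool-provided GUI
    WEB_SERVICE = "web service"  # procedure activated by a programming language call


def can_be_automated(runtime: RuntimeType) -> bool:
    """Only interactive services need a person in the loop."""
    return runtime is not RuntimeType.INTERACTIVE


def fully_automatic(process_runtimes) -> bool:
    """A process runs unattended only if every service can be automated."""
    return all(can_be_automated(runtime) for runtime in process_runtimes)


scenario = [RuntimeType.BATCH, RuntimeType.INTERACTIVE, RuntimeType.WEB_SERVICE]
unattended = fully_automatic(scenario)
```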


CORE Technical Architecture

[Figure: CORE technical architecture. Labels: CORE Environment (GUI Definition Repository, Integration APIs, Process Engine, Runtime with Web service client and Remote activation); remote Runtime (Runtime agent, Batch-Interactive runtime, Web service runtime, Web container)]

• Runtime agent: runs on the machine on which the tool is deployed. It is responsible for:

– Preparing the input

– Gathering the output

– Activating the tool

• Execution of a service, step by step:

1. The process engine signals that a service must be executed

2. The service definition is extracted from the repository, as well as the required datasets and the corresponding mappings

3. Datasets are converted according to the mapping

4. Converted datasets are transferred to the remote runtime

5. The tool is activated by the runtime agent

6. The output datasets are gathered and sent back to the CORE environment

7. Datasets are converted back to CORE format according to the mapping

8. Converted datasets are stored in the repository

9. The process continues its execution
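The execution sequence captioned on these slides can be sketched as a single function. Every name here is a hypothetical stand-in: the repository is a plain dictionary, and the conversion and remote-execution steps are passed in as callables rather than being real CORE components.

```python
def execute_service(service_name, repository, convert_in, run_remote, convert_out):
    """Run one service following the sequence on the slides (all stand-ins)."""
    # Extract the service definition and the required datasets from the repository.
    definition = repository["services"][service_name]
    inputs = {name: repository["datasets"][name] for name in definition["inputs"]}
    # Convert the datasets to the tool's format according to the mapping.
    tool_inputs = {name: convert_in(data) for name, data in inputs.items()}
    # Transfer to the remote runtime, activate the tool, gather the outputs.
    tool_outputs = run_remote(definition["tool"], tool_inputs)
    # Convert the outputs back to CORE format and store them in the repository.
    core_outputs = {name: convert_out(data) for name, data in tool_outputs.items()}
    repository["datasets"].update(core_outputs)
    return sorted(core_outputs)


repository = {
    "services": {"Selection": {"tool": "sas_script", "inputs": ["sample_plan"]}},
    "datasets": {"sample_plan": [{"STRATUM": "A", "SIZE": 2}]},
}


def fake_remote(tool, tool_inputs):
    """Pretend runtime agent: returns two selected units for stratum A."""
    return {"sample": [("A", 1), ("A", 2)]}


produced = execute_service(
    "Selection",
    repository,
    convert_in=lambda rows: [tuple(row.values()) for row in rows],
    run_remote=fake_remote,
    convert_out=lambda rows: [{"STRATUM": s, "UNIT": u} for s, u in rows],
)
```

After the call, the process engine would pick up the stored outputs and continue with the next service.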


Scenario 1

• Remote execution command line/GUI

– Physical layers: CORE env, Service

– AGENT


Scenario 2

• Remote execution web service

– Physical layers: CORE env, Service


CORE Scenario


Why a Process Scenario?

• Helps to clarify ideas and to assess their feasibility

• Forces newly proposed solutions to be made concrete

• Can/will be used as empirical test-bed during the whole implementation cycle of the CORE environment


How did we build the Scenario?

• Rationale for our Scenario:

• Naturality: involves typical processing steps performed by NSIs for sample surveys

• Minimality: very easy workflow (no conditionals or cycles), can be run without a Workflow Engine

• Appropriateness: incorporates as much heterogeneity as possible: heterogeneity is precisely what CORE must be able to get rid of


Spreading Heterogeneity over the Scenario

• The Scenario incorporates both:

• Data Heterogeneity

– Via data exchanged by CORE services belonging to the scenario process

• Technological Heterogeneity

– Via IT tools implementing scenario sub-processes


Data Heterogeneity

• The Scenario entails different levels of data heterogeneity:

• Format Heterogeneity: CSV files, relational DB tables, SDMX XML files involved

• Statistical Heterogeneity: both Micro and Aggregated Data involved

• “Model” Heterogeneity: some data refer to ordinary real-world concepts (e.g. enterprise, individual, …), some other to concepts arising from the statistical domain (e.g. stratum, variance, sampling weight, …)


Technological Heterogeneity

• The Scenario requires wrapping very different IT tools inside CORE-compliant services:

• simple SQL statements executed on a relational DB

• batch jobs based on SAS or R scripts

• full-fledged R-based systems requiring a human-computer interaction through a GUI layer


The Scenario at a glance

[Figure: scenario workflow START → Compute Strata Statistics → Allocate the Sample → Select the Sample → Collect Survey Data → Check and Correct Survey Data → Calibrate Survey Data → Compute Estimates and Sampling Errors → Store Estimates and Sampling Errors → Convert to SDMX → STOP. Subprocess labels: ALLOCATION, ESTIMATION]


Sample Allocation Subprocess

• Overall Goal: determine the minimum number of units to be sampled inside each stratum, when lower bounds are imposed on the expected level of precision of the estimates the survey has to deliver

• Two statistical services are needed:

• Compute Strata Statistics

• Allocate the Sample


Compute Strata Statistics Service

• Goal: compute, for each stratum, the population mean and standard deviation of a set of auxiliary variables

• IT tool: a simple SQL aggregate query with a group-by clause

• NSIs usually maintain their sampling frame(s) as Relational DB tables

• Integration API: must support Relational/CORE transformations

• CORA tag: “Statistics”
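
The kind of group-by query this service relies on can be sketched as follows, using an in-memory SQLite database. The `frame` table, its columns and the sample values are assumptions made for illustration. SQLite has no built-in standard deviation aggregate, so the population variance is derived from two averages and the square root is taken in Python.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sampling frame with one auxiliary variable (employees).
conn.execute("CREATE TABLE frame (unit_id INTEGER, stratum TEXT, employees REAL)")
conn.executemany(
    "INSERT INTO frame VALUES (?, ?, ?)",
    [(1, "A", 2.0), (2, "A", 4.0), (3, "A", 6.0), (4, "B", 10.0), (5, "B", 30.0)],
)

QUERY = """
SELECT stratum,
       COUNT(*)       AS n_units,
       AVG(employees) AS mean,
       AVG(employees * employees) - AVG(employees) * AVG(employees) AS variance
FROM frame
GROUP BY stratum
ORDER BY stratum
"""


def strata_statistics(conn):
    """Return {stratum: {"N", "mean", "sd"}} with the population standard deviation."""
    stats = {}
    for stratum, n_units, mean, variance in conn.execute(QUERY):
        # Guard against tiny negative values caused by floating-point rounding.
        stats[stratum] = {"N": n_units, "mean": mean, "sd": math.sqrt(max(variance, 0.0))}
    return stats


stats = strata_statistics(conn)
```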


Allocate the Sample Service

• Goal: solve a constrained optimization problem to find and return the optimal sample allocation across strata

• IT tool: Istat MAUSS-R system

• implemented in R and Java, can be run either in batch mode or interactively via a GUI

• Integration API: must support CSV/CORE transformations

• MAUSS handles I/O via CSV files

• CORA tag: “Statistics”
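
To make the allocation problem concrete, here is a minimal sketch for the simplest case: one target variable, a single precision constraint expressed as a maximum coefficient of variation (CV) of the estimated mean, and textbook Neyman allocation with finite population correction. This is not the method implemented in MAUSS-R, which these slides do not detail; the function name, the input layout and the figures are assumptions.

```python
import math


def neyman_allocation(strata, max_cv):
    """Minimum sample sizes per stratum so that CV(estimated mean) <= max_cv.

    strata: {name: {"N": stratum size, "mean": stratum mean, "sd": stratum std dev}}
    Assumes at least one stratum has sd > 0 and that max_cv > 0.
    """
    N = sum(s["N"] for s in strata.values())
    pop_mean = sum(s["N"] * s["mean"] for s in strata.values()) / N
    target_var = (max_cv * pop_mean) ** 2

    sum_ws = sum(s["N"] / N * s["sd"] for s in strata.values())        # sum of W_h * S_h
    sum_ws2 = sum(s["N"] / N * s["sd"] ** 2 for s in strata.values())  # sum of W_h * S_h^2

    # Total sample size under Neyman allocation for a fixed target variance.
    n_total = sum_ws ** 2 / (target_var + sum_ws2 / N)

    allocation = {}
    for name, s in strata.items():
        n_h = n_total * (s["N"] / N * s["sd"]) / sum_ws
        # Round up, sample at least one unit, never more than the stratum holds.
        allocation[name] = min(s["N"], max(1, math.ceil(n_h)))
    return allocation


# Hypothetical strata statistics, e.g. as produced by the previous service.
strata = {
    "A": {"N": 1000, "mean": 10.0, "sd": 4.0},
    "B": {"N": 200, "mean": 50.0, "sd": 30.0},
}
allocation = neyman_allocation(strata, max_cv=0.05)
```

When a stratum is capped at its size, a real implementation would redistribute the remaining sample over the other strata; the sketch skips that step.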


Sample Selection Subprocess

• Goal: draw a stratified random sample of units from the sampling frame, according to the previously computed optimal allocation

• IT tool: a simple SAS script to be executed in batch mode

• Integration API: CSV/CORE transformation

• SAS datasets have a proprietary, closed format, so direct SAS/CORE conversions will not be supported

• CORA tag: “Population”

• output stores the identifiers of the units to be later surveyed + basic information needed to contact them
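
The scenario uses a SAS script for this step; purely as an illustration of the technique, the sketch below draws a stratified simple random sample without replacement in Python. The frame layout (`unit_id`, `stratum`) and the allocation values are assumptions.

```python
import random


def select_sample(frame, allocation, seed=0):
    """Draw allocation[h] units at random, without replacement, from each stratum h."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = {}
    for unit in frame:
        by_stratum.setdefault(unit["stratum"], []).append(unit)

    sample = []
    for stratum, n_h in allocation.items():
        # random.sample raises ValueError if n_h exceeds the stratum size.
        sample.extend(rng.sample(by_stratum[stratum], n_h))
    return sample


# Hypothetical frame: 10 units in stratum A, 5 in stratum B.
frame = [{"unit_id": i, "stratum": "A"} for i in range(10)]
frame += [{"unit_id": 100 + i, "stratum": "B"} for i in range(5)]

picked = select_sample(frame, {"A": 3, "B": 2}, seed=42)
```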


Estimation Subprocess

• Overall Goal: compute the estimates the survey must deliver, and assess their precision as well

• Two statistical services are needed:

• Calibrate Survey Data

• Compute Estimates and Sampling Errors


Calibrate Survey Data Service

• Goal: provide a new set of weights (the “calibrated weights”) to be used for estimation purposes

• IT tool: Istat ReGenesees system

• implemented in R, can be run either in batch mode or interactively via a GUI

• Integration API: can use both CSV/CORE and Relational/CORE transformations

• CORA tag: “Variable”
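
Post-stratification is the simplest special case of calibration: within each group, the initial weights are rescaled so that they sum to a known population count. The sketch below shows only that special case and is not ReGenesees code; the field names and totals are assumptions.

```python
def poststratify(sample, known_totals, group_key="group", weight_key="weight"):
    """Return copies of the sample units with an added 'cal_weight' field.

    Within each group g, cal_weight = weight * known_totals[g] / (sum of weights in g),
    so the calibrated weights reproduce the known group totals exactly.
    """
    weighted_counts = {}
    for unit in sample:
        g = unit[group_key]
        weighted_counts[g] = weighted_counts.get(g, 0.0) + unit[weight_key]

    return [
        dict(unit, cal_weight=unit[weight_key] * known_totals[unit[group_key]]
             / weighted_counts[unit[group_key]])
        for unit in sample
    ]


# Hypothetical respondents with initial (design) weights.
sample = [
    {"id": 1, "group": "M", "weight": 10.0},
    {"id": 2, "group": "M", "weight": 10.0},
    {"id": 3, "group": "F", "weight": 10.0},
]
calibrated = poststratify(sample, known_totals={"M": 30.0, "F": 20.0})
```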


Estimates and Errors Service

• Goal: use the calibrated weights to compute the estimates the survey has to provide (typically for different subpopulations of interest) along with the corresponding confidence intervals

• IT tool: Istat ReGenesees system

• Integration API: can use both CSV/CORE and Relational/CORE transformations

• CORA tag: “Statistics”
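
As a simplified illustration of an estimate with its confidence interval, the sketch below computes a population total under stratified simple random sampling with the standard design-based variance. It does not account for calibration, which changes the variance estimator in practice, and it is not ReGenesees code; the data layout and the normal-approximation factor 1.96 are assumptions.

```python
import math
import statistics


def stratified_total(sample, stratum_sizes, z=1.96):
    """Estimate a population total, its standard error and a confidence interval.

    sample: list of {"stratum": ..., "y": ...}; every stratum needs at least 2 units.
    stratum_sizes: {stratum: N_h}, the number of population units per stratum.
    """
    by_stratum = {}
    for unit in sample:
        by_stratum.setdefault(unit["stratum"], []).append(unit["y"])

    estimate = 0.0
    variance = 0.0
    for stratum, values in by_stratum.items():
        N_h, n_h = stratum_sizes[stratum], len(values)
        estimate += N_h * statistics.mean(values)
        # (1 - n_h / N_h) is the finite population correction.
        variance += N_h ** 2 * (1 - n_h / N_h) * statistics.variance(values) / n_h

    se = math.sqrt(variance)
    return {"estimate": estimate, "std_error": se, "ci": (estimate - z * se, estimate + z * se)}


# Hypothetical survey data.
survey = [
    {"stratum": "A", "y": 2.0}, {"stratum": "A", "y": 4.0}, {"stratum": "A", "y": 6.0},
    {"stratum": "B", "y": 10.0}, {"stratum": "B", "y": 30.0},
]
result = stratified_total(survey, {"A": 100, "B": 50})
```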


Store Estimates Subprocess

• Goal: persistently store the previously computed survey estimates in a relational DB

• e.g. in order to subsequently feed a data warehouse for online publication

• IT tool: a set of SQL statements

• Integration API: Relational/CORE transformation again

• CORA tag: “Statistics”
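
A set of SQL statements of this kind might look like the following sketch, run here against an in-memory SQLite database. The `estimates` table, its columns and the stored values are illustrative assumptions; `INSERT OR REPLACE` is SQLite-specific syntax.

```python
import sqlite3


def store_estimates(conn, rows):
    """Persist estimates; re-running with the same domain overwrites the old row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS estimates ("
        "domain TEXT PRIMARY KEY, estimate REAL NOT NULL, std_error REAL NOT NULL)"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO estimates (domain, estimate, std_error) "
            "VALUES (:domain, :estimate, :std_error)",
            rows,
        )


conn = sqlite3.connect(":memory:")
store_estimates(conn, [
    {"domain": "North", "estimate": 1400.0, "std_error": 502.9},  # made-up values
    {"domain": "South", "estimate": 900.0, "std_error": 310.2},   # made-up values
])
stored = conn.execute("SELECT domain, estimate FROM estimates ORDER BY domain").fetchall()
```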


Convert to SDMX Service

• Goal: retrieve the aggregated data from the relational DB and directly convert them to SDMX XML format

• e.g. to later send them to Eurostat

• IT tool: ???

• Integration API: must support SDMX/CORE transformations

• CORA tag: “Statistics”


Scenario Open Issues

• Besides I/O data, CORE must be able to handle “service behaviour parameters”. How?

• e.g. to analyze a complex survey, ReGenesees needs a lot of sampling design metadata, namely information about strata, stages, cluster identifiers, sampling weights, calibration models, and so on

• Enabling the CORE environment to support interactive service execution is still a challenging problem

• we plan to exploit MAUSS-R and/or ReGenesees to test the technical feasibility of any forthcoming solution

• How to implement an SDMX/CORE converter?


Demo Scenario

• Involves 3 typical processing steps performed by NSIs for sample surveys:

• Sample Allocation

• Sample Selection

• Estimation

• It has been used as an empirical test-bed during the whole implementation cycle of the CORE environment


Rationale for the Scenario

• Minimality: very easy workflow (no conditionals or cycles), can be run without a Workflow Engine

• Appropriateness: addresses heterogeneity issues

• heterogeneity is precisely what CORE must be able to get rid of
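
The Minimality point can be illustrated with a short sketch: a workflow with no conditionals or cycles is just an ordered list of steps, so a plain loop is enough to run it. The step functions below are stand-ins that only record their name; they do not call the real tools, and all names are hypothetical.

```python
def allocation(state):
    """Stand-in for the sample allocation step."""
    return {**state, "trace": state["trace"] + ["ALLOCATION"]}


def selection(state):
    """Stand-in for the sample selection step."""
    return {**state, "trace": state["trace"] + ["SELECTION"]}


def estimation(state):
    """Stand-in for the estimation step."""
    return {**state, "trace": state["trace"] + ["ESTIMATION"]}


def run_scenario(steps, state):
    """Run the steps strictly in order, feeding each output to the next step."""
    for step in steps:
        state = step(state)
    return state


STEPS = [allocation, selection, estimation]
final_state = run_scenario(STEPS, {"trace": []})
```

Once branches or loops are needed, this loop no longer suffices and a workflow engine becomes the natural choice.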


Spreading Heterogeneity over the Scenario

• The Scenario incorporates both:

• Data Heterogeneity: Via data exchanged by CORE services belonging to the scenario process

• Technological Heterogeneity: Via IT tools implementing scenario services

• A batch job based on a SAS script

• Two full-fledged R-based systems


The Scenario at a glance

[Figure: demo scenario workflow. START → ALLOCATION (MAUSS-R) → SELECTION (SAS script) → ESTIMATION (ReGenesees System) → STOP]


Sample Allocation Service

• Overall Goal: determine the minimum number of units to be sampled inside each stratum, when lower bounds are imposed on the expected level of precision of the estimates the survey has to deliver

• IT tool: Istat MAUSS-R system

• implemented in R and Java

• CORA tag: “Statistics”


Sample Selection Service

• Goal: draw a stratified random sample of units from the sampling frame, according to the previously computed optimal allocation

• IT tool: a simple SAS script to be executed in batch mode

• CORA tag: “Population”


Estimates and Errors Service

• Goal: compute the estimates the survey has to provide (typically for different subpopulations of interest) along with the corresponding confidence intervals

• IT tool: Istat ReGenesees System

• R-based

• CORA tag: “Statistics”


CORE Follow-up


CORE in Istat

• CORE is an Action of the Istat strategic plan Stat2015

• Period 2013-2015

• Objective: Usage of CORE platform in production scenarios of Istat

• Plan for 2013:

• Implementation of engineering activities

• Usage of CORE to support sharing of generalized software functionalities (how to do this is currently being studied)

• Usage of CORE in the dissemination flow of the corporate architecture in conjunction with an ETL tool, Kettle (how to do this is currently being studied)


Development of CORE Services for ESS: Issues

• CORE is strictly related to the “Shared Services” technical cross-cutting issue of the ESS VIP (Vision Infrastructure Project) Programme

• Period 2013-2018

• Role: Supporting standardisation of the communication protocol among standard statistical services


Issue 1: Relationship between CORE and SOA

• Hints for answering issue 1:

• CORE adopts a SOA design approach

• CORE services can be deployed as Web Services

• CORE does “imply”/“include” SOA technologies

• SOA technologies do not “imply”/“include” CORE


Issue 2: Relationship between CORE and GSIM

• Hints for answering issue 2:

• CORE did not have the purpose of defining yet another information model

• CORE takes into account the need for an information model

• Introduced only for demonstration purposes

• Hence, from a design perspective, CORE is open to adopting a full-fledged information model like GSIM

• CORE Model slot/CORE Domain Descriptor slot


Issue 3: Relationship between CORE and DDI/SDMX

• Hints for answering issue 3:

• DDI/SDMX provide “logical” information models

• GSIM serves a documentation purpose

• DDI/SDMX serve (mainly) a representation purpose

• CORE could be integrated with DDI/SDMX by:

• Mapping the rectangular dataset representation of CORE data to such models

• The mapping is feasible in principle, as the CORE model is “less expressive”


Issue 4: CORE Deployment Issues in the ESS – SOA supporting platform

• Hints for answering issue 4:

• Need for designing a CORE deployment for the ESS

• Service repositories

• Data exchanges

• Security issues

• Performance issues

• ...