Description of CORE Implementation in Java

ESSnet

CORE COmmon Reference Environment

Date of dissemination Version Page

February 2012 1.0 1

Partner’s name: Istat

WP number and name: WP6 – Implementation library for generic interface and production chain for Java

Deliverable number and name: 6.1 Description of CORE Implementation in Java

Description of CORE Implementation in Java

Partner in charge Istat

Version 1.0

Date February 2012

Version Changes Changed by Date

1.0 First version ISTAT 20/02/2012

This document is distributed under Creative Commons licence

"Attribution-Share Alike - 3.0 ", available at the Internet site:

http://creativecommons.org/licenses/by-sa/3.0

Summary

This document presents the details of the Java implementation of the CORE web platform.

Keywords: CORA, CORE, implementation details, GUI, implementation scenario

Contents

1 Introduction
2 Implementation details
  2.1 Internal architecture
  2.2 The Repository Web Interface
  2.3 CORE Transformation API
  2.4 Local Service Runtime
  2.5 Process Runtime
  2.6 Internal Code Organization
3 GUI functionalities
  3.1 Process design
  3.2 Domain descriptor definition
  3.3 Mapping definition
  3.4 Process execution

Summary

The purpose of this document is to describe the CORE Java implementation.

Keywords: CORE, CORA, IT architecture, information model, GSBPM

1 Introduction

This document gives an architectural description of the Java implementation of the CORE web platform. The implementation is intended as a proof of concept of the main aspects of the CORE general architecture and usage. Hence, its direct use in production systems is discouraged although it can represent a starting point for actual implementations. We refer to the architectural components as described in deliverable 3.1.

The CORE web platform is a Java web application that acts as a web front-end for various components of the CORE architecture. In particular, it provides a GUI for accessing the repository, allowing users to store and retrieve definitions of processes and service data. It also incorporates a web interface to a basic process execution engine for the sequential execution of services within a process. More precisely, service definitions are read from the repository and sequentially organized in a process. Implementations of the Local Service Runtime and Transformation API components are included in the web platform. They are used to execute, on the server where the platform is installed, all the services that belong to the same process, and to perform the data transformations from one process step to the next.

The document is structured as follows: Section 2 describes the implementation details of the CORE web platform, presenting the general design of the system components, their internal architecture and the technical choices made. Section 3 shows the user interface of the system, presenting the whole functionality of the CORE web platform through a sequence of annotated GUI screenshots.

2 Implementation details

In this section we specify the details of the Java implementation of the CORE web platform. First, we give an overview of the internal architecture of the application; then we describe the packages into which the application is structured and their roles.

2.1 Internal architecture

The CORE web platform is organized into two subsystems, controlled by the web GUI (Figure 1). On the one hand, there is the repository interface, which provides functionality to pick a process from the available processes or create a new one, and to view or edit the services of a selected process by specifying the tool to which each service is attached and the input and output files that the service expects for its execution. On the other hand, there is functionality to execute a process instance. This consists of three distinct components, namely the Integration API, the Process Engine and the Local Service Runtime.

Figure 1: CORE Platform Internal Architecture (components shown: GUI, Definition Repository, Integration APIs, Process Engine, Runtime, Services)

The two subsystems correspond exactly to the two phases of a process: design time and runtime. At design time the role of the user is to define the global data model (Domain Descriptor) used in the process and how the inputs and outputs (Operational Data) of the various services relate to it through proper mapping files. At runtime, the role of the user is to spawn an instance of a process, by providing instances of operational data that vary from one execution to another. The Local Service Runtime component is responsible for triggering the transformation of the operational data according to the mappings, so that the tools attached to the service can be executed with all the input data they expect correctly in place. Transformations take place through the Integration API.

In the remainder of this section we provide the technical details of the implementation of the components in the platform.

2.2 The Repository Web Interface

The internal design of the Repository Web Interface is organized into five layers (Figure 2):

• GUI: JSP pages implementing the user interface. They can be forms for sending data to the server, processed by an action, and/or the results of an action execution.

• Actions: Java classes whose operations are triggered in response to an HTTP call activated by a form submission on the GUI. They receive data from the HTTP request and execute server-side processing by calling Services.

• Services: Java classes that implement transactions on the database, realized through sequences of calls to DAOs.

• Data Access Objects (DAOs): Java classes that implement database CRUD operations related to one or more domain objects.

• Entities: JavaBean classes, each representing the records of one database table.
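The layering above can be illustrated with a minimal, framework-free sketch. It is not taken from the CORE sources: all class names are hypothetical, an in-memory map stands in for the database that the real DAOs reach through Hibernate, and the action returns plain result names in the style commonly used by Struts2 actions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Entity: a JavaBean mirroring one row of a (hypothetical) "process" table.
class ProcessEntity {
    private long id;
    private String name;

    public long getId() { return id; }
    public void setId(long id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// DAO: CRUD operations for one domain object. The map is a stand-in
// for the database table (assumption made to keep the sketch runnable).
class ProcessDao {
    private final Map<Long, ProcessEntity> table = new LinkedHashMap<>();
    private long nextId = 1;

    ProcessEntity create(String name) {
        ProcessEntity p = new ProcessEntity();
        p.setId(nextId++);
        p.setName(name);
        table.put(p.getId(), p);
        return p;
    }

    List<ProcessEntity> findAll() { return new ArrayList<>(table.values()); }
}

// Service: one unit of work on the database, built from DAO calls.
class ProcessService {
    private final ProcessDao dao;
    ProcessService(ProcessDao dao) { this.dao = dao; }

    ProcessEntity createProcess(String name) {
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalArgumentException("process name is required");
        }
        return dao.create(name.trim());
    }

    List<ProcessEntity> listProcesses() { return dao.findAll(); }
}

// Action: receives request data (here a plain field instead of an HTTP
// parameter), calls the service, and returns a result name for the view.
class CreateProcessAction {
    private final ProcessService service;
    private String processName;   // would be filled from the submitted form
    private ProcessEntity created;

    CreateProcessAction(ProcessService service) { this.service = service; }
    void setProcessName(String processName) { this.processName = processName; }
    ProcessEntity getCreated() { return created; }

    String execute() {
        try {
            created = service.createProcess(processName);
            return "success";
        } catch (IllegalArgumentException e) {
            return "input";   // send the user back to the form
        }
    }
}
```

Each layer only talks to the one directly below it, which is what allows a framework to be swapped in per layer without touching the others.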

Each layer is supported by one or more Java frameworks. Frameworks are used to simplify development and to obtain more robust and standardized code. In the following, we list the frameworks used in the application and explain their role within the architecture:

• Hibernate: maps domain objects to database tables. It provides an API (used within the DAO objects) for CRUD operations and generates the SQL statements corresponding to each operation by exploiting the object-table mapping.

• Spring: handles database access configuration and transparently supports transactions. Service classes are managed by Spring, so that each method in a service class is surrounded by a transaction, which is automatically committed when the method ends successfully or rolled back if an error occurs.

• Struts2: Model-View-Controller framework that handles the communication between the JSP pages that represent the application GUI and the server-side actions that handle the processing.

Figure 2: Repository Web Interface Internal Design (layers: GUI, Actions, Services, DAOs, Entities; supporting frameworks: Struts2, Spring, Hibernate)
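The transactional behaviour described for service classes (commit when a method ends successfully, roll back on error) can be illustrated with a plain JDK dynamic proxy. This is only a sketch of the semantics, not how the platform configures Spring: the interface, the log class and the wrapper are hypothetical, and a list of events stands in for a real transaction manager.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface; the name is not taken from the CORE sources.
interface RepositoryService {
    void saveProcess(String name);
}

// Records what a real transaction manager would do against the database.
class TransactionLog {
    final List<String> events = new ArrayList<>();
}

class TransactionalProxy {
    // Wraps every method of the target in begin / commit-or-rollback.
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> type, TransactionLog log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.events.add("begin");
            try {
                Object result = method.invoke(target, args);
                log.events.add("commit");      // method ended successfully
                return result;
            } catch (InvocationTargetException e) {
                log.events.add("rollback");    // the service method threw
                throw e.getCause();            // rethrow the original exception
            }
        };
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler);
    }
}
```

A caller only sees the `RepositoryService` interface; whether a call was committed or rolled back is decided entirely by the wrapper, which is the property the platform relies on for its service layer.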

Other frameworks and tools are used within the project:

• XJC: XML to Java compiler included in the Java Software Development Kit. Used to generate the Java classes starting from the XML schema files of the CORA model (discussed in Deliverable 3.1).

• jQuery: JavaScript framework that facilitates the manipulation of HTML objects on the web page and AJAX calls.

• Log4J: Java framework used for generating log entries.

2.3 CORE Transformation API

The CORE Transformation API is the component responsible for converting datasets obtained from input and output files into datasets represented in the XML CORA format. In other words, the Transformation API acts as a wrapper for a service runtime, as explained in deliverable 3.1.

The main component of the Transformation API is the Transformation interface. This interface defines two operations, transformFromCora and transformToCora. The design of the Transformation API is pluggable with respect to the dataset format. Provided that the data is represented in tabular format, a programmer can realize her own implementation of the Transformation interface, specifying the two operations.

The other component of the Transformation API is the Mapping. The Mapping is used by the Transformation component to perform the dataset transformation with respect to the CORA format. The Mapping is specific to the data format and has to be defined by the programmer, through the definition of a specific type of XML file and its corresponding parser.

As an example, the CORE platform proof of concept includes an implementation of the Transformation API for the CSV format. This consists of two classes: CsvTransformation, which is the direct implementation of the Transformation interface, and CsvMappingParser, which is the component responsible for generating the actual mapping object starting from its XML definition.
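To make the pluggable design concrete, the following sketch shows one possible shape for the interface and a naive CSV implementation. Only the operation names transformFromCora and transformToCora come from this document; the parameter and return types are assumptions. A list of property-to-value maps stands in for the XML CORA dataset, a column-to-property map stands in for the parsed Mapping, and the class name SimpleCsvTransformation is hypothetical (it is not the platform's CsvTransformation). Quoting and escaping of CSV cells are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Assumed signatures: the document names the two operations but not their types.
interface Transformation {
    List<Map<String, String>> transformToCora(String fileContent, Map<String, String> columnToProperty);
    String transformFromCora(List<Map<String, String>> dataset, Map<String, String> columnToProperty);
}

// Naive CSV implementation: plain split on the separator, no quoting support.
class SimpleCsvTransformation implements Transformation {
    private final String separator;

    SimpleCsvTransformation(String separator) { this.separator = separator; }

    @Override
    public List<Map<String, String>> transformToCora(String csv, Map<String, String> mapping) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(Pattern.quote(separator), -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) continue;
            String[] cells = lines[i].split(Pattern.quote(separator), -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < header.length && c < cells.length; c++) {
                String property = mapping.get(header[c]);   // unmapped columns are dropped
                if (property != null) row.put(property, cells[c]);
            }
            rows.add(row);
        }
        return rows;
    }

    @Override
    public String transformFromCora(List<Map<String, String>> dataset, Map<String, String> mapping) {
        StringBuilder out = new StringBuilder(String.join(separator, mapping.keySet())).append('\n');
        for (Map<String, String> row : dataset) {
            List<String> cells = new ArrayList<>();
            for (String property : mapping.values()) {
                cells.add(row.getOrDefault(property, ""));
            }
            out.append(String.join(separator, cells)).append('\n');
        }
        return out.toString();
    }
}
```

With an insertion-ordered mapping (for example a LinkedHashMap from column `id` to property `personId`), converting a file to the dataset form and back reproduces the original columns in the same order. Supporting another tabular format would mean writing another class against the same two operations.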

2.4 Local Service Runtime

The Local Service Runtime component is responsible for the execution of the tool associated with a service and for the related data transformations. Tools have to be executed by specifying their command line. In other words, tools that require user interaction are not supported in this execution mode.

The command line for a tool is stored in the repository and contains all the information required for running the tool, including the input and output files. However, in order to allow users a degree of flexibility in defining their processes, input and output file names are not fixed in the command line but can be specified as parameters. At design time, each parameter name is associated with a CORE dataset, i.e. the input or output of a CORE transformation, and at runtime the parameters are replaced with the actual names of the files stored in the file system. This mechanism is entirely handled by the platform and is completely transparent to the user, who does not have to deal with complex pathnames that can easily lead to errors.
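The parameter replacement described above can be sketched as a small template resolver. The `${name}` placeholder syntax, the class name and the example command are assumptions made for illustration; this document does not specify how parameters are written in the stored command line.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandLineTemplate {
    // Assumed placeholder syntax: ${logicalName}. The syntax actually used
    // by the platform is not specified in this document.
    private static final Pattern PARAM = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    // Replaces each parameter with the actual file name bound to it at runtime.
    static String resolve(String template, Map<String, String> actualFiles) {
        Matcher m = PARAM.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String file = actualFiles.get(m.group(1));
            if (file == null) {
                throw new IllegalArgumentException("No file bound to parameter: " + m.group(1));
            }
            // quoteReplacement keeps '\' and '$' in path names literal
            m.appendReplacement(out, Matcher.quoteReplacement(file));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, resolving the hypothetical template `Rscript edit.R ${coreIn} ${coreOut}` against a map that binds `coreIn` and `coreOut` to concrete paths yields a command line with both paths filled in, while an unbound parameter is reported as an error before the tool is launched.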

CORE datasets are stored in memory during a process execution and can optionally be written to file, for debugging purposes or for data exchange, in the form of XML files whose schema is defined by the Domain Descriptor.

The functionality of the Local Service Runtime is structured in two phases, initialization and execution. The initialization phase is executed before the whole process starts and has the purpose of preparing the execution by allocating the input and output ports that are used by the service to trigger the input/output transformation before/after the execution. More precisely, a port is created for each item of operational data associated with a CORE dataset. Non-CORE datasets, i.e. the files that are provided by the users and are not subject to transformation, are not considered in this phase.
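The initialization rule (one port per item of operational data that is associated with a CORE dataset, none for non-CORE files) can be sketched as a simple filter. The types below are hypothetical and only carry the fields needed to show the rule.

```java
import java.util.ArrayList;
import java.util.List;

enum Direction { INPUT, OUTPUT }

// One item of operational data declared for a service (hypothetical shape).
class OperationalData {
    final String logicalName;
    final Direction direction;
    final boolean coreDataset;   // false for user-provided, non-CORE files

    OperationalData(String logicalName, Direction direction, boolean coreDataset) {
        this.logicalName = logicalName;
        this.direction = direction;
        this.coreDataset = coreDataset;
    }
}

// A port through which a transformation is triggered before or after execution.
class Port {
    final String logicalName;
    final Direction direction;

    Port(String logicalName, Direction direction) {
        this.logicalName = logicalName;
        this.direction = direction;
    }
}

class PortAllocator {
    // Creates a port for each CORE-related item; non-CORE items are skipped.
    static List<Port> allocate(List<OperationalData> declared) {
        List<Port> ports = new ArrayList<>();
        for (OperationalData d : declared) {
            if (d.coreDataset) {
                ports.add(new Port(d.logicalName, d.direction));
            }
        }
        return ports;
    }
}
```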

The execution phase is activated when the service has to be executed within the process flow. In general, it consists of the following steps:

1. For each input port, the corresponding CORE dataset is read from memory.

2. All the CORE datasets are transformed into files, using the associated mapping. Note that the Local Service Runtime is oblivious to the kind of transformation that takes place, which is completely handled by the specific implementation of the Transformation API.

3. The tool command line is executed.

4. For all the output ports, the corresponding file is retrieved from the file system.

5. For all the output files, a CORE dataset is generated and stored in memory, where it is available for the subsequent steps.

During the execution phase, any runtime error that occurs is caught and stored, so that it can be shown to the user after process execution is terminated.
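The five steps and the error handling above can be sketched as follows. This is a simplified illustration under several assumptions: datasets are plain strings, the two transformation helpers are identity placeholders for the mapping-driven Transformation API, and a ToolLauncher callback stands in for launching the stored command line (a real runtime would start an external process instead). All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for running the tool's command line in a working directory.
interface ToolLauncher {
    void run(Path workDir) throws Exception;
}

class ServiceExecutionSketch {
    // CORE datasets held in memory during the process, keyed by dataset name.
    final Map<String, String> datasetsInMemory = new HashMap<>();
    // Errors are stored so they can be shown after the process has ended.
    final List<String> errors = new ArrayList<>();

    // Each port map binds a CORE dataset name to the file name the tool expects.
    void execute(Map<String, String> inputPorts, Map<String, String> outputPorts,
                 ToolLauncher tool, Path workDir) {
        try {
            // Steps 1-2: read each input dataset from memory and write it as a file.
            for (Map.Entry<String, String> port : inputPorts.entrySet()) {
                String dataset = datasetsInMemory.get(port.getKey());
                if (dataset == null) {
                    throw new IllegalStateException("Missing dataset: " + port.getKey());
                }
                Files.write(workDir.resolve(port.getValue()),
                        toFileFormat(dataset).getBytes(StandardCharsets.UTF_8));
            }
            // Step 3: execute the tool.
            tool.run(workDir);
            // Steps 4-5: read each output file and store it as a dataset in memory.
            for (Map.Entry<String, String> port : outputPorts.entrySet()) {
                byte[] raw = Files.readAllBytes(workDir.resolve(port.getValue()));
                datasetsInMemory.put(port.getKey(),
                        toCoraFormat(new String(raw, StandardCharsets.UTF_8)));
            }
        } catch (Exception e) {
            errors.add(e.toString());   // caught and kept for later display
        }
    }

    // Identity placeholders: the real conversions are delegated to the
    // Transformation API and driven by the mapping.
    static String toFileFormat(String coraDataset) { return coraDataset; }
    static String toCoraFormat(String fileContent) { return fileContent; }
}
```

The runtime never inspects the file contents itself, which mirrors the point made in step 2: the choice of format lives entirely behind the transformation helpers.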

The Local Service Runtime can only handle the execution of services attached to tools that are installed on the same machine as the CORE web platform (hence the name “Local”). The implementation of a Remote Service Runtime differs mainly in the pre-execution and post-execution phases, where the CORE datasets are transferred between the machine hosting the CORE web platform and the machine hosting the tool installation. A mechanism for the remote execution of a command line is also required. Both requirements can be addressed by exploiting a message-based middleware platform.

2.5 Process Runtime

The Process Runtime component included in the Java implementation of the CORE platform is a basic process execution engine that only supports sequential execution of processes. While it does not include the advanced functionality required by a production-level product, we deem it well suited for the demonstration purposes this implementation is targeted at, and it can even effectively support some simple production processes.

The Process Engine simply manages the execution of a process instance by invoking in sequence the Local Service Runtime components associated with all the services in the process, as described in the previous section.

Before the execution of a process instance starts, the user must specify all the non-CORE datasets, i.e. the user-provided variable datasets that are not subject to CORE transformations and that can change from one execution of the process to another.
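A sequential engine of this kind can be sketched in a few lines. The names below are hypothetical, datasets are reduced to strings, and each step body stands in for one Local Service Runtime invocation. Stopping at the first failing service is an assumption of this sketch; the document does not state how the engine proceeds after an error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Body of one process step; reads and writes the shared dataset store.
interface StepBody {
    void execute(Map<String, String> datasets) throws Exception;
}

class SequentialProcessEngine {
    final List<String> errors = new ArrayList<>();

    // Runs the steps in insertion order and returns the final dataset store.
    Map<String, String> run(LinkedHashMap<String, StepBody> steps,
                            Set<String> requiredNonCore,
                            Map<String, String> userProvided) {
        // All non-CORE datasets must be supplied before execution starts.
        for (String required : requiredNonCore) {
            if (!userProvided.containsKey(required)) {
                throw new IllegalArgumentException("Non-CORE dataset not provided: " + required);
            }
        }
        Map<String, String> datasets = new HashMap<>(userProvided);
        for (Map.Entry<String, StepBody> step : steps.entrySet()) {
            try {
                step.getValue().execute(datasets);
            } catch (Exception e) {
                errors.add(step.getKey() + ": " + e.getMessage());
                break;   // assumption: stop at the first failing service
            }
        }
        return datasets;
    }
}
```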

2.6 Internal Code Organization

The following table presents the source code organization of the Java project, summarizing the Java packages into which the source code is organized. Each package roughly corresponds to an architectural component.

Component | Description
eu.cora.model.data | Contains the Cora Dataset class, which is the heart of the Cora data model, and classes used by it. The code for this component is fully generated from the XML schema cora.data.model.xsd
eu.cora.domaindescriptor | Contains the Core Schema class, used for defining Core datasets, and classes used by it. The code for this component is fully generated from the XML schema cora.domain.descriptor.xsd
eu.cora.model | Helper and handler classes for the core objects in eu.cora.model.data and eu.cora.domaindescriptor.
eu.cora.mapping.csv | Contains the CsvFile class, used to define mappings between Cora Datasets and CSV files, and classes used by it. Code for these classes is fully generated from the XML schema cora.mapping.csv.xsd. Furthermore, this component contains classes which handle these mappings and perform the transformations between Cora Datasets and CSV files.
eu.cora.domain | Entity objects. The code for these classes was generated using Hibernate.
eu.cora.util | Contains the helper class Utils, in which some generic helper functions are defined.
eu.cora.runtime | Contains classes that facilitate the execution of Cora Processes, defining a runtime environment for them.
eu.cora.service | Contains the RuntimeService class, responsible for setting up the runtime environment for process execution.
eu.cora.controller | Contains action classes for the web application, which act as the controller layer in Struts2.

Table 1: Description of all components in the CORE Java web platform

3 GUI functionalities

In this section we show the GUI environment functionalities and how they can be used in order both to design and to execute a statistical process. The GUI is a web application, providing different functionalities that are grouped in two main menus:

• Design a process – it allows the user to define the process services and to specify both the domain descriptor and the mapping for each pair of services involved in the process;

• Run a process – it performs the execution of a process.

In particular, the Design a process menu allows the user to perform the following operations:

• Select a process previously created;

• Create a new process;

• Create a Domain Descriptor from scratch or using a file;

• View and edit a Domain Descriptor;

• Specify a mapping;

• Display a mapping file previously created.

In order to explain how the GUI environment works we present in the following paragraphs a practical scenario of design and execution of a statistical process.

3.1 Process design

After logging in to the application, a user can perform one of the following operations:

1. Select a process from a list;

2. Create a new process.

In the first scenario the user chooses a process from the list of processes that have been previously created. In the second, the user types the name of the process to be created (see Figure 3). Once the process has been created or selected, the left frame of the web page displays the list of services composing the process (see Figure 4). This frame also contains the “Add a service” button, which allows the user to define a new service in the process. When the user clicks the name of a service, the web application displays the service properties, namely: the service name, the tool connected to the service, the command line needed to run the tool, the GSBPM tag and the logical names of the input, output, Core input and Core output files.

Figure 3: Process selection

Figure 4: Service list

The GUI environment permits the creation of a service by pressing the “Add a service” button. Once this button has been pressed, the following fields are displayed in the main frame (see Figure 5):

• service name: define the name of the service;

• tool selection: select the tool that implements the service. The list of available tools is dynamically loaded from a dedicated tool repository, which contains the tools provided by the National Statistical Institutes;

• command line: in this field the user specifies the command line needed to run the tool containing the parameters necessary for a correct execution of the tool;

• GSBPM tag: this metadata is needed to properly allocate the service in the GSBPM stack;

• logical inputs and outputs: it is possible to specify the Core input and Core output and a set of “non Core” input and “non Core” output.

Figure 5: Service creation

3.2 Domain descriptor definition

The GUI environment allows the user to define a domain descriptor (DD) in two different ways:

1. definition from scratch: in this case the user has to specify the name of at least one entity and of its properties.

2. definition from file: this functionality is useful when the number of entities and/or properties is high. In particular, it allows the DD to be pre-populated starting from a text file. This file should contain (at least) the header of a rectangular dataset, i.e. the names of the columns of the dataset. In order to create the DD from a file template, the user has to specify the name of the entity and the separator used in the file (see Figure 6).

Figure 6: Create domain descriptor from file

The GUI also allows the user to view and modify a previously created domain descriptor, through the functionality “View/edit an existing DD”.
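The “definition from file” option described above amounts to reading the header line of the dataset and turning each column name into a property of the named entity. A minimal sketch, assuming a single entity and hypothetical class and method names, could look as follows; trimming column names and skipping empty or repeated ones are choices of this sketch, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified stand-in for a domain descriptor with one entity.
class DomainDescriptorSketch {
    final String entityName;
    final List<String> properties = new ArrayList<>();

    DomainDescriptorSketch(String entityName) { this.entityName = entityName; }

    // Builds the descriptor from the header line of a rectangular dataset.
    static DomainDescriptorSketch fromHeader(String entityName, String headerLine, String separator) {
        DomainDescriptorSketch dd = new DomainDescriptorSketch(entityName);
        // Pattern.quote treats the separator literally (e.g. "|" or ".")
        for (String column : headerLine.split(Pattern.quote(separator))) {
            String property = column.trim();
            if (!property.isEmpty() && !dd.properties.contains(property)) {
                dd.properties.add(property);
            }
        }
        return dd;
    }
}
```

The user would then refine the generated descriptor through the “View/edit an existing DD” functionality rather than typing every property by hand.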

3.3 Mapping definition

The GUI lets the user define a mapping between the Core input/output of a previously selected service (the red service in the left frame shown in Figure 7) and a domain descriptor. In order to complete the mapping process the user has to perform three steps.

In the first step the user defines (see Figure 7):

• domain descriptor: the user specifies a domain descriptor previously created;

• Core input/output: the GUI loads the list of Core input/output related to the selected service, and the user selects the input/output involved in the mapping;

• mapping source: the user selects a local file containing the header of a rectangular dataset, i.e. the names of the columns of the dataset.

• Global mapping properties: the user specifies the metadata of the dataset. In particular he has to choose the dataset kind and the Core tags for the dataset and the dataset rows.

Figure 7: Mapping specification – step 1

In the second and third steps the user binds the domain descriptor properties to the properties of the mapping source (see Figure 8 and Figure 9). Furthermore, for each property the user specifies the core kind and column kind metadata. The available values of these metadata are listed on the basis of the selection performed in the first step. For example, if the dataset kind is dimensional, the column kind can be either dimension or measure.

Figure 8: Mapping specification - step 2

Figure 9: Mapping specification - step 3

3.4 Process execution

This functionality allows the user to upload the “non Core” input files of the services of the selected process. Pressing the “Run process” button starts the execution of the process. The environment displays the status of the execution.

Figure 10: Process execution