Extending OCL for OLAP querying on conceptual multidimensional models of data warehouses


Information Sciences 180 (2010) 584–601

journal homepage: www.elsevier.com/locate/ins

Jesús Pardillo *, Jose-Norberto Mazón, Juan Trujillo

Lucentia Research Group, Department of Software and Computing Systems, University of Alicante, Spain

Article info

Article history:
Received 4 June 2009
Received in revised form 28 October 2009
Accepted 3 November 2009

Keywords:
OLAP
Conceptual modelling
Multidimensional modelling
Data warehouse
Query language

0020-0255/$ - see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.ins.2009.11.006

* Corresponding author. Tel.: +34 965 90 37 72; fax: +34 965 90 93 26.
E-mail addresses: [email protected] (J. Pardillo), [email protected] (J.-N. Mazón), [email protected] (J. Trujillo).

Abstract

The development of data warehouses begins with the definition of multidimensional models at the conceptual level in order to structure data, which makes data analysis easier for decision makers. Current proposals for conceptual multidimensional modelling focus on the design of static data warehouse structures, but few approaches model the queries which the data warehouse should support by means of OLAP (on-line analytical processing) tools. OLAP queries are, therefore, only defined once the rest of the data warehouse has been implemented, which prevents designers from verifying from the very beginning of the development whether the decision maker will be able to obtain the required information from the data warehouse. This article presents a solution to this drawback consisting of an extension to the object constraint language (OCL), which has been developed to include a set of predefined OLAP operators. These operators can be used to define platform-independent OLAP queries as a part of the specification of the data warehouse conceptual multidimensional model. Furthermore, OLAP tools require the implementation of queries to assure performance optimisations based on pre-aggregation. It is interesting to note that the OLAP queries defined by our approach can be automatically implemented in the rest of the data warehouse, in a coherent and integrated manner. This implementation is supported by a code-generation architecture aligned with model-driven technologies, in particular the MDA (model-driven architecture) proposal. Finally, our proposal has been validated by means of a set of sample data sets from a well-known case study.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

Data warehouses are databases that store historical data for decision-making purposes [21]. Several kinds of applications can be used to analyse these data, among which the most popular is on-line analytical processing (OLAP), which allows human analysts to navigate through multidimensional structures in order to access data in a more natural manner. Designers of multidimensional models must structure the information that is available into facts and dimensions. Facts are usually measures of business processes of some kind (e.g., how many products are sold, how many patients are treated, how long something takes, etc.), and dimensions represent the different ways in which the data can be viewed and sorted (e.g., according to time, store, customer, product, etc.).

The development of a data warehouse begins with the definition of a conceptual multidimensional model which is platform-independent. When particular design decisions are made, these models are then translated into logical schemata which are tailored to a given technology and are additionally refined into physical schemata (selecting indexes, partitioning tables, etc.) [38]. Several approaches [2,18,36] for conceptual modelling have recently been developed in order to represent multidimensional structures in an implementation-independent manner, thus reflecting real-world situations as accurately as possible. Unfortunately, these approaches mainly focus on static data warehouse structures, and little importance has been placed on modelling user query behaviour at the conceptual level. The benefit of anticipating user behaviour by considering the conceptual modelling of OLAP queries is twofold: (i) conceptual multidimensional models are validated at the very beginning of the development [7], which prevents designers from deploying an entire data warehouse that does not meet the decision maker's data analysis needs, and (ii) OLAP queries at the conceptual level form the expected workload for the data warehouse, which can be implemented in a coherent and integrated manner with the rest of the data warehouse in the subsequent design stages. For instance, modelling OLAP queries is a requirement for later logical and physical design issues such as index selection [20] or view selection [22,17].

Fig. 1 presents two concerns of data warehouse design: multidimensional models and OLAP queries. This figure shows (lower part) that most current related work has traditionally focused on modelling the structure of the static part of the data warehouse, by providing conceptual, logical, and physical schemata. The innovative nature of our proposal lies in its capacity to specify OLAP queries on every model used in the design of the data warehouse. We shall therefore begin by specifying OLAP queries on the conceptual model of data warehouses. Then, thanks to the use of model-driven architecture, we shall provide the traceability of these queries up to their final implementation. Fig. 1 summarises the differences between our proposal and other approaches, by considering not only the multidimensional models but also the OLAP queries. Query languages for multidimensional models may be characterised as OLAP algebras. It is important to note that OLAP manipulates dimensional data by following the data cube metaphor [32].

The main restriction for defining multidimensional queries at the conceptual level is the rather limited support offered by current conceptual modelling languages [2,18,36]. We therefore believe that it is extremely important to provide OLAP operators as the means to model OLAP queries at the conceptual level. These queries can therefore be defined and validated regardless of the final technology platform chosen to implement the data warehouse.

In this article, OCL (object constraint language) [31] is therefore extended with a new set of OLAP operators in order to facilitate the specification of multidimensional queries as part of the definition of UML conceptual schemata. OCL has already been successfully extended for several purposes, such as in [12,3], in which OCL is extended to assess security and quality, respectively. Nevertheless, to the best of our knowledge, ours is the first extension to consider OLAP operators. In our example, we use the operators in combination with our UML profile for multidimensional modelling [23]. Our new OCL operations have been tested and implemented by means of a set of sample data.

Furthermore, our work is aligned with model-driven technologies, in particular with model-driven architecture (MDA) [31], in which the implementation of the system is supposed to be automatically generated from its high-level models. All the multidimensional queries are defined at the conceptual level from the data warehouse workload, and this permits a more complete code-generation phase, including the automatic translation of these queries from their initial platform-independent definition to the final (platform-dependent) implementation, thus increasing performance by means of pre-aggregation [22,17,35].

In this article, we extend our MDA for data warehousing [28,23,34,33], and in particular [32], in which a preliminary work on the topic presented in this article was introduced. The extension presented here improves on our previous work: (i) through a formal characterisation of OLAP operations and the data types involved (see Section 3), (ii) by providing macro definitions that facilitate their management (see Section 4), (iii) by validating the presented framework in an OCL engine against sample data sets (see Section 6), and (iv) by applying MDA to the automatic generation of code for OLAP queries. These mechanisms thus increase understanding and permit the validation of conceptual models while they are being managed [7], from the first stage of the data warehouse development.

This article is structured as follows. Section 2 introduces multidimensional modelling by means of a sample scenario, and prepares the conceptual multidimensional model to be queried by OLAP operations. The background to OLAP algebras is then presented (Section 3). Our extension of OCL with OLAP operators is described in Section 4. The translation of OLAP operators into code is later studied in Section 5. Code generation is supported by means of an implementation based on Eclipse, which is detailed in Section 6. Related work is outlined in Section 7. Finally, our main conclusions and proposed future work are discussed in Section 8.

Fig. 1. Modelling OLAP queries on model-driven data warehouses.

Fig. 2. A conceptual model of sales and inventory data warehouse for OLAP purposes.

2. Multidimensional modelling at the conceptual level

OLAP analysis manipulates multidimensional data extracted from data warehouses by following the data cube metaphor. Fig. 2 shows an illustrative example based on the notation of [23]. This data model is implemented in UML [31]. For each dimensional property, a given UML metaclass (i.e., modelling element) has been extended by means of stereotypes and semantic constraints in order to host dimensional modelling. This extension also includes a specific iconography with which to visualise them. Fig. 2 models two facts, sales and inventory, in which the sale amount and units and the inventory quantity are the measures. These are described through three dimensions: time, location, and product. Each dimension also contains various descriptors (which represent any kind of aggregation-level property), arranged in aggregation hierarchies between different granularity levels. Each of these kinds of element is drawn with its own icon in Fig. 2. For the sake of simplicity, the considered scenario omits advanced properties such as inheritance or degenerate facts and dimensions, and also assumes that aggregation hierarchies are symmetric [24,30]. The manner in which these are dealt with will be discussed in Section 4.9.

Multidimensional models are conceived in model-driven data warehouses as conceptual data models which provide stakeholders with a more natural understanding of how data are structured and how they can be accessed for analysis purposes. Multidimensional models therefore provide visual primitives which facilitate the modelling of data warehouses rather than mere abstract data modelling constructors. For instance, the dimensions in [23] represent containers of aggregation hierarchies but not the actual data to be stored. In other words, they cannot be instantiated in data entities on their own. As shown in Fig. 3, it is important to note that even the metadata that describe facts and dimensions have no logical counterparts to be stored in a database (see footnote 1). Conceptual models may be decomposed into two layers, one of which contains a kind of metadata: the upper layer provides the visual notation that is rendered in diagrams (e.g., see Fig. 2), whereas the lower layer provides the concepts through which to multidimensionally model the data warehouse storage or database schema (domain model shown in Fig. 5). In order to be able to specify OLAP queries conceptually, they should be targeted to the second layer, which correctly models the stored data.

The expressiveness required to model data warehouse models allows us to characterise them as object-oriented information systems [41]. Conceptual multidimensional models may thus be mapped into UML class diagrams (denominated as "dimensional mapping" by some researchers), and class diagrams may conversely be interpreted as multidimensional models ("dimensional interpretation") (see [10,11] for the ontological basis of these mappings).

Conceptual modelling frameworks such as that of [23] have coupled the two layers presented in a single modelling language or metamodel. In these cases, obtaining the underlying database schema implies detaching the presentation layer from the data layer. This task involves two separate activities:

• Multidimensional metamodelling through the structural modelling of the OLAP concepts with class diagrams [23,2,13]. The must-have entities would be: fact, dimension, and aggregation hierarchy (whose precise meaning is determined by each metamodel). Additional entities would be, for example: the different kinds of aggregation hierarchies [30] (e.g., symmetric, non-covering, etc.), fact or dimension inheritance, or dealing with degenerate facts and dimensions [21].
• Semantic matching by translating the meaning of each (meta-)class instance, according to the multidimensional metamodel, into class diagrams, i.e., modelling them as classes (strictly speaking), associations (inheritance included) or properties. Owing to the lack of standardisation, this matching depends on the metamodel chosen. However, rules for the foundations of any of these models are studied below.

For the purpose of this discussion, the metamodel resulting from the first activity is that presented in [23]. Its semantic matching follows the rules shown in Table 1 (let $\mu$ be a predicate or relation characterising the semantic matching). The rationale for this is derived from [28].

1 Please note that the relational model indistinctly characterises facts and dimensions as relations (data tables for database engines).

Fig. 3. Metadata layers involved in conceptual modelling of data warehouses. [Presentation layer: diagram (visual notation), see Fig. 2. Data layer: database schema (domain model), see Fig. 5.]

Table 1
Mapping $\mu(d, o)$ between multidimensional and object-oriented concepts.

| Dimensional-data [23] | Object-oriented [31] |
|---|---|
| Fact | Class |
| Base | Class |
| Dimension of $f$ with grain $b$ | Shared association of $o_b$ with $o_f$, where $\mu(b, o_b)$, $\mu(f, o_f)$ |
| Roll $d$ up to $r$ | Shared association of $o_d$ with $o_r$, where $\mu(d, o_d)$, $\mu(r, o_r)$ |
| Measure of $f$ | Property of $o_f$, where $\mu(f, o_f)$ |
| Descriptor of $b$ | Property of $o_b$, where $\mu(b, o_b)$ |

As an example, let us render the conceptual model of Fig. 2 as the UML object diagram shown in Fig. 4. This diagram represents the metaclasses involved, along with their instances, without the aid of a visual notation (see footnote 2), thus highlighting the multidimensional concepts to be mapped. According to Table 1, the class diagram of Fig. 5 maps these concepts into the structural view or data schema of an object-oriented information system.

It is important to note that an explicit representation of dimensions is omitted from the class diagrams. In the object-oriented paradigm, they are instead represented merely as class roles. As previously stated, the rules of Table 1 preserve the structural semantics of conceptual multidimensional models such as that studied in [23]. The resulting class diagram can thus be conceptually queried by OLAP algebras, whose design is discussed next.
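The rules of Table 1 can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the dictionary layout, the function name `to_class_diagram`, the role name `rollsUpTo`, the descriptor names, and the model fragment itself (a sales fact with city and day bases and a city-to-state roll-up) are hypothetical and do not reproduce Fig. 2 or Fig. 5 in full.

```python
def to_class_diagram(model):
    """Apply the Table 1 rules to a multidimensional model description.

    `model` uses a hypothetical dict layout chosen for this sketch.
    """
    classes = {}        # class name -> list of property names
    associations = []   # (source class, target class, role name)

    # Facts and bases both map to classes;
    # measures and descriptors map to their properties.
    for fact, measures in model["facts"].items():
        classes[fact] = list(measures)
    for base, descriptors in model["bases"].items():
        classes[base] = list(descriptors)

    # A dimension of fact f with grain b maps to a shared association.
    # The dimension itself survives only as a role name, not as a class.
    for dimension, (fact, base) in model["dimensions"].items():
        associations.append((fact, base, dimension))

    # Rolling d up to r maps to a shared association between both classes.
    for finer, coarser in model["roll_ups"]:
        associations.append((finer, coarser, "rollsUpTo"))

    return classes, associations


# Hypothetical fragment of a sales model.
model = {
    "facts": {"Sale": ["amount", "units"]},
    "bases": {"City": ["name"], "State": ["name"], "Day": ["date"]},
    "dimensions": {"location": ("Sale", "City"), "time": ("Sale", "Day")},
    "roll_ups": [("City", "State")],
}
classes, associations = to_class_diagram(model)
```

In this sketch the dimensions `location` and `time` never become classes, which mirrors the remark that dimensions appear in the class diagram only as class roles.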

3. Characterisation of OLAP algebras

The main data structure managed by OLAP algebras is the data cube, which is modelled at the conceptual level by using multidimensional concepts. Whereas multidimensional modelling copes with database static structures, data cubes are conceived as dynamic entities because they are built during runtime in response to OLAP queries. OLAP algebras manage data cubes as sets of occurrences from a given fact indexed by several occurrences of certain dimensions (specifically, their related aggregation levels). In addition, the entire OLAP analysis is defined as a sequence of OLAP operations through these data cubes.

Data cubes are composed of data cells. Each data cell represents an aggregation of a fact measure by means of several dimensions. A data cell $c$ may be characterised as a tuple $c = (y, \bar{x})$, where

• $y$ is the aggregated value calculated from a fact measure $\mathrm{dom}(y)$, and
• $\bar{x}$ is the grouping criterion, denominated as the data cell coordinates. This may be characterised as a list of aggregation level occurrences $\bar{x} = (x_1, \ldots, x_n)$ in which each coordinate or axis $x_i$ has a domain in the aggregation level $\mathrm{dom}(x_i)$.

Its schema, $\mathrm{dom}(c)$, is then characterised as $[\mathrm{dom}(y), (\mathrm{dom}(x_1), \ldots, \mathrm{dom}(x_n))]$, where $(\mathrm{dom}(x_1), \ldots, \mathrm{dom}(x_n))$ is its $n$-dimensional base.

2 Object diagrams are not usually considered as visual notation, but rather as a type of low-level instance specification.

Fig. 4. Object diagram of the conceptual model in Fig. 2 (link-end labels hidden).

Fig. 5. Class diagram for the conceptual multidimensional model in Fig. 2.

Since the definition of a data cube is built on data cells of unrestricted schemata, the actual data cube schema (including its base) results from the composition of all the different schemata of the contained data cells.

Example 3.1. A sales manager wishes to know the annual amount of sales per state. According to Fig. 5, this information requirement $cs$ is answered by a data cube with schema $\mathrm{dom}(c)$ = (Sale::amount, (Year, State)). A particular $c \in cs$ could be $c$ = (1000, (2008, Alicante)).
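Example 3.1 can be made concrete with a minimal sketch. The Python below is an illustration only: the `Cell` class, its field names, and the helper `schema` are assumptions of this sketch, and the coordinates are encoded as (level, occurrence) pairs for convenience.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    measure: str  # dom(y), e.g. "Sale::amount"
    v: float      # aggregated value y
    co: tuple     # coordinates as ((level, occurrence), ...)


def schema(cell):
    """Return dom(c) as (measure, tuple of aggregation level names)."""
    return (cell.measure, tuple(level for level, _ in cell.co))


# The data cell of Example 3.1.
c = Cell("Sale::amount", 1000, (("Year", 2008), ("State", "Alicante")))

# A data cube is modelled here as a set of data cells.
cs = {c}
```

Because the cells are immutable tuples, they can be stored in ordinary sets, which matches the view of a data cube as a set of cells.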

OLAP algebras characterise sets of operations of the form $f : C \times P \to C$, where $C$ is the universe of data cubes for which a particular algebra is defined and $P$ is a set of additional parameters for each $f$. Several data types are defined for both $C$ and $P$. For example, the Cube data type is defined as Set(Cell) and the Cell data type as Tuple(v: V, co: CO), where CO = Tuple(x1: X1, ..., xn: Xn) are the data cell coordinates (see footnote 3). There is no official commitment to the operations that should be present in an OLAP algebra, but [39] presents the backbone of state-of-the-art OLAP algebras, which is supposed to be minimal (no operation can be specified in terms of the others) and complete (every information need may be expressed in terms of these operations). Our extension of OCL therefore includes this set of OLAP operators, which are summarised as follows:

Dimension addition and removal: The operation signatures are:

$\{\mathrm{add}, \mathrm{remove}\}\mathrm{Dimension} : \mathrm{Cube} \times (\mathrm{Axis} \times \mathrm{Additivity}) \to \mathrm{Cube}$

where addDimension and removeDimension add and remove an Axis to and from a data Cube by calculating the resulting data Cube with the Additivity aggregation function.

Dimension addition must calculate its aggregation sets (data cells) from data cubes finer than the one provided, whose data cells do not contain sufficient data to compute the requested finer aggregations. Dimension removal, in contrast, can be accomplished from the input data cube alone.

3 See http://www.lucentia.es/index.php/OCL/OLAP_Algebra for a comprehensive definition of the data types involved.

Slice and dice: These operations filter data cells by considering either a single axis (slicing) or several axes (dicing). They can thus be managed as a single operation which is characterised as follows:

$\mathrm{sliceAndDice} : \mathrm{Cube} \times (\mathrm{Cell} \times \mathrm{Dice}) \to \mathrm{Cube}$

where sliceAndDice checks each data Cell of a data Cube against a Dice criterion in order to include it in the resulting data Cube. Contrary to the traditional definition, in which coordinates are the only criteria for filtering data cells, Dice is not limited to coordinates, and measures can also be taken into account. This removes the need to pull measures into axes in order to apply this operation (see Section 4.9).

Drill across: This operation signature is:

$\mathrm{drillAcross} : \mathrm{Cube} \times (\mathrm{Conformity} \times \mathrm{Cube}) \to \mathrm{Cube}$

where drillAcross obtains data cells from a data Cube (third argument) which are linked to a particular data cell of the input data Cube (first argument) by satisfying a Conformity predicate, in order to build the resulting data Cube.

Dimensional projection: This operation signature is:

$\mathrm{dimensionalProject} : \mathrm{Cube} \times (\mathrm{Measure}) \to \mathrm{Cube}$

where dimensionalProject checks each data cell of a data Cube for a Measure in order to include it in the resulting data Cube.

Roll up: This operation signature is:

$\mathrm{rollUp} : \mathrm{Cube} \times (\mathrm{Axis} \times \mathrm{Rolling} \times \mathrm{Additivity}) \to \mathrm{Cube}$

where rollUp groups the data cells of a data Cube by exchanging one Axis of the cells' coordinates for its Rolling counterpart and by calculating the resulting data Cube with the Additivity aggregation function. This operation modifies the data granularity by calculating coarser data cells from finer ones. According to this definition, axes are only exchanged when they are already part of the data cell coordinates.

Drill down: This operation signature is:

$\mathrm{drillDown} : \mathrm{Cube} \times (\mathrm{Axis} \times \mathrm{Drilling} \times \mathrm{Additivity}) \to \mathrm{Cube}$

where drillDown disaggregates the data cells of a data Cube by exchanging an Axis of the cells' coordinates for its Drilling counterpart and by calculating the resulting data Cube with the Additivity aggregation function. This operation is the opposite of the previous one: whereas rolling up goes from finer to coarser data cells, drilling down acts conversely. However, because it calculates data cells finer than those available, the input data cube does not hold a sufficient level of detail [2], as also occurs with addDimension.

Set-oriented operations: Signatures for the considered operations are:

$\mathrm{union}, \mathrm{intersection}, \mathrm{difference} : \mathrm{Cube} \times (\mathrm{Cube}) \to \mathrm{Cube}$

where these operations calculate the resulting data Cube according to the semantics of their respective set-oriented counterparts for a given pair of input data Cubes.
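Some of the signatures above can be sketched over plain Python sets. This is a hedged illustration rather than the algebra of [39] or the OCL definitions of Section 4: the `Cell` class, the function names, the encoding of Rolling as a dictionary, and the sample occurrences (`c1`, `s1`, and so on) are all assumptions made for this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    measure: str  # fact measure being aggregated
    v: float      # aggregated value
    co: tuple     # coordinates as ((level, occurrence), ...)


def dimensional_project(cube, measure):
    """Keep the cells that aggregate the requested measure."""
    return {c for c in cube if c.measure == measure}


def drill_across(cube, conformity, other):
    """Collect the cells of `other` linked to some cell of `cube`."""
    return {o for c in cube for o in other if conformity(c, o)}


def roll_up(cube, axis, rolling_level, rolling, additivity):
    """Exchange `axis` for its coarser level and re-aggregate the values.

    `rolling` maps each occurrence of `axis` to its coarser occurrence.
    Cells whose coordinates lack `axis` keep their coordinates unchanged.
    """
    groups = defaultdict(list)
    for c in cube:
        co = tuple(
            (rolling_level, rolling[x]) if level == axis else (level, x)
            for level, x in c.co
        )
        groups[(c.measure, co)].append(c.v)
    return {Cell(m, additivity(vs), co) for (m, co), vs in groups.items()}


# Hypothetical sample data.
sales = {
    Cell("Sale::amount", 10, (("City", "c1"),)),
    Cell("Sale::amount", 5, (("City", "c2"),)),
    Cell("Sale::amount", 7, (("City", "c3"),)),
}
inventory = {
    Cell("Inventory::quantity", 4, (("City", "c1"),)),
    Cell("Inventory::quantity", 9, (("City", "c9"),)),
}
city_to_state = {"c1": "s1", "c2": "s1", "c3": "s2"}

by_state = roll_up(sales, "City", "State", city_to_state, sum)
linked = drill_across(sales, lambda a, b: a.co == b.co, inventory)
only_amounts = dimensional_project(sales | inventory, "Sale::amount")
```

With cubes represented as sets, the set-oriented operations need no extra code in this sketch: Python's `|`, `&` and `-` operators play the roles of union, intersection and difference. Every function returns a set of cells, so the operations compose freely.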

4. Extending OCL with OLAP operators

Conceptual modelling languages require the use of a general-purpose (textual) sublanguage to express all kinds of queries, constraints and rules, since most of them cannot be expressed by using only the graphical constructs provided by the modelling language [9]. Several properties may be desired for this sublanguage, some of which are:

• It should be understandable for software engineers.
• It should be declarative but executable.
• It should be sufficiently expressive to permit the allocation of OLAP algebras.
• It should be capable of querying conceptual models.

OCL [31] is an excellent candidate for UML models. The goal of this section is to extend OCL with the aforementioned OLAP operators to facilitate the definition of multidimensional queries on UML models.

Each defined data type such as Cube or Axis is supported in OCL by its own data type definition. However, OCL does not possess any capabilities with which to facilitate the codifying of OLAP algebras. The OCL notation is therefore decorated herein. Note that this notational extension is conceived as mere syntactic sugar: it does not compromise the code validation by OCL engines. These lightweight extensions are:

• Data type definition: The introduction of data types such as Cube and Axis, by renaming their native OCL counterparts based on the Tuple data type. The requested capability is equivalent to the well-known typedef instruction present in the C programming language. On the other hand, in native OCL expressions, only data types modelled in the data schema are permitted. Therefore, this data-type renaming could also be modelled in such a way (note that the first choice is preferred here).
• Macro definition: Several operations such as drillAcross or rollUp are parameterised by a function (e.g., Additivity), relationship (e.g., Rolling) or predicate (e.g., Conformity). These second-order data types cannot be arguments of OCL operations. However, thanks to macros, they may also be codified herein. In addition, macros encapsulate operation semantics by means of their signature, which eases their management. The macro notation introduced resembles the original OCL operations owing to the use of the def expression.

With regard to data types, note also that OCL expressions can be specified without their being shown explicitly. OCL permits this because it infers data types from the queried data model. This capability is extensively used herein. As an example, let us study the following expressions. The second alternative is preferred owing to its conciseness:

    let sale_amount: Integer = 10000 in   -- data types that are explicitly shown
    let sale_amount = 10000 in            -- data types that are implicit but inferred by OCL

The mathematical characterisation of a data cell $c = (y, \bar{x})$ (see Section 3) can be translated into OCL by this data type:

    def Cell = Tuple(v: V, co: Tuple(x1: X1, ..., xn: Xn))

where

• V is the codomain of the aggregation function that builds c;
• x1, ..., xn are the names of the aggregation levels in $\bar{x}$; and
• X1, ..., Xn are $\mathrm{dom}(x_1), \ldots, \mathrm{dom}(x_n)$, respectively.

All OLAP analyses begin with the manipulation of a particular data cube. OLAP operations are then successively applied over the intermediate data cubes. This process is elegantly formalised as a closed algebra. An initial data cube is therefore necessary to begin OLAP analyses. This may be defined by a convenience data cube called a grain.

Definition 4.1 (Data cube grain). This data cube contains all the data cells for every fact class of the database (represented with class diagrams) at the finest granularity for all dimensions of the fact involved.

For example, the grain in Fig. 5 contains data cells whose coordinates co have Tuple(Product: Product, City: City, Day: Day) as their data cell base and Sale::amount, Sale::units, or Inventory::quantity as the aggregated value v.

Each OLAP analysis is thus codified in such a way that it begins by applying a particular OLAP operation over a grain. Thistypically means dimensionally projecting certain measures and then changing their base:

    C = "sales amount per month and year"

    grain->dimensionalProject(Sale::amount)->removeDimension(Day, sum())
    ...   -- remaining OLAP analysis

An OLAP algebra (see Section 3) can be applied over a given grain in order to answer information needs or to gain insight into data. The translation of these operations into OCL is described as follows. The OCL definition of each one is shown by means of examples and explanations.

4.1. Dimension removal

Operation Definition: Dimension Removal

    def Cube::removeDimension(x: Axis, a: Additivity): Cube =
      let cos = self->collect(c_i | c_i.co.excluding(x))->asSet() in
      self->allMeasures()->collect(m |
        cos->collect(co_i | Tuple {
          co = co_i,
          v = self->select(c_i | c_i.v.type = m)
                  ->select(c_i | co_i.includes(c_i.co))->collect(v)->a
        }) )->flatten()

Example Query: $C_{rd}$ = "grain dropping products"

    grain->removeDimension(Product, sum()) =
      let cos = self->collect(c_i | c_i.co.excluding(Product))->asSet() in
      self->allMeasures()->collect(m |
        cos->collect(co_i | Tuple {
          co = co_i,
          v = self->select(c_i | c_i.v.type = m)
                  ->select(c_i | co_i.includes(c_i.co))->collect(v)->sum()
        }) )->flatten()
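The grouping performed by removeDimension can be sketched outside OCL. The Python below is only an approximation under stated assumptions: cells carry their measure name explicitly, coordinates are (level, occurrence) pairs, and the names `Cell` and `remove_dimension` as well as the sample grain are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    measure: str  # fact measure being aggregated
    v: float      # aggregated value
    co: tuple     # coordinates as ((level, occurrence), ...)


def remove_dimension(cube, axis, additivity):
    """Drop `axis` from every cell and aggregate the cells that collide."""
    groups = defaultdict(list)
    for c in cube:
        # Reduced coordinates: the original ones excluding the removed axis.
        co = tuple((level, x) for level, x in c.co if level != axis)
        # Group per measure, as the OCL macro iterates over allMeasures().
        groups[(c.measure, co)].append(c.v)
    return {Cell(m, additivity(vs), co) for (m, co), vs in groups.items()}


# Hypothetical grain with two axes.
grain = {
    Cell("Sale::amount", 10, (("Product", "p1"), ("City", "c1"))),
    Cell("Sale::amount", 5, (("Product", "p2"), ("City", "c1"))),
    Cell("Sale::amount", 7, (("Product", "p1"), ("City", "c2"))),
}

# "grain dropping products", aggregated with sum.
c_rd = remove_dimension(grain, "Product", sum)
```

Only the input cube is read here, which reflects the observation in Section 3 that dimension removal needs no additional data cubes.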

4.2. Dimension addition

This generates new coordinates, which contain the entire axis. These coordinates are checked for empty data cells in orderto calculate valid aggregations.

Operation Definition : Dimension Additiondef Cube::addDimension(x: Axis, a: Additivity): Cube =

let cos = self->collect(c_i | c_i.co.including(x))->flatten() in

self->allMeasures()->collect(m |

let cs = grain->select(c_i | c_i.v.type = m) in

cos->select(co_i | cs->select(c_i | c_i.co.isRelated(co_i)))->collect(co_i | Tuple {

co = co_i,v = cs->select(c_i | c_i.co.isRelated(co_i))->collect(v)->a

}) )->flatten()

The data cube grain supports this definition. However, others may also be considered, e.g., for the sake of performance.This operation is thus implicitly parameterised by the supporting data cube.

Example Query : Cad = "Crd per product"

cs_cb->addDimension(Product, sum()) =

let cos = self->collect(c_i | c_i.co.including(Product))->flatten() in

self->allMeasures()->collect(m |

let cs = grain->select(c_i | c_i.v.type = m) in

cos->collect(co_i | Tuple {co = co_i,v = cs->select(c_i | c_i.co.isRelated(co_i))->collect(v)->sum()

}) )->flatten()

4.3. Slice and dice

Operation Definition : Slice & Dice

def Cube::sliceAndDice(c: Cell | d: Dice): Cube =

self->select(c | d)

This operation is implemented in OCL in a straightforward manner, by simply renaming the select operation of the OCL Collection data type (over which Cubes are supported).

Example Query : Csd = "Crd only with amounts over 5000 sold in Madrid"

cs_rd->sliceAndDice(c | c.v > 5000 and c.co.state = madrid) =

cs_rd->select(c | c.v > 5000 and c.co.state = madrid)

Parameter d may be bound to unrestricted (powerful) OCL predicates: from simple boolean operators such as and and or, to those which are as complex as necessary (e.g., all those included in third-party libraries).
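Since slice and dice is a plain selection, it can be mimicked by any filter that accepts an arbitrary predicate. The following Python fragment is a minimal sketch under assumed names: the Cell record, the dictionary-based coordinate, and the sample values are hypothetical and serve only to illustrate how the dice d is passed as a predicate.

```python
from typing import Callable, Dict, List, NamedTuple


class Cell(NamedTuple):
    co: Dict[str, str]  # assumed coordinate encoding: axis name -> member
    v: float            # value of the data cell


def slice_and_dice(cube: List[Cell], dice: Callable[[Cell], bool]) -> List[Cell]:
    """Keep only the cells for which the dice predicate holds (OCL select)."""
    return [c for c in cube if dice(c)]


# Hypothetical data cube standing in for cs_rd.
cs_rd = [
    Cell({"State": "Madrid", "Month": "May"}, 7200.0),
    Cell({"State": "Madrid", "Month": "June"}, 4100.0),
    Cell({"State": "Andalusia", "Month": "May"}, 9000.0),
]
cs_sd = slice_and_dice(cs_rd, lambda c: c.v > 5000 and c.co["State"] == "Madrid")
```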

4.4. Drill across

This is supported by some kind of semantic linkage between data cubes [39,16]. These inter-schema relationships are explicitly modelled in certain frameworks through the use of, for example, flows or correlations [2]. They may be abstracted in an OLAP algebra as a linkage predicate between data cells, which may be called conformity in the sense of [21]. According to [34], they may be defined as the dimensions shared between the fact classes involved, where conformity is the equality between coordinates. For example, sales and inventory data cells (Fig. 5) can be drilled across the coordinates that they share.

Operation Definition : Drill Across

def Cube::drillAcross(p: Conformity, cs: Cube): Cube =

self->collect(c_i | c_i.p(cs))->flatten()

Operation Definition : Dimension Sharing (sample conformity predicate)

def Cell::equals(c: Cube): Set(Cell) =

c->select(c_j | c_j.co.equals(self.co))

Example Query : Cdas = "inventory amount according to Csd"

let cs_inv = grain->dimensionalProject(Inventory) in

-- target inventory data cube

cs_sd->drillAcross(equals, cs_inv) =

cs_sd->collect(c_i | c_i.equals(cs_inv))->flatten()

where:

c_i.equals(cs_inv) =

cs_inv->select(c_j | c_j.co.equals(c_i.co))

This operation definition is fairly simple because its complexity lies in the construction of the target data cube (cs_inv in the example above), which is typically understood during OLAP analyses, when drilling across is requested. In order to build cs_inv, the dimensional projection is used, which is defined below.
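The interplay between the conformity predicate and the target data cube may be sketched as follows. This Python fragment is illustrative only: the Cell record, the coordinate encoding, and the sample sales and inventory cells are assumptions, and equality of coordinates is used as the sample conformity predicate.

```python
from typing import Callable, FrozenSet, List, NamedTuple, Tuple

Coordinate = FrozenSet[Tuple[str, str]]  # assumed encoding: (axis, member) pairs


class Cell(NamedTuple):
    co: Coordinate
    measure: str
    v: float


def equals(cell: Cell, target: List[Cell]) -> List[Cell]:
    """Sample conformity predicate: target cells that share the coordinate."""
    return [t for t in target if t.co == cell.co]


def drill_across(cube: List[Cell],
                 conformity: Callable[[Cell, List[Cell]], List[Cell]],
                 target: List[Cell]) -> List[Cell]:
    """Collect, for each cell of the cube, the conforming target cells."""
    return [t for c in cube for t in conformity(c, target)]


# Hypothetical sales and inventory cubes over the same axes.
sales = [
    Cell(frozenset({("City", "Madrid"), ("Product", "LAPTOP")}), "Sale::units", 2),
    Cell(frozenset({("City", "Granada"), ("Product", "CD")}), "Sale::units", 25),
]
inventory = [
    Cell(frozenset({("City", "Madrid"), ("Product", "LAPTOP")}), "Inventory::quantity", 10),
    Cell(frozenset({("City", "Seville"), ("Product", "SCREEN")}), "Inventory::quantity", 5),
]
cs_da = drill_across(sales, equals, inventory)
```

Only the inventory cell whose coordinate also appears in the sales cube is returned; other conformity predicates could be passed in place of equals.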

4.5. Dimensional projection

According to the data cube definition presented, data cells store only one value v (i.e., measure) in contrast to other definitions such as [2] in which several values are permitted.

We believe that the OLAP algebra studied is more elegantly formalised. It can also manage several values in a single data cube. However, the definition of data cells is simpler, and those of OLAP operations are therefore also simpler: its operations can manage each required measure in its own data cell, whereas several measures can continue to be managed in a whole data cube (they can store different measures). Therefore, this operation simply defines the actual measure to be included in a certain data cube.

This may be easily conceptualised in OCL by filtering data cells containing values for a specific measure.

Operation Definition : Dimensional Projection

def Cube::dimensionalProject(m: Measure): Cube =

self->select(c_i | c_i.v.type = m)

Example Query : Cdp = "sales quantity according to Csd"

cs_sd->dimensionalProject(Sale::quantity) =

cs_sd->select(c_i | c_i.v.type = Sale::quantity)

4.6. Roll up

Operation Definition : Roll Up

def Cube::rollUp(x: Axis, r: Rolling, a: Additivity): Cube =

let cos = self->collect(c_i | c_i.co.rollUp(x, r))->flatten()->asSet() in

self->allMeasures()->collect(m |

cos->collect(co_i | Tuple {co = co_i,v = self->select(c_i |

c_i.co.rollUp(x, r).equals(co_i))->collect(v)->a

}) )->flatten()

As with the addDimension operation, its definition is supported by a grain, but others may also be considered.

Example Query : Cru = "Cmp per years (rather than months)"

cs_mp->rollUp(Month, year, sum()) =

let cos = self->collect(c_i | c_i.co.rollUp(Month, year))->flatten()->asSet() in

self->allMeasures()->collect(m |

cos->collect(co_i | Tuple {co = co_i,v = self->select(c_i |

c_i.co.rollUp(Month, year)->includes(co_i))->collect(v)->sum()

}) )->flatten()

Parameter year is employed as the Rolling r argument, and is thus supported by the shared associations of Fig. 5 between aggregation level classes. Thus, year is the role that Year plays in the Month aggregation.
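Rolling up may be pictured as replacing each member of the rolled axis by its parent and then aggregating the cells that end up with the same coordinate. The Python fragment below is a minimal sketch of this idea; the Cell record, the YEAR_OF lookup (standing in for the year role of the Month aggregation), and the sample values are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Tuple

Coordinate = FrozenSet[Tuple[str, str]]  # assumed encoding: (axis, member) pairs


class Cell(NamedTuple):
    co: Coordinate
    measure: str
    v: float


def roll_up(cube: List[Cell], axis: str, parent_axis: str,
            parent_of: Dict[str, str],
            agg: Callable[[List[float]], float]) -> List[Cell]:
    """Replace members of `axis` by their parents and aggregate per coordinate."""
    groups = defaultdict(list)
    for c in cube:
        rolled = frozenset(
            (parent_axis, parent_of[m]) if a == axis else (a, m)
            for a, m in c.co
        )
        groups[(rolled, c.measure)].append(c.v)
    return [Cell(co, measure, agg(vs)) for (co, measure), vs in groups.items()]


# Hypothetical month-to-year association and monthly data cube.
YEAR_OF = {"05/2007": "2007", "06/2007": "2007", "11/2008": "2008"}
cs_mp = [
    Cell(frozenset({("Month", "05/2007")}), "amount", 100.0),
    Cell(frozenset({("Month", "06/2007")}), "amount", 40.0),
    Cell(frozenset({("Month", "11/2008")}), "amount", 500.0),
]
cs_ru = roll_up(cs_mp, "Month", "Year", YEAR_OF, sum)
```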

4.7. Drill down

Operation Definition : Drill Down

def Cube::drillDown(x: Axis, d: Drilling, a: Additivity): Cube =

let cos = self->collect(c_i | c_i.co.drillDown(x, d))->flatten()->asSet() in

self->allMeasures()->collect(m |

cos->collect(co_i | Tuple {co = co_i,v = grain->select(c_i | c_i.v.type = m)

->select(c_i | c_i.co.isRelated(co_i))->collect(v)->a

}) )->flatten()

Unlike rolling up, drill-down cannot calculate aggregations from the previous data cube. Its definition should therefore be supported by a grain.

Example Query : Cdd = "Cru per city (instead of state)"

cs_ru->drillDown(State, city, sum()) =

let cos = self->collect(c_i | c_i.co.drillDown(State, city))->flatten()->asSet() in

self->allMeasures()->collect(m |

cos->collect(co_i | Tuple {co = co_i,v = grain->select(c_i | c_i.v.type = m)

->select(c_i | c_i.co.isRelated(co_i))->collect(v)->sum()

}) )->flatten()
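The reliance of drill-down on the grain can be made explicit in a short sketch. In the following Python fragment, which is not part of the proposal, the Cell record, the CITIES_OF lookup, and the sample data are hypothetical, and isRelated is approximated by a subset test, which assumes that all other axes of the cube are already at grain level.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Tuple

Coordinate = FrozenSet[Tuple[str, str]]  # assumed encoding: (axis, member) pairs


class Cell(NamedTuple):
    co: Coordinate
    measure: str
    v: float


def drill_down(cube: List[Cell], grain: List[Cell], axis: str, child_axis: str,
               children_of: Dict[str, List[str]],
               agg: Callable[[List[float]], float]) -> List[Cell]:
    """Build finer coordinates from the cube and recompute values from the grain."""
    new_cos = set()
    for c in cube:
        fixed = frozenset((a, m) for a, m in c.co if a != axis)
        for a, m in c.co:
            if a == axis:
                for child in children_of[m]:
                    new_cos.add((fixed | {(child_axis, child)}, c.measure))
    result = []
    for co, measure in new_cos:
        # Approximation of isRelated: the grain coordinate contains the new one.
        values = [g.v for g in grain if g.measure == measure and co <= g.co]
        if values:
            result.append(Cell(co, measure, agg(values)))
    return result


# Hypothetical state-to-city association, grain, and rolled-up cube.
CITIES_OF = {"Andalusia": ["Seville", "Granada"]}
grain = [
    Cell(frozenset({("City", "Seville")}), "units", 1),
    Cell(frozenset({("City", "Granada")}), "units", 25),
    Cell(frozenset({("City", "Madrid")}), "units", 2),
]
cs_ru = [Cell(frozenset({("State", "Andalusia")}), "units", 26)]
cs_dd = drill_down(cs_ru, grain, "State", "City", CITIES_OF, sum)
```

The aggregated value 26 of the previous cube is never used: the finer values are recomputed from the grain, which is why the previous data cube alone does not suffice.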

4.8. Set-oriented operations

OCL defines primitive operations for the Collection data type. Since the data cubes studied are supported by Set (which specialises Collection), operations such as union, intersection, or difference may be directly applied to data cubes. Advanced set-oriented operations can also be derived from them.

Example Query : Cso = "Cdd also summarising sales per state (Cru)"

cs_dd->union(cs_ru)

The previous sub-sections have shown how to codify OLAP algebra operations based on the state-of-the-art backbone presented in [39]. The management of various advanced multidimensional properties is discussed as follows.

4.9. Additional issues

Inheritance. This relationship is commonly used in conceptual models in order to represent a categorisation hierarchy between aggregation levels [23], in contrast to their typical aggregation hierarchy, or even fact specialisation. This implies additional data with regard to the subtypes of the dimensional-data concepts. These data may similarly be managed by the OLAP algebra studied. For instance, the sliceAndDice operation may take advantage of these data by applying oclIsKindOf and oclIsTypeOf in order to identify the actual data type of each dimensional-data occurrence and then acting accordingly.

Push and pull. These operations deal with the fact-dimension dichotomy in dimensional-data models [39], converting current measures into dimensions (push), or conversely (pull). Conceptually speaking, the focus of analysis may alter between measures and dimensions. As [39] states, they could be codified in OCL by (i) drilling across some kind of semantic relationship between measures and dimensions, and (ii) changing coordinates for the target data cube.

Degenerate facts and dimensions. These dimensional-data concepts respectively model (a) interesting facts about other facts and some of their dimensions, or (b) dimensions which do not have an inner structure [21]. Since, from an operational point of view, they are actually facts and dimensions, OLAP algebras should deal with them as with any other fact or dimension. They are thus managed as their respective canonical counterparts and are rendered in class diagrams as mere classes with associations to both the related fact and the dimension class.

Data derivation. With regard to dimensional-data concepts such as measures and aggregation level descriptors, data derivation can be modelled in conceptual models in the same way as any other data model type. OLAP algebras should remain unchanged, since derivation rules act as semantic annotations. The retrieval engine rather than the OLAP algebra (its conceptualisation) is responsible for taking care of these metadata. Nevertheless, it is interesting to consider derived measures, since performance and additivity may suffer otherwise.

Additivity and complex aggregation functions. Common OLAP aggregation functions, such as count or sum, may be directly modelled in OCL by means of size or sum operations. However, it is also possible to define other (more complex) functions by reusing them. For instance, one widely-used OLAP aggregation function is avg, which may be codified in OCL as follows (see footnote 4):

def Set(V)::avg(): V =

self->iterate(i: V; acc: V = 0 | acc + i) / self->size()

Various aggregation functions are permitted for each data cube depending on the actual information needs (e.g., average amount of sales versus total sales) and the additivity constraints [30] that should hold (e.g., inventory quantity cannot be aggregated through time with meaningful results). It is therefore mandatory for decision makers to explicitly specify the desired aggregation function when applying OLAP operations.
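The reuse of simple operations to build avg, and the explicit choice of the aggregation function, may be sketched as follows. This Python fragment is illustrative only; the AGGREGATIONS table is a hypothetical device, and the guard against empty collections is an addition that the OCL definition above does not contain.

```python
from typing import Callable, Dict, Sequence


def avg(values: Sequence[float]) -> float:
    """Average built from an accumulated sum and the size of the collection."""
    if not values:
        raise ValueError("avg is undefined for an empty collection")
    acc = 0.0
    for i in values:  # counterpart of iterate(i; acc | acc + i)
        acc += i
    return acc / len(values)


# The aggregation function is always selected explicitly by name.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "count": len,
    "avg": avg,
}

units = [2, 2, 1, 25]  # Units values listed for cs_dpu in Table 2
results = {name: fn(units) for name, fn in AGGREGATIONS.items()}
```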

5. Code generation for OLAP queries

Our OLAP operators allow users to define multidimensional queries at the conceptual level in order to check that the multidimensional model agrees with decision makers’ information requirements. It should also be emphasised that conceptual multidimensional models in UML with OCL queries defined by using our OLAP operators can be directly translated into code for various final technology platforms.

5.1. OLAP translation

Several translations involving OLAP algebras exist (Fig. 6). This work explores coding an OLAP algebra with OCL. Unfortunately, most database engines that implement data warehouses only manage SQL. In order to provide compatibility with current developments, this mapping between OCL and SQL should be stated. In [1], the OLAP algebra that is stated as the state-of-the-art backbone in [39] is translated into SQL, thus proving that relational model expressiveness is at least equal to the expressiveness achieved by OLAP models. The translation of OCL into SQL is stated in [8], whereas OCL and relational calculus are compared in [25]. These two results theoretically enable the mapping of OLAP algebras codified in OCL into SQL.

Note that the notational extension of OCL carried out in this article does not alter the expressiveness of OCL, and thus does not interfere with the discussed mappings. The translation is therefore achieved in two steps:

1. Macro expansion, in which each notational extension is replaced by the piece of code that it encapsulates. Since macros are parameterised, this substitution also involves replacing each formal parameter with the (actual) argument of the macro invocation. Its semantics are thus equivalent to the generic semantics of all macro processors.

2. OCL translation. Pure OCL code remains from the previous step. It is therefore directly translated into SQL according to [8].
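The first of the two steps above, macro expansion, can be sketched as a purely textual substitution. The following Python fragment is an assumption-laden illustration rather than the implemented tool: the MACROS table, the regular expression, and the simplified bodies (for instance, sliceAndDice is reduced to select(d)) are hypothetical, and invocations with nested parentheses are not handled.

```python
import re

# Hypothetical macro table: operator name -> (formal parameters, OCL body).
MACROS = {
    "sliceAndDice": (["d"], "select(d)"),
    "dimensionalProject": (["m"], "select(c_i | c_i.v.type = m)"),
}

# Matches "->name(args)" where args contain no nested parentheses.
CALL = re.compile(r"->(\w+)\(([^()]*)\)")


def expand(expr):
    """Step 1: replace each macro invocation by its parameterised body."""
    def substitute(match):
        name, raw_args = match.group(1), match.group(2)
        if name not in MACROS:
            return match.group(0)  # plain OCL operation: leave untouched
        params, body = MACROS[name]
        args = [a.strip() for a in raw_args.split(",")]
        for param, arg in zip(params, args):
            # Whole-word replacement of the formal parameter by the argument.
            body = re.sub(rf"\b{param}\b", lambda _m, arg=arg: arg, body)
        return "->" + body
    return CALL.sub(substitute, expr)


expanded = expand("cs_sd->dimensionalProject(Sale::quantity)")
```

The expanded string contains only standard OCL operations, which is the precondition for the second step.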

Whereas macro expansion is a syntactic process, OCL translation involves a semantic mapping between class diagrams and relational models. In databases, the languages involved represent two different abstraction levels, namely conceptual and logical levels. Logical design deals with additional concerns such as technologies, or time and space constraints. Note that various logical variations are possible for a single conceptual model.

4 Set is a template class for parameter V, as is broadly used in OCL itself.

[Figure: diagram relating the OLAP algebra to OCL, SQL, MDX, set theory, and the relational model; edge labels: this work; Abelló et al., DOLAP'03; Codd, Commun. ACM'70; Demuth & Hussmann, UML'99; Mandel & Cengarle, FM'99.]

Fig. 6. Translations involving OLAP algebras in the literature.

Since this is the goal of this research, the solution of the semantic gap between conceptual and logical levels implies that, given a particular hosting platform and by manipulating the logical schemata, there is a correspondence at the logical level of all the dimensional properties conceptually defined. It is worth noting that, owing to technology specifics, the semantic gap is always attached to a particular platform, e.g., a relational model in a ROLAP system. Given the maturity of relational platforms in data warehousing, several studies assess the translation of dimensional structures into relations (tables). Two works [28,33] are of particular note since they are aligned to the principles of this work. Both evaluate these mappings by means of formal languages and integrate them into a comprehensive method for data-warehousing design based on model-driven technologies [4]: in [28], multidimensional models are translated into relational models according to a star schema configuration [21]. In [33], however, the previous mapping is complemented with the correspondences of multidimensional models with the metadata over which OLAP tools manage relational models. However, standardised mechanisms for managing OLAP metadata such as CWM [31] are not usually sufficiently expressive to deal with all multidimensional concepts. This problem can be solved through their extension, but incompatibilities then arise with compliant OLAP tools. Thus, according to this study, enabling conceptual queries helps both to validate conceptual models and to query them in a platform-independent manner.

5.2. Star schema and snowflake logical configurations

In order to illustrate this discussion, the query derivation for two of the most widely-known logical representations of a data warehouse is presented, namely star and snowflake schemata [21]. Given the class diagram of Fig. 5, both schemata may be obtained by applying equivalent mappings to those put forward in [28]. These are shown in Fig. 7. Whereas the snowflake schema maps every aggregation level to a different relation, the star schema collapses aggregation hierarchies into a unique denormalised dimension table (surrogate keys are not modelled here for the sake of simplicity).

Given these logical data schemata, the OLAP algebra studied may be translated into its relational calculus counterpart (see Fig. 1) rendered in SQL. This translation is carried out by means of the mappings discussed in [1]. Let us use the SQL templates employed for querying both kinds of logical configurations as an example:

-- star schema

SELECT DESCRIPTOR(d1.levelA_SK), ..., DESCRIPTOR(dN.levelZ_SK), AGGREGATION_FUNCTION(f.measure1), ...

FROM Fact f, Dimension1 d1, ..., DimensionN dN

WHERE f.FK1 = d1.PK AND ... AND f.FKN = dN.PK

AND di.descriptorA = value AND ...

GROUP BY d1.levelA_SK, ..., dN.levelZ_SK

-- snowflake schema

SELECT DESCRIPTOR(d1l1.SK), ..., DESCRIPTOR(d2l1.SK), ...,

AGGREGATION_FUNCTION(f.measure1), ...

FROM Fact f,

Dimension1Level1 d1l1, ..., Dimension1LevelN d1lN,

Dimension2Level1 d2l1, ..., Dimension2LevelM d2lM, ...

WHERE f.FK1 = d1l1.SK AND ... AND f.FKN = dNl1.SK

AND d1l1.roll_FK = d1l2.SK AND ... AND d1lN-1.roll_FK = d1lN.SK

AND d2l1.roll_FK = d2l2.SK AND ... AND d2lM-1.roll_FK = d2lM.SK

AND di.descriptorA = value AND ...

GROUP BY d1l1.SK, ..., d2l1.SK, ...
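To make the star schema template concrete, it may be instantiated on a small in-memory database. The following Python fragment uses the standard sqlite3 module; the table names (TimeDim, LocationDim, Sale), the column names, and the inserted rows are hypothetical and only loosely follow Fig. 7, and SQLite is chosen merely for convenience rather than as a target platform of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimeDim (pk INTEGER PRIMARY KEY, day_name TEXT, year_number INTEGER);
CREATE TABLE LocationDim (pk INTEGER PRIMARY KEY, city_name TEXT, state_name TEXT);
CREATE TABLE Sale (time_fk INTEGER, location_fk INTEGER, amount REAL);
INSERT INTO TimeDim VALUES (1, '15/05/2007', 2007), (2, '19/11/2008', 2008);
INSERT INTO LocationDim VALUES (1, 'Seville', 'Andalusia'), (2, 'Granada', 'Andalusia');
INSERT INTO Sale VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 30.0);
""")

# Instance of the star schema template: amount per state and year.
rows = conn.execute("""
SELECT d2.state_name, d1.year_number, SUM(f.amount)
FROM Sale f, TimeDim d1, LocationDim d2
WHERE f.time_fk = d1.pk AND f.location_fk = d2.pk
GROUP BY d2.state_name, d1.year_number
ORDER BY d1.year_number
""").fetchall()
conn.close()
```

Because the star schema keeps state and year in the denormalised dimension tables, the roll-up needs no additional joins; a snowflake variant would add one join per aggregation level traversed.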

[Figure: the Sale and Inventory fact tables (amount, quantity, units) in two configurations. STAR: denormalised dimension tables Location (city_name, state_name, state_code, country_name, country_code), Time (day_name, month_name, month_number, year_number), and Product (product_code, branch_code). SNOWFLAKE: one table per aggregation level (City, State, Country, Day, Month, Year, Product, Branch). Legend: table, column, primary key, foreign key, primary & foreign key.]

Fig. 7. Two well-known data schema configurations for logical design.

5.3. Development process

The architecture conceived for the code generation of OLAP algebras is presented in Fig. 8. Given a logical platform and a logical configuration for this platform, OLAP operations on conceptual models may be translated into their logical counterparts. This process can be supported by model-driven technologies [4] such as MDA, which has already been proven in the management of data schemata [28,34].

MDA conceives code generation as a sequence of model-to-model transformations that culminate in a final model-to-code transformation. Three abstraction levels are considered and related to the traditional phases of database design, namely: conceptual models as PIMs (platform-independent models), logical models as PSMs (platform-specific models), and code as physical models. These transformations are codified according to the MDA QVT and Mof2Text languages [31] by means of relations or mappings between data models. Given these, an MDA-compliant transformation engine can automatically generate the code, implementing both the data warehouse and its OLAP queries.

Unrestricted types of information requirements (see Fig. 8) in the PIM layer are used by UML class diagrams to codify data schemata (static part of conceptual OLAP data models). OCL does the same for the queries over them (their dynamic aspect). Obviously, OCL queries remain attached to data schemata, and various solutions such as those of [23,2] may be considered. It is worth noting that many information requirements usually include ad hoc OLAP queries that cannot be fully known in advance. In order to solve this problem, information requirements can be derived from decision makers’ goals in their organisations [27], which are considered to be sources of more stable requirements. In the PSM layer, CWM and SQL do the same as UML and OCL, respectively. In this layer, query transformations depend on both design rationale and the mapping of the data schema. The last layer contains code for a particular platform: Oracle, MySQL, etc. Our architecture thus allows designers to include platform-related techniques such as index or view selection [20,22,17] in order to optimise the final data warehouse implementation.

6. Implementation

Code generation has been implemented using the Eclipse development platform (see footnote 5). This is designed as a conglomerate of modules, called plug-ins, which enable developers to adapt it to new functionalities. In particular, Eclipse provides several modules for MDA: MDT, which implements UML and UML profiles; EMF, which implements MOF [31], and thus the definition of CWM models; medini QVT and SmartQVT, which respectively implement the declarative and imperative parts of QVT; ATL, which implements a transformation language compatible with QVT [19]; and MOFScript, which implements Mof2Text.

Fig. 9 shows the implementation of the conceptual model in Fig. 2, together with an example of an OLAP query using these technologies.

In order to prove that the OCL/OLAP algebra studied behaves as expected, it is evaluated over sample data sets and the correctness of the results is then verified. Following the examples in Section 4, a sample OLAP session containing all the defined OCL operations is as follows:

5 Site: http://www.eclipse.org.

Fig. 8. MDA for platform-independent OLAP queries.

Example Query : Sequence of operations during an OLAP session

let cs_sd = grain->sliceAndDice(c | c.co.state.country = spain) in

let cs_dpu = cs_sd ->dimensionalProject(Sale::units) in

let cs_dpq = cs_sd ->dimensionalProject(Inventory::quantity) in

let cs_da = cs_dpu->drillAcross(equals, cs_dpq) in

let cs_rmp = cs_dpu->removeDimension(Product, sum()) in

let cs_rus = cs_rmp->rollUp(City, state, sum()) in

let cs_ruc = cs_rus->rollUp(State, country, sum()) in

let cs_rum = cs_ruc->rollUp(Month, year, sum()) in

let cs_dd = cs_rum->drillDown(Country, state, sum()) in

let cs_ad = cs_dd->addDimension(Product, sum()) in

let cs_rmc = cs_dpu->removeDimension(City, sum()) in

let cs_rmd = cs_rmc->removeDimension(Day, sum()) in

let cs_u = cs_ad ->union(cs_rmd)

Table 2 shows data cells of some of the data cubes involved in this OLAP session (see footnote 6). Each table shows data cubes (first column) grouped by (i) their dimensions (in particular, Table 2 shows data cubes with at least one 3D data cell), and (ii) their data cell coordinates (second column, described by default descriptors of the aggregation level involved) and values (third column). The results returned by each OCL operation are those expected in all cases.
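The first steps of the session can be replayed on the grain rows listed in Table 2. The Python fragment below is a minimal sketch: the Cell record is hypothetical, and the COUNTRY lookup is an assumption, since Table 2 lists cities but not their countries.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    city: str
    day: str
    product: str
    v: float
    measure: str  # v.oclType() column of Table 2


# Grain rows as listed in Table 2.
grain = [
    Cell("Madrid", "17/12/2008", "LAPTOP", 10, "Quantity"),
    Cell("Madrid", "23/05/2007", "LAPTOP", 2, "Units"),
    Cell("Barcelona", "15/05/2007", "KEYBOARD", 2, "Units"),
    Cell("Seville", "15/05/2007", "KEYBOARD", 1, "Units"),
    Cell("Seville", "13/05/2007", "SCREEN", 5, "Quantity"),
    Cell("Granada", "15/05/2007", "CD", 25, "Units"),
    Cell("London", "19/11/2008", "SCREEN", 500, "Amount"),
]

# Assumed city-to-country lookup (not part of Table 2).
COUNTRY = {"Madrid": "Spain", "Barcelona": "Spain", "Seville": "Spain",
           "Granada": "Spain", "London": "United Kingdom"}

cs_sd = [c for c in grain if COUNTRY[c.city] == "Spain"]  # sliceAndDice
cs_dpu = [c for c in cs_sd if c.measure == "Units"]       # dimensionalProject
cs_dpq = [c for c in cs_sd if c.measure == "Quantity"]    # dimensionalProject
```

Under these assumptions the three lists coincide with the cs_sd, cs_dpu, and cs_dpq rows of Table 2.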

Moreover, Fig. 10 shows the implementation of the model transformations, such as that from conceptual to logical schemata (on the left-hand side) in ATL and from logical to physical schemata (on the right-hand side) in MOFScript. The logical configurations discussed are obtained in Fig. 11, which shows both the OLAP queries in SQL (upper part of figure) and the database schemata in CWM (lower part).

7. Related work

To the best of our knowledge, the main contribution towards the study of OLAP algebras is that of [39]. In [2], the same authors had previously presented a conceptual model by means of its two concerns, i.e., as a database schema together with an algebra for querying it. The first is mathematically formalised but also provides a UML extension. With regard to the algebra, one is provided for the OLAP operations, which is stated as being the backbone [39]. However, as with many other OLAP algebras such as [5], their mathematical scaffolding, mainly focused on the study of OLAP foundations, makes them difficult for developers to understand and manipulate, and they are not correctly integrated into conceptual schemata. It is also worth noting that [14] formalises a powerful SQL operator for data cube aggregation, but at the logical level.

With regard to the conceptual versus logical expressiveness of OLAP algebras, [2] provides a mapping from their mathematical formalisation into SQL, similar to that which is carried out with OCL in this article. In the same line, several works related to software engineering [25,8] present similar mappings between OCL and relational calculus. However, the necessity to provide visual artefacts for OLAP is also well-known [26]. With regard to OLAP algebras, this necessity has also been extended to the query definition by means of graphical languages with which to query data warehouses [37] or the direct representation of queries by marking conceptual schemata [6].

Interestingly, there is an analogy between the operations presented in an OLAP algebra and the data interactions identified in the discipline of visual analytics. Specifically, [40] discusses an information seeking mantra, which is widely-known within the community. This summarises all human-computer interaction in a reduced set of operations. These operations are always "overview first, zoom and filter, details on demand". As readers may note, this mantra equally characterises OLAP analyses, in which OLAP algebras define these operations. For example, overview, zooming, and detailing operations are translated into OLAP roll-up and drill-down, whereas filters signify slice and dice.

To sum up, this study extends our previous works on the model-driven development of data warehouses. This research commences with a proposal for a conceptual modelling framework [23] which supports the OLAP algebra herein. [28] formally presents the technical scaffolding for the code generation of database schemata. In [29], dimensional normal forms are technically articulated according to [28]. This is also extended in [34] in order to design the databases of an entire data-warehousing architecture, namely corporate data warehouses acting together with data marts. While these works are focused on demand-driven approaches, in [27], a hybrid approach is studied, i.e., integrating both information needs and operational data sources (the main two forces in data warehousing) into conceptual modelling. Code generation for OLAP tools based on relational databases is studied in [33], together with the database schemata derivation. Data mining is also studied at the conceptual level in [42].

6 See http://www.lucentia.es/index.php/OCL/OLAP_Algebra for a comprehensive account of the data cubes involved.

Fig. 9. Data-warehousing development platform: details of conceptual modelling.

Table 2. 3D data cubes involved in the sample OLAP session.

Cube    City       Day         Product   v    v.oclType()
grain   Madrid     17/12/2008  LAPTOP    10   Quantity
        Madrid     23/05/2007  LAPTOP    2    Units
        Barcelona  15/05/2007  KEYBOARD  2    Units
        Seville    15/05/2007  KEYBOARD  1    Units
        Seville    13/05/2007  SCREEN    5    Quantity
        Granada    15/05/2007  CD        25   Units
        London     19/11/2008  SCREEN    500  Amount
cs_sd   Madrid     17/12/2008  LAPTOP    10   Quantity
        Madrid     23/05/2007  LAPTOP    2    Units
        Barcelona  15/05/2007  KEYBOARD  2    Units
        Seville    15/05/2007  KEYBOARD  1    Units
        Seville    13/05/2007  SCREEN    5    Quantity
        Granada    15/05/2007  CD        25   Units
cs_dpu  Madrid     23/05/2007  LAPTOP    2    Units
        Barcelona  15/05/2007  KEYBOARD  2    Units
        Seville    15/05/2007  KEYBOARD  1    Units
        Granada    15/05/2007  CD        25   Units
cs_dpq  Madrid     17/12/2008  LAPTOP    10   Quantity
        Seville    13/05/2007  SCREEN    5    Quantity
cs_da   Madrid     17/12/2008  LAPTOP    10   Quantity

Fig. 11. Logical models generated for star and snowflake schemata.

Fig. 10. Model transformations for the generation of database schemata.

8. Conclusions

As a result of the current lack of support, OLAP queries are not defined as part of the conceptual multidimensional model but are added only after it has been implemented in the final platform. This is error-prone and makes the early validation of the development process difficult if the envisaged data warehouse is to satisfy the users’ information requirements. In this article, we address this issue by modelling an OCL extension which defines a set of OLAP operators that facilitate the definition of platform-independent OLAP queries as part of the multidimensional conceptual modelling of the data warehouse.

Our solution takes advantage of well-known standards of the software-engineering discipline, e.g., UML, OCL, or QVT, that enable an integrated solution for querying data warehouses. These standards can be used to permit both data warehouse schemata and OLAP queries to be automatically generated for various logical configurations such as common star and snowflake schemata in relational platforms.

The main benefits of the presented solution are: (i) it is the first approach to extend OCL in order to formally and understandably define OLAP queries at the conceptual level, (ii) this solution integrates query language and data schemata at the conceptual level, thus enabling us to check whether multidimensional models agree with users’ information needs during the early stages of development [7], and (iii) the code-generation scaffolding automatically manages and translates conceptual OLAP queries into their logical counterparts with regard to a specific platform, which is extremely useful for improving the performance of these queries by means of pre-aggregation [22,17,35].

Future work includes the enrichment of the algebra studied here with additional operations. In particular, this algebra contains operations concerning data manipulation. However, operators related to visual concerns such as pivoting and sorting may also be investigated. Moreover, additional heavier OCL notational extensions may be introduced in order to provide shorter symbols. With regard to the development process, more powerful model transformations may be investigated in order to introduce optimisation techniques with regard to the different platforms available (query optimisers, physical structures such as indexes, partitions, etc.). It would also be interesting to enrich requirement analysis by capturing the most frequent user needs codified in OCL, to use the dimensional normal forms [29] to assure the correctness of conceptual models related to their logical design, and to assess additivity constraints [15] during conceptual modelling.

References

[1] A. Abelló, J. Samos, F. Saltor, Implementing operations to navigate semantic star schemas, in: International Workshop on Data Warehousing and OLAP(DOLAP), 2003, pp. 56–62.

[2] A. Abelló, J. Samos, F. Saltor, YAM2: a multidimensional conceptual model extending UML, Inf. Syst. 31 (6) (2006) 541–567.[3] F. Abreu, Using OCL to Formalize Object Oriented Metrics Definitions, INESC, Software Engineering Group ES007/2001, Version 0.9, May 2001.[4] J. Bézivin, Model driven engineering: an emerging technical space, in: Generative and Transformational Techniques in Software Engineering (GTTSE),

2006, pp. 36–64.[5] M. Blaschka, C. Sapia, G. Höfling, B. Dinter, Finding your way through multidimensional data models, in: International Conference on Database and

Expert Systems Applications (DEXA) Workshop, 1998, pp. 198–203.[6] L. Cabibbo, R. Torlone, From a procedural to a visual query language for OLAP, in: International Conference on Scientific and Statistical Database

Management (SSDBM), 1998, pp. 74–83.[7] J. Cabot, E. Teniente, Incremental integrity checking of UML/OCL conceptual schemas, J. Syst. Software 82 (9) (2009) 1459–1478.[8] B. Demuth, H. Hußmann, Using UML/OCL constraints for relational database design, in: International Conference on UML, 1999, pp. 598–613.[9] D. Embley, D. Barry, S. Woodfield, Object-Oriented Systems Analysis, A Model-Driven Approach, Youdon Press Computing Series, 1992.

[10] J. Evermann, Y. Wand, Ontology based object-oriented domain modelling: fundamental concepts, Requir. Eng. 10 (2) (2005) 146–160.
[11] J. Evermann, Y. Wand, Ontology based object-oriented domain modeling: representing behavior, J. Database Manag. 20 (1) (2009) 48–77.
[12] E. Fernández-Medina, M. Piattini, Extending OCL for secure database development, in: International Conference on UML, 2004, pp. 380–394.
[13] L. Fuentes-Fernández, A. Vallecillo-Moreno, An introduction to UML profiles, Eur. J. Inform. Prof. 5 (2) (2004) 5–13.
[14] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, H. Pirahesh, Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals, Data Min. Knowl. Disc. 1 (1) (1997) 29–53.
[15] J. Horner, I.-Y. Song, P.P. Chen, An analysis of additivity in OLAP systems, in: International Workshop on Data Warehousing and OLAP (DOLAP), 2004, pp. 83–91.
[16] S.-M. Huang, T.-H. Chou, J.-L. Seng, Data warehouse enhancement: a semantic cube model approach, Inform. Sci. 177 (11) (2007) 2238–2254.
[17] M.-C. Hung, M.-L. Huang, D.-L. Yang, N.-L. Hsueh, Efficient approaches for materialized views selection in a data warehouse, Inform. Sci. 177 (6) (2007) 1333–1348.
[18] B. Hüsemann, J. Lechtenbörger, G. Vossen, Conceptual data warehouse modeling, in: International Workshop on Design and Management of Data Warehouses (DMDW), 2000, p. 6.
[19] F. Jouault, I. Kurtev, On the architectural alignment of ATL and QVT, in: ACM Symposium on Applied Computing (SAC), 2006, pp. 1188–1195.
[20] K.-F. Kao, I.-E. Liao, An index selection method without repeated optimizer estimations, Inform. Sci. 179 (13) (2009) 2263–2272.
[21] R. Kimball, M. Ross, The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, Wiley, 2002.
[22] Y.-C. Liu, P.-Y. Hsu, G.-J. Sheen, S. Ku, K.-W. Chang, Simultaneous determination of view selection and update policy with stochastic query and response time constraints, Inform. Sci. 178 (18) (2008) 3491–3509.
[23] S. Luján-Mora, J. Trujillo, I.-Y. Song, A UML profile for multidimensional modeling in data warehouses, Data Knowl. Eng. 59 (3) (2006) 725–769.
[24] E. Malinowski, E. Zimányi, Hierarchies in a multidimensional model: from conceptual modeling to logical representation, Data Knowl. Eng. 59 (2) (2006) 348–377.
[25] L. Mandel, M.V. Cengarle, On the expressive power of OCL, in: World Congress on Formal Methods (FM), 1999, pp. 854–874.
[26] A.S. Maniatis, P. Vassiliadis, S. Skiadopoulos, Y. Vassiliou, Advanced visualization for OLAP, in: International Workshop on Data Warehousing and OLAP (DOLAP), 2003, pp. 9–16.
[27] J.-N. Mazón, J. Pardillo, J. Trujillo, A model-driven goal-oriented requirement engineering approach for data warehouses, in: International Conference on Conceptual Modelling (ER) Workshops, 2007, pp. 255–264.
[28] J.-N. Mazón, J. Trujillo, An MDA approach for the development of data warehouses, Decis. Support Syst. 45 (1) (2008) 41–58.
[29] J.-N. Mazón, J. Trujillo, J. Lechtenbörger, Reconciling requirement-driven data warehouses with data sources via multidimensional normal forms, Data Knowl. Eng. 63 (3) (2007) 725–751.
[30] J.-N. Mazón, J. Lechtenbörger, J. Trujillo, A survey on summarizability issues in multidimensional modeling, Data Knowl. Eng. 68 (12) (2009) 1452–1469.
[31] Object Management Group, Specifications: Model Driven Architecture (MDA) v.1.0.1, Object Constraint Language (OCL) v2.0, Unified Modelling Language (UML) v.2.2b1, October 2009. <http://www.omg.org>.
[32] J. Pardillo, J.-N. Mazón, J. Trujillo, Bridging the semantic gap in OLAP models: platform-independent queries, in: International Workshop on Data Warehousing and OLAP (DOLAP), 2008, pp. 89–96.
[33] J. Pardillo, J.-N. Mazón, J. Trujillo, Model-driven metadata for OLAP cubes from the conceptual modelling of data warehouses, in: International Conference on Data Warehousing and Knowledge Discovery (DaWaK), 2008, pp. 13–22.
[34] J. Pardillo, J. Trujillo, Integrated model-driven development of goal-oriented data warehouses and data marts, in: International Conference on Conceptual Modelling (ER), 2008, pp. 426–439.
[35] T.B. Pedersen, C.S. Jensen, C.E. Dyreson, Extending practical pre-aggregation in on-line analytical processing, in: International Conference on Very Large Data Bases (VLDB), 1999, pp. 663–674.
[36] N. Prat, J. Akoka, I. Comyn-Wattiau, A UML-based data warehouse design method, Decis. Support Syst. 42 (3) (2006) 1449–1473.
[37] F. Ravat, O. Teste, R. Tournier, G. Zurfluh, Graphical querying of multidimensional databases, in: East-European Conference on Advances in Databases and Information Systems (ADBIS), 2007, pp. 298–313.
[38] S. Rizzi, A. Abelló, J. Lechtenbörger, J. Trujillo, Research in data warehouse modeling and design: dead or alive? in: International Workshop on Data Warehousing and OLAP (DOLAP), 2006, pp. 3–10.
[39] O. Romero, A. Abelló, On the need of a reference algebra for OLAP, in: International Conference on Data Warehousing and Knowledge Discovery (DaWaK), 2007, pp. 99–110.
[40] B. Shneiderman, The eyes have it: a task by data type taxonomy for information visualizations, in: IEEE Symposium on Visual Languages (VL), 1996, pp. 336–343.
[41] J. Trujillo, M. Palomar, J. Gómez, I.-Y. Song, Designing Data Warehouses with OO Conceptual Models, IEEE Comput. 34 (12) (2001) 66–75.
[42] J. Zubcoff, J. Pardillo, J. Trujillo, A UML profile for the conceptual modelling of data-mining with time-series in data warehouses, Inf. Software Technol. 51 (6) (2009) 977–992.