Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers...

10
Data-Centric Systems and Applications Mashups Florian Daniel Maristella Matera Concepts, Models and Architectures Chapter 2 Data and Application Integration Figures

Transcript of Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers...

Page 1: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

1

Daniel · Matera

Data-Centric Systems and ApplicationsMichael J. Carey · Stefano Ceri Series Editors

Florian Daniel · Maristella Matera

Mashups Concepts, Models and Architectures

Data-Centric Systems and Applications

MashupsFlorian DanielMaristella Matera

Computer Science

DCSA

Mashups Concepts, Models

and Architectures

Mashups have emerged as an innovative soft ware trend that re-interprets existing Web building blocks and leverages the composition of individual components in novel, value-adding ways. Additional appeal also derives from their potential to turn non-programmers into developers.

Daniel and Matera have written the fi rst comprehensive reference work for mashups. Th ey systematically cover the main concepts and techniques underlying mashup de-sign and development, the synergies among the models involved at diff erent levels of abstraction, and the way models materialize into composition paradigms and archi-tectures of corresponding development tools. Th e book deliberately takes a balanced approach, combining a scientifi c perspective on the topic with an in-depth view on relevant technologies. To this end, the fi rst part of the book introduces the theoretical and technological foundations for designing and developing mashups, as well as for designing tools that can aid mashup development. Th e second part then focuses more specifi cally on various aspects of mashups. It discusses a set of core component tech-nologies, core approaches, and architectural patterns, with a particular emphasis on tool-aided mashup development exploiting modeldriven architectures. Development processes for mashups are also discussed, and special attention is paid to composition paradigms for the end-user development of mashups and quality issues.

Overall, the book is of interest to a wide range of readers. Students, lecturers, and researchers will fi nd a comprehensive overview of core concepts and technological foundations for mashup implementation and composition. Even without low-level coding details, practitioners like soft ware architects will fi nd guidance on key imple-mentation concepts, architectural patterns, and development tools and approaches. A related website provides additional teaching material which can be used either as part of a course or for self study.

This book is timely, provides a through scientific investigation and also has prac-tical relevance in the general area of composition and mashups. It is of particular interest to researchers and professionals wishing to learn about relevant concepts and techniques in service mashups, composition, and end-user programming. From the Preface by Boualem Benatallah, University of New South Wales, Sydney

9 7 8 3 6 4 2 5 5 0 4 8 5

ISBN 978-3-642-55048-5

Chapter 2 Data and Application Integration Figures

Page 2: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

2.3 Data Integration 21

Virtual integrationMaterialized integration

Data Source

Wrapper(local schema)

Wrapper(local schema)

Wrapper(local schema)

Mediator(Global Schema)

…  ...

…  ...

query answer

Data Source Data Source

Integrated Information System

query answer

Data Warehouse

ETL

Data Source Data Source Data Source

load load load

Integrated Information System

load

query

query

answer

answer

Fig. 2.1 Typical architectures for materialized and virtual data integration [3].

The peculiarity of data wharehouses with respect to other integration so-lutions is that the integrated data set is materialized and stored in a singlerepository; therefore, query answering generally takes a short time. However,several problems relate to data freshness. In fact, as represented in Figure2.1, data are extracted from the data sources, transformed and loaded in thewarehouse prior to query time, for example at regular intervals, while globalqueries are instead posed directly to the warehouse. This means that, in anenvironment where data at the origin change very frequently, ETL opera-tions would have to be performed with the same frequency of the data sourceupdates, and complexity would hence explode. For this reason, the adoptionof data warehouses is recommended when complex, ad-hoc data analysis andmining of historical data have to be performed, while such systems are notappropriate when frequent access to fresh data is required.

Loosely coupled solutions have then been proposed to provide unified in-terfaces to real-time distributed data. These architectures support a virtualdata integration : as represented in Figure 2.1, data remain at the origi-nal sources and, every time a query is posed to the integrated system, theinformation is extracted directly from the data sources. This solution is par-ticularly appropriate when autonomous and very heterogeneous sources, asthe ones available on the Web, have to be integrated, and also when theintegrated system is characterized by “transient” data integration tasks, im-plying the dynamic access to fresh data and resulting into integrated datasets that are useful and valid for a short period of time only [106].

It is evident that virtual approaches fit better the data integration re-quirements generally occurring in mashup development. For this reason, inthe rest of this section we discuss in more details this class of data integrationsolutions.

Page 3: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

2.3 Data Integration 23

Source3

CUSTOMER(CustomerID, Name, Surname, Address)

Source1

ORDER(OrderID, SenderID, DestinationAddress, Description, PayerID, PickUpDate, PickUpTime, Weight, Size)

CRM data on customersShipment orders

Integrated Order DB

SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress, DestinationAddress, PickUpDate, PickUpTime)

Data on shipments (including pricing conditions)

Source2

Mediator (virtual global schema)

Data sources (local schemas)

Pricing data

PRICING_DATA(CustomerID, Discount, Taxes)

BILLING(OrderID, CustomerID, Weight, Size, Discount, Taxes)

Fig. 2.2 Example of an integrated database storing shipment data extracted fromdi↵erent data sources. Each data source is characterized by a local schema. Data inte-gration is performed according to a vritual global schema managed by the mediator.

2.3.2.1 Global-As-View (GAV) mapping

In the Global-As-View (GAV) mapping, the content of each relation inthe mediated schema is specified through a query, i.e., a view, over a combi-nation of the source local schemas. Figure 2.3(a) graphically illustrates thedefinition, according to a GAV mapping, for the BILLING relation of ourreference example. We do not express the view definition in a query lan-guage. However, intuitively, the figure describes how the BILLING tuples inthe global database can be built by combining data extracted from tuplesin the three sources: the attributes OrderID, CustomerID, Weight andSize come from a tuple in the ORDER relation in Source1; CustomerNameand CustomerSurname from a tuple of the CUSTOMER relation in Source2with the same CustomerID; and Discount and Taxes from a tuple in thePRICING relation Source3 having the same CustomerID. Similar mappingscan also be specified for defining the SHIPMENT global relation.

GAV is conceptually simple [106]. Indeed, the defined views express exactlyhow the elements of the global schemas have to be instantiated using the datao↵ered by the data sources. The execution of queries thus simply requiresunfolding the global queries, taking into account the view definitions, andissuing the resulting queries to the original data sources. Nevertheless, thisapproach works well only for stable data sources. In the case of modificationsto the data sources (e.g., a local schema changes or a data source has to beadded or deleted) the global schema needs to be modified as well.

Page 4: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

24 2 Data and Application Integration

Source1.ORDER OrderID SenderID DestinationAddress PickUpDate PickUpTime Weight Size

Source2.CUSTOMER CustomerID Name Surname Address

Local sources Mediator

a) GAV Mapping for the global relation BILLING. The global relation is defined as a view on the local source relations.

b) LAV Mapping for Source2 and Source3. The local source relations are defined as views over the global relations.

Source3.PRICING CustomerID Discount Taxes

BILLING OrderID CustomerID CustomerName CustomerSurname Weight Size Discount Taxes

SHIPMENT OrderID SenderID SenderAddress DestinationAddress PickUpDate PickUpTimeSource2.CUSTOMER

CustomerID Name Surname Address

Local sources Mediator

Source3.PRICING CustomerID Discount Taxes

BILLING OrderID CustomerID CustomerName CustomerSurname Weight Size Discount Taxes

Join

Join

Join

Fig. 2.3 Example of GAV and LAV schema mappings for the integrated order DB.

2.3.2.2 Local-As-View (LAV) mapping

With respect to GAV, the Local-As-View (LAV) mapping privileges morethe independence of the local sources with respect to the global schema defi-nition, thus it tries to overcome the intrinsic rigidity of GAV which makes itdi�cult to modify the local schemas or to add a new data source. In LAV, en-tities in the original sources are mapped to the mediated schema. This meansthat the mediated schema is designed independently of the actual sources and,instead of specifying how to compute tuples of the mediated schema like inGAV, each data source is described as a view over the mediated schema.

For instance, in our reference example, a LAV mapping (see Figure 2.3)would express that the attributes of the CUSTOMER relation in Source2correspond to the CustomerID, CustomerName and CustomerSurnameattributes of the BILLING global relation; by means of a join on theCustomerID attribute, the Address attribute is then mapped on the

Page 5: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

30 2 Data and Application Integration

Machine 1 Machine 2 Machine 3

Local OS 1

Network

Distributed application

Distributed system layer (middleware)

Local OS 2 Local OS 3

Fig. 2.4 The basic architecture of a distributed system and application.

the reason for which there is the need for an ingredient in the developmentof distributed applications that is not needed in the development of conven-tional, non-distributed applications, i.e., middleware.

Middleware is software that enables and mediates network heterogene-ity and interactions among the pieces of a distributed application. That is,middleware provides a distributed system layer that masks hardware and op-erating system heterogeneity and provides an abstract view on the capabilitiesof the distributed system. This abstract view is provided to the distributedapplications running on top, so as to enable and facilitate their development.As such, the middleware services are typically used by developers who pro-gram distributed applications and less by non-programmers like the usersof the distributed applications. Middleware therefore represents, on the onehand, a computing infrastructure upon which distributed applications canbe developed and, on the other hand, a programming abstraction that easesdevelopment.

Depending on the kind of abstraction and development support they pro-vide, we distinguish four core types of middleware [13]:

• RPC-based middleware : The use of remote procedure calls (RPCs)represents the most basic technique to implement a distributed systemand the foundation for many of today’s middleware solutions. RPC-basedmiddleware provides the necessary infrastructure to transform procedurecalls into remote calls that are able to invoke procedures running on adi↵erent machine as if they were running locally. The problem with proce-dure calls, independently of whether local or remote, is that they typicallyimplement a blocking invocation model, which requires the invoking en-tity to wait till the invoked entity has finished processing. XML-RPC [278]and the Simple Object Access Protocol (SOAP [47]) are examples of RPCmiddleware.

• Object brokers: Object brokers address another issue of RPC-based mid-dleware, i.e., their intrinsic focus on imperative, procedural programminglanguages. With the advent of object-oriented programming, RPCs be-came apparently obsolete, and the focus of middleware moved from RPCs

Page 6: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

32 2 Data and Application Integration

formation operations may plugged-in in the form of external, custom appli-cation logic, providing for extensibility. The configuration and operation ofthe whole message broker can be managed via a dedicated administration in-terface. Examples of message brokers are IBM’s WebSphere Message Broker[153] and today’s enterprise service buses (ESBs).

2.4.2 Workflow management systems

Alonso et al. [13] correctly point out that the key contribution of messagebrokers to application integration is their capability to hide the heterogeneityand distribution of the information systems to integrate and to provide auniform view on their data and functionalities. Workflow managementsystems (WfMSs) provide support for the last ingredient, i.e., the definitionof integration logic.

The initial application area of WfMSs was o�ce automation, whose pur-pose is to automate the coordination of repetitive, administrative processesinvolving multiple human actors and electronic documents. Soon, it becameevident that the automation support provided by WfMSs could be lever-aged to coordinate also software systems, not only human actors. Nowadays,WfMSs (or business process management systems, as they are called moreprominently today) are very flexible systems that enable the seamless coor-dination of software and human actors, typically starting from a graphicalmodel that expresses the necessary integration logic. Most systems also pro-vide support for advanced coordination features, such as transactions, ex-ception handling, recovery and compensation, events, deadline management,and similar.

sender pays

Get shipment details

Calculate route

Compute cost

Payment Delivery

Delivery Payment

receiver pays

Shipment planning application

Customer relationship management system Billing application

Payment application

Delivery app

Delivery appPayment

application

Fig. 2.5 A simplified workflow model of a possible integration logic for the logisticsapplication integration scenario. The model shows how the integration logic coordi-nates the interaction with existing information systems.

Page 7: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

2.4 Application Integration 35

ness processes and agile applications that span organizations and platformscaused a re-thinking of some of the traditional development practices and theemergence of a new paradigm. This new paradigm is called service-orientedcomputing (SOC), and it is a computing paradigm that uses Web servicesas building blocks for the engineering of composite, distributed applicationsout of the reusable application logic encapsulated by Web services [46].

SOC builds on two ingredients: an ecosystem of readily reusable Web ser-vices and a development paradigm based on composition as core abstraction.We discuss both in the following.

2.4.4.1 Service-oriented architecture (SOA)

The architecture of the Web service ecosystem is commonly called a service-oriented architecture (SOA), which is a logical architecture for the designof software systems that provide services to either end-user applications orto other services distributed in a network, via published and discoverableinterfaces [224].

The SOA can be articulated into three roles and two artifacts (see Figure2.6): A web service (the first artifact) is implemented and made publiclyaccessibly by a so-called provider (or service provider). In order to advertisethe web service and enable its potential clients to be aware of the existence ofthe service, the provider publishes a service descriptor (the second artifact),which describes the purpose and features of the service, where to access it (aURI), and how to access it (e.g., using which communication protocol). Thisadvertisement is supported by a dedicated registry, which aggregates servicedescriptors from multiple providers and allows potential consumers to searchfor descriptors. If a consumer wants to use some external functionality, it canquery the registry for suitable web services, obtaining as response a servicedescriptor (or a set thereof). Following the instructions on how to invoke theservice contained in the retrieved descriptor, the consumer can bind to theconcrete service and use it.

Provider

Consumer

Registry

publishes descriptor

searches for service

binds to and

uses service

Web service

Service descriptor retrieves descriptor

Fig. 2.6 The service-oriented architecture (SOA) with its roles and artifacts.

Page 8: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

2.4 Application Integration 37

Provider 1 Consumer 2 Provider 2Consumer 1

Local OS 1

The Web

Local OS 2 Local OS 3

App. logic

Data

Present. App. logic

WS1WS2

WS3

Web service compositionService-based application

Local OS 4

Web service middleware, SOAP, HTTP, TCP/IP

App. logic

WS4

WS5App. logic

Process engine

Fig. 2.7 The distributed computing environment enabled by web services as aninstance of the generic architecture of distributed systems (see Figure 2.4).

according programming libraries and/or Web service runtime environments(e.g., so-called runtime containers). A process/service orchestration engine(e.g., Apache ODE), an evolution and adaptation of WfMSs to Web services,provides for the actual composition and facilitates the abstract definition ofintegration logic and its execution. Pure service compositions may be builtexclusively using a process engine, service-based applications usually also relyon hand-coded integration logics. Of course, the use of a process engine isnot mandatory in any case, but its composition-specific abstractions and run-time support significantly eases the composition task compared to coding theintegration logic with common programming languages.

Process engines are instructed with an abstract specification of the process(the Web services composition). An important role in service composition istherefore played by the logical integration languages used to express this spec-ification. In this respect, the standard language for Web service compositiontoday is the Web Services Business Process Execution Language (WS-BPEL[163]), an XML dialect that is specifically tailored to the needs of Web ser-vice composition and that enjoys large support by all major process enginevendors.

Figure 2.8 provides an intuitive feeling of the syntax of WS-BPEL anda functional view on the main composition features. The key aspects of aprocess specification are the definition of the partner links, which define theWeb services to communicate with, the variables used to store intermediateresults and messages, and the structured set of activities, which implementthe actual integration logic. The language comprises activities for the interac-tion with Web services (receive, reply, invoke, for storing data into variables(assign), for common control flow specifications (sequence, if, while, etc.), aswell as for exception and event handling.

Page 9: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

38 2 Data and Application Integration

<process name="NCName" targetNamespace="anyURI" queryLanguage="anyURI"? expressionLanguage="anyURI"? suppressJoinFailure="yes|no"? exitOnStandardFault="yes|no"? xmlns="http://docs.oasis-open.org/wsbpel/2.0/ process/executable"> <extensions>? ... </extensions> <import namespace="anyURI"? location="anyURI"? importType="anyURI" />* <partnerLinks>? <!-- Note: At least one role must be specified. --> ... </partnerLinks> <messageExchanges>? ... </messageExchanges> <variables>? ... </variables> <correlationSets>? ... </correlationSets> <faultHandlers>? <!-- Note: There must be at least one faultHandler --> ... </faultHandlers> <eventHandlers>? ... </eventHandlers>

activity*

</process>

The header of the process definition, among

others, specifies the languages used to query

XML data and define expressions (e.g., used

by the <condition> and <while> activities).

Partner links model relationships between

partner processes for peer-to-peer

communications. Partner links have a type

and an endpoint reference.

Extensions enable the definition of new

attributes, new elements, operation

extensions, etc.

Enables declaring dependencies on external

XML Schema or WSDL definitions.

Message exchanges associate outbound

messages with inbound messages.

Variables hold messages or data and capture

the state of the process.

Correlation sets are sets of properties shared

by all messages of a conversation.

Fault handlers allow the definition of fault

handling activities.

Event handlers enable reacting to external

events or alarms.

Activities specify the process logic.

Fig. 2.8 The basic structure of a service composition (a process) in BPEL [163].

For example, if we wanted to implement the shipment management pro-cess of Figure 2.5 as a Web service composition (assuming that all involvedsystems are equipped with a suitable Web service), the corresponding WS-BPEL logic could look as follows: The Get shipment details task is initiatedby the operator of the logistics company through an application that sendsa message to the process. That is, the task starts with receiving a message.Then the task invokes the CRM system to confirm sender and receiver data,e.g., via a synchronous request-response operation. The received shipmentdetails are assigned to a variable, which can be used to forward them tothe shipment planning application via the invocation of the respective Webservice, which corresponds to the execution of the Calculate route task. Thecalculated route details can be assigned to another variable, which can beused as input for the Compute cost task. Doing so requires invoking thebilling application with attached the route details. And so on and so forth.

BPEL is a standard language for the composition of SOAP web services.In [226] and [227], Pautasso shows examples of how to compose RESTful webservices by extending JOpera and BPEL, respectively.

Page 10: Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers Integrated Order DB SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress,

© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.

2.5 Cloud Computing 41

comes for the cost of the resources consumed by the virtualization itself. Yet,practice has shown that the benefits largely outweigh this additional e↵ort.

2.5.2 Cloud architectures

In line with the above virtualization strategies, cloud architectures can bestructured into di↵erent logical layers, each providing a di↵erent type of re-source as a service and, hence di↵erent abstractions. We illustrate these layersin Figure 2.9 and discuss the three bottom-most layers, i.e., the technologicalones (for an explanation of the HaaS layer, we refer the reader to [31] or[149]):

• Infrastructure as a Service (IaaS): IaaS provides computing infras-tructure as a service, that is, it provides an abstract view on hardware(e.g., computers, CPUs, storage devices, memory, etc.) and allows theuser to run own images of operating systems on top of the virtualizedhardware. The management interface provided by IaaS providers typicallyallows their users to dynamically define the required hardware characteris-tics and to start and stop operating system instances (e.g., to save money,as running an instance costs money). The target user of IaaS is the systemadministrator.

• Platform as a Service (PaaS): PaaS usually provides developmentenvironments or runtime environments for software development and soft-

Cloud

Software as a Service (SaaS)

Applications

Application services

Platform as a Service (PaaS)

Integrated development environment

Runtime environment

Infrastructure as a Service (IaaS)

Infrastructure services

Resource set

Virtual resource set

Physical resource set

Human as a Service (HaaS)

Crowdsourcing

SaaSPaaSIaaS

HaaS

Fig. 2.9 A simplified cloud architecture stack with IaaS, PaaS, SaaS, and HaaS(adapted from [172]).