Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers...
Transcript of Daniel and Matera have written the Integration Mashups · Shipm ent rd s CRM data o n Pcustomers...
1
Daniel · Matera
Data-Centric Systems and ApplicationsMichael J. Carey · Stefano Ceri Series Editors
Florian Daniel · Maristella Matera
Mashups Concepts, Models and Architectures
Data-Centric Systems and Applications
MashupsFlorian DanielMaristella Matera
Computer Science
DCSA
Mashups Concepts, Models
and Architectures
Mashups have emerged as an innovative soft ware trend that re-interprets existing Web building blocks and leverages the composition of individual components in novel, value-adding ways. Additional appeal also derives from their potential to turn non-programmers into developers.
Daniel and Matera have written the fi rst comprehensive reference work for mashups. Th ey systematically cover the main concepts and techniques underlying mashup de-sign and development, the synergies among the models involved at diff erent levels of abstraction, and the way models materialize into composition paradigms and archi-tectures of corresponding development tools. Th e book deliberately takes a balanced approach, combining a scientifi c perspective on the topic with an in-depth view on relevant technologies. To this end, the fi rst part of the book introduces the theoretical and technological foundations for designing and developing mashups, as well as for designing tools that can aid mashup development. Th e second part then focuses more specifi cally on various aspects of mashups. It discusses a set of core component tech-nologies, core approaches, and architectural patterns, with a particular emphasis on tool-aided mashup development exploiting modeldriven architectures. Development processes for mashups are also discussed, and special attention is paid to composition paradigms for the end-user development of mashups and quality issues.
Overall, the book is of interest to a wide range of readers. Students, lecturers, and researchers will fi nd a comprehensive overview of core concepts and technological foundations for mashup implementation and composition. Even without low-level coding details, practitioners like soft ware architects will fi nd guidance on key imple-mentation concepts, architectural patterns, and development tools and approaches. A related website provides additional teaching material which can be used either as part of a course or for self study.
This book is timely, provides a through scientific investigation and also has prac-tical relevance in the general area of composition and mashups. It is of particular interest to researchers and professionals wishing to learn about relevant concepts and techniques in service mashups, composition, and end-user programming. From the Preface by Boualem Benatallah, University of New South Wales, Sydney
9 7 8 3 6 4 2 5 5 0 4 8 5
ISBN 978-3-642-55048-5
Chapter 2 Data and Application Integration Figures
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
2.3 Data Integration 21
Virtual integrationMaterialized integration
Data Source
Wrapper(local schema)
Wrapper(local schema)
Wrapper(local schema)
Mediator(Global Schema)
… ...
… ...
query answer
Data Source Data Source
Integrated Information System
query answer
Data Warehouse
ETL
Data Source Data Source Data Source
load load load
Integrated Information System
load
query
query
answer
answer
Fig. 2.1 Typical architectures for materialized and virtual data integration [3].
The peculiarity of data wharehouses with respect to other integration so-lutions is that the integrated data set is materialized and stored in a singlerepository; therefore, query answering generally takes a short time. However,several problems relate to data freshness. In fact, as represented in Figure2.1, data are extracted from the data sources, transformed and loaded in thewarehouse prior to query time, for example at regular intervals, while globalqueries are instead posed directly to the warehouse. This means that, in anenvironment where data at the origin change very frequently, ETL opera-tions would have to be performed with the same frequency of the data sourceupdates, and complexity would hence explode. For this reason, the adoptionof data warehouses is recommended when complex, ad-hoc data analysis andmining of historical data have to be performed, while such systems are notappropriate when frequent access to fresh data is required.
Loosely coupled solutions have then been proposed to provide unified in-terfaces to real-time distributed data. These architectures support a virtualdata integration : as represented in Figure 2.1, data remain at the origi-nal sources and, every time a query is posed to the integrated system, theinformation is extracted directly from the data sources. This solution is par-ticularly appropriate when autonomous and very heterogeneous sources, asthe ones available on the Web, have to be integrated, and also when theintegrated system is characterized by “transient” data integration tasks, im-plying the dynamic access to fresh data and resulting into integrated datasets that are useful and valid for a short period of time only [106].
It is evident that virtual approaches fit better the data integration re-quirements generally occurring in mashup development. For this reason, inthe rest of this section we discuss in more details this class of data integrationsolutions.
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
2.3 Data Integration 23
Source3
CUSTOMER(CustomerID, Name, Surname, Address)
Source1
ORDER(OrderID, SenderID, DestinationAddress, Description, PayerID, PickUpDate, PickUpTime, Weight, Size)
CRM data on customersShipment orders
Integrated Order DB
SHIPMENT(OrderID, SenderID, SenderName, SenderSurname, SenderAddress, DestinationAddress, PickUpDate, PickUpTime)
Data on shipments (including pricing conditions)
Source2
Mediator (virtual global schema)
Data sources (local schemas)
Pricing data
PRICING_DATA(CustomerID, Discount, Taxes)
BILLING(OrderID, CustomerID, Weight, Size, Discount, Taxes)
Fig. 2.2 Example of an integrated database storing shipment data extracted fromdi↵erent data sources. Each data source is characterized by a local schema. Data inte-gration is performed according to a vritual global schema managed by the mediator.
2.3.2.1 Global-As-View (GAV) mapping
In the Global-As-View (GAV) mapping, the content of each relation inthe mediated schema is specified through a query, i.e., a view, over a combi-nation of the source local schemas. Figure 2.3(a) graphically illustrates thedefinition, according to a GAV mapping, for the BILLING relation of ourreference example. We do not express the view definition in a query lan-guage. However, intuitively, the figure describes how the BILLING tuples inthe global database can be built by combining data extracted from tuplesin the three sources: the attributes OrderID, CustomerID, Weight andSize come from a tuple in the ORDER relation in Source1; CustomerNameand CustomerSurname from a tuple of the CUSTOMER relation in Source2with the same CustomerID; and Discount and Taxes from a tuple in thePRICING relation Source3 having the same CustomerID. Similar mappingscan also be specified for defining the SHIPMENT global relation.
GAV is conceptually simple [106]. Indeed, the defined views express exactlyhow the elements of the global schemas have to be instantiated using the datao↵ered by the data sources. The execution of queries thus simply requiresunfolding the global queries, taking into account the view definitions, andissuing the resulting queries to the original data sources. Nevertheless, thisapproach works well only for stable data sources. In the case of modificationsto the data sources (e.g., a local schema changes or a data source has to beadded or deleted) the global schema needs to be modified as well.
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
24 2 Data and Application Integration
Source1.ORDER OrderID SenderID DestinationAddress PickUpDate PickUpTime Weight Size
Source2.CUSTOMER CustomerID Name Surname Address
Local sources Mediator
a) GAV Mapping for the global relation BILLING. The global relation is defined as a view on the local source relations.
b) LAV Mapping for Source2 and Source3. The local source relations are defined as views over the global relations.
Source3.PRICING CustomerID Discount Taxes
BILLING OrderID CustomerID CustomerName CustomerSurname Weight Size Discount Taxes
SHIPMENT OrderID SenderID SenderAddress DestinationAddress PickUpDate PickUpTimeSource2.CUSTOMER
CustomerID Name Surname Address
Local sources Mediator
Source3.PRICING CustomerID Discount Taxes
BILLING OrderID CustomerID CustomerName CustomerSurname Weight Size Discount Taxes
Join
Join
Join
Fig. 2.3 Example of GAV and LAV schema mappings for the integrated order DB.
2.3.2.2 Local-As-View (LAV) mapping
With respect to GAV, the Local-As-View (LAV) mapping privileges morethe independence of the local sources with respect to the global schema defi-nition, thus it tries to overcome the intrinsic rigidity of GAV which makes itdi�cult to modify the local schemas or to add a new data source. In LAV, en-tities in the original sources are mapped to the mediated schema. This meansthat the mediated schema is designed independently of the actual sources and,instead of specifying how to compute tuples of the mediated schema like inGAV, each data source is described as a view over the mediated schema.
For instance, in our reference example, a LAV mapping (see Figure 2.3)would express that the attributes of the CUSTOMER relation in Source2correspond to the CustomerID, CustomerName and CustomerSurnameattributes of the BILLING global relation; by means of a join on theCustomerID attribute, the Address attribute is then mapped on the
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
30 2 Data and Application Integration
Machine 1 Machine 2 Machine 3
Local OS 1
Network
Distributed application
Distributed system layer (middleware)
Local OS 2 Local OS 3
Fig. 2.4 The basic architecture of a distributed system and application.
the reason for which there is the need for an ingredient in the developmentof distributed applications that is not needed in the development of conven-tional, non-distributed applications, i.e., middleware.
Middleware is software that enables and mediates network heterogene-ity and interactions among the pieces of a distributed application. That is,middleware provides a distributed system layer that masks hardware and op-erating system heterogeneity and provides an abstract view on the capabilitiesof the distributed system. This abstract view is provided to the distributedapplications running on top, so as to enable and facilitate their development.As such, the middleware services are typically used by developers who pro-gram distributed applications and less by non-programmers like the usersof the distributed applications. Middleware therefore represents, on the onehand, a computing infrastructure upon which distributed applications canbe developed and, on the other hand, a programming abstraction that easesdevelopment.
Depending on the kind of abstraction and development support they pro-vide, we distinguish four core types of middleware [13]:
• RPC-based middleware : The use of remote procedure calls (RPCs)represents the most basic technique to implement a distributed systemand the foundation for many of today’s middleware solutions. RPC-basedmiddleware provides the necessary infrastructure to transform procedurecalls into remote calls that are able to invoke procedures running on adi↵erent machine as if they were running locally. The problem with proce-dure calls, independently of whether local or remote, is that they typicallyimplement a blocking invocation model, which requires the invoking en-tity to wait till the invoked entity has finished processing. XML-RPC [278]and the Simple Object Access Protocol (SOAP [47]) are examples of RPCmiddleware.
• Object brokers: Object brokers address another issue of RPC-based mid-dleware, i.e., their intrinsic focus on imperative, procedural programminglanguages. With the advent of object-oriented programming, RPCs be-came apparently obsolete, and the focus of middleware moved from RPCs
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
32 2 Data and Application Integration
formation operations may plugged-in in the form of external, custom appli-cation logic, providing for extensibility. The configuration and operation ofthe whole message broker can be managed via a dedicated administration in-terface. Examples of message brokers are IBM’s WebSphere Message Broker[153] and today’s enterprise service buses (ESBs).
2.4.2 Workflow management systems
Alonso et al. [13] correctly point out that the key contribution of messagebrokers to application integration is their capability to hide the heterogeneityand distribution of the information systems to integrate and to provide auniform view on their data and functionalities. Workflow managementsystems (WfMSs) provide support for the last ingredient, i.e., the definitionof integration logic.
The initial application area of WfMSs was o�ce automation, whose pur-pose is to automate the coordination of repetitive, administrative processesinvolving multiple human actors and electronic documents. Soon, it becameevident that the automation support provided by WfMSs could be lever-aged to coordinate also software systems, not only human actors. Nowadays,WfMSs (or business process management systems, as they are called moreprominently today) are very flexible systems that enable the seamless coor-dination of software and human actors, typically starting from a graphicalmodel that expresses the necessary integration logic. Most systems also pro-vide support for advanced coordination features, such as transactions, ex-ception handling, recovery and compensation, events, deadline management,and similar.
sender pays
Get shipment details
Calculate route
Compute cost
Payment Delivery
Delivery Payment
receiver pays
Shipment planning application
Customer relationship management system Billing application
Payment application
Delivery app
Delivery appPayment
application
Fig. 2.5 A simplified workflow model of a possible integration logic for the logisticsapplication integration scenario. The model shows how the integration logic coordi-nates the interaction with existing information systems.
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
2.4 Application Integration 35
ness processes and agile applications that span organizations and platformscaused a re-thinking of some of the traditional development practices and theemergence of a new paradigm. This new paradigm is called service-orientedcomputing (SOC), and it is a computing paradigm that uses Web servicesas building blocks for the engineering of composite, distributed applicationsout of the reusable application logic encapsulated by Web services [46].
SOC builds on two ingredients: an ecosystem of readily reusable Web ser-vices and a development paradigm based on composition as core abstraction.We discuss both in the following.
2.4.4.1 Service-oriented architecture (SOA)
The architecture of the Web service ecosystem is commonly called a service-oriented architecture (SOA), which is a logical architecture for the designof software systems that provide services to either end-user applications orto other services distributed in a network, via published and discoverableinterfaces [224].
The SOA can be articulated into three roles and two artifacts (see Figure2.6): A web service (the first artifact) is implemented and made publiclyaccessibly by a so-called provider (or service provider). In order to advertisethe web service and enable its potential clients to be aware of the existence ofthe service, the provider publishes a service descriptor (the second artifact),which describes the purpose and features of the service, where to access it (aURI), and how to access it (e.g., using which communication protocol). Thisadvertisement is supported by a dedicated registry, which aggregates servicedescriptors from multiple providers and allows potential consumers to searchfor descriptors. If a consumer wants to use some external functionality, it canquery the registry for suitable web services, obtaining as response a servicedescriptor (or a set thereof). Following the instructions on how to invoke theservice contained in the retrieved descriptor, the consumer can bind to theconcrete service and use it.
Provider
Consumer
Registry
publishes descriptor
searches for service
binds to and
uses service
Web service
Service descriptor retrieves descriptor
Fig. 2.6 The service-oriented architecture (SOA) with its roles and artifacts.
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
2.4 Application Integration 37
Provider 1 Consumer 2 Provider 2Consumer 1
Local OS 1
The Web
Local OS 2 Local OS 3
App. logic
Data
Present. App. logic
WS1WS2
WS3
Web service compositionService-based application
Local OS 4
Web service middleware, SOAP, HTTP, TCP/IP
App. logic
WS4
WS5App. logic
Process engine
Fig. 2.7 The distributed computing environment enabled by web services as aninstance of the generic architecture of distributed systems (see Figure 2.4).
according programming libraries and/or Web service runtime environments(e.g., so-called runtime containers). A process/service orchestration engine(e.g., Apache ODE), an evolution and adaptation of WfMSs to Web services,provides for the actual composition and facilitates the abstract definition ofintegration logic and its execution. Pure service compositions may be builtexclusively using a process engine, service-based applications usually also relyon hand-coded integration logics. Of course, the use of a process engine isnot mandatory in any case, but its composition-specific abstractions and run-time support significantly eases the composition task compared to coding theintegration logic with common programming languages.
Process engines are instructed with an abstract specification of the process(the Web services composition). An important role in service composition istherefore played by the logical integration languages used to express this spec-ification. In this respect, the standard language for Web service compositiontoday is the Web Services Business Process Execution Language (WS-BPEL[163]), an XML dialect that is specifically tailored to the needs of Web ser-vice composition and that enjoys large support by all major process enginevendors.
Figure 2.8 provides an intuitive feeling of the syntax of WS-BPEL anda functional view on the main composition features. The key aspects of aprocess specification are the definition of the partner links, which define theWeb services to communicate with, the variables used to store intermediateresults and messages, and the structured set of activities, which implementthe actual integration logic. The language comprises activities for the interac-tion with Web services (receive, reply, invoke, for storing data into variables(assign), for common control flow specifications (sequence, if, while, etc.), aswell as for exception and event handling.
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
38 2 Data and Application Integration
<process name="NCName" targetNamespace="anyURI" queryLanguage="anyURI"? expressionLanguage="anyURI"? suppressJoinFailure="yes|no"? exitOnStandardFault="yes|no"? xmlns="http://docs.oasis-open.org/wsbpel/2.0/ process/executable"> <extensions>? ... </extensions> <import namespace="anyURI"? location="anyURI"? importType="anyURI" />* <partnerLinks>? <!-- Note: At least one role must be specified. --> ... </partnerLinks> <messageExchanges>? ... </messageExchanges> <variables>? ... </variables> <correlationSets>? ... </correlationSets> <faultHandlers>? <!-- Note: There must be at least one faultHandler --> ... </faultHandlers> <eventHandlers>? ... </eventHandlers>
activity*
</process>
The header of the process definition, among
others, specifies the languages used to query
XML data and define expressions (e.g., used
by the <condition> and <while> activities).
Partner links model relationships between
partner processes for peer-to-peer
communications. Partner links have a type
and an endpoint reference.
Extensions enable the definition of new
attributes, new elements, operation
extensions, etc.
Enables declaring dependencies on external
XML Schema or WSDL definitions.
Message exchanges associate outbound
messages with inbound messages.
Variables hold messages or data and capture
the state of the process.
Correlation sets are sets of properties shared
by all messages of a conversation.
Fault handlers allow the definition of fault
handling activities.
Event handlers enable reacting to external
events or alarms.
Activities specify the process logic.
Fig. 2.8 The basic structure of a service composition (a process) in BPEL [163].
For example, if we wanted to implement the shipment management pro-cess of Figure 2.5 as a Web service composition (assuming that all involvedsystems are equipped with a suitable Web service), the corresponding WS-BPEL logic could look as follows: The Get shipment details task is initiatedby the operator of the logistics company through an application that sendsa message to the process. That is, the task starts with receiving a message.Then the task invokes the CRM system to confirm sender and receiver data,e.g., via a synchronous request-response operation. The received shipmentdetails are assigned to a variable, which can be used to forward them tothe shipment planning application via the invocation of the respective Webservice, which corresponds to the execution of the Calculate route task. Thecalculated route details can be assigned to another variable, which can beused as input for the Compute cost task. Doing so requires invoking thebilling application with attached the route details. And so on and so forth.
BPEL is a standard language for the composition of SOAP web services.In [226] and [227], Pautasso shows examples of how to compose RESTful webservices by extending JOpera and BPEL, respectively.
© Copyright 2014 by F. Daniel and M. Matera. Reproduction for classroom use and teaching allowed if source is properly cited.
2.5 Cloud Computing 41
comes for the cost of the resources consumed by the virtualization itself. Yet,practice has shown that the benefits largely outweigh this additional e↵ort.
2.5.2 Cloud architectures
In line with the above virtualization strategies, cloud architectures can bestructured into di↵erent logical layers, each providing a di↵erent type of re-source as a service and, hence di↵erent abstractions. We illustrate these layersin Figure 2.9 and discuss the three bottom-most layers, i.e., the technologicalones (for an explanation of the HaaS layer, we refer the reader to [31] or[149]):
• Infrastructure as a Service (IaaS): IaaS provides computing infras-tructure as a service, that is, it provides an abstract view on hardware(e.g., computers, CPUs, storage devices, memory, etc.) and allows theuser to run own images of operating systems on top of the virtualizedhardware. The management interface provided by IaaS providers typicallyallows their users to dynamically define the required hardware characteris-tics and to start and stop operating system instances (e.g., to save money,as running an instance costs money). The target user of IaaS is the systemadministrator.
• Platform as a Service (PaaS): PaaS usually provides developmentenvironments or runtime environments for software development and soft-
Cloud
Software as a Service (SaaS)
Applications
Application services
Platform as a Service (PaaS)
Integrated development environment
Runtime environment
Infrastructure as a Service (IaaS)
Infrastructure services
Resource set
Virtual resource set
Physical resource set
Human as a Service (HaaS)
Crowdsourcing
SaaSPaaSIaaS
HaaS
Fig. 2.9 A simplified cloud architecture stack with IaaS, PaaS, SaaS, and HaaS(adapted from [172]).