Data Vault + Data Virtualization = Double Flexibility


Copyright © 1991 - 2015 R20/Consultancy B.V., The Hague, The Netherlands. All rights reserved. No part of this material may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photographic, or otherwise, without the explicit written permission of the copyright owners.

Data Vault + Data Virtualization = Double Flexibility

by Rick F. van der Lans
R20/Consultancy BV
Twitter @rick_vanderlans
www.r20.nl


Rick F. van der Lans

Rick F. van der Lans is an independent consultant, lecturer, and author. He specializes in data warehousing, business intelligence, database technology, and data virtualization. He is managing director of R20/Consultancy B.V. Rick has been involved in various projects in which data warehousing and integration technology were applied.

Rick van der Lans is an internationally acclaimed lecturer. He has lectured professionally for the last twenty-five years in many countries in Europe and the Middle East, the USA, South America, and Australia. He has been invited by several major software vendors to present keynote speeches.

He is the author of several books on computing, including his new Data Virtualization for Business Intelligence Systems. Some of these books are available in different languages. The popular Introduction to SQL, for example, is available in English, Dutch, Italian, Chinese, and German and is sold worldwide. He also authored The SQL Guide to Ingres and SQL for MySQL Developers.

As an author for TechTarget.com and BeyeNetwork.com, a writer of whitepapers, chairman of the annual European Enterprise Data and Business Intelligence Conference, and a columnist for a few IT magazines, he has close contacts with many vendors.

R20/Consultancy B.V. is located in The Hague, The Netherlands, www.r20.nl. You can get in touch with Rick via:
Email: [email protected]
Twitter: @Rick_vanderlans
LinkedIn: http://www.linkedin.com/pub/rick-van-der-lans/9/207/223


Reporting on a Data Vault DW?

[Figure: production databases, staging area, Data Vault EDW, and reporting and analytics]


Flexibility is Gone!

[Figure: production databases, staging area, Data Vault EDW, and many separate data stores]


Physical Data Marts

- Define data structures
- Define ETL logic
- Install a database instance
- Create a database
- Implement the tables
- Design physical database structure
- Initial load of the tables
- Periodic load of the tables
- Tune and optimize the database (regularly)
- Tune and optimize ETL logic

- Monitor database usage
- Develop and run backup and recovery processes
- Unload data
- Change data structure
- Change ETL logic
- Tune and optimize physical database design
- Tune and optimize ETL logic
- Reload data
- …


Remarks on Data Marts and Cubes

Gartner in Data Management Cost-Cutting Tips, March 10, 2008:
Consolidate data marts into an application-neutral data warehouse or smaller data marts to reduce the cost and complexity of the data integration processes feeding the data marts. Gartner predicts this could save you 50 percent of what you're spending to support the siloed data marts.


Flexibility Through Data Virtualization

[Figure: production databases, staging area, Data Vault EDW, and a Data Virtualization Server]


Data Virtualization Overview (1)

[Figure: a Data Virtualization Server connecting applications and data stores; labels: production application, website, mobile app, internal portal, dashboard, analytics & reporting, ESB, production databases, streaming databases, social media data, big data stores, unstructured data, data warehouse & data marts, external data, private data]


Data Virtualization Overview (2)

[Figure: the same applications and data stores around the Data Virtualization Server, now with interfaces. Towards the applications: ODBC/SQL, JDBC/SQL, XML/SOAP, REST/JSON, XQuery, MDX/DAX. Towards the data stores: JMS, SQL, SQL+, XSLT, Hive, Prop., Excel, JSON, CICS, SOAP. An incoming SQL statement results in, for example, a JMS message, a SQL statement, and a SOAP message sent to the data stores.]


Indonesian “Rijsttafel”


The Service Hatch


Data Virtualization as Service Hatch

[Figure: kitchen, service hatch, food, and restaurant correspond to data sources, data virtualization server, data, and end users]


The Market of Data Virtualization Servers

- Cirro Data Hub
- Cisco/Composite Information Server
- Denodo Platform
- IBM InfoSphere Federation Server
- Informatica Data Services
- Information Builders EII
- Oracle Data Services Integrator
- Progress Easyl
- Red Hat Teiid and JBoss Data Virtualization
- Stone Bond Enterprise Enabler Virtuoso
- And many more…


Gartner on Integration Tools

Source: Gartner 2014: Modernize Your Data Integration Capabilities for Diverse Use-Cases, Ted Friedman


Developing Virtual Tables

[Figure: a data consumer accesses a virtual table that is defined on a source table]

A virtual table may contain row selections, column selections, column concatenations, transformations, column and table name changes, groupings, aggregations, data cleansing, …
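In SQL terms, a virtual table behaves much like a view. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a data virtualization server. Real products add much more, such as access to heterogeneous sources. The CUSTOMERS table, its rows, and the ACTIVE_CUSTOMER view are hypothetical. The view applies several of the operations listed above: row selection, column selection, column concatenation, renaming, and simple cleansing.

```python
import sqlite3

# In-memory database standing in for a source system (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (
        CUST_ID    INTEGER PRIMARY KEY,
        FIRST_NAME TEXT,
        LAST_NAME  TEXT,
        COUNTRY    TEXT,
        STATUS     TEXT
    );
    INSERT INTO CUSTOMERS VALUES
        (1, 'Ann',  'Smith', 'nl',  'active'),
        (2, 'Bob',  'Jones', 'NL',  'closed'),
        (3, 'Cora', 'Brown', ' be', 'active');

    -- The virtual table: row selection (WHERE), column selection,
    -- column concatenation, column and table renaming, and simple
    -- cleansing (TRIM/UPPER) of the country code.
    CREATE VIEW ACTIVE_CUSTOMER AS
    SELECT CUST_ID                        AS CUSTOMER_ID,
           FIRST_NAME || ' ' || LAST_NAME AS FULL_NAME,
           UPPER(TRIM(COUNTRY))           AS COUNTRY_CODE
    FROM   CUSTOMERS
    WHERE  STATUS = 'active';
""")

# The data consumer queries the virtual table as if it were a normal table.
rows = conn.execute("SELECT * FROM ACTIVE_CUSTOMER ORDER BY CUSTOMER_ID").fetchall()
print(rows)  # [(1, 'Ann Smith', 'NL'), (3, 'Cora Brown', 'BE')]
```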


Nesting Virtual Tables

[Figure: a nested virtual table defined on a virtual table, which in turn is defined on a source table]
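The following sketch shows nesting, again with sqlite3 standing in for the data virtualization server. The table and view names and the data are hypothetical. V_REVENUE_PER_COUNTRY is defined on the virtual table V_ORDER instead of on the source table. As a result, the cleansing and filtering rules in V_ORDER are defined once and reused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Source table (hypothetical data).
    CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, COUNTRY TEXT, AMOUNT REAL);
    INSERT INTO ORDERS VALUES (1, 'nl', 10.0), (2, 'NL', 15.0), (3, 'be', 5.0), (4, 'NL', -1.0);

    -- First-level virtual table: cleanses the country code, filters out invalid amounts.
    CREATE VIEW V_ORDER AS
    SELECT ORDER_ID, UPPER(COUNTRY) AS COUNTRY_CODE, AMOUNT
    FROM   ORDERS
    WHERE  AMOUNT >= 0;

    -- Nested virtual table: defined on V_ORDER, not on the source table,
    -- so it inherits the cleansing and filtering rules.
    CREATE VIEW V_REVENUE_PER_COUNTRY AS
    SELECT COUNTRY_CODE, SUM(AMOUNT) AS REVENUE
    FROM   V_ORDER
    GROUP  BY COUNTRY_CODE;
""")

rows = conn.execute("SELECT * FROM V_REVENUE_PER_COUNTRY ORDER BY COUNTRY_CODE").fetchall()
print(rows)  # [('BE', 5.0), ('NL', 25.0)]
```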


Layers of Virtual Tables

[Figure: layers of virtual tables in the Data Virtualization Server, defined on Database 1, Database 2, Database 3, and Database 4]


Caches Minimize Access to Data Stores

[Figure: a virtual table with a cache and a virtual table without a cache]


[Screenshot: caching settings of a virtual table, annotated with "Enable caching", "Table where cache should be stored", and "Refresh specification"]
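The three settings in this screenshot can be imitated in a few lines. This is a hypothetical sketch and not how any particular product implements caching. The CachedVirtualTable class, its time-based refresh rule, and the table names are assumptions. sqlite3 plays the role of both the source and the cache store.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDERS (ORDER_ID INTEGER PRIMARY KEY, AMOUNT REAL);
    INSERT INTO ORDERS VALUES (1, 10.0), (2, 32.5);
    CREATE VIEW V_ORDER_TOTAL AS SELECT SUM(AMOUNT) AS TOTAL FROM ORDERS;
""")


class CachedVirtualTable:
    """Hypothetical cache for a virtual table: the view result is copied into a
    cache table and recomputed only when it is older than refresh_seconds."""

    def __init__(self, conn, view, cache_table, refresh_seconds):
        self.conn = conn
        self.view = view                        # the virtual table being cached
        self.cache_table = cache_table          # table where the cache is stored
        self.refresh_seconds = refresh_seconds  # a simplistic refresh specification
        self.loaded_at = None

    def refresh(self):
        # Names are trusted identifiers here; never build SQL from user input this way.
        self.conn.execute(f"DROP TABLE IF EXISTS {self.cache_table}")
        self.conn.execute(f"CREATE TABLE {self.cache_table} AS SELECT * FROM {self.view}")
        self.loaded_at = time.monotonic()

    def rows(self):
        expired = (self.loaded_at is None
                   or time.monotonic() - self.loaded_at >= self.refresh_seconds)
        if expired:
            self.refresh()
        # Reads hit the cache table, not the underlying source table.
        return self.conn.execute(f"SELECT * FROM {self.cache_table}").fetchall()


cache = CachedVirtualTable(conn, "V_ORDER_TOTAL", "CACHE_ORDER_TOTAL", refresh_seconds=3600)
first_read = cache.rows()                    # [(42.5,)]: cache filled on first use
conn.execute("INSERT INTO ORDERS VALUES (3, 7.5)")
stale_read = cache.rows()                    # still [(42.5,)]: served from the cache
cache.refresh()                              # explicit refresh
fresh_read = cache.rows()                    # [(50.0,)]
print(first_read, stale_read, fresh_read)
```

The second read shows the trade-off of caching. Until the next refresh, consumers see the data as it was when the cache was loaded.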


Data Virtualization and Data Vault

[Figure: a Data Virtualization Server on top of the Data Vault EDW, which it accesses with SQL; it offers ODBC/SQL, JDBC/SQL, XML/SOAP, REST/JSON, XQuery, and MDX/DAX to the production application, website, analytics & reporting, mobile app, internal portal, and dashboard]


Solution With Data Virtualization

[Figure: layers from operational systems (PDB) via the Data Vault EDW, the SuperNova layer, the Extended SuperNova layer, and the data delivery layer to users and reports; side labels mark data storage, data virtualization, and Data Vault]


The Challenge: The Versions


Example: A Data Vault Model

[Figure: the layers: operational systems, Data Vault EDW, SuperNova layer, Extended SuperNova layer, data delivery layer]


Example: The SuperNova Model

- All the satellite data is added to hubs and links
- A record in a hub table represents a version of a hub object
- A record in a link table represents a version of a link object
- The hub/link id + startdate are the primary keys


Why the Name SuperNova?


Determining Versions of Hubs

Satellite 1 records for hub object 1:

HUB_ID  META_LOAD_DTS        META_LOAD_END_DTS
1       2012-06-01 00:00:00  2013-11-14 23:59:59
1       2013-11-15 00:00:00  2014-03-06 23:59:59
1       2014-03-07 00:00:00  9999-12-31 00:00:00

Satellite 2 records for hub object 1:

HUB_ID  META_LOAD_DTS        META_LOAD_END_DTS
1       2013-06-21 00:00:00  2013-07-20 23:59:59
1       2013-07-21 00:00:00  2013-11-12 23:59:59
1       2013-11-13 00:00:00  9999-12-31 00:00:00

Merged result showing all versions of hub 1:

HUB_ID  STARTDATE            ENDDATE
1       2012-06-01 00:00:00  2013-06-20 23:59:59
1       2013-06-21 00:00:00  2013-07-20 23:59:59
1       2013-07-21 00:00:00  2013-11-12 23:59:59
1       2013-11-13 00:00:00  2013-11-14 23:59:59
1       2013-11-15 00:00:00  2014-03-06 23:59:59
1       2014-03-07 00:00:00  9999-12-31 00:00:00
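The merge can also be expressed in a few lines of Python. This sketch assumes timestamps in the 'YYYY-MM-DD HH:MM:SS' format shown above, and the function name hub_versions is hypothetical. It follows the same rule as the SQL views on the following slides. A version starts at a satellite start date and ends one second before the next start date, or at the record's own end date when that comes earlier.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
END_OF_TIME = datetime(9999, 12, 31)


def hub_versions(*satellites):
    """Merge the validity periods of all satellite records of one hub object
    into versions (hypothetical helper mirroring the slides' SQL rule)."""
    records = [(datetime.strptime(s, FMT), datetime.strptime(e, FMT))
               for satellite in satellites for s, e in satellite]
    starts = sorted({start for start, _ in records})
    versions = set()  # the set removes duplicate versions, like SELECT DISTINCT
    for start, old_end in records:
        later = [s for s in starts if s > start]
        new_end = later[0] - timedelta(seconds=1) if later else END_OF_TIME
        versions.add((start, min(new_end, old_end)))
    return [(s.strftime(FMT), e.strftime(FMT)) for s, e in sorted(versions)]


# Records of hub object 1 in satellites 1 and 2, as listed above.
satellite1 = [("2012-06-01 00:00:00", "2013-11-14 23:59:59"),
              ("2013-11-15 00:00:00", "2014-03-06 23:59:59"),
              ("2014-03-07 00:00:00", "9999-12-31 00:00:00")]
satellite2 = [("2013-06-21 00:00:00", "2013-07-20 23:59:59"),
              ("2013-07-21 00:00:00", "2013-11-12 23:59:59"),
              ("2013-11-13 00:00:00", "9999-12-31 00:00:00")]

versions = hub_versions(satellite1, satellite2)
for version in versions:
    print(version)
```

For hub object 1, this produces six versions. The boundary at 2013-06-21 comes from the first record of satellite 2.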


Visualization of Merge Process

[Figure: the versions of hub 1 from the satellite 1 table and from the satellite 2 table are combined (+); dates on the time axis: 1900-01-01, 2012-06-01, 2013-06-21, 2013-07-21, 2013-11-13, 2013-11-15, 2014-03-07, 9999-12-31]


Step 1 of Determining Hub Versions

Merge all the satellites (with a union operator):

(SELECT HUB_ID, META_LOAD_DTS AS STARTDATE, META_LOAD_END_DTS AS ENDDATE
 FROM   HUB1_SATELLITE1
 UNION
 SELECT HUB_ID, META_LOAD_DTS, META_LOAD_END_DTS
 FROM   HUB1_SATELLITE2)

Intermediate result (SATELLITES):

HUB_ID  STARTDATE            ENDDATE
1       2012-06-01 00:00:00  2013-11-14 23:59:59
1       2013-06-21 00:00:00  2013-07-20 23:59:59
1       2013-07-21 00:00:00  2013-11-12 23:59:59
1       2013-11-13 00:00:00  9999-12-31 00:00:00
1       2013-11-15 00:00:00  2014-03-06 23:59:59
1       2014-03-07 00:00:00  9999-12-31 00:00:00
2       2011-03-20 00:00:00  2012-02-25 23:59:59
2       2012-02-26 00:00:00  2014-02-25 23:59:59
2       2012-02-26 00:00:00  9999-12-31 00:00:00
2       2014-02-26 00:00:00  9999-12-31 00:00:00
3       2013-09-09 00:00:00  2013-11-11 00:00:00
3       2013-11-12 00:00:00  2013-11-12 00:00:00

Note that this result does not include hub object 4, because it has no satellite data.


Step 2 of Determining Hub Versions

Join with the original hub table and get the business key(s):

SELECT HUB1.HUB_ID, SATELLITES.STARTDATE, SATELLITES.ENDDATE, HUB1.BUSINESS_KEY
FROM   HUB1 LEFT OUTER JOIN
       (SELECT HUB_ID, META_LOAD_DTS AS STARTDATE, META_LOAD_END_DTS AS ENDDATE
        FROM   HUB1_SATELLITE1
        UNION
        SELECT HUB_ID, META_LOAD_DTS, META_LOAD_END_DTS
        FROM   HUB1_SATELLITE2) AS SATELLITES
       ON HUB1.HUB_ID = SATELLITES.HUB_ID

Intermediate result (STARTDATES):

HUB_ID  STARTDATE            ENDDATE              BUSINESS_KEY
1       2012-06-01 00:00:00  2013-11-14 23:59:59  b1
1       2013-06-21 00:00:00  2013-07-20 23:59:59  b1
1       2013-07-21 00:00:00  2013-11-12 23:59:59  b1
1       2013-11-13 00:00:00  9999-12-31 00:00:00  b1
1       2013-11-15 00:00:00  2014-03-06 23:59:59  b1
1       2014-03-07 00:00:00  9999-12-31 00:00:00  b1
2       2011-03-20 00:00:00  2012-02-25 23:59:59  b2
2       2012-02-26 00:00:00  2014-02-25 23:59:59  b2
2       2012-02-26 00:00:00  9999-12-31 00:00:00  b2
2       2014-02-26 00:00:00  9999-12-31 00:00:00  b2
3       2013-09-09 00:00:00  2013-11-11 00:00:00  b3
3       2013-11-12 00:00:00  2013-11-12 00:00:00  b3
4       NULL                 NULL                 b4
-1      NULL                 NULL                 Unknown
-2      NULL                 NULL                 N.a.


Step 3 of Determining Hub Versions

Find the correct versions for each hub:

HUB1_VERSIONS
HUB_ID  STARTDATE            ENDDATE              HUB_BUSINESS_KEY
1       2012-06-01 00:00:00  2013-06-20 23:59:59  b1
1       2013-06-21 00:00:00  2013-07-20 23:59:59  b1
1       2013-07-21 00:00:00  2013-11-12 23:59:59  b1
1       2013-11-13 00:00:00  2013-11-14 23:59:59  b1
1       2013-11-15 00:00:00  2014-03-06 23:59:59  b1
1       2014-03-07 00:00:00  9999-12-31 00:00:00  b1
2       2011-03-20 00:00:00  2012-02-25 23:59:59  b2
2       2012-02-26 00:00:00  2014-02-25 23:59:59  b2
2       2014-02-26 00:00:00  9999-12-31 00:00:00  b2
3       2013-09-09 00:00:00  2013-11-11 23:59:59  b3
3       2013-11-12 00:00:00  2013-11-30 00:00:00  b3
4       1900-01-01 00:00:00  9999-12-31 00:00:00  b4
-1      1900-01-01 00:00:00  9999-12-31 00:00:00  Unknown
-2      1900-01-01 00:00:00  9999-12-31 00:00:00  N.a.

 


The Three Steps Combined

CREATE VIEW HUB1_VERSIONS AS
WITH STARTDATES (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY) AS (
   SELECT HUB1.HUB_ID, SATELLITES.STARTDATE, SATELLITES.ENDDATE, HUB1.BUSINESS_KEY
   FROM   HUB1 LEFT OUTER JOIN
          (SELECT HUB_ID, META_LOAD_DTS AS STARTDATE, META_LOAD_END_DTS AS ENDDATE
           FROM   HUB1_SATELLITE1
           UNION
           SELECT HUB_ID, META_LOAD_DTS, META_LOAD_END_DTS
           FROM   HUB1_SATELLITE2) AS SATELLITES
          ON HUB1.HUB_ID = SATELLITES.HUB_ID)
SELECT DISTINCT HUB_ID, STARTDATE,
       CASE WHEN ENDDATE_NEW <= ENDDATE_OLD THEN ENDDATE_NEW ELSE ENDDATE_OLD END AS ENDDATE,
       BUSINESS_KEY
FROM  (SELECT S1.HUB_ID,
              ISNULL(S1.STARTDATE, '1900-01-01 00:00:00') AS STARTDATE,
              (SELECT ISNULL(MIN(S2.STARTDATE - INTERVAL '1' SECOND), '9999-12-31 00:00:00')
               FROM   STARTDATES AS S2
               WHERE  S1.HUB_ID = S2.HUB_ID
               AND    S1.STARTDATE < S2.STARTDATE) AS ENDDATE_NEW,
              ISNULL(S1.ENDDATE, '9999-12-31 00:00:00') AS ENDDATE_OLD,
              S1.BUSINESS_KEY
       FROM   STARTDATES AS S1) AS S3
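The view can be tried without a data virtualization server by running it in SQLite through Python's sqlite3 module. This sketch assumes SQLite's dialect. ISNULL becomes IFNULL, and STARTDATE - INTERVAL '1' SECOND becomes datetime(STARTDATE, '-1 second'). Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, so string comparison is chronological. Only hub objects 1 and 4 from the example are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HUB1 (HUB_ID INTEGER PRIMARY KEY, BUSINESS_KEY TEXT);
    CREATE TABLE HUB1_SATELLITE1 (HUB_ID INTEGER, META_LOAD_DTS TEXT, META_LOAD_END_DTS TEXT);
    CREATE TABLE HUB1_SATELLITE2 (HUB_ID INTEGER, META_LOAD_DTS TEXT, META_LOAD_END_DTS TEXT);

    -- Hub object 1 with its satellite records; hub object 4 has no satellite data.
    INSERT INTO HUB1 VALUES (1, 'b1'), (4, 'b4');
    INSERT INTO HUB1_SATELLITE1 VALUES
        (1, '2012-06-01 00:00:00', '2013-11-14 23:59:59'),
        (1, '2013-11-15 00:00:00', '2014-03-06 23:59:59'),
        (1, '2014-03-07 00:00:00', '9999-12-31 00:00:00');
    INSERT INTO HUB1_SATELLITE2 VALUES
        (1, '2013-06-21 00:00:00', '2013-07-20 23:59:59'),
        (1, '2013-07-21 00:00:00', '2013-11-12 23:59:59'),
        (1, '2013-11-13 00:00:00', '9999-12-31 00:00:00');

    CREATE VIEW HUB1_VERSIONS AS
    WITH STARTDATES (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY) AS (
        SELECT HUB1.HUB_ID, SATELLITES.STARTDATE, SATELLITES.ENDDATE, HUB1.BUSINESS_KEY
        FROM   HUB1 LEFT OUTER JOIN
               (SELECT HUB_ID, META_LOAD_DTS AS STARTDATE, META_LOAD_END_DTS AS ENDDATE
                FROM   HUB1_SATELLITE1
                UNION
                SELECT HUB_ID, META_LOAD_DTS, META_LOAD_END_DTS
                FROM   HUB1_SATELLITE2) AS SATELLITES
               ON HUB1.HUB_ID = SATELLITES.HUB_ID)
    SELECT DISTINCT HUB_ID, STARTDATE,
           CASE WHEN ENDDATE_NEW <= ENDDATE_OLD THEN ENDDATE_NEW ELSE ENDDATE_OLD END AS ENDDATE,
           BUSINESS_KEY
    FROM  (SELECT S1.HUB_ID AS HUB_ID,
                  IFNULL(S1.STARTDATE, '1900-01-01 00:00:00') AS STARTDATE,
                  -- one second before the next start date of the same hub object
                  (SELECT IFNULL(MIN(datetime(S2.STARTDATE, '-1 second')), '9999-12-31 00:00:00')
                   FROM   STARTDATES AS S2
                   WHERE  S1.HUB_ID = S2.HUB_ID
                   AND    S1.STARTDATE < S2.STARTDATE) AS ENDDATE_NEW,
                  IFNULL(S1.ENDDATE, '9999-12-31 00:00:00') AS ENDDATE_OLD,
                  S1.BUSINESS_KEY AS BUSINESS_KEY
           FROM   STARTDATES AS S1) AS S3;
""")

versions = conn.execute("SELECT * FROM HUB1_VERSIONS ORDER BY HUB_ID, STARTDATE").fetchall()
for version in versions:
    print(version)
```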


Hubs with Less Than Two Satellites

Hubs with no satellites:

CREATE VIEW HUB3_VERSIONS (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY) AS
SELECT HUB_ID, ISNULL(META_LOAD_DTS, '1900-01-01 00:00:00'),
       '9999-12-31 00:00:00', BUSINESS_KEY
FROM   HUB3

Hubs with one satellite:

CREATE VIEW HUB2_VERSIONS (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY) AS
SELECT HUB2.HUB_ID,
       ISNULL(HUB2_SATELLITE1.META_LOAD_DTS, '1900-01-01 00:00:00'),
       ISNULL(HUB2_SATELLITE1.META_LOAD_END_DTS, '9999-12-31 00:00:00'),
       HUB2.BUSINESS_KEY
FROM   HUB2 LEFT OUTER JOIN HUB2_SATELLITE1
       ON HUB2.HUB_ID = HUB2_SATELLITE1.HUB_ID
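The following sketch runs the no-satellite and one-satellite cases in SQLite, with IFNULL in place of ISNULL. The hub objects, business keys, and dates are hypothetical. A hub object without satellite records still gets exactly one version, which ends at 9999-12-31. That version starts at the hub's load timestamp, or at 1900-01-01 when no timestamp is available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical hub without satellites; HUB3 carries its own load timestamp.
    CREATE TABLE HUB3 (HUB_ID INTEGER PRIMARY KEY, BUSINESS_KEY TEXT, META_LOAD_DTS TEXT);
    INSERT INTO HUB3 VALUES (1, 'k1', '2013-05-01 00:00:00'), (2, 'k2', NULL);

    CREATE VIEW HUB3_VERSIONS (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY) AS
    SELECT HUB_ID, IFNULL(META_LOAD_DTS, '1900-01-01 00:00:00'),
           '9999-12-31 00:00:00', BUSINESS_KEY
    FROM   HUB3;

    -- Hypothetical hub with one satellite; hub object 2 has no satellite records.
    CREATE TABLE HUB2 (HUB_ID INTEGER PRIMARY KEY, BUSINESS_KEY TEXT);
    CREATE TABLE HUB2_SATELLITE1 (HUB_ID INTEGER, META_LOAD_DTS TEXT, META_LOAD_END_DTS TEXT);
    INSERT INTO HUB2 VALUES (1, 'k1'), (2, 'k2');
    INSERT INTO HUB2_SATELLITE1 VALUES
        (1, '2013-01-01 00:00:00', '2013-12-31 23:59:59'),
        (1, '2014-01-01 00:00:00', '9999-12-31 00:00:00');

    -- With a single satellite, its records already are the versions.
    CREATE VIEW HUB2_VERSIONS (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY) AS
    SELECT HUB2.HUB_ID,
           IFNULL(HUB2_SATELLITE1.META_LOAD_DTS, '1900-01-01 00:00:00'),
           IFNULL(HUB2_SATELLITE1.META_LOAD_END_DTS, '9999-12-31 00:00:00'),
           HUB2.BUSINESS_KEY
    FROM   HUB2 LEFT OUTER JOIN HUB2_SATELLITE1
           ON HUB2.HUB_ID = HUB2_SATELLITE1.HUB_ID;
""")

hub3 = conn.execute("SELECT * FROM HUB3_VERSIONS ORDER BY HUB_ID").fetchall()
hub2 = conn.execute("SELECT * FROM HUB2_VERSIONS ORDER BY HUB_ID, STARTDATE").fetchall()
print(hub3)
print(hub2)
```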


Creating the SuperNova Hub Views

A hub is joined with all its satellites using the data in the hub versions views:

CREATE VIEW SUPERNOVA_HUB1
   (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY, ATTRIBUTE1, ATTRIBUTE2) AS
SELECT HUB1_VERSIONS.HUB_ID, HUB1_VERSIONS.STARTDATE, HUB1_VERSIONS.ENDDATE,
       HUB1_VERSIONS.BUSINESS_KEY, HUB1_SATELLITE1.ATTRIBUTE, HUB1_SATELLITE2.ATTRIBUTE
FROM   HUB1_VERSIONS
       LEFT OUTER JOIN HUB1_SATELLITE1
         ON  HUB1_VERSIONS.HUB_ID = HUB1_SATELLITE1.HUB_ID
         AND (HUB1_VERSIONS.STARTDATE <= HUB1_SATELLITE1.META_LOAD_END_DTS
              AND HUB1_VERSIONS.ENDDATE >= HUB1_SATELLITE1.META_LOAD_DTS)
       LEFT OUTER JOIN HUB1_SATELLITE2
         ON  HUB1_VERSIONS.HUB_ID = HUB1_SATELLITE2.HUB_ID
         AND (HUB1_VERSIONS.STARTDATE <= HUB1_SATELLITE2.META_LOAD_END_DTS
              AND HUB1_VERSIONS.ENDDATE >= HUB1_SATELLITE2.META_LOAD_DTS)
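The following sketch runs the overlap join in SQLite for hub objects 1 and 4, using the attribute values from the example. For brevity, the versions are loaded into a plain HUB1_VERSIONS table instead of being derived by the view. The aliases V, S1, and S2 only shorten the SQL. The versions are cut at every satellite start date, so each version overlaps at most one record per satellite, provided the records of a single satellite do not overlap each other. Under that condition, the outer joins do not multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HUB1_VERSIONS (HUB_ID INTEGER, STARTDATE TEXT, ENDDATE TEXT, BUSINESS_KEY TEXT);
    INSERT INTO HUB1_VERSIONS VALUES
        (1, '2012-06-01 00:00:00', '2013-06-20 23:59:59', 'b1'),
        (1, '2013-06-21 00:00:00', '2013-07-20 23:59:59', 'b1'),
        (1, '2013-07-21 00:00:00', '2013-11-12 23:59:59', 'b1'),
        (1, '2013-11-13 00:00:00', '2013-11-14 23:59:59', 'b1'),
        (1, '2013-11-15 00:00:00', '2014-03-06 23:59:59', 'b1'),
        (1, '2014-03-07 00:00:00', '9999-12-31 00:00:00', 'b1'),
        (4, '1900-01-01 00:00:00', '9999-12-31 00:00:00', 'b4');
    CREATE TABLE HUB1_SATELLITE1 (HUB_ID INTEGER, META_LOAD_DTS TEXT, META_LOAD_END_DTS TEXT, ATTRIBUTE TEXT);
    CREATE TABLE HUB1_SATELLITE2 (HUB_ID INTEGER, META_LOAD_DTS TEXT, META_LOAD_END_DTS TEXT, ATTRIBUTE TEXT);
    INSERT INTO HUB1_SATELLITE1 VALUES
        (1, '2012-06-01 00:00:00', '2013-11-14 23:59:59', 'a1'),
        (1, '2013-11-15 00:00:00', '2014-03-06 23:59:59', 'a2'),
        (1, '2014-03-07 00:00:00', '9999-12-31 00:00:00', 'a3');
    INSERT INTO HUB1_SATELLITE2 VALUES
        (1, '2013-06-21 00:00:00', '2013-07-20 23:59:59', 'a7'),
        (1, '2013-07-21 00:00:00', '2013-11-12 23:59:59', 'a8'),
        (1, '2013-11-13 00:00:00', '9999-12-31 00:00:00', 'a9');

    -- A satellite record belongs to a version when their periods overlap.
    CREATE VIEW SUPERNOVA_HUB1 (HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY, ATTRIBUTE1, ATTRIBUTE2) AS
    SELECT V.HUB_ID, V.STARTDATE, V.ENDDATE, V.BUSINESS_KEY, S1.ATTRIBUTE, S2.ATTRIBUTE
    FROM   HUB1_VERSIONS AS V
           LEFT OUTER JOIN HUB1_SATELLITE1 AS S1
             ON  V.HUB_ID = S1.HUB_ID
             AND V.STARTDATE <= S1.META_LOAD_END_DTS AND V.ENDDATE >= S1.META_LOAD_DTS
           LEFT OUTER JOIN HUB1_SATELLITE2 AS S2
             ON  V.HUB_ID = S2.HUB_ID
             AND V.STARTDATE <= S2.META_LOAD_END_DTS AND V.ENDDATE >= S2.META_LOAD_DTS;
""")

rows = conn.execute("SELECT * FROM SUPERNOVA_HUB1 ORDER BY HUB_ID, STARTDATE").fetchall()
for row in rows:
    print(row)
```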


Virtual Contents of the SuperNova Hub

SUPERNOVA_HUB1
HUB_ID  STARTDATE            ENDDATE              HUB_BUSINESS_KEY  ATTRIBUTE1  ATTRIBUTE2
1       2012-06-01 00:00:00  2013-06-20 23:59:59  b1                a1          NULL
1       2013-06-21 00:00:00  2013-07-20 23:59:59  b1                a1          a7
1       2013-07-21 00:00:00  2013-11-12 23:59:59  b1                a1          a8
1       2013-11-13 00:00:00  2013-11-14 23:59:59  b1                a1          a9
1       2013-11-15 00:00:00  2014-03-06 23:59:59  b1                a2          a9
1       2014-03-07 00:00:00  9999-12-31 00:00:00  b1                a3          a9
2       2011-03-20 00:00:00  2012-02-25 23:59:59  b2                a4          a10
2       2012-02-26 00:00:00  2014-02-25 23:59:59  b2                a5          a11
2       2014-02-26 00:00:00  9999-12-31 00:00:00  b2                a6          a11
3       2013-09-09 00:00:00  2013-11-11 23:59:59  b3                NULL        a12
3       2013-11-12 00:00:00  2013-11-30 00:00:00  b3                NULL        a13
4       1900-01-01 00:00:00  9999-12-31 00:00:00  b4                NULL        NULL
-1      1900-01-01 00:00:00  9999-12-31 00:00:00  Unknown           NULL        NULL
-2      1900-01-01 00:00:00  9999-12-31 00:00:00  N.a.              NULL        NULL


Creating Version Views for Links

CREATE VIEW LINK_VERSIONS AS
WITH STARTDATES (LINK_ID, STARTDATE, ENDDATE, HUB1_ID, HUB2_ID, EVENTDATE) AS (
   SELECT LINK.LINK_ID, SATELLITES.STARTDATE, SATELLITES.ENDDATE,
          LINK.HUB1_ID, LINK.HUB2_ID, LINK.EVENTDATE
   FROM   LINK LEFT OUTER JOIN
          (SELECT LINK_ID, META_LOAD_DTS AS STARTDATE, META_LOAD_END_DTS AS ENDDATE
           FROM   LINK_SATELLITE1
           UNION
           SELECT LINK_ID, META_LOAD_DTS, META_LOAD_END_DTS
           FROM   LINK_SATELLITE2) AS SATELLITES
          ON LINK.LINK_ID = SATELLITES.LINK_ID)
SELECT DISTINCT LINK_ID, STARTDATE,
       CASE WHEN ENDDATE_NEW <= ENDDATE_OLD THEN ENDDATE_NEW ELSE ENDDATE_OLD END AS ENDDATE,
       HUB1_ID, HUB2_ID, EVENTDATE
FROM  (SELECT S1.LINK_ID,
              ISNULL(S1.STARTDATE, '1900-01-01 00:00:00') AS STARTDATE,
              (SELECT ISNULL(MIN(S2.STARTDATE - INTERVAL '1' SECOND), '9999-12-31 00:00:00')
               FROM   STARTDATES AS S2
               WHERE  S1.LINK_ID = S2.LINK_ID
               AND    S1.STARTDATE < S2.STARTDATE) AS ENDDATE_NEW,
              ISNULL(S1.ENDDATE, '9999-12-31 00:00:00') AS ENDDATE_OLD,
              S1.HUB1_ID, S1.HUB2_ID, S1.EVENTDATE
       FROM   STARTDATES AS S1) AS S3


Creating the SuperNova Link Views

A link is joined with all its satellites using the data in the link versions views:

CREATE VIEW SUPERNOVA_LINK
   (LINK_ID, HUB1_ID, HUB2_ID, STARTDATE, ENDDATE, EVENTDATE, ATTRIBUTE1, ATTRIBUTE2) AS
SELECT LINK_VERSIONS.LINK_ID, LINK_VERSIONS.HUB1_ID, LINK_VERSIONS.HUB2_ID,
       LINK_VERSIONS.STARTDATE, LINK_VERSIONS.ENDDATE, LINK_VERSIONS.EVENTDATE,
       LINK_SATELLITE1.ATTRIBUTE, LINK_SATELLITE2.ATTRIBUTE
FROM   LINK_VERSIONS
       LEFT OUTER JOIN LINK_SATELLITE1
         ON  LINK_VERSIONS.LINK_ID = LINK_SATELLITE1.LINK_ID
         AND (LINK_VERSIONS.STARTDATE <= LINK_SATELLITE1.META_LOAD_END_DTS
              AND LINK_VERSIONS.ENDDATE >= LINK_SATELLITE1.META_LOAD_DTS)
       LEFT OUTER JOIN LINK_SATELLITE2
         ON  LINK_VERSIONS.LINK_ID = LINK_SATELLITE2.LINK_ID
         AND (LINK_VERSIONS.STARTDATE <= LINK_SATELLITE2.META_LOAD_END_DTS
              AND LINK_VERSIONS.ENDDATE >= LINK_SATELLITE2.META_LOAD_DTS)


Virtual Contents of the SuperNova Link

LINK_VERSIONS
LINK_ID  STARTDATE            ENDDATE              HUB1_ID  HUB2_ID  EVENTDATE
1        2013-12-01 00:00:00  2013-12-24 23:59:59  1        5        2013-12-01
1        2013-12-25 00:00:00  2014-01-23 23:59:59  1        5        2013-12-01
1        2014-01-24 00:00:00  9999-12-31 00:00:00  1        5        2013-12-01
2        2014-03-12 00:00:00  9999-12-31 00:00:00  1        6        2014-01-01
3        2013-12-27 00:00:00  2014-02-01 23:59:59  2        6        2013-12-25
3        2014-02-02 00:00:00  9999-12-31 00:00:00  2        6        2013-12-25
4        2013-12-08 00:00:00  9999-12-31 00:00:00  3        -1       2013-06-24
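Every version carries a STARTDATE and an ENDDATE. A report can therefore be reproduced for any moment in time by keeping only the versions whose period contains that moment (in SQL: WHERE <moment> BETWEEN STARTDATE AND ENDDATE). The following is a Python sketch over the LINK_VERSIONS rows shown above. The function name versions_as_of is hypothetical.

```python
# LINK_VERSIONS rows: (LINK_ID, STARTDATE, ENDDATE, HUB1_ID, HUB2_ID, EVENTDATE)
LINK_VERSIONS = [
    (1, "2013-12-01 00:00:00", "2013-12-24 23:59:59", 1, 5, "2013-12-01"),
    (1, "2013-12-25 00:00:00", "2014-01-23 23:59:59", 1, 5, "2013-12-01"),
    (1, "2014-01-24 00:00:00", "9999-12-31 00:00:00", 1, 5, "2013-12-01"),
    (2, "2014-03-12 00:00:00", "9999-12-31 00:00:00", 1, 6, "2014-01-01"),
    (3, "2013-12-27 00:00:00", "2014-02-01 23:59:59", 2, 6, "2013-12-25"),
    (3, "2014-02-02 00:00:00", "9999-12-31 00:00:00", 2, 6, "2013-12-25"),
    (4, "2013-12-08 00:00:00", "9999-12-31 00:00:00", 3, -1, "2013-06-24"),
]


def versions_as_of(rows, moment):
    """Return the versions valid at `moment`. Timestamps are 'YYYY-MM-DD HH:MM:SS'
    strings, so string comparison is chronological; both bounds are inclusive,
    like SQL's BETWEEN."""
    return [row for row in rows if row[1] <= moment <= row[2]]


snapshot = versions_as_of(LINK_VERSIONS, "2014-01-01 12:00:00")
for row in snapshot:
    print(row[:3])  # LINK_ID, STARTDATE, ENDDATE of the version valid at that moment
```

Link 2 is missing from the snapshot because its first version starts on 2014-03-12.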


Lineage Analysis of All Views


Defining Primary and Foreign Keys


Caching of SuperNova Views


The Extended SuperNova Model

- Add derived data
- Transform data
- Reuse of definitions
- Always use the XSN layer

[Figure: the layers: operational systems, Data Vault EDW, SuperNova layer, Extended SuperNova layer, data delivery layer]
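The following is a hypothetical Extended SuperNova (XSN) view in SQLite. It replaces missing attribute values with 'unknown', which is a transformation, and adds an IS_CURRENT flag, which is derived data. Every data delivery view can then reuse these definitions. The view name XSN_HUB1, the flag, the 'unknown' default, and the sample rows are assumptions. A plain table stands in for the SuperNova view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a SuperNova view (hypothetical sample rows).
    CREATE TABLE SUPERNOVA_HUB1 (HUB_ID INTEGER, STARTDATE TEXT, ENDDATE TEXT,
                                 BUSINESS_KEY TEXT, ATTRIBUTE1 TEXT);
    INSERT INTO SUPERNOVA_HUB1 VALUES
        (1, '2014-03-07 00:00:00', '9999-12-31 00:00:00', 'b1', 'a3'),
        (1, '2013-11-15 00:00:00', '2014-03-06 23:59:59', 'b1', 'a2'),
        (4, '1900-01-01 00:00:00', '9999-12-31 00:00:00', 'b4', NULL);

    -- Hypothetical XSN view: transformations and derived data defined once.
    CREATE VIEW XSN_HUB1 AS
    SELECT HUB_ID, STARTDATE, ENDDATE, BUSINESS_KEY,
           IFNULL(ATTRIBUTE1, 'unknown')                               AS ATTRIBUTE1,
           CASE WHEN ENDDATE = '9999-12-31 00:00:00' THEN 1 ELSE 0 END AS IS_CURRENT
    FROM   SUPERNOVA_HUB1;
""")

rows = conn.execute(
    "SELECT HUB_ID, ATTRIBUTE1, IS_CURRENT FROM XSN_HUB1 ORDER BY HUB_ID, STARTDATE"
).fetchall()
print(rows)  # [(1, 'a2', 0), (1, 'a3', 1), (4, 'unknown', 1)]
```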


The Data Delivery Model

- Data is shown in a filtered manner
- Data is shown in aggregated form
- Data is shown in one large, highly denormalized table
- Data is shown in a star schema form
- Data is shown with a service interface
- …

[Figure: the layers: operational systems, Data Vault EDW, SuperNova layer, Extended SuperNova layer, data delivery layer]
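The following sketch defines two hypothetical data delivery views in SQLite on top of a table that stands in for an XSN view. DD_CURRENT_CUSTOMER shows the data in a filtered manner: only current versions, recognized by their 9999-12-31 end date. DD_CUSTOMERS_PER_CITY shows the same data in aggregated form. All names and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for an XSN view (hypothetical rows).
    CREATE TABLE XSN_CUSTOMER (HUB_ID INTEGER, STARTDATE TEXT, ENDDATE TEXT, CITY TEXT);
    INSERT INTO XSN_CUSTOMER VALUES
        (1, '2012-06-01 00:00:00', '2013-11-14 23:59:59', 'Delft'),
        (1, '2013-11-15 00:00:00', '9999-12-31 00:00:00', 'The Hague'),
        (2, '2011-03-20 00:00:00', '9999-12-31 00:00:00', 'The Hague'),
        (3, '2013-09-09 00:00:00', '9999-12-31 00:00:00', 'Utrecht');

    -- Filtered: only the current version of each customer.
    CREATE VIEW DD_CURRENT_CUSTOMER AS
    SELECT HUB_ID AS CUSTOMER_ID, CITY
    FROM   XSN_CUSTOMER
    WHERE  ENDDATE = '9999-12-31 00:00:00';

    -- Aggregated: built on the filtered delivery view.
    CREATE VIEW DD_CUSTOMERS_PER_CITY AS
    SELECT CITY, COUNT(*) AS NUMBER_OF_CUSTOMERS
    FROM   DD_CURRENT_CUSTOMER
    GROUP  BY CITY;
""")

rows = conn.execute("SELECT * FROM DD_CUSTOMERS_PER_CITY ORDER BY CITY").fetchall()
print(rows)  # [('The Hague', 2), ('Utrecht', 1)]
```

Customer 1's old version in Delft is not counted, because only current versions reach the aggregated view.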


Virtual Data Marts

- Define data structures
- Define ETL/DV logic
- Install a database instance
- Create a database
- Implement the tables
- Design physical database structure
- Initial load of the tables
- Periodic load of the tables
- Tune and optimize the database (regularly)
- Tune and optimize ETL logic

- Monitor database usage
- Develop and run backup and recovery processes
- Unload data
- Change data structure
- Change ETL/DV logic
- Tune and optimize physical database design
- Tune and optimize ETL logic
- Reload data
- …


Why Not Database Views?

- Database views are not database server independent
- Data virtualization servers offer more advanced distributed join features
- Data virtualization servers offer more advanced heterogeneous join features
- Data virtualization servers offer more advanced caching/refreshing features
- Database views offer no lineage/impact analysis
- Database views offer only one API: SQL
- No versioning of joins
- No data cleansing features
- No business glossary
- …


The Whitepaper

Download: www.r20.nl or http://www.cisco.com/web/services/enterprise-it-services/data-virtualization/documents/whitepaper-cisco-datavaul.pdf


Closing Remarks

- Data Vault offers data model extensibility and report reproducibility
- Data Vault is half the solution
- SuperNova (with data virtualization) is the other half
- With data virtualization, a more flexible reporting and analytical environment can be developed (quickly)
- Avoid the (physical) data mart explosion! Go virtual!
