Data Vault and Anchor Modeling - Percona
Trends in Data Warehouse Data Modeling:
Data Vault and Anchor Modeling
Thanks for Attending!
● Roland Bouman, Leiden, the Netherlands
● MySQL AB, Sun, Strukton, Pentaho (1 nov)
● Web- and Business Intelligence Developer
● Author:
  – Pentaho Solutions
  – Pentaho Kettle Solutions
● Http://rpbouman.blogspot.com/
● Twitter: @rolandbouman
Data Warehouse (DWH)
● Support Business Intelligence (BI)
  – Reporting
  – Analysis
  – Data mining
● General Requirements
  – Integrate disparate data sources
  – Maintain History
  – Calculate Derived data
  – Data delivery to BI applications
DWH Architectures
● Categories
  – Traditional
  – Hybrid
  – Modern
● Aspects
  – Modelling
  – Data logistics
DWH Architectures
● Traditional
  – Information Factory (Bill Inmon)
  – Enterprise Bus (Ralph Kimball)
● Hybrid
● Modern
DWH Architectures
● Traditional
● Hybrid
  – Hub-and-Spoke
● Modern
DWH Architectures
● Traditional
● Hybrid
● Modern
  – Data Vault (Dan Linstedt)
  – Anchor Modeling (Lars Rönnbäck)
Inmon DWH (Traditional): Corporate Information Factory
“A source of data that is subject oriented, integrated, nonvolatile and time variant for the purpose of management's decision processes.”
Bill Inmon (the Data Warehouse Toolkit)
● http://www.inmoncif.com/home/
Inmon DWH (Traditional): Corporate Information Factory
● Enterprise or Corporate DWH, DWH 2.0
● Focus on backroom data integration
  – Central information model
  – Single version of the truth
● Data delivery
  – Disposable data marts
● Bottom-up
Data logistics of the Corporate Information Factory
[Diagram labels: Source, OLTP DB, Files, Extract/Transform/Load (twice), Staging, Enterprise Data Warehouse, Data Marts, OLAP DB, Cube, BI Apps]
Data Modeling for the CIF Enterprise DWH
● Normalized, typically 3NF
● Organized in “subject areas”
  – Series of related tables
  – Example: Customer, Product, Transaction
  – Common key
Data Modeling for the CIF Enterprise DWH
● History
  – PK includes a date/time part
● Contains both detail and aggregate data
  – Multiple levels of aggregation
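A minimal sketch of the history convention above, using Python's built-in sqlite3 module. The `customer` table and all of its columns are hypothetical and do not come from the slides; the point is only that the primary key combines the common key with a date part, so each change is stored as a new row.

```python
import sqlite3

# Hypothetical 3NF-style table: the primary key combines the common key
# (customer_id) with a date part, so every change is kept as a new row.
DDL = """
CREATE TABLE customer (
    customer_id INTEGER NOT NULL,
    valid_from  TEXT    NOT NULL,   -- ISO date; the date/time part of the PK
    name        TEXT    NOT NULL,
    city        TEXT    NOT NULL,
    PRIMARY KEY (customer_id, valid_from)
)
"""

def build_cif_history():
    """Create the table in memory and load two versions of one customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        [
            (1, "2010-01-01", "Alice", "Leiden"),
            (1, "2010-06-01", "Alice", "Utrecht"),  # same customer, later version
        ],
    )
    return conn

def customer_as_of(conn, customer_id, as_of_date):
    """Return the (name, city) version valid on as_of_date, or None."""
    return conn.execute(
        """
        SELECT name, city FROM customer
        WHERE customer_id = ? AND valid_from <= ?
        ORDER BY valid_from DESC LIMIT 1
        """,
        (customer_id, as_of_date),
    ).fetchone()
```

Because ISO dates sort as text, the "latest version not after a given date" lookup needs no date arithmetic.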
Kimball DWH (Traditional): Dimensional Model and DWH Bus Architecture
“The data warehouse is the conglomeration of an organization's staging and presentation areas, where operational data is specifically structured for query and analysis performance and ease of use.”
Ralph Kimball (the Data Warehouse Toolkit)
● http://www.kimballgroup.com/
Kimball DWH (Traditional): DWH Bus Architecture
● Focus on data delivery
● Integration at the data mart level
● Top-down
Data logistics of the DWH Bus Architecture
[Diagram labels: Source, OLTP DB, Files, Extract/Transform/Load, Staging, (Enterprise) Data Warehouse, Data Marts, OLAP DB, Cube, BI Apps. Annotation: “EDW is a collection [of] Data Marts”]
Data Modeling for the DWH Bus Architecture
● Dimensional Modeling
  – Star schemas
● Organized in:
  – Fact tables
  – Dimension tables
Data Modeling for the DWH Bus Architecture
● Fact tables
  – Highly normalized
  – Additive metrics
● Dimension tables
  – Highly denormalized
  – Descriptive labels
  – Shared across fact tables
Data Modeling for the DWH Bus Architecture
● History
  – Slowly changing dimensions (versioning)
  – Fact links to Date and/or Time dimensions
● Detailed, not aggregated
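The dimensional structure described above can be sketched as a tiny star schema. The table names `fact_rental`, `dim_customer` and `dim_date` follow the Sakila slides; every column name and all sample data are assumptions made for illustration. The dimension keeps one row per customer version (the versioning approach usually called a type 2 slowly changing dimension), and the fact table holds only dimension keys and additive metrics.

```python
import sqlite3

def build_star():
    """Create a small in-memory star schema with illustrative sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Denormalized dimension: one row per customer version.
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
            customer_id  INTEGER NOT NULL,     -- key in the source system
            name         TEXT NOT NULL,
            city         TEXT NOT NULL,
            country      TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,      -- e.g. 20100620
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        -- Fact table: only dimension keys plus additive metrics.
        CREATE TABLE fact_rental (
            customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
            date_key     INTEGER NOT NULL REFERENCES dim_date (date_key),
            rental_count INTEGER NOT NULL,
            amount       REAL    NOT NULL
        );
        INSERT INTO dim_customer VALUES
            (1, 42, 'Alice', 'Leiden',  'Netherlands'),
            (2, 42, 'Alice', 'Utrecht', 'Netherlands');  -- new version after a move
        INSERT INTO dim_date VALUES (20100115, 2010, 1), (20100620, 2010, 6);
        INSERT INTO fact_rental VALUES
            (1, 20100115, 1, 3.0),
            (2, 20100620, 1, 5.0),
            (2, 20100620, 1, 2.0);
    """)
    return conn

def amount_by_city(conn):
    """Sum the additive metric, grouped by a descriptive dimension label."""
    return conn.execute("""
        SELECT c.city, SUM(f.amount)
        FROM fact_rental f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        GROUP BY c.city
        ORDER BY c.city
    """).fetchall()
```

Each fact row points at the customer version that was current when the rental happened, so the January rental is still reported under Leiden after the move.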
Sakila Rental Star Schema
Sakila DWH Bus Architecture
[Diagram: fact tables fact_rental, fact_inventory and fact_payment, with dimension tables dim_date, dim_customer, dim_store, dim_staff and dim_film]
Problems with traditional DWH architectures
● General Problems
  – Lack of flexibility and resilience to change
  – Loading (ETL) Complexity
● Problems with Inmon
  – Centralization requires upfront investment
  – Single version of whose truth, when?
● Problems with Kimball
  – Dimensional Model anomalies
Dimensional Modeling Anomalies
● Snowflaking (dimension normalization)
  – Monster dimensions
  – Outriggers
  – Ex: Customer Demographics
● Hierarchical data
  – Bridge table (closure table)
  – Ex: Employee/Boss
● Multi-valued dimensions
  – Bridge table
  – Ex: Account/Customer bridge table
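To make the bridge (closure) table for hierarchical data concrete, here is a small sketch for the Employee/Boss example. The function names and the input format (a mapping from employee to direct boss) are assumptions, and the hierarchy is assumed to contain no cycles. The bridge holds one row per ancestor/descendant pair, so "everyone below this boss" becomes a simple filter instead of a recursive query.

```python
def closure_rows(boss_of):
    """Build bridge (closure) rows (ancestor, descendant, depth) from a
    mapping employee -> direct boss (None for the top of the hierarchy)."""
    rows = []
    for employee in boss_of:
        ancestor, depth = employee, 0
        # Walk up the chain of bosses; depth 0 is the employee itself.
        while ancestor is not None:
            rows.append((ancestor, employee, depth))
            ancestor = boss_of.get(ancestor)
            depth += 1
    return sorted(rows)

def subordinates(rows, boss):
    """Everyone below `boss`, at any depth, found without recursion."""
    return sorted(d for a, d, depth in rows if a == boss and depth > 0)
```

The price is extra rows and extra loading work: the number of bridge rows grows with the depth of the hierarchy, which is part of why the slides list this construct under anomalies.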
Hybrid DWH: Hub-and-Spoke
● Inmon back-end (hub)
● Kimball front-end (satellites)
Modern: Data Vault
“The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that supports one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema.”
Dan Linstedt (Data Vault Overview)
● http://danlinstedt.com/
Data Vault
● Focus on
  – Data Integration
  – Traceability and Auditability
  – Resilience to change
● Single version of the facts
  – Rather than single version of the truth
● All of the data, all of the time
  – No upfront cleansing and conforming
● Bottom-up
Data Vault Modelling
● Hubs
● Links
● Satellites
Data Vault Modelling: Hubs
● Hubs model entities
● Contain business keys
  – PK in absence of surrogate key
● Metadata:
  – Record source
  – Load date/time
● Optional surrogate key
  – Used as PK if present
● No foreign keys!
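A hub along these lines might look as follows. This is a sketch only: `hub_customer`, its column names and the `load_hub` helper are hypothetical, and the "insert only unseen business keys" loading rule is a common convention that is assumed here rather than stated on the slide.

```python
import sqlite3

def create_hub(conn):
    """Create a hypothetical customer hub: business key plus metadata only."""
    conn.execute("""
        CREATE TABLE hub_customer (
            customer_hkey INTEGER PRIMARY KEY,   -- optional surrogate key, used as PK
            customer_bk   TEXT NOT NULL UNIQUE,  -- business key
            load_dts      TEXT NOT NULL,         -- metadata: load date/time
            record_source TEXT NOT NULL          -- metadata: record source
        )
    """)

def hub_size(conn):
    return conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]

def load_hub(conn, business_keys, load_dts, record_source):
    """Insert only business keys not yet in the hub; return how many were added."""
    before = hub_size(conn)
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer (customer_bk, load_dts, record_source) "
        "VALUES (?, ?, ?)",
        [(bk, load_dts, record_source) for bk in business_keys],
    )
    return hub_size(conn) - before
```

Note that the hub has no foreign keys and no descriptive attributes. A key that arrives again from another source is ignored, so the hub keeps the load time and source of its first sighting.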
Data Vault Modelling: Links
● Links model relationships
  – Intersection table (m:n relationship)
● Foreign keys to related hubs or links
  – Form natural key (business key) of the link
● Metadata:
  – Record source
  – Load date/time
● Optional surrogate key
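As an illustration of a link, the sketch below relates two hubs through an intersection table. All table names, column names and the `relate` helper are assumptions; the hubs are cut down to keys only to keep the example short.

```python
import sqlite3

def build_link_example():
    """Create two minimal hubs and a link between them, in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE hub_customer (customer_hkey INTEGER PRIMARY KEY,
                                   customer_bk   TEXT NOT NULL UNIQUE);
        CREATE TABLE hub_film     (film_hkey INTEGER PRIMARY KEY,
                                   film_bk   TEXT NOT NULL UNIQUE);
        -- Link: an intersection table between the two hubs.
        CREATE TABLE link_customer_film (
            link_key      INTEGER PRIMARY KEY,  -- optional surrogate key
            customer_hkey INTEGER NOT NULL REFERENCES hub_customer (customer_hkey),
            film_hkey     INTEGER NOT NULL REFERENCES hub_film (film_hkey),
            load_dts      TEXT NOT NULL,        -- metadata: load date/time
            record_source TEXT NOT NULL,        -- metadata: record source
            UNIQUE (customer_hkey, film_hkey)   -- the FKs form the link's natural key
        );
        INSERT INTO hub_customer VALUES (1, 'C1'), (2, 'C2');
        INSERT INTO hub_film     VALUES (10, 'F10'), (11, 'F11');
    """)
    return conn

def relate(conn, customer_hkey, film_hkey, load_dts, record_source):
    """Record a relationship once; repeated loads of the same pair are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO link_customer_film "
        "(customer_hkey, film_hkey, load_dts, record_source) VALUES (?, ?, ?, ?)",
        (customer_hkey, film_hkey, load_dts, record_source),
    )
```

Because the link is always an m:n intersection table, a relationship that later changes cardinality in the source needs no structural change here.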
Data Vault Modelling: Satellites
● Satellites model a group of attributes
● Foreign key to a Hub or Link
● Metadata:
  – Record source
  – Load date/time
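A satellite could be sketched like this. The names are hypothetical, and two details go beyond what the slide states and are assumptions: the primary key combines the hub key with the load date/time, and a new row is written only when the attribute values differ from the latest stored version.

```python
import sqlite3

def create_satellite(conn):
    """Create a minimal hub and one satellite holding address attributes."""
    conn.executescript("""
        CREATE TABLE hub_customer (customer_hkey INTEGER PRIMARY KEY,
                                   customer_bk   TEXT NOT NULL UNIQUE);
        -- Satellite: a group of attributes for the hub, versioned by load time.
        CREATE TABLE sat_customer_address (
            customer_hkey INTEGER NOT NULL REFERENCES hub_customer (customer_hkey),
            load_dts      TEXT NOT NULL,   -- metadata: load date/time
            record_source TEXT NOT NULL,   -- metadata: record source
            city          TEXT,
            postal_code   TEXT,
            PRIMARY KEY (customer_hkey, load_dts)
        );
    """)

def load_satellite(conn, customer_hkey, load_dts, record_source, city, postal_code):
    """Append a new version only when the attributes differ from the latest one."""
    latest = conn.execute(
        "SELECT city, postal_code FROM sat_customer_address "
        "WHERE customer_hkey = ? ORDER BY load_dts DESC LIMIT 1",
        (customer_hkey,),
    ).fetchone()
    if latest == (city, postal_code):
        return False  # unchanged: nothing to store
    conn.execute(
        "INSERT INTO sat_customer_address VALUES (?, ?, ?, ?, ?)",
        (customer_hkey, load_dts, record_source, city, postal_code),
    )
    return True
```

Rows are only ever appended, which is what gives the model its history tracking and auditability.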
Sakila Data Vault Example
Data Vault tools and Example
● Kettle Data Vault Example
  – Sakila Data Vault
  – Chapter 19
  – Kasper de Graaf
  – http://www.dikw-academy.nl
● Quipu
  – Data Vault Generator
  – Kettle templates
  – Johannes van den Bosch
  – http://www.datawarehousemanagement.org/
Modern: Anchor model
“Anchor Modeling is an agile information modeling technique that offers non-destructive extensibility mechanisms enabling robust and flexible management of changes. A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions, not modifications.”
Lars Rönnbäck (Agile Information Modeling in Evolving Data Environments)
● http://www.anchormodeling.com/
Anchor Modelling
● Focus on
  – Resilience to change
  – Agility
  – Extensibility
  – History tracking
● Bottom-up
Anchor Modelling
● 6NF (Date, Darwen, Lorentzos)
● Table features no non-trivial join dependencies at all
● Translation: A 6NF table cannot be decomposed losslessly any further
● Translation
● Temporal Data
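One way to picture the 6NF idea is to split a wide row set into one narrow relation per attribute and check that joining them on the key gives the original data back. The sketch below is plain Python with hypothetical function names; missing values are simply absent from the narrow relations rather than stored as nulls.

```python
def decompose(rows, key, attributes):
    """Split wide rows into one narrow relation (key -> value) per attribute."""
    return {
        attr: {row[key]: row[attr] for row in rows if row.get(attr) is not None}
        for attr in attributes
    }

def recompose(parts, key):
    """Join the narrow relations back on the key; absent values stay absent."""
    keys = sorted({k for relation in parts.values() for k in relation})
    return [
        {key: k, **{attr: rel[k] for attr, rel in parts.items() if k in rel}}
        for k in keys
    ]
```

Each narrow relation holds a key and a single attribute, so there is nothing left to split. A key with no attribute values at all would vanish in this sketch; in Anchor Modelling the anchor table itself records which keys exist.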
Anchor Modelling Constructs
● Anchors
● Attributes
● Ties
● Knots
Anchor Modelling: Anchors
● Entities are modeled as Anchors
● Relationships may be modeled as Anchors
  – m:n relationships having properties
● Only a surrogate key
Anchor Modelling: Ties
● Ties model relationships
  – 1:n relationships
  – m:n relationships without properties
● Static vs Historized
  – History tracked using date/time
● May be Knotted
  – Knot holds set of association types
● Two or more “anchor roles”
  – Relationships may be broken into several ties having only mandatory anchors
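A historized tie with two anchor roles might be sketched as below. The anchors, the tie name and its columns are all assumptions chosen for illustration; the tie holds nothing but anchor keys and the date/time used to track history.

```python
import sqlite3

def build_tie_example():
    """Create two anchors and a historized tie between them, in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer_anchor (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE store_anchor    (store_id    INTEGER PRIMARY KEY);
        -- Historized tie, 1:n: each customer has one home store at a time.
        CREATE TABLE customer_home_store (
            customer_id INTEGER NOT NULL REFERENCES customer_anchor (customer_id),
            store_id    INTEGER NOT NULL REFERENCES store_anchor (store_id),
            changed_at  TEXT    NOT NULL,   -- date/time that tracks history
            PRIMARY KEY (customer_id, changed_at)
        );
        INSERT INTO customer_anchor VALUES (1);
        INSERT INTO store_anchor    VALUES (10), (20);
        INSERT INTO customer_home_store VALUES
            (1, 10, '2010-01-01'),
            (1, 20, '2010-09-01');
    """)
    return conn

def store_as_of(conn, customer_id, as_of):
    """Return the store the customer was tied to on the given date, or None."""
    row = conn.execute(
        "SELECT store_id FROM customer_home_store "
        "WHERE customer_id = ? AND changed_at <= ? "
        "ORDER BY changed_at DESC LIMIT 1",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None
```

A static tie would drop `changed_at` and key on the anchor columns alone.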
Anchor Modelling: Attributes
● Model properties of an Anchor
● Static vs Historized
  – History tracked using date/time
● May or may not be Knotted
  – Knot holds set of valid attribute values
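Anchors and attributes can be sketched together: the anchor holds only a surrogate key, and each attribute lives in its own table. All names below are hypothetical. The example contrasts a static attribute with a historized one and shows a query that reassembles the latest values into one row per anchor.

```python
import sqlite3

def build_anchor_model():
    """Create one anchor with a static and a historized attribute, in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Anchor: the entity itself, only a surrogate key.
        CREATE TABLE customer_anchor (customer_id INTEGER PRIMARY KEY);
        -- Static attribute: one value per anchor row, no history.
        CREATE TABLE customer_birthdate (
            customer_id INTEGER PRIMARY KEY REFERENCES customer_anchor (customer_id),
            birthdate   TEXT NOT NULL
        );
        -- Historized attribute: the date/time is part of the key.
        CREATE TABLE customer_city (
            customer_id INTEGER NOT NULL REFERENCES customer_anchor (customer_id),
            changed_at  TEXT NOT NULL,
            city        TEXT NOT NULL,
            PRIMARY KEY (customer_id, changed_at)
        );
        INSERT INTO customer_anchor VALUES (1), (2);
        INSERT INTO customer_birthdate VALUES (1, '1980-05-17');
        INSERT INTO customer_city VALUES
            (1, '2010-01-01', 'Leiden'),
            (1, '2010-06-01', 'Utrecht');
    """)
    return conn

def latest_view(conn):
    """Reassemble one row per anchor with the most recent attribute values."""
    return conn.execute("""
        SELECT a.customer_id, b.birthdate, c.city
        FROM customer_anchor a
        LEFT JOIN customer_birthdate b ON b.customer_id = a.customer_id
        LEFT JOIN customer_city c
               ON c.customer_id = a.customer_id
              AND c.changed_at = (SELECT MAX(changed_at) FROM customer_city
                                  WHERE customer_id = a.customer_id)
        ORDER BY a.customer_id
    """).fetchall()
```

Adding a new attribute means adding a new table; existing tables and their data stay untouched, which is the "extensions, not modifications" point from the quote. A customer with no attribute rows still appears, with empty values.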
Anchor Modelling: Knots
● Reference table
  – Fairly small set of distinct values
● Dictionary lookup to qualify
  – Attributes
  – Ties
● “Knotted” Attributes and Ties
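A knotted attribute could look like the sketch below, where a film's rating is stored as a reference to a small knot table. The table and column names are assumptions. With SQLite's foreign key enforcement switched on, the knot also acts as the set of valid values.

```python
import sqlite3

def build_knotted_attribute():
    """Create an anchor, a knot and a knotted attribute, in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce references to the knot
    conn.executescript("""
        CREATE TABLE film_anchor (film_id INTEGER PRIMARY KEY);
        -- Knot: a small reference table of distinct values.
        CREATE TABLE rating_knot (
            rating_id INTEGER PRIMARY KEY,
            rating    TEXT NOT NULL UNIQUE
        );
        -- Knotted attribute: stores the knot key instead of the value itself.
        CREATE TABLE film_rating (
            film_id   INTEGER PRIMARY KEY REFERENCES film_anchor (film_id),
            rating_id INTEGER NOT NULL REFERENCES rating_knot (rating_id)
        );
        INSERT INTO film_anchor VALUES (1), (2);
        INSERT INTO rating_knot VALUES (1, 'G'), (2, 'PG');
        INSERT INTO film_rating VALUES (1, 2);
    """)
    return conn

def rating_of(conn, film_id):
    """Look the rating value up through the knot; None if the film has none."""
    row = conn.execute(
        """
        SELECT k.rating FROM film_rating r
        JOIN rating_knot k ON k.rating_id = r.rating_id
        WHERE r.film_id = ?
        """,
        (film_id,),
    ).fetchone()
    return row[0] if row else None
```

A knotted tie would work the same way, with the knot key added to the tie to qualify the type of association.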
Anchor Model Diagram
[Diagram legend: anchor, knot, static attribute, historized attribute, static tie, historized tie]
http://www.anchormodeling.com/modeler/latest/
Acknowledgements
● Kasper de Graaf
  – Twitter: @kdgraaf
  – http://www.dikw-academy.nl
● Jos van Dongen
  – Twitter: @josvandongen
  – http://www.tholis.com/