Data Vault RMOUG Training Days 2006 Colorado Convention Center Denver, Colorado February 15-16.

Page 1:

Data Vault

RMOUG Training Days 2006

Colorado Convention Center

Denver, Colorado February 15-16

Page 2:

Data Vault; What’s The Combination?

Jeff Meyer

Enterprise Data Integration – Oracle DBA

Department of Technology Services

Denver Public Schools

Page 3:

Data Vault

Who are we?
  DBAs
  Managers
  Analysts

Enterprise Data Warehouse Projects
  Currently in process
  Planned

Data Marts

Page 4:

Data Vault

Brief History and Revisit Some Definitions

Three Basic Building Blocks of the Data Vault

Advanced Features

Questions

Page 5:

Data Vault

Brief History and Revisit Some Definitions

Three Basic Building Blocks of the Data Vault

Advanced Features

Questions

Page 6:

Data Vault – Brief History and Revisit Some Definitions

1970 – Dr. E.F. Codd of IBM

1979 – First Working Relational Database by Relational Software Incorporated: Oracle v2

1991 – William H. Inmon published ‘Building the Data Warehouse’

Page 7:

Data Vault – Brief History and Revisit Some Definitions

Legacy System – ‘… any system that has been put into production.’ (paraphrased, W.H. Inmon)

Operational Data Store – ‘… a subject-oriented, integrated, volatile, current or near current collection of operational data.’ W.H. Inmon

Page 8:

Data Vault – Brief History and Revisit Some Definitions

Data Warehouse – ‘… a subject-oriented, integrated, time-variant, non-volatile collection of data designed for support of business decisions.’ W.H. Inmon

Data Vault – ‘… a detail-oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business.’ Dan Linstedt

Page 9:

Data Vault – Brief History and Revisit Some Definitions

Data Mart – ‘… a subset of a data warehouse, for use by a single department or function.’ www.e-formation.co.nz/glossary.asp

Corporate Information Factory – ‘… the framework that exists that surrounds the data warehouse; typically contains an ODS, a data warehouse, data marts, DSS applications, exploration warehouses, and so forth.’ W.H. Inmon

Page 10:

Data Vault – Brief History and Revisit Some Definitions

* Source: Bill Inmon and Claudia Imhoff

Page 11:

Data Vault – Why?

Why do we need it?
  We finally have a Data Model that will work for small, medium, or large business.
  Anyone building a Data Warehouse can use these techniques.
  We’ve got issues in constructing the data warehouse from 3rd normal form, or star schema form.
  There are inherent road blocks to each method that we must solve technically through our Data Model.

Page 12:

Data Vault

Brief History and Revisit Some Definitions

Three Basic Building Blocks of the Data Vault

Advanced Features

Questions

Page 13:

Data Vault – Three Basic Building Blocks

Hub – stand-alone table; list of unique business keys; used for business identification

Satellite – descriptive data; historical data; used for descriptive information for the HUB or LINK

Link – associative table; list of unique relationships between keys; used for relationships between HUBs and LINKs
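As a rough illustration of how the three building blocks relate, the sketch below creates two hubs, one satellite, and one link in an in-memory SQLite database from Python. All table and column names (hub_customer, hub_contact, sat_customer_name, lnk_customer_contact) are assumptions chosen for illustration, and SQLite merely stands in for whatever database is actually used.

```python
import sqlite3

# Hypothetical names throughout; this is a sketch, not the presenter's schema.
DDL = """
CREATE TABLE hub_customer (
    customer_id  INTEGER PRIMARY KEY,   -- surrogate primary key
    customer_num TEXT NOT NULL UNIQUE,  -- business key
    load_dts     TEXT NOT NULL,
    record_src   TEXT NOT NULL
);
CREATE TABLE hub_contact (
    contact_id  INTEGER PRIMARY KEY,
    contact_num TEXT NOT NULL UNIQUE,
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL
);
CREATE TABLE sat_customer_name (
    customer_id INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    load_dts    TEXT NOT NULL,          -- one row per change over time
    name        TEXT,
    record_src  TEXT NOT NULL,
    PRIMARY KEY (customer_id, load_dts)
);
CREATE TABLE lnk_customer_contact (
    customer_id INTEGER NOT NULL REFERENCES hub_customer(customer_id),
    contact_id  INTEGER NOT NULL REFERENCES hub_contact(contact_id),
    load_dts    TEXT NOT NULL,
    record_src  TEXT NOT NULL,
    PRIMARY KEY (customer_id, contact_id)  -- each relationship stored once
);
"""

def create_vault(conn):
    """Create the sketch tables and return their names in alphabetical order."""
    conn.executescript(DDL)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
tables = create_vault(conn)
print(tables)
```

The hub carries only the business key plus load metadata, the satellite hangs descriptive history off the hub key, and the link holds nothing but the pair of hub keys it relates.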

Page 14:

Data Vault – Three Basic Building Blocks

Preview

[Diagram: Hub Employees, Hub Schools, and Hub Students, with surrounding boxes labeled ELA, Name, EEOC, Dates, EEOC, Name, Shots, Addrs, Assign, and Enrollments.]

Page 15:

Data Vault – Three Basic Building Blocks

HUB

Structure: Primary Key; <Business Key>; Load DTS; Record Source

Sample Data Set “CUSTOMER”

ID  CUSTOMER #   LOAD DTS    RCRD SRC
1   ABC123456    10-12-2000  MANUFACT
2   ABC925_24FN  10-2-2000   CONTRACTS
3   DKEF         1-25-2000   CONTRACTS
4   KKO92854_dd  3-7-2000    CONTRACTS
5   LLOA_82J5J   6-4-2001    SALES
6   HUJI_BFIOQ   8-3-2001    SALES
7   PPRU_3259    2-2-2000    FINANCE
8   PAFJG2895    2-2-2000    CONTRACTS
9   929ABC2985   2-2-2000    CONTRACTS
10  93KFLLA      2-2-2000    CONTRACTS

A Hub is a list of unique business keys.
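One way to keep a hub a list of unique business keys is to insert only keys that have not been seen before. The sketch below shows that idea with SQLite from Python; the table name, the load_hub helper, the ISO date format, and the second delivery of ABC123456 are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_customer (
        id           INTEGER PRIMARY KEY,   -- surrogate key
        customer_num TEXT NOT NULL UNIQUE,  -- business key
        load_dts     TEXT NOT NULL,
        record_src   TEXT NOT NULL
    )
""")

def load_hub(conn, rows):
    """Insert (customer_num, load_dts, record_src) rows, skipping known keys.

    Returns the number of business keys that were actually added.
    """
    before = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer (customer_num, load_dts, record_src) "
        "VALUES (?, ?, ?)",
        rows,
    )
    after = conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0]
    return after - before

first = load_hub(conn, [
    ("ABC123456", "2000-10-12", "MANUFACT"),
    ("ABC925_24FN", "2000-10-02", "CONTRACTS"),
])
# Hypothetical second delivery: one key already in the hub, one new key.
second = load_hub(conn, [
    ("ABC123456", "2000-10-14", "CONTRACTS"),
    ("DKEF", "2000-01-25", "CONTRACTS"),
])
print(first, second)
```

Because the repeated key is ignored, the hub keeps the load date and record source from the first time the key arrived.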

Page 16:

Data Vault – Three Basic Building Blocks

SATELLITE

Structure: Primary Key; Load DTS; Detail Business Data; Aggregation Data; {Update User}; {Update DTS}; Record Source

Customer hub rows:

ID  CUSTOMER #   LOAD DTS    RCRD SRC
1   ABC123456    10-12-2000  MANUFACT
2   ABC925_24FN  10-2-2000   CONTRACTS

CUSTOMER NAME SATELLITE

CSID  LOAD DTS    NAME                          RCRD SRC
1     10-12-2000  ABC Suppliers                 MANUFACT
1     10-14-2000  ABC Suppliers, Inc            MANUFACT
1     10-31-2000  ABC Worldwide Suppliers, Inc  MANUFACT
1     12-2-2000   ABC DEF Incorporated          CONTRACTS
2     10-2-2000   WorldPart                     CONTRACTS
2     10-14-2000  Worldwide Suppliers Inc       CONTRACTS

A Satellite is a time-dimensional table housing detailed information about the hub’s business keys.
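The time dimension of a satellite can be sketched as an append-only history keyed by hub key and load date. The example below assumes one common loading convention that the slide itself does not describe: a new row is added only when the descriptive value differs from the latest stored one, and loads arrive in date order. Table and function names, ISO dates, and the repeated delivery on 2000-10-20 are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sat_customer_name (
        csid       INTEGER NOT NULL,  -- key of the parent hub row
        load_dts   TEXT NOT NULL,     -- ISO dates so text order matches time order
        name       TEXT NOT NULL,
        record_src TEXT NOT NULL,
        PRIMARY KEY (csid, load_dts)
    )
""")

def current_name(conn, csid):
    """Return the most recently loaded name for a hub key, or None."""
    row = conn.execute(
        "SELECT name FROM sat_customer_name "
        "WHERE csid = ? ORDER BY load_dts DESC LIMIT 1",
        (csid,),
    ).fetchone()
    return row[0] if row else None

def load_name(conn, csid, load_dts, name, record_src):
    """Append a satellite row only when the name differs from the latest one."""
    if current_name(conn, csid) == name:
        return False
    conn.execute(
        "INSERT INTO sat_customer_name VALUES (?, ?, ?, ?)",
        (csid, load_dts, name, record_src),
    )
    return True

load_name(conn, 1, "2000-10-12", "ABC Suppliers", "MANUFACT")
load_name(conn, 1, "2000-10-14", "ABC Suppliers, Inc", "MANUFACT")
# Hypothetical repeat delivery with no change: no new history row is written.
unchanged = load_name(conn, 1, "2000-10-20", "ABC Suppliers, Inc", "MANUFACT")
load_name(conn, 1, "2000-10-31", "ABC Worldwide Suppliers, Inc", "MANUFACT")

history = conn.execute(
    "SELECT load_dts, name FROM sat_customer_name WHERE csid = 1 ORDER BY load_dts"
).fetchall()
print(history)
```

Earlier rows are never updated, so every past name remains available together with the date it was loaded.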

Page 17:

Data Vault – Three Basic Building Blocks

[Diagram: Hub Employees with satellites ELA, Name, EEOC, and Dates.]

Employees HUB and some of its Satellites

Page 18:

Data Vault – Three Basic Building Blocks

LINK

Structure: Primary Key; Load DTS; Record Source

Customer hub rows:

ID  CUSTOMER #   LOAD DTS    RCRD SRC
1   ABC123456    10-12-2000  MANUFACT
2   ABC925_24FN  10-2-2000   CONTRACTS

Contact hub rows:

ID   CONTACT #  LOAD DTS    RCRD SRC
100  CONT212    10-14-2000  FINANCE
101  CONT259    10-14-2000  FINANCE

Link Table

CSID  CONTACT ID  LOAD DTS    RCRD SRC
1     100         10-14-2000  FINANCE
2     101         10-14-2000  FINANCE

A Link is an associative or intersection table, representing the connection of information between business elements.
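How a link connects two hubs can be sketched with a pair of joins. The example below uses SQLite from Python; the table names are assumptions, and the sample values follow one reading of the slide's customer, contact, and link rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (id INTEGER PRIMARY KEY, customer_num TEXT UNIQUE);
    CREATE TABLE hub_contact  (id INTEGER PRIMARY KEY, contact_num  TEXT UNIQUE);
    -- The link stores only hub keys plus load metadata.
    CREATE TABLE lnk_customer_contact (
        csid       INTEGER REFERENCES hub_customer(id),
        contact_id INTEGER REFERENCES hub_contact(id),
        load_dts   TEXT,
        record_src TEXT,
        PRIMARY KEY (csid, contact_id)
    );
    INSERT INTO hub_customer VALUES (1, 'ABC123456'), (2, 'ABC925_24FN');
    INSERT INTO hub_contact  VALUES (100, 'CONT212'), (101, 'CONT259');
    INSERT INTO lnk_customer_contact VALUES
        (1, 100, '2000-10-14', 'FINANCE'),
        (2, 101, '2000-10-14', 'FINANCE');
""")

# Resolve each relationship back to the business keys held in the hubs.
pairs = conn.execute("""
    SELECT cu.customer_num, co.contact_num
    FROM lnk_customer_contact AS l
    JOIN hub_customer AS cu ON cu.id = l.csid
    JOIN hub_contact  AS co ON co.id = l.contact_id
    ORDER BY cu.customer_num
""").fetchall()
print(pairs)
```

Because the relationship lives in its own table, a customer can gain a second contact later by adding a link row, without touching either hub.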

Page 19:

Data Vault – Three Basic Building Blocks

[Diagram: Hub Employees (satellites ELA, Name, EEOC, Dates) and Hub Schools (satellites Geo Cd, Addr, Floor, Bldg), each labeled "Hub and Satellites", connected by the Assign link and its satellite (Sat), labeled "Link and Satellites".]

Page 20:

Data Vault

Brief History and Revisit Some Definitions

Three Basic Building Blocks of the Data Vault

Advanced Features

Questions

Page 21:

Data Vault – Advanced Features

Point-In-Time – A structure which sustains integrity of joins across time to all the SATELLITES that are connected to the HUB or LINK.

Bridge – A single row table that contains the latest Load Date Time Stamp (DTS). Similar to Point-In-Time except it spans a subject-area or a schema.

User Grouping Link – The information provides the user with a customized view from a reporting standpoint and does not affect the underlying information.

Page 22:

Data Vault – Advanced Features

Point-In-Time (PIT)

A structure which sustains integrity of joins across time to all the satellites that are connected to the hub.

Structure: Hub Key; Load Date; {Sat Load DTS}; {Sat Load DTS}; {Rec Source}

Customer hub row:

ID  CUSTOMER #  LOAD DTS    RCRD SRC
1   ABC123456   10-12-2000  MANUFACT

Customer Name Satellite

CSID  LOAD DTS    NAME
1     10-31-2000  ABC Worldwide Suppliers, Inc
1     12-2-2000   ABC DEF Incorporated

Customer Address Satellite

CSID  LOAD DTS    ADDRESS
1     10-14-2000  123 World Dr
1     12-5-2000   123 World Drive

Point-In-Time table

CSID  LOAD DTS    NAME_LOAD_DTS  ADDRESS_LOAD_DTS
1     10-14-2000  10-14-2000     10-14-2000
1     10-31-2000  10-31-2000     10-14-2000
1     12-2-2000   12-2-2000      10-14-2000
1     12-5-2000   12-2-2000      12-5-2000
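The way a PIT table lines up satellite load dates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the method used in the presentation: the build_pit helper and its dictionary layout are hypothetical, dates are written in ISO form so they sort correctly as text, and the input dates are chosen to match the load dates in the slide's sample.

```python
from bisect import bisect_right

def build_pit(hub_key, satellites):
    """Return PIT rows for one hub key.

    satellites maps a satellite name to its sorted list of ISO load dates.
    Each row holds a snapshot date plus, per satellite, the latest load
    date on or before that snapshot (None if the satellite has no row yet).
    """
    snapshot_dates = sorted({d for dates in satellites.values() for d in dates})
    rows = []
    for snap in snapshot_dates:
        row = {"hub_key": hub_key, "load_dts": snap}
        for sat, dates in satellites.items():
            i = bisect_right(dates, snap)  # count of load dates <= snapshot
            row[sat + "_load_dts"] = dates[i - 1] if i else None
        rows.append(row)
    return rows

pit = build_pit(1, {
    "name": ["2000-10-14", "2000-10-31", "2000-12-02"],
    "address": ["2000-10-14", "2000-12-05"],
})
for row in pit:
    print(row)
```

With one row per snapshot, a query can join each satellite on an exact (hub key, load date) match instead of searching for the latest row at or before a given date.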

Page 23:

Data Vault – Advanced Features

Bridge

A single row table that contains the latest Load DTS with multiple columns. A Bridge is not a helper table.

Similar to a PIT Table except it spans or applies to a subject-area or schema. A PIT Table is HUB (LINK) and SATELLITE specific.

Page 24:

Data Vault – Advanced Features

User Grouping Link

Structure: Primary Key; Load DTS; Record Source

Grouping rows:

ID  Grouping Label   LOAD DTS    RCRD SRC
1   Big Customers    10-12-2000  EXCEL
2   Small Customers  10-2-2000   EXCEL

Customer hub rows:

ID   Customer #  LOAD DTS    RCRD SRC
100  ABC295882   10-14-2000  FINANCE
101  ABC-1       10-14-2000  FINANCE

User Grouping Link rows:

Grp#  Customer #  LOAD DTS    RCRD SRC
1     100         10-14-2000  EXCEL
1     101         10-14-2000  EXCEL

The User Grouping Link allows users to “state” how they want roll-ups to occur in situations where source data doesn’t exist.
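A roll-up driven by a user grouping link can be sketched as a join from the grouping labels through the link to the customer hub. The example below uses SQLite from Python; the table names are assumptions, and the sample values follow one reading of the slide's grouping, customer, and link rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (id INTEGER PRIMARY KEY, customer_num TEXT UNIQUE);
    CREATE TABLE hub_group    (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
    -- User-maintained link: which customer rolls up into which group.
    CREATE TABLE lnk_user_group (
        grp_id     INTEGER REFERENCES hub_group(id),
        csid       INTEGER REFERENCES hub_customer(id),
        load_dts   TEXT,
        record_src TEXT,
        PRIMARY KEY (grp_id, csid)
    );
    INSERT INTO hub_customer VALUES (100, 'ABC295882'), (101, 'ABC-1');
    INSERT INTO hub_group    VALUES (1, 'Big Customers'), (2, 'Small Customers');
    INSERT INTO lnk_user_group VALUES
        (1, 100, '2000-10-14', 'EXCEL'),
        (1, 101, '2000-10-14', 'EXCEL');
""")

# Count customers per user-defined group; groups with no members show 0.
rollup = conn.execute("""
    SELECT g.label, COUNT(l.csid)
    FROM hub_group AS g
    LEFT JOIN lnk_user_group AS l ON l.grp_id = g.id
    GROUP BY g.label
    ORDER BY g.label
""").fetchall()
print(rollup)
```

Regrouping a customer means changing rows in the link only; the customer hub and its satellites stay as loaded from the source.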

Page 25:

Data Vault – How is DPS using DV

Hub_Students: Student_ID, SIS_Code, Load_DTS, Rec_SRC

Hub_Schools: School_ID, School_Number, Load_DTS, Rec_SRC

Hub_Employees: Employee_ID, HR_Emp_ID, DPSID, Load_DTS, Rec_SRC

Lnk_School_Enrollments: Sch_Enr_ID, School_ID, Student_ID, Grade_Name, Load_DTS, Rec_SRC

Lnk_Teacher_Schools: Teacher_School_ID, School_ID, Employee_ID, Load_DTS, Rec_SRC

The direction of the arrows equates to crow’s feet.

Page 26:

Data Vault – Why is DPS using DV

  Storage considerations.
  Vertical partitioning of data (rate of change).
  All the FACTS all the TIME.
  Scalability and Extensibility.

Page 27:

Data Vault – What was not covered.

  How to apply Data Vault Modeling.
  Best practices.
  Lessons Learned.
  Dan Linstedt’s use of DECODE in determining changed data capture.
  Whose data is it?
  SLAs?
  The new regulations / compliance that will affect all of us.

Page 28:

Data Vault – Questions?

Page 29:

Data Vault - References

DATA VAULT OVERVIEW: THE NEXT EVOLUTION IN DATA MODELING. Dan Linstedt - Core Integration Partners, Inc. http://www.tdan.com/i021hy01.htm

DATA VAULT™ OVERVIEW: THE NEXT EVOLUTION IN DATA MODELING, SERIES 2. Dan Linstedt - Core Integration Partners, Inc. http://www.tdan.com/i023hy02.htm

DATA VAULT - SERIES 3: END-DATES AND BASIC JOINS. Dan Linstedt - Core Integration Partners. http://www.tdan.com/i024hy02.htm

DATA VAULT - SERIES 4: LINK TABLES. Dan Linstedt - Core Integration Partners. http://www.tdan.com/i027ht04.htm

DATA VAULT™ OVERVIEW: THE NEXT EVOLUTION IN DATA MODELING, SERIES 5 – LOADING TABLES. Dan Linstedt - Core Integration Partners. http://www.tdan.com/i027ht04.htm

Data Vault Modeling – Class Materials and Notes; copyright 2002-2003. Dan Linstedt – Core Integration Partners. http://www.coreintegration.com

Home of the Data Vault; www.danlinsedt.com

Audit the Data – or Else. Un-audited Data Access Puts Business at High Risk; Bloor, Robin and Baroudi, Carol; Lumigent, Inc.; copyright 2004

Page 30:

Data Vault – Contact Information

JEFFREY [email protected]

Page 31:

Data Vault

RMOUG Training Days 2006

Colorado Convention Center

Denver, Colorado February 15-16