Post on 27-Jan-2015
description
Data Management & Warehousing
PROCESS NEUTRAL DATA MODELLING
CONCEPTS
DAVID M WALKER
ETIS COMMUNITY GATHERING
13-14 NOVEMBER 2008 - BRUSSELS
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 1
14 November 2008
Agenda
•! The Issues With Conventional Data Warehouse
Data Models
•! Assumptions About The Data Model To Be
Constructed
•! Requirements Of A Data Warehouse Data Model
•! Constructing The Data Warehouse Data Model
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 2
14 November 2009
Data Management & Warehousing
THE ISSUES WITH
CONVENTIONAL DATA WAREHOUSE
DATA MODELS
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 3
14 November 2009
Issues
•! Data models take a long time to develop
•! Data models are expensive to change
–! Affects Source -> Data Warehouse ETL
–! Affects Data Warehouse -> Data Mart ETL
•! The design often reflects the first or largest
source system
–! This makes it difficult to add other systems
•! They often reflect current working practice
–! Making it difficult to change when the business does
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 4
14 November 2009
Issues
•! A struggle to keep up with rapidly changing
source system data models
•! Reference data is often not stored in a time
variant way
•! History is lost with data model changes
•! Queries directly on the data warehouse are
complex
•! Different rules apply to query each table
•! Different database platforms have different
needs © 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 5
14 November 2009
Data Management & Warehousing
ASSUMPTIONS ABOUT
THE DATA MODEL
TO BE CONSTRUCTED
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 6
14 November 2009
Assumptions
•! Used in data warehouse
–! Not in the operational systems or the data marts
–! Different style of modelling required
•! Users not going to query the data model
–! Users will query separate dependent data marts
•! Data will be extracted from the model to
populate the data marts by ETL tools
•! Data will be loaded into the model from the
source systems by ETL tools
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 7
14 November 2009
Assumptions
•! Direct updates will be prohibited
–! A separate application or applications will exist as a
surrogate source and ETL used to load the data
•! Not a ‘mixed mode’ database
–! Some parts using one data modelling convention and
other parts using another
–! This is bad practice with any modelling technique!
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 8
14 November 2009
Data Management & Warehousing
REQUIREMENTS OF A
DATA WAREHOUSE
DATA MODEL
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 9
14 November 2009
Requirements
•! Uses A Design Pattern
–! General reusable approaches and solutions to
commonly occurring problems that can be used in
many different situations
•! Convention Over Configuration
–! Decrease the number of decisions that designers /
developers need to make, gaining simplicity, without losing flexibility
–! Achieved by ensuring that tables and columns use a
standard structures, naming convention, etc. and are
populated and queried in a consistent fashion © 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 10
14 November 2008
Requirements
•! DRY (Don’t Repeat Yourself)
–! Reduce duplication because it:
•! Increases the difficulty of changing the model
•! Decreases the clarity of the model
•! Leads to opportunities for inconsistency
•! Static over a long period of time
–! No need to add or modify tables on a regular basis
–! Note: There is a difference between designed and
implemented, it is possible to have designed a table
but not to implement it until it is actually required
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 11
14 November 2008
Requirements
•! The data model should store data at the lowest
possible level
–! Information stored at the transaction level
–! Avoid the storage of aggregates
•! Supports the best use of platform specific
features without compromising the design
–! Where available supports:
•! Partitioning
•! Column Storage
•! Many Insert/Few Update strategies
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 12
14 November 2009
Requirements
•! Completely time-variant
–! It should be possible to reconstruct all information at
any point in time
•! Communication tool
–! Aids the refinement of requirements
–! Aids the explanation of possibilities
–! Develops confidence from the user
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 13
14 November 2009
Requirements
•! Uses Standard BI Relational Databases
–! Ensure that the solution can be deployed on any
current platform and, if necessary, re-deployed on a
future platform
•! Process Neutral
–! It will not reflect past, current or planned business
processes, practices or dependencies
–! Stores the data items and relationships as defined by
their use at the point in time when the information is
created and acquired
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 14
14 November 2008
CONSTRUCTING THE
DATA WAREHOUSE
DATA MODEL
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 15
14 November 2009
Data Management & Warehousing
Who is the customer?
•! Everyone has a
different definition
•! Everyone needs a
different information
•! Users have conflicting
definitions
•! Customer can be
individuals or
businesses
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 16
14 November 2009
More problems …
•! Some of the customers are suppliers as well
•! Some businesses have separate divisions that
have to be handled separately
•! Some customers interact with different divisions
within our organisation
•! Some individuals or organisations also perform
other roles
–! e.g. legal, re-sellers, partners, etc.
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 17
14 November 2009
The Party
•! These problem arises because the data is being
looked at in terms of current business process
•! In fact there is no customer entity, just different
types of party
–! Individuals, Organisations, Organisational Units
–! Concept of Party identical to that in contract law
•! The role of customer is defined not by the table
definition but by the usage of party data with
other information held (e.g. the purchase
transaction relating to a product) © 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 18
14 November 2009
Attributes of Party
•! The attributes of ‘Party’ will be those that remain
static over the life of the record
–! State ID Number, Name, Start Date, End Date
–! These attributes have ‘lifetime value’
•! Attributes that change need to be stored
elsewhere
•! The Party table needs to be categorised or typed
–! Individual, Organisation, Organisation Unit
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 19
14 November 2009
PARTIES Data Model
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 20
14 November 2009
PARTIES
•!PARTY_DWK
•!PARTY_ID •!PARTY_NAME
•!PARTY_START_DATE
•!PARTY_END_DATE
•!PARTY_TYPE_DWK PARTY_TYPES
•!PARTY_TYPE_DWK
•!PARTY_TYPE •!PARTY_TYPE_DESC
•!PARTY_TYPE_GROUP
•!PARTY_TYPE_START_DATE
•!PARTY_TYPE_END_DATE
Supporting Non-Lifetime Attributes
•! Need to add data for different Party Types
–! Marital Status for Individuals
–! Number of Children for Individuals
–! Number of Employees for Organisations
–! Turnover for Organisations
•! Need to add data that changes over the lifetime
of the party
–! Usually the same attributes that are needed for
different Party Types
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 21
14 November 2009
PARTY_PROPERTIES Data Model
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 22
14 November 2009
PARTIES
•!PARTY_DWK
•!PARTY_ID •!PARTY_NAME
•!PARTY_START_DATE
•!PARTY_END_DATE
•!PARTY_TYPE_DWK
PARTY_PROPERTY_TYPES
•!PARTY_PROPERTY_TYPE_DWK
•!PARTY_PROPERTY_TYPE •!PARTY_PROPERTY_TYPE_DESC
•!PARTY_PROPERTY_TYPE_GROUP
•!PARTY_PROPERTY_TYPE_START_DATE
•!PARTY_PROPERTY_TYPE_END_DATE
PARTY_PROPERTIES
•!PARTY_DWK
•!PARTY_PROPERTY_TYPE_DWK •!PARTY_START_DATE
•!PARTY_END_DATE
•!PARTY_PROPERTY_VALUE
Relationships between Parties
•! Parties have relationships
–! David Walker works in Professional Services
–! David Walker is employed by Data Management &
Warehousing
–! David Walker is married to Helen walker
•! This is known as a Peer-To-Peer relationship
•! This is the first place that we see a role defined
by a relationship
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 23
14 November 2009
PARTY_LINKS Data Model
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 24
14 November 2009
PARTIES
•!PARTY_DWK
•!PARTY_ID •!PARTY_NAME
•!PARTY_START_DATE
•!PARTY_END_DATE
•!PARTY_TYPE_DWK
PARTY_LINK_TYPES
•!PARTY_LINK_TYPE_DWK
•!PARTY_LINK_TYPE •!PARTY_LINK_TYPE_DESC
•!PARTY_LINK_TYPE_GROUP
•!PARTY_LINK_TYPE_START_DATE
•!PARTY_LINK_TYPE_END_DATE
PARTY_LINKS
•!PARTY_DWK
•!LINKED_PARTY_DWK •!PARTY_LINK_TYPE_DWK
•!PARTY_START_DATE
•!PARTY_END_DATE
Segments of Parties
•! Grouping Parties together because at some
point in time they shared characteristics
•! This is known as a Peer Group Relationship
•! Examples
–! Married people with two or more children
–! IT companies with less than <100 employees
•! Usually generated by analysis and the results
stored
•! Most commonly seen in market segmentation
type applications © 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 25
14 November 2009
PARTY_SEGMENTS Data Model
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 26
14 November 2009
PARTIES
•!PARTY_DWK
•!PARTY_ID •!PARTY_NAME
•!PARTY_START_DATE
•!PARTY_END_DATE
•!PARTY_TYPE_DWK
PARTY_SEGMENT_TYPES
•!PARTY_SEGMENT_TYPE_DWK
•!PARTY_SEGMENT_TYPE •!PARTY_SEGMENT_DESC
•!PARTY_SEGMENT_GROUP
•!PARTY_SEGMENT_START_DATE
•!PARTY_SEGMENT_END_DATE
PARTY_SEGMENTS
•!PARTY_DWK
•!PARTY_SEGMENT_TYPE_DWK •!PARTY_START_DATE
•!PARTY_END_DATE
Understanding The Conventions
•! All Type tables have the same format
–! Categorisation
•! All Property tables have the same format
–! Time Variant Attributes
•! All Link tables have the same format
–! Peer-To-Peer Relationships
•! All Segment tables have the same format
–! Peer Group Relationships
•! There are no other significant clusters of data about a
single entity such as Party
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 27
14 November 2009
Introducing Major Entities
•! Party is a Major Entity
–! These are entities that exist regardless of the
business process
–! It is the relationships between major entities that are
defined by business processes
–! Major Entity attributes differ from one another
•! All Organisations only need a finite number of
major entities including:
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 28
14 November 2009
–! Campaign
–! Account
–! Electronic Address
–! Asset
–! Channel
–! Contract
–! Geography
–! Product/Service
–! Calendar
Data Models For Other Major Entities
•! Geography
–! Geography Types
•! Postal Addresses, GPS Co-ordinates, ELR
–! Geographic Property Types
–! Geographic Properties
–! Geographic Link Types
–! Geographic Links
–! Geographic Segment Types
–! Geographic Segments
•! and so on for every major entity
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 29
14 November 2009
Ma
jor
En
tity
Sub-M
odel
Major Entity Sub Model
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 30
14 November 2009
MAJOR ENTITY
MAJOR ENTITY
TYPES
MAJOR ENTITY
PROPERTIES
MAJOR ENTITY
LINKS
MAJOR ENTITY
SEGMENTS
MAJOR ENTITY
PROPERTY TYPES
MAJOR ENTITY
LINK TYPES
MAJOR ENTITY
SEGMENT TYPES
Relationships Between Major Entities
•! Storing names with multiple addresses and
multiple electronic addresses (e-mail, telephone
numbers, etc.)
–! Billing, Contact, Home, Work, etc
•! Usage
–! Party -> Contract -> Account -> Electronic Address ->
A Number -> Usage
–! Party -> Contract -> Account -> Electronic Address ->
B Number -> Usage
–! Product/Service -> Tariff - Usage
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 31
14 November 2009
Party -> (Electronic) Addresses
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 32
14 November 2009
PARTY
ADDRESS
ELECTRONIC
ADDRESS
PARTY_
ADDRESS_
HISTORY
PARTY_
ELECTRONIC_ADDRESS_
HISTORY
PARTY_
ADDRESS_
HISTORY_ TYPES
PARTY_
ELECTRONIC_ADDRESS_
HISTORY_ TYPES
Party -> Usage (Simplified)
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 33
14 November 2009
PARTY CONTRACT ACCOUNT
ACCOUNT
ELECTRONIC ADDRESS
HISTORY & TYPE
USAGE HISTORY
ELECTRONIC
ADDRESS
TARIFF
PRODUCT
SERVICE PRODUCT SERVICE
TARIFF
HISTORY & TYPE
A Number
B Number
Extending the Data Model
•! Identify as many Major Entities as possible
–! But remember there are only a finite number so don’t
invent things for the sake of it
•! Define the standard sub-model around them
•! Put appropriate data in the sub-model
•! Create the relationships to _HISTORY tables for
the transaction the business wants to analyse
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 34
14 November 2009
Does this help meet requirements?
!!Uses A Design Pattern
!!Convention Over Configuration
!!DRY (Don’t Repeat Yourself)
!! Static over a long period of time
!! The data model should store data at the lowest possible
level
!! Supports the best use of platform specific features
without compromising the design
!!Completely time-variant
!!Communication tool
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 35
14 November 2009
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 36
14 November 2008
Some Key Elements
•! Self Similar modelling –! All _TYPE tables have the same structure, etc.
–! Naming conventions are consistent everywhere –! Easy to create standard algorithms for load and extraction
–! Easy to partition on type and/or date
•! Insert ‘heavy’ / Update ‘light’ –! Most ETL will result in an insert, there will be very few updates
•! Manages ‘Slowly Changing Dimensions’ –! Inherent in the Major Entity Sub-Model design
–! Significantly reduces overhead in the Data Mart build
•! Data Driven –! Types provide extensible metadata
–! Prevents un-necessary updating of the data model itself
•! Natural Star Schemas –! Histories will map to FACTS, –! Major Entity Collections will collapse into DIMENSIONS
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 37
14 November 2008
Is this all there is to it ?
•! At a high level – YES
•! BUT: –! There are methods for dealing with data quality
–! Special case methods for some lifetime attributes
•! e.g. Handling women changing their names at marriage
–! Insert/Update methods for performance
–! Design Patterns for implementation
–! Other detailed techniques
•! This talk could only ever be:
“An introduction to
Process Neutral Data Modelling”
Further Reading
•! Available From http://www.datamgmt.com
•! White Papers –! Overview Architecture for Enterprise Data Warehouses
•! March 2006 - 32 pages
–! Data Warehouse Documentation Roadmap
•! April 2007 – 28 pages
–! How Data Works •! June 2007 – 32 Pages
–! Data Warehouse Governance •! April 2007 – 24 Pages
–! Data Warehouse Project Management •! October 2008 – 32 Pages
–! Process Neutral Data Modelling
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 38
14 November 2009
© 2008 Data Management & Warehousing
David M Walker
ETIS Community Gathering, Brussels Page 39
14 November 2008
Data Management & Warehousing
Thank you !!
Website: http://www.datamgmt.com
Phone: +44 7050 028 911
E-mail: davidw@datamgmt.com
Skype/MSN: datamgmt