DW Concepts Dimension Modeling Techniques

  • 8/3/2019 DW Concepts Dimension Modeling Techniques

    www.technologica.com

    DW Concepts

    Dimension Modeling Techniques

    TechnoLogica DW Projects

    Business Management System, National Health Insurance Fund (10.2004 - current)

    Customer Data Integration, Allianz Bulgaria Holding (10.2004 - current)

    Regulatory Reporting System, BULBANK (2002 - 2003)

    Information System Monetary Statistics, Bulgarian National Bank (April 2003 - August 2004)

    Management Information System, BULBANK (January 2001 - June 2002)

    Agenda

    DW Terminology Overview

    Dimensional Modeling

    Dimension Types

    History and Dimensions

    Hierarchy in Dimensions

    The data warehouse must:

    Make an organization's information easily accessible.

    Present the organization's information consistently.

    Be adaptive and resilient to change.

    Be a secure bastion that protects our information assets.

    Serve as the foundation for improved decision making.

    The business community must accept the data warehouse if it is to be deemed successful.

    Components of a Data Warehouse

    Dimensional Modeling

    Dimensional modeling is a new name for an old technique for making databases simple and understandable.

    Dimensional modeling is quite different from third-normal-form (3NF) modeling.

    ERM (entity-relationship modeling) -> the transaction processing model

    o One table per entity

    o Minimize data redundancy

    o Optimized for updates

    DM (dimensional modeling) -> the data warehousing model

    o One fact table per business process in the organization

    o Maximize understandability

    o Optimized for retrieval

    o Resilient to change

    Star Dimensional Modeling

    [Diagram: an Order fact table (Item_nbr, Item_desc, Quantity, Discnt_price, Unit_price, Order_amount) at the center of a star, joined to the History, Customer, Product and Channel dimension tables]

    Four-Step Dimensional Design Process

    1. Select the business process to model.

    2. Declare the grain of the business process.

    3. Choose the dimensions that apply to each fact table row.

    4. Identify the numeric facts that will populate each fact table row.

    Dimensions

    Determined by the ways you want to slice and dice the data

    Small number of rows compared to facts

    Usually 5-10 dimensions surround a fact table

    Time is almost always a dimension, used by every fact table

    Track history

    Use surrogate keys

    Hierarchies are usually built into them where possible

    Date Dimension

    The date dimension is the one dimension nearly guaranteed to be in every data mart.

    The date dimension was previously called the time dimension.

    We can build the date dimension table in advance: even 10 years of days is only about 3,650 rows.
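    Because every row is known ahead of time, the table can be generated once by a small script. The Python sketch below shows one way to do it under stated assumptions: the column names, the YYYYMMDD key format and the fiscal year starting on 1 April are hypothetical choices for illustration, not part of the original design.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list:
    """Return one row (dict) per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # Meaningful, sequential key in YYYYMMDD form (assumed convention).
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "calendar_date": day.isoformat(),
            "day_of_week": day.strftime("%A"),
            "calendar_month": day.strftime("%B"),
            "calendar_year": day.year,
            # weekday() numbers Monday as 0, so 5 and 6 are Saturday and Sunday.
            "is_weekend": day.weekday() >= 5,
            # Hypothetical fiscal year starting on 1 April, labeled by its start year.
            "fiscal_year": day.year if day.month >= 4 else day.year - 1,
        })
        day += timedelta(days=1)
    return rows


dim = build_date_dimension(date(2005, 1, 1), date(2014, 12, 31))
print(len(dim))  # 3652: ten years, including the leap years 2008 and 2012
```

    Attributes such as holidays or seasons would be added as further columns, usually from a business-maintained calendar rather than computed.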

    Date Dimension

    Data warehouses always need an explicit date dimension table. Many date attributes are not supported by SQL date functions, including fiscal periods, seasons, holidays, and weekends.

    Rather than attempting to compute these nonstandard calendar attributes in a query, we should look them up in a date dimension table.

    select sum(f.amount_sold)
    from DATE_DIM d, FACT f
    where d.Calendar_Month = 'January'
      and d.id = f.date_dim_id;
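    To see the lookup in action, the sketch below loads a few invented rows into an in-memory SQLite database and runs the query through Python's standard `sqlite3` module. The table layouts, including the `is_weekend` flag, are assumptions chosen to match the column names used in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DATE_DIM (
        id             INTEGER PRIMARY KEY,  -- surrogate key
        calendar_date  TEXT,
        calendar_month TEXT,
        is_weekend     INTEGER               -- 1 = Saturday or Sunday
    );
    CREATE TABLE FACT (
        date_dim_id INTEGER REFERENCES DATE_DIM(id),
        amount_sold REAL
    );
    -- Invented sample rows; 1 January 2005 was a Saturday.
    INSERT INTO DATE_DIM VALUES
        (20050101, '2005-01-01', 'January', 1),
        (20050103, '2005-01-03', 'January', 0),
        (20050201, '2005-02-01', 'February', 0);
    INSERT INTO FACT VALUES (20050101, 10.0), (20050103, 25.0), (20050201, 40.0);
""")


def sales_where(condition: str) -> float:
    # The filter is a plain lookup on a date dimension attribute, so the query
    # needs no date arithmetic. condition is a trusted, hard-coded string here.
    sql = f"""
        select sum(f.amount_sold)
        from DATE_DIM d, FACT f
        where {condition}
          and d.id = f.date_dim_id
    """
    return conn.execute(sql).fetchone()[0]


print(sales_where("d.Calendar_Month = 'January'"))  # 35.0
print(sales_where("d.is_weekend = 1"))              # 10.0
```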

    Dimension Normalization (Denormalized Dimension)

    Dimension Normalization (Snowflaking)

    Dimension tables should remain physically flat.

    Normalized, snowflaked dimension tables penalize cross-attribute browsing and prohibit the use of bitmapped indexes.

    The disk space saved by normalizing the dimension tables is typically less than 1 percent of the total disk space needed for the overall schema.

    Too Many Dimensions

    A very large number of dimensions is typically a sign that several dimensions are not completely independent and should be combined into a single dimension.

    If our design has 25 or more dimensions, we should look for ways to combine correlated dimensions into a single dimension.

    It is a dimensional modeling mistake to represent elements of a hierarchy as separate dimensions in the fact table.

    Surrogate Keys

    Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys.

    You should avoid using the natural operational production codes. None of the data warehouse keys should be smart, where you can tell something about the row just by looking at the key.

    Surrogate Keys

    Surrogate keys are like an immunization for the data warehouse.

    They buffer the data warehouse environment from operational changes.

    Performance advantages: the smaller surrogate key translates into smaller fact tables, smaller fact table indexes, and more fact table rows per block input-output operation.

    Surrogate keys are used to record dimension conditions that may not have an operational code, such as No Promotion in Effect or Date Not Applicable.
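    A key map is the usual way to hand out surrogate keys during loading. The sketch below is a hypothetical helper, not a specific ETL tool's API. It assigns meaningless integers to operational codes and reserves key 0 (an assumed convention) for the "no operational code" case.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical helper that hands out meaningless integer keys.

    Operational codes stay in the dimension row as ordinary attributes;
    only the integer key is stored in the fact table.
    """

    NOT_APPLICABLE_KEY = 0  # reserved row, e.g. "No Promotion in Effect"

    def __init__(self) -> None:
        self._keys: dict = {}
        self._next = count(1)  # real rows are numbered from 1

    def key_for(self, natural_code) -> int:
        if natural_code is None:  # no operational code exists for this condition
            return self.NOT_APPLICABLE_KEY
        if natural_code not in self._keys:
            self._keys[natural_code] = next(self._next)
        return self._keys[natural_code]


promotions = SurrogateKeyMap()
source = [("PROMO-7", 120.0), (None, 80.0), ("PROMO-9", 50.0), ("PROMO-7", 30.0)]
fact_rows = [(promotions.key_for(code), amount) for code, amount in source]
print(fact_rows)  # [(1, 120.0), (0, 80.0), (2, 50.0), (1, 30.0)]
```

    If the operational system later recodes "PROMO-7", only the map entry changes. The fact rows keep key 1.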

    Surrogate Keys

    The date dimension is the one dimension where surrogate keys should be assigned in a meaningful, sequential order.

    Surrogate keys are needed to support one of the primary techniques for handling changes to dimension table attributes (the type 2 response).

    Don't use concatenated or compound keys for dimension tables.

    Data Warehouse Bus Architecture

    Data Warehouse Bus Matrix

    Conformed Dimensions

    Most dimensions are defined naturally at the most granular level possible.

    Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension.

    They have consistent dimension keys, consistent attribute column names, consistent attribute definitions, and consistent attribute values.

    A conformed dimension may be the same physical table within the database or may be duplicated synchronously in each data mart.

    Conformed Dimensions

    Roll-up dimensions conform to the base-level atomic dimension if they are a strict subset of that atomic dimension.
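    A month dimension derived from the daily date dimension is the classic roll-up. The sketch below builds one by copying attributes unchanged from the day rows, so column names and values stay consistent. The column names are assumptions for illustration.

```python
from datetime import date, timedelta


def build_day_dimension(start: date, end: date) -> list:
    """Base-level (atomic) date dimension: one row per day."""
    rows, day = [], start
    while day <= end:
        rows.append({"date_key": int(day.strftime("%Y%m%d")),
                     "month_name": day.strftime("%B"),
                     "calendar_year": day.year})
        day += timedelta(days=1)
    return rows


def roll_up_to_month(day_rows: list) -> list:
    """Month dimension built only from attributes of the day dimension.

    The month table uses a subset of the day table's columns, with
    identical names and values, and gets its own surrogate key.
    """
    months = {}
    for row in day_rows:
        natural_key = (row["calendar_year"], row["month_name"])
        if natural_key not in months:
            months[natural_key] = {"month_key": len(months) + 1,
                                   "month_name": row["month_name"],
                                   "calendar_year": row["calendar_year"]}
    return list(months.values())


days = build_day_dimension(date(2005, 1, 1), date(2005, 12, 31))
months = roll_up_to_month(days)
print(len(days), len(months))  # 365 12
```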

    Conformed Dimensions

    They should be built once in the staging area.

    They must be published prior to staging of the fact data.

    The dimension authority is responsible for defining, maintaining, and publishing a particular dimension or its subsets to all the data mart clients who need it.

    Tracking History in Dimensions

    Unchanging Dimensions

    Changing, but Original Values are Irrelevant: a phone number in a customer record

    Slowly Changing Dimensions (SCD): a customer address, a manager

    Rapidly Changing Dimensions: the income range of a customer

    Continuously Changing Dimensions: customer age

    Type 1: Overwrite the Value

    The type 1 response is easy to implement, but:

    it does not maintain any history of prior attribute values

    any preexisting aggregations based on the department value will need to be rebuilt
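    In SQL, a type 1 change is a plain UPDATE of the dimension row. The sketch below uses an in-memory SQLite database via Python's `sqlite3`. The product_dim layout is an assumption modeled on the IntelliKidz example in the type 2 table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY,  -- surrogate key
        sku_number  TEXT,                 -- natural (operational) key
        description TEXT,
        department  TEXT
    );
    INSERT INTO product_dim VALUES (12345, 'ABC922-Z', 'IntelliKidz 1.0', 'Education');
""")


def scd_type1(sku: str, new_department: str) -> None:
    # Overwrite in place: the surrogate key stays the same and the old
    # department value is gone, so all facts now report under the new value.
    conn.execute("UPDATE product_dim SET department = ? WHERE sku_number = ?",
                 (new_department, sku))


scd_type1("ABC922-Z", "Strategy")
print(conn.execute("SELECT product_key, department FROM product_dim").fetchall())
# [(12345, 'Strategy')]
```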

    Type 2: Add a Dimension Row

    The type 2 response is the primary technique for accurately tracking slowly changing dimension attributes. It is extremely powerful because the new dimension row automatically partitions history in the fact table.

    It is not suitable for dimension tables that already exceed a million rows, because every change adds another row.

    Type 2: Add a Dimension Row

    With effective and expiration dates:

    ProductKey  ProductDescription  Department  SKUNumber  EffectiveDate  ExpirationDate
    12345       IntelliKidz 1.0     Education   ABC922-Z   01.01.1900     22.04.2005
    25984       IntelliKidz 1.0     Strategy    ABC922-Z   23.04.2005     01.01.2500

    With a most recent flag:

    ProductKey  ProductDescription  Department  SKUNumber  EffectiveDate  MostRecentFlag
    12345       IntelliKidz 1.0     Education   ABC922-Z   01.01.1900     N
    25984       IntelliKidz 1.0     Strategy    ABC922-Z   23.04.2005     Y

    Fact table:

    ProductKey  DateKey  AmountSold
    12345       200      100
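    The change shown in these tables can be sketched as two SQL statements: expire the current row, then insert a new row with a new surrogate key. The sketch below keeps both the date columns and the most recent flag in one table, stores dates in ISO format so they compare correctly as text, and takes the new key as a parameter (a real load would draw it from a key generator). All of these are assumptions made for illustration.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_key     INTEGER PRIMARY KEY,  -- new surrogate key per version
        sku_number      TEXT,                 -- natural key, repeated across versions
        description     TEXT,
        department      TEXT,
        effective_date  TEXT,
        expiration_date TEXT,
        most_recent     TEXT                  -- 'Y' or 'N'
    );
    INSERT INTO product_dim VALUES (12345, 'ABC922-Z', 'IntelliKidz 1.0',
        'Education', '1900-01-01', '2500-01-01', 'Y');
""")

FAR_FUTURE = "2500-01-01"  # open-ended expiration date, as in the table above


def scd_type2(sku: str, new_department: str, change_date: date, new_key: int) -> None:
    """Expire the current version of the product and insert a new version."""
    current = conn.execute(
        "SELECT description FROM product_dim WHERE sku_number = ? AND most_recent = 'Y'",
        (sku,)).fetchone()
    expires = (change_date - timedelta(days=1)).isoformat()
    conn.execute(
        "UPDATE product_dim SET expiration_date = ?, most_recent = 'N' "
        "WHERE sku_number = ? AND most_recent = 'Y'", (expires, sku))
    conn.execute(
        "INSERT INTO product_dim VALUES (?, ?, ?, ?, ?, ?, 'Y')",
        (new_key, sku, current[0], new_department, change_date.isoformat(), FAR_FUTURE))


scd_type2("ABC922-Z", "Strategy", date(2005, 4, 23), new_key=25984)
for row in conn.execute("SELECT product_key, department, effective_date, "
                        "expiration_date, most_recent FROM product_dim ORDER BY product_key"):
    print(row)
# (12345, 'Education', '1900-01-01', '2005-04-22', 'N')
# (25984, 'Strategy', '2005-04-23', '2500-01-01', 'Y')
```

    Existing fact rows keep ProductKey 12345 and stay under Education. New facts are loaded with 25984 and fall under Strategy. This is the automatic partitioning of history described above.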

    Type 3: Add a Dimension Column

    The type 3 slowly changing dimension technique allows us to see new and historical fact data by either the new or the prior attribute values.
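    A type 3 change shifts the old value into a "prior" column and overwrites the current one, keeping the same surrogate key. The sketch below shows this in SQLite through Python's `sqlite3`. The prior_department and sales_fact names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_key      INTEGER PRIMARY KEY,
        sku_number       TEXT,
        department       TEXT,  -- current value
        prior_department TEXT   -- value before the most recent change
    );
    INSERT INTO product_dim VALUES (12345, 'ABC922-Z', 'Education', NULL);
    CREATE TABLE sales_fact (product_key INTEGER, amount_sold REAL);
    INSERT INTO sales_fact VALUES (12345, 100.0), (12345, 40.0);
""")


def scd_type3(sku: str, new_department: str) -> None:
    # prior_department is assigned first, so it receives the old value even on
    # engines that apply SET assignments left to right.
    conn.execute("UPDATE product_dim SET prior_department = department, department = ? "
                 "WHERE sku_number = ?", (new_department, sku))


def sales_by(column: str) -> list:
    # column is one of the two department columns (trusted, not user input).
    return conn.execute(
        f"SELECT d.{column}, SUM(f.amount_sold) FROM sales_fact f "
        f"JOIN product_dim d ON d.product_key = f.product_key "
        f"GROUP BY d.{column}").fetchall()


scd_type3("ABC922-Z", "Strategy")
print(sales_by("department"))        # [('Strategy', 140.0)]
print(sales_by("prior_department"))  # [('Education', 140.0)]
```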

    Hybrid SCD Techniques

    Series of Type 3 Attributes

    Predictable Changes with Multiple Version Overlays, supporting requirements such as:

    o Report each year's sales using the district map for that year.

    o Report each year's sales using a district map from an arbitrary different year.

    o Report an arbitrary span of years' sales using a single district map from any chosen year. The most common version of this requirement is to report the complete span of fact data using the current district map.

    Hybrid SCD Techniques: Type 2 with "Current" Overwrite

    Unpredictable Changes with Single-Version Overlay: preserves historical accuracy while supporting the ability to report historical data according to the current values.
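    The hybrid combines both responses. A type 2 row records the value in effect at the time, and a type 1 "current" column is overwritten on every row of the same product. The sketch below shows this with SQLite via Python's `sqlite3`. The historic_department and current_department column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (
        product_key         INTEGER PRIMARY KEY,
        sku_number          TEXT,
        historic_department TEXT,  -- type 2: value while this row was in effect
        current_department  TEXT   -- type 1: overwritten on every row
    );
    INSERT INTO product_dim VALUES (12345, 'ABC922-Z', 'Education', 'Education');
    CREATE TABLE sales_fact (product_key INTEGER, amount_sold REAL);
    INSERT INTO sales_fact VALUES (12345, 100.0);
""")


def change_department(sku: str, new_department: str, new_key: int) -> None:
    # Type 2 part: a new row carries the department from now on.
    conn.execute("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 (new_key, sku, new_department, new_department))
    # Type 1 part: overwrite the current value on all rows for this product.
    conn.execute("UPDATE product_dim SET current_department = ? WHERE sku_number = ?",
                 (new_department, sku))


def sales_by(column: str) -> list:
    # column is one of the two department columns (trusted, not user input).
    return conn.execute(
        f"SELECT d.{column}, SUM(f.amount_sold) FROM sales_fact f "
        f"JOIN product_dim d ON d.product_key = f.product_key "
        f"GROUP BY d.{column} ORDER BY d.{column}").fetchall()


change_department("ABC922-Z", "Strategy", new_key=25984)
conn.execute("INSERT INTO sales_fact VALUES (25984, 40.0)")  # a sale after the change
print(sales_by("historic_department"))  # [('Education', 100.0), ('Strategy', 40.0)]
print(sales_by("current_department"))   # [('Strategy', 140.0)]
```

    Grouping by the historic column preserves historical accuracy. Grouping by the current column restates all history under today's value.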

    Dimension Table Staging

    Junk Dimensions

    What to do with flags and indicators? The options are:

    o Leave the flags and indicators unchanged in the fact table row.

    o Make each flag and indicator into its own separate dimension.

    o Strip out all the flags and indicators from the design.

    A junk dimension is a convenient grouping of typically low-cardinality flags and indicators.

    Junk Dimensions

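    Whether to use a junk dimension depends on the number of possible combinations:

    o 5 indicators, each with 3 values -> 243 (3^5) rows

    o 5 indicators, each with 100 values -> 10 billion (100^5) rows

    When to insert rows in the dimension: load every combination in advance, or add a row only when a new combination is first encountered.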
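    Both loading options can be sketched in a few lines of Python. The five indicators and their values below are hypothetical. With 3 values each, the full cross product has the 243 rows computed above, while the incremental approach stores only the combinations that actually occur.

```python
from itertools import product

# Hypothetical low-cardinality indicators, each with 3 possible values.
INDICATORS = {
    "payment_type": ("Cash", "Card", "Voucher"),
    "order_channel": ("Web", "Phone", "Store"),
    "gift_wrap": ("Y", "N", "Unknown"),
    "rush_order": ("Y", "N", "Unknown"),
    "promo_applied": ("Y", "N", "Unknown"),
}


def precompute_junk_rows(indicators: dict) -> dict:
    """Option 1: load every combination up front (cross product of all values)."""
    combos = product(*indicators.values())
    return {combo: key for key, combo in enumerate(combos, start=1)}


class IncrementalJunkDimension:
    """Option 2: add a row only when a combination first appears in the source."""

    def __init__(self) -> None:
        self.keys: dict = {}

    def key_for(self, combo: tuple) -> int:
        if combo not in self.keys:
            self.keys[combo] = len(self.keys) + 1
        return self.keys[combo]


full = precompute_junk_rows(INDICATORS)
print(len(full))  # 243, i.e. 3**5

junk = IncrementalJunkDimension()
source_rows = [("Cash", "Web", "N", "N", "Y"),
               ("Card", "Store", "Y", "N", "N"),
               ("Cash", "Web", "N", "N", "Y")]
fact_keys = [junk.key_for(combo) for combo in source_rows]
print(fact_keys)  # [1, 2, 1]
```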

    Multiple Currencies

    Customer Dimension

    Critical element for effective CRM

    The most challenging dimension for any data warehouse:

    o extremely deep (with millions of rows)

    o extremely wide (with dozens or even hundreds of attributes)

    o sometimes subject to rather rapid change

    Customer Dimension: Name and Address Parsing

    Customer Dimension: Other Common Customer Attributes

    Gender

    Ethnicity

    Age or other life-stage classifications

    Income or other lifestyle classifications

    Status (for example, new, active, inactive, closed)

    Referring source

    Business-specific market segment

    Scores characterizing the customer, such as purchase behavior, payment behavior, product preferences

    Customer Dimension: Aggregated Facts as Attributes

    These attributes are to be used for constraining and labeling; they are not to be used in numeric calculations.

    Focus on those that will be used frequently.

    Minimize the frequency with which these attributes need to be updated.

    Replace metrics with more meaningful descriptive values, such as High Spender.
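    Turning an aggregated metric into a label can be sketched as a banding step run during the periodic dimension load. The thresholds and label set below are hypothetical. Real boundaries are a business decision.

```python
from collections import defaultdict


def spend_band(annual_spend: float) -> str:
    # Hypothetical thresholds; the real boundaries are a business decision.
    if annual_spend >= 10_000:
        return "High Spender"
    if annual_spend >= 1_000:
        return "Medium Spender"
    return "Low Spender"


def spend_attributes(sales) -> dict:
    """sales: iterable of (customer_key, amount) fact rows for the period.

    Returns {customer_key: label}, to be written into the customer dimension
    as a descriptive attribute for constraining and labeling.
    """
    totals = defaultdict(float)
    for customer_key, amount in sales:
        totals[customer_key] += amount
    return {key: spend_band(total) for key, total in totals.items()}


sales = [(1, 8_000.0), (1, 4_000.0), (2, 1_500.0), (3, 120.0)]
print(spend_attributes(sales))
# {1: 'High Spender', 2: 'Medium Spender', 3: 'Low Spender'}
```

    Coarse bands also serve the goal of infrequent updates: a customer's label changes only when the total crosses a boundary.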

    Dimension Outriggers for a Low-Cardinality Attribute Set

    Rapidly Changing Customer Dimensions

    Challenges:

    o It generally takes too long to constrain or browse among the relationships in such a big table.

    o It is difficult to use the previously described techniques for tracking changes in these large dimensions.

    One solution is to break off frequently analyzed or frequently changing attributes into a separate dimension, referred to as a minidimension.
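    A minidimension typically holds banded values, so it has one row per combination of bands rather than one per customer. The sketch below assumes hypothetical age and income bands and an invented customer key. It shows how a fact row would carry both the customer key and the demographics key.

```python
from itertools import product

# Hypothetical bands as (lower bound, label), checked from highest to lowest.
AGE_BANDS = [(51, "51+"), (36, "36-50"), (21, "21-35"), (0, "Under 21")]
INCOME_BANDS = [(100_000, "100k+"), (50_000, "50k-100k"), (0, "Under 50k")]


def band(value: float, bands: list) -> str:
    for lower_bound, label in bands:
        if value >= lower_bound:
            return label
    raise ValueError(f"no band covers {value}")


# The minidimension has one row per combination of bands, not one per customer.
DEMOGRAPHICS = {
    combo: key
    for key, combo in enumerate(
        product([label for _, label in AGE_BANDS],
                [label for _, label in INCOME_BANDS]),
        start=1)
}


def demographics_key(age: int, income: float) -> int:
    return DEMOGRAPHICS[(band(age, AGE_BANDS), band(income, INCOME_BANDS))]


# A fact row carries both keys: the customer and the customer's banded
# demographics at the time of the event (customer_key 42 is made up).
fact_row = {"customer_key": 42, "demographics_key": demographics_key(34, 62_000)}
print(len(DEMOGRAPHICS), fact_row)  # 12 {'customer_key': 42, 'demographics_key': 8}
```

    When the customer's income band changes, later fact rows simply get a different demographics key. The large customer dimension row is not touched.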

    Rapidly Changing Customer Dimensions

    The Mini Dimension with "Current" Overwrite

    Rapidly Changing Customer Dimensions

    The minidimension terminology applies when the demographics key is part of the fact table composite key.

    If the demographics key is a foreign key in the customer dimension, we refer to it as an outrigger.

    Rapidly Changing Customer Dimensions

    Type 2 with Natural Keys in Fact Table

    Customer Dimension - Current Attributes (SCD1): Customer ID (Natural Key), Customer Name, Customer Address, Customer Date of Birth, Customer Date of 1st Order, Age, Gender, Annual Income, Number of Children, Marital Status

    Customer Dimension - "As Was" Attributes (SCD2): Customer Key (PK), Customer ID (Natural Key), Customer Name, Customer Address, Customer Date of Birth, Customer Date of 1st Order, Age, Gender, Annual Income, Number of Children, Marital Status

    Fact Table: Customer Key (FK), Customer Demographics Key (FK), More Foreign Keys, Facts

    Implications of Type 2 Customer Dimension Changes

    Be careful to avoid overcounting, because we may have multiple rows in the customer dimension for the same individual. Remedies include:

    o COUNT DISTINCT on the customer's natural key

    o A most recent row indicator

    The comparison operators depend on the business rules used to set our effective/expiration dates.
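    The overcounting risk and both remedies can be seen on a three-row customer dimension. The sketch below uses SQLite via Python's `sqlite3` with invented data. It assumes the business rule that a row is valid from its effective date through its expiration date, both inclusive, which is why BETWEEN is the right comparison here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (
        customer_key        INTEGER PRIMARY KEY,  -- surrogate key
        customer_id         TEXT,                 -- natural key
        city                TEXT,
        row_effective_date  TEXT,
        row_expiration_date TEXT,
        most_recent         TEXT
    );
    -- Invented data: customer C-1 moved, so she has two type 2 rows.
    INSERT INTO customer_dim VALUES
        (1, 'C-1', 'Sofia',   '2003-01-01', '2004-06-30', 'N'),
        (2, 'C-1', 'Plovdiv', '2004-07-01', '9999-12-31', 'Y'),
        (3, 'C-2', 'Sofia',   '2003-01-01', '9999-12-31', 'Y');
""")


def scalar(sql: str) -> int:
    return conn.execute(sql).fetchone()[0]


rows = scalar("SELECT COUNT(*) FROM customer_dim")                             # overcounts
customers = scalar("SELECT COUNT(DISTINCT customer_id) FROM customer_dim")
current = scalar("SELECT COUNT(*) FROM customer_dim WHERE most_recent = 'Y'")
# Point-in-time count; BETWEEN is inclusive on both ends, matching the assumed rule.
sofia_end_2004 = scalar("""
    SELECT COUNT(*) FROM customer_dim
    WHERE city = 'Sofia'
      AND '2004-12-31' BETWEEN row_effective_date AND row_expiration_date""")
print(rows, customers, current, sofia_end_2004)  # 3 2 2 1
```

    If expiration dates were instead set equal to the next row's effective date, the point-in-time filter would need `>=` on one end and `<` on the other.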

    Customer Behavior Study Groups

    Capture the keys of the customers or products whose behavior you are tracking.

    Commercial Customer Hierarchies

    Bridge tables

    Commercial Customer Hierarchies

    Be aware of the risk of double counting.

    SELECT 'San Francisco', SUM(F.REVENUE)
    FROM FACT F, DATE_DIM D
    WHERE F.CUSTOMER_KEY IN
          (SELECT B.SUBSIDIARY_KEY
           FROM CUSTOMER C, BRIDGE B
           WHERE C.CUSTOMER_KEY = B.PARENT_KEY
             AND C.CUSTOMER_CITY = 'San Francisco') -- to sum all SF parents
      AND F.DATE_KEY = D.DATE_KEY
      AND D.MONTH = 'January 2002'
    GROUP BY 'San Francisco'
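    The reason for the IN subquery shows up when one subsidiary is reachable from two San Francisco parents. The sketch below builds a tiny hypothetical hierarchy (the company names are invented) in SQLite via Python's `sqlite3`. It assumes the bridge table has one row per (parent, subsidiary) pair, including a zero-depth row for each customer itself. It then compares the IN form with a plain join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMER (CUSTOMER_KEY INTEGER PRIMARY KEY, NAME TEXT, CUSTOMER_CITY TEXT);
    CREATE TABLE BRIDGE (PARENT_KEY INTEGER, SUBSIDIARY_KEY INTEGER, DEPTH INTEGER);
    CREATE TABLE DATE_DIM (DATE_KEY INTEGER PRIMARY KEY, MONTH TEXT);
    CREATE TABLE FACT (CUSTOMER_KEY INTEGER, DATE_KEY INTEGER, REVENUE REAL);

    -- Acme Holdings (SF) owns Acme West (SF) and Acme East (New York).
    INSERT INTO CUSTOMER VALUES (1, 'Acme Holdings', 'San Francisco'),
                                (2, 'Acme West', 'San Francisco'),
                                (3, 'Acme East', 'New York');
    -- One row per (ancestor, descendant) pair, including each customer itself.
    INSERT INTO BRIDGE VALUES (1, 1, 0), (1, 2, 1), (1, 3, 1), (2, 2, 0), (3, 3, 0);
    INSERT INTO DATE_DIM VALUES (1, 'January 2002');
    INSERT INTO FACT VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 1, 30.0);
""")

SF_SUBSIDIARIES = """SELECT B.SUBSIDIARY_KEY FROM CUSTOMER C, BRIDGE B
                     WHERE C.CUSTOMER_KEY = B.PARENT_KEY
                       AND C.CUSTOMER_CITY = 'San Francisco'"""


def revenue_with_in() -> float:
    # IN treats the subsidiary keys as a set, so Acme West is counted once.
    return conn.execute(f"""
        SELECT SUM(F.REVENUE) FROM FACT F, DATE_DIM D
        WHERE F.CUSTOMER_KEY IN ({SF_SUBSIDIARIES})
          AND F.DATE_KEY = D.DATE_KEY AND D.MONTH = 'January 2002'""").fetchone()[0]


def revenue_with_join() -> float:
    # A plain join repeats Acme West's fact once per San Francisco ancestor.
    return conn.execute("""
        SELECT SUM(F.REVENUE) FROM FACT F, DATE_DIM D, CUSTOMER C, BRIDGE B
        WHERE F.CUSTOMER_KEY = B.SUBSIDIARY_KEY AND C.CUSTOMER_KEY = B.PARENT_KEY
          AND C.CUSTOMER_CITY = 'San Francisco'
          AND F.DATE_KEY = D.DATE_KEY AND D.MONTH = 'January 2002'""").fetchone()[0]


print(revenue_with_in())    # 180.0
print(revenue_with_join())  # 230.0 (Acme West's 50.0 counted twice)
```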

    Heterogeneous Product Schemas

    Common Dimensional Modeling Mistakes to Avoid

    Mistake 10: Place text attributes used for constraining and grouping in a fact table

    Mistake 9: Limit verbose descriptive attributes in dimensions to save space

    Mistake 8: Split hierarchies and hierarchy levels into multiple dimensions

    Mistake 7: Ignore the need to track dimension attribute changes

    Mistake 6: Solve all query performance problems by adding more hardware

    Common Dimensional Modeling Mistakes to Avoid

    Mistake 5: Use operational or smart keys to join dimension tables to a fact table

    Mistake 4: Neglect to declare and then comply with the fact table's grain

    Mistake 3: Design the dimensional model based on a specific report

    Mistake 2: Expect users to query the lowest-level atomic data in a normalized format

    Mistake 1: Fail to conform facts and dimensions across separate fact tables

    Questions and Answers