SQL Saturday 46 Raleigh Sep 2010 - Dimensional Modeling

Post on 03-Dec-2015


Blog: www.Rafael-Salas.com

Email: rfsalas@yahoo.es

@RafSalas

About Rafael

DW/BI Professional – 12 years

SQL Server MVP – 4 years

Architect/Consultant @ Quaero, CSG Systems

Lives in Charlotte, NC

Quaero is Hiring! DB Engineer

5+ years of database support

Expertise in SQL Server 2005 and 2008 database environments is a must

Expertise in ETL, including SSIS packages, stored procedures and T-SQL

Ability to work directly and effectively with clients.

Experience working in complex production database environments

Experience implementing data hygiene and customer matching routines is a plus

Excellent written and verbal communication skills

Experience with scripting languages and XML is a plus

Rafael_Salas@CSGSystems.com

Agenda

The Stage: Kimball's Data Warehouse Lifecycle overview

Dimensional Modeling Basics

Dimensional Design Process: 4 steps

More About Dimension tables

More About Fact Tables

What To Expect?

Introduction to dimensional modeling concepts, terminology and design guidelines

Not an advanced dimensional modeling class

No demos, but lots of slides

Questions welcome at any time

The Stage: Kimball's DW Lifecycle

Kimball DW Lifecycle is one of the most popular data warehousing methodologies

First Lifecycle book published in 1996, latest in 2010

The dimensional model, or “star schema”, is today's dominant “theme” in the BI field

Kimball DW Lifecycle Fundamentals

Enterprise data warehouse framework

Business Driven

Iterative approach

Dimensional Model for data delivery

Intuitive DB model to end users

Fast query performance

Dimensional Modeling in the DW Lifecycle

Dimensional Modeling

Logical model design technique

Intuitive DB structures to end users

Fast query performance

Divides the world into

Facts

Dimensions

Also known as “Star Schema”

Reviewing Star Schema Benefits

Transforms normalized data into a simpler model

Delivers high-performance queries

SQL Server offers Star Join Query Optimization

Uses mature modeling techniques that are widely supported by many BI tools

Requires low maintenance as the data warehouse design evolves

Introducing the Star Schema

Facts

A measurement of a business event

Numeric values

Additive, semi-additive, non-additive

Normalized data structures
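
The difference between additive and semi-additive facts can be sketched with a few made-up account balances. This is only an illustration: the `balances` rows and the function name are assumptions, not data from the talk. A balance can be summed across accounts within one period, but across periods it is usually averaged (or the last value is taken) rather than summed.

```python
# Hypothetical periodic balances: (account, period, balance).
balances = [
    ("A", "2008-01", 100.0),
    ("B", "2008-01", 50.0),
    ("A", "2008-02", 120.0),
    ("B", "2008-02", 70.0),
]


def total_by_period(rows):
    """Sum balances across accounts within each period (a valid, additive direction)."""
    totals = {}
    for _account, period, balance in rows:
        totals[period] = totals.get(period, 0.0) + balance
    return totals


period_totals = total_by_period(balances)

# Across time the balance is semi-additive: average the period totals instead of summing them.
average_total = sum(period_totals.values()) / len(period_totals)
```

Summing `period_totals` over both months would count the same money twice, which is why the time dimension gets different treatment.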

Fact Table Anatomy

Dimension keys (FKs)

Facts

Dimensions

Context of the facts

Descriptive attributes

Who, what, where, when, how…

Query constraining and result set labeling

Denormalized data structures

e.g., Geography, Customer, Time, Product
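
As a rough sketch of how facts and dimensions fit together, the following builds a tiny star schema in an in-memory SQLite database. The table names, column names and rows are all hypothetical, and SQLite stands in for SQL Server only so the example is self-contained. The dimension attributes do both jobs named above: they constrain the query and they label the result set.

```python
import sqlite3

# Hypothetical star schema: one fact table with foreign keys to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_geography (
    geography_key INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE fact_sales (
    product_key   INTEGER REFERENCES dim_product(product_key),
    geography_key INTEGER REFERENCES dim_geography(geography_key),
    sales_amount  REAL  -- additive fact
);
INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO dim_geography VALUES
    (1, 'Raleigh', 'NC'), (2, 'Charlotte', 'NC'), (3, 'Atlanta', 'GA');
INSERT INTO fact_sales VALUES (1, 1, 500.0), (2, 1, 40.0), (1, 2, 750.0), (2, 3, 35.0);
""")

# WHERE uses a dimension attribute to constrain; GROUP BY uses one to label the result.
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p   ON p.product_key = f.product_key
    JOIN dim_geography g ON g.geography_key = f.geography_key
    WHERE g.state = 'NC'
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The fact table stays narrow (keys plus numbers) while the descriptive text lives in the dimensions.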

Dimension Denormalization

Denormalization of Customer

Before You Start Modeling

DW Bus Matrix

DW High level architecture

Dimensional Design Process: 4 steps

Business Requirements

• Bus Matrix

Data Reality

• Initial Data Profiling

Step 1: Choose the business process

Step 2: Declare the grain

Step 3: Identify Dimensions

Step 4: Identify Facts
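
One way to keep the four steps honest is to record each decision explicitly and check which one is still open. The sketch below is only an illustration: the `DimensionalDesign` class is hypothetical, the grain and dimension names are taken from the General Ledger example in these slides, and the business process wording and the measure name are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DimensionalDesign:
    """Holds the outcome of the four design steps for one fact table."""
    business_process: str = ""
    grain: str = ""
    dimensions: list = field(default_factory=list)
    facts: list = field(default_factory=list)

    def next_step(self):
        """Return the first step that is still undecided, or None when all are done."""
        steps = [
            ("Step 1: Choose the business process", self.business_process),
            ("Step 2: Declare the grain", self.grain),
            ("Step 3: Identify Dimensions", self.dimensions),
            ("Step 4: Identify Facts", self.facts),
        ]
        for name, value in steps:
            if not value:
                return name
        return None


design = DimensionalDesign(
    business_process="General Ledger journal entries",  # assumed wording
    grain="one row per General Ledger journal line",
    dimensions=["Applied Date", "Recorded Date", "P and L Unit",
                "Vendor", "Client", "GL Account Number"],
)
pending = design.next_step()

design.facts.append("journal_line_amount")  # hypothetical measure name
remaining = design.next_step()
```

Declaring the grain before listing dimensions and facts matters because every dimension and fact has to be true at that grain.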

High Level Dimensional Model

Grain = one row per General Ledger journal line

Diagram labels (legend: Fact, Dimension): Applied Date, P and L Unit, Vendor, Client, GL Account Number, Recorded Date, GL Journal Line, GL Transaction Detail

Diagram labels: GL Main Account, Period Ending Date, P and L Unit, Vendor, Client, GL Account Number, GL Balance

Grain = one row per GL Account per budget period

Detailed Dimensional Model

More About Dimensions

Surrogate Keys

Conformed Dimensions

Slowly Changing Dimensions (SCD)

Role-Playing Dimensions

Date Dimension

Surrogate keys

“A meaningless key, ideally integer number, to be used as the primary key of dimensions”

Better query performance

Creating row versioning is easier

No risk of key collision for multi-source DW

Avoid overhead of using transactional keys

Flexibility when inserting pre-defined rows
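
A minimal sketch of surrogate key assignment, assuming a simple in-memory lookup. The `SurrogateKeyMap` class, the source system names and the use of -1 for a pre-defined "unknown" row are illustrative assumptions. It shows two of the points above: the same natural key arriving from two sources does not collide, and a pre-defined row can be reserved outside the generated range.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out meaningless integer keys, one per (source system, natural key)."""

    UNKNOWN_KEY = -1  # assumed convention: pre-defined row for missing members

    def __init__(self):
        self._next_key = count(1)
        self._keys = {}

    def key_for(self, source, natural_key):
        if natural_key is None:
            return self.UNKNOWN_KEY
        pair = (source, natural_key)
        if pair not in self._keys:
            # First time this member is seen: generate a new surrogate key.
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]


keys = SurrogateKeyMap()
crm_key = keys.key_for("crm", "1001")
billing_key = keys.key_for("billing", "1001")  # same natural key, different source
repeat_key = keys.key_for("crm", "1001")       # already known, same key comes back
missing_key = keys.key_for("crm", None)
```

In a real warehouse the lookup would normally be the dimension table itself, with the key generated by an identity column or sequence.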

Conformed Dimensions

Shared dimensions across the enterprise

Deliver a consistent interpretation for all business processes involved

Allow drilling across fact tables

ETL work is done only once

Diagram labels, GL Transaction Detail: Applied Date, P and L Unit, Vendor, Client, GL Account Number, Recorded Date, GL Journal Line

Diagram labels, GL Balance: GL Main Account, Period Ending Date, P and L Unit, Vendor, Client, GL Account Number
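
Drilling across can be sketched as: aggregate each fact table separately, then line the results up on a shared dimension. In this illustration the two lists stand in for the GL Transaction Detail and GL Balance fact tables, GL Account Number plays the conformed dimension, and all amounts are made up (the balance rows are assumed to cover a single period).

```python
from collections import defaultdict

# Hypothetical fact rows: (gl_account_number, amount).
gl_transaction_detail = [("4000", 120.0), ("4000", 80.0), ("5000", 50.0)]
gl_balance = [("4000", 1000.0), ("5000", 300.0), ("6000", 75.0)]


def total_by_account(rows):
    """Aggregate one fact table to the grain of the shared dimension."""
    totals = defaultdict(float)
    for account, amount in rows:
        totals[account] += amount
    return dict(totals)


def drill_across(left, right):
    """Align two aggregated results on the conformed dimension value."""
    accounts = sorted(set(left) | set(right))
    return [(account, left.get(account), right.get(account)) for account in accounts]


report = drill_across(total_by_account(gl_transaction_detail),
                      total_by_account(gl_balance))
```

The alignment only works because both fact tables use the same account values with the same meaning, which is the point of conforming the dimension.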

Slowly Changing Dimensions (SCD)

How do the dimensions have to respond to data changes?

Common types:

SCD Type 1

SCD Type 2

SCD Type 3

SCD Type 6

Slowly Changing Dimensions (SCD) Type 1

Overwrite previous value

Best when tracking history is not required

1 row per natural key

Simplest approach for handling data changes

Insert…else…update

SQL Server 2008 T-SQL MERGE

SSIS SCD Transformation

Slowly Changing Dimensions (SCD) Type 1

Customer Dimension

Customer Key  Customer Code  Customer First Name  Customer Last Name  ETL Insert Date  ETL Update Date
12345         YFG-FDS        Jane                 Ross                02/24/2008

Last name changes

Customer Key  Customer Code  Customer First Name  Customer Last Name  ETL Insert Date  ETL Update Date
12345         YFG-FDS        Jane                 Smith               02/24/2008       09/09/2008

Existing row is updated!
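
The insert-else-update logic can be sketched in a few lines, replaying the Jane Ross example from the slide. The function name, the dictionary layout and the shortened column names are assumptions for illustration; in SQL Server the same effect would typically come from MERGE or the SSIS SCD transformation.

```python
from datetime import date
from itertools import count


def scd_type1_upsert(dimension, key_source, source_row, load_date):
    """Insert the member if it is new, otherwise overwrite changed attributes in place.

    `dimension` is a dict keyed by the natural key (customer code).
    """
    code = source_row["customer_code"]
    existing = dimension.get(code)
    if existing is None:
        dimension[code] = {
            "customer_key": next(key_source),  # surrogate key, assigned on insert only
            **source_row,
            "etl_insert_date": load_date,
            "etl_update_date": None,
        }
        return "inserted"
    if any(existing[column] != value for column, value in source_row.items()):
        existing.update(source_row)  # history is lost: old values are overwritten
        existing["etl_update_date"] = load_date
        return "updated"
    return "unchanged"


customer_dim = {}
customer_keys = count(12345)
first_load = scd_type1_upsert(
    customer_dim, customer_keys,
    {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Ross"},
    date(2008, 2, 24))
second_load = scd_type1_upsert(
    customer_dim, customer_keys,
    {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Smith"},
    date(2008, 9, 9))
```

After the second load there is still exactly one row for YFG-FDS, and nothing records that the last name was ever Ross.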

Slowly Changing Dimensions (SCD) Type 2

Insert a new row

Best for tracking changes in attribute values

Use effective dates to represent row lifespan

If the row does not exist, insert it; else expire the current version and insert a new one.

Slowly Changing Dimensions (SCD) Type 2

Customer Dimension

Customer Key  Customer Code  Customer First Name  Customer Last Name  Start Date  End Date    Current Row
12345         YFG-FDS        Jane                 Ross                02/24/2008  12/31/2099  Y

Last name change

Customer Key  Customer Code  Customer First Name  Customer Last Name  Start Date  End Date    Current Row
12345         YFG-FDS        Jane                 Ross                02/24/2008  09/08/2008  N
67843         YFG-FDS        Jane                 Smith               09/09/2008  12/31/2099  Y

Existing row is expired!

A new row is inserted!
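
The expire-and-insert logic can be sketched as follows, again replaying the example rows from the slide. The function name, the list-of-dicts layout and the shortened column names are illustrative assumptions; the 12/31/2099 open end date and the Y/N current-row flag follow the slide.

```python
from datetime import date, timedelta

OPEN_END_DATE = date(2099, 12, 31)  # end date used for the current version of a row


def scd_type2_apply(rows, key_source, source_row, change_date):
    """Expire the current version (if it changed) and append a new version."""
    code = source_row["customer_code"]
    current = next(
        (r for r in rows if r["customer_code"] == code and r["current_row"] == "Y"),
        None)
    if current is not None:
        if all(current[column] == value for column, value in source_row.items()):
            return  # no attribute changed, keep the current version
        current["end_date"] = change_date - timedelta(days=1)
        current["current_row"] = "N"
    rows.append({
        "customer_key": next(key_source),  # each version gets its own surrogate key
        **source_row,
        "start_date": change_date,
        "end_date": OPEN_END_DATE,
        "current_row": "Y",
    })


customer_dim = []
customer_keys = iter([12345, 67843])  # key values copied from the slide
scd_type2_apply(customer_dim, customer_keys,
                {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Ross"},
                date(2008, 2, 24))
scd_type2_apply(customer_dim, customer_keys,
                {"customer_code": "YFG-FDS", "first_name": "Jane", "last_name": "Smith"},
                date(2008, 9, 9))
```

Facts loaded before 09/09/2008 keep pointing at key 12345 (Ross) and later facts pick up 67843 (Smith), which is how history is preserved.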

Role-Playing Dimensions

Same physical dimension plays distinct logical roles in a fact table

Implemented through views or query aliases

Date Dimension playing 4 roles
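
A small sketch of the view-based approach, using an in-memory SQLite database so it runs on its own. Only two roles (order date and ship date) are shown rather than four, and every table, view and column name here is a hypothetical choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical date dimension.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month_name TEXT);
INSERT INTO dim_date VALUES
    (20081011, '2008-10-11', 'October'),
    (20081103, '2008-11-03', 'November');

-- One view per role, with role-specific column names.
CREATE VIEW dim_order_date AS
    SELECT date_key AS order_date_key, month_name AS order_month FROM dim_date;
CREATE VIEW dim_ship_date AS
    SELECT date_key AS ship_date_key, month_name AS ship_month FROM dim_date;

-- The fact table carries one foreign key per role.
CREATE TABLE fact_orders (order_date_key INTEGER, ship_date_key INTEGER, order_amount REAL);
INSERT INTO fact_orders VALUES (20081011, 20081103, 250.0);
""")

orders_with_roles = conn.execute("""
    SELECT o.order_month, s.ship_month, f.order_amount
    FROM fact_orders f
    JOIN dim_order_date o ON o.order_date_key = f.order_date_key
    JOIN dim_ship_date s  ON s.ship_date_key = f.ship_date_key
""").fetchall()
```

Joining `dim_date` twice under two table aliases would give the same result; views mainly help when the BI tool needs distinct, self-describing column names.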

Date Dimension

Grain should not be lower than daily

Hour: 8,760 rows per year

Minute: 525,600 rows per year

Second: way too many…

Surrogate key rule exception: intelligent key is recommended (integer value: 20081011)

Time of day, if required, in fact table (most cases)
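
The intelligent date key is simple to derive: year, month and day packed into one integer. The sketch below generates a daily-grain date dimension for 2008; the function names and the handful of attributes are assumptions, and a real date dimension would carry many more columns (fiscal periods, holidays and so on).

```python
from datetime import date, timedelta


def date_key(d):
    """Pack a date into a YYYYMMDD integer, e.g. 2008-10-11 becomes 20081011."""
    return d.year * 10000 + d.month * 100 + d.day


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": date_key(current),
            "calendar_date": current,
            "year": current.year,
            "month": current.month,
            "iso_weekday": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2008, 1, 1), date(2008, 12, 31))
```

At daily grain a full year is only a few hundred rows, so decades of dates fit comfortably in one small table.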

More about Facts

3 types of fact tables:

Transaction

Periodic snapshot

Accumulating snapshot

Transaction Fact Tables

Record events at a point in time

Represent transaction activity

The most common type of fact table

Only inserts (most cases)

Store facts at the most atomic level possible

Periodic Snapshot Fact Tables

“Snapshots” taken on a regular basis, regardless of activity

Store 1 row per time period

Complement transaction fact tables

Only inserts (most cases)

Accumulating Snapshot Fact Tables

Captures activity for processes with defined beginning and end

1 row per event lifetime

Fact row is updated at each milestone

Least frequently used Fact table type

Accumulating Snapshot Fact Tables

Time  Action  Appl. Key  Start Date  Complete Date  Transm. Date  Process Date
T1    Insert  1          20080215    -1             -1            -1
T2    Update  1          20080215    20080217       20080217      -1
T3    Update  1          20080215    20080217       20080217      20080219
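
The insert-then-update pattern can be sketched by replaying the three points in time from the slide. The function names, the dictionary layout and the spelled-out column names are assumptions; the -1 placeholder for milestones that have not happened yet follows the slide.

```python
NOT_YET = -1  # placeholder date key for a milestone that has not been reached


def open_application(fact_table, application_key, start_date_key):
    """T1: insert the single row that will track this application's lifetime."""
    fact_table[application_key] = {
        "start_date": start_date_key,
        "complete_date": NOT_YET,
        "transmit_date": NOT_YET,
        "process_date": NOT_YET,
    }


def record_milestones(fact_table, application_key, **milestones):
    """T2, T3, ...: update the existing row as each milestone is reached."""
    row = fact_table[application_key]
    for name, date_key in milestones.items():
        if name not in row:
            raise KeyError(f"unknown milestone: {name}")
        row[name] = date_key


applications = {}
open_application(applications, 1, 20080215)                                      # T1
record_milestones(applications, 1, complete_date=20080217, transmit_date=20080217)  # T2
record_milestones(applications, 1, process_date=20080219)                        # T3
```

Unlike the other two fact table types, the row count does not grow with each event: one application stays one row from start to finish.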

Dimensional Modeling Myths

It fits only as departmental solution

Limited extensibility potential

It only provides aggregated data

It only supports many-to-one relationships

It is a waste of disk space

Risks

High Profile: success (and failure!) is visible to management

Business Driven: hard for technologists

Technology Focus: “Let's build it and users will come”

Dashboards: not a good starting point

Data quality and integration

Complexity: tackling too much at once

SQL Server and Dimensional Modeling

SSAS

SSIS

SCD transformation (ETL)

Relational Engine

T-SQL MERGE (ETL)

Star join optimization (query performance)

Want to learn more?

Kimball Method:

The Data Warehouse Lifecycle Toolkit. 2nd edition. 2008

Dimensional modeling advanced techniques:

The Data Warehouse Toolkit. 2nd edition. 2002

SQL Server 2008 BI/DW:

www.microsoft.com/bi/
