Bi dimension modelling basics

Introduce Microsoft BI & Basics of Dimension Modelling Parikshit Savjani


Page 1: Bi   dimension modelling basics

Introduce Microsoft BI & Basics of Dimension Modelling

Parikshit Savjani

Page 2: Bi   dimension modelling basics

Know the Presenter

Parikshit Savjani is a Premier Field Engineer at Microsoft specializing in SQL Server and Business Intelligence (SSAS, SSIS and SSRS). His role involves consulting, performance tuning, and delivering workshops and chalk talks to Microsoft Premier customers. He has 4.5 years of experience with Microsoft and SQL Server. He contributes to the community by blogging what he learns at www.sqlserverfaq.net and on MSDN Blogs.

Page 3: Bi   dimension modelling basics

Agenda

Introduce Microsoft BI & Basics of Dimension Modelling

Page 4: Bi   dimension modelling basics

What is Business Intelligence?

• BI is a process that allows business analysts to make informed decisions better and faster.
• Data warehousing is the process of consolidating data from disparate data sources to facilitate BI.
• Dimension modeling is the data modeling principle used to architect the data warehouse to support BI.

Page 5: Bi   dimension modelling basics

BI Overview

Enterprise BI
• Comprehensive view of corporate data
• Dedicated IT staff
• Large volumes of data
• Complex business logic
• Complex security

Team BI
• Created and managed by a team of information workers
• Multi-user, but not corporate level
• Variable security requirements
• Consistency of data and terms
• Reduced data volumes
• Fewer users
• Monitored by IT staff

Personal BI
• Built and managed by information workers/analysts
• Uses familiar tools (Excel)
• Models evolve dynamically
• Data owned by information workers
• Variable data sources
• Small data volumes
• Single user

Page 6: Bi   dimension modelling basics

Enterprise BI Solution

Page 7: Bi   dimension modelling basics

Microsoft BI Stack

Page 8: Bi   dimension modelling basics

Data Modeling Concepts

Normalization Principles

• 1st Normal Form: every row should be uniquely identified by a primary key (PK), and there should be no repeating groups of columns.
• 2nd Normal Form: with a composite primary key, no non-key attribute should depend on only part of the key (no partial dependency).
• 3rd Normal Form: every non-key attribute should depend only on the key, not on another non-key attribute.
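The dependency rules above can be checked against sample data. The following is a minimal Python sketch, not part of the original slides: the helper `depends_on` and the sample rows are hypothetical, and only the column names are borrowed from the demo tables. A check on sample rows can disprove a dependency but cannot prove one for all possible data.

```python
def depends_on(rows, determinant, dependent):
    """Return True if the determinant columns consistently determine
    the dependent column in the given sample rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = row[dependent]
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical order lines with composite key (ORDER_NUMBER, PRODUCTID)
order_lines = [
    {"ORDER_NUMBER": 1, "PRODUCTID": 10, "PRODUCTNAME": "Bike", "QUANTITY": 2},
    {"ORDER_NUMBER": 1, "PRODUCTID": 11, "PRODUCTNAME": "Helmet", "QUANTITY": 1},
    {"ORDER_NUMBER": 2, "PRODUCTID": 10, "PRODUCTNAME": "Bike", "QUANTITY": 5},
]

# PRODUCTNAME is determined by PRODUCTID alone: a partial dependency (breaks 2NF)
partial = depends_on(order_lines, ["PRODUCTID"], "PRODUCTNAME")
# QUANTITY is not determined by PRODUCTID alone; it needs the whole key
quantity_partial = depends_on(order_lines, ["PRODUCTID"], "QUANTITY")
quantity_full = depends_on(order_lines, ["ORDER_NUMBER", "PRODUCTID"], "QUANTITY")
```

Here `partial` comes out true, which is the signal to move PRODUCTNAME into its own product table, while QUANTITY stays with the order lines.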

Page 9: Bi   dimension modelling basics

Data Modeling Demo

Page 10: Bi   dimension modelling basics

Single flat table:
ORDER NUMBER, CUSTOMER ID, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY, EMPLOYEEID, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE, PRODUCTID, PRODUCTNAME, PRODUCTCATEGORY, MODELID, MODEL, VENDORID, VENDOR, UNITPRICE, QUANTITY, DISCOUNT, SALESAMOUNT

After splitting order lines from the order:
Table 1: ORDER NUMBER, CUSTOMER ID, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY, EMPLOYEEID, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE
Table 2: PRODUCTID, ORDER NUMBER, PRODUCTNAME, PRODUCTCATEGORY, MODELID, MODEL, VENDORID, VENDOR, UNITPRICE, QUANTITY, DISCOUNT, SALESAMOUNT

After splitting product attributes from the order lines:
Table 2a: PRODUCTID, PRODUCTNAME, PRODUCTCATEGORY, UNIT PRICE, MODELID, MODEL, VENDORID, VENDOR
Table 2b: PRODUCTID, ORDER NUMBER, QUANTITY, DISCOUNT, SALESAMOUNT

Page 11: Bi   dimension modelling basics

3rd Normal Form OLTP Design

ORDER MASTER: ORDER NUMBER, CUSTOMER ID, EMPLOYEEID
ORDER TRANSACTIONS: PRODUCTID, ORDER NUMBER, QUANTITY, DISCOUNT, SALESAMOUNT
PRODUCT MASTER: PRODUCTID, PRODUCTNAME, PRODUCTCATEGORY, UNIT PRICE, MODELID, VENDORID
PRODUCT MODEL MASTER: MODELID, MODELNAME
VENDOR MASTER: VENDORID, VENDOR
EMPLOYEE MASTER: EMPLOYEEID, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE
CUSTOMER MASTER: CUSTOMER ID, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY

Page 12: Bi   dimension modelling basics

Dimension Modeling Demo

Page 13: Bi   dimension modelling basics

DIMENSION MODEL

FACT SALES: PRODUCTID, CUSTOMERID, EMPLOYEEID, QUANTITY, DISCOUNT, SALESAMOUNT, ORDER NUMBER
PRODUCT DIMENSION: PRODUCTID, PRODUCTBUSINESSKEY, PRODUCTNAME, PRODUCTCATEGORY, SIZE, COLOR, UNIT PRICE, MODEL, VENDOR
CUSTOMER DIMENSION: CUSTOMER ID, CUSTOMERBUSINESSKEY, CUSTOMER NAME, CUSTOMER CITY, CUSTOMER STATE, COUNTRY
EMPLOYEE DIMENSION: EMPLOYEEID, EMPLOYEEBUSINESSKEY, EMPLOYEE NAME, EMPLOYEE EMAIL, EMPLOYEE PHONE

Page 14: Bi   dimension modelling basics

Dimension Modeling Concepts

Dimension Tables
• Provide context to slice the data.
• Map to the master tables of the OLTP system.
• Should be denormalized and in 1st normal form.
• Are wide in nature and comparatively shallow next to fact tables.
• Include as many columns as you can think of.
• Are related only to the fact table and should otherwise be unrelated to each other.

Fact Tables
• Hold the measures of interest.
• Map to the transactional tables of the OLTP system.
• Are in 3rd normal form.
• Are narrow in nature.
• Are very deep: they contain a row for every transaction.
• Are aggregated in the context of the dimensions.
• Consist of key columns and measure columns.

Page 15: Bi   dimension modelling basics

Star Schema

A star schema contains a fact table and one or more dimension tables.

1. Fact table: the central fact table stores the numeric facts (measures) such as sales dollars, costs and unit sales.
2. Dimension tables: they surround the central fact table and store descriptive information about the measures.

The shape looks like a star.
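As an illustration of the star shape, here is a small sketch using Python's built-in `sqlite3` module. It is not from the slides: the table names `DimProduct`, `DimCustomer` and `FactSales`, the trimmed column lists and the sample rows are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (ProductID INTEGER PRIMARY KEY, ProductName TEXT, ProductCategory TEXT);
CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Country TEXT);
-- The fact table holds only dimension keys and numeric measures
CREATE TABLE FactSales (
    ProductID INTEGER REFERENCES DimProduct(ProductID),
    CustomerID INTEGER REFERENCES DimCustomer(CustomerID),
    Quantity INTEGER,
    SalesAmount REAL
);
INSERT INTO DimProduct VALUES (1, 'Road Bike', 'Bikes'), (2, 'Helmet', 'Accessories');
INSERT INTO DimCustomer VALUES (1, 'Asha', 'India'), (2, 'Ben', 'USA');
INSERT INTO FactSales VALUES (1, 1, 1, 900.0), (2, 1, 2, 60.0), (1, 2, 1, 950.0);
""")

# Slice the SalesAmount measure by a descriptive attribute of the product dimension
sales_by_category = conn.execute("""
    SELECT p.ProductCategory, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN DimProduct AS p ON p.ProductID = f.ProductID
    GROUP BY p.ProductCategory
    ORDER BY p.ProductCategory
""").fetchall()
```

Each analytical query follows the same pattern: join the fact table to one or more dimensions, group by dimension attributes, and aggregate the measures.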

Page 16: Bi   dimension modelling basics

SnowFlake Schema


Page 17: Bi   dimension modelling basics

Dimension Modelling - Caveat

• If there are m dimensions and each dimension has n rows, the theoretical size of the cube is n^m cells (the product of the dimension sizes).

• Adding one redundant dimension therefore multiplies the size of the cube by that dimension's row count.
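A quick arithmetic sketch of the caveat, assuming every member of each dimension can combine with every member of the others, so that the cell count is the product of the dimension sizes (n^m for m dimensions of n rows each). The function name and the sizes are hypothetical.

```python
from math import prod

def theoretical_cube_cells(dimension_sizes):
    """Cell count when every dimension member combines with every other."""
    return prod(dimension_sizes)

# Three dimensions of 100 rows each
three_dims = theoretical_cube_cells([100, 100, 100])
# Adding one more 100-row dimension multiplies the cell count by 100
four_dims = theoretical_cube_cells([100, 100, 100, 100])
```

Real cubes are usually sparse, so the stored size is far smaller than this upper bound, but the multiplicative growth is why redundant dimensions are costly.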

Page 18: Bi   dimension modelling basics

Dimension Modeling Designs

• Conformed Dimensions
• Reference Dimensions
• Role Playing Dimensions
• Parent Child Dimension
• Many to Many Dimensions
• Slowly Changing Dimensions
• Degenerate Dimensions/Fact Dimensions
• Factless Fact

Page 19: Bi   dimension modelling basics

Conformed Dimensions

• A conformed dimension is a dimension that has exactly the same meaning and content when referred to from different fact tables in multiple data marts.

• For two dimension tables to be considered conformed, they must either be identical or one must be a subset of the other.

• There cannot be any other type of difference between the two tables. For example, two dimension tables that are exactly the same except for the primary key are not considered conformed dimensions.

• The time dimension is a common conformed dimension in an organization.

Page 20: Bi   dimension modelling basics

Reference Dimensions

• Used in a snowflake schema.
• A reference dimension uses columns from multiple tables, or its dimension table links to the fact table indirectly, through another dimension that is directly linked to the fact table.

Page 21: Bi   dimension modelling basics

Role Playing Dimensions

• A role-playing dimension is used in a cube more than once, each time for a different purpose.

• Each role-playing dimension is joined to a fact table on a different foreign key.
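A minimal sketch of a role-playing dimension, using Python's built-in `sqlite3` module. The date dimension playing the roles of order date and ship date is an assumed example, and the names `DimDate`, `FactOrders`, `OrderDateKey` and `ShipDateKey` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, CalendarDate TEXT);
-- Two foreign keys point at the same dimension table
CREATE TABLE FactOrders (
    OrderNumber INTEGER,
    OrderDateKey INTEGER REFERENCES DimDate(DateKey),
    ShipDateKey INTEGER REFERENCES DimDate(DateKey)
);
INSERT INTO DimDate VALUES (20240101, '2024-01-01'), (20240105, '2024-01-05');
INSERT INTO FactOrders VALUES (1001, 20240101, 20240105);
""")

# The same physical table is joined twice, once per role, via table aliases
rows = conn.execute("""
    SELECT f.OrderNumber, od.CalendarDate, sd.CalendarDate
    FROM FactOrders AS f
    JOIN DimDate AS od ON od.DateKey = f.OrderDateKey
    JOIN DimDate AS sd ON sd.DateKey = f.ShipDateKey
""").fetchall()
```

Only one date table is stored and maintained; the roles exist purely through the different foreign keys used in the joins.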

Page 22: Bi   dimension modelling basics

Parent Child Dimensions

• A Parent Child Dimension is a standard dimension which contains parent-child hierarchy.

• A parent-child hierarchy is a hierarchy in a standard dimension that contains a parent attribute.

• A parent attribute describes a self-referencing relationship, or self-join, within a dimension main table.
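The self-referencing relationship can be sketched in a few lines of Python. The employee rows and the helpers `path_to_root` and `children_of` are hypothetical and serve only to illustrate the idea of a parent attribute.

```python
# Hypothetical employee dimension rows: (EmployeeID, ParentEmployeeID, Name)
employees = [
    (1, None, "CEO"),
    (2, 1, "VP Sales"),
    (3, 2, "Sales Rep"),
    (4, 1, "VP Engineering"),
]

by_id = {emp_id: (parent_id, name) for emp_id, parent_id, name in employees}

def path_to_root(emp_id):
    """Follow the parent attribute upward; return names from the member to the root."""
    path = []
    while emp_id is not None:
        parent_id, name = by_id[emp_id]
        path.append(name)
        emp_id = parent_id
    return path

def children_of(parent_id):
    """Direct children of a member, found through the self-referencing parent column."""
    return [name for _, parent, name in employees if parent == parent_id]

rep_path = path_to_root(3)
ceo_reports = children_of(1)
```

Because the hierarchy is driven by data rather than by fixed level columns, its depth can differ from branch to branch.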

Page 23: Bi   dimension modelling basics

Many to Many Dimension

• DIMENSION MODEL (BANK)

Fact table (by customer): BRANCHID, TIMEKEY, CUSTOMERID, TRANSACTIONAMOUNT, TRANSACTIONTYPE
Fact table (by account): BRANCHID, TIMEKEY, ACCOUNTID, TRANSACTIONAMOUNT, TRANSACTIONTYPE
INTERMEDIATE FACT TABLE: ACCOUNTID, CUSTOMERID
Dimension keys: BRANCHID, TIMEKEY, CUSTOMERID, ACCOUNTID
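A sketch of how an intermediate fact table resolves a many-to-many relationship between accounts and customers, using Python's built-in `sqlite3` module. The table names `FactTransaction` and `FactAccountCustomer`, the trimmed columns and the sample rows (including the joint account) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactTransaction (AccountID INTEGER, TransactionAmount REAL);
CREATE TABLE FactAccountCustomer (AccountID INTEGER, CustomerID INTEGER);
INSERT INTO FactTransaction VALUES (1, 100.0), (1, 50.0), (2, 30.0);
-- Account 1 is a joint account held by customers 10 and 11
INSERT INTO FactAccountCustomer VALUES (1, 10), (1, 11), (2, 11);
""")

# Reach the customer through the intermediate fact table
amount_by_customer = conn.execute("""
    SELECT b.CustomerID, SUM(t.TransactionAmount)
    FROM FactTransaction AS t
    JOIN FactAccountCustomer AS b ON b.AccountID = t.AccountID
    GROUP BY b.CustomerID
    ORDER BY b.CustomerID
""").fetchall()

total = conn.execute(
    "SELECT SUM(TransactionAmount) FROM FactTransaction"
).fetchone()[0]
```

Note that the per-customer amounts add up to more than `total`, because the joint account's transactions are attributed to both holders. The grand total must be computed from the transaction fact table alone, not by summing the per-customer figures.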

Page 24: Bi   dimension modelling basics

Slowly Changing Dimension

• Ideally, dimension attributes are never expected to change over time, e.g. Month, City, State, Cost.

• Some dimension attributes might change over time, e.g. ProductUnitPrice, CustomerCity. Such a dimension is referred to as a Slowly Changing Dimension (SCD).

• Type 1 SCD: no history is maintained.

• Type 2 SCD: history is maintained in the form of rows.

• Type 3 SCD: history is maintained in the form of columns.

SurrogateKey, StartDate, EndDate, Status
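A minimal sketch of a Type 2 change, where history is kept as rows. It uses the SurrogateKey, StartDate, EndDate and Status columns listed on the slide; the function `apply_type2_change`, the `CustomerBusinessKey` and `CustomerCity` columns, the "Current"/"Expired" status values and the sample data are assumptions.

```python
from datetime import date

def apply_type2_change(dimension_rows, business_key, new_city, change_date):
    """Expire the current row for business_key and append a new current row."""
    next_key = max(row["SurrogateKey"] for row in dimension_rows) + 1
    for row in dimension_rows:
        if row["CustomerBusinessKey"] == business_key and row["Status"] == "Current":
            row["EndDate"] = change_date
            row["Status"] = "Expired"
    # The new version gets a fresh surrogate key but keeps the business key
    dimension_rows.append({
        "SurrogateKey": next_key,
        "CustomerBusinessKey": business_key,
        "CustomerCity": new_city,
        "StartDate": change_date,
        "EndDate": None,
        "Status": "Current",
    })
    return dimension_rows

customers = [{
    "SurrogateKey": 1, "CustomerBusinessKey": "C001", "CustomerCity": "Pune",
    "StartDate": date(2020, 1, 1), "EndDate": None, "Status": "Current",
}]
apply_type2_change(customers, "C001", "Mumbai", date(2023, 6, 1))
```

Fact rows loaded before the change keep pointing at surrogate key 1, so older sales still report under the old city. By contrast, a Type 1 change would overwrite `CustomerCity` in place, and a Type 3 change would add a column such as a previous-city field.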

Page 25: Bi   dimension modelling basics

Degenerate Dimensions

• Also known as fact dimensions.

• A fact dimension is a standard dimension that is constructed from columns stored directly in the fact table.

Page 26: Bi   dimension modelling basics

Factless Fact

• Fact tables with no measures.
• Used to measure the occurrence of an event.

DIMENSION MODEL (SCHOOL)

Fact table: StudentID, TeacherID, TimeKey, ClassID
Dimension keys: StudentID, TeacherID, TimeKey, ClassID
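A short Python sketch of how a factless fact table is queried. Treating each row as one attendance event is an assumption, as are the sample rows; only the four key columns come from the slide.

```python
from collections import Counter

# Hypothetical attendance rows: (StudentID, TeacherID, TimeKey, ClassID)
attendance = [
    (1, 100, 20240101, 7),
    (2, 100, 20240101, 7),
    (1, 100, 20240102, 7),
    (3, 101, 20240101, 8),
]

# With no measure column, the "measure" is the row count per dimension key
attendance_per_class = Counter(class_id for _, _, _, class_id in attendance)
attendance_per_student = Counter(student_id for student_id, _, _, _ in attendance)
```

Counting rows is the only aggregation available, which is exactly what makes the table useful for tracking events.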

Page 27: Bi   dimension modelling basics

References

Data warehousing Toolkit 2.0 – Ralph Kimball

Page 28: Bi   dimension modelling basics

Q&A

Page 29: Bi   dimension modelling basics

• Parikshit Savjani Email: [email protected] Blog: http://www.sqlserverfaq.net