ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

Post on 23-Jan-2016

35 views 0 download

Tags:

description

ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis). Dimensional Modeling II. Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business. Technical Architecture Design. Product Selection & Installation. End-User Application Specification. - PowerPoint PPT Presentation

Transcript of ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

1

ACCTG 6910Building Enterprise &

Business Intelligence Systems(e.bis)

ACCTG 6910Building Enterprise &

Business Intelligence Systems(e.bis)

Dimensional Modeling II

Olivia R. Liu Sheng, Ph.D.Emma Eccles Jones Presidential Chair of Business

Olivia R. Liu Sheng, Ph.D.Emma Eccles Jones Presidential Chair of Business

2

TechnicalArchitecture

Design

TechnicalArchitecture

Design

ProductSelection &Installation

ProductSelection &Installation

End-UserApplication

Specification

End-UserApplication

Specification

End-UserApplication

Development

End-UserApplication

Development

The Business Dimensional Lifecycle

ProjectPlanningProject

Planning

Business

Requirement

Definition

Business

Requirement

Definition

DeploymentDeploymentMaintenance

andGrowth

Maintenanceand

Growth

Project ManagementProject Management

DimensionalModeling

DimensionalModeling

PhysicalDesign

PhysicalDesign

Data StagingDesign &

Development

Data StagingDesign &

Development

3

Outline

• Table structure, types, characteristics and terminology

• Design steps• Dimensional models with varying

types of fact and dimension tables

4

Types of Facts

• Transactional facts (transactions or line items in transactions)

• Snapshots• Factless facts

5

Types of Dimensions

• Role playing dimensions• Heterogeneous dimensions• Slowly changing dimensions• Large dimensions• Many-to-many dimensions

6

Keys and Attributes

• Primary key - a column whose value uniquely identifies each row (record) in the table.

• Attributes – columns in a table that are not designated as the primary key.

• Foreign key – a non-primary-key attribute for a table that corresponds to a primary key of another table.

7

Attributes in DW tables

• Dimension Table– One Primary Key– Dimension Attributes

• Fact table– Primary key --- A collection of primary keys from

all its associated dimension tables• All warehouse keys in fact table are foreign keys

referring to its associated dimension tables• All/part of warehouse keys in fact table form the

primary key of fact table

– Fact Attributes

8

Attributes in DW tables

SALES# TIME_KEY# PRODUCT_KEY# CUSTOMER_KEY* PRICE* QUANTITY* SALES

CUSTOMER# CUSTOMER_KEY* CID* CNAME* STATE* CITY

PRODUCT# PRODUCT_KEY* PID* PNAME* PCNAME

TIME# TIME_KEY* ORDERDATE* DAY_OF_WEEK* DAY_NUMBER_IN_MONTH* DAY_NUMBER_IN_YEAR* WEEK_NUMBER* MONTH* QUARTER* HOLIDAY_FLAG* FISCAL_YEAR* FISCAL_QUARTER

reference

referenced by

reference

referenced by

reference

referenced by

Data warehouse keys generated by the system

9

Keys and Grain

• Keys– Primary or natural keys (from source

systems)– Warehouse or synthetic keys

(generated by a data warehouse tool)• Grain

– The level of detail of fact measures described in the DW, e.g., sales transactions from order line items by order date, product and customer

10

Single-Fact-Table Data Warehouse Design Decisions

1. The business questions in focus and source information systems*

2. The grain of the fact table3. The dimensions tables and keys4. The fact attributes and dimension

attributes

*All DW attributes must be mapped to or derived from source attributes

11

Single-Fact-Table Data Warehouse Design Decisions

1. The business questions in focus and source information systems

2. The grain of the fact table3. The dimensions tables and keys4. The fact attributes and dimension

attributes

12

Sample Business Questions

• Report Sales in terms of – (total) amt, (total) qty and (avg.) price

• Report Sales by PRODUCT name and/or category name

• Report Sales by CUSTOMER name, city and/or or state

• Report Sales by ORDER date, month, year, holiday, special event or other time constraints

• Report using a combination of the measures and constraints

13

Relational Schema of B.com B2B System

Orders ( Order_No, SID, BID, CID, Order_date)

OrderLine (Order_No, Line_ID, PID, Actual_Del_Date, Target_Del_Date, Arrival_Date, Shipping_Fee, Tax, Quantity, Unit_Price,Defect_on_arrival)

Delivery ( SID, CID, Unit_shipping_fee, UNIT_DEL_TIME)

Contract ( CID, Contract_Name, Payment_term, Payment_num)

Payment ( PaymentID, OrderNO, Pay_Amount, Date)

14

Relational Schema of B.com B2B SystemCategory ( CAT_ID, CAT_Name)

Product ( PID, CAT_ID, P_Weight, P_Life, P_Name)

Supplier ( SID, S_Name, S_City, S_State, S_Country)

Product_Supply ( PID, SID, Unit_Price, Quantity_in_Stock, Production_in_Week)

Buyer ( BID, B_Name, CityID, B_Type)

Buyer_City ( CityID, C_Name, C_State, C_Country, C_Tax)

15

Single-Fact-Table Data Warehouse Design Decisions

1. The business questions in focus and source information systems

2. The grain of the fact table3. The dimensions tables and keys4. The fact attributes and dimension

attributes

16

Grain of the Fact Table

Type of fact table: transactional facts

Potential grains: order or orderlineConstraints: order date, product,

customerGrain: sales from orderline (by

order date, product, and customer)

17

Single-Fact-Table Data Warehouse Design Decisions

1. The business questions in focus and source information systems

2. The grain of the fact table3. The dimensions tables and keys4. The fact attributes and dimension

attributes

18

Dimension Tables and KeysKey dimension tables jointly make up the primary key for a fact table

TIME# TIME_KEY* ORDER_DATE* DAY_OF_WEEK* DAY_NUMBER_IN_MONTH* DAY_NUMBER_IN_YEAR* WEEK_NUMBER* MONTH* QUARTER* HOLIDAY_FLAG...

SALES# TIME_KEY# CUSTOMER_KEY# PRODUCT_KEY* PRICE* QUANTITY* SALES_AMOUNT

PRODUCT# PRODUCT_KEY* PID* PNAME* PCNAME

CUSTOMER# CUSTOMER_KEY* CID* CNAME* CITY* STATE

REFERENCE

REFERENCED BY

REFERENCE

REFERENCED BY

REFERENCE

REFERENCED BY

19

Single-Fact-Table Data Warehouse Design Decisions

1. The business questions in focus and source information systems

2. The grain of the fact table3. The dimensions tables and keys4. The fact attributes and dimension

attributes

20

Determine Fact Attributes

SALES# TIME_KEY# CUSTOMER_KEY# PRODUCT_KEY* PRICE* QUANTITY* SALES_AMOUNT

21

Types of Fact Attributes

• Additive fact attributes can be added along any dimension.

SALES# TIME_KEY# PRODUCT_KEY# CUSTOMER_KEY* PRICE* QUANTITY* SALES

CUSTOMER# CUSTOMER_KEY* CID* CNAME* STATE* CITY

PRODUCT# PRODUCT_KEY* PID* PNAME* PCNAME

TIME# TIME_KEY* ORDERDATE* DAY_OF_WEEK* DAY_NUMBER_IN_MONTH* DAY_NUMBER_IN_YEAR* WEEK_NUMBER* MONTH* QUARTER* HOLIDAY_FLAG* FISCAL_YEAR* FISCAL_QUARTER

reference

referenced by

reference

referenced by

reference

referenced by

22

Types of Fact Attributes

• Non-additive fact attributes cannot be added along any dimension.

SALES# TIME_KEY# PRODUCT_KEY# CUSTOMER_KEY* PRICE* QUANTITY* SALES

CUSTOMER# CUSTOMER_KEY* CID* CNAME* STATE* CITY

PRODUCT# PRODUCT_KEY* PID* PNAME* PCNAME

TIME# TIME_KEY* ORDERDATE* DAY_OF_WEEK* DAY_NUMBER_IN_MONTH* DAY_NUMBER_IN_YEAR* WEEK_NUMBER* MONTH* QUARTER* HOLIDAY_FLAG* FISCAL_YEAR* FISCAL_QUARTER

reference

referenced by

reference

referenced by

reference

referenced by

23

Types of Fact Attributes

• Semi-additive fact attributes can be added along some dimensions.

INVENTORY_PRODUCT# PRODUCT_KEY

WAREHOUSE# WAREHOUSE_KEY

INVENTORY_TIME# TIME_KEY

INVENTORYFACT# TIME_KEY# PRODUCT_KEY# WAREHOUSE_KEY* QUANTITY_ON_HAND

REFERENCE

REFERENCED BY

REFERENCEREFERENCED BY

REFERENCE

REFERENCED BY

24

Time Dimension

• Data warehouse needs an explicit time dimension table instead of just a time attribute (e.g, ORDERDATE).

• Save computation effort and improve query performance

• Complex queries regarding calendar calculation are hidden from end users of data warehouse.

25

Time Dimension

Besides the time attribute, time dimension table includes the following additional attributes:

– Day_of_week (1-7); Day_number_in_month (1-31); – Day_number_in_year (1-365)– Week_number (1-52); month (1-12), Quarter (1-4)– Holiday_flag (y/n)– Fiscal_quarter, Fiscal_year

26

SALES# TIME_KEY# PRODUCT_KEY# CUSTOMER_KEY* PRICE* QUANTITY* SALES

CUSTOMER# CUSTOMER_KEY* CID* CNAME* STATE* CITY

PRODUCT# PRODUCT_KEY* PID* PNAME* PCNAME

TIME# TIME_KEY* ORDERDATE* DAY_OF_WEEK* DAY_NUMBER_IN_MONTH* DAY_NUMBER_IN_YEAR* WEEK_NUMBER* MONTH* QUARTER* HOLIDAY_FLAG* FISCAL_YEAR* FISCAL_QUARTER

reference

referenced by

reference

referenced by

reference

referenced by

Determine Dimension Attributes

27

SALES# TIME_KEY# PRODUCT_KEY# CUSTOMER_KEY* PRICE* QUANTITY* SALES

CUSTOMER# CUSTOMER_KEY* CID* CNAME* STATE* CITY

PRODUCT# PRODUCT_KEY* PID* PNAME* PCNAME

TIME# TIME_KEY* ORDERDATE* DAY_OF_WEEK* DAY_NUMBER_IN_MONTH* DAY_NUMBER_IN_YEAR* WEEK_NUMBER* MONTH* QUARTER* HOLIDAY_FLAG* FISCAL_YEAR* FISCAL_QUARTER

reference

referenced by

reference

referenced by

reference

referenced by

Avoid Snowflake Designs

28

Avoid Snowflake Design

PRODUCT_CATEGORY# PRODUCT_CATEGORY_KEY* PCID* PCNAME

CUSTOMERTIME

SALES

PRODUCT# PRODUCT_KEY* PID* PNAME* PRODUCT_CATEGORY_KEY

REFERECEREFERENCED BY

REFERENCE

REFERENCED BY

REFERENCE

REFERENCED BY

REFERENCE

REFERENCED BY

Snowflake structure

29

Avoid Snowflake Schemas

• Tradeoff of avoiding snowflake

– Advantage: improve query performance and easy of understanding

– Disadvantage: require more storage space