Data Warehousing Concepts, by Dr. Khalil 1 Data Warehousing Design Dr. Awad Khalil Computer Science...


Data Warehousing Concepts, by Dr. Khalil


Data Warehousing Design

Dr. Awad Khalil

Computer Science Department

AUC


Content

Designing a Data Warehouse Database
Dimensional Modeling
Star Schema
Snowflake Schema
Advantages of Dimensional Modeling
Methodology for Dimensional Modeling


Designing a Data Warehouse Database

Designing a data warehouse database is highly complex.

The database component of a data warehouse is described using a technique called dimensionality modeling: “A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.”

Dimensionality modeling uses the concepts of Entity-Relationship (ER) modeling with some important restrictions.

Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

Every dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table.

This characteristic ‘star-like’ structure is called a star schema or star join.
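The key structure described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the slides: the tables are cut down to three dimensions, and the column names (year, month, city, propertyType, sellingPrice) and the sample rows are assumptions.

```python
import sqlite3

# Hypothetical, trimmed-down star schema for property sales (in-memory database).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time            (timeID     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE Branch          (branchID   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE PropertyForSale (propertyID INTEGER PRIMARY KEY, propertyType TEXT);

-- Fact table: its composite primary key is made of the dimension keys.
CREATE TABLE PropertySale (
    timeID       INTEGER REFERENCES Time(timeID),
    propertyID   INTEGER REFERENCES PropertyForSale(propertyID),
    branchID     INTEGER REFERENCES Branch(branchID),
    sellingPrice REAL,
    PRIMARY KEY (timeID, propertyID, branchID)
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO Time VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2024, 1)])
conn.executemany("INSERT INTO Branch VALUES (?, ?)", [(10, "Cairo"), (20, "Giza")])
conn.executemany("INSERT INTO PropertyForSale VALUES (?, ?)",
                 [(100, "Flat"), (101, "House"), (102, "Flat")])
conn.executemany("INSERT INTO PropertySale VALUES (?, ?, ?, ?)",
                 [(1, 100, 10, 250000.0), (1, 101, 20, 400000.0), (2, 102, 10, 300000.0)])

# A star join: the fact table joined to two of its dimensions.
rows = conn.execute("""
SELECT b.city, t.year, SUM(f.sellingPrice)
FROM PropertySale AS f
JOIN Branch AS b ON b.branchID = f.branchID
JOIN Time   AS t ON t.timeID   = f.timeID
GROUP BY b.city, t.year
ORDER BY b.city, t.year
""").fetchall()

for row in rows:
    print(row)
```

Each dimension is reached from the fact table by a single join on its simple key, which is what gives the schema its star shape.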


Star Schema

A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).

The diagram shows a star schema for property sales of a Real Estate database.


Other Schema Versions

Snowflake Schema – A variant of the star schema where dimension tables do not contain denormalized data.

Starflake Schema – A hybrid structure that contains a mixture of star and snowflake schemas.

The diagram shows part of the star schema for property sales of a Real Estate database with a normalized version of the Branch dimension table.
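As a rough sketch of what normalizing a dimension involves, the Python below splits a denormalized Branch dimension into Branch, City, and Region tables. The diagram is not available here, so the choice of city and region as the normalized-out attributes, the generated IDs, and the sample rows are all assumptions.

```python
# Denormalized (star) Branch dimension: city and region repeat on every row.
branch_star = [
    {"branchID": 10, "street": "1 Nile St",   "city": "Cairo",      "region": "Greater Cairo"},
    {"branchID": 11, "street": "5 Tahrir Sq", "city": "Cairo",      "region": "Greater Cairo"},
    {"branchID": 20, "street": "9 Corniche",  "city": "Alexandria", "region": "North Coast"},
]

def snowflake(rows):
    """Split city and region out of Branch into their own lookup tables."""
    regions = {}   # region name -> regionID
    cities = {}    # (city name, regionID) -> cityID
    branches = []  # normalized Branch rows that only carry a cityID
    for r in rows:
        # setdefault returns the existing ID, or stores and returns the next free one.
        region_id = regions.setdefault(r["region"], len(regions) + 1)
        city_id = cities.setdefault((r["city"], region_id), len(cities) + 1)
        branches.append({"branchID": r["branchID"], "street": r["street"], "cityID": city_id})
    return branches, cities, regions

branches, cities, regions = snowflake(branch_star)
print(branches)
print(cities)
print(regions)
```

The normalized form stores each city and region once, at the cost of extra joins when a query needs the region of a branch.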


Dimensional Model - Advantages

Efficiency – The consistency of the underlying database structure allows more efficient access to the data by various tools, including report writers and query tools.

Ability to handle changing requirements – The star schema can adapt to changes in the user’s requirements, as all dimensions are equivalent in terms of providing access to the fact table.

Extensibility – The dimensional model is extensible.

Ability to model common business situations – There are a growing number of standard approaches for handling common modeling situations in the business world.

Predictable query processing – Data warehouse applications that drill down will simply be adding more dimension attributes from within a single star schema.


Database Design Methodology for Data Warehouse

Nine-Step Methodology by Kimball (1996):

1- Choosing the process
2- Choosing the grain
3- Identifying and conforming the dimensions
4- Choosing the facts
5- Storing pre-calculations in the fact table
6- Rounding out the dimension tables
7- Choosing the duration of the database
8- Tracking slowly changing dimensions
9- Deciding the query priorities and the query modes


1- Choosing the process

The process (function) refers to the subject matter of a particular data mart. The best choice for the first data mart tends to be the one that is related to sales.


2- Choosing the grain

Choosing the grain means deciding exactly what a fact table record represents.


3- Identifying and Conforming the Dimensions

Dimensions set the context for asking questions about the facts in the fact table.

The diagram shows star schemas for property sales and property advertising with Time, PropertyForSale, Branch, and Promotion as conformed (shared) dimension tables.
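A conformed dimension lets results from different fact tables be lined up on the same dimension values. The Python sketch below illustrates the idea with two tiny fact tables that share one Branch dimension. The measures (sellingPrice, advertCost) and all sample values are assumptions made for the example.

```python
from collections import defaultdict

# Shared (conformed) Branch dimension: branchID -> city.
branch_dim = {10: "Cairo", 20: "Giza"}

# Two fact tables keyed by the same dimension keys (hypothetical rows).
property_sale = [  # (timeID, branchID, sellingPrice)
    (1, 10, 250000.0), (1, 20, 400000.0), (2, 10, 300000.0),
]
advert = [         # (timeID, branchID, advertCost)
    (1, 10, 1200.0), (2, 10, 800.0), (2, 20, 500.0),
]

def total_by_city(fact_rows):
    """Sum the measure of a fact table, grouped by the shared Branch dimension."""
    totals = defaultdict(float)
    for _time_id, branch_id, measure in fact_rows:
        totals[branch_dim[branch_id]] += measure
    return dict(totals)

sales = total_by_city(property_sale)
costs = total_by_city(advert)

# Because both facts use the same Branch keys, their totals align city by city.
report = {city: (sales.get(city, 0.0), costs.get(city, 0.0))
          for city in sorted(branch_dim.values())}
print(report)
```

If each data mart kept its own, differently keyed Branch table, this side-by-side comparison would need a mapping step first.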


4- Choosing the Facts

The grain of the fact table determines which facts can be used in the data mart.

All the facts must be expressed at the level implied by the grain.

The diagram shows how the Lease fact table shown in the previous diagram could be corrected so that the fact table is appropriately structured.


5- Storing Pre-Calculations in the Fact Table

Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.

A common example of the need to store pre-calculations occurs when the facts comprise a profit and loss statement.

The diagram shows the fact table with the rentDuration, totalRent, clientAllowance, staffCommission, and totalRevenue attributes. These types of facts are useful because they are additive quantities, from which we can derive valuable information.
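The sketch below shows one way a pre-calculated fact could be stored at load time and then summed. The attribute names come from the slide, but the formula for totalRevenue (total rent minus allowance and commission), the monthly-rent input, and the sample numbers are assumptions, since the slides do not define them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LeaseFact:
    rentDuration: int        # assumed unit: months
    totalRent: float
    clientAllowance: float
    staffCommission: float
    totalRevenue: float      # pre-calculated once and stored with the row

def load_lease(rent_duration, monthly_rent, client_allowance, staff_commission):
    """Compute the derived facts at load time (assumed formulas)."""
    total_rent = rent_duration * monthly_rent
    total_revenue = total_rent - client_allowance - staff_commission
    return LeaseFact(rent_duration, total_rent, client_allowance,
                     staff_commission, total_revenue)

facts = [
    load_lease(12, 500.0, 100.0, 300.0),
    load_lease(6, 800.0, 0.0, 240.0),
]

# Additive facts can simply be summed across any set of fact rows.
revenue = sum(f.totalRevenue for f in facts)
print(revenue)
```

Storing totalRevenue in the row means every query uses the same definition instead of recomputing it, possibly inconsistently.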


6- Rounding out the Dimension Tables

In this step, we return to the dimension tables and add as many text descriptions to the dimensions as possible.

The text descriptions should be as intuitive and understandable to the users as possible.

The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.


7- Choosing the Duration of the Database

The duration measures how far back in time the fact table goes.

Very large fact tables raise at least two very significant design issues:
First, it is often increasingly difficult to source increasingly old data.
Second, it is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the ‘slowly changing dimension’ problem.


8- Tracking Slowly Changing Dimensions

The slowly changing dimension problem means, for example, that the proper description of the old client and the old branch must be used with the old transaction history.

Often, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time.

There are three basic types of slowly changing dimensions:
Type 1 – where a changed dimension attribute is overwritten;
Type 2 – where a changed dimension attribute causes a new record to be created;
Type 3 – where a changed dimension attribute causes an alternate attribute to be created so that both the old and the new values of the attribute are simultaneously accessible in the same dimension record.
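The three types can be contrasted on a one-row client dimension. This is a simplified Python sketch: the column names (clientKey as the generalized key, clientNo as the source key, a current flag, and a previous_ prefix for Type 3) are assumptions, and real implementations usually also track effective dates.

```python
import copy

def scd_type1(dim, client_no, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim:
        if row["clientNo"] == client_no:
            row[attr] = new_value

def scd_type2(dim, client_no, attr, new_value):
    """Type 2: retire the current row and append a new one with a new generalized key."""
    current = next(r for r in dim if r["clientNo"] == client_no and r["current"])
    current["current"] = False
    new_row = dict(current, **{attr: new_value})
    new_row["clientKey"] = max(r["clientKey"] for r in dim) + 1
    new_row["current"] = True
    dim.append(new_row)

def scd_type3(dim, client_no, attr, new_value):
    """Type 3: keep the old value in an alternate attribute on the same row."""
    for row in dim:
        if row["clientNo"] == client_no:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value

base = [{"clientKey": 1, "clientNo": "CR56", "city": "Cairo", "current": True}]
t1, t2, t3 = copy.deepcopy(base), copy.deepcopy(base), copy.deepcopy(base)

scd_type1(t1, "CR56", "city", "Giza")
scd_type2(t2, "CR56", "city", "Giza")
scd_type3(t3, "CR56", "city", "Giza")

print(t1)
print(t2)
print(t3)
```

Only the Type 2 result lets old fact rows keep pointing at the old description, because the old row survives under its original clientKey.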


9- Deciding the Query Priorities and the Query Modes

In this step we consider physical design issues.

The most critical physical design issues affecting the end-user’s perception of the data mart are the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations.

Beyond these issues there are a host of additional physical design issues affecting administration, backup, indexing performance, and security.
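The effect of sort order and pre-stored summaries can be imitated in memory. In this Python sketch a sorted list stands in for the on-disk order of the fact table and a dictionary stands in for an aggregate table. The row layout (year, month, branchID, amount) and the values are assumptions.

```python
import bisect
from collections import defaultdict

# Hypothetical fact rows: (year, month, branchID, amount).
fact_rows = [
    (2024, 2, 10, 300.0),
    (2023, 12, 20, 150.0),
    (2024, 1, 10, 200.0),
    (2024, 1, 20, 100.0),
]

# "Physical sort order": keep the rows ordered by (year, month, branchID).
fact_rows.sort()

def rows_for_year(rows, year):
    """Locate one year's rows by binary search instead of scanning every row."""
    lo = bisect.bisect_left(rows, (year,))
    hi = bisect.bisect_left(rows, (year + 1,))
    return rows[lo:hi]

def build_summary(rows):
    """Pre-stored aggregation: total amount per (year, branchID)."""
    summary = defaultdict(float)
    for year, _month, branch_id, amount in rows:
        summary[(year, branch_id)] += amount
    return dict(summary)

rows_2024 = rows_for_year(fact_rows, 2024)
summary = build_summary(fact_rows)

print(len(rows_2024))
print(summary[(2024, 10)])   # answered from the summary, not the detail rows
```

The summary has to be rebuilt or updated whenever new fact rows are loaded, which is part of the administration cost the slide mentions.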


Example- Dimensional Model (Fact Constellation) for a Real Estate Data Warehouse

At the end of this methodology, we have a design for a data mart that supports the requirements of a particular Real Estate business process and also allows easy integration with other related data marts to ultimately form the enterprise-wide data warehouse.

We integrate the star schemas for the business processes of the Real Estate company using the conformed dimensions. For example, all the fact tables share the Time and Branch dimensions.

A dimensional model that contains more than one fact table sharing one or more conformed dimension tables is referred to as a fact constellation.


Example- Fact and Dimension Tables for each Business Process

Business Process      Fact Table           Dimension Tables
Property Sales        PropertySale         Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Rentals      Lease                Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Viewing      PropertyViewing      Time, Branch, Staff, PropertyForSale, PropertyForRent, ClientBuyer, ClientRenter
Property Advertising  Advert               Time, Branch, Staff, PropertyForSale, PropertyForRent, Promotion, Newspaper
Property Maintenance  PropertyMaintenance  Time, Branch, Staff, PropertyForRent
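The dimensions common to every fact table in the constellation can be read off the table mechanically. The Python sketch below copies the table as it appears on this slide into a dictionary and intersects the dimension sets. The variable names are illustrative only.

```python
# Fact table -> dimension tables, transcribed from the slide as given.
dimensions_by_fact = {
    "PropertySale": {"Time", "Branch", "Staff", "PropertyForSale",
                     "Owner", "ClientBuyer", "Promotion"},
    "Lease": {"Time", "Branch", "Staff", "PropertyForSale",
              "Owner", "ClientBuyer", "Promotion"},
    "PropertyViewing": {"Time", "Branch", "Staff", "PropertyForSale",
                        "PropertyForRent", "ClientBuyer", "ClientRenter"},
    "Advert": {"Time", "Branch", "Staff", "PropertyForSale",
               "PropertyForRent", "Promotion", "Newspaper"},
    "PropertyMaintenance": {"Time", "Branch", "Staff", "PropertyForRent"},
}

# Dimensions that appear in every fact table of the constellation.
shared_by_all = set.intersection(*dimensions_by_fact.values())
print(sorted(shared_by_all))
```

With the table as transcribed, the result contains Time and Branch, the two dimensions named earlier as shared by all fact tables, plus Staff.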


Thank you