OWB PROFESSIONAL COMMUNITY

Slowly Changing Dimensions

Author: Gerco Soet

Creation Date: 05-10-2001

Last Changed: 21-02-2002

Doc. Ref: PC-OWB-002-Slowly Changing Dimensions

Version: 1.0

OWB Professional Community

Doc Ref: PC-OWB-002-Slowly Changing Dimensions 21-02-2002

Document Control

Change Record

Date        Author       Version    Change Reference
5-Oct-01    Gerco Soet   Draft 1a   No Previous Document
11-Jan-02   Gerco Soet   0.2        Comments Maarten Pauw

Reviewers

Name           Position
Maarten Pauw   Community Leader

File Ref: PC-OWB-002-Slowly Changing Dimensions (v. 1.0 )

Document Control ii

Contents

Document Control
    Change Record
    Reviewers
Introduction
    Purpose
    Background
    Scope
    Related Documents
Slowly Changing Dimensions
    Theory
    SCD Schema
    Surrogate keys
Implementing Slowly Changing Dimensions using OWB
    Type I
    Type III
    Type II
Comments or questions
Appendices
    Example of Type I in OWB
    Example of Type III in OWB
    Example of Type II in OWB
    Example of loading facts

Introduction

Purpose

This document describes how to implement Slowly Changing Dimensions (SCD) of types 1, 2 and 3, as described by Ralph Kimball [1], using Oracle Warehouse Builder 3i (OWB).

Background

This project is carried out under the responsibility of the Oracle Warehouse Builder Professional Community of Oracle Netherlands.

The project goal is to deliver a document that describes how to implement Slowly Changing Dimensions of types 1, 2 and 3, as described by Ralph Kimball [1], using Oracle Warehouse Builder. A further goal is to deliver reusable OWB packages or templates for SCDs.

The project comprises a literature study of the related documents, the creation of this document, and the creation of some reusable OWB packages/templates.

Scope

This document deals only with SCD types 1, 2 and 3, which are briefly discussed in the next chapter. It does not describe possible combinations of the different types, such as a combination of types 2 and 3.

Related Documents

1. The Data Warehouse Toolkit, Ralph Kimball

2. Web documents:
   - Slowly Changing Dimensions, Ralph Kimball (http://www.dbmsmag.com/9604d05.html)
   - Surrogate keys, Ralph Kimball (http://www.dbmsmag.com/9805d05.html)
   - Implementing Slowly Changing Dimensions, Joe Luedtke (http://www.sqlmag.com/Articles/Print.cfm?ArticleID=7835)

Slowly Changing Dimensions

This chapter briefly describes the SCD technique as described by Ralph Kimball in [1].

Theory

The SCD technique is used to preserve history in the Data Warehouse environment. There are three types of SCD:

1. Type I: Overwrite the dimension record with the new values, thereby losing history. This type is mostly used for corrections. See the appendix Example of Type I in OWB for the OWB implementation.

2. Type II: Create an additional dimension record with a new value for the primary key, which is a surrogate key. This type is used when a true physical change to the dimension entity (such as a product or customer) has taken place and it is appropriate to perfectly partition history by the different descriptions. See the appendix Example of Type II in OWB for the OWB implementation.

3. Type III: Create an ‘old’ field in the dimension record to store the immediately preceding attribute value. This type is applied in case of a ‘soft’ change of the dimension entity, meaning that it is still logically possible to act as if the change had not occurred. It is only necessary to keep track of the old and the new value of an attribute, or of the original and the new value. An example is the redrawing of sales district boundaries. See the appendix Example of Type III in OWB for the OWB implementation.
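To make the three behaviours concrete, the following Python sketch applies each type to a single in-memory customer record whose region changes. It is an illustration, not OWB code: the DimRow type, its field names and the sample values are hypothetical, and the type II variant leaves out the current flag and effective dates that are introduced later in this document.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    """Hypothetical customer dimension row (illustrative field names)."""
    s_key: int                             # surrogate key
    code: str                              # natural (production) key
    region: str                            # current value of the tracked attribute
    previous_region: Optional[str] = None  # only used by type III


def scd_type1(rows: List[DimRow], code: str, new_region: str) -> List[DimRow]:
    """Type I: overwrite the value in place; the old value is lost."""
    return [replace(r, region=new_region) if r.code == code else r for r in rows]


def scd_type2(rows: List[DimRow], code: str, new_region: str, next_key: int) -> List[DimRow]:
    """Type II: keep the old row and add a new row with a new surrogate key."""
    return rows + [DimRow(s_key=next_key, code=code, region=new_region)]


def scd_type3(rows: List[DimRow], code: str, new_region: str) -> List[DimRow]:
    """Type III: move the current value into the 'previous' field, then overwrite."""
    return [
        replace(r, previous_region=r.region, region=new_region) if r.code == code else r
        for r in rows
    ]


dim = [DimRow(1, "NLN01", "NLN")]
print(scd_type1(dim, "NLN01", "NLS"))              # one row, region overwritten
print(scd_type2(dim, "NLN01", "NLS", next_key=2))  # two rows, old one preserved
print(scd_type3(dim, "NLN01", "NLS"))              # one row, old region kept as previous_region
```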

These types can also be combined. Types I and II can be combined when some attributes trigger a new record (II) while others cause an update of the existing record (I). Types II and III can be combined when it is necessary not only to keep track of changes by creating a new record for each change (II), but also to look at the same facts in relation to different (historical) values of the same dimension record. These combinations are outside the scope of this document and are not investigated further.

SCD Schema

Complete history of dimension needed?
• Yes: Type II
• No: Comparison of actual dimension value with original or a previous version needed?
    • Yes: Type III
    • No: Type I

Surrogate keys

A surrogate key is a substitute for a natural key. Surrogate keys are recommended for type II SCDs: every newly created record gets a new surrogate key. Some reasons not to use the natural (production) keys, such as the product code, are:

• Production keys may be reused in the source system after the original records have been purged there, while those records are still present in the warehouse.

• Production keys may be reused by mistake.

• After an acquisition, the production keys from different systems will need to be merged.

Surrogate keys can easily be generated in OWB by using a sequence.
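In OWB the sequence is a database object used inside the mapping. Purely to illustrate the principle that every newly created dimension record draws the next value, regardless of its natural key, here is a minimal Python stand-in. The Sequence class and its nextval method are hypothetical and only mimic an Oracle sequence's NEXTVAL; the third row is an invented new version of product AP.

```python
import itertools
from typing import Dict, List, Union


class Sequence:
    """Hypothetical in-memory stand-in for a database sequence."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def nextval(self) -> int:
        return next(self._counter)


product_seq = Sequence()
dimension: List[Dict[str, Union[int, str]]] = []
for code, name in [("AP", "Apple"), ("BR", "Beer"), ("AP", "Green apple")]:
    # Each new dimension record gets a fresh surrogate key,
    # even when its natural key (product code) already exists.
    dimension.append({"s_key": product_seq.nextval(), "product_code": code, "name": name})

for row in dimension:
    print(row)
```

Because the surrogate key is independent of the product code, a reused or merged production key never collides with an existing dimension record.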

Implementing Slowly Changing Dimensions using OWB

This chapter describes how to implement the different SCD types using Oracle Warehouse Builder. Because type II is the most complex, types I and III are described first.

Type I

SCD Type I is a simple overwrite of the existing dimension record. This means that no history will be preserved. This type is therefore mostly used to correct errors in the data.

Type I is implemented by a normal Insert/Update mapping in OWB. There are no additional requirements for the data to be delivered.

For instance, suppose the name of a supplier was misspelled and has been corrected in the source system. There is no need to keep the misspelled ‘version’ of the supplier, so the existing supplier dimension record can simply be updated.
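A type I insert/update can be sketched outside OWB with Python's built-in sqlite3 module standing in for the Oracle target: try an UPDATE on the natural key and INSERT only when no row matched. The table and column names are modelled on the DWH_SUPPLIERS example in the appendix but are otherwise assumptions; the data comes from Supplier.csv and its edited version.

```python
import sqlite3
from typing import List, Tuple


def load_type1(conn: sqlite3.Connection, staging: List[Tuple[str, str]]) -> None:
    """Type I insert/update: overwrite matching rows, insert the rest; no history is kept."""
    for code, name in staging:
        cur = conn.execute(
            "UPDATE dwh_suppliers SET supplier_name = ? WHERE supplier_code = ?",
            (name, code),
        )
        if cur.rowcount == 0:  # natural key not in the dimension yet
            conn.execute(
                "INSERT INTO dwh_suppliers (supplier_code, supplier_name) VALUES (?, ?)",
                (code, name),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY lets SQLite assign the surrogate key, playing the role of a sequence.
conn.execute(
    "CREATE TABLE dwh_suppliers ("
    " s_key INTEGER PRIMARY KEY, supplier_code TEXT UNIQUE, supplier_name TEXT)"
)
load_type1(conn, [("A", "SCHUITEMA"), ("B", "AHOLD")])                       # initial load
load_type1(conn, [("A", "SCHUITEMA"), ("B", "AHOLDS"), ("C", "HEINEKEN")])  # edited file
print(conn.execute(
    "SELECT s_key, supplier_code, supplier_name FROM dwh_suppliers ORDER BY s_key"
).fetchall())
```

Supplier B keeps its surrogate key after the correction, which is the point of type I: facts already linked to it simply show the corrected name.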

Type III

SCD type III is used to preserve history partially: you keep track of an ‘old’ and a ‘new’ (current) value. For example, after a reorganization you want to be able to track how the new organization is currently doing compared to the old one. Suppose the organization is divided into LOBs (Lines of Business), which are subdivided into divisions. After the reorganization some divisions report to a different LOB than before. The old value then contains the previous LOB of a division and the new value its current LOB. This way you can see how the ‘old’ organization would have performed with today’s figures.

In this document we assume that the same dimension value does not change more than once within one load. This is acceptable because the kinds of changes that trigger a type III SCD are usually infrequent.

The implementation of type III in OWB roughly consists of the following steps:

• Create a new field in the dimension to hold the current value (current_LOB), and preferably rename the original field to ‘previous_…’ (previous_LOB). Optionally, create an ‘effective_date’ field containing the date on which the new value became valid. This date is not needed to implement this type, but can be used for query purposes.

• Create a procedure or mapping that determines, for each dimension key, whether the record is new or whether its current value has changed.

• Insert a new record for a new key or, for an existing key whose value has changed, update the dimension record by changing the previous and current fields (and the effective date).

The first step in implementing a type III SCD is to decide for which dimension field you want to keep a current and a previous value. Having done so, a new field <current_value> must be created in the (target) dimension to contain the current value, while the original field should preferably be renamed to <previous_value> to hold the previous value. Optionally, an effective date field can be created for query purposes.

The next step is to find a way to detect a change of this dimension value. The most desirable situation is one where the source system indicates the change by setting a flag or by providing a file containing only the updated source records.

If this is not possible, a solution must be created in OWB. This document explores two possible solutions for implementing type III dimensions:

1. Determine whether an update or an insert must be done before the mapping.

2. Determine whether an update or an insert must be done within the mapping.

Before the mapping

Procedure

The first option is to determine whether an update or an insert must be done before running the mapping. This requires a procedure that determines whether a dimension value exists in the (target) dimension and, if so, whether its current value has changed.

To do this, it is best to use a small lookup table containing the dimension’s unique key, the corresponding surrogate key and the current_value field, ordered by the unique dimension key. If this lookup table is properly indexed and pinned in memory during processing, it gives the best performance.

The procedure must check whether the dimension’s unique key exists in the lookup table. If it does not, a new record has to be inserted into the dimension. If it does, the current_value in the lookup table must be compared with the ‘new’ value from the source record. If the values differ, the dimension record must be updated; if not, nothing needs to be done. In case of an update, the previous_value field must be set to the current_value from the lookup table, and the current_value field in the dimension to the new value from the staging table. If present, the effective_date field must be set to the system date or some other logical date representing the load date.

After the dimension has been updated, the lookup table has to be refreshed to reflect the changes made to the dimension. This can be done by truncating the table and filling it with the data from the changed dimension.
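The truncate-and-reload refresh could look as follows, again with sqlite3 as a stand-in for Oracle. SQLite has no TRUNCATE statement, so an unqualified DELETE takes its place. The lookup column names follow the CHK_CUSTOMERS procedure in the appendix, while the dimension's column names and the stale sample row are assumptions; the PRIMARY KEY on cst_code provides the index the text recommends.

```python
import sqlite3


def refresh_customer_lookup(conn: sqlite3.Connection) -> None:
    """Rebuild the lookup table from the (just updated) customer dimension."""
    conn.execute("DELETE FROM dwh_customer_lookup")  # SQLite's equivalent of TRUNCATE
    conn.execute(
        "INSERT INTO dwh_customer_lookup"
        " (cst_code, cst_key, cst_current_region_code, cst_current_region_name)"
        " SELECT customer_code, s_key, current_region_code, current_region_name"
        " FROM dwh_customers"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dwh_customers (
        s_key INTEGER PRIMARY KEY, customer_code TEXT UNIQUE,
        current_region_code TEXT, current_region_name TEXT,
        previous_region_code TEXT, previous_region_name TEXT);
    CREATE TABLE dwh_customer_lookup (
        cst_code TEXT PRIMARY KEY, cst_key INTEGER,
        cst_current_region_code TEXT, cst_current_region_name TEXT);
    INSERT INTO dwh_customers VALUES (1, 'NLN01', 'NLN', 'Holland North', NULL, NULL);
    INSERT INTO dwh_customer_lookup VALUES ('NLN01', 1, 'NLS', 'stale value');
    """
)
refresh_customer_lookup(conn)
print(conn.execute("SELECT * FROM dwh_customer_lookup").fetchall())
```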

For the mapping to recognize the records to be processed, the procedure must set an indicator showing whether we are dealing with an insert (I), an update (U) or nothing (NULL). This indicator is a new column in the load/staging table, called for instance ‘SCD_ind’. Besides the SCD_ind column, two other columns, current_value and s_key, have to be created. In case of an update, the procedure copies the current_value and the surrogate key from the lookup table into these columns. This makes updating the dimension much easier, because only the staging table is then needed for the (update) mapping.
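The decision logic of such a procedure can be sketched in Python, with a dictionary playing the role of the lookup table (natural key mapped to surrogate key and current value). The function name, dictionary layout and sample rows are hypothetical; the appendix implements the same idea as the PL/SQL procedure CHK_CUSTOMERS.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup table: natural key -> (surrogate key, current value)
Lookup = Dict[str, Tuple[int, str]]


def classify_type3(staging: List[dict], lookup: Lookup) -> None:
    """Fill scd_ind, current_value and s_key on each staging row, in place."""
    for row in staging:
        # Clear indicator columns left over from a previous run.
        row["scd_ind"] = row["current_value"] = row["s_key"] = None
        hit: Optional[Tuple[int, str]] = lookup.get(row["code"])
        if hit is None:
            row["scd_ind"] = "I"  # unknown key: insert a new dimension record
        elif hit[1] != row["value"]:
            row["scd_ind"] = "U"  # value changed: update the existing record
            row["s_key"], row["current_value"] = hit
        # else: unchanged, scd_ind stays None and the mappings ignore the row


lookup: Lookup = {"NLN01": (1, "NLN"), "NLN02": (2, "NLN")}
staging = [
    {"code": "NLN01", "value": "NLN"},  # unchanged
    {"code": "NLN02", "value": "NLS"},  # moved to another region
    {"code": "NLW02", "value": "NLW"},  # new customer
]
classify_type3(staging, lookup)
for row in staging:
    print(row)
```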

Mappings

After creating the procedure that determines the type of change, mappings have to be created to perform the actual changes on the dimension table. Two mappings are needed: an insert mapping and an update mapping. The insert mapping processes all staging records with SCD_ind = ‘I’. It only inserts new records into the dimension, and for each new record a new surrogate key is created using a sequence.

The update mapping processes all staging records with SCD_ind = ‘U’. This mapping only performs updates on the dimension. It first looks up the matching dimension record based on the s_key field from the staging table and then updates the previous_value, current_value and effective_date fields.
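Continuing the Python sketch, the two mappings reduce to the loop below: rows flagged ‘I’ get a new surrogate key from a sequence, and rows flagged ‘U’ locate their dimension record through s_key and shift the values. The in-memory dimension layout is hypothetical, and leaving previous_value empty for newly inserted records is an assumption; the document does not prescribe a value for it.

```python
import datetime
import itertools
from typing import Dict, Iterator, List


def apply_type3(dimension: Dict[int, dict], staging: List[dict],
                seq: Iterator[int], load_date: datetime.date) -> None:
    """Insert mapping (scd_ind 'I') and update mapping (scd_ind 'U') in one pass."""
    for row in staging:
        if row["scd_ind"] == "I":
            dimension[next(seq)] = {
                "code": row["code"],
                "current_value": row["value"],
                "previous_value": None,  # assumption: no previous value yet
                "effective_date": load_date,
            }
        elif row["scd_ind"] == "U":
            target = dimension[row["s_key"]]  # found via the s_key copied from the lookup
            target["previous_value"] = row["current_value"]
            target["current_value"] = row["value"]
            target["effective_date"] = load_date


dimension = {
    1: {"code": "NLN01", "current_value": "NLN", "previous_value": None, "effective_date": None},
    2: {"code": "NLN02", "current_value": "NLN", "previous_value": None, "effective_date": None},
}
staging = [
    {"code": "NLN02", "value": "NLS", "scd_ind": "U", "s_key": 2, "current_value": "NLN"},
    {"code": "NLW02", "value": "NLW", "scd_ind": "I", "s_key": None, "current_value": None},
]
apply_type3(dimension, staging, itertools.count(3), datetime.date(2002, 1, 1))
for key, rec in sorted(dimension.items()):
    print(key, rec)
```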

Within the mapping

Mappings

The second option is to handle type III records entirely within the mappings. Again it is best to create an insert and an update mapping, similar to the first option. The difference is that the source is not the staging table but a view based on the staging and lookup tables.

The source view for the insert mapping is a query that selects all records from the staging table that do not exist in the lookup table, based on the unique key. An alternative is to place a filter on the staging table in the mapping that excludes the records that exist in the lookup table. In OWB 3i you can use the built-in key lookup function to determine whether a record exists in the lookup table.

The source view for the update mapping is a query that selects all records from the staging table that can be joined to the lookup table on the unique key and whose source value differs from the current value. In this case a filter on the staging table cannot be used, because data from the lookup table, such as the surrogate key, is needed. In a future release of OWB it will be possible to use a lookup function that returns more than one value, so that the tables no longer need to be joined.
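Both source views can be sketched in SQL. The example below runs them against sqlite3 rather than Oracle; the view names are hypothetical, and the table and column names follow the CHK_CUSTOMERS procedure in the appendix. NOT EXISTS is used instead of NOT IN because it also behaves correctly if the lookup key could be NULL; note too that the <> comparison in the update view ignores rows where either region code is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_customers (customer_code TEXT, region_code TEXT);
    CREATE TABLE dwh_customer_lookup (
        cst_code TEXT PRIMARY KEY, cst_key INTEGER, cst_current_region_code TEXT);

    -- Source view for the insert mapping: keys not yet in the lookup table.
    CREATE VIEW v_customers_ins AS
    SELECT s.customer_code, s.region_code
      FROM stg_customers s
     WHERE NOT EXISTS (SELECT 1 FROM dwh_customer_lookup l
                        WHERE l.cst_code = s.customer_code);

    -- Source view for the update mapping: known keys whose value has changed,
    -- carrying the surrogate key and current value from the lookup table.
    CREATE VIEW v_customers_upd AS
    SELECT s.customer_code, s.region_code,
           l.cst_key AS s_key, l.cst_current_region_code AS current_region_code
      FROM stg_customers s
      JOIN dwh_customer_lookup l ON l.cst_code = s.customer_code
     WHERE s.region_code <> l.cst_current_region_code;

    INSERT INTO dwh_customer_lookup VALUES ('NLN01', 1, 'NLN'), ('NLN02', 2, 'NLN');
    INSERT INTO stg_customers VALUES ('NLN01', 'NLN'), ('NLN02', 'NLS'), ('NLW02', 'NLW');
    """
)
print(conn.execute("SELECT * FROM v_customers_ins").fetchall())
print(conn.execute(
    "SELECT customer_code, s_key, current_region_code FROM v_customers_upd"
).fetchall())
```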

Processing the facts

The lookup table can also be used when loading the facts into the warehouse: the dimension key in each fact record can be replaced by the corresponding surrogate key from the lookup table. The surrogate key can be found by joining the lookup table or by using the wb_lookup function (only for single-column keys).
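Replacing the natural key by the surrogate key while loading facts is essentially a dictionary lookup. In this hypothetical Python sketch the lookup maps customer_code to the surrogate key. Mapping unknown codes to a default key of -1 is an assumption on our part, but some such rule is needed: Facts.csv in the appendix contains customer codes (NLN03, GEE03) that do not occur in Customer.csv.

```python
from typing import Dict, List

UNKNOWN_KEY = -1  # hypothetical surrogate key of an "unknown customer" dimension row


def assign_customer_keys(facts: List[dict], lookup: Dict[str, int]) -> List[dict]:
    """Return copies of the facts with customer_code replaced by customer_key."""
    result = []
    for fact in facts:
        row = {k: v for k, v in fact.items() if k != "customer_code"}
        row["customer_key"] = lookup.get(fact["customer_code"], UNKNOWN_KEY)
        result.append(row)
    return result


lookup = {"NLN01": 1, "NLN02": 2}
facts = [
    {"customer_code": "NLN01", "product_code": "AP", "revenue": 11, "costs": 5},
    {"customer_code": "NLN03", "product_code": "AP", "revenue": 14, "costs": 7},
]
for row in assign_customer_keys(facts, lookup):
    print(row)
```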

Type II

Type II SCD is used to partition history. It allows you to keep track of all changes made in the past. Every time a change occurs, a new dimension record is created with a new surrogate key but with the same unique source key. For example, suppose you want to keep track of changes in product names. Every time the name of a product changes, a new record is created in the product dimension with a different surrogate key and the new product name. From that moment on, all fact records are joined to this new record. This way you get an accurate representation of your product sales over time.

In this document we distinguish two cases in which to apply SCD type II:

1. One change per dimension key per load.

2. Multiple changes per dimension key per load.

In the first case we can simply join the new facts to the new dimension record. In the second case, however, we must carefully decide to which of the new dimension records the incoming facts will be joined: do we want to join all fact records to the most recently changed record in the load, or to the record that was current when the fact occurred? In either case we need some sort of transaction date indicating when the source record became valid. This must be provided by the source system.

In both cases we assume that from the moment a new version of a source (dimension) record becomes valid, all facts are linked to that new version. This means we do not take into account that, for example, ‘old’ products may still be sold after a new version of the product has been created. We also assume that no load contains source records older than those from previous loads. Furthermore, in the case of multiple changes per load, we assume that the dimension records of the recent month are not equal to the most current record from the previous load.

The implementation of type II in OWB roughly consists of the following steps:

• Create a current_flag field in the dimension, which indicates (Y/N) whether the record contains the most recent dimension value. Also add effective_start_date and effective_end_date fields, holding the date/time on which the record became current and on which it stopped being current. Optionally, create a text field change_reason, which can be used to explain why the change occurred.

• Create a procedure that determines whether a source record is new or already exists in the dimension and, for an existing record, whether any of the predefined attributes have changed.

• Insert a new record and, for an existing value, update the previous dimension record by changing its current_flag and effective_end_date fields.
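The effect of these steps on the dimension can be sketched in Python: when a change arrives, the current record for the key is closed (current_flag ‘N’, end date one day before the new start date) and a new record with the next surrogate key becomes current. The function, field layout and product data are hypothetical, and in the approach described in this document the closing update is done by a separate procedure rather than in the same step as the insert.

```python
import datetime
import itertools
from typing import Iterator, List, Optional


def apply_type2_change(dim: List[dict], code: str, attrs: dict, start: datetime.date,
                       seq: Iterator[int], reason: Optional[str] = None) -> None:
    """Close the current record for `code` (if any) and add a new current record."""
    for row in dim:
        if row["code"] == code and row["current_flag"] == "Y":
            row["current_flag"] = "N"
            row["effective_end_date"] = start - datetime.timedelta(days=1)
    dim.append({
        "s_key": next(seq), "code": code, **attrs,
        "current_flag": "Y", "effective_start_date": start,
        "effective_end_date": None, "change_reason": reason,
    })


seq = itertools.count(1)
products: List[dict] = []
apply_type2_change(products, "AP", {"description": "Apple"}, datetime.date(2001, 1, 1), seq)
apply_type2_change(products, "AP", {"description": "Green apple"},
                   datetime.date(2001, 3, 1), seq, reason="renamed")
for row in products:
    print(row)
```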

The first thing to do when implementing a type II SCD is to decide which dimension field(s) will trigger a type II change. Having done so, a new field <current_flag> must be created in the (target) dimension, which indicates whether the record holds the current value. Furthermore, an <effective_start_date> field must be created and an <effective_end_date> field can be created, holding the date/time on which the record became current and stopped being current. An optional text field <change_reason> can be created for query purposes.

Strictly speaking it is not necessary to store both a current indicator and a start date, but the indicator allows the most current value to be retrieved quickly. It is very important that the date fields are used correctly. It is not always simply a matter of using the system date, because that could leave your dimension out of line with your time dimension. It is best to use a transaction date provided by the source system, if possible.

The next step is to find a way to detect a change of the dimension value. As described for type III, the most desirable situation is one where the source system indicates the change by setting a flag or by providing a file containing only the updated source records.

Again, if this is not possible, a solution must be created in OWB. This is best done with a procedure that determines what to do with the source records before the mapping from staging to ODS/DWH is run. Implementing everything in a mapping is not advisable, because it would become very complex and perform poorly.

One change per dimension record per load

When there is only one change per dimension record in a load, only the current_flag of the dimension record is needed. As in the type III implementation, it is best to create a lookup table containing the most current records (based on the current_flag) with their surrogate and unique production keys (for updating the dimension afterwards).

Based on the unique key of the source record, the procedure first determines whether the key already exists in the lookup table. If it does not, the procedure inserts an ‘I’ into the SCD_ind column of the staging table, as for type III. If the key does exist in the lookup table, the procedure must check whether any of the predefined attributes have changed. This can be done by taking the surrogate key from the lookup table and using it to compare the source record attributes with those of the actual dimension record. If one or more of the attributes have changed, the procedure inserts a ‘U’ into the SCD_ind column of the staging table. In this case only an SCD_ind column has to be added to the load/staging table.

For both the ‘I’ and the ‘U’ records, a new record must be inserted into the dimension with current_flag = ‘Y’ and a new surrogate key. In addition, the dimension records that were current before the load must be updated: their current flag must be set to ‘N’ and their effective end date to the start date of the new record minus one day. These records can be found by selecting the unique keys from the staging table where SCD_ind = ‘U’ and using them to retrieve, for each key, the dimension record whose end date is null and whose surrogate key is the lowest. This can be done by a procedure that is executed after both mappings have run.
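The post-mapping procedure can be sketched in two steps, shown here with sqlite3 instead of Oracle PL/SQL: first, for every key flagged ‘U’, find the open record with the lowest surrogate key (the previous version) and the latest start date (the new version); then close the former. Table and column names are hypothetical, dates are stored as ISO strings, and computing the targets before updating avoids reading the table while it is being modified.

```python
import datetime
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stg_products (product_code TEXT, scd_ind TEXT);
    CREATE TABLE dwh_products (
        s_key INTEGER PRIMARY KEY, product_code TEXT, description TEXT,
        current_flag TEXT, effective_start_date TEXT, effective_end_date TEXT);
    -- State right after both mappings ran: AP changed ('U'), BR is new ('I').
    INSERT INTO stg_products VALUES ('AP', 'U'), ('BR', 'I');
    INSERT INTO dwh_products VALUES
        (1, 'AP', 'Apple',       'Y', '2001-01-01', NULL),
        (2, 'AP', 'Green apple', 'Y', '2001-03-01', NULL),
        (3, 'BR', 'Beer',        'Y', '2001-03-01', NULL);
    """
)


def close_previous_versions(conn: sqlite3.Connection) -> None:
    # Step 1: per changed key, the old version (lowest open s_key) and the new start date.
    targets = conn.execute(
        """SELECT MIN(d.s_key), MAX(d.effective_start_date)
             FROM dwh_products d
            WHERE d.effective_end_date IS NULL
              AND d.product_code IN (SELECT product_code FROM stg_products WHERE scd_ind = 'U')
            GROUP BY d.product_code"""
    ).fetchall()
    # Step 2: close the old version; its end date is the new start date minus one day.
    updates = [
        ((datetime.date.fromisoformat(new_start) - datetime.timedelta(days=1)).isoformat(), s_key)
        for s_key, new_start in targets
    ]
    conn.executemany(
        "UPDATE dwh_products SET current_flag = 'N', effective_end_date = ? WHERE s_key = ?",
        updates,
    )
    conn.commit()


close_previous_versions(conn)
print(conn.execute(
    "SELECT s_key, current_flag, effective_end_date FROM dwh_products ORDER BY s_key"
).fetchall())
```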

After this is done, the lookup table must be refreshed. This can be done by truncating the table and filling it with the dimension records that have current_flag = ‘Y’.

When loading the fact records, you simply select the dimension records with current_flag = ‘Y’, which are exactly the records stored in the lookup table. It is therefore best to use the lookup table to find the surrogate keys corresponding to the unique keys.

Multiple changes per dimension record per load

In case of multiple changes to the same dimension record within one load, the current_flag indicator alone is not sufficient to join the fact records, especially when you want to join each fact record to the dimension record that was current when the fact occurred.

Connect all facts to the most recent dimension record from one load

If you want to connect your facts to the most recently changed dimension record, you can use a procedure similar to the one described for the case of one change per dimension record per load. Records with the same key but different attributes are all inserted into the dimension. The only thing to be aware of is that the source data must be presorted (by key, then transaction date ascending) before it is loaded into the dimension. This way the most recently changed record is inserted last and therefore receives the highest surrogate key. When updating the dimension after running the mapping, you should not only update the dimension records from previous loads but also, for each key with more than one record in the current load, delete all of those records except the one with the highest surrogate key.
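The following Python sketch illustrates this presort, insert and delete sequence for a single load: staging rows are sorted by key and transaction date, each receives the next surrogate key, previously current records are flagged ‘N’, and all but the highest surrogate key per key from this load are deleted again. Data and field names are hypothetical, and effective dates are left out for brevity.

```python
import itertools
from typing import Dict, Iterator, List


def load_keep_latest(dim: List[dict], staging: List[dict], seq: Iterator[int]) -> None:
    first_new = len(dim)
    # Presort (key, transaction date ascending) so the latest version gets the highest s_key.
    for rec in sorted(staging, key=lambda r: (r["code"], r["transaction_date"])):
        dim.append({**rec, "s_key": next(seq), "current_flag": "Y"})

    highest: Dict[str, int] = {}
    for row in dim[first_new:]:
        highest[row["code"]] = max(highest.get(row["code"], 0), row["s_key"])

    # Records from previous loads that are superseded in this load are no longer current.
    for row in dim[:first_new]:
        if row["code"] in highest and row["current_flag"] == "Y":
            row["current_flag"] = "N"

    # Delete this load's records except the one with the highest surrogate key per code.
    dim[first_new:] = [r for r in dim[first_new:] if r["s_key"] == highest[r["code"]]]


products = [{"code": "AP", "description": "Apple", "transaction_date": "20010101",
             "s_key": 1, "current_flag": "Y"}]
staging = [
    {"code": "AP", "description": "Red apple", "transaction_date": "20010301"},
    {"code": "AP", "description": "Green apple", "transaction_date": "20010201"},
]
load_keep_latest(products, staging, itertools.count(2))
for row in products:
    print(row)
```

The YYYYMMDD transaction dates from the source files sort correctly as strings, so no date parsing is needed for the presort.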

After the mapping, the lookup table must be refreshed as in the case of one change per dimension record per load.

The fact records can now simply be joined to the dimension record with the current flag set to ‘Y’, similar to type III.

Connect facts to the correct dimension record in time from one load

It’s different when you want to correctly join the fact records in time to the dimension records from the same load. You also need a timestamp to correctly join the fact records to the dimension, which must be provided by the source system.

The processing of the dimension records is the same as in the case above, except that no records are deleted from the dimension after running the mapping. You still determine the most recent dimension record and mark it by setting its current flag to ‘Y’, because the lookup table must be refreshed with the most recent dimension records.

You can now connect the facts by selecting the dimension record based on the unique source key plus a transaction date that must lie between the dimension’s effective start and end dates. This implies that your facts need some sort of transaction date to tie them to the right version of the dimension record. This date must be provided by the source system.
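Selecting the dimension record that was valid on the fact's transaction date amounts to a range check on the effective dates, with an empty end date meaning the record is still current. A minimal Python sketch with hypothetical data follows; in SQL the same rule would be a join on the unique key with start date <= transaction date and (end date IS NULL or transaction date <= end date).

```python
import datetime
from typing import List, Optional


def s_key_at(dim: List[dict], code: str, tx_date: datetime.date) -> Optional[int]:
    """Surrogate key of the record for `code` that was valid on `tx_date`, if any."""
    for row in dim:
        end = row["effective_end_date"]
        if (row["code"] == code
                and row["effective_start_date"] <= tx_date
                and (end is None or tx_date <= end)):
            return row["s_key"]
    return None


D = datetime.date
products = [
    {"s_key": 1, "code": "AP", "effective_start_date": D(2001, 1, 1), "effective_end_date": D(2001, 2, 28)},
    {"s_key": 2, "code": "AP", "effective_start_date": D(2001, 3, 1), "effective_end_date": None},
]
print(s_key_at(products, "AP", D(2001, 2, 1)))    # fact from February: first version
print(s_key_at(products, "AP", D(2001, 3, 15)))   # fact from March: second version
print(s_key_at(products, "AP", D(2000, 12, 31)))  # before any version: no match
```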

Comments or questions

Please send any comments or questions to [email protected].

Appendices

This appendix gives an example implementation of each SCD type in OWB. Each example uses a simple star schema containing one fact table, a time dimension, a supplier dimension, a customer dimension and a product dimension.

The DWH_SUPPLIERS dimension is a type I SCD, DWH_CUSTOMERS a type III SCD and DWH_PRODUCTS a type II SCD.

The source files for this model are:

Customer.csv

Customer_code  Customer_name  Region_code  Region_name    Country_code  Country_name
NLN01          AAD            NLN          Holland North  NL            Holland
NLN02          BERT           NLN          Holland North  NL            Holland
NLS01          KLAAS          NLS          Holland South  NL            Holland
NLS02          JAN            NLS          Holland South  NL            Holland
NLS03          GERT           NLS          Holland South  NL            Holland
NLW01          KOEN           NLW          Holland West   NL            Holland
GEN01          MAX            GEN          Germany North  GE            Germany
GEN02          DAVE           GEN          Germany North  GE            Germany
GEN03          LIEKE          GEN          Germany North  GE            Germany
GEE01          ANNE           GEE          Germany East   GE            Germany
GEE02          WENDY          GEE          Germany East   GE            Germany
BEW01          MARCEL         BEW          Belgium West   BE            Belgium
BEW02          ARNOUD         BEW          Belgium West   BE            Belgium
BEW03          VIVIAN         BEW          Belgium West   BE            Belgium
BES01          IRMA           BES          Belgium South  BE            Belgium
BES02          BEA            BES          Belgium South  BE            Belgium
BES03          GEERT          BES          Belgium South  BE            Belgium

Products.csv

product_code  product_description  product_group_code  product_group_description  product_category_code  product_category_description  transaction_date
AP            Apple                F                   Fruit                      F                      Food                          20010101
BR            Beer                 L                   Liquor                     D                      Drinks                        20010101
CK            Coke                 N                   Non-Liquor                 D                      Drinks                        20010101
FN            Fanta                N                   Non-Liquor                 D                      Drinks                        20010101
PS            peas                 V                   Vegatable                  F                      Food                          20010101
CL            Cauliflower          V                   Vegatable                  F                      Food                          20010101
GN            Gin                  L                   Liquor                     D                      Drinks                        20010101
PR            Pear                 F                   Fruit                      F                      Food                          20010101
PN            Pineapple            F                   Fruit                      F                      Food                          20010101

Supplier.csv

Supplier_code  Supplier_name
A              SCHUITEMA
B              AHOLD

Facts.csv

Customer_code  Product_code  Supplier_code  Transaction_date  Revenue  Costs
NLN01          AP            A              20010201          11       5
NLN01          BR            A              20010201          12       4
NLN02          AP            A              20010201          12       6
NLN02          CK            A              20010201          13       3
NLN03          AP            A              20010201          14       7
NLS01          FN            A              20010201          11       5
NLS01          FN            A              20010201          12       4
NLS01          PS            A              20010201          12       6
NLS01          CL            A              20010201          13       3
NLS02          PR            A              20010201          14       7
NLS03          AP            A              20010201          14       7
NLS03          CL            A              20010201          11       5
NLS03          PS            A              20010201          12       4
NLW01          AP            A              20010201          12       6
NLW01          CK            A              20010201          13       3
NLW01          AP            A              20010201          14       7
GEN01          AP            B              20010201          11       5
GEN01          BR            B              20010201          12       4
GEN02          AP            B              20010201          12       6
GEN02          CK            B              20010201          13       3
GEN02          AP            B              20010201          14       7
GEN03          FN            B              20010201          11       5
GEE01          FN            B              20010201          12       4
GEE02          PS            B              20010201          12       6
GEE01          CL            B              20010201          13       3
GEE02          PR            B              20010201          14       7
GEE02          AP            B              20010201          14       7
GEN01          CL            B              20010201          11       5
GEE03          PS            B              20010201          12       4
BES01          AP            B              20010201          12       6
BES01          CK            B              20010201          13       3
BES01          AP            B              20010201          14       7

Example of Type I in OWB

This chapter describes an example of how to implement a type I SCD in OWB. The example handles the dimension DWH_SUPPLIERS, which contains suppliers. First the original source file is loaded into the staging table STG_SUPPLIERS. The data is then loaded from the staging table into the dimension via an OWB mapping, which is a normal INSERT/UPDATE mapping. After the dimension has been filled, one of the supplier names is corrected and a new supplier is added to the source file supplier.csv. This updated file is loaded into the staging table and processed into the dimension, where the record with the corrected supplier name is simply updated and a new record is created for the new supplier.

The mapping from source file to staging table looks like:

The mapping from staging to the dimension is designed as:

The mapping properties of the dimension are:

After the initial load the dimension contains:

Now the source file is edited as follows:

Supplier_code  Supplier_name
A              SCHUITEMA
B              AHOLDS
C              HEINEKEN

After reloading and processing the dimension contains:

Example of Type III in OWB

This chapter describes an example of how to implement a type III SCD in OWB. The example handles the dimension DWH_CUSTOMERS. The attribute that triggers the type III change is the region code; we assume that if the region code has changed, the region name has changed too. First the original source file is loaded into the staging table STG_CUSTOMERS. The data is then loaded from the staging table into the dimension via an insert and an update OWB mapping. There are two ways to do this:

1. Determine whether to insert or update a record before running the mapping.

2. Determine whether to insert or update a record while running the mapping.

In the first case, a procedure checks whether a record already exists and, if it does, whether its region has changed. The procedure uses a lookup table based on the customer dimension and updates the staging table with data from this lookup table; the mappings then use this data to process the source records. In the second case, views based on the staging table and the lookup table are the input for the mappings from staging to DWH.

After the dimension is filled, the region (and region name) of one of the customers is corrected and a new customer is added in the source file. This updated file is loaded into the staging table and from there the data is processed to the dimension, where the corrected record causes an update of the current and previous attributes and a new record is created for the new customer.

The mapping from source file to staging table looks like (in both cases):

As can be seen in the mapping, the staging table contains 4 extra columns: SCD_IND, CURRENT_REGION_CODE, CURRENT_REGION_NAME and S_KEY. These columns are filled with data from the lookup table by a procedure in the first case. They’re not used in the second case.

1. Determine whether to insert or update a record before running the mapping.

After the source-to-staging mapping has run, the following procedure is executed to determine whether an insert or an update is required.


CREATE OR REPLACE PROCEDURE "CHK_CUSTOMERS" IS
begin
  update stg_customers
     set SCD_ind             = null
       , current_region_code = null
       , current_region_name = null
       , s_key               = null
   where SCD_ind is not null;
  commit;

  /* Determine all new rows. */
  update stg_customers cst
     set cst.SCD_ind = 'I'
   where (cst.customer_code) not in
         (select cst2.cst_code
            from dwh_customer_lookup cst2);
  commit;

  /* Determine all changed rows. */
  update stg_customers cst
     set (cst.SCD_ind
        , cst.current_region_code
        , cst.current_region_name
        , cst.s_key) =
         (select 'U'
               , cst_current_region_code
               , cst_current_region_name
               , cst_key
            from dwh_customer_lookup
           where cst_code = cst.customer_code)
   where (cst.customer_code) in
         (select customer_code
            from (select cst2.customer_code customer_code
                       , cst2.region_code
                    from stg_customers cst2
                   where cst2.SCD_ind is null
                  minus
                  select clkp.cst_code
                       , clkp.cst_current_region_code
                    from dwh_customer_lookup clkp));
  commit;
end;
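To make the three UPDATE statements of CHK_CUSTOMERS easier to follow, here is a sketch of the same classification run against SQLite through Python's sqlite3 module. SQLite has no MINUS, so EXCEPT is used, and each target column gets its own correlated subquery instead of Oracle's multi-column SET (...) = (SELECT ...). The staging columns are reduced to the ones the procedure touches, and all sample rows are hypothetical.

```python
import sqlite3


def chk_customers(conn: sqlite3.Connection) -> None:
    """Mirror of CHK_CUSTOMERS: flag staging rows as new ('I') or changed ('U')."""
    # Clear the flags left over from a previous run.
    conn.execute("""
        UPDATE stg_customers
           SET scd_ind = NULL, current_region_code = NULL,
               current_region_name = NULL, s_key = NULL
         WHERE scd_ind IS NOT NULL""")
    # New rows: the customer code is not in the lookup table.
    conn.execute("""
        UPDATE stg_customers
           SET scd_ind = 'I'
         WHERE customer_code NOT IN (SELECT cst_code FROM dwh_customer_lookup)""")
    # Changed rows: (code, region) differs from the lookup; copy the current values.
    conn.execute("""
        UPDATE stg_customers
           SET scd_ind = 'U',
               current_region_code = (SELECT l.cst_current_region_code FROM dwh_customer_lookup l
                                       WHERE l.cst_code = stg_customers.customer_code),
               current_region_name = (SELECT l.cst_current_region_name FROM dwh_customer_lookup l
                                       WHERE l.cst_code = stg_customers.customer_code),
               s_key = (SELECT l.cst_key FROM dwh_customer_lookup l
                         WHERE l.cst_code = stg_customers.customer_code)
         WHERE customer_code IN (
               SELECT customer_code FROM (
                   SELECT customer_code, region_code FROM stg_customers WHERE scd_ind IS NULL
                   EXCEPT
                   SELECT cst_code, cst_current_region_code FROM dwh_customer_lookup))""")
    conn.commit()


def demo() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE stg_customers (customer_code TEXT, region_code TEXT,
                    region_name TEXT, scd_ind TEXT, current_region_code TEXT,
                    current_region_name TEXT, s_key INTEGER)""")
    conn.execute("""CREATE TABLE dwh_customer_lookup (cst_code TEXT, cst_key INTEGER,
                    cst_current_region_code TEXT, cst_current_region_name TEXT)""")
    # Hypothetical lookup contents after an earlier load.
    conn.executemany("INSERT INTO dwh_customer_lookup VALUES (?, ?, ?, ?)",
                     [("NLN01", 1, "NLN", "Holland North"),
                      ("NLN02", 2, "NLS", "Holland South")])
    # Hypothetical reload: NLN01 changed region, NLN02 is unchanged, NLN03 is new.
    conn.executemany("INSERT INTO stg_customers (customer_code, region_code, region_name) "
                     "VALUES (?, ?, ?)",
                     [("NLN01", "NLS", "Holland South"),
                      ("NLN02", "NLS", "Holland South"),
                      ("NLN03", "NLN", "Holland North")])
    chk_customers(conn)
    return {r[0]: r[1:] for r in conn.execute(
        "SELECT customer_code, scd_ind, current_region_code, s_key "
        "FROM stg_customers ORDER BY customer_code")}


print(demo())  # {'NLN01': ('U', 'NLN', 1), 'NLN02': (None, None, None), 'NLN03': ('I', None, None)}
```

Unchanged customers keep a null SCD_ind, so neither the insert nor the update mapping picks them up.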

The definition of the lookup table DWH_CUSTOMER_LOOKUP is:

The insert mapping from staging to the dimension is designed as:


The operator CST_SEQ is a sequence, which generates the primary/surrogate key (CST_ID) of the dimension. The constant in the mapping is the sysdate (could also be some other logical date). The filter used in the mapping makes sure that only the records with SCD_ind = ‘I’ (records to be inserted) are selected. The mapping properties of the dimension are:

The update mapping from staging to the dimension is designed as:


The filter used in the mapping makes sure that only the records with SCD_ind = ‘U’ (records to be updated) are selected. The S_KEY identifies the dimension record to be updated. The current region attributes of the dimension are overwritten with the region attributes from the staging table, and the previous region attributes with CURRENT_REGION_CODE and CURRENT_REGION_NAME from staging, which hold the values that were current in the dimension before this load. The mapping properties of the dimension are:

After running both the mappings the lookup table must be updated to reflect the recent changes made to the dimension. This is done by the following procedure:

CREATE OR REPLACE PROCEDURE "UPD_CST_LKP" IS
begin
  DELETE FROM DWH_CUSTOMER_LOOKUP;
  COMMIT;
  INSERT INTO DWH_CUSTOMER_LOOKUP
         (cst_code
         ,cst_key
         ,cst_current_region_code
         ,cst_current_region_name)
  SELECT cst_code
       , cst_id
       , reg_current_code
       , reg_current_name
    FROM DWH_CUSTOMERS
   GROUP BY cst_code
          , cst_id
          , reg_current_code
          , reg_current_name;
  COMMIT;
end;
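The update mapping of the first approach, described earlier in this section, comes down to a single UPDATE keyed on S_KEY. The sketch below expresses it in SQLite via Python's sqlite3 module. The previous-region column names reg_previous_code and reg_previous_name are hypothetical (the document only shows reg_current_code and reg_current_name), as are the sample rows.

```python
import sqlite3

# Correlated subquery that fetches one staging column for the dimension row being updated.
PICK = ("(SELECT s.{col} FROM stg_customers s "
        "WHERE s.scd_ind = 'U' AND s.s_key = dwh_customers.cst_id)")


def update_type3(conn: sqlite3.Connection) -> int:
    """Type III update: old current region becomes 'previous', staging region becomes current."""
    cur = conn.execute(f"""
        UPDATE dwh_customers
           SET reg_previous_code = {PICK.format(col='current_region_code')},
               reg_previous_name = {PICK.format(col='current_region_name')},
               reg_current_code  = {PICK.format(col='region_code')},
               reg_current_name  = {PICK.format(col='region_name')}
         WHERE cst_id IN (SELECT s_key FROM stg_customers WHERE scd_ind = 'U')""")
    conn.commit()
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE dwh_customers (cst_id INTEGER PRIMARY KEY, cst_code TEXT,
                    reg_current_code TEXT, reg_current_name TEXT,
                    reg_previous_code TEXT, reg_previous_name TEXT)""")
    conn.execute("""CREATE TABLE stg_customers (customer_code TEXT, region_code TEXT,
                    region_name TEXT, scd_ind TEXT, current_region_code TEXT,
                    current_region_name TEXT, s_key INTEGER)""")
    conn.executemany("INSERT INTO dwh_customers VALUES (?, ?, ?, ?, NULL, NULL)",
                     [(1, "NLN01", "NLN", "Holland North"),
                      (2, "NLN02", "NLS", "Holland South")])
    # Staging as CHK_CUSTOMERS would leave it: NLN01 changed region, NLN03 is new.
    conn.executemany("INSERT INTO stg_customers VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [("NLN01", "NLS", "Holland South", "U", "NLN", "Holland North", 1),
                      ("NLN03", "NLN", "Holland North", "I", None, None, None)])
    updated = update_type3(conn)
    rows = {r[0]: r[1:] for r in conn.execute(
        "SELECT cst_id, reg_current_code, reg_current_name, reg_previous_code, "
        "reg_previous_name FROM dwh_customers")}
    return rows, updated


print(demo())
```

Each run overwrites the previous values, so a type III dimension with one set of previous columns only remembers the most recent earlier region.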

After the initial load and running the procedure to set the SCD_ind the staging table contains:

All the records have ‘I’ as SCD_ind, which means they will all be inserted as new records in the dimension. Only the insert mapping therefore fills the dimension, which will look like:

After running the 2 mappings the lookup table must be refreshed:


Now the source file is edited as follows:

Customer_code  Customer_name  Region_code  Region_name    Country_code  Country_name
NLN01          AAD            NLS          Holland South  NL            Holland
NLN03          GERT           NLN          Holland North  NL            Holland
…

After reloading and running CHK_CUSTOMERS, the staging table contains:

The record with customer_code ‘NLN01’ has SCD_ind = ‘U’, which means that this code already exists in the dimension and that its current region in the dimension differs from the region code in the source. The S_KEY column contains the key of the dimension record to be updated, and CURRENT_REGION_CODE (and CURRENT_REGION_NAME, not shown here) contains the current region code from the dimension. The insert mapping and the update mapping will now each process one record. The dimension after processing:


2. Determine whether to insert or update a record while running the mapping.

The following view is used as source for the insert mapping to the customer dimension:

The query of the view is defined as:

SELECT c.customer_code
     , c.customer_name
     , c.region_code
     , c.region_name
     , c.country_code
     , c.country_name
  FROM STG_CUSTOMERS c
 WHERE c.customer_code NOT IN (SELECT cl.cst_code
                                 FROM DWH_CUSTOMER_LOOKUP cl)

The insert mapping from staging to the dimension is designed as:

Mapping properties:


The view for the update mapping is:

The definition of the view is:

SELECT "STG_CUSTOMERS"."CUSTOMER_CODE"
     , "STG_CUSTOMERS"."CUSTOMER_NAME"
     , "STG_CUSTOMERS"."REGION_CODE"
     , "STG_CUSTOMERS"."REGION_NAME"
     , "STG_CUSTOMERS"."COUNTRY_CODE"
     , "STG_CUSTOMERS"."COUNTRY_NAME"
     , "DWH_CUSTOMER_LOOKUP"."CST_CURRENT_REGION_CODE"
     , "DWH_CUSTOMER_LOOKUP"."CST_CURRENT_REGION_NAME"
     , "DWH_CUSTOMER_LOOKUP"."CST_KEY"
  FROM "STG_CUSTOMERS" "STG_CUSTOMERS"
     , "DWH_CUSTOMER_LOOKUP" "DWH_CUSTOMER_LOOKUP"
 WHERE "STG_CUSTOMERS"."CUSTOMER_CODE" = "DWH_CUSTOMER_LOOKUP"."CST_CODE"
   AND "STG_CUSTOMERS"."REGION_CODE" <> "DWH_CUSTOMER_LOOKUP"."CST_CURRENT_REGION_CODE"
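To see which staging rows each view of the second approach hands to its mapping, the sketch below creates equivalent views in SQLite through Python's sqlite3 module. The view names stg_cst_ins_v and stg_cst_upd_v are hypothetical (the document does not name the views), the column list is trimmed, and the sample rows are invented.

```python
import sqlite3

DDL = """
CREATE TABLE stg_customers (customer_code TEXT, region_code TEXT, region_name TEXT);
CREATE TABLE dwh_customer_lookup (cst_code TEXT, cst_key INTEGER,
                                  cst_current_region_code TEXT, cst_current_region_name TEXT);
-- Insert view: customers that are not in the lookup table yet.
CREATE VIEW stg_cst_ins_v AS
SELECT c.customer_code, c.region_code, c.region_name
  FROM stg_customers c
 WHERE c.customer_code NOT IN (SELECT cl.cst_code FROM dwh_customer_lookup cl);
-- Update view: known customers whose region differs from the current one.
CREATE VIEW stg_cst_upd_v AS
SELECT s.customer_code, s.region_code, s.region_name,
       l.cst_current_region_code, l.cst_current_region_name, l.cst_key
  FROM stg_customers s
  JOIN dwh_customer_lookup l ON s.customer_code = l.cst_code
 WHERE s.region_code <> l.cst_current_region_code;
"""


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dwh_customer_lookup VALUES (?, ?, ?, ?)",
                     [("NLN01", 1, "NLN", "Holland North"),
                      ("NLN02", 2, "NLS", "Holland South")])
    conn.executemany("INSERT INTO stg_customers VALUES (?, ?, ?)",
                     [("NLN01", "NLS", "Holland South"),
                      ("NLN02", "NLS", "Holland South"),
                      ("NLN03", "NLN", "Holland North")])
    inserts = conn.execute("SELECT * FROM stg_cst_ins_v ORDER BY customer_code").fetchall()
    updates = conn.execute("SELECT * FROM stg_cst_upd_v ORDER BY customer_code").fetchall()
    return inserts, updates


ins, upd = demo()
print(ins)  # [('NLN03', 'NLN', 'Holland North')]
print(upd)  # [('NLN01', 'NLS', 'Holland South', 'NLN', 'Holland North', 1)]
```

Because both views read DWH_CUSTOMER_LOOKUP rather than the dimension itself, running the insert mapping first does not change what the update view returns; the lookup table is refreshed only after both mappings have run.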

The update mapping is then:


Mapping properties:

Be aware that the lookup table must also be refreshed, as in the first approach.


Example of Type II in OWB

This chapter describes an example of how to implement a type II SCD in OWB. The example uses the dimension DWH_PRODUCTS. First, the original source file is loaded into the staging table STG_PRODUCTS. The data is then loaded from the staging table into the dimension by an insert and an update OWB mapping. There are two ways to do this:

1. One change per product per load.
2. Multiple changes per product per load.

In both cases a procedure checks whether a record already exists and, if it does, whether any of its attributes have changed. This procedure uses a lookup table based on the product dimension. An insert mapping then fills the dimension with the new and corrected products. In the second case the records from staging are presorted by product code and transaction date. Next, some records in the dimension are updated to reflect the new situation correctly, after which the lookup table is refreshed with these changes.

After the dimension has been filled, the description of one of the products is corrected and a new product is added to the source file. The updated file is loaded into the staging table and processed into the dimension: the corrected record causes a new version of the product to be inserted and the previous version to be closed, and a new record is created for the new product.

For the second case, three corrected records for one product, each with a different transaction date, are created in the source file. This updated file is loaded into the staging table and processed into the dimension, where each corrected record becomes a new version and all versions except the latest one of the product are updated.

The mapping from source file to staging table looks like (in both cases):

As can be seen in the mapping, the staging table contains two extra columns: SCD_IND and S_KEY. These columns are filled with data from the lookup table by a procedure in the first case. The S_KEY is not used in the second case.

1. One change per product per load.

After the source-to-staging mapping has run, the following procedure is executed to determine whether an insert or an update is required.

begin
  update stg_products
     set SCD_ind = null
       , s_key   = null
   where SCD_ind is not null;
  commit;

  /* Determine the new rows. */
  update stg_products prd
     set prd.SCD_ind = 'I'
   where not exists
         (select 'x'
            from dwh_product_lookup prd2
           where prd2.prd_code = prd.product_code);
  commit;

  /* Determine the changed rows. */
  update stg_products prd
     set (prd.SCD_ind
        , prd.s_key) =
         (select 'U'
               , prd_key
            from dwh_product_lookup
           where prd_code = prd.product_code)
   where (prd.product_code) in
         (select product_code
            from (select prd2.product_code product_code
                       , prd2.product_description
                       , prd2.product_group_code
                       , prd2.product_group_description
                       , prd2.product_category_code
                       , prd2.product_category_description
                    from stg_products prd2
                   where prd2.SCD_ind is null
                  minus
                  select plkp.prd_code
                       , plkp.prd_description
                       , plkp.pgr_code
                       , plkp.pgr_description
                       , plkp.pcg_code
                       , plkp.pcg_description
                    from dwh_products plkp
                   where plkp.prd_current_flag = 'Y'));
  commit;
end;
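The third UPDATE in this procedure flags a product as changed when any of its five descriptive attributes differs from the current dimension version, and then marks every staging row with that product code as 'U'. The pure-Python restatement below makes that behaviour explicit. It is a sketch, not the OWB implementation: the current dimension versions and their lookup keys are folded into one hypothetical dictionary, the staging rows are modelled on the case 2 source file, and the new product 'BN' is invented.

```python
ATTRS = ("product_description", "product_group_code", "product_group_description",
         "product_category_code", "product_category_description")


def chk_products(staging: list, current: dict) -> None:
    """Set 'scd_ind' and 's_key' on each staging row (dict), like the procedure above.

    current maps prd_code -> dict with the ATTRS of the current version plus 'prd_key'.
    """
    for row in staging:  # reset the flags of a previous run
        row["scd_ind"] = row["s_key"] = None
    for row in staging:  # new rows: the code is not in the dimension
        if row["product_code"] not in current:
            row["scd_ind"] = "I"
    # Codes with at least one unflagged staging row that differs from the current version.
    changed = {
        row["product_code"] for row in staging
        if row["scd_ind"] is None
        and any(row[a] != current[row["product_code"]][a] for a in ATTRS)
    }
    for row in staging:  # every row of a changed code is flagged, as in the SQL
        if row["product_code"] in changed:
            row["scd_ind"] = "U"
            row["s_key"] = current[row["product_code"]]["prd_key"]


def make_row(code, desc, grp, grp_desc, cat, cat_desc, tdate):
    return dict(zip(("product_code",) + ATTRS + ("transaction_date",),
                    (code, desc, grp, grp_desc, cat, cat_desc, tdate)))


CURRENT = {  # hypothetical current versions
    "AP": {**make_row("AP", "Apples", "F", "Fruit", "F", "Food", None), "prd_key": 1},
    "CG": {**make_row("CG", "Cognac", "L", "Liquor", "D", "Drinks", None), "prd_key": 2},
}
STAGING = [
    make_row("AP", "Apples", "F", "Fruit", "F", "Food", "20010120"),
    make_row("AP", "Apples", "V", "Vegetable", "F", "Food", "20010125"),
    make_row("CG", "Cognac", "L", "Liquor", "D", "Drinks", "20010118"),
    make_row("CG", "Cognac 1l", "L", "Liquor", "D", "Drinks", "20010127"),
    make_row("BN", "Bananas", "F", "Fruit", "F", "Food", "20010121"),
]

chk_products(STAGING, CURRENT)
print([(r["product_code"], r["scd_ind"], r["s_key"]) for r in STAGING])
# [('AP', 'U', 1), ('AP', 'U', 1), ('CG', 'U', 2), ('CG', 'U', 2), ('BN', 'I', None)]
```

A staging row identical to the current version is still flagged 'U' when another row of the same product differs, because the SQL selects rows by product_code IN (...).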

The definition of the lookup table DWH_PRODUCT_LOOKUP is:


The insert mapping from staging to the dimension is designed as:

The operator PRD_SEQ is a sequence, which generates the primary/surrogate key (PRD_ID) of the dimension. The constant in the mapping is the string ‘Y’, which sets the current flag column. The filter used in the mapping makes sure that only the records with SCD_ind ‘I’ or ‘U’ (new products and new versions of changed products) are selected. The expression converts the transaction date from varchar2 to date. The mapping properties of the dimension are:

The update mapping from staging to the dimension is designed as:


The filter used in the mapping makes sure that only the records with SCD_ind = ‘U’ (records to be updated) are selected. The S_KEY identifies the dimension record to be updated: its current flag is set to ‘N’ and its effective end date to the transaction date of the new product version minus one day. The mapping properties of the dimension are:

After running both the mappings the lookup table must be updated to reflect the recent changes made to the dimension. This is done by the following procedure:

begin
  DELETE FROM DWH_PRODUCT_LOOKUP;
  COMMIT;
  INSERT INTO DWH_PRODUCT_LOOKUP
         (prd_code
         ,prd_key
         ,prd_effective_startdate)
  SELECT prd_code
       , prd_id
       , prd_effective_startdate
    FROM DWH_PRODUCTS
   WHERE prd_current_flag = 'Y';
  COMMIT;
end;

After the initial load and running the procedure to set the SCD_ind, the staging table contains:

All the records have ‘I’ as SCD_ind, which means they will all be inserted as new records in the dimension. Only the insert mapping therefore fills the dimension, which will look like:

After running the 2 mappings the lookup table must be refreshed:

Now the source file is edited as follows:

product_code  product_description  product_group_code  product_group_description  product_category_code  product_category_description  transaction_date
AP            Apples               F                   Fruit                      F                      Food                          20010120
CG            Cognac               L                   Liquor                     D                      Drinks                        20010118
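The insert and update mappings of case 1 can be approximated in two SQL statements. The sketch below runs them on SQLite through Python's sqlite3 module. Assumptions: dates are kept as ISO text, the dimension is trimmed to one descriptive column, SQLite's INTEGER PRIMARY KEY stands in for the PRD_SEQ sequence, and the original start dates, the corrected description 'Green apples' and the new product 'BN' are hypothetical.

```python
import sqlite3

# 'YYYYMMDD' text -> 'YYYY-MM-DD', the form SQLite's date() function understands.
ISO = "substr({0},1,4) || '-' || substr({0},5,2) || '-' || substr({0},7,2)"


def load_type2(conn: sqlite3.Connection) -> None:
    # Insert mapping: new ('I') and changed ('U') products get a new, current version.
    conn.execute(f"""
        INSERT INTO dwh_products (prd_code, prd_description, prd_effective_startdate,
                                  prd_effective_enddate, prd_current_flag)
        SELECT product_code, product_description, {ISO.format('transaction_date')}, NULL, 'Y'
          FROM stg_products
         WHERE scd_ind IN ('I', 'U')
         ORDER BY product_code, transaction_date""")
    # Update mapping: close the old version that S_KEY points to (end = new start - 1 day).
    conn.execute(f"""
        UPDATE dwh_products
           SET prd_current_flag = 'N',
               prd_effective_enddate = (
                   SELECT date({ISO.format('s.transaction_date')}, '-1 day')
                     FROM stg_products s
                    WHERE s.scd_ind = 'U' AND s.s_key = dwh_products.prd_id)
         WHERE prd_id IN (SELECT s_key FROM stg_products WHERE scd_ind = 'U')""")
    conn.commit()


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE dwh_products (prd_id INTEGER PRIMARY KEY, prd_code TEXT,
                    prd_description TEXT, prd_effective_startdate TEXT,
                    prd_effective_enddate TEXT, prd_current_flag TEXT)""")
    conn.execute("""CREATE TABLE stg_products (product_code TEXT, product_description TEXT,
                    transaction_date TEXT, scd_ind TEXT, s_key INTEGER)""")
    conn.executemany("INSERT INTO dwh_products VALUES (?, ?, ?, ?, NULL, 'Y')",
                     [(1, "AP", "Apples", "2001-01-01"), (2, "CG", "Cognac", "2001-01-01")])
    # Staging after the check procedure: AP corrected, BN new (both hypothetical).
    conn.executemany("INSERT INTO stg_products VALUES (?, ?, ?, ?, ?)",
                     [("AP", "Green apples", "20010120", "U", 1),
                      ("BN", "Bananas", "20010118", "I", None)])
    load_type2(conn)
    return conn.execute("""SELECT prd_code, prd_description, prd_effective_startdate,
                                  prd_effective_enddate, prd_current_flag
                             FROM dwh_products ORDER BY prd_id""").fetchall()


for row in demo():
    print(row)
```

The old AP version keeps its row but is closed with flag 'N', while the new version starts on the transaction date with flag 'Y', so history is preserved.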

After reloading and running CHK_PRODUCT the staging table contains:

The record with product code ‘AP’ has SCD_ind = ‘U’, which means that this code already exists in the dimension and that one or more of its attributes differ from those in the dimension. The S_KEY column contains the key of the dimension record to be updated, and the transaction date column contains the date on which the new version of the product became valid. The insert mapping and the update mapping will now each process one record. The dimension after processing the insert mapping:

After the update mapping:

2. Multiple changes per product per load.

After the source-to-staging mapping has run, the same procedure as above is executed to determine whether an insert or an update is required.

The insert mapping from staging to the dimension is designed as:


The filter selects only the records with SCD_ind in (‘I’,’U’). The records are presorted by product code and transaction date to make sure that the latest version of each product is inserted into the dimension last. The expression converts the transaction date from varchar2 to date.

Mapping properties:

The update mapping:


with the following properties per column:

The definition of the view is:

CREATE OR REPLACE VIEW DWH_UPD_PRD_V (prd_id, effective_enddate) AS
SELECT "PRD_ID", "SDATE"
  FROM (SELECT P1.PRD_ID
             , P2.PRD_EFFECTIVE_STARTDATE - 1 SDATE
          FROM DWH_PRODUCTS P1
             , DWH_PRODUCTS P2
         WHERE P2.PRD_CODE (+) = P1.PRD_CODE
           AND P2.PRD_ID   (+) = P1.PRD_ID + 1
           AND P1.PRD_CURRENT_FLAG = 'Y'
           AND P1.PRD_EFFECTIVE_ENDDATE IS NULL)
 WHERE SDATE IS NOT NULL
UNION
SELECT DISTINCT SP.S_KEY
     , SP3.S_DATE
  FROM STG_PRODUCTS SP
     , (SELECT SP2.PRODUCT_CODE
             , MIN(TO_DATE(SP2.TRANSACTION_DATE, 'YYYYMMDD')) - 1 S_DATE
          FROM STG_PRODUCTS SP2
         WHERE SP2.S_KEY IS NOT NULL
         GROUP BY SP2.PRODUCT_CODE) SP3
 WHERE SP.SCD_IND = 'U'
   AND SP.PRODUCT_CODE = SP3.PRODUCT_CODE

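This view selects the records from the product dimension that have current flag ‘Y’ and a null end date, except the one with the highest product ID per product, together with the end date each of them should get. The first part of the UNION pairs a version with the next one (P2.PRD_ID = P1.PRD_ID + 1); this relies on the versions of a product inserted in one load receiving consecutive keys, which the presorting in the insert mapping is meant to ensure. The second part adds the version that was current before this load (S_KEY in staging), with the earliest new transaction date minus one day as its end date.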

This view drives an update of the product dimension that sets the current flag to ‘N’ and fills the end date with the start date of the next version of the same product minus one day. The latest version of the product must not be updated, so that its flag keeps the value ‘Y’ and this version ends up in the lookup table.
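The end-date rule can also be stated without the PRD_ID + 1 join: per product, order the versions by start date and let each version end the day before the next one starts. The pure-Python sketch below does exactly that. It is an alternative formulation, not the view itself; the start dates of the original versions (ids 1 and 2) are hypothetical, while the later dates follow the case 2 source file.

```python
from datetime import date, timedelta
from itertools import groupby
from operator import itemgetter


def close_versions(versions):
    """versions: iterable of (prd_id, prd_code, start_date).

    Returns {prd_id: (end_date, current_flag)}. Every version except the latest one per
    product ends the day before the next version starts and gets flag 'N'; the latest
    version keeps no end date and flag 'Y'.
    """
    result = {}
    ordered = sorted(versions, key=itemgetter(1, 2))  # product code, then start date
    for _, group in groupby(ordered, key=itemgetter(1)):
        group = list(group)
        for (prd_id, _, _), (_, _, next_start) in zip(group, group[1:]):
            result[prd_id] = (next_start - timedelta(days=1), "N")
        result[group[-1][0]] = (None, "Y")
    return result


SAMPLE = [
    (1, "AP", date(2001, 1, 1)),   # hypothetical original version
    (2, "CG", date(2001, 1, 1)),   # hypothetical original version
    (3, "AP", date(2001, 1, 20)),
    (4, "AP", date(2001, 1, 25)),
    (5, "CG", date(2001, 1, 18)),
    (6, "CG", date(2001, 1, 27)),
]

for prd_id, (end, flag) in sorted(close_versions(SAMPLE).items()):
    print(prd_id, end, flag)
```

Because the order comes from the start dates rather than from consecutive surrogate keys, the result does not depend on how the sequence hands out PRD_ID values; in Oracle the same idea could be written with the LEAD analytic function.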

After running both the mappings the lookup table must be updated to reflect the recent changes made to the dimension as described in case 1.

The initial load of the product file is the same as in case 1. The product file is now updated and the changes are:

product_code  product_description  product_group_code  product_group_description  product_category_code  product_category_description  transaction_date
AP            Apples               F                   Fruit                      F                      Food                          20010120
AP            Apples               V                   Vegetable                  F                      Food                          20010125
CG            Cognac               L                   Liquor                     D                      Drinks                        20010118
CG            Cognac 1l            L                   Liquor                     D                      Drinks                        20010127

After reloading and running CHK_PRODUCT the staging table contains:

The dimension after processing the insert mapping:

As can be seen, all the changed and new records are added to the dimension with current flag ‘Y’ and an empty (null) end date. The view DWH_UPD_PRD_V used by the update mapping contains:

After the update mapping the dimension contains:


The latest version of each product kept current flag ‘Y’ and an empty end date. The other versions were updated: their current flag is set to ‘N’ and their end date to the start date of the next version minus one day. After the mappings, the lookup table must be refreshed to reflect the recent changes:


Example of loading facts

This appendix describes two examples of loading facts into the data warehouse. In the first example each product changes only once in the load; the second example deals with multiple changes of a product in one load.

The facts are loaded into the staging table using the following mapping:

1. One change per product per load.

The mapping from staging to data warehouse is designed as:

The expression converts the transaction date from varchar2 to date. The function LOOKUP_ID looks up the dimension keys cst_id (customer) and prd_id (product) based on the codes provided by the source files. The function is implemented as follows:

begin
  select cst_id
    into p_cst_id
    from dwh_customers
   where cst_code = p_cst_cd;

  /* Current product version, taken from the lookup table. */
  select prd_key
    into p_prd_id
    from dwh_product_lookup
   where prd_code = p_prd_cd;

  /* Only one value can be returned; p_prd_id is assumed to be passed
     back as an OUT parameter (the function header is not shown). */
  return p_cst_id;
end;

After loading the source file, the staging table contains:

Before the fact mapping is run, the dimension mappings have to be executed. After these mappings the dimensions contain:

DWH_SUPPLIERS

DWH_CUSTOMERS


DWH_PRODUCTS

After running the fact mapping the Sales table contains:

DWH_SALES

2. Multiple changes per product per load.

The mapping from staging to data warehouse is almost the same as the one above. The only difference is that a different lookup function is used. The mapping looks like:


The expression converts the transaction date from varchar2 to date. The function LOOKUP_ID looks up the dimension keys cst_id (customer) and prd_id (product) based on the codes provided by the source files and, for the product, on the transaction date. The function is implemented as follows:

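begin
  select cst_id
    into p_cst_id
    from dwh_customers
   where cst_code = p_cst_cd;

  /* Pick the product version that was valid on the transaction date. */
  select prd_id
    into p_prd_id
    from dwh_products
   where prd_code = p_prd_cd
     and prd_effective_startdate <= p_transaction_date
     and (prd_effective_enddate >= p_transaction_date
          or prd_effective_enddate is null);

  /* Only one value can be returned; p_prd_id is assumed to be passed
     back as an OUT parameter (the function header is not shown). */
  return p_cst_id;
end;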
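The product part of this lookup picks the version whose validity interval contains the transaction date. The pure-Python sketch below applies the same predicate to an in-memory list; the row layout and sample values are hypothetical. Like SELECT ... INTO it expects exactly one match, so it returns None where the SQL would raise NO_DATA_FOUND and raises ValueError where the SQL would raise TOO_MANY_ROWS.

```python
from datetime import date
from typing import Optional

# Hypothetical DWH_PRODUCTS rows: (prd_id, prd_code, effective_startdate, effective_enddate).
PRODUCTS = [
    (1, "AP", date(2001, 1, 1), date(2001, 1, 19)),
    (3, "AP", date(2001, 1, 20), date(2001, 1, 24)),
    (4, "AP", date(2001, 1, 25), None),  # open-ended: the current version
    (2, "CG", date(2001, 1, 1), None),
]


def lookup_prd_id(rows, prd_code: str, transaction_date: date) -> Optional[int]:
    """Return the key of the version of prd_code that was valid on transaction_date.

    Same predicate as the SQL: start <= date and (end >= date or end is null).
    """
    matches = [prd_id for prd_id, code, start, end in rows
               if code == prd_code and start <= transaction_date
               and (end is None or end >= transaction_date)]
    if len(matches) > 1:
        raise ValueError(f"overlapping versions for {prd_code}: {matches}")
    return matches[0] if matches else None


print(lookup_prd_id(PRODUCTS, "AP", date(2001, 1, 22)))  # 3
```

Comparing the end date with >= matches the convention used for the type II dimension, where a version ends on the day before its successor starts, so every date falls into exactly one version.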

After loading the source file, the staging table contains:


The dimensions Supplier and Customer are the same as shown in the case above. The product dimension is: DWH_PRODUCTS

After running the fact mapping the Sales table contains:

DWH_SALES


DWH_SALES DWH_PRODUCTS