Date Warehousing




Data Warehouse or Data Mart

There are many definitions of a data warehouse and a data mart, but no single standard definition. For our purposes we will define them as follows:

Data Warehouse
    Extreme Volume: Contains years of daily information at the lowest grain possible.
    Corporate Wide: The grouping of data elements is dictated by the corporate structure.
    Facts and Dimensions: This system is typically made up of facts.
    Serves Data to Data Marts: Will have data that is shared across groups.

Data Mart
    Specific Volume Sets: May only contain month-to-date information.
    Specific: Data is grouped by the needs of the team or group building the solution.
    Many Metrics: Has many metric tables and rollup grains.
    Serves Data to Reports: Will have data specific to only the implementation group.

Traditional Reporting Solutions



Many systems performing many different business functions

Many reports from the multiple sources

Human intervention is needed to make sense of the different reports.

This opens the door for:
  • Conflicting numbers
  • Human error
  • Misunderstood data
  • Inefficiency

The Disorganized Closet

Like a disorganized closet: the data is there, but do you know that you have that special shirt you really need on Friday?

Organize Your Data Closet

  • It takes time
  • It takes discipline
  • It takes a planned approach

Simple Data Warehousing


Automated reporting that understands all sources.

Many systems performing many different business functions

A centralized shared location for all the data.

Reports specific to each system can still be delivered.

Adding Data Marts

Data Mart


Data Mart

Reports specific to each system can still be delivered.

Automated reporting that understands all sources can be delivered.

Loading the Warehouse

The Website: A normal website where customers can come and order items from your company.

Website DB: This is your standard relational database system. It tracks a lot of information.

Data: A single data file containing all orders made by customers for that day.

Warehouse: All this data is stored in a single table called Orders.

Extract, Transform, Load (ETL) is a process used to get data into your warehouse. The typical chain of events is as follows: Your front-end system, usually a transactional system, will send data and information to its relational database. At certain intervals (nightly, hourly, or otherwise) a single data file is extracted and delivered to a specified location. The warehouse, on detecting a new file, will transform this information and load it into its standard model.
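The chain of events above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file layout and column names are hypothetical, an in-memory SQLite database stands in for the warehouse, and the only transformation shown is type casting and date normalization.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical daily extract; the column names are assumptions for this sketch.
DAILY_FILE = """Name,CustomerID,Product,Price,Date
Markus,1001,Airplane,19.00,06/25/2010
John,1010,Car,10.00,06/28/2010
"""

def extract(text):
    """Extract: read the delivered file into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types and normalize dates to ISO format (YYYY-MM-DD)."""
    cleaned = []
    for row in rows:
        order_date = datetime.strptime(row["Date"], "%m/%d/%Y").date()
        cleaned.append((
            row["Name"].strip(),
            int(row["CustomerID"]),
            row["Product"].strip(),
            float(row["Price"]),
            order_date.isoformat(),
        ))
    return cleaned

def load(conn, rows):
    """Load: append the transformed rows to the warehouse Orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Orders ("
        "Name TEXT, CustomerID INTEGER, Product TEXT, Price REAL, Date TEXT)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
load(conn, transform(extract(DAILY_FILE)))
loaded = conn.execute("SELECT * FROM Orders ORDER BY Date").fetchall()
```

In a real deployment the extract step would read the delivered file from its drop location, and the load step would write to the warehouse's own model rather than a single flat table.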

Components of a Data Warehouse

External Sources


Data Marts


Facts

  • Source: This record is a one-to-one match of the data delivered from the data file in the ETL process.

Name     CustomerID   Product    Price   Date
Markus   1001         Airplane   19.00   06/25/2010
John     1010         Car        10.00   06/28/2010

  • Fact: A fact is a single measurable piece of data. Fact tables will not typically contain text fields. They will also always have a date associated with them, which represents when that fact was taken.

Date         CustomerDimID   ProductDimID   Price
06/25/2010   1               101            19.00
06/28/2010   2               102            10.00

Dimensions

  • Dimension: A dimension is an attribute found within the source data. In a perfect warehouse all text elements would be turned into dimensions. You may even do this to numeric values. Dimensions will typically speed up your reporting processes.

CustDimID   CustID   Name
1           1001     Markus
2           1010     John

ProductDimID   Name
101            Airplane
102            Car

  • A Conformed Dimension table is a dimension table that is shared throughout all of your data marts within your warehouse. For example, a customer, product, or employee dimension might be considered a core dimension.
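As a rough illustration of how source records turn into dimension and fact rows, the sketch below assigns a surrogate key (the DimID) the first time each customer or product is seen and stores only keys and measures in the fact rows. The helper name `dim_key` and the starting IDs are assumptions chosen to match the sample tables, not part of any particular tool.

```python
# Sample source records, as they would arrive from the ETL data file.
source_rows = [
    {"Name": "Markus", "CustomerID": 1001, "Product": "Airplane", "Price": 19.00, "Date": "06/25/2010"},
    {"Name": "John", "CustomerID": 1010, "Product": "Car", "Price": 10.00, "Date": "06/28/2010"},
]

def dim_key(dim, natural_key, attributes, first_id):
    """Return the surrogate key for natural_key, adding a dimension row if it is new."""
    if natural_key not in dim:
        dim[natural_key] = {"DimID": first_id + len(dim), **attributes}
    return dim[natural_key]["DimID"]

customer_dim = {}  # keyed by the source system's CustomerID
product_dim = {}   # keyed by product name
facts = []

for row in source_rows:
    cust_id = dim_key(customer_dim, row["CustomerID"], {"Name": row["Name"]}, first_id=1)
    prod_id = dim_key(product_dim, row["Product"], {"Name": row["Product"]}, first_id=101)
    # The fact row keeps only the date, the dimension keys, and the measure.
    facts.append({
        "Date": row["Date"],
        "CustomerDimID": cust_id,
        "ProductDimID": prod_id,
        "Price": row["Price"],
    })
```

Because the text values now live only in the dimension tables, the fact rows stay narrow and numeric, which is what makes pivoting and aggregation cheap.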

Time Sensitive Dimensions

  • Slowly Changing Dimensions allow for time-sensitive data tracking.
  • Simply add start and end dates to each dimension table.
  • This will impact your loading and transformation processes.

CustDimID   CustID   Name     State   Start        End
1           1001     Markus   NC      01/01/2010   01/01/2070
2           1010     John     SC      01/01/2010   01/01/2070

Two customers, one in NC and one in SC.

CustDimID   CustID   Name     State   Start        End
1           1001     Markus   NC      01/01/2010   01/31/2010
2           1010     John     SC      01/01/2010   01/01/2070
3           1001     Markus   SC      02/01/2010   01/01/2070

Markus moves to SC in February. You can still report accurate NC sales for January because of the start and end dates.
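The load-time impact can be sketched as follows: when an attribute changes, the current row is closed out and a new row is appended, and reports look up the row that was valid on the date of interest. This is a minimal sketch; the function names `apply_change` and `lookup` are illustrative, and the far-future end date follows the convention used in the tables.

```python
from datetime import date, timedelta

OPEN_END = date(2070, 1, 1)  # far-future end date marking the current row

customer_dim = [
    {"CustDimID": 1, "CustID": 1001, "Name": "Markus", "State": "NC",
     "Start": date(2010, 1, 1), "End": OPEN_END},
    {"CustDimID": 2, "CustID": 1010, "Name": "John", "State": "SC",
     "Start": date(2010, 1, 1), "End": OPEN_END},
]

def apply_change(dim, cust_id, new_state, effective):
    """Close the customer's current row and append a new row starting on `effective`."""
    current = next(r for r in dim if r["CustID"] == cust_id and r["End"] == OPEN_END)
    if current["State"] == new_state:
        return current["CustDimID"]  # nothing changed, keep the current row
    current["End"] = effective - timedelta(days=1)
    new_row = dict(current, CustDimID=max(r["CustDimID"] for r in dim) + 1,
                   State=new_state, Start=effective, End=OPEN_END)
    dim.append(new_row)
    return new_row["CustDimID"]

def lookup(dim, cust_id, on_date):
    """Find the dimension row that was valid for the customer on a given date."""
    return next(r for r in dim
                if r["CustID"] == cust_id and r["Start"] <= on_date <= r["End"])

apply_change(customer_dim, 1001, "SC", date(2010, 2, 1))
january_state = lookup(customer_dim, 1001, date(2010, 1, 15))["State"]
march_state = lookup(customer_dim, 1001, date(2010, 3, 1))["State"]
```

Facts loaded in January keep pointing at CustDimID 1, so January sales remain attributed to NC even after the move.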

Hierarchies

  • Hierarchies are dimension tables that reflect parent-to-child relationships. Typically a hierarchy table will be used to roll up metrics to different levels.
  • We can turn the product dimension table into a hierarchy by adding a parent product code.

ProductDimID   Name       ProductCode   ParentCode
100            Toys       T
101            Airplane   A1            T
102            Car        C1            T

For example, the Airplane and Car both belong to the Toys product line. This hierarchy could be used to roll up and produce all Toy sales.
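A rollup over this hierarchy might look like the sketch below, which credits each sale to its own product and to every ancestor reached through ParentCode. It assumes that Toys, the top of the product line, has no parent, and the two sample sales are taken from the fact table shown earlier.

```python
from collections import defaultdict

# Product hierarchy: Airplane and Car roll up to Toys (assumed to have no parent).
products = [
    {"ProductDimID": 100, "Name": "Toys", "ProductCode": "T", "ParentCode": None},
    {"ProductDimID": 101, "Name": "Airplane", "ProductCode": "A1", "ParentCode": "T"},
    {"ProductDimID": 102, "Name": "Car", "ProductCode": "C1", "ParentCode": "T"},
]
facts = [
    {"ProductDimID": 101, "Price": 19.00},
    {"ProductDimID": 102, "Price": 10.00},
]

by_id = {p["ProductDimID"]: p for p in products}
by_code = {p["ProductCode"]: p for p in products}

def rollup_sales(facts):
    """Credit each sale to its product and to every ancestor in the hierarchy."""
    totals = defaultdict(float)
    for fact in facts:
        node = by_id[fact["ProductDimID"]]
        while node is not None:
            totals[node["Name"]] += fact["Price"]
            node = by_code.get(node["ParentCode"])  # None once the top is reached
    return dict(totals)

sales = rollup_sales(facts)
```

Here `sales["Toys"]` is the combined total of Airplane and Car sales, while the individual product totals remain available at the lower level.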

Facts and Dimensions

  • Notice how this fact table has relations to the dimension tables. This allows us to pivot the facts around each dimension in an efficient manner.

Date         CustomerDimID   ProductDimID   Price
06/25/2010   1               101            19.00
06/28/2010   2               102            10.00

CustDimID   CustID   Name
1           1001     Markus
2           1010     John

ProductDimID   Name
101            Airplane
102            Car

Metrics

  • Metric: A metric is an aggregation of fact information, usually around a particular set of dimensions. In a typical environment a metric table becomes the source for a single report. But because of the dimensions, metrics can be combined across multiple systems.
  • These metric tables are combined with their dimensions to produce the actual output of the reports.

Product Metric Table

Date         ProductDimID   NumOrders
06/25/2010   101            20
06/28/2010   102            30

For example, the Product Metric table might be used to show that Airplanes (101) were sold 20 times on the 25th.

Customer Metric Table

Date         CustDimID   NumOrders
06/25/2010   1           1
06/28/2010   2           5

In this example you can see that Markus (1) placed one order, while John (2) placed five.

Star Schema

  • Star Schema: The diagram used to depict a traditional data mart is called a star schema. Typically a fact or metric table is placed in the center, and all dimension tables are laid out around it, giving the diagram a star-like appearance.

Orders (center):  Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
Date:             Date, Day of Week, Month Name, isHoliday
Customer:         CustomerDimID, CustID, Name, Age, Start, End
Product:          ProductDimID, ProductName, ProductCode, ParentCode, Start, End
Employee:         EmployeeDimID, SSN, Manager SSN, Status, Start, End

Using the Data Mart

  • You can then take these fact and dimension tables and place them in front of a reporting engine.
  • The user can then drill through the metrics.
  • The dimension tables allow the user to pivot the metrics through any attribute. They can go from viewing Customers by State to viewing Sales by Employees by switching dimensions.
  • If users consistently want to use one view of the data, you may decide to turn these into Metric tables.
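Switching dimensions amounts to grouping the same facts by an attribute of a different dimension table. The sketch below shows the idea with a single hypothetical helper, `pivot`; the sample orders, the State values, and the employee Status values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data; attribute values are illustrative only.
customers = {1: {"Name": "Markus", "State": "NC"}, 2: {"Name": "John", "State": "SC"}}
employees = {10: {"Status": "Full-time"}, 11: {"Status": "Part-time"}}
orders = [
    {"CustomerDimID": 1, "EmployeeDimID": 10, "Price": 19.00},
    {"CustomerDimID": 2, "EmployeeDimID": 10, "Price": 10.00},
    {"CustomerDimID": 2, "EmployeeDimID": 11, "Price": 5.00},
]

def pivot(facts, key_column, dimension, attribute):
    """Total Price grouped by one attribute of the chosen dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[dimension[fact[key_column]][attribute]] += fact["Price"]
    return dict(totals)

# Same facts, two different views: only the dimension changes.
sales_by_state = pivot(orders, "CustomerDimID", customers, "State")
sales_by_status = pivot(orders, "EmployeeDimID", employees, "Status")
```

A reporting engine does essentially this on demand; a metric table stores the result of one such grouping so it does not have to be recomputed for every report.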

Creating a Metric Table

SQL for creating the metric table:

    SELECT Orders.Date
          ,Orders.CustomerDimID
          ,COUNT(*) AS NumOrders
    FROM Orders
    GROUP BY Orders.Date
            ,Orders.CustomerDimID

Date         CustDimID   NumOrders
06/25/2010   1           1
06/28/2010   2           5

Schema

Orders:    Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
Date:      Date, Day of Week, Month Name, isHoliday
Customer:  CustomerDimID, CustID, Name, Age, Start, End
Product:   ProductDimID, ProductName, ProductCode, ParentCode, Start, End
Employee:  EmployeeDimID, SSN, Manager SSN, Status, Start, End

Reporting SQL

    SELECT Metric.Date, Metric.NumOrders, Customer.Name
    FROM Metric_NumOrders AS Metric
    INNER JOIN Customer
        ON Customer.CustomerDimID = Metric.CustomerDimID
    WHERE Metric.Date BETWEEN '01/01/2010' AND '01/31/2010'
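To see the metric and reporting queries run end to end, the sketch below executes them against an in-memory SQLite database. Several things here are assumptions: the sample orders are made up, dates are stored as ISO text (YYYY-MM-DD) so that BETWEEN compares them correctly as strings, and `CREATE TABLE ... AS SELECT` is the SQLite way of materializing the metric table (other databases use different syntax).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT);
CREATE TABLE Orders (Date TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
INSERT INTO Customer VALUES (1, 1001, 'Markus'), (2, 1010, 'John');
-- Hypothetical orders; three in January and one in February.
INSERT INTO Orders VALUES
    ('2010-01-05', 1, 101, 19.00),
    ('2010-01-05', 2, 102, 10.00),
    ('2010-01-05', 2, 101, 19.00),
    ('2010-02-10', 1, 102, 10.00);

-- Metric table: one row per date and customer, holding the order count.
CREATE TABLE Metric_NumOrders AS
SELECT Date, CustomerDimID, COUNT(*) AS NumOrders
FROM Orders
GROUP BY Date, CustomerDimID;
""")

# Report: join the metric table back to its dimension to get customer names.
report = conn.execute("""
    SELECT m.Date, m.NumOrders, c.Name
    FROM Metric_NumOrders AS m
    INNER JOIN Customer AS c ON c.CustomerDimID = m.CustomerDimID
    WHERE m.Date BETWEEN '2010-01-01' AND '2010-01-31'
    ORDER BY c.Name
""").fetchall()
```

The report reads only the small pre-aggregated metric table and the customer dimension, so it never has to scan the full Orders fact table.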

References

  • A data mart is not a data warehouse
  • General Data Warehousing Articles