Data Warehousing
AN INTRO TO DATA MARTS AND DATA WAREHOUSES
MARKUS BEAMER
BDPA-CHARLOTTE
WWW.BDPA-CHARLOTTE.ORG
ALSO AVAILABLE ON MOBEAMER.BLOGGER.COM
Intelligent Data Strategies
Data Warehouse or Data Mart
There are many definitions of data warehouses and data marts, and no single standard definition. For our purposes, we will define them as follows:
Data Warehouse
Extreme Volume: Contains years of daily information at the lowest grain possible.
Corporate Wide: The grouping of data elements is dictated by the corporate structure.
Facts and Dimensions: This system is typically made up of facts.
Serves data to data marts.
Will have data that is shared across groups.
Data Mart
Specific Volume Sets: May only contain month-to-date information.
Specific: Data is grouped by the needs of the team or group building the solution.
Many Metrics: Has many metric tables and rollup grains.
Serves data to reports.
Will have data specific to only the implementation group.
Traditional Reporting Solutions
Many systems performing many different business functions
Analyst(s) Intelligence
Many reports from the multiple sources
Human intervention is needed to “make sense” of the different reports.
Opens the door for:
• Conflicting numbers
• Human error
• Misunderstood data
• Inefficiency
The Disorganized Closet
Like a Disorganized Closet
The data is there, but do you know that you have that special shirt you really need on Friday?
Organize Your Data Closet
It takes time
It takes discipline
It takes a planned approach
Simple Data Warehousing
Many systems performing many different business functions
Intelligence
A centralized shared location for all the data.
Automated reporting that understands all sources.
Reports specific to each system can still be delivered.
Adding Data Marts
Intelligence
Automated reporting that understands all sources can be delivered.
Reports specific to each system can still be delivered.
Data Mart
Data Mart
Loading the Warehouse
The Website
A normal website where customers can come and order items from your company.
Data
A single data file containing all orders made by customers for that day.
Website DB
This is your standard relational database system. Tracks a lot of information.
Warehouse
All this data is stored in a single table called “Orders”.
Extract, Transform, Load (ETL) is a process used to get data into your warehouse. The typical chain of events is as follows:
Your front-end system, usually a transactional system, will send data and information to its relational database.
At set intervals (nightly, hourly, or otherwise), a single data file is extracted and delivered to a specified location.
The warehouse, on detecting a new file, will transform this information and load it into its standard model.
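The chain of events above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production loader: the file layout, the column names, the in-memory SQLite database standing in for the warehouse, and the function names extract, transform and load are all assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical daily extract. In practice this would be a file dropped
# in an agreed location by the website database.
DAILY_FILE = io.StringIO(
    "Name,CustomerID,Product,Price,Date\n"
    "Markus,1001,Airplane,19.00,06/25/2010\n"
    "John,1010,Car,10.00,06/28/2010\n"
)


def extract(handle):
    """Extract: read every row of the delivered file as a dict."""
    return list(csv.DictReader(handle))


def transform(rows):
    """Transform: convert types and switch dates to ISO format (YYYY-MM-DD)."""
    out = []
    for row in rows:
        month, day, year = row["Date"].split("/")
        out.append((
            f"{year}-{month}-{day}",
            int(row["CustomerID"]),
            row["Product"],
            float(row["Price"]),
        ))
    return out


def load(conn, rows):
    """Load: append the transformed rows to the warehouse Orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Orders "
        "(Date TEXT, CustomerID INTEGER, Product TEXT, Price REAL)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(conn, transform(extract(DAILY_FILE)))
loaded = conn.execute("SELECT * FROM Orders ORDER BY Date").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it easy to swap the file source or the target database without touching the transformation rules.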
Components of a Data Warehouse
External Sources, Stage, Data Marts, Reports
Facts
Source – A source record is a one-to-one match of the data delivered in the data file during the ETL process.
Fact – A fact is a single measurable piece of data. Fact tables will not typically contain text fields. They will also always have a date associated with them, which represents when that fact was recorded.
Source records
Name    CustomerID  Product   Price  Date
Markus  1001        Airplane  19.00  06/25/2010
John    1010        Car       10.00  06/28/2010
Fact records
Date        CustomerDimID  ProductDimID  Price
06/25/2010  1              101           19.00
06/28/2010  2              102           10.00
Dimensions
Dimension – A dimension is an attribute found within the source data. In a perfect warehouse, all text elements would be turned into dimensions. You may even do this for numeric values. Dimensions will typically speed up your reporting processes.
A conformed dimension table is a dimension table that is shared across all of the data marts within your warehouse. For example, a customer, product, or employee dimension might be considered a conformed dimension.
CustDimID CustID Name
1 1001 Markus
2 1010 John
ProductDimID Name
101 Airplane
102 Car
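As a rough sketch of how text elements might be turned into dimensions, the Python below replaces the customer and product values in each source row with surrogate IDs and emits a fact row. The helper name surrogate_id and the simple counter-based ID assignment are assumptions for illustration. The starting values 1 and 101 were chosen only so the output matches the tables shown here.

```python
source_rows = [
    {"Name": "Markus", "CustomerID": 1001, "Product": "Airplane",
     "Price": 19.00, "Date": "06/25/2010"},
    {"Name": "John", "CustomerID": 1010, "Product": "Car",
     "Price": 10.00, "Date": "06/28/2010"},
]

customer_ids = {}    # CustID (natural key) -> CustDimID (surrogate key)
customer_names = {}  # CustDimID -> Name
product_ids = {}     # product name -> ProductDimID


def surrogate_id(lookup, natural_key, first_id):
    """Return the surrogate ID for a key, assigning the next free one if new."""
    if natural_key not in lookup:
        lookup[natural_key] = first_id + len(lookup)
    return lookup[natural_key]


facts = []
for row in source_rows:
    cust_dim_id = surrogate_id(customer_ids, row["CustomerID"], first_id=1)
    customer_names[cust_dim_id] = row["Name"]
    prod_dim_id = surrogate_id(product_ids, row["Product"], first_id=101)
    # The fact row keeps only the date, the dimension IDs and the measure.
    facts.append((row["Date"], cust_dim_id, prod_dim_id, row["Price"]))

print(facts)
```

The fact rows no longer carry any text, which is what keeps fact tables narrow and lets the same customer or product row be shared by many facts.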
Time Sensitive Dimensions
Slowly Changing Dimensions allow for time-sensitive data tracking. Simply add start and end dates to each dimension table. This will impact your loading and transformation processes.
CustDimID CustID Name State Start End
1 1001 Markus NC 01/01/2010 01/01/2070
2 1010 John SC 01/01/2010 01/01/2070
Two customers: one in NC, one in SC.
CustDimID CustID Name State Start End
1 1001 Markus NC 01/01/2010 01/31/2010
2 1010 John SC 01/01/2010 01/01/2070
3 1001 Markus SC 02/01/2010 01/01/2070
Markus moves to SC in Feb.
You can still report accurate NC sales in Jan because of the start and end dates.
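The effect of the start and end dates can be sketched with Python's built-in sqlite3 module. The customer rows are the ones from the table above. The SourceOrders table and its three orders are made up for illustration. Two further assumptions: dates are stored in ISO format so that plain string comparison orders them correctly, and the columns are named StartDate and EndDate because END is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustDimID INTEGER, CustID INTEGER, Name TEXT,
                       State TEXT, StartDate TEXT, EndDate TEXT);
INSERT INTO Customer VALUES
  (1, 1001, 'Markus', 'NC', '2010-01-01', '2010-01-31'),
  (2, 1010, 'John',   'SC', '2010-01-01', '2070-01-01'),
  (3, 1001, 'Markus', 'SC', '2010-02-01', '2070-01-01');

-- Hypothetical source orders, keyed by the natural customer ID.
CREATE TABLE SourceOrders (OrderDate TEXT, CustID INTEGER, Price REAL);
INSERT INTO SourceOrders VALUES
  ('2010-01-15', 1001, 19.00),
  ('2010-02-10', 1001, 10.00),
  ('2010-02-12', 1010, 10.00);
""")

# Match each order to the customer row that was valid on the order date.
sales_by_state = conn.execute("""
    SELECT c.State, SUM(o.Price)
    FROM SourceOrders AS o
    JOIN Customer AS c
      ON c.CustID = o.CustID
     AND o.OrderDate BETWEEN c.StartDate AND c.EndDate
    GROUP BY c.State
    ORDER BY c.State
""").fetchall()
print(sales_by_state)
```

Markus's January order is credited to NC and his February order to SC, even though both orders carry the same CustID.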
Hierarchies
Hierarchies are dimension tables that reflect parent-child relationships. Typically a hierarchy table will be used to “roll up” metrics to different levels.
We can turn the product dimension table into a hierarchy by adding a parent product code.
For example: The Airplane and Car both belong to the Toys product line. This hierarchy could be used to roll up and produce all Toys sales.
ProductDimID  Name      ProductCode  ParentCode
100           Toys      T            -
101           Airplane  A1           T
102           Car       C1           T
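A one-level rollup over this hierarchy might look like the sketch below, again using sqlite3. The product rows come from the table above and the two order rows from the earlier fact example. Storing the missing parent of Toys as NULL and the ISO date format are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductDimID INTEGER, Name TEXT,
                      ProductCode TEXT, ParentCode TEXT);
INSERT INTO Product VALUES
  (100, 'Toys',     'T',  NULL),
  (101, 'Airplane', 'A1', 'T'),
  (102, 'Car',      'C1', 'T');

CREATE TABLE Orders (Date TEXT, CustomerDimID INTEGER,
                     ProductDimID INTEGER, Price REAL);
INSERT INTO Orders VALUES
  ('2010-06-25', 1, 101, 19.00),
  ('2010-06-28', 2, 102, 10.00);
""")

# Join each fact to its product, then to that product's parent row.
rollup = conn.execute("""
    SELECT parent.Name, SUM(o.Price)
    FROM Orders AS o
    JOIN Product AS child  ON child.ProductDimID = o.ProductDimID
    JOIN Product AS parent ON parent.ProductCode = child.ParentCode
    GROUP BY parent.Name
""").fetchall()
print(rollup)
```

This handles a single parent level only. A deeper hierarchy would need one join per level or a recursive query.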
Facts and Dimensions
Notice how the fact table relates to the dimension tables. This allows us to “pivot” the facts around each dimension in an efficient manner.
CustDimID CustID Name
1 1001 Markus
2 1010 John
ProductDimID Name
101 Airplane
102 Car
Date CustomerDimID ProductDimID Price
06/25/2010 1 101 19.00
06/28/2010 2 102 10.00
Metrics
Metric – A metric is an aggregation of fact information, usually around a particular set of dimensions. In a typical environment, a metric table becomes the source for a single report. But because the dimensions are shared, metrics can be combined across multiple systems.
These metric tables are combined with their dimensions to produce the actual output of the reports.
Product Metric Table
Date        ProductDimID  NumOrders
06/25/2010  101           20
06/28/2010  102           30
For example: The Product Metric table might be used to show that “Cars” (102) were sold 30 times on the 28th.
Customer Metric Table
Date        CustDimID  NumOrders
06/25/2010  1          1
06/28/2010  2          5
In this example you can see that Markus (1) placed one order, while John (2) placed five.
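To make the aggregation step concrete, here is a small Python sketch that counts orders per date and dimension. The fact rows are invented for illustration, so the counts do not match the tables above, and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical fact rows: (Date, CustomerDimID, ProductDimID, Price).
facts = [
    ("06/25/2010", 1, 101, 19.00),
    ("06/28/2010", 2, 102, 10.00),
    ("06/28/2010", 2, 102, 10.00),
    ("06/28/2010", 1, 102, 10.00),
]

# Each metric is the same facts aggregated around a different dimension.
product_metric = Counter(
    (date, product) for date, _cust, product, _price in facts
)
customer_metric = Counter(
    (date, cust) for date, cust, _product, _price in facts
)

# Combine a metric with its dimension to produce the report output.
product_names = {101: "Airplane", 102: "Car"}
report = [
    (date, product_names[product_id], num_orders)
    for (date, product_id), num_orders in sorted(product_metric.items())
]
print(report)
```

The report rows are built only from the metric and the dimension, so the fact table does not have to be scanned again each time the report runs.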
Star Schema
Star Schema – The diagram used to depict a traditional data mart is called a star schema. Typically a fact or metric table is placed in the center, and all dimension tables are laid out around it, giving the diagram a star-like appearance.
Orders (fact table): Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
Product: ProductDimID, ProductName, ProductCode, ParentCode, Start, End
Customer: CustomerDimID, CustID, Name, Age, Start, End
Date: Date, Day of Week, Month Name, isHoliday
Employee: EmployeeDimID, SSN, Manager SSN, Status, Start, End
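One possible way to declare this star schema is sketched below with sqlite3. The column types, the primary and foreign key constraints, and the renamed identifiers (DateDim for the Date table, StartDate and EndDate for Start and End, names without spaces) are assumptions made so the statements run cleanly. The slide itself only lists table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerDimID INTEGER PRIMARY KEY, CustID INTEGER,
                       Name TEXT, Age INTEGER, StartDate TEXT, EndDate TEXT);
CREATE TABLE Product  (ProductDimID INTEGER PRIMARY KEY, ProductName TEXT,
                       ProductCode TEXT, ParentCode TEXT,
                       StartDate TEXT, EndDate TEXT);
CREATE TABLE Employee (EmployeeDimID INTEGER PRIMARY KEY, SSN TEXT,
                       ManagerSSN TEXT, Status TEXT,
                       StartDate TEXT, EndDate TEXT);
CREATE TABLE DateDim  (Date TEXT PRIMARY KEY, DayOfWeek TEXT,
                       MonthName TEXT, IsHoliday INTEGER);

-- The fact table sits in the center and points at every dimension.
CREATE TABLE Orders (
    Date          TEXT    REFERENCES DateDim(Date),
    CustomerDimID INTEGER REFERENCES Customer(CustomerDimID),
    ProductDimID  INTEGER REFERENCES Product(ProductDimID),
    EmployeeDimID INTEGER REFERENCES Employee(EmployeeDimID),
    Price         REAL
);
""")

# List the dimension tables the fact table references (the star's points).
star_points = sorted(
    row[2] for row in conn.execute("PRAGMA foreign_key_list(Orders)")
)
print(star_points)
```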
Using the Data Mart
You can then take these fact and dimension tables and place them in front of a reporting engine.
The user can then drill through the metrics.
The dimension tables allow the user to “pivot” the metrics through any attribute. They can go from viewing Customers by State to viewing Sales by Employees by switching dimensions.
If users consistently want to use one view of the data, you may decide to turn that view into a metric table.
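Switching dimensions amounts to running the same aggregation over the fact table with a different join. The sketch below shows the idea with sqlite3. The sample rows, the employee Name column, the PIVOTS lookup and the sales_by function are all hypothetical and exist only for this illustration. A real reporting engine would generate such queries for the user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerDimID INTEGER, Name TEXT, State TEXT);
CREATE TABLE Employee (EmployeeDimID INTEGER, Name TEXT);
CREATE TABLE Orders (CustomerDimID INTEGER, EmployeeDimID INTEGER, Price REAL);
INSERT INTO Customer VALUES (1, 'Markus', 'NC'), (2, 'John', 'SC');
INSERT INTO Employee VALUES (10, 'Ann'), (11, 'Bob');
INSERT INTO Orders VALUES (1, 10, 19.00), (2, 10, 10.00), (2, 11, 5.00);
""")

# Fixed list of allowed pivots: label -> (dimension table, join key, attribute).
PIVOTS = {
    "state": ("Customer", "CustomerDimID", "State"),
    "employee": ("Employee", "EmployeeDimID", "Name"),
}


def sales_by(pivot):
    """Total sales grouped by one attribute of the chosen dimension."""
    table, key, attribute = PIVOTS[pivot]
    sql = (
        f"SELECT d.{attribute}, SUM(o.Price) FROM Orders AS o "
        f"JOIN {table} AS d ON d.{key} = o.{key} "
        f"GROUP BY d.{attribute} ORDER BY d.{attribute}"
    )
    return conn.execute(sql).fetchall()


print(sales_by("state"))
print(sales_by("employee"))
```

Identifiers are taken only from the fixed PIVOTS table, never from user input, because table and column names cannot be passed as query parameters.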
Creating A Metric Table
SQL
Creating Metric Table
-- One row per date and customer; this result populates Metric_NumOrders.
SELECT Orders.Date, Orders.CustomerDimID, COUNT(*) AS NumOrders
FROM Orders
GROUP BY Orders.Date, Orders.CustomerDimID
Reporting SQL
-- Join the metric table to the customer dimension to show names.
SELECT Metric_NumOrders.Date, Metric_NumOrders.NumOrders, Customer.Name
FROM Metric_NumOrders
INNER JOIN Customer ON Customer.CustomerDimID = Metric_NumOrders.CustomerDimID
WHERE Metric_NumOrders.Date BETWEEN '01/01/2010' AND '01/31/2010'
Schema
Orders (fact table): Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
Product: ProductDimID, ProductName, ProductCode, ParentCode, Start, End
Customer: CustomerDimID, CustID, Name, Age, Start, End
Date: Date, Day of Week, Month Name, isHoliday
Employee: EmployeeDimID, SSN, Manager SSN, Status, Start, End
Resulting metric table
Date        CustDimID  NumOrders
06/25/2010  1          1
06/28/2010  2          5
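For readers who want to try the metric and reporting queries end to end, the sketch below runs them against an in-memory SQLite database. The order rows are invented for illustration, dates are stored in ISO format so that BETWEEN compares them correctly as text, and CREATE TABLE ... AS SELECT is SQLite's syntax for materializing a query (other databases may differ).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerDimID INTEGER, CustID INTEGER, Name TEXT);
INSERT INTO Customer VALUES (1, 1001, 'Markus'), (2, 1010, 'John');

CREATE TABLE Orders (Date TEXT, CustomerDimID INTEGER,
                     ProductDimID INTEGER, Price REAL);
INSERT INTO Orders VALUES
  ('2010-01-05', 1, 101, 19.00),
  ('2010-01-05', 2, 102, 10.00),
  ('2010-01-05', 2, 101, 19.00),
  ('2010-02-01', 1, 102, 10.00);

-- Creating the metric table: one row per date and customer.
CREATE TABLE Metric_NumOrders AS
SELECT Date, CustomerDimID, COUNT(*) AS NumOrders
FROM Orders
GROUP BY Date, CustomerDimID;
""")

# Reporting query: January order counts with customer names attached.
report = conn.execute("""
    SELECT m.Date, m.NumOrders, c.Name
    FROM Metric_NumOrders AS m
    INNER JOIN Customer AS c ON c.CustomerDimID = m.CustomerDimID
    WHERE m.Date BETWEEN '2010-01-01' AND '2010-01-31'
    ORDER BY m.Date, c.Name
""").fetchall()
print(report)
```

The February order is aggregated into the metric table but filtered out of the report by the date range.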