Data Warehousing

AN INTRO TO DATA MARTS AND DATA WAREHOUSES

MARKUS BEAMER
BDPA-CHARLOTTE
WWW.BDPA-CHARLOTTE.ORG

ALSO AVAILABLE ON MOBEAMER.BLOGGER.COM

Intelligent Data Strategies

Data Warehouse or Data Mart

There are many definitions of data warehouses and data marts, and no single standard one. For our purposes, we will define them as follows:

Data Warehouse
• Extreme volume: contains years of daily information at the lowest grain possible.
• Corporate wide: the grouping of data elements is dictated by the corporate structure.
• Facts and dimensions: the system is typically made up of fact tables and the dimensions that describe them.
• Serves data to data marts.
• Will have data that is shared across groups.

Data Mart
• Specific volume sets: may contain only month-to-date information.
• Specific: data is grouped by the needs of the team or group building the solution.
• Many metrics: has many metric tables and rollup grains.
• Serves data to reports.
• Will have data specific to the implementation group only.
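To make the "serves data to data marts" relationship concrete, here is a minimal sketch of a mart built as a month-to-date subset copied out of a warehouse table. It assumes Python's built-in sqlite3 module; the table names wh_orders and mart_orders_mtd and the function load_month_to_date_mart are hypothetical. Dates are stored as ISO-8601 text (YYYY-MM-DD) so that BETWEEN compares them correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: daily order facts kept for years at the lowest grain.
conn.execute("CREATE TABLE wh_orders (order_date TEXT, customer_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO wh_orders VALUES (?, ?, ?)",
    [
        ("2010-05-30", 1001, 5.00),  # prior month: stays in the warehouse only
        ("2010-06-25", 1001, 19.00),
        ("2010-06-28", 1010, 10.00),
    ],
)

def load_month_to_date_mart(conn, month_start, as_of):
    """Rebuild a hypothetical mart table holding only month-to-date warehouse rows."""
    conn.execute("DROP TABLE IF EXISTS mart_orders_mtd")
    conn.execute(
        "CREATE TABLE mart_orders_mtd (order_date TEXT, customer_id INTEGER, price REAL)"
    )
    conn.execute(
        "INSERT INTO mart_orders_mtd "
        "SELECT order_date, customer_id, price FROM wh_orders "
        "WHERE order_date BETWEEN ? AND ?",
        (month_start, as_of),
    )
    conn.commit()

load_month_to_date_mart(conn, "2010-06-01", "2010-06-30")
print(conn.execute("SELECT COUNT(*), SUM(price) FROM mart_orders_mtd").fetchone())
# (2, 29.0)
```

The warehouse keeps the May row; the mart only ever sees the current month, which is what keeps it small and specific to one team's reporting needs.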

Traditional Reporting Solutions

Many systems performing many different business functions

Analyst(s) Intelligence

Many reports from multiple sources

Human intervention is needed to “make sense” of the different reports.

Opens the door for:
• Conflicting numbers
• Human error
• Misunderstood data
• Inefficiency

The Disorganized Closet

Like a Disorganized Closet

The data is there, but do you know that you have that special shirt you really need on Friday?

Organize Your Data Closet

It takes time

It takes discipline

It takes a planned approach

Simple Data Warehousing

Many systems performing many different business functions

Intelligence

A centralized shared location for all the data.

Automated reporting that understands all sources.

Reports specific to each system can still be delivered.

Adding Data Marts

Intelligence

Automated reporting that understands all sources can be delivered.

Reports specific to each system can still be delivered.

Data Mart

Data Mart

Loading the Warehouse

The Website

A normal website where customers can come and order items from your company.

Data

A single data file containing all orders made by customers for that day.

Website DB

This is your standard relational database system. It tracks a lot of information.

Warehouse

All this data is stored in a single table called “Orders”.

Extract, Transform, Load (ETL) is a process used to get data into your warehouse. The typical chain of events is as follows:

1. Your front-end system, usually a transactional system, sends data and information to its relational database.

2. At set intervals (nightly, hourly, or otherwise), a single data file is extracted and delivered to a specified location.

3. On detecting a new file, the warehouse transforms the information and loads it into its standard model.
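The extract, transform, and load steps can be sketched in a few functions. This is a minimal illustration, not a production loader: it assumes the nightly file is a CSV with the same columns as the source record shown later in the deck, simulates that file with io.StringIO, and loads into an SQLite table named Orders. The column name OrderDate and the functions extract, transform, and load are hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical nightly extract: one CSV file holding all of the day's orders.
DAILY_FILE = """Name,CustomerID,Product,Price,Date
Markus,1001,Airplane,19.00,06/25/2010
John,1010,Car,10.00,06/28/2010
"""

def extract(file_obj):
    """Extract: read the delivered file into a list of dicts, one per order."""
    return list(csv.DictReader(file_obj))

def transform(records):
    """Transform: convert types and MM/DD/YYYY dates to ISO-8601 text."""
    rows = []
    for r in records:
        order_date = datetime.strptime(r["Date"], "%m/%d/%Y").date().isoformat()
        rows.append((order_date, int(r["CustomerID"]), r["Name"], r["Product"], float(r["Price"])))
    return rows

def load(conn, rows):
    """Load: append the transformed rows to the warehouse's Orders table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Orders "
        "(OrderDate TEXT, CustomerID INTEGER, Name TEXT, Product TEXT, Price REAL)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(io.StringIO(DAILY_FILE))))
print(conn.execute("SELECT * FROM Orders ORDER BY OrderDate").fetchall())
# [('2010-06-25', 1001, 'Markus', 'Airplane', 19.0), ('2010-06-28', 1010, 'John', 'Car', 10.0)]
```

In a real pipeline, extract would open the file found at the delivery location instead of a string, but the three-stage shape stays the same.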

Components of a Data Warehouse

External Sources → Stage → Data Marts → Reports

Facts

Source: each source record is a one-to-one match of the data delivered in the data file during the ETL process.

Fact: a fact is a single measurable piece of data. Fact tables typically do not contain text fields. Every fact also has a date associated with it, which records when that fact was taken.

Fact table:

Date        CustomerDimID  ProductDimID  Price
06/25/2010  1              101           19.00
06/28/2010  2              102           10.00

Source records:

Name    CustomerID  Product   Price  Date
Markus  1001        Airplane  19.00  06/25/2010
John    1010        Car       10.00  06/28/2010
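Turning a source record into a fact row means keeping the date and the measure (Price) and swapping every text attribute for the ID of its dimension row. A minimal sketch, assuming the dimension lookups are plain Python dicts (the names customer_dim, product_dim, and to_fact are hypothetical):

```python
# Hypothetical dimension lookups: natural key -> surrogate DimID (matching the slide).
customer_dim = {1001: 1, 1010: 2}            # CustomerID -> CustomerDimID
product_dim = {"Airplane": 101, "Car": 102}  # Product name -> ProductDimID

# The two source records from the slide.
source_rows = [
    {"Name": "Markus", "CustomerID": 1001, "Product": "Airplane", "Price": 19.00, "Date": "06/25/2010"},
    {"Name": "John", "CustomerID": 1010, "Product": "Car", "Price": 10.00, "Date": "06/28/2010"},
]

def to_fact(row):
    """Keep the date and the measure; replace text attributes with dimension IDs."""
    return (
        row["Date"],
        customer_dim[row["CustomerID"]],
        product_dim[row["Product"]],
        row["Price"],
    )

facts = [to_fact(r) for r in source_rows]
print(facts)
# [('06/25/2010', 1, 101, 19.0), ('06/28/2010', 2, 102, 10.0)]
```

The customer's Name does not appear in the fact row at all; it lives once in the customer dimension.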

Dimensions

Dimension: a dimension is an attribute found within the source data. In a perfect warehouse, all text elements would be turned into dimensions, and you may even do this for numeric values. Dimensions typically speed up your reporting processes.

A conformed dimension table is a dimension table that is shared by all of the data marts within your warehouse. For example, a customer, product, or employee dimension might be considered a conformed dimension.

CustDimID  CustID  Name
1          1001    Markus
2          1010    John

ProductDimID  Name
101           Airplane
102           Car
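A dimension table's main job during loading is to hand out a stable surrogate ID for each distinct natural key. Here is a minimal in-memory sketch; the class Dimension and its method lookup_or_add are hypothetical, and the starting IDs were chosen only to match the slide.

```python
class Dimension:
    """Minimal in-memory dimension: hands out a surrogate DimID per natural key."""

    def __init__(self, first_id):
        self.next_id = first_id
        self.rows = {}  # natural key -> (DimID, attribute dict)

    def lookup_or_add(self, natural_key, **attributes):
        """Return the existing DimID for this key, or assign the next one."""
        if natural_key not in self.rows:
            self.rows[natural_key] = (self.next_id, attributes)
            self.next_id += 1
        return self.rows[natural_key][0]


# Hypothetical starting IDs chosen to match the slide's example tables.
customers = Dimension(first_id=1)
products = Dimension(first_id=101)

print(customers.lookup_or_add(1001, Name="Markus"))  # 1
print(customers.lookup_or_add(1010, Name="John"))    # 2
print(customers.lookup_or_add(1001, Name="Markus"))  # 1 (same customer, same DimID)
print(products.lookup_or_add("Airplane"))            # 101
print(products.lookup_or_add("Car"))                 # 102
```

A conformed dimension is this same idea shared: if every mart's loader resolves customers through the one customers object (or the one shared table), ID 1 means Markus everywhere.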

Time Sensitive Dimensions

Slowly Changing Dimensions allow for time-sensitive data tracking: simply add start and end dates to each dimension table. This will affect your loading and transformation processes, because a change now closes the current row and inserts a new one instead of overwriting it.

CustDimID  CustID  Name    State  Start       End
1          1001    Markus  NC     01/01/2010  01/01/2070
2          1010    John    SC     01/01/2010  01/01/2070

Two customers: one in NC, one in SC.

CustDimID  CustID  Name    State  Start       End
1          1001    Markus  NC     01/01/2010  01/31/2010
2          1010    John    SC     01/01/2010  01/01/2070
3          1001    Markus  SC     02/01/2010  01/01/2070

Markus moves to SC in February: row 1 is closed on 01/31/2010, and a new row (CustDimID 3) starts on 02/01/2010.

You can still report accurate NC sales for January because of the start and end dates.
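The move above is what is usually called a Type 2 change: end-date the current row, then insert a new current row. A minimal SQLite sketch via Python's sqlite3 module, assuming ISO-8601 date text, the slide's 01/01/2070 "open" end date, and hypothetical function names change_state and state_on:

```python
import sqlite3
from datetime import date, timedelta

OPEN_END = "2070-01-01"  # far-future End date marks the current row, as on the slide

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustDimID INTEGER PRIMARY KEY, CustID INTEGER, "
    "Name TEXT, State TEXT, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1001, "Markus", "NC", "2010-01-01", OPEN_END),
     (2, 1010, "John", "SC", "2010-01-01", OPEN_END)],
)

def change_state(conn, cust_id, new_state, effective):
    """Type 2 change: end-date the current row, then insert a new current row."""
    name = conn.execute(
        "SELECT Name FROM Customer WHERE CustID = ? AND EndDate = ?",
        (cust_id, OPEN_END),
    ).fetchone()[0]
    day_before = (date.fromisoformat(effective) - timedelta(days=1)).isoformat()
    conn.execute(
        "UPDATE Customer SET EndDate = ? WHERE CustID = ? AND EndDate = ?",
        (day_before, cust_id, OPEN_END),
    )
    # CustDimID is omitted so SQLite assigns the next surrogate key (3 here).
    conn.execute(
        "INSERT INTO Customer (CustID, Name, State, StartDate, EndDate) "
        "VALUES (?, ?, ?, ?, ?)",
        (cust_id, name, new_state, effective, OPEN_END),
    )
    conn.commit()

def state_on(conn, cust_id, day):
    """Return the state in effect for a customer on a given day."""
    return conn.execute(
        "SELECT State FROM Customer WHERE CustID = ? AND ? BETWEEN StartDate AND EndDate",
        (cust_id, day),
    ).fetchone()[0]

change_state(conn, 1001, "SC", "2010-02-01")
print(state_on(conn, 1001, "2010-01-15"))  # NC
print(state_on(conn, 1001, "2010-02-15"))  # SC
```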

Hierarchies

Hierarchies are dimension tables that reflect parent-to-child relationships. Typically a hierarchy table is used to “roll up” metrics to different levels.

We can turn the product dimension table into a hierarchy by adding a parent product code.

For example, the Airplane and the Car both belong to the Toys product line, so this hierarchy could be used to roll up and produce all Toys sales.

ProductDimID  Name      ProductCode  ParentCode
100           Toys      T            -
101           Airplane  A1           T
102           Car       C1           T
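Rolling up through the hierarchy means crediting each sale to its own product and then to every parent above it. A minimal sketch in Python, assuming the sales are keyed by ProductCode for simplicity (the fact table itself stores ProductDimID) and using the hypothetical names hierarchy and rollup:

```python
# The slide's product hierarchy, keyed by ProductCode: code -> (Name, ParentCode).
# None marks the top of the hierarchy (shown as "-" on the slide).
hierarchy = {
    "T": ("Toys", None),
    "A1": ("Airplane", "T"),
    "C1": ("Car", "T"),
}

# The two sales from the earlier fact table, keyed here by ProductCode.
sales = [("A1", 19.00), ("C1", 10.00)]

def rollup(sales, hierarchy):
    """Credit each sale to its own product and to every ancestor above it."""
    totals = {}
    for code, amount in sales:
        while code is not None:
            name, parent = hierarchy[code]
            totals[name] = totals.get(name, 0.0) + amount
            code = parent
    return totals

print(rollup(sales, hierarchy))
# {'Airplane': 19.0, 'Toys': 29.0, 'Car': 10.0}
```

The loop walks any number of levels, so adding a "Products" parent above Toys would need no code change.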

Facts and Dimensions

Notice how the fact table relates to the dimension tables. This allows us to “pivot” the facts around each dimension efficiently.

CustDimID  CustID  Name
1          1001    Markus
2          1010    John

ProductDimID  Name
101           Airplane
102           Car

Date        CustomerDimID  ProductDimID  Price
06/25/2010  1              101           19.00
06/28/2010  2              102           10.00
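These three tables can be joined back together whenever a report needs readable names. A minimal SQLite sketch via Python's sqlite3 module, reproducing the slide's rows with dates stored as ISO-8601 text (the column name OrderDate is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT);
CREATE TABLE Product  (ProductDimID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders   (OrderDate TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
INSERT INTO Customer VALUES (1, 1001, 'Markus'), (2, 1010, 'John');
INSERT INTO Product  VALUES (101, 'Airplane'), (102, 'Car');
INSERT INTO Orders   VALUES ('2010-06-25', 1, 101, 19.00), ('2010-06-28', 2, 102, 10.00);
""")

# Each fact row picks up its descriptive text from the dimensions via the ID columns.
rows = conn.execute("""
    SELECT o.OrderDate, c.Name, p.Name, o.Price
    FROM Orders AS o
    JOIN Customer AS c ON c.CustDimID = o.CustomerDimID
    JOIN Product  AS p ON p.ProductDimID = o.ProductDimID
    ORDER BY o.OrderDate
""").fetchall()
print(rows)
# [('2010-06-25', 'Markus', 'Airplane', 19.0), ('2010-06-28', 'John', 'Car', 10.0)]
```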

Metrics

Metric: a metric is an aggregation of fact information, usually around a particular set of dimensions. In a typical environment, a metric table becomes the source for a single report. Because metrics share the same dimensions, however, they can be combined across multiple systems.

These metric tables are combined with their dimensions to produce the actual output of the reports.

Product metric table:

Date        ProductDimID  NumOrders
06/25/2010  101           20
06/28/2010  102           30

Customer metric table:

Date        CustDimID  NumOrders
06/25/2010  1          1
06/28/2010  2          5

For example, the product metric table might be used to show that “Cars” (102) were ordered 30 times on the 28th.

In the customer metric table, you can see that Markus (1) placed one order, while John (2) placed five.
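Both metric tables are GROUP BY aggregations of the same fact table, each around a different dimension. A minimal SQLite sketch via Python's sqlite3 module; the fact rows are hypothetical (so the product counts differ from the slide's 20 and 30), and the table names Metric_ProductOrders and Metric_CustomerOrders are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderDate TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL)"
)
# Hypothetical facts: Markus (1) orders one Airplane; John (2) orders five Cars on the 28th.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [("2010-06-25", 1, 101, 19.00)] + [("2010-06-28", 2, 102, 10.00)] * 5,
)

# Two metric tables over the same facts, each rolled up around a different dimension.
conn.executescript("""
CREATE TABLE Metric_ProductOrders AS
    SELECT OrderDate, ProductDimID, COUNT(*) AS NumOrders
    FROM Orders GROUP BY OrderDate, ProductDimID;
CREATE TABLE Metric_CustomerOrders AS
    SELECT OrderDate, CustomerDimID, COUNT(*) AS NumOrders
    FROM Orders GROUP BY OrderDate, CustomerDimID;
""")

print(conn.execute("SELECT * FROM Metric_ProductOrders ORDER BY OrderDate").fetchall())
# [('2010-06-25', 101, 1), ('2010-06-28', 102, 5)]
print(conn.execute("SELECT * FROM Metric_CustomerOrders ORDER BY OrderDate").fetchall())
# [('2010-06-25', 1, 1), ('2010-06-28', 2, 5)]
```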

Star Schema

Star Schema: the diagram used to depict a traditional data mart is called a star schema. Typically a fact or metric table is placed in the center, and all dimension tables are laid out around it, giving the diagram a star-like appearance.

Orders (fact): Date, CustomerDimID, ProductDimID, EmployeeDimID, Price
Product: ProductDimID, ProductName, ProductCode, ParentCode, Start, End
Customer: CustomerDimID, CustID, Name, Age, Start, End
Date: Date, Day of Week, Month Name, isHoliday
Employee: EmployeeDimID, SSN, Manager SSN, Status, Start, End
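The star schema can also be written as table definitions. This is a sketch in SQLite via Python's sqlite3 module; the column types, the renaming of Start/End to StartDate/EndDate, and the name DateDim for the date dimension are assumptions, not part of the original design.

```python
import sqlite3

STAR_SCHEMA = """
CREATE TABLE Customer (
    CustomerDimID INTEGER PRIMARY KEY,
    CustID        INTEGER,
    Name          TEXT,
    Age           INTEGER,
    StartDate     TEXT,
    EndDate       TEXT
);
CREATE TABLE Product (
    ProductDimID INTEGER PRIMARY KEY,
    ProductName  TEXT,
    ProductCode  TEXT,
    ParentCode   TEXT,
    StartDate    TEXT,
    EndDate      TEXT
);
CREATE TABLE Employee (
    EmployeeDimID INTEGER PRIMARY KEY,
    SSN           TEXT,
    ManagerSSN    TEXT,
    Status        TEXT,
    StartDate     TEXT,
    EndDate       TEXT
);
CREATE TABLE DateDim (
    Date      TEXT PRIMARY KEY,
    DayOfWeek TEXT,
    MonthName TEXT,
    IsHoliday INTEGER
);
-- Fact table at the center of the star: one key per dimension plus the measure.
CREATE TABLE Orders (
    Date          TEXT    REFERENCES DateDim(Date),
    CustomerDimID INTEGER REFERENCES Customer(CustomerDimID),
    ProductDimID  INTEGER REFERENCES Product(ProductDimID),
    EmployeeDimID INTEGER REFERENCES Employee(EmployeeDimID),
    Price         REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(STAR_SCHEMA)
tables = sorted(r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)  # ['Customer', 'DateDim', 'Employee', 'Orders', 'Product']
```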

Using the Data Mart

You can then take these fact and dimension tables and place them in front of a reporting engine.

The user can then drill through the metrics.

The dimension tables allow the user to “pivot” the metrics through any attribute. They can go from viewing Customers by State to viewing Sales by Employees by switching dimensions.

If users consistently want the same view of the data, you may decide to turn that view into a metric table.
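Switching dimensions amounts to changing which dimension table the facts are joined to and grouped by. A minimal sketch in SQLite via Python's sqlite3 module; the function sales_by, the DIMENSIONS whitelist, and the third order row are hypothetical. Table and column names are spliced into the SQL only from the fixed whitelist, never from user input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustDimID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Product  (ProductDimID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders   (OrderDate TEXT, CustomerDimID INTEGER, ProductDimID INTEGER, Price REAL);
INSERT INTO Customer VALUES (1, 'Markus'), (2, 'John');
INSERT INTO Product  VALUES (101, 'Airplane'), (102, 'Car');
INSERT INTO Orders VALUES
    ('2010-06-25', 1, 101, 19.00),
    ('2010-06-28', 2, 102, 10.00),
    ('2010-06-29', 1, 102, 10.00);
""")

# Hypothetical whitelist: dimension -> (dimension table, its key, its label, fact column).
DIMENSIONS = {
    "customer": ("Customer", "CustDimID", "Name", "CustomerDimID"),
    "product": ("Product", "ProductDimID", "Name", "ProductDimID"),
}

def sales_by(conn, dimension):
    """Total Price grouped by the chosen dimension's members."""
    table, key, label, fact_col = DIMENSIONS[dimension]
    sql = (
        f"SELECT d.{label}, SUM(o.Price) FROM Orders AS o "
        f"JOIN {table} AS d ON d.{key} = o.{fact_col} "
        f"GROUP BY d.{key}, d.{label} ORDER BY d.{label}"
    )
    return conn.execute(sql).fetchall()

print(sales_by(conn, "customer"))  # [('John', 10.0), ('Markus', 29.0)]
print(sales_by(conn, "product"))   # [('Airplane', 19.0), ('Car', 20.0)]
```

If one of these calls backs a report that users open every day, its result is a natural candidate to store as a metric table.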

SQL and Schema

Creating a metric table:

    INSERT INTO Metric_NumOrders (Date, CustomerDimID, NumOrders)
    SELECT Orders.Date,
           Orders.CustomerDimID,
           COUNT(*) AS NumOrders
    FROM Orders
    GROUP BY Orders.Date, Orders.CustomerDimID;

Reporting SQL:

    SELECT m.Date, m.NumOrders, Customer.Name
    FROM Metric_NumOrders AS m
    INNER JOIN Customer ON Customer.CustomerDimID = m.CustomerDimID
    WHERE m.Date BETWEEN '2010-01-01' AND '2010-01-31';

Creating A Metric Table


Date        CustDimID  NumOrders
06/25/2010  1          1
06/28/2010  2          5
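The two statements on this slide can be exercised end to end in SQLite. This sketch uses Python's sqlite3 module with hypothetical January facts, so that the reporting query's January date range actually returns rows. It groups by both Date and CustomerDimID, so each metric row is one customer-day, and it refers to the metric table through an explicit alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerDimID INTEGER PRIMARY KEY, CustID INTEGER, Name TEXT);
CREATE TABLE Orders (Date TEXT, CustomerDimID INTEGER, ProductDimID INTEGER,
                     EmployeeDimID INTEGER, Price REAL);
CREATE TABLE Metric_NumOrders (Date TEXT, CustomerDimID INTEGER, NumOrders INTEGER);
INSERT INTO Customer VALUES (1, 1001, 'Markus'), (2, 1010, 'John');
""")
# Hypothetical January facts: one order for Markus, five for John on the 28th.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [("2010-01-25", 1, 101, 1, 19.00)] + [("2010-01-28", 2, 102, 1, 10.00)] * 5,
)

# Build the metric table: one row per (Date, CustomerDimID) pair.
conn.execute("""
    INSERT INTO Metric_NumOrders (Date, CustomerDimID, NumOrders)
    SELECT Orders.Date, Orders.CustomerDimID, COUNT(*)
    FROM Orders
    GROUP BY Orders.Date, Orders.CustomerDimID
""")

# Reporting query: join the metric table back to its customer dimension.
report = conn.execute("""
    SELECT m.Date, m.NumOrders, Customer.Name
    FROM Metric_NumOrders AS m
    INNER JOIN Customer ON Customer.CustomerDimID = m.CustomerDimID
    WHERE m.Date BETWEEN '2010-01-01' AND '2010-01-31'
    ORDER BY m.Date
""").fetchall()
print(report)
# [('2010-01-25', 1, 'Markus'), ('2010-01-28', 5, 'John')]
```

Storing dates as ISO-8601 text is what lets the BETWEEN filter work as a plain string comparison in SQLite. Databases with a native date type handle this for you.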
