Post on 07-Apr-2018
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 1/30
Training Material
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 2/30
Data Warehouse
Name of the author: Kanchan Yadav
Date Created:23rd Feb 2009
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 3/30
Data Warehouse Architecture 3
CONTENTS
Why DW Systems?
Why Now?
Status of DW Systems
DW Architecture
Operations System vs. DW Systems
Data Quality
ETL
Audit Requirements
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 4/30
4
Data Warehouse Architecture 4
What is a Data Warehouse?
A single, complete andconsistent store of dataobtained from a variety ofdifferent sources madeavailable to end users in awhat they can understandand use in a businesscontext.
[Barry Devlin]
E.g. Same person promoted and asked to leave
E.g. customer for Savings A/C and same customer for Loan are considered
different
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 5/30
5
Data Warehouse Architecture 5
Data Warehousing --
It is a process
Technique for assembling andmanaging data from various sourcesfor the purpose of answeringbusiness questions. Thus makingdecisions that were not previouspossible
A decision support databasemaintained separately from theorganization’s operational database
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 6/30
6
Data Warehouse Architecture 6
Why Now?
Data is being produced
ERP provides clean data
The computing power is available
The computing power is affordable
The competitive pressures are strong
Commercial products are available
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 7/30
7
Data Warehouse Architecture 7
Status of Data Warehousing Systems
Just begun in India
One of the hottest & fast growing area, inIndia & abroad
Many IT organizations have establishedseparate group for domestic & export work
Data Warehousing already being carried outin various Indian organizations, includingStock exchanges, Banks, Telcom, Insurance
cos, etc. Data Mining not far behind
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 8/30
Data Warehouse Architecture 8
Data Warehouse Architecture
OLTP 1RDBMS
OLTP 3ERP
OLTP 2 VSAM
DataWarehouse/Data Mart
Staging Area
Cube II
Cube I
OLAPTool –Slicing /Dicing
Query /Reporting
Tool
ETL
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 9/30
9
Data Warehouse Architecture 9
What are Operational Systems?
They are mainly OLTPsystems
Run mission criticalapplications
Need to work with stringentperformance requirementsfor routine tasks
Used to run a business!
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 10/30
Data Warehouse Architecture 10
Data Environment in a business
Organization
Current scenario
Data are captured through differentapplications
Applications may be on different platforms,developed at different times with specifiedobjectives
Getting data out of applications is a difficulttask
In other words Data are said to be “in jail”
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 11/30
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 12/30
Data Warehouse Architecture 12
Data in Data Warehouse
Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and can be ad-
hoc
Used by managers and end-users to understand the
business and make judgements
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 13/30
Data Warehouse Architecture 13
Application-Orientation vs. Subject-
Orientation Application-Orientation Subject-Orientation
DataWarehouse
Customer
Vendor
Product
Activity
OperationalDatabase
LoansCreditCard
TrustSavings
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 14/30
Data Warehouse Architecture 14
Integration
Integration can take place in various dimensions likeconsistent naming conventions, consistentmeasurement of variables, consistent encodingstructures, consistent physical attributes of data etc.Integration is done at data staging level withoutchanging the operational application systems.
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 15/30
Data Warehouse Architecture 15
Time Orientation
Data warehouse data are snapshot data
It has longer time horizon
It has a key structure containing an element of time.
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 16/30
Data Warehouse Architecture 16
Non Volatility
Data are loaded into the warehouse and accessedthere, but once the snapshot of data is made, the datain the warehouse do not change. Data can be updatedaccording to pre-announced calendar of programme.
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 17/30
Data Warehouse Architecture 17
Metadata
Metadata explains what data exists, where it is locatedand how to access it. The metadata is a core of a datalogistics system, the infrastructure for DW andultimately the intelligence system.
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 18/30
Data Warehouse Architecture 18
To summarize ...
OLTP Systems areused to “run” abusiness
The Data Warehousehelps to “optimize” thebusiness
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 19/30
Loading the Warehouse
Cleaning the data beforeit is loaded
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 20/30
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 21/30
Data Warehouse Architecture 21
Data Quality - The Reality
Tempting to think creating a data warehouse issimply extracting operational data and entering intoa data warehouse
Nothing could be farther from the truth
Warehouse data comes from disparatequestionable sources
Data Profiling before and during ETL
Batch Totals
Handling Missing Values / Outliers / Duplicates / Non-quality Data
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 22/30
Data Warehouse Architecture 22
Data Integration Across Sources
Trust Credit cardSavings Loans
Same datadifferent name
Different dataSame name
Different Type,Length
Different Units
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 23/30
23
Data Warehouse Architecture 23
Data Transformation Terms
Transformation
The conversion of data types from the
source to the target data store
(warehouse) -- always a relational
database
•90% of persons were born on November 11, 1911
• 80% robbery performed in Ghatkopar, Chowky # 1
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 24/30
Data Warehouse Architecture 24
Data Transformation Example
e n c o
d i n
g
u n
i t
f i e
l d
appl A - balanceappl B - balappl C - currbalappl D - balcurr
appl A - pipeline - cmappl B - pipeline - inappl C - pipeline - feetappl D - pipeline - yds
appl A - m,f appl B - 1,0appl C - x,yappl D - male, female
Data Warehouse
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 25/30
Data Warehouse Architecture 25
Data Integrity Problems
Same person, different spellings
Agarwal, Agrawal, Aggarwal etc...
Multiple ways to denote company name
Persistent Systems, PSPL, Persistent Pvt.LTD.
Use of different names
mumbai, bombay
Different account numbers generated by differentapplications for the same customer
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 26/30
26
Data Warehouse Architecture 26
Data Integrity Problems (Cont)
Required fields left blank
Invalid product codes collected at point of sale
manual entry leads to mistakes
“in case of a problem use 9999999”
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 27/30
27
Data Warehouse Architecture 27
Loads
After extracting, scrubbing, cleaning, validating etc.need to load the data into the warehouse
Issues
huge volumes of data to be loaded
small time window available when warehouse canbe taken off line (usually nights)
when to build index and summary tables
allow system administrators to monitor, cancel,resume, change load rates
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 28/30
Data Warehouse Architecture 28
When to Refresh?
periodically (e.g., every night, every week) or after
significant events
on every update: not warranted unless warehouse
data require current data (up to the minute stock
quotes)
refresh policy set by administrator based on user
needs and traffic
possibly different policies for different sources
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 29/30
Data Warehouse Architecture 29
Extraction Techniques
Full Extract from base tables
read entire source table: too expensive
maybe the only choice for legacy systems
8/4/2019 Data Warehouse Basic
http://slidepdf.com/reader/full/data-warehouse-basic 30/30
Data Warehouse Architecture 30
How To Detect Changes
Create a snapshot log table to record ids ofupdated rows of source data and timestamp
Detect changes by:
Defining after row triggers to update snapshot logwhen source table changes
Using regular transaction log to detect changes tosource data