Dw Tutorial Index
8/13/2019 Dw Tutorial Index
1/38
Index
What are the Source systems?
ETL process
EDW (Enterprise Data Warehouse) and DM (Data Mart)
OLAP (Online Analytical Processing)
Dimensional Modeling
Topology (all data marts, dependent, independent)
Audience
-
Data Warehousing.
Data Warehouse basic concepts
Data Warehouse Approach
Data Warehouse Implementation
OLAP (Online Analytical Processing)
Next steps in Data Warehousing
By V.S.Rajesh Kumar
November 2004
-
Data Warehouse- Concepts
Module 1
Data Warehouse basic concepts
-
What is DSS?
Decision Support System (DSS): mainly used by the business to make strategic decisions based on trends (for example, comparing the current fiscal year with the previous one) and to project numbers based on history and some parameters.
A DSS is not used to run the business. OLTP systems take care of the day-to-day activities of a business. Example: SAP Order Management takes care of the orders the organization receives. In the DSS we collect all the data to do the analysis.
-
OLTP
Online Transaction Processing system
Examples of OLTP systems are order management, TERA, etc.
The database is usually designed in 3rd normal form
All the DML types are active
Deals with specific data (customer x, product z, etc.)
-
OLTP vs DSS
OLTP:
More DML operations (updates, deletes, inserts)
Point queries
Very specific while issuing queries
Less history (approximately 6 months to 1 year)
Used for day-to-day activities (a must to run the business)

DSS:
No change in the data (no updates and deletes)
Queries based on time period, set of products, set of customers, etc.
Maintains the history
Used mainly for analytics (trend analysis, customer behavior, etc.)
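The difference between a point query and a DSS-style query can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the orders table, its columns, and the sample rows are assumptions made up for the example and do not come from any real system.

```python
import sqlite3

# Hypothetical orders table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, "
    "product TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", "z", "2004-01-15", 100.0),
        (2, "y", "z", "2004-02-10", 250.0),
        (3, "x", "w", "2004-02-20", 75.0),
        (4, "y", "w", "2003-02-05", 60.0),
    ],
)

# OLTP style: a point query that touches one specific order.
point = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# DSS style: a query over a time period, aggregated to show a trend.
trend = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders "
    "WHERE order_date BETWEEN '2004-01-01' AND '2004-12-31' "
    "GROUP BY month ORDER BY month"
).fetchall()

print(point)
print(trend)
```

The first query is answered from a single row; the second scans a whole year and summarizes it by month, which is the kind of workload a DSS is built for.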
-
General DSS Architecture
[Diagram. Components shown: source data (OLTP 1, OLTP 2, Market Place, Web clicks); ETL (tool or T-SQL); staging DB; ODS; data warehouse database; predefined reports; ad hoc reporting; OLAP cubes; data mining. Caption: close the loop (write back to OLTP about the findings in DSS).]
-
Architecture Diagram
[Diagram. Components shown: source data (HR Data, Finance, Payroll, Project); ET&L using Microsoft DTS (Data Transformation Services) and stored procedures; data warehouse database (SQL Server database); predefined reports; ad hoc reporting; OLAP cubes.]
-
Example for a DSS
[Diagram. Components shown: OLTP 1, OLTP 2, OLTP 3, OLTP 4; data warehouse; OLAP; reporting; analytics.]
-
DSS Categories
Operational Data Store (ODS)
Support for: consolidated and reconciled operational data capture and access
Detailed, lightly summarized
Process oriented, subject oriented
Integrated
Volatile (updateable)
Current
Short; business process life (30 to 90 days of history), purge

Enterprise Data Warehouse (EDW)
Support for: single source of consistent, integrated, cross-functional data for access and distribution
Detailed atomic record of events, reference and dimension masters, derived, summarized
Subject oriented
Integrated
Non-volatile; periodic loads, read only
Time variant
Long; institutional memory (2 years or more of history), archive

Relational Data Mart (RDM)
Support for: subset of integrated data, separated for autonomous processing, optimized for access
Aggregated, summarized, specialized
Subject oriented
Integrated
Non-volatile; periodic load, can contain separate updateable structures for OLTP support
Time variant
Variable retention; some archive

Online Analytical Processing (OLAP)
Support for: subset of integrated data, separated for autonomous processing, optimized for access
Aggregated, summarized, specialized
Subject oriented
Integrated
Non-volatile; periodic load, can contain separate updateable structures for OLTP support
Time variant
Variable retention; some archive
-
ETL (E: Extract)
Extract: getting data out of the source systems. This may be just a DTS package which pulls the data, or exporting a table to a flat file in the source system.
In Teradata, the FastExport utility exports the data to a flat file.
In Oracle, query output can be spooled to a flat file from SQL*Plus (SQL*Loader is a load utility and does not export data).
In SQL Server, a DTS package can do the same job.
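A minimal sketch of the extract step, assuming a generic SQL source rather than any of the vendor utilities named above. It uses Python's built-in sqlite3 and csv modules; the reseller table, its columns, and the extract_to_flat_file helper are hypothetical names chosen for the example.

```python
import csv
import os
import sqlite3
import tempfile


def extract_to_flat_file(conn, query, path):
    """Run a query on the source and write the result as a CSV flat file."""
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            count += 1
    return count


# Hypothetical source system with one table.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE reseller (reseller_id INTEGER, name TEXT, country TEXT)")
source.executemany(
    "INSERT INTO reseller VALUES (?, ?, ?)",
    [(1, "Acme", "US"), (2, "Globex", "IN")],
)

flat_file = os.path.join(tempfile.mkdtemp(), "reseller.csv")
rows_written = extract_to_flat_file(
    source,
    "SELECT reseller_id, name, country FROM reseller ORDER BY reseller_id",
    flat_file,
)
print(rows_written, "rows written to", flat_file)
```

Writing a header row makes the flat file self-describing, which helps the later transform and load steps.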
-
ETL (T: Transform)
Transform: it is not necessary to have the same data model in source and destination. When the destination data model differs from the source, we have to reshape the source data to fit the destination's data model. This process is called transformation.
Example: when we receive reseller information from various distributors, we won't get the geo information. So the transformation logic will have some code which assigns the respective geo based on the country the data comes from.
This is a simple example of transformation.
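The geo assignment described above might look like the sketch below. The country codes, geo names, and the "UNKNOWN" fallback are assumptions for illustration; a real warehouse would take them from its own reference data.

```python
# Hypothetical country-to-geo reference data.
COUNTRY_TO_GEO = {"US": "Americas", "IN": "APAC", "DE": "EMEA"}


def assign_geo(record, default="UNKNOWN"):
    """Return a copy of the record with a geo field derived from its country."""
    out = dict(record)
    out["geo"] = COUNTRY_TO_GEO.get(record.get("country"), default)
    return out


source_rows = [
    {"reseller": "Acme", "country": "US"},
    {"reseller": "Globex", "country": "IN"},
    {"reseller": "Initech", "country": "ZZ"},  # country missing from the reference data
]
transformed = [assign_geo(r) for r in source_rows]
for row in transformed:
    print(row)
```

Rows whose country is not in the reference data get an explicit default instead of being dropped, so they can be reviewed later.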
-
ETL (L: Load)
Load: loading the transformed data into the destination data model (data warehouse).
Just as each RDBMS offers export functionality, each also has a utility to import data into the database:
Teradata: FastLoad
Oracle: SQL*Loader
Sybase: bcp
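A minimal sketch of the load step, again using sqlite3 as a stand-in for the warehouse instead of the vendor utilities listed above. The dim_reseller table, the flat-file layout, and the load_resellers helper are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a transformed flat file; the layout is an assumption.
FLAT_FILE = "reseller_id,name,geo\n1,Acme,Americas\n2,Globex,APAC\n"


def load_resellers(conn, flat_file):
    """Bulk-insert the rows of a CSV flat file into the target table."""
    reader = csv.DictReader(flat_file)
    rows = [(int(r["reseller_id"]), r["name"], r["geo"]) for r in reader]
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO dim_reseller (reseller_id, name, geo) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dim_reseller (reseller_id INTEGER PRIMARY KEY, name TEXT, geo TEXT)"
)
loaded = load_resellers(warehouse, io.StringIO(FLAT_FILE))
print(loaded, "rows loaded")
```

Loading inside one transaction means a failed load leaves the target table unchanged.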
-
Data Modeling for OLTP
Usually 3rd normal form.
Advantages: flexibility to modify the model for changes; no redundancy of data in the model.
Disadvantages: complex queries are needed to generate reports, as the number of tables to join is usually high.
-
Dimensional Modeling for DSS
Star schema, snowflake schema
Which type of model suits better depends on the RDBMS.
Example: Teradata is a parallel-processing database engine that can return results in reasonable time, so we can design the enterprise data model in 3rd normal form. We can't take the same approach for SQL Server or Oracle; there we should think of denormalizing the data model.
A star schema makes queries run faster, as the number of tables to join is smaller.
In a star schema, all the hierarchies defined for a dimension are stored in a single table, so the data redundancy is high. In a snowflake schema, we can have one more table for the hierarchy. That's the difference between the star schema and the snowflake schema.
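The difference can be sketched with a small product dimension modeled both ways. This uses sqlite3 only for illustration; all table and column names and the sample data are assumptions for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);

-- Star: the product -> category hierarchy lives in one dimension table,
-- so the category name is repeated for every product.
CREATE TABLE dim_product_star (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

-- Snowflake: the category level moves to its own lookup table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product_snow (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_id INTEGER);

INSERT INTO fact_sales VALUES (1, 10.0), (2, 20.0), (3, 5.0);
INSERT INTO dim_product_star VALUES
    (1, 'Milk', 'Dairy'), (2, 'Cheese', 'Dairy'), (3, 'Bread', 'Bakery');
INSERT INTO dim_category VALUES (1, 'Dairy'), (2, 'Bakery');
INSERT INTO dim_product_snow VALUES (1, 'Milk', 1), (2, 'Cheese', 1), (3, 'Bread', 2);
""")

# Star schema: one join from the fact table to the dimension.
star = db.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category_name ORDER BY p.category_name
""").fetchall()

# Snowflake schema: an extra join to reach the category level.
snow = db.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()

print(star)
print(snow)
```

Both queries return the same totals. The star version repeats 'Dairy' in the dimension table but needs one join fewer.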
-
Star Schema
A star schema is optimized for queries. A data model based on a star schema will contain redundant data.
-
Snowflake Schema
A snowflake schema won't have much redundant data, as most of the dimensions will have a lookup table. This increases the number of joins between the tables.
Both have advantages and disadvantages, so analyze the end users' requirements and space constraints to pick the best.
-
Data Refresh in DSS
We have to refresh the data in the DSS from the various source systems in a timely manner.
While doing so, we either do a full refresh of a particular table or capture only the changed data (this process is called delta).
Usually we go for delta refresh for fact tables and full refresh for dimension tables. As the environment gets bigger and bigger, almost all the tables become delta loads.
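The two refresh styles can be sketched as below. This is an illustration under stated assumptions: the delta is detected with a "last loaded date" watermark, which is one common technique and is not prescribed by the text, and the table names and rows are made up.

```python
import sqlite3


def full_refresh(target, rows):
    """Replace the whole dimension table with the current source rows."""
    with target:
        target.execute("DELETE FROM dim_product")
        target.executemany("INSERT INTO dim_product VALUES (?, ?)", rows)


def delta_refresh(target, source_rows, last_loaded):
    """Insert only fact rows newer than the watermark; return the new watermark."""
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    new_rows = [r for r in source_rows if r[2] > last_loaded]
    with target:
        target.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", new_rows)
    return max((r[2] for r in new_rows), default=last_loaded)


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE dim_product (product_id INTEGER, name TEXT)")
target.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL, sale_date TEXT)")

# Dimension: reload everything on each run.
full_refresh(target, [(1, "Milk"), (2, "Bread")])
full_refresh(target, [(1, "Milk"), (2, "Bread"), (3, "Eggs")])

# Fact: load only what changed since the last run.
sales = [(100, 10.0, "2004-11-01"), (101, 20.0, "2004-11-02")]
watermark = delta_refresh(target, sales, "2004-10-31")
sales.append((102, 5.0, "2004-11-03"))
watermark = delta_refresh(target, sales, watermark)

dim_count = target.execute("SELECT COUNT(*) FROM dim_product").fetchone()[0]
fact_count = target.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(dim_count, fact_count, watermark)
```

The second delta run inserts only the one new sale, while each full refresh rewrites the entire dimension table. That cost is why large tables tend to move to delta loads.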
-
Advantages of DSS
Safeway, a grocery store chain in the US, delivers various information from its DSS directly to store managers. Example: the system can predict a stock outage for a particular item in the store. Based on history, the system knows that a particular item should sell every 3 hours; if the DSS did not see a transaction in the last 2 hours, it sends an SMS to the current shift manager's mobile. That's the level you can reach with a DSS. It takes time to get there.
Walmart does customer profiling, store sales analysis, etc. on their data warehouse, which is implemented on Teradata.
FedEx uses Teradata, Ab Initio, and MicroStrategy as their DSS tools.
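The stock-outage check in the Safeway example could be sketched roughly as follows. The function name, item names, and timestamps are hypothetical, and the 2-hour gap is taken from the example above. How the real system decides or sends the SMS is not described in the source, so the sketch only returns the items that would trigger an alert.

```python
from datetime import datetime, timedelta


def items_needing_alert(last_sale_by_item, now, max_gap=timedelta(hours=2)):
    """Return the items whose last sale is older than the allowed gap."""
    return sorted(
        item for item, last_sale in last_sale_by_item.items()
        if now - last_sale > max_gap
    )


now = datetime(2004, 11, 1, 12, 0)
last_sales = {
    "milk": datetime(2004, 11, 1, 11, 30),  # sold 30 minutes ago
    "bread": datetime(2004, 11, 1, 9, 15),  # no sale for 2 hours 45 minutes
}
alerts = items_needing_alert(last_sales, now)
print(alerts)
```

An item that normally sells regularly but shows no recent transaction is a hint that the shelf may be empty.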
-
Data Warehouse- Concepts
Module 2
Data Warehouse Approach
-
Distributed Approach
Various departments can start creating different data marts. Each can start working independently and see the ROI in a short span. In the long run, integrating this data adds complexity, and the cost will be higher as there are more systems to maintain.
-
Distributed Approach to DSS
How We Are Different
Gives only part of the answer
Requires time and effort to put the pieces together
No guarantee it's the right answer
-
Centralized Approach
A centralized data warehouse keeps the data in one place, which makes it easy to answer any business question. In the long run it has a cost advantage over a non-centralized data warehouse. It is not very easy to implement, as it needs more time and resources, and ROI won't be seen until the implementation is completed. So the recommended way to implement a centralized data warehouse is to start with one subject area and keep adding one subject area at a time; this way the organization gets to see the ROI at various stages.
-
Centralized Approach to DSS
How We Are Different
Delivers one version of the truth for increased confidence and speed in decision-making
-
Data Warehouse- Concepts
Module 3
Data Warehouse Implementation Steps
-
Typical Approach
Data modeling is a cyclic process involving the following steps:
Requirement Gathering
Requirement Analysis
Requirement Validation
Logical Modeling
Physical Design
Implementation
Validation
The above cycle repeats for any upgrades or enhancements.
-
Requirement Gathering
Identify the business objectives
Identify the reporting requirements
Identify the frequency of report generation
Granularity of information
Business rules
-
Requirement Analysis
Study the requirements captured
Identify the subject areas
Identify the measures and criteria fields
Identify the granularity of information required
-
Requirement Validation
Validate the analysis with the customer
Document sign-off
-
Logical Modeling
Identify facts and dimensions
Create the logical model
-
Physical Design
Analyze source systems with respect to the logical model
Data quality analysis
Physical design: data types, indexes, partitioning, database creation, etc.
Source-to-target mapping
Capture transformation rules
Capture derivation rules for derived fields
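A source-to-target mapping with transformation and derivation rules can be captured as data, as in the hedged sketch below. Every field name and rule here is hypothetical and only illustrates the idea.

```python
# target field -> (source field, transformation rule); all names are hypothetical.
MAPPING = {
    "customer_name": ("cust_nm", str.strip),
    "country": ("ctry_cd", str.upper),
    "unit_price": ("price", float),
    "quantity": ("qty", int),
}

# Derivation rules compute fields that do not exist in the source.
DERIVED = {
    "revenue": lambda target: target["unit_price"] * target["quantity"],
}


def map_row(source_row):
    """Apply the mapping, then the derivation rules, to one source row."""
    target = {tgt: rule(source_row[src]) for tgt, (src, rule) in MAPPING.items()}
    for field, derive in DERIVED.items():
        target[field] = derive(target)
    return target


row = map_row({"cust_nm": " Acme ", "ctry_cd": "us", "price": "2.50", "qty": "4"})
print(row)
```

Keeping the mapping in one table-like structure makes it easy to review with the business before any ETL job is written.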
-
Implementation
Database Creation
Staging Design (Design Extraction Jobs)
Develop ETL Jobs
Unit testing of ETL Jobs
Schedule Jobs
Test Load
Data Validation
Performance monitoring
ETL Job tuning
Test Database performance tuning
Final loading of data from source to target
-
Data Warehouse- Concepts
Module 4
OLAP (Online Analytical Processing)
-
What is OLAP?
Online Analytical Processing: viewing data in a multidimensional way.
Why OLAP?
Slice and dice for the data warehouse. An RDBMS is a two-dimensional way of storing and viewing the data.
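Slicing and dicing can be sketched in plain Python over a tiny fact set. The dimensions (quarter, region, product), the numbers, and the aggregate helper are assumptions for the example; a real OLAP engine would precompute or index these aggregates.

```python
from collections import defaultdict

# Each fact: (quarter, region, product, units sold); sample data only.
DIMS = ("quarter", "region", "product")
facts = [
    ("2004-Q1", "East", "Milk", 100),
    ("2004-Q1", "West", "Milk", 80),
    ("2004-Q2", "East", "Bread", 50),
    ("2004-Q2", "East", "Milk", 120),
]


def aggregate(rows, by, **filters):
    """Total the measure grouped by the 'by' dimensions, after fixing 'filters'."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        if all(row[DIMS.index(d)] == v for d, v in filters.items()):
            totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


# Slice: fix one dimension (quarter) and view the rest by region.
slice_q1 = aggregate(facts, by=("region",), quarter="2004-Q1")

# Dice: fix region and view the data along two dimensions at once.
dice_east = aggregate(facts, by=("quarter", "product"), region="East")

print(slice_q1)
print(dice_east)
```

The same facts answer different questions depending on which dimensions are fixed and which are grouped, which is what "multidimensional" means here.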
-
Types of OLAP
Three types of OLAP are used in the industry:
1. MOLAP: Multidimensional OLAP (e.g. MSOLAP, Essbase, Cognos)
2. ROLAP: Relational OLAP (e.g. Business Objects, MicroStrategy)
3. HOLAP: Hybrid OLAP
-
Data Warehouse- Concepts
Module 5
Next steps in Data Warehousing
-
Data Mining
OLAP is like fishing (one trend at a time).
Data mining is like fishing using a net.
Mining tools provide sophisticated algorithms to find specific trends in the available data.
Example: MS Analysis Server provides algorithms such as clustering.
Mainly used to identify sets of customers who think alike, for fraud detection, etc.
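To give a feel for clustering, here is a minimal one-dimensional k-means sketch that groups customers by spend. It is a generic textbook algorithm, not the one shipped in MS Analysis Server, and the spend values and starting centroids are made up for the example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means on one numeric attribute."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = kmeans_1d(spend, centroids=[0.0, 50.0])
print(centroids)
print(clusters)
```

The algorithm separates low spenders from high spenders without being told where the boundary lies, which is the "fishing with a net" idea.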
-
Business Activity Monitoring (BAM)
BAM is the technology used to actively monitor the DW or OLTP for certain values.
The system can run a set of processes when it finds an exception and send the information to the relevant owners so they can take action.
Based on the findings, it can immediately update the relevant OLTP system (conceptually this is called closing the loop between DSS and OLTP).
Example: INFORAY is a BAM tool which you can use on the DW.
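A rough sketch of the monitor, notify, and close-the-loop cycle is shown below. It does not use or imitate INFORAY; the tables, the threshold, and the monitor function are hypothetical, and the notification is just a callback standing in for an e-mail or SMS.

```python
import sqlite3


def monitor(dw, oltp, threshold, notify):
    """Find exceptions in the DW, notify the owners, and write back to OLTP."""
    low = dw.execute(
        "SELECT item, on_hand FROM stock_snapshot WHERE on_hand < ? ORDER BY item",
        (threshold,),
    ).fetchall()
    for item, on_hand in low:
        notify(f"{item}: only {on_hand} left")
        with oltp:  # closing the loop: record the action in the OLTP system
            oltp.execute("INSERT INTO reorder_request (item) VALUES (?)", (item,))
    return [item for item, _ in low]


# Hypothetical warehouse snapshot and OLTP system.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE stock_snapshot (item TEXT, on_hand INTEGER)")
dw.executemany(
    "INSERT INTO stock_snapshot VALUES (?, ?)",
    [("milk", 3), ("bread", 40), ("eggs", 0)],
)
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE reorder_request (item TEXT)")

messages = []
flagged = monitor(dw, oltp, threshold=5, notify=messages.append)
requests = oltp.execute("SELECT item FROM reorder_request ORDER BY item").fetchall()
print(flagged, messages, requests)
```

In practice a scheduler would run such a check repeatedly; the single call here only shows one pass of the cycle.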