Bab 3
description
Transcript of Bab 3
![Page 1: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/1.jpg)
Bab 3
Data Warehousing
![Page 2: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/2.jpg)
Why Data Warehouse?
![Page 3: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/3.jpg)
Scenario 1
ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Banglore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system.
![Page 4: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/4.jpg)
Scenario 1 : ABC Pvt Ltd.
Mumbai
Delhi
Chennai
Banglore
SalesManager
Sales per item type per branchfor first quarter.
![Page 5: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/5.jpg)
Solution 1:ABC Pvt Ltd.
• Extract sales information from each database.• Store the information in a common repository at a
single site.
![Page 6: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/6.jpg)
Solution 1:ABC Pvt Ltd.
Mumbai
Delhi
Chennai
Banglore
DataWarehouse
SalesManager
Query &Analysis tools
Report
![Page 7: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/7.jpg)
Scenario 2
One Stop Shopping Super Market has hugeoperational database.Whenever Executives wantssome report the OLTP system becomesslow and data entry operators have to wait for some time.
![Page 8: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/8.jpg)
Scenario 2 : One Stop Shopping
OperationalDatabase
Data Entry Operator
Data Entry Operator
ManagementWait
Report
![Page 9: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/9.jpg)
Solution 2
• Extract data needed for analysis from operational database.
• Store it in warehouse.• Refresh warehouse at regular interval so that it
contains up to date information for analysis.• Warehouse will contain data with historical
perspective.
![Page 10: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/10.jpg)
Solution 2
Operationaldatabase
DataWarehouse
Extractdata
Data EntryOperator
Data EntryOperator
Manager
Report
Transaction
![Page 11: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/11.jpg)
Scenario 3
Cakes & Cookies is a small,new company.President of the company wants his company should grow.He needs information so that he can make correct decisions.
![Page 12: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/12.jpg)
Solution 3
• Improve the quality of data before loading it into the warehouse.
• Perform data cleaning and transformation before loading the data.
• Use query analysis tools to support adhoc queries.
![Page 13: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/13.jpg)
Solution 3
Query and Analysistool
President
Expansion
Improvement
sales
time
DataWarehouse
![Page 14: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/14.jpg)
What is Data Warehouse??
![Page 15: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/15.jpg)
Inmons’s definition
A data warehouse is-subject-oriented,-integrated,-time-variant,-nonvolatile
collection of data in support of management’sdecision making process.
![Page 16: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/16.jpg)
Subject-oriented
• Data warehouse is organized around subjects such as sales,product,customer.
• It focuses on modeling and analysis of data for decision makers.
• Excludes data not useful in decision support process.
![Page 17: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/17.jpg)
Integration
• Data Warehouse is constructed by integrating multiple heterogeneous sources.
• Data Preprocessing are applied to ensure consistency.
RDBMS
LegacySystem
DataWarehouse
Flat File Data ProcessingData Transformation
![Page 18: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/18.jpg)
Integration• In terms of data.
– encoding structures.
– Measurement ofattributes.
– physical attribute. of data
– naming conventions.
– Data type format
remarks
![Page 19: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/19.jpg)
Time-variant
• Provides information from historical perspective e.g. past 5-10 years
• Every key structure contains either implicitly or explicitly an element of time
![Page 20: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/20.jpg)
Nonvolatile
• Data once recorded cannot be updated.• Data warehouse requires two operations in data
accessing– Initial loading of data– Access of data
load
access
![Page 21: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/21.jpg)
Operational v/s Information SystemFeatures Operational Information
Characteristics Operational processing Informational processingOrientation Transaction Analysis
User Clerk,DBA,database professional
Knowledge workers
Function Day to day operation Decision support
Data Current Historical
View Detailed,flat relational Summarized, multidimensional
DB design Application oriented Subject orientedUnit of work Short ,simple transaction Complex query
Access Read/write Mostly read
![Page 22: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/22.jpg)
Operational v/s Information System
Features Operational InformationFocus Data in Information out
Number of records accessed
tens millions
Number of users thousands hundredsDB size 100MB to GB 100 GB to TBPriority High performance,high
availabilityHigh flexibility,end-user autonomy
Metric Transaction throughput Query througput
![Page 23: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/23.jpg)
Data Warehousing Architecture
Extract Transform
Load Refresh
Serve
ExternalSources
Operational Dbs
Analysis
Query/Reporting
Data Mining
Monitoring &Administration
MetadataRepository
DATA SOURCES TOOLS
DATA MARTS
OLAP Servers
Reconciled data
![Page 24: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/24.jpg)
Data Warehouse Architecture• Data Warehouse server– almost always a relational DBMS,rarely flat files
• OLAP servers– to support and operate on multi-dimensional
data structures• Clients– Query and reporting tools– Analysis tools– Data mining tools
![Page 25: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/25.jpg)
Data Warehouse Schema
• Star Schema• Fact Constellation Schema• Snowflake Schema
![Page 26: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/26.jpg)
Star Schema
• A single,large and central fact table and one table for each dimension.
• Every fact points to one tuple in each of the dimensions and has additional attributes.
• Does not capture hierarchies directly.
![Page 27: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/27.jpg)
Star Schema (contd..)Store KeyProduct KeyPeriod KeyUnitsPrice
Store Dimension
Time Dimension
Product Dimension
Fact Table
Benefits: Easy to understand, easy to define hierarchies, reduces no. of physical joins.
Store Key
Store Name
City
State
Region
Period KeyYearQuarter
Month
Product KeyProduct Desc
![Page 28: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/28.jpg)
SnowFlake Schema
• Variant of star schema model.• A single,large and central fact table and one or
more tables for each dimension.• Dimension tables are normalized i.e. split
dimension table data into additional tables
![Page 29: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/29.jpg)
SnowFlake Schema (contd..)
Store KeyProduct KeyPeriod KeyUnitsPrice
Time Dimension
Product Dimension
Fact Table
Store Key
Store Name
City Key
Period KeyYearQuarter
Month
Product KeyProduct Desc
City Key
City
State
Region
City Dimension
Store Dimension
Drawbacks: Time consuming joins,report generation slow
![Page 30: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/30.jpg)
Fact Constellation• Multiple fact tables share dimension tables.• This schema is viewed as collection of stars
hence called galaxy schema or fact constellation.
• Sophisticated application requires such schema.
![Page 31: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/31.jpg)
Fact Constellation (contd..)
Store KeyProduct KeyPeriod KeyUnitsPrice
Store Dimension
Product Dimension
SalesFact Table
Store Key
Store Name
City
State
Region
Product KeyProduct Desc
Shipper KeyStore KeyProduct KeyPeriod KeyUnitsPrice
ShippingFact Table
![Page 32: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/32.jpg)
Building Data Warehouse
• Data Selection• Data Preprocessing– Fill missing values– Remove inconsistency
• Data Transformation & Integration• Data Loading Data in warehouse is stored in form of fact tables
and dimension tables.
![Page 33: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/33.jpg)
Case Study• Afco Foods & Beverages is a new company which
produces dairy,bread and meat products with production unit located at Baroda.
• There products are sold in North,North West and Western region of India.
• They have sales units at Mumbai, Pune , Ahemdabad ,Delhi and Baroda.
• The President of the company wants sales information.
![Page 34: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/34.jpg)
Sales Information
January February March April14 41 33 25
Report: The number of units sold.
113
Report: The number of units sold over time
![Page 35: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/35.jpg)
Sales Information
Jan Feb Mar Apr
Wheat Bread 6 17
Cheese 6 16 6 8
Swiss Rolls 8 25 21
Report : The number of items sold for each product withtime
Product
Tim
e
![Page 36: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/36.jpg)
Sales Information
Jan Feb Mar AprMumbai Wheat Bread 3 10
Cheese 3 16 6
Swiss Rolls 4 16 6
Pune Wheat Bread 3 7
Cheese 3 8
Swiss Rolls 4 9 15
Report: The number of items sold in each City for each product with time
Product
Tim
e
City
![Page 37: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/37.jpg)
Sales Information
Report: The number of items sold and income in each region for each product with time.
Jan Feb Mar AprRs U Rs U Rs U Rs U
Mumbai Wheat Bread 7.44 3 24.80 10
Cheese 7.95 3 42.40 16 15.90 6
Swiss Rolls 7.32 4 29.98 16 10.98 6
Pune Wheat Bread 7.44 3 17.36 7
Cheese 7.95 3 21.20 8
Swiss Rolls 7.32 4 16.47 9 27.45 15
![Page 38: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/38.jpg)
Sales Measures & Dimensions
• Measure – Units sold, Amount.• Dimensions – Product,Time,Region.
![Page 39: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/39.jpg)
Sales Data Warehouse Model
City Product Month Units Rupees
Mumbai Wheat Bread January 3 7.95Mumbai Cheese January 4 7.32Pune Wheat Bread January 3 7.95Pune Cheese January 4 7.32
Mumbai Swiss Rolls February 16 42.40
Fact Table
![Page 40: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/40.jpg)
Sales Data Warehouse Model
City_ID Prod_ID Month Units Rupees
1 589 1/1/1998 3 7.951 1218 1/1/1998 4 7.322 589 1/1/1998 3 7.952 1218 1/1/1998 4 7.32
1 589 2/1/1998 16 42.40
![Page 41: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/41.jpg)
Sales Data Warehouse Model
Prod_ID Product_Name Product_Category_ID
589 Wheat Bread 1590 White Bread 1
288 Coconut Cookies 2
Product Dimension Tables
Product_Category_Id Product_Category
1 Bread2 Cookies
![Page 42: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/42.jpg)
Sales Data Warehouse Model
City_ID City Region Country
1 Mumbai West India
2 Pune NorthWest India
Region Dimension Table
![Page 43: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/43.jpg)
Sales Data Warehouse Model
Sales Fact
Region
Product ProductCategory
Time
![Page 44: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/44.jpg)
Online Analysis Processing(OLAP)
• It enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Data Warehouse
Time
Product
Reg
ion
![Page 45: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/45.jpg)
OLAP Cube
City Product Time Units Dollars
All All All 113 251.26
Mumbai All All 64 146.07Mumbai White Bread All 38 98.49
Mumbai Wheat Bread All 13 32.24
Mumbai Wheat Bread Qtr1 3 7.44
Mumbai Wheat Bread March 3 7.44
![Page 46: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/46.jpg)
OLAP OperationsDrill Down
Time
Reg
ion
Product
Category e.g Electrical Appliance
Sub Category e.g Kitchen
Product e.g Toaster
![Page 47: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/47.jpg)
OLAP OperationsDrill Up
Time
Reg
ion
Product
Category e.g Electrical Appliance
Sub Category e.g Kitchen
Product e.g Toaster
![Page 48: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/48.jpg)
OLAP OperationsSlice and Dice
Time
Reg
ion
ProductProduct=Toaster
Time
Reg
ion
![Page 49: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/49.jpg)
OLAP OperationsPivot
Time
Reg
ion
Product
RegionTi
me
Product
![Page 50: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/50.jpg)
OLAP Server
• An OLAP Server is a high capacity,multi user data manipulation engine specifically designed to support and operate on multi-dimensional data structure.
• OLAP server available are– MOLAP server– ROLAP server– HOLAP server
![Page 51: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/51.jpg)
Presentation
Time
Reg
ion
Product
Report
ReportingTool
![Page 52: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/52.jpg)
Data Warehousing includes
• Build Data Warehouse• Online analysis processing(OLAP).• Presentation.
RDBMS
Flat File
Presentation
Cleaning ,Selection &Integration
Warehouse & OLAP serverClient
![Page 53: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/53.jpg)
Need for Data Warehousing
• Industry has huge amount of operational data• Knowledge worker wants to turn this data into
useful information.• This information is used by them to support
strategic decision making .
![Page 54: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/54.jpg)
Need for Data Warehousing (contd..)
• It is a platform for consolidated historical data for analysis.
• It stores data of good quality so that knowledge worker can make correct decisions.
![Page 55: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/55.jpg)
Data Warehousing Tools• Data Warehouse– SQL Server 2000 DTS– Oracle 8i Warehouse Builder
• OLAP tools– SQL Server Analysis Services– Oracle Express Server
• Reporting tools– MS Excel Pivot Chart– VB Applications
![Page 56: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/56.jpg)
References
• Building Data Warehouse by Inmon• Data Mining:Concepts and Techniques by Han,Kamber.• www.dwinfocenter.org• www.datawarehousingonline.com• www.billinmon.com
![Page 57: Bab 3](https://reader035.fdocuments.in/reader035/viewer/2022062815/56816934550346895de0893f/html5/thumbnails/57.jpg)
Thank You