Post on 03-Apr-2018
7/29/2019 The Data Warehouse Environment - Building the Data WareHouse
Building the Data Warehouse
by Inmon
Chapter 2: The Data Warehouse Environment
http://it-slideshares.blogspot.com/IT-Slideshares
2. The Data Warehouse Environment
1. The Structure of the Data Warehouse
2. Subject Orientation
3. Day 1 to Day n Phenomenon
4. Granularity
5. Exploration and Data Mining
6. Living Sample Database
7. Partitioning as a Design Approach
8. Structuring Data in the Data Warehouse
9. Auditing and the Data Warehouse
2. The Data Warehouse Environment (cont.)
10. Data Homogeneity and Heterogeneity
11. Purging Warehouse Data
12. Reporting and the Architected Environment
13. The Operational Window of Opportunity
14. Incorrect Data in the Data Warehouse
15. Summary
2.0 Introduction: Data Warehouse Characteristics
Subject-oriented with regard to DSS
Integrated from multiple data sources
Nonvolatile data archive
Time-variant collection of data in support of DSS reporting
2.1. Data Warehouse Characteristics
2.1. The Structure of the Data Warehouse
2.2. Subject Orientation
The data warehouse is oriented to the major subject areas of the corporation that have been defined in the high-level corporate data model. Typical subject areas include the following:
Customer
Product
Transaction or activity
Policy
Claim
Account
2.2.1
2.2.2 Subject Orientation (cont)
2.2.3 Subject Orientation (cont)
2.2.4 Subject Orientation (cont)
2.3. Day 1 to Day n Phenomenon
Data warehouses are not built all at once. Instead, a data warehouse should be built in an orderly, iterative, step-at-a-time fashion.
The "big bang" approach to data warehouse development is simply an invitation to disaster and is never an appropriate alternative.
2.4. Granularity
2.4.1. The Benefits of Granularity
The granular data found in the data warehouse is the key to reusability.
Looking at the data in different ways is only one advantage of having a solid foundation.
Granular data can be shaped to the specific needs of each DSS report, e.g. daily, monthly, quarterly, yearly, or even multi-year trending reports.
Another related benefit of a low level of granularity is flexibility.
Another benefit of granular data is that it contains a history of activities and events across the corporation.
The largest benefit of a data warehouse foundation is that future unknown requirements can be accommodated.
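As a minimal sketch of the reusability point above, the snippet below rolls the same granular records up to several different levels. The `Sale` record, its fields, and the sample values are hypothetical illustrations and are not taken from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sale:
    """One granular (lowest-level) record; all fields are hypothetical."""
    day: date
    customer: str
    amount: float


def summarize(sales, key):
    """Roll granular records up to whatever level the `key` function extracts."""
    totals = defaultdict(float)
    for sale in sales:
        totals[key(sale)] += sale.amount
    return dict(totals)


# Hypothetical sample data
sales = [
    Sale(date(2002, 1, 5), "A", 100.0),
    Sale(date(2002, 1, 5), "B", 50.0),
    Sale(date(2002, 2, 1), "A", 25.0),
]

# The same detail serves a daily report, a monthly report, and a customer view.
daily = summarize(sales, lambda s: s.day)
monthly = summarize(sales, lambda s: (s.day.year, s.day.month))
by_customer = summarize(sales, lambda s: s.customer)
```

Because the detail is kept, a report nobody has asked for yet (say, quarterly by customer) would only need a new `key` function, not a new data load.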
2.4.2. An Example of Granularity
2.4.2.1
2.4.3. Dual Levels of Granularity
2.4.3.1 Telephone example
2.4.3.2 Telephone example (cont)
2.4.3.3 Telephone Example (cont)
2.5. Exploration and Data Mining
Granular data in the data warehouse supports data marts.
It also supports the process of data mining or data exploration.
References
Exploration Warehousing: Turning Business Information into Business Opportunity (Hoboken, N.J.: Wiley, 2000)
2.6. Living Sample Database
2.7. Partitioning as a Design Approach
Proper partitioning can benefit the data warehouse in several ways:
Loading data
Accessing data
Archiving data
Deleting data
Monitoring data
Storing data
2.7.1. Partitioning of Data
2.7.1. Partitioning of Data (cont.)
Following are some of the tasks that cannot easily be performed when data resides in large physical units:
Restructuring
Indexing
Sequential scanning, if needed
Reorganization
Recovery
Monitoring
2.7.1. Partitioning of Data (cont.)
Data can be divided by many criteria, such as:
By date
By line of business
By geography
By organizational unit
By all of the above
2.7.1. Partitioning of Data (cont.)
As an example of how a life insurance company may choose to partition its data, consider the following physical units of data:
2000 health claims
2001 health claims
2002 health claims
1999 life claims
2000 life claims
2001 life claims
2002 life claims
2000 casualty claims
2001 casualty claims
2002 casualty claims
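The physical units listed above combine two partitioning criteria, date and line of business. A minimal sketch of that idea follows; the claim records, their field names, and the helper functions are hypothetical and only illustrate how each record could be routed to its unit.

```python
from collections import defaultdict


def partition_name(claim):
    """Name the physical unit from year and line of business, e.g. '2000 health claims'."""
    return f"{claim['year']} {claim['line']} claims"


def partition_claims(claims):
    """Group claim records into separate units keyed by partition name."""
    partitions = defaultdict(list)
    for claim in claims:
        partitions[partition_name(claim)].append(claim)
    return dict(partitions)


# Hypothetical sample records
claims = [
    {"id": 1, "year": 2000, "line": "health", "amount": 120.0},
    {"id": 2, "year": 2001, "line": "health", "amount": 80.0},
    {"id": 3, "year": 1999, "line": "life", "amount": 5000.0},
    {"id": 4, "year": 2000, "line": "health", "amount": 40.0},
]

partitions = partition_claims(claims)
```

With the data split this way, a single unit such as "1999 life claims" could be archived, reindexed, or deleted without touching the others.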
2.8 Structuring Data in the Data Warehouse
2.8. Structuring Data in the Data Warehouse (cont.)
There are many more ways to structure data within the data warehouse. The most common are these:
Simple cumulative
Rolling summary
Simple direct
Continuous
2.8. Structuring Data in the Data Warehouse (cont.)
At the key level, data warehouse keys are inevitably compound keys. There are two compelling reasons for this:
Date (year, year/month, year/month/day, and so on) is almost always a part of the key.
Because data warehouse data is partitioned, the different components of the partitioning show up as part of the key.
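To make the two reasons concrete, here is a minimal sketch of a compound key that carries a partitioning component and a date alongside the business identifier. The `ClaimKey` class and its field names are hypothetical, not a structure given in the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True, order=True)
class ClaimKey:
    """Hypothetical compound warehouse key."""
    line_of_business: str   # partitioning component
    snapshot_date: date     # date component
    claim_id: int           # business identifier


# The same claim recorded at two moments in time yields two distinct keys,
# so both versions can coexist in the warehouse.
key_a = ClaimKey("health", date(2001, 3, 31), 42)
key_b = ClaimKey("health", date(2001, 4, 30), 42)

rows = {
    key_a: {"status": "open"},
    key_b: {"status": "paid"},
}
```

Making the key immutable (`frozen=True`) lets it serve as a dictionary key, and `order=True` sorts rows by partition first and then by date.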
2.9 Auditing and the Data Warehouse
Data that otherwise would not find its way into the warehouse suddenly has to be there.
The timing of data entry into the warehouse changes dramatically when an auditing capability is required.
The backup and recovery restrictions for the data warehouse change drastically when an auditing capability is required.
Auditing data at the warehouse forces the granularity of data in the warehouse to be at the very lowest level.
2.10 Data Homogeneity and Heterogeneity
2.10 Data Homogeneity and Heterogeneity (cont.)
The data in the data warehouse then is subdivided by the following criteria:
Subject area
Table
Occurrences of data within table
2.11 Purging Warehouse Data
There are several ways in which data is purged or the detail of data is transformed, including the following:
Data is added to a rolling summary file where detail is lost.
Data is transferred to a bulk storage medium from a high-performance medium such as DASD.
Data is actually purged from the system.
Data is transferred from one level of the architecture to another, such as from the operational level to the data warehouse level.
2.12 Reporting and the Architected Environment
2.13. The Operational Window of Opportunity
The following are some suggestions as to how the operational window of archival data may look in different industries:
Insurance: 2 to 3 years
Bank trust processing: 2 to 5 years
Telephone customer usage: 30 to 60 days
Supplier/vendor activity: 2 to 3 years
Retail banking customer account activity: 30 days
Vendor activity: 1 year
Loans: 2 to 5 years
Retailing SKU activity: 1 to 14 days
Vendor activity: 1 week to 1 month
Airlines flight seat activity: 30 to 90 days
Vendor/supplier activity: 1 to 2 years
Public utility customer utilization: 60 to 90 days
Supplier activity: 1 to 5 years
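One way to picture the window is as a per-activity age limit: once a record is older than the limit, it no longer needs to be kept in the operational environment. The sketch below encodes the upper bound of a few of the ranges listed above. The dictionary, the function name, and the 365-day year used for conversion are assumptions made for illustration.

```python
DAYS_PER_YEAR = 365  # assumption, used only to convert the year ranges to days

# Upper bounds of a few of the suggested windows, in days
OPERATIONAL_WINDOW_MAX_DAYS = {
    "Insurance": 3 * DAYS_PER_YEAR,
    "Telephone customer usage": 60,
    "Retail banking customer account activity": 30,
    "Airlines flight seat activity": 90,
}


def has_left_operational_window(activity, age_days):
    """Return True once a record is older than the window for its activity."""
    return age_days > OPERATIONAL_WINDOW_MAX_DAYS[activity]
```

For example, `has_left_operational_window("Telephone customer usage", 61)` is true, while a 400-day-old insurance record is still inside its window.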
2.14. Incorrect Data in the Data Warehouse
Choice 1: Go back into the data warehouse for July 2 and find the offending entry. Then, using update capabilities, replace the value $5,000 with the value $750.
Choice 2: Enter offsetting entries.
Choice 3: Reset the account to the proper value on August 16.
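Choice 2 can be sketched as an append-only ledger: the wrong July 2 entry stays in place, and two new entries on August 16 cancel it and record the correct amount. The year, the opening entry, and the function names below are hypothetical; only the $5,000 and $750 values and the two dates come from the example.

```python
from datetime import date

YEAR = 2000  # hypothetical; the example gives only month and day

# Each entry is (date, amount). Entries are never updated in place.
ledger = [
    (date(YEAR, 7, 1), 1000.00),  # hypothetical earlier activity
    (date(YEAR, 7, 2), 5000.00),  # erroneous entry; should have been 750.00
]


def balance_as_of(entries, as_of):
    """Sum every entry dated on or before `as_of`."""
    return sum(amount for day, amount in entries if day <= as_of)


def add_offsetting_entries(entries, correction_day, wrong, right):
    """Return a new ledger with entries that cancel `wrong` and record `right`."""
    return entries + [(correction_day, -wrong), (correction_day, right)]


corrected = add_offsetting_entries(ledger, date(YEAR, 8, 16), 5000.00, 750.00)
```

A report run for any date before August 16 still sees the balance it saw originally, so earlier reports can be reconciled, while balances from August 16 onward reflect the correction.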
2.14. Incorrect Data in the Data Warehouse (cont.)
Choice 1
The integrity of the data has been destroyed. Any report run between July 2 and August 16 cannot be reconciled.
The update must be done in the data warehouse environment.
In many cases, there is not a single entry that must be corrected, but many, many entries that must be corrected.
2.14. Incorrect Data in the Data Warehouse (cont.)
Choice 2
Many entries may have to be corrected, not just one. Making a simple adjustment may not be an easy thing to do at all.
Sometimes the formula for correction is so complex that making an adjustment cannot be done.
2.14. Incorrect Data in the Data Warehouse (cont.)
Choice 3
The ability to simply reset an account as of one moment in time requires application and procedural conventions.
Such a resetting of values does not accurately account for the error that has been made.
2.15. Summary
1. The Structure of the Data Warehouse
2. Subject Orientation
3. Granularity
4. Exploration and Data Mining
5. Living Sample Database
6. Structuring Data in the Data Warehouse
7. Auditing and the Data Warehouse
8. Data Homogeneity and Heterogeneity
9. Purging Warehouse Data
2.15. Summary (cont.)
10. Reporting and the Architected Environment
11. The Operational Window of Opportunity
12. Incorrect Data in the Data Warehouse