Copyright © Oracle Corporation, 2002. All rights reserved.
Defining Data Warehouse Concepts and Terminology
Objectives
After completing this lesson, you should be able to do the following:
• Identify a common, broadly accepted definition of a data warehouse
• Describe the differences between dependent and independent data marts
• Identify some of the main warehouse development approaches
• Recognize some of the operational properties and common terminology of a data warehouse
Definition of a Data Warehouse
“A data warehouse is a subject oriented, integrated, non-volatile, and time variant collection of data in support of management’s decisions.”
— W.H. Inmon
“An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.”
— Oracle’s Data Warehouse Definition
Data Warehouse Properties
• Subject-oriented
• Integrated
• Time-variant
• Nonvolatile
Subject-Oriented
Data is categorized and stored by business subject rather than by application.
OLTP applications: Equity Plans, Shares, Insurance, Loans, Savings
Data warehouse subject: Customer financial information
Integrated
Data on a given subject is defined and stored once.
OLTP applications: Savings, Current Accounts, Loans
Data warehouse: Customer
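As a rough illustration of "defined and stored once", the sketch below merges rows from three hypothetical OLTP extracts (savings, current accounts, loans) into a single record per customer. The field names (customer_id, name, balance, principal) and the merge rule are assumptions made for the example; they are not taken from the course material.

```python
from collections import defaultdict


def integrate(savings, current_accounts, loans):
    """Merge per-application rows into one record per customer_id."""
    customers = defaultdict(
        lambda: {"name": None, "savings": 0.0, "current": 0.0, "loans": 0.0}
    )
    # Each source contributes to a different field of the same customer record.
    for rows, field, amount_key in (
        (savings, "savings", "balance"),
        (current_accounts, "current", "balance"),
        (loans, "loans", "principal"),
    ):
        for row in rows:
            record = customers[row["customer_id"]]
            record["name"] = row["name"]  # stored once, whichever source supplies it
            record[field] += row[amount_key]
    return dict(customers)
```

A real integration step would also have to reconcile conflicting names, codes, and units across sources; this sketch simply takes the last name seen.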
Time-Variant
Data is stored as a series of snapshots, each representing a period of time.
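One way to picture snapshot storage is a table keyed by entity and period, where a new period adds a row instead of overwriting the previous one. The sketch below assumes a hypothetical account balance keyed by (account_id, period_end); the names are illustrative only.

```python
from datetime import date

# Snapshots keyed by (account_id, period_end); earlier periods are never overwritten.
snapshots = {}


def record_snapshot(account_id, period_end, balance):
    """Store the balance observed at the end of one period."""
    snapshots[(account_id, period_end)] = balance


def balance_as_of(account_id, as_of):
    """Return the balance from the latest snapshot at or before as_of, or None."""
    candidates = [
        (period_end, balance)
        for (acct, period_end), balance in snapshots.items()
        if acct == account_id and period_end <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]
```

Because every period is retained, a query can ask what the data looked like at any past date, which an OLTP table holding only the current value cannot answer.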
Nonvolatile
Typically data in the data warehouse is not updated or deleted.
Operational: Insert, Update, Delete, or Read
Warehouse: Load and Read
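The contrast between the two access patterns can be sketched as a table type that exposes only load and read. The class below is a hypothetical illustration, and it enforces the rule more strictly than the slide's "typically" implies.

```python
class WarehouseTable:
    """Append-only table: rows can be loaded and read, not updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append copies of the incoming rows."""
        self._rows.extend(dict(row) for row in rows)

    def read(self, predicate=lambda row: True):
        """Return copies of the rows that satisfy the predicate."""
        return [dict(row) for row in self._rows if predicate(row)]

    def update(self, *args, **kwargs):
        raise PermissionError("warehouse rows are not updated in place")

    def delete(self, *args, **kwargs):
        raise PermissionError("warehouse rows are not deleted by users")
```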
Changing Warehouse Data
Operational Databases to Warehouse Database:
• First-time load
• Refresh (repeated)
• Purge or archive
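The load, refresh, and purge-or-archive cycle can be sketched with two small functions. The row layout (a "snapshot" date per row) and the cutoff rule are assumptions for illustration; the course material does not prescribe a retention policy.

```python
from datetime import date


def refresh(warehouse, new_rows):
    """Return the warehouse with a later extract appended; existing rows are untouched."""
    return warehouse + list(new_rows)


def purge_or_archive(warehouse, cutoff):
    """Split rows into (kept, archived): rows with a snapshot before cutoff are archived."""
    kept = [row for row in warehouse if row["snapshot"] >= cutoff]
    archived = [row for row in warehouse if row["snapshot"] < cutoff]
    return kept, archived
```

The first-time load is the same operation as a refresh applied to an empty warehouse, for example refresh([], initial_rows).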
Data Warehouse Versus OLTP
Property | OLTP | Data Warehouse
Response time | Subseconds to seconds | Seconds to hours
Operations | DML | Primarily read-only
Nature of data | 30–60 days | Snapshots over time
Data organization | Application | Subject, time
Size | Small to large | Large to very large
Data sources | Operational, internal | Operational, internal, external
Activities | Processes | Analysis
Usage Curves
• Operational system is predictable
• Data warehouse:
  – Variable
  – Random
User Expectations
• Control expectations
• Set achievable targets for query response
• Set SLAs
• Educate
• Growth and use are exponential
Enterprisewide Warehouse
• Large scale implementation
• Covers the entire business
• Data from all subject areas
• Developed incrementally
• Single source of enterprisewide data
• Synchronized enterprisewide data
• Single distribution point to dependent data marts
Data Warehouses Versus Data Marts
Property | Data Warehouse | Data Mart
Scope | Enterprise | Department
Subjects | Multiple | Single subject, LOB
Data source | Many | Few
Implementation time | Months to years | Months
Dependent Data Mart
Sources: Operational systems (operations data, legacy data), flat files, external data
Data warehouse: loaded from the sources
Data marts: Marketing, Sales, Finance, HR, each loaded from the data warehouse
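A dependent data mart takes its data from the warehouse, not from the source systems. The sketch below assumes, purely for illustration, that warehouse rows carry a hypothetical "subject" field and that a mart is a filtered copy of those rows.

```python
def build_dependent_mart(warehouse_rows, subject):
    """Return copies of the warehouse rows that belong to one subject area."""
    return [dict(row) for row in warehouse_rows if row["subject"] == subject]
```

Because every mart is derived from the same integrated rows, two marts that report on the same fact should agree with each other; an independent mart, loaded directly from the sources, has no such guarantee.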
Independent Data Mart
Sources: Operational systems (operations data, legacy data), flat files, external data
Data mart: Sales or Marketing, loaded directly from the sources
Typical Data Warehouse Components
• Source systems: operational, external, legacy
• Staging area
• ODS
• Presentation area: data warehouse, data marts
• Access tools
• Metadata repository
Warehouse Development Approaches
• “Big bang” approach
• Incremental approach:
  – Top-down incremental approach
  – Bottom-up incremental approach
“Big Bang” Approach
Analyze enterprise requirements
Build enterprise data warehouse
Report in subsets or store in data marts
Top-Down Approach
Analyze requirements at the enterprise level
Develop conceptual information model
Identify and prioritize subject areas
Complete a model of selected subject area
Map to available data
Perform a source system analysis
Implement base technical architecture
Establish metadata, extraction, and load processes for the initial subject area
Create and populate the initial subject area data mart within the overall warehouse framework
Bottom-Up Approach
Define the scope and coverage of the data warehouse and analyze the source systems within this scope
Define the initial increment based on political pressure, assumed business benefit, and data volume
Implement base technical architecture and establish metadata, extraction, and load processes as required by increment
Create and populate the initial subject areas within the overall warehouse framework
Incremental Approach to Warehouse Development
• Multiple iterations
• Shorter implementations
• Validation of each phase
Increment 1 (iterative): Strategy, Definition, Analysis, Design, Build, Production
Data Warehousing Process Components
• Methodology
• Architecture
• Extraction, Transformation, and Load (ETL)
• Implementation
• Operation and Support
Methodology
• Ensures a successful data warehouse
• Encourages incremental development
• Provides a staged approach to an enterprisewide warehouse:
  – Safe
  – Manageable
  – Proven
  – Recommended
Architecture
• “Provides the planning, structure, and standardization needed to ensure integration of multiple components, projects, and processes across time.”
• “Establishes the framework, standards, and procedures for the data warehouse at an enterprise level.”
— The Data Warehousing Institute
Extraction, Transformation, and Load (ETL)
“Effective data extract, transform and load (ETL) processes represent the number one success factor for your data warehouse project and can absorb up to 70 percent of the time spent on a typical data warehousing project.”
— DM Review, March 2001
Source → Staging Area → Target
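The flow from source through the staging area to the target can be sketched as three small functions. The row fields (region, amount) and the cleaning rules are hypothetical, chosen only to show where each step sits; real ETL tools and rules will differ.

```python
def extract(source_rows):
    """Copy raw rows from the source into the staging area unchanged."""
    return [dict(row) for row in source_rows]


def transform(staged_rows):
    """Clean staged rows: normalize the region, cast the amount, drop rows with no amount."""
    cleaned = []
    for row in staged_rows:
        if row.get("amount") in (None, ""):
            continue  # reject rows that cannot be loaded
        cleaned.append(
            {"region": row["region"].strip().upper(), "amount": float(row["amount"])}
        )
    return cleaned


def load(target, rows):
    """Append the transformed rows to the target table and return it."""
    target.extend(rows)
    return target
```

Keeping extraction separate from transformation means the raw copy in staging can be re-transformed if a cleaning rule changes, without going back to the source system.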
Implementation
Data Warehouse Architecture
Example: incremental implementation (Increment 1, Increment 2, ..., Increment n)
Operation and Support
• Data access and reporting
• Refreshing warehouse data
• Monitoring
• Responding to change
Phases of the Incremental Approach
• Strategy
• Definition
• Analysis
• Design
• Build
• Production
Increment 1: Strategy, Definition, Analysis, Design, Build, Production
Strategy Phase Deliverables
• Business goals and objectives
• Data warehouse purpose, objectives, and scope
• Enterprise data warehouse logical model
• Incremental milestones
• Source systems data flows
• Subject area gap analysis
Strategy Phase Deliverables (continued)
• Data acquisition strategy
• Data quality strategy
• Metadata strategy
• Data access environment
• Training strategy
Summary
In this lesson, you should have learned how to:
• Identify a common, broadly accepted definition of a data warehouse
• Describe the differences between dependent and independent data marts
• Identify some of the main warehouse development approaches
• Recognize some of the operational properties and common terminology of a data warehouse