Copyright © Oracle Corporation, 2002. All rights reserved.

Defining Data Warehouse Concepts and Terminology

Objectives

After completing this lesson, you should be able to do the following:

• Identify a common, broadly accepted definition of a data warehouse

• Describe the differences between dependent and independent data marts

• Identify some of the main warehouse development approaches

• Recognize some of the operational properties and common terminology of a data warehouse

Definition of a Data Warehouse

“A data warehouse is a subject oriented, integrated, non-volatile, and time variant collection of data in support of management’s decisions.”

— W.H. Inmon

“An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.”

— Oracle’s Data Warehouse Definition

Data Warehouse Properties

A data warehouse is:

• Subject-oriented

• Integrated

• Time-variant

• Nonvolatile

Subject-Oriented

Data is categorized and stored by business subject rather than by application.

OLTP Applications: Equity Plans, Shares, Insurance, Loans, Savings

Data Warehouse Subject: Customer financial information
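
As a minimal sketch of subject orientation, the Python below regroups rows held by separate OLTP applications under one customer subject. The row layout, the customer IDs, the amounts, and the function name `group_by_subject` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rows as separate OLTP applications might hold them:
# (application, customer_id, balance). All values are made up.
oltp_rows = [
    ("Savings", "C001", 1200.0),
    ("Loans", "C001", -5000.0),
    ("Insurance", "C002", 300.0),
    ("Shares", "C001", 750.0),
]


def group_by_subject(rows):
    """Reorganize application-oriented rows under the customer subject."""
    subject = defaultdict(dict)
    for application, customer_id, balance in rows:
        subject[customer_id][application] = balance
    return dict(subject)


customer_financials = group_by_subject(oltp_rows)
print(customer_financials["C001"])
```

Keying by customer rather than by application is the design choice the slide describes: a question about one customer no longer requires visiting every application.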

Integrated

Data on a given subject is defined and stored once.

OLTP Applications: Savings, Current Accounts, Loans

Data Warehouse: Customer
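
One way to picture integration is the sketch below, which assumes three applications that each describe the same customer with different field names and encodings. Every field name, the `GENDER_CODES` mapping, and the helpers `normalize` and `integrate` are hypothetical. The point is only that the warehouse ends up with a single definition per customer.

```python
# Hypothetical customer records; each application uses its own field names
# and its own encoding for the same attribute.
savings = {"cust_no": 17, "name": "A. Jones", "sex": "F"}
current_accounts = {"customer": 17, "fullname": "A. Jones", "gender": "female"}
loans = {"id": 17, "nm": "A. Jones", "g": 0}

# Assumed mapping from each source encoding to one warehouse encoding.
GENDER_CODES = {"F": "F", "female": "F", 0: "F", "M": "M", "male": "M", 1: "M"}


def normalize(record, id_field, name_field, gender_field):
    """Translate one source record into the common warehouse layout."""
    return {
        "customer_id": record[id_field],
        "name": record[name_field],
        "gender": GENDER_CODES[record[gender_field]],
    }


def integrate(normalized_records):
    """Store each customer once; reject conflicting definitions."""
    warehouse = {}
    for rec in normalized_records:
        existing = warehouse.setdefault(rec["customer_id"], rec)
        if existing != rec:
            raise ValueError(f"conflicting definitions for {rec['customer_id']}")
    return warehouse


customers = integrate([
    normalize(savings, "cust_no", "name", "sex"),
    normalize(current_accounts, "customer", "fullname", "gender"),
    normalize(loans, "id", "nm", "g"),
])
print(customers)
```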

Time-Variant

Data is stored as a series of snapshots, each representing a period of time.
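
The snapshot idea can be sketched as follows. This is an illustration under assumptions, not a prescribed design: the tuple layout, the account ID, and the helpers `take_snapshot` and `balance_as_of` are hypothetical.

```python
from datetime import date

# Each snapshot records the state of an account as of a period end.
# Earlier snapshots are kept rather than overwritten.
snapshots = []  # list of (period_end, account_id, balance)


def take_snapshot(period_end, balances):
    """Append one row per account for the given period."""
    for account_id, balance in sorted(balances.items()):
        snapshots.append((period_end, account_id, balance))


def balance_as_of(account_id, when):
    """Return the balance from the latest snapshot on or before `when`."""
    candidates = [(p, b) for p, a, b in snapshots if a == account_id and p <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


take_snapshot(date(2002, 1, 31), {"A1": 100.0})
take_snapshot(date(2002, 2, 28), {"A1": 250.0})

print(balance_as_of("A1", date(2002, 2, 15)))
print(balance_as_of("A1", date(2002, 3, 1)))
```

Because old snapshots are never overwritten, a query can ask what a value was at any past period, which a table holding only current state cannot answer.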

Nonvolatile

Typically, data in the data warehouse is not updated or deleted.

Operational: Insert, Update, Delete, or Read

Warehouse: Load, Read

Changing Warehouse Data

Operational Databases → Warehouse Database:

• First-time load

• Refresh (repeated)

• Purge or archive
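
The cycle of a first-time load, repeated refreshes, and a purge or archive can be sketched as below. The in-memory lists, the row layout, the dates, and the function names `load` and `purge_or_archive` are all hypothetical. A real warehouse would use database tables and a scheduled process.

```python
from datetime import date

warehouse = []  # rows currently held in the warehouse: (load_date, payload)
archive = []    # rows moved out of the warehouse but retained


def load(rows):
    """Append rows; a first-time load and a later refresh work the same way."""
    warehouse.extend(rows)


def purge_or_archive(cutoff, keep=True):
    """Remove rows older than `cutoff`; keep them in the archive if requested."""
    old = [row for row in warehouse if row[0] < cutoff]
    warehouse[:] = [row for row in warehouse if row[0] >= cutoff]
    if keep:
        archive.extend(old)
    return len(old)


load([(date(2002, 1, 31), "initial extract")])    # first-time load
load([(date(2002, 2, 28), "february changes")])   # refresh
load([(date(2002, 3, 31), "march changes")])      # refresh
moved = purge_or_archive(date(2002, 2, 1))        # archive rows before February
print(moved, len(warehouse), len(archive))
```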

Data Warehouse Versus OLTP

Property            OLTP                     Data Warehouse

Response Time       Subseconds to seconds    Seconds to hours

Operations          DML                      Primarily read-only

Nature of Data      30–60 days               Snapshots over time

Data Organization   Application              Subject, time

Size                Small to large           Large to very large

Data Sources        Operational, internal    Operational, internal, external

Activities          Processes                Analysis
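
To make the Operations and Data Organization rows concrete, the sketch below uses Python's standard `sqlite3` module to contrast a DML update against current state with a read-only aggregate over time-organized snapshots. The table names, columns, and values are hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP style: DML against the current row of an application table.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = balance - 30.0 WHERE id = 1")
current_balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

# Warehouse style: read-only analysis across snapshots organized by time.
conn.execute("CREATE TABLE balance_snapshot (period TEXT, id INTEGER, balance REAL)")
conn.executemany(
    "INSERT INTO balance_snapshot VALUES (?, ?, ?)",
    [
        ("2002-01", 1, 100.0),
        ("2002-01", 2, 300.0),
        ("2002-02", 1, 70.0),
        ("2002-02", 2, 330.0),
    ],
)
avg_by_period = conn.execute(
    "SELECT period, AVG(balance) FROM balance_snapshot "
    "GROUP BY period ORDER BY period"
).fetchall()

print(current_balance)
print(avg_by_period)
conn.close()
```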

Usage Curves

• Operational system is predictable

• Data warehouse:
  – Variable
  – Random

User Expectations

• Control expectations

• Set achievable targets for query response

• Set SLAs

• Educate

• Growth and use are exponential

Enterprisewide Warehouse

• Large scale implementation

• Covers the entire business

• Data from all subject areas

• Developed incrementally

• Single source of enterprisewide data

• Synchronized enterprisewide data

• Single distribution point to dependent data marts

Data Warehouses Versus Data Marts

Property              Data Warehouse     Data Mart

Scope                 Enterprise         Department

Subjects              Multiple           Single subject, line of business (LOB)

Data Source           Many               Few

Implementation Time   Months to years    Months

Dependent Data Mart

Figure: Operational systems (operations data, legacy data), flat files, and external data feed the data warehouse, which in turn feeds the data marts. Subject areas shown: Marketing, Sales, Finance, HR.

Independent Data Mart

Figure: Operational systems (operations data, legacy data), flat files, and external data feed a Sales or Marketing data mart directly, without a data warehouse in between.
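
The difference between the two kinds of data mart comes down to where the mart gets its data. The sketch below is illustrative only. The row layout, the sample values, and the function names `build_dependent_mart` and `build_independent_mart` are hypothetical.

```python
# Hypothetical warehouse rows that are already integrated and subject-oriented.
warehouse = [
    {"subject": "Sales", "region": "East", "amount": 100},
    {"subject": "Sales", "region": "West", "amount": 150},
    {"subject": "Finance", "region": "East", "amount": 40},
]

# Hypothetical raw rows as a source system might expose them.
sales_source = [("East", 100), ("West", 150)]


def build_dependent_mart(warehouse_rows, subject):
    """A dependent mart is populated from the data warehouse only."""
    return [row for row in warehouse_rows if row["subject"] == subject]


def build_independent_mart(source_rows, subject):
    """An independent mart reads sources directly and shapes the data itself."""
    return [
        {"subject": subject, "region": region, "amount": amount}
        for region, amount in source_rows
    ]


dependent_sales = build_dependent_mart(warehouse, "Sales")
independent_sales = build_independent_mart(sales_source, "Sales")
print(dependent_sales == independent_sales)
```

In this toy case both marts hold the same rows. In practice an independent mart carries its own extraction and transformation logic, so nothing guarantees that it agrees with other marts.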

Typical Data Warehouse Components

• Source Systems: Operational, External, Legacy

• Staging Area

• ODS

• Presentation Area: Data Warehouse, Data Marts

• Access Tools

• Metadata Repository

Warehouse Development Approaches

• “Big bang” approach

• Incremental approach:
  – Top-down incremental approach
  – Bottom-up incremental approach

“Big Bang” Approach

Analyze enterprise requirements

Build enterprise data warehouse

Report in subsets or store in data marts

Top-Down Approach

Analyze requirements at the enterprise level

Develop conceptual information model

Identify and prioritize subject areas

Complete a model of selected subject area

Map to available data

Perform a source system analysis

Implement base technical architecture

Establish metadata, extraction, and load processes for the initial subject area

Create and populate the initial subject area data mart within the overall warehouse framework

Bottom-Up Approach

Define the scope and coverage of the data warehouse and analyze the source systems within this scope

Define the initial increment based on political pressure, assumed business benefit, and data volume

Implement base technical architecture and establish metadata, extraction, and load processes as required by the increment

Create and populate the initial subject areas within the overall warehouse framework

Incremental Approach to Warehouse Development

• Multiple iterations

• Shorter implementations

• Validation of each phase

Increment 1 (iterative): Strategy → Definition → Analysis → Design → Build → Production

Data Warehousing Process Components

• Methodology

• Architecture

• Extraction, Transformation, and Load (ETL)

• Implementation

• Operation and Support

Methodology

• Ensures a successful data warehouse

• Encourages incremental development

• Provides a staged approach to an enterprisewide warehouse:
  – Safe
  – Manageable
  – Proven
  – Recommended

Architecture

• “Provides the planning, structure, and standardization needed to ensure integration of multiple components, projects, and processes across time.”

• “Establishes the framework, standards, and procedures for the data warehouse at an enterprise level.”

— The Data Warehousing Institute

Extraction, Transformation, and Load (ETL)

“Effective data extract, transform and load (ETL) processes represent the number one success factor for your data warehouse project and can absorb up to 70 percent of the time spent on a typical data warehousing project.”

— DM Review, March 2001

Source → Staging Area → Target
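
The flow from a source through a staging area to a target can be sketched as three small steps. This is a toy illustration under assumptions: the row layout, the cleaning rules, and the function names `extract`, `transform`, and `load` are hypothetical and are not taken from any particular ETL tool.

```python
def extract(source_rows):
    """Copy raw rows from the source into the staging area unchanged."""
    return list(source_rows)


def transform(staged_rows):
    """Clean staged rows: tidy names, convert amounts, skip unparseable rows."""
    cleaned = []
    for name, amount in staged_rows:
        try:
            cleaned.append((name.strip().title(), float(amount)))
        except ValueError:
            # Assumed rule: rows with a non-numeric amount are dropped.
            continue
    return cleaned


def load(target, rows):
    """Append transformed rows to the target and report how many were loaded."""
    target.extend(rows)
    return len(rows)


source = [(" alice ", "10.50"), ("BOB", "n/a"), ("carol", "3")]
target = []

staging = extract(source)
loaded = load(target, transform(staging))
print(loaded, target)
```

Keeping the raw copy in staging separate from the transformed rows means a failed transformation can be rerun without going back to the source system.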

Implementation

Data Warehouse Architecture → Implementation

Example, incremental implementation: Increment 1, Increment 2, ..., Increment n

Operation and Support

• Data access and reporting

• Refreshing warehouse data

• Monitoring

• Responding to change

Phases of the Incremental Approach

• Strategy

• Definition

• Analysis

• Design

• Build

• Production

Increment 1: Strategy → Definition → Analysis → Design → Build → Production

Strategy Phase Deliverables

• Business goals and objectives

• Data warehouse purpose, objectives, and scope

• Enterprise data warehouse logical model

• Incremental milestones

• Source systems data flows

• Subject area gap analysis

Strategy Phase Deliverables (continued)

• Data acquisition strategy

• Data quality strategy

• Metadata strategy

• Data access environment

• Training strategy

Summary

In this lesson, you should have learned how to:

• Identify a common, broadly accepted definition of a data warehouse

• Describe the differences between dependent and independent data marts

• Identify some of the main warehouse development approaches

• Recognize some of the operational properties and common terminology of a data warehouse

Practice 2-1 Overview

This practice covers the following topics:

• Answering questions regarding data warehousing concepts and terminology

• Discussing some data warehouse concepts and terminology