Building the Warehouse Chapter 10. Overview Defining DW Concepts & Terminology Planning For a...


Building the Warehouse

Chapter 10

Overview

* Defining DW Concepts & Terminology
* Planning for a Successful Warehouse
* Project Management (Methodology, Maintaining Metadata)
* Meeting a Business Need
* Choosing a Computing Architecture
* Modeling the Data Warehouse
* Analyzing User Query Needs
* Planning Warehouse Storage
* ETT (Building the Warehouse)
* Supporting End User Access
* Managing the Data Warehouse

Extraction/Transformation/Transportation Process (ETT)

* Extract source data
* Transform/clean data
* Index and summarize
* Load data into WH
* Detect change
* Refresh data

[Figure: ETT moves data from the operational systems to the warehouse, using programs, gateways, and tools.]
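The ETT steps listed on this slide can be sketched as a small pipeline. This is a minimal illustration, not a prescribed implementation: the field names (`id`, `name`, `amount`), the cleaning rules, and the use of a dictionary as the "warehouse" are all assumptions made for the example.

```python
def extract(source_rows):
    """Extract: keep only the fields the warehouse needs (field names are hypothetical)."""
    return [{"id": r["id"], "name": r["name"], "amount": r["amount"]} for r in source_rows]


def transform(rows):
    """Transform/clean: drop rows without a key, tidy names, convert amounts to numbers."""
    cleaned = []
    for r in rows:
        if r["id"] is None:
            continue  # a row without a key cannot be loaded
        cleaned.append({
            "id": r["id"],
            "name": r["name"].strip().title(),
            "amount": round(float(r["amount"]), 2),
        })
    return cleaned


def load(warehouse, rows):
    """Load: write rows into the warehouse (a dict keyed by id stands in for a table)."""
    for r in rows:
        warehouse[r["id"]] = r
    return warehouse


def summarize(warehouse):
    """Summarize: precompute a simple aggregate over the loaded rows."""
    total = round(sum(r["amount"] for r in warehouse.values()), 2)
    return {"row_count": len(warehouse), "total_amount": total}


# Hypothetical operational rows, including an unused field and a row with no key.
source = [
    {"id": 1, "name": " alice ", "amount": "10.50", "internal_flag": "x"},
    {"id": None, "name": "ghost", "amount": "1.00", "internal_flag": "y"},
    {"id": 2, "name": "BOB", "amount": "4.25", "internal_flag": "z"},
]
warehouse = load({}, transform(extract(source)))
summary = summarize(warehouse)
```

Keeping each step as a separate function makes it easier to test and monitor the steps individually, which matters because ETT takes up so much of the development effort.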

ETT Processes

Must result in data that is relevant, useful, high-quality, accurate, and accessible

Require a large proportion of warehouse development time and resources

[Figure: ETT cleans up, consolidates, and restructures data from the operational systems so that warehouse data is relevant, useful, of high quality, accurate, and accessible.]

Data Staging Area

* The construction site for the warehouse
* Required by most implementations
* Composed of ODS, flat files, or relational server tables
* Frequently configured as multitier staging

[Figure: data is extracted from the operational system into the data staging area, then transported (loaded) into the warehouse.]

Remote Staging Model

* Data staging area within the warehouse environment
* Data staging area in its own environment, avoiding negative impact on the warehouse environment

[Figure: two layouts of operational system, data staging area, and warehouse. In the first, the staging area sits inside the warehouse environment. In the second, the operational, staging, and warehouse environments are separate. Arrow labels: Extract, Transform, Transport; Transport (Local).]

Onsite Staging Model

* Data staging area within the operational environment, possibly affecting the operational system

[Figure: operational system and data staging area inside the operational environment, with the warehouse in the warehouse environment. Arrow labels: Extract, Transform.]

Extracting Data

* Routines developed to select fields from source
* Various data formats
* Rules, audit trails, error correction facilities

[Figure: data flows from the operational databases through the data staging area to the warehouse database. Labels: Transform, Data mapping.]
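As a rough illustration of extraction routines that select fields from sources in different formats and keep an audit trail, the sketch below reads a delimited source and a fixed-width source. The function names, the layout description, and the shape of the audit record are assumptions for this example, not part of the course material.

```python
import csv
import io


def extract_csv(text, fields):
    """Select the named fields from comma-delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [{f: row[f] for f in fields} for row in reader]


def extract_fixed_width(text, layout):
    """Select fields from fixed-width text; layout maps field -> (start, end) offsets."""
    return [
        {f: line[start:end].strip() for f, (start, end) in layout.items()}
        for line in text.splitlines()
        if line.strip()
    ]


def extract_with_audit(source_name, extractor, *args):
    """Run an extractor and record what happened, so failures can be traced and corrected."""
    audit = {"source": source_name, "rows_extracted": 0, "error": None}
    try:
        rows = extractor(*args)
        audit["rows_extracted"] = len(rows)
    except (KeyError, csv.Error) as exc:
        rows = []
        audit["error"] = repr(exc)
    return rows, audit


# Hypothetical sample sources.
csv_text = "id,name,internal\n1,Ann,x\n2,Bob,y\n"
fixed_text = "001Ann  \n002Bob  \n"

csv_rows, csv_audit = extract_with_audit("orders.csv", extract_csv, csv_text, ["id", "name"])
fw_rows, fw_audit = extract_with_audit(
    "legacy.dat", extract_fixed_width, fixed_text, {"id": (0, 3), "name": (3, 8)}
)
bad_rows, bad_audit = extract_with_audit("orders.csv", extract_csv, csv_text, ["dob"])
```

The third call asks for a field the source does not have. The routine does not stop the run; it returns no rows and records the error in the audit entry.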

Source Systems

* Production
* Archive
* Internal
* External

Production Data

* Operating system platforms
* Hardware platforms
* File systems
* Database systems and vertical applications
  - Examples: IMS, DB2, VSAM, NonStop SQL, Oracle, Sybase, Rdb
  - Vertical application examples: SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials

Archive Data

* Historical data
* Useful for analysis over long periods of time
* Useful for first-time load
* May require unique transformations

[Figure: operational database and warehouse database.]

Internal Data

* Planning, sales, and marketing organization data
* Maintained in:
  - Spreadsheets (structured)
  - Documents (unstructured)
* Treated like any other source data

[Figure: planning, marketing, and accounting data feeding the warehouse database.]

External Data

* Information from outside the organization
* Issues of frequency, format, and predictability
* Described and tracked using metadata

[Figure: external sources feeding the warehousing databases. Labels: purchased databases; A.C. Nielsen, IRI, IMS, Walsh America; Dun and Bradstreet; competitive information; economic forecasts; Wall Street Journal; Barron's.]

Mapping

* Defines which operational attributes to use
* Defines how to transform the attributes for the warehouse
* Defines where the attributes exist in the warehouse
* Mapping tools are available

Metadata (File A to Staging File One):

File A field   File A value   Staging File One field   Staging File One value
F1             123            Number                   USA123
F2             Bloggs         Name                     Mr. Bloggs
F3             10/12/56       DOB                      10-Dec-56
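The File A to Staging File One example can be written as a mapping table that drives the transformation. This is only a sketch. The transformation rules are inferred from the single sample record on the slide (prefix the number with "USA", prefix the name with "Mr.", and read the date as day/month/year); a real mapping would take these rules from the metadata repository, not from hard-coded constants.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def to_warehouse_date(value):
    """Convert DD/MM/YY to DD-Mon-YY (day-first order is assumed from the slide's example)."""
    day, month, year = value.split("/")
    return f"{int(day):02d}-{MONTHS[int(month) - 1]}-{year}"


# Source field -> (target field, transformation). The rules are assumptions for illustration.
MAPPING = {
    "F1": ("Number", lambda v: "USA" + v),
    "F2": ("Name", lambda v: "Mr. " + v),
    "F3": ("DOB", to_warehouse_date),
}


def apply_mapping(record, mapping):
    """Build the staging record by applying each field's rule to the source value."""
    return {target: rule(record[source]) for source, (target, rule) in mapping.items()}


file_a = {"F1": "123", "F2": "Bloggs", "F3": "10/12/56"}
staging_file_one = apply_mapping(file_a, MAPPING)
# staging_file_one == {"Number": "USA123", "Name": "Mr. Bloggs", "DOB": "10-Dec-56"}
```

Because the mapping is data, not code, the same `apply_mapping` function works for any source file once its rules have been recorded.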

Extraction Techniques

* Programs: C, COBOL, PL/SQL
* Gateways: transparent database access
* In-house development is popular
* Tools:
  - High initial cost
  - Ongoing automation
  - Data cleanup

Sources and Targets

* Data marts
* Data analysis
* Data mining
* OLAP

Designing Extraction Processes

* Analysis:
  - Source, technologies
  - Data types, quality, owners
* Design options:
  - Manual, custom, gateway, third-party
  - Replication, full, or delta refresh
* Design issues:
  - Batch window, volumes, data currency
  - Automation, skills needed, resources
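To make the difference between a full refresh and a delta refresh concrete, here is a minimal sketch that detects changes by comparing a source snapshot with the current warehouse contents. Dictionaries stand in for tables, and snapshot comparison is only one possible change-detection technique; the slides do not prescribe a method.

```python
def full_refresh(source):
    """Full refresh: discard the old contents and reload everything from the source."""
    return dict(source)


def delta_refresh(warehouse, source):
    """Delta refresh: detect and apply only the rows that were added, changed, or removed."""
    changes = {"inserted": [], "updated": [], "deleted": []}
    for key, row in source.items():
        if key not in warehouse:
            changes["inserted"].append(key)
        elif warehouse[key] != row:
            changes["updated"].append(key)
    for key in warehouse:
        if key not in source:
            changes["deleted"].append(key)

    # Apply the detected changes after the comparison is complete.
    for key in changes["inserted"] + changes["updated"]:
        warehouse[key] = source[key]
    for key in changes["deleted"]:
        del warehouse[key]
    return changes


# Hypothetical data: row 2 changed, row 3 disappeared, row 4 is new.
warehouse = {1: {"qty": 5}, 2: {"qty": 7}, 3: {"qty": 1}}
source = {1: {"qty": 5}, 2: {"qty": 9}, 4: {"qty": 2}}
changes = delta_refresh(warehouse, source)
```

A delta refresh touches three rows here, where a full refresh would rewrite all of them, so it fits more easily into a short batch window. This sketch removes deleted rows outright; a warehouse that keeps history would more likely mark them as inactive.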

Maintaining Extraction Metadata

* Source location, type, structure
* Access method
* Privilege information
* Temporary storage
* Failure procedures
* Validity checks
* Handlers for missing data
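One way to picture these metadata items is as a single record per source, which the extraction routine consults at run time. The class and field names below are hypothetical and follow the bullet list; they are not taken from any particular tool or repository.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionMetadata:
    """Hypothetical metadata record describing one extraction source."""
    source_location: str
    source_type: str
    structure: list                 # ordered field names of the source
    access_method: str
    privileges: str
    temporary_storage: str
    failure_procedure: str
    validity_checks: dict = field(default_factory=dict)        # field -> predicate
    missing_data_defaults: dict = field(default_factory=dict)  # field -> default value


def prepare_row(row, meta):
    """Fill in missing values, then run the validity checks; return (row, failed fields)."""
    filled = {}
    for name in meta.structure:
        value = row.get(name)
        if value is None or value == "":
            value = meta.missing_data_defaults.get(name)
        filled[name] = value
    errors = [name for name, check in meta.validity_checks.items()
              if not check(filled.get(name))]
    return filled, errors


# Example values are invented for illustration only.
meta = ExtractionMetadata(
    source_location="/data/exports/customers.csv",
    source_type="flat file",
    structure=["id", "region"],
    access_method="file transfer",
    privileges="read-only",
    temporary_storage="/staging/tmp",
    failure_procedure="retry once, then alert the operator",
    validity_checks={"id": lambda v: isinstance(v, int) and v > 0},
    missing_data_defaults={"region": "UNKNOWN"},
)

good_row, good_errors = prepare_row({"id": 7, "region": ""}, meta)
bad_row, bad_errors = prepare_row({"id": -1}, meta)
```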

Possible ETT Failure

* A missing source file
* A system failure
* Poor mapping information
* Inadequate storage planning
* A source structural change
* No contingency plan
* Inadequate data validation
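Several of these failures can be caught before a run starts. The sketch below checks for three of them: a missing source file, a source structural change, and inadequate storage. It assumes the source is a CSV file with a header row, and the function name and messages are made up for this example.

```python
import csv
import os
import shutil


def preflight(path, expected_columns, min_free_bytes):
    """Return a list of problems found before extraction; an empty list means go ahead."""
    problems = []

    # A missing source file: nothing else can be checked without it.
    if not os.path.isfile(path):
        problems.append("missing source file")
        return problems

    # A source structural change: the header no longer matches what the mapping expects.
    with open(path, newline="") as f:
        header = next(csv.reader(f), [])
    if header != expected_columns:
        problems.append("source structural change")

    # Inadequate storage: not enough free space on the volume holding the file.
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(path))).free
    if free < min_free_bytes:
        problems.append("inadequate storage")

    return problems
```

A scheduler could call `preflight` at the start of the batch window and follow the contingency plan if the list is not empty, so the problem is dealt with before the load begins.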

Maintaining ETT Quality

* ETT must be:
  - Tested
  - Documented
  - Monitored and reviewed
* Display metadata must be coordinated

Selection Criteria

* Base functionality
* Interface features
* Metadata repository
* Open API
* Metadata access
* Repository utilities
* Input and output processing
* Cleansing, reformatting, and auditing
* Reference
* Training requirements

WTI Partner ETT Tools

* Carleton
* Constellar
* Evolutionary Technologies
* Informatica
* Information Builders
* Oracle EDMS, Toolkits, OADW
* Prism Solutions
* Sagent
* Vality Technology

Summary

This lesson discussed the following topics:

* ETT processes are essential and consume a large proportion of warehouse resources and time
* The extraction process acquires source data
* You may encounter many data sources
* There are many data extraction issues
* ETT tools should be considered