Lecture 5


Page 1: Lecture 5

Lecture 5

Themes in this session

• Building and managing the data warehouse
• Data extraction and transformation
• Technical issues

Page 2: Lecture 5

Building the data warehouse

Page 3: Lecture 5

Basic guidelines for the creation of a DW

• Create corporate sponsors and plan thoroughly
• Determine a scalable architectural framework for the DW
• Identify and document all assumptions, conflicts and issues at the start of the project
• Choose methodology and tools which are compatible with the organisation
• Take the continuous nature of the DW life cycle into account
• Ensure a thorough data analysis and resolve all data conflicts
• Learn from your experience and, above all, learn from your mistakes

Page 4: Lecture 5

The data warehouse life cycle

Investigation

Analysis of current environment

Identify requirements

Identify architecture

Data warehouse design

Development

Implementation

On-going data administration

Page 5: Lecture 5

Creating a strategy for data warehousing

• Identify the current information strategy (usually an implicit strategy)
• Perform an internal appraisal of the strategy
• Perform an external appraisal of the strategy
• Perform a strategic gap analysis
• Enumerate strategic alternatives which can be achieved through the application of a DW
• Choose among the alternatives on the basis of the organisation’s mission, objectives and resources
• Formulate a strategic statement which specifies the organisation’s utilisation of its information resources and the role the DW is to play in this utilisation

Page 6: Lecture 5

Building a business case for a data warehouse

• Describe the AS-IS situation
• Identify business goals that the DW will help to fulfil
• Identify business problems that the DW can be used to solve
• Describe the TO-BE situation
• Explain how the data warehouse will be used to evolve the organisation from the AS-IS to the TO-BE situation
• Calculate the expected ROI for the data warehouse
  – Identify all costs associated with the data warehouse
  – Quantify all the benefits of moving from the AS-IS to the TO-BE situation
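The ROI step can be made concrete with a minimal Python sketch. It assumes the common simple definition ROI = (total benefits − total costs) / total costs over one evaluation period; the lecture does not prescribe a formula, and the class name, cost categories and figures below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessCase:
    """Hypothetical container for the quantified costs and benefits of a DW project."""
    costs: dict[str, float] = field(default_factory=dict)     # all costs of the DW
    benefits: dict[str, float] = field(default_factory=dict)  # AS-IS vs TO-BE benefits

    def total_costs(self) -> float:
        return sum(self.costs.values())

    def total_benefits(self) -> float:
        return sum(self.benefits.values())

    def roi(self) -> float:
        """Simple ROI as a fraction: (benefits - costs) / costs, so 0.5 means 50 %."""
        cost = self.total_costs()
        if cost <= 0:
            raise ValueError("total costs must be positive to compute ROI")
        return (self.total_benefits() - cost) / cost


# Usage with made-up figures for one evaluation period
case = BusinessCase(
    costs={"hardware": 200_000, "software": 150_000, "staff": 250_000},
    benefits={"reporting_effort_saved": 300_000, "better_campaign_targeting": 600_000},
)
print(f"Expected ROI: {case.roi():.0%}")  # prints "Expected ROI: 50%"
```

A real business case spanning several years would usually also discount future benefits (for example with net present value); this sketch deliberately ignores the time value of money.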

Page 7: Lecture 5

Creating a project plan

• What is the project scope?
  – Determine the scope of the data
  – Determine the role of technology
  – Determine any temporal considerations

• What is the business reason?
  – Identify the essential purpose of the project
  – Identify business drivers

• What are the critical success factors for the project?
  – Identify critical objectives
  – Identify critical tasks and activities

• What are the resource constraints on the project?
  – Determine the need for resources
  – Determine resource availability

Page 8: Lecture 5

Output from the planning phase

• A set of activities to be performed and a set of requirements on these activities (performance metrics)
• Documented business drivers
• Definition of the scope of data
• Defined temporal scope
• Business reasons
• The overall approach
• Participants and their roles
• Assumptions and constraints
• Project management strategy

Page 9: Lecture 5

Data warehouse development activities

Architecture definition

Data modelling

Planning and project initiation

Decision maker needs

Subject area analysis

Source system analysis

Transform design

Physical database design

Warehouse development

End-user access design

End-user access definition

End-user access development

Warehouse populate and implement

Page 10: Lecture 5

Project delivery tactics

• First things first
• Market the project
• Adopt a customer-focused orientation
• Deliver everything well

Note: the project must always be able to show progress and have the ability to deliver business value

Page 11: Lecture 5

Focus areas for the management of a data warehouse

• Monitor and manage data warehouse activity
• Monitor and manage data warehouse data
• Monitor and manage security in the data warehouse
• Monitor and manage the data warehouse data model
• Monitor and manage data warehouse metadata
• Monitor and manage the integration and transformation interface
• Monitor and manage the demands of the data warehouse’s business environment

Page 12: Lecture 5

Staffing requirements for initial data warehouse development and subsequent DW management

• Data warehouse management and maintenance
  – Data warehouse administrator
  – Data warehouse organisational change manager
  – Database administrator
  – Data warehouse maintenance developers
  – Metadata manager

• Analysis and design
  – Business requirements analysts
  – User groups
  – Data warehouse architect

• Data procurement
  – Data quality analyst
  – Data acquisition developer
  – Data access developer

• IS executive sponsor

Page 13: Lecture 5

Data warehouse end-user roles and responsibilities

• Support roles
  – Iteration sponsors
  – Subject matter experts
  – User support technician

• User types
  – Unlimited ad hoc access users
  – Limited ad hoc access users
  – Predefined application users

• Data warehouse initial and ongoing end-user staffing
  – Subject matter experts
  – User support technician

Page 14: Lecture 5

Data extraction and transformation

Page 15: Lecture 5

Data quality (1)

• Data should be accurate
• Data should be stored according to data type
• Data should have high integrity
• Data should be consistent
• Databases should be well designed
• Data should not be redundant
• Data should follow business rules
• Data should correspond to established domains
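Several of these rules (data type, established domains, business rules) can be checked mechanically during loading. A minimal Python sketch, assuming a hypothetical customer record; the field names, the country domain and the non-negative credit-limit rule are invented for illustration:

```python
from datetime import date

# Hypothetical rule set for a customer record (illustrative assumptions only)
EXPECTED_TYPES = {"customer_id": int, "country": str, "credit_limit": float, "opened": date}
COUNTRY_DOMAIN = {"GB", "IE", "NO", "SE"}


def quality_issues(record: dict) -> list[str]:
    """Return the rule violations for one record; an empty list means it passed."""
    issues = []
    # Data should be stored according to data type
    for name, expected in EXPECTED_TYPES.items():
        value = record.get(name)
        if not isinstance(value, expected):
            issues.append(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
    # Data should correspond to established domains
    if record.get("country") not in COUNTRY_DOMAIN:
        issues.append(f"country: {record.get('country')!r} not in domain")
    # Data should follow business rules (example rule: credit limit is non-negative)
    limit = record.get("credit_limit")
    if isinstance(limit, float) and limit < 0:
        issues.append("credit_limit: must be non-negative")
    return issues


good = {"customer_id": 1, "country": "NO", "credit_limit": 5000.0, "opened": date(2020, 1, 15)}
bad = {"customer_id": "1", "country": "XX", "credit_limit": -10.0, "opened": date(2020, 1, 15)}
for rec in (good, bad):
    print(quality_issues(rec))
```

Accuracy and integrity cannot be fully verified this way, since a well-typed, in-domain value can still be wrong; rule checks only catch the violations they encode.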

Page 16: Lecture 5

Data quality (2)

• Data should be timely
• Data should be well understood
• Data should be integrated
• Data should satisfy the needs of the business
• Users should be satisfied with the data and the information derived from the data
• Data should be complete
• There should be no duplicate records
• There should be no data anomalies
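Completeness and the absence of duplicate records are two of the easier properties to measure. A minimal Python sketch, assuming records arrive as dictionaries and that the required fields and the key field (`customer_id`) are hypothetical choices for illustration:

```python
from collections import Counter

REQUIRED_FIELDS = ("customer_id", "name", "postcode")  # hypothetical required fields


def incomplete_records(records: list[dict]) -> list[dict]:
    """Data should be complete: records with a missing or blank required field."""
    return [r for r in records if any(r.get(f) in (None, "") for f in REQUIRED_FIELDS)]


def duplicate_keys(records: list[dict], key: str = "customer_id") -> set:
    """There should be no duplicate records: key values that occur more than once."""
    counts = Counter(r.get(key) for r in records)
    return {k for k, n in counts.items() if n > 1}


rows = [
    {"customer_id": 1, "name": "Ann", "postcode": "0150"},
    {"customer_id": 2, "name": "", "postcode": "5003"},
    {"customer_id": 1, "name": "Ann", "postcode": "0150"},
]
print(incomplete_records(rows))  # the record for customer 2 (blank name)
print(duplicate_keys(rows))      # {1}
```

The key-based check only finds exact repeats of the key; the same real-world customer stored under two different keys needs fuzzy matching, which this sketch does not attempt.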

Page 17: Lecture 5

A four-phase process to achieve high data quality

• Data investigation
  – Parsing
  – Lexical analysis
  – Pattern investigation
  – Data typing

• Data conditioning and standardisation
• Data integration
• Data survivorship and formatting
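Three of the four phases can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions: pattern investigation maps letters to `A` and digits to `9` (a common profiling convention, not one the lecture specifies), standardisation is shown for a hypothetical postcode field, and survivorship uses an assumed "newest non-empty value wins" rule. The integration phase (matching records across sources) is not shown.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Phase 1, pattern investigation: letters become 'A', digits become '9'."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def standardise_postcode(value: str) -> str:
    """Phase 2, standardisation: trim, collapse inner whitespace, upper-case."""
    return " ".join(value.split()).upper()


def survive(candidates: list[dict], ts_field: str = "updated") -> dict:
    """Phase 4, survivorship: per field, keep the newest non-empty value."""
    result: dict = {}
    for rec in sorted(candidates, key=lambda r: r[ts_field]):  # oldest first
        for name, value in rec.items():
            if value not in (None, ""):
                result[name] = value  # a newer non-empty value overwrites an older one
    return result


# Usage: profile a postcode column, then merge two source records for one customer
postcodes = [" ab12  3cd", "AB1 2CD", "12345"]
profile = Counter(value_pattern(standardise_postcode(p)) for p in postcodes)
merged = survive([
    {"id": 7, "email": "ann@example.com", "phone": "", "updated": "2024-01-05"},
    {"id": 7, "email": "", "phone": "555 0100", "updated": "2024-03-01"},
])
print(profile)
print(merged)
```

The pattern profile quickly exposes values such as `99999` that do not fit the expected postcode shapes, which is the kind of anomaly data investigation is meant to surface before conditioning begins.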

Page 18: Lecture 5

Fundamental types of data transformation

• Simple transformation
  – Data type conversion
  – Date/time format conversions
  – Field decoding

• Cleansing and scrubbing
  – Valid values
  – Complex reformatting

• Integration
  – Simple field-level mappings
  – Complex integration

• Aggregation and summarisation
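The transformation types above often meet in a single mapping function. A minimal Python sketch, assuming a hypothetical source layout (`cust_no`, `ord_dt` in DD/MM/YYYY, a one-letter `sex` code, a free-text `reg` region, `amt` as text) and hypothetical lookup tables:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical lookup tables; real ones would come from the warehouse's metadata
GENDER_CODES = {"M": "Male", "F": "Female"}
VALID_REGIONS = {"NORTH", "SOUTH", "EAST", "WEST"}


def transform(src: dict) -> dict:
    """Map one hypothetical source row onto a warehouse row."""
    region = src["reg"].strip().upper()
    order_date = datetime.strptime(src["ord_dt"], "%d/%m/%Y").date()
    return {
        "customer_key": int(src["cust_no"]),                # data type conversion
        "order_date": order_date.isoformat(),               # date/time format conversion
        "gender": GENDER_CODES.get(src["sex"], "Unknown"),  # field decoding
        "region": region if region in VALID_REGIONS else "UNKNOWN",  # valid values
        "amount": float(src["amt"]),                        # field-level mapping + type conversion
    }


def summarise(rows: list[dict]) -> dict:
    """Aggregation and summarisation: total amount per region."""
    totals: dict = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


raw = [
    {"cust_no": "0042", "ord_dt": "31/01/2024", "sex": "F", "reg": " north", "amt": "120.50"},
    {"cust_no": "0043", "ord_dt": "01/02/2024", "sex": "X", "reg": "Mars", "amt": "80"},
    {"cust_no": "0044", "ord_dt": "02/02/2024", "sex": "M", "reg": "NORTH", "amt": "29.50"},
]
rows = [transform(r) for r in raw]
totals = summarise(rows)
print(totals)
```

Invalid codes are mapped to explicit `Unknown`/`UNKNOWN` members here rather than rejected; whether to reject, default or route such rows to an error table is a design decision for each warehouse.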

Page 19: Lecture 5

Other transformations

• Operating system conversions
• Hardware architecture conversions
  – affect the structure of data
  – affect the structure of programs running against the data
  – affect the computer operations needed for each environment
  – affect the software available to run each environment

• Application conversions
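A hardware architecture conversion typically changes how the bytes of a record must be read. A minimal Python sketch, assuming a hypothetical mainframe record of 10 bytes of EBCDIC text (code page 037, Python codec `cp037`) followed by a 4-byte big-endian signed integer holding a balance in cents:

```python
import struct
from decimal import Decimal

# Hypothetical fixed-width record layout (an assumption for illustration):
#   bytes 0-9  : customer name, EBCDIC code page 037, space-padded
#   bytes 10-13: balance in cents, big-endian signed 32-bit integer
RECORD_LAYOUT = struct.Struct(">10si")


def convert_record(raw: bytes) -> dict:
    """Convert one source-platform record into platform-neutral Python values."""
    name_bytes, balance_cents = RECORD_LAYOUT.unpack(raw)
    return {
        "name": name_bytes.decode("cp037").rstrip(),  # EBCDIC bytes -> Unicode str
        "balance": Decimal(balance_cents) / 100,      # exact decimal, no float rounding
    }


# Build a sample record as the source platform would have written it
raw = "SMITH".ljust(10).encode("cp037") + struct.pack(">i", 123456)
print(convert_record(raw))
```

Reading the same four bytes with a little-endian format (`<i`) would silently produce a different number, which is why byte order and character set belong in the conversion specification rather than being left to defaults.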

Page 20: Lecture 5

Types of source system extracts

• Point-in-time snapshots
  – Scheduled at specific points in time
  – An efficient way for users to pinpoint specific points in time or ranges of time
  – Unfortunately, requires a nearly complete read of all operational data sources

• Significant business events
  – Data capture is driven by events that are not predetermined
  – Captured as a snapshot of the relevant data entities
  – Triggered when a completion event is performed

• Delta data
  – See next slide...
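The difference between the first two extract types can be sketched in Python. This is a minimal illustration assuming hypothetical order records and an assumed completion status of `COMPLETED`: a scheduled snapshot reads every row, while event-driven capture records one entity only when its completion event occurs.

```python
import copy
from datetime import datetime, timezone


def point_in_time_snapshot(source_rows: list[dict], taken_at: datetime) -> list[dict]:
    """Scheduled snapshot: read every source row and stamp it with the snapshot time."""
    return [{**row, "snapshot_at": taken_at} for row in source_rows]  # shallow copies


class EventCapture:
    """Hypothetical event-driven capture: snapshot an entity when it completes."""

    def __init__(self, completion_status: str = "COMPLETED"):
        self.completion_status = completion_status
        self.captured: list[dict] = []

    def on_status_change(self, order: dict, new_status: str, when: datetime) -> None:
        order["status"] = new_status
        if new_status == self.completion_status:
            # Capture the relevant entity as it stands when the business event completes
            self.captured.append({**copy.deepcopy(order), "event_at": when})


orders = [{"order_id": 1, "status": "OPEN"}, {"order_id": 2, "status": "OPEN"}]
t0 = datetime(2024, 3, 31, 23, 59, tzinfo=timezone.utc)
snap = point_in_time_snapshot(orders, t0)  # reads every row, changed or not

cap = EventCapture()
cap.on_status_change(orders[0], "SHIPPED", datetime(2024, 4, 1, 9, 0, tzinfo=timezone.utc))
cap.on_status_change(orders[0], "COMPLETED", datetime(2024, 4, 3, 14, 0, tzinfo=timezone.utc))
print(len(snap), len(cap.captured))  # 2 1
```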

Page 21: Lecture 5

Types of source system extracts - Delta data

Delta data is both new and changed data; it represents the changes from one point in time to the next.

• Delta data can be captured in a number of ways:
  – Operational events
  – Changed data capture
  – Date last modified
  – Point-in-time comparisons
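Two of these capture methods are easy to sketch in Python. The sketch assumes each source row carries a hypothetical `last_modified` timestamp, and that snapshots are dictionaries keyed by primary key:

```python
from datetime import datetime


def delta_by_last_modified(rows: list[dict], since: datetime) -> list[dict]:
    """'Date last modified': keep rows touched after the previous extract ran."""
    return [r for r in rows if r["last_modified"] > since]


def delta_by_comparison(previous: dict, current: dict) -> dict:
    """Point-in-time comparison of two snapshots, each keyed by primary key."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    changed = {k: current[k] for k in current.keys() & previous.keys() if current[k] != previous[k]}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    return {"inserted": inserted, "changed": changed, "deleted": deleted}


last_run = datetime(2024, 5, 1)
source = [
    {"id": 1, "last_modified": datetime(2024, 4, 28)},
    {"id": 2, "last_modified": datetime(2024, 5, 2)},
]
recent = delta_by_last_modified(source, last_run)

yesterday = {1: ("Ann", "Oslo"), 2: ("Bob", "Bergen"), 3: ("Cy", "Molde")}
today = {1: ("Ann", "Oslo"), 2: ("Bob", "Trondheim"), 4: ("Dee", "Hamar")}
delta = delta_by_comparison(yesterday, today)
print(recent, delta, sep="\n")
```

The comparison approach costs two full reads, but it also reveals deleted rows, which a last-modified timestamp cannot show once the row has been physically removed from the source.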

Page 22: Lecture 5

Types of data warehouse updates

• Insert
• Full replace
• Partial replace
• Update
• Update plus insert
• Insert with update
• Replace and insert
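The slide names the update types without defining them, so the following Python sketch encodes one plausible reading of five of them against a hypothetical in-memory table (a dictionary from primary key to row). "Insert with update" and "Replace and insert" are left out because their exact semantics are not given here.

```python
from collections.abc import Callable

Table = dict  # hypothetical warehouse table: primary key -> row dict


def insert(target: Table, batch: Table) -> None:
    """Insert (assumed): add new rows only; an existing key is an error."""
    clash = target.keys() & batch.keys()
    if clash:
        raise KeyError(f"keys already present: {sorted(clash)}")
    target.update(batch)


def full_replace(target: Table, batch: Table) -> None:
    """Full replace (assumed): discard the current contents and load the batch."""
    target.clear()
    target.update(batch)


def partial_replace(target: Table, batch: Table, in_scope: Callable[[dict], bool]) -> None:
    """Partial replace (assumed): drop only rows in scope, e.g. one month, then load."""
    for key in [k for k, row in target.items() if in_scope(row)]:
        del target[key]
    target.update(batch)


def update(target: Table, batch: Table) -> None:
    """Update (assumed): change rows that already exist; ignore unknown keys."""
    for key, row in batch.items():
        if key in target:
            target[key] = row


def update_plus_insert(target: Table, batch: Table) -> None:
    """Update plus insert (assumed "upsert"): update existing rows, add new ones."""
    target.update(batch)


dw = {1: {"month": "2024-01", "sales": 10}, 2: {"month": "2024-02", "sales": 20}}
partial_replace(dw, {3: {"month": "2024-02", "sales": 25}}, lambda r: r["month"] == "2024-02")
update(dw, {1: {"month": "2024-01", "sales": 11}, 9: {"month": "2024-09", "sales": 1}})
update_plus_insert(dw, {4: {"month": "2024-03", "sales": 5}})
print(dw)
```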

Page 23: Lecture 5

Copy management

• What to extract?
• When to extract?
• How to extract?
• Transformation requirements?
• What to transform?
• How to transform?
• What to update?
• When to update?
• How to update?
• How to generate the necessary metadata?
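One way to make these questions explicit is to record the answers in a job specification. A minimal Python sketch, assuming a hypothetical `CopyJob` whose fields map onto the questions above; the source and target names and the cron-style schedule string are made up, and scheduling itself is not implemented:

```python
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CopyJob:
    """Hypothetical copy-management job spec mirroring the questions above."""
    source: str                          # what to extract
    schedule: str                        # when to extract/update (descriptive only here)
    extract: Callable[[], list[dict]]    # how to extract
    transform: Callable[[dict], dict]    # what and how to transform
    target: str                          # what to update
    load: Callable[[list[dict]], None]   # how to update

    def run(self) -> dict:
        started = datetime.now(timezone.utc)
        rows = self.extract()
        out = [self.transform(r) for r in rows]
        self.load(out)
        # How to generate the necessary metadata: record lineage and volumes per run
        return {
            "source": self.source,
            "target": self.target,
            "rows_extracted": len(rows),
            "rows_loaded": len(out),
            "started_at": started.isoformat(),
            "finished_at": datetime.now(timezone.utc).isoformat(),
        }


warehouse: list[dict] = []
job = CopyJob(
    source="orders_db.orders",
    schedule="0 2 * * *",
    extract=lambda: [{"id": 1, "amt": "9.99"}],
    transform=lambda r: {**r, "amt": float(r["amt"])},
    target="dw.fact_orders",
    load=warehouse.extend,
)
meta = job.run()
print(meta)
```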

Page 24: Lecture 5

Technical issues

Page 25: Lecture 5

Major questions affecting the choice of the technical solution

• Connectivity and interoperability?
• Need for parallel processing ability?
• Scalability?
• Standards?
• Single vendor or multi-vendor?
• Vendor stability and service?
• In-house competency?
• Compatibility with existing systems architecture?
• Compatibility with IT strategy?
• Functionality vs. cost?