Lecture 5
description
Transcript of Lecture 5
Lecture 5
Themes in this session
• Building and managing the data warehouse• Data extraction and transformation• Technical issues
Building the data warehouse
Basic guidelines for the creation of a DW
• Create corporate sponsors and plan thoroughly• Determine a scalable architectural framework for the DW• Identify and document all assumptions, conflicts and
issues at the start of the project• Choose methodology and tools which are compatible with
the organisation• Take the continuous nature of the DW life cycle into
account• Ensure a thorough data analysis and resolve all data
conflicts• Learn from your experience and above all, learn from
your mistakes
The data warehouse life cycle
Investigation
Analysis of currentenvironment
Identify requirements
Identify architecture
Data warehouse design
Development
Implementation
On-going dataadministration
Creating a strategy for a data warehousing
• Identification of current information strategy (usually an implicit strategy)
• Internal appraisal of the strategy• External appraisal of the strategy• Perform a strategic gap analysis• Enumerate strategic alternatives which can be achieved
through the application of a DW• Choose among alternatives on the basis of the
organisations mission, objectives and resources• Formulate a strategic statement which specifies the
organisation’s utilisation of its information resources and the role the DW is to play in this utilisation
Building a business case for a data warehouse
• Describe the AS-IS situation• Identify business goals that the DW will help to fulfil• Identify business problems that the DW can be used to
solve
• Describe the TO-BE situation• Explain how the data warehouse will be used to
evolve the organisation from the AS-IS to the TO-BE situation
• Calculate the expected ROI for the data warehouse• identify all costs associated with the data warehouse• quantify all the benefits between the AS-IS and TO-BE
situation
Creating a project plan
• What is the project scope?• determine the scope of the data• determine the role of technology• determine any temporal considerations
• What is the business reason?• identify the essential purpose of the project• identify business drivers
• What are the critical success factors for the project?• identify critical objectives• identify critical tasks and activities
• What are the resource constraints on the project?• Determine the need for resources• determine the resource availability
Output from the planning phase
• A set of activities to be performed and a set of requirements on these activities (performance metrics)
• Documented business drivers• Definition of scope of data• Defined temporal scope• Business reasons• The overall approach• Participants and their roles• Assumptions and constraints• Project management strategy
Data warehouse development activities
Architecture definition
Datamodelling
Planning and project initiation
Decision makerneeds
Subject areaanalysis
Source system analysis
Transformdesign
Physicaldatabasedesign
Warehousedevelopment
End-useraccessdesign
End-useraccessdefinition
End-useraccessdevelopment
Warehousepopulate andimplement
Project delivery tactics
• First things first• Market the project• Adopt a customer-focused orientation• Deliver everything well
Note: the project must always be able to show progress and have the ability to deliver business value
Focus areas for the management of a data warehouse
• Monitor and manage data warehouse activity• Monitor and manage data warehouse data• Monitor and manage security in the data warehouse• Monitor and manage the data warehouse data model• Monitoring and managing data warehouse metadata• Monitoring and managing the integration and
transformation interface• Monitoring and managing the demands of the data
warehouse’s business environment
Staffing requirements for initial data warehouse development and subsequent DW management
• Data warehouse management and maintenance– Data warehouse administrator– Data warehouse organisational change manager– Database administrator– Data warehouse maintenance developers– Metadata Manager
• Analysis and design– Business requirements analysts– User groups– Data warehouse architect
• Data procurement– Data quality analyst – Data acquisition developer– Data access developer
• IS executive sponsor
Data warehouse end-user roles and responsibilities
• Support roles• Iteration sponsors• Subject matter experts• User support technician
• User types• Unlimited ad hoc user access users• Limited ad hoc access users• Predefined application users
• Data warehouse initial and ongoing end-user staffing• Subject matter experts• User support technician
Data extraction and transformation
Data Quality (1)
• Data should be accurate• Data should be stored according to data type• Data should have high integrity• Data should be consistent• Databases should be well designed• Data should not be redundant• Data should follow business rules• Data should correspond to established domains
Data quality (2)
• Data should be timely• Data should be well understood• Data should be integrated• Data should satisfy the needs of the business• Users should be satisfied with the data and the
information derived from the data• Data should be complete• There should be no duplicate records• There should be no data anomalies
A four-phase process to achieve high data quality
• Data investigation– parsing– lexical analysis– pattern investigation– data typing
• Data conditioning and standardisation• Data integration• Data Survivorship and formatting
Fundamental types of data transformation
• Simple transformation• data type conversion• date/time format conversions• field decoding
• Cleansing and scrubbing• valid values• complex reformatting
• Integration• simple field level mappings• complex integration
• Aggregation and summarisation
Other transformations
• Operating system conversions• Hardware architecture conversions
• affects the structure of data• affects the structure of programs running against the
data• affects the computer operations needed for each
environment• the available software which makes the different
environments run
• Application conversions
Types of source system extracts
• Point-in-time snapshots• scheduled at specific points in time• efficient method for users to pinpoint specific points in
time or ranges of time• unfortunately requires nearly a complete read of all
operational sources of data
• Significant business events• non-predetermined events drive the capture of data• captured as a snapshot of relevant data entities• triggered when a completion event is performed
• Delta data• see next slide...
Types of source system extracts - Delta data
Delta data is both new and changed data, it represents changes from one point in time to the next
• Delta data can be captured in a number of ways:– Operational events– Changed data capture– date last modified– Point-in-time comparisons
Types of data warehouse updates
• Insert• Full replace• Partial replace• Update• Update plus insert• Insert with update• Replace and insert
Copy management
• What to extract?• When to extract?• How to extract?• Transformation requirements?• What to transform?• How to transform?• What to update?• When to update?• How to update?• How to generate the necessary metadata?
Technical issues
Major questions affecting the choice of the technical solution
• Connectivity and interoperability?• Need for parallel processing ability?• Scalability?• Standards?• Single vendor/multi-vendor• Vendor stability and service?• In house competency?• Compatibility with existing systems architecture?• Compatability with IT strategy?• Functionality vs Cost?