Adastra Framework for Managing Information Quality Bratislava Oct 21 2008


Transcript of Adastra Framework for Managing Information Quality Bratislava Oct 21 2008

Page 1: Adastra Framework for Managing Information Quality Bratislava Oct 21 2008

How to Improve IQ in the Boardroom: Framework for Managing Information Quality

October 21, 2008

Bratislava

Dr. Jan Mrazek, President
Adastra Group
[email protected]

Page 2:

Lack of Data Quality: Effects on Credibility

• Wrong and misguided business decisions
• Failure to meet regulatory requirements
• Negative impact on customer relationships
• High costs of rework and fixes
• All in all: negative impact on the bottom line

Page 3:

Additional Data Quality Impacts: Business / Technical / Data Governance

Business:
• Lack of confidence in the data and inability to act on it
• Much effort and resources required to reconcile numbers
• Low adoption by users
• Inability to fully leverage previous investments
• Inability to compete in a timely fashion
• Difficulty justifying additional investments

Technical:
• Difficulty in reconciling results (to source, to GL, etc.)
• Instability of processes, which translates into:
  • More failures
  • Higher costs
  • Lower availability of information to the users
  • Longer time to market to deliver solutions

Data Governance:
• High demands on the Data Governance structure and Data Stewards
• Potential for them to become a bottleneck or simply be ineffective

Page 4:

Some real-life examples

• Sub-prime mortgages
• ABCPs
• CDOs

Page 5:

Case study: IQ Audit of an EE bank acquired by a large multinational bank

Business impact modeling on retail segment portfolio only

Biz area | Issue area | Per item cost | 5-year cost (excl. NPV calculation)
Marketing/Sales | Missing or invalid customer contact information (address, postal code, telephone) | EUR 960,000 per campaign | EUR 19.2 mil
Marketing/Sales | Missing single customer view, historical data, errors in target variables and predictors for data mining (cross-sell, up-sell, attrition management) | EUR 0.63 – 1.7 mil per campaign | EUR 23.3 mil
Collections | Missing or invalid customer contact information (address, postal code, telephone) | n/a | EUR 3.4 mil for PI loans only
Collections | Collateral data not linked to contract, not reevaluated | Inability to stress test | Est. several million EUR
Operational efficiency | Missing data governance, business involvement / definitions / metadata | n/a | Est. 40% of IT budget
Underwriting/Provisioning | Missing single customer view, missing historical data, repayment schedules, contact data, tax code validity, collateral data | n/a | EUR 1.65 mil for PI loans only

Page 6:

Practical steps: where to start and what to do

• Determine where you are – IQ Assessment
• Get used to Data Profiling
• Revisit your downstream ETL processes
• Automate Data and Business Logic Profiling
• Implement an MDM Architecture in gradual stages
• Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Page 7:

Adastra IQ Assessment

• Thoroughly investigates current data processes
• Delivers an objective Information Quality scorecard based on a quantitative assessment of the organization's data and information environment
• Allows organizations to compare industry best practices against their company's IQ standards and implementations
• Allows organizations to assess improvements in quality over time
• Is the first step towards ongoing IQ improvement and the establishment of an IQ-conscious culture
• Optional Business Impact Analyses

Page 8:

Adastra IQ Assessment - Milestones

Data Capture
• Data Entry Standards
• On-Line Entry Edits
• On-Line Validation (Timeliness)
• On-Line Validation (Process)
• Data History Availability
• External Data Sources
• Change Management Processes
• Policy and Procedures

Data Profiling and Modeling
• Domain Analysis
• Business Rules Analysis
• Nulls / Blanks Analysis
• Uniqueness Analysis
• Relationships Analysis
• Pattern / Mask Analysis
• Non-Standard Data Analysis
• Change Management Processes
• Policy and Procedures

Source Data
• Domain Analysis
• Object Definition Analysis
• Verification of Data Definitions
• Event Trigger Analysis
• Entity Integrity and Referential Integrity Analysis
• Change Management Processes
• Policy and Procedures

Data Extraction, Transformation & Loading
• Application of Business Rules
• Table Extract / Views Analysis
• Data Quality Validation of ETL
• ETL Tool Usage
• Source Data Management
• Data Auditing and Issue Resolution
• Data Integration
• Change Management Processes
• Policy & Procedures

IQ Processes
• Data Stewardship
• On-Going Data Validation Analysis
• Measurement of Data Quality Improvement
• Error Discovery Analysis
• Data Reconciliation Analysis
• Linkage of Quality and Reward
• Project Management Methodologies
• Change Management Processes
• Policy & Procedures

Page 9:

Assessment Scorecard Measures

Score | Tier | Description
1 – 20 | Significant Improvement Needed | Few processes and procedures are in place to improve and maintain information quality. Methodologies to streamline these processes are undeveloped. Accountability for information quality is unassigned.
21 – 40 | Some Improvement Warranted | Processes and procedures are in place to improve and maintain information quality. Methodologies are sparsely documented and enforced through respective owners. Accountability for information quality is unassigned.
41 – 60 | Acceptable | Most processes and procedures are aligned with industry best practices to improve and maintain information quality. All methodologies are clearly documented and enforced through respective owners. Accountability and responsibility for information quality is not firmly assigned.
61 – 80 | Industry Leading | Most processes and procedures are aligned with industry best practices to improve and maintain information quality. All methodologies are clearly documented and enforced through data stewards, who are held accountable for information quality.
81 – 100 | Top Tier | Processes and procedures are aligned with industry best practices to improve and maintain information quality. All methodologies are clearly documented and enforced through data stewards, who are held accountable for information quality.
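The five scorecard bands map mechanically from score to tier; a minimal sketch (the function name is ours, not part of the assessment methodology):

```python
def iq_tier(score: int) -> str:
    """Map an IQ Assessment score (1-100) to its scorecard tier."""
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    if score <= 20:
        return "Significant Improvement Needed"
    if score <= 40:
        return "Some Improvement Warranted"
    if score <= 60:
        return "Acceptable"
    if score <= 80:
        return "Industry Leading"
    return "Top Tier"
```

Tracking this tier over repeated assessments is what enables the "assess improvements in quality over time" point on the previous slide.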

Page 10:

Practical steps: where to start and what to do

• Determine where you are – IQ Assessment
• Get used to Data Profiling
• Revisit your downstream ETL processes
• Automate Data and Business Logic Profiling
• Implement an MDM Architecture in gradual stages
• Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Page 11:

DQ Tools

• Many tools on the market
• It makes good sense to pay attention to the following aspects:
  • Data Connectivity
  • Data Profiling Features
  • Validation and Measurement
  • Reporting Features
  • Metadata-Specific Capabilities
  • Performance
  • Security
  • Infrastructure Compatibility
  • Usability
  • Completeness of Vision
  • Total Cost of Ownership

Page 12:

Example of profiling output - Ataccama

Extreme Values Frequency Analysis

Profiling output – attribute TAXCODE

scope | total inspected | percentage of valid | count of invalid
all records | 4 163 217 | 92.49% | 312 781
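A profiling summary like the TAXCODE table above boils down to counting values that satisfy a validity rule. A minimal sketch, assuming a hypothetical numeric tax-code pattern (the real rule would come from the profiling tool's domain and pattern analysis):

```python
import re

# Hypothetical validity rule for TAXCODE; illustrative only.
TAXCODE_PATTERN = re.compile(r"^\d{8,10}$")

def profile_attribute(values, pattern=TAXCODE_PATTERN):
    """Return the kind of summary shown above: total inspected,
    percentage of valid records, and count of invalid records."""
    total = len(values)
    invalid = sum(1 for v in values
                  if v is None or not pattern.match(str(v)))
    valid_pct = round(100.0 * (total - invalid) / total, 2) if total else 0.0
    return {"total_inspected": total,
            "valid_pct": valid_pct,
            "invalid_count": invalid}
```

For example, `profile_attribute(["12345678", "ABC", None])` reports 3 inspected, 33.33% valid, 2 invalid.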

Page 13:

Data Quality – Record Grading

Page 14:

Practical steps: where to start and what to do

• Determine where you are – IQ Assessment
• Get used to data profiling
• Revisit your downstream ETL processes
• Automate Data and Business Logic Profiling
• Implement an MDM Architecture in gradual stages
• Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Page 15:

DQ ETL Components

• Adastra has developed a comprehensive set of ETL components designed to address DQ-related data processing within ETL workflows
• Our ETL component suite integrates with three major ETL vendors (Informatica, IBM, Ab Initio)
• The list of components includes:
  • Operational Reconciliation
  • Data Validation
  • Error Logging

Page 16:

Operational Reconciliation

� As a data extract is received, it is necessary to verify that such data is indeed valid for processing

• Our generic components perform the following verifications:
  • The extract follows the order of execution
  • The extract is not a repeat of a previous one
  • The extract is complete
  • The extract belongs to the expected period
  • The extract truly contains the number of records produced by the source system
  • There were no transmission errors or inappropriate data manipulations performed on the extract
• These checks run immediately upon landing of the extract, so that any serious data issues can be addressed in a timely manner and outside of the critical window
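The landing-time checks above can be sketched as follows. This is a minimal illustration, assuming the source system ships a control manifest alongside each extract; the field names are ours, not Adastra's component API:

```python
from dataclasses import dataclass

@dataclass
class ExtractManifest:
    # Control metadata that would accompany the extract; names are illustrative.
    source: str
    sequence: int        # order of execution
    period: str          # e.g. "2008-10"
    record_count: int    # count reported by the source system

def reconcile(manifest, records, last_sequence, expected_period):
    """Run the landing-time checks described above; return a list of failures."""
    failures = []
    if manifest.sequence != last_sequence + 1:
        failures.append("extract out of order or a repeat of a previous run")
    if manifest.period != expected_period:
        failures.append("extract does not belong to the expected period")
    if len(records) != manifest.record_count:
        failures.append("record count differs from the source system's count")
    return failures
```

An empty failure list means the extract is cleared for processing; any entry stops the load before the critical window.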

Page 17:

Data Validation

• Utilizing a Data Profiling tool to generate Data Validation rules
  • This saves time and guarantees a proper flow and integrity of processes
  • In the absence of this capability, we have generic components that are metadata-driven and leverage confirmed knowledge obtained from the Data Profiling process
• Alternatively, the same can be achieved with more sophisticated technologies (e.g. Ataccama Data Quality Center) in a fully automated, controlled and metadata-driven way outside of the standard ETL process
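"Metadata-driven" here means the validation rules live as data, not as code inside ETL jobs. A minimal sketch, with an illustrative rule set (the rule vocabulary and field names are our assumptions):

```python
import re

# Validation rules confirmed by Data Profiling, kept as metadata rather
# than hard-coded in ETL jobs; this rule set is illustrative.
RULES = {
    "postal_code": {"nullable": False, "pattern": r"^\d{5}$"},
    "amount":      {"nullable": False, "min": 0},
}

def validate_record(record, rules=RULES):
    """Check one record against the rule metadata; return (field, error) pairs."""
    errors = []
    for field, rule in rules.items():
        value = record.get(field)
        if value is None:
            if not rule.get("nullable", True):
                errors.append((field, "null not allowed"))
            continue
        if "pattern" in rule and not re.match(rule["pattern"], str(value)):
            errors.append((field, "pattern mismatch"))
        if "min" in rule and value < rule["min"]:
            errors.append((field, "below minimum"))
    return errors
```

When profiling uncovers a new rule, only the metadata changes; the validation component itself stays untouched.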

Page 18:

Error Logging

• In the event that any data quality issues are identified in the previous step, they are logged
• This is a fundamental step that supports the following:
  • Documenting all data quality issues before any data cleansing is performed (e.g. applying a default)
  • Measurements and reporting
  • Feedback to the source systems for possible data corrections
• The error logs are persisted in database tables
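A minimal sketch of such a persistent error log, using SQLite for illustration (the table layout and column names are our assumptions, not Adastra's component schema):

```python
import sqlite3, json
from datetime import datetime, timezone

def open_error_log(path=":memory:"):
    """Open (or create) the persistent DQ error-log table."""
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS dq_error_log (
                     logged_at  TEXT,
                     source     TEXT,
                     field      TEXT,
                     error      TEXT,
                     raw_record TEXT)""")
    return con

def log_errors(con, source, record, errors):
    # Record every issue before any cleansing (e.g. defaulting) is applied,
    # so measurements and source-system feedback stay complete.
    now = datetime.now(timezone.utc).isoformat()
    con.executemany(
        "INSERT INTO dq_error_log VALUES (?, ?, ?, ?, ?)",
        [(now, source, field, err, json.dumps(record))
         for field, err in errors])
    con.commit()
```

Storing the raw record alongside each error is what makes later reporting and source-system feedback possible without re-running the load.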

Page 19:

Practical steps: where to start and what to do

• Determine where you are – IQ Assessment
• Get used to data profiling
• Revisit your downstream ETL processes
• Automate Data and Business Logic Profiling
• Implement an MDM Architecture in gradual stages
• Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Page 20:

DQ Management Cycle

Page 21:

DQ Reporting Architecture

[Architecture diagram: Data Sources (Packaged Applications; RDBs and Flat Files; Mainframe and Midrange) feed a Data Quality Engine (Cleanse, Match, Enrich) managed through a Workbench; a Scorecard/Monitor component writes to a Metrics Repository and publishes Reports via web.]

Page 22:

Drilling Down on DQ Dashboards

• Different users need to look at Data Quality at a different level
• As such, it is imperative to have the flexibility to serve those needs and to report Data Quality scores across a series of dimensions
• Each user group commonly has a single or limited analytical view; the full potential of such drill-down capability goes well beyond that
• Exception Reporting is also available and will notify the respective user groups when specific events happen, such as Data Quality going below a specific threshold
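The exception-reporting idea reduces to comparing each dimension's score against its threshold and firing a notification on a breach. A minimal sketch (dimension names and the notification hook are illustrative):

```python
def check_thresholds(scores, thresholds, notify):
    """Fire a notification whenever a DQ score drops below its threshold.
    `notify` is any callable, e.g. an e-mail or dashboard hook."""
    breaches = []
    for dimension, score in scores.items():
        limit = thresholds.get(dimension)
        if limit is not None and score < limit:
            breaches.append(dimension)
            notify(f"DQ alert: {dimension} score {score} "
                   f"below threshold {limit}")
    return breaches
```

Different user groups would simply subscribe with different threshold sets for the dimensions they own.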

Page 23:

Data Quality Dashboard

• High-level overview of key data areas
• A management tool for the DQ Manager/Business Sponsor of the DQ Program
• Based on defined business rules
• Tracking against defined KPIs

Page 24:

Data Quality KPIs for one (business) data entity

• A management tool for the Data Steward/Business Owner of a given data entity
• Compound KPIs in selected categories
• User-defined KPIs, report structure and target DQ levels
• Using pre-defined business rules

Page 25:

Detailed Analysis of a DQ KPI - Address Validity

• Analytical tool for Data Stewards/Technical Analysts
• Detailed breakdown of a given KPI
• Allows Data Stewards/Data Quality Managers to take action

Page 26:

Detailed Analysis of a DQ KPI - Address Consistency

• Allows Data Stewards/Data Quality Managers to take action, e.g.:
  • Apply validation on input
  • Change an existing business process
  • Correct a malfunctioning ETL/interface from one of the systems, etc.

Page 27:

Business Rules define Data Quality

• Ability to support complex, hierarchical business rules
• Configurable by users, no coding
• High performance: executes defined rules on tens of millions of records
• Ability to execute business rules in real time
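"Hierarchical rules, configurable without coding" suggests rule definitions held as data and evaluated recursively. A minimal sketch of that idea (the rule vocabulary and the example rule are ours, not the product's format):

```python
# A compound rule passes only if all of its child rules pass; leaf rules
# are simple predicates. Rule definitions are data, not code, mirroring
# the "configurable by users, no coding" point above.
def evaluate(rule, record):
    if "all" in rule:  # hierarchical (compound) rule
        return all(evaluate(child, record) for child in rule["all"])
    field, op, ref = rule["field"], rule["op"], rule["value"]
    value = record.get(field)
    if op == "not_null":
        return value is not None
    if op == "==":
        return value == ref
    if op == ">=":
        return value is not None and value >= ref
    return False

# Illustrative compound rule for an "address validity" KPI.
ADDRESS_VALID = {"all": [
    {"field": "postal_code", "op": "not_null", "value": None},
    {"field": "country", "op": "==", "value": "SK"},
]}
```

Because evaluation is a pure function of (rule, record), the same definitions can run in batch over millions of rows or against a single transaction in real time.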

Page 28:

Practical steps: where to start and what to do

• Determine where you are – IQ Assessment
• Get used to data profiling
• Revisit your downstream ETL processes
• Automate Data and Business Logic Profiling
• Implement an MDM Architecture in gradual stages
• Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Page 29:

MDM Hub Architectural Approaches

• There are four major architectural approaches to implementing an MDM Hub as part of the enterprise architecture:
  • Consolidation (Central Master)
  • Registry
  • Coexistence
  • Transaction Hub
• They vary by:
  • Amount of master data attributes stored in a central location
  • Data latency between operational systems and the master repository
  • Degree of synchronization of master data across the enterprise
  • Impact on operational systems' data storage and functionality

Page 30:

Consolidation Approach

[Architecture diagram: Transaction Systems (Accounting, Billing, CRM, ERP) connect through Data Integration (ETL, EAI, SOA) to the MDM Hub Solution, which comprises a Master Data Repository, Enterprise Data Warehouse, Operational Data Store and Data Marts, supported by Data Quality services (Cleansing, Standardization, Identification, Unification), Dictionaries/Etalons, Metadata and a Front End.]

• The master data is physically stored in a central repository
• It is cleansed, standardized, de-duplicated and unified in batch mode
• The master data repository forms a golden record for all downstream systems
• The master data will be as current as the latest batch run
• The operational systems continue to maintain their own version of the master data
• With this approach, only the downstream processes benefit from the master data. This may include reporting, analytics, marketing campaigns, data mining, etc.
• Often the consolidation approach is implemented as an extension of the EDW environment

Page 31:

Registry Approach

[Same MDM Hub architecture diagram as on Page 30.]

• Only the master data identifiers are stored in the repository, together with their relationships and de-duplication groups
• The rest of the master data remains in its original location in the operational systems
• The MDM Hub maintains a set of rules for reconstructing and assembling the master data at runtime
• The master data retrieved is always up to date
• Performance may suffer when large amounts of master data are accessed, due to runtime data federation
• The operational systems continue to maintain their own version of the master data
• With this approach, only the downstream processes benefit from the master data. This may include reporting, analytics, marketing campaigns, data mining, etc.
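The registry idea, i.e. storing only identifiers and cross-references centrally and assembling the full record at request time, can be sketched as follows (system names, the registry layout and the fetcher interface are all illustrative assumptions):

```python
# Minimal sketch of registry-style assembly: the hub stores only
# identifiers and cross-references; attributes are fetched from the
# operational systems at runtime.
REGISTRY = {
    # master id -> de-duplication group: local ids per source system
    "cust-1": {"crm": "C-100", "billing": "B-777"},
}

def assemble_master_record(master_id, fetchers, registry=REGISTRY):
    """Reconstruct the master record at request time via data federation.
    `fetchers` maps each system name to a callable that retrieves the
    attributes held by that system for a given local id."""
    record = {"master_id": master_id}
    for system, local_id in registry[master_id].items():
        record.update(fetchers[system](local_id))  # live call to the source
    return record
```

Because every attribute comes from a live source-system call, the result is always current, which is also why runtime federation becomes the performance bottleneck for large volumes.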

Page 32:

Coexistence Approach

[Same MDM Hub architecture diagram as on Page 30.]

• The consolidation approach often evolves into a coexistence approach
• The master data is physically stored in a central repository. It is cleansed, standardized, de-duplicated and unified in batch mode. The master data will be as current as the latest batch run
• The master data repository forms a golden record for all downstream systems and some upstream systems
• The difference from the consolidation approach is that the data is published, and some of the operational systems may synchronize their data with the master data
• The master data is synchronized across multiple systems

Page 33:

Transaction Hub Approach

[Same MDM Hub architecture diagram as on Page 30.]

• The master data is physically stored in a central repository
• It is cleansed, standardized, de-duplicated and unified in batch mode as well as at runtime
• The master data repository forms a golden record for all downstream systems and some upstream systems
• Some of the upstream systems give up the maintenance of the master data to the MDM Hub. They access the transaction hub directly for all master data management

Page 34:

Practical steps: where to start and what to do

• Determine where you are – IQ Assessment
• Get used to data profiling
• Revisit your downstream ETL processes
• Automate Data and Business Logic Profiling
• Implement an MDM Architecture in gradual stages
• Enhance your MDM with a more general rule-based engine capable of handling transactions in real time

Page 35:

Extended MDM & IQM Architecture

[Same hub architecture diagram as on Page 30, here labeled MDM & IQM Hub Solution.]

Page 36:

Adastra, s.r.o.
Nile House, Karolinská 654/2
Praha, Czech Republic
[email protected]

Adastra Corporation
Le Parc Office Tower, 8500 Leslie St.
Markham, Ontario L3T 7M8, Canada
[email protected]

Adastra, s.r.o.
Francisciho 4
Slovakia
[email protected]

Adastra GmbH
Bockenheimer Landstrasse 17/19
Frankfurt a. M., Germany
[email protected]

Thank You

Adastra Bulgaria EOOD
29 Panayot Volov str., 5th floor
Bulgaria
[email protected]

CANADA CZECH REPUBLIC SLOVAKIA GERMANY BULGARIA