Data-Ed Engineering Solutions to Data Quality Challenges


Transcript of Data-Ed Engineering Solutions to Data Quality Challenges

Page 1: Data-Ed Engineering Solutions to Data Quality Challenges

Data Quality Engineering

PRODUCED BY: DATA BLUEPRINT, 10124-C W. BROAD ST, GLEN ALLEN, VA 23060
CLASSIFICATION: EDUCATION
DATE: 10/09/12, 10/04/12
© Copyright this and previous years by Data Blueprint - all rights reserved!

Date: October 9, 2012
Time: 2:00 PM ET
Presented by: Dr. Peter Aiken

This presentation provides guidance to organizations considering or preparing for data quality initiatives. It illustrates how organizations with chronic business challenges can often trace the root of the problem to poor data quality. Showing how data quality can be engineered provides a useful framework in which to develop an organizational approach. This in turn allows organizations to more quickly distinguish data problems caused by structural issues from those caused by practice-oriented defects. Participants will also learn the importance of quantification when practicing data quality engineering.

Figure: phases and flows of data quality engineering. Flow labels: starting point for new system development; starting point for existing systems; data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings; Metadata & Data Storage.

Metadata Creation
• Define Data Architecture
• Define Data Model Structures

Metadata Structuring
• Implement Data Model Views
• Populate Data Model Views

Data Creation
• Create Data
• Verify Data Values

Data Utilization
• Inspect Data
• Present Data

Data Manipulation
• Manipulate Data
• Update Data

Data Assessment
• Assess Data Values
• Assess Metadata

Data Refinement
• Correct Data Value Defects
• Re-store Data Values

Metadata Refinement
• Correct Structural Defects
• Update Implementation

Page 2: Data-Ed Engineering Solutions to Data Quality Challenges

Get Social With Us!

Live Twitter Feed: Join the conversation!
• Follow us: @datablueprint, @paiken
• Ask questions and submit your comments: #dataed

Like Us on Facebook: www.facebook.com/datablueprint
• Post questions and comments
• Find industry news, insightful content and event updates.

Join the Group: Data Management & Business Intelligence
• Ask questions, gain insights and collaborate with fellow data management professionals

Page 3: Data-Ed Engineering Solutions to Data Quality Challenges

Meet Your Presenter: Dr. Peter Aiken

• Internationally recognized thought-leader in the data management field - 30 years of experience
– Recipient of multiple international awards
– Founder, Data Blueprint (http://datablueprint.com)
• 7 books and dozens of articles
• Experienced w/ 500+ data management practices in 20 countries
• Multi-year immersions with organizations as diverse as the US DoD, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia and Walmart

Page 4: Data-Ed Engineering Solutions to Data Quality Challenges

Data Quality Engineering

Page 5: Data-Ed Engineering Solutions to Data Quality Challenges

Outline
1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 6: Data-Ed Engineering Solutions to Data Quality Challenges

The DAMA Guide to the Data Management Body of Knowledge

Figure: Data Management Functions

Published by DAMA International
• The professional association for Data Managers (40 chapters worldwide)

DMBoK organized around
• Primary data management functions focused around data delivery to the organization
• Organized around several environmental elements

Page 7: Data-Ed Engineering Solutions to Data Quality Challenges

The DAMA Guide to the Data Management Body of Knowledge

Figure: Environmental Elements

Amazon: http://www.amazon.com/DAMA-Guide-Management-Knowledge-DAMA-DMBOK/dp/0977140083
Or enter the terms "dama dm bok" at the Amazon search engine

Page 8: Data-Ed Engineering Solutions to Data Quality Challenges

What is the CDMP?
• Certified Data Management Professional
• DAMA International and ICCP
• Membership in a distinct group made up of your fellow professionals
• Recognition for your specialized knowledge in a choice of 17 specialty areas
• Series of 3 exams
• For more information, please visit:
– http://www.dama.org/i4a/pages/index.cfm?pageid=3399
– http://iccp.org/certification/designations/cdmp

Page 9: Data-Ed Engineering Solutions to Data Quality Challenges

Data Management

Page 10: Data-Ed Engineering Solutions to Data Quality Challenges

Data Management

Figure: data management practice areas and their goals
• Data Program Coordination: Manage data coherently.
• Organizational Data Integration: Share data across boundaries.
• Data Stewardship: Assign responsibilities for data.
• Data Development: Engineer data delivery systems.
• Data Support Operations: Maintain data availability.

Page 11: Data-Ed Engineering Solutions to Data Quality Challenges

Data Management

Page 12: Data-Ed Engineering Solutions to Data Quality Challenges

Overview: Data Quality Engineering

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 13: Data-Ed Engineering Solutions to Data Quality Challenges

Overview: Data Quality Engineering

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 14: Data-Ed Engineering Solutions to Data Quality Challenges

Outline
1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 15: Data-Ed Engineering Solutions to Data Quality Challenges

Definitions

Data Quality Management
• Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure the fitness of data for use
• “Entails the establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data.” http://www2.sas.com/proceedings/sugi29/098-29.pdf
• Critical support process in organizational change management
• Continuous process for defining the parameters for specifying acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels

Data Quality
• Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 16: Data-Ed Engineering Solutions to Data Quality Challenges

Overview: DQM Concepts and Activities
1) Data Quality Management Approach
2) Develop and promote data quality awareness
3) Define data quality requirements
4) Profile, analyze and assess data quality
5) Define data quality metrics
6) Define data quality business rules
7) Test and validate data quality requirements
8) Set and evaluate data quality service levels
9) Measure and monitor data quality
10) Manage data quality issues
11) Clean and correct data quality defects
12) Design and implement operational DQM procedures
13) Monitor operational DQM procedures and performance

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 17: Data-Ed Engineering Solutions to Data Quality Challenges

Concepts and Activities
• Data quality expectations provide the inputs necessary to define the data quality framework:
– Requirements
– Inspection policies
– Measures and monitors that reflect changes in data quality and performance
• The data quality framework requirements reflect 3 aspects of business data expectations:
1) A manner to record the expectation in business rules
2) A way to measure the quality of data within that dimension
3) An acceptability threshold

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
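The three aspects listed on this slide (a recorded rule, a measure, and an acceptability threshold) can be sketched as one small data structure. This is a minimal illustration, not something taken from the slides or the DMBOK: the `Expectation` class, the sample `customers` records and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Record = Mapping[str, object]


@dataclass(frozen=True)
class Expectation:
    """One business data expectation (hypothetical structure)."""
    name: str
    rule: Callable[[Record], bool]  # 1) the expectation recorded as a business rule
    threshold: float                # 3) minimum acceptable share of conforming records

    def measure(self, records: Iterable[Record]) -> float:
        """2) Share of records that satisfy the rule (1.0 for empty input)."""
        rows = list(records)
        if not rows:
            return 1.0
        return sum(1 for r in rows if self.rule(r)) / len(rows)

    def is_acceptable(self, records: Iterable[Record]) -> bool:
        return self.measure(records) >= self.threshold


# Illustrative data only.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
email_present = Expectation(
    name="customer email is populated",
    rule=lambda r: bool(r.get("email")),
    threshold=0.9,
)
score = email_present.measure(customers)
```

Keeping the rule, the measure and the threshold in one object means that a change in business expectations touches a single definition.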

Page 18: Data-Ed Engineering Solutions to Data Quality Challenges

Outline
1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 19: Data-Ed Engineering Solutions to Data Quality Challenges

The DQM Cycle

The general approach to DQM is a version of the Deming cycle. Deming proposes a problem-solving model known as “plan-do-study-act” or “plan-do-check-act”.

The cycle begins by:
1) Identifying data issues that are critical to the achievement of business objectives
2) Defining business requirements for data quality
3) Identifying key data quality dimensions
4) Defining business rules critical to ensuring high quality data

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 20: Data-Ed Engineering Solutions to Data Quality Challenges

The DQM Cycle: (1) Plan

Plan for the assessment of the current state and identification of key metrics for measuring quality
• The data quality team assesses the scope of known issues
• This involves:
– Determining cost and impact
– Evaluating alternatives for addressing them

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 21: Data-Ed Engineering Solutions to Data Quality Challenges

The DQM Cycle: (2) Deploy

Deploy processes for measuring and improving the quality of data:
• Data profiling
• Institute inspections and monitors to identify data issues when they occur
• Fix flawed processes that are the root cause of data errors or correct errors downstream
• When it is not possible to correct errors at their source, correct them at their earliest point in the data flow

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 22: Data-Ed Engineering Solutions to Data Quality Challenges

The DQM Cycle: (3) Monitor

Monitor the quality of data as measured against the defined business rules
• If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements
• If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
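The monitoring step can be pictured as a comparison of measured scores against thresholds, with a notification whenever a score falls short. The sketch below is one possible reading, not a prescribed implementation: the `monitor` function, the `notify_steward` callback and the rule names are hypothetical, and treating a missing measurement as a breach is an assumption.

```python
from typing import Callable, Dict, List


def monitor(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    notify: Callable[[str, float, float], None],
) -> List[str]:
    """Compare measured scores with thresholds and notify on every breach.

    Returns the names of the rules that fell below their threshold.
    """
    breaches = []
    for rule, threshold in thresholds.items():
        # Assumption: a rule with no measurement is treated as a breach.
        score = scores.get(rule, 0.0)
        if score < threshold:
            notify(rule, score, threshold)
            breaches.append(rule)
    return breaches


sent: List[str] = []


def notify_steward(rule: str, score: float, threshold: float) -> None:
    """Stand-in for a real notification channel; here it only collects messages."""
    sent.append(f"{rule}: {score:.2f} is below {threshold:.2f}")


breached = monitor(
    scores={"email_populated": 0.97, "valid_postcode": 0.81},
    thresholds={"email_populated": 0.95, "valid_postcode": 0.90},
    notify=notify_steward,
)
```

Rules that meet their thresholds produce no output, which corresponds to the "processes are in control" case on the slide.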

Page 23: Data-Ed Engineering Solutions to Data Quality Challenges

The DQM Cycle: (4) Act

Act to resolve any identified issues to improve data quality and better meet business expectations
• New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 24: Data-Ed Engineering Solutions to Data Quality Challenges

Outline
1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 25: Data-Ed Engineering Solutions to Data Quality Challenges

Develop and Promote DQ Awareness
• Promoting data quality awareness is essential to ensure buy-in of necessary stakeholders in the organization
• Ensure that the right people in the organization are aware of the existence of data quality issues
• Awareness increases the chance of success of any DQM program
• Awareness includes:
– Relating material impacts to data issues
– Ensuring systematic approaches to regulators
– Oversight of the quality of organizational data
– Socializing the concept that data quality problems cannot be solely addressed by technology solutions

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 26: Data-Ed Engineering Solutions to Data Quality Challenges

Polling Question #1

Which is not a step to promote data quality awareness?
a) Training on the core concepts of data quality
b) Establish data governance framework for data quality
c) Create a data architecture map

Page 27: Data-Ed Engineering Solutions to Data Quality Challenges

Develop and Promote DQ Awareness: Steps
1) Training on the core concepts of data quality
2) Establish data governance framework for data quality
3) Create a data quality oversight board that has a reporting hierarchy associated with the different data governance roles

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 28: Data-Ed Engineering Solutions to Data Quality Challenges

Define DQ Requirements
• Data quality must be understood within the context of ‘fitness for use’
• Data quality requirements are often hidden within defined business policies
• Incremental detailed review and iterative refinement of business policies help to identify those information requirements which become data quality rules
• Steps for incremental detailed review:
– Identify key data components associated with business policies
– Determine how identified data assertions affect the business
– Evaluate how data errors are categorized within a set of data quality dimensions
– Specify the business rules that measure the occurrence of data errors
– Provide a means for implementing measurement processes that assess conformance to those business rules

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 29: Data-Ed Engineering Solutions to Data Quality Challenges

Data Quality Dimensions

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 30: Data-Ed Engineering Solutions to Data Quality Challenges

Profile, Analyze and Assess DQ

Data assessment using 2 different approaches:
1) Bottom-up
2) Top-down

Bottom-up assessment:
• Inspection and evaluation of the data sets
• Highlight potential issues based on the results of automated processes

Top-down assessment:
• Engage business users to document their business processes and the corresponding critical data dependencies
• Understand how their processes consume data and which data elements are critical to the success of the business application

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 31: Data-Ed Engineering Solutions to Data Quality Challenges

Define DQ Metrics
• Metrics development occurs as part of the strategy/design/plan step
• Process for defining data quality metrics:
1) Select one of the identified critical business impacts
2) Evaluate the dependent data elements and the create and update processes associated with that business impact
3) List any associated data requirements
4) Specify the associated dimension of data quality and one or more business rules to use to determine conformance of the data to expectations
5) Describe the process for measuring conformance
6) Specify an acceptability threshold

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 32: Data-Ed Engineering Solutions to Data Quality Challenges

Test and Validate DQ Requirements
• Data profiling tools analyze data to find potential anomalies
• Use the same tools for rule validation
• Rules discovered or defined during the data quality assessment phase are referenced in measuring conformance as part of the operational process

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
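To make "profiling to find potential anomalies" concrete, the sketch below profiles a single column by null share, distinct count and value shape. It is a simplified stand-in for what a profiling tool reports, not any specific product's behavior; `profile_column`, `_pattern` and the sample `zip_codes` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def _pattern(value: str) -> str:
    """Map digits to 9 and letters to A so differently shaped values stand out."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Summarise one column: null share, distinct count, and value patterns."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "patterns": dict(Counter(_pattern(v) for v in non_null)),
    }


# Illustrative data only: the fourth value contains the letter O instead of a zero.
zip_codes = ["23060", "23059", None, "2306O", "23060-1234", ""]
report = profile_column(zip_codes)
```

A rare pattern such as `9999A` is a candidate anomaly. Once the business confirms which shapes are valid, the same pattern check can be reused as a validation rule.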

Page 33: Data-Ed Engineering Solutions to Data Quality Challenges

Set and Evaluate DQ Service Levels
• Data quality inspection and monitoring are used to measure and monitor compliance with defined data quality rules
• Data quality SLAs specify the organization’s expectations for response and remediation
• Operational data quality control defined in data quality SLAs includes:
– Data elements covered by the agreement
– Business impacts associated with data flaws
– Data quality dimensions associated with each data element
– Quality expectations for each data element of the identified dimensions in each application or system in the value chain
– Methods for measuring against those expectations
– (…)

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 34: Data-Ed Engineering Solutions to Data Quality Challenges

Measure and Monitor DQ
• DQM procedures depend on available data quality measuring and monitoring services
• 2 contexts for control/measurement of conformance to data quality business rules exist:
– In-stream: collect in-stream measurements while creating data
– In batch: perform batch activities on collections of data instances assembled in a data set
• Apply measurements at 3 levels of granularity:
– Data element value
– Data instance or record
– Data set

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
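The two measurement contexts and three levels of granularity can be illustrated with one completeness rule applied at each level. This is a sketch under assumed names: `element_ok`, `record_ok`, `dataset_score`, `create_record` and the `REQUIRED` field list are hypothetical, and "populated" is only one of many possible rules.

```python
from typing import Dict, Iterable, List, Optional

Record = Dict[str, Optional[str]]


def element_ok(value: Optional[str]) -> bool:
    """Data element level: a single value is populated."""
    return value is not None and value.strip() != ""


def record_ok(record: Record, required: List[str]) -> bool:
    """Record level: every required element of one record is populated."""
    return all(element_ok(record.get(name)) for name in required)


def dataset_score(records: Iterable[Record], required: List[str]) -> float:
    """Data set level (batch): share of records that pass the record check."""
    rows = list(records)
    if not rows:
        return 1.0
    return sum(record_ok(r, required) for r in rows) / len(rows)


def create_record(record: Record, required: List[str], rejected: List[Record]) -> bool:
    """In-stream: measure while the record is being created."""
    ok = record_ok(record, required)
    if not ok:
        rejected.append(record)
    return ok


# Illustrative data only.
REQUIRED = ["name", "country"]
rejected: List[Record] = []
incoming: List[Record] = [
    {"name": "Ada", "country": "UK"},
    {"name": "Grace", "country": " "},
    {"name": None, "country": "US"},
    {"name": "Linus", "country": "FI"},
]
accepted = [r for r in incoming if create_record(r, REQUIRED, rejected)]
batch_score = dataset_score(incoming, REQUIRED)
```

The in-stream check catches a defect at the moment of creation. The batch score summarises an assembled data set after the fact, using the same underlying rule.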

Page 35: Data-Ed Engineering Solutions to Data Quality Challenges

Manage DQ Issues
• Supporting the enforcement of the data quality SLA requires a mechanism for reporting and tracking data quality incidents and activities for researching and resolving those incidents
• A data quality incident reporting system can provide this capability
• It can log the evaluation, initial diagnosis, and actions associated with data quality events

Clean & Correct DQ Defects
Perform data correction in 3 ways:
1) Automated correction
2) Manual directed correction
3) Manual correction

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 36: Data-Ed Engineering Solutions to Data Quality Challenges

Manage DQ Issues: Example

Data quality incident tracking focuses on training staff to recognize when data issues appear and how they are to be classified, logged and tracked according to the data quality SLA

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
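A data quality incident log of the kind described here can be very small. The sketch below is a hypothetical in-memory version that classifies, logs and tracks incidents; `Incident`, `IncidentLog`, the field names and the sample incidents are assumptions for illustration, and a real system would persist its records and follow the SLA's own classification scheme.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Incident:
    incident_id: int
    data_element: str
    classification: str  # hypothetical label, e.g. a data quality dimension
    owner: str           # the person the issue is assigned to
    status: str = "open"
    history: List[str] = field(default_factory=list)

    def log(self, note: str) -> None:
        """Append a timestamped note (UTC) to the incident history."""
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.history.append(f"{stamp} {note}")


class IncidentLog:
    def __init__(self) -> None:
        self._incidents: List[Incident] = []

    def report(self, data_element: str, classification: str,
               owner: str, diagnosis: str) -> Incident:
        """Classify and log a new incident with its initial diagnosis."""
        incident = Incident(len(self._incidents) + 1, data_element, classification, owner)
        incident.log(f"reported: {diagnosis}")
        self._incidents.append(incident)
        return incident

    def resolve(self, incident: Incident, action: str) -> None:
        """Record the corrective action and close the incident."""
        incident.log(f"resolved: {action}")
        incident.status = "resolved"

    def open_incidents(self) -> List[Incident]:
        return [i for i in self._incidents if i.status == "open"]


# Illustrative incidents only.
log = IncidentLog()
first = log.report("customer.postcode", "accuracy", "steward-a", "letter O typed for zero")
second = log.report("order.total", "completeness", "steward-b", "nulls after nightly load")
log.resolve(first, "manual directed correction of 14 records")
```

Because every incident carries an owner and a history, the log can answer who is accountable for an issue and what was done about it.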

Page 37: Data-Ed Engineering Solutions to Data Quality Challenges

Design and Implement Operational DQM Procedures
1) Inspection and monitoring
2) Diagnosis and evaluation of remediation alternatives
3) Resolve issues
4) Reporting

Monitor Operational DQM Procedures and Performance
1) Accountability is critical to governance protocols overseeing data quality control
2) All issues must be assigned
3) The tracking process should specify and document the ultimate issue accountability

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 38: Data-Ed Engineering Solutions to Data Quality Challenges

Outline
1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 39: Data-Ed Engineering Solutions to Data Quality Challenges

Example: Data Quality Interview Session Summary

• During mid-February, the Data Governance Team and Data Blueprint conducted ten qualitative interview sessions with groups of individuals who interact with data on a regular basis

• A series of patterns emerged as participants shared stories about the impact of poor data quality on the client, its products, and its customers

• These patterns highlight gaps in best practices for ensuring data quality, i.e. the extent to which data is “fit for use”

• Our preliminary analysis evaluated these stories against attributes of four data quality dimensions

• At this early stage of the post-interview process, we are seeking confirmation of our assumptions and method

Page 40: Data-Ed Engineering Solutions to Data Quality Challenges

Which Activities Support Quality Data?

• Data quality best practices depend on both
– Practice-oriented activities
– Structure-oriented activities
• Practice-oriented activities focus on the capture and manipulation of data
• Structure-oriented activities focus on the data implementation

Page 41: Data-Ed Engineering Solutions to Data Quality Challenges

Quality Dimensions

Practice-oriented causes
• Stem from a failure to apply rigor when capturing and manipulating data. Examples of such rigor include:
– Edit masking
– Range checking of input data
– CRC-checking of transmitted data

Structure-oriented causes
• Occur because of data and metadata that have been arranged imperfectly. For example:
– When the data is in the system but we just can't access it;
– When a correct data value is provided as the wrong response to a query; or
– When data is not provided because it is unavailable or inaccessible to the customer
• Developer focus within system boundaries instead of within organization boundaries
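The three practice-oriented safeguards named on this slide (edit masking, range checking and CRC-checking) each fit in a few lines. The sketch below uses Python's standard `re` and `zlib` modules; the postcode mask, the helper names and the sample payload are assumptions for illustration, not part of the presentation.

```python
import re
import zlib

# Assumption: the field holds a US ZIP or ZIP+4 code.
POSTCODE_MASK = re.compile(r"\d{5}(-\d{4})?")


def matches_mask(value: str) -> bool:
    """Edit masking: accept only input that fits the expected shape."""
    return POSTCODE_MASK.fullmatch(value) is not None


def in_range(value: float, low: float, high: float) -> bool:
    """Range checking: accept only values inside the closed interval [low, high]."""
    return low <= value <= high


def with_crc(payload: bytes) -> bytes:
    """Append a CRC-32 checksum (4 bytes, big-endian) before transmission."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_crc(message: bytes) -> bool:
    """Recompute the CRC-32 on receipt and compare it with the trailing 4 bytes."""
    if len(message) < 4:
        return False
    payload, received = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


# Illustrative payload: one byte is altered "in transit" in the second message.
sent = with_crc(b"order=42;qty=3")
corrupted = b"order=42;qty=8" + sent[-4:]
```

Each check rejects a defect at the point of capture or transfer, which is where practice-oriented problems arise. None of them addresses the structure-oriented causes listed on the same slide.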

Page 42: Data-Ed Engineering Solutions to Data Quality Challenges

Practice-Oriented Activities

• Affect the Data Value Quality and Data Representation Quality
• Examples of improper practice-oriented activities:
– Allowing imprecise or incorrect data to be collected when requirements specify otherwise
– Presenting data out of sequence
• Typically diagnosed in bottom-up manner: find and fix the resulting problem
• Addressed by imposing more rigorous data-handling governance

Figure: Practice-oriented activities (Quality of Data Representation, Quality of Data Values)

Page 43: Data-Ed Engineering Solutions to Data Quality Challenges

Structure-Oriented Activities

• Affect the Data Model Quality and Data Architecture Quality
• Examples of improper structure-oriented activities:
  – Providing a correct response but incomplete data to a query because the user did not comprehend the system data structure
  – Costly maintenance of inconsistent data used by redundant systems
• Typically diagnosed in top-down manner: root cause fixes
• Addressed through fundamental data structure governance

Diagram labels: Structure-oriented activities; Quality of Data Architecture; Quality of Data Models

Page 44: Data-Ed Engineering Solutions to Data Quality Challenges

4 Dimensions of Data Quality

An organization’s overall data quality is a function of four distinct components, each with its own attributes:

Practice-oriented
• Data Value: the quality of data as stored & maintained in the system
• Data Representation: the quality of representation for stored values; perfect data values stored in a system that are inappropriately represented can be harmful

Structure-oriented
• Data Model: the quality of data logically representing user requirements related to data entities, associated attributes, and their relationships; essential for effective communication among data suppliers and consumers
• Data Architecture: the coordination of data management activities in cross-functional system development and operations

Page 45: Data-Ed Engineering Solutions to Data Quality Challenges

Effective Data Quality Engineering

• Data quality engineering has been focused on operational problem correction
  – Directing attention to practice-oriented data imperfections
• Data quality engineering is more effective when also focused on structure-oriented causes
  – Ensuring the quality of shared data across system boundaries

Diagram, from closer to the user to closer to the architect:
• Data Representation Quality: as presented to the user
• Data Value Quality: as maintained in the system
• Data Model Quality: as understood by developers
• Data Architecture Quality: as an organizational asset

Page 46: Data-Ed Engineering Solutions to Data Quality Challenges

Full Set of Data Quality Attributes

Page 47: Data-Ed Engineering Solutions to Data Quality Challenges

Data Value Quality

Page 48: Data-Ed Engineering Solutions to Data Quality Challenges

Data Representation Quality

Page 49: Data-Ed Engineering Solutions to Data Quality Challenges

Data Model Quality

Page 50: Data-Ed Engineering Solutions to Data Quality Challenges

Data Architecture Quality

Page 51: Data-Ed Engineering Solutions to Data Quality Challenges

Extended data life cycle model with metadata sources and uses

Phases and their activities:
• Metadata Refinement: Correct Structural Defects; Update Implementation
• Metadata Creation: Define Data Architecture; Define Data Model Structures
• Metadata Structuring: Implement Data Model Views; Populate Data Model Views
• Data Refinement: Correct Data Value Defects; Re-store Data Values
• Data Manipulation: Manipulate Data; Update Data
• Data Utilization: Inspect Data; Present Data
• Data Creation: Create Data; Verify Data Values
• Data Assessment: Assess Data Values; Assess Metadata

Diagram labels: Metadata & Data Storage; starting point for new system development; starting point for existing systems; data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings

Page 52: Data-Ed Engineering Solutions to Data Quality Challenges

Data Quality Engineering

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 53: Data-Ed Engineering Solutions to Data Quality Challenges

Goals and Principles

• To measurably improve the quality of data in relation to defined business expectations
• To define requirements and specifications for integrating data quality control into the system development life cycle
• To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 54: Data-Ed Engineering Solutions to Data Quality Challenges

Activities

• Develop and Promote Data Quality Awareness
• Set and Evaluate Data Quality Service Levels
• Test and Validate Data Quality Requirements
• Profile, Analyze, and Assess Data Quality
• Continuously Measure and Monitor Data Quality
• Monitor Operational DQM Procedures and Performance
• Define Data Quality Business Rules
• Define Data Quality Metrics
• Manage Data Quality Issues
• Clean and Correct Data Quality Defects
• Define Data Quality Requirements
• Design and Implement Operational DQM Procedures

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 55: Data-Ed Engineering Solutions to Data Quality Challenges

Primary Deliverables

• Improved Quality Data
• Data Management Operational Analysis
• Data profiles
• Data Quality Certification Reports
• Data Quality Service Level Agreements

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 56: Data-Ed Engineering Solutions to Data Quality Challenges

Roles and Responsibilities

Suppliers:
• External Sources
• Regulatory Bodies
• Business Subject Matter Experts
• Information Consumers
• Data Producers
• Data Architects
• Data Modelers
• Data Stewards

Participants:
• Data Quality Analysts
• Data Analysts
• Database Administrators
• Data Stewards
• Other Data Professionals
• DRM Director
• Data Stewardship Council

Consumers:
• Data Stewards
• Data Professionals
• Other IT Professionals
• Knowledge Workers
• Managers and Executives
• Customers

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 57: Data-Ed Engineering Solutions to Data Quality Challenges

Polling Question #2

What is one guiding principle for data quality?

a. Business process owners will agree to and abide by data quality SLAs
b. Identify a blue record for all data elements
c. Upstream data consumers specify data quality expectations

Page 58: Data-Ed Engineering Solutions to Data Quality Challenges

Outline

1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 59: Data-Ed Engineering Solutions to Data Quality Challenges

Technology

• Data Profiling Tools
• Statistical Analysis Tools
• Data Cleansing Tools
• Data Integration Tools
• Issue and Event Management Tools

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 60: Data-Ed Engineering Solutions to Data Quality Challenges

Overview: Data Quality Tools

4 categories of activities:
1) Analysis
2) Cleansing
3) Enhancement
4) Monitoring

Principal tools:
1) Data Profiling
2) Parsing and Standardization
3) Data Transformation
4) Identity Resolution and Matching
5) Enhancement
6) Reporting

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 61: Data-Ed Engineering Solutions to Data Quality Challenges

DQ Tool #1: Data Profiling

• Data profiling is the assessment of value distribution and clustering of values into domains
• Need to be able to distinguish between good and bad data before making any improvements
• Data profiling is a set of algorithms for 2 purposes:
  – Statistical analysis and assessment of the quality of data values within a data set
  – Exploring relationships that exist between value collections within and across data sets
• At its most advanced, data profiling takes a series of prescribed rules from data quality engines. It then assesses the data, annotates and tracks violations to determine if they comprise new or inferred data quality rules
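The two purposes of profiling can be illustrated with a small Python sketch. It is an assumption-laden example rather than the method of any particular profiling tool: `shape`, `profile`, and `orphan_keys` are hypothetical names, and the sample ZIP values are invented for illustration.

```python
from collections import Counter

def shape(value: str) -> str:
    """Reduce a value to its pattern: digits become 9, letters become A."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)

def profile(values):
    """Purpose 1: statistics on one column (nulls, distinct values, patterns)."""
    present = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_rate": 1 - len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "top_values": Counter(present).most_common(3),
        "patterns": Counter(shape(v) for v in present).most_common(),
    }

def orphan_keys(child_keys, parent_keys):
    """Purpose 2: relationship across data sets (child values with no parent)."""
    return sorted(set(child_keys) - set(parent_keys))

# Invented sample column: one value has the letter O in place of a zero.
zips = ["23060", "23060", "2306O", "", "23219-1234", None]
report = profile(zips)
```

In `report["patterns"]` the rare `9999A` shape stands out against the dominant `99999` shape, which is how a profile separates suspect values from good ones before any correction is attempted.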

Page 62: Data-Ed Engineering Solutions to Data Quality Challenges

DQ Tool #1: Data Profiling, cont’d

• Data profiling vs. data quality: business context and semantic/logical layers
  – Data quality is concerned with proscriptive rules
  – Data profiling looks for patterns when rules are adhered to and when rules are violated; able to provide input into the business context layer
• It is incumbent on data profiling services to notify all concerned parties of whatever is discovered
• Profiling can be used to…
  – …notify the help desk that valid changes in the data are about to cause an avalanche of “skeptical user” calls
  – …notify business analysts of precisely where they should be working today in terms of shifts in the data
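One way to detect "shifts in the data" worth a notification is to compare value distributions between two loads. The sketch below is only an illustration under assumptions: the function names and the 10-percentage-point threshold are hypothetical and do not come from the presentation.

```python
from collections import Counter

def distribution(values):
    """Relative frequency of each distinct value in a column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def shifts(baseline, current, threshold=0.10):
    """Values whose share moved by more than `threshold` between two loads."""
    before, after = distribution(baseline), distribution(current)
    changed = {}
    for value in set(before) | set(after):
        delta = after.get(value, 0.0) - before.get(value, 0.0)
        if abs(delta) > threshold:
            changed[value] = round(delta, 3)
    return changed
```

A non-empty result is what would be forwarded to the help desk or to business analysts; an empty dictionary means no value moved by more than the threshold.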

Page 63: Data-Ed Engineering Solutions to Data Quality Challenges

DQ Tool #2: Parsing & Standardization

• Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
• Actions are triggered upon matching a specific pattern
• When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
• Data standardization is the process of conforming to a set of business rules and formats that are set up by data stewards and administrators
• Data standardization example:
  – Bringing all the different formats of “street” into a single format, e.g. “STR”, “ST.”, “STRT”, “STREET”, etc.
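The "street" example can be sketched as a parse step followed by a standardization step. This is a minimal illustration: the address pattern, the variant table, and the choice of "ST" as the target form are assumptions made for the example, not rules stated in the deck.

```python
import re

# Hypothetical standardization table of the kind data stewards would maintain.
STREET_VARIANTS = {"STR", "ST", "STRT", "STREET"}
STANDARD_STREET = "ST"

# Pattern for a simple address line: house number, street name, street type.
ADDRESS = re.compile(
    r"^\s*(?P<number>\d+)\s+(?P<name>.+?)\s+(?P<type>[A-Za-z]+)\.?\s*$"
)

def standardize_address(line: str):
    """Parse an address line; return the standardized form, or None if invalid."""
    match = ADDRESS.match(line)
    if match is None:
        return None  # invalid pattern: route to review rather than guess
    street_type = match["type"].upper()
    if street_type in STREET_VARIANTS:
        street_type = STANDARD_STREET
    return f"{match['number']} {match['name'].upper()} {street_type}"
```

Returning `None` for an unparseable line keeps the valid/invalid distinction explicit, so a later transformation rule or a data steward can decide what to do with the value.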

Page 64: Data-Ed Engineering Solutions to Data Quality Challenges

DQ Tool #3: Data Transformation

• Upon identification of data errors, trigger data rules to transform the flawed data
• Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation
• Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
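Mapping values from their original patterns into a target representation can be shown with dates. The sketch assumes a small "knowledge base" of accepted source formats and ISO 8601 as the target; both are illustrative choices, not rules given in the presentation.

```python
from datetime import datetime

# Hypothetical knowledge base: source patterns accepted for a date field.
SOURCE_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%d.%m.%Y", "%Y%m%d"]
TARGET_FORMAT = "%Y-%m-%d"

def transform_date(raw: str):
    """Map a date in any known source pattern to the target representation."""
    for fmt in SOURCE_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # this rule does not apply; try the next one
        return parsed.strftime(TARGET_FORMAT)
    return None  # no rule applies; leave the value for manual correction
```

The rules are tried in order, so where a value could match more than one source pattern, the ordering of the list is itself a business decision.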

Page 65: Data-Ed Engineering Solutions to Data Quality Challenges

DQ Tool #4: Identity Resolution & Matching

• Data matching enables analysts to identify relationships between records for de-duplication or group-based processing
• Matching is central to maintaining data consistency and integrity throughout the enterprise
• The matching process should be used in the initial migration of data into a single repository

2 basic approaches to matching:
• Deterministic
  – Relies on defined patterns/rules for assigning weights and scores to determine similarity
  – Predictable
  – Dependent on the rule developers’ anticipations
• Probabilistic
  – Relies on statistical techniques for assessing the probability that any pair of records represents the same entity
  – Not reliant on rules
  – Probabilities can be refined based on experience -> matchers can improve precision as more data is analyzed
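The contrast between the two approaches can be sketched side by side. The deterministic scorer uses fixed, hand-assigned weights; the probabilistic scorer uses a Fellegi-Sunter style log-likelihood ratio, a common statistical technique that the slide does not name. The field names, weights, and m/u probabilities below are all hypothetical values chosen for illustration.

```python
import math

# Deterministic: hand-written rules assign a fixed weight per agreeing field.
RULE_WEIGHTS = {"tax_id": 60, "last_name": 25, "zip": 15}

def deterministic_score(a: dict, b: dict) -> int:
    """Sum the weights of fields that are present and equal in both records."""
    return sum(
        weight
        for field, weight in RULE_WEIGHTS.items()
        if a.get(field) and a.get(field) == b.get(field)
    )

# Probabilistic: per-field agreement probabilities (assumed, not estimated).
#   m = P(field agrees | same entity), u = P(field agrees | different entities)
M_U = {"tax_id": (0.98, 0.0001), "last_name": (0.95, 0.01), "zip": (0.90, 0.05)}

def probabilistic_weight(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios; positive values favor a match."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a.get(field) == b.get(field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total
```

In practice the m and u values would be re-estimated as reviewed match decisions accumulate, which is one way matchers "improve precision as more data is analyzed"; that estimation step is not shown here.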

Page 66: Data-Ed Engineering Solutions to Data Quality Challenges

DQ Tool #5: Enhancement

Definition:
• A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view. Improves master data.

Benefits:
• Enables use of third party data sources
• Allows you to take advantage of the information and research carried out by external data vendors to make data more meaningful and useful

Examples of data enhancements:
• Time/date stamps
• Auditing information
• Contextual information
• Geographic information
• Demographic information
• Psychographic information
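Enhancement as described here amounts to merging extra attributes onto a base set of entities. The sketch below assumes external sources arrive as dictionaries keyed by a shared identifier; the `customer_id` key, the `enhanced_at` stamp, and the rule that base values win over external ones are illustrative choices only.

```python
from datetime import datetime, timezone

def enhance(base_records, *sources, key="customer_id"):
    """Merge attributes from external sources onto base records by key."""
    enhanced = []
    for record in base_records:
        merged = dict(record)  # copy, so the base data is never modified
        for source in sources:
            extra = source.get(record[key], {})
            # Keep base values; only add attributes the record lacks.
            for name, value in extra.items():
                merged.setdefault(name, value)
        # Time/date stamp recording when the enhancement was applied.
        merged["enhanced_at"] = datetime.now(timezone.utc).isoformat()
        enhanced.append(merged)
    return enhanced
```

Letting base values take precedence is only one possible policy; an organization that trusts a vendor source more than its own records would reverse it.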

Page 67: Data-Ed Engineering Solutions to Data Quality Challenges

DQ Tool #6: Reporting

• Good reporting supports:
  – Inspection and monitoring of conformance to data quality expectations
  – Monitoring performance of data stewards conforming to data quality SLAs
  – Workflow processing for data quality incidents
  – Manual oversight of data cleansing and correction
• Data quality tools provide dynamic reporting and monitoring capabilities
• These capabilities enable analysts and data stewards to support and drive the methodology for ongoing DQM and improvement with a single, easy-to-use solution
• Associate report results with:
  – Data quality measurement
  – Metrics
  – Activity
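Monitoring conformance against an SLA reduces to computing a pass rate per rule and comparing it with an agreed threshold. The following is a minimal sketch; the function name, the report layout, and the 95% default are assumptions made for the example.

```python
def conformance_report(records, rules, sla=0.95):
    """Per-rule conformance rate and whether it meets the SLA threshold."""
    report = []
    for name, rule in rules.items():
        passed = sum(1 for record in records if rule(record))
        rate = passed / len(records) if records else 1.0
        report.append(
            {"rule": name, "conformance": round(rate, 3), "meets_sla": rate >= sla}
        )
    return report

# Hypothetical rules expressed as predicates over a record.
rules = {
    "zip_present": lambda r: bool(r["zip"]),
    "age_in_range": lambda r: 0 <= r["age"] <= 120,
}
```

Each row of the report ties a measurement (the rate) to a metric (the rule) and can be attached to the incident workflow when `meets_sla` is false.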

Page 68: Data-Ed Engineering Solutions to Data Quality Challenges

Outline

1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 69: Data-Ed Engineering Solutions to Data Quality Challenges

Guiding Principles

1) Manage data as a core organizational asset
2) Identify a gold record for all data elements
3) All data elements will have a standardized data definition, data type, and acceptable value domain
4) Leverage data governance for the control and performance of DQM
5) Use industry and international data standards whenever possible
6) Downstream data consumers specify data quality expectations
7) Define business rules to assert conformance to data quality expectations
8) Validate data instances and data sets against defined business rules
9) Business process owners will agree to and abide by data quality SLAs
10) Apply data corrections at the original source if possible
11) If it is not possible to correct data at the source, forward data corrections to the owner of the original source. Influence on data brokers to conform to local requirements may be limited
12) Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
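Principles 3 and 8 can be made concrete with a small Python sketch: a data element carries a standardized definition, data type, and acceptable value domain, and each data instance is validated against it. This is an illustration only and is not taken from the DAMA guide; the `DataElement` class, the `STATE` element, and its three-value domain are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataElement:
    """Standardized definition: name, definition, data type, value domain."""
    name: str
    definition: str
    data_type: type
    domain: frozenset

    def violations(self, value):
        """Return the reasons this value fails the definition (empty if none)."""
        problems = []
        if not isinstance(value, self.data_type):
            problems.append(f"{self.name}: expected {self.data_type.__name__}")
        elif value not in self.domain:
            problems.append(f"{self.name}: {value!r} outside acceptable domain")
        return problems

# Hypothetical element with a deliberately small domain.
STATE = DataElement(
    "state", "Two-letter postal abbreviation", str, frozenset({"VA", "MD", "DC"})
)

def validate(record: dict, elements) -> list:
    """Validate one data instance against every defined element."""
    issues = []
    for element in elements:
        issues.extend(element.violations(record.get(element.name)))
    return issues
```

The list returned by `validate` is the raw material for principle 12: measured violations can be counted and reported to the responsible stewards.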

Page 70: Data-Ed Engineering Solutions to Data Quality Challenges

Interdependencies - Tools alone cannot do the job!

• Data Quality Tools (Technology)
• Data Cleansing and Prevention (Process)
• Education and Training (People)

Page 71: Data-Ed Engineering Solutions to Data Quality Challenges

Summary: Data Quality Engineering

from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International

Page 72: Data-Ed Engineering Solutions to Data Quality Challenges

Outline

1. Data Management Introduction
2. Data Quality Definitions & Overview
3. DQM Cycle
4. DQ Awareness & Requirements
5. DQ Dimensions
6. Data Quality Tools
7. Guiding Principles
8. References and Q&A

Tweeting now: #dataed

Page 73: Data-Ed Engineering Solutions to Data Quality Challenges

Recommended Reading

Page 74: Data-Ed Engineering Solutions to Data Quality Challenges

Questions?

It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.

Page 75: Data-Ed Engineering Solutions to Data Quality Challenges

Upcoming Events

November Webinar: Get the Most Out of Your Tools: Data Management Technologies
November 13, 2012 @ 2:00 PM – 3:30 PM ET (11:00 AM-12:30 PM PT)

December Webinar: Show Me the Money: The Business Value of Data and ROI
December 11, 2012 @ 2:00 PM – 3:30 PM ET (11:00 AM-12:30 PM PT)

Sign up here:
• www.datablueprint.com/webinar-schedule
• www.Dataversity.net

Brought to you by: