Conducting an Effective Data Quality Software Evaluation

Now that you have defined your DQ project scope, it is important to evaluate the vendors and tools available to cleanse your data. This short guide will help you create and evaluate a short list of vendors, develop your sample data, and interpret the results returned by each vendor. It includes tips for developing sample data that exercises fuzzy matching and company name matching. It also discusses ways to evaluate specific vendors and their tools, and how best to interpret the results gathered from them. With these steps completed, you will be well placed to select the best vendor for your data quality project.

Intro

Conduct an effective Data Quality system evaluation by following these guidelines:

Create a short list of suppliers to consider (include yourself!)

Develop your samples for trial with data from your databases

Engage the supplier to help you through the trial process

Evaluate the suppliers and their tools according to your business requirements

Create Your Short List

This sounds easy!

In reality, the data quality industry is saturated with white papers, webinars, YouTube channels and more.

All contain different messages, focus areas, product features and terminology.

Use some of the following best practices to narrow the field to a short list that is practical to evaluate

Create Your Short List

Finding the suppliers

Vary your search terms, because different suppliers use different terminology for the same things.

Search for user groups, blogs, and analyst pages as well

Function First

Start your initial review by going over your Functional Requirements and choosing suppliers that can fill those needs.

Don’t worry at this point about finding a supplier that does everything under one roof – that can be a deciding factor later on.

Create Your Short List

Features Second

Start narrowing down your list of suppliers based on the specific features within each category.

Now is the time to weigh your Needs vs Wants and drop any supplier that cannot meet your basic needs

Cross-Reference the Buzz

Industry buzz can be used to eliminate companies that have consistently bad press or strongly negative customer reviews.

The very best product for the job may not be the one with the most eye-catching cover!

Create Your Short List

Add Yourself to the Short List

At some point, someone will suggest internally that you already have the resources needed to handle the job

Look at this step proactively as though you are one of the suppliers

Evaluate your potential to carry out quality initiatives

Develop Your Sample Data

The first word of advice – Use Real Data

Many software trials come preinstalled with sample or demo data

Results from that data will not be a clear reflection of your own match results!

Use the following guidelines to develop a real data set that is representative of your common database challenges

In trials, the matchIT API found 80% more accurate matches to the MPS file than Experian Intact’s Bureau

Develop Your Sample Data

For fuzzy matching, does your test data include these scenarios?

Phonetic matches (e.g. Naughton and Norton)

Reading errors (e.g. Norton and Horton)

Typing errors (e.g. Notron, Noron, Nortopn and Norton)

Titles and initials (e.g. Mr. J. Smith, Mr. J R Smith and John Smith)

Reversed names (e.g. John Smith and Smith, John)

One record has missing address elements (e.g. suite number or a locality within a city)

One record has the full postal code and the other a partial postal code or no postal code at all
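One way to check that your test file really contains the scenarios above is to score candidate name pairs with a simple edit distance. The following is a minimal Python sketch for illustration only; it is not taken from any supplier's tool, and the function names levenshtein and normalize_name_order are our own. It suggests that reading and typing errors sit within one or two edits of the original, while a phonetic pair such as Naughton and Norton does not, so phonetic cases have to be selected by a different test.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes and substitutions
    needed to turn a into b (case-insensitive)."""
    a, b = a.lower(), b.lower()
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalize_name_order(name: str) -> str:
    """Rewrite 'Smith, John' as 'John Smith'; other names are only trimmed."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}"
    return name.strip()


# Surname variants from the list above, compared with "Norton".
for variant in ["Horton", "Notron", "Noron", "Nortopn", "Naughton"]:
    print(variant, levenshtein("Norton", variant))

print(normalize_name_order("Smith, John"))
```

Note that this plain edit distance counts a swapped pair of letters (Notron) as two edits. A supplier's matching engine will normally use more sophisticated scoring, so treat the numbers only as a way to sort your own test cases.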

Develop Your Sample Data

When selecting company names data, consider including the following challenges:

Acronyms (e.g. IBM, I B M, I.B.M., International Business Machines)

One record has missing name elements e.g.

The Crescent Hotel, Crescent Hotel

Breeze Ltd, Breeze

Deloitte & Touche, Deloitte, Deloittes

helpIT systems, helpIT, helpIT systems inc., helpIT Group
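To see which of these company name variants are easy and which are genuinely hard, you can reduce each name to a crude comparison key. The sketch below is an assumption-laden illustration in Python: the function company_key and the LEGAL_SUFFIXES list are hypothetical and deliberately small. Variants that share a key differ only in punctuation, spacing, a leading "The" or a legal suffix; variants that still differ are the ones that test a supplier's fuzzy matching.

```python
import re

# Assumed, deliberately short list of legal suffixes; extend it for your data.
LEGAL_SUFFIXES = {"ltd", "inc", "llc", "plc", "corp", "co"}


def company_key(name: str) -> str:
    """Reduce a company name to a crude key for spotting trivial variants."""
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]", " ", text)  # punctuation becomes spaces
    words = text.split()
    if words and words[0] == "the":
        words = words[1:]                    # drop a leading "The"
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()                          # drop trailing Ltd, Inc, ...
    # Join runs of single letters so "i b m" becomes "ibm".
    merged, run = [], ""
    for word in words:
        if len(word) == 1:
            run += word
        else:
            if run:
                merged.append(run)
                run = ""
            merged.append(word)
    if run:
        merged.append(run)
    return " ".join(merged)


for name in ["IBM", "I B M", "I.B.M.", "International Business Machines",
             "The Crescent Hotel", "Crescent Hotel", "Breeze Ltd", "Breeze",
             "helpIT systems inc.", "helpIT Group"]:
    print(f"{name!r:40} -> {company_key(name)!r}")
```

In this sketch the three IBM spellings collapse to one key, but International Business Machines does not, and helpIT Group does not share a key with helpIT systems inc. Those remaining differences are the cases worth keeping in your sample.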

Develop Your Sample Data

Ensure that you have groups of records in which the data that matches exactly varies from pair to pair within the group

| URN | Name | Email | Telephone |
| 101 | John Smith | [email protected] | |
| | John Smith | [email protected] | 211-456-8352 |
| 298 | John Smith | | 211-456-8352 |
| 144 | John Smith | [email protected] | 211-456-8352 |

There are two clusters here, one containing three records with the same email address, and another one containing three records with the same phone number.

| URN | Name | Email | Telephone |
| 101 | Juan Marcos | [email protected] | 646-498-3055 |
| 144 | Juan Marcos | [email protected] | 211-456-8352 |
| 298 | Juan Marcos | [email protected] | 646-498-3055 |
| 144 | Juan Marcos | [email protected] | 211-456-8352 |

In both of these examples, clusters based on email address and the clusters based on phone number should all be grouped into one set by the matching software.
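The grouping described here is transitive: if record A shares an email with B, and B shares a phone number with C, then A, B and C belong in one set even though A and C have nothing in common. The Python sketch below shows one common way to compute such groups (a union-find structure). It is an illustration under our own assumptions, not a description of how any particular supplier's software works; the function cluster_records, the field names and every value in the sample records are invented placeholders.

```python
from collections import defaultdict


def cluster_records(records, keys=("email", "phone")):
    """Group records that are linked, directly or indirectly, by an
    identical non-empty value in any of the given fields."""
    parent = {r["urn"]: r["urn"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> URN of the first record holding it
    for r in records:
        for key in keys:
            value = r.get(key)
            if not value:
                continue  # a missing value never links two records
            if (key, value) in first_seen:
                union(first_seen[(key, value)], r["urn"])
            else:
                first_seen[(key, value)] = r["urn"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["urn"])].append(r["urn"])
    return sorted(sorted(group) for group in groups.values())


# Placeholder records (hypothetical URNs and values, not the slide's data).
records = [
    {"urn": "A", "email": "email-1", "phone": None},
    {"urn": "B", "email": "email-1", "phone": "phone-1"},
    {"urn": "C", "email": None, "phone": "phone-1"},
    {"urn": "D", "email": "email-2", "phone": "phone-2"},
]
print(cluster_records(records))
```

When you review a supplier's output for data like this, check whether A, B and C come back as a single set or as two overlapping pairs.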

Develop Your Sample Data

If you don’t have these scenarios, you can doctor your real data to create them

Make sure to start with real records that are as close as possible to the test cases

Don’t make more than one change to a genuine record to create the test record, otherwise your data becomes too artificial

Work with a reasonable size sample rather than a whole database or file

Take two selections from your data e.g. one for a specific postal code or geographical area, and one with an alphabetical range by last name

Join the selections and eliminate all of the exact matches

In the end you should have:

A reasonably sized sample, with few obvious matches and a reasonable number of fuzzier matches
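The selection steps above can be sketched in a few lines of Python. This is a rough illustration only: the function build_sample and the field names (urn, first_name, last_name, postal_code) are assumptions about how your records might look, and "exact match" is taken here to mean identical name and postal code after trimming and lower-casing, with one copy of each kept.

```python
def build_sample(records, postal_prefix, name_range):
    """Join a postal-code selection and a last-name selection, keeping one
    copy of each exact match. Field names are hypothetical."""
    low, high = name_range
    by_area = [r for r in records
               if r["postal_code"].startswith(postal_prefix)]
    by_name = [r for r in records
               if low <= r["last_name"][:1].upper() <= high]

    seen = set()
    sample = []
    for r in by_area + by_name:
        # Exact-match key: the compared fields, trimmed and lower-cased.
        key = tuple(r[f].strip().lower()
                    for f in ("first_name", "last_name", "postal_code"))
        if key in seen:
            continue  # an obvious match (or the same record selected twice)
        seen.add(key)
        sample.append(r)
    return sample


# Invented records for illustration.
records = [
    {"urn": 1, "first_name": "John", "last_name": "Smith", "postal_code": "10536"},
    {"urn": 2, "first_name": "John", "last_name": "Smith", "postal_code": "10536"},
    {"urn": 3, "first_name": "Jon", "last_name": "Smith", "postal_code": "10536"},
    {"urn": 4, "first_name": "Ann", "last_name": "Brown", "postal_code": "90210"},
    {"urn": 5, "first_name": "Zoe", "last_name": "Young", "postal_code": "90210"},
]
sample = build_sample(records, postal_prefix="105", name_range=("A", "C"))
print([r["urn"] for r in sample])
```

In this invented data, record 2 is dropped as an exact copy of record 1, while the fuzzier John/Jon pair stays in the sample for the suppliers to find.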

Evaluate Specific Suppliers and Tools

Save time in the initial software trial – don’t try to get to grips with a whole plethora of features and options

Engage a knowledgeable salesperson and have them walk you through the software

This will also give you a feel for the supplier and for the level of support they are likely to provide in the future

| Supplier | Tool(s) | Rep | Contact Info |

Interpret the Results

A simple count of duplicates, suppressions or addresses verified is meaningless

It is the number of true and false matches that is significant

The next slide shows you how to measure this

From here your business requirements will dictate what criteria you use to determine which system is best for you

It is likely that you will need to involve the supplier’s support team to tune the matching rules

Compare Matching Results

When deduping, suppressing or matching across files, an effective way of comparing results from two systems is as follows:

1. Remove all matches from the file to be cleaned using system A.

2. Perform the same level of matching using system B, and see what matches system B finds in the supposedly ‘clean’ file.

3. Review a reasonable number of matches found by system B but not found by system A and count how many are true, false or indeterminate

4. Repeat this process the other way around i.e. clean the raw file using system B first and then see what matches system A finds in the ‘clean’ file

5. Count the number of true, false, and indeterminate matches in this file

6. Compare the counts in the two ‘clean’ files
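The six steps above can be modelled in a short Python sketch. Everything in it is a stand-in: exact_matcher and prefix_matcher are toy functions playing the roles of system A and system B, the records are invented, and the verdicts dictionary represents the manual review you would carry out in steps 3 and 5. Real systems would be run through their own interfaces, with only the bookkeeping done this way.

```python
from collections import Counter
from itertools import combinations


def exact_matcher(records):
    """Toy 'system A': pairs whose names are equal, ignoring case."""
    return {(a[0], b[0]) for a, b in combinations(records, 2)
            if a[1].lower() == b[1].lower()}


def prefix_matcher(records):
    """Toy 'system B': pairs whose names share their first four letters."""
    return {(a[0], b[0]) for a, b in combinations(records, 2)
            if a[1][:4].lower() == b[1][:4].lower()}


def clean(records, matcher):
    """Steps 1 and 4: remove the later record of every matched pair."""
    dropped = {later for _, later in matcher(records)}
    return [r for r in records if r[0] not in dropped]


def residual_matches(records, first, second):
    """Steps 2 and 4: what 'second' still finds after 'first' has cleaned."""
    return second(clean(records, first))


def tally(pairs, verdicts):
    """Steps 3 and 5: count reviewed pairs as true, false or indeterminate.
    Pairs with no recorded verdict are counted as indeterminate."""
    return Counter(verdicts.get(pair, "indeterminate") for pair in pairs)


# Invented records: (URN, surname).
records = [(1, "Norton"), (2, "norton"), (3, "Nortan"), (4, "Smith")]

b_after_a = residual_matches(records, exact_matcher, prefix_matcher)
a_after_b = residual_matches(records, prefix_matcher, exact_matcher)

# Hypothetical reviewer verdicts for the pairs found.
verdicts = {(1, 3): "true"}
print("B found in A's clean file:", tally(b_after_a, verdicts))
print("A found in B's clean file:", tally(a_after_b, verdicts))
```

For step 6, compare the two tallies: a system that leaves many true matches behind for the other to find is missing duplicates, while a system whose extra matches are mostly false is overmatching.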

Compare Results – Final Checks

When matching to a PAF (Postcode Address File) for address verification, you can look up the addresses that have been matched by one system but not the other, for example on the national postal operator's website

One final evaluation trick is to swap the demo data supplied with the software: match system A's demo file using system B, and system B's demo file using system A

These tests are easier to conduct when you have reduced your short list to two solutions

Keep These Things In Mind

Create a short list of suppliers who will be able to help you with your particular project goals

When developing your sample data remember to use real data as much as possible. If you cannot find our suggested samples in your data, create those samples from the data you have by altering one or two factors

It is highly recommended that you talk to the supplier when you are conducting your software trials

Contact helpIT

US HEADQUARTERS (The Americas, Australia, New Zealand)

helpIT systems inc.
51 Bedford Rd., Suite 9
Katonah, NY 10536
United States

US Toll Free: 866.332.7132
US Local: 914.600.7240
Australia: +61 280363191
Fax: 914.232.1429
Email: [email protected]

TECHNICAL SUPPORT
Support: 866.matchIT
Email: [email protected]

EUROPEAN HEADQUARTERS (UK, Europe, Asia)

helpIT systems ltd.
15-17 The Crescent
Leatherhead
Surrey KT22 8DY
United Kingdom

Tel: +44 (0) 1372 360070
Fax: +44 (0) 1372 360081
Email: [email protected]

TECHNICAL SUPPORT
Support: +44 (0) 1372 225904
Email: [email protected]

Registered in England
Registered Office: as above
Company No. 02007292
VAT No. 564228340