Conducting an Effective Data Quality Software Evaluation
![Page 1: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/1.jpg)
Conducting an Effective DQ Software Evaluation
![Page 2: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/2.jpg)
Intro
Conduct an effective Data Quality system evaluation by following these guidelines:
Create a short list of suppliers to consider (include yourself!)
Develop your samples for trial with data from your databases
Engage the supplier to help you through the trial process
Evaluate the suppliers and their tools according to your business requirements
![Page 3: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/3.jpg)
Create Your Short List
This sounds easy!
In reality, the current data quality industry is saturated with White Papers, Webinars, YouTube Channels, etc.
All contain different messages, focus areas, product features and terminology.
Employ some of the following best practices to narrow down to a reasonable short list that is optimal for evaluation
![Page 4: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/4.jpg)
Create Your Short List
Finding the suppliers
Utilize varying search terms because different suppliers use different terminology interchangeably.
Search for user groups, blogs, and analyst pages as well
Function First
Start your initial review by going over your Functional Requirements and choosing suppliers that can fill those needs.
Don’t worry at this point about finding a supplier that does everything under one roof – that can be a deciding factor later on.
![Page 5: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/5.jpg)
Create Your Short List
Features Second
Start narrowing down your list of suppliers based on the specific features within each category.
Now is the time to consider your Needs vs Wants and abandon anyone who cannot service the basic necessities
Cross-Reference the Buzz
Industry hype can be used to eliminate companies from the competition based on awful press or truly negative customer reviews.
The very best product for the job may not be the one with the most eye-catching cover!
![Page 6: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/6.jpg)
Create Your Short List
Add Yourself to the Short List
At some point, someone will suggest internally that you already have the resources needed to handle the job
Look at this step proactively as though you are one of the suppliers
Evaluate your potential to carry out quality initiatives
![Page 7: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/7.jpg)
Develop Your Sample Data
The first word of advice – Use Real Data
Many software trials come preinstalled with sample or demo data
Results obtained on that demo data will not be a clear reflection of the match results you will get on your own data!
Use the following guidelines to develop a real data set that is representative of your common database challenges
![Page 8: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/8.jpg)
In trials, the matchIT API found 80% more accurate matches to the MPS file than Experian Intact’s Bureau
Develop Your Sample Data
For fuzzy matching, does your test data include these scenarios?
Phonetic matches (e.g. Naughton and Norton)
Reading errors (e.g. Norton and Horton)
Typing errors (e.g. Notron, Noron, Nortopn and Norton)
Initials and titles versus full names (e.g. Mr. J. Smith, Mr. J R Smith and John Smith)
Reversed names (e.g. John Smith and Smith, John)
One record has missing address elements (e.g. suite number or a locality within a city)
One record has the full postal code and the other a partial postal code or no postal code at all
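As a rough way to check that a test set really contains the harder cases, the example pairs above can be scored with a naive string similarity. This is only a baseline sketch using Python's standard `difflib`; the function names and the list of titles are assumptions made for illustration, and it does not represent how any supplier's matching engine works. Pairs that score low here (the phonetic pair, for instance) are the ones most likely to separate a strong matching tool from a weak one.

```python
from difflib import SequenceMatcher

# Titles to ignore when comparing names (an assumed, incomplete list).
TITLES = {"mr", "mrs", "ms", "dr"}

def normalise_name(name: str) -> str:
    """Lower-case, drop titles and dots, and undo 'Last, First' ordering."""
    name = name.strip().lower()
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first.strip()} {last.strip()}"
    tokens = [t.strip(".") for t in name.split()]
    return " ".join(t for t in tokens if t and t not in TITLES)

def similarity(a: str, b: str) -> float:
    """Naive character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()

# Example pairs taken from the scenarios listed above.
SCENARIOS = {
    "phonetic": ("Naughton", "Norton"),
    "reading error": ("Norton", "Horton"),
    "typing error": ("Notron", "Norton"),
    "initial vs full": ("Mr. J. Smith", "John Smith"),
    "reversed": ("John Smith", "Smith, John"),
}

def score_scenarios(scenarios=SCENARIOS):
    """Return a {scenario label: similarity score} mapping."""
    return {label: similarity(a, b) for label, (a, b) in scenarios.items()}

if __name__ == "__main__":
    for label, score in score_scenarios().items():
        print(f"{label:16s} {score:.2f}")
```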
![Page 9: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/9.jpg)
Develop Your Sample Data
When selecting company names data, consider including the following challenges:
Acronyms (e.g. IBM, I B M, I.B.M., International Business Machines)
One record has missing name elements e.g.
The Crescent Hotel, Crescent Hotel
Breeze Ltd, Breeze
Deloitte & Touche, Deloitte, Deloittes
helpIT systems, helpIT, helpIT systems inc., helpIT Group
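To see which of these company-name variations are trivial and which need genuine fuzzy matching, a simple normalisation key can be applied to the sample first. The sketch below is an illustration under stated assumptions: the `NOISE_WORDS` list and the `company_key` function are hypothetical, not part of any product. It collapses spaced or dotted acronyms and drops common noise words, but it deliberately does not handle cases such as "Deloitte & Touche" versus "Deloittes" or "International Business Machines" versus "IBM", which are left for the software under evaluation to resolve.

```python
import re

# Words ignored when building the key (an assumed list for illustration).
NOISE_WORDS = {"the", "ltd", "inc", "llc", "plc", "group", "systems"}

def company_key(name: str) -> str:
    """Build a crude comparison key for a company name."""
    # Remove dots so "I.B.M." becomes "ibm", then split on non-alphanumerics.
    tokens = re.split(r"[^a-z0-9]+", name.lower().replace(".", ""))
    tokens = [t for t in tokens if t and t not in NOISE_WORDS]
    # Join runs of single letters so "I B M" also becomes "ibm".
    if tokens and all(len(t) == 1 for t in tokens):
        return "".join(tokens)
    return " ".join(tokens)

if __name__ == "__main__":
    for name in ["IBM", "I B M", "I.B.M.", "The Crescent Hotel",
                 "Breeze Ltd", "helpIT systems inc.", "Deloitte & Touche"]:
        print(f"{name!r:28} -> {company_key(name)!r}")
```

Records that share a key are the "obvious" matches; the pairs that still differ after normalisation are the ones worth keeping in the trial sample.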
![Page 10: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/10.jpg)
Develop Your Sample Data
Ensure that you have groups of records where the field that matches exactly varies from pair to pair within the group

| URN | Name | Email | Telephone |
| --- | --- | --- | --- |
| 101 | John Smith | [email protected] | |
| | John Smith | [email protected] | 211-456-8352 |
| 298 | John Smith | | 211-456-8352 |
| 144 | John Smith | [email protected] | 211-456-8352 |

There are two clusters here, one containing three records with the same email address, and another one containing three records with the same phone number.

| URN | Name | Email | Telephone |
| --- | --- | --- | --- |
| 101 | Juan Marcos | [email protected] | 646-498-3055 |
| 144 | Juan Marcos | [email protected] | 211-456-8352 |
| 298 | Juan Marcos | [email protected] | 646-498-3055 |
| 144 | Juan Marcos | [email protected] | 211-456-8352 |

In both of these examples, clusters based on email address and the clusters based on phone number should all be grouped into one set by the matching software.
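The expected behaviour described above is transitive grouping: if record A shares an email with B, and B shares a phone number with C, all three belong in one set. A minimal sketch of that idea using a union-find structure is shown below, so you can compute the grouping you expect before comparing it with what a tool returns. The field names, the example email address and the `cluster_records` function are assumptions for illustration; real matching software will also apply fuzzy comparison rather than exact equality.

```python
from collections import defaultdict

def cluster_records(records, keys=("email", "phone")):
    """Group record indices that are linked, directly or indirectly,
    by an identical non-empty value in any of the given keys."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # (key, value) -> index of the first record holding it
    for idx, rec in enumerate(records):
        for key in keys:
            value = rec.get(key)
            if not value:
                continue  # a missing value never links records
            if (key, value) in first_seen:
                union(idx, first_seen[(key, value)])
            else:
                first_seen[(key, value)] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())

if __name__ == "__main__":
    # Same shape as the first table above; the email value is a placeholder.
    sample = [
        {"name": "John Smith", "email": "john@example.com", "phone": ""},
        {"name": "John Smith", "email": "john@example.com", "phone": "211-456-8352"},
        {"name": "John Smith", "email": "", "phone": "211-456-8352"},
        {"name": "John Smith", "email": "john@example.com", "phone": "211-456-8352"},
    ]
    print(cluster_records(sample))
```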
![Page 11: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/11.jpg)
Develop Your Sample Data
If you don’t have these scenarios, you can doctor your real data to create them
Make sure to start with real records that are as close as possible to the test cases
Don’t make more than one change to a genuine record to create the test record, otherwise your data becomes too artificial
Work with a reasonable size sample rather than a whole database or file
Take two selections from your data e.g. one for a specific postal code or geographical area, and one with an alphabetical range by last name
Join the selections and eliminate all of the exact matches
In the end you should have:
A reasonable size sample, without so many obvious matches, and a reasonable number of fuzzier matches
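The sampling steps above (two selections, joined, with exact matches removed) can be sketched in a few lines. This is an illustrative outline only: the field names, the `build_sample` function and the choice to keep one copy of each exact duplicate are assumptions, and in practice the selections would usually be done with queries against your own database.

```python
# Fields compared when deciding whether two records are exact matches
# (hypothetical column names).
FIELDS = ("first_name", "last_name", "address", "postcode")

def build_sample(records, postal_prefix, name_range, fields=FIELDS):
    """Combine a postal-code selection with a last-name range selection
    and keep only the first copy of each exactly matching record."""
    lo, hi = name_range
    by_area = [r for r in records if r["postcode"].startswith(postal_prefix)]
    by_name = [r for r in records if lo <= r["last_name"][:1].upper() <= hi]

    seen, sample = set(), []
    for rec in by_area + by_name:
        fingerprint = tuple(rec[f] for f in fields)
        if fingerprint not in seen:  # skip exact matches already kept
            seen.add(fingerprint)
            sample.append(rec)
    return sample

if __name__ == "__main__":
    data = [
        {"id": 1, "first_name": "John", "last_name": "Smith",
         "address": "1 High St", "postcode": "KT22 8DY"},
        {"id": 2, "first_name": "John", "last_name": "Smith",
         "address": "1 High St", "postcode": "KT22 8DY"},  # exact duplicate
        {"id": 3, "first_name": "Jon", "last_name": "Smith",
         "address": "1 High St", "postcode": "KT22 8DY"},  # fuzzy duplicate
        {"id": 4, "first_name": "Ann", "last_name": "Norton",
         "address": "5 Low Rd", "postcode": "10536"},
    ]
    print([r["id"] for r in build_sample(data, "KT22", ("N", "S"))])
```

Removing the exact matches keeps the sample small while leaving the fuzzier pairs, such as records 1 and 3 here, for the software to find.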
![Page 12: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/12.jpg)
Evaluate Specific Suppliers and Tools
Save time in the initial software trial – don’t try to get to grips with a whole plethora of features and options
Engage a knowledgeable salesperson and have them walk you through the software
This will give you experience of the supplier, and the level of support they will provide you in the future
Supplier | Tool(s) | Rep | Contact Info
![Page 13: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/13.jpg)
Interpret the Results
A simple count of duplicates, suppressions or addresses verified is meaningless
It is the number of true and false matches that is significant
The next slide shows you how to measure this
From here your business requirements will dictate what criteria you use to determine which system is best for you
It is likely that you will need to involve the supplier’s support team to tune the matching rules
![Page 14: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/14.jpg)
Compare Matching Results
When deduping, suppressing or matching across files, an effective way of comparing results from two systems is as follows:
1. Remove all matches from the file to be cleaned using system A.
2. Perform the same level of matching using system B, and see what matches system B finds in the supposedly ‘clean’ file.
3. Review a reasonable number of matches found by system B but not found by system A and count how many are true, false or indeterminate
4. Repeat this process the other way around i.e. clean the raw file using system B first and then see what matches system A finds in the ‘clean’ file
5. Count the number of true, false, and indeterminate matches in this file
6. Compare the counts in the two ‘clean’ files
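The bookkeeping for the six steps above can be sketched as follows. This is a simplification under stated assumptions: each system's output is represented as a list of matched record-ID pairs, and "matches B finds in the file cleaned by A" is approximated by the pairs B reports that A did not. A real trial would rerun system B on the cleaned file, which may behave slightly differently. The function names and verdict labels are hypothetical.

```python
from collections import Counter

VERDICTS = ("true", "false", "indeterminate", "unreviewed")

def as_pairs(matches):
    """Normalise matched ID pairs so (1, 2) and (2, 1) compare equal."""
    return {frozenset(pair) for pair in matches}

def residual_matches(cleaner, checker):
    """Pairs reported by the checking system that the cleaning system missed."""
    return as_pairs(checker) - as_pairs(cleaner)

def tally(pairs, verdicts):
    """Count manual review verdicts for a set of residual pairs."""
    counts = Counter(verdicts.get(pair, "unreviewed") for pair in pairs)
    return {label: counts.get(label, 0) for label in VERDICTS}

if __name__ == "__main__":
    system_a = [(1, 2), (3, 4)]
    system_b = [(2, 1), (5, 6), (7, 8)]
    # Verdicts come from a person reviewing each residual pair.
    review = {
        frozenset((5, 6)): "true",
        frozenset((7, 8)): "false",
        frozenset((3, 4)): "indeterminate",
    }
    print("B after A:", tally(residual_matches(system_a, system_b), review))
    print("A after B:", tally(residual_matches(system_b, system_a), review))
```

The system that leaves fewer true matches behind for the other to find, while contributing few false ones itself, is the stronger matcher on your data.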
![Page 15: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/15.jpg)
Compare Results – Final Checks
When matching to a PAF file for address verification, you can look up the addresses that have been matched by one system but not the other using e.g. the national postal company website
One final evaluation trick is to run both system A and system B against the demo data file supplied with each product, so that each system is also tested on the other's data
These tests are easier to conduct when you have reduced your short list to two solutions
![Page 16: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/16.jpg)
Keep These Things In Mind
Create a short list of suppliers who will be able to help you with your particular project goals
When developing your sample data remember to use real data as much as possible. If you cannot find the suggested scenarios in your data, create them from the data you have by making a single change to a genuine record
It is highly recommended that you talk to the supplier when you are conducting your software trials
![Page 17: Conducting an Effective Data Quality Software Evaluation](https://reader036.fdocuments.in/reader036/viewer/2022082920/554f90e3b4c905d25b8b51d0/html5/thumbnails/17.jpg)
Contact helpIT
US HEADQUARTERS (The Americas, Australia, New Zealand)
helpIT systems inc.
51 Bedford Rd.
Suite 9
Katonah, NY 10536
United States
US Toll Free: 866.332.7132
US Local: 914.600.7240
Australia: +61 280363191
Fax: 914.232.1429
Email: [email protected]
TECHNICAL SUPPORT
Support: 866.matchIT
Email: [email protected]
EUROPEAN HEADQUARTERS (UK, Europe, Asia)
helpIT systems ltd.
15-17 The Crescent
LEATHERHEAD
Surrey
KT22 8DY
United Kingdom
Tel: +44 (0) 1372 360070
Fax: +44 (0) 1372 360081
Email: [email protected]
TECHNICAL SUPPORT
Support: +44 (0) 1372 225904
Email: [email protected]
Registered in England
Registered Office: as above
Company No. 02007292
VAT No. 564228340