How to Evaluate the Accuracy of Address Records
8/2/2019 How to Evaluate the Accuracy of Address Records
© 2011 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means
(electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.
Abstract
The Mailability Score, Match Code, and Result Percentage ports on the Address Validator transformation provide you with
general information about the deliverability and accuracy of address data. This article tells you how to use the ports to
evaluate the data quality of address records. This article also shows you how to simplify the output codes from the ports so
that they are easy to understand.
Supported Versions
Informatica Data Quality 9.1.0
Table of Contents
Overview
When to Use the Status Info Ports
Status Info Port Definitions
Using Mapplet Rules to Read Status Info Port Outputs
Installing the Core Accelerator
Rule Descriptions
How to Read Mailability Score Output Codes
How to Read Match Code Output Codes
Conclusion
Overview
The Address Validator transformation uses reference data to evaluate the accuracy and deliverability of postal addresses.
The transformation can correct errors in an address, add data to an address, and provide status information on address data
quality.
For example, the United States Postal Service (USPS) provides reference data that identifies every mailbox in the United
States. When the Address Validator transformation reads a United States address record, it compares the record
with reference data that the USPS provides to Informatica.
The Address Validator transformation handles the input addresses in the following ways:
- If the transformation finds a perfect match between the input address and the USPS reference data, it writes the
address information to the output ports with no change.
- If the transformation finds a partial match in the reference data, it selects the correct address elements from the
reference data and writes the correct elements to the output ports.
- If the transformation cannot find a match in the reference data, it attempts to write the correct form of each input
element to the output ports. The resulting record may not contain a deliverable address.
In each case, the transformation writes the results of the matching operation to status ports that indicate the data quality of
the address.
The following illustration shows the Status Info port group on the Templates tab:
Note: The Status Info port group includes the Address Type port. This output port describes the type of mailbox in a United
States or Canadian address. The Address Type port does not contain information about the deliverable status of the address.
When to Use the Status Info Ports
The Mailability Score, Match Code, and Result Percentage ports provide summary information about the data quality of each
address in the data set. The Element Input Status, Element Relevance, and Element Result Status ports provide detailed
information on each element in each address record.
The Mailability Score, Match Code, and Result Percentage outputs are useful indicators of whether you need to define an
address validation stage in a data project.
If the Mailability Score, Match Code, and Result Percentage outputs indicate that all addresses meet your data quality
standards, you do not need to perform additional address validation.
If you find one or more addresses that are not valid, review the outputs on the Element Input Status, Element Relevance,
and Element Result Status ports. Use the ports to identify the address elements that you need to fix.
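The triage described above can be sketched in a few lines of Python. This is only an illustration, not part of the Informatica product: it assumes that the Status Info outputs were written to a target and loaded as rows, and the column names (`mailability_score`, `match_code`), the threshold, and the set of accepted match codes are all hypothetical values that you would replace with your own data quality standards.

```python
from typing import Dict, List

# Hypothetical thresholds: replace these with your own data quality standards.
MIN_MAILABILITY = 4
ACCEPTED_MATCH_CODES = {"V4", "V3", "C4"}


def needs_element_review(row: Dict[str, str]) -> bool:
    """Return True if the summary ports suggest the address needs a closer look."""
    score = int(row["mailability_score"])
    return score < MIN_MAILABILITY or row["match_code"] not in ACCEPTED_MATCH_CODES


# Sample rows with assumed column names; real data would come from the mapping target.
rows: List[Dict[str, str]] = [
    {"id": "1", "mailability_score": "5", "match_code": "V4"},
    {"id": "2", "mailability_score": "2", "match_code": "I3"},
    {"id": "3", "mailability_score": "4", "match_code": "C4"},
]

# Records in this list are candidates for review on the element-level ports.
flagged = [row["id"] for row in rows if needs_element_review(row)]
print(flagged)
```

Only the flagged records would then need review on the Element Input Status, Element Relevance, and Element Result Status ports.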
Status Info Port Definitions
Each Status Info port provides a different type of information about an address record.
The Mailability Score, Match Code, and Result Percentage ports perform the following types of analysis:
Mailability Score
This port describes the likely outcome of any attempt to deliver mail to the address. The Mailability Score output is
a text description that summarizes the quality of the address in terms of the risk to mail delivery.
Select this port when you need general information on the quality of the input data that you connect to the Address
Validator transformation.
Match Code
This port describes the results of the validation operation that the Address Validator performed on the input
address. The Match Code output is a two-character string that represents the success or failure of the operation.
Select this port to identify output address records that are valid or not valid.
Result Percentage
This port indicates the degree of overall similarity between an input address and the address validation results.
The Result Percentage output is a percentage value. Higher percentage values indicate greater similarity between
the input and output address.
Select this port to identify address records that changed during address validation and to review the extent of the
changes.
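As a rough illustration of how Result Percentage values might be used to review changed records, the following Python sketch sorts records so that the most heavily changed addresses come first. The record identifiers, the string format of the percentage, and the assumption that a value of 100 means the input and output are identical are all hypothetical; check the actual port output in your environment.

```python
from typing import List, Tuple


def parse_percentage(value: str) -> float:
    """Parse a value such as '87.5' or '87.5%' into a float (the format is an assumption)."""
    return float(value.strip().rstrip("%"))


# Hypothetical (record id, Result Percentage) pairs.
records: List[Tuple[str, str]] = [("A1", "100"), ("A2", "72.5"), ("A3", "95%")]

# Keep records that changed during validation, lowest similarity first.
changed = sorted(
    ((rid, parse_percentage(pct)) for rid, pct in records if parse_percentage(pct) < 100.0),
    key=lambda item: item[1],
)
print(changed)
```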
Using Mapplet Rules to Read Status Info Port Outputs
The Mailability Score and Match Code ports provide information about address quality in coded format. You can use
Informatica mapplet rules to parse information from the output codes.
The rules use reference tables to convert each code value into an English-language equivalent.
The rules and reference tables are part of the Core Accelerator, which is available to Data Quality customers. The Core
Accelerator contains a rule for each port on the Status Info output group except the Result Percentage port. Result
Percentage does not write coded output.
To use an accelerator rule in an address validation mapping, complete the following steps:
1. Download the accelerator, and import the accelerator objects to the Model repository.
2. Add a rule to an address validation mapping. The mapping must read address records from a data object, and it
must contain an Address Validator transformation.
3. Connect a Status Info port on the Address Validator transformation to the rule you added to the mapping. The rule
name contains the name of the port that you connect to.
4. Run the mapping, or run the Data Viewer on the Address Validator transformation. If you run the mapping, add a
writable data object as a target and connect the rule output to the data object.
5. Read the rule output. If you ran a mapping, open the data object. If you ran the Data Viewer, resize the Developer
tool so that the Data Viewer columns are visible.
6. Evaluate the rule output, and decide the next steps you need to take for the address data.
Installing the Core Accelerator
You download the Core Accelerator with the Data Quality Content Installer. You find the accelerator object XML file and
reference table ZIP file in the Accelerator_Content directory of the Content Installer package.
Use the Developer tool to import the accelerator rules to the Model repository.
The rules and reference tables appear in the Model repository in the project you specified during import. Find the rules you
need for the status information ports in this repository location:
[Content_Project_Name]\[Rule_Folder_Name]\General_Data_Cleansing
The reference tables install to this location:
[Content_Project_Name]\[Rule_Folder_Name]\Dictionaries
You do not need to open or edit the reference tables.
Note: The reference tables used by the rules are different from the address reference data files used by the Address Validator
transformation. You purchase the address reference data files from Informatica. You cannot open or edit the address
reference data files.
Rule Descriptions
Each rule contains an input data object, a Parser transformation, and an output data object.
The following rules parse the output codes on the Match Code and Mailability Score ports:
rule_Assign_DQ_90_Mailability_Score_Description
This rule writes a description of the output code values on the Mailability Score port.
The Parser transformation in this rule reads the reference table DQ90_AV_MailabilityScores_infa.
rule_Assign_DQ_90_Match_Code_Descriptions
This rule writes a description of the output code values on the Match Code port.
The Parser transformation in this rule reads the reference table DQ90_Match_code_desc_infa.
Note: The Core Accelerator does not contain a mapplet rule for the Result Percentage port. The Result Percentage port
does not write coded output. The port writes a percentage value.
The following illustration shows rule_Assign_DQ_90_Mailability_Score_Description in the Developer tool:
How to Read Mailability Score Output Codes
The Mailability Score port output is a single digit that indicates the likelihood of successful delivery to the output address.
After you run the address validation mapping, review the output data from this port to determine if an address needs further
validation. Connect the port to rule_Assign_DQ_90_Mailability_Score_Description to generate text descriptions of each
output code.
The following table lists the possible values in the output code and the text that the transformation reads from the reference
table DQ90_AV_MailabilityScores_infa:
Output Code | Reference Table Text | Description
5 | completely confident | All address data elements that are relevant to delivery are present and correct.
4 | almost certain | The address has a unique match in the address reference data and one of the following cases applies: some data elements could not be checked by the address reference data, or some data elements were corrected with a very high degree of confidence. The validation process returns this output code when the number of unmatched elements is very low.
3 | should be fine | Some data elements were corrected with a very high degree of confidence. The validation process returns this output code when the address has a unique match in the address reference data and the number of unmatched elements is acceptable.
2 | fair chance | Address data elements that are relevant to delivery are present, and one of the following scenarios also applies: the validation process did not find a strong match in the address reference data, or the validation process found multiple matches and has similar levels of confidence in each match.
1 | risky | The validation process found a partial match between the input data and the address reference data. The output address is likely to be incomplete.
0 | futile | The input address is missing too many elements, or a majority of the elements generated no matches in the address reference data.
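Conceptually, the Mailability Score rule is a lookup from an output code to reference table text. The following Python sketch imitates that lookup outside of the Developer tool. It is not the accelerator rule itself: the dictionary copies the code and text pairs from the table above, and the function name and the "unknown code" fallback are hypothetical.

```python
# Code-to-text pairs copied from the Mailability Score table in this article.
MAILABILITY_TEXT = {
    "5": "completely confident",
    "4": "almost certain",
    "3": "should be fine",
    "2": "fair chance",
    "1": "risky",
    "0": "futile",
}


def describe_mailability(code: str) -> str:
    """Return the reference table text for a code, or a fallback for unexpected values."""
    return MAILABILITY_TEXT.get(code.strip(), "unknown code")


for code in ["5", "2", "0"]:
    print(code, describe_mailability(code))
```

A lookup like this can be useful when you post-process exported port values in a script and the accelerator rule output is not available.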
How to Read Match Code Output Codes
The Match Code port output describes the results of the address validation operation performed on the input address.
After you run the address validation mapping, review the output data from this port to establish the data quality of the input
addresses. Connect the port to rule_Assign_DQ_90_Match_Code_Descriptions to generate text descriptions of each output code.
The following table lists the possible values at each position in the output code and the text that the transformation reads
from the reference table DQ90_Match_code_desc_infa:
Output Code | Reference Table Text | Description
V4 | Verified - Input data correct - all elements were checked and input matched perfectly | The input address is a perfect match with a single address in the address data. The input and output addresses in the record use the same information.
V3 | Verified - Input data correct on input but some or all elements were standardised or input contains outdated names or exonyms | The output address matches a single address in the address data. The Address Validator transformation edited one or more input data elements for one of the following reasons: an input element uses a name other than the local name, or an input element uses a name that is out of date.
V2 | Verified - Input data correct but some elements could not be verified because of incomplete reference data | The output address matches a single address in the address data, but the Address Validator transformation could not verify every input element because some address reference data files are not installed.
V1 | Verified - Input data correct but the user standardisation has deteriorated deliverability (wrong element user standardisation - for example postcode length chosen is too short). Not set by validation. | The input address matches a single address in the address data, but the Address Validator transformation cannot write some output data because an output port has the wrong precision. The output address may be undeliverable.
C4 | Corrected - all elements have been checked | The input address contains information that matches a single address in the address data, and the Address Validator transformation replaced one or more elements with new elements from the address reference data. All output elements are verified correct for the address.
C3 | Corrected - but some elements could not be checked | The input address contains information that matches a single address in the address data, and the Address Validator transformation replaced one or more elements with new elements from the address reference data. All output elements are verified correct for the address. However, the transformation could not verify every input element.
C2 | Corrected - but delivery status unclear (lack of reference data) | The Address Validator transformation replaced one or more elements with new elements from the address reference data. However, the transformation could not verify deliverability as some address reference data files are not installed.
C1 | Corrected - but delivery status unclear because user standardisation was wrong. Not set by validation. | The Address Validator transformation replaced one or more elements with new elements from the address reference data. However, the transformation could not verify deliverability as some input elements cannot be corrected.
I4 | Data could not be corrected completely but is very likely to be deliverable - single match (e.g. HNO is wrong but only 1 HNO is found in reference data) | The output address matches a single address in the address data, but the Address Validator transformation could not verify every input element. The input data is likely to contain an error.
I3 | Data could not be corrected completely but is very likely to be deliverable - multiple matches (e.g. HNO is wrong but more than 1 HNO is found in reference data) | The output address is very likely to be deliverable but the Address Validator transformation found multiple matches for one or more input elements in the address reference data. For example, a house number is incorrect but the number is in the correct range.
I2 | Data could not be corrected but there is a slim chance that the address is deliverable | The Address Validator transformation cannot find matching address data in the address reference data. However, the input record contains data in an address format that may be deliverable.
I1 | Data could not be corrected and is pretty unlikely to be delivered. | The Address Validator transformation cannot find matching address data in the address reference data. The input record does not contain data that is likely to be deliverable.
Q3 | FastCompletion Status - Suggestions are available - complete address | The Address Validator transformation found multiple good matches for the input record data in the address reference data. The transformation returns this code in Suggestion List mode.
Q2 | FastCompletion Status - Suggested address is complete but combined with elements from the input (added or deleted) | The Address Validator transformation found a partial match for the input record data in the address reference data. The transformation returned a complete address. The transformation returns this code in Suggestion List mode.
Q1 | FastCompletion Status - Suggested address is not complete (enter more information) | The Address Validator transformation did not find a match for the input record data in the address reference data. The transformation returned a partial address. The transformation returns this code in Suggestion List mode.
Q0 | FastCompletion Status - Insufficient information provided to generate suggestions. | The Address Validator transformation did not find a match for the input record data in the address reference data. The transformation did not return any data for the address. The transformation returns this code in Suggestion List mode.
RA | Country recognized from ForceCountryISO3 Setting | The Address Validator transformation used the Force Country setting to add country name data to the address.
R9 | Country recognized from DefaultCountryISO3 Setting | The Address Validator transformation used the Default Country setting to add country name data to the address.
R8 | Country recognized from name without errors | The Address Validator transformation identified a destination country from the input data.
R7 | Country recognized from name with errors | The Address Validator transformation identified a destination country from the input data, but the input data contains inconsistent data for this country.
R6 | Country recognized from territory | The Address Validator transformation identified a destination country from state or national territory information in the input data.
R5 | Country recognized from province | The Address Validator transformation identified a destination country from province information in the input data.
R4 | Country recognized from major town | The Address Validator transformation identified a destination country from city or town information in the input data.
R3 | Country recognized from format | The Address Validator transformation identified a destination country from the structure of the address.
R2 | Country recognized from script | The Address Validator transformation identified a destination country from data provided by a script.
R1 | Country not recognized - multiple matches | The Address Validator transformation identified several possible destination countries. The transformation did not verify a country for the address.
R0 | Country not recognized | The Address Validator transformation could not identify a destination country for the input address data.
S4 | Parsed perfectly | The Address Validator transformation parsed all input elements successfully. The transformation returns this code in Parsing mode.
S3 | Parsed with multiple results | The Address Validator transformation parsed all input elements, but some elements match multiple element types. The transformation returns this code in Parsing mode.
S2 | Parsed with Errors - Elements change position | The Address Validator transformation parsed all input elements, but the transformation changed the element type in one or more cases. The transformation returns this code in Parsing mode.
S1 | Parse Error - Input Format Mismatch | The Address Validator transformation could not parse input elements because the address structure did not match the address reference data structure. The transformation returns this code in Parsing mode.
N6 | Validation Error: No validation performed because input data was insufficient | The Address Validator transformation could not validate the address because the transformation lacked usable input data.
N5 | Validation Error: No validation performed because reference database is too old - please contact Address Doctor to obtain updated reference data | The Address Validator transformation could not validate the address because the address reference data is out of date.
N4 | Validation Error: No validation performed because reference database is corrupt or in wrong format | The Address Validator transformation could not validate the address because it could not read the address reference data.
N3 | Validation Error: No validation performed because country could not be unlocked | The Address Validator transformation could not validate the address because it could not find an address reference data license.
N2 | Validation Error: No validation performed because required reference database is not available | The Address Validator transformation could not validate the address because it could not find address reference data for the destination country.
N1 | Validation Error: No validation performed because country was not recognized | The Address Validator transformation could not validate the address because it could not associate the input address with a country.
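Because each Match Code is a two-character string, the first character can serve as a coarse grouping when you summarize a large data set. The following Python sketch shows one way to do that. The category labels are informal summaries of the reference table text above, not official Informatica terms, and the function and variable names are hypothetical.

```python
from collections import Counter
from typing import Tuple

# Informal labels for the first character, summarized from the table above.
CATEGORY = {
    "V": "verified",
    "C": "corrected",
    "I": "could not be corrected completely",
    "Q": "suggestion list (FastCompletion) status",
    "R": "country recognition",
    "S": "parsing status",
    "N": "validation error",
}


def split_match_code(code: str) -> Tuple[str, str]:
    """Split a two-character Match Code into (category label, level character)."""
    code = code.strip().upper()
    if len(code) != 2 or code[0] not in CATEGORY:
        raise ValueError(f"unexpected match code: {code!r}")
    # The second character is kept as text because codes such as RA use a letter.
    return CATEGORY[code[0]], code[1]


# Hypothetical sample of Match Code outputs from a mapping run.
codes = ["V4", "C3", "I2", "V4", "N1"]
summary = Counter(split_match_code(c)[0] for c in codes)
print(summary.most_common())
```

A summary like this gives a quick view of how many records were verified, corrected, or failed validation before you read the individual descriptions.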
Conclusion
The Status Info ports contain detailed information about address accuracy and deliverability.
Select a Mailability Score, Match Code, or Result Percentage port when you need general information about the
deliverability of an address. Use the Informatica accelerator rules to convert the port output codes into text descriptions that
you can understand more quickly.
Author
David Handy
Principal Technical Writer