How to Evaluate the Accuracy of Address Records


  • 8/2/2019 How to Evaluate the Accuracy of Address Records


    © 2011 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation.


    Abstract

    The Mailability Score, Match Code, and Result Percentage ports on the Address Validator transformation provide you with

    general information about the deliverability and accuracy of address data. This article tells you how to use the ports to

    evaluate the data quality of address records. This article also shows you how to simplify the output codes from the ports so

    that they are easy to understand.

    Supported Versions

    Informatica Data Quality 9.1.0

    Table of Contents

    Overview

    When to Use the Status Info Ports

    Status Info Port Definitions

    Using Mapplet Rules to Read Status Info Port Outputs

    Installing the Core Accelerator

    Rule Descriptions

    How to Read Mailability Score Output Codes

    How to Read Match Code Output Codes

    Conclusion

    Overview

    The Address Validator transformation uses reference data to evaluate the accuracy and deliverability of postal addresses.

    The transformation can correct errors in an address, add data to an address, and provide status information on address data

    quality.

    For example, the United States Postal Service (USPS) provides reference data that identifies every mailbox in the United

    States. When the Address Validator transformation reads a United States address record, it compares each address record

    with reference data that the USPS provides to Informatica.

    The Address Validator transformation handles the input addresses in the following ways:

    If the transformation finds a perfect match between the input address and the USPS reference data, it writes the

    address information to the output ports with no change.

    If the transformation finds a partial match in the reference data, it selects the correct address elements from the

    reference data and writes the correct elements to the output ports.

    If the transformation cannot find a match in the reference data, it attempts to write the correct form of each input

    element to the output ports. The resulting record may not contain a deliverable address.

    In each case, the transformation writes the results of the matching operation to status ports that indicate the data quality of the address.


    The following illustration shows the Status Info port group on the Templates tab:

    Note: The Status Info port group includes the Address Type port. This output port describes the type of mailbox in a United

    States or Canadian address. The Address Type port does not contain information about the deliverable status of the address.

    When to Use the Status Info Ports

    The Mailability Score, Match Code, and Result Percentage ports provide summary information about the data quality of each

    address in the data set. The Element Input Status, Element Relevance, and Element Result Status ports provide detailed

    information on each element in each address record.

    The Mailability Score, Match Code, and Result Percentage outputs are useful indicators of whether you need to define an

    address validation stage in a data project.

    If the Mailability Score, Match Code, and Result Percentage outputs indicate that all addresses meet your data quality

    standards, you do not need to perform additional address validation.

    If you find one or more addresses that are not valid, review the outputs on the Element Input Status, Element Relevance,

    and Element Result Status ports. Use the ports to identify the address elements that you need to fix.
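
    The triage described above can be sketched in a few lines of Python, assuming the three summary outputs have been exported alongside a record identifier. The field names, the threshold values (`min_score`, `min_percentage`), and the choice of accepted Match Code prefixes are hypothetical. They stand in for your own data quality standards and are not Informatica defaults.

```python
from dataclasses import dataclass


@dataclass
class StatusSummary:
    """Summary outputs for one address record (field names are hypothetical)."""
    record_id: str
    mailability_score: int    # single digit, 0 (futile) to 5 (completely confident)
    match_code: str           # two-character code, for example "V4" or "I2"
    result_percentage: float  # similarity between input and output address


def needs_element_review(rec, min_score=4, accepted_prefixes=("V", "C"),
                         min_percentage=90.0):
    """Return True when a record fails the (assumed) data quality standard."""
    return (
        rec.mailability_score < min_score
        or rec.match_code[:1] not in accepted_prefixes
        or rec.result_percentage < min_percentage
    )


records = [
    StatusSummary("a1", 5, "V4", 100.0),
    StatusSummary("a2", 2, "I3", 71.5),
    StatusSummary("a3", 4, "C4", 88.0),
]

# Records listed here are candidates for review on the element-level ports.
to_review = [rec.record_id for rec in records if needs_element_review(rec)]
print(to_review)
```

    Records that pass all three checks need no further work. The others are the ones to inspect on the Element Input Status, Element Relevance, and Element Result Status ports.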

    Status Info Port Definitions

    Each Status Info port provides a different type of information about an address record.

    The Mailability Score, Match Code, and Result Percentage ports perform the following types of analysis:


    Mailability Score

    This port describes the likely outcome of any attempt to deliver mail to the address. The Mailability Score output is

    a text description that summarizes the quality of the address in terms of the risk to mail delivery.

    Select this port when you need general information on the quality of the input data that you connect to the Address

    Validator transformation.

    Match Code

    This port describes the results of the validation operation that the Address Validator performed on the input

    address. The Match Code output is a two-character string that represents the success or failure of the operation.

    Select this port to identify output address records that are valid or not valid.

    Result Percentage

    This port indicates the degree of overall similarity between an input address and the address validation results.

    The Result Percentage output is a percentage value. Higher percentage values indicate greater similarity between

    the input and output address.

    Select this port to identify address records that changed during address validation and to review the extent of the

    changes.
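
    As an illustration of the Result Percentage use case, the following Python sketch lists the records that changed during validation, with the most heavily changed first. It assumes that a value of 100 means the input and output addresses are identical. The article does not state this, so treat `unchanged_value` as an assumption to confirm against your own data. The record identifiers are hypothetical.

```python
def changed_records(rows, unchanged_value=100.0):
    """Return (record_id, result_percentage) pairs for changed records.

    The pairs are sorted so that the lowest similarity comes first.
    """
    changed = [(rid, pct) for rid, pct in rows if pct < unchanged_value]
    return sorted(changed, key=lambda pair: pair[1])


rows = [("a1", 100.0), ("a2", 62.0), ("a3", 95.5)]
print(changed_records(rows))
```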

    Using Mapplet Rules to Read Status Info Port Outputs

    The Mailability Score and Match Code ports provide information about address quality in coded format. You can use

    Informatica mapplet rules to parse information from the output codes.

    The rules use reference tables to convert each code value into an English-language equivalent.

    The rules and reference tables are part of the Core Accelerator, which is available to Data Quality customers. The Core

    Accelerator contains a rule for each port on the Status Info output group except the Result Percentage port. Result

    Percentage does not write coded output.

    To use an accelerator rule in an address validation mapping, complete the following steps:

    1. Download the accelerator, and import the accelerator objects to the Model repository.

    2. Add a rule to an address validation mapping. The mapping must read address records from a data object, and it

    must contain an Address Validator transformation.

    3. Connect a Status Info port on the Address Validator transformation to the rule you added to the mapping. The rule

    name contains the name of the port that you connect to.

    4. Run the mapping, or run the Data Viewer on the Address Validator transformation. If you run the mapping, add a

    writable data object as a target and connect the rule output to the data object.

    5. Read the rule output. If you ran a mapping, open the data object. If you ran the Data Viewer, resize the Developer

    tool so that the Data Viewer columns are visible.

    6. Evaluate the rule output, and decide the next steps you need to take for the address data.
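
    For step 6, a quick tally of the codes can help you decide what to do next. The following Python sketch assumes that the target data object is a CSV flat file. The column names and the sample rows are hypothetical, and an in-memory string stands in for the file.

```python
import csv
import io
from collections import Counter

# Stand-in for the target data object, assumed here to be a CSV flat file.
# The column names are hypothetical.
sample_target = io.StringIO(
    "address_id,match_code,match_code_description\n"
    "1,V4,Verified - Input data correct - all elements were checked and input matched perfectly\n"
    "2,C4,Corrected - all elements have been checked\n"
    "3,V4,Verified - Input data correct - all elements were checked and input matched perfectly\n"
    "4,N1,Validation Error: No validation performed because country was not recognized\n"
)

# Count how many records carry each Match Code.
counts = Counter(row["match_code"] for row in csv.DictReader(sample_target))

# Print the codes from most to least frequent.
for code, count in counts.most_common():
    print(code, count)
```

    To run the same tally on a real export, replace `sample_target` with an open file handle.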

    Installing the Core Accelerator

    You download the Core Accelerator with the Data Quality Content Installer. You can find the accelerator object XML file and

    the reference table ZIP file in the Accelerator_Content directory of the Content Installer package.

    Use the Developer tool to import the accelerator rules to the Model repository.

    The rules and reference tables appear in the Model repository in the project you specified during import. Find the rules you

    need for the status information ports in this repository location:

    [Content_Project_Name]\[Rule_Folder_Name]\General_Data_Cleansing


    The reference tables install to this location:

    [Content_Project_Name]\[Rule_Folder_Name]\Dictionaries

    You do not need to open or edit the reference tables.

    Note: The reference tables used by the rules are different from the address reference data files used by the Address Validator

    transformation. You purchase the address reference data files from Informatica. You cannot open or edit the address reference data files.

    Rule Descriptions

    Each rule contains an input data object, a Parser transformation, and an output data object.

    The following rules parse the output codes on the Match Code and Mailability Score ports:

    rule_Assign_DQ_90_Mailability_Score_Description

    This rule writes a description of the output code values on the Mailability Score port.

    The Parser transformation in this rule reads the reference table DQ90_AV_MailabilityScores_infa.

    rule_Assign_DQ_90_Match_Code_Descriptions

    This rule writes a description of the output code values on the Match Code port.

    The Parser transformation in this rule reads the reference table DQ90_Match_code_desc_infa.

    Note: The Core Accelerator does not contain a mapplet rule for the Result Percentage port. The Result Percentage port

    does not write coded output. The port writes a percentage value.
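
    Because each rule name contains the name of the port it reads, you can look up the rule for a port by name. The following Python sketch shows the idea with the two rules and reference tables named in this section. The dictionary and the `rule_for_port` helper are illustrative only and are not part of the Core Accelerator, which also contains rules for other Status Info ports.

```python
# Rule name -> reference table, as listed in this section.
ACCELERATOR_RULES = {
    "rule_Assign_DQ_90_Mailability_Score_Description": "DQ90_AV_MailabilityScores_infa",
    "rule_Assign_DQ_90_Match_Code_Descriptions": "DQ90_Match_code_desc_infa",
}


def rule_for_port(port_name):
    """Return (rule name, reference table) for a Status Info port, or None."""
    # "Match Code" becomes "match_code", which is then searched for in each rule name.
    token = port_name.strip().replace(" ", "_").lower()
    for rule, table in ACCELERATOR_RULES.items():
        if token in rule.lower():
            return rule, table
    return None


print(rule_for_port("Match Code"))
print(rule_for_port("Result Percentage"))  # no rule: the port writes no coded output
```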

    The following illustration shows rule_Assign_DQ_90_Mailability_Score_Description in the Developer tool:

    How to Read Mailability Score Output Codes

    The Mailability Score port output is a single digit that indicates the likelihood of successful delivery to the output address.

    After you run the address validation mapping, review the output data from this port to determine if an address needs further

    validation. Connect the port to rule_Assign_DQ_90_Mailability_Score_Description to generate text descriptions of each

    output code.


    The following table lists the possible values in the output code and the text that the transformation reads from the reference

    table DQ90_AV_MailabilityScores_infa:

    Output Code | Reference Table Text | Description

    5 | completely confident | All address data elements that are relevant to delivery are present and correct.

    4 | almost certain | The address has a unique match in the address reference data and one of the following cases applies: some data elements could not be checked by the address reference data, or some data elements were corrected with a very high degree of confidence. The validation process returns this output code when the number of unmatched elements is very low.

    3 | should be fine | Some data elements were corrected with a very high degree of confidence. The validation process returns this output code when the address has a unique match in the address reference data and the number of unmatched elements is acceptable.

    2 | fair chance | Address data elements that are relevant to delivery are present, and one of the following scenarios also applies: the validation process did not find a strong match in the address reference data, or the validation process found multiple matches and has similar levels of confidence in each match.

    1 | risky | The validation process found a partial match between the input data and the address reference data. The output address is likely to be incomplete.

    0 | futile | The input address is missing too many elements, or a majority of the elements generated no matches in the address reference data.
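
    The lookup that the rule performs can be pictured as a simple mapping from code to text. The following Python sketch reproduces the Mailability Score values and reference table text listed in this section. It is an illustration of the lookup only, not the implementation of the accelerator rule, and the function name and the "unknown code" fallback are hypothetical.

```python
# Output code -> reference table text, as listed in this section.
MAILABILITY_TEXT = {
    5: "completely confident",
    4: "almost certain",
    3: "should be fine",
    2: "fair chance",
    1: "risky",
    0: "futile",
}


def describe_mailability(score):
    """Translate a Mailability Score output code into the reference table text."""
    try:
        return MAILABILITY_TEXT[int(score)]
    except (KeyError, ValueError, TypeError):
        # Codes outside 0-5 and non-numeric input fall through to this label.
        return "unknown code"


print(describe_mailability("4"))
print(describe_mailability(0))
```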

    How to Read Match Code Output Codes

    The Match Code port output describes the results of the address validation operation performed on the input address.

    After you run the address validation mapping, review the output data from this port to establish the data quality of the input

    addresses. Connect the port to rule_Assign_DQ_90_Match_Code_Descriptions to generate text descriptions of each output code.

    The following table lists the possible values at each position in the output code and the text that the transformation reads

    from the reference table DQ90_Match_code_desc_infa:

    Output Code | Reference Table Text | Description

    V4 | Verified - Input data correct - all elements were checked and input matched perfectly | The input address is a perfect match with a single address in the address data. The input and output addresses in the record use the same information.

    V3 | Verified - Input data correct on input but some or all elements were standardised or input contains outdated names or exonyms | The output address matches a single address in the address data. The Address Validator transformation edited one or more input data elements for one of the following reasons: an input element uses a name other than the local name, or an input element uses a name that is out of date.

    V2 | Verified - Input data correct but some elements could not be verified because of incomplete reference data | The output address matches a single address in the address data, but the Address Validator transformation could not verify every input element because some address reference data files are not installed.

    V1 | Verified - Input data correct but the user standardisation has deteriorated deliverability (wrong element user standardisation - for example postcode length chosen is too short). Not set by validation. | The input address matches a single address in the address data, but the Address Validator transformation cannot write some output data because an output port has the wrong precision. The output address may be undeliverable.

    C4 | Corrected - all elements have been checked | The input address contains information that matches a single address in the address data, and the Address Validator transformation replaced one or more elements with new elements from the address reference data. All output elements are verified correct for the address.

    C3 | Corrected - but some elements could not be checked | The input address contains information that matches a single address in the address data, and the Address Validator transformation replaced one or more elements with new elements from the address reference data. All output elements are verified correct for the address. However, the transformation could not verify every input element.

    C2 | Corrected - but delivery status unclear (lack of reference data) | The Address Validator transformation replaced one or more elements with new elements from the address reference data. However, the transformation could not verify deliverability as some address reference data files are not installed.

    C1 | Corrected - but delivery status unclear because user standardisation was wrong. Not set by validation. | The Address Validator transformation replaced one or more elements with new elements from the address reference data. However, the transformation could not verify deliverability as some input elements cannot be corrected.

    I4 | Data could not be corrected completely but is very likely to be deliverable - single match (e.g. HNO is wrong but only 1 HNO is found in reference data) | The output address matches a single address in the address data, but the Address Validator transformation could not verify every input element. The input data is likely to contain an error.

    I3 | Data could not be corrected completely but is very likely to be deliverable - multiple matches (e.g. HNO is wrong but more than 1 HNO is found in reference data) | The output address is very likely to be deliverable but the Address Validator transformation found multiple matches for one or more input elements in the address reference data. For example, a house number is incorrect but the number is in the correct range.

    I2 | Data could not be corrected but there is a slim chance that the address is deliverable | The Address Validator transformation cannot find matching address data in the address reference data. However, the input record contains data in an address format that may be deliverable.

    I1 | Data could not be corrected and is pretty unlikely to be delivered. | The Address Validator transformation cannot find matching address data in the address reference data. The input record does not contain data that is likely to be deliverable.

    Q3 | FastCompletion Status - Suggestions are available - complete address | The Address Validator transformation found multiple good matches for the input record data in the address reference data. The transformation returns this code in Suggestion List mode.

    Q2 | FastCompletion Status - Suggested address is complete but combined with elements from the input (added or deleted) | The Address Validator transformation found a partial match for the input record data in the address reference data. The transformation returned a complete address. The transformation returns this code in Suggestion List mode.

    Q1 | FastCompletion Status - Suggested address is not complete (enter more information) | The Address Validator transformation did not find a match for the input record data in the address reference data. The transformation returned a partial address. The transformation returns this code in Suggestion List mode.

    Q0 | FastCompletion Status - Insufficient information provided to generate suggestions. | The Address Validator transformation did not find a match for the input record data in the address reference data. The transformation did not return any data for the address. The transformation returns this code in Suggestion List mode.

    RA | Country recognized from ForceCountryISO3 Setting | The Address Validator transformation used the Force Country setting to add country name data to the address.

    R9 | Country recognized from DefaultCountryISO3 Setting | The Address Validator transformation used the Default Country setting to add country name data to the address.

    R8 | Country recognized from name without errors | The Address Validator transformation identified a destination country from the input data.

    R7 | Country recognized from name with errors | The Address Validator transformation identified a destination country from the input data, but the input data contains inconsistent data for this country.

    R6 | Country recognized from territory | The Address Validator transformation identified a destination country from state or national territory information in the input data.

    R5 | Country recognized from province | The Address Validator transformation identified a destination country from province information in the input data.

    R4 | Country recognized from major town | The Address Validator transformation identified a destination country from city or town information in the input data.

    R3 | Country recognized from format | The Address Validator transformation identified a destination country from the structure of the address.

    R2 | Country recognized from script | The Address Validator transformation identified a destination country from data provided by a script.

    R1 | Country not recognized - multiple matches | The Address Validator transformation identified several possible destination countries. The transformation did not verify a country for the address.

    R0 | Country not recognized | The Address Validator transformation could not identify a destination country for the input address data.

    S4 | Parsed perfectly | The Address Validator transformation parsed all input elements successfully. The transformation returns this code in Parsing mode.

    S3 | Parsed with multiple results | The Address Validator transformation parsed all input elements, but some elements match multiple element types. The transformation returns this code in Parsing mode.

    S2 | Parsed with Errors - Elements change position | The Address Validator transformation parsed all input elements, but the transformation changed the element type in one or more cases. The transformation returns this code in Parsing mode.

    S1 | Parse Error - Input Format Mismatch | The Address Validator transformation could not parse input elements because the address structure did not match the address reference data structure. The transformation returns this code in Parsing mode.

    N6 | Validation Error: No validation performed because input data was insufficient | The Address Validator transformation could not validate the address because the transformation lacked usable input data.

    N5 | Validation Error: No validation performed because reference database is too old - please contact Address Doctor to obtain updated reference data | The Address Validator transformation could not validate the address because the address reference data is out of date.

    N4 | Validation Error: No validation performed because reference database is corrupt or in wrong format | The Address Validator transformation could not validate the address because it could not read the address reference data.

    N3 | Validation Error: No validation performed because country could not be unlocked | The Address Validator transformation could not validate the address because it could not find an address reference data license.

    N2 | Validation Error: No validation performed because required reference database is not available | The Address Validator transformation could not validate the address because it could not find address reference data for the destination country.

    N1 | Validation Error: No validation performed because country was not recognized | The Address Validator transformation could not validate the address because it could not associate the input address with a country.
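
    When you filter or group records by Match Code, it is often enough to split the two-character code into its leading letter and its trailing status character. The following Python sketch does that. The family labels are short summaries of the reference table text in this section, not values from DQ90_Match_code_desc_infa, and the function name is hypothetical.

```python
# Leading letter -> family label. The labels summarise the table in this section.
MATCH_CODE_FAMILIES = {
    "V": "Verified",
    "C": "Corrected",
    "I": "Could not be corrected",
    "Q": "FastCompletion status (Suggestion List mode)",
    "R": "Country recognition",
    "S": "Parsing result (Parsing mode)",
    "N": "Validation error, no validation performed",
}


def split_match_code(code):
    """Split a Match Code such as "V4" into (family label, status character)."""
    code = code.strip().upper()
    if len(code) != 2 or code[0] not in MATCH_CODE_FAMILIES:
        raise ValueError(f"not a recognised Match Code: {code!r}")
    return MATCH_CODE_FAMILIES[code[0]], code[1]


print(split_match_code("V4"))
print(split_match_code("RA"))  # the status character is a letter in this code
```

    Note that the status character is not always a digit, as the RA code shows, so the sketch keeps it as a string.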

    Conclusion

    The Status Info ports contain detailed information about address accuracy and deliverability.

    Select a Mailability Score, Match Code, or Result Percentage port when you need general information about the

    deliverability of an address. Use the Informatica accelerator rules to convert the port output codes into text descriptions that you can more quickly understand.

    Author

    David Handy

    Principal Technical Writer
