Repository Migration Guide - Informatica Library/1/IN_901_DQ_Repository... · Preface The...

Informatica Data Quality (Version 8.6.2 - 9.0.1) Repository Migration Guide



Informatica Data Quality Repository Migration Guide

Version 8.6.2 - 9.0.1

June 2010

Copyright (c) 2010 Informatica. All rights reserved.

This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange and Informatica On Demand are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright 2007 Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org.

This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://svn.dojotoolkit.org/dojo/trunk/LICENSE.

This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project, Copyright © 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.

This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, and http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement.

This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php) and the BSD License (http://www.opensource.org/licenses/bsd-license.php).

This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,254,590; 7,281,001; 7,421,458; and 7,584,422, international Patents and other Patents Pending.


DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

NOTICES

This Informatica product (the “Software”) includes certain drivers (the “DataDirect Drivers”) from DataDirect Technologies, an operating company of Progress Software Corporation (“DataDirect”) which are subject to the following terms and conditions:

1. THE DATADIRECT DRIVERS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

Part Number: IDQ-MIG-90100-0001


Table of Contents

Preface
  Informatica Resources
    Informatica Customer Portal
    Informatica Documentation
    Informatica Web Site
    Informatica How-To Library
    Informatica Knowledge Base
    Informatica Multimedia Knowledge Base
    Informatica Global Customer Support

Chapter 1: Introduction to Data Quality Repository Migration
  Overview of Repository Migration
  Informatica Data Quality 8.6.2 Repository Features
  Data Quality Plan and Mapping Comparisons
  Changes to Data Quality Transformations
  Changes to Data Quality Sources and Targets
  Component Comparison Checklist
  Changes to Reference Data
  Migration and Data Profiling

Chapter 2: Migrating Repository and Reference Data
  Overview of Migration Process
  Migration Prerequisites
  Exporting Data from the Data Quality 8.6.2
  Exporting Data from the 8.6.2 Workbench Machine
  Exporting Data from the 8.6.2 Server Machine
  Importing Data to Informatica Data Quality 9.0.1
  Migration Log Data


Preface

The Informatica Data Quality Migration Guide is written for data quality developers. This guide assumes that you have an understanding of data quality concepts, flat file and relational database concepts, and the database engines in your environment. This guide also assumes that you are familiar with the concepts presented in the Informatica Developer User Guide.

Informatica Resources

Informatica Customer Portal

As an Informatica customer, you can access the Informatica Customer Portal site at http://mysupport.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product Documentation, and access to the Informatica user community.

Informatica Documentation

The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments.

The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to Product Documentation from http://mysupport.informatica.com.

Informatica Web Site

You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

Informatica How-To Library

As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.


Informatica Knowledge Base

As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Multimedia Knowledge Base

As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Global Customer Support

You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires a user name and password. You can request a user name and password at http://mysupport.informatica.com.

Use the following telephone numbers to contact Informatica Global Customer Support:

North America / South America

Toll Free
Brazil: 0800 891 0202
Mexico: 001 888 209 8853
North America: +1 877 463 2435

Standard Rate
North America: +1 650 653 6332

Europe / Middle East / Africa

Toll Free
France: 00800 4632 4357
Germany: 00800 4632 4357
Israel: 00800 4632 4357
Italy: 800 915 985
Netherlands: 00800 4632 4357
Portugal: 800 208 360
Spain: 900 813 166
Switzerland: 00800 4632 4357
United Kingdom: 00800 4632 4357 or 0800 023 4632

Standard Rate
Belgium: +32 15 281 702
France: 0805 804632
Germany: +49 1805 702 702
Netherlands: +31 306 022 797
Switzerland: 0800 463 200

Asia / Australia

Toll Free
Australia: 1 800 151 830
New Zealand: 1 800 151 830
Singapore: 001 800 4632 4357

Standard Rate
India: +91 80 4112 5738


CHAPTER 1

Introduction to Data Quality Repository Migration

This chapter includes the following topics:

- Overview of Repository Migration
- Informatica Data Quality 8.6.2 Repository Features
- Data Quality Plan and Mapping Comparisons
- Changes to Data Quality Transformations
- Changes to Data Quality Sources and Targets
- Component Comparison Checklist
- Changes to Reference Data
- Migration and Data Profiling

Overview of Repository Migration

Informatica provides batch files that you can use to export the contents of an Informatica Data Quality 8.6.2 repository to a 9.0.1 Model repository.

The batch files perform the following tasks:

- Export the Informatica Data Quality 8.6.2 repository contents to the file system in XML format.
- Convert the 8.6.2 data quality processes to transformation, mapplet, and mapping XML.
- Copy user-defined reference data files to the file system.
- Write the copied reference data to 9.0.1 reference tables in the 9.0.1 Model repository and staging area.

Informatica Data Quality 9.0.1 users complete the migration process by importing the XML to the Model repository through the Developer tool.


Informatica Data Quality 8.6.2 Repository Features

The Informatica Data Quality 8.6.2 repository shows the following similarities and differences when compared with the 9.0.1 Model repository:

- The Informatica Data Quality 8.6.2 repository contains two types of object: projects and plans. The 8.6.2 repository does not store transformation or data source definitions as separate objects. The 8.6.2 repository stores all metadata as XML.

- An 8.6.2 repository project is similar to a 9.0.1 Model repository project. Both display user-defined folders in the repository structure.

- An 8.6.2 plan equates to a mapping in the 9.0.1 Model repository. A plan contains a data source and a data target connected by zero or more transformations. It runs in the same manner as a mapping.

- The Informatica Data Quality 8.6.2 user creates and runs plans in a client application called Data Quality Workbench. The application installs with a local repository. Informatica Data Quality 8.6.2 enables remote clients to connect to an 8.6.2 repository in a client-server manner, but all Informatica Data Quality 8.6.2 repositories have the same structure.

Data Quality Plan and Mapping Comparisons

The migration process converts all Informatica Data Quality 8.6.2 plans to 9.0.1 mappings.

Some sources, targets, and transformations in the migrated plans convert directly to sources, targets, and transformations in the 9.0.1 Model repository. Some sources, targets, and transformations convert to multiple objects or to mapplets in the Model repository.

Changes to Data Quality Transformations

Some transformations are functionally identical across the product versions, while others convert to different transformations. Some transformations do not migrate.

The following types of transformation change can occur:

- The 8.6.2 transformation has a direct counterpart in 9.0.1. Informatica Data Quality 9.0.1 includes transformations that are effectively copies of 8.6.2 transformations. For example, the Merge, To Upper, and Rule Based Analyzer transformations in Informatica Data Quality 8.6.2 become Merge, Case Converter, and Decision transformations in Informatica Data Quality 9.0.1.

- A 9.0.1 transformation provides functionality equivalent to, or evolved from, one or more 8.6.2 transformations. For example, the 9.0.1 Comparison transformation combines the functionality of the Bigram, Jaro Distance, Hamming Distance, and Edit Distance transformations. These 8.6.2 transformations convert directly to a Comparison transformation.

- The 8.6.2 transformation does not have a direct counterpart in 9.0.1, but its functionality is maintained in other transformations. In such cases, the 8.6.2 transformation metadata transfers to another transformation. For example, the Word Manager transformation does not migrate to 9.0.1, but its metadata transfers to the Standardizer transformation, which provides the same functionality.

- The 8.6.2 transformation is not supported in 9.0.1, and its functionality does not transfer to other transformations. In such cases, the 8.6.2 transformation input and output metadata is applied to another transformation, such as an Expression transformation.
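As a quick reference, the kinds of transformation change described above can be captured in a small lookup table. The following Python sketch is illustrative only: `ChangeType`, `TRANSFORMATION_CHANGES`, and `describe_change` are hypothetical names that are not part of the Informatica migration batch files, and the entries are limited to the examples named in this section and in the Component Comparison Checklist.

```python
from enum import Enum


class ChangeType(Enum):
    """Hypothetical labels for the four kinds of transformation change."""

    DIRECT = "direct counterpart in 9.0.1"
    EVOLVED = "equivalent or evolved 9.0.1 transformation"
    METADATA_TRANSFERRED = "functionality maintained in another transformation"
    # The text names no specific example for this case; the input and output
    # metadata is applied to another transformation, such as Expression.
    NOT_SUPPORTED = "input and output metadata applied to another transformation"


# Examples named in this guide: 8.6.2 name -> (change type, 9.0.1 transformation).
TRANSFORMATION_CHANGES = {
    "Merge": (ChangeType.DIRECT, "Merge"),
    "To Upper": (ChangeType.DIRECT, "Case Converter"),
    "Rule Based Analyzer": (ChangeType.DIRECT, "Decision"),
    "Bigram": (ChangeType.EVOLVED, "Comparison"),
    "Jaro Distance": (ChangeType.EVOLVED, "Comparison"),
    "Hamming Distance": (ChangeType.EVOLVED, "Comparison"),
    "Edit Distance": (ChangeType.EVOLVED, "Comparison"),
    "Word Manager": (ChangeType.METADATA_TRANSFERRED, "Standardizer"),
}


def describe_change(name: str) -> str:
    """Return a one-line summary of how a listed 8.6.2 transformation migrates."""
    change, target = TRANSFORMATION_CHANGES[name]
    return f"{name} -> {target} transformation ({change.value})"


print(describe_change("Word Manager"))
```

A plain dictionary keeps the sketch close to the checklist format; the ServerMigrationReport file remains the authoritative record of what changed in a migrated plan.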


To review the changes made to the transformations in migrated plans, read the ServerMigrationReport file generated by the ServerImport batch file. In all cases, review the configuration of the imported mappings and mapplets before you run them in Informatica Data Quality 9.0.1.

Changes to Data Quality Sources and Targets

Some Informatica Data Quality 8.6.2 sources and targets are fully compatible with 9.0.1 data source and target definitions. For example, a CSV Source from Informatica Data Quality 8.6.2 migrates to a file-based data source object in 9.0.1. These sources and targets convert directly to 9.0.1 source and target definitions.

Some 8.6.2 sources and targets incorporate transformation functionality and do not have a one-to-one correspondence with source and target definitions in 9.0.1. They convert to 9.0.1 sources and targets and also generate 9.0.1 transformations that perform the operations configured in 8.6.2.

The following types of source and target do not correspond one-to-one with source and target definitions in 9.0.1:

- Sources and targets involved in grouping data records before duplicate analysis.
- Sources and targets that perform field matching procedures.
- Sources and targets that perform identity matching procedures.

Component Comparison Checklist

The following table lists the source, target, and transformation components available in Data Quality Workbench and describes how they convert to objects in the 9.0.1 Model repository.

8.6.2 Component | 9.0.1 Component

Aggregation | Aggregator transformation
Association [for PowerCenter] | Association transformation
Bigram | Comparison transformation
Character Labeler | Labeler transformations
Consolidation [for PowerCenter] | Consolidation transformation
Context Parser | Labeler and Parser transformations. The Parser is set to pattern-based parsing mode.
Count | Mapplet containing Aggregator, Union, Expression, Joiner, Sorter, and Filter transformations
CSV Dual Match Source | Two file-based data sources and a Match transformation
CSV Identity Group Source | File-based data source and Match transformation. May convert to a mapplet.
CSV Match Sink | File-based data target
CSV Match Source | File-based data source and Match transformation


CSV Merge Sink | File-based data target
CSV Sink | File-based data target
CSV Source | File-based data source
DB Identity Group Source | Relational data source and Match transformation. May convert to a mapplet.
DB Match Source | Relational data source and Match transformation
DB Report Sink | File-based data target
DB Sink | SQL transformation and relational data target
DB Source | Relational data source
Dual Group Source | Multiple file-based data sources and Union transformation if required. May convert to a mapplet.
Edit Distance | Comparison transformation
Fixed Width Sink | File-based data target
Fixed Width Source | File-based data source
Global AV [Address Doctor engine] | Address Validator transformation. This transformation needs additional configuration following import to the 9.0.1 Model repository.
Global AV [Melissa Data engine] | Address Validator transformation. This transformation needs additional configuration following import to the 9.0.1 Model repository.
Global AV [QAS engine] | Address Validator transformation. This transformation needs additional configuration following import to the 9.0.1 Model repository.
Global AV [SDK] | Not supported
Group Sink | Flat-file data target and Sorter and Expression transformations
Group Source | Multiple file-based data sources and Union transformation if required. May convert to a mapplet.
Hamming Distance | Comparison transformation
Identity Group Target | Mapplet output transformation
Identity Match | Match transformation
Jaro Distance | Comparison transformation
Match Key Sink | Relational data target
Merge | Merge transformation
MinAvgMax | Mapplet containing Aggregator, Union, Expression, Joiner, and Router transformations


Missing Values | Mapplet containing Aggregator, Expression, and Joiner transformations
Mixed Field Matcher | Not supported
Normalization [SDK] | Not supported
NYSIIS | Key Generator transformation
Parsing [SDK] | Not supported
Profile Standardizer | Parser transformation. The Parser is set to pattern-based parsing mode.
Range Counter | Linear Range: Aggregator, Expression, Joiner, and Sorter transformations. Variable Range: Aggregator, Expression, and Union transformations.
Realtime Sink | Mapplet containing data target
Realtime Source | Mapplet containing data source
Report Sink | Flat-file data target
Rule Based Analyzer | Decision transformation
SAP Sink | Not supported
SAP Source | Not supported
Scripting | Not supported
Search Replace | Standardizer transformation
Similarity [SDK] | Not supported
Soundex | Key Generator transformation
Splitter | Labeler, Parser, and Expression transformations. The Parser is set to pattern-based parsing mode.
Sum | Mapplet containing Aggregator, Expression, Joiner, Sorter, and Union transformations
To Upper | Case Converter transformation
Token Labeler | Labeler transformation
Token Parser | Parser transformation
Weight Based Analyzer | Weighted Average transformation
Word Manager | Standardizer transformation
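Before you migrate, it can help to check each plan for components that the checklist marks as not supported or as needing additional configuration. The following Python sketch assumes that you already have a list of the 8.6.2 component names used in a plan; how you collect those names is not covered by this guide, and `review_plan` is a hypothetical helper, not part of the migration batch files.

```python
# Hypothetical pre-migration check built from the Component Comparison Checklist.

# Components that the checklist marks as "Not supported" in 9.0.1.
NOT_SUPPORTED = {
    "Global AV [SDK]",
    "Mixed Field Matcher",
    "Normalization [SDK]",
    "Parsing [SDK]",
    "SAP Sink",
    "SAP Source",
    "Scripting",
    "Similarity [SDK]",
}

# Components that convert to an Address Validator transformation that needs
# additional configuration following import to the 9.0.1 Model repository.
NEEDS_CONFIGURATION = {
    "Global AV [Address Doctor engine]",
    "Global AV [Melissa Data engine]",
    "Global AV [QAS engine]",
}


def review_plan(components: list[str]) -> dict[str, list[str]]:
    """Sort a plan's 8.6.2 component names into review categories."""
    return {
        "not_supported": sorted(c for c in components if c in NOT_SUPPORTED),
        "needs_configuration": sorted(
            c for c in components if c in NEEDS_CONFIGURATION
        ),
    }


# Example plan: a CSV source, an address validation step, a script, a CSV sink.
plan = ["CSV Source", "Global AV [QAS engine]", "Scripting", "CSV Sink"]
print(review_plan(plan))
```

Components that fall into neither set still convert as the table describes, but you should review the resulting mappings and mapplets before you run them.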


Changes to Reference Data

Informatica Data Quality 8.6.2 can read reference data from dictionary files and database tables.

The migration process exports the following types of reference data:

- Reference data that you create
- Informatica reference data files that are not installed by the Data Quality 9.0.1 Content Installer

The migration process does not export the following types of reference data:

- Address reference data
- Identity population data
- Informatica reference data installed by the Data Quality 9.0.1 Content Installer

Run the Data Quality Content Installer to install current Informatica reference data, including any address reference data and population data you have purchased.
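The export rules above amount to a simple decision for each reference data item. The following Python sketch restates them; the `ReferenceData` fields and the `is_exported` function are hypothetical and only model the categories named in this section, not how the migration batch files actually detect them. The sample item names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReferenceData:
    """Hypothetical description of one 8.6.2 reference data item."""

    name: str
    user_created: bool = False
    address_data: bool = False
    identity_population: bool = False
    installed_by_901_content_installer: bool = False


def is_exported(item: ReferenceData) -> bool:
    """Apply this section's export rules to one reference data item."""
    # Address reference data and identity population data are never exported;
    # install them with the Data Quality Content Installer instead.
    if item.address_data or item.identity_population:
        return False
    # Reference data that you create is exported.
    if item.user_created:
        return True
    # Informatica reference data is exported only when the 9.0.1 Content
    # Installer does not install it.
    return not item.installed_by_901_content_installer


items = [
    ReferenceData("custom_titles", user_created=True),
    ReferenceData("legacy_informatica_file"),
    ReferenceData("current_informatica_file", installed_by_901_content_installer=True),
    ReferenceData("population_data", identity_population=True),
]
for item in items:
    print(item.name, is_exported(item))
```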

The migration process can recognize that a plan reads reference data when it exports the plan from the 8.6.2 repository. In such cases, the migration process retains the link between the plan and the reference data in the exported XML. When you import the project and plan metadata to the 9.0.1 Model repository, the reference data is copied into reference tables in the 9.0.1 repository and staging area. The mapping created for the plan reads the reference data from these tables. You do not need to reconnect the mapping to the reference tables.

The migration process recognizes Informatica reference data even if the installed data file name has changed between versions 8.6.2 and 9.0.1. If an 8.6.2 plan reads a reference data file that is represented by a reference table in 9.0.1, the migration process updates the imported mapping to read the new reference table.

Migration and Data Profiling

Informatica Data Quality 8.6.2 performs profiling differently from Informatica Data Quality 9.0.1.

Informatica Data Quality 8.6.2 users create plans to profile data sources and write the results to data targets. The migration process preserves the logic of these plans, so that the files or database tables written by the mapping in 9.0.1 contain data that corresponds to profile results.


CHAPTER 2

Migrating Repository and Reference Data

This chapter includes the following topics:

• Overview of Migration Process

• Migration Prerequisites

• Exporting Data from the Data Quality 8.6.2

• Exporting Data from the 8.6.2 Workbench Machine

• Exporting Data from the 8.6.2 Server Machine

• Importing Data to Informatica Data Quality 9.0.1

• Migration Log Data

Overview of Migration Process

To migrate repository and reference data, you must run batch files provided by Informatica. Informatica provides the files in the IDQMigration.zip file.

IDQMigration.zip contains the following files:

• ClientPackage. Exports the 8.6.2 repository contents and copies reference dictionary data to the file system. The batch process compresses and saves the files in a format that the ServerImport batch file can read.

You can append parameters to the ClientPackage batch file so that it reads plan metadata from the file system instead of from the 8.6.2 repository. You must use these parameters when you migrate metadata from a Data Quality Server repository.

• ServerImport. Extracts reference metadata and writes it to the 9.0.1 Model repository, and extracts reference data and writes it to the 9.0.1 staging database. The batch file also saves plan metadata in a format that you can import to the 9.0.1 Model repository. It does not write the plan metadata to the Model repository.

Note: You must manually import the plan metadata to the 9.0.1 Model repository.


Migration Prerequisites

You must verify that the client batch file can access all Informatica Data Quality 8.6.2 objects and data. You must also understand the changes that migrated objects can undergo during the migration process.

Before you begin the migration process, answer the following questions:

• Do the plans read reference data provided by Informatica? The migration process does not migrate reference data that is included in the current Data Quality Content Installer. Use the Data Quality Content Installer to add current reference data, including address and identity population files, to the Informatica Data Quality 9.0.1 machine. Run the Data Quality Content Installer before you perform any migration tasks on an Informatica Data Quality 9.0.1 machine.

• Do the plans read from or write to database tables? If so, note the database connection details, and verify that the 9.0.1 Data Integration Service can access the database host machines.

If the plans read from or write to files, copy these files to a location that the 9.0.1 Data Integration Service can access. You can set the location of source and target files in the migration.properties file, which is included in the IDQMigration.zip package.

Database Considerations

If Informatica Data Quality 8.6.2 uses multiple staging database types, you must ensure that a database or schema and a connection object exist for each type. Add the name of each connection object to the Stage.<databasetype> property in the migration.properties file.

Informatica connects to Microsoft SQL Server and MySQL databases through ODBC. The migration process creates connection objects for these databases, but you must ensure that the ODBC Data Source has been created on the Informatica Server system.

For example, if Informatica Data Quality 8.6.2 connects to a Microsoft SQL Server database through the ODBC DSN 'MS_SQL_CONNECTION', the connection object that the migration process creates also uses this name. If the ODBC DSN on the server has a different name, rename the DSN or the connection object so that the two names match.
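Before you migrate, it can help to check these names mechanically. The Python sketch below is a hypothetical helper, not part of IDQMigration.zip. It formats one Stage.<databasetype> entry per staging database type, assuming the usual key=value syntax of a properties file, and it lists connection names that have no ODBC DSN of the same name on the server. The database type labels and connection names are placeholders; use the values that your migration.properties file and ODBC configuration actually contain.

```python
from typing import Dict, Iterable, List


def stage_property_lines(connections_by_db_type: Dict[str, str]) -> List[str]:
    """Format one Stage.<databasetype>=<connection name> entry per database type.

    The key=value layout is an assumption based on standard properties syntax.
    """
    return [
        f"Stage.{db_type}={connection}"
        for db_type, connection in sorted(connections_by_db_type.items())
    ]


def missing_dsns(connection_names: Iterable[str], server_dsns: Iterable[str]) -> List[str]:
    """Return connection names that have no ODBC DSN with the same name.

    Uses an exact, case-sensitive comparison; relax it if your ODBC
    driver manager treats DSN names case-insensitively.
    """
    available = set(server_dsns)
    return sorted(name for name in connection_names if name not in available)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    stage = {"SQLServer": "MS_SQL_CONNECTION", "MySQL": "MYSQL_STAGE"}
    for line in stage_property_lines(stage):
        print(line)
    print(missing_dsns(stage.values(), ["MS_SQL_CONNECTION"]))
```

Any name that `missing_dsns` reports points to a DSN or connection object that you need to rename, as described above.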

Exporting Data from the Data Quality 8.6.2

Run ClientPackage.bat to create a compressed file that contains repository metadata and reference data.

The export procedures differ for Data Quality Workbench and Data Quality Server installations.

Workbench installations

Run ClientPackage.bat on the Workbench machine to export Workbench repository contents and reference data to a compressed migration file.

Server installations

Use Workbench to export plan metadata to the file system on a Server repository machine. Run ClientPackage.bat on the Server repository machine to create a compressed migration file that contains the plan metadata and the Server reference data.


Exporting Data from the 8.6.2 Workbench Machine

Run ClientPackage.bat on the Workbench machine to export the local repository contents and reference data.

ClientPackage.bat creates a compressed file that contains these items. The default name for the file is MigrationPackage.zip.

1. Copy the IDQMigration.zip file to the Workbench host machine.

2. Extract the IDQMigration.zip file.

3. Run ClientPackage.bat. You can apply the following optional parameters:

Option   Description

-d       Path to the directory where the batch file creates MigrationPackage.zip.

-f       Path to a folder that contains plans already exported from the Data Quality repository. Use this parameter if you have used Workbench to export repository contents to file. Do not use with the -r parameter.

-o       Alternative name for MigrationPackage.zip.

-r       Server repository export only. Specifies that ClientPackage.bat will run on a remote Data Quality repository and extract plan and reference data to the Workbench file system.

-s       Staging directory for temporary files.

The batch process creates a compressed migration file that contains the exported repository and reference data files.

4. Review the PackageReport.html file created by the export process.

This report lists the plans, reference files, and database connection information copied to the migration file during the export process.
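If you script the Workbench export, the option rules in the table above can be checked before the batch file runs. The following Python sketch only builds an argument list. The `client_package_args` function is hypothetical, and it assumes that each option and its value are passed as separate arguments, which this guide does not state. It enforces the one documented constraint, that -f cannot be combined with -r.

```python
from typing import List, Optional


def client_package_args(
    output_dir: Optional[str] = None,          # -d
    exported_plans_dir: Optional[str] = None,  # -f
    package_name: Optional[str] = None,        # -o
    remote_repository: bool = False,           # -r
    staging_dir: Optional[str] = None,         # -s
) -> List[str]:
    """Build a ClientPackage.bat command line from the documented options."""
    if exported_plans_dir and remote_repository:
        # The option table says -f must not be used with -r.
        raise ValueError("-f cannot be used with -r")

    args = ["ClientPackage.bat"]
    if output_dir:
        args += ["-d", output_dir]
    if exported_plans_dir:
        args += ["-f", exported_plans_dir]
    if package_name:
        args += ["-o", package_name]
    if remote_repository:
        args.append("-r")
    if staging_dir:
        args += ["-s", staging_dir]
    return args


if __name__ == "__main__":
    # Placeholder path and file name for illustration only.
    print(client_package_args(output_dir=r"C:\Migration", package_name="Workbench.zip"))
```

On Windows, a list like this could be passed to `subprocess.run(args, check=True)` from the folder that contains the extracted batch file; verify the exact option syntax against your copy of ClientPackage.bat first.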

Exporting Data from the 8.6.2 Server Machine

Use Data Quality Workbench to export the Server repository contents to the Server file system. Run ClientPackage.bat on the Server machine to read the exported repository data and copy reference data from the Server machine.

ClientPackage.bat creates a compressed file that contains these items. The default name for the file is MigrationPackage.zip.

1. Use Data Quality Workbench to export plans from the Data Quality Server repository.

Create the exported XML files on the Server repository machine.

2. Copy the IDQMigration.zip file to the Server repository host machine.


3. Extract the IDQMigration.zip file.

4. Run ClientPackage.bat. You can apply the following parameters:

Option   Description

-d       Optional. Path to the directory where the batch file creates MigrationPackage.zip.

-f       Path to the directory that contains plans exported from the Data Quality Server repository. Do not use with the -r parameter.

-o       Optional. Alternative name for MigrationPackage.zip.

-r       Optional. Specifies that ClientPackage.bat will run on a remote repository and extract plan metadata to the local file system. You can run ClientPackage.bat -r in place of step 1 if you do not have access permissions on the Server repository machine.

-s       Optional. Staging directory for temporary files.

The batch process creates a MigrationPackage.zip file that contains the exported repository and data files.
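On the Server machine, plan metadata reaches ClientPackage.bat in one of two ways: from XML files that you exported in step 1 (-f), or directly from the remote repository when you run with -r in place of step 1. The table does not mark -f as optional and forbids combining it with -r, so one reading is that exactly one of the two is required. The following Python sketch encodes that reading; the function name is hypothetical, and the space-separated option syntax is an assumption.

```python
from typing import List, Optional


def server_plan_source_args(exported_plans_dir: Optional[str], use_remote: bool) -> List[str]:
    """Choose how ClientPackage.bat obtains Server plan metadata.

    Exactly one source is accepted: -f <folder> for plans exported in step 1,
    or -r when ClientPackage.bat replaces step 1 by reading the remote repository.
    """
    if exported_plans_dir and use_remote:
        raise ValueError("-f cannot be used with -r")
    if exported_plans_dir:
        return ["-f", exported_plans_dir]
    if use_remote:
        return ["-r"]
    raise ValueError("export the plans in step 1 and pass -f, or pass -r instead")


if __name__ == "__main__":
    # Placeholder folder for illustration only.
    command = ["ClientPackage.bat"] + server_plan_source_args(r"D:\ExportedPlans", False)
    print(command)
```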

Importing Data to Informatica Data Quality 9.0.1

These steps import reference data to the 9.0.1 Model repository and staging area and create an XML file containing mappings that you can import to the repository using the Developer tool.

The batch process writes the mapping XML to a folder named Output in the folder that contains the ServerImport batch file.

You begin the import process on the Informatica Data Quality 9.0.1 server machine.

1. Copy the compressed file that contains the exported repository objects and reference data to the Model repository host machine.

The default name of the compressed repository objects file is MigrationPackage.zip.

2. Copy the IDQMigration.zip file to the Model repository host machine.

3. Extract the contents of IDQMigration.zip.

4. Open the migration.properties file from the extracted files. Update migration.properties with the following information:

• Informatica 9.0.1 Model repository host machine name

• Model repository name

• Analyst Tool Service name

• Name of the project and folder to contain the user-defined reference data

• Name of the project and folder that contains the Informatica reference data

• The locale setting on the Data Quality Workbench that last edited the plans. If required, use the Locale.Client property to set the locale.

5. Set the RTM.MapRTM property to Yes.

6. Save and close migration.properties.


7. Run ServerImport.bat or ServerImport.sh. Use the following parameters:

Option   Description

-f       Required. Path to the folder that contains the compressed repository objects file.

-d       Optional. Specify an alternative Output folder for the mapping XML file.

-o       Optional. Specify a new name for the exported objects XML file.

-p       Optional. Specify a properties file to use instead of migration.properties.

-s       Optional. Specify an alternative temporary working folder.

The ServerImport batch process creates an XML file that contains the mappings you will import to the Model repository.

8. Copy the XML file to the Developer tool machine.

9. Open the Developer tool and import the mapping XML to the Model repository.

When the plans are imported, they appear as mappings in a folder in the Model repository. Any 8.6.2 transformations that convert to mapplets are saved to a separate folder.

10. Review the ServerMigrationReport.html file created by the ServerImport process.

This report lists the plans, reference files, and database connection information copied to the migration file during the export process. The report also describes changes to imported transformation configurations.
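Steps 4 through 7 can be scripted. The following Python sketch is a hypothetical helper, not part of IDQMigration.zip. `update_properties` rewrites key=value lines in the text of migration.properties, for example to set RTM.MapRTM to Yes or to set Locale.Client, and leaves comments and other lines unchanged; it handles only simple, unescaped keys. `server_import_args` builds the ServerImport command line from the options in the table above, assuming that each option and its value are passed as separate arguments. This guide does not list the other property names, so take them from the migration.properties file itself.

```python
import re
from typing import Dict, List, Optional


def update_properties(text: str, updates: Dict[str, str]) -> str:
    """Set key=value pairs in properties-file text, keeping all other lines.

    Keys that are not already present are appended at the end.
    """
    remaining = dict(updates)
    lines = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped and not stripped.startswith(("#", "!")):
            # The key ends at the first '=', ':', or whitespace separator.
            key = re.split(r"\s*[=:]\s*|\s+", stripped, maxsplit=1)[0]
            if key in remaining:
                lines.append(f"{key}={remaining.pop(key)}")
                continue
        lines.append(line)
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


def server_import_args(
    package_dir: str,                       # -f (required)
    output_dir: Optional[str] = None,       # -d
    output_name: Optional[str] = None,      # -o
    properties_file: Optional[str] = None,  # -p
    temp_dir: Optional[str] = None,         # -s
    windows: bool = True,
) -> List[str]:
    """Build a ServerImport.bat or ServerImport.sh command line.

    Assumes the command runs from the folder that contains the extracted script.
    """
    args = ["ServerImport.bat" if windows else "./ServerImport.sh", "-f", package_dir]
    for flag, value in (("-d", output_dir), ("-o", output_name),
                        ("-p", properties_file), ("-s", temp_dir)):
        if value:
            args += [flag, value]
    return args


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = "# Migration settings\nRTM.MapRTM=No\nLocale.Client=\n"
    print(update_properties(sample, {"RTM.MapRTM": "Yes"}), end="")
    print(server_import_args("/opt/migration", windows=False))
```

Rewriting only the matching lines, rather than regenerating the whole file, keeps the comments and any settings you have not reviewed exactly as Informatica shipped them.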

The migration process creates connection objects for the databases that are used by the migrated data quality plans. Update the JDBC string information in the database connection objects.

Migration Log Data

The ClientPackage batch file creates a log file in the folder to which the repository contents are exported.

Consult this log file for complete low-level information on the changes to exported objects.
