IDE 5.0_Basics_20061025


1

Using Informatica Data Explorer 5

Informatica Corporation, 2005-2006. All rights reserved.

Education Services

Version IDE-25102006

2

Agenda

• Overview of Informatica Data Explorer

• Importing Metadata and Accessing Source Data

• Column Profiling

• Data Rules

• Single Table Structural Analysis

• Cross Table Profiling

• Validating Table and Cross Table Analysis

• Normalization

• Repository

• Using the Repository Navigator

• Using Repository Reports

• Integration with PowerCenter

3

Introduction

4

Introduction Objectives

• Identify the components of the Informatica Data Explorer product suite

• Describe the Informatica Data Explorer process flow

Informatica Data Explorer Product Suite

[Diagram: the IDE Client (Windows XP, 2000) and IDE Server (UNIX or Windows) work against an IDE Project and the IDE Repository, which is browsed with the Repository Navigator. Importers bring in metadata and data: IDE Import Relational for UDB, Informix, Sybase, Oracle, and MS SQL Server (via ODBC); IDE Import Flat File for flat files and DDL; and IDE Import IMS / IDE Import VSAM, which use COBOL programs, command files, and JCL to extract VSAM, IMS, sequential, PDS, and OS/390 DB2 Unload data. IDE FTM / XML produces DDL, XML, and DTDs.]

6

Product Architecture

[Diagram: sources (RDBMS via ODBC, flat files, mainframe*) feed Data Tables 1..n through connectors; IDE Profiling performs Content Profiling, Single Table Profiling, and Cross Table Profiling to build a source data knowledge base covering structure, content, and quality; IDE Design produces a Consolidated Schema and Fixed Target Mapping, with ports to Informatica CWM; results are browsed with the Repository Navigator.]

* The IMS and VSAM importers actually use a GUI (Source Profiler) to read a Copybook and generate a program to extract mainframe data into Flat Files for use by IDE

7

Technical Diagram

• IDE Server – performs the actual profiling and is the initial store for profiling results
• Platforms: Windows (2000/2003/XP), Sun Solaris (7, 8, 9), HP-UX (11 or later), AIX (4.3, 5L)
• IDE Client, Repository Navigator, and FTM/XML – Windows 2000/XP workstations
• The client connects to the server via TCP/IP; FTM/XML works with Project files, data/header files, and XML format files
• Repository DBMS – holds completed profiling results; the Repository does not need to be on the same server as IDE
• Supported: IBM DB2 UDB 7.2/8.1, Informix 7.31/9.2/9.3, Microsoft SQL Server 7 and 2000, Oracle 8i/9i, Sybase 12 and 12.5
• Accessed via ODBC/JDBC; requires an ODBC driver (API 3.x, conformance level 2)
• Relational Importers for: IBM DB2 UDB 5.2/6.1/7.1/7.2/8.1, Informix 7.24/7.31/9.1/9.2/9.3, Oracle 7.3/8/8i/9i, Sybase 10/11/12, ODBC (SQL Server, etc.)
• Flat File Importer for: Fixed Length, Delimited, and DB2 Unload formats

IDE Process Flow

[Diagram: data (relational, flat files, VSAM, IMS, ODBC) and documented metadata enter through IDE Data Prep / Import; IDE Data Profiling analyzes the sources; IDE Schema Development produces a target design; specifications for data extraction, cleansing, and transformation feed FTM / XML metadata mapping, whose output drives a DB load, messaging, or a target database; the IDE Repository and Navigator provide metadata management throughout.]

9

Introduction Review

• The Informatica Data Explorer product line:
• Informatica Data Explorer
• Importer for Flat Files
• Import for Relational Databases
• Import for VSAM
• Import for IMS
• DDL Generators
• Source Profiler
• FTM
• Repository Navigator
• Repository

10

Introduction Review (cont.)

• The IDE Process Flow consists of five major processes:
• Data Preparation and Import

• Data Profiling

• Schema Validation and Development

• Metadata Development

• Metadata Management

11

Lesson 1

Importing Metadata and Accessing Source Data

12

Lesson 1 Objectives

• Explain what an Informatica Data Explorer Project is, and how it is used

• Create and setup Informatica Data Explorer Projects

• Define the term “metadata” as used by Informatica Software

• Explain the importance of metadata in Data Profiling using Informatica Data Explorer

13

Lesson 1 Objectives (cont.)

• Explain what source data are, and the ways in which they may be imported into Informatica Data Explorer

• Explain what the Informatica Data Explorer Flat File Importer does

• Describe the format of an Informatica Data Explorer Flat File, including the minimum requirements for Informatica Data Explorer to use it to access source data

14

Case Study Description

• The Customer Order system is a mainframe application accessed through a CICS user interface

• It was developed 10 years ago

• The Employee Identification system is an Oracle database created 2 years ago

• Business users are sure they know the data

• Senior executives suspect the quality of the data is bad

Informatica Data Explorer Project

[Diagram: the IDE product suite (as shown in the Introduction), highlighting the IDE Project as the container that the IDE Client and Server work against.]

16

Informatica Data Explorer Projects

• The persistent data store used by IDE

• A Project is a UNIX or Windows NT container (directory, folder etc.)

• Projects contain:
• Metadata
• Data
• Profiling as well as Mapping information

• Projects are opened and closed by the IDE Server

17

What is Metadata?

• Informatica Data Explorer defines metadata as:
• Data that describes data
• Information about the characteristics of source data
• In Informatica Data Explorer, metadata is information that will create:
• Schemas
• Tables
• Columns
• Other objects

18

Why Import Metadata?

• Metadata must be imported into an Informatica Data Explorer Project before any subsequent tasks or activities can be started
• Informatica Data Explorer needs to know the names of the Columns in order to store Data Profiling results
• Informatica Data Explorer needs to know how to interpret the source data (Fixed vs. Delimited)
• Provides the basis for automated quality assessments in data profiling

IDE Data Sources

[Diagram: the IDE product suite (as shown in the Introduction), highlighting the supported data sources – relational databases, flat files, DDL, and mainframe VSAM, IMS, sequential, PDS, and OS/390 DB2 Unload data.]

20

IDE Flat Files

• Consist of two components:
• Header File – contains metadata describing the contents of a data file
• Data File – data in delimited or fixed column format, as well as DB2 Load format

21

IDE Flat Files (cont.)

• Header and Data files may be:
• Separate files, or
• Combined into one file
• A header file should not contain duplicate column names (IDE will automatically rename them)
• IDE Flat Files may not contain Arrays (repeating groups or occurs)

Informatica Data Explorer Flat File Components

Header File (shows some of the documented information that can be loaded into Informatica Data Explorer):

header:file=empinfo.dat

attribute:EMPID
data_type=INTEGER
null_rule=NOT NULL
min_value=1000
max_value=9999

attribute:LAST_NAME
data_type=CHAR(20)
null_rule=NOT NULL

attribute:FIRST_NAME
data_type=CHAR(20)
null_rule=NOT NULL

attribute:GENDER
data_type=CHAR(1)
null_rule=NOT NULL

attribute:DEPTID
data_type=CHAR(4)
null_rule=NOT NULL
min_value=100

Data File (the associated comma-delimited file to which this header file refers; the address values are truncated in this excerpt):

149,Francis,Lynn,3,200,MIS,Database Administrator,"120 Co
249,Venkatachalam,Nagarajan,3,200,MIS,Project Leader,"300
289,Kim,Suk,3,200,MIS,Staff Consultant,"4040 N Fairfax Dr
216,Masood,Airaj,,200,MIS,MIS Analyst,"300 N Wakefield Dr
134,Swenson,Allison,F,200,MIS,Database Administrator,"900
164,Park,Allison,F,200,MIS,Database Analyst,"PO BOX 1471"
323,Blaskiewicz,Allison,F,200,MIS,Technical Specialist,"3
255,Barbles,Amy,F,100,Sales,Sales Executive,"4019 Rice Bl
273,Karneh,Anna,1,200,MIS,Sr Prog Analyst,"12601 Fair Lak

23

Header and Data Files

• Header and data can be in one file

• We recommend creating two separate files when they are created manually

• The more information that is provided in the header file, the more automatic comparisons Informatica Data Explorer can make

24

Login

25

Open Project

26

Import Metadata

27

Lab Exercises 1.1–1.6

28

Lesson 1 Review

• An Informatica Data Explorer Project is:
• The persistent data store used by Informatica Data Explorer
• Used to organize and partition the work effort
• A structure that contains:
• Metadata
• Data
• Profiling and Mapping information
• Metadata describes the data source and is used by Informatica Data Explorer to access the source data

29

Lesson 1 Review (cont.)

• Informatica Data Explorer can import data from:
• Relational Databases
• Oracle 7.3, 8, 8i, 9i or 10g
• Informix 7, 9.1, or 9.2
• Sybase 10, 11, 12 or 12.5
• IBM DB2 UDB 5.2, 6.1, 7.1 or 7.2
• Microsoft SQL Server 7 and 2000 (using an ODBC driver)
• Flat Files
• Delimited Format
• Fixed Length Format
• DB2 Load Format

30

Lesson 1 Review (cont.)

• Informatica Data Explorer Flat Files must be:
• ASCII or EBCDIC character format (no binary data)
• Binary data is supported via the DB2 Load Utility format
• Informatica Data Explorer Flat Files may not contain:
• Arrays (repeating groups or occurs)
• Duplicate column names

31

Lesson 1 Review (cont.)

• Informatica Data Explorer Flat Files must have a header file along with the data file

• Additional information on data preparation is available in the Using Informatica Data Explorer Source Profiler course and the documentation

32

Lesson 2

Column Profiling

33

Lesson 2 Objectives

• Explain what Column Profiling is, and why it should be performed.

• Execute the Column Profile function of Informatica Data Explorer.

• Navigate and review the results of Column Profiling.

• Explain informational Tags.

• Describe when and how to apply informational Tags to Informatica Data Explorer objects.

34

What is Column Profiling?

• A process of discovering the physical characteristics of each column in a file (sketched in SQL below)
• Comparing documented Metadata against Metadata inferred from the data source
• Column Profiling is done against data in the form of:
• ASCII flat files
• DB2 Load Utility files
• RDBMS tables
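The gist of what Column Profiling infers can be sketched in plain SQL. This is only an illustration of the idea, not how IDE implements it; the empinfo table and DEPTID column come from the course example:

-- Inferring one column's characteristics from the data itself
SELECT COUNT(*)               AS total_rows,
       COUNT(DEPTID)          AS non_null_rows,    -- basis for the inferred null rule
       COUNT(DISTINCT DEPTID) AS distinct_values,  -- basis for uniqueness/constant analysis
       MIN(DEPTID)            AS min_value,
       MAX(DEPTID)            AS max_value
FROM empinfo;

Comparing these observed figures against the documented metadata (for example, an observed null in a column documented NOT NULL) is exactly the comparison the Column Profile reports.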

35

Why Profile Columns?

• Not all database metadata and documentation are accurate pictures of the data source

• Documented descriptions of data elements may be inconsistent with the way the element is actually used

• Informatica Data Explorer Column Profiling builds a description of a column (its metadata) based on the data it contains

36

Column Lists

• The results of Column Profiling are stored with the Columns in a Table

• Column List viewers can be opened from the Navigation Tree

• Column List viewers provide information about Documented and Inferred Metadata:
• Documented Metadata are supplied from the header file or source table
• Inferred Metadata are those that Informatica Data Explorer determined from examining the data

37

Column Profiling

38

Column Viewer

39

Lab Exercises 2.1–2.4

40

Drill Down

• Allows you to perform ad hoc drill downs through data presented in the Informatica Data Explorer viewers.

• Used to interrogate any data sources that can be accessed via an ODBC connection or Informatica Data Explorer Importer.

• Searches are issued against the selected data, and rows are returned for the specified search.
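Conceptually, a drill down issues a query like the following against the source; the table, column, and predicate here are illustrative, based on the course's empinfo data:

-- Return the rows behind a profiling observation, e.g. every row
-- whose GENDER value is outside the expected domain
SELECT *
FROM empinfo
WHERE GENDER NOT IN ('F', 'M')
   OR GENDER IS NULL;   -- nulls need an explicit test; NOT IN alone skips them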

Drill Downs

41

Column Details

• Lists of properties about a Column that have been inferred by Informatica Data Explorer

• Columns can have several potential sets of characteristics

• The potential sets of characteristics are dependent on the physical view that is chosen

42

Drill Down

43

Drill Down Results

44

Lab Exercises 2.5–2.7

45

Column Value Pairs

• Informatica Data Explorer will store, per Column:
• Up to 16,000 distinct values
• These are the most frequently occurring values from the set of all values that were observed during the Column Profile execution
• The frequency with which each value was observed
• Informatica Data Explorer will calculate:
• % Distribution for each distinct value, based on the frequency divided by the total rows profiled
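In SQL terms, the value pairs and their % Distribution for one column amount to the following sketch (illustrative; IDE builds this during the profiling scan and keeps only the most frequent 16,000 values):

-- Value frequency list with % distribution for DEPTID
SELECT DEPTID,
       COUNT(*) AS frequency,
       100.0 * COUNT(*) / (SELECT COUNT(*) FROM empinfo) AS pct_distribution
FROM empinfo
GROUP BY DEPTID
ORDER BY frequency DESC;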

46

Value Pair Review

• Issues to evaluate during Column Value Pair analysis:
• Are the values/range of values correct?
• Is the data type correct?
• Is there a pattern or format to the data for this Column? Do all of the values match this pattern/format?
• Is there a difference in case for alpha characters? Are some values mixed case while others are all upper or lower case? Is this an issue?

• Are there different representations (different abbreviations/misspellings) of the same data?

• Are there duplicate values in a field that should be unique?

47

Sorting Viewers

• It is possible to sort any of the tables displayed in Informatica Data Explorer

• Clicking on a column header sorts the results in ascending order; clicking it again sorts the list in descending order

Value Pair Review

48

Sort Order

• Sorting is based on the character codes of the values in the data:
• Spaces sort to the top of an ascending sort. When the caret (^) symbol is displayed, the sort is based on the actual “space” character, not the caret (^)
• Special characters (e.g. #, &, ‘)
• Nulls
• Numbers
• Alpha characters

49

Lab Exercises 2.8–2.10

50

Tags

• Informatica Data Explorer Tags come in various forms, depending on the type of information you want to convey:
• Notes – general text
• Action Items – things that need to be done
• Rules – business rules defining the nature of an object

• Transformations – requirements to change the data to fit the object

51

Tags (cont.)

• Think of Tags as high-tech Post-Its™ that you can attach to many types of objects in an Informatica Data Explorer Project

• Note: All of the pull down menu items in Tags can be configured through server configuration files

52

Action Tag

53

Note Tag

54

Rule Tag

55

Lab Exercises 2.11–2.14

56

Content Presentation

• Constant Analysis

• Empty Column Analysis

• Inferred Data Type Analysis

• Null Rule Analysis

• Source Data Type Analysis

• Unique Analysis

• Frequency Analysis

• Pattern Analysis

• Domain Analysis
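Several of these analyses reduce to simple aggregate checks. A sketch of Constant and Empty Column Analysis, using a case-study column for illustration only:

-- Constant analysis: exactly one distinct value suggests a constant column;
-- empty column analysis: zero non-null values means the column is empty
SELECT COUNT(DISTINCT BILL_CODE) AS distinct_values,
       COUNT(BILL_CODE)          AS non_null_count
FROM custord;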

57

Content Presentations

58

Content Presentation (Continued)

59

Constant Analysis

60

Lesson 2 Review

• Column Profiling is about the analysis of column content and format

• Column Profiling scans data files and stores the resulting profile information in an Informatica Data Explorer Project

• Column Profiling information can be viewed by opening a Column List for a Table

61

Lesson 2 Review

• The results of Column Profiling are stored with a Column
• The results of Column Profiling include:
• Primary and Alternate Data Types
• Null Rules
• Minimum/Maximum Value ranges
• Value Pairs
• Patterns

• Tags can be added to Columns or Tables to convey additional information or instructions about the Column

62

Lesson 3

Data Rules

63

Data Rules - Objectives

• What is a Data Rule?

• Using Data Rules in Informatica Data Explorer

• How to test for Data Rules

• Execute Data Rules tasks

• When to apply Data Rules in the data discovery process.

64

Define Data Rules

• What is a Business Rule?
• A Business Rule describes the main characteristics of the data
• What is a Data Rule?
• A Data Rule is a constraint written against one or more Tables that is used to find incorrect data
• Data Rules can be viewed as business rules for data

65

Define Data Rules (cont.)

• Data Rules are often embedded in application programs

• The Informatica Data Explorer Practitioner can discover, document and test Data Rules against the initial source.

66

Using Data Rules in Informatica Data Explorer

• Applying Data Rules is the process of using Informatica Data Explorer to determine whether externally proposed data relationships are fully supported by the source data.

• Discover if the source data supports the relationships and business needs.

• Data Rules are tested against the initial source, stored and then can be re-run after the data has been cleansed or moved.

67

Business Rules and Data Rules

• Employees with 2 or more years of service are paid 3 weeks vacation.

• Fulltime employees are assigned to a salary band.

• Employees in Dept C – salaries cannot be greater than $40,000.

• Department number contained in the employee record must correspond to an existing Department number.

• Does the Column contain a particular string of characters?

68

Business Rules and Data Rules (cont.)

• Does one Column include the full contents of another Column?

• In an address, is there a line of blanks followed by a line of non-blanks?

• Are all three fields of a key null?

• Is the date Column in the wrong format?

• Does the Column contain the right type of data for this type of record?
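Each business rule above can be recast as a Data Rule: a query that returns the rows violating it. For instance, the Dept C salary rule might look like the following sketch (the payroll table and its columns are hypothetical):

-- Rows that violate "Employees in Dept C - salaries cannot exceed $40,000"
SELECT EMPID, DEPT, SALARY
FROM payroll                        -- hypothetical table
WHERE DEPT = 'C'
  AND (SALARY > 40000 OR SALARY IS NULL);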

69

Create and Execute Data Rules

• Data Rules can be created from two locations:
• Rules Tag

• Drill down

• Execute Data Rules from the Rules Tag viewer or Data Rules Management.

70

Drill Down

71

New Rule Tag

72

Lab Exercises 3.1–3.6

73

When to Apply Data Rules

• Tightly coupled to Drill Down

• Data Rules can be executed against different sources.

• Data Rules can be applied at any point in time during the data discovery process.

• Data Rules can be saved and re-run:
• After a data load has occurred, or
• A feed is supplied, or
• Data has changed for any reason

74

Complex Data Rules

RULE LoanTypeAmtTerm
SELECT "Loan_ID", "Loan_Type", "Loan_Amt", "Loan_Term"
FROM <Use Table in Data Source>
WHERE (UPPER(LOAN_TYPE) = 'AUTO' and
       (LOAN_AMT not between 3000 and 50000 or
        LOAN_TERM not between 12 and 60)) or
      (UPPER(LOAN_TYPE) = 'REAL' and
       (LOAN_AMT not between 10000 and 500000 or
        LOAN_TERM not between 36 and 360)) or
      LOAN_TYPE is null or LOAN_AMT is null or LOAN_TERM is null

75

Data Rule Management

76

Lesson 3 Review

• Data Rules can be created on Columns that we think are volatile.

• Data Rules can be created, saved, and run on different data sources.
• Data Rules can be created from two locations:
• Rules Tag
• Drill down

• Execute Data Rules from the Rules Tag viewer or Data Rules Management.

77

Lesson 4

Single Table Structural Analysis

78

Lesson 4 Objectives

• Explain what Table Structural Profiling is, and why it should be performed

• Define the term “Functional Dependency” as used by Informatica Data Explorer, and explain the significance

• Contrast a Single-Column Determinant to a Multiple-Column (or compound) Determinant as used by Informatica Data Explorer

79

Lesson 4 Objectives (cont.)

• Define the terms “Inferred Dependencies” and “Model Dependencies” as used by Informatica Data Explorer

• Explain why and when an Inferred Dependency should be added to the set of Model Dependencies

• Define the term “Sample Data” as used by Informatica Data Explorer, and explain the use of Sample Data in Dependency Profiling

• Understand when and how to apply Informational Tags in Dependency Profiling

80

What is Table Structural Profiling?

• A process that discovers the interrelationships between columns in your source data

• Is performed against samples of data that you have imported into Informatica Data Explorer

• It identifies Columns that determine the value of other Columns

81

Why Profile Table Structure?

• Functional Dependencies determine the structure of a data model and/or database design

• Functional Dependencies can be equated to an elementary form of Business Rule

• Dependencies between data items suggest organization of data storage that is both natural and efficient

82

Why Profile Table Structure? (cont.)

• Quickly validate expected Dependencies (Keys)

• If data does not conform to expected or required dependency rules, you most likely have a data integrity problem

83


What is Sample Data?

• Sample Data is actual data that you import into an Informatica Data Explorer Table, either from:
• Downloaded flat files, or
• Directly from a relational database
• Sample Data is a subset of the data in the source database:
• Multiple data samples can be loaded into Informatica Data Explorer
• Each data sample is stored in the Project

• Sample Data is associated with a particular Table

84

Why Import Sample Data?

• Sample data is used in Table Structural Profiling to examine relationships of all columns of a given record

[Diagram: Column Profiling reads the source data directly and stores only its results; Import Sample Data copies source data into Data Samples (#1, #2, …) inside the Project, and Table Structural Profiling examines those samples as entire records.]

85

Functional Dependencies

• A Column is functionally dependent on other Columns that determine its value

EMPNO   ENAME
123     John Doe
456     Jane Smith
789     Eduardo Sanchez
012     Jane Smith
345     John Doe
789     Eduardo Sanchez

A value of EMPNO always determines the same value of ENAME throughout the sample data

86

Functional Dependencies (cont.)

• A Functional Dependency is written as:
• A → B
• ‘A’ is the Determinant Column
• ‘B’ is the Dependent Column
• The statement is ALWAYS read left to right:
• ‘A functionally determines B’, or
• “If I know a value for A, I can determine the value for B”, or
• For each distinct value of ‘A’ there can only be one value of ‘B’

87

Functional Dependencies (cont.)

• The determinant side can be compound:
• A + B → C
• ‘A’ and ‘B’ together are the Determinant Column
• ‘C’ is the Dependent Column
• The determinant side can be Null:
• Ø → C
• Nothing is the Determinant Column
• ‘C’ is the Dependent Column
• ‘C’ has only one value, or one value and nulls, in the whole sample
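A Functional Dependency can be checked directly in SQL: A → B fails wherever one value of A maps to more than one value of B. Sketches for a simple and a compound determinant, using columns from the course examples (the emp table name is illustrative):

-- Violations of EMPNO -> ENAME: EMPNO values with more than one name
SELECT EMPNO
FROM emp
GROUP BY EMPNO
HAVING COUNT(DISTINCT ENAME) > 1;

-- Violations of ITEM_NO + ORDER_DATE -> UNIT_COST
SELECT ITEM_NO, ORDER_DATE
FROM custord
GROUP BY ITEM_NO, ORDER_DATE
HAVING COUNT(DISTINCT UNIT_COST) > 1;

An empty result means the sample supports the dependency; a single violating determinant value corresponds to a Gray dependency, and more than one to an Unsupported one (see the dependency types later in this lesson).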

88

Reviewing Inferred Dependencies

• You must review the set of Inferred Dependencies

• The Dependencies inferred by Informatica Data Explorer exist implicitly in the data

• You must make decisions as to which of the Inferred Dependencies explicitly represent the current use of the data

• The review process is to determine which dependencies should be included in the set of dependencies from which the Normalized Schema will be generated

89

Sample Data

90

Exercises 4.1 - 4.4

91

Adding an Inferred Dependency to the Model

• Inferred Dependencies added to the model establish the Tables (tables) that will be created in Normalization
• Normalization breaks a single Table into multiple Tables
• For example, the “Employee” Table in the source system represents two Tables (Employee and Department) once the dependencies are created and the model is normalized

92

Adding an Inferred Dependency to the Model (cont.)

• Columns that do not participate as a Dependent are automatically included in the Primary Key
• Informatica Data Explorer considers all Columns as part of the key until a relationship is established

• Dependency Profiling is an iterative process

93

Exercises 4.5 - 4.6

94

Dependency Subject Area

• Inferred Dependencies
• The set of dependencies that are inferred from a sample of data for a Table
• Table Dependencies
• A subset of the Model Dependencies that are wholly contained in a Table

95

Dependency Subject Area (cont.)

• Model Dependencies
• The set of dependencies that you determine fit into your design and are supported by the data
• Model Dependencies are associated at the schema level
• Model Dependencies are the set of all dependencies across all Tables
• Model Dependencies are used to create the normalized schema

96

Dependencies

97

Inferred Dependencies

98

Key Dependencies

99

Model Dependencies

100

Filter Dependencies

101

Add Dependencies to Model or Filter

102

When to Add an Inferred Dependency

• Review each Inferred Dependency and add to the model only those that have an explicit reason for existing:
• Is the application enforcing the dependency?
• Is the user/business enforcing the dependency?
• Is some outside source enforcing the dependency?

103

Types of Dependencies

• True
• The dependency is true for 100% of the data analyzed
• Example: Every time a unique value is known for EMPID, additional information is available (e.g. Employee Name, Address, Phone, etc.)
• Gray
• The dependency is almost, but not quite, 100% true for the data analyzed
• One row causes the violation

104

Types of Dependencies (cont.)

• Unsupported

• Two or more rows in the sample data do not support the dependency

• Unknown

• The dependency has not yet been validated against the sample data (Basis dependencies appear as Unknown until validated)

105

When to Add an Inferred Gray Dependency

• Questions to Ask:
• What caused the dependency to be gray?
• Should another sample be imported for verification?
• Review each Inferred Gray Dependency and add to the model only those that have an explicit reason for existing:
• Is the application supposed to be enforcing the dependency?
• Is the user/business supposed to be enforcing the dependency?
• Is some outside source supposed to be enforcing the dependency?

106

Lab Exercise 4.7

107

Tagging Dependencies

• You cannot tag an Inferred or Model Dependency

• You add Tags to the Column that is causing the problem

108

Compound Determinants

• Two or more Columns that uniquely identify the Dependent Column

• This often represents an M-to-1 relationship in the data

• This happens quite often in older file-based systems

109

Lab Exercise 4.8 – 4.9

110

Lesson 4 Review

• Importing Sample Data stores the data inside an Informatica Data Explorer Project

• Sample Data is used as input to Dependency Profiling

• You must import Sample Data before you can perform the Profile Dependencies task using Informatica Data Explorer
• Data samples are imported using the Import Sample Data feature
• Data samples can be retained from doing a Drill Down or executing a Data Rule

111

Lesson 4 Review (cont.)

• Dependency Profiling finds the relationships between Columns in the same source file or table

• All Inferred Dependencies are associated with sets of Sample Data

• Table Dependencies are dependencies that have been added to the model, and are associated with a specific Table

112

Lesson 4 Review (cont.)

• Model Dependencies are the set of dependencies from all Tables in the schema

• Only Model Dependencies are used as input to the generation of a Normalized Schema

• All Dependencies inferred by Informatica Data Explorer exist implicitly in the data

113

Lesson 4 Review (cont.)

• You will find many Inferred Dependencies that have no meaning in context of the application or business use of the data

• These are Implicit Dependencies that have no explicit meaning

• Dependency Profiling is an iterative process

114

Lesson 5

Cross Table Profiling

115

Lesson 5 Objectives

• Explain what Cross Table Profiling is, and why it should be performed

• Execute the Cross Table Profiling function in Informatica Data Explorer

• Navigate and review the results of Cross Table Profiling

• Define the terms “Synonym” and “Homonym” as used in Informatica Data Explorer

116

Lesson 5 Objectives (cont.)

• Understand what data is used for Cross Table Profiling, and how potential Synonyms are identified

• Describe why and when a Synonym should be created

• Create a Synonym

• Understand the significance of creating Synonyms

117

Cross Table Profiling

• The process that identifies similarity between the values in Columns of different Tables
• Performed using the value sets associated with the Column objects inside Informatica Data Explorer
• These are the Value Frequency Lists that were created by Column Profiling

118

Why Profile Redundancies?

• To uncover Columns that actually represent the same business facts

• Informatica Data Explorer can uncover two types of redundancies:
• Synonyms
• Redundant data that you would like to eliminate through the creation of Synonyms
• Redundant data that is intended to improve database performance
• Homonyms
• Data that looks redundant but actually represents quite different business facts

119

Comparing Value Sets

[Diagram: Value Set 1 (A, B, C) and Value Set 2 (B, C, D, E), with the value overlap (B, C) shown as the intersection of the two sets.]
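The overlap that Cross Table Profiling estimates can be expressed exactly in SQL; a sketch comparing two value sets from the course tables (illustrative only):

-- Share of SP_NO's distinct values that also occur as EMPID values
SELECT 100.0 * COUNT(*) /
       (SELECT COUNT(DISTINCT SP_NO) FROM custord) AS pct_overlap
FROM (SELECT DISTINCT SP_NO FROM custord) s
WHERE s.SP_NO IN (SELECT EMPID FROM empinfo);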

120

Inferred Redundancies

121

Exercise 5.1 - 5.2

122

Synonyms

• Two or more Columns having the same business meaning
• Comparing common values between columns can identify candidate Synonyms

[Diagram: the SP_NO value set and the EMPID value set share a 28% overlap.]

123

Effect of Synonyms

• If the Primary Keys of two Tables are synonyms, they will collapse into a single Table in the Normalized Schema

[Diagram: two source Tables keyed by TransactionID – one with ProductID, ProductName, and InventorySupplier, the other with SupplierName and SupplierAddress – collapse into a single Table: TransactionID (PK), ProductID, ProductName, SupplierName, SupplierAddr.]

124

Effect of Synonyms (cont.)

• If two Columns that are synonyms represent a parent-child relationship, they will result in two Columns in two Tables with one Column participating in a Primary Key and the other in the corresponding Foreign Key

[Diagram: an Order Table (OrderNumber, ProductID, ProductName) and a Payment Table (PaymentID, OrderID, CheckNumber); with the synonym made, the Normalized Schema keeps OrderNumber (PK) in the Order Table and adds OrderNumber (FK) to the Payment Table.]

125

Homonyms Defined

• Two or more Columns having the same name yet different business meanings

[Diagram: the STATE value set and the SHIPPING_STATE value set share a 70% overlap yet describe different business facts.]

126

Making Synonyms

127

Synonyms

128

Exercise 5.3 - 5.4

129

Lesson 5 Review

• Cross Table Profiling is about data integration between sets of data

• Cross Table Profiling comprises 2 activities:
• Comparing value lists
• Use Foreign Key or Join analysis to compare value lists greater than 16,000
• Assigning Synonyms
• Rule of Thumb:
• Be conservative about making Synonyms
• You can always come back after you’ve normalized the schema and make more

130

Lesson 5 Review

• You cannot make intra-table Synonyms, only inter-table

• You must have built Value Lists either during the Profile Columns task, or during the Import Sample Data task, before you can perform Cross Table Profiling

• Creation of Synonyms participates in Normalization

131

Lesson 6

Validating Table and Cross Table Analysis

132

Lesson 6 Objectives

• Understand how Validation differs from Cross Table Profiling
• Define and discuss the term Referential Integrity
• Explain various methods of validation and how each can be used
• Execute Validation tasks
• View Validation results

133

Validation

• Validation can be used to:
• Define the exact overlap characteristics of two redundant Columns
• Validate a single or multi-Column foreign key
• Validate that the keys of two tables do not overlap (Vertical Merge)
• Validate single or multiple Column keys (Validate Keys)
• Validate a Join
• Validate against a reference table
• Validate against Domain values
• Execute Validation from Single Table Structural Analysis and Cross Table Structural Analysis

134

Referential Integrity

• Example A: An Order File contains an OrderID that uniquely identifies each customer order. There should be no OrderID values in the Order or Detail file that do not exist in the other.

Example A

135

Referential Integrity (cont.)

• Example B: An Order file may have OrderID values that do not exist in the Payment file (outstanding payments or unbilled customers). The Payment file should not have any OrderID values that do not occur in the Order file.

Example B

136

Validation and Cross Table Profiling

• Validation compares sets of Columns between two relations to discover the quality of the overlap.
• Validation exhaustively tests all the data.
• Cross Table Profiling discovers potential overlap between Columns.
• Cross Table Profiling estimates overlap.
• Results of Validation – sets of statistics about the overlapping and non-overlapping values

137

Profile Redundant Columns

• To understand the exact overlap:
• Execute Validation from Cross Table Profiling, or
• Create a relationship (Primary Key / Foreign Key, Join, …) between the two Columns and choose Validate

138

Exercise 6.1-6.2

139

Foreign Key Analysis

• Validate a Single or Multi-Column Foreign Key
• Primary use – test the Referential Integrity of primary and foreign key relationships.
• Each row in a child table must reference a row in the parent table.
• Every order detail record must reference an order.
• Information discovered can be used to help write logic to perform the data integration.
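In SQL, Foreign Key Analysis amounts to two anti-joins; the orders and order_detail table names here are illustrative:

-- Orphan children: detail rows whose ORDER_NO has no parent order
SELECT d.*
FROM order_detail d
WHERE NOT EXISTS
      (SELECT 1 FROM orders o WHERE o.ORDER_NO = d.ORDER_NO);

-- Parents without children: orders that no detail row references
SELECT o.*
FROM orders o
WHERE NOT EXISTS
      (SELECT 1 FROM order_detail d WHERE d.ORDER_NO = o.ORDER_NO);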

140

Foreign Key Analysis Results

141

Parents Without Children

142

Exercise 6.3-6.5

143

Vertical Merge Analysis

• Primary use – when two similar systems are merged together.
• Company A merges with Company B: payroll master records are merged.
• It is expected that all rows in the parent and child tables are orphans:
• Employees of Company A are not on Company B’s payroll master file.
• Employees of Company B are not on Company A’s payroll master file.
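A vertical merge check reduces to looking for keys that appear in both tables; a sketch with hypothetical payroll tables:

-- Key collisions that must be resolved before merging the two files
SELECT a.EMPID
FROM payroll_a a
WHERE a.EMPID IN (SELECT b.EMPID FROM payroll_b b);

An empty result confirms the expectation that every row is an orphan with respect to the other file.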

144

Vertical Merge

145

Vertical Merge Analysis Results

146

Exercise 6.6-6.8

147

Validate Key Analysis

• Primary use – validate keys in a single Table
• Validation looks at the table and checks to make sure that every row is unique.
• Use this feature to find any duplicate rows for keys discovered in Single Table Structural Analysis.
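Validating a key in SQL is a duplicate check; here sketched for the compound custord key from the case study:

-- Candidate key values that occur more than once
SELECT ORDER_NO, ITEM_NO, COUNT(*) AS occurrences
FROM custord
GROUP BY ORDER_NO, ITEM_NO
HAVING COUNT(*) > 1;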

148

Validate Key

149

Validate Key Analysis Results

150

New Alternate Key

151

Validate Alternate Key

152

Exercise 6.9 - 6.12

153

Lesson 7

Normalization

154

Lesson 7 Objectives

• Explain what Normalization is and when it should be performed

• Execute the Normalization function of Informatica Data Explorer

• Navigate and review the results of Normalization

155

Lesson 7 Objectives (cont.)

• Describe what a Column Trace is, and how it is used

• Understand how to modify the Normalized Schema by making changes to the Source Schema

• Explain the iterative nature of Normalization

156

Normalization

• A process that transforms an initial schema into a schema with greater integrity
• A process of transforming the Source Schema into a:
• Non-redundant
• Anomaly-free
• Third Normal Form model
• Normalization is based upon:
• Dependencies added to the model in Single Table Structural Analysis, and
• Synonyms made in Cross Table Structural Analysis

157

Why Normalize?

• A Third Normal Form (3NF) schema has no:
• Redundant Columns other than Foreign Keys
• Columns that are only partially dependent on the key
• Transitive Dependencies
• The Normalized Schema provides a checkpoint for the completeness and accuracy of the decisions you made during the profiling tasks

158

Exercise 7.1

Normalized Schema v. Source Schema

Source Schema:

custord – ORDER_NO: char(4), ITEM_NO: char(6), ORDER_DATE: datetime, SHIPDT: datetime, PO_NUM: smallint, LAST_NAME: varchar(10), FIRST_NAME: varchar(11), CNAME: varchar(36), CON_TTL: varchar(27), SHIPPING_STREET: varchar(40), SHIPPING_CITY: varchar(20), SHIPPING_STATE: char(2), SHIPPING_ZIP: varchar(10), PHONENUM: varchar(12), SP_NO: smallint, QUANTITY: smallint, ITEM_DSC: varchar(25), SUPID: smallint, UNIT_COST: money, TAX_RATE: decimal(5,4), BILL_CODE: char(10)

empinfo – EMPID: smallint, LAST_NAME: varchar(17), FIRST_NAME: varchar(12), GENDER: char(1), DEPTID: smallint, DEPTNM: varchar(14), TITLE: varchar(30), STREET: varchar(40), CITY: varchar(15), STATE: varchar(3), ZIP: varchar(10), PHONE: varchar(14)

Normalized Schema:

All_Constant_Attributes – BILL_CODE: char(10), TAX_RATE: decimal(5,4)

ITEM_NO – ITEM_NO: char(6), SUPID: smallint, ITEM_DSC: varchar(25)

ITEM_NO_ORDER_DATE – ITEM_NO: char(6), ORDER_DATE: datetime, UNIT_COST: money

ORDER_NO – ORDER_NO: char(4), PHONENUM: varchar(12), SHIPPING_ZIP: varchar(10), SHIPPING_STATE: char(2), SHIPPING_CITY: varchar(20), SHIPPING_STREET: varchar(40), CON_TTL: varchar(27), CNAME: varchar(36), FIRST_NAME: varchar(11), LAST_NAME: varchar(10), PO_NUM: smallint

ITEM_NO_ORDER_NO – ITEM_NO: char(6), ORDER_NO: char(4), ORDER_DATE: datetime, EmployeeID: smallint, DEPTID: smallint, QUANTITY: smallint, SHIPDT: datetime

DEPTID – DEPTID: smallint, DEPTNM: varchar(14)

EmployeeID – EmployeeID: smallint, DEPTID: smallint, PHONE: varchar(14), ZIP: varchar(10), STATE: varchar(3), CITY: varchar(15), STREET: varchar(40), TITLE: varchar(30), GENDER: char(1), FIRST_NAME: varchar(12), LAST_NAME: varchar(17)

160

Normalized Schema Anomalies

• Observable normalization anomalies may include:
• Unexpected Tables
• Duplicate Tables
• Tables with strange/unexpected keys
• Columns in the wrong locations

161

Column Tracing

• Allows you to find the origin of a Column in another schema
• Used to determine the Source Model Dependencies and Synonyms (or the lack thereof) which may be causing the anomaly

162

Schema Locking

• The existence of a Normalized Schema causes Informatica Data Explorer to lock various objects in the Source Schema

• In order to modify Dependencies in the Source Schema, you must remove the Normalized Schema

163

Re-Normalizing

• In order to change the Normalized Schema, you must:
• Remove the Normalized Schema
• Modify the Source Schema
• Re-run Normalization
• The next exercises:
• Remove a dependency
• Add another Table
• Renormalize the schema
• Review the new Normalized Schema

164

Lab Exercises 7.2 – 7.3

165

New Normalized Schema

All_Constant_Attributes – BILL_CODE: char(10), TAX_RATE: decimal(5,4)

ITEM_NO – ITEM_NO: char(6), SUPID: smallint, ITEM_DSC: varchar(25)

SHIPPING_ZIP – SHIPPING_ZIP: varchar(10), SHIPPING_STATE: char(2), SHIPPING_CITY: varchar(20), SHIPPING_STREET: varchar(40)

ORDER_NO – ORDER_NO: char(4), PHONENUM: varchar(12), SHIPPING_ZIP: varchar(10), CON_TTL: varchar(27), CNAME: varchar(36), FIRST_NAME: varchar(11), LAST_NAME: varchar(10), PO_NUM: smallint

ITEM_NO_ORDER_NO – ITEM_NO: char(6), ORDER_NO: char(4), UNIT_COST: money, QUANTITY: smallint, EmployeeID: smallint, SHIPDT: datetime, ORDER_DATE: datetime

DEPTID – DEPTID: smallint, DEPTNM: varchar(14)

EmployeeID – EmployeeID: smallint, PHONE: varchar(14), ZIP: varchar(10), STATE: varchar(3), CITY: varchar(15), STREET: varchar(40), TITLE: varchar(30), DEPTID: smallint, GENDER: char(1), FIRST_NAME: varchar(12), LAST_NAME: varchar(17)

166

Lesson 7 Review

• Normalization is a 100% automated process

• The only inputs to the normalization process are:
• Dependencies added to the Model
• Column Synonyms

• Refinement of the Normalized Schema is an iterative process

167

Lesson 7 Review (cont.)

• The Normalized Schema is most often used as a basis for:
• Baseline view
• Review for anomalies
• Comparison to business requirements
• Staging Area

• The Normalized Schema is not a business model

168

Lesson 7 Review (cont.)

• Normalized Schema anomalies stem from either:
• Dependencies added to the model
• Dependencies not added to the model
• Incorrect (or unmade) Synonyms
• You can Normalize the Source Schema as soon as you have added dependencies to the model during Single Table Structural Analysis
• Actually, you can do it any time, but it will just make a copy of your existing schema if you have not added any dependencies

169

Lesson 7 Review (cont.)

• If you have not established inter-relational Synonyms, you will get duplicate Tables and/or Columns in the Normalized Schema
• Duplicate Tables will appear in the Normalized Model with an extension, such as:
• EmployeeID
• EmployeeID_1
• Suggestions:
• Make only one change at a time and then renormalize
• Often making one change in the Source Schema can result in several changes in the Normalized Schema

170

Lesson 8

Exporting to the Repository

171

Lesson 8 Objective

• Export Projects to the IDE Repository

172

What is the Repository?

• A series of relational database tables that store the results from the Informatica Data Explorer Product Suite

173

Repository Export

• The Repository Export dialog box enables you to export an IDE catalog to the Repository

• The Repository Export dialog box provides the ability to limit some of the data that is exported to the Repository

• Once in the Repository, the Catalog becomes available to a variety of DBMS tools, such as SQL, report generators, and so on

• All schemas in the Catalog will be exported to the Repository

IDE Repository Architecture

[Diagram: the IDE Client (Windows XP, 2000) works against a Project on the IDE Server (UNIX or Windows NT); the server exports through ODBC drivers to the Repository RDBMS, which may run on a separate UNIX or Windows NT machine.]

Exporting to Repository

176

Lab Exercise 8.1

177

Lesson 8 Review

• You control what information from a Project is included in the Export process

• The more you export, the longer the process will take

• Information exported to the IDE Repository becomes available to:
• Informatica Data Explorer Repository Navigator
• Report Writing tools
• SQL tools

178

Lesson 9

Using the Repository Navigator

179

Lesson 9 Objectives

• Understand use of the Repository Navigator

• Access the IDE Repository and browse its contents using the Navigator

• Explain Tags

• Understand how to share information among departments

180

IDE Repository Navigator

• A browser for the contents of the IDE Repository

• Can be used by anyone in your enterprise

[Diagram: the Repository holds knowledge about corporate systems – their structure, content, and quality.]

181

IDE Repository Architecture

[Diagram: the full architecture – the IDE Client, Source Profiler, and FTM/XML run on client workstations (Windows XP, 2000); the IDE Server and Project run on UNIX or Windows NT; the IDE Repository RDBMS, reached through ODBC drivers, can run on its own UNIX or Windows NT server; the Repository Navigator browses the IDE Repository directly.]

Schema Viewer

• The Schema Viewer functions similarly to the Navigation Tree in Informatica Data Explorer:
• You expand/contract objects
• You right-click to view properties
• The Schema Viewer provides users with the ability to query profiling information for Tables and Columns (Properties, Tags, Sample Data, Value Frequency Lists) within each schema

183

Exercise 9.1-9.3

184

The Link Viewer

• The Link Viewer shows links between any two schemas in the current project.

• Link Viewer uses:

• View Links between Columns

• Find information on compatibility problems

• Access Tags associated with Links

Link Viewer

185

Table Viewer

• Provides SQL access to the IDE Repository

• Has several pre-built SQL queries

• Allows you to run your own custom queries
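A custom query might, for example, list columns whose inferred type disagrees with the documented one. The repository table and column names below are invented for illustration; the real names are documented in the IDE Repository schema reference:

-- Hypothetical query against the IDE Repository
SELECT ProjectName, TableName, ColumnName,
       DocumentedType, InferredType
FROM REPO_COLUMN_PROFILE          -- hypothetical table name
WHERE DocumentedType <> InferredType;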

186

Exercise 9.4-9.6

187

Lesson 9 Review

• The IDE Repository provides:
• Rapid access to source data knowledge

• Team collaboration

• Enhanced communication

• Flexible ad hoc reporting

188

Lesson 10

Repository Reports

189

Lesson 10 Objectives

• Understand what IDE Repository Reports are

• Demonstrate how to use Repository Reports

• Create a report using a Crystal Reports template

• Export a report using Crystal Reports

190

What are Repository Reports

• IDE Repository Reports are a series of reports that provide specific management information from the IDE Repository.

• Reports are written with Crystal Reports.

191

Why use Crystal Reports?

• Provides a user interface to guide the design of reports against data stored in a relational database

• Can export data to other programs such as Excel, Word or HTML pages

• Provides the flexibility to create custom or ad hoc reports. The user is not limited to the reports provided in the Informatica Data Explorer Product Suite

• Accesses the IDE Repository through an ODBC connection

192

Report Templates

• A series of reports are provided as an easy means of obtaining documentation from the IDE Repository

• The Report Templates can be modified to meet individual needs

193

List of Reports

Column Profile - By File – Column Profiling results sorted by File
Column Profile - By Field – Column Profiling results sorted by Field
Null Rule Exceptions – List of Attributes with Null, Zero or Blanks
Value Frequency – Value Frequency Lists for Attributes
Supported Relationships – Inferred Dependencies for each Data Sample
Model Relationships – Dependencies that have been added to the Model
Overlapping Data – Redundancy Profiling Overlap Report
Notes – Note Tag Report
Action Items – Action Item Tag Report
Rules – Rule Tag Report
Transformations – Transformation Tag Report
Attribute Links – Reports Links between Attributes

194

Selection Criteria

• Allow users to select values for certain fields within the templates

• Limit the amount of data reported from the IDE Repository

• Each Template provides selection on ProjectName and SchemaName at a minimum

195

Exercises 10.1 – 10.4

196

Exporting Reports

• Crystal Reports provides an option to export report data into other file formats

• Useful for sharing data with individuals that do not have access to Crystal Reports

197

Exercises 10.5

198

Lesson 10 Review

• IDE Repository Reports provide reporting capability from the IDE Repository

• Additional reports can be created to meet business needs

199

Lesson 11

Integration with PowerCenter

200

PowerCenter Integration

• Informatica Data Explorer has the ability to share metadata with PowerCenter. This allows the business users to share knowledge that was found during the data discovery process with the PowerCenter developers.

• Objects that can be shared are:
• Source and target schemas

• Filters

• Expressions (transformation tags in IDE).

201

Create a Transformation Tag

202

Transformation Tag

203

Set Physical Properties

204

Export to Repository

205

Open Fixed Target Mapping (FTM)

206

Open Your Project

207

Export to PowerCenter

208

Import Object into PowerCenter

209

Open Customer in Source Analyzer

210

Open a new Transform

211

Open Ports Tab

212

Informatica Resources

213

Informatica – The Data Integration Company

Informatica provides data integration tools for both batch and real-time applications:

• Data Migration
• Data Synchronization
• Data Warehousing
• Data Hubs
• Business Activity Monitoring

214

Informatica – Company Information

• Founded in 1993
• Leader in enterprise solution products
• Headquarters in Redwood City, CA
• Public company since April 1999 (INFA)
• 2000+ customers, including over 80% of Fortune 100
• Strategic partnerships with IBM Global Services, HP, Accenture, SAP, and many others
• Technology partnership with Composite Software for Enterprise Information Integration (EII) – real-time federated views and reporting across multiple data sources
• Worldwide distribution

215

Informatica Affiliations

216

Informatica Resources

www.informatica.com – provides information (under Services) on:
• Professional Services
• Education Services

my.informatica.com – customers and contractual partners can sign up to access:
• Technical Support
• Product documentation (under Tools – online documentation)
• Velocity Methodology (under Services)
• Knowledgebase
• Mapping templates

devnet.informatica.com – sign up for the Informatica Developers Network:
• Discussion forums
• Web seminars
• Technical papers

217