Reporting the Difference Rows Between Two Sources Using a

16
Report ing the diffe r ence rows between two s our ces using Informa tica http://nelrickrodrigues-informatica.blogspot.com The purpose of the mappings discussed below is to report the difference rows between two sources in different scenarios. Scenario 1: When there are difference rows in one of the sources and both the sources are either flat files or a flat fi le and a relati ona l t able. To illustrate this scenario, the two sources considered are two comma separated flat files - EMPLOYEE_FILE_1. txt and EMPLOYEE_FILE_2.txt. The EMPLOYEE_FILE_2.txt has some extra records that need to be reported or loaded into the target which is a r elational table having the same definition as the sources. The difference rows between the two flat files ar e highlighted in red as shown below. Import the two flat file definitions using the Source Analyzer in Informatica PowerCenter Designer client tool as shown below. Create a new mapping in the Mapping Designer and drag the two source definitions into the workspace. Now, to identify the extra records from EMPLOYEE_FILE_2, a Joiner transformation followed by a Filter transformation is used.

Transcript of Reporting the Difference Rows Between Two Sources Using a

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 1/16

Report ing the difference rows between two sources using Informatica

http://nelrickrodrigues-informatica.blogspot.com

The purpose of the mappings discussed below is to report the difference rows between two sources in

different scenarios.

Scenario 1: When there are difference rows in one of the sources and both the sources are either flat

files or a flat file and a relational table. 

To illustrate this scenario, the two sources considered are two comma separated flat files -

EMPLOYEE_FILE_1.txt and EMPLOYEE_FILE_2.txt. The EMPLOYEE_FILE_2.txt has some extra records that

need to be reported or loaded into the target which is a relational table having the same definition as

the sources. The difference rows between the two flat files are highlighted in red as shown below.

Import the two flat file definitions using the Source Analyzer in Informatica PowerCenter Designer clienttool as shown below.

Create a new mapping in the Mapping Designer and drag the two source definitions into the workspace.

Now, to identify the extra records from EMPLOYEE_FILE_2, a Joiner transformation followed by a Filter

transformation is used.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 2/16

The illustration discussed below uses an unsorted Joiner Transformation and since both the sources are

having few records, any one of the sources can be treated as the "Master" source. Here the

EMPLOYEE_FILE_1 source is designated as the "Master" source. In practice, however, to improve

performance for an unsorted joiner transformation, the source with fewer rows is treated as the

"Master" source while for a sorted joiner transformation, the source with fewer duplicate key values is

assigned as the "Master" source.

First drag all the ports from the source qualifier SQ_EMPLOYEE_FILE_2 into the joiner transformation.

Notice the ports created in the joiner transformation are designated as the "Detail" source by default.

Drag only the EMP_ID port from SQ_EMPLOYEE_FILE_1 into the joiner transformation as shown below.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 3/16

Double-click the joiner transformation to open up the Edit view of the transformation. Click on the

Condition tab and specify the condition shown below.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 4/16

Since, we need to pass all the rows from the EMPLOYEE_FILE_2, the Join Type used is "Master Outer

Join" in the Properties tab as shown below. This will ensure that all the rows from the "Detail" source

and only the matching rows from the "Master" source will pass from the joiner transformation.

The Join types supported in the joiner transformation are described below.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 5/16

The rows passed by the joiner transformation are shown below. For the missing rows in the "Master"

source, the EMP_ID1 value is NULL.

Now, a filter transformation can be used ahead to pass only the records having EMP_ID1 as NULL since

these rows correspond to the difference rows from EMPLOYEE_FILE_2 source. Pass all the rows from the

 joiner transformation into the filter transformation and add the Filter condition shown below in the

Properties tab of the filter transformation.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 6/16

The records passed by the filter transformation are shown below.

Link the EMP_ID, EMP_NAME and CITY ports to the target definition. The complete mapping is shown

below.

Create a session task and a workflow. After running the workflow, the difference rows loaded into the

target relational table are shown below.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 7/16

Scenario 2: When there are difference rows in both the sources and both the sources are either flat

files or a flat file and a relational table. 

For this illustration, we consider both the sources are flat files. The difference rows in both the flat files

are highlighted in red. The target relational table has an additional column SOURCE_NAME added, whichindicates the source name where the difference row is present.

The EMPLOYEE_FILE_1 is treated as the "Master" source again for the unsorted joiner transformation.

All the ports from the source qualifier SQ_EMPLOYEE_FILE_1 are passed to the joiner transformation as

shown below because the difference records are present in both the sources.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 8/16

The join condition for the joiner transformation is the same as the first scenario, but for this case, the

Join Type is Full Outer Join as shown below.

The rows passed by the joiner transformation are shown below. For the missing rows in the "Master"

source, the EMP_ID1, EMP_NAME1 and CITY1 values are NULL, while for the missing rows in the "Detail"

source, the EMP_ID, EMP_NAME and CITY values are NULL.

The filter transformation shown below should only pass the rows having NULL values in EMP_ID or

EMP_ID1 ports as these correspond to the difference rows in the EMPLOYEE_FILE_2 and

EMPLOYEE_FILE_1 flat files respectively.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 9/16

 

The filter condition used in the filter transformation is shown below.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 10/16

The filter transformation passes the following rows.

Now, add an expression transformation after the filter transformation. Pass all the ports from the filter

transformation to the expression transformation. The expression transformation should have the

following ports in the order shown below.

The logic for the output port EMP_ID_OUT is that if the EMP_ID value from EMPLOYEE_FILE_2 source is

NULL, pass the EMP_ID1 value from the EMPLOYEE_FILE_1 source, else return the EMP_ID value from

EMPLOYEE_FILE_2 source. This logic works because for any row passed from the filter transformation,

either the row from the "Master" source will have NULL values or the row from the "Detail" source will

have NULL values. A similar logic is applied for the EMP_NAME_OUT and CITY_OUT output ports.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 11/16

Another output port SOURCE_NAME_OUT is used to determine the source of the difference row. The

expression used for this port is shown below.

The complete mapping is shown below.

Create a session task and a workflow. After running the workflow, the difference rows loaded into the

target relational table are shown below.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 12/16

Scenario 3: When there are difference rows in two relational tables residing in the same database.

For this illustration, two relational tables EMPLOYEE_TABLE_1 and EMPLOYEE_TABLE_2 having the same

definition are considered. The rows present in both the tables are shown below and the difference rows

are highlighted in red.

Import the table definition of any one source in the Source Analyzer. Here, the EMPLOYEE_TABLE_1

source definition is imported.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 13/16

A simple SQL query that returns the difference rows in EMPLOYEE_TABLE_1 is given below.

SELECT EMP_ID, EMP_NAME, CITY FROM EMPLOYEE_TABLE_1 EMP_1

WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_2 EMP_2

WHERE EMP_1.EMP_ID = EMP_2.EMP_ID)

The above query can be modified to return the difference rows between the two source tables andalso the table name where the difference row is present. The alias column 'SOURCE_NAME'

gives the source table name of the difference row.

SELECT EMP_ID, EMP_NAME, CITY, 'EMPLOYEE_TABLE_1' AS SOURCE_NAME

FROM EMPLOYEE_TABLE_1 EMP_1 

WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_2 EMP_2

WHERE EMP_1.EMP_ID = EMP_2.EMP_ID)

UNION 

SELECT EMP_ID, EMP_NAME, CITY, 'EMPLOYEE_TABLE_2' AS SOURCE_NAME

FROM EMPLOYEE_TABLE_2 EMP_2 

WHERE NOT EXISTS (SELECT EMP_ID FROM EMPLOYEE_TABLE_1 EMP_1 WHERE EMP_1.EMP_ID = EMP_2.EMP_ID) 

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 14/16

Add the above query to the source qualifier transformation in the Sql Query attribute value as

shown below. This will override the default SQL query issued when the session runs. Ensure that

the order of the columns in the SQL query match the order of the ports in the Source Qualifier.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 15/16

The alias column 'SOURCE_NAME' value also needs to be passed to the target. For this purpose, a new

port SOURCE_NAME is created in the source qualifier as shown below.

By default, all ports that are in the source qualifier are input/output ports. Hence, the new port

SOURCE_NAME that is created should be linked to a field from the source definition having the same

datatype i.e. the datatype varchar2 in the source definition changes to string in the source qualifier. If 

the port SOURCE_NAME is not linked, the session will fail with an error - TE_7020 Internal error. The

Source Qualifier [SQ_EMPLOYEE_TABLE_1] contains an unbound field [SOURCE_NAME].

Link the CITY port from the source definition to the SOURCE_NAME port in the source qualifier. As a

good practice, use an expression transformation in between the source qualifier and the target

definition as shown below.

8/8/2019 Reporting the Difference Rows Between Two Sources Using a

http://slidepdf.com/reader/full/reporting-the-difference-rows-between-two-sources-using-a 16/16

Create a session task and a workflow. After running the workflow, the difference rows loaded into the

target relational table are shown below.