
Informatica - Complex Scenarios and their Solutions

    Author(s)

    Aatish Thulasee Das

Rohan Vaishampayan

Vishal Raj

    Date written(MM/DD/YY): 07/01/2003

    Declaration

We hereby declare that this document is based on our personal experiences and/or the experiences of our project members. To the best of our knowledge, this document does not contain any material that infringes the copyrights of any other individual or organization, including the customers of Infosys.

Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj

    Project Details

    Projects involved: REYREY

    H/W Platform: 516 RAM, Microsoft Windows 2000

    S/W Environment: Informatica

    Appln. Type: ETL tool

Project Type: Data warehousing

Target readers: Data warehousing teams using ETL tools

    Keywords

ETL Tools, Informatica, Data Warehousing

INDEX

Informatica - Complex Scenarios and their Solutions

Author(s)
Declaration
Project Details
Keywords

Introduction

Scenarios:

1. Performance problems when a mapping contains multiple sources and targets
   1.1 Background
   1.2 Problem Scenario
   1.3 Solution

2. When the source data is a flat file
   2.1 Background
   2.2 Problem Scenario
   2.3 Solution

3. Extracting data from a flat file containing nested record sets

4. Too Large Lookup Tables

5. Complex logic for Sequence Generation

  • 8/8/2019 15803147 Typical Scenario

    3/16

    Introduction

This document is based upon the learning we gained while working on the Reynolds and Reynolds project in CAPS (PCC), Pune. We have come up with best practices to overcome the complex scenarios we faced during the ETL process. The document also describes some common best practices to follow while developing mappings.

    Scenarios:

1. Performance problems when a mapping contains multiple sources and targets

    1.1 Background

In Informatica, multiple sources can be mapped to multiple targets within a single mapping. This property is quite useful for keeping related mappings in one place: it reduces the number of sessions to be created, and all the related loading takes place in one go. It is quite logical to group different sources and targets that share the same logic into the same mapping.

    1.2 Problem Scenario:

In the multiple-target scenario, if some of the sub-mappings contain complex transformations, performance degrades drastically, because a single database connection has to handle multiple database statements. Such a mapping is also difficult to manage: for example, if one sub-mapping causes a performance problem, the other sub-mappings suffer the same degradation.

    1.3 Solution:

Divide and rule. It is always better to divide a complex mapping (i.e. one with multiple sources and targets) into simple mappings with one source and one target each. That greatly helps in managing the mappings. All the related mappings can then be executed in parallel in different sessions: each session establishes its own connection, and the server can handle all the requests in parallel against the multiple targets. Each session can be placed into a batch and run in CONCURRENT mode.
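For illustration, the same idea can be sketched outside Informatica. Below is a minimal Python sketch, assuming a hypothetical demo database and hypothetical source/target table pairs, in which each one-source-one-target load runs as its own "session" with its own connection, concurrently:

```python
# A sketch only: "demo.db" and the table names below are hypothetical
# stand-ins for the real sources and targets.
import sqlite3
from concurrent.futures import ThreadPoolExecutor

DB = "demo.db"
PAIRS = [("src_orders", "tgt_orders"), ("src_dealers", "tgt_dealers")]

def setup():
    # Create one simple source/target pair per "mapping".
    con = sqlite3.connect(DB)
    for src, tgt in PAIRS:
        con.execute(f"CREATE TABLE IF NOT EXISTS {src} (id INTEGER, val TEXT)")
        con.execute(f"CREATE TABLE IF NOT EXISTS {tgt} (id INTEGER, val TEXT)")
        con.execute(f"INSERT INTO {src} VALUES (1, 'a'), (2, 'b')")
    con.commit()
    con.close()

def load(src, tgt):
    # Each "session" opens its own connection, like sessions placed in a
    # batch and run in CONCURRENT mode.
    con = sqlite3.connect(DB)
    con.execute(f"INSERT INTO {tgt} SELECT * FROM {src}")
    con.commit()
    con.close()

setup()
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(load, src, tgt) for src, tgt in PAIRS]
    for f in futures:
        f.result()  # surface any errors from the parallel loads
```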


2. When the source data is a flat file

    2.1 Background

    What is a Flat File?

A flat file is one in which table data is stored as lines of ASCII text, with the value from each table cell separated by a delimiter or a space, and each row represented by a new line.

    Below is the sample Flat File which was used during the project.

    Fig 2.1: In_Daily - Flat File.

    2.2 Problem Scenario

When the above flat file was loaded into Informatica, the Source Analyzer appeared as shown below.


    Fig 2.2: In_Daily - Flat File after loading into Informatica.

Two issues were encountered while loading the flat file shown above:

1. The data types of the fields in the flat file and of the respective fields in the target tables did not match. For example, referring to Fig 2.1, in the first row (the record for BH) the fourth field has the data type Date, whereas in the third row (the record for CR) the fourth field is Char; the corresponding field in the target table had the data type Char, so it could not accommodate both record types.

2. The sizes of the fields in the flat file and of the respective fields in the target tables did not match. For example, referring to Fig 2.1, in the eighth row (the record for QR) the fifth field has a size of 100, but after loading, the Source Analyzer showed the field size as 45 (as shown in Fig 2.2); similarly, the fifth field for CR has a size of 5, while the corresponding field in the target table had a size of 100.

    2.3 Solution

The following is the solution we incorporated to solve the above problems:

1. Since the data was so heterogeneous, we decided to keep all the data types in the Source Qualifier as String and converted them according to the fields to which they were being mapped.


2. Regarding the size of the fields, we changed each field to the maximum possible size (the actual screen is attached for reference).
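A minimal Python sketch of this read-everything-as-String-then-convert approach follows; the record layouts, field positions, and date format are hypothetical:

```python
# A sketch of the fix, not the actual mapping: every field is read as a
# string, and only the mapping to a (hypothetical) target layout decides
# the final type. Field positions and the date format are made up.
from datetime import datetime

def to_date(s): return datetime.strptime(s, "%Y%m%d").date()
def to_str(s):  return s.strip()

# Per-record-type conversions, driven by the target fields (hypothetical).
CASTS = {
    "BH": [to_str, to_str, to_str, to_date],  # 4th field is a Date for BH
    "CR": [to_str, to_str, to_str, to_str],   # 4th field stays Char for CR
}

def parse_line(line, delimiter="|"):
    fields = line.rstrip("\n").split(delimiter)   # everything starts as String
    casts = CASTS.get(fields[0])
    if casts is None:
        return fields                             # unknown record type: keep raw
    return [cast(value) for cast, value in zip(casts, fields)]

print(parse_line("BH|001|DLR|20030701"))  # [..., datetime.date(2003, 7, 1)]
print(parse_line("CR|002|DLR|OPEN"))      # [..., 'OPEN']
```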

3. Extracting data from a flat file containing nested record sets

    3.1 Background:

The flat file shown in the previous section (Fig 2.2) contains nested record sets. To explain the nesting, the records of the file are restructured in Fig 3.1.


    Fig 3.1: In_Daily - Flat File restructured in the Nested form.

Here the data is in three levels. The first level contains the batch file information, starting with a BH record and ending with a BT record. The second level contains the dealer records within the batch file, starting with a DH record and ending with a DT record. The third level contains information about the different activities of a particular dealer.

    3.2 Problem Scenario:

The data required for loading was such that a single row should consist of the dealer details as well as the different activities done by that dealer. But only the second-level data (i.e. the 2nd and 14th rows in the flat file shown above) contains the dealer details, while the third-level data contains the activity details for the dealers. The two had to be concatenated to form a single piece of information to be loaded into a single row of the target table.

    3.3 Solution:

In this particular kind of scenario, the dealer information (the second-level data) should be stored in variables, using a condition that identifies a dealer-information row. That row itself should then be filtered out in the next transformation. So, for that particular row of the flat file (i.e. the dealer information), the data is stored in the variables; for the dealer activity data (the third-level data), each row is passed on to the next transformation together with the dealer information that was stored in the variables during the previous row's load.

The same is done here (the actual screen is attached for reference).
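For illustration, here is a minimal Python sketch of the same variable-carrying logic; the "AC" activity tag and the sample lines are hypothetical:

```python
# A sketch of the variable-carrying logic, not the actual mapping. "BH",
# "BT", "DH" and "DT" are the header/trailer tags named above; the "AC"
# activity tag and the sample lines are made up.
def flatten(lines, delimiter="|"):
    dealer = None                        # plays the role of the mapping variables
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        tag = fields[0]
        if tag == "DH":                  # second-level row: store, then filter out
            dealer = fields[1:]
        elif tag in ("BH", "BT", "DT"):  # other header/trailer rows: filter out
            continue
        else:                            # third-level row: concatenate and emit
            yield dealer + fields

sample = ["BH|20030701", "DH|D001|Smith Motors", "AC|SALE|100",
          "AC|SVC|40", "DT|D001", "BT|1"]
for row in flatten(sample):
    print(row)  # each activity row carries the dealer data stored before it
```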

    4. Too Large Lookup Tables:

    4.1 Background:

    What is a Lookup Transformation?

A Lookup transformation is used in your mapping to look up data in a relational table, view, or synonym (see Fig 4.1). A lookup definition can be imported from any relational database to which both the Informatica Client and Server can connect, and multiple Lookup transformations can be used in a mapping.

The Informatica Server queries the lookup table based on the lookup ports in the transformation (see Fig 4.2). It compares the Lookup transformation port values to the lookup table column values based on the lookup condition, and the result of the lookup can be passed to other transformations and to the target.
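In plain-Python terms, the mechanism looks roughly like the sketch below; the employee lookup table and source rows are hypothetical:

```python
# Roughly what a lookup does: the lookup table is keyed on the condition
# column, each incoming row's port value is compared against those keys,
# and a matched row supplies the related value passed onward.
# (Hypothetical data, for illustration only.)
lookup_table = {101: "Alice", 102: "Bob"}      # employee_id -> employee_name

source_rows = [{"employee_id": 101, "sales": 500},
               {"employee_id": 103, "sales": 200}]

for row in source_rows:
    # Lookup condition: lookup.employee_id = source.employee_id
    row["employee_name"] = lookup_table.get(row["employee_id"])  # None on no match
    print(row)
```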


    You can use the Lookup transformation to perform many tasks, including:

Get a related value. For example, your source table includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.

Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.

    (The actual screens are attached for reference.)

    Fig 4.1: LOOKUP is a kind of Transformation.


    Lookup Conditions

    Fig 4.2: The Lookup Conditions to be specified in order to get Lookup Values.

    4.2 Problem Scenario:

In the project, one of the mappings had large lookup tables that were hampering the performance of the mapping because:

a. they were consuming a lot of cache memory unnecessarily, and

b. too much time was spent searching for a relatively small number of values in a large lookup table.

Thus the loading of data from the source table(s) to the target table(s) was consuming more time than it normally should.

    4.3 Our Solution:

We eliminated the first problem by simply using the lookup table as one of the source tables itself. Source and target tables are not cached in Informatica, so it made sense to use the large lookup table as a source (see Fig 4.3). This also ensured that cache memory would not be wasted unnecessarily and could be used for other tasks.


Fig 4.3: The Mapping showing the use of the Lookup table as a Source table (multiple source tables joined in the Source Qualifier).


Fig 4.4: The use of the join condition (SQL specified as a User Defined Join) in the Source Qualifier.

After using the lookup table as a source, we specified a join condition in the Source Qualifier. This reduced the search time taken by Informatica, as the number of rows to be searched was drastically reduced: the join condition takes care of the excess rows which would otherwise have been present in the Lookup transformation. Thus the second problem was also successfully eliminated.
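In SQL terms, the change amounts to something like the following sketch, which uses sqlite3 and hypothetical src_sales and lkp_dealers tables in place of the real ones:

```python
# A sketch of the same fix in SQL terms, using sqlite3; the table and
# column names are hypothetical. Instead of caching a large lookup table,
# it is joined to the source in the query itself, akin to the User Defined
# Join in the Source Qualifier.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_sales (dealer_id INTEGER, amount INTEGER);
    CREATE TABLE lkp_dealers (dealer_id INTEGER, dealer_name TEXT);
    INSERT INTO src_sales VALUES (1, 100), (2, 250);
    INSERT INTO lkp_dealers VALUES (1, 'Smith Motors'), (2, 'Jones Autos');
""")

# The join returns only the rows actually needed: nothing is cached, and
# no row-by-row search over the full lookup table takes place.
rows = con.execute("""
    SELECT s.dealer_id, l.dealer_name, s.amount
    FROM src_sales s
    JOIN lkp_dealers l ON l.dealer_id = s.dealer_id
""").fetchall()
print(rows)  # [(1, 'Smith Motors', 100), (2, 'Jones Autos', 250)]
```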

5. Complex logic for Sequence Generation:

    5.1 Background:

What is a Sequence Generator?

A Sequence Generator is a transformation that generates a sequence of numbers once you specify a starting value (see Fig 5.2) and the increment by which to increase that value. (The actual screens are attached for reference.)
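Reduced to its essentials, such a generator is just a start value and an increment; a minimal Python sketch (not Informatica's implementation):

```python
# A sequence generator reduced to its essentials: a start value and an
# increment, handing out one value per call. (A sketch only.)
import itertools

seq = itertools.count(1000, 1)           # start value 1000, increment 1
print(next(seq), next(seq), next(seq))   # 1000 1001 1002
```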


    Fig 5.1: The Sequence Generator is a kind of Transformation.


    Fig 5.2: The Transformation details to be filled in order to generate a sequence.

5.2 Problem Scenario: In the project, one of the mappings had two requirements:

a. During the transfer of data to a column of a target table, the Sequence Generator was required to trigger only selectively. But by its nature, the Sequence Generator triggers every time a row gets loaded into the target table.

b. Another requirement was that the numbers generated by the Sequence Generator had to form an unbroken sequence.

For example: the values to be loaded into the column of the target table were either sequence-generated or obtained from a lookup table. Whenever the lookup condition returned a value, that value would populate the target table, but at the same time the Sequence Generator would also trigger, incrementing its CURRVAL (current value, see Fig 5.1) by 1. So when the next sequence value was loaded into the column, the difference between consecutive generated values would be 2 instead of 1. Thus the generated sequence would not be continuous, and there would be gaps or holes in it.


5.3 Our Solution:

A basic rule of the Sequence Generator is that it triggers whenever a row gets loaded into the target table. In order to prevent the Sequence Generator from triggering unnecessarily, we created two instances of the same target table (see Fig 5.3).

Fig 5.3: The Mapping showing two instances of the same Target table, fed by the Sequence Generator and the Lookup table.

The Sequence Generator was mapped to the column in the first instance of the target table (see Fig 5.3), whereas the value returned from the lookup table (if any) was mapped to the same column in the second instance of the target table (see Fig 5.3).

All the other values for the remaining columns of the target table were filtered on the basis of the value returned from the lookup table: if the lookup table returned a value, a row was populated in the second instance of the target table, and thus the Sequence Generator was not triggered.

If the lookup table returned a null value, a row was populated in the first instance of the target table; in this case the Sequence Generator triggered, and its value was loaded into the column of the target table.
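A minimal Python sketch of this routing, with a hypothetical lookup table and row keys, shows why the resulting sequence has no holes:

```python
# A sketch of the two-instance routing; the lookup contents and row keys
# are hypothetical. The generator is consumed only when the lookup misses,
# so the generated sequence has no holes.
import itertools

lookup = {"A": 500}              # key -> value pre-assigned by the lookup table
seq = itertools.count(1, 1)      # the sequence generator: start 1, increment 1

for key in ["A", "B", "C", "A"]:
    looked_up = lookup.get(key)
    if looked_up is not None:
        target = ("second instance", looked_up)  # lookup hit: seq is untouched
    else:
        target = ("first instance", next(seq))   # lookup miss: seq fires (1, 2, ...)
    print(key, target)
```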


Thus, by gaining control over the triggering of the Sequence Generator, we could avoid the holes or gaps in the generated sequence.