
Informatica - Complex Scenarios and their Solutions

    Author(s)

    Aatish Thulasee Das

Rohan Vaishampayan

Vishal Raj

    Date written(MM/DD/YY): 07/01/2003

    Declaration

We hereby declare that this document is based on our personal experiences and/or the experiences of our project members. To the best of our knowledge, this document does not contain any material that infringes the copyrights of any other individual or organization, including the customers of Infosys.

Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj

    Project Details

    Projects involved: REYREY

    H/W Platform: 516 RAM, Microsoft Windows 2000

    S/W Environment: Informatica

    Appln. Type: ETL tool

Project Type: Data warehousing

Target readers: Data warehousing teams using ETL tools

    Keywords

ETL Tools, Informatica, Data Warehousing

INDEX

Informatica - Complex Scenarios and their Solutions

Author(s)
Declaration
Project Details
Keywords

Introduction

Scenarios:

1. Performance problems when a mapping contains multiple sources and targets
   1.1 Background
   1.2 Problem Scenario
   1.3 Solution

2. When the source data is a flat file
   2.1 Background
   2.2 Problem Scenario
   2.3 Solution

3. Extracting data from a flat file containing nested record sets

4. Too Large Lookup Tables

5. Complex logic for Sequence Generation

  • 8/8/2019 15803147 Typical Scenario

    3/16

    Introduction

This document is based upon the learning we gained while working on the Reynolds and Reynolds project in CAPS (PCC), Pune. We have come up with best practices to overcome the complex scenarios we faced during the ETL process. The document also describes some common best practices to follow while developing mappings.

    Scenarios:

1. Performance problems when a mapping contains multiple sources and targets

    1.1 Background

In Informatica, multiple sources can be mapped to multiple targets within a single mapping. This property is quite useful for keeping related mappings in one place: it reduces the number of sessions to be created, and all the related loading takes place in one go. It is quite logical to group different sources and targets that share the same logic into the same mapping.

    1.2 Problem Scenario:

In the multiple-target scenario, if some of the sub-mappings contain complex transformations, performance degrades drastically, because a single database connection has to handle multiple database statements. Such a mapping is also difficult to manage: for example, if one sub-mapping causes a performance problem, the other sub-mappings suffer the same degradation.

    1.3 Solution:

Divide and rule. It is always better to divide a complex mapping (i.e. one with multiple sources and targets) into simple mappings with one source and one target each. That greatly helps in managing the mappings. All the related mappings can then be executed in parallel in different sessions: each session establishes its own connection, and the server can handle all the requests in parallel against the multiple targets. Each session can be placed into a batch and run in CONCURRENT mode.
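For illustration, the same idea can be sketched outside Informatica. Below is a minimal Python sketch, assuming a hypothetical demo database and hypothetical source/target table pairs, in which each one-source-one-target load runs as its own "session" with its own connection, concurrently:

```python
# A sketch only: "demo.db" and the table names below are hypothetical
# stand-ins for the real sources and targets.
import sqlite3
from concurrent.futures import ThreadPoolExecutor

DB = "demo.db"
PAIRS = [("src_orders", "tgt_orders"), ("src_dealers", "tgt_dealers")]

def setup():
    # Create one simple source/target pair per "mapping".
    con = sqlite3.connect(DB)
    for src, tgt in PAIRS:
        con.execute(f"CREATE TABLE IF NOT EXISTS {src} (id INTEGER, val TEXT)")
        con.execute(f"CREATE TABLE IF NOT EXISTS {tgt} (id INTEGER, val TEXT)")
        con.execute(f"INSERT INTO {src} VALUES (1, 'a'), (2, 'b')")
    con.commit()
    con.close()

def load(src, tgt):
    # Each "session" opens its own connection, like sessions placed in a
    # batch and run in CONCURRENT mode.
    con = sqlite3.connect(DB)
    con.execute(f"INSERT INTO {tgt} SELECT * FROM {src}")
    con.commit()
    con.close()

setup()
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(load, src, tgt) for src, tgt in PAIRS]
    for f in futures:
        f.result()  # surface any errors from the parallel loads
```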


2. When the source data is a flat file

    2.1 Background

    What is a Flat File?

A flat file is one in which table data is stored as lines of ASCII text, with the value from each table cell separated by a delimiter or a space, and each row represented by a new line.

    Below is the sample Flat File which was used during the project.

    Fig 2.1: In_Daily - Flat File.

    2.2 Problem Scenario

When the above flat file was loaded into Informatica, the Source Analyzer appeared as shown below.


    Fig 2.2: In_Daily - Flat File after loading into Informatica.

Two issues were encountered while loading the flat file shown above:

1. The data types of the fields in the flat file and of the respective fields in the target tables did not match. For example, referring to Fig 2.1, in the first row (the record for BH) the fourth field has the data type Date, whereas in the third row (the record for CR) the fourth field is Char; the corresponding field in the target table had the data type Char, so it could not accommodate both record types.

2. The sizes of the fields in the flat file and of the respective fields in the target tables did not match. For example, referring to Fig 2.1, in the eighth row (the record for QR) the fifth field has a size of 100, but after loading, the Source Analyzer showed the field size as 45 (as shown in Fig 2.2); similarly, the fifth field for CR has a size of 5, while the corresponding field in the target table had a size of 100.

    2.3 Solution

The following is the solution we incorporated to solve the above problems:

1. Since the data was so heterogeneous, we decided to keep all the data types in the Source Qualifier as String and converted them according to the fields to which they were being mapped.


2. Regarding the size of the fields, we changed each field to the maximum possible size (the actual screen is attached for reference).
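A minimal Python sketch of this read-everything-as-String-then-convert approach follows; the record layouts, field positions, and date format are hypothetical:

```python
# A sketch of the fix, not the actual mapping: every field is read as a
# string, and only the mapping to a (hypothetical) target layout decides
# the final type. Field positions and the date format are made up.
from datetime import datetime

def to_date(s): return datetime.strptime(s, "%Y%m%d").date()
def to_str(s):  return s.strip()

# Per-record-type conversions, driven by the target fields (hypothetical).
CASTS = {
    "BH": [to_str, to_str, to_str, to_date],  # 4th field is a Date for BH
    "CR": [to_str, to_str, to_str, to_str],   # 4th field stays Char for CR
}

def parse_line(line, delimiter="|"):
    fields = line.rstrip("\n").split(delimiter)   # everything starts as String
    casts = CASTS.get(fields[0])
    if casts is None:
        return fields                             # unknown record type: keep raw
    return [cast(value) for cast, value in zip(casts, fields)]

print(parse_line("BH|001|DLR|20030701"))  # [..., datetime.date(2003, 7, 1)]
print(parse_line("CR|002|DLR|OPEN"))      # [..., 'OPEN']
```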

3. Extracting data from a flat file containing nested record sets

    3.1 Background:

The flat file shown in the previous section (Fig 2.2) contains nested record sets. To explain the nesting, the records of the file are restructured in Fig 3.1.


    Fig 3.1: In_Daily - Flat File restructured in the Nested form.

Here the data is in three levels. The first level contains the batch file information, starting with a BH record and ending with a BT record. The second level contains the dealer records within the batch file, starting with a DH record and ending with a DT record. The third level contains information about the different activities of a particular dealer.

    3.2 Problem Scenario:

The data required for loading was such that a single row should consist of the dealer details as well as the different activities done by that dealer. But only the second-level data (i.e. the 2nd and 14th rows in the flat file shown above) contains the dealer details, while the third-level data contains the activity details for the dealers. The two had to be concatenated to form a single piece of information to be loaded into a single row of the target table.

    3.3 Solution:

In this particular kind of scenario, the dealer information (the second-level data) should be stored in variables, using a condition that identifies a dealer-information row. That row itself should then be filtered out in the next transformation. So, for that particular row of the flat file (i.e. the dealer information), the data is stored in the variables; for the dealer activity data (the third-level data), each row is passed on to the next transformation together with the dealer information that was stored in the variables during the previous row's load.

The same is done here (the actual screen is attached for reference).
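For illustration, here is a minimal Python sketch of the same variable-carrying logic; the "AC" activity tag and the sample lines are hypothetical:

```python
# A sketch of the variable-carrying logic, not the actual mapping. "BH",
# "BT", "DH" and "DT" are the header/trailer tags named above; the "AC"
# activity tag and the sample lines are made up.
def flatten(lines, delimiter="|"):
    dealer = None                        # plays the role of the mapping variables
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        tag = fields[0]
        if tag == "DH":                  # second-level row: store, then filter out
            dealer = fields[1:]
        elif tag in ("BH", "BT", "DT"):  # other header/trailer rows: filter out
            continue
        else:                            # third-level row: concatenate and emit
            yield dealer + fields

sample = ["BH|20030701", "DH|D001|Smith Motors", "AC|SALE|100",
          "AC|SVC|40", "DT|D001", "BT|1"]
for row in flatten(sample):
    print(row)  # each activity row carries the dealer data stored before it
```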

    4. Too Large Lookup Tables:

    4.1 Background:

    What is a Lookup Transformation?

A Lookup transformation is used in your mapping to look up data in a relational table, view, or synonym (see Fig 4.1). A lookup definition can be imported from any relational database to which both the Informatica Client and Server can connect, and multiple Lookup transformations can be used in a mapping.

The Informatica Server queries the lookup table based on the lookup ports in the transformation (see Fig 4.2). It compares the Lookup transformation port values to the lookup table column values based on the lookup condition, and the result of the lookup can be passed to other transformations and to the target.
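In plain-Python terms, the mechanism looks roughly like the sketch below; the employee lookup table and source rows are hypothetical:

```python
# Roughly what a lookup does: the lookup table is keyed on the condition
# column, each incoming row's port value is compared against those keys,
# and a matched row supplies the related value passed onward.
# (Hypothetical data, for illustration only.)
lookup_table = {101: "Alice", 102: "Bob"}      # employee_id -> employee_name

source_rows = [{"employee_id": 101, "sales": 500},
               {"employee_id": 103, "sales": 200}]

for row in source_rows:
    # Lookup condition: lookup.employee_id = source.employee_id
    row["employee_name"] = lookup_table.get(row["employee_id"])  # None on no match
    print(row)
```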


    You can use the Lookup transformation to perform many tasks, including:

Get a related value. For example, your source table includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.

Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.

    (The actual screens are attached for reference.)

    Fig 4.1: LOOKUP is a kind of Transformation.


    Lookup Conditions

    Fig 4.2: The Lookup Conditions to be specified in order to get Lookup Values.

    4.2 Problem Scenario:

In the project, one of the mappings had large lookup tables that were hampering the performance of the mapping because:

a. they were consuming a lot of cache memory unnecessarily, and

b. too much time was spent searching for a relatively small number of values in a large lookup table.

Thus the loading of data from the source table(s) to the target table(s) was consuming more time than it normally should.

    4.3 Our Solution:

We eliminated the first problem by simply using the lookup table as one of the source tables itself. Source and target tables are not cached in Informatica, so it made sense to use the large lookup table as a source (see Fig 4.3). This also ensured that cache memory would not be wasted unnecessarily and could be used for other tasks.


Fig 4.3: The Mapping showing the use of the Lookup table as a Source table (multiple source tables joined in the Source Qualifier).


Fig 4.4: The use of the join condition (SQL specified as a User Defined Join) in the Source Qualifier.

After using the lookup table as a source, we specified a join condition in the Source Qualifier. This reduced the search time taken by Informatica, as the number of rows to be searched was drastically reduced: the join condition takes care of the excess rows which would otherwise have been present in the Lookup transformation. Thus the second problem was also successfully eliminated.
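In SQL terms, the change amounts to something like the following sketch, which uses sqlite3 and hypothetical src_sales and lkp_dealers tables in place of the real ones:

```python
# A sketch of the same fix in SQL terms, using sqlite3; the table and
# column names are hypothetical. Instead of caching a large lookup table,
# it is joined to the source in the query itself, akin to the User Defined
# Join in the Source Qualifier.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_sales (dealer_id INTEGER, amount INTEGER);
    CREATE TABLE lkp_dealers (dealer_id INTEGER, dealer_name TEXT);
    INSERT INTO src_sales VALUES (1, 100), (2, 250);
    INSERT INTO lkp_dealers VALUES (1, 'Smith Motors'), (2, 'Jones Autos');
""")

# The join returns only the rows actually needed: nothing is cached, and
# no row-by-row search over the full lookup table takes place.
rows = con.execute("""
    SELECT s.dealer_id, l.dealer_name, s.amount
    FROM src_sales s
    JOIN lkp_dealers l ON l.dealer_id = s.dealer_id
""").fetchall()
print(rows)  # [(1, 'Smith Motors', 100), (2, 'Jones Autos', 250)]
```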

5. Complex logic for Sequence Generation:

    5.1 Background:

What is a Sequence Generator?

A Sequence Generator is a transformation that generates a sequence of numbers once you specify a starting value (see Fig 5.2) and the increment by which to increase that value. (The actual screens are attached for reference.)
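Reduced to its essentials, such a generator is just a start value and an increment; a minimal Python sketch (not Informatica's implementation):

```python
# A sequence generator reduced to its essentials: a start value and an
# increment, handing out one value per call. (A sketch only.)
import itertools

seq = itertools.count(1000, 1)           # start value 1000, increment 1
print(next(seq), next(seq), next(seq))   # 1000 1001 1002
```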


    Fig 5.1: The Sequence Generator is a kind of Transformation.


    Fig 5.2: The Transformation details to be filled in order to generate a sequence.

5.2 Problem Scenario: In the project, one of the mappings had two requirements:

a. During the transfer of data to a column of a target table, the Sequence Generator was required to trigger only selectively. But by its nature, the Sequence Generator triggers every time a row gets loaded into the target table.

b. Another requirement was that the numbers generated by the Sequence Generator had to form an unbroken sequence.

For example: the values to be loaded into the column of the target table were either sequence-generated or obtained from a lookup table. Whenever the lookup condition returned a value, that value would populate the target table, but at the same time the Sequence Generator would also trigger, incrementing its CURRVAL (current value, see Fig 5.1) by 1. So when the next sequence value was loaded into the column, the difference between consecutive generated values would be 2 instead of 1. Thus the generated sequence would not be continuous, and there would be gaps or holes in it.


5.3 Our Solution:

A basic rule of the Sequence Generator is that it triggers whenever a row gets loaded into the target table. In order to prevent the Sequence Generator from triggering unnecessarily, we created two instances of the same target table (see Fig 5.3).

Fig 5.3: The Mapping showing two instances of the same Target table, fed by the Sequence Generator and the Lookup table.

The Sequence Generator was mapped to the column in the first instance of the target table (see Fig 5.3), whereas the value returned from the lookup table (if any) was mapped to the same column in the second instance of the target table (see Fig 5.3).

All the other values for the remaining columns of the target table were filtered on the basis of the value returned from the lookup table: if the lookup table returned a value, a row was populated in the second instance of the target table, and thus the Sequence Generator was not triggered.

If the lookup table returned a null value, a row was populated in the first instance of the target table; in this case the Sequence Generator triggered, and its value was loaded into the column of the target table.
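A minimal Python sketch of this routing, with a hypothetical lookup table and row keys, shows why the resulting sequence has no holes:

```python
# A sketch of the two-instance routing; the lookup contents and row keys
# are hypothetical. The generator is consumed only when the lookup misses,
# so the generated sequence has no holes.
import itertools

lookup = {"A": 500}              # key -> value pre-assigned by the lookup table
seq = itertools.count(1, 1)      # the sequence generator: start 1, increment 1

for key in ["A", "B", "C", "A"]:
    looked_up = lookup.get(key)
    if looked_up is not None:
        target = ("second instance", looked_up)  # lookup hit: seq is untouched
    else:
        target = ("first instance", next(seq))   # lookup miss: seq fires (1, 2, ...)
    print(key, target)
```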


Thus, by gaining control over the triggering of the Sequence Generator, we could avoid the holes or gaps in the generated sequence.