Informatica PowerCenter (8.6.1) Performance Tuning
Performance Tuning
Informatica PowerCenter (Version 8.6.1)
Jishnu Pramanik
1. Performance Tuning Overview
1.1 Overview
The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune session performance, first identify a performance bottleneck and eliminate it. Then identify the next performance bottleneck, and repeat until you are satisfied with the session performance. You can use the test load option to run sessions while you tune session performance. After you have tuned all the bottlenecks, you can further optimize session performance by increasing the number of pipeline partitions in the session. Adding partitions can improve performance by using more of the system hardware while processing the session.
1.2 Need for Performance Tuning
Performance is not just about one job loading the maximum amount of data in a particular time frame. It is more accurately defined as the combined effect of many smaller jobs on the overall performance of a system. Informatica is an ETL tool with high performance capability, and we need to make full use of its features to get that performance. With ever-increasing user requirements and exploding data volumes, we need to achieve more in less time. The goal of performance tuning is to optimize session performance. This document describes the techniques available to tune Informatica performance.
2.1 Overview
The performance of Informatica depends on the performance of its components, such as the database, network, transformations, mappings and sessions. To tune Informatica performance, we first have to identify the bottleneck. A bottleneck may be present in the source, target, transformations, mapping, session, database or network. It is best to look for performance issues in the components in this order: source, target, transformations, mapping and session. After identifying the bottleneck, apply whichever tuning mechanisms are applicable to the project.
2.2 Identify bottleneck in Source
If the source is a relational table, add a Filter transformation to the mapping immediately after the Source Qualifier, and set its filter condition to FALSE. All rows are then filtered out, and none reach the rest of the mapping. Without the test filter, the total time taken is:
Total Time = time taken by (source + transformations + target load)
With the test filter, the total time is approximately:
Total Time = time taken by source
If the source is not the bottleneck, the session with the test filter should run noticeably faster. If it still takes about as long as the original session, there is a source bottleneck.
2.3 Identify bottleneck in Target
The most common performance bottleneck occurs when the Integration Service writes to a target database. To identify a target bottleneck, configure a copy of the session to write to a flat file target. If the session performance increases significantly when you write to a flat file, you have a target bottleneck. If a session already writes to a flat file target, you probably do not have a target bottleneck.
2.4 Identify bottleneck in Transformation
Remove the transformation from the mapping, run the session, and note the time taken. Then put the transformation back and run the session again. If the second run takes significantly longer than the first, the transformation is the bottleneck. However, removing a transformation for testing can be a pain for the developer, because the mapping may need further changes before the session works again. Instead, add a Filter transformation with the condition FALSE immediately after the transformation, and run the session. The filter stops all rows from reaching the downstream transformations and the target. If the session takes about the same time with and without this test filter, the time is being spent at or before the transformation. Once the source has been ruled out, that makes the transformation the bottleneck.
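The three timing tests above each reduce to comparing two run times. The following is a minimal Python sketch. It assumes you have already recorded the session run times in seconds. The 10% "near equal" tolerance and all function names are hypothetical choices, not PowerCenter settings.

```python
# Hypothetical tolerance: two run times within 10% of each other are treated
# as "near equal". The guide gives no number; pick one that suits your data.
NEAR_EQUAL_TOLERANCE = 0.10


def near_equal(baseline_secs: float, test_secs: float,
               tol: float = NEAR_EQUAL_TOLERANCE) -> bool:
    """True if the test run is within `tol` (a fraction) of the baseline run."""
    return abs(baseline_secs - test_secs) <= tol * baseline_secs


def source_is_bottleneck(full_run_secs: float, filter_after_sq_secs: float) -> bool:
    # FALSE filter right after the Source Qualifier: only the read remains.
    # If that alone takes about as long as the full run, the source dominates.
    return near_equal(full_run_secs, filter_after_sq_secs)


def target_is_bottleneck(relational_run_secs: float, flat_file_run_secs: float) -> bool:
    # Same session written to a flat file instead of the database target.
    # A significant speed-up points at the relational target.
    return (flat_file_run_secs < relational_run_secs
            and not near_equal(relational_run_secs, flat_file_run_secs))


def transformation_is_bottleneck(full_run_secs: float,
                                 filter_after_tx_secs: float,
                                 source_ruled_out: bool) -> bool:
    # FALSE filter right after the transformation: downstream work is skipped.
    # Near-equal times mean the time is spent at or before the transformation;
    # with the source already ruled out, that leaves the transformation.
    return source_ruled_out and near_equal(full_run_secs, filter_after_tx_secs)


if __name__ == "__main__":
    print(source_is_bottleneck(600, 570))
    print(target_is_bottleneck(600, 200))
    print(transformation_is_bottleneck(600, 580, source_ruled_out=True))
```

Keeping the tolerance in one constant makes it easy to tighten or loosen what "near equal time" means for a given environment.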
2.5 Identify bottleneck in sessions
We can use the session log to identify whether the source, target or transformations are the performance bottleneck. Session logs contain thread summary records like the following:
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
A thread with a busy percentage of 100, or close to it, is the bottleneck. In the records above, the transformation thread is busy for the whole run, while the reader thread is idle for most of it. Thread statistics are therefore the main tool for finding the cause of a performance issue. Once the Collect Performance Data option (session Properties tab) is enabled, all the performance-related information appears in the log created by the session.
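The busy percentages in these records are consistent with (run time - idle time) / run time × 100. For the reader thread, (11.703201 - 9.560945) / 11.703201 × 100 ≈ 18.30. The sketch below is a hypothetical helper, not an Informatica tool. It extracts these values from a session log with a regular expression and picks the busiest thread. It assumes each record occupies one line in exactly the format shown above; other versions or log settings may word the records differently.

```python
import re
from dataclasses import dataclass

# The two thread summary records quoted in the session log excerpt above.
SAMPLE_LOG = """\
MASTER> PETL_24018 Thread [READER_1_1_1] created for the read stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.703201] secs, Total Idle Time = [9.560945] secs, Busy Percentage = [18.304876].
MASTER> PETL_24019 Thread [TRANSF_1_1_1_1] created for the transformation stage of partition point [SQ_test_all_text_data] has completed: Total Run Time = [11.764368] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000].
"""

# Assumption: each summary record sits on a single line in the format above.
THREAD_RE = re.compile(
    r"Thread \[(?P<thread>[^\]]+)\].*?"
    r"Total Run Time = \[(?P<run>[\d.]+)\] secs, "
    r"Total Idle Time = \[(?P<idle>[\d.]+)\] secs, "
    r"Busy Percentage = \[(?P<busy>[\d.]+)\]"
)


@dataclass
class ThreadStat:
    thread: str
    run_secs: float
    idle_secs: float
    busy_pct: float

    def recomputed_busy_pct(self) -> float:
        # Busy % = (run time - idle time) / run time * 100
        return (self.run_secs - self.idle_secs) / self.run_secs * 100


def parse_thread_stats(log_text: str) -> list[ThreadStat]:
    """Return one ThreadStat per thread summary record found in the log text."""
    return [
        ThreadStat(m["thread"], float(m["run"]), float(m["idle"]), float(m["busy"]))
        for m in THREAD_RE.finditer(log_text)
    ]


def busiest_thread(stats: list[ThreadStat]) -> ThreadStat:
    # The thread closest to 100% busy is the bottleneck candidate.
    return max(stats, key=lambda s: s.busy_pct)


if __name__ == "__main__":
    stats = parse_thread_stats(SAMPLE_LOG)
    for s in stats:
        print(f"{s.thread}: {s.busy_pct:.2f}% busy "
              f"(recomputed {s.recomputed_busy_pct():.2f}%)")
    print("Bottleneck candidate:", busiest_thread(stats).thread)
```

To use it on a real run, read the session log file into a string and pass it to `parse_thread_stats` in place of `SAMPLE_LOG`.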
2.6 Identifying System Bottlenecks on Windows
On Windows, you can view the Performance and Processes tabs in the Task Manager. To access the Task Manager, press Ctrl+Alt+Del and click Task Manager. The Performance tab in the Task Manager provides an overview of CPU usage and total memory used. You can view more detailed performance information by using the Performance Monitor on Windows. To access the Performance Monitor, click Start > Programs > Administrative Tools, and choose Performance Monitor. Use the Windows Performance Monitor to create a chart that provides the following information:
Percent processor time : If you have more than one CPU, monitor each CPU for percent processor time. If the processors are utilized at more than 80%, you may consider adding more processors.
Pages/second : If pages/second is greater than five, you may have excessive memory pressure (thrashing). You may consider adding more physical memory.
Physical disks percent time : The percent of time that the physical disk is busy performing read or write requests. If the percent of time is high, tune the cache for PowerCenter to use in-memory cache instead of writing to disk. If requests are still in queue after you tune the cache, and the disk busy percentage is at least 50%, add another disk device or upgrade to a faster disk device. You can also use a separate disk for each partition in the session.
Physical disks queue length : The number of users waiting for access to the same disk device. If physical disk queue length is greater than two, you may consider adding another disk device or upgrading the disk device. You can also use separate disks for the reader, writer and transformation threads.
Server total bytes per second : The number of bytes the server has sent to and received from the network. You can use this information to improve network bandwidth.
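The thresholds above can be written as a simple checklist. This is a minimal sketch that assumes you copy the counter values from Performance Monitor by hand. The `PerfSample` class and `diagnose` function are hypothetical. For simplicity, the disk-busy check folds the "high" and "at least 50%" conditions into a single 50% threshold.

```python
from dataclasses import dataclass


@dataclass
class PerfSample:
    """Hypothetical container for values read from Performance Monitor."""
    per_cpu_percent: list[float]   # percent processor time, one value per CPU
    pages_per_sec: float           # memory pages/second
    disk_busy_percent: float       # physical disk percent busy (read + write)
    disk_queue_length: float       # physical disk queue length


def diagnose(s: PerfSample) -> list[str]:
    """Apply the thresholds listed above and return human-readable findings."""
    findings = []
    if any(p > 80 for p in s.per_cpu_percent):
        findings.append("CPU: a processor is above 80%; consider adding processors")
    if s.pages_per_sec > 5:
        findings.append("Memory: pages/sec above 5; possible thrashing, consider more RAM")
    if s.disk_busy_percent >= 50:
        findings.append("Disk busy: tune caches to stay in memory; if still >= 50%, "
                        "add or upgrade disks, or use a separate disk per partition")
    if s.disk_queue_length > 2:
        findings.append("Disk queue: above 2; consider another or faster disk device")
    return findings


if __name__ == "__main__":
    sample = PerfSample(per_cpu_percent=[92.0, 45.0], pages_per_sec=2.0,
                        disk_busy_percent=30.0, disk_queue_length=1.0)
    for finding in diagnose(sample):
        print(finding)
```

Only the CPU rule fires for the sample values, because one processor is above 80% while the memory and disk counters stay under their limits.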
3. Optimizing the Target
3.1 Overview
You can optimize the following types of targets:
- Flat file
- Relational
3.2 Flat File Target
If you use a shared storage directory for flat file targets, you can optimize session performance by ensuring that the shared storage directory is on a machine that is dedicated to storing and managing files, instead of performing other tasks. If the Integration Service runs on a single node and the session writes to a flat file target, you can optimize session performance by writing to a flat file target that is local to the Integration Service process node.
3.3 Relational Target
If the session writes to a relational target, you can perform the following tasks to increase performance:
Drop indexes and key constraints : When you define key constraints or indexes in target tables, you slow the loading of data to those tables. To improve performance, drop indexes and key constraints before running the session. You can rebuild those indexes and key constraints after the session completes. If you decide to drop and rebuild indexes and key constraints on a regular basis, you can use the following methods to perform these operations each time you run the session:
- Use pre-load and post-load stored procedures.
- Use pre-session and post-session SQL commands.
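With the pre-session and post-session SQL method, the statements are plain SQL entered in the session properties, with multiple statements separated by semicolons. The sketch below generates them for a list of indexes. Several parts are assumptions:

- the `IndexDef` class;
- the index and table names;
- the Oracle-style `DROP INDEX` syntax.

Key constraints, which need `ALTER TABLE` statements, are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexDef:
    """Hypothetical description of one target-table index."""
    name: str
    table: str
    columns: tuple[str, ...]
    unique: bool = False


def pre_session_sql(indexes: list[IndexDef]) -> str:
    # Oracle-style DROP INDEX syntax (assumption; other databases differ,
    # e.g. SQL Server uses DROP INDEX <index> ON <table>).
    return "\n".join(f"DROP INDEX {ix.name};" for ix in indexes)


def post_session_sql(indexes: list[IndexDef]) -> str:
    # Recreate each index after the load has finished.
    statements = []
    for ix in indexes:
        kind = "UNIQUE INDEX" if ix.unique else "INDEX"
        cols = ", ".join(ix.columns)
        statements.append(f"CREATE {kind} {ix.name} ON {ix.table} ({cols});")
    return "\n".join(statements)


if __name__ == "__main__":
    # Hypothetical target table and index names.
    target_indexes = [
        IndexDef("IDX_SALES_CUST", "SALES_FACT", ("CUSTOMER_ID",)),
        IndexDef("UX_SALES_ID", "SALES_FACT", ("SALE_ID",), unique=True),
    ]
    print(pre_session_sql(target_indexes))
    print(post_session_sql(target_indexes))
```

Generating both lists from one definition keeps the drop and rebuild steps in sync when an index is added or renamed.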
Increase checkpoint intervals : The Integration Service performance slows each time it waits for the database to perform a checkpoint. To increase performance, consider increasing the database checkpoint interval. When you increase the database checkpoint interval, you increase the likelihood that the database performs checkpoints as necessary, when the size of the database log file reaches its limit.
Use bulk loading : You can use bulk loading to improve the performance of a session that inserts a large amount of data into a DB2, Sybase ASE, Oracle, or Microsoft SQL Server database. Configure bulk loading in the session properties. When bulk loading, the Integration Service bypasses the database log, which speeds performance. Without writing to the database log, however, the target database cannot perform rollback. As a result, you may not be able to perform recovery. When you use bulk loading, weigh the importance of improved session performance against the ability to recover an incomplete session. When bulk loading to Microsoft SQL Server or Oracle targets, define a large commit interval to increase performance. Microsoft SQL Server and Oracle start a new bulk load transaction after each commit. Increasing the commit interval reduces the number of bulk load transactions, which increases performance.
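The effect of the commit interval is simple arithmetic. The number of bulk load transactions is roughly the row count divided by the commit interval, rounded up. This minimal sketch assumes commits happen exactly every commit-interval rows. In practice, a target-based commit may land somewhat after the interval, when a writer buffer block fills. The function name is hypothetical.

```python
def bulk_load_transactions(total_rows: int, commit_interval: int) -> int:
    """Idealised count of bulk load transactions: one per commit interval."""
    if commit_interval <= 0:
        raise ValueError("commit_interval must be positive")
    # Ceiling division without floats: a final partial batch still needs a commit.
    return -(-total_rows // commit_interval)


if __name__ == "__main__":
    rows = 5_000_000
    for interval in (10_000, 100_000, 1_000_000):
        print(f"commit interval {interval:>9,}: "
              f"{bulk_load_transactions(rows, interval):>4} transactions")
```

With 5,000,000 rows, raising the commit interval from the default 10,000 rows to 100,000 rows cuts the number of bulk load transactions from 500 to 50.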
Use external loading : You can use an external loader to increase session performance. If you have a DB2 EE or DB2 EEE target database, you can use the DB2 EE or DB2 EEE external loaders to bulk load target files. The DB2 EE external loader uses the Integration Service db2load utility to load data. The DB2 EEE external loader uses the DB2 Autoloader utility. If you have a Teradata target database, you can use the Teradata external loader utility to bulk load target files. To use the Teradata external loader utility, set up the attributes, such as Error Limit, Tenacity, MaxSessions, and Sleep, to optimize perfor