Identifying Bottlenecks

7/27/2019 Identifying Bottlenecks


General Informatica Best Practices

Performance and Tuning Overview

Identifying ETL Bottlenecks

Target Bottlenecks

Source Bottlenecks

Mapping Bottlenecks

Session Bottlenecks

System Bottlenecks

Partitioning: How to make it fly



Identifying Bottlenecks

Target Bottleneck



Common sources of problems:

indexes or key constraints

database checkpoints

small database network packet size

too many target instances in your mapping

target table is too wide

Common solutions:

drop indexes and key constraints before loading, rebuild after loading

use bulk loading or external loaders when practical

increase the database network packet size

decrease the frequency of database checkpoints

optimize target database disk allocation

when using partitions, consider partitioning your target table as well
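
The "drop indexes before loading, rebuild after loading" advice can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the target database; the table `orders`, the index `idx_orders_customer`, and the function name are hypothetical and not taken from the slides. In a real session the same drop and rebuild statements would normally be issued against the actual target database around the load.

```python
import sqlite3

def load_with_index_rebuild(conn, rows):
    """Drop the index, bulk-insert the rows, then rebuild the index once."""
    cur = conn.cursor()
    # Without the index, each insert avoids the cost of index maintenance.
    cur.execute("DROP INDEX IF EXISTS idx_orders_customer")
    cur.executemany(
        "INSERT INTO orders (order_id, customer_id, amount) VALUES (?, ?, ?)",
        rows,
    )
    # Rebuild the index in a single pass after all rows are in place.
    cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Hypothetical target table that already has an index before the load.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

rows = [(i, i % 100, float(i)) for i in range(10_000)]
loaded = load_with_index_rebuild(conn, rows)
print(loaded)
```

Whether this pays off depends on the load volume relative to the table size, so it is worth timing both variants on the real target.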

Source Bottleneck

Common sources of problems:

slow query

small database network packet size

wide source tables

Common solutions:

analyze the query issued by the Source Qualifier. It appears in the session log.

consider using database optimizer hints when joining several tables in a Source Qualifier

consider indexing tables when you have order by or group by clauses

try database parallel queries if supported

try partitioning the session if appropriate, try partitioning your source database as well

test Source Qualifier conditional filter versus filtering at the database level

increase the database network packet size
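
The suggestion to test a filter applied after reading against a filter applied at the database level can be sketched as follows. This is only an illustration using sqlite3 as a stand-in source database; the table `src`, its columns, and both function names are hypothetical. It compares how many rows cross the database boundary in each variant, which is usually the quantity that drives the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, status INTEGER)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)", [(i, i % 5) for i in range(5000)]
)

def filter_after_read(conn):
    """Read every row, then filter in the pipeline."""
    fetched = conn.execute("SELECT id, status FROM src").fetchall()
    kept = [row for row in fetched if row[1] == 0]
    return len(fetched), kept

def filter_in_database(conn):
    """Let the database apply the filter, so fewer rows are transferred."""
    fetched = conn.execute(
        "SELECT id, status FROM src WHERE status = 0"
    ).fetchall()
    return len(fetched), fetched

rows_read_late, kept_late = filter_after_read(conn)
rows_read_db, kept_db = filter_in_database(conn)
print(rows_read_late, rows_read_db)
```

Both variants keep the same rows; only the number of rows read from the source differs.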

Mapping Bottleneck

Common sources of problems:

too many transformations

unused links between ports

too many input/output or output ports in Aggregator or Rank transformations

unnecessary data type conversions



Common solutions:

eliminate transformation errors

if several mappings read from the same source, try single pass reading

optimize data types, use integers for comparisons.

don’t convert back and forth between data types

optimize lookups and lookup tables, using cache and indexing tables

put your filters early in the data flow, use a simple filter condition

for aggregators, use sorted input, integer columns to group by and simplify expressions

use reusable sequence generators, increase number of cached values

if you use the same logic in different data streams, apply it before the streams branch off

optimize expressions: isolate slow and complex expressions

reduce or simplify aggregate functions
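
The advice to put filters early in the data flow and to compare on integers can be sketched in plain Python. This is not PowerCenter code; `expensive_transform`, the row layout, and the integer `region` code are hypothetical stand-ins for a costly expression and a simple filter condition.

```python
def expensive_transform(row, counter):
    """Stand-in for a costly expression; counts how often it is evaluated."""
    counter["calls"] += 1
    return {**row, "amount_with_tax": round(row["amount"] * 1.2, 2)}

def filter_late(rows, counter):
    """Transform every row first, then discard most of them."""
    transformed = (expensive_transform(r, counter) for r in rows)
    return [r for r in transformed if r["region"] == 1]

def filter_early(rows, counter):
    """Discard unwanted rows first, using a simple integer comparison."""
    kept = (r for r in rows if r["region"] == 1)
    return [expensive_transform(r, counter) for r in kept]

rows = [{"id": i, "region": i % 4, "amount": float(i)} for i in range(1000)]
late_calls = {"calls": 0}
early_calls = {"calls": 0}
late = filter_late(rows, late_calls)
early = filter_early(rows, early_calls)
print(late_calls["calls"], early_calls["calls"])
```

Both functions return the same rows, but the early filter evaluates the costly expression only for the rows that survive.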

Session Bottleneck

Common sources of problems:

inappropriate memory allocation settings

running in series rather than in parallel

error tracing override set to high level



Common solutions:

calculate DTM buffer pool and buffer block size

make sure to keep data caches and indexes in memory, paging to disk is very slow

if your mapping allows it, use partitioning

run sessions in parallel, within concurrent batches, whenever possible

increase database commit interval

turn off recovery and decimal arithmetic (they’re off by default)

use the debugger rather than high error tracing; always reduce your tracing level for production runs
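
The effect of increasing the commit interval can be illustrated with a small sketch: the same rows are loaded twice, and only the number of commits changes. It uses sqlite3 as a stand-in target; the table `target` and the function names are hypothetical, and the interval values are arbitrary examples rather than recommended settings.

```python
import sqlite3

def make_target():
    """Create an empty in-memory target table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")
    conn.commit()
    return conn

def load_with_commit_interval(conn, rows, commit_interval):
    """Insert rows, committing once every `commit_interval` rows."""
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, val) VALUES (?, ?)", row)
        pending += 1
        if pending >= commit_interval:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit whatever is left after the last full interval.
        conn.commit()
        commits += 1
    return commits

rows = [(i, f"row-{i}") for i in range(10_000)]
conn_small, conn_large = make_target(), make_target()
commits_small = load_with_commit_interval(conn_small, rows, 100)
commits_large = load_with_commit_interval(conn_large, rows, 10_000)
print(commits_small, commits_large)
```

A larger interval means fewer commits for the same data, at the cost of more work to redo if the load fails partway through.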

System Bottleneck

Common sources of problems:

slow network connections

overloaded or under-powered servers

slow disk performance

Common solutions:

get the best machines to run your server. Better yet, use several servers against the same repository (PowerCenter only)

use multiple CPUs and session partitioning

make sure Informatica servers and database servers are closely located in your network



if you have several CPUs, several disk drives and gobs of RAM, consider having the Informatica server and the database server on the same machine

shut down unneeded processes or network services on your servers

use 7-bit ASCII data movement (the default) if you don’t need Unicode

evaluate hard disk performance, try locating sources and targets on different drives

get as much RAM as you can for your servers

Partitioning

A partition is a pipeline stage that executes in a single thread

Partition points mark the thread boundaries in a pipeline and divide the pipeline process into stages

The partition strategy can be different at each partition point in the pipeline process

Adding partitions increases the number of threads created by Informatica

PowerCenter allows for up to 16 partitions at each partition point

Adding partition points adds threads, which can improve performance. HOWEVER, the load on the server also increases, so if the server is undersized, partitioning is of no value and can actually decrease performance

Partitioning continued

Partition Types

Round Robin

Key Range

Hash Key

Pass Through

Performance can be increased by changing the partitioning strategy at different partition points:

Source Qualifier: Key Range or Hash Auto Keys

Expression or Filter: Round Robin

Sorter and Aggregator: Hash Auto Keys

Target: Key Range
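
How the partition types distribute rows can be sketched in plain Python. This illustrates the general idea of each strategy only and does not reproduce PowerCenter internals; all function names are hypothetical, and `crc32` is an arbitrary stand-in for whatever hash function the server actually uses.

```python
from bisect import bisect_right
from zlib import crc32

def round_robin(rows, n):
    """Deal rows out evenly, one partition after another."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_key(rows, n, key):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % n].append(row)
    return parts

def key_range(rows, boundaries, key):
    """Assign rows by key range; `boundaries` are sorted range start values."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect_right(boundaries, row[key])].append(row)
    return parts

def pass_through(parts):
    """Leave every row in the partition it already belongs to."""
    return parts

rows = [{"id": i, "customer": i % 7} for i in range(300)]
rr = round_robin(rows, 3)
hk = hash_key(rows, 3, "customer")
kr = key_range(rows, [100, 200], "id")  # id < 100, 100..199, id >= 200
pt = pass_through(rr)
print([len(p) for p in rr], [len(p) for p in kr])
```

Round robin balances row counts, which suits row-by-row stages such as an Expression or Filter. Hash partitioning keeps each group together, which is what a Sorter or Aggregator needs.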
