Cache Informatica

The Hidden agenda

a) Basics of Cache
   1) Memory Cache
   2) Where the cache files are created
   3) Naming Conventions
   4) Cache Calculations
b) Advanced Cache
   1) Lookup Cache
   2) Aggregator Cache
   3) Joiner Cache
   4) Ranker Cache

Let’s get to the Basics:

Cache is a combination of:

1) Index cache: The server stores key values or condition values, which it uses to locate rows quickly.

2) Data cache: The server stores output values.

Cache Storage Overview:

• For index caches:
a) Aggregators store group-by values from the group-by ports.
b) Rankers store group-by values.
c) Joiners store index values for the master source (join condition columns).
d) Lookups store lookup condition information.

• For data caches:
a) Aggregators store aggregate data based on the group-by ports (variable ports, output ports, non-group-by ports).
b) Rankers store ranking data based on the group-by ports (output rows other than the ranked column).
c) Joiners store master source rows (output columns not in the join condition).
d) Lookups store lookup data that is not stored in the index cache.

Memory Cache:

• The server creates a memory cache based on the size specified in the session properties. You can set this size manually, based on the cache calculations described later.

• By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and 2,000,000 bytes to the data cache for each transformation instance.

• If the PowerCenter Server cannot allocate the configured amount of cache memory, it cannot initialize the session, and the session fails.

• If the PowerCenter Server requires more memory than the configured cache size, it pages to disk. Since paging to disk can slow session performance, try to configure the index and data cache sizes so that the data fits in memory.

Where are the Cache Files Created?

• The PowerCenter Server creates the index and data cache files by default in the PowerCenter Server variable directory, $PMCacheDir.

• If you do not define $PMCacheDir, the PowerCenter Server saves the files in the PMCache directory specified in the UNIX configuration file or the cache directory in the Windows registry. If the UNIX PowerCenter Server does not find a directory there, it creates the index and data files in the installation directory. If the PowerCenter Server on Windows does not find a directory there, it creates the files in the system directory.

• If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index and data files. When creating these files, the PowerCenter Server appends a number to the end of the filename, such as PMAGG*.idx.1 and PMAGG*.idx.2. The number of index and data files is limited only by the amount of disk space available in the cache directory.

Cache files remain after the session completes in three cases:

• a) The session performs incremental aggregation.
• b) You configure the Lookup transformation to use a persistent cache.
• c) The session does not complete successfully.

Naming convention followed by Informatica Server:

• [<Name Prefix> | <Prefix> <session ID>_<transformation ID>]_[partition index]<suffix>.[overflow index]

• For example, in PMLKUP8_4_2.idx:

PMLKUP is the prefix for a Lookup transformation, 8 is the session ID, 4 is the transformation ID, 2 is the partition index, and .idx marks an index file.

File Name Component: Description

• Name Prefix: Cache file name prefix configured in the Lookup transformation.

• Prefix: Describes the type of transformation: Aggregator transformation is PMAGG. Joiner transformation is PMJNR. Lookup transformation is PMLKUP. Rank transformation is PMAGG.

• Session ID: Session instance ID number.

• Transformation ID: Transformation instance ID number.

• Partition Index: If the session contains more than one partition, this identifies the partition number. The partition index is zero-based, so the first partition has no partition index. Partition index 2 indicates a cache file created in the third partition.

• Suffix: Identifies the type of file: Index file is .idx. Data file is .dat.

• Overflow Index: If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index and data files. When creating these files, the PowerCenter Server appends an overflow index to the filename, such as PMAGG*.idx.1 and PMAGG*.idx.2. The number of index and data files is limited by the amount of disk space available in the cache directory.
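
The naming convention can be checked with a short Python sketch. It assumes default names only (no user-defined Name Prefix) and treats a missing partition index as partition 0, following the zero-based rule above; `parse_cache_file_name`, `CACHE_FILE_RE`, and the returned field names are hypothetical, not part of PowerCenter.

```python
import re

# Hypothetical pattern for default cache file names (no user-defined Name Prefix):
#   <Prefix><session ID>_<transformation ID>[_<partition index>]<.idx|.dat>[.<overflow index>]
CACHE_FILE_RE = re.compile(
    r"^(?P<prefix>PMAGG|PMJNR|PMLKUP)"
    r"(?P<session_id>\d+)_(?P<transformation_id>\d+)"
    r"(?:_(?P<partition_index>\d+))?"
    r"\.(?P<suffix>idx|dat)"
    r"(?:\.(?P<overflow_index>\d+))?$"
)

# PMAGG is shared by Aggregator and Rank transformations.
PREFIX_TYPES = {"PMAGG": "Aggregator or Rank", "PMJNR": "Joiner", "PMLKUP": "Lookup"}


def parse_cache_file_name(name):
    match = CACHE_FILE_RE.match(name)
    if match is None:
        raise ValueError(f"not a default cache file name: {name!r}")
    parts = match.groupdict()
    overflow = parts["overflow_index"]
    return {
        "transformation": PREFIX_TYPES[parts["prefix"]],
        "session_id": int(parts["session_id"]),
        "transformation_id": int(parts["transformation_id"]),
        # The first partition carries no partition index, so a missing index means partition 0.
        "partition_index": int(parts["partition_index"] or 0),
        "file_type": "index" if parts["suffix"] == "idx" else "data",
        "overflow_index": int(overflow) if overflow else None,
    }


print(parse_cache_file_name("PMLKUP8_4_2.idx"))
print(parse_cache_file_name("PMAGG12_3.dat.1"))
```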

Cache Calculations

• Aggregator:
Index size: (Sum of column sizes in group-by ports + 17) X number of groups.
Data size: (Sum of column sizes of output ports + 7) X number of groups.

• Rank:
Index size: (Sum of column sizes in group-by ports + 17) X number of groups.
Data size: [(Number of ranks X (Sum of column sizes of output ports + 10)) + 20] X number of groups.

• Joiner:
Index size: (Sum of master column sizes in join condition + 16) X number of rows in master table.
Data size: (Sum of master column sizes NOT in join condition but on output ports + 8) X number of rows in master table.

• Lookup:
Index size: # rows in lookup table X [(Σ column size) + 16] X 2
Data size: # rows in lookup table X [(Σ column size) + 8]

Column sizes (in bytes) used in the calculations:

Datatype | Aggregator, Rank | Joiner, Lookup
Binary | precision + 2 | precision + 8, rounded to the nearest multiple of 8
Date/Time | 18 | 24
Decimal, high precision off (all precision) | 10 | 16
Decimal, high precision on (precision <= 18) | 18 | 24
Decimal, high precision on (precision > 18, <= 28) | 22 | 32
Decimal, high precision on (precision > 28) | 10 | 16
Decimal, high precision on (negative scale) | 10 | 16
Double | 10 | 16
Real | 10 | 16
Integer | 6 | 16
String | ASCII mode: precision + 3 | ASCII mode: precision + 9
Small integer | 6 | 16
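
To reuse the table in the calculations that follow, part of it can be encoded as a lookup. This is a hypothetical helper (`column_size` is not an Informatica API); it covers only the fixed-size rows plus ASCII strings and assumes decimals with high precision off, so the Binary and high-precision Decimal rows are left out.

```python
# Column sizes in bytes for a subset of the datatype table.
# Assumptions: decimals have high precision off, strings are in ASCII mode.
FIXED_SIZES = {
    # datatype: (Aggregator/Rank size, Joiner/Lookup size)
    "date/time": (18, 24),
    "decimal": (10, 16),
    "double": (10, 16),
    "real": (10, 16),
    "integer": (6, 16),
    "small integer": (6, 16),
}
FAMILY = {"aggregator": 0, "rank": 0, "joiner": 1, "lookup": 1}


def column_size(datatype, transformation, precision=None):
    """Return the cache column size for one port (hypothetical helper)."""
    family = FAMILY[transformation.lower()]
    datatype = datatype.lower()
    if datatype == "string":
        if precision is None:
            raise ValueError("string ports need a precision")
        # ASCII mode: precision + 3 (Aggregator, Rank) or precision + 9 (Joiner, Lookup)
        return precision + (3, 9)[family]
    return FIXED_SIZES[datatype][family]


print(column_size("String", "Rank", precision=21))  # 24, as used for PRODUCT_CATEGORY
print(column_size("Integer", "Lookup"))             # 16, as used for ITEM_ID
```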

Lookup Caches Overview

• The PowerCenter Server builds a cache in memory when it processes the first row of data in a cached Lookup transformation.

• It allocates memory for the cache based on the amount you configure in the transformation or session properties.

• The PowerCenter Server stores condition values in the index cache and output values in the data cache.

• The PowerCenter Server queries the cache for each row that enters the transformation.

• The PowerCenter Server also creates cache files by default in $PMCacheDir.

• If the data does not fit in the memory cache, the PowerCenter Server stores the overflow values in the cache files. When the session completes, the PowerCenter Server releases cache memory and deletes the cache files unless you configure the Lookup transformation to use a persistent cache.

Types of Lookup Cache

• When configuring a lookup cache, you can specify any of the following options:

• Persistent cache. You can save the lookup cache files and reuse them the next time the PowerCenter Server processes a Lookup transformation configured to use the cache.

• Recache from source. If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.

• Static cache. You can configure a static, or read-only, cache for any lookup source. By default, the PowerCenter Server creates a static cache. It caches the lookup file or table and looks up values in the cache for each row that comes into the transformation. When the lookup condition is true, the PowerCenter Server returns a value from the lookup cache. The PowerCenter Server does not update the cache while it processes the Lookup transformation.

• Dynamic cache. If you want to cache the target table and insert new rows or update existing rows in the cache and the target, you can create a Lookup transformation to use a dynamic cache. The PowerCenter Server dynamically inserts or updates data in the lookup cache and passes data to the target table. You cannot use a dynamic cache with a flat file lookup.

• For example, suppose your lookup table is also your target table. When you configure the Lookup transformation to use a dynamic cache, it looks up each incoming row. If there is no match, it inserts the row into both the target and the lookup cache (hence the name dynamic cache: it builds up as rows pass through). If there is a match, it updates the row in the cache and in the target. A static cache, on the other hand, is never updated during the lookup.

• Shared cache. You can share the lookup cache between multiple transformations. You can share an unnamed cache between transformations in the same mapping. You can share a named cache between transformations in the same or different mappings.
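
The difference between a static and a dynamic cache can be illustrated with a simplified Python simulation in which the lookup table is also the target. `process_rows` is hypothetical, and treating a match with different data as an update is an assumption; the PowerCenter Server's actual insert and update behavior depends on how the Lookup transformation is configured.

```python
def process_rows(rows, lookup_cache, target, dynamic):
    """Simulate a cached Lookup whose lookup table is also the target (hypothetical helper).

    rows: (key, value) pairs in the order they reach the transformation.
    lookup_cache, target: dicts mapping key -> value; modified in place.
    Returns one action per input row.
    """
    actions = []
    for key, value in rows:
        if key not in lookup_cache:
            if dynamic:
                # No match: insert into both the lookup cache and the target.
                lookup_cache[key] = value
                target[key] = value
                actions.append("insert")
            else:
                # Static (read-only) cache: nothing is inserted.
                actions.append("no match")
        elif dynamic and lookup_cache[key] != value:
            # Match with different data: update the cache and the target.
            lookup_cache[key] = value
            target[key] = value
            actions.append("update")
        else:
            actions.append("match")
    return actions


rows = [(1, "a"), (2, "b"), (2, "b"), (1, "z")]
print(process_rows(rows, {1: "a"}, {1: "a"}, dynamic=False))  # ['match', 'no match', 'no match', 'match']
print(process_rows(rows, {1: "a"}, {1: "a"}, dynamic=True))   # ['match', 'insert', 'match', 'update']
```

With the dynamic cache, the second row for key 2 is a match because the first one was already inserted into the cache; with the static cache, both rows miss.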

Calculating the Lookup Index Cache

• The lookup index cache holds data for the columns used in the lookup condition.

• The formula for calculating the minimum lookup index cache size is different from the formula for the maximum size.

• For best session performance, specify the maximum lookup index cache size.

• Calculating the Minimum Lookup Index Cache

• 200 * [(Σ column size) + 16], where the columns are those in the lookup condition.

• The minimum size for a lookup index cache is independent of the number of source rows.

• Calculating the Maximum Lookup Index Cache

• # rows in lookup table * [(Σ column size) + 16] * 2, where the columns are those in the lookup condition.

Difference between Static and Dynamic Cache

Static cache:

• You cannot insert or update the cache.

• The Informatica server returns a value from the lookup table or cache when the condition is true. When the condition is not true, the Informatica server returns the default value for connected transformations and NULL for unconnected transformations.

• You can use a relational or flat file lookup.

Dynamic cache:

• You can insert rows into the cache as you pass them to the target.

• The Informatica server inserts rows into the cache when the condition is false. This indicates that the row is not in the cache or target table. You can pass these rows to the target table.

• You can use a relational lookup only.

• Example:

• The Lookup transformation, LKP_PROMOS, looks up values based on ITEM_ID. It uses the following lookup condition:

• ITEM_ID = IN_ITEM_ID1

• Column in lookup condition: ITEM_ID, Integer, column size 16.

• The lookup condition uses one column, ITEM_ID, and the table contains 60,000 rows.

• Use the following calculation to determine the minimum index cache requirements:

  200 * (16 + 16) = 6,400

• Use the following calculation to determine the maximum index cache requirements:

  60,000 * (16 + 16) * 2 = 3,840,000

• Therefore, this Lookup transformation requires an index cache size between 6,400 and 3,840,000 bytes.
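
The two index cache formulas can be wrapped in a small Python function to check the arithmetic. The function name is hypothetical; the column sizes come from the Joiner, Lookup column of the datatype table.

```python
def lookup_index_cache_range(condition_column_sizes, lookup_rows):
    """Return (minimum, maximum) lookup index cache size in bytes (hypothetical helper).

    condition_column_sizes: sizes of the columns used in the lookup condition.
    """
    per_entry = sum(condition_column_sizes) + 16
    minimum = 200 * per_entry              # independent of the number of rows
    maximum = lookup_rows * per_entry * 2
    return minimum, maximum


# ITEM_ID (Integer, 16 bytes) in a 60,000-row lookup table
print(lookup_index_cache_range([16], 60_000))  # (6400, 3840000)
```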

Calculating the Lookup Data Cache

• In a connected transformation, the data cache contains data for the connected output ports, not including ports used in the lookup condition. In an unconnected transformation, the data cache contains data from the return port.

• 1) PROMOTION_ID: connected output port not in the lookup condition, Integer, column size 16.

• 2) DISCOUNT: connected output port not in the lookup condition, Decimal, column size 16.

• The lookup table has 60,000 rows, and the output columns total 16 + 16 = 32 bytes.

• Use the following calculation to determine the minimum data cache requirements:

  60,000 * (32 + 8) = 2,400,000

• This Lookup transformation requires a data cache size of 2,400,000 bytes.
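
A similar sketch for the lookup data cache, assuming a connected Lookup transformation. The helper name and the `(name, size, in_lookup_condition)` tuple layout are hypothetical; ITEM_ID is listed as an output port only to show that condition columns are left out.

```python
def lookup_data_cache_size(output_ports, lookup_rows):
    """Data cache bytes for a connected Lookup (hypothetical helper).

    output_ports: (name, column_size, in_lookup_condition) for each connected output port.
    Ports used in the lookup condition live in the index cache, so they are skipped.
    """
    total = sum(size for _, size, in_condition in output_ports if not in_condition)
    return lookup_rows * (total + 8)


ports = [
    ("ITEM_ID", 16, True),        # hypothetical: condition column, excluded from the data cache
    ("PROMOTION_ID", 16, False),
    ("DISCOUNT", 16, False),
]
print(lookup_data_cache_size(ports, 60_000))  # 2400000
```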

Aggregator Cache

• When the PowerCenter Server runs a session with an Aggregator transformation, it stores data in memory until it completes the aggregation.

• If you use incremental aggregation, the PowerCenter Server saves the cache files in the cache file directory.

Note: The PowerCenter Server uses memory to process an Aggregator transformation with sorted ports. It does not use cache memory. You do not need to configure cache memory for Aggregator transformations that use sorted ports.

Configuring the Session for Incremental Aggregation

• Use the following guidelines when you configure the session for incremental aggregation:

• Verify the location where you want to store the aggregate files. Configure the session to write file names in the session log.

• If you want the PowerCenter Server to write the incremental aggregation cache file names in the session log, configure the session with Verbose Init tracing.

• Verify the incremental aggregation settings in the session properties. You can configure the session for incremental aggregation in the Performance settings on the Properties tab.

• You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating that the PowerCenter Server overwrites the existing cache, and a reminder to clear this option after running the session.

To configure a session for incremental aggregation:

Calculating the Aggregator Index Cache

The index cache holds group information from the group-by ports.

# groups * [(Σ column size) + 17]

Columns: the group-by columns.

For example:

STORE_ID: Integer, size 6

ITEM: String, size 18

Therefore, the total column size = 18 + 6 = 24.

Assuming there are 72,000 groups, the minimum index cache calculation is:

72,000 * (24 + 17) = 2,952,000

The maximum index cache calculation is double the amount:

2,952,000 * 2 = 5,904,000

Therefore, this Aggregator transformation requires an index cache size between 2,952,000 and 5,904,000 bytes.
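
The Aggregator index cache arithmetic as a hypothetical Python helper, using the sizes from the example:

```python
def aggregator_index_cache_range(group_by_sizes, groups):
    """(minimum, maximum) Aggregator index cache in bytes (hypothetical helper)."""
    minimum = groups * (sum(group_by_sizes) + 17)
    return minimum, minimum * 2  # the maximum is double the minimum


# STORE_ID (Integer, 6) and ITEM (String, 18), 72,000 groups
print(aggregator_index_cache_range([6, 18], 72_000))  # (2952000, 5904000)
```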

Calculating the Aggregator Data Cache

• The data cache holds row data for variable ports and connected output ports. As a result, the data cache is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. Use the following information to calculate the minimum aggregate data cache size:

• # groups * [(Σ column size) + 7]

• Column size includes:
a) Non-group-by input/output ports.
b) Local variable ports.
c) Ports containing an aggregate function (multiply the column size by three).

In the example:

ORDER_ID: Integer, 6

SALES_PER_STORE_ITEMS: Decimal, 30 (10 * 3, because the port contains an aggregate function)

Total = 36

The total number of groups, as calculated for the index cache size, is 72,000. Use the following calculation to determine the minimum data cache requirements:

• 72,000 * (36 + 7) = 3,096,000

• Therefore, this Aggregator transformation requires a data cache size of 3,096,000 bytes.
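
The data cache rule, including the multiply-by-three factor for aggregate ports, as a hypothetical Python helper. Which example port counts as which kind (ORDER_ID as a non-group-by port, SALES_PER_STORE_ITEMS as the aggregate port with a base Decimal size of 10) is an assumption read from the sizes above.

```python
def aggregator_data_cache_size(groups, non_group_by_sizes=(), variable_sizes=(), aggregate_sizes=()):
    """Minimum Aggregator data cache in bytes (hypothetical helper)."""
    # Ports that contain an aggregate function count three times their column size.
    total = sum(non_group_by_sizes) + sum(variable_sizes) + 3 * sum(aggregate_sizes)
    return groups * (total + 7)


# ORDER_ID (Integer, 6) and SALES_PER_STORE_ITEMS (Decimal, 10, aggregate port)
print(aggregator_data_cache_size(72_000, non_group_by_sizes=[6], aggregate_sizes=[10]))  # 3096000
```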

Joiner Cache

• When using the joiner cache, the Informatica server first reads the data from the master source and builds the index and data caches from the master rows. After building the cache, the PowerCenter Server performs the join based on the detail source data and the cache data.

• The server creates the index cache as it reads the master source into the data cache. The server uses the index cache to test the join condition. When it finds a match, it retrieves row values from the data cache.

• The PowerCenter Server caches all master rows with a unique key in the index cache, and all master rows in the data cache.

• For instance, suppose the master source contains 100 unique keys:

Index cache: The PowerCenter Server caches 100 master rows with unique keys.

Data cache: The PowerCenter Server caches the master rows in the data cache that correspond to the 100 rows in the index cache. The number of rows it stores in the data cache depends on the data. For example, if every master row contains a unique key, the PowerCenter Server stores 100 rows in the data cache. However, if the master data contains multiple rows with the same key, the PowerCenter Server stores more than 100 rows in the data cache.
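
As an in-memory analogy only (not how the PowerCenter Server is implemented), the two Joiner caches can be modeled in Python with a dict keyed by unique join key (index cache) and a list holding every master row (data cache). The function names and sample rows are hypothetical.

```python
def build_joiner_caches(master_rows, key):
    """Read the master source once and build both caches (hypothetical helper).

    index_cache: one entry per unique join key, pointing at data cache positions.
    data_cache:  every master row, in the order it was read.
    """
    index_cache = {}
    data_cache = []
    for row in master_rows:
        index_cache.setdefault(row[key], []).append(len(data_cache))
        data_cache.append(row)
    return index_cache, data_cache


def join_detail(detail_rows, master_rows, key):
    index_cache, data_cache = build_joiner_caches(master_rows, key)
    joined = []
    for detail in detail_rows:
        # Test the join condition against the index cache ...
        for pos in index_cache.get(detail[key], []):
            # ... and on a match, fetch the master row values from the data cache.
            joined.append({**data_cache[pos], **detail})
    return joined


master = [
    {"ITEM_NO": 1, "ITEM_NAME": "pen"},
    {"ITEM_NO": 1, "ITEM_NAME": "pen, blue"},  # same key: one index entry, two data rows
    {"ITEM_NO": 2, "ITEM_NAME": "ink"},
]
index_cache, data_cache = build_joiner_caches(master, "ITEM_NO")
print(len(index_cache), len(data_cache))  # 2 3
```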

Joiner Index Cache Calculation

The index cache holds rows from the master source that are in the join condition.

# master rows * [(Σ column size) + 16], where the columns are the master columns in the join condition.

In the example, the transformation joins the sources ORDERS and PRODUCTS on ITEM_NO:

• ITEM_NO: Decimal(10), column size 16

• PRODUCTS is the master source and has 90,000 rows. Use the following calculation to determine the minimum index cache requirements:

• 90,000 * (16 + 16) = 2,880,000

• Double the size to determine the maximum index cache requirements:

• 2,880,000 * 2 = 5,760,000

• Therefore, this Joiner transformation requires an index cache size between 2,880,000 and 5,760,000 bytes.
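
Checking the Joiner index cache arithmetic with a hypothetical Python helper:

```python
def joiner_index_cache_range(join_column_sizes, master_rows):
    """(minimum, maximum) Joiner index cache in bytes (hypothetical helper)."""
    minimum = master_rows * (sum(join_column_sizes) + 16)
    return minimum, minimum * 2  # double the minimum for the maximum


# ITEM_NO (Decimal(10), 16 bytes), PRODUCTS master source with 90,000 rows
print(joiner_index_cache_range([16], 90_000))  # (2880000, 5760000)
```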

Joiner Data Cache Calculation

• The data cache holds rows from the master source until the PowerCenter Server joins the data.

• # master rows * [(Σ column size) + 8]

• Columns: master columns not in the join condition and used for output.

• In the example, the connected output ports for JNR_ORDERS_PRODUCTS are:

• ITEM_NAME: String, 32

• PRODUCT CATEGORY: Decimal, 30

• Total column size = 62

• The master source has 90,000 rows.

• Use the following calculation to determine the minimum data cache requirements:

• 90,000 * (62 + 8) = 6,300,000

• This Joiner transformation requires a data cache size of 6,300,000 bytes.
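
And the Joiner data cache, again as a hypothetical helper that takes the master output column sizes from the example:

```python
def joiner_data_cache_size(output_column_sizes, master_rows):
    """Minimum Joiner data cache in bytes (hypothetical helper).

    output_column_sizes: master columns not in the join condition but on output ports.
    """
    return master_rows * (sum(output_column_sizes) + 8)


# ITEM_NAME (32) and PRODUCT CATEGORY (30), 90,000 master rows
print(joiner_data_cache_size([32, 30], 90_000))  # 6300000
```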

Rank Caches

• When the PowerCenter Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server replaces the stored row with the input row.

• For example, you configure a Rank transformation to find the top three sales. The PowerCenter Server reads the following input data:

• SALES:
  10,000
  12,210
  5,000
  2,455
  6,324

• The PowerCenter Server caches the first three rows (10,000, 12,210, and 5,000).

When the PowerCenter Server reads the next row (2,455), it compares it to the cached values. Since the row ranks lower than all the cached rows, it discards the row with 2,455. The next row (6,324), however, ranks higher than one of the cached rows (5,000). Therefore, the PowerCenter Server replaces that cached row with the higher-ranked input row.

• If the Rank transformation is configured to rank across multiple groups, the PowerCenter Server ranks incrementally for each group it finds.
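
The replacement behavior described above is the classic top-N pattern, which can be sketched in Python with a min-heap per group so the lowest-ranked cached row is always at the front. This is an illustration of the idea, not the PowerCenter Server's internal algorithm; `rank_top_n` is hypothetical, and ties are assumed not to out-rank a cached row.

```python
import heapq


def rank_top_n(rows, n):
    """Keep the n highest values per group (hypothetical helper).

    rows: iterable of (group, value) pairs in input order.
    Returns {group: [values, highest first]}.
    """
    caches = {}
    for group, value in rows:
        cache = caches.setdefault(group, [])  # min-heap: cache[0] is the lowest-ranked row
        if len(cache) < n:
            heapq.heappush(cache, value)      # cache not full yet
        elif value > cache[0]:
            heapq.heapreplace(cache, value)   # input row out-ranks the lowest cached row
        # otherwise the input row is discarded
    return {group: sorted(cache, reverse=True) for group, cache in caches.items()}


sales = [10_000, 12_210, 5_000, 2_455, 6_324]
print(rank_top_n([("ALL", s) for s in sales], 3))  # {'ALL': [12210, 10000, 6324]}
```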

Calculating the Rank Index Cache

• The index cache holds group information from the group by ports. Use the following information to calculate the minimum rank index cache size:

• Rank index calculation: # groups * [(Σ column size) + 17]

• Columns: the group-by columns.

• PRODUCT_CATEGORY: String(21), column size 24

• There are 10,000 product categories, so the total number of groups is 10,000. Use the following calculation to determine the minimum index cache requirements:

• 10,000 * (24 + 17) = 410,000

• Double the size to determine the maximum index cache requirements:

• 410,000 * 2 = 820,000

• Therefore, this Rank transformation requires an index cache size between 410,000 and 820,000 bytes.
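
The Rank index cache uses the same formula as the Aggregator index cache; a hypothetical Python check of the example, with the String(21) column size taken from the datatype table (ASCII mode, precision + 3):

```python
def rank_index_cache_range(group_by_sizes, groups):
    """(minimum, maximum) Rank index cache in bytes (hypothetical helper)."""
    minimum = groups * (sum(group_by_sizes) + 17)
    return minimum, minimum * 2


product_category = 21 + 3  # String(21) in ASCII mode: precision + 3 = 24 bytes
print(rank_index_cache_range([product_category], 10_000))  # (410000, 820000)
```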

Calculating the Rank Data Cache

• The data cache size is proportional to the number of ranks. It holds row data until the PowerCenter Server completes the ranking and is generally larger than the index cache. To reduce the data cache size, connect only the necessary input/output ports to subsequent transformations. Use the following information to calculate the minimum rank data cache size:

• # groups * [(# ranks * ((Σ column size) + 10)) + 20]

• ITEM_NO: Decimal(10) = 10
• ITEM_NAME: String(23) = 26
• PRICE: Decimal(14) = 10
• Total column size = 46

• RNK_TOPTEN ranks by price, and the total number of ranks is 10. The number of groups is 10,000.

• Use the following calculation to determine the minimum data cache requirements:

• 10,000 * [(10 * (46 + 10)) + 20] = 5,800,000

• This Rank transformation requires a data cache size of 5,800,000 bytes.
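
The Rank data cache formula as a hypothetical Python helper, reproducing the RNK_TOPTEN example:

```python
def rank_data_cache_size(column_sizes, ranks, groups):
    """Minimum Rank data cache in bytes (hypothetical helper)."""
    return groups * ((ranks * (sum(column_sizes) + 10)) + 20)


# ITEM_NO Decimal(10) = 10, ITEM_NAME String(23) = 26, PRICE Decimal(14) = 10
print(rank_data_cache_size([10, 26, 10], ranks=10, groups=10_000))  # 5800000
```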
