
Advanced Topics

Informatica

Topics of Discussion

- Identifying bottlenecks
- Session settings
- Some tips from experience

Identifying Bottlenecks

Work through each potential bottleneck in turn, re-checking performance after every change:

- Writing to a slow target? -> Target Bottleneck
- Reading from a slow source? -> Source Bottleneck
- Transformation inefficiencies? -> Mapping Bottleneck
- Session inefficiencies? -> Session Bottleneck
- System not optimized? -> System Bottleneck
- Performance OK? -> Done

Target Bottleneck

Common causes of problems

- Indexes or key constraints
- Database checkpoints
- Small database network packet size
- Too many target instances in the mapping
- Target table is too wide

Common solutions

- Drop indexes and key constraints before loading, then rebuild them
- Use bulk loading wherever practical
- Increase the database network packet size
- Decrease the frequency of database checkpoints
- When using partitions, consider partitioning the target table
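
For example, a minimal sketch of pre- and post-load SQL around a bulk load, assuming an Oracle-style target and hypothetical table/index names:

    -- Pre-load: drop the index so the load does not pay for index maintenance
    DROP INDEX idx_sales_fact_cust;

    -- ... bulk load of SALES_FACT runs here ...

    -- Post-load: rebuild the index once, after all rows are in
    CREATE INDEX idx_sales_fact_cust ON sales_fact (customer_id);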

Source Bottleneck

Common causes of problems

- Slow query
- Index issues
- Highly complex query
- Small database network packet size
- Wide source tables

Common solutions

- Analyze the query with the help of a show-plan tool or similar
- Consider using database optimizer hints when joining several tables
- Consider indexing tables when you have ORDER BY or GROUP BY clauses
- Test the source qualifier conditional filter versus filtering at the database level
- Increase the database network packet size
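
As an illustration only (hypothetical tables, columns, and hint), an Oracle-style source query with an optimizer hint for a multi-table join, plus an index supporting the join/grouping path:

    SELECT /*+ USE_HASH(o c) */
           c.region,
           SUM(o.amount) AS total_amount
    FROM   orders o,
           customers c
    WHERE  o.cust_id = c.cust_id
    GROUP  BY c.region;

    -- Index the join column referenced above
    CREATE INDEX idx_orders_cust_id ON orders (cust_id);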

Mapping Bottleneck

Common causes of problems

- Too many transformations
- Unused links between ports
- Too many input/output or output ports in an aggregator or rank transformation
- Unnecessary datatype conversions

Common solutions

- Eliminate transformation errors
- If several mappings read from the same source, try single-pass reading
- Optimize datatypes: use integers for comparisons
- Don't convert back and forth between datatypes
- Optimize lookups and lookup tables, using caching and indexed lookup tables
- Put filters early in the dataflow and use simple filter conditions

Mapping Bottleneck

Common Solutions

- For aggregators, use sorted input, integer group-by columns, and simplified expressions
- Use reusable sequence generators and increase the number of cached values
- If you use the same logic in different data streams, apply it before the streams branch off
- Optimize expressions: isolate slow and complex expressions
- Reduce or simplify aggregate functions
- Use local variables to encapsulate repeated computations
- Integer computations are faster than character computations
- Use operators rather than the equivalent functions: || is faster than CONCAT()

Session Bottleneck

Common causes of problems


- Inappropriate memory allocation settings
- Running in series rather than in parallel
- Error tracing override set to a high level

Common Solutions

- Experiment with the DTM buffer pool and buffer block size
- If your mapping allows it, use partitioning
- Run sessions in parallel with concurrent batches whenever possible
- Increase the database commit interval
- Turn off recovery and decimal arithmetic (they're off by default)
- Use the debugger rather than a high error-tracing level, and always reduce your tracing level for production runs
- Don't stage your data if you can avoid it

System Bottleneck

Common causes of problems

- Slow network connections
- Overloaded or under-powered servers
- Slow disk performance

Common solutions

- Get the best machines to run the servers
- Use multiple CPUs and session partitioning
- Make sure the Informatica servers and database servers are located close together on your network
- If possible, consider having the Informatica server and database server on the same machine

Identifying Bottlenecks

Examining session results

- Read/write throughput
- Rows failed
- Number of objects in the mapping
- Type of objects in the mapping

Examining Parallelism, Partitioning

- How many objects run in parallel or partitioned?
- What is the size of the hardware for the source, target, and database?
- What kind of pipeline is set up for sourcing and targeting?


Identifying Bottlenecks (2)

Source SQL/Lookup SQL


- GROUP BY / ORDER BY clauses
- DISTINCT clauses
- WHERE clauses (filters) that use non-indexed fields
- Invalid plans

Database issues

- Database connection configuration
- Database instance configuration
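
For example (a hedged Oracle-style sketch with hypothetical names), checking the plan of a source or lookup query and indexing a filtered column that shows up as a full scan:

    EXPLAIN PLAN FOR
      SELECT order_id, amount
      FROM   orders
      WHERE  order_status = 'OPEN';

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- If the plan shows a full table scan on the filter column:
    CREATE INDEX idx_orders_status ON orders (order_status);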

Identifying Bottlenecks (3)

Aggregator Problems

- Too many multi-level aggregates
- Not tuning the data and index caches

Joiner Problems

- Incorrect selection of the master table
- Too many (or too wide) join columns
- Not tuning the data and index caches

Rank Problems

- Not tuning the data and index caches

Identifying Bottlenecks (4)

Source or Target problems


- Too many fields; width (precision) issues
- Implicit data conversions
- Update strategies
- Too many targets per mapping
- No use of the bulk loader

Session Problems

- Not enough RAM given to the session
- Commit point too high
- Commit point too low
- Too many sessions running in parallel

Top 10 Mapping Bottlenecks


1. Too many targets in a single mapping
2. Data width is too large (too many columns passing through the mapping)
3. Too many aggregators, lookups, joiners, and ranks in the mapping
4. Not tuning the data/index cache settings for the above objects
5. Too many objects in a single mapping
6. Unused ports in cached lookups
7. Source query/joins not tuned
8. Lookup query/cache not tuned
9. Ports passed through the mapping but not passed to the target
10. Huge expressions

Top 10 Session Mistakes


1. Not controlling the log file override
2. Not tuning the data and index caches for lookups, aggregators, ranks, and joiners
3. Not tuning the commit point to match the database performance setup
4. Assuming that giving the mapping more memory will make it run faster
5. Assuming that increasing the commit point will make it run faster
6. Not utilizing the partitioning available
7. Running too many sessions in parallel on an undersized, over-utilized machine
8. Not architecting for failed rows
9. Not setting the line buffer length for flat files
10. Not testing the session for performance when targeting a flat file

Problems with Maps

- Multiple targets: a single thread is used for the write processes
- Multiple aggregators: a single thread is used for moving the data
- Stacked aggregators fight for memory, disk, and the cache directory in a single session

Problems with Maps (2)

- The filter condition is too lengthy and not optimized
- An expression performs only a single calculation, forcing the entire row to be processed when only the one field should flow through
- Disk contention is high: four targets, a single writer thread, and I/O becomes a hotspot

Expression/Filter Contention
Before: the entire condition sits in the filter.

    Expression -> Filter [IIF (EmpId = 'A' AND .. OR ..)]

After: the expression pre-computes an integer flag, and the filter tests only the flag.

    Expression [B_Rowpass = IIF (EmpId = 'A' AND .. OR ..)] -> Filter [B_Rowpass]

- Expressions are built for evaluation speed
- Filters take a different code path (slower)
- Passing a numeric integer flag to the filter keeps it fast (throughput gains of roughly 0.5x to 3x)
- Filter expressions should be as simple as possible to maximize throughput

Aggregator Contention
Serial execution: all the aggregators stacked in a single map. Parallel execution: the aggregators split across separate maps.

Single map, multiple aggregators

- The aggregators fight for disk I/O (the cache directory)
- The aggregators fight for RAM
- Multiple-pass aggregation of the entire data set
- The session runs only as fast as the slowest aggregator
- All aggregation is done in serial

Splitting the aggregators across maps

- Each map has its own I/O process threads
- Data is aggregated only once
- Parallelism is increased
- The amount of RAM per object is increased
- All aggregation is done in parallel

Update Strategies

Several update strategies, each feeding its own target (Target1, Target2, Target3).

- Update strategies force each row to be analyzed
- If each row must be examined, speed is negatively impacted
- Update strategies don't work against flat files
- Remove the update strategies by splitting the work into parallel mappings

Steps to tuning
- Make a copy of the map for each target
- Remove all but one target from each copy
- Work backwards from the target to the source: eliminate unused or unnecessary transformations
- Simplify the mapping
- Move the filters upstream, to the source if possible
- Move a large cached lookup into a joiner on the source feed
- Stage the target data if necessary, then use a bulk loader to mass-insert at high speed
- Tune the source SQL and the session parameters
- Tune the database connection and the RDBMS

What does Session Partitioning Do?


It separates the data into physical blocks: it reduces the amount of work each load process has to do, but increases the number of load processes that take place. Partitioning is the method of splitting the data; executing the partitioned loads is the parallelization of the loading process.

Source Horizontal Partitioning


The source/target is split into key ranges, for example A-L, M-S, and T-Z.

- Provides the best read ranges if you know what data you are after
- Allows for process parallelism
- Breaks source data into manageable parts
- Caution: requires additional maintenance
- Faster indexes
- Allows parallel queries to be run
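
A hedged sketch of an Oracle-style range-partitioning DDL matching the A-L / M-S / T-Z split above (hypothetical table and columns):

    CREATE TABLE customers (
      cust_id    NUMBER,
      last_name  VARCHAR2(60),
      region     VARCHAR2(20)
    )
    PARTITION BY RANGE (last_name) (
      PARTITION p_a_l VALUES LESS THAN ('M'),
      PARTITION p_m_s VALUES LESS THAN ('T'),
      PARTITION p_t_z VALUES LESS THAN (MAXVALUE)
    );

Each session partition (or parallel query) can then read a single range, e.g. WHERE last_name >= 'M' AND last_name < 'T'.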

Source Vertical Partitioning


The source/target is split into column groups, for example columns 0-100 and columns 200-300.

- Provides smaller network packets
- Allows increased parallelism / increased parallel reads
- Can provide better management over wide tables
- Potentially decreases I/O on the source side
- Assists in the processing component of the data movement
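
A hedged sketch, with hypothetical names, of a vertical split of a wide table into two narrow views, each of which can be read by its own pipeline:

    CREATE VIEW customer_core AS
      SELECT cust_id, first_name, last_name, status
      FROM   customer_wide;

    CREATE VIEW customer_extended AS
      SELECT cust_id, loyalty_tier, marketing_opt_in, last_contact_date
      FROM   customer_wide;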

Joiner Object
- Master rows are read first into the data and index caches
- RAM usage depends on the number of rows read
- The joined fields are read into the data and index caches
- There is no control over the data and index cache memory sizes
- Contrary to popular belief, the joiner is not slow (compared to a cached lookup)
- The joiner is a powerful object for performance tuning, especially when getting rid of cached lookups

Joiner Contention
- The data and index cache sizes must be calculated properly
- Master-detail join: the master should be the smaller table
- Detail outer join: the master should be the smaller table
- Full outer join: both tables should be relatively small
- Good for heterogeneous sources
- Rarely necessary when a staging-table architecture is employed
- Consider the size of shared memory for large-scale join operations

Lookup Object
- Initialization caches ALL data identified in the ports of the lookup, including the data to be matched on
- RAM usage depends on the number of rows read
- The lookup's two flaws: it fights for resources during execution, and its initialization speed depends on the speed of the SQL beneath it
- WIDE lookups can soak up a lot of time and space, especially with the ORDER BY clause that is generated and appended to the SQL

Lookup Contention
- Lookups should always be cached when: the table is narrow with a huge number of rows (primary key only), or of any width with a small number of rows (fewer than 100,000)
- The exception: if the hardware has enough RAM to handle all the concurrent sessions plus the cached lookup, then cache it
- If the lookup is cached, the data and index caches are utilized

- Uncached lookups should be used when an extremely large number of rows is sourced, a wide table is sourced, or RAM is scarce
- An uncached lookup generates its own database connection
- An uncached lookup should always be retrieved by primary key
- Cached or uncached, the connection to the database should use the maximum packet size

Aggregator Object
- Initialization caches ALL data identified in the ports of the aggregator
- RAM usage depends on the number of rows read
- The aggregator absorbs all rows before pushing them to the target

- WIDE aggregates can soak up a lot of space; they can also increase I/O if the data isn't sorted on the way in

Rank Object
- A lot like the aggregator: reads all rows into the data and index caches
- RAM usage depends on the number of rows read

- WIDE ranks can soak up space; they can also increase I/O if the data isn't sorted on the way in
- All rows must be evaluated to get the top or bottom X%

Sort Object
- A lot like the aggregator: reads all rows into the data and index caches
- RAM usage depends on the number of rows read
- WIDE rows can soak up space
- Depending on what is set up as the sort key, the index can also grow large

Router Transformation
One expression feeds three filters (Filter1, Filter2, Filter3), each writing to its own target (Target1, Target2, Target3).

A total of 6 passes of the data for 3 filters. This can double the amount of work needed to pass a row through.

Router Transformation
One expression feeds a single router, which routes rows to Target1, Target2, and Target3.

Compared with 6 passes of the data, there are now 4. This helps improve performance.

Why use Bulk-Loaders?


- Native (internal) connectivity
- Build row sets into RAM blocks
- Capable of bypassing logging mechanisms
- Capable of being run in parallel (synchronized with the RDBMS parallel engine)

Why NOT use Bulk-Loaders?


- Limited to FLAT FILE sourcing
- Provides for inserts only
- Requires a rigid input structure
- Usually requires external scheduling resources
- Most loaders cannot be scripted from within the database
- Complex insert logic can slow loaders tremendously

Bulk-Loaders Modes
Loaders typically have a fast mode and a slow mode.

- Fast loads usually bypass referential integrity and indexing; even faster modes append to the target tables
- Slow loads run directly through the database engine (the same as other applications)
- Loaders will switch modes if certain criteria aren't met before the job starts
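
The same fast/slow distinction exists inside the database itself; a hedged Oracle-style sketch (hypothetical tables; check your recovery policy before disabling logging):

    -- "Fast mode": direct-path append that bypasses conventional row-by-row processing
    ALTER TABLE sales_stage NOLOGGING;

    INSERT /*+ APPEND */ INTO sales_stage
    SELECT * FROM sales_landing;

    COMMIT;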

Important session settings


- Shared memory size
- Buffer block size
- Line width size (if flat-file source)
- Data and index cache sizes
- Commit point
- Log setting
- PowerCenter: partition settings

Session Speeds
Simple map

- Simple maps run faster given more memory
- There is a 1.8 GB real-memory limit for all maps
- Performance depends on how many maps are running in parallel
- Eventually, too much parallelism slows down the entire system
- Even simple maps run only as fast as the source and the target

Complex map

- Each thread is linked by memory management: as the block sizes change, the thread speed slides
- Speed tests indicate the best average performance is achieved with 128 KB block sizes and 24 MB of shared memory

Alternatives to a lookup
Before: Lookup -> Expression. After: Source Qualifier -> Joiner -> Expression.

- Lookups are meant for relatively small tables; a lookup caches all the data identified in its ports
- Joiners cache only the keys on which the join happens
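
When the lookup table sits in the same database as the source, the lookup can often be replaced by a join in the source qualifier's SQL override; a hedged sketch with hypothetical tables and columns:

    SELECT o.order_id,
           o.cust_id,
           o.amount,
           c.cust_status      -- value that would otherwise come from the lookup
    FROM   orders o,
           customers c
    WHERE  o.cust_id = c.cust_id;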

Alternatives to a lookup
Before: an unconnected Lookup is called from an Expression; the lookup returns values based on a key and a code.

After: the lookup is replaced by a Source Qualifier, an Expression, an Aggregator, and a Joiner. The expression calculates the proper values, and the aggregator keeps the record with the accurate information (which in this case is the last record).

Example: input rows of (key, code, value) such as (1, 1, Txt1), (1, 2, Txt2), (1, 3, Txt3) produce one result-set row per key, e.g. key 1 -> Txt1, Txt2, Txt3 and key 2 -> txta, txtb, txtc.

Group by Clause
Source Qualifier -> Aggregator.

- A GROUP BY clause can be replaced by an aggregator with sorted input (where possible)
- If the data arrives already sorted, the aggregator is always faster than the database GROUP BY
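
A hedged sketch (hypothetical table and columns): have the source query return the rows already ordered on the group-by key so the downstream aggregator can run with sorted input enabled:

    SELECT cust_id,
           order_date,
           amount
    FROM   orders
    ORDER  BY cust_id;   -- sorted on the aggregator's group-by port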

Deletes
Before: Src1 and Src2 both feed the Source Qualifier, then an Update Strategy, then the target (Src1). After: only Src2 feeds the Source Qualifier, Update Strategy, and target (Src1).

- Source 2 is the driving table, and Source 1 is the table from which the data is deleted
- Normally this process finds the keys in source table 1 whose rows are to be deleted, based on some conditions, and the data is then deleted accordingly
- Source 2 contains a few records on a key that forms part of the composite primary key in table 1
- In the tuned version, Source 2 is the only table that is read
- The target instance contains an update override, and the deletion happens on the particular key coming from the source table
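
A hedged sketch of such a target update override (hypothetical table and column names; assuming the standard :TU.<port> placeholder for the value arriving on the target port):

    -- Deletes every row in table 1 whose partial key matches the incoming key,
    -- instead of reading table 1 just to find the rows to delete
    DELETE FROM src1_table
    WHERE  batch_key = :TU.batch_key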

Thank You
