Professional Documents
Culture Documents
16/04/2015
Pipeline Partitioning
You create a session for each mapping you want the PowerCenter Server to run. Every mapping
contains one or more source pipelines. A source pipeline consists of a source qualifier and all the
transformations and targets that receive data from that source qualifier.
If you purchase the Partitioning option, you can specify partitioning information for each source
pipeline in a mapping. The partitioning information for a pipeline controls the following factors:
The
number of reader, transformation, and writer threads that the master thread creates for
the pipeline.
How the PowerCenter Server reads data from the source, including the number of
connections to the source.
How the PowerCenter Server distributes rows of data to each transformation as it
processes the pipeline.
How the PowerCenter Server writes data to the target, including the number of connections
to each target in the pipeline.
You can specify partitioning information for a pipeline by setting the following attributes:
Location
of partition points. Partition points mark the thread boundaries in a pipeline and
divide the pipeline into stages. The PowerCenter Server sets partition points at several
transformations in a pipeline by default. If you have the Partitioning option, you can define
other partition points. When you add partition points, you increase the number of
transformation threads, which can improve session performance. The PowerCenter Server
can redistribute rows of data at partition points, which can also improve session
performance.
Number of partitions. A partition is a pipeline stage that executes in a single thread. If you
purchase the Partitioning option, you can set the number of partitions at any partition point.
When you add partitions, you increase the number of processing threads, which can
improve session performance.
Partition types. The PowerCenter Server specifies a default partition type at each partition
point. If you purchase the Partitioning option, you can change the partition type. The
partition type controls how the PowerCenter Server redistributes data among partitions at
partition points.
Partition Points
By default, the PowerCenter Server sets partition points at various transformations in the pipeline.
Partition points mark thread boundaries as well as divide the pipeline into stages. A stage is a
section of a pipeline between any two partition points. When you set a partition point at a
transformation, the new pipeline stage includes that transformation.
When you add a partition point, you increase the number of pipeline stages by one. Similarly, when
you delete a partition point, you reduce the number of stages by one.
Informatica
16/04/2015
Besides marking stage boundaries, partition points also mark the points in the pipeline where the
PowerCenter Server can redistribute data across partitions. For example, if you place a partition
point at a Filter transformation and define multiple partitions, the PowerCenter Server can
redistribute rows of data among the partitions before the Filter transformation processes the data.
The partition type you set at this partition point controls the way in which the PowerCenter Server
passes rows of data to each partition.
Number of Partitions
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread.
By default, the PowerCenter Server defines a single partition in the source pipeline. If you purchase
the Partitioning option, you can increase the number of partitions. This increases the number of
processing threads, which can improve session performance.
Increasing the number of partitions or partition points increases the number of threads. Therefore,
increasing the number of partitions or partition points also increases the load on the server
machine. If the server machine contains ample CPU bandwidth, processing rows of data in a
session concurrently can increase session performance. However, if you create a large number of
partitions or partition points in a session that processes large amounts of data, you can overload the
system.
Partition Types
When you configure the partitioning information for a pipeline, you must specify a partition type at
each partition point in the pipeline. The partition type determines how the PowerCenter Server
redistributes data across partition points.
The Workflow Manager allows you to specify the following partition types:
Round-robin.
The PowerCenter Server distributes data evenly among all partitions. Use
round-robin partitioning where you want each partition to process approximately the same
number of rows.
Hash. The PowerCenter Server applies a hash function to a partition key to group data
among partitions. If you select hash auto-keys, the PowerCenter Server uses all grouped or
sorted ports as the partition key. If you select hash user keys, you specify a number of ports
to form the partition key. Use hash partitioning where you want to ensure that the
PowerCenter Server processes groups of rows with the same partition key in the same
partition.
Key range. You specify one or more ports to form a compound partition key. The
PowerCenter Server passes data to each partition depending on the ranges you specify for
each port. Use key range partitioning where the sources or targets in the pipeline are
partitioned by key range.
Pass-through. The PowerCenter Server passes all rows at one partition point to the next
partition point without redistributing them. Choose pass-through partitioning where you want
to create an additional pipeline stage to improve performance, but do not want to change
the distribution of data across partitions.
Informatica
16/04/2015
Database
partitioning. The PowerCenter Server queries the IBM DB2 system for table
partition information and loads partitioned data to the corresponding nodes in the target
database. Use database partitioning with IBM DB2 targets stored on a multi-node
tablespace.
You can specify different partition types at different points in the pipeline.
When you use this mapping in a session, you can increase session performance by specifying
different partition types at the following partition points in the pipeline:
Source
qualifier. To read data from the three flat files concurrently, you must specify three
partitions at the source qualifier. Accept the default partition type, pass-through.
Filter transformation. Since the source files vary in size, each partition processes a
different amount of data. Set a partition point at the Filter transformation, and choose roundrobin partitioning to balance the load going into the Filter transformation.
Sorter transformation. To eliminate overlapping groups in the Sorter and Aggregator
transformations, use hash auto-keys partitioning at the Sorter transformation. This causes
the PowerCenter Server to group all items with the same description into the same partition
before the Sorter and Aggregator transformations process the rows. You can delete the
default partition point at the Aggregator transformation.
Target. Since the target tables are partitioned by key range, specify key range partitioning
at the target to optimize writing data to the target.
Add
Enter
Specify
Add
Informatica
16/04/2015
You can configure the following information when you edit or add a partition point:
Specify
Add
Enter
Source
Qualifier or Normalizer. This partition point controls how the PowerCenter Server
extracts data from the source and passes it to the source qualifier. You cannot delete this
partition point.
Rank and unsorted Aggregator transformations. These partition points ensure that the
PowerCenter Server groups rows properly before it sends them to the transformation. You
can delete these partition points if the pipeline contains only one partition or if the
PowerCenter Server passes all rows in a group to a single partition before they enter the
transformation.
Target instances. This partition point controls how the writer passes data to the targets.
You cannot delete this partition point.
Informatica
16/04/2015
You
You
The transformation appears in the Partition Points node in the Partitions view on the Mapping tab of
the session properties.
Informatica
16/04/2015
Informatica
16/04/2015
Cache Partitioning
When you create a session with multiple partitions, the PowerCenter Server can partition caches for
the Aggregator, Joiner, Lookup, and Rank transformations. It creates a separate cache for each
partition, and each partition works with only the rows needed by that partition. As a result, the
PowerCenter Server requires only a portion of total cache memory for each partition. When you run
a session, the PowerCenter Server accesses the cache in parallel for each partition.
After you configure the session for partitioning, you can configure memory requirements and cache
directories for each transformation in the Transformations view on the Mapping tab of the session
properties. To configure the memory requirements, calculate the total requirements for a
transformation, and divide by the number of partitions. To further improve performance, you can
configure separate directories for each partition.
The guidelines for cache partitioning is different for each cached transformation:
Aggregator
Informatica
16/04/2015
Rank
Hash
auto-keys. The PowerCenter Server uses all grouped or sorted ports as a compound
partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and
unsorted Aggregator transformations.
Hash user keys. You specify a number of ports to generate the partition key.
Hash Auto-Keys
You can use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator
transformations to ensure that rows are grouped properly before they enter these transformations.
Informatica
16/04/2015
With key range partitioning, the PowerCenter Server distributes rows of data based on a port or set
of ports that you specify as the partition key. For each port, you define a range of values.
The PowerCenter Server uses the key and ranges to send rows to the appropriate partition.
Use key range partitioning in mappings where the source and target tables are partitioned by
key range.
The
Informatica
16/04/2015
By
default, the PowerCenter Server fails the session when you use database partitioning for
non-DB2 targets. However, you can configure the PowerCenter Server to default to
passthrough partitioning when you use database partitioning for non-DB2 relational targets:
You
cannot use database partitioning when you configure the session to use source-based
or user-defined commit, constraint-based loading, or session recovery.
The target table must contain a partition key. Also, you must link all not-null partition key
columns in the target instance to a transformation in the mapping.
You must use high precision mode when the IBM DB2 table partitioning key uses a Bigint
field. The PowerCenter Server fails the session when the IBM DB2 table partitioning key
uses a Bigint field and you use low precision mode.
If you create multiple partitions for a DB2 bulk load session, you must use database
partitioning for the target partition type. If you choose any other partition type, the
PowerCenter Server reverts to normal load and writes the following message to the session
log:
ODL_26097 Only database partitioning is support for DB2 bulk load.
If you configure a session for database partitioning, the PowerCenter Server reverts to passthrough
partitioning under the following circumstances:
The
You
You
configure the PowerCenter Server to treat the database partitioning partition type as
pass-through partitioning and you used database partitioning for a non-DB2 relational
target.
10
Informatica
16/04/2015
You
11
Informatica
16/04/2015
If
you use a shift-sensitive code page, you can use multi-threaded reading only if the
following conditions are true:
-
If
you configure a session for multi-threaded reading, and the PowerCenter Server cannot
create multiple threads to a file source, it writes a message to the session log and reads the
source with one thread.
W hen the PowerCenter Server uses multiple threads to read a source file, it may not read
the rows in the file sequentially. If sort order is important, configure the session to read the
file with a single thread. For example, sort order may be important if the mapping contains a
sorted Joiner transformation and the file source is the sort origin.
You can also use a combination of direct and indirect files to balance the load.
Session performance for multi-threaded reading is optimal with large source files.
Although the PowerCenter Server can create multiple connections to small source files,
performance may not be optimal.
Reading
direct files. You can configure the PowerCenter Server to read from one or
more direct files. If you configure the session with more than one direct file, the
PowerCenter Server creates a concurrent connection to each file. It does not create
multiple connections to a file.
Reading indirect files. When the PowerCenter Server reads an indirect file, it reads
the file list and reads the files in the list sequentially. If the session has more than one
file list, the PowerCenter Server reads the file lists concurrently, and it reads the files in
the list sequentially.
Reading
direct files. When the PowerCenter Server reads a direct file, it creates multiple
reader threads to read the file concurrently. You can configure the PowerCenter Server to
read from one or more direct files. For example, if a session reads from two files and you
create five partitions, the PowerCenter Server may distribute one file among two partitions
and one file among three partitions.
Reading indirect files. When the PowerCenter Server reads an indirect file, it creates
multiple threads to read the file list concurrently. It also creates multiple threads to read the
files in the list concurrently. The PowerCenter Server may use more than one thread to read
a single file.
12
Informatica
16/04/2015
Database Compatibility
When you configure a session with multiple partitions at the target instance, the PowerCenter
Server creates one connection to the target for each partition. If you configure multiple target
partitions in a session that loads to a database or ODBC target that does not support multiple
concurrent connections to tables, the session fails.
13
Informatica
16/04/2015
Local.
You can merge target files only if you choose local connections for all target partitions.
14
Informatica
16/04/2015
maintain the sort order, so you must follow specific partitioning guidelines to use this type of
partition point.
When you join data, you can partition data for the master and detail pipelines in the following ways:
1:n.
Use one partition for the master source and multiple partitions for the detail source.
The PowerCenter Server maintains the sort order because it does not redistribute master
data among partitions.
n:n. Use an equal number of partitions for the master and detail sources. When you use
n:n partitions, the PowerCenter Server processes multiple partitions concurrently. You may
need to configure the partitions to maintain the sort order depending on the type of partition
you use at the Joiner transformation.
If you add a partition point at the Joiner transformation, the Workflow Manager adds an equal
number of partitions to both master and detail pipelines.
Use different partitioning guidelines, depending on where you sort the data:
Using
Use
1:n partitions when you have one flat file in the master pipeline and multiple flat files in
the detail pipeline. Configure the session to use one reader-thread for each file.
Use n:n partitions when you have one large flat file in the master and detail pipelines.
Configure partitions to pass all sorted data in the first partition, and pass empty file data in
the other partitions.
Using
Use
Use
Using
the Sorter transformation. Use n:n partitions. If you use a hash auto-keys partition
at the Joiner transformation, configure each Sorter transformation to use hash auto-keys
partition points as well.
15
Informatica
16/04/2015
Configure
the mapping with one source and one Source Qualifier transformation in each
pipeline.
Specify the path and file name for each flat file in the Properties settings of the
Transformations view on the Mapping tab of the session properties.
Each file must use the same file properties as configured in the source definition.
The range of sorted data in the flat files can overlap. You do not need to use a unique range
of data for each file.
The Joiner transformation may output unsorted data depending on the join type. If you use a full
outer or detail outer join, the PowerCenter Server processes unmatched master rows last, which
can result in unsorted data.
16
Informatica
16/04/2015
This example shows sorted data passed in a single partition to maintain the sort order. The first
partition contains sorted file data while all other partitions pass empty file data. At the Joiner
transformation, the PowerCenter Server distributes the data among all partitions while maintaining
the order of the sorted data.
17
Informatica
16/04/2015
The Joiner transformation may output unsorted data depending on the join type. If you use a full
outer or detail outer join, the PowerCenter Server processes unmatched master rows last, which
can result in unsorted data.
The example shows sorted relational data passed in a single partition to maintain the sort order.
The first partition contains sorted relational data while all other partitions pass empty data. After the
PowerCenter Server joins the sorted data, it redistributes data among multiple partitions.
18
Informatica
16/04/2015
For best performance, use sorted flat files or sorted relational data. You may want to calculate the
processing overhead for adding Sorter transformations to your mapping.
You
use the hash auto-keys partition type for the Lookup transformation.
19
Informatica
16/04/2015
The
The
SetCountVariable
20
Informatica
16/04/2015
SetMaxVariable
SetMinVariable
You should use the SetVariable function only once for each mapping variable in a pipeline. When
you create multiple partitions in a pipeline, the PowerCenter Server uses multiple threads to
process that pipeline. If you use this function more than once for the same variable, the current
value of a mapping variable may have indeterministic results.
Partitioning Rules
You can create multiple partitions in a pipeline if the PowerCenter Server can maintain data
consistency when it processes the partitioned data. When you create a session, the Workflow
Manager validates each pipeline for partitioning. You can change the partitioning information for a
pipeline as long as it conforms to the rules and restrictions listed in this section.
There are several types of partitioning rules and restrictions. These include restrictions on the
number of partitions, partitioning restrictions when you change a mapping, restrictions that apply to
other Informatica products, and general guidelines.
21
Informatica
16/04/2015
Sequence numbers generated by Normalizer and Sequence Generator transformations might not
be sequential for a partitioned source, but they are unique.
You
The
The
The
22
Informatica
16/04/2015
You
Product Restrictions
PowerCenter Connect SDK: If the mapping contains a multi-group target that receives data
from more than one pipeline, then you can specify only one partition. If the mapping
contains a multi-group target that receives data from multiple groups, then the partition type
must be pass-through.
Partitioning Guidelines
The following guidelines apply to adding and deleting partition points:
You
23
Informatica
16/04/2015
You
can add a partition point at any other transformation provided that no partition point
receives input from more than one pipeline stage.
A partition
If
you choose key range partitioning at any partition point, you must specify a range for
each port in the partition key.
If you choose key range partitioning and need to enter a date range for any port, use the
standard PowerCenter date format.
The Workflow Manager does not validate overlapping string ranges, overlapping numeric
ranges, gaps, or missing ranges.
If a row contains a null value in any column that makes up the partition key, or if a row
contains values that fall outside all of the key ranges, the PowerCenter Server sends that
row to the first partition.
W hen
connecting to file sources or targets, you must choose the same connection type for
all partitions. You may choose different connection objects as long as each object is of the
same type.
You cannot merge output files from sessions with multiple partitions if you use FTP, an
external loader, or an MQSeries message queue as the target connection type.
24
Informatica
16/04/2015
Session Caches
The PowerCenter Server creates index and data caches in memory for Aggregator, Rank, Joiner,
and Lookup transformations in a mapping. The PowerCenter Server stores key values in the index
cache and output values in the data cache. You configure memory parameters for the index and
data cache in the transformation or session properties.
If the PowerCenter Server requires more memory, it stores overflow values in cache files.
When the session completes, the PowerCenter Server releases cache memory, and in most
circumstances, it deletes the cache files.
The PowerCenter Server creates cache files based on the PowerCenter Server code page.
Memory Cache
The PowerCenter Server creates a memory cache based on the size configured in the session
properties. When you create a mapping, you specify the index and data cache size for each
transformation instance. When you create a session, you can override the index and data cache
size for each transformation instance in the session properties.
When you configure a session, you calculate the amount of memory the PowerCenter Server needs
to process the session. Calculate requirements based on factors such as processing overhead and
column size for key and output columns.
By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and 2,000,000
bytes to the data cache for each transformation instance. If the PowerCenter Server cannot allocate
the configured amount of cache memory, it cannot initialize the session and the session fails.
If a server grid has 32-bit and 64-bit servers, and if a session exceeds 2 GB of memory, the master
server assigns it to a 64-bit server.
When you specify large cache sizes in transformations on 64-bit machines, the PowerCenter Server
might run out of physical memory and perform slower. If the cache size forces the PowerCenter
Server to swap virtual memory and to spill to disk, performance decreases.
A PowerCenter Server running on a 32-bit machine cannot run a session if the total size of all the
configured session caches is more than 2 GB.
Cache Files
If the PowerCenter Server requires more memory than the configured cache size, it stores overflow
values in the cache files. Since paging to disk can slow session performance, try to configure the
index and data cache sizes to store data in memory.
25
Informatica
16/04/2015
The PowerCenter Server creates the index and data cache files by default in the PowerCenter
Server variable directory, $PMCacheDir. If you do not define $PMCacheDir, the PowerCenter
Server saves the files in the PMCache directory specified in the UNIX configuration file or the cache
directory in the Windows registry. If the UNIX PowerCenter Server does not find a directory there, it
creates the index and data files in the installation directory. If the PowerCenter Server on Windows
does not find a directory there, it creates the files in the system directory.
If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple index and
data files. When creating these files, the PowerCenter Server appends a number to the end of the
filename, such as PMAGG*.idx1 and PMAGG*.idx2. The number of index and data files are limited
only by the amount of disk space available in the cache directory.
When you run a session, the PowerCenter Server writes a message in the session log indicating
the cache file name and the transformation name. When a session completes, the PowerCenter
Server typically deletes index and data cache files. However, you may find index and data files in
the cache directory under the following circumstances:
The
You
The
The PowerCenter Server use the following naming convention when it creates cache files:
For example, in the file name, PMLKUP8_4_2.idx, PMLKUP identifies the transformation type as
Lookup, 8 is the session ID, 4 is the transformation ID, and 2 is the partition index.
The cache directory should be local to the PowerCenter Server. You might encounter performance
or reliability problems when you cache large quantities of data on a mapped or mounted drive.
26
Informatica
16/04/2015
Cache Calculations
To determine cache requirements for a session, first add the total column size in the cache to the
row overhead. Multiply the result by the number of groups or rows in the cache. This gives the
minimum caching requirements. To determine the maximum requirements for the index cache, you
multiply the minimum requirements by two.
The following tables provide the calculations for the minimum cache requirements for each
transformation:
For an Aggregate Transformation:
27
Informatica
16/04/2015
Cache Partitioning
When you create a session with multiple partitions, the PowerCenter Server can partition caches for
the Aggregator, Joiner, Lookup, and Rank transformations. It creates a separate cache for each
partition, and each partition works with only the rows needed by that partition. As a result, the
PowerCenter Server requires only a portion of total cache memory for each partition. When you run
a session, the PowerCenter Server accesses the cache in parallel for each partition. If you do not
use cache partitioning, the PowerCenter Server accesses the cache serially for each partition.
After you configure the session for partitioning, you can configure memory requirements and cache
directories for each transformation in the Transformations view on the Mapping tab of the session
properties. To configure the memory requirements, calculate the total requirements for a
transformation, and divide by the number of partitions. To further improve performance, you can
configure separate directories for each partition.
The guidelines for cache partitioning is different for each cached transformation:
28
Informatica
16/04/2015
Aggregator
Aggregator Caches
When the PowerCenter Server runs a session with an Aggregator transformation, it stores data in
memory until it completes the aggregation. The PowerCenter Server uses cache partitioning when
you create multiple partitions in a pipeline that contains an Aggregator transformation. It creates one
memory cache and one disk cache for each partition and routes data from one partition to another
based on group key values of the transformation.
After you configure the partitions in the session, you can configure the memory requirements and
cache directories for the Aggregator transformation on the Mappings tab in session properties.
Allocate enough disk space to hold one row in each aggregate group.
If you use incremental aggregation, the PowerCenter Server saves the cache files in the cache file
directory.
29
Informatica
16/04/2015
Joiner Caches
When the PowerCenter Server runs a session with a Joiner transformation, it reads rows from the
master and detail sources concurrently and builds index and data caches based on the master
rows. The PowerCenter Server then performs the join based on the detail source data and the
cache data.
The number of rows the PowerCenter Server stores in the cache depends on the partitioning
scheme, the data in the master source, and whether or not you use sorted input.
When you create multiple partitions in a session, the PowerCenter Server processes the Joiner
transformation differently when you use n:n partitioning and when you use 1:n partitioning.
Processing
master and detail data for outer joins. When you run a multi-partitioned
session with a partitioned Joiner transformation, the PowerCenter Server builds one cache
per partition. In a single-partitioned master pipeline (1:n), the PowerCenter Server outputs
unmatched master rows after it processes all detail partitions. In a multipartitioned master
pipeline (n:n), the PowerCenter Server outputs unmatched master rows after it processes
the partition for each detail cache.
Configuring memory requirements. When you run a session with a Joiner transformation,
the PowerCenter Server uses n times the memory you specify on the Transformation view
of the Mapping tab. The PowerCenter Server might page to disk if you do not specify
enough memory.
When you use 1:n partitioning, each partition requires as much memory as a 1:1 partition session.
When you configure the cache for the Joiner transformation, enter the total transformation memory
requirements for a single partition.
When you use n:n partitioning, each partition requires only a portion of the memory required by a
1:1 partition session. When you configure the cache, divide the memory requirements for a 1:1
partition session by the number of partitions. Enter that amount for the cache requirements.
You
You
However, when you use sorted input and you use n:n partitioning, the PowerCenter Server caches
a different number of rows in the index and data cache:
Index
cache. The PowerCenter Server caches 100 master rows with unique keys.
Data
cache. The PowerCenter Server caches the master rows in the data cache that
correspond to the 100 rows in the index cache. The number of rows it stores in the data
cache depends on the data. For example, if every master row contains a unique key, the
PowerCenter Server stores 100 rows in the data cache. However, if the master data
contains multiple rows with the same key, the PowerCenter Server stores more than 100
rows in the data cache.
30
Informatica
16/04/2015
Lookup Caches
When the PowerCenter Server builds a lookup cache in memory, it processes the first row of data in
the transformation. It queries the cache for each row that enters the transformation.
Configure the index and data cache memory for each Lookup transformation. The PowerCenter
Server caches data differently for static and dynamic caches and also for sessions that use cache
partitioning.
When you run the session, the PowerCenter Server rebuilds a persistent cache if any cache file is
missing or invalid.
Static Cache
When you use a static lookup cache, the PowerCenter Server creates one memory cache for each
partition.If you use cache partitioning, the PowerCenter Server requires only a portion of the total
memory to cache each partition. So, when you configure cache size, you can divide the total
memory requirements by the number of partitions.
If you do not use cache partitioning, the PowerCenter Server requires as much memory for each
partition as it does for a single partition pipeline. So, when you configure cache size, you enter the
total memory requirements for the transformation.
If two Lookup transformations in a mapping share the cache, the PowerCenter Server does not
allocate additional memory for shared transformations in the same pipeline stage. For shared
transformations in a different pipeline stage, the PowerCenter Server does allocate additional
memory.
Static Lookup transformations that use the same data or a subset of data to create a disk cache can
share the disk cache. However, the lookup keys may be different, so the transformations must have
separate memory caches.
Dynamic Cache
When you use a dynamic lookup cache, the PowerCenter Server creates the memory cache based
on whether you use cache partitioning or not.
If you use cache partitioning, the PowerCenter Server creates one memory cache for each partition.
It requires only a portion of the total memory to cache each partition. So, when you configure cache
size, you can divide the total memory requirements by the number of partitions.
If you do not use cache partitioning, the PowerCenter Server creates one memory cache and one
disk cache for each transformation. All partitions share the memory and disk cache.
31
Informatica
16/04/2015
When you configure the cache size, enter the total memory requirements in the transformation or on
the Mapping tab in the session properties.
When Lookup transformations share a dynamic cache, the PowerCenter Server updates the
memory cache and disk cache. To keep the caches synchronized, the PowerCenter Server must
share the disk cache and the corresponding memory cache between the transformations.
Lookup
cache structures are identical. The lookup/output ports for the first shared
transformation must match the lookup/output ports for the subsequent transformations.
The transformations have the same lookup conditions, and the lookup condition columns
are in the same order.
You cannot share a partitioned cache with a non-partitioned cache.
W hen you share Lookup caches across target load order groups, you must configure the
target load order groups with the same number of partitions.
32
Informatica
16/04/2015
Rank Caches
When the PowerCenter Server runs a session with a Rank transformation, it compares an input row
with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server
replaces the stored row with the input row.
If the Rank transformation is configured to rank across multiple groups, the PowerCenter Server
ranks incrementally for each group it finds.
The PowerCenter Server uses cache partitioning, when you create multiple partitions in a pipeline
that contains a Rank transformation. It creates one memory cache and one disk cache per partition
and routes data from one partition to another based on group key values of the transformation.
After you configure the partitions in the session, you can configure the memory requirements and
cache directories for the Rank transformation on the Mappings tab in session properties.
33